IRC Logs

Log of #ghostscript at irc.freenode.net.

2014/09/03
mvrhel_laptop bbiaw00:14.56 
Toppi hi06:34.55 
ghostbot Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.06:34.55 
Toppi Is there any plan to support stylus pens (Android) in MuPDF?06:36.11 
chrisl Toppi: you'll probably have to wait a few hours - the main mupdf devs probably won't be here for 2-3 hours yet.06:38.05 
Toppi OK06:38.20 
chrisl And I don't know enough about mupdf or Android to know what supporting "stylus pen" would involve......06:39.02 
Toppi A stylus pen returns (x, y, pressure) instead of just (x, y) for the touch point; the pressure can be used to determine line thickness.06:41.25 
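A minimal C sketch of what Toppi describes, assuming the pen reports pressure normalised to roughly 0..1 (Android's MotionEvent.getPressure() is documented that way, though values above 1 can occur). The function name and the linear mapping are illustrative assumptions, not MuPDF code; out-of-range values are clamped.

```c
#include <assert.h>

/* Hypothetical helper: map a normalised stylus pressure in [0, 1] to a
 * stroke width between min_w and max_w by linear interpolation.
 * Pressure values outside [0, 1] are clamped first. */
float pressure_to_width(float pressure, float min_w, float max_w)
{
    assert(min_w <= max_w);
    if (pressure < 0.0f)
        pressure = 0.0f;
    if (pressure > 1.0f)
        pressure = 1.0f;
    return min_w + pressure * (max_w - min_w);
}
```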
chrisl As mupdf is primarily for viewing rather than editing, I'm not sure just how much utility there would be in that - but as I said, it's not my area06:43.18 
callgovind hi all07:55.51 
  can anyone tell me how mupdf text search works?07:56.21 
  ?07:57.41 
  anyone?07:59.33 
kens what ?07:59.52 
chrisl callgovind: it's not really my area, but I think it extracts the text from the PDF and searches it......08:01.47 
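A sketch of the extract-then-search approach chrisl describes, assuming the page text has already been extracted into a plain C string. find_all is a hypothetical helper, not MuPDF's search API; a real viewer would also map each hit back to glyph positions so it can be highlighted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Report the byte offsets of every occurrence of `needle` in `text`,
 * where `text` stands in for the plain text extracted from one page.
 * Writes at most max_hits offsets into hits and returns the total number
 * of matches found (which may exceed max_hits). */
size_t find_all(const char *text, const char *needle,
                size_t *hits, size_t max_hits)
{
    size_t count = 0;
    const char *p = text;

    if (strlen(needle) == 0)
        return 0;
    while ((p = strstr(p, needle)) != NULL) {
        if (count < max_hits)
            hits[count] = (size_t)(p - text);
        count++;
        p += 1; /* advance one byte so overlapping matches are found too */
    }
    return count;
}
```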
  off for a couple of hours.....08:12.19 
kens bb08:12.23 
norbertj chrisl: hi, do you know if anyone is picking up my 695374 perf problem? Especially my question in comment #20. I have the impression that the rendering threads are not running in parallel. Is Ray the person to check on this?11:15.57 
chrisl norbertj: yes, Ray probably needs to weigh in on it - it might have gone under his radar because it's a PCL issue.....11:18.17 
norbertj chrisl: when will he be around? He'll see the logs, so I think he will then follow up?11:19.09 
chrisl norbertj: he's in SoCal so it will be later this afternoon - I'll make sure he's aware11:19.45 
norbertj chrisl: thanks11:19.59 
chrisl norbertj: we *might* want to create a new bug, as the original issue in 695374 seems to have been left behind11:21.52 
norbertj chrisl: ok I'll make a new one regarding NumRenderingThreads etc.11:22.30 
chrisl norbertj: or we can do it if henrys and/or Ray feel it's necessary, and let you know11:23.11 
norbertj chrisl: fine with me.11:23.31 
chrisl norbertj: okay, I'll take it over with them - it's just, personally, I find it hard to keep track when bug threads head off at a tangent11:24.07 
norbertj chrisl: I know, same here ;)11:24.32 
rayjj norbertj: (for the logs) I haven't had a chance to look into the NRT>0 performance issue yet. It is true that contention and mutex locking _can_ degrade things, particularly on Windoze, which seems to be particularly bad at mutexes13:13.56 
  norbertj: This seems to be best instantiated under bug 694750. This is listed as a P1 enhancement.13:16.37 
chrisl rayjj: thanks for spotting that - but sorry it got dumped on you!13:42.12 
norbertj rayjj: thanks, I will follow 694750. Could you add 661 to the customer ids?14:30.50 
henrys Robin_Watts: did you say you'd been to Morocco? We're thinking of adding a warm leg onto the December trip - considering options.14:34.40 
Robin_Watts henrys: Nope, never done North Africa.14:34.55 
  but Helen did a day trip there from Gibraltar.14:35.08 
  and we have friends who did a longer holiday there.14:35.16 
henrys a friend of mine did a tour of the Sahara with camels starting there. They really enjoyed it.14:36.05 
  seemed like something you might have done ;-)14:36.36 
Robin_Watts Morocco seems one of the safer North African destinations at the moment.14:37.18 
norbertj chrisl: thanks14:39.51 
chrisl norbertj: np!14:40.28 
rayjj chrisl: (norbertj for the logs) I updated bug 694750 with the relevant info and changed this to "normal" so I will at least investigate it. If it turns out to be something that is not easily addressed, we may revert it to enhancement15:12.31 
  chrisl: but at least my analysis should allow us to better understand when, and when not, to use NRT > 015:13.16 
chrisl rayjj: thanks, I was just worried about the original bug from norbert spiralling off to areas not really related to *that* bug report15:15.21 
henrys Robin_Watts: certainly not taking sides with the customer but that is pokey right? That looks more like a high resolution rendering time than a check for color, around 200 ppm.15:23.33 
kens 200 ppm seems reasonable to me. You have to interpret the file, decompress images, examine every image sample etc.15:24.34 
  rendering 200 ppm at high resolution would be pretty quick....15:25.05 
Robin_Watts henrys: yeah. For color ones we can skip the image decompression as soon as we find the first color pixel/object.15:25.07 
henrys kens: I guess it depends on the resolution of the images, this is done in source space15:25.07 
Robin_Watts For greyscale we need to decompress *everything*.15:25.22 
kens Certainly it will depend heavily on the image data.15:25.41 
  Big images are obviously going to take longer, big images in JP2k even longer15:26.01 
Robin_Watts Images are jpegs at 2560x3840.15:26.19 
kens So, pretty big then.15:26.28 
tor8 Robin_Watts: proper grayscale images shouldn't be decompressed at all; only grayscale images masquerading as color will have the worst-case behaviour15:27.15 
Robin_Watts tor8: Right.15:27.26 
kens Sure, but it means you have to decompress all colour images15:27.37 
Robin_Watts I think the slow pages are ones with a single huge color image on.15:27.50 
kens you're talking 10 million samples for those images, each of which has to be individually checked, and compared against a threshold.15:27.59 
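For scale, kens's "10 million samples" matches the 2560x3840 JPEGs Robin mentions a little later; assuming three colour components per pixel, a full check of a gray-looking image covers about 29.5 million component values:

$$2560 \times 3840 = 9\,830\,400 \ \text{pixels} \approx 10^{7}, \qquad 3 \times 9\,830\,400 = 29\,491\,200 \ \text{component values}$$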
henrys okay I didn't expect images that large.15:28.23 
tor8 kens: yeah. but once we find the first color pixel we stop looking. still, we have to start over on each page since we flag it per page, not per file but that would be an easy change if that's what they want15:28.38 
Robin_Watts No, they want per page.15:28.50 
kens tor8 yes, but the worst case is that the images *are* grayscale, so you have to check every pixel15:28.58 
  I'm guessing that's the case here15:29.29 
Robin_Watts kens: No, the slow pages are single huge color images.15:29.41 
  And the time is in the decode of the image, not the checking, I'd guess.15:29.55 
  We *could* decode images subsampled, which would be faster.15:30.12 
kens Hmm, well that is still not unlikely, JPEG is slow15:30.12 
tor8 we still decompress the whole image, but we stop looking at the pixels and comparing colors (and skip the rest of the page) once we see it's not grayscale15:30.13 
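A sketch of the early-exit scan tor8 describes, assuming an already-decoded, interleaved 8-bit RGB buffer and a hypothetical per-channel threshold rule (not necessarily what the MuPDF device uses). It only saves the comparisons; as tor8 says, the decode itself still happens in full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Scan an interleaved 8-bit RGB buffer and stop at the first pixel whose
 * channels differ by more than `threshold` (i.e. is not gray).
 * Returns 1 if every pixel is gray, 0 otherwise; *examined receives the
 * number of pixels actually looked at. */
int scan_rgb_is_gray(const unsigned char *rgb, size_t npixels,
                     int threshold, size_t *examined)
{
    size_t i;

    for (i = 0; i < npixels; i++) {
        int r = rgb[3 * i], g = rgb[3 * i + 1], b = rgb[3 * i + 2];
        if (abs(r - g) > threshold || abs(g - b) > threshold ||
            abs(r - b) > threshold) {
            *examined = i + 1;
            return 0; /* colour found: the rest of the page can be skipped */
        }
    }
    *examined = npixels;
    return 1; /* worst case: every pixel had to be checked */
}
```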
henrys are you decoding a scanline or the entire thing?15:30.26 
Robin_Watts whole thing.15:30.32 
kens You can't decode a scan line in JPEG15:30.38 
tor8 decompressing it still needs to be done, though. and our interfaces are not complex enough to decode a scanline at a time.15:30.58 
Robin_Watts kens: You can decode at macroblock level, most of the time.15:31.03 
kens Yes, but that's still a number of scans15:31.14 
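Robin's macroblock point, sketched as arithmetic: a baseline JPEG decoder emits one MCU row at a time, and each row spans 8 scanlines (no vertical chroma subsampling) or 16 (4:2:0), which is kens's "number of scans". The helper is illustrative only, not Ghostscript or MuPDF code.

```c
#include <assert.h>

/* Number of MCU rows in an image of the given height, where mcu_height
 * is 8 for no vertical chroma subsampling and 16 for 4:2:0. Each MCU row
 * yields mcu_height scanlines at once. */
int mcu_rows(int height, int mcu_height)
{
    assert(mcu_height > 0);
    return (height + mcu_height - 1) / mcu_height; /* round up */
}
```

For the 3840-pixel-high images discussed here, that is 240 MCU rows at 4:2:0, so each incremental step decodes 16 scanlines.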
Robin_Watts If we cared, we could extend the image interface to do this faster.15:31.32 
  but it's a non-trivial amount of work.15:31.39 
  unlike the device itself which has been pretty simple.15:31.53 
kens How long does GS take to do the same job ? Is there some other tool which does it significantly faster ?15:32.00 
kens suspects not15:32.08 
tor8 Robin_Watts: the only easy performance increase I see possible now is keeping a list of images we've already checked15:33.16 
  and even that is likely to double the amount of code used by the device, and probably not to any significant benefit15:33.36 
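A sketch of the "already checked" list tor8 suggests, assuming images can be recognised by pointer identity when they are reused. The cache size and names are hypothetical; as kens and tor8 note, it would not help a file whose large images are unique per page.

```c
#include <assert.h>
#include <stddef.h>

#define SEEN_MAX 8 /* hypothetical capacity */

/* Small ring buffer remembering the gray/colour verdict for recently
 * seen images, keyed by pointer identity. */
struct seen_cache {
    const void *key[SEEN_MAX];
    int is_gray[SEEN_MAX];
    size_t count; /* number of valid entries, at most SEEN_MAX */
    size_t next;  /* slot to overwrite next */
};

/* Returns 1 and sets *is_gray if img was seen before, otherwise 0. */
int seen_lookup(const struct seen_cache *c, const void *img, int *is_gray)
{
    size_t i;

    for (i = 0; i < c->count; i++) {
        if (c->key[i] == img) {
            *is_gray = c->is_gray[i];
            return 1;
        }
    }
    return 0;
}

/* Record a verdict, overwriting the oldest entry once the cache is full. */
void seen_store(struct seen_cache *c, const void *img, int is_gray)
{
    c->key[c->next] = img;
    c->is_gray[c->next] = is_gray;
    c->next = (c->next + 1) % SEEN_MAX;
    if (c->count < SEEN_MAX)
        c->count++;
}
```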
kens That seems unlikely to help with this file, presumably each page is unique15:33.47 
tor8 huge full page graphics that take a lot of time are unlikely to be shared15:33.49 
Robin_Watts yeah.15:33.51 
chrisl If they actually knew enough to know what they want, I reckon they'd want the entire source image checked, and not the subsampled image15:34.10 
Robin_Watts chrisl: indeed.15:34.29 
henrys kens: it isn't scanline at a time but there is a scanline buffer in the jpeg stuff and it is not necessary to decode the entire image to fill that buffer ...15:35.51 
  kens: in the gs jpeg filter15:36.07 
kens No, but you can't decode a single scan line from a jpeg file15:36.09 
chrisl This is JPX, isn't it?15:36.23 
henrys yes I understand that.15:36.26 
Robin_Watts No, not JPX.15:36.31 
chrisl Oh, okay.....15:36.38 
Robin_Watts If we were working at the pdf stream level, then yes, we could bale out without having to decode the whole thing.15:36.45 
  But we're not.15:36.47 
kens JPX would be even worse15:36.48 
Robin_Watts We working at the fz_device level.15:36.53 
  The fz_device gets called with fz_images.15:36.59 
  and the simple thing to do with an fz_image is to call its 'getpixmap' function.15:37.31 
  Then you check the pixmap.15:37.38 
chrisl You could give them the option of just checking the colour space.....15:37.46 
Robin_Watts That's what we started with :)15:37.58 
henrys decoding blocks would seem to give a really good speedup with that size image but I don't know if we want to bother for this customer.15:37.58 
kens Then they complain that 'its wrong'15:37.59 
Robin_Watts They already complained it was wrong.15:38.08 
  We *could* call the 'getsource' member and get a compressed buffer and a format back.15:38.21 
  And then do a special check for formats we understand.15:38.40 
chrisl So the device could decode the image any way it saw fit?15:38.52 
Robin_Watts The device could see if the fz_image had the original compressed data available, and whether it was in a format that it understands.15:40.04 
  If not, it would drop back to the current code.15:40.10 
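A sketch of the dispatch Robin describes: take a format-specific path when the original compressed data is available and understood, otherwise fall back to decoding a pixmap. The types below are hypothetical stand-ins, not MuPDF's fz_image or its 'getsource' member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT MuPDF's real types. */
enum src_format { SRC_UNKNOWN, SRC_JPEG, SRC_FAX };

struct image_src {
    enum src_format format;    /* format of the original compressed data */
    const unsigned char *data; /* NULL when no original data is kept */
    size_t len;
};

enum check_path { PATH_FAST, PATH_FALLBACK };

/* Use a format-specific fast path only when the original data is present
 * and in a format the device understands; otherwise use the existing
 * "decode to a pixmap and check it" route. */
enum check_path choose_path(const struct image_src *src)
{
    if (src == NULL || src->data == NULL)
        return PATH_FALLBACK; /* e.g. an image that just wraps a bitmap */
    switch (src->format) {
    case SRC_JPEG:
        return PATH_FAST;
    default:
        return PATH_FALLBACK;
    }
}
```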
tor8 chrisl: yeah. the fz_image is just a wrapper around a compressed buffer with some info about its format.15:40.16 
  and a convenient getpixmap function15:40.29 
Robin_Watts More exactly the fz_image is an abstraction to encapsulate images; most of the time they have a compressed buffer with the original data, but not always.15:40.56 
  Sometimes you'll get fz_images that just wrap bitmaps.15:41.19 
chrisl Presumably that would also mean handling multiple colour spaces15:41.32 
  Rather than just RGB15:41.41 
Robin_Watts fz_image has a colorspace field in it.15:41.49 
  getpixmap returns the pixmap in whatever the native image colorspace is.15:42.16 
kens What do you do about shadings ?15:42.17 
Robin_Watts So the device converts to rgb.15:42.21 
  kens: We check the colors at the vertexes.15:42.31 
tor8 kens: or we check the lookup function for parameterized shadings15:42.43 
kens OK, seems reasonable; if the vertexes are gray the whole shading is15:42.51 
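Why checking the vertexes suffices, assuming colours are interpolated linearly in RGB: every interior colour is c = sum of lambda_i * c_i with lambda_i >= 0 summing to 1, so r - g = sum of lambda_i * (r_i - g_i) and |r - g| <= max_i |r_i - g_i| (exact in real arithmetic; rounding to 8 bits can add one unit). A C sketch with a hypothetical vertex type and threshold rule; function-based shadings would need the lookup function checked instead, as tor8 says.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mesh vertex colour (8-bit RGB); positions omitted. */
struct vcol { unsigned char r, g, b; };

static int vcol_is_gray(const struct vcol *c, int t)
{
    return abs(c->r - c->g) <= t && abs(c->g - c->b) <= t &&
           abs(c->r - c->b) <= t;
}

/* With linear colour interpolation, every colour inside the mesh is a
 * convex combination of vertex colours, so if all vertices pass the
 * threshold test, every interpolated colour passes too. */
int mesh_is_gray(const struct vcol *v, size_t n, int threshold)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (!vcol_is_gray(&v[i], threshold))
            return 0;
    return 1;
}
```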
Robin_Watts The code for shadings is admirably small cos of the abstraction we did for drawing them. It fell out very nicely.15:42.56 
chrisl TBH, for this customer, I reckon it's not worth a lot of extra effort.....15:43.29 
rayjj chrisl: probably the case. HCL's customer is not one of our top-ten15:44.43 
Robin_Watts Lets wait and see what they say.15:44.51 
rayjj Robin_Watts: agreed.15:45.02 
Robin_Watts If they come back with a valid reason why they need it faster, we can think about it some more.15:45.04 
  Like "well, tool X does it in only 2 minutes" or something.15:45.16 
tor8 Robin_Watts: then we'll say "so why don't you just use tool X?"15:45.39 
rayjj Robin_Watts: do you want me to benchmark the 3001 page file with gs vs. mudraw (or did you already)?15:45.52 
chrisl I have a feeling that whatever we do, they'll want more, and they'll want it faster - and finished yesterday15:45.54 
Robin_Watts tor8: Ahem, the idea with customers is to get them to pay *us*, not someone else :)15:46.01 
tor8 Robin_Watts: not if it means we actually have to work for it ;)15:46.20 
Robin_Watts rayjj: I did not do that.15:46.28 
rayjj chrisl: of course HCL will -- that way they can look like heroes to the actual customer (and we don't get any credit)15:46.30 
Robin_Watts Possibly we should say "there are further speedups possible, but that would require an NRE agreement"15:47.07 
rayjj now that Nori's not there, working directly with that customer would probably be much more trouble than it's worth15:47.12 
Robin_Watts which will shut them up nicely.15:47.22 
henrys rayjj: we should be able to do it much faster in gs since it isn't decoding the entire image.15:47.28 
Robin_Watts but gs is writing the clist.15:47.40 
  so it does decode the whole image, I think15:47.51 
henrys why would we need a clist?15:48.01 
Robin_Watts and scales and plots it.15:48.03 
rayjj henrys: actually, gs does decode the entire image (as currently implemented) since it writes the entire clist for the page before we look at the color/gray status15:48.19 
kens Could you restrict Mudraw to a subset of the pages, and run multiple instances ?15:48.21 
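A sketch of kens's split-and-run-in-parallel idea: divide the pages into near-equal contiguous ranges, one per instance. page_range is a hypothetical helper; each resulting range could then be given as the page list to a separate mudraw run on 3001Pages.pdf.

```c
#include <assert.h>

/* Split pages 1..npages into nworkers contiguous ranges whose sizes
 * differ by at most one; the first npages % nworkers workers get the
 * extra page. Worker w (0-based) gets pages *first..*last inclusive. */
void page_range(int npages, int nworkers, int w, int *first, int *last)
{
    int base, extra, count;

    assert(nworkers > 0 && npages >= nworkers && w >= 0 && w < nworkers);
    base = npages / nworkers;
    extra = npages % nworkers;
    count = base + (w < extra ? 1 : 0);
    *first = w * base + (w < extra ? w : extra) + 1;
    *last = *first + count - 1;
}
```

With 3001 pages and four instances this gives 1-751, 752-1501, 1502-2251 and 2252-3001.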
Robin_Watts That'sJustTheWayItWorks(TM)15:48.24 
henrys rayjj: even with nullpage?15:48.35 
Robin_Watts kens: sure.15:48.41 
rayjj henrys: it's just how it's implemented -- it was done for cust 801 and they always wanted a clist anyway15:48.50 
kens Might be a valid solution for them, then15:48.54 
Robin_Watts kens: clue.required > 0 for that solution though.15:49.10 
kens :-D15:49.17 
rayjj henrys: what is needed is an actual 'detect' device that is more like mudraw's (akin to bbox) 15:49.21 
kens We could buy a big stick.....15:49.25 
rayjj henrys: the bbox device gets partial info, but the detection for images was done (by mvrhel) in gxclimag.c15:50.20 
henrys rayjj: yeah i'm thinking if we start from the tracing devices that would be a nice solution.15:50.21 
rayjj henrys: starting from bbox is even nicer15:50.43 
  at least, if I were to do it, that's what I would start with15:51.14 
Robin_Watts henrys: Making mupdf do this with the alternate image handling would be easier than getting gs to do it, and would be faster in the long term, I think.15:51.17 
  It's 1-2 days work.15:51.32 
chrisl We can't really bale out of the image in Ghostscript either..... otherwise we could end up trying to interpret the image binary15:52.23 
rayjj Robin_Watts: that's probably what the 'graydetect' device in gs would be, given what's already implemented for everything except images.15:52.26 
  Robin_Watts: and the plumbing to have gs skip out once color is detected is quite easy15:52.59 
Robin_Watts You're more hopeful than me about the time taken to actually get stuff done properly in gs.15:53.20 
rayjj gs is nice because skipping all the text is easy15:53.26 
  Robin_Watts: did I say "properly" (not that such a thing exists in most of gs)15:53.54 
  I have to head to a doctor's appt. 15:54.26 
  bbiaw15:54.32 
Robin_Watts_ They are busily installing fibre in the village today.15:54.39 
  woo hoo!15:54.45 
rayjj I'll look at the performance issue with NRT while waiting...15:54.58 
henrys Robin_Watts: well, there is a .05% chance they won't respond to your last email; let's pray for that.15:54.59 
  rayjj: the board is so much more important than that.15:55.25 
chrisl rayjj: yes, take the dev board to the doc's with you! :-)15:56.02 
Robin_Watts_ rayjj: mupdf already skips text.15:56.08 
henrys chrisl: right he needs to have that board in his pocket at all times.15:56.41 
rayjj chrisl: henrys: where is the 25pages.pcl file for norbert ? (the one that has 47 pages) 15:56.47 
chrisl Probably on peeves.....15:57.03 
rayjj henrys: I can't work on the dev board at the doctor15:57.18 
  henrys: if only it was battery powered ...15:57.43 
henrys rayjj: you shouldn't be working on anything at the doctor's; read a damn magazine15:57.46 
chrisl rayjj: on peeves: /home/norbert/20140721_profiledata/25pages.pcl15:58.33 
rayjj henrys: yeah, right. People or Us or O -- I'd rather have needles stuck in me (oh, right, that might happen anyway)15:58.34 
  chrisl: thanks.15:58.39 
henrys coffees15:58.41 
chrisl And I don't think a color detection device in gs would buy us anything over the mupdf one.....15:59.17 
rayjj chrisl: no, I doubt it as well15:59.40 
chrisl rayjj: like I said, we can't bale out of the image early in gs either....15:59.57 
rayjj as long as mupdf can bail early on images (gs handles things more as scanlines)16:00.10 
chrisl rayjj: but we still have to consume all the scanlines16:00.31 
henrys chrisl: no you don't.16:00.51 
kens We certainly do. With PostScript or PDF we will need to consume all the input bytes16:00.56 
  Otherwise, as Chris says, the PS interpreter will be trying to consume the binary compressed stream16:01.21 
henrys pxl doesn't; there is nothing in the filter that imposes that16:01.22 
chrisl henrys: Postscript in particular we don't know the size of the compressed data....16:01.44 
kens PostScript does, and the PDF interpreter is in PostScript, so....16:01.45 
henrys anyway this is never going to be fast unless you can decompress a block at a time and bale early. 16:02.57 
kens Large gray images masquerading as colour will always be slow since they must be completely checked.16:03.35 
henrys I'm surprised about PDF and ps though that seems out of sync with how the filter code usually operates, but I haven't looked at it in years.16:04.13 
kens I still think that if a multi-thousand page file processes slowly then the answer is to break the problem up and have multiple instances process the file. Which MuPDF is better suited for anyway16:04.18 
chrisl I can't remember how pxl specifies image dimensions etc - does it include the length of the compressed stream? (and can it be trusted)?16:04.23 
kens henrys, it's not the filter code. The problem is that if you stop consuming data from the input stream before exhausting all the data, then the interpreter starts interpreting the remaining data as PostScript.16:04.56 
  And you can't know in advance how much data there is in a PostScript image.16:05.18 
  Well, you know how much there is in the decompressed image of course.16:05.33 
chrisl Or in a PDF, since we can't and don't trust the compressed stream length!16:05.58 
kens Yes indeed, because it can be wrong and we're expected to fix it.16:06.15 
chrisl In PDF you can flush to the endstream/endobj, though16:06.37 
kens And trying to have the PDF interpreter break out of an image part way makes my head hurt16:06.40 
henrys anyway I'm sure doing anything in gs will be an order of magnitude harder than mupdf so we don't need to worry about it.16:06.44 
  if the customer is interested in XL color detection then ... ;-)16:07.10 
chrisl Hmm, a gs device that only works with one interpreter - as if we didn't have enough trouble!16:07.39 
rayjj chrisl: it's easy to bail out with gs16:08.30 
chrisl rayjj: no it isn't16:08.46 
kens rayjj how would you avoid an error ?16:08.50 
chrisl Well, it is easy to bail out, but you can't complete the job.....16:09.18 
kens Which works for 'document' colour detection, but not individual pages16:09.59 
chrisl Yeh16:10.09 
henrys yes pages make the problem much harder.16:10.33 
chrisl Hmm, I don't see how pxl can abort the image either.....16:12.02 
  pxl doesn't seem to have advance notice of the compressed data size, either, so it seems to me it will run into the same problem Postscript will16:14.07 
kens Goodnight all16:24.58 
henrys chrisl: there's a datalength operator16:30.01 
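A sketch of why a known length matters: with a declared data length, like the one henrys mentions for PCL XL, an interpreter that stops checking an image early can still discard the remaining bytes and resume at the next operator. PostScript offers no such length, which is chrisl's objection. skip_bytes is a hypothetical helper using plain stdio, not interpreter code.

```c
#include <assert.h>
#include <stdio.h>

/* Discard up to `remaining` bytes of image data from f so that reading
 * resumes just after the image. Returns the number of bytes actually
 * skipped, which is short if end-of-file comes first. */
long skip_bytes(FILE *f, long remaining)
{
    char buf[4096];
    long skipped = 0;

    while (remaining > 0) {
        size_t want = remaining < (long)sizeof buf ? (size_t)remaining
                                                   : sizeof buf;
        size_t got = fread(buf, 1, want, f);
        if (got == 0)
            break; /* EOF or read error: the declared length was wrong */
        skipped += (long)got;
        remaining -= (long)got;
    }
    return skipped;
}
```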
  chrisl: does acrobat/adobe pdf continue after a jpeg error?16:31.16 
chrisl henrys: yes, sometimes acrobat carries on16:34.57 
  Like I said, in PDF you have the endstream string you can search for (although, that's been known to be missing, too)16:35.57 
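A sketch of chrisl's PDF fallback: scan forward for the endstream keyword. This naive byte search is an assumption for illustration; as chrisl notes the keyword is sometimes missing (hence the -1 result), and compressed data can in principle contain the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first "endstream" at or after `from` in buf,
 * or -1 if the keyword does not occur. */
long find_endstream(const unsigned char *buf, size_t len, size_t from)
{
    static const char kw[] = "endstream";
    const size_t kwlen = sizeof kw - 1;
    size_t i;

    for (i = from; i + kwlen <= len; i++) {
        if (memcmp(buf + i, kw, kwlen) == 0)
            return (long)i;
    }
    return -1;
}
```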
henrys chrisl: right, I imagine you could easily trick the XL code; it just hasn't been beaten up as much as gs16:37.14 
chrisl henrys: it still leaves Postscript, where we have no data length, and no sure end of data "tag"16:38.18 
henrys chrisl: sure but obviously the customer doesn't care about postscript - well I hope they don't ... did anyone ask.16:39.37 
  ?16:39.39 
  It's not going to be an easier problem to solve in mupdf in ps is needed ;-)16:40.20 
  s/in/if16:40.34 
chrisl henrys: No, I'm pretty sure it's only PDF - but it means the "early abort" for images isn't really viable in Ghostscript at all16:40.47 
  We'd run into problems trying to implement it even for PDF in the current Ghostscript scheme, and I'd be reluctant to devote a lot of time to a device that wouldn't work with Postscript16:42.03 
Robin_Watts henrys: Sure it is. We run it through gs with pdfwrite, then call mupdf :)16:43.40 
  hi fredross-perry.19:36.35 
  I'm about to disappear for the night. Anything I can help you with before I go?19:37.10 
fredross-perry Hi Robin. No thanks, Michael is giving me good help. Cheers.20:07.24 
  hi michael, calling you now...20:08.15 
rayjj chrisl_away: kens: (for the logs) gs can "recover" from an error thrown during processing a PDF. Here's an example:20:19.47 
  gswin32c -r9020:19.49 
  at the GS> prompt, enter:20:19.50 
  (tests_private/comparefiles/Bug689450.pdf) (r) file runpdfbegin 1 1 dopdfpages20:19.52 
  -- Error is thrown: Error: /typecheck in /20:19.53 
  then at the GS<5> prompt, enter:20:19.55 
  clear 2 2 dopdfpages20:19.56 
  This completes normally and thus continues after the error. This mechanism _could_ be used to "examine" PDFs for pages with color vs. monochrome if the "device" throws an error when color is detected from the gray detection logic (probably, maybe, if, ...)20:19.58 
  oops -- that gs invocation was actually: gswin32c -dPDFSTOPONERROR -r9020:23.44 
rayjj wishes that IRC had the skype "go back and correct" feature (but not just 1 message)20:24.55 
  note, that feature would save Robin (or marcos or I) a lot of editing when someone (henrys) mentions a customer by name ;-)20:28.17 
Robin_Watts Muhaha.23:44.06 
  I have 3001Pages.pdf running through mupdf in 18 seconds now :)23:44.21 
  and it's a small patch too. I love it when abstraction pays off.23:50.58 
  Patch on robin/master for tor to look at in the morning.23:51.07 
rayjj Robin_Watts: what was it before the patch ?23:54.16 
rayjj vaguely recalls 30+ sec23:54.41 
  Robin_Watts: and does it still get the correct result ?? ;-)23:55.57 
ghostscript.com