| 2014/09/03 |
mvrhel_laptop | bbiaw | 00:14.56 |
Toppi | hi | 06:34.55 |
ghostbot | Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line. | 06:34.55 |
Toppi | Are there any plans to support stylus pens (Android) in MuPDF? | 06:36.11 |
chrisl | Toppi: you'll probably have to wait a few hours - the main mupdf devs probably won't be here for 2-3 hours yet. | 06:38.05 |
Toppi | OK | 06:38.20 |
chrisl | And I don't know enough about mupdf or Android to know what supporting "stylus pen" would involve...... | 06:39.02 |
Toppi | A stylus pen returns (x,y,pressure) instead of just the (x,y) of the touch point; the pressure can be used to determine stroke thickness. | 06:41.25 |
chrisl | As mupdf is primarily for viewing rather than editing, I'm not sure just how much utility there would be in that - but as I said, it's not my area | 06:43.18 |
callgovind | hi all | 07:55.51 |
| can anyone tell me how mupdf's text search works? | 07:56.21 |
| ? | 07:57.41 |
| anyone? | 07:59.33 |
kens | what ? | 07:59.52 |
chrisl | callgovind: it's not really my area, but I think it extracts the text from the PDF and searches it...... | 08:01.47 |
| off for a couple of hours..... | 08:12.19 |
kens | bb | 08:12.23 |
norbertj | chrisl: hi, do you know if anyone is picking up on my 695374 perf problem? Especially my question in comment #20. I have the impression that the rendering threads are not running in parallel. Is Ray the person to check on this? | 11:15.57 |
chrisl | norbertj: yes, Ray probably needs to weigh in on it - it might have gone under his radar because it's a PCL issue..... | 11:18.17 |
norbertj | chrisl: when will he be around? He'll see the logs, so I think he will then follow up? | 11:19.09 |
chrisl | norbertj: he's in SoCal so it will be later this afternoon - I'll make sure he's aware | 11:19.45 |
norbertj | chrisl: thanks | 11:19.59 |
chrisl | norbertj: we *might* want to create a new bug, as the original issue in 695374 seems to have been left behind | 11:21.52 |
norbertj | chrisl: ok I'll make a new one regarding NumRenderingThreads etc. | 11:22.30 |
chrisl | norbertj: or we can do it if henrys and/or Ray feel it's necessary, and let you know | 11:23.11 |
norbertj | chrisl: fine with me. | 11:23.31 |
chrisl | norbertj: okay, I'll talk it over with them - it's just, personally, I find it hard to keep track when bug threads head off at a tangent | 11:24.07 |
norbertj | chrisl: I know, same here ;) | 11:24.32 |
rayjj | norbertj: (for the logs) I haven't had a chance to look into the NRT>0 performance issue yet. It is true that contention and mutex locking _can_ degrade things, particularly on Windoze that seems particularly bad at mutexes | 13:13.56 |
| norbertj: This seems to be best instantiated under bug 694750. This is listed as a P1 enhancement. | 13:16.37 |
chrisl | rayjj: thanks for spotting that - but sorry it got dumped on you! | 13:42.12 |
norbertj | rayjj: thanks, I will follow 694750. Could you add 661 to the customer ids? | 14:30.50 |
henrys | Robin_Watts: did you say you'd been to Morocco? We're thinking of adding a warm leg onto the December trip - considering options. | 14:34.40 |
Robin_Watts | henrys: Nope, never done North Africa. | 14:34.55 |
| but Helen did a day trip there from Gibraltar. | 14:35.08 |
| and we have friends who did a longer holiday there. | 14:35.16 |
henrys | a friend of mine did a tour of the sahara with camels starting there. They really enjoyed it. | 14:36.05 |
| seemed like something you might have done ;-) | 14:36.36 |
Robin_Watts | Morocco seems one of the safer North African destinations at the moment. | 14:37.18 |
norbertj | chrisl: thanks | 14:39.51 |
chrisl | norbertj: np! | 14:40.28 |
rayjj | chrisl: (norbertj for the logs) I updated bug 694750 with the relevant info and changed this to "normal" so I will at least investigate it. If it turns out to be something that is not easily addressed, we may revert it to enhancement | 15:12.31 |
| chrisl: but at least my analysis should allow us to better understand when, and when not, to use NRT > 0 | 15:13.16 |
chrisl | rayjj: thanks, I was just worried about the original bug from norbert spiralling off to areas not really related to *that* bug report | 15:15.21 |
henrys | Robin_Watts: certainly not taking sides with the customer but that is pokey right? That looks more like a high resolution rendering time than a check for color, around 200 ppm. | 15:23.33 |
kens | 200 ppm seems reasonable to me. You have to interpret the file, decompress images, examine every image sample etc. | 15:24.34 |
| rendering 200 ppm at high resolution would be pretty quick.... | 15:25.05 |
Robin_Watts | henrys: yeah. For color ones we can skip the image decompression as soon as we find the first color pixel/object. | 15:25.07 |
henrys | kens: I guess it depends on the resolution of the images, this is done in source space | 15:25.07 |
Robin_Watts | For greyscale we need to decompress *everything*. | 15:25.22 |
kens | Certainly it will depend heavily on the image data. | 15:25.41 |
| Big images are obviously going to take longer, big images in JP2k even longer | 15:26.01 |
Robin_Watts | Images are jpegs at 2560x3840. | 15:26.19 |
kens | So, pretty big then. | 15:26.28 |
tor8 | Robin_Watts: properly grayscale images shouldn't be decompressed at all, only grayscale images masquerading as color will have the worst case behaviour | 15:27.15 |
Robin_Watts | tor8: Right. | 15:27.26 |
kens | Sure, but it means you have to decompress all colour images | 15:27.37 |
Robin_Watts | I think the slow pages are ones with a single huge color image on. | 15:27.50 |
kens | you're talking 10 million samples for those images, each of which has to be individually checked, and compared against a threshold. | 15:27.59 |
henrys | okay I didn't expect images that large. | 15:28.23 |
tor8 | kens: yeah. but once we find the first color pixel we stop looking. still, we have to start over on each page since we flag it per page, not per file but that would be an easy change if that's what they want | 15:28.38 |
Robin_Watts | No, they want per page. | 15:28.50 |
kens | tor8 yes, but the worst case is that the images *are* grayscale, so you have to check every pixel | 15:28.58 |
| I'm guessing that's the case here | 15:29.29 |
Robin_Watts | kens: No, the slow pages are single huge color images. | 15:29.41 |
| And the time is in the decode of the image, not the checking, I'd guess. | 15:29.55 |
| We *could* decode images subsampled, which would be faster. | 15:30.12 |
kens | Hmm, well that is still not unlikely, JPEG is slow | 15:30.12 |
tor8 | we still decompress the whole image, but we stop looking at the pixels and comparing colors (and skip the rest of the page) once we see it's not grayscale | 15:30.13 |
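For readers of the log, the early-exit check tor8 describes can be sketched roughly as below. This is a minimal Python illustration and not MuPDF's code: the function names, the representation of an image as a list of (r, g, b) tuples, and the threshold parameter are all assumptions made for the example.

```python
def pixel_is_color(r, g, b, threshold=0):
    """Treat a pixel as color when its channels differ by more than threshold."""
    return max(r, g, b) - min(r, g, b) > threshold


def page_has_color(images, threshold=0):
    """Return (has_color, samples_checked) for one page.

    `images` is an iterable of already-decoded images, each a list of
    (r, g, b) tuples. Hypothetical representation, for illustration only.
    """
    checked = 0
    for pixels in images:
        for r, g, b in pixels:
            checked += 1
            if pixel_is_color(r, g, b, threshold):
                # First color pixel found: stop comparing and skip the rest of the page.
                return True, checked
    # Worst case: every sample was gray, so every sample had to be examined.
    return False, checked
```

Note that the sketch only saves the comparison work; as discussed in the log, the decode of each image still happens before any pixel is inspected.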
henrys | are you decoding a scanline or the entire thing? | 15:30.26 |
Robin_Watts | whole thing. | 15:30.32 |
kens | You can't decode a scan line in JPEG | 15:30.38 |
tor8 | decompressing it still needs to be done, though. and our interfaces are not complex enough to decode a scanline at a time. | 15:30.58 |
Robin_Watts | kens: You can decode at macroblock level, most of the time. | 15:31.03 |
kens | Yes, but that's still a number of scans | 15:31.14 |
Robin_Watts | If we cared, we could extend the image interface to do this faster. | 15:31.32 |
| but it's a non-trivial amount of work. | 15:31.39 |
| unlike the device itself which has been pretty simple. | 15:31.53 |
kens | How long does GS take to do the same job ? Is there some other tool which does it significantly faster ? | 15:32.00 |
kens | suspects not | 15:32.08 |
tor8 | Robin_Watts: the only easy performance increase I see possible now is keeping a list of images we've already checked | 15:33.16 |
| and even that is likely to double the amount of code used by the device, and probably not to any significant benefit | 15:33.36 |
kens | That seems unlikely to help with this file, presumably each page is unique | 15:33.47 |
tor8 | huge full page graphics that take a lot of time are unlikely to be shared | 15:33.49 |
Robin_Watts | yeah. | 15:33.51 |
chrisl | If they actually knew enough to know what they want, I reckon they'd want the entire source image checked, and not the subsampled image | 15:34.10 |
Robin_Watts | chrisl: indeed. | 15:34.29 |
henrys | kens: it isn't scanline at a time but there is a scanline buffer in the jpeg stuff and it is not necessary to decode the entire image to fill that buffer ... | 15:35.51 |
| kens: in the gs jpeg filter | 15:36.07 |
kens | No, but you can't decode a single scan line from a jpeg file | 15:36.09 |
chrisl | This is JPX, isn't it? | 15:36.23 |
henrys | yes I understand that. | 15:36.26 |
Robin_Watts | No, not JPX. | 15:36.31 |
chrisl | Oh, okay..... | 15:36.38 |
Robin_Watts | If we were working at the pdf stream level, then yes, we could bale out without having to decode the whole thing. | 15:36.45 |
| But we're not. | 15:36.47 |
kens | JPX would be even worse | 15:36.48 |
Robin_Watts | We're working at the fz_device level. | 15:36.53 |
| The fz_device gets called with fz_images. | 15:36.59 |
| and the simple thing to do with an fz_image is to call its 'getpixmap' function. | 15:37.31 |
| Then you check the pixmap. | 15:37.38 |
chrisl | You could give them the option of just checking the colour space..... | 15:37.46 |
Robin_Watts | That's what we started with :) | 15:37.58 |
henrys | decoding blocks would seem to give a really good speedup with that size image but I don't know if we want to bother for this customer. | 15:37.58 |
kens | Then they complain that 'it's wrong' | 15:37.59 |
Robin_Watts | They already complained it was wrong. | 15:38.08 |
| We *could* call the 'getsource' member and get a compressed buffer and a format back. | 15:38.21 |
| And then do a special check for formats we understand. | 15:38.40 |
chrisl | So the device could decode the image any way it saw fit? | 15:38.52 |
Robin_Watts | The device could see if the fz_image had the original compressed data available, and whether it was in a format that it understands. | 15:40.04 |
| If not, it would drop back to the current code. | 15:40.10 |
tor8 | chrisl: yeah. the fz_image is just a wrapper around a compressed buffer with some info about its format. | 15:40.16 |
| and a convenient getpixmap function | 15:40.29 |
Robin_Watts | More exactly the fz_image is an abstraction to encapsulate images; most of the time they have a compressed buffer with the original data, but not always. | 15:40.56 |
| Sometimes you'll get fz_images that just wrap bitmaps. | 15:41.19 |
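The fallback idea Robin_Watts outlines (use the compressed source when it exists and is in an understood format, otherwise drop back to the full decode) might look something like this. It is a sketch under stated assumptions: the `Image` class is a hypothetical stand-in and not the real fz_image, and `fast_checkers` is an invented name for a table of format-specific checks.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Pixel = Tuple[int, int, int]


@dataclass
class Image:
    """Hypothetical stand-in for an image object; not the real fz_image."""
    get_pixmap: Callable[[], List[Pixel]]   # full decode, always available
    source_format: Optional[str] = None     # e.g. "jpeg", or None for a bare bitmap
    source_data: Optional[bytes] = None     # original compressed bytes, if kept


def is_color_pixels(pixels):
    """True when any pixel has unequal channels."""
    return any(max(p) != min(p) for p in pixels)


def image_has_color(image, fast_checkers):
    """Use a format-specific checker when possible, else decode everything.

    `fast_checkers` maps a format name to a function that takes the
    compressed bytes and returns True/False, or None when it cannot decide.
    """
    checker = fast_checkers.get(image.source_format)
    if checker is not None and image.source_data is not None:
        result = checker(image.source_data)
        if result is not None:
            return result
    # Drop back to the simple path: get the whole pixmap and check it.
    return is_color_pixels(image.get_pixmap())
```

The design point is that images which only wrap bitmaps, or use a format with no fast checker, still get a correct answer through the slow path.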
chrisl | Presumably that would also mean handling multiple colour spaces | 15:41.32 |
| Rather than just RGB | 15:41.41 |
Robin_Watts | fz_image has a colorspace field in it. | 15:41.49 |
| getpixmap returns the pixmap in whatever the native image colorspace is. | 15:42.16 |
kens | What do you do about shadings ? | 15:42.17 |
Robin_Watts | So the device converts to rgb. | 15:42.21 |
| kens: We check the colors at the vertexes. | 15:42.31 |
tor8 | kens: or we check the lookup function for parameterized shadings | 15:42.43 |
kens | OK seems reasonable, if the vertexes are gray the whole shading is | 15:42.51 |
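The reasoning behind the vertex check can be shown in a few lines. This sketch assumes colors are RGB triples blended linearly per channel, which is an assumption of the example and not a statement about how MuPDF represents shadings; all names are hypothetical.

```python
def is_gray(color):
    """A color is gray when all three channels are equal."""
    r, g, b = color
    return r == g == b


def shading_is_gray(vertex_colors):
    """If every vertex color is gray, a per-channel linear blend of them
    is gray everywhere, so only the vertices need checking."""
    return all(is_gray(c) for c in vertex_colors)


def lerp(c0, c1, t):
    """Linear blend of two RGB colors, per channel."""
    return tuple((1 - t) * a + t * b for a, b in zip(c0, c1))
```

Because each channel of a gray vertex carries the same value, `lerp` performs the identical computation on all three channels, so the blended color is gray too.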
Robin_Watts | The code for shadings is admirably small cos of the abstraction we did for drawing them. It fell out very nicely. | 15:42.56 |
chrisl | TBH, for this customer, I reckon it's not worth a lot of extra effort..... | 15:43.29 |
rayjj | chrisl: probably the case. HCL's customer is not one of our top-ten | 15:44.43 |
Robin_Watts | Let's wait and see what they say. | 15:44.51 |
rayjj | Robin_Watts: agreed. | 15:45.02 |
Robin_Watts | If they come back with a valid reason why they need it faster, we can think about it some more. | 15:45.04 |
| Like "well, tool X does it in only 2 minutes" or something. | 15:45.16 |
tor8 | Robin_Watts: then we'll say "so why don't you just use tool X?" | 15:45.39 |
rayjj | Robin_Watts: do you want me to benchmark the 3001 page file with gs vs. mudraw (or did you already) ? | 15:45.52 |
chrisl | I have a feeling that whatever we do, they'll want more, and they'll want it faster - and finished yesterday | 15:45.54 |
Robin_Watts | tor8: Ahem, the idea with customers is to get them to pay *us*, not someone else :) | 15:46.01 |
tor8 | Robin_Watts: not if it means we actually have to work for it ;) | 15:46.20 |
Robin_Watts | rayjj: I did not do that. | 15:46.28 |
rayjj | chrisl: of course HCL will -- that way they can look like heroes to the actual customer (and we don't get any credit) | 15:46.30 |
Robin_Watts | Possibly we should say "there are further speedups possible, but that would require an NRE agreement" | 15:47.07 |
rayjj | now that Nori's not there, working directly with that customer would probably be much more trouble than it's worth | 15:47.12 |
Robin_Watts | which will shut them up nicely. | 15:47.22 |
henrys | rayjj: we should be able to do it much faster in gs since it isn't decoding the entire image. | 15:47.28 |
Robin_Watts | but gs is writing the clist. | 15:47.40 |
| so it does decode the whole image, I think | 15:47.51 |
henrys | why would we need a clist? | 15:48.01 |
Robin_Watts | and scales and plots it. | 15:48.03 |
rayjj | henrys: actually, gs does decode the entire image (as currently implemented) since it writes the entire clist for the page before we look at the color/gray status | 15:48.19 |
kens | Could you restrict Mudraw to a subset of the pages, and run multiple instances ? | 15:48.21 |
Robin_Watts | That'sJustTheWayItWorks(TM) | 15:48.24 |
henrys | rayjj: even with nullpage? | 15:48.35 |
Robin_Watts | kens: sure. | 15:48.41 |
rayjj | henrys: it's just how it's implemented -- it was done for cust 801 and they always wanted a clist anyway | 15:48.50 |
kens | Might be a valid solution for them, then | 15:48.54 |
Robin_Watts | kens: clue.required > 0 for that solution though. | 15:49.10 |
kens | :-D | 15:49.17 |
rayjj | henrys: what is needed is an actual 'detect' device that is more like mudraw's (akin to bbox) | 15:49.21 |
kens | We could buy a big stick..... | 15:49.25 |
rayjj | henrys: the bbox device gets partial info, but the detection for images was done (by mvrhel) in gxclimag.c | 15:50.20 |
henrys | rayjj: yeah i'm thinking if we start from the tracing devices that would be a nice solution. | 15:50.21 |
rayjj | henrys: starting from bbox is even nicer | 15:50.43 |
| at least, if I were to do it, that's what I would start with | 15:51.14 |
Robin_Watts | henrys: Making mupdf do this with the alternate image handling would be easier than getting gs to do it, and would be faster in the long term, I think. | 15:51.17 |
| It's 1-2 days work. | 15:51.32 |
chrisl | We can't really bale out of the image in Ghostscript either..... otherwise we could end up trying to interpret the image binary | 15:52.23 |
rayjj | Robin_Watts: that's probably what the 'graydetect' device in gs would be, given what's already implemented for everything except images. | 15:52.26 |
| Robin_Watts: and the plumbing to have gs skip out once color is detected is quite easy | 15:52.59 |
Robin_Watts | You're more hopeful than me about the time taken to actually get stuff done properly in gs. | 15:53.20 |
rayjj | gs is nice because skipping all the text is easy | 15:53.26 |
| Robin_Watts: did I say "properly" (not that such a thing exists in most of gs) | 15:53.54 |
| I have to head to a doctor's appt. | 15:54.26 |
| bbiaw | 15:54.32 |
Robin_Watts_ | They are busily installing fibre in the village today. | 15:54.39 |
| woo hoo! | 15:54.45 |
rayjj | I'll look at the performance issue with NRT while waiting... | 15:54.58 |
henrys | Robin_Watts: well there is a .05% chance they won't respond to your last email; let's pray for that. | 15:54.59 |
| rayjj: the board is so much more important than that. | 15:55.25 |
chrisl | rayjj: yes, take the dev board to the doc's with you! :-) | 15:56.02 |
Robin_Watts_ | rayjj: mupdf already skips text. | 15:56.08 |
henrys | chrisl: right he needs to have that board in his pocket at all times. | 15:56.41 |
rayjj | chrisl: henrys: where is the 25pages.pcl file for norbert ? (the one that has 47 pages) | 15:56.47 |
chrisl | Probably on peeves..... | 15:57.03 |
rayjj | henrys: I can't work on the dev board at the doctor | 15:57.18 |
| henrys: if only it was battery powered ... | 15:57.43 |
henrys | rayjj: you shouldn't be working on anything at the doctor's; read a damn magazine | 15:57.46 |
chrisl | rayjj: on peeves: /home/norbert/20140721_profiledata/25pages.pcl | 15:58.33 |
rayjj | henrys: yeah, right. People or Us or O -- I'd rather have needles stuck in me (oh, right, that might happen anyway) | 15:58.34 |
| chrisl: thanks. | 15:58.39 |
henrys | coffees | 15:58.41 |
chrisl | And I don't think a color detection device in gs would buy us anything over the mupdf one..... | 15:59.17 |
rayjj | chrisl: no, I doubt it as well | 15:59.40 |
chrisl | rayjj: like I said, we can't bale out of the image early in gs either.... | 15:59.57 |
rayjj | as long as mupdf can bail early on images (gs handles things more as scanlines) | 16:00.10 |
chrisl | rayjj: but we still have to consume all the scanlines | 16:00.31 |
henrys | chrisl: no you don't. | 16:00.51 |
kens | We certainly do. PostScript or PDF we will need to consume all the input bytes | 16:00.56 |
| Otherwise as Chris says the PS interpreter will be trying to consume the binary compressed stream | 16:01.21 |
henrys | pxl doesn't; there is nothing in the filter that imposes that | 16:01.22 |
chrisl | henrys: Postscript in particular we don't know the size of the compressed data.... | 16:01.44 |
kens | PostScript does, and the PDF interpreter is in PostScript, so.... | 16:01.45 |
henrys | anyway this is never going to be fast unless you can decompress a block at a time and bale early. | 16:02.57 |
kens | Large gray images masquerading as colour will always be slow since they must be completely checked. | 16:03.35 |
henrys | I'm surprised about PDF and ps though that seems out of sync with how the filter code usually operates, but I haven't looked at it in years. | 16:04.13 |
kens | I still think that if a multi-thousand page file processes slowly then the answer is to break the problem up and have multiple instances process the file. Which MuPDF is better suited for anyway | 16:04.18 |
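A rough sketch of kens's suggestion, splitting the page range so that several instances can each process a slice. The function name is hypothetical, and how each range is handed to a process is deliberately left out, since the log does not give a command line.

```python
def split_page_ranges(page_count, workers):
    """Split pages 1..page_count into at most `workers` contiguous ranges."""
    if page_count < 1 or workers < 1:
        raise ValueError("page_count and workers must be positive")
    workers = min(workers, page_count)
    base, extra = divmod(page_count, workers)
    ranges = []
    start = 1
    for i in range(workers):
        # The first `extra` ranges take one additional page each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# For the 3001-page file mentioned in the log, split four ways:
# [(1, 751), (752, 1501), (1502, 2251), (2252, 3001)]
print(split_page_ranges(3001, 4))
```

Since the color flag is wanted per page, the slices are independent and the per-page results can simply be concatenated afterwards.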
chrisl | I can't remember how pxl specifies image dimensions etc - does it include the length of the compressed stream? (and can it be trusted)? | 16:04.23 |
kens | henrys, it's not the filter code. The problem is that if you stop consuming data from the input stream before exhausting all the data, then the interpreter starts interpreting the remaining data as PostScript. | 16:04.56 |
| And you can't know in advance how much data there is in a PostScript image. | 16:05.18 |
| Well, you know how much there is in the decompressed image of course. | 16:05.33 |
chrisl | Or in a PDF, since we can't and don't trust the compressed stream length! | 16:05.58 |
kens | Yes indeed, because it can be wrong and we're expected to fix it. | 16:06.15 |
chrisl | In PDF you can flush to the endstream/endobj, though | 16:06.37 |
kens | And trying to have the PDF interpreter break out of an image part way makes my head hurt | 16:06.40 |
henrys | anyway I'm sure doing anything in gs will be an order of magnitude harder than mupdf so we don't need to worry about it. | 16:06.44 |
| if the customer is interested in XL color detection then ... ;-) | 16:07.10 |
chrisl | Hmm, a gs device that only works with one interpreter - as if we didn't have enough trouble! | 16:07.39 |
rayjj | chrisl: it's easy to bail out with gs | 16:08.30 |
chrisl | rayjj: no it isn't | 16:08.46 |
kens | rayjj how would you avoid an error ? | 16:08.50 |
chrisl | Well, it is easy to bail out, but you can't complete the job..... | 16:09.18 |
kens | Which works for 'document' colour detection, but not individual pages | 16:09.59 |
chrisl | Yeh | 16:10.09 |
henrys | yes pages make the problem much harder. | 16:10.33 |
chrisl | Hmm, I don't see how pxl can abort the image either..... | 16:12.02 |
| pxl doesn't seem to have advance notice of the compressed data size, either, so it seems to me it will run into the same problem Postscript will | 16:14.07 |
kens | Goodnight all | 16:24.58 |
henrys | chrisl: there's a datalength operator | 16:30.01 |
| chrisl: does acrobat/adobe pdf continue after a jpeg error? | 16:31.16 |
chrisl | henrys: yes, sometimes acrobat carries on | 16:34.57 |
| Like I said, in PDF you have the endstream string you can search for (although, that's been known to be missing, too) | 16:35.57 |
henrys | chrisl: right, I imagine you could easily trick the XL code it just hasn't been beaten up as much as gs | 16:37.14 |
chrisl | henrys: it still leaves Postscript, where we have no data length, and no sure end of data "tag" | 16:38.18 |
henrys | chrisl: sure but obviously the customer doesn't care about postscript - well I hope they don't ... did anyone ask. | 16:39.37 |
| ? | 16:39.39 |
| It's not going to be an easier problem to solve in mupdf in ps is needed ;-) | 16:40.20 |
| s/in/if | 16:40.34 |
chrisl | henrys: No, I'm pretty sure it's only PDF - but it means the "early abort" for images isn't really viable in Ghostscript at all | 16:40.47 |
| We'd run into problems trying to implement it even for PDF in the current Ghostscript scheme, and I'd be reluctant to devote a lot of time to a device that wouldn't work with Postscript | 16:42.03 |
Robin_Watts | henrys: Sure it is. We run it through gs with pdfwrite, then call mupdf :) | 16:43.40 |
| hi fredross-perry. | 19:36.35 |
| I'm about to disappear for the night. Anything I can help you with before I go? | 19:37.10 |
fredross-perry | Hi Robin. No thanks, Michael is giving me good help. Cheers. | 20:07.24 |
| hi michael, calling you now... | 20:08.15 |
rayjj | chrisl_away: kens: (for the logs) gs can "recover" from an error thrown during processing a PDF. Here's an example: | 20:19.47 |
| gswin32c -r90 | 20:19.49 |
| at the GS> prompt, enter: | 20:19.50 |
| (tests_private/comparefiles/Bug689450.pdf) (r) file runpdfbegin 1 1 dopdfpages | 20:19.52 |
| -- Error is thrown: Error: /typecheck in / | 20:19.53 |
| then at the GS<5> prompt, enter: | 20:19.55 |
| clear 2 2 dopdfpages | 20:19.56 |
| This completes normally and thus continues after the error. This mechanism _could_ be used to "examine" PDFs for pages with color vs. monochrome if the "device" throws an error when color is detected from the gray detection logic (probably, maybe, if, ...) | 20:19.58 |
| oops -- that gs invocation was actually: gswin32c -dPDFSTOPONERROR -r90 | 20:23.44 |
rayjj | wishes that IRC had the skype "go back and correct" feature (but not just 1 message) | 20:24.55 |
| note, that feature would save Robin (or marcos or I) a lot of editing when someone (henrys) mentions a customer by name ;-) | 20:28.17 |
Robin_Watts | Muhaha. | 23:44.06 |
| I have 3001Pages.pdf running through mupdf in 18 seconds now :) | 23:44.21 |
| and it's a small patch too. I love it when abstraction pays off. | 23:50.58 |
| Patch on robin/master for tor to look at in the morning. | 23:51.07 |
rayjj | Robin_Watts: what was it before the patch ? | 23:54.16 |
rayjj | vaguely recalls 30+ sec | 23:54.41 |
| Robin_Watts: and does it still get the correct result ?? ;-) | 23:55.57 |