Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

	2014/09/03
mvrhel_laptop	bbiaw	00:14.56
Toppi	hi	06:34.55
ghostbot	Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.	06:34.55
Toppi	Is there any plan to support stylus pen(android) on Mupdf?	06:36.11
chrisl	Toppi: you'll probably have to wait a few hours - the main mupdf devs probably won't be here for 2-3 hours yet.	06:38.05
Toppi	OK	06:38.20
chrisl	And I don't know enough about mupdf or Android to know what supporting "stylus pen" would involve......	06:39.02
Toppi	A stylus pen returns (x,y,pressure) instead of just the (x,y) of the touch point; the pressure can be used to determine line thickness.	06:41.25
chrisl	As mupdf is primarily for viewing rather than editing, I'm not sure just how much utility there would be in that - but as I said, it's not my area	06:43.18
callgovind	hi all	07:55.51
	can anyone tell me how mupdf text search works?	07:56.21
	?	07:57.41
	anyone?	07:59.33
kens	what ?	07:59.52
chrisl	callgovind: it's not really my area, but I think it extracts the text from the PDF and searches it......	08:01.47
	off for a couple of hours.....	08:12.19
kens	bb	08:12.23
norbertj	chrisl: hi, do you know if anyone is picking up on my 695374 perf problem? Especially my question in comment #20. I have the impression that the rendering threads are not running in parallel. Is Ray the person to check on this?	11:15.57
chrisl	norbertj: yes, Ray probably needs to weigh in on it - it might have gone under his radar because it's a PCL issue.....	11:18.17
norbertj	chrisl: when will he be around? He'll see the logs, so I think he will then follow up?	11:19.09
chrisl	norbertj: he's in SoCal so it will be later this afternoon - I'll make sure he's aware	11:19.45
norbertj	chrisl: thanks	11:19.59
chrisl	norbertj: we might want to create a new bug, as the original issue in 695374 seems to have been left behind	11:21.52
norbertj	chrisl: ok I'll make a new one regarding NumRenderingThreads etc.	11:22.30
chrisl	norbertj: or we can do it if henrys and/or Ray feel it's necessary, and let you know	11:23.11
norbertj	chrisl: fine with me.	11:23.31
chrisl	norbertj: okay, I'll talk it over with them - it's just, personally, I find it hard to keep track when bug threads head off at a tangent	11:24.07
norbertj	chrisl: I know, same here ;)	11:24.32
rayjj	norbertj: (for the logs) I haven't had a chance to look into the NRT>0 performance issue yet. It is true that contention and mutex locking _can_ degrade things, particularly on Windoze, which seems particularly bad at mutexes	13:13.56
	norbertj: This seems to be best instantiated under bug 694750. This is listed as a P1 enhancement.	13:16.37
chrisl	rayjj: thanks for spotting that - but sorry it got dumped on you!	13:42.12
norbertj	rayjj: thanks, I will follow 694750. Could you add 661 to the customer ids?	14:30.50
henrys	Robin_Watts: did you say you'd been to Morocco? We're thinking of adding a warm leg onto the December trip - considering options.	14:34.40
Robin_Watts	henrys: Nope, never done North Africa.	14:34.55
	but Helen did a day trip there from Gibraltar.	14:35.08
	and we have friends who did a longer holiday there.	14:35.16
henrys	a friend of mine did a tour of the sahara with camels starting there. They really enjoyed it.	14:36.05
	seemed like something you might have done ;-)	14:36.36
Robin_Watts	Morocco seems one of the safer North African destinations at the moment.	14:37.18
norbertj	chrisl: thanks	14:39.51
chrisl	norbertj: np!	14:40.28
rayjj	chrisl: (norbertj for the logs) I updated bug 694750 with the relevant info and changed this to "normal" so I will at least investigate it. If it turns out to be something that is not easily addressed, we may revert it to enhancement	15:12.31
	chrisl: but at least my analysis should allow us to better understand when, and when not, to use NRT > 0	15:13.16
chrisl	rayjj: thanks, I was just worried about the original bug from norbert spiralling off to areas not really related to that bug report	15:15.21
henrys	Robin_Watts: certainly not taking sides with the customer but that is pokey right? That looks more like a high resolution rendering time than a check for color, around 200 ppm.	15:23.33
kens	200 ppm seems reasonable to me. You have to interpret the file, decompress images, examine every image sample etc.	15:24.34
	rendering 200 ppm at high resolution would be pretty quick....	15:25.05
Robin_Watts	henrys: yeah. For color ones we can skip the image decompression as soon as we find the first color pixel/object.	15:25.07
henrys	kens: I guess it depends on the resolution of the images, this is done in source space	15:25.07
Robin_Watts	For greyscale we need to decompress everything.	15:25.22
kens	Certainly it will depend heavily on the image data.	15:25.41
	Big images are obviously going to take longer, big images in JP2k even longer	15:26.01
Robin_Watts	Images are jpegs at 2560x3840.	15:26.19
kens	So, pretty big then.	15:26.28
tor8	Robin_Watts: properly grayscale images shouldn't be decompressed at all, only grayscale images masquerading as color will have the worst case behaviour	15:27.15
Robin_Watts	tor8: Right.	15:27.26
kens	Sure, but it means you have to decompress all colour images	15:27.37
Robin_Watts	I think the slow pages are ones with a single huge color image on.	15:27.50
kens	you're talking 10 million samples for those images, each of which has to be individually checked, and compared against a threshold.	15:27.59
henrys	okay I didn't expect images that large.	15:28.23
tor8	kens: yeah. but once we find the first color pixel we stop looking. still, we have to start over on each page since we flag it per page, not per file but that would be an easy change if that's what they want	15:28.38
Robin_Watts	No, they want per page.	15:28.50
kens	tor8 yes, but the worst case is that the images are grayscale, so you have to check every pixel	15:28.58
	I'm guessing that's the case here	15:29.29
Robin_Watts	kens: No, the slow pages are single huge color images.	15:29.41
	And the time is in the decode of the image, not the checking, I'd guess.	15:29.55
	We could decode images subsampled, which would be faster.	15:30.12
kens	Hmm, well that is still not unlikely, JPEG is slow	15:30.12
tor8	we still decompress the whole image, but we stop looking at the pixels and comparing colors (and skip the rest of the page) once we see it's not grayscale	15:30.13
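The early-exit check described above can be sketched in a few lines. This is a minimal illustration only, assuming pixels arrive as already-decoded (r, g, b) tuples; the function names and the `threshold` parameter are hypothetical and are not taken from the MuPDF source.

```python
def is_gray(r, g, b, threshold=0):
    """A pixel counts as gray when its channels differ by at most threshold."""
    return max(r, g, b) - min(r, g, b) <= threshold


def page_has_color(images, threshold=0):
    """Return True as soon as any pixel on the page is not gray.

    images is an iterable of decoded images, each an iterable of (r, g, b)
    tuples. The decode cost has already been paid at this point; only the
    comparison loop is cut short.
    """
    for pixels in images:
        for r, g, b in pixels:
            if not is_gray(r, g, b, threshold):
                return True  # stop looking and skip the rest of the page
    return False


def classify_pages(pages, threshold=0):
    """Flag each page separately, restarting the search on every page."""
    return [page_has_color(images, threshold) for images in pages]
```

The worst case is visible in the loop: an image whose pixels are all gray never triggers the early return, so every sample is compared.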
henrys	are you decoding a scanline or the entire thing?	15:30.26
Robin_Watts	whole thing.	15:30.32
kens	You can't decode a scan line in JPEG	15:30.38
tor8	decompressing it still needs to be done, though. and our interfaces are not complex enough to decode a scanline at a time.	15:30.58
Robin_Watts	kens: You can decode at macroblock level, most of the time.	15:31.03
kens	Yes, but that's still a number of scans	15:31.14
Robin_Watts	If we cared, we could extend the image interface to do this faster.	15:31.32
	but it's a non-trivial amount of work.	15:31.39
	unlike the device itself which has been pretty simple.	15:31.53
kens	How long does GS take to do the same job ? Is there some other tool which does it significantly faster ?	15:32.00
*kens*	suspects not	15:32.08
tor8	Robin_Watts: the only easy performance increase I see possible now is keeping a list of images we've already checked	15:33.16
	and even that is likely to double the amount of code used by the device, and probably not to any significant benefit	15:33.36
kens	That seems unlikely to help with this file, presumably each page is unique	15:33.47
tor8	huge full page graphics that take a lot of time are unlikely to be shared	15:33.49
Robin_Watts	yeah.	15:33.51
chrisl	If they actually knew enough to know what they want, I reckon they'd want the entire source image checked, and not the subsampled image	15:34.10
Robin_Watts	chrisl: indeed.	15:34.29
henrys	kens: it isn't scanline at a time but there is a scanline buffer in the jpeg stuff and it is not necessary to decode the entire image to fill that buffer ...	15:35.51
	kens: in the gs jpeg filter	15:36.07
kens	No, but you can't decode a single scan line from a jpeg file	15:36.09
chrisl	This is JPX, isn't it?	15:36.23
henrys	yes I understand that.	15:36.26
Robin_Watts	No, not JPX.	15:36.31
chrisl	Oh, okay.....	15:36.38
Robin_Watts	If we were working at the pdf stream level, then yes, we could bale out without having to decode the whole thing.	15:36.45
	But we're not.	15:36.47
kens	JPX would be even worse	15:36.48
Robin_Watts	We're working at the fz_device level.	15:36.53
	The fz_device gets called with fz_images.	15:36.59
	and the simple thing to do with an fz_image is to call its 'getpixmap' function.	15:37.31
	Then you check the pixmap.	15:37.38
chrisl	You could give them the option of just checking the colour space.....	15:37.46
Robin_Watts	That's what we started with :)	15:37.58
henrys	decoding blocks would seem to give a really good speedup with that size image but I don't know if we want to bother for this customer.	15:37.58
kens	Then they complain that 'it's wrong'	15:37.59
Robin_Watts	They already complained it was wrong.	15:38.08
	We could call the 'getsource' member and get a compressed buffer and a format back.	15:38.21
	And then do a special check for formats we understand.	15:38.40
chrisl	So the device could decode the image any way it saw fit?	15:38.52
Robin_Watts	The device could see if the fz_image had the original compressed data available, and whether it was in a format that it understands.	15:40.04
	If not, it would drop back to the current code.	15:40.10
tor8	chrisl: yeah. the fz_image is just a wrapper around a compressed buffer with some info about its format.	15:40.16
	and a convenient getpixmap function	15:40.29
Robin_Watts	More exactly the fz_image is an abstraction to encapsulate images; most of the time they have a compressed buffer with the original data, but not always.	15:40.56
	Sometimes you'll get fz_images that just wrap bitmaps.	15:41.19
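The fallback scheme discussed above (use the compressed source when it is present and in a known format, otherwise decode to a pixmap) might look roughly like this. It is a sketch under stated assumptions: the `Image` class, its fields, and the `fast_checkers` table are hypothetical stand-ins and do not reproduce the real fz_image interface.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Image:
    """Hypothetical stand-in for an image handed to the device."""
    get_pixmap: Callable[[], list]        # always available: full decode to RGB tuples
    source_format: Optional[str] = None   # e.g. "jpeg"; None for a bare bitmap
    source_data: Optional[bytes] = None   # original compressed bytes, if kept


def pixels_have_color(pixels):
    """Slow path: inspect every decoded pixel until a non-gray one appears."""
    return any(not (r == g == b) for r, g, b in pixels)


def image_has_color(image, fast_checkers):
    """Prefer a format-specific checker; otherwise drop back to a full decode.

    fast_checkers maps a format name to a function that takes the compressed
    bytes and returns True when the image contains colour.
    """
    checker = fast_checkers.get(image.source_format)
    if checker is not None and image.source_data is not None:
        return checker(image.source_data)
    return pixels_have_color(image.get_pixmap())
```

Keeping the slow path as the default means an image that only wraps a bitmap, or one in a format with no registered checker, still gets a correct answer.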
chrisl	Presumably that would also mean handling multiple colour spaces	15:41.32
	Rather than just RGB	15:41.41
Robin_Watts	fz_image has a colorspace field in it.	15:41.49
	getpixmap returns the pixmap in whatever the native image colorspace is.	15:42.16
kens	What do you do about shadings ?	15:42.17
Robin_Watts	So the device converts to rgb.	15:42.21
	kens: We check the colors at the vertexes.	15:42.31
tor8	kens: or we check the lookup function for parameterized shadings	15:42.43
kens	OK seems reasonable, if the vertexes are gray the whole shading is	15:42.51
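The vertex argument can be illustrated with a short sketch, assuming vertex colours have already been converted to RGB and that the shading interpolates linearly per channel. The function names are hypothetical. Under that assumption, any blend of gray vertexes has equal channels, so checking the vertexes is enough.

```python
def shading_is_gray(vertex_colors, threshold=0):
    """True when every vertex colour (r, g, b) is gray within threshold."""
    return all(max(c) - min(c) <= threshold for c in vertex_colors)


def lerp_color(a, b, t):
    """Linear per-channel interpolation between two RGB colours."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))
```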
Robin_Watts	The code for shadings is admirably small cos of the abstraction we did for drawing them. It fell out very nicely.	15:42.56
chrisl	TBH, for this customer, I reckon it's not worth a lot of extra effort.....	15:43.29
rayjj	chrisl: probably the case. HCL's customer is not one of our top-ten	15:44.43
Robin_Watts	Let's wait and see what they say.	15:44.51
rayjj	Robin_Watts: agreed.	15:45.02
Robin_Watts	If they come back with a valid reason why they need it faster, we can think about it some more.	15:45.04
	Like "well, tool X does it in only 2 minutes" or something.	15:45.16
tor8	Robin_Watts: then we'll say "so why don't you just use tool X?"	15:45.39
rayjj	Robin_Watts: do you want me to benchmark the 3001 page file with gs vs. mudraw (or did you already) ?	15:45.52
chrisl	I have a feeling that whatever we do, they'll want more, and they'll want it faster - and finished yesterday	15:45.54
Robin_Watts	tor8: Ahem, the idea with customers is to get them to pay us, not someone else :)	15:46.01
tor8	Robin_Watts: not if it means we actually have to work for it ;)	15:46.20
Robin_Watts	rayjj: I did not do that.	15:46.28
rayjj	chrisl: of course HCL will -- that way they can look like heroes to the actual customer (and we don't get any credit)	15:46.30
Robin_Watts	Possibly we should say "there are further speedups possible, but that would require an NRE agreement"	15:47.07
rayjj	now that Nori's not there, working directly with that customer would probably be much more trouble than it's worth	15:47.12
Robin_Watts	which will shut them up nicely.	15:47.22
henrys	rayjj: we should be able to do it much faster in gs since it isn't decoding the entire image.	15:47.28
Robin_Watts	but gs is writing the clist.	15:47.40
	so it does decode the whole image, I think	15:47.51
henrys	why would we need a clist?	15:48.01
Robin_Watts	and scales and plots it.	15:48.03
rayjj	henrys: actually, gs does decode the entire image (as currently implemented) since it writes the entire clist for the page before we look at the color/gray status	15:48.19
kens	Could you restrict Mudraw to a subset of the pages, and run multiple instances ?	15:48.21
Robin_Watts	That'sJustTheWayItWorks(TM)	15:48.24
henrys	rayjj: even with nullpage?	15:48.35
Robin_Watts	kens: sure.	15:48.41
rayjj	henrys: it's just how it's implemented -- it was done for cust 801 and they always wanted a clist anyway	15:48.50
kens	Might be a valid solution for them, then	15:48.54
Robin_Watts	kens: clue.required > 0 for that solution though.	15:49.10
kens	:-D	15:49.17
rayjj	henrys: what is needed is an actual 'detect' device that is more like mudraw's (akin to bbox)	15:49.21
kens	We could buy a big stick.....	15:49.25
rayjj	henrys: the bbox device gets partial info, but the detection for images was done (by mvrhel) in gxclimag.c	15:50.20
henrys	rayjj: yeah i'm thinking if we start from the tracing devices that would be a nice solution.	15:50.21
rayjj	henrys: starting from bbox is even nicer	15:50.43
	at least, if I were to do it, that's what I would start with	15:51.14
Robin_Watts	henrys: Making mupdf do this with the alternate image handling would be easier than getting gs to do it, and would be faster in the long term, I think.	15:51.17
	It's 1-2 days work.	15:51.32
chrisl	We can't really bale out of the image in Ghostscript either..... otherwise we could end up trying to interpret the image binary	15:52.23
rayjj	Robin_Watts: that's probably what the 'graydetect' device in gs would be, given what's already implemented for everything except images.	15:52.26
	Robin_Watts: and the plumbing to have gs skip out once color is detected is quite easy	15:52.59
Robin_Watts	You're more hopeful than me about the time taken to actually get stuff done properly in gs.	15:53.20
rayjj	gs is nice because skipping all the text is easy	15:53.26
	Robin_Watts: did I say "properly" (not that such a thing exists in most of gs)	15:53.54
	I have to head to a doctor's appt.	15:54.26
	bbiaw	15:54.32
Robin_Watts_	They are busily installing fibre in the village today.	15:54.39
	woo hoo!	15:54.45
rayjj	I'll look at the performance issue with NRT while waiting...	15:54.58
henrys	Robin_Watts: well there is a .05% chance they won't respond to your last email, let's pray for that.	15:54.59
	rayjj: the board is so much more important than that.	15:55.25
chrisl	rayjj: yes, the dev board to the doc's with you! :-)	15:56.02
Robin_Watts_	rayjj: mupdf already skips text.	15:56.08
henrys	chrisl: right he needs to have that board in his pocket at all times.	15:56.41
rayjj	chrisl: henrys: where is the 25pages.pcl file for norbert ? (the one that has 47 pages)	15:56.47
chrisl	Probably on peeves.....	15:57.03
rayjj	henrys: I can't work on the dev board at the doctor	15:57.18
	henrys: if only it was battery powered ...	15:57.43
henrys	rayjj: you shouldn't be working on anything at the doctors read a damn magazine	15:57.46
chrisl	rayjj: on peeves: /home/norbert/20140721_profiledata/25pages.pcl	15:58.33
rayjj	henrys: yeah, right. People or Us or O -- I'd rather have needles stuck in me (oh, right, that might happen anyway)	15:58.34
	chrisl: thanks.	15:58.39
*henrys*	coffees	15:58.41
chrisl	And I don't think a color detection device in gs would buy us anything over the mupdf one.....	15:59.17
rayjj	chrisl: no, I doubt it as well	15:59.40
chrisl	rayjj: like I said, we can't bale out of the image early in gs either....	15:59.57
rayjj	as long as mupdf can bail early on images (gs handles things more as scanlines)	16:00.10
chrisl	rayjj: but we still have to consume all the scanlines	16:00.31
henrys	chrisl: no you don't.	16:00.51
kens	We certainly do. PostScript or PDF we will need to consume all the input bytes	16:00.56
	Otherwise as Chris says the PS interpreter will be trying to consume the binary compressed stream	16:01.21
henrys	pxl doesn't; there is nothing in the filter that imposes that	16:01.22
chrisl	henrys: Postscript in particular we don't know the size of the compressed data....	16:01.44
kens	PostScript does, and the PDF interpreter is in PostScript, so....	16:01.45
henrys	anyway this is never going to be fast unless you can decompress a block at a time and bale early.	16:02.57
kens	Large gray images masquerading as colour will always be slow since they must be completely checked.	16:03.35
henrys	I'm surprised about PDF and ps though that seems out of sync with how the filter code usually operates, but I haven't looked at it in years.	16:04.13
kens	I still think that if a multi-thousand page file processes slowly then the answer is to break the problem up and have multiple instances process the file. Which MuPDF is better suited for anyway	16:04.18
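Breaking a large file up between several instances first needs a set of page ranges. A minimal sketch of that split follows; the function name is hypothetical, and how each range is then passed to a separate process is left out because it depends on the tool being run.

```python
def split_page_ranges(total_pages, instances):
    """Return 1-based inclusive (first, last) ranges covering all pages.

    Range sizes differ by at most one page, and no more ranges are produced
    than there are pages.
    """
    if total_pages < 1 or instances < 1:
        raise ValueError("total_pages and instances must both be at least 1")
    instances = min(instances, total_pages)
    base, extra = divmod(total_pages, instances)
    ranges = []
    first = 1
    for i in range(instances):
        size = base + (1 if i < extra else 0)  # first `extra` ranges get one more page
        ranges.append((first, first + size - 1))
        first += size
    return ranges
```

Because the result is reported per page, the per-range outputs can simply be concatenated in range order.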
chrisl	I can't remember how pxl specifies image dimensions etc - does it include the length of the compressed stream? (and can it be trusted)?	16:04.23
kens	henrys, it's not the filter code. The problem is that if you stop consuming data from the input stream, before exhausting all the data, then the interpreter starts interpreting the remaining data as PostScript.	16:04.56
	And you can't know in advance how much data there is in a PostScript image.	16:05.18
	Well, you know how much there is in the decompressed image of course.	16:05.33
chrisl	Or in a PDF, since we can't and don't trust the compressed stream length!	16:05.58
kens	Yes indeed, because it can be wrong and we're expected to fix it.	16:06.15
chrisl	In PDF you can flush to the endstream/endobj, though	16:06.37
kens	And trying to have the PDF interpreter break out of an image part way makes my head hurt	16:06.40
henrys	anyway I'm sure doing anything in gs will be order of magnitude harder than mupdf so we don't need to worry about it.	16:06.44
	if the customer is interested in XL color detection then ... ;-)	16:07.10
chrisl	Hmm, a gs device that only works with one interpreter - as if we didn't have enough trouble!	16:07.39
rayjj	chrisl: it's easy to bail out with gs	16:08.30
chrisl	rayjj: no it isn't	16:08.46
kens	rayjj how would you avoid an error ?	16:08.50
chrisl	Well, it is easy to bail out, but you can't complete the job.....	16:09.18
kens	Which works for 'document' colour detection, but not individual pages	16:09.59
chrisl	Yeh	16:10.09
henrys	yes pages make the problem much harder.	16:10.33
chrisl	Hmm, I don't see how pxl can abort the image either.....	16:12.02
	pxl doesn't seem to have advance notice of the compressed data size, either, so it seems to me it will run into the same problem Postscript will	16:14.07
kens	Goodnight all	16:24.58
henrys	chrisl: there's a datalength operator	16:30.01
	chrisl: does acrobat/adobe pdf continue after a jpeg error?	16:31.16
chrisl	henrys: yes, sometimes acrobat carries on	16:34.57
	Like I said, in PDF you have the endstream string you can search for (although, that's been known to be missing, too)	16:35.57
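Searching for the endstream keyword instead of trusting the declared length can be sketched as below. This is an illustration only, with a hypothetical function name: a robust parser would also have to cope with the keyword appearing by chance inside binary data and, as noted above, with the keyword being missing altogether.

```python
ENDSTREAM = b"endstream"


def skip_to_endstream(data, start):
    """Return the offset just past the next 'endstream' at or after start.

    Returns -1 when the keyword cannot be found, so the caller can decide
    how to recover from a damaged file.
    """
    pos = data.find(ENDSTREAM, start)
    if pos == -1:
        return -1
    return pos + len(ENDSTREAM)
```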
henrys	chrisl: right, I imagine you could easily trick the XL code it just hasn't been beaten up as much as gs	16:37.14
chrisl	henrys: it still leaves Postscript, where we have no data length, and no sure end of data "tag"	16:38.18
henrys	chrisl: sure but obviously the customer doesn't care about postscript - well I hope they don't ... did anyone ask.	16:39.37
	?	16:39.39
	It's not going to be an easier problem to solve in mupdf in ps is needed ;-)	16:40.20
	s/in/if	16:40.34
chrisl	henrys: No, I'm pretty sure it's only PDF - but it means the "early abort" for images isn't really viable in Ghostscript at all	16:40.47
	We'd run into problems trying to implement it even for PDF in the current Ghostscript scheme, and I'd be reticent to devote a lot of time to a device that wouldn't work with Postscript	16:42.03
Robin_Watts	henrys: Sure it is. We run it through gs with pdfwrite, then call mupdf :)	16:43.40
	hi fredross-perry.	19:36.35
	I'm about to disappear for the night. Anything I can help you with before I go?	19:37.10
fredross-perry	Hi Robin. No thanks, Michael is giving me good help. Cheers.	20:07.24
	hi michael, calling you now...	20:08.15
rayjj	chrisl_away: kens: (for the logs) gs can "recover" from an error thrown during processing a PDF. Here's an example:	20:19.47
	gswin32c -r90	20:19.49
	at the GS> prompt, enter:	20:19.50
	(tests_private/comparefiles/Bug689450.pdf) (r) file runpdfbegin 1 1 dopdfpages	20:19.52
	-- Error is thrown: Error: /typecheck in /	20:19.53
	then at the GS<5> prompt, enter:	20:19.55
	clear 2 2 dopdfpages	20:19.56
	This completes normally and thus continues after the error. This mechanism _could_ be used to "examine" PDF's for pages with color vs. monochrome if the "device" throws an error when color is detected from the gray detection logic (probably, maybe, if, ...)	20:19.58
	oops -- that gs invocation was actually: gswin32c -dPDFSTOPONERROR -r90	20:23.44
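The control flow of that idea, where the device throws as soon as colour is seen and the driver catches the error and carries on with the next page, can be modelled in a few lines. This is a sketch of the pattern only, not of Ghostscript: the exception class, the device function, and the representation of drawing operations as (r, g, b) tuples are all hypothetical.

```python
class ColorDetected(Exception):
    """Raised by the detecting device the moment it sees a non-gray colour."""


def gray_detect_device(op):
    """Inspect one drawing operation, given here simply as an (r, g, b) colour."""
    r, g, b = op
    if not (r == g == b):
        raise ColorDetected()


def detect_per_page(pages):
    """Process each page; an error aborts that page only, then continue."""
    results = []
    for ops in pages:
        try:
            for op in ops:
                gray_detect_device(op)
            results.append("gray")
        except ColorDetected:
            results.append("color")  # remaining ops on this page are skipped
    return results
```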
*rayjj*	wishes that IRC had the skype "go back and correct" feature (but not just 1 message)	20:24.55
	note, that feature would save Robin (or marcos or I) a lot of editing when someone (henrys) mentions a customer by name ;-)	20:28.17
Robin_Watts	Muhaha.	23:44.06
	I have 3001Pages.pdf running through mupdf in 18 seconds now :)	23:44.21
	and it's a small patch too. I love it when abstraction pays off.	23:50.58
	Patch on robin/master for tor to look at in the morning.	23:51.07
rayjj	Robin_Watts: what was it before the patch ?	23:54.16
*rayjj*	vaguely recalls 30+ sec	23:54.41
	Robin_Watts: and does it still get the correct result ?? ;-)	23:55.57