Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

	2011/09/01
mvrhel2	Robin_Watts: I missed the pattern discussion	00:27.20
	I will take a look at that tonight	00:27.24
	some diffs with the double to float change. doing a bmpcmp to see if I can see anything	00:28.20
gsnoob	good day everyone, is anyone here willing to point me to an issue I'm getting with ghostscript?	05:27.17
	*point me to solution areas to an issue ...	05:27.58
	I'm running ghostscript in winXP commandline converting a pdf to tif. I've hit on an invalid pdf file, but I can't get gs commandline to give me an error code/message. the process just doesn't quit. I've tried PDFSTOPONERROR but GS doesn't stop	05:29.55
	can anyone point a way?	05:37.00
chrisl	gsnoob: In what way is the PDF invalid?	06:31.18
gsnoob	chrisl: there's supposedly an image in the pdf file, acrobat pop-up "Insufficient data for an image."	07:59.05
	adobe acrobat isn't able to display the image part, but ghostscript seems to be able to get some of it when it attempts to convert it to tiff, but the process never terminates and I can't get any message/error to do something about it (programmatically terminate)	08:01.26
chrisl	gsnoob: that sounds like a bug, rather than something you can have control over. I'd guess that the filter for reading the image data goes into an infinite loop trying to refill a buffer - I suggest you raise a bug.	08:04.33
gsnoob	I suspected it could be a bug. I was hoping someone else encountered it and had some workaround. thanks chrisl.	08:06.43
chrisl	gsnoob: it is remotely possible that someone has, but without seeing the file, it's pretty much impossible for us to judge it - and easiest way to get the file to the relevant person is to make a bug in bugzilla.	08:08.32
Robin_Watts	gsnoob: How do you know it's an infinite loop, not just taking a very long time?	08:55.12
	Try using -dNOINTERPOLATE and -dNOTRANSPARENCY	08:55.22
gsnoob	Robin_Watts: my script is logging filenames of documents which are taking too long to convert, it was running ov	08:59.34
	I didn't write the infinite loop theory on the issue	08:59.55
Robin_Watts	ok, let me rephrase.	09:00.20
	How do you know the "process never terminates"? How do you know it's not just taking a very long time. My previous advice stands.	09:01.01
kens	The command line uses GraphicsAlpha and TextAlpha as well	09:01.43
*kens*	goes off to try it	09:02.16
gsnoob	I was running the script overnight and checked the logged filenames manually to see if there are issues on the document themselves. The PDF file is only 3 pages, around 220kb. And Acrobat is telling me the file has "Insufficient data for an image." - I'm still waiting for the process gswin32c to terminate, once which I've ran about an hour ago.	09:02.49
	*one which I've execute about an hour ago.	09:02.59
kens	Robin_Watts : It does appear to enter an infinite loop processing the image on page 2	09:03.46
gsnoob	kens: I'm guessing you have the bug report and file I posted?	09:04.46
kens	Yes, just received.	09:05.03
	Its in jdhuff, so it looks like a Huffman coding problem.	09:05.20
	It looks like a problem with jpeglib.	09:06.53
	Looks like jpeglib isn't handling the error (which is being flagged in the bottom of the code) and keeps retrying. Probably we should extract the JPEG and send it upstream.	09:09.22
	Or fix it ourselves, but its been too long since I looked at this code for me to do it quickly.	09:09.41
Robin_Watts	I'll have a quick look after my run.	09:10.12
kens	OK	09:10.23
chrisl	Is it a regression with libjpeg 8?	09:10.32
gsnoob	A buddy is also testing this on Server 2008 R2 and encountering the same issue.	09:10.38
kens	don't know.	09:10.43
	chrisl do not know.	09:10.48
	Would have to try the JPEG on an earlier version, which means building it stand-alone	09:11.04
gsnoob	Server 2008 R2 x64	09:11.18
Robin_Watts	kens: Or trying an older version of gs.	09:11.52
kens	chrisl, Robin_Watts in process_data_simple_main (jdmainct.c) :	09:12.01
	if (! pmain->buffer_full) {	09:12.01
	if (! (*cinfo->coef->decompress_data) (cinfo, pmain->buffer))	09:12.01
	return; /* suspension forced, can do nothing more */	09:12.01
	Notice it does not return an error code.	09:12.17
chrisl	Is there an error value in the cinfo context?	09:12.47
kens	Possibly, wait one.	09:12.58
	I don't think so.	09:13.19
chrisl	The problem is, I think that just indicates it's run out of data, and needs more, it's not actually an error condition	09:13.41
kens	The real culprit seems to be the fact that it doesn't increment the row counter, so jpeg_read_scanlines just goes round again.	09:13.57
	and again, and again...	09:14.12
	Presumably because there simply is no more data to be had.	09:14.22
chrisl	It probably can't increment the row number because, being streamable, it doesn't necessarily get a whole row of data at one time	09:14.58
kens	I'm back up into GS code now, in the stream stuff.	09:15.18
	This is Ray's bailiwick, I don't understand it :-(	09:15.56
	Uses bloody macros as well	09:16.08
	Could be a problem with the meaning of 'status'	09:17.59
chrisl	I'm not sure how gs filters work - can a filter return an empty buffer?	09:18.20
kens	It's reading from a stream, and it looks like the stream source is saying 'no more data' but we're ignoring it. Maybe....	09:18.57
	Its beyond me, one for Ray I think.	09:19.28
*kens*	goes back to type 3 fonts	09:19.51
Robin_Watts	IIRC postscript image streams are supposed to repeat when they hit the end, right ?	10:39.18
kens	Yes, but I'm not sure if this is a PS stream	10:39.46
	Its jpeglib using our stream code to get the JPEG data	10:40.27
Robin_Watts	Right.	10:40.32
	but ultimately, this data is going to be coming from a PS stream.	10:40.43
kens	I guess so, yes.	10:40.55
	But we do need to detect error conditions	10:41.22
Robin_Watts	One idea that occurred to me is that maybe we were expected to pad the image data with zeros if we hit the end.	10:41.25
	which might have been why the stream was not reporting an EOF.	10:41.38
	BUT if image streams are supposed to repeat, then they can't be expected to pad with zeros.	10:41.58
kens	JPEG shouldn't expect anything like that	10:42.05
Robin_Watts	I know JPEG doesn't. wasn't so sure about PS.	10:42.43
kens	PostScript does, yes.	10:42.53
	And if that happens the JPEG decoder should recognise an error and abort	10:43.19
Robin_Watts	indeed.	10:43.25
	I was trying to think of reasons why that might not be happening, and padding with zeros was the best I could come up with - but I think we've eliminated that now.	10:43.57
	The problem appears to be that the jpeg lib gets called to decode 0x800 bytes of data, and can't get a complete scanline out of it, so it suspends and returns in the hopes that it will get called with more data next time.	11:42.33
	Ghostscript simply reenters with the same 0x800 bytes of data.	11:43.11
	The reason it can't get a complete scanline out is that the last 0x7bc bytes of the 0x800 are all FF, which get skipped by jpeg parsing.	11:44.20
tor8	kens, Robin_Watts: regular use of libjpeg (without the suspension stuff) works fine on the file -- it just terminates early with an error or two	11:50.49
	one warning: Corrupt JPEG data: premature end of data segment	11:52.58
	followed by an error about unsupported marker types	11:53.11
Robin_Watts	tor8: Yes.	12:15.32
	The problem, I think is the way gs uses it.	12:15.50
	Because we use the suspending stuff, we assume that we'll never have a scanline that takes more than 0x800 bytes to encode.	12:16.18
	(and by scanline I mean line of MCU blocks)	12:16.35
	What should happen is that we should detect the fact that we've fallen out of the gs call without using any bytes from the buffer, and double the size of the buffer before we retry.	12:17.32
	but whether or not that's possible I don't know.	12:17.46
	(actually, the condition should be: if (buffer size on exit == buffer size on entry && no more output produced) then grow buffer before retrying, I think, to allow for it playing out buffered rows)	12:24.06
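A minimal C sketch of the check Robin_Watts describes above: if the buffer was full on entry, nothing was consumed and no output was produced, grow the buffer before retrying. The decoder here is only a toy stand-in (a "row" is a '\n'-terminated record), and `decode_rows` and `decode_all` are hypothetical names, not Ghostscript or libjpeg functions.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a suspending decoder: it consumes only whole rows
 * ('\n'-terminated records) and leaves a trailing partial row untouched.
 * Returns the number of bytes consumed; adds decoded rows to *rows_out. */
static size_t decode_rows(const unsigned char *buf, size_t len, int *rows_out)
{
    size_t consumed = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            consumed = i + 1;
            (*rows_out)++;
        }
    }
    return consumed;
}

/* Feeds src to the decoder through a buffer that starts at initial_cap bytes.
 * Returns the number of rows decoded, or -1 on truncated input or
 * allocation failure. */
static int decode_all(const unsigned char *src, size_t src_len, size_t initial_cap)
{
    size_t cap = initial_cap, fill = 0, pos = 0;
    int rows = 0;
    unsigned char *buf;

    if (cap == 0 || (buf = malloc(cap)) == NULL)
        return -1;

    for (;;) {
        /* Top up the buffer from the source. */
        size_t n = src_len - pos;
        if (n > cap - fill)
            n = cap - fill;
        memcpy(buf + fill, src + pos, n);
        fill += n;
        pos += n;
        if (fill == 0)
            break;                      /* everything consumed */

        int before = rows;
        size_t used = decode_rows(buf, fill, &rows);
        if (used == 0 && rows == before) {
            if (fill < cap) {           /* source ran dry mid-row: give up */
                free(buf);
                return -1;
            }
            /* Buffer full on entry, still full on exit, no output:
             * retrying with the same bytes would loop forever, so grow. */
            unsigned char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return -1;
            }
            buf = bigger;
            cap *= 2;
            continue;
        }
        /* Keep the unconsumed tail at the front of the buffer. */
        memmove(buf, buf + used, fill - used);
        fill -= used;
    }
    free(buf);
    return rows;
}
```

With a 4-byte starting buffer and a 9-byte row in the input, the loop doubles the buffer twice and then completes; with input that ends mid-row it returns -1 instead of spinning.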
tor8	Robin_Watts: you're not getting jpeg errors before entering the loop?	12:26.08
Robin_Watts	tor8: no.	12:26.28
	The 'suspending' calls to the JPEG lib work on the assumption that you're always capable of passing a large enough buffer into the lib so that at least 1 encoded row can be extracted from it.	12:38.31
	The gs code assumes that 0x800 bytes is enough for that.	12:38.43
	which seems bad to me, a) because we might have a really large, high quality jpeg where that's not enough, and b) if we get duff data (like a load of 0xFF's) we can blow that.	12:39.24
	lunchtime.	12:39.44
henrys	kens, alexcher:don't answer hcl please, thanks.	13:13.53
kens	OK No problem.	13:14.03
	Dumb question anyway :-)	13:14.48
henrys	I'll be out until meeting time - 9:30 Pacific.	13:25.28
kens	OK	13:25.55
matogel	hey	13:28.41
kens	Hello	13:28.49
matogel	can someone here help me with an unattended install for ghostscript on windows?	13:29.15
	I need a commandline for that, and I can't find anything about that, until now, so I am asking here	13:31.45
kens	I think you will need to write your own installer for that.	13:32.27
chrisl	matogel: are you using an up to date GS version?	13:32.44
matogel	chrisl: I don't know, I am asked to find a commandline for installing Ghostscript unattended, I never used it	13:33.23
	(at work)	13:33.30
chrisl	matogel: well, it is important, as we changed the windows installer we use recently	13:33.55
matogel	chrisl: hm, okay, so you are using a msi installer, now, or what? I don't know, which version we got here, thats not in my hand, I could, however ask the responsible person	13:35.31
chrisl	matogel: we're using the nsis installer - you may be able to use the nsis options: /S /NCRC /D=<install dir>	13:36.53
matogel	chrisl: oh, wow, thank you very much	13:37.25
chrisl	matogel: I'm not guaranteeing it will work - it's only a recent change for us.....	13:37.56
Robin_Watts	matogel: Please let us know your results. It may be worth an FAQ entry for us.	13:38.31
matogel	chrisl: okay, so I will see, if this works	13:38.40
	Robin_Watts: sure, when I could test it, I join here, again	13:39.01
Robin_Watts	Thanks.	13:39.09
chrisl	matogel: if it doesn't, raise a bug and I'll look at fixing it.	13:39.13
matogel	chrisl: okay, I just don't know, when we will be able to test it, so it may take a bit longer, but it should be done, until end of september at least, I think, this is our deadline	13:40.26
henrys	kens:great you got the pattern problem	13:41.28
chrisl	matogel: I just tested and it does work, but you need to run it as administrator, or you get the dialogue "Do you want to be administrator....."	13:45.34
matogel	chrisl: okay, so I'll write that as note, thank you very much for your help :)	13:46.13
chrisl	matogel: np, I'm relieved it works ;-)	13:46.48
Robin_Watts	chrisl: Sounds like it's worth an FAQ entry to me.	13:47.50
chrisl	Robin_Watts: yes, I'll throw something together and mail it to marcos (I think we dumped that on him!)	13:49.49
kens	henrys it was one of those 'hard to find, easy to fix' problems :-)	13:50.47
ray_laptop	Robin_Watts: ken: chrisl: I saw the logs about the JPEG stream issue.	13:50.54
Robin_Watts	ray_laptop: I've burbled onto the bug.	13:51.18
	Hopefully it makes sense.	13:51.28
ray_laptop	the problem is that the JPEG logic is 'skipping' the 0xff bytes, but not consuming them from the stream	13:51.30
Robin_Watts	ray_laptop: It can't consume them from the stream.	13:51.42
	The buffer contains a few bytes of useful data, then lots of 0xFF.	13:51.53
ray_laptop	why not, if it is skipping them ?	13:51.57
Robin_Watts	We call the JPEG lib using its suspending feature.	13:52.21
	That means that it attempts to decode a line, and if it hasn't got enough data, it leaves everything exactly as it was and then bails.	13:52.58
ray_laptop	Robin_Watts: I was going by your comment: last 0x7bc bytes of the 0x800 are all FF, which get skipped by jpeg parsing.	13:53.04
Robin_Watts	hence callers need to ensure that the buffer contains enough information to decode an entire line's worth.	13:53.26
	The problem is that the line's worth of data is more than 0x800 bytes.	13:53.40
ray_laptop	Robin_Watts: that's what 'min' is about. If a stream filter requires a certain amount of data before it can proceed, that needs to be identified as a condition of the filter.	13:55.01
Robin_Watts	ray_laptop: The JPEG lib can't suspend in the middle of a line - hence it never advances the pointers past the beginning of the line's data until it's done the whole thing.	13:55.03
	The problem is the 0xFF stuff means that a line's data can be arbitrarily long.	13:55.37
ray_laptop	Robin_Watts: but you said it is skipping the 0xff bytes (presumably looking for something)	13:55.52
Robin_Watts	Suppose I have a lines worth of data in my buffer.	13:56.05
ray_laptop	Robin_Watts: can the 0xff bytes be ignored ? (thrown away) or are they data	13:56.17
Robin_Watts	0x80 bytes worth say.	13:56.17
	Sometimes they can be ignored :)	13:57.01
	Markers are preceded by 0xFF bytes, IIRC.	13:57.17
	The only 'safe' fix, is as I described I believe.	13:57.47
	As a caller we need to note that the buffer was full on entry, was still full on exit, and no output was given.	13:58.19
chrisl	That sounds like a libjpeg bug to me - it should consume padding bytes, surely?	13:59.02
ray_laptop	chrisl: that's what I was suggesting as well.	13:59.47
Robin_Watts	chrisl: To do that, it would need to copy the data into an internal buffer.	14:00.00
	And libjpeg has no such internal buffer.	14:00.31
kens	Sorry, network died....	14:01.00
chrisl	But they are still valid bytes in a jpeg stream, so the decoder should consume them	14:01.12
Robin_Watts	Gah. I'm clearly not making myself clear here.	14:01.28
ray_laptop	Robin_Watts: it is legal for a stream to move the data it wants to keep around in the buffer it has to move it past the 'junk' (placing it over some of the later 0xff bytes) and then advance the pointers as if 'consumed'	14:01.38
Robin_Watts	The jpeg code remembers the pointers at the start of every scanline.	14:01.44
	It then attempts to decode the scanline.	14:01.57
	If it succeeds, it updates the pointers (and stuff is consumed)	14:02.12
	If it fails (due to lack of data), it resets everything to the beginning of the scanline, and suspends.	14:02.34
	Hence no data is consumed until it is passed a whole scanlines worth.	14:02.58
chrisl	So if you have a hard maximum buffer size, you could get an image that libjpeg would infinite loop on?	14:03.27
Robin_Watts	Yes.	14:03.33
chrisl	That's sh*t!	14:03.40
Robin_Watts	(If you use the suspending interface)	14:03.40
	That's why no one uses the suspending interface :)	14:03.51
chrisl	But JPEG is supposed to be streamable, and that breaks the ability to stream it	14:04.21
ray_laptop	Robin_Watts: I am saying if the buffer comes in as 3 bytes that _will_ be needed, say 0, 1, 2 then 8 0xff's it can change the data from: 0, 1, 2, ff, ff, ff, ff, ff, ff, ff, ff to 0, 1, 2, ff, ff, ff, ff, ff, 0, 1, 2 and advance the pointers by 8 bytes	14:04.32
Robin_Watts	chrisl: No.	14:05.03
chrisl	No?	14:05.11
Robin_Watts	You could use the non-suspending interface (which is actually much simpler in most cases).	14:05.44
chrisl	How do you feed data in byte-wise if the decoder can't suspend?	14:06.16
Robin_Watts	Or you can spot the condition as I suggest, and increase the buffer size.	14:06.29
ray_laptop	Robin_Watts: if it doesn't have a known 'min' amount of data that a stream provider knows about, then it isn't streamable	14:06.32
Robin_Watts	ray_laptop: It is if you don't use the suspending interface!	14:07.09
ray_laptop	Robin_Watts: increasing the buffer size 'just isn't done', but might be possible. I'd have to look at all of the stream system invariants carefully again.	14:07.21
kens	So why are we using the 'suspending interface ' ?	14:07.56
*ray_laptop*	hasn't looked at the suspending vs. other interface	14:08.08
chrisl	Robin_Watts: surely, if you don't use the suspending interface, you need to give libjpeg all the image data at once?	14:08.23
Robin_Watts	kens: Because to do otherwise would be harder w.r.t postscript, I suspect.	14:08.24
	chrisl: No.	14:08.28
kens	OK	14:08.29
Robin_Watts	The normal way to call jpeglib is to give it functions that refill a buffer for it.	14:09.03
ray_laptop	Robin_Watts: what's the other interface called (so I can look it up)	14:09.06
Robin_Watts	then when you run it, jpeg calls that and it calls back into your code and you read more data from the stream, and then jpeg continues.	14:09.47
ray_laptop	Robin_Watts: Ah, a 'pull' rather than 'push' interface. So for this data stream, it would keep calling to ask for data ?	14:09.58
Robin_Watts	indeed.	14:10.04
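A rough C sketch of the 'pull' arrangement described above, where the decoder calls back into the caller whenever it wants more bytes. This is a self-contained toy, not the libjpeg API (in libjpeg the corresponding hook is the `fill_input_buffer` member of `struct jpeg_source_mgr`); `fill_fn`, `mem_source`, `mem_fill` and `count_rows_pull` are hypothetical names, and a "row" is just a '\n'-terminated record.

```c
#include <stddef.h>
#include <string.h>

/* Callback the decoder uses to pull more input; returns bytes written,
 * or 0 when the source is exhausted. */
typedef size_t (*fill_fn)(void *ctx, unsigned char *buf, size_t cap);

/* Example source: hands out an in-memory block a few bytes at a time. */
typedef struct {
    const unsigned char *data;
    size_t len, pos, chunk;
} mem_source;

static size_t mem_fill(void *ctx, unsigned char *buf, size_t cap)
{
    mem_source *s = ctx;
    size_t n = s->len - s->pos;
    if (n > s->chunk)
        n = s->chunk;
    if (n > cap)
        n = cap;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Pull-style decoder: it owns the loop and keeps its state across refills,
 * so a row may span any number of refills and it never has to suspend
 * and rewind to the start of the row. */
static int count_rows_pull(fill_fn fill, void *ctx)
{
    unsigned char buf[8];
    int rows = 0;
    size_t n;

    while ((n = fill(ctx, buf, sizeof buf)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (buf[i] == '\n')
                rows++;
        }
    }
    return rows;
}
```

The trade-off is the one raised in the discussion: the decoder drives the loop, which fits poorly when the caller has to push data in as it arrives.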
chrisl	Robin_Watts: Ah, I see - that might be workable for PDF, but not for PS :-(	14:10.07
Robin_Watts	chrisl: indeed.	14:10.17
ray_laptop	PS allows for indefinite early end to the amount of data for images	14:10.39
Robin_Watts	You can turn it from a pull into a push using a separate thread and passing data that way, but it's harder.	14:10.43
chrisl	It still seems very poor to include a broken interface like that	14:11.34
ray_laptop	Robin_Watts: so is moving data within to buffer to save what you need, but skip the junk possible ?	14:11.41
Robin_Watts	ray_laptop: Not easily.	14:11.51
ray_laptop	s/within to/within the/	14:11.55
Robin_Watts	I'm looking at that now, but it's not trivial.	14:12.05
chrisl	And it doesn't have a way to find how many bytes it tried to consume before running out of valid data?	14:13.46
Robin_Watts	I could make it 'shuffle up' all the data in the buffer whenever it skips a padding byte, but that would be slow.	14:14.10
	Might be possible to hack something in to 'count' the number of padding bytes it skips, and then adjust on a suspension.	14:14.50
ray_laptop	Robin_Watts: you'd do that only upon suspension	14:14.54
Robin_Watts	but I don't know if I can do that without hacking the lib itself, rather than our interface routines.	14:15.24
	actually, I bet I can.	14:16.08
ray_laptop	Robin_Watts: does it only suspend in certain states ?	14:16.10
Robin_Watts	ray_laptop: What do you mean?	14:18.02
ray_laptop	Robin_Watts: nm. I really haven't enough knowledge about the JPEG lib and its interface yet to help	14:19.00
	Robin_Watts: mainly I wanted to mention that changing the data in the buffer is legal in GS (the 'stream_compact' step does that all the time)	14:19.55
Robin_Watts	right.	14:20.04
	Aha.	14:36.12
	In JPEG, FF is used as an escape byte. So a genuine FF will be outputted as FF 00.	14:36.50
	FF xx (where xx !=0 and xx != FF) means 'marker xx'.	14:37.24
	FF FF is illegal.	14:37.29
	IF the jpeg lib finds a stream of FF's then it ignores all but the last one.	14:38.55
	So on a suspension, I can 'compact' the stream data by squashing any sequences of FF to single ones.	14:39.40
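A small C sketch of that compaction step: squash every run of consecutive 0xFF bytes down to a single 0xFF, in place. `compact_ff_runs` is a hypothetical name, and this is not the code from the eventual patch; it only illustrates the idea as stated in the chat.

```c
#include <stddef.h>

/* Collapse each run of consecutive 0xFF bytes to a single 0xFF, in place.
 * Returns the new length; if it equals len, nothing could be squeezed out. */
static size_t compact_ff_runs(unsigned char *buf, size_t len)
{
    size_t out = 0;

    for (size_t in = 0; in < len; in++) {
        /* Skip an FF that directly follows an FF we already kept. */
        if (buf[in] == 0xFF && out > 0 && buf[out - 1] == 0xFF)
            continue;
        buf[out++] = buf[in];
    }
    return out;
}
```

Escaped data (FF 00) and markers (FF xx) pass through unchanged; only the extra FFs in a run are dropped. The sketch assumes the buffer holds entropy-coded data, where the chat's reading of FF applies; running it over other parts of a file could alter real data.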
xreal	How can I change the author of a PDF file without losing fast web viewing/rendering?	14:44.18
Robin_Watts	xreal: In other words, you have a linearized PDF, and you want to change the PDF and still have it linearized.	14:45.05
	Good question :)	14:45.39
xreal	Robin_Watts: MMh, yeah. Ghostscript can rewrite the whole file to be optimized, but Corel won't reopen it then.	14:45.57
	Okay, it's only 24 kb ... but it makes me sad.	14:46.04
	"pdfopt" can make it fast viewing, but the PDF even crashed in Acrobat :.-)	14:47.45
ray_laptop	I assume our 'rewritten' PDF is valid PDF (according to other viewers) that it's just something bad about Corel's PDF open ?	14:48.19
	xreal: OK, so please file a bug for us.	14:48.35
xreal	ray_laptop: Okay. It's weird, since Corel uses Ghostscript in background :-)	14:49.05
ray_laptop	xreal: just curious -- does GS open the resulting PDF ?	14:49.07
xreal	ray_laptop: 1 sec	14:49.13
ray_laptop	xreal: that's interesting that "Corel uses Ghostscript in background"	14:49.39
	xreal: what Corel package, and what version ?	14:49.53
xreal	ray_laptop: Yeah, since X4 or X5 (I think, X5)	14:49.54
	X5 uses 8.64, but can be updated to any GS	14:50.04
ray_laptop	Corel which (draw ?)	14:50.14
xreal	CorelDrae, Corel Technical Designer	14:50.32
	CorelDraw	14:50.42
ray_laptop	xreal: thanks.	14:50.44
	(sounds like a job for Miles)	14:50.51
xreal	Let me describe my main problem.	14:51.02
ray_laptop	xreal: OK, but we would like a bug report (or at least the file that you use as input) so we can look at what's wrong with our pdfopt (probably in gs_pdfwr.ps)	14:52.11
xreal	I'm exporting a PDF from Corel (which really uses a nice PDF export, it's from Adobe I think). But it writes the filename to title... I can drop the title in Acrobat, but it changes the change date of the PDF... I don't want that. GS can rewrite the changedate, but the fonts in the PDF get changed and can't be reopened in Corel :-)	14:52.15
Robin_Watts	https://coreldraw.com/forums/t/25637.aspx	14:52.38
ray_laptop	xreal: I'm surprised that pdfopt changes the fonts. The full pdfwrite device processing will (probably) change the fonts, but AFAIK, pdfopt just copies objects around.	14:53.53
xreal	Also, I need to file a bug with GS' EPS output. Word (any version since 2002/XP) can import EPS without a problem, but sometimes, the EPS-output of GS seems to be damaged.	14:54.00
	ray_laptop: pdfopt makes it unreadable in Corel at all and Acrobat 9 (pro) crashes.	14:54.36
ray_laptop	xreal: OK, please open bugs at bugs.ghostscript.com	14:55.54
xreal	ray_laptop: Okay, I'll do that after my design is finished.	14:56.17
ray_laptop	Robin_Watts: looks like Corel is DEFINITELY violating GPL in that they distribute Ghostscript (include it in X5)	14:57.24
xreal	ray_laptop: No. It's a separate installer.	14:57.42
Robin_Watts	ray_laptop: Only if they don't supply the sources for any changes they've made.	14:57.57
xreal	It's a 1:1 of the 8.64 release with an own installer.	14:58.36
	All files are included without a change.	14:58.41
chrisl	Even if it isn't actually a GPL violation, it still seems like a good one for Miles to follow up	14:58.58
xreal	(Who is Miles?)	14:59.18
Robin_Watts	It's worth telling Miles, yes, but we should be careful to give him accurate information (i.e. that's it's not necessarily an infringement)	14:59.43
	xreal: Miles is the CEO of Artifex, the company that owns/develops ghostscript (and employs us :) )	15:00.13
xreal	Ah, I see.	15:01.04
	I've got a very bad idea for a work-around. I'll use a debugger to make CorelDraw not to write the title into the PDF file... That will fix my problem. I hate workarounds like these.	15:02.52
ray_laptop	sorry -- was taking the kids to school (went much smoother this AM)	15:37.16
Robin_Watts	ray_laptop: Is pr->limit the last byte in the buffer?	15:37.35
	or one past the last byte in the buffer ?	15:37.42
ray_laptop	Miles doesn't assume GPL violation, he just looks into the stuff we send him. What's salient is that we had attempted several times to interest Corel in GS over the years.	15:39.14
	Robin_Watts: I always have to look at the code (the stream pointers aren't the way I expect)	15:40.11
	Robin_Watts: the docs are in base/stream.h	15:40.56
Robin_Watts	Given that (pr->limit - pr->ptr) seems to be used in the debug statements for 'how many bytes in the buffer', it seems that limit should be one past the end.	15:41.54
ray_laptop	* The following invariants apply at all times for read streams:	15:44.23
	* s->cbuf - 1 <= s->srptr <= s->srlimit.	15:44.25
	* The amount of data in the buffer is s->srlimit + 1 - s->cbuf.	15:44.27
	* s->position represents the stream position as of the beginning	15:44.28
	* of the buffer, so the current position is s->position +	15:44.30
	* (s->srptr + 1 - s->cbuf).	15:44.31
Robin_Watts	Oh, gawd.	15:44.37
	so *pr->ptr is not the first value in the stream?	15:44.53
	it's pr->ptr[1]; ?	15:44.59
	and pr->limit points to the last byte.	15:45.21
ray_laptop	Robin_Watts: right. That's why I always have to check. It is never intuitive to me.	15:45.46
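A short C sketch of the pointer convention just confirmed: `ptr` sits one byte before the next unread byte and `limit` sits on the last byte, so the next byte is `ptr[1]` and the bytes remaining are `limit - ptr`. `read_cursor` and its helpers are hypothetical names, not the Ghostscript stream API. To keep the sketch inside defined C behaviour, the storage carries one unused pad byte in front of the data rather than pointing before the start of an array.

```c
#include <stddef.h>

/* Cursor following the convention quoted from base/stream.h:
 * ptr   = one byte BEFORE the next unread byte
 * limit = AT the last valid byte (not one past it) */
typedef struct {
    const unsigned char *ptr;
    const unsigned char *limit;
} read_cursor;

/* storage[0] is an unused pad byte; the data is storage[1..len]. */
static void cursor_init(read_cursor *c, const unsigned char *storage, size_t len)
{
    c->ptr = storage;           /* one before the first data byte */
    c->limit = storage + len;   /* on the last data byte */
}

/* Bytes still unread. */
static size_t cursor_available(const read_cursor *c)
{
    return (size_t)(c->limit - c->ptr);
}

/* Returns the next byte, or -1 when the buffer is empty.
 * Note the pre-increment: advance first, then read. */
static int cursor_next(read_cursor *c)
{
    if (c->ptr >= c->limit)
        return -1;
    return *++c->ptr;
}
```

Under this convention an empty buffer is simply `ptr == limit`, and reading is always "advance, then dereference".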
henrys	all the stream stuff works that way - it's a peterism.	15:45.50
Robin_Watts	(there's no place like home, there's no place like home...)	15:45.50
ray_laptop	Peter had a reason for it (I think), but I've never understood why it's so screwy	15:46.28
Robin_Watts	It's insane. And nasty. And (if I understand it correctly) involves an unbearable amount of copying of data.	15:47.07
*Robin_Watts*	rewrites his code then.	15:48.20
henrys	anyway Robin_Watts shouldn't have to fool with this ...	15:48.42
Robin_Watts	henrys: I'm practically there.	15:49.00
henrys	as you wish	15:49.18
	ray_laptop:are you still working on the memory leak? - I had a thought: if you are tearing down the entire allocator, there's no need for the finalize calls.	15:51.42
	if that is the issue.	15:52.36
	unless we are closing files or other resources in finalize calls.	15:54.45
ray_laptop	henrys: we need to finalize so we can free the semaphore handles (gp_semaphore_close). If all we do is free chunks, then the memory doesn't leak, but the handles do	15:55.37
	so, it is precisely the 'other resources' (handles) that we are concerned with	15:56.10
mvrhel2	hehe. Robin_Watts: I remember when I saw that the first byte in the stream was not the first data value.	15:56.33
	and scratched my head	15:56.46
henrys	ray_laptop:in a complete teardown those can be done explicitly.	15:57.17
ray_laptop	all I can guess is that Peter didn't want people playing around in the stream code without reading the docs, so he made it totally non-intuitive so that 'common sense' code wouldn't work ;-)	15:57.37
Robin_Watts	ray_laptop: More likely that somehow it has a formulation that when compiled with some 20 year old compiler on the chip that now runs my washing machine, it saved a cycle.	15:58.28
mvrhel2	hehe	15:58.50
Robin_Watts	(and god knows I've been guilty of that myself :) )	15:58.56
mvrhel2	speaking of which, henrys: I did not do the commit of the change from double to float. I need to dig into this a bit more. for some reason even though the profiling showed it should be better it actually ran slower	15:59.48
	the diffs in the cluster push were tiny though	16:00.03
ray_laptop	henrys: a hack like that is really ugly (IMHO), but the dev_ht (partial) reference counting is already quite ugly, so maybe a hack is the best we can do. I'd just rather be able to have the icc code's ref counting stuff take care of stuff (since it appears to be properly maintained).	16:01.03
mvrhel2	I have to wonder if there was some compiling optimizations between what I was profiling and the release that I ran	16:01.04
	Robin_Watts: is there a bug for the pattern issue that we have left for the plank device?	16:02.15
	do I need to look at anything?	16:02.21
Robin_Watts	mvrhel2: Not currently.	16:02.26
	yes please.	16:02.31
mvrhel2	ok	16:02.31
	oh	16:02.39
Robin_Watts	(There is not currently a bug, yes, I'd like you to look at it :) )	16:02.46
mvrhel2	gotcha	16:02.51
ray_laptop	mvrhel2: I know that some Intel chips are much less efficient with floats (needing more memory cycles), particularly if the floats aren't stored on double alignment	16:02.59
Robin_Watts	All the information should be in the irc comments from yesterday - or I can open a bug if you'd rather.	16:03.09
henrys	ray_laptop:I am having difficulty understanding how the memory can just be released in random order, without dereferencing dangling references. I guess if each free was followed by setting the pointer to NULL and then we checked it would work.	16:03.30
mvrhel2	ray_laptop: that would make sense.	16:03.33
	Robin_Watts: let me look at the log	16:03.48
ray_laptop	mvrhel2: but every chip from Intel seems a little different in the nitty gritty of what runs fastest	16:04.12
	now, in terms of embedded system CPU's floats are generally faster than doubles	16:04.59
mvrhel2	sigh	16:05.18
henrys	mvrhel2:I am looking at the assembly on the mac and we are generating 2 extra instructions that convert to and from double; hard to believe it would be slower without those, maybe not significant.	16:06.31
	but shouldn't be slower.	16:06.50
Robin_Watts	ray_laptop: It's harder than that even. On ARM, all floats are converted to doubles for procedure calls (or were at least)	16:07.02
mvrhel2	henrys: when I get done with this pattern thing for the plank device I will revisit this	16:07.22
henrys	okay	16:07.32
Robin_Watts	ray_laptop: I have something that seems to work here; if we suspend without reading anything from the buffer, then I compact the jpeg buffer.	16:07.46
	But it's still not ideal, as I could construct valid JPEG data that used more than 0x800 bytes for a scanline, in which case we'd go into an infinite loop anyway.	16:08.19
	Is there a way to test whether the buffer is 'full' or not ?	16:08.33
*kens*	is logging out now.	16:09.21
Robin_Watts	night kens.	16:09.27
kens	Goodnight all, see you tomorrow.	16:09.28
wasabi2	Hey folks. Was wondering if there's any way to round trip a rather large postscript stream through ghostscript (or something else) to remove duplicated resources, like images?	16:09.31
kens	More likely it will add more duplicates.	16:10.04
ray_laptop	wasabi2: it might be useful to round trip it through PDF (then back to PS) since the 'pdfwrite' device does try to (I think) detect images and objects that are the same.	16:13.38
wasabi2	Will do.	16:14.07
ray_laptop	wasabi2: I know that it has all of these 'cos_***_equal' functions that use md5 sums to (try to) detect the same dicts, streams, etc.	16:20.10
wasabi2	Hmm. Might be working then.	16:20.46
	2gb file has been working for... 5 minutes, and is only 2MB output so far... it should be WAY bigger than that after 5 minutes.	16:21.18
ray_laptop	wasabi2: which step is that? creating the PDF or creating the PS from the PDF ?	16:22.00
wasabi2	ps2pdf	16:22.30
	2gb postscript file to pdf... only at 2.1mb now	16:22.48
ray_laptop	wasabi2: there are several temp files that the pdfwrite uses (iirc)	16:22.59
wasabi2	well, it should be way further along than 2MB if it was outputting duplicate images. heh.	16:23.21
*wasabi2*	waits patiently.	16:23.27
	Trying to get this process down... ultimately, I'm going to be printing 80,000+ pages	16:24.05
mvrhel2	Robin_Watts: so yes, it appears that we were not writing and reading in the pattern size correctly	16:49.02
	I made a fix	16:49.07
Robin_Watts	cool. That would have taken me weeks :/	16:49.17
mvrhel2	I dont think so	16:49.32
	so, the question is, do I just go ahead and commit this	16:50.20
	or should I send you the patch	16:50.35
Robin_Watts	Commit it, and I'll test it.	16:51.12
mvrhel2	ok	16:51.19
Robin_Watts	It won't hurt the cluster because the cluster isn't testing planar stuff yet.	16:51.31
mvrhel2	ok good	16:51.36
	shoot forgot to add CLUSTER_UNTESTED in my comment	16:55.11
	is it possible to change it before I push	16:55.22
Robin_Watts	mvrhel2: git commit --amend	16:55.32
mvrhel2	oh yes	16:55.35
Robin_Watts	But, testing it won't hurt, right, just in case it has an unintended effect for non planar stuff ?	16:55.57
mvrhel2	oh that is probably true	16:56.06
	I have seen that happen before :)	16:56.15
Robin_Watts	It's clist^H^H^H^Hhinatown.	16:56.47
mvrhel2	ok. it is committed	16:57.04
Robin_Watts	Ta.	16:57.08
mvrhel2	so I would have thought all the files with patterns when running in clist mode would have crashed without this	16:57.42
	but this was a rather large pattern	16:59.11
	I certainly think they would have rendered wrong	16:59.26
	Robin_Watts: I am going to go back to looking at this performance issue	16:59.43
Robin_Watts	ok.	16:59.52
mvrhel2	let me know if you need me to help with anymore plank stuff	17:00.00
Robin_Watts	When I test this, I'll get marcosw_ to rerun the comparison. Hopefully this should get us down a lot more.	17:00.31
mvrhel2	sounds good	17:02.06
Robin_Watts	robin,alexcher,henrys: http://ghostscript.com/patch.txt	17:15.19
	robin,alexcher,henrys: http://ghostscript.com/~robin/patch.txt	17:15.30
	That is a patch that I believe should solve it properly.	17:15.50
	When we create a DCT decoding stream, I store the maximum buffer size somewhere that I'll have access to.	17:16.23
	Then when we suspend, I only compact if the buffer was full. And if the compaction fails, I throw an error to stop us going into an infinite loop.	17:17.06
	I'd like opinions before I commit though, as it relies on the fptr(osp_)->strm voodoo, which I don't really understand.	17:17.47
	Can we get DCT streams created OTHER than from psi ?	17:18.20
henrys	tor8:how about a release?	17:23.11
Robin_Watts	Yes, it looks like we can. I need to populate buffersize in those cases too.	17:23.17
henrys	the stream code has screwed intelligent rational humans at every turn, any change like this needs careful consideration.	17:24.50
	did you mean to write to ray_laptop? you wrote robin, alexcher, henrys:	17:26.42
Robin_Watts	I did indeed mean to write ray_laptop, thanks.	17:27.04
henrys	see what the stream code does to people ? ;-)	17:27.24
ray_laptop	Robin_Watts: yes, DCT streams can come from XPS, PXL, and possibly PCL (I think)	17:27.58
Robin_Watts	New version of the patch that should fix xps and pxl too.	17:28.22
tor8	henrys: wrapping up for one on friday or this weekend	17:28.37
henrys	great tor8	17:29.07
Robin_Watts	tor8: Anything I need to do for that ? (any showstopper android bugs or anything?)	17:29.22
henrys	just XL not PCL	17:29.53
	Robin_Watts:you may have just fixed a longstanding xl problem.	17:30.36
tor8	Robin_Watts: if you could take a look at #692377 and/or figure out why some AA pixels end up darker than they should (premul problems?) in one of the test files you had I'd appreciate it	17:31.23
wasabi2	Okay... so... PDF to postscript. I need to insert some special stuff before each page that switches paper trays. Any advice on this? I can post process the postscript pretty easily to insert the commands, but I want some way to be able to mark pages in the original PDF and have pdf2ps insert the commands based on that. Can this be done?	17:31.38
Robin_Watts	tor8: Will try and look.	17:32.19
femfum	I presented an issue regarding the apparent exaggeration of the size of the PXL files: I have an example where a PDF of a little more than 1.5 MB (16 pages, with images at 72 dpi, text, fonts, shadings and vector graphics) with this command line... gswin32c -dBATCH -dNOPAUSE -sDEVICE=pxlcolor -dCompressMode=1 -sOutputFile=barca-athletic.pxl barca-athletic.pdf ...becomes a PXL file of more than 2.5 GB ...you think this is okay? ...or it co	17:32.37
tor8	the one with the overlapping circles that you used for the knockout/isolated tests had aa-artifacts where the aa pixels are darker than the non-aa mid-path pixels	17:33.31
Robin_Watts	henrys: malformed JPEG data in a PXL file can cause it to hang ?	17:33.48
henrys	femfum:you "presented" a bug - does that mean reported a bug on bugzilla?	17:34.22
Robin_Watts	actually, no, I can't see how the code would hang.	17:34.42
henrys	Robin_Watts:I need to dig up the bug but I recall there was a similar issue with XL.	17:35.03
femfum	not yet	17:35.29
Robin_Watts	oh, maybe it is possible.	17:35.40
	don't understand PXL :(	17:35.47
henrys	femfum:sounds like a worthwhile problem to report.	17:36.05
	femfum:include a command line and the athletic.pdf file please.	17:36.41
wasabi2	Hmm.... maybe there's a better way to do this. How about "can I have pdf2ps choose a mediatype/tray per page, and find the command from a ppd?"	17:36.51
femfum	ok	17:37.44
tor8	Robin_Watts: the AA artifacts are in l16.pdf	17:40.34
Robin_Watts	tor8: The hell file.	17:40.49
	I am pretty damn sure I'm not going to get that sorted for a release.	17:41.09
	what I might be able to do is to add a #define to determine whether we try the knockout/isolated support or not.	17:41.48
	so people can at least get "the same as before".	17:41.57
tor8	yeah, the hell file! if you can't that's not a problem.	17:41.59
mvrhel2	henrys: so restarting my profiling from a clean build I had the lcms taking 31% of the time. With my changes to the lcms interpolation it brought lcms down to 25%	17:42.01
	zoom y is coming in around 18 percent	17:42.56
	I am going to cluster push this lcms change	17:43.12
	it got rid of the divide and moved the loop inside the decision branches	17:43.43
	going to try one other thing first though	17:44.41
ray_laptop	mvrhel2: are these improvements portable to the lcms2 code base ?	17:51.20
mvrhel2	yes	17:51.24
	lcms2 and lcms share the same interpolation code	17:51.46
	that is there was no real change in this part of the code	17:51.59
ray_laptop	mvrhel2: cool.	17:52.36
henrys	mvrhel2:but with the interpolation code you had the same experience and absolute times got worse or something - are you timing also?	17:55.10
mvrhel2	good point. let me double check the times now	17:55.27
	I think though my profiling yesterday was messed up	17:56.01
Robin_Watts	running tests on that commit now. going to walk dogs while they run.	17:57.22
mvrhel2	cool. let me know how it turns out	17:58.07
	ok so at 600dpi we went from 48 seconds to 30 seconds	18:00.09
	with my fix	18:00.12
	let me try 1200 dpi	18:00.16
	oh wait	18:00.43
	we went from 95 seconds to 30 seconds	18:00.57
	It had not finished page 2 yet when I commented above	18:01.21
	so henrys: this is a pretty big win	18:01.28
	let me try 1200 dpi	18:01.38
wasabi2	So it seems like pdf2ps doesn't preserve image XObjects as individual references.	18:14.29
henrys	that's great mvrhel2	18:15.14
Robin_Watts	And the plank vs pamcmyk4 stuff seems good too.	18:24.01
	marcosw_: You awake ?	18:24.10
mvrhel2	oh thats good news	18:26.34
	wtf. how could 600 dpi go from 95 to 30 seconds but 1200 dpi goes from 802 to 822 seconds	18:32.10
Robin_Watts	mvrhel2: Is the patch online ?	18:32.38
mvrhel2	no, but I can do so	18:32.49
*henrys*	wonders if robin, ken and chris are going to try and bring tungsten bulbs home from the meetings.	18:32.56
mvrhel2	hold on	18:32.57
henrys	I can test it on the mac also if you want.	18:33.15
Robin_Watts	henrys: I've pretty much moved over completely to energy savers.	18:33.35
mvrhel2	http://www.ghostscript.com/~mvrhel/lcms.patch	18:37.38
	henrys and Robin_Watts: if you can run a check to see if this speeds things up with the file for bug 692323 that would be great	18:38.56
	like I said I do get a speed up at 600dpi but not 1200dpi	18:39.07
	I am going to run a profile at 1200 dpi to see what is the issue	18:39.19
Robin_Watts	mvrhel2: Will be tricky at the moment as my machine is chugging with pamcmyk4 vs plank tests.	18:39.35
mvrhel2	also I am running out to a null output file	18:39.45
	Robin_Watts: ok	18:39.50
	henrys: maybe you can do a sanity check for me	18:40.20
Robin_Watts	That patch seems unlikely.	18:40.37
mvrhel2	unlikely for what	18:40.52
Robin_Watts	oh, sorry, that's not SSE2.	18:41.13
mvrhel2	no it's not	18:41.18
	the SSE2 stuff is different	18:41.24
Robin_Watts	That's just shuffling the loops round, right.	18:41.25
	I have a similar patch to that. Let me find it.	18:41.32
mvrhel2	and tossing out the divide	18:41.33
Robin_Watts	http://ghostscript.com/~robin/patch2.txt	18:43.39
	oops. Patch is reversed, but you should understand it.	18:44.39
mvrhel2	hehe	18:45.13
	I was wondering	18:45.16
Robin_Watts	The c0,c1,c2,c3 rx,ry,rz values are signed, right ?	18:47.23
mvrhel2	rx, ry and rz are not signed but in fact are 16 bit unsigned weights	18:48.27
Robin_Watts	But the c's are signed ?	18:49.29
	so the 'division' is of a signed number.	18:49.40
mvrhel2	Robin_Watts: need to go eat with the family	18:50.01
Robin_Watts	ok.	18:50.06
mvrhel2	indian buffet for lunch	18:50.10
	I will double check on the signed stuff when I get back	18:50.20
	I realize this will affect the rescale	18:50.32
	bbiaw	18:51.42
Robin_Watts	I don't understand the justification for replacing the divide by a shift.	18:53.53
mvrhel2	well I am multiplying 2 16 bit numbers together and I want the result to be a 16 bit rounded result	18:57.19
	anyway	18:57.21
	have to head out now	18:57.24
Robin_Watts	You are multiplying 2 numbers in the 0..ffff range together, and want to get a number in the 0..ffff range.	18:58.12
	I'll scribble on paper and try to figure it out.	18:59.31
henrys	6:30 -> 5:54 at 1200 dpi on the mac	19:03.16
Robin_Watts	mvrhel2: (when you get back) I think I disagree with your division removal.	19:13.16
	The existing code does: (a + 0x7fff) / 0xffff	19:13.38
	For a = 0..7fff that gives 0. 8000..17ffe it gives 1, 17fff...? it gives 2.	19:14.28
	Your code takes 17fff to 1.	19:14.49
	The correct reformulation is, I believe (0x8000+a+(a>>16))>>16	19:15.25
	(and that's for +ve a only)	19:17.05
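The boundary values quoted above can be checked mechanically. The following is a minimal sketch in C, not the lcms code itself: it assumes the existing rounding is "add 0x7fff, divide by 0xffff" (as the later remarks about the addition of 7fff and /0xffff suggest) and that the plain shift replacement is "add 0x8000, shift right by 16". The function names are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Assumed existing rounding: add 0x7fff, then divide by 0xffff. */
static uint32_t round_div(uint32_t a)
{
    return (a + 0x7fffu) / 0xffffu;
}

/* Plain shift replacement: add 0x8000, then shift right by 16. */
static uint32_t round_shift(uint32_t a)
{
    return (a + 0x8000u) >> 16;
}

/* Reformulation from the discussion: add a>>16 back in before shifting. */
static uint32_t round_shift_fixed(uint32_t a)
{
    return (a + (a >> 16) + 0x8000u) >> 16;
}

/* Count inputs in [0, limit] where candidate() disagrees with round_div().
 * limit must be well below UINT32_MAX so the loop terminates. */
static uint32_t count_mismatches(uint32_t (*candidate)(uint32_t), uint32_t limit)
{
    uint32_t a, n = 0;
    for (a = 0; a <= limit; a++)
        if (candidate(a) != round_div(a))
            n++;
    return n;
}
```

Calling count_mismatches(round_shift_fixed, 0xfffff) and count_mismatches(round_shift, 0xfffff) compares the two candidates against the division over a small positive range. The sketch only covers non-negative inputs, matching the "+ve a only" caveat.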
	The plank vs pamcmyk4 gives big differences on 29-07B.PS in that stuff is shifted around the page.	19:22.35
	Hey marcosw_. Could you rerun the plank vs pamcmyk4 tests please ?	19:24.10
marcosw_	Robin_Watts: I'm online now.	19:24.22
	you noticed :-)	19:24.30
Robin_Watts	:)	19:24.50
marcosw_	how's the weather in the UK. Bloody cold and wet in Germany, might as well be living in Scotland	19:25.16
Robin_Watts	Quite nice here at the moment.	19:25.28
	but that's due to change in the next couple of days.	19:25.38
	Looks hot in Chicago at the moment :(	19:25.46
marcosw_	supposed to hit 30 on Saturday, but then thunderstorms on Sunday.	19:25.54
	Robin_Watts: okay, I've started the comparison, results in ~10 hours (depending on how many cluster jobs are run).	19:27.42
Robin_Watts	marcosw_: Thanks.	19:27.53
marcosw_	do we want to discuss adding plank and/or some of the plan variants to the cluster and/or the nightly regression tests?	19:28.27
Robin_Watts	At the meeting.	19:29.04
marcosw_	sounds good.	19:30.13
Robin_Watts	mvrhel2: you here?	20:08.00
	If the tables are monotonically increasing then the c values are always positive, I think.	20:08.51
	but I have no idea if that's reasonable.	20:09.17
mvrhel2	Robin_Watts: ok yes I do need to fix this. working on it in a minute	20:53.56
	this was more me doing some timing tests to understand where the cost was	20:54.27
	henrys: ok. let me fix this then we can rerun the timing	20:55.01
	maybe I screwed up my test at 1200dpi	20:55.18
henrys	the winter 2011 line of mac books with sandy bridge really have a nice performance improvement but the apple tax is high.	21:09.44
Robin_Watts	mvrhel2: here?	23:02.48
mvrhel2	Robin_Watts: I am now	23:17.01
	so what do we need to do now with the plank device? is it all done?	23:17.30
	if so, then we need to start looking at the halftoning	23:17.48
Robin_Watts	mvrhel2: pending the results of marcosw_'s latest run, yes.	23:18.00
mvrhel2	ok cool	23:18.14
Robin_Watts	I suspect it'll still tell us there are lots of differences. Last time was 245.	23:18.27
mvrhel2	I would have thought there would be quite a few based upon that last issue	23:18.55
	it would seem that none of the clist rendered cases would have worked	23:19.09
Robin_Watts	but I've tried almost all of them locally and either found no differences, or excusable ones.	23:19.10
	Yeah, my local tests were running all non-banded.	23:19.22
mvrhel2	ok	23:19.24
	I wonder if his is	23:19.32
Robin_Watts	so I'll retest with banded on tomorrow.	23:19.35
	His tests both banded and nonbanded.	23:19.40
	That may explain the 'no difference found' ones.	23:19.51
mvrhel2	yes	23:19.56
	hopefully with this last fix the count will drop	23:20.07
Robin_Watts	indeed. But there are still a few where patterns are shifted by 1 pixel.	23:20.28
	(and lines).	23:20.32
mvrhel2	that is odd	23:20.37
Robin_Watts	which is probably the usual gs thing of changing position when you change device.	23:20.50
mvrhel2	ok. I am not aware of that issue	23:21.05
Robin_Watts	It's something you see when comparing various devices in gs.	23:21.23
	I can't entirely explain it, but I know of at least one factor behind it.	23:21.35
	To return to the interpolation code momentarily...	23:26.28
	If the c's are negative, then the division in the existing code is nasty.	23:26.54
mvrhel2	Robin_Watts: no I have that figured out it is rather simple	23:27.09
Robin_Watts	because it rounds towards zero, so there is a 'flat spot' in the graph around zero.	23:27.21
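The 'flat spot' can be illustrated with a small C sketch. It assumes the existing code computes (a + 0x7fff) / 0xffff on a signed 32-bit value; since C99, integer division truncates toward zero. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed existing rounding applied to a signed value.
 * C99 integer division truncates toward zero. */
static int32_t round_div_signed(int32_t a)
{
    return (a + 0x7fff) / 0xffff;
}

/* Width of the run of consecutive inputs around zero that all map to 0. */
static int32_t zero_run_width(void)
{
    int32_t lo = 0, hi = 0;
    while (round_div_signed(lo - 1) == 0)
        lo--;
    while (round_div_signed(hi + 1) == 0)
        hi++;
    return hi - lo + 1;
}
```

Under these assumptions every other output value is produced by a run of 0xffff inputs, while the run that maps to 0 is nearly twice that wide, because truncation pulls results from both sides of zero toward it.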
	My code trades 3 subtraction operations for a multiply.	23:28.15
	sorry, ignore that for a mo.	23:28.30
mvrhel2	ok. I am testing my fix now with bmpcmp	23:28.41
Robin_Watts	So, what are you using for the division replacement now ?	23:28.43
mvrhel2	you have to realize what it is that is being computed in the operation	23:29.04
	rx, ry and rz are a scaled representation of 0 to 1 by 0 to 65535	23:29.30
Robin_Watts	Not quite.	23:29.42
mvrhel2	yes	23:29.45
	that is tetrahedral interpolation	23:29.51
Robin_Watts	rx are a scaled representation of the interval [0..1)	23:29.56
	by 0 to 65535.	23:30.02
mvrhel2	ok yes	23:30.30
	but in the end, we can scale the output by adding ffff and then shifting right by 16	23:31.36
Robin_Watts	Now I'm even more confused.	23:31.53
mvrhel2	hehe	23:31.57
Robin_Watts	If the existing code is correct, then the correct way to give an identical result for all positive a is (a + (a>>16) + 0x8000)>>16	23:32.37
mvrhel2	we need to correct for our representation of 0 to 1 by 0 to 65535. We are taking a convex combination of points	23:32.45
	in the term rest	23:32.58
Robin_Watts	But what you said about the interval has made me think that the existing code isn't right.	23:33.09
	We're adding rx lots of 1/65536 in when we interpolate.	23:33.39
	So surely we should be >>16 anyway, not /0xffff	23:33.49
mvrhel2	where I may have run afoul is in the signed issue. anyway I will beat on it more tonight to check	23:35.00
	we have school open house that I have to head to in a bit	23:35.14
Robin_Watts	Do you agree that the code as written originally by marti is wrong ?	23:35.25
mvrhel2	No I think it is valid. He is doing a rounding operation with his addition of 7fff and then normalizing with the division	23:36.20
	I just don't think you want to do a division there	23:36.33
	but a shift	23:36.35
	and I need to make sure I am not mucking up the sign bit	23:36.49
Robin_Watts	Humour me.	23:36.56
	We have a table of values v[i]	23:37.11
	and we want to find a value at a position i.f.	23:37.33
	So we take v[i] + (v[i+1]-v[i]).f	23:38.02
	Now in our representation f = rx, which is how many 65536ths we are between i and i+1.	23:38.46
	Therefore the linear interpolation step should be:	23:38.59
	v[i] + (v[i+1]-v[i]).rx/65536	23:39.17
	i.e. it should be >>16 NOT /65535	23:39.30
	(regardless of rounding)	23:39.46
	Does that make any sense?	23:40.58
	I believe marti's code would be right if we wanted f == 65535 to mean 'just give me the value at i+1'.	23:42.08
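The two readings of rx can be put side by side in a short C sketch. It is a one-dimensional illustration, not the tetrahedral lcms routine: it assumes 16-bit table entries with v1 >= v0 and a 16-bit weight rx, and the function names are hypothetical.

```c
#include <stdint.h>

/* rx counts 65536ths of the way from v0 to v1, i.e. the interval [0..1).
 * Assumes v1 >= v0 so the difference is non-negative. */
static uint16_t lerp_shift(uint16_t v0, uint16_t v1, uint16_t rx)
{
    uint32_t rest = (uint32_t)(v1 - v0) * rx;
    return (uint16_t)(v0 + ((rest + 0x8000u) >> 16));
}

/* rx counts 65535ths, so rx == 65535 means "the value at i+1".
 * Same assumption: v1 >= v0. */
static uint16_t lerp_div(uint16_t v0, uint16_t v1, uint16_t rx)
{
    uint32_t rest = (uint32_t)(v1 - v0) * rx;
    return (uint16_t)(v0 + (rest + 0x7fffu) / 0xffffu);
}
```

With v0 = 0, v1 = 65535 and rx = 65535, lerp_div lands exactly on v1, while lerp_shift stops one step short of it, which is the behaviour expected if rx never reaches 1.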
mvrhel2	yes. but let me check something real quick	23:42.09
	yes. 65536	23:43.39
	I think you are correct	23:43.48
	the code as written is wrong	23:43.55
Robin_Watts	Cool, then we have the perfect excuse to break it :)	23:44.27
mvrhel2	when I get back tonight, I will force some "known" values through the code to double check	23:44.34
Robin_Watts	excellent. I'm heading to bed. Have a good one, and we can talk tomorrow. Night.	23:45.00
mvrhel2	actually my bmpcmp shows diffs	23:45.11
	I mean will show us the diffs	23:45.22
	Robin_Watts: ok have a good night	23:45.29
	marcosw_ will love this commit.	23:46.06
	lots of changes	23:46.09
	so far all appear minor	23:48.32