Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

		2014/04/17
ray_laptop	I don't know about the cut 532 code, but comparing 8.71 to 9.14, the "Tf" PDF operator uses 1,150,052 interpreter steps with 9.14 compared to 60,297 with 8.71 !!!	01:11.22
	thing is, that's only once, but it IS once per page	01:12.32
mvrhel_laptop	ray_laptop: wow	02:22.06
aditsu	hi, I tried to convert an xps document to pdf using gxps, and some words are overlapping; mupdf renders the xps correctly though; then I tried to convert the document to pdf using mudraw, but it failed	04:33.40
	how can I convert it correctly?	04:35.10
	huh, it seems that I can convert to svg.. then I could convert from svg to pdf page by page and join them... but there must be a better way	04:36.46
kens	aditsu without seeing your XPS file there's no way we can comment on your problem. If you think you've found a bug please report it, but be aware that XPS -> PDF doesn't get a huge priority.	07:08.15
aditsu	kens: the xps file is somewhat confidential, I'll check if I can reproduce the problem another way; also, mudraw's error was "pdf device supports only base 14 fonts currently", does that help?	07:12.02
	the file seems to use some Chinese fonts	07:12.48
	warning: not building glyph bbox table for font 'PMingLiU' with 34046 glyphs	07:13.00
kens	Not really, it just means that the XPS file contains a font outside of the base 14 set, the current PDF output in MuPDF can't deal with embedded fonts I think	07:13.07
aditsu	ok	07:13.23
kens	A Chinese font would definitely not work with that code	07:13.44
	chrisl got a weird one overnight, not sure what to do about it, got a moment to chat ?	07:25.19
chrisl	kens: sure, give me a minute to finish an email	07:26.23
kens	No rush, I'm still poking it gingerly with a pointy stick	07:26.45
chrisl	kens: Okay, mail sent......	07:29.55
kens	OK bug is 695167 if you grab the simplified file from there	07:30.13
	And I'll babble while you do.	07:30.24
	The file has a type 11 (TT outlines) CIDFont TKJEFD+ArialMT which has a single descendant also called TKJEFD+ArialMT (now there's a bad plan to start with)	07:31.36
	The descendant has a CIDToGIDMap (object 14) which has a length of 0.....	07:32.02
	buildfont11 throws a fit over that	07:32.25
	Or at least, I think it's that	07:32.35
	The actual code line is:	07:32.53
	code = font_string_array_param(imemory, op, "CIDMap", &rcidmap);	07:32.53
	line 394 in zfcid.c	07:33.04
	I'm assuming that CIDToGIDMap gets munged into CIDMap somewhere along the way ?	07:33.30
	My first thought was to ignore the error, but the font is actually used in the course of the job (goodness knows what actually gets printed by Acrobat, I suspect nothing, or a notdef)	07:34.29
chrisl	Erm, I can't remember,,,,	07:34.42
	BTW, zfcid.c only seems to have about 90 lines in it	07:35.14
kens	The CIDToGIDMap is probably buried somewhere in the PDF interpreter	07:35.16
	Sorry, zfcid1.c	07:35.30
	Hmm, OK if I ignore the 'error' then the job renders	07:36.21
	And I do indeed get a TrueType /.notdef (hollow square)	07:36.38
chrisl	That's not surprising	07:37.00
kens	Well, I thought we might get an error further on when it tried to access the CIDMap	07:37.19
	What's the rune for not rendering TT notdef ?	07:37.44
	ah got it	07:38.11
chrisl	I can never remember the capitalisation..... -dRenderTTNotdef, or -dRENDERTTNOTDEF	07:38.44
kens	all caps	07:38.48
	Hmm, still rendering a notdef	07:39.28
	Oh, so does Acrobat :-)	07:39.41
chrisl	So that goes some way to confirming my heuristic for that :-)	07:40.02
kens	:-)	07:40.07
chrisl	So, question is, are we rendering the "same" notdef....?	07:41.00
kens	Well I can cut the file down further to see.	07:41.20
	I'll get started on that. Thanks chrisl, I'll have a patch for you to comment on later.	07:42.00
chrisl	I might be more informative to run the entire customer file	07:42.01
	s/I/It	07:42.11
kens	I'll do that too, but let me cut it down to just the broken font.	07:42.23
	BTW the entire customer file is huge	07:42.40
	34 inches by 22	07:42.49
chrisl	They always are from that customer	07:43.13
kens	Yes, and the file claims to be produced by Acrobat 10.1.9	07:43.33
chrisl	Hmm, and probably hacked around by something else, like the last one they sent...	07:44.03
kens	I'd have to guess so, yes, I don't believe Acrobat would create a file with a font with an empty CIDToGIDMap	07:44.29
	and rendering a notdef	07:44.35
chrisl	I just wonder if an empty CIDToGIDMap should be replaced with an explicit identity map, or just ignored	07:45.11
kens	I don't think we can tell from this file	07:45.32
	The only glyph used in the font is FFFF which looks quite likely to be a notdef anyway	07:45.50
chrisl	Well, in that case, ignoring the error seems a sane way to go	07:46.28
kens	I think so, I'm going to test the returned array and see if its length is 0, if it is I'll ignore the error code and continue	07:46.54
	If we ever get a file like this which renders a real glyph we may need to revisit it	07:47.11
	Surprisingly I can't see any actual sign that this file has been meddled with after production	07:47.53
chrisl	There wasn't much evidence of it in the last one either, except for very un-Acrobat like constructs in the contents	07:53.33
kens	Yes, but in this case there's no bonkers stuff in there, it's very clean	07:53.52
chrisl	An invalid CID font is pretty bonkers......	07:54.14
kens	I guess it depends if Acrobat thinks a 0 length map is 'legal'	07:54.34
	It definitely seems to be the one glyph in this font that's causing the notdef	07:56.00
	I've removed all the other content and it's still there	07:56.10
	And the original file now runs as well	07:56.44
chrisl	Oh, it can be a name - I would bet Acrobat is replacing the zero length stream with /Identity	07:57.35
kens	That's plausible	07:57.53
chrisl	Actually, the entry is optional, so what does the PDF interpreter do when it's missing? That would seem a reasonable first pass	07:59.17
kens	It's optional ? Hmm	07:59.38
chrisl	According to table 5.14 in the 1.7 PDFRM	08:00.14
kens	I'm just removing it now	08:00.26
	OK it biold down to doing exactly the same as my pach :-)	08:01.21
chrisl	"Default value: Identity."	08:01.34
kens	if you can understand that mangled sentence	08:01.36
	I guess I could try and fix it in the PDF interpreter instead....	08:02.26
	I could drop zero length CIDToGIDMap	08:02.57
chrisl	My worry is ignoring it in the C code might cause problems with Postscript Type 11 fonts	08:03.17
kens	I think I'd be happier with that actually, as it means that we will throw an error yeah what you said	08:03.27
chrisl	It should be easy - /processCIDToGIDMap	08:03.45
kens	I'm looking at it	08:03.53
chrisl	Although, once again, why the hell would you implement that in Postscript.....??	08:04.37
kens	"Because we can" :-(	08:04.50
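
A minimal sketch of the fallback discussed above, written in Python purely for illustration (the real logic lives in the PostScript PDF interpreter and is not shown in this log). It assumes the PDF 1.7 layout of a CIDToGIDMap stream, two bytes per CID with the high-order byte first, and treats a missing or zero-length map as Identity, which is chrisl's guess at what Acrobat does. The function name `cid_to_gid` and the choice to return glyph 0 for a CID beyond the end of the table are assumptions made for this example.

```python
from typing import Optional


def cid_to_gid(cid: int, cid_to_gid_map: Optional[bytes]) -> int:
    """Map a CID to a glyph index (hypothetical helper, not Ghostscript code).

    A missing map (None) or a zero-length stream (b"") is treated as the
    Identity mapping, matching the documented default for the entry.
    """
    if not cid_to_gid_map:
        return cid  # Identity: glyph index equals the CID

    offset = 2 * cid
    if offset + 1 >= len(cid_to_gid_map):
        # Assumption for this sketch: CIDs past the end of the table
        # fall back to glyph 0, which is conventionally .notdef.
        return 0

    # Two bytes per CID, high-order byte first.
    return (cid_to_gid_map[offset] << 8) | cid_to_gid_map[offset + 1]
```

With this model the only glyph used in the problem font, CID FFFF, maps to glyph index 0xFFFF under Identity; whether that index exists in the TrueType outlines is a separate question, and a notdef is what both renderers ended up showing.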
chrisl	kens: How would you feel about having /resolveR store the object number in any dictionaries it retrieves?	08:15.04
kens	Sounds reasonable, do you need it for something ?	08:15.19
	Have to give the key a reasonably unique name of course	08:15.31
chrisl	This stuff for customer 532 - if we want to cache (even non-embedded) fonts between pages, I need a way to validate what we retrieve from the cache	08:16.15
kens	OK might be useful in future for identifying different fonts with the same name also, I'm not sure I remember why we want that	08:16.57
	OK I think I have a fix in the PDF interpreter, time for a cluster run	08:18.25
chrisl	We have at least one bug where the file uses, IIRC, two different font subsets with the same name on the same page, and we get that wrong	08:18.41
kens	That sounds familiar yes	08:18.53
chrisl	Also, if we can have something other than the font name to check, we could cache even embedded fonts between pages	08:19.17
kens	Which would be nice I guess	08:19.31
	Coffee, back in a minute	08:19.42
	Hmm, the cluster 'termination', ie the time taken to exit after a test, seems faster since marcos reworked it.	08:35.54
chrisl	As in the job timeout?	08:37.59
kens	Yes, it used to 'stick' for ages after the jobs were 100% completed and all nodes idle	08:38.20
chrisl	It's probably not waiting for the one minute prompt now	08:39.20
kens	Well, not sure what the difference is, but I'm happy to have it :-)	08:39.38
Robin_Watts	spanners: You got a mo to answer some silly picsel font questions? (please feel free to say no if you're busy/disinclined etc)	09:52.34
henrys	anybody know about needing an agreement with "EMC" to use RSA? Sounds crazy.	12:21.15
kens	We used RSA before at 5D I think	12:21.34
	don't remember an agreement with anyone else. Unless RSA is owned by EMC now ?	12:21.50
Robin_Watts	RSA was purchased by EMC in 2006.	12:24.33
	The RSA encryption method itself was patented, but that expired in 2000.	12:26.19
henrys	but openssl uses RSA and surely they haven't needed to sign an agreement with emc	12:26.28
Robin_Watts	"RSA Security", the company, have produced other products.	12:26.58
	To use those you presumably need an agreement with EMC.	12:27.15
kens	eg BSafe	12:27.24
Robin_Watts	but to use the original RSA algorithm, why would you need an agreement?	12:27.34
	henrys: What is leading you to think that use of the RSA algorithm requires a license?	12:28.02
henrys	yeah that sounds right, we have a customer that needed to sign with EMC for their Adobe RSA stuff and now they want to know if they need to do the same for MuPDF's usage of RSA	12:28.44
kens	I would say no then	12:28.59
	I'd bet Adobe uses BSafe or something similar	12:30.39
tor8	henrys: hate to nag, but it's been 3 weeks since we got the domain... when are we going to get access to manage it?	12:42.19
henrys	tor8:I'll poke him again	12:42.55
tor8	henrys: whois lists miles as the owner, and the registrar as enom.com	12:43.22
	so I'm hoping the login details to enom.com are sitting around in someone's inbox.	12:44.11
henrys	tor8: last time I pinged him - April 11 he said he "missed the auto-confirm" and it timed out - so he wrote back and hadn't heard anything.	12:46.06
	tor8: I sent a reminder again.	12:53.41
paulgardiner_lap	tor8: does MuPDF use rsa other than for signature support?	12:57.44
tor8	paulgardiner_lap: no, all we use openssl for is what you did.	13:10.13
	we still have our own md5 and rc4 and aes functions for encrypted pdf documents.	13:10.31
	Robin_Watts: 692708 is a bummer... any suggestions for how to fix it?	13:11.10
Robin_Watts	looking	13:11.16
	Presumably we are hitting errors in the parsing and ignoring them?	13:12.24
paulgardiner_lap	henrys: did the customer just notice openSSL amongst our thirdparty libraries or do they need signature support? If the former, another option may be to build without it (although we have yet to make that trivial in the make files).	13:12.55
Robin_Watts	IIRC the cookie allows us to set whether we ignore errors or not.	13:12.56
	and there is a count of the errors we've hit in the cookie.	13:13.12
	Possibly there is a max_errors as well where we can bale out if we hit that number?	13:13.30
tor8	Robin_Watts: looks like we're spending all our time in lex_white	13:14.12
Robin_Watts	OK, so the cookie has "errors", but not "errors_max"	13:14.49
tor8	it might be necessary to add the compression bomb detection to the base stream layer	13:14.49
	(gdb) p *csi->cookie	13:15.14
	$2 = {abort = 0, progress = 1, progress_max = -1, errors = 0, incomplete_ok = 1, incomplete = 0}	13:15.14
	counting errors won't help with this file	13:15.33
Robin_Watts	So, this is a file with a deliberately crafted compression bomb in it?	13:16.15
tor8	yup	13:16.55
Robin_Watts	Can we improve the speed of lex_white so it doesn't matter?	13:17.20
tor8	I'm not super concerned, but it would be nice if we could fail nicely	13:17.22
	I doubt it, it's the lzw decompression underlying it that's the culprit	13:17.36
Robin_Watts	So when you say the time is in lex_white, you mean it's in the lzw decompression called from lex_white ?	13:18.30
tor8	yeah. lex_white just indicates that it's reading a lot of whitespace	13:18.46
Robin_Watts	Can we improve the speed of the lzw decompression? :)	13:19.06
tor8	considering that it at current takes a minute or two to decode, I doubt we can make it that much faster :)	13:19.33
	but at least it completes now	13:19.42
	we could close as WONTFIX, compression bombs just make us hang for a long time	13:20.21
Robin_Watts	That would be my temptation.	13:21.02
	cos someone could construct a real file with lots of whitespace and then complain when we don't read it.	13:21.23
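
A minimal sketch of what "compression bomb detection in the base stream layer" could look like, in Python for illustration only. It is not MuPDF code: the file in question uses LZW, and Python's standard library has no LZW decoder, so zlib stands in here. The function name `bounded_inflate` and the `max_output` parameter are hypothetical. The sketch also shows the objection raised above: any fixed limit will reject a legitimate file that really does contain that much whitespace.

```python
import zlib


class DecompressionLimitExceeded(Exception):
    """Raised when a stream expands past the configured output limit."""


def bounded_inflate(data: bytes, max_output: int, chunk: int = 65536) -> bytes:
    """Inflate `data`, failing early once more than `max_output` bytes appear.

    Hypothetical helper: zlib is used as a stand-in for the LZW filter
    discussed in the log.
    """
    decoder = zlib.decompressobj()
    out = bytearray()
    buf = data
    while not decoder.eof:
        # Ask for at most `chunk` bytes so a bomb cannot expand all at once.
        piece = decoder.decompress(buf, chunk)
        buf = decoder.unconsumed_tail
        out += piece
        if len(out) > max_output:
            raise DecompressionLimitExceeded(
                f"stream expanded past {max_output} bytes"
            )
        if not piece and not buf:
            break  # input exhausted before end of stream (truncated data)
    return bytes(out)
```

A caller would pick `max_output` from some policy (for example a multiple of the compressed size), which is exactly the judgment call that made WONTFIX tempting.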
tor8	Robin_Watts: done.	14:23.14
ray_laptop	tor8: henrys: That's the same thing Miles told me, but it didn't make sense -- the thing that was no longer valid was the verification of email address. Miles did it (once) so it went away. The name is now registered to Miles	14:41.00
	tor8: henrys: I specifically told Miles that what we needed was the login for enom.com so we could modify the stuff there (or transfer to somewhere else)	14:41.54
	tor8: henrys: He said he was going to call them.	14:42.27
henrys	ray_laptop: well I sent him another message. Maybe he can hand it off to one of us, I know he is busy with other stuff.	14:43.30
ray_laptop	henrys: yes, once we get the login info. But in order to give that to Miles, they probably want some personal ID (such as the credit card info he used) or something. Otherwise, I could just call and say I'm Miles.	14:45.47
*ray_laptop*	didn't try that yet	14:45.58
henrys	ray_laptop: well he should come in soon and see my mail and we'll go from there.	14:47.18
	paulgardiner_lap: was that mupdf bug from raed ever assigned I can't find the damn thing now.	14:58.04
	paulgardiner_lap: nvm I found it.	15:02.48
ray_laptop	lots of folks showing up here recently. I wonder what's so interesting to most of them ? Not that I mind working with an audience :-)	15:10.41
chrisl	ray_laptop: the idea of making fonts persist between PDF pages is looking like a major project - everything in the font handling assumes local VM :-(	15:15.00
ray_laptop	chrisl: Thanks for looking. Are there any hacks that we can use for built-in fonts that we can use to avoid purging the cache for those (and find them on subsequent pages) ?	15:21.17
chrisl	ray_laptop: yes, but that won't help with this job, as it doesn't use a "built-in" font, as such....	15:22.07
ray_laptop	chrisl: we don't care about some of the font that may have been modified (such as Widths), just the bitmap as it has been rendered at a particular size	15:22.28
chrisl	ray_laptop: the problem is, this isn't using a base-14 font	15:23.18
ray_laptop	chrisl: yes, this _does_ use built-in font (or at least the font lookup machinery has selected a built-in substitute for TimesNewRomanPSMT)	15:23.22
chrisl	TimesNewRomanPSMT is not a base 14 font	15:23.47
ray_laptop	chrisl: i.e., it is NOT a font embedded in the PDF. And we map it to a base-14 equivalent	15:24.18
chrisl	It's still not using a base-14 font directly, we are substituting a base-14 font for the requested font	15:25.08
ray_laptop	chrisl: NimbusRomNo9L-Regu with regular gs, TimesNewRoman in the UFST case	15:25.36
chrisl	ray_laptop: the job is using TimesNewRomanPSMT	15:25.55
ray_laptop	chrisl: it is _using_ a built-in	15:25.57
chrisl	TimesNewRomanPSMT is NOT a built-in font!!	15:26.14
ray_laptop	chrisl: the job requested TimesNewRomanPSMT	15:26.16
	chrisl: but after substitution, as far as the font machinery knows, we are using the UFST TimesNewRoman (or NimbusRomNo9L-Regu) -- I'm not sure it even knows what the requested font was	15:27.17
Robin_Watts	ray_laptop: That may be true for the graphics engine. I don't know that it's true for the PDF interpreter ?	15:27.57
chrisl	So we get a substituted font. When we substitute a font, we blow away the UID for the "base" font we started with - without the UID we have no way to know that the next font with that name is the same font with that name, so we purge its entries from the cache	15:28.11
ray_laptop	At the PS FontDir level, yes, it's been put there as TimesNewRomanPSMT, but not down in the graphics lib	15:28.12
	chrisl: so the font cache is based on the UID ?	15:28.57
	so all we have to do is keep that somehow ?	15:29.15
chrisl	ray_laptop: persistence in the font cache is partially based on the UID, yes	15:29.16
	We can't keep it once we've manipulated the font	15:29.34
	We can't synthesize a small caps font, and keep the UID of the base font we started with	15:30.13
	For example	15:30.23
ray_laptop	chrisl: we could keep the original UID under a different key / struct element	15:30.30
chrisl	ray_laptop: that doesn't help us, we'd still risk false positives	15:31.09
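
A toy model of the behaviour chrisl describes, in Python for illustration only. It is not how the Ghostscript font cache is implemented; every class and method name here (`Font`, `GlyphCache`, `font_finalized`) is hypothetical. The assumption being modelled is simply this: cache entries for a font with a valid UID survive the font being freed, while entries for a font without a UID are purged, because the name alone cannot prove that the next font is the same one.

```python
import itertools

_instance_ids = itertools.count(1)


class Font:
    """Hypothetical font record: a name, an optional UID, an instance number."""

    def __init__(self, name, uid=None):
        self.name = name
        self.uid = uid
        self.instance = next(_instance_ids)

    def cache_key(self):
        # With a UID, two separately loaded copies share cache entries.
        # Without one, entries are tied to this particular instance.
        if self.uid is not None:
            return ("uid", self.uid)
        return ("instance", self.instance)


class GlyphCache:
    """Hypothetical glyph cache keyed by (font key, glyph, size)."""

    def __init__(self):
        self.entries = {}

    def render(self, font, glyph, size):
        """Return True on a cache hit, False if the glyph had to be rendered."""
        key = (font.cache_key(), glyph, size)
        if key in self.entries:
            return True
        self.entries[key] = f"bitmap:{font.name}:{glyph}@{size}"
        return False

    def font_finalized(self, font):
        """Called when a font is freed, e.g. by the per-page restore."""
        if font.uid is None:
            dead = font.cache_key()
            self.entries = {
                k: v for k, v in self.entries.items() if k[0] != dead
            }
```

In this model a substituted TimesNewRomanPSMT whose UID was zapped misses the cache again on every page, while the same font with its UID kept would hit. It also shows why stashing the original UID under another key risks false positives: anything that reuses that UID after the glyphs or metrics were altered would pick up stale bitmaps.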
ray_laptop	chrisl: we don't ever synthesize an all caps or small caps font that I know of.	15:31.16
chrisl	ray_laptop: oh yes we do......	15:31.28
ray_laptop	chrisl: really -- when/where does that happen ?	15:31.59
chrisl	ray_laptop: pdf_font.ps line ~900	15:32.34
ray_laptop	I vaguely recall logic to create fake italic using skee	15:32.38
	s/skee/skew/	15:32.44
kens	It's a common enough trick, the font is always ugly because it's not italic	15:33.51
chrisl	ray_laptop: this is part of the problem - these bits of font synthesis code happen in several places, and each place makes one or more copies of the font dictionary - working out when and where the UID became invalid could be a nightmare.....	15:34.40
ray_laptop	chrisl: what's a key I can search for -- I don't see it near line 900	15:34.51
kens	Not to mention potentially breaking pdfwrite which itself makes multiple copies of fonts and merges them (sometimes)	15:35.11
ray_laptop	(that's readtype1dict in my code)	15:35.13
chrisl	/Flags oget 16#20000	15:35.13
ray_laptop	kens: I am looking for a hack for cust 532 -- pdfwrite is NOT an issue	15:35.38
kens	ray_laptop : then it becomes something special we have to maintain :-(	15:35.55
chrisl	I suppose if we just change the font name, we don't need to zap the UID.... I wonder if we copy the dictionary anywhere else relevant	15:40.59
	ray_laptop: one thing did occur to me when you pointed out how many more iterations through the interpreter loop Tf makes now......	15:41.33
*ray_laptop*	waits anxiously ...	15:41.53
chrisl	in pdf_font.ps there is code which compares the average width of glyphs in the font with the average width in the Widths array - that is probably not relevant for cust 532	15:42.38
kens	What does it do that for ? Seems like an odd thing to do	15:43.12
*kens*	expects chrisl to tell me I added it....	15:43.34
chrisl	If the width of the glyphs is greater than the widths in the array, we scale the font down so the glyphs don't collide	15:43.58
ray_laptop	chrisl: and we probably didn't do that in 8.71	15:44.21
kens	Hmm, presumably only with substituted fonts ?	15:44.22
chrisl	It was me that added it, actually, <hangs head in shame> - yes, only substituted fonts.	15:44.51
ray_laptop	kens: well, we did substitute TimesNewRoman for TimesNewRomanPSMT :-)	15:44.52
kens	ray_laptop : yes I know, I was thinking more generally about that code	15:45.07
chrisl	ray_laptop: we did not do the width matching in 8.71	15:45.07
	It's a horrid hack to get around not having multiple-master fonts for substitution...	15:46.02
kens	Yes, it seems like a reasonable solution for font substitution	15:46.19
chrisl	It's just rather clunky doing it in Postscript..... but neither my testing, not the cluster showed a measurable performance deficit, so it seemed reasonable.	15:47.23
	s/not/nor	15:47.31
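
A rough sketch of the width-matching idea described above, in Python for illustration only. The actual code is PostScript in pdf_font.ps and its exact formula is not quoted in this log, so the ratio used here is an assumption: compare the average advance width of the substitute font's glyphs with the average of the PDF's Widths array and scale down only when the substitute is wider. The function name `substitution_scale` is hypothetical.

```python
def substitution_scale(glyph_widths, pdf_widths):
    """Return a scale factor <= 1.0 for a substituted font.

    glyph_widths: advance widths of the substitute font's glyphs.
    pdf_widths:   the Widths array from the PDF font dictionary.
    Hypothetical helper; the real heuristic may differ in detail.
    """
    if not glyph_widths or not pdf_widths:
        return 1.0  # nothing to compare, leave the font alone

    avg_glyph = sum(glyph_widths) / len(glyph_widths)
    avg_pdf = sum(pdf_widths) / len(pdf_widths)
    if avg_glyph <= 0 or avg_pdf <= 0:
        return 1.0

    # Only ever shrink: wider substitute glyphs would otherwise collide.
    return min(1.0, avg_pdf / avg_glyph)
```

Even a computation this small costs many interpreter steps when written in PostScript and run on every Tf, which fits the step counts ray_laptop reported.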
kens	Yeah but as I understand things, our vanilla code doesn't show a performance hit between the two versions (8.x and 9.x) either.	15:48.07
ray_laptop	chrisl: I also didn't see an overall performance difference between 8.71 and 9.14 on this file, but that was on my laptop.	15:48.25
	kens: right	15:48.32
chrisl	Which is why I thought it might be something worth trying for Len	15:48.45
ray_laptop	but they have a painfully slow CPU	15:48.48
	chrisl: do you have their 906 code base ?	15:49.29
chrisl	ray_laptop: no, I don't	15:49.39
ray_laptop	chrisl: or if I send you their pdf_font.ps, can you tell me where to change it ?	15:49.58
	or just do it in the HEAD 9.14 code and I can back port it	15:50.16
chrisl	ray_laptop: can you look in the current master pdf_font.ps?	15:50.23
ray_laptop	I already have it open (looking at the small-caps hack)	15:50.43
chrisl	Line 777 should be a comment "% Some non-compliant files are missing FirstChar/LastChar,"	15:51.01
ray_laptop	chrisl: yes	15:51.14
chrisl	Good, then you can delete, comment out, whatever, everything down to line 809: "} ifelse"	15:52.01
henrys	chrisl: is it possible they have a smaller amount of memory dedicated for caching - are we getting the same cache hit rate?	15:52.04
ray_laptop	henrys: the cache hit rate is fine	15:52.47
kens	Ray's email seemed to indicate a pretty good hit rate to me	15:52.49
chrisl	henrys: it's possible, I can't remember if they change the cache size. I've yet to see evidence the glyph rendering is actually slower	15:53.06
	ray_laptop: so that code in pdf_font.ps should not have changed (probably just the exact line numbers) since the 9.06 code	15:54.06
ray_laptop	chrisl: in the HEAD pdf_font.ps line 809 is a "{" following a line that has: dup length dup 0 gt	15:55.14
chrisl	Sorry, misread it: line 845	15:55.48
aditsu	kens: hi again, I have a test file I can provide (xps that doesn't get converted to pdf correctly), should I put it online somewhere or email it or file a bug report and attach it?	15:56.01
kens	aditsu ideally open a bug report please	15:56.24
ray_laptop	the ifelse corresponding to the 3 index /FirstChar knownoget test _IS_ line 845, so that makes more sense	15:56.36
aditsu	alright	15:56.50
kens	If the file is private let me know and I'll mark it so only Artifex folk can download it	15:56.51
chrisl	ray_laptop: sorry, I'm suffering from a bit of hayfever, and it's making my eyes itchy and watery	15:57.24
ray_laptop	chrisl: they checked and the glyph rendering speed is the same as on the 871 code	15:57.27
aditsu	this one is not really private	15:57.32
kens	OK no problem then	15:57.39
ray_laptop	my left eye has been that way for months :-/	15:57.49
Robin_Watts	ray_laptop: Has that improved at all?	15:58.13
ray_laptop	chrisl: thanks. I'll test it and then send it off to Len to try -- and I WILL credit (blame) you :-)	15:58.35
	Robin_Watts: just recently (in the last week or so) I've seen some improvement in the eyelid responsiveness	15:59.22
Robin_Watts	excellent!	15:59.30
*ray_laptop*	agrees wholeheartedly	15:59.48
chrisl	ray_laptop: hah! Thanks (I think). As I said, given that they have multiple-master substitution, that kind of width matching shouldn't be needed in their code	15:59.57
henrys	congrats ray_laptop	15:59.58
ray_laptop	henrys: congrats for what -- just waiting somewhat impatiently ? ;-)	16:00.40
	henrys: but I know what you mean	16:00.56
henrys	ray_laptop: has anybody generated call counts - profile for the gs 906 with ufst? I don't see that in the emails.	16:01.26
chrisl	ray_laptop, kens: How about doing something with the PDF object number to "create" a UID for a substitute font?	16:03.06
ray_laptop	henrys: the AQtime (from the simulator) has call counts that are (AFAICT) trustworthy. email 4/15	16:03.40
kens	chrisl I was thinking of something like that when you mentioned the resolve_object stuff this morning	16:03.40
	I'd need to think about the impact of that with pdfwrite though. It's 'probably' safe	16:04.18
	Back in a minute	16:04.25
chrisl	kens: Yes, that wasn't how I was planning to use it, but it might be a decent alternative	16:04.25
henrys	ray_laptop: right but where is the profile for gs running on a host with the same job?	16:04.25
aditsu	kens: bug 695168	16:06.35
ray_laptop	I am looking at one large mismatch in call counts -- looks like there are 216,792 calls to zgetdeviceparams vs. 684 on 8.71	16:06.40
	henrys: I just spotted that this AM	16:07.18
chrisl	Crumbs, that's probably a chunk of the time right there!	16:07.59
henrys	ray_laptop: I just thought profile 8.64 and 9.06 customer next to 8.64 and 9.06 host based is going to tell us what happened.	16:08.31
ray_laptop	chrisl: on the simulator (which is DEBUG build, so it also includes validating refs) it accounts for 11 seconds out of 413.	16:09.39
	henrys: it's 8.71, but maybe. I have to do that on linux since profiling won't work on my laptop (neither VerySleepy or VS Performance tools)	16:10.51
	henrys: and profiling on an x86 is a LOT different to their puny CPU. I did consider profiling on the Raspberry to get closer to their performance	16:11.52
chrisl	And that probably won't help much if the extra time is coming from the PDF interpreter	16:12.13
kens	aditsu, OK I've reassigned it to me	16:12.22
henrys	well we probably just want call counts	16:12.29
	more reclaim?	16:12.38
ray_laptop	chrisl: right, it doesn't help identifying where the difference is coming from	16:12.40
kens	ray_laptop : because we (I) changed the way that device detection works we call getdeviceparams a lot in the PDF interpreter	16:13.05
ray_laptop	henrys: I didn't understand that	16:13.07
aditsu	thanks for checking :)	16:13.22
kens	THanks for the report, I can't promise I'll get to it soon though :-(	16:13.37
chrisl	kens: does pdfwrite do anything special with private XUIDs?	16:13.38
ray_laptop	kens: OK. I thought we only did that once and set a flag	16:14.00
kens	chrisl off the top of my head, I don't think so, but I can't be 100% certain, I haven't meddled with fonts recently	16:14.05
	ray_laptop : no, we do it every time we need to know whether a given device has a particular capability	16:14.22
henrys	ray_laptop: well if the slowdown is from GC, say I would expect more calls to the vmreclaim - that should be in the call count for the profile.	16:14.32
ray_laptop	kens: that seems ... non optimal	16:14.43
kens	ray_laptop : possibly	16:14.53
	But it means that we can change devices in the job and not get caught out	16:15.06
ray_laptop	kens: seems like we could collect the things we need to know from a single getparams and set flags for the pertinent characteristics	16:15.23
	kens: how can we change devices while parsing a PDF ???	16:15.48
kens	We could, yes, or we could do it the way I wanted originally, and use the device special ops	16:15.55
	ray_laptop : while we use it extensively in the PDF interpreter, we could also do it in PostScript	16:16.29
ray_laptop	kens: well, I didn't think you'd need to call getparams more than once	16:16.32
	kens: for PS, I don't care as much (right now)	16:16.48
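
A small sketch of ray_laptop's suggestion, in Python for illustration only: query the device once, keep the answers as flags, and re-query only if the device changes, which covers kens's concern about switching devices mid-job. Nothing here is Ghostscript API; `DeviceCapabilities`, `get_params` and the capability names are hypothetical, and detecting a device change by comparing an identifier is an assumption of this sketch.

```python
class DeviceCapabilities:
    """Cache capability flags per device (hypothetical helper).

    get_params: callable taking a device identifier and returning a mapping
                of capability name to value. It stands in for one expensive
                getdeviceparams-style query.
    """

    def __init__(self, get_params):
        self._get_params = get_params
        self._device_id = None
        self._flags = {}
        self.queries = 0  # how many times the expensive query actually ran

    def has(self, device_id, capability):
        """Return True if the current device reports `capability`."""
        if device_id != self._device_id:
            # Device changed (or first use): refresh every flag in one query.
            self._flags = dict(self._get_params(device_id))
            self._device_id = device_id
            self.queries += 1
        return bool(self._flags.get(capability, False))
```

The trade-off is the one raised above: the cache is only safe if every path that can change the device is seen by the check, whereas querying each time is slower but cannot go stale.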
kens	Well, obviously, but the code was written a long time ago now	16:17.03
ray_laptop	kens: it's nearing the end of your day, so I'll take a look at revamping the getparams in the PDF interpreter. I'll let you know when I quit for the day if I still need help on it (so you can work while I sleep)	16:18.34
kens	All the same, it seems like a lot of calls to be caused by that, but then 684 seems like a lot anyway	16:18.52
	ray it's pretty easy to find all the places it gets used	16:19.27
ray_laptop	kens: yeah, but on the simulator that amounts to only 0.27 sec / 489 (instead of 11 sec / 413)	16:20.09
kens	Sorry don't follow the figures there	16:20.34
chrisl	I wonder why we feel compelled to blow away the UniqueID when we change just the FontName......	16:20.42
ray_laptop	kens: those are from their AQtime profiler results	16:20.59
kens	Then they still aren't clear to me, sorry	16:21.14
chrisl	I thought 9.06 was slower?	16:21.44
ray_laptop	the 684 calls to getdeviceparams on 871 code base is only 0.027/489 of the time, vs. 11/413 in 906 code	16:22.07
	that's why the AQtime is only relative	16:22.23
kens	So fewer calls but it takes more time ?	16:22.26
chrisl	But 489 is higher than 413......	16:22.41
*kens*	truly doesn't understand these numbers.....	16:23.05
ray_laptop	chrisl: ignore that -- it includes LOTS of simulator specific code and DEBUG code and may even be run on different hosts	16:23.35
chrisl	Oh, useful as ever :-(	16:23.53
ray_laptop	only call counts and relative timings are useful (AFAICT)	16:23.59
	and since it's on x86 something, it doesn't exactly correspond to the target anyway	16:24.35
	which is why I didn't dive into it until looking at other sources of info to try and find out what's going on	16:25.28
	kens and chrisl: thanks for the ideas. I'm going to work on them. If you come up with anything to improve the cache usage, let me know. If you want immediate response SMS since I may not be paying attention to IRC or email	16:27.28
kens	I'm off out shortly, won't be back till tomorrow morning. I'll look at the IRC log then	16:28.01
chrisl	ray_laptop: I'm going to see if there's a sane way we can preserve the UID when we don't actually change the glyphs in the font....	16:28.25
ray_laptop	chrisl: I just looked. Their 906 code doesn't have that small-caps stuff in it. It must have gone in later	16:30.30
chrisl	ray_laptop: you mean the width matching?	16:30.59
	ray_laptop: both the width matching and the small-caps code were in the 9.06 release - I checked before mentioning it to you	16:32.19
ray_laptop	chrisl: their code doesn't have the /Flags oget 16#20000 code in it	16:33.18
chrisl	ray_laptop: It's definitely in the 9.06 release - I'm looking at it right now!	16:33.53
ray_laptop	so either they don't have 906 or they ripped it out already	16:33.54
kens	OK I have to go, night all	16:34.12
chrisl	ray_laptop: what about the block above that for the Widths?	16:34.22
ray_laptop	kens: g'nite	16:34.25
chrisl	kens: 'nite	16:34.27
ray_laptop	chrisl: checking...	16:34.35
chrisl	ray_laptop: in the release code, it starts at line 726 with "3 index /FirstChar oget"	16:35.39
ray_laptop	chrisl: nope. It looks like that's where they have all of their MMFont stuff	16:37.23
chrisl	ray_laptop: oh well, sorry :-(	16:37.41
ray_laptop	chrisl: that's OK. Thanks anyway.	16:38.10
chrisl	So, I've hacked it so the UID remains, despite the metrics being changed, and I'm still getting the same number of calls to FAPI_do_char() :-(	16:39.08
ray_laptop	They have captured their findfont times and those haven't changed, but I may still have to look into their Tf differences	16:39.12
	chrisl: the purge was being invoked from 'restore' (font_finalize) so unless that is fixed, the cache will disappear	16:40.12
chrisl	ray_laptop: no, there is a specific check to see if the font has a valid UID - if the UID is valid we shouldn't purge the cache, even though the font is disappearing	16:41.01
ray_laptop	and we have to make sure that the cache itself is in stable memory (or non_gc memory)	16:41.22
chrisl	It should at least be in global memory	16:41.54
ray_laptop	because each page has a save/restore bracketing it	16:41.59
	now, we might be able to just get rid of that save/restore. I'll try that	16:42.44
chrisl	No, we really can't do that!	16:43.35
	Hmm, something is still zapping the UID before we actually use the font......	16:44.04
ray_laptop	chrisl: why can't we get rid of the save/restore ?	16:45.26
chrisl	ray_laptop: you don't just want to keep accumulating objects in VM for the entire file	16:45.58
	ray_laptop: and I rather assume the save/restore is there for a reason, and not just for decoration!!!	16:46.26
ray_laptop	chrisl: OK, right. Objects that are "resolved" and actually stored in a dict somewhere won't go away. But Fonts are about the only large thing that does that, and those ARE kept across pages	16:51.25
chrisl	ray_laptop: but you'll run into the same problems as I described about putting fonts in global VM - we can't rely on the font name to distinguish between different fonts	16:52.21
ray_laptop	We don't ever save images or contents , which are the other large things, so anything unused will be picked up by the GC	16:52.26
chrisl	ray_laptop: with the current setup, you cannot reliably retain fonts between pages, whatever the mechanism for achieving that might be	16:54.22
ray_laptop	chrisl: we don't care about the font -- just the font cache, which is always using the same base font	16:55.15
chrisl	ray_laptop: look, take my word for it, you can't do it safely.	16:55.59
	ray_laptop: we have jobs in our test suite that use a built-in font on one page, and an embedded font on the next page - both with the same FontName. If you preserve the font across the page, we'll use the font loaded on the first page, and get it wrong	16:58.42
henrys	ray_laptop: on linux 8.71 is much slower than 9.06 - maybe they have it backwards ;-)	17:03.26
ray_laptop	chrisl: The embedded font on the next page won't have the same UID, right ?	17:24.45
	chrisl: or are you concerned with not doing the save/restore	17:25.22
chrisl	ray_laptop: You don't select fonts with the UID	17:25.29
	ray_laptop: the problem is, because we're bound to Postscript we only identify fonts (in the interpreter) by font name.	17:27.37
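The name-collision problem chrisl describes can be illustrated with a toy model. This is only a sketch, not Ghostscript's font machinery: `FontCache`, `load_font`, `restore`, and the font name `ExampleFont` are hypothetical names invented for illustration. The one assumption taken from the discussion is that lookup is keyed by FontName alone.

```python
# Toy model: a cache keyed only by FontName, as in the discussion above.
# All names here are hypothetical; this is not Ghostscript code.

class FontCache:
    def __init__(self):
        self._by_name = {}

    def load_font(self, font_name, source):
        # A hit on the name returns whatever was loaded first,
        # even if the page asks for a different font program.
        if font_name not in self._by_name:
            self._by_name[font_name] = source
        return self._by_name[font_name]

    def restore(self):
        # Stand-in for the per-page save/restore: drop everything.
        self._by_name.clear()


pages = [
    ("ExampleFont", "built-in"),   # page 1 uses the built-in font
    ("ExampleFont", "embedded"),   # page 2 embeds a font with the same name
]


def render(pages, restore_between_pages):
    cache = FontCache()
    used = []
    for name, source in pages:
        used.append(cache.load_font(name, source))
        if restore_between_pages:
            cache.restore()
    return used


kept = render(pages, restore_between_pages=False)
fresh = render(pages, restore_between_pages=True)
print(kept)   # ['built-in', 'built-in']  (page 2 gets the wrong font)
print(fresh)  # ['built-in', 'embedded']
```

With the cache kept across pages, page 2 silently reuses the page-1 font, which is the failure mode described for the test-suite jobs.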
ray_laptop	chrisl: the Tf complexity probably is what you identified -- the interpreter debug I had was not for their simulator, but was for the standalone code, so it would have your Widths adjustment stuff	17:28.59
	I'm re-running the capture of the 'I' output using the 906 simulator	17:29.36
chrisl	ray_laptop: the PDF font substitution isn't terribly well thought out - we never remove the original font after creating the substitute, so that precludes us from benefitting from the UID cache optimisations for substituted fonts.....	17:31.51
	ray_laptop: how general does the performance increase have to be?	17:35.44
Mikkadu	Hello. I'm working on PCL driver. It seems to work on old printers, but it does not on the new one. Seems like I've to use ESC*g#W command, but I cannot find its description. Anybody knows where I can find it?	17:41.32
Robin_Watts	Mikkadu: Sorry, can you be more explicit?	17:58.57
	You're working on a PCL device for ghostscript?	17:59.08
	so that ghostscript can feed PCL devices?	17:59.25
	Or are you using one of the existing PCL output devices in ghostscript and finding that there are some printers it does not drive?	18:00.03
	Or is this a question utterly unrelated to ghostscript?	18:00.27
Mikkadu	Robin_Watts: Ghostscript works great in my case. My boss wants me to reproduce it =(	18:03.26
	I found a bug report in ghostscript	18:03.55
	related to the same issue	18:04.02
	http://bugs.ghostscript.com/show_bug.cgi?id=694082#c2	18:04.40
	This one	18:04.44
	So Hin-Tak Leung 2013-06-05 14:17:13 PDT	18:05.15
Robin_Watts	So you are writing your own PCL generator, nothing to do with ghostscript.	18:05.27
Mikkadu	Yees	18:05.38
	It has to work on Android	18:05.46
Robin_Watts	henrys is the PCL expert. He may be able to point you at some documentation. You'll have to wait for him to see the question.	18:06.02
Mikkadu	ok thx you a lot	18:06.29
ray_laptop	chrisl: sorry -- I was away. We don't need a totally general improvement. Cust 532 only runs PDFs and only to clist with a custom CMYtag device. As far as other files, we don't have any performance needs identified (yet), but we don't want to do something that might slow down other files	18:11.18
chrisl	ray_laptop: I just thought that adding an explicit mapping for TimesNewRomanPSMT and TimesNewRomanPS-ItalicMT might be an option?	18:12.29
	ray_laptop: adding an explicit map in the FCOFontmap for the two fonts saves between 2.5 and 4 seconds on the two problem PDFs on my machine	18:14.09
ray_laptop	chrisl: WOW. Is that with their simulator, or with regular gs UFST build ?	18:15.56
chrisl	ray_laptop: that's with a regular build on Linux	18:16.25
ray_laptop	chrisl: what was the entire 50 page time ? (i.e. what percentage did it save)	18:17.34
	chrisl: and are you running -sDEVICE=ppmraw -dUseFastColor -dMaxBitmap=0 -dBandHeight=128 -o /dev/null	18:18.21
chrisl	ray_laptop: no, let me rerun the tests with those options	18:18.42
	What resolution?	18:19.06
ray_laptop	chrisl: and with -Z: the rendering time can be ignored (between Outputpage start and end)	18:19.13
	chrisl: sorry: -r600 -Z: -sDEVICE=ppmraw -dUseFastColor -dMaxBitmap=0 -dBandHeight=128 -o /dev/null	18:19.42
chrisl	Okay, running now	18:20.01
	Is FinalTime good enough?	18:22.11
henrys	Mikkadu: which printers are you generating output for?	18:23.35
Mikkadu	HP Deskjet Ink Advantage 3525	18:23.58
chrisl	ray_laptop: so FinalTime "normal" is 18.9493, FinalTime with the explicit mapping is 15.4757 for WWTTN1CT_PDF_1_7.pdf	18:24.42
ray_laptop	chrisl: Final time is fine (it'll include rendering time, but so what)	18:24.43
	chrisl: WOW!!! That's >20% and more than I expected. Are you sure that the output is still correct ???	18:25.54
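As a quick check on the size of the saving, the two FinalTime values quoted above can be compared directly. This is plain arithmetic on the numbers from the log; the variable names are made up for the example.

```python
# FinalTime values quoted above for WWTTN1CT_PDF_1_7.pdf (seconds).
normal = 18.9493   # without the explicit mapping
mapped = 15.4757   # with the explicit FCOFontmap mapping

saved = normal - mapped
pct_of_normal = 100 * saved / normal   # saving as a share of the original run
pct_of_mapped = 100 * saved / mapped   # saving relative to the faster run

print(f"saved {saved:.4f} s")
print(f"{pct_of_normal:.1f}% of the original time")
print(f"{pct_of_mapped:.1f}% relative to the faster time")
```

Measured against the original run the saving comes out a little under 20%; it is above 20% only when expressed relative to the faster time.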
henrys	Mikkadu: that's HP PCL 3 GUI; I'm not up on that as much as Hin-Tak is. But what is the problem?	18:26.35
Mikkadu	So it uses an unknown format of ESC*g#W: 0x06. I've found a description only for 0x02 =(	18:26.46
	HP's driver generates same format as Hin-Tak does.	18:27.17
chrisl	ray_laptop: the output seems okay, it's using the same fonts, so it should be okay	18:27.37
	ray_laptop: the difference here is almost certainly that, because we're not using the PDF interpreter's "general purpose" font substitution, we actually get cached glyphs persisting between pages	18:29.16
marcosw_	Mikkadu: The ESC*g#W sequence is an RTL command. The only documentation I've been able to find is on this page: <http://www.undocprint.org/formats/page_description_languages/pcl>	18:30.06
henrys	Mikkadu: you're ahead of me; I don't have 0x02 or 0x06 or know what that means. I see Hin-Tak's patch, which suggests there are 20 bytes of data for the command.	18:30.41
Mikkadu	Yes and me too, but it does not match Hin-Tak's commit http://bugs.ghostscript.com/show_bug.cgi?id=694082#c2	18:30.48
chrisl	ray_laptop: if you can put the 9.06 sim somewhere I can download it, I'll try it properly tomorrow - I'm struggling with the hayfever now	18:31.25
Mikkadu	The first byte of the command is Format. I've found descriptions for format 0x02.	18:33.04
	When I print with HP's driver it generates	18:33.23
	0x06 0x1f 0x00 0x02 0x02 0x58 0x02 0x58 0x09 0x00 0x01 0x01 0x02 0x58 0x02 0x58 0x20 0x0A 0x01 0x20 0x01	18:33.28
	So, format is 0x6	18:33.40
	and length is 20	18:33.47
	According to the documentation marcosw_ and I found on <http://www.undocprint.org/formats/page_description_languages/pcl>	18:34.20
	length can be only 8 or 24	18:34.30
	and format can be 0x2	18:34.51
	In CUPS it is hard coded as 0x02	18:35.05
	So, I don't know what to do or where to go. Please save my soul =(((((((	18:36.05
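For anyone wanting to inspect the dump pasted above, a small sketch that splits off the format byte and counts the rest. It assumes only what the messages state, namely that the first byte is the format; it does not interpret any other field, and the log does not settle whether the stated length of 20 includes the format byte.

```python
# Byte dump copied from the message above (as pasted in the channel).
dump = ("0x06 0x1f 0x00 0x02 0x02 0x58 0x02 0x58 0x09 0x00 0x01 0x01 "
        "0x02 0x58 0x02 0x58 0x20 0x0A 0x01 0x20 0x01")

data = bytes(int(tok, 16) for tok in dump.split())

fmt = data[0]     # first byte is the format, per the message above
rest = data[1:]   # everything after the format byte, left uninterpreted

print(f"format = 0x{fmt:02x}")
print(f"total bytes = {len(data)}, bytes after format = {len(rest)}")
print(rest.hex(" "))
```

As pasted, the dump holds 21 values: the 0x06 format byte followed by 20 more bytes.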
marcosw_	I can't find this command in the official HP RTL documentation at all, otoh I'm not at home so don't have access to any of the paper copies.	18:37.07
henrys	Mikkadu: who do you work for?	18:37.36
marcosw_	I did find a reference saying this command "is not supported by the HP DesignJet 4xx, 7xx, 1xxx, 2xxx, 3xxx or ColorPro printers".	18:37.47
Mikkadu	hm	18:38.05
	CDC =)	18:38.09
ray_laptop	chrisl: BTW, the customer simulator Tf interpreter cycles are the same between 871 and 906 (both 143980)	18:38.32
Robin_Watts	Centre for Disease Control ?	18:38.38
Mikkadu	=)	18:38.50
henrys	and you're producing PCL ?	18:38.55
	;-)	18:38.56
Mikkadu	hm...	18:39.02
	yes why not	18:39.06
	=)	18:39.07
chrisl	ray_laptop: that's odd.....	18:39.17
henrys	Mikkadu: industry joke sorry	18:40.20
*henrys*	searches pcl docs	18:41.34
ray_laptop	chrisl: I'll double check, but they've hacked the font lookup in both	18:41.37
Mikkadu	it's 2340 and I want to go home =( Please guys if you will find something please send me email misha.zador@gmail.com =(	18:42.30
	henrys Thank you a lot	18:42.43
ray_laptop	chrisl: the count isn't EXACTLY the same. 906: 143377, 871: 143980	18:43.10
chrisl	ray_laptop: as I said before, we're doing an awful lot in stopped contexts now, which aren't exactly cheap....	18:43.26
henrys	Mikkadu: CDC in asia?	18:44.43
Mikkadu	no	18:44.53
	in Russia )	18:44.56
ray_laptop	chrisl: I'm on the hunt for the getdeviceparams now. the counts for page 1 for those are 871:17, 906:4259	18:45.56
Robin_Watts	chrisl: Random thought... is it possible that we could reduce some of the time taken by the stopped contexts by using some custom postscript operators?	18:46.27
ray_laptop	as I was discussing earlier with kens	18:46.27
chrisl	Robin_Watts: not really, they are mostly to catch errors during parsing the streams	18:47.21
ray_laptop	clearly this is a customer that would REALLY benefit from a PDF parser in C !!!	18:47.27
Robin_Watts	I don't know postscript well enough to have a clear view of this, but how many PS instructions does it take to setup/recover from a stopped context?	18:48.07
chrisl	Robin_Watts: I'm not sure about Ghostscript's implementation - the issue is more the recovery: on an error, PS just stops, so cleaning up after that can be complex	18:49.22
henrys	Mikkadu: I'm not seeing anything; drop a note to Hin-Tak	18:50.10
ray_laptop	Robin_Watts: in terms of execution it is not much overhead (unless there was an error, which is NOT the usual case)	18:50.10
Robin_Watts	chrisl: Maybe I'm talking utter rubbish here, so before I say anything else, can you point me at an example please?	18:50.16
	ray_laptop: I was thinking purely in terms of reducing the number of times we pass through the interpreter loop.	18:50.35
Mikkadu	henrys how can I do it?	18:52.25
	email?	18:52.41
henrys	become a member of bugzilla and add a question to the bug	18:53.16
chrisl	Robin_Watts: actually, looking through the pdf interpreter, it's not as bad as I thought....	18:53.19
Mikkadu	Ok thx	18:53.30
chrisl	Robin_Watts: however, I think anything like that would be effort better put to mooscript	18:54.01
Robin_Watts	essentially: X becomes { X } stopped { stuff to clean up X } if ?	18:54.43
ray_laptop	chrisl: both 871 and 906 execute ".internalstopped" 792 times on page 1	18:55.23
	Robin_Watts: correct, except the clean up is mostly done by the interpreter	18:55.47
chrisl	Yes, exactly. Generally, you have to be fairly careful cleaning up the stack(s) afterwards, but it seems less necessary in the PDF interpreter	18:55.51
ray_laptop	the interpreter takes care of restoring the stacks	18:56.11
chrisl	No, it doesn't	18:56.21
	At least, not always	18:56.40
Robin_Watts	{ X } { stuff to clean up X } .stoppedif	18:56.42
ray_laptop	chrisl: right, sorry. Not always	18:56.55
	Robin_Watts: and what's the difference ?	18:57.14
Robin_Watts	That would save us 1 time around the interpreter loop.	18:57.32
	but for 800 executions it's not worth it.	18:57.58
ray_laptop	Robin_Watts: agreed	18:58.05
Robin_Watts	I thought we were talking much higher numbers.	18:58.17
chrisl	Yeh, I am surprised there's not a more significant difference between the two code bases, but there you go.....	18:58.51
ray_laptop	as chrisl said, it's mostly when we run streams to handle unexpected EOF	18:58.57
Robin_Watts	Also, we could maybe do some smart stuff by binding the cleanup code and then do: {X} cleanup .stoppedif	18:59.11
Latarian	has anybody seen an issue printing fillable form PDFs with mac	18:59.12
	Unable to convert to postscript file	18:59.19
ray_laptop	this file only has the main "Content" stream	18:59.24
Robin_Watts	Less stack manipulation going on there.	18:59.28
	But again for 800 executions we're never going to gain much.	19:00.21
ray_laptop	Robin_Watts: procedures get 'packed' and bound when loading the PS file.	19:00.40
Robin_Watts	oh, right. didn't know that. So we'd gain nothing with the latter idea.	19:01.06
ray_laptop	Robin_Watts: so the interpreter steps are "push packed array" stopped "push packed array" if	19:01.31
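The step counting in this exchange can be tallied as follows. This is a rough sketch that only counts the tokens named above; `.stoppedif` is the operator proposed in the discussion and is assumed here to be hypothetical, and the model ignores whatever work happens inside each operator.

```python
# Interpreter steps per guarded call, following the token sequences
# described above. ".stoppedif" is the proposed (hypothetical) operator.
current = ["push packed array", "stopped", "push packed array", "if"]
proposed = ["push packed array", "push packed array", ".stoppedif"]

executions_page1 = 792   # .internalstopped count reported for page 1

saved_per_call = len(current) - len(proposed)
saved_page1 = saved_per_call * executions_page1

print(f"steps saved per guarded call: {saved_per_call}")
print(f"steps saved on page 1: {saved_page1}")
```

One step saved per call over roughly 800 calls is consistent with the conclusion that the change would not be worth making.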
chrisl	Okay, I'm going to have to finish - I'm struggling to focus. ray_laptop as I said, if you want me to try out the font substitution thing tomorrow, I'll need the new simulator	19:02.12
Robin_Watts	chrisl: Are you working tomorrow?	19:02.38
chrisl	I was planning to, yes	19:02.46
	I don't want to spend hours sat in Good Friday traffic.....	19:03.17
ray_laptop	chrisl: right. I'll clean my build and send it up for you. I'll email tech with the location (in case anyone else wants it)	19:03.21
	there is traffic on Good Friday (more than usual Friday)? Here it is less	19:03.52
chrisl	ray_laptop: okay, thanks.	19:04.00
	ray_laptop: yeh, not so much local, but out of town: day trippers, long weekenders etc - and I'm on one of the main routes down to the west country, where a lot of people go for those kinds of jaunts	19:05.05
mvrhel_laptop	bbiab	19:41.36
ray_laptop	Patch to collect device characteristics at the start of the page GREATLY reduced the number of getdeviceparams calls (factor of 200). I'll apply it to HEAD and regression test and let kens review it tomorrow	21:39.50
	I need an automated fork to dig through all the spaghetti code so I can find the meatballs :-)	21:40.55
	as I said before, this customer (532) is a perfect example of where a PDF parser in C would help a LOT !	21:41.52
	maybe tor is ready to work with ghostscript again ;-)	21:42.24
henrys	ray_laptop: I must have done something wrong my 9.06 code was 2x faster than 8.71	21:42.52
ray_laptop	henrys: yeah, I was thinking that. Once I get the patches applied to HEAD and a regression running, I may give it a whirl on some linux box	21:44.02
	or maybe build it for the pi	21:44.15
	henrys: note that I am only looking at the parsing times, since their target printing times were almost the same. I use -Z: output and a little awk script to collect the parsing and rendering times separately	21:46.27
	-Z: gives the Outputpage times even for a release build	21:47.02
	henrys: I just re-ran the tests on my laptop (with HEAD -- none of my "improvements") and I get 2.99 seconds for all 50 pages with 8.71 and 5.00 seconds with HEAD.	22:01.40
	The command line args I used were: -q -Z: -sDEVICE=ppmraw -o /dev/null -r600 -dUseFastColor -dBandHeight=128 -dMaxBitmap=10000 WWTN1.pdf	22:02.44
	I have to run an errand. BBIAB	22:06.27