Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

	2012/10/19
paulgardiner	Robin_Watts: ping	11:34.11
Robin_Watts	pong	11:34.15
paulgardiner	When a field changes in a way that will affect its appearance, I need to repaint the relevant areas of pages where they appear. Fields aren't associated with a page, but they have widget annotations as children that are specific to a page. Given an annotation, I need to find what page it's on efficiently.	11:37.38
	I don't want to search through the page's tree of annotations.	11:38.06
	Annotations can have a reference to the page, but it's optional. I'm thinking I should add a reference if it doesn't already exist when we process the annotations	11:38.48
	That shouldn't break anything, should it?	11:39.39
Robin_Watts	Let me restate that to see if I've followed it...	11:40.06
paulgardiner	That would be appreciated. Maybe then I'll understand it. :-)	11:41.53
Robin_Watts	A given field can appear on one or more pages by having a widget annotation on that page. The only way a field can appear on a page is by having such a widget annotation	11:41.53
	Every annotation can only appear on a single page.	11:42.28
paulgardiner	Yes, I believe so.	11:42.54
Robin_Watts	Annotations can optionally have a reference to the page that they are on, and you propose to make this compulsory by making such a reference if there isn't one already existing.	11:43.10
	Assuming that every annotation can only appear on a single page, that sounds fine.	11:43.30
paulgardiner	Yes, although now I'm concerned that it might still be difficult to derive the page number	11:43.43
Robin_Watts	sainsburys delivery. back in a mo, sorry.	11:43.51
paulgardiner	np	11:43.57
kens	I'm not certain that an annotation can appear on only one page	11:55.53
	because each page has an Annots entry	11:56.24
paulgardiner	Hmmm, I'd better check. Not sure what made me think it so.	11:57.56
Robin_Watts	back.	11:58.41
	yeah, that's vaguely my worry.	11:58.46
	If every page has a list of the annotations that are on it of the form: /Annots [24 0 R 25 0 R 26 0 R] etc...	11:59.15
paulgardiner	Annotations that have a page-object reference presumably appear on one page.	11:59.15
Robin_Watts	it would be possible for several pages to refer to the same annotation (e.g. 24 0 R)	11:59.40
paulgardiner	Somewhere I picked up the idea that fields could appear on multiple pages, and they did so by having multiple annotations, each appearing on a single page	12:00.02
Robin_Watts	Page 148 of the 1.7 spec shows /Annots [23 0 R 24 0 R] as an example	12:00.46
paulgardiner	Ah. Bottom of Page 605	12:01.35
	A given annotation dictionary may be referenced from the Annots array of only one page. Attempting to share an annotation dictionary among multiple pages produces unpredictable behavior	12:02.06
Robin_Watts	OK, perfect.	12:02.13
	That sounds like exactly the restriction you need to be able to pull your trick.	12:02.33
paulgardiner	But there's another problem. The current plan lets me get the page object, but I need the page number	12:02.50
Robin_Watts	Presumably, it's the P entry you are looking at ?	12:02.54
paulgardiner	Yep	12:03.04
Robin_Watts	You can go from page object -> page number.	12:03.22
paulgardiner	That was my hope, but I haven't found it yet	12:03.47
Robin_Watts	by doing a linear search of xref->page_refs and doing pdf_to_num on each.	12:03.49
paulgardiner	I was hoping to avoid searches	12:04.52
Robin_Watts	You could add a PageNum entry to that dict?	12:05.06
	but it's not nice.	12:05.21
paulgardiner	It's looking like the only reasonable way	12:05.36
	I already add a "Dirty" entry temporarily to fields.	12:06.01
Robin_Watts	paulgardiner: Or you could build a mapping from object number -> page.	12:06.10
	and hold it in the xref.	12:06.28
tor8	pdf_lookup_page_number(xref, obj)	12:06.36
paulgardiner	tor8: Does that already exist?	12:06.52
Robin_Watts	tor8: Ah, right, that's the encapsulation of the linear search ?	12:06.58
tor8	it exists. and it's a wrap of a linear search. any improvements should go in there :)	12:07.17
	like building a sorted array of obj-num to page-num mappings that can be binary searched	12:07.39
paulgardiner	Ok. I think the fact that that already exists implies I should use it rather than the trick of adding an entry. This isn't horrendously time critical anyway.	12:08.23
	Must the obj be a page reference? Or would that work with an annotation? I'm guessing the former.	12:08.53
tor8	it must be a page reference indirect object	12:09.19
	it compares against the references used in the page tree	12:09.33
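A minimal sketch of the improvement tor8 suggests above: a sorted array of object-number to page-number mappings that can be binary searched. This is not MuPDF code. The names `page_map_entry`, `page_map_build` and `page_map_lookup` are hypothetical, and the input is assumed to be a plain array holding the object number of each page reference, indexed by page number.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record: pairs a page reference's object number
 * with its zero-based page number. */
typedef struct {
    int obj_num;
    int page_num;
} page_map_entry;

/* Order entries by object number (used by both qsort and bsearch). */
static int cmp_entry(const void *a, const void *b)
{
    const page_map_entry *ea = a;
    const page_map_entry *eb = b;
    return (ea->obj_num > eb->obj_num) - (ea->obj_num < eb->obj_num);
}

/* Build a map sorted by object number. page_obj_nums[i] is assumed to be
 * the object number of page i. Returns NULL on allocation failure;
 * the caller frees the result. */
page_map_entry *page_map_build(const int *page_obj_nums, int page_count)
{
    page_map_entry *map;
    int i;

    assert(page_obj_nums != NULL && page_count > 0);
    map = malloc(sizeof(*map) * (size_t)page_count);
    if (map == NULL)
        return NULL;
    for (i = 0; i < page_count; i++) {
        map[i].obj_num = page_obj_nums[i];
        map[i].page_num = i;
    }
    qsort(map, (size_t)page_count, sizeof(*map), cmp_entry);
    return map;
}

/* Binary search for obj_num; returns the page number, or -1 if the
 * object number does not belong to any page. */
int page_map_lookup(const page_map_entry *map, int page_count, int obj_num)
{
    page_map_entry key;
    const page_map_entry *hit;

    key.obj_num = obj_num;
    key.page_num = 0;
    hit = bsearch(&key, map, (size_t)page_count, sizeof(*map), cmp_entry);
    return hit ? hit->page_num : -1;
}
```

The map costs one sort when the page list is loaded, after which each lookup is O(log n) rather than a linear scan. It would need rebuilding if pages were inserted or deleted.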
paulgardiner	So I still need to add "P" entries to the annotations that don't already have them.	12:10.05
	?	12:10.12
Robin_Watts	Yes, but that's much less nasty, as you're simply filling out an optional bit of the spec.	12:10.47
paulgardiner	Unless I give in completely, and add pdf_lookup_annotations_page?	12:11.03
	Hmm, that would be slow though because it would have to look through the annotation lists of every page.	12:11.45
	... but could be sped up with the right data structure	12:12.28
	Robin_Watts: yes, it isn't really nasty at all, I guess.	12:13.49
	Thanks. I'll battle on.	12:14.05
	Presumably 10,000 page documents don't tend to be forms anyway.	12:15.27
*Robin_Watts*	could imagine an O'Reilly book with a form in the back for "tell me about updates to this book" etc?	12:18.16
paulgardiner	I'll ignore that. :-)	12:23.51
Robin_Watts	I think that should work with your planned "fill in the P option" plan.	12:27.19
	http://www.printercomparison.com/default.asp?newsID=1509	13:30.56
	I'm all for supporting a range of products, but 42 new laser printers?	13:31.15
chrisl	kens: I have to go out, and I almost certainly won't be back for 4 o'clock.......	13:39.26
kens	OK chrisl no problem	13:39.40
chrisl	If I do get back at a vaguely sensible time, I'll give you a call	13:40.05
paulgardiner	Robin_Watts: there's a few commits on paulg/master if you have a moment.	13:40.22
Robin_Watts	ok.	13:40.56
paulgardiner	Robin_Watts, tor8: I still haven't sorted out this ensuring page references are present in annotations.	13:53.44
	What looks like the natural place to do it has the page object in the form of a dict. The P entry is supposed to be an indirect reference. Is there a lookup function for that?... I should look really, I guess.	13:55.18
Robin_Watts	xref->page_objs and xref->page_refs are kept for exactly this reason.	13:56.51
	Where are you in the code?	13:56.59
paulgardiner	Ah right	13:57.16
	Line 410 of pdf_page.c	13:58.02
Robin_Watts	In pdf_load_page?	13:58.27
paulgardiner	I haved the pageref	13:58.30
	have	13:58.35
Robin_Watts	(My code has changes in that file)	13:58.37
paulgardiner	It's ok. The ref is already in a variable	13:58.49
Robin_Watts	Right, so the pageref is what you want.	13:58.51
paulgardiner	Yep, thanks.	13:59.26
tor8	paulgardiner: pdf_lookup_page_number takes the indirect reference	14:01.39
paulgardiner	tor8: ah right. So another reason I should make sure it's the ref I put in the "P" entry.	14:05.28
kens	Robin_Watts : ping ?	14:18.06
Robin_Watts	pong	14:18.12
kens	Can you help me understand a log entry ?	14:18.22
Robin_Watts	I can try.	14:18.28
kens	My regression test has:	14:18.46
	The following 2 regression file(s) have started producing errors:	14:18.46
	tests_private/comparefiles/446-01.ps.pdf.pkmraw.300.0 gs pdfwrite inches miles Error_reading_Ghostscript_produced_PDF/PS_file	14:18.46
	So I look in the logs for miles ?	14:19.02
Robin_Watts	I think so.	14:19.12
kens	OK well if I do that I don't see an error, so I'm puzzled....	14:19.24
	===tests_private__comparefiles__446-01.ps.pdf.pkmraw.300.0===	14:19.34
	gs pdfwrite	14:19.34
	./gs/bin/gs -sOutputFile=./temp/tests_private__comparefiles__446-01.ps.pdf.pkmraw.300.0.pdf -sDEVICE=pdfwrite -r300 -sDEFAULTPAPERSIZE=letter -dNOPAUSE -dBATCH -dClusterJob -dJOBSERVER - < ./tests_private/comparefiles/446-01.ps	14:19.34
	GPL Ghostscript 9.07 (2012-07-31)	14:19.34
	Copyright (C) 2012 Artifex Software, Inc. All rights reserved.	14:19.34
	Ooops	14:20.04
	Don't know how much of that you saw before I got kicked off	14:20.15
Robin_Watts	kens: OK. I have the same log.	14:20.28
kens	But I see the report here:	14:20.35
	http://ghostscript.com/cgi-bin/clustermonitor.cgi?log=log&machine=miles&report=ken	14:20.35
Robin_Watts	Note that there are 2 entries for each pdfwrite.	14:20.47
	First you have the pdfwrite step, which as you say completes with no error.	14:20.57
kens	So there are, the first is the pdfwrite conversion I guess	14:21.10
Robin_Watts	Then you have another step which is it reading the pdfwritten thing, and writing the pkmraw output.	14:21.20
kens	Ah, and the second does indeed fail	14:21.20
Robin_Watts	And that... yeah.	14:21.24
kens	OK well I don't see that locally so I guess I'd better try it with the command line, thanks	14:21.40
Robin_Watts	no worries.	14:21.46
kens	Hmm, well that does reproduce it, something to do with the environment then	14:25.22
Robin_Watts	tor8, kens, sebras, anyone else...	14:54.41
	When we'd looked at hints tables before, we'd decided that nothing uses them, right?	14:54.59
sebras	Robin_Watts: yeah, that's what kens said about acrobat at least.	14:55.13
Robin_Watts	How then can you know what order pages go in?	14:55.17
	It's easy to know the first page to use for a file.	14:55.26
	Do we then assume that you don't display any more pages until the whole lot has arrived?	14:55.39
	(or at least you only display blank pages, but the right number of blanks)	14:55.55
	(possibly of the wrong size)	14:56.06
kens	Robin_Watts : I don't really think anything uses the hints at all, so you can only use page 1	14:56.14
sebras	well, what kens said was that whether the hints stream was bogus or not didn't affect the "optimized" state according to acrobat.	14:56.18
Robin_Watts	Right, so if we DID use the hints table we could display subsequent pages as we go.	14:56.48
kens	One of the acrobat implementation notes says that, although technically the 'first' page need not be ordinal page 1, that's the only way Acrobat writes it.	14:56.53
	Robin_Watts : Technically I believe we could but I would want to reread the spec (again) before committing myself.	14:57.13
	Also, I wouldn't count on being able to do it from the Acrobat output, which is even worse than GS's	14:57.30
sebras	Robin_Watts: as I understand it that must be the point of the hints table, no..?	14:57.40
Robin_Watts	sebras: Indeed.	14:57.55
kens	I guess my point is that you can't rely on this stuff being correct. Also, there's another implementation note about compressed objects which basically says 'can't use hint streams then'	14:58.21
	And since most PDF files from Adobe apps use compressed objects....	14:58.37
sebras	this is just the case of badly implemented generators, not a bad spec, right? at least that is my understanding.	14:58.49
Robin_Watts	Actually, even with hint streams how do I find what number object is the page object?	14:59.21
sebras	kens: objects in object streams, or just objects with compressed stream contents?	14:59.34
kens	The spec is OK (if daft, opaque, hard to understand and harder to implement), the 'problem' is that Acrobat Distiller (and by implication other Adobe products) doesn't follow the spec well.	14:59.45
	Robin_Watts : the headers tell you that stuff IIRC	15:00.06
Robin_Watts	The top level dict tells me the object number for page 1.	15:00.23
kens	sebras object streams	15:00.24
	Robin_Watts : hang on I'll go and get the spec out again....	15:00.35
sebras	kens: are object streams really that common? and also, will the adobe apps really generate object streams when they are asked to generate linearized pdfs!? if that's an implementation limitation that seems really strange.	15:01.53
Robin_Watts	I can find "the first object number" for any given page by looking at item 1 in the page offset hint table and accumulating.	15:02.04
kens	Yes, that's it Robin_Watts	15:02.23
	I believe the first object for the page is the page object	15:02.35
Robin_Watts	But I can't find which entry within the block of objects is the page number.	15:02.43
kens	There is no page number	15:03.00
Robin_Watts	Where in the spec does it say the first element has to be the page object? (It may do, I can't find it)	15:03.01
	is the page object, sorry	15:03.10
kens	I believe item 1 for the page is the page object	15:03.22
*kens*	is still reading	15:03.57
Robin_Watts	Ah, got it. You're right.	15:04.19
	So if I use the hint tables, I can get pages out early.	15:04.37
kens	Assuming they are correct.	15:04.59
Robin_Watts	indeed.	15:05.06
kens	You can test our linearisation :-)	15:05.08
	I bet GS isn't right	15:05.17
henrys	if colorado is a swing state 4 years from now I'm leaving the state for the election period.	15:07.00
*Robin_Watts*	can see the headlines now: "colorado swings"	15:07.57
	henrys: We're sick of the election coverage over here, and it's not even our election. I can only imagine how you feel.	15:08.53
	Surely it's a simple choice though "another 4 years of underachievement" vs "ABSOLUTELY BATSHIT CRAZY!"	15:09.18
henrys	to sick the entire U.S. campaign machinery on a few relatively small states is too much.	15:11.41
	replacing the electoral college with a popular vote would spread the hell around evenly	15:16.33
Robin_Watts	But that'll never happen.	15:17.18
kens	OK, one more try....	15:17.46
	hopefully no errors this time	15:17.57
*Robin_Watts*	loses an hour looking for a bug that turns out to be = instead of ==.	16:05.45
	kens: Where did you see the implementation note that said "compressed objects -> no hint streams" ?	16:17.21
kens	PDF reference manual, give me a minute	16:17.42
	p1025	16:21.43
Robin_Watts	I have something on page 1040	16:22.03
kens	For files containing object streams, hint data can specify the location and size of the object streams only (or uncompressed objects), not the individual compressed objects	16:22.03
	Robin_Watts : p1040 that's the one	16:22.51
Robin_Watts	ok.	16:23.38
	So that doesn't say "no hint streams". It just says "hint streams even more broken".	16:24.00
	Presumably you have to assume that entries for pages won't be shared within a single compressed stream.	16:24.49
kens	essentially useless	16:24.53
Robin_Watts	so the first compressed object in a stream is the page object.	16:25.06
	Oh, but you can't know what objects are in a compressed stream, without the xref.	16:25.19
	gah. Useless :(	16:25.25
kens	what I said was 'can't use hint streams'	16:25.26
	Time to be off, night all	16:42.20
Robin_Watts	tor8: Well, I have a first version of progressive file loading working.	17:19.58
	I've added a -b option to mupdf that lets you give it a bps figure, and it then simulates the chosen file arriving at that many bps.	17:20.44
	So if I: mupdf -b 409600 pdf_reference17.pdf I first off get an error box telling me "not enough data to open the file". Then I click "OK" and it tries again, and I get another error box telling me "not enough data to count pages". Then I click OK again a few times, and that goes away and I get page 1 (possibly with no fonts).	17:22.14
	If I navigate to pages I haven't got yet I get blank pages.	17:22.38
	Then when the whole file arrives, it loads properly and I can navigate between properly sized/filled in pages.	17:23.19
sebras	Robin_Watts: what prompts the file reader to progress?	18:08.29
Robin_Watts	currently when you change page and reload.	18:11.24
sebras	Robin_Watts: ok, so it's not the dialog-box-clicking in the scenario outlined above?	18:16.10
Robin_Watts	sebras: Basically, when we try to read beyond the end of the data we have currently got, it throws a TRYLATER exception.	18:17.01
	I've modified the app so that when it's initially trying to open, it just puts a dialogue up and retries.	18:17.41
	Once we have got far enough that the page object for the first page is loaded, we can then at least start to display something, and no more dialogues.	18:18.22
	I'm working to the idea that if the app asks us to do something, either we should say "not yet", or we should say "here is the best I can do", or "Done".	18:19.49
	"not yet" is achievable by the TRYLATER exception.	18:20.08
	"here is the best I can do" can be done by the cookie coming back with an 'incomplete, try later' flag set.	18:20.30
	and Done is normal exit.	18:20.38
	I guess I should add a mechanism for the app to keep polling to say "is it worth me retrying yet?"	18:21.05
	at the moment whenever we try to load a page, if the pageobject is NULL I call pdf_progressive_advance	18:23.15
	and that 'gobbles' more objects from the file (akin to doing a repair), inserting them into xref as it goes.	18:23.38
	For the polling mechanism I could call the same function, and then give a return code based on whether we pass a significant point (i.e. another page loaded, or end of file reached or something).	18:24.33
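A toy model of the three outcomes ("not yet", "here is the best I can do", "Done") and of the polling idea, where an advance call reports whether a significant point was passed. This is not the MuPDF patch. `fake_doc`, `fake_advance` and `fake_render` are hypothetical names, and byte counts stand in for real object parsing.

```c
#include <assert.h>
#include <stddef.h>

/* The three answers the library can give the app. */
typedef enum { LOAD_NOT_YET, LOAD_PARTIAL, LOAD_DONE } load_status;

/* Hypothetical stand-in for a document that is still arriving. */
typedef struct {
    size_t bytes_total;    /* size of the complete file */
    size_t bytes_arrived;  /* how much we have so far */
    size_t first_page_end; /* bytes needed before page 1 can be shown */
} fake_doc;

/* Simulate nbytes more data arriving. Returns 1 if a significant point
 * was passed (first page became available, or end of file reached),
 * so a polling app knows a retry is worthwhile; otherwise 0. */
int fake_advance(fake_doc *doc, size_t nbytes)
{
    size_t before = doc->bytes_arrived;
    size_t after = before + nbytes;

    assert(doc->first_page_end <= doc->bytes_total);
    /* Clamp to the file size (also covers size_t wraparound). */
    if (after > doc->bytes_total || after < before)
        after = doc->bytes_total;
    doc->bytes_arrived = after;

    return (before < doc->first_page_end && after >= doc->first_page_end) ||
           (before < doc->bytes_total && after == doc->bytes_total);
}

/* Report what a render request could achieve with the data so far. */
load_status fake_render(const fake_doc *doc)
{
    if (doc->bytes_arrived < doc->first_page_end)
        return LOAD_NOT_YET;  /* the TRYLATER case */
    if (doc->bytes_arrived < doc->bytes_total)
        return LOAD_PARTIAL;  /* the "incomplete" cookie flag case */
    return LOAD_DONE;
}
```

An app would call `fake_advance` as data arrives and only re-render when it returns 1, rather than retrying on every page change.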
	I've just pushed the patch to my repo if you want to look at it. It's probably still broken in many places though...	18:25.38
	got to go help cook. bbs.	18:25.48
*sebras*	is back.	18:36.48
	Robin_Watts: I'm thinking that this means that there are three ways of information there.	18:38.58
	would it make sense to have cookie return -1 or something to mean TRYLATER?	18:39.19
	I'm not even convinced that this is a good approach myself, just writing out loud.	18:39.40
	maybe -1 has a special meaning already.	18:40.00
	how much does pdf_progressive_advance() advance by?	18:40.36
	oh, cooking. ok. I'll look in the sources.	18:43.56
Robin_Watts	sebras: The cookie is a structure that's passed in.	18:55.27
	so we can add as many fields to it as we want.	18:55.37
	but the cookie isn't present for all operations - only for rendering.	18:55.46
tor8	Robin_Watts: just a thought; refactor pdf_xref and give it two modes -- normal and repair/progressive. a lot of the stuff we do when repairing is common with what we have to do for progressive loading.	18:59.42
	and if we hit an xref/object/parsing error in normal mode, "restart" it in repair mode	19:00.14
	so if we get a bad xref that looks initially okay, we won't completely break like currently	19:00.53
sebras	tor8: on the subject of breaking -- you saw me mentioning that we die on improper hex-strings in repair-mode right?	19:07.40
tor8	sebras: no. I think I've seen that problem though.	19:08.23
	but is it die, or just spew a lot of warnings?	19:08.39
sebras	tor8: warnings + error and then dead.	19:09.03
	one of the ioccc entries generates a .pdf-file that doubles as a .c-file.	19:09.26
	hence it contains #include <stdio.h> which trips us up.	19:09.40
tor8	rats. I think Robin_Watts did something about parsing hex strings to fix that (or cause it... never can know when Robin's been busy ;)	19:09.40
sebras	I tested it late in the evening, so it may very well be me.	19:11.36