IRC Logs

Log of #ghostscript at irc.freenode.net.

2012/10/19
paulgardiner Robin_Watts: ping11:34.11 
Robin_Watts pong11:34.15 
paulgardiner When a field changes in a way that will affect its appearance, I need to repaint the relevant areas of pages where they appear. Fields aren't associated with a page, but they have widget annotations as children that are specific to a page. Given an annotation, I need to find what page it's on efficiently.11:37.38 
  I don't want to search through the page's tree of annotations.11:38.06 
  Annotations can have a reference to the page, but it's optional. I'm thinking I should add a reference if it doesn't already exist when we process the annotations11:38.48
  That shouldn't break anything, should it?11:39.39 
Robin_Watts Let me restate that to see if I've followed it...11:40.06 
paulgardiner That would be appreciated. Maybe then I'll understand it. :-)11:41.53 
Robin_Watts A given field can appear on one or more pages by having a widget annotation on that page. The only way a field can appear on a page is by having such a widget annotation11:41.53 
  Every annotation can only appear on a single page.11:42.28 
paulgardiner Yes, I believe so.11:42.54 
Robin_Watts Annotations can optionally have a reference to the page that they are on, and you propose to make this compulsory by making such a reference if there isn't one already existing.11:43.10 
  Assuming that every annotation can only appear on a single page, that sounds fine.11:43.30 
paulgardiner Yes, although now I'm concerned that it might still be difficult to derive the page number11:43.43 
Robin_Watts sainsburys delivery. back in a mo, sorry.11:43.51 
paulgardiner np11:43.57 
kens I'm not certain that an annotation can appear on only one page11:55.53 
  because each page has an Annots entry11:56.24 
paulgardiner Hmmm, I'd better check. Not sure what made me think it so.11:57.56 
Robin_Watts back.11:58.41 
  yeah, that's vaguely my worry.11:58.46 
  If every page has a list of the annotations that's on it of the form: /Annots [24 0 R 25 0 R 26 0 R] etc...11:59.15 
paulgardiner Annotations that have a page-object reference presumably appear on one page.11:59.15 
Robin_Watts it would be possible for several pages to refer to the same annotation (e.g. 24 0 R)11:59.40 
paulgardiner Somewhere I picked up the idea that fields could appear on multiple pages, and they did so by having multiple annotations, each appearing on a single page12:00.02
Robin_Watts Page 148 of the 1.7 spec shows /Annots [23 0 R 24 0 R] as an example12:00.46 
paulgardiner Ah. Bottom of Page 60512:01.35 
  A given annotation dictionary may be referenced from the Annots array of only one page. Attempting to share an annotation dictionary among multiple pages produces unpredictable behavior12:02.06 
Robin_Watts OK, perfect.12:02.13 
  That sounds like exactly the restriction you need to be able to pull your trick.12:02.33 
paulgardiner But there's another problem. The current plan lets me get the page object, but I need the page number12:02.50 
Robin_Watts Presumably, it's the P entry you are looking at ?12:02.54 
paulgardiner Yep12:03.04 
Robin_Watts You can go from page object -> page number.12:03.22 
paulgardiner That was my hope, but I haven't found it yet12:03.47 
Robin_Watts by doing a linear search of xref->page_refs and doing pdf_to_num on each.12:03.49 
paulgardiner I was hoping to avoid searches12:04.52 
Robin_Watts You *could* add a PageNum entry to that dict?12:05.06 
  but it's not nice.12:05.21 
paulgardiner It's looking like the only reasonable way12:05.36 
  I already add a "Dirty" entry temporarily to fields.12:06.01 
Robin_Watts paulgardiner: Or you could build a mapping from object number -> page.12:06.10 
  and hold it in the xref.12:06.28 
tor8 pdf_lookup_page_number(xref, obj)12:06.36 
paulgardiner tor8: Does that already exist?12:06.52 
Robin_Watts tor8: Ah, right, that's the encapsulation of the linear search ?12:06.58 
tor8 it exists. and it's a wrap of a linear search. any improvements should go in there :)12:07.17 
  like building a sorted array of obj-num to page-num mappings that can be binary searched12:07.39 
paulgardiner Ok. I think the fact that that already exists implies, I should use it rather than the trick on adding an entry. This isn't horrendously time critical any way.12:08.23 
  Must the obj be a page reference? Or would that work with an annotation? I'm guessing the former.12:08.53 
tor8 it must be a page reference indirect object12:09.19 
  it compares against the references used in the page tree12:09.33 
paulgardiner So I still need to add "P" entries to the annotations that don't already have them.12:10.05
  ?12:10.12 
Robin_Watts Yes, but that's much less nasty, as you're simply filling out an optional bit of the spec.12:10.47 
paulgardiner Unless I give in completely, and add pdf_lookup_annotations_page?12:11.03 
  Hmm, that would be slow though because it would have to look through the annotation lists of every page.12:11.45 
  ... but could be sped up with the right data structure12:12.28 
  Robin_Watts: yes, it isn't really nasty at all, I guess.12:13.49 
  Thanks. I'll battle on.12:14.05 
  Presumably 10,000 page documents don't tend to be forms anyway.12:15.27 
Robin_Watts could imagine an O'Reilly book with a form in the back for "tell me about updates to this book" etc?12:18.16 
paulgardiner I'll ignore that. :-)12:23.51 
Robin_Watts I think that should work with your planned "fill in the P option" plan.12:27.19 
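[Editor's sketch] tor8's suggestion above (a sorted array of obj-num to page-num mappings that can be binary searched) could look roughly like the following. This is a minimal standalone sketch, not MuPDF code: `page_map_entry`, `page_map_build` and `page_map_lookup` are hypothetical names, and the input is assumed to be a plain array of page object numbers in page order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: one entry per page in the page tree. */
typedef struct {
    int obj_num;   /* object number of the page's indirect reference */
    int page_num;  /* zero-based page index */
} page_map_entry;

/* qsort comparator: order entries by object number. */
static int cmp_entry(const void *a, const void *b)
{
    const page_map_entry *x = (const page_map_entry *)a;
    const page_map_entry *y = (const page_map_entry *)b;
    return (x->obj_num > y->obj_num) - (x->obj_num < y->obj_num);
}

/* Build the map from page object numbers listed in page order.
 * Returns a malloc'd array of n entries sorted by obj_num, or NULL
 * if allocation fails. The caller frees it. */
page_map_entry *page_map_build(const int *page_obj_nums, size_t n)
{
    page_map_entry *map;
    size_t i;

    assert(page_obj_nums != NULL || n == 0);
    map = (page_map_entry *)malloc((n ? n : 1) * sizeof *map);
    if (map == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        map[i].obj_num = page_obj_nums[i];
        map[i].page_num = (int)i;
    }
    qsort(map, n, sizeof *map, cmp_entry);
    return map;
}

/* Binary search for obj_num. Returns the page number, or -1 if the
 * object number does not belong to any page. */
int page_map_lookup(const page_map_entry *map, size_t n, int obj_num)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (map[mid].obj_num < obj_num)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && map[lo].obj_num == obj_num) ? map[lo].page_num : -1;
}
```

Building the map costs one sort per document; each lookup after that is O(log n) instead of the linear scan over the page references described above.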
  http://www.printercomparison.com/default.asp?newsID=150913:30.56 
  I'm all for supporting a range of products, but *42* new laser printers?13:31.15 
chrisl kens: I have to go out, and I almost certainly won't be back for 4 o'clock.......13:39.26 
kens OK chrisl no problem13:39.40 
chrisl If I do get back at a vaguely sensible time, I'll give you a call13:40.05 
paulgardiner Robin_Watts: there's a few commits on paulg/master if you have a moment.13:40.22 
Robin_Watts ok.13:40.56 
paulgardiner Robin_Watts, tor8: I still haven't sorted out this ensuring page references are present in annotations.13:53.44 
  What looks like the natural place to do it, has the page object in the form of a dict. The P entry is supposed to be an indirect reference. Is there a lookup function for that?... I should look really, I guess.13:55.18
Robin_Watts xref->page_objs and xref->page_refs are kept for exactly this reason.13:56.51 
  Where are you in the code?13:56.59 
paulgardiner Ah right13:57.16 
  Line 410 of pdf_page.c13:58.02 
Robin_Watts In pdf_load_page?13:58.27 
paulgardiner I haved the pageref13:58.30 
  have13:58.35 
Robin_Watts (My code has changes in that file)13:58.37 
paulgardiner It's ok. The ref is already in a variable13:58.49 
Robin_Watts Right, so the pageref is what you want.13:58.51
paulgardiner Yep, thanks.13:59.26 
tor8 paulgardiner: pdf_lookup_page_number takes the indirect reference14:01.39 
paulgardiner tor8: ah right. So another reason I should make sure it's the ref I put in the "P" entry.14:05.28
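[Editor's sketch] The plan settled on above is to fill in a missing "P" entry with the page's indirect reference while the page's annotations are being processed. A toy model of that step is below. It does not use the MuPDF API: `toy_annot`, `toy_page` and `toy_fill_missing_p` are hypothetical, and object numbers stand in for indirect references, with 0 meaning "no P entry".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an annotation dictionary. */
typedef struct {
    int p_ref;  /* object number of the owning page; 0 if P is absent */
} toy_annot;

/* Hypothetical stand-in for a loaded page. */
typedef struct {
    int ref;             /* object number of the page's indirect reference */
    toy_annot **annots;  /* entries of the page's Annots array */
    size_t n_annots;
} toy_page;

/* For every annotation on the page that lacks a P entry, record the
 * page's own reference. Existing P entries are left untouched.
 * Returns how many annotations were filled in. */
size_t toy_fill_missing_p(toy_page *page)
{
    size_t i, filled = 0;

    assert(page != NULL && page->ref > 0);
    for (i = 0; i < page->n_annots; i++) {
        if (page->annots[i]->p_ref == 0) {
            page->annots[i]->p_ref = page->ref;
            filled++;
        }
    }
    return filled;
}
```

Because the stored value is the page reference rather than the page dictionary, it is the kind of value tor8 says pdf_lookup_page_number expects.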
kens Robin_Watts : ping ?14:18.06 
Robin_Watts pong14:18.12 
kens Can you help me understand a log entry ?14:18.22 
Robin_Watts I can try.14:18.28 
kens My regression test has:14:18.46 
  The following 2 regression file(s) have started producing errors:14:18.46 
  tests_private/comparefiles/446-01.ps.pdf.pkmraw.300.0 gs pdfwrite inches miles Error_reading_Ghostscript_produced_PDF/PS_file14:18.46 
  So I look in the logs for miles ?14:19.02
Robin_Watts I think so.14:19.12 
kens OK well if I do that I don't see an error, so I'm puzzled....14:19.24
  ===tests_private__comparefiles__446-01.ps.pdf.pkmraw.300.0===14:19.34 
  gs pdfwrite14:19.34 
  ./gs/bin/gs -sOutputFile=./temp/tests_private__comparefiles__446-01.ps.pdf.pkmraw.300.0.pdf -sDEVICE=pdfwrite -r300 -sDEFAULTPAPERSIZE=letter -dNOPAUSE -dBATCH -dClusterJob -dJOBSERVER - < ./tests_private/comparefiles/446-01.ps14:19.34 
  GPL Ghostscript 9.07 (2012-07-31)14:19.34 
  Copyright (C) 2012 Artifex Software, Inc. All rights reserved.14:19.34 
  Ooops14:20.04 
  Don't know how much of that you saw before I got kicked off14:20.15
Robin_Watts kens: OK. I have the same log.14:20.28 
kens But I see the report here:14:20.35
  http://ghostscript.com/cgi-bin/clustermonitor.cgi?log=log&machine=miles&report=ken14:20.35 
Robin_Watts Note that there are 2 entries for each pdfwrite.14:20.47 
  First you have the pdfwrite step, which as you say completes with no error.14:20.57 
kens So there are, the first is the pdfwrite conversion I guess14:21.10
Robin_Watts Then you have another step which is it reading the pdfwritten thing, and writing the pkmraw output.14:21.20 
kens Ah, and the second does indeed fail14:21.20
Robin_Watts And that... yeah.14:21.24 
kens OK well I don't see that locally so I guess I'd better try it with the command line, thanks14:21.40
Robin_Watts no worries.14:21.46 
kens Hmm, well that does reproduce it, something to do with the environment then14:25.22 
Robin_Watts tor8, kens, sebras, anyone else...14:54.41 
  When we'd looked at hints tables before, we'd decided that nothing uses them, right?14:54.59 
sebras Robin_Watts: yeah, that's what kens said about acrobat at least.14:55.13 
Robin_Watts How then can you know what order pages go in?14:55.17 
  It's easy to know the first page to use for a file.14:55.26 
  Do we then assume that you don't display any more pages until the whole lot has arrived?14:55.39 
  (or at least you only display blank pages, but the right number of blanks)14:55.55 
  (possibly of the wrong size)14:56.06 
kens Robin_Watts : I don't really think anything uses the hints at all, so you can only use page 114:56.14
sebras well, what kens said was that whether the hints stream was bogus or not didn't affect the "optimized" state according to acrobat.14:56.18 
Robin_Watts Right, so if we DID use the hints table we could display subsequent pages as we go.14:56.48 
kens One of the acrobat implementation notes says that, although technically the 'first' page need not be ordinal page 1, that's the only way Acrobat writes it.14:56.53
  Robin_Watts : Technically I believe we could but I would want to reread the spec (again) before committing myself.14:57.13
  Also, I wouldn't count on being able to do it from the Acrobat output, which is even worse than GS's14:57.30 
sebras Robin_Watts: as I understand it that must be the point of the hints table, no..?14:57.40 
Robin_Watts sebras: Indeed.14:57.55 
kens I guess my point is that you can't rely on this stuff being correct. Also, there's another implementation note about compressed objects which basically says 'can't use hint streams then'14:58.21
  And since most PDF files from Adobe apps use compressed objects....14:58.37 
sebras this is just the case of badly implemented generators, not a bad spec, right? at least that is my understanding.14:58.49 
Robin_Watts Actually, even with hint streams how do I find what number object is the page object?14:59.21 
sebras kens: objects in object streams, or just objects with compressed stream contents?14:59.34 
kens The spec is OK (if daft, opaque, hard to understand and harder to implement), the 'problem' is that Acrobat Distiller (and by implication other Adobe products) don't follow the spec well.14:59.45
  Robin_Watts : the headers tell you that stuff IIRC15:00.06
Robin_Watts The top level dict tells me the object number for page 1.15:00.23 
kens sebras object streams15:00.24 
  Robin_Watts : hang on I'll go and get the spec out again....15:00.35 
sebras kens: are object streams really that common? and also, will the adobe apps really generate object streams when they are asked to generate linearized pdfs!? if that's an implementation limitation that seems really strange.15:01.53 
Robin_Watts I can find "the first object number" for any given page by looking at item 1 in the page offset hint table and accumulating.15:02.04 
kens Yes, that's it Robin_Watts15:02.23 
  I believe the first object for the page is the page object15:02.35
Robin_Watts But I can't find which entry within the block of objects is the page number.15:02.43 
kens There is no page number15:03.00 
Robin_Watts Where in the spec does it say the first element has to be the page object? (It may do, I can't find it)15:03.01 
  is the page object, sorry15:03.10 
kens I believe item 1 for the page is the page object15:03.22 
kens is still reading15:03.57 
Robin_Watts Ah, got it. You're right.15:04.19 
  So if I use the hint tables, I can get pages out early.15:04.37 
kens Assuming they are correct.15:04.59 
Robin_Watts indeed.15:05.06 
kens You can test our linearisation :-)15:05.08 
  I bet GS isn't right15:05.17 
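[Editor's sketch] The accumulation Robin_Watts describes (item 1 of the page offset hint table gives a per-page object count, and a running sum gives each page's first object number) might be modelled as below. The function name and parameters are hypothetical. It assumes that page 1's object number comes from the top level linearization dict, that the objects of the remaining pages are numbered consecutively from a base the caller supplies, and that each page's object count is the table's least count plus that page's delta. As kens notes, none of this helps if the hint data in the file is wrong.

```c
#include <assert.h>
#include <stddef.h>

/* Compute the first object number of every page.
 *   page1_obj : object number of the first page (from the linearization dict)
 *   rest_base : assumed first object number used by the second page
 *   least     : least number of objects in any page (hint table header)
 *   delta[i]  : per-page value; objects in page i = least + delta[i]
 *   out[i]    : receives the first object number of page i
 * delta[0] is not consulted because page 1 is numbered separately. */
void hint_first_objs(int page1_obj, int rest_base, int least,
                     const int *delta, size_t n_pages, int *out)
{
    size_t i;

    if (n_pages == 0)
        return;
    assert(delta != NULL && out != NULL);
    out[0] = page1_obj;
    if (n_pages > 1)
        out[1] = rest_base;
    /* Each later page starts where the previous page's objects end. */
    for (i = 2; i < n_pages; i++)
        out[i] = out[i - 1] + least + delta[i - 1];
}
```

With the first object of each page taken to be the page object, as agreed above, `out[i]` would be the object to fetch in order to show page i before the full xref has arrived.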
henrys if colorado is a swing state 4 years from now I'm leaving the state for the election period.15:07.00 
Robin_Watts can see the headlines now: "colorado swings"15:07.57 
  henrys: We're sick of the election coverage over here, and it's not even our election. I can only imagine how you feel.15:08.53 
  Surely it's a simple choice though "another 4 years of underachievement" vs "ABSOLUTELY BATSHIT CRAZY!"15:09.18 
henrys to sick the entire U.S. campaign machinery on a few relatively small states is too much.15:11.41 
  replacing the electoral college with a popular vote would spread the hell around evenly15:16.33 
Robin_Watts But that'll never happen.15:17.18 
kens OK, one more try....15:17.46 
  hopefully no errors this time15:17.57 
Robin_Watts loses an hour looking for a bug that turns out to be = instead of ==.16:05.45 
  kens: Where did you see the implementation note that said "compressed objects -> no hint streams" ?16:17.21 
kens PDF reference manual, give me a minute16:17.42 
  p102516:21.43 
Robin_Watts I have something on page 104016:22.03 
kens For files containing object streams, hint data can specify the location and size of the object streams only (or uncompressed objects), not the individual compressed objects16:22.03 
  Robin_Watts : p1040 that's the one16:22.51 
Robin_Watts ok.16:23.38 
  So that doesn't say "no hint streams". It just says "hint streams even more broken".16:24.00 
  Presumably you have to assume that entries for pages won't be shared within a single compressed stream.16:24.49 
kens essentially useless16:24.53 
Robin_Watts so the first compressed object in a stream is the page object.16:25.06 
  Oh, but you can't know what objects are in a compressed stream, without the xref.16:25.19 
  gah. Useless :(16:25.25 
kens what I said was 'can't use hint streams'16:25.26 
  Time to be off, night all16:42.20 
Robin_Watts tor8: Well, I have a first version of progressive file loading working.17:19.58 
  I've added a -b option to mupdf that lets you give it a bps figure, and it then simulates the chosen file arriving at that many bps.17:20.44 
  So if I: mupdf -b 409600 pdf_reference17.pdf I first off get an error box telling me "not enough data to open the file". Then I click "OK" and it tries again, and I get another error box telling me "not enough data to count pages". Then I click OK again a few times, and that goes away and I get page 1 (possibly with no fonts).17:22.14 
  If I navigate to pages I haven't got yet I get blank pages.17:22.38 
  Then when the whole file arrives, it loads properly and I can navigate between properly sized/filled in pages.17:23.19 
sebras Robin_Watts: what prompts the file reader to progress? 18:08.29 
Robin_Watts currently when you change page and reload.18:11.24 
sebras Robin_Watts: ok, so it's not the dialog-box-clicking in the scenario outlined above?18:16.10 
Robin_Watts sebras: Basically, when we try to read beyond the end of the data we have currently got, it throws a TRYLATER exception.18:17.01 
  I've modified the app so that when it's initially trying to open, it just puts a dialogue up and retries.18:17.41
  Once we have got far enough that the page object for the first page is loaded, we can then at least start to display something, and no more dialogues.18:18.22 
  I'm working to the idea that if the app asks us to do something, either we should say "not yet", or we should say "here is the best I can do", or "Done".18:19.49 
  "not yet" is achievable by the TRYLATER exception.18:20.08 
  "here is the best I can do" can be done by the cookie coming back with an 'incomplete, try later' flag set.18:20.30 
  and Done is normal exit.18:20.38 
  I guess I should add a mechanism for the app to keep polling to say "is it worth me retrying yet?"18:21.05
  at the moment whenever we try to load a page, if the pageobject is NULL I call pdf_progressive_advance18:23.15 
  and that 'gobbles' more objects from the file (akin to doing a repair), and inserting them into xref as it goes.18:23.38 
  For the polling mechanism I could call the same function, and then give a return code based on whether we pass a significant point (i.e. another page loaded, or end of file reached or something).18:24.33 
  I've just pushed the patch to my repo if you want to look at it. It's probably still broken in many places though...18:25.38 
  got to go help cook. bbs.18:25.48 
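[Editor's sketch] The three answers Robin_Watts lists ("not yet", "here is the best I can do", "Done") and the proposed polling call can be modelled with plain return codes. This is a toy, not the patch he pushed: the real code signals "not yet" with a TRYLATER exception and "best I can do" through a cookie flag, whereas `load_status`, `toy_stream`, `toy_render_page1` and `toy_advance` here are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    LOAD_DONE,        /* everything needed has arrived */
    LOAD_INCOMPLETE,  /* best effort: something to show, more to come */
    LOAD_TRYLATER     /* not yet: nothing useful can be done */
} load_status;

/* Hypothetical model of a file that is still arriving. */
typedef struct {
    size_t bytes_arrived;
    size_t total_bytes;
    size_t page1_end;  /* bytes needed before page 1's object can be read */
} toy_stream;

/* What a request to show page 1 would answer right now. */
load_status toy_render_page1(const toy_stream *s)
{
    assert(s != NULL);
    if (s->bytes_arrived < s->page1_end)
        return LOAD_TRYLATER;
    if (s->bytes_arrived < s->total_bytes)
        return LOAD_INCOMPLETE;
    return LOAD_DONE;
}

/* Polling helper: account for n more bytes, then report whether a
 * significant point was passed, i.e. whether retrying is worthwhile.
 * Returns 1 if the answer from toy_render_page1 changed, else 0. */
int toy_advance(toy_stream *s, size_t n)
{
    load_status before, after;

    assert(s != NULL);
    before = toy_render_page1(s);
    if (n > s->total_bytes - s->bytes_arrived)
        s->bytes_arrived = s->total_bytes;
    else
        s->bytes_arrived += n;
    after = toy_render_page1(s);
    return before != after;
}
```

An app loop would call `toy_advance` as data arrives and only re-render when it returns 1, which avoids retrying on every packet.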
sebras is back.18:36.48 
  Robin_Watts: I'm thinking that this means that there are three ways of information there.18:38.58 
  would it make sense to have cookie return -1 or something to mean TRYLATER?18:39.19 
  I'm not even convinced that this is a good approach myself, just writing out loud.18:39.40 
  maybe -1 has a special meaning already.18:40.00 
  how much does pdf_progressive_advance() advance by?18:40.36 
  oh, cooking. ok. I'll look in the sources.18:43.56 
Robin_Watts sebras: The cookie is a structure that's passed in.18:55.27 
  so we can add as many fields to it as we want.18:55.37 
  but the cookie isn't present for all operations - only for rendering.18:55.46 
tor8 Robin_Watts: just a thought; refactor pdf_xref and give it two modes -- normal and repair/progressive. a lot of the stuff we do when repairing is common with what we have to do for progressive loading.18:59.42 
  and if we hit an xref/object/parsing error in normal mode, "restart" it in repair mode19:00.14 
  so if we get a bad xref that looks initially okay, we won't completely break like currently19:00.53 
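[Editor's sketch] tor8's idea of one xref loader with a normal mode and a repair/progressive mode, restarting in repair mode when normal mode hits an error, could be outlined like this. Everything here is hypothetical (`xref_mode`, `toy_file`, `toy_parse`, `toy_open`), and the sketch only covers the fallback at open time, not an error met later while reading objects.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { XREF_NORMAL, XREF_REPAIR } xref_mode;

/* Hypothetical summary of a file's condition. */
typedef struct {
    int xref_ok;    /* nonzero if the xref table parses cleanly */
    int n_objects;  /* objects a linear scan of the file would find */
} toy_file;

/* Parse in the given mode. Returns 0 on success, -1 on failure.
 * Normal mode trusts the xref; repair mode scans for objects. */
static int toy_parse(const toy_file *f, xref_mode mode)
{
    if (mode == XREF_NORMAL)
        return f->xref_ok ? 0 : -1;
    return f->n_objects > 0 ? 0 : -1;
}

/* Try normal mode first and restart in repair mode on error.
 * *used reports the last mode attempted. Returns 0 on success. */
int toy_open(const toy_file *f, xref_mode *used)
{
    assert(f != NULL && used != NULL);
    *used = XREF_NORMAL;
    if (toy_parse(f, XREF_NORMAL) == 0)
        return 0;
    *used = XREF_REPAIR;
    return toy_parse(f, XREF_REPAIR);
}
```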
sebras tor8: on the subject of breaking -- you saw me mentioning that we die on improper hex-strings in repair-mode right?19:07.40 
tor8 sebras: no. I think I've seen that problem though.19:08.23
  but is it die, or just spew a lot of warnings?19:08.39 
sebras tor8: warnings + error and then dead.19:09.03 
  one of the ioccc entries generates a .pdf-file that doubles as a .c-file.19:09.26
  hence it contains #include <stdio.h> which trips us up.19:09.40 
tor8 rats. I think Robin_Watts did something about parsing hex strings to fix that (or cause it... never can know when Robin's been busy ;)19:09.40 
sebras I tested it late in the evening, so it may very well be me.19:11.36 