Log of #mupdf at irc.freenode.net.

20180710
inflex Is the muPDF "outline" a small-page-view column down the side of the viewer, or is it something else entirely?10:16.57 
  If muPDF doesn't already support something like that, then I suppose I could look at generating bitmapped thumbnails of the pages on load.10:26.30 
kens While MuPDF (the core code) can certainly access the /Outlines of the PDF file, I don't think any of the demo apps expose it (ie add a user interface to allow you to select a page based on the Outlines). I could be wrong though.10:27.40 
inflex np, all good. I noticed that mupdf-gl does have the press 'o' for outlines, but maybe that's actually more an "outline/description" of the PDF.10:36.44 
kens Ah I wasn't aware of that, I'm afraid I don't know what it does, sorry10:37.23 
tor8 inflex: the 'o' outline is the table of contents (called the 'outline' in PDF specification)10:37.26 
inflex thanks tor8. Is there something I can expose from the inner workings to achieve the side-pane of thumbnails, or is it something I just need to generate myself?11:33.16 
tor8 inflex: you'll have to render the page thumbnails yourself. beware that it could be very slow on image-intensive pages.11:37.40 
  rendering the thumbnail is going to involve parsing and rendering the whole page, just at a small size11:37.56 
  there are no page thumbnail images stored in the PDF format11:38.12 
  so unless you're doing multiple threads and rendering in the background, I can't recommend it11:38.45 
inflex tor8, that's fine, I had suspected as much12:03.30 
paulgardiner tor8: I've run into a problem with signature support that I'm struggling with. The signature field dictionary refers to a byte range, which specifies what parts of the document are hashed as part of signing. When verifying, we need to check that the byte range is reasonable: it should cover what was the whole document at the time of signing (although that might be a prefix of the document at time of verification because of subsequent incremental updates).12:41.10 
  I've tried changing the loading functions to look for startxref <number> %%EOF after reading the trailer. That works for anything mupdf produces, but I'm seeing some files where the trailer is stored near the beginning of the file. I'm a bit lost how to approach this now.12:41.18 
  Possibly it fails with some file mupdf produces, come to think of it. The structure I'm seeing may be due to linearization12:54.09 
tor8 paulgardiner: I don't think I can be of much help... you and robin have changed that code so much I no longer recognize it or know what it does...12:58.39 
paulgardiner I could search for the startxref <number> %%EOF for which the number corresponds to the start of the xref, but that might require reading through the whole document.12:59.08 
  tor8: I'm not sure I'm asking about mupdf so much as about PDF.12:59.33 
tor8 paulgardiner: the trailer can be anywhere, often at the beginning with linearized files I expect.13:00.56 
  can't you just save the size of the file when we first open the PDF?13:01.20 
  we start by scanning for 'startxref' at the end to find the trailer13:01.32 
paulgardiner That works only for the last xref section.13:01.45 
inflex tor8, almost wondering, since most of the PDFs that are going to be viewed through the viewer will be used over and over again (schematic diagrams), it could be possible to create PNG thumbnails once-off13:01.54 
  tor8, and have them stored alongside the actual PDF as a metafile.13:02.21 
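The once-off thumbnail idea above could look roughly like the sketch below. It only covers the caching side: the sidecar directory layout and the `render` callback are hypothetical, and the actual page rendering (which, as tor8 notes, means parsing and drawing the whole page at a small size) is left to whatever the caller plugs in.

```python
from pathlib import Path
from typing import Callable


def thumbnail_path(pdf: Path, page: int) -> Path:
    """Sidecar location next to the PDF, e.g. board.pdf.thumbs/page-0001.png (hypothetical layout)."""
    return pdf.parent / (pdf.name + ".thumbs") / f"page-{page:04d}.png"


def get_thumbnail(pdf: Path, page: int,
                  render: Callable[[Path, int, Path], None]) -> Path:
    """Return the cached thumbnail, calling render() only if it is missing or older than the PDF."""
    out = thumbnail_path(pdf, page)
    if not out.exists() or out.stat().st_mtime < pdf.stat().st_mtime:
        out.parent.mkdir(parents=True, exist_ok=True)
        # Hypothetical callback: must write a PNG for `page` of `pdf` to `out`.
        render(pdf, page, out)
    return out
```

Comparing modification times is the simplest staleness check; a content hash would be more robust if the PDFs get copied around.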
paulgardiner tor8: I need to be able to work out the size of the file corresponding to each xref section.13:02.54 
tor8 paulgardiner: you can't reasonably verify anything other than the last xref section?13:02.58 
paulgardiner tor8: I think we need to for the case of multiple signatures.13:03.21 
tor8 I mean, any additional xref sections or data appended to the end can very much change the document13:03.22 
paulgardiner Possibly we also need to determine if an incremental update adds only signatures.13:03.50 
tor8 and saying "this signature checks out, because the subset of the file that was used when it was signed matches, but hey, I'm just kidding somebody replaced bits with newer objects at the end of the file"13:04.28 
  is not okay13:04.42 
  I mean, it's perfectly plausible to replace the content stream of a page with a newer generation object in an incremental update13:05.05 
paulgardiner See above.13:05.14 
tor8 that (and any other edits) *should* invalidate *all* signatures.13:05.17 
paulgardiner the case of the only change being to add another signature.13:05.43 
tor8 wouldn't multiple signatures put each other in the byte ranges that are excluded?13:05.53 
paulgardiner It seems not. If using AR I sign a document that has multiple signature fields, it doesn't leave room for the digest of the other signatures.13:07.10 
tor8 right. so each signature checks a certain subset of the file.13:07.47 
paulgardiner Yep.13:07.54 
tor8 and the ranges are shorter than a file if incrementally updated13:08.12 
paulgardiner Yep.13:08.20 
tor8 so if we incrementally update the file, we can't tell if those edits in general invalidate a signature13:08.54 
  or well, if they should invalidate it13:09.02 
paulgardiner AR will say something like "This signature is valid but for an earlier version of the document", and then asks if the user would like to view the earlier version. I assume it doesn't do that if the only change is to add another signature.13:10.02 
  AR can, in some cases at least, list the changes that have been made since a signing.13:10.37 
tor8 paulgardiner: looking at the spec (pdfref17.pdf page 726) there's a note13:11.16 
  If a signed document is modified and saved by incremental update, bla bla bla, it is possible to recreate the state of the document as it existed at the time of signing.13:11.50 
  it doesn't say how, other than implying the ByteRange array13:12.07 
  so I would have to assume the last range will imply the end of the file13:12.22 
  at the time of signing13:12.28 
  and it's only possible to tell the end of the file for the latest iteration (the startxref entry)13:13.16 
paulgardiner I don't see how that relates to byte ranges13:13.20 
tor8 incremental updates don't write the end of the file of the previous version, it just chains the xrefs13:13.38 
  "Note: If a signed document is modified and saved by incremental update (see Section 3.4.5, “Incremental Updates”), the data corresponding to the byte range of the original signature is preserved. Therefore, if the signature is valid, it is possible to recreate the state of the document as it existed at the time of signing."13:14.11 
paulgardiner You can create the state of the document at the time of signing by just ignoring the subsequent xrefs, I think13:14.13 
  No need for the byte ranges.13:14.29 
  The byte ranges specify what is hashed.13:14.43 
tor8 there is not a guaranteed one-to-one mapping between xref sections and incremental updates13:14.46 
  it is entirely possible to have multiple xref sections with only one trailer13:15.14 
paulgardiner In any case, I don't believe there is an intention to use the byte ranges in the recreation of older versions of the document.13:16.19 
tor8 and the previous trailers are 'lost' when you incrementally save13:16.33 
  "Note: If a signed document is modified and saved by incremental update (see Section 3.4.5, “Incremental Updates”), the data corresponding to the byte range of the original signature is preserved. Therefore, if the signature is valid, it is possible to recreate the state of the document as it existed at the time of signing."13:17.08 
paulgardiner Really? I thought we always appended to the end of the file for incremental update.13:17.13 
tor8 we do, but there's nothing in the appended data that points to the previous end of file13:17.35 
  there's the "Prev" entry13:18.37 
  which points to the previous 'xref' section but not the actual EOF13:18.55 
  since the xref can be anywhere in the file13:19.01 
  and that sentence leads me to believe the ByteRange implies the length of the previous file13:20.12 
  (especially given how vague and implementation-driven "this is what acrobat does, do that and ignore what the spec actually says" the later additions to the PDF spec are)13:20.45 
paulgardiner I have PDF32000_2008 here. Is that the wrong version? It says it's v 1.713:24.36 
tor8 paulgardiner: it's the ISO version of the same text -13:24.49 
  worse typography, same content13:24.59 
paulgardiner My incremental updates section seems to be 7.5.613:25.32 
tor8 oh dear, this is worrying... our code assumes the 'trailer' always succeeds an 'xref' (old style) section13:27.10 
  but the spec says it precedes the 'startxref'13:27.17 
  I wonder if that might trip us up into doing a repair job on valid files13:28.35 
paulgardiner "precede" as in just before?13:28.43 
tor8 yes.13:28.48 
  of course, it then throws out the baby and the bathwater and the whole bathroom when they introduce 'Cross Reference streams' where the 'xref' and 'trailer' keywords are just gone13:29.14 
paulgardiner Well that seems not to be true.13:29.17 
  All the problems I'm having are with xref streams.13:29.43 
tor8 paulgardiner: the 'new style' ones?13:30.04 
paulgardiner yep13:30.12 
tor8 yeah, they don't have an 'xref' or 'trailer' keyword anywhere13:30.16 
  the only reliable end-of-file marker is the "startxref\n[0-9]+\n%%EOF\n" string13:30.54 
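A minimal sketch of scanning for that marker, assuming the whole file is held in memory as bytes. The function name is made up for illustration, and the pattern is slightly looser than the string quoted above so that CR and CRLF line endings also match. It is only a heuristic: the same byte sequence could in principle occur inside stream data, and a linearized file has an extra `%%EOF` near the start that does not mark the end of a revision.

```python
import re

# "startxref", an offset, then "%%EOF", tolerating CR, LF or CRLF line endings.
EOF_MARKER = re.compile(rb"startxref\s+(\d+)\s+%%EOF(?:\r\n|\r|\n)?")


def revision_ends(data: bytes) -> list[tuple[int, int]]:
    """Return (xref_offset, end_offset) for every marker found, in file order.

    end_offset is the offset just past the marker, i.e. a candidate file
    length for the revision that the marker terminates.
    """
    return [(int(m.group(1)), m.end()) for m in EOF_MARKER.finditer(data)]
```

This reads through the whole document, which is the cost paulgardiner mentions earlier in the discussion.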
paulgardiner It's not the lack of keywords that is troubling me, but the position within the document.13:31.05 
tor8 but I think if you look at the ByteRange that would probably be enough?13:31.11 
paulgardiner tor8: the problem with that, is the whole point of what I'm trying to do at the moment is to validate the byte range.13:31.43 
tor8 and check that the byte range ends at an appropriate point?13:32.24 
paulgardiner Yeah.13:32.44 
tor8 you could do what adobe does, and say "this matched an earlier version" and add a question to 'restore the old version?' which would copy the file up to the end of the byteranges13:32.53 
paulgardiner AR seems to do that in some cases13:32.54 
  I think you are misreading that clause.13:33.57 
tor8 though maybe, just maybe, if our assumption about oldstyle is always the sequence 'xref <sections> trailer ... startxref ... %%EOF'13:34.00 
  but that would fail for new style where the trailer is at the head of the stream and could be anywhere in the file13:34.52 
paulgardiner It just points out the fact that an incremental update doesn't alter the bytes of the previous version of the document and hence nothing in the byte range changes13:35.02 
tor8 paulgardiner: I'm reading more into it than it says, by the sentence "Therefore, if the signature is valid, it is possible to recreate the state of the document as it existed at the time of signing."13:35.37 
  the *therefore* is my key word13:35.49 
  but yes, it may be I'm fantasizing13:36.06 
  but I do wonder how you could recreate an old version other than parsing the whole file from the start and scanning for %%EOF13:36.39 
  since nothing in the trailer and xref chain point to the old version's eof13:36.57 
paulgardiner I've been reading "state" as reader state necessary to show the old version, not file state.13:37.52 
  .. which I believe can be done by just ignoring all the xref sections since the signing.13:38.33 
tor8 you could pop xref sections until you get to some other version13:38.40 
  but how do you know which one that is?13:38.44 
paulgardiner The signature field will be referred to by the one you should stop on?13:39.07 
tor8 referred to by what?13:39.21 
  the signature field could exist in all versions13:39.34 
  I know we discussed being able to time travel by popping xrefs13:40.22 
paulgardiner But presumably only one xref section refers to the version that you are looking at13:40.25 
tor8 I don't understand.13:41.04 
  consider this case: a file is created with a signature field. this is the original version A.13:41.18 
  then it is edited to create version B.13:41.26 
  then it is signed and saved as version C.13:41.31 
  then it is edited and saved as version D.13:41.36 
paulgardiner And what do you wish to achieve from that point?13:42.10 
tor8 that is my question for you.13:42.34 
  what does version C look like?13:42.58 
  signing the field, writes the digest somewhere. is that in a new incremental section?13:43.19 
paulgardiner yes13:43.29 
tor8 and this new xref section, is it included in the byteranges (minus the actual bytes with the digest checksum)?13:43.56 
paulgardiner I assume if I drop all xref sections since that one, I see the document as it was.13:43.57 
tor8 now, someone opens version D, and asks to verify the signature13:44.23 
  they should see "this matches a previous version", right?13:44.36 
paulgardiner yes, and possibly "here's a list of changes since then"13:44.59 
  and an offer to show the version as signed13:45.14 
tor8 and you're saying how we find this is by looking at the signature field V object, and looking to see which xref subsection it is defined in?13:45.45 
paulgardiner That's what I've been assuming. I'm not currently trying to do that, but I assumed we might if we need to.13:46.18 
tor8 if we can be sure that the signature will be saved in an incremental update, we should be able to find it by that way13:46.48 
  because we can't track (a) the EOF for any given xref section, or (b) what it was when it was actually signed13:47.13 
  finding the subsection where the form field object was last updated would probably do well enough13:47.32 
  and we can show the document from that point onwards13:47.45 
  but we'd have to take care to flush all cached pdf_obj's when we rewind the xref view13:47.59 
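The "rewind to the section where the signature object was last updated" idea can be illustrated on a toy model. This is not MuPDF's xref code: here each xref section is just a dict from object number to a description, ordered oldest to newest, and both function names are invented for the sketch. It also assumes one section per saved version, which tor8 points out is not guaranteed.

```python
def last_defining_section(sections: list[dict[int, str]], objnum: int) -> int:
    """Index of the newest section that (re)defines objnum; raises ValueError if none does."""
    for i in range(len(sections) - 1, -1, -1):
        if objnum in sections[i]:
            return i
    raise ValueError(f"object {objnum} is not defined in any section")


def view_at(sections: list[dict[int, str]], index: int) -> dict[int, str]:
    """Merge sections 0..index, with newer entries overriding older ones."""
    view: dict[int, str] = {}
    for section in sections[: index + 1]:
        view.update(section)
    return view


# Versions A to D from the discussion; object 12 stands in for the signature's V object.
A = {1: "catalog", 5: "page v1", 10: "sig field, unsigned"}
B = {5: "page v2"}
C = {10: "sig field, signed", 12: "sig value"}
D = {5: "page v3"}
sections = [A, B, C, D]

signed_at = last_defining_section(sections, 12)  # index of version C
as_signed = view_at(sections, signed_at)         # page 5 is still "page v2" here
```

In a real reader, any objects already loaded from later sections would have to be discarded when switching to the rewound view, as noted above.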
paulgardiner Yeah, I was assuming that would work, but that wasn't what I was working on.13:48.02 
  I was just trying to find a way to validate the byte range values.13:48.16 
tor8 right, so consider if we implement the above13:49.00 
  we can rewind the view to C13:49.03 
paulgardiner To check that the signing software didn't maliciously use a small byte range that meant almost none of the file is included in the hash13:49.04 
tor8 but when checking it, we still don't know the actual EOF13:49.34 
  so we can't find such cases13:50.07 
  do we write a new byterange when signing? I thought that was part of the original structure created by the PDF authoring software.13:50.44 
paulgardiner So you are saying it cannot be done. AR is doing it for the case of the signing being the last thing done, but perhaps that is the only case it does it for.13:51.06 
tor8 It cannot be done trivially at least :)13:51.21 
paulgardiner byterange is part of the signature13:51.25 
  My original question was "hey tor look at this. I can't find a trivial way to do it" :-)13:51.59 
  The byte range is usually the whole document with a hole for the digest13:52.32 
tor8 Then the TL;DR version of my answer is: "Neither can I"13:52.36 
paulgardiner Damn! :-)13:52.52 
  for old style xrefs, I think looking for startxref after the trailer works, but for new style xrefs...13:53.44 
tor8 paulgardiner: yeah. for new style xrefs ... no can do.13:54.01 
paulgardiner Possibly AR doesn't check byte ranges other than for the case of the signature being the last change13:54.38 
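For the one case that does look tractable, where the signature is the last change so the signed length is simply the current file size, the shape check on the byte range might be sketched as follows. The function name and the `signed_length` parameter are hypothetical. The sketch assumes the usual form `[0 a b c]` (two offset/length pairs with a single hole for the digest) and does not verify that the hole lines up exactly with the signature's Contents string, which a real check would also want.

```python
def byte_range_covers(byte_range: list[int], signed_length: int) -> bool:
    """True if byte_range is [0, a, b, c] with one hole and b + c == signed_length."""
    if len(byte_range) != 4:
        return False
    start1, len1, start2, len2 = byte_range
    return (
        start1 == 0                            # hashing starts at the first byte
        and len1 > 0
        and start2 > len1                      # non-empty hole for the digest
        and len2 > 0
        and start2 + len2 == signed_length     # hashing runs to the end of the signed file
    )
```

A byte range that stops short of `signed_length` is the malicious "almost none of the file is hashed" case described above; for earlier revisions the difficulty remains knowing what `signed_length` should be.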