| | 20180710 |
inflex | Is the muPDF "outline" a small-page-view column down the side of the viewer, or is it something else entirely? | 10:16.57 |
| If muPDF doesn't already support something like that, then I suppose I could look at generating bitmapped thumbnails of the pages on load. | 10:26.30 |
kens | While MuPDF (the core code) can certainly access the /Outlines of the PDF file, I don't think any of the demo apps expose it (ie add a user interface to allow you to select a page based on the Outlines). I could be wrong though. | 10:27.40 |
inflex | np, all good. I noticed that mupdf-gl does have "press 'o' for outlines", but maybe that's actually more an "outline/description" of the PDF. | 10:36.44 |
kens | Ah I wasn't aware of that, I'm afraid I don't know what it does, sorry | 10:37.23 |
tor8 | inflex: the 'o' outline is the table of contents (called the 'outline' in PDF specification) | 10:37.26 |
inflex | thanks tor8. Is there something I can expose from the inner workings to achieve the side-pane of thumbnails, or is it something I just need to generate myself? | 11:33.16 |
tor8 | inflex: you'll have to render the page thumbnails yourself. beware that it could be very slow on image-intensive pages. | 11:37.40 |
| rendering the thumbnail is going to involve parsing and rendering the whole page, just at a small size | 11:37.56 |
| there are no page thumbnail images stored in the PDF format | 11:38.12 |
| so unless you're doing multiple threads and rendering in the background, I can't recommend it | 11:38.45 |
inflex | tor8, that's fine, I had suspected as much | 12:03.30 |
paulgardiner | tor8: I've run into a problem with signature support that I'm struggling with. The signature field dictionary refers to a byte range, which specifies what parts of the document are hashed as part of signing. When verifying, we need to check that the byte range is reasonable: it should cover what was the whole document at the time of signing (although that might be a prefix of the document at... | 12:41.10 |
| ...time of verification because of subsequent incremental updates). | 12:41.12 |
| I've tried changing the loading functions to look for startxref <number> %%EOF after reading the trailer. That works for anything mupdf produces, but I'm seeing some files where the trailer is stored near the beginning of the file. I'm a bit lost how to approach this now. | 12:41.18 |
| Possibly it fails with some file mupdf produces, come to think of it. The structure I'm seeing may be due to linearization | 12:54.09 |
tor8 | paulgardiner: I don't think I can be of much help... you and robin have changed that code so much I no longer recognize it or know what it does... | 12:58.39 |
paulgardiner | I could search for the startxref <number> %%EOF for which the number corresponds to the start of the xref, but that might require reading through the whole document. | 12:59.08 |
| tor8: I'm not sure I'm asking about mupdf so much as about PDF. | 12:59.33 |
tor8 | paulgardiner: the trailer can be anywhere, often at the beginning with linearized files I expect. | 13:00.56 |
| can't you just save the size of the file when we first open the PDF? | 13:01.20 |
| we start by scanning for 'startxref' at the end to find the trailer | 13:01.32 |
paulgardiner | That works only for the last xref section. | 13:01.45 |
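The tail scan tor8 describes, and the limitation paulgardiner points out, can be sketched roughly as follows. This is an illustration in Python, not MuPDF's actual code: the name `find_last_startxref` and the 1024-byte window are assumptions made for the example.

```python
import re

# "startxref <offset> %%EOF", with any mix of CR, LF and spaces between tokens.
_STARTXREF = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def find_last_startxref(data: bytes, window: int = 1024):
    """Return the xref offset named by the last startxref in the tail, or None.

    Hypothetical helper: only the last `window` bytes are searched, so it
    finds the newest xref section and says nothing about older ones.
    """
    tail = data[-window:]
    matches = list(_STARTXREF.finditer(tail))
    if not matches:
        return None
    return int(matches[-1].group(1))
```

Because only the tail is inspected, the result describes the latest revision only, which is why it does not help with signatures made before later incremental updates.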
inflex | tor8, almost wondering, since most of the PDFs that are going to be viewed through the viewer will be used over and over again (schematic diagrams), it could be possible to create PNG thumbnails once-off | 13:01.54 |
| tor8, and have them stored alongside the actual PDF as a metafile. | 13:02.21 |
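A minimal sketch of the once-off thumbnail cache inflex is describing, assuming the renderer is supplied by the viewer. The names `thumbnail_path` and `cached_thumbnail`, the `.thumbs` directory naming, and the `render` callback are all hypothetical; the actual page rendering (the slow part tor8 warns about) is left to the caller.

```python
from pathlib import Path
from typing import Callable


def thumbnail_path(pdf_path: Path, page: int) -> Path:
    # Assumed naming scheme: board.pdf -> board.pdf.thumbs/page-0003.png
    return pdf_path.with_name(pdf_path.name + ".thumbs") / f"page-{page:04d}.png"


def cached_thumbnail(pdf_path: Path, page: int,
                     render: Callable[[Path, int], bytes]) -> Path:
    """Return the PNG thumbnail path, rendering only if the cache is stale.

    `render` is a caller-supplied function returning PNG bytes for a page.
    """
    out = thumbnail_path(pdf_path, page)
    # Reuse the cached file if it is at least as new as the PDF itself.
    if out.exists() and out.stat().st_mtime >= pdf_path.stat().st_mtime:
        return out
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_bytes(render(pdf_path, page))
    return out
```

Comparing modification times means a schematic that gets replaced is re-rendered on the next view, while unchanged files pay the rendering cost only once.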
paulgardiner | tor8: I need to be able to work out the sizes of the file corresponding to each xref section. | 13:02.54 |
tor8 | paulgardiner: you can't reasonably verify anything other than the last xref section? | 13:02.58 |
paulgardiner | tor8: I think we need to for the case of multiple signatures. | 13:03.21 |
tor8 | I mean, any additional xref sections or data appended to the end can very much change the document | 13:03.22 |
paulgardiner | Possibly we also need to determine if an incremental update adds only signatures. | 13:03.50 |
tor8 | and saying "this signature checks out, because the subset of the file that was used when it was signed matches, but hey, I'm just kidding somebody replaced bits with newer objects at the end of the file" | 13:04.28 |
| is not okay | 13:04.42 |
| I mean, it's perfectly plausible to replace the content stream of a page with a newer generation object in an incremental update | 13:05.05 |
paulgardiner | See above. | 13:05.14 |
tor8 | that (and any other edits) *should* invalidate *all* signatures. | 13:05.17 |
paulgardiner | the case of the only change being to add another signature. | 13:05.43 |
tor8 | wouldn't multiple signatures put each other in the byte ranges that are excluded? | 13:05.53 |
paulgardiner | It seems not. If using AR I sign a document that has multiple signature fields, it doesn't leave room for the digest of the other signatures. | 13:07.10 |
tor8 | right. so each signature checks a certain subset of the file. | 13:07.47 |
paulgardiner | Yep. | 13:07.54 |
tor8 | and the ranges are shorter than a file if incrementally updated | 13:08.12 |
paulgardiner | Yep. | 13:08.20 |
tor8 | so if we incrementally update the file, we can't tell if those edits in general invalidate a signature | 13:08.54 |
| or well, if they should invalidate it | 13:09.02 |
paulgardiner | AR will say something like "This signature is valid but for an early version of the document", and then asks if the user would like to view the earlier version. I assume it doesn't do that if the only change is to add another signature | 13:10.02 |
| AR can, in some cases at least, list the changes that have been made since a signing. | 13:10.37 |
tor8 | paulgardiner: looking at the spec (pdfref17.pdf page 726) there's a note | 13:11.16 |
| If a signed document is modified and saved by incremental update, bla bla bla, it is possible to recreate the state of the document as it existed at the time of signing. | 13:11.50 |
| it doesn't say how, other than implying the ByteRange array | 13:12.07 |
| so I would have to assume the last range will imply the end of the file | 13:12.22 |
| at the time of signing | 13:12.28 |
| and it's only possible to tell the end of the file for the latest iteration (the startxref entry) | 13:13.16 |
paulgardiner | I don't see how that relates to byte ranges | 13:13.20 |
tor8 | incremental updates don't write the end of the file of the previous version, it just chains the xrefs | 13:13.38 |
| "Note: If a signed document is modified and saved by incremental update (see Section | 13:14.11 |
| 3.4.5, “Incremental Updates”), the data corresponding to the byte range of the | 13:14.11 |
| original signature is preserved. Therefore, if the signature is valid, it is possible to | 13:14.11 |
| recreate the state of the document as it existed at the time of signing." | 13:14.11 |
paulgardiner | You can create the state of the document at the time of signing by just ignoring the subsequent xrefs, I think | 13:14.13 |
| No need for the byte ranges. | 13:14.29 |
| The byte ranges specify what is hashed. | 13:14.43 |
tor8 | there is not a guaranteed one-to-one mapping between xref sections and incremental updates | 13:14.46 |
| it is entirely possible to have multiple xref sections with only one trailer | 13:15.14 |
paulgardiner | In any case, I don't believe there is an intention to use the byte ranges in the recreation of older versions of the document. | 13:16.19 |
tor8 | and the previous trailers are 'lost' when you incrementally save | 13:16.33 |
| "Note: If a signed document is modified and saved by incremental update (see Section | 13:17.08 |
| 3.4.5, “Incremental Updates”), the data corresponding to the byte range of the | 13:17.08 |
| original signature is preserved. Therefore, if the signature is valid, it is possible to | 13:17.09 |
| recreate the state of the document as it existed at the time of signing." | 13:17.09 |
paulgardiner | Really? I thought we always appended to the end of the file for incremental update. | 13:17.13 |
tor8 | we do, but there's nothing in the appended data that points to the previous end of file | 13:17.35 |
| there's the "Prev" entry | 13:18.37 |
| which points to the previous 'xref' section but not the actual EOF | 13:18.55 |
| since the xref can be anywhere in the file | 13:19.01 |
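To make the point about "Prev" concrete, here is a rough Python sketch that follows the chain for old-style xref sections only. It is a simplification under stated assumptions: the helper name `xref_offsets` is hypothetical, the trailer dictionary is assumed to contain no nested dictionary before `/Prev`, and cross-reference streams are not handled at all.

```python
import re

_STARTXREF = re.compile(rb"startxref\s+(\d+)\s+%%EOF")
_PREV = re.compile(rb"/Prev\s+(\d+)")


def xref_offsets(data: bytes):
    """Follow /Prev from the newest old-style xref section back to the oldest.

    Returns the byte offsets of the 'xref' keywords, newest first.
    """
    matches = list(_STARTXREF.finditer(data))
    if not matches:
        return []
    offset = int(matches[-1].group(1))
    chain, seen = [], set()
    # Stop on loops, or when the offset does not point at an 'xref' keyword.
    while offset not in seen and data.startswith(b"xref", offset):
        seen.add(offset)
        chain.append(offset)
        trailer_at = data.find(b"trailer", offset)
        if trailer_at < 0:
            break
        dict_end = data.find(b">>", trailer_at)  # assumes a flat dictionary
        if dict_end < 0:
            break
        prev = _PREV.search(data, trailer_at, dict_end)
        if prev is None:
            break
        offset = int(prev.group(1))
    return chain
```

Every value returned is the start of an xref section. None of them is the length the file had when that revision was saved, which is exactly the information the byte-range check would need.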
| and that sentence leads me to believe the ByteRange implies the length of the previous file | 13:20.12 |
| (especially given how vague and implementation-driven "this is what acrobat does, do that and ignore what the spec actually says" the later additions to the PDF spec are) | 13:20.45 |
paulgardiner | I have PDF32000_2008 here. Is that the wrong version? It says it's v 1.7 | 13:24.36 |
tor8 | paulgardiner: it's the ISO version of the same text - | 13:24.49 |
| worse typography, same content | 13:24.59 |
paulgardiner | My incremental updates section seems to be 7.5.6 | 13:25.32 |
tor8 | oh dear, this is worrying... our code assumes the 'trailer' always succeeds an 'xref' (old style) section | 13:27.10 |
| but the spec says it precedes the 'startxref' | 13:27.17 |
| I wonder if that might trip us up into doing a repair job on valid files | 13:28.35 |
paulgardiner | "precede" as in just before? | 13:28.43 |
tor8 | yes. | 13:28.48 |
| of course, it then throws out the baby and the bathwater and the whole bathroom when they introduce 'Cross Reference streams' where the 'xref' and 'trailer' keywords are just gone | 13:29.14 |
paulgardiner | Well that seems not to be true. | 13:29.17 |
| All the problems I'm having are with xref streams. | 13:29.43 |
tor8 | paulgardiner: the 'new style' ones? | 13:30.04 |
paulgardiner | yep | 13:30.12 |
tor8 | yeah, they don't have an 'xref' or 'trailer' keyword anywhere | 13:30.16 |
| the only reliable end-of-file marker is the "startxref\n[0-9]+\n%%EOF\n" string | 13:30.54 |
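A sketch of how that marker could be used to list candidate revision boundaries, at the cost of scanning the whole file. The pattern is a slightly relaxed form of the one tor8 quotes (it tolerates CR/LF variants and a missing final newline), and `revision_ends` is a hypothetical name.

```python
import re

# startxref <offset> %%EOF, followed by at most one end-of-line sequence.
_EOF_MARKER = re.compile(rb"startxref[\r\n ]+(\d+)[\r\n ]+%%EOF(?:\r\n|\r|\n)?")


def revision_ends(data: bytes):
    """Return (xref_offset, end_of_revision) for each marker in the file.

    end_of_revision is the offset just past the marker, i.e. the length the
    file would have had if that marker closed a revision.
    """
    return [(int(m.group(1)), m.end()) for m in _EOF_MARKER.finditer(data)]
```

These are only candidates: the same bytes could appear inside uncompressed stream data, so a match is not proof that a revision really ended there.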
paulgardiner | It's not the lack of keywords that is troubling me, but the position within the document. | 13:31.05 |
tor8 | but I think if you look at the ByteRange that would probably be enough? | 13:31.11 |
paulgardiner | tor8: the problem with that, is the whole point of what I'm trying to do at the moment is to validate the byte range. | 13:31.43 |
tor8 | and check that the byte range ends at an appropriate point? | 13:32.24 |
paulgardiner | Yeah. | 13:32.44 |
tor8 | you could do what adobe does, and say "this matched an earlier version" and add a question to 'restore the old version?' which would copy the file up to the end of the byteranges | 13:32.53 |
paulgardiner | AR seems to do that in some cases | 13:32.54 |
| I think you are misreading that clause. | 13:33.57 |
tor8 | though maybe, just maybe, if our assumption about oldstyle is always the sequence 'xref <sections> trailer ... startxref ... %%EOF' | 13:34.00 |
| but that would fail for new style where the trailer is at the head of the stream and could be anywhere in the file | 13:34.52 |
paulgardiner | It just points out the fact that an incremental update doesn't alter the bytes of the previous version of the document and hence nothing in the byte range changes | 13:35.02 |
tor8 | paulgardiner: I'm reading more into it than it says, by the sentence "Therefore, if the signature is valid, it is possible to | 13:35.37 |
| recreate the state of the document as it existed at the time of signing." | 13:35.37 |
| the *therefore* is my key word | 13:35.49 |
| but yes, it may be I'm fantasizing | 13:36.06 |
| but I do wonder how you could recreate an old version other than parsing the whole file from the start and scanning for %%EOF | 13:36.39 |
| since nothing in the trailer and xref chain point to the old version's eof | 13:36.57 |
paulgardiner | I've been reading "state" as reader state necessary to show the old version, not file state. | 13:37.52 |
| .. which I believe can be done by just ignoring all the xref sections since the signing. | 13:38.33 |
tor8 | you could pop xref sections until you get to some other version | 13:38.40 |
| but how do you know which one that is? | 13:38.44 |
paulgardiner | The signature field will be referred to by the one you should stop on? | 13:39.07 |
tor8 | referred to by what? | 13:39.21 |
| the signature field could exist in all versions | 13:39.34 |
| I know we discussed being able to time travel by popping xrefs | 13:40.22 |
paulgardiner | But presumably only one xref section refers to the version that you are looking at | 13:40.25 |
tor8 | I don't understand. | 13:41.04 |
| consider this case: a file is created with a signature field. this is the original version A. | 13:41.18 |
| then it is edited to create version B. | 13:41.26 |
| then it is signed and saved as version C. | 13:41.31 |
| then it is edited and saved as version D. | 13:41.36 |
paulgardiner | And what do you wish to achieve from that point? | 13:42.10 |
tor8 | that is my question for you. | 13:42.34 |
| what does version C look like? | 13:42.58 |
| signing the field, writes the digest somewhere. is that in a new incremental section? | 13:43.19 |
paulgardiner | yes | 13:43.29 |
tor8 | and this new xref section, is it included in the byteranges (minus the actual bytes with the digest checksum)? | 13:43.56 |
paulgardiner | I assume if I drop all xref sections since that one, I see the document as it was. | 13:43.57 |
tor8 | now, someone opens version D, and asks to verify the signature | 13:44.23 |
| they should see "this matches a previous version", right? | 13:44.36 |
paulgardiner | yes, and possibly "here's a list of changes since then" | 13:44.59 |
| and an offer to show the version as signed | 13:45.14 |
tor8 | and you're saying how we find this is by looking at the signature field V object, and looking to see which xref subsection it is defined in? | 13:45.45 |
paulgardiner | That's what I've been assuming. I'm not currently trying to do that, but I assumed we might if we need to. | 13:46.18 |
tor8 | if we can be sure that the signature will be saved in an incremental update, we should be able to find it by that way | 13:46.48 |
| because we can't track (a) the EOF for any given xref section, or (b) what it was when it was actually signed | 13:47.13 |
| finding the subsection where the form field object was last updated would probably do well enough | 13:47.32 |
| and we can show the document from that point onwards | 13:47.45 |
| but we'd have to take care to flush all cached pdf_obj's when we rewind the xref view | 13:47.59 |
paulgardiner | Yeah, I was assuming that would work, but that wasn't what I was working on. | 13:48.02 |
| I was just trying to find a way to validate the byte range values. | 13:48.16 |
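The "rewind to the xref section where the signature was last updated" idea can be illustrated with a toy model of tor8's A/B/C/D example. This is not MuPDF code: revisions are modelled as plain dictionaries from object number to object, oldest first, and both function names are hypothetical. It also assumes one xref section per saved version, which tor8 notes is not guaranteed in real files.

```python
from typing import Any, Dict, List, Optional

Revision = Dict[int, Any]  # object number -> object added by one xref section


def last_defining_revision(revisions: List[Revision], objnum: int) -> Optional[int]:
    """Index of the newest revision that (re)defines objnum, or None."""
    for index in range(len(revisions) - 1, -1, -1):
        if objnum in revisions[index]:
            return index
    return None


def view_at(revisions: List[Revision], index: int) -> Revision:
    """Objects visible when every revision after `index` is ignored."""
    view: Revision = {}
    for rev in revisions[: index + 1]:
        view.update(rev)  # newer definitions override older ones
    return view


# Versions A..D from the discussion; object 3 stands in for the signature value.
revs = [
    {1: "page v1", 2: "sig field, unsigned"},   # A: created with a signature field
    {1: "page v2"},                             # B: edited
    {2: "sig field, signed", 3: "digest"},      # C: signed
    {1: "page v3"},                             # D: edited after signing
]
signed_at = last_defining_revision(revs, 3)
as_signed = view_at(revs, signed_at)
```

Here `signed_at` identifies version C and `as_signed` shows the page as it was in version B, not the later edit in D. In a real implementation any cached objects would need flushing when the view is rewound, as tor8 says.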
tor8 | right, so consider if we implement the above | 13:49.00 |
| we can rewind the view to C | 13:49.03 |
paulgardiner | To check that the signing software didn't maliciously use a small byte range that meant almost none of the file is included in the hash | 13:49.04 |
tor8 | but when checking it, we still don't know the actual EOF | 13:49.34 |
| so we can't find such cases | 13:50.07 |
| do we write a new byterange when signing? I thought that was part of the original structure created by the PDF authoring software. | 13:50.44 |
paulgardiner | So you are saying it cannot be done. AR is doing it for the case of the signing being the last thing done, but perhaps that is the only case it does it for. | 13:51.06 |
tor8 | It cannot be done trivially at least :) | 13:51.21 |
paulgardiner | byterange is part of the signature | 13:51.25 |
| My original question was "hey tor look at this. I can't find a trivial way to do it" :-) | 13:51.59 |
| The byte range is usually the whole document with a hole for the digest | 13:52.32 |
tor8 | Then the TL;DR version of my answer is: "Neither can I" | 13:52.36 |
paulgardiner | Damn! :-) | 13:52.52 |
| for old style xrefs, I think looking for startxref after the trailer works, but for new style xrefs... | 13:53.44 |
tor8 | paulgardiner: yeah. for new style xrefs ... no can do. | 13:54.01 |
paulgardiner | Possibly AR doesn't check byte ranges other than for the case of the signature being the last change | 13:54.38 |
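For the one case that does look tractable, the signature being the last change, the byte-range check can be sketched as below. Assumptions are stated loudly: the function names are hypothetical, the ByteRange is taken to be the usual four integers (two ranges around one hole for the digest), and SHA-256 is used purely for illustration, since the real digest algorithm is determined by the signature itself.

```python
import hashlib
from typing import Sequence


def covers_whole_file(byte_range: Sequence[int], file_length: int) -> bool:
    """True if ByteRange is [0, a, b, c] with one hole and ends at file_length."""
    if len(byte_range) != 4:
        return False
    start1, len1, start2, len2 = byte_range
    return (
        start1 == 0
        and len1 > 0
        and start2 > len1            # a non-empty hole for the digest
        and len2 > 0
        and start2 + len2 == file_length
    )


def digest_over_byte_range(data: bytes, byte_range: Sequence[int]) -> str:
    """Hash exactly the bytes named by the (start, length) pairs."""
    h = hashlib.sha256()  # illustrative choice of algorithm
    for start, length in zip(byte_range[0::2], byte_range[1::2]):
        h.update(data[start:start + length])
    return h.hexdigest()
```

For an earlier signature, `file_length` would have to be the length of the file at the time of signing, which is the value the discussion concludes cannot be recovered trivially for new-style xrefs.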