| | 20180103 |
Robin_Watts | tor8, sebras: I've just been talking to paulgardiner about digital signatures. | 15:56.02 |
| Got a mo to run an idea past you? | 15:56.10 |
tor8 | Robin_Watts: sure. | 15:56.40 |
Robin_Watts | When we write a file with digital signatures, the old code writes the document to a file. | 15:56.42 |
| Then the signing process reads that file back in again to calculate stuff. | 15:56.57 |
| and then the file is updated. | 15:57.03 |
tor8 | yes. horrible stuff. | 15:57.17 |
Robin_Watts | Since that was done, we generalised it to write to fz_output. | 15:57.19 |
| And you can't read back from an fz_output. | 15:57.28 |
tor8 | mixing fz_output and FILE* reopening, reading, and patching | 15:57.34 |
Robin_Watts | I think the correct solution here is to have an fz_output call that says "give me an fz_stream so I can read from what has been output". | 15:57.57 |
| With the understanding being that you should close the fz_stream before continuing to output. | 15:58.46 |
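The proposal can be sketched in plain C with a toy buffer-backed output; these are hypothetical stand-in types, not the real fz_output/fz_stream API, and `stream_from_output` is the imagined equivalent of the call being proposed:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for fz_output and fz_stream: an output that appends
 * to a growable buffer, and a read-only cursor over it. */
typedef struct { unsigned char *data; size_t len, cap; } output;
typedef struct { const unsigned char *p, *end; } stream;

static void out_write(output *o, const void *buf, size_t n)
{
    if (o->len + n > o->cap) {
        o->cap = (o->len + n) * 2;
        o->data = realloc(o->data, o->cap);
    }
    memcpy(o->data + o->len, buf, n);
    o->len += n;
}

/* The proposed call: hand back a readable stream over everything
 * written so far.  Per the discussion, the caller must drop this
 * view before writing to the output again. */
static stream stream_from_output(const output *o)
{
    stream s = { o->data, o->data + o->len };
    return s;
}
```

The contract ("close the stream before continuing to output") matters here because further writes may move or extend the underlying storage, invalidating the read view.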
tor8 | Robin_Watts: how about a special checksumming fz_output? | 15:58.51 |
Robin_Watts | tor8: it's not just a checksumming fz_output that is needed (we discussed that idea :) ) | 15:59.09 |
tor8 | or a tee-ing fz_output that feeds output both to the backend and the digital signature checksumming while it's being written | 15:59.22 |
Robin_Watts | Yep, thought of that too. | 15:59.31 |
tor8 | okay. | 15:59.34 |
Robin_Watts | The problem is we need to 'checksum' all of the file, apart from the bits of the file where the signatures are to be written. | 15:59.59 |
| And we don't know where the signature data is to be written until we've written that much of the file. | 16:00.15 |
tor8 | well, that can be solved by either toggling a bit for the checksumming device, or by having a chained set of fz_outputs where the checksumming one forwards to the base one | 16:00.41 |
| and when you want to write bits that aren't to be checksummed, bypass the checksumming one | 16:01.00 |
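The chained/bypass idea can be illustrated like this; a hypothetical sketch, with a plain byte sum standing in for the real signature digest and a FILE* standing in for the base fz_output:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative checksumming front-end: every byte is forwarded to the
 * base sink, but only bytes written while `digesting` is set are fed
 * into the running "digest".  Toggling the flag off is the bypass for
 * the signature bytes. */
typedef struct {
    FILE *base;          /* base output everything is forwarded to */
    unsigned long sum;   /* toy digest: plain byte sum */
    int digesting;       /* 0 = bypass (signature hole) */
} cksum_out;

static void cksum_write(cksum_out *o, const unsigned char *buf, size_t n)
{
    size_t i;
    if (o->digesting)
        for (i = 0; i < n; i++)
            o->sum += buf[i];
    fwrite(buf, 1, n, o->base);  /* the base output sees every byte */
}
```

The hard part the discussion turns on is not this mechanism but knowing exactly when to flip `digesting` off and on during object serialisation.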
Robin_Watts | tor8: So the situation is that we have a normal file, in the middle of which we have a PDF object that needs to be updated. | 16:03.58 |
tor8 | I'm not up to date with how the ByteRange stuff is handled when writing the PDF | 16:05.05 |
Robin_Watts | The object is 999 0 obj\n<< /ByteRange [ 888888888 88888888 8888888 8888888 ] /Contents .... / Filter ..... >> stream endstream endobj | 16:05.13 |
| And the byteranges are calculated to be everything except the stuff we're about to update. | 16:06.08 |
tor8 | presize_unsaved_signature_byteranges() looks like it puts dummy values into that ByteRange that are then found and overwritten in complete_signatures | 16:06.24 |
Robin_Watts | tor8: Yes. | 16:06.31 |
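Once the position and size of the /Contents hole are known, the /ByteRange arithmetic itself is simple: everything before the hole, then everything after it. A hypothetical helper (not the MuPDF implementation):

```c
#include <assert.h>
#include <stddef.h>

/* Fill in the four /ByteRange numbers for a signature whose
 * /Contents occupies [hole_start, hole_start + hole_len) in a file
 * of file_len bytes: two (offset, length) pairs covering everything
 * except the hole. */
static void byte_range(size_t file_len, size_t hole_start,
                       size_t hole_len, size_t br[4])
{
    br[0] = 0;
    br[1] = hole_start;                          /* bytes before the hole */
    br[2] = hole_start + hole_len;               /* resume after the hole */
    br[3] = file_len - (hole_start + hole_len);  /* bytes after the hole */
}
```

This is why dummy values are pre-sized and patched later: `hole_start` is unknown until that much of the file has been written, and `file_len` until all of it has.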
| So we'd need to be writing into the magic checksumming fz_output, at least until we start to write this object. | 16:07.18 |
tor8 | the old code has an ugly hack that assumes there's always a filename it can reopen; I assume that's what you're trying to fix here? | 16:07.36 |
paulgardiner | I think the "/ByteRange ..." bit has to be included in the digest otherwise one could fake a signed document. | 16:07.48 |
Robin_Watts | The trick would be to see if we could stop the checksumming fz_output at exactly the right point, and start it again afterwards. | 16:07.56 |
| tor8: yes. | 16:07.59 |
paulgardiner | tor8: that is mainly what we are trying to fix | 16:08.26 |
Robin_Watts | It's possible that the magic checksumming fz_output could be made to work, but my feeling is that it could be hard, and nasty. | 16:08.53 |
tor8 | yeah. I expect knowing when to stop is the problem. maybe have a special bit in the pdf_obj flag words, and trigger on that in the pdf object printing. | 16:09.05 |
Robin_Watts | Whereas an fz_stream_from_output call is dead simple. | 16:09.17 |
paulgardiner | I also think "/ByteRange ..." being in the digest makes the on-the-fly idea difficult | 16:09.29 |
Robin_Watts | paulgardiner: Indeed. | 16:09.35 |
tor8 | Robin_Watts: it would be dead simple ... IFF you're working from a file or memory buffer | 16:09.38 |
Robin_Watts | tor8: Right, and if you're not, then you can't do digital signatures :) | 16:09.53 |
tor8 | I can live with that. it's certainly better than our current hack :) | 16:10.08 |
paulgardiner | Magic! | 16:10.17 |
Robin_Watts | tor8: I suspect it wouldn't be too hard for secureFS either, cos that already has a reading thing. | 16:10.26 |
tor8 | but it does run into possible problems with windows and opening files that are already open | 16:10.32 |
| Robin_Watts: push come to shove, you could save to a fz_buffer and then write that out | 16:10.55 |
Robin_Watts | tor8: Hmm, maybe, yes. | 16:13.17 |
| (windows problems) | 16:13.23 |
| but then if that's a problem, we can have a special fz_stream_from_file_thats_already_being_written. | 16:13.57 |
| that reuses the given FILE * and doesn't close it. | 16:14.10 |
paulgardiner | Can you close and reopen an fz_output? | 16:14.11 |
tor8 | because I assume you'll be wanting to open the fz_output, write the file, open as stream to digest, seek in the output to patch, close the lot | 16:14.14 |
Robin_Watts | tor8: yes, exactly. | 16:14.25 |
tor8 | Robin_Watts: we could always open FILE as read-write | 16:14.31 |
Robin_Watts | Yeah, that's what I'm thinking. | 16:14.38 |
tor8 | and just reuse the FILE* | 16:14.39 |
paulgardiner | So would we open read-write as default just in case? | 16:15.32 |
Robin_Watts | paulgardiner: probably, yes. | 16:15.57 |
paulgardiner | Is closing and reopening not a possibility? | 16:16.09 |
Robin_Watts | paulgardiner: Not nicely, no. | 16:16.17 |
| Especially cos of ****ing virus programs that spot .pdfs being closed and immediately open them to scan them, only to stop your reopening attempt working. | 16:16.47 |
paulgardiner | Opening read-write should be okay I guess. | 16:17.21 |
Robin_Watts | We really shouldn't close and open a file - consider the 'I am writing to a file that happens to be on dropbox' case. | 16:17.35 |
| dropbox does versioning, so we don't want it to store both the unsigned and signed versions because it got closed. | 16:17.56 |
paulgardiner | Sounds like we have a plan | 16:18.23 |
Robin_Watts | read-write should be fine. I can't see a case where that should hurt us. | 16:18.25 |
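The plan agreed above (one read-write FILE* for the whole job: write, seek back to read for the digest, then patch the placeholder in place, never closing in between) can be exercised with plain stdio. A minimal sketch; `tmpfile()` stands in for `fopen(path, "w+b")`:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a document containing a placeholder, then seek back and
 * overwrite the placeholder in place on the same FILE*.  The fflush
 * and fseek calls satisfy C's rule for switching direction on an
 * update stream. */
static void write_then_patch(FILE *f)
{
    fputs("data [PLACEHOLDER] data", f);  /* write phase */
    fflush(f);

    fseek(f, 6, SEEK_SET);                /* patch phase, in place */
    fwrite("SIGNATURE##", 1, 11, f);
    fflush(f);
}
```

Because the file is never closed, neither antivirus scanners nor sync tools like Dropbox ever see an intermediate (unsigned) version as a finished file.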
| indeed. | 16:18.32 |
| I shall go back to bashing munin :) | 16:18.45 |
| tor8: Updated commits on robin/master. | 16:18.54 |
paulgardiner | Thanks for the idea. | 16:18.58 |
Robin_Watts | tor8: Just waiting for a) a review, and b) an updated random commit. | 16:20.59 |
tor8 | Robin_Watts: it might be a problem on filesystems where you have read-access to the PDF but not write access | 16:23.04 |
| in case you just want to open it for reading | 16:23.14 |
Robin_Watts | tor8: When we're signing, we're writing out the file. | 16:23.25 |
tor8 | but that's not really the case for pdf_write_document | 16:23.37 |
Robin_Watts | fz_outputs have to have write access, or they aren't much use :) | 16:23.49 |
tor8 | I was thinking the reverse case for some brain fuddled reason... fz_output_from_stream :) | 16:24.15 |
| so nvm me | 16:24.29 |
| Robin_Watts: while you're poking with this, it would be nice if we could fix save_incremental to be a bit more robust and work even if you're not appending to the same file name | 16:26.04 |
Robin_Watts | s/Robin_Watts/paulgardiner/ :) | 16:26.18 |
tor8 | or write invalid files if you use it 'incorrectly' and ask for an incremental save of a new document (if we haven't already fixed that problem) | 16:26.36 |
Robin_Watts | I'm merely kibitzing here, it's paulgardiner with his sleeves rolled up. | 16:26.55 |
tor8 | paulgardiner: what I just said to Robin :) | 16:27.13 |
paulgardiner | tor8: noted | 16:27.48 |
tor8 | it would be nice if we could read from the original fz_stream to copy the 'old' bits and then write the new sections, so we can incrementally save to a new file | 16:31.32 |
| and not need to bother with the 'append' flag | 16:31.41 |
| I don't want to clobber the original file if saving fails or throws an exception | 16:32.31 |
paulgardiner | Agreed. I think the apps do the copying at the moment, but better done as part of the write call. | 16:33.42 |
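The save-to-a-new-file approach suggested above can be sketched as: copy the original bytes from the source stream to the destination, then append the incremental update section. The original is never opened for writing, so a failure mid-save cannot clobber it. A hypothetical helper, not the MuPDF implementation:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Incremental save to a different file: copy the 'old' bits from the
 * source, then append the new update section (new objects, xref,
 * trailer) to the destination. */
static void save_incremental_to(FILE *src, FILE *dst,
                                const char *update, size_t update_len)
{
    char buf[4096];
    size_t n;

    fseek(src, 0, SEEK_SET);
    while ((n = fread(buf, 1, sizeof buf, src)) > 0)
        fwrite(buf, 1, n, dst);          /* copy the original bytes */
    fwrite(update, 1, update_len, dst);  /* append the update section */
    fflush(dst);
}
```

This also removes the need for an 'append' flag: whether the destination happens to be a fresh file or not, the output is built the same way.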
mike1 | greetings | 22:31.36 |
| is anyone here available for a quick question? | 22:31.43 |
| new to PDFs and MuPDF, and am trying to find an equivalent set of commands for some Python-based code I have | 22:32.23 |
| just to throw it out there, when I run "mutool show a.pdf 1" I get a breakout of the first object | 22:48.26 |
| i.e. https://pastebin.com/W8HkgCLH | 22:48.27 |
| but how do I traverse the objects listed there, like AcroForm or Metadata? | 22:48.47 |
| I understand that this is a tree structure but can't figure out how I walk the tree | 22:49.03 |
| I'm looking to do something equivalent to what pyquery and pdfminer can do | 22:49.19 |
sebras | mike1: to show the AcroForm object you would do "mutool show a.pdf 83" | 23:44.13 |
| or "mutool show a.pdf 81" for the meta data object | 23:44.23 |
| mike1: if you want you can also | 23:44.32 |
| do "mutool show a.pdf xref" to show the xref which lists all objects and their offsets. | 23:44.58 |
| or "mutool show a.pdf trailer" to show the trailer "object" from which a PDF parser starts to locate all the things it needs. | 23:45.30 |