Log of #mupdf at irc.freenode.net.

2018/01/03
Robin_Watts tor8, sebras: I've just been talking to paulgardiner about digital signatures.15:56.02 
  Got a mo to run an idea past you?15:56.10 
tor8 Robin_Watts: sure.15:56.40 
Robin_Watts When we write a file with digital signatures, the old code writes the document to a file.15:56.42 
  Then the signing process reads that file back in again to calculate stuff.15:56.57 
  and then the file is updated.15:57.03 
tor8 yes. horrible stuff.15:57.17 
Robin_Watts Since that was done, we generalised it to write to fz_output.15:57.19 
  And you can't read back from an fz_output.15:57.28 
tor8 mixing fz_output and FILE* reopening and reading and patching15:57.34 
Robin_Watts I think the correct solution here is to have an fz_output call that says "give me an fz_stream so I can read from what has been output".15:57.57 
  With the understanding being that you should close the fz_stream before continuing to output.15:58.46 
tor8 Robin_Watts: how about a special checksumming fz_output?15:58.51 
Robin_Watts tor8: it's not just a checksumming fz_output that is needed (we discussed that idea :) )15:59.09 
tor8 or a tee-ing fz_output that feeds output both to the backend and the digital signature checksumming while it's being written15:59.22 
Robin_Watts Yep, thought of that too.15:59.31 
tor8 okay.15:59.34 
Robin_Watts The problem is we need to 'checksum' all of the file, apart from the bits of the file where the signatures are to be written.15:59.59 
  And we don't know where the signature data is to be written until we've written that much of the file.16:00.15 
tor8 well, that can be solved by either toggling a bit for the checksumming device, or by having a chained set of fz_outputs where the checksumming one forwards to the base one16:00.41 
  and when you want to write bits that aren't to be checksummed, bypass the checksumming one16:01.00 
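The tee-ing idea tor8 describes here could look roughly like the Python sketch below. It is only an illustration of the concept: the class name `TeeDigestOutput` and the `checksumming` flag are hypothetical and are not MuPDF API, and SHA-256 is an assumed digest.

```python
import hashlib
import io


class TeeDigestOutput:
    """Hypothetical tee-ing output: forwards every write to a backend and,
    while checksumming is enabled, also feeds the same bytes to a digest."""

    def __init__(self, backend, algorithm="sha256"):
        self.backend = backend
        self.digest = hashlib.new(algorithm)
        self.checksumming = True  # the "toggle bit" from the discussion

    def write(self, data):
        if self.checksumming:
            self.digest.update(data)
        return self.backend.write(data)


out = TeeDigestOutput(io.BytesIO())
out.write(b"signed part one ")
out.checksumming = False   # bypass the digest for the signature bytes
out.write(b"<placeholder>")
out.checksumming = True
out.write(b" signed part two")
```

The backend receives all three writes, while the digest only sees the first and last. The hard part raised later in the discussion, knowing exactly when to flip the flag, is not addressed by this sketch.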
Robin_Watts tor8: So the situation is that we have a normal file, in the middle of which we have a PDF object that needs to be updated.16:03.58 
tor8 I'm not up to date with how the ByteRange stuff is handled when writing the PDF16:05.05 
Robin_Watts The object is 999 0 obj\n<< /ByteRange [ 888888888 88888888 8888888 8888888 ] /Contents .... / Filter ..... >> stream endstream endobj16:05.13 
  And the byteranges are calculated to be everything except the stuff we're about to update.16:06.08 
tor8 presize_unsaved_signature_byteranges() looks like it puts dummy values into that ByteRange that are then found and overwritten in complete_signatures16:06.24 
Robin_Watts tor8: Yes.16:06.31 
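A minimal Python sketch of the placeholder-then-patch approach, working on an in-memory buffer. The slot widths and the function name `sign_in_place` are made up for illustration, and a bare SHA-256 hex digest stands in for the real signature data; this is not how MuPDF's `complete_signatures` is implemented.

```python
import hashlib

# Hypothetical fixed-width placeholders, loosely modelled on the object quoted above.
BYTERANGE_SLOT = b"/ByteRange [" + b" " * 40 + b"]"
CONTENTS_SLOT = b"/Contents <" + b"0" * 64 + b">"


def sign_in_place(pdf: bytes) -> bytes:
    """Fill in /ByteRange, digest everything outside the /Contents value,
    then store the digest in /Contents without moving any offsets."""
    data = bytearray(pdf)
    # The excluded gap is the <...> hex string, angle brackets included.
    gap_start = data.index(CONTENTS_SLOT) + len(b"/Contents ")
    gap_end = gap_start + len(CONTENTS_SLOT) - len(b"/Contents ")
    ranges = b"%d %d %d %d" % (0, gap_start, gap_end, len(data) - gap_end)
    assert len(ranges) <= 40, "placeholder too small for these offsets"
    # Patch /ByteRange first, padded to the slot width so nothing shifts.
    br = data.index(BYTERANGE_SLOT) + len(b"/ByteRange [")
    data[br:br + 40] = ranges.ljust(40)
    # The patched /ByteRange lies outside the gap, so the digest covers it.
    digest = hashlib.sha256(bytes(data[:gap_start]) + bytes(data[gap_end:]))
    data[gap_start + 1:gap_end - 1] = digest.hexdigest().encode("ascii")
    return bytes(data)


doc = (b"%PDF-1.7\n999 0 obj\n<< " + BYTERANGE_SLOT + b" "
       + CONTENTS_SLOT + b" >>\nendobj\n%%EOF\n")
signed = sign_in_place(doc)
```

Because both slots are fixed width, the signed file has the same length as the unsigned one and every offset written earlier stays valid.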
  So we'd need to be writing into the magic checksumming fz_output, at least until we start to write this object.16:07.18 
tor8 the old code has an ugly hack that assumes there's always a filename it can reopen; I assume that's what you're trying to fix here?16:07.36 
paulgardiner I think the "/ByteRange ..." bit has to be included in the digest otherwise one could fake a signed document.16:07.48 
Robin_Watts The trick would be to see if we could stop the checksumming fz_output at exactly the right point, and start it again afterwards.16:07.56 
  tor8: yes.16:07.59 
paulgardiner tor8: that is mainly what we are trying to fix16:08.26 
Robin_Watts It's possible that the magic checksumming fz_output could be made to work, but my feeling is that it could be hard, and nasty.16:08.53 
tor8 yeah. I expect knowing when to stop is the problem. maybe have a special bit in the pdf_obj flag words, and trigger on that in the pdf object printing.16:09.05 
Robin_Watts Whereas an fz_stream_from_output call is dead simple.16:09.17 
paulgardiner I also think "/ByteRange ..." being in the digest makes the doing it on the fly idea difficult16:09.29 
Robin_Watts paulgardiner: Indeed.16:09.35 
tor8 Robin_Watts: it would be dead simple ... IFF you're working from a file or memory buffer16:09.38 
Robin_Watts tor8: Right, and if you're not, then you can't do digital signatures :)16:09.53 
tor8 I can live with that. it's certainly better than our current hack :)16:10.08 
paulgardiner Magic!16:10.17 
Robin_Watts tor8: I suspect it wouldn't be too hard for secureFS either, cos that already has a reading thing.16:10.26 
tor8 but it does run into possible problems with windows and opening files that are already open16:10.32 
  Robin_Watts: push come to shove, you could save to a fz_buffer and then write that out16:10.55 
Robin_Watts tor8: Hmm, maybe, yes.16:13.17 
  (windows problems)16:13.23 
  but then if that's a problem, we can have a special fz_stream_from_file_thats_already_being_written.16:13.57 
  that reuses the given FILE * and doesn't close it.16:14.10 
paulgardiner Can you close and reopen an fz_output?16:14.11 
tor8 because I assume you'll be wanting to open the fz_output, write the file, open as stream to digest, seek in the output to patch, close the lot16:14.14 
Robin_Watts tor8: yes, exactly.16:14.25 
tor8 Robin_Watts: we could always open FILE as read-write16:14.31 
Robin_Watts Yeah, that's what I'm thinking.16:14.38 
tor8 and just reuse the FILE*16:14.39 
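The sequence agreed on here (open read-write, write the file, read it back to digest, seek to patch, close once) can be sketched in Python as follows. The function name and the head/gap/tail split are hypothetical simplifications; the point is only that a single read-write handle is reused and the file is never closed and reopened.

```python
import hashlib
import os
import tempfile


def write_digest_patch(path, head, gap_len, tail):
    """Write head + gap + tail through one read-write handle, digest
    everything except the gap by reading back through the same handle,
    then seek back and patch the gap."""
    with open(path, "w+b") as f:      # "w+b": create/truncate, read and write
        f.write(head)
        gap_start = f.tell()
        f.write(b"0" * gap_len)       # placeholder for the signature bytes
        f.write(tail)
        f.flush()

        # Read back what was just written, skipping the gap.
        f.seek(0)
        digest = hashlib.sha256(f.read(gap_start))
        f.seek(gap_start + gap_len)
        digest.update(f.read())

        # Patch the placeholder in place; gap_len must match the digest length.
        f.seek(gap_start)
        f.write(digest.hexdigest().encode("ascii"))
    return digest.hexdigest()


path = os.path.join(tempfile.mkdtemp(), "signed.bin")
hexdigest = write_digest_patch(path, b"HEAD<", 64, b">TAIL")
```

Since the handle stays open throughout, nothing else sees an intermediate "closed" version of the file, which is the concern raised below about virus scanners and versioning file stores.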
paulgardiner So would we open read-write as default just in case?16:15.32 
Robin_Watts paulgardiner: probably, yes.16:15.57 
paulgardiner Is closing and reopening not a possibility?16:16.09 
Robin_Watts paulgardiner: Not nicely, no.16:16.17 
  Especially cos of ****ing virus programs that spot .pdf's being closed and immediately open them to scan them, only to stop your reopening attempt working.16:16.47 
paulgardiner Opening read-write should be okay I guess.16:17.21 
Robin_Watts We really shouldn't close and open a file - consider the 'I am writing to a file that happens to be on dropbox' case.16:17.35 
  dropbox does versioning, so we don't want it to store both the unsigned and signed versions because it got closed.16:17.56 
paulgardiner Sounds like we have a plan16:18.23 
Robin_Watts read-write should be fine. I can't see a case where that should hurt us.16:18.25 
  indeed.16:18.32 
  I shall go back to bashing munin :)16:18.45 
  tor8: Updated commits on robin/master.16:18.54 
paulgardiner Thanks for the idea.16:18.58 
Robin_Watts tor8: Just waiting for a) a review, and b) an updated random commit.16:20.59 
tor8 Robin_Watts: it might be a problem on filesystems where you have read-access to the PDF but not write access16:23.04 
  in case you just want to open it for reading16:23.14 
Robin_Watts tor8: When we're signing, we're writing out the file.16:23.25 
tor8 but that's not really the case for pdf_write_document16:23.37 
Robin_Watts fz_outputs have to have write access, or they aren't much use :)16:23.49 
tor8 I was thinking the reverse case for some brain fuddled reason... fz_output_from_stream :)16:24.15 
  so nvm me16:24.29 
  Robin_Watts: while you're poking with this, it would be nice if we could fix save_incremental to be a bit more robust and work even if you're not appending to the same file name16:26.04 
Robin_Watts s/Robin_Watts/paulgardiner/ :)16:26.18 
tor8 or write invalid files if you use it 'incorrectly' and ask for an incremental save of a new document (if we haven't already fixed that problem)16:26.36 
Robin_Watts I'm merely kibitzing here, it's paulgardiner with his sleeves rolled up.16:26.55 
tor8 paulgardiner: what I just said to Robin :)16:27.13 
paulgardiner tor8: noted16:27.48 
tor8 it would be nice if we could read from the original fz_stream to copy the 'old' bits and then write the new sections, so we can incrementally save to a new file16:31.32 
  and not need to bother with the 'append' flag16:31.41 
  I don't want to clobber the original file if saving fails or throws an exception16:32.31 
paulgardiner Agreed. I think the apps do the copying at the moment, but better done as part of the write call.16:33.42 
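One way to get the behaviour tor8 asks for (incremental save to a new file, with nothing clobbered if saving fails) is to copy the old bytes, append the new section, and only then move the result into place. The Python sketch below assumes that approach; the function name is hypothetical and it does not reflect how MuPDF's save_incremental works.

```python
import os
import shutil
import tempfile


def save_incremental_copy(original_path, new_section, dest_path):
    """Copy the original bytes, append the new section, and only then
    move the result into place, so a failure part-way through leaves
    neither the original nor an existing destination damaged."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as out, open(original_path, "rb") as src:
            shutil.copyfileobj(src, out)   # the 'old' bits, unchanged
            out.write(new_section)         # the incremental update
        os.replace(tmp_path, dest_path)    # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_path)                # discard the partial temp file
        raise


workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "original.pdf")
updated = os.path.join(workdir, "updated.pdf")
with open(original, "wb") as fh:
    fh.write(b"%PDF-1.7\n...old objects...\n%%EOF\n")
save_incremental_copy(original, b"...new objects and xref...\n%%EOF\n", updated)
```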
mike1 greetings22:31.36 
  is anyone here available for a quick question?22:31.43 
  new to pdfs and mupdf and am trying to find an equivalent set of commands from some python based code i have22:32.23 
  just to throw it out there, when I run "mutool show a.pdf 1" I get breakout of the first object22:48.26 
  i.e. https://pastebin.com/W8HkgCLH22:48.27 
  but how do I traverse the objects listed there, like AcroForm or Metadata?22:48.47 
  I understand that this is a tree structure but can't figure out how I walk the tree22:49.03 
  I'm looking to do something equivalent to what pyquery and pdfminer can do22:49.19 
sebras mike1: to show the AcroForm object you would do "mutool show a.pdf 83"23:44.13 
  or "mutool show a.pdf 81" for the meta data object23:44.23 
  mike1: if you want you can also do "mutool show a.pdf xref" to show the xref, which lists all objects and their offsets.23:44.58 
  or "mutool show a.pdf trailer" to show the trailer "object" from which a PDF parser starts to locate all the things it needs.23:45.30 
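Walking the tree from a script could be sketched as below: run `mutool show` on an object, pull out the `N G R` indirect references, and repeat for each number of interest. The helper names are hypothetical, the regular expression is a simplification (it only catches references that directly follow a simple name, not references inside arrays), and the sample dictionary is made up apart from the object numbers 83 and 81 mentioned above.

```python
import re
import subprocess

# Matches "/Name 83 0 R" and captures the name, object number and generation.
REF = re.compile(rb"/(\w+)\s+(\d+)\s+(\d+)\s+R\b")


def indirect_refs(object_text: bytes) -> dict:
    """Map each dictionary key to the object number it refers to."""
    return {name.decode("ascii"): int(num)
            for name, num, _gen in REF.findall(object_text)}


def show_object(pdf_path, selector):
    """Return the output of `mutool show <pdf> <selector>`.
    Assumes mutool is installed and on PATH."""
    result = subprocess.run(["mutool", "show", pdf_path, str(selector)],
                            check=True, capture_output=True)
    return result.stdout


# Made-up example of what a catalog object might contain.
sample = b"<< /Type /Catalog /Pages 2 0 R /AcroForm 83 0 R /Metadata 81 0 R >>"
refs = indirect_refs(sample)
```

With a real file, something like `show_object("a.pdf", refs["AcroForm"])` would then fetch the next node to inspect.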