| | 20180103 |
Robin_Watts | tor8, sebras: I've just been talking to paulgardiner about digital signatures. | 15:56.02 |
| Got a mo to run an idea past you? | 15:56.10 |
tor8 | Robin_Watts: sure. | 15:56.40 |
Robin_Watts | When we write a file with digital signatures, the old code writes the document to a file. | 15:56.42 |
| Then the signing process reads that file back in again to calculate stuff. | 15:56.57 |
| and then the file is updated. | 15:57.03 |
tor8 | yes. horrible stuff. | 15:57.17 |
Robin_Watts | Since that was done, we generalised it to write to fz_output. | 15:57.19 |
| And you can't read back from an fz_output. | 15:57.28 |
tor8 | mixing fz_output and FILE* reopening, reading, and patching | 15:57.34 |
Robin_Watts | I think the correct solution here is to have an fz_output call that says "give me an fz_stream so I can read from what has been output". | 15:57.57 |
| With the understanding being that you should close the fz_stream before continuing to output. | 15:58.46 |
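The proposal can be sketched in plain C with a toy buffer-backed output; these are hypothetical stand-in types, not the real fz_output/fz_stream API, and `stream_from_output` is the imagined equivalent of the call being proposed:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for fz_output and fz_stream: an output that appends
 * to a growable buffer, and a read-only cursor over it. */
typedef struct { unsigned char *data; size_t len, cap; } output;
typedef struct { const unsigned char *p, *end; } stream;

static void out_write(output *o, const void *buf, size_t n)
{
    if (o->len + n > o->cap) {
        o->cap = (o->len + n) * 2;
        o->data = realloc(o->data, o->cap);
    }
    memcpy(o->data + o->len, buf, n);
    o->len += n;
}

/* The proposed call: hand back a readable stream over everything
 * written so far.  Per the discussion, the caller must drop this
 * view before writing to the output again. */
static stream stream_from_output(const output *o)
{
    stream s = { o->data, o->data + o->len };
    return s;
}
```

The contract ("close the stream before continuing to output") matters here because further writes may move or extend the underlying storage, invalidating the read view.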
tor8 | Robin_Watts: how about a special checksumming fz_output? | 15:58.51 |
Robin_Watts | tor8: it's not just a checksumming fz_output that is needed (we discussed that idea :) ) | 15:59.09 |
tor8 | or a tee-ing fz_output that feeds output both to the backend and the digital signature checksumming while it's being written | 15:59.22 |
Robin_Watts | Yep, thought of that too. | 15:59.31 |
tor8 | okay. | 15:59.34 |
Robin_Watts | The problem is we need to 'checksum' all of the file, apart from the bits of the file where the signatures are to be written. | 15:59.59 |
| And we don't know where the signature data is to be written until we've written that much of the file. | 16:00.15 |
tor8 | well, that can be solved by either toggling a bit for the checksumming device, or by having a chained set of fz_outputs where the checksumming one forwards to the base one | 16:00.41 |
| and when you want to write bits that aren't to be checksummed, bypass the checksumming one | 16:01.00 |
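The chained/bypass idea can be illustrated like this; a hypothetical sketch, with a plain byte sum standing in for the real signature digest and a FILE* standing in for the base fz_output:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative checksumming front-end: every byte is forwarded to the
 * base sink, but only bytes written while `digesting` is set are fed
 * into the running "digest".  Toggling the flag off is the bypass for
 * the signature bytes. */
typedef struct {
    FILE *base;          /* base output everything is forwarded to */
    unsigned long sum;   /* toy digest: plain byte sum */
    int digesting;       /* 0 = bypass (signature hole) */
} cksum_out;

static void cksum_write(cksum_out *o, const unsigned char *buf, size_t n)
{
    size_t i;
    if (o->digesting)
        for (i = 0; i < n; i++)
            o->sum += buf[i];
    fwrite(buf, 1, n, o->base);  /* the base output sees every byte */
}
```

The hard part the discussion turns on is not this mechanism but knowing exactly when to flip `digesting` off and on during object serialisation.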
Robin_Watts | tor8: So the situation is that we have a normal file, in the middle of which we have a PDF object that needs to be updated. | 16:03.58 |
tor8 | I'm not up to date with how the ByteRange stuff is handled when writing the PDF | 16:05.05 |
Robin_Watts | The object is 999 0 obj\n<< /ByteRange [ 888888888 88888888 8888888 8888888 ] /Contents .... / Filter ..... >> stream endstream endobj | 16:05.13 |
| And the byteranges are calculated to be everything except the stuff we're about to update. | 16:06.08 |
tor8 | presize_unsaved_signature_byteranges() looks like it puts dummy values into that ByteRange that are then found and overwritten in complete_signatures | 16:06.24 |
Robin_Watts | tor8: Yes. | 16:06.31 |
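Once the position and size of the /Contents hole are known, the /ByteRange arithmetic itself is simple: everything before the hole, then everything after it. A hypothetical helper (not the MuPDF implementation):

```c
#include <assert.h>
#include <stddef.h>

/* Fill in the four /ByteRange numbers for a signature whose
 * /Contents occupies [hole_start, hole_start + hole_len) in a file
 * of file_len bytes: two (offset, length) pairs covering everything
 * except the hole. */
static void byte_range(size_t file_len, size_t hole_start,
                       size_t hole_len, size_t br[4])
{
    br[0] = 0;
    br[1] = hole_start;                          /* bytes before the hole */
    br[2] = hole_start + hole_len;               /* resume after the hole */
    br[3] = file_len - (hole_start + hole_len);  /* bytes after the hole */
}
```

This is why dummy values are pre-sized and patched later: `hole_start` is unknown until that much of the file has been written, and `file_len` until all of it has.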
| So we'd need to be writing into the magic checksumming fz_output, at least until we start to write this object. | 16:07.18 |
tor8 | the old code has an ugly hack that assumes there's always a filename it can reopen; I assume that's what you're trying to fix here? | 16:07.36 |
paulgardiner | I think the "/ByteRange ..." bit has to be included in the digest otherwise one could fake a signed document. | 16:07.48 |
Robin_Watts | The trick would be to see if we could stop the checksumming fz_output at exactly the right point, and start it again afterwards. | 16:07.56 |
| tor8: yes. | 16:07.59 |
paulgardiner | tor8: that is mainly what we are trying to fix | 16:08.26 |
Robin_Watts | It's possible that the magic checksumming fz_output could be made to work, but my feeling is that it could be hard, and nasty. | 16:08.53 |
tor8 | yeah. I expect knowing when to stop is the problem. maybe have a special bit in the pdf_obj flag words, and trigger on that in the pdf object printing. | 16:09.05 |
Robin_Watts | Whereas an fz_stream_from_output call is dead simple. | 16:09.17 |
paulgardiner | I also think "/ByteRange ..." being in the digest makes the on-the-fly idea difficult | 16:09.29 |
Robin_Watts | paulgardiner: Indeed. | 16:09.35 |
tor8 | Robin_Watts: it would be dead simple ... IFF you're working from a file or memory buffer | 16:09.38 |
Robin_Watts | tor8: Right, and if you're not, then you can't do digital signatures :) | 16:09.53 |
tor8 | I can live with that. it's certainly better than our current hack :) | 16:10.08 |
paulgardiner | Magic! | 16:10.17 |
Robin_Watts | tor8: I suspect it wouldn't be too hard for secureFS either, cos that already has a reading thing. | 16:10.26 |
tor8 | but it does run into possible problems with windows and opening files that are already open | 16:10.32 |
| Robin_Watts: push come to shove, you could save to a fz_buffer and then write that out | 16:10.55 |
Robin_Watts | tor8: Hmm, maybe, yes. | 16:13.17 |
| (windows problems) | 16:13.23 |
| but then if that's a problem, we can have a special fz_stream_from_file_thats_already_being_written. | 16:13.57 |
| that reuses the given FILE * and doesn't close it. | 16:14.10 |
paulgardiner | Can you close and reopen an fz_output? | 16:14.11 |
tor8 | because I assume you'll be wanting to open the fz_output, write the file, open as stream to digest, seek in the output to patch, close the lot | 16:14.14 |
Robin_Watts | tor8: yes, exactly. | 16:14.25 |
tor8 | Robin_Watts: we could always open FILE as read-write | 16:14.31 |
Robin_Watts | Yeah, that's what I'm thinking. | 16:14.38 |
tor8 | and just reuse the FILE* | 16:14.39 |
paulgardiner | So would we open read-write as default just in case? | 16:15.32 |
Robin_Watts | paulgardiner: probably, yes. | 16:15.57 |
paulgardiner | Is closing and reopening not a possibility? | 16:16.09 |
Robin_Watts | paulgardiner: Not nicely, no. | 16:16.17 |
| Especially cos of ****ing virus programs that spot .pdfs being closed and immediately open them to scan them, only to stop your reopening attempt working. | 16:16.47 |
paulgardiner | Opening read-write should be okay I guess. | 16:17.21 |
Robin_Watts | We really shouldn't close and open a file - consider the 'I am writing to a file that happens to be on dropbox' case. | 16:17.35 |
| dropbox does versioning, so we don't want it to store both the unsigned and signed versions because it got closed. | 16:17.56 |
paulgardiner | Sounds like we have a plan | 16:18.23 |
Robin_Watts | read-write should be fine. I can't see a case where that should hurt us. | 16:18.25 |
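The plan agreed above (one read-write FILE* for the whole job: write, seek back to read for the digest, then patch the placeholder in place, never closing in between) can be exercised with plain stdio. A minimal sketch; `tmpfile()` stands in for `fopen(path, "w+b")`:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a document containing a placeholder, then seek back and
 * overwrite the placeholder in place on the same FILE*.  The fflush
 * and fseek calls satisfy C's rule for switching direction on an
 * update stream. */
static void write_then_patch(FILE *f)
{
    fputs("data [PLACEHOLDER] data", f);  /* write phase */
    fflush(f);

    fseek(f, 6, SEEK_SET);                /* patch phase, in place */
    fwrite("SIGNATURE##", 1, 11, f);
    fflush(f);
}
```

Because the file is never closed, neither antivirus scanners nor sync tools like Dropbox ever see an intermediate (unsigned) version as a finished file.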
| indeed. | 16:18.32 |
| I shall go back to bashing munin :) | 16:18.45 |
| tor8: Updated commits on robin/master. | 16:18.54 |
paulgardiner | Thanks for the idea. | 16:18.58 |
Robin_Watts | tor8: Just waiting for a) a review, and b) an updated random commit. | 16:20.59 |
tor8 | Robin_Watts: it might be a problem on filesystems where you have read-access to the PDF but not write access | 16:23.04 |
| in case you just want to open it for reading | 16:23.14 |
Robin_Watts | tor8: When we're signing, we're writing out the file. | 16:23.25 |
tor8 | but that's not really the case for pdf_write_document | 16:23.37 |
Robin_Watts | fz_outputs have to have write access, or they aren't much use :) | 16:23.49 |
tor8 | I was thinking the reverse case for some brain fuddled reason... fz_output_from_stream :) | 16:24.15 |
| so nvm me | 16:24.29 |
| Robin_Watts: while you're poking with this, it would be nice if we could fix save_incremental to be a bit more robust and work even if you're not appending to the same file name | 16:26.04 |
Robin_Watts | s/Robin_Watts/paulgardiner/ :) | 16:26.18 |
tor8 | or write invalid files if you use it 'incorrectly' and ask for an incremental save of a new document (if we haven't already fixed that problem) | 16:26.36 |
Robin_Watts | I'm merely kibitzing here, it's paulgardiner with his sleeves rolled up. | 16:26.55 |
tor8 | paulgardiner: what I just said to Robin :) | 16:27.13 |
paulgardiner | tor8: noted | 16:27.48 |
tor8 | it would be nice if we could read from the original fz_stream to copy the 'old' bits and then write the new sections, so we can incrementally save to a new file | 16:31.32 |
| and not need to bother with the 'append' flag | 16:31.41 |
| I don't want to clobber the original file if saving fails or throws an exception | 16:32.31 |
paulgardiner | Agreed. I think the apps do the copying at the moment, but better done as part of the write call. | 16:33.42 |
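The save-to-a-new-file approach suggested above can be sketched as: copy the original bytes from the source stream to the destination, then append the incremental update section. The original is never opened for writing, so a failure mid-save cannot clobber it. A hypothetical helper, not the MuPDF implementation:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Incremental save to a different file: copy the 'old' bits from the
 * source, then append the new update section (new objects, xref,
 * trailer) to the destination. */
static void save_incremental_to(FILE *src, FILE *dst,
                                const char *update, size_t update_len)
{
    char buf[4096];
    size_t n;

    fseek(src, 0, SEEK_SET);
    while ((n = fread(buf, 1, sizeof buf, src)) > 0)
        fwrite(buf, 1, n, dst);          /* copy the original bytes */
    fwrite(update, 1, update_len, dst);  /* append the update section */
    fflush(dst);
}
```

This also removes the need for an 'append' flag: whether the destination happens to be a fresh file or not, the output is built the same way.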
mike1 | greetings | 22:31.36 |
| is anyone here available for a quick question? | 22:31.43 |
| new to PDFs and MuPDF, and am trying to find an equivalent set of commands for some Python-based code I have | 22:32.23 |
| just to throw it out there, when I run "mutool show a.pdf 1" I get a breakout of the first object | 22:48.26 |
| i.e. https://pastebin.com/W8HkgCLH | 22:48.27 |
| but how do I traverse the objects listed there, like AcroForm or Metadata? | 22:48.47 |
| I understand that this is a tree structure but can't figure out how I walk the tree | 22:49.03 |
| I'm looking to do something equivalent to what pyquery and pdfminer can do | 22:49.19 |
sebras | mike1: to show the AcroForm object you would do "mutool show a.pdf 83" | 23:44.13 |
| or "mutool show a.pdf 81" for the meta data object | 23:44.23 |
| mike1: if you want you can also | 23:44.32 |
| do "mutool show a.pdf xref" to show the xref which lists all objects and their offsets. | 23:44.58 |
| or "mutool show a.pdf trailer" to show the trailer "object" from which a PDF parser starts to locate all the things it needs. | 23:45.30 |