MuPDF IRC logs

	<<<Back 1 day (to 2019/07/21)	Fwd 1 day (to 2019/07/23)>>>	20190722
sebras	ator: where did we end up with hack3 in the end?		10:23.39
	ator: seems like you wanted to avoid the 's' 'S' 'T' variant.		10:24.00
ator	sebras: tor/hack3 is my current proposal		10:36.48
*sebras*	pulls.		10:37.01
	ator: that looks like it would resolve the issue, yes.		10:44.35
ator	and print a helpful warning in the case Robin and I worry about, the potential false positives in future specs		10:47.47
	or by custom abuse of the current spec, if someone uses ByteRange in a way that isn't documented		10:48.06
Robin_Watts	ator: Looking at that now.		10:49.03
	pdf_crypt_obj_imp shouldn't called it "is_sig_dict".		10:49.21
	It should call it something like "no_encrypt_contents".		10:49.44
ator	Robin_Watts: I originally had it 'skip_contents'		10:50.18
sebras	if we want to parametrize it, I guess we could supply a PDF_NAME() as an argument?		10:50.22
Robin_Watts	cos looking at the code in the future, being told what that flag does rather than what it does it to is more useful.		10:50.26
ator	Robin_Watts: I thought is_sig_dict would be clearer as to why we were skipping it		10:50.35
sebras	and if it is PDF_NULL then do nothing..?		10:50.44
Robin_Watts	ator: It's right to call it is_sig_dict in pdf-xref.c, just not in pdf_crypt_obj		10:51.09
ator	Robin_Watts: fair enough.		10:51.18
Robin_Watts	At least, that's what my monday-morning-brain is telling me.		10:51.37
*Robin_Watts*	is struggling with intel document/inline assembler/weirdass virtual machines at the moment.		10:52.02
	Doesn't pdf_dict_get check pdf_is_dict as the first thing it does?		10:52.49
ator	Robin_Watts: yes. that would be a good optimization to rely on.		10:53.34
	Robin_Watts: but agh, I think we should be paranoid here and double check that it's not an indirect object		10:54.15
	or we could get a malicious loop with: 10 0 obj 10 0 R endobj		10:54.26
	some fixups pushed to tor/hack3		10:55.20
Robin_Watts	We could roll !pdf_is_indirect && pdf_dict_get into a new pdf_dict_get_non_indirect		10:58.29
	which could save a bit.		10:58.41
	I don't know if it will be significant, I suspect not, but we hit this case on every object now, right?		10:59.08
ator	Robin_Watts: on every top-level indirect object, yes		10:59.40
	so maybe a few thousand times in the document total on big documents		11:00.10
Robin_Watts	lgtm as it is then.		11:00.16
ator	thanks		11:01.25
sebras	ator: so in pdf-write.c in create_encryption_dictionary() we create the Encrypt dictionary as a non-indirect dictionary in the trailer.		11:09.55
	if we were to create the encryption dictionary as an indirect object, it would be encrypted, no..?		11:10.09
ator	sebras: yes.		11:11.20
sebras	ator: so at the moment we do not need to consider this case. but if we change it your hack needs changing too.		11:11.55
ator	sebras: but do check dowriteobject() where we check 'num == opts->crypt_object_number)		11:11.58
sebras	right. but we don't do that for contents since we overwrite its value in complete_signatures().		11:18.00
ator	sebras: yeah. see hack3 for commit fixing pdf-write		11:20.13
sebras	ator: though this doesn't matter since we seem to overwrite its value in complete_signature().		11:20.58
	but that parsing is quite icky.		11:21.09
ator	sebras: this will preserve the data if we forget to call complete_signature		11:21.31
	and it also guarantees 100% that it will be a hex string		11:21.38
	of the appropriate length		11:21.45
	which I'm not sure it was, given AES padding etc		11:21.58
sebras	oh, I didn't even think of the type of the object. but yeah, that is quite important.		11:22.04
ator	I believe it was always encrypted so there's probably a 1 in a billion chance it would not be hex formatted		11:22.44
	but better safe, I think		11:22.55
sebras	ator: so now we have two ways of keeping track of objects whose contents should not be encrypted upon writing: 1) crypt		11:24.52
	crypt_object_number and detection of /ByteRange		11:25.03
ator	yes, only missing the ID strings now... :)		11:25.25
sebras	let's do that in a third way! ;)		11:25.44
	I'm a little bit concerned that we keep track of the two in two different ways.		11:25.59
ator	sebras: ByteRange only affects one dictionary entry		11:26.08
	we can't do the same as crypt_object		11:26.14
	which affects the whole object		11:26.23
sebras	ah, yes. that's the difference.		11:26.53
ator	crypt_object_number only affects the current crypt object		11:27.03
	we may be messing up other encryption dictionaries here		11:27.16
sebras	thoug for the encryption dict I'm unsure what happens to its subdicts. like /Recipients.		11:27.22
	can there be two?		11:27.40
	or you are thinking about indirect objects pointed to from the Encryption dictionary?		11:28.05
ator	sebras: no, I'm thinking about Crypt filters		11:28.35
	(and old replaced Encryption dictionaries that are no longer in use, because we added new encryption but didn't garbage collect)		11:29.00
sebras	aren't all crypt filters just specified in the top-level encryption dict, and then referred to be name?		11:30.25
ator	sebras: in that case we should be mostly safe (modulo botching the old encryption dictionary)		11:30.44
sebras	mmm, old encryption dictionaries I think are less of a problem, because they ought to be unused.		11:31.15
ator	yeah, I'm not overly concerned there		11:31.26
	though we could do similar automatic detection to ByteRange here		11:31.53
	the Filter is a required entry for encryption dictionaries		11:32.04
	so we could scan for /Filter/Standard		11:33.10
	sebras: hmm, when we create a new Encryption dictionary in pdf_new_encrypt		11:33.42
	it does not seem like we are setting that required field though!		11:33.49
sebras	ator: that's not where the PDF object is created though.		11:34.57
ator	ah, you're using put_name"Standard" instead of put PDF_NAME(Standard)		11:34.58
sebras	ator: shouldn't that be create_encryption_dictionary()?		11:35.02
ator	my grepping failed		11:35.03
sebras	I wrote this? :)		11:35.32
	I don't remember!		11:35.37
	oh... I'm turning into Robin_Watts :)		11:35.43
	ator: but yeah, that ought to be pdf_dict_put( PDF_NAME(), PDF_NAME()){.		11:36.23
ator	done.		11:37.58
sebras	ator: hm... by changing from pdf_is_dict() to !pdf_is_indirect(), aren't we then ignoring the type of the object?		12:18.16
	we do usually call pdf_dict_get*() later on so perhaps this matters less.		12:18.48
	oh and this is in writeobject().		12:19.03
ator	sebras: yes.		12:19.10
sebras	ator: I'm not quite sure if that is good or bad.		12:19.39
ator	I realized we had the same anti-pattern as when reading, pdf_is_dict will auto-resolve indirect objects		12:19.44
	and we really shouldn't be		12:19.47
	in these instances		12:19.53
sebras	we _do_ usually just attempt to parse whatever crap we get and fallback to a default value, so perhaps it is not problematic.		12:20.02
	ator: right, but aren't there many places where we call pdf_is_dict() where we are trying to guard against indirect objects?		12:20.38
ator	only a small handful, in the object parsing and writing code.		12:21.13
sebras	have you reviewed them all?		12:21.30
ator	almost all other places we benefit from having automatic resolving		12:21.32
sebras	ok.		12:21.36
ator	no.		12:21.38
	at least, no, I haven't reviewed them today.		12:21.54
	<<<Back 1 day (to 2019/07/21)	Forward 1 day (to 2019/07/23)>>>

Log of #mupdf at irc.freenode.net.