Log of #mupdf at irc.freenode.net.

Search:
 <<<Back 1 day (to 2019/07/21)Fwd 1 day (to 2019/07/23)>>>20190722 
sebras ator: where did we end up with hack3 in the end?10:23.39 
  ator: seems like you wanted to avoid the 's' 'S' 'T' variant.10:24.00 
ator sebras: tor/hack3 is my current proposal10:36.48 
sebras pulls.10:37.01 
  ator: that looks like it would resolve the issue, yes.10:44.35 
ator and print a helpful warning in the case Robin and I worry about, the potential false positives in future specs10:47.47 
  or by custom abuse of the current spec, if someone uses ByteRange in a way that isn't documented10:48.06 
Robin_Watts ator: Looking at that now.10:49.03 
  pdf_crypt_obj_imp shouldn't called it "is_sig_dict".10:49.21 
  It should call it something like "no_encrypt_contents".10:49.44 
ator Robin_Watts: I originally had it 'skip_contents'10:50.18 
sebras if we want to parametrize it, I guess we could supply a PDF_NAME() as an argument?10:50.22 
Robin_Watts cos looking at the code in the future, being told what that flag does rather than what it does it to is more useful.10:50.26 
ator Robin_Watts: I thought is_sig_dict would be clearer as to why we were skipping it10:50.35 
sebras and if it is PDF_NULL then do nothing..?10:50.44 
Robin_Watts ator: It's right to call it is_sig_dict in pdf-xref.c, just not in pdf_crypt_obj10:51.09 
ator Robin_Watts: fair enough.10:51.18 
Robin_Watts At least, that's what my monday-morning-brain is telling me.10:51.37 
Robin_Watts is struggling with intel document/inline assembler/weirdass virtual machines at the moment.10:52.02 
  Doesn't pdf_dict_get check pdf_is_dict as the first thing it does?10:52.49 
ator Robin_Watts: yes. that would be a good optimization to rely on.10:53.34 
  Robin_Watts: but agh, I think we should be paranoid here and double check that it's not an indirect object10:54.15 
  or we could get a malicious loop with: 10 0 obj 10 0 R endobj10:54.26 
  some fixups pushed to tor/hack310:55.20 
Robin_Watts We *could* roll !pdf_is_indirect && pdf_dict_get into a new pdf_dict_get_non_indirect10:58.29 
  which could save a bit.10:58.41 
  I don't know if it will be significant, I suspect not, but we hit this case on every object now, right?10:59.08 
ator Robin_Watts: on every top-level indirect object, yes10:59.40 
  so maybe a few thousand times in the document total on big documents11:00.10 
Robin_Watts lgtm as it is then.11:00.16 
ator thanks11:01.25 
sebras ator: so in pdf-write.c in create_encryption_dictionary() we create the Encrypt dictionary as a non-indirect dictionary in the trailer.11:09.55 
  if we were to create the encryption dictionary as an indirect object, it would be encrypted, no..?11:10.09 
ator sebras: yes.11:11.20 
sebras ator: so at the moment we do not need to consider this case. but if we change it your hack needs changing too.11:11.55 
ator sebras: but do check dowriteobject() where we check 'num == opts->crypt_object_number)11:11.58 
sebras right. but we don't do that for contents since we overwrite its value in complete_signatures().11:18.00 
ator sebras: yeah. see hack3 for commit fixing pdf-write11:20.13 
sebras ator: though this doesn't matter since we seem to overwrite its value in complete_signature().11:20.58 
  but that parsing is quite icky.11:21.09 
ator sebras: this will preserve the data if we forget to call complete_signature11:21.31 
  and it also guarantees 100% that it will be a hex string11:21.38 
  of the appropriate length11:21.45 
  which I'm not sure it was, given AES padding etc11:21.58 
sebras oh, I didn't even think of the type of the object. but yeah, that is quite important.11:22.04 
ator I believe it was always encrypted so there's probably a 1 in a billion chance it would not be hex formatted11:22.44 
  but better safe, I think11:22.55 
sebras ator: so now we have two ways of keeping track of objects whose contents should not be encrypted upon writing: 1) crypt11:24.52 
  crypt_object_number and detection of /ByteRange11:25.03 
ator yes, only missing the ID strings now... :)11:25.25 
sebras let's do that in a third way! ;)11:25.44 
  I'm a little bit concerned that we keep track of the two in two different ways.11:25.59 
ator sebras: ByteRange only affects one dictionary entry11:26.08 
  we can't do the same as crypt_object11:26.14 
  which affects the whole object11:26.23 
sebras ah, yes. that's the difference.11:26.53 
ator crypt_object_number only affects the current crypt object11:27.03 
  we may be messing up other encryption dictionaries here11:27.16 
sebras thoug for the encryption dict I'm unsure what happens to its subdicts. like /Recipients.11:27.22 
  can there be two?11:27.40 
  or you are thinking about indirect objects pointed to from the Encryption dictionary?11:28.05 
ator sebras: no, I'm thinking about Crypt filters11:28.35 
  (and old replaced Encryption dictionaries that are no longer in use, because we added new encryption but didn't garbage collect)11:29.00 
sebras aren't all crypt filters just specified in the top-level encryption dict, and then referred to be name?11:30.25 
ator sebras: in that case we should be mostly safe (modulo botching the old encryption dictionary)11:30.44 
sebras mmm, old encryption dictionaries I think are less of a problem, because they ought to be unused.11:31.15 
ator yeah, I'm not overly concerned there11:31.26 
  though we could do similar automatic detection to ByteRange here11:31.53 
  the Filter is a required entry for encryption dictionaries11:32.04 
  so we could scan for /Filter/Standard11:33.10 
  sebras: hmm, when we create a new Encryption dictionary in pdf_new_encrypt11:33.42 
  it does not seem like we are setting that required field though!11:33.49 
sebras ator: that's not where the PDF object is created though.11:34.57 
ator ah, you're using put_name"Standard" instead of put PDF_NAME(Standard)11:34.58 
sebras ator: shouldn't that be create_encryption_dictionary()?11:35.02 
ator my grepping failed11:35.03 
sebras I wrote this? :)11:35.32 
  I don't remember!11:35.37 
  oh... I'm turning into Robin_Watts :)11:35.43 
  ator: but yeah, that ought to be pdf_dict_put( PDF_NAME(), PDF_NAME()){.11:36.23 
ator done.11:37.58 
sebras ator: hm... by changing from pdf_is_dict() to !pdf_is_indirect(), aren't we then ignoring the type of the object?12:18.16 
  we do usually call pdf_dict_get*() later on so perhaps this matters less.12:18.48 
  oh and this is in writeobject().12:19.03 
ator sebras: yes.12:19.10 
sebras ator: I'm not quite sure if that is good or bad.12:19.39 
ator I realized we had the same anti-pattern as when reading, pdf_is_dict will auto-resolve indirect objects12:19.44 
  and we really shouldn't be12:19.47 
  in these instances12:19.53 
sebras we _do_ usually just attempt to parse whatever crap we get and fallback to a default value, so perhaps it is not problematic.12:20.02 
  ator: right, but aren't there many places where we call pdf_is_dict() where we are trying to guard against indirect objects?12:20.38 
ator only a small handful, in the object parsing and writing code.12:21.13 
sebras have you reviewed them all?12:21.30 
ator almost all other places we benefit from having automatic resolving12:21.32 
sebras ok.12:21.36 
ator no.12:21.38 
  at least, no, I haven't reviewed them today.12:21.54 
 <<<Back 1 day (to 2019/07/21)Forward 1 day (to 2019/07/23)>>> 
ghostscript.com #ghostscript
Search: