| <<<Back 1 day (to 2019/07/21) | Fwd 1 day (to 2019/07/23)>>> | 20190722 |
sebras | ator: where did we end up with hack3 in the end? | 10:23.39 |
| ator: seems like you wanted to avoid the 's' 'S' 'T' variant. | 10:24.00 |
ator | sebras: tor/hack3 is my current proposal | 10:36.48 |
sebras | pulls. | 10:37.01 |
| ator: that looks like it would resolve the issue, yes. | 10:44.35 |
ator | and print a helpful warning in the case Robin and I worry about, the potential false positives in future specs | 10:47.47 |
| or by custom abuse of the current spec, if someone uses ByteRange in a way that isn't documented | 10:48.06 |
Robin_Watts | ator: Looking at that now. | 10:49.03 |
| pdf_crypt_obj_imp shouldn't called it "is_sig_dict". | 10:49.21 |
| It should call it something like "no_encrypt_contents". | 10:49.44 |
ator | Robin_Watts: I originally had it 'skip_contents' | 10:50.18 |
sebras | if we want to parametrize it, I guess we could supply a PDF_NAME() as an argument? | 10:50.22 |
Robin_Watts | cos looking at the code in the future, being told what that flag does rather than what it does it to is more useful. | 10:50.26 |
ator | Robin_Watts: I thought is_sig_dict would be clearer as to why we were skipping it | 10:50.35 |
sebras | and if it is PDF_NULL then do nothing..? | 10:50.44 |
Robin_Watts | ator: It's right to call it is_sig_dict in pdf-xref.c, just not in pdf_crypt_obj | 10:51.09 |
ator | Robin_Watts: fair enough. | 10:51.18 |
Robin_Watts | At least, that's what my monday-morning-brain is telling me. | 10:51.37 |
Robin_Watts | is struggling with intel document/inline assembler/weirdass virtual machines at the moment. | 10:52.02 |
| Doesn't pdf_dict_get check pdf_is_dict as the first thing it does? | 10:52.49 |
ator | Robin_Watts: yes. that would be a good optimization to rely on. | 10:53.34 |
| Robin_Watts: but agh, I think we should be paranoid here and double check that it's not an indirect object | 10:54.15 |
| or we could get a malicious loop with: 10 0 obj 10 0 R endobj | 10:54.26 |
| some fixups pushed to tor/hack3 | 10:55.20 |
Robin_Watts | We *could* roll !pdf_is_indirect && pdf_dict_get into a new pdf_dict_get_non_indirect | 10:58.29 |
| which could save a bit. | 10:58.41 |
| I don't know if it will be significant, I suspect not, but we hit this case on every object now, right? | 10:59.08 |
ator | Robin_Watts: on every top-level indirect object, yes | 10:59.40 |
| so maybe a few thousand times in the document total on big documents | 11:00.10 |
Robin_Watts | lgtm as it is then. | 11:00.16 |
ator | thanks | 11:01.25 |
sebras | ator: so in pdf-write.c in create_encryption_dictionary() we create the Encrypt dictionary as a non-indirect dictionary in the trailer. | 11:09.55 |
| if we were to create the encryption dictionary as an indirect object, it would be encrypted, no..? | 11:10.09 |
ator | sebras: yes. | 11:11.20 |
sebras | ator: so at the moment we do not need to consider this case. but if we change it your hack needs changing too. | 11:11.55 |
ator | sebras: but do check dowriteobject() where we check 'num == opts->crypt_object_number) | 11:11.58 |
sebras | right. but we don't do that for contents since we overwrite its value in complete_signatures(). | 11:18.00 |
ator | sebras: yeah. see hack3 for commit fixing pdf-write | 11:20.13 |
sebras | ator: though this doesn't matter since we seem to overwrite its value in complete_signature(). | 11:20.58 |
| but that parsing is quite icky. | 11:21.09 |
ator | sebras: this will preserve the data if we forget to call complete_signature | 11:21.31 |
| and it also guarantees 100% that it will be a hex string | 11:21.38 |
| of the appropriate length | 11:21.45 |
| which I'm not sure it was, given AES padding etc | 11:21.58 |
sebras | oh, I didn't even think of the type of the object. but yeah, that is quite important. | 11:22.04 |
ator | I believe it was always encrypted so there's probably a 1 in a billion chance it would not be hex formatted | 11:22.44 |
| but better safe, I think | 11:22.55 |
sebras | ator: so now we have two ways of keeping track of objects whose contents should not be encrypted upon writing: 1) crypt | 11:24.52 |
| crypt_object_number and detection of /ByteRange | 11:25.03 |
ator | yes, only missing the ID strings now... :) | 11:25.25 |
sebras | let's do that in a third way! ;) | 11:25.44 |
| I'm a little bit concerned that we keep track of the two in two different ways. | 11:25.59 |
ator | sebras: ByteRange only affects one dictionary entry | 11:26.08 |
| we can't do the same as crypt_object | 11:26.14 |
| which affects the whole object | 11:26.23 |
sebras | ah, yes. that's the difference. | 11:26.53 |
ator | crypt_object_number only affects the current crypt object | 11:27.03 |
| we may be messing up other encryption dictionaries here | 11:27.16 |
sebras | thoug for the encryption dict I'm unsure what happens to its subdicts. like /Recipients. | 11:27.22 |
| can there be two? | 11:27.40 |
| or you are thinking about indirect objects pointed to from the Encryption dictionary? | 11:28.05 |
ator | sebras: no, I'm thinking about Crypt filters | 11:28.35 |
| (and old replaced Encryption dictionaries that are no longer in use, because we added new encryption but didn't garbage collect) | 11:29.00 |
sebras | aren't all crypt filters just specified in the top-level encryption dict, and then referred to be name? | 11:30.25 |
ator | sebras: in that case we should be mostly safe (modulo botching the old encryption dictionary) | 11:30.44 |
sebras | mmm, old encryption dictionaries I think are less of a problem, because they ought to be unused. | 11:31.15 |
ator | yeah, I'm not overly concerned there | 11:31.26 |
| though we could do similar automatic detection to ByteRange here | 11:31.53 |
| the Filter is a required entry for encryption dictionaries | 11:32.04 |
| so we could scan for /Filter/Standard | 11:33.10 |
| sebras: hmm, when we create a new Encryption dictionary in pdf_new_encrypt | 11:33.42 |
| it does not seem like we are setting that required field though! | 11:33.49 |
sebras | ator: that's not where the PDF object is created though. | 11:34.57 |
ator | ah, you're using put_name"Standard" instead of put PDF_NAME(Standard) | 11:34.58 |
sebras | ator: shouldn't that be create_encryption_dictionary()? | 11:35.02 |
ator | my grepping failed | 11:35.03 |
sebras | I wrote this? :) | 11:35.32 |
| I don't remember! | 11:35.37 |
| oh... I'm turning into Robin_Watts :) | 11:35.43 |
| ator: but yeah, that ought to be pdf_dict_put( PDF_NAME(), PDF_NAME()){. | 11:36.23 |
ator | done. | 11:37.58 |
sebras | ator: hm... by changing from pdf_is_dict() to !pdf_is_indirect(), aren't we then ignoring the type of the object? | 12:18.16 |
| we do usually call pdf_dict_get*() later on so perhaps this matters less. | 12:18.48 |
| oh and this is in writeobject(). | 12:19.03 |
ator | sebras: yes. | 12:19.10 |
sebras | ator: I'm not quite sure if that is good or bad. | 12:19.39 |
ator | I realized we had the same anti-pattern as when reading, pdf_is_dict will auto-resolve indirect objects | 12:19.44 |
| and we really shouldn't be | 12:19.47 |
| in these instances | 12:19.53 |
sebras | we _do_ usually just attempt to parse whatever crap we get and fallback to a default value, so perhaps it is not problematic. | 12:20.02 |
| ator: right, but aren't there many places where we call pdf_is_dict() where we are trying to guard against indirect objects? | 12:20.38 |
ator | only a small handful, in the object parsing and writing code. | 12:21.13 |
sebras | have you reviewed them all? | 12:21.30 |
ator | almost all other places we benefit from having automatic resolving | 12:21.32 |
sebras | ok. | 12:21.36 |
ator | no. | 12:21.38 |
| at least, no, I haven't reviewed them today. | 12:21.54 |
| <<<Back 1 day (to 2019/07/21) | Forward 1 day (to 2019/07/23)>>> | |