| <<<Back 1 day (to 2018/01/22) | 20180123 |
Robin_Watts | tor8: The string is copied INTO the display list. It's safe :) | 00:01.56 |
sebras | Robin_Watts: hm. the logic in add_range() doesn't take differing values of many into account when checking for overlap at all. | 01:57.14 |
| but perhaps I still misunderstand the code. | 01:57.43 |
| I did misunderstand, or rather I didn't read far enough. | 02:08.30 |
titanous | sebras: your fix for 5496 (uninitialized version buffer) doesn't work, strlen has undefined behavior when used on uninitialized bytes, you should check the length of the data read | 04:06.23 |
sebras | titanous: fz_read_line() will write a trailing NUL even if it onöy reads EOF from the underlying stream, sp strlen() ought to be safe. memcmp() doesn't (in the name of optimization) necedsarily respect NUL as it may read data one word at a time. | 08:16.19 |
titanous | sebras: ah, great, I must have messed up reproing with the fix | 09:51.15 |
chrisl | mutool clean seems to be broken - it's using "rb+" with fopen for opening the output file | 10:14.43 |
sebras | chrisl: as long as there is no seeking that shouldn't be a problem. | 10:35.55 |
| chrisl: but... | 10:36.00 |
| chrisl: if I read e.g. fz_new_output_with_path() correctly it provides a seek mechanism. | 10:36.18 |
kens | sebras, not I'm afraid that *is* a probme | 10:37.01 |
chrisl | sebras: it also only opens an existing file - which I'd argue is not a common use of mutool clean | 10:37.11 |
kens | fopen with rb+ fails if the file doe snot exist | 10:37.15 |
sebras | chrisl: ah, true. | 10:37.26 |
| chrisl: kens: I blane paulgardiner whose commit 457873fbf7fd6d40242722f3a51b41428302d0ca went in yesterday. :) | 10:38.21 |
kens | That is indeed the problem sebras, we believe | 10:38.41 |
| Paul is talking about it now | 10:38.48 |
tor8 | sebras: so, your pdf_lex_no_string commit | 11:13.13 |
| it's been a long time since I wrote this code | 11:13.23 |
| but I think the intent is to skip over the 'illegal' tokens | 11:13.43 |
| your fix patches over the '<' case to loop forever | 11:14.32 |
| nvm that last statement | 11:15.23 |
| yes, with your patch it behaves like it does for '(' and ')' | 11:15.37 |
| question is ... should it? or should we return PDF_TOK_ERROR | 11:15.46 |
| and why do we have a special 'no_string' lexer that is used for repair? | 11:17.24 |
sebras | tor8: from my pov pdf_lex_no_string() should behave as pdf_lex(). | 11:17.54 |
| tor8: today it does not. | 11:17.58 |
| the difference should _only_ be in that strings will not be returned as tokens. | 11:18.12 |
| so really a flag to pdf_lex() might do, but when doing the original commit robin decided to duplicate the function for some reason. | 11:18.43 |
| tor8: and missed this fz_unread_byte(). | 11:18.51 |
tor8 | Robin_Watts: added the pdf_lex_no_string function | 11:19.05 |
sebras | tor8: he did. | 11:19.15 |
tor8 | and the problem with reading strings is that if we hit a 'garbage' string opening character | 11:19.29 |
| we could swallow the entire file | 11:19.35 |
sebras | yup. | 11:19.39 |
tor8 | so we just want to skip them | 11:19.43 |
| or return PDF_TOK_ERROR maybe | 11:19.58 |
sebras | I'm not sure PDF_TOK_ERROR is the best way though. | 11:20.21 |
| because these things are used when repairing too. | 11:20.30 |
tor8 | it's *only* called in one spot, and that is one where it is scanning for the top level "X Y obj" and trailer dictionary | 11:21.09 |
| returning PDF_TOK_ERROR would be nicer if someone else were to try to use it | 11:21.43 |
sebras | tor8: would you want pdf_lex() to do this too? | 11:22.01 |
tor8 | do what? | 11:22.13 |
sebras | tor8: return PDF_TOK_ERROR. | 11:22.24 |
tor8 | that's weird, it doesn't... | 11:22.58 |
| oh, it has other checks when reading numbers and names and keywords where it checks for errors and returns PDF_TOK_ERROR | 11:24.03 |
| strange that it only warns but doesn't return an error on a single ">" character | 11:24.16 |
sebras | tor8: that change was introduced in 95f29c10759bc42e940869f1a0d2f82db16f8709 | 11:25.54 |
tor8 | yeah, from throwing to warning | 11:26.38 |
| and we probably introduced TOK_ERROR after that | 11:26.45 |
| I'll make a patch and see what the cluster says | 11:26.52 |
sebras | tor8: no, PDF_TOK_ERROR was already present in the enum at that time. | 11:27.21 |
tor8 | then changing it to a warn was probably a mistake. let me try my commit on the cluster. | 11:32.12 |
| it's on tor/master if you want to have a look | 11:32.44 |
sebras | tor8: two first patches on tor/master LGTM. | 11:34.26 |
tor8 | sebras: thanks. | 11:35.13 |
sebras | tor8: in pdf_lex() when you unread the byte after '>' you check for EOF before unreading. | 11:36.16 |
| tor8: yet you do not do this if you first read '<' and then something other than '<'. why not? | 11:36.41 |
| tor8: I'm confused. so if you have "<tjosan" in your file when you attempt to repair and thus calls pdf_lex_no_string | 11:40.25 |
| () then you will in pdf_lex_no_string() first read '<' and assume it is either a dict or a hex string. | 11:40.57 |
| next you read 't' and decide that this is indeed not a dict so you return PDF_TOK_ERROR | 11:41.31 |
| the execution then returns to pdf_repair_xref() which checks the value in tok, and if tok == PDF_TOK_ERROR we read an extra byte. | 11:42.26 |
| that doesn't seem right. I'm ok with us consuming a byte and returning PDF_TOK_ERROR and then have the next call process the next byte. | 11:43.06 |
| but this way pdf_lex_no_string() consumes one and the extra fz_read_byte() in pdf_repair_xref() consumes another. | 11:43.39 |
tor8 | oh... the repair code reads an extra byte after each error token? that seems fishy. | 11:43.47 |
sebras | it does. | 11:43.58 |
tor8 | sebras: I shouldn't be allowed to program today... you're right, I don't handle EOF in the '<' unread part | 11:46.02 |
| should possibly rejig this to use fz_peek_byte instead... | 11:46.48 |
| but that would stick out like a sore thumb... the rest of the lexer code is structured around read/unread | 11:47.42 |
sebras | tor8: mmm, and now that you made me aware that it might be a good idea to check for EOF I started looking at other places where we call fz_unread_byte(). | 11:48.19 |
| tor8: like at the beginning of pdf_repair_xref(). | 11:48.26 |
tor8 | sebras: updated pdf_lex and proposal on tor/master | 11:51.25 |
| sebras: updated again | 11:53.49 |
sebras | tor8: in "Allow oversize allocations in pool allocator." why don't you just rely on the ternary expression for assigning extra? checking size twice seems a bit unnecessary. | 11:55.41 |
| tor8: also what prompted you to do this change? is there a broken file somewhere? | 11:56.37 |
malc_ | sebras: yes there is | 12:00.24 |
sebras | malc_: ...related to the patch tor8 made. | 12:00.49 |
malc_ | sebras: yes there is :) | 12:00.59 |
sebras | malc_: ok, then I'd prefer a bug report with the file and a pointer to the bugfix in the commit message. | 12:01.24 |
malc_ | sebras: the file is private :( | 12:01.46 |
sebras | malc_: if it can be uploaded to bugzilla we can mark it private so only we can see it. sometimes that doesn't help though. | 12:03.11 |
tor8 | sebras: oops. I added the ternary, didn't like it and tried to revert the code back to non-ternary form. | 12:03.12 |
| like I said, I oughtn't program today :( | 12:03.20 |
malc_ | sebras: and i guess tor8 can come up with a simple reproducer himself should he so desire, given that he fixed it the fact that, i'm speculating here, he understands the problem | 12:03.32 |
sebras | tor8: TIL oughtn't is a word. :) | 12:03.47 |
tor8 | sebras: I didn't anticipate XML documents with text chunks > 64kb | 12:03.57 |
malc_ | sebras: i was about to say something similar :) | 12:04.02 |
| 2 vowels followed by 5 consonants and one special symbol | 12:04.16 |
| neat | 12:04.16 |
| sebras: what tor8 said, plus it is just a fb2 xml header plus embedded jpeg image | 12:09.07 |
paulgardiner | tor8, sebras, Robin_Watts: I've completed most of the refactoring I was planning. It's on my signature-support-refactor branch. It has the pkcs7 and pdf specifics separated, so that swapping in an alternative pkcs7 implementation should be a lot easier. The only mupdf specifics in the api is the use of a stream to access the bytes to hash. | 12:38.26 |
| tor8, sebras, Robin_Watts: I could take it one stage further. At the moment the choice of pkcs7 implementation has to be made at build time. I could alternatively make pdf_check_signature and pdf_sign_signature pass in a pkcs7 object. That way the caller determines the implementation. | 12:42.32 |
Robin_Watts | paulgardiner: That sounds nicer. | 12:43.28 |
| And it's in keeping with what we do with lcms etc, I think. | 12:43.41 |
| (fz_set_cmm_engine(ctx, engine); or something like that) | 12:44.01 |
paulgardiner | Would mean some work updating apps for anyone currently using the feature. | 12:44.09 |
Robin_Watts | paulgardiner: Only in so far as they would need to add an fz_set_signature_engine(ctx, engine); call, right ? | 12:44.53 |
paulgardiner | Ah, I hadn't considered doing it that way. Yes. That might be better. | 12:45.33 |
| Not quite. pdf_sign_signature currently takes a file path and password for where to get the certificate and private key. I wanted to make that abstract, so that instead you pass in the function that creates the digest and allow for the possibility that certificates and private keys might be obtained from elsewhere. | 12:48.15 |
| pdf_sign_signature already involves wrapping up a signer object and storing it until the document is saved, so it would be quite easy to make that object just a little more sophisticated | 12:50.49 |
Robin_Watts | I don't see why that precludes setting the engine up front. | 12:53.35 |
sebras | paulgardiner: "Fix failure of non-incremental document saving" wants a trailing period on the first line of the commit message. apart from that it looks reasonable to me, but you might want to have Robin_Watts and tor8 OK this too. | 13:10.55 |
paulgardiner | Robin_Watts: it doesn't, but then you gain nothing from doing so. It's just an extra complication. | 13:16.47 |
| sebras: thanks. I've added the trailing period. | 13:25.57 |
fredross-perry | tor8, sebras - reminder, please finish reviewing my streaming interface stuff so we can get it in, thanks. | 15:53.16 |
sebras | fredross-perry: I did, and I reminded tor8. | 16:04.58 |
fredross-perry | ok thanks | 16:05.10 |
Robin_Watts | tor8, sebras, fredross-perry, mvrhel_laptop, paulgardiner: Trivial cut and paste bug needs a review: | 19:11.20 |
| http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=791d24b28731d59f19bf69dec943357cf64fb30c | 19:11.22 |
fredross-perry | seems legit. | 19:12.13 |
Robin_Watts | fredross-perry: Ta. | 19:12.18 |
sebras | Robin_Watts: mitrer typo in commit message. | 21:05.44 |
| Robin_Watts: if you are around, do you mind taking a look at sebras/master? | 22:22.36 |
| three commits concerning the cmap splay trees, one concerning allowed objects as keys in the PDF store and one for docs. | 22:23.13 |
| Forward 1 day (to 2018/01/24)>>> | |