Log of #mupdf at irc.freenode.net.

Search:
 <<<Back 1 day (to 2018/01/22)20180123 
Robin_Watts tor8: The string is copied INTO the display list. It's safe :)00:01.56 
sebras Robin_Watts: hm. the logic in add_range() doesn't take differing values of many into account when checking for overlap at all.01:57.14 
  but perhaps I still misunderstand the code.01:57.43 
  I did misunderstand, or rather I didn't read far enough.02:08.30 
titanous sebras: your fix for 5496 (uninitialized version buffer) doesn't work, strlen has undefined behavior when used on uninitialized bytes, you should check the length of the data read04:06.23 
sebras titanous: fz_read_line() will write a trailing NUL even if it onöy reads EOF from the underlying stream, sp strlen() ought to be safe. memcmp() doesn't (in the name of optimization) necedsarily respect NUL as it may read data one word at a time.08:16.19 
titanous sebras: ah, great, I must have messed up reproing with the fix09:51.15 
chrisl mutool clean seems to be broken - it's using "rb+" with fopen for opening the output file10:14.43 
sebras chrisl: as long as there is no seeking that shouldn't be a problem.10:35.55 
  chrisl: but...10:36.00 
  chrisl: if I read e.g. fz_new_output_with_path() correctly it provides a seek mechanism.10:36.18 
kens sebras, not I'm afraid that *is* a probme10:37.01 
chrisl sebras: it also only opens an existing file - which I'd argue is not a common use of mutool clean10:37.11 
kens fopen with rb+ fails if the file doe snot exist10:37.15 
sebras chrisl: ah, true.10:37.26 
  chrisl: kens: I blane paulgardiner whose commit 457873fbf7fd6d40242722f3a51b41428302d0ca went in yesterday. :)10:38.21 
kens That is indeed the problem sebras, we believe10:38.41 
  Paul is talking about it now10:38.48 
tor8 sebras: so, your pdf_lex_no_string commit11:13.13 
  it's been a long time since I wrote this code11:13.23 
  but I think the intent is to skip over the 'illegal' tokens11:13.43 
  your fix patches over the '<' case to loop forever11:14.32 
  nvm that last statement11:15.23 
  yes, with your patch it behaves like it does for '(' and ')'11:15.37 
  question is ... should it? or should we return PDF_TOK_ERROR11:15.46 
  and why do we have a special 'no_string' lexer that is used for repair?11:17.24 
sebras tor8: from my pov pdf_lex_no_string() should behave as pdf_lex().11:17.54 
  tor8: today it does not.11:17.58 
  the difference should _only_ be in that strings will not be returned as tokens.11:18.12 
  so really a flag to pdf_lex() might do, but when doing the original commit robin decided to duplicate the function for some reason.11:18.43 
  tor8: and missed this fz_unread_byte().11:18.51 
tor8 Robin_Watts: added the pdf_lex_no_string function11:19.05 
sebras tor8: he did.11:19.15 
tor8 and the problem with reading strings is that if we hit a 'garbage' string opening character11:19.29 
  we could swallow the entire file11:19.35 
sebras yup.11:19.39 
tor8 so we just want to skip them11:19.43 
  or return PDF_TOK_ERROR maybe11:19.58 
sebras I'm not sure PDF_TOK_ERROR is the best way though.11:20.21 
  because these things are used when repairing too.11:20.30 
tor8 it's *only* called in one spot, and that is one where it is scanning for the top level "X Y obj" and trailer dictionary11:21.09 
  returning PDF_TOK_ERROR would be nicer if someone else were to try to use it11:21.43 
sebras tor8: would you want pdf_lex() to do this too?11:22.01 
tor8 do what?11:22.13 
sebras tor8: return PDF_TOK_ERROR.11:22.24 
tor8 that's weird, it doesn't...11:22.58 
  oh, it has other checks when reading numbers and names and keywords where it checks for errors and returns PDF_TOK_ERROR11:24.03 
  strange that it only warns but doesn't return an error on a single ">" character11:24.16 
sebras tor8: that change was introduced in 95f29c10759bc42e940869f1a0d2f82db16f870911:25.54 
tor8 yeah, from throwing to warning11:26.38 
  and we probably introduced TOK_ERROR after that11:26.45 
  I'll make a patch and see what the cluster says11:26.52 
sebras tor8: no, PDF_TOK_ERROR was already present in the enum at that time.11:27.21 
tor8 then changing it to a warn was probably a mistake. let me try my commit on the cluster.11:32.12 
  it's on tor/master if you want to have a look11:32.44 
sebras tor8: two first patches on tor/master LGTM.11:34.26 
tor8 sebras: thanks.11:35.13 
sebras tor8: in pdf_lex() when you unread the byte after '>' you check for EOF before unreading.11:36.16 
  tor8: yet you do not do this if you first read '<' and then something other than '<'. why not?11:36.41 
  tor8: I'm confused. so if you have "<tjosan" in your file when you attempt to repair and thus calls pdf_lex_no_string11:40.25 
  () then you will in pdf_lex_no_string() first read '<' and assume it is either a dict or a hex string.11:40.57 
  next you read 't' and decide that this is indeed not a dict so you return PDF_TOK_ERROR11:41.31 
  the execution then returns to pdf_repair_xref() which checks the value in tok, and if tok == PDF_TOK_ERROR we read an extra byte.11:42.26 
  that doesn't seem right. I'm ok with us consuming a byte and returning PDF_TOK_ERROR and then have the next call process the next byte.11:43.06 
  but this way pdf_lex_no_string() consumes one and the extra fz_read_byte() in pdf_repair_xref() consumes another.11:43.39 
tor8 oh... the repair code reads an extra byte after each error token? that seems fishy.11:43.47 
sebras it does.11:43.58 
tor8 sebras: I shouldn't be allowed to program today... you're right, I don't handle EOF in the '<' unread part11:46.02 
  should possibly rejig this to use fz_peek_byte instead...11:46.48 
  but that would stick out like a sore thumb... the rest of the lexer code is structured around read/unread11:47.42 
sebras tor8: mmm, and now that you made me aware that it might be a good idea to check for EOF I started looking at other places where we call fz_unread_byte().11:48.19 
  tor8: like at the beginning of pdf_repair_xref().11:48.26 
tor8 sebras: updated pdf_lex and proposal on tor/master11:51.25 
  sebras: updated again11:53.49 
sebras tor8: in "Allow oversize allocations in pool allocator." why don't you just rely on the ternary expression for assigning extra? checking size twice seems a bit unnecessary.11:55.41 
  tor8: also what prompted you to do this change? is there a broken file somewhere?11:56.37 
malc_ sebras: yes there is12:00.24 
sebras malc_: ...related to the patch tor8 made.12:00.49 
malc_ sebras: yes there is :)12:00.59 
sebras malc_: ok, then I'd prefer a bug report with the file and a pointer to the bugfix in the commit message.12:01.24 
malc_ sebras: the file is private :(12:01.46 
sebras malc_: if it can be uploaded to bugzilla we can mark it private so only we can see it. sometimes that doesn't help though.12:03.11 
tor8 sebras: oops. I added the ternary, didn't like it and tried to revert the code back to non-ternary form.12:03.12 
  like I said, I oughtn't program today :(12:03.20 
malc_ sebras: and i guess tor8 can come up with a simple reproducer himself should he so desire, given that he fixed it the fact that, i'm speculating here, he understands the problem12:03.32 
sebras tor8: TIL oughtn't is a word. :)12:03.47 
tor8 sebras: I didn't anticipate XML documents with text chunks > 64kb12:03.57 
malc_ sebras: i was about to say something similar :)12:04.02 
  2 vowels followed by 5 consonants and one special symbol12:04.16 
  neat12:04.16 
  sebras: what tor8 said, plus it is just a fb2 xml header plus embedded jpeg image12:09.07 
paulgardiner tor8, sebras, Robin_Watts: I've completed most of the refactoring I was planning. It's on my signature-support-refactor branch. It has the pkcs7 and pdf specifics separated, so that swapping in an alternative pkcs7 implementation should be a lot easier. The only mupdf specifics in the api is the use of a stream to access the bytes to hash.12:38.26 
  tor8, sebras, Robin_Watts: I could take it one stage further. At the moment the choice of pkcs7 implementation has to be made at build time. I could alternatively make pdf_check_signature and pdf_sign_signature pass in a pkcs7 object. That way the caller determines the implementation.12:42.32 
Robin_Watts paulgardiner: That sounds nicer.12:43.28 
  And it's in keeping with what we do with lcms etc, I think.12:43.41 
  (fz_set_cmm_engine(ctx, engine); or something like that)12:44.01 
paulgardiner Would mean some work updating apps for anyone currently using the feature.12:44.09 
Robin_Watts paulgardiner: Only in so far as they would need to add an fz_set_signature_engine(ctx, engine); call, right ?12:44.53 
paulgardiner Ah, I hadn't considered doing it that way. Yes. That might be better.12:45.33 
  Not quite. pdf_sign_signature currently takes a file path and password for where to get the certificate and private key. I wanted to make that abstract, so that instead you pass in the function that creates the digest and allow for the possibility that certificates and private keys might be obtained from elsewhere.12:48.15 
  pdf_sign_signature already involves wrapping up a signer object and storing it until the document is saved, so it would be quite easy to make that object just a little more sophisticated12:50.49 
Robin_Watts I don't see why that precludes setting the engine up front.12:53.35 
sebras paulgardiner: "Fix failure of non-incremental document saving" wants a trailing period on the first line of the commit message. apart from that it looks reasonable to me, but you might want to have Robin_Watts and tor8 OK this too.13:10.55 
paulgardiner Robin_Watts: it doesn't, but then you gain nothing from doing so. It's just an extra complication.13:16.47 
  sebras: thanks. I've added the trailing period.13:25.57 
fredross-perry tor8, sebras - reminder, please finish reviewing my streaming interface stuff so we can get it in, thanks.15:53.16 
sebras fredross-perry: I did, and I reminded tor8.16:04.58 
fredross-perry ok thanks16:05.10 
Robin_Watts tor8, sebras, fredross-perry, mvrhel_laptop, paulgardiner: Trivial cut and paste bug needs a review:19:11.20 
  http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=791d24b28731d59f19bf69dec943357cf64fb30c19:11.22 
fredross-perry seems legit.19:12.13 
Robin_Watts fredross-perry: Ta.19:12.18 
sebras Robin_Watts: mitrer typo in commit message.21:05.44 
  Robin_Watts: if you are around, do you mind taking a look at sebras/master?22:22.36 
  three commits concerning the cmap splay trees, one concerning allowed objects as keys in the PDF store and one for docs.22:23.13 
 Forward 1 day (to 2018/01/24)>>> 
ghostscript.com #ghostscript
Search: