MuPDF IRC logs

	<<<Back 1 day (to 2018/01/22)	20180123
Robin_Watts	tor8: The string is copied INTO the display list. It's safe :)	00:01.56
sebras	Robin_Watts: hm. the logic in add_range() doesn't take differing values of many into account when checking for overlap at all.	01:57.14
	but perhaps I still misunderstand the code.	01:57.43
	I did misunderstand, or rather I didn't read far enough.	02:08.30
titanous	sebras: your fix for 5496 (uninitialized version buffer) doesn't work, strlen has undefined behavior when used on uninitialized bytes, you should check the length of the data read	04:06.23
sebras	titanous: fz_read_line() will write a trailing NUL even if it onÃ¶y reads EOF from the underlying stream, sp strlen() ought to be safe. memcmp() doesn't (in the name of optimization) necedsarily respect NUL as it may read data one word at a time.	08:16.19
titanous	sebras: ah, great, I must have messed up reproing with the fix	09:51.15
chrisl	mutool clean seems to be broken - it's using "rb+" with fopen for opening the output file	10:14.43
sebras	chrisl: as long as there is no seeking that shouldn't be a problem.	10:35.55
	chrisl: but...	10:36.00
	chrisl: if I read e.g. fz_new_output_with_path() correctly it provides a seek mechanism.	10:36.18
kens	sebras, not I'm afraid that is a probme	10:37.01
chrisl	sebras: it also only opens an existing file - which I'd argue is not a common use of mutool clean	10:37.11
kens	fopen with rb+ fails if the file doe snot exist	10:37.15
sebras	chrisl: ah, true.	10:37.26
	chrisl: kens: I blane paulgardiner whose commit 457873fbf7fd6d40242722f3a51b41428302d0ca went in yesterday. :)	10:38.21
kens	That is indeed the problem sebras, we believe	10:38.41
	Paul is talking about it now	10:38.48
tor8	sebras: so, your pdf_lex_no_string commit	11:13.13
	it's been a long time since I wrote this code	11:13.23
	but I think the intent is to skip over the 'illegal' tokens	11:13.43
	your fix patches over the '<' case to loop forever	11:14.32
	nvm that last statement	11:15.23
	yes, with your patch it behaves like it does for '(' and ')'	11:15.37
	question is ... should it? or should we return PDF_TOK_ERROR	11:15.46
	and why do we have a special 'no_string' lexer that is used for repair?	11:17.24
sebras	tor8: from my pov pdf_lex_no_string() should behave as pdf_lex().	11:17.54
	tor8: today it does not.	11:17.58
	the difference should _only_ be in that strings will not be returned as tokens.	11:18.12
	so really a flag to pdf_lex() might do, but when doing the original commit robin decided to duplicate the function for some reason.	11:18.43
	tor8: and missed this fz_unread_byte().	11:18.51
tor8	Robin_Watts: added the pdf_lex_no_string function	11:19.05
sebras	tor8: he did.	11:19.15
tor8	and the problem with reading strings is that if we hit a 'garbage' string opening character	11:19.29
	we could swallow the entire file	11:19.35
sebras	yup.	11:19.39
tor8	so we just want to skip them	11:19.43
	or return PDF_TOK_ERROR maybe	11:19.58
sebras	I'm not sure PDF_TOK_ERROR is the best way though.	11:20.21
	because these things are used when repairing too.	11:20.30
tor8	it's only called in one spot, and that is one where it is scanning for the top level "X Y obj" and trailer dictionary	11:21.09
	returning PDF_TOK_ERROR would be nicer if someone else were to try to use it	11:21.43
sebras	tor8: would you want pdf_lex() to do this too?	11:22.01
tor8	do what?	11:22.13
sebras	tor8: return PDF_TOK_ERROR.	11:22.24
tor8	that's weird, it doesn't...	11:22.58
	oh, it has other checks when reading numbers and names and keywords where it checks for errors and returns PDF_TOK_ERROR	11:24.03
	strange that it only warns but doesn't return an error on a single ">" character	11:24.16
sebras	tor8: that change was introduced in 95f29c10759bc42e940869f1a0d2f82db16f8709	11:25.54
tor8	yeah, from throwing to warning	11:26.38
	and we probably introduced TOK_ERROR after that	11:26.45
	I'll make a patch and see what the cluster says	11:26.52
sebras	tor8: no, PDF_TOK_ERROR was already present in the enum at that time.	11:27.21
tor8	then changing it to a warn was probably a mistake. let me try my commit on the cluster.	11:32.12
	it's on tor/master if you want to have a look	11:32.44
sebras	tor8: two first patches on tor/master LGTM.	11:34.26
tor8	sebras: thanks.	11:35.13
sebras	tor8: in pdf_lex() when you unread the byte after '>' you check for EOF before unreading.	11:36.16
	tor8: yet you do not do this if you first read '<' and then something other than '<'. why not?	11:36.41
	tor8: I'm confused. so if you have "<tjosan" in your file when you attempt to repair and thus calls pdf_lex_no_string	11:40.25
	() then you will in pdf_lex_no_string() first read '<' and assume it is either a dict or a hex string.	11:40.57
	next you read 't' and decide that this is indeed not a dict so you return PDF_TOK_ERROR	11:41.31
	the execution then returns to pdf_repair_xref() which checks the value in tok, and if tok == PDF_TOK_ERROR we read an extra byte.	11:42.26
	that doesn't seem right. I'm ok with us consuming a byte and returning PDF_TOK_ERROR and then have the next call process the next byte.	11:43.06
	but this way pdf_lex_no_string() consumes one and the extra fz_read_byte() in pdf_repair_xref() consumes another.	11:43.39
tor8	oh... the repair code reads an extra byte after each error token? that seems fishy.	11:43.47
sebras	it does.	11:43.58
tor8	sebras: I shouldn't be allowed to program today... you're right, I don't handle EOF in the '<' unread part	11:46.02
	should possibly rejig this to use fz_peek_byte instead...	11:46.48
	but that would stick out like a sore thumb... the rest of the lexer code is structured around read/unread	11:47.42
sebras	tor8: mmm, and now that you made me aware that it might be a good idea to check for EOF I started looking at other places where we call fz_unread_byte().	11:48.19
	tor8: like at the beginning of pdf_repair_xref().	11:48.26
tor8	sebras: updated pdf_lex and proposal on tor/master	11:51.25
	sebras: updated again	11:53.49
sebras	tor8: in "Allow oversize allocations in pool allocator." why don't you just rely on the ternary expression for assigning extra? checking size twice seems a bit unnecessary.	11:55.41
	tor8: also what prompted you to do this change? is there a broken file somewhere?	11:56.37
malc_	sebras: yes there is	12:00.24
sebras	malc_: ...related to the patch tor8 made.	12:00.49
malc_	sebras: yes there is :)	12:00.59
sebras	malc_: ok, then I'd prefer a bug report with the file and a pointer to the bugfix in the commit message.	12:01.24
malc_	sebras: the file is private :(	12:01.46
sebras	malc_: if it can be uploaded to bugzilla we can mark it private so only we can see it. sometimes that doesn't help though.	12:03.11
tor8	sebras: oops. I added the ternary, didn't like it and tried to revert the code back to non-ternary form.	12:03.12
	like I said, I oughtn't program today :(	12:03.20
malc_	sebras: and i guess tor8 can come up with a simple reproducer himself should he so desire, given that he fixed it the fact that, i'm speculating here, he understands the problem	12:03.32
sebras	tor8: TIL oughtn't is a word. :)	12:03.47
tor8	sebras: I didn't anticipate XML documents with text chunks > 64kb	12:03.57
malc_	sebras: i was about to say something similar :)	12:04.02
	2 vowels followed by 5 consonants and one special symbol	12:04.16
	neat	12:04.16
	sebras: what tor8 said, plus it is just a fb2 xml header plus embedded jpeg image	12:09.07
paulgardiner	tor8, sebras, Robin_Watts: I've completed most of the refactoring I was planning. It's on my signature-support-refactor branch. It has the pkcs7 and pdf specifics separated, so that swapping in an alternative pkcs7 implementation should be a lot easier. The only mupdf specifics in the api is the use of a stream to access the bytes to hash.	12:38.26
	tor8, sebras, Robin_Watts: I could take it one stage further. At the moment the choice of pkcs7 implementation has to be made at build time. I could alternatively make pdf_check_signature and pdf_sign_signature pass in a pkcs7 object. That way the caller determines the implementation.	12:42.32
Robin_Watts	paulgardiner: That sounds nicer.	12:43.28
	And it's in keeping with what we do with lcms etc, I think.	12:43.41
	(fz_set_cmm_engine(ctx, engine); or something like that)	12:44.01
paulgardiner	Would mean some work updating apps for anyone currently using the feature.	12:44.09
Robin_Watts	paulgardiner: Only in so far as they would need to add an fz_set_signature_engine(ctx, engine); call, right ?	12:44.53
paulgardiner	Ah, I hadn't considered doing it that way. Yes. That might be better.	12:45.33
	Not quite. pdf_sign_signature currently takes a file path and password for where to get the certificate and private key. I wanted to make that abstract, so that instead you pass in the function that creates the digest and allow for the possibility that certificates and private keys might be obtained from elsewhere.	12:48.15
	pdf_sign_signature already involves wrapping up a signer object and storing it until the document is saved, so it would be quite easy to make that object just a little more sophisticated	12:50.49
Robin_Watts	I don't see why that precludes setting the engine up front.	12:53.35
sebras	paulgardiner: "Fix failure of non-incremental document saving" wants a trailing period on the first line of the commit message. apart from that it looks reasonable to me, but you might want to have Robin_Watts and tor8 OK this too.	13:10.55
paulgardiner	Robin_Watts: it doesn't, but then you gain nothing from doing so. It's just an extra complication.	13:16.47
	sebras: thanks. I've added the trailing period.	13:25.57
fredross-perry	tor8, sebras - reminder, please finish reviewing my streaming interface stuff so we can get it in, thanks.	15:53.16
sebras	fredross-perry: I did, and I reminded tor8.	16:04.58
fredross-perry	ok thanks	16:05.10
Robin_Watts	tor8, sebras, fredross-perry, mvrhel_laptop, paulgardiner: Trivial cut and paste bug needs a review:	19:11.20
	http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=791d24b28731d59f19bf69dec943357cf64fb30c	19:11.22
fredross-perry	seems legit.	19:12.13
Robin_Watts	fredross-perry: Ta.	19:12.18
sebras	Robin_Watts: mitrer typo in commit message.	21:05.44
	Robin_Watts: if you are around, do you mind taking a look at sebras/master?	22:22.36
	three commits concerning the cmap splay trees, one concerning allowed objects as keys in the PDF store and one for docs.	22:23.13
	Forward 1 day (to 2018/01/24)>>>

Log of #mupdf at irc.freenode.net.