MuPDF IRC logs

	<<<Back 1 day (to 2018/08/08)	20180809
tor8	sebras: not sure I like the idea about padding hexstring parsing ... we can't really know how much to pad (at least not in the lexer) because bfchar's have variable length outputs	09:11.41
sebras	so if we look at <00SD> (S, not 5), is currently parsed as 0x00 wich is turned into an output byte followed by 0xD stored in a local variable in the lexer, which is ignored. even though we have successfully parsed those four bits we discard them.	09:15.44
	I gues the same would happend if we encounter <00D> and then the question is if there is a good reason to parse this as 0x00D or 0x00 (discard 0xD) or perhaps 0x000D.	09:16.59
	tor8: oh, and welcome back! :)	09:17.12
tor8	yeah... wonder what acrobat does when parsing <00SD>	09:20.42
kens	I'd like to think it throws an error....	09:21.02
sebras	tor8: I think kens said it error... yes.	09:21.11
kens	sebras: Acrobat threw errors on your file, I don't know if there were other problems in it beyond the SD though	09:21.37
	If there were, it might have stopped on some other error	09:22.00
sebras	tor8: I know that the cmap is broken, but I wonder what the best way to handle those broken hexstrings are. mupdf tends to try to sensibly parse it and continue, but the question is what is sensible in this case? :)	09:22.16
	kens: there are more errors in the cmap, but I don't know if any other parts of the file are broken.	09:22.45
kens	I'd say that non-hex characters in hex strings is a good reason to stop and error out myself. Its clearly broken	09:22.47
sebras	tor8: the file is still at casper if you want to take a look.	09:23.09
kens	sebras: I could possibly try fixing the errors and see what happens	09:23.13
tor8	sebras: put "%%Font Helv Helvetica	09:23.15
	BT /Helv 16 Tf 10 10 Td <48656c6S6f>Tj ET" in a.txt and use mutool create and try what happens in various interpreters	09:23.15
	gs drops the entire string	09:23.38
kens	That's pretty much what I'd expect	09:23.53
tor8	evince shows "Hel`o"	09:24.03
	mupdf shows "Helf"	09:24.10
kens	Well, the behaviour of PDF consumers in the faxce of illegal files is not specified (unlike PostScript)	09:24.48
tor8	acrobat doesn't show anything at all either	09:25.03
kens	Yay, GS is Acrobat-compatible :-)	09:25.18
tor8	adobe reader xi at least	09:25.24
	mupdf shows "warning: ignoring invalid character in hex string"	09:25.51
	and I think we just ignore it completely, making the rest of the string 'off by one'	09:26.06
sebras	tor8: acroread 9 shows a blank page without errors.	09:26.21
kens	Acrobat doesn't always complain when it fionds errors	09:26.38
	Try exiting, does it offer to 'save changes' ?	09:26.50
sebras	kens: nope.	09:27.05
kens	Well tha'ts even worse than usual....	09:27.16
sebras	tor8: we parse <48656c6S6f> as 0x48 0x65 0x6c 0x66 0xf and then discard 0xf before returning.	09:27.44
tor8	sebras: yeah, acrobat almost never warns or complains	09:27.52
kens	If it does you know your file is really broken :-)	09:28.08
tor8	sebras: I have a one-line tweak I'm going to try	09:28.11
	and with that tweak, we match what evince does	09:29.30
sebras	tor8: right, so each unreadable nibble is parsed as 0x0	09:30.04
tor8	sebras: yeah. see tip of tor/master	09:30.23
sebras	tor8: what do we do if there is an odd number of nibbles though?	09:30.33
tor8	we ignore the last odd one	09:31.10
	we only emit a 'parsed' byte when we see the second hex character in each byte pair	09:31.36
	we could pad it easily enough	09:32.01
sebras	pdfref17 says on page 56 that <901FA> should be parsed as <901FA0> though.	09:32.39
	tor8: I assumed that the spec didn't define this until checking just now. :)	09:33.21
tor8	sebras: huh. you're right.	09:34.20
	and if I have a truncated byte at the end, acrobat does show it same as evince.	09:34.30
sebras	tor8: yes, I know. it doesn't happen often enough though. :)	09:35.06
kens	Dumb question, what does GS do with that string ?	09:35.17
tor8	kens: the right thing :)	09:35.58
*kens*	is highly relieved	09:36.06
tor8	sebras: new fix on tor/master	09:36.13
sebras	tor8: LGTM.	09:36.42
	there's quite a bit of thins on tor/master and sebras/master. maybe we should review a bit?	09:37.13
tor8	sebras: yes, we probably should	09:37.31
sebras	tor8: the one's on sebras/master all triggered something in ossfuzz.	09:38.25
tor8	kens: looks like the heat in london has dissipated	09:38.38
sebras	tor8: one of the WIP commit messages needs rewriting, I know, but the change itself ought to be sound.	09:38.47
kens	tor8 yes the weather changed on Tuesday night	09:39.08
	Its currently raining	09:39.13
	Sop, back to normal :-)	09:39.20
tor8	Vienna was baking hot. I was hoping to come back to some cooler weather, but nope, the drought is still going strong...	09:40.50
sebras	tor8: 36C still?	09:41.23
tor8	maybe your weather will make it over here in a day or two...	09:41.26
kens	tor8 did you enjoy Vienna ?	09:41.32
	Apart from the heat....	09:41.50
tor8	expecting only 27 degrees today... which I guess is cooler than the 34 degrees it was yesterday	09:42.03
kens	Well, its an improvement, slightly	09:42.16
tor8	kens: Yes. I'd never been to Austria before.	09:42.18
kens	Lots of history there	09:42.38
tor8	The palaces are almost as pompous as the ones in France :)	09:42.52
kens	Yeah we did the our as welll :-)	09:43.07
tor8	kens: Yeah. It's a bit sad seeing how much they lost after the wars, considering how big and important Austria was in European history.	09:44.20
kens	Well, they had already lost quite a bit even before WW1	09:44.48
	And teh Ottomans came really close to rolling right over them	09:45.02
tor8	I guess the summary is: keen to fight, but not very good at it :)	09:45.28
kens	Well, they are a small nation really	09:45.41
	But certainly had a lot of involvement cerntrally in European history	09:46.07
	I guess location counts for a lot	09:46.19
tor8	Being the center of the Habsburg dynasty and HRE did give them a bit more punch than one would expect	09:47.01
	At least they're still around ... can't say the same for Prussia.	09:47.24
kens	Yes good point the Holy Roman Empire did have a lto to do with it, politically at least	09:47.38
	Well Prussia got wrapped up into Germany	09:47.50
	And if memory serves they more or less ran the new country initially	09:48.03
tor8	wrapped up, then parceled up and divided into what is today germany, russia, poland, and czechia	09:48.58
kens	Oh, I htought it was ll in Germany	09:49.16
*kens*	was never much of a history student	09:50.06
tor8	you forgot KÃ¶nigsberg (Kaliningrad), Bohemia, and Poland	09:50.30
kens	I have to admit I find it difficutl to keep track of borders in Central Euyrope :-)	09:50.56
tor8	https://en.wikipedia.org/wiki/Prussia#/media/File:Map-DR-Prussia.svg this picture shows the mess it was	09:50.59
kens	Well, that's certainly a lot bigger than I realised	09:51.22
tor8	The name comes from a region in hwat is currently northeastern Poland.	09:52.16
sebras	tor8: in "Rejig pdf_update_page and pdf_update_annot." PDFPage_update/PDFAnnotation_update ought to return JNI_FALSE if !ctx. also would it be wise to rename has_new_ap to make sure that clients not poke at it?	09:55.09
	tor8: also we have a pdf_update_page(), but not fz_update_page() the same goes for pdf_update_annot() vs fz_update_annot(). but we do have fz_annot and functions for iterating over them and getting displaylists.	09:55.47
tor8	sebras: you mean to return JNI_FALSE instead of 0?	09:56.03
sebras	tor8: would it makes sense to hoist these operations up to the fitz level interface?	09:56.05
	yes.	09:56.08
tor8	sebras: right.	09:56.15
sebras	we do that in other functions that return jboolean.	09:56.16
tor8	sebras: fz_annot was (IMO) a bit of a design mistake; I don't want to make it stronger	09:57.22
	it's a superclass with only one descendant, and no prospects to ever get any more	09:58.01
sebras	doesn't xps have those too?	09:58.15
	perhaps only links, and that's what I'm remembering, hazily.	09:59.24
kens	XPS has hyperlinks, but not annotations	09:59.51
	As I recall	09:59.54
tor8	sebras: only links, which are not covered by fz_annot	10:00.42
	when and if we add more descendants, we'll tackle the issue of generalizing to fz_update_page/annot then	10:01.28
	but now, it would just be pointless object oriented complexity for no immediate gain	10:02.09
sebras	tor8: ok, then I agree that it is definitely not pressing. if it was already specified by xps but not implemented yet I'd be hard to convince. :)	10:02.18
	tor8: agreed.	10:02.24
tor8	sebras: "Avoid trying to populate ui without entries." s/ui/OCG ui struct/ ?	10:06.15
sebras	tor8: one question though: pdf_load_annots() apperas to se annot->has_new_app = 0 after loading each annot but that contradicts the last paragraph in the explanation of pdf_update_annot() in the header, right?	10:06.59
	tor8: yes, I'll fix.	10:07.24
tor8	sebras: you could possibly be misreading the comment: "if the annotation appearance has changed since ... the annotation was first loaded"	10:08.00
sebras	ah, yes.	10:08.36
	tor8: with that "Rejig pdf_update_page and pdf_update_annot." LGTM	10:10.52
tor8	the fz_warn in "Ignore cmap input ranges" I'd maybe rephrase as "ignoring CMap range (%u-%u) that is outside the codespace"	10:12.24
	fz_warn doesn't come with a stack trace, so it's more important to mention what/where it is warning	10:13.10
	I'd ask you to prod Robin about the cmap splay tree fix, but I'm not sure if he's around today before leaving for his holidays	10:16.29
kens	He doesn't go till Sunday	10:16.41
tor8	I can't see anything wrong with it, but I don't claim to understand that part of code at all	10:16.46
kens	here today and tomorrow	10:16.46
sebras	tor8: neither did I, but now I know more. :)	10:17.35
tor8	sebras: with that wording fix, excepting the WIP that you should prod Robin about if you feel uncertain (but if you're confident you have my quiet nod), all on sebras/master LGTM	10:17.45
sebras	tor8: as far as I read the code when delete_node() deletes a node the last node is moved to that location and the length of the node array is shortened, if lt and gt point to the last node it will now have moved.	10:18.29
	tor8: "gl: Adobe allows border widths from 0 to 12." to "Add fz_memmem function taken from musl.c" LGTM.	10:20.11
	tor8: in "Clean up null/range/endstream filter." struct null_filter seems to have an size_t extra which is not used.	10:35.28
	also I believe size_t is not used..?	10:35.48
tor8	sebras: you're right. neither 'extras' nor 'size' are used in the null_filter. will remove.	10:37.32
sebras	tor8: is there a reason why remain in struct range_filter and struct null_filter is size_t and the len field in fz_range is an int?	11:09.34
	and why do we need to cast the n to int64_t at the end of next_range when updating state->offset and stm->pos?	11:10.33
	tor8: I also notice a difference in next_null() and next_range() in that if fz_available() returns 0, we set up stm->rp and stm->wp in next_range() before returning EOF, but in next_null() we don't. does that matter?	11:12.56
	tor8: perhaps that doesn't matter as I think stm->wp == stm->rp every time we enter next_range() or next_null() because that's the only state when fz_available() would call next().	11:16.41
	the same seems to go for fz_read_byte() and fz_peek_byte(), ok. so whether we reset stm->rp and stm->wp doesn't seem too important.	11:18.39
	tor8: why does fz_open_endstream_filter() bother with silently limiting the len input argument to >= 0 if the range filter does not do the same for the ranges?	11:29.00
	oh and "duff Length" here means that Length indicates a shorter stream than endstream indicates, right?	11:30.43
	tor8: also shouldn't the warning "PDF stream Length incorrect." might be better as "stream length shorter than actual stream length"? at least git grep 'fz_warn[^"]"[^"][\.!?;:]"' seems to indicate that we don't often end with interpunctuation.	11:42.47
	I don't think next_endstream() should set eof at all. it does now when it finds the endstream marker.	11:45.12
paulgardiner	My cluster test failed, seemingly because GL/gl.h not present. Should I just try again, or is there a known problem?	11:46.50
sebras	even if n > 1 setting eof to 1 causes next_endstream() to only return a single byte in the first call when endstream is found and then any future call to fz_available() indicates that 0 bytes are available (because eof == 1) and so fz_read() would fails to read those bytes.	11:47.18
paulgardiner	Ah, of course....	11:47.44
sebras	paulgardiner: isn't the problem the missing jbig2dec functions?	11:49.35
	paulgardiner: I think those GL errors happen every time.	11:50.15
paulgardiner	sebras: I'd just forgotten to refresh submodules	11:50.24
	... I believe	11:50.34
sebras	paulgardiner: yes.	11:50.48
	the commit your branch is sitting on does update the makefiles I believe.	11:51.04
	tor8: though perhaps run.new in the cluster ought to not only set HAVE_X11=NO but also HAVE_GLUT=no to avoid trying to build mupdf-gl?	11:54.02
paulgardiner	Please can someone review the two commit on my master? Cluster seems happy.	11:58.00
sebras	paulgardiner: I'm not yet clever enough to LGTM that first focus patch. because focus resides at the document level it seems to me like we only allow for a single focused widget in the entire document, but we allow for multiple pages to be rendered.	12:04.17
	I can see how having two pages with widgets and droping one of the would interfere with the focus on the other page.	12:05.19
paulgardiner	The focus isn't well handled. Needs rethinking. But I believe this commit is a definite improvement	12:05.21
sebras	but I'm not sure if the correct fix is to allow for multiple focused widgets.	12:05.31
	I think your commit disentagles the focus on one page from dropping another page, yes.	12:06.06
	paulgardiner: hows does acrobat behave if you have >=2 pages with widgets? does it remember the focus?	12:07.08
paulgardiner	I've actually worked around this in muso and don't actually need that commit. If it's questionable I can leave it out. I put it up only because it seemed worth having that small improvement	12:07.10
	Actually, I think only one focus is allowed in any case.	12:07.47
sebras	paulgardiner: yes, I think it improves things too, not sure if tor8 wants to fix the entire focus issue now though. perhaps? :)	12:08.10
paulgardiner	I'm easy either way with that commit.	12:08.38
	The focus pointer in the doc struct is definitely fragile.	12:09.28
sebras	paulgardiner: seems like pdfref17 page 652 states that focus is an element in the _document_, which I read as that there can only be one per document, not one per page.	12:09.35
paulgardiner	Even with my patch, holding two fz_pages on the same page is problematic	12:10.36
	The other commit is the more important. That one fixes a bug, I believe	12:12.55
	Just nipping out for a run. biab	12:13.19
sebras	perhaps we should only clear doc->focus if we are freeing the last reference to the page containing the annotation in focus.	12:13.43
	paulgardiner: in the second comment, why wouldn't accepted be 1 when returned from run_keystroke()?	12:22.04
	paulgardiner: to me it seems that would require the javascript action to set event->rc to 0, but why would that happen?	12:22.27
	tor8: if the buffer ends with "\r\n" we will include these in the data returned to the calle of next_endstream(), upon the next call to next_endstream() we would then read "endstream" into the buffer and then consider the stream at EOF.	12:30.56
	tor8: if we however read a buffer which contains "\r\nendstream" the in one go we will avoid returning "\r\n" to the user.	12:31.38
	s/the user/the caller/	12:31.47
	at least I think this is waht would happen.	12:33.07
paulgardiner	sebras: it seems the keystroke events look for a specific pattern and accept just that, which doesn't always include the empty string	12:41.00
	Any more thoughts on my commits?	15:32.44
sebras	paulgardiner: I have been reviewing tor8s commits on tor/master, and checking somethings for kate.	15:54.07
	paulgardiner: tor8 did however redo the keystroke annotation patch and put it on tor/master.	15:54.37
	paulgardiner: I took a quick look and it bis basically the same as yours apart from not calling strlen. :)	15:55.02
paulgardiner	*str == 0 ?	15:56.09
sebras	yes.	15:56.18
paulgardiner	Yep, makes sense	15:56.31
	If that goes in, it's all I really need	15:56.53
sebras	so once I get through tor/master it'll be in I suppose.	15:57.08
	maybe tomorrow?	15:57.16
	tor8: up to "Detect cycles in pdf_dict_get_inheritable." LGTM.	15:59.19
	Robin_Watts: ok, the top one on mupdf:sebras/master needs a review.	16:16.09
	Robin_Watts: origin/master..sebras/master~1 tor8 already LGTMed.	16:16.21
	Robin_Watts: I might need to rephrase the commit message, but	16:16.41
	it is more the logic of the patch that needs to be checked for soundness.	16:16.59
Robin_Watts	lookng	16:17.59
	So, the worry is that we might call 'delete_node' and have gt and lt left as being invalid ?	16:19.47
sebras	yes.	16:20.03
	not only worry, it got triggered in by an oss-fuzz file.	16:20.22
Robin_Watts	There are only 2 cases where we call delete_node.	16:20.32
	In one of them, we exit the function, so that's not an issue.	16:20.43
sebras	correct, but this is the previous one.	16:20.56
Robin_Watts	In the other we set current = EMPTY and continue, which sets current = move;	16:21.08
	Ah, I see.	16:21.36
sebras	correct, but we might have arrived at the end of the while loop setting lt/gt in the previous iteration.	16:21.50
	before delete_node() is called.	16:21.54
	and pointing to lt/gt is as far as I can see still valid, it is just that the node might have moved after current was deleted by delete_node().	16:22.28
Robin_Watts	I understand why your patch is where it is.	16:22.51
	I don't understand what it's doing.	16:22.59
sebras	Robin_Watts: so current is the node being deleted,	16:23.23
Robin_Watts	You're setting lt and gt (in some circumstance I don't understand) to a value that is about to be deleted.	16:23.30
sebras	but when it is deleted the last node in the array at cmap->tlen - 1 is moved into the empty slot.	16:23.42
Robin_Watts	Ah.	16:24.09
sebras	so I'm actually setting lt/gt to the position where this node is moved into.	16:24.12
Robin_Watts	Now that makes sense.	16:24.20
sebras	I can doit after delete_node isntead of course.	16:24.21
	but before current = EMPTY.	16:24.40
Robin_Watts	How about adding: /* delete_node moves the element at cmap->tlen-1 into current. */	16:25.15
	That'd be enough of a hint to explain what was happening to me, I think.	16:25.54
sebras	not doing this later triggered an assert() in delete_node() called at the end of add_range() because gt was not valid and so its left/right pointers were inconsistent.	16:25.58
	I can do that, but you otherwise agree that the logic is sound?	16:26.22
Robin_Watts	Yes, with that, it lgtm.	16:26.49
sebras	Robin_Watts: that was all, thanks! :)	16:27.02
Robin_Watts	no worries.	16:27.09
sebras	tor8: "Remove functions that implement duplicate functionality." on page 1038 in pdfref17.pdf it is actually briefly mentioned that the interactive form dictionary /AcroForm is separate from the interactive form field hierarchy. NB this is mentioned in the setting of linearization, but still.	17:52.15
	tor8: ok, so I have reviwed the entire tor/master up to "FIXUP error messages in NULL-openssl impl" it all LGTM. the only small question is on "Respect NoRotate flag for icon-based annotations." where you make a change in do_edit_icon(), but I'm not sure it belongs in that commit.	19:42.25
	tor8 also after clustering I noticed that a few files had issues reporting a large number of warnings.	19:43.14
	tor8: I believe that tsebras/master now contains a suitable FIXUP commit for that issue.	19:43.33
	additionally there's a WIP commit for allowing alteration of the sanitize flags and specifically allowing me to compile MSAN (which requires compiling and linking with clang).	19:44.17
*sebras*	afk.	19:47.04
tor8	sebras: up to sebras/master~1 LGTM. I don't understand the WIP clang msan changes at all	22:07.22
	you change $(LD) to ld, which breaks cross compiles	22:07.38
	changing += to ?= has no effect on variables set on the command line... command line variables always override all makefile variables	22:08.10
	you probably want to add a SANITIZE_FLAGS += $(SANITIZE_XFLAGS) line instead	22:08.28
	Forward 1 day (to 2018/08/10)>>>

Log of #mupdf at irc.freenode.net.