| <<<Back 1 day (to 2018/08/08) | 20180809 |
tor8 | sebras: not sure I like the idea about padding hexstring parsing ... we can't really know how much to pad (at least not in the lexer) because bfchar's have variable length outputs | 09:11.41 |
sebras | so if we look at <00SD> (S, not 5), is currently parsed as 0x00 wich is turned into an output byte followed by 0xD stored in a local variable in the lexer, which is ignored. even though we have successfully parsed those four bits we discard them. | 09:15.44 |
| I gues the same would happend if we encounter <00D> and then the question is if there is a good reason to parse this as 0x00D or 0x00 (discard 0xD) or perhaps 0x000D. | 09:16.59 |
| tor8: oh, and welcome back! :) | 09:17.12 |
tor8 | yeah... wonder what acrobat does when parsing <00SD> | 09:20.42 |
kens | I'd like to think it throws an error.... | 09:21.02 |
sebras | tor8: I think kens said it error... yes. | 09:21.11 |
kens | sebras: Acrobat threw errors on your file, I don't know if there were other problems in it beyond the SD though | 09:21.37 |
| If there were, it might have stopped on some other error | 09:22.00 |
sebras | tor8: I know that the cmap is broken, but I wonder what the best way to handle those broken hexstrings are. mupdf tends to try to sensibly parse it and continue, but the question is what is sensible in this case? :) | 09:22.16 |
| kens: there are more errors in the cmap, but I don't know if any other parts of the file are broken. | 09:22.45 |
kens | I'd say that non-hex characters in hex strings is a good reason to stop and error out myself. Its clearly broken | 09:22.47 |
sebras | tor8: the file is still at casper if you want to take a look. | 09:23.09 |
kens | sebras: I could possibly try fixing the errors and see what happens | 09:23.13 |
tor8 | sebras: put "%%Font Helv Helvetica | 09:23.15 |
| BT /Helv 16 Tf 10 10 Td <48656c6S6f>Tj ET" in a.txt and use mutool create and try what happens in various interpreters | 09:23.15 |
| gs drops the entire string | 09:23.38 |
kens | That's pretty much what I'd expect | 09:23.53 |
tor8 | evince shows "Hel`o" | 09:24.03 |
| mupdf shows "Helf" | 09:24.10 |
kens | Well, the behaviour of PDF consumers in the faxce of illegal files is not specified (unlike PostScript) | 09:24.48 |
tor8 | acrobat doesn't show anything at all either | 09:25.03 |
kens | Yay, GS is Acrobat-compatible :-) | 09:25.18 |
tor8 | adobe reader xi at least | 09:25.24 |
| mupdf shows "warning: ignoring invalid character in hex string" | 09:25.51 |
| and I think we just ignore it completely, making the rest of the string 'off by one' | 09:26.06 |
sebras | tor8: acroread 9 shows a blank page without errors. | 09:26.21 |
kens | Acrobat doesn't always complain when it fionds errors | 09:26.38 |
| Try exiting, does it offer to 'save changes' ? | 09:26.50 |
sebras | kens: nope. | 09:27.05 |
kens | Well tha'ts even worse than usual.... | 09:27.16 |
sebras | tor8: we parse <48656c6S6f> as 0x48 0x65 0x6c 0x66 0xf and then discard 0xf before returning. | 09:27.44 |
tor8 | sebras: yeah, acrobat almost never warns or complains | 09:27.52 |
kens | If it does you know your file is *really* broken :-) | 09:28.08 |
tor8 | sebras: I have a one-line tweak I'm going to try | 09:28.11 |
| and with that tweak, we match what evince does | 09:29.30 |
sebras | tor8: right, so each unreadable nibble is parsed as 0x0 | 09:30.04 |
tor8 | sebras: yeah. see tip of tor/master | 09:30.23 |
sebras | tor8: what do we do if there is an odd number of nibbles though? | 09:30.33 |
tor8 | we ignore the last odd one | 09:31.10 |
| we only emit a 'parsed' byte when we see the second hex character in each byte pair | 09:31.36 |
| we could pad it easily enough | 09:32.01 |
sebras | pdfref17 says on page 56 that <901FA> should be parsed as <901FA0> though. | 09:32.39 |
| tor8: I assumed that the spec didn't define this until checking just now. :) | 09:33.21 |
tor8 | sebras: huh. you're right. | 09:34.20 |
| and if I have a truncated byte at the end, acrobat does show it same as evince. | 09:34.30 |
sebras | tor8: yes, I know. it doesn't happen often enough though. :) | 09:35.06 |
kens | Dumb question, what does GS do with that string ? | 09:35.17 |
tor8 | kens: the right thing :) | 09:35.58 |
kens | is highly relieved | 09:36.06 |
tor8 | sebras: new fix on tor/master | 09:36.13 |
sebras | tor8: LGTM. | 09:36.42 |
| there's quite a bit of thins on tor/master and sebras/master. maybe we should review a bit? | 09:37.13 |
tor8 | sebras: yes, we probably should | 09:37.31 |
sebras | tor8: the one's on sebras/master all triggered something in ossfuzz. | 09:38.25 |
tor8 | kens: looks like the heat in london has dissipated | 09:38.38 |
sebras | tor8: one of the WIP commit messages needs rewriting, I know, but the change itself ought to be sound. | 09:38.47 |
kens | tor8 yes the weather changed on Tuesday night | 09:39.08 |
| Its currently raining | 09:39.13 |
| Sop, back to normal :-) | 09:39.20 |
tor8 | Vienna was baking hot. I was hoping to come back to some cooler weather, but nope, the drought is *still* going strong... | 09:40.50 |
sebras | tor8: 36C still? | 09:41.23 |
tor8 | maybe your weather will make it over here in a day or two... | 09:41.26 |
kens | tor8 did you enjoy Vienna ? | 09:41.32 |
| Apart from the heat.... | 09:41.50 |
tor8 | expecting only 27 degrees today... which I guess is cooler than the 34 degrees it was yesterday | 09:42.03 |
kens | Well, its an improvement, slightly | 09:42.16 |
tor8 | kens: Yes. I'd never been to Austria before. | 09:42.18 |
kens | Lots of history there | 09:42.38 |
tor8 | The palaces are almost as pompous as the ones in France :) | 09:42.52 |
kens | Yeah we did the our as welll :-) | 09:43.07 |
tor8 | kens: Yeah. It's a bit sad seeing how much they lost after the wars, considering how big and important Austria was in European history. | 09:44.20 |
kens | Well, they had already lost quite a bit even before WW1 | 09:44.48 |
| And teh Ottomans came really close to rolling right over them | 09:45.02 |
tor8 | I guess the summary is: keen to fight, but not very good at it :) | 09:45.28 |
kens | Well, they are a small nation really | 09:45.41 |
| But certainly had a lot of involvement cerntrally in European history | 09:46.07 |
| I guess location counts for a lot | 09:46.19 |
tor8 | Being the center of the Habsburg dynasty and HRE did give them a bit more punch than one would expect | 09:47.01 |
| At least they're still around ... can't say the same for Prussia. | 09:47.24 |
kens | Yes good point the Holy Roman Empire did have a lto to do with it, politically at least | 09:47.38 |
| Well Prussia got wrapped up into Germany | 09:47.50 |
| And if memory serves they more or less ran the new country initially | 09:48.03 |
tor8 | wrapped up, then parceled up and divided into what is today germany, russia, poland, and czechia | 09:48.58 |
kens | Oh, I htought it was ll in Germany | 09:49.16 |
kens | was never much of a history student | 09:50.06 |
tor8 | you forgot Königsberg (Kaliningrad), Bohemia, and Poland | 09:50.30 |
kens | I have to admit I find it difficutl to keep track of borders in Central Euyrope :-) | 09:50.56 |
tor8 | https://en.wikipedia.org/wiki/Prussia#/media/File:Map-DR-Prussia.svg this picture shows the mess it was | 09:50.59 |
kens | Well, that's certainly a lot bigger than I realised | 09:51.22 |
tor8 | The name comes from a region in hwat is currently northeastern Poland. | 09:52.16 |
sebras | tor8: in "Rejig pdf_update_page and pdf_update_annot." PDFPage_update/PDFAnnotation_update ought to return JNI_FALSE if !ctx. also would it be wise to rename has_new_ap to make sure that clients not poke at it? | 09:55.09 |
| tor8: also we have a pdf_update_page(), but not fz_update_page() the same goes for pdf_update_annot() vs fz_update_annot(). but we do have fz_annot and functions for iterating over them and getting displaylists. | 09:55.47 |
tor8 | sebras: you mean to return JNI_FALSE instead of 0? | 09:56.03 |
sebras | tor8: would it makes sense to hoist these operations up to the fitz level interface? | 09:56.05 |
| yes. | 09:56.08 |
tor8 | sebras: right. | 09:56.15 |
sebras | we do that in other functions that return jboolean. | 09:56.16 |
tor8 | sebras: fz_annot was (IMO) a bit of a design mistake; I don't want to make it stronger | 09:57.22 |
| it's a superclass with only one descendant, and no prospects to ever get any more | 09:58.01 |
sebras | doesn't xps have those too? | 09:58.15 |
| perhaps only links, and that's what I'm remembering, hazily. | 09:59.24 |
kens | XPS has hyperlinks, but not annotations | 09:59.51 |
| As I recall | 09:59.54 |
tor8 | sebras: only links, which are not covered by fz_annot | 10:00.42 |
| when and if we add more descendants, we'll tackle the issue of generalizing to fz_update_page/annot then | 10:01.28 |
| but now, it would just be pointless object oriented complexity for no immediate gain | 10:02.09 |
sebras | tor8: ok, then I agree that it is definitely not pressing. if it was already specified by xps but not implemented yet I'd be hard to convince. :) | 10:02.18 |
| tor8: agreed. | 10:02.24 |
tor8 | sebras: "Avoid trying to populate ui without entries." s/ui/OCG ui struct/ ? | 10:06.15 |
sebras | tor8: one question though: pdf_load_annots() apperas to se annot->has_new_app = 0 after loading each annot but that contradicts the last paragraph in the explanation of pdf_update_annot() in the header, right? | 10:06.59 |
| tor8: yes, I'll fix. | 10:07.24 |
tor8 | sebras: you could possibly be misreading the comment: "if the annotation appearance has changed since ... the annotation was first loaded" | 10:08.00 |
sebras | ah, yes. | 10:08.36 |
| tor8: with that "Rejig pdf_update_page and pdf_update_annot." LGTM | 10:10.52 |
tor8 | the fz_warn in "Ignore cmap input ranges" I'd maybe rephrase as "ignoring CMap range (%u-%u) that is outside the codespace" | 10:12.24 |
| fz_warn doesn't come with a stack trace, so it's more important to mention what/where it is warning | 10:13.10 |
| I'd ask you to prod Robin about the cmap splay tree fix, but I'm not sure if he's around today before leaving for his holidays | 10:16.29 |
kens | He doesn't go till Sunday | 10:16.41 |
tor8 | I can't see anything wrong with it, but I don't claim to understand that part of code at all | 10:16.46 |
kens | here today and tomorrow | 10:16.46 |
sebras | tor8: neither did I, but now I know more. :) | 10:17.35 |
tor8 | sebras: with that wording fix, excepting the WIP that you should prod Robin about if you feel uncertain (but if you're confident you have my quiet nod), all on sebras/master LGTM | 10:17.45 |
sebras | tor8: as far as I read the code when delete_node() deletes a node the last node is moved to that location and the length of the node array is shortened, if lt and gt point to the last node it will now have moved. | 10:18.29 |
| tor8: "gl: Adobe allows border widths from 0 to 12." to "Add fz_memmem function taken from musl.c" LGTM. | 10:20.11 |
| tor8: in "Clean up null/range/endstream filter." struct null_filter seems to have an size_t extra which is not used. | 10:35.28 |
| also I believe size_t is not used..? | 10:35.48 |
tor8 | sebras: you're right. neither 'extras' nor 'size' are used in the null_filter. will remove. | 10:37.32 |
sebras | tor8: is there a reason why remain in struct range_filter and struct null_filter is size_t and the len field in fz_range is an int? | 11:09.34 |
| and why do we need to cast the n to int64_t at the end of next_range when updating state->offset and stm->pos? | 11:10.33 |
| tor8: I also notice a difference in next_null() and next_range() in that if fz_available() returns 0, we set up stm->rp and stm->wp in next_range() before returning EOF, but in next_null() we don't. does that matter? | 11:12.56 |
| tor8: perhaps that doesn't matter as I think stm->wp == stm->rp every time we enter next_range() or next_null() because that's the only state when fz_available() would call next(). | 11:16.41 |
| the same seems to go for fz_read_byte() and fz_peek_byte(), ok. so whether we reset stm->rp and stm->wp doesn't seem too important. | 11:18.39 |
| tor8: why does fz_open_endstream_filter() bother with silently limiting the len input argument to >= 0 if the range filter does not do the same for the ranges? | 11:29.00 |
| oh and "duff Length" here means that Length indicates a shorter stream than endstream indicates, right? | 11:30.43 |
| tor8: also shouldn't the warning "PDF stream Length incorrect." might be better as "stream length shorter than actual stream length"? at least git grep 'fz_warn[^"]*"[^"]*[\.!?;:]"' seems to indicate that we don't often end with interpunctuation. | 11:42.47 |
| I don't think next_endstream() should set eof at all. it does now when it finds the endstream marker. | 11:45.12 |
paulgardiner | My cluster test failed, seemingly because GL/gl.h not present. Should I just try again, or is there a known problem? | 11:46.50 |
sebras | even if n > 1 setting eof to 1 causes next_endstream() to only return a single byte in the first call when endstream is found and then any future call to fz_available() indicates that 0 bytes are available (because eof == 1) and so fz_read() would fails to read those bytes. | 11:47.18 |
paulgardiner | Ah, of course.... | 11:47.44 |
sebras | paulgardiner: isn't the problem the missing jbig2dec functions? | 11:49.35 |
| paulgardiner: I think those GL errors happen every time. | 11:50.15 |
paulgardiner | sebras: I'd just forgotten to refresh submodules | 11:50.24 |
| ... I believe | 11:50.34 |
sebras | paulgardiner: yes. | 11:50.48 |
| the commit your branch is sitting on does update the makefiles I believe. | 11:51.04 |
| tor8: though perhaps run.new in the cluster ought to not only set HAVE_X11=NO but also HAVE_GLUT=no to avoid trying to build mupdf-gl? | 11:54.02 |
paulgardiner | Please can someone review the two commit on my master? Cluster seems happy. | 11:58.00 |
sebras | paulgardiner: I'm not yet clever enough to LGTM that first focus patch. because focus resides at the document level it seems to me like we only allow for a single focused widget in the entire document, but we allow for multiple pages to be rendered. | 12:04.17 |
| I can see how having two pages with widgets and droping one of the would interfere with the focus on the other page. | 12:05.19 |
paulgardiner | The focus isn't well handled. Needs rethinking. But I believe this commit is a definite improvement | 12:05.21 |
sebras | but I'm not sure if the correct fix is to allow for multiple focused widgets. | 12:05.31 |
| I think your commit disentagles the focus on one page from dropping another page, yes. | 12:06.06 |
| paulgardiner: hows does acrobat behave if you have >=2 pages with widgets? does it remember the focus? | 12:07.08 |
paulgardiner | I've actually worked around this in muso and don't actually need that commit. If it's questionable I can leave it out. I put it up only because it seemed worth having that small improvement | 12:07.10 |
| Actually, I think only one focus is allowed in any case. | 12:07.47 |
sebras | paulgardiner: yes, I think it improves things too, not sure if tor8 wants to fix the entire focus issue now though. perhaps? :) | 12:08.10 |
paulgardiner | I'm easy either way with that commit. | 12:08.38 |
| The focus pointer in the doc struct is definitely fragile. | 12:09.28 |
sebras | paulgardiner: seems like pdfref17 page 652 states that focus is an element in the _document_, which I read as that there can only be one per document, not one per page. | 12:09.35 |
paulgardiner | Even with my patch, holding two fz_pages on the same page is problematic | 12:10.36 |
| The other commit is the more important. That one fixes a bug, I believe | 12:12.55 |
| Just nipping out for a run. biab | 12:13.19 |
sebras | perhaps we should only clear doc->focus if we are freeing the last reference to the page containing the annotation in focus. | 12:13.43 |
| paulgardiner: in the second comment, why wouldn't accepted be 1 when returned from run_keystroke()? | 12:22.04 |
| paulgardiner: to me it seems that would require the javascript action to set event->rc to 0, but why would that happen? | 12:22.27 |
| tor8: if the buffer ends with "\r\n" we will include these in the data returned to the calle of next_endstream(), upon the next call to next_endstream() we would then read "endstream" into the buffer and then consider the stream at EOF. | 12:30.56 |
| tor8: if we however read a buffer which contains "\r\nendstream" the in one go we will avoid returning "\r\n" to the user. | 12:31.38 |
| s/the user/the caller/ | 12:31.47 |
| at least I think this is waht would happen. | 12:33.07 |
paulgardiner | sebras: it seems the keystroke events look for a specific pattern and accept just that, which doesn't always include the empty string | 12:41.00 |
| Any more thoughts on my commits? | 15:32.44 |
sebras | paulgardiner: I have been reviewing tor8s commits on tor/master, and checking somethings for kate. | 15:54.07 |
| paulgardiner: tor8 did however redo the keystroke annotation patch and put it on tor/master. | 15:54.37 |
| paulgardiner: I took a quick look and it bis basically the same as yours apart from not calling strlen. :) | 15:55.02 |
paulgardiner | *str == 0 ? | 15:56.09 |
sebras | yes. | 15:56.18 |
paulgardiner | Yep, makes sense | 15:56.31 |
| If that goes in, it's all I really need | 15:56.53 |
sebras | so once I get through tor/master it'll be in I suppose. | 15:57.08 |
| maybe tomorrow? | 15:57.16 |
| tor8: up to "Detect cycles in pdf_dict_get_inheritable." LGTM. | 15:59.19 |
| Robin_Watts: ok, the top one on mupdf:sebras/master needs a review. | 16:16.09 |
| Robin_Watts: origin/master..sebras/master~1 tor8 already LGTMed. | 16:16.21 |
| Robin_Watts: I might need to rephrase the commit message, but | 16:16.41 |
| it is more the logic of the patch that needs to be checked for soundness. | 16:16.59 |
Robin_Watts | lookng | 16:17.59 |
| So, the worry is that we might call 'delete_node' and have gt and lt left as being invalid ? | 16:19.47 |
sebras | yes. | 16:20.03 |
| not only worry, it got triggered in by an oss-fuzz file. | 16:20.22 |
Robin_Watts | There are only 2 cases where we call delete_node. | 16:20.32 |
| In one of them, we exit the function, so that's not an issue. | 16:20.43 |
sebras | correct, but this is the previous one. | 16:20.56 |
Robin_Watts | In the other we set current = EMPTY and continue, which sets current = move; | 16:21.08 |
| Ah, I see. | 16:21.36 |
sebras | correct, but we might have arrived at the end of the while loop setting lt/gt in the previous iteration. | 16:21.50 |
| before delete_node() is called. | 16:21.54 |
| and pointing to lt/gt is as far as I can see still valid, it is just that the node might have moved after current was deleted by delete_node(). | 16:22.28 |
Robin_Watts | I understand why your patch is where it is. | 16:22.51 |
| I don't understand what it's doing. | 16:22.59 |
sebras | Robin_Watts: so current is the node being deleted, | 16:23.23 |
Robin_Watts | You're setting lt and gt (in some circumstance I don't understand) to a value that is about to be deleted. | 16:23.30 |
sebras | but when it is deleted the last node in the array at cmap->tlen - 1 is moved into the empty slot. | 16:23.42 |
Robin_Watts | Ah. | 16:24.09 |
sebras | so I'm actually setting lt/gt to the position where this node is moved into. | 16:24.12 |
Robin_Watts | Now that makes sense. | 16:24.20 |
sebras | I can doit after delete_node isntead of course. | 16:24.21 |
| but before current = EMPTY. | 16:24.40 |
Robin_Watts | How about adding: /* delete_node moves the element at cmap->tlen-1 into current. */ | 16:25.15 |
| That'd be enough of a hint to explain what was happening to me, I think. | 16:25.54 |
sebras | not doing this later triggered an assert() in delete_node() called at the end of add_range() because gt was not valid and so its left/right pointers were inconsistent. | 16:25.58 |
| I can do that, but you otherwise agree that the logic is sound? | 16:26.22 |
Robin_Watts | Yes, with that, it lgtm. | 16:26.49 |
sebras | Robin_Watts: that was all, thanks! :) | 16:27.02 |
Robin_Watts | no worries. | 16:27.09 |
sebras | tor8: "Remove functions that implement duplicate functionality." on page 1038 in pdfref17.pdf it is actually briefly mentioned that the interactive form dictionary /AcroForm is separate from the interactive form field hierarchy. NB this is mentioned in the setting of linearization, but still. | 17:52.15 |
| tor8: ok, so I have reviwed the entire tor/master up to "FIXUP error messages in NULL-openssl impl" it all LGTM. the only small question is on "Respect NoRotate flag for icon-based annotations." where you make a change in do_edit_icon(), but I'm not sure it belongs in that commit. | 19:42.25 |
| tor8 also after clustering I noticed that a few files had issues reporting a large number of warnings. | 19:43.14 |
| tor8: I believe that tsebras/master now contains a suitable FIXUP commit for that issue. | 19:43.33 |
| additionally there's a WIP commit for allowing alteration of the sanitize flags and specifically allowing me to compile MSAN (which requires compiling and linking with clang). | 19:44.17 |
sebras | afk. | 19:47.04 |
tor8 | sebras: up to sebras/master~1 LGTM. I don't understand the WIP clang msan changes at all | 22:07.22 |
| you change $(LD) to ld, which breaks cross compiles | 22:07.38 |
| changing += to ?= has no effect on variables set on the command line... command line variables always override all makefile variables | 22:08.10 |
| you probably want to add a SANITIZE_FLAGS += $(SANITIZE_XFLAGS) line instead | 22:08.28 |
| Forward 1 day (to 2018/08/10)>>> | |