Log of #mupdf at irc.freenode.net.

Search:
 <<<Back 1 day (to 2018/08/08)20180809 
tor8 sebras: not sure I like the idea about padding hexstring parsing ... we can't really know how much to pad (at least not in the lexer) because bfchar's have variable length outputs09:11.41 
sebras so if we look at <00SD> (S, not 5), is currently parsed as 0x00 wich is turned into an output byte followed by 0xD stored in a local variable in the lexer, which is ignored. even though we have successfully parsed those four bits we discard them.09:15.44 
  I gues the same would happend if we encounter <00D> and then the question is if there is a good reason to parse this as 0x00D or 0x00 (discard 0xD) or perhaps 0x000D.09:16.59 
  tor8: oh, and welcome back! :)09:17.12 
tor8 yeah... wonder what acrobat does when parsing <00SD>09:20.42 
kens I'd like to think it throws an error....09:21.02 
sebras tor8: I think kens said it error... yes.09:21.11 
kens sebras: Acrobat threw errors on your file, I don't know if there were other problems in it beyond the SD though09:21.37 
  If there were, it might have stopped on some other error09:22.00 
sebras tor8: I know that the cmap is broken, but I wonder what the best way to handle those broken hexstrings are. mupdf tends to try to sensibly parse it and continue, but the question is what is sensible in this case? :)09:22.16 
  kens: there are more errors in the cmap, but I don't know if any other parts of the file are broken.09:22.45 
kens I'd say that non-hex characters in hex strings is a good reason to stop and error out myself. Its clearly broken09:22.47 
sebras tor8: the file is still at casper if you want to take a look.09:23.09 
kens sebras: I could possibly try fixing the errors and see what happens09:23.13 
tor8 sebras: put "%%Font Helv Helvetica09:23.15 
  BT /Helv 16 Tf 10 10 Td <48656c6S6f>Tj ET" in a.txt and use mutool create and try what happens in various interpreters09:23.15 
  gs drops the entire string09:23.38 
kens That's pretty much what I'd expect09:23.53 
tor8 evince shows "Hel`o"09:24.03 
  mupdf shows "Helf"09:24.10 
kens Well, the behaviour of PDF consumers in the faxce of illegal files is not specified (unlike PostScript)09:24.48 
tor8 acrobat doesn't show anything at all either09:25.03 
kens Yay, GS is Acrobat-compatible :-)09:25.18 
tor8 adobe reader xi at least09:25.24 
  mupdf shows "warning: ignoring invalid character in hex string"09:25.51 
  and I think we just ignore it completely, making the rest of the string 'off by one'09:26.06 
sebras tor8: acroread 9 shows a blank page without errors.09:26.21 
kens Acrobat doesn't always complain when it fionds errors09:26.38 
  Try exiting, does it offer to 'save changes' ?09:26.50 
sebras kens: nope.09:27.05 
kens Well tha'ts even worse than usual....09:27.16 
sebras tor8: we parse <48656c6S6f> as 0x48 0x65 0x6c 0x66 0xf and then discard 0xf before returning.09:27.44 
tor8 sebras: yeah, acrobat almost never warns or complains09:27.52 
kens If it does you know your file is *really* broken :-)09:28.08 
tor8 sebras: I have a one-line tweak I'm going to try09:28.11 
  and with that tweak, we match what evince does09:29.30 
sebras tor8: right, so each unreadable nibble is parsed as 0x009:30.04 
tor8 sebras: yeah. see tip of tor/master09:30.23 
sebras tor8: what do we do if there is an odd number of nibbles though?09:30.33 
tor8 we ignore the last odd one09:31.10 
  we only emit a 'parsed' byte when we see the second hex character in each byte pair09:31.36 
  we could pad it easily enough09:32.01 
sebras pdfref17 says on page 56 that <901FA> should be parsed as <901FA0> though.09:32.39 
  tor8: I assumed that the spec didn't define this until checking just now. :)09:33.21 
tor8 sebras: huh. you're right.09:34.20 
  and if I have a truncated byte at the end, acrobat does show it same as evince.09:34.30 
sebras tor8: yes, I know. it doesn't happen often enough though. :)09:35.06 
kens Dumb question, what does GS do with that string ?09:35.17 
tor8 kens: the right thing :)09:35.58 
kens is highly relieved09:36.06 
tor8 sebras: new fix on tor/master09:36.13 
sebras tor8: LGTM.09:36.42 
  there's quite a bit of thins on tor/master and sebras/master. maybe we should review a bit?09:37.13 
tor8 sebras: yes, we probably should09:37.31 
sebras tor8: the one's on sebras/master all triggered something in ossfuzz.09:38.25 
tor8 kens: looks like the heat in london has dissipated09:38.38 
sebras tor8: one of the WIP commit messages needs rewriting, I know, but the change itself ought to be sound.09:38.47 
kens tor8 yes the weather changed on Tuesday night09:39.08 
  Its currently raining09:39.13 
  Sop, back to normal :-)09:39.20 
tor8 Vienna was baking hot. I was hoping to come back to some cooler weather, but nope, the drought is *still* going strong...09:40.50 
sebras tor8: 36C still?09:41.23 
tor8 maybe your weather will make it over here in a day or two...09:41.26 
kens tor8 did you enjoy Vienna ?09:41.32 
  Apart from the heat....09:41.50 
tor8 expecting only 27 degrees today... which I guess is cooler than the 34 degrees it was yesterday09:42.03 
kens Well, its an improvement, slightly09:42.16 
tor8 kens: Yes. I'd never been to Austria before.09:42.18 
kens Lots of history there09:42.38 
tor8 The palaces are almost as pompous as the ones in France :)09:42.52 
kens Yeah we did the our as welll :-)09:43.07 
tor8 kens: Yeah. It's a bit sad seeing how much they lost after the wars, considering how big and important Austria was in European history.09:44.20 
kens Well, they had already lost quite a bit even before WW109:44.48 
  And teh Ottomans came really close to rolling right over them09:45.02 
tor8 I guess the summary is: keen to fight, but not very good at it :)09:45.28 
kens Well, they are a small nation really09:45.41 
  But certainly had a lot of involvement cerntrally in European history09:46.07 
  I guess location counts for a lot09:46.19 
tor8 Being the center of the Habsburg dynasty and HRE did give them a bit more punch than one would expect09:47.01 
  At least they're still around ... can't say the same for Prussia.09:47.24 
kens Yes good point the Holy Roman Empire did have a lto to do with it, politically at least09:47.38 
  Well Prussia got wrapped up into Germany09:47.50 
  And if memory serves they more or less ran the new country initially09:48.03 
tor8 wrapped up, then parceled up and divided into what is today germany, russia, poland, and czechia09:48.58 
kens Oh, I htought it was ll in Germany09:49.16 
kens was never much of a history student09:50.06 
tor8 you forgot Königsberg (Kaliningrad), Bohemia, and Poland09:50.30 
kens I have to admit I find it difficutl to keep track of borders in Central Euyrope :-)09:50.56 
tor8 https://en.wikipedia.org/wiki/Prussia#/media/File:Map-DR-Prussia.svg this picture shows the mess it was09:50.59 
kens Well, that's certainly a lot bigger than I realised09:51.22 
tor8 The name comes from a region in hwat is currently northeastern Poland.09:52.16 
sebras tor8: in "Rejig pdf_update_page and pdf_update_annot." PDFPage_update/PDFAnnotation_update ought to return JNI_FALSE if !ctx. also would it be wise to rename has_new_ap to make sure that clients not poke at it?09:55.09 
  tor8: also we have a pdf_update_page(), but not fz_update_page() the same goes for pdf_update_annot() vs fz_update_annot(). but we do have fz_annot and functions for iterating over them and getting displaylists.09:55.47 
tor8 sebras: you mean to return JNI_FALSE instead of 0?09:56.03 
sebras tor8: would it makes sense to hoist these operations up to the fitz level interface?09:56.05 
  yes.09:56.08 
tor8 sebras: right.09:56.15 
sebras we do that in other functions that return jboolean.09:56.16 
tor8 sebras: fz_annot was (IMO) a bit of a design mistake; I don't want to make it stronger09:57.22 
  it's a superclass with only one descendant, and no prospects to ever get any more09:58.01 
sebras doesn't xps have those too?09:58.15 
  perhaps only links, and that's what I'm remembering, hazily.09:59.24 
kens XPS has hyperlinks, but not annotations09:59.51 
  As I recall09:59.54 
tor8 sebras: only links, which are not covered by fz_annot10:00.42 
  when and if we add more descendants, we'll tackle the issue of generalizing to fz_update_page/annot then10:01.28 
  but now, it would just be pointless object oriented complexity for no immediate gain10:02.09 
sebras tor8: ok, then I agree that it is definitely not pressing. if it was already specified by xps but not implemented yet I'd be hard to convince. :)10:02.18 
  tor8: agreed.10:02.24 
tor8 sebras: "Avoid trying to populate ui without entries." s/ui/OCG ui struct/ ?10:06.15 
sebras tor8: one question though: pdf_load_annots() apperas to se annot->has_new_app = 0 after loading each annot but that contradicts the last paragraph in the explanation of pdf_update_annot() in the header, right?10:06.59 
  tor8: yes, I'll fix.10:07.24 
tor8 sebras: you could possibly be misreading the comment: "if the annotation appearance has changed since ... the annotation was first loaded"10:08.00 
sebras ah, yes.10:08.36 
  tor8: with that "Rejig pdf_update_page and pdf_update_annot." LGTM10:10.52 
tor8 the fz_warn in "Ignore cmap input ranges" I'd maybe rephrase as "ignoring CMap range (%u-%u) that is outside the codespace"10:12.24 
  fz_warn doesn't come with a stack trace, so it's more important to mention what/where it is warning10:13.10 
  I'd ask you to prod Robin about the cmap splay tree fix, but I'm not sure if he's around today before leaving for his holidays10:16.29 
kens He doesn't go till Sunday10:16.41 
tor8 I can't see anything wrong with it, but I don't claim to understand that part of code at all10:16.46 
kens here today and tomorrow10:16.46 
sebras tor8: neither did I, but now I know more. :)10:17.35 
tor8 sebras: with that wording fix, excepting the WIP that you should prod Robin about if you feel uncertain (but if you're confident you have my quiet nod), all on sebras/master LGTM10:17.45 
sebras tor8: as far as I read the code when delete_node() deletes a node the last node is moved to that location and the length of the node array is shortened, if lt and gt point to the last node it will now have moved.10:18.29 
  tor8: "gl: Adobe allows border widths from 0 to 12." to "Add fz_memmem function taken from musl.c" LGTM.10:20.11 
  tor8: in "Clean up null/range/endstream filter." struct null_filter seems to have an size_t extra which is not used.10:35.28 
  also I believe size_t is not used..?10:35.48 
tor8 sebras: you're right. neither 'extras' nor 'size' are used in the null_filter. will remove.10:37.32 
sebras tor8: is there a reason why remain in struct range_filter and struct null_filter is size_t and the len field in fz_range is an int?11:09.34 
  and why do we need to cast the n to int64_t at the end of next_range when updating state->offset and stm->pos?11:10.33 
  tor8: I also notice a difference in next_null() and next_range() in that if fz_available() returns 0, we set up stm->rp and stm->wp in next_range() before returning EOF, but in next_null() we don't. does that matter?11:12.56 
  tor8: perhaps that doesn't matter as I think stm->wp == stm->rp every time we enter next_range() or next_null() because that's the only state when fz_available() would call next().11:16.41 
  the same seems to go for fz_read_byte() and fz_peek_byte(), ok. so whether we reset stm->rp and stm->wp doesn't seem too important.11:18.39 
  tor8: why does fz_open_endstream_filter() bother with silently limiting the len input argument to >= 0 if the range filter does not do the same for the ranges?11:29.00 
  oh and "duff Length" here means that Length indicates a shorter stream than endstream indicates, right?11:30.43 
  tor8: also shouldn't the warning "PDF stream Length incorrect." might be better as "stream length shorter than actual stream length"? at least git grep 'fz_warn[^"]*"[^"]*[\.!?;:]"' seems to indicate that we don't often end with interpunctuation.11:42.47 
  I don't think next_endstream() should set eof at all. it does now when it finds the endstream marker.11:45.12 
paulgardiner My cluster test failed, seemingly because GL/gl.h not present. Should I just try again, or is there a known problem?11:46.50 
sebras even if n > 1 setting eof to 1 causes next_endstream() to only return a single byte in the first call when endstream is found and then any future call to fz_available() indicates that 0 bytes are available (because eof == 1) and so fz_read() would fails to read those bytes.11:47.18 
paulgardiner Ah, of course....11:47.44 
sebras paulgardiner: isn't the problem the missing jbig2dec functions?11:49.35 
  paulgardiner: I think those GL errors happen every time.11:50.15 
paulgardiner sebras: I'd just forgotten to refresh submodules11:50.24 
  ... I believe11:50.34 
sebras paulgardiner: yes.11:50.48 
  the commit your branch is sitting on does update the makefiles I believe.11:51.04 
  tor8: though perhaps run.new in the cluster ought to not only set HAVE_X11=NO but also HAVE_GLUT=no to avoid trying to build mupdf-gl?11:54.02 
paulgardiner Please can someone review the two commit on my master? Cluster seems happy.11:58.00 
sebras paulgardiner: I'm not yet clever enough to LGTM that first focus patch. because focus resides at the document level it seems to me like we only allow for a single focused widget in the entire document, but we allow for multiple pages to be rendered.12:04.17 
  I can see how having two pages with widgets and droping one of the would interfere with the focus on the other page.12:05.19 
paulgardiner The focus isn't well handled. Needs rethinking. But I believe this commit is a definite improvement12:05.21 
sebras but I'm not sure if the correct fix is to allow for multiple focused widgets.12:05.31 
  I think your commit disentagles the focus on one page from dropping another page, yes.12:06.06 
  paulgardiner: hows does acrobat behave if you have >=2 pages with widgets? does it remember the focus?12:07.08 
paulgardiner I've actually worked around this in muso and don't actually need that commit. If it's questionable I can leave it out. I put it up only because it seemed worth having that small improvement12:07.10 
  Actually, I think only one focus is allowed in any case.12:07.47 
sebras paulgardiner: yes, I think it improves things too, not sure if tor8 wants to fix the entire focus issue now though. perhaps? :)12:08.10 
paulgardiner I'm easy either way with that commit.12:08.38 
  The focus pointer in the doc struct is definitely fragile.12:09.28 
sebras paulgardiner: seems like pdfref17 page 652 states that focus is an element in the _document_, which I read as that there can only be one per document, not one per page.12:09.35 
paulgardiner Even with my patch, holding two fz_pages on the same page is problematic12:10.36 
  The other commit is the more important. That one fixes a bug, I believe12:12.55 
  Just nipping out for a run. biab12:13.19 
sebras perhaps we should only clear doc->focus if we are freeing the last reference to the page containing the annotation in focus.12:13.43 
  paulgardiner: in the second comment, why wouldn't accepted be 1 when returned from run_keystroke()?12:22.04 
  paulgardiner: to me it seems that would require the javascript action to set event->rc to 0, but why would that happen?12:22.27 
  tor8: if the buffer ends with "\r\n" we will include these in the data returned to the calle of next_endstream(), upon the next call to next_endstream() we would then read "endstream" into the buffer and then consider the stream at EOF.12:30.56 
  tor8: if we however read a buffer which contains "\r\nendstream" the in one go we will avoid returning "\r\n" to the user.12:31.38 
  s/the user/the caller/12:31.47 
  at least I think this is waht would happen.12:33.07 
paulgardiner sebras: it seems the keystroke events look for a specific pattern and accept just that, which doesn't always include the empty string12:41.00 
  Any more thoughts on my commits?15:32.44 
sebras paulgardiner: I have been reviewing tor8s commits on tor/master, and checking somethings for kate.15:54.07 
  paulgardiner: tor8 did however redo the keystroke annotation patch and put it on tor/master.15:54.37 
  paulgardiner: I took a quick look and it bis basically the same as yours apart from not calling strlen. :)15:55.02 
paulgardiner *str == 0 ?15:56.09 
sebras yes.15:56.18 
paulgardiner Yep, makes sense15:56.31 
  If that goes in, it's all I really need15:56.53 
sebras so once I get through tor/master it'll be in I suppose.15:57.08 
  maybe tomorrow?15:57.16 
  tor8: up to "Detect cycles in pdf_dict_get_inheritable." LGTM.15:59.19 
  Robin_Watts: ok, the top one on mupdf:sebras/master needs a review.16:16.09 
  Robin_Watts: origin/master..sebras/master~1 tor8 already LGTMed.16:16.21 
  Robin_Watts: I might need to rephrase the commit message, but16:16.41 
  it is more the logic of the patch that needs to be checked for soundness.16:16.59 
Robin_Watts lookng16:17.59 
  So, the worry is that we might call 'delete_node' and have gt and lt left as being invalid ?16:19.47 
sebras yes.16:20.03 
  not only worry, it got triggered in by an oss-fuzz file.16:20.22 
Robin_Watts There are only 2 cases where we call delete_node.16:20.32 
  In one of them, we exit the function, so that's not an issue.16:20.43 
sebras correct, but this is the previous one.16:20.56 
Robin_Watts In the other we set current = EMPTY and continue, which sets current = move;16:21.08 
  Ah, I see.16:21.36 
sebras correct, but we might have arrived at the end of the while loop setting lt/gt in the previous iteration.16:21.50 
  before delete_node() is called.16:21.54 
  and pointing to lt/gt is as far as I can see still valid, it is just that the node might have moved after current was deleted by delete_node().16:22.28 
Robin_Watts I understand why your patch is where it is.16:22.51 
  I don't understand what it's doing.16:22.59 
sebras Robin_Watts: so current is the node being deleted,16:23.23 
Robin_Watts You're setting lt and gt (in some circumstance I don't understand) to a value that is about to be deleted.16:23.30 
sebras but when it is deleted the last node in the array at cmap->tlen - 1 is moved into the empty slot.16:23.42 
Robin_Watts Ah.16:24.09 
sebras so I'm actually setting lt/gt to the position where this node is moved into.16:24.12 
Robin_Watts Now that makes sense.16:24.20 
sebras I can doit after delete_node isntead of course.16:24.21 
  but before current = EMPTY.16:24.40 
Robin_Watts How about adding: /* delete_node moves the element at cmap->tlen-1 into current. */16:25.15 
  That'd be enough of a hint to explain what was happening to me, I think.16:25.54 
sebras not doing this later triggered an assert() in delete_node() called at the end of add_range() because gt was not valid and so its left/right pointers were inconsistent.16:25.58 
  I can do that, but you otherwise agree that the logic is sound?16:26.22 
Robin_Watts Yes, with that, it lgtm.16:26.49 
sebras Robin_Watts: that was all, thanks! :)16:27.02 
Robin_Watts no worries.16:27.09 
sebras tor8: "Remove functions that implement duplicate functionality." on page 1038 in pdfref17.pdf it is actually briefly mentioned that the interactive form dictionary /AcroForm is separate from the interactive form field hierarchy. NB this is mentioned in the setting of linearization, but still.17:52.15 
  tor8: ok, so I have reviwed the entire tor/master up to "FIXUP error messages in NULL-openssl impl" it all LGTM. the only small question is on "Respect NoRotate flag for icon-based annotations." where you make a change in do_edit_icon(), but I'm not sure it belongs in that commit.19:42.25 
  tor8 also after clustering I noticed that a few files had issues reporting a large number of warnings.19:43.14 
  tor8: I believe that tsebras/master now contains a suitable FIXUP commit for that issue.19:43.33 
  additionally there's a WIP commit for allowing alteration of the sanitize flags and specifically allowing me to compile MSAN (which requires compiling and linking with clang).19:44.17 
sebras afk.19:47.04 
tor8 sebras: up to sebras/master~1 LGTM. I don't understand the WIP clang msan changes at all22:07.22 
  you change $(LD) to ld, which breaks cross compiles22:07.38 
  changing += to ?= has no effect on variables set on the command line... command line variables always override all makefile variables22:08.10 
  you probably want to add a SANITIZE_FLAGS += $(SANITIZE_XFLAGS) line instead22:08.28 
 Forward 1 day (to 2018/08/10)>>> 
ghostscript.com #ghostscript
Search: