Log of #mupdf at irc.freenode.net.

20180122 
FLEXO_ Hi. Is it possible to change scrolling direction in MuPDF?06:47.08 
Steve_ Hi, I've tested the "mupdf viewer" and I found some annoying bug: I read some ebook (epub) and the Viewer "crops" some pages, that the last row is missing. First I thought it's a bug in the epub and I opened it with another Reader. No Problem, the missing words are there. Opened in mupdf Viewer, the words are missing. So why and where are the words gone?07:23.28 
kens Steve_ : if you think you've found a bug, please open a bug report. It's not possible to even speculate about the problem without seeing the file in question, so you'll need to supply that and the easiest way to do that is to open a report.08:07.28 
_baskerville_ @kens: I've already made reports for this problem: 698875 and 69877008:25.54 
kens OK then presumably someone will look at it in time08:27.02 
  Assuming it's the same problem08:27.22 
_baskerville_ I've started investigating it myself: mupdf relies on ceilf(ch->html->root->h / ch->html->page_h) to compute the number of pages inside a chapter.08:28.35 
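
The page-count computation described above can be sketched as follows. This is a minimal illustration, not MuPDF code: `chapter_page_count` and its parameters are hypothetical names, and the rounding is written by hand (equivalent to `ceilf` for the positive values involved) so the snippet builds without linking libm.

```c
/* Hypothetical helper, not part of MuPDF: number of pages needed to show
 * laid-out content of height content_h on pages of height page_h.
 * Mirrors ceilf(content_h / page_h) for positive inputs. */
static int chapter_page_count(float content_h, float page_h)
{
    float q;
    int n;

    if (page_h <= 0)
        return 1; /* assumption: treat a degenerate page height as one page */

    q = content_h / page_h;
    n = (int)q;          /* truncate towards zero */
    if ((float)n < q)    /* round up if anything is left over */
        n++;
    return n < 1 ? 1 : n; /* assumption: a chapter always has at least one page */
}
```

With this formula, the page count depends entirely on the content height the layout reports: 1000 units on 250-unit pages gives 4 pages, while 1001 units gives 5.
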
kens _baskerville_ : I regret to say there's no point telling me about it, since I'm not a MuPDF developer..... But if you want to discuss it here, go ahead08:29.12 
_baskerville_ Sorry: I thought the devs were there.08:30.03 
kens Bit early yet, unless sebras is around, but it's late for his time zone08:30.32 
FLEXO_ still no help for the scrolling direction problem?10:12.31 
kens You'll need one of the developers for that10:13.03 
  Though I'm not sure exactly what you mean by 'change the scrolling direction'10:13.33 
FLEXO_ if i open a PDF in the MuPDF viewer and scroll with the mouse wheel it works the opposite way (Page order) than let's say in the firefox PDF viewer10:32.42 
  thanks anyway for the answer10:32.58 
kens Well there are several problems there. MuPDF actually refers to the underlying library, the various viewers are implementations of code which is more or less demo code on top of that library.10:33.30 
  Each of the viewers differs in its capabilities, and you haven't said which one you are using. Or on which OS.10:33.57 
  If you don't like it, you can easily change it I would think.10:34.08 
FLEXO_ OS is win 10 and the viewer is the one from the homepage Version 1.12.010:39.25 
kens I'm reasonably certain the only way to change the direction of operation of the mouse scrollwheel is going to be by changing the code and recompiling it.10:39.57 
  Though it doesn't sound like an especially hard task10:40.11 
FLEXO_ :D so that's a bit tricky for me. I only develop PLC code. Thanks anyway10:41.08 
malc_ FLEXO_: any particular reason why you want to use vanilla mupdf viewer and not something based on the mupdf library, such as, say, SumatraPDF?10:44.55 
FLEXO_ I really like to have it as minimal as possible. But hey, i installed Sumatra. Thanks for the hint.10:48.13 
paulgardiner What does fz_open_null do? It looks like it might be a filter that gives access to a specified range of bytes from a stream, but then, if so, I don't get the name.13:25.35 
tor8 it's a 'null' filter in postscript terminology13:25.52 
  one that doesn't modify the bytes (just restricts to a subset of the 'parent' stream)13:26.05 
kens A bit bucket :-)13:26.06 
  Oh a read filter13:26.26 
paulgardiner Ah "null" in the sense that it doesn't process the bytes.13:26.39 
  Could be that null and concat are just what I need. I want to pass a stream to the pkcs7 library to tell it what bytes to hash. Possibly I can use null and concat to avoid having to also pass the byte ranges.13:28.33 
  ... but concat seems to take ownership of the substreams which isn't ideal.13:29.02 
  I could be completely misunderstanding what these so.13:29.17 
  do13:29.31 
tor8 yeah, the concat stream also adds a space character between the streams (it's meant to be used for concatenating the substreams in a PDF content stream)13:29.35 
paulgardiner Oh right. Not quite what I'm looking for.13:30.02 
tor8 paulgardiner: you could make a new filter which works similar to 'null' filter (or generalise it) to take a set of ranges rather than just one range13:30.17 
paulgardiner Yes, I was just looking to do that, when I saw there were similar things already there.13:30.42 
tor8 it should be simpler than the concat filter because you can base it on one source stream rather than needing to switch between streams13:31.14 
paulgardiner Yes true13:31.26 
tor8 and just seek past the bits you don't want13:31.36 
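
The idea of a generalised 'null' filter over a set of ranges can be sketched as below. This is only an illustration under stated assumptions: it reads from an in-memory buffer rather than an fz_stream, and `byte_range`, `range_reader`, `range_reader_init` and `range_reader_read` are hypothetical names, not MuPDF API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; not part of the MuPDF API. */
typedef struct { size_t offset, length; } byte_range;

typedef struct {
    const unsigned char *data; /* the single underlying source */
    size_t size;               /* total size of the source */
    const byte_range *ranges;  /* ranges to expose, in order */
    size_t nranges;
    size_t cur;                /* index of the range being read */
    size_t pos;                /* bytes already consumed from that range */
} range_reader;

static void range_reader_init(range_reader *r, const unsigned char *data,
        size_t size, const byte_range *ranges, size_t nranges)
{
    r->data = data;
    r->size = size;
    r->ranges = ranges;
    r->nranges = nranges;
    r->cur = 0;
    r->pos = 0;
}

/* Copy up to n bytes from the selected ranges into buf.
 * Returns the number of bytes copied; 0 means all ranges are exhausted. */
static size_t range_reader_read(range_reader *r, unsigned char *buf, size_t n)
{
    size_t done = 0;

    while (done < n && r->cur < r->nranges)
    {
        const byte_range *rg = &r->ranges[r->cur];
        size_t start = rg->offset + r->pos;
        size_t end = rg->offset + rg->length;
        size_t chunk;

        if (end > r->size)
            end = r->size; /* clamp ranges that run past the source */
        if (start >= end)
        {
            /* Range finished: skip over the unwanted bytes to the next one. */
            r->cur++;
            r->pos = 0;
            continue;
        }
        chunk = end - start;
        if (chunk > n - done)
            chunk = n - done;
        memcpy(buf + done, r->data + start, chunk);
        done += chunk;
        r->pos += chunk;
    }
    return done;
}
```

Because everything is served from one source, the reader never has to switch between substreams and never inserts separator bytes, which is what makes it simpler than the concat approach for hashing byte ranges.
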
paulgardiner concat was possibly not the ideal way to do it in any case13:32.01 
  Hmmm, fz_open_null takes ownership too, I think. That may be a problem13:36.14 
  Strange, because it doesn't have "drop" in its name.13:37.11 
tor8 paulgardiner: the filters have a quirky way of owning reference counts13:37.41 
  it's a bit problematic in places, but we haven't got around to cleaning it up yet13:37.57 
  this code predates reference counting13:38.01 
  if you want to fix it up to be sane, please do! :)13:38.25 
paulgardiner :-)13:38.31 
tor8 the complexity comes from how filter chains are built up in the pdf interpreter13:38.41 
paulgardiner I can imagine doing a lot of damage trying to clean that up13:39.10 
tor8 paulgardiner: if you make a new filter that doesn't take ownership, just call it "fz_new_skip_filter" rather than "fz_open_skip"13:41.00 
  and when I get around to cleaning up these murky areas, I'll change the names of the filter creation functions13:41.15 
paulgardiner I could make a wrapper filter that has no filtering effect at all but avoids dropping its argument. Bit of a hack13:41.20 
  That way I could take my stream that I don't want dropped, wrap it and pass it to the generalised null filter.13:42.04 
  In one case, I'm accessing doc->file. Would be bad if I closed that I think.13:43.28 
tor8 you could fz_open_null(fz_keep_stream(doc->file))13:44.37 
  as a temporary fix until the code is fixed up13:44.59 
paulgardiner Oh I see. I didn't think streams were reference counted. I'm confusing them with outputs possibly14:17.30 
tor8 paulgardiner: it *used* to be they weren't reference counted (hence why fz_open_null takes ownership)14:22.16 
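
The keep-before-passing pattern suggested above can be sketched with a toy reference-counted stream. The types and functions here (`stream`, `filter`, `stream_keep`, `filter_open`, and so on) are hypothetical stand-ins that only mimic the keep/drop naming convention; they are not the MuPDF implementation.

```c
#include <stdlib.h>

/* Hypothetical reference-counted stream; illustration only. */
typedef struct { int refs; } stream;

static stream *stream_new(void)
{
    stream *s = malloc(sizeof *s);
    if (s)
        s->refs = 1; /* the creator holds the first reference */
    return s;
}

static stream *stream_keep(stream *s)
{
    if (s)
        s->refs++;
    return s;
}

/* Returns 1 if this call released the last reference and freed the stream. */
static int stream_drop(stream *s)
{
    if (s && --s->refs == 0)
    {
        free(s);
        return 1;
    }
    return 0;
}

typedef struct { stream *chain; } filter;

/* Takes ownership of the reference passed in: the caller must not drop
 * that reference again (the behaviour described for fz_open_null). */
static filter *filter_open(stream *chain)
{
    filter *f = malloc(sizeof *f);
    if (!f)
    {
        stream_drop(chain); /* still consume the reference on failure */
        return NULL;
    }
    f->chain = chain;
    return f;
}

static void filter_close(filter *f)
{
    if (f)
    {
        stream_drop(f->chain);
        free(f);
    }
}
```

A caller that wants to go on using its stream passes an extra reference, as in `filter_open(stream_keep(file))`; closing the filter then releases only that extra reference and `file` stays valid.
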
sebras titanous: https://github.com/google/oss-fuzz/blob/master/projects/mupdf/pdf_fuzzer.cc#L28 what is setting ctm here?15:10.15 
  titanous: seems like you might as well use fz_identity..?15:10.31 
  paulgardiner: ugh, why does the cluster claim that the fread you put into pdf-write as part of support for levels of incremental xref sections cause a _new_ warning when I run my code? :-/15:58.03 
  paulgardiner: that's not fair.15:58.09 
kens Happens with GS all the time15:58.25 
  The code for detecting if a warning is new is not 100% reliable15:58.48 
  If you look at my last commit it raises new warnings on jpegxr, which I didn't touch.....15:59.10 
sebras kens: nope, apparently not. if I had edited that file so the line had changed or something, then I'd understand it.15:59.13 
titanous sebras: I think that was actually just copied from some example, if that's not the right way to do it, I can fix, just let me know what it should be16:07.36 
sebras if you change line 28 to be fz_matrix ctm = fz_identity; I think we should be safe.16:08.52 
  titanous: if ctm is not initialized with any values then they might spread to other places.16:09.21 
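
A small sketch of why initialising to the identity is the safe default. The `matrix` type and `transform_point` helper are hypothetical stand-ins written for this example; MuPDF's own type is fz_matrix and its identity constant is fz_identity.

```c
/* Hypothetical stand-in for a 2D affine matrix (six coefficients). */
typedef struct { float a, b, c, d, e, f; } matrix;

/* Identity: scale 1, no rotation or shear, no translation. */
static const matrix matrix_identity = { 1, 0, 0, 1, 0, 0 };

/* Apply the affine transform to a point in place. */
static void transform_point(const matrix *m, float *x, float *y)
{
    float tx = *x * m->a + *y * m->c + m->e;
    float ty = *x * m->b + *y * m->d + m->f;
    *x = tx;
    *y = ty;
}
```

Declaring `matrix ctm = matrix_identity;` leaves every point unchanged, whereas an uninitialised local would feed indeterminate values into every coordinate computed from it.
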
titanous k, will do16:09.52 
sebras tor8: Robin_Watts: do you guys mind taking a look at sebras/master?16:10.54 
  it is clustering as we write.16:10.59 
Robin_Watts sebras: Looking.16:13.58 
  sebras: All 4 look plausible to me.16:16.23 
sebras Robin_Watts: is that LGTM?16:17.09 
  Robin_Watts: or should I ask tor8 too?16:17.16 
Robin_Watts lgtm16:22.42 
sebras fredross-perry (for the logs): call_SeekableOutputStream_tell(), call_SeekableInputStream_seek(), call_SeekableOutputStream_seek(), call_SeekableInputStream_close() has a fz_throw(...., "env is NULL...") call where the indentation is not correct.16:23.31 
titanous sebras: the fuzzer is fixed, do you think that was the cause of any of the crashes?16:24.51 
sebras titanous: I'm not sure. I have yet to be able to reproduce a few of them.16:25.08 
  titanous: most of them I can reproduce however.16:25.15 
titanous sebras: can you link me to the testcases on oss-fuzz that you can't repro? I can try16:25.45 
sebras titanous: 5502 is one of those.16:29.40 
  titanous: that would be the minimized testcase from https://oss-fuzz.com/v2/testcase-detail/6194612382203904?noredirect=116:30.06 
  can't see it using gcc ASAN, nor valgrind locally.16:30.33 
titanous try clang-6.0, which is the version used by oss-fuzz16:31.13 
sebras fredross-perry (for the logs): call_SeekableOutputStream_tell(), call_SeekableInputStream_seek(), call_SeekableOutputStream_seek(), call_SeekableInputStream_close() has a fz_throw(...., "env is NULL...") call where the indentation is not correct.16:31.16 
titanous it repros using the reproduce tool16:31.26 
sebras titanous: ok, good to know.16:31.33 
  titanous: I haven't got clang-6.0 available to install at the moment. and looking at the other bugs is probably a better idea.16:32.04 
Robin_Watts sebras, tor8: So, we have a customer who is using MuPDF to convert from PDF to SVG.16:33.01 
  And he's complaining that we don't output 'layer' information in the produced SVG file.16:33.18 
  The layer information would come from the 'OC' stuff in PDFs.16:33.46 
titanous sebras: https://bugs.chromium.org/p/oss-fuzz/issues/list?can=2&q=label%3AClusterFuzz-Top-Crash+project-mupdf that's the list of bugs slowing down the fuzzers, but I'd say the security bugs should be the first priority16:34.01 
Robin_Watts To get that stuff through, we'd need to add a new device method, I think.16:34.01 
fredross-perry sebras - in those cases the "if" is indented with one tab, and the fz_throw with 2. ??16:34.35 
sebras titanous: that link didn't work for me.16:34.46 
titanous sebras: make sure the right account is selected in the top right dropdown16:35.15 
sebras titanous: got it.16:36.42 
  fredross-perry: ah, my bad! I hadn't fetched!16:39.52 
fredross-perry ok!16:40.05 
tor8 Robin_Watts: fz_begin/end_layer or group or something?16:42.19 
Robin_Watts tor8: That's what I'm thinking.16:42.34 
tor8 Robin_Watts: sounds like a reasonable extension, and would map fairly naturally to other language's constructs16:43.03 
  like svg <g> tags etc maybe16:43.09 
Robin_Watts tor8: It's exactly svg <g> tags I need to make :)16:43.20 
  Can we assume strict nesting?16:43.43 
  If not, the interpreter needs to store what layer it's in.16:44.13 
  (I'm going to say layer, rather than group, cos groups already have meaning)16:44.27 
tor8 yes, strict nesting is the only sane approach, IMO16:44.47 
Robin_Watts In PDF we have /OC /Foo BDC ... EMC, so strict nesting is implied.16:44.59 
sebras fredross-perry: would we need to declare all members and constants in the interfaces public? I've read that abstract, default and static methods are implicitly public, but I'm not sure if our members fall into either class.16:45.26 
tor8 if there's a mismatch between q and Q in PDF with clip groups etc the nesting may be interleaved with transparency groups etc16:45.33 
  may be useful to trap and end the layer early when that nesting is broken in PDF files16:45.47 
fredross-perry I've read that you don't need to do that for if members.16:45.52 
tor8 it shouldn't happen, but you never know16:46.01 
sebras fredross-perry: ok.16:46.01 
Robin_Watts tor8: The problem is what to pass in fz_begin_layer.16:46.37 
  tag properties BDC16:46.50 
sebras fredross-perry: there's still the issue in Document_openWithStream() where stm leaks if fz_open_document_with_stream() calls fz_throw().16:46.59 
  fredross-perry: you need to use fz_try() similar to PDFDocument_saveWithStream().16:47.17 
Robin_Watts tag is a name, so no problem. properties will be hairier.16:47.54 
fredross-perry sebras - dropping stm is handled under fail: right now. But I can rearrange that.16:49.26 
Robin_Watts In fact, we'd only pass on for when tag = /OC.16:49.27 
sebras fredross-perry: ah I see now, in interfaces abstract methods need not explicitly be declared abstract, just not supplying an implementation is enough.16:49.56 
  fredross-perry: yes, but doing so is not enough.16:50.10 
  fredross-perry: if fz_open_document_with_stream() throws then we will simply longjmp() out from the function.16:50.31 
  or at least try to.16:50.38 
Robin_Watts so properties can either be an OCG or an OCM dictionary.16:50.38 
fredross-perry sebras - oh I see.16:50.46 
sebras fredross-perry: this is what happens inside fz_throw().16:50.49 
fredross-perry same with fz_new_stream, I presume?16:51.01 
sebras fredross-perry: yes.16:51.06 
fredross-perry ok16:51.10 
sebras fredross-perry: fz_drop_*() and fz_free() may never call fz_throw() though.16:51.42 
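
The leak described above can be reproduced with plain setjmp/longjmp. This is a sketch under stated assumptions: `current_try`, `throw_error`, `open_document`, `open_leaky` and `open_safe` are hypothetical names that imitate the mechanism; in MuPDF itself the protection is written with the fz_try/fz_always/fz_catch macros.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an exception stack and a stream type. */
static jmp_buf *current_try; /* innermost active "try" */
static int live_streams;     /* streams created but not yet dropped */

typedef struct { int unused; } stream;

static stream *new_stream(void)
{
    stream *s = malloc(sizeof *s);
    if (s)
        live_streams++;
    return s;
}

/* Dropping never throws, so it is safe to call during cleanup. */
static void drop_stream(stream *s)
{
    if (s)
    {
        free(s);
        live_streams--;
    }
}

static void throw_error(void)
{
    longjmp(*current_try, 1); /* unwinds straight to the active try */
}

/* Stand-in for a call that may fail by throwing. */
static void open_document(stream *s, int fail)
{
    (void)s;
    if (fail)
        throw_error();
}

/* Leaky: if open_document() throws, control jumps past drop_stream(). */
static void open_leaky(int fail)
{
    stream *s = new_stream();
    open_document(s, fail);
    drop_stream(s);
}

/* Safe: catch locally so the stream is dropped on both paths. */
static int open_safe(int fail)
{
    jmp_buf here;
    jmp_buf *outer = current_try;
    stream *s = new_stream();
    int failed = 0;

    current_try = &here;
    if (setjmp(here) == 0)
        open_document(s, fail);
    else
        failed = 1;
    current_try = outer; /* restore the caller's try */
    drop_stream(s);      /* runs whether or not the call threw */
    return failed ? -1 : 0;
}
```

A cleanup label such as `fail:` at the end of the function does not help in the leaky version, because the jump never passes through it.
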
fredross-perry ok16:51.48 
sebras fredross-perry: that's why you have to rearrange jni_attach_thread() a while back.16:52.03 
  s/have/had/16:52.09 
fredross-perry right16:52.57 
  sebras - ok look again, thanks.17:05.05 
sebras fredross-perry: looks safer, yes.17:09.16 
  fredross-perry: am I right in thinking that we now don't need the detach argument to jni_attach_thread() and jni_detach_thread()?17:09.35 
  fredross-perry: it seems to me that we'll return NULL on every error and hence we never reach jni_detach_thread() unless detach == 1..?17:09.57 
tor8 Robin_Watts: yeah. I'm not sure. a name/tag string would be a start at least.17:11.30 
sebras fredross-perry: oh, and you still add a line in Document_finalize() which adds the now unnecessary idoc variable.17:11.34 
tor8 maybe a list of key/value string attributes as well.17:11.43 
fredross-perry i'll remove unnecessary idoc variable17:12.04 
Robin_Watts tor8: A name string is enough, I think.17:12.12 
sebras fredross-perry: what about the detach thingy..?17:12.25 
Robin_Watts (Possibly, we ought to send a list of strings, one for each layer that's in force)17:12.49 
  but for now, just 1 will do.17:12.54 
fredross-perry there's a case: else if (state == JNI_OK) where detach might be 0.17:13.44 
  iow we were already attached.17:14.09 
sebras fredross-perry: ah, yes, because in that case env is not NULL. I see.17:15.07 
fredross-perry ok.17:15.35 
tor8 Robin_Watts: yes.17:16.18 
fredross-perry sebras - pushed again (no extra idoc)17:19.05 
sebras tor8: fred/master looks reasonable to me. did I miss anything?17:23.58 
fredross-perry don't think so.17:24.13 
  thanks for all the fish.17:24.26 
sebras fredross-perry: :)17:24.51 
fredross-perry should I push this then?17:25.36 
sebras fredross-perry: I'd want tor8 to LGTM it too.17:26.12 
fredross-perry ok, let me know.17:26.26 
sebras fredross-perry: will do.17:26.30 
  fredross-perry: if he chimes in and you're not here I might push your patch to master and let you know.17:26.48 
fredross-perry that's fine too.17:27.01 
sebras Robin_Watts: one more commit fixing 698885 on sebras/master, the fread() one is gone now that pauls thingies were merged.18:15.33 
  Robin_Watts: still around?18:48.57 
  I have a question about copy_node_types(). it asserts that low == high. why?18:49.08 
Robin_Watts sebras: I am.18:49.09 
sebras Robin_Watts: because I have this node in a fuzzed file: R177:^178<EMPTY>175(32,40,51,1) and of course it asserts.18:49.55 
Robin_Watts If node->many, then we're encoding a type where low must == high.18:50.09 
  Honestly, this is out of cache, you'll need to bear with me.18:50.20 
sebras my thinking is that low and high defines a range.18:50.58 
Robin_Watts OK, so the only way many gets set is if add_range gets called with 1 as its last argument.18:51.36 
sebras and the output for .low would be .out and the output for .low + 1 would be .out + 1, etc.18:51.48 
Robin_Watts which only ever happens from add_mrange18:52.35 
  And add_mrange passes low as both low and high.18:53.02 
  So I suspect we're into 'indexing off the end of the table' territory here.18:53.41 
sebras I put a breakpoint in add_mrange()18:53.44 
  and it has low == 49 and len == 2 on input, which seems sane.18:54.01 
Robin_Watts have you built with CHECK_SPLAY defined?18:54.18 
sebras and as you say, the call to add_range() has both low and high == 49 and many == 1.18:54.18 
  I have, and it doesn't trigger.18:54.25 
Robin_Watts have you built with DUMP_SPLAY defined? :)18:54.34 
sebras it only checks that the left and right children's parent is correct.18:54.35 
  it doesn't check that the ranges are correct.18:54.45 
  I have built with DUMP_SPLAY, yes.18:54.55 
  I had to undef for cmapdump.c though... ;)18:55.04 
Robin_Watts And does the splay tree look sensible?18:55.10 
sebras I can't tell really, I'm still trying to understand both the datastructure and the way you dump it.18:55.33 
  Robin_Watts: do you want me to pastebin it?18:55.46 
Robin_Watts sebras: Splay trees are "just" binary trees.18:55.59 
  each node has a low/high/out. And left/right/parent pointers.18:56.40 
  node->left->low < node->low < node->right->low18:57.29 
sebras as I understand "R177:^178<EMPTY>175(32,40,51,1)"18:57.34 
  this means that node 177 has node 178 as its parent, the left child is empty and the right child is node 175 and THIS node covers the range 0x32-0x40 with out starting at 0x51, but it is a many node.18:58.15 
  would the following also be true? node->left->high < node->low < node->right->low18:59.07 
Robin_Watts sounds plausible.18:59.15 
  yes, I believe that's true.18:59.32 
sebras Robin_Watts: perhaps we accidentally mess up the tree some time after having inserted the node.18:59.47 
Robin_Watts node->left->high < node->low and node->high < node->right->low18:59.59 
  cos if node->left->high == node->low we merge the two (assuming they are the same type)19:00.24 
sebras what does type mean in this case?19:01.09 
  Robin_Watts: the types are many, < 0xffff, and others?19:01.41 
Robin_Watts sebras: looking.19:01.53 
titanous sebras: as an aside, https://apt.llvm.org makes it very easy to get clang-6.019:02.31 
Robin_Watts sebras: So, we get values thrown at us by the CMAP file. We build the tree as a splay tree from those details.19:03.05 
  Then we break the details out into 3 arrays.19:03.17 
sebras titanous: good to know, but I'll fix the bugs I can reproduce easily first. :)19:03.48 
titanous cool19:04.01 
Robin_Watts mranges are "1 to many", ranges are "1 to 1" (< 0xffff), xranges are "1 to 1" (>= 0x10000)19:04.15 
sebras Robin_Watts: ok. I have a slight inclination of where the issue stems from: https://pastebin.com/raw/N796ueGr19:07.57 
  i.e. don't mess with my flate bytes.19:08.29 
  but we ought to be able to parse these things without assert()s though.19:08.50 
Robin_Watts indeed.19:10.49 
sebras Robin_Watts: ok, when I added assert(!node->many || (node->many && node->low == node->high)); into do_check() tree, I get an assert when we try to call add_mrange().19:19.57 
Robin_Watts assert(!node->many || node->low ==node->high); would have done, wouldn't it ?19:21.32 
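
The invariants discussed above can be collected into one checking routine. This is a minimal sketch, not the actual do_check() code: the `node` layout is a hypothetical one built from the fields named in the conversation, and like the relations quoted above it only compares each node with its immediate children. It also shows why the shorter assert is sufficient: `!many || (many && low == high)` is logically the same as `!many || low == high`.

```c
#include <stddef.h>

/* Hypothetical node layout modelled on the fields named in the chat. */
typedef struct node {
    unsigned int low, high, out;
    int many; /* non-zero for a "1 to many" mapping */
    struct node *left, *right;
} node;

/* Return 1 if the subtree rooted at n satisfies the invariants, else 0. */
static int check_node(const node *n)
{
    if (n == NULL)
        return 1;
    if (n->low > n->high)
        return 0; /* a range must not be inverted */
    if (n->many && n->low != n->high)
        return 0; /* same condition as: !many || low == high */
    if (n->left && !(n->left->high < n->low))
        return 0; /* left child must end before this node starts */
    if (n->right && !(n->high < n->right->low))
        return 0; /* right child must start after this node ends */
    return check_node(n->left) && check_node(n->right);
}
```

Returning a flag instead of asserting lets a caller reject a malformed CMap rather than abort, which matches the wish to parse such files without assert()s.
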
  http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=77aa044f3378dc2fb31cca285a5ee3270857ec0219:22.16 
  sebras, tor8: That's my initial begin_layer/end_layer commit.19:22.32 
sebras Robin_Watts: can we really end up calling svg_dev_end_layer() an unbalanced number of times?19:26.18 
  yes, your assert() would be enough, true. I was being silly though and wrote assert(node->many && node->low == node->high) first.19:27.11 
  then I just extended it to paper over my thinko.19:27.22 
  Robin_Watts: and if you care about them being unbalanced in the SVG device I think we should do so in the trace device too, no..?19:28.16 
Robin_Watts sebras: The trace device is supposed to show us the raw device calls.19:36.24 
  And, yes, we could be unbalanced if the PDF stream contains crap.19:36.51 
sebras Robin_Watts: ok. and now having read more there doesn't seem to be any code fixing any unbalanced calls beforehand.19:37.06 
Robin_Watts sebras: Indeed. We assume it to be balanced - isn't really much else we can do.19:37.29 
  In the SVG thing, I just do some really minimal sanity checking.19:38.05 
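
One way such minimal sanity checking could look is a depth counter that refuses to close a group that was never opened. This is a guess at the general shape, not the code from the commit linked above: `svg_writer`, `begin_layer` and `end_layer` are hypothetical names, the layer name is assumed to be XML-safe already, and nothing here attempts to make id values unique.

```c
#include <stdio.h>

/* Hypothetical writer state; illustration only. */
typedef struct {
    FILE *out;
    int layers; /* number of currently open <g> elements */
} svg_writer;

static void begin_layer(svg_writer *w, const char *name)
{
    w->layers++;
    /* Assumption: name needs no XML escaping. */
    fprintf(w->out, "<g id=\"%s\">\n", name);
}

/* Returns 1 if a group was closed, 0 if the call was unbalanced and ignored. */
static int end_layer(svg_writer *w)
{
    if (w->layers == 0)
        return 0; /* more ends than begins: emit nothing */
    w->layers--;
    fprintf(w->out, "</g>\n");
    return 1;
}
```

Ignoring a stray end keeps the output well-formed XML even when the PDF content stream is unbalanced; any groups still open when the page ends would need closing in the same way.
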
sebras Robin_Watts: the id attribute to a g element must be unique though. so if we end up with multiple calls to BMC or BDC without any tag, we're in deep trouble.19:38.18 
  Robin_Watts: does PDF require the tags to be unique throughout the page?19:38.37 
Robin_Watts sebras: No. You'd expect to see multiple things in each.19:38.54 
sebras Robin_Watts: https://www.w3.org/TR/SVG/struct.html#IDAttribute am I understanding this incorrectly?19:39.25 
  as I read it there is a requirement of uniqueness, but not only can we not guarantee it, we're also expecting that the tags _will_ be the same.19:40.41 
Robin_Watts https://www.w3.org/TR/SVG11/struct.html#IDAttribute19:41.30 
sebras yes, I sent that link a while ago. :)19:42.00 
Robin_Watts older version of the link, but yes :)19:42.18 
sebras meh.19:42.23 
Robin_Watts yeah, unique is a problem.19:42.32 
sebras and the grouping too somehow.19:42.46 
Robin_Watts let me get back to Vladimir and see what he says. He can probably make more examples.19:42.53 
  Thanks.19:42.59 
sebras I'm thinking that SVG expects to have everything that belongs to one group in one g element, but we don't really do that if there are multiple calls to BMC/BDC which have identical tags.19:43.22 
  but these calls are disjoint and separate by other content.19:43.55 
  separated.19:44.00 
  structurally the code looks nice though. yey for that! :)19:44.18 
  I need to eat and then I'm going back to the many ranges, see you in a bit.19:45.01 
titanous sebras: I'm requesting CVEs for all of these bugs, do you mind triaging https://oss-fuzz.com/v2/testcase-detail/4831843418374144 to determine if the bug is in OpenJPEG or mupdf?21:32.15 
sebras titanous: I tried to reproduce using the opj_decompress tool that openjpeg provides but it didn't trigger. I don't know why yet, it ought to.21:47.18 
  titanous: to me the bug looks like it is in openjpeg. I'd need further time to prove it though.21:47.38 
  titanous: it seems strange to come running to openjpeg people with a bug that might be in their code but can't be reproduced cleanly with their tools.21:48.25 
titanous makes sense, I'll just file for a CVE for mupdf for it and then we can let OpenJPEG know when you have more time to sort it out21:49.33 
tor8 Robin_Watts: having the layer name be a const pointer in the display list worries me22:02.41 
  the pdf_obj string could go away while the display list is still alive22:02.53 
  I'd suggest doing a fz_strdup and fz_free on the layer name22:03.06 
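
The copy-instead-of-borrow suggestion can be sketched with the standard allocator. `list_entry`, `entry_set_name` and `entry_clear` are hypothetical names for this example; in MuPDF the copy and release would be done with fz_strdup and fz_free as suggested above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical display-list entry that owns its layer name. */
typedef struct { char *layer_name; } list_entry;

/* Store a private copy of name, so the entry stays valid even if the
 * caller's string is changed or freed later. Returns 0 on success. */
static int entry_set_name(list_entry *e, const char *name)
{
    size_t n = strlen(name) + 1;
    char *copy = malloc(n);
    if (!copy)
        return -1;
    memcpy(copy, name, n);
    e->layer_name = copy;
    return 0;
}

/* Release the copy when the entry is destroyed. */
static void entry_clear(list_entry *e)
{
    free(e->layer_name);
    e->layer_name = NULL;
}
```

The cost is one small allocation per layer entry, in exchange for the list no longer depending on the lifetime of the object the name came from.
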
  and sebras already brought up the problem of svg/xml 'id' attributes needing to be unique22:03.36 
  other than that, looks fine22:04.05 
sebras tor8: question!22:04.45 
  tor8: quick look at sebras/master?22:04.57 
  :)22:05.12 
tor8 sebras: all except "Do not throw away byte when lexing tokens without strings." LGTM22:07.47 
  I need to look at that one more closely, remind me to do that tomorrow22:08.07 
sebras why?22:08.16 
  we do the same thing in pdf_lex().22:08.23 
tor8 because I don't understand it by just reading the diff22:08.27 
sebras ok.22:08.33 
  context. tomorrow! :)22:08.37 
tor8 it's probably okay, but I need to fire up my editor and look at the context and I'm just about to turn off my computer for the night22:08.56 
sebras ok, I'll push the rest and let you look at this one tomorrow.22:10.20 
tor8 sebras: cool. ttytm.22:10.29 