Log of #mupdf at irc.freenode.net.

2018/04/23
Robin_Watts tor8: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=67a7449fc1f186f318942b9c6b8d66d4458b7d87 09:17.27 
tor8 Robin_Watts: LGTM.09:18.01 
Robin_Watts Ta.09:18.07 
tor8 Robin_Watts: there's a long list of commits on tor/master that need reviewing/discussing09:18.16 
Robin_Watts tor8: OK, just a mo...09:18.38 
cosimone Hi everyone, is this channel appropriate to ask minor questions and doubts about the android application? (i'm referring to the one you can find on fdroid)09:45.03 
moolc cosimone: i'm not a mupdf developer, but those guys are here and will most likely not spank you for asking droid questions... sebras/tor are the ones mainly responsible for droid stuff09:58.48 
  Robin_Watts: i already asked that in the past - no resolution.. Is it somehow possible to clone mupdf and submodules (important part) shallowly?10:07.24 
Robin_Watts tor8: Sorry, back.10:14.40 
  moolc: I don't follow.10:15.25 
  If you "git clone" then you get just the main repo, no submodules.10:15.41 
  If you git clone --recursive then you get the thirdparty submodules too.10:15.58 
  None of the submodules have subsubmodules as far as I know.10:16.14 
tor8 moolc: git submodules and shallow clones don't coexist well last I checked (about a year ago)10:16.52 
  moolc: you can sort of work around it by cloning mupdf (deep or shallow) without --recursive10:17.37 
  and then initialize the submodules manually, shallow or deep as you wish, but caveat emptor, etc, etc.10:17.54 
Robin_Watts ok, shallow clones is clearly something I don't understand.10:18.31 
  tor8: So, your commits...10:18.41 
  The PDF_NAME one, I am warming to.10:19.38 
moolc tor8: oh :( well let's keep our fingers crossed that not many people will use http://repo.or.cz/llpp.git/blob/b828b5a0553f3810c61628fcddb007066cdde389:/misc/bootstrap.sh 10:20.00 
Robin_Watts In the current code, if I do PDF_NAME_Bogus, it'll tell me there is no such name.10:20.11 
tor8 Robin_Watts: it will now too (but maybe slightly less clearly)10:21.03 
  error: use of undeclared identifier10:21.08 
  'PDF_ENUM_NAME_fooFont'10:21.08 
Robin_Watts In the new code if I do PDF_NAME(Bogus), it'll tell me there is no PDF_ENUM_NAME_Bogus.10:21.17 
  I can live with that.10:21.21 
  One minor idea...10:21.31 
  The vast majority of the PDF_MAKE_NAME(A,B) things have A == "B"10:21.50 
  is it worth having #define PDF_MAKE_NAME(A) and PDF_MAKE_AWKWARD_NAME(A,B) ?10:22.22 
  The second commit, the pdf_name_eq one...10:23.46 
tor8 Robin_Watts: I did that at first, but it makes it harder to keep the list alphabetically sorted10:24.27 
  now I just pipe it through 'sort' and all is safe and sound10:24.44 
Robin_Watts Good answer.10:24.52 
tor8 with the AWKWARD macro, that won't work...10:24.56 
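The two-argument list discussed above can be sketched as an X-macro. This is a minimal illustration only, not the actual mupdf header: every `DEMO_` identifier and the four sample names are assumptions made up for the example. Each entry carries both the string and the identifier, so every line has the same shape and a plain bytewise `sort` keeps the list valid.

```c
#include <string.h>

/* Hypothetical, shortened list. In a real tree each entry would sit on its
 * own line in a separate file so that `sort` can reorder it safely. */
#define DEMO_NAME_LIST \
	DEMO_MAKE_NAME("BaseFont", BaseFont) \
	DEMO_MAKE_NAME("Font", Font) \
	DEMO_MAKE_NAME("Type", Type) \
	DEMO_MAKE_NAME("XObject", XObject)

/* First expansion: one enum constant per entry. */
enum {
#define DEMO_MAKE_NAME(STRING, NAME) DEMO_ENUM_NAME_##NAME,
	DEMO_NAME_LIST
#undef DEMO_MAKE_NAME
	DEMO_ENUM_LIMIT
};

/* Second expansion: the matching string table, in the same order. */
static const char *const demo_name_strings[] = {
#define DEMO_MAKE_NAME(STRING, NAME) STRING,
	DEMO_NAME_LIST
#undef DEMO_MAKE_NAME
};

/* DEMO_NAME(Bogus) fails to compile: DEMO_ENUM_NAME_Bogus is undeclared. */
#define DEMO_NAME(NAME) DEMO_ENUM_NAME_##NAME

/* Map an enum value back to its string, or NULL if out of range. */
static const char *demo_name_string(int n)
{
	return (n >= 0 && n < DEMO_ENUM_LIMIT) ? demo_name_strings[n] : NULL;
}
```

With this shape, a misspelt name surfaces as an "undeclared identifier" compile error, which matches the behaviour described in the conversation.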
Robin_Watts Currently pdf_name_eq(A,B) checks for A and B being names, and A == B, and if that fails, it resolves A and resolves B, and then compares the two.10:25.18 
tor8 I figured I'd take the hit to awkwardness and not run into the same problem that we had once when sebras sorted the list, but had his LOCALE set to not-C and got some non-bytewise sort order...10:25.40 
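The sort-order problem mentioned here matters because a binary search over the list compares bytes, while a locale-aware sort may order entries differently. A minimal sketch of a check that could catch this, assuming a hypothetical helper name (`demo_is_bytewise_sorted` is not a mupdf function):

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the names are in strictly increasing bytewise order, which is
 * what strcmp (and hence a strcmp-based binary search) expects. */
static int demo_is_bytewise_sorted(const char *const *names, size_t n)
{
	size_t i;
	for (i = 1; i < n; i++)
		if (strcmp(names[i - 1], names[i]) >= 0)
			return 0;
	return 1;
}
```

In bytewise order every uppercase ASCII letter sorts before every lowercase one, so "Type" comes before "a"; a non-C locale may well put "a" first, which is the kind of mismatch described above.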
Robin_Watts So stuff like pdf_name_eq(PDF_NAME(Foo), pdf_new_name(ctx, "Foo")) works now, but won't work in future.10:26.10 
tor8 Robin_Watts: right. you mean if either of A or B is an indirect object pointing to a numbered pdf object that is a name10:26.27 
Robin_Watts and pdf_name_eq(PDF_NAME(Foo), pdf_new_reference(ctx, PDF_NAME(Foo)));10:26.33 
tor8 pdf_new_name("Foo") will always return the constant enum thing for PDF_NAME_Foo10:26.57 
Robin_Watts tor8: It will ?10:27.10 
tor8 yes. we bsearch the PDF_NAME_LIST looking for a hit, before we alloc a new pdf_obj10:27.27 
Robin_Watts Ah, ok.10:27.38 
  So my objection is just about the references.10:27.48 
tor8 and I'm not sure I care for the (oh god, my brain hurts, please don't do that) case of indirect references10:28.13 
  but I guess I should run that through the cluster just to make sure we don't actually have that problem10:28.32 
Robin_Watts If we're concerned about the overhead, then we should use a static inline for doing pdf_name_eq (for the simple case, falling back to a non-inline for the ref case)10:28.56 
  also, pdf_name_eq(PDF_TRUE, PDF_TRUE) should presumably not pass ?10:29.20 
  Hmm. The existing code will pass that.10:29.31 
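The "static inline for the simple case, non-inline for the ref case" idea can be sketched as follows. This is an illustration under simplified assumptions, not mupdf code: `demo_obj`, its `kind` tags and both function names are hypothetical, and references are modelled as a plain pointer to the target object.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical object model: 'n' = name, 'r' = reference, anything else = other. */
typedef struct demo_obj {
	char kind;
	const char *text;               /* name text, for kind 'n' */
	const struct demo_obj *target;  /* referenced object, for kind 'r' */
} demo_obj;

int demo_name_eq_slow(const demo_obj *a, const demo_obj *b);

/* Fast path: the same name object on both sides. Everything else goes to
 * the out-of-line slow path. */
static inline int demo_name_eq(const demo_obj *a, const demo_obj *b)
{
	if (a && a == b && a->kind == 'n')
		return 1;
	return demo_name_eq_slow(a, b);
}

/* Follow references, giving up after a few hops to avoid cycles. */
static const demo_obj *demo_resolve(const demo_obj *o)
{
	int hops = 0;
	while (o && o->kind == 'r' && hops++ < 10)
		o = o->target;
	return o;
}

/* Slow path: resolve both sides, then insist that both are names. */
int demo_name_eq_slow(const demo_obj *a, const demo_obj *b)
{
	a = demo_resolve(a);
	b = demo_resolve(b);
	if (!a || !b || a->kind != 'n' || b->kind != 'n')
		return 0;
	return a == b || strcmp(a->text, b->text) == 0;
}
```

In this sketch, comparing a non-name object with itself returns 0, which is the behaviour asked for in the TRUE/TRUE question above.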
tor8 Robin_Watts: the next commit after the PDF_NAME() one removes the pdf_name_eq function completely10:30.24 
  I thought you were talking about that one10:30.29 
Robin_Watts I am.10:30.33 
tor8 okay. just so we're on the same page.10:30.45 
Robin_Watts I'm saying that I don't like the direct comparison.10:30.54 
  pdf_name_eq(A,B) is at pains to only say true, if A and B are both names.10:31.17 
  not generic objects.10:31.22 
tor8 after that commit, pdf_name_eq is called in one place only10:32.58 
  all other comparisons were with a constant10:33.06 
Robin_Watts tor8: ok... and that place is?10:33.26 
tor8 pdf_add_portfolio_schema10:33.47 
Robin_Watts Not dict_get ?10:34.03 
tor8 so is your objection that we no longer return true for pdf_name_eq when comparing /Foo with 10 0 R where it points to 10 0 obj /Foo endobj10:34.13 
Robin_Watts That is one objection.10:34.28 
tor8 pdf_dict_get does not use pdf_name_eq10:35.54 
  I have to admit I didn't think about the /Foo == 10 0 R case, because I don't think I've ever seen that sort of structure ever used10:36.43 
Robin_Watts I'm struggling to build this version :(10:37.54 
  Ok, so the name change commit screws the windows build royally.10:42.46 
  does generate.bat delete the .h file maybe?10:43.26 
  No, the solution file does.10:43.59 
  pdf-font.c calls pdf_name_eq twice.10:45.11 
  but probably doesn't need to.10:45.50 
  OK, so broadly I'm happy with that, *if* we are happy that we don't need to redirect through references. Which seems like a big if.10:46.31 
  I worry that we'll strip this out now, only to hit a problem file in the future, and have to shove it all back in again.10:46.54 
  The next commit... the reordering of NULL/TRUE/FALSE.10:47.56 
  I seem to remember pondering this at a time.10:48.04 
  The attraction of having PDF_NULL being NULL etc.10:48.18 
  and also the idea of having PDF_TRUE == 1 10:48.26 
tor8 I looked at the pdf reference, and there it says that one should never make the distinction between 'null' and a missing value10:48.28 
Robin_Watts but we can't have PDF_FALSE == 0 too.10:48.52 
tor8 so having the distinction between PDF_NULL and NULL doesn't seem useful10:48.53 
  Robin_Watts: unfortunately no10:49.00 
  but having them be pointer values 0, 1, and 2 is easier when debugging, at least10:49.16 
Robin_Watts OK. So the change looks wrong to me.10:49.37 
  but then I'm confused by it looking wrong anyway.10:50.12 
  In pdf-object.h10:50.21 
  we have PDF_NAME_LIST10:50.28 
  The old code used to allow slots at the start for NULL, TRUE and FALSE.10:50.44 
  and then have all the others.10:50.52 
  But the old code used to actually list the objects in the other order.10:51.16 
tor8 the old code had a slot at the start for a DUMMY value that was never used (to reserve the NULL pointer)10:51.24 
Robin_Watts DUMMY, then names, then NULL, TRUE, FALSE10:51.30 
tor8 and then the PDF_NULL, TRUE, FALSE after the names10:51.35 
  this code puts three dummy slots at the start for NULL, TRUE, FALSE then the names starting at index 3 10:51.52 
Robin_Watts D'Oh. I can't read diffs.10:52.11 
tor8 and in pdf_new_name I start the bsearch with left=3 (skip the dummy slots)10:52.51 
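A minimal sketch of a lookup that skips the reserved slots, assuming a hypothetical four-name table (the `demo_` identifiers are made up for the example, and the real pdf_new_name goes on to allocate an object when nothing matches):

```c
#include <string.h>

/* Slots 0..2 are reserved for null/true/false; names start at index 3. */
enum { DEMO_NULL, DEMO_TRUE, DEMO_FALSE, DEMO_FIRST_NAME };

static const char *const demo_names[] = {
	"null", "true", "false",             /* reserved, never matched as names */
	"BaseFont", "Font", "Type", "XObject" /* bytewise sorted */
};
#define DEMO_LIMIT ((int)(sizeof demo_names / sizeof demo_names[0]))

/* Binary search over the name slots only. Returns the index, or -1. */
static int demo_lookup_name(const char *s)
{
	int l = DEMO_FIRST_NAME;
	int r = DEMO_LIMIT - 1;
	while (l <= r)
	{
		int m = l + (r - l) / 2;
		int c = strcmp(s, demo_names[m]);
		if (c < 0)
			r = m - 1;
		else if (c > 0)
			l = m + 1;
		else
			return m;
	}
	return -1;
}
```

Starting at index 3 means the reserved entries do not need to fit the sort order, and a name that happens to be spelt "null" is not mistaken for the null object.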
Robin_Watts pdf_name_eq is wrong.10:53.12 
  cos pdf_name_eq(PDF_NULL, PDF_NULL) will return 1 10:53.23 
  (looks like it was wrong before too :( )10:53.52 
  pdf_name_eq should only ever return true, if both elements are names.10:54.31 
tor8 that's easy enough to fix10:54.42 
Robin_Watts Indeed. With that fix, I'm happy.10:54.50 
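The fix agreed here (only ever return true when both operands are names) could look roughly like this when constants are small integers stored in pointer values. It is a sketch under stated assumptions: the `demo_` types, the two sample names and the heap-object layout are all hypothetical, and references are deliberately not resolved.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical heap object for names that are not in the constant list. */
typedef struct demo_obj { int is_name; const char *text; } demo_obj;

enum {
	DEMO_ENUM_NULL, DEMO_ENUM_TRUE, DEMO_ENUM_FALSE,  /* slots 0, 1, 2 */
	DEMO_ENUM_NAME_Font, DEMO_ENUM_NAME_Type,         /* names from 3 */
	DEMO_ENUM_LIMIT
};

/* Constants are encoded as tiny pointer values, never dereferenced. */
#define DEMO_NULL ((demo_obj *)(uintptr_t)DEMO_ENUM_NULL)
#define DEMO_TRUE ((demo_obj *)(uintptr_t)DEMO_ENUM_TRUE)
#define DEMO_NAME(X) ((demo_obj *)(uintptr_t)DEMO_ENUM_NAME_##X)

static int demo_is_const(const demo_obj *o)
{
	return (uintptr_t)o < DEMO_ENUM_LIMIT;
}

static int demo_is_const_name(const demo_obj *o)
{
	uintptr_t v = (uintptr_t)o;
	return v >= DEMO_ENUM_NAME_Font && v < DEMO_ENUM_LIMIT;
}

/* True only if both operands are names and they are the same name. */
static int demo_name_eq(const demo_obj *a, const demo_obj *b)
{
	/* A constant can only equal the identical constant, and only counts
	 * if it sits in the name range (so NULL == NULL is rejected). */
	if (demo_is_const(a) || demo_is_const(b))
		return a == b && demo_is_const_name(a);
	if (!a->is_name || !b->is_name)
		return 0;
	return strcmp(a->text, b->text) == 0;
}
```

The constant-versus-heap case returns 0 on the assumption, stated earlier in the conversation, that creating a name which is in the list always yields the constant rather than a new allocation.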
  So, I'm happy with the first (if we fix the VS build).10:55.19 
  I'm unhappy with the second (because dereferencing seems important to me)10:55.52 
  I'm happy with the third (if we fix it to only equate names)10:56.12 
tor8 yeah. I'm starting to have second thoughts about the second (losing auto-dereferencing there may not be worth it)10:59.15 
Robin_Watts The next one... pdf_new_obj_from_str dates from c69a9ace94 11:00.53 
  Which states it was added for zeniko.11:01.00 
tor8 They haven't pulled from us in 3 years11:01.55 
Robin_Watts tor8: Fair enough.11:02.27 
tor8 https://github.com/sumatrapdfreader/sumatrapdf/tree/master/mupdf/source/pdf 11:02.28 
Robin_Watts (tor8: Is it worth us forking with an up to date version?)11:02.53 
tor8 so I figure they're happy enough with their fork (and don't care enough about new features to suffer through our API instability, given how many local patches they've added)11:03.11 
Robin_Watts (probably work we don't need. Depends on how much of a "shop window" sumatra is for us)11:03.14 
  Next one, the removal of 'doc'.11:04.07 
  I don't object. The argument against the change is regularity, but I'm not offended by it.11:04.34 
  Arguably it's clearer only to put 'doc' into things that actually remember the doc.11:04.47 
  Next one looks good.11:05.19 
tor8 with the dict_put_int, etc functions, we also don't need to call the pdf_new_int, etc functions nearly as often11:05.43 
Robin_Watts tor8: yeah.11:08.47 
tor8 Robin_Watts: one random idea occurred to me (feel free to hate it): PDF_NAME_EQ(ctx, var, Foo) that resolves to pdf_name_eq(ctx, var, PDF_NAME(Foo))11:08.49 
  not sure if it's worth it, but it would be shorter11:09.06 
  nah, I already hate it myself.11:09.10 
  just needed to type it out11:09.13 
Robin_Watts The worry I have with that is that PDF_NAME_EQ(ctx, Foo, var) won't work.11:09.21 
  ok :)11:09.39 
tor8 exactly, and it hides the third argument processing, which is icky11:09.45 
Robin_Watts The cmap stuff looks clever.11:09.53 
cosimone ok thanks. so, first of all, what does the button near the search button do? it switches from grey to blue and vice versa when pressed, but nothing seems to happen11:11.48 
Robin_Watts cosimone: The chain icon?11:12.03 
cosimone yes, that one11:12.13 
Robin_Watts It makes "links" active.11:12.16 
  i.e. if you click on a hyperlink it follows it.11:12.28 
cosimone oh, i see. i tried it on a document without links, that's why i didn't notice it, thanks11:12.51 
Robin_Watts no worries.11:12.57 
  tor8: So how does the merging stuff work?11:13.17 
cosimone another small thing, is there any way to select text? long press doesn't seem to work11:13.25 
Robin_Watts cosimone: I can't remember a way at the moment.11:14.32 
  Possibly you can go into reflow mode, and then select from there?11:14.41 
cosimone i'll try later11:14.52 
  no problems if you can't recall now, it's not urgent. thanks anyway, and keep up the good work!11:15.24 
Robin_Watts tor8: So the plan is to check in cmaps produced using this.11:15.45 
  cos if windows users are relying on mutool cmapdump to be able to dump cmaps, we need some to get started with.11:16.56 
tor8 Robin_Watts: yeah. the plan is to check in all but the humongous font dumps11:19.12 
  Robin_Watts: which 'merging' stuff?11:19.37 
Robin_Watts the "share" stuff.11:19.53 
tor8 first it creates a flattened representation of the involved CMaps11:20.18 
  which just has all the ranges expanded, so it only maps single characters11:20.47 
Robin_Watts /UniCNS-X usecmap. Gottit.11:20.49 
  Nice.11:20.59 
tor8 then I extract the common subset into a -X cmap which both inherit using usecmap11:21.05 
Robin_Watts yeah, that's the bit I was struggling to see.11:21.24 
  Nice trick.11:21.27 
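The flatten-then-share steps described above can be sketched in C as below. This is not the flattencmap.py or cmapshare.py code, which is not shown in the log; the `demo_` types and functions are hypothetical, ranges are assumed sorted and non-overlapping, and output buffers are assumed large enough.

```c
#include <stddef.h>

typedef struct { unsigned lo, hi, out; } demo_range; /* lo..hi map to out, out+1, ... */
typedef struct { unsigned code, out; } demo_pair;    /* one character mapping */
typedef struct { demo_pair *v; size_t n; } demo_list;

/* Expand every range so that each entry maps a single character. */
static size_t demo_flatten(const demo_range *r, size_t n, demo_pair *dst)
{
	size_t i, k = 0;
	unsigned c;
	for (i = 0; i < n; i++)
		for (c = r[i].lo; c <= r[i].hi; c++)
		{
			dst[k].code = c;
			dst[k].out = r[i].out + (c - r[i].lo);
			k++;
		}
	return k;
}

static void demo_push(demo_list *l, demo_pair p)
{
	l->v[l->n++] = p;
}

/* Walk two flattened, code-sorted maps in step. Identical entries go to the
 * shared map; everything else stays with its own map. */
static void demo_share(const demo_pair *a, size_t na, const demo_pair *b, size_t nb,
	demo_list *common, demo_list *only_a, demo_list *only_b)
{
	size_t i = 0, j = 0;
	while (i < na && j < nb)
	{
		if (a[i].code < b[j].code)
			demo_push(only_a, a[i++]);
		else if (a[i].code > b[j].code)
			demo_push(only_b, b[j++]);
		else if (a[i].out == b[j].out)
		{
			demo_push(common, a[i]);
			i++;
			j++;
		}
		else
		{
			demo_push(only_a, a[i++]);
			demo_push(only_b, b[j++]);
		}
	}
	while (i < na)
		demo_push(only_a, a[i++]);
	while (j < nb)
		demo_push(only_b, b[j++]);
}
```

The shared list corresponds to the -X cmap, and each remainder corresponds to a cmap that pulls the shared part in via usecmap and adds its own entries on top.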
  Are there other potential savings still lurking in here?11:21.41 
tor8 I did it before, but forgot to check in and then lost my scripts11:21.46 
  so I recreated them properly again, and reran it on the latest CMaps11:22.06 
  possibly, the GB* cmaps may have some common bits that could be extracted likewise11:23.19 
  yeah. that could shave another 80kb11:26.57 
Robin_Watts Is it worth an exhaustive run to compare every cmap with every other cmap?11:27.22 
tor8 possibly, but the savings would get smaller and smaller11:29.03 
  sharing the GB* cmaps would save ~160k11:29.29 
Robin_Watts tor8: yeah, I just wondered if there was maybe some smaller lumps that were common to lots of files (like symbols etc)11:29.57 
tor8 the next on the list would be KSC and RKSJ and those might save 10k-20k each11:29.58 
  Robin_Watts: if we run flattencmap.py on all of them, is there a tool that can do a similarity score?11:30.36 
  well, diffstat I guess11:30.40 
Robin_Watts pass, "comm" is a new one on me as it is :)11:30.54 
  comm | wc -l ? :)11:31.16 
  (Does gs include these cmaps too? Possibly we should use the reduced ones there as well.)11:32.26 
  Next few look fine. Looking at the "Try other CJK languages to find missing characters" one now.11:37.48 
  Is there a way to know what chars a font has without loading the whole thing?11:38.09 
  Like, can we check in a CMAP first?11:38.17 
  I bet fonts don't correspond 1:1 with cmaps.11:38.33 
tor8 Robin_Watts: for the " Try other CJK languages to find missing characters." commit?11:39.05 
Robin_Watts tor8: Yeah.11:39.17 
tor8 sadly no, we need to load the TTF/OTF/TTC to look at the SFNT 'cmap' table11:39.21 
Robin_Watts I feared that might be the case. Looks good anyway.11:39.34 
  This is presumably for epub ?11:39.46 
tor8 if it weren't for the Han unification, we wouldn't be in this mess :)11:39.58 
Robin_Watts mmm.11:40.03 
tor8 yes. this is for epub (and eventually PDF form filling and appearance synthesis)11:40.18 
  the 'japan' font doesn't have non-japanese characters, but they could still be used in a japanese language context11:40.47 
  a fairly rare occurrence, but if it happens, we should look through the other fonts we have11:41.05 
  and of course, there's always the case where we have a unicode character but not a specified language11:41.27 
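The fallback order described here (try the font for the stated language, then look through the others) might be sketched like this. Everything in it is an assumption for illustration: the `demo_font` struct and its `chars` array stand in for a loaded font's SFNT 'cmap' table, and the function names are hypothetical rather than mupdf's.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a loaded font: a language tag plus the list of
 * Unicode characters it covers. */
typedef struct { const char *lang; const unsigned *chars; size_t n; } demo_font;

static int demo_has_glyph(const demo_font *f, unsigned ucs)
{
	size_t i;
	for (i = 0; i < f->n; i++)
		if (f->chars[i] == ucs)
			return 1;
	return 0;
}

/* Prefer a font matching the language (if one is given), then fall back to
 * any font that has the character. Returns the font index, or -1. */
static int demo_pick_font(const demo_font *fonts, size_t nfonts,
	const char *lang, unsigned ucs)
{
	size_t i;
	if (lang)
		for (i = 0; i < nfonts; i++)
			if (strcmp(fonts[i].lang, lang) == 0 && demo_has_glyph(&fonts[i], ucs))
				return (int)i;
	for (i = 0; i < nfonts; i++)
		if (demo_has_glyph(&fonts[i], ucs))
			return (int)i;
	return -1;
}
```

Passing NULL for the language covers the last case mentioned, a Unicode character with no specified language.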
  Robin_Watts: okay, so there are more savings to be had by that sharing trick...11:45.24 
  but I think I'll need to write a new 'comm' tool that can take 3 input files...11:46.38 
  or more11:46.47 
Robin_Watts other commits all look good.11:50.53 
tor8 okay, I'll revert the "Use direct comparison to compare pdf_obj with constant name objects." commit, fix pdf_name_eq11:51.53 
  and I might need your help to make "Remove need for namedump by using macros and preprocessor." work with MSVC11:52.09 
Robin_Watts Sure.11:52.21 
tor8 and I've got some ideas, I might remake the 'cmapshare.py' script11:52.37 
  or rather the calling script11:52.47 
  to merge more cmap subsets11:52.52 
Robin_Watts ok. just yell when you need me.11:53.02 
tor8 will do11:55.42 
  Robin_Watts: success!13:12.06 
  I have squeezed the CMap data down from 1.5M to 840k by sharing more subsets13:12.18 
  oh, wait... I might be a bit over-optimistic13:12.46 
  Robin_Watts: I have managed to save 50kb by sharing the common bits between GBK-EUC-H, GBKp-EUC-H, and GBK2K-H14:06.20 
Robin_Watts tor8: So the cmaps were what size?14:06.38 
tor8 I think I shall leave it at that. the remaining CMap source files are already pretty optimal14:06.41 
  those three were 82kb, 82kb, and 89kb in size14:07.46 
Robin_Watts tor8: I was just wondering what the "full" set was before you tried squeezing, and what it is now.14:08.12 
  50K is always nice to have.14:08.36 
  Can we push the squashed sets into gs too ?14:08.52 
tor8 no idea; you'd have to consult chrisl or kens14:09.10 
kens Eh what ?14:09.22 
tor8 it might be that PS expects more from the CMap resources that these squeezed ones don't do14:09.27 
  kens: I have massaged some of the CMap resources in mupdf to 'usecmap' common subsets14:09.54 
  kens: a significant amount of savings for Uni*-UCS2-H and Uni*-UTF16-H and the GBK*-H CMap resources14:10.25 
kens Hmmm14:10.25 
  I can't really comment without looking at what you've done to be honest14:11.48 
  How much saving are you suggesting, what's the performance overhead of reconstructing the CMap ?14:13.33 
chrisl tor8: doesn't that make updating the cmaps a pain?14:15.06 
kens I guess we could write a customer findresource to construct a CMap dictionary from a 'modified' CMap.14:17.18 
  s/customer/custom/14:17.27 
chrisl AIUI, tor8 is using the usecmap operator, so it should just work, as long as the Postscript names work out14:18.07 
kens Do we actually support usecmap ?14:18.21 
chrisl Given it's used all over the joint in the standard ones, I'd rather hope so14:18.59 
kens Hmm, seems we do14:19.01 
tor8 chrisl: it does, which is why I haven't suggested it for gs14:19.18 
  and gs doesn't embed them directly in the binary, does it?14:19.26 
kens It does for romfs builds14:19.37 
chrisl By default, yes14:19.39 
moolc tor8: you guys host git on some aws like thingie?16:37.01 
Robin_Watts moolc: We do.16:52.31 
  We have an aws instance that hosts various things, including our git server.16:52.52 
moolc Robin_Watts: git remote update just took ~15 minutes (from tor's branch) Receiving objects: 100% (470/470), 24.01 MiB | 58.00 KiB/s, done.16:56.06 
  no wonder a friend of mine mentioned how long it took him to bootstrap my stuff that does a clone of mupdf.16:56.32 
Robin_Watts That's unusually slow.16:56.44 
moolc Robin_Watts: perhaps it makes sense for me to switch git url in http://repo.or.cz/llpp.git/blob/6f05faf8e0f8bef1697edec342e8e8cfe02b43d0:/misc/bootstrap.sh to github mirror?17:14.57 
Robin_Watts moolc: Perhaps.17:15.33 
moolc Robin_Watts: okay.. i'll try17:16.19 