Log of #ghostscript at irc.freenode.net.

Search:
 <<<Back 1 day (to 2016/10/23)20161024 
Robin_Watts sebras: For the logs: I will look at your commit tomorrow, but I agree tor8 should take a look too.00:10.24 
  I have a proposed fix on my branch for the slowdown bug. Just clustering it now.00:10.41 
  gotta check one file that is now timing out.00:25.14 
sebras tor8: morning?11:16.40 
tor8 morning!11:18.15 
  sebras: I have a few comments on your commits11:18.24 
  FZ_STEXT_EXPAND_WHITESPACE .. is not really expanding whitespace11:18.36 
  normalize, clean, or maybe collapse (to match css/html whitespace terminology)11:19.02 
  the comment says replace with "space characters of variable width" ... variable width? I think you mean "regular space characters"11:19.25 
sebras tor8: well, codepoint-wise yes, but width-wise no.11:19.47 
  tor8: this was due to a file that fred had issues with that contained tab chars IIRC.11:20.17 
tor8 well, stext is mainly about the character codes so I think being clear that it means U+0032 SPACE is important11:21.24 
  ahem, U+002011:21.33 
sebras tor8:2yes2i2agree.11:21.58 
tor8 in the last bug fix for 697236, you change the actual output11:23.07 
  it used to be that we'd emit an array for a ligature with charater code/glyph indexes [glyph-id-of-ffi/f, -1/f, -1/i)11:24.00 
  now we emit [-1/f, -1/f, -1/i]11:24.12 
  the intent of the output array is to match what XPS calls 'cluster mapping'11:24.35 
  where you have N-to-M glyph index to character code clusters11:24.50 
  we do that by passing -1 for character codes or glyph indexes that belong to the cluster but don't have a separate character code or glyph11:25.30 
sebras ok, right but besides getting the gids wrong, how about the computed widths? are you happy with this approach?11:26.17 
  the gids ought to be easy to fix I belive.11:26.24 
  tor8: here, you get an extra character for the previous message: e11:26.47 
tor8 yes, that's a reasonable approach. it doesn't quite match what happens with harfbuzz and how ligatures are emitted in say HTML11:26.50 
  and opentype11:26.53 
  sebras: thanks. can't have to many eeees11:27.04 
sebras tor8: how are they handled in hb/html?11:27.12 
tor8 sebras: oh, and have an extra 'o'11:27.12 
  the widths are all put on the glyphs, and the characters have zero width11:27.34 
  there is no reasonable way to do otherwise for things like the indic language syllable reshaping11:27.53 
  sebras: have you got a sample file somewhere called "cluster.xhtml"?11:28.35 
  or is that not committed to the cluster test files?11:28.43 
sebras tor8: what is cluster.xhtml?11:29.15 
tor8 http://ghostscript.com/~tor/epub-tests/11:29.22 
sebras oh.11:29.27 
  no, that file is not mine.11:29.31 
tor8 it has a bunch of text that exercises harfbuzz11:29.35 
sebras ah.11:29.43 
tor8 samples of many-to-one (f,i -> fi ligature), one-to-many, and many-to-many mappings11:30.20 
sebras tor8: my commits originate from reading bug 697236 and discussing it with robin last night.11:31.10 
  s/night/sleep cycle/11:31.20 
tor8 sebras: http://pastebin.com/raw/ii2jtGLj has the output of what is drawn for cluster.xhtml11:33.07 
  and that comes out crap, with an extra space character after 'fi'11:34.02 
  I thought I'd fixed that... *sigh*11:34.07 
sebras tor8: I know why it happenbs.11:34.20 
  tor8: we add characters when expanding the ligatures.11:34.37 
  and the trm is never updated between characters.11:35.00 
tor8 <char bbox="48 46.696878 83677230000000000 111.595317" x="48" y="92.4" c="i"/> is what comes out with sebras/master with stext device11:35.23 
sebras so eventually we are so far from the left hand side of the span that we decide to add a space.11:35.24 
  apparently I need to download cluster.xhtml11:35.48 
  tor8: did my explanation about the eeeeeeeeeeeeeeextraneous space make sense?11:41.40 
  tor8: ehm... crouton decided to give you a few extra e:s.11:42.05 
tor8 the trim should be updated between glyphs, and if there's a diff between what the font's advance says it sohuld be and where actually ends up we add a space11:44.03 
  trim? trm!11:44.11 
  I've borrowed kens keypboard today too....11:44.18 
kens :-)11:44.27 
  Comedy keyboards R us11:44.36 
sebras kens: :)11:47.27 
  tor8: so on origin/master11:48.05 
  in source/fitz/stext-device.c in fz_add_stext_char_imp()11:48.28 
  around line 617 we compute p and q for the character11:48.43 
  because trm is never updated between constituent characters in ligatures11:49.04 
  shouldn't we accumulate the advance per character?11:49.48 
  because i think delta will later be computed incorrectly, if we do not.11:50.31 
  IIUC this is the source of the extraneous space.11:51.03 
tor8 sebras: yes. it seems like fz_add_stext_char doesn't cope with gid==-1 and/or c==11:55.41 
  -111:55.43 
sebras well, that's not the mistake I think.11:56.47 
  tor8: basically if gid < 0 we simply make sure not to add any extra space between the chars.11:57.07 
  tor8: which ought to be fine for ligatures.11:57.17 
tor8 if we look at the EPUB case where we want the c/glyph/adv tuples for "fi" to be [char_f/glyph_fi/100, char_i/-1/0]11:57.21 
sebras tor8: 100 and 0 being the widths of the characters?11:57.51 
tor8 yes11:57.56 
  100 is the width of the 'fi' ligature11:58.01 
  0 is the width of nothing11:58.22 
sebras :) yes, i got that much!11:58.38 
tor8 for EPUB we *don't* do the split the width evenly because the ligature handling is all done by harfbuzz11:58.39 
sebras does harfbuzz split ligatures and figure out the constituent chars individual widths?11:59.21 
  or for epub that is simply never desirable?11:59.35 
tor8 unfortunately harfbuzz emits the 'fi' and -1 characters at the same spot, and I think that triggers the addition of a space11:59.47 
  <g unicode="f" glyph="fi" x="36" y="68.4" />12:00.27 
  <g unicode="i" glyph="-1" x="36" y="68.4" />12:00.27 
  <g unicode="l" glyph="l" x="58.640626" y="68.4" />12:00.27 
  that's what harfbuzz spits out for "fil" when it makes a ligature12:01.21 
sebras tor8: yes, if that happens then we'll detect the just from 36 to 58.640626 and compare that to our space detection.12:01.30 
tor8 and the delta between 'f' and 'i' will be negative12:01.53 
  and thus ignored (as kerning)12:02.15 
  and then the delta between 'i' and 'l' is really big12:02.25 
  so yes, I understand what's going on but not quite sure how we fix it12:02.36 
sebras tor8: the width of the ligature, right..?12:02.41 
tor8 well, we fix it by ignoring adding spaces between any -1 glyphs or characters12:02.58 
  yeah, the width of the ligature, erroneously compared with the width of .notdef12:03.18 
sebras neither before, nor after -1 characters.12:03.22 
tor8 adv = fz_advance_glyph(ctx, style->font, -1, wmode); in fz_add_stext_char12:03.32 
sebras yeah, after i moved that code in there.12:04.02 
  but maybe that was not a good idea.12:04.10 
tor8 the pen advancement space addition stuff should just ignore all glyph==-1 || character==-112:04.11 
  the closing brace after EXPAND_WHITESPACE is on the wrong indent12:04.38 
sebras tor8: it is indeed.12:06.28 
  wait... from where do you infer that fz_advance_glyph() uses gid == -1?12:07.58 
  tor8: it will compute the width of the ligature, not the -1 chars.12:08.14 
  tor8: that's why I take pains to call fz_encode_character_with_fallback() in fz_add_stext_ligature().\12:08.55 
tor8 on the 'i' character12:08.56 
  this is for epub. we don't call fz_add_stext_ligature.12:09.18 
sebras oh.12:09.29 
tor8 that's only for when we use compatibility unicode character 'fi'12:09.47 
sebras so for epub harfbuzz has split up the ligatures before we end up in stext?12:09.50 
tor8 harfbuzz has *created* a ligature and created a fz_text object with the ligature pre-split12:10.25 
sebras can feel tor8's frustration from .tw12:10.46 
tor8 the input to harfbuzz is 'f', 'i' and the output is f/fi, i/-112:11.08 
  the input to pdf is 'fi' and output is 'fi' and we want to split that in the stext-12:11.24 
  device12:11.27 
  sebras: :)12:11.46 
sebras ok my concept of when things happen for epub is a bit hazy (as with most things)12:12.09 
tor8 I'm not sure fz_encode_character_with_fallback is the proper approach in fz_add_stext_ligature12:12.22 
  that will load fallback fonts for PDF12:12.31 
  which is not really desired12:12.39 
sebras tor8: I needed to figure out the gid somehow and this was the best I could come up with.12:12.53 
tor8 fz_encode_character, and if that fails, take a wild guess12:13.10 
sebras but if we keep the gid == -1 and the widths == 0 then this is not needed.12:13.14 
tor8 but let's discuss that after we fix the epub case where ligatures are pre-split12:13.37 
  the reason I just split evenly in the old ligature code is because of not being able to guarantee I can get the advances for the parts of the ligature12:14.36 
  due to funky embedded pdf fonts that have stripped out th encoding tables and use Identity-H encodings12:15.10 
sebras tor8: right, I also worried that when asking the font it might not have the constituent chars.12:15.15 
tor8 sebras: I need to think about this for a bit12:17.13 
  how to handle incoming ligatures in the stext device12:17.24 
  I might experiment a bit with your commits12:17.36 
sebras tor8: no worries.12:17.43 
  this was more complicated than i anticipated.12:18.00 
tor8 sebras: it always is :)12:18.11 
sebras tor8: basically i don't see any reason to keep my commits around, because they feel... incorrect.12:18.55 
tor8 sebras: Robin_Watts: what would you call the flag for turning all whitespace into regular space characters?12:18.56 
  I'd probably go with FZ_STEXT_NORMALIZE_WHITESPACE 12:19.24 
sebras tor8: ok.12:19.58 
tor8 sebras: the lastchar commits should be good, if you fix the commit message and indentation12:20.34 
  sebras: the change from PRESERVE to EXPAND ... is that to change the default option?12:21.08 
sebras tor8: basically the bug reporter wanted to be able to disable both types of preservation.12:22.03 
  tor8: and if I inverted the interpretation of the flags then this was possible.12:22.19 
  tor8: otherwise I'd need a special value to denote: disable all.12:22.34 
tor8 sebras: something must have gone wrong with the old code...12:23.00 
  the default should be OFF for both preserves12:23.08 
Robin_Watts tor8: "FZ_STEXT_NORMALIZE_WHITESPACE" sounds good to me.12:24.41 
sebras tor8: really!? I thought you wanted stext to _not_ expand ligatures or translate tabs into variable width space characters by default..?12:25.17 
tor8 my original thought is that we parse them with fz_has_option("preserve-whitespace")12:25.21 
Robin_Watts 2 commits on robin/master.12:25.30 
tor8 which is why the default should be to normalize/expand and you can turn it off by setting the preserve option12:25.45 
  but that must have gotten lost in translation somewhere12:25.51 
Robin_Watts The first commit tested out all but for 1 file last night. The second commit should fix that I hope. Clustering now.12:25.53 
sebras tor8: ehm.. i think this is quite similar to what the code looked like from the beginning, albeit I parsed everything in fz_device instead of in fz_stext.12:26.46 
tor8 Robin_Watts: both LGTM12:26.55 
sebras tor8: I remeber having this discussion with robin at least.12:27.13 
  tor8: but then we'd have to change the interface for the hints for the device.12:27.30 
tor8 sebras: yes, we moved it (but apparently not completely) to the subclass device options rather than device hints12:27.34 
sebras tor8: no, i messed up the default values, sorry.12:27.56 
  tor8: and if we have hints as a bitfield in device I think it makes sense to have the same for stext.12:28.25 
tor8 IMO the defaults should be options that are 'off', and we turn them 'on' when we want the other behaviour12:28.30 
sebras thought personally i'd prefer them to be strings (as these are more extensible and we don't create a milion devices per second in any case)12:28.52 
tor8 sebras: pdf-write and draw-device has options parsing12:29.09 
  by strings12:29.12 
  but there we pre-parse them into option structs with bitfields12:29.52 
  we could do something similar with stext_device, fz_parse_stext_options returns the bitmask to pass as the option argument12:30.18 
Robin_Watts tor8: clustered and pushed, thanks.12:30.38 
tor8 or even a struct, if we want to add more tunable heuristic arguments12:30.41 
  sebras: shouldn't the last line in fz_new_stext_device just be dev->options = options without a ternary expression?12:32.58 
  and that would fix the bug with the not being able to disable both12:33.16 
  and we keep the 'preserve' name12:33.26 
sebras so if I set options to FZ_STEXT_PRESERVE_LIGATURES then I'd actually expand them..?12:34.14 
tor8 and add device option parsing to mutool draw -Fstext12:34.18 
sebras you wanted ligature expansion to be inactive by default, no..?12:34.51 
  and having to be explicitly activated.12:35.01 
tor8 I want cleaning up crap to be the default12:35.09 
  so PRESERVE should be off by default12:35.24 
  and fredross-perry can turn it on to preserve tabs in his app12:35.35 
sebras tor8: fred actually wanted the output to contain spaces instead of tabs to be able to detect "words: properly.12:36.41 
  ("words" are hard to define though, but whitespace delimited entities is what he was looking IIRC)12:37.12 
  looking for.12:37.21 
  so he wanted conversion from 0x09 -> 0x20. he never really asked for being able to control ligatures but it is a similar type of interpretation that a use might want to control.12:38.31 
tor8 sebras: right, so fred should be okay with the defaults being to normalize and expand (which should be options field 0)12:47.52 
sebras tor8: indeed.12:48.20 
Robin_Watts I need a review for: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=e332187c7f1259756b1accf8ef5b49f3068fc79b13:18.01 
  I will prod Paul and Joseph too, cos it's androidy (and fred if he comes on line)13:18.15 
sebras Robin_Watts: i'm not well equipped to reivew that code.13:20.49 
Robin_Watts either. It looks plausible, but...13:21.31 
jogux Robin_Watts: If I understand everything correctly, then 'cancelAndWait' seems like a bad name for a function that doesn't wait.13:25.02 
  It otherwise seems plausible, but I think I'd rename that function rather than leaving it as a trap for the next person that works with the code :)13:28.15 
Robin_Watts s/cancelAndWait/cancel/ then ? seems reasonable.13:30.20 
jogux yeah, assuming cancel isn't reserved/in use :-)13:34.48 
Robin_Watts jogux: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=a76bbd0f0b2f966a04a560cb02ba2d926f0e4adb13:48.23 
  That seems to work.13:48.27 
jogux Robin_Watts: Cool. Seems to means we're somewhat away from what Matt originally tried to achieve in http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commit;f=platform/android/viewer/src/com/artifex/mupdfdemo/CancellableAsyncTask.java;h=d7ece4132d6219ee10ba9ed85a9f2a052a6bb92c - but... no worse than we currently are, except with the bonus of not crashing.13:51.20 
tor8 sebras: ligature space fix on tor/master15:33.19 
sebras fetches.15:38.20 
  dfa1d0e ought to mention fixing bug 697236..?15:39.59 
  tor8: besides that it LGTM.15:42.19 
tor8 sebras: ah, that's the bug number.15:51.22 
  I lost it when you deleted your commits :)15:51.33 
  so, right now ligatures will look the same whether they come from epub or pdf15:52.29 
  when they come from pdf we can try to make their bboxes be more like if they weren't15:53.00 
  but I think aiming for consistency is probably better here15:53.23 
Robin_Watts tor8: Ideally bboxes should be 'sane' for text selection.15:54.35 
tor8 sebras: hmmm. it looks like the final character in a word has its bbox merged with inserted fake space characters15:54.48 
  Robin_Watts: define sane for indic languages with syllable rearrangements, or thai with one-to-many mappings :)15:55.12 
Robin_Watts ie. the bbox for 'ffi' should be approximately the same as 'f' 'f' 'i'.15:55.28 
  for such things, not sane is fine :)15:55.34 
tor8 Robin_Watts: the output from harfbuzz has lost that information though15:55.54 
  and for PDF we can't reliably recover it since we can't trust the font encoding15:56.19 
  we can try to evenly divide the ligature/cluster into equal parts15:56.34 
Robin_Watts So we can't know that 'ffi' is actually 'ffi' and not something else ?15:56.57 
tor8 Robin_Watts: we don't know which glyph is the 'f' character, if it even exists in the font15:57.26 
Robin_Watts Right.15:57.33 
tor8 so we can't guarantee we can measure the width of 'f' and 'i' to do the split proportionally15:57.43 
  we can definitely try, but that may be more wrong than just dividing it evenly15:57.59 
  many-to-one mappings (such as f,f,i to ffi) we can divide up in the stext-device if we rearchitect it a bit15:58.57 
  one-to-many (like a hypothetical Ã¥ in some fancy font that decomposes it to a,aring glyphs) 15:59.45 
  those we just drop the 'aring' part in the stext device as it is written now16:00.24 
  so, dividing the many-to-one mappings evenly might be the sanest approach for everything that doesn't involve digging through opentype tables for caret positions16:01.02 
  it might be that harfbuzz can give us those, in which case we really need to rearchitect things16:01.17 
sebras tor8: welll if can_append == 0 we add a new span16:03.11 
  tor8: and in that span the first thing we might do is insert a space and we use p for both the space and the char.16:03.36 
tor8 sebras: so, adding a fake space character seems to get the p,q values wrong too16:03.41 
sebras tor8: meybe we should update p and q accordingly when inserting the space?16:03.49 
tor8 the p should be the old pen value16:04.00 
sebras tor8: and q shoudl be adv more advanced.16:04.56 
  basically increment both p and q by r and you ought to be fine..?16:05.18 
tor8 hm, add_space has the 'r' and 'p' arguments to add_char reversed16:05.34 
sebras no I don;t think so.16:05.52 
  tor8: i think the idea is that r extends 0.2f to the left of trm.e16:06.11 
tor8 the idea is that 'r' is the start of the character following the space16:06.14 
  I mean, 'p' is the start of the character following the space16:06.24 
  so 'r' is calculated to be a space character's width before that16:06.32 
sebras ah...16:06.47 
tor8 so we should be adding space,r,p then c,p,q16:06.56 
sebras but wait...16:07.05 
  does this take wmode into account?16:07.12 
tor8 or better yet, take 'r' from the cur_span->max.x if possible16:07.12 
  nope, it does not16:08.00 
sebras ok, cool, then I at least spotted something.16:08.15 
tor8 sebras: yes! thanks for spotting that. I have a fix on tor/master.16:11.07 
  which uses the actual space width if available (which I think it should always be possible to do)16:11.42 
  but I left the old 0.2 width case in there for safety16:11.58 
sebras yeah, but the old 0.2 doesn't take wmode into account. :)16:13.43 
  now, I don't think that we can end up in the situation where we want to add a space without having a cur_span though.16:14.29 
  correct. if cur_span == NULL then can_append == 0 and add_space is consequently never set.16:15.04 
  why is there an ifdef around setting spacing = 0 in this function?16:16.23 
  around prints() seems sane though.16:16.45 
tor8 sebras: I'm guessing because it's only used by the ifdef'd printf in the next ifdef chunk16:18.52 
sebras tor8: we init spacing to 0, kill the ifdef please. :)16:20.01 
  oh, wrong if-statement!16:20.22 
tor8 I can kill the ifdef anyway, and always set spacing=0 if you like16:20.33 
sebras maybe I should just keep quiet. :)16:20.37 
  tor8: maybe with this new found knowledge it is a good time to tackle http://bugs.ghostscript.com/show_bug.cgi?id=697074 ?16:24.57 
tor8 sebras: ohhhh16:31.50 
  I spotted another "problem" with out search16:31.58 
sebras tor8: ok..?16:32.06 
tor8 if I search for 'mm' I will find overlapping hits :)16:32.08 
  both the start and end of 'mmm' will result in a hit16:32.16 
  (mm)m and m(mm)16:32.25 
  so double highlighted!16:32.31 
sebras tor8: elegant!16:33.48 
  tor8: luckily the x11 viewer uses Xor for highlighting so I never get to see the double highlighting. ;)16:34.16 
tor8 and that file runs into the 'too many results' problem with having a fixed array of result hit boxes16:35.57 
sebras tor8: oh, I simply created a test file: <html><body>commmit :)16:37.33 
tor8 okay, more commits on tor/master16:43.28 
sebras tor8: LGTM.16:48.21 
  tor8: this doesn't cover the top three commits on tor/master though.16:50.42 
tor8 sebras: nope, those need to wait until post release16:51.52 
  sebras: and thanks!16:51.59 
sebras tor8: did you decide to tackle 697074 too or, did you give up on that one?16:53.44 
tor8 sebras: that one is fixed by the space coordinate commit17:02.18 
  I guess I could put the bug number in. I spotted the same error before you mentioned the bug :)17:02.31 
  bah, too late. I already pushed.17:02.40 
sebras tor8: :)17:02.45 
  tor8: well, close the bug at least!17:02.50 
tor8 I shall17:04.36 
  hm, thinknig a bit here, but isn't most kerning negative (as in bringing characters closer) rather than positive. thinking our heuristic for space detection could be a bit tighter if we only look for positive spaces...17:05.34 
  but that could fall into traps with things like apple's word processor adding even character spacing between all characters for justification sometimes17:06.03 
Robin_Watts All justification is done by adding space, AIUI.17:07.44 
  I've never knowingly seen justification that squashed text together.17:08.11 
tor8 Robin_Watts: I'm talking about the abomination that adds space between *all* characters17:08.44 
  not the usual kind that stretches the space characters17:09.03 
  some justification can shrink space characters though. I've seen that often enough.17:09.36 
sebras tor8: for 696710 (that also mentioned stext): both evince and xpdf manage to see the spaces on the first line of the abstract.17:10.44 
tor8 ah, no, looking through the AFM files we have for kerning pairs, 1/5th of pairs are positive17:11.47 
sebras and PDF.js17:11.51 
tor8 sebras: we can too, if we change the SPACE_DIST constant in stext-device17:13.51 
sebras tor8: I know.17:14.13 
  I wonder how the other viewers do it though.17:14.37 
  to avoid false positives.17:14.51 
tor8 probably check the sign of the extra spacing17:15.33 
  sebras: we should add text output to the cluster17:17.43 
sebras tor8: that'd be useful.17:19.34 
tor8 okay, commit on tor master. if you could test that with a bunch of weird files I'd appreciate it :)17:20.48 
  but now I need to eat dinner and you need to sleep, I think...17:21.09 
sebras tor8: hm...17:22.44 
  tor8: similar code exists in strain_soup().17:28.02 
  I'm thinking of the move ob fabsf().17:30.17 
  but I'm to tired to figure out how strain_soup() would handle negative spacing.17:30.46 
  maybe it cannot be negative due to other reasons though.17:30.57 
sebras sleeps.18:45.57 
 Forward 1 day (to 2016/10/25)>>> 
ghostscript.com
Search: