IRC Logs

Log of #ghostscript at irc.freenode.net.

2016/01/25
Robin_Watts tor8: I have the draw code written that uses harfbuzz to shape the chars as they render, but it's failing to actually render any r2l text for me as it's not falling back for me.11:18.56 
  Is there a quick change I can do to make it use the fallback font to start with rather than Times-Roman so I can test this bit of the code?11:19.45 
tor8 Robin_Watts: add a user.css with @font-face { font-family: serif; src: url("resources/fonts/droid/DroidSansFallback.ttc"); }11:21.16 
  with the code on tor/master11:21.33 
  oh wait, did I push that to origin already? I'm still monday morning confused.11:21.52 
Robin_Watts I get hebrew but not arabic.11:23.23 
  DroidSansFallbackFull ?11:24.06 
tor8 Robin_Watts: shouldn't make a difference...11:24.17 
Robin_Watts That gives me arabic too.11:24.35 
tor8 Robin_Watts: try FreeSerif.ttf from http://ftp.gnu.org/gnu/freefont/freefont-ttf-20120503.zip11:24.44 
  hm, I might've messed up the encoding when creating DroidSansFallback.ttf ... we only use FallbackFull using the default build11:25.15 
Robin_Watts OK, so I can't substitute individual chars into harfbuzz when it's shaping.11:25.21 
tor8 Robin_Watts: there's no way to tell harfbuzz to pick among a set of available fonts?11:26.17 
Robin_Watts Not that I can see.11:26.33 
tor8 and harfbuzz shapes a string with only one font at a time?11:26.45 
Robin_Watts You give harfbuzz an ft font handle, and say use this.11:26.50 
  yes.11:26.52 
  tor8: So I have a plan.11:26.57 
  When I measure the nodes, I have to pass each word in turn to harfbuzz to calculate the bbox.11:27.19 
  If there are chars missing in the reply I get back from harfbuzz, I will retry with the fallback font.11:27.43 
  The question is, do I split the node around missing chars, or just fall the whole node back ?11:28.31 
  I think the latter for now.11:29.47 
tor8 I'd go with the latter for now11:31.30 
  we probably ought to pre-scan the word and select a font for each range11:33.08 
  what's the difference between a hb_font and an hb_face?11:33.52 
Robin_Watts If we have a word ABCDE, and C is missing in our default font, then we could split into AB C DE, but if C requires shaping, then we'd get better results using ABCDE in the fallback font.11:34.41 
tor8 Robin_Watts: yeah, if ABCDE and C is missing in the default font, but the fallback font that has C also has AB and DE we should use that fallback font for the whole ABCDE string11:35.22 
  consider that we will want to have a chain of fallback fonts, if we say use the noto fonts11:35.44 
  or maybe not a chain, just a big bag of fallback fonts11:35.58 
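
The "use the fallback font for the whole word" idea discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_font`, `font_covers` and `pick_font_for_word` are hypothetical names, and a single code point range stands in for real cmap coverage (or for shaping and checking for missing glyphs).

```c
#include <stddef.h>

/* Hypothetical stand-in for a real font: it only knows which code points it covers. */
typedef struct toy_font {
    const char *name;
    unsigned int first, last;   /* covered code point range, inclusive (an assumption) */
    struct toy_font *fallback;  /* next font to try, or NULL at the end of the chain */
} toy_font;

static int font_covers(const toy_font *font, unsigned int cp)
{
    return cp >= font->first && cp <= font->last;
}

/* Return the first font in the chain that covers every code point of the word,
 * so the whole word is shaped with one font. Returns NULL if no single font does. */
static const toy_font *pick_font_for_word(const toy_font *font,
                                          const unsigned int *word, size_t len)
{
    for (; font != NULL; font = font->fallback) {
        size_t i;
        for (i = 0; i < len; i++)
            if (!font_covers(font, word[i]))
                break;
        if (i == len)
            return font;
    }
    return NULL;
}
```

With a Latin-only default font whose fallback also covers Latin, a word ABCDE in which only C is outside the default font resolves to the fallback for all five characters rather than being split into AB, C and DE.
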
Robin_Watts Currently I'm calling hb_ft_font_create(face, NULL);, where face is the FT_Face11:36.10 
tor8 are you using hb_buffer_guess_segment_properties?11:36.41 
Robin_Watts That calls hb_ft_face_create internally to get an hb_face_t, and calls hb_font_create with that.11:37.08 
  I am not.11:37.10 
bofh_ 711:37.31 
Robin_Watts tor8: but I could.11:38.29 
  That doesn't affect font choice though.11:38.56 
tor8 hm, hb_font has a parent field11:39.00 
  no, it just fills in the scripts and languages so that opentype knows what to do for each range11:39.18 
  hb_font_set_parent and hb_font_create_sub_font ... wonder what they're doing11:40.41 
Robin_Watts I think I need to make some tweaks to stuff for mirroring.11:41.29 
  Some chars (like '(' ) get mirrored when they are used in a r2l context.11:41.54 
  I think I want to keep a 'mirror' bit for every char in the nodes.11:42.29 
tor8 shouldn't harfbuzz take care of the actual glyph mirroring?11:43.01 
Robin_Watts I've been pondering that.11:43.26 
tor8 might be worth testing11:43.44 
Robin_Watts Actually, I think it does. That's a relief.11:45.04 
  tor8: Any objection to me adding an hb_font_face_t to every fz_font ?11:56.32 
  Currently I'm creating them every time the font changes, but that's not ideal, obviously.11:57.17 
tor8 Robin_Watts: go ahead, as long as it's null until needed11:57.48 
Robin_Watts I guess ultimately we should virtualise a lot of the stuff in fz_font.11:58.06 
tor8 I expect we'll want to do layout with this stuff when laying out pdf font annotations eventually11:58.12 
Robin_Watts tor8: yeah, maybe.11:58.25 
tor8 the guts of fz_font are nasty and messy11:58.49 
Robin_Watts tor8: Yes. We have lots of functions that do: "If this is a freetype font, call the freetype variant, if it's a type3 call the type3 variant" etc.12:00.23 
  and those would be nicer as virtualised things.12:00.34 
tor8 Robin_Watts: yes.12:01.09 
Robin_Watts I expect that with michael fiddling in this area for the pdf device, the internals might get juggled a bit.12:01.22 
tor8 Robin_Watts: here's an idea for dealing with the fallback font issue:12:10.51 
  we first scan the run trying to encode each character using the desired font, and if that succeeds all is good12:11.14 
  if not, we then use harfbuzz/ucdn to find runs of specific unicode scripts, and based on the script we look up a fallback font12:11.54 
Robin_Watts ok. The first step is equivalent to handing it to harfbuzz, saying 'shape' and then checking that none of the codepoints it gets back is 0.12:12.19 
tor8 yeah, that's probably the same but possibly faster because we don't need to run the shaping12:12.59 
  the other approach, which I think is what web browsers might be doing, is to keep a list of unicode ranges that each font covers12:13.37 
Robin_Watts It's interesting that if I pass arabic to hb with Times-Roman, I get a list of shaped glyphs back, all with 0 codepoints - but with offsets etc set for shaping.12:13.48 
tor8 which is more automatic, if you just hand the browser a directory of fonts from the system, it'll sort them out and have some way of automatically finding a proper fallback font12:14.18 
Robin_Watts I need to get hb to do the shaping cos I need to get the bbox for the shaped glyphs.12:14.31 
tor8 I think harfbuzz does some generic arabic shaping without using opentype tables if there are none12:14.51 
Robin_Watts Having script based fallbacks does sound the right way to go, ultimately.12:14.55 
tor8 using unicode presentation forms12:14.59 
  script based fallbacks is more robust, if we can control the set of fonts used12:15.15 
  which is something I definitely expect we'll want12:15.24 
  Robin_Watts: I was thinking as a pre-pass before we do the shaping to measure12:15.46 
  just to figure out the font to use for runs handed off for shaping12:16.03 
Robin_Watts Well, I figure the shaping will be enough 90% of the time.12:16.12 
tor8 if we run the hb_buffer_guess_segment thing we could use that to look up the scripts12:16.27 
Robin_Watts tor8: I could pre-break the fragments at script changes.12:17.05 
tor8 Robin_Watts: that could work; shaping doesn't work past script changes either IIRC12:17.41 
Robin_Watts I can't see how it would.12:17.49 
tor8 s/could/should/ :)12:18.19 
  Robin_Watts: ICU seems to have better documentation, and there's an ICU to harfbuzz bridge project that exposes the ICU interface with harfbuzz as a back-end12:18.51 
  that might be worth reading to figure out what harfbuzz actually does12:19.00 
Robin_Watts hmm. That would end up with punctuation ending up in different fragments.12:19.22 
  unless we have punctuation being 'script neutral' or something ?12:19.39 
tor8 ICU also requires a run of a single font in a single script for its shaping12:19.41 
  punctuation is script neutral, I think the guess properties thing tries to resolve the punctuation12:20.04 
  better read the code to make sure though12:20.09 
  "Clients can use ICU's Bidi processing to determine the direction of the text and use the ScriptRun class in icu/source/extra/scrptrun to find a run of text in the same script." from the ICU docs12:20.29 
  "The ICU LayoutEngine is designed to process a run of text which is in a single font. It is written in a single direction (left-to-right or right-to-left), and is written in a single script." and I expect the same of harfbuzz12:20.44 
  hm, harfbuzz expects each hb_buffer_t to be in a single script so the guess_segment_properties call just finds the first non-neutral script in the buffer12:24.21 
Robin_Watts yeah.12:24.44 
tor8 so splitting the fragments at script changes is something we'll actually need to do rather than want12:24.55 
Robin_Watts Oh god, the thought of moving over to ICU has just drained me of all energy.12:25.08 
tor8 no no no! I don't want to do that at all.12:25.42 
  but it could be worth looking at the ICU docs and bridge to understand what harfbuzz's undocumented code is trying to do12:26.16 
Robin_Watts ICU may have a more up to date UAX #9 implementation.12:26.17 
  I'm happy with harfbuzz as is (other than being in C++)12:26.35 
tor8 ICU is all c++ though12:26.38 
  with a C++ interface, unless I'm mistaken12:26.50 
Robin_Watts I may try a C++ -> C conversion for harfbuzz later.12:27.04 
tor8 harfbuzz being C++ bothers me, but I'm okay with it since the visible interface is still c12:27.17 
  ICU has a scriptrun class they recommend for finding runs of the same script in text12:28.52 
  http://userguide.icu-project.org/layoutengine12:28.56 
Robin_Watts tor8: I'll just call ucdn_get_script where I am already breaking fragments in the bidi handling code.12:29.25 
tor8 yeah, I think it's just a matter of spreading/infecting the common/inherited scripts12:29.48 
Robin_Watts I'm going to assume that the punctuation is going to be UCDN_SCRIPT_COMMON12:30.41 
tor8 Robin_Watts: all punctuation is UCDN_SCRIPT_COMMON, by my reading of http://www.unicode.org/Public/8.0.0/ucd/Scripts.txt12:35.00 
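
Splitting at script changes while letting script-neutral characters join a neighbouring run might look something like the sketch below. Everything here is an assumption for illustration: `toy_get_script` is a rough stand-in for a real lookup such as ucdn_get_script, its block ranges are approximate, and the rule that trailing punctuation stays with the preceding run is one possible choice, not necessarily the one in the branch.

```c
#include <stddef.h>

enum toy_script { SCRIPT_COMMON, SCRIPT_LATIN, SCRIPT_HEBREW, SCRIPT_ARABIC };

/* Hypothetical, very rough classifier standing in for a real script lookup. */
static enum toy_script toy_get_script(unsigned int cp)
{
    if ((cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z'))
        return SCRIPT_LATIN;
    if (cp >= 0x0590 && cp <= 0x05FF)
        return SCRIPT_HEBREW;
    if (cp >= 0x0600 && cp <= 0x06FF)
        return SCRIPT_ARABIC;
    return SCRIPT_COMMON; /* punctuation, spaces, digits, everything else */
}

typedef struct {
    size_t start, len;
    enum toy_script script; /* resolved script; COMMON only if the run has no real script */
} script_run;

/* Split text into runs of one script. Common characters never start a new run:
 * leading ones take the script of the first real character that follows, and
 * later ones stay with the run they appear in. Returns the number of runs
 * written; stops early if max_runs is reached. */
static size_t split_script_runs(const unsigned int *text, size_t len,
                                script_run *runs, size_t max_runs)
{
    size_t n = 0, i;
    for (i = 0; i < len; i++) {
        enum toy_script s = toy_get_script(text[i]);
        if (n > 0) {
            script_run *cur = &runs[n - 1];
            if (s == SCRIPT_COMMON || cur->script == SCRIPT_COMMON || s == cur->script) {
                if (cur->script == SCRIPT_COMMON)
                    cur->script = s; /* first real script resolves the run */
                cur->len++;
                continue;
            }
        }
        if (n == max_runs)
            break;
        runs[n].start = i;
        runs[n].len = 1;
        runs[n].script = s;
        n++;
    }
    return n;
}
```

Each resulting run could then be handed to the shaper as its own buffer with a single script set.
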
Robin_Watts ok, I have it splitting into runs of punctuation + script then.12:41.51 
  tor8: Is the mupdf viewer supposed to accept user css too?13:29.33 
  Aha, fixed it.13:31.24 
tor8 Robin_Watts: yes. -U flag should work on both -x11/win32 and -gl viewers13:33.27 
Robin_Watts The win32 one was reading layout_css and then not doing anything with it :)13:33.54 
tor8 ah!13:34.00 
  oops.13:34.16 
  you should be using the -gl viewer though ;)13:34.23 
Robin_Watts Me so dinosaur.13:35.29 
  Ok, tor8. Changes on robin/harfbuzz.13:51.59 
  See what you think.13:52.05 
Robin_Watts lunches13:52.08 
tor8 Robin_Watts: "This does mean that the fallback is reflected in the font that finally makes it through to the device interface." is true today as well13:55.15 
Robin_Watts tor8: ah, ok, so no worse then.14:26.29 
tor8 fz_encode_character_with_fallback fills out the &font with the actual fallbacked font for the glyph to use14:27.39 
  the code on robin/master doesn't look like it's using the fallback font (or I'm blind)14:29.15 
Robin_Watts robin/harfbuzz ?14:29.38 
  tor8: font = font->fallback in measure_word14:29.57 
tor8 oh, I was looking in draw_word14:33.00 
  ouch, you overwrite the node->style->font14:33.07 
  I've got to go out for a couple of hours, need to help a friend move some stuff.14:34.06 
  Robin_Watts: the node->style is potentially shared by all the flow nodes in the same paragraph14:39.48 
Robin_Watts tor8: ok, so that needs to be fixed.14:40.11 
  Will need to talk to you about that when you get back.14:53.11 
tor8 Robin_Watts: I'm back.17:59.05 
Robin_Watts tor8: I had a cunning idea about the style stuff.18:54.40 
  I'm going to add a fallback pointer to each style18:55.09 
  So for every style, there can be at most as many fallback equivalent styles as there are links in the fallback chain.18:55.50 
  But I've spent the afternoon looking at harfbuzz.18:56.36 
tor8 so you'll clone each style if there's a use of a fallback font?18:56.39 
Robin_Watts tor8: Yes.18:56.45 
tor8 I'd rather just stick a fz_font in the fz_html_flow node18:56.56 
  or an index into a "global" array of fonts18:57.21 
  or just duplicate the find_best_font stuff18:57.41 
Robin_Watts tor8: We could have an 'index into the fallback list' ?18:57.52 
  so for a given style, 0 would be style->font, 1 would be style->font->fallback etc.18:58.22 
tor8 one idea I had was to use a fz_font_set instead of a fz_font in the html code18:58.36 
  so style->font would be style->fontset and each flow node could have an index into the font set18:58.52 
  where the font set would be an array of fonts; the desired fz_font, and one fz_font per script we support for fallback18:59.26 
  but an index into the fallback list would serve the same purpose, yes18:59.48 
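
The "index into the fallback list" idea, where 0 means style->font and 1 means style->font->fallback, could be sketched like this. The struct and function names (`fb_font`, `fb_style`, `fb_flow_node`, `node_font`) are hypothetical and much simpler than the real fz_font and html flow structures; the point is only that the shared style is never overwritten.

```c
#include <stddef.h>

/* Hypothetical simplified types; the real structures carry far more state. */
typedef struct fb_font {
    const char *name;
    struct fb_font *fallback;   /* next font in the fallback chain, or NULL */
} fb_font;

typedef struct {
    fb_font *font;              /* desired font; the style may be shared by many nodes */
} fb_style;

typedef struct {
    const fb_style *style;      /* shared, never modified per node */
    int fallback_index;         /* 0 = style->font, 1 = style->font->fallback, ... */
} fb_flow_node;

/* Resolve the font a node should be measured and drawn with.
 * Returns NULL if the index runs past the end of the chain. */
static const fb_font *node_font(const fb_flow_node *node)
{
    const fb_font *font = node->style->font;
    int i;
    for (i = 0; i < node->fallback_index && font != NULL; i++)
        font = font->fallback;
    return font;
}
```

Two nodes in the same paragraph can then share one style while resolving to different fonts, which avoids the problem of one node's fallback choice leaking into its siblings.
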
Robin_Watts ideally, we'd generalise stuff a bit, so that we didn't lose information going via the device interface.18:59.48 
  OK, I'll do the index.19:00.03 
  Converting HarfBuzz to C is possible, but it'll be very hard going.19:00.18 
  Harfbuzz makes lots of use of templates.19:00.31 
tor8 Robin_Watts: I mentioned that we use harfbuzz to sebras, and his first words were "are you going to rewrite it in C?"19:00.50 
Robin_Watts converting away from that will be time consuming.19:00.57 
  If we had a fortnight to invest, I'd say we could do it.19:01.08 
tor8 and then we'll be stuck with that version forever; backporting fixes could be troublesome19:01.22 
Robin_Watts There is lots of stuff in there that is overly defensive programming.19:01.38 
  Yes, so I stopped doing that.19:01.50 
  But it calls malloc/free/calloc, so I looked for a way to solve that.19:02.04 
  Updating the code so it passes a heap pointer around (and then can call hb_free/hb_malloc/hb_calloc) was what I tried.19:02.43 
  But that explodes out of control. Again it will make taking fixes on hard.19:02.59 
  So I am tempted to use some #define malloc hb_malloc magic in the hb.h file, and wrap calls to hb.19:03.52 
  The wrapper calls on windows/unix would use thread local storage.19:04.30 
tor8 hb as I've built it is not thread safe (I didn't investigate too much, I just set -DHB_NO_MT)19:04.39 
  Robin_Watts: yeah, that ought to work.19:05.18 
  not convinced it's worth spending a lot of time on though19:05.33 
Robin_Watts tor8: Actually... a better solution would be to lock/unlock a mutex and then call harfbuzz.19:05.52 
tor8 just reuse our freetype lock19:06.13 
Robin_Watts Nice, yes.19:06.20 
  Ok, I'll do that.19:06.31 
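
Serialising calls into a non-thread-safe library behind an existing lock can be sketched as below. This is a toy model, not MuPDF's locking API: `toy_lock`, `call_locked` and the counting callbacks are hypothetical, and `fake_shape` stands in for whatever shaping call would be protected.

```c
/* Hypothetical lock interface: caller-supplied lock and unlock callbacks. */
typedef struct {
    void *user;
    void (*lock)(void *user);
    void (*unlock)(void *user);
} toy_lock;

typedef int (*shape_fn)(void *arg);

/* Run fn(arg) with the lock held, and release the lock afterwards. */
static int call_locked(toy_lock *lk, shape_fn fn, void *arg)
{
    int result;
    lk->lock(lk->user);
    result = fn(arg);
    lk->unlock(lk->user);
    return result;
}

/* Counting callbacks used only to demonstrate that the lock is held during the call. */
typedef struct { int depth; int calls; } lock_stats;

static void stats_lock(void *user)
{
    lock_stats *s = user;
    s->depth++;
    s->calls++;
}

static void stats_unlock(void *user)
{
    lock_stats *s = user;
    s->depth--;
}

/* Stand-in for a shaping call: reports the lock depth it observed. */
static int fake_shape(void *arg)
{
    const lock_stats *s = arg;
    return s->depth;
}
```

In a real build the callbacks would wrap an actual mutex; reusing the lock that already guards FreeType means shaping and glyph loading cannot run concurrently with each other either.
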
tor8 Robin_Watts: one potential problem with hacking malloc #defines is if someone uses both mupdf and harfbuzz libraries19:07.13 
  they'll end up with harfbuzz from mupdf, that expects hacked things to happen19:07.33 
  or we won't be able to use system harfbuzz19:08.05 
Robin_Watts tor8: Well, I could do a systematic malloc/free/calloc replacement with hb_...19:08.17 
tor8 better to ask upstream to fix it, but I seriously doubt they'll be willing19:08.36 
  since it'd break both source and binary compatibility, unless they do hacks like you propose :)19:08.52 
Robin_Watts Having looked at the amount of work it would be to do it nicely - yes.19:09.09 
  We *could* do a new header, that's included at the top of hb.h19:09.55 
  and that #defines all the API from being hb_blah to being fz_hb_blah19:10.17 
  So our harfbuzz lib would not conflict with the system one.19:10.33 
tor8 Robin_Watts: I'd just extend the hb api with a global allocator function+context and #define malloc/calloc/free calls in our hb build19:12.42 
  if someone uses both mupdf and harfbuzz, I hope they won't mind using the mupdf allocator for harfbuzz...19:13.57 
  I wonder how many, or even if any, of our customers use custom allocators19:14.26 
Robin_Watts tor8: That's fine until they call harfbuzz without mupdf being inited (or having closed down) and get a duff context.19:14.50 
tor8 well, we'd initialize it with system malloc defaults19:15.47 
  that could go wrong if they use it a bit, then use mupdf which installs its own, and then get mixed allocations :(19:16.24 
Robin_Watts I'll have a play.19:17.17 
tor8 well, if we're just hacking it for us, have the mupdf code install and uninstall allocators when we use it19:18.39 
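
The "install and uninstall allocators when we use it" approach could look roughly like this. All names are hypothetical: `toy_hb_malloc` and `toy_hb_free` stand for what a patched build's malloc and free #defines would route to, and the counting allocator exists only to make the effect visible.

```c
#include <stdlib.h>

/* Hypothetical allocator hook: a context pointer plus malloc/free callbacks. */
typedef struct {
    void *user;
    void *(*malloc_fn)(void *user, size_t size);
    void (*free_fn)(void *user, void *ptr);
} toy_allocator;

static void *sys_malloc(void *user, size_t size) { (void)user; return malloc(size); }
static void sys_free(void *user, void *ptr) { (void)user; free(ptr); }

/* Default to the system allocator so calls made outside our code still work. */
static const toy_allocator system_allocator = { NULL, sys_malloc, sys_free };
static const toy_allocator *current_allocator = &system_allocator;

/* What the library's allocation calls would be redirected to. */
static void *toy_hb_malloc(size_t size)
{
    return current_allocator->malloc_fn(current_allocator->user, size);
}

static void toy_hb_free(void *ptr)
{
    current_allocator->free_fn(current_allocator->user, ptr);
}

/* Install an allocator and return the previous one so it can be restored. */
static const toy_allocator *install_allocator(const toy_allocator *a)
{
    const toy_allocator *prev = current_allocator;
    current_allocator = a;
    return prev;
}

/* Counting allocator used only for demonstration. */
typedef struct { int live; } alloc_counter;

static void *counting_malloc(void *user, size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        ((alloc_counter *)user)->live++;
    return p;
}

static void counting_free(void *user, void *ptr)
{
    if (ptr != NULL)
        ((alloc_counter *)user)->live--;
    free(ptr);
}
```

The mixed-allocation hazard raised above still applies: a block must be freed while the allocator that created it is installed. Because the hook is a global, this sketch also assumes calls are already serialised, for example by a lock.
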
Robin_Watts tor8: Updates on robin/mupdf.git/harfbuzz and robin/harfbuzz/artifex19:59.09 
  I haven't fixed the style font thing yet. Will do that tomorrow.19:59.23 