| | 20160125 |
Robin_Watts | tor8: I have the draw code written that uses harfbuzz to shape the chars as they render, but it's failing to actually render any r2l text for me as it's not falling back for me. | 11:18.56 |
| Is there a quick change I can do to make it use the fallback font to start with rather than Times-Roman so I can test this bit of the code? | 11:19.45 |
tor8 | Robin_Watts: add a user.css with @font-face { font-family: serif; src: url("resources/fonts/droid/DroidSansFallback.ttc"); } | 11:21.16 |
| with the code on tor/master | 11:21.33 |
| oh wait, did I push that to origin already? I'm still monday morning confused. | 11:21.52 |
Robin_Watts | I get hebrew but not arabic. | 11:23.23 |
| DroidSansFallbackFull ? | 11:24.06 |
tor8 | Robin_Watts: shouldn't make a difference... | 11:24.17 |
Robin_Watts | That gives me arabic too. | 11:24.35 |
tor8 | Robin_Watts: try FreeSerif.ttf from http://ftp.gnu.org/gnu/freefont/freefont-ttf-20120503.zip | 11:24.44 |
| hm, I might've messed up the encoding when creating DroidSansFallback.ttf ... we only use FallbackFull using the default build | 11:25.15 |
Robin_Watts | OK, so I can't substitute individual chars into harfbuzz when it's shaping. | 11:25.21 |
tor8 | Robin_Watts: there's no way to tell harfbuzz to pick among a set of available fonts? | 11:26.17 |
Robin_Watts | Not that I can see. | 11:26.33 |
tor8 | and harfbuzz shapes a string with only one font at a time? | 11:26.45 |
Robin_Watts | You give harfbuzz an ft font handle, and say use this. | 11:26.50 |
| yes. | 11:26.52 |
| tor8: So I have a plan. | 11:26.57 |
| When I measure the nodes, I have to pass each word in turn to harfbuzz to calculate the bbox. | 11:27.19 |
| If there are chars missing in the reply I get back from harfbuzz, I will retry with the fallback font. | 11:27.43 |
| The question is, do I split the node around missing chars, or just fall the whole node back ? | 11:28.31 |
| I think the latter for now. | 11:29.47 |
tor8 | I'd go with the latter for now | 11:31.30 |
| we probably ought to pre-scan the word and select a font for each range | 11:33.08 |
| what's the difference between a hb_font and an hb_face? | 11:33.52 |
Robin_Watts | If we have a word ABCDE, and C is missing in our default font, then we could split into AB C DE, but if C requires shaping, then we'd get better results using ABCDE in the fallback font. | 11:34.41 |
tor8 | Robin_Watts: yeah, if ABCDE and C is missing in the default font, but the fallback font that has C also has AB and DE we should use that fallback font for the whole ABCDE string | 11:35.22 |
| consider that we will want to have a chain of fallback fonts, if we say use the noto fonts | 11:35.44 |
| or maybe not a chain, just a big bag of fallback fonts | 11:35.58 |
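
The "fall the whole word back" rule agreed above can be sketched in plain C. This is only an illustration: `word_font`, `word_font_has` and `pick_font_for_word` are hypothetical names, and coverage is faked with a single codepoint range instead of a real cmap lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a font: a name, one inclusive codepoint range
 * it covers, and a link to the next font in the fallback chain. */
typedef struct word_font {
    const char *name;
    uint32_t first, last;
    struct word_font *fallback;
} word_font;

/* Assumption: real code would ask the font's cmap instead. */
static int word_font_has(const word_font *font, uint32_t cp)
{
    return cp >= font->first && cp <= font->last;
}

/* Return the first font in the chain that covers every codepoint of the
 * word, or NULL if no single font does. The word is never split, so any
 * shaping across A B C D E happens within one font. */
static const word_font *pick_font_for_word(const word_font *chain,
                                           const uint32_t *word, size_t len)
{
    const word_font *font;
    size_t i;

    assert(word != NULL || len == 0);
    for (font = chain; font != NULL; font = font->fallback) {
        for (i = 0; i < len; i++)
            if (!word_font_has(font, word[i]))
                break;
        if (i == len)
            return font;
    }
    return NULL;
}
```

A "bag" of fallback fonts rather than a chain would only change the outer loop's iteration order, not the all-or-nothing test per word.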
Robin_Watts | Currently I'm calling hb_ft_font_create(face, NULL);, where face is the FT_Face | 11:36.10 |
tor8 | are you using hb_buffer_guess_segment_properties? | 11:36.41 |
Robin_Watts | That calls hb_ft_face_create internally to get an hb_face_t, and calls hb_font_create with that. | 11:37.08 |
| I am not. | 11:37.10 |
bofh_ | 7 | 11:37.31 |
Robin_Watts | tor8: but I could. | 11:38.29 |
| That doesn't affect font choice though. | 11:38.56 |
tor8 | hm, hb_font has a parent field | 11:39.00 |
| no, it just fills in the scripts and languages so that opentype knows what to do for each range | 11:39.18 |
| hb_font_set_parent and hb_font_create_sub_font ... wonder what they're doing | 11:40.41 |
Robin_Watts | I think I need to make some tweaks to stuff for mirroring. | 11:41.29 |
| Some chars (like '(' ) get mirrored when they are used in a r2l context. | 11:41.54 |
| I think I want to keep a 'mirror' bit for every char in the nodes. | 11:42.29 |
tor8 | shouldn't harfbuzz take care of the actual glyph mirroring? | 11:43.01 |
Robin_Watts | I've been pondering that. | 11:43.26 |
tor8 | might be worth testing | 11:43.44 |
Robin_Watts | Actually, I think it does. That's a relief. | 11:45.04 |
| tor8: Any objection to me adding an hb_font_face_t to every fz_font ? | 11:56.32 |
| Currently I'm creating them every time the font changes, but that's not ideal, obviously. | 11:57.17 |
tor8 | Robin_Watts: go ahead, as long as it's null until needed | 11:57.48 |
Robin_Watts | I guess ultimately we should virtualise a lot of the stuff in fz_font. | 11:58.06 |
tor8 | I expect we'll want to do layout with this stuff when laying out pdf font annotations eventually | 11:58.12 |
Robin_Watts | tor8: yeah, maybe. | 11:58.25 |
tor8 | the guts of fz_font are nasty and messy | 11:58.49 |
Robin_Watts | tor8: Yes. We have lots of functions that do: "If this is a freetype font, call the freetype variant, if it's a type3 call the type3 variant" etc. | 12:00.23 |
| and those would be nicer as virtualised things. | 12:00.34 |
tor8 | Robin_Watts: yes. | 12:01.09 |
Robin_Watts | I expect that with michael fiddling in this area for the pdf device, the internals might get juggled a bit. | 12:01.22 |
tor8 | Robin_Watts: here's an idea for dealing with the fallback font issue: | 12:10.51 |
| we first scan the run trying to encode each character using the desired font, and if that succeeds all is good | 12:11.14 |
| if not, we then use harfbuzz/ucdn to find runs of specific unicode scripts, and based on the script we look up a fallback font | 12:11.54 |
Robin_Watts | ok. The first step is equivalent to handing it to harfbuzz, saying 'shape' and then checking that none of the codepoints it gets back is 0. | 12:12.19 |
tor8 | yeah, that's probably the same but possibly faster because we don't need to run the shaping | 12:12.59 |
| the other approach, which I think is what web browsers might be doing, is to keep a list of unicode ranges that each font covers | 12:13.37 |
Robin_Watts | It's interesting that if I pass arabic to hb with Times-Roman, I get a list of shaped glyphs back, all with 0 codepoints - but with offsets etc set for shaping. | 12:13.48 |
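
A minimal sketch of the "check that nothing came back as 0" test described above. The `shaped_glyph` struct is a hypothetical stand-in; with real HarfBuzz the array would come from `hb_buffer_get_glyph_infos`, where the `codepoint` field holds the glyph index after shaping and index 0 is the font's missing-glyph slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the two fields of a shaped glyph that matter here. */
typedef struct {
    uint32_t glyph_id; /* glyph index in the font; 0 means "not found" */
    uint32_t cluster;  /* index of the source character */
} shaped_glyph;

/* Count glyphs the font could not supply. A non-zero result is the signal
 * to retry the same word with the fallback font. */
static size_t count_missing_glyphs(const shaped_glyph *glyphs, size_t count)
{
    size_t i, missing = 0;

    assert(glyphs != NULL || count == 0);
    for (i = 0; i < count; i++)
        if (glyphs[i].glyph_id == 0)
            missing++;
    return missing;
}
```

As noted in the log, a shaper may still return offsets and advances for missing glyphs, so the glyph index is the thing to test, not whether shaping "succeeded".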
tor8 | which is more automatic, if you just hand the browser a directory of fonts from the system, it'll sort them out and have some way of automatically finding a proper fallback font | 12:14.18 |
Robin_Watts | I need to get hb to do the shaping cos I need to get the bbox for the shaped glyphs. | 12:14.31 |
tor8 | I think harfbuzz does some generic arabic shaping without using opentype tables if there are none | 12:14.51 |
Robin_Watts | Having script based fallbacks does sound the right way to go, ultimately. | 12:14.55 |
tor8 | using unicode presentation forms | 12:14.59 |
| script based fallbacks is more robust, if we can control the set of fonts used | 12:15.15 |
| which is something I definitely expect we'll want | 12:15.24 |
| Robin_Watts: I was thinking as a pre-pass before we do the shaping to measure | 12:15.46 |
| just to figure out the font to use for runs handed off for shaping | 12:16.03 |
Robin_Watts | Well, I figure the shaping will be enough 90% of the time. | 12:16.12 |
tor8 | if we run the hb_buffer_guess_segment_properties thing we could use that to look up the scripts | 12:16.27 |
Robin_Watts | tor8: I could pre-break the fragments at script changes. | 12:17.05 |
tor8 | Robin_Watts: that could work; shaping doesn't work past script changes either IIRC | 12:17.41 |
Robin_Watts | I can't see how it would. | 12:17.49 |
tor8 | s/could/should/ :) | 12:18.19 |
| Robin_Watts: ICU seems to have better documentation, and there's an ICU to harfbuzz bridge project that exposes the ICU interface with harfbuzz as a back-end | 12:18.51 |
| that might be worth reading to figure out what harfbuzz actually does | 12:19.00 |
Robin_Watts | hmm. That would end up with punctuation ending up in different fragments. | 12:19.22 |
| unless we have punctuation being 'script neutral' or something ? | 12:19.39 |
tor8 | ICU also requires a run of a single font in a single script for its shaping | 12:19.41 |
| punctuation is script neutral, I think the guess properties thing tries to resolve the punctuation | 12:20.04 |
| better read the code to make sure though | 12:20.09 |
| "Clients can use ICU's Bidi processing to determine the direction of the text and use the ScriptRun class in icu/source/extra/scrptrun to find a run of text in the same script." from the ICU docs | 12:20.29 |
| "The ICU LayoutEngine is designed to process a run of text which is in a single font. It is written in a single direction (left-to-right or right-to-left), and is written in a single script." and I expect the same of harfbuzz | 12:20.44 |
| hm, harfbuzz expects each hb_buffer_t to be in a single script so the guess_segment_properties call just finds the first non-neutral script in the buffer | 12:24.21 |
Robin_Watts | yeah. | 12:24.44 |
tor8 | so splitting the fragments at script changes is something we'll actually need to do rather than want | 12:24.55 |
Robin_Watts | Oh god, the thought of moving over to ICU has just drained me of all energy. | 12:25.08 |
tor8 | no no no! I don't want to do that at all. | 12:25.42 |
| but it could be worth looking at the ICU docs and bridge to understand what harfbuzz's undocumented code is trying to do | 12:26.16 |
Robin_Watts | ICU may have a more up to date UAX #9 implementation. | 12:26.17 |
| I'm happy with harfbuzz as is (other than being in C++) | 12:26.35 |
tor8 | ICU is all c++ though | 12:26.38 |
| with a C++ interface, unless I'm mistaken | 12:26.50 |
Robin_Watts | I may try a C++ -> C conversion for harfbuzz later. | 12:27.04 |
tor8 | harfbuzz being C++ bothers me, but I'm okay with it since the visible interface is still c | 12:27.17 |
| ICU has a scriptrun class they recommend for finding runs of the same script in text | 12:28.52 |
| http://userguide.icu-project.org/layoutengine | 12:28.56 |
Robin_Watts | tor8: I'll just call ucdn_get_script where I am already breaking fragments in the bidi handling code. | 12:29.25 |
tor8 | yeah, I think it's just a matter of spreading/infecting the common/inherited scripts | 12:29.48 |
Robin_Watts | I'm going to assume that the punctuation is going to be UCDN_SCRIPT_COMMON | 12:30.41 |
tor8 | Robin_Watts: all punctuation is UCDN_SCRIPT_COMMON, by my reading of http://www.unicode.org/Public/8.0.0/ucd/Scripts.txt | 12:35.00 |
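
One way to read "spreading the common script" is sketched below. It is an assumption-laden toy: `toy_get_script` classifies only a few ranges and stands in for `ucdn_get_script`, and the rule that script-neutral characters join the current run (or, at the start, adopt the script of the first real character) is one plausible policy, not necessarily the one used on the branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SCRIPT_COMMON, SCRIPT_LATIN, SCRIPT_HEBREW, SCRIPT_ARABIC } toy_script;

/* Hypothetical classifier covering only a few ranges for illustration. */
static toy_script toy_get_script(uint32_t cp)
{
    if ((cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z'))
        return SCRIPT_LATIN;
    if (cp >= 0x05D0 && cp <= 0x05EA)
        return SCRIPT_HEBREW;
    if (cp >= 0x0620 && cp <= 0x064A)
        return SCRIPT_ARABIC;
    return SCRIPT_COMMON; /* punctuation, spaces, digits, everything else */
}

typedef struct {
    size_t start, len;
    toy_script script;
} script_run;

/* Split text into single-script runs. Neutral characters extend the
 * current run; a run that so far holds only neutral characters takes the
 * script of the first non-neutral character that follows. Returns the
 * number of runs written; text that would need more than max_runs runs
 * is truncated. */
static size_t split_script_runs(const uint32_t *text, size_t len,
                                script_run *runs, size_t max_runs)
{
    size_t n = 0, i;

    assert(text != NULL || len == 0);
    for (i = 0; i < len; i++) {
        toy_script s = toy_get_script(text[i]);
        if (n > 0 && (s == SCRIPT_COMMON ||
                      runs[n - 1].script == SCRIPT_COMMON ||
                      s == runs[n - 1].script)) {
            if (runs[n - 1].script == SCRIPT_COMMON)
                runs[n - 1].script = s;
            runs[n - 1].len++;
            continue;
        }
        if (n == max_runs)
            break;
        runs[n].start = i;
        runs[n].len = 1;
        runs[n].script = s;
        n++;
    }
    return n;
}
```

Each resulting run can then be handed to the shaper as one buffer with one script, which is what the ICU and HarfBuzz notes quoted earlier require.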
Robin_Watts | ok, I have it splitting into runs of punctuation + script then. | 12:41.51 |
| tor8: Is the mupdf viewer supposed to accept user css too? | 13:29.33 |
| Aha, fixed it. | 13:31.24 |
tor8 | Robin_Watts: yes. -U flag should work on both -x11/win32 and -gl viewers | 13:33.27 |
Robin_Watts | The win32 one was reading layout_css and then not doing anything with it :) | 13:33.54 |
tor8 | ah! | 13:34.00 |
| oops. | 13:34.16 |
| you should be using the -gl viewer though ;) | 13:34.23 |
Robin_Watts | Me so dinosaur. | 13:35.29 |
| Ok, tor8. Changes on robin/harfbuzz. | 13:51.59 |
| See what you think. | 13:52.05 |
Robin_Watts | lunches | 13:52.08 |
tor8 | Robin_Watts: "This does mean that the fallback is reflected in the font that finally makes it through to the device interface." is true today as well | 13:55.15 |
Robin_Watts | tor8: ah, ok, so no worse then. | 14:26.29 |
tor8 | fz_encode_character_with_fallback fills out the &font with the actual fallbacked font for the glyph to use | 14:27.39 |
| the code on robin/master doesn't look like it's using the fallback font (or I'm blind) | 14:29.15 |
Robin_Watts | robin/harfbuzz ? | 14:29.38 |
| tor8: font = font->fallback in measure_word | 14:29.57 |
tor8 | oh, I was looking in draw_word | 14:33.00 |
| ouch, you overwrite the node->style->font | 14:33.07 |
| I've got to go out for a couple of hours, need to help a friend move some stuff. | 14:34.06 |
| Robin_Watts: the node->style is potentially shared by all the flow nodes in the same paragraph | 14:39.48 |
Robin_Watts | tor8: ok, so that needs to be fixed. | 14:40.11 |
| Will need to talk to you about that when you get back. | 14:53.11 |
tor8 | Robin_Watts: I'm back. | 17:59.05 |
Robin_Watts | tor8: I had a cunning idea about the style stuff. | 18:54.40 |
| I'm going to add a fallback pointer to each style | 18:55.09 |
| So for every style, there can be at most as many fallback equivalent styles as there are links in the fallback chain. | 18:55.50 |
| But I've spent the afternoon looking at harfbuzz. | 18:56.36 |
tor8 | so you'll clone each style if there's a use of a fallback font? | 18:56.39 |
Robin_Watts | tor8: Yes. | 18:56.45 |
tor8 | I'd rather just stick a fz_font in the fz_html_flow node | 18:56.56 |
| or an index into a "global" array of fonts | 18:57.21 |
| or just duplicate the find_best_font stuff | 18:57.41 |
Robin_Watts | tor8: We could have an 'index into the fallback list' ? | 18:57.52 |
| so for a given style, 0 would be style->font, 1 would be style->font->fallback etc. | 18:58.22 |
tor8 | one idea I had was to use a fz_font_set instead of a fz_font in the html code | 18:58.36 |
| so style->font would be style->fontset and each flow node could have an index into the font set | 18:58.52 |
| where the font set would be an array of fonts; the desired fz_font, and one fz_font per script we support for fallback | 18:59.26 |
| but an index into the fallback list would serve the same purpose, yes | 18:59.48 |
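
The "index into the fallback list" idea can be sketched as follows. All type and field names here (`chain_font`, `toy_style`, `toy_flow`, `fallback_index`) are hypothetical stand-ins for the real font, style and flow-node structures; the point is only that the shared style stays untouched while each flow node records how far along the chain it had to go.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical font with a link to its fallback. */
typedef struct chain_font {
    const char *name;
    struct chain_font *fallback;
} chain_font;

/* Hypothetical style, potentially shared by many flow nodes. */
typedef struct {
    chain_font *font;
} toy_style;

/* Hypothetical flow node: 0 means style->font, 1 means
 * style->font->fallback, and so on. */
typedef struct {
    toy_style *style;
    int fallback_index;
} toy_flow;

/* Resolve the font a node should be measured and drawn with. Returns NULL
 * if the index runs past the end of the chain. */
static chain_font *flow_font(const toy_flow *node)
{
    chain_font *font;
    int i;

    assert(node != NULL && node->style != NULL);
    font = node->style->font;
    for (i = 0; i < node->fallback_index && font != NULL; i++)
        font = font->fallback;
    return font;
}
```

Because measuring and drawing both resolve the font through the same function, neither needs to overwrite the font stored in the shared style.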
Robin_Watts | ideally, we'd generalise stuff a bit, so that we didn't lose information going via the device interface. | 18:59.48 |
| OK, I'll do the index. | 19:00.03 |
| Converting HarfBuzz to C is possible, but it'll be very hard going. | 19:00.18 |
| Harfbuzz makes lots of use of templates. | 19:00.31 |
tor8 | Robin_Watts: I mentioned that we use harfbuzz to sebras, and his first words were "are you going to rewrite it in C?" | 19:00.50 |
Robin_Watts | converting away from that will be time consuming. | 19:00.57 |
| If we had a fortnight to invest, I'd say we could do it. | 19:01.08 |
tor8 | and then we'll be stuck with that version forever; backporting fixes could be troublesome | 19:01.22 |
Robin_Watts | There is lots of stuff in there that is overly defensive programming. | 19:01.38 |
| Yes, so I stopped doing that. | 19:01.50 |
| But it calls malloc/free/calloc, so I looked for a way to solve that. | 19:02.04 |
| Updating the code so it passes a heap pointer around (and then can call hb_free/hb_malloc/hb_calloc) was what I tried. | 19:02.43 |
| But that explodes out of control. Again it will make taking fixes on hard. | 19:02.59 |
| So I am tempted to use some #define malloc hb_malloc magic in the hb.h file, and wrap calls to hb. | 19:03.52 |
| The wrapper calls on windows/unix would use thread local storage. | 19:04.30 |
tor8 | hb as I've built it is not thread safe (I didn't investigate too much, I just set -DHB_NO_MT) | 19:04.39 |
| Robin_Watts: yeah, that ought to work. | 19:05.18 |
| not convinced it's worth spending a lot of time on though | 19:05.33 |
Robin_Watts | tor8: Actually... a better solution would be to lock/unlock a mutex and then call harfbuzz. | 19:05.52 |
tor8 | just reuse our freetype lock | 19:06.13 |
Robin_Watts | Nice, yes. | 19:06.20 |
| Ok, I'll do that. | 19:06.31 |
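
A rough sketch of serialising calls into a non-thread-safe library behind one lock. Everything here is a single-threaded stand-in: `toy_mutex` only tracks whether it is held, and `toy_shape` is a hypothetical placeholder for the shaping call. In MuPDF itself the equivalent would presumably be `fz_lock(ctx, FZ_LOCK_FREETYPE)` and `fz_unlock(ctx, FZ_LOCK_FREETYPE)` around the HarfBuzz calls.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for a real mutex: records state so misuse trips an assert. */
typedef struct {
    int held;
    int acquisitions;
} toy_mutex;

static void toy_mutex_lock(toy_mutex *m)
{
    assert(!m->held); /* non-recursive: must not already be held */
    m->held = 1;
    m->acquisitions++;
}

static void toy_mutex_unlock(toy_mutex *m)
{
    assert(m->held);
    m->held = 0;
}

typedef struct {
    toy_mutex *lock;
} toy_context;

/* Placeholder for the library call that is not thread safe. It insists
 * that the caller holds the lock. */
static int toy_shape(const toy_context *ctx, const char *text)
{
    assert(ctx->lock->held);
    return (int)strlen(text);
}

/* The only entry point callers use: take the lock, shape, release. */
static int shape_locked(toy_context *ctx, const char *text)
{
    int n;

    toy_mutex_lock(ctx->lock);
    n = toy_shape(ctx, text);
    toy_mutex_unlock(ctx->lock);
    return n;
}
```

Reusing an existing lock keeps the number of locks down, at the cost that font rasterisation and shaping can no longer run concurrently with each other.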
tor8 | Robin_Watts: one potential problem with hacking malloc #defines is if someone uses both mupdf and harfbuzz libraries | 19:07.13 |
| they'll end up with harfbuzz from mupdf, that expects hacked things to happen | 19:07.33 |
| or we won't be able to use system harfbuzz | 19:08.05 |
Robin_Watts | tor8: Well, I could do a systematic malloc/free/calloc replacement with hb_... | 19:08.17 |
tor8 | better to ask upstream to fix it, but I seriously doubt they'll be willing | 19:08.36 |
| since it'd break both source and binary compatibility, unless they do hacks like you propose :) | 19:08.52 |
Robin_Watts | Having looked at the amount of work it would be to do it nicely - yes. | 19:09.09 |
| We *could* do a new header, that's included at the top of hb.h | 19:09.55 |
| and that #defines all the API from being hb_blah to being fz_hb_blah | 19:10.17 |
| So our harfbuzz lib would not conflict with the system one. | 19:10.33 |
tor8 | Robin_Watts: I'd just extend the hb api with a global allocator function+context and #define malloc/calloc/free calls in our hb build | 19:12.42 |
| if someone uses both mupdf and harfbuzz, I hope they won't mind using the mupdf allocator for harfbuzz... | 19:13.57 |
| I wonder how many, or even if any, of our customers use custom allocators | 19:14.26 |
Robin_Watts | tor8: That's fine until they call harfbuzz without mupdf being inited (or having closed down) and get a duff context. | 19:14.50 |
tor8 | well, we'd initialize it with system malloc defaults | 19:15.47 |
| that could go wrong if they use it a bit, then use mupdf which installs its own, and then get mixed allocations :( | 19:16.24 |
Robin_Watts | I'll have a play. | 19:17.17 |
tor8 | well, if we're just hacking it for us, have the mupdf code install and uninstall allocators when we use it | 19:18.39 |
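
The "global allocator function+context, installed and uninstalled around use" idea might look roughly like this. All names are hypothetical: `toy_hb_malloc` and `toy_hb_free` represent what `malloc` and `free` would be `#define`d to inside a private library build, and the counting allocator exists only to make the switch observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hook: callbacks plus an opaque context. */
typedef struct {
    void *user;
    void *(*malloc_fn)(void *user, size_t size);
    void (*free_fn)(void *user, void *ptr);
} toy_alloc;

static void *sys_malloc(void *user, size_t size) { (void)user; return malloc(size); }
static void sys_free(void *user, void *ptr) { (void)user; free(ptr); }

/* Default: plain system malloc, so the library works before any install. */
static const toy_alloc sys_alloc = { NULL, sys_malloc, sys_free };
static const toy_alloc *current_alloc = &sys_alloc;

/* What the library's internal malloc/free calls would be redirected to. */
static void *toy_hb_malloc(size_t size)
{
    return current_alloc->malloc_fn(current_alloc->user, size);
}

static void toy_hb_free(void *ptr)
{
    current_alloc->free_fn(current_alloc->user, ptr);
}

/* Install an allocator (NULL restores the system default) and return the
 * previous one so the caller can put it back afterwards. */
static const toy_alloc *toy_install_alloc(const toy_alloc *a)
{
    const toy_alloc *prev = current_alloc;
    current_alloc = a ? a : &sys_alloc;
    return prev;
}

/* Demo allocator: counts live blocks in the int that user points to. */
static void *counting_malloc(void *user, size_t size) { ++*(int *)user; return malloc(size); }
static void counting_free(void *user, void *ptr) { --*(int *)user; free(ptr); }
```

This does nothing about the mixed-allocation hazard raised above: a block must be freed while the allocator that produced it is installed, and a global hook like this is not safe without the surrounding lock.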
Robin_Watts | tor8: Updates on robin/mupdf.git/harfbuzz and robin/harfbuzz/artifex | 19:59.09 |
| I haven't fixed the style font thing yet. Will do that tomorrow. | 19:59.23 |