| 20160126 |
keramis | Hi, on stackoverflow I found "Fontmap {exch ==only ( ) print ==} forall". It does exactly what I was looking for, but couldn't find any docs for '==only'. Can anybody give me a clue? | 09:00.45 |
kens | There are no docs; it's a Ghostscript-specific extension to the PostScript language | 09:01.14 |
keramis | I see. Do you know what this exactly does? Just curious... | 09:03.04 |
kens | It's defined in gs_init.ps; if you are familiar with PostScript you can figure it out | 09:03.23 |
keramis | Thanks, I'll check it out. | 09:03.54 |
| Enlightening, if you know where to look. :) BTW that thread on stackoverflow helped me a lot (https://stackoverflow.com/questions/11137732/what-are-postscript-dictionaries-and-how-can-they-be-accessed-via-ghostscript/11144359#11144359) to understand what is going on "under the hood". Maybe it should be incorporated in the docs or at least this link. | 09:17.30 |
kens | Why ? It all appears to be standard PostScript, and the command line switches are documented | 09:18.18 |
| Also, if it's not documented, you're not supposed to meddle with it; we will change undocumented stuff without warning. | 09:19.33 |
kens | coffees | 09:22.39 |
keramis | I understand this. But for me as a beginner this thread helped me a lot to get things up and running quickly. Reading the 912-page PLRM also helps of course, but is not nearly as quick! ;) | 09:23.53 |
kens | There are other resources, but our documentation is not intended as a beginner's guide to PostScript | 09:26.40 |
keramis | Thanks for clarification anyway! | 09:28.43 |
| Apropos command line switches: I couldn't find the -c switch documented in the manpage of gs(1). It's easy to find out what it does, but is this intended? | 09:42.30 |
kens | Forget man pages, use our documentation | 09:42.50 |
| chrisl finally got dictionaries working....... | 09:57.18 |
| I may actually be starting to understand param lists :-( | 09:57.38 |
chrisl | kens: cool | 09:57.39 |
| I doubt that! | 09:58.05 |
kens | Well, I'll never understand the rationale | 09:58.21 |
| Found another little 'gotcha' | 09:58.30 |
| C param lists (and as far as I can see *only* C param lists) can have a 'target' | 09:58.51 |
chrisl | target?? | 09:59.20 |
kens | If you fail to find the key in the list, then you search the target, and of course if that's a C param list and you fail to find the key there, then you search its target and so on | 09:59.23 |
| The target is another param list | 09:59.35 |
| It only seems to exist in C param lists | 09:59.50 |
| Crazy implementation details | 10:00.11 |
chrisl | Why is it called "target" and what purpose does it serve? | 10:00.31 |
kens | Now you are asking.... | 10:00.42 |
| For purpose, it allows you to effectively aggregate dictionaries (or other collections) by pointing the 'target' of the list to the new list | 10:01.21 |
| In this case, it's 'wrapping' the image params dictionary with an outer dictionary which contains some extra values, like the Rows and Columns | 10:02.05 |
chrisl | So, it's a hack, basically? | 10:02.30 |
kens | As far as I can tell, yes | 10:02.41 |
| It's not reliable: if you have a param list that already has a target, and that target is not a C param list, then you simply can't replace the existing target | 10:03.15 |
| Basically as you say an ugly hack | 10:03.53 |
| But then that describes param lists pretty accurately throughout | 10:04.10 |
chrisl | It also doesn't actually replicate the behaviour of dictionaries, which is confusing | 10:04.39 |
kens | Well it 'sort of' does, it stops searching when it finds the first (most recent) definition for a key | 10:05.18 |
| But you can't 'undef' a key that way | 10:05.36 |
| Frankly it's crap | 10:05.55 |
| I'm guessing it was put in as a quick solution for a problem, just like usual | 10:06.23 |
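The 'target' lookup described above can be sketched in a few lines of C. This is only an illustration of the chained search (look in the list, then in its target, then in that list's target, stopping at the first hit). The `sketch_list` and `sketch_entry` types and the `sketch_lookup` function are hypothetical names made up for this note and are not Ghostscript's actual param list structures or API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a C param list; NOT Ghostscript's real types. */
typedef struct sketch_entry {
    const char *key;
    int value;
} sketch_entry;

typedef struct sketch_list {
    const sketch_entry *entries;
    size_t count;
    const struct sketch_list *target; /* searched when a key is missing here */
} sketch_list;

/* Returns 1 and stores the value if found, 0 if no list in the chain has it. */
static int sketch_lookup(const sketch_list *list, const char *key, int *out)
{
    assert(key != NULL && out != NULL);
    for (; list != NULL; list = list->target) {
        for (size_t i = 0; i < list->count; i++) {
            if (strcmp(list->entries[i].key, key) == 0) {
                *out = list->entries[i].value;
                return 1; /* first hit wins; later lists are not consulted */
            }
        }
    }
    return 0;
}
```

Because the search stops at the first match, a key in an earlier list shadows the same key further down the chain, but nothing in this scheme lets an earlier list hide a key that only exists in a later one, which is the 'undef' limitation mentioned above.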
Robin_Watts | tor8: Updated versions of commits online now. | 11:22.29 |
| That sorts the fallback stuff. | 11:22.35 |
| We're never going to have more than 256 fallback fonts, right? :) | 11:22.43 |
tor8 | I certainly hope not! :) | 11:22.58 |
Robin_Watts | I think we need to address line spacing issues. | 11:23.04 |
| This is my quickly hacked together test: http://ghostscript.com/~robin/ManyLang.epub | 11:23.48 |
tor8 | if I take all of the noto fonts in regular style, (excepting the CJK fonts), there are 101 of them | 11:24.20 |
| and they are 7.3Mb | 11:24.44 |
| if we're daring, we could actually embed the lot | 11:24.51 |
Robin_Watts | tor8: Can we combine them into sane sets? | 11:26.05 |
tor8 | if we drop egyptian hieroglyphs and cuneiform we'll save 1M | 11:26.32 |
| I think they might be uncombinable due to use of opentype features etc for harfbuzz | 11:26.56 |
| but we can group them based on unicode scripts | 11:27.16 |
Robin_Watts | I can't immediately think why combining them should be a problem (assuming the tools know about opentype tables). | 11:27.50 |
| and we can't have more than 65535 glyphs in any given font. | 11:28.02 |
tor8 | even if we can, do we really gain anything by it? | 11:28.24 |
Robin_Watts | tor8: I suspect that every font contains the common glyphs. | 11:29.31 |
| You're presumably looking at the unhinted variants of the noto fonts ? | 11:30.48 |
tor8 | yes. | 11:31.19 |
| we don't use hinting in our rendering, so that would be a waste of space | 11:31.32 |
Robin_Watts | Indeed. | 11:31.36 |
| I'm guessing that stuff probably groups into 'CJK' 'Other South East Asia' 'Indic' 'Arabic' 'European' | 11:32.27 |
| Middle Eastern rather than Arabic. | 11:32.46 |
AverageJoe | Hello everyone! I am putting together an Android app for educational uses. I'm no expert by any means: is it possible/legal to use MuPDF to display PDFs within the application? | 11:33.02 |
Robin_Watts | AverageJoe: MuPDF is released under 2 licenses. | 11:33.20 |
| The first is the GNU AGPL. If you can abide by the terms of the GNU AGPL then you can use MuPDF in your application for free. | 11:33.45 |
tor8 | Robin_Watts: ah, you mean so that customers can easily skip entire families of scripts to save space? | 11:33.52 |
Robin_Watts | tor8: Yes. | 11:33.59 |
tor8 | thirdparty/harfbuzz/src/hb-alloc.h:14:17: error: unknown type name 'size_t' | 11:34.04 |
| could be done by #ifdefs in the file that includes the embedded font | 11:34.20 |
| like we do for the CJK CJKNOFULL etc | 11:34.25 |
Robin_Watts | AverageJoe: Those terms include, but are not limited to, the fact that you will have to give away the source code to your entire app to anyone that asks for it that has got a copy of your app. | 11:34.59 |
| AverageJoe: Most app developers looking to make a profit figure that that's a non-starter :) | 11:35.26 |
tor8 | Robin_Watts: needs a #include <stddef.h> in hb-alloc.h | 11:35.49 |
Robin_Watts | So, MuPDF is available under another license that removes all the nasty terms and conditions. That's the Artifex Commercial License. | 11:36.16 |
tor8 | Lock ordering violation: Attempt to take lock 3 when 2 held already! | 11:36.19 |
Robin_Watts | But that will cost you money. | 11:36.21 |
AverageJoe | well no Intentions to make profits here :D i doubt someone would pay for sth like this anyways. | 11:36.41 |
Robin_Watts | AverageJoe: Well, if you're happy to release your app under the AGPL, then yes, you can use MuPDF under that license for free. | 11:37.19 |
AverageJoe | so if i put it on bitbucket or github or sth. and copypaste the gnu licensing hints and links/email to me i am on the safe side basically | 11:37.49 |
Robin_Watts | AverageJoe: Yes, AIUI (but I Am Not A Lawyer) | 11:38.12 |
AverageJoe | guess i have to read into it | 11:38.17 |
| thank you very much so far! | 11:38.34 |
Robin_Watts | no worries. let us know how you get on. | 11:38.42 |
| AverageJoe: There is a new version of MuPDF coming out soon that includes revised Java/JNI code. | 11:39.03 |
| It means you can call MuPDF directly rather than using our example app. | 11:39.33 |
| tor8: crap. How are you getting that? | 11:39.47 |
tor8 | Robin_Watts: mupdf-gl on ManyLang.epub | 11:40.31 |
| with your branch | 11:40.46 |
Robin_Watts | I take and drop the freetype lock around the draw code. | 11:41.15 |
tor8 | the shaped text looks nothing like what firefox renders for ManyLang.epub (when you extract the html file) | 11:41.20 |
Robin_Watts | Does your draw code use the glyphcache lock ? | 11:41.24 |
tor8 | nope, mupdf-gl doesn't use the mupdf font rendering | 11:41.40 |
| it uses freetype directly on its own | 11:41.44 |
Robin_Watts | I will look into it. | 11:42.03 |
| tor8: Shaped text being different - crap. Same font? | 11:42.27 |
tor8 | Robin_Watts: which font are you testing with? | 11:45.21 |
Robin_Watts | DroidSansFallback in mupdf | 11:53.33 |
tor8 | Robin_Watts: something else is fishy. the japanese, chinese and hindi text disappears | 11:55.03 |
Robin_Watts | Hindi disappears, certainly. It's not in the fallback font. | 11:55.33 |
tor8 | it should turn into tofu or bullets, no? | 11:55.48 |
Robin_Watts | tor8: Not currently. | 11:56.01 |
| I only get 1 char of the japanese or chinese text. That's not right :( | 11:56.18 |
tor8 | still doesn't explain why only 1 char of japanese or chinese text appears... | 11:56.30 |
Robin_Watts | tor8: indeed. | 11:56.45 |
tor8 | and the text in "English sentence with ...... in the middle of it." looks nothing like firefox renders it | 11:56.51 |
Robin_Watts | tor8: I'll look into that, but I'd like to have a better fallback font in place first. | 11:57.32 |
| It wouldn't surprise me to find that the shaping stuff in notosansfallback is knackered, so I might be searching in code for what are actually font problems. | 11:58.12 |
tor8 | Robin_Watts: I still get the same characters (not matching firefox) out when using NotoNaskhArabic-Regular.ttf instead of DroidSansFallbackFul.ttc | 11:59.10 |
| and with NotoNaskhArabic-Regular.ttf I get tofu out for Hindi | 11:59.53 |
| still only 1 char each for japanese and chinese | 12:00.01 |
| at least it's the correct 1st char :) | 12:00.23 |
Robin_Watts | Ok, so the locking stuff is happening here. It's down to the draw call taking the glyphcache lock. | 12:00.56 |
| I can fix that. | 12:00.59 |
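For readers unfamiliar with lock-order checking, here is a minimal single-threaded sketch of the kind of debug check that could produce the "Lock ordering violation" message quoted earlier. The rule used (complain when a higher-numbered lock is requested while a lower-numbered one is held) is inferred from that one message, where lock 3 was requested with lock 2 held. The `sketch_` names and the lock count are assumptions for illustration and are not MuPDF's actual code.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_LOCK_MAX 4 /* assumed number of locks, for illustration only */

static int sketch_held[SKETCH_LOCK_MAX]; /* 1 if this lock is currently held */

/* Records that `lock` is taken; returns the number of ordering violations. */
static int sketch_debug_lock(int lock)
{
    int violations = 0;
    assert(lock >= 0 && lock < SKETCH_LOCK_MAX);
    for (int i = 0; i < lock; i++) {
        if (sketch_held[i]) {
            fprintf(stderr, "Lock ordering violation: Attempt to take lock %d "
                            "when %d held already!\n", lock, i);
            violations++;
        }
    }
    sketch_held[lock] = 1;
    return violations;
}

/* Records that `lock` has been released. */
static void sketch_debug_unlock(int lock)
{
    assert(lock >= 0 && lock < SKETCH_LOCK_MAX);
    sketch_held[lock] = 0;
}
```

Under this rule, taking lock 3 and then lock 2 is fine, while taking lock 2 and then lock 3 is reported. That matches the situation described in the chat, where one lock was held around the draw code and the draw call then asked for another.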
| tor8: OK, so new commit on robin/harfbuzz that fixes the locking. | 12:12.37 |
| Do you have a commit that adds the proper fonts I can snaffle? | 12:13.00 |
| brb. | 12:13.08 |
tor8 | Robin_Watts: working on adding a commit where we can do html_lookup_noto_font with a UCDN script tag | 12:21.51 |
Robin_Watts | tor8: Could that maybe be fitz_lookup_font_for_script ? | 12:25.05 |
| a) Doesn't need to be html rather than fitz. | 12:25.22 |
| b) would be nice if other people could slot in other fonts there without the 'noto' bit confusing names. | 12:25.44 |
tor8 | Robin_Watts: yeah, we could put in fitz instead | 12:26.47 |
| might make sense to move the pdf builtin fonts into fitz as well | 12:26.59 |
zoug | hello, any mupdf users here? Do you guys know if it's possible to display two pages side by side? I couldn't find anything on the web | 12:49.54 |
tor8 | zoug: it is not possible | 12:55.56 |
zoug | tor8: :( | 12:56.08 |
| thanks for your help | 12:56.15 |
tor8 | if you're handy with a c compiler, you can probably add to mupdf-gl given an afternoon or two of hacking | 12:57.24 |
Robin_Watts | zoug: To be clear. The MuPDF core is absolutely capable of that. | 12:57.54 |
| But none of our released viewers include that functionality. | 12:58.08 |
zoug | okey! hopefully it'll be added in the future if you guys think it could be good | 12:58.39 |
Robin_Watts | zoug: Check out gsview. | 12:58.56 |
zoug | I personally am not good at all in programming so couldn't help unfortunately | 12:58.57 |
Robin_Watts | That uses mupdf as the view engine, and if it's added anywhere, it will probably be to there. | 12:59.17 |
zoug | okey, will do | 12:59.42 |
AverageJoe | Robin_Watts: any estimates on the java/JNI code? I got an example project that is using it but i am not sure if i understand it. | 13:09.10 |
Robin_Watts | AverageJoe: what do you mean by estimates? | 13:09.30 |
AverageJoe | date :D | 13:09.42 |
Robin_Watts | Oh, right, well, it's in the public git now. | 13:09.59 |
AverageJoe | whoops | 13:10.09 |
Robin_Watts | next release is scheduled for march. | 13:10.12 |
| actually, I lie. | 13:11.23 |
| It's not there yet. | 13:11.33 |
| It is here: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=shortlog;h=refs/heads/jni | 13:12.01 |
| The JNI bindings are in the penultimate commit on that branch. | 13:12.21 |
| The final commit is a work in progress to move the app over to working on top of those new bindings. | 13:12.49 |
| It's there enough to prove that those bindings work, but it's not fully featured yet. | 13:13.09 |
tor8 | Robin_Watts: commits on tor/master to add noto fonts, and load them into the html fallback chain | 13:24.51 |
Robin_Watts | tor8: Ta. | 13:24.59 |
| tor8: What do we need to do to get the JNI changes in? | 13:25.11 |
tor8 | it *should* build on windows as well, but that's untested | 13:25.18 |
Robin_Watts | You were looking at platform/java ? | 13:25.22 |
tor8 | Robin_Watts: I was going to look at platform/java and make a very simple desktop java viewer on top | 13:25.37 |
| but it completely slipped my mind... | 13:25.45 |
| I have goldfish memory when it comes to remembering where I was after the weekend | 13:26.08 |
| and I'd forgotten to write it down in my TODO file :) | 13:26.14 |
Robin_Watts | I'd really like to get this stuff into the release. | 13:26.21 |
tor8 | Robin_Watts: Agreed! | 13:26.28 |
| and I'd also love to have shaping with the noto fonts in the release as well | 13:26.49 |
| the full noto set as compiled looks to add 5.8Mb to the binary | 13:27.26 |
| we have both the sans and serif variants where available, but I dropped the 'bold' | 13:28.07 |
Robin_Watts | tor8: How much would bold and italic (and bold italic) options add? | 13:29.14 |
| If they are easily selectable at build time, then I can imagine that desktop users would just add the lot. | 13:29.53 |
tor8 | it'd double the size | 13:30.45 |
| roughly | 13:30.50 |
Robin_Watts | For desktop use that's nothing. | 13:31.01 |
tor8 | if we drop the serifs (and keep only the sans-regular faces for each script) we're down to 4.7m | 13:33.29 |
| if we take the serif rather than the sans, but still only keep one of the two: 4.8m | 13:34.16 |
Robin_Watts | tor8: In an ideal world, we'd like to be able to say at build time: "include serif", "include sans serif", "include bold", "include italic", "include cjk" etc. | 13:35.02 |
tor8 | Robin_Watts: yeah. | 13:35.12 |
| I'm thinking this compiler must be smarter than usual... it's dropping the static arrays that aren't used | 13:35.30 |
| yeah, if I compile with gcc it dumps in the whole lot | 13:35.58 |
Robin_Watts | Possibly we should have a header file that just resolves a nest of INCLUDE_NOTO_SERIF etc stuff into INCLUDE_NOTO_SANS_BOLD etc. | 13:36.07 |
| and that way people can EITHER use INCLUDE_NOTO_{SANS,SERIF,BOLD,ITALIC} etc, or they can use INCLUDE_NOTO_SANS_BOLD (i.e. explicit fonts) | 13:36.45 |
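The header idea can be sketched as follows: coarse switches are expanded into explicit per-face switches, and the rest of the code only ever tests the explicit ones. `INCLUDE_NOTO_SANS`, `INCLUDE_NOTO_SERIF`, `INCLUDE_NOTO_BOLD` and `INCLUDE_NOTO_SANS_BOLD` follow the pattern mentioned in the chat. The `*_REGULAR` and `*_SERIF_BOLD` names, the face labels and the `sketch_` helpers are hypothetical, and italic is left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Example build configuration (an assumption for this sketch); in a real
 * build these would more likely arrive as -D compiler flags. */
#define INCLUDE_NOTO_SANS
#define INCLUDE_NOTO_BOLD

/* Resolve the coarse switches into explicit per-face switches. */
#if defined(INCLUDE_NOTO_SANS) && !defined(INCLUDE_NOTO_SANS_REGULAR)
#define INCLUDE_NOTO_SANS_REGULAR
#endif
#if defined(INCLUDE_NOTO_SANS) && defined(INCLUDE_NOTO_BOLD) && !defined(INCLUDE_NOTO_SANS_BOLD)
#define INCLUDE_NOTO_SANS_BOLD
#endif
#if defined(INCLUDE_NOTO_SERIF) && !defined(INCLUDE_NOTO_SERIF_REGULAR)
#define INCLUDE_NOTO_SERIF_REGULAR
#endif
#if defined(INCLUDE_NOTO_SERIF) && defined(INCLUDE_NOTO_BOLD) && !defined(INCLUDE_NOTO_SERIF_BOLD)
#define INCLUDE_NOTO_SERIF_BOLD
#endif

/* From here on only the explicit switches are consulted. */
static const char *const sketch_enabled_faces[] = {
#ifdef INCLUDE_NOTO_SANS_REGULAR
    "sans-regular",
#endif
#ifdef INCLUDE_NOTO_SANS_BOLD
    "sans-bold",
#endif
#ifdef INCLUDE_NOTO_SERIF_REGULAR
    "serif-regular",
#endif
#ifdef INCLUDE_NOTO_SERIF_BOLD
    "serif-bold",
#endif
    NULL /* terminator, so the array is never empty */
};

/* Number of faces that were compiled in. */
static int sketch_count_faces(void)
{
    int n = 0;
    while (sketch_enabled_faces[n] != NULL)
        n++;
    return n;
}

/* Label of the index-th compiled-in face. */
static const char *sketch_face_name(int index)
{
    assert(index >= 0 && index < sketch_count_faces());
    return sketch_enabled_faces[index];
}
```

Someone who wants exactly one face can skip the coarse switches and define an explicit switch such as `INCLUDE_NOTO_SANS_BOLD` directly, which is the "EITHER ... or" choice described above.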
tor8 | ah, no, gcc is also smart. it's just dumb if we use the .incbin inline asm directives to speed up compilation | 13:37.04 |
Robin_Watts | tor8: That's strange. | 13:46.00 |
| The ABCDEF japanese text is being converted to A B C D E F before it even gets to the bidi stuff. | 13:46.22 |
tor8 | Robin_Watts: that's because japanese can be line broken between any characters | 13:46.52 |
Robin_Watts | oh, ok. | 13:47.17 |
| That makes more sense then, ta. | 13:47.27 |
| D'Oh. Stupid mistake. | 14:08.15 |
| Japanese/Chinese fixed. | 14:09.25 |
tor8 | Robin_Watts: hm, there's one noto font that doesn't fit the script scheme -- NotoSansSymbols | 14:10.24 |
| that's got all the weird punctuation in it | 14:10.30 |
| should have that as the final fallback I guess | 14:10.36 |
Robin_Watts | tor8: Ah, that's a pain. | 15:06.53 |
| Cos I split fragments so they are common + any one script in each fragment. | 15:07.23 |
| Are all the 'weird punctuation' things common? | 15:07.47 |
| (I mean, are they all classed as "common" rather than appearing commonly) | 15:08.10 |
kens | Still not getting email from the mailing lists :-) | 15:09.04 |
Robin_Watts | kens: I was just thinking that'd be a topic for the meeting. | 15:09.19 |
kens | Yeah I noticed because I got Marcos' bug report and was checking through it before meeting | 15:09.42 |
Robin_Watts | tor8: So, looking at these commits: | 15:23.01 |
| I don't like the fz_lookup_noto_font name. Having 'noto' in there bothers me. Having it as a fallback font would be better, I reckon. | 15:23.55 |
| I'm also not hugely keen on the implementation. Could we consider a table of script/serif/font pointer ? | 15:24.59 |
| When asked for a font for a given script we'd search the table for a matching entry? If we don't have a matching serif, we would live with a matching script. | 15:25.55 |
| We could actually make the table be script/serif/italic/weight/font. | 15:26.17 |
| and that way people can add new fonts just by adding to the table. | 15:26.41 |
| Possibly we could have font pointers be both internal (memory pointers) or external (filenames). | 15:27.18 |
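A table-driven lookup along these lines might look like the sketch below. The script must match, and serif, italic and weight are treated as preferences, so a request with no exact match still gets a font for the right script. The entry layout, the scoring weights, the script constants and the font labels are all assumptions for illustration. They are not UCDN values or MuPDF code, and the `font` string merely stands in for either a memory pointer or a filename.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder script ids for this sketch; NOT the UCDN_SCRIPT_* values. */
enum { SKETCH_SCRIPT_LATIN, SKETCH_SCRIPT_ARABIC, SKETCH_SCRIPT_HANGUL };

typedef struct sketch_font_entry {
    int script;
    int serif;        /* 1 = serif, 0 = sans */
    int italic;       /* 1 = italic, 0 = upright */
    int weight;       /* e.g. 400 regular, 700 bold */
    const char *font; /* stands in for a memory pointer or a filename */
} sketch_font_entry;

/* Adding a font means adding a row here. */
static const sketch_font_entry sketch_table[] = {
    { SKETCH_SCRIPT_LATIN,  0, 0, 400, "latin-sans-regular"  },
    { SKETCH_SCRIPT_LATIN,  1, 0, 400, "latin-serif-regular" },
    { SKETCH_SCRIPT_ARABIC, 0, 0, 400, "arabic-sans-regular" },
};

/* Returns the closest entry for the script, or NULL if the script is absent. */
static const char *sketch_find_fallback(const sketch_font_entry *table, size_t count,
                                        int script, int serif, int italic, int weight)
{
    const char *best = NULL;
    int best_score = -1;
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        const sketch_font_entry *e = &table[i];
        int score;
        if (e->script != script)
            continue; /* the script must match; the rest is a preference */
        score = (e->serif == serif) * 4 + (e->italic == italic) * 2
              + (e->weight == weight);
        if (score > best_score) {
            best_score = score;
            best = e->font;
        }
    }
    return best;
}
```

With this table, asking for a bold italic serif Arabic font returns the only Arabic entry, which is the "live with a matching script" behaviour suggested above.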
marcosw | morning all. are we meeting here or on the other side? | 15:27.38 |
Robin_Watts | Other side, I think. | 15:28.43 |
| tor8: Also, I don't like fz_load_html_fallback_font being html specific. | 15:29.05 |
tor8 | Robin_Watts: I anticipate adding a caching layer where loaded fz_font's get stored and that's where people would add new fonts | 15:32.39 |
Robin_Watts | tor8: An fz_context->font thing ? | 15:33.09 |
tor8 | Robin_Watts: yes, all weird punctuation things are classed as common | 15:33.12 |
| that or fz_html_font_set thing | 15:33.21 |
Robin_Watts | tor8: Does the 'weird symbols' font contain ALL the punctuation? | 15:33.54 |
tor8 | Robin_Watts: no, it does not contain the usual punctuation | 15:36.32 |
Robin_Watts | tor8: urgh. So we might need a wrapper around ucdn's get script call to define the 'weird' stuff as being a different script. | 15:40.07 |
tor8 | Robin_Watts: can we split the runs on final font as well as script | 15:43.46 |
| find the script, use the cmap to lookup the encoding in the fallback chain to get the font | 15:44.10 |
| and make runs of common direction+script+font to feed to harfbuzz? | 15:44.24 |
Robin_Watts | Not trivially, the way it's currently structured. | 15:44.44 |
| The bidi stuff (which is where I am doing the splitting at the moment) doesn't have a font. | 15:45.01 |
tor8 | the actual glyph index we find when encoding the unicode we'd toss, just use it to see if a code point exists in a given font | 15:45.10 |
| instead of looking for a best_font for an entire script, split where you get different fonts? | 15:45.51 |
Robin_Watts | tor8: nodes are already split on styles. | 15:46.51 |
| I look for a best_font on a node. | 15:47.19 |
tor8 | I mean to further split the node by looking up the actual font for each character | 16:01.57 |
Robin_Watts | tor8: Right. I can add code to do that to 'newFragCb' in the html stuff. | 16:02.47 |
tor8 | Robin_Watts: I think we should only load the fallback fonts on an as-needed basis though | 16:03.29 |
Robin_Watts | tor8: Yes. | 16:03.42 |
tor8 | so we should move away from the font->fallback chain to something else that keeps the fallback fonts in a separate struct | 16:03.53 |
Robin_Watts | tor8: So newFragCb gets called with fragments of code that are already split on directionality and script. | 16:04.35 |
| s/code/text/ | 16:04.41 |
| (well, by "script" I mean no fragment contains more than a single script type + punctuation) | 16:05.51 |
tor8 | Robin_Watts: hang on, let me check out your branch | 16:05.52 |
| Robin_Watts: single 'resolved' script type (where common and inherited are handled) | 16:07.11 |
| Robin_Watts: so, I think if you look up each character in that text to find the actual font needed and split the node on those boundaries too? | 16:07.43 |
Robin_Watts | inherited? | 16:08.06 |
tor8 | combining characters have inherited (diacritics, etc) | 16:09.02 |
| COMBINING GRAVE ACCENT, etc | 16:09.25 |
| just treat them the same as common | 16:09.34 |
Robin_Watts | OK. Will do. | 16:09.45 |
| tor8: I wonder whether I should do any subsequent splitting in measure_word | 16:10.06 |
tor8 | I'd just do it here, keep it gathered | 16:10.20 |
Robin_Watts | measure_word is where I attempt a shape on the node and gradually fallback. | 16:10.36 |
tor8 | or not split, just feed smaller spans of each node to shaping | 16:11.00 |
Robin_Watts | I reckon there is a better way to do this. | 16:14.16 |
| In the bidi stuff, where I split fragments before I call newFragCb, I call 'ucdn_get_script' for every char. | 16:14.56 |
| I want to call fz_ucdn_get_script instead, and that function will call ucdn_get_script. If it gets 'common' back, it will further subdivide out the 'weird' stuff. | 16:15.54 |
| Then the code will automatically split the nodes properly. | 16:16.18 |
| I could even return the script type with each fragment. | 16:17.10 |
| Which could be stored in the node in the bitfield for no extra cost. | 16:17.24 |
| Then it's easy to load the right fallback font without searching. | 16:17.45 |
marcosw | i'm going to reboot casper. it is overdue for updates. | 16:18.06 |
| tor8 Robin_Watts sebras: ^^^ | 16:18.29 |
tor8 | the boundary between common and weird-common scripts would depend on the actual font used | 16:20.49 |
kens | chrisl ping | 16:21.08 |
chrisl | kens: pong | 16:23.15 |
kens | Looking into a bug I see a (minor) problem with error reporting | 16:23.35 |
| in stream.h we define the check_file macro which returns gs_error_invalidaccess if the file is invalid, the PLRM says that should be an ioerror | 16:24.11 |
| I propose to change to an ioerror, what do you think ? | 16:24.20 |
| I should say I'm looking at the setfileposition operator which says it should return ioerror if the file is invalid | 16:26.20 |
chrisl | kens: I don't have a problem with that - but I'm a little hazy on what that macro is checking..... | 16:26.40 |
kens | First it checks to see if the type of the object is t_file, then it checks to see if it's invalid (no idea what the details are there) | 16:27.21 |
| If its type is wrong then it returns a typecheck, otherwise it returns invalidaccess | 16:27.40 |
| And I think that *should* be ioerror | 16:27.49 |
chrisl | Yep, that makes sense | 16:28.17 |
kens | I'll try a cluster push now | 16:28.31 |
Robin_Watts | tor8: "the boundary between common and weird-common scripts would depend on the actual font used" howso? | 16:28.42 |
kens | Of course I still have to find out why the file is invalid, but that's a different problem | 16:28.44 |
tor8 | Robin_Watts: what do you consider 'common' and 'weird-common' and why would you want to split at them? | 16:32.27 |
| I thought this was to get around weird punctuation not existing in a given fallback font, so we can drop to another one | 16:32.50 |
| without dropping for the entire word | 16:33.03 |
Robin_Watts | tor8: Any char that is in 'common' should exist in every font, AIUI. | 16:33.23 |
| Any char that is in 'weird-common' might well only live in the symbol font. | 16:33.39 |
| Hence I would split every node to have either common/inherited/a single language. | 16:34.10 |
| and any weird-common stuff goes into their own nodes. | 16:34.29 |
tor8 | Robin_Watts: I think that's a flawed assumption though | 16:35.38 |
| it would be nice if such were the case | 16:35.51 |
| but we're going to have 3 levels of fonts as I see it: the user specified font (regardless of script), which will fall back to a per-script fallback font, which will fall back to a catch-all symbol font | 16:36.45 |
| the user-specified font is the one in the "style" struct | 16:37.31 |
Robin_Watts | tor8: Ok. So, I'm still going to change the bidi stuff to also pass out the 'script' value for each fragment (which may be COMMON). | 16:39.14 |
marcosw | I need to run to a doctor's appointment; will be back later today. | 16:39.33 |
tor8 | Robin_Watts: CJK for example uses different punctuation, which is script common but still sort-of specific to cjk | 16:39.36 |
Robin_Watts | That way we can try the user specified font, and if that fails, we can drop back direct to the script font. | 16:39.53 |
tor8 | Robin_Watts: yeah | 16:40.05 |
Robin_Watts | If the script font STILL misses some chars, I'll split those out, and look for them in the symbol font. | 16:40.23 |
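The three-level scheme (user-specified font, then a per-script fallback, then a catch-all symbol font) and the idea of splitting runs where the resolved font changes can be sketched like this. The toy `sketch_font` covers a single code point range purely so the example is self-contained. A real implementation would consult the font's cmap instead, and none of these names are MuPDF's.

```c
#include <assert.h>
#include <stddef.h>

/* Toy font: covers one inclusive range of code points (an assumption made
 * for this sketch; a real font would be queried through its cmap). */
typedef struct sketch_font {
    const char *name;
    unsigned int first;
    unsigned int last;
} sketch_font;

static int sketch_has_glyph(const sketch_font *font, unsigned int cp)
{
    return font != NULL && cp >= font->first && cp <= font->last;
}

/* Try the user font, then the per-script font, then the symbol font. */
static const sketch_font *sketch_pick_font(const sketch_font *user,
                                           const sketch_font *script_font,
                                           const sketch_font *symbol_font,
                                           unsigned int cp)
{
    if (sketch_has_glyph(user, cp))
        return user;
    if (sketch_has_glyph(script_font, cp))
        return script_font;
    if (sketch_has_glyph(symbol_font, cp))
        return symbol_font;
    return NULL; /* nothing covers it */
}

/* Length of the leading run of text[] that resolves to the same font as text[0]. */
static size_t sketch_run_length(const sketch_font *user,
                                const sketch_font *script_font,
                                const sketch_font *symbol_font,
                                const unsigned int *text, size_t len)
{
    const sketch_font *first;
    size_t n = 1;
    assert(text != NULL && len > 0);
    first = sketch_pick_font(user, script_font, symbol_font, text[0]);
    while (n < len &&
           sketch_pick_font(user, script_font, symbol_font, text[n]) == first)
        n++;
    return n;
}
```

Calling `sketch_run_length` repeatedly on the remainder of a fragment yields runs that each use a single font, so only the characters missing from the earlier fonts drop to the symbol font rather than the whole word.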
tor8 | so you ran into trouble getting the punctuation script for cjk punctuation due to it being one fragment per character? | 16:40.39 |
Robin_Watts | tor8: No. | 16:41.04 |
tor8 | we could do the script analysis all the way up in generate_text and feed the script when creating the flow nodes | 16:41.14 |
Robin_Watts | The "only 1 char of japanese" was a stupid mistake to do with zero advance chars. | 16:41.38 |
tor8 | in fact, that'd probably be best should we eventually start respecting the html tags that define language etc | 16:41.39 |
Robin_Watts | tor8: Urm... | 16:42.03 |
| html can have markup for l2r and r2l. And that information should be fed into the nodes. | 16:42.26 |
tor8 | yeah, and language as well | 16:42.39 |
Robin_Watts | which in turn gets fed into the bidi stuff as a 'base' direction. | 16:42.40 |
tor8 | does the bidi stuff split on scripts? | 16:42.54 |
| or just bidi runs? | 16:42.57 |
| I'm suggesting we resolve, split and feed script into the nodes as well | 16:43.31 |
Robin_Watts | The language stuff is mostly useful for resolving the glyphs for unicode things that can have different looks in different languages. | 16:43.35 |
| The core bidi code (which I am not changing really) splits on runs. | 16:44.03 |
| The code of mine that wraps it splits on script too. | 16:44.12 |
tor8 | to cover the case where a node has no text that's anything other than UCDN_SCRIPT_COMMON | 16:44.24 |
| but should still copy it from surrounding text | 16:44.39 |
| a case we don't currently handle, AIUI | 16:44.46 |
Robin_Watts | Yeah, the bidi code takes care of all that. | 16:44.52 |
| I gather all the text from a flow up into a single buffer (punctuation, and all, multiple scripts, different styles etc). | 16:45.22 |
| That buffer gets fed to the bidi code which calls me back to say "chars 0 to 3 are one fragment, with directionality l2r", "chars 4 to 6 are another fragment with directionality r2l" etc. | 16:46.04 |
tor8 | oh, right! you do the bidi scanning and splitting on the whole paragraph. nevermind my ramblings then :) | 16:46.06 |
Robin_Watts | I then split the nodes further so that no node is in more than one bidi fragment. | 16:46.37 |
tor8 | still, should probably save the script in the node from that step to help looking up the right set of fallback fonts each node | 16:46.41 |
Robin_Watts | Yes. | 16:46.45 |
| And each node should have a tag in it to say what language was actually specified by the markup (to be used when resolving the unicode unified stuff) | 16:47.19 |
tor8 | because once in draw_word and measure_word we don't have the script tag from the bidi splitting pass | 16:47.26 |
| Robin_Watts: yes. | 16:47.33 |
Robin_Watts | tor8: It will, cos I want to add that :) | 16:47.45 |
tor8 | un-unify the unified CJK characters per language? :) | 16:48.17 |
Robin_Watts | Also each node should contain a tag to say whether a specific l2r or r2l was specified by the markup. | 16:48.20 |
tor8 | then we'll need a better CJK font than droidsansfallback | 16:48.27 |
Robin_Watts | tor8: Yes, that's basically what you're required to do. | 16:48.27 |
| tor8: Right. | 16:48.36 |
| Or at least we'll have the freedom to do that. | 16:48.44 |
| That's the argument for separate C/J/K/V fonts. | 16:48.54 |
tor8 | the noto han sans font (which is huge) has language specific characters | 16:48.56 |
| which we could manage to deal with by making a TTC with a subfont per language | 16:49.11 |
| like we currently do to hack vertical/horizontal writing modes | 16:49.25 |
Robin_Watts | tor8: Yes, you can have the different variants within a font and use an opentype specific mechanism to get the right ones. | 16:49.29 |
| but that's hard, and I'm not sure freetype can do that. | 16:49.38 |
tor8 | or as the ones they ship do, use opentype | 16:49.40 |
| I think harfbuzz can do that. freetype doesn't handle any opentype tables, at all. | 16:49.54 |
| which is why I had to make the TTC hack for droidsansfallback | 16:50.08 |
| the original droidsansfallback had alternate glyph lookup tables for vertical writing using opentype tables | 16:50.26 |
Robin_Watts | tor8: If Harfbuzz sorts it, great. | 16:51.13 |
| So, I'll push through the changes I need and put a commit up. | 16:51.36 |
tor8 | Robin_Watts: it should, not saying it'll be easy. and it won't work for PDF. | 16:51.36 |
Robin_Watts | None of this stuff works for PDF. | 16:51.45 |
tor8 | Robin_Watts: fab. I've got a full chain of fallback fonts on tor/master you could rebase ontop of | 16:51.54 |
| and then hack the ugly chain up and make the html_font_set cache things as needed, indexed by script | 16:52.20 |
Robin_Watts | tor8: How many scripts are there ? | 16:53.09 |
| 132, I think. | 16:53.28 |
tor8 | 88 in use | 16:53.35 |
| 132 or so max | 16:53.50 |
Robin_Watts | I was going by the UCDN_SCRIPT thing. | 16:53.57 |
| Right. | 16:53.58 |
tor8 | but the upper chunk 36 ones don't have a noto font | 16:54.06 |
| Robin_Watts: yeah. I'd use the UCDN_SCRIPT thing as the limit. | 16:54.26 |
Robin_Watts | I think I'd be in favour of a ctx->font->fallbacks. | 16:54.26 |
tor8 | Robin_Watts: yeah. that'd probably be better, so we don't reload them all for every html document | 16:54.55 |
Robin_Watts | yeah. | 16:54.59 |
| and can we define 132 to be 'SYMBOL' ? | 16:55.18 |
tor8 | the html_font_set is to collect all fonts for a given document | 16:55.20 |
| Robin_Watts: abuse SCRIPT_UNKNOWN? | 16:55.39 |
Robin_Watts | actually, ignore that. | 16:55.53 |
tor8 | I'd put the NotoSansSymbol font into font->fallbacks->final_fallback | 16:56.22 |
Robin_Watts | tor8: Did you see my burbling above about script/italic/weight/font ? | 16:57.46 |
tor8 | Robin_Watts: yeah, I'm thinking that stuff would go into the ctx->fallback thing? | 16:58.12 |
Robin_Watts | I'd like to be able to say "get me a fallback for this font" and get the closest fallback that we have from the fallback set, via the context thing. | 16:58.33 |
| Yeah, sounds like we're on the same page. | 16:58.38 |
tor8 | I'd still like to keep the builtin data font lookups like they are; those functions shouldn't be used by users. | 16:59.11 |
Robin_Watts | So I'll let you do the fonty stuff, and I'll do the html/language/bidi stuff. | 16:59.11 |
tor8 | Robin_Watts: okay, cool. I'll look at bashing together a fallback font context thing tomorrow then. | 16:59.45 |
Robin_Watts | I'm thinking that customers may want to customise this stuff. | 16:59.46 |
| If we were to sell this to a Korean e-reader manufacturer for instance, they might want to include bold/italic/sans/serif korean fonts but only rudimentary japanese ones, say. | 17:00.50 |
tor8 | Robin_Watts: yeah. | 17:01.04 |
Robin_Watts | Hence having a simple table driven thing would be an advantage. | 17:01.09 |