| 20160620 |
timofonic | Robin_Watts: What about the International Digital Publishing Forum and its members (http://idpf.org/membership/members)? What about crowdfunding full EPub support within MuPDF (what would be the estimated cost, plus maybe extra devs to accelerate it)? | 10:32.23 |
Robin_Watts | timofonic: I'm not sure we want to get into crowdfunding. I can't believe we'd raise enough. | 10:33.32 |
timofonic | Robin_Watts: I see | 10:34.40 |
Robin_Watts | When you say "full EPub support", what do you mean? | 10:35.46 |
| There is a difference between "filling in the gaps in Epub v2" and "Doing Epub v3" | 10:36.30 |
| Epub v3 is moving towards fixed layouts. | 10:37.12 |
| Rather than the benefit of Epub v2 which is reflowability. | 10:37.31 |
timofonic | Robin_Watts: I see | 10:44.56 |
tor8 | Robin_Watts: did you see the new coverity warnings from the signed/unsigned changes? | 11:06.30 |
Robin_Watts | tor8: I did not. | 11:08.09 |
| tor8: Various fixes etc on robin/master | 11:08.23 |
tor8 | Robin_Watts: all LGTM | 11:21.09 |
| Robin_Watts: okay, I've got 'lang' tags being parsed and fed through to the fallback font selection | 11:21.48 |
| but the actual tag used is the question -- do we reuse harfbuzz's language tags so we can easily pass them on to the opentype shaping as well? | 11:22.23 |
| I had to bump the 'markup_lang' bitfield to 16 bits from 8 | 11:22.40 |
Robin_Watts | tor8: how come? | 11:23.16 |
tor8 | 8 bits is not enough to hold even 2-char language codes | 11:23.53 |
Robin_Watts | 27*27*27 = 19683, so 15 bits. | 11:26.09 |
timofonic | Robin_Watts: I wonder why they are so interested in the useless fixed layout, anyway. I consider certain features to be interesting, though I understand many of these would require a very major effort: XHTML (MathML, Images (jpg, png, gif), video (local apis? webvtt, ttml for captions and subtitles), audio (with remote sources?), epub:trigger), SVG, Multimedia Overlay, JavaScript (with somewhat iBooks Author | 11:26.41 |
Robin_Watts | Adding the 6 (?) extra special ones need not make that 16 bits. | 11:26.41 |
timofonic | compatibility), CSS3, pop-up footnotes, dictionary (Android has a few ones, ColorDict was the "standard API" and some apps like GoldenDict use it), strikethrough, TTS, annotation. | 11:26.41 |
tor8 | Robin_Watts: I wonder if we could limit ourselves to ISO 639-1 and get away with needing only 10 bits | 11:27.12 |
Robin_Watts | timofonic: Images are not really a problem. | 11:27.16 |
| SVG is similarly getting there. | 11:27.33 |
| Some of the other stuff is just batshit crazy. | 11:27.59 |
timofonic | Robin_Watts: But I consider MathML, SVG, CSS3, pop-up footnotes, dictionary APIs, TTS, annotation, audio and video the most interesting ones. But that's subjective :P | 11:28.21 |
| Robin_Watts: I see | 11:28.23 |
| Robin_Watts: MathML is batshit crazy too? :P | 11:28.42 |
Robin_Watts | tor8: If it was me, I'd have the special case ones at the low end, then the 27*27*27 after that. | 11:28.46 |
| That way we can cut back to fewer bits by just ignoring the end ones. | 11:29.22 |
tor8 | Robin_Watts: at the moment I just hacked the special case ones to be 'zhs' and 'zht' (which are unused in the iso 639 lists I've looked through) | 11:29.25 |
Robin_Watts | tor8: ok. | 11:29.41 |
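The bit counts discussed above can be checked with a small sketch. This is one possible reading of the "special cases at the low end, then 27*27*27" layout, not the code on tor/master: the names `pack_lang`, `lang_digit` and `LANG_SPECIALS`, and the choice of 8 reserved slots, are assumptions made for illustration. Each letter maps to 1..26 and an absent third letter maps to 0, so 2-letter codes land in the low values.

```c
#include <stddef.h>

/* Hypothetical: values 0..7 are reserved for "unset" and special cases
 * such as the zh-Hans / zh-Hant variants. */
enum { LANG_SPECIALS = 8 };

/* Map 'a'..'z' (either case, ASCII assumed) to 1..26; -1 if not a letter. */
static int lang_digit(char c)
{
	if (c >= 'a' && c <= 'z') return c - 'a' + 1;
	if (c >= 'A' && c <= 'Z') return c - 'A' + 1;
	return -1;
}

/* Pack a 2- or 3-letter language code into a small integer.
 * Returns 0 (treated as "unset") for malformed input. */
static unsigned int pack_lang(const char *code)
{
	int d[3] = { 0, 0, 0 };
	size_t i;

	if (!code)
		return 0;
	for (i = 0; code[i]; i++) {
		if (i >= 3)
			return 0; /* longer than 3 letters */
		d[i] = lang_digit(code[i]);
		if (d[i] < 0)
			return 0; /* not a letter */
	}
	if (i < 2)
		return 0; /* shorter than 2 letters */

	/* The third letter is the most significant digit, so every 2-letter
	 * code (d[2] == 0) packs to a value below LANG_SPECIALS + 27*27. */
	return LANG_SPECIALS + d[0] + 27 * d[1] + 27 * 27 * d[2];
}
```

Under these assumptions the largest 3-letter value is 19690, which fits in 15 bits, and the largest 2-letter value is 736, which fits in 10 bits, so restricting input to 2-letter codes would allow a narrower field without changing the encoding.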
tor8 | Robin_Watts: commit on tor/master | 11:31.14 |
Robin_Watts | 1+1+7+15+8 = 32, so we're still good. | 11:40.39 |
| In *fz_load_fallback_font what's with the 150, 151, 152, 153 ? | 11:41.26 |
tor8 | the font context caches loaded fallback fonts | 11:43.41 |
| we used to cache based on script only | 11:43.47 |
| but we need to cache on script+language for the CJK variants | 11:44.03 |
| magic numbers; sadly there is no macro for the last UCDN script | 11:44.23 |
Robin_Watts | Can we define our own: enum { UCDN_EXTRAS_JA = 150, ... } ? | 11:45.10 |
| Or we could extend the UCDN table ourselves - that's probably better. | 11:46.02 |
| That way, if we ever upgrade UCDN, code will fail to build rather than appear to work. | 11:46.29 |
tor8 | I can add a UCDN_LAST_SCRIPT #define and base the numbers off that | 11:46.56 |
Robin_Watts | tor8: Cool. | 11:47.05 |
| script is used to look up the noto font, right? | 11:47.46 |
| I can't see where these new values are ever used. | 11:47.54 |
| oh, it's just for caching, so the numbers just need to be unique. | 11:49.06 |
tor8 | yes. | 11:49.11 |
Robin_Watts | Seems plausible to me then. | 11:49.20 |
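A minimal sketch of the "base the numbers off a last-script define" idea follows. Everything here is hypothetical: `EXAMPLE_UCDN_LAST_SCRIPT` and its value stand in for whatever the UCDN headers would provide, and the four variant names are guesses at what 150 to 153 represent. The point is only that the extra cache slots stay unique if the script table grows.

```c
/* Hypothetical stand-in for a define derived from the UCDN headers. */
#define EXAMPLE_UCDN_LAST_SCRIPT 149

/* Extra cache slots for CJK language variants, numbered after the last
 * real script so they can never collide with a script value. */
enum {
	EXAMPLE_CJK_JA = EXAMPLE_UCDN_LAST_SCRIPT + 1,
	EXAMPLE_CJK_KO,
	EXAMPLE_CJK_ZH_HANS,
	EXAMPLE_CJK_ZH_HANT,
	EXAMPLE_FALLBACK_SLOTS /* size needed for the cache array */
};

/* Hypothetical language identifiers used only by this sketch. */
enum example_lang {
	EX_LANG_NONE,
	EX_LANG_JA,
	EX_LANG_KO,
	EX_LANG_ZH_HANS,
	EX_LANG_ZH_HANT
};

/* Pick the cache slot: the script itself, except that the Han script
 * (passed in as han_script) is split by language. */
static int fallback_cache_slot(int script, int han_script, enum example_lang lang)
{
	if (script == han_script) {
		switch (lang) {
		case EX_LANG_JA: return EXAMPLE_CJK_JA;
		case EX_LANG_KO: return EXAMPLE_CJK_KO;
		case EX_LANG_ZH_HANS: return EXAMPLE_CJK_ZH_HANS;
		case EX_LANG_ZH_HANT: return EXAMPLE_CJK_ZH_HANT;
		default: break;
		}
	}
	return script;
}
```

Because the slots are only cache keys, their exact values do not matter as long as they are distinct from every script value and from each other.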
tor8 | Robin_Watts: updated commit on tor/master | 11:50.25 |
| Robin_Watts: http://pastebin.com/raw/gC2gcjc9 there's a sample file, paste it into cjk-lang.xhtml | 11:51.56 |
Robin_Watts | Looks good to me. | 11:52.05 |
| tor8: Am in the middle of fighting SOT at the moment. | 11:52.22 |
| Do we use the loca thing? | 11:52.29 |
tor8 | no. that's the next step. | 11:53.36 |
Robin_Watts | ok. | 11:53.42 |
tor8 | we can't use 'loca' lookups for PDF fonts | 11:53.57 |
Robin_Watts | Indeed, but we can for epub. | 11:54.08 |
tor8 | yes. so there the question becomes one of whether to use hb_language tags so we don't have to map back and forth | 11:54.36 |
Robin_Watts | hb_language tags are 32bit, IIRC. | 11:56.15 |
| and we need a tag on every fragment. | 11:56.31 |
| Our mapping back and forth should be fast. | 11:57.05 |
tor8 | each fragment currently has 3 pointers, 4 floats and 35 bits of bitfield flags | 11:58.01 |
Robin_Watts | 35? | 11:58.13 |
tor8 | yes. 3 bits of type, 1 bit of expand, 1 bit of linebreaking, 7 bits of bidi-level, 8 bits of script and 15 bits of language | 11:58.45 |
| oh, I can't even count :( | 11:59.02 |
| no, I can. I'm just having monday morning issues :) | 11:59.53 |
| hence my question if we can drop the language tag to ISO 639-1 only and use only 2 character codes | 12:00.15 |
| with 2 specials for zh-Hans and zh-Hant | 12:00.23 |
Robin_Watts | Ok, so if we could get language down to 12 bits we'd be OK. | 12:01.23 |
| How many bits do we need for script? | 12:01.46 |
| 8 I guess. | 12:01.53 |
tor8 | 8, there are 131 scripts | 12:02.04 |
| we could probably drop bidi level a few bits though | 12:02.14 |
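The bit budget being counted here can be written down as a sketch. The struct and field names are invented for illustration and are not MuPDF's; the widths are the ones quoted in the conversation, with language cut to the 12 bits that would make the total 32 (with 15 bits of language the same fields add up to 35).

```c
/* Field widths as discussed; EXAMPLE_LANG_BITS is the reduced width. */
#define EXAMPLE_TYPE_BITS   3
#define EXAMPLE_EXPAND_BITS 1
#define EXAMPLE_BREAK_BITS  1
#define EXAMPLE_BIDI_BITS   7
#define EXAMPLE_SCRIPT_BITS 8
#define EXAMPLE_LANG_BITS   12

enum {
	EXAMPLE_TOTAL_BITS = EXAMPLE_TYPE_BITS + EXAMPLE_EXPAND_BITS +
		EXAMPLE_BREAK_BITS + EXAMPLE_BIDI_BITS +
		EXAMPLE_SCRIPT_BITS + EXAMPLE_LANG_BITS
};

/* Fail the build if the flags no longer fit in one 32-bit word (C11). */
_Static_assert(EXAMPLE_TOTAL_BITS <= 32, "fragment flags exceed 32 bits");

/* Hypothetical flag layout for one text fragment. */
struct example_flags {
	unsigned int type : EXAMPLE_TYPE_BITS;
	unsigned int expand : EXAMPLE_EXPAND_BITS;
	unsigned int breaks_line : EXAMPLE_BREAK_BITS;
	unsigned int bidi_level : EXAMPLE_BIDI_BITS;
	unsigned int script : EXAMPLE_SCRIPT_BITS;
	unsigned int language : EXAMPLE_LANG_BITS;
};
```

How bitfields are actually laid out in memory is implementation-defined, so the static assertion checks only the sum of the widths, not `sizeof` the struct.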
Robin_Watts | tor8: I wonder if we could move to a table for iso 639-3. | 12:04.25 |
tor8 | 7847 entries in the iso 639-3 list | 12:06.22 |
Robin_Watts | Only 535 in harfbuzz's list. | 12:06.59 |
tor8 | http://www-01.sil.org/iso639-3/iso-639-3.tab | 12:07.10 |
Robin_Watts | but 7847 would fit in 13 bits. | 12:07.34 |
tor8 | Robin_Watts: the ot_languages list? | 12:10.33 |
Robin_Watts | Yeah. | 12:10.45 |
| You have to figure that if harfbuzz doesn't recognise it, there is no point in us recognising it :) | 12:11.35 |
tor8 | yeah. | 12:11.41 |
| too bad they use a full opentype 4-byte tag | 12:12.11 |
| though we could cheat, since the 4th byte is always ' ' | 12:12.20 |
| but using 24 or 32 bits makes no difference | 12:12.41 |
| I'm leaning towards promoting the markup_lang to a full integer (not bitfield) and just using harfbuzz tags | 12:13.29 |
Robin_Watts | tor8: Can't we have a copy of the tags from their table, and just bsearch it? | 12:20.25 |
| Maybe I'm wrong to be bothering about the size of this given the overheads in the rest of the structure. | 12:21.10 |
tor8 | Robin_Watts: huh. the harfbuzz api takes iso-639 strings and not opentype tags | 12:23.31 |
| so yeah, just a list of codes that harfbuzz supports would be sufficient | 12:23.46 |
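The "keep a copy of the codes and bsearch it" suggestion could look roughly like this. The table is a tiny made-up sample, not HarfBuzz's actual list, and `lang_index` is a hypothetical helper name. With the 535 entries mentioned above, the returned index would fit in 10 bits.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sample table; it must stay sorted for bsearch to work. */
static const char *const example_langs[] = {
	"ar", "de", "en", "ja", "ko", "zh"
};

/* Compare the search key (a string) against one table element. */
static int cmp_lang(const void *key, const void *elem)
{
	return strcmp((const char *)key, *(const char *const *)elem);
}

/* Return the table index for a language code, or -1 if not listed. */
static int lang_index(const char *code)
{
	const char *const *hit = bsearch(code, example_langs,
		sizeof example_langs / sizeof example_langs[0],
		sizeof example_langs[0], cmp_lang);
	return hit ? (int)(hit - example_langs) : -1;
}
```

Mapping back is a plain array lookup, `example_langs[index]`, which yields the ISO 639 string to hand to the shaper.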
| Robin_Watts: rats. harfbuzz supports language tags outside of their list. | 12:47.09 |
| if the language code is 3 characters long, they use it as is as an opentype tag | 12:47.26 |
Robin_Watts | tor8: I would be inclined to just use the extra bits for now. | 13:16.40 |
tor8 | Robin_Watts: agreed. | 13:17.01 |
| I've got an updated commit on tor/master that sets the language for shaping as well | 13:17.35 |
| just need to find some way to test it... | 13:17.40 |
| Robin_Watts: ahah! it works :) | 13:19.52 |
bonjour | hello | 13:45.22 |
ghostbot | Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line. | 13:45.22 |
bonjour | Does anyone know what is the source of the error "ghostscript failed to convert document" when using selection extraction in GSV 6.0? | 13:46.14 |
kens | The input file was probably corrupted | 13:47.06 |
bonjour | The input file can be read without problem in ghostview 5.0 and it can also be converted to EPS | 13:48.07 |
kens | It's not possible to give any opinion without seeing the file | 13:48.25 |
bonjour | the problem exists only with the selection extraction in GCV 6.0 | 13:48.30 |
| GSV | 13:48.34 |
| do I need ghostscript to have GSV working fine ? If yes, what is the recommended version ? thx | 13:51.30 |
| for GSV 6.0 | 13:51.44 |
kens | Seems I put bonjour off.... | 14:02.15 |