MuPdf / Fitz
Tor Andersson
tor.andersson at dsek.lth.se
Thu Jul 11 09:38:00 PDT 2002
Hi all,
First, where does the name Fitz come from?
Raph, will you set up a mailing list and wiki?
> Great. You and I have obviously covered a lot of the same ground. My
> custom font is "LeBe":
>
> http://www.levien.com/~raph/lebe.ps
I'm afraid I'm just a big hack with a big mouth :/
> My reimplementation of the TeX H&J algorithms is LibHnj. You might be
> interested in the code; it has some thought put into efficient
> implementation. The whole-paragraph justification is based on a
> priority queue, but the implementation actually in LibHnj is the dumb
> O(n) one. I have O(log n) priority queue code in the new Libart
> intersector. Even with the O(n) code, it's pretty fast.
The code I wrote for that works in nearly linear time and deals with
discretionary hyphens, different stretch-ratios for different types
of spaces, penalties, margin adjustments and different glyph-and-kern
sequences for different choices of line-breaks. It's quite impressive
that I got it all to work, considering that I can only understand what
I'm doing for five seconds every two months. The only problem I have is
processing a Unicode text string to feed my algorithm with all the data
it needs. There are still some tricky issues as to how grouping glyphs
and inserting potential breaks influence ligatures and contextual glyph
substitution.
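
As a rough illustration of whole-paragraph breaking, here is a minimal
sketch. It is not the code described above and not LibHnj: it assumes fixed
word widths, a single space width, and a badness equal to the squared unused
width of each line except the last. Hyphens, penalties, and stretch ratios
are left out, and the function name is made up for the example.

```python
# Minimal sketch of whole-paragraph line breaking by dynamic programming.
# Assumptions (not from the original code): fixed word widths, one space
# width, badness = squared unused width, last line costs nothing.

def break_lines(word_widths, space_width, line_width):
    """Return a list of (start, end) word index ranges, one per line."""
    n = len(word_widths)
    inf = float("inf")
    # best[i] is the minimal total badness for laying out words i..n-1.
    best = [inf] * (n + 1)
    next_break = [n] * (n + 1)
    best[n] = 0.0
    for i in range(n - 1, -1, -1):
        width = -space_width
        for j in range(i, n):
            width += space_width + word_widths[j]
            if width > line_width:
                break  # words i..j no longer fit on one line
            slack = line_width - width
            cost = 0.0 if j == n - 1 else slack * slack
            total = cost + best[j + 1]
            if total < best[i]:
                best[i] = total
                next_break[i] = j + 1
    if best[0] == inf:
        raise ValueError("a word is wider than the line")
    lines = []
    i = 0
    while i < n:
        lines.append((i, next_break[i]))
        i = next_break[i]
    return lines
```

With widths [3, 2, 2, 5], a space of 1 and a line width of 6, a greedy
breaker would fill the first line completely and leave the second nearly
empty; the sketch instead puts the first word alone on its line because the
total badness is lower.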
Oh, and your hyphenation code breaks down in mysterious ways when fed
patterns for Italian and Latin languages. I think something fishy is
going on with that state machine; I never really understood how it works.
Anyway, my code is right now a Python/C hybrid and I can typeset around 10 or
20 pages per second on a 300 MHz SPARC.
> Raster rendering is a top priority. Higher level output is not an
> immediate priority for me, but is obviously important. In any case,
> the "display tree" in Fitz preserves the graphics structure at a
> relatively high level, so outputting it to printable formats makes
> sense. The big trick is (as always) putting an interface on the fonts
> so they can be exported and embedded into PS and PDF files. There's
> a ton of logic for this in GS's PS->PDF conversion code paths.
Would it be possible to shift the font definition logic into another
layer so that all Fitz works with is a collection of glyphs and an
encoding vector of size 256? How do CID fonts work? I haven't had time
to investigate.
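
To make the question concrete, here is a sketch of what such a reduced font
interface might look like. Every name in it is hypothetical and nothing here
is a Fitz API; glyph outlines are reduced to advance widths to keep it short.
Note that CID-keyed fonts address glyphs by numeric ID through multi-byte
codes, so a 256-entry vector alone would not cover them.

```python
# Hypothetical sketch: a font seen only as a glyph collection plus a
# 256-entry encoding vector. Names and fields are illustrative only.

class SimpleFont:
    def __init__(self, glyphs, encoding):
        # glyphs: dict mapping glyph name -> advance width (outlines omitted)
        # encoding: list of exactly 256 glyph names, None for unmapped codes
        if len(encoding) != 256:
            raise ValueError("encoding vector must have 256 entries")
        if ".notdef" not in glyphs:
            raise ValueError("glyph collection needs a .notdef glyph")
        self.glyphs = glyphs
        self.encoding = encoding

    def advance(self, data):
        """Sum advance widths for a byte string; unmapped codes use .notdef."""
        total = 0
        for code in data:  # iterating bytes yields integers 0..255
            name = self.encoding[code] or ".notdef"
            total += self.glyphs.get(name, self.glyphs[".notdef"])
        return total


# Usage: map two character codes and measure a three-byte string.
encoding = [None] * 256
encoding[65] = "A"
encoding[66] = "B"
font = SimpleFont({".notdef": 500, "A": 700, "B": 650}, encoding)
width = font.advance(b"AB?")
```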
> The meta information is beyond the scope of the PDF imaging model, and
> thus probably beyond the scope of the Fitz core. It's a very worthwhile
> project, and possibly deserves "brother" status.
>
> Note that defining an API for text rendering is a Hard Problem. ATSUI
> and Uniscribe are the leading proprietary APIs now available. Both
> are big, heavy beasts.
[snip]
> Good font substitution is critical for PDF rendering. Note that PDF
> embeds a "Widths" array even when the glyph outlines are not present.
> Thus, it's possible to synthesize reasonably good-looking output from
> suitable multiple master fonts. GS can render MM fonts, but there are
> no free ones available, and GS lacks the logic to automatically
> instantiate the MM font from the Widths table. Over the longer term,
> I'd like to see free MM fonts drawn.
Again, could this be part of a separate library for font synthesis that a
potential PDF renderer can use?
> But, again, are we doing rendering or layout? PDF specifies the interface
> after layout but before rendering. Thus, results are quite consistent,
> even when a font has to get substituted.
The way I see it, I am interested in making a "brother" library to Fitz,
that does text layout. Getting text APIs right is hard, and Fitz is not
really the place for it. Let's call it Rune.
However, it would be most pleasant to get at font data from one place,
so that I wouldn't have to duplicate all the font handling code.
What is your position regarding FreeType? Or do we roll our own?
> > Do we use floats or fixed point math?
>
> Big question. I'm still grappling with it.
Floats are faster on new machines, but fixed point feels safer as it's more
predictable: numerical instability won't bite us, and division by zero can
be avoided without the EPSILON hack.
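
A small sketch of the fixed-point side of that trade-off, assuming a 16.16
format (the choice of 16 fractional bits and all function names are
assumptions for illustration). Python integers never overflow; a C version
would need a wider intermediate type for the multiply and the divide.

```python
# Sketch of 16.16 fixed-point arithmetic on plain integers.
# The 16.16 format is an assumption made for this example.

FRAC_BITS = 16
ONE = 1 << FRAC_BITS


def to_fixed(x):
    """Convert a float to the nearest 16.16 fixed-point value."""
    return int(round(x * ONE))


def to_float(f):
    """Convert a 16.16 fixed-point value back to a float."""
    return f / ONE


def fix_mul(a, b):
    # The raw product has 32 fractional bits; shift back down to 16.
    # The shift rounds toward negative infinity.
    return (a * b) >> FRAC_BITS


def fix_div(a, b):
    # Zero is exactly representable, so the test needs no epsilon.
    if b == 0:
        raise ZeroDivisionError("fixed-point division by zero")
    return (a << FRAC_BITS) // b
```

Adding to_fixed(0.25) four times gives exactly ONE, which is the kind of
predictability the paragraph above is after.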
> Doing aa in a luminance-linear space is not _always_ better than doing
> it in a perceptually linear space.
Would it be possible to use a color space other than RGB altogether?
Say L*a*b* or YCbCr or something based more on physics.
Or will performance hurt too much in the last stage of converting to RGB?
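
To show what is at stake in the quoted remark, the sketch below blends a
half-covered black pixel over white in two ways: directly on the encoded
values, and in linear light with a conversion at the last stage. It assumes
the sRGB transfer function as the encoding, which is an assumption for the
example and not something decided for Fitz; values are floats in 0..1.

```python
# Sketch: anti-aliasing blend on encoded values versus in linear light.
# Assumes sRGB encoding; function names are made up for illustration.

def srgb_to_linear(c):
    """Decode an sRGB-encoded component (0..1) to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c):
    """Encode a linear-light component (0..1) as sRGB."""
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1 / 2.4) - 0.055


def blend_encoded(fg, bg, coverage):
    """Mix the encoded values directly (roughly perceptual mixing)."""
    return fg * coverage + bg * (1 - coverage)


def blend_linear(fg, bg, coverage):
    """Decode, mix in linear light, then encode the result."""
    lin = srgb_to_linear(fg) * coverage + srgb_to_linear(bg) * (1 - coverage)
    return linear_to_srgb(lin)


# Black over white at 50% coverage.
encoded_result = blend_encoded(0.0, 1.0, 0.5)
linear_result = blend_linear(0.0, 1.0, 0.5)
```

The linear-light result comes out noticeably lighter than the direct mix.
The extra cost is the two transfer-function evaluations per component, which
a real renderer would likely replace with lookup tables.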
> > I presume Fitz will be written in ANSI C.
>
> That's where I'm leaning. C++ is also on the table, though.
I wouldn't go down the C++ route. I'm very biased here, but going C++ will
not get rid of the worst deficiencies of C and will add plenty of new ones.
A C++ program can use a C library, but the reverse is sadly not true.
If we leave the safe haven of C, I'm sure there are better alternatives
than C++.
/tor