17.1 Overview
Fonts are represented in MuPDF by the abstract fz_font type. This
reference-counted structure encapsulates the basic information about a font, specifically:
- Glyph list - Each font consists of a list of glyphs.
- Glyph data - How to draw each glyph. In traditional fonts this information is
  known as the ‘Outline data’ (or ‘Outlines’), but some font types (such as
  Type 3 fonts from PDF) can also encapsulate other data, such as images and
  colors.
- Unicode map - Most (but not all) fonts contain information that enables
  glyphs to be mapped to/from the Unicode code points they represent.
  Without such information, it can be impossible to meaningfully extract
  text from a document (such as for cut and paste).
- Font BBox - All fonts include information for a bounding box that covers all the
  glyphs within the font. Sadly this is frequently inaccurate or incorrect,
  so it should be treated with distrust.
- Glyph advances - All fonts contain simple glyph advance information: how far
  to move the text cursor after having drawn a given glyph. This information
  ensures that successive characters are properly spaced relative to each other.
- Kerning data - Most fonts contain simple kerning data; this allows the glyph
  advance between any two glyphs to be adjusted based upon the particular pair
  of glyphs involved. The classic example of kerning is that the spacing between
  A and the left-hand edge of its following letter is typically different in
  AV and AN.
- Shape data - Some fonts allow for the automatic ‘shaping’ of glyph sequences.
  The trivial example of this in western fonts is that the letters ‘f’ and ‘i’ can
  be combined into a single ligature glyph ‘fi’. For many non-Latin scripts
  (especially Indic and Southeast Asian scripts), this procedure happens to a
  far greater extent. This can be as simple as the incorporation of diacritical
  marks, or as complex as the complete rearrangement or replacement of
  glyph sequences to give different appearances on the final rendered page.
  This process is known as ‘font shaping’. The data required to perform
  it is font-specific, and is optionally encapsulated within the fonts themselves.
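The interaction between glyph advances and kerning can be illustrated with a minimal sketch. Everything here is hypothetical: the toy_ types and functions are not part of MuPDF, glyphs are identified by their characters for simplicity, and the advance and kerning values (in 1/1000 em units) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy font tables; all values are in 1/1000 em units
 * and are invented for illustration. */
typedef struct {
    char glyph;     /* glyph, identified here by its character */
    int advance;    /* how far the cursor moves after drawing it */
} toy_advance;

typedef struct {
    char left, right;   /* the glyph pair this entry applies to */
    int adjust;         /* adjustment added to the left glyph's advance */
} toy_kern;

static const toy_advance toy_advances[] = {
    { 'A', 667 }, { 'V', 667 }, { 'N', 722 },
};

static const toy_kern toy_kerns[] = {
    { 'A', 'V', -74 },  /* pull V in under the slope of A */
};

/* Look up the advance of a glyph; unknown glyphs advance by 0 here. */
static int toy_advance_of(char g)
{
    size_t i;
    for (i = 0; i < sizeof toy_advances / sizeof toy_advances[0]; i++)
        if (toy_advances[i].glyph == g)
            return toy_advances[i].advance;
    return 0;
}

/* Look up the kerning adjustment for a pair; most pairs have none. */
static int toy_kern_of(char left, char right)
{
    size_t i;
    for (i = 0; i < sizeof toy_kerns / sizeof toy_kerns[0]; i++)
        if (toy_kerns[i].left == left && toy_kerns[i].right == right)
            return toy_kerns[i].adjust;
    return 0;
}

/* Return the cursor position after laying out every glyph in text. */
int toy_text_width(const char *text)
{
    int x = 0;
    size_t i;

    assert(text != NULL);
    for (i = 0; text[i] != '\0'; i++) {
        x += toy_advance_of(text[i]);
        /* Kerning depends on the pair, so it needs the next glyph too. */
        if (text[i + 1] != '\0')
            x += toy_kern_of(text[i], text[i + 1]);
    }
    return x;
}
```

With these tables, toy_text_width("AV") is smaller than the sum of the two plain advances, while toy_text_width("AN") is exactly that sum, because only the AV pair has a kerning entry.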
The fz_font does not include information about the particular size at which a font is
used on the page, nor the basic color used to render it. It is therefore typical
to see fz_fonts passed around the system paired with a size (and/or
transformation matrix), a colorspace, and a color definition.
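The separation between a font and its use on the page can be sketched as follows. This is an illustration under stated assumptions, not MuPDF code: toy_font, toy_matrix, toy_text_style and every toy_ function are hypothetical stand-ins, the colorspace is reduced to a component count, and the advance is assumed to be expressed in em units.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted font; not the real fz_font. */
typedef struct {
    int refs;           /* number of owners currently holding the font */
    const char *name;
} toy_font;

/* Affine matrix [a b c d e f]; a and d carry the scale for unrotated text. */
typedef struct { float a, b, c, d, e, f; } toy_matrix;

/* Everything the font itself does not know about one use on the page. */
typedef struct {
    toy_font *font;     /* shared, reference-counted */
    toy_matrix trm;     /* size (and position) of the text */
    int n;              /* stand-in for a colorspace: component count */
    float color[3];     /* color values in that colorspace */
} toy_text_style;

toy_font *toy_new_font(const char *name)
{
    toy_font *font = malloc(sizeof *font);
    if (font != NULL) {
        font->refs = 1;     /* the creator holds the first reference */
        font->name = name;
    }
    return font;
}

toy_font *toy_keep_font(toy_font *font)
{
    assert(font != NULL);
    font->refs++;
    return font;
}

void toy_drop_font(toy_font *font)
{
    /* Free the font only when the last reference goes away. */
    if (font != NULL && --font->refs == 0)
        free(font);
}

/* Pair a font with a size and an RGB color; the style takes its own reference. */
toy_text_style toy_make_style(toy_font *font, float size, float r, float g, float b)
{
    toy_text_style style;

    style.font = toy_keep_font(font);
    style.trm.a = size; style.trm.b = 0;
    style.trm.c = 0;    style.trm.d = size;
    style.trm.e = 0;    style.trm.f = 0;
    style.n = 3;
    style.color[0] = r; style.color[1] = g; style.color[2] = b;
    return style;
}

void toy_drop_style(toy_text_style *style)
{
    toy_drop_font(style->font);
    style->font = NULL;
}

/* Scale an advance given in em units by the horizontal scale of the style. */
float toy_scaled_advance(const toy_text_style *style, float advance_em)
{
    return advance_em * style->trm.a;
}
```

Two styles built from the same toy_font share one font object while differing in size and color, which is why the size and color travel alongside the font rather than inside it.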
MuPDF uses FreeType to handle most of its font rendering. Type 3 PDF fonts
are the exception: MuPDF renders these itself. Font shaping is done using the
HarfBuzz library.