[gs-devel] Using OpenType (CFF-flavour) fonts
ken.sharp at artifex.com
Mon Dec 13 08:59:34 UTC 2010
Apologies if some of the following is too simplistic; I'm just trying to
cover everything and I'm not certain where to pitch it. Please just skip
over anything which is too simple.
At 15:20 12/12/2010 +0000, Graham Douglas wrote:
>I would like to use some OpenType (CFF-flavour) fonts at my disposal
>but need to understand a little more about the process, especially
>accessing some of the glyphs which do not have designated
>Unicode code points --- such as small caps and oldstyle numbers, I guess
>because they are just design variations as opposed to being "real
>characters" of a language.
It's not clear to me what you mean by 'use' the fonts. The easiest way to
use them is to have the application embed the TrueType fonts as type 42
fonts in the PostScript program. But then I'm not sure how you are using
GS, possibly not by sending PostScript?
If you want to use them as replacements for some other font requested in a
PostScript program then there is the possibility that this will not work as
expected, as this is a Ghostscript extension, not present in the PostScript
specification, so there are no rules on how it should work.
Unicode is not really relevant to PostScript, or in many ways to TrueType
fonts. TrueType glyphs are accessed by Glyph ID (GID); the font *may* or
may not contain a Unicode CMAP table which maps Unicode code points to
glyph IDs, but since PostScript doesn't use Unicode this isn't terribly
relevant.
PostScript uses a 'character code' which for simple fonts is a single byte
index into an Encoding array, the array contains a glyph name for each
entry. The glyph programs are contained in a PostScript dictionary, and
referenced by name. So the character code is looked up in the Encoding to
get a name, then the name is looked up in the CharStrings dictionary to get
a program, which is then executed.
As you can see there is no Unicode in the PostScript at all. Note that
Encoding arrays are 256 entries, so referenced by a single byte.
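The two-step lookup can be observed directly from PostScript. A minimal sketch, assuming a standard font such as Times-Roman is available and still carries StandardEncoding (the font choice, the code 65 and the names f and gname are illustrative, not from the discussion above):

```postscript
%!PS
/f /Times-Roman findfont def
% Step 1: character code -> glyph name, via the Encoding array.
/gname f /Encoding get 65 get def
gname ==                                % /A under StandardEncoding
% Step 2: glyph name -> glyph program, via the CharStrings dictionary.
f /CharStrings get gname known ==       % true if the font has that glyph
% The Encoding array has one entry per single-byte code.
f /Encoding get length ==
```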
PostScript also includes type 0 fonts (Original Composite Fonts) and
CIDFonts, both of which are potentially referenced using multiple bytes for
the character code. It is possible to apply a Unicode CMap (that's not the
TrueType CMAP table, but a PostScript construct which is quite different)
to a CIDFont which will allow you to access the glyphs using Unicode values.
However, that's rather coincidental, just as it's possible to construct a
regular Encoding which uses ASCII values to map to the named glyphs which
represent each of the ASCII code points. The PostScript interpreter neither
knows nor cares.
In general TrueType fonts in PostScript are handled by conversion to type
42 fonts, or CIDFonts with type 42 outlines. For OpenType fonts with CFF
outlines it's also possible to convert into CFF fonts, or CIDFonts with CFF
outlines. This is normally done by the creating application and the font
embedded into the PostScript stream.
Now in addition to all that, Ghostscript can use TrueType fonts directly (as
mentioned this is a non-standard extension, PostScript does not handle
TrueType fonts). This can be done in two ways, either as a regular font, or
as a CIDFont, depending on whether you load the font in Fontmap.GS or cidfmap.
When loaded as a regular font GS will attempt to use the CMAP and POST
subtables in the TrueType font to build a reasonable Encoding for the font.
If I remember correctly we prefer the 3,1 CMAP subtable for this. I'm not
certain what happens if that subtable is not present. The Encoding is used
to map the character code to a name, then on to a GID which is used to
access the glyph.
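As an illustration of the regular-font route, here is a sketch assuming a Fontmap.GS entry along the lines shown in the comment. The alias TimesNewRomanTT and the path are hypothetical, and what the Encoding query prints depends on the tables in the particular font:

```postscript
%!PS
% Assumed Fontmap.GS entry (alias and path are illustrative only):
%   /TimesNewRomanTT (c:/windows/fonts/times.ttf) ;
/TimesNewRomanTT findfont 24 scalefont setfont
72 720 moveto
(Hello) show        % single-byte codes go through the Encoding GS built
% Inspect the glyph name GS assigned to character code 65.
currentfont /Encoding get 65 get ==
showpage
```

If the alias cannot be resolved, Ghostscript falls back to a substitute font, so it is worth checking the startup messages before trusting the output.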
When loaded as a CIDFont things are of course somewhat more complicated.
You don't normally use a CIDFont 'as-is', the only operation which works
with a naked CIDFont is glyphshow, in this case it takes a CID rather than
a name. Normally the CIDFont is composed with a CMap in order to produce a
CID-keyed instance of the font. This maps a character code to a CID and
things proceed more or less as for a regular font. The difference is that
the character code can consist of more than one byte (indeed can be of
But we can't access a TrueType font using a CID, so we need to map the
character code to a CID, then map the CID to a GID so that we can access
the glyphs in the font. This is complicated, and obviously somewhat prone
to error, since we are trying to replace a PostScript font with a
non-PostScript font. There's quite a bit of internal jiggery-pokery
involved with the various TrueType tables, and CIDSystemInfo dictionary to
try and build the two step mapping we need for this.
However, the simplest way to use it is to declare the CIDFont as having an
Identity CIDSystemInfo and then compose that CIDFont with an Identity CMap.
Then you can use the GID as the character code. This effectively works out
as a one-to-one mapping so that the character code ends up as the GID used
to access the glyph from the font. Since you can get the GID, this may be
the easiest way to work.
E.g. in cidfmap:
/TimesNewRoman << /CSI [(Identity) 0] /Path (c:/windows/fonts/times.ttf)
/SubfontID 0 /FileType /TrueType >> ;
Note that we've used the 'Identity' Ordering in the CIDSystemInfo (CSI).
Then to use the font :
Note that this uses the CIDFont 'TimesNewRoman' and the CMap 'Identity-H'.
If GS can't find a named font then one of the things it will do is try and
identify a CIDFont-CMap combination. If it thinks there is a possible
combination then it will see if there are appropriately named CIDFont and
CMap available. If so, then it will automatically compose the two together
to produce the CID-keyed instance for you.
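A minimal sketch of that kind of usage, assuming the cidfmap entry for TimesNewRoman given earlier is in place. The point size, position and GID values are illustrative only and are not taken from the original message:

```postscript
%!PS
% No font is defined under this exact name, so GS is expected to split it
% into the CIDFont "TimesNewRoman" and the CMap "Identity-H" and compose them.
/TimesNewRoman-Identity-H findfont 24 scalefont setfont
72 720 moveto
% Identity-H uses two-byte character codes, and with the Identity ordering
% each code is used as the GID. The values below are arbitrary examples.
<00240025> show     % requests GID 0x0024, then GID 0x0025
showpage
```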
>Where I'm kinda lost is making the "bridge" between all that
>font info and the PostScript machinery/code I need to write in order to
>access all the glyphs in the font, especially the small caps and oldstyle
>numbers. I know you can use glyphshow to draw any glyph in the
>CharStrings dictionary, encoded or not, but that seems a complicated
>way to go about things, maybe --- perhaps not?
I'm not absolutely certain (I'd have to go and read the code), but I don't
think this will work with a TrueType font loaded as a PostScript font. The
way that the font is loaded I'm not certain that glyphs which are not
present in the Encoding have any way to map the name to a GID. Since you
can't access TrueType glyphs by name, that would mean you can't use
glyphshow.
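For comparison, this is roughly what the two forms of glyphshow look like. It is a sketch only: the first half assumes an ordinary PostScript font (Times-Roman here, chosen for illustration), the second half assumes a CIDFont named TimesNewRoman is available through cidfmap, and the CID 36 is an arbitrary value:

```postscript
%!PS
% Name form: the operand is a glyph name looked up in CharStrings,
% whether or not that name appears in the Encoding.
/Times-Roman findfont 24 scalefont setfont
72 720 moveto
/ampersand glyphshow

% CID form: a CIDFont used without a CMap takes an integer CID instead.
/TimesNewRoman /CIDFont findresource 24 scalefont setfont
72 680 moveto
36 glyphshow
showpage
```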
>So, in summary, what else do I need to read or
>understand to fully utilise OpenType CFF-flavour
>fonts with GS in order to access all the various
>glyphs in the font.
Ideally you should read the font yourself, convert into a type 42, or
CIDFont with type 42 outlines, and embed the font in the PostScript stream.
This is probably the only 100% reliable method.
>--- Is it a matter of reencoding the font?
>--- Or, is the CIDFont machinery something that
>I need to understand?
It may be that loading the font with an Identity-H CMap (and declaring it
with Identity Ordering in CIDSystemInfo in cidfmap) would be the easiest
way for you to proceed, though this does require that you know the GID for
the glyphs you want to use.
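When working from GIDs, a small helper can hide the two-byte packing that Identity-H expects. A sketch, assuming a cidfmap entry for TimesNewRoman with Identity ordering; the font name TNR-GID, the helper gidshow and the GID values are all hypothetical:

```postscript
%!PS
% Compose the CIDFont with the Identity-H CMap explicitly (LanguageLevel 3).
/TNR-GID /Identity-H [ /TimesNewRoman ] composefont
24 scalefont setfont

% gidshow: int -> -
% Packs a GID into a two-byte, high-byte-first string and shows it.
/gidshow {
  2 string                                % gid str
  dup 0 3 index -8 bitshift 255 and put   % str[0] = high byte of gid
  dup 1 4 -1 roll 255 and put             % str[1] = low byte of gid
  show
} bind def

72 720 moveto
36 gidshow          % arbitrary example GIDs
37 gidshow
showpage
```

The GIDs for small caps or oldstyle figures would have to be read from the font itself, for example with a font inspection tool, since they are specific to each font.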
>Is CIDFont machinery relevant to this situation
>--- I have the Adobe specs but not yet fully
>read or absorbed them.
I doubt that you really want to use the full power of the CIDFont and CMap
mechanism. That would be more appropriate if you were going to embed the
font as a CIDFont, especially if you wanted to access more than 256 glyphs.
That mechanism is really present for languages which need more than 256
glyphs at a time (e.g. Chinese, Japanese, Korean, Vietnamese), though recent
Adobe products make extensive use of it even for Latin fonts.