[gs-devel] Using OpenType (CFF-flavour) fonts

Ken Sharp ken.sharp at artifex.com
Mon Dec 13 08:59:34 UTC 2010


Hi Graham,

Apologies if some of the following is too simplistic; I'm just trying to
cover everything and I'm not certain where to pitch it. Please just skip
over anything that is too basic.


At 15:20 12/12/2010 +0000, Graham Douglas wrote:

>I would like to use some OpenType (CFF-flavour) fonts at my disposal
>but need to understand a little more about the process, especially
>accessing some of the glyphs which do not have designated
>Unicode code points --- such as small caps and oldstyle numbers, I guess 
>because they are just design variations as opposed to being "real 
>characters" of a language.

It's not clear to me what you mean by 'use' the fonts. The easiest way to
use them is to have the application embed the TrueType fonts as type 42
fonts in the PostScript program. But I'm not sure how you are using
GS; perhaps you aren't sending it PostScript at all?

If you want to use them as replacements for some other font requested in a
PostScript program, this may not work as expected. Substituting fonts in
this way is a Ghostscript extension, not present in the PostScript
specification, so there are no rules on how it should work.

Unicode is not really relevant to PostScript, or in many ways to TrueType
fonts. TrueType glyphs are accessed by Glyph ID (GID); the font *may* or
may not contain a Unicode CMAP table which maps Unicode code points to
glyph IDs, but since PostScript doesn't use Unicode this isn't terribly
relevant.

PostScript uses a 'character code', which for simple fonts is a single-byte
index into an Encoding array; each entry in the array is a glyph name. The
glyph programs are contained in a PostScript dictionary (CharStrings) and
referenced by name. So the character code is looked up in the Encoding to
get a name, then the name is looked up in the CharStrings dictionary to get
a program, which is then executed.

As you can see, there is no Unicode in the PostScript at all. Note that
Encoding arrays have 256 entries, so each code is a single byte.
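To make the two-step lookup concrete, here is a toy Python model of it (a sketch of the idea, not how any interpreter actually stores fonts): the Encoding is a 256-entry list of glyph names, CharStrings is a dict from name to a placeholder "program" string, and a name missing from CharStrings falls back to /.notdef. The glyph name a.sc is a hypothetical small-cap name chosen for illustration.

```python
# Toy model of a simple (single-byte) PostScript font:
# character code -> Encoding[code] -> glyph name -> CharStrings[name] -> program.
# The glyph "programs" are placeholder strings, not real charstrings.

NOTDEF = ".notdef"


def make_encoding(assignments):
    """Build a 256-entry Encoding array; unassigned slots hold /.notdef."""
    encoding = [NOTDEF] * 256
    for code, name in assignments.items():
        if not 0 <= code <= 255:
            raise ValueError(f"character code {code} does not fit in one byte")
        encoding[code] = name
    return encoding


def lookup_glyph(code, encoding, charstrings):
    """Resolve one byte of a show string to a glyph program."""
    name = encoding[code]                               # step 1: code -> name
    return charstrings.get(name, charstrings[NOTDEF])   # step 2: name -> program


# Hypothetical font contents; "a.sc" stands in for a small-cap glyph name.
charstrings = {NOTDEF: "<notdef outline>", "A": "<outline of A>", "a.sc": "<small-cap a>"}
encoding = make_encoding({65: "A", 97: "a.sc"})
```

Re-encoding a font in this model means building a new Encoding list and leaving CharStrings untouched, which is why re-encoding can only ever expose glyphs that already have a name in CharStrings.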

PostScript also includes type 0 fonts (Original Composite Fonts) and 
CIDFonts, both of which are potentially referenced using multiple bytes for 
the character code. It is possible to apply a Unicode CMap (that's not the 
TrueType CMAP table, but a PostScript construct which is quite different) 
to a CIDFont which will allow you to access the glyphs using Unicode values.

However, that's rather coincidental, just as it's possible to construct a
regular Encoding which uses ASCII values to map to the named glyphs which
represent each of the ASCII code points. The PostScript interpreter neither
knows nor cares.


In general, TrueType fonts in PostScript are handled by conversion to type
42 fonts, or CIDFonts with type 42 outlines. For OpenType fonts with CFF
outlines it's also possible to convert into CFF fonts, or CIDFonts with CFF
outlines. This is normally done by the creating application, and the font
is embedded in the PostScript stream.

Now, in addition to all that, Ghostscript can use TrueType fonts directly
(as mentioned, this is a non-standard extension; PostScript does not handle
TrueType fonts). This can be done in two ways, either as a regular font or
as a CIDFont, depending on whether you declare the font in Fontmap.GS or in
cidfmap.

When loaded as a regular font, GS will attempt to use the CMAP and POST
tables in the TrueType font to build a reasonable Encoding for the font.
If I remember correctly we prefer the 3,1 (Windows Unicode) CMAP subtable
for this; I'm not certain what happens if that subtable is not present.
The Encoding is used to map the character code to a name, then on to a GID
which is used to access the glyph.
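A rough sketch of that idea in Python, assuming the simplified behaviour described above rather than Ghostscript's actual code: each single-byte code is looked up in a (hypothetical) 3,1 cmap subtable to get a GID, the post table supplies the glyph name for that GID, and the result is an Encoding plus a name-to-GID table standing in for CharStrings. The table contents and the glyphNNN fallback name are invented for illustration.

```python
# Hypothetical extracts from a TrueType/OpenType font (values are made up).
CMAP_3_1 = {0x0041: 36, 0x0061: 68}                         # Unicode code point -> GID
POST_NAMES = {0: ".notdef", 36: "A", 68: "a", 250: "a.sc"}  # GID -> glyph name


def build_encoding(cmap, post):
    """Return (encoding, charstrings): code -> name, and name -> GID."""
    encoding = [".notdef"] * 256
    charstrings = {".notdef": 0}
    for code in range(256):
        # Simplification: treat the single-byte code as a Unicode code point.
        gid = cmap.get(code)
        if gid is None:
            continue
        name = post.get(gid, f"glyph{gid}")  # hypothetical fallback name
        encoding[code] = name
        charstrings[name] = gid
    return encoding, charstrings


def code_to_gid(code, encoding, charstrings):
    """Character code -> glyph name -> GID (0 means .notdef)."""
    return charstrings.get(encoding[code], 0)


encoding, charstrings = build_encoding(CMAP_3_1, POST_NAMES)
```

In this model a glyph that no cmap entry reaches, such as the hypothetical small cap a.sc at GID 250, never gets a name-to-GID entry, so no character code can select it.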

When loaded as a CIDFont, things are of course somewhat more complicated.
You don't normally use a CIDFont 'as-is'; the only operator which works
with a naked CIDFont is glyphshow, and in that case it takes a CID rather
than a name. Normally the CIDFont is composed with a CMap to produce a
CID-keyed instance of the font. This maps a character code to a CID, and
things proceed more or less as for a regular font. The difference is that
the character code can consist of more than one byte (indeed, it can be of
variable length).
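How a multi-byte, even variable-length, string is split into character codes is governed by the codespace ranges declared in the CMap (begincodespacerange/endcodespacerange). The Python sketch below models that splitting under the assumption of a small, made-up codespace with one one-byte range and one two-byte range; mapping each resulting code on to a CID is a separate step not shown here.

```python
# Made-up codespace: one-byte codes 00-80 and two-byte codes 8140-9FFC.
# Each range is (low, high) as byte strings of equal length.
CODESPACE = [(b"\x00", b"\x80"), (b"\x81\x40", b"\x9f\xfc")]


def in_range(chunk, low, high):
    """CMap ranges are checked byte by byte, each against its own low/high."""
    return all(lo <= b <= hi for b, lo, hi in zip(chunk, low, high))


def split_codes(data, codespace=CODESPACE):
    """Split a show string into character codes, trying shorter codes first."""
    lengths = sorted({len(lo) for lo, _ in codespace})
    codes, i = [], 0
    while i < len(data):
        for length in lengths:
            chunk = data[i:i + length]
            if len(chunk) == length and any(
                len(lo) == length and in_range(chunk, lo, hi)
                for lo, hi in codespace
            ):
                codes.append(chunk)
                i += length
                break
        else:
            # Simplification: this sketch rejects bytes matching no range.
            raise ValueError(f"byte {data[i]:#04x} at offset {i} matches no codespace range")
    return codes
```

For example, split_codes(b"A\x81\x40B") yields three codes: b"A", b"\x81\x40" and b"B".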

But we can't access a TrueType font using a CID, so we need to map the
character code to a CID, then map the CID to a GID, so that we can access
the glyphs in the font. This is complicated, and obviously somewhat prone
to error, since we are trying to replace a PostScript font with a
non-PostScript font. There's quite a bit of internal jiggery-pokery
involved with the various TrueType tables and the CIDSystemInfo dictionary
to try to build the two-step mapping we need for this.
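Stripped of the jiggery-pokery, the two-step mapping is just two table lookups. Here is a minimal Python sketch with hypothetical tables: CMAP stands for the CMap (character code to CID) and CID_TO_GID for whatever CID-to-GID mapping Ghostscript derives internally from the TrueType tables and CIDSystemInfo. Falling back to 0 at each step is a simplification of how undefined codes and missing glyphs end up at .notdef.

```python
# Hypothetical tables; the values are made up for illustration.
CMAP = {b"\x00\x01": 1, b"\x00\x22": 34}   # character code -> CID
CID_TO_GID = {1: 3, 34: 68}                # CID -> GID


def resolve_gid(code, cmap=CMAP, cid_to_gid=CID_TO_GID):
    """Character code -> CID -> GID, using 0 (.notdef) when a step fails."""
    cid = cmap.get(code, 0)          # step 1: the CMap
    return cid_to_gid.get(cid, 0)    # step 2: the derived CID -> GID map
```

With an Identity CMap and an Identity ordering, both tables collapse to the identity function, which is what makes that setup much more predictable.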

However, the simplest way to use it is to declare the CIDFont as having an
Identity CIDSystemInfo and then compose that CIDFont with an Identity CMap.
Then you can use the GID as the character code (Identity-H uses two-byte
codes, so each glyph is selected by a two-byte value). This effectively
works out as a one-to-one mapping, so the character code ends up as the GID
used to access the glyph in the font. Since you can get the GID, this may
be the easiest way to work.
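As a sketch of what that one-to-one mapping means for a show string: Identity-H reads the string as two-byte codes, high byte first, and with the Identity ordering each code value is both the CID and the GID. The Python helper below (hypothetical, not part of Ghostscript) performs that decoding so you can check which GIDs a given string will select.

```python
def identity_h_gids(show_bytes):
    """Decode an Identity-H show string into GIDs (code == CID == GID here)."""
    if len(show_bytes) % 2:
        # This sketch simply rejects a dangling odd byte.
        raise ValueError("Identity-H strings must contain an even number of bytes")
    return [
        int.from_bytes(show_bytes[i:i + 2], "big")  # two bytes, high byte first
        for i in range(0, len(show_bytes), 2)
    ]
```

For instance, the PostScript hex string <00030102> selects GIDs 3 and 258.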

E.g., in cidfmap:

/TimesNewRoman << /CSI [(Identity) 0] /Path (c:/windows/fonts/times.ttf) 
/SubfontID 0 /FileType /TrueType >> ;

Note that we've used the 'Identity' Ordering in the CIDSystemInfo (CSI).

Then, to use the font:
/TimesNewRoman-Identity-H findfont

Note that this uses the CIDFont 'TimesNewRoman' and the CMap 'Identity-H'.
If GS can't find a named font, one of the things it will do is try to
identify a CIDFont-CMap combination. If it thinks there is a possible
combination, it will check whether an appropriately named CIDFont and CMap
are available. If so, it will automatically compose the two together to
produce the CID-keyed instance for you.
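Putting those pieces together, a complete test page only needs findfont, scalefont, setfont and a hex string of two-byte GIDs. The Python function below writes such a page as text; the function name, default size and page position are assumptions for illustration, and the GIDs in the example are arbitrary.

```python
def identity_h_page(gids, font="TimesNewRoman", size=24, x=72, y=720):
    """Return a one-page PostScript program showing `gids` via <font>-Identity-H."""
    for gid in gids:
        if not 0 <= gid <= 0xFFFF:
            raise ValueError(f"GID {gid} does not fit in a two-byte Identity-H code")
    # Each GID becomes four hex digits, i.e. one two-byte Identity-H code.
    hex_string = "<" + "".join(f"{gid:04X}" for gid in gids) + ">"
    return "\n".join([
        "%!PS",
        f"/{font}-Identity-H findfont {size} scalefont setfont",
        f"{x} {y} moveto",
        f"{hex_string} show",
        "showpage",
        "",
    ])


if __name__ == "__main__":
    # Arbitrary example GIDs; which glyphs they select depends on the font.
    print(identity_h_page([36, 68, 250]))
```

Saving the output to a file and running it through gs, with the cidfmap entry above in place, should draw whichever glyphs sit at those GIDs.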


>Where I'm kinda lost is making the "bridge" between all that
>font info and the PostScript machinery/code I need to write in order to 
>access all the glyphs in the font, especially the small caps and oldstyle 
>numbers. I know you can use glyphshow to draw any glyph in the
>CharStrings dictionary, encoded or not, but that seems a complicated
>way to go about things, maybe --- perhaps not?

I'm not absolutely certain (I'd have to go and read the code), but I don't
think this will work with a TrueType font loaded as a PostScript font.
Given the way the font is loaded, I'm not sure that glyphs which are not
present in the Encoding have any way to map their name to a GID. Since you
can't access TrueType glyphs by name, that would mean you can't use
un-encoded glyphs.


>So, in summary, what else do I need to read or
>understand to fully utilise OpenType CFF-flavour
>fonts with GS in order to access all the various
>glyphs in the font.

Ideally you should read the font yourself, convert it into a type 42 font
(or a CIDFont with type 42 outlines) and embed the font in the PostScript
stream. This is probably the only 100% reliable method.


>--- Is it a matter of reencoding the font?
>--- Or, is the CIDFont machinery something that
>I need to understand?

It may be that loading the font with an Identity-H CMap (and declaring it 
with Identity Ordering in CIDSystemInfo in cidfmap) would be the easiest 
way for you to proceed, though this does require that you know the GID for 
the glyphs you want to use.
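A glyph's GID is simply its index in the font's glyph order, so once you know the glyph names (small caps and oldstyle figures are often named along the lines of a.sc or one.osf, but naming varies from font to font) finding the GID is a lookup. A minimal Python sketch, with a made-up glyph order:

```python
def gid_for_name(glyph_order, name):
    """Return the GID of `name`, i.e. its index in the font's glyph order."""
    try:
        return glyph_order.index(name)
    except ValueError:
        raise KeyError(f"no glyph named {name!r} in this font") from None


# Made-up glyph order for illustration; a real font has its own.
ORDER = [".notdef", "space", "a", "a.sc", "one.osf"]
```

If the fontTools library is installed, TTFont(path).getGlyphOrder() returns the glyph order of a font file, and TTFont(path).getGlyphID(name) performs the same lookup directly.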


>Is CIDFont machinery relevant to this situation
>--- I have the Adobe specs but not yet fully
>read or absorbed them.

I doubt that you really want to use the full power of the CIDFont and CMap
mechanism. That would be more appropriate if you were going to embed the
font as a CIDFont, especially if you wanted to access more than 256 glyphs
from one font. The mechanism really exists for languages which need more
than 256 glyphs at a time (e.g. Chinese, Japanese, Korean, Vietnamese),
though recent Adobe products make extensive use of it even for Latin fonts.



                     Ken


