[gs-bugs] [Bug 691274] Missing or incorrect ToUnicode when using Identity ordering

bugzilla-daemon at ghostscript.com bugzilla-daemon at ghostscript.com
Sat May 1 08:56:46 UTC 2010


http://bugs.ghostscript.com/show_bug.cgi?id=691274

--- Comment #5 from Ken Sharp <ken.sharp at artifex.com> 2010-05-01 08:56:45 UTC ---
GlyphNames2Unicode has nothing to do with the CMap; it's a dictionary entry
in the FontInfo dictionary, in the Font dictionary.

This is an undocumented Adobe extension to the PostScript language. As noted
previously there is no provision for Unicode in PostScript, so there is no
standard method for creating a ToUnicode CMap. The information on this entry
noted below is gathered from PostScript files and the observed behaviour of
Adobe applications, and may not be complete or correct.

For regular PostScript fonts pdfwrite will try to assemble a ToUnicode CMap
using the glyph names of the entries in the CharStrings dictionary. This is not
100% reliable of course as embedded (particularly subset) fonts may use
meaningless glyph names, or may simply use standard names for non-standard
glyphs.

CIDFonts do not have glyph names, so this approach cannot work. This is where
the GlyphNames2Unicode entry is particularly useful, as it can associate either
glyph names or CIDs with Unicode code points.

The dictionary contains up to 65534 entries, each of which takes one of two
forms, either:

/glyphname <Unicode code point>

or

CID <Unicode code point>
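
As a rough illustration of the two forms, the sketch below builds one
dictionary keyed by glyph name and one keyed by CID. The names NameKeyed and
CIDKeyed, the CIDs and the code points are all invented for the example, and
writing each value as a hex string of big-endian 16-bit units is my assumption
about what is expected, not something taken from documentation.

```postscript
% Sketch only: every entry here is illustrative, not taken from a real font.
% Assumption: each value is a string of big-endian 16-bit Unicode units.

% Form 1: keys are glyph names (regular fonts).
/NameKeyed <<
  /A    <0041>    % glyph name A    -> U+0041
  /Euro <20AC>    % glyph name Euro -> U+20AC
>> def

% Form 2: keys are integer CIDs (CIDFonts). The CIDs are hypothetical.
/CIDKeyed <<
  36 <0041>       % CID 36 -> U+0041
  37 <0042>       % CID 37 -> U+0042
>> def
```

A real font would normally use only one of the two forms, depending on whether
it is a regular font or a CIDFont.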


Please note that I haven't attempted any of the following myself; this is an
outline of how to proceed, not a recipe.

I'm assuming that you are using the Arial TrueType font from disk, and adding
an entry to cidfmap so that Ghostscript is able to treat that font as a
CIDFont, using a suitable CMap.

You will need to make a copy of that font dictionary by copying all the
contents of the font dict. The FontInfo dict needs to be copied, and into the
copy you need to insert a new dictionary named GlyphNames2Unicode. You will
need to populate the GlyphNames2Unicode dictionary with appropriate CIDs and
Unicode code points. Finally you will need to call definefont with the modified
font dict.
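
The copy-and-redefine steps might look something like the sketch below. It is
untested and uses /Helvetica, an ordinary base font, purely as a stand-in so
that the fragment is self-contained; /Helvetica-G2U is an invented name and the
two GlyphNames2Unicode entries are placeholders. For the CIDFont case the same
steps would presumably be applied to the CIDFont dictionary, with CIDs as keys.

```postscript
% Sketch only, untested. /Helvetica and /Helvetica-G2U are stand-in names.
/Helvetica findfont
dup length 1 add dict begin          % leave room for FontInfo if it is absent
  % Copy every entry except FID, which definefont creates afresh.
  { 1 index /FID ne { def } { pop pop } ifelse } forall

  % Copy FontInfo (or start an empty one) so the original is left untouched.
  /FontInfo
    currentdict /FontInfo known
      { FontInfo dup length 1 add dict copy }
      { 1 dict }
    ifelse
    % Insert the new dictionary; these entries are illustrative only.
    dup /GlyphNames2Unicode << /A <0041> /B <0042> >> put
  def

  /FontName /Helvetica-G2U def
  currentdict
end
/Helvetica-G2U exch definefont pop   % register the modified copy
```

The copy of FontInfo is made one entry larger than the original so that the
extra key fits even on an interpreter where dictionaries do not grow
automatically.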

I'm fairly sure there will be some missing bits in there, so I'd suggest you
start by simply copying the font and defining a new font with a new name from
the copied font dictionary, then work on inserting a GlyphNames2Unicode
dictionary in the FontInfo dictionary. From there it should be relatively easy
to match the CIDs to the Unicode code points and create a working ToUnicode
CMap in the output PDF.
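
Once the pairing of CIDs and code points is known, filling the dictionary is
mechanical. The sketch below assumes the pairs are held in a flat array called
Pairs (an invented name), alternating CID and value, and copies them into a
dictionary called G2U (also invented). The CIDs and values are placeholders.

```postscript
% Sketch only: Pairs and G2U are hypothetical names, the data is made up.
/Pairs [ 36 <0041>  37 <0042>  38 <0043> ] def   % CID, value, CID, value, ...
/G2U Pairs length 2 idiv dict def                % one entry per pair

0 2 Pairs length 2 sub {        % i = 0, 2, 4, ... indexes each CID
  dup Pairs exch get            % stack: i cid
  exch 1 add Pairs exch get     % stack: cid value
  G2U 3 1 roll put              % G2U[cid] = value
} for
```

The resulting G2U dictionary is what would be stored under the
GlyphNames2Unicode key in the copied FontInfo dictionary.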
