[gs-bugs] [Bug 691862] Unable to copy text from the converted PDF

bugzilla-daemon at ghostscript.com bugzilla-daemon at ghostscript.com
Tue Jan 4 09:39:52 UTC 2011


Ken Sharp <ken.sharp at artifex.com> changed:

           What    |Removed                     |Added
             Status|NEW                         |ASSIGNED

--- Comment #4 from Ken Sharp <ken.sharp at artifex.com> 2011-01-04 09:39:49 UTC ---

It seems the Adobe documentation lies (or, more generously, is inconsistent). The
CMap tech note (5014) says that entries are not zero padded, so values less
than 256 are emitted as single bytes, values 256->65535 as two bytes, etc.
However, the ToUnicode CMap tech note (5411) says:

"Because a “ToUnicode” mapping file is used to convert from CIDs (which begin at
decimal 0, which is expressed as 0x0000 in hexadecimal notation) to Unicode
code points, the following “codespacerange” definition, without exception,
shall always be used: 1 begincodespacerange <0000> <FFFF> endcodespacerange"

(This is somewhat restrictive, since CIDs can exceed 2 bytes even though
UTF-16 can't; I could foresee a need to map high CIDs to lower UTF-16 values.)

Finally, the PDF Reference (1.7) says:

"The CMap file must contain begincodespacerange and endcodespacerange operators
that are consistent with the encoding that the font uses. In particular, for a
simple font, the codespace must be one byte long."

So the PDF Reference conflicts with the tech note which it references!
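
To make the disagreement concrete, here is a minimal Python sketch of the two
ways a code could be written as a hex string: unpadded, as the CMap tech note
is described above, and at a fixed byte width, as the ToUnicode tech note and
the PDF Reference each require (two bytes and one byte respectively). The
function names are hypothetical and are not taken from Ghostscript or any Adobe
document.

```python
def hex_minimal(value: int) -> str:
    """Unpadded form: use the fewest whole bytes that hold the value."""
    nbytes = max(1, (value.bit_length() + 7) // 8)
    return "<%0*X>" % (nbytes * 2, value)


def hex_fixed(value: int, nbytes: int) -> str:
    """Fixed-width form: always emit exactly nbytes bytes, zero padded."""
    if not 0 <= value < (1 << (8 * nbytes)):
        raise ValueError("value does not fit in %d byte(s)" % nbytes)
    return "<%0*X>" % (nbytes * 2, value)


if __name__ == "__main__":
    code = 0x41
    print(hex_minimal(code))    # unpadded rule
    print(hex_fixed(code, 2))   # two-byte rule (<0000> <FFFF> codespace)
    print(hex_fixed(code, 1))   # one-byte rule for a simple font
```

For code 0x41 the unpadded and one-byte rules agree, but the two-byte rule
yields a different string, so the three documents cannot all be satisfied for
a simple font.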

In fact none of the above seems to be quite what Acrobat actually does. 

It seems that Acrobat does not care what size (in bytes) the codespacerange is,
no matter what kind of font is present. However, it *does* care what size the
bfrange entries are. For simple fonts the bfrange entries must be single bytes;
for CIDFonts they must be two bytes. Deviation in either case produces files
which Acrobat cannot process properly, resulting in either errors or incorrect
text when copying and pasting.

A fix which writes the codespacerange and bfrange depending on the type of the
font is now in testing.
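
As an illustration only (this is not the Ghostscript code, and the function
name and argument layout are assumptions made for the example), a writer that
chooses the code width from the font type might look like the following Python
sketch. It assumes each destination is a single UTF-16 code unit written as
four hex digits.

```python
def tounicode_fragment(is_cid_font, ranges):
    """Build the codespacerange and bfrange sections of a ToUnicode CMap.

    ranges is a list of (first_code, last_code, first_unicode) tuples.
    Source codes are written as two bytes for a CIDFont and one byte for
    a simple font, and the codespacerange uses the same width.
    """
    width = 2 if is_cid_font else 1        # bytes per source code
    digits = width * 2                     # hex digits per source code
    top = (1 << (8 * width)) - 1           # largest code at this width

    lines = [
        "1 begincodespacerange",
        "<%0*X> <%0*X>" % (digits, 0, digits, top),
        "endcodespacerange",
        "%d beginbfrange" % len(ranges),
    ]
    for lo, hi, uni in ranges:
        if not 0 <= lo <= hi <= top:
            raise ValueError("range does not fit the code width")
        # Destination assumed to be one UTF-16 code unit (four hex digits).
        lines.append("<%0*X> <%0*X> <%04X>" % (digits, lo, digits, hi, uni))
    lines.append("endbfrange")
    return "\n".join(lines)


if __name__ == "__main__":
    az = [(0x41, 0x5A, 0x0041)]            # codes for A-Z mapped to U+0041..
    print(tounicode_fragment(False, az))   # simple font: single-byte codes
    print(tounicode_fragment(True, az))    # CIDFont: two-byte codes
```

A real writer would need more than this, for example splitting long lists into
several blocks and handling destinations outside the Basic Multilingual Plane.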
