[gs-bugs] [Bug 685335] PDF interpreter doesn't process ToUnicode
bugs.ghostscript.com-bugzilla-daemon at ghostscript.com
Fri Jul 3 00:54:18 PDT 2009
http://bugs.ghostscript.com/show_bug.cgi?id=685335
------- Additional Comments From ken.sharp at artifex.com 2009-07-03 00:54 -------
What do you mean by 'failed'? Did you get a PostScript error, or something else?
You shouldn't call gs_font_map_glyph_to_unicode directly; you should use the
font's decode_glyph method.
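To illustrate why the method matters, here is a minimal C sketch of per-font dispatch. Everything in it (`font_t`, `font_procs_t`, `NO_CHAR`, the two decode functions) is a hypothetical stand-in, not Ghostscript's actual types or signatures. The point is only that a caller going through `procs.decode_glyph` picks up whatever lookup that particular font supplies, such as one backed by a ToUnicode table, whereas a single global mapping function cannot know about it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT the
   real Ghostscript types or signatures. */
typedef unsigned long glyph_t;
typedef long unicode_t;
#define NO_CHAR ((unicode_t)-1)   /* plays the role of GS_NO_CHAR */

typedef struct font_s font_t;

typedef struct {
    /* Per-font method: each kind of font may supply its own lookup. */
    unicode_t (*decode_glyph)(const font_t *font, glyph_t glyph);
} font_procs_t;

struct font_s {
    font_procs_t procs;
    const unicode_t *to_unicode;  /* e.g. filled from a ToUnicode CMap */
    size_t to_unicode_len;
};

/* Method for a font that carries no Unicode information. */
static unicode_t decode_glyph_none(const font_t *font, glyph_t glyph)
{
    (void)font;
    (void)glyph;
    return NO_CHAR;
}

/* Method for a font with a ToUnicode table attached. */
static unicode_t decode_glyph_table(const font_t *font, glyph_t glyph)
{
    if (glyph >= font->to_unicode_len)
        return NO_CHAR;
    return font->to_unicode[glyph];
}

/* Callers dispatch through the method, so each font's own lookup is used. */
static unicode_t font_glyph_to_unicode(const font_t *font, glyph_t glyph)
{
    assert(font != NULL && font->procs.decode_glyph != NULL);
    return font->procs.decode_glyph(font, glyph);
}
```

With this arrangement a font without Unicode data reports `NO_CHAR`, while a font built from a ToUnicode CMap answers from its own table, and the caller does not need to know which kind it has.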
The JPEG device doesn't handle text, so presumably you are using a custom
device? It's pretty difficult to comment on the behaviour of code I haven't seen.
Note that pdfwrite makes little use of the Unicode information: it simply uses
it to construct a ToUnicode CMap for the output PDF file. I would suggest you
start by debugging the code: set a breakpoint in pdf_add_ToUnicode, run your
test file as input, and see what happens.
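For reference, the body of a ToUnicode CMap maps character codes to Unicode values in `beginbfchar`/`endbfchar` sections. The following C sketch writes one such section. `code_map_t` and `write_bfchar` are hypothetical names, not pdfwrite's code, and the sketch assumes 2-byte character codes and BMP-only Unicode values; a complete CMap also needs the surrounding header and a codespace range matching the font's encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical pair type: a 2-byte character code and the Unicode
   (BMP) value it should map to. */
typedef struct {
    unsigned code;      /* character code as used in the content stream */
    unsigned unicode;   /* UTF-16 code unit (BMP only in this sketch) */
} code_map_t;

/* Writes one bfchar section of a ToUnicode CMap into buf.
   Returns the number of characters written (excluding the NUL),
   or -1 if buf is too small. The CMap format allows at most 100
   entries per bfchar block, so count must not exceed 100. */
static int write_bfchar(char *buf, size_t size,
                        const code_map_t *map, size_t count)
{
    size_t used;
    int n;

    assert(buf != NULL && count <= 100);
    n = snprintf(buf, size, "%zu beginbfchar\n", count);
    if (n < 0 || (size_t)n >= size)
        return -1;
    used = (size_t)n;

    for (size_t i = 0; i < count; i++) {
        /* Each entry is "<srcCode> <dstUnicode>" in hex. */
        n = snprintf(buf + used, size - used, "<%04X> <%04X>\n",
                     map[i].code, map[i].unicode);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(buf + used, size - used, "endbfchar\n");
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    used += (size_t)n;
    return (int)used;
}
```

For the two pairs 0x0041 to U+0041 and 0x0102 to U+00E9, this writes the lines `2 beginbfchar`, `<0041> <0041>`, `<0102> <00E9>` and `endbfchar`.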
You should also look at scan_cmap_text, especially this code:
    if (pdf_is_CID_font(subfont)) {
        if (subfont->procs.decode_glyph((gs_font *)subfont, glyph) != GS_NO_CHAR) {
            /* Since PScript5.dll creates GlyphNames2Unicode with character codes
               instead of CIDs, and with the WinCharSetFFFF-H2 CMap
               character codes appear different from CIDs (Bug 687954),
               pass the character code instead of the CID. */
            code = pdf_add_ToUnicode(pdev, subfont, pdfont,
                                     chr + GS_MIN_CID_GLYPH, chr, NULL);
        } else {
            /* If we interpret a PDF document, a ToUnicode
               CMap may be attached to the Type 0 font. */
            code = pdf_add_ToUnicode(pdev, pte->orig_font, pdfont,
                                     chr + GS_MIN_CID_GLYPH, chr, NULL);
        }
    }
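The CID-font branch above can be restated as a small C model. `choose_tounicode`, `tounicode_request_t` and `MIN_CID_GLYPH` are hypothetical; `MIN_CID_GLYPH` merely stands in for `GS_MIN_CID_GLYPH`, and the value used here is arbitrary rather than Ghostscript's. The model makes explicit that both branches pass the character code (offset into the CID glyph range) as the key, and only the font supplying the Unicode data differs.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of the quoted branch; not Ghostscript code.
   MIN_CID_GLYPH stands in for GS_MIN_CID_GLYPH; its value is arbitrary. */
#define MIN_CID_GLYPH 0x80000000UL

typedef enum { SRC_SUBFONT, SRC_ORIG_TYPE0_FONT } tounicode_source_t;

typedef struct {
    tounicode_source_t source; /* which font supplies the Unicode data */
    unsigned long glyph;       /* key that would go to pdf_add_ToUnicode */
    unsigned long chr;         /* character code from the text string */
} tounicode_request_t;

/* subfont_decodes: whether the CIDFont's decode_glyph succeeded. */
static tounicode_request_t choose_tounicode(bool subfont_decodes,
                                            unsigned long chr)
{
    tounicode_request_t r;

    /* Guard against overflow when offsetting into the CID glyph range. */
    assert(chr <= ULONG_MAX - MIN_CID_GLYPH);
    /* Both branches key the entry by the character code, not the CID. */
    r.glyph = chr + MIN_CID_GLYPH;
    r.chr = chr;
    /* Prefer the CIDFont's own data; otherwise fall back to the original
       Type 0 font, which may carry the PDF's ToUnicode CMap. */
    r.source = subfont_decodes ? SRC_SUBFONT : SRC_ORIG_TYPE0_FONT;
    return r;
}
```

Seen this way, a missing ToUnicode entry in the output is more likely a problem with the font whose Unicode data is consulted than with the key being passed.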
You might find it easier to use MuPDF to extract the text while using GS to
create a JPEG file.