[gs-bugs] [Bug 685335] PDF interpreter doesn't process ToUnicode

bugs.ghostscript.com-bugzilla-daemon at ghostscript.com
Fri Jul 3 00:54:18 PDT 2009


------- Additional Comments From ken.sharp at artifex.com  2009-07-03 00:54 -------
What do you mean by 'failed'? Did you get a PostScript error, or something else?

You shouldn't be calling gs_font_map_glyph_to_unicode directly; you should use
the font's decode_glyph method instead.
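
To illustrate the pattern, here is a minimal, self-contained sketch of going
through a font's own method table rather than naming one decoder. The names
demo_font, demo_procs, demo_range_decode and DEMO_NO_CHAR are invented
stand-ins that only mirror the shape of gs_font, procs.decode_glyph and
GS_NO_CHAR; they are not the real Ghostscript declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for gs_glyph, gs_char and GS_NO_CHAR (assumed shapes only). */
typedef unsigned long demo_glyph;
typedef unsigned long demo_char;
#define DEMO_NO_CHAR ((demo_char)~0UL)

typedef struct demo_font_s demo_font;

/* Per-font procedure table: each font type installs its own decoder. */
typedef struct {
    demo_char (*decode_glyph)(demo_font *font, demo_glyph glyph);
} demo_procs;

struct demo_font_s {
    demo_procs procs;
    demo_glyph first_glyph;   /* hypothetical: first glyph this font decodes */
    demo_char  first_unicode; /* hypothetical: Unicode value of first_glyph */
    unsigned   count;         /* hypothetical: number of consecutive mappings */
};

/* Toy decoder: maps a contiguous glyph range onto a contiguous Unicode range. */
static demo_char demo_range_decode(demo_font *font, demo_glyph glyph)
{
    if (glyph < font->first_glyph || glyph - font->first_glyph >= font->count)
        return DEMO_NO_CHAR;
    return font->first_unicode + (glyph - font->first_glyph);
}

/* Callers dispatch through the font's method; they never pick a decoder. */
static demo_char demo_glyph_to_unicode(demo_font *font, demo_glyph glyph)
{
    assert(font != NULL && font->procs.decode_glyph != NULL);
    return font->procs.decode_glyph(font, glyph);
}
```

The point of the indirection is that a font which carries its own Unicode
information can supply it without the caller needing to know how.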

The JPEG device doesn't handle text, so presumably you are using a custom
device? It's pretty difficult to comment on the behaviour of code I haven't seen.

Note that pdfwrite makes little use of the Unicode information; it simply uses
it to construct a ToUnicode CMap for the output PDF file. I would suggest you
start by debugging the code: set a breakpoint in pdf_add_ToUnicode, run your
test file as the input, and see what happens.
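
For orientation, a ToUnicode CMap maps character codes to Unicode values with
entries such as <0003> <0020> inside a beginbfchar / endbfchar section. The
sketch below formats one such entry for a 2-byte code and a BMP code point.
format_bfchar is a hypothetical helper for illustration only; it is not how
pdfwrite itself emits the CMap.

```c
#include <stdio.h>

/* Hypothetical helper: format one bfchar entry mapping a 2-byte character
 * code to a single UTF-16BE code unit (BMP only, no surrogate pairs).
 * Returns the number of characters written, or -1 if an input is out of
 * range or the buffer is too small. */
static int format_bfchar(char *buf, size_t size, unsigned code, unsigned unicode)
{
    int n;

    if (code > 0xFFFF || unicode > 0xFFFF)
        return -1;
    n = snprintf(buf, size, "<%04X> <%04X>", code, unicode);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

If the entries you expect never show up in the output file's ToUnicode CMap,
that suggests the Unicode values were never handed over in the first place.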

You should also look at scn_cmap_text, especially this code:

    if (pdf_is_CID_font(subfont)) {
        if (subfont->procs.decode_glyph((gs_font *)subfont, glyph) != GS_NO_CHAR) {
            /* Since PScript5.dll creates GlyphNames2Unicode with character codes
               instead of CIDs, and with the WinCharSetFFFF-H2 CMap
               character codes appear different from CIDs (Bug 687954),
               pass the character code instead of the CID. */
            code = pdf_add_ToUnicode(pdev, subfont, pdfont,
                chr + GS_MIN_CID_GLYPH, chr, NULL);
        } else {
            /* If we interpret a PDF document, a ToUnicode
               CMap may be attached to the Type 0 font. */
            code = pdf_add_ToUnicode(pdev, pte->orig_font, pdfont,
                chr + GS_MIN_CID_GLYPH, chr, NULL);
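
The branch in the excerpt above boils down to "use the descendant font if it
can decode the glyph, otherwise fall back to the original Type 0 font". A rough
standalone sketch of that choice follows. mock_font, MOCK_NO_CHAR, the two toy
decoders and pick_unicode_source are all invented for illustration, and the
pdf_is_CID_font test is left out.

```c
/* Stand-ins for gs_glyph, gs_char and GS_NO_CHAR (assumed shapes only). */
typedef unsigned long mock_glyph;
typedef unsigned long mock_char;
#define MOCK_NO_CHAR ((mock_char)~0UL)

typedef struct mock_font_s mock_font;
struct mock_font_s {
    mock_char (*decode_glyph)(mock_font *font, mock_glyph glyph);
};

/* Toy decoder for a font that carries no Unicode information. */
static mock_char decode_none(mock_font *font, mock_glyph glyph)
{
    (void)font;
    (void)glyph;
    return MOCK_NO_CHAR;
}

/* Toy decoder that treats the glyph number as its Unicode value. */
static mock_char decode_identity(mock_font *font, mock_glyph glyph)
{
    (void)font;
    return (mock_char)glyph;
}

/* Prefer the descendant font when it can decode the glyph; otherwise fall
 * back to the original (Type 0) font, which may carry the ToUnicode CMap. */
static mock_font *pick_unicode_source(mock_font *subfont, mock_font *orig_font,
                                      mock_glyph glyph)
{
    if (subfont->decode_glyph(subfont, glyph) != MOCK_NO_CHAR)
        return subfont;
    return orig_font;
}
```

When interpreting a PDF file it is the fallback case that matters, so it is
worth checking in the debugger which of the two fonts actually gets passed on.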

You might find it easier to use MuPDF to extract the text while using GS to
create a JPEG file.
