[gs-bugs] [Bug 691506] converting pdf with accented characters to text

bugzilla-daemon at ghostscript.com bugzilla-daemon at ghostscript.com
Wed Jul 28 12:17:34 UTC 2010


http://bugs.ghostscript.com/show_bug.cgi?id=691506

--- Comment #1 from Ken Sharp <ken.sharp at artifex.com> 2010-07-28 12:17:33 UTC ---
(In reply to comment #0)

> For example, occurrences of é are translated to e'.

Often a character with an accent is actually described as the base character +
an accent character. Without a sample file it's not possible to say any more.
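
To illustrate the "base character + accent character" case, here is a minimal
Python sketch of post-processing already-extracted text. It is not something
Ghostscript does, and it assumes the accent comes out as a separate spacing
character immediately after its base letter. The names SPACING_TO_COMBINING and
compose_accents are hypothetical, chosen only for this example.

```python
import unicodedata

# Assumed mapping from spacing accent characters to Unicode combining marks.
# The plain apostrophe is deliberately left out of the default, because
# mapping it would also rewrite ordinary text such as "don't".
SPACING_TO_COMBINING = {
    "\u00b4": "\u0301",  # ACUTE ACCENT -> COMBINING ACUTE ACCENT
    "\u00a8": "\u0308",  # DIAERESIS    -> COMBINING DIAERESIS
}


def compose_accents(text, marks=SPACING_TO_COMBINING):
    """Merge 'letter + spacing accent' pairs into one precomposed character."""
    out = []
    for ch in text:
        combining = marks.get(ch)
        if combining and out and out[-1].isalpha():
            # NFC composes base + combining mark when a precomposed form exists.
            candidate = unicodedata.normalize("NFC", out[-1] + combining)
            if len(candidate) == 1:
                out[-1] = candidate
                continue
        # No accent, or no precomposed form: keep the character unchanged.
        out.append(ch)
    return "".join(out)
```

If the extracted text uses a plain apostrophe for the accent, as in the e'
reported above, a caller could pass marks={"'": "\u0301"} explicitly, accepting
that genuine apostrophes after letters would then be converted as well.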


> Am I doing something wrong?
> Is there a way to convert text to utf-8 rather than to ascii?

Currently no. There is a long-term project for better text extraction, but it
has a very low priority.


