[gs-bugs] [Bug 691506] converting pdf with accented characters to text

bugzilla-daemon at ghostscript.com bugzilla-daemon at ghostscript.com
Wed Jul 28 15:58:07 UTC 2010


--- Comment #10 from Ken Sharp <ken.sharp at artifex.com> 2010-07-28 15:58:05 UTC ---
(In reply to comment #9)
> Ken,
> Per your instructions, I have created the file ps2utf8.ps based upon
> ps2ascii.ps. The character-mappings were adapted from the DjVuLibre open-source
> project.

Wow, you've gone a lot further than anything I was suggesting...

> The file ps2utf8.ps was uploaded as an attachment. I have also uploaded the
> original DjVuLibre file, for reference, as it takes some research to find.
> In the hope that you will find this useful, and might even be inspired to add a
> ps2utf8 encoding to GS ...

Thanks for that; if anyone else requests it, I can point them to this. One day
we hope to have a more functional text extraction device written in C and able
to deal with things like ToUnicode CMaps which will allow processing of some
