[gs-bugs] [Bug 691506] converting pdf with accented characters to text

bugzilla-daemon at ghostscript.com
Wed Jul 28 15:33:36 UTC 2010


http://bugs.ghostscript.com/show_bug.cgi?id=691506

--- Comment #9 from Harry McKame <mckameh1 at armadillo.fr> 2010-07-28 15:33:35 UTC ---
Ken,

Per your instructions, I have created the file ps2utf8.ps, based on
ps2ascii.ps. The character mappings were adapted from the DjVuLibre open-source
project.
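A converter of this kind has to do two things: turn each glyph name used by the PDF's fonts (for example `eacute`) into a Unicode code point, then write that code point out as UTF-8 bytes. The Python sketch below illustrates only that idea. The table `GLYPH_TO_UNICODE` is a hypothetical handful of entries in the style of the Adobe Glyph List, not the DjVuLibre table, and the function names are illustrative; none of this is taken from ps2utf8.ps or Ghostscript.

```python
# Hypothetical subset of a glyph-name -> Unicode table (Adobe Glyph List style).
# The real mappings used by ps2utf8.ps / DjVuLibre are NOT reproduced here.
GLYPH_TO_UNICODE = {
    "space": 0x0020,
    "eacute": 0x00E9,
    "egrave": 0x00E8,
    "agrave": 0x00E0,
    "ccedilla": 0x00E7,
    "ocircumflex": 0x00F4,
    "oe": 0x0153,
}


def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (1 to 4 bytes).

    Assumes 0 <= cp <= 0x10FFFF and that cp is not a surrogate (not checked).
    """
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def glyphs_to_utf8(glyph_names: list[str]) -> bytes:
    """Map glyph names to UTF-8 bytes; unknown names become '?'."""
    out = bytearray()
    for name in glyph_names:
        if name in GLYPH_TO_UNICODE:
            cp = GLYPH_TO_UNICODE[name]
        elif len(name) == 1 and name.isascii():
            # Assumption: single-letter names like "c" stand for that ASCII character.
            cp = ord(name)
        else:
            cp = ord("?")
        out += encode_utf8(cp)
    return bytes(out)
```

For example, `glyphs_to_utf8(["c", "a", "f", "eacute"])` yields the same bytes as `"café".encode("utf-8")`. In Python itself `chr(cp).encode("utf-8")` does the encoding step; it is spelled out here only to show what the output bytes look like.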

The result seems to work: it converts French PDFs into UTF-8 text, at least
on the (few) files I have tested. I have not tested other languages,
although all European languages are supposedly supported.
However, I don't have the knowledge to verify whether what I did is entirely
correct.

I have not implemented adding a BOM (byte order mark) for Windows,
since I don't know how. Ideally, this should be optional.
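For UTF-8 the BOM is simply the three bytes EF BB BF at the start of the file, which some Windows programs use to recognize UTF-8 text. One way to make it optional, sketched in Python as a post-processing step outside Ghostscript, is a flag on the output writer. The names `utf8_bytes`, `write_utf8_text`, and `with_bom` are hypothetical and not part of ps2utf8.ps or Ghostscript.

```python
import codecs
from pathlib import Path


def utf8_bytes(text: str, with_bom: bool = False) -> bytes:
    """Encode text as UTF-8, optionally prefixed with the UTF-8 BOM (EF BB BF)."""
    data = text.encode("utf-8")
    if with_bom:
        # codecs.BOM_UTF8 == b"\xef\xbb\xbf"
        data = codecs.BOM_UTF8 + data
    return data


def write_utf8_text(path: str, text: str, with_bom: bool = False) -> None:
    """Write text to path as UTF-8; the BOM is added only when with_bom is True."""
    Path(path).write_bytes(utf8_bytes(text, with_bom))
```

Python's built-in `utf-8-sig` codec does the same thing (it writes the BOM on encoding and strips it on decoding), so readers that open the file with `encoding="utf-8-sig"` accept output with or without the BOM.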

The file ps2utf8.ps was uploaded as an attachment. I have also uploaded the
original DjVuLibre file for reference, as it takes some research to find.

I hope you will find this useful, and that it might even inspire you to add a
ps2utf8 encoding to GS...
