[gs-devel] Problem converting to PDF/A
Ken Sharp
ken.sharp at artifex.com
Fri Dec 31 10:05:53 UTC 2010
At 13:34 30/12/2010 +0100, Vicente David Guardiola Buitrago wrote:
>I'm trying to convert PDF to PDF/A using Ghostscript 8.71 but I've found
>some problems with the attached simple PDF document (test.pdf) and others.
This mailing list is really intended for developers using Ghostscript in an
application, not for general Ghostscript usage questions or bug reports.
>The problem is that the produced PDF is (I think) wrong. I mean, if I
>try to extract the plain text I get odd characters in random positions
>throughout the document, while the original document extracts correctly.
Text extraction from a PDF file is never guaranteed to work, and depends on
a number of factors. The Acrobat 9 preflight tool does not report any PDF/A
conformance errors with your output file.
Text search and extraction in PDF, especially with CIDFonts, is heavily
dependent on the presence of ToUnicode CMaps. Currently pdfwrite does not
emit a ToUnicode CMap in your file (I have not checked to see why) which is
almost certainly why your text extraction does not work.
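
As a rough way to check this for yourself, the sketch below counts the
/ToUnicode keys that appear in the raw bytes of a PDF file. It is only a
heuristic, not a PDF parser: it assumes the font dictionaries are stored
uncompressed (which should hold for PDF 1.4 based files such as PDF/A-1,
but not for files that use object streams). The function names are made up
for this illustration.

```python
import re


def count_tounicode_refs(pdf_bytes: bytes) -> int:
    """Count /ToUnicode keys in raw PDF bytes.

    Heuristic only: keys inside compressed object streams are not seen.
    """
    return len(re.findall(rb"/ToUnicode\b", pdf_bytes))


def report(path: str) -> str:
    """Return a one-line summary for the PDF file at `path`."""
    with open(path, "rb") as handle:
        count = count_tounicode_refs(handle.read())
    if count == 0:
        return f"{path}: no /ToUnicode entries found; text extraction is likely unreliable"
    return f"{path}: {count} /ToUnicode entries found"
```

Calling report() on the original file and on the pdfwrite output should show
whether the ToUnicode CMaps were lost in conversion.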
The PDF/A specification discusses Unicode character maps in section 6.3.8
where it notes: "6.3.8 is applicable only for files meeting Level A
conformance. For Level B conformance the requirements of 6.3.8 can be
ignored." Since pdfwrite does not create PDF/A-1a files, this requirement
is ignored.
If you think you have come across a bug, please open a bug report at
http://bugs.ghostscript.com
Ken