[gs-devel] Problem converting to PDF/A

Ken Sharp ken.sharp at artifex.com
Fri Dec 31 10:05:53 UTC 2010


At 13:34 30/12/2010 +0100, Vicente David Guardiola Buitrago wrote:

>I'm trying to convert PDF to PDF/a using ghostscript 8.71 but I've found
>some problems with the attached simple pdf document (test.pdf) and others.

This mailing list is really intended for developers using Ghostscript in an 
application, not for general Ghostscript usage questions or bug reports.


>The problem is that the produced PDF is (I think) wrong. I mean, If I 
>tried to extract the plain text I get odd characters in random positions 
>over the document, while the original document is right.

Text extraction is never guaranteed, and is very dependent on a number of 
factors. The Acrobat 9 preflight tool does not report any PDF/A conformance 
errors with your output file.

Text search and extraction in PDF, especially with CIDFonts, is heavily 
dependent on the presence of ToUnicode CMaps. Currently pdfwrite does not 
emit a ToUnicode CMap in your file (I have not checked to see why) which is 
almost certainly why your text extraction does not work.
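
As a rough way to check for this yourself, the sketch below scans the raw
bytes of a PDF for /ToUnicode keys. It is only a heuristic, not a PDF parser:
the function names are made up for illustration, and it assumes the font
dictionaries are stored as uncompressed objects (which is normally the case
for PDF 1.4 files, the version PDF/A-1 is based on). A file that keeps its
dictionaries in compressed object streams would need a real PDF library.

```python
import re

# Match the /ToUnicode key only as a complete name token, so that a longer
# name which merely starts with "/ToUnicode" is not counted.
TOUNICODE_KEY = re.compile(rb"/ToUnicode(?![A-Za-z0-9])")


def count_tounicode_refs(pdf_bytes: bytes) -> int:
    """Count /ToUnicode keys visible in the uncompressed parts of a PDF."""
    return len(TOUNICODE_KEY.findall(pdf_bytes))


def has_tounicode(path: str) -> bool:
    """Return True if the file appears to reference any ToUnicode CMap."""
    with open(path, "rb") as f:
        return count_tounicode_refs(f.read()) > 0
```

Calling has_tounicode() on both the original test.pdf and the pdfwrite output
would show whether the ToUnicode entries were lost in the conversion, under
the assumptions stated above.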

The PDF/A specification discusses Unicode character maps in section 6.3.8 
where it notes: "6.3.8 is applicable only for files meeting Level A 
conformance. For Level B conformance the requirements of 6.3.8 can be 
ignored." Since pdfwrite does not create PDF/A-1a files, this requirement
is ignored.

If you think you have come across a bug, please open a bug report at 
http://bugs.ghostscript.com


             Ken


