[gs-devel] [ xefitra ]Can ghostscript 8.70 produce PDF/A-1a compliant pdfs??

Ken Sharp ken.sharp at artifex.com
Fri Aug 21 04:05:49 PDT 2009


At 16:43 21/08/2009 +0500, Jahangir wrote:


>I have successfully created PDF/A-1b compliant pdfs using ghostscript. 
>Now, my next target is to be able to create PDF/A-1a compliant pdfs. Is it 
>possible with ghostscript 8.70?? If yes, can you point me in the right 
>direction. And if no, is there any work going on in this regard?

The Ghostscript pdfwrite device does not emit PDF/A-1a files. The 
differences between level b and level a are:

1) All fonts must be encoded using a standard encoding or they must include 
a ToUnicode CMap.

There exist numerous PostScript files for which it is not possible to 
construct a ToUnicode CMap, and which are not encoded in one of the 
specified ways. These files cannot be converted into PDF/A-1a files. Note 
that where possible pdfwrite *does* emit ToUnicode information, so for 
those files where this is possible its output is already conformant in 
this regard.
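
As a rough illustration of the check in point 1, here is a minimal Python 
sketch. It assumes the font dictionaries have already been extracted into 
plain Python dicts; that extraction step and the function names are 
hypothetical and not part of Ghostscript. Treating "standard encoding" as 
"one of the predefined encoding names in the PDF Reference" is also a 
simplification of the ISO wording.

```python
# Simplified sketch: each font is a plain dict mirroring PDF font dictionary keys.
# Assumption: "standard encoding" means one of the PDF predefined encoding names.
STANDARD_ENCODINGS = {"WinAnsiEncoding", "MacRomanEncoding", "MacExpertEncoding"}


def font_is_text_recoverable(font):
    """Return True if the font has a ToUnicode CMap or a named standard encoding."""
    if font.get("ToUnicode") is not None:
        return True
    encoding = font.get("Encoding")
    # An Encoding given as a dict (e.g. with Differences) is treated as
    # non-standard in this sketch.
    return isinstance(encoding, str) and encoding in STANDARD_ENCODINGS


def nonconforming_fonts(fonts):
    """Return the names of the fonts that fail the simplified check."""
    return [f.get("BaseFont", "<unnamed>") for f in fonts
            if not font_is_text_recoverable(f)]


# Hypothetical sample data, made up for illustration only.
fonts = [
    {"BaseFont": "Helvetica", "Encoding": "WinAnsiEncoding"},
    {"BaseFont": "ABCDEF+Custom", "Encoding": {"Differences": [1, "g001"]}},
    {"BaseFont": "GHIJKL+Symbols", "ToUnicode": b"placeholder CMap stream"},
]
print(nonconforming_fonts(fonts))
```

A font with a custom Differences array and no ToUnicode CMap is the kind of 
case that cannot be repaired after the fact, because the glyph names carry 
no information about which characters they represent.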

2) The file must contain 'tagged' textual data in order to recover the text 
information easily. This is simply impossible for a general purpose PDF 
converter. In the general case there is no way to infer which portions of 
text on a page are (for example) running heads, page numbers, footnote 
rules etc.

The ISO specification clearly says "PDF/A-1 writers should not add 
structural or semantic information that is not explicitly or implicitly 
present in the source material solely for the purpose of achieving 
conformance." and also "It is inadvisable for writers to generate 
structural or semantic information using automated processes without 
appropriate verification."

There are a number of other requirements which are also not possible for a 
conversion application to deal with, for example:

"6.8.6 Non-textual annotations For annotation types that do not display 
text, the Contents key of an annotation dictionary should be specified with 
an alternative description of the annotation's contents in human-readable 
form."

It's impossible for pdfwrite to create a textual representation for an 
arbitrary annotation. Also:

"6.8.7 Replacement text All textual structure elements that are represented 
in a non-standard manner, e.g., custom characters or inline graphics, 
should supply replacement text using the ActualText entry in the structure 
element dictionary, as described in PDF Reference 9.8.3."

It is quite common to have text represented as an image, or drawn as a 
series of vectors; there is no way for pdfwrite to reconstruct an 
'ActualText' entry for these.
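
A converter can at least detect the 6.8.6 gap, even though it cannot fill 
it. The following is a minimal Python sketch under stated assumptions: 
annotations are modelled as plain dicts, the function name is hypothetical, 
and the choice of which subtypes count as "displaying text" is an 
assumption made for illustration rather than something taken from the 
ISO text.

```python
# Assumption: only these annotation subtypes are treated as displaying text.
TEXT_SUBTYPES = {"Text", "FreeText"}


def annotations_missing_contents(annotations):
    """Return indexes of non-textual annotations with no Contents description."""
    missing = []
    for index, annot in enumerate(annotations):
        if annot.get("Subtype") in TEXT_SUBTYPES:
            continue  # textual annotations are outside the scope of 6.8.6
        if not annot.get("Contents"):
            missing.append(index)
    return missing


# Hypothetical sample data, made up for illustration only.
annots = [
    {"Subtype": "Text", "Contents": "Reviewer note"},
    {"Subtype": "Link"},
    {"Subtype": "Stamp", "Contents": "Approved stamp"},
]
print(annotations_missing_contents(annots))
```

The check can report which annotations lack a description, but writing a 
meaningful human-readable description still needs a person who knows what 
the annotation is for.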


I do note that Adobe Distiller 9 does offer the option of producing 
PDF/A-1a (2001) output. While it's possible that this simply aborts when 
ToUnicode information cannot be added, it's not clear to me what it does 
about the limitations imposed by section 6.8.

I would guess that it simply tags all text as 'body', ignores the 
requirement for ActualText, and possibly aborts if annotations are 
non-textual. It's also possible that the 2001 revision does not contain 
section 6.8; I only have a copy of the 2005 revision.

Given that it would be essentially 'making it up', it's not clear to me 
what the perceived benefits over PDF/A-1b are.

At present we have no plans to implement PDF/A-1a in pdfwrite.


             Ken

