[gs-devel] [ xefitra ]Can ghostscript 8.70 produce
PDF/A-1a compliant pdfs??
ken.sharp at artifex.com
Fri Aug 21 04:05:49 PDT 2009
At 16:43 21/08/2009 +0500, Jahangir wrote:
>I have successfully created PDF/A-1b compliant pdfs using ghostscript.
>Now, my next target is to be able to create PDF/A-1a compliant pdfs. Is it
>possible with ghostscript 8.70?? If yes, can you point me in the right
>direction. And if no, is there any work going on in this regard?
The Ghostscript pdfwrite device does not emit PDF/A-1a files. The
differences between level b and level a are:
1) All fonts must be encoded using a standard encoding or they must include
a ToUnicode CMap.
There exist numerous PostScript files for which it is not possible to
construct a ToUnicode CMap, and which are not encoded in one of the
specified ways. These files cannot be converted into PDF/A-1a files. Note
that where possible pdfwrite *does* emit ToUnicode information, so for
those files where this is possible its output already conforms in this regard.
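
As a rough illustration of requirement 1, the check can be sketched as below. This is a minimal sketch, not pdfwrite code: fonts are modelled as plain Python dicts that mirror a PDF font dictionary, the function names are hypothetical, and the set of "standard" encoding names is an assumption made for the example rather than a list taken from the ISO text.

```python
# Hypothetical model: each font is a dict mirroring a PDF font dictionary.
# ASSUMED_STANDARD_ENCODINGS is an assumption for illustration only; consult
# the ISO specification for the encodings it actually permits.
ASSUMED_STANDARD_ENCODINGS = {
    "WinAnsiEncoding",
    "MacRomanEncoding",
    "MacExpertEncoding",
}


def font_meets_text_requirement(font):
    """True if the font has a ToUnicode CMap or names an assumed-standard encoding."""
    if font.get("ToUnicode") is not None:
        return True
    encoding = font.get("Encoding")
    # A custom encoding may be a dictionary rather than a name; treat it as non-standard.
    return isinstance(encoding, str) and encoding in ASSUMED_STANDARD_ENCODINGS


def nonconforming_fonts(fonts):
    """Return the names of fonts that satisfy neither condition."""
    return [
        font.get("BaseFont", "<unnamed>")
        for font in fonts
        if not font_meets_text_requirement(font)
    ]
```

A font that fails this check and for which no ToUnicode CMap can be constructed is exactly the case described above: the file cannot be made conformant.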
2) The file must contain 'tagged' textual data so that the text
information can be recovered easily. This is simply impossible for a
general-purpose PDF converter. In the general case there is no way to infer which portions of
text on a page are (for example) running heads, page numbers, footnote
The ISO specification clearly says "PDF/A-1 writers should not add
structural or semantic information that is not explicitly or implicitly
present in the source material solely for the purpose of achieving
conformance." and also "It is inadvisable for writers to generate
structural or semantic information using automated processes without
There are a number of other requirements which are also not possible for a
conversion application to deal with, for example:
"6.8.6 Non-textual annotations For annotation types that do not display
text, the Contents key of an annotation dictionary should be specified with
an alternative description of the annotation's contents in human-readable
It's impossible for pdfwrite to create a textual representation for an
arbitrary annotation. Also:
"6.8.7 Replacement text All textual structure elements that are represented
in a non-standard manner, e.g., custom characters or inline graphics,
should supply replacement text using the ActualText entry in the structure
element dictionary, as described in PDF Reference 9.8.3."
It is quite common to have text represented as an image, or drawn as a
series of vectors, and there is no way for pdfwrite to reconstruct an
'ActualText' entry for these.
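
To make the limitation concrete, here is a minimal sketch of what a converter could do with sections 6.8.6 and 6.8.7: it can detect the missing entries, but it has nothing to fill them with. Annotations and structure elements are modelled as plain dicts, and the `displays_text` and `nonstandard` keys are hypothetical flags invented for this example; they are not PDF dictionary keys.

```python
def _has_text(value):
    """True if value is a non-empty, non-blank string."""
    return isinstance(value, str) and value.strip() != ""


def annotations_needing_description(annotations):
    """Non-textual annotations that lack a usable Contents entry (cf. 6.8.6).

    'displays_text' is a hypothetical flag supplied by the caller.
    """
    return [
        annot
        for annot in annotations
        if not annot.get("displays_text", False)
        and not _has_text(annot.get("Contents"))
    ]


def elements_needing_actual_text(elements):
    """Structure elements drawn in a non-standard way with no ActualText (cf. 6.8.7).

    'nonstandard' is a hypothetical flag, e.g. text rendered as an image.
    """
    return [
        elem
        for elem in elements
        if elem.get("nonstandard", False)
        and not _has_text(elem.get("ActualText"))
    ]
```

Both functions only report the gaps. Supplying the missing description or replacement text would need knowledge of what the image or vector artwork actually says, which the source file does not carry.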
I do note that Adobe Distiller 9 does offer the option of producing
PDF/A-1a (2001) output. While it's possible that this simply aborts when
ToUnicode information cannot be added, it's not clear to me what it does
about the limitations imposed by section 6.8.
I would guess that it simply tags all text as 'body', ignores the
requirement for ActualText, and possibly aborts if annotations are
non-textual. It's also possible that the 2001 revision does not contain
section 6.8; I only have a copy of the 2005 revision.
Given that it would essentially be 'making it up', it's not clear to me what
the perceived benefits over PDF/A-1b are.
At present we have no plans to implement PDF/A-1a in pdfwrite.