[gs-devel] Can Ghostscript stream PDF?
Chris Liddell
chris.liddell at artifex.com
Wed Jun 2 07:16:31 UTC 2021
On 02/06/2021 01:42, David Newall wrote:
> On 2/6/21 6:10 am, Till Kamppeter wrote:
>> So I want to know now, if I start Ghostscript and feed in a PDF file,
>> would Ghostscript output each page of it as soon as the data of the
>> page has completely arrived or would Ghostscript read in the full PDF
>> before it starts rendering it?
>
> PDF files contain a cross-reference table, normally the second-to-last
> part of the file, without which the file cannot be usefully read. The
> location of the cross-reference table is specified in the trailer, i.e.
> the last part of the file. This means that in general, a PDF file must
> be read in entirety before any useful processing can commence.
>
> See PDF 32000-1:2008 Document management -- Portable document format --
> Part 1: PDF 1.7 section 7.5 for details.
> (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf)
>
> Some PDFs have a copy of the trailer and the cross-reference table just
> after the header to permit processing the file sooner but you cannot
> rely on that, and, as a trailer must always be at the end, the
> possibility exists that the copy after the header might be wrong. As
> explained in the prior-mentioned specifications, PDF files "may be
> modified by later updates, which append additional elements to the end of
> the file".
>
> In other words, in general, yes, Ghostscript must read the full PDF
> before it starts rendering it and that is true for any program that
> processes PDFs.
Just to be clear: even a properly "linearised"/"web optimized" PDF is *not*
streamable. It still, ultimately, requires the ability to "seek" in the PDF.
And to be even clearer: this is not a Ghostscript limitation; *no*
consumer can stream a PDF file - PDF is, fundamentally, a random access
file format.
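
To illustrate why seeking is unavoidable, here is a minimal sketch (not
Ghostscript's actual code) of the first step a PDF consumer typically takes:
jump to the end of the file and read the "startxref" offset that points at
the cross-reference table. The function name find_startxref and the
1024-byte tail window are assumptions made for this example only.

```python
import io
import re

def find_startxref(f, tail_size=1024):
    """Return the byte offset that follows the last 'startxref' keyword.

    `f` must be a seekable binary file object. `tail_size` is an assumed
    search window for this sketch, not a value taken from the thread.
    """
    f.seek(0, io.SEEK_END)            # fails on a pipe: nothing to seek in
    size = f.tell()
    f.seek(max(0, size - tail_size))  # only the tail of the file is needed
    tail = f.read()
    # An incrementally updated file can contain several of these sections;
    # the last one is the one that counts.
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref found near end of file")
    return int(matches[-1])
```

Calling this on a file opened with open(path, "rb") returns the offset to
seek to next; on a non-seekable input such as a pipe, the first seek() call
already fails, which is the point being made above.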
If you want a streamable PDL, can I introduce you to PostScript...?
Chris