[gs-devel] Can Ghostscript stream PDF?

Till Kamppeter till.kamppeter at gmail.com
Wed Jun 2 07:54:34 UTC 2021


On 02/06/2021 09:16, Chris Liddell wrote:
> On 02/06/2021 01:42, David Newall wrote:
>> PDF files contain a cross-reference table, normally the second last part
>> of the file, without which the file cannot be usefully used.  The
>> location of the cross-reference table is specified in the trailer, i.e.
>> the last part of the file.  This means that in general, a PDF file must
>> be read in entirety before any useful processing can commence.
>>
>> See PDF 32000-1:2008 Document management -- Portable document format --
>> Part 1: PDF 1.7 section 7.5 for details.
>> (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf)
>>
>> Some PDFs have a copy of the trailer and the cross-reference table just
>> after the header to permit processing the file sooner but you cannot
>> rely on that, and, as a trailer must always be at the end, the
>> possibility exists that the copy after the header might be wrong.  As
>> explained in the prior-mentioned specifications, PDF files "may be
>> modified by later updates, which append additional elements to the end of
>> the file".
>>
>> In other words, in general, yes, Ghostscript must read the full PDF
>> before it starts rendering it and that is true for any program that
>> processes PDFs.
> 
> Just to be clear: even a well "linearised"/"web optimized" PDF is *not*
> streamable. It still, ultimately, requires the ability to "seek" in the PDF.
> 
> And to be even clearer: this is not a Ghostscript limitation, *no*
> consumer can stream a PDF file - PDF is, fundamentally, a random access
> file format.
> 
> 
> If you want a streamable PDL, can I introduce you to Postscript....?

Chris and David, thanks for your answers.

I know that PDF in general is not streamable. My intention is not to
have an arbitrary PDF that is thrown at my Printer Application streamed
through Ghostscript to the printer. I also know that PostScript exists;
I have had enough hassle with it in 20 years of OpenPrinting.
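
To make the seek requirement discussed above concrete, here is a minimal
Python sketch (not Ghostscript's actual code) of the first step a PDF
consumer typically takes: jumping to the end of the file to find the
"startxref" keyword, which holds the byte offset of the cross-reference
table. The function name find_startxref and the tail_size parameter are
made up for illustration.

```python
import io


def find_startxref(stream, tail_size=1024):
    """Return the byte offset stored after the last 'startxref' keyword.

    `stream` must be a seekable binary file object. `tail_size` is an
    illustrative guess at how many trailing bytes to inspect.
    """
    # Seeking to the end is exactly what a pipe cannot do: on a
    # non-seekable input this call raises an error.
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    stream.seek(max(0, size - tail_size))
    tail = stream.read()

    pos = tail.rfind(b"startxref")
    if pos < 0:
        raise ValueError("no startxref keyword found in the file tail")

    # The offset is the first whitespace-separated token after the keyword.
    return int(tail[pos + len(b"startxref"):].split()[0])
```

Only after this offset is known can the cross-reference table be read
and individual objects, including the first page, be located.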

My intention is to stream Raster data (PWG or Apple Raster, both
streamable formats) through Ghostscript, where Ghostscript uses one of
its built-in (legacy) drivers to pass the data on to the printer. For
this I want neither to modify the code of Ghostscript's legacy drivers
nor to extract it from Ghostscript, as I would not be able to test my
work without having these 100s of old printer models.

As Ghostscript does not take Raster as input, I have to encapsulate my
Raster data in a format that Ghostscript accepts as input. That is
either PostScript or PDF.

I have already written a streaming Raster-to-PostScript converter as 
part of the PostScript Printer Application 
(https://github.com/OpenPrinting/ps-printer-app). It streams into 
PostScript printers but could as well stream into Ghostscript.
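
As an illustration of the idea (this is not the code from ps-printer-app,
only a minimal Python sketch under my own assumptions), a raster page can
be wrapped in PostScript one row at a time by letting the "colorimage"
operator pull hex-encoded rows from "currentfile". The function names,
the 8-bit RGB layout and the dpi default are assumptions made for the
example.

```python
import io


def write_ps_header(out):
    """Write the PostScript magic line once per job."""
    out.write(b"%!PS\n")


def write_ps_page(out, width, height, rows, dpi=300):
    """Write one 8-bit RGB page as PostScript, one raster row at a time.

    `rows` is any iterable yielding `height` bytes objects of width*3
    bytes each, top row first. Only the current row is held in memory.
    """
    bytes_per_row = width * 3
    w_pt = width * 72.0 / dpi   # page width in PostScript points
    h_pt = height * 72.0 / dpi  # page height in PostScript points

    out.write(f"<< /PageSize [{w_pt:.2f} {h_pt:.2f}] >> setpagedevice\n"
              .encode("ascii"))
    out.write(b"gsave\n")
    # The image occupies the unit square, so scale it to the page size.
    out.write(f"{w_pt:.2f} {h_pt:.2f} scale\n".encode("ascii"))
    # Buffer that readhexstring refills for every row.
    out.write(f"/rowbuf {bytes_per_row} string def\n".encode("ascii"))
    # Matrix flips the y axis because the data arrives top row first.
    out.write(f"{width} {height} 8 [{width} 0 0 -{height} 0 {height}]\n"
              .encode("ascii"))
    out.write(b"{ currentfile rowbuf readhexstring pop } false 3 colorimage\n")

    count = 0
    for row in rows:
        if len(row) != bytes_per_row:
            raise ValueError("unexpected row length")
        out.write(row.hex().encode("ascii") + b"\n")
        count += 1
    if count != height:
        raise ValueError("row count does not match page height")

    out.write(b"grestore\nshowpage\n")


if __name__ == "__main__":
    # Tiny demo: a 2x1 pixel page (one red and one green pixel).
    buf = io.BytesIO()
    write_ps_header(buf)
    write_ps_page(buf, 2, 1, [bytes([255, 0, 0, 0, 255, 0])], dpi=72)
    print(buf.getvalue().decode("ascii"))
```

Because each row is written as soon as it is read, the output could be
piped straight into Ghostscript's standard input, for example with a
command line along the lines of
"gs -q -dNOPAUSE -dBATCH -sDEVICE=ljet4 -sOutputFile=out.prn -", where
the device name and output file are placeholders.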

Now my next Printer Application to retro-fit existing classic printer
drivers should retro-fit Foomatic and Ghostscript's built-in printer
drivers. I know that Ghostscript accepts PostScript and PDF as input and
that these drivers were developed 15+ years ago with PostScript
expected as input. When I switched the general printing workflow in
Linux and similar operating systems to PDF between 2006 and 2011, I had
PDF fed into Ghostscript and these drivers continued working.

Now I have heard about approaches to streamable PDF: sub-sets of PDF
written in a special way so that one does not need to read the file
up to its end to be able to print the first page.

Examples are PDF/is

https://ftp.pwg.org/pub/pwg/candidates/cs-ifxpdfis10-20040315-5102.3.pdf

and also PCLm. PCLm is a raster-only sub-set of PDF which was designed
as a Raster input format for driverless IPP printers, also covering the
case where the printer cannot hold the complete job (not even a single
page) and has to print the data right away.

For generating PCLm I also have support in cups-filters (rastertopclm
filter), so I could probably rather easily implement a streaming
Raster-to-PCLm converter for my Printer Application.

Now my questions are:

1. Is Ghostscript able to stream PDF/is or PCLm input? Or does it have
to read the whole file as if it were a "normal" PDF?

2. If I want to print Raster data with a legacy driver built into
Ghostscript, is there any advantage for me (like color management, color
reproduction quality) in converting the Raster data into PDF rather than
into PostScript?

I have no problem sending the data as PostScript, as it is much easier
in terms of coding (the code is already in the PostScript Printer
Application), and for legacy printer support it matters most that it
works at all rather than that it works in perfect color. I only want to
know whether I miss anything important when feeding in the data via
PostScript instead of via PDF.

And if I stream in the Raster data as PostScript, how do I embed color
space/profile metadata in the PostScript, or what command line options
should I add to the Ghostscript command line to get the best color
fidelity?

    Till

