[gs-devel] Can Ghostscript stream PDF?
Till Kamppeter
till.kamppeter at gmail.com
Wed Jun 2 07:54:34 UTC 2021
On 02/06/2021 09:16, Chris Liddell wrote:
> On 02/06/2021 01:42, David Newall wrote:
>> PDF files contain a cross-reference table, normally the second-to-last part
>> of the file, without which the file cannot be used. The
>> location of the cross-reference table is specified in the trailer, i.e.
>> the last part of the file. This means that in general, a PDF file must
>> be read in entirety before any useful processing can commence.
>>
>> See PDF 32000-1:2008 Document management -- Portable document format --
>> Part 1: PDF 1.7 section 7.5 for details.
>> (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf)
>>
>> Some PDFs have a copy of the trailer and the cross-reference table just
>> after the header to permit processing the file sooner but you cannot
>> rely on that, and, as a trailer must always be at the end, the
>> possibility exists that the copy after the header might be wrong. As
>> explained in the specification mentioned above, PDF files "may be
>> modified by later updates, which append additional elements to the end of
>> the file".
>>
>> In other words, in general, yes, Ghostscript must read the full PDF
>> before it starts rendering it and that is true for any program that
>> processes PDFs.
>
> Just to be clear: even a well "linearised"/"web optimized" PDF is *not*
> streamable. It still, ultimately, requires the ability to "seek" in the PDF.
>
> And to be even clearer: this is not a Ghostscript limitation, *no*
> consumer can stream a PDF file - PDF is, fundamentally, a random access
> file format.
>
>
> If you want a streamable PDL, can I introduce you to PostScript....?
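The point that a PDF reader has to start at the end of the file can be illustrated in a few lines of Python. This is a minimal sketch, not how Ghostscript does it; the function name find_startxref and the 1024-byte tail window are assumptions. It seeks to the end, reads the tail, and takes the offset after the *last* "startxref" keyword, because incremental updates append new trailers.

```python
import io
import os
import re
from typing import BinaryIO

def find_startxref(f: BinaryIO, tail_size: int = 1024) -> int:
    """Return the byte offset stored after the last 'startxref' keyword.

    A PDF reader needs this offset to find the cross-reference section,
    and it only appears at the end of the file, so the reader must seek.
    """
    f.seek(0, os.SEEK_END)            # requires a seekable input
    size = f.tell()
    f.seek(max(0, size - tail_size))
    tail = f.read()
    # Incremental updates append new trailers; the last one wins.
    matches = list(re.finditer(rb"startxref\s+(\d+)", tail))
    if not matches:
        raise ValueError("no 'startxref' found near end of file")
    return int(matches[-1].group(1))

# Toy data: only the trailer part is realistic.
sample = io.BytesIO(b"%PDF-1.7\n...body...\nstartxref\n116\n%%EOF\n")
print(find_startxref(sample))  # prints 116
```

On a pipe such as stdin the first seek() fails, because pipes are not seekable, which is the practical reason a general PDF cannot simply be streamed.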
Chris and David, thanks for your answers.
I know that PDF in general is not streamable. My intention is not to have
an arbitrary PDF thrown at my Printer Application streamed through
Ghostscript to the printer. I also know that PostScript exists; I had
enough hassle with it in 20 years of OpenPrinting.
My intention is to stream Raster data (PWG or Apple Raster, both
streamable formats) through Ghostscript, where Ghostscript uses one of
its built-in (legacy) drivers to pass it on to the printer. For this I do
not want to modify the code of Ghostscript's legacy drivers, nor to
extract it from Ghostscript, as I am not able to test my work without
the hundreds of old printer models involved.
As Ghostscript does not take Raster as input, I have to encapsulate my
Raster data in a format which Ghostscript understands as input, that is,
either PostScript or PDF.
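Encapsulating raster data in PostScript can be done page by page, row by row, without ever holding a whole job. A minimal sketch in Python, assuming 8-bit grayscale rows and a fixed resolution; the function names are hypothetical and this is not the code from the PostScript Printer Application. It uses the classic "image" operator with a "readhexstring" data procedure, so each row is written as hex text right after the operator and the interpreter can consume it as it arrives.

```python
import io
from typing import BinaryIO, Iterable

def write_ps_header(out: BinaryIO) -> None:
    out.write(b"%!PS-Adobe-3.0\n%%Pages: (atend)\n%%EndComments\n")

def write_gray_page(out: BinaryIO, page_no: int, width: int, height: int,
                    rows: Iterable[bytes], dpi: int = 300) -> None:
    """Emit one 8-bit grayscale page; rows are consumed one at a time."""
    w_pt = width * 72.0 / dpi      # raster pixels -> PostScript points
    h_pt = height * 72.0 / dpi
    out.write(f"%%Page: {page_no} {page_no}\n".encode())
    out.write(f"<< /PageSize [{w_pt:.2f} {h_pt:.2f}] >> setpagedevice\n".encode())
    out.write(f"gsave /picstr {width} string def\n".encode())
    out.write(f"{w_pt:.2f} {h_pt:.2f} scale\n".encode())
    # Image matrix maps the first row to the top of the unit square.
    out.write(f"{width} {height} 8 [{width} 0 0 -{height} 0 {height}]\n".encode())
    out.write(b"{ currentfile picstr readhexstring pop } image\n")
    for row in rows:
        hexrow = row.hex()
        # Keep lines short; readhexstring skips the newlines.
        for i in range(0, len(hexrow), 128):
            out.write(hexrow[i:i + 128].encode() + b"\n")
    out.write(b"grestore showpage\n")

def write_ps_trailer(out: BinaryIO, pages: int) -> None:
    out.write(f"%%Trailer\n%%Pages: {pages}\n%%EOF\n".encode())

# Usage: a 2x2 pixel page written into an in-memory buffer.
buf = io.BytesIO()
write_ps_header(buf)
write_gray_page(buf, 1, 2, 2, [b"\x00\xff", b"\x80\x40"])
write_ps_trailer(buf, 1)
```

Because rows is any iterable, a generator reading a PWG or Apple Raster stream can feed it directly; the caller must supply exactly height rows of width bytes each.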
I have already written a streaming Raster-to-PostScript converter as
part of the PostScript Printer Application
(https://github.com/OpenPrinting/ps-printer-app). It streams into
PostScript printers but could just as well stream into Ghostscript.
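Streaming such PostScript into Ghostscript only requires that Ghostscript read its input from a pipe. A minimal Python sketch, assuming a gs binary on the PATH; "ljet4" is just an example of a built-in device, and gs_command and stream_ps_to_gs are hypothetical helper names. The options used (-q, -dSAFER, -dNOPAUSE, -dBATCH, -sDEVICE=, -sOutputFile=- for stdout, and a lone - to read the job from stdin) are standard Ghostscript command-line options.

```python
import subprocess
from typing import BinaryIO, Iterable, List, Sequence

def gs_command(device: str, extra: Sequence[str] = ()) -> List[str]:
    """Build a Ghostscript command line that reads PostScript from stdin."""
    return ["gs", "-q", "-dSAFER", "-dNOPAUSE", "-dBATCH",
            f"-sDEVICE={device}", *extra,
            "-sOutputFile=-",   # printer data goes to stdout
            "-"]                # read the job from stdin

def stream_ps_to_gs(device: str, chunks: Iterable[bytes], printer: BinaryIO) -> int:
    """Feed PostScript chunks to Ghostscript as they are produced."""
    proc = subprocess.Popen(gs_command(device), stdin=subprocess.PIPE,
                            stdout=printer)  # printer must have a real fileno()
    assert proc.stdin is not None
    for chunk in chunks:
        proc.stdin.write(chunk)   # PostScript is interpreted as it arrives
    proc.stdin.close()
    return proc.wait()
```

Sending the output straight to a file or device (rather than a second pipe read by Python) avoids any risk of the two pipes deadlocking.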
My next Printer Application, for retro-fitting existing classic printer
drivers, should cover Foomatic and Ghostscript's built-in printer
drivers. I know that Ghostscript accepts PostScript and PDF as input and
that these drivers were developed 15+ years ago with PostScript
expected as input. When I switched the general printing workflow in
Linux and similar operating systems to PDF between 2006 and 2011, I had
PDF fed into Ghostscript and these drivers continued working.
Now I have heard about approaches to streamable PDF: subsets of PDF,
i.e. PDFs written in a special way so that one does not need to read the
file up to its end to be able to print the first page.
Examples are PDF/is
https://ftp.pwg.org/pub/pwg/candidates/cs-ifxpdfis10-20040315-5102.3.pdf
and also PCLm. PCLm is a raster-only subset of PDF which was designed
as a Raster input format for driverless IPP printers, also covering the
case where the printer cannot hold the complete job (not even a single
page) and has to print the data right away.
For generating PCLm I also have support in cups-filters (the rastertopclm
filter), so I could probably implement a streaming Raster-to-PCLm
converter for my Printer Application rather easily.
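A streaming Raster-to-PCLm converter only needs to buffer one strip of rows at a time rather than a whole page. A minimal sketch of that buffering in Python; the strip height would be chosen to suit the printer, and here it is just a parameter. The function name group_into_strips is hypothetical, and treating the final strip of a page as simply shorter is an assumption to check against the PCLm specification and the rastertopclm filter.

```python
from typing import Iterable, Iterator, List

def group_into_strips(rows: Iterable[bytes], strip_height: int) -> Iterator[List[bytes]]:
    """Yield lists of at most strip_height rows, holding only one strip in memory."""
    if strip_height <= 0:
        raise ValueError("strip_height must be positive")
    strip: List[bytes] = []
    for row in rows:
        strip.append(row)
        if len(strip) == strip_height:
            yield strip          # ready to compress and emit as one strip
            strip = []
    if strip:
        yield strip              # assumed: last strip may be shorter

rows = [bytes([i]) for i in range(10)]
print([len(s) for s in group_into_strips(rows, 4)])  # prints [4, 4, 2]
```

Each yielded strip can be compressed and written out before the next rows are read, so memory use is bounded by the strip height rather than the page height.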
Now my questions are:
1. Is Ghostscript able to stream PDF/is or PCLm input, or does it have to
read the whole file as if it were a "normal" PDF?
2. If I want to print Raster data with a legacy driver built into
Ghostscript, is there any advantage for me (like color management or color
reproduction quality) in converting the Raster data into PDF rather than
into PostScript?
I have no problem sending the data as PostScript, as it is much easier in
terms of coding (the code is already in the PostScript Printer
Application), and for legacy printer support what matters most is that it
works at all, not that it works in perfect color. I only want to
know whether I am missing anything important by feeding in the data
via PostScript instead of via PDF.
And if I stream in the Raster data as PostScript, how do I embed color
space/profile metadata in the PostScript, or which command line options
should I add to the Ghostscript command line to get the best color fidelity?
Till
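One way to experiment with color handling on the Ghostscript side is through its ICC-related options. A sketch in Python that only builds the extra arguments, assuming a Ghostscript version with ICC color management (9.x or later) where -sDefaultRGBProfile, -sOutputICCProfile and -dRenderIntent are available; the profile file names are placeholders, and whether a given legacy driver makes use of these settings depends on the driver.

```python
from typing import List, Optional

def gs_color_args(input_rgb_profile: Optional[str] = None,
                  output_profile: Optional[str] = None,
                  rendering_intent: int = 0) -> List[str]:
    """Build Ghostscript options describing untagged RGB input and the output."""
    # Intents 0..3: perceptual, colorimetric, saturation, absolute colorimetric.
    if rendering_intent not in (0, 1, 2, 3):
        raise ValueError("rendering intent must be 0..3")
    args = [f"-dRenderIntent={rendering_intent}"]
    if input_rgb_profile:
        # Profile assumed for RGB data that carries no embedded profile.
        args.append(f"-sDefaultRGBProfile={input_rgb_profile}")
    if output_profile:
        # Profile describing the printer's color behavior.
        args.append(f"-sOutputICCProfile={output_profile}")
    return args
```

The returned list can be spliced into a gs command line before the input file argument.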