High Level Output Devices

Overview
PCL-XL file output
Text output
DOCX file output
XPS file output
PDF file output
PostScript file output
EPS file output
PDF/X-3 file output
PDF/A file output
Ghostscript PDF printer description
pdfmark extensions
Limitations

For other information, see the Ghostscript overview.

Overview

High level devices are Ghostscript output devices which do not render to a raster; in general they produce 'vector' as opposed to bitmap output. Such devices currently include: pdfwrite, ps2write, eps2write, txtwrite, xpswrite, pxlmono, pxlcolor and docxwrite.

Although these devices produce output which is not a raster, they still work in the same general fashion as all Ghostscript devices. The input (PostScript, PDF, XPS, PCL or PXL) is handled by an appropriate interpreter, which processes the input and produces from it a sequence of drawing 'primitives' which are handed to the device. The device decides whether to handle the primitive itself, or call upon the graphics library to render the primitive to the final raster.

Primitives are quite low level graphics operations; as an example consider the PDF sequence '0 0 100 100 re f'. This constructs a rectangle with the bottom left corner at 0,0 which is 100 units wide by 100 units high, and fills it with the current color. A lower level implementation using only primitives would first move the current point to 0,0, then construct a line to 0,100, then a line to 100,100, a line to 100,0 and finally a line back to 0,0. It would then fill the result.

Obviously that's a simple example but it serves to demonstrate the point.
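
The decomposition described above can be sketched in a few lines of Python. This is only an illustration: the function name rect_to_primitives and the tuple names (moveto, lineto, fill) are hypothetical and are not Ghostscript's internal representation of primitives.

```python
def rect_to_primitives(x, y, width, height):
    """Expand a filled rectangle into the primitive steps listed in the text."""
    return [
        ("moveto", x, y),                   # start at the bottom left corner
        ("lineto", x, y + height),          # up the left edge
        ("lineto", x + width, y + height),  # along the top edge
        ("lineto", x + width, y),           # down the right edge
        ("lineto", x, y),                   # back to the starting point
        ("fill",),                          # paint the enclosed area
    ]


# The PDF sequence '0 0 100 100 re f' from the example.
for step in rect_to_primitives(0, 0, 100, 100):
    print(step)
```

A high level device sees something like the six steps, not the single 're' operator, which is why it has to reassemble them if it wants to emit a rectangle again.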

Now the raster devices all call the graphics library to process primitives (though they may choose to take some action first) and render the result to a bitmap. The high level devices instead reassemble the primitives back into high level page description and write the result to a file. This means that the output, while it should be visually the same as the input (because it makes the same marks), is not the same as the original input, even if the output Page Description Language is the same as the input one was (eg PDF to PDF).

Why is this important? Firstly, because the description of the page won't be the same. If your workflow relies upon (for example) finding rectangles in the description, then it might not work after it has been processed by a high level device, as the rectangles may all have turned into lengthy path descriptions.

In addition, any part of the original input which does not actually make marks on the page (such as hyperlinks, bookmarks, comments etc) will normally not be present in the output, even if the output is the same format. In general the PDF interpreter and the PDF output device (pdfwrite) try to preserve the non-marking information from the input, but some kinds of content are not carried across, in particular comments are not preserved.

We often hear from users that they are 'splitting' PDF files, or 'modifying' them, or converting them to PDF/A, and it's important to realize that this is not what's happening. Instead, a new PDF file is being created, which should look the same as the original, but the actual insides of the PDF file are not the same as the original. This may not be a problem, but if it's important to keep the original contents, then you need to use a different tool (we'd suggest MuPDF, also available from Artifex). Of course, if the intention is to produce a modified PDF file (for example, reducing the resolution of images, or changing the colour space), then clearly you cannot keep the original contents unchanged, and pdfwrite performs these tasks well.

PCL-XL (PXL)

The pxlmono and pxlcolor devices output HP PCL-XL, a graphic language understood by many recent laser printers.

Options

-dCompressMode=1 | 2 | 3 (default is 1)
Set the compression algorithm used for bitmap graphics: 1 is RLE, 2 is JPEG and 3 is DeltaRow. When JPEG (2) is selected, it is applied only to full-color images; indexed-color graphics and masks continue to be compressed with RLE.
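
As a minimal sketch, the option can be assembled into a Ghostscript command line from Python. The helper name pxl_args and the file names are hypothetical; the switches themselves are the ones used elsewhere in this document. The list is only built here, not executed.

```python
# Names for the three documented values of -dCompressMode.
COMPRESS_MODES = {"RLE": 1, "JPEG": 2, "DeltaRow": 3}


def pxl_args(input_file, output_file, color=True, compression="RLE"):
    """Build an argument list for the pxlcolor or pxlmono device."""
    device = "pxlcolor" if color else "pxlmono"
    return [
        "gs", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=" + device,
        "-dCompressMode=%d" % COMPRESS_MODES[compression],
        "-sOutputFile=" + output_file,
        input_file,
    ]


print(pxl_args("in.pdf", "out.pxl", compression="JPEG"))
```

The resulting list could be handed to subprocess.run on a system where the gs executable is installed.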

Text output

The txtwrite device will output the text contained in the original document as Unicode.

Options

-dTextFormat=0 | 1 | 2 | 3 | 4 (default is 3)
Format 0 is intended for use by developers and outputs XML-escaped Unicode along with information regarding the format of the text (position, font name, point size, etc). The XML output is the same format as the MuPDF output, but no additional processing is performed on the content, so no block detection.

Format 1 uses the same XML output format, but attempts similar processing to MuPDF, and will output blocks of text. Note the algorithm used is not the same as the MuPDF code, and so the results will not be identical.

Format 2 outputs Unicode (UCS2) text (with a Byte Order Mark) which approximates the layout of the text in the original document.

Format 3 is the same as format 2, but the text is encoded in UTF-8.

Format 4 is an internal format similar to Format 0 but with extra information.
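
When reading txtwrite output back into a program, the encoding depends on the TextFormat that was chosen. The sketch below covers only formats 2 and 3. It assumes Python's "utf-16" codec is an acceptable way to read UCS2 text with a Byte Order Mark, and it uses "utf-8-sig" for format 3 so that a BOM is tolerated whether or not one is present. The function name decode_txtwrite is hypothetical.

```python
def decode_txtwrite(data, text_format):
    """Decode bytes produced with -dTextFormat=2 or -dTextFormat=3."""
    if text_format == 2:
        # UCS2 with a Byte Order Mark; the codec consumes the BOM.
        return data.decode("utf-16")
    if text_format == 3:
        # UTF-8; "utf-8-sig" also accepts input that starts with a BOM.
        return data.decode("utf-8-sig")
    raise ValueError("this sketch only handles TextFormat 2 and 3")


sample = "Text output"
print(decode_txtwrite(sample.encode("utf-16"), 2))
print(decode_txtwrite(sample.encode("utf-8"), 3))
```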

DOCX output

The docxwrite device creates a DOCX file suitable for use with applications such as Word or LibreOffice, containing the text in the original document.

Rotated text is placed into textboxes. Heuristics are used to group glyphs into words, lines and paragraphs; for some types of formatting, these heuristics may not be able to recover all of the original document structure.

This device currently has no special configuration parameters.

XPS file output

The xpswrite device writes its output according to the Microsoft XML Paper Specification. This specification was later amended to the Open XML Paper specification, submitted to ECMA International and adopted as ECMA-388.

This device currently has no special configuration parameters.

The family of PDF and PostScript output devices

Common controls and features
Controls and features specific to PostScript and PDF input
Controls and features specific to PCL and PXL input
PDF file output
PostScript file output
EPS file output

Common controls and features

The PDF and PostScript (including Encapsulated PostScript, or EPS) devices have much of their code in common, and so many of the controlling parameters are also common amongst the devices. The pdfwrite, ps2write and eps2write devices create PDF or PostScript files whose visual appearance should match, as closely as possible, the appearance of the original input (PS, PDF, XPS, PCL, PXL). There are a number of caveats as mentioned in the overview above. In addition to the general comments there are some additional points that bear mentioning:

PCL has a graphics model which differs significantly from the PostScript or PDF one, in particular it has a form of transparency called RasterOps, some aspects of which cannot be represented in PDF at a high level (or at all, in PostScript). The pdfwrite device makes no attempt to handle this, and the resulting PDF file will not match the original input. The only way to deal with these types of file is to render the whole page to a bitmap and then 'wrap' the bitmap as a PDF file. Currently we do not do this either, but it is possible that a future enhancement may do so.

If the input contains PDF-compatible transparency, but the ps2write device is selected, or the pdfwrite device is selected, but has been told to limit the PDF feature set to a version less than 1.4, the transparency cannot be preserved. In this case the entire page is rendered to a bitmap and that bitmap is 'wrapped up' in appropriate PDF or PostScript content. The output should be visually the same as the input, but since it has been rendered it will not scale up or down well, unlike the original, vector, content of the input.

The options in the command line may include any switches that may be used with the language interpreter appropriate for the input (see here for a complete list). In addition the following options are common to all the pdfwrite family of devices, and should work when specified on the command line with any of the language interpreters.

-rresolution: Sets the resolution for pattern fills, for fonts that must be converted to bitmaps and any other rendering required (eg rendering transparent pages for output to PDF versions < 1.4). The default internal resolution for pdfwrite is 720dpi.
-dUNROLLFORMS: When converting from PostScript, pdfwrite (and ps2write) preserve the use of Form resources as Form XObjects in the output. Some badly written PostScript can cause this to produce incorrect output (the Quality Logic CET tests for example). By setting this flag, forms will be unrolled and stored in the output each time they are used, which avoids the problems. Note that the output file will of course be larger this way. We do not attempt to preserve Form XObjects from PDF files, unless they are associated with transparency groups.
-dNoOutputFonts: Ordinarily the pdfwrite device family goes to considerable lengths to preserve fonts from the input as fonts in the output. However in some highly specific cases it can be useful to have the text emitted as linework/bitmaps instead. Setting this switch will prevent these devices from emitting any fonts; all text will be stored as vectors (or bitmaps in the case of bitmapped fonts) in the page content stream. Note that this will produce larger output which will process more slowly, render differently and, particularly at lower resolution, produce less consistent text rendering. Use with caution.
-dCompressFonts=boolean: Defines whether pdfwrite will compress embedded fonts in the output. The default value is true; the false setting is intended only for debugging as it will result in larger output.
-dCompressStreams=boolean: Defines whether pdfwrite will compress streams other than those in fonts or pages in the output. The default value is true; the false setting is intended only for debugging as it will result in larger output.

Distiller Parameters

Options may also include -dparameter=value or -sparameter=string switches for setting "distiller parameters", Adobe's documented parameters for controlling the conversion of PostScript into PDF. The PostScript setdistillerparams and currentdistillerparams operators are also recognized when the input is PostScript, and provide an equivalent way to set these parameters from within a PostScript input file.

Although the name implies that these parameters are for controlling PDF output, in fact the whole family of devices use these same parameters to control the conversion into PostScript and EPS as well.

The pdfwrite family of devices recognize all of the Acrobat Distiller 5 parameters defined in the DistillerParameters (version 5) document available from the Adobe web site. Cells in the table below containing '=' mean that the value of the parameter is the same as in the "default" column.

Parameter name	Notes	default	screen	ebook	printer	prepress
`AlwaysEmbed`	(13)	[ ]	=	=	=	=
`AntiAliasColorImages`	(0)	false	=	=	=	=
`AntiAliasGrayImages`	(0)	false	=	=	=	=
`AntiAliasMonoImages`	(0)	false	=	=	=	=
`ASCII85EncodePages`		false	=	=	=	=
`AutoFilterColorImages`	(1)	true	=	=	=	=
`AutoFilterGrayImages`	(1)	true	=	=	=	=
`AutoPositionEPSFiles`	(0)	true	=	=	=	=
`AutoRotatePages`		/PageByPage	/PageByPage	/All	/None	/None
`Binding`	(0)	/Left	=	=	=	=
`CalCMYKProfile`	(0)	()	=	=	=	=
`CalGrayProfile`	(0)	()	=	=	=	=
`CalRGBProfile`	(0)	()	=	=	=	=
`CannotEmbedFontPolicy`	(0)	/Warning	/Warning	/Warning	/Warning	/Error
`ColorACSImageDict`	(13)	(note 7)	(note 10)	(note 10)	(note 8)	(note 9)
`ColorConversionStrategy`	(6)	LeaveColorUnchanged	RGB	RGB	UseDeviceIndependentColor	LeaveColorUnchanged
`ColorImageDepth`		-1	=	=	=	=
`ColorImageDict`	(13)	(note 7)	=	=	=	=
`ColorImageFilter`		/DCTEncode	=	=	=	=
`ColorImageDownsampleThreshold`		1.5	=	=	=	=
`ColorImageDownsampleType`	(3)	/Subsample	/Average	/Average	/Average	/Bicubic
`ColorImageResolution`		72	72	150	300	300
`CompatibilityLevel`		1.7	1.5	1.5	1.7	1.7
`CompressPages`	(14)	true	=	=	=	=
`ConvertCMYKImagesToRGB`		false	=	=	=	=
`ConvertImagesToIndexed`	(0)	false	=	=	=	=
`CoreDistVersion`		4000	=	=	=	=
`CreateJobTicket`	(0)	false	false	false	true	true
`DefaultRenderingIntent`		/Default	=	=	=	=
`DetectBlends`	(0)	true	=	=	=	=
`DoThumbnails`	(0)	false	false	false	false	true
`DownsampleColorImages`		false	true	true	false	false
`DownsampleGrayImages`		false	true	true	false	false
`DownsampleMonoImages`		false	true	true	false	false
`EmbedAllFonts`		true	false	true	true	true
`EmitDSCWarnings`	(0)	false	=	=	=	=
`EncodeColorImages`		true	=	=	=	=
`EncodeGrayImages`		true	=	=	=	=
`EncodeMonoImages`		true	=	=	=	=
`EndPage`	(0)	-1	=	=	=	=
`GrayACSImageDict`	(13)	(note 7)	(note 7)	(note 10)	(note 8)	(note 9)
`GrayImageDepth`		-1	=	=	=	=
`GrayImageDict`	(13)	(note 7)	=	=	=	=
`GrayImageDownsampleThreshold`		1.5	=	=	=	=
`GrayImageDownsampleType`	(3)	/Subsample	/Average	/Bicubic	/Bicubic	/Bicubic
`GrayImageFilter`		/DCTEncode	=	=	=	=
`GrayImageResolution`		72	72	150	300	300
`ImageMemory`	(0)	524288	=	=	=	=
`LockDistillerParams`		false	=	=	=	=
`LZWEncodePages`	(2)	false	=	=	=	=
`MaxSubsetPct`		100	=	=	=	=
`MonoImageDepth`		-1	=	=	=	=
`MonoImageDict`	(13)	<<K -1>>	=	=	=	=
`MonoImageDownsampleThreshold`		1.5	=	=	=	=
`MonoImageDownsampleType`		/Subsample	/Subsample	/Subsample	/Subsample	/Subsample
`MonoImageFilter`		/CCITTFaxEncode	=	=	=	=
`MonoImageResolution`		300	300	300	1200	1200
`NeverEmbed`	(13)	(note 11)(note 12)	(note 11)(note 12)	(note 11)(note 12)	[ ](note 12)	[ ](note 12)
`OffOptimizations`		0	=	=	=	=
`OPM`		1	=	=	=	=
`Optimize`	(0,5)	false	true	true	true	true
`ParseDSCComments`		true	=	=	=	=
`ParseDSCCommentsForDocInfo`		true	=	=	=	=
`PreserveCopyPage`	(0)	true	=	=	=	=
`PreserveEPSInfo`	(0)	true	=	=	=	=
`PreserveHalftoneInfo`		false	=	=	=	=
`PreserveOPIComments`	(0)	false	false	false	true	true
`PreserveOverprintSettings`		false	false	false	true	true
`sRGBProfile`	(0)	()	=	=	=	=
`StartPage`	(0)	1	=	=	=	=
`SubsetFonts`		true	=	=	=	=
`TransferFunctionInfo`	(4)	/Preserve	=	=	=	=
`UCRandBGInfo`		/Remove	/Remove	/Remove	/Preserve	/Preserve
`UseFlateCompression`	(2)	true	=	=	=	=
`UsePrologue`	(0)	false	=	=	=	=
`PassThroughJPEGImages`	(15)	true	=	=	=	=

(note 0) This parameter can be set and queried, but currently has no effect.

(note 1) -dAutoFilterxxxImages=false works since Ghostscript version 7.30. Older versions of Ghostscript don't examine the image to decide between JPEG and LZW or Flate compression: they always use Flate compression.

(note 2) Because the LZW compression scheme was covered by patents at the time this device was created, pdfwrite does not actually use LZW compression: all requests for LZW compression are ignored. UseFlateCompression is treated as always on, but the switch CompressPages can be set to false to turn off page level stream compression. Now that the patent has expired, we could change this should it become worthwhile.

(note 3) The xxxDownsampleType parameters can also have the value /Bicubic (a Distiller 4 feature); this will use a Mitchell filter (older versions of pdfwrite simply used Average instead). Note: if a non-integer downsample factor is used, the code will clamp to the nearest integer (if the difference is less than 0.1) or will silently switch to the old bicubic filter, NOT the Mitchell filter.
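
One reading of the rule in note 3 can be written down as a small Python function. This is a sketch of the documented behaviour only, not the pdfwrite source; the function name bicubic_plan and the returned labels are hypothetical.

```python
def bicubic_plan(factor):
    """Return (downsample factor, filter) following the rule in note 3."""
    nearest = round(factor)
    if abs(factor - nearest) < 0.1:
        # Integer factors, and factors close to one, use the Mitchell filter.
        return (nearest, "Mitchell")
    # Other non-integer factors fall back to the old bicubic filter.
    return (factor, "old bicubic")


for factor in (2, 2.05, 2.5):
    print(factor, bicubic_plan(factor))
```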

(note 4) The default for transfer functions is to preserve them. This is because transfer functions are a device-dependent feature; a set of transfer functions designed for an RGB device will give incorrect output on a CMYK device, for instance. The pdfwrite device does now support /Preserve, /Apply and /Remove (the previous documentation was incorrect, application of transfer functions was not supported). PDF 2.0 deprecates the use of transfer functions, and so when producing PDF 2.0 compatible output, if TransferFunctionInfo is set to /Preserve it will be silently replaced with /Apply. You can instead specifically set TransferFunctionInfo to /Remove when producing PDF 2.0 in order to avoid the transfer function being applied.

(note 5) Use the -dFastWebView command line switch to 'optimize' output.

(note 6) The value UseDeviceIndependentColorForImages works the same as UseDeviceIndependentColor. The value sRGB actually converts to RGB with the default Ghostscript conversion. The new Ghostscript-specific value Gray converts all colors to DeviceGray. With the introduction of new color conversion code in version 9.11 it is no longer necessary to set ProcessColorModel when selecting Gray, RGB or CMYK. It is also no longer necessary to set UseCIEColor for UseDeviceIndependentColor to work properly, and the use of UseCIEColor is now strongly discouraged.

(note 7) The default image parameter dictionary is

<< /QFactor 0.9 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >>

(note 8) The printer ACS image parameter dictionary is

<< /QFactor 0.4 /Blend 1 /ColorTransform 1 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >>

(note 9) The prepress ACS image parameter dictionary is

<< /QFactor 0.15 /Blend 1 /ColorTransform 1 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >>

(note 10) The screen and ebook ACS image parameter dictionary is

<< /QFactor 0.76 /Blend 1 /ColorTransform 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >>

(note 11) The default, screen, and ebook settings never embed the 14 standard fonts (Courier, Helvetica, and Times families, Symbol, and ZapfDingbats). This behaviour is intentional but can be overridden by:

<< /NeverEmbed [ ] >> setdistillerparams

(note 12) NeverEmbed can include CID font names. If a CID font is substituted in lib/cidfmap, the substitute font name is used when the CID font is embedded, and the original CID font name is used when it is not embedded. NeverEmbed should always specify the original CID font name.

(note 13) The arrays AlwaysEmbed and NeverEmbed and the image parameter dictionaries ColorACSImageDict, ColorImageDict, GrayACSImageDict, GrayImageDict and MonoImageDict cannot be specified directly on the command line. To specify these, you must use PostScript, either by including it in the PostScript source or by passing the -c command-line parameter to ghostscript as described in Limitations below. For example, including the PostScript string in your file in.ps:

<</AlwaysEmbed [/Helvetica /Times-Roman]>> setdistillerparams

is equivalent to invoking:

gs -dBATCH -dSAFER -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=out.pdf -c '<</AlwaysEmbed [/Helvetica /Times-Roman]>> setdistillerparams' -f in.ps

or using the extra parameters in a file:

@params.in

where the file params.in contains:

-c '<</AlwaysEmbed [/Helvetica /Times-Roman]>> setdistillerparams' -f in.ps
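
When Ghostscript is launched from a program and not from a shell, the quoting of the PostScript fragment is easier to get right if the command is built as an argument list. The following Python sketch assembles the invocation from note 13; the helper name always_embed_args is hypothetical, and the list is only printed, not executed.

```python
import shlex


def always_embed_args(font_names, input_file, output_file):
    """Build a gs argument list that sets AlwaysEmbed through -c."""
    names = " ".join("/" + name for name in font_names)
    postscript = "<</AlwaysEmbed [%s]>> setdistillerparams" % names
    return [
        "gs", "-dBATCH", "-dSAFER", "-dNOPAUSE", "-q",
        "-sDEVICE=pdfwrite",
        "-sOutputFile=" + output_file,
        "-c", postscript,   # one argument, so no shell quoting is needed
        "-f", input_file,
    ]


args = always_embed_args(["Helvetica", "Times-Roman"], "in.ps", "out.pdf")
# shlex.join shows how the same command would be typed in a POSIX shell.
print(shlex.join(args))
```

Passing the list to subprocess.run keeps the PostScript string as a single argument, which is what the single quotes achieve on the shell command line.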

(note 14) The default value of CompressPages is false for ps2write and eps2write.

(note 15) When true, image data in the source which is encoded using the DCT (JPEG) filter will not be decompressed and then recompressed on output. This prevents the multiplication of JPEG artefacts caused by lossy compression. PassThroughJPEGImages currently only affects simple JPEG images. It has no effect on JPX (JPEG2000) encoded images, or masked images. In addition this parameter will be ignored if the pdfwrite device needs to modify the source data. This can happen if the image is being downsampled, changing colour space or having transfer functions applied. Note that this parameter essentially overrides the 'EncodeColorImages' and 'EncodeGrayImages' parameters: even if they are false, the image will still be written with a DCTDecode filter. NB this feature currently only works with PostScript or PDF input, it does not work with PCL, PXL or XPS input.

Color Conversion and Management

As of the 9.11 pre-release, the color management in the pdfwrite family has been substantially altered so that it now uses the same Color Management System as rendering (the default is LCMS2). This considerably improves the color handling in both pdfwrite and ps2write, particularly in the areas of Separation and DeviceN color spaces, and Indexed color spaces with images.

Note that while pdfwrite uses the same CMS as the rendering devices, this does not mean that the entire suite of options is available, as described in the GS9_Colour Management.pdf file. The colour management code has no effect at all unless either ColorConversionStrategy or ConvertCMYKImagesToRGB is set, or content has to be rendered to an image (this is rare and usually required only when converting a PDF file with transparency to a version < PDF 1.4).

Options based on object type (image, text, linework) are not used, all objects are converted using the same scheme. -dKPreserve has no effect because we will not convert CMYK to CMYK. -dDeviceGrayToK also has no effect; when converting to CMYK DeviceGray objects are left in DeviceGray since that can be mapped directly to the K channel.

The ColorConversionStrategy switch can now be set to LeaveColorUnchanged, Gray, RGB, CMYK or UseDeviceIndependentColor. Note that, particularly for ps2write, LeaveColorUnchanged may still need to convert colors into a different space (ICCBased colors cannot be represented in PostScript for example). ColorConversionStrategy can be specified either as a string by using the -s switch (-sColorConversionStrategy=RGB) or as a name using the -d switch (-dColorConversionStrategy=/RGB).

ps2write cannot currently convert into device-independent color spaces, and so UseDeviceIndependentColor should not be used with ps2write (or eps2write).

All other color spaces are converted appropriately. Separation and DeviceN spaces will be preserved if possible (ps2write cannot preserve DeviceN or Lab) and if the alternate space is not appropriate a new alternate space will be created. For example, [/Separation (MyColor) /DeviceRGB {...}] would be converted to [/Separation (MyColor) /DeviceCMYK {...}] when the ColorConversionStrategy is set to CMYK. The new tint transform will be created by sampling the original tint transform, converting the RGB values into CMYK, and then creating a function to linearly interpolate between those values.
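
The sample, convert and interpolate steps can be illustrated with a short Python sketch. It is not the pdfwrite implementation: the RGB to CMYK conversion below is a naive formula standing in for the real Color Management System, the number of samples is arbitrary, and the spot colour my_color is hypothetical.

```python
def rgb_to_cmyk(r, g, b):
    """Naive device conversion, a stand-in for the real CMS."""
    k = 1.0 - max(r, g, b)
    if k >= 1.0:
        return (0.0, 0.0, 0.0, 1.0)
    scale = 1.0 - k
    return ((1.0 - r - k) / scale, (1.0 - g - k) / scale,
            (1.0 - b - k) / scale, k)


def resample_tint_transform(tint_to_rgb, samples=5):
    """Sample the original tint transform and convert each sample to CMYK."""
    return [rgb_to_cmyk(*tint_to_rgb(i / (samples - 1)))
            for i in range(samples)]


def interpolate(table, tint):
    """New tint transform: linear interpolation between the CMYK samples."""
    position = tint * (len(table) - 1)
    low = min(int(position), len(table) - 2)
    frac = position - low
    return tuple(a + (b - a) * frac
                 for a, b in zip(table[low], table[low + 1]))


def my_color(tint):
    # Hypothetical spot colour: tint 0 is white, tint 1 is pure red in RGB.
    return (1.0, 1.0 - tint, 1.0 - tint)


table = resample_tint_transform(my_color)
print(interpolate(table, 0.5))  # (0.0, 0.5, 0.5, 0.0)
```

With more samples the interpolated function follows a non-linear tint transform more closely, at the cost of a larger table.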

The PreserveSeparation switch now controls whether the pdfwrite family of devices will attempt to preserve Separation spaces. If this is set to false then all Separation colours will be converted into the current device space specified by ProcessColorModel.

Setting page orientation

By default Ghostscript determines viewing page orientation based on the dominant text orientation on the page. Sometimes, when the page has text in several orientations or has no text at all, the wrong orientation can be selected.

The Acrobat Distiller parameter AutoRotatePages controls the automatic orientation selection algorithm. With Ghostscript, Distiller parameters can be given as command line arguments as well as in the input stream. For instance: -dAutoRotatePages=/None or /All or /PageByPage.

When there is no text on the page or automatic page rotation is set to /None an orientation value from setpagedevice is used. Valid values are: 0 (portrait), 3 (landscape), 2 (upside down), and 1 (seascape). The orientation can be set from the command line as -c "<</Orientation 3>> setpagedevice" using Ghostscript directly but cannot be set in ps2pdf. See Limitations below.

Ghostscript passes the orientation values from DSC comments to the pdfwrite driver, and these are compared with the auto-rotate heuristic. If they are different then the DSC value will be used preferentially. If the heuristic is to be preferred over the DSC comments then comment parsing can be disabled by setting -dParseDSCComments=false.
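
The switches from this section can be combined as in the following Python sketch, which only builds the argument list. The helper name orientation_args, its ignore_dsc parameter and the file names are hypothetical; the orientation numbers are the ones listed above.

```python
# Orientation values accepted by setpagedevice, as listed in this section.
ORIENTATIONS = {"portrait": 0, "seascape": 1, "upside down": 2, "landscape": 3}


def orientation_args(orientation, input_file, output_file, ignore_dsc=False):
    """Build a gs argument list that forces a page orientation."""
    args = ["gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
            "-dAutoRotatePages=/None"]
    if ignore_dsc:
        # Stop DSC orientation comments from taking part in the decision.
        args.append("-dParseDSCComments=false")
    args += [
        "-sOutputFile=" + output_file,
        "-c", "<</Orientation %d>> setpagedevice" % ORIENTATIONS[orientation],
        "-f", input_file,
    ]
    return args


print(orientation_args("landscape", "in.ps", "out.pdf"))
```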

Controls and features specific to PostScript and PDF input

-dPDFSETTINGS=configuration

Presets the "distiller parameters" to one of five predefined settings:

/screen selects low-resolution output similar to the Acrobat Distiller (up to version X) "Screen Optimized" setting.
/ebook selects medium-resolution output similar to the Acrobat Distiller (up to version X) "eBook" setting.
/printer selects output similar to the Acrobat Distiller "Print Optimized" (up to version X) setting.
/prepress selects output similar to Acrobat Distiller "Prepress Optimized" (up to version X) setting.
/default selects output intended to be useful across a wide variety of uses, possibly at the expense of a larger output file.

NB Adobe has recently changed the names of the presets it uses in Adobe Acrobat Distiller, in order to avoid confusion with earlier versions we do not plan to change the names of the PDFSETTINGS parameters. The precise value for each control is listed in the table above.

Please be aware that the /prepress setting does not indicate the highest quality conversion. Using any of these presets will involve altering the input, and as such may result in a PDF of poorer quality (compared to the input) than simply using the defaults. The 'best' quality (where best means closest to the original input) is obtained by not setting this parameter at all (or by using /default).

The PDFSETTINGS presets should only be used if you are sure you understand that the output will be altered in a variety of ways from the input. It is usually better to adjust the controls individually (see the table above) if you have a genuine requirement to produce, for example, a PDF file where the images are reduced in resolution.
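
As a sketch of adjusting controls individually, the Python helper below turns a dictionary of simple distiller parameters into -d switches. The helper name distiller_switches is hypothetical. It deliberately rejects arrays and dictionaries, since those parameters cannot be given as simple switches.

```python
def distiller_switches(params):
    """Convert simple distiller parameters into -d command line switches."""
    switches = []
    for name, value in params.items():
        if isinstance(value, bool):
            # Test bool before int: in Python, bool is a subclass of int.
            text = "true" if value else "false"
        elif isinstance(value, (int, float)):
            text = str(value)
        elif isinstance(value, str) and value.startswith("/"):
            text = value  # a PostScript name such as /Bicubic
        else:
            raise TypeError("%s cannot be set with a simple switch" % name)
        switches.append("-d%s=%s" % (name, text))
    return switches


# Reduce colour image resolution without selecting a whole preset.
print(distiller_switches({
    "DownsampleColorImages": True,
    "ColorImageResolution": 150,
    "ColorImageDownsampleType": "/Bicubic",
}))
```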

Controls and features specific to PCL and PXL input

Many of the controls used for distiller parameters can be used on the command line with the -d or -s switches, and these will work correctly with PCL or PXL input. However, some controls (eg /NeverEmbed) do not take simple numeric or string arguments, and these cannot be set from the command line. When the input is PostScript or PDF we can use the -c and -f switches to send PostScript through the interpreter to control these parameters, but clearly this is not possible when the interpreter does not understand PostScript. In addition some features are controlled using the PostScript pdfmark operator and again that clearly is not possible unless we are using a PostScript interpreter to read the input.

To overcome this, new GhostPCL-specific PJL parameters have been added. These parameters are defined as PDFMARK and SETDISTILLERPARAMS. In order to reduce confusion when using PostScript and PCL as inputs these PJL parameters take essentially the same PostScript constructs as the corresponding PostScript operators pdfmark and setdistillerparams. However it is important to realise that these are not processed by a full PostScript interpreter, and there are syntactic rules which must be followed carefully when using these parameters.

You cannot use arbitrary PostScript operators, only boolean, number, name, string, array and dictionary objects are supported (but see PUTFILE later). All tokens must be separated by white space, so while this [/Test(string)] is perfectly valid in PostScript, you must instead write it as [ /Test (string) ] for PJL parsing. All PDFMARK and SETDISTILLERPARAMS must be set as DEFAULT, the values must be on a single line, and delimited by "".

pdfmarks sometimes require the insertion of file objects (especially for production of PDF/A files) so we must find some way to handle these. This is done (for the pdfmark case only) by defining a special (non-standard) pdfmark name PUTFILE; this simply takes the preceding string and uses it as a fully qualified path to a file. Any further pdfmark operations can then use the named object holding the file to access it.

The easiest way to use these parameters is to create a 'settings' file, put all the commands in it, and then put it on the command line immediately before the real input file. For example:


./gpcl6 -sDEVICE=pdfwrite -dPDFA=1 -dCompressPages=false -dCompressFonts=false -sOutputFile=./out.pdf ./pdfa.pjl ./input.pcl

Where pdfa.pjl contains the PJL commands to create a PDF/A-1b file (see example below).

Example creation of a PDF/A output file

For readability the line has been split across several lines; when used for real this must be a single line. The 'ESC' represents a single byte, value 0x1B, an escape character in ASCII. The line must end with an ASCII newline (\n, 0x0A) and this must be the only newline following the @PJL. The line breaks between "" below should be replaced with space characters; the double quote characters (") are required.



ESC%-12345X
@PJL DEFAULT PDFMARK = "
[ /_objdef {icc_PDFA} /type /stream /OBJ pdfmark
[ {icc_PDFA} << /N 3 >> /PUT pdfmark
[ {icc_PDFA} (/ghostpdl/iccprofiles/default_rgb.icc) /PUTFILE pdfmark
[ /_objdef {OutputIntent_PDFA} /type /dict /OBJ pdfmark
[ {OutputIntent_PDFA} << /S /GTS_PDFA1 /Type /OutputIntent /DestOutputProfile {icc_PDFA} /OutputConditionIdentifier (sRGB) >> /PUT pdfmark
[ {Catalog} << /OutputIntents [{OutputIntent_PDFA}] >> /PUT pdfmark
[ /Author (Ken) /Creator (also Ken) /Title (PDF/A-1b) /DOCINFO pdfmark
"

Example using SETDISTILLERPARAMS to set the quality of JPEG compression:

ESC%-12345X @PJL DEFAULT SETDISTILLERPARAMS = "<< /ColorImageDict << /QFactor 0.7 /Blend 1 /HSamples [ 2 1 1 2 ] /VSamples [ 2 1 1 2 ] >> >>"

PDF file output

-dMaxInlineImageSize=integer

Specifies the maximum size of an inline image, in bytes. For images larger than this size, pdfwrite will create an XObject instead of embedding the image into the content stream. The default value is 4000. Note that repeated inline images must be embedded each time they occur in the document, while multiple references can be made to a single XObject image. It may therefore be advantageous to set a small or zero value if the source document is expected to contain multiple identical images, as this reduces the size of the generated PDF.
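The trade-off can be illustrated with a little arithmetic. The Python sketch below is only a rough model: the function name and the per-reference cost (ref_overhead) are invented for illustration and are not values taken from pdfwrite.

```python
def estimated_image_bytes(image_size, occurrences, max_inline=4000, ref_overhead=20):
    """Rough bytes used by one image that appears `occurrences` times.

    ref_overhead is a made-up per-reference cost, not a pdfwrite value.
    """
    if image_size > max_inline:
        # Stored once as an XObject, then referenced from each use.
        return image_size + occurrences * ref_overhead
    # Inline: the image data is repeated at every occurrence.
    return image_size * occurrences


# A 3000-byte image used 10 times, with the default limit and with a limit of 0.
default_cost = estimated_image_bytes(3000, occurrences=10)
zero_cost = estimated_image_bytes(3000, occurrences=10, max_inline=0)
```

Under these assumed numbers the repeated inline image costs roughly ten times its own size, while the XObject form costs its size once plus a small amount per use.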

-dDoNumCopies

When present, causes pdfwrite to use the #copies or /NumCopies entry in the page device dictionary to duplicate each page in the output PDF file as many times as the 'copies' value. This is intended for use by workflow applications like CUPS and should not be used for generating general purpose PDF files. In particular, any pdfmark operations which rely on page numbers, such as Link or Outline annotations, will not work correctly with this flag.
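One way to see why page-number-based pdfmarks are affected is to look at how duplication shifts page numbering. The Python sketch below is an illustration only; the function name is hypothetical and it models nothing beyond "each page is repeated 'copies' times".

```python
def output_page_numbers(copies_per_page):
    """For each source page, list the 1-based output page numbers it occupies."""
    result = []
    next_page = 1
    for copies in copies_per_page:
        result.append(list(range(next_page, next_page + copies)))
        next_page += copies
    return result


# Three source pages, printed with 1, 3 and 2 copies respectively.
layout = output_page_numbers([1, 3, 2])
```

In this model the third source page starts at output page 5, so anything that still refers to "page 3" lands on a copy of the second source page.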

-dDetectDuplicateImages

Takes a Boolean argument. When set to true (the default), pdfwrite compares each new image with all the images encountered so far (except small images, which are stored inline) to see whether it is a duplicate of an earlier one. If it is, then instead of writing a new image into the PDF file, pdfwrite reuses the reference to the earlier image. This can considerably reduce the size of the output PDF file, but it increases the time taken to process the file. This time grows exponentially as more images are added, and on large input files with numerous images processing can be prohibitively slow. Setting this to false improves performance at the cost of final file size.
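The idea of reusing an earlier image can be sketched as follows. This Python example is a simplified illustration, not pdfwrite's implementation: the function name is hypothetical, and a plain linear scan over raw bytes is an assumption made to keep the example short.

```python
def store_image(stored, data):
    """Return (index, comparisons) for `data`, reusing an identical earlier image.

    `stored` is the list of image byte strings written so far.
    """
    comparisons = 0
    for index, earlier in enumerate(stored):
        comparisons += 1
        if earlier == data:
            # Duplicate found: reuse the earlier image's index.
            return index, comparisons
    stored.append(data)
    return len(stored) - 1, comparisons


stored = []
refs = [store_image(stored, img)[0] for img in (b"A", b"B", b"A", b"C", b"B")]
```

Five image uses end up as three stored images here. Each new image is checked against those already stored, so the work per image rises as the list grows.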

-dFastWebView

Takes a Boolean argument; the default is false. When set to true, pdfwrite will reorder the output PDF file to conform to the Adobe 'linearised' PDF specification. The Acrobat user interface refers to this as 'Optimised for Fast Web Viewing'. Note that this will cause the conversion to PDF to be slightly slower and will usually result in a slightly larger PDF file.

This option is incompatible with producing an encrypted (password protected) PDF file.

-dPreserveAnnots=boolean

pdfwrite now attempts to preserve most annotations from input PDF files as annotations in the output PDF file (note: not in output PostScript). A few annotation types are not preserved, most notably Link and Widget annotations. Should you wish to revert to the old behaviour, or find that the new behaviour leads to problems, you can set this switch to false, which causes all annotations to be inserted into the page content stream instead of being preserved as annotations.

-sUseOCR=string

Controls the use of OCR in pdfwrite. If enabled, this uses an OCR engine to analyse the glyph bitmaps used to draw text in a PDF file, and the resulting Unicode code points are then used to construct a ToUnicode CMap.

PDF files containing ToUnicode CMaps can be searched, and their text can be copied, pasted and extracted, subject to the accuracy of the ToUnicode CMap. Since not all PDF files contain these, it can be beneficial to create them.

Note that, for English text, it is possible that the existing standard character encoding (which most PDF consumers will fall back to in the absence of Unicode information) is better than using OCR, as OCR is not a 100% reliable process. OCR processing is also comparatively slow.

For the reasons above it is useful to be able to exercise some control over the action of pdfwrite when OCR processing is available, and the UseOCR parameter provides that control. There are three possible values:

Never: the default. Don't use OCR at all, even if support is built in.
AsNeeded: if there is no existing ToUnicode information, use OCR.
Always: ignore any existing information and always use OCR.

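The three values can be summarised as a small decision function. This Python sketch only restates the list above; the function and parameter names are hypothetical and do not correspond to pdfwrite internals.

```python
def should_use_ocr(use_ocr, has_tounicode, ocr_built_in=True):
    """Decide whether OCR would be applied for a font, per the UseOCR value."""
    if use_ocr not in ("Never", "AsNeeded", "Always"):
        raise ValueError("UseOCR must be Never, AsNeeded or Always")
    if not ocr_built_in or use_ocr == "Never":
        return False
    if use_ocr == "AsNeeded":
        # Only fill the gap when no ToUnicode information exists.
        return not has_tounicode
    # "Always": existing information is ignored.
    return True


decisions = {
    mode: (should_use_ocr(mode, has_tounicode=True),
           should_use_ocr(mode, has_tounicode=False))
    for mode in ("Never", "AsNeeded", "Always")
}
```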
Our experimentation with the Tesseract OCR engine has shown that the more text we can supply for the engine to look at, the better the result we get. We are, unfortunately, limited to the graphics library operations for text as follows.

The code works on text 'fragments'; these are the text sequences sent to the text operators of the source language. Generally most input languages will try to send text in its simplest form, eg "Hello", but the requirements of justification, kerning and so on mean that sometimes each character is positioned independently on the page.

So, when set up to use OCR, pdfwrite renders the bitmaps for every character of text in the document. Later, if any character in the font does not already have a Unicode value, we use the bitmaps to assemble a 'strip' of text which we then send to the OCR engine. If the engine returns a different number of recognised characters than we expected, we ignore that result. We've found that (for English text) constructions such as ". The" tend to ignore the full stop, presumably because the OCR engine thinks that it is simply noise. In contrast, "text." does identify the full stop correctly. So by ignoring the failed result we can potentially get a better result later in the document.
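The count check described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: the function name is hypothetical, the OCR output is represented as a plain string, and the space in ". The" is not counted as a glyph.

```python
def accept_ocr_result(glyph_count, recognised_text):
    """Return the recognised characters, or None if the count does not match.

    glyph_count is the number of glyph bitmaps placed in the strip.
    """
    if len(recognised_text) != glyph_count:
        # Mismatch: discard, and hope a later fragment gives a better result.
        return None
    return list(recognised_text)


# ". The" drawn with 4 glyphs, but the engine drops the full stop.
rejected = accept_ocr_result(4, "The")
# "text." drawn with 5 glyphs and recognised in full.
accepted = accept_ocr_result(5, "text.")
```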

Obviously this is all heuristic and undoubtedly there is more we can do to improve the functionality here, but we need concrete examples to work from.
