35.2 Clean

The clean utility produces a cleaned version of an input PDF. It accepts a range of options, a full list of which can be obtained by running mutool clean with no arguments:

$ mutool clean 
usage: mutool clean [options] input.pdf [output.pdf] [pages] 
      -p -   password 
      -g    garbage collect unused objects 
      -gg   in addition to -g compact xref table 
      -ggg   in addition to -gg merge duplicate objects 
      -gggg  in addition to -ggg check streams for duplication 
      -l    linearize PDF 
      -a    ascii hex encode binary streams 
      -d    decompress streams 
      -z    deflate uncompressed streams 
      -f    compress font streams 
      -i    compress image streams 
      -s    clean content streams 
      pages  comma separated list of page numbers and ranges

The arguments here are fairly self-explanatory, and usage is best explained with a few examples.

Firstly, and most simply, clean can be used to try to repair broken files. Many PDF files found in the wild are broken, sometimes because they were corrupted by transmission or archiving problems, but a disappointing number simply because they were created by bad PDF writing software. Running a clean pass will attempt to repair such files:

mutool clean in.pdf out.pdf

Individual pages (or page ranges) can be extracted from a PDF. For example:

mutool clean -gggg in.pdf out.pdf 1-10,12

That will extract pages 1 to 10 and page 12 of in.pdf and write them to a new file, out.pdf. The -gggg option ensures that unused objects are dropped from the output.

Pages are written in the order listed, so an 8-page PDF might be rearranged into booklet form using:

mutool clean -gggg in.pdf out.pdf 8,1,7,2,6,3,5,4
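For documents with a different page count, the same page list can be generated rather than typed by hand. The following is a minimal sketch, not part of mutool: booklet_order is a hypothetical helper name, and it simply reproduces the alternating last/first pattern of the example above (n, 1, n-1, 2, ...). Whether that ordering suits a particular printing setup is not addressed here.

```shell
# Hypothetical helper (not part of mutool): print a page list that
# alternates between the last and first remaining pages, matching the
# 8,1,7,2,6,3,5,4 pattern used in the example above.
booklet_order() {
  n=$1
  lo=1
  hi=$n
  order=
  while [ "$lo" -le "$hi" ]; do
    if [ "$lo" -eq "$hi" ]; then
      # Odd page count: emit the middle page only once.
      order="$order,$hi"
    else
      order="$order,$hi,$lo"
    fi
    lo=$((lo + 1))
    hi=$((hi - 1))
  done
  # Strip the leading comma before printing.
  printf '%s\n' "${order#,}"
}
```

Assuming the helper is defined in the current shell, the booklet command above could then be written as: mutool clean -gggg in.pdf out.pdf "$(booklet_order 8)"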

Finally, a more exotic but very common example: if someone reports a problem seen on page 4 of a given PDF, the following command will extract that page and expand the content streams (-d), without decompressing the images (-i) or the fonts (-f):

mutool clean -difgggg in.pdf out.pdf 4

If the extracted file still exhibits the same problem, it is generally far easier to debug than the original.
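Since this page-isolation step tends to be repeated for every bug report, it can be wrapped in a small shell function. This is only a sketch under stated assumptions: isolate_page, the MUTOOL override variable and the default output name are all hypothetical and not part of MuPDF. The mutool invocation itself is the one shown above.

```shell
# Hypothetical wrapper (not part of MuPDF) around the command above.
# Usage: isolate_page input.pdf page [output.pdf]
# Set MUTOOL=echo to print the arguments instead of running mutool.
isolate_page() {
  in=$1
  page=$2
  out=$3
  # Assumed default output name: input name with "-pageN" appended.
  [ -n "$out" ] || out="${in%.pdf}-page${page}.pdf"
  "${MUTOOL:-mutool}" clean -difgggg "$in" "$out" "$page"
}
```

For example, isolate_page in.pdf 4 would run mutool clean -difgggg in.pdf in-page4.pdf 4.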