PDF Operator Processors

Graphical content within a PDF file is given as streams of PDF “operators”. These operators describe marking operations on a conceptual page. To display a PDF file, the interpreter needs to run through these operators, processing each in turn.

In addition, certain manipulations of PDF content (such as redaction, sanitisation and appending) are best done by operating directly on these operator streams. The alternative scheme, of first converting the operators to graphical objects and then resynthesising an operator stream from them, leads to problems with round-trip conversions and the potential loss of structure.

For this reason, the PDF interpreter within MuPDF is structured around an extensible class of pdf_processors. A pdf_processor is a set of functions, one for each operator. The interpreter runs through the operators and handles them by calling the appropriate functions.

By changing the pdf_processor in use, we can therefore change the effect of interpreting the page.
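
The idea of "one function per operator" can be sketched in a few lines of C. Everything below is a hypothetical toy written for illustration: the names toy_processor, toy_op and toy_interpret are assumptions, only three path operators (m, l, S) are modelled, and MuPDF's real pdf_processor is considerably larger, with callbacks that take further arguments. The sketch only aims to show how the same operator stream can produce different results depending on which processor is plugged in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types for illustration; these are not MuPDF's definitions. */
typedef struct toy_processor toy_processor;

struct toy_processor
{
	/* One callback per operator. NULL means the operator is ignored. */
	void (*op_m)(toy_processor *proc, float x, float y); /* moveto */
	void (*op_l)(toy_processor *proc, float x, float y); /* lineto */
	void (*op_S)(toy_processor *proc);                   /* stroke */
	int strokes;  /* state used by the counting processor */
	float max_x;  /* state used by the extent processor */
};

typedef enum { TOY_OP_M, TOY_OP_L, TOY_OP_S } toy_op_kind;

typedef struct
{
	toy_op_kind kind;
	float x, y; /* operands; unused for TOY_OP_S */
} toy_op;

/* The "interpreter": walk the operators and call the matching callback. */
static void toy_interpret(toy_processor *proc, const toy_op *ops, size_t n)
{
	size_t i;
	assert(proc != NULL);
	for (i = 0; i < n; i++)
	{
		switch (ops[i].kind)
		{
		case TOY_OP_M: if (proc->op_m) proc->op_m(proc, ops[i].x, ops[i].y); break;
		case TOY_OP_L: if (proc->op_l) proc->op_l(proc, ops[i].x, ops[i].y); break;
		case TOY_OP_S: if (proc->op_S) proc->op_S(proc); break;
		}
	}
}

/* Processor 1: counts stroke operators and ignores everything else. */
static void count_S(toy_processor *proc) { proc->strokes++; }

/* Processor 2: tracks the largest x coordinate seen in m and l operators. */
static void extent_xy(toy_processor *proc, float x, float y)
{
	(void)y;
	if (x > proc->max_x)
		proc->max_x = x;
}

/* Equivalent to: 10 10 m 90 10 l 90 90 l S 5 5 m 120 5 l S */
static const toy_op toy_sample_ops[] = {
	{ TOY_OP_M, 10, 10 }, { TOY_OP_L, 90, 10 }, { TOY_OP_L, 90, 90 }, { TOY_OP_S, 0, 0 },
	{ TOY_OP_M, 5, 5 }, { TOY_OP_L, 120, 5 }, { TOY_OP_S, 0, 0 }
};
#define TOY_SAMPLE_COUNT (sizeof toy_sample_ops / sizeof toy_sample_ops[0])

static int toy_count_strokes(const toy_op *ops, size_t n)
{
	toy_processor proc = { NULL, NULL, count_S, 0, 0.0f };
	toy_interpret(&proc, ops, n);
	return proc.strokes;
}

/* Assumes non-negative coordinates, since max_x starts at 0. */
static float toy_max_x(const toy_op *ops, size_t n)
{
	toy_processor proc = { extent_xy, extent_xy, NULL, 0, 0.0f };
	toy_interpret(&proc, ops, n);
	return proc.max_x;
}
```

Calling toy_count_strokes and toy_max_x on toy_sample_ops runs the same interpreter loop over the same operators; only the table of callbacks differs between the two calls.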

MuPDF contains three different pdf_processor implementations, though the system is deliberately open-ended, and more can be supplied by any user of the library. Some can even be chained together in powerful ways.
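
As a rough illustration of chaining, a processor can act as a filter: it receives each operator, decides whether to drop it, and forwards the rest to a downstream processor. The sketch below is a self-contained toy under stated assumptions. The types, the next pointer and the rule "drop any text operator containing the word secret" are all hypothetical choices made for this example and do not describe how MuPDF's own processors are implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types for illustration; these are not MuPDF's definitions. */
typedef struct toy_processor toy_processor;

struct toy_processor
{
	void (*op_Tj)(toy_processor *proc, const char *text); /* show text */
	void (*op_S)(toy_processor *proc);                    /* stroke path */
	toy_processor *next; /* downstream processor, NULL at the end of a chain */
	int text_ops;        /* counters used by the sink processor */
	int stroke_ops;
};

typedef enum { TOY_OP_TJ, TOY_OP_S } toy_op_kind;

typedef struct
{
	toy_op_kind kind;
	const char *text; /* operand for TOY_OP_TJ, NULL otherwise */
} toy_op;

/* The "interpreter": call the matching callback for each operator. */
static void toy_interpret(toy_processor *proc, const toy_op *ops, size_t n)
{
	size_t i;
	assert(proc != NULL);
	for (i = 0; i < n; i++)
	{
		if (ops[i].kind == TOY_OP_TJ && proc->op_Tj)
			proc->op_Tj(proc, ops[i].text);
		else if (ops[i].kind == TOY_OP_S && proc->op_S)
			proc->op_S(proc);
	}
}

/* Sink processor: simply counts what reaches it. */
static void sink_Tj(toy_processor *proc, const char *text) { (void)text; proc->text_ops++; }
static void sink_S(toy_processor *proc) { proc->stroke_ops++; }

/* Filter processor: drops text containing "secret", forwards everything else. */
static void filter_Tj(toy_processor *proc, const char *text)
{
	if (strstr(text, "secret") != NULL)
		return; /* swallowed: never reaches the next processor */
	if (proc->next && proc->next->op_Tj)
		proc->next->op_Tj(proc->next, text);
}

static void filter_S(toy_processor *proc)
{
	if (proc->next && proc->next->op_S)
		proc->next->op_S(proc->next);
}

static const toy_op toy_sample_ops[] = {
	{ TOY_OP_TJ, "hello" }, { TOY_OP_TJ, "secret key" },
	{ TOY_OP_S, NULL }, { TOY_OP_TJ, "world" }
};
#define TOY_SAMPLE_COUNT (sizeof toy_sample_ops / sizeof toy_sample_ops[0])

/* Interpret straight into the sink. */
static toy_processor toy_run_direct(const toy_op *ops, size_t n)
{
	toy_processor sink = { sink_Tj, sink_S, NULL, 0, 0 };
	toy_interpret(&sink, ops, n);
	return sink;
}

/* Interpret into the filter, which forwards to the sink. */
static toy_processor toy_run_filtered(const toy_op *ops, size_t n)
{
	toy_processor sink = { sink_Tj, sink_S, NULL, 0, 0 };
	toy_processor filter = { filter_Tj, filter_S, &sink, 0, 0 };
	toy_interpret(&filter, ops, n);
	return sink;
}
```

Because the filter exposes the same set of callbacks as any other processor, the interpreter does not need to know whether it is driving a single processor or the head of a chain.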


