Graphical content within a PDF file is given as streams of PDF “operators”. These operators describe marking operations on a conceptual page. In order to display a PDF file, the interpreter needs to run through these operators, processing each in turn.
In addition, certain manipulations of PDF content (such as redaction, sanitisation and appending) are best done by operating directly on these operator streams. The alternative scheme, of first converting the operators to graphical objects and then resynthesising an operator stream from them, leads to problems with round-trip conversions and the potential loss of structure.
For this reason, the PDF interpreter within MuPDF is structured around an extensible class of pdf_processors. A pdf_processor is a set of functions, one for each operator. The interpreter runs through the operators and handles them by calling the appropriate functions.
By changing the pdf_processor in use, we can therefore change what the effect of interpreting the page is.
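The idea of a processor as "one function per operator" can be sketched in plain C. What follows is a minimal, self-contained toy, not MuPDF's actual pdf_processor definition: the names toy_processor, toy_op and toy_interpret are hypothetical, only three operators (q, Q and w) are modelled, and operands are reduced to a single float.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a pdf_processor:
 * one callback per operator. Only three operators are modelled. */
typedef struct toy_processor toy_processor;
struct toy_processor {
    void (*op_q)(toy_processor *proc);               /* save graphics state */
    void (*op_Q)(toy_processor *proc);               /* restore graphics state */
    void (*op_w)(toy_processor *proc, float width);  /* set line width */
    void *user;                                      /* implementation-specific state */
};

/* A toy operator record, as a parser might hand it to the interpreter. */
typedef struct {
    const char *name;
    float operand;
} toy_op;

/* The toy "interpreter": walk the operators and call the matching
 * callback. Operators with no callback installed are skipped. */
static void toy_interpret(toy_processor *proc, const toy_op *ops, size_t n)
{
    size_t i;
    assert(proc != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(ops[i].name, "q") == 0 && proc->op_q)
            proc->op_q(proc);
        else if (strcmp(ops[i].name, "Q") == 0 && proc->op_Q)
            proc->op_Q(proc);
        else if (strcmp(ops[i].name, "w") == 0 && proc->op_w)
            proc->op_w(proc, ops[i].operand);
    }
}

/* One possible processor: track the maximum q/Q nesting depth. */
typedef struct { int depth; int max_depth; } depth_state;

static void depth_q(toy_processor *proc)
{
    depth_state *st = proc->user;
    if (++st->depth > st->max_depth)
        st->max_depth = st->depth;
}

static void depth_Q(toy_processor *proc)
{
    depth_state *st = proc->user;
    st->depth--;
}

/* Interpret the fixed stream "q 2 w q Q Q" and return the deepest
 * nesting level that was reached. */
int toy_demo_max_depth(void)
{
    static const toy_op ops[] = {
        {"q", 0.0f}, {"w", 2.0f}, {"q", 0.0f}, {"Q", 0.0f}, {"Q", 0.0f}
    };
    depth_state st = {0, 0};
    toy_processor proc = { depth_q, depth_Q, NULL, &st };

    toy_interpret(&proc, ops, sizeof ops / sizeof ops[0]);
    return st.max_depth;
}
```

Installing a different set of callbacks changes what interpreting the same stream does, without any change to toy_interpret.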
MuPDF contains three different pdf_processor implementations, though the system is deliberately open-ended, and more can be supplied by any user of the library. Some can even be chained together in powerful ways.
The first, and most commonly used, processor is the pdf_run_processor. This processor interprets the incoming operators and turns them into device calls (i.e. graphical objects rendered on a page).
When using the standard fz_run_page function (and similar functions), this is the pdf_processor that is used automatically. It can still be useful to create one manually, especially when coupling it with a pdf_filter_processor (or similar).
Such processors can be created using pdf_new_run_processor.
The component parts of this processor are generally functions named pdf_run_..., and frequently call back into the main PDF interpreter (to handle nested content streams, as found in XObjects, for example).
The pdf_filter_processor is an example of a processor that allows chaining. PDF operators are fed into the processor, which then ‘filters’ them and passes them out to another processor.
Similar filtering processors could be written for other tasks, such as discarding all the text from a page, changing all occurrences of a particular font for another, or converting all the objects on a page to a given colorspace.
The component parts of this processor are generally functions named pdf_filter_....
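The chaining idea can be illustrated with a small self-contained sketch of a filter that discards text and forwards everything else. This is a toy under stated assumptions, not the pdf_filter_processor itself: chain_proc, op_counts and the strip_ and count_ functions are hypothetical names, and only three operators (Tj, re and f) are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified processor: three operators plus a chain pointer. */
typedef struct chain_proc chain_proc;
struct chain_proc {
    void (*op_Tj)(chain_proc *p, const char *text);                   /* show text */
    void (*op_re)(chain_proc *p, float x, float y, float w, float h); /* rectangle */
    void (*op_f)(chain_proc *p);                                      /* fill path */
    chain_proc *chain;  /* next processor in the chain (NULL for a sink) */
    void *user;         /* implementation-specific state */
};

/* Sink processor: simply counts the operators that reach it. */
typedef struct { int text_ops; int path_ops; } op_counts;

static void count_Tj(chain_proc *p, const char *text)
{
    op_counts *c = p->user;
    (void)text;
    c->text_ops++;
}

static void count_re(chain_proc *p, float x, float y, float w, float h)
{
    op_counts *c = p->user;
    (void)x; (void)y; (void)w; (void)h;
    c->path_ops++;
}

static void count_f(chain_proc *p)
{
    op_counts *c = p->user;
    c->path_ops++;
}

/* Filter processor: swallow text, pass everything else on unchanged. */
static void strip_Tj(chain_proc *p, const char *text)
{
    (void)p; (void)text;  /* deliberately not forwarded */
}

static void strip_re(chain_proc *p, float x, float y, float w, float h)
{
    assert(p->chain != NULL);
    p->chain->op_re(p->chain, x, y, w, h);
}

static void strip_f(chain_proc *p)
{
    assert(p->chain != NULL);
    p->chain->op_f(p->chain);
}

/* Feed two rectangles, two fills and two text strings through
 * filter -> sink, and report what the sink saw. */
op_counts chain_demo(void)
{
    op_counts counts = {0, 0};
    chain_proc sink = { count_Tj, count_re, count_f, NULL, &counts };
    chain_proc filter = { strip_Tj, strip_re, strip_f, &sink, NULL };

    filter.op_re(&filter, 0.0f, 0.0f, 10.0f, 10.0f);
    filter.op_f(&filter);
    filter.op_Tj(&filter, "Hello");
    filter.op_Tj(&filter, "World");
    filter.op_re(&filter, 5.0f, 5.0f, 20.0f, 20.0f);
    filter.op_f(&filter);
    return counts;
}
```

The sink here only counts operators, but because the filter knows nothing about what it is chained to, the same filter could equally feed a processor that renders or one that writes the operators back out.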
The pdf_buffer_processor is designed to produce an fz_buffer from an input stream of operators. This is frequently found coupled with a pdf_filter_processor, to gather up the filtered version of the operator stream ready for reinsertion into the document.
It is built on top of a pdf_output_processor.
The pdf_output_processor is designed to produce an output stream from an input stream of operators. This is frequently found coupled with a pdf_filter_processor, to gather up the filtered version of the operator stream ready for reinsertion into the document.
The component parts of this processor are generally functions named pdf_out_....
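As a rough illustration of what an output-style processor does, the sketch below turns operator callbacks back into operator text in a caller-supplied buffer. It is a toy and makes several assumptions: out_proc, out_append and out_demo are hypothetical names, a fixed-size char array stands in for a real buffer or output stream, and only q, Q and w are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified processor that serialises operators as text. */
typedef struct out_proc out_proc;
struct out_proc {
    void (*op_q)(out_proc *p);               /* save graphics state */
    void (*op_Q)(out_proc *p);               /* restore graphics state */
    void (*op_w)(out_proc *p, float width);  /* set line width */
    char *buf;    /* destination text buffer */
    size_t cap;   /* capacity of buf, including the terminator */
    size_t len;   /* bytes written so far, excluding the terminator */
};

/* Append a string to the buffer. In this toy, text that does not fit
 * is silently dropped rather than reported as an error. */
static void out_append(out_proc *p, const char *s)
{
    size_t n = strlen(s);
    if (p->len + n < p->cap) {
        memcpy(p->buf + p->len, s, n + 1);
        p->len += n;
    }
}

static void out_q(out_proc *p) { out_append(p, "q\n"); }
static void out_Q(out_proc *p) { out_append(p, "Q\n"); }

static void out_w(out_proc *p, float width)
{
    char tmp[32];
    snprintf(tmp, sizeof tmp, "%g w\n", width);
    out_append(p, tmp);
}

/* Send the operators q, 2 w, Q through the processor and return
 * the text that was written to dst. */
const char *out_demo(char *dst, size_t cap)
{
    out_proc p = { out_q, out_Q, out_w, dst, cap, 0 };

    assert(dst != NULL && cap > 0);
    dst[0] = '\0';
    p.op_q(&p);
    p.op_w(&p, 2.0f);
    p.op_Q(&p);
    return dst;
}
```

Placed at the end of a filter chain, a processor of this shape collects the filtered operators as text that can then be written back into a content stream.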