20.2 Implementing a Document Handler

20.2.1 Recognize and Open

To implement a new document handler, you need to define a new fz_document_handler structure. The structure has three members, all function pointers:

typedef struct fz_document_handler_s 
{ 
   fz_document_recognize_fn *recognize; 
   fz_document_open_fn *open; 
   fz_document_open_with_stream_fn *open_with_stream; 
} fz_document_handler;

The first is a function to recognize a document from a magic string, typically a mimetype or a filename:

/* 
   fz_document_recognize_fn: Recognize a document type from 
   a magic string. 
 
   magic: string to recognise - typically a filename or mime 
   type. 
 
   Returns a number between 0 (not recognized) and 100 
   (fully recognized) based on how certain the recognizer 
   is that this is of the required type. 
*/ 
typedef int (fz_document_recognize_fn)(fz_context *ctx, const char *magic);
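
As an illustration of the scoring convention described in the comment above, here is a minimal sketch of a recognizer for an imaginary foo format. The mimetype application/x-foo and the .foo extension are assumptions made up for the example, and fz_context is only forward declared so that the sketch compiles without the MuPDF headers.

```c
#include <assert.h>
#include <string.h>

/* Stand-in forward declaration; normally this comes from the MuPDF headers. */
typedef struct fz_context_s fz_context;

/* Hypothetical recognizer for an imaginary "foo" format. */
int foo_recognize(fz_context *ctx, const char *magic)
{
    const char *ext;

    (void)ctx; /* unused in this sketch */
    assert(magic != NULL);

    /* An exact mimetype match is as certain as a magic string can be. */
    if (strcmp(magic, "application/x-foo") == 0)
        return 100;

    /* Otherwise treat the magic string as a filename and check its extension. */
    ext = strrchr(magic, '.');
    if (ext != NULL && strcmp(ext, ".foo") == 0)
        return 100;

    return 0; /* not recognized */
}
```

A recognizer that is less sure of a match (for example, an extension shared with other formats) could return an intermediate value rather than 100.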

The second is a function to open a document from a filename:

/* 
   fz_document_open_fn: Function type to open a document from a 
   file. 
 
   filename: file to open 
 
   Returns a pointer to the opened document. Throws an exception in case of error.
*/ 
typedef fz_document *(fz_document_open_fn)(fz_context *ctx, const char *filename);

This function may be NULL, as it can be synthesized automatically from the third entry, a function to open a document from a stream:

/* 
   fz_document_open_with_stream_fn: Function type to open a
   document from a stream.
 
   stream: fz_stream to read document data from. Must be 
   seekable for formats that require it. 
 
   Returns a pointer to the opened document. Throws an exception in case of error.
*/ 
typedef fz_document *(fz_document_open_with_stream_fn)(fz_context *ctx, fz_stream *stream);
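
To see why a NULL open entry is workable, consider the following sketch of how a caller might synthesize it. Everything prefixed toy_ is a hypothetical stand-in (not a MuPDF type or function), and the stream is reduced to a struct that merely remembers its source; this is one plausible shape for the fallback, not a description of MuPDF's own implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins (not the MuPDF types) so the sketch is self-contained. */
typedef struct { const char *source; } toy_stream;
typedef struct { const char *opened_from; } toy_document;

typedef toy_document *(toy_open_fn)(const char *filename);
typedef toy_document *(toy_open_with_stream_fn)(toy_stream *stream);

typedef struct
{
    toy_open_fn *open;                         /* may be NULL */
    toy_open_with_stream_fn *open_with_stream;
} toy_handler;

/* Hypothetical dispatcher: prefer open, else wrap the file in a stream. */
toy_document *toy_open_document(const toy_handler *handler, const char *filename)
{
    toy_stream stream;

    assert(handler != NULL && filename != NULL);
    if (handler->open != NULL)
        return handler->open(filename);

    assert(handler->open_with_stream != NULL);
    stream.source = filename; /* a real stream would open the file here */
    return handler->open_with_stream(&stream);
}

/* A handler that only knows how to read from a stream. */
toy_document *foo_open_with_stream(toy_stream *stream)
{
    static toy_document doc; /* static storage keeps the sketch allocation free */
    doc.opened_from = stream->source;
    return &doc;
}

const toy_handler foo_handler = { NULL, foo_open_with_stream };
```

With this arrangement a format only has to implement the stream based entry point, and opening by filename comes for free.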

To create a fz_document use the fz_new_document macro. For a document of type foo, typically a foo_document structure would be defined as below:

typedef struct 
{ 
   fz_document super; 
   <foo specific fields> 
} foo_document;

This would then be created using a call to fz_new_document, such as:

  foo_document *foo = fz_new_document(ctx, foo_document);

This returns an empty document structure with super populated with default values, and the foo specific fields initialized to 0. The document handler then needs to fill in the document level functions.
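
The pattern behind such a macro can be sketched as follows. This is an illustration under stated assumptions, not the MuPDF source: toy_document, toy_new_document_of_size and toy_new_document are hypothetical names, the base type models only the reference count, and allocation failure returns NULL where MuPDF would throw an exception.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy base type standing in for fz_document; only refs is modelled. */
typedef struct { int refs; } toy_document;

/* Derived type: the base must be the first member so the pointers interconvert. */
typedef struct
{
    toy_document super;
    int page_count;      /* example of a foo specific field */
    int is_encrypted;    /* another foo specific field */
} foo_document;

/* Allocate size zeroed bytes and set up the base part.
   Returns NULL on failure in this sketch. */
toy_document *toy_new_document_of_size(size_t size)
{
    toy_document *doc;

    assert(size >= sizeof(toy_document));
    doc = calloc(1, size);
    if (doc != NULL)
        doc->refs = 1; /* the caller owns the first reference */
    return doc;
}

/* Hypothetical macro: compute the size and cast the result in one step. */
#define toy_new_document(T) ((T *)toy_new_document_of_size(sizeof(T)))
```

Because super is the first member, a pointer to foo_document can be passed wherever the base type is expected, and cast back inside the handler's own functions.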

20.2.2 Document Level Functions

The fz_document structure contains a list of functions used to implement the document level calls:

typedef struct fz_document_s 
{ 
   int refs; 
   fz_document_drop_fn *drop_document; 
   fz_document_needs_password_fn *needs_password; 
   fz_document_authenticate_password_fn *authenticate_password; 
   fz_document_has_permission_fn *has_permission; 
   fz_document_load_outline_fn *load_outline; 
   fz_document_layout_fn *layout; 
   fz_document_make_bookmark_fn *make_bookmark; 
   fz_document_lookup_bookmark_fn *lookup_bookmark; 
   fz_document_resolve_link_fn *resolve_link; 
   fz_document_count_pages_fn *count_pages; 
   fz_document_load_page_fn *load_page; 
   fz_document_lookup_metadata_fn *lookup_metadata; 
   int did_layout; 
   int is_reflowable; 
} fz_document;

Implementations should fill in the drop_document field with a pointer to a function that frees any resources held by the document; it is called when the reference count drops to 0. In the unlikely event that your implementation holds no resources, this field can be left NULL.

/* 
   fz_document_drop_fn: Called when the reference count for 
   the fz_document drops to 0. The implementation should 
   release any resources held by the document. The actual 
   document pointer will be freed by the caller. 
*/ 
typedef void (fz_document_drop_fn)(fz_context *ctx, fz_document *doc);
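
The division of labour described in the comment (the callback releases what the document holds, the caller frees the document itself) can be sketched like this. All toy_ names are hypothetical stand-ins, and foo_drop_calls exists only so that the behaviour can be observed.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_document_s toy_document;
typedef void (toy_document_drop_fn)(toy_document *doc);

struct toy_document_s
{
    int refs;
    toy_document_drop_fn *drop_document; /* may be NULL */
};

typedef struct
{
    toy_document super;
    char *buffer; /* a resource owned by the document */
} foo_document;

int foo_drop_calls = 0; /* instrumentation for the sketch only */

/* Release what the document holds, but not the document itself. */
void foo_drop_document(toy_document *doc)
{
    foo_document *foo = (foo_document *)doc;
    free(foo->buffer);
    foo->buffer = NULL;
    foo_drop_calls++;
}

toy_document *toy_keep_document(toy_document *doc)
{
    if (doc != NULL)
        doc->refs++;
    return doc;
}

/* What a caller-side drop might look like: callback first, then the free. */
void toy_drop_document(toy_document *doc)
{
    if (doc == NULL)
        return;
    assert(doc->refs > 0);
    if (--doc->refs == 0)
    {
        if (doc->drop_document != NULL)
            doc->drop_document(doc);
        free(doc);
    }
}

foo_document *foo_new_document(void)
{
    foo_document *foo = calloc(1, sizeof(*foo));
    if (foo == NULL)
        return NULL;
    foo->super.refs = 1;
    foo->super.drop_document = foo_drop_document;
    foo->buffer = malloc(64);
    return foo;
}
```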

If your document handler is capable of handling password protected documents, then you must fill in the needs_password field with a pointer to a function called to enquire whether a given document needs a password:

/* 
   fz_document_needs_password_fn: Type for a function to be 
   called to enquire whether the document needs a password 
   or not. See fz_needs_password for more information. 
*/ 
typedef int (fz_document_needs_password_fn)(fz_context *ctx, fz_document *doc);

If your document handler is capable of handling password protected documents, then you must fill in the authenticate_password field with a pointer to a function called to attempt to authenticate a password:

/* 
   fz_document_authenticate_password_fn: Type for a function to be 
   called to attempt to authenticate a password. See 
   fz_authenticate_password for more information. 
*/ 
typedef int (fz_document_authenticate_password_fn)(fz_context *ctx, fz_document *doc, const char *password);

Certain document types encode permissions within them to say what users are allowed to do with them (printing, extracting, etc.). If your document handler’s format has this concept, then you must fill in the has_permission field with a pointer to a function called to query such permissions:

/* 
   fz_document_has_permission_fn: Type for a function to be 
   called to see if a document grants a certain permission. See 
   fz_has_permission for more information.
*/ 
typedef int (fz_document_has_permission_fn)(fz_context *ctx, fz_document *doc, fz_permission permission);

Certain document types can optionally include outline (table of contents) information within them. If your document handler’s format has this concept, then you must fill in the load_outline field with a pointer to a function called to attempt to load such information if it is there:

/* 
   fz_document_load_outline_fn: Type for a function to be called to 
   load the outlines for a document. See fz_load_outline
   for more information.
*/ 
typedef fz_outline *(fz_document_load_outline_fn)(fz_context *ctx, fz_document *doc);

If your document format requires a layout pass before it can be viewed, then you must fill in the layout field with a pointer to a function called to perform such a layout:

/* 
   fz_document_layout_fn: Type for a function to be called to lay 
   out a document. See fz_layout_document for more information. 
*/ 
typedef void (fz_document_layout_fn)(fz_context *ctx, fz_document *doc, float w, float h, float em);

If your document requires a layout pass, you should provide functions to both make and look up bookmarks, so that the reader's position can be preserved across layout changes. Accordingly, the make_bookmark and lookup_bookmark fields should be filled in:

/* 
   fz_document_make_bookmark_fn: Type for a function to make 
   a bookmark. See fz_make_bookmark for more information. 
*/ 
typedef fz_bookmark (fz_document_make_bookmark_fn)(fz_context *ctx, fz_document *doc, int page); 
 
/* 
   fz_document_lookup_bookmark_fn: Type for a function to look up
   a bookmark. See fz_lookup_bookmark for more information. 
*/ 
typedef int (fz_document_lookup_bookmark_fn)(fz_context *ctx, fz_document *doc, fz_bookmark mark);
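
One way to make a bookmark survive a layout change is to store a position in the content rather than a page number. The sketch below assumes a toy reflowable document whose layout is summarized by a characters-per-page figure; toy_bookmark is a stand-in assumed to be an integer type, and the ctx parameter is omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t toy_bookmark; /* stand-in for fz_bookmark */

typedef struct
{
    int total_chars;     /* size of the reflowable content */
    int chars_per_page;  /* result of the most recent layout pass */
} foo_document;

/* Toy layout: a larger page or a smaller font fits more characters per page. */
void foo_layout(foo_document *doc, int chars_per_page)
{
    assert(chars_per_page > 0);
    doc->chars_per_page = chars_per_page;
}

int foo_count_pages(const foo_document *doc)
{
    return (doc->total_chars + doc->chars_per_page - 1) / doc->chars_per_page;
}

/* Bookmark = offset of the first character on the page, independent of layout. */
toy_bookmark foo_make_bookmark(const foo_document *doc, int page)
{
    return (toy_bookmark)page * doc->chars_per_page;
}

/* Map the offset back to whichever page holds it under the current layout. */
int foo_lookup_bookmark(const foo_document *doc, toy_bookmark mark)
{
    return (int)(mark / doc->chars_per_page);
}
```

A viewer would make a bookmark for the current page before changing the layout, then look it up afterwards to find the page to display.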

Some document formats can encode internal links that point to another page in the document. If your document supports this concept, then you must fill in the resolve_link field with a pointer to a function called to resolve a textual link to a page number and a location on that page:

/* 
   fz_document_resolve_link_fn: Type for a function to be called to 
   resolve an internal link to a page number. See fz_resolve_link 
   for more information. 
*/ 
typedef int (fz_document_resolve_link_fn)(fz_context *ctx, fz_document *doc, const char *uri, float *xp, float *yp);
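
As a rough sketch of such a resolver, suppose a format writes internal links as "#page,x,y" with a 1-based page number. Both that URI syntax and the convention of returning -1 for anything that is not an internal link are assumptions of this example; the ctx and doc parameters are left out to keep it self-contained.

```c
#include <assert.h>
#include <stdio.h>

/* Resolve a hypothetical "#page,x,y" internal link (page is 1-based in the URI).
   Returns the 0-based page number, or -1 if the link is not internal. */
int foo_resolve_link(const char *uri, float *xp, float *yp)
{
    int page;
    float x = 0, y = 0;

    assert(uri != NULL);
    if (xp) *xp = 0;
    if (yp) *yp = 0;

    if (uri[0] != '#')
        return -1;
    /* sscanf returns how many fields it converted; the position is optional. */
    if (sscanf(uri + 1, "%d,%f,%f", &page, &x, &y) < 1 || page < 1)
        return -1;

    if (xp) *xp = x;
    if (yp) *yp = y;
    return page - 1;
}
```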

All document formats must fill in the count_pages field with a pointer to a function called to return the number of pages in a document:

/* 
   fz_document_count_pages_fn: Type for a function to be called to 
   count the number of pages in a document. See fz_count_pages for 
   more information. 
*/ 
typedef int (fz_document_count_pages_fn)(fz_context *ctx, fz_document *doc);

Different document formats encode different types of metadata. We therefore have an extensible function to allow such data to be queried. If your document handler wishes to support this, then the lookup_metadata field must be filled in with a pointer to a function to perform such lookups:

/* 
   fz_document_lookup_metadata_fn: Type for a function to query 
   a document's metadata. See fz_lookup_metadata for more
   information. 
*/ 
typedef int (fz_document_lookup_metadata_fn)(fz_context *ctx, fz_document *doc, const char *key, char *buf, int size);
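
The key, buffer and size parameters suggest a copy-out interface, which might be implemented along the following lines. The key names "title" and "author", and the return convention (length of the full value, or -1 when the key is unknown or has no value), are assumptions of this sketch; consult fz_lookup_metadata for the actual contract.

```c
#include <assert.h>
#include <string.h>

typedef struct
{
    const char *title;   /* NULL if the document has no title */
    const char *author;  /* NULL if the document has no author */
} foo_document;

/* Copy the value for key into buf (always NUL terminated when size > 0).
   Returns the full length of the value, or -1 for an unknown or absent key. */
int foo_lookup_metadata(const foo_document *doc, const char *key, char *buf, int size)
{
    const char *value = NULL;
    size_t len;

    assert(doc != NULL && key != NULL);
    if (strcmp(key, "title") == 0)
        value = doc->title;
    else if (strcmp(key, "author") == 0)
        value = doc->author;
    if (value == NULL)
        return -1;

    len = strlen(value);
    if (buf != NULL && size > 0)
    {
        /* Truncate to fit, leaving room for the terminator. */
        size_t n = len < (size_t)size - 1 ? len : (size_t)size - 1;
        memcpy(buf, value, n);
        buf[n] = '\0';
    }
    return (int)len;
}
```

Returning the untruncated length lets a caller detect truncation and retry with a larger buffer.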

All document formats must fill in the load_page field with a pointer to a function called to return a reference to a fz_page structure:

/* 
   fz_document_load_page_fn: Type for a function to load a given 
   page from a document. See fz_load_page for more information. 
*/ 
typedef fz_page *(fz_document_load_page_fn)(fz_context *ctx, fz_document *doc, int number);

To create a fz_page use the fz_new_page macro. For a document of type foo, typically a foo_page structure would be defined as below:

typedef struct 
{ 
   fz_page super; 
   <foo specific fields> 
} foo_page;

This would then be created using a call to fz_new_page, such as:

  foo_page *foo = fz_new_page(ctx, foo_page);

This returns an empty page structure with super populated with default values, and the foo specific fields initialized to 0. The document handler implementation then needs to fill in the page level functions.

20.2.3 Page Level Functions

The fz_page structure contains a list of functions used to implement the page level calls:

typedef struct fz_page_s 
{ 
   int refs; 
   fz_page_drop_page_fn *drop_page; 
   fz_page_bound_page_fn *bound_page; 
   fz_page_run_page_contents_fn *run_page_contents; 
   fz_page_load_links_fn *load_links; 
   fz_page_first_annot_fn *first_annot; 
   fz_page_page_presentation_fn *page_presentation; 
   fz_page_control_separation_fn *control_separation; 
   fz_page_separation_disabled_fn *separation_disabled; 
   fz_page_count_separations_fn *count_separations; 
   fz_page_get_separation_fn *get_separation; 
} fz_page;

The fz_page (and hence derived foo_page) structures are reference counted. The refs field holds the reference count. All the reference counting is handled by the core library; all that is required of the implementation is that it supply a drop_page function, which will be called when the reference count reaches zero. This is of type:

/* 
   fz_page_drop_page_fn: Type for a function to release all the 
   resources held by a page. Called automatically when the 
   reference count for that page reaches zero. 
*/ 
typedef void (fz_page_drop_page_fn)(fz_context *ctx, fz_page *page);

Implementations must fill in the bound_page field with the address of a function to return the page's bounding box, of type:

/* 
   fz_page_bound_page_fn: Type for a function to return the 
   bounding box of a page. See fz_bound_page for more 
   information. 
*/ 
typedef fz_rect *(fz_page_bound_page_fn)(fz_context *ctx, fz_page *page, fz_rect *);
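
The signature takes a rectangle pointer and returns one, which suggests the callee fills in the caller's rectangle and hands the same pointer back. A minimal sketch of that shape, using a stand-in toy_rect and a hypothetical foo_page that stores its size in points (ctx omitted):

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x0, y0, x1, y1; } toy_rect; /* stand-in for fz_rect */

typedef struct
{
    float width_pt;  /* page size in points, a foo specific field */
    float height_pt;
} foo_page;

/* Fill in the caller's rectangle and return it, so calls can be chained. */
toy_rect *foo_bound_page(const foo_page *page, toy_rect *bounds)
{
    assert(page != NULL && bounds != NULL);
    bounds->x0 = 0;
    bounds->y0 = 0;
    bounds->x1 = page->width_pt;
    bounds->y1 = page->height_pt;
    return bounds;
}
```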

Implementations must fill in the run_page_contents field with the address of a function to interpret the contents of a page, of type:

/* 
   fz_page_run_page_contents_fn: Type for a function to run the 
   contents of a page. See fz_run_page_contents for more 
   information. 
*/ 
typedef void (fz_page_run_page_contents_fn)(fz_context *ctx, fz_page *page, fz_device *dev, const fz_matrix *transform, fz_cookie *cookie);
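
Conceptually, running a page means replaying its contents as a series of calls on the device, checking the cookie along the way so that a caller can cancel a long render. The sketch below models only that control flow: toy_device and toy_cookie are hypothetical stand-ins with far fewer fields than the real structures, the page contents are reduced to a list of boxes, and the transform is omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, w, h; } foo_box;

/* Toy device: counts the boxes it is asked to draw. */
typedef struct toy_device_s toy_device;
struct toy_device_s
{
    void (*fill_box)(toy_device *dev, const foo_box *box);
    int boxes_drawn;
};

/* Toy cookie: lets the caller request an abort and watch progress. */
typedef struct { int abort; int progress; } toy_cookie;

typedef struct
{
    const foo_box *boxes;
    int box_count;
} foo_page;

void counting_fill_box(toy_device *dev, const foo_box *box)
{
    (void)box; /* a real device would render the box */
    dev->boxes_drawn++;
}

/* Replay the page's objects to the device, stopping early if asked to. */
void foo_run_page_contents(const foo_page *page, toy_device *dev, toy_cookie *cookie)
{
    int i;

    assert(page != NULL && dev != NULL && dev->fill_box != NULL);
    for (i = 0; i < page->box_count; i++)
    {
        if (cookie != NULL && cookie->abort)
            return;
        dev->fill_box(dev, &page->boxes[i]);
        if (cookie != NULL)
            cookie->progress = i + 1;
    }
}
```

Because the page talks only to the device interface, the same function serves rendering, text extraction, or any other consumer that supplies a device.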

If a document format supports internal or external hyperlinks, then its implementation must fill in the load_links field with the address of a function to load the links from a page, of type:

/* 
   fz_page_load_links_fn: Type for a function to load the links 
   from a page. See fz_load_links for more information. 
*/ 
typedef fz_link *(fz_page_load_links_fn)(fz_context *ctx, fz_page *page);

If a document format supports annotations, then its implementation must fill in the first_annot field with the address of a function to load the annotations from a page, of type:

/* 
   fz_page_first_annot_fn: Type for a function to load the 
   annotations from a page. See fz_first_annot for more 
   information. 
*/ 
typedef fz_annot *(fz_page_first_annot_fn)(fz_context *ctx, fz_page *page);

Some document formats can encode information that specifies how pages should be presented to the user as a slideshow: how long each page should be displayed, which transition to use when moving to the next page, and so on. Document handlers for such formats should fill in the page_presentation field with the address of a function to obtain this information, of type:

/* 
   fz_page_page_presentation_fn: Type for a function to 
   obtain the details of how this page should be presented when 
   in presentation mode. See fz_page_presentation for more 
   information. 
*/ 
typedef fz_transition *(fz_page_page_presentation_fn)(fz_context *ctx, fz_page *page, fz_transition *transition, float *duration);

Some document formats can encapsulate multiple color separations. In order to allow proofing of such formats, MuPDF allows such separations to be enumerated and enabled/disabled. In document handlers for such document formats, the control_separation, separation_disabled, count_separations and get_separation fields should be filled in with functions of the following types respectively:

/* 
   fz_page_control_separation_fn: Type for a function to enable/
   disable separations on a page. See fz_control_separation for 
   more information. 
*/ 
typedef void (fz_page_control_separation_fn)(fz_context *ctx, fz_page *page, int separation, int disable); 
 
/* 
   fz_page_separation_disabled_fn: Type for a function to detect 
   whether a given separation is enabled or disabled on a page. 
   See fz_separation_disabled for more information. 
*/ 
typedef int (fz_page_separation_disabled_fn)(fz_context *ctx, fz_page *page, int separation); 
 
/* 
   fz_page_count_separations_fn: Type for a function to count 
   the number of separations on a page. See fz_count_separations 
   for more information. 
*/ 
typedef int (fz_page_count_separations_fn)(fz_context *ctx, fz_page *page); 
 
/* 
   fz_page_get_separation_fn: Type for a function to retrieve 
   details of a separation on a page. See fz_get_separation 
   for more information. 
*/ 
typedef const char *(fz_page_get_separation_fn)(fz_context *ctx, fz_page *page, int separation, uint32_t *rgb, uint32_t *cmyk);
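
The four separation functions work together as a small enumerate-and-toggle interface. The following sketch shows one way a handler might back them, keeping the disabled state in a bitmask. The foo_page layout and the limit of 32 separations are assumptions of the example, and the rgb and cmyk outputs of get_separation are omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FOO_MAX_SEPARATIONS 32

typedef struct
{
    int count;                               /* separations on this page */
    const char *names[FOO_MAX_SEPARATIONS];
    uint32_t disabled_mask;                  /* bit i set: separation i is disabled */
} foo_page;

int foo_count_separations(const foo_page *page)
{
    return page->count;
}

void foo_control_separation(foo_page *page, int separation, int disable)
{
    assert(separation >= 0 && separation < page->count);
    if (disable)
        page->disabled_mask |= (uint32_t)1 << separation;
    else
        page->disabled_mask &= ~((uint32_t)1 << separation);
}

int foo_separation_disabled(const foo_page *page, int separation)
{
    assert(separation >= 0 && separation < page->count);
    return (int)((page->disabled_mask >> separation) & 1);
}

/* Returns the separation's name, or NULL if the index is out of range. */
const char *foo_get_separation(const foo_page *page, int separation)
{
    if (separation < 0 || separation >= page->count)
        return NULL;
    return page->names[separation];
}
```

A proofing tool would typically call count_separations, list the names with get_separation, and then toggle individual separations before rerunning the page.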