31.3 PDF Objects

PDF files are made up of a series of objects. These objects can be of many different types, including dictionaries, streams, numbers, booleans, names and strings. For full details, see ‘The PDF Reference Manual’.

MuPDF represents all of these as a pdf_obj pointer. Such pointers are reference counted in the usual way:

pdf_obj *pdf_keep_obj(fz_context *ctx, pdf_obj *obj); 
void pdf_drop_obj(fz_context *ctx, pdf_obj *obj);
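To make the keep/drop convention concrete, here is a minimal, self-contained sketch of reference counting. It does not use MuPDF; the rc_obj type, the rc_new, rc_keep and rc_drop functions and the rc_live counter are all hypothetical names invented for this illustration, and MuPDF's real implementation may well differ in detail.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted object; not MuPDF's pdf_obj. */
typedef struct rc_obj {
    int refs;   /* number of outstanding references */
    int value;  /* payload, for illustration only */
} rc_obj;

/* Number of objects currently alive, so the effect of a drop is observable. */
int rc_live = 0;

/* Create an object; the caller owns the single initial reference. */
rc_obj *rc_new(int value)
{
    rc_obj *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    obj->refs = 1;
    obj->value = value;
    rc_live++;
    return obj;
}

/* Take an extra reference; returns the same pointer for convenience. */
rc_obj *rc_keep(rc_obj *obj)
{
    if (obj != NULL)
        obj->refs++;
    return obj;
}

/* Release one reference; the object is freed when the last one goes. */
void rc_drop(rc_obj *obj)
{
    if (obj != NULL && --obj->refs == 0) {
        free(obj);
        rc_live--;
    }
}

/* Walk one object through keep/drop; returns how many objects remain alive. */
int rc_demo(void)
{
    rc_obj *a = rc_new(42);     /* refs == 1 */
    rc_obj *b = rc_keep(a);     /* refs == 2, and b == a */
    assert(a != NULL && b == a && a->refs == 2);
    rc_drop(a);                 /* refs == 1: still alive through b */
    assert(rc_live == 1 && b->value == 42);
    rc_drop(b);                 /* refs == 0: freed */
    return rc_live;
}
```

Every keep must eventually be balanced by exactly one drop; in the sketch, rc_demo returns 0 because both references have been released.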

Given such a pointer, the actual type of the object can be tested using:

int pdf_is_null(fz_context *ctx, pdf_obj *obj); 
int pdf_is_bool(fz_context *ctx, pdf_obj *obj); 
int pdf_is_int(fz_context *ctx, pdf_obj *obj); 
int pdf_is_real(fz_context *ctx, pdf_obj *obj); 
int pdf_is_number(fz_context *ctx, pdf_obj *obj); 
int pdf_is_name(fz_context *ctx, pdf_obj *obj); 
int pdf_is_string(fz_context *ctx, pdf_obj *obj); 
int pdf_is_array(fz_context *ctx, pdf_obj *obj); 
int pdf_is_dict(fz_context *ctx, pdf_obj *obj); 
int pdf_is_indirect(fz_context *ctx, pdf_obj *obj); 
int pdf_is_stream(fz_context *ctx, pdf_obj *obj);

These all return non-zero if the object is of the tested type, and zero otherwise.

To extract the data from a PDF object, you can use one of the following functions:

/* safe, silent failure, no error reporting on type mismatches */ 
int pdf_to_bool(fz_context *ctx, pdf_obj *obj); 
int pdf_to_int(fz_context *ctx, pdf_obj *obj); 
fz_off_t pdf_to_offset(fz_context *ctx, pdf_obj *obj); 
float pdf_to_real(fz_context *ctx, pdf_obj *obj); 
char *pdf_to_name(fz_context *ctx, pdf_obj *obj); 
char *pdf_to_str_buf(fz_context *ctx, pdf_obj *obj); 
int pdf_to_str_len(fz_context *ctx, pdf_obj *obj);

It is, in fact, safe to call any of these functions on any pdf_obj pointer. If the object is not of the expected type, a ‘safe’ default will be returned.
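The pattern of type tests plus forgiving accessors can be sketched in a few lines of standalone C. This is a toy model, not MuPDF code: the tagged type and the tagged_* functions are hypothetical, and the defaults chosen here (0 for integers, the empty string for names) are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object kinds; a real PDF object model has many more. */
typedef enum { KIND_NULL, KIND_INT, KIND_NAME } kind;

/* Hypothetical tagged object; not MuPDF's pdf_obj. */
typedef struct {
    kind k;
    int i;              /* valid when k == KIND_INT */
    const char *name;   /* valid when k == KIND_NAME */
} tagged;

/* Type tests: non-zero if the object is of the tested type. */
int tagged_is_int(const tagged *t)  { return t != NULL && t->k == KIND_INT; }
int tagged_is_name(const tagged *t) { return t != NULL && t->k == KIND_NAME; }

/* Accessors: never fail; a mismatch silently yields a default. */
int tagged_to_int(const tagged *t)
{
    return tagged_is_int(t) ? t->i : 0;
}

const char *tagged_to_name(const tagged *t)
{
    return tagged_is_name(t) ? t->name : "";
}

/* Exercises matching and mismatching calls; returns 1 if all behave. */
int tagged_demo(void)
{
    tagged num = { KIND_INT, 612, NULL };
    tagged nam = { KIND_NAME, 0, "Page" };

    assert(tagged_to_int(&num) == 612);                 /* matching type */
    assert(strcmp(tagged_to_name(&nam), "Page") == 0);  /* matching type */
    assert(tagged_to_int(&nam) == 0);                   /* mismatch: default */
    assert(strcmp(tagged_to_name(&num), "") == 0);      /* mismatch: default */
    assert(tagged_to_int(NULL) == 0);                   /* NULL is also safe */
    return 1;
}
```

The benefit of this design is that callers can chain lookups and conversions without checking each intermediate result, testing the type explicitly only where a default would be ambiguous.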

31.3.1 Arrays

An array object is a list of other objects, each of which may be of a different type. The length of the list can be queried with:

int pdf_array_len(fz_context *ctx, pdf_obj *array);

Armed with this knowledge, we can then fetch any object we want from within the array:

pdf_obj *pdf_array_get(fz_context *ctx, pdf_obj *array, int i);

The index i should be between 0 and length-1 inclusive; if an out-of-range element is requested, the function simply returns NULL.

Note that the pdf_obj reference returned by this function is merely borrowed. That is to say, if you wish to keep the object pointer around for more than the immediate lifespan of the call, you should manually call pdf_keep_obj to keep it, and later pdf_drop_obj to dispose of it.
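The difference between a borrowed reference and a kept one can be shown with a small standalone model. Nothing here is MuPDF code: elem, elem_array and their functions are hypothetical, and the drop function only decrements a counter where a real implementation would free the object at zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted element; not MuPDF's pdf_obj. */
typedef struct {
    int refs;
    int value;
} elem;

/* Hypothetical array holding one reference to each of its elements. */
typedef struct {
    elem *items[4];
    int len;
} elem_array;

/* Borrowing lookup: takes no reference; NULL if i is out of range. */
elem *elem_array_get(const elem_array *arr, int i)
{
    if (arr == NULL || i < 0 || i >= arr->len)
        return NULL;
    return arr->items[i];
}

/* Take our own reference so the pointer can outlive the array's. */
elem *elem_keep(elem *e)
{
    if (e != NULL)
        e->refs++;
    return e;
}

/* Release a reference (a real implementation would free at zero). */
void elem_drop(elem *e)
{
    if (e != NULL)
        e->refs--;
}

/* Returns the final reference count of the element (expected: 0). */
int borrow_demo(void)
{
    elem e = { 1, 7 };                  /* the one reference is the array's */
    elem_array arr = { { &e }, 1 };
    elem *borrowed, *kept;

    borrowed = elem_array_get(&arr, 0);
    assert(borrowed == &e && e.refs == 1);    /* borrowing leaves the count alone */
    assert(elem_array_get(&arr, 5) == NULL);  /* out of range */
    assert(elem_array_get(&arr, -1) == NULL); /* out of range */

    kept = elem_keep(borrowed);         /* now we hold a reference of our own */
    assert(e.refs == 2);

    elem_drop(arr.items[0]);            /* the array lets go of its reference */
    arr.len = 0;
    assert(e.refs == 1 && kept->value == 7);  /* kept is still valid */

    elem_drop(kept);                    /* balance our keep */
    return e.refs;
}
```

Had the caller relied on the borrowed pointer alone, it would have been left dangling as soon as the array released its reference.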

An object can be inserted into an array at a given index, using:

void pdf_array_insert(fz_context *ctx, pdf_obj *array, pdf_obj *obj, int index);

Any objects at or after this index are moved up the array by one position. Alternatively, an object can be put into an array at a given index, overwriting any object that is already there:

void pdf_array_put(fz_context *ctx, pdf_obj *array, int i, pdf_obj *obj);

If the array needs to be extended it will be, and any intervening objects will be created as ‘null’. Alternatively, objects can be appended to the end of an array using:

void pdf_array_push(fz_context *ctx, pdf_obj *array, pdf_obj *obj);
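The index behaviour of insert, put and push, as described above, can be modelled with a plain array of integers. This is only a sketch under simplifying assumptions: slot_array and the slot_* functions are hypothetical, the capacity is fixed, SLOT_NULL stands in for a PDF ‘null’ object, failures are reported by return value, and reference counting is ignored entirely.

```c
#include <assert.h>

#define SLOT_CAP 16
#define SLOT_NULL (-1)   /* stands in for a PDF 'null' object */

/* Hypothetical fixed-capacity array of integers; not a MuPDF type. */
typedef struct {
    int items[SLOT_CAP];
    int len;
} slot_array;

/* Insert v at index i, moving later elements up by one. Returns 1 on success. */
int slot_insert(slot_array *a, int v, int i)
{
    int k;
    if (i < 0 || i > a->len || a->len >= SLOT_CAP)
        return 0;
    for (k = a->len; k > i; k--)
        a->items[k] = a->items[k - 1];
    a->items[i] = v;
    a->len++;
    return 1;
}

/* Overwrite index i, padding with SLOT_NULL if i lies beyond the end. */
int slot_put(slot_array *a, int i, int v)
{
    if (i < 0 || i >= SLOT_CAP)
        return 0;
    while (a->len <= i)
        a->items[a->len++] = SLOT_NULL;
    a->items[i] = v;
    return 1;
}

/* Append v at the end of the array. */
int slot_push(slot_array *a, int v)
{
    return slot_insert(a, v, a->len);
}

/* Returns 1 if the array ends up as [11, 20, 30, null, null, 60]. */
int slot_demo(void)
{
    slot_array a = { {0}, 0 };

    slot_push(&a, 10);        /* [10] */
    slot_push(&a, 30);        /* [10, 30] */
    slot_insert(&a, 20, 1);   /* [10, 20, 30] */
    slot_put(&a, 0, 11);      /* [11, 20, 30] */
    slot_put(&a, 5, 60);      /* [11, 20, 30, null, null, 60] */

    return a.len == 6
        && a.items[0] == 11 && a.items[1] == 20 && a.items[2] == 30
        && a.items[3] == SLOT_NULL && a.items[4] == SLOT_NULL
        && a.items[5] == 60;
}
```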

In all these cases, the array takes a new reference to the object passed in; that is, after the call, both the array and the caller hold references to the object. In cases where the object to be inserted is a ‘borrowed’ reference, this is ideal.

In other cases, where the ownership of the object reference should be passed down into the array, we have alternative formulations of those functions:

void pdf_array_insert_drop(fz_context *ctx, pdf_obj *array, pdf_obj *obj, int index); 
void pdf_array_put_drop(fz_context *ctx, pdf_obj *array, int i, pdf_obj *obj); 
void pdf_array_push_drop(fz_context *ctx, pdf_obj *array, pdf_obj *obj);

These functions are so named because they are equivalent to first inserting/putting/pushing the object and then dropping it. A useful side effect is that the object is still correctly dropped if an error is encountered during the operation, which often saves the caller from having to wrap the call in an fz_try/fz_catch clause.
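The ownership difference between the plain and the _drop variants can be illustrated with a standalone toy model. The node and node_array types and the node_* functions are hypothetical; in particular, a failed push is signalled here by a zero return value, standing in for the exception that a real call would throw.

```c
#include <assert.h>
#include <stddef.h>

#define NODE_CAP 2   /* deliberately tiny so that a push can fail */

/* Hypothetical reference-counted object; not MuPDF's pdf_obj. */
typedef struct {
    int refs;
} node;

/* Hypothetical fixed-capacity array of references. */
typedef struct {
    node *items[NODE_CAP];
    int len;
} node_array;

node *node_keep(node *n) { if (n != NULL) n->refs++; return n; }
void node_drop(node *n)  { if (n != NULL) n->refs--; }

/* Push taking a NEW reference; the caller still owns its own.
 * Returns 0 on failure (standing in for a thrown error). */
int node_push(node_array *a, node *n)
{
    if (a->len >= NODE_CAP)
        return 0;
    a->items[a->len++] = node_keep(n);
    return 1;
}

/* Push, then drop the caller's reference whether or not the push worked. */
int node_push_drop(node_array *a, node *n)
{
    int ok = node_push(a, n);
    node_drop(n);
    return ok;
}

/* Returns 1 if every reference count ends up where expected. */
int push_drop_demo(void)
{
    node x = { 1 }, y = { 1 }, z = { 1 };
    node_array a = { { NULL, NULL }, 0 };
    int ok;

    node_push(&a, &x);               /* x.refs == 2: caller and array */
    node_drop(&x);                   /* caller must drop by hand: x.refs == 1 */

    node_push_drop(&a, &y);          /* ownership handed over: y.refs == 1 */

    ok = node_push_drop(&a, &z);     /* array is full, so the push fails... */
    assert(ok == 0 && z.refs == 0);  /* ...but z has still been dropped */

    return x.refs == 1 && y.refs == 1 && a.len == NODE_CAP;
}
```

In the failing case the caller has nothing left to clean up, which is the property that makes the _drop variants convenient for freshly created objects.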