21.5 Using the Store

21.5.1 Overview

Every “storable piece of information” in MuPDF is held in a data structure that begins with a fz_storable structure. Rather than repeatedly say “a storable piece of information”, we shall henceforth just say “a fz_storable”.

MuPDF uses reference counting for most of its data structures (see section 21.3 Reference Counting), and fz_storables are no exception.

The objects in the Store are held in a chain according to when they were last used. Whenever an object is ‘used’, it is moved to the head of the chain. Whenever we need to evict an object from the Store to make room, we therefore discard objects from the tail of the chain. In this way frequently used objects are kept around, while rarely used ones are discarded in preference.

Whenever MuPDF needs to use a fz_storable, it first checks to see if there is one in the Store already. It does this by forming a unique ‘key’ and scanning the Store for an object of a given type, with that key. If the object exists within the Store, the fact that the object has been used is noted (i.e. it is moved to the front of the usage chain), a reference is taken, and returned to the caller.

If no reference is returned, the code creates its own version of the fz_storable. It calculates its size, and puts it into the Store, together with the same key as before. The Store takes a reference to the object, links it into its data structure, and updates its running total of the size of all the objects within it.

If placing a new object into the Store would take it over the limit, it runs through and looks for the least recently used objects to evict to bring the limit down. In order for an object to be considered for eviction, their refcount must be 1. We know that the Store is holding 1 reference to the object - if anything else is, then removing it from the Store won’t actually save us any memory.

Regardless of whether the Store can be reduced to a suitable size, the object is always placed into the store. This ensures that the Store’s figure for “amount of memory used by fz_storable’s” remains correct (thus ensuring that should objects become evictable, the store size will fall correctly). It also does no harm, because clearly we have managed to allocate enough memory to form the fz_storable in the first place.

Regardless of whether a caller finds the object in the Store, or has to store it itself, it then proceeds identically. It uses the object for whatever purpose it needed it, and then calls the appropriate fz_drop function to lose its reference. The object will live on in the Store until it needs to be evicted to make room.

When the fz_context (or, more accurately, the last of a set of cloned contexts) is finally destroyed, the Store is destroyed too. This results in every object in the Store being released. Unless something has gone wrong with reference counting, this will result in all our objects being freed.

21.5.2 Handling keys

As discussed above, the Store is basically a set of key/value pairs. While the values are always fz_storables, the keys can be of many different types, due to coming from many disparate parts of the system.

Accordingly, we need a mechanism to allow us to safely know what ‘type’ a given key is, and to compare 2 keys of identical type.

We solve this, by using a fz_store_type structure:

typedef struct fz_store_type_s 
{ 
   int (*make_hash_key)(fz_context *ctx, fz_store_hash *, void *); 
   void *(*keep_key)(fz_context *,void *); 
   void (*drop_key)(fz_context *,void *); 
   int (*cmp_key)(fz_context *ctx, void *, void *); 
   void (*print)(fz_context *ctx, fz_output *out, void *); 
   int (*needs_reap)(fz_context *ctx, void *); 
} fz_store_type;

We will have just one instance of this for each type - normally a static const structure defined in the code. Whenever we insert (or lookup) something in the store, we pass the address of that ‘types’ structure.

We only compare items if they have the same type pointer, and any comparison is done using the cmp_key function pointer therein. In common with normal C idioms, 0 means match, non zero means different.

The keep_key and drop_key entries are used to implement reference counting of keys. Keys can be an amalgam of several reference counted objects, so a single call to the keep or drop functions provided here will take or release references for all these objects in one operation.

The print function is purely for debugging purposes as part of calls to fz_print_store - it should generate a human readable summary of the key to the given fz_output stream.

The make_hash_key and needs_reap functions are explained in the following subsections.

21.5.3 Hashing

In order to ensure the Store performs well, we must ensure that certain processes run efficiently - notably searching for an existing entry, insertion and deletion.

Accordingly, the Store is implemented based on a hash table. For every ‘key’, we need to be able to form a hash, but this process is complicated slightly by the fact that every different fz_storable has a different type for the key.

We solve this by having the make_hash_key member of the fz_store_type structure convert whatever its key data is into a common structure:

typedef struct fz_store_hash_s 
{ 
   fz_store_drop_fn *drop; 
   union 
   { 
      struct 
      { 
         const void *ptr; 
         int i; 
      } pi; /* 8 or 12 bytes */ 
      struct 
      { 
         const void *ptr; 
         int i; 
         fz_irect r; 
      } pir; /* 24 or 28 bytes */ 
      struct 
      { 
         int id; 
         float m[4]; 
      } im; /* 20 bytes */ 
      struct 
      { 
         unsigned char src_md5[16]; 
         unsigned char dst_md5[16]; 
         unsigned int ri:2; 
         unsigned int bp:1; 
         unsigned int bpp16:1; 
         unsigned int proof:1; 
         unsigned int src_extras:5; 
         unsigned int dst_extras:5; 
         unsigned int copy_spots:1; 
      } link; /* 36 bytes */ 
   } u; 
} fz_store_hash; /* 40 or 44 bytes */

The caller will always arrange for this structure to be zero filled on entry to the make_hash_key call. On exit, it should have been updated with the key details. Implementers may extend the union found in this structure as required, though ideally the size of the overall structure should be minimised to avoid unnecessary work.

Once the Store has formed a fz_store_hash it can then generate the required hash for the hash table as required.

21.5.4 Key storable items

Some objects can be used both as values within the Store, and as a component of keys within the Store. We refer to these objects as ‘key storable’ objects. In this case, we need to take additional care to ensure that we do not end up keeping an item within the store purely because its value is referred to by another key in the store.

An example of this are fz_images in PDF files. Each fz_image is placed into the Store to enable it to be easily reused. When the image is rendered, a pixmap is generated from the image, and the pixmap is placed into the Store so it can be reused on subsequent renders. The image forms part of the key for the pixmap.

When we close the pdf document (and any associated pages/display lists etc), we drop the images from the Store. This may leave us in the position of the images having non-zero reference counts purely because they are used as part of the keys for the pixmaps.

The pixmaps can never be found by a search of the Store, because to find them, we’d have to search for them using the appropriate fz_image. They are therefore, to all intents and purposes ‘dead’, and just taking up useless space.

We therefore use special reference counting functions to implement these fz_key_storable items, fz_keep_key_storable and fz_drop_key_storable rather than the more usual fz_keep_storable and fz_drop_storable.

The sole difference is that these enable us to store the number of references to these items that are used in keys. This is achieved by callers taking and dropping references for use in keys with fz_keep_key_storable_key and fz_drop_key_storable_key.

This means that key storable items need to provide two sets of keep and drop functions, one for ‘normal’ callers, and one for use during key handling. For example:

fz_image *fz_keep_image(fz_context *ctx, fz_image *image); 
void fz_drop_image(fz_context *ctx, fz_image *image); 
 
fz_image *fz_keep_image_store_key(fz_context *ctx, fz_image *image); 
void fz_drop_image_store_key(fz_context *ctx, fz_image *image);

The purpose of this extra work is to allow us to spot when we may need to check the Store for ‘dead’ entries - those that can never be found by searching the Store.

21.5.5 Reap passes

When the number of references to a key storable object equals the number of references to an object from keys in the Store, we know that we can remove all the items which have that object as part of the key. This is done by running a pass over the store, ‘reaping’ those items.

If a key does not consist of any storable objects, then the needs_reap entry in its fz_store_type can safely be left as NULL. If it does, however, it must provide an implementation to check whether a reap pass is required. Essentially this needs to check if any of its constituent fz_key_storable objects need reaping, which can be done by a call to:

int fz_key_storable_needs_reaping(fz_context *ctx, const fz_key_storable *ks);

Reap passes are slower than we would like as they touch every item in the store. We therefore provide a way to ‘batch’ such reap passes together, using fz_defer_reap_start and fz_defer_reap_end to bracket a region in which many may be triggered.

The need for a reap is detected as part of normal operations in the core code, and such passes are then triggered automatically as required. The user need never (and indeed cannot) trigger such passes manually. The user can, however, exercise some control over when such operations take place.

If an application is about to perform an operation that may drop many objects (say dropping a collection of cached display lists), then it should call fz_defer_reap_start beforehand, and match that with a fz_defer_reap_end afterwards. Any reap passes triggered by the dropping of objects within the display lists would be deferred until the end - resulting in at most one pass rather than potentially many.