[gs-devel] persistent cache design

Ralph Giles ralph.giles at artifex.com
Thu Oct 23 16:52:09 PDT 2003


At the last meeting we discussed the need for a persistent cache 
mechanism to store the results of expensive operations between 
Ghostscript invocations. Since then, we've had customer complaints about 
the performance penalty of the 8.11 native font enumeration on MacOS,
so we've decided to go ahead and implement something. The other major 
demand for this is caching the halftone cels for Raph's Well Tempered 
Screening.

Raph, Ray, and I worked out a basic design today, documented below. I'll 
handle the implementation of the interface on unix and macos, and the 
utilization code for native font enum, but will need a volunteer to do 
(or at least debug) the windows implementation. I imagine Raph will 
handle the utilization code for WTS.

The basic idea is to have a small C api that can be used to cache the 
results of time-consuming calculations within Ghostscript for reuse in 
later invocations. We chose a simple interface where arbitrary strings 
act as keys to a cache which accepts or returns simple data buffers.

The API looks like this:

int gp_cache_insert(int type, byte *key, int keylen, void *buffer, int buflen);
int gp_cache_query(int type, byte *key, int keylen, void **buffer);
int gp_cache_release(int type, void *buffer);

(perhaps the buffers should be byte *)

#define GP_CACHE_TYPE_WTS 1
#define GP_CACHE_TYPE_FONTMAP 2
[...etc...]


Thus you call 'insert' to store (or replace) a buffer of data under the 
given key string. You call 'query' with a key string to retrieve a 
buffer that has been cached, or to learn through an error code that no 
data is available under that key. If a buffer is returned, the return 
value is the size of the buffer.

To simplify memory management, users are encouraged to copy out whatever 
data they need and then call 'release' to relinquish control of the 
buffer before returning to the interpreter.
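
As a rough sketch of that calling pattern, the block below pairs a toy
in-memory stand-in for the three calls with a helper that queries, copies
out, and releases. Everything prefixed demo_ is hypothetical, as are the
choice of -1 as the error code and the decision to hand back a malloc'd
copy from 'query'; the real manager would store data on disk.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char byte;

#define GP_CACHE_TYPE_WTS 1
#define GP_CACHE_TYPE_FONTMAP 2
#define DEMO_SLOTS 16   /* assumption: tiny fixed table, no eviction */

typedef struct {
    int used, type, keylen, buflen;
    byte *key;
    void *data;
} demo_slot;

static demo_slot demo_table[DEMO_SLOTS];

static demo_slot *demo_find(int type, const byte *key, int keylen)
{
    int i;
    for (i = 0; i < DEMO_SLOTS; i++) {
        demo_slot *s = &demo_table[i];
        if (s->used && s->type == type && s->keylen == keylen &&
            memcmp(s->key, key, (size_t)keylen) == 0)
            return s;
    }
    return NULL;
}

/* Store (or replace) a private copy of buffer; 0 on success, -1 on failure. */
int gp_cache_insert(int type, byte *key, int keylen, void *buffer, int buflen)
{
    demo_slot *s;
    void *copy;
    int i;

    assert(key != NULL && keylen > 0 && buffer != NULL && buflen > 0);
    copy = malloc((size_t)buflen);
    if (copy == NULL)
        return -1;
    memcpy(copy, buffer, (size_t)buflen);
    s = demo_find(type, key, keylen);
    if (s != NULL) {                    /* replace an existing entry */
        free(s->data);
        s->data = copy;
        s->buflen = buflen;
        return 0;
    }
    for (i = 0; i < DEMO_SLOTS; i++) {
        s = &demo_table[i];
        if (s->used)
            continue;
        s->key = malloc((size_t)keylen);
        if (s->key == NULL)
            break;
        memcpy(s->key, key, (size_t)keylen);
        s->used = 1; s->type = type; s->keylen = keylen;
        s->data = copy; s->buflen = buflen;
        return 0;
    }
    free(copy);                         /* table full or out of memory */
    return -1;
}

/* On a hit, hand back a fresh copy and return its size; -1 on a miss. */
int gp_cache_query(int type, byte *key, int keylen, void **buffer)
{
    demo_slot *s = demo_find(type, key, keylen);
    void *copy;

    if (s == NULL || (copy = malloc((size_t)s->buflen)) == NULL)
        return -1;
    memcpy(copy, s->data, (size_t)s->buflen);
    *buffer = copy;
    return s->buflen;
}

/* Give the buffer obtained from 'query' back to the cache layer. */
int gp_cache_release(int type, void *buffer)
{
    (void)type;
    free(buffer);
    return 0;
}

/* Caller side: query, copy out what is needed, release right away. */
int demo_lookup(int type, const char *key, char *out, int outlen)
{
    void *buf = NULL;
    int n = gp_cache_query(type, (byte *)key, (int)strlen(key), &buf);

    if (n < 0)
        return n;                       /* nothing cached under this key */
    if (n > outlen)
        n = outlen;
    memcpy(out, buf, (size_t)n);
    gp_cache_release(type, buf);
    return n;
}
```

A caller would treat a negative return from demo_lookup() as a miss,
regenerate the data, and pass it to gp_cache_insert() under the same type
and key.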

The cache keys are really a tuple of the key string with a 'type' code 
which indicates the usage category for the data. While namespace 
separation could be provided with string prefixes, this allows the 
cache manager to make more intelligent decisions about eviction, 
storage format, and so on. For any particular use of the cache within 
Ghostscript this will be a constant. An enum would be a reasonable 
choice instead of the defines.

The cache manager stores the inserted data on disk in some convenient 
format and maintains a least-recently-used queue file. Evictions are 
based on a per-type weighted combination of allotted usage and activity. 
So, for example, excessive generation of halftone cels wouldn't evict 
the cached native fontmap, but would borrow space temporarily from 
the icc colormaps.
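
To make the eviction idea concrete, here is one possible scoring scheme,
purely as an illustration. The formula, the 'protect' weight, the demo_
names, and the GP_CACHE_TYPE_ICC value are all assumptions; the design
above only says that allotment and activity are combined per type. It
also shows the types written as an enum.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    GP_CACHE_TYPE_WTS = 1,
    GP_CACHE_TYPE_FONTMAP = 2,
    GP_CACHE_TYPE_ICC = 3       /* hypothetical; not defined in the API yet */
} gp_cache_type;

typedef struct {
    gp_cache_type type;
    double allotted_mb;   /* nominal share of the cache for this type */
    double used_mb;       /* space currently occupied */
    double idle_hours;    /* time since this type was last hit */
    double protect;       /* >= 1; larger means harder to evict */
} demo_type_stats;

/* Higher score = better eviction candidate. The formula is invented:
   fill ratio, scaled up by idleness, scaled down by protection. */
double demo_evict_score(const demo_type_stats *t)
{
    double fill;

    assert(t != NULL && t->allotted_mb > 0.0 && t->protect >= 1.0);
    fill = t->used_mb / t->allotted_mb;
    return fill * (1.0 + t->idle_hours) / t->protect;
}

/* Index of the type to evict from, or -1 if nothing is stored. */
int demo_pick_victim(const demo_type_stats *stats, size_t n)
{
    int best = -1;
    double best_score = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        double s = demo_evict_score(&stats[i]);
        if (stats[i].used_mb > 0.0 && s > best_score) {
            best = (int)i;
            best_score = s;
        }
    }
    return best;
}
```

With a busy WTS type, a small and heavily protected fontmap, and ICC
colormaps that have sat idle for a day, this scheme picks the ICC entries
first, which matches the behaviour described above.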

Since the persistent cache will be moderately large (we envision a 
default limit around 100 MB) the cache will be shared between 
Ghostscript versions as well as multiple instances. It is thus a design 
requirement that the cache manager work in parallel with itself. 
The type field can be incremented whenever a cache utilization changes 
its buffer format or method of generation to avoid cross-talk between 
incompatible versions of Ghostscript sharing the same cache directory.
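
One common way to let several processes share a directory safely is to
write each entry to a temporary file and rename it into place, so readers
never see a half-written entry. The sketch below assumes that approach;
it is not something we have settled on, and demo_atomic_write() and its
parameters are hypothetical. On POSIX systems rename() replaces an
existing file atomically; the C library rename() on Windows fails if the
target exists, so that port would need a different call.

```c
#include <assert.h>
#include <stdio.h>

/* Write len bytes to tmp_path, then move the file to path.
   The caller must pick a tmp_path unique to this process, in the same
   directory as path. Returns 0 on success, -1 on any failure. */
int demo_atomic_write(const char *path, const char *tmp_path,
                      const void *buf, size_t len)
{
    FILE *f;

    assert(path != NULL && tmp_path != NULL && buf != NULL);
    f = fopen(tmp_path, "wb");
    if (f == NULL)
        return -1;
    if (fwrite(buf, 1, len, f) != len) {
        fclose(f);
        remove(tmp_path);
        return -1;
    }
    if (fclose(f) != 0) {               /* flush errors surface here */
        remove(tmp_path);
        return -1;
    }
    if (rename(tmp_path, path) != 0) {  /* publish the finished entry */
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```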

There will be compiled-in defaults for location and maximum size, which 
can be overridden with environment variables (or registry settings). On 
unix systems the default will be $HOME/.ghostscript/cache/. When run as 
a daemon something like /var/cache/ghostscript/ is recommended. Ray 
suggested that something like \Temp\gs-cache\ is best practice on Windows. 
Presumably the files would have to be world-writable for that to work in 
a multi-user context, so perhaps something under "Documents & Settings" 
would be better.
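
The lookup order for the unix case might be sketched as follows. The
environment variable names GS_CACHE_DIR and GS_CACHE_SIZE and the demo_
helpers are placeholders invented for this example; no names have been
chosen yet. The pure helpers take the values as arguments so they can be
exercised without touching the environment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_DEFAULT_SUBDIR "/.ghostscript/cache/"
#define DEMO_DEFAULT_MAX_MB 100L

/* Pick the cache directory: an explicit override wins, otherwise use
   the default under $HOME. Returns 0 on success, -1 if neither value
   is usable or out is too small. */
int demo_cache_dir(const char *override, const char *home,
                   char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    if (override != NULL && override[0] != '\0') {
        if (strlen(override) + 1 > outlen)
            return -1;
        strcpy(out, override);
        return 0;
    }
    if (home != NULL && home[0] != '\0') {
        if (strlen(home) + strlen(DEMO_DEFAULT_SUBDIR) + 1 > outlen)
            return -1;
        strcpy(out, home);
        strcat(out, DEMO_DEFAULT_SUBDIR);
        return 0;
    }
    return -1;
}

/* Parse a size override in megabytes; fall back to the default when
   the value is missing, malformed, or not positive. */
long demo_cache_max_mb(const char *override)
{
    char *end;
    long mb;

    if (override == NULL || override[0] == '\0')
        return DEMO_DEFAULT_MAX_MB;
    mb = strtol(override, &end, 10);
    if (*end != '\0' || mb <= 0)
        return DEMO_DEFAULT_MAX_MB;
    return mb;
}

/* Wrapper that reads the (hypothetical) environment variables. */
int demo_cache_config(char *dir, size_t dirlen, long *max_mb)
{
    *max_mb = demo_cache_max_mb(getenv("GS_CACHE_SIZE"));
    return demo_cache_dir(getenv("GS_CACHE_DIR"), getenv("HOME"),
                          dir, dirlen);
}
```

A daemon would simply set the directory override to something like
/var/cache/ghostscript/ in its startup environment.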

Situating the api as gs_pcache_*() is also reasonable. We felt a gp_ 
level interface was more appropriate because the 'public' api is likely 
to be smaller than any collection of platform-dependent calls used in 
the implementation of the cache manager, and we have precedent for 
large amounts of shared code on the gp_ side. It also leaves open the 
possibility of entirely different implementations on different 
platforms, which may be attractive if the OS or embedded runtime 
provides equivalent functionality already.

That's the plan. Comment now, or you don't get to say "I told you so" 
later. :-)

Cheers,
 -r


