[gs-devel] persistent cache design
Ralph Giles
ralph.giles at artifex.com
Thu Oct 23 16:52:09 PDT 2003
At the last meeting we discussed the need for a persistent cache
mechanism to store the results of expensive operations between
Ghostscript invocations. Since then, we've had customer complaints about
the performance penalty of the 8.11 native font enumeration on MacOS,
so we've decided to go ahead and implement something. The other major
demand for this is caching the halftone cels for Raph's Well Tempered
Screening (WTS).
Raph, Ray, and I worked out a basic design today, documented below. I'll
handle the implementation of the interface on unix and macos, and the
utilization code for native font enum, but will need a volunteer to do
(or at least debug) the windows implementation. I imagine Raph will
handle the utilization code for WTS.
The basic idea is to have a small C api that can be used to cache the
results of time-consuming calculations within Ghostscript for reuse in
later invocations. We chose a simple interface where arbitrary strings
act as keys to a cache which accepts or returns simple data buffers.
The API looks like this:
int gp_cache_insert(int type, byte *key, int keylen, void *buffer, int buflen);
int gp_cache_query(int type, byte *key, int keylen, void **buffer);
int gp_cache_release(int type, void *buffer);
(perhaps the buffers should be byte *)
#define GP_CACHE_TYPE_WTS 1
#define GP_CACHE_TYPE_FONTMAP 2
[...etc...]
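To pin down the calling conventions, here is a minimal in-memory stub of the three calls. It is a hypothetical sketch, not the planned disk-backed cache manager: entries live in a fixed-size table, `byte` is typedef'd locally as a stand-in for Ghostscript's own, a miss is reported as -1 (the real error code is still open), and query hands back a private copy that must be passed to release.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned char byte;   /* stand-in for Ghostscript's own typedef */

#define STUB_MAX_ENTRIES 16   /* hypothetical fixed capacity */

typedef struct {
    int type;
    byte *key;
    int keylen;
    byte *data;
    int datalen;
} stub_entry;

static stub_entry stub_table[STUB_MAX_ENTRIES];
static int stub_count = 0;

/* Entries are identified by the (type, key) tuple, not the key alone. */
static stub_entry *stub_find(int type, const byte *key, int keylen)
{
    for (int i = 0; i < stub_count; i++) {
        stub_entry *e = &stub_table[i];
        if (e->type == type && e->keylen == keylen &&
            memcmp(e->key, key, (size_t)keylen) == 0)
            return e;
    }
    return NULL;
}

/* Store a copy of buffer under (type, key), replacing any previous
   data. Returns 0 on success, -1 on error. */
int gp_cache_insert(int type, byte *key, int keylen, void *buffer, int buflen)
{
    if (keylen < 0 || buflen < 0)
        return -1;
    stub_entry *e = stub_find(type, key, keylen);
    byte *copy = malloc((size_t)buflen + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, buffer, (size_t)buflen);
    if (e == NULL) {
        if (stub_count == STUB_MAX_ENTRIES) {
            free(copy);
            return -1;
        }
        e = &stub_table[stub_count];
        e->key = malloc((size_t)keylen + 1);
        if (e->key == NULL) {
            free(copy);
            return -1;
        }
        memcpy(e->key, key, (size_t)keylen);
        e->type = type;
        e->keylen = keylen;
        stub_count++;
    } else {
        free(e->data);           /* replacing an existing entry */
    }
    e->data = copy;
    e->datalen = buflen;
    return 0;
}

/* On a hit, set *buffer to a private copy and return its size; the
   caller must hand it back with gp_cache_release(). Returns -1 on a
   miss. */
int gp_cache_query(int type, byte *key, int keylen, void **buffer)
{
    stub_entry *e = stub_find(type, key, keylen);
    if (e == NULL)
        return -1;
    *buffer = malloc((size_t)e->datalen + 1);
    if (*buffer == NULL)
        return -1;
    memcpy(*buffer, e->data, (size_t)e->datalen);
    return e->datalen;
}

/* Give a buffer obtained from gp_cache_query() back to the cache. */
int gp_cache_release(int type, void *buffer)
{
    (void)type;
    free(buffer);
    return 0;
}
```

Note that the same key string under a different type is a separate entry, and a second insert under the same (type, key) simply replaces the data.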
Thus you call 'insert' to store (or replace) a buffer of data under the
given key string. You call 'query' with a key string to retrieve a
buffer that has been cached, or to learn through an error code that no
data is available under that key. If a buffer is returned, the return
value is the size of the buffer.
To simplify memory management, users are encouraged to copy out whatever
data they need and then call 'release' to relinquish control of the
buffer before returning to the interpreter.
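A caller following the copy-out-then-release convention might look like this. The function `load_cached_fontmap`, the key string "native-fontmap", and the one-entry query/release stubs are all hypothetical, included only so the sketch compiles on its own.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned char byte;

#define GP_CACHE_TYPE_FONTMAP 2

/* One-entry stand-ins for the cache calls so the sketch compiles on its
   own; only (GP_CACHE_TYPE_FONTMAP, "native-fontmap") is "cached". */
static const char stub_fontmap[] = "/Times-Roman (Times.dfont) ;";

static int gp_cache_query(int type, byte *key, int keylen, void **buffer)
{
    size_t len = sizeof(stub_fontmap) - 1;
    if (type != GP_CACHE_TYPE_FONTMAP || keylen != 14 ||
        memcmp(key, "native-fontmap", 14) != 0)
        return -1;
    *buffer = malloc(len);
    if (*buffer == NULL)
        return -1;
    memcpy(*buffer, stub_fontmap, len);
    return (int)len;
}

static int gp_cache_release(int type, void *buffer)
{
    (void)type;
    free(buffer);
    return 0;
}

/* Copy the cached fontmap text into caller-owned storage, then release
   the cache's buffer before returning. Returns the number of bytes
   copied (truncated to fit, always NUL-terminated) or a negative value
   on a miss, in which case the caller rebuilds the map the slow way. */
int load_cached_fontmap(char *dst, size_t dstlen)
{
    byte key[] = "native-fontmap";
    void *buf = NULL;
    int len;
    size_t n;

    if (dstlen == 0)
        return -1;
    len = gp_cache_query(GP_CACHE_TYPE_FONTMAP, key,
                         (int)(sizeof(key) - 1), &buf);
    if (len < 0)
        return len;                  /* miss: nothing to release */
    n = (size_t)len < dstlen - 1 ? (size_t)len : dstlen - 1;
    memcpy(dst, buf, n);
    dst[n] = '\0';
    gp_cache_release(GP_CACHE_TYPE_FONTMAP, buf);
    return (int)n;
}
```

Because the cache's buffer never outlives the call, nothing handed back to the interpreter depends on the cache manager's memory.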
The cache keys are really a tuple of the key string with a 'type' code
which indicates the usage category for the data. While namespace
separation could be provided with string prefixes, this allows the
cache manager to make more intelligent decisions about eviction,
storage format, and so on. For any particular use of the cache within
Ghostscript this will be a constant. An enum would be a reasonable
choice instead of the defines.
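Here is an enum version of the type codes, plus one possible way for a disk-backed manager to turn the (type, key) tuple into a file name. The FNV-1a hash, the `gp_cache_filename` helper, and the "TT-HHHHHHHHHHHHHHHH" naming scheme are assumptions for illustration; nothing in the design fixes them. Putting the type in the name lets the manager group files per type, and means a bumped type code never sees files written under the old one.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Enum alternative to the #defines; values match the proposal. */
typedef enum {
    GP_CACHE_TYPE_WTS = 1,
    GP_CACHE_TYPE_FONTMAP = 2
} gp_cache_type;

/* 64-bit FNV-1a hash of the key bytes. */
static uint64_t fnv1a64(const unsigned char *key, size_t keylen)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < keylen; i++) {
        h ^= key[i];
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

/* Build a file name such as "02-cbf29ce484222325" from (type, key).
   A real implementation would also store the full key inside the file
   so that hash collisions can be detected on query. Returns the
   snprintf() result (19 when the name fits). */
int gp_cache_filename(char *out, size_t outlen, gp_cache_type type,
                      const unsigned char *key, size_t keylen)
{
    return snprintf(out, outlen, "%02x-%016llx", (unsigned)type,
                    (unsigned long long)fnv1a64(key, keylen));
}
```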
The cache manager stores the inserted data on disk in some convenient
format and maintains a least-recently-used queue file. Evictions are
based on a per-type weighted combination of allotted usage and activity.
So, for example, excessive generation of halftone cels wouldn't evict
the cached native fontmap, but would borrow space temporarily from
the ICC colormaps.
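One possible reading of "weighted combination of allotted usage and activity", sketched as victim selection: each type's target size blends its configured share of the cache with its share of recent activity, and the type furthest over its target loses its least-recently-used entry. The `type_stats` struct, the 0.5 blend weight, `pick_victim_type`, and the ICC type code 3 are all assumptions.

```c
#include <stddef.h>

/* Hypothetical per-type accounting kept by the cache manager. */
typedef struct {
    int type;
    double usage_mb;     /* space this type currently occupies */
    double allotted_mb;  /* configured share of the cache */
    double activity;     /* e.g. recent hits plus inserts */
} type_stats;

#define ALLOT_WEIGHT 0.5 /* hypothetical blend of allotment vs. activity */

/* Return the index of the type whose usage most exceeds its blended
   target, or -1 if n == 0. The caller then evicts that type's
   least-recently-used entry. */
int pick_victim_type(const type_stats *s, size_t n, double limit_mb)
{
    double allot_sum = 0.0, act_sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        allot_sum += s[i].allotted_mb;
        act_sum += s[i].activity;
    }
    int victim = -1;
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double allot_frac = allot_sum > 0 ? s[i].allotted_mb / allot_sum : 0.0;
        /* with no recorded activity, fall back to allotment alone */
        double act_frac = act_sum > 0 ? s[i].activity / act_sum : allot_frac;
        double target = limit_mb * (ALLOT_WEIGHT * allot_frac +
                                    (1.0 - ALLOT_WEIGHT) * act_frac);
        double excess = s[i].usage_mb - target;
        if (victim < 0 || excess > worst) {
            victim = (int)i;
            worst = excess;
        }
    }
    return victim;
}
```

With a 95 MB limit, a fontmap using 1 of its 5 MB, very active WTS cels using 60 of their 50 MB, and idle ICC colormaps using 30 of their 40 MB, this picks the ICC colormaps; drop the ICC entry from the table and it picks WTS, never the fontmap.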
Since the persistent cache will be moderately large (we envision a
default limit around 100 MB) the cache will be shared between
Ghostscript versions as well as multiple instances. It is thus a design
requirement that the cache manager work safely in parallel with itself.
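For working in parallel with itself, one common POSIX technique (an assumption here, not something we have settled on) is to have each writer create a uniquely named temporary file in the cache directory and then rename() it over the final name, so readers see either the old entry or the complete new one. `cache_write_atomic` is a hypothetical helper; a Windows port would need a different primitive, since the C runtime's rename() there refuses to replace an existing file.

```c
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write data to dir/name atomically with respect to other processes
   sharing the cache. Relies on rename() replacing the target in one
   step when both paths are on the same filesystem. Returns 0 or -1. */
int cache_write_atomic(const char *dir, const char *name,
                       const void *data, size_t len)
{
    char tmp[4096], final_path[4096];
    if (snprintf(tmp, sizeof tmp, "%s/.tmp-XXXXXX", dir) >= (int)sizeof tmp ||
        snprintf(final_path, sizeof final_path, "%s/%s", dir, name)
            >= (int)sizeof final_path)
        return -1;

    int fd = mkstemp(tmp);        /* unique name, so writers never clash */
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += w;
        left -= (size_t)w;
    }
    if (close(fd) != 0 || rename(tmp, final_path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Readers need no lock under this scheme; the eviction bookkeeping in the queue file would still need its own locking.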
The type field can be incremented whenever a cache utilization changes
its buffer format or method of generation to avoid cross-talk between
incompatible versions of Ghostscript sharing the same cache directory.
There will be compiled-in defaults for location and maximum size, which
can be overridden with environment variables (or registry settings). On
unix systems the default will be $HOME/.ghostscript/cache/. When run as
a daemon something like /var/cache/ghostscript/ is recommended. Ray
suggested something like \Temp\gs-cache\ for Windows was best practice.
Presumably the files would have to be world-writable for that to work in
a multi-user context, so perhaps something under "Documents & Settings"
would be better.
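Resolving the location and size limit on unix might look like the following. The environment variable names GS_CACHE_DIR and GS_CACHE_SIZE_MB are hypothetical, as is falling back to /var/cache/ghostscript when HOME is unset; registry lookups on Windows are not shown.

```c
#define _POSIX_C_SOURCE 200809L   /* expose POSIX setenv() to callers */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compiled-in defaults. */
#define GP_CACHE_FALLBACK_DIR "/var/cache/ghostscript"
#define GP_CACHE_DEFAULT_LIMIT_MB 100L

/* Resolve the cache directory into out (without a trailing slash):
   $GS_CACHE_DIR if set, else $HOME/.ghostscript/cache, else the
   compiled-in fallback. Returns 0, or -1 if out is too small. */
int gp_cache_dir(char *out, size_t outlen)
{
    const char *env = getenv("GS_CACHE_DIR");
    const char *home = getenv("HOME");
    int n;

    if (env != NULL && env[0] != '\0')
        n = snprintf(out, outlen, "%s", env);
    else if (home != NULL && home[0] != '\0')
        n = snprintf(out, outlen, "%s/.ghostscript/cache", home);
    else
        n = snprintf(out, outlen, "%s", GP_CACHE_FALLBACK_DIR);
    if (n < 0 || (size_t)n >= outlen)
        return -1;

    /* Accept overrides written either as "/x/cache" or "/x/cache/". */
    size_t len = strlen(out);
    if (len > 1 && out[len - 1] == '/')
        out[len - 1] = '\0';
    return 0;
}

/* Maximum cache size in MB: $GS_CACHE_SIZE_MB if it is a positive
   integer, otherwise the compiled-in default. */
long gp_cache_limit_mb(void)
{
    const char *env = getenv("GS_CACHE_SIZE_MB");
    if (env != NULL) {
        char *end;
        long v = strtol(env, &end, 10);
        if (end != env && *end == '\0' && v > 0)
            return v;
    }
    return GP_CACHE_DEFAULT_LIMIT_MB;
}
```

A daemon would simply set the override variable to /var/cache/ghostscript/ rather than rely on the fallback.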
Situating the api as gs_pcache_*() is also reasonable. We felt a gp_
level interface was more appropriate because the 'public' api is likely
to be smaller than any collection of platform-dependent calls used in
the implementation of the cache manager, and we have precedent for
large amounts of shared code on the gp_ side. It also leaves open the
possibility of entirely different implementations on different
platforms, which may be attractive if the OS or embedded runtime
provides equivalent functionality already.
That's the plan. Comment now, or you don't get to say "I told you so"
later. :-)
Cheers,
-r