[fitz-dev] Ideas for name contexts (atom repositories)
Raph Levien
raph.levien at artifex.com
Thu Feb 20 17:30:08 PST 2003
On Thu, Feb 20, 2003 at 08:59:59AM -0500, Antony Courtney wrote:
> Hi Raph,
>
> Thanks for taking the time to share your design ruminations with the
> rest of us; I hope we can help you think through some of the issues.
>
> In your message you present 3 different alternatives for managing names
> (and some variations for alternatives 1 and 2).
>
> However, unless I am missing something, (1) and (2) look to me like
> mechanisms for global string management in a library, whereas option (3)
> is really a clever implementation trick for callee-managed storage, for
> a particular ADT (dict's in this case).
Basically, yes. The (1) varieties are very much global string management,
but the (2) varieties have more to do with ownership rules for allocated
strings, particularly discussing when they can get freed. In (3), the rule
is that the ADT owns the strings.
> But if we restrict our scope to the problem of callee-managed storage
> for a particular ADT, rather than the (much more difficult) problem of
> global storage management for a library, I'm not so sure that all of the
> reasons for rejecting alternatives (1) and (2) still apply.
>
> What is the expected lifetime of dict's? If it's expected to be
> relatively short or finite, then why not just have a per-dict intern
> table for the keys?
Very good question re the lifetime of dicts. There are three access
patterns I foresee: (1) using them to pass arbitrary parameter bundles
down to "new" methods (lifetime very short; efficiency-minded objs
will unpack the dict rather than storing a local copy), (2) creating
dicts that will live in the Fitz tree (long life, but again it may
make sense to unpack them for tree storage), (3) passing them up to a
PDF processing app from a PDF parser (lifetime depends on the app, but
it's very easy to envision a cache of "resolved obj references").
There are three things we're trying to optimize here (it's important
to be clear on the goals!). First, memory footprint for
longer-lifetime objs. Second, malloc/free bandwidth (on a typical
arch, a malloc/free roundtrip is 300 clock cycles). Third, dict
lookup time. It's reasonable to assume that there maybe many dict
lookups for each dict insert, but if not, the time will be dominated
the latter.
A per-dict intern table doesn't help you at all with the first two of
these goals. What a larger-grain intern table buys you is sharing. In
a given dict, each name is only used once. But in a pool of a lot of
dicts, you expect a great deal of sharing.
> On the other hand, if the lifetime of a dict could be indeterminate,
> then of course you have to worry about leaking memory just as you did
> for option (1a). But then why can't you use removal of a key from the
> dict as a basis for refcounting or gc of a per-dict intern table,
> without leaking the GC complexity across the API?
Well, gc requires that you be able to enumerate the roots, so it's
hard to imagine _any_ way to do that without exporting it across the
API.
> Or (even better) you could just have a single intern table shared by all
> live dict instances, and use explicit allocation or deallocation of
> dict's along with presence/absence of keys in all live dicts as a basis
> for GC or refcounting. Again, this would look just like callee-managed
> storage of the names to the client of the dict API, and hence should
> isolate the client from the GC/ref-counting complexity.
This is, I believe, (1b). The question of refcnt discipline pain is a
good one. One reason for using names rather than strings in the dict
API is to speed lookups. I think I may have missed the possibility of
supplying dict API calls that use strings rather than interned names;
then the refcnt management is hidden under the interface (insert takes
ownership of the ref; lookup has a short-lifetime ref, and in fact you
can avoid refcnt memory traffic on lookup; if the string doesn't exist
in the name table, then the lookup will fail anyway).
> What do you think?
I'm still mulling it over.
Raph
More information about the fitz-dev
mailing list