[jbig2-dev] Comments on the compose code

Raph Levien raph@casper.ghostscript.com
Tue, 2 Jul 2002 18:45:38 -0700


Hi Ralph,

   Cool to see the image compositing code almost working. Some notes:

1. The code assumes big-endian byte order, but x86 chips are
little-endian. When the order of bits within a byte doesn't match the
order of bytes within a word, the word-oriented shift and mask
operation is actually relatively difficult to get right.

2. Given that most glyphs in symbol dictionaries are small, word
alignment of rowstride can increase the memory consumption
considerably.

Given these two facts, I'd be inclined to recommend byte, rather than
word, granularity. It should be relatively easy to adapt the code.
The impact on performance will likely be minimal.

It might be worth considering preserving the word-oriented case
for those big-endian processors, which are well known for being
glacially slow compared with their little-endian cousins.
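
To put a rough number on point 2, here is a minimal sketch comparing
the two rowstride choices for a 1 bit per pixel bitmap. The function
names are hypothetical and the word size is assumed to be 32 bits;
neither is taken from the code under review.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per row when each row is padded to a whole byte. */
static uint32_t stride_bytes(uint32_t width_bits)
{
    assert(width_bits <= UINT32_MAX - 7);
    return (width_bits + 7) >> 3;
}

/* Bytes per row when each row is padded to a whole 32-bit word
 * (assumed word size). */
static uint32_t stride_words(uint32_t width_bits)
{
    assert(width_bits <= UINT32_MAX - 31);
    return ((width_bits + 31) >> 5) << 2;
}
```

For a glyph 10 pixels wide, the byte stride is 2 and the word stride
is 4, so word alignment doubles the storage for every such symbol in
the dictionary.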

3. Your clip logic for x<0 or y<0 isn't quite right. Taking the
simpler y case first, in general you need to add -y * src->stride
to the source pointer. The x case is more complex because it means
that the source is not always aligned on a word (or byte) boundary.
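
A minimal sketch of the clipping arithmetic, assuming byte-granular
rows. The bitmap and clip_rect types and both function names are made
up for illustration; only the stride field mirrors the src->stride
mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bitmap: 1 bit per pixel, rows padded to whole bytes. */
typedef struct {
    int width, height;   /* in pixels */
    int stride;          /* bytes per row */
    uint8_t *data;
} bitmap;

/* Overlap of a source placed at (x, y) with the destination. */
typedef struct {
    int sx, sy;   /* first source pixel that lands inside dst */
    int dx, dy;   /* where that pixel lands in dst */
    int w, h;     /* overlap size; <= 0 means nothing to draw */
} clip_rect;

static clip_rect clip_compose(const bitmap *dst, const bitmap *src,
                              int x, int y)
{
    clip_rect c = { 0, 0, x, y, src->width, src->height };

    if (x < 0) { c.sx = -x; c.w += x; c.dx = 0; }
    if (y < 0) { c.sy = -y; c.h += y; c.dy = 0; }
    if (c.dx + c.w > dst->width)  c.w = dst->width - c.dx;
    if (c.dy + c.h > dst->height) c.h = dst->height - c.dy;
    return c;
}

/* First source byte touched: skip sy whole rows, then sx / 8 whole
 * bytes. The remaining sx % 8 bits are the sub-byte offset that makes
 * the x case harder. Only valid when c->w > 0 and c->h > 0. */
static const uint8_t *clip_src_start(const bitmap *src,
                                     const clip_rect *c)
{
    assert(c->w > 0 && c->h > 0);
    return src->data + c->sy * src->stride + (c->sx >> 3);
}
```

The y < 0 case only moves the source pointer by whole rows. The x < 0
case also leaves a bit offset of sx % 8, which the shift logic has to
absorb.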

4. I'd be very tempted to strength-reduce the multiplies out of
the body of the row loop. Most optimizing compilers will get this,
but if not, it's a fairly substantial lose.
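
As an illustration of the strength reduction, here is a sketch of a
row loop that advances both pointers by their stride instead of
computing base + j * stride on every row. It assumes the byte-aligned
case and uses OR as a stand-in for the operator; or_rows and its
parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* OR h rows of nbytes bytes each from src into dst.
 * Byte-aligned case only (no bit shift). */
static void or_rows(uint8_t *dst, size_t dst_stride,
                    const uint8_t *src, size_t src_stride,
                    size_t nbytes, size_t h)
{
    assert(nbytes <= dst_stride && nbytes <= src_stride);

    for (size_t j = 0; j < h; j++) {
        for (size_t i = 0; i < nbytes; i++)
            dst[i] |= src[i];
        /* One add per row replaces the j * stride multiplies. */
        dst += dst_stride;
        src += src_stride;
    }
}
```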

5. The special case implements "replace" rather than the op. Thus, it
should probably be removed. It sure is a tempting optimization, though
:).

6. An optimization which probably doesn't hurt code clarity: the loops
to compute the mask are unnecessary. mask = (1 << leftbits) - 1; will
suffice.
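
A small sketch of that equivalence for the byte case. The loop version
is only a guess at what a loop-built mask might look like, not a copy
of the existing code, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Closed form: the low leftbits bits set, for leftbits in 0..8. */
static uint8_t low_mask(unsigned leftbits)
{
    assert(leftbits <= 8);
    return (uint8_t)((1u << leftbits) - 1u);
}

/* Loop form, for comparison only. */
static uint8_t low_mask_loop(unsigned leftbits)
{
    uint8_t mask = 0;

    assert(leftbits <= 8);
    for (unsigned i = 0; i < leftbits; i++)
        mask = (uint8_t)((mask << 1) | 1u);
    return mask;
}
```

With byte-sized masks the shift count never exceeds 8, well inside the
range of an unsigned int. With 32-bit words, leftbits == 32 would need
a separate branch, since 1 << 32 is not defined there.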

7. Another optimization (which intrudes on Knuth's law):
special-casing the shift=0 case. With bytes, this will happen roughly
1/8 of the time, except for the complete strip case. One reason to
prefer it might be that you avoid shifting by the word size, which
is dubious on some chips, and is in fact undefined in the C99 spec.
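
A sketch of the shift=0 special case at byte granularity, assuming
MSB-first bit order and OR as the operator. or_row_shifted is a
hypothetical name, not a function from the code under review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* OR one source row into dst, displaced right by shift bits (0..7).
 * When shift != 0, dst must have room for nbytes + 1 bytes. */
static void or_row_shifted(uint8_t *dst, const uint8_t *src,
                           size_t nbytes, unsigned shift)
{
    assert(shift < 8);

    if (shift == 0) {
        /* Aligned: plain byte-wise OR, no spill into dst[nbytes]. */
        for (size_t i = 0; i < nbytes; i++)
            dst[i] |= src[i];
        return;
    }

    for (size_t i = 0; i < nbytes; i++) {
        dst[i]     |= (uint8_t)(src[i] >> shift);
        dst[i + 1] |= (uint8_t)(src[i] << (8 - shift));
    }
}
```

In a word-oriented version of the general loop, shift == 0 would make
the complementary shift equal to the word size, which is the undefined
case. Here the early return also avoids touching the extra byte past
the end of the row.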

Take care,

Raph