[jbig2-dev] Greetings from the ex-editor of JBIG2
Raph Levien
raph@casper.ghostscript.com
Tue, 4 Jun 2002 13:50:39 -0700
On Tue, Jun 04, 2002 at 10:12:39AM -0700, William Rucklidge wrote:
> I presume you've gone through the test sequences in Appendix H? They're
> quite short (3 symbols in the dictionary) so that's not too helpful if it's
> a corner case you're missing.
Yes, that seems to decode fine.
> Unfortunately, I don't have access to my previous implementation, so trace
> data is beyond my abilities. The UBC team used to have their source code
> freely available, but it was pulled a while ago because of a dispute
> with the company that had funded it.
>
> I think I may have found the error in your implementation of the arithmetic
> decoder. This is just from looking at the code, so I could be incorrect.
> I was concerned when the comment said that the spec was wrong in the
> software conventions decoder, because I'd implemented from that and hadn't
> run into any problems. However, there is a subtlety in the BYTEIN process
> which it appears you may have missed: B is the byte pointed at by BP. If
> BP changes, B *immediately* changes. It appears that in the leftmost
> branch of Figures E.19 and G.3, you didn't do that - you're using the value
> (0xFF) that B had *before* BP was incremented.
>
> Changing "as->C += 0xFF00" to "as->C += B1 << 9" might do the trick, and
> resolve the issue you had with the software conventions decoder.
>
> This would explain the fact that you get through a few hundred symbols then
> lose sync - this code path is fairly rare (about one byte out of 512, I
> think).
>
> Please let me know if this helps.
It does help, but it doesn't fix the problem with the symbol dicts in
the UBC test streams.
Looking again at the code and the spec, I see that BP was off-by-one.
In the spec, it points to the last byte read. In the code, it pointed
to the next byte to read. I believe that, in the non-software
conventions case, the logic actually turned out to be identical.
I've attached a patch that, I believe, brings the code in line with
the spec. Encouragingly, turning on SOFTWARE_CONVENTION now seems to
work the same as when it's off. This was verified on both the generic
UBC bytestreams and Annex H (test_arith.c).
Obviously, I had suspected the bit stuffing. However, this patch
doesn't affect the results at all. Any change to the bit stuffing causes
the generic streams to go wrong. There's over 100K in those files, so
probably enough to exercise the corner cases. Also, looking at the
trace, the divergence in the symbol dict streams doesn't seem to be
triggered by an 0xFF byte in the input.
Thanks for your feedback. That helped resolve my confusion over the
software conventions. We're still stuck, though.
Raph