[jbig2-dev] Re: UCS-2 interpretation

William Rucklidge wjr@imarkets.com
Mon, 22 Jul 2002 16:21:35 -0700


> I have a simpler spec question for you. :)

Happy to answer.  Sorry it's taken me a while to reply - I was on vacation,
but am back now.

> In section 7.4.15.2 'Multi-byte coded comment' the character encoding is 
> given as UCS-2. I understand UCS-2 doesn't actually specify the byte 
> order, though I haven't verified this in the referenced ISO document. I 
> would expect it to be big-endian, as with the rest of the spec, but 
> wanted to confirm that.

If the byte order is unspecified by the UCS-2 spec, then per the general
rule for encoding of multi-byte quantities, it should be big-endian.
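
To illustrate, here is a minimal sketch of reading such text as big-endian
two-byte code units.  It assumes the comment text has already been extracted
from the segment as a byte string; the function name decode_ucs2_be is
hypothetical and is not taken from the spec or from any decoder.

```python
import struct


def decode_ucs2_be(data: bytes) -> str:
    """Decode big-endian 16-bit code units into a string (hypothetical helper).

    Assumption: each character is exactly one two-byte unit, most
    significant byte first, matching the general big-endian rule.
    """
    if len(data) % 2 != 0:
        raise ValueError("UCS-2 data must contain an even number of bytes")
    count = len(data) // 2
    # ">" selects big-endian, "H" is an unsigned 16-bit integer.
    units = struct.unpack(">%dH" % count, data)
    return "".join(chr(unit) for unit in units)
```

For example, the bytes 00 41 00 E9 would be read as U+0041 followed by
U+00E9.  A little-endian reader would see U+4100 and U+E900 instead, which
is why the byte order matters.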

> I'm also puzzled that UCS-2 was specified instead of UTF-16. Do you have 
> any insight into current and expected practice there?

Basically, chalk this up to ignorance - my ignorance in particular.  I
wanted to make sure that there was some way of putting non-ASCII text into
comment segments, but I hadn't had any experience in actual i18n encodings.
I'd heard that ISO 10646 defined a two-byte encoding called UCS-2, so I
specified the use of that - and that exhausted my knowledge (I have no idea
what UTF-16 is or how it might differ from UCS-2).  If UCS-2 was a poor
choice, I apologise.  If I'd known that the byte order wasn't specified,
I'd at least have added a note clarifying it.

-wjr