[jbig2-dev] Re: UCS-2 interpretation

Ralph Giles giles@casper.ghostscript.com
Thu, 1 Aug 2002 08:19:12 -0700


On Mon, Jul 22, 2002 at 04:21:35PM -0700, William Rucklidge wrote:

> Happy to answer.  Sorry it's taken me a while to reply - I was on vacation,
> but am back now.

Same here. :)

> If the byte order is unspecified by the UCS-2 spec, then per the general
> rule for encoding of multi-byte quantities, it should be big-endian.

Ok, thanks for the confirmation.

> > I'm also puzzled that UCS-2 was specified instead of UTF-16. Do you have 
> > any insight into current and expected practice there?
> 
> Basically, mark this up to ignorance - my ignorance in particular.  I
> wanted to make sure that there was some way of putting non-ASCII text into
> comment segments, but I hadn't had any experience in actual i18n encodings.
> I'd heard that ISO 10646 defined a two-byte encoding called UCS-2, so I
> specified the use of that - and that exhausted my knowledge (I have no idea
> what UTF-16 is or how it might differ from UCS-2).  If UCS-2 was a poor
> choice, I apologise.  If I'd known that the byte order wasn't specified,
> I'd at least have added a note clarifying it.

And to be fair, UTF-16 is part of Unicode 3.0 (or at least was moot before that), and so may
post-date your original spec. Basically, Unicode now specifies more than 64k characters; the
UTF-16 encoding reserves a range of 16-bit values (surrogates) as escape sequences, so the
less-common characters are encoded as two-word sequences. So while the encodings aren't the same,
it's often possible to guess which is in use.
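
For what it's worth, here is a rough sketch of the kind of guessing I mean, in Python
for brevity. It is not code from any decoder; the function names are made up for
illustration. It assumes the comment text arrives as raw bytes, reads them as big-endian
16-bit words per the rule above, and treats any word in the surrogate range
0xD800-0xDFFF as a sign that the writer used UTF-16, since those values are not
assigned to characters in UCS-2.

```python
import struct


def decode_be16(data):
    """Split raw bytes into big-endian 16-bit words (hypothetical helper)."""
    if len(data) % 2:
        raise ValueError("comment text has an odd number of bytes")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def has_surrogates(units):
    """True if any 16-bit word falls in the surrogate range 0xD800-0xDFFF."""
    return any(0xD800 <= u <= 0xDFFF for u in units)


def decode_comment(data):
    """Guess between UCS-2 and UTF-16 (both big-endian) and decode to a str."""
    units = decode_be16(data)
    if has_surrogates(units):
        # Surrogates only make sense as UTF-16 pairs; this raises
        # UnicodeDecodeError if a surrogate is unpaired.
        return data.decode("utf-16-be")
    # No surrogates: each word is one character, and the two encodings agree.
    return "".join(chr(u) for u in units)
```

Text with no surrogates decodes identically either way, so the guess only matters
for data that actually uses characters beyond the first 64k.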

I suspect we'll see some UTF-16 in the wild, but who knows.

Cheers,
 -r