[jbig2-dev] Re: UCS-2 interpretation
Ralph Giles
giles@casper.ghostscript.com
Thu, 1 Aug 2002 08:19:12 -0700
On Mon, Jul 22, 2002 at 04:21:35PM -0700, William Rucklidge wrote:
> Happy to answer. Sorry it's taken me a while to reply - I was on vacation,
> but am back now.
Same here. :)
> If the byte order is unspecified by the UCS-2 spec, then per the general
> rule for encoding of multi-byte quantities, it should be big-endian.
Ok, thanks for the confirmation.
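
For illustration, here is a minimal Python sketch of reading comment text under that
rule. The function name decode_ucs2_be is hypothetical and not part of any jbig2
code; it assumes the data is plain UCS-2 with no byte-order mark.

```python
import struct


def decode_ucs2_be(data: bytes) -> str:
    """Decode bytes as big-endian UCS-2: one 16-bit unit per character."""
    if len(data) % 2:
        raise ValueError("UCS-2 data must have an even number of bytes")
    # ">" selects big-endian; "H" is an unsigned 16-bit integer.
    units = struct.unpack(f">{len(data) // 2}H", data)
    return "".join(chr(u) for u in units)
```

The high-order byte comes first, so the bytes 00 48 00 69 decode as "Hi".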
> > I'm also puzzled that UCS-2 was specified instead of UTF-16. Do you have
> > any insight into current and expected practice there?
>
> Basically, mark this up to ignorance - my ignorance in particular. I
> wanted to make sure that there was some way of putting non-ASCII text into
> comment segments, but I hadn't had any experience in actual i18n encodings.
> I'd heard that ISO 10646 defined a two-byte encoding called UCS-2, so I
> specified the use of that - and that exhausted my knowledge (I have no idea
> what UTF-16 is or how it might differ from UCS-2). If UCS-2 was a poor
> choice, I apologise. If I'd known that the byte order wasn't specified,
> I'd at least have added a note clarifying it.
And to be fair, UTF-16 is part of Unicode 3.0 (or at least was moot before then) and so may
post-date your original spec. Basically, Unicode now specifies more than 64k characters; the
UTF-16 encoding adds escape sequences (surrogate pairs) that encode the less-common characters
as multi-word sequences. So while the encodings aren't the same, it's often possible to guess
which is in use.
I suspect we'll see some UTF-16 in the wild, but who knows.
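
One way to make that guess, sketched in Python under the assumption that the data is
big-endian: look for a high surrogate (0xD800-0xDBFF) immediately followed by a low
surrogate (0xDC00-0xDFFF). The function name has_surrogate_pairs is hypothetical, and
this is only a heuristic, not anything the spec defines.

```python
import struct


def has_surrogate_pairs(data: bytes) -> bool:
    """Return True if big-endian 16-bit data contains a UTF-16 surrogate pair."""
    count = len(data) // 2
    # Ignore a trailing odd byte, if any, for the purpose of this check.
    units = struct.unpack(f">{count}H", data[: count * 2])
    for first, second in zip(units, units[1:]):
        if 0xD800 <= first <= 0xDBFF and 0xDC00 <= second <= 0xDFFF:
            return True
    return False
```

Text with no such pair reads the same under either encoding, so a decoder that accepts
UTF-16 (Python's "utf-16-be" codec, for example) also handles plain UCS-2 input.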
Cheers,
-r