25.5 Language

Some formats include a declaration of which language is being used for a given piece of text. This can be used to influence aspects of the text layout, including the exact choice of glyphs used in a given font. While we make relatively little use of this at present, we try to preserve the information as part of our philosophy of not losing any information unnecessarily.

Accordingly, we identify languages by their ISO 639 codes, held as values of the fz_text_language enumeration. For example:

typedef enum fz_text_language_e
{
   FZ_LANG_UNSET = 0,
   FZ_LANG_ur = FZ_LANG_TAG2('u','r'),
   FZ_LANG_urd = FZ_LANG_TAG3('u','r','d'),
   FZ_LANG_ko = FZ_LANG_TAG2('k','o'),
   FZ_LANG_ja = FZ_LANG_TAG2('j','a'),
   FZ_LANG_zh = FZ_LANG_TAG2('z','h'),
   FZ_LANG_zh_Hans = FZ_LANG_TAG3('z','h','s'),
   FZ_LANG_zh_Hant = FZ_LANG_TAG3('z','h','t'),
} fz_text_language;

To save space, each code is packed into 15 bits. We therefore provide functions to convert between this packed form and the more usual string representation:

/* 
   Convert ISO 639 (639-{1,2,3,5}) language specification 
   strings losslessly to a 15 bit fz_text_language code. 
 
   No validation is carried out. Obviously invalid (out 
   of spec) codes will be mapped to FZ_LANG_UNSET, but 
   well-formed (but undefined) codes will be blithely 
   accepted. 
*/ 
fz_text_language fz_text_language_from_string(const char *str);
/* 
   Recover ISO 639 (639-{1,2,3,5}) language specification 
   strings losslessly from a 15 bit fz_text_language code. 
 
   No validation is carried out. See note above. 
*/ 
char *fz_string_from_text_language(char str[8], fz_text_language lang);
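
To illustrate how a two or three letter code can fit losslessly into 15 bits, here is a minimal sketch. It assumes a base-27 scheme in which each letter becomes a digit from 1 to 26 and 0 marks an absent letter, so the largest value is below 27^3 = 19683, which is less than 2^15 = 32768. The names lang_pack and lang_unpack are hypothetical and are not part of the library; the sketch handles only plain letter codes, so script-qualified entries such as FZ_LANG_zh_Hans would need an extra mapping that is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers for illustration only; not the library's code. */

/* Map 'a'..'z' (either case) to 1..26; anything else maps to 0. */
static int letter_to_digit(char c)
{
   if (c >= 'a' && c <= 'z') return c - 'a' + 1;
   if (c >= 'A' && c <= 'Z') return c - 'A' + 1;
   return 0;
}

/* Pack a 2 or 3 letter code as a base-27 number; 0 means "unset". */
static unsigned int lang_pack(const char *str)
{
   int d[3] = { 0, 0, 0 };
   size_t i, n;

   if (str == NULL) return 0;
   n = strlen(str);
   if (n < 2 || n > 3) return 0;
   for (i = 0; i < n; i++)
   {
      d[i] = letter_to_digit(str[i]);
      if (d[i] == 0) return 0; /* not a letter: treat as unset */
   }
   return (unsigned int)(d[0] + 27 * d[1] + 27 * 27 * d[2]);
}

/* Unpack into a caller-supplied buffer and return it. */
static char *lang_unpack(char buf[8], unsigned int code)
{
   int i;

   assert(buf != NULL);
   for (i = 0; i < 3; i++)
   {
      unsigned int d = code % 27;
      if (d == 0) break; /* no more letters */
      buf[i] = (char)('a' + d - 1);
      code /= 27;
   }
   buf[i] = '\0';
   return buf;
}
```

Under these assumptions, lang_pack("ur") gives 21 + 27*18 = 507, and passing that value to lang_unpack recovers "ur". An unset value of 0 unpacks to the empty string.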