[Gs-devel] refined MDRC patch (still including "relpos")

Igor V. Melichev igor at artifex.com
Mon Jun 4 12:21:59 PDT 2001


Dear Mr. Toshiya Suzuki,

> From: mpsuzuki at hiroshima-u.ac.jp
> Sent: 28 May 2001 14:35
> To: igor at artifex.com
> Cc: ray at artifex.com; gs-devel at ghostscript.com;
> mpsuzuki at hiroshima-u.ac.jp
> Subject: Re: RE: RE: [Gs-devel] refined MDRC patch (still including
> "relpos")

> Following is the revised MDRC patch. In gs_cmap.ps, there's a
> comment on the "relpos" feature, and a sample "relpos"
> disabler is included.

I have reviewed your patch. Unfortunately, I still have a lot of remarks on
it, and I found at least 2 bugs in it.
So it requires more effort. See my comments below.

Besides, we need a log message for the patch to put on SourceForge.
Here is my draft log message:

Implement multi-dimensional CID ranges, and fix the following bugs:

(1) When GS consumes an undefined byte sequence (one that cannot be mapped
    to any glyph), GS aborts.

(2) When the selected CID-keyed font lacks the glyph for the requested CID,
    GS aborts.

(3) "notdefrange" defines a SINGLE CID for all undefined byte sequences
    in the specified range, but the current GS implementation treats it as
    an offset for calculating MULTIPLE CIDs.

(4) The "cidrange" operators accept only
       a full range specification (<0000> to <FFFF>)
    or an 8-bit-wide range       (<xxyy> to <xxzz>).

SourceForge bug #415163.

[end of log message].

Please correct this draft message as you see fit.

> # I send a tarball of modified files to you in another mail.
> # For gs-devel mailing list, a patch would be better.

Received, thank you. But you did not send lib/gs_cmap.ps,
so it is hard for me to review the patch completely. Please send it,
but first check my remarks below.



As usual, I have inserted my remarks into the original text of your patch:


diff -Nur gs-CVS/src.orig/gdebug.h gs-CVS/src/gdebug.h
--- gs-CVS/src.orig/gdebug.h    Wed Sep 20 04:00:11 2000
+++ gs-CVS/src/gdebug.h Mon May 28 18:53:24 2001
@@ -122,5 +122,6 @@
 void debug_dump_bitmap(P4(const byte * from, uint raster, uint height,
                          const char *msg));
 void debug_print_string(P2(const byte * str, uint len));
+void debug_print_string_hex(P2(const byte * str, uint len));

 #endif /* gdebug_INCLUDED */
diff -Nur gs-CVS/src.orig/gsmisc.c gs-CVS/src/gsmisc.c
--- gs-CVS/src.orig/gsmisc.c    Mon Apr  9 17:35:51 2001
+++ gs-CVS/src/gsmisc.c Mon May 28 18:53:24 2001
@@ -440,6 +440,17 @@
     dflush();
 }

+/* Print a string in hexdump format. */
+void
+debug_print_string_hex(const byte * chrs, uint len)
+{
+    uint i;
+
+    for (i = 0; i < len; i++)
+        dprintf1("%02x", chrs[i]);
+    dflush();
+}
+
 /*
  * The following code prints a hex stack backtrace on Linux/Intel systems.
  * It is here to be patched into places where we need to print such a trace
diff -Nur gs-CVS/src.orig/gxfcmap.h gs-CVS/src/gxfcmap.h
--- gs-CVS/src.orig/gxfcmap.h   Wed Nov 29 14:50:03 2000
+++ gs-CVS/src/gxfcmap.h        Mon May 28 18:53:24 2001
@@ -69,8 +69,9 @@
 typedef enum {
     CODE_VALUE_CID,            /* CIDs */
     CODE_VALUE_GLYPH,          /* glyphs */
-    CODE_VALUE_CHARS           /* character(s) */
-#define CODE_VALUE_MAX CODE_VALUE_CHARS
+    CODE_VALUE_CHARS,          /* character(s) */
+    CODE_VALUE_UNDEF,          /* CID - for notdef(char|range) dst */


   In my opinion, "UNDEF" is a misleading identifier here.
   Please replace it with "NOTDEF", following Adobe's notation.
   I perceive "undef" as "a stub for an undefined value".


+#define CODE_VALUE_MAX CODE_VALUE_UNDEF
 } gx_code_value_type_t;
 /* The strings in this structure are all const after initialization. */
 typedef struct gx_code_lookup_range_s {
diff -Nur gs-CVS/src.orig/gsfcmap.c gs-CVS/src/gsfcmap.c
--- gs-CVS/src.orig/gsfcmap.c   Tue Dec 19 06:58:03 2000
+++ gs-CVS/src/gsfcmap.c        Mon May 28 18:53:24 2001
@@ -147,6 +147,142 @@
     return 0;
 }

+/*
+ * multi-dimentional range comparator
+ */
+#define if_debug_print_string_hex(c, str, len)\
+       if (gs_debug_c(c)) debug_print_string_hex((str), (len))
+
+#define if_debug_print_msg_str_in_range(c, str)\
+BEGIN\
+       if_debug_print_string_hex(c, str, prefix_size + key_size);\
+       if_debug0(c, " in ");\
+       if_debug_print_string_hex(c, prefix, prefix_size);\
+       if_debug_print_string_hex(c, key_lo, key_size);\
+       if_debug0(c, " - ");\
+       if_debug_print_string_hex(c, prefix, prefix_size);\
+       if_debug_print_string_hex(c, key_hi, key_size);\
+END

    I would much appreciate it if you replaced this macro with a C function
    and called it like this:

    if( gs_debug_c(c) )
        print_msg_str_in_range(str, prefix, prefix_size, key_lo, key_hi, key_size);



+
+private int
+gs_cmap_get_shortest_chr(const gx_code_map_t * pcmap, uint *pfidx)
+{
+    int i;
+    int len_shortest = MAX_CMAP_CODE_SIZE;
+    uint fidx_shortest = 0; /* font index for this fallback */
+
+    for (i = pcmap->num_lookup - 1; i >= 0; --i) {
+        const gx_code_lookup_range_t *pclr = &pcmap->lookup[i];
+        if ((pclr->key_prefix_size + pclr->key_size) <= len_shortest) {
+           len_shortest = (pclr->key_prefix_size + pclr->key_size);
+           fidx_shortest = pclr->font_index;
+        }
+    }
+
+    if (NULL != pfidx)
+        *pfidx = fidx_shortest;


   pfidx appears never to be NULL; please drop the 'if'.



+    return len_shortest;
+}
+
+private int
+gs_multidim_cmp_range(const byte *str,
+                       const byte *prefix,
+                        const byte *key_lo, const byte *key_hi,
+                       int prefix_size, int key_size)


   Actually this function returns the number of vector components
   which lie within the projection of the specified range.

   Your identifier for this function may be interpreted as
   a comparison of 2 ranges. Really, its arguments are
   a range and a vector, not 2 ranges.

   I would like to rename this function:

   private int gs_multidim_range_projection( ...

   You may also find a better name for this function.



+{
+    int        i;
+    if_debug0('J', "\n[J]gmcr() checks");
+    if_debug_print_msg_str_in_range('J', str);
+    if_debug0('J', "\n");
+
+    if (0 < prefix_size) {
+       for (i = 0; i < prefix_size; i++)
+          if (memcmp(prefix, str, i))
+             break;
+
+       if (0 == i)                     /* prefix No match */
+          return 0;
+       else if (i < prefix_size)       /* prefix partial match */
+          return i;
+       else if (0 == key_size)                 /* prefix full match, and no key */
+          return prefix_size;
+       /* completely matched, and key (with finite length) */
+       str = str + prefix_size;
+    }
+
+    for (i = 0; i < key_size; i++)
+       if (str[i] < key_lo[i] || key_hi[i] < str[i])
+           break;
+
+    if (0 == prefix_size && 0 == i)    /* key no match */
+        return 0;

    Also please improve the grammar in the comment:
    "no key match".


+    else if (i < key_size)             /* key partial match */
+        return (prefix_size + i);
+
+    /* (prefix full patched, and) key full matched */


    Probably you mean "matched" instead of "patched".


+    return (prefix_size + key_size);
+}
+
+
+/*
+ * multi-dimentional relative position calculator
+ */
+private int
+gs_multidim_calc_relpos(const byte *str,
+                       const byte *prefix,
+                        const byte *key_lo, const byte *key_hi,
+                       int prefix_size, int key_size)


   The argument "prefix" is not used in the calculation. Please drop it.

   Also I'd like to remove prefix_size from here,
   so that the function becomes a general-purpose calculator of "relpos"
   for a vector in a range, both treated as abstract
   algebraic objects which don't depend on the specific data
   structures related to CID ranges. To do this,
   I would replace calls to this function with ones like this:

   gs_multidim_calc_relpos(str + pre_size, key, key + key_size, key_size);

   This is much easier to understand, right?

   Then I'd rename it to "gs_multidim_CID_offset",
   following the notation used by Adobe in CSL.



+{
+    /*
+     *       +---------------+ (L * M * N)
+     *      /|              /|
+     *     / |             / |
+     *    /  |            /  |
+     *   +---------------+   |
+     *   |   +---------- |---+ (L * M)
+     * N |  /            |  /
+     *   | /             | /M
+     *   |/1 2 3 . . . . |/
+     *   +---------------+
+     *  0                L
+     *
+     *         L: 1st dimension
+     *         M: 2nd dimension
+     *         N: 3rd dimension
+     *
+     * "relpos" calculates the number how many blocks are needed to fill
+     * the region from [0,0,0] to [l,m,n].
+     *
+     * To increment for the 1st dimension ([l-1, m, n] to [l, m, n]),
+     * 1 block is consumed.
+     *
+     * To increment for the 2nd dimension ([l, m-1, n] to [l, m, n]),
+     * L blocks are consumed.
+     *
+     * To increment for the 3rd dimension ([l, m, n-1] to [l, m, n]),
+     * (L*M) blocks are consumed.
+     *
+     * The calculation starts low dimension (l, then m, then n), and
+     * the number of blocks to increment current dimension is stored in
+     * variable "j".
+     */

   Here you introduce a new style of comments to Ghostscript:
   your extension of the old style is pictures.
   I'm not sure that this is desirable here.
   I'd like to replace this big comment and the picture with something
   shorter that any educated programmer can understand:

   /* Returns the offset of the given CID, treating the CID range
      as an array of CIDs (the last index changes fastest). */
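   For reference, here is the same offset written as a formula. Here $s_i$ is the $i$-th key byte of the input code, $lo_i$ and $hi_i$ are the corresponding bytes of key_lo and key_hi, $n$ is key_size, and an empty product equals 1:

$$
\mathrm{offset}(s) = \sum_{i=0}^{n-1} \left(s_i - lo_i\right) \prod_{k=i+1}^{n-1} \left(hi_k - lo_k + 1\right)
$$

   For example, in the range <8140> <9FFC> the code <8240> has offset (0x82 - 0x81) * 189 + (0x40 - 0x40) * 1 = 189.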

+
+    int i;     /* index for current dimension */
+    int j = 1; /* how many # is required to increment current "1" */
+    int rel_pos = 0;
+
+    if_debug0('J', "[J]gmcrp() calc rel_pos for 0x");
+    if_debug_print_msg_str_in_range('J', str);
+    if_debug0('J', "\n");
+
+    for (i = key_size - 1; 0 <= i; i--) {
+        rel_pos = rel_pos
+                + j * (str[prefix_size + i] - key_lo[i]);
+        j = j * (key_hi[i] - key_lo[i] + 1);

   Take advantage of the C language:
         rel_pos += j * (str[prefix_size + i] - key_lo[i]);
         j *= (key_hi[i] - key_lo[i] + 1);

+    }
+    return rel_pos;
+}
+
 /* Get a big-endian integer. */
 private uint
 bytes2int(const byte *p, int n)
@@ -180,7 +316,11 @@
      */
     int i;

-    for (i = pcmap->num_lookup - 1; i >= 0; --i) { /* reverse scan order due to 'usecmap' */
+    /* old implementation initialized *pchr out of CMDN() */

   We don't need to comment changes relative to the old revision.
   CVS does this automatically. Please remove the comment above.



+    *pchr = 0;
+
+    /* reverse scan order due to 'usecmap' */
+    for (i = pcmap->num_lookup - 1; i >= 0; --i) {
         const gx_code_lookup_range_t *pclr = &pcmap->lookup[i];
         int pre_size = pclr->key_prefix_size, key_size = pclr->key_size,
             chr_size = pre_size + key_size;
@@ -221,6 +361,10 @@
                     bytes2int(str + pre_size, key_size) -
                     bytes2int(key, key_size);
                 return 0;
+            case CODE_VALUE_UNDEF:
+                *pglyph = gs_min_cid_glyph +
+                    bytes2int(pvalue, pclr->value_size);
+                return 0;
             case CODE_VALUE_GLYPH:
                 *pglyph = bytes2int(pvalue, pclr->value_size);
                 return 0;
@@ -240,9 +384,187 @@
     return 0;
 }

+private int
+code_map_decode_next_mdrc(const gx_code_map_t * pcmap, const gs_const_string * pstr,
+                     uint * pindex, uint * pfidx,
+                     gs_char * pchr, gs_glyph * pglyph)

  People can't guess what "mdrc" means.
  Please provide a better name.


+{
+    /*
+     * one "range" is specified by
+     *    (prefix + key(lo-end)) and (prefix + key(hi-end)).
+     *
+     * pclr is a collection of range"s"
+     * which have same prefix, and keys with same length.
+     *
+     * number of ranges = pclr->num_keys
+     *           prefix = pclr->key_prefix
+     *    prefix length = pclr->key_prefix_size
+     *              key = pclr->key.data
+     *       key length = pclr->key_size


   This looks like a note to yourself.
   I believe that gxfcmap.h explains all this clearly enough.



+     *   +-----------+-----------------------+
+     *   | *(prefix) | *(key)                |
+     *   |...MATCHED...
+     *   | *(prefix) | *(key + key_size)     |
+     *   +-----------+-----------------------+
+     *     / / / / / / / / / / / / / / / / /
+     *   |...NOT MATCHED                     |
+     *     / / / / / / / / / / / / / / / / /
+     *   +-----------+-----------------------+
+     *   | *(prefix) | *(key + 2 * key_size) |
+     *   |...MATCHED
+     *   | *(prefix) | *(key + 3 * key_size) |
+     *   +-----------+-----------------------+
+     *
+     * ...
+     *

   Sorry, I'm unable to understand this picture.
   Why are 2 things "MATCHED"?
   Why are "2" and "3" specifically used here?

   I believe it's better to drop such pictures than
   to spend time making them clear.


+     * pcmap is a collection of pclr.
+     *   number of pclr = pcmap->num_lookup
+     *             pclr = pcmap->lookup[i]
+     *
+     */
+
+    const byte *str = pstr->data + *pindex;
+    uint ssize = pstr->size - *pindex;
+    /*
+     * The keys are not sorted due to 'usecmap'.  Possible optimization :
+     * merge and sort keys in 'zbuildcmap', then use binary search here.
+     * This would be valuable for UniJIS-UTF8-H, which contains about 7000
+     * keys.
+     */
+    int i;
+
+    /*
+     * In the fallback of CMap decoding procedure, there is "partial matching".
+     * For detail, refer PostScript Ref. Manual v3 at the end of Fonts chapter.
+     */
+
+    /* partial match parameters, temporal use, NOT pointer !!! */
+    int pm_maxlen = 0;         /* partial match: max length */
+    int pm_index = *pindex;    /* partial match: ptr index (in str) */
+    uint pm_fidx = *pfidx;     /* partial match: ptr font index */
+    gs_char pm_chr = *pchr;    /* partial match: ptr character */
+                               /* pm_pvalue is not needed, because
+                                  partial match is used for notdef */

   What is pm_pvalue? If it is not defined and not needed,
   why mention it at all?



+
+    *pchr = 0; /* originally, pchr is initialized out of CMDN() */
+
+    if_debug1('J', "[J]CMDN() is called: str=0x%lx (", str);
+    if_debug0('J', " (");
+    if_debug_print_string_hex('J', str, ssize);
+    if_debug1('J', ") ssize=%d\n", ssize);
+    if_debug1('J', "[J]CMDN() checks %d ranges\n", pcmap->num_lookup);
+
+    for (i = pcmap->num_lookup - 1; i >= 0; --i) {
+       /* main loop - scan the map passed via pcmap */
+       /* reverse scan order due to 'usecmap' */
+
+        const gx_code_lookup_range_t *pclr = &pcmap->lookup[i];
+        int pre_size = pclr->key_prefix_size, key_size = pclr->key_size,
+            chr_size = pre_size + key_size;
+
+       /* length of the given byte stream is shorter than
+         * chr-length of current range, no need for further check,
+         * skip to the next range.
+         */
+        if (ssize < chr_size)
+            continue;
+
+        /* If the first byte of current range prefix does not match
+         * with the given string, there will be no match
+         * (exact nor partial), so skip to the next range.
+         */
+        if (0 < pclr->key_prefix && str[0] != pclr->key_prefix[0])
+            continue;
+
+        /* The first byte of current range matches with given string,
+         * progress to the real comparison.

    I'd rather say:
           "proceed with full comparison".


+         */
+
+        /* Search the lookup range. We could use binary search. */
+        {
+            const byte *key = pclr->keys.data;
+            int step = key_size;
+            int k;
+            const byte *pvalue = NULL;
+
+            if_debug3('J', "[J]CMDN()     lookup range: key=0x%lx pvalue=%lx step=%d\n", key, pvalue, step);
+
+           /* when range is "range", 2 keys for lo-end and hi-end
+            * are stacked. So twice the step. */
+            if (pclr->key_is_range)
+               step <<=1;      /* step = step * 2; */
+
+            for (k = 0; k < pclr->num_keys; ++k, key += step) {
+               int ret_gmcr;
+               ret_gmcr = gs_multidim_cmp_range(str,
+                       pclr->key_prefix, key, key + key_size,
+                       pre_size, key_size);
+               if (0 < ret_gmcr && pm_maxlen < chr_size)


     1. Bug: chr_size is used instead of ret_gmcr.
     2. If we rename gs_multidim_cmp_range, ret_gmcr should be renamed too.


+                        pm_maxlen = chr_size;
+               pm_chr = (*pchr << (chr_size * 8)) + bytes2int(str, chr_size);
+               pm_index = (*pindex) + chr_size;
+               pm_fidx = pclr->font_index;

      Bug: pm_maxlen and the other pm_ variables are inconsistent.
      I believe the code should look like this:


               if (0 < ret_gmcr && pm_maxlen < ret_gmcr) {
                        pm_maxlen = ret_gmcr;
                        pm_chr = (*pchr << (chr_size * 8)) + bytes2int(str, chr_size);
                        pm_index = (*pindex) + chr_size;
                        pm_fidx = pclr->font_index;
               }

      In this case, pay attention to the case ret_gmcr == 0:
      the {} block never executes, so you may want more changes.

      Besides, (*pindex) is always zero, right?
      Please remove it from here.

                if (ret_gmcr == pre_size + key_size)
+                        break;
+           }
+
+            /* all keys are tried, but found no match. */
+            /* go to next prefix. */
+            if (k == pclr->num_keys)
+                continue;
+
+            /* We have a match.  Return the result. */

      1. The comment should be improved:

             /* We have a (partial) match.  Return the result. */

      2. Bug:

         There may be several gx_code_lookup_range_s instances
         with the same prefix. Such instances come from 'usecmap'.
         Your code finds a partial match for the first of them
         and returns. But subsequent ones may give a longer
         partial match or a full match.

      This is a serious bug in your code: it breaks 'usecmap'
      even for the single-dimensional case.
      I'm strongly against committing it until it is fixed.



+            *pchr = (*pchr << (chr_size * 8)) + bytes2int(str, chr_size);

      I don't like this formula, but I haven't had enough time to
      investigate it. It definitely causes an integer overflow
      if the CID range has more than 4 bytes. IMHO *pchr is not
      needed by the caller. Please check.

+            *pindex += chr_size;
+            *pfidx = pclr->font_index;
+            pvalue = pclr->values.data + k * pclr->value_size;
+            switch (pclr->value_type) {
+            case CODE_VALUE_CID:
+               {
+                  int ret_gmcrp;
+                  ret_gmcrp = gs_multidim_calc_relpos(str,
+                       pclr->key_prefix, key, key + key_size,
+                       pre_size, key_size);
+                   *pglyph = gs_min_cid_glyph +
+                        bytes2int(pvalue, pclr->value_size) +
+                        ret_gmcrp;
+               }
+                return 0;
+            case CODE_VALUE_UNDEF:
+                *pglyph = gs_min_cid_glyph +
+                    bytes2int(pvalue, pclr->value_size);

   I don't understand why gs_min_cid_glyph is added.
   It looks like a leftover from the old code.

+                return 0;
+            case CODE_VALUE_GLYPH:
+                *pglyph = bytes2int(pvalue, pclr->value_size);
+                return 0;
+            case CODE_VALUE_CHARS:
+                *pglyph =
+                    bytes2int(pvalue, pclr->value_size) +
+                    bytes2int(str + pre_size, key_size) -
+                    bytes2int(key, key_size);
+                return pclr->value_size;

   Why does it return value_size instead of the length of the consumed character code?
   It looks like a bug to me, but maybe I misunderstand something.


+            default:            /* shouldn't happen */
+                return_error(gs_error_rangecheck);
+            }
+        }
+    }
+    /* No mapping. */
+    *pchr = pm_chr;
+    *pindex = pm_index;
+    *pfidx = pm_fidx;
+    *pglyph = gs_no_glyph;
+    return 0;
+}
+#undef if_debug_print_string_hex(c, str, len)
+#undef if_debug_print_msg_str_in_range(c, str)
+
 /*
  * Decode a character from a string using a CMap.
  * Return like code_map_decode_next.
+ * At present, the range specification by (begin|end)codespacerange
+ * is not used in this function. Therefore, this function accepts
+ * some invalid CMap which def & undef maps exceed the codespacerange.
+ * It should be checked in this function, or some procedure in gs_cmap.ps.
  */
 int
 gs_cmap_decode_next(const gs_cmap_t * pcmap, const gs_const_string * pstr,
@@ -252,14 +574,74 @@
     uint save_index = *pindex;
     int code;

-    *pchr = 0;
+    uint pm_index;
+    uint pm_fidx;
+    gs_char pm_chr;
+
+    /* For first, check defined map */
     code =
-        code_map_decode_next(&pcmap->def, pstr, pindex, pfidx, pchr, pglyph);
+        code_map_decode_next_mdrc(&pcmap->def, pstr, pindex, pfidx, pchr, pglyph);
+
+    /* This is defined character */
     if (code != 0 || *pglyph != gs_no_glyph)
         return code;
-    /* This is an undefined character.  Use the notdef map. */
+
+    /* In here, this is NOT defined character */
+    /* save partially matched results */
+    pm_index = *pindex;
+    pm_fidx = *pfidx;
+    pm_chr = *pchr;
+
+    /* check notdef map. */
     *pindex = save_index;
-    *pchr = 0;
-    return code_map_decode_next(&pcmap->notdef, pstr, pindex, pfidx,
-                                pchr, pglyph);
+    code =
+       code_map_decode_next_mdrc(&pcmap->notdef, pstr, pindex, pfidx, pchr, pglyph);
+
+    /* This is defined "notdef" character. */
+    if (code != 0 || *pglyph != gs_no_glyph)
+        return code;
+
+    /*
+     * This is undefined in def & undef maps,
+     * use partially matched result with default notdef (CID = 0).
+     */
+    if (save_index < pm_index) {
+
+       /* there was some partially matched */
+
+        *pglyph = gs_min_cid_glyph;    /* CID = 0 */
+        *pindex = pm_index;
+        *pfidx = pm_fidx;
+        *pchr = '\0';
+         return 0; /* should return some error for partial matched .notdef? */
+    }
+    else {
+       /* no match */
+
+       /* Even partial match is failed.
+         * Getting the shortest length from defined characters,
+         * and cut a .notdef with same length, from undecodable string.
+        * Also this procedure is specified in PS Ref. Manual v3,
+         * at the end of Fonts chapter.
+         */

     I don't like ".notdef" here.
     ".notdef" comes from Type 1 fonts;
     IMHO it is not applicable to CID fonts.
     Please provide a better comment.


+
+       const byte *str = pstr->data + save_index;
+       uint ssize = pstr->size - save_index;
+       int chr_size_shortest =
+               gs_cmap_get_shortest_chr(&pcmap->def, pfidx);
+
+       if (chr_size_shortest <= ssize) {
+            *pglyph = gs_min_cid_glyph;        /* CID = 0, this is CMap fallback */
+            *pindex = save_index + chr_size_shortest;
+           *pchr = '\0';
+            return 0; /* should return some error for fallback .notdef? */
+       }
+       else {
+            /* Undecodable string is shorter than the shortest character,
+             * there's no way except to return error.
+             */
+           *pglyph = gs_no_glyph;
+           return -1;
+       }
+    }
 }
diff -Nur gs-CVS/src.orig/zfcid0.c gs-CVS/src/zfcid0.c
--- gs-CVS/src.orig/zfcid0.c    Wed Mar 14 04:57:06 2001
+++ gs-CVS/src/zfcid0.c Mon May 28 18:53:24 2001
@@ -484,8 +484,21 @@
     code = pfcid->cidata.glyph_data((gs_font_base *)pfcid,
                        (gs_glyph)(gs_min_cid_glyph + op->value.intval),
                                    &gstr, &fidx);
-    if (code < 0)
-       return code;
+
+    /* return code; original error-sensitive & fragile code */

   I don't understand the comment above.
   IMHO it should be removed.


+    if (code < 0) { /* failed to load glyph data, put CID 0 */
+       int default_fallback_CID = 0 ;
+
+       if_debug2('J', "[J]ztype9cidmap() use CID %d instead of glyph-missing CID %d\n", default_fallback_CID, op->value.intval);
+
+       op->value.intval = default_fallback_CID;
+
+       /* reload glyph for default_fallback_CID */
+
+       code = pfcid->cidata.glyph_data((gs_font_base *)pfcid,
+                       (gs_glyph)(gs_min_cid_glyph + op->value.intval),
+                                   &gstr, &fidx);
+    }

     make_const_string(op - 1,
                      a_readonly | imemory_space((gs_ref_memory_t *)pfont->memory),
diff -Nur gs-CVS/lib.orig/gs_cmap.ps gs-CVS/lib/gs_cmap.ps
--- gs-CVS/lib.orig/gs_cmap.ps  Wed Nov 29 16:10:27 2000
+++ gs-CVS/lib/gs_cmap.ps       Mon May 28 18:53:24 2001
@@ -23,6 +23,22 @@

 % ---------------- Public operators ---------------- %

+/.rewriteTempMapsNotDef {
+  DEBUG { (rewriting TempMapsNotDef\n) print flush } if
+  .TempMaps 2 get
+  dup length 0 gt {
+    0 get
+    DEBUG { (...original...\n) print flush } if
+    1 5 2 index length 1 sub {
+      { 1 index exch get 2 3 put } stopped
+      { DEBUG { (cannot rewrite\n) print flush } if }
+      { DEBUG { (rewrite\n) print flush } if } ifelse
+    } for
+  } if
+  pop
+  DEBUG { (...FINISHED...\n) print } if
+} bind def
+
 % composefont doesn't appear in CMap files -- it's documented in
 % the "PostScript Language Reference Manual Supplement".
 /composefont {         % <name> <cmap|cmapname> <fonts> composefont <font>
@@ -89,11 +105,19 @@
   /CodeMap null def            % for .buildcmap
 } bind def
 /endcmap {             % - endcmap -
+  .rewriteTempMapsNotDef
+  DEBUG {
+    (*** defined charmap ***\n) print
+    .TempMaps 1 get {exch == (\t) print ==} forall
+    (*** undefined charmap ***\n) print
+    .TempMaps 2 get {exch == (\t) print ==} forall
+  } if
   10 dict begin 0 1 2 {
     /i exch def
                % Append data from .TempMaps to .CodeMapData.
     /t .TempMaps i get def
     .CodeMapData i get length t { exch pop length add } forall
+    DEBUG { (requested array size ) print dup == } if
     array /a exch def
     a 0 .CodeMapData i get .putmore
     0 1 t length 1 sub {
@@ -273,44 +297,99 @@
   counttomark 3 idiv {
     counttomark -3 roll                % process in correct order
               % Construct prefix, params, key_lo, key_hi, value, font_index
-    3 1 roll dup length 1 eq {
-      () 3 1 roll      % prefix
-      <01 01 00 02>    % params
-      3 1 roll         % keys
-      concatstrings 4 -1 roll .endmapvalue
-    } {
-               % Stack: cid_base code_lo code_hi
-               % Hack: handle 16-bit single-range mappings specially.
-      counttomark 3 eq 1 index length 2 eq and {
-       () 3 1 roll     % prefix
-       <02 01 00 02>   % params
-       3 1 roll        % keys
-       concatstrings 4 -1 roll .endmapvalue
-      } {
-       exch dup dup length 1 sub 0 exch getinterval    % prefix
-                       % Stack: cid_base code_hi code_lo prefix
-       <01 01 00 02>   % params
-       3 -1 roll dup length 1 sub 1 getinterval        % key_lo
-       4 -1 roll dup length 1 sub 1 getinterval        % key_hi
-       concatstrings
-       4 -1 roll .endmapvalue
+    3 1 roll   % <cid_base> <code_lo> <code_hi>
+               %               prefix  key
+               % 1-byte code:  ()      .
+               % 1-byte range: ()      .
+               % N-byte code:  .       (*)
+               % N-byte range: (*)     (*)
+    dup 1 index eq {   % <code_lo> == <code_hi>
+                       % 0: prefix_len for 1-byte code
+                       % 1: prefix_len for N-byte code
+       dup length 1 eq { 0 } { 1 } ifelse
+    } {                        % <code_lo> != <code_hi>
+                       % calculate prefix_len for *-byte range
+       0               % initial value for N
+       {               % <cid_base> <code_lo> <code_hi> (code_len-1)  N
+           dup 2 index le { exit } if
+           2 index 1 index get % N-th byte of code_lo
+           2 index 2 index get % N-th byte of code_hi
+           eq { 1 add } { exit } ifelse
+       } loop
+    } ifelse
+                               % cid_base code_lo code_hi prefix_len
+
+    % Althogh Adobe CPSI with native CID/CMap support accept
+    % multi-dimensional range specification in notdef & cidrange
+    % (and CID is calculated as relative position in multi-dimensional
+    % range), but older CPSI & ATM cannot handle it.

   Improve the grammar:

     % range), older CPSI & ATM cannot handle it.

+    %
+    % GS accepts such specification, but it's recommended to keep
+    % from using this feature for notdef & cidrange.
+    % Following is a disabler of this feature.

   Improve the grammar:

     % The following is a disabler of this feature.



+    % -------------------------------------------------------------
+    % counttomark 1 add index  % get map#
+    % 0 ne {                   % if not codespacerange
+    %   1 index length                 % get code length
+    %   1 index                        % get prefix length
+    %   sub                    % calculate key length
+    %   1 gt {                 % if (key_len > 1),
+    %      (.endmapranges error) = flush
+    %      (multi-dimensional range specification is used out of codespacerange)
+    %      = flush
+    %      (/) =only
+    %      CMapName CMapName length string cvs =only
+    %      (: <) =only
+    %      2 index (%stdout) (w) file exch writehexstring
+    %      (> <) =only
+    %      1 index (%stdout) (w) file exch writehexstring
+    %      (>\n) =only flush
+    %      quit
+    %   } if
+    % } if
+    % -------------------------------------------------------------
+
+    1 index exch 0 exch getinterval
+                               % cid_base code_lo code_hi prefix
+    dup length 3 index length exch sub
+                               % cid_base code_lo code_hi prefix range_len
+    dup 255 gt {
+       (too long coderange specification for current GS\n) print stop


   Please use 'signalerror' instead of 'stop'.


+    } if
+    <00 01 00 02> 4 string copy        % create initialized param
+    dup 0 4 -1 roll put                % put range_len into param
+
+    % get key_hi
+    3 -1 roll dup length 3 index length dup 3 1 roll sub getinterval
+
+    % get key_lo
+    4 -1 roll dup length 4 index length dup 3 1 roll sub getinterval
+
+    % make "keys" (concatenated key_lo + key_hi)
+    exch concatstrings
+
+    %
+    4 -1 roll
+    .endmapvalue
+
                % See if we can merge with the previous value.
                % The prefix, params, and font index must match.
-               % Stack: prefix params keys value fontindex
-       4 index 10 index eq             % prefix
-       4 index 10 index eq and % params
-       1 index 7 index eq and  % fontindex
+    % prefix params keys value fontindex
+    counttomark 5 gt { % 2 (or more) ranges (1 range = 5 item)
+       4 index 10 index eq     % compare prefix
+       4 index 10 index eq and % compare params
+       1 index 7 index eq and  % compare fontindex
        {
+          DEBUG { (merge!\n) print } if
          pop 4 2 roll pop pop
-               % Stack: prefix params keys value fontindex keys2 value2
+           % prefix params keys value fontindex keys2 value2
          5 -1 roll 3 -1 roll concatstrings
-               % Stack: prefix params value fontindex value2 keys'
+           % prefix params value fontindex value2 keys'
          4 -1 roll 3 -1 roll concatstrings
-               % Stack: prefix params fontindex keys' values'
+           % prefix params fontindex keys' values'
          3 -1 roll
        } if
-      } ifelse
-    } ifelse
+     } if % end of 2 (or more) ranges
   } repeat
   counttomark 2 add -1 roll .appendmap
 } bind def



Igor.




