[gs-bugs] [Bug 692381] utf8-Ghostscript - Incompatible changes to the GSAPI/GSDLL interfaces

bugzilla-daemon at ghostscript.com bugzilla-daemon at ghostscript.com
Thu Jul 28 11:15:14 UTC 2011


--- Comment #1 from SaGS <sags5495 at hotmail.com> 2011-07-28 11:15:11 UTC ---
My suggestions on what to do:

(i) -------
Functions that take text parameters get encoding-specific variants. To do 
this in a ‘Windowsly’ manner, names for the ANSI variants get an "A" 
suffix, and those that expect wchar_t parameters get a "W" suffix. The
no-suffix names are kept for backward compatibility and are aliases for 
the ANSI entry points. For reasons explained later, there is another 
variant that takes utf8 strings and does no conversion at all; I suggest
the suffix "8" for these.

Example for gsapi_run_file():

- Add:

    int gsapi_run_fileA (..., const char *file_name/*ANSI*/, ...);

  New function. All it does is convert file_name ANSI->utf8 and call the
  utf8 worker function (see below). Exported from the *.DEF file as both
  ‘gsapi_run_fileA’ (suffix ‘A’) and ‘gsapi_run_file’ (no suffix).
  (‘EXPORTS’ syntax: ‘externalname = internalname [@ordinal]’.)

- Add:

    int gsapi_run_fileW (..., const wchar_t *file_name/*utf16*/, ...);

  New function. It converts utf16->utf8 and calls the same utf8 worker 
  function. Exported from the *.DEF file as ‘gsapi_run_fileW’ (suffix ‘W’).

- Keep unchanged:

    int gsapi_run_file (..., const char *file_name/*utf8*/, ...);

  This is the worker function called by the ‘A’ and ‘W’ variants. Also 
  exported from the *.DEF as ‘gsapi_run_file8’ (suffix ‘8’).
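
A minimal sketch of the ‘W’ helper from the list above, in portable C. All
names here (sketch_run_fileW, run_file_utf8, utf16_to_utf8) are hypothetical
stand-ins, not Ghostscript functions. The sketch uses uint16_t in place of
wchar_t so that it behaves the same on platforms where wchar_t is not 16 bits
wide, and the conversion is written by hand; a real Windows build would more
likely call WideCharToMultiByte() with CP_UTF8. The ‘A’ helper would have the
same shape, with an ANSI->utf8 conversion in place of utf16->utf8.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the utf8 worker. The real gsapi_run_file()
   takes further parameters (the "..." in the prototypes); this one only
   records the name it receives. */
static char last_utf8_name[256];

static int run_file_utf8(const char *file_name)
{
    strncpy(last_utf8_name, file_name, sizeof last_utf8_name - 1);
    last_utf8_name[sizeof last_utf8_name - 1] = '\0';
    return 0;
}

/* Convert a NUL-terminated utf16 string to a malloc'ed utf8 string.
   Unpaired surrogates become U+FFFD. Returns NULL if out of memory. */
static char *utf16_to_utf8(const uint16_t *src)
{
    size_t n = 0, i;
    unsigned char *out, *p;

    assert(src != NULL);
    while (src[n] != 0)
        n++;
    /* One utf16 unit needs at most 3 utf8 bytes; a surrogate pair
       (2 units) needs 4, so 3 * n + 1 is always enough. */
    out = malloc(3 * n + 1);
    if (out == NULL)
        return NULL;
    p = out;
    for (i = 0; i < n; i++) {
        uint32_t c = src[i];

        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < n &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            /* Valid surrogate pair: combine into one code point. */
            c = 0x10000 + ((c - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            i++;
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            c = 0xFFFD;
        }
        if (c < 0x80) {
            *p++ = (unsigned char)c;
        } else if (c < 0x800) {
            *p++ = (unsigned char)(0xC0 | (c >> 6));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            *p++ = (unsigned char)(0xE0 | (c >> 12));
            *p++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            *p++ = (unsigned char)(0xF0 | (c >> 18));
            *p++ = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            *p++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    *p = '\0';
    return (char *)out;
}

/* The 'W' helper: convert utf16->utf8, then call the utf8 worker. */
static int sketch_run_fileW(const uint16_t *file_name)
{
    char *utf8 = utf16_to_utf8(file_name);
    int code;

    if (utf8 == NULL)
        return -1;
    code = run_file_utf8(utf8);
    free(utf8);
    return code;
}
```

The helper owns the temporary utf8 buffer and frees it after the worker
returns, so the caller's memory handling is the same as for the no-suffix
function.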

Why the ‘A/W’ variants:
    We need the ANSI version for backwards compatibility, and the Unicode 
    version for greater coverage of filenames. Utf8 provides the same coverage 
    as utf16 (aka ‘Unicode’ aka ‘wide character’), but utf8 is not a native 
    Windows encoding. These variants with exactly these suffixes are ‘the 
    Windows way’ for this kind of job. Think of these functions as helpers
    for calling the utf8 variant.

Why the no-suffix variant [on Windows]:
    Ensures that existing GSAPI/GSDLL clients can link, unmodified, to the
    new DLL. Note that with the name-juggling in the DEF, the no-suffix 
    externally visible symbol links to the ANSI functions on Windows and
    to the utf8 one on all other platforms.

Why the ‘8’ variant:
    Currently, gswin32.exe and gswin32c.exe get the command line as utf16 
    and convert it automatically to utf8. In the process, the exact bytes 
    that represent these arguments change. We implicitly assume that 
    nobody will put binary bytes (true binary, not hex or ascii85) on the 
    command line, and that this limitation is much less annoying than 
    not being able to simply type filenames with extended characters.
    For ‘gsapi_init_with_args()’ the situation is a bit different: the 
    ‘argv[]’ is [could be] generated by a program. This program can very
    well pass a ‘-c’ with binary tokens that would be destroyed by a 
    charset conversion. For such an exotic use, I suggest providing these
    ‘8’ entries. Note this requires exactly ZERO coding; the function is 
    there anyway.
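
To illustrate how a charset conversion destroys binary bytes, here is a small
sketch in C. It assumes Latin-1 as a stand-in for the ANSI code page (the real
code page depends on the system), and latin1_to_utf8 is a hypothetical helper,
not part of Ghostscript. Bytes in the range 128-159 are the ones PostScript
uses to introduce binary tokens; any such byte comes out of the conversion as
two bytes, so the interpreter no longer sees the token the program wrote.

```c
#include <assert.h>
#include <stddef.h>

/* Convert len Latin-1 bytes to utf8. dst must have room for 2 * len
   bytes. Returns the number of bytes written. */
static size_t latin1_to_utf8(const unsigned char *src, size_t len,
                             unsigned char *dst)
{
    size_t i, out = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            /* ASCII survives unchanged. */
            dst[out++] = src[i];
        } else {
            /* Every byte >= 0x80 becomes a two-byte sequence. */
            dst[out++] = (unsigned char)(0xC0 | (src[i] >> 6));
            dst[out++] = (unsigned char)(0x80 | (src[i] & 0x3F));
        }
    }
    return out;
}
```

With this helper, the two input bytes 0x92 0x41 become the three bytes
0xC2 0x92 0x41. An entry point that takes the bytes as they are, like the
proposed ‘8’ variant, avoids this.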

(ii) -------
I also consider that such an important change as the encoding for 
filename strings passed to PostScript operators must be signaled by 
‘gs_revision()’. That way the GSAPI client can at least know what kind of 
DLL it loaded and act appropriately. It also allows writing clients that 
work with both ‘new’ and ‘old’ DLLs and use the new functionality if 
available.
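
A client that works with both kinds of DLL could be sketched as below. This
assumes the revision query yields a numeric revision and that some revision
number marks the first DLL with the new entry points. Both the helper
choose_mode and the threshold parameter are hypothetical, and the actual
cut-over revision is not known here, so it is passed in by the caller.

```c
#include <assert.h>

typedef enum {
    USE_LEGACY_ENTRY_POINTS,  /* 'old' DLL: only the no-suffix names */
    USE_NEW_ENTRY_POINTS      /* 'new' DLL: 'A'/'W'/'8' variants exist */
} client_mode;

/* Decide which entry points to call, given the revision reported by the
   loaded DLL and the (hypothetical) first revision with the new API. */
static client_mode choose_mode(long revision, long first_new_revision)
{
    assert(first_new_revision > 0);
    return revision >= first_new_revision ? USE_NEW_ENTRY_POINTS
                                          : USE_LEGACY_ENTRY_POINTS;
}
```

The client would call this once after loading the DLL, then route every
filename through the matching set of functions.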

(iii) -------
After doing the above changes, gswin32c.exe and gswin32.exe need to be 
changed to call the new entry points. Currently they call the no-suffix 
function but pass utf8 parameters. What they have to do is pass the 
wide char strings to the ‘W’ functions directly. I consider the 
wchar_t -> utf8 conversion currently misplaced: being in the frontends, it
resolves the problem only for those frontends; its place is ‘after’ the 
GSAPI interface, to help all clients and not introduce incompatibilities.

(iv) -------
These changes also have to be documented in API.htm and DLL.htm. Until now, 
the specs were silent about character encoding, assuming some general 
consensus that the encoding is the one used by the host. Now, because the 
encoding on Windows is forced to be a non-native one, the docs should say 
exactly where this encoding is required and where it is not.
