[gs-bugs] [Bug 692381] utf8-Ghostscript - Incompatible changes to the GSAPI/GSDLL interfaces
bugzilla-daemon at ghostscript.com
Thu Jul 28 11:15:14 UTC 2011
--- Comment #1 from SaGS <sags5495 at hotmail.com> 2011-07-28 11:15:11 UTC ---
My suggestions on what to do:
Functions that take text parameters get encoding-specific variants. To do
this in a ‘Windowsly’ manner, names for the ANSI variants get an "A"
suffix and for those that expect wchar_t parameters the suffix is "W". The
no-suffix names are kept for backward compatibility, and are aliases for
the ANSI entry points. For reasons explained later, there is another
variant that takes utf8 strings, doing no conversion at all; I suggest the
suffix "8" for these.
Example for gsapi_run_file():
int gsapi_run_fileA (..., const char *file_name/*ANSI*/, ...);
New function. All it does is convert file_name ANSI->utf8 and call the
utf8 worker function (see below). Exported from the *.DEF file as both
‘gsapi_run_fileA’ (suffix ‘A’) and ‘gsapi_run_file’ (no suffix).
(‘EXPORTS’ syntax: ‘externalname = internalname [@ordinal]’.)
int gsapi_run_fileW (..., const wchar_t *file_name/*utf16*/, ...);
New function. It converts utf16->utf8 and calls the same utf8 worker
function. Exported from the *.DEF file as ‘gsapi_run_fileW’ (suffix ‘W’).
- Keep unchanged:
int gsapi_run_file (..., const char *file_name/*utf8*/, ...);
This is the worker function called by the ‘A’ and ‘W’ variants. Also
exported from the *.DEF as ‘gsapi_run_file8’ (suffix ‘8’).
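The ‘W’ helper described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the demo_* names are hypothetical and are not part of GSAPI, the parameter list is reduced to the file name, and uint16_t stands in for the 16-bit Windows wchar_t so the sketch stays portable. A real Windows build would more likely call WideCharToMultiByte() with CP_UTF8 instead of the hand-written converter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the utf8 worker: it only records the name. */
char demo_last_name[256];

int demo_run_file8(const char *file_name_utf8)
{
    assert(file_name_utf8 != NULL);
    if (strlen(file_name_utf8) >= sizeof demo_last_name)
        return -1;
    strcpy(demo_last_name, file_name_utf8);
    return 0;
}

/* Convert NUL-terminated utf16 to a malloc'ed utf8 string.
 * Returns NULL on allocation failure or on an unpaired surrogate. */
char *demo_utf16_to_utf8(const uint16_t *src)
{
    size_t n = 0, o = 0, i;
    char *out;

    assert(src != NULL);
    while (src[n] != 0)
        n++;
    /* One utf16 unit needs at most 3 utf8 bytes; a surrogate pair
     * (2 units) needs 4, so 3 bytes per unit is always enough. */
    out = malloc(3 * n + 1);
    if (out == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        uint32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            if (i + 1 >= n || src[i + 1] < 0xDC00 || src[i + 1] > 0xDFFF) {
                free(out);
                return NULL;
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {   /* stray low surrogate */
            free(out);
            return NULL;
        }
        if (cp < 0x80) {
            out[o++] = (char)cp;
        } else if (cp < 0x800) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return out;
}

/* The 'W' helper: convert utf16 -> utf8, then call the utf8 worker. */
int demo_run_fileW(const uint16_t *file_name_utf16)
{
    char *utf8 = demo_utf16_to_utf8(file_name_utf16);
    int code;

    if (utf8 == NULL)
        return -1;
    code = demo_run_file8(utf8);
    free(utf8);
    return code;
}
```

An ‘A’ helper would have the same shape, with an ANSI->utf8 conversion in place of demo_utf16_to_utf8(). The point of the layout is that all encoding knowledge sits in the thin helpers and the worker only ever sees utf8.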
Why the ‘A/W’ variants:
We need the ANSI version for backwards compatibility, and the Unicode
version for greater coverage of filenames. Utf8 provides the same coverage
as utf16 (aka ‘Unicode’ aka ‘wide character’), but utf8 is not a native
Windows encoding. These variants with exactly these suffixes are ‘the
Windows way’ for this kind of job. Think of these functions as helpers
for calling the utf8 variant.
Why the no-suffix variant [on Windows]:
Ensures that existing GSAPI/GSDLL clients can link, unmodified, to the
new DLL. Note that with the name-juggling in the DEF, the no-suffix
externally visible symbol links to the ANSI functions on Windows and
to the utf8 one on all other platforms.
Why the ‘8’ variant:
Currently, gswin32.exe and gswin32c.exe get the command line as utf16
and convert it automatically to utf8. In the process, the exact bytes
that represent these arguments change. We implicitly assume that
nobody will put binary bytes (true binary, not hex or ascii85) on the
command line, and that this limitation is much less annoying than
not being able to simply type filenames with extended characters.
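To make the ‘bytes change’ point concrete, here is a small sketch of a charset conversion applied to arbitrary bytes. It assumes Latin-1 as a stand-in for ‘the ANSI code page’ (the real code page depends on the system), and demo_latin1_to_utf8() is a hypothetical helper, not a Ghostscript function.

```c
#include <assert.h>
#include <stddef.h>

/* Convert in_len Latin-1 bytes to utf8.
 * Returns the number of bytes written, or 0 if out_cap is too small. */
size_t demo_latin1_to_utf8(const unsigned char *in, size_t in_len,
                           unsigned char *out, size_t out_cap)
{
    size_t i, o = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < in_len; i++) {
        if (in[i] < 0x80) {
            /* 7-bit bytes pass through unchanged. */
            if (o + 1 > out_cap)
                return 0;
            out[o++] = in[i];
        } else {
            /* Every byte >= 0x80 becomes a two-byte sequence. */
            if (o + 2 > out_cap)
                return 0;
            out[o++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[o++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return o;
}
```

Feeding it the three bytes 0x41 0x92 0xFF yields the five bytes 0x41 0xC2 0x92 0xC3 0xBF: harmless for text, but a binary token that contained 0x92 or 0xFF no longer has the bytes the interpreter expects.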
For ‘gsapi_init_with_args()’ the situation is a bit different: the
‘argv’ is [could be] generated by a program. This program can very
well pass a ‘-c’ with binary tokens that would be destroyed by a
charset conversion. For such an exotic use, I suggest providing these
‘8’ entries. Note this requires exactly ZERO coding, the function is
I also consider that such an important change as the encoding for
filename strings passed to PostScript operators must be signaled by
‘gs_revision()’. At least the GSAPI client can know what kind of DLL it
loaded and act appropriately. This also allows writing clients that work
with both ‘new’ and ‘old’ DLLs, and use the new functionality if available.
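A client-side check along these lines could look like the sketch below. Everything in it is an assumption for illustration: the demo_* names are hypothetical, and since this report does not say which revision number would mark the change, the threshold is passed in by the caller rather than hard-coded.

```c
#include <assert.h>

/* Which family of entry points the client should call (illustrative). */
typedef enum {
    DEMO_USE_LEGACY_ANSI,  /* old DLL: only the no-suffix entry exists */
    DEMO_USE_WIDE          /* new DLL: the 'W' (utf16) entry is available */
} demo_entry_choice;

/* dll_revision:       revision reported by the loaded DLL.
 * first_new_revision: first revision that has the new entry points;
 *                     unknown here, so supplied by the caller. */
demo_entry_choice demo_choose_entry(long dll_revision, long first_new_revision)
{
    assert(first_new_revision > 0);
    return dll_revision >= first_new_revision ? DEMO_USE_WIDE
                                              : DEMO_USE_LEGACY_ANSI;
}
```

A client would call this once after loading the DLL, then resolve either the ‘W’ symbol or the no-suffix one accordingly.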
After doing the above changes, the gswin32c.exe and gswin32.exe need to be
changed to call the new entry points. Currently they call the no-suffix
function but pass utf8 parameters. What they have to do is pass the
wide char strings to the ‘W’ functions directly. I consider the
wchar_t -> utf8 conversion to be currently misplaced: being in the
frontends, it resolves the problem only for those frontends; its place is
‘after’ the GSAPI interface, to help all clients and not introduce
incompatibilities.
Also these have to be documented in API.htm and DLL.htm. Until now, the
specs were silent about character encoding, assuming some general
consensus that the encoding is the one used by the host. Now, because the
encoding on Windows is forced to be a non-native one, the docs should say
exactly where this encoding is required and where it is not.