| <<<Back 1 day (to 2019/07/14) | Fwd 1 day (to 2019/07/16) >>> | 20190715 |
kens | StephDesc, so you are talking about Windows; the 'Ghostscript printer driver' isn't a real driver. It's a PPD file (PostScript Printer Description); the actual driver is supplied by Microsoft. The output is pretty much exactly the same as you would get by printing from any Windows application to a PostScript printer. | 06:58.11 |
| The reason we supply it is pretty much the same as the reason for the 'press return to continue' prompt, instead of 'press any key'. By supplying a PPD file we can tell people to use that for creating PostScript files to send to Ghostscript, rather than saying 'use any PostScript printer driver'. | 06:59.17 |
| That being the case, we can't help you with the PostScript it produces, other than saying 'talk to Microsoft' | 06:59.56 |
| Having said that.... | 07:00.02 |
| If you start from a Windows application, and use the Microsoft print system to create the PostScript, the printer driver does embed an extension (which we support) which allows for text to be labelled with its Unicode code point. | 07:00.56 |
| I suspect your problem is that the application you are printing from does **NOT** use that API. What it does is generate the PostScript itself and embed it in the output from the driver using the 'PostScript pass-through' feature of the device driver. | 07:01.45 |
| I would guess you are printing from Acrobat. | 07:01.58 |
| So, the problem is not us; it's either Microsoft or, more likely, Adobe's products that are causing your problem. | 07:02.24 |
| To reiterate what Robin said yesterday: if you have a PDF file from which you can extract the text, then use that. | 07:02.45 |
| Perhaps if you were to explain what you are actually trying to achieve (and why) we might be able to offer some suggestions. As it is, in the absence of any clue about what you are trying to achieve, and with no file to look at, all I can tell you is 'not us guv'. | 07:03.34 |
| Oh, and it's nothing to do with fonts. | 07:04.06 |
StephDesc | Kens, thanks for your explanation, and yes, I'm going to process the PDF file directly. But just to make sure I understand: you told me that PostScript does not support ToUnicode CMap information, so even if Acrobat Reader or the MS driver were OK, the text still could not be extracted, no? | 07:37.22 |
| As the PDF file uses CID fonts.. | 07:37.57 |
kens | It's not really to do with fonts | 07:38.10 |
| The PostScript language has no means (as standard) to associate any given glyph with a specific character code in a known encoding (such as Unicode) | 07:38.49 |
| PostScript is designed for printing, nothing else, so you don't care what the character is, as long as it's drawn correctly. | 07:39.09 |
StephDesc | OK, thanks, I know what you mean | 07:39.32 |
kens | It so happens that, in general, programmers have used ASCII for Latin text, but it's far from guaranteed | 07:39.37 |
StephDesc | Thanks again for all this information | 07:39.49 |
kens | For non-Latin text there is no single simple standard | 07:39.51 |
| Now, the Microsoft PostScript driver embeds extra information in the font dictionaries it creates | 07:40.18 |
| non-standard info, but it causes no harm | 07:40.25 |
| A consumer which understands that information (Ghostscript is one such, as is Acrobat Distiller) can use that information to attach a Unicode value to each glyph code. | 07:40.58 |
StephDesc | yes, I read some articles about that.. | 07:41.40 |
kens | So if (say) Microsoft Word printed some Devanagari text, then the extra info there would let us find that in the PostScript and create a PDF file with a ToUnicode CMap, and you would be able to extract the text | 07:41.41 |
| But if the application creates the PostScript itself and injects it into the output of the MS driver using the PostScript pass-through mechanism, then that information is not present | 07:42.17 |
| So we don't have any information to use. If it so happens that the text is encoded as ASCII, or Identity, then we can still extract the information and use it (there are actually another couple of cases, but they're a bit obscure), but in the general case, the information is gone at that point | 07:43.27 |
| This is essentially what Robin meant when he said not to do any more conversions than you absolutely need to. | 07:43.48 |
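To make the ToUnicode CMap idea concrete, here is a minimal Python sketch that emits the text of a ToUnicode CMap for a simple single-byte font, given a mapping from glyph codes to Unicode code points. It is not Ghostscript's code: the function name `to_unicode_cmap` and the example glyph codes are hypothetical. It assumes the CMap layout described in the PDF specification, including the limit of 100 entries per `beginbfchar` block and UTF-16BE destination strings.

```python
def to_unicode_cmap(mapping: dict[int, int]) -> str:
    """Build a ToUnicode CMap (hypothetical helper) for single-byte glyph codes.

    mapping: glyph code (0-255) -> Unicode code point.
    """
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        # Single-byte codes only, so the code space is 00..FF.
        "1 begincodespacerange",
        "<00> <FF>",
        "endcodespacerange",
    ]
    items = sorted(mapping.items())
    for code, _ in items:
        if not 0 <= code <= 0xFF:
            raise ValueError(f"glyph code {code} is not a single byte")
    # Each bfchar block may hold at most 100 entries, so emit it in chunks.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for code, cp in chunk:
            # Destinations are UTF-16BE hex; non-BMP characters become surrogate pairs.
            dst = chr(cp).encode("utf-16-be").hex().upper()
            lines.append(f"<{code:02X}> <{dst}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines)


# Hypothetical font in which glyph code 1 draws KA (U+0915) and code 2 draws vowel sign I (U+093F).
print(to_unicode_cmap({1: 0x0915, 2: 0x093F}))
```

The point of the mapping is that it is extra data: the glyph codes alone say nothing about which characters they stand for, so a PDF producer can only write this CMap if something upstream (such as the driver's embedded information) supplied the code-to-Unicode pairs.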
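The fallback described here can be sketched as a small decision procedure. This is a hypothetical Python helper, not how Ghostscript implements it; it covers only an explicit code-to-Unicode mapping and the ASCII guess, and leaves out the Identity case and the more obscure cases kens mentions.

```python
from typing import Optional


def extract_text(codes: bytes, to_unicode: Optional[dict[int, str]] = None) -> Optional[str]:
    """Recover text from single-byte glyph codes (hypothetical helper).

    Returns None when the codes cannot be mapped to characters.
    """
    if to_unicode is not None:
        # Best case: the producer supplied an explicit code -> Unicode mapping.
        try:
            return "".join(to_unicode[c] for c in codes)
        except KeyError:
            return None
    # Fallback: guess that printable-range codes are ASCII. If the font actually
    # used a custom encoding, this silently produces gibberish.
    if all(0x20 <= c <= 0x7E for c in codes):
        return codes.decode("ascii")
    # General case: nothing says what the codes mean, so no text is recoverable.
    return None
```

The ASCII branch is deliberately a guess: as kens notes, programmers have generally used ASCII for Latin text, but nothing in the PostScript guarantees it.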
StephDesc | I understand... he's totally right | 07:44.13 |
| Thanks Kens | 07:45.23 |
kens | You're welcome; it's an unfortunately complicated situation | 07:45.38 |
StephDesc | hum.. Yes! ;-) | 07:45.53 |
kens | It's basically because PostScript predates Unicode, or at least its widespread adoption, and neither PDF nor PostScript was ever intended to be editable :-) | 07:46.50 |
voices | is ghostscript a language | 19:42.50 |
| i just wrote (drew?) my first postscript diagram. | 19:43.38 |
| a square | 19:43.57 |
| with a hello world caption in times roman | 19:45.00 |
| the syntax is quite.. logical | 19:46.33 |
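For anyone following along with voices' first diagram, here is a hypothetical Python helper that writes out a comparable PostScript program: a stroked square with a caption in Times-Roman. The program voices actually wrote is not in the log; the coordinates, point size, and the helper name `square_with_caption` are arbitrary choices for illustration.

```python
def square_with_caption(text: str = "Hello world") -> str:
    """Return a small PostScript program (hypothetical helper) drawing a square and a caption."""
    # Escape the characters that are special inside PostScript string literals.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "\n".join([
        "%!PS",
        # A 200-point square with its lower-left corner at (100, 100).
        "newpath",
        "100 100 moveto",
        "200 0 rlineto",
        "0 200 rlineto",
        "-200 0 rlineto",
        "closepath",
        "stroke",
        # Select 24-point Times-Roman and place the caption below the square.
        "/Times-Roman findfont 24 scalefont setfont",
        "100 70 moveto",
        f"({escaped}) show",
        "showpage",
    ])
```

Writing the returned string to a file such as `square.ps` and opening it with Ghostscript (for example `gs square.ps`) should render the page. Note that `show` only passes the bytes of the string through the font's encoding to select glyphs; nothing in the program records which Unicode characters those glyphs represent, which is the limitation kens describes earlier in the log.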