| <<<Back 1 day (to 2020/02/05) | Fwd 1 day (to 2020/02/07) >>> | 20200206 |
RPaja | Hi guys! Sorry in advance for my writing errors... '=D ... I need suggestions about converting a PS file to a searchable PDF... | 17:26.41 |
| Some years ago I wrote an application that uses a Ghostscript library to convert a custom print directly to PDF (CMYK)... the application and the PDF generation work fine, but now I need to add a text search function | 17:29.00 |
chrisl | RPaja: There aren't really any suggestions: if the information is available, Ghostscript will produce a searchable PDF; if the information isn't there, well, it isn't there | 17:29.32 |
RPaja | @chrisl thanks for your reply.... I'm not sure I understand it... The Ghostscript documentation doesn't show anything about this.... =L | 17:31.39 |
chrisl | Well, it's not really Ghostscript specific, it's inherent in how PostScript uses fonts (and CIDFonts) | 17:33.03 |
RPaja | ok thanks... so i should review general PostScript documentation? | 17:37.06 |
chrisl | Well, maybe.... | 17:37.19 |
| The problem is that the way PostScript uses fonts effectively "decouples" the character code from the character it represents. | 17:38.03 |
| So, for example, just because a string contains the character code value 97, it will not necessarily map to the character 'a'. | 17:39.03 |
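To illustrate the point above, here is a minimal Python sketch (not PostScript, and not Ghostscript code) modelling how an encoding sits between a character code and the glyph it selects. The names `STANDARD_LIKE`, `SUBSET_REENCODED`, `GLYPH_TO_CHAR`, and `extract_text` are hypothetical, and the tables are tiny made-up samples rather than real font data.

```python
# Illustrative model only: a font's encoding maps character codes to glyph
# names, and text extraction needs a second step from glyph name to character.

# A small ASCII-like sample: code 97 selects the glyph named "a".
STANDARD_LIKE = {97: "a", 98: "b", 99: "c"}

# A hypothetical re-encoded subset font: the same codes select other glyphs,
# and the glyph names carry no obvious meaning.
SUBSET_REENCODED = {97: "g3", 98: "g1", 99: "g2"}

# Glyph name -> character, known only where the names are meaningful.
GLYPH_TO_CHAR = {"a": "a", "b": "b", "c": "c"}


def extract_text(codes, encoding, glyph_to_char):
    """Return the recoverable text; unknown glyphs become U+FFFD."""
    out = []
    for code in codes:
        glyph = encoding.get(code)
        out.append(glyph_to_char.get(glyph, "\ufffd"))
    return "".join(out)


codes = b"abc"  # the bytes 97, 98, 99 as they would appear in a string
print(extract_text(codes, STANDARD_LIKE, GLYPH_TO_CHAR))
print(extract_text(codes, SUBSET_REENCODED, GLYPH_TO_CHAR))
```

With the first table the string comes back as readable text. With the second, the same three codes still draw glyphs on the page, but nothing in the sketch says which characters they stand for, so a search would find nothing.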
RPaja | yeah... I understand... so I also embed the fonts, but they may not be mapped exactly as I expect... | 17:40.02 |
chrisl | That kind of remapping ("encoding" in PostScript terms) is especially common when embedding fonts! | 17:41.16 |
RPaja | ok... i'll try embedding fonts... | 17:41.48 |
chrisl | RPaja: Sorry, I meant embedding the fonts may well make the situation worse! | 17:45.49 |
| RPaja: I should also mention: it is obviously only possible at all if the PostScript actually contains text, and not just "stuff that looks like text" | 17:46.34 |
RPaja | ah ok... I understood exactly the opposite | 17:47.23 |
| @chrisl should I "emulate" it with OCR? | 17:48.11 |
| something like hidden text with reference to corresponding page... | 17:48.45 |
chrisl | You'd have to render to a bitmap format, and run the OCR on the image. If your page displays as a sampled image, you lose scalability | 17:49.23 |
| So, it is common to do that with scanned pages, but you lose a good deal of the benefit of PDF | 17:50.00 |
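The render-then-OCR route described above could be sketched in Python roughly as follows. This is only an outline under assumptions: it assumes the `gs` and `tesseract` command-line tools are installed and on the PATH (on Windows the Ghostscript executable has a different name), the helper names `render_command`, `ocr_command`, and `run` are hypothetical, and the file names are placeholders.

```python
import subprocess


def render_command(ps_path, png_pattern, dpi=300):
    """Ghostscript argv that renders each page of ps_path to a PNG bitmap."""
    return [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=png16m",              # 24-bit colour PNG output device
        f"-r{dpi}",                     # rendering resolution in dpi
        f"-sOutputFile={png_pattern}",  # e.g. page-%03d.png, one file per page
        ps_path,
    ]


def ocr_command(png_path, out_base):
    """Tesseract argv that writes out_base.pdf with an invisible text layer."""
    return ["tesseract", png_path, out_base, "pdf"]


def run(argv):
    """Run one external command and raise if it fails."""
    subprocess.run(argv, check=True)


# Example usage (not executed here; the paths are placeholders):
#   run(render_command("input.ps", "page-%03d.png"))
#   run(ocr_command("page-001.png", "page-001"))
```

The resulting PDF pages are sampled images with hidden text behind them, which is exactly the trade-off mentioned above: searchable, but no longer scalable vector output.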
RPaja | uhm... | 17:50.25 |
| meanwhile, thank you @chrisl for your replies and your feedback... I'll do a bit of coding .... Have a nice day! | 17:53.12 |
chrisl | RPaja: Just before you go...... | 17:53.29 |
RPaja | i'm here | 17:53.44 |
chrisl | So, the best way to achieve what you need would be for the PostScript to include relevant GlyphNames2Unicode dictionaries: https://ghostscript.com/doc/9.50/Language.htm#GlyphNames2Unicode | 17:53.50 |
| But as it is undocumented, it can be difficult to work out what's required! | 17:54.24 |
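As a rough conceptual model only: the Ghostscript page linked above appears to describe GlyphNames2Unicode as a dictionary whose keys are glyph names and whose values are UTF-16 codes, which Ghostscript can use when building the searchable text mapping. The Python sketch below only shows what such a name-to-Unicode table contains. It does not emit PostScript, the glyph names `g1` to `g3` and the helper `utf16_hex` are hypothetical, and the exact form the PostScript must take should be checked against that page.

```python
def utf16_hex(text):
    """UTF-16BE code units of text as an uppercase hex string."""
    return text.encode("utf-16-be").hex().upper()


# Hypothetical glyph names from a re-encoded font, paired with the text
# each glyph is meant to represent (an assumption made for illustration).
intended = {"g1": "a", "g2": "\u00e9", "g3": "fi"}

# Glyph name -> UTF-16 value, i.e. the information a search needs.
glyph_to_unicode = {name: utf16_hex(text) for name, text in intended.items()}

for name, value in glyph_to_unicode.items():
    print(name, value)
```

A glyph such as a ligature can map to more than one code unit, which is why the values here are strings of hex digits rather than single numbers.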
RPaja | I read about this while "googling"... but since it's "undocumented" I have no more info to use... I'll search again... | 17:55.28 |
| Thanks! bye! | 17:56.51 |
chrisl | Bye - sorry I couldn't be more help.... | 17:57.16 |
RPaja | no problem... thank you so much... | 17:57.33 |