Log of #ghostscript at irc.freenode.net.

20200206
RPaja Hi guys! I'm sorry in advance for my writing errors... '=D ... I need suggestions about converting a PS file to a searchable PDF... 17:26.41
  Some years ago I wrote an application that uses a Ghostscript library to convert a custom print directly to PDF (CMYK)... the application and the PDF generation work fine, but now I need to add a text search function 17:29.00
chrisl RPaja: There aren't really any suggestions: if the information is available, Ghostscript will produce a searchable PDF; if the information isn't there, well, it isn't there 17:29.32
RPaja @chrisl thanks for your reply.... I'm not sure I understand it... the Ghostscript documentation doesn't show anything about this.... =L 17:31.39
chrisl Well, it's not really Ghostscript specific, it's inherent in how Postscript uses fonts (and CIDfonts)17:33.03 
RPaja ok thanks... so i should review general PostScript documentation?17:37.06 
chrisl Well, maybe....17:37.19 
  The problem is that the way Postscript uses fonts, it effectively "decouples" the character code from the character it represents.17:38.03 
  So, for example, just because a string contains the character code value 97, it will not necessarily map to the character 'a'.17:39.03 
RPaja yeah... i understand... so i also embed fonts, but they might not be correctly mapped as I expect... 17:40.02
  *exactly mapped17:40.24 
chrisl That kind of remapping ("encoding" in Postscript terms) is especially common when embedding fonts! 17:41.16
RPaja ok... i'll try embedding fonts...17:41.48 
chrisl RPaja: Sorry, I meant embedding the fonts may well make the situation worse!17:45.49 
  RPaja: I should also mention: it is obviously only possible at all if the Postscript actually contains text, and not just "stuff that looks like text"17:46.34 
RPaja ah ok... i understood exactly the opposite 17:47.23
  @chrisl should I "emulate" it with OCR? 17:48.11
  something like hidden text with reference to corresponding page...17:48.45 
chrisl You'd have to render to a bitmap format, and run the OCR on the image. If your page displays as a sampled image, you lose scalability17:49.23 
  So, it is common to do that with scanned pages, but you lose a good deal of the benefit of PDF17:50.00 
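The render-then-OCR route could be sketched roughly as follows. This only builds the command lines and does not run them. The `png16m` device and the `-r`, `-sOutputFile`, `-dNOPAUSE` and `-dBATCH` switches are standard Ghostscript options; using Tesseract for the OCR step is an assumption (the chat names no OCR tool), and the file names and helper functions are hypothetical.

```python
import shlex


def render_command(ps_path, out_pattern="page-%03d.png", dpi=300):
    """Build a Ghostscript command that rasterises each page to a PNG."""
    return [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=png16m",               # 24-bit RGB PNG output device
        f"-r{dpi}",                      # resolution in dots per inch
        f"-sOutputFile={out_pattern}",   # %03d expands to the page number
        ps_path,
    ]


def ocr_command(image_path, output_base):
    """Build a Tesseract command (assumed OCR tool) that writes
    <output_base>.pdf: the page image plus an invisible text layer."""
    return ["tesseract", image_path, output_base, "pdf"]


# Print the commands instead of running them, so the sketch has no
# dependency on either tool being installed.
print(shlex.join(render_command("input.ps")))
print(shlex.join(ocr_command("page-001.png", "page-001")))
```

As noted above, the resulting pages are sampled images, so the output no longer scales like vector PDF content.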
RPaja uhm...17:50.25 
  meanwhile thank you @chrisl for your replies and your feedback... i'll do a bit of coding .... Have a nice day! 17:53.12
chrisl RPaja: Just before you go......17:53.29 
RPaja i'm here17:53.44 
chrisl So, the best way to achieve what you need would be for the Postscript to include relevant GlyphNames2Unicode dictionaries: https://ghostscript.com/doc/9.50/Language.htm#GlyphNames2Unicode 17:53.50
  But as it is undocumented, it can be difficult to work out what's required!17:54.24 
RPaja i read about this while "googling"... but since it's "undocumented" i have no more info to use... i'll search again... 17:55.28
  Thanks! bye!17:56.51 
chrisl Bye - sorry I couldn't be more help.... 17:57.16
RPaja no problem... thank you so much...17:57.33 