Ghostscript IRC logs

	<<<Back 1 day (to 2020/02/05)	Fwd 1 day (to 2020/02/07) >>>	20200206
RPaja	Hi guys! Sorry in advance for my writing errors... '=D ... I need suggestions about converting a PS file to a searchable PDF...		17:26.41
	Some years ago I wrote an application that uses a Ghostscript library to convert a custom print directly to PDF (CMYK)... the application and PDF generation work fine, but now I need to add a text search function		17:29.00
chrisl	RPaja: There aren't really any suggestions: if the information is available, Ghostscript will produce a searchable PDF; if the information isn't there, well, it isn't there		17:29.32
RPaja	@chrisl thanks for your reply.... I'm not sure I understand it... the Ghostscript document doesn't show anything about this.... =L		17:31.39
	*documentation		17:31.52
chrisl	Well, it's not really Ghostscript specific, it's inherent in how Postscript uses fonts (and CIDfonts)		17:33.03
RPaja	ok thanks... so i should review general PostScript documentation?		17:37.06
chrisl	Well, maybe....		17:37.19
	The problem is that the way Postscript uses fonts, it effectively "decouples" the character code from the character it represents.		17:38.03
	So, for example, just because a string contains the character code value 97, it will not necessarily map to the character 'a'.		17:39.03
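A minimal sketch of the decoupling chrisl describes, written in Python rather than PostScript. The two encoding tables and the glyph-to-Unicode table below are made-up illustrations, not taken from any real font; the point is only that the same character code can select different glyphs depending on the font's encoding, so text extraction needs a glyph-to-Unicode mapping as well.

```python
# Hypothetical encodings: character code -> glyph name.
# Neither table is copied from a real font; both are illustrative assumptions.
STANDARD_LIKE = {97: "a", 98: "b"}
REENCODED_SUBSET = {1: "a", 2: "b", 97: "Omega"}

# Hypothetical glyph name -> Unicode table (the kind of information
# a searchable PDF needs in order to recover the text).
GLYPH_TO_UNICODE = {"a": "a", "b": "b", "Omega": "\u03a9"}


def decode(codes, encoding):
    """Map character codes to text via glyph names; unknown codes become U+FFFD."""
    out = []
    for code in codes:
        glyph = encoding.get(code, ".notdef")
        out.append(GLYPH_TO_UNICODE.get(glyph, "\ufffd"))
    return "".join(out)


if __name__ == "__main__":
    # Code 97 selects 'a' under one encoding and a different glyph under the other.
    print(decode([97], STANDARD_LIKE))
    print(decode([97], REENCODED_SUBSET))
    # A re-encoded subset font can still spell "ab" using codes 1 and 2.
    print(decode([1, 2], REENCODED_SUBSET))
```

Without the glyph-to-Unicode step, a reader that treats the codes as ASCII would get the wrong text for the second table.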
RPaja	yeah... I understand... so I also embed fonts, but they may not be correctly mapped as I expect...		17:40.02
	*exactly mapped		17:40.24
chrisl	That kind of remapping ("encoding" in Postscript terms) is especially true when embedding fonts!		17:41.16
RPaja	ok... i'll try embedding fonts...		17:41.48
chrisl	RPaja: Sorry, I meant embedding the fonts may well make the situation worse!		17:45.49
	RPaja: I should also mention: it is obviously only possible at all if the Postscript actually contains text, and not just "stuff that looks like text"		17:46.34
RPaja	ah ok... I understood exactly the opposite		17:47.23
	@chrisl should I "emulate" it with OCR?		17:48.11
	something like hidden text with reference to corresponding page...		17:48.45
chrisl	You'd have to render to a bitmap format, and run the OCR on the image. If your page displays as a sampled image, you lose scalability		17:49.23
	So, it is common to do that with scanned pages, but you lose a good deal of the benefit of PDF		17:50.00
RPaja	uhm...		17:50.25
	meanwhile, thank you @chrisl for your replies and your feedback... I'll do a bit of coding .... Have a nice day!		17:53.12
chrisl	RPaja: Just before you go......		17:53.29
RPaja	i'm here		17:53.44
chrisl	So, the best way to achieve what you need, would be for the Postscript to include relevant GlyphNames2Unicode dictionaries: https://ghostscript.com/doc/9.50/Language.htm#GlyphNames2Unicode		17:53.50
	But as it is undocumented, it can be difficult to work out what's required!		17:54.24
RPaja	I read about this in my "googling"... but since it's "undocumented" I have no more info to use... I'll search again...		17:55.28
	Thanks! bye!		17:56.51
chrisl	Bye - sorry I couldn't be more help....		17:57.16
RPaja	no problem... thank you so much...		17:57.33

Log of #ghostscript at irc.freenode.net.