Log of #ghostscript at irc.freenode.net.

20190714
StephDesc Hi12:39.49 
ghostbot Welcome to #ghostscript. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come online. If you are looking for help or information about MuPDF, try the new #mupdf channel.12:39.49 
StephDesc When I print a PDF document through the GS printer driver, I cannot extract text from the generated postscript file... Am I missing something?12:40.41 
Robin_Watts It's possible that there is not enough information in the original document to extract the text.12:42.59 
  But why don't you try to extract the text direct from the PDF?12:43.15 
  Every transformation you put the document through increases the chance of information being lost.12:43.50 
StephDesc if I try to extract text from the PDF, it's ok, but when converted to PS it seems that the cmap font encoding is lost...12:45.35 
  In the driver I have tried to modify settings and embed all fonts, but it does not work...12:47.29 
  If I convert the PS to PDF and I analyze the PDF with Acrobat, it says: "Text cannot be mapped to Unicode"12:49.03 
  It seems that as soon as the PDF is processed through the PS driver, it loses the font information.12:50.02 
  Is this normal behavior, or is it an issue with the driver?12:54.43 
camelopard StephDesc: What version of Ghostscript are you using? Ancient versions had problems with text extraction.16:04.22 
StephDesc 9.2716:13.05 
camelopard StephDesc: Then somebody experienced needs to look into the problem. You can file a bug report and attach the original PDF file. In the worst case your bug report will be rejected.16:21.06 
kens StephDesc, if your PDF file uses CIDFonts (which I infer from the fact that you are talking about a CMap) then you have a problem. ps2write is a level 2 output device. Plain level 2 does not support CIDFonts.18:20.45 
  CIDFonts will be converted to multiple type 3 fonts and embedded that way.18:21.06 
  In addition, PostScript does not support ToUnicode CMap information, so that cannot be taken from the PDF file and used.18:21.34 
  PostScript is not an editable format, even less so than PDF. You shouldn't be trying to extract text from a PostScript file, it's not likely to work in general.18:22.15 
  If you have a PDF file and can extract the text from that, then that's what you should do. Converting to PostScript will only decrease (significantly so) the chances of extracting anything usable from it.18:22.54 
  NB you also haven't said what you mean by 'the GS printer driver'. There is no GS printer driver so presumably you are using one of the devices. I've assumed ps2write because you are expecting PostScript output, but perhaps you mean something else18:24.06 
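
As an illustration of the advice above to extract text from the PDF directly, here is a minimal Python sketch that runs Ghostscript's txtwrite device on the original PDF. The txtwrite device is not mentioned in this discussion, so its use here is an assumption; the helper names, the file names, and the `gs` executable name (on Windows it is typically `gswin64c`) are illustrative as well.

```python
import subprocess


def build_txtwrite_command(pdf_path, txt_path, gs_executable="gs"):
    """Return the argument list for a Ghostscript text-extraction run.

    The helper name and default executable name are illustrative assumptions.
    """
    return [
        gs_executable,
        "-q",                       # suppress startup messages
        "-dNOPAUSE",                # do not pause between pages
        "-dBATCH",                  # exit once the input is processed
        "-sDEVICE=txtwrite",        # text extraction device (assumed choice)
        f"-sOutputFile={txt_path}",
        pdf_path,
    ]


def extract_text(pdf_path, txt_path, gs_executable="gs"):
    """Run Ghostscript on the PDF; raises CalledProcessError on failure."""
    subprocess.run(
        build_txtwrite_command(pdf_path, txt_path, gs_executable),
        check=True,
    )
```

Calling `extract_text("original.pdf", "original.txt")` requires Ghostscript on the PATH. Whether the result is usable still depends on the fonts in the PDF carrying enough information.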
StephDesc kens, when I say GS printer driver, I mean the Ghostscript printer driver, the one which generates PostScript files. After having a closer look, it seems that PDFs with Identity-H font encoding are problematic. After being printed through the GS printer driver, text cannot be extracted.21:40.46 
  So if I understand correctly, that's because the PS language does not support that kind of font? Could you please confirm? Thanks!21:43.34 
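
To illustrate why the missing ToUnicode information matters for Identity-H text, here is a toy Python sketch. With Identity-H, each 2-byte code in a text string is used directly as a CID, so the bytes select glyphs rather than name characters; a ToUnicode CMap is what lets an extractor turn them back into text. The function name and the CID values in the table are hypothetical and are not taken from any real font.

```python
def decode_identity_h(data, to_unicode=None):
    """Decode a string shown with an Identity-H font.

    Returns the Unicode text when a ToUnicode table is supplied, or None
    when no table is available (the text cannot be mapped to Unicode).
    """
    if len(data) % 2 != 0:
        raise ValueError("Identity-H strings consist of 2-byte codes")
    # Identity-H: each big-endian 2-byte code is the CID itself.
    cids = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    if to_unicode is None:
        return None
    # Codes missing from the table become U+FFFD (replacement character).
    return "".join(to_unicode.get(cid, "\ufffd") for cid in cids)


# Hypothetical ToUnicode table: CID -> Unicode string.
example_to_unicode = {0x002B: "H", 0x004C: "i"}
shown = b"\x00\x2b\x00\x4c"

with_map = decode_identity_h(shown, example_to_unicode)
without_map = decode_identity_h(shown)
```

In this sketch `with_map` is recoverable text, while `without_map` is `None`, which mirrors the "Text cannot be mapped to Unicode" report once the ToUnicode information has been dropped.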