Log of #ghostscript at irc.freenode.net.

20201117
Intruder777 Hi. I'm trying to merge several PDF files into one using the gswin64.exe tool. The text in the resulting PDF looks good, but when I try to copy it, it copies some garbage instead of text (before merging, copying worked fine). Can someone help with this? 15:50.28
artifexirc-bot <KenSharp> Intruder777 everyone's in a meeting at the moment, hold on a bit please 15:51.07
Intruder777 Ok, thank you. Here is the command line I'm using: `gswin64.exe -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/default -dNOPAUSE -dQUIET -dBATCH -dDetectDuplicateImages -dCompressFonts=true -r150 -sOutputFile=output.pdf file1.pdf file2.pdf file3.pdf` . So file1.pdf has text which can be copied, and in output.pdf the clipboard gets garbage instead of text when copying 15:53.34
artifexirc-bot <KenSharp> Intruder777 I may get called away but let's start. 16:07.17
  <KenSharp> The first point is that Ghostscript (and more particularly the pdfwrite device) doesn't 'merge' PDF files 16:07.38
  <KenSharp> The process is described here: https://www.ghostscript.com/doc/9.53.3/VectorDevices.htm#Overview 16:07.45
  <KenSharp> What happens is that each input PDF file is interpreted into a series of marks on the page, which are sent to the pdfwrite device which then reassembles a new PDF file from them 16:08.18
  <KenSharp> So what's in the output PDF file (in terms of the language and syntax) bears no relation to what was in the input 16:08.36
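[Editor's note: a minimal sketch of how the command line quoted earlier in this log could be assembled as an argument list in Python. The helper name build_pdfwrite_command is hypothetical; every switch is copied unchanged from that command, and this is not an official Ghostscript wrapper. As described above, the inputs are interpreted one after another and pdfwrite writes a single new file from the resulting marks.]

```python
def build_pdfwrite_command(exe, output, inputs):
    """Return the argument list for one pdfwrite run over several inputs.

    Hypothetical helper: the switches are the ones quoted in this log,
    kept in the same order.
    """
    if not inputs:
        raise ValueError("at least one input file is required")
    return [
        exe,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dPDFSETTINGS=/default",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        "-dDetectDuplicateImages",
        "-dCompressFonts=true",
        "-r150",
        f"-sOutputFile={output}",
        *inputs,  # each input is interpreted in turn, in this order
    ]


cmd = build_pdfwrite_command(
    "gswin64.exe", "output.pdf", ["file1.pdf", "file2.pdf", "file3.pdf"]
)
print(" ".join(cmd))

# To actually run it (requires Ghostscript to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

[Passing a list rather than one long string avoids shell quoting problems if a file name contains spaces.]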
  <KenSharp> Now, it sounds to me like your input files do not have ToUnicode CMaps 16:08.49
  <KenSharp> In which case the PDF consumer lacks search information. 16:09.02
  <KenSharp> The best it can do (probably) is to use the character Encoding and pretend it's ASCII 16:09.17
  <KenSharp> The problem, probably, is that when pdfwrite creates the fonts for output, it is unable to preserve that encoding and uses a custom encoding 16:09.51
  <KenSharp> For example, if you had the text "Hello World" then H would be assigned the character code 1, e would be assigned 2, l would be assigned 3 and so on 16:10.25
  <KenSharp> Obviously that is not ASCII 16:10.30
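[Editor's note: a toy Python sketch of the "Hello World" example above. It is not how pdfwrite is implemented; the function names are hypothetical, and giving the space its own code is an assumption made for simplicity. It only shows why copying yields garbage when codes are handed out in order of first use and there is no ToUnicode mapping, and why a code-to-character table fixes it.]

```python
def assign_codes(text):
    """Give each distinct character a code in order of first use, starting at 1."""
    codes = {}
    for ch in text:
        if ch not in codes:
            codes[ch] = len(codes) + 1
    return codes


def naive_copy(encoded):
    """What a consumer without ToUnicode might do: treat each code as ASCII."""
    return "".join(chr(code) for code in encoded)


def copy_with_tounicode(encoded, to_unicode):
    """With a code -> character table, the original text is recoverable."""
    return "".join(to_unicode[code] for code in encoded)


text = "Hello World"
codes = assign_codes(text)
encoded = [codes[ch] for ch in text]

# Stand-in for a ToUnicode CMap: the inverse of the custom encoding.
to_unicode = {code: ch for ch, code in codes.items()}

print(encoded)                                   # [1, 2, 3, 3, 4, 5, 6, 4, 7, 3, 8]
print(repr(naive_copy(encoded)))                 # control characters, not text
print(copy_with_tounicode(encoded, to_unicode))  # Hello World
```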
Intruder777 The file1.pdf was created by MS Word: it was a plain .docx file which was "saved as..." PDF. And it has non-English characters 16:10.35
artifexirc-bot <KenSharp> Well, obviously I'm guessing because I haven't seen your PDF files 🙂 16:11.00
  <KenSharp> If the non-Latin text is searchable then it ought to have ToUnicode CMaps (assuming by non-English you don't mean something like French) 16:11.32
  <KenSharp> You also haven't mentioned which version of Ghostscript you are using 16:12.17
Intruder777 GPL Ghostscript 9.20 16:12.57
artifexirc-bot <KenSharp> Well the current version is 9.53.3 16:13.10
  <KenSharp> 9.20 is 4 years old 16:13.25
  <KenSharp> So first I'd suggest you try updating 16:13.33
Intruder777 I see. As for the original PDF file, it has something much worse than French: Cyrillic 16:13.49
artifexirc-bot <KenSharp> Other than that I'd need to see the input files. The simplest way to provide those is to open a bug report at bugs.ghostscript.com 16:14.02
  <KenSharp> But broadly speaking I'd expect that the problem is the fonts are being re-encoded and there is either no ToUnicode information, or the ToUnicode is being lost 16:14.47
Intruder777 the text in console says "Substituting font Helvetica for AralMT. Loading NimbusSans-Regular font from %rom%Resource/Font/NibusSans-Regular... " 16:15.15
artifexirc-bot <KenSharp> Well that's a bad start 16:15.26
Intruder777 I see. I'm going to try to install a recent version now... 16:15.36
artifexirc-bot <KenSharp> But again it depends on what kind of font it is. If it's a CIDFont then the font **ought** to be embedded; if it isn't then there will be problems 16:16.19
  <KenSharp> Got to go to another channel, will bbs 16:16.29
Intruder777 Looks like the latest version of GS helped. Now the same command line produces an output PDF where I can copy/search Cyrillic text 16:29.01
artifexirc-bot <KenSharp> Well we do improve things from time to time 🙂 16:29.21
Intruder777 Thank you for your help. I figured out that I already had the latest version installed in my ProgramFiles folder, but there was some other stuff in my %PATH%, so an old 9.20 version embedded in some other software was being used. 16:31.14
artifexirc-bot <KenSharp> Yeah the $PATH environment variable doesn't get set by the installer on Windows so you have to do that manually 16:31.47
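[Editor's note: a small Python sketch for checking which Ghostscript executable a bare command name resolves to, since an older copy earlier on PATH was the cause here. The list of candidate names and both function names are assumptions for illustration; gswin64c is the console build on Windows and gs is the usual name elsewhere. The --version switch prints the version number, though a GUI build may not write it to a captured stdout.]

```python
import shutil
import subprocess

# Names to look up; an assumption for illustration, adjust to your install.
CANDIDATES = ["gswin64c", "gswin64", "gswin32c", "gs"]


def resolve_on_path(names):
    """Map each name to the first matching executable on PATH, if any."""
    found = {}
    for name in names:
        path = shutil.which(name)
        if path is not None:
            found[name] = path
    return found


def reported_version(path):
    """Ask a console build of Ghostscript for its version via --version."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=30
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for name, path in resolve_on_path(CANDIDATES).items():
        # If the path is not where you installed the latest release,
        # an older copy is shadowing it on PATH.
        print(f"{name} -> {path}")
```

[If the printed path points somewhere unexpected, either call the intended executable by its full path or move its directory earlier in PATH.]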
Intruder777 BTW, the latest 9.53.3 version still says those things on the console: "Substituting font Helvetica for AralMT. Loading NimbusSans-Regular font from %rom%Resource/Font/NibusSans-Regular... ". But anyway, the output result is much better. 16:33.21
artifexirc-bot <KenSharp> Probably means that that font wasn't the one causing you a problem. I'd guess the Cyrillic font is a different one 16:33.53
  <KenSharp> You can always use the Windows TT font as a substitute for a Font (but not a CIDFont) 16:34.20
  <KenSharp> You just need to tell Ghostscript that it's an alias by editing the font map 16:34.33
Intruder777 The font name in the original docx file (which was saved as PDF) is Calibri 16:35.41
artifexirc-bot <KenSharp> Then I'm guessing Calibri was embedded in the PDF file or you would get a message it was being substituted 16:36.12
Intruder777 yeah, so probably the issue was totally related to the GS version only... 16:37.18
artifexirc-bot <KenSharp> I would think so, it's presumably something we've enhanced over the years 16:37.40
Intruder777 Thanks again! 16:37.55
artifexirc-bot <KenSharp> No problem, have a good day 16:38.05
  <KenSharp> @NancyABQ and there's a bmpcmp now as well 16:38.45