Log of #ghostscript at irc.freenode.net.

2022/05/10
artifexirc-bot <Knaldgas> Debugging an issue (on AIX) I found a gs process listening on TCP/IP port 1237. ps -ef : /usr/bin/gs -P- -dSAFER -dCompatibilityLevel=1.4 -sPAPERSIZE=a4 -q -P- -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sstdout=%stderr -sOutputFile=xxx.pdf -P- -dSAFER -dCompatibilityLevel=1.4 -sPAPERSIZE=a4 -c .setpdfwrite -f xxx (probably truncated here).06:01.04 
  <Knaldgas> I wasn't aware that gs could start listening for TCP/IP connections?! - Unfortunately in this case it seized a port that another process needed. Any thoughts?06:02.21 
  <KenSharp> As far as I know Ghostscript (as supplied by us) has no ability to listen to TCP/IP at all.06:59.00 
  <Knaldgas> KenSharp, right, that concurs with what I have found on that issue (nothing). But that leaves me quite baffled as to what I saw... The OS identified the gs process as "LISTENING" on that port, and when I killed the process, the port was released.07:11.14 
  <KenSharp> I've no idea why that would be the case.07:11.42 
  <Knaldgas> Odd07:11.43 
  <Knaldgas> KenSharp, thanks a lot for your feedback! :-)07:12.02 
  <KenSharp> You could try building Ghostscript from the source we supply and see if it behaves the same07:12.33 
  <KenSharp> Rather than using an OS package07:12.41 
  <Knaldgas> The gs process was started by ps2pdf package, not that I understand more from that. I'm not sure if I can get gs to repeat what it did, but building it from your sources might be on the agenda - thanks again :)07:18.13 
  <KenSharp> Unless there's something odd going on, ps2pdf is nothing more than a shell script (a very over-complicated shell script) which starts GS with a couple of parameters. You can easily get the same result by just running Ghostscript directly07:19.14 
  <Knaldgas> KenSharp, could try that, thanks07:20.04 
  <KenSharp> We supply a ps2pdf script in the source somewhere, I was under the impression that was what the package maintainers use07:20.07 
  <KenSharp> Yeah in ghostpdl/lib is ps2pdf which is the relevant script07:20.43 
  <KenSharp> Yes, as I thought. ps2pdf calls another shell script based on the PDF version required, so usually it calls ps2pdf14, which in turn calls ps2pdfwr with -dCompatibilityLevel=1.4. ps2pdfwr adds on some more options: -P- -dSAFER -q -P- (again) -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sstdout=%stderr -sOutputFile=07:23.44 
  <KenSharp> That is, of course, all assuming that it's using the ps2pdf script07:24.13 
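The chain described above (ps2pdf, then ps2pdf14, then ps2pdfwr, then gs) can be collapsed into one direct invocation, which is one way to check whether the wrapper scripts play any part in the behaviour. A minimal Python sketch, assuming only the options quoted in the chat; the helper names `build_gs_args` and `convert`, the file names, and the `gs` executable name are illustrative, and the real scripts may add or order options differently depending on the Ghostscript version:

```python
import subprocess


def build_gs_args(input_path, output_path, compat_level="1.4", gs="gs"):
    """Return the argument list for a direct pdfwrite run.

    Hypothetical helper: the options mirror the ones quoted in the chat,
    not necessarily what a given ps2pdf version emits.
    """
    return [
        gs,
        "-P-", "-dSAFER",
        f"-dCompatibilityLevel={compat_level}",
        "-q", "-P-", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=pdfwrite",
        "-sstdout=%stderr",
        f"-sOutputFile={output_path}",
        input_path,
    ]


def convert(input_path, output_path):
    # Runs Ghostscript directly, bypassing the wrapper scripts.
    # Raises CalledProcessError if gs exits with a non-zero status.
    subprocess.run(build_gs_args(input_path, output_path), check=True)
```

If the port only shows up when going through the packaged ps2pdf and not with a direct call such as `convert("xxx.ps", "xxx.pdf")`, the wrapper or its environment would be the next thing to look at.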
  <chrisl> We don't have any networking code in Ghostscript, so I'd have to guess it's a third party library. Might be worth looking at what dynamic libs are linked.08:13.47 
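The suggestion to look at the linked dynamic libraries can be sketched roughly as follows. This assumes an `ldd` command is available on the system and that its output is a plain line-per-library listing; the helper names and the list of name hints are assumptions for illustration, not a definitive set of libraries that open sockets:

```python
import subprocess

# Assumed, non-exhaustive name hints; adjust for the platform being checked.
NETWORK_HINTS = ("cups", "curl", "ssl", "socket", "nsl", "dbus")


def network_related(ldd_output, hints=NETWORK_HINTS):
    """Return the lines of an ldd-style listing that mention any hint."""
    matches = []
    for line in ldd_output.splitlines():
        lowered = line.lower()
        if any(hint in lowered for hint in hints):
            matches.append(line.strip())
    return matches


def linked_libraries(binary="/usr/bin/gs"):
    # Assumes `ldd` exists and accepts the binary path as its only argument.
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Something like `network_related(linked_libraries())` would then list candidate libraries. A match only narrows down where to look; it does not show that the library actually opened the port.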
  <qwertynik> Thanks @mvrhel for remembering the request and tagging me here. Appreciate it. Had taken a look at the link, looks like instead of BlackText will need to use those params.09:38.13 
  <qwertynik> However, would text using Type 3 fonts be covered by these options?09:38.13 
  <qwertynik> Thanks to Corona, businesses will digitize even faster. And naturally PDF documents will gain even higher traction. 09:45.02 
  <qwertynik> In terms of rendering and filling forms, the PDF format works great. However, extracting data, which it was not originally meant for anyway, is not so straightforward. 09:45.03 
  <qwertynik> Given the presence of experts here, I wanted to understand whether there would be changes to the spec to make data extraction *simpler*. Also, are there any social media handles such as this one to follow to keep track of the latest developments in the PDF world - changes in specifications, availability of tools/services etc?09:45.06 
  <qwertynik> @here09:47.20 
  <Robin_Watts> Changes to the spec? Unlikely, IMHO.09:57.51 
  <Robin_Watts> How well you can extract information from a PDF file depends, largely, on how well the PDF file is constructed in the first place.09:58.28 
  <Robin_Watts> Most PDF construction programs are satisfied with getting stuff looking right - being able to be searched is a bonus. Actually being able to extract the data meaningfully is a very poor third place.10:01.25 
  <Robin_Watts> If you follow the PDF spec then you can produce PDFs where the raw text can be extracted fairly well. Ghostscript, for example, does a good job of making PDFs where the text can be extracted as text, rather than gobbledegook.10:03.10 
  <Robin_Watts> What's much harder is to make a PDF whereby the structure of a document is extractable (this text story flows down this column, then this one, then we have a table, then it continues on page 3 etc)10:04.03 
  <Robin_Watts> I think the StructTree stuff is supposed to allow that kind of information to be encoded - but the problem is most PDF generators aren't given that information, so they can't hope to encode it into the generated PDF file.10:04.50 
  <Robin_Watts> And if the info is rarely there, PDF consumers don't bother to implement the code to make use of it if it is.10:05.25 
  <Robin_Watts> so if no one uses it, why generate it? It's a vicious circle.10:05.40 
  <Robin_Watts> And if you're scanning/OCRing documents to get your PDF, you can't ever hope to accurately have that information either.10:06.36 
  <Robin_Watts> So people tend to just work from first principles and try to guess stuff on extract.10:06.53 
  <qwertynik> This would be great to have. However, having semantic representation of objects - headings, lists, paragraphs should be a good starting step.10:15.47 
  <qwertynik> Had never heard of it. Thanks for mentioning. Hopefully its adoption increases.10:16.36 
  <qwertynik> Yes. But given the increased digitization, being able to extract content with greater ease from PDFs could become essential.10:19.30 
  <qwertynik> Yes. Cumbersome but the only option as of now.10:28.26 