Log of #ghostscript at irc.freenode.net.

20210408
marcoagpinto Hello13:07.22 
ghostbot Welcome to #ghostscript. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line. If you are looking for help or information about MuPDF, try the new #mupdf channel.13:07.22 
marcoagpinto how do I output into .docx format using the latest ghostscript + cutepdfwriter13:07.47 
artifexirc-bot <KenSharp> I can't comment on 'cutepdfwriter'13:08.29 
  <KenSharp> I'd suggest you start by using Ghostscript from the command line and satisfy yourself that you can get that working13:08.55 
  <KenSharp> After that you can seek help with cutepdfwriter13:09.05 
marcoagpinto ahhhh13:09.32 
  so, I have a PDF, how do I output to .docx using the command line?13:09.46 
  :p13:09.47 
  thnaks13:09.54 
  thanks*13:09.57 
  I was thinking a lot about buying the Acrobat DC converter to convert my Europass to .docx but it is a subscription service :(13:11.02 
  and I only update my CV very rarely13:11.10 
artifexirc-bot <KenSharp> You need to do something like "gs -sDEVICE=docxwrite -o out.docx <input.pdf>"13:11.13 
marcoagpinto C:\Users\marco>gs13:12.10 
  'gs' is not recognized as an internal or external command,13:12.10 
  operable program or batch file.13:12.10 
artifexirc-bot <KenSharp> If you are on Windows then you need one of gswin32, gswin32c, gswin64 or gswin64c depending on whether you installed a 32 or 64 bit executable and want to run the command line or windowed version13:12.51 
marcoagpinto I installed both 32 and 6413:13.18 
  :)13:13.19 
  I always install both13:13.25 
artifexirc-bot <KenSharp> Then it's up to you which you use13:13.32 
  <KenSharp> Make sure that the paths are in $PATH13:13.41 
  <KenSharp> Or specify the full path to the executable13:13.56 
marcoagpinto ahhhh13:14.06 
  let me try to find the path13:14.11 
  should be c:\program files blah blah13:14.21 
artifexirc-bot <KenSharp> Probably "c:\program files\gs\gs9.54.0\bin\gswin64c"13:14.28 
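The advice so far (pick the right executable name, make sure it is on the path or give the full path, then call the docxwrite device) can be sketched in Python. This is only an illustration: the helper names `find_ghostscript` and `docx_command` are hypothetical, the candidate list is an assumption drawn from the names mentioned in the log, and the fallback path is the "probably" path suggested above, which may differ per install.

```python
import shutil

# Console executable names from the log; "gs" is the usual name on non-Windows systems.
CANDIDATES = ["gswin64c", "gswin32c", "gs"]


def find_ghostscript(candidates=CANDIDATES):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def docx_command(gs_exe, input_pdf, output_docx):
    """Build the argument list for a docxwrite conversion (nothing is run here)."""
    return [gs_exe, "-sDEVICE=docxwrite", "-o", output_docx, input_pdf]


if __name__ == "__main__":
    # Fall back to the path suggested in the log if nothing is found on PATH.
    exe = find_ghostscript() or r"c:\program files\gs\gs9.54.0\bin\gswin64c"
    print(docx_command(exe, "input.pdf", "out.docx"))
```

Passing the resulting list to `subprocess.run(..., check=True)` would run the conversion without going through a shell, which avoids quoting problems with the space in "program files".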
marcoagpinto Buaaaaaa13:21.35 
  gswin64c -sDEVICE=docxwrite -o c:\d\out.docx c:\d\cveuro.pdf13:21.39 
  it converted 8 pages into just a few lines of text13:21.57 
  :(13:21.58 
  the .docx only has a few lines13:22.09 
artifexirc-bot <KenSharp> Try out%d.docx13:22.38 
  <KenSharp> So that you get a file per page13:22.44 
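A minimal sketch of the file-per-page suggestion, assuming the same docxwrite command as before. The helper names are hypothetical. The `%d` in the output name is replaced by Ghostscript with the page number; when the arguments are passed as a list (no shell), it reaches Ghostscript unchanged, whereas inside a Windows batch file the percent sign would need doubling.

```python
import os


def per_page_pattern(output_path):
    """Insert %d before the extension, e.g. out.docx -> out%d.docx."""
    root, ext = os.path.splitext(output_path)
    return f"{root}%d{ext}"


def per_page_command(gs_exe, input_pdf, output_path):
    """Build a docxwrite command that writes one output file per page."""
    return [gs_exe, "-sDEVICE=docxwrite", "-o", per_page_pattern(output_path), input_pdf]
```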
  <KenSharp> And I'll see if the relevant developer is here13:22.55 
  <KenSharp> The docxwrite developer is on vacation today, if you continue to have trouble I'd suggest coming back and trying again tomorrow13:24.29 
  <KenSharp> ping @cgdae when you are around13:24.40 
marcoagpinto it output 8 .docx files but they don't have contents13:25.49 
  :)13:25.49 
  the original pdf even has images13:26.24 
artifexirc-bot <KenSharp> Well, it may well depend on the content of your PDF file. You're almost certainly going to have to make that available before anyone can comment13:26.29 
  <KenSharp> You won't get images out at the moment I believe, just text13:26.45 
marcoagpinto https://proofingtoolgui.org/proofingtoolgui_files/Europass-CV-marcoagpinto_20200731_en_BLANK.pdf13:27.06 
  here is the public PDF with my CV13:27.17 
artifexirc-bot <KenSharp> Crikey you sure you want that public ?13:27.33 
marcoagpinto what?13:27.43 
  I am a developer13:27.52 
artifexirc-bot <KenSharp> Making your CV public13:27.52 
  <KenSharp> Not sure I would do that 🙂13:27.57 
marcoagpinto I have my CV public13:27.57 
chrisl It doesn't look like there is much "text" in there.....13:28.02 
  It's almost all images13:28.24 
artifexirc-bot <KenSharp> DOB gender nationality are all text13:28.25 
marcoagpinto ohhhhhhhhh13:28.27 
  :((((((((13:28.29 
  I simply used the European site to create a link, saved as HTML and then formatted it with BlueGriffon, opened with Firefox and printed into PDF format using CutePDFWriter + Ghostscript13:29.30 
artifexirc-bot <KenSharp> The first page has text the other pages don't seem to13:30.00 
chrisl Yeh, printing from firefox tends to render large chunks of the page to images....13:30.19 
artifexirc-bot <KenSharp> Using txtwrite (which dumps simple text as Unicode) I get 6 lines of text on page 113:30.24 
  <KenSharp> Date of Birth, Gender etc13:30.33 
marcoagpinto KenSharp: that is what I got... a few lines of text in page 113:30.53 
artifexirc-bot <KenSharp> The final page has 1 line of text "Report content:...."13:31.00 
  <KenSharp> Then it's working as expected, that's the only text in the PDF file13:31.12 
chrisl Very much in the YMMV vein, you *could* try running it through our OCR device.... if you are willing to set things up so it will work13:31.20 
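The txtwrite check described above is a quick way to see how much real text a PDF contains before blaming the docx output. A rough sketch, with hypothetical helper names; it assumes a Ghostscript console executable is available and sends the txtwrite output to stdout with `-sOutputFile=-`, using `-q` to keep the startup banner out of the captured text.

```python
import subprocess


def txtwrite_command(gs_exe, input_pdf):
    """Build a command that dumps the PDF's text to stdout via txtwrite."""
    return [gs_exe, "-q", "-dNOPAUSE", "-dBATCH",
            "-sDEVICE=txtwrite", "-sOutputFile=-", input_pdf]


def count_text_lines(extracted):
    """Count the non-blank lines in the extracted text."""
    return sum(1 for line in extracted.splitlines() if line.strip())


def extract_text(gs_exe, input_pdf):
    """Run Ghostscript and return whatever text txtwrite found."""
    result = subprocess.run(txtwrite_command(gs_exe, input_pdf),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

If `count_text_lines(extract_text(...))` comes back with only a handful of lines for a multi-page document, the pages are most likely images, and a text-only device such as docxwrite has nothing more to extract.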
marcoagpinto what if I buy the Adobe subscription to convert to .docx?13:32.00 
  will it convert even the images?13:32.05 
  I tried to open the pdf with Word 365 and it messed up most of it13:32.17 
chrisl You'll have to ask Adobe....13:32.19 
artifexirc-bot <KenSharp> Well then you are into Adobe territory. I'm sure it will get 'something' out13:32.19 
  <KenSharp> I believe Adobe uses OCR so it'll probably get some content out of the images13:32.34 
marcoagpinto :(((13:32.57 
chrisl But does it use it for exporting docx? Or do you have to do one, then the other?13:33.04 
artifexirc-bot <KenSharp> @chrisl I have no clue, I have never used it13:33.16 
  <KenSharp> I **believe** it uses OCR when exporting, but I could so easily be wrong13:33.35 
  <KenSharp> It may retain the images as images13:33.41 
marcoagpinto the problem is that Europa site only exports in PDF, but I wanted to edit the CV to change colours and formatting and add page numbers13:33.51 
artifexirc-bot <KenSharp> Well you won't be able to do that with a PDF file!13:34.12 
marcoagpinto that is why I sent an e-mail to the European guys asking why it doesn't export in ODT and DOC like years ago13:34.44 
  they never replied13:34.49 
artifexirc-bot <KenSharp> As @chrisl said, if you can rebuild Ghostscript you could add Tesseract and Leptonica, then run the PDF file through the pdfocr24 device13:34.52 
  <KenSharp> That would OCR all the content and produce something that the docx device could extract13:35.09 
chrisl If you have 9.54.0, you don't need to rebuild gs, you just need to get the training data, and point gs at it13:35.26 
artifexirc-bot <KenSharp> Note that it would still only be text, it won't get the images out13:35.38 
marcoagpinto :(13:35.44 
artifexirc-bot <KenSharp> Oh yes, I forgot!13:35.45 
marcoagpinto thank you for your help13:36.05 
artifexirc-bot <KenSharp> The training data is available on the net you just have to download the relevant language pack13:36.06 
marcoagpinto :)13:36.06 
chrisl Step 3 here: https://ghostscript.com/ocr.html13:36.25 
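The two-step route discussed above (OCR the image-only PDF with pdfocr24, then feed the result to docxwrite) might look roughly like this. It is a sketch under assumptions, not a tested recipe: the helper names are hypothetical, and the `-sOCRLanguage` option and the `TESSDATA_PREFIX` environment variable are my assumptions about how Ghostscript is pointed at the Tesseract training data, so check them against the page linked above. As noted in the log, the result would still be text only.

```python
import os


def ocr_then_docx_commands(gs_exe, input_pdf, ocr_pdf, output_docx, language="eng"):
    """Build the two commands: PDF -> OCR'd PDF, then OCR'd PDF -> docx."""
    ocr_step = [gs_exe, "-sDEVICE=pdfocr24", f"-sOCRLanguage={language}",  # assumed option
                "-o", ocr_pdf, input_pdf]
    docx_step = [gs_exe, "-sDEVICE=docxwrite", "-o", output_docx, ocr_pdf]
    return [ocr_step, docx_step]


def ocr_environment(tessdata_dir, base_env=None):
    """Copy the environment and point it at the downloaded training data (assumed variable)."""
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = tessdata_dir
    return env
```

Each command could then be run in order with `subprocess.run(cmd, env=ocr_environment(tessdata_dir), check=True)`, stopping at the first failure.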
marcoagpinto if I win the EuroMillions tomorrow I will subscribe to Adobe's more expensive pack that includes a PDF editor13:36.31 
  :)13:36.32 
artifexirc-bot <KenSharp> PDF editing is never reliable13:36.49 
chrisl Adobe's PDF editor doesn't really edit PDFs13:36.53 
marcoagpinto no?13:37.01 
  :(13:37.02 
chrisl It manages certain sledgehammery sort of manipulations, not really "editing"13:37.29 
artifexirc-bot <KenSharp> Definitely not. You may be able to edit some limited parts of a PDF, but the format was never intended for editing and so isn't really editable13:37.31 
marcoagpinto ahhhh13:37.47 
  then I will buy just the .docx converter... $12 or so per month13:38.02 
  :)13:38.04 
  but only when I am about to update my CV which will be after my thesis is ready13:38.22 
artifexirc-bot <KenSharp> That's probably the most likely solution to work. Though as @chrisl said, you could try it with our stuff for free13:38.31 
marcoagpinto I know I should try the stuff, but I always get stressed13:38.58 
artifexirc-bot <KenSharp> Your choice 🙂13:39.12 
marcoagpinto it is me who commits the English dictionaries for LibreOffice every six months... and I always get stressed as hell13:39.26 
artifexirc-bot <KenSharp> $12 isn't so much if you don't need it for long13:39.27 
marcoagpinto usually I update the CV twice a year13:41.04 
  :)13:41.05 
artifexirc-bot <KenSharp> Well no doubt our features and support will improve, the OCR feature and .docx feature are both new13:42.02 
marcoagpinto :)13:45.06 
  thank you for all your help :)13:46.49 
artifexirc-bot <KenSharp> NP good luck with your CV ;-D13:47.03 