| <<<Back 1 day (to 2021/04/07) | Fwd 1 day (to 2021/04/09) >>> | 20210408 |
marcoagpinto | Hello | 13:07.22 |
ghostbot | Welcome to #ghostscript. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line. If you are looking for help or information about MuPDF, try the new #mupdf channel. | 13:07.22 |
marcoagpinto | How do I output to .docx format using the latest Ghostscript + CutePDF Writer? | 13:07.47 |
artifexirc-bot | <KenSharp> I can't comment on 'cutepdfwriter' | 13:08.29 |
| <KenSharp> I'd suggest you start by using Ghostscript from the command line and satisfy yourself that you can get that working | 13:08.55 |
| <KenSharp> After that you can seek help with cutepdfwriter | 13:09.05 |
marcoagpinto | ahhhh | 13:09.32 |
| so, I have a PDF, how do I output to .docx using the command line? | 13:09.46 |
| :p | 13:09.47 |
| thanks | 13:09.54 |
| I was thinking a lot about buying the Acrobat DC converter to convert my Europass CV to .docx, but it is a subscription service :( | 13:11.02 |
| and I only update my CV occasionally | 13:11.10 |
artifexirc-bot | <KenSharp> You need to do something like "gs -sDEVICE=docxwrite -o out.docx <input.pdf>" | 13:11.13 |
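KenSharp's suggestion is a single Ghostscript invocation. A minimal Python sketch that builds the same argument list for `subprocess.run`; the helper names `docxwrite_args` and `convert` are hypothetical, and the default executable name `gs` only applies on Linux/macOS (on Windows the name differs).

```python
import subprocess


def docxwrite_args(input_pdf, output_docx, gs_exe="gs"):
    """Return the argv for converting a PDF to .docx with the docxwrite device.

    -o sets the output file and also implies -dBATCH -dNOPAUSE, so
    Ghostscript exits when it is done instead of waiting for input.
    """
    return [gs_exe, "-sDEVICE=docxwrite", "-o", output_docx, input_pdf]


def convert(input_pdf, output_docx, gs_exe="gs"):
    # check=True raises CalledProcessError if Ghostscript exits non-zero.
    subprocess.run(docxwrite_args(input_pdf, output_docx, gs_exe), check=True)
```

For example, `convert("input.pdf", "out.docx")` runs the equivalent of `gs -sDEVICE=docxwrite -o out.docx input.pdf`.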
marcoagpinto | C:\Users\marco>gs | 13:12.10 |
| 'gs' is not recognized as an internal or external command, | 13:12.10 |
| operable program or batch file. | 13:12.10 |
artifexirc-bot | <KenSharp> If you are on Windows, you need one of gswin32, gswin32c, gswin64 or gswin64c, depending on whether you installed a 32- or 64-bit executable and whether you want the windowed or the command-line (the 'c' suffix) version | 13:12.51 |
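The naming rule for the four Windows executables can be written as a small function. This is a sketch only; `windows_gs_name` is a hypothetical helper, and it encodes just the two choices mentioned: installer bitness and console versus windowed build.

```python
def windows_gs_name(bits, console=True):
    """Pick the Windows Ghostscript executable name.

    bits: 32 or 64, matching the installer that was used.
    console: True for the command-line build (trailing 'c'),
             False for the windowed build.
    """
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    return f"gswin{bits}" + ("c" if console else "")
```

For scripted conversions like the one in this log, the console build (`gswin64c` or `gswin32c`) is usually the one you want, since it writes messages to the terminal.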
marcoagpinto | I installed both 32 and 64 | 13:13.18 |
| :) | 13:13.19 |
| I always install both | 13:13.25 |
artifexirc-bot | <KenSharp> Then it's up to you which one you use | 13:13.32 |
| <KenSharp> Make sure that the Ghostscript bin directory is in your PATH | 13:13.41 |
| <KenSharp> Or specify the full path to the executable | 13:13.56 |
marcoagpinto | ahhhh | 13:14.06 |
| let me try to find the path | 13:14.11 |
| should be c:\program files blah blah | 13:14.21 |
artifexirc-bot | <KenSharp> Probably "c:\program files\gs\gs9.54.0\bin\gswin64c" | 13:14.28 |
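Both options (rely on PATH, or give the full path) can be combined in a lookup. A hedged sketch: `find_gs` is a hypothetical helper, and the install directory you pass as a fallback is version-specific (KenSharp's `c:\program files\gs\gs9.54.0\bin` is a guess for 9.54.0, not a fixed location).

```python
import os
import shutil


def find_gs(names=("gswin64c", "gswin32c", "gs"), fallback_dirs=()):
    """Return a path to a Ghostscript executable, or None if none is found.

    First tries each name on PATH (on Windows, shutil.which also applies
    PATHEXT, so 'gswin64c' matches 'gswin64c.exe'). Then checks each
    directory in fallback_dirs for the same names, with or without '.exe'.
    """
    for name in names:
        found = shutil.which(name)
        if found:
            return found
    for directory in fallback_dirs:
        for name in names:
            for candidate in (name, name + ".exe"):
                path = os.path.join(directory, candidate)
                if os.path.isfile(path):
                    return path
    return None
```

Usage: `find_gs(fallback_dirs=[r"C:\Program Files\gs\gs9.54.0\bin"])` returns the first match, or `None` if Ghostscript is neither on PATH nor in that directory.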
marcoagpinto | Buaaaaaa | 13:21.35 |
| gswin64c -sDEVICE=docxwrite -o c:\d\out.docx c:\d\cveuro.pdf | 13:21.39 |
| it converted 8 pages into just a few lines of text | 13:21.57 |
| :( | 13:21.58 |
| the .docx only has a few lines | 13:22.09 |
artifexirc-bot | <KenSharp> Try out%d.docx | 13:22.38 |
| <KenSharp> So that you get one file per page | 13:22.44 |
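With `%d` in the output name, Ghostscript substitutes the page number (starting at 1) in printf style. A small sketch, using a hypothetical helper `per_page_names`, lists the files you should expect for a given page count, which is handy for checking that an 8-page PDF produced 8 outputs.

```python
def per_page_names(template, page_count):
    """List the files Ghostscript writes when the output name contains %d.

    The page number, starting at 1, replaces the printf-style %d,
    so 'out%d.docx' yields one file per page.
    """
    return [template % page for page in range(1, page_count + 1)]
```

One Windows detail: typed at the `cmd` prompt, `out%d.docx` works as-is, but inside a `.bat` file the percent sign must be doubled (`out%%d.docx`).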
| <KenSharp> And I'll see if the relevant developer is here | 13:22.55 |
| <KenSharp> The docxwrite developer is on vacation today, if you continue to have trouble I'd suggest coming back and trying again tomorrow | 13:24.29 |
| <KenSharp> ping @cgdae when you are around | 13:24.40 |
marcoagpinto | it output 8 .docx files, but they don't have any content | 13:25.49 |
| :) | 13:25.49 |
| the original pdf even has images | 13:26.24 |
artifexirc-bot | <KenSharp> Well, it may well depend on the content of your PDF file. You're almost certainly going to have to make that available before anyone can comment | 13:26.29 |
| <KenSharp> You won't get images out at the moment I believe, just text | 13:26.45 |
marcoagpinto | https://proofingtoolgui.org/proofingtoolgui_files/Europass-CV-marcoagpinto_20200731_en_BLANK.pdf | 13:27.06 |
| here is the public PDF with my CV | 13:27.17 |
artifexirc-bot | <KenSharp> Crikey, are you sure you want that public? | 13:27.33 |
marcoagpinto | what? | 13:27.43 |
| I am a developer | 13:27.52 |
artifexirc-bot | <KenSharp> Making your CV public | 13:27.52 |
| <KenSharp> Not sure I would do that 🙂 | 13:27.57 |
marcoagpinto | I have my CV public | 13:27.57 |
chrisl | It doesn't look like there is much "text" in there..... | 13:28.02 |
| It's almost all images | 13:28.24 |
artifexirc-bot | <KenSharp> DOB, gender and nationality are all text | 13:28.25 |
marcoagpinto | ohhhhhhhhh | 13:28.27 |
| :(((((((( | 13:28.29 |
| I simply used the European site to create a link, saved it as HTML, then formatted it with BlueGriffon, opened it in Firefox and printed it to PDF using CutePDFWriter + Ghostscript | 13:29.30 |
artifexirc-bot | <KenSharp> The first page has text; the other pages don't seem to | 13:30.00 |
chrisl | Yeah, printing from Firefox tends to render large chunks of the page as images.... | 13:30.19 |
artifexirc-bot | <KenSharp> Using txtwrite (which dumps simple text as Unicode) I get 6 lines of text on page 1 | 13:30.24 |
| <KenSharp> Date of Birth, Gender etc | 13:30.33 |
marcoagpinto | KenSharp: that is what I got... a few lines of text in page 1 | 13:30.53 |
artifexirc-bot | <KenSharp> The final page has 1 line of text "Report content:...." | 13:31.00 |
| <KenSharp> Then it's working as expected; that's the only text in the PDF file | 13:31.12 |
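KenSharp's diagnostic (run the PDF through txtwrite and see how much text comes out) can be scripted before trying docxwrite. A minimal sketch: the helper names are hypothetical, and reading the output as UTF-8 is an assumption about txtwrite's default text encoding, so undecodable bytes are replaced rather than raising.

```python
import subprocess


def txtwrite_args(input_pdf, output_txt, gs_exe="gs"):
    # txtwrite dumps the text Ghostscript finds in the PDF as Unicode.
    return [gs_exe, "-sDEVICE=txtwrite", "-o", output_txt, input_pdf]


def count_text_lines(text):
    """Count non-blank lines: a rough hint at how much real text a PDF holds."""
    return sum(1 for line in text.splitlines() if line.strip())


def extracted_line_count(input_pdf, output_txt, gs_exe="gs"):
    subprocess.run(txtwrite_args(input_pdf, output_txt, gs_exe), check=True)
    # Assumption: output is UTF-8; errors="replace" avoids crashing otherwise.
    with open(output_txt, encoding="utf-8", errors="replace") as f:
        return count_text_lines(f.read())
```

A count of a handful of lines for an 8-page document, as in this log, suggests most of the pages are images, so docxwrite has almost nothing to extract.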
chrisl | Very much in the YMMV vein, you *could* try running it through our OCR device.... if you are willing to set things up so it will work | 13:31.20 |
marcoagpinto | what if I buy the Adobe subscription to convert to .docx? | 13:32.00 |
| will it convert even the images? | 13:32.05 |
| I tried to open the PDF with Word 365 and it messed up most of it | 13:32.17 |
chrisl | You'll have to ask Adobe.... | 13:32.19 |
artifexirc-bot | <KenSharp> Well then you are into Adobe territory. I'm sure it will get 'something' out | 13:32.19 |
| <KenSharp> I believe Adobe uses OCR so it'll probably get some content out of the images | 13:32.34 |
marcoagpinto | :((( | 13:32.57 |
chrisl | But does it use it when exporting docx? Or do you have to do one, then the other? | 13:33.04 |
artifexirc-bot | <KenSharp> @chrisl I have no clue, I have never used it | 13:33.16 |
| <KenSharp> I **believe** it uses OCR when exporting, but I could so easily be wrong | 13:33.35 |
| <KenSharp> It may retain the images as images | 13:33.41 |
marcoagpinto | the problem is that the Europa site only exports PDF, but I wanted to edit the CV to change the colours and formatting and add page numbers | 13:33.51 |
artifexirc-bot | <KenSharp> Well you won't be able to do that with a PDF file! | 13:34.12 |
marcoagpinto | that is why I sent an e-mail to the European guys asking why it no longer exports ODT and DOC like it did years ago | 13:34.44 |
| they never replied | 13:34.49 |
artifexirc-bot | <KenSharp> As @chrisl said, if you can rebuild Ghostscript you could add Tesseract and Leptonica, then run the PDF file through the pdfocr24 device | 13:34.52 |
| <KenSharp> That would OCR all the content and produce something that the docx device could extract | 13:35.09 |
chrisl | If you have 9.54.0, you don't need to rebuild gs, you just need to get the training data and point gs at it | 13:35.26 |
artifexirc-bot | <KenSharp> Note that it would still only be text, it won't get the images out | 13:35.38 |
marcoagpinto | :( | 13:35.44 |
artifexirc-bot | <KenSharp> Oh yes, I forgot! | 13:35.45 |
marcoagpinto | thank you for your help | 13:36.05 |
artifexirc-bot | <KenSharp> The training data is available on the net you just have to download the relevant language pack | 13:36.06 |
marcoagpinto | :) | 13:36.06 |
chrisl | Step 3 here: https://ghostscript.com/ocr.html | 13:36.25 |
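The OCR route described above is two passes: pdfocr24 renders each page and runs Tesseract, writing a PDF with a text layer, and docxwrite then extracts that text (still no images). A hedged sketch that only builds the commands: `ocr_then_docx` is a hypothetical helper, and both pointing Ghostscript at the training data through the `TESSDATA_PREFIX` environment variable and the `-sOCRLanguage=eng` parameter are assumptions to check against the OCR page linked above.

```python
import os


def ocr_then_docx(input_pdf, ocr_pdf, output_docx, tessdata_dir,
                  gs_exe="gs", language="eng"):
    """Return (commands, env) for a two-pass OCR-then-docx conversion.

    Pass 1: pdfocr24 renders pages and OCRs them into ocr_pdf.
    Pass 2: docxwrite extracts the recognised text into output_docx.
    Images are not carried over into the .docx.
    """
    # Assumption: Ghostscript/Tesseract locate *.traineddata via TESSDATA_PREFIX.
    env = dict(os.environ, TESSDATA_PREFIX=tessdata_dir)
    ocr = [gs_exe, "-sDEVICE=pdfocr24", f"-sOCRLanguage={language}",
           "-o", ocr_pdf, input_pdf]
    docx = [gs_exe, "-sDEVICE=docxwrite", "-o", output_docx, ocr_pdf]
    return [ocr, docx], env
```

To run it: `cmds, env = ocr_then_docx(...)`, then `subprocess.run(cmd, env=env, check=True)` for each `cmd` in `cmds`, in order.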
marcoagpinto | if I win the EuroMillions tomorrow I will subscribe to Adobe's more expensive pack that includes a PDF editor | 13:36.31 |
| :) | 13:36.32 |
artifexirc-bot | <KenSharp> PDF editing is never reliable | 13:36.49 |
chrisl | Adobe's PDF editor doesn't really edit PDFs | 13:36.53 |
marcoagpinto | no? | 13:37.01 |
| :( | 13:37.02 |
chrisl | It manages certain sledgehammery sorts of manipulations, not really "editing" | 13:37.29 |
artifexirc-bot | <KenSharp> Definitely not. You may be able to edit some limited parts of a PDF, but the format was never intended for editing and so isn't really editable | 13:37.31 |
marcoagpinto | ahhhh | 13:37.47 |
| then I will buy just the .docx converter... $12 or so per month | 13:38.02 |
| :) | 13:38.04 |
| but only when I am about to update my CV which will be after my thesis is ready | 13:38.22 |
artifexirc-bot | <KenSharp> That's probably the solution most likely to work, though as @chrisl said, you could try it with our stuff for free | 13:38.31 |
marcoagpinto | I know I should try the stuff, but I always get stressed | 13:38.58 |
artifexirc-bot | <KenSharp> Your choice 🙂 | 13:39.12 |
marcoagpinto | it is me who commits the English dictionaries for LibreOffice every six months... and I always get stressed as hell | 13:39.26 |
artifexirc-bot | <KenSharp> $12 isn't so much if you don't need it for long | 13:39.27 |
marcoagpinto | usually I update the CV twice a year | 13:41.04 |
| :) | 13:41.05 |
artifexirc-bot | <KenSharp> Well no doubt our features and support will improve, the OCR feature and .docx feature are both new | 13:42.02 |
marcoagpinto | :) | 13:45.06 |
| thank you for all your help :) | 13:46.49 |
artifexirc-bot | <KenSharp> NP good luck with your CV ;-D | 13:47.03 |