| <<<Back 1 day (to 2014/12/03) | 20141204 |
jogux | robin_watts_mac: https://support.google.com/googleplay/android-developer/checklist/3294213?hl=en to transfer app | 14:38.59 |
chrisl | kens: parallel build fix: http://git.ghostscript.com/?p=user/chrisl/ghostpdl.git;a=commitdiff;h=227687ad | 15:32.29 |
pzn | I need to automate a thing at work (processing text content in pdf files). I thought about converting pdf->ps and then using some grep commands to read the content. | 16:27.41 |
| however pdf2ps saves the ps with LZW compression. how can I tell "gs" that I don't want compression? | 16:28.26 |
chrisl | pzn: you would be better off looking at the ghostscript txtwrite device | 16:29.12 |
pzn | chrisl, tks, I'll take a look at it | 16:30.32 |
chrisl | pzn: it does mean you'll have to call gs directly, rather than use a convenience script, but something like: gs -sDEVICE=txtwrite -o out.txt <input>.pdf | 16:33.46 |
pzn | chrisl, which is the "default" -sDEVICE when I use pdf2ps for example? | 16:34.33 |
chrisl | pdf2ps does not use the default device - the default device is a display window. ps2pdf uses the ps2write device | 16:35.38 |
| Ooops, I mean pdf2ps uses ps2write | 16:35.54 |
pzn | when using ps2write it runs ok, when using txtwrite it gives "Error in `gs': corrupted double-linked list" | 16:37.58 |
chrisl | Ah, well, txtwrite is a relatively new device, it has had a number of fixes - what version of gs are you using? | 16:38.45 |
pzn | surely an old version from debian > 1 year old... I'll get the newest | 16:39.56 |
chrisl | Newest is 9.15..... | 16:40.10 |
pzn | using 9.10 | 16:40.36 |
chrisl | pzn: The problem is that PostScript has text encoding that is just as flexible as PDF's, so you may well find that strings in the PostScript aren't usable as they stand | 16:41.13 |
pzn | is there any "what you see is what you get" editor for ps? I'd like to learn it a little by making changes | 16:41.34 |
| chrisl, yes, that is what I discovered. things are not simple. | 16:42.03 |
| I'll just try a little bit, then maybe I'll ask for another input in my automation scripts... maybe pdf is not a good input for what I need. | 16:42.43 |
chrisl | No WYSIWYG editors; PS is a programming language as well as a page description language | 16:42.45 |
| PDF allows (but does not require!) metadata to be included that allows the text encoding to be "deciphered", which txtwrite will use, if it's available. But ps2write won't use it.... ever | 16:43.48 |
pzn | chrisl, it was a bad idea to take pdf as input. I'll ask the sysdevelopers to output some <xml> for me. It will take them some days to do it, but it seems the right path to automate it. | 16:45.34 |
| chrisl, anyway, thanks for the help | 16:45.45 |
chrisl | pzn: if getting XML is an option, that is probably going to be preferable for extracting text, but may not be as useful for viewing, if that's required | 16:46.31 |
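The workflow discussed above (run gs with the txtwrite device, then grep the result) could be scripted roughly as follows. This is only a sketch: the helper names `build_txtwrite_cmd`, `extract_text` and `grep_lines` are made up for illustration, and it assumes a `gs` binary on the PATH that is recent enough for txtwrite to behave (the 9.10 build mentioned in the log crashed). Reading the output as UTF-8 is also an assumption about the device's default text format.

```python
import re
import subprocess
from pathlib import Path


def build_txtwrite_cmd(pdf_path, txt_path, gs_binary="gs"):
    """Build the argv for the txtwrite call quoted in the log (hypothetical helper)."""
    # -sDEVICE=txtwrite selects the text extraction device.
    # -o names the output file; it also makes gs run in batch mode without pausing.
    return [gs_binary, "-sDEVICE=txtwrite", "-o", str(txt_path), str(pdf_path)]


def extract_text(pdf_path, txt_path, gs_binary="gs"):
    """Run gs and return the extracted text (hypothetical helper)."""
    # check=True raises CalledProcessError if gs exits with a non-zero status.
    subprocess.run(build_txtwrite_cmd(pdf_path, txt_path, gs_binary), check=True)
    # Assumption: the output is UTF-8; undecodable bytes are replaced, not fatal.
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")


def grep_lines(text, pattern):
    """Return the lines of text that match the regular expression, like grep."""
    rx = re.compile(pattern)
    return [line for line in text.splitlines() if rx.search(line)]
```

A call might look like `grep_lines(extract_text("report.pdf", "report.txt"), r"Total")`, where the file names and the pattern are placeholders. As noted in the conversation, whether the extracted strings are usable depends on the PDF carrying enough encoding information, so the results should be checked on real input before relying on them.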