| <<<Back 1 day (to 2022/07/24) | Fwd 1 day (to 2022/07/26) >>> | 20220725 |
artifexirc-bot | <jonesmeier> hi, i was processing a lot of files with the output device "ps2write", it was going fine for a while but then an error came up "cannot open device", i don't have the exact error message anymore | 09:54.05 |
| <jonesmeier> should I give it some rest every few calls or something ? | 09:54.23 |
| <jonesmeier> no sorry, it seems it's because /tmp is filling up with tempfiles, i thought only once the error occurs but apparently it does right away, i need to check that | 10:00.55 |
| <jonesmeier> i was piping the console output to grep and it didn't like that it seems and never removed its temp files | 18:03.19 |
| <jonesmeier> (i'm trying to use it to detect corrupt PDF files) | 18:04.22 |
| <jonesmeier> looks like it's working fine now.. thx | 18:05.33 |
| <KenSharp> OK well if you do find a problem feel free to file a bug report. | 18:06.21 |
| <KenSharp> temp files ought to be cleaned up, they should be unlinked after creation (on Linux) and so should disappear when closed. | 18:06.41 |
| <KenSharp> I'm slightly surprised that you are getting temp files at all, if all you want to do is detect corrupt files. | 18:07.33 |
| <KenSharp> If it were me I would run with -sDEVICE=nullpage -dPDFSTOPONERROR and check the return code | 18:07.52 |
| <jonesmeier> ok thanks, maybe I should investigate the thing with grep again when it's done | 18:10.35 |
| <jonesmeier> your suggestion does sound much smarter than what I'm doing for sure! | 18:10.54 |
| <KenSharp> I don't know about smarter, but it might be quicker! | 18:11.15 |
| <jonesmeier> because I'm converting them to Postscript and then look at its text output, it never seemed to change the exit code even if it had errors there | 18:11.36 |
| <KenSharp> Oh wow, I really wouldn't convert to PostScript. | 18:11.49 |
| <KenSharp> That is going to use temp files for certain, and it'll be potentially slow. | 18:12.02 |
| <jonesmeier> but I didn't know about that option you mentioned there | 18:12.09 |
| <KenSharp> The pdfwrite family (which includes ps2write) try **really** hard not to stop, so they note errors, and carry on. | 18:12.29 |
| <KenSharp> Then give you a report at the end and return success. | 18:12.44 |
| <jonesmeier> hm yeah I'm not sure, it is quite slow, but that's not a big concern for me.. I will try your suggestion definitely! | 18:13.04 |
| <KenSharp> But if you use -dPDFSTOPONERROR they should stop when they hit an error, and return a non-zero status | 18:13.06 |
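Ken's suggestion (run with -sDEVICE=nullpage -dPDFSTOPONERROR and check the return code) could be scripted roughly as below. This is only a sketch under assumptions: the executable is assumed to be called `gs` and to be on the PATH, the helper names `check_pdf` and `find_corrupt` are made up for illustration, and `-q -dNOPAUSE -dBATCH` are added here only to keep the run quiet and non-interactive; they were not part of the conversation.

```python
import subprocess


def check_pdf(path, gs="gs", run=subprocess.run):
    """Return True if Ghostscript exits with status 0 for this file.

    `gs` is an assumed executable name; `run` can be swapped out for testing.
    """
    cmd = [
        gs,
        "-q", "-dNOPAUSE", "-dBATCH",   # quiet, non-interactive (added for this sketch)
        "-sDEVICE=nullpage",            # interpret the file but write no output
        "-dPDFSTOPONERROR",             # stop at the first error, non-zero exit status
        path,
    ]
    result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def find_corrupt(paths, gs="gs", run=subprocess.run):
    """Return the subset of paths for which Ghostscript reported a failure."""
    return [p for p in paths if not check_pdf(p, gs=gs, run=run)]
```

Calling `find_corrupt(["a.pdf", "b.pdf"])` would then list the files that made Ghostscript exit with a non-zero status. Because the nullpage device produces no output, this avoids the PostScript conversion and the temp files that come with it.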
| <jonesmeier> yes it would always return successfully, even when noting multiple errors and not even being able to read a single page | 18:13.24 |
| <KenSharp> We're trying to mimic Acrobat 🙂 | 18:13.40 |
| <jonesmeier> very nice, thank you! I'm going to make the changes now and see how that works :) | 18:14.05 |
| <KenSharp> Basically people tend to grumble when we throw errors on PDF input, because "Acrobat can open it" | 18:14.07 |
| <KenSharp> Which is true, but only because Acrobat silently ignores/fixes many problems. | 18:14.23 |
| <KenSharp> Well if you do find problems, do please open a bug report so we can look into it | 18:14.47 |
| <KenSharp> Best of luck 🙂 | 18:14.51 |
| <jonesmeier> hehe ok I see, so in order to basically do exactly what Acrobat does, it can't treat many things as an error | 18:15.59 |
| <KenSharp> Yep, the PDF interpreter tries to ignore errors and carry on. Unlike Acrobat we do report at the end if we found anything concerning though | 18:16.31 |
| <jonesmeier> I guess that's good if ultimately it could really read the file.. in my case it can't read some, but returns successfully.. so I guess it can read some of it, it has a correct header and stuff, but it says "no pages will be processed" or something | 18:16.49 |
| <jonesmeier> on those broken files | 18:17.09 |
| <KenSharp> Yeah if the trailer can't be read to find the Pages array then basically you're out of luck | 18:17.20 |
| <KenSharp> But we have a number of badly broken files (many produced by the OSS-fuzz fuzzing tool) which we can get 'some' content out of, even though even Acrobat won't open them. | 18:17.53 |
| <jonesmeier> yeah that must be it, the files were truncated on copying | 18:17.58 |
| <jonesmeier> nice, then it can still help in these situations! I always do try it | 18:18.40 |
| <jonesmeier> I ended up buying "Recovery Toolbox for PDF" for 40 bucks, because it really fixed all of them.. so I was happy to pay that | 18:19.11 |
| <KenSharp> It can often extract 'some' content, but not always everything. But at least it tries to tell you if it thinks there was a problem. I've not heard of Recovery Toolbox though, new one to me | 18:19.42 |
| <KenSharp> If you use the pdfwrite device to write a new PDF file GS will 'fix' a lot of broken files. | 18:20.07 |
| <jonesmeier> ok, will continue to try on Linux first of course, this one was a Windows tool | 18:20.16 |
| <jonesmeier> alright, will try that again too | 18:20.42 |
| <KenSharp> If you do have files which the recovery thing can fix and we can't I'd be interested to see them. Maybe we can do a better job. | 18:21.33 |
| <jonesmeier> okay... well unfortunately these are files from HR like contracts and stuff with personal data | 18:23.21 |
| <KenSharp> Hmm well that's not going to work, can't be having those. | 18:23.39 |
| <jonesmeier> maybe I can simulate what happened to these files | 18:23.58 |
| <jonesmeier> because I have yet to find out why they got truncated during copying... | 18:24.32 |
| <KenSharp> Don't worry too much, if you happen to find a file or two you can share that would be great, but we do have a lot of broken files 🙂 | 18:25.04 |
| <jonesmeier> copied from a LAN samba share to a VPN samba share.. i want to do some testing, so maybe i can reproduce it, with some testfiles too | 18:25.18 |
| <jonesmeier> hehe ok :) well maybe, i will report back if i can reproduce the errors, and then with some harmless files | 18:25.50 |
| <KenSharp> I imagine it would be useful for you to figure out why they are being truncated, I'd certainly want to know.... | 18:25.54 |
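For reproducing the truncation with harmless test files, one possible approach is to compare each copy against its source by size and checksum. The sketch below is not something discussed in the channel: the function names `file_digest` and `compare_copy` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_copy(src, dst):
    """Classify a copy as 'identical', 'truncated' (shorter) or 'different'."""
    src_size = os.path.getsize(src)
    dst_size = os.path.getsize(dst)
    if dst_size < src_size:
        return "truncated"
    if dst_size != src_size or file_digest(src) != file_digest(dst):
        return "different"
    return "identical"
```

Running `compare_copy` over a batch of test files after each copy between the two shares would show whether, and for which files, the copy comes out shorter than the original.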
| <jonesmeier> yeah that's what I need to do next! | 18:26.06 |
| <jonesmeier> it's really not great.. | 18:26.52 |
| <KenSharp> Worrying I would think | 18:27.06 |
| <jonesmeier> yes | 18:27.11 |
| <jonesmeier> ok the PDFSTOPONERROR did throw errors on a couple of more files now, I need to check afterwards if these were broken too.. I'm also only going with "opens in Adobe Reader" of course | 18:46.38 |
| <jonesmeier> but I will do another run afterwards, my version right now looks for "Processing pages" in the output | 18:47.07 |
| <KenSharp> Hmm well Acrobat will open all kinds of broken files, and it won't usually tell you they were broken. | 18:47.17 |
| <jonesmeier> thank you very much for the time so far Ken | 18:47.18 |
| <KenSharp> NP | 18:47.22 |
| <KenSharp> I'm off now anyway, well past quitting time | 18:47.34 |
| <jonesmeier> hmm ok will have to see! | 18:47.57 |
| <jonesmeier> Have a good one! :) | 18:48.09 |
| <KenSharp> If, when you close a PDF file Acrobat offers to 'save the changes' and you didn't change anything, then it silently fixed a broken file for you. That's often a good clue | 18:48.28 |
| <KenSharp> You have a good day 🙂 | 18:48.37 |
| <jonesmeier> ah that's a good clue! I'll remember that for some manual tests, thx | 18:49.15 |
| <jonesmeier> you too | 18:49.19 |