Ghostscript IRC logs

20220725
artifexirc-bot	<jonesmeier> hi, i was processing a lot of files with the output device "ps2write", it was going fine for a while but then an error came up "cannot open device", i don't have the exact error message anymore		09:54.05
	<jonesmeier> should I give it some rest every few calls or something ?		09:54.23
	<jonesmeier> no sorry, it seems it's because /tmp is filling up with tempfiles, i thought only once the error occurs but apparently it does right away, i need to check that		10:00.55
	<jonesmeier> i was piping the console output to grep and it didn't like that it seems and never removed its temp files		18:03.19
	<jonesmeier> (i'm trying to use it to detect corrupt PDF files)		18:04.22
	<jonesmeier> looks like it's working fine now.. thx		18:05.33
	<KenSharp> OK well if you do find a problem feel free to file a bug report.		18:06.21
	<KenSharp> temp files ought to be cleaned up, they should be unlinked after creation (on Linux) and so should disappear when closed.		18:06.41
	<KenSharp> I'm slightly surprised that you are getting temp files at all, if all you want to do is detect corrupt files.		18:07.33
	<KenSharp> If it were me I would run with -sDEVICE=nullpage -dPDFSTOPONERROR and check the return code		18:07.52
	<jonesmeier> ok thanks, maybe I should investigate the thing with grep again when it's done		18:10.35
	<jonesmeier> your suggestion does sound much smarter than what I'm doing for sure!		18:10.54
	<KenSharp> I don't know about smarter, but it might be quicker!		18:11.15
	<jonesmeier> because I'm converting them to Postscript and then looking at its text output, it never seemed to change the exit code even if it had errors there		18:11.36
	<KenSharp> Oh wow, I really wouldn't convert to PostScript.		18:11.49
	<KenSharp> That is going to use temp files for certain, and it'll be potentially slow.		18:12.02
	<jonesmeier> but I didn't know about that option you mentioned there		18:12.09
	<KenSharp> The pdfwrite family (which includes ps2write) try really hard not to stop, so they note errors, and carry on.		18:12.29
	<KenSharp> Then give you a report at the end and return success.		18:12.44
	<jonesmeier> hm yeah I'm not sure, it is quite slow, but that's not a big concern for me.. I will try your suggestion definitely!		18:13.04
	<KenSharp> But if you use -dPDFSTOPONERROR they should stop when they hit an error, and return a non-zero status		18:13.06
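[Editor's note: a minimal Python sketch of the check Ken describes, running Ghostscript with -sDEVICE=nullpage -dPDFSTOPONERROR and looking only at the return code. The function names, the "gs" executable name, and the extra -q -dNOPAUSE -dBATCH switches are assumptions added for illustration; they are not from the conversation.]

```python
import subprocess


def build_check_command(pdf_path, gs_executable="gs"):
    """Return the Ghostscript argument list for a render-nothing check."""
    return [
        gs_executable,        # assumption: Ghostscript is on PATH under this name
        "-q",                 # suppress the startup banner
        "-dNOPAUSE",          # do not wait for a keypress after each page
        "-dBATCH",            # exit once the input has been processed
        "-sDEVICE=nullpage",  # interpret the file but produce no output
        "-dPDFSTOPONERROR",   # stop at the first PDF error instead of carrying on
        pdf_path,
    ]


def pdf_passes_check(pdf_path, gs_executable="gs"):
    """Run Ghostscript on one file; True means it exited with status 0."""
    result = subprocess.run(
        build_check_command(pdf_path, gs_executable),
        stdout=subprocess.DEVNULL,  # the console text is not needed here
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

[Calling pdf_passes_check("some.pdf") on each file and collecting the paths that return False would replace grepping the console output. On Windows the executable is typically named differently (for example gswin64c), hence the gs_executable parameter.]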
	<jonesmeier> yes it would always return successfully, even when noting multiple errors and not even being able to read a single page		18:13.24
	<KenSharp> We're trying to mimic Acrobat 🙂		18:13.40
	<jonesmeier> very nice, thank you! I'm going to make the changes now and see how that works :)		18:14.05
	<KenSharp> Basically people tend to grumble when we throw errors on PDF input, because "Acrobat can open it"		18:14.07
	<KenSharp> Which is true, but only because Acrobat silently ignores/fixes many problems.		18:14.23
	<KenSharp> Well if you do find problems, do please open a bug report so we can look into it		18:14.47
	<KenSharp> Best of luck 🙂		18:14.51
	<jonesmeier> hehe ok I see, so in order to basically do exactly what Acrobat does, it can't treat many things as an error		18:15.59
	<KenSharp> Yep, the PDF interpreter tries to ignore errors and carry on. Unlike Acrobat we do report at the end if we found anything concerning though		18:16.31
	<jonesmeier> I guess that's good if ultimately it could really read the file.. in my case it can't read some, but returns successfully.. so I guess it can read some of it, it has a correct header and stuff, but it says "no pages will be processed" or something		18:16.49
	<jonesmeier> on those broken files		18:17.09
	<KenSharp> Yeah if the trailer can't be read to find the Pages array then basically you're out of luck		18:17.20
	<KenSharp> But we have a number of badly broken files (many produced by the OSS-fuzz fuzzing tool) which we can get 'some' content out of, even though even Acrobat won't open them.		18:17.53
	<jonesmeier> yeah that must be it, the files were truncated on copying		18:17.58
	<jonesmeier> nice, then it can still help in these situations! I always do try it		18:18.40
	<jonesmeier> I ended up buying "Recovery Toolbox for PDF" for 40 bucks, because it really fixed all of them.. so I was happy to pay that		18:19.11
	<KenSharp> It can often extract 'some' content, but not always everything. But at least it tries to tell you if it thinks there was a problem. I've not heard of Recovery Toolbox though, new one to me		18:19.42
	<KenSharp> If you use the pdfwrite device to write a new PDF file GS will 'fix' a lot of broken files.		18:20.07
	<jonesmeier> ok, will continue to try on Linux first of course, this one was a Windows tool		18:20.16
	<jonesmeier> alright, will try that again too		18:20.42
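[Editor's note: a rough Python sketch of the pdfwrite suggestion above, rewriting a PDF through Ghostscript so that it can 'fix' what it is able to. The function names, the "gs" executable name, and the -q -dNOPAUSE -dBATCH switches are assumptions added for illustration. As noted earlier in the conversation, the pdfwrite family tends to report errors and still return success, so a zero return code here does not by itself mean the output is complete.]

```python
import subprocess


def build_rewrite_command(src_pdf, dst_pdf, gs_executable="gs"):
    """Return the Ghostscript argument list that rewrites src_pdf to dst_pdf."""
    return [
        gs_executable,               # assumption: Ghostscript is on PATH as "gs"
        "-q",                        # suppress the startup banner
        "-dNOPAUSE",                 # do not wait for a keypress after each page
        "-dBATCH",                   # exit once the input has been processed
        "-sDEVICE=pdfwrite",         # write a new PDF file
        "-sOutputFile=" + dst_pdf,   # where the rewritten file goes
        src_pdf,
    ]


def rewrite_pdf(src_pdf, dst_pdf, gs_executable="gs"):
    """Run the rewrite and return Ghostscript's exit status and stderr text."""
    result = subprocess.run(
        build_rewrite_command(src_pdf, dst_pdf, gs_executable),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,      # keep any reported problems for inspection
        text=True,
    )
    return result.returncode, result.stderr
```

[The rewritten file still needs checking by eye or with a stricter pass, since only 'some' content may have been recovered.]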
	<KenSharp> If you do have files which the recovery thing can fix and we can't I'd be interested to see them. Maybe we can do a better job.		18:21.33
	<jonesmeier> okay... well unfortunately these are files from HR like contracts and stuff with personal data		18:23.21
	<KenSharp> Hmm well that's not going to work, can't be having those.		18:23.39
	<jonesmeier> maybe I can simulate what happened to these files		18:23.58
	<jonesmeier> because I have yet to find out why they got truncated during copying...		18:24.32
	<KenSharp> Don't worry too much, if you happen to find a file or two you can share that would be great, but we do have a lot of broken files 🙂		18:25.04
	<jonesmeier> copied from a LAN samba share to a VPN samba share.. i want to do some testing, so maybe i can reproduce it, with some testfiles too		18:25.18
	<jonesmeier> hehe ok :) well maybe, i will report back if i can reproduce the errors, and then with some harmless files		18:25.50
	<KenSharp> I imagine it would be useful for you to figure out why they are being truncated, I'd certainly want to know....		18:25.54
	<jonesmeier> yeah that's what I need to do next!		18:26.06
	<jonesmeier> it's really not great..		18:26.52
	<KenSharp> Worrying I would think		18:27.06
	<jonesmeier> yes		18:27.11
	<jonesmeier> ok the PDFSTOPONERROR did throw errors on a couple of more files now, I need to check afterwards if these were broken too.. I'm also only going with "opens in Adobe Reader" of course		18:46.38
	<jonesmeier> but I will do another run afterwards, my version right now looks for "Processing pages" in the output		18:47.07
	<KenSharp> Hmm well Acrobat will open all kinds of broken files, and it won't usually tell you they were broken.		18:47.17
	<jonesmeier> thank you very much for the time so far Ken		18:47.18
	<KenSharp> NP		18:47.22
	<KenSharp> I'm off now anyway, well past quitting time		18:47.34
	<jonesmeier> hmm ok will have to see!		18:47.57
	<jonesmeier> Have a good one! :)		18:48.09
	<KenSharp> If, when you close a PDF file Acrobat offers to 'save the changes' and you didn't change anything, then it silently fixed a broken file for you. That's often a good clue		18:48.28
	<KenSharp> You have a good day 🙂		18:48.37
	<jonesmeier> ah that's a good clue! I'll remember that for some manual tests, thx		18:49.15
	<jonesmeier> you too		18:49.19

Log of #ghostscript at irc.freenode.net.