| <<<Back 1 day (to 2017/04/12) | 20170413 |
celyr | Hi | 07:41.02 |
ghostbot | Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line. | 07:41.02 |
celyr | I have like 7000 pdf to repair | 07:41.26 |
| this is working good: gs -o file_ok.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress file_not_ok.pdf | 07:42.11 |
| but I would need to do it in place | 07:42.26 |
kens | Don't use PDFSETTINGS unless you understand exactly what all the controls embodied in that collection are doing. If you don't know what that means, don't do it at all. | 07:42.57 |
| You cannot read a PDF and make a new PDF with the same name. | 07:43.12 |
| Because pdfwrite needs to write the content 'as it goes', while at the same time reading new content from the original PDF file. | 07:43.40 |
| And in any event, given that pdfwrite does not 'repair' files, and does not guarantee to be able to read a damaged or invalid source file, if you replaced the original with the output of pdfwrite it's entirely possible you would replace a damaged or invalid file with one which suffered even worse damage. | 07:44.48 |
| You should keep the original until you are satisfied that the new file is no worse than the original file. | 07:45.10 |
celyr | kens, the mistake is the same in all 7000 files | 07:47.56 |
| kens, And I've tested it, it's fixed in that way | 07:48.14 |
kens | Well that's hopeful. Nevertheless, you cannot select an output filename which is the same as the input filename | 07:48.19 |
celyr | Ok | 07:48.26 |
kens | You can't write to a PDF file while also reading from it | 07:48.32 |
celyr | I will have to play with find | 07:48.33 |
chrisl | celyr: I find xargs useful for things like that | 07:49.35 |
celyr | I did it | 08:04.10 |
| but it's like a bomb lol still processing | 08:04.17 |
| 3000/7000 time for a coffee | 08:08.27 |
| btw i solved with $ find ./ -name CMR*.pdf -exec gs -o ../correct/{} -sDEVICE=pdfwrite {} \; | 08:20.19 |
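A hedged sketch of the same batch run, written so the originals are never overwritten. Two details of the command above are shell- and find-dependent: the unquoted CMR*.pdf can be expanded by the shell before find sees it, and not every find substitutes {} when it is embedded in a longer argument such as ../correct/{}. The function name rewrite_pdfs and the GS variable are hypothetical; GS only exists so the loop can be tried without Ghostscript installed.

```shell
#!/bin/sh
# Hypothetical helper: run pdfwrite over every matching PDF and put the
# results in a separate directory, leaving the originals untouched.
# GS is an assumed override hook; it defaults to the real gs binary.
GS="${GS:-gs}"

rewrite_pdfs() {
    src_dir=$1
    out_dir=$2
    pattern=${3:-*.pdf}
    mkdir -p "$out_dir" || return 1
    # Quote the pattern so find, not the shell, does the matching.
    # Assumes file names contain no newlines and that out_dir is not
    # inside src_dir (otherwise find would also pick up the outputs).
    find "$src_dir" -type f -name "$pattern" | while IFS= read -r src; do
        out="$out_dir/$(basename "$src")"
        "$GS" -o "$out" -sDEVICE=pdfwrite "$src" || echo "FAILED: $src" >&2
    done
}

# Example call (not run here): rewrite_pdfs . ../correct 'CMR*.pdf'
```

Files with the same base name in different subdirectories would collide in the output directory; that is acceptable only if the names are unique, as they appear to be here.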
frostym | kens: Hey, I ended up running a test overnight to test how accurately searching for "snap_to_device" and using a modified ghostscript to print to stdout when an image operation occurs were. Out of 4217 flagged files, 37 were flagged by ghostscript but did not contain the adobe keyword. I assume it's because they include images but were made in something like Inkscape but just want to confirm I | 13:29.16 |
| modified it correctly. When you said print to stdout yesterday in /image /imagemask and /colorimage, you were talking about modifying the definitions in Resource/Init/gs_img.ps right? | 13:29.22 |
kens | Umm, no.... | 13:29.43 |
| I was talking about adding PostScript code (which you can run as a 'prologue') which redefines the image operators. | 13:30.06 |
frostym | From my initial testing it seemed to give accurate results, why would that approach be better? | 13:30.38 |
kens | So (caveat, untested): | 13:31.07 |
| and similarly for colorimage and imagemask | 13:31.08 |
| Grr, irc doesn't like '/' | 13:31.16 |
| Let me try that again | 13:31.20 |
kens2 | frostym : sorry network hiccup | 13:33.36 |
frostym | No worries | 13:33.56 |
kens2 | As I was saying. If you create a PostScript 'prolog' file and add something along the lines of: | 13:34.49 |
| "/orig_image /image load def | 13:34.49 |
| "/image {(this is an image\n) print flush orig_image} bind def | 13:34.49 |
| and similarly for imagemask and colorimage | 13:34.49 |
| (without the ") | 13:34.57 |
| This doesn't alter the behaviour of Ghostscript itself; modifying the Resource tree does. That means you would get these messages for every file, and you really only want them as a test, right? So sending your prolog file when you want to run a test would mean the ordinary behaviour stays the same | 13:35.57 |
| Also, if we change the way that we define the image operators (so they aren't in the Resources PostScript) then the prolog method continues to work, because it's straight PostScript. It also works with interpreters other than Ghostscript. | 13:38.26 |
| I see I've come back with the wrong nick, I'll just log off and on again to get that sorted | 13:38.56 |
frostym | Well yeah, I need to go back and test our current collection but moving forward with new files I'd still like Ghostscript to tell me if it encountered an image. But if there is a way to do it without having to distribute a modified ghostscript to our production servers that'd obviously be ideal. | 13:39.27 |
| So basically I'll define this prolog file and supply it along with my EPS file, correct? | 13:39.42 |
kens | Put it on the command line, before the EPS | 13:39.58 |
| You *could* just do it all on the command line by using -c and -f but it's probably easier to write it in a nice little file | 13:40.26 |
| You can put comments in the file with the '%' character too | 13:40.38 |
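A minimal sketch of such a prolog file, generated from the shell. The PostScript is the snippet quoted above extended to imagemask and colorimage in the same pattern, and it carries the same "untested" caveat. The function name write_image_prolog and the file name image_prolog.ps are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a PostScript prolog that prints a marker line
# each time an image operator runs, then hands off to the original operator.
write_image_prolog() {
    # The quoted 'EOF' keeps the shell from touching the PostScript text.
    cat > "$1" <<'EOF'
% Report image operators on stdout, then run the original definition.
/orig_image /image load def
/image { (this is an image\n) print flush orig_image } bind def
/orig_imagemask /imagemask load def
/imagemask { (this is an imagemask\n) print flush orig_imagemask } bind def
/orig_colorimage /colorimage load def
/colorimage { (this is a colorimage\n) print flush orig_colorimage } bind def
EOF
}

write_image_prolog image_prolog.ps
```

The prolog then goes on the command line before the EPS, for example: gs -sDEVICE=jpeg -o output.jpg image_prolog.ps input.eps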
frostym | gs -dQUIET -dSAFER -dBATCH -dSTRICT -dNOPAUSE -dSHORTERRORS -sDEVICE=jpeg -dJPEGQ=100 -r300 -dMaxBitmap=500000000 -dUseCIEColor -sOutputFile=output.jpg -dGraphicsAlphaBits=4 -dEPSFitPage -g3000x3000 -f input.eps | 13:40.44 |
kens | You don't need the -f there. | 13:41.05 |
frostym | That's my current command, would I just add a -f before the input one? Or is that -f for input.eps not even necessary? | 13:41.08 |
kens | -f turns off -c | 13:41.18 |
| If you don't have a -c you don't need a -f | 13:41.25 |
frostym | Okay I didn't think so based on reading the man pages but I wasn't the one that constructed that command | 13:41.30 |
kens | Don't use -dUseCIEColor, that's horrible | 13:41.43 |
| I can't see the sense in TextAlphaBits and GraphicsAlphaBits if you are going to JPEG, the output will be quite blurred enough without those | 13:42.12 |
frostym | For testing our current collection, do any of those flags still need to be included or can I just call ghostscript with the input and output info + prologue? | 13:42.19 |
kens | For testing you don't need any of that | 13:42.31 |
| But it depends how you want to do your tests :-) | 13:43.00 |
| If all you are doing is looking to see if a file uses an image, then you could just do -o /dev/null -r72 | 13:43.26 |
| Which throws away the rendered output and renders to a low resolution, so it's fast | 13:43.42 |
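A rough sketch of that throwaway test run, assuming a prolog file that prints lines such as "this is an image" whenever an image operator executes. The function name uses_image and the GS override variable are hypothetical, and choosing the jpeg device is an assumption carried over from the command quoted earlier; the rendered output is discarded either way.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the input triggers any of the
# marker lines printed by the prolog, fail otherwise.
# GS is an assumed override hook; it defaults to the real gs binary.
GS="${GS:-gs}"

uses_image() {
    prolog=$1
    input=$2
    # -o /dev/null discards the rendering, -r72 keeps it cheap.
    # The prolog must come before the input file on the command line.
    "$GS" -q -o /dev/null -r72 -sDEVICE=jpeg "$prolog" "$input" 2>/dev/null |
        grep -Eq '^this is an? (image|imagemask|colorimage)$'
}

# Example call (not run here):
#   if uses_image image_prolog.ps input.eps; then echo "contains an image"; fi
```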
frostym | Why is -dUseCIEColor horrible? I just looked up its purpose but I don't see the harm, wouldn't improving conversion of CMYK documents to RGB be a good thing? | 13:43.59 |
kens | It absolutely does not improve the conversion of RGB to CMYK | 13:44.14 |
| Or vice versa | 13:44.20 |
| *unless* you carefully construct Color Rendering Dictionaries and include those, which you aren't doing | 13:44.36 |
| The ICC profile colour management in Ghostscript is *far* superior to the old CRD method | 13:44.52 |
| If you want to do colour management, use the ICC profiles | 13:45.04 |
frostym | Ah okay, it appears I was looking at an old version of the documentation (8.54) where it claims it does improve conversion | 13:45.34 |
| I see now that text was updated in the latest version of the docs | 13:45.47 |
kens | UseCIEColor is a PostScript convention | 13:45.52 |
| Good to know the recent docs are better :-) | 13:46.02 |
| UseCIEColor was an Adobe h*ck to try and do colour management before ICC profiles got popular, it's never, ever worked well. | 13:46.39 |
frostym | Good to know, thanks. I'm unaware as to why TextAlphaBits and GraphicsAlphaBits are being included in the command too, is there any possibility that dropping them will have a negative effect for some files? | 13:48.25 |
kens | The rendering will go faster. For non-lossy compression methods (ie not JPEG), and if you're of the group of people who think that blurry text looks better, then you might consider the output to be less good. Personally I disagree, especially at even moderate resolutions. | 13:49.42 |
frostym | It will go faster with or without the flag? | 13:50.09 |
kens | Faster without | 13:50.14 |
| Try it out and see if you think the old way is better | 13:51.19 |
frostym | We render these images at upwards of 4000x4000px at times, I assume you have the same stance even when the image is that large right? And I'll definitely run a comparison on some of our images to see the difference, just have to compile a good list of test cases first. | 13:52.46 |
kens | The number of pixels doesn't say much, it's the resolution and for that I'd need to know the original media size | 13:53.21 |
| If I render a 1 inch square at 100x100 pixels then the resolution is 100 dpi, if I render it at 3000x3000 then it's 3000 dpi. | 13:53.59 |
| So the number of pixels doesn't say anything | 13:54.09 |
| Personally I dislike blurred text, but other people disagree with me. | 13:54.33 |
| FWIW I also vehemently dislike 'ClearType' (sub pixel rendering) as it puts coloured fringes on the text | 13:54.59 |
frostym | -r300 would mean 300dpi then, right? | 13:55.20 |
kens | yes | 13:55.26 |
| and at 300 dpi you should not see much difference with TextAlphaBits or GraphicsAlphaBits I would say | 13:55.47 |
| Note that for your command line, putting -r300 is more or less irrelevant | 13:56.27 |
| Because you set the size of the media in pixels (and -g sets FIXEDMEDIA I believe) and then set -dEPSFitPage, which resizes the page to fit the media | 13:57.15 |
| So you aren't (for example) rendering an A4 page at 300 dpi. You are rendering an A4 page, scaled so that it's 10 inches in its largest dimension. | 13:57.53 |
| Because you set -g3000x3000 and -r300 | 13:58.05 |
| So, in effect, the media is 10 inches by 10 inches | 13:58.16 |
frostym | Technically our command excludes -dGraphicsAlphaBits, -dEPSFitPage and -g if a size isn't specified. If it is we append them to the end so that's why it includes both, there are times where we run the command without specifying -g | 13:58.20 |
kens | Well, it's up to you, I can't tell you what's 'right' for you in a quality matter, because that's pretty subjective. | 13:58.51 |
| I wouldn't use anti-aliasing myself | 13:59.32 |
frostym | So -g will not override -r if specified is what you're saying? | 13:59.32 |
kens | No..... | 13:59.39 |
| If you specify a fixed media size (which you do, because you have used -g) and then set a resolution (which you do) and then scale the content onto that media, then you are not rendering the original content at 300 dpi. | 14:00.36 |
| Let's take a concrete example. | 14:00.46 |
| Say I have a document intended for a large page, 20 inches by 20 inches. | 14:01.01 |
| If I render that at 300 dpi, then I get a bitmap which is 6000x6000 pixels. | 14:01.18 |
| Now let's say I use -g to limit the media to 3000x3000 | 14:01.29 |
| I also set the resolution to 300 dpi. | 14:01.38 |
| That means my media is now 10 inches by 10 inches | 14:01.51 |
| In order to render the whole content of my page I need to scale it down by 2 | 14:02.04 |
| So if I compare my rendered output, it's the same, but it's now only 10 inches x 10 inches | 14:02.28 |
| In effect I've got the same effect as rendering the original document at 150 dpi | 14:02.51 |
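The arithmetic in this example can be sketched as a small shell function. It assumes a square page and square media, as in the example, so a single dimension is enough; the name effective_dpi is hypothetical.

```shell
#!/bin/sh
# effective_dpi PAGE_INCHES MEDIA_PIXELS DPI
# With a fixed pixel size (-g) plus -dEPSFitPage, the media is
# MEDIA_PIXELS / DPI inches and the page is scaled to fit it, so the
# original content ends up at DPI * scale pixels per original inch.
effective_dpi() {
    awk -v page="$1" -v px="$2" -v dpi="$3" 'BEGIN {
        media_in = px / dpi        # media size in inches
        scale = media_in / page    # scale factor applied to the page
        printf "media=%gin scale=%g effective_dpi=%g\n", media_in, scale, dpi * scale
    }'
}

# The 20 inch page, -g3000x3000 and -r300 from the example:
effective_dpi 20 3000 300    # prints: media=10in scale=0.5 effective_dpi=150
```

Note that the requested resolution cancels out: the effective figure is simply the media pixels divided by the original page size in inches, which is why -r300 is "more or less irrelevant" once -g and -dEPSFitPage are both set.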
frostym | Ah okay, that makes sense thanks. So in regards to -dGraphicsAlphaBits (which is only included when specifying a resolution with -g), what would be a good test image for comparing the effects with/without its inclusion? We don't specify -dTextAlphaBits, is an image with text still the ideal test case? | 14:06.39 |
kens | If you are going to produce JPEG I think there's even less argument for using GraphicsAlphaBits than TextAlphaBits..... | 14:07.23 |
| I'm not sure what you mean by a 'good test image'; image is a specific thing in PostScript and PDF, it's a bitmap. | 14:07.47 |
| For testing the effect of GraphicsAlphaBits I would say any Illustrator file is probably good (that doesn't use transparency!) as vectors all get anti-aliased with GraphicsAlphaBits | 14:08.27 |
frostym | Sorry a good EPS file that contains text | 14:08.38 |
| Okay sounds good, I'll get some test cases and get to work on testing the removal of -dGraphicsAlphaBits and -dUseCIEColor. Thanks for the detailed explanations! | 14:12.57 |
kens | NP | 14:13.22 |
frostym | kens: Quick question, my coworker tells me we initially run the command without -g3000x3000 (among other flags) so we can get the image's pixel dimensions and perform different logic accordingly. Will -r300 have any effect on that output or can it be excluded? We're not going to be printing this output so am I correct in assuming the dpi is irrelevant? Or will ghostscript produce an image with | 15:04.14 |
| different dimensions if it's excluded? | 15:04.20 |
kens | If you don't specify a resolution you will get the device's default resolution. Therefore the dimensions (in pixels) will always differ depending what resolution you request. | 15:05.06 |
| If all you want is dimensions, then use -r72 and it's easy to figure out the actual media dimensions from the resulting pixels (but with low accuracy if someone does something with floating point media sizes) | 15:06.00 |
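A small sketch of that conversion. PostScript units are 1/72 inch, so at -r72 one pixel corresponds to one point and dividing by 72 gives inches. The function name pixels_to_inches is hypothetical, and the result is only as accurate as the rounding to whole pixels allows.

```shell
#!/bin/sh
# pixels_to_inches WIDTH_PX HEIGHT_PX [DPI]
# Convert rendered pixel dimensions back to media size in inches.
# DPI defaults to 72, where pixels and PostScript points coincide.
pixels_to_inches() {
    awk -v w="$1" -v h="$2" -v dpi="${3:-72}" 'BEGIN {
        printf "%gx%g in\n", w / dpi, h / dpi
    }'
}

# A 612x792 pixel bitmap rendered at -r72 is US Letter:
pixels_to_inches 612 792    # prints: 8.5x11 in
```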
frostym | Will the ratio always remain the same? | 15:06.33 |
kens | Not sure what you mean | 15:06.45 |
| The media size is unchanged. | 15:06.52 |
| (unless you set a media size, and -dFIXEDMEDIA, or use a device with fixed media sizes) | 15:07.15 |
frostym | Which is really what we're after, we perform different logic if the result is say panoramic vs square | 15:07.20 |
kens | Or if you always use EPS files then the dimensions are in the comments in the EPS.... | 15:07.47 |
frostym | I'm assuming they weren't aware of that when deciding to pass it through ghostscript, I'll look into switching to just parsing the EPS comments. I've opened a test file and I see three headers, I'm assuming one of which is the one you're referencing. They're all the same for this image (with the exception of float vs int precision for one of them), which is the one I should use? | 15:12.14 |
| HiResBoundingBox or CropBox | 15:12.31 |
kens | You can also use the bbox device to get the dimensions of any input | 15:12.53 |
| You can't guarantee the presence of any of the comments. For DSC-conformant files BoundingBox must be present, others are optional | 15:13.37 |
frostym | Is it common for EPS files to be DSC-conforming or are there bigger known applications that aren't? | 15:16.25 |
chrisl | IIRC, a valid EPS must be DSC compliant. | 15:18.13 |
| In the real world, however, there's a lot of crap out there! | 15:19.18 |
kens | Technically EPS files *must* be DSC compliant | 15:19.51 |
| Otherwise they won't work | 15:20.00 |
| Of course, just because it has a '.eps' extension doesn't mean it's really an EPS | 15:20.18 |
frostym | Okay cool I'll rely on the presence of BoundingBox to get the width:height ratio and if it doesn't exist reject the file, we wouldn't want invalid EPS files in our collection anyways. Thanks guys | 15:29.33 |
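A hedged sketch of that check: read the %%BoundingBox comment, print width, height and the width:height ratio, and fail if no usable comment exists so the file can be rejected. The function name eps_bbox_ratio is hypothetical. It assumes LF or CRLF line endings and a plain-text EPS; files with CR-only line endings or a binary preview header would need extra handling.

```shell
#!/bin/sh
# eps_bbox_ratio FILE
# Prints "WIDTH HEIGHT RATIO" from the first usable %%BoundingBox comment
# (llx lly urx ury) and exits non-zero if none is found.
eps_bbox_ratio() {
    awk '
        # Skip the "%%BoundingBox: (atend)" form; the real values then
        # appear in the trailer and are matched when that line is reached.
        /^%%BoundingBox:/ && $2 != "(atend)" && NF >= 5 {
            w = $4 - $2
            h = $5 - $3
            if (w > 0 && h > 0) {
                printf "%g %g %.3f\n", w, h, w / h
                found = 1
                exit
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example call (not run here):
#   eps_bbox_ratio input.eps || echo "rejected: no BoundingBox" >&2
```

For inputs that are not EPS, or that lack the comment, the bbox device mentioned above is the alternative way to obtain the same numbers from Ghostscript itself.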
kens | NP | 15:29.43 |