| | 20181120 |
nosilver4u | Hi, I use GS to (re)compress the images inside the PDF files on our client sites. I recently upgraded to a new version of GS (9.06 to 9.25), and I'm not getting the same results on the new server. | 07:01.57 |
| I run tests on this PDF as part of our unit testing: https://s3-us-west-2.amazonaws.com/exactlywww/tomtempleartist-bio-2008.pdf | 07:02.56 |
| and this is the command that is run: gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -sOutputFile='tom-temp.pdf' tomtempleartist-bio-2008.pdf | 07:03.39 |
| for some reason, GS no longer seems to be recompressing the images in that PDF: when I compare the original and the new one side by side, the images look identical | 07:04.50 |
| ideas? | 07:04.54 |
| I've got to get some sleep now that I've been trying all sorts of different ideas to resolve it, but I'll check back in the morning if anyone could take a look. | 07:07.42 |
kens | nosilver4u (for the logs) we no longer decompress and recompress JPEG images unless required to do so. If you think you've found a bug, please open a bug report. | 07:59.40 |
nosilver4u | kens, how would I then get GS to recompress the JPEG images, is there an option to override that behavior? Or is it checking to see what the existing dpi is, and skipping if it matches, or something clever like that? | 14:19.49 |
kens | Yes there is an option to disable the JPEG compression, it's in the documentation | 14:20.17 |
| the lack of JPEG decompression that is | 14:20.29 |
| It *should* be disabling it anyway, if the image is at a higher dpi than the threshold. | 14:20.49 |
| However.... | 14:20.53 |
| decompressing, quantising and recompressing can lead to smaller images (though obviously of poorer quality) even when the image does not need to be downsampled | 14:21.27 |
| So you could be seeing a case where the image is already at (or below) the target resolution, so we don't decompress it, thereby preserving the quality, whereas before we did decompress and recompress it, making it smaller but of worse quality | 14:22.13 |
nosilver4u | aha, I see | 14:22.47 |
| that's good to know also | 14:22.54 |
| Do you know how I can check the dpi of the existing images in a pdf? | 14:23.37 |
kens | That's non-trivial | 14:23.45 |
| You can determine the number of samples in each dimension of an image easily enough | 14:24.01 |
| But the 'dpi' depends on how those images are drawn. So an image 300x300 which is drawn in a 1 inch square is 300 dpi, whereas the same image drawn in a 2 inch square is 150 dpi. | 14:24.37 |
| So you need to know the area the image covers when rendered. | 14:24.48 |
| And that depends on the scaling in force at the time the image is drawn | 14:25.01 |
| And that can be affected at any point in the content stream. So the only way to find out is to parse the content stream until you get to the image, then figure out the scaling in force, and use that to determine the dpi | 14:25.34 |
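A rough Python sketch of the arithmetic kens describes here. It assumes the content stream has already been parsed and that the current transformation matrix (CTM) in force when the image is painted is known; the function name and the tuple layout are illustrative and not part of any library.

```python
import math

def effective_dpi(width_px, height_px, ctm):
    """Estimate the resolution of an image as drawn on the page.

    ctm is the six-number PDF matrix (a, b, c, d, e, f) in force when the
    image is painted. A PDF image occupies a 1x1 unit square, so the CTM
    maps that square onto the drawn area, measured in points (1/72 inch).
    """
    a, b, c, d, _e, _f = ctm
    drawn_width_in = math.hypot(a, b) / 72.0   # length of the image's x edge
    drawn_height_in = math.hypot(c, d) / 72.0  # length of the image's y edge
    return width_px / drawn_width_in, height_px / drawn_height_in

one_inch = (72, 0, 0, 72, 0, 0)    # image drawn in a 1 inch square
two_inch = (144, 0, 0, 144, 0, 0)  # same image drawn in a 2 inch square
print(effective_dpi(300, 300, one_inch))
print(effective_dpi(300, 300, two_inch))
```

The hard part in practice is obtaining the CTM, since it can be changed at any point in the content stream before the image is drawn.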
nosilver4u | okay, that all makes sense so far | 14:26.17 |
kens | Let me know if I'm spouting baby talk, it's hard to know what level to pitch answers at | 14:26.45 |
nosilver4u | I'm very familiar with images, and dpi, and how they do or do not affect display on screens | 14:27.41 |
kens | OK so I can drop a load of the dumber explanations :-) | 14:27.56 |
nosilver4u | But images within a PDF are a bit of a mystery to me | 14:27.57 |
kens | It's much the same as on screen | 14:28.07 |
| But you don't know what the target resolution is :-) | 14:28.28 |
nosilver4u | I presume that a PDF has a pre-determined size & resolution, but whether the images will match that or not... | 14:28.44 |
kens | It has a declared size, but not resolution | 14:28.59 |
| The MediaBox is in PostScript/PDF units (1/72 inch) | 14:29.11 |
| so an 8x11 media is 612x792 units | 14:29.29 |
| umm actually no... | 14:29.40 |
| That would be 576x792 | 14:30.22 |
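The unit conversion behind this correction, as a small Python sketch (the helper name is made up for illustration). At 72 units per inch, 8x11 inches is 576x792 units, while 612x792 corresponds to 8.5x11 inches.

```python
POINTS_PER_INCH = 72  # one PostScript/PDF unit is 1/72 inch

def inches_to_points(width_in, height_in):
    """Convert a page size in inches to MediaBox units."""
    return width_in * POINTS_PER_INCH, height_in * POINTS_PER_INCH

print(inches_to_points(8, 11))
print(inches_to_points(8.5, 11))
```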
| All objects are drawn so they cover an area, normally using vectors | 14:30.42 |
| So they scale accurately | 14:30.48 |
nosilver4u | right, I was just thinking PDFs are more like vector graphics | 14:30.59 |
kens | Yeah, for everything except images :-) | 14:31.11 |
nosilver4u | they have a defined size, but they are meant to scale somewhat | 14:31.12 |
kens | They should scale seamlessly | 14:31.20 |
| Images have to have the samples scaled though | 14:31.35 |
| Which means some kind of interpolation when rendering | 14:31.44 |
| Unless you happen to hit on the resolution where the scaled image exactly matches the pixels of the device | 14:32.11 |
nosilver4u | Yeah, I suppose anytime I've seen one that wasn't scaling well, it was likely due to the image content | 14:32.24 |
| right, sure | 14:32.32 |
kens | Poor scaling could have a lot of reasons, could be a low res image, could be poor quality interpolation (or downsampling) in the rendering engine | 14:33.13 |
| Note that the pdfwrite device doesn't (by default) modify the images at all | 14:33.51 |
| That's why we moved to the JPEG preservation, because for high quality work decompressing a JPEG and then reapplying JPEG compression leads to nasty artefacts | 14:34.17 |
| Obviously if you are changing colour space, or downsampling images, then we do have to decompress and recompress | 14:34.36 |
| Did you find the documentation on JPEG preservation? | 14:35.56 |
nosilver4u | so I found the pdfimages tool, was looking for that before I go manually overriding the JPG preservation, but that's next | 14:36.25 |
kens | the pdfimage device ? Those render the PDF to an image, then wrap the image back up as a PDF | 14:36.58 |
nosilver4u | no, the pdfimages command-line tool to find the resolution of an image | 14:37.19 |
| it also gives a ppi setting | 14:37.28 |
kens | Oh, not familiar with it | 14:37.30 |
nosilver4u | my test PDF looks like this: | 14:37.46 |
kens | The documentation is here: | 14:37.48 |
| https://www.ghostscript.com/doc/9.25/VectorDevices.htm#PDFWRITE | 14:37.48 |
nosilver4u | page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio | 14:38.01 |
kens | The control is called PassThroughJPEGImages | 14:38.04 |
nosilver4u | -------------------------------------------------------------------------------------------- | 14:38.06 |
| 1 0 image 611 447 rgb 3 8 jpeg yes 12 0 200 201 31.8K 4.0% | 14:38.10 |
| 1 1 image 868 430 rgb 3 8 jpeg yes 15 0 201 200 88.8K 8.1% | 14:38.15 |
kens | so 200 dpi | 14:38.31 |
nosilver4u | weird, thought I was just on that page, tried searching, nothing, so I clicked your link, and there it is... crazy :) | 14:40.01 |
kens | IIRC you were using ebook? The colour and gray image resolution for ebook is 150 dpi and the threshold is 1.5. So an image would have to exceed 225 dpi to be downsampled. Mono images are at 300 dpi with a threshold of 1.5, so they would have to exceed 450 dpi | 14:40.06 |
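The threshold rule described here can be written as a one-line check. This is only a sketch of the arithmetic in the message, not Ghostscript's actual code; the function name is hypothetical, and the strict "greater than" comparison is an assumption based on the word "exceed".

```python
def needs_downsampling(image_dpi, target_dpi, threshold):
    """Return True if an image is far enough above the target to be downsampled.

    Assumption: the cut-off is target_dpi * threshold and the image must
    strictly exceed it, as the conversation describes.
    """
    return image_dpi > target_dpi * threshold

# Colour/gray images under /ebook: target 150 dpi, threshold 1.5 (cut-off 225 dpi)
print(needs_downsampling(200, 150, 1.5))  # the ~200 ppi images in the test PDF
# Mono images under /ebook: target 300 dpi, threshold 1.5 (cut-off 450 dpi)
print(needs_downsampling(600, 300, 1.5))
```

With these numbers the two roughly 200 ppi images in the test PDF fall below the 225 dpi cut-off, which matches the observation that they were left untouched.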
nosilver4u | yup, was using ebook, that all looks right on the chart | 14:40.55 |
| learning a lot, thanks! | 14:40.59 |
kens | NP | 14:41.17 |
nosilver4u | with the ebook preset, I was wondering about the sampling method, bicubic is better than the average method usually, right? | 14:43.10 |
kens | Generally, yes, at the cost of performance | 14:43.33 |
nosilver4u | So I could just define the ColorImageDownsampleType setting to override that | 14:43.39 |
kens | Yes | 14:43.47 |
nosilver4u | While I generally want it to be fast, I can sacrifice a little speed for better quality images | 14:44.03 |
kens | But you must define it *after* PDFSETTINGS on the command line | 14:44.12 |
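One way to keep that ordering straight when the command is assembled by a script is to always emit the preset first and append individual overrides after it. The sketch below only builds the argument list and does not run gs; the helper name and its parameters are illustrative, while the switches themselves are the ones from the command earlier in this log plus the ColorImageDownsampleType setting just discussed.

```python
def build_gs_args(src, dst, preset="/ebook", overrides=None):
    """Build a gs argument list with overrides placed after -dPDFSETTINGS."""
    args = [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dNOPAUSE",
        "-dBATCH",
        f"-dPDFSETTINGS={preset}",  # the preset goes first ...
    ]
    # ... then the individual settings, so they are defined after the preset
    for key, value in (overrides or {}).items():
        args.append(f"-d{key}={value}")
    args += [f"-sOutputFile={dst}", src]
    return args

args = build_gs_args(
    "tomtempleartist-bio-2008.pdf",
    "tom-temp.pdf",
    overrides={"ColorImageDownsampleType": "/Bicubic"},
)
print(" ".join(args))
```

The resulting list could be passed to something like subprocess.run; that step is left out here so the sketch does not depend on gs being installed.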
nosilver4u | I'm hoping to find something that handles text within the images better | 14:44.40 |
| I know, crazy, right? text within an image contained in a pdf, it's nuts, but people do it... | 14:45.01 |
kens | Well, don't use JPEG output :-) | 14:45.04 |
nosilver4u | they shouldn't, obviously, and I warn them the compressor will probably blow up if they do that, but it'd be nice if bicubic sampling would help some | 14:45.51 |
| I'll do some testing, see if I have a sample PDF for that, hopefully it makes a difference for those crazy folks :) | 14:46.28 |
kens | What I meant was don't use JPEG compression on the output from pdfwrite if it contains images | 14:46.48 |
| Use Flate instead | 14:46.55 |
| Bah two conversations at once, can't keep up. I mean if the output PDF contains an image which has text, then compress the PDF with Flate rather than JPEG | 14:47.41 |
| The result on text will be better (though the size will be larger) | 14:47.58 |
| Bicubic downsampling might help, not sure | 14:48.16 |
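A hedged sketch of what switching image compression to Flate might look like as pdfwrite switches. The parameter names (AutoFilterColorImages, ColorImageFilter and their Gray counterparts) are the usual distiller-style image parameters; whether this exact combination behaves as wanted on a given Ghostscript version is an assumption to check against the documentation. The helper name is made up.

```python
def flate_image_flags():
    """Return switches asking for Flate rather than JPEG on colour and gray images.

    Assumption: automatic filter selection must be turned off so that the
    explicit filter choice is honoured.
    """
    flags = []
    for kind in ("Color", "Gray"):
        flags.append(f"-dAutoFilter{kind}Images=false")
        flags.append(f"-d{kind}ImageFilter=/FlateEncode")
    return flags

print(" ".join(flate_image_flags()))
```

As with any other override, these would need to appear after PDFSETTINGS on the command line.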
nosilver4u | well, the passthrough option worked, although given what I know now, I probably won't use it, but I might decrease the threshold a bit | 14:50.44 |
kens | All the parameters can be fiddled with. If you have a consistent workflow you can probably tinker with the numbers until you get a good result | 14:51.18 |