Log of #ghostscript at irc.freenode.net.

20181120
nosilver4u Hi, I use GS to (re)compress the images inside the PDF files on our client sites, and recently upgraded to a new version of GS (9.06 to 9.25), and am having trouble with not getting the same results on the new server.07:01.57 
  I run tests on this PDF as part of our unit testing: https://s3-us-west-2.amazonaws.com/exactlywww/tomtempleartist-bio-2008.pdf07:02.56 
  and this is the command that is run: gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -sOutputFile='tom-temp.pdf' tomtempleartist-bio-2008.pdf07:03.39 
  for some reason, GS no longer seems to be recompressing the images in that PDF, when I compare the original and the new one side by side, the images look identical07:04.50 
  ideas?07:04.54 
  I've got to get some sleep now that I've been trying all sorts of different ideas to resolve it, but I'll check back in the morning if anyone could take a look.07:07.42 
kens nosilver4u (for the logs) we no longer decompress and recompress JPEG images unless required to do so. If you think you've found a bug, please open a bug report.07:59.40 
nosilver4u kens, how would I then get GS to recompress the JPEG images, is there an option to override that behavior? Or is it checking to see what the existing dpi is, and skipping if it matches, or something clever like that?14:19.49 
kens Yes there is an option to disable the JPEG compression, it's in the documentation14:20.17 
  the lack of JPEG decompression that is14:20.29 
  It *should* be disabling it anyway, if the image is at a higher dpi than the threshold.14:20.49 
  However....14:20.53 
  decompressing, quantising and recompressing can lead to smaller images (though obviously of poorer quality) even when the image does not need to be downsampled14:21.27 
  So you could be seeing a case where the image is already at (or below) the target resolution, so we don't decompress it, thereby preserving the quality, whereas before we did decompress and recompress it, making it smaller but of worse quality14:22.13 
nosilver4u aha, I see14:22.47 
  that's good to know also14:22.54 
  Do you know how I can check the dpi of the existing images in a pdf?14:23.37 
kens That's non-trivial14:23.45 
  You can determine the number of samples in each dimension of an image easily enough14:24.01 
  But the 'dpi' depends on how those images are drawn. So an image 300x300 which is drawn in a 1 inch square is 300 dpi, whereas the same image drawn in a 2 inch square is 150 dpi.14:24.37 
  So you need to know the area the image covers when rendered.14:24.48 
  And that depends on the scaling in force at the time the image is drawn14:25.01 
  And that can be affected at any point in the content stream. So the only way to find out is to parse the content stream until you get to the image, then figure out the scaling in force, and use that to determine the dpi14:25.34 
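The arithmetic kens describes can be sketched in a few lines of Python. This is only an illustration, not Ghostscript's code: the function names are made up for this example, and it assumes you have already recovered the matrix in force when the image is drawn (in PDF an image occupies the unit square, so the first four matrix entries give its drawn size in units of 1/72 inch).

```python
import math

UNITS_PER_INCH = 72.0  # PDF/PostScript default user space units per inch


def drawn_size_units(a: float, b: float, c: float, d: float) -> tuple:
    """Drawn width and height, in 1/72 inch units, of a unit-square image
    placed with the matrix [a b c d e f] (translation e, f is irrelevant)."""
    return math.hypot(a, b), math.hypot(c, d)


def effective_dpi(samples: int, drawn_units: float) -> float:
    """Samples along one axis divided by the drawn length in inches."""
    if drawn_units <= 0:
        raise ValueError("drawn size must be positive")
    return samples / (drawn_units / UNITS_PER_INCH)


# A 300x300 image drawn in a 1 inch square, then in a 2 inch square.
w1, h1 = drawn_size_units(72, 0, 0, 72)
w2, h2 = drawn_size_units(144, 0, 0, 144)
print(effective_dpi(300, w1), effective_dpi(300, w2))
```

The hard part kens mentions, walking the content stream to find the matrix in force, is not covered here.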
nosilver4u okay, that all makes sense so far14:26.17 
kens Let me know if I'm spouting baby talk, it's hard to know what level to pitch answers at14:26.45 
nosilver4u I'm very familiar with images, and dpi, and how they do or do not affect display on screens14:27.41 
kens OK so I can drop a load of the dumber explanations :-)14:27.56 
nosilver4u But images within a PDF is a bit of a mystery to me14:27.57 
kens It's much the same as on screen14:28.07 
  But you don't know what the target resolution is :-)14:28.28 
nosilver4u I presume that a PDF has a pre-determined size & resolution, but whether the images will match that or not...14:28.44 
kens It has a declared size, but not resolution14:28.59 
  The MediaBox is in PostScript/PDF units (1/72 inch)14:29.11 
  so an 8x11 media is 612x792 units14:29.29 
  umm actually no...14:29.40 
  That would be 576x79214:30.22 
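As a quick check of the correction above, the conversion from inches to MediaBox units is a multiplication by 72. The helper name below is hypothetical; 612x792 is what an 8.5x11 inch page works out to.

```python
UNITS_PER_INCH = 72  # PostScript/PDF units per inch


def media_size_units(width_in: float, height_in: float) -> tuple:
    """Convert a page size in inches to PostScript/PDF units."""
    return width_in * UNITS_PER_INCH, height_in * UNITS_PER_INCH


print(media_size_units(8, 11))    # the 8x11 media kens mentions
print(media_size_units(8.5, 11))  # 8.5x11 for comparison
```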
  All objects are drawn so they cover an area, normally using vectors14:30.42 
  So they scale accurately14:30.48 
nosilver4u right, I was just thinking PDFs are more like vector graphics14:30.59 
kens Yeah, for everything except images :-)14:31.11 
nosilver4u they have a defined size, but they are meant to scale somewhat14:31.12 
kens They should scale seamlessly14:31.20 
  Images have to have the samples scaled though14:31.35 
  Which means some kind of interpolation when rendering14:31.44 
  Unless you happen to hit on the resolution where the scaled image exactly matches the pixels of the device14:32.11 
nosilver4u Yeah, I suppose anytime I've seen one that wasn't scaling well, it was likely due to the image content14:32.24 
  right, sure14:32.32 
kens Poor scaling could have a lot of reasons, could be a low res image, could be poor quality interpolation (or downsampling) in the rendering engine14:33.13 
  Note that the pdfwrite device doesn't (by default) modify the images at all14:33.51 
  That's why we moved to the JPEG preservation, because for high quality work decompressing a JPEG and then reapplying JPEG compression leads to nasty artefacts14:34.17 
  Obviously if you are changing colour space, or downsampling images, then we do have to decompress and recompress14:34.36 
  Did you find the documentation on JPEG preservation ?14:35.56 
nosilver4u so I found the pdfimages tool, was looking for that before I go manually overriding the JPG preservation, but that's next14:36.25 
kens the pdfimage device ? Those render the PDF to an image, then wrap the image back up as a PDF14:36.58 
nosilver4u no, the pdfimages command-line tool to find the resolution of an image14:37.19 
  it also gives a ppi setting14:37.28 
kens Oh, not familiar with it14:37.30 
nosilver4u my test PDF looks like this:14:37.46 
kens The documentation is here:14:37.48 
  https://www.ghostscript.com/doc/9.25/VectorDevices.htm#PDFWRITE14:37.48 
nosilver4u page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio14:38.01 
kens The control is called PassThroughJPEGImages14:38.04 
nosilver4u --------------------------------------------------------------------------------------------14:38.06 
  1 0 image 611 447 rgb 3 8 jpeg yes 12 0 200 201 31.8K 4.0%14:38.10 
  1 1 image 868 430 rgb 3 8 jpeg yes 15 0 201 200 88.8K 8.1%14:38.15 
kens so 200 dpi14:38.31 
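The listing nosilver4u pasted can be read mechanically. The sketch below assumes it came from something like `pdfimages -list` (the exact invocation is not given in the log) and simply pairs each header word with the matching field of a row; the header and rows are copied from the messages above.

```python
HEADER = ("page num type width height color comp bpc enc interp "
          "object ID x-ppi y-ppi size ratio")
ROWS = [
    "1 0 image 611 447 rgb 3 8 jpeg yes 12 0 200 201 31.8K 4.0%",
    "1 1 image 868 430 rgb 3 8 jpeg yes 15 0 201 200 88.8K 8.1%",
]


def parse_listing(header: str, rows: list) -> list:
    """Turn whitespace-separated listing rows into dicts keyed by column name."""
    keys = header.split()
    return [dict(zip(keys, row.split())) for row in rows]


images = parse_listing(HEADER, ROWS)
for img in images:
    # Report the encoding and the horizontal/vertical resolution of each image.
    print(img["num"], img["enc"], int(img["x-ppi"]), int(img["y-ppi"]))
```

Both images come out at roughly 200 ppi in each direction, which is the figure kens reads off.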
nosilver4u weird, thought I was just on that page, tried searching, nothing, so I clicked your link, and there it is... crazy :)14:40.01 
kens IIRC you were using ebook ? The colour and gray image resolution for ebook is 150 dpi and the threshold is 1.5. So an image would have to exceed 225 dpi to be downsampled. Mono images are at 300 dpi with a threshold of 1.5, so they would have to exceed 450 dpi14:40.06 
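The threshold rule kens states is a single comparison. The function below is a hypothetical restatement of it, using the ebook numbers from his message; how pdfwrite treats an image sitting exactly on the cutoff is an assumption here (the sketch follows his "exceed" wording).

```python
def needs_downsampling(image_dpi: float, target_dpi: float, threshold: float) -> bool:
    """True when the image resolution exceeds target resolution times threshold."""
    return image_dpi > target_dpi * threshold


EBOOK_COLOUR = (150, 1.5)  # colour and gray images: cutoff is 225 dpi
EBOOK_MONO = (300, 1.5)    # mono images: cutoff is 450 dpi

# The two test images are at about 200 dpi, below the 225 dpi cutoff.
print(needs_downsampling(200, *EBOOK_COLOUR))
print(needs_downsampling(300, *EBOOK_COLOUR))
```

This is why the 200 dpi JPEGs in the test file are left alone under the ebook preset.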
nosilver4u yup, was using ebook, that all looks right on the chart14:40.55 
  learning a lot, thanks!14:40.59 
kens NP14:41.17 
nosilver4u with the ebook preset, I was wondering about the sampling method, bicubic is better than the average method usually, right?14:43.10 
kens Generally, yes, at the cost of performance14:43.33 
nosilver4u So I could just define the ColorImageDownsampleType setting to override that14:43.39 
kens Yes14:43.47 
nosilver4u While I generally want it to be fast, I can sacrifice a little speed for better quality images14:44.03 
kens But you must define it *after* PDFSETTINGS on the command line14:44.12 
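One way to keep that ordering from slipping is to build the argument list in code. The sketch below reuses the switches from the command earlier in the log and appends any overrides after `-dPDFSETTINGS`; the helper name and the file names are placeholders, and nothing is executed.

```python
def build_gs_args(input_pdf: str, output_pdf: str, overrides: list) -> list:
    """Assemble a pdfwrite command with overrides placed after PDFSETTINGS."""
    args = [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dPDFSETTINGS=/ebook",
    ]
    args += overrides  # must follow PDFSETTINGS, as kens points out
    args += ["-dNOPAUSE", "-dBATCH", f"-sOutputFile={output_pdf}", input_pdf]
    return args


args = build_gs_args("in.pdf", "out.pdf", ["-dColorImageDownsampleType=/Bicubic"])
print(" ".join(args))
```

The resulting list could be handed to `subprocess.run(args, check=True)` on a machine with Ghostscript installed.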
nosilver4u I'm hoping to find something that handles text within the images better14:44.40 
  I know, crazy, right? text within an image contained in a pdf, it's nuts, but people do it...14:45.01 
kens Well, don't use JPEG output :-)14:45.04 
nosilver4u they shouldn't, obviously, and I warn them the compressor will probably blow up if they do that, but it'd be nice if bicubic sampling would help some14:45.51 
  I'll do some testing, see if I have a sample PDF for that, hopefully it makes a difference for those crazy folks :)14:46.28 
kens What I meant was don't use JPEG compression on the output from pdfwrite if it contains images14:46.48 
  Use Flate instead14:46.55 
  Bah two conversations at once, can't keep up. I mean if the output PDF contains an image, which has text, then compress the PDF with Flate rather than JPEG14:47.41 
  The result on text will be better (though the size will be larger)14:47.58 
  Bicubic downsampling might help, not sure14:48.16 
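kens does not name the switches for Flate in this log. As far as the pdfwrite documentation goes, the usual way is to turn off automatic filter selection and name the filter explicitly; the sketch below collects those switches for colour and gray images. Treat the exact set as an assumption to check against the documentation for your Ghostscript version.

```python
def flate_image_flags() -> list:
    """Switches asking pdfwrite to Flate-compress colour and gray images
    instead of choosing JPEG automatically (assumed set, see note above)."""
    flags = []
    for kind in ("Color", "Gray"):
        flags.append(f"-dAutoFilter{kind}Images=false")
        flags.append(f"-d{kind}ImageFilter=/FlateEncode")
    return flags


print(" ".join(flate_image_flags()))
```

Like any other override, these would need to come after `-dPDFSETTINGS` on the command line.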
nosilver4u well, the passthrough option worked, although given what I know now, I probably won't use it, but I might decrease the threshold a bit14:50.44 
kens All the parameters can be fiddled with. If you have a consistent workflow you can probably tinker with the numbers until you get a good result14:51.18 
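The two knobs nosilver4u mentions at the end can be written out as follows. `PassThroughJPEGImages` is the control kens named; `ColorImageDownsampleThreshold` is the corresponding threshold parameter as I understand the documentation, and the value 1.2 is purely an illustrative choice, not a recommendation from the log.

```python
def passthrough_flag(enabled: bool) -> str:
    """Switch controlling whether JPEG images are passed through untouched."""
    return f"-dPassThroughJPEGImages={'true' if enabled else 'false'}"


def colour_threshold_flag(threshold: float) -> str:
    """Switch setting the colour image downsample threshold."""
    return f"-dColorImageDownsampleThreshold={threshold}"


target_dpi = 150   # ebook colour/gray resolution quoted by kens
threshold = 1.2    # hypothetical lowered threshold
cutoff = target_dpi * threshold

print(passthrough_flag(False))
print(colour_threshold_flag(threshold))
print(200 > cutoff)  # whether the ~200 dpi test images would now be downsampled
```

With the default ebook threshold of 1.5 the cutoff is 225 dpi, so lowering it is what brings the 200 dpi test images back into range for downsampling.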