Ghostscript IRC logs

20181120
nosilver4u	Hi, I use GS to (re)compress the images inside the PDF files on our client sites, and recently upgraded to a new version of GS (9.06 to 9.25), and am having trouble with not getting the same results on the new server.	07:01.57
	I run tests on this PDF as part of our unit testing: https://s3-us-west-2.amazonaws.com/exactlywww/tomtempleartist-bio-2008.pdf	07:02.56
	and this is the command that is run: gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -sOutputFile='tom-temp.pdf' tomtempleartist-bio-2008.pdf	07:03.39
	for some reason, GS no longer seems to be recompressing the images in that PDF, when I compare the original and the new one side by side, the images look identical	07:04.50
	ideas?	07:04.54
	I've got to get some sleep now that I've been trying all sorts of different ideas to resolve it, but I'll check back in the morning if anyone could take a look.	07:07.42
kens	nosilver4u (for the logs) we no longer decompress and recompress JPEG images unless required to do so. If you think you've found a bug, please open a bug report.	07:59.40
nosilver4u	kens, how would I then get GS to recompress the JPEG images, is there an option to override that behavior? Or is it checking to see what the existing dpi is, and skipping if it matches, or something clever like that?	14:19.49
kens	Yes there is an option to disable the JPEG compression, it's in the documentation	14:20.17
	the lack of JPEG decompression that is	14:20.29
	It should be disabling it anyway, if the image is at a higher dpi than the threshold.	14:20.49
	However....	14:20.53
	decompressing, quantising and recompressing can lead to smaller images (though obviously of poorer quality) even when the image does not need to be downsampled	14:21.27
	So you could be seeing a case where the image is already at (or below) the target resolution, so we don't decompress it, thereby preserving the quality, whereas before we did decompress and recompress it, making it smaller but of worse quality	14:22.13
nosilver4u	aha, I see	14:22.47
	that's good to know also	14:22.54
	Do you know how I can check the dpi of the existing images in a pdf?	14:23.37
kens	That's non-trivial	14:23.45
	You can determine the number of samples in each dimension of an image easily enough	14:24.01
	But the 'dpi' depends on how those images are drawn. So an image 300x300 which is drawn in a 1 inch square is 300 dpi, whereas the same image drawn in a 2 inch square is 150 dpi.	14:24.37
	So you need to know the area the image covers when rendered.	14:24.48
	And that depends on the scaling in force at the time the image is drawn	14:25.01
	And that can be affected at any point in the content stream. So the only way to find out is to parse the content stream until you get to the image, then figure out the scaling in force, and use that to determine the dpi	14:25.34
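
The arithmetic described here can be sketched in a few lines of Python. This is only an illustration, not Ghostscript code: the function name `effective_dpi` is made up, and it assumes the drawn size is already known in PDF units (1/72 inch), which in a real file means tracking the scaling through the content stream first.

```python
POINTS_PER_INCH = 72.0  # PDF/PostScript units are 1/72 inch


def effective_dpi(samples: int, drawn_points: float) -> float:
    """Samples per inch along one axis, for an image drawn `drawn_points` units wide."""
    if drawn_points <= 0:
        raise ValueError("drawn size must be positive")
    return samples * POINTS_PER_INCH / drawn_points


# The same 300-sample-wide image drawn 1 inch wide, then 2 inches wide.
print(effective_dpi(300, 72.0))   # 300.0
print(effective_dpi(300, 144.0))  # 150.0
```
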
nosilver4u	okay, that all makes sense so far	14:26.17
kens	Let me know if I'm spouting baby talk, it's hard to know what level to pitch answers at	14:26.45
nosilver4u	I'm very familiar with images, and dpi, and how they do or do not affect display on screens	14:27.41
kens	OK so I can drop a load of the dumber explanations :-)	14:27.56
nosilver4u	But images within a PDF is a bit of a mystery to me	14:27.57
kens	It's much the same as on screen	14:28.07
	But you don't know what the target resolution is :-)	14:28.28
nosilver4u	I presume that a PDF has a pre-determined size & resolution, but whether the images will match that or not...	14:28.44
kens	It has a declared size, but not resolution	14:28.59
	The MediaBox is in PostScript/PDF units (1/72 inch)	14:29.11
	so an 8x11 media is 612x792 units	14:29.29
	umm actually no...	14:29.40
	That would be 576x792	14:30.22
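
The correction can be checked with a quick Python sketch (the helper name `media_box_units` is made up for this illustration). At 72 units per inch, 8x11 inches gives 576x792, while 612x792 corresponds to 8.5x11 inches.

```python
POINTS_PER_INCH = 72  # PDF/PostScript units per inch


def media_box_units(width_in: float, height_in: float) -> tuple:
    """Convert a media size in inches to PDF units (1/72 inch)."""
    return (width_in * POINTS_PER_INCH, height_in * POINTS_PER_INCH)


print(media_box_units(8, 11))    # (576, 792)
print(media_box_units(8.5, 11))  # (612.0, 792)
```
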
	All objects are drawn so they cover an area, normally using vectors	14:30.42
	So they scale accurately	14:30.48
nosilver4u	right, I was just thinking PDFs are more like vector graphics	14:30.59
kens	Yeah, for everything except images :-)	14:31.11
nosilver4u	they have a defined size, but they are meant to scale somewhat	14:31.12
kens	They should scale seamlessly	14:31.20
	Images have to have the samples scaled though	14:31.35
	Which means some kind of interpolation when rendering	14:31.44
	Unless you happen to hit on the resolution where the scaled image exactly matches the pixels of the device	14:32.11
nosilver4u	Yeah, I suppose anytime I've seen one that wasn't scaling well, it was likely due to the image content	14:32.24
	right, sure	14:32.32
kens	Poor scaling could have a lot of reasons, could be a low res image, could be poor quality interpolation (or downsampling) in the rendering engine	14:33.13
	Note that the pdfwrite device doesn't (by default) modify the images at all	14:33.51
	That's why we moved to the JPEG preservation, because for high quality work decompressing a JPEG and then reapplying JPEG compression leads to nasty artefacts	14:34.17
	Obviously if you are changing colour space, or downsampling images, then we do have to decompress and recompress	14:34.36
	Did you find the documentation on JPEG preservation ?	14:35.56
nosilver4u	so I found the pdfimages tool, was looking for that before I go manually overriding the JPG preservation, but that's next	14:36.25
kens	the pdfimage device ? Those render the PDF to an image, then wrap the image back up as a PDF	14:36.58
nosilver4u	no, the pdfimage command-line tool to find the resolution of an image	14:37.19
	it also gives a ppi setting	14:37.28
kens	Oh, not familiar with it	14:37.30
nosilver4u	my test PDF looks like this:	14:37.46
kens	The documentation is here:	14:37.48
	https://www.ghostscript.com/doc/9.25/VectorDevices.htm#PDFWRITE	14:37.48
nosilver4u	page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio	14:38.01
kens	The control is called PassThroughJPEGImages	14:38.04
nosilver4u	--------------------------------------------------------------------------------------------	14:38.06
	1 0 image 611 447 rgb 3 8 jpeg yes 12 0 200 201 31.8K 4.0%	14:38.10
	1 1 image 868 430 rgb 3 8 jpeg yes 15 0 201 200 88.8K 8.1%	14:38.15
kens	so 200 dpi	14:38.31
nosilver4u	weird, thought I was just on that page, tried searching, nothing, so I clicked your link, and there it is... crazy :)	14:40.01
kens	IIRC you were using ebook ? The colour and gray image resolution for ebook is 150 dpi and the threshold is 1.5. So an image would have to exceed 225 dpi to be downsampled. Mono images are at 300 dpi with a threshold of 1.5, so they would have to exceed 450 dpi	14:40.06
nosilver4u	yup, was using ebook, that all looks right on the chart	14:40.55
	learning a lot, thanks!	14:40.59
kens	NP	14:41.17
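
The ebook figures quoted above can be worked through with a rough Python sketch. It assumes the rule is simply "image resolution exceeds target resolution times threshold", as kens describes it; the function name `would_downsample` is hypothetical and this is not Ghostscript's actual code.

```python
def would_downsample(image_dpi: float, target_dpi: float, threshold: float) -> bool:
    """True if the image resolution exceeds target_dpi * threshold."""
    return image_dpi > target_dpi * threshold


# Colour/gray images under the ebook preset: 150 dpi target, threshold 1.5.
print(150 * 1.5)                        # 225.0
print(would_downsample(201, 150, 1.5))  # False: the ~200 dpi test images are left alone
print(would_downsample(230, 150, 1.5))  # True
# Mono images: 300 dpi target, threshold 1.5.
print(would_downsample(400, 300, 1.5))  # False: needs to exceed 450 dpi
```

With a lower threshold, say 1.3, the same 201 dpi image would exceed the limit and be downsampled.
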
nosilver4u	with the ebook preset, I was wondering about the sampling method, bicubic is better than the average method usually, right?	14:43.10
kens	Generally, yes, at the cost of performance	14:43.33
nosilver4u	So I could just define the ColorImageDownsampleType setting to override that	14:43.39
kens	Yes	14:43.47
nosilver4u	While I generally want it to be fast, I can sacrifice a little speed for better quality images	14:44.03
kens	But you must define it after PDFSETTINGS on the command line	14:44.12
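
A minimal sketch of that ordering, building the argument list in Python. The switches are the ones from the command quoted earlier in this log plus `-dColorImageDownsampleType=/Bicubic`; the helper `build_gs_args` is made up for illustration, and the sketch only assembles the command, it does not run Ghostscript.

```python
def build_gs_args(src: str, dst: str, overrides: list) -> list:
    """Build a pdfwrite command line with overrides placed after the preset."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dPDFSETTINGS=/ebook",
        *overrides,  # placed after -dPDFSETTINGS, as advised in the log
        "-dNOPAUSE",
        "-dBATCH",
        f"-sOutputFile={dst}",
        src,
    ]


args = build_gs_args(
    "tomtempleartist-bio-2008.pdf",
    "tom-temp.pdf",
    ["-dColorImageDownsampleType=/Bicubic"],
)
print(" ".join(args))
```

The resulting list could be handed to something like `subprocess.run(args, check=True)` on a machine where `gs` is installed.
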
nosilver4u	I'm hoping to find something that handles text within the images better	14:44.40
	I know, crazy, right? text within an image contained in a pdf, it's nuts, but people do it...	14:45.01
kens	Well, don't use JPEG output :-)	14:45.04
nosilver4u	they shouldn't, obviously, and I warn them the compressor will probably blow up if they do that, but it'd be nice if bicubic sampling would help some	14:45.51
	I'll do some testing, see if I have a sample PDF for that, hopefully it makes a difference for those crazy folks :)	14:46.28
kens	What I meant was don't use JPEG compression on the output from pdfwrite if it contains images	14:46.48
	Use Flate instead	14:46.55
	Bah two conversations at once, can't keep up. I mean if the output PDF contains an image, which has text, then compress the PDF with Flate rather than JPEG	14:47.41
	The result on text will be better (though the size will be larger)	14:47.58
	Bicubic downsampling might help, not sure	14:48.16
nosilver4u	well, the passthrough option worked, although given what I know now, I probably won't use it, but I might decrease the threshold a bit	14:50.44
kens	All the parameters can be fiddled with. If you have a consistent workflow you can probably tinker with the numbers until you get a good result	14:51.18

Log of #ghostscript at irc.freenode.net.