| 20160524 |
Robin_Watts | np. | 00:05.08 |
anddam | Robin_Watts: sorry, I was afk | 06:48.54 |
| I'm on OS X, I'm using ghostscript from ports | 06:49.09 |
| I tried using pdfwrite device with different PDFSETTINGS | 06:50.22 |
| specifically I tried /screen that's supposedly very low in quality, the resolution should be 72 dpi IIRC so about a quarter of that 200 you suggested | 06:51.02 |
| and yet I get a 4MB file for 13, b/w printed pages | 06:51.18 |
| it's an excerpt from an old book | 06:51.34 |
| I see the pages were scanned as color, and then someone tried to play with gamma, but I can see the yellow from the old paper | 06:52.08 |
| I tried converting to mono, unluckily I couldn't use the psmono device that I read about on stackoverflow, my installation doesn't have it | 06:52.47 |
| I tried imagemagick to convert to mono with the very same results | 06:52.56 |
| oh I'm using 9.19 | 06:53.37 |
kens | anddam don't use the PDFSETTINGS canned settings, they change *lots* of things all at once. Read up the documentation on each control, decide which ones you want to use, and apply them individually. Given that you are starting from image data, and want to retain legibility, you have a fairly severe problem because reducing the resolution of the images will impact the legibility of the text. | 07:12.07 |
anddam | kens: oh hi, so you read the previous lines, I guess | 07:12.38 |
| I'd actually just drop any kind of color mapping, this is old b/w text with no figures | 07:13.00 |
kens | I would say the first thing you should do, since you say these are colour, is to use -sColorConversionStrategy=Gray to convert the images into gray scale. That should save 66% of the size straight off | 07:13.10 |
| Yeah I read the logs, we all do | 07:13.26 |
| Without seeing the file it's hard to give concrete suggestions | 07:13.42 |
| It may be that the image data is JPEG compressed, in which case you will want to have pdfwrite use Flate compression on the output, to prevent the artefacts caused by quantising JPEG data twice | 07:14.21 |
anddam | I can share it if you have a minute to have a look, I didn't as I didn't want to impose | 07:14.24 |
kens | I can take a cuik look | 07:14.33 |
| err quick | 07:14.37 |
| My mailbox seems unusually full this morning:-( | 07:15.11 |
anddam | I have the gray-ified version, 5.6 MB, and the original scanned one, 115MB | 07:15.12 |
kens | Well that's a big improvement :-) | 07:15.21 |
anddam | I have them as dropbox links already | 07:15.23 |
kens | OK | 07:15.31 |
anddam | https://dl.dropbox.com/s/46gh9w65vfk0xjo/Italo%20Greek%20coins%20%28Hands%201912%29.pdf | 07:15.40 |
kens | OK got that, one second | 07:16.04 |
anddam | https://dl.dropbox.com/s/w7n2f47dvd628ct/Italo-greeks%20coins%20%28beige%29.pdf | 07:16.09 |
kens | That gray one seems to have yellow on it on page 2 | 07:16.30 |
| and 3 in fact | 07:16.44 |
anddam | yes, not well greyified, I don't even know what the person that asked me actually did | 07:16.58 |
kens | Hmm I'll just prise the file open | 07:17.12 |
| well decompressed that comes out at 240 MB so I guess these are quite high resolution images | 07:18.26 |
| Oh wow, they also include transparency oO | 07:18.51 |
anddam | is the right approach to compress the inner images? | 07:18.51 |
kens | 'inner' ? | 07:19.06 |
| The PDF file images are already compressed, that's why it explodes in size when I decompress it | 07:19.22 |
anddam | I don't know well how PDF format works, but I figure it embeds object, and some of those object can be images | 07:19.29 |
kens | For image data, yes those are included, and the 'gray' file certainly has the images compressed | 07:19.57 |
| getting the larger file is taking longer | 07:20.07 |
anddam | I can see how those pages converted b/w or grayscale and with a dpi decent for screen reading would take very little space | 07:20.09 |
kens | Well, *less* space, but the inclusion of transparency is kind of a bad thing. | 07:20.33 |
anddam | I thought gs had some automagic recipe for that, I actually use the PDFSETTINGS trick as oneline to shrink PDF files I read | 07:20.49 |
kens | You might, strangely, do better to convert into PDF 1.3 files, which will 'flatten' the transparency | 07:20.52 |
| Don't use PDFSETTINGS | 07:21.00 |
anddam | yes, I mean _in past_ | 07:21.08 |
kens | It changes too much stuff, I keep telling people this | 07:21.12 |
anddam | as you just did to me, but I didn't know before you told me | 07:21.29 |
kens | No, the net is sadly full of cargo cult approaches to using GS and pdfwrite | 07:22.38 |
| Not many people bother to read the documentation | 07:22.46 |
anddam | kens: just as overall description what approach do you suggest here? | 07:22.47 |
| well, it depends on how much time do you have and how handy that documentation is | 07:23.06 |
kens | I would look at the PDF files, decide where the space is being used, then apply the controls I felt were best used to reduce that space | 07:23.20 |
| So the big file only increases to 417 MB which means it isn't well compressed | 07:23.41 |
anddam | I mean I went looking the man page, there was the generic -sname=string option for "systemdict" then I googled "gs systemdict" and couldn't find what it was right then, so I dropped it | 07:24.08 |
kens | We don't supply man pages. Our documentation is shipped as HTML in the ghostscript/doc directory | 07:24.33 |
| The original file has all the images in an ICCBased space with a RGB base space | 07:25.48 |
anddam | mmm now I understand the reference to "usage documentation" in man better, "For more information, see /opt/local/share/ghostscript/9.19/doc/Use.htm." | 07:26.26 |
kens | Yes that's our documentation :) | 07:27.00 |
| For some reason the 'gray' PDF has *some* images converted to gray scale, but not all of them...... | 07:27.19 |
anddam | kens: that ICCBased looks worrying, from your sentence, but I'm not sure what it is. I think ICC is something related to color profiles | 07:27.21 |
kens | Yes ICC is the International Color Consortium | 07:28.01 |
| It just means it's a complex space | 07:28.14 |
anddam | I'm reading https://blog.idrsolutions.com/2011/04/understanding-the-pdf-file-format-%E2%80%93-iccbased-colorspaces/ | 07:28.24 |
kens | The fact that it's based on RGB just really says that it's colour, not grayscale | 07:28.29 |
| ICC isn't really something to worry about, it just adds to the complexity a little | 07:28.59 |
anddam | I see | 07:29.02 |
kens | The original file *does* contain transparency though | 07:29.28 |
anddam | a 1-component ICCBased space would be more appropriate there, wouldn't it? | 07:29.29 |
kens | Yes it would | 07:29.34 |
| But it isn't what is there, it's a 3 component space | 07:29.44 |
anddam | I can understand that transparency thing is bad | 07:29.50 |
| so how can I remove the transparency? how can I go to 1-component color space? | 07:30.06 |
kens | It makes the file much larger and *way* slower for no real reason as far as I can see | 07:30.11 |
| For transparency flattening you convert to a PDF 1.3 file | 07:30.33 |
anddam | I thought of exporting the pdf pages as image, manipulate the images with imagemagick and reassemble the pdf | 07:30.34 |
kens | That turns all the transparency into images, which I wouldn't normally recommend, but since it's all images already that's not a problem | 07:30.53 |
| using imagemagick would work fine | 07:31.15 |
anddam | and does that require the 1.3 version passage? I mean reassembling images wouldn't add transparency | 07:31.48 |
kens | If you just render to, say, TIFF then downscale the TIFF and re-export as PDF then you don't need to go to 1.3. There is no way that would add transparency back in as there is no reason to | 07:32.47 |
anddam | I'm using gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 -dNOPAUSE -dBATCH and it's quite slow | 07:33.00 |
| I mean way slower than when I used to use PDFSETTINGS (no, I'm not going to anymore) | 07:33.34 |
kens | It will be slow, it has to render all these high resolution images, and the transparency, to image data | 07:33.36 |
| The difference is that it's rendering the images, not just reading and writing them, and it has to take the transparency into account when rendering, which it doesn't when preserving the transparency | 07:34.22 |
| It doesn't matter whether you use GS+pdfwrite to do the flattening, or render to bitmaps, it's going to be slow | 07:35.03 |
| But it's the only real way to lose the transparency. By the way the file opens really slowly in Acrobat too, and that's due to the transparency | 07:35.30 |
| Ha, your original file lies, it claims to be PDF 1.3 but uses transparency, which is only legal in a PDF 1.4 or later file | 07:36.26 |
anddam | once I have the 1.3 flattened pdf, what's the proper approach to convert to mono or grayscale? | 07:36.44 |
| I mean this literally has to be just read on screen | 07:36.51 |
kens | Using pdfwrite you set -sColorConversionStrategy=Gray; you can't convert to monochrome with GS, you would need to use ImageMagick | 07:37.19 |
anddam | I read that /screen setting used 72 dpi, that's a bit low for today's screen, isn't it? | 07:38.12 |
kens | Try it and see is the best I can suggest | 07:38.37 |
| Wow that really is slow | 07:39.24 |
| Oh it's rendering at 720 dpi, silly me | 07:41.16 |
| OK this: | 07:42.18 |
| gs -r72 -sColorConversionStrategy=Gray -dCompatibilityLevel=1.3 -sDEVICE=pdfwrite -o out.pdf "Italo.....pdf" | 07:42.18 |
| gives me 65KB per page in a quick time | 07:42.31 |
| Hmm 4.5MB | 07:43.41 |
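Note: a minimal sketch of the command line above wrapped in a shell function, not something posted in the channel. The function name `flatten_gray`, the `RUN` switch and the file names `input.pdf` / `out.pdf` are illustrative assumptions. By default it only prints the command so it can be checked before anything is rendered.

```shell
#!/bin/sh
# Sketch: flatten transparency by writing PDF 1.3, convert colour to gray,
# and render at a chosen resolution instead of pdfwrite's 720 dpi default.
# flatten_gray, RUN, input.pdf and out.pdf are placeholder names.

flatten_gray() {
  # $1 = resolution in dpi, $2 = input PDF, $3 = output PDF
  set -- gs "-r$1" -sColorConversionStrategy=Gray -dCompatibilityLevel=1.3 \
         -sDEVICE=pdfwrite -o "$3" "$2"
  echo "$*"                       # show the exact command line
  if [ "${RUN:-0}" = 1 ]; then    # only execute when RUN=1 is set
    "$@"
  fi
}

# Dry run: prints the command without calling Ghostscript.
flatten_gray 72 input.pdf out.pdf
```

To actually convert a file, call it as `RUN=1 flatten_gray 72 input.pdf out.pdf`; printing first makes a misspelled option name easier to spot.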
kens2 | ah I see the original really does have beige background | 07:45.16 |
anddam | my flattened 1.3.pdf is 50 MB | 07:45.40 |
kens2 | Starting with the 'gray' file you supplied, I get 4.5MB with the command line above | 07:46.01 |
anddam | I converted it just with CompatibilityLevel=1.3 | 07:46.11 |
kens2 | The original is much smaller, but the beige background makes it illegible | 07:46.16 |
| OK set -r72 as well | 07:46.25 |
anddam | should now I use this 1.3 file for further conversion or the original? | 07:46.26 |
| I mean as basis | 07:46.32 |
kens2 | I'm seeing better results using your gray PDF file, rather than the beige one | 07:46.48 |
| The beige background is ugly when converted to gray scale | 07:47.07 |
anddam | I figure that could be filtered out (with simple images) with mono color space and a threshold | 07:47.33 |
kens2 | You could try that, certainly | 07:47.52 |
| The reason you got a big PDF file is that the rendering defaults to 720 dpi, so if you use -r72 you will get better results | 07:48.21 |
| But like I said, the background makes it difficult to read | 07:48.35 |
| I would suggest that your best approach is to use GS to render the pages to something like TIFF, open the TIFF files in ImageMagick and process them there to remove the colour and reduce the resolution | 07:49.05 |
| Then export back to PDF | 07:49.10 |
anddam | why TIFF? | 07:49.25 |
kens2 | It's best to use an image manipulation tool to manipulate images | 07:49.29 |
| Because TIFF is lossless | 07:49.35 |
anddam | I usually use PNG for lossless | 07:49.39 |
| does TIFF have any advantage? | 07:49.47 |
kens2 | Well use PNG then, it doesn't matter | 07:49.49 |
anddam | ok | 07:49.54 |
kens2 | Just don't use JPEG | 07:49.55 |
anddam | ofc | 07:50.00 |
kens2 | You'd be surprised how many people don't understand that :) | 07:50.11 |
anddam | well you can understand how quality is of the utmost importance in a hundred-year-old book about Greek coins in southern Italy | 07:50.45 |
| that's a hot topic on twitter | 07:50.57 |
kens2 | LOL | 07:51.01 |
| It's important for legal documentation | 07:51.18 |
| This kind of processing wouldn't be acceptable there | 07:51.28 |
anddam | my 72dpi Gray converted 1.3 file is still 55MB | 07:51.35 |
| this is odd | 07:51.37 |
kens2 | That is indeed odd. | 07:51.45 |
anddam | gs -r72 -sColoConversionStrategy=Gray -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 -dNOPAUSE -dBATCH -sOutputFile=Italo\… | 07:51.50 |
| Colo* | 07:51.56 |
| dammit | 07:51.57 |
kens2 | :-) | 07:52.01 |
anddam | 4.4 MB | 07:53.07 |
kens2 | For me at 150 dpi the original beige PDF comes out at 1.9MB after flattening at 150 | 07:53.23 |
| Mine is better, but the JPEG artefacts are hard to read | 07:53.50 |
anddam | this is my actual line gs -r72 -sColorConversionStrategy=Gray -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 -dNOPAUSE -dBATCH -sOutputFile=[…] | 07:53.50 |
| what's the proper tool to explode into imagesm pdftk? | 07:54.05 |
| s/imagesm/images,/ | 07:54.10 |
kens2 | You can use GS to render to image | 07:54.18 |
| Just use a different device | 07:54.24 |
anddam | no | 07:54.26 |
| _you_ can use gs to render to image | 07:54.33 |
| I'm not able to | 07:54.40 |
kens2 | -sDEVICE=tiff24nc -sOutputFile=out%d.tif | 07:54.58 |
| let me just see the PNG output | 07:55.06 |
anddam | I'm sure I'll search the web and the first answer on stackoverflow for splitting will use -sPDFSETTINGS | 07:55.07 |
kens2 | Well you can create multiple PDF files | 07:55.19 |
| just use -o out%d.pdf | 07:55.29 |
| Seems we have a number of PNG devices | 07:56.29 |
anddam | in the folder where pdfwrite.ps is I don't see the devices I've been reading about | 07:56.30 |
| like that tiff24nc or psmono | 07:56.38 |
kens2 | The devices are compiled in | 07:56.43 |
anddam | I expected all devices to be there | 07:56.46 |
kens2 | pdfwrite.ps is a PostScript program, not a device | 07:57.02 |
| if you do gs -help it will list the devices | 07:57.19 |
anddam | oh, I thought -sDEVICE loaded an external module or so | 07:57.27 |
| yep, lot of those in -help | 07:57.42 |
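Note: a small sketch of how that device list could be checked from a script, not something posted in the channel. The helper name `has_device` is made up, and it assumes the help text lists device names separated by whitespace, which is how the list appears in practice; the sample text in the demonstration is an invented excerpt, not real output.

```shell
#!/bin/sh
# Sketch: succeed if a device name appears in Ghostscript help text read
# from stdin. has_device is a hypothetical helper, not part of Ghostscript.

has_device() {
  # Put each whitespace-separated word on its own line, then look for an
  # exact, whole-line match of the requested device name.
  tr -s ' \t' '\n\n' | grep -qxF -- "$1"
}

# With a real installation (not run here):
#   gs -h | has_device png16m && echo "png16m is compiled in"

# Self-contained demonstration on an invented excerpt of help text:
printf 'Available devices:\n   bbox pdfwrite png16m\n   tiff24nc\n' |
  has_device png16m && echo "found png16m"
```

Matching whole words avoids false positives such as `png16` matching inside `png16m`.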
kens2 | but -sDEVICE=png16m -o out%d.png will get you PNG output | 07:57.43 |
| You can use -r to change the resolution of the output | 07:57.56 |
anddam | why 16m ? | 07:58.05 |
kens2 | full colour | 07:58.10 |
| I was thinking you would open the PNG in IM and then play with them there, so best to have the full range | 07:58.30 |
| Don't use mono because that will halftone screen the output | 07:58.45 |
anddam | rather than having gs do it where I cannot fine-tune the conversion | 07:59.12 |
| I see the point | 07:59.14 |
| ok | 07:59.16 |
| thanks for all the info | 07:59.29 |
kens2 | Yes exactly, best to use the right tool for the job, and an image editor is much better placed to do image editing | 07:59.38 |
anddam | way more than I intended to learn! | 07:59.41 |
kens2 | :-) | 07:59.45 |
| Must go do my email now | 07:59.50 |
anddam | I'll still batch edit those | 07:59.50 |
| thanks again | 07:59.54 |
kens2 | NP | 07:59.57 |
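Note: a rough sketch of the workflow suggested above (render with GS, edit in ImageMagick, export back to PDF), not something posted in the channel. The function name `plan_pipeline`, the file names, the 150 dpi resolution and the 60% threshold are illustrative assumptions that would need tuning by eye. It uses ImageMagick's `convert` with `-colorspace Gray` and `-threshold`, and it only prints the three steps so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: print the render -> threshold -> reassemble steps.
# plan_pipeline and all file names are placeholders; names must not
# contain spaces because the printed commands are not quoted for them.

plan_pipeline() {
  # $1 = input PDF, $2 = dpi, $3 = threshold in percent, $4 = output PDF
  # Step 1: render every page to a full-colour PNG (lossless, not JPEG).
  printf '%s\n' "gs -sDEVICE=png16m -r$2 -o page-%03d.png $1"
  # Step 2: drop the colour and push the paper background to white.
  printf '%s\n' "for f in page-*.png; do convert \"\$f\" -colorspace Gray -threshold $3% \"mono-\${f#page-}\"; done"
  # Step 3: reassemble the processed pages into one PDF.
  printf '%s\n' "convert mono-*.png $4"
}

plan_pipeline input.pdf 150 60 out.pdf
```

Piping the output to `sh` would run the steps. Thresholding is one possible way to remove the beige background, as floated earlier in the conversation; whether it keeps the text legible depends on the scans.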
tempus_fol | Hello again, I've found a different set of PDFs failing PDF/A-2b conformation (because of "Overprint mode": https://0x0.st/qn0.html ); my command is always "gs -dPDFA=2 -dAutoRotatePages=/None -dColorConversionStrategy=/UseDeviceIndependentColor -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sProcessColorModel=DeviceCMYK -sColorConversionStrategyForImages=CMYK -dPDFACompatibilityPolicy=2 -sOutputFile=output.pdf input.pdf ~/PDFA_def.ps ./ | 09:40.52 |
| pdfmark" ; these PDFs have no images. Shall I open a new bug report, or is there something patently wrong in the one-liner? I can provide the source as a private attachment if needed | 09:40.53 |
kens | Make a report | 09:41.13 |
| I'm not aware of overprint being forbidden, I'll have to check the spec | 09:41.40 |
tempus_fol | Done, bug #696799 (please mark the attachment as private) | 09:46.44 |
Robin_Watts | done | 09:47.10 |
tempus_fol | thank you again | 09:47.39 |
Robin_Watts | np. | 09:48.09 |
Dylan_ | Hi | 21:37.32 |
ghostbot | Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line. | 21:37.32 |
Dylan_ | When a PDF is converted to JPEG with Ghostscript, an ICC profile is inserted into the JPEG and a "Copyright Artifex Software 2011" is added to the metainfo of the JPEG file. What is the license of this inserted ICC profile? The same license as Ghostscript? AGPL? | 21:38.16 |
HenryStiles | Dylan_: I think we intended them to be "do whatever you want don't blame us" but there should be a license file in the ghostscript iccprofile directory, the person that owns that stuff will be back in a bit and I'll talk to him about it. | 21:46.57 |
Dylan_ | thank you | 21:49.32 |
HenryStiles | for the logs mvrhel_laptop, can you and chrisl fix the licensing documentation for the icc profiles? | 21:57.12 |