IRC Logs

Log of #ghostscript at irc.freenode.net.

20160524
Robin_Watts np.00:05.08 
anddam Robin_Watts: sorry, I was afk06:48.54 
  I'm on OS X, I'm using ghostscript from ports06:49.09 
  I tried using pdfwrite device with different PDFSETTINGS06:50.22 
  specifically I tried /screen, which is supposedly very low quality; the resolution should be 72 dpi IIRC, so about a third of the 200 you suggested06:51.02
  and yet I get a 4MB file for 13 b/w printed pages06:51.18
  it's an excerpt from an old book 06:51.34 
  I see the pages were scanned in color, and then someone tried to play with the gamma, but I can still see the yellow of the old paper06:52.08
  I tried converting to mono, but unfortunately I couldn't use the psmono device I read about on Stack Overflow; my installation doesn't have it06:52.47
  I tried ImageMagick to convert to mono, with the very same results06:52.56
  oh I'm using 9.1906:53.37 
kens anddam don't use the PDFSETTINGS canned settings, they change *lots* of things all at once. Read up on the documentation for each control, decide which ones you want to use, and apply them individually. Given that you are starting from image data and want to retain legibility, you have a fairly severe problem, because reducing the resolution of the images will impact the legibility of the text.07:12.07
anddam kens: oh hi, so you read the previous lines, I guess07:12.38 
  I'd actually just drop any kind of color mapping, this is old b/w text with no figures07:13.00 
kens I would say the first thing you should do, since you say these are colour, is to use -sColorConversionStrategy=Gray to convert the images into gray scale. That should save 66% of the size straight off07:13.10 
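A rough sketch of the arithmetic behind the "save 66%" estimate, assuming uncompressed 8-bit samples: a 3-component (RGB or RGB-based ICC) image stores three samples per pixel and a gray image stores one, so the raw data shrinks to a third. Compression changes the absolute sizes, so treat this as an upper bound rather than a prediction for this particular file; the page size and 300 dpi scan resolution are illustrative assumptions.

```python
def raw_image_bytes(width_in: float, height_in: float, dpi: int,
                    components: int, bits_per_component: int = 8) -> int:
    """Uncompressed size in bytes of a page image at the given resolution."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * components * bits_per_component // 8


# Assumed example page: US Letter (8.5 x 11 in) scanned at 300 dpi.
rgb = raw_image_bytes(8.5, 11, 300, components=3)   # 3-component colour image
gray = raw_image_bytes(8.5, 11, 300, components=1)  # 1-component gray image
saving = 1 - gray / rgb
print(f"RGB {rgb / 1e6:.1f} MB, gray {gray / 1e6:.1f} MB, saving {saving:.0%}")
```

The same function with `components=1, bits_per_component=1` gives the raw size of a 1-bit (black and white) page, which is where the mono conversion discussed later pays off.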
  Yeah I read the logs, we all do07:13.26 
  Without seeing the file it's hard to give concrete suggestions07:13.42
  It may be that the image data is JPEG compressed, in which case you will want to have pdfwrite use Flate compression on the output, to prevent the artefacts caused by quantising JPEG data twice07:14.21
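A minimal sketch of a pdfwrite command line that combines the gray conversion with lossless Flate output for images. It only builds the argument list (running it needs Ghostscript on the PATH). The function name is hypothetical; `AutoFilterColorImages`, `ColorImageFilter`, `AutoFilterGrayImages` and `GrayImageFilter` are standard pdfwrite (Distiller) image parameters, but check the documentation shipped with your version before relying on them.

```python
import shlex


def pdfwrite_args(src: str, dst: str, gray: bool = True,
                  flate_images: bool = True) -> list[str]:
    """Build (but do not run) a Ghostscript pdfwrite command line."""
    # -o sets the output file and also implies -dBATCH -dNOPAUSE.
    args = ["gs", "-sDEVICE=pdfwrite", "-o", dst]
    if gray:
        # Convert colour content, including images, to gray.
        args.append("-sColorConversionStrategy=Gray")
    if flate_images:
        # Disable automatic filter selection and force lossless Flate for colour
        # and gray images, so image data that was already JPEG-compressed is not
        # quantised by JPEG a second time.
        args += [
            "-dAutoFilterColorImages=false", "-dColorImageFilter=/FlateEncode",
            "-dAutoFilterGrayImages=false", "-dGrayImageFilter=/FlateEncode",
        ]
    args.append(src)
    return args


cmd = pdfwrite_args("in.pdf", "out.pdf")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Flate keeps the output free of new artefacts at the cost of larger images than JPEG would give; that trade-off is the point kens is making.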
anddam I can share it if you have a minute to have a look; I didn't before as I didn't want to impose07:14.24
kens I can take a cuik look07:14.33 
  err quick07:14.37 
  My mailbox seems unusually full this morning:-(07:15.11 
anddam I have the gray-ified version, 5.6 MB, and the original scanned one, 115MB07:15.12 
kens Well that's a big improvement :-)07:15.21
anddam I have them as dropbox links already07:15.23 
kens OK07:15.31 
anddam https://dl.dropbox.com/s/46gh9w65vfk0xjo/Italo%20Greek%20coins%20%28Hands%201912%29.pdf07:15.40 
kens OK got that, one second07:16.04 
anddam https://dl.dropbox.com/s/w7n2f47dvd628ct/Italo-greeks%20coins%20%28beige%29.pdf07:16.09 
kens That gray one seems to have yellow on it on page 207:16.30 
  and 3 in fact07:16.44 
anddam yes, not well greyified, I don't even know what the person that asked me actually did07:16.58 
kens Hmm I'll just prise the file open07:17.12 
  well, decompressed that comes out at 240 MB, so I guess these are quite high-resolution images07:18.26
  Oh wow, they also include transparency oO07:18.51 
anddam is the right approach to compress the inner images?07:18.51 
kens 'inner' ?07:19.06 
  The PDF file images are already compressed, that's why it explodes in size when I decompress it07:19.22 
anddam I don't know the PDF format well, but I figure it embeds objects, and some of those objects can be images07:19.29
kens For image data, yes those are included, and the 'gray' file certainly has the images compressed07:19.57 
  getting the larger file is taking longer07:20.07 
anddam I can see how those pages, converted to b/w or grayscale and at a DPI good enough for screen reading, would take very little space07:20.09
kens Well, *less* space, but the inclusion of transparency is kind of a bad thing.07:20.33 
anddam I thought gs had some automagic recipe for that; I actually use the PDFSETTINGS trick as a one-liner to shrink PDF files I read07:20.49
kens You might, strangely, do better to convert into PDF 1.3 files, which will 'flatten' the transparency07:20.52
  Don't use PDFSETTINGS07:21.00 
anddam yes, I mean _in past_07:21.08 
kens It changes too much stuff, I keep telling people this07:21.12 
anddam as you just did to me, but I didn't know before you told me07:21.29 
kens No, the net is sadly full of cargo cult approaches to using GS and pdfwrite07:22.38 
  Not many people bother to read the documentation07:22.46
anddam kens: just as an overall description, what approach do you suggest here?07:22.47
  well, it depends on how much time you have and how handy that documentation is07:23.06
kens I would look at the PDF files, decide where the space is being used, then apply the controls I felt were best used to reduce that space07:23.20
  So the big file only increases to 417 MB which means it isn't well compressed07:23.41 
anddam I mean, I looked at the man page; there was the generic -sname=string option for "systemdict", then I googled "gs systemdict" and couldn't find out what it was right then, so I dropped it07:24.08
kens We don't supply man pages. Our documentation is shipped as HTML in the ghostscript/doc directory07:24.33 
  The original file has all the images in an ICCBased space with an RGB base space07:25.48
anddam mmm now I understand the reference to "usage documentation" in man better, "For more information, see /opt/local/share/ghostscript/9.19/doc/Use.htm."07:26.26 
kens Yes that's our documentation :)07:27.00
  For some reason the 'gray' PDF has *some* images converted to gray scale, but not all of them......07:27.19 
anddam kens: that ICCBased looks worrying, from your sentence, but I'm not sure what it is. I think ICC is something related to color profiles07:27.21 
kens Yes ICC is the International Colour Consortium07:28.01 
  It just means it's a complex space07:28.14
anddam I'm reading https://blog.idrsolutions.com/2011/04/understanding-the-pdf-file-format-%E2%80%93-iccbased-colorspaces/07:28.24 
kens The fact that it's based on RGB just really says that it's colour, not grayscale07:28.29
  ICC isn't really something to worry about, it just adds to the complexity a little07:28.59 
anddam I see07:29.02 
kens The original file *does* contain transparency though07:29.28
anddam a 1-component ICCBased space would be more appropriate there, wouldn't it?07:29.29 
kens Yes it would07:29.34 
  But it isn't what is there, it's a 3-component space07:29.44
anddam I can understand that transparency thing is bad07:29.50 
  so how can I remove the transparency? how can I get to a 1-component color space?07:30.06
kens It makes the file much larger and *way* slower for no real reason as far as I can see07:30.11 
  For transparency flattening you convert to a PDF 1.3 file07:30.33
anddam I thought of exporting the PDF pages as images, manipulating the images with ImageMagick and reassembling the PDF07:30.34
kens That turns all the transparency into images, which I wouldn't normally recommend, but since it's all images already that's not a problem07:30.53
  using imagemagick would work fine07:31.15 
anddam and does that require the PDF 1.3 step? I mean reassembling images wouldn't add transparency07:31.48
kens If you just render to, say, TIFF, then downscale the TIFF and re-export as PDF, you don't need to go to 1.3. There is no way that would add transparency back in, as there is no reason to07:32.47
anddam I'm using gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 -dNOPAUSE -dBATCH and it's quite slow07:33.00 
  I mean way slower than when I used to use PDFSETTINGS (no, I'm not going to anymore)07:33.34 
kens It will be slow, it has to render all these high resolution images, and the transparency, to image data07:33.36 
  The difference is that it's rendering the images, not just reading and writing them, and it has to take the transparency into account when rendering, which it doesn't when preserving the transparency07:34.22
  It doesn't matter whether you use GS+pdfwrite to do the flattening or render to bitmaps, it's going to be slow07:35.03
  But it's the only real way to lose the transparency. By the way, the file opens really slowly in Acrobat too, and that's due to the transparency07:35.30
  Ha, your original file lies, it claims to be PDF 1.3 but uses transparency, which is only legal in a PDF 1.4 or later file07:36.26 
anddam once I have the 1.3 flattened pdf, what's the proper approach to convert to mono or grayscale?07:36.44 
  I mean this literally has to be just read on screen07:36.51 
kens Using pdfwrite you set -sColorConversionStrategy=Gray; you can't convert to monochrome with GS, you would need to use ImageMagick07:37.19
anddam I read that the /screen setting uses 72 dpi; that's a bit low for today's screens, isn't it?07:38.12
kens Try it and see is the best I can suggest07:38.37 
  Wow that really is slow07:39.24 
  Oh it's rendering at 720 dpi, silly me07:41.16
  OK this:07:42.18 
  gs -r72 -sColorConversionStrategy=Gray -dCompatibilityLevel=1.3 -sDEVICE=pdfwrite -o out.pdf "Italo.....pdf"07:42.18
  gives me 65KB per page in a quick time07:42.31 
  Hmm 4.5MB07:43.41 
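Why the resolution matters so much: pixel count grows with the square of the resolution, so rendering a page at pdfwrite's 720 dpi default instead of 72 dpi produces 100 times as many pixels. A minimal sketch of that arithmetic, ignoring compression; the helper name is hypothetical.

```python
def pixel_ratio(dpi_high: float, dpi_low: float) -> float:
    """How many times more pixels a page has at dpi_high than at dpi_low.

    Width and height both scale linearly with resolution, so the pixel count
    scales with the square of the ratio.
    """
    return (dpi_high / dpi_low) ** 2


for dpi in (720, 300, 150, 72):
    print(f"{dpi:>3} dpi: {pixel_ratio(dpi, 72):7.2f}x the pixels of 72 dpi")
```

This is also why 150 dpi, which kens tries later, is a reasonable middle ground: a bit over four times the pixels of 72 dpi, but still far fewer than 720 dpi.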
kens2 ah I see the original really does have a beige background07:45.16
anddam my flattened 1.3.pdf is 50 MB07:45.40 
kens2 Starting with the 'gray' file you supplied, I get 4.5MB with the command line above07:46.01 
anddam I converted it just with CompatibilityLevel=1.307:46.11 
kens2 The original is much smaller, but the beige background makes it illegible07:46.16
  OK set -r72 as well07:46.25 
anddam should I now use this 1.3 file for further conversion, or the original?07:46.26
  I mean as basis07:46.32 
kens2 I'm seeing better results using your gray PDF file, rather than the beige one07:46.48 
  The beige background is ugly when converted to gray scale07:47.07 
anddam I figure that could be filtered out (with simple images) with mono color space and a threshold07:47.33 
kens2 You could try that, certainly07:47.52 
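A minimal sketch of the threshold idea, in plain Python on a few made-up RGB values rather than real scan data: each pixel is reduced to gray with the common ITU-R BT.601 luma weights, then anything at or above a cut-off becomes white. Light beige paper lands above the cut-off and dark ink below it, so a hard threshold gives clean black-and-white text instead of the halftone screening a mono output device would apply. The cut-off of 128 is an assumption to tune per scan; in ImageMagick the rough equivalent is `-colorspace Gray -threshold 50%`.

```python
def luma(r: int, g: int, b: int) -> float:
    """Gray level of an 8-bit RGB pixel using ITU-R BT.601 luma weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_mono(pixels, threshold: float = 128):
    """Map RGB pixels to 1-bit values: 1 = white (paper), 0 = black (ink)."""
    return [1 if luma(*p) >= threshold else 0 for p in pixels]


# Illustrative colours, not sampled from the actual scans.
beige_paper = (245, 235, 200)
dark_ink = (40, 35, 30)
gray_smudge = (120, 110, 100)
print(to_mono([beige_paper, dark_ink, gray_smudge]))  # [1, 0, 0]
```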
  The reason you got a big PDF file is that the rendering defaults to 720 dpi, so if you use -r72 you will get better results07:48.21 
  But like I said, the background makes it difficult to read07:48.35 
  I would suggest that your best approach is to use GS to render the pages to something like TIFF, open the TIFF files in ImageMagick and process them there to remove the colour and reduce the resolution07:49.05
  Then export back to PDF07:49.10 
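The render, edit, re-export route can be sketched as two commands. Only argument lists are built here, and the function names are hypothetical. The Ghostscript step renders full-colour PNGs with the `png16m` device at a chosen `-r` resolution (PNG and TIFF are both lossless; JPEG should be avoided). The ImageMagick step assumes ImageMagick 6's `convert` (ImageMagick 7 uses `magick`), and the 60% threshold and Group4 (CCITT fax) compression are illustrative choices, not settings tested on this file.

```python
import glob
import shlex


def render_cmd(src: str, dpi: int = 150, pattern: str = "page%03d.png") -> list[str]:
    """Ghostscript: render every page of src to a full-colour PNG."""
    # %03d in the output name makes Ghostscript write one numbered file per page.
    return ["gs", "-sDEVICE=png16m", f"-r{dpi}", "-o", pattern, src]


def assemble_cmd(pages: list[str], dst: str, threshold: str = "60%") -> list[str]:
    """ImageMagick: convert pages to gray, threshold to 1-bit, write one PDF."""
    return ["convert", *pages,
            "-colorspace", "Gray",      # discard colour information
            "-threshold", threshold,    # hard black/white cut-off, no halftoning
            "-compress", "Group4",      # compact lossless coding for 1-bit images
            dst]


print(shlex.join(render_cmd("in.pdf")))
# After rendering, collect the pages in order and build the second command:
pages = sorted(glob.glob("page*.png"))
print(shlex.join(assemble_cmd(pages, "out.pdf")))
```

Keeping the rendering in Ghostscript and the image processing in ImageMagick follows kens's advice to use each tool for what it is good at; the threshold can then be tuned per book without re-rendering.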
anddam why TIFF?07:49.25 
kens2 It's best to use an image manipulation tool to manipulate images07:49.29
  Because TIFF is lossless07:49.35 
anddam I usually use PNG for lossless07:49.39 
  does TIFF have any advantage?07:49.47 
kens2 Well use PNG then, it doesn't matter07:49.49 
anddam ok07:49.54 
kens2 Just don't use JPEG07:49.55 
anddam ofc07:50.00 
kens2 You'd be surprised how many people don't understand that :)07:50.11 
anddam well, you can understand how quality is of the utmost importance in a hundred-year-old book about Greek coins in southern Italy07:50.45
  that's a hot topic on twitter07:50.57 
kens2 LOL07:51.01 
  It's important for legal documentation07:51.18
  This kind of processing wouldn't be acceptable there07:51.28
anddam my 72dpi Gray converted 1.3 file is still 55MB07:51.35 
  this is odd07:51.37 
kens2 That is indeed odd.07:51.45 
anddam gs -r72 -sColoConversionStrategy=Gray -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 -dNOPAUSE -dBATCH -sOutputFile=Italo\…07:51.50 
  Colo*07:51.56 
  dammit07:51.57 
kens2 :-)07:52.01 
anddam 4.4 MB07:53.07 
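The 55 MB result above appears to come from the misspelled `-sColoConversionStrategy`: Ghostscript generally just defines whatever name it is given with `-s`/`-d`, so a typo is silently ignored rather than reported. A minimal sketch of a pre-flight check that flags names suspiciously close to a known one. The helper and its switch list are hypothetical: the list is a small hand-picked subset of names used in this log, not a complete list of valid Ghostscript switches.

```python
import difflib

# Hand-picked subset of real switch names used in this log. It is NOT a
# complete list of the -d/-s names Ghostscript or pdfwrite understand.
KNOWN_SWITCHES = (
    "BATCH", "ColorConversionStrategy", "CompatibilityLevel", "DEVICE",
    "NOPAUSE", "OutputFile", "PDFSETTINGS",
)


def suspicious_switches(argv: list[str]) -> list[tuple[str, str]]:
    """Return (given, suggestion) pairs for -d/-s names that look like typos."""
    problems = []
    for arg in argv:
        if not arg.startswith(("-d", "-s")) or len(arg) < 3:
            continue
        name = arg[2:].split("=", 1)[0]
        if name in KNOWN_SWITCHES:
            continue
        # A high cutoff flags near-misses without flagging unrelated valid names
        # such as ColorConversionStrategyForImages.
        close = difflib.get_close_matches(name, KNOWN_SWITCHES, n=1, cutoff=0.9)
        if close:
            problems.append((name, close[0]))
    return problems


cmd = ["gs", "-r72", "-sColoConversionStrategy=Gray", "-sDEVICE=pdfwrite",
       "-dCompatibilityLevel=1.3", "-dNOPAUSE", "-dBATCH",
       "-sOutputFile=out.pdf", "in.pdf"]
print(suspicious_switches(cmd))
```

Names that are valid but missing from the list are simply not checked, so the sketch can miss typos but should rarely raise false alarms.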
kens2 For me, the original beige PDF comes out at 1.9MB after flattening at 150 dpi07:53.23
  Mine is better, but the JPEG artefacts are hard to read07:53.50 
anddam this is my actual line gs -r72 -sColorConversionStrategy=Gray -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 -dNOPAUSE -dBATCH -sOutputFile=[…]07:53.50 
  what's the proper tool to explode into imagesm pdftk?07:54.05 
  s/imagesm/images,/07:54.10 
kens2 You can use GS to render to image07:54.18
  Just use a different device07:54.24 
anddam no07:54.26 
  _you_ can use gs to render to image07:54.33 
  I'm not able to 07:54.40 
kens2 -sDEVICE=tiff24nc -sOutputFile=out%d.tif07:54.58 
  let me just see the PNG output07:55.06
anddam I'm sure I'll search the web and the first answer on stackoverflow for splitting will use -sPDFSETTINGS07:55.07 
kens2 Well, you can create multiple PDF files07:55.19
  just use -o out%d.pdf07:55.29 
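A minimal sketch of two ways to pull pages out with pdfwrite alone: a `%d` (or `%03d`) format in the output name, as kens suggests, writes one PDF per page, and the `-dFirstPage`/`-dLastPage` switches restrict processing to a page range. Only the argument lists are built here; the function names are hypothetical.

```python
import shlex


def split_cmd(src: str, pattern: str = "page%03d.pdf") -> list[str]:
    """One output PDF per input page, numbered via the %03d format."""
    return ["gs", "-sDEVICE=pdfwrite", "-o", pattern, src]


def page_range_cmd(src: str, dst: str, first: int, last: int) -> list[str]:
    """A single PDF containing only pages first..last (1-based, inclusive)."""
    if not 1 <= first <= last:
        raise ValueError("need 1 <= first <= last")
    return ["gs", "-sDEVICE=pdfwrite", f"-dFirstPage={first}",
            f"-dLastPage={last}", "-o", dst, src]


print(shlex.join(split_cmd("in.pdf")))
print(shlex.join(page_range_cmd("in.pdf", "pages2-3.pdf", 2, 3)))
```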
  Seems we have a number of PNG devices07:56.29 
anddam in the folder where pdfwrite.ps lives I don't see the devices I've been reading about07:56.30
  like tiff24nc or psmono07:56.38
kens2 The devices are compiled in07:56.43
anddam I expected all devices to be there07:56.46 
kens2 pdfwrite.ps is a PostScript program, not a device07:57.02 
  if you do gs -help it will list the devices07:57.19 
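Because devices are compiled into the Ghostscript executable rather than loaded from files, the dependable way to find out whether a given build has, say, `psmono` or `png16m` is to ask the executable. A minimal sketch that runs `gs -h` and looks for the device name as a whole word in its output. The helper names are hypothetical, and the whole-word search is a deliberately loose heuristic, not a parser for the exact help format.

```python
import re
import shutil
import subprocess


def has_device(help_text: str, device: str) -> bool:
    """True if `device` appears as a whole word in `gs -h` output."""
    return re.search(rf"\b{re.escape(device)}\b", help_text) is not None


def gs_has_device(device: str) -> bool:
    """Run `gs -h` (if Ghostscript is on the PATH) and check for `device`."""
    gs = shutil.which("gs")
    if gs is None:
        return False
    result = subprocess.run([gs, "-h"], capture_output=True, text=True)
    return has_device(result.stdout, device)
```

Usage: `gs_has_device("png16m")` returns `True` on a build that lists that device under "Available devices".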
anddam oh, I thought -sDEVICE loaded an external module or so07:57.27 
  yep, lot of those in -help07:57.42 
kens2 but -sDEVICE=png16m -o out%d.png will get you PNG output07:57.43 
  You can use -r to change the resolution of the output07:57.56
anddam why 16m ?07:58.05 
kens2 full colour07:58.10 
  I was thinking you would open the PNGs in IM and then play with them there, so it's best to have the full range07:58.30
  Don't use mono because that will halftone screen the output07:58.45 
anddam rather than having gs do it, where I cannot fine-tune the conversion07:59.12
  I see the point07:59.14 
  ok07:59.16 
  thanks for all the info07:59.29 
kens2 Yes exactly, best to use the right tool for the job, and an image editor is much better placed to do image editing07:59.38 
anddam way more than I intended to learn!07:59.41 
kens2 :-)07:59.45 
  Must go do my email now07:59.50 
anddam I'll still batch edit those07:59.50 
  thanks again07:59.54 
kens2 NP07:59.57 
tempus_fol Hello again, I've found a different set of PDFs failing PDF/A-2b conformance (because of "Overprint mode": https://0x0.st/qn0.html ); my command is always "gs -dPDFA=2 -dAutoRotatePages=/None -dColorConversionStrategy=/UseDeviceIndependentColor -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sProcessColorModel=DeviceCMYK -sColorConversionStrategyForImages=CMYK -dPDFACompatibilityPolicy=2 -sOutputFile=output.pdf input.pdf ~/PDFA_def.ps ./09:40.52
  pdfmark" ; these PDFs have no images. Shall I open a new bug report, or is there something patently wrong in the one-liner? I can provide the source as a private attachment if needed09:40.53
kens Make a report09:41.13 
  I'm not aware of overprint being forbidden, I'll have to check the spec09:41.40 
tempus_fol Done, bug #696799 (please mark the attachment as private)09:46.44 
Robin_Watts done09:47.10 
tempus_fol thank you again09:47.39 
Robin_Watts np.09:48.09 
Dylan_ Hi21:37.32 
ghostbot Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.21:37.32 
Dylan_ When a PDF is converted to JPEG with Ghostscript, an ICC profile is inserted into the JPEG and a "Copyright Artifex Software 2011" is added to the metadata of the JPEG file. What is the license of this inserted ICC profile? The same license as Ghostscript? AGPL?21:38.16
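For checking what actually got embedded: in JPEG files an ICC profile is carried in APP2 (0xFFE2) marker segments whose payload starts with `ICC_PROFILE` plus a NUL byte, followed by a 1-byte chunk number and a 1-byte chunk count, because a large profile may be split across several segments. A minimal standard-library sketch (hypothetical helper names) that walks the marker segments before the image data and reassembles the profile; a valid ICC profile carries the signature `acsp` at byte offset 36 of its header. It does not decode the profile's copyright tag.

```python
from typing import Optional

ICC_MARKER = b"ICC_PROFILE\x00"  # 12-byte identifier at the start of the APP2 payload


def extract_icc_profile(jpeg: bytes) -> Optional[bytes]:
    """Return the ICC profile embedded in a JPEG, or None if there is none.

    Assumes a well-formed file: only segments before the image data (SOS) are
    scanned, and every one of them is taken to carry a 2-byte length.
    """
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    chunks = {}
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xFF:              # padding byte before the real marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):      # SOS or EOI: header segments are over
            break
        length = int.from_bytes(jpeg[pos + 2:pos + 4], "big")  # counts itself
        if length < 2:
            raise ValueError(f"bad segment length at offset {pos}")
        payload = jpeg[pos + 4:pos + 2 + length]
        if (marker == 0xE2 and payload.startswith(ICC_MARKER)
                and len(payload) >= 14):
            # payload[12] = chunk number (1-based), payload[13] = chunk count
            chunks[payload[12]] = payload[14:]
        pos += 2 + length
    if not chunks:
        return None
    return b"".join(chunks[n] for n in sorted(chunks))


def looks_like_icc(profile: bytes) -> bool:
    """ICC profile headers carry the signature b'acsp' at offset 36."""
    return len(profile) >= 40 and profile[36:40] == b"acsp"
```

Usage with a real file: read it with `open("page.jpg", "rb").read()`, pass the bytes to `extract_icc_profile`, and check the result with `looks_like_icc`.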
HenryStiles Dylan_: I think we intended them to be "do whatever you want don't blame us" but there should be a license file in the ghostscript iccprofile directory, the person that owns that stuff will be back in a bit and I'll talk to him about it.21:46.57 
Dylan_ thank you21:49.32 
HenryStiles for the logs mvrhel_laptop, can you and chrisl fix the licensing documentation for the icc profiles?21:57.12 
ghostscript.com