Log of #ghostscript at irc.freenode.net.

20180309
sammy hi trying to reduce the size of PDFs from scanned documents using gs ver 9.21. Using default settings for pdfwrite gives an image dpi of 300 as per poppler. I understood that the dpi should be reduced to 72 for default. If I use pdfsetting screen this occurs.03:38.29
chrisl deekej: quick question: where can I get a Fedora 28 iso?10:07.01 
deekej chrisl: it's still in Alpha state AFAIK, meaning that it's like a Rawhide10:07.32 
  there are only nightly composes for the iso10:07.41 
  often it's hard to install due to some bug10:07.52 
  give me a second10:07.54
chrisl What I found labelled rawhide claims to be 2910:08.22 
deekej yes, because the branching for f28 has happened already10:08.42 
  but it's very close10:08.51 
  here: https://www.happyassassin.net/nightlies.html10:08.56 
  last known good build is from March 2nd10:09.32
  there's a link to download that iso10:09.40 
chrisl I'll try with the one I have, and if that doesn't help, I'll try the one from March 2nd - thanks10:10.50
deekej np :)10:11.56 
velix I like that Ghostscript keeps track of bookmarks when splitting a document. But since I'm splitting a 6,000 page document into 10 page chunks, it takes a while. Ghostscript goes through the whole document at each extract process :D Will it work faster with a linearized PDF ?11:21.26 
kens No11:21.37 
velix Ok. So I'll use another PDF splitter, which is faster but drops the TOC and re-writing the bookmark to each of the file might be fastest.11:22.08 
kens If you use FirstPage and LastPage then Ghostscript will only process those pages, not the whole PDF file11:22.40 
velix kens: Here it scrolls through the whole file and throws out warnings 11:26.56 
  kens: don't say a word, I'll do it in a sec ;)11:27.08 
kens It does preparse the file for certain things (transparency mainly)11:27.16 
  But that should be quick11:27.31 
velix https://bpaste.net/show/0f58b65620be11:27.49 
  This throws out hundreds of warnings and it goes up to page 6,00011:28.08
kens You could also split the 'n' page file into 'n' 1 page files and then combine them in groups11:28.12
velix kens: like PDFtk's burst mode?11:28.25 
kens I don't know much about PDFtk11:28.35 
velix kens: ok ;)11:28.39 
  kens: But my command line is correct?11:28.46 
kens But if you do -sOutputFile=out%d.pdf then you get one page per file11:28.57
  The command line looks OK11:29.11 
velix kens: want to see the warning message?11:29.16 
kens You haven't said what the warnings are, I guess they are due to the Outlines entries pointing to pages that no longer exist11:29.33
  It does (obviously) have to parse the Outlines tree every time in order to find the Outlines relevant to the page range being processed11:30.14
  You can also disable Outlines processing altogether by setting NO_PDFMARK_OUTLINES11:31.23 
velix kens: Yeah and that makes me wonder. I think it's because my PDF has bookmarks. Other tools (sorry to tell you) can do it much faster while dropping the bookmarks.11:31.30
  kens: yeah, that's what I think.11:31.34 
kens As I said, you can drop the Outlines11:31.50
velix kens: --> yeah, that's what I think.11:31.57 
  ;)11:31.59 
kens Also, Ghostscript isn't intended as a PDF manipulation tool11:32.03
velix I know, but someone told me: better use Ghostscript. They know what they do.11:32.21 
kens The output PDF files will have content (at the low level) which is not the same as the original files11:32.31 
velix kens: actually, it's smaller than the original one ;)11:32.48 
kens The *result* should be the same, but the actual way it's achieved will be different.11:32.55
  Yes it could well be smaller11:33.03
velix I didn't check the resolution of the images now.11:33.18 
kens There's an explanation in VectorDevices.htm on how the pdfwrite device works11:33.20 
velix ok11:33.31 
kens The image data will be the same resolution11:33.34
velix kens: ok. 11:33.41 
kens The default is to leave that unchanged11:33.44 
velix ah okay.11:33.58 
  let me try NO_PDFMARK_OUTLINES11:34.01 
kens The next release version will also pass JPEG data unchanged, unless the colour space or image resolution is specified to differ11:34.13 
  That will be -dNO_PDFMARK_OUTLINES on the command line11:34.42
velix kens: yeah, here is the warning without: https://bpaste.net/show/16ae187e0eaa11:35.09 
  that's a document with > 8,500 pages11:35.17 
kens Yes it'll do that11:35.19 
  Could suppress the warning I guess11:35.29
velix 2>/dev/null aka 2>NUL on Windows11:35.58
  But the process is slowed down of course ;)11:36.04 
kens Yep11:36.08 
  And you lose any other useful messages11:36.15
velix -dNO_PDFMARK_OUTLINES did it11:36.24 
  <311:36.26 
  thanks11:36.28 
kens NP11:36.35 
  I'm not sure I ever documented that, I guess I should check11:36.48 
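A minimal sketch of the chunked split discussed above, assuming Python 3 and a `gs` executable on the PATH (the executable name differs on Windows). The helper names `chunk_ranges`, `gs_split_command` and `split_pdf`, and the `chunkNNNN.pdf` output pattern, are illustrative and not part of Ghostscript. The switches are the ones named in the conversation (`-dFirstPage`, `-dLastPage`, `-dNO_PDFMARK_OUTLINES`) plus the usual `-dBATCH -dNOPAUSE -sDEVICE=pdfwrite`. The page count is assumed to be known by the caller.

```python
import subprocess


def chunk_ranges(total_pages, chunk_size):
    """Return (first, last) 1-based inclusive page ranges covering the document."""
    return [
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


def gs_split_command(src, first, last, out, drop_outlines=True):
    """Build the gs argument list for one chunk (illustrative helper)."""
    cmd = [
        "gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
        f"-dFirstPage={first}", f"-dLastPage={last}",
        f"-sOutputFile={out}",
    ]
    if drop_outlines:
        # Disables Outlines (bookmark) processing, so the chunks carry no bookmarks.
        cmd.append("-dNO_PDFMARK_OUTLINES")
    cmd.append(src)  # the input file goes after all switches
    return cmd


def split_pdf(src, total_pages, chunk_size=10):
    """Run gs once per chunk; raises CalledProcessError if gs fails."""
    for n, (first, last) in enumerate(chunk_ranges(total_pages, chunk_size), 1):
        subprocess.run(
            gs_split_command(src, first, last, f"chunk{n:04d}.pdf"), check=True
        )
```

For example, `split_pdf("big.pdf", 6000)` would invoke gs 600 times, once per 10-page range. Passing `drop_outlines=False` keeps the bookmarks at the cost of the Outlines parsing and warnings described above.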
velix I know, this question is asked a lot, but could you also have a look on my command to create a PDF from PS ?11:38.44 
kens OK11:38.54 
velix https://bpaste.net/show/88862734fbe411:39.19 
kens Actually people rarely ask, they just crib bits from random Google searches11:39.22 
velix oops, ignore line 1311:39.31 
kens Well you don't need some of those switches; PDFSETTINGS=/default is redundant for example11:40.04 
  Personally I wouldn't use -dFastWebView, it just makes the file bigger11:40.19 
velix kens: oh okay. I thought it adds some kind of defaults, I'm not covering.11:40.32 
kens If you are using a reasonably up to date Ghostscript and you set ColorConversionStrategy it sets the ProcessColorModel for you, so you don't need to do that.11:40.54
velix kens: Ok, I thought, this might speed up things, like splitting (but we discussed that before).11:40.57
kens Linearized PDF is all but useless11:41.08 
velix kens: can I mix up CMYK in RGB in a PDF? Actually, 90% of the time, my images are RGB.11:41.32 
kens At *best* (ie the file is valid, properly created, and the consumer understands Linearized PDF files) it improves the time to render page 1, nothing more11:41.38
velix kens: I think, PDF/X-4 allows this11:41.41
  ok11:41.54 
kens If you set ColorConversionStrategy then all colour will be converted to the desired space.11:42.06
  If you leave it as 'LeaveColorUnchanged' then you can mix any legal colour space11:42.20 
  But if you want PDF/A or PDF/X output, then there are rules to follow11:42.35
  Again, pdfwrite will try to enforce those.11:42.44 
velix kens: Yeah, but this is just for normal PDFs.11:42.47 
kens Then I wouldn't bother with ColorConversionStrategy11:43.13 
  Better not to convert colours unless required11:43.27 
velix kens: So should I drop anything with color or only the strategy?11:43.29 
kens Well to be honest, I'd just leave everything at the defaults, unless you have some pressing reason to change them11:43.59 
  The default for pdfwrite is to preserve everything unchanged as far as possible11:44.17 
  That's usually a good thing11:44.32 
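A minimal sketch of a PS to PDF conversion that follows the advice above and leaves pdfwrite at its defaults, assuming Python 3 and a `gs` executable on the PATH. The helper name `ps_to_pdf_command` is illustrative. The optional `-sColorConversionStrategy` switch is only added when a conversion is explicitly requested, and the values shown in the comment are examples rather than a complete list.

```python
def ps_to_pdf_command(src_ps, out_pdf, color_strategy=None):
    """Build a gs argument list for PS -> PDF with pdfwrite defaults."""
    cmd = [
        "gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
        f"-sOutputFile={out_pdf}",
    ]
    if color_strategy is not None:
        # Only set when conversion is actually required, e.g. "CMYK" or "RGB".
        cmd.append(f"-sColorConversionStrategy={color_strategy}")
    cmd.append(src_ps)  # the input file goes after all switches
    return cmd
```

With no strategy given, the list contains no colour, compression or `PDFSETTINGS` switches at all, which matches the "leave everything at the defaults" recommendation.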
velix I had some bad experience with default settings, like compression.11:44.44 
kens Can you be more specific ?11:44.57 
velix kens: It has been within other tools. The default settings for users had compression etc. since they wanted small PDFs. That's why I like to fix my config.11:45.29 
kens Well, pdfwrite will generally try to choose a compression filter which gives maximum compression11:46.06
  There is one known problem there, which is JPEG images11:46.19 
  You don't want to decompress and recompress those with JPEG again11:46.34 
velix maximum lossy or lossless compression11:46.52 
kens So it's reasonable to control those in current versions of GS. The next release will pass through JPEG images unchanged unless there is a need to meddle with the pixels (colour conversion, image downsampling etc)11:47.13
  It doesn't matter what degree of lossiness you use with JPEG, multiple passes are bad11:47.42 
  Well, unless you use lossless JPEG but that rather defeats the point11:48.02
  Embedding and subsetting fonts are the default11:48.20 
  And compressing them, the various 'compress' flags are really for debugging, it's useful to be able to produce a PDF file that's readable when debugging. You shouldn't need to set any of those in production11:49.01
  Downsampling images is off by default too11:49.20 
velix kens: Is it possible to read the defaults?11:49.32 
kens The defaults are listed in VectorDevices.htm11:49.44 
velix ok, let me have a look11:49.52 
kens Section 6.1 Common Controls and features11:50.30
  Then the Distiller parameters table11:50.39 
velix Yeah, found it. 11:53.00 
  Is it possible to merge font subsets?11:53.13 
kens No11:53.19 
velix ok ;) I've got about 5-6 entries of the same font in the file.11:53.25 
kens There is no way to know when the glyphs in one subset are the same as the glyphs in another subset11:53.37 
  The pdfwrite device used to do this (because of architectural limitations) and that caused endless problems.11:54.04 
  We've since fixed it11:54.11 
velix Hmm, "default" seems to change ColorImageResolution11:54.52 
kens Yes, but since that is only used when determining whether to downsample images (and the resolution to downsample to) and Image Downsampling is turned off in default, changing it has no effect.11:55.50
  Notice all the Downsampl*Images are set to false11:56.19 
velix oh, I wasn't there ;)11:56.33 
kens NP11:56.38 
  Basically, the resolution only has any effect if you are downsampling. We have to set it to *something*, so we give it a value11:57.04 
  Obviously if you wanted to do real downsampling you would change the resolution as well as turning on downsampling11:57.28
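The point above, that the resolution parameters only matter once downsampling is switched on, can be sketched as a small helper that always emits the two settings together. This assumes Python 3; the function name `downsample_flags` is illustrative, while the parameter names follow the `Downsample*Images` and `*ImageResolution` entries in the Distiller parameters table.

```python
def downsample_flags(dpi):
    """Return pdfwrite switches that enable downsampling and set its target dpi."""
    flags = []
    for kind in ("Color", "Gray", "Mono"):
        # The resolution has no effect unless the matching Downsample flag is true.
        flags.append(f"-dDownsample{kind}Images=true")
        flags.append(f"-d{kind}ImageResolution={dpi}")
    return flags
```

These switches would be placed on the gs command line before the input file. Setting only the resolution, without the `Downsample*Images` switch, would leave the images unchanged under the default settings.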
velix Yeah, but I don't want downsampling ;)11:57.52 
  kens: note 5 is gone!11:57.55 
kens Umm possibly11:58.01 
  Hmm11:58.10 
velix Optimize says (0,5) and has "no" for default.11:58.12 
  eeh false*11:58.24 
kens It does also say in note 0 that it has no effect....11:58.50 
velix kens: yeah11:59.01 
kens I think that should really be OptimizForFastWebView or something11:59.09
  Which we handle differently11:59.31 
velix Actually yeah, default look good.12:00.02 
kens I should remove the reference to note 5 though12:00.02 
velix and another typo: (eg rendering transparent pages for output to PDF versions < 14)12:00.29 
kens Hmm, yeah that's been there a long time12:01.19 
  I'll try and remember to fix it when I'm not debugging a problem12:01.42 
velix ;)12:01.54 
  I really have problems with base14 fonts in Windows. Terms like 14 °F or 14 °C often look ugly in Times instead of Times New Roman.12:03.30 
  Is this a PDF viewer problem?12:03.38 
  Or a font problem12:03.43 
kens Could be either12:03.47 
  It depends on whether the font is embedded (base 14 fonts need not be)12:04.03 
  If the font is embedded, then all viewers should use it, and the results should be identical12:04.23 
velix Yeah, I'll analyze it in the next days and report it to you.12:04.39 
kens If it's not embedded, then the viewer should use its own copy, because all PDF consumers are expected to have these fonts available12:04.48
  While those fonts should be identical to the actual Times font (not Times New Roman) fonts do vary across vendors12:05.21
velix On Windows, Times New Roman gets substituted to Times.12:06.46 
  You can turn this off, of course.12:06.50 
kens PDF has a pretty solid concept of what Times is12:07.02 
  And it's not Times New Roman12:07.10
  Though even Acrobat seems to substitute it these days12:07.26 
velix Yeah okay, I'll try to find Times as a TTF or OTF or Type1 and compare.12:07.28
kens GS ships with a Times in type 1 format12:07.43 
velix nice12:07.51 
  Thanks for your time, I've done lots of notes12:09.26 
kens NP12:09.30 
velix Yeah, even mixed CMYK/RGB files from CorelDraw (using GhostScript PPD) create valid colors and maintain shapes.12:17.32
kens I would hope so :-)12:17.43 
velix Actually, the rectangle isn't a rectangle anymore... it's a closed path of connected lines, but that's okay for me.12:18.37 
kens That's what I mean by the contents not being the same12:19.08 
  Though usually we try to convert rectangular paths into rectangles12:19.23 
velix Ahhhh!12:20.16 
  That's fine for me ;)12:20.20 
kens chrisl this pdf_copy_mono thing is spiralling.....12:20.30 
velix I had problems with Cairo in the past. It didn't close rectangles.12:20.40 
kens I have a file which works *better* if I remove the code to write the device colour out12:20.46 
  velix : Well we can only go on what we get in12:21.00 
velix kens: Yeah. But I'm totally fine now. Thanks again.12:21.26
kens Ordinarily if a path is a rectangle we'll emit it as a 're' operator. Otherwise we emit it as a closed path12:21.26 
  NP12:21.29 
  chrisl I have a feeling this is a can of worms :-(12:21.46 
  I'm going to have some lunch.....12:22.10 
velix kens: I hope you don't eat this can of worms12:52.59 
chrisl kens: TBH, I'm not especially minded to hold up the release for that issue - OTOH, we have time to sit on it for a bit longer13:11.04 
kens chrisl it's looking worse than I thought initially, if I pull out the set_dev_color that causes the 'correct' colour to be output for the uncached glyph, then several files start showing differences. At least some of those are progressions (RGB 'sort of black' gets turned into CMYK pure black)13:31.03
  I believe this, in part anyway, comes down to that C=M=Y=0, K=1 -> Gray not mapping to 100% black problem13:31.41
  The pdfwrite device defaults to RGB, if we end up writing a device colour we end up writing an RGB colour13:32.06 
  If the original was anything other than RGB then the colour is not perfect.13:32.20 
  Of course, we should *not* be writing the device colour13:32.34
  There's an example here:13:33.16 
  https://ghostscript.com/~regression/ken/compare.html13:33.16 
  tests 6 and 713:33.21 
  And yet, other cases look like regressions :-(13:33.55 
chrisl Well, as I say, it's only March 9th, we have time to sit on the release, and let you bang on this13:35.41 
kens Yeah it might take a while....13:36.01 
chrisl It's just, this doesn't seem to me like a new problem13:36.24 
kens Admittedly it's only the one case showing up, it's just that investigating that shows me that we have an existing problem we've never noticed13:36.40
  Ah, looks like this is mostly a problem for ps2write, which makes sense13:41.16 
chrisl So, that's probably why we have to write the device color?13:42.08 
kens No, it means we end up writing bitmaps more13:42.25
  And the bitmaps write the device color13:42.34 
  It's the lack of CIDFont support13:42.51
chrisl Right, but I mean we can't write an ICC color13:42.52 
kens True, but ICC colour isn't really the problem13:43.05 
  It's because we can't embed a CIDFont, so we render a bitmap (mask) and if we do that, we end up writing the device colour, even though we don't need to13:43.48
  Because we've already written the original colour13:43.58 
  And CMYK pure black does not map to R=G=B13:44.16 
chrisl I see13:44.28 
kens Really we shouldn't be writing the device colour I think13:44.44 
  I'll have to investigate each case, but it looks to me like these are all, or mostly, progressions13:45.04 
  I need to hassle Michael about the ICC profile, I don't know what's going on there, but it's a different problem13:45.29
chrisl deekej: are you around?14:35.40 
deekej chrisl: yes14:37.45 
  btw: I have tried to do a scratch build of GS 9.23, and the compilation went OK14:38.09 
chrisl That imagemagick problem - it's libpaper that's leaking the file descriptors14:38.21 
deekej there was some minor issue on my side, nothing significant14:38.24 
  ah, ok14:38.32 
  so, I can report a bug against libpaper because of it14:39.01 
chrisl I got the libpaper source from github, and that did *not* leak14:39.01 
deekej I'm checking the package in Fedora now14:39.53 
chrisl FWIW, the difference may not be in the source, but might be in the configuration14:40.43 
deekej ah, ok :)14:45.51 
  could you please send me the link to github for libpaper, please?14:46.04 
  turns out my colleague is a maintainer of libpaper, so I can take it directly to him14:46.34
chrisl I actually don't think the github repo is the canonical one, I used this: https://github.com/naota/libpaper14:47.25 
deekej hmm, that source code for it is 8 years old...14:48.07 
  he says that nmu5 is the latest version14:48.41 
  https://packages.debian.org/unstable/source/libpaper14:48.43 
chrisl FWIW, I've never worked out where the "real" origin for the libpaper code is14:48.56 
deekej we have nmu4 in fedora now, but I really don't know what that versioning means14:49.03 
  Arch linux references the origin as Debian: https://packages.debian.org/unstable/source/libpaper14:50.02 
  ah, here: https://www.archlinux.org/packages/extra/x86_64/libpaper/14:50.14 
chrisl Yeh, but the maintainer contact never replied to me when I tried to reach out to him :-(14:51.01 
  Okay, I know the core of the problem :-)14:51.20 
  The default /etc/papersize has no content, just a comment. Without an actual papersize entry, the libpaper code drops out the loop without closing the file14:52.26 
  if I add a line with a4 on it to /etc/papersize it no longer leaks14:52.57 
deekej Software engineering... LOL :D14:52.57 
  this is a stupid mistake :D14:53.11 
  *bug14:53.15 
chrisl Yep, sure is14:53.44 
deekej so basically we can either fix libpaper to be able to process empty /etc/papersize14:54.08 
  or we can workaround this by dropping something into /etc/papersize14:54.27 
chrisl I think it has to be something *valid* in /etc/papersize - which probably ought to get done automatically by the installer based on region14:55.05
  But it would be good to see libpaper fixed, too14:55.36 
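A minimal sketch of a check for the condition described above, where /etc/papersize holds only a comment and no actual entry. It assumes Python 3 and assumes that `#` starts a comment in that file. The function name `has_papersize_entry` is illustrative, and the check only detects that some entry is present; it does not verify that the entry is a paper size name libpaper accepts.

```python
def has_papersize_entry(text):
    """Return True if any line has content once '#' comments are stripped."""
    for line in text.splitlines():
        # Drop everything from the first '#' onwards, then surrounding whitespace.
        if line.split("#", 1)[0].strip():
            return True
    return False
```

A packaging or installer script could read the file, call this helper on its contents, and write a region-appropriate default such as `a4` when it returns False.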
deekej after some time, I no longer trust users to not accidentally delete/change the /etc files14:56.27 
chrisl Yep14:56.39 
deekej I know myself :D14:56.42 
  I need a git to track all the changes to not mess something up in /etc :D14:56.57 
chrisl It's probably a very simple patch14:57.21 
deekej I'll see if we could fix the source code14:57.24 
chrisl Yeh, actually I think the fix is trivial14:59.28 
deekej I'm looking now for that part of code :)14:59.55 
chrisl so, lib/paper.c15:00.08 
  line 153 onwards15:00.40 
  You can see that if we reach an EOF, we break out of the while loop, and never close the file15:01.21 
deekej you're right15:01.54 
  IMHO, that code is a kind of mess... :D it would deserve a rewrite :D15:03.41 
  anyway, it looks like a simple one-liner fix could resolve this15:04.38 
chrisl I'd go with two/three lines....15:04.53 
  https://pastebin.com/AXjwNrFD15:04.58 
deekej I'm not sure that fclose will set the file descriptor to NULL15:07.54 
  at least man page doesn't say anything about it15:08.02 
chrisl It doesn't, but I don't see a way that fclose gets called without returning from the function15:08.49
deekej I was thinking about inserting the fclose(ps); before the break (line 184)15:08.56 
chrisl I don't see a break at 184....15:09.47 
  I guess we're looking at different source15:10.10
  Oh, I see, yeh - that would work. I was misreading that block of code - that's horrible :-(15:10.55
deekej yeah, it's horrible :D15:11.09 
chrisl I missed the trailing ; on the while... line15:11.39 
deekej https://pastebin.com/NXcJj3Rx15:11.47 
chrisl Yep, that'll do it15:12.02 
  Oh, no it won't :-(15:12.34
  Consider a completely empty file15:12.49 
deekej did I miss something? :D15:12.50 
  oh crap15:12.57 
  another possibility is to add atexit() call15:13.48 
chrisl I still think my change will work15:14.05 
deekej but atexit might not work as well15:14.19 
chrisl atexit is for process exit. IIRC15:14.40 
deekej yeah, I might be wrong, it has been some time since I have done some proper coding15:15.38 
chrisl Everywhere the existing code calls fclose, it drops down to a return15:15.55 
  So it should never break out of the main while loop with a closed, but non-NULL ps pointer15:16.31 
deekej yeah, you're right15:18.37 
chrisl As you say, horrid code, but I'm not up for rewriting it completely15:19.38 
deekej nah, I'm not doing that either :D15:19.50 
  btw: it reminds me of this... :D https://goo.gl/vUa6Pj15:20.06 
chrisl Been there, done that!15:20.28 
  Along with the "Oh my god, was I high when I wrote that....?"15:21.07 
kens wonders how much longer I can keep blaming my predecessor :-)15:21.24 
deekej hahaha :D15:21.52 
chrisl kens: Given your predecessor? A long, long time! 15:21.59 
deekej xD15:22.03 
chrisl deekej: anyway, does that give you enough to go on with this libpaper stuff?15:22.50 
deekej I think we should submit the pull-request with the patch on github: https://github.com/naota/libpaper15:22.52 
  for everybody to see it15:22.55 
  will you do it Chris, or should I? :)15:23.09 
chrisl Do we think that's the place to do it? The only reason I was trying to foist it onto you is because I've no idea15:23.35 
  The last debian update seems to have been Nov 201615:24.29 
deekej well, we will have the patch in Fedora for sure. And since the debian maintainer is not much responsive, submitting a pull-request is all we can do15:24.29 
  I don't want to go and inform all the possible Linux distros :)15:25.00 
chrisl deekej: maybe if I open a debian bug, that'll get more attention15:25.03 
deekej too many of them :D15:25.05 
  okay, could you please open the debian bug?15:25.19 
chrisl I will do so15:25.24 
deekej I'll submit the pull-request15:25.25 
  here's the pull-request: https://github.com/naota/libpaper/pull/115:33.08 
chrisl I'd forgotten how horrid and naff submitting bugs to debian can be<sigh>15:36.27 
deekej xD15:36.42 
chrisl If I don't get any joy with them, I'll open one for Ubuntu15:36.49
deekej now I feel sorry for you :)15:36.50 
  that might actually be easir15:37.01 
  *easier15:37.04 
chrisl The slight concern is that Ubuntu tends to weight bugs by affected users - which in this case, is not going to be many!15:37.50
  I'll give it until Monday, and if my report does not appear in the Debian tracker, I'll report it to Ubuntu15:40.55 
  Ooh, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=89249015:41.23 
deekej nice :)15:44.02 
chrisl I'm still not going to hold my breath!15:45.20 
deekej :)15:53.49 
chrisl I'll give them a week or two, and if there's no action, go the Ubuntu route15:54.43 
deekej in Fedora it should be ready soon15:55.11 
chrisl I'm relieved to get to the bottom of that!15:57.16 