IRC Logs

Log of #ghostscript at irc.freenode.net.

2014/08/08
mvrhel_laptop tkamppeter: are you there?03:46.36 
  tkamppeter: just sent you an email about the open print summit scheulde03:51.13 
  schedule03:51.21 
  rayjj: Sorry this timing thing ended up being way more complicated 03:56.19 
rayjj mvrhel_laptop: are you still unable to tell me what gs you used on your Pi testing ?04:56.58 
  mvrhel_laptop: about to send out updated timings...04:57.10 
  timings sent05:00.46 
  next I'll try an old version (from about 1 yr ago)05:01.22 
kens mlen (for the logs) I need to know if you are happy with the changes I made to your tiffsep patch before I can commit it.07:09.48 
tomty89 can someone help with this: [ Error handled by opdfread.ps :08:02.48 
  typecheck; OffendingCommand: gt ]08:02.49 
kens There's a problem.08:03.01 
  You've used ps2write to create a PostScript file, and something about the file is not compatible with your device.08:03.27 
  What are you sending the PostScript file to ?08:03.36 
tomty89 a ricoh printer, or cups with its pxl driver08:03.54 
  the error was printed on paper08:04.06 
kens It would be, yes08:04.12 
tomty89 It only occurs in LibreOffice, so far08:04.26 
kens I don't think you can be sending it to a PXL driver, unless you are going via Ghostscript again (in which case, don't)08:04.34 
tomty89 by "the driver", i mean a ppd from ricoh/openprinting, which has a gs command inside it08:05.45 
kens You need to find out whether you are sending PostScript directly to your printer, or whether you are sending it to Ghostscript in order to have it converted to PXL. Basically you need to sort out what CUPS is doing. Then you can send us a specimen file to reproduce the problem, and a command line (assuming Ghostscript is interpreting the PostScript)08:06.00 
tomty89 I think it's sending to ghostscript, afaik my printer doesn't support postscript08:06.36 
  coz it plays fine with a generic pcl or pdf driver, just as the config test page printed08:07.00 
kens Right, so it looks like you've taken a PDF file, converted it to PostScript (using Ghostscript), then sent the PostScript to Ghostscript again in order to convert to PXL. Not really a great sequence, you should convert the PDF to PXL in one step.08:07.06 
  Like I said, you need to figure out what CUPS is doing. Once you know that, you can send us the original file, and the command line required to reproduce the problem.08:07.52 
tomty89 i guess it's ricoh's bad indeed, but i think there's a problem between LO and CUPS/GS, coz I export the file to PDF and print with Evince, it prints08:08.17 
kens I'm not saying its a Ricoh problem, in fact it probably isn't. It looks to me like GS is consuming the PostScript.08:08.44 
  But I cannot help you at all with CUPS. In order to help you I need the original PDF file, the command line used to convert that to PostScript and the command line being used to invoke Ghostscript when consuming the PostScript file.08:09.33 
tomty89 well https://gist.github.com/anonymous/aabb8c55b77f733db6af from cups log08:12.28 
  http://www.openprinting.org/download/PPD/Ricoh/PXL/Ricoh-MP_C4503_PXL.ppd08:12.39 
kens So that's the conversion to PostScript, but the PPD doesn't help me at all.08:13.06 
  Once you have the original PDF file, the command line to convert to PostScript, and ideally the command line being used for sending to the printer, the best thing to do is open a bug report at bugs.ghostscript.com08:13.49 
  You can attach the file there, and I'll be able to look at it.08:14.12 
  door brb08:14.15 
tomty89 the problem only occurs if I print in LibreOffice, and it seems to happen for choosing to emit PDF and PS08:15.01 
  but as I said, if I export it to PDF with LibreOffice, and print it with something else, it prints fine :(08:15.52 
kens Presumably because it skips the conversion to PostScript step08:16.34 
  It looks to me like you are doing PDF->PS->PXL when all you really need is PDF->PXL08:17.00 
  Now I'd like to fix the PDF->PS bug, but to do that I need the PDF file and the command line to create it (which you pasted above) and ideally the command line to do PS->PXL08:17.38 
  If you really can't get that, then I can maybe live without it, I can at least try08:17.54 
  But again, best bet is to open a bug report, attach the PDF file that is used to go PDF->PS, and the command line used to do that08:18.48 
kens is amused by the -sLanguageLevel=3 in the CUPS invocation08:19.19 
tomty89 it's a document at work, so not at convenience to share, but the problem doesn't seem very document-specific to me, though it's pretty "delicate" to trigger it08:21.14 
kens I'm afraid if you can't find a document to share, there's no way we can fix the problem. We need to be able to reproduce it to see what's wrong08:21.47 
  The attachment can be marked private so that only Artifex staff can access it, or you can send me it via email if that helps08:22.16 
  By the way, sharing the LibreOffice document won't help, because we'd have to have exactly the same setup as you in order to generate the same PDF/PS file, we really need the PDF input that gets sent to Ghostscript by CUPS08:24.56 
tomty89 hmm, then the problem comes, how to get that "PDF input"?08:26.16 
kens I believe that you can capture it in CUPS08:26.29 
chrisl https://wiki.ubuntu.com/DebuggingPrintingProblems#Capturing_print_job_data08:27.42 
tomty89 ok thanks08:29.28 
chrisl I need to run an errand - back in half an hour or so.....08:30.24 
mlen kens: sorry for the late reply. I just tested the patch. Everything works fine :)08:32.37 
kens mlen, you are happy enough with the format of the output ?08:32.50 
mlen yes, it's still easy to parse08:32.59 
kens OK great I'll commit it now then, thanks!08:33.08 
mlen kens: thanks! :)08:33.43 
kens Just need to stash my current work first......08:33.58 
  Oh, I'd better add the new switch to the docs too08:35.24 
tomty89 heck, the gs command indeed have a problem! i guess i can file a bug report later08:45.12 
kens If there's a bug, we'd like to fix it....08:45.39 
tomty89 i captured it in both pdf and ps, and the gs command output a file which evince could read08:45.57 
  but if i print only page 1, which prints fine, evince read the output08:46.37 
kens I'm not sure I see the difference.....08:47.05 
  But if you file a bug report, I'm sure it will be clearer08:47.49 
tomty89 lol, s/could/couldn't for the first line08:48.05 
kens Aha, OK08:48.11 
  That makes more sense08:48.15 
  Is this PostScript output ? I didn't think evince could read PostScript08:48.26 
tomty89 it is, and it could (with some lib or ghostscript itself)08:49.11 
kens Probably it uses GS to convert to PDF.....08:49.25 
  The good news here is that, in both cases, Ghostscript is being used to read the PostScript and Ghostscript doesn't like the PS file. This is good because, unlike printers, we can actually debug it ourselves :-)08:50.05 
tomty89 :)08:50.22 
kens You may be the first person to actually find a bug in the ps2write output, instead of a buggy PostScript printer08:50.51 
tomty89 lol08:50.57 
  not sure if it matters, but my documents are mainly of Chinese characters08:52.11 
kens Well it means I can't read them, but that's not a problem08:52.27 
tomty89 I'll file a bug report tonight after work :)08:53.08 
kens It probably means they will end up being bitmaps in the PostScript output too, so it's a good idea to make sure the resolution is correct for your printer. I'd be surprised if CUPS does that. Beyond that, it's not a problem in general08:53.10 
tomty89 well, the ppd provides the resolution options, and the ppd is provided by ricoh (semi-officially, i guess :S)08:54.35 
mattchz morning.09:14.31 
  does someone want to add a comment to this: http://ask.slashdot.org/story/14/08/07/1811227/ask-slashdot-best-pdf-handling-library09:14.35 
kens Morning matt09:14.38 
tomty89 heh, even gs gives the same error on paper when viewing the gs outputs, convincing enough09:15.22 
kens tomty89 : yes this is what I expect.09:15.36 
  From what you described earlier09:15.47 
tomty89 Could it be LibreOffice's fault? I mean maybe it outputs bad pdf and ps for gs09:16.48 
kens tomty89 : almost certainly not. It's most likely a bug, which requires specific PDF to trigger it. But without seeing the file I can't really tell09:17.16 
tomty89 ok, i am sure i'll file a bug report, thanks for your generous help :)09:17.52 
kens NP please do file a report, otherwise we won't be able to fix it. It's surprising how many people can't see that :-(09:18.17 
tomty89 btw i'm not sure if it's a good or bad news, another output from a spreadsheet also trigger the problem09:18.51 
kens Well it's probably like I said, you need a specific type of PDF.09:19.12 
  It can't be common though, or we'd have heard before :-)09:19.25 
tomty89 i see09:19.33 
kens chrisl is tor on vacation today ?09:50.53 
chrisl Not that I'm aware of, but I haven't really kept tabs09:51.46 
kens Hmm, I really need to speak to him, I can't build the latest MuPDF under Windows. Or more accurately I can't build mudraw09:52.13 
chrisl What's the error?09:52.30 
kens lots of them all much the same:09:52.42 
  error C2065: cmap_UniCMS_X : undeclared identifier09:53.01 
  I can probably fix it, but I'd prefer tor to tell me why it's wrong.....09:53.17 
chrisl Have you tried git clean -x -f -d then rebuild?09:54.04 
kens Hmm, no let me try that09:54.16 
  Oh that removes my project, oh well09:55.16 
  Ah that seems to work, thanks chrisl09:56.10 
chrisl NP09:56.23 
kens Now I can check mudraw and send an email09:56.39 
  Oh, first test I try it gets it wrong09:57.52 
  Well I'll send them an email by the time they figure out how to build MuPDF it'll probably be fixed :-)09:58.53 
  Hmm, tor must have been lurking somewhere -)10:14.44 
chrisl Weird, on my computer, the Acrobat PS output for the first 100 pages of the PLRM (using the Ray's benchmark 600dpi command line) takes 22 seconds. If I "convert" the Acrobat PS through ps2write, and run the result, it takes 10 seconds.......10:16.44 
kens A win ! :-)10:16.57 
chrisl Yes, but this is odd: doing that I get Type 1 outlines in the ps2write output, converting the PDF via ps2write directly, I get type 3 bitmap glyphs.....10:17.46 
kens Well I noticed that the PLRM has a *lot* of fonts, often subset, and including some multiple masters10:18.11 
chrisl I'm suspicious that the PLRM.pdf does weird sh*t with encodings, which probably confuses ps2write10:18.34 
  kens2: well, the only thing is, our PS output is already faster than Acrobat despite the large number of type 3 glyphs we define at the start - if we avoided that, we might be *much* faster.....10:26.06 
mattchz anybody fancy commenting on this: http://bugs.ghostscript.com/show_bug.cgi?id=69533610:27.04 
chrisl mattchz: probably needs paulgardiner 10:27.43 
kens2 chrisl if I could get this fallback code to work it would emit fewer glyphs. The charprocs, when stored as outlines, are captured using the identity matrix, then the text should be scaled by the CTM. This means that duplicate charprocs would be noticed and elided, even when the font is a different size. If I could only get the matrix right.....10:27.44 
  I saw the post to the thread, nothing I can say about it10:28.00 
  chrisl OK is htis the PDF file i SENT THE OTHER DAY YOU ARE USING ?10:29.08 
  Grr stupid caps lock key10:29.18 
chrisl kens2: yes, it is10:29.21 
kens2 OK I'll take a quick look.10:29.33 
paulgardiner I don't think we support any form of writing of encrypted files.10:29.59 
kens2 Better stash my current code first though, or I'll get funny results10:30.13 
chrisl kens2: it's not urgent/vital, just interesting - it's odd that the Postscript is a lot slower than the PDF10:30.16 
kens2 chrisl well, the PostScript is (IMO) terrible. The only thing in its favour is it does actually work (mostly)10:30.47 
chrisl kens2: this is the Acrobat produced Postscript - *our* Postscript is faster10:31.13 
kens2 :-D10:31.27 
chrisl The Acrobat PS must be pretty appalling10:32.07 
kens2 It really must be, yes.10:32.18 
chrisl But the main this is, for henrys, that it really seems the problem is construction of the PS, not necessarily a problem with out PS interpreter10:33.27 
  s/out/our10:33.43 
  I seem to struggling with the typing today :-(10:33.58 
kens2 If you run the 2 PS versions through acrobat, is ours faster there too ?10:34.01 
  Distiller that is10:34.09 
chrisl I haven't tried it - I've generally struggled to get meaningful performance numbers from Distiller10:34.44 
kens2 Oh I use a stopwatch, I don't believe what Distiller reports, it lies10:35.02 
chrisl Well, with a time of less than three seconds.....10:36.25 
kens2 That's kind of tricky10:36.37 
  Hmm, the very first glyph triggers a fallback O.O10:37.06 
chrisl Oh my. Is it just heading that way because it's a non-standard encoding?10:37.54 
kens2 Don't know yet10:38.03 
  It's coming back from pdf_obtain_font_resource with an error, I have to keep on tracking it down10:38.22 
  Ah, pdev->HaveCFF is false10:39.55 
  And since the fonts are CFF fonts.....10:40.10 
chrisl Huh? So I wonder how the pdfwrite output, converted to PS contains T1 fonts10:40.48 
  Hmm, our Postscript is much, *much* slower through Distiller than the Acro produced PS - 3 seconds, versus 57 seconds10:41.03 
kens2 Well I guess not all the fonts are CFF10:41.07 
chrisl They are all Type1C, IIRC10:41.19 
kens2 That's CFF though isn't it ?10:41.31 
chrisl CFF charstrings10:41.43 
kens2 Right, and I thought you said the pdfwrite output was type 1C10:42.00 
chrisl Yes, it is10:42.16 
kens2 OK so CFF in CFF out, I must be missing your point10:42.31 
  Oh I see what you mean, the pdfwrite converts to type 110:42.51 
  No idea how that works10:42.58 
  I've just found a control called HaveCIDSystem which apparently allows ps2write to output CIDFonts. I wonder if it works10:43.57 
chrisl I don't think it does work, I think I tried that before. Might work for Type 1 outlines?10:44.25 
kens2 Maybe, I obviously keep forgetting about it10:44.42 
  OK so HaveCFF is always false for ps2write10:44.59 
chrisl Even if it works, it's of very limited use without TTF outlines support10:45.13 
kens2 also for pdfwrite if PDF level is < 1.210:45.20 
chrisl So we'll convert type 1 charstrings to CFF, but not the other way?10:46.45 
kens2 I don't know. I was only looking at whether we will emit CFF or not. If it's ps2write or PDF < 1.2 we won't.10:47.38 
  well I converted the PDF file to PDF using pdfwrite, and ps2write is still going through the fallback code for me10:48.22 
chrisl Hmm, maybe I confused myself with all the different versions of the file I've been trying10:48.57 
kens2 Well for me I still get a load of bitmap charprocs in the output PS file10:49.41 
  brb10:49.45 
  Nothing but interruptions today.....10:52.29 
chrisl Well, I suggest we not worry about this just now. Wait and if the new charproc outlines capture brings the improvement we expect10:53.53 
  Wait and see....10:54.03 
kens2 If I can ever get it to work10:54.20 
dcmst Hi, is it possible to disable the "auto advance" feature in muPDF presentation mode?11:33.37 
kens2 I'm sure you can write code to do so, I've no idea what that is though11:34.23 
dcmst so there is no user interface to disable it (like options, shortcuts, etc.)?11:39.53 
kens2 Well I don't know what feature you are referring to.11:40.09 
  It will also depend on what platform you are running on11:40.32 
  But probably, no.11:40.42 
dcmst this is where the feature I want to disable is described: http://ghostscript.com/pipermail/gs-commits/2012-October/015421.html11:41.19 
kens2 Well not pressing p would seem to be favourite then11:41.47 
  If you don't do that, then it won't be in presentation mode and so won't advance11:42.13 
dcmst I want presentation mode without auto advancing11:42.47 
kens2 Then you will need to write it yourself.11:43.20 
dcmst I need the transition effect (I'm recording a video of the pdf)11:43.20 
chrisl kens: as last week, I'm heading out for a bit of squash training in a bit - I'll call if I get back early enough13:12.17 
kens OK have fun13:12.24 
chrisl I think if it's "fun", I'm probably doing it wrong ;-)13:12.45 
kens :-)13:13.03 
rayjj chrisl_away: (for the logs) That "trick" of converting the Acrobat PS of the first 100 pages of the PLRM (with the setting to preload the fonts, not incremental) using gs ps2write resulted in a PS file that is 1.5Mb and runs in 45 seconds !!!14:15.58 
  kens: it looks to me like the presentation mode isn't mentioned in the 'usage' in x11/pdfapp.c14:32.48 
kens Possibly not.14:33.07 
  But that wasn't really what he was asking about anyway as such. The 'feature' he was describing is just part of 'presentation mode' which is why I had no idea what he was talking about14:33.43 
rayjj looks like adding a prefix to set the time for the delay on each page would be easy, then 0p could set infinite time (right now AIUI, 5 seconds is hard coded == 5p)14:42.05 
kens Yes, it's hard coded. The poster opened a bug report, so it's up to Tor now from my POV14:42.44 
tkamppeter mvrhel_laptop, hi14:45.01 
mvrhel_laptop hi tkamppeter. did you get my email?14:52.56 
tkamppeter mvrhel_laptop, yes, it arrived around 6am here, you had already left when I saw it at our 8am.14:57.41 
mvrhel_laptop ok14:57.52 
tkamppeter mvrhel_laptop, to get your presentation onto the Thu or Fri we need to contact Mike Sweet and/or Ira.14:58.43 
robin_watts_mac MuPDF can't write encrypted files at all.14:59.12 
  hence we can't write annotations to encrypted files.14:59.35 
mvrhel_laptop tkamppeter: ok. I think Ira was cc'd on that email. If you can handle this that would be great14:59.55 
tkamppeter mvrhel_laptop, I have sent out a mail to them now to see what can be done.15:08.11 
mvrhel_laptop ok15:08.24 
  thanks tkamppeter15:08.30 
rayjj coffee. bbiab15:10.21 
tomty89 hi it's me again. i am filing a bug report for the possible gs bug which output invalid file with LibreOffice emission. Is it true that I can mark the attachments as private?15:11.30 
e98 any devs awake and reading?15:26.21 
rayjj e98 was too quick :-(15:28.33 
kens No patience.....15:28.43 
tomty89 kens: it is true that i can make the attachment private? coz i don't see an option in bugzilla15:28.49 
kens tomty89 : you can't, but I can15:28.58 
tomty89 lol15:29.02 
  ok15:29.04 
  gonna upload them now15:29.11 
kens Just stick it there and tell me and I'll make it private15:29.15 
rayjj tomty89: if kens is gone, and I notice the attachment (we get email) I'll mark it private15:30.10 
kens I'll be here for a bit yet15:30.24 
rayjj chrisl: thanks for the idea. Works a champ. Now all we have to do is get ps2write to do as well without Acrobat "helping" beforehand :-)15:30.58 
chrisl rayjj: well, as discussed with kens, hopefully the work he's doing now will improve the situation quite a bit15:31.35 
kens Well, I can get the text to come out now, positioned in the right place, and correctly sized. But the Widths are wrong, and all the curves are 'warped' for some reason.15:32.08 
rayjj chrisl: I didn't read the logs thoroughly yet15:32.12 
tomty89 upload is done, thanks :)15:32.43 
rayjj kens: you're capturing the glyphs as outlines ?15:32.46 
kens ok 2 secs15:32.49 
  done15:34.40 
tomty89 :D15:35.38 
rayjj kens: even building fonts of the glyph bitmaps would be better, I'd think. Needing 11k glyphs for 100 pages doesn't seem reasonable -- there must be quite a bit of duplication15:35.42 
kens It does build fonts with the glyph bitmaps, type 3 ones15:36.33 
rayjj kens: I see. I guess it's just not carrying the glyphs over pages or something. 110 or so unique glyphs per page seems reasonable for the PLRM15:39.08 
kens No, because they are bitmaps, they are different for each size or transform of each glyph15:39.38 
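
The point kens makes here, together with the earlier remark that outlines captured under the identity matrix could be shared across sizes, can be sketched roughly as below. This is only an illustration of the counting argument, not ps2write's code: the `uses` list, the font and glyph names, and the `count_charprocs` helper are all invented for the example.

```python
# Hypothetical illustration only: names and data are made up, not taken from ps2write.

def count_charprocs(uses, key):
    """Count the distinct charprocs needed when glyph uses are cached under key(use)."""
    return len({key(use) for use in uses})

# Each use is (font, glyph, point_size).
uses = [
    ("Times", "a", 10), ("Times", "a", 12), ("Times", "a", 10),
    ("Times", "b", 10), ("Times", "b", 12),
]

# A bitmap is rendered for one size/transform, so the size is part of the cache key.
bitmap_charprocs = count_charprocs(uses, key=lambda u: u)

# An outline captured under the identity matrix is scaled at use time,
# so the size drops out of the cache key and duplicates collapse.
outline_charprocs = count_charprocs(uses, key=lambda u: (u[0], u[1]))

print(bitmap_charprocs, outline_charprocs)  # 4 2
```

With real documents the gap would depend on how many sizes and transforms each glyph appears at, which the log does not quantify beyond the 11k figure.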
henrys kens:I’ll look at the FirstPage/LastPage thing that came in.15:47.18 
kens OK thanks henrys15:47.45 
rayjj hmm.. that seems unfortunate. The default build for mupdf in Makefile is "debug". Most people won't know to say "make build=release" since the README doesn't mention it16:16.52 
  that means that newbies evaluating mupdf performance will be getting a debug build :-(16:17.23 
  mvrhel_laptop: have you done any performance testing on linux with mupdf ?16:18.18 
mvrhel_laptop rayjj: I did testing on the pi16:18.35 
  Robin and I match when I did that16:19.03 
  matched16:19.07 
  so I am pretty sure I was not doing a debug build. plus mupdf was faster that gs16:19.49 
  than16:19.52 
  can't type today16:19.54 
henrys rayjj: the default build in ghostscript is debug in VS too.16:20.32 
  I find that odd also16:21.10 
  mvrhel_laptop: when is the meeting?16:40.56 
mvrhel_laptop it is at 11 today16:41.07 
henrys mvrhel_laptop: they have a habit of stopping in, chewing up engineering hours, and disappearing. I wonder if it is a strategy16:42.24 
mvrhel_laptop henrys: I hope something comes out of all of this. 16:42.50 
henrys kens, chrisl : I didn’t know adobe was prompting to install font packages when viewing a pdf, when did that come about. Never seen it on the mac, just windows which I’ve been using more frequently lately.16:45.57 
kens CJK font pack ?16:47.05 
  I always install it immediately anyway16:47.22 
chrisl I'm still using Acro9 so......16:47.23 
henrys kens: yes it installs a cjk font package when viewing the jeitta files16:56.25 
kens Ah well I install those when I install Acrobat, so I wouldn't get prompted for that. I always install all the fonts initially16:57.00 
  69541716:57.07 
  Oops16:57.10 
  Night all17:01.30 
rayjj mudraw is amazingly slow at 600 dpi. CMYK on the PLRM is 32 seconds per page on the Pi, but it's even 5+ seconds per page on my laptop. 22:16.30 
  is this a reasonable command line? : mudraw -r 300 -o /dev/null -F pam -c cmyk -b 0 -B 661 -m -M PLRM_100_AR.pdf22:17.22 
  even without the -B 661 it is just as slow. gs does this *FAST* on my laptop (all 100 pages in 5.8 sec, and it was < 62 sec on the Pi)22:19.49 
  note the -B 661 for mudraw was to constrain the memory use to near what the -dBufferSpace=16m does to gs forcing 10 bands22:21.03 
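
A rough check of where "10 bands" could come from, as a sketch under stated assumptions. The log does not give the page size or pixel depth, so US Letter pages, 8-bit CMYK, and the 600 dpi figure from the earlier message are all assumptions here (the quoted command line itself says -r 300).

```python
import math

# Assumptions (not stated in the log): US Letter page, 600 dpi, 4 bytes per CMYK pixel.
PAGE_WIDTH_IN, PAGE_HEIGHT_IN = 8.5, 11.0
DPI = 600
BYTES_PER_PIXEL = 4      # 8-bit C, M, Y and K
BAND_HEIGHT = 661        # the value passed to mudraw -B

width_px = round(PAGE_WIDTH_IN * DPI)    # 5100
height_px = round(PAGE_HEIGHT_IN * DPI)  # 6600

bands = math.ceil(height_px / BAND_HEIGHT)
band_bytes = width_px * BAND_HEIGHT * BYTES_PER_PIXEL

print(bands, band_bytes)  # 10 13484400
```

Under those assumptions a 661-line band is about 13.5 MB of raster, which is in the neighbourhood of a 16m buffer; how Ghostscript actually divides its BufferSpace is not described in the log.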
nemo so. given my general failures to make parameters to gs do anything to the jpeg quality of the resulting PDFs, I tried grepping for QFactor in source22:24.01 
  I modified Resource/Init/gs_pdfwr.ps and set all of them to 0.95 111122:24.20 
  reran make22:24.22 
  reran PDF generation22:24.26 
  image was totally unchanged from every other attempt22:24.34 
  what am I missing ☹22:24.38 
rayjj nemo: you probably need kens or someone to dig into pdfwrite. Even though it is in the docs, pdfwrite may not pay attention to the Filter params. And AFAIK, gs_pdfwr.ps isn't used (it was used only by pdfopt.ps which was intended to allow PS programs to load, then output PDF files)22:27.10 
  on j9_acrobat.pdf on the Pi, gs does all 5 pages in 47 sec (at 600 dpi) and mudraw takes 242 sec :-(22:28.07 
  nemo: bad news. Looking at devices/vector/gdevpdfu.c in pdf_put_filters, there is a comment /* Currently this only saves parameters for CCITTFaxDecode. */22:33.49 
  nemo: which seems to correspond to code I see later that never calls s_DCTE_get_params (as it does for s_CF_get_params)22:35.39 
nemo ಠ_ಠ22:40.40 
  I'm astounded that setting the jpeg quality in PDF images has turned out to be such a massive task22:41.08 
  'cause, every single one outputted so far has been unusably muddy22:41.30 
rayjj nemo: in fact, there are many parameters in the Ps2pdf.htm doc that are mentioned that are currently ignored by pdfwrite AFAICT, but we'd have to wait for kens to make sure22:41.54 
nemo I'd be perfectly happy to hardcode it if I knew where to do it22:42.39 
rayjj nemo: and just downsampling and lossless doesn't cut it ?22:42.45 
  nemo: can you send a page of the file to me: ray at artifex.com with the params that you are using so I can play with it. I may try more extreme ImageResolution and forcing Interpolate true22:44.06 
nemo rayjj: um. not sure... do you have a commandline for that? 'cause my prior attempt to do that still resulted in jpeg compression22:44.10 
  rayjj: man. I'm hardpressed to come up w/ a page I can share22:44.25 
  but seriously, this happens on like every single PDF I've tried so far22:44.38 
  ugh. is late on a friday and I have to head out anyway. I'll attack it next week I 'spose :-/22:45.08 
rayjj nemo: that's the other thing I want to look at, is why the text is being done with JPEG22:45.25 
nemo rayjj: well. these are scans of existing docs.22:45.38 
  rayjj: their first scan as mentioned before was done stupidly22:45.48 
  not in document mode, so no background removal22:46.01 
rayjj nemo: right, and if the original scan was muddy (not just big) we may not be able to do much22:46.13 
nemo of the ones after they fixed this, a number still had enough ink and whatnot garbage that they were still enormous after Flate22:46.24 
  rayjj: well, what bugs me is, I print out a page of the original, muddy...22:46.44 
  I do something in gs, and I get a ton of compression artifacts around everything22:46.57 
  but... no matter *what* I try to change in the params, I always get compression artifacts.22:47.10 
rayjj nemo: but the text is supposed to be black ? Just happened to be a color scan ?22:47.14 
nemo most of the scans are greyscale22:47.22 
rayjj because we never use Flate (AFAIK) with monochrome images22:47.30 
nemo some are colour (blue ink, the odd picture)22:47.32 
  ah22:47.39 
  what bugs me is that all my outputs from gs are basically identical22:47.58 
  nothing I've done, in the hours of messing with this, seems to cause the slightest bit of difference to the output22:48.12 
  unlike, say, opening it in GIMP and saving at a few different jpeg levels, where the differing results are obvious22:48.27 
  it's as if gs is ignoring everything, and always saving low quality jpeg22:48.40 
rayjj nemo: but gimp doesn't preserve the pdf text, right ?22:48.59 
nemo yeah. that was just an example22:49.12 
rayjj right, so you need to extract the image, process it, and put the PDF back with the modified image, leaving the rest of the (presumably OCR layer of text) in the PDF22:50.18 
  many scanner apps OCR the text and put it into the PDF as Tr3 (Invisible) text so that it is searchable and can be cut and paste22:51.08 
nemo really, the results in gs are more like gimp at 15% quality.22:51.22 
rayjj then the image goes in "however"22:51.29 
nemo hm...22:51.37 
  that really would do the trick22:51.45 
  I can process this image in absolutely anything. convert for example22:51.55 
  I don't need to use ghostscript22:51.59 
  is more the "stitching it back together" that is the tricky part22:52.08 
rayjj nemo: but I'd have to see an example to have ideas that are more than just guesses (just one page)22:52.18 
  nemo: mutool is the thing for PDF manipulation22:52.53 
nemo hm. found something that looks genericish22:53.18 
rayjj usage: mutool <command> [options]22:53.22 
  clean -- rewrite pdf file22:53.24 
  extract -- extract font and image resources22:53.25 
  info -- show information about pdf resources22:53.27 
  poster -- split large page into many tiles22:53.28 
  show -- show internal pdf objects22:53.30 
nemo 'k22:53.48 
  so. how do I extract 'sactly one page from a PDF?22:53.57 
rayjj nemo: and each of those functions has options for how it works22:54.06 
nemo well. I mean, without altering it22:54.22 
  I wonder if pdfseparate would do that22:54.45 
rayjj nemo: mutool clean -ggg in.pdf out.pdf22:54.57 
  nemo: mutool clean -ggg in.pdf out.pdf 122:55.03 
nemo ok22:55.06 
rayjj (for just page 1)22:55.08 
  nemo: pdfTk also lets you "burst" a PDF, but AFAIK it writes all the pages22:55.41 
  nemo: mutool clean -ggg in.pdf out.pdf 1-10 for the first 10 pages22:55.58 
  the -ggg leaves out unused Resources (fonts, images, etc.) that may be in the original 22:56.36 
nemo http://m8y.org/tmp/after.pdf - after gs22:57.21 
  http://m8y.org/tmp/before.pdf - before gs22:57.57 
  I can totally understand some muddying. What puzzles me is that nothing I do seems to change it at all22:58.37 
rayjj nemo: the "after" still looks fine to me. I can even read the "OFFICE OF MINORITY HEALTH" 22:59.32 
nemo rayjj: it looks a lot worse on docs with background removal I gotta say22:59.47 
  rayjj: also, it looks fine on the screen. is when printed that the fuzziness around the letters becomes clearer23:00.09 
  frankly, I was still happy w/ it, was just boss who was not, and wanted me to improve quality23:00.25 
  which, I thought would be an easy task23:00.29 
rayjj nemo: I'll try printing it and see what I get (have to go home for that)23:00.31 
nemo no biggie23:00.38 
  hm23:00.51 
rayjj nemo: and I'll have a look at the contents of the "before"23:00.54 
nemo lemme take a pic on my phone ☺23:00.57 
rayjj nemo: mutool clean info before.pdf shows: [ DCT ] 3435x4394 8bpc DevRGB (3 0 R) and "after" has: [ DCT ] 1030x1318 8bpc DevRGB (10 0 R)23:05.02 
nemo 10 0 R ?23:06.54 
rayjj which means you are starting with a 400 dpi image and going down to a 120 dpi image23:06.57 
nemo sure23:07.09 
rayjj nemo: that's the object number23:07.11 
nemo I pasted that in all my gs scripts ☺23:07.18 
  but... the artifacts around text don't appear related to dpi23:07.28 
  http://m8y.org/tmp/temp.jpeg23:07.30 
  I guess they could be23:07.44 
  but they look more like the usual jpeg stuff23:07.49 
  s/gs scripts/gs commandlines/23:07.57 
  the general object was to take something that was at a super-high DPI and reduce it to something that was still reasonably printable23:08.33 
rayjj nemo: OK. I see the diffs in temp.jpg23:08.34 
nemo it's even more dramatic on an image with background removal23:09.08 
rayjj nemo: right. I'll have a look23:09.16 
nemo there's grey fuzzy blocks around all text. I can photograph a piece of one of those pages too23:09.36 
rayjj nemo: background removal often involves a "thresholding" stage that makes sharp edges, but this is *NOT* something that JPEG compression deals with well23:10.01 
nemo well23:10.15 
  I was going to experiment with lossy vs non-lossy compression23:10.34 
  see if, say, jpeg 0.95 or something outperformed non-lossy overall23:10.47 
  but I was getting truly hideous results, and couldn't get any difference in appearance in gs23:11.02 
rayjj nemo: The main thing I see is to try and compress monochrome (or near mono) using CCITT23:11.19 
  mutool extract lets me extract the image and play with it23:12.06 
  nemo: that example is good because it has non-black (gray) text as well as images, all in the same image. You don't want to make the text nicer at the expense of crap on other parts of the image23:13.32 
nemo there's a reasonable amount of that in these scans23:17.03 
  here's what happened on a form with background removal, and the same parameters23:17.16 
  the excessive jpeg compression is more obvious23:17.23 
  http://m8y.org/tmp/temp2.jpeg23:17.33 
  aaaanyway. way-late here. gotta go23:17.59 
  I appreciate you taking an interest, and if you see absolutely anything on controlling jpeg quality in gs, that'd be lovely23:18.19 
  simply calling one gs commandline seems more elegant than working out something w/ mutools and convert23:18.36 
  but, eh, I imagine the latter is more "unixy"23:18.43 
  and certainly removing the image and popping it back in gives me a lot more control over what happens to the image23:19.00 
  can even do stuff based on, say, imagemagick identify's analysis of it (number of colours etc)23:19.18 
  I just need to figure out how to make these mutools do that, since I just heard about 'em, and make sure I don't screw up the PDF structure (bookmarks, text OCR, whatnot)23:20.11 
  page dimensions, image position...23:20.22 
rayjj nemo: I think a transfer function might help. The histogram has most of the "fuzz" in the before up near white and black. After mapping those to pure white or black, the noise in the image is MUCH better23:21.59 
  then getting gs to output as a lossless Gray image should be a decent size23:22.37 
  nemo: so, the question is: the "before" image is 4.5Mb and the "after" image is 640Kb. If I munge to gray and save "lossless" I get down to 280Kb and it looks pretty good (to me). See: http://casper.ghostscript.com/~ray/before_9_245_gray_120dpi.png23:39.14 
  it has a lot less noise than the "after" image http://casper.ghostscript.com/~ray/img-0010.png23:40.19 
  nemo: keeping RGB (not forcing gray) but doing the transfer function and lossless gets to < 600 Kb and the image is:23:44.10 
  http://casper.ghostscript.com/~ray/before_9_245_rgb_120dpi.png23:44.58 
  nemo: the 9 and the 245 mean that anything less than 10/255 goes to 0 (black) and everything lighter than 245/255 goes to 255 (white). This removes the noise which makes the lossless compression better23:46.38 
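
A minimal sketch of the 9/245 mapping described here, in plain Python on 8-bit samples. The log does not say what tool was used or what happens to values between the two thresholds; this sketch assumes they pass through unchanged, and the function names are made up for illustration.

```python
def clamp_transfer(value, low=9, high=245):
    """Map an 8-bit sample: <= low goes to 0, > high goes to 255, the rest is unchanged."""
    if value <= low:
        return 0
    if value > high:
        return 255
    return value

# Precompute a 256-entry lookup table so each sample is a single index operation.
LUT = [clamp_transfer(v) for v in range(256)]

def apply_transfer(samples):
    """Apply the lookup table to a bytes-like run of 8-bit samples."""
    return bytes(LUT[s] for s in samples)

row = bytes([0, 5, 9, 10, 128, 245, 246, 255])
print(list(apply_transfer(row)))  # [0, 0, 0, 10, 128, 245, 255, 255]
```

Flattening the near-black and near-white noise to single values is what gives a lossless compressor long uniform runs to work with.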
  nemo: saving that 120 dpi image after the transfer function as JPEG reduces the file size to 240Kb but degrades it a bit: http://casper.ghostscript.com/~ray/before_9_245_rgb_120dpi.png23:51.52 
  nemo: that's with the GS defaults of 90% and 2 1 1 2 23:52.22 
  nemo: so after you let me know, I'll tell you what it takes to do this with gs (if it is possible)23:53.01 
  heading off to be with the family (I think they're home by now). Check back later23:53.55 