| <<<Back 1 day (to 2014/08/07) | 2014/08/08 |
mvrhel_laptop | tkamppeter: are you there? | 03:46.36 |
| tkamppeter: just sent you an email about the open print summit scheulde | 03:51.13 |
| schedule | 03:51.21 |
| rayjj: Sorry this timing thing ended up being way more complicated | 03:56.19 |
rayjj | mvrhel_laptop: are you still unable to tell me what gs you used on your Pi testing ? | 04:56.58 |
| mvrhel_laptop: about to send out updated timings... | 04:57.10 |
| timings sent | 05:00.46 |
| next I'll try an old version (from about 1 yr ago) | 05:01.22 |
kens | mlen (for the logs) I need to know if you are happy with the changes I made to your tiffsep patch before I can commit it. | 07:09.48 |
tomty89 | can someone help with this: [ Error handled by opdfread.ps : | 08:02.48 |
| typecheck; OffendingCommand: gt ] | 08:02.49 |
kens | There's a problem. | 08:03.01 |
| You've used ps2write to create a PostScript file, and something about the file is not compatible with your device. | 08:03.27 |
| What are you sending the PostScript file to ? | 08:03.36 |
tomty89 | a ricoh printer, or cups with its pxl driver | 08:03.54 |
| the error was printed on paper | 08:04.06 |
kens | It would be, yes | 08:04.12 |
tomty89 | It only occurs in LibreOffice, so far | 08:04.26 |
kens | I don't think you can be sending it to a PXL driver, unless you are going via Ghostscript again (in which case, don't) | 08:04.34 |
tomty89 | by "the driver", i mean a ppd from ricoh/openprinting, which has a gs command inside it | 08:05.45 |
kens | You need to find out whether you are sending PostScript directly to your printer, or whether you are sending it to Ghostscript in order to have it converted to PXL. Basically you need to sort out what CUPS is doing. Then you can send us a specimen file to reproduce the problem, and a command line (assuming Ghostscript is interpreting the PostScript) | 08:06.00 |
tomty89 | I think it's sending to ghostscript, afaik my printer doesn't support postscript | 08:06.36 |
| coz it plays fine with a generic pcl or pdf driver, just as the config test page printed | 08:07.00 |
kens | Right, so it looks like you've taken a PDF file, converted it to PostScript (using Ghostscript), then sent the PostScript to Ghostscript again in order to convert to PXL. Not really a great sequence, you should convert the PDF to PXL in one step. | 08:07.06 |
| Like I said, you need to figure out what CUPS is doing. Once you know that, you can send us the original file, and the command line required to reproduce the problem. | 08:07.52 |
tomty89 | i guess it's ricoh's bad indeed, but i think there's a problem between LO and CUPS/GS, coz I export the file to PDF and print with Evince, it prints | 08:08.17 |
kens | I'm not saying its a Ricoh problem, in fact it probably isn't. It looks to me like GS is consuming the PostScript. | 08:08.44 |
| But I cannot help you at all with CUPS. In order to help you I need the original PDF file, the command line used to convert that to PostScript and the command line being used to invoke Ghostscript when consuming the PostScript file. | 08:09.33 |
tomty89 | well https://gist.github.com/anonymous/aabb8c55b77f733db6af from cups log | 08:12.28 |
| http://www.openprinting.org/download/PPD/Ricoh/PXL/Ricoh-MP_C4503_PXL.ppd | 08:12.39 |
kens | So that's the conversion to PostScript, but the PPD doesn't help me at all. | 08:13.06 |
| Once you have the original PDF file, the command line to convert to PostScript, and ideally the command line being used for sending to the printer, the best thing to do is open a bug report at bugs.ghostscript.com | 08:13.49 |
| You can attach the file there, and I'll be able to look at it. | 08:14.12 |
| door brb | 08:14.15 |
tomty89 | the problem only occurs if I print in LibreOffice, and it seems to happen for choosing to emit PDF and PS | 08:15.01 |
| but as I said, if I export it to PDF with LibreOffice, and print it with something else, it prints fine :( | 08:15.52 |
kens | Presumably because it skips the conversion to PostScript step | 08:16.34 |
| It looks to me like you are doing PDF->PS->PXL when all you really need is PDF->PXL | 08:17.00 |
| Now I'd like to fix the PDF->PS bug, but to do that I need the PDF file and the command line to create it (which you pasted above) and ideally the command line to do PS->PXL | 08:17.38 |
| If you really can't get that, then I can maybe live without it, I can at least try | 08:17.54 |
| But again, best bet is to open a bug report, attach the PDF file that is used to go PDF->PS, and the command line used to do that | 08:18.48 |
kens | is amused by the -sLanguageLevel=3 in the CUPS invocation | 08:19.19 |
tomty89 | it's a document at work, so not at convenience to share, but the problem doesn't seem very document-specific to me, though it's pretty "delicate" to trigger it | 08:21.14 |
kens | I'm afraid if you can't find a document to share, there's no way we can fix the problem. We need to be able to reproduce it to see what's wrong | 08:21.47 |
| The attachment can be marked private so that only Artifex staff can access it, or you can send me it via email if that helps | 08:22.16 |
| By the way, sharing the LibreOffice document won't help, because we'd have to have exactly the same setup as you in order to generate the same PDF/PS file, we really need the PDF input that gets sent to Ghostscript by CUPS | 08:24.56 |
tomty89 | hmm, then the problem comes, how to get that "PDF input"? | 08:26.16 |
kens | I believe that you can capture it in CUPS | 08:26.29 |
chrisl | https://wiki.ubuntu.com/DebuggingPrintingProblems#Capturing_print_job_data | 08:27.42 |
tomty89 | ok thanks | 08:29.28 |
chrisl | I need to run an errand - back in half an hour or so..... | 08:30.24 |
mlen | kens: sorry for the late reply. I just tested the patch. Everything works fine :) | 08:32.37 |
kens | mlen, you are happy enough with the format of the output ? | 08:32.50 |
mlen | yes, it's still easy to parse | 08:32.59 |
kens | OK great I'll commit it now then, thanks! | 08:33.08 |
mlen | kens: thanks! :) | 08:33.43 |
kens | Just need to stash my current work first...... | 08:33.58 |
| Oh, I'd better add the new switch to the docs too | 08:35.24 |
tomty89 | heck, the gs command indeed have a problem! i guess i can file a bug report later | 08:45.12 |
kens | If there's a bug, we'd like to fix it.... | 08:45.39 |
tomty89 | i captured it in both pdf and ps, and the gs command output a file which evince could read | 08:45.57 |
| but if i print only page 1, which prints fine, evince read the output | 08:46.37 |
kens | I'm not sure I see the difference..... | 08:47.05 |
| But if you file a bug report, I'm sure it will be clearer | 08:47.49 |
tomty89 | lol, s/could/couldn't for the first line | 08:48.05 |
kens | Aha, OK | 08:48.11 |
| That makes more sense | 08:48.15 |
| Is this PostScript output ? I didn't think evince could read PostScript | 08:48.26 |
tomty89 | it is, and it could (with some lib or ghostscript itself) | 08:49.11 |
kens | Probably it uses GS to convert to PDF..... | 08:49.25 |
| The good news here is that, in both cases, Ghostscript is being used to read the PostScript and Ghostscript doesn't like the PS file. This is good because, unlike printers, we can actually debug it ourselves :-) | 08:50.05 |
tomty89 | :) | 08:50.22 |
kens | You may be the first person to actually find a bug in the ps2write output, instead of a buggy PostScript printer | 08:50.51 |
tomty89 | lol | 08:50.57 |
| not sure if it matters, but my documents are mainly of Chinese characters | 08:52.11 |
kens | Well it means I can't read them, but that's not a problem | 08:52.27 |
tomty89 | I'll file a bug report tonight after work :) | 08:53.08 |
kens | It probably means they will end up being bitmaps in the PostScript output too, so its a good idea to make sure the resolution is correct for your printer. I'd be surprised if CUPS does that. Beyond that, its not a problem in general | 08:53.10 |
tomty89 | well, the ppd provides the resolution options, and the ppd is provided by ricoh (semi-officially, i guess :S) | 08:54.35 |
mattchz | morning. | 09:14.31 |
| does someone want to add a comment to this: http://ask.slashdot.org/story/14/08/07/1811227/ask-slashdot-best-pdf-handling-library | 09:14.35 |
kens | Morning matt | 09:14.38 |
tomty89 | heh, even gs gives the same error on paper when viewing the gs outputs, convincing enough | 09:15.22 |
kens | tomty89 : yes this is what I expect. | 09:15.36 |
| From what you described earlier | 09:15.47 |
tomty89 | Could it be LibreOffice's fault? I mean maybe it outputs bad pdf and ps for gs | 09:16.48 |
kens | tomty89 : almost certainly not. It's most likely a bug, which requires specific PDF to trigger it. But without seeing the file I can't really tell | 09:17.16 |
tomty89 | ok, i am sure i'll file a bug report, thanks for your generous help :) | 09:17.52 |
kens | NP please do file a report, otherwise we won't be able to fix it. It's surprising how many people can't see that :-( | 09:18.17 |
tomty89 | btw i'm not sure if it's a good or bad news, another output from a spreadsheet also trigger the problem | 09:18.51 |
kens | Well its probably like I said, you need a specific type of PDF. | 09:19.12 |
| It can't be common though, or we'd have heard before :-) | 09:19.25 |
tomty89 | i see | 09:19.33 |
kens | chrisl is tor on vacation today ? | 09:50.53 |
chrisl | Not that I'm aware of, but I haven't really kept tabs | 09:51.46 |
kens | Hmm, I really need to speak to him, I can't build the latest MuPDF under Windows. Or more accurately I can't build mudraw | 09:52.13 |
chrisl | What's the error? | 09:52.30 |
kens | lots of them all much the same: | 09:52.42 |
| error C2065: cmap_UniCMS_X : undeclared identifier | 09:53.01 |
| I can probably fix it, but I'd prefer tor to tell me why it's wrong..... | 09:53.17 |
chrisl | Have you tried git clean -x -f -d then rebuild? | 09:54.04 |
kens | Hmm, no let me try that | 09:54.16 |
| Oh that removes my project, oh well | 09:55.16 |
| Ah that seems to work, thanks chrisl | 09:56.10 |
chrisl | NP | 09:56.23 |
kens | Now I can check mudraw and send an email | 09:56.39 |
| Oh, first test I try it gets it wrong | 09:57.52 |
| Well I'll send them an email by the time they figure out how to build MuPDF it'll probably be fixed :-) | 09:58.53 |
| Hmm, tor must have been lurking somewhere -) | 10:14.44 |
chrisl | Weird, on my computer, the Acrobat PS output for the first 100 pages of the PLRM (using the Ray's benchmark 600dpi command line) takes 22 seconds. If I "convert" the Acrobat PS through ps2write, and run the result, it takes 10 seconds....... | 10:16.44 |
kens | A win ! :-) | 10:16.57 |
chrisl | Yes, but this is odd: doing that I get Type 1 outlines in the ps2write output, converting the PDF via ps2write directly, I get type 3 bitmap glyphs..... | 10:17.46 |
kens | Well I noticed that the PLRM has a *lot* of fonts, often subset, and including some multiple masters | 10:18.11 |
chrisl | I'm suspicious that the PLRM.pdf does weird sh*t with encodings, which probably confuses ps2write | 10:18.34 |
| kens2: well, the only thing is, our PS output is already faster than Acrobat despite the large number of type 3 glyphs we define at the start - if we avoided that, we might be *much* faster..... | 10:26.06 |
mattchz | anybody fancy commenting on this: http://bugs.ghostscript.com/show_bug.cgi?id=695336 | 10:27.04 |
chrisl | mattchz: probably needs paulgardiner | 10:27.43 |
kens2 | chrisl if I could get this fallback code to work it would emit fewer glyphs. The charprocs, when stored as outlines, are captured using the identity matrix, then the text should be scaled by the CTM. This means that duplicate charprocs would be noticed and elided, even when the font is a different size. If I could only get the matrix right..... | 10:27.44 |
| I saw the post to the thread, nothing I can say about it | 10:28.00 |
| chrisl OK is htis the PDF file i SENT THE OTHER DAY YOU ARE USING ? | 10:29.08 |
| Grr stupid caps lock key | 10:29.18 |
chrisl | kens2: yes, it is | 10:29.21 |
kens2 | OK I'll take a quick look. | 10:29.33 |
paulgardiner | I don't think we support any form of writing of encrypted files. | 10:29.59 |
kens2 | Better stash my current code first though, or I'll get funny results | 10:30.13 |
chrisl | kens2: it's not urgent/vital, just interesting - it's odd that the Postscript is a lot slower than the PDF | 10:30.16 |
kens2 | chrisl well, the PostScript is (IMO) terrible. The only thing in its favour is it does actually work (mostly) | 10:30.47 |
chrisl | kens2: this is the Acrobat produced Postscript - *our* Postscript is faster | 10:31.13 |
kens2 | :-D | 10:31.27 |
chrisl | The Acrobat PS must be pretty appalling | 10:32.07 |
kens2 | It really must be, yes. | 10:32.18 |
chrisl | But the main thing is, for henrys, that it really seems the problem is construction of the PS, not necessarily a problem with out PS interpreter | 10:33.27 |
| s/out/our | 10:33.43 |
| I seem to struggling with the typing today :-( | 10:33.58 |
kens2 | If you run the 2 PS versions through acrobat, is ours faster there too ? | 10:34.01 |
| Distiller that is | 10:34.09 |
chrisl | I haven't tried it - I've generally struggled to get meaningful performance numbers from Distiller | 10:34.44 |
kens2 | Oh I use a stopwatch, I don't believe what Distiller reports, it lies | 10:35.02 |
chrisl | Well, with a time of less than three seconds..... | 10:36.25 |
kens2 | That's kind of tricky | 10:36.37 |
| Hmm, the very first glyph triggers a fallback O.O | 10:37.06 |
chrisl | Oh my. Is it just heading that way because it's a non-standard encoding? | 10:37.54 |
kens2 | Don't know yet | 10:38.03 |
| It's coming back from pdf_obtain_font_resource with an error, I have to keep on tracking it down | 10:38.22 |
| Ah, pdev->HaveCFF is false | 10:39.55 |
| And since the fonts are CFF fonts..... | 10:40.10 |
chrisl | Huh? So I wonder how the pdfwrite output, converted to PS contains T1 fonts | 10:40.48 |
| Hmm, our Postscript is much, *much* slower through Distiller than the Acro produced PS - 3 seconds, versus 57 seconds | 10:41.03 |
kens2 | Well I guess not all the fonts are CFF | 10:41.07 |
chrisl | They are all Type1C, IIRC | 10:41.19 |
kens2 | That's CFF though isn't it ? | 10:41.31 |
chrisl | CFF charstrings | 10:41.43 |
kens2 | Right, and I thought you said the pdfwrite output was type 1C | 10:42.00 |
chrisl | Yes, it is | 10:42.16 |
kens2 | OK so CFF in CFF out, I must be missing your point | 10:42.31 |
| Oh I see what you mean, the pdfwrite converts to type 1 | 10:42.51 |
| No idea how that works | 10:42.58 |
| I've just found a control called HaveCIDSystem which apparently allows ps2write to output CIDFonts. I wonder if it works | 10:43.57 |
chrisl | I don't think it does work, I think I tried that before. Might work for Type 1 outlines? | 10:44.25 |
kens2 | Maybe, I obviously keep forgetting about it | 10:44.42 |
| OK so HaveCFF is always false for ps2write | 10:44.59 |
chrisl | Even if it works, it's of very limited use without TTF outlines support | 10:45.13 |
kens2 | also for pdfwrite if PDF level is < 1.2 | 10:45.20 |
chrisl | So we'll convert type 1 charstrings to CFF, but not the other way? | 10:46.45 |
kens2 | I don't know. I was only looking at whether we will emit CFF or not. If it's ps2write or PDF < 1.2 we won't. | 10:47.38 |
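The rule kens2 states here can be written as a small predicate. This is only a sketch of the decision as described in the chat, not Ghostscript's actual code: the function name `will_emit_cff` and its arguments are hypothetical, and returning False for any device other than pdfwrite is an assumption made to keep the example short.

```python
def will_emit_cff(device: str, pdf_level: float = 1.4) -> bool:
    """Sketch of the rule from the chat.

    ps2write never emits CFF fonts, and pdfwrite only does so when the
    target PDF level is at least 1.2. Other device names are treated as
    'no' here, which is an assumption of this sketch.
    """
    return device == "pdfwrite" and pdf_level >= 1.2


for device, level in (("ps2write", 1.4), ("pdfwrite", 1.1), ("pdfwrite", 1.4)):
    print(device, level, will_emit_cff(device, level))
```

When the predicate is False the fonts go down the fallback path discussed above, which is why the CFF fonts in the PLRM end up as type 3 bitmap glyphs in the ps2write output.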
| well I converted the PDF file to PDF using pdfwrite, and ps2write is still going through the fallback code for me | 10:48.22 |
chrisl | Hmm, maybe I confused myself with all the different versions of the file I've been trying | 10:48.57 |
kens2 | Well for me I still get a load of bitmap charprocs in the output PS file | 10:49.41 |
| brb | 10:49.45 |
| Nothing but interruptions today..... | 10:52.29 |
chrisl | Well, I suggest we not worry about this just now. Wait and if the new charproc outlines capture brings the improvement we expect | 10:53.53 |
| Wait and see.... | 10:54.03 |
kens2 | If I can ever get it to work | 10:54.20 |
dcmst | Hi, is it possibile to disable the "auto advance" feature in muPDF presentation mode? | 11:33.37 |
kens2 | I'm sure you can write code to do so, I've no idea what that is though | 11:34.23 |
dcmst | so there is no user interface to disable it (like options, shortcuts, etc.)? | 11:39.53 |
kens2 | Well I don't know what feature you are referring to. | 11:40.09 |
| It will also depend on what platform you are running on | 11:40.32 |
| But probably, no. | 11:40.42 |
dcmst | this is where the feature I want to disable is described: http://ghostscript.com/pipermail/gs-commits/2012-October/015421.html | 11:41.19 |
kens2 | Well not pressing p would seem to be favourite then | 11:41.47 |
| If you don't do that, then it won't be in presentation mode and so won't advance | 11:42.13 |
dcmst | I want presentation mode without auto advancing | 11:42.47 |
kens2 | Then you will need to write it yourself. | 11:43.20 |
dcmst | I need the transition effect (I'm recording a video of the pdf) | 11:43.20 |
chrisl | kens: as last week, I'm heading out for a bit of squash training in a bit - I'll call if I get back early enough | 13:12.17 |
kens | OK have fun | 13:12.24 |
chrisl | I think if it's "fun", I'm probably doing it wrong ;-) | 13:12.45 |
kens | :-) | 13:13.03 |
rayjj | chrisl_away: (for the logs) That "trick" of converting the Acrobat PS of the first 100 pages of the PLRM (with the setting to preload the fonts, not incremental) using gs ps2write resulted in a PS file that is 1.5Mb and runs in 45 seconds !!! | 14:15.58 |
| kens: it looks to me like the presentation mode isn't mentioned in the 'usage' in x11/pdfapp.c | 14:32.48 |
kens | Possibly not. | 14:33.07 |
| But that wasn't really what he was asking about anyway as such. The 'feature' he was describing is just part of 'presentation mode' which is why I had no idea what he was talking about | 14:33.43 |
rayjj | looks like adding a prefix to set the time for the delay on each page would be easy, then 0p could set infinite time (right now AIUI, 5 seconds is hard coded == 5p) | 14:42.05 |
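rayjj's suggestion (a numeric prefix in front of `p`, with `0p` meaning "never advance") could look roughly like the sketch below. It is written in Python for brevity and is not MuPDF code; the names `parse_prefixed_command` and `presentation_delay` are hypothetical, and the only value taken from the chat is the 5 second default.

```python
import re
from typing import Optional, Tuple

DEFAULT_DELAY_SECONDS = 5  # the chat says 5 seconds is currently hard coded


def parse_prefixed_command(keys: str) -> Tuple[Optional[int], str]:
    """Split input such as '12p' into (12, 'p'); 'p' alone gives (None, 'p')."""
    match = re.fullmatch(r"(\d*)([A-Za-z])", keys)
    if match is None:
        raise ValueError(f"unrecognised key sequence: {keys!r}")
    digits, command = match.groups()
    return (int(digits) if digits else None, command)


def presentation_delay(prefix: Optional[int]) -> Optional[float]:
    """Per-page delay in seconds, or None for 'never auto-advance'."""
    if prefix is None:
        return float(DEFAULT_DELAY_SECONDS)  # plain 'p' keeps today's behaviour
    if prefix == 0:
        return None  # '0p': presentation mode without auto advance
    return float(prefix)


for keys in ("p", "0p", "12p"):
    prefix, command = parse_prefixed_command(keys)
    print(keys, command, presentation_delay(prefix))
```

With this shape, dcmst's request (transitions but no auto advance) would map to `0p`, while a bare `p` keeps the existing 5 second behaviour.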
kens | Yes, its hard coded. The poster opened a bug report, so its up to Tor now from my POV | 14:42.44 |
tkamppeter | mvrhel_laptop, hi | 14:45.01 |
mvrhel_laptop | hi tkamppeter. did you get my email? | 14:52.56 |
tkamppeter | mvrhel_laptop, yes, it arrived around 6am here, you had already left when I saw it at our 8am. | 14:57.41 |
mvrhel_laptop | ok | 14:57.52 |
tkamppeter | mvrhel_laptop, to get your presentation onto the Thu or Fri we need to contact Mike Sweet and/or Ira. | 14:58.43 |
robin_watts_mac | MuPDF can't write encrypted files at all. | 14:59.12 |
| hence we can't write annotations to encrypted files. | 14:59.35 |
mvrhel_laptop | tkamppeter: ok. I think Ira was cc'd on that email. If you can handle this that would be great | 14:59.55 |
tkamppeter | mvrhel_laptop, I have sent out a mail to them now to see what can be done. | 15:08.11 |
mvrhel_laptop | ok | 15:08.24 |
| thanks tkamppeter | 15:08.30 |
rayjj | coffee. bbiab | 15:10.21 |
tomty89 | hi it's me again. i am filing a bug report for the possible gs bug which output invalid file with LibreOffice emission. Is it true that I can mark the attachments as private? | 15:11.30 |
e98 | any devs awake and reading? | 15:26.21 |
rayjj | e98 was too quick :-( | 15:28.33 |
kens | No patience..... | 15:28.43 |
tomty89 | kens: it is true that i can make the attachment private? coz i don't see an option in bugzilla | 15:28.49 |
kens | tomty89 : you can't, butI can | 15:28.58 |
tomty89 | lol | 15:29.02 |
| ok | 15:29.04 |
| gonna upload them now | 15:29.11 |
kens | Just stick it there and tell me and I'll make it private | 15:29.15 |
rayjj | tomty89: if kens is gone, and I notice the attachment (we get email) I'll mark it private | 15:30.10 |
kens | I'll be here for a bit yet | 15:30.24 |
rayjj | chrisl: thanks for the idea. Works a champ. Now all we have to do is get ps2write to do as well without Acrobat "helping" beforehand :-) | 15:30.58 |
chrisl | rayjj: well, as discussed with kens, hopefully the work he's doing now will improve the situation quite a bit | 15:31.35 |
kens | Well, I can get the text to come out now, positioned in the right place, and correctly sized. But the Widths are wrong, and all the curves are 'warped' for some reason. | 15:32.08 |
rayjj | chrisl: I didn't read the logs thoroughly yet | 15:32.12 |
tomty89 | upload is done, thanks :) | 15:32.43 |
rayjj | kens: you're capturing the glyphs as outlines ? | 15:32.46 |
kens | ok 2 secs | 15:32.49 |
| done | 15:34.40 |
tomty89 | :D | 15:35.38 |
rayjj | kens: even building fonts of the glyph bitmaps would be better, I'd think. Needing 11k glyphs for 100 pages doesn't seem reasonable -- there must be quite a bit of duplication | 15:35.42 |
kens | It does build fonts with the glyph bitmaps, type 3 ones | 15:36.33 |
rayjj | kens: I see. I guess it's just not carrying the glyphs over pages or something. 110 or so unique glyphs per page seems reasonable for the PLRM | 15:39.08 |
kens | No, because they are bitmaps, they are different for each size or transform of each glyph | 15:39.38 |
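The point about bitmaps versus outlines can be illustrated by counting cache keys. This is a toy model, not pdfwrite's glyph cache: the text runs are made-up data, and a real transform is a full matrix rather than a single point size.

```python
from typing import Iterable, Set, Tuple

# One text run: (font name, glyph name, point size). The values below are invented.
TextRun = Tuple[str, str, float]


def count_bitmap_glyphs(runs: Iterable[TextRun]) -> int:
    """A bitmap is rendered for one size, so the size is part of the key."""
    seen: Set[TextRun] = set(runs)
    return len(seen)


def count_outline_glyphs(runs: Iterable[TextRun]) -> int:
    """An outline captured at the identity matrix is shared across sizes."""
    seen = {(font, glyph) for font, glyph, _size in runs}
    return len(seen)


runs = [
    ("Body", "a", 10.0),
    ("Body", "a", 12.0),  # same glyph, different size: a new bitmap
    ("Body", "a", 10.0),  # exact repeat: already cached either way
    ("Body", "b", 10.0),
]
print(count_bitmap_glyphs(runs), count_outline_glyphs(runs))  # prints: 3 2
```

The gap between the two counts grows with every extra size or transform a font is used at, which is consistent with the 11k glyph figure rayjj mentions for 100 pages.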
henrys | kens: I'll look at the FirstPage/LastPage thing that came in. | 15:47.18 |
kens | OK thanks henrys | 15:47.45 |
rayjj | hmm.. that seems unfortunate. The default build for mupdf in Makefile is "debug". Most people won't know to say "make build=release" since the README doesn't mention it | 16:16.52 |
| that means that newbies evaluating mupdf performance will be getting a debug build :-( | 16:17.23 |
| mvrhel_laptop: have you done any performance testing on linux with mupdf ? | 16:18.18 |
mvrhel_laptop | rayjj: I did testing on the pi | 16:18.35 |
| Robin and I match when I did that | 16:19.03 |
| matched | 16:19.07 |
| so I am pretty sure I was not doing a debug build. plus mupdf was faster that gs | 16:19.49 |
| than | 16:19.52 |
| can't type today | 16:19.54 |
henrys | rayjj: the default build in ghostscript is debug in VS too. | 16:20.32 |
| I find that odd also | 16:21.10 |
| mvrhel_laptop: when is the meeting? | 16:40.56 |
mvrhel_laptop | it is at 11 today | 16:41.07 |
henrys | mvrhel_laptop: they have a habit of stopping in chewing engineering hours and disappearing I wonder if it is a strategy | 16:42.24 |
mvrhel_laptop | henrys: I hope something comes out of all of this. | 16:42.50 |
henrys | kens, chrisl : I didn't know adobe was prompting to install font packages when viewing a pdf, when did that come about? Never seen it on the mac, just windows which I've been using more frequently lately. | 16:45.57 |
kens | CJK font pack ? | 16:47.05 |
| I always install it immediately anyway | 16:47.22 |
chrisl | I'm still using Acro9 so...... | 16:47.23 |
henrys | kens: yes it installs a cjk font package when viewing the jeitta files | 16:56.25 |
kens | Ah well I install those when I install Acrobat, so I wouldn't get prompted for that. I always install all the fonts initially | 16:57.00 |
| 695417 | 16:57.07 |
| OOps | 16:57.10 |
| Night all | 17:01.30 |
rayjj | mudraw is amazingly slow at 600 dpi. CMYK on the PLRM is 32 seconds per page on the Pi, but it's even 5+ seconds per page on my laptop. | 22:16.30 |
| is this a reasonable command line? : mudraw -r 300 -o /dev/null -F pam -c cmyk -b 0 -B 661 -m -M PLRM_100_AR.pdf | 22:17.22 |
| even without the -B 661 it is just as slow. gs does this *FAST* on my laptop (all 100 pages in 5.8 sec, and it was < 62 sec on the Pi) | 22:19.49 |
| note the -B 661 for mudraw was to constrain the memory use to near what the -dBufferSpace=16m does to gs forcing 10 bands | 22:21.03 |
nemo | so. given my general failures to make parameters to gs do anything to the jpeg quality of the resulting PDFs, I tried grepping for QFactor in source | 22:24.01 |
| I modified Resource/Init/gs_pdfwr.ps and set all of them to 0.95 1111 | 22:24.20 |
| reran make | 22:24.22 |
| reran PDF generation | 22:24.26 |
| image was totally unchanged from every other attempt | 22:24.34 |
| what am I missing ☹ | 22:24.38 |
rayjj | nemo: you probably need kens or someone to dig into pdfwrite. Even though it is in the docs, pdfwrite may not pay attention to the Filter params. And AFAIK, gs_pdfwr.ps isn't used (it was used only by pdfopt.ps which was intended to allow PS programs to load, then output PDF files) | 22:27.10 |
| on j9_acrobat.pdf gs on the Pi, gs does all 5 pages in 47 sec (at 600 dpi) and mudraw takes 242 sec :-( | 22:28.07 |
| nemo: bad news. Looking at devices/vector/gdevpdfu.c in pdf_put_filters, there is a comment /* Currently this only saves parameters for CCITTFaxDecode. */ | 22:33.49 |
| nemo: which seems to correspond to code I see later that never calls s_DCTE_get_params (as it does for s_CF_get_params) | 22:35.39 |
nemo | ಠ_ಠ | 22:40.40 |
| I'm astounded that setting the jpeg quality in PDF images has turned out to be such a massive task | 22:41.08 |
| 'cause, every single one outputted so far has been unusably muddy | 22:41.30 |
rayjj | nemo: in fact, there are many parameters in the Ps2pdf.htm doc that are mentioned that are currently ignored by pdfwrite AFAICT, but we'd have to wait for kens to make sure | 22:41.54 |
nemo | I'd be perfectly happy to hardcode it if I knew where to do it | 22:42.39 |
rayjj | nemo: and just downsampling and lossless doesn't cut it ? | 22:42.45 |
| nemo: can you send a page of the file to me: ray at artifex.com with the params that you are using so I can play with it. I may try more extreme ImageResolution and forcing Interpolate true | 22:44.06 |
nemo | rayjj: um. not sure... do you have a commandline for that? 'cause my prior attempt to do that still resulted in jpeg compression | 22:44.10 |
| rayjj: man. I'm hardpressed to come up w/ a page I can share | 22:44.25 |
| but seriously, this happens on like every single PDF I've tried so far | 22:44.38 |
| ugh. is late on a friday and I have to head out anyway. I'll attack it next week I 'spose :-/ | 22:45.08 |
rayjj | nemo: that's the other thing I want to look at, is why the text is being done with JPEG | 22:45.25 |
nemo | rayjj: well. these are scans of existing docs. | 22:45.38 |
| rayjj: their first scan as mentioned before was done stupidly | 22:45.48 |
| not in document mode, so no background removal | 22:46.01 |
rayjj | nemo: right, and if the original scan was muddy (not just big) we may not be able to do much | 22:46.13 |
nemo | of the ones after they fixed this, a number still had enough ink and whatnot garbage that they were still enormous after Flate | 22:46.24 |
| rayjj: well, what bugs me is, I print out a page of the original, muddy... | 22:46.44 |
| I do something in gs, and I get a ton of compression artifacts around everything | 22:46.57 |
| but... no matter *what* I try to change in the params, I always get compression artifacts. | 22:47.10 |
rayjj | nemo: but the text is supposed to be black ? Just happened to be a color scan ? | 22:47.14 |
nemo | most of the scans are greyscale | 22:47.22 |
rayjj | because we never use Flate (AFAIK) with monochrome images | 22:47.30 |
nemo | some are colour (blue ink, the odd picture) | 22:47.32 |
| ah | 22:47.39 |
| what bugs me is that all my outputs from gs are basically identical | 22:47.58 |
| nothing I've done, in the hours of messing with this, seems to cause the slightest bit of difference to the output | 22:48.12 |
| unlike, say, opening it in GIMP and saving at a few different jpeg levels, where the differing results are obvious | 22:48.27 |
| it's as if gs is ignoring everything, and always saving low quality jpeg | 22:48.40 |
rayjj | nemo: but gimp doesn't preserve the pdf text, right ? | 22:48.59 |
nemo | yeah. that was just an example | 22:49.12 |
rayjj | right, so you need to extract the image, process it, and put the PDF back with the modified image, leaving the rest of the (presumably OCR layer of text) in the PDF | 22:50.18 |
| many scanner apps OCR the text and put it into the PDF as Tr3 (Invisible) text so that it is searchable and can be cut and paste | 22:51.08 |
nemo | really, the results in gs are more like gimp at 15% quality. | 22:51.22 |
rayjj | then the image goes in "however" | 22:51.29 |
nemo | hm... | 22:51.37 |
| that really would do the trick | 22:51.45 |
| I can process this image in absolutely anything. convert for example | 22:51.55 |
| I don't need to use ghostscript | 22:51.59 |
| is more the "stitching it back together" that is the tricky part | 22:52.08 |
rayjj | nemo: but I'd have to see an example to have ideas that are more than just guesses (just one page) | 22:52.18 |
| nemo: mutool is the thing for PDF manipulation | 22:52.53 |
nemo | hm. found something that looks genericish | 22:53.18 |
rayjj | usage: mutool <command> [options] | 22:53.22 |
| clean -- rewrite pdf file | 22:53.24 |
| extract -- extract font and image resources | 22:53.25 |
| info -- show information about pdf resources | 22:53.27 |
| poster -- split large page into many tiles | 22:53.28 |
| show -- show internal pdf objects | 22:53.30 |
nemo | 'k | 22:53.48 |
| so. how do I extract 'sactly one page from a PDF? | 22:53.57 |
rayjj | nemo: and each of those functions has options for how it works | 22:54.06 |
nemo | well. I mean, without altering it | 22:54.22 |
| I wonder if pdfseparate would do that | 22:54.45 |
rayjj | nemo: mutool clean -ggg in.pdf out.pdf | 22:54.57 |
| nemo: mutool clean -ggg in.pdf out.pdf 1 | 22:55.03 |
nemo | ok | 22:55.06 |
rayjj | (for just page 1) | 22:55.08 |
| nemo: pdfTk also lets you "burst" a PDF, but AFAIK it writes all the pages | 22:55.41 |
| nemo: mutool clean -ggg in.pdf out.pdf 1-10 for the first 10 pages | 22:55.58 |
| the -ggg leaves out unused Resources (fonts, images, etc.) that may be in the original | 22:56.36 |
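For scripting the page extraction over many files, the `mutool clean -ggg` command lines above can be wrapped in a small helper. This is a sketch that assumes `mutool` is on the PATH; the helper names are hypothetical, and only the arguments quoted in the chat are used.

```python
import subprocess
from typing import List, Optional


def mutool_clean_command(src: str, dst: str, pages: Optional[str] = None) -> List[str]:
    """Build the 'mutool clean -ggg' argument list quoted in the chat."""
    command = ["mutool", "clean", "-ggg", src, dst]
    if pages:
        command.append(pages)  # e.g. "1" for one page or "1-10" for a range
    return command


def extract_pages(src: str, dst: str, pages: Optional[str] = None) -> None:
    """Run mutool; raises CalledProcessError if it exits with an error."""
    subprocess.run(mutool_clean_command(src, dst, pages), check=True)


print(" ".join(mutool_clean_command("in.pdf", "out.pdf", "1-10")))
```

Building the argument list separately from running it keeps the command easy to log or inspect before anything is written to disk.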
nemo | http://m8y.org/tmp/after.pdf - after gs | 22:57.21 |
| http://m8y.org/tmp/before.pdf - before gs | 22:57.57 |
| I can totally understand some muddying. What puzzles me is that nothing I do seems to change it at all | 22:58.37 |
rayjj | nemo: the "after" still looks fine to me. I can even read the "OFFICE OF MINORITY HEALTH" | 22:59.32 |
nemo | rayjj: it looks a lot worse on docs with background removal I gotta say | 22:59.47 |
| rayjj: also, it looks fine on the screen. is when printed that the fuzziness around the letters becomes clearer | 23:00.09 |
| frankly, I was still happy w/ it, was just boss who was not, and wanted me to improve quality | 23:00.25 |
| which, I thought would be an easy task | 23:00.29 |
rayjj | nemo: I'll try printing it and see what I get (have to go home for that) | 23:00.31 |
nemo | no biggie | 23:00.38 |
| hm | 23:00.51 |
rayjj | nemo: and I'll have a look at the contents of the "before" | 23:00.54 |
nemo | lemme take a pic on my phone ☺ | 23:00.57 |
rayjj | nemo: mutool info before.pdf shows: [ DCT ] 3435x4394 8bpc DevRGB (3 0 R) and "after" has: [ DCT ] 1030x1318 8bpc DevRGB (10 0 R) | 23:05.02 |
nemo | 10 0 R ? | 23:06.54 |
rayjj | which means you are starting with a 400 dpi image and going down to a 120 dpi image | 23:06.57 |
nemo | sure | 23:07.09 |
rayjj | nemo: that's the object number | 23:07.11 |
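The 400 dpi and 120 dpi figures follow from dividing the image's pixel dimensions by the page size. The sketch below assumes a US Letter page (8.5 x 11 inches), which is not stated in the chat; the pixel sizes are the ones reported above.

```python
def effective_dpi(pixels: int, inches: float) -> float:
    """Resolution of an image dimension when it spans the given length."""
    return pixels / inches


PAGE_WIDTH_IN, PAGE_HEIGHT_IN = 8.5, 11.0  # assumption: US Letter page

images = {"before": (3435, 4394), "after": (1030, 1318)}
for label, (width_px, height_px) in images.items():
    print(
        label,
        round(effective_dpi(width_px, PAGE_WIDTH_IN)),
        round(effective_dpi(height_px, PAGE_HEIGHT_IN)),
    )
# prints:
# before 404 399
# after 121 120
```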
nemo | I pasted that in all my gs scripts ☺ | 23:07.18 |
| but... the artifacts around text don't appear related to dpi | 23:07.28 |
| http://m8y.org/tmp/temp.jpeg | 23:07.30 |
| I guess they could be | 23:07.44 |
| but they look more like the usual jpeg stuff | 23:07.49 |
| s/gs scripts/gs commandlines/ | 23:07.57 |
| the general object was to take something that was at a super-high DPI and reduce it to something that was still reasonably printable | 23:08.33 |
rayjj | nemo: OK. I see the diffs in temp.jpg | 23:08.34 |
nemo | it's even more dramatic on an image with background removal | 23:09.08 |
rayjj | nemo: right. I'll have a look | 23:09.16 |
nemo | there's grey fuzzy blocks around all text. I can photograph a piece of one of those pages too | 23:09.36 |
rayjj | nemo: background removal often involves a "thresholding" stage that makes sharp edges, but this is *NOT* something that JPEG compression deals with well | 23:10.01 |
nemo | well | 23:10.15 |
| I was going to experiment with lossy vs non-lossy compression | 23:10.34 |
| see if, say, jpeg 0.95 or something outperformed non-lossy overall | 23:10.47 |
| but I was getting truly hideous results, and couldn't get any difference in appearance in gs | 23:11.02 |
rayjj | nemo: The main thing I see is to try and compress monochrome (or near mono) using CCITT | 23:11.19 |
| mutool extract lets me extract the image and play with it | 23:12.06 |
| nemo: that example is good because it has non-black (gray) text as well as images, all in the same image. You don't want to make the text nicer at the expense of crap on other parts of the image | 23:13.32 |
nemo | there's a reasonable amount of that in these scans | 23:17.03 |
| here's what happened on a form with background removal, and the same parameters | 23:17.16 |
| the excessive jpeg compression is more obvious | 23:17.23 |
| http://m8y.org/tmp/temp2.jpeg | 23:17.33 |
| aaaanyway. way-late here. gotta go | 23:17.59 |
| I appreciate you taking an interest, and if you see absolutely anything on controlling jpeg quality in gs, that'd be lovely | 23:18.19 |
| simply calling one gs commandline seems more elegant than working out something w/ mutools and convert | 23:18.36 |
| but, eh, I imagine the latter is more "unixy" | 23:18.43 |
| and certainly removing the image and popping it back in gives me a lot more control over what happens to the image | 23:19.00 |
| can even do stuff based on, say, imagemagick identify's analysis of it (number of colours etc) | 23:19.18 |
| I just need to figure out how to make these mutools do that, since I just heard about 'em, and make sure I don't screw up the PDF structure (bookmarks, text OCR, whatnot) | 23:20.11 |
| page dimensions, image position... | 23:20.22 |
rayjj | nemo: I think a transfer function might help. The histogram has most of the "fuzz" in the before up near white and black. After mapping those to pure white or black, the noise in the image is MUCH better | 23:21.59 |
| then getting gs to output as a lossless Gray image should be a decent size | 23:22.37 |
| nemo: so, the question is: the "before" image is 4.5Mb and the "after" image is 640Kb. If I munge to gray and save "lossless" I get down to 280Kb and it looks pretty good (to me). See: http://casper.ghostscript.com/~ray/before_9_245_gray_120dpi.png | 23:39.14 |
| it has a lot less noise than the "after" image: http://casper.ghostscript.com/~ray/img-0010.png | 23:40.19 |
| nemo: keeping RGB (not forcing gray) but doing the transfer function and lossless gets to < 600 Kb and the image is: | 23:44.10 |
| http://casper.ghostscript.com/~ray/before_9_245_rgb_120dpi.png | 23:44.58 |
| nemo: the 9 and the 245 mean that anything less than 10/255 goes to 0 (black) and everything lighter than 245/255 goes to 255 (white). This removes the noise which makes the lossless compression better | 23:46.38 |
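The 9/245 mapping described here amounts to a clamp at both ends of the tone range. Below is a minimal sketch on plain 8-bit sample values; it is not how Ghostscript applies a transfer function, the function name is hypothetical, and leaving the value 245 itself untouched is one reading of "lighter than 245".

```python
from typing import List


def clamp_transfer(value: int, low: int = 9, high: int = 245) -> int:
    """Push near-black samples to 0 and near-white samples to 255."""
    if value <= low:   # 'anything less than 10/255 goes to 0'
        return 0
    if value > high:   # 'everything lighter than 245/255 goes to 255'
        return 255
    return value       # mid-tones are left alone


def apply_transfer(samples: List[int]) -> List[int]:
    return [clamp_transfer(sample) for sample in samples]


print(apply_transfer([0, 5, 9, 10, 128, 245, 246, 255]))
# prints: [0, 0, 0, 10, 128, 245, 255, 255]
```

Collapsing the noisy extremes to two exact values leaves long runs of identical samples in the background and the ink, which is what makes the lossless compression more effective.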
| nemo: saving that 120 dpi image after the transfer function as JPEG reduces the file size to 240Kb but degrades it a bit: http://casper.ghostscript.com/~ray/before_9_245_rgb_120dpi.png | 23:51.52 |
| nemo: that's with the GS defaults of 90% and 2 1 1 2 | 23:52.22 |
| nemo: so after you let me know, I'll tell you what it takes to do this with gs (if it is possible) | 23:53.01 |
| heading off to be with the family (I think they're home by now). Check back later | 23:53.55 |