Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

2014/08/08
mvrhel_laptop	tkamppeter: are you there?	03:46.36
	tkamppeter: just sent you an email about the open print summit scheulde	03:51.13
	schedule	03:51.21
	rayjj: Sorry this timing thing ended up being way more complicated	03:56.19
rayjj	mvrhel_laptop: are you still unable to tell me what gs you used on your Pi testing ?	04:56.58
	mvrhel_laptop: about to send out updated timings...	04:57.10
	timings sent	05:00.46
	next I'll try an old version (from about 1 yr ago)	05:01.22
kens	mlen (for the logs) I need to know if you are happy with the changes I made to your tiffsep patch before I can commit it.	07:09.48
tomty89	can someone help with this: [ Error handled by opdfread.ps :	08:02.48
	typecheck; OffendingCommand: gt ]	08:02.49
kens	There's a problem.	08:03.01
	You've used ps2write to create a PostScript file, and something about the file is not compatible with your device.	08:03.27
	What are you sending the PostScript file to ?	08:03.36
tomty89	a ricoh printer, or cups with its pxl driver	08:03.54
	the error was printed on paper	08:04.06
kens	It would be, yes	08:04.12
tomty89	It only occurs in LibreOffice, so far	08:04.26
kens	I don't think you can be sending it to a PXL driver, unless you are going via Ghostscript again (in which case, don't)	08:04.34
tomty89	by "the driver", i mean a ppd from ricoh/openprinting, which has a gs command inside it	08:05.45
kens	You need to find out whether you are sending PostScript directly to your printer, or whether you are sending it to Ghostscript in order to have it converted to PXL. Basically you need to sort out what CUPS is doing. Then you can send us a specimen file to reproduce the problem, and a command line (assuming Ghostscript is interpreting the PostScript)	08:06.00
tomty89	I think it's sending to ghostscript, afaik my printer doesn't support postscript	08:06.36
	coz it plays fine with a generic pcl or pdf driver, just as the config test page printed	08:07.00
kens	Right, so it looks like you've taken a PDF file, converted it to PostScript (using Ghostscript), then sent the PostScript to Ghostscript again in order to convert to PXL. Not really a great sequence, you should convert the PDF to PXL in one step.	08:07.06
	Like I said, you need to figure out what CUPS is doing. Once you know that, you can send us the original file, and the command line required to reproduce the problem.	08:07.52
tomty89	i guess it's ricoh's bad indeed, but i think there's a problem between LO and CUPS/GS, coz I export the file to PDF and print with Evince, it prints	08:08.17
kens	I'm not saying it's a Ricoh problem, in fact it probably isn't. It looks to me like GS is consuming the PostScript.	08:08.44
	But I cannot help you at all with CUPS. In order to help you I need the original PDF file, the command line used to convert that to PostScript and the command line being used to invoke Ghostscript when consuming the PostScript file.	08:09.33
tomty89	well https://gist.github.com/anonymous/aabb8c55b77f733db6af from cups log	08:12.28
	http://www.openprinting.org/download/PPD/Ricoh/PXL/Ricoh-MP_C4503_PXL.ppd	08:12.39
kens	So that's the conversion to PostScript, but the PPD doesn't help me at all.	08:13.06
	Once you have the original PDF file, the command line to convert to PostScript, and ideally the command line being used for sending to the printer, the best thing to do is open a bug report at bugs.ghostscript.com	08:13.49
	You can attach the file there, and I'll be able to look at it.	08:14.12
	door brb	08:14.15
tomty89	the problem only occurs if I print in LibreOffice, and it seems to happen for choosing to emit PDF and PS	08:15.01
	but as I said, if I export it to PDF with LibreOffice, and print it with something else, it prints fine :(	08:15.52
kens	Presumably because it skips the conversion to PostScript step	08:16.34
	It looks to me like you are doing PDF->PS->PXL when all you really need is PDF->PXL	08:17.00
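Editor's note: the two routes kens contrasts can be sketched as Ghostscript argument lists. This is only an illustration. The file names are hypothetical, and `pxlcolor` is an assumed choice of PXL device (the actual device and options are set by the PPD and CUPS, which the log does not show).

```python
def gs_convert_args(device, src, dst):
    """Build a Ghostscript argument list for a single conversion step."""
    # -o names the output file and implies -dBATCH -dNOPAUSE.
    return ["gs", "-sDEVICE=" + device, "-o", dst, src]


# Two-step route described above: PDF -> PS (ps2write), then PS -> PXL.
two_step = [
    gs_convert_args("ps2write", "job.pdf", "job.ps"),
    gs_convert_args("pxlcolor", "job.ps", "job.pxl"),
]

# One-step route kens recommends: PDF -> PXL directly.
one_step = [gs_convert_args("pxlcolor", "job.pdf", "job.pxl")]

for argv in one_step:
    print(" ".join(argv))
```

The lists are only built and printed here. Passing one to `subprocess.run(argv, check=True)` would run it, assuming `gs` is on the PATH.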
	Now I'd like to fix the PDF->PS bug, but to do that I need the PDF file and the command line to create it (which you pasted above) and ideally the command line to do PS->PXL	08:17.38
	If you really can't get that, then I can maybe live without it, I can at least try	08:17.54
	But again, best bet is to open a bug report, attach the PDF file that is used to go PDF->PS, and the command line used to do that	08:18.48
*kens*	is amused by the -sLanguageLevel=3 in the CUPS invocation	08:19.19
tomty89	it's a document at work, so not at convenience to share, but the problem doesn't seem very document-specific to me, though it's pretty "delicate" to trigger it	08:21.14
kens	I'm afraid if you can't find a document to share, there's no way we can fix the problem. We need to be able to reproduce it to see what's wrong	08:21.47
	The attachment can be marked private so that only Artifex staff can access it, or you can send me it via email if that helps	08:22.16
	By the way, sharing the LibreOffice document won't help, because we'd have to have exactly the same setup as you in order to generate the same PDF/PS file, we really need the PDF input that gets sent to Ghostscript by CUPS	08:24.56
tomty89	hmm, then the problem comes, how to get that "PDF input"?	08:26.16
kens	I believe that you can capture it in CUPS	08:26.29
chrisl	https://wiki.ubuntu.com/DebuggingPrintingProblems#Capturing_print_job_data	08:27.42
tomty89	ok thanks	08:29.28
chrisl	I need to run an errand - back in half an hour or so.....	08:30.24
mlen	kens: sorry for the late reply. I just tested the patch. Everything works fine :)	08:32.37
kens	mlen, you are happy enough with the format of the output ?	08:32.50
mlen	yes, it's still easy to parse	08:32.59
kens	OK great I'll commit it now then, thanks!	08:33.08
mlen	kens: thanks! :)	08:33.43
kens	Just need to stash my current work first......	08:33.58
	Oh, I'd better add the new switch to the docs too	08:35.24
tomty89	heck, the gs command indeed have a problem! i guess i can file a bug report later	08:45.12
kens	If there's a bug, we'd like to fix it....	08:45.39
tomty89	i captured it in both pdf and ps, and the gs command output a file which evince could read	08:45.57
	but if i print only page 1, which prints fine, evince read the output	08:46.37
kens	I'm not sure I see the difference.....	08:47.05
	But if you file a bug report, I'm sure it will be clearer	08:47.49
tomty89	lol, s/could/couldn't for the first line	08:48.05
kens	Aha, OK	08:48.11
	That makes more sense	08:48.15
	Is this PostScript output ? I didn't think evince could read PostScript	08:48.26
tomty89	it is, and it could (with some lib or ghostscript itself)	08:49.11
kens	Probably it uses GS to convert to PDF.....	08:49.25
	The good news here is that, in both cases, Ghostscript is being used to read the PostScript and Ghostscript doesn't like the PS file. This is good because, unlike printers, we can actually debug it ourselves :-)	08:50.05
tomty89	:)	08:50.22
kens	You may be the first person to actually find a bug in the ps2write output, instead of a buggy PostScript printer	08:50.51
tomty89	lol	08:50.57
	not sure if it matters, but my documents are mainly of Chinese characters	08:52.11
kens	Well it means I can't read them, but that's not a problem	08:52.27
tomty89	I'll file a bug report tonight after work :)	08:53.08
kens	It probably means they will end up being bitmaps in the PostScript output too, so it's a good idea to make sure the resolution is correct for your printer. I'd be surprised if CUPS does that. Beyond that, it's not a problem in general	08:53.10
tomty89	well, the ppd provides the resolution options, and the ppd is provided by ricoh (semi-officially, i guess :S)	08:54.35
mattchz	morning.	09:14.31
	does someone want to add a comment to this: http://ask.slashdot.org/story/14/08/07/1811227/ask-slashdot-best-pdf-handling-library	09:14.35
kens	Morning matt	09:14.38
tomty89	heh, even gs gives the same error on paper when viewing the gs outputs, convincing enough	09:15.22
kens	tomty89 : yes this is what I expect.	09:15.36
	From what you described earlier	09:15.47
tomty89	Could it be LibreOffice's fault? I mean maybe it outputs bad pdf and ps for gs	09:16.48
kens	tomty89 : almost certainly not. It's most likely a bug, which requires specific PDF to trigger it. But without seeing the file I can't really tell	09:17.16
tomty89	ok, i am sure i'll file a bug report, thanks for your generous help :)	09:17.52
kens	NP please do file a report, otherwise we won't be able to fix it. It's surprising how many people can't see that :-(	09:18.17
tomty89	btw i'm not sure if it's a good or bad news, another output from a spreadsheet also trigger the problem	09:18.51
kens	Well it's probably like I said, you need a specific type of PDF.	09:19.12
	It can't be common though, or we'd have heard before :-)	09:19.25
tomty89	i see	09:19.33
kens	chrisl is tor on vacation today ?	09:50.53
chrisl	Not that I'm aware of, but I haven't really kept tabs	09:51.46
kens	Hmm, I really need to speak to him, I can't build the latest MuPDF under Windows. Or more accurately I can't build mudraw	09:52.13
chrisl	What's the error?	09:52.30
kens	lots of them all much the same:	09:52.42
	error C2065: cmap_UniCMS_X : undeclared identifier	09:53.01
	I can probably fix it, but I'd prefer tor to tell me why it's wrong.....	09:53.17
chrisl	Have you tried git clean -x -f -d then rebuild?	09:54.04
kens	Hmm, no let me try that	09:54.16
	Oh that removes my project, oh well	09:55.16
	Ah that seems to work, thanks chrisl	09:56.10
chrisl	NP	09:56.23
kens	Now I can check mudraw and send an email	09:56.39
	Oh, first test I try it gets it wrong	09:57.52
	Well I'll send them an email by the time they figure out how to build MuPDF it'll probably be fixed :-)	09:58.53
	Hmm, tor must have been lurking somewhere -)	10:14.44
chrisl	Weird, on my computer, the Acrobat PS output for the first 100 pages of the PLRM (using the Ray's benchmark 600dpi command line) takes 22 seconds. If I "convert" the Acrobat PS through ps2write, and run the result, it takes 10 seconds.......	10:16.44
kens	A win ! :-)	10:16.57
chrisl	Yes, but this is odd: doing that I get Type 1 outlines in the ps2write output, converting the PDF via ps2write directly, I get type 3 bitmap glyphs.....	10:17.46
kens	Well I noticed that the PLRM has a lot of fonts, often subset, and including some multiple masters	10:18.11
chrisl	I'm suspicious that the PLRM.pdf does weird sh*t with encodings, which probably confuses ps2write	10:18.34
	kens2: well, the only thing is, our PS output is already faster than Acrobat despite the large number of type 3 glyphs we define at the start - if we avoided that, we might be much faster.....	10:26.06
mattchz	anybody fancy commenting on this: http://bugs.ghostscript.com/show_bug.cgi?id=695336	10:27.04
chrisl	mattchz: probably needs paulgardiner	10:27.43
kens2	chrisl if I could get this fallback code to work it would emit fewer glyphs. The charprocs, when stored as outlines, are captured using the identity matrix, then the text should be scaled by the CTM. This means that duplicate charprocs would be noticed and elided, even when the font is a different size. If I could only get the matrix right.....	10:27.44
	I saw the post to the thread, nothing I can say about it	10:28.00
	chrisl OK is htis the PDF file i SENT THE OTHER DAY YOU ARE USING ?	10:29.08
	Grr stupid caps lock key	10:29.18
chrisl	kens2: yes, it is	10:29.21
kens2	OK I'll take a quick look.	10:29.33
paulgardiner	I don't think we support any form of writing of encrypted files.	10:29.59
kens2	Better stash my current code first though, or I'll get funny results	10:30.13
chrisl	kens2: it's not urgent/vital, just interesting - it's odd that the Postscript is a lot slower than the PDF	10:30.16
kens2	chrisl well, the PostScript is (IMO) terrible. The only thing in its favour is it does actually work (mostly)	10:30.47
chrisl	kens2: this is the Acrobat produced Postscript - our Postscript is faster	10:31.13
kens2	:-D	10:31.27
chrisl	The Acrobat PS must be pretty appalling	10:32.07
kens2	It really must be, yes.	10:32.18
chrisl	But the main this is, for henrys, that it really seems the problem is construction of the PS, not necessarily a problem with out PS interpreter	10:33.27
	s/out/our	10:33.43
	I seem to struggling with the typing today :-(	10:33.58
kens2	If you run the 2 PS versions through acrobat, is ours faster there too ?	10:34.01
	Distiller that is	10:34.09
chrisl	I haven't tried it - I've generally struggled to get meaningful performance numbers from Distiller	10:34.44
kens2	Oh I use a stopwatch, I don't believe what Distiller reports, it lies	10:35.02
chrisl	Well, with a time of less than three seconds.....	10:36.25
kens2	That's kind of tricky	10:36.37
	Hmm, the very first glyph triggers a fallback O.O	10:37.06
chrisl	Oh my. Is it just heading that way because it's a non-standard encoding?	10:37.54
kens2	Don't know yet	10:38.03
	It's coming back from pdf_obtain_font_resource with an error, I have to keep on tracking it down	10:38.22
	Ah, pdev->HaveCFF is false	10:39.55
	And since the fonts are CFF fonts.....	10:40.10
chrisl	Huh? So I wonder how the pdfwrite output, converted to PS contains T1 fonts	10:40.48
	Hmm, our Postscript is much, much slower through Distiller than the Acro produced PS - 3 seconds, versus 57 seconds	10:41.03
kens2	Well I guess not all the fonts are CFF	10:41.07
chrisl	They are all Type1C, IIRC	10:41.19
kens2	That's CFF though isn't it ?	10:41.31
chrisl	CFF charstrings	10:41.43
kens2	Right, and I thought you said the pdfwrite output was type 1C	10:42.00
chrisl	Yes, it is	10:42.16
kens2	OK so CFF in CFF out, I must be missing your point	10:42.31
	Oh I see what you mean, the pdfwrite converts to type 1	10:42.51
	No idea how that works	10:42.58
	I've just found a control called HaveCIDSystem which apparently allows ps2write to output CIDFonts. I wonder if it works	10:43.57
chrisl	I don't think it does work, I think I tried that before. Might work for Type 1 outlines?	10:44.25
kens2	Maybe, I obviously keep forgetting about it	10:44.42
	OK so HaveCFF is always false for ps2write	10:44.59
chrisl	Even if it works, it's of very limited use without TTF outlines support	10:45.13
kens2	also for pdfwrite if PDF level is < 1.2	10:45.20
chrisl	So we'll convert type 1 charstrings to CFF, but not the other way?	10:46.45
kens2	I don't know. I was only looking at whether we will emit CFF or not. If it's ps2write or PDF < 1.2 we won't.	10:47.38
	well I converted the PDF file to PDF using pdfwrite, and ps2write is still going through the fallback code for me	10:48.22
chrisl	Hmm, maybe I confused myself with all the different versions of the file I've been trying	10:48.57
kens2	Well for me I still get a load of bitmap charprocs in the output PS file	10:49.41
	brb	10:49.45
	Nothing but interruptions today.....	10:52.29
chrisl	Well, I suggest we not worry about this just now. Wait and if the new charproc outlines capture brings the improvement we expect	10:53.53
	Wait and see....	10:54.03
kens2	If I can ever get it to work	10:54.20
dcmst	Hi, is it possible to disable the "auto advance" feature in muPDF presentation mode?	11:33.37
kens2	I'm sure you can write code to do so, I've no idea what that is though	11:34.23
dcmst	so there is no user interface to disable it (like options, shortcuts, etc.)?	11:39.53
kens2	Well I don't know what feature you are referring to.	11:40.09
	It will also depend on what platform you are running on	11:40.32
	But probably, no.	11:40.42
dcmst	this is where the feature I want to disable is described: http://ghostscript.com/pipermail/gs-commits/2012-October/015421.html	11:41.19
kens2	Well not pressing p would seem to be favourite then	11:41.47
	If you don't do that, then it won't be in presentation mode and so won't advance	11:42.13
dcmst	I want presentation mode without auto advancing	11:42.47
kens2	Then you will need to write it yourself.	11:43.20
dcmst	I need the transition effect (I'm recording a video of the pdf)	11:43.20
chrisl	kens: as last week, I'm heading out for a bit of squash training in a bit - I'll call if I get back early enough	13:12.17
kens	OK have fun	13:12.24
chrisl	I think if it's "fun", I'm probably doing it wrong ;-)	13:12.45
kens	:-)	13:13.03
rayjj	chrisl_away: (for the logs) That "trick" of converting the Acrobat PS of the first 100 pages of the PLRM (with the setting to preload the fonts, not incremental) using gs ps2write resulted in a PS file that is 1.5Mb and runs in 45 seconds !!!	14:15.58
	kens: it looks to me like the presentation mode isn't mentioned in the 'usage' in x11/pdfapp.c	14:32.48
kens	Possibly not.	14:33.07
	But that wasn't really what he was asking about anyway as such. The 'feature' he was describing is just part of 'presentation mode' which is why I had no idea what he was talking about	14:33.43
rayjj	looks like adding a prefix to set the time for the delay on each page would be easy, then 0p could set infinite time (right now AIUI, 5 seconds is hard coded == 5p)	14:42.05
kens	Yes, it's hard coded. The poster opened a bug report, so it's up to Tor now from my POV	14:42.44
tkamppeter	mvrhel_laptop, hi	14:45.01
mvrhel_laptop	hi tkamppeter. did you get my email?	14:52.56
tkamppeter	mvrhel_laptop, yes, it arrived around 6am here, you had already left when I saw it at our 8am.	14:57.41
mvrhel_laptop	ok	14:57.52
tkamppeter	mvrhel_laptop, to get your presentation onto the Thu or Fri we need to contact Mike Sweet and/or Ira.	14:58.43
robin_watts_mac	MuPDF can't write encrypted files at all.	14:59.12
	hence we can't write annotations to encrypted files.	14:59.35
mvrhel_laptop	tkamppeter: ok. I think Ira was cc'd on that email. If you can handle this that would be great	14:59.55
tkamppeter	mvrhel_laptop, I have sent out a mail to them now to see what can be done.	15:08.11
mvrhel_laptop	ok	15:08.24
	thanks tkamppeter	15:08.30
rayjj	coffee. bbiab	15:10.21
tomty89	hi it's me again. i am filing a bug report for the possible gs bug which output invalid file with LibreOffice emission. Is it true that I can mark the attachments as private?	15:11.30
e98	any devs awake and reading?	15:26.21
rayjj	e98 was too quick :-(	15:28.33
kens	No patience.....	15:28.43
tomty89	kens: it is true that i can make the attachment private? coz i don't see an option in bugzilla	15:28.49
kens	tomty89 : you can't, but I can	15:28.58
tomty89	lol	15:29.02
	ok	15:29.04
	gonna upload them now	15:29.11
kens	Just stick it there and tell me and I'll make it private	15:29.15
rayjj	tomty89: if kens is gone, and I notice the attachment (we get email) I'll mark it private	15:30.10
kens	I'll be here for a bit yet	15:30.24
rayjj	chrisl: thanks for the idea. Works a champ. Now all we have to do is get ps2write to do as well without Acrobat "helping" beforehand :-)	15:30.58
chrisl	rayjj: well, as discussed with kens, hopefully the work he's doing now will improve the situation quite a bit	15:31.35
kens	Well, I can get the text to come out now, positioned in the right place, and correctly sized. But the Widths are wrong, and all the curves are 'warped' for some reason.	15:32.08
rayjj	chrisl: I didn't read the logs thoroughly yet	15:32.12
tomty89	upload is done, thanks :)	15:32.43
rayjj	kens: you're capturing the glyphs as outlines ?	15:32.46
kens	ok 2 secs	15:32.49
	done	15:34.40
tomty89	:D	15:35.38
rayjj	kens: even building fonts of the glyph bitmaps would be better, I'd think. Needing 11k glyphs for 100 pages doesn't seem reasonable -- there must be quite a bit of duplication	15:35.42
kens	It does build fonts with the glyph bitmaps, type 3 ones	15:36.33
rayjj	kens: I see. I guess it's just not carrying the glyphs over pages or something. 110 or so unique glyphs per page seems reasonable for the PLRM	15:39.08
kens	No, because they are bitmaps, they are different for each size or transform of each glyph	15:39.38
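Editor's note: a toy model of why bitmap glyphs multiply while outline glyphs can be shared. It assumes, as kens describes, that a bitmap is specific to a size or transform, whereas an outline captured with the identity matrix can be reused at any size. The fonts, glyphs and sizes below are invented for illustration and say nothing about ps2write's real data structures.

```python
def count_bitmap_glyphs(uses):
    """Bitmaps depend on the rendered size, so the size is part of the key."""
    return len({(font, glyph, size) for font, glyph, size in uses})


def count_outline_glyphs(uses):
    """Outlines are size independent, so duplicates at other sizes collapse."""
    return len({(font, glyph) for font, glyph, _size in uses})


# (font, glyph, point size) for each glyph drawn; hypothetical sample data.
uses = [
    ("Helvetica", "A", 10),
    ("Helvetica", "A", 12),
    ("Helvetica", "A", 10),
    ("Helvetica", "B", 10),
    ("Times", "A", 12),
]

print("bitmap glyphs: ", count_bitmap_glyphs(uses))
print("outline glyphs:", count_outline_glyphs(uses))
```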
henrys	kens: I'll look at the FirstPage/LastPage thing that came in.	15:47.18
kens	OK thanks henrys	15:47.45
rayjj	hmm.. that seems unfortunate. The default build for mupdf in Makefile is "debug". Most people won't know to say "make build=release" since the README doesn't mention it	16:16.52
	that means that newbies evaluating mupdf performance will be getting a debug build :-(	16:17.23
	mvrhel_laptop: have you done any performance testing on linux with mupdf ?	16:18.18
mvrhel_laptop	rayjj: I did testing on the pi	16:18.35
	Robin and I match when I did that	16:19.03
	matched	16:19.07
	so I am pretty sure I was not doing a debug build. plus mupdf was faster that gs	16:19.49
	than	16:19.52
	can't type today	16:19.54
henrys	rayjj: the default build in ghostscript is debug in VS too.	16:20.32
	I find that odd also	16:21.10
	mvrhel_laptop: when is the meeting?	16:40.56
mvrhel_laptop	it is at 11 today	16:41.07
henrys	mvrhel_laptop: they have a habit of stopping in chewing engineering hours and disappearing I wonder if it is a strategy	16:42.24
mvrhel_laptop	henrys: I hope something comes out of all of this.	16:42.50
henrys	kens, chrisl : I didn't know adobe was prompting to install font packages when viewing a pdf, when did that come about. Never seen it on the mac, just windows which I've been using more frequently lately.	16:45.57
kens	CJK font pack ?	16:47.05
	I always install it immediately anyway	16:47.22
chrisl	I'm still using Acro9 so......	16:47.23
henrys	kens: yes it installs a cjk font package when viewing the jeitta files	16:56.25
kens	Ah well I install those when I install Acrobat, so I wouldn't get prompted for that. I always install all the fonts initially	16:57.00
	695417	16:57.07
	Oops	16:57.10
	Night all	17:01.30
rayjj	mudraw is amazingly slow at 600 dpi. CMYK on the PLRM is 32 seconds per page on the Pi, but it's even 5+ seconds per page on my laptop.	22:16.30
	is this a reasonable command line? : mudraw -r 300 -o /dev/null -F pam -c cmyk -b 0 -B 661 -m -M PLRM_100_AR.pdf	22:17.22
	even without the -B 661 it is just as slow. gs does this FAST on my laptop (all 100 pages in 5.8 sec, and it was < 62 sec on the Pi)	22:19.49
	note the -B 661 for mudraw was to constrain the memory use to near what the -dBufferSpace=16m does to gs forcing 10 bands	22:21.03
nemo	so. given my general failures to make parameters to gs do anything to the jpeg quality of the resulting PDFs, I tried grepping for QFactor in source	22:24.01
	I modified Resource/Init/gs_pdfwr.ps and set all of them to 0.95 1111	22:24.20
	reran make	22:24.22
	reran PDF generation	22:24.26
	image was totally unchanged from every other attempt	22:24.34
	what am I missing ☹	22:24.38
rayjj	nemo: you probably need kens or someone to dig into pdfwrite. Even though it is in the docs, pdfwrite may not pay attention to the Filter params. And AFAIK, gs_pdfwr.ps isn't used (it was used only by pdfopt.ps which was intended to allow PS programs to load, then output PDF files)	22:27.10
	on j9_acrobat.pdf gs on the Pi, gs does all 5 pages in 47 sec (at 600 dpi) and mudraw takes 242 sec :-(	22:28.07
	nemo: bad news. Looking at devices/vector/gdevpdfu.c in pdf_put_filters, there is a comment /* Currently this only saves parameters for CCITTFaxDecode. */	22:33.49
	nemo: which seems to correspond to code I see later that never calls s_DCTE_get_params (as it does for s_CF_get_params)	22:35.39
nemo	ಠ_ಠ	22:40.40
	I'm astounded that setting the jpeg quality in PDF images has turned out to be such a massive task	22:41.08
	'cause, every single one outputted so far has been unusably muddy	22:41.30
rayjj	nemo: in fact, there are many parameters in the Ps2pdf.htm doc that are mentioned that are currently ignored by pdfwrite AFAICT, but we'd have to wait for kens to make sure	22:41.54
nemo	I'd be perfectly happy to hardcode it if I knew where to do it	22:42.39
rayjj	nemo: and just downsampling and lossless doesn't cut it ?	22:42.45
	nemo: can you send a page of the file to me: ray at artifex.com with the params that you are using so I can play with it. I may try more extreme ImageResolution and forcing Interpolate true	22:44.06
nemo	rayjj: um. not sure... do you have a commandline for that? 'cause my prior attempt to do that still resulted in jpeg compression	22:44.10
	rayjj: man. I'm hardpressed to come up w/ a page I can share	22:44.25
	but seriously, this happens on like every single PDF I've tried so far	22:44.38
	ugh. is late on a friday and I have to head out anyway. I'll attack it next week I 'spose :-/	22:45.08
rayjj	nemo: that's the other thing I want to look at, is why the text is being done with JPEG	22:45.25
nemo	rayjj: well. these are scans of existing docs.	22:45.38
	rayjj: their first scan as mentioned before was done stupidly	22:45.48
	not in document mode, so no background removal	22:46.01
rayjj	nemo: right, and if the original scan was muddy (not just big) we may not be able to do much	22:46.13
nemo	of the ones after they fixed this, a number still had enough ink and whatnot garbage that they were still enormous after Flate	22:46.24
	rayjj: well, what bugs me is, I print out a page of the original, muddy...	22:46.44
	I do something in gs, and I get a ton of compression artifacts around everything	22:46.57
	but... no matter what I try to change in the params, I always get compression artifacts.	22:47.10
rayjj	nemo: but the text is supposed to be black ? Just happened to be a color scan ?	22:47.14
nemo	most of the scans are greyscale	22:47.22
rayjj	because we never use Flate (AFAIK) with monochrome images	22:47.30
nemo	some are colour (blue ink, the odd picture)	22:47.32
	ah	22:47.39
	what bugs me is that all my outputs from gs are basically identical	22:47.58
	nothing I've done, in the hours of messing with this, seems to cause the slightest bit of difference to the output	22:48.12
	unlike, say, opening it in GIMP and saving at a few different jpeg levels, where the differing results are obvious	22:48.27
	it's as if gs is ignoring everything, and always saving low quality jpeg	22:48.40
rayjj	nemo: but gimp doesn't preserve the pdf text, right ?	22:48.59
nemo	yeah. that was just an example	22:49.12
rayjj	right, so you need to extract the image, process it, and put the PDF back with the modified image, leaving the rest of the (presumably OCR layer of text) in the PDF	22:50.18
	many scanner apps OCR the text and put it into the PDF as Tr3 (Invisible) text so that it is searchable and can be cut and paste	22:51.08
nemo	really, the results in gs are more like gimp at 15% quality.	22:51.22
rayjj	then the image goes in "however"	22:51.29
nemo	hm...	22:51.37
	that really would do the trick	22:51.45
	I can process this image in absolutely anything. convert for example	22:51.55
	I don't need to use ghostscript	22:51.59
	is more the "stitching it back together" that is the tricky part	22:52.08
rayjj	nemo: but I'd have to see an example to have ideas that are more than just guesses (just one page)	22:52.18
	nemo: mutool is the thing for PDF manipulation	22:52.53
nemo	hm. found something that looks genericish	22:53.18
rayjj	usage: mutool <command> [options]	22:53.22
	clean -- rewrite pdf file	22:53.24
	extract -- extract font and image resources	22:53.25
	info -- show information about pdf resources	22:53.27
	poster -- split large page into many tiles	22:53.28
	show -- show internal pdf objects	22:53.30
nemo	'k	22:53.48
	so. how do I extract 'sactly one page from a PDF?	22:53.57
rayjj	nemo: and each of those functions has options for how it works	22:54.06
nemo	well. I mean, without altering it	22:54.22
	I wonder if pdfseparate would do that	22:54.45
rayjj	nemo: mutool clean -ggg in.pdf out.pdf	22:54.57
	nemo: mutool clean -ggg in.pdf out.pdf 1	22:55.03
nemo	ok	22:55.06
rayjj	(for just page 1)	22:55.08
	nemo: pdfTk also lets you "burst" a PDF, but AFAIK it writes all the pages	22:55.41
	nemo: mutool clean -ggg in.pdf out.pdf 1-10 for the first 10 pages	22:55.58
	the -ggg leaves out unused Resources (fonts, images, etc.) that may be in the original	22:56.36
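Editor's note: the `mutool clean` invocations rayjj lists follow one pattern, with an optional trailing page range. A small helper that builds the argument list might look like this. The helper name is hypothetical, and only the options quoted in the log are used.

```python
def mutool_clean_args(src, dst, pages=None):
    """Build 'mutool clean -ggg src dst [pages]' as an argument list.

    `pages` is a page selection such as "1" or "1-10"; when omitted,
    the whole document is rewritten.
    """
    args = ["mutool", "clean", "-ggg", src, dst]
    if pages:
        args.append(pages)
    return args


print(" ".join(mutool_clean_args("in.pdf", "out.pdf")))
print(" ".join(mutool_clean_args("in.pdf", "out.pdf", "1")))
print(" ".join(mutool_clean_args("in.pdf", "out.pdf", "1-10")))
```

To run a command rather than print it, the list can be handed to `subprocess.run(args, check=True)`, assuming `mutool` is installed.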
nemo	http://m8y.org/tmp/after.pdf - after gs	22:57.21
	http://m8y.org/tmp/before.pdf - before gs	22:57.57
	I can totally understand some muddying. What puzzles me is that nothing I do seems to change it at all	22:58.37
rayjj	nemo: the "after" still looks fine to me. I can even read the "OFFICE OF MINORITY HEALTH"	22:59.32
nemo	rayjj: it looks a lot worse on docs with background removal I gotta say	22:59.47
	rayjj: also, it looks fine on the screen. is when printed that the fuzziness around the letters becomes clearer	23:00.09
	frankly, I was still happy w/ it, was just boss who was not, and wanted me to improve quality	23:00.25
	which, I thought would be an easy task	23:00.29
rayjj	nemo: I'll try printing it and see what I get (have to go home for that)	23:00.31
nemo	no biggie	23:00.38
	hm	23:00.51
rayjj	nemo: and I'll have a look at the contents of the "before"	23:00.54
nemo	lemme take a pic on my phone ☺	23:00.57
rayjj	nemo: mutool clean info before.pdf shows: [ DCT ] 3435x4394 8bpc DevRGB (3 0 R) and "after" has: [ DCT ] 1030x1318 8bpc DevRGB (10 0 R)	23:05.02
nemo	10 0 R ?	23:06.54
rayjj	which means you are starting with a 400 dpi image and going down to a 120 dpi image	23:06.57
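Editor's note: the 400 dpi to 120 dpi figure can be checked from the pixel dimensions mutool reported. The sketch assumes the page size is unchanged, so resolution scales with pixel count, and takes rayjj's 400 dpi for the original scan as given.

```python
def effective_dpi(original_dpi, original_pixels, new_pixels):
    """Resolution after resampling, for an unchanged physical page size."""
    return original_dpi * new_pixels / original_pixels


before = (3435, 4394)  # "before" image, width x height in pixels
after = (1030, 1318)   # "after" image, width x height in pixels

dpi_w = effective_dpi(400, before[0], after[0])
dpi_h = effective_dpi(400, before[1], after[1])

# Both come out just under 120, matching the downsampled resolution.
print(round(dpi_w, 2), round(dpi_h, 2))
```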
nemo	sure	23:07.09
rayjj	nemo: that's the object number	23:07.11
nemo	I pasted that in all my gs scripts ☺	23:07.18
	but... the artifacts around text don't appear related to dpi	23:07.28
	http://m8y.org/tmp/temp.jpeg	23:07.30
	I guess they could be	23:07.44
	but they look more like the usual jpeg stuff	23:07.49
	s/gs scripts/gs commandlines/	23:07.57
	the general object was to take something that was at a super-high DPI and reduce it to something that was still reasonably printable	23:08.33
rayjj	nemo: OK. I see the diffs in temp.jpg	23:08.34
nemo	it's even more dramatic on an image with background removal	23:09.08
rayjj	nemo: right. I'll have a look	23:09.16
nemo	there's grey fuzzy blocks around all text. I can photograph a piece of one of those pages too	23:09.36
rayjj	nemo: background removal often involves a "thresholding" stage that makes sharp edges, but this is NOT something that JPEG compression deals with well	23:10.01
nemo	well	23:10.15
	I was going to experiment with lossy vs non-lossy compression	23:10.34
	see if, say, jpeg 0.95 or something outperformed non-lossy overall	23:10.47
	but I was getting truly hideous results, and couldn't get any difference in appearance in gs	23:11.02
rayjj	nemo: The main thing I see is to try and compress monochrome (or near mono) using CCITT	23:11.19
	mutool extract lets me extract the image and play with it	23:12.06
	nemo: that example is good because it has non-black (gray) text as well as images, all in the same image. You don't want to make the text nicer at the expense of crap on other parts of the image	23:13.32
nemo	there's a reasonable amount of that in these scans	23:17.03
	here's what happened on a form with background removal, and the same parameters	23:17.16
	the excessive jpeg compression is more obvious	23:17.23
	http://m8y.org/tmp/temp2.jpeg	23:17.33
	aaaanyway. way-late here. gotta go	23:17.59
	I appreciate you taking an interest, and if you see absolutely anything on controlling jpeg quality in gs, that'd be lovely	23:18.19
	simply calling one gs commandline seems more elegant than working out something w/ mutools and convert	23:18.36
	but, eh, I imagine the latter is more "unixy"	23:18.43
	and certainly removing the image and popping it back in gives me a lot more control over what happens to the image	23:19.00
	can even do stuff based on, say, imagemagick identify's analysis of it (number of colours etc)	23:19.18
	I just need to figure out how to make these mutools do that, since I just heard about 'em, and make sure I don't screw up the PDF structure (bookmarks, text OCR, whatnot)	23:20.11
	page dimensions, image position...	23:20.22
rayjj	nemo: I think a transfer function might help. The histogram has most of the "fuzz" in the before up near white and black. After mapping those to pure white or black, the noise in the image is MUCH better	23:21.59
	then getting gs to output as a lossless Gray image should be a decent size	23:22.37
	nemo: so, the question is: the "before" image is 4.5Mb and the "after" image is 640Kb. If I munge to gray and save "lossless" I get down to 280Kb and it looks pretty good (to me). See: http://casper.ghostscript.com/~ray/before_9_245_gray_120dpi.png	23:39.14
	it has a lot less noise than the "after" image: http://casper.ghostscript.com/~ray/img-0010.png	23:40.19
	nemo: keeping RGB (not forcing gray) but doing the transfer function and lossless gets to < 600 Kb and the image is:	23:44.10
	http://casper.ghostscript.com/~ray/before_9_245_rgb_120dpi.png	23:44.58
	nemo: the 9 and the 245 mean that anything less than 10/255 goes to 0 (black) and everything lighter than 245/255 goes to 255 (white). This removes the noise which makes the lossless compression better	23:46.38
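Editor's note: the clamping rayjj describes can be written as a 256-entry lookup table for 8-bit samples. This is a sketch of the idea only, not the tool or settings rayjj used. Leaving the mid-range values unchanged is an assumption, since the log only says what happens at the two ends.

```python
def build_transfer_lut(black_below=10, white_above=245):
    """Return a 256-entry lookup table for 8-bit sample values.

    Values below `black_below` become 0 (black), values above
    `white_above` become 255 (white), the rest pass through unchanged.
    """
    lut = []
    for value in range(256):
        if value < black_below:
            lut.append(0)
        elif value > white_above:
            lut.append(255)
        else:
            lut.append(value)
    return lut


def apply_transfer(samples, lut):
    """Map every 8-bit sample through the lookup table."""
    return bytes(lut[s] for s in samples)


lut = build_transfer_lut()
noisy = bytes([0, 3, 9, 10, 128, 245, 246, 252, 255])
cleaned = apply_transfer(noisy, lut)
print(list(cleaned))
```

Near-black and near-white noise collapses to single values, which is what lets the lossless compressor find longer runs.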
	nemo: saving that 120 dpi image after the transfer function as JPEG reduces the file size to 240Kb but degrades it a bit: http://casper.ghostscript.com/~ray/before_9_245_rgb_120dpi.png	23:51.52
	nemo: that's with the GS defaults of 90% and 2 1 1 2	23:52.22
	nemo: so after you let me know, I'll tell you what it takes to do this with gs (if it is possible)	23:53.01
	heading off to be with the family (I think they're home by now). Check back later	23:53.55