Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

	2014/08/11
samtek	hello everybody ..	05:06.48
	somebody there ..?	05:08.11
	hello robin_watts	05:08.58
	i read your post and comments in stack overflow on MUPDF	05:09.22
	hello mrdoc	05:23.22
kens	tor8 ping	09:46.01
tor8	kens: morning.	09:59.59
kens	Morning Tor	10:00.08
	Did you see the query from Amish, and our problems over the weekend ?	10:00.24
tor8	kens: ah, no. reading the mail now.	10:01.01
kens	It's in the IRC logs as well.	10:01.08
	They need a patch to an archaic version of MuPDF, somewhere between 0.7 and 0.8 to fix a specific problem	10:01.33
	I can isolate it (from binaries and their mail) to somewhere between 4th December 2010 and 3rd April 2011	10:02.13
tor8	ah yes, I remember they were very keen on chopping up the code and changing it around in their own fork	10:02.36
	which would make it very difficult for them to follow our development	10:02.53
kens	But when I checkout any MuPDF commits between those dates it fails to build for me. And yes, they can't simply upgrade because they've changed loads of stuff	10:02.56
	They say they will upgrade (hmm.....) but they want a patch for this transparency problem in the meantime.	10:03.23
tor8	kens: I think back in those commits we didn't have the thirdparty libraries set up as submodules yet	10:03.27
kens	tor8 correct, they seem to be hardcoded in. Which makes the inability to build more surprising to me.	10:03.55
tor8	we have the tarballs required for the various dates on http://mupdf.com/downloads/archive/	10:03.58
kens	tor8 I'm not sure that helps.....	10:04.36
	I know it's after December 2010 and I know the 0.8 release is fixed, but I need the specific commit in those times	10:05.00
	Those archives seem to be (mostly) the releases	10:05.49
tor8	kens: mupdf-thirdparty-$DATE.zip	10:06.27
	ah, I see now that Robin's already told you this :) anyway, the thirdparty directories didn't change much back then	10:06.47
	it should be possible to build with the oldest of those zips in 0.9, or maybe even just use the system libraries. memory's hazy.	10:07.15
kens	0.9 is no good to me, I need to go back before 0.8	10:07.38
	There's only one 3rd party of a relevant date that I can see, 2011-02-24.zip	10:08.01
	(messing about with code over 3 years old is painful....)	10:08.17
tor8	kens: yeah. let me see if I can get a checkout that old that builds.	10:08.39
kens	OK thanks tor	10:08.45
	I confess I'm struggling with this	10:08.55
tor8	kens: ah, for 0.7 there's a 0.7-thirdparty.zip which should work for that release (and probably the gits before and after it)	10:09.59
kens	OK I'll pull that and see if I get anywhere, give me a minute or two, I'm trying to figure out why opdfread's insertion sort doesn't work, I'll need to save away what I'm doing	10:10.39
tor8	kens: give me a few minutes and I might be able to say if it works or not :)	10:11.03
kens	I guess that would help :-)	10:11.16
tor8	building on windows back in those release days was still a bit dodgy	10:11.24
kens	Well it 'seems' to work, except that the 3rd party libraries throw compilation errors, especially openjpeg	10:11.55
tor8	well, a 0.7 release built just fine. I'm surprised!	10:11.56
	with the mupdf-0.7-thirdparty.zip archive	10:12.12
	but that's on linux though	10:12.18
kens	I admit I didn't actually try that. Not being aware of the history I just pulled a checkout and expected it to build.....	10:12.21
	I could build on Linux, it'll just take longer	10:12.35
tor8	kens: oh, jconfig.h is missing in the mupdf-thirdparty-2011-02-24 tarball which makes building 0.8 non-trivial	10:15.16
	(unless you have libjpeg installed on your system)	10:15.25
kens	Not for Windows :-)	10:15.35
	Hmm, I don't currently have a MuPDF clone on Linux	10:16.39
tor8	okay, so the fix they're looking for is somewhere between commit c8c6ac and 0.8	10:17.19
	and that commit builds with the same thirdparty stuff as 0.8	10:17.29
kens	Yes, that's what I was able to discern	10:17.31
tor8	so I guess it's git bisect time	10:17.34
kens	Bisect was what I was trying	10:17.40
	But an inability to build the code was a bit of a show stopper	10:17.54
tor8	if you take the 2011-02-24 thirdparty archive it should build, but it might fail on a missing jconfig.h (which you can copy from the 0.7-thirdparty zip)	10:18.12
*tor8*	types "man git-bisect"	10:18.31
kens	OK.... But won't those files be overwritten on each bisect ?	10:18.41
	bah, typing.....	10:18.48
tor8	no, the thirdparty directory (pre-submodules) is completely untracked by git	10:19.06
kens	Hmm, so when I do a git checkout c8c6... where do the contents of those directories come from ?	10:19.54
tor8	a clean checkout of c8c6 shouldn't have a thirdparty directory (rm -rf thirdparty if it's there)	10:20.25
	then unzip the thirdparty.zip	10:20.36
kens	OK doing that now	10:21.12
tor8	git might be reluctant to remove the now-unused-git-submodules because you might have some unsaved work there	10:21.14
kens	It says it can't remove them (my checkout is clean) but I'm not going to worry	10:21.30
	D'oh wrong archive.....	10:22.14
	OK I can't do this on Windows.	10:23.53
	It won't let me kill the thirdparty from the current checkout, nor can I rename the directory or anything helpful like that	10:24.20
tor8	kens: that's ... odd	10:26.08
	0.7 has the problem they're showing, but the release they claim they're working from doesn't	10:26.35
kens	It's some kind of Windows'ism, I hate the stupid 'I can't let you do that Dave' of Windows these days	10:26.37
	Is it possible they aren't really working from that release ? I wouldn't be surprised to find they are actually working from 0.7 with a number (but not all) of later fixes	10:27.18
tor8	I hate git-bisect and its terminology "good" and "bad" ...	10:32.28
	Some good revs are not ancestor of the bad rev.	10:32.35
kens	Yeah that's always a problem	10:32.38
tor8	so now I have to invert my meaning of good and bad and I'm messing up everytime	10:32.51
kens	I've had that before, never been able to resolve what it means	10:32.53
tor8	just doing a manual bisect is easier :(	10:32.56
kens	Oh yeah, reversing the meaning is the way to go. I don't find it too hard	10:33.13
	But a manual bisect might be just as quick	10:33.25
tor8	It's always "git bisect start new old" and reverse the meaning	10:33.33
kens	I've had to resort to that when intermediate commits don't build	10:33.45
tor8	I have narrowed it down to a commit, and it's one they claim they have on their branch	10:33.51
kens	O.O	10:33.58
	The FZ_COMBINE rounding fix ?	10:34.13
tor8	or maybe that's just the latest they have pulled, ever	10:34.15
kens	That would not surprise me	10:34.26
	I 'suspect' they have 0.7 with a random assortment of other patches	10:34.45
tor8	no, one that's 9 commits before the FZ_COMBINE2 rounding fix	10:34.51
	69815ed Apply soft masks from gstate to individual objects.	10:34.57
kens	Yeah I see it, that's the kind of commit I was looking for, but I, of course, was looking in the other direction.....	10:35.36
tor8	yeah, me too	10:35.47
kens	I might have realised if I'd been able to build the commit they mentioned.	10:35.50
tor8	until I did a double-take when I realized the commit they mention actually works as well	10:36.02
kens	Yeah, something of a clue that :-)	10:36.15
tor8	so, do you want to take it from here or should I jump into the discussion?	10:36.37
kens	You want to write them a mail, or shall I ?	10:36.40
	LOL	10:36.42
	Up to you	10:36.46
tor8	You're better with customers ;)	10:36.53
*kens*	is shocked anyone would say that to me	10:37.10
	But I'll write them a nice email, thanks tor	10:37.22
tor8	the commit in question is fairly isolated and should be easy to backport, even if they have changed the code a lot they should be able to figure it out	10:37.45
kens	Well they will have to won't they ? If they've changed the code significantly we can't do it for them :-)	10:38.14
tor8	indeed. basically some repeated lines of code have been refactored into a function, and the bug fixed in that function	10:38.47
kens	Yeah I had a quick look at the changes, it didn't look too bad	10:39.20
tor8	and it's all in the interpreter side, so nothing I expect they'll have messed with. I think they did most of their changes to the device interface side	10:39.43
kens	OK you know more than me :-)	10:39.56
tor8	I only have a hunch from what questions they were coming up with several years ago	10:40.19
	I think they basically rewrote the text extraction device, but in-place rather than just making their own separate device to do what they want	10:40.50
	which makes merging hell for them	10:40.58
kens	Of course, because doing it properly would be too future proof....	10:41.11
tor8	I tried nudging them in that direction, but I have little patience for explaining obvious things...	10:41.58
kens	I'd like to think that episodes like this would teach them, but experience says not	10:42.34
	OK mail to customer is gone, hopefully they will be happy.	10:50.39
	Back to insertion sorts in PostScript	10:50.52
	Hmm tor8 that customer seems to have been bought out, judging by the email I received. I'll CC it to support.	10:58.26
	Hmm, interesting, I seem to have a LOCA table entry which is coming out as a null object, which is why the InsertionSort is failing	12:37.24
henrys	tor8:thanks for helping out with that. FWIW I use this with bisect: http://stackoverflow.com/questions/15407075/how-could-i-use-git-bisect-to-find-the-first-good-commit	13:14.30
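The "reverse the meaning" trick discussed above amounts to a binary search for the first commit where the bug is gone, rather than the first commit where it appears. The sketch below is a minimal illustration of that search, assuming a linear history and a fix that stays fixed once it lands; the commit names and the is_fixed test are hypothetical stand-ins for "check out, build, render the customer file". Later Git releases also let the terms be renamed with git bisect start --term-old=... --term-new=..., which avoids the mental inversion.

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], is_fixed: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_fixed() is true.

    Assumptions: commits are ordered oldest first, commits[0] is known
    broken, commits[-1] is known fixed, and the fix never regresses.
    """
    lo, hi = 0, len(commits) - 1  # lo: known broken ("old"), hi: known fixed ("new")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid  # fix is at mid or earlier
        else:
            lo = mid  # fix is after mid
    return commits[hi]


# Hypothetical 20-commit history in which the fix landed at index 13.
history = [f"c{i:02d}" for i in range(20)]
fix_index = 13
result = first_fixed(history, lambda c: history.index(c) >= fix_index)
print(result)  # prints c13
```

With plain git bisect the same search is driven by marking the newest fixed commit as "bad" and the oldest broken commit as "good", which is why the labels feel backwards.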
buildMuPDF	I want to build MuPDF. But I get the following error messages when running ndk-build:	13:19.25
	expected 'struct fz_rect const *' but argument is of type 'fz_rect'	13:19.35
	expected 'struct fz_matrix const *' but argument is of type 'fz_matrix'	13:19.49
	'fz_text_line' has no member named 'len'	13:20.11
kens	Did you get the source from our Git repository, and if so, when ?	13:20.17
buildMuPDF	'fz_text_line' has no member named 'spans'	13:20.25
kens	What's the SHA of the source you are using ?	13:21.05
buildMuPDF	It is mupdf-a3d00b2c51c1df23258f774f58268be794384c27.tar.bz2	13:21.39
kens	Well that doesn't seem to be the latest, when did you fetch it ?	13:22.36
	What does "git log" say ?	13:24.11
	OK well that's back in May. I'm not aware of any problems there, but you should probably update to the current version anyway and try again	13:25.30
jogux	kens: https://www.google.co.uk/search?client=safari&rls=en&q=mupdf-a3d00b2c51c1df23258f774f58268be794384c27.tar.bz2&ie=UTF-8&oe=UTF-8&gfe_rd=cr&ei=pMPoU57wGejY8gepl4DQCA suggests there's more going on here :-) several of the matches have patches along side them that modify the source after extracting it.	13:25.34
buildMuPDF	It belongs to APV. It's from May 25, 2013. https://code.google.com/p/apv/source/browse/pdfview/?r=e949eb17e42186321523a595316e13a73a98456f#pdfview%2Fdeps	13:25.52
kens	buildMuPDF : You can't expect us to support code you get from somewhere else	13:26.08
	You either need to use our code or take it up with APV	13:26.33
	jogux yes I see that, thanks, I didn't know so many people were hosting versions of MuPDF	13:27.11
buildMuPDF	Well, the error comes from mupdf	13:27.22
kens	Brando753 : If someone changes the source, it's not MuPDF	13:27.43
	AFAIK our current source builds for Android. I'm not one of the MuPDF developers, but if it was me I would tell you to get our version of the source, and try with that.	13:28.17
buildMuPDF	It is using your MuPDF (unmodified)	13:28.18
*kens*	is a cynic.	13:28.37
buildMuPDF	Ok I will try.	13:28.50
kens	I wouldn't trust source that didn't come from us. And in any event, you are using source 2 months out of date	13:28.55
buildMuPDF	It is their latest release. The others are beta.	13:29.50
*kens*	shrugs	13:30.01
	Maybe you should query APV about it ?	13:30.11
	I assume you are using the instructions in the README ?	13:30.50
buildMuPDF	Does the newest MuPDF library work exactly the same or did you make great changes that could stop apv from working?	13:32.18
kens	I'm not aware of any breaking changes, but like I said, I don't work on MuPDF	13:32.50
tor8	buildMuPDF: we don't track what APV are doing. that said, we haven't made any breaking changes in the past month or so.	13:33.01
	the source as we provide it as releases on mupdf.com or by git from git.ghostscript.com builds fine for us on android	13:33.45
buildMuPDF	Ok. Thank you. I will try your newest release.	13:34.25
kens	chrisl ping	13:38.53
	Oh oops, he's away :-(	13:39.05
buildMuPDF	Do you mean me??	13:44.45
kens	No I mean chrisl	13:44.53
tor8	robin_watts_mac: ping	13:49.07
	paulgardiner: ping! I guess you're more likely to be around than Robin	13:49.54
robin_watts_mac	pong.	13:50.39
	but just going to breakfast.	13:50.44
tor8	robin_watts_mac: I have two small commits on tor/master waiting for review	13:51.05
sebras	tor8: the cbz-commit LGTM.	13:58.27
buildMuPDF	Do you know if this code is compatible with your MuPDF library? http://dpaste.com/13XZ5DE	13:59.47
sebras	buildMuPDF: no probably not.	14:01.55
buildMuPDF	How can I change it to be compatible?	14:02.33
tor8	buildMuPDF: no, it does not look entirely compatible.	14:02.41
	buildMuPDF: look in "include/mupdf/fitz/structured-text.h"	14:02.52
buildMuPDF	(It is C not java)	14:03.17
sebras	that code uses fz_text_line->len while here http://git.ghostscript.com/?p=mupdf.git;a=blob;f=include/mupdf/fitz/structured-text.h;h=f325bf216a5cc6946a7d8f95a4970ff09e4e7c14;hb=HEAD#l119 mupdf declares fz_text_line without a len member, for example.	14:03.17
tor8	the changes are fairly minor; the page_block can contain both text and image blocks	14:03.25
sebras	buildMuPDF: we know. ;)	14:03.25
tor8	so you'll need to check the type field and then get the text block	14:03.48
	and looping over the spans is a linked list rather than an array	14:03.57
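The change tor8 describes (blocks can be text or image, so check the type field first; spans hang off each line as a linked list instead of an array) can be sketched with a toy model. Everything below is hypothetical and written in Python purely for illustration: the class and field names are invented, and the real declarations are the C structs in include/mupdf/fitz/structured-text.h.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    text: str
    next: Optional["Span"] = None  # spans form a singly linked list


@dataclass
class Line:
    first_span: Optional[Span] = None


@dataclass
class Block:
    kind: str  # "text" or "image" (stand-in for the type field)
    lines: List[Line] = field(default_factory=list)


def extract_text(blocks: List[Block]) -> str:
    """Collect the text of every text block, skipping image blocks."""
    out = []
    for block in blocks:
        if block.kind != "text":  # check the type before touching lines
            continue
        for line in block.lines:
            span = line.first_span
            while span is not None:  # walk the linked list, no index or len
                out.append(span.text)
                span = span.next
            out.append("\n")
    return "".join(out)


page = [
    Block("text", [Line(Span("Hello, ", Span("world")))]),
    Block("image"),
    Block("text", [Line(Span("bye"))]),
]
extracted = extract_text(page)
```

The point of the sketch is only the loop shape: code that indexes an array of spans using a len member has to be rewritten as a pointer walk, with a type check added at the block level.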
sebras	tor8: I agree with your description of your test_device-patch, but I don't understand how iscolor relates to dev->user..?	14:06.26
	tor8: that just seems wrong. why is dev->user passed to fz_test_color()..?	14:06.51
tor8	sebras dev->user points to an integer that contains the boolean result of the iscolor 'test'	14:07.19
	fz_new_test_device(ctx, &iscolor)	14:07.38
	run page	14:07.40
	read iscolor value	14:07.43
buildMuPDF	Could you change it, please. As I am only programming in java and don't know your library, it is very difficult for me.	14:07.51
sebras	tor8: ah! now I see it! it is passed through fz_new_device(). ok then it makes sense.	14:08.19
	tor8: ok, LGTM.	14:09.49
tor8	buildMuPDF: look at fz_print_text_page http://git.ghostscript.com/?p=mupdf.git;a=blob;f=source/fitz/stext-output.c;h=6ed595fc1dac90a04da38db57a15d5d49ed06037;hb=HEAD#l363 and just restructure the code you have to that loop framework	14:09.55
	sebras: thanks.	14:10.00
	buildMuPDF: replace the /* for now lets just flatten */ block with the code in fz_print_text_page, substituting the printf for append_chars	14:10.59
buildMuPDF	Including "void"? void fz_print_text_page(fz_context *ctx, fz_output *out, fz_text_page *page)	14:14.55
tor8	buildMuPDF: I thought you said you knew Java? C isn't that much different.	14:17.26
sebras	buildMuPDF: you can't just quote the entire code including the argument declarations inside another function in C. in the same way you cannot do this in Java, right..?	14:20.12
buildMuPDF	I just want to know if I have to include the void in line 362 or leave it out? (You pointed to l363)	14:20.14
sebras	buildMuPDF: actually you don't need line 363 either.	14:20.46
tor8	buildMuPDF: I linked to the function and said copy the contents (of the function). copying the function would be rather pointless; as it already exists...	14:20.49
buildMuPDF	Ok thank you.	14:21.15
sebras	buildMuPDF: you need to read and understand your original extract_text(). you must understand how it loops over each type of datastructure to get at the pieces of text and how it stores it in the output string. then you need to read a bit of fz_print_text_page() to see how it differs in looping of the datastructures to get at the pieces of text. in this case they are printed instead of being appended to a string.	14:23.15
	buildMuPDF: I think it will help you in the future if you actually spend the time to learn this now, I mean you might end up having to interact more with the native mupdf C library in the future and starting out with this quite simple code is a great way to start. :)	14:24.31
tor8	buildMuPDF: something like this http://collabedit.com/qu7rp	14:25.10
buildMuPDF	I still get errors.	14:48.58
	http://dpaste.com/2FK9HY4	14:50.13
kens	Well the easy way to solve those is to cast them to const types	14:50.53
	Oh and you'll need to see what 'out' should be and define it	14:51.15
buildMuPDF	Where do I see what out should be? You mean "fz_printf(out, "\n");"?	14:54.37
sebras	buildMuPDF: did you see the changes that tor8 and kens did for you?	14:54.45
	buildMuPDF: http://collabedit.com/qu7rp	14:54.50
kens	Wasn't me, must have been tor	14:54.59
sebras	buildMuPDF: it's a collaborative text editor online.	14:55.00
	kens: oh, I thought you contributed too. :)	14:55.15
kens	I logged in to read it, I wouldn't want to edit it and send someone wrong	14:55.39
robin_watts_mac	tor8: OK, do you still need your commits reviewed? Which ones?	15:00.12
tor8	robin_watts_mac: no, I'm all set thanks to sebras	15:00.33
	go back to vacation!	15:00.35
robin_watts_mac	ok.	15:00.43
kens	Still in Chile Robin ?	15:00.57
sebras	robin_watts_mac: bye! :)	15:01.00
robin_watts_mac	kens: yeah, got back to Santiago from Easter Island yesterday.	15:01.35
	Off to Dallas tonight.	15:01.42
kens	Say hi to Scott for us :-)	15:01.52
robin_watts_mac	Seeing Scott tomorrow, then we fly back to the UK on wed.	15:01.57
	Will do.	15:01.59
kens	Hmm, thunder.... :-(	15:02.04
pedro_mac	Robin: cool - have a safe trip back	15:02.27
buildMuPDF	I'm running ndk-build and it looks really good. - It's still running without any errors :)	15:02.48
kens	pedro_mac : just wants to ensure someone else comes and works on Smart Office :-)	15:03.02
pedro_mac	kens: :)	15:03.26
buildMuPDF	tor8: Thank you very much!	15:03.38
sebras	buildMuPDF: most importantly. do you understand it better now..?	15:04.47
kens	oh boy lightning now too, if I drop off suddenly you'll know why....	15:05.58
buildMuPDF	Yes. Not everything but more than before.	15:09.30
sebras	buildMuPDF: excellent, good work. :)	15:10.22
buildMuPDF	Thank you and goodbye.	15:12.31
kens	bb	15:12.44
robin_watts_mac	see y'all in dallas.	15:43.39
mvrhel_laptop	bbiab	15:48.07
kens	Night all	16:04.03
pedro_mac	g’nite kens	16:05.21
nemo	rayjj: WB	16:40.30
	so, (a bit later) how did you flatten that PDF? might be worth trying to do it	16:41.01
	although... a fair number of pages are colour, so it might be just too difficult to discriminate	16:41.17
	probably better to just focus on using mutools to get more reliable jpeg compression than ghostscript can provide	16:41.33
rayjj	nemo: OK, so I have something that cleans up the image	16:56.16
	nemo: I don't think mutool (or mudraw) can do this since it relies on image transfer function	16:57.12
	mudraw only supports gamma AFAICT	16:57.50
nemo	rayjj: well, I was thinking more extracting the jpegs using mutools, then putting them back using it	16:58.23
	which you'd suggested	16:58.26
	(putting 'em back after manipulating in imagemagick or whatever)	16:58.40
rayjj	nemo: I have a command line that uses gs to do it in one step	16:58.56
nemo	I could, I suppose, try to identify what the image is based on its colour profile and pick a different technique. could be a bit tedious tho	16:59.04
	but just making ghostscript use less hideous jpeg compression would already be a win	16:59.39
	rayjj: oh neat	16:59.58
	I love copying and pasting commandlines!	17:00.13
	(slow at reading)	17:00.17
rayjj	image result at http://casper.ghostscript.com/~ray/after_transfer_w_DCT.pdf -- command line:	17:04.36
	gswin32c -dColorConversionStrategy=/Gray -dProcessColorModel=/DeviceGray -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterGrayImages false /GrayImageFilter /DCTEncode >> setdistillerparams {dup .95 gt { pop 1 } if } settransfer" -f before.pdf	17:04.38
	nemo: this converts the image to Gray, and the result is 168,492 bytes. If I keep the transfer function, but stay in RGB, it is 177,083 bytes. That image is at http://casper.ghostscript.com/~ray/after_transfer_w_DCT_RGB.pdf	17:07.47
	for that, just leave off the -dColorConversionStrategy=/Gray -dProcessColorModel=/DeviceGray options	17:08.14
	oops. not exactly. Actually:	17:09.13
	gswin32c -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode >> setdistillerparams { dup .95 gt { pop 1 } if } settransfer" -f before.pdf	17:09.14
nemo	hmmm	17:09.44
*nemo*	fires that off on the whole doc	17:09.50
rayjj	the "x.pdf" is just to simplify my testing w/ various settings -- I renamed afterwards	17:09.56
nemo	huh..	17:10.45
rayjj	nemo: the key is to force the DCTEncode because if I use the transfer functions and leave Auto__ImageFilter true then it selects FlateEncode and that is MUCH larger	17:11.26
nemo	setdistillerparams { dup .95 ← that's for quality 95%?	17:11.53
rayjj	nemo: no, that's part of the non-linear transfer function	17:12.17
nemo	ah...	17:12.20
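For readers unfamiliar with PostScript, the procedure { dup .95 gt { pop 1 } if } passed to settransfer is a per-component mapping: any value above 0.95 is replaced by 1, and everything else passes through unchanged. A minimal Python equivalent, written only to show the arithmetic, is:

```python
def transfer(v: float) -> float:
    """Mirror of the PostScript procedure { dup .95 gt { pop 1 } if }.

    dup .95 gt  -> compare a copy of the value against 0.95
    { pop 1 } if -> when greater, discard the value and push 1 instead
    """
    return 1.0 if v > 0.95 else v


# Near-white components are pushed to full white; the rest are untouched.
samples = [0.50, 0.95, 0.96, 1.00]
mapped = [transfer(v) for v in samples]
print(mapped)  # prints [0.5, 0.95, 1.0, 1.0]
```

This is why the number has nothing to do with JPEG quality: it is a threshold for clipping faint background tones to white before the image is re-encoded.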
rayjj	the "mud" is mostly junk that was in the original that was made more visible by the second JPEG	17:12.52
nemo	ugh. I hate working w/ git	17:12.56
rayjj	nemo: quality 95% is the default AFAICT	17:13.06
nemo	hm	17:13.15
	I should print out the encoded jpegs gimp creates	17:13.36
	I hadn't done that yet	17:13.39
	printing does seem to make the mud more visible	17:13.47
rayjj	nemo: I haven't checked the QFactor actually being used. I am confused by the ACSImageDict and the ImageDict -- I don't know which one is it using (I'm not the pdfwrite expert, and kens is gone for the day)	17:16.31
	nemo: printing _would_ intensify the light gray dots due to dot gain, particularly on a laser engine	17:17.09
	nemo: can you try printing either of the pages I posted and compare to 'before.pdf" printed ?	17:17.54
nemo	rayjj: hm. um... can you relink?	17:18.06
rayjj	relink ???	17:18.18
nemo	rayjj: I'm trying to get !@#$ git to restore a file I'd deleted just so I can try your commandline	17:18.22
	rayjj: post the link again. disappeared in history over weekend and I'm on a new machine	17:18.37
	re-link	17:18.40
rayjj	git status -u shows the changed file, right ?	17:18.49
	then just use git checkout <changed_file>	17:19.18
	that'll restore to the "master" file	17:19.39
	nemo: even if it has been deleted	17:19.59
nemo	rayjj: yeah, I eventually got someone to tell me it was checkout	17:20.05
	it annoys me that it is so... different	17:20.11
	mercurial manages to be close enough to svn/cvs that I had no trouble adapting	17:20.28
rayjj	different to svn and cvs, yeah. But I've more or less gotten on terms with it	17:20.38
nemo	besides not screwing around with history and maintaining a clear timeline	17:20.39
	rayjj: ♥ mercurial	17:20.46
	I normally just convert to a mercurial repo if I need to do a lot of stuff	17:21.18
	but I'm mostly treating this repo as readonly	17:21.24
rayjj	I like local repository that svn and cvs don't have	17:21.31
nemo	I was just trying to clean up from the screwing around last week	17:21.33
	rayjj: yeah. that's what mercurial is for :D	17:21.42
	only more intuitive to use	17:21.45
*rayjj*	admits that git is *NOT* intuitive	17:22.11
	I'm not sure about trusting a repository to something named "mercurial" :-)	17:23.25
nemo	rayjj: heh. "git" is hardly better in english	17:23.57
rayjj	synonyms: volatile, capricious, temperamental, excitable, fickle, changeable, unpredictable, variable, protean, mutable, erratic, quicksilver, inconstant, inconsistent, unstable, unsteady, fluctuating, ever-changing, moody, flighty, wayward, whimsical, impulsive	17:24.11
	nemo: well, to those starting to use it "git" as a slang for "spawn of the devil" seems appropriate	17:24.48
nemo	rayjj: well, DVCS are indeed rather "protean"	17:25.36
	but mercurial has more of a backbone than git	17:25.41
	http://mercurial.selenic.com/wiki/GitConcepts	17:25.51
	http://www.webmonkey.com/2010/03/a-subversion-users-guide-to-mercurial-version-control/	17:26.07
rayjj	nemo: so which files do you need posted ? The 'before.pdf" ? (I assume that you have the ones from today)	17:27.02
	nemo: http://casper.ghostscript.com/~ray/before.pdf	17:28.03
nemo	rayjj: I mean, your processed file	17:28.21
	you wanted me to print it	17:28.25
	13:17 < rayjj> nemo: can you try printing either of the pages I posted and compare to 'before.pdf" printed ?	17:28.37
	obv I have "before" ;)	17:28.47
rayjj	the ones I just uploaded today are http://casper.ghostscript.com/~ray/after_transfer_w_DCT.pdf (Gray) and http://casper.ghostscript.com/~ray/after_transfer_w_DCT_RGB.pdf	17:30.09
nemo	hm	17:36.25
	yeah, I dunno...	17:36.49
	I'll run it past the boss	17:36.53
	also. let me see what happens if I use gimp	17:37.08
	frankly, this is more for my peace of mind	17:37.30
	they are perfectly happy to toss 30 gigabytes of badly scanned PDFs into the database	17:37.42
rayjj	nemo: also I have uploaded ones with different QFactor settings: http://casper.ghostscript.com/~ray/after_transfer_DCT_QF_95.pdf QF_76 QF_40 and QF_15 you can see the size differences	17:37.44
nemo	hm	17:37.48
	how did you do that?	17:37.51
	set the QF?	17:37.54
	404	17:38.23
rayjj	command line:	17:38.43
	gswin32c -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode /ColorImageDict << /QFactor 0.15 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> >> setdistillerparams { dup .95 gt { pop 1 } if } settransfer" -f before.pdf	17:38.44
	nemo: oops: http://casper.ghostscript.com/~ray/after_transfer_DCT_RGB_QF_95.pdf	17:39.30
	the 15 is somewhat cleaner than the 95, but the file size is 381,456 vs 173,296	17:42.36
	the before.pdf was 799,160	17:43.22
nemo	so one thing that puzzles me, is I thought "before" was lossless	17:43.34
	so why would there be a double encoding issue	17:43.39
rayjj	nemo: no, the before was DCT	17:44.12
nemo	hmmm	17:44.19
	'k	17:44.20
	thanks	17:44.21
	I missed that	17:44.23
	rayjj: I thought they were all Flate	17:44.35
	but perhaps some of the pages were DCT based on whatever the scan tool was doing	17:44.44
rayjj	object 3 from "before": <</BitsPerComponent 8/ColorSpace/DeviceRGB/Filter/DCTDecode/Height 4394/Length 757424/Subtype/Image/Type/XObject/Width 3435>>	17:44.59
nemo	rayjj: so yeah, those links are 404 fwiw	17:45.12
rayjj	nemo: I just tested it !!!	17:45.41
nemo	you're right.	17:45.43
	huh...	17:45.44
	not the links	17:45.47
	the DCT	17:45.48
	I just checked the original file	17:45.52
	strings foo.pdf | grep -E Filter.*DCT | wc -l	17:46.14
	307	17:46.14
	all DCT. bleah	17:46.18
rayjj	I just tested the link posted right before the file size	17:46.19
nemo	/msg'd	17:47.33
rayjj	nemo: FWIW, If I don't force DCT, then with the transfer function, it uses Flate and the RGB size is 922,774	17:48.10
nemo	ew	17:48.15
	rayjj: that's even w/ downsampling DPI?	17:48.21
	oh wait	17:48.22
	you didn't reduce DPI!	17:48.25
	getting lower than 400 was kinda one of the main goals :)	17:48.45
rayjj	nemo: yes, I did: <</Subtype/Image/ColorSpace/DeviceRGB/Width 1288/Height 1647/BitsPerComponent 8/Filter/FlateDecode/DecodeParms<</Predictor 15/Columns 1288/Colors 3>>/Length 922657>>	17:49.39
	nemo: so that's 150 dpi	17:51.15
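The 150 dpi figure can be checked from the two image dictionaries quoted above (3435 x 4394 pixels before, 1288 x 1647 after). The sketch below assumes the original scan was 400 dpi, which is the figure nemo gives for the scans but is not stated in the dictionary itself.

```python
def effective_dpi(src_pixels: int, src_dpi: float, out_pixels: int) -> float:
    """Resolution of a downsampled image that covers the same physical size."""
    inches = src_pixels / src_dpi  # physical extent of the original scan
    return out_pixels / inches


SRC_DPI = 400  # assumption: scan resolution of before.pdf

dpi_x = effective_dpi(3435, SRC_DPI, 1288)  # width before and after
dpi_y = effective_dpi(4394, SRC_DPI, 1647)  # height before and after
print(f"{dpi_x:.1f} x {dpi_y:.1f} dpi")
```

Both axes come out just under 150 dpi, consistent with the downsampling target rayjj attributes to -dPDFSETTINGS=/ebook.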
nemo	ahhh	17:51.19
	I missed that in the commandline above	17:51.27
	hm'k	17:51.39
rayjj	nemo: it's implied in the /ebook	17:51.41
nemo	well. that still helps	17:51.41
	oh :(	17:51.45
	← noob at this	17:51.51
	hm	17:52.18
	so. ...	17:52.21
	find -name "*.pdf" | while read f;do echo "$f $(strings "$f" | grep -E "Filter.*DCT" | wc -l) $(strings "$f" | grep -E "Filter.*Flate" | wc -l)";done	17:52.26
	this is probably a stupid hack, but...	17:52.32
	_.pdf 36 79 _.pdf 197 400 _.pdf 39 85 etc etc	17:52.49
	pages are about 50:50 flate/dct	17:52.56
	I missed that in the first couple of sample files I pulled. ugh	17:53.03
	their tool must have been selecting based on the page	17:53.10
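The strings/grep/wc pipeline above can be approximated in a few lines of Python, which makes it easier to add more filters to the tally. This is a rough sketch of the same heuristic, not a PDF parser: it scans for runs of printable ASCII the way strings does and counts the runs that mention a filter, so it shares the limitation that dictionaries stored inside compressed object streams are invisible to it. The sample bytes are made up for the demonstration.

```python
import re

# Rough stand-in for `strings`: runs of at least 4 printable ASCII characters.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e\t]{4,}")

# Same patterns as the grep calls, plus CCITT for the fax-encoded pages.
FILTER_PATTERNS = {
    "DCT": re.compile(rb"Filter.*DCT"),
    "Flate": re.compile(rb"Filter.*Flate"),
    "CCITT": re.compile(rb"Filter.*CCITT"),
}


def count_filters(data: bytes) -> dict:
    """Count printable lines that mention each filter (like grep | wc -l)."""
    runs = PRINTABLE_RUN.findall(data)
    return {
        name: sum(1 for run in runs if pattern.search(run))
        for name, pattern in FILTER_PATTERNS.items()
    }


# Made-up sample standing in for the raw bytes of a PDF file.
sample = (
    b"<</Subtype/Image/Filter/DCTDecode/Width 10>>\n"
    b"\x00\x01\x02\n"
    b"<</Filter/FlateDecode/Length 5>>\n"
    b"<</Filter/FlateDecode>>\n"
)
counts = count_filters(sample)
print(counts)  # prints {'DCT': 1, 'Flate': 2, 'CCITT': 0}
```

For a real file the input would be open(path, "rb").read(). Note that Flate is also used for non-image streams, so the Flate count is an upper bound on Flate-compressed pages rather than an exact number.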
rayjj	nemo: me, too. BTW, if I force conversion to Gray, then the Flate file size is 499,385	17:53.44
nemo	rayjj: The annoying thing is colour is so rare	17:54.18
rayjj	nemo: they probably use Auto_ImageFilter true, so it depends on the image contents	17:54.26
nemo	but I'd really have to consider on a page by page basis	17:54.27
rayjj	nemo: I suggest setting the QFactor to a low number, and just go with color. Even at 0.15, it is _still_smaller than Flate flattened to Gray	17:55.31
	I can upload the Flate if you want to compare with the QF files	17:55.58
	nemo: http://casper.ghostscript.com/~ray/after_transfer_Flate_Gray.pdf	17:58.32
	nemo: I have to run an errand. bbiaw	18:01.06
	nemo: can you see all of the QF files now ?	18:01.23
nemo	huh. why don't I see RGB for flate	18:06.50
	trying to figure out how many pages they did RGB and how many not	18:07.34
rayjj	nemo: I didn't post the RGB for the Flate (it was 922.738 bytes, so larger than "before")	18:42.12
nemo	rayjj: I mean, I was trying to figure out how many pages they did RGB and how many black and white, if any	18:43.03
	rayjj: is Flate always colour?	18:43.10
	basically, we are trying to determine how screwed up the 2nd batch of PDFs was	18:43.26
rayjj	nemo: no, Flate can be used for Gray or Color	18:44.08
nemo	hm	18:44.11
	trying to figure out where that is in the filter line	18:44.43
rayjj	nemo: gs can examine files and check if they have color, but the definition of 'neutral' is compiled in	18:44.57
nemo	if it doesn't say, is default colour?	18:45.07
	rayjj: I was just grepping the files to get a general idea	18:45.15
	strings foo.pdf | grep -E "Filter.*Flate"	18:45.22
rayjj	in the command line, if one doesn't force ProcessColorModel and ColorConversionStrategy, then out will be whatever colorspace came in (per image)	18:46.00
	grep for DeviceGray, maybe ?	18:46.20
	or DeviceRGB.	18:46.33
	but with the 'before' file you sent, the image was RGB even though it looked like just shades of Gray	18:47.10
	nemo: give me a sec and I'll check what gs thinks about that 'before' page.	18:47.35
nemo	yeah	18:47.35
	rayjj: that Before one was the first batch, which was just stupid	18:47.43
	rayjj: 2nd batch is better	18:47.48
	trying to figure out approaches for both	18:47.54
	actually, the 2nd batch is just weird	18:48.03
	they picked 400DPI for everything, but often used B&W flate or even CCIT Fax	18:48.23
	CCITT Fax	18:48.31
	that is, a scanned page flattened to literally B&W, not greyscale	18:48.45
rayjj	nemo: if it doesn't have shades of gray, then CCITT is best compression	18:49.04
nemo	so I'm like... why... why are we using 400DPI here? obviously quality went out the window, not that it was ever really there to begin with	18:49.05
	rayjj: yeah, I'm just confused at the parameters chosen ☺	18:49.17
	the choice of compression was probably indeed by some scanning software	18:49.29
	which probably also flattened the pages	18:49.36
	I guess what is happening is, the 2nd batch does auto WB.	18:49.54
	And as a result, some pages are close enough to B&W to trigger the software they are using to go into that mode	18:50.08
	and they kept 400DPI, just 'cause.	18:50.14
	the problem for me ofc, is it is harder to tell gs to be smart about such a crazy crazy mess	18:50.28
	rayjj: I'm thinking what I need gs to do really is just reduce DPI, but keep whatever algorithm they used	18:50.49
	maybe sometimes they used DCT or Flate inappropriately, but whatever.	18:51.02
rayjj	nemo: If the image comes in CCITT, gs will emit CCITT since that is 1 bpp and is lossless	18:51.19
nemo	and actually, on the original batch, where Before came from, not removing background is a win	18:51.22
	since it hides the double-encoding of jpeg ☺	18:51.35
	and. I'm convinced now, that's where all my jpeg problems came from that were driving me batty.	18:51.56
	well, I don't really consider it a win, but boss does :D	18:52.15
	but. eh. lemme fire off your last suggested commandline against one from the new batch, and one from the old batch	18:52.43
	I'm sure the DBA will appreciate it regardless	18:52.56
	rayjj: and... not a single one of the Flate pages had device RGB or device Gray, so... going to guess it is just RGB	18:56.02
rayjj	nemo: grep may not be reliable. And there are other colorspaces that it might have used.	18:58.18
nemo	rayjj: I was eyeballing the strings, and couldn't find any mention of Device	18:59.14
	amusingly ColorSpace/DeviceGray is on the CCITTFaxDecode lines	18:59.53
	oh well whatever	18:59.56
rayjj	nemo: try grepping for "/Subtype.*Image"	19:00.05
	hmm... my grep calls it a binary file, so won't print the line :-(	19:00.30
nemo	heh	19:00.39
	I always run strings first	19:00.42
	tidier	19:00.45
rayjj	it just says: Binary file /c/Users/ray/Downloads/before.pdf matches	19:00.51
nemo	strings .pdf | grep -E "/Subtype.Image"	19:00.58
	strings .pdf | grep -E "/Subtype.Image" | grep Flate	19:01.20
	returns nothing	19:01.22
	I did 2 greps 'cause I have no idea what order that should be in the line, and didn't feel like a complicated regex ☺	19:01.39
	just... really weird	19:01.42
	oh well.	19:01.45
rayjj	but strings | grep "Subtype.*Image" does give me the line	19:02.09
	nemo: just see without the second grep to make sure it is showing the Image object	19:02.40
nemo	it was	19:02.48
	lots of CCITT fax lines :)	19:02.59
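The strings-and-grep check above can be approximated in a few lines of Python. This is only a sketch of the same line-based heuristic, not a PDF parser: the function name and the sample bytes are made up for illustration, and, as rayjj warns, anything stored in compressed object streams or referenced indirectly will not show up as a plain name.

```python
import re

IMAGE = re.compile(rb"/Subtype\s*/Image")
FILTER = re.compile(rb"/Filter\s*\[?\s*/(\w+)")
COLORSPACE = re.compile(rb"/ColorSpace\s*/(\w+)")

def image_lines(data):
    """Yield (filter, colorspace) for each line that looks like an image dictionary."""
    for line in re.split(rb"[\r\n]+", data):
        if not IMAGE.search(line):
            continue
        f = FILTER.search(line)
        cs = COLORSPACE.search(line)
        yield (f.group(1).decode("ascii") if f else None,
               cs.group(1).decode("ascii") if cs else None)

# Hypothetical sample: one CCITT image with a named color space, and one
# Flate image whose color space is an indirect reference ("5 0 R").
sample = (
    b"1 0 obj\n"
    b"<</Type/XObject/Subtype/Image/ColorSpace/DeviceGray/Filter/CCITTFaxDecode/Width 10>>\n"
    b"stream\n\x00\xff\nendstream\n"
    b"2 0 obj\n"
    b"<</Subtype /Image /Filter /FlateDecode /ColorSpace 5 0 R>>\n"
)

for filt, cs in image_lines(sample):
    print(filt, cs)
```

In the made-up sample the Flate image reports no color space name because it is an indirect reference. That is one possible reason (an assumption, not something confirmed in this log) why a search for DeviceRGB or DeviceGray could come up empty on the Flate pages.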
	but, I think bosses are pushing back on them to just rescan everything \o/ \o/	19:03.15
rayjj	nemo: there is nothing that can be done to reduce the size of the CCITT pages	19:03.23
nemo	we'll see. if they refuse, back to the programmer here w/ OCD to try and tidy it up	19:03.28
	rayjj: yeah, I don't care about them frankly	19:03.36
	rayjj: welll.... lower DPI would probably help some no? ☺	19:03.46
	but frankly, more worried about the new ones	19:03.52
	er, more worried about the RGB Flate/DCT pages	19:04.18
rayjj	nemo: lower dpi doesn't help if it goes from DCT to Flate	19:06.18
nemo	rayjj: oh sure. I meant "nothing can be done for CCITT" - I mean, those would still be a bit of a win, but the files are so tiny, that, eh, who cares	19:06.42
rayjj	nemo: BTW, even though it looks Gray, with GrayDetection=true it thinks it has color (with the default tolerance 5/255)	19:11.26
nemo	heh	19:13.30
	well, is a scan	19:13.34
	even white paper probably doesn't look white	19:13.50
	esp after sitting in a folder for a while	19:14.01
rayjj	nemo: I changed the transfer function a bit, and it looks better, IMHO. Please look at the files after_transfer_DCT_RGB_QF_15.pdf (and 40, 76 and 90)	19:26.28
	it cleans up more of the dots around the text, making them lighter. The overall image is slightly lighter, too	19:27.19
	I used { .93 div dup 1 gt { pop 1 } if } settransfer for these	19:27.41
nemo	heh. 90 is still 404 ☺	19:32.27
	but I tried the others and those do work	19:32.42
	I hadn't tried them before	19:32.46
	huh. I must not understand how Quality factor works - is strange that 15 is the larger one. I thought quality decreased from 1.0 to 0	19:33.31
rayjj	nemo: This works MUCH better, and there is no noise added (compared to the Flate output) by using DCT even at QF 40, which has a file size of 253,283 bytes. The 76 adds a few dots, and the 95, quite a few	19:37.27
	nemo: TBH, so did I. I'm just reporting what I see. But the Ps2pdf.htm document that has the Notes 7, 8, 9, and 10 seem to imply that 0.15 is used for "prepress" (the best) and 0.95 for screen and ebook	19:39.22
	umm. 0.76 for screen and ebook, and 0.9 in general	19:40.05
	the "printer" setting is 0.40	19:40.26
nemo	weird	19:41.01
rayjj	nemo: BTW, it's 95, not 90. Sorry	19:41.05
nemo	ahhhh	19:41.14
rayjj	anyway, you have the "magic" to get the cleaned up file size, in color, down to 250K or so.	19:41.56
nemo	13:38 < rayjj> gswin32c -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode /ColorImageDict << /QFactor 0.15 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> >> setdistillerparams { dup .95 gt { pop 1 } if } settransfer" -f before.pdf	19:42.25
rayjj	at the "reduced" resolution of 150 dpi	19:42.27
nemo	that one right ☺	19:42.30
	where the only thing to fiddle w/ is QFactor	19:42.55
rayjj	nemo: not quite:	19:43.15
	gswin32c -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode /ColorImageDict << /QFactor 0.15 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> >> setdistillerparams { .93 div 1 gt { pop 1 } if } settransfer" -f before.pdf	19:43.16
nemo	oh. 'k	19:43.20
	well, regardless of what they decide on, I think I'm going to find this handy in the future	19:44.03
	dumping it to my tips n tricks folder	19:44.12
	thanks.	19:44.14
rayjj	rather than leaving dots untouched that are below 95% white, it lightens everything up linearly by dividing by 0.93 (mul by 1.07) and clamps at white == 1	19:44.48
	I looked at the gray shades for some of the "noise" dots and there were some below 240	19:45.40
	you can play with the "0.93" if the image lightness seems too much. A higher number lightens less	19:46.12
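The transfer procedure rayjj describes, `{ .93 div dup 1 gt { pop 1 } if }`, maps each gray level (0 = black, 1 = white) to a lighter one. A minimal Python sketch of the same arithmetic, with a made-up function name, shows the effect on a few levels:

```python
def lighten(gray, divisor=0.93):
    """Same arithmetic as { .93 div dup 1 gt { pop 1 } if }: scale up, clamp at white (1.0)."""
    value = gray / divisor
    return 1.0 if value > 1.0 else value

# 240/255 is the "noise dot" shade mentioned in the chat.
for g in (0.0, 0.5, 0.93, 240 / 255, 1.0):
    print(f"{g:.3f} -> {lighten(g):.3f}")
```

Everything at or above the divisor ends up pure white, so a divisor closer to 1 lightens less and leaves more of the near-white noise in place.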
nemo	I'm gonna give it a shot at "40" - there's still decent win in size and I couldn't pick out any artifacting.	19:52.06
	this is gonna take a looooooong time to run tho ☺	19:52.15
	and, the lightness looks good to me	19:52.24
kens	rayjj, QFactor, page 163 of the PLRM:	19:53.55
	"Valid values are in the range 0 to 1,000,000. A value less than 1	19:53.55
	improves image quality but decreases compression; a value greater than 1	19:53.55
	increases compression but degrades image quality. Default value: 1.0."	19:53.55
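The PLRM describes QFactor as a scale factor applied to the JPEG quantization tables, which is why a smaller value gives a larger, better-looking file. The sketch below scales the first row of the example luminance table from the JPEG specification. The rounding and the clamp to 1..255 are assumptions made for illustration, not a statement of what Ghostscript does internally.

```python
# First row of the example luminance quantization table from the JPEG spec (Annex K).
BASE_ROW = [16, 11, 10, 16, 24, 40, 51, 61]

def scale_row(row, qfactor):
    """Scale quantizer steps by qfactor, clamped to 1..255 (assumed 8-bit baseline range)."""
    return [min(255, max(1, round(step * qfactor))) for step in row]

# The QFactor values discussed in the chat, plus 1.0 (the default) and 2.0.
for qf in (0.15, 0.40, 0.76, 1.0, 2.0):
    print(qf, scale_row(BASE_ROW, qf))
```

Smaller quantizer steps throw away less detail, so 0.15 keeps the most information and compresses the least. Very large values saturate every step at the clamp, which is roughly the "see what that range looks like" experiment with 1000000.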
nemo	O_o	19:54.07
	hm.	19:54.36
	I'm gonna try values bigger than 1 w/ your sample then	19:54.49
	oh wait. no. n/m. I forgot. the default value was already deemed too ugly	19:55.25
	eh. let's see what that range looks like	19:55.46
	~/git/ghostpdl/gs/bin/gs -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode /ColorImageDict << /QFactor 1000000 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> >> setdistillerparams { .93 div 1 gt { pop 1 } if } settransfer" -f before.pdf	19:56.38
	Error: /stackunderflow in --pop--	19:56.44
	hrm...	19:57.05
	you sure that line didn't get trimmed?	19:57.11
	that word "if" off on its own - I don't know much about the language used, but that seems odd	19:57.28
	I'm assuming postscript, but the tutorials I've found so far, not too helpful	19:59.51
rayjj	nemo: sorry. Let me cut and paste what worked for me. I had just hand edited yours	20:03.15
nemo	ah well, yours is probably better	20:03.31
rayjj	gswin32c -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /ColorImageDict << /QFactor 0.95 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> /AutoFilterColorImages false /ColorImageFilter /DCTEncode >> setdistillerparams { .93 div dup 1 gt { pop 1 } if } settransfer" -f before.pdf	20:04.58
	nemo: PS is a postfix operation language, so it's <bool> <proc> if (execute proc if bool is true)	20:06.01
nemo	hm. so. my system release version of gs ran it just fine	20:06.58
	9.10	20:07.05
rayjj	so, I forgot the 'dup' after the 'div' sorry	20:07.06
nemo	| ./base/gsicc_manage.c:1685: gsicc_set_device_profile(): cannot find device profile	20:07.21
	anyway, let's see if the system one does the trick. it was crashing before, but, eh, maybe I'll get lucky	20:07.39
rayjj	nemo: I get that if I put /ProcessColorModel /DeviceGray /ColorConversionStrategy /Gray in the distillerparams dict (rather than as command line options). I am opening a bug.	20:08.33
	for kens :-)	20:08.48
*nemo*	moves them	20:09.42
	er. wait. I don't see those. sooo. um. no idea what you mean	20:10.55
	(thought I just needed to move some parameters outside of the quoted block)	20:11.34
rayjj	nemo: I opened a bug for kens: http://bugs.ghostscript.com/show_bug.cgi?id=695420	20:32.00
	going offline for a bit...	20:33.23
nemo	m'k	20:33.38
	rayjj: well, that's a shame, the non-git version I have crashed as it did before	20:46.35
	sooo guess I either have to fix bug 695420, or wait ☺	20:46.44
	fix/fall back to working version	21:08.06
	hm	21:08.07
	bisect. helps you, helps me	21:08.13
	shame I dislike git oh so much	21:08.19
henrys	mvrhel_laptop any word back for the meeting time?	21:19.03
mvrhel_laptop	henrys: yes :(	21:32.32
	at the last minute she cancelled it on me	21:32.46
	I am trying to salvage something now	21:32.55
	They are a bit strange over there	21:33.05
	She had cleared all of this earlier I had thought	21:33.56
	and there were people in multiple groups that were interested	21:34.19
henrys	mvrhel_laptop: so are you canceling the trip?	22:07.14
mvrhel_laptop	henrys; good question. she still wants to meet to discuss the book that I am working on. however I think we have to even meet off site	22:16.06
henrys	mvrhel_laptop: well if you need a place to stay, I've plenty of room.	22:20.35
mvrhel_laptop	henrys: sorry internet was down as cable guy was here working on it.	23:46.31
	so trip is cancelled	23:46.36
	so I will be here for the morning meeting	23:47.07
