IRC Logs

Log of #ghostscript at irc.freenode.net.

2014/08/11
samtek hello everybody ..05:06.48 
  somebody there ..?05:08.11 
  hello robin_watts 05:08.58 
  i read your post and comments in stack overflow on MUPDF 05:09.22 
  hello mrdoc05:23.22 
kens tor8 ping09:46.01 
tor8 kens: morning.09:59.59 
kens Morning Tor10:00.08 
  Did you see the query from Amish, and our problems over the weekend ?10:00.24 
tor8 kens: ah, no. reading the mail now.10:01.01 
kens It's in the IRC logs as well.10:01.08
  They need a patch to an archaic version of MuPDF, somewhere between 0.7 and 0.8, to fix a specific problem10:01.33
  I can isolate it (from binaries and their mail) to somewhere between 4th December 2010 and 3rd April 201110:02.13
tor8 ah yes, I remember they were very keen on chopping up the code and changing it around in their own fork10:02.36 
  which would make it very difficult for them to follow our development10:02.53 
kens But when I check out any MuPDF commits between those dates, it fails to build for me. And yes, they can't simply upgrade because they've changed loads of stuff10:02.56
  They say they will upgrade (hmm.....) but they want a patch for this transparency problem in the meantime.10:03.23
tor8 kens: I think back in those commits we didn't have the thirdparty libraries set up as submodules yet10:03.27 
kens tor8 correct, they seem to be hardcoded in. Which makes the inability to build more surprising to me.10:03.55
tor8 we have the tarballs required for the various dates on http://mupdf.com/downloads/archive/10:03.58 
kens tor8 I'm not sure that helps.....10:04.36 
  I know it's after December 2010 and I know the 0.8 release is fixed, but I need the specific commit in those times10:05.00
  Those archives seem to be (mostly) the releases10:05.49 
tor8 kens: mupdf-thirdparty-$DATE.zip10:06.27 
  ah, I see now that Robin's already told you this :) anyway, the thirdparty directories didn't change much back then10:06.47 
  it should be possible to build with the oldest of those zips in 0.9, or maybe even just use the system libraries. memory's hazy.10:07.15 
kens 0.9 is no good to me, I need to go back before 0.810:07.38 
  There's only one 3rd party of a relevant date that I can see, 2011-02-24.zip10:08.01 
  (messing about with code over 3 years old is painful....)10:08.17 
tor8 kens: yeah. let me see if I can get a checkout that old that builds.10:08.39 
kens OK thanks tor10:08.45 
  I confess I'm struggling with this10:08.55
tor8 kens: ah, for 0.7 there's a 0.7-thirdparty.zip which should work for that release (and probably the gits before and after it)10:09.59 
kens OK I'll pull that and see if I get anywhere, give me a minute or two, I'm trying to figure out why opdfread's insertion sort doesn't work, I'll need to save away what I'm doing10:10.39
tor8 kens: give me a few minutes and I might be able to say if it works or not :)10:11.03 
kens I guess that would help :-)10:11.16 
tor8 building on windows back in those release days was still a bit dodgy10:11.24 
kens Well it 'seems' to work, except that the 3rd party libraries throw compilation errors, especially openjpeg10:11.55 
tor8 well, a 0.7 release built just fine. I'm surprised!10:11.56 
  with the mupdf-0.7-thirdparty.zip archive10:12.12 
  but that's on linux though10:12.18 
kens I admit I didn't actually try that. Not being aware of the history I just pulled a checkout and expected it to build.....10:12.21 
  I could build on Linux, it'll just take longer10:12.35 
tor8 kens: oh, jconfig.h is missing in the mupdf-thirdparty-2011-02-24 tarball which makes building 0.8 non-trivial10:15.16 
  (unless you have libjpeg installed on your system)10:15.25 
kens Not for Windows :-)10:15.35 
  Hmm, I don't currently have a MuPDF clone on Linux10:16.39 
tor8 okay, so the fix they're looking for is somewhere between commit c8c6ac and 0.810:17.19 
  and that commit builds with the same thirdparty stuff as 0.810:17.29 
kens Yes, that's what I was able to discern10:17.31
tor8 so I guess it's git bisect time10:17.34 
kens Bisect was what I was trying10:17.40
  But an inability to build the code was a bit of a show stopper10:17.54 
tor8 if you take the 2011-02-24 thirdparty archive it should build, but it might fail on a missing jconfig.h (which you can copy from the 0.7-thirdparty zip)10:18.12 
tor8 types "man git-bisect"10:18.31 
kens OK.... But won't those files be overwritten on each bisect ?10:18.41
  bah, typing.....10:18.48 
tor8 no, the thirdparty directory (pre-submodules) is completely untracked by git10:19.06 
kens Hmm, so when I do a git checkout c8c6... where do the contents of those directories come from ?10:19.54 
tor8 a clean checkout of c8c6 shouldn't have a thirdparty directory (rm -rf thirdparty if it's there)10:20.25 
  then unzip the thirdparty.zip10:20.36 
kens OK doing that now10:21.12
tor8 git might be reluctant to remove the now-unused-git-submodules because you might have some unsaved work there10:21.14 
kens It says it can't remove them (my checkout is clean) but I'm not going to worry10:21.30 
  D'oh wrong archive.....10:22.14 
  OK I can't do this on Windows.10:23.53
  It won't let me kill the thirdparty from the current checkout, nor can I rename the directory or anything helpful like that10:24.20
tor8 kens: that's ... odd10:26.08 
  0.7 has the problem they're showing, but the release they claim they're working from doesn't10:26.35 
kens It's some kind of Windows'ism, I hate the stupid 'I can't let you do that Dave' of Windows these days10:26.37
  Is it possible they aren't really working from that release ? I wouldn't be surprised to find they are actually working from 0.7 with a number (but not all) of later fixes10:27.18
tor8 I hate git-bisect and its terminology "good" and "bad" ...10:32.28 
  Some good revs are not ancestor of the bad rev.10:32.35 
kens Yeah that's always a problem10:32.38 
tor8 so now I have to invert my meaning of good and bad and I'm messing up everytime10:32.51 
kens I've had that before, never been able to resolve what it means10:32.53 
tor8 just doing a manual bisect is easier :(10:32.56 
kens Oh yeah, reversing the meaning is the way to go. I don't find it too hard10:33.13 
  But a manual bisect might be just as quick10:33.25 
tor8 It's always "git bisect start new old" and reverse the meaning10:33.33 
kens I've had to resort to that when intermediate commits don't build10:33.45 
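
The "reverse the meaning" trick discussed above amounts to a binary search for the first commit where a bug is gone, rather than the first commit where it appears. The following is a minimal sketch of that idea in Python, not of git itself. It assumes a linear, oldest-first list of commits in which the bug, once fixed, stays fixed; `first_fixed` and `has_bug` are hypothetical names introduced here for illustration.

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], has_bug: Callable[[str], bool]) -> str:
    """Return the earliest commit in which the bug no longer shows.

    Assumes `commits` is ordered oldest first, that commits[0] has the bug,
    that commits[-1] does not, and that the fix is never reverted in between.
    """
    lo, hi = 0, len(commits) - 1  # lo: known buggy, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if has_bug(commits[mid]):
            lo = mid  # still broken here, so the fix is later
        else:
            hi = mid  # already fixed here, so the fix is this commit or earlier
    return commits[hi]
```

With git, the same search is driven by marking the old, still-broken revision as "good" and the new, fixed revision as "bad", which is the inversion tor8 and kens describe.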
tor8 I have narrowed it down to a commit, and it's one they claim they have on their branch10:33.51 
kens O.O10:33.58 
  The FZ_COMBINE rounding fix ?10:34.13 
tor8 or maybe that's just the latest they have pulled, ever10:34.15 
kens That would not surprise me10:34.26 
  I 'suspect' they have 0.7 with a random assortment of other patches10:34.45 
tor8 no, one that's 9 commits before the FZ_COMBINE2 rounding fix10:34.51 
  69815ed Apply soft masks from gstate to individual objects.10:34.57 
kens Yeah I see it, that's the kind of commit I was looking for, but I, of course, was looking in the other direction.....10:35.36
tor8 yeah, me too10:35.47 
kens I might have realised if I'd been able to build the commit they mentioned.10:35.50 
tor8 until I did a double-take when I realized the commit they mention actually works as well10:36.02 
kens Yeah, something of a clue that :-)10:36.15 
tor8 so, do you want to take it from here or should I jump into the discussion?10:36.37 
kens You want to write them a mail, or shall I ?10:36.40 
  LOL10:36.42 
  Up to you10:36.46 
tor8 You're better with customers ;)10:36.53 
kens is shocked anyone would say that to me10:37.10 
  But I'll write them a nice email, thanks tor10:37.22 
tor8 the commit in question is fairly isolated and should be easy to backport, even if they have changed the code a lot they should be able to figure it out10:37.45 
kens Well they will have to won't they ? If they've changed the code significantly we can't do it for them :-)10:38.14
tor8 indeed. basically some repeated lines of code have been refactored into a function, and the bug fixed in that function10:38.47 
kens Yeah I had a quick look at the changes, it didn't look too bad10:39.20 
tor8 and it's all in the interpreter side, so nothing I expect they'll have messed with. I think they did most of their changes to the device interface side10:39.43 
kens OK you know more than me :-)10:39.56 
tor8 I only have a hunch from what questions they were coming up with several years ago10:40.19 
  I think they basically rewrote the text extraction device, but in-place rather than just making their own separate device to do what they want10:40.50 
  which makes merging hell for them10:40.58 
kens Of course, because doing it properly would be too future proof....10:41.11 
tor8 I tried nudging them in that direction, but I have little patience for explaining obvious things...10:41.58 
kens I'd like to think that episodes like this would teach them, but experience says not10:42.34 
  OK mail to customer is gone, hopefully they will be happy.10:50.39 
  Back to insertion sorts in PostScript10:50.52 
  Hmm tor8 that customer seems to have been bought out, judging by the email I received. I'll CC it to support.10:58.26 
  Hmm, interesting, I seem to have a LOCA table entry which is coming out as a null object, which is why the InsertionSort is failing12:37.24
henrys tor8: thanks for helping out with that. FWIW I use this with bisect: http://stackoverflow.com/questions/15407075/how-could-i-use-git-bisect-to-find-the-first-good-commit 13:14.30
buildMuPDF I want to build MuPDF. But I get the following error messages when running ndk-build:13:19.25 
  expected 'struct fz_rect const *' but argument is of type 'fz_rect'13:19.35 
  expected 'struct fz_matrix const *' but argument is of type 'fz_matrix'13:19.49 
  'fz_text_line' has no member named 'len'13:20.11 
kens Did you get the source from our Git repository, and if so, when ?13:20.17 
buildMuPDF 'fz_text_line' has no member named 'spans'13:20.25 
kens What's the SHA of the source you are using ?13:21.05 
buildMuPDF It is mupdf-a3d00b2c51c1df23258f774f58268be794384c27.tar.bz213:21.39 
kens Well that doesn't seem to be the latest, when did you fetch it ?13:22.36 
  What does "git log" say ?13:24.11 
  OK well that's back in May. I'm not aware of any problems there, but you should probably update to the current version anyway and try again13:25.30
jogux kens: https://www.google.co.uk/search?client=safari&rls=en&q=mupdf-a3d00b2c51c1df23258f774f58268be794384c27.tar.bz2&ie=UTF-8&oe=UTF-8&gfe_rd=cr&ei=pMPoU57wGejY8gepl4DQCA suggests there's more going on here :-) several of the matches have patches along side them that modify the source after extracting it.13:25.34 
buildMuPDF It belongs to APV. It's from May 25, 2013. https://code.google.com/p/apv/source/browse/pdfview/?r=e949eb17e42186321523a595316e13a73a98456f#pdfview%2Fdeps13:25.52 
kens buildMuPDF : You can't expect us to support code you get from somewhere else13:26.08 
  You either need to use our code or take it up with APV13:26.33 
  jogux yes I see that, thanks, I didn't know so many people were hosting versions of MuPDF13:27.11
buildMuPDF Well, the error comes from mupdf13:27.22 
kens Brando753 : If someone changes the source, it's not MuPDF13:27.43
  AFAIK our current source builds for Android. I'm not one of the MuPDF developers, but if it was me I would tell you to get our version of the source, and try with that.13:28.17 
buildMuPDF It is using your MuPDF (unmodified)13:28.18 
kens is a cynic.13:28.37 
buildMuPDF Ok I will try. 13:28.50 
kens I wouldn't trust source that didn't come from us. And in any event, you are using source 2 months out of date13:28.55 
buildMuPDF It is their latest release. The others are beta.13:29.50 
kens shrugs13:30.01 
  Maybe you should query APV about it ?13:30.11
  I assume you are using the instructions in the README ?13:30.50 
buildMuPDF Does the newest MuPDF library work exactly the same or did you make great changes that could stop apv from working?13:32.18
kens I'm not aware of any breaking changes, but like I said, I don't work on MuPDF13:32.50 
tor8 buildMuPDF: we don't track what APV are doing. that said, we haven't made any breaking changes in the past month or so.13:33.01 
  the source as we provide it as releases on mupdf.com or by git from git.ghostscript.com builds fine for us on android13:33.45 
buildMuPDF Ok. Thank you. I will try your newest release.13:34.25 
kens chrisl ping13:38.53 
  Oh oops, he's away :-(13:39.05 
buildMuPDF Do you mean me??13:44.45 
kens No I mean chrisl13:44.53 
tor8 robin_watts_mac: ping13:49.07 
  paulgardiner: ping! I guess you're more likely to be around than Robin13:49.54
robin_watts_mac pong.13:50.39 
  but just going to breakfast.13:50.44 
tor8 robin_watts_mac: I have two small commits on tor/master waiting for review13:51.05 
sebras tor8: the cbz-commit LGTM.13:58.27 
buildMuPDF Do you know if this code is compatible with your MuPDF library? http://dpaste.com/13XZ5DE13:59.47 
sebras buildMuPDF: no probably not.14:01.55 
buildMuPDF How can I change it to be compatible? 14:02.33 
tor8 buildMuPDF: no, it does not look entirely compatible.14:02.41 
  buildMuPDF: look in "include/mupdf/fitz/structured-text.h"14:02.52 
buildMuPDF (It is C not java)14:03.17 
sebras that code uses fz_text_line->len while here http://git.ghostscript.com/?p=mupdf.git;a=blob;f=include/mupdf/fitz/structured-text.h;h=f325bf216a5cc6946a7d8f95a4970ff09e4e7c14;hb=HEAD#l119 mupdf declares fz_text_line without a len member, for example.14:03.17
tor8 the changes are fairly minor; the page_block can contain both text and image blocks14:03.25 
sebras buildMuPDF: we know. ;)14:03.25 
tor8 so you'll need to check the type field and then get the text block14:03.48 
  and looping over the spans is a linked list rather than an array14:03.57 
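
The structural change tor8 describes (page blocks that may be text or image, and spans held in a linked list rather than an array) can be sketched in Python as follows. This is only an illustration of the traversal pattern, not MuPDF's C API: the classes `Span`, `Line`, `TextBlock` and `ImageBlock` and the function `extract_text` are hypothetical stand-ins invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Span:
    text: str
    next: Optional["Span"] = None  # spans form a singly linked list


@dataclass
class Line:
    first_span: Optional[Span] = None


@dataclass
class TextBlock:
    lines: List[Line] = field(default_factory=list)


@dataclass
class ImageBlock:
    name: str = ""


def extract_text(blocks: List[Union[TextBlock, ImageBlock]]) -> str:
    """Concatenate the text of every span, skipping non-text blocks."""
    out: List[str] = []
    for block in blocks:
        if not isinstance(block, TextBlock):  # the "check the type field" step
            continue
        for line in block.lines:
            span = line.first_span
            while span is not None:  # walk the list instead of indexing an array
                out.append(span.text)
                span = span.next
            out.append("\n")
    return "".join(out)
```

The point of the sketch is the shape of the loops: one type check per block, then a pointer-chasing loop over spans where older code would have used an index and a length.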
sebras tor8: I agree with your description of your test_device-patch, but I don't understand how iscolor relates to dev->user..?14:06.26 
  tor8: that just seems wrong. why is dev->user passed to fz_test_color()..?14:06.51 
tor8 sebras dev->user points to an integer that contains the boolean result of the iscolor 'test'14:07.19 
  fz_new_test_device(ctx, &iscolor)14:07.38 
  run page14:07.40 
  read iscolor value14:07.43 
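
The three steps tor8 lists (create the test device with a pointer to a result, run the page, read the result) follow a common "caller owns the result slot" pattern. Below is a toy Python sketch of that pattern only. It is not MuPDF code: `ColorTestDevice`, `fill` and `run_page` are hypothetical names, and the exact rule the real device uses to decide what counts as colour is not given in this log, so a strict r == g == b check is assumed here.

```python
from typing import Iterable, List, Tuple

RGB = Tuple[float, float, float]


class ColorTestDevice:
    """Toy device: draws nothing, only records whether any colour was seen."""

    def __init__(self, result: List[bool]) -> None:
        # `result` is storage owned by the caller, playing the role of the
        # pointer that dev->user holds in the discussion above.
        self.user = result
        self.user[0] = False

    def fill(self, rgb: RGB) -> None:
        r, g, b = rgb
        if not (r == g == b):  # assumed neutrality test, see lead-in
            self.user[0] = True


def run_page(page: Iterable[RGB], dev: ColorTestDevice) -> None:
    """Feed every colour used on a (toy) page through the device."""
    for colour in page:
        dev.fill(colour)
```

After `run_page` returns, the caller reads its own `result[0]`, which mirrors the "read iscolor value" step.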
buildMuPDF Could you change it, please. As I am only programming in Java and don't know your library, it is very difficult for me.14:07.51
sebras tor8: ah! now I see it! it is passed through fz_new_device(). ok then it makes sense.14:08.19 
  tor8: ok, LGTM.14:09.49 
tor8 buildMuPDF: look at fz_print_text_page http://git.ghostscript.com/?p=mupdf.git;a=blob;f=source/fitz/stext-output.c;h=6ed595fc1dac90a04da38db57a15d5d49ed06037;hb=HEAD#l363 and just restructure the code you have to that loop framework14:09.55 
  sebras: thanks.14:10.00 
  buildMuPDF: replace the /* for now lets just flatten */ block with the code in fz_print_text_page, substituting the printf for append_chars14:10.59 
buildMuPDF Including "void"? void fz_print_text_page(fz_context *ctx, fz_output *out, fz_text_page *page)14:14.55 
tor8 buildMuPDF: I thought you said you knew Java? C isn't that much different.14:17.26 
sebras buildMuPDF: you can't just quote the entire code including the argument declarations inside another function in C. in the same way you cannot do this in Java, right..?14:20.12 
buildMuPDF I just want to know if I have to include the void in line 362 or leave it out? (You pointed to l363)14:20.14 
sebras buildMuPDF: actually you don't need line 363 either.14:20.46 
tor8 buildMuPDF: I linked to the function and said copy the contents (of the function). copying the function would be rather pointless; as it already exists...14:20.49 
buildMuPDF Ok thank you.14:21.15 
sebras buildMuPDF: you need to read and understand your original extract_text(). you must understand how it loops over each type of data structure to get at the pieces of text and how it stores it in the output string. then you need to read a bit of fz_print_text_page() to see how it differs in looping over the data structures to get at the pieces of text. in this case they are printed instead of being appended to a string.14:23.15
  buildMuPDF: I think it will help you in the future if you actually spend the time to learn this now, I mean you might end up having to interact more with the native mupdf C library in the future and starting out with this quite simple code is a great way to start. :)14:24.31 
tor8 buildMuPDF: something like this http://collabedit.com/qu7rp14:25.10 
buildMuPDF I still get errors.14:48.58 
  http://dpaste.com/2FK9HY414:50.13 
kens Well the easy way to solve those is to cast them to const types14:50.53 
  Oh and you'll need to see what 'out' should be and define it14:51.15 
buildMuPDF Where do I see what out should be? You mean "fz_printf(out, "\n");"? 14:54.37 
sebras buildMuPDF: did you see the changes that tor8 and kens did for you?14:54.45 
  buildMuPDF: http://collabedit.com/qu7rp14:54.50 
kens Wasn't me, must have been tor14:54.59 
sebras buildMuPDF: it's a collaborative text editor online.14:55.00 
  kens: oh, I thought you contributed too. :)14:55.15 
kens I logged in to read it, I wouldn't want to edit it and send someone wrong14:55.39
robin_watts_mac tor8: OK, do you still need your commits reviewed? Which ones?15:00.12 
tor8 robin_watts_mac: no, I'm all set thanks to sebras15:00.33 
  go back to vacation!15:00.35 
robin_watts_mac ok.15:00.43 
kens Still in Chile Robin ?15:00.57
sebras robin_watts_mac: bye! :)15:01.00 
robin_watts_mac kens: yeah, got back to santiago from Easter Island yesterday.15:01.35 
  Off to Dallas tonight.15:01.42 
kens Say hi to Scott for us :-)15:01.52 
robin_watts_mac Seeing Scott tomorrow, then we fly back to the UK on wed.15:01.57 
  Will do.15:01.59 
kens Hmm, thunder.... :-(15:02.04 
pedro_mac Robin: cool - have a safe trip back15:02.27 
buildMuPDF I'm running ndk-build and it looks really good. - It's still running without any errors :)15:02.48 
kens pedro_mac : just wants to ensure someone else comes and works on Smart Office :-)15:03.02 
pedro_mac kens: :)15:03.26 
buildMuPDF tor8: Thank you very much!15:03.38 
sebras buildMuPDF: most importantly. do you understand it better now..?15:04.47 
kens oh boy lightning now too, if I drop off suddenly you'll know why....15:05.58 
buildMuPDF Yes. Not everything but more than before.15:09.30 
sebras buildMuPDF: excellent, good work. :)15:10.22 
buildMuPDF Thank you and goodbye.15:12.31 
kens bb15:12.44 
robin_watts_mac see y'all in dallas.15:43.39 
mvrhel_laptop bbiab15:48.07 
kens Night all16:04.03 
pedro_mac g’nite kens16:05.21 
nemo rayjj: WB16:40.30 
  so, (a bit later) how did you flatten that PDF? might be worth trying to do it16:41.01 
  although... a fair number of pages are colour, so it might be just too difficult to discriminate16:41.17
  probably better to just focus on using mutools to get more reliable jpeg compression than ghostscript can provide16:41.33 
rayjj nemo: OK, so I have something that cleans up the image16:56.16 
  nemo: I don't think mutool (or mudraw) can do this since it relies on image transfer function16:57.12 
  mudraw only supports gamma AFAICT16:57.50 
nemo rayjj: well, I was thinking more extracting the jpegs using mutools, then putting them back using it16:58.23 
  which you'd suggested16:58.26 
  (putting 'em back after manipulating in imagemagick or whatever)16:58.40 
rayjj nemo: I have a command line that uses gs to do it in one step16:58.56 
nemo I could, I suppose, try to identify what the image is based on its colour profile and pick a different technique. could be a bit tedious tho16:59.04 
  but just making ghostscript use less hideous jpeg compression would already be a win 16:59.39 
  rayjj: oh neat16:59.58 
  I love copying and pasting commandlines! 17:00.13 
  (slow at reading)17:00.17 
rayjj image result at http://casper.ghostscript.com/~ray/after_transfer_w_DCT.pdf -- command line:17:04.36 
  gswin32c -dColorConversionStrategy=/Gray -dProcessColorModel=/DeviceGray -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterGrayImages false /GrayImageFilter /DCTEncode >> setdistillerparams {dup .95 gt { pop 1 } if } settransfer" -f before.pdf17:04.38 
  nemo: this converts the image to Gray, and the result is 168,492 bytes. If I keep the transfer function, but stay in RGB, it is 177,083 bytes. That image is at http://casper.ghostscript.com/~ray/after_transfer_w_DCT_RGB.pdf17:07.47 
  for that, just leave off the -dColorConversionStrategy=/Gray -dProcessColorModel=/DeviceGray options17:08.14 
  oops. not exactly. Actually:17:09.13 
  gswin32c -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode >> setdistillerparams { dup .95 gt { pop 1 } if } settransfer" -f before.pdf17:09.14 
nemo hmmm17:09.44 
nemo fires that off on the whole doc17:09.50 
rayjj the "x.pdf" is just to simplify my testing w/ various settings -- I renamed afterwards17:09.56 
nemo huh..17:10.45 
rayjj nemo: the key is to force the DCTEncode because if I use the transfer functions and leave Auto__ImageFilter true then it selects FlateEncode and that is MUCH larger17:11.26 
nemo setdistillerparams { dup .95 ← that's for quality 95%?17:11.53 
rayjj nemo: no, that's part of the non-linear transfer function17:12.17 
nemo ah...17:12.20 
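
For readers unfamiliar with PostScript, the procedure `{ dup .95 gt { pop 1 } if }` passed to settransfer can be read as a function on a single component value between 0 and 1. A minimal Python equivalent follows; the function name `transfer` is introduced here purely for illustration.

```python
def transfer(value: float) -> float:
    """Python reading of the PostScript procedure { dup .95 gt { pop 1 } if }.

    dup .95 gt   -> compare a copy of the value with 0.95
    { pop 1 } if -> if it was greater, discard the value and push 1 instead
    Otherwise the original value is left untouched.
    """
    return 1.0 if value > 0.95 else value
```

In DeviceGray and DeviceRGB, 1 is full intensity, so this pushes near-white background values to pure white while leaving everything at or below 0.95 unchanged. That is why rayjj calls it part of a non-linear transfer function rather than a quality setting.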
rayjj the "mud" is mostly junk that was in the original that was made more visible by the second JPEG17:12.52 
nemo ugh. I hate working w/ git17:12.56 
rayjj nemo: quality 95% is the default AFAICT17:13.06 
nemo hm17:13.15 
  I should print out the encoded jpegs gimp creates17:13.36 
  I hadn't done that yet17:13.39 
  printing does seem to make the mud more visible17:13.47 
rayjj nemo: I haven't checked the QFactor actually being used. I am confused by the ACSImageDict and the ImageDict -- I don't know which one it is using (I'm not the pdfwrite expert, and kens is gone for the day)17:16.31
  nemo: printing _would_ intensify the light gray dots due to dot gain, particularly on a laser engine17:17.09 
  nemo: can you try printing either of the pages I posted and compare to "before.pdf" printed ?17:17.54
nemo rayjj: hm. um... can you relink?17:18.06 
rayjj relink ???17:18.18 
nemo rayjj: I'm trying to get !@#$ git to restore a file I'd deleted just so I can try your commandline17:18.22 
  rayjj: post the link again. disappeared in history over weekend and I'm on a new machine17:18.37 
  re-link17:18.40 
rayjj git status -u shows the changed file, right ?17:18.49 
  then just use git checkout <changed_file>17:19.18 
  that'll restore to the "master" file17:19.39 
  nemo: even if it has been deleted17:19.59 
nemo rayjj: yeah, I eventually got someone to tell me it was checkout 17:20.05 
  it annoys me that it is so... different17:20.11 
  mercurial manages to be close enough to svn/cvs that I had no trouble adapting17:20.28 
rayjj different to svn and cvs, yeah. But I've more or less gotten on terms with it17:20.38 
nemo besides not screwing around with history and maintaining a clear timeline17:20.39 
  rayjj: ♥ mercurial17:20.46 
  I normally just convert to a mercurial repo if I need to do a lot of stuff17:21.18 
  but I'm mostly treating this repo as readonly17:21.24 
rayjj I like the local repository that svn and cvs don't have17:21.31
nemo I was just trying to clean up from the screwing around last week17:21.33 
  rayjj: yeah. that's what mercurial is for :D17:21.42 
  only more intuitive to use17:21.45 
rayjj admits that git is *NOT* intuitive17:22.11 
  I'm not sure about trusting a repository to something named "mercurial" :-)17:23.25 
nemo rayjj: heh. "git" is hardly better in english17:23.57 
rayjj synonyms: volatile, capricious, temperamental, excitable, fickle, changeable, unpredictable, variable, protean, mutable, erratic, quicksilver, inconstant, inconsistent, unstable, unsteady, fluctuating, ever-changing, moody, flighty, wayward, whimsical, impulsive17:24.11 
  nemo: well, to those starting to use it "git" as a slang for "spawn of the devil" seems appropriate17:24.48 
nemo rayjj: well, DVCS are indeed rather "protean"17:25.36 
  but mercurial has more of a backbone than git17:25.41 
  http://mercurial.selenic.com/wiki/GitConcepts17:25.51 
  http://www.webmonkey.com/2010/03/a-subversion-users-guide-to-mercurial-version-control/17:26.07 
rayjj nemo: so which files do you need posted ? The "before.pdf" ? (I assume that you have the ones from today)17:27.02
  nemo: http://casper.ghostscript.com/~ray/before.pdf17:28.03 
nemo rayjj: I mean, your processed file17:28.21 
  you wanted me to print it17:28.25 
  13:17 < rayjj> nemo: can you try printing either of the pages I posted and compare to "before.pdf" printed ?17:28.37
  obv I have "before" ;)17:28.47 
rayjj the ones I just uploaded today are http://casper.ghostscript.com/~ray/after_transfer_w_DCT.pdf (Gray) and http://casper.ghostscript.com/~ray/after_transfer_w_DCT_RGB.pdf17:30.09 
nemo hm17:36.25 
  yeah, I dunno...17:36.49 
  I'll run it past the boss17:36.53 
  also. let me see what happens if I use gimp17:37.08 
  frankly, this is more for my peace of mind17:37.30 
  they are perfectly happy to toss 30 gigabytes of badly scanned PDFs into the database17:37.42 
rayjj nemo: also I have uploaded ones with different QFactor settings: http://casper.ghostscript.com/~ray/after_transfer_DCT_QF_95.pdf QF_76 QF_40 and QF_15 you can see the size differences17:37.44
nemo hm17:37.48 
  how did you do that?17:37.51 
  set the QF?17:37.54 
  40417:38.23 
rayjj command line:17:38.43 
  gswin32c -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode /ColorImageDict << /QFactor 0.15 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> >> setdistillerparams { dup .95 gt { pop 1 } if } settransfer" -f before.pdf17:38.44 
  nemo: oops: http://casper.ghostscript.com/~ray/after_transfer_DCT_RGB_QF_95.pdf17:39.30 
  the 15 is somewhat cleaner than the 95, but the file size is 381,456 vs 173,29617:42.36 
  the before.pdf was 799,16017:43.22 
nemo so one thing that puzzles me, is I thought "before" was lossless17:43.34 
  so why would there be a double encoding issue17:43.39 
rayjj nemo: no, the before was DCT17:44.12 
nemo hmmm17:44.19 
  'k17:44.20 
  thanks17:44.21 
  I missed that17:44.23 
  rayjj: I thought they were all Flate17:44.35 
  but perhaps some of the pages were DCT based on whatever the scan tool was doing17:44.44 
rayjj object 3 from "before": <</BitsPerComponent 8/ColorSpace/DeviceRGB/Filter/DCTDecode/Height 4394/Length 757424/Subtype/Image/Type/XObject/Width 3435>>17:44.59 
nemo rayjj: so yeah, those links are 404 fwiw17:45.12 
rayjj nemo: I just tested it !!!17:45.41 
nemo you're right.17:45.43 
  huh...17:45.44 
  not the links17:45.47 
  the DCT17:45.48 
  I just checked the original file17:45.52 
  strings foo.pdf | grep -E Filter.*DCT | wc -l17:46.14 
  30717:46.14 
  all DCT. bleah17:46.18 
rayjj I just tested the link posted right before the file size17:46.19 
nemo /msg'd17:47.33 
rayjj nemo: FWIW, if I don't force DCT, then with the transfer function, it uses Flate and the RGB size is 922,774 17:48.10
nemo ew17:48.15 
  rayjj: that's even w/ downsampling DPI? 17:48.21 
  oh wait17:48.22 
  you didn't reduce DPI!17:48.25 
  getting lower than 400 was kinda one of the main goals :)17:48.45 
rayjj nemo: yes, I did: <</Subtype/Image/ColorSpace/DeviceRGB/Width 1288/Height 1647/BitsPerComponent 8/Filter/FlateDecode/DecodeParms<</Predictor 15/Columns 1288/Colors 3>>/Length 922657>>17:49.39 
  nemo: so that's 150 dpi17:51.15 
nemo ahhh17:51.19 
  I missed that in the commandline above17:51.27 
  hm'k17:51.39 
rayjj nemo: it's implied in the /ebook17:51.41 
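
The "150 dpi" figure can be checked from the two image dictionaries quoted above (3435 x 4394 pixels before, 1288 x 1647 after). The short calculation below assumes the original was scanned at 400 dpi, which is the figure nemo gives for these scans; `effective_dpi` is a hypothetical helper name.

```python
def effective_dpi(orig_pixels: int, orig_dpi: float, new_pixels: int) -> float:
    """Resolution of a resampled image that covers the same physical size."""
    physical_inches = orig_pixels / orig_dpi
    return new_pixels / physical_inches


# Values taken from the two image dictionaries quoted in the log;
# the 400 dpi starting point is an assumption based on nemo's remarks.
width_dpi = effective_dpi(3435, 400, 1288)
height_dpi = effective_dpi(4394, 400, 1647)
print(round(width_dpi), round(height_dpi))
```

Both dimensions come out just under 150, consistent with rayjj's reading that the /ebook preset downsampled the image to 150 dpi.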
nemo well. that still helps17:51.41 
  oh :(17:51.45 
  â† noob at this17:51.51 
  hm17:52.18 
  so. ...17:52.21 
  find -name "*.pdf" | while read f;do echo "$f $(strings "$f" | grep -E "Filter.*DCT" | wc -l) $(strings "$f" | grep -E "Filter.*Flate" | wc -l)";done17:52.26 
  this is probably a stupid hack, but...17:52.32 
  _.pdf 36 79 _.pdf 197 400 _.pdf 39 85 etc etc17:52.49 
  pages are about 50:50 flate/dct17:52.56 
  I missed that in the first couple of sample files I pulled. ugh17:53.03 
  their tool must have been selecting based on the page17:53.10 
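
nemo's `strings | grep | wc -l` pipeline can be approximated in Python by counting `/Filter /Name` entries directly in the raw bytes of a file. This is a rough sketch with the same limitations as the grep approach: it only sees dictionaries stored uncompressed, it ignores filter arrays, and a /Filter entry can belong to any stream, not only an image, so the counts are a guide rather than a page census. `count_filters` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches "/Filter/DCTDecode" as well as "/Filter /FlateDecode".
FILTER_RE = re.compile(rb"/Filter\s*/(\w+)")


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count single-name /Filter entries found in raw PDF bytes."""
    return Counter(
        match.group(1).decode("ascii") for match in FILTER_RE.finditer(pdf_bytes)
    )
```

A typical use would be `count_filters(open("foo.pdf", "rb").read())`, then comparing the `DCTDecode`, `FlateDecode` and `CCITTFaxDecode` entries.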
rayjj nemo: me, too. BTW, if I force conversion to Gray, then the Flate file size is 499,38517:53.44
nemo rayjj: The annoying thing is colour is so *rare*17:54.18 
rayjj nemo: they probably use Auto_ImageFilter true, so it depends on the image contents17:54.26 
nemo but I'd really have to consider on a page by page basis17:54.27 
rayjj nemo: I suggest setting the QFactor to a low number, and just go with color. Even at 0.15, it is _still_ smaller than Flate flattened to Gray17:55.31
  I can upload the Flate if you want to compare with the QF files17:55.58 
  nemo: http://casper.ghostscript.com/~ray/after_transfer_Flate_Gray.pdf17:58.32 
  nemo: I have to run an errand. bbiaw18:01.06 
  nemo: can you see all of the QF files now ?18:01.23 
nemo huh. why don't I see RGB for flate18:06.50 
  trying to figure out how many pages they did RGB and how many not18:07.34 
rayjj nemo: I didn't post the RGB for the Flate (it was 922,738 bytes, so larger than "before")18:42.12
nemo rayjj: I mean, I was trying to figure out how many pages they did RGB and how many black and white, if any18:43.03 
  rayjj: is Flate always colour?18:43.10 
  basically, we are trying to determine how screwed up the 2nd batch of PDFs was18:43.26 
rayjj nemo: no, Flate can be used for Gray or Color18:44.08 
nemo hm18:44.11 
  trying to figure out where that is in the filter line18:44.43 
rayjj nemo: gs can examine files and check if they have color, but the definition of 'neutral' is compiled in18:44.57 
nemo if it doesn't say, is default colour?18:45.07 
  rayjj: I was just grepping the files to get a general idea18:45.15 
  strings foo.pdf | grep -E "Filter.*Flate"18:45.22 
rayjj in the command line, if one doesn't force ProcessColorModel and ColorConversionStrategy, then out will be whatever colorspace came in (per image)18:46.00 
  grep for DeviceGray, maybe ?18:46.20 
  or DeviceRGB.18:46.33 
  but with the 'before' file you sent, the image was RGB even though it looked like just shades of Gray18:47.10
  nemo: give me a sec and I'll check what gs thinks about that 'before' page.18:47.35 
nemo yeah18:47.35 
  rayjj: that Before one was the first batch, which was just stupid18:47.43 
  rayjj: 2nd batch is better18:47.48 
  trying to figure out approaches for both18:47.54 
  actually, the 2nd batch is just weird18:48.03 
  they picked 400DPI for everything, but often used B&W flate or even CCIT Fax18:48.23 
  CCITT Fax18:48.31 
  that is, a scanned page flattened to literally B&W, not greyscale18:48.45 
rayjj nemo: if it doesn't have shades of gray, then CCITT is best compression18:49.04 
nemo so I'm like... why... why are we using 400DPI here? obviously quality went out the window, not that it was ever really there to begin with18:49.05 
  rayjj: yeah, I'm just confused at the parameters chosen ☺18:49.17 
  the choice of compression was probably indeed by some scanning software18:49.29 
  which probably also flattened the pages18:49.36 
  I guess what is happening is, the 2nd batch does auto WB.18:49.54 
  And as a result, some pages are close enough to B&W to trigger the software they are using to go into that mode18:50.08 
  and they kept 400DPI, just 'cause.18:50.14 
  the problem for me ofc, is it is harder to tell gs to be smart about such a crazy crazy mess18:50.28 
  rayjj: I'm thinking what I need gs to do really is just reduce DPI, but keep whatever algorithm they used18:50.49 
  maybe sometimes they used DCT or Flate inappropriately, but whatever.18:51.02 
rayjj nemo: If the image comes in CCITT, gs will emit CCITT since that is 1 bpp and is lossless18:51.19 
nemo and actually, on the original batch, where Before came from, not removing background is a win18:51.22 
  since it hides the double-encoding of jpeg ☺18:51.35 
  and. I'm convinced now, that's where all my jpeg problems came from that were driving me batty.18:51.56 
  well, I don't really consider it a win, but boss does :D18:52.15 
  but. eh. lemme fire off your last suggested commandline against one from the new batch, and one from the old batch18:52.43 
  I'm sure the DBA will appreciate it regardless18:52.56 
  rayjj: and... not a single one of the Flate pages had device RGB or device Gray, so... going to guess it is just RGB18:56.02 
rayjj nemo: grep may not be reliable. And there are other colorspaces that it might have used.18:58.18 
nemo rayjj: I was eyeballing the strings, and couldn't find any mention of Device18:59.14 
  amusingly ColorSpace/DeviceGray is on the CCITTFaxDecode lines18:59.53 
  oh well whatever18:59.56 
rayjj nemo: try grepping for "/Subtype.*Image"19:00.05 
  hmm... my grep calls it a binary file, so won't print the line :-(19:00.30 
nemo heh19:00.39 
  I always run strings first19:00.42 
  tidier19:00.45 
rayjj it just says: Binary file /c/Users/ray/Downloads/before.pdf matches19:00.51 
nemo strings *.pdf | grep -E "/Subtype.*Image"19:00.58 
  strings *.pdf | grep -E "/Subtype.*Image" | grep Flate19:01.20 
  returns nothing19:01.22 
  I did 2 greps 'cause I have no idea what order that should be in the line, and didn't feel like a complicated regex ☺19:01.39 
  just... really weird19:01.42 
  oh well.19:01.45 
rayjj but strings | grep "Subtype.*Image" does give me the line19:02.09 
  nemo: just see without the second grep to make sure it is showing the Image object19:02.40 
nemo it was19:02.48 
  lots of CCITT fax lines :)19:02.59 
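The `strings | grep` check above can be approximated in a few lines of Python. This is a minimal sketch, not a PDF parser: `count_image_filters` is a hypothetical helper, and it only sees image dictionaries that are stored uncompressed on a single line, so (as rayjj says about grep) it may miss images or misreport them.

```python
import re
from collections import Counter

# Matches filter names such as /FlateDecode, /DCTDecode, /CCITTFaxDecode.
FILTER_NAME = re.compile(rb"/(\w+Decode)\b")
# Same idea as grepping for "/Subtype.*Image".
IMAGE_MARK = re.compile(rb"/Subtype\s*/Image")


def count_image_filters(pdf_bytes):
    """Count filter names on raw lines that look like image dictionaries."""
    counts = Counter()
    for line in re.split(rb"[\r\n]+", pdf_bytes):
        if not IMAGE_MARK.search(line):
            continue
        names = FILTER_NAME.findall(line)
        # An image line with no recognisable filter is counted as "(none)".
        for name in names or [b"(none)"]:
            counts[name.decode("ascii")] += 1
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as fh:
        for name, n in count_image_filters(fh.read()).most_common():
            print(name, n)
```

Because the whole line is scanned, the order of `/Subtype` and `/Filter` inside the dictionary does not matter, which is the same reason nemo chained two greps.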
  but, I think bosses are pushing back on them to just rescan everything \o/ \o/19:03.15 
rayjj nemo: there is nothing that can be done to reduce the size of the CCITT pages19:03.23 
nemo we'll see. if they refuse, back to the programmer here w/ OCD to try and tidy it up19:03.28 
  rayjj: yeah, I don't care about them frankly19:03.36 
  rayjj: welll.... lower DPI would probably help some no? ☺19:03.46 
  but frankly, more worried about the new ones19:03.52 
  er, more worried about the RGB Flate/DCT pages19:04.18 
rayjj nemo: lower dpi doesn't help if it goes from DCT to Flate19:06.18 
nemo rayjj: oh sure. I meant "nothing can be done for CCITT" - I mean, those would still be a bit of a win, but the files are so tiny, that, eh, who cares19:06.42 
rayjj nemo: BTW, even though it looks Gray, with GrayDetection=true it thinks it has color (with the default tolerance 5/255)19:11.26 
nemo heh19:13.30 
  well, is a scan19:13.34 
  even white paper probably doesn't look white19:13.50 
  esp after sitting in a folder for a while19:14.01 
rayjj nemo: I changed the transfer function a bit, and it looks better, IMHO. Please look at the files after_transfer_DCT_RGB_QF_15.pdf (and 40, 76 and 90)19:26.28 
  it cleans up more of the dots around the text, making them lighter. The overall image is slightly lighter, too19:27.19 
  I used { .93 div dup 1 gt { pop 1 } if } settransfer for these19:27.41 
nemo heh. 90 is still 404 ☺19:32.27 
  but I tried the others and those do work19:32.42 
  I hadn't tried them before19:32.46 
  huh. I must not understand how Quality factor works - is strange that 15 is the larger one. I thought quality decreased from 1.0 to 019:33.31 
rayjj nemo: This works *MUCH* better, and there is no noise added (compared to the Flate output) by using DCT even at QF 40, which has a file size of 253,283. The 76 adds a few dots, and the 95 quite a few19:37.27 
  nemo: TBH, so did I. I'm just reporting what I see. But the Ps2pdf.htm document that has the Notes 7, 8, 9, and 10 seem to imply that 0.15 is used for "prepress" (the best) and 0.95 for screen and ebook19:39.22 
  umm. 0.76 for screen and ebook, and 0.9 in general19:40.05 
  the "printer" setting is 0.4019:40.26 
nemo weird19:41.01 
rayjj nemo: BTW, it's 95, not 90. Sorry19:41.05 
nemo ahhhh19:41.14 
rayjj anyway, you have the "magic" to get the cleaned up file size, in color, down to 250K or so.19:41.56 
nemo 13:38 < rayjj> gswin32c -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode /ColorImageDict << /QFactor 0.15 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> >> setdistillerparams { dup .95 gt { pop 1 } if } settransfer" -f before.pdf19:42.25 
rayjj at the "reduced" resolution of 150 dpi19:42.27 
nemo that one right ☺19:42.30 
  where the only thing to fiddle w/ is QFactor19:42.55 
rayjj nemo: not quite:19:43.15 
  gswin32c -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode /ColorImageDict << /QFactor 0.15 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> >> setdistillerparams { .93 div 1 gt { pop 1 } if } settransfer" -f before.pdf19:43.16 
nemo oh. 'k19:43.20 
  well, regardless of what they decide on, I think I'm going to find this handy in the future19:44.03 
  dumping it to my tips n tricks folder19:44.12 
  thanks.19:44.14 
rayjj rather than leaving dots untouched that are below 95% white, it lightens everything up linearly by dividing by 0.93 (mul by 1.07) and clamps at white == 119:44.48 
  I looked at the gray shades for some of the "noise" dots and there were some below 24019:45.40 
  you can play with the "0.93" if the image lightness seems too much. A higher number lightens less19:46.12 
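The two transfer functions discussed here can be modelled numerically. This is a sketch in Python rather than PostScript; `lighten` and `old_threshold` are illustrative names, and values are on the 0 (black) to 1 (white) scale described above.

```python
def lighten(value, divisor=0.93):
    """Model of { .93 div dup 1 gt { pop 1 } if }: scale up, clamp at white."""
    scaled = value / divisor
    return 1.0 if scaled > 1.0 else scaled


def old_threshold(value, cutoff=0.95):
    """Model of { dup .95 gt { pop 1 } if }: only near-white values change."""
    return 1.0 if value > cutoff else value


if __name__ == "__main__":
    for gray in (128, 230, 240, 250):
        v = gray / 255
        print(gray, round(old_threshold(v), 4), round(lighten(v), 4))
```

A noise dot at gray level 240 (about 0.941) is left alone by the 0.95 threshold but is pushed to white by the divide-and-clamp version, which is why the second form cleans up more of the speckle. Raising the divisor toward 1 lightens less.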
nemo I'm gonna give it a shot at "40" - there's still decent win in size and I couldn't pick out any artifacting.19:52.06 
  this is gonna take a looooooong time to run tho ☺19:52.15 
  and, the lightness looks good to me19:52.24 
kens rayjj, QFactor, page 163 of the PLRM:19:53.55 
  "Valid values are in the range 0 to 1,000,000. A value less than 119:53.55 
  improves image quality but decreases compression; a value greater than 119:53.55 
  increases compression but degrades image quality. Default value: 1.0."19:53.55 
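The direction of the scale follows from how JPEG quantization works: each DCT coefficient is divided by a step size and rounded, and QFactor scales those step sizes. The sketch below is only an illustration under assumptions. The step sizes are the first row of the example luminance table from the JPEG specification, not necessarily what Ghostscript uses; the coefficients are made up; and the clamp to a minimum step of 1 is an assumption.

```python
# Assumed base step sizes (first row of the JPEG spec's example luminance table).
BASE_STEPS = [16, 11, 10, 16, 24, 40, 51, 61]
# Hypothetical DCT coefficients, for illustration only.
COEFFS = [-415.4, -30.2, -61.2, 27.2, 56.1, -20.1, -2.4, 0.5]


def roundtrip(coeffs, qfactor):
    """Quantize and dequantize each coefficient with steps scaled by qfactor."""
    result = []
    for coeff, base in zip(coeffs, BASE_STEPS):
        step = max(1.0, base * qfactor)  # assumed lower clamp
        result.append(round(coeff / step) * step)
    return result


def total_error(coeffs, qfactor):
    """Sum of absolute reconstruction errors after the round trip."""
    return sum(abs(a - b) for a, b in zip(coeffs, roundtrip(coeffs, qfactor)))


if __name__ == "__main__":
    for q in (0.15, 0.40, 0.76, 1.0):
        print(q, round(total_error(COEFFS, q), 2))
```

Smaller QFactor means finer steps, less rounding error and less compression, which matches the PLRM wording and explains why the 0.15 file is the largest.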
nemo O_o19:54.07 
  hm.19:54.36 
  I'm gonna try values bigger than 1 w/ your sample then19:54.49 
  oh wait. no. n/m. I forgot. the default value was already deemed too ugly19:55.25 
  eh. let's see what that range looks like 19:55.46 
  ~/git/ghostpdl/gs/bin/gs -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode /ColorImageDict << /QFactor 1000000 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> >> setdistillerparams { .93 div 1 gt { pop 1 } if } settransfer" -f before.pdf 19:56.38 
  Error: /stackunderflow in --pop--19:56.44 
  hrm...19:57.05 
  you sure that line didn't get trimmed?19:57.11 
  that word "if" off on its own - I don't know much about the language used, but that seems odd19:57.28 
  I'm assuming postscript, but the tutorials I've found so far, not too helpful19:59.51 
rayjj nemo: sorry. Let me cut and paste what worked for me. I had just hand edited yours20:03.15 
nemo ah well, yours is probably better20:03.31 
rayjj gswin32c -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /ColorImageDict << /QFactor 0.95 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> /AutoFilterColorImages false /ColorImageFilter /DCTEncode >> setdistillerparams { .93 div dup 1 gt { pop 1 } if } settransfer" -f before.pdf20:04.58 
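For batch runs, the same command can be assembled as an argument list so that only QFactor and the lightening divisor vary. This is a sketch: `build_gs_args` is a hypothetical helper, the executable name `gs` is an assumption (the log uses `gswin32c` and a local git build), and the switches and PostScript are copied from the line above.

```python
def build_gs_args(src, dst, qfactor=0.40, divisor=0.93, gs="gs"):
    """Return the argv list for the pdfwrite command quoted in the log."""
    postscript = (
        "<< /ColorImageDict << /QFactor %g /Blend 1 "
        "/HSamples [2 1 1 2] /VSamples [2 1 1 2] >> "
        "/AutoFilterColorImages false /ColorImageFilter /DCTEncode >> "
        "setdistillerparams "
        "{ %g div dup 1 gt { pop 1 } if } settransfer" % (qfactor, divisor)
    )
    return [
        gs,
        "-o", dst,
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/ebook",
        "-c", postscript,
        "-f", src,
    ]


if __name__ == "__main__":
    print(build_gs_args("before.pdf", "x.pdf"))
```

Passing the list to `subprocess.run(args, check=True)` avoids shell quoting problems with the braces and angle brackets in the PostScript fragment.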
  nemo: PS is a postfix operation language, so it's <bool> <proc> if (execute proc if bool is true)20:06.01 
nemo hm. so. my system release version of gs ran it just fine20:06.58 
  9.1020:07.05 
rayjj so, I forgot the 'dup' after the 'div' sorry20:07.06 
nemo | ./base/gsicc_manage.c:1685: gsicc_set_device_profile(): cannot find device profile20:07.21 
  anyway, let's see if the system one does the trick. it was crashing before, but, eh, maybe I'll get lucky20:07.39 
rayjj nemo: I get that if I put /ProcessColorModel /DeviceGray /ColorConversionStrategy /Gray in the distillerparams dict (rather than as command line options). I am opening a bug.20:08.33 
  for kens :-)20:08.48 
nemo moves them20:09.42 
  er. wait. I don't see those. sooo. um. no idea what you mean20:10.55 
  (thought I just needed to move some parameters outside of the quoted block)20:11.34 
rayjj nemo: I opened a bug for kens: http://bugs.ghostscript.com/show_bug.cgi?id=69542020:32.00 
  going offline for a bit...20:33.23 
nemo m'k20:33.38 
  rayjj: well, that's a shame, the non-git version I have crashed as it did before20:46.35 
  sooo guess I either have to fix bug 695420, or wait ☺20:46.44 
  fix/fall back to working version21:08.06 
  hm21:08.07 
  bisect. helps you, helps me21:08.13 
  shame I dislike git oh so much21:08.19 
henrys mvrhel_laptop any word back for the meeting time?21:19.03 
mvrhel_laptop henrys: yes :(21:32.32 
  at the last minute she cancelled it on me21:32.46 
  I am trying to salvage something now21:32.55 
  They are a bit strange over there21:33.05 
  She had cleared all of this earlier I had thought21:33.56 
  and there were people in multiple groups that were interested 21:34.19 
henrys mvrhel_laptop: so are you canceling the trip?22:07.14 
mvrhel_laptop henrys: good question. she still wants to meet to discuss the book that I am working on. however I think we even have to meet off site22:16.06 
henrys mvrhel_laptop: well if you need a place to stay, I’ve plenty of room.22:20.35 
mvrhel_laptop henrys: sorry internet was down as cable guy was here working on it.23:46.31 
  so trip is cancelled23:46.36 
  so I will be here for the morning meeting23:47.07 