| 2014/08/11 |
samtek | hello everybody .. | 05:06.48 |
| somebody there ..? | 05:08.11 |
| hello robin_watts | 05:08.58 |
| i read your post and comments in stack overflow on MUPDF | 05:09.22 |
| hello mrdoc | 05:23.22 |
kens | tor8 ping | 09:46.01 |
tor8 | kens: morning. | 09:59.59 |
kens | Morning Tor | 10:00.08 |
| Did you see the query from Amish, and our problems over the weekend ? | 10:00.24 |
tor8 | kens: ah, no. reading the mail now. | 10:01.01 |
kens | Its the IRC logs as well. | 10:01.08 |
| They need a patch to an archaic version of MuPDF, somewhere between 0.7 and 0.8, to fix a specific problem | 10:01.33 |
| I can isolate it (from binaries and their mail) to somewhere between 4th December 2010 and 3rd April 2011 | 10:02.13 |
tor8 | ah yes, I remember they were very keen on chopping up the code and changing it around in their own fork | 10:02.36 |
| which would make it very difficult for them to follow our development | 10:02.53 |
kens | But when I check out any MuPDF commits between those dates it fails to build for me. And yes, they can't simply upgrade because they've changed loads of stuff | 10:02.56 |
| They say they will upgrade (hmm.....) but they want a patch for this transparency problem in the meantime. | 10:03.23 |
tor8 | kens: I think back in those commits we didn't have the thirdparty libraries set up as submodules yet | 10:03.27 |
kens | tor8 correct, they seem to be hardcoded in. Which makes the inability to build more surprising to me. | 10:03.55 |
tor8 | we have the tarballs required for the various dates on http://mupdf.com/downloads/archive/ | 10:03.58 |
kens | tor8 I'm not sure that helps..... | 10:04.36 |
| I know it's after December 2010 and I know the 0.8 release is fixed, but I need the specific commit in that range | 10:05.00 |
| Those archives seem to be (mostly) the releases | 10:05.49 |
tor8 | kens: mupdf-thirdparty-$DATE.zip | 10:06.27 |
| ah, I see now that Robin's already told you this :) anyway, the thirdparty directories didn't change much back then | 10:06.47 |
| it should be possible to build with the oldest of those zips in 0.9, or maybe even just use the system libraries. memory's hazy. | 10:07.15 |
kens | 0.9 is no good to me, I need to go back before 0.8 | 10:07.38 |
| There's only one 3rd party of a relevant date that I can see, 2011-02-24.zip | 10:08.01 |
| (messing about with code over 3 years old is painful....) | 10:08.17 |
tor8 | kens: yeah. let me see if I can get a checkout that old that builds. | 10:08.39 |
kens | OK thanks tor | 10:08.45 |
| I confess I'm struggling with this | 10:08.55 |
tor8 | kens: ah, for 0.7 there's a 0.7-thirdparty.zip which should work for that release (and probably the gits before and after it) | 10:09.59 |
kens | OK I'll pull that and see if I get anywhere, give me a minute or two. I'm trying to figure out why opdfread's insertion sort doesn't work, so I'll need to save away what I'm doing | 10:10.39 |
tor8 | kens: give me a few minutes and I might be able to say if it works or not :) | 10:11.03 |
kens | I guess that would help :-) | 10:11.16 |
tor8 | building on windows back in those release days was still a bit dodgy | 10:11.24 |
kens | Well it 'seems' to work, except that the 3rd party libraries throw compilation errors, especially openjpeg | 10:11.55 |
tor8 | well, a 0.7 release built just fine. I'm surprised! | 10:11.56 |
| with the mupdf-0.7-thirdparty.zip archive | 10:12.12 |
| but that's on linux though | 10:12.18 |
kens | I admit I didn't actually try that. Not being aware of the history I just pulled a checkout and expected it to build..... | 10:12.21 |
| I could build on Linux, it'll just take longer | 10:12.35 |
tor8 | kens: oh, jconfig.h is missing in the mupdf-thirdparty-2011-02-24 tarball which makes building 0.8 non-trivial | 10:15.16 |
| (unless you have libjpeg installed on your system) | 10:15.25 |
kens | Not for Windows :-) | 10:15.35 |
| Hmm, I don't currently have a MuPDF clone on Linux | 10:16.39 |
tor8 | okay, so the fix they're looking for is somewhere between commit c8c6ac and 0.8 | 10:17.19 |
| and that commit builds with the same thirdparty stuff as 0.8 | 10:17.29 |
kens | Yes, that's what I was able to discern | 10:17.31 |
tor8 | so I guess it's git bisect time | 10:17.34 |
kens | Bisect was what I was trying | 10:17.40 |
| But an inability to build the code was a bit of a show stopper | 10:17.54 |
tor8 | if you take the 2011-02-24 thirdparty archive it should build, but it might fail on a missing jconfig.h (which you can copy from the 0.7-thirdparty zip) | 10:18.12 |
tor8 | types "man git-bisect" | 10:18.31 |
kens | OK.... But won't those files be overwritten on ecah bisect ? | 10:18.41 |
| bah, typing..... | 10:18.48 |
tor8 | no, the thirdparty directory (pre-submodules) is completely untracked by git | 10:19.06 |
kens | Hmm, so when I do a git checkout c8c6... where do the contents of those directories come from ? | 10:19.54 |
tor8 | a clean checkout of c8c6 shouldn't have a thirdparty directory (rm -rf thirdparty if it's there) | 10:20.25 |
| then unzip the thirdparty.zip | 10:20.36 |
kens | OK doing that now | 10:21.12 |
tor8 | git might be reluctant to remove the now-unused-git-submodules because you might have some unsaved work there | 10:21.14 |
kens | It says it can't remove them (my checkout is clean) but I'm not going to worry | 10:21.30 |
| D'oh wrong archive..... | 10:22.14 |
| OK I can't do this on Windows. | 10:23.53 |
| It won't let me kill the thirdparty directory from the current checkout, nor can I rename the directory or anything helpful like that | 10:24.20 |
tor8 | kens: that's ... odd | 10:26.08 |
| 0.7 has the problem they're showing, but the release they claim they're working from doesn't | 10:26.35 |
kens | It's some kind of Windows'ism, I hate the stupid 'I can't let you do that Dave' of Windows these days | 10:26.37 |
| Is it possible they aren't really working from that release ? I wouldn't be surprised to find they are actually working from 0.7 with a number (but not all) of later fixes | 10:27.18 |
tor8 | I hate git-bisect and its terminology "good" and "bad" ... | 10:32.28 |
| Some good revs are not ancestor of the bad rev. | 10:32.35 |
kens | Yeah that's always a problem | 10:32.38 |
tor8 | so now I have to invert my meaning of good and bad and I'm messing up every time | 10:32.51 |
kens | I've had that before, never been able to resolve what it means | 10:32.53 |
tor8 | just doing a manual bisect is easier :( | 10:32.56 |
kens | Oh yeah, reversing the meaning is the way to go. I don't find it too hard | 10:33.13 |
| But a manual bisect might be just as quick | 10:33.25 |
tor8 | It's always "git bisect start new old" and reverse the meaning | 10:33.33 |
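When hunting for the commit that *fixed* something, git bisect's "good"/"bad" labels point the wrong way, which is what reversing the meaning refers to: start with `git bisect start <new> <old>`, mark commits that show the fix as "bad" and commits that still show the bug as "good". Marking them the other way round is a likely route to the "not ancestor" complaint quoted above. Commits that don't build can be passed over with `git bisect skip`, and git 2.7 and later (released after this log) also accept `--term-old`/`--term-new` so the labels can read "broken"/"fixed". The manual bisect tor8 mentions is a plain binary search; a minimal sketch, assuming a linear history and a hypothetical `is_fixed` check that builds and tests one commit:

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], is_fixed: Callable[[str], bool]) -> str:
    """Return the earliest commit in `commits` for which is_fixed() is True.

    Assumes a linear history ordered oldest -> newest, that commits[0] still
    shows the bug, that commits[-1] is fixed, and that once the bug is fixed
    it stays fixed (the same monotonicity git bisect relies on).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] broken, commits[hi] fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):  # e.g. build at `mid`, render the test file, inspect
            hi = mid  # the fix landed at or before mid
        else:
            lo = mid  # the fix landed after mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical commit names; only the ordering matters here.
    history = ["old", "c1", "c2", "c3", "c4", "new"]
    print(first_fixed(history, lambda c: history.index(c) >= 3))  # prints c3
```

Each probe halves the remaining range, so a few hundred commits need fewer than ten builds.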
kens | I've had to resort to that when intermediate commits don't build | 10:33.45 |
tor8 | I have narrowed it down to a commit, and it's one they claim they have on their branch | 10:33.51 |
kens | O.O | 10:33.58 |
| The FZ_COMBINE rounding fix ? | 10:34.13 |
tor8 | or maybe that's just the latest they have pulled, ever | 10:34.15 |
kens | That would not surprise me | 10:34.26 |
| I 'suspect' they have 0.7 with a random assortment of other patches | 10:34.45 |
tor8 | no, one that's 9 commits before the FZ_COMBINE2 rounding fix | 10:34.51 |
| 69815ed Apply soft masks from gstate to individual objects. | 10:34.57 |
kens | Yeah I see it, that's the kind of commit I was looking for, but I, of course, was looking in the other direction..... | 10:35.36 |
tor8 | yeah, me too | 10:35.47 |
kens | I might have realised if I'd been able to build the commit they mentioned. | 10:35.50 |
tor8 | until I did a double-take when I realized the commit they mention actually works as well | 10:36.02 |
kens | Yeah, something of a clue that :-) | 10:36.15 |
tor8 | so, do you want to take it from here or should I jump into the discussion? | 10:36.37 |
kens | You want to write them a mail, or shall I ? | 10:36.40 |
| LOL | 10:36.42 |
| Up to you | 10:36.46 |
tor8 | You're better with customers ;) | 10:36.53 |
kens | is shocked anyone would say that to me | 10:37.10 |
| But I'll write them a nice email, thanks tor | 10:37.22 |
tor8 | the commit in question is fairly isolated and should be easy to backport, even if they have changed the code a lot they should be able to figure it out | 10:37.45 |
kens | Well they will have to, won't they ? If they've changed the code significantly we can't do it for them :-) | 10:38.14 |
tor8 | indeed. basically some repeated lines of code have been refactored into a function, and the bug fixed in that function | 10:38.47 |
kens | Yeah I had a quick look at the changes, it didn't look too bad | 10:39.20 |
tor8 | and it's all in the interpreter side, so nothing I expect they'll have messed with. I think they did most of their changes to the device interface side | 10:39.43 |
kens | OK you know more than me :-) | 10:39.56 |
tor8 | I only have a hunch from what questions they were coming up with several years ago | 10:40.19 |
| I think they basically rewrote the text extraction device, but in-place rather than just making their own separate device to do what they want | 10:40.50 |
| which makes merging hell for them | 10:40.58 |
kens | Of course, because doing it properly would be too future proof.... | 10:41.11 |
tor8 | I tried nudging them in that direction, but I have little patience for explaining obvious things... | 10:41.58 |
kens | I'd like to think that episodes like this would teach them, but experience says not | 10:42.34 |
| OK mail to customer is gone, hopefully they will be happy. | 10:50.39 |
| Back to insertion sorts in PostScript | 10:50.52 |
| Hmm tor8 that customer seems to have been bought out, judging by the email I received. I'll CC it to support. | 10:58.26 |
| Hmm, interesting, I seem to have a LOCA table entry which is coming out as a null object, which is why the InsertionSort is failing | 12:37.24 |
henrys | tor8:thanks for helping out with that. FWIW I use this with bisect: http://stackoverflow.com/questions/15407075/how-could-i-use-git-bisect-to-find-the-first-good-commit | 13:14.30 |
buildMuPDF | I want to build MuPDF. But I get the following error messages when running ndk-build: | 13:19.25 |
| expected 'struct fz_rect const *' but argument is of type 'fz_rect' | 13:19.35 |
| expected 'struct fz_matrix const *' but argument is of type 'fz_matrix' | 13:19.49 |
| 'fz_text_line' has no member named 'len' | 13:20.11 |
kens | Did you get the source from our Git repository, and if so, when ? | 13:20.17 |
buildMuPDF | 'fz_text_line' has no member named 'spans' | 13:20.25 |
kens | What's the SHA of the source you are using ? | 13:21.05 |
buildMuPDF | It is mupdf-a3d00b2c51c1df23258f774f58268be794384c27.tar.bz2 | 13:21.39 |
kens | Well that doesn't seem to be the latest, when did you fetch it ? | 13:22.36 |
| What does "git log" say ? | 13:24.11 |
| OK well that's back in May. I'm not aware of any problems there, but you should probably update to the current version anyway and try again | 13:25.30 |
jogux | kens: https://www.google.co.uk/search?client=safari&rls=en&q=mupdf-a3d00b2c51c1df23258f774f58268be794384c27.tar.bz2&ie=UTF-8&oe=UTF-8&gfe_rd=cr&ei=pMPoU57wGejY8gepl4DQCA suggests there's more going on here :-) several of the matches have patches along side them that modify the source after extracting it. | 13:25.34 |
buildMuPDF | It belongs to APV. It's from May 25, 2013. https://code.google.com/p/apv/source/browse/pdfview/?r=e949eb17e42186321523a595316e13a73a98456f#pdfview%2Fdeps | 13:25.52 |
kens | buildMuPDF : You can't expect us to support code you get from somewhere else | 13:26.08 |
| You either need to use our code or take it up with APV | 13:26.33 |
| jogux yes I see that, thanks, I didn't know so many people were hosting versions of MuPDF | 13:27.11 |
buildMuPDF | Well, the error comes from mupdf | 13:27.22 |
kens | Brando753 : If someone changes the source, its not MuPDF | 13:27.43 |
| AFAIK our current source builds for Android. I'm not one of the MuPDF developers, but if it was me I would tell you to get our version of the source, and try with that. | 13:28.17 |
buildMuPDF | It is using your MuPDF (unmodified) | 13:28.18 |
kens | is a cynic. | 13:28.37 |
buildMuPDF | Ok I will try. | 13:28.50 |
kens | I wouldn't trust source that didn't come from us. And in any event, you are using source 2 months out of date | 13:28.55 |
buildMuPDF | It is their latest release. The others are beta. | 13:29.50 |
kens | shrugs | 13:30.01 |
| Maybe you should query APV about it ? | 13:30.11 |
| I assume you are using the instructions in the README ? | 13:30.50 |
buildMuPDF | Does the newest MuPDFlibrary work exactly the same or did you make great changes that could stop apv from working? | 13:32.18 |
kens | I'm not aware of any breaking changes, but like I said, I don't work on MuPDF | 13:32.50 |
tor8 | buildMuPDF: we don't track what APV are doing. that said, we haven't made any breaking changes in the past month or so. | 13:33.01 |
| the source as we provide it as releases on mupdf.com or by git from git.ghostscript.com builds fine for us on android | 13:33.45 |
buildMuPDF | Ok. Thank you. I will try your newest release. | 13:34.25 |
kens | chrisl ping | 13:38.53 |
| Oh oops, he's away :-( | 13:39.05 |
buildMuPDF | Do you mean me?? | 13:44.45 |
kens | No I mean chrisl | 13:44.53 |
tor8 | robin_watts_mac: ping | 13:49.07 |
| paulgardiner: ping! I guess you're more likely to be around than Robin | 13:49.54 |
robin_watts_mac | pong. | 13:50.39 |
| but just going to breakfast. | 13:50.44 |
tor8 | robin_watts_mac: I have two small commits on tor/master waiting for review | 13:51.05 |
sebras | tor8: the cbz-commit LGTM. | 13:58.27 |
buildMuPDF | Do you know if this code is compatible with your MuPDF library? http://dpaste.com/13XZ5DE | 13:59.47 |
sebras | buildMuPDF: no probably not. | 14:01.55 |
buildMuPDF | How can I change it to be compatible? | 14:02.33 |
tor8 | buildMuPDF: no, it does not look entirely compatible. | 14:02.41 |
| buildMuPDF: look in "include/mupdf/fitz/structured-text.h" | 14:02.52 |
buildMuPDF | (It is C not java) | 14:03.17 |
sebras | that code uses fz_text_line->len, while here http://git.ghostscript.com/?p=mupdf.git;a=blob;f=include/mupdf/fitz/structured-text.h;h=f325bf216a5cc6946a7d8f95a4970ff09e4e7c14;hb=HEAD#l119 mupdf declares fz_text_line without a len member, for example. | 14:03.17 |
tor8 | the changes are fairly minor; the page_block can contain both text and image blocks | 14:03.25 |
sebras | buildMuPDF: we know. ;) | 14:03.25 |
tor8 | so you'll need to check the type field and then get the text block | 14:03.48 |
| and looping over the spans is a linked list rather than an array | 14:03.57 |
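The newer structured-text layout tor8 describes (page blocks that may be text or image, selected by a type field, with spans chained as a linked list) can be modeled in a few lines. The classes below are a simplified Python stand-in, not MuPDF's C structs; `PageBlock`, `TextBlock`, `TextLine`, `TextSpan`, `BLOCK_TEXT` and friends are hypothetical names, so check `include/mupdf/fitz/structured-text.h` for the real fields. The pieces are appended to a buffer rather than printed, which is the only difference between an extraction loop and a printing loop.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

BLOCK_TEXT, BLOCK_IMAGE = "text", "image"  # hypothetical stand-ins for the type field


@dataclass
class TextSpan:
    chars: str
    next: Optional["TextSpan"] = None  # spans form a singly linked list


@dataclass
class TextLine:
    first_span: Optional[TextSpan] = None


@dataclass
class TextBlock:
    lines: List[TextLine] = field(default_factory=list)  # lines are still an array


@dataclass
class ImageBlock:
    name: str = "image"


@dataclass
class PageBlock:
    type: str
    block: Union[TextBlock, ImageBlock]


def extract_text(blocks: List[PageBlock]) -> str:
    out: List[str] = []
    for pb in blocks:
        if pb.type != BLOCK_TEXT:
            continue  # check the type field first; image blocks carry no text
        for line in pb.block.lines:
            span = line.first_span
            while span is not None:  # walk the linked list instead of indexing
                out.append(span.chars)
                span = span.next
            out.append("\n")
    return "".join(out)
```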
sebras | tor8: I agree with your description of your test_device-patch, but I don't understand how iscolor relates to dev->user..? | 14:06.26 |
| tor8: that just seems wrong. why is dev->user passed to fz_test_color()..? | 14:06.51 |
tor8 | sebras dev->user points to an integer that contains the boolean result of the iscolor 'test' | 14:07.19 |
| fz_new_test_device(ctx, &iscolor) | 14:07.38 |
| run page | 14:07.40 |
| read iscolor value | 14:07.43 |
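The `dev->user` arrangement is an out-parameter: the caller owns the result variable, hands the device a pointer to it at creation time, runs the page, then reads the variable. A Python sketch of the same flow, with a mutable object standing in for `&iscolor`. `ColorFlag`, `ColorTestDevice` and `run_page` are hypothetical names modeled loosely on the `fz_new_test_device(ctx, &iscolor)` call above, and the exact-equality color check is a simplification; the real device presumably applies its own rule.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

RGB = Tuple[float, float, float]


@dataclass
class ColorFlag:
    """Stands in for the caller's `int iscolor`; the device writes through it."""
    value: bool = False


class ColorTestDevice:
    """Hypothetical device: remembers whether any non-gray color was used."""

    def __init__(self, user: ColorFlag) -> None:
        self.user = user          # analogous to dev->user pointing at &iscolor
        self.user.value = False   # start from "not color"

    def fill(self, rgb: RGB) -> None:
        r, g, b = rgb
        if not (r == g == b):     # exact test; a real device would allow some tolerance
            self.user.value = True


def run_page(dev: ColorTestDevice, fills: Iterable[RGB]) -> None:
    """Stand-in for running a page: feed every fill color to the device."""
    for rgb in fills:
        dev.fill(rgb)
```

Usage mirrors the three steps above: create `flag = ColorFlag()`, call `run_page(ColorTestDevice(flag), fills)`, then read `flag.value`.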
buildMuPDF | Could you change it, please? As I am only programming in Java and don't know your library, it is very difficult for me. | 14:07.51 |
sebras | tor8: ah! now I see it! it is passed through fz_new_device(). ok then it makes sense. | 14:08.19 |
| tor8: ok, LGTM. | 14:09.49 |
tor8 | buildMuPDF: look at fz_print_text_page http://git.ghostscript.com/?p=mupdf.git;a=blob;f=source/fitz/stext-output.c;h=6ed595fc1dac90a04da38db57a15d5d49ed06037;hb=HEAD#l363 and just restructure the code you have to that loop framework | 14:09.55 |
| sebras: thanks. | 14:10.00 |
| buildMuPDF: replace the /* for now lets just flatten */ block with the code in fz_print_text_page, substituting the printf for append_chars | 14:10.59 |
buildMuPDF | Including "void"? void fz_print_text_page(fz_context *ctx, fz_output *out, fz_text_page *page) | 14:14.55 |
tor8 | buildMuPDF: I thought you said you knew Java? C isn't that much different. | 14:17.26 |
sebras | buildMuPDF: you can't just quote the entire code including the argument declarations inside another function in C. in the same way you cannot do this in Java, right..? | 14:20.12 |
buildMuPDF | I just want to know if I have to include the void in line 362 or leave it out? (You pointed to l363) | 14:20.14 |
sebras | buildMuPDF: actually you don't need line 363 either. | 14:20.46 |
tor8 | buildMuPDF: I linked to the function and said copy the contents (of the function). copying the function would be rather pointless; as it already exists... | 14:20.49 |
buildMuPDF | Ok thank you. | 14:21.15 |
sebras | buildMuPDF: you need to read and understand your original extract_text(). you must understand how it loops over each type of data structure to get at the pieces of text and how it stores them in the output string. then you need to read a bit of fz_print_text_page() to see how it differs in how it loops over the data structures to get at the pieces of text. in this case they are printed instead of being appended to a string. | 14:23.15 |
| buildMuPDF: I think it will help you in the future if you actually spend the time to learn this now, I mean you might end up having to interact more with the native mupdf C library in the future and starting out with this quite simple code is a great way to start. :) | 14:24.31 |
tor8 | buildMuPDF: something like this http://collabedit.com/qu7rp | 14:25.10 |
buildMuPDF | I still get errors. | 14:48.58 |
| http://dpaste.com/2FK9HY4 | 14:50.13 |
kens | Well the easy way to solve those is to cast them to const types | 14:50.53 |
| Oh and you'll need to see what 'out' should be and define it | 14:51.15 |
buildMuPDF | Where do I see what out should be? You mean "fz_printf(out, "\n");"? | 14:54.37 |
sebras | buildMuPDF: did you see the changes that tor8 and kens did for you? | 14:54.45 |
| buildMuPDF: http://collabedit.com/qu7rp | 14:54.50 |
kens | Wasn't me, must have been tor | 14:54.59 |
sebras | buildMuPDF: it's a collaborative text editor online. | 14:55.00 |
| kens: oh, I thought you contributed too. :) | 14:55.15 |
kens | I logged in to read it, I wouldn't want to edit it and send someone wrong | 14:55.39 |
robin_watts_mac | tor8: OK, do you still need your commits reviewed? Which ones? | 15:00.12 |
tor8 | robin_watts_mac: no, I'm all set thanks to sebras | 15:00.33 |
| go back to vacation! | 15:00.35 |
robin_watts_mac | ok. | 15:00.43 |
kens | Still in Chile Robin ? | 15:00.57 |
sebras | robin_watts_mac: bye! :) | 15:01.00 |
robin_watts_mac | kens: yeah, got back to Santiago from Easter Island yesterday. | 15:01.35 |
| Off to Dallas tonight. | 15:01.42 |
kens | Say hi to Scott for us :-) | 15:01.52 |
robin_watts_mac | Seeing Scott tomorrow, then we fly back to the UK on wed. | 15:01.57 |
| Will do. | 15:01.59 |
kens | Hmm, thunder.... :-( | 15:02.04 |
pedro_mac | Robin: cool - have a safe trip back | 15:02.27 |
buildMuPDF | I'm running ndk-build and it looks really good. - It's still running without any errors :) | 15:02.48 |
kens | pedro_mac : just wants to ensure someone else comes and works on Smart Office :-) | 15:03.02 |
pedro_mac | kens: :) | 15:03.26 |
buildMuPDF | tor8: Thank you very much! | 15:03.38 |
sebras | buildMuPDF: most importantly. do you understand it better now..? | 15:04.47 |
kens | oh boy lightning now too, if I drop off suddenly you'll know why.... | 15:05.58 |
buildMuPDF | Yes. Not everything but more than before. | 15:09.30 |
sebras | buildMuPDF: excellent, good work. :) | 15:10.22 |
buildMuPDF | Thank you and goodbye. | 15:12.31 |
kens | bb | 15:12.44 |
robin_watts_mac | see y'all in dallas. | 15:43.39 |
mvrhel_laptop | bbiab | 15:48.07 |
kens | Night all | 16:04.03 |
pedro_mac | g’nite kens | 16:05.21 |
nemo | rayjj: WB | 16:40.30 |
| so, (a bit later) how did you flatten that PDF? might be worth trying to do it | 16:41.01 |
| although... a fair number of pages are colour, so it might be just too difficult to discriminate | 16:41.17 |
| probably better to just focus on using mutools to get more reliable jpeg compression than ghostscript can provide | 16:41.33 |
rayjj | nemo: OK, so I have something that cleans up the image | 16:56.16 |
| nemo: I don't think mutool (or mudraw) can do this since it relies on image transfer function | 16:57.12 |
| mudraw only supports gamma AFAICT | 16:57.50 |
nemo | rayjj: well, I was thinking more extracting the jpegs using mutools, then putting them back using it | 16:58.23 |
| which you'd suggested | 16:58.26 |
| (putting 'em back after manipulating in imagemagick or whatever) | 16:58.40 |
rayjj | nemo: I have a command line that uses gs to do it in one step | 16:58.56 |
nemo | I could, I suppose, try to identify what the image is based on its colour profile and pick a different technique. could be a bit tedious tho | 16:59.04 |
| but just making ghostscript use less hideous jpeg compression would already be a win | 16:59.39 |
| rayjj: oh neat | 16:59.58 |
| I love copying and pasting commandlines! | 17:00.13 |
| (slow at reading) | 17:00.17 |
rayjj | image result at http://casper.ghostscript.com/~ray/after_transfer_w_DCT.pdf -- command line: | 17:04.36 |
| gswin32c -dColorConversionStrategy=/Gray -dProcessColorModel=/DeviceGray -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterGrayImages false /GrayImageFilter /DCTEncode >> setdistillerparams {dup .95 gt { pop 1 } if } settransfer" -f before.pdf | 17:04.38 |
| nemo: this converts the image to Gray, and the result is 168,492 bytes. If I keep the transfer function, but stay in RGB, it is 177,083 bytes. That image is at http://casper.ghostscript.com/~ray/after_transfer_w_DCT_RGB.pdf | 17:07.47 |
| for that, just leave off the -dColorConversionStrategy=/Gray -dProcessColorModel=/DeviceGray options | 17:08.14 |
| oops. not exactly. Actually: | 17:09.13 |
| gswin32c -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode >> setdistillerparams { dup .95 gt { pop 1 } if } settransfer" -f before.pdf | 17:09.14 |
nemo | hmmm | 17:09.44 |
nemo | fires that off on the whole doc | 17:09.50 |
rayjj | the "x.pdf" is just to simplify my testing w/ various settings -- I renamed afterwards | 17:09.56 |
nemo | huh.. | 17:10.45 |
rayjj | nemo: the key is to force the DCTEncode because if I use the transfer functions and leave Auto__ImageFilter true then it selects FlateEncode and that is MUCH larger | 17:11.26 |
nemo | setdistillerparams { dup .95 ... that's for quality 95%? | 17:11.53 |
rayjj | nemo: no, that's part of the non-linear transfer function | 17:12.17 |
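In PostScript, `settransfer` installs a procedure that maps each component value in [0, 1] to a new value. `{ dup .95 gt { pop 1 } if }` duplicates the input, compares the copy with 0.95, and if it is greater, pops the original and leaves 1 (white); otherwise the input passes through unchanged. So it snaps near-white background "mud" to pure white before the image is re-encoded. The sketch below mirrors that logic in Python and applies it to 8-bit samples, assuming the usual linear mapping of 0..255 onto 0..1; it is an illustration, not how Ghostscript implements transfer functions internally.

```python
from typing import List


def transfer(v: float) -> float:
    """Mirror of the PostScript procedure { dup .95 gt { pop 1 } if }."""
    return 1.0 if v > 0.95 else v  # near-white snaps to white, the rest is untouched


def apply_to_8bit(samples: List[int]) -> List[int]:
    """Apply transfer() to 8-bit samples, assuming 0..255 maps linearly to 0..1."""
    return [round(transfer(s / 255) * 255) for s in samples]
```

For 8-bit data the cut-off falls between 242 (kept) and 243 (forced to 255).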
nemo | ah... | 17:12.20 |
rayjj | the "mud" is mostly junk that was in the original that was made more visible by the second JPEG | 17:12.52 |
nemo | ugh. I hate working w/ git | 17:12.56 |
rayjj | nemo: quality 95% is the default AFAICT | 17:13.06 |
nemo | hm | 17:13.15 |
| I should print out the encoded jpegs gimp creates | 17:13.36 |
| I hadn't done that yet | 17:13.39 |
| printing does seem to make the mud more visible | 17:13.47 |
rayjj | nemo: I haven't checked the QFactor actually being used. I am confused by the ACSImageDict and the ImageDict -- I don't know which one is it using (I'm not the pdfwrite expert, and kens is gone for the day) | 17:16.31 |
| nemo: printing _would_ intensify the light gray dots due to dot gain, particularly on a laser engine | 17:17.09 |
| nemo: can you try printing either of the pages I posted and compare to 'before.pdf" printed ? | 17:17.54 |
nemo | rayjj: hm. um... can you relink? | 17:18.06 |
rayjj | relink ??? | 17:18.18 |
nemo | rayjj: I'm trying to get !@#$ git to restore a file I'd deleted just so I can try your commandline | 17:18.22 |
| rayjj: post the link again. disappeared in history over weekend and I'm on a new machine | 17:18.37 |
| re-link | 17:18.40 |
rayjj | git status -u shows the changed file, right ? | 17:18.49 |
| then just use git checkout <changed_file> | 17:19.18 |
| that'll restore to the "master" file | 17:19.39 |
| nemo: even if it has been deleted | 17:19.59 |
nemo | rayjj: yeah, I eventually got someone to tell me it was checkout | 17:20.05 |
| it annoys me that it is so... different | 17:20.11 |
| mercurial manages to be close enough to svn/cvs that I had no trouble adapting | 17:20.28 |
rayjj | different to svn and cvs, yeah. But I've more or less gotten on terms with it | 17:20.38 |
nemo | besides not screwing around with history and maintaining a clear timeline | 17:20.39 |
| rayjj: ♥ mercurial | 17:20.46 |
| I normally just convert to a mercurial repo if I need to do a lot of stuff | 17:21.18 |
| but I'm mostly treating this repo as readonly | 17:21.24 |
rayjj | I like having a local repository, which svn and cvs don't have | 17:21.31 |
nemo | I was just trying to clean up from the screwing around last week | 17:21.33 |
| rayjj: yeah. that's what mercurial is for :D | 17:21.42 |
| only more intuitive to use | 17:21.45 |
rayjj | admits that git is *NOT* intuitive | 17:22.11 |
| I'm not sure about trusting a repository to something named "mercurial" :-) | 17:23.25 |
nemo | rayjj: heh. "git" is hardly better in english | 17:23.57 |
rayjj | synonyms: volatile, capricious, temperamental, excitable, fickle, changeable, unpredictable, variable, protean, mutable, erratic, quicksilver, inconstant, inconsistent, unstable, unsteady, fluctuating, ever-changing, moody, flighty, wayward, whimsical, impulsive | 17:24.11 |
| nemo: well, to those starting to use it, "git" as slang for "spawn of the devil" seems appropriate | 17:24.48 |
nemo | rayjj: well, DVCS are indeed rather "protean" | 17:25.36 |
| but mercurial has more of a backbone than git | 17:25.41 |
| http://mercurial.selenic.com/wiki/GitConcepts | 17:25.51 |
| http://www.webmonkey.com/2010/03/a-subversion-users-guide-to-mercurial-version-control/ | 17:26.07 |
rayjj | nemo: so which files do you need posted ? The 'before.pdf" ? (I assume that you have the ones from today) | 17:27.02 |
| nemo: http://casper.ghostscript.com/~ray/before.pdf | 17:28.03 |
nemo | rayjj: I mean, your processed file | 17:28.21 |
| you wanted me to print it | 17:28.25 |
| 13:17 < rayjj> nemo: can you try printing either of the pages I posted and compare to 'before.pdf" printed ? | 17:28.37 |
| obv I have "before" ;) | 17:28.47 |
rayjj | the ones I just uploaded today are http://casper.ghostscript.com/~ray/after_transfer_w_DCT.pdf (Gray) and http://casper.ghostscript.com/~ray/after_transfer_w_DCT_RGB.pdf | 17:30.09 |
nemo | hm | 17:36.25 |
| yeah, I dunno... | 17:36.49 |
| I'll run it past the boss | 17:36.53 |
| also. let me see what happens if I use gimp | 17:37.08 |
| frankly, this is more for my peace of mind | 17:37.30 |
| they are perfectly happy to toss 30 gigabytes of badly scanned PDFs into the database | 17:37.42 |
rayjj | nemo: also I have uploaded ones with different QFactor settings: http://casper.ghostscript.com/~ray/after_transfer_DCT_QF_95.pdf QF_76, QF_40 and QF_15; you can see the size differences | 17:37.44 |
nemo | hm | 17:37.48 |
| how did you do that? | 17:37.51 |
| set the QF? | 17:37.54 |
| 404 | 17:38.23 |
rayjj | command line: | 17:38.43 |
| gswin32c -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode /ColorImageDict << /QFactor 0.15 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> >> setdistillerparams { dup .95 gt { pop 1 } if } settransfer" -f before.pdf | 17:38.44 |
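Producing the QF_95/QF_76/QF_40/QF_15 variants is easier to script than to retype. Passing an argument list to a process launcher, rather than one shell string, avoids quoting the PostScript snippet. `build_gs_args` is a hypothetical helper that only reproduces the colour options in the command above; on Linux the executable is normally `gs` rather than `gswin32c`. Lower QFactor values mean finer quantization, hence better quality and larger files, consistent with the sizes reported here.

```python
from typing import List


def build_gs_args(infile: str, outfile: str, qfactor: float,
                  gs: str = "gswin32c") -> List[str]:
    """Rebuild the pdfwrite command line above with a configurable JPEG QFactor."""
    image_dict = ("/ColorImageDict << /QFactor " + str(qfactor) +
                  " /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >>")
    postscript = (
        # Force DCT (JPEG) instead of letting AutoFilter pick Flate.
        "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode "
        + image_dict + " >> setdistillerparams "
        # Snap near-white values to white before the image is re-encoded.
        "{ dup .95 gt { pop 1 } if } settransfer"
    )
    return [gs, "-o", outfile, "-sDEVICE=pdfwrite", "-dPDFSETTINGS=/ebook",
            "-c", postscript, "-f", infile]
```

To actually run it, pass the list to `subprocess.run(..., check=True)`, e.g. in a loop over `[0.95, 0.76, 0.40, 0.15]` with an output name per QFactor (the naming scheme is up to you).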
| nemo: oops: http://casper.ghostscript.com/~ray/after_transfer_DCT_RGB_QF_95.pdf | 17:39.30 |
| the 15 is somewhat cleaner than the 95, but the file size is 381,456 vs 173,296 | 17:42.36 |
| the before.pdf was 799,160 | 17:43.22 |
nemo | so one thing that puzzles me, is I thought "before" was lossless | 17:43.34 |
| so why would there be a double encoding issue | 17:43.39 |
rayjj | nemo: no, the before was DCT | 17:44.12 |
nemo | hmmm | 17:44.19 |
| 'k | 17:44.20 |
| thanks | 17:44.21 |
| I missed that | 17:44.23 |
| rayjj: I thought they were all Flate | 17:44.35 |
| but perhaps some of the pages were DCT based on whatever the scan tool was doing | 17:44.44 |
rayjj | object 3 from "before": <</BitsPerComponent 8/ColorSpace/DeviceRGB/Filter/DCTDecode/Height 4394/Length 757424/Subtype/Image/Type/XObject/Width 3435>> | 17:44.59 |
nemo | rayjj: so yeah, those links are 404 fwiw | 17:45.12 |
rayjj | nemo: I just tested it !!! | 17:45.41 |
nemo | you're right. | 17:45.43 |
| huh... | 17:45.44 |
| not the links | 17:45.47 |
| the DCT | 17:45.48 |
| I just checked the original file | 17:45.52 |
| strings foo.pdf | grep -E Filter.*DCT | wc -l | 17:46.14 |
| 307 | 17:46.14 |
| all DCT. bleah | 17:46.18 |
rayjj | I just tested the link posted right before the file size | 17:46.19 |
nemo | /msg'd | 17:47.33 |
rayjj | nemo: FWIW, if I don't force DCT, then with the transfer function it uses Flate, and the RGB size is 922,774 | 17:48.10 |
nemo | ew | 17:48.15 |
| rayjj: that's even w/ downsampling DPI? | 17:48.21 |
| oh wait | 17:48.22 |
| you didn't reduce DPI! | 17:48.25 |
| getting lower than 400 was kinda one of the main goals :) | 17:48.45 |
rayjj | nemo: yes, I did: <</Subtype/Image/ColorSpace/DeviceRGB/Width 1288/Height 1647/BitsPerComponent 8/Filter/FlateDecode/DecodeParms<</Predictor 15/Columns 1288/Colors 3>>/Length 922657>> | 17:49.39 |
| nemo: so that's 150 dpi | 17:51.15 |
nemo | ahhh | 17:51.19 |
| I missed that in the commandline above | 17:51.27 |
| hm'k | 17:51.39 |
rayjj | nemo: it's implied in the /ebook | 17:51.41 |
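The two image dictionaries confirm the downsampling. Going from 3435×4394 pixels in "before" to 1288×1647 at the 150 dpi /ebook target implies a source resolution of about 400 dpi in both directions, matching the 400 dpi nemo mentions, and at 400 dpi the page is roughly 8.59 × 10.99 inches, close to US Letter. A small sketch of that arithmetic:

```python
def implied_dpi(src_px: int, out_px: int, out_dpi: float) -> float:
    """Source resolution implied by downsampling src_px pixels to out_px at out_dpi."""
    return src_px * out_dpi / out_px


def page_inches(px: int, dpi: float) -> float:
    """Physical extent of px pixels at the given resolution."""
    return px / dpi


if __name__ == "__main__":
    print(implied_dpi(3435, 1288, 150), implied_dpi(4394, 1647, 150))  # ~400.04, ~400.18
    print(page_inches(3435, 400), page_inches(4394, 400))              # 8.5875 10.985
```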
nemo | well. that still helps | 17:51.41 |
| oh :( | 17:51.45 |
| ← noob at this | 17:51.51 |
| hm | 17:52.18 |
| so. ... | 17:52.21 |
| find -name "*.pdf" | while read f;do echo "$f $(strings "$f" | grep -E "Filter.*DCT" | wc -l) $(strings "$f" | grep -E "Filter.*Flate" | wc -l)";done | 17:52.26 |
| this is probably a stupid hack, but... | 17:52.32 |
| _.pdf 36 79 _.pdf 197 400 _.pdf 39 85 etc etc | 17:52.49 |
| pages are about 50:50 flate/dct | 17:52.56 |
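The `strings | grep` count tallies lines that mention a filter anywhere, so page content streams and fonts, which are often Flate-compressed too, may inflate the Flate column. A slightly more targeted sketch reads the raw PDF bytes and counts filters only in dictionaries marked `/Subtype /Image`. It is still a heuristic: objects packed into compressed object streams (PDF 1.5 and later) are invisible to it, and stray byte patterns inside binary stream data could in principle produce false matches.

```python
import re
from collections import Counter

# "N G obj" followed by the object's dictionary, up to its stream or end.
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)(?:stream|endobj)", re.S)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image")
FILTER_RE = re.compile(rb"/Filter\s*\[?\s*/(\w+)")  # first filter if it is an array


def count_image_filters(pdf: bytes) -> Counter:
    """Count image XObjects per (first) filter name, e.g. DCTDecode, FlateDecode."""
    counts: Counter = Counter()
    for m in OBJ_RE.finditer(pdf):
        header = m.group(1)
        if not IMAGE_RE.search(header):
            continue  # skip content streams, fonts, etc.
        fm = FILTER_RE.search(header)
        counts[fm.group(1).decode("ascii") if fm else "None"] += 1
    return counts
```

For a file on disk: `count_image_filters(open("foo.pdf", "rb").read())`.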
| I missed that in the first couple of sample files I pulled. ugh | 17:53.03 |
| their tool must have been selecting based on the page | 17:53.10 |
rayjj | nemo: me, too. BTW, I force conversion to Gray, then the Flate file size is 499,385 | 17:53.44 |
nemo | rayjj: The annoying thing is colour is so *rare* | 17:54.18 |
rayjj | nemo: they probably use Auto_ImageFilter true, so it depends on the image contents | 17:54.26 |
nemo | but I'd really have to consider on a page by page basis | 17:54.27 |
rayjj | nemo: I suggest setting the QFactor to a low number, and just go with color. Even at 0.15, it is _still_ smaller than Flate flattened to Gray | 17:55.31 |
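Collecting the sizes quoted so far makes the trade-off easier to see. The byte counts are copied from the messages in this log (for the RGB Flate case, the 922,774 figure); the labels only summarize which settings produced each file.

```python
BEFORE = 799_160  # before.pdf, bytes

# Sizes (bytes) as reported in the log; labels summarize the settings used.
SIZES = {
    "DCT, Gray, transfer fn": 168_492,
    "DCT, RGB, QFactor 0.95": 173_296,
    "DCT, RGB, transfer fn": 177_083,
    "DCT, RGB, QFactor 0.15": 381_456,
    "Flate, Gray, transfer fn": 499_385,
    "Flate, RGB, transfer fn": 922_774,
}


def relative(size: int, baseline: int = BEFORE) -> float:
    """Size as a fraction of the original file."""
    return size / baseline


if __name__ == "__main__":
    for label, size in sorted(SIZES.items(), key=lambda kv: kv[1]):
        print(f"{label:26} {size:>9,} bytes {relative(size):7.1%}")
```

Every forced-DCT variant comes in under half of "before", while letting Flate through on RGB ends up larger than the original.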
| I can upload the Flate if you want to compare with the QF files | 17:55.58 |
| nemo: http://casper.ghostscript.com/~ray/after_transfer_Flate_Gray.pdf | 17:58.32 |
| nemo: I have to run an errand. bbiaw | 18:01.06 |
| nemo: can you see all of the QF files now ? | 18:01.23 |
nemo | huh. why don't I see RGB for flate | 18:06.50 |
| trying to figure out how many pages they did RGB and how many not | 18:07.34 |
rayjj | nemo: I didn't post the RGB for the Flate (it was 922,738 bytes, so larger than "before") | 18:42.12 |
nemo | rayjj: I mean, I was trying to figure out how many pages they did RGB and how many black and white, if any | 18:43.03 |
| rayjj: is Flate always colour? | 18:43.10 |
| basically, we are trying to determine how screwed up the 2nd batch of PDFs was | 18:43.26 |
rayjj | nemo: no, Flate can be used for Gray or Color | 18:44.08 |
nemo | hm | 18:44.11 |
| trying to figure out where that is in the filter line | 18:44.43 |
rayjj | nemo: gs can examine files and check if they have color, but the definition of 'neutral' is compiled in | 18:44.57 |
nemo | if it doesn't say, is default colour? | 18:45.07 |
| rayjj: I was just grepping the files to get a general idea | 18:45.15 |
| strings foo.pdf | grep -E "Filter.*Flate" | 18:45.22 |
rayjj | in the command line, if one doesn't force ProcessColorModel and ColorConversionStrategy, then the output will be in whatever colorspace came in (per image) | 18:46.00 |
| grep for DeviceGray, maybe ? | 18:46.20 |
| or DeviceRGB. | 18:46.33 |
| but in the 'before' file you sent, the image was RGB even though it looked like just shades of Gray | 18:47.10 |
| nemo: give me a sec and I'll check what gs thinks about that 'before' page. | 18:47.35 |
nemo | yeah | 18:47.35 |
| rayjj: that Before one was the first batch, which was just stupid | 18:47.43 |
| rayjj: 2nd batch is better | 18:47.48 |
| trying to figure out approaches for both | 18:47.54 |
| actually, the 2nd batch is just weird | 18:48.03 |
| they picked 400DPI for everything, but often used B&W flate or even CCIT Fax | 18:48.23 |
| CCITT Fax | 18:48.31 |
| that is, a scanned page flattened to literally B&W, not greyscale | 18:48.45 |
rayjj | nemo: if it doesn't have shades of gray, then CCITT is best compression | 18:49.04 |
nemo | so I'm like... why... why are we using 400DPI here? obviously quality went out the window, not that it was ever really there to begin with | 18:49.05 |
| rayjj: yeah, I'm just confused by the parameters chosen ☺ | 18:49.17 |
| the choice of compression was probably indeed by some scanning software | 18:49.29 |
| which probably also flattened the pages | 18:49.36 |
| I guess what is happening is, the 2nd batch does auto WB. | 18:49.54 |
| And as a result, some pages are close enough to B&W to trigger the software they are using to go into that mode | 18:50.08 |
| and they kept 400DPI, just 'cause. | 18:50.14 |
| the problem for me, ofc, is that it is harder to tell gs to be smart about such a crazy, crazy mess | 18:50.28 |
| rayjj: I'm thinking what I need gs to do really is just reduce DPI, but keep whatever algorithm they used | 18:50.49 |
| maybe sometimes they used DCT or Flate inappropriately, but whatever. | 18:51.02 |
rayjj | nemo: If the image comes in CCITT, gs will emit CCITT since that is 1 bpp and is lossless | 18:51.19 |
nemo | and actually, on the original batch, where Before came from, not removing background is a win | 18:51.22 |
| since it hides the double-encoding of JPEG ☺ | 18:51.35 |
| and I'm convinced now that's where all the JPEG problems that were driving me batty came from. | 18:51.56 |
| well, I don't really consider it a win, but boss does :D | 18:52.15 |
| but. eh. lemme fire off your last suggested commandline against one from the new batch, and one from the old batch | 18:52.43 |
| I'm sure the DBA will appreciate it regardless | 18:52.56 |
| rayjj: and... not a single one of the Flate pages had DeviceRGB or DeviceGray, so... going to guess it is just RGB | 18:56.02 |
rayjj | nemo: grep may not be reliable. And there are other colorspaces that it might have used. | 18:58.18 |
nemo | rayjj: I was eyeballing the strings, and couldn't find any mention of Device | 18:59.14 |
| amusingly ColorSpace/DeviceGray is on the CCITTFaxDecode lines | 18:59.53 |
| oh well whatever | 18:59.56 |
rayjj | nemo: try grepping for "/Subtype.*Image" | 19:00.05 |
| hmm... my grep calls it a binary file, so won't print the line :-( | 19:00.30 |
nemo | heh | 19:00.39 |
| I always run strings first | 19:00.42 |
| tidier | 19:00.45 |
rayjj | it just says: Binary file /c/Users/ray/Downloads/before.pdf matches | 19:00.51 |
nemo | strings *.pdf | grep -E "/Subtype.*Image" | 19:00.58 |
| strings *.pdf | grep -E "/Subtype.*Image" | grep Flate | 19:01.20 |
| returns nothing | 19:01.22 |
| I did 2 greps 'cause I have no idea what order that should be in the line, and didn't feel like writing a complicated regex ☺ | 19:01.39 |
| just... really weird | 19:01.42 |
| oh well. | 19:01.45 |
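Running two greps works because each one tests for a single pattern, so the order of `/Subtype /Image` and the filter name within the line doesn't matter. A minimal Python sketch of the same order-agnostic check, assuming (as `strings` does) that only runs of printable ASCII are worth looking at; `printable_runs` and `image_lines` are hypothetical helper names, not part of any tool mentioned here. As rayjj pointed out, this kind of grepping is not reliable: image dictionaries stored in compressed object streams, or split across lines, will not show up.

```python
import re


def printable_runs(data: bytes, min_len: int = 4):
    """Approximate `strings`: runs of at least min_len printable ASCII bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


def image_lines(data: bytes, *needles: str):
    """Return runs that mention /Subtype and /Image plus every extra needle.

    Each needle is a plain substring test, so the order in which the keys
    appear inside the dictionary does not matter (like chaining greps).
    """
    wanted = ("/Subtype", "/Image") + needles
    return [run for run in printable_runs(data) if all(n in run for n in wanted)]
```

For example, `image_lines(open("before.pdf", "rb").read(), "Flate")` returns roughly what `strings before.pdf | grep -E "/Subtype.*Image" | grep Flate` prints; adding `"/DeviceRGB"` as another needle narrows the match further without caring about position.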
rayjj | but strings | grep "Subtype.*Image" does give me the line | 19:02.09 |
| nemo: just run it without the second grep to make sure it is showing the Image object | 19:02.40 |
nemo | it was | 19:02.48 |
| lots of CCITT fax lines :) | 19:02.59 |
| but, I think bosses are pushing back on them to just rescan everything \o/ \o/ | 19:03.15 |
rayjj | nemo: there is nothing that can be done to reduce the size of the CCITT pages | 19:03.23 |
nemo | we'll see. if they refuse, back to the programmer here w/ OCD to try and tidy it up | 19:03.28 |
| rayjj: yeah, I don't care about them frankly | 19:03.36 |
| rayjj: well... lower DPI would probably help some, no? ☺ | 19:03.46 |
| but frankly, more worried about the new ones | 19:03.52 |
| er, more worried about the RGB Flate/DCT pages | 19:04.18 |
rayjj | nemo: lower dpi doesn't help if it goes from DCT to Flate | 19:06.18 |
nemo | rayjj: oh sure. Re "nothing can be done for CCITT": lower DPI would still be a bit of a win there, but those files are so tiny that, eh, who cares | 19:06.42 |
rayjj | nemo: BTW, even though it looks Gray, with GrayDetection=true gs thinks it has color (with the default tolerance of 5/255) | 19:11.26 |
nemo | heh | 19:13.30 |
| well, it is a scan | 19:13.34 |
| even white paper probably doesn't look white | 19:13.50 |
| esp after sitting in a folder for a while | 19:14.01 |
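Why a page that "looks Gray" still counts as colour can be illustrated with a simple neutrality test. This is a hedged sketch, assuming a pixel is neutral when its 8-bit channels differ by at most 5 (the 5/255 tolerance rayjj mentions); Ghostscript's real test is compiled in and may not be this simple max-minus-min comparison. `is_neutral` and `looks_gray` are hypothetical names.

```python
def is_neutral(rgb, tol=5):
    """True if the channels of an 8-bit (r, g, b) pixel differ by at most tol."""
    return max(rgb) - min(rgb) <= tol


def looks_gray(pixels, tol=5):
    """Treat the image as gray only if every pixel is (nearly) neutral."""
    return all(is_neutral(p, tol) for p in pixels)
```

A slightly yellowed "white" such as (250, 246, 232) has a spread of 18, well over 5, so a single patch of aged paper is enough for a test like this to classify the whole scan as colour.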
rayjj | nemo: I changed the transfer function a bit, and it looks better, IMHO. Please look at the files after_transfer_DCT_RGB_QF_15.pdf (and 40, 76 and 90) | 19:26.28 |
| it cleans up more of the dots around the text, making them lighter. The overall image is slightly lighter, too | 19:27.19 |
| I used { .93 div dup 1 gt { pop 1 } if } settransfer for these | 19:27.41 |
nemo | heh. 90 is still 404 ☺ | 19:32.27 |
| but I tried the others and those do work | 19:32.42 |
| I hadn't tried them before | 19:32.46 |
| huh. I must not understand how the quality factor works - it's strange that 15 is the larger one. I thought quality decreased from 1.0 to 0 | 19:33.31 |
rayjj | nemo: This works *MUCH* better, and using DCT adds no noise (compared to the Flate output) even at QF 40, which has a file size of 253,283. The 76 adds a few dots, and the 95 quite a few | 19:37.27 |
| nemo: TBH, so did I. I'm just reporting what I see. But Notes 7, 8, 9, and 10 in the Ps2pdf.htm document seem to imply that 0.15 is used for "prepress" (the best) and 0.95 for screen and ebook | 19:39.22 |
| umm. 0.76 for screen and ebook, and 0.9 in general | 19:40.05 |
| the "printer" setting is 0.40 | 19:40.26 |
nemo | weird | 19:41.01 |
rayjj | nemo: BTW, it's 95, not 90. Sorry | 19:41.05 |
nemo | ahhhh | 19:41.14 |
rayjj | anyway, you have the "magic" to get the cleaned up file size, in color, down to 250K or so. | 19:41.56 |
nemo | 13:38 < rayjj> gswin32c -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode /ColorImageDict << /QFactor 0.15 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> >> setdistillerparams { dup .95 gt { pop 1 } if } settransfer" -f before.pdf | 19:42.25 |
rayjj | at the "reduced" resolution of 150 dpi | 19:42.27 |
nemo | that one right ⺠| 19:42.30 |
| where the only thing to fiddle w/ is QFactor | 19:42.55 |
rayjj | nemo: not quite: | 19:43.15 |
| gswin32c -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode /ColorImageDict << /QFactor 0.15 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> >> setdistillerparams { .93 div 1 gt { pop 1 } if } settransfer" -f before.pdf | 19:43.16 |
nemo | oh. 'k | 19:43.20 |
| well, regardless of what they decide on, I think I'm going to find this handy in the future | 19:44.03 |
| dumping it to my tips n tricks folder | 19:44.12 |
| thanks. | 19:44.14 |
rayjj | rather than leaving dots untouched that are below 95% white, it lightens everything up linearly by dividing by 0.93 (mul by 1.07) and clamps at white == 1 | 19:44.48 |
| I looked at the gray shades for some of the "noise" dots and there were some below 240 | 19:45.40 |
| you can play with the "0.93" if the image lightness seems too much. A higher number lightens less | 19:46.12 |
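To see what the two transfer procedures do to a single component value, here is a hedged Python transcription. It assumes the usual PostScript convention that a transfer procedure receives and returns a value in 0..1, with 1 meaning white; the function names are hypothetical. `threshold_white` mirrors the earlier `{ dup .95 gt { pop 1 } if }` and `lighten_and_clamp` mirrors `{ .93 div dup 1 gt { pop 1 } if }`.

```python
def threshold_white(v, t=0.95):
    """{ dup .95 gt { pop 1 } if }: values above t snap to white, others are untouched."""
    return 1.0 if v > t else v


def lighten_and_clamp(v, k=0.93):
    """{ .93 div dup 1 gt { pop 1 } if }: scale every value up by 1/k, clamp at white."""
    out = v / k  # k < 1, so this brightens linearly (1/0.93 is about 1.075)
    return 1.0 if out > 1.0 else out
```

With k = 0.93, every component at or above 0.93 (about 238/255) ends up pure white, and everything darker is lightened proportionally, so a mid-gray 0.5 only moves to about 0.54. Raising k toward 1 shrinks both effects, which matches "a higher number lightens less".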
nemo | I'm gonna give it a shot at "40" - there's still a decent win in size and I couldn't pick out any artifacting. | 19:52.06 |
| this is gonna take a looooooong time to run tho ☺ | 19:52.15 |
| and, the lightness looks good to me | 19:52.24 |
kens | rayjj, QFactor, page 163 of the PLRM: | 19:53.55 |
| "Valid values are in the range 0 to 1,000,000. A value less than 1 | 19:53.55 |
| improves image quality but decreases compression; a value greater than 1 | 19:53.55 |
| increases compression but degrades image quality. Default value: 1.0." | 19:53.55 |
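The PLRM wording explains why QF 0.15 produced the larger files. A minimal sketch, assuming (as the DCTEncode description suggests) that QFactor scales each entry of the quantization table: larger steps round more DCT coefficients to zero, which compresses better and looks worse. The coefficient row is made up, the table row is the first row of the example JPEG luminance table, and zig-zag ordering and entropy coding are left out; `quantize` and `zero_count` are hypothetical names.

```python
def quantize(coeffs, quant_table, qfactor):
    """Quantize DCT coefficients with each table entry scaled by qfactor."""
    steps = [max(q * qfactor, 1e-6) for q in quant_table]  # guard against a zero step
    return [round(c / s) for c, s in zip(coeffs, steps)]


def zero_count(values):
    """More zeros means the entropy coder has less to store."""
    return sum(1 for v in values if v == 0)


# Made-up coefficients, plus the first row of the example JPEG luminance table.
COEFFS = [620, -35, 18, -9, 4, 2, -1, 0]
QUANT = [16, 11, 10, 16, 24, 40, 51, 61]
```

Comparing `zero_count(quantize(COEFFS, QUANT, q))` for q = 0.15, 0.40, 0.95 shows the count of zeros growing with QFactor, and at the 1,000,000 nemo tries next, every coefficient in this sketch quantizes to zero, i.e. the block's detail is thrown away entirely.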
nemo | O_o | 19:54.07 |
| hm. | 19:54.36 |
| I'm gonna try values bigger than 1 w/ your sample then | 19:54.49 |
| oh wait. no. n/m. I forgot. the default value was already deemed too ugly | 19:55.25 |
| eh. let's see what that range looks like | 19:55.46 |
| ~/git/ghostpdl/gs/bin/gs -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /AutoFilterColorImages false /ColorImageFilter /DCTEncode /ColorImageDict << /QFactor 1000000 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> >> setdistillerparams { .93 div 1 gt { pop 1 } if } settransfer" -f before.pdf | 19:56.38 |
| Error: /stackunderflow in --pop-- | 19:56.44 |
| hrm... | 19:57.05 |
| you sure that line didn't get trimmed? | 19:57.11 |
| that word "if" off on its own - I don't know much about the language used, but that seems odd | 19:57.28 |
| I'm assuming postscript, but the tutorials I've found so far, not too helpful | 19:59.51 |
rayjj | nemo: sorry. Let me cut and paste what worked for me. I had just hand edited yours | 20:03.15 |
nemo | ah well, yours is probably better | 20:03.31 |
rayjj | gswin32c -o x.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -c "<< /ColorImageDict << /QFactor 0.95 /Blend 1 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> /AutoFilterColorImages false /ColorImageFilter /DCTEncode >> setdistillerparams { .93 div dup 1 gt { pop 1 } if } settransfer" -f before.pdf | 20:04.58 |
| nemo: PS is a postfix operation language, so it's <bool> <proc> if (execute proc if bool is true) | 20:06.01 |
nemo | hm. so. my system release version of gs ran it just fine | 20:06.58 |
| 9.10 | 20:07.05 |
rayjj | so, I forgot the 'dup' after the 'div' sorry | 20:07.06 |
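The `/stackunderflow in --pop--` error follows directly from the postfix rule rayjj describes: without `dup`, the comparison `1 gt` consumes the only copy of the value, so the `{ pop 1 }` branch has nothing left to pop. A toy stack machine makes this visible. It is a hypothetical mini-interpreter covering only the operators in the transfer procedure, not Ghostscript's interpreter; `StackUnderflow` is an illustrative name echoing the gs error.

```python
class StackUnderflow(Exception):
    pass


def run(proc, stack):
    """Execute a tiny subset of PostScript: numbers, dup, pop, div, gt, if.

    proc is a list of tokens; a nested list stands for a { ... } procedure,
    which is pushed as data until 'if' decides whether to execute it.
    """
    def pop():
        if not stack:
            raise StackUnderflow
        return stack.pop()

    for tok in proc:
        if isinstance(tok, (int, float, list)):
            stack.append(tok)
        elif tok == "dup":
            x = pop()
            stack.extend([x, x])
        elif tok == "pop":
            pop()
        elif tok == "div":  # a b div -> a/b
            b, a = pop(), pop()
            stack.append(a / b)
        elif tok == "gt":  # a b gt -> a > b
            b, a = pop(), pop()
            stack.append(a > b)
        elif tok == "if":  # bool proc if
            body, cond = pop(), pop()
            if cond:
                run(body, stack)
        else:
            raise ValueError(f"unsupported operator: {tok}")
    return stack


# The transfer procedure receives the component value on the stack.
FIXED = [0.93, "div", "dup", 1, "gt", ["pop", 1], "if"]
BROKEN = [0.93, "div", 1, "gt", ["pop", 1], "if"]
```

`run(FIXED, [0.95])` leaves `[1]`, while `run(BROKEN, [0.95])` raises `StackUnderflow` inside the `pop` branch. For darker values, `BROKEN` does not raise but leaves the stack empty, so the procedure returns no result at all.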
nemo | | ./base/gsicc_manage.c:1685: gsicc_set_device_profile(): cannot find device profile | 20:07.21 |
| anyway, let's see if the system one does the trick. it was crashing before, but, eh, maybe I'll get lucky | 20:07.39 |
rayjj | nemo: I get that if I put /ProcessColorModel /DeviceGray /ColorConversionStrategy /Gray in the distillerparams dict (rather than as command line options). I am opening a bug. | 20:08.33 |
| for kens :-) | 20:08.48 |
nemo | moves them | 20:09.42 |
| er. wait. I don't see those. sooo. um. no idea what you mean | 20:10.55 |
| (thought I just needed to move some parameters outside of the quoted block) | 20:11.34 |
rayjj | nemo: I opened a bug for kens: http://bugs.ghostscript.com/show_bug.cgi?id=695420 | 20:32.00 |
| going offline for a bit... | 20:33.23 |
nemo | m'k | 20:33.38 |
| rayjj: well, that's a shame, the non-git version I have crashed as it did before | 20:46.35 |
| sooo guess I either have to fix bug 695420, or wait ☺ | 20:46.44 |
| fix/fall back to working version | 21:08.06 |
| hm | 21:08.07 |
| bisect. helps you, helps me | 21:08.13 |
| shame I dislike git oh so much | 21:08.19 |
henrys | mvrhel_laptop: any word back on the meeting time? | 21:19.03 |
mvrhel_laptop | henrys: yes :( | 21:32.32 |
| at the last minute she cancelled it on me | 21:32.46 |
| I am trying to salvage something now | 21:32.55 |
| They are a bit strange over there | 21:33.05 |
| She had cleared all of this earlier I had thought | 21:33.56 |
| and there were people in multiple groups that were interested | 21:34.19 |
henrys | mvrhel_laptop: so are you canceling the trip? | 22:07.14 |
mvrhel_laptop | henrys: good question. she still wants to meet to discuss the book that I am working on. however, I think we may even have to meet off site | 22:16.06 |
henrys | mvrhel_laptop: well, if you need a place to stay, I've plenty of room. | 22:20.35 |
mvrhel_laptop | henrys: sorry internet was down as cable guy was here working on it. | 23:46.31 |
| so trip is cancelled | 23:46.36 |
| so I will be here for the morning meeting | 23:47.07 |