| 2014/08/06 |
henrys | rayjj: jeitta_pcl.tar.gz in my home on casper. | 05:32.10 |
rayjj | henrys: thanks. | 06:05.09 |
| it seems that since mvrhel ran the numbers, some files have gotten slightly faster, but some (eg. PLRM-3rd.pdf pages 1-100) have gotten worse by almost 2:1 | 06:06.34 |
| I'm hoping I can see that on a fast machine -- running git bisect on the pi would be *painful* | 06:07.53 |
| unless I am miscalculating, it took 1.5 HOURS to build gs | 06:11.20 |
chrisl | rayjj: did you use the 9.14 release, or did you use the latest master code? | 06:12.08 |
rayjj | so even 8 bisect steps could be overnight -- assuming I could automate the performance check | 06:12.23 |
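A minimal sketch of the automation rayjj is describing: "git bisect run" drives a script that rebuilds gs and passes or fails on a timing threshold. The threshold, device, build command, and paths below are illustrative assumptions, not from the log:

    git bisect start <bad-commit> <good-commit>
    git bisect run ./check_perf.sh

    #!/bin/sh
    # check_perf.sh -- exit 0 (good) if the timed run beats the threshold,
    # exit 1 (bad) if it is slower; exit 125 makes bisect skip broken builds.
    make >/dev/null 2>&1 || exit 125
    # GNU time prints elapsed seconds on stderr; gs output goes to /dev/null
    t=$(/usr/bin/time -f "%e" ./bin/gs -q -dNOPAUSE -dBATCH -sDEVICE=ppmraw \
          -r600 -o /dev/null -dLastPage=100 PLRM-3rd.pdf 2>&1 >/dev/null)
    awk -v t="$t" 'BEGIN { if (t + 0 < 100) exit 0; exit 1 }'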
| chrisl: latest master | 06:12.33 |
chrisl | Hmm, it's disappointing that's slower, then :-( | 06:13.12 |
rayjj | chrisl: and I don't know for sure which version mvrhel benchmarked (just ~ last Aug) | 06:13.43 |
chrisl | something close to 9.10 I would guess | 06:14.29 |
rayjj | of the files mvrhel_laptop ran, J11 pdf and PLRM-3rd.pdf are the ones that slowed down. The others were equal or faster with current code | 06:16.36 |
chrisl | rayjj: if you want, I can bisect - I can cross compile for the pi here | 06:17.12 |
mvrhel_laptop | yes something close to 9.10 | 06:17.35 |
| it was in august of last year | 06:17.40 |
rayjj | chrisl: cross compile, then you'd have to push the binary and run it | 06:19.40 |
| chrisl: how would I do that ? | 06:19.52 |
| (the 1.5 hour build was a make from clean on the Pi) | 06:20.25 |
chrisl | rayjj: I use a system called buildroot to build a cross compile environment | 06:20.27 |
rayjj | chrisl: I did a bit with buildroot on the gumstix, but that was 3+ years ago | 06:21.15 |
| chrisl: can you tell me how to cross compile? If I get a binary, then I can easily automate the testing | 06:22.43 |
| chrisl: or if you want, I can give you an account on my Pi | 06:24.20 |
chrisl | rayjj: I have a pi here | 06:24.33 |
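For the record, a cross build along the lines chrisl means would look roughly like this; the toolchain prefix, the CCAUX setting for the build-host tools, and the paths are assumptions based on a typical buildroot setup, not his exact recipe:

    # on the build host, with the buildroot cross tools on PATH
    ./configure --host=arm-linux-gnueabihf CC=arm-linux-gnueabihf-gcc
    make CCAUX=gcc      # aux programs (genarch, mkromfs) must run on the host
    scp bin/gs pi@<pi-address>:    # then run the timings on the Pi itself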
rayjj | If you can look at just the PLRM and/or J11, that's enough | 06:24.50 |
| chrisl: in the meantime (tomorrow) I'll run the PCL tests | 06:25.21 |
chrisl | rayjj: I'll make a start once I've had a shower and other such morning rituals..... :-) | 06:25.54 |
rayjj | chrisl: I just forwarded the emails from mvrhel_laptop. The one with the XLS attachment has the gs parameters | 06:27.05 |
| chrisl: and COFFEE !!! | 06:27.14 |
| (or tea, if you are a wimp) | 06:27.37 |
mvrhel_laptop | thanks guys | 06:28.07 |
| I have to take a personal day tomorrow to deal with some issues with my mom | 06:28.21 |
rayjj | although referring to anyone that plays squash as a 'wimp' is probably *NOT* a good idea ;-) | 06:28.32 |
mvrhel_laptop | I will be off and on some, but likely off more | 06:28.37 |
rayjj | mvrhel_laptop: np. | 06:28.51 |
| mvrhel_laptop: I hope all goes well | 06:29.02 |
mvrhel_laptop | thanks | 06:29.07 |
rayjj | mvrhel_laptop: at least I will have some kind of relative numbers for PCL vs. PS/PDF with HEAD | 06:30.03 |
mvrhel_laptop | rayjj: ok that is good | 06:30.19 |
rayjj | assuming that the files that henrys generated are reasonable | 06:30.36 |
rayjj | expects that PS will always win, given the design criteria for gs | 06:31.22 |
| time for bed... | 06:32.23 |
mlen | kens: yes, I sent it yesterday (This message has been postponed on 2014-08-05 18:03:24.) | 06:50.32 |
kens | mlen, aha, OK then I'll check with Miles to see if he got it. I assume you sent it by email ? | 06:51.22 |
kens | realises mlen is not here, that's a neat trick having a waiting message | 06:52.11 |
mlen | kens: weechat has cool plugins :) | 08:36.35 |
| kens: and yes, I sent it via email | 08:36.53 |
| I'll test the updated version of the patch today in the evening | 08:37.42 |
kens | mlen thanks, I'll ping Miles and if you and Ray are both happy then I can commit the patch | 08:43.49 |
| Morning robin_watts_mac | 08:57.05 |
| Off out, back this afternoon | 08:59.40 |
mattchz | tor8: a few crashes have appeared in crashlytics MuPDF for iOS btw. | 12:46.10 |
| some are in the core, a couple in the ios code. | 12:46.32 |
kens | tor8 ping | 12:55.56 |
tor8 | mattchz: :( | 12:56.13 |
| kens: pong. | 12:56.16 |
kens | tor8 I've lost track of the problems with a certain outfit wanting to use MuPDF to split a PDF file into pages. | 12:56.40 |
| Marcos has opened a bug report that '%d' doesn't work, were there any other problems there ? | 12:56.56 |
| Bug is 695393 | 12:57.14 |
| Oh, and should what Marcos is doing there work, do you think ? | 12:57.36 |
tor8 | kens: is this the same outfit that wanted to know if a page is color or grayscale? | 12:58.09 |
kens | Yes, that's them | 12:58.15 |
| I'm trying to write a polite reply to their pestering | 12:58.31 |
tor8 | well, that one has a fix in the git version (mudraw -T) | 12:58.36 |
kens | So mudraw -T will detect colour or monochrome pages ? | 12:58.53 |
tor8 | kens: the %d bug from marcos is with the pdfwrite device | 12:59.00 |
kens | That's the MuPDF pdfwrite device ? So there are other problems potentially with using that, related to fonts and stuff | 12:59.26 |
tor8 | kens: it's not finished, so not fit for consumption | 12:59.45 |
kens | Right, that's good enough for now, thanks | 12:59.59 |
tor8 | mutool clean has the capability to save subsets of pages in the output | 13:00.08 |
kens | I think they want to split each page out individually, but I'll mention the subsets thing too | 13:00.32 |
tor8 | so if they're okay with looping over a command line to extract each page with a separate invocation, they can use mutool clean | 13:00.33 |
kens | doubts their competence to create such a script | 13:00.58 |
tor8 | for i in $(seq 10); do mutool clean -gg pdfref17.pdf out$i.pdf $i; done | 13:01.29 |
kens | Could take a while for 6000 pages :-) | 13:01.44 |
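tor8's loop generalizes to a whole document if the page count is fetched first, for instance with Ghostscript's pdfpagecount (a sketch, reusing the file name from the example above):

    n=$(gs -q -dNODISPLAY -c "(pdfref17.pdf) (r) file runpdfbegin pdfpagecount = quit")
    for i in $(seq $n); do mutool clean -gg pdfref17.pdf out$i.pdf $i; done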
tor8 | sadly, the subsetting is a very slow operation | 13:01.45 |
kens | Yeah, well GS pdfwrite isn't exactly lightning on 6000 pages either. | 13:02.29 |
tor8 | but I don't think doing the loop in mutool itself will be much faster; this is slower than I expected, so I should probably profile it to see if there are any obvious bottlenecks | 13:02.57 |
kens | It ought to be possible to extract pages fairly quickly, pdftk is pretty fast as I recall. | 13:03.28 |
tor8 | or it's just the overhead of mutool clean parsing the entire file to load all the objects before it starts garbage collecting, in which case doing the loop internally would be faster | 13:03.31 |
| but if the slowness is in writing out the objects, then we have a bigger problem | 13:04.01 |
| could probably rejig the cleaning code to do the "garbage collection" when creating a subset in a faster manner | 13:04.40 |
kens | I guess it depends how it works. I'm fairly sure pdftk doesn't 'process' the objects, just the xref: it uses that to pick up the objects it needs, writes those out to a new file with new object numbers, and writes the decorations required to make it a PDF file. | 13:05.16 |
tor8 | supporting %d in pdfwrite *and* in mutool clean destination name to get individual pages out are both worthwhile doing | 13:05.18 |
| kens: yeah, we do something similar but I think we still loop over all the objects in the file before writing it out | 13:07.12 |
kens | Umm, well maybe it could be optimised..... | 13:07.25 |
kens | is just trying to get some of these support queries off my plate before the next ones come in | 13:07.56 |
tor8 | kens: hmm, reading the code that seems to not be the case. need to profile! | 13:10.42 |
kens | Which is not the case ? | 13:10.54 |
tor8 | loading every object | 13:11.01 |
| we only load the ones we've determined are needed by scanning from the trailer | 13:11.15 |
kens | Ah, so the performance should be better maybe. | 13:11.18 |
| Hmm, it's odd that it's not quicker then | 13:11.27 |
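The profiling tor8 mentions could start with something like the following; the choice of Linux perf (rather than, say, callgrind) is ours, not his:

    perf record -g -- mutool clean -gg pdfref17.pdf out.pdf 1-100
    perf report     # check whether object loading or stream writing dominates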
| I've mentioned using mutool in my reply (as well as explaining why they shouldn't use GS for this), they may come back (probably will, I doubt they can figure out how to do this alone) | 13:12.10 |
tor8 | kens: yes. disregarding performance, mutool clean will do the task they require, if they can invoke it in a loop | 13:12.58 |
kens | We'll see what they come back with :-) | 13:13.24 |
| OK 2 down, just the Zoltan bug left to look at | 13:29.08 |
kens | fetches more coffee first | 13:29.52 |
henrys | kens: I can take the Zoltan one if you want. Sorry, I didn't know all this was open; next time we should ask marcosw what is open before he leaves. | 14:06.16 |
kens | I had a feeling there was some stuff built up. I'm looking at the zoltan one now, but haven't got far yet | 14:06.41 |
| I have a feeling he should be looking at the separation map but I'm not certain yet | 14:07.12 |
henrys | I was just aware of the unicode thing... I should have looked | 14:07.40 |
kens | It took me a little while this morning to sort through the open emails :-( | 14:08.03 |
| henrys, feel free to look at Zoltan's query if you like, it's not actually at the tiffsep end of things, so I'm floundering at the moment | 14:18.28 |
henrys | okay I have the skype meeting now and I'll look at it when I'm done. | 14:22.41 |
kens | OK I'll continue poking it for now | 14:22.53 |
henrys | rayjj: did you get the pcl printer files? | 14:23.05 |
jogux | anyone know of an easy way of getting the 80M pdf our scanner generated for an 8 page doc down to a size small enough to email, without rescanning the whole thing at a lower res etc? :) | 15:06.57 |
kens | make a PDF from it and downsample the images ? | 15:07.24 |
pedro_mac | jogux: I can scan it from the 'doze app - much smaller images | 15:07.34 |
kens | That's a new PDF I mean | 15:07.38 |
| You could use GS with the ebook PDFSETTINGS | 15:08.02 |
jogux | kens: that should do it. thanks :-) | 15:08.40 |
kens | isn't certain, but its worth a try | 15:09.01 |
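Spelled out, the command kens is suggesting is roughly the following (file names illustrative); /ebook downsamples images to 150 dpi and recompresses them lossily:

    gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH \
       -sOutputFile=smaller.pdf scan.pdf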
pedro_mac | gave up using the scan to USB on our scanner as it's very bloaty | 15:14.12 |
jogux | kens: ebook settings are awesome. 80M -> 1.3M. and I can't honestly see any difference between the two files :) | 15:28.25 |
| (thanks!) | 15:29.58 |
kens | jogux, I'm surprised :-) | 15:34.16 |
| I'd have to guess the scan was very high resolution before | 15:34.26 |
jogux | kens: or that samsung's mac scanner software is really really bad. :( | 15:34.47 |
kens | It could only be that bad if it wasn't compressing the image, surely it can't be *that* dumb..... | 15:35.17 |
rayjj | henrys: I have the PCL files, thanks | 16:08.41 |
chrisl | rayjj: the slowdown between the 9.10 code and now is from this commit: http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=d2f4c3e5 | 16:10.07 |
rayjj | is the plrm 100 pages or the whole thing ? | 16:10.30 |
kens | chrisl that affects PCL ? <boggles> | 16:11.06 |
chrisl | kens: that's gs | 16:11.24 |
henrys | 100 | 16:11.25 |
chrisl | rayjj: There are annotations in J11 that trigger that test - I couldn't see a significant performance difference with the PLRM | 16:11.34 |
rayjj | kens: that was repeating PS/PDF tests that mvrhel had done. | 16:11.49 |
kens | I'd guess that the output was wrong before then | 16:11.53 |
rayjj | henrys: thanks | 16:11.54 |
| chrisl: thanks. So the J11 output was wrong before ? | 16:12.47 |
kens | This should only trigger if there is no appearance stream, so wrong in this case would be 'not the same as Acrobat' | 16:13.15 |
chrisl | rayjj: yes, there was missing transparency | 16:13.15 |
kens | Well then, fast but wrong, or slow but good :-) | 16:13.42 |
chrisl | rayjj: I suspect mvrhel used a cutdown PDF of the PLRM with only 100 pages in it, to avoid the startup costs of assembling the xref - might account for why you couldn't match that time | 16:14.02 |
rayjj | chrisl: strange about the PLRM. What time did you see? (I got 76 seconds at 600 dpi, 143 sec at 1200) | 16:14.06 |
chrisl | the master code was 115.427 seconds, the 9.10 code was 122.122 @ 600 dpi | 16:14.53 |
| So the current code, for me, was faster | 16:15.07 |
kens | Probably those text fixes | 16:15.20 |
chrisl | I'm fairly sure, yes | 16:15.30 |
kens | I'm going to give up on the Zoltan question, I'm not even sure what he's asking about really | 16:16.17 |
| No doubt tomorrow will bring plenty new emails from our friends to the East | 16:17.21 |
| Goodnight folks | 16:17.27 |
chrisl | rayjj: of course, my numbers aren't really good comparisons for yours or mvrhel's because my build environment is quite different - different gcc versions, different libc implementation etc, etc..... | 16:18.13 |
| rayjj: sorry, I have to head out: if you want me to do more testing or whatever, just let me know | 16:20.34 |
rayjj | henrys: (or anyone) can you see what's wrong with this command line: | 16:20.59 |
| main/obj/pcl6 -dUseFastColor=true -q -Z: -sDEVICE=bitcmyk -r600 -dGrayValues=256 -dBufferSpace=16m -dNOPAUSE -dBATCH -sOutputFile=/dev/null -sBandListStorage=memory ../testing/plrm.prn | 16:21.00 |
| it is saying "Unrecognixed switch:" (and doesn't tell me which one) | 16:21.02 |
| chrisl_away: thanks. I'll continue from here | 16:21.11 |
| I have to run an errand. bbiaw | 16:21.58 |
| oh, nm. it was the -q | 16:22.39 |
| pcl6 on that plrm.prn was 31 seconds ! | 16:24.03 |
henrys | I guess I should add -q for consistency | 16:24.12 |
nemo | Say guys, I'm trying to fix some horrendous PNGs to be a bit less horrible | 18:52.08 |
| I decided to try gs against one of the worst offenders... | 18:52.20 |
| using some options mentioned on stackoverflow that appeared to be about reducing colours which sounded good | 18:52.47 |
| gs -dQUIET -dBATCH -dNOPAUSE -dPDFSETTINGS=/screen -sDEVICE=pdfwrite -dColorImageResolution=120 -dMonoImageDownsampleType=/Average -dOptimize=true -dDownsampleColorImages=true -dDownsampleGrayImages=true -dDownsampleMonoImages=true -dUseCIEColor -dColorConversionStrategy=/sRGB -dFIXEDMEDIA | 18:53.23 |
| also Resolution of 72 | 18:53.28 |
| he also had -dNOGC | 18:53.41 |
| I tried dropping that since gs was segfaulting | 18:53.53 |
| unfortunately also segfaulted w/ -dNOGC removed | 18:54.01 |
| gs 9.10 | 18:54.11 |
| this is just the default gs installed in ubuntu, and, well, symbols are lacking and there's no libgs-dbg | 18:56.52 |
| assuming that would help anything | 18:56.58 |
| I guess I could do my own build if pressed | 18:57.03 |
| #1 0x00007ffff71e22a2 in sclose () from /usr/lib/libgs.so.9 | 18:57.12 |
| #2 0x00007ffff71e3430 in s_close_filters () from /usr/lib/libgs.so.9 | 18:57.12 |
| #3 0x00007ffff730aebc in pdf_choose_compression () from /usr/lib/libgs.so.9 | 18:57.12 |
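If nemo did resort to a local build for symbols, the minimal recipe would be something like this (paths illustrative):

    ./configure CFLAGS="-g -O2"
    make
    gdb --args ./bin/gs ...    # same command line as the crashing run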
| the source PDF is 307 pages long non-lossy Flate at 400dpi | 19:01.14 |
| we really didn't need that quality for viewing these as the occasional reference doc online (besides the fact that there's the OCR text) | 19:01.44 |
| so was trying to crank it down quite a bit | 19:01.57 |
| my initial tests on smaller files had been pretty promising | 19:02.08 |
| anyway 'cause it was blowing up in "compression" I was wondering if there was some flag I could try adding/removing that might help | 19:04.35 |
| hm. removing -dNOGC does seem to make a difference on modestly smaller files | 19:08.22 |
| like a 193 page one I'm trying | 19:08.35 |
rayjj | nemo: it may be that it has been fixed. | 19:08.37 |
| can you post the one that crashes on bugs.ghostscript.com ? | 19:09.12 |
nemo | *sigh* | 19:09.45 |
| so tired of registering for bug systems :-p | 19:09.50 |
rayjj | nemo: and kens no longer recommends setting -dUseCIEColor | 19:10.09 |
nemo | ok. | 19:10.28 |
| no idea what that does :) | 19:10.33 |
rayjj | nemo: and I notice you have -dFIXEDMEDIA but without a -gWWWxHHH or -sPAPERSIZE and without -dFitPage, so you will get the default page size and input pages that differ in size will get clipped or have margins | 19:13.03 |
nemo | ah. I see | 19:14.07 |
rayjj | nemo: it may be that you don't want -dFIXEDMEDIA if you are just trying to have it reduce the PDF size and image resolution | 19:14.08 |
nemo | ok | 19:14.11 |
| dropping that | 19:14.13 |
| rayjj: I'm pretty sure they are all the same size, but not positive ofc | 19:14.38 |
| and yeah. mostly just want it downsampled | 19:14.46 |
| colourspaces shrunk, dpi reduced, lossy to the point of "readable" | 19:15.03 |
rayjj | I'm not sure, but if speed isn't a problem, you _may_ end up with better quality with -dDOINTERPOLATE to force interpolation on images before downsampling | 19:15.48 |
nemo | so far my tests are resulting in files that are 20-25% size of originals, but still "good enough" for what we need it for | 19:15.51 |
| rayjj: ah. well. don't particularly care 'bout quality. | 19:16.06 |
| would it reduce the image size? ☺ | 19:16.18 |
| speed isn't a big deal. can let this run overnight | 19:16.35 |
| trying to avoid needlessly dumping 20 gigabytes of data into our database | 19:16.50 |
rayjj | nemo: I don't know, but you may also want to force the compression type to JPEG and that *might* avoid your crash since it seems to be in the "choose_compression" logic | 19:17.53 |
nemo | hmmm | 19:18.29 |
| cool | 19:18.30 |
| how do I do that :) | 19:18.34 |
rayjj | nemo: although it is lossy, JPEG (DCT) compression is much better compression than lossless Flate | 19:18.35 |
nemo | yeah. absolutely | 19:18.39 |
| that was one of the things we complained about to the users | 19:18.47 |
rayjj | knew he was going to ask "how" :-/ | 19:18.55 |
nemo | 1) why are you spending a huge sum of money to have a professional scan shop do that | 19:18.59 |
| ... when the results you want can be achieved by just dropping these files in your office scanner | 19:19.16 |
| 2) mr. professional scanner, after we sent you our criteria and recommendations, why did you still choose 400dpi lossless | 19:19.32 |
| esp for the 2nd batch after we complained about the first one | 19:19.47 |
| not my decision in the end ofc, but hoping to make accessing these files in the future more pleasant | 19:20.15 |
rayjj | nemo: try -dColorImageFilter=/DCTEncode | 19:20.17 |
nemo | and also reduce size of database by 40 gigabytes or so | 19:20.23 |
| huh. this is odd | 19:20.33 |
| I didn't think to time it, but my rerunning of my test pdf is taking a lot longer | 19:20.49 |
| oh. that was with -dFIXEDMEDIA removed | 19:20.58 |
| maybe that made a difference | 19:21.01 |
| also no -dUseCIEColor but that probably isn't it | 19:21.13 |
| rayjj: thing w/ ghostscript is, it is an insanely powerful tool, but my interaction w/ it is like, once every 5 years or so, when something like this comes up | 19:21.45 |
rayjj | nemo: maybe you were clipping off some data. gs -q -- toolbin/pdf_info.ps input.pdf will tell you the page sizes | 19:21.51 |
nemo | so I'm usually reduced to searching stackoverflow for promising stuff | 19:21.53 |
| Last OS error: No such file or directory | 19:22.34 |
| maybe I don't have pdf_info.ps | 19:22.40 |
rayjj | nemo: there are often gs developers here, and we cover European and US timezones (UTC to Pacific) and many of us are online at odd hours | 19:23.03 |
| nemo: you can just grab that one file from: http://git.ghostscript.com/?p=ghostpdl.git;a=blob_plain;f=gs/toolbin/pdf_info.ps;hb=HEAD | 19:24.15 |
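Fetching and running that one file would go something like this; the -dNODISPLAY is our addition, to stop gs opening a display device:

    wget -O pdf_info.ps 'http://git.ghostscript.com/?p=ghostpdl.git;a=blob_plain;f=gs/toolbin/pdf_info.ps;hb=HEAD'
    gs -dNODISPLAY -q -- pdf_info.ps input.pdf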
nemo | WRT analysing the PDFs. certainly possible. I was doing it much more crudely by grepping strings | 19:24.22 |
| hunting for filter lines and checking width/height | 19:24.36 |
rayjj | nemo: that doesn't work if the objects are compressed. | 19:25.01 |
nemo | looks like the pages were a little different, but not a lot | 19:27.12 |
| probably scanner autosensing sizes slightly different | 19:27.36 |
| apart from a couple of a bit larger ones which were probably their idiotic scanning of the manila folder dividers themselves | 19:28.01 |
rayjj | I have to run an errand. I'll check back here later | 19:29.00 |
nemo | thanks | 19:29.03 |
| so far the one w/ DCT specified isn't crashing.. | 19:29.28 |
| made it through the medium-large file without crashing w/ DCT specified. now trying the biggest of the bunch | 19:43.56 |
| gs -dQUIET -dBATCH -dNOPAUSE -dPDFSETTINGS=/screen -sDEVICE=pdfwrite -dColorImageResolution=120 -dMonoImageDownsampleType=/Average -dOptimize=true -dDownsampleColorImages=true -dDownsampleGrayImages=true -dDownsampleMonoImages=true -dColorImageFilter=/DCTEncode -dColorConversionStrategy=/sRGB -sOutputFile=test5.pdf | 19:44.08 |
| btw. why do some of the options use / and some not? | 19:44.29 |
| heh | 19:52.24 |
| clearly you have an internet issue | 19:52.36 |
| rayjj: looks like your suggestion fixed my probs. thanks! | 20:08.39 |
| ColorImageFilter that is | 20:08.43 |
| rayjj: hm. can I also specify DCT compression level/lossiness? | 20:08.55 |
| these could probably get a lot muddier given the use | 20:09.26 |
rayjj | nemo: sorry about the net problems (I'm on wireless in a room far from the base) | 20:13.30 |
nemo | no need to apologise, you've been very helpful | 20:14.40 |
rayjj | nemo: AFAIK, you can't adjust the JPEG quality levels used by pdfwrite (the jpeg device has device specific parameters, but pdfwrite ignores these) | 20:14.42 |
nemo | ah | 20:14.47 |
| oh well... | 20:14.50 |
| rayjj: yeah, I saw the JPEG params in RTFM | 20:15.04 |
| but couldn't find ones for PDF generation | 20:15.09 |
| rayjj: there was also an amusing lecture in docs about how using JPEG is naughty | 20:15.19 |
rayjj | nemo: you can modify the code to set them since the filters support it, either as a param or just hard coded | 20:15.29 |
nemo | rayjj: eh. not worth it for this one-off task | 20:15.37 |
rayjj | nemo: JPEG for text makes for nasty looking (IMHO) text | 20:15.55 |
nemo | rayjj: true | 20:16.23 |
| but it isn't relative to this case | 20:16.32 |
| *relevent | 20:16.35 |
rayjj | nemo: do you need the text to be searchable, or are you just looking for the image ? | 20:16.39 |
nemo | relevant | 20:16.44 |
nemo | sighs | 20:16.45 |
| spelling failday | 20:16.48 |
| searchable | 20:16.58 |
| rayjj: the text has been OCR'd. the image around it is more just as a matter of record | 20:17.11 |
| odds of anyone ever reading a given page of one of these docs ever again is pretty low | 20:17.26 |
| so saving 30 or 40 gigs of database space seemed worth it | 20:17.40 |
rayjj | ok. I was going to suggest two passes -- to JPEG (adjusting the quality) then to PDF, but going to JPEG will lose the searchable text | 20:17.52 |
nemo | yep | 20:17.59 |
| no biggie. | 20:18.03 |
| would be a nice to have. but what I have now is already a huge win | 20:18.14 |
| rayjj: so. we have 2 sets of awful docs | 20:18.20 |
| both at 400dpi, lossless | 20:18.31 |
| the first set they didn't do background removal on, even. | 20:18.38 |
| after we complained, they added that | 20:18.43 |
| (ignored the stuff about dpi and lossiness) | 20:18.50 |
| rayjj: on the first set, the ghostscript stuff above reduces size to 1/5th the original | 20:19.11 |
| on the 2nd set, to 1/3rd | 20:19.16 |
| what would be sweet is if gs was really smart about this, and cranked up losiness on pages with little to no text (text is searchable after all) | 20:20.01 |
| and cranked it down in pages that are text heavy | 20:20.16 |
| that's asking for way too much tho ☺ | 20:20.31 |
rayjj | nemo: at least for 'out of the box'. That's why it's open source: those that really need something can do it | 20:22.08 |
| nemo: that's what JPX is _supposed_ to do -- it recognizes "symbols" (discrete blobs) in images and builds a dictionary (sort of like a font of pictures) and then uses those wherever else they occur in the page | 20:25.49 |
| but most JPX encoders don't bother with this | 20:26.07 |
| nemo: what would be best is for scanners to recognize monochrome areas and use CCITT or JBIG2 on those areas, and only use color when it's needed | 20:27.43 |
nemo | rayjj: oh. is that the one that caused that hilarious screwup on small font size financial docs recently? | 20:28.43 |
| rayjj: like, swapped an 8 for a 0 or something | 20:28.51 |
rayjj | nemo: that was _probably_ an OCR problem on a scanned doc | 20:30.32 |
nemo | http://www.nbcnews.com/tech/tech-news/copier-conundrum-xerox-machines-swap-numbers-during-scans-f6C10860706 ? | 20:30.58 |
| that's not OCR | 20:31.14 |
| that's jpeg | 20:31.17 |
| or well. image side | 20:31.20 |
| not jpeg but whatever this JPX stuff is I guess | 20:31.26 |
rayjj | the font size and font choices on bank statements really irritate me. They use fonts that make "6" and "8" and "5" almost indistinguishable | 20:31.43 |
nemo | oh. JBIG2 | 20:31.47 |
| anyway, there were arguments about scan artifacting, but as people noted, the hourglass shape of the 8 was distinctly visible | 20:32.07 |
| "D. Kriesel writes in a follow-up that the issue may in fact affect higher quality settings, in contradiction with Xerox's assurances. We will follow up as soon as we have more information." | 20:32.40 |
| yeah heard about that later too | 20:32.44 |
| rayjj: well. anyway. doesn't seem like a good choice for docs where a single character substitution could be a big deal | 20:33.09 |
| esp sheets of numbers | 20:33.12 |
| might be ok for these tho | 20:33.21 |
rayjj | kens: (for the logs) ps2write seems to do a sucky job of converting the PLRM to ps2 -- it spends a LOT of time loading "CharProc" objects which are inline images, CCITT compressed. For the first 100 pages, it has 11,458 of them | 22:39.07 |
| kens: even if it has to use inline images for CharProcs, the PLRM fonts _should_ be shareable across pages | 22:39.41 |
henrys | rayjj: can you tell me what the gs numbers are for plrm either 100 pages or in pages per minute? | 22:40.09 |
| rayjj: there shouldnât be any difference at all. | 22:40.47 |
rayjj | henrys: you are in luck: I was just about to send out the updated chart with numbers for the Pi (including PCL) | 22:41.22 |
henrys | rayjj: if there is, something's wrong and I'll look at it. | 22:41.24 |
| rayjj: okay | 22:41.41 |
rayjj | henrys: Once it gets going (finished loading all of the CharProc objects), ps2write is OK, but on the Pi it takes 20+ seconds before doing the first page (out of 103 total) and it's still slower than PDF input | 22:46.46 |
| henrys: also, I can't get anywhere close to mvrhel's PLRM.pdf times | 22:47.26 |
| PLRM 100 pages | 22:54.17 |
| PCL Time   PS Time   PDF Time   PCL PPM   PS PPM   PDF PPM | 22:54.19 |
| 31.1       103.9     62.4       192.9     57.7     96.2      600 dpi | 22:54.20 |
| 96.3       158.9     121.9      62.3      37.8     49.2      1200 dpi | 22:54.22 |
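(The PPM columns work out as 60 × pages ÷ seconds over the 100-page run, e.g. 60 × 100 / 31.1 ≈ 192.9 for PCL at 600 dpi.)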
| email sent with the timings (to mvrhel, cc tech) | 23:14.33 |
| henrys: just curious -- should we try gxps ? | 23:15.25 |
henrys | rayjj: I think so. | 23:16.00 |
rayjj | henrys: do you want to generate the XPS files from the "original" J files (note we only need J9, J11 and J12 for now)? | 23:19.25 |
| henrys: MS has a built in "XPS writer" | 23:19.46 |
robin_watts_mac | rayjj: What rpi distro are you using? | 23:21.33 |
| The original distro didn't make full use of the NEON copro for fp, so it was slower. | 23:21.59 |
rayjj | robin_watts_mac: Linux raspberrypi 3.2.27+ #250 PREEMPT Thu Oct 18 19:03:02 BST 2012 armv6l GNU/Linux | 23:22.06 |
robin_watts_mac | The latest raspbian should be faster. | 23:22.08 |
rayjj | robin_watts_mac: how do I update (I just bought the pre-built SD card) | 23:22.39 |
| robin_watts_mac: note that my times are faster than mvrhel's last year (except for the PLRM.pdf) and much faster than what chrisl got | 23:23.56 |
| googles... | 23:24.20 |
robin_watts_mac | rayjj: Just looking, | 23:24.53 |
| http://www.raspberrypi.org/downloads/ | 23:25.47 |
| That has a Raspbian image for debian wheezy. | 23:26.05 |
| Dating from june 2014 | 23:26.17 |
rayjj | robin_watts_mac: OK. I also see the http://www.raspberrypi.org/documentation/raspbian/updating.md instructions | 23:26.58 |
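The updating.md page rayjj links amounts to the standard Debian route:

    sudo apt-get update
    sudo apt-get upgrade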
robin_watts_mac | As of june 2012 there was no hardware FP support. Dunno if that had changed by october. | 23:28.02 |
| Apparently it was 18th September 2012 when they released a version with hardware FP | 23:29.21 |
| so you should be OK with the version you have. | 23:29.31 |
| rayjj: How are you timing PLRM.pdf ? | 23:29.49 |
| Are you using mudraw PLRM.pdf 1-100? | 23:30.02 |
| or did you mutool clean -ggg PLRM.pdf PLRM100.pdf 1-100 ? | 23:30.24 |
| and then time PLRM100.pdf ? | 23:30.33 |
rayjj | robin_watts_mac: timing mudraw comes next | 23:30.40 |
robin_watts_mac | rayjj: Cos operating on a PDF file that contains just pages 1-100 is probably faster than trying to operate on the whole file and just process the first 100 pages, right? | 23:31.22 |
rayjj | robin_watts_mac: I created a 100 page PDF using Acrobat | 23:31.23 |
robin_watts_mac | Acrobat probably used compressed streams etc that would have slowed it down. | 23:31.56 |
rayjj | robin_watts_mac: only slightly faster with gs than just using -dLastPage=100 | 23:31.59 |
robin_watts_mac | Try making the PDF with just the first 100 pages in using mutool. | 23:32.22 |
rayjj | robin_watts_mac: OK. | 23:32.30 |
| well, it has hundreds of upgrades to do (500+) so it'll be a while before I can test | 23:43.39 |