IRC Logs

Log of #ghostscript at irc.freenode.net.

2012/02/14
mvrhel_laptop alexcher are you around?00:59.59 
alexcher mvrhel_laptop: yes, I've received your message.02:08.41 
mvrhel_laptop alexcher: ok great. did it all make sense to you?03:46.15 
noobirc MuPDF: Makefile:160, install cbz/mucbz.h $(PDF_APPS) $(XPS_APPS) $(MUPDF) $(bindir)06:47.34 
  I guess mucbz.h goes to the wrong place =)06:47.50
ray_laptop Robin_Watts: henrys: I got the image rotation change working -- once I worked through the transformations, I was able to do it with just the ImageMatrix mods. The good news is that (at least with the simulator) the performance boost is awesome -- 2.8x (2.3 sec for the 4 pages of PWTTQ1CC vs. 6.6)07:26.06 
  Robin_Watts: I haven't had a chance to throw much at it, but I will tomorrow. I will put the change into HEAD and run a cluster regression as well as looking at the simulator output for the files I've found that will take advantage.07:27.20 
  g'nite all...07:31.03 
  cluster run started ...07:36.36 
Robin_Watts tor8: mudraw pushed for your delight and delectation.11:18.12 
  tor8: I've saned it locally and it passes.11:18.27 
  (with a modified sane, obviously)11:18.34 
chrisl Robin_Watts: where are you at with the cust 532 performance issues?11:21.39 
Robin_Watts I believe that with Ray's changes that he was cluster testing earlier, we should hit the performance targets for all but 1 file.11:23.32
  I spent some time staring at that file yesterday and didn't really get far.11:23.48 
chrisl It's just, I think I need to ask you to look at the patterns problem again :-(11:24.20 
Robin_Watts I spotted a potential optimisation in the clist code to do with eliminating clipping path use when it wasn't required, but my patch ran into problems.11:24.32 
  I mailed ray about it, but he's (understandably) not got back to me yet.11:24.48 
  chrisl: OK.11:24.58 
  I'm just heading out for a run now, but if you put details on here, I'll look when I get back.11:25.22 
chrisl If you look at http://www.ghostscript.com/~regression/chrisl/compare3.html and test 60, you can see the issues11:25.25 
Robin_Watts And why is that a problem?11:26.19 
  because it's more ziggy zaggy ?11:26.44 
chrisl Yes, there's also a case where there is a gap that wasn't there before - I'll find an example for after you've been running11:27.25
Robin_Watts ok.11:27.35 
chrisl Robin_Watts: (for when you return) test 83 on: http://www.ghostscript.com/~regression/chrisl/compare5.html 11:57.29 
  and test 96 on: http://www.ghostscript.com/~regression/chrisl/compare6.html11:57.52 
sebras tor8: you did pick up on noobirc's remark..?11:58.04 
tor8 sebras: yeah, I fixed that last week11:58.52 
sebras ah, sweet.11:59.07 
  now, back to the meetings...11:59.13 
tor8 commit 8e8fc85624b66e59e5e284de7f532522e776a35311:59.16 
chrisl Robin_Watts: Both the above show gaps that weren't there before. On *some* tests, Acrobat shows gaps where we didn't before, and do with our proposed change (can't compare PCL/PXL).12:00.02 
Robin_Watts chrisl: We *are* in an area where differences are permissible.12:46.29 
  The spec deliberately doesn't specify exact behaviour.12:46.49 
chrisl Robin_Watts: I know, and my first reaction was exactly that. But I'd like a second opinion before going with that12:47.31 
Robin_Watts Is your gut feeling that this is an improvement or a regression?12:47.59 
  My gut says regression, but I've only looked at a few.12:48.28 
chrisl Overall, I feel this is an improvement - there are a lot that we are closer to Acrobat with the change than before - it's just unfortunate that the ones that look worse look very obviously worse.....12:49.43 
Robin_Watts How did you run the bmpcmp ?12:51.50 
chrisl -t 16 -w 3, and I filtered the initial push with filter=ppm12:52.30 
Robin_Watts Right.12:52.37 
chrisl With a lot of those files, halftoned output was confusing as hell!12:53.22 
Robin_Watts oh yes...12:53.37 
  And seeing pbm/pgm etc variants of the files really doesn't add anything.12:54.10 
chrisl Exactly12:54.20 
Robin_Watts Test 66 is a shame12:57.43 
chrisl Test 66, I think, is actually *closer* to Acrobat's output - although not the same.....12:58.54 
Robin_Watts Acrobat gives the gaps ?13:00.00
chrisl Not exactly, but it doesn't show a constant shading as we originally did13:00.45 
  Robin_Watts: actually, the 300dpi tiff output from Acrobat *does* show the gaps, just like the bmpcmp13:01.46 
Robin_Watts So the question occurs - do we want to attempt to emulate Acrobat's flaws?13:02.26
  bbs13:02.37 
chrisl I would say "no". *But* I do think the pattern stepping we have is sufficiently off (for example, with fts_06_0618.pdf) we should address it - I doubt we'd get away with waving the spec at cust 532.13:07.03 
  If addressing that stepping problem results in "poorer" output from the file in test 66, that nevertheless matches Acrobat (more closely), then so be it. But there are others which I'm less confident about.13:08.49 
  FWIW: Acrobat's output: http://www.ghostscript.com/~chrisl/Bug690637.tif13:10.12 
  Oops, better idea (compressed): http://www.ghostscript.com/~chrisl/Bug690637.tif.zip13:13.18 
Robin_Watts I wonder if it's worth pulling 532 into this?13:18.37 
  We could send them a mail, saying that this is an area of the spec where there is considerable latitude left to the consumer, and everyone works differently.13:19.07 
  Say that we have a possible change that helps their test case, but it hurts in other cases.13:19.23 
  There is no one "correct" thing to do; so if they want to adopt the change, then they must be sure they are happy with it, and the onus for testing it should be on them.13:19.54 
chrisl Regardless of what 532 said, I still think there's an issue13:22.13 
Robin_Watts Right, but the time pressure is from them13:22.36 
  I have to run to the shop for hallmark day, then it will be lunchtime. back after that sorry.13:23.38 
chrisl I'll be going to squash shortly, too13:23.56 
jvervier Hi all14:04.43 
  I'm joining you because I'm looking to know if something is possible when using ghostscript on windows14:05.18 
  Is it possible to launch a gs32win.exe command with parameters without opening any other ghostview window during the process? (Like with linux distribution when I launch the command line)14:06.08 
  ?14:06.10 
Robin_Watts gswin32c.exe14:11.14 
  rather than gswin32.exe14:11.48 
jvervier indeed !14:19.49 
  Forget this one14:19.55 
  sorry :D14:19.58 
  It's working by this way14:20.06 
Robin_Watts fab.14:20.15 
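For reference, a minimal sketch of driving the console build from a script. Only the executable name gswin32c.exe comes from the exchange above; the helper names, the device and the file names are placeholders for illustration, and the switches are the usual batch-mode ones.

```python
import subprocess


def build_gs_command(input_path, output_path, device="pdfwrite",
                     exe="gswin32c.exe"):
    """Build a non-interactive Ghostscript command line.

    exe defaults to the console build named in the log; the device and
    the paths are illustrative placeholders.
    """
    return [
        exe,
        "-dBATCH",      # exit after processing the input
        "-dNOPAUSE",    # do not prompt between pages
        "-sDEVICE=" + device,
        "-sOutputFile=" + output_path,
        input_path,
    ]


def run_gs(input_path, output_path, **kwargs):
    """Run Ghostscript and return its exit code (0 on success)."""
    cmd = build_gs_command(input_path, output_path, **kwargs)
    return subprocess.run(cmd).returncode
```

Because the console executable writes to stdout/stderr instead of opening its own window, the calling script can capture or discard the output as it sees fit.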
wwww quit15:02.13 
Robin_Watts There is scope for accelerating the skip_black_pixels, skip_white_pixels stuff in the bitmap compression code by using platform specific code.15:53.07 
  chrisl: You here?16:13.48 
chrisl Robin_Watts: yes, just getting a cuppa16:15.12 
Robin_Watts The 532 file I'm currently looking at is WWTTM1CT.16:15.38 
  It mostly has text in it.16:15.43 
  but lots of time is spent in gx_fill_path16:15.58 
chrisl That's plausible depending on what the text rendering mode is in force16:17.03 
Robin_Watts which seems odd to me. I thought rasterisation was supposed to be happening through freetype ?16:17.17 
  Oh, so what modes stay in our stuff ?16:17.32 
chrisl Well, actually, cust 532's code pre-dates freetype anyway16:17.49 
Robin_Watts OK, that makes sense then.16:17.58
chrisl For base 136 fonts, their code is using UFST, but not for any embedded fonts16:19.22 
ray_laptop hi, all16:38.00 
chrisl hi, ray_laptop, how's it going?16:38.22 
ray_laptop OK. Got the kids off to school with lunches, books, instruments, ... I'll do a quick market in a bit, then go see Karen. She's scheduled for the OR on Wed.16:39.51 
henrys ray_laptop:please wish Karen well for me.16:40.43 
Robin_Watts ray_laptop: Delayed from yesterday?16:41.02 
  ray_laptop: Great news about the speedups (I saw the logs)16:41.17 
  Did that test out OK? Is there anything we can do to help with that?16:41.40 
  (I thought Karen was due for the OR late yesterday?)16:42.06 
ray_laptop Robin_Watts: looks like there are a couple of files to look into 035-01.ps 148-01.ps 148.11 Bug687603.ps Bug687889.pdf Bug691554.eps, ...16:43.43 
Robin_Watts I'll try to look at some of those.16:44.17 
  I'm failing dismally to come up with anything to speed up WWTTM1CT.pdf16:44.33
ray_laptop Robin_Watts: she was, but a scheduling conflict (something urgent) came up. Hers is not as time critical -- 4 days post injury for repair is not uncommon16:44.38 
  Robin_Watts: I'll submit a bmpcmp now, OK ?16:44.59 
Robin_Watts ray_laptop: Right, that makes sense.16:45.02 
  fab.16:45.04 
ray_laptop Robin_Watts: frankly I was surprised that the performance difference was so large, but the wheel output looks OK, and the same patch on HEAD works with no diffs.16:46.40 
Robin_Watts I'm not :)16:47.22 
henrys WWMTM1CT.pdf is primarily text?16:47.54 
Robin_Watts WWTTM1CT.pdf, yes.16:48.09 
henrys are we sure the cache hit rate is okay on the target?16:48.24 
ray_laptop I'll drop Eric a note and tell him that the performance improvements look encouraging, but we're looking at a few files that show diff.16:48.30 
Robin_Watts henrys: No idea.16:48.42 
  The whole runtime is 75 seconds (ish)16:49.37 
ray_laptop henrys: WWTT{MN}1CT are text. 'M' is all black. 'N' is blocks of 4 different color text in 4 sections down the page16:49.44 
  henrys: they are both 50 page files.16:49.55 
henrys well there should be a call in the profile for rendering from cache for each character.16:49.56 
ray_laptop henrys: there are16:50.16 
Robin_Watts The clist playback is 39 seconds or so.16:50.23 
henrys hmm not something we have a history for being slow on.16:50.51 
ray_laptop henrys: iirc, the first page subset file had 5069 "render from cache" calls16:50.52
  iirc, gx_image_cached_char16:51.18 
Robin_Watts I see 31595 calls to that.16:52.05 
  oh, but that's on all the pages.16:52.12 
henrys this is the one they print faster than the blank pages?16:52.13 
ray_laptop Robin_Watts: henrys: I looked and I had thought that forcing it to not use the image_init, ..., path and using the 'copy_mono' would be better, but it didn't look like it on the simulator16:52.30
  henrys: there is something funny about the 50_blank_pages on their engine, but we haven't focused on that.16:53.06 
  henrys: our blank page time is faster than the page with the text16:53.34 
henrys so the ufst renderer is faster than ours ... But with a good cache hit rate I wouldn't expect that to matter much.16:54.04 
ray_laptop our blank pages is faster than their time with the text at least16:54.18 
Robin_Watts The char rendering is being done in our renderer, I believe.16:54.29 
henrys I assume we're using our renderer and 5th gen is ufst16:54.30 
ray_laptop henrys: this is using UFST I think. Another file (J11) uses our type42 and type1 rendering for embedded fonts16:55.09 
chrisl henrys: 6th gen is using UFST for MT fonts, but embedded fonts still use the AFS code16:55.24 
Robin_Watts ray_laptop: Not using UFST for this, I'm fairly sure.16:55.38 
ray_laptop Robin_Watts: let me check the profile... 16:55.53 
Robin_Watts 47 seconds is in gx_default_fill_path (I think, their profile is hard to read)16:55.54
henrys chrisl:right ... 16:56.01 
Robin_Watts 22+2516:56.03 
henrys I assumed embedded.16:56.09 
  Robin_Watts:and you are profiling 8.something for this or just using the simulator?16:56.49 
Robin_Watts I'm looking at the profiles they supply.16:57.43
  My attempts to profile their stuff have failed so far.16:58.11 
chrisl Robin_Watts: what type of text is it (western, kanji etc)?16:58.32 
Robin_Watts (And in fact, I can't seem to get very sleepy to find symbols for anything since I installed Windows 7 :( )16:58.55 
  western.16:58.59 
  It's english text. About cowboys.16:59.09 
chrisl Maybe Ghostscript doesn't like the old west?16:59.33 
ray_laptop chrisl: on the 1200 bit profile of a 6-page subset file, I see 31,595 calls to gx_image_cached_char -- there are 980 calls to gs_type1_interp_init and only 490 calls to gs_type1_interpret17:00.09 
henrys ray_laptop:is karen a regular skater? That must have been quite a fall.17:00.32 
  well I guess we should start the meeting.17:01.14 
Robin_Watts ray_laptop: 38 seconds is spent in clist_fill_mask17:01.17 
  Of which 17 seconds are in clist_change_bits - most of which is (I believe) doing 2d compression of the character.17:01.54 
chrisl ray_laptop: I'm bemused about the calls to gs_type1_interp_init and gs_type1_interpret - I would expect the other way around!17:02.17 
henrys mvrhel and alexcher are setup.17:02.27 
ray_laptop hmm... I looked at the PDF file. It is an embedded Type1 font subset "/BaseFont /KAKDNP+TimesNewRomanPSMT"17:02.48 
henrys mvrhel and I keep bumping the priority of the icc user params in pcl, but think I can start today.17:02.54 
mvrhel ah ok. if you need anything from me, let me know henrys17:03.11 
henrys mvrhel are you and alexcher good?17:03.42
mvrhel well, I would like get a "ok I understand" or a "I don't have any idea what you want" back from him17:04.09 
chrisl ray_laptop: double the number of calls to interp compared to interp_init would be plausible for various reasons, but the other way round is slightly odd17:04.11 
henrys this hpgl/2 stuff I have is painful, 3x3 foot plots all over the place.17:04.51 
alexcher henrys: the patch looks OK but I didn't think much about it.17:04.58 
henrys thinking is good ;-)17:05.45 
mvrhel alexcher: so do you know what I need you to do on the interpreter end for the output intent support?17:06.03 
Robin_Watts alexcher: You don't think much of it, or you haven't thought much about it?17:06.09 
alexcher mvrhel: I'll review the patch today. 17:07.11 
mvrhel well, its not a patch now17:07.19 
  its in the trunk17:07.22 
Robin_Watts (Sorry, I misread alexes initial reply, hence my question. I think it's clear he meant the latter, sorry)17:07.41 
mvrhel review the steps that I said I would like you to do in the interpreter17:07.46
  please17:07.50 
ray_laptop Robin_Watts: looking at the 6_page profile, there is 110 seconds in gx_ht_alloc_cache called from gx_dc_default_fill_masked. That seems like a lot17:07.54 
Robin_Watts ray_laptop: I'm looking at the profiles from Build_0036. Those are the most recent, I believe.17:08.30 
ray_laptop Robin_Watts: OK. I'll switch over -- I was probably looking at an older one.17:09.26 
Robin_Watts WWTTM1CT_1200_1bit_hprof.txt17:09.29 
henrys can we ask them to reproduce the profiling results in the simulator? I assume they can profile the simulator and if the results are quite different than the target we know there isn't a lot we can do.17:09.41 
  other than hardware platform stuff.17:10.09 
Robin_Watts henrys: Well, assuming the call graph is the same (which we'd hope it would be, otherwise, what's the point?) then we can identify issues by examining the profiles they supply, and then use the simulator to step through.17:10.57 
ray_laptop henrys: their simulator profile was screwy enough that I was ignoring it. Also they don't seem to be able to get the profiler to run with a non-DEBUG build, so the timings are not comparable17:11.46 
Robin_Watts But I find it hard to believe that we spent 77 seconds in 22 calls to gs_push_boolean17:11.46 
henrys I am just concerned there is something wrong on the printer - integration or something, that would be obvious comparing the target and simulator but after ray_laptop's comment nvm.17:14.03 
  chrisl:do you have anything for the meeting?17:14.37 
  marcosw_:I've installed all the packages?17:14.52 
  s/?//17:14.58 
chrisl henrys: no nothing from me this week, I don't think.....17:15.13 
Robin_Watts The mupdf customer has been experimenting with multithreaded rendering, and I *think* he's got it working.17:15.29 
  He reported some problems, and I pointed him at fixes, and he's gone quiet, so presumably that's a good thing.17:16.08 
henrys Robin_Watts:that is very good. We need to think about where we want to show off the new android app now that paul is finished.17:16.17 
Robin_Watts henrys: Well, presumably we want to get it listed on the android app store, in the same way that we are on the ios app store?17:17.14 
henrys yes, has tor8 finished reviewing?17:17.42 
Robin_Watts But we can offer download links to it on mupdf.com already.17:17.44 
henrys I assume the android store has fewer police.17:18.07 
Robin_Watts henrys: Yes, you get to keep your testicles, rather than having to hand them to apple in a jar.17:18.41 
henrys I guess we'll release this on the world with the upcoming mupdf release.17:19.42 
Robin_Watts ray_laptop: It seems odd to me that clist_playback_file_bands can take 39seconds, where gx_image_cached_char takes 42.17:20.48 
  Are we imaging the cached char BEFORE the clist ?17:21.12 
ray_laptop Robin_Watts: w.r.t gs_push_boolean -- the profile says it is calling gs_interpret, so I think their tool is getting the function names wrong. It was probably supposed to be gs_main_interpret which is also in the same .o (imain.c)17:21.52
Robin_Watts ray_laptop: Yeah, I figured it was something like that.17:22.12 
  It's EXTREMELY annoying that their tool does that.17:22.20 
chrisl Robin_Watts: yes, I believe that's how glyph rendering and clist work - otherwise the clist wouldn't be self contained17:22.48 
ray_laptop Robin_Watts: gx_image_cached_char happens during clist writing. It writes 'bits' to the tile cache. The rendering does 'copy_mono' from those bits17:22.51 
  the clist doesn't know anything about characters17:23.11 
Robin_Watts Right. So on every glyph we image, we recompress the bitmap into the clist ?17:23.14 
  Is there no way we can only compress each image into the clist once ?17:23.43 
ray_laptop Robin_Watts: when we put the tiles into the cache, I think that's where we compress17:23.54 
  Robin_Watts: there _should_ be good hit rate in the cache. Maybe we are seeing that the tile cache isn't large enough for 1200 dpi17:24.42 
tor8 Robin_Watts: about the android app, the ChoosePDFActivity should probably be a file browser so you can find documents in other locations too (to match what other android apps do)17:24.56 
  and I'll want to go over the icons a bit before release17:25.09 
Robin_Watts tor8: File browser - sure, but I don't think we can ask Paul to do that (at least not as part of the original quote).17:25.50 
tor8 I also wonder if it's possible to constrain the vertical scroll bouncing17:26.02 
henrys tor8:how are the docs (sebras) coming? Is there a work in progress somewhere we can look at?17:26.11 
Robin_Watts tor8: Of the pages when zoomed out ?17:26.18 
tor8 yeah. when flipping left and right it's a bit disconcerting how it wobbles up and down at the same time17:26.35 
Robin_Watts henrys: sebras has a branch in his git repo.17:26.45 
  (on casper)17:26.50 
henrys oh okay I'll check it out.17:27.12 
ray_laptop Robin_Watts: I'll let you focus on mupdf and we can discuss WWTT performance later. I have to run a couple of errands and go over to see Karen17:27.12 
Robin_Watts ray_laptop: OK.17:27.25 
henrys let's call the meeting done.17:27.34 
Robin_Watts It might be worth us trying to up the cache a bit in the clist.17:27.39 
ray_laptop Robin_Watts: If you want to look at some of the bmpcmp results on the image rotation, just mention it in IRC and I'll check the logs before diving in.17:27.59 
Robin_Watts ray_laptop: Will do.17:28.07 
tor8 henrys: http://git.ghostscript.com/?p=user/sebras/mupdf.git;a=tree;h=refs/heads/doc;hb=refs/heads/doc17:28.14 
henrys tor8:thanks17:28.32 
ray_laptop Robin_Watts: we should get the same cache hit/miss sequence in the simulator (at 1200 dpi)17:28.37 
tor8 henrys: some docs in doc/ and others in fitz/fitz.h and pdf/mupdf.h17:28.53 
ray_laptop Robin_Watts: and we can look at its size and effectiveness17:29.03 
Robin_Watts ray_laptop: Indeed. I'll try and figure out how :)17:29.09 
ray_laptop bye for now....17:29.09 
Robin_Watts cu17:30.48 
henrys Robin_Watts:I wonder if we could keep the memory okay without compression - I assume that would speed things up significantly.17:36.57 
Robin_Watts Not as much as just making the cache large enough might.17:38.30 
  This is odd.17:41.17 
  It's finding things in the cache, and then compressing them anyway.17:41.28 
chrisl That's probably bad......17:41.44 
Robin_Watts oh, each bitmap gets compressed once per band.17:42.21 
henrys I assume it pulls the bitmap from the cache and compresses it to stuff it in a band right?17:42.39
Robin_Watts I'm confused then.17:42.59 
  We render glyphs to bitmaps. Those go into the font cache.17:43.19 
chrisl glyph cache17:43.33 
henrys It's been ages since I looked at it.17:43.33
Robin_Watts Then we pull bitmaps out from the font cache, and pass them to clist_fill_mask17:43.41 
  chrisl: sorry, glyph cache, not font cache (I stand corrected)17:44.00 
  clist_fill_mask puts the bitmaps into a tile cache.17:44.16 
henrys ah I didn't know it was using the tile cache I thought it would just do copy mono on the clist.17:45.39 
chrisl Robin_Watts: It's been a while, but isn't there a cache for objects which applies to all bands?17:46.17 
Robin_Watts Then it seems to encode each tile once per band.17:46.35 
  Potentially stupid question here, so please bear with me...17:47.26 
  The same time will compress the same regardless of which band it goes into, right?17:47.48
  So why don't we just copy the compressed version from one band to another.17:48.04 
  (That may be akin to saying "So why don't we just make boats that can't sink?")17:48.37 
mvrhel bboab17:48.44 
  bbiab17:48.46 
Robin_Watts s/time/tile/17:49.31 
marcosw_ Sorry I missed the meeting. Was there anything for me other than that henrysx6 is ready to be re-enabled as a cluster node.17:50.05 
chrisl Robin_Watts: Does it do that for every glyph?17:50.06 
Robin_Watts Yes.17:50.13 
chrisl Regardless of which band(s) the glyph occupies17:50.37 
Robin_Watts As far as I can tell, every glyph causes a call to clist_fill_mask. That then looks for the tile in the cache.17:50.51 
  If it finds the tile it checks to see if that tile was signed as being in the required band - if not, it resends the tile for the new band.17:51.31 
  So, in this case, it means it's going to resend every tile for every band it's used in.17:51.56 
chrisl Do we have one tile cache? Or a tile cache per band?17:52.17 
Robin_Watts Whereas if it was smart enough to have said "this one will be in every band" to start with, it would only have to send it once.17:52.24
  We have one tile cache. Each entry has a set of bits, one per band to say which bands it's in.17:52.43 
  Now, maybe we NEED to send it to every band - I confess, that I still haven't got my head entirely around the clist stuff.17:53.26 
chrisl Robin_Watts: I was wondering if we could special case tiles which represent glyphs, so they automatically get associated with every band - my feeling is that would be a useful thing for a lot of documents17:54.08 
Robin_Watts If so, then this may be almost the best we can do - but why go to the trouble of recompressing the bitmap again and again? Why not just copy the compressed representation from one band to another ?17:54.18 
  chrisl: That would be nice. But that may require us to send the glyph to every band anyway.17:55.08 
  which would mean we'd actually be worse off than we are now.17:55.36
  'Just' keeping a note in the tile cache of whether we've compressed a tile before, and where we can copy the data from would give better results, I think.17:56.10 
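The behaviour described above, and the proposed alternative of remembering the compressed form, can be modelled with a toy cache. This is a sketch under assumptions, not the clist code: the class and method names are hypothetical, and zlib merely stands in for whatever compressor the clist actually uses.

```python
import zlib


class TileCache:
    """Toy model: one cache, each entry records which bands hold the tile."""

    def __init__(self, reuse_compressed):
        self.reuse_compressed = reuse_compressed
        self.entries = {}        # tile_id -> {"raw", "bands", "packed"}
        self.compress_calls = 0

    def _compress(self, raw):
        self.compress_calls += 1
        return zlib.compress(raw)    # stand-in for the real coder

    def fill_mask(self, tile_id, raw, band, band_data):
        """Make sure `band` has the tile, sending it if necessary."""
        entry = self.entries.setdefault(
            tile_id, {"raw": raw, "bands": set(), "packed": None})
        if band in entry["bands"]:
            return                   # this band already has the tile
        if self.reuse_compressed:
            # Proposed: compress once, then copy the bytes to each band.
            if entry["packed"] is None:
                entry["packed"] = self._compress(raw)
            packed = entry["packed"]
        else:
            # As described in the log: recompress for every new band.
            packed = self._compress(raw)
        band_data.setdefault(band, []).append(packed)
        entry["bands"].add(band)
```

With one glyph used in ten bands, the first mode compresses ten times and the second once, while each band still ends up with its own self-contained copy of the data.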
chrisl I'm confused - I guess I don't understand the format of entries in the tile cache.....17:56.29 
Robin_Watts The tile cache contains uncompressed data, I believe.17:57.19 
  When we send a tile entry to the band we 2d compress it, and copy it into the band data (I think!)17:57.39 
henrys so storing the glyph compressed in the tile cache would work okay too?17:58.27 
Robin_Watts henrys: Yes, was just pondering that.17:59.07 
  I don't know what else the tile cache is used for.17:59.18 
chrisl Sounds like there's mileage in compressing all tiles in the cache, rather than on writing to the clist.17:59.26 
Robin_Watts chrisl: Yeah. Presumably nothing goes into the tile cache that won't be compressed into a band anyway.17:59.57 
  so we can't be any worse off.18:00.04 
chrisl Robin_Watts: exactly. I'm assuming from what Ray said earlier, *after* the tile cache, the clist has no idea the tile represents a glyph.18:00.47 
Robin_Watts I don't think it even knows at the tile cache level.18:01.20 
henrys but if the performance problem goes away just bypassing compression it is worth a look, that is such a trivial change.18:01.22 
Robin_Watts image_fill_mask doesn't even know that it's a glyph, as far as I can tell.18:01.59 
henrys the memory optimization may no longer be relevant.18:02.16 
Robin_Watts henrys: Possibly, yes. But at 1200dpi, what will that do for memory use?18:02.22 
  Typical glyph = 1/5 inch?18:03.03 
chrisl Presumably we only write each glyph once to each band, we don't write every *use* of every glyph?18:03.53 
Robin_Watts So 240x240 = 7.2K per glyph uncompressed.18:04.17 
  chrisl: indeed.18:04.22 
  So for upper and lower caps for a single font in a single size that would be 388K per band.18:05.12 
henrys and you're probably getting 2:1 with the run length?18:06.10
  maybe a little better.18:06.19 
Robin_Watts cmd_compress_cfe = fax compression, right?18:06.37 
  fax compression gets 4:1 (according to prof google)18:07.50 
  I'd hope more than that on such high res glyphs. 18:08.16 
  group 4 fax gets 15:1 (again from google). Don't know what we use.18:08.32 
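The back-of-envelope numbers above can be reproduced as follows. The 1/5 inch glyph and the 2:1, 4:1 and 15:1 ratios are the guesses quoted in the log; the count of 52 glyphs is an assumption for "upper and lower case". Straight multiplication gives about 374K rather than the 388K quoted, and the log does not explain the difference.

```python
DPI = 1200
GLYPH_INCHES = 1 / 5      # "typical glyph" guess from the log
GLYPHS = 52               # assumed: upper and lower case, one font, one size

side_px = round(DPI * GLYPH_INCHES)         # 240 pixels per side
bytes_per_glyph = side_px * side_px // 8    # 1 bit per pixel -> 7200 bytes
uncompressed = bytes_per_glyph * GLYPHS     # glyph data per band, uncompressed


def per_band_bytes(ratio):
    """Glyph data per band at a given compression ratio (ratio:1)."""
    return uncompressed / ratio


for ratio in (2, 4, 15):
    print(f"{ratio}:1 -> {per_band_bytes(ratio):.0f} bytes per band")
```

As henrys notes just afterwards, this ignores per-entry overhead, so it is only an order-of-magnitude guide.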
henrys too many variables we'd need to look at a page full of glyphs - we aren't accounting for any overhead - each entry in the tile cache has a header etc.18:09.47 
Robin_Watts it's not space in the tile cache, it's space in the band data.18:10.19 
  (unless I'm not following)18:10.27 
  but I take your point.18:10.31 
henrys we have stats for that in the code.18:11.36 
Robin_Watts Let me see if I can find out how to print the completed band size.18:11.43 
henrys at 600 dpi 1 bit I don't see why we are banding at all but I won't go there.18:12.19 
Robin_Watts 1200dpi, 1 bit, but yes...18:12.56 
henrys yes18:13.03 
  so this job is okay at 600 or no?18:13.25 
Robin_Watts Urm...18:14.04 
henrys I should have the mail in front of me, sorry18:14.35 
Robin_Watts Yes.18:14.48 
  The last set of results we had from him...18:15.02 
  PWTTQ1CC was slow in all 3 modes, but Ray's rotation should sort that.18:15.22
  WWTTM1CT and WWTTN1CT were both slow in 1200 1bpp only.18:15.44
  And that's all the results they'd put in bold in the email, so I assume we're off the hook for anything else.18:16.00 
chrisl Robin_Watts: sorry, just a thought - what about compressing with rle - it should be considerably faster than fax, and given the relatively small size of the bitmaps in question, I'm not sure that fax will really get into its stride18:16.17 
Robin_Watts chrisl: That is a good idea.18:16.41 
  Let me try and measure the differences in band size of these files.18:17.01 
  Any hints etc... :)18:17.33 
chrisl Erm, -Zl?18:17.56 
henrys right -ZLl says everything I thought.18:18.19 
  literally ... everything18:18.44 
  marcosw_:Robin_Watts recent change reminds me a smoke test for all the devices would be useful. What ever happened to that?18:22.30 
  marcosw_:nothing else I know of for the meeting.18:22.41 
  it seems odd we have so many bands at 120018:25.46 
marcosw_ henrys:I have a simple script that runs "gs -h" captures the output and then runs all the devices using each of the input files from the examples directory. It isn't run automatically, but could be.18:26.11 
henrys seems like a simple thing to do along with the regular regression run.18:27.04 
  there may be a bunch that fail and will need to be excluded or ignored.18:27.32 
marcosw_ the only problem is that lockups are not currently detected, the script just stops and I have to control-C and deal with it. I'll fix that and run it automatically daily, sending out an email for seg faults, error, or lockups.18:31.50 
henrys marcosw_:sounds good.18:32.15 
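A rough sketch of what such a smoke test could look like, with a timeout added so that lockups are reported instead of hanging the run. This is not marcosw_'s script: the function names are hypothetical, and the parser assumes the `gs -h` output lists devices on indented lines under an "Available devices:" heading.

```python
import subprocess


def parse_devices(help_text):
    """Extract device names from the 'Available devices:' block of gs -h."""
    devices, in_block = [], False
    for line in help_text.splitlines():
        if line.startswith("Available devices:"):
            in_block = True
        elif in_block and line.startswith(" "):
            devices.extend(line.split())
        elif in_block:
            break                 # first unindented line ends the block
    return devices


def smoke_test(device, input_path, output_path, gs="gs", timeout=60):
    """Run one device on one file; classify as ok, error, crash or lockup."""
    cmd = [gs, "-dBATCH", "-dNOPAUSE", "-sDEVICE=" + device,
           "-sOutputFile=" + output_path, input_path]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "lockup"           # child is killed once the timeout expires
    if result.returncode < 0:
        return "crash"            # terminated by a signal (POSIX), e.g. SIGSEGV
    return "ok" if result.returncode == 0 else "error"
```

A driver would call `parse_devices` on the captured `gs -h` output, loop over the devices and the example files, and mail out anything that is not "ok".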
chrisl Robin_Watts: my ride to my next squash match is due any minute so.... I seem to remember there is quite a bit of initialization involved in fax encode and decode, and possibly some non-trivial flushing for encode, as well as the overhead of a fairly complex encoding: there is probably a "critical mass" where the bitmap is sufficiently large to warrant all the extra effort over simple rle.18:37.30 
Robin_Watts chrisl: Yeah, I'm almost there...18:37.44 
chrisl Robin_Watts: I just wanted to mention that before I have to disappear. I'd *hope* rle would get decent results, given the nature of a glyph bitmap18:38.32 
Robin_Watts With fax band sizes are typically 230K ish.18:41.24 
henrys chrisl_away, Robin_Watts:I guess switching the filters should be easy too.18:41.25 
Robin_Watts With rle 1 meg.18:41.37 
  I'd be really hard pushed to tell the difference between rle or none.18:44.47 
henrys quite a difference18:44.57 
  that is really quite surprising I would think rle would work well18:45.34 
Robin_Watts yeah, but fax gives rle in both directions, effectively.18:45.54 
  oh rle == none, is slightly surprising yes.18:46.14 
henrys yes that is what I meant18:46.24 
  anyway I've got to get back to my hpgl/2 foibles but I'll be about.18:48.26 
Robin_Watts rle does run length encoding of bytes, not bits, right ?18:52.30 
  So it'll only really kick in for 1bpp stuff when we have runs of > 24 pixels.18:53.06 
  I'm going to bale soon, so I'll send ray an email about this stuff. He may have thoughts.18:54.09 
henrys yes I was thinking that as well but we do trim the white space border from the glyph and a 1200 dpi glyph should give a nice repeat "full" bytes.18:55.08 
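The point about byte-oriented run lengths can be illustrated with a naive encoder. This is a generic (count, value) RLE written for illustration, not the coder Ghostscript uses; the function names and the test rows are made up.

```python
def pack_row(bits):
    """Pack a list of 0/1 pixels into bytes, MSB first, zero-padded."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk = chunk + [0] * (8 - len(chunk))
        value = 0
        for b in chunk:
            value = (value << 1) | b
        out.append(value)
    return bytes(out)


def rle_bytes(data):
    """Naive byte RLE: emit (run_length, value) pairs, runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


# 80 pixels of alternating 5-pixel runs: no two neighbouring bytes match.
short_runs = pack_row(([1] * 5 + [0] * 5) * 8)
# 80 pixels as two 40-pixel runs: whole bytes repeat.
long_runs = pack_row([1] * 40 + [0] * 40)
```

Here `short_runs` grows from 10 to 20 bytes under this scheme, while `long_runs` shrinks from 10 to 4. A pixel run only helps once it covers at least two whole bytes, which is consistent with the rough 24-pixel threshold mentioned above.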
mvrhel_laptop ok this may be better19:19.04 
  Robin_Watts: so would it help at all if the glyphs were put in a location that was shared amongst the bands?19:19.55 
  similar to what is done with the icc profiles in the clist?19:20.07 
Robin_Watts mvrhel_laptop: That would mean we didn't recompress them more than once, yes.19:20.28 
  But I thought we wanted the band data to be completely separate per band ?19:20.50 
mvrhel_laptop ok. we added the profile stuff in somewhat generically to have this capability for other objects19:22.20 
Robin_Watts Well, that may be perfect then.19:22.42 
  but it's a question of how hard it would be to do (bear in mind they are using 8.71 + patches)19:22.59 
mvrhel_laptop oh. 19:23.05 
  darn. yes. all this stuff was added with 9.019:24.37 
  and I don't think they have those changes19:24.59 
  let me show you where the functions are hold on19:25.26 
  so if you search on ICC_BAND_OFFSET that should show you what you need19:27.43 
  basically the write, and read of the data19:28.27 
  The icc_table is a structure that has offsets stored in it that point to the icc profiles clist. I am not sure how you would want to do this for the glyphs. This may be a bit too much of a project given the time constraints19:30.15
Robin_Watts So, you have a pseudoband to which you write all the icc stuff.19:31.27 
mvrhel_laptop yes19:31.31 
Robin_Watts and presumably the idea is we'd have another one to which we write all the compressed bitmaps.19:31.42 
mvrhel_laptop yes19:31.47 
Robin_Watts I suspect with the learning curve involved with me getting over my fear of the clist, this will take too long.19:32.15 
mvrhel_laptop if it was in 9.0+ I think it would go quickly. But working from some cobbled 8.71+ version I have to agree19:32.46
Robin_Watts I'm secretly hoping that ray will say "oh, well that's easy..." to either 3 or 4 from my email.19:33.02 
mvrhel_laptop I have to think 3 would be easy19:34.13 
Robin_Watts It sounds easy if you say it fast.19:35.43 
mvrhel_laptop and I was thinking that you would do that as mentioned in 419:35.46 
Robin_Watts It depends if we're OK to jump back and read from a file.19:36.05 
mvrhel_laptop with 4 there is no read yes?19:36.34 
Robin_Watts 4 may be easy - but I don't have a complete picture of how the tile cache is used.19:36.35 
  It depends if we need to have uncompressed access to the tile cache too.19:36.53 
mvrhel_laptop Can we not have both?19:37.06 
Robin_Watts I'm hoping ray knows all this stuff just off the top of his memory.19:37.11 
  mvrhel_laptop: You mean keep both compressed and uncompressed data in the tile cache?19:37.25 
mvrhel_laptop yes19:37.30 
  during writing19:37.34 
Robin_Watts We could, but that will bloat the tile cache.19:37.35 
  Maybe that's acceptable though.19:37.50 
mvrhel_laptop true. I don't see why we would need the uncompressed 19:38.09 
Robin_Watts I was thinking that if we only had to hold compressed data, we'd (presumably) be better off than we are now too.19:38.10 
mvrhel_laptop yes19:38.18 
Robin_Watts Ah. If I'm remembering this right... the tile cache is used both in writing and reading.19:43.02 
  The data is uncompressed from the band into the tile cache.19:43.31 
  at the same offsets etc at which it was compressed INTO the band.19:43.47 
  So if we do move to holding compressed data in the tile cache for writing, then we'd need to ensure that we at least kept the spacing large enough for the uncompressed data.19:45.17 
  So it's not a showstopper, just an extra bit of complexity.19:45.44 
  When we go to put something into the tile cache during writing, we'd have to calculate the space the uncompressed thing would use.19:46.12 
  Then attempt to compress into that space - if it fits, great. If not, we'd just copy the uncompressed thing.19:46.29 
  Then writing into the band we just copy the data (compressed or uncompressed, doesn't matter).19:46.47 
  Then the reading side can always copy out/uncompress knowing it will be large enough.19:47.03 
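The slot scheme Robin_Watts outlines above (reserve room for the uncompressed tile, compress into that space if it fits, otherwise copy the raw bits) can be sketched roughly as follows. This is only an illustration, not Ghostscript code: `pack_tile` and `unpack_tile` are hypothetical names, and `zlib` stands in for whatever compressor the clist actually uses.

```python
import zlib

def pack_tile(raw):
    """Return (is_compressed, payload); payload never exceeds len(raw)."""
    slot_size = len(raw)             # space the uncompressed tile would need
    packed = zlib.compress(raw)      # stand-in compressor (assumption)
    if len(packed) < slot_size:
        return True, packed          # compressed form fits in the slot
    return False, raw                # fall back to copying the raw bits

def unpack_tile(is_compressed, payload):
    """Reader side: the result always fits the slot sized for the raw tile."""
    return zlib.decompress(payload) if is_compressed else payload

glyph = bytes(64)                    # very compressible bitmap (all zero)
noise = bytes(range(16))             # too short to benefit from compression
glyph_flag, glyph_payload = pack_tile(glyph)
noise_flag, noise_payload = pack_tile(noise)
```

Writing into the band would then be a plain copy of the payload plus the flag, whichever form was chosen.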
mvrhel_laptop ok19:49.15 
  bbiaw19:55.38 
Robin_Watts tor8: ping20:12.20 
  So... any thoughts on me deleting mupdfdraw and muxpsdraw? Any reason not to just offer mudraw ?20:13.14 
  Should we rename mupdf to muview ?20:13.32 
ray_laptop Robin_Watts: have you determined why the cache is getting re-loaded so much -- does it need to be larger. Are we seeing the same "traffic" at 600 dpi ?20:17.22 
Robin_Watts ray_laptop: The cache doesn't seem to be being reloaded much.20:17.41 
  In my tests, I didn't see any evictions (though I may have if I had run for longer)20:18.17 
ray_laptop Robin_Watts: I thought we saw it being written into 30K + times for 30K characters20:18.26 
Robin_Watts Were we?20:18.54 
  clist_change_bits is called 30K times (as we'd expect)20:19.48 
  but cmd_put_bits is only called 6000 times.20:20.05 
ray_laptop the profile shows 30K+ calls to clist_change_bits20:20.45 
Robin_Watts That's what I just said :)20:21.01 
  clist_change_bits looks to see if it's in the cache. If it is, and it's been sent already for this band it just exits.20:21.34 
  If it's not in the cache, it puts it in. If it has not been sent for this band, it then calls cmd_put_bits.20:22.10 
  So the number of calls to cmd_put_bits correspond to the number of 'cache misses'.20:22.33 
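The control flow Robin_Watts describes for clist_change_bits can be modelled with a small sketch. It is a simplified model under stated assumptions, not the real function: the class and its fields are hypothetical, and eviction is left out because none was observed in the tests mentioned above.

```python
class BandWriterSketch:
    """Toy model: one shared tile cache plus a per-band 'already sent' record."""

    def __init__(self):
        self.cache = {}            # tile id -> uncompressed bits
        self.sent = set()          # (band, tile id) pairs already written
        self.change_bits_calls = 0
        self.put_bits_calls = 0

    def change_bits(self, band, tile_id, bits):
        self.change_bits_calls += 1
        if tile_id not in self.cache:
            self.cache[tile_id] = bits       # not cached yet: put it in
        if (band, tile_id) in self.sent:
            return                           # already sent for this band
        self.sent.add((band, tile_id))
        self.put_bits(band, bits)

    def put_bits(self, band, bits):
        # In the discussion this is where the compression cost is paid.
        self.put_bits_calls += 1

writer = BandWriterSketch()
for _ in range(10):                          # the same text drawn repeatedly
    for band in range(2):
        for tile_id in ("a", "b", "c"):
            writer.change_bits(band, tile_id, b"\x00" * 8)
```

In this model 60 calls to `change_bits` produce only 6 calls to `put_bits`, one per distinct (band, glyph) pair, which mirrors the gap between the two call counts discussed above.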
ray_laptop Robin_Watts: OK, I was working from (faulty) memory. I just opened the code and agree.20:22.59 
  Robin_Watts: sorry.20:23.06 
Robin_Watts no worries!20:23.12 
  I'm new to this code, so could easily be making stupid mistakes.20:23.23 
  The call to cmd_put_bits does the number crunching of the compression.20:23.45 
  And that adds up to 17 seconds or so out of the 79 for this file.20:24.06 
ray_laptop Robin_Watts: so the compression or not will only address the cmd_put_bits part of the time -- 16.6 seconds20:24.22 
Robin_Watts Given this is all latin text at the same font size, I can't believe that we're actually compressing more than 64 glyphs.20:25.03 
  Indeed.20:25.38 
ray_laptop The other question -- is it reasonable to have 6000 cache misses ? Looking at the file, that's hard to imagine. There are only a couple of embedded fonts20:25.43 
Robin_Watts So that's 21% of overall time we're playing for.20:26.00 
  It's the single biggest hotspot, I believe.20:26.17 
ray_laptop Robin_Watts: and besides compression, we have decompression.20:26.43 
Robin_Watts ray_laptop: Right, but that's elsewhere.20:26.58 
  my proposal doesn't affect that.20:27.05 
ray_laptop but I think investigating why the cache is so ineffective is the simplest20:27.09 
Robin_Watts Suppose we have 64 glyphs in play. on 53 pages.20:27.29 
ray_laptop Robin_Watts: the 6000 cache misses are only on _6_ pages !!20:27.57 
Robin_Watts Really?20:28.17 
ray_laptop Yes, look at the print_page_copies call count20:28.36 
Robin_Watts OK. so 64 glyphs on 6 pages is 384 calls. That's our baseline minimum.20:28.55 
  How many bands per page?20:29.06 
ray_laptop I wonder if this file paints text more than once on each page -- let me dump the text of a page and look at how many chars per page20:29.47 
tor8 Robin_Watts: go ahead and zap pdfdraw and xpsdraw. we must remember to fix the manpage as well to reflect the new name and capabilities.20:31.37 
Robin_Watts I reckon we have 52 bands per page.20:31.53 
  312 calls to clist_playback_band; 312/6 = 5220:32.16 
ray_laptop Robin_Watts: OK. the text wc gives: 300 lines, 5541 words, 34580 chars20:32.30 
Robin_Watts So... if every glyph appears on every band of every page, we'd expect 52*64*6 cache misses.20:32.46 
  = 1996820:33.02 
  So 6000 seems quite reasonable to me.20:33.13 
  That's the best the current scheme can do, regardless of cache size. Agreed?20:33.42 
  (sorry, cache misses is a bad term. "bitmap compression operations" would be better)20:34.13 
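The arithmetic in this exchange can be checked in a few lines. The figure of 64 glyphs is Robin_Watts's working assumption, not a measured value; the other numbers are the profile counts quoted above.

```python
playback_calls = 312       # calls to clist_playback_band from the profile
pages = 6                  # pages actually rendered in the profiled run
glyphs = 64                # assumed number of distinct glyphs in play

bands_per_page = playback_calls // pages
# Upper bound: every glyph appears on every band of every page.
worst_case = bands_per_page * glyphs * pages
# Lower bound: every glyph is compressed exactly once per page.
baseline = glyphs * pages
observed = 6000            # cmd_put_bits calls reported above
```

The observed 6000 falls between the two bounds, which is why it looks plausible for the current per-band scheme.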
ray_laptop Robin_Watts: so your concept is for a shared tile cache ?20:34.22 
Robin_Watts No.20:34.28 
  At the moment, we hold uncompressed data in the tile cache.20:34.56 
  And we compress it into the band.20:35.03 
  we end up compressing the same data multiple times into multiple bands.20:35.24 
  Instead, I'd like to compress when we create the tile cache entry, and then just 'copy' into the band.20:35.53 
ray_laptop If we have a number of bitmaps that are available to ALL bands, then might be able to get much better hit rate for text pages20:36.05 
Robin_Watts We'd be reducing the 1000 compression operations per page to 384.20:36.38 
  Yes, mvrhel appeared earlier and suggested keeping the image data in a shared area, akin to how ICC profiles are done.20:37.14 
  And I pointed out that the code to do that is only in 9.0x, not in 8.71.20:37.31 
  Given how well the cache is working in this case, we'd not actually gain that much by having a shared area; we'd compress the data just once per page in both schemes.20:38.35 
ray_laptop Robin_Watts: the clist 'pseudo_band' accessors are not too hard to backport (I had done something like it about 3 years ago for band complexity, so it doesn't rely on recent clist innovations)20:38.45 
Robin_Watts The win would be that we didn't have to copy the compressed data into each band.20:38.47 
ray_laptop clist innovations -- there's an oxymoron for you ;-)20:39.03 
Robin_Watts ray_laptop: Well, if you want to take that on, then I won't complain about how you want to do it :)20:39.17 
  but I am aware that you have other priorities right now.20:39.48 
ray_laptop Robin_Watts: we'd still have 30K decompressions, right ?20:40.00 
Robin_Watts Yes, but then we have that under any scheme, right ?20:40.14 
ray_laptop Robin_Watts: well if a bitmap was in the page level tile cache, there'd be less need for compression (a subset of the total)20:41.23 
Robin_Watts A page level tile cache presents problems though.20:41.43 
  You can't safely reuse entries in the cache during rendering because different rendering threads might need it at different times.20:42.21 
ray_laptop thinking on the fly here ... If a glyph occurs a second time stick it in the page level cache (uncompressed) until the page level is full20:42.29 
Robin_Watts Hence you'd have to insist that the tile cache was large enough to hold all possible glyphs.20:42.41 
ray_laptop it's not really a cache in that it's there for the entire page.20:42.48 
  Robin_Watts: no -- when the page level bitmap storage was full we fallback to the current per-band scheme20:43.25 
Robin_Watts Right. So you'd have to have an additional block of memory for this page level tile storage block, and you'd fill it with tiles until it was full, and then drop back to the old method ?20:43.36 
ray_laptop Robin_Watts: and recall the page level tiles go into the clist.20:44.00 
Robin_Watts Ok, so consider a typical page, where we have a title in a large font, followed by lots of body text in a smaller font.20:44.27 
ray_laptop so if we reduce the number of tiles per page from 1000 to, say, 100 we are still OK.20:44.38 
  the page level bitmap storage doesn't need to be a constrained size20:44.54 
  Robin_Watts: and if it is in a pseudo-band, then the memory based clist compression fallback will automatically compress that too20:45.25 
Robin_Watts We'd use all the page level bitmap storage up putting the headline glyphs in, then have no room left for the body text ones which are the ones it would really help with.20:45.27 
  I'm confused now.20:45.48 
  How can we random access into a band if it's compressed ?20:46.09 
ray_laptop the clist is memory based anyway, so anything we do to only store a bitmap glyph once is a win20:46.22 
  gxclmem does that -- it has a LRU cache of decompressed blocks20:46.49 
Robin_Watts Well, clearly, if you can arrange to only store bitmaps once per page, rather than once per band that's a major win.20:47.04 
ray_laptop Robin_Watts: that's what I was thinking -- both in performance (if we don't compress) and in space20:47.50 
Robin_Watts Even if we compress it'd still be a 50-fold improvement (on writing) on what we have now.20:48.33 
  because we'd only compress once per page rather than once per band.20:48.51 
ray_laptop But for disk based tiles, we want some set of LRU "blocks" of bitmaps that get reloaded on demand from the clist.20:49.17 
Robin_Watts And it would be a similar improvement on reading too, as we'd only decompress once too, rather than once per band.20:49.21 
ray_laptop Robin_Watts: If it's stored compressed we still have to decompress to use it in copy_mono20:49.54 
Robin_Watts This all sounds nice in theory, but it's a scarily large change to try to patch on top of the 8.71 + random stuff that 532 is using.20:50.24 
ray_laptop Robin_Watts: well, it's bits and pieces of stuff that is already done.20:50.50 
Robin_Watts Whereas, I'd hoped that 4) would be manageable.20:50.55 
ray_laptop mvrhel or I could probably port the pseudo-band accessor functions readily enough. The changes are mostly in clist_fill_mask I guess20:53.41 
  Robin_Watts: (4) ???20:54.05 
  Robin_Watts: nm -- I just saw your email20:55.23 
Robin_Watts sorry, was on phone.20:55.44 
  Ah, you'd not seen the email. I'm surprised I was making any sense at all :)20:56.06 
ray_laptop Robin_Watts: well, it's not like I haven't been pondering this as well :-)20:56.50 
  Robin_Watts: but the more I think about it, the idea of having the bitmaps stored (uncompressed) in a pseudo band, and the only cache be in the reader that knows how to reload a tile from the pseudo band, the simpler it sounds.20:58.13 
Robin_Watts So we'd do away with the tile cache completely ?20:58.45 
ray_laptop The only reason for the writer trying to manage the tile cache is so the reading of the bands can be 'streamed'20:59.08 
  Robin_Watts: no, we'd use a tile cache in the reader so that when a certain bitmap was requested, it would load from the pseudo-band if needed, and re-use of slots is allowed since we can reload as needed21:00.28 
  The reader tile cache is what prevents "thrashing" if we are disk based.21:01.08 
  so we NEVER store a bitmap more than once in the clists (in the pseudo band) unlike now where we store it multiple times for bands that need the glyph21:02.11 
  Robin_Watts: this follows somewhat what I have been thinking about for high-level image data as well -- store in a pseudo band. This is less of a clist size win since we only store a subset of an image in any particular band (what fits in the band plus the 'support')21:03.48 
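The reader-side cache ray_laptop proposes (each bitmap stored once in a pseudo band, with cache slots reloaded on demand) might look something like the sketch below. It is an illustration only: `ReaderTileCache` is a hypothetical name, a plain dict stands in for the pseudo band, and LRU replacement is an assumption borrowed from the gxclmem remark earlier in the discussion.

```python
from collections import OrderedDict

class ReaderTileCache:
    """Toy reader-side cache that reloads tiles from a pseudo-band store."""

    def __init__(self, pseudo_band, capacity):
        self.pseudo_band = pseudo_band   # tile id -> bits, stored exactly once
        self.capacity = capacity         # number of slots in the cache
        self.slots = OrderedDict()       # ordered least to most recently used
        self.loads = 0                   # reads that went to the pseudo band

    def get(self, tile_id):
        if tile_id in self.slots:
            self.slots.move_to_end(tile_id)      # hit: mark as recently used
            return self.slots[tile_id]
        bits = self.pseudo_band[tile_id]         # miss: reload from the store
        self.loads += 1
        self.slots[tile_id] = bits
        if len(self.slots) > self.capacity:
            self.slots.popitem(last=False)       # reuse the oldest slot
        return bits

store = {"a": b"A", "b": b"B", "c": b"C"}
tile_cache = ReaderTileCache(store, capacity=2)
fetched = [tile_cache.get(t) for t in ("a", "b", "a", "c", "b")]
```

Because any slot can be refilled from the store, slot reuse is safe here, unlike a writer-managed cache whose layout the reader has to mirror.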
henrys thinking about Robin_Watts compression numbers I imagine we could get a quick fix to their problem with no compression. At the end of the day it is peak memory usage that matters and it would be hard to imagine a page of text is going to rival typical graphics that would be printed on a usable printer.21:05.09 
ray_laptop Robin_Watts: but the image pseudo-band storage would be a HUGE win for skewed images that either greatly extend the number of support lines or punt to rectangles now21:05.11 
  Robin_Watts: any idea what having the 6000 bitmaps uncompressed vs. the current compression will be ? (iirc, we use G4 now which is pretty good)21:06.18 
  henrys: I must have missed "Robin_Watts compression numbers" was that an email or IRC ?21:07.36 
henrys I think it is great to improve the current code. Getting something quickly working for an 8.71 branch seems a different project, we need something very simple.21:08.03 
ray_laptop henrys: and I have no objection to Robin_Watts taking a quicker approach that we later do differently21:08.19 
henrys 1 meg bands vs. 256k bands.21:08.32 
  have you customized their band setup?21:08.48 
  ray_laptop?21:08.56 
ray_laptop henrys: they run BandHeight=256 21:09.30 
henrys so I see 52 bands for 1200.21:10.44 
  I imagine Robin_Watts numbers we not done for that BandHeight21:12.29 
  s/we/were21:12.35 
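The "52 bands for 1200" figure is consistent with BandHeight=256 if the page is 11 inches tall. The page height is an assumption (US Letter); the log does not state it.

```python
import math

def bands_per_page(page_height_inches, dpi, band_height):
    """Number of bands needed to cover the page, rounding the last one up."""
    return math.ceil(page_height_inches * dpi / band_height)

bands_1200 = bands_per_page(11, 1200, 256)   # assumed Letter page at 1200 dpi
bands_600 = bands_per_page(11, 600, 256)     # same page at 600 dpi
```

Under the same assumption, a 600 dpi run would need about half as many bands, so each glyph would be compressed into fewer bands.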
Robin_Watts sorry, off phone now. let me read the logs.21:13.15 
  52 bands for 1200, yes.21:14.30 
  Unless the simulator was set up differently, that's what I'd expect I was running at.21:15.04 
  Helen is calling me for dinner. It's Hallmark day, so I'd better go...21:15.43 
  I'll check back in later to see if you have any more thoughts.21:15.55 
henrys okay 21:16.28 
Robin_Watts I could send Eric an email tomorrow suggesting that he tries removing compression (with a patch) and then he can see if that makes a timing difference.21:16.35 
ray_laptop because that's what their raster pipeline needs -- this matters on the older color product, but here the band height21:16.52 
  It's easy enough to take the BandHeight param out of the ***_gs_main and let the BufferSpace determine the band height (their code won't be able to tell on the back end)21:18.01 
  Robin_Watts: I have to go now too21:18.19 
  Robin_Watts: but if you email Eric, suggest that he increase the BandHeight param or remove it21:18.55 
sebras tor8: hm. are the elements in a matrix allowed to take the float value inf?21:53.01 
tor8 no.21:53.33 
sebras tor8: when reading fz_invert_matrix() it seems like this may actually happen whenever two of abcd are zero e.g...21:53.56 
  a pdf with a pattern matrix of [ 0 0 0 0 0 0 ] triggers the bug.21:54.31 
tor8 not all matrices are invertible21:54.34 
sebras I know, but mupdf doesn't check for that.21:54.48 
tor8 and that matrix is indeed degenerate21:54.54 
sebras sure, so are you confident that we handle inf "correctly" i.e. without crashing elsewhere?21:55.20 
tor8 right, so we should probably check that the det != 0 before inverting21:55.39 
  no, we'll crash big time21:55.49 
sebras actually no. :)21:56.06 
  I just tried it on a pattern and it works fine, don't ask me why though...21:56.21 
tor8 maybe the pattern isn't used :)21:56.31 
  or the bbox is set properly and it doesn't contain any graphics21:56.44 
  in which case the matrix isn't actually used21:56.52 
sebras tor8: that could be the case.21:57.50 
  tor8: will you push a fix?21:58.03 
  tor8: I assume that we'll return the matrix unchanged instead?22:08.30 
tor8 sebras: I don't have sane working a.t.m. but there is a patch on my users/tor repo22:11.42 
sebras tor8: excellent.22:12.51 
  then my comment fits. :)22:12.59 
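The determinant check tor8 suggests can be sketched as below. This is not the MuPDF patch: `invert_matrix` is a hypothetical Python stand-in for fz_invert_matrix(), the matrix is the six-element [a b c d e f] form, and returning the input unchanged for a degenerate matrix follows sebras's guess above rather than anything confirmed in the log.

```python
def invert_matrix(m):
    """Invert an affine matrix (a, b, c, d, e, f); return m unchanged if singular."""
    a, b, c, d, e, f = m
    det = a * d - b * c
    if det == 0:
        return m                      # degenerate: avoid dividing by zero
    ia = d / det
    ib = -b / det
    ic = -c / det
    id_ = a / det
    ie = -e * ia - f * ic             # inverse translation, x component
    if_ = -e * ib - f * id_           # inverse translation, y component
    return (ia, ib, ic, id_, ie, if_)

scaled = invert_matrix((2, 0, 0, 2, 4, 6))
degenerate = invert_matrix((0, 0, 0, 0, 0, 0))
```

Without the check, the all-zero pattern matrix from the report would divide by a zero determinant, which in C floating point yields inf or NaN elements.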
marcosw_ henrys: I've reinstalled the cluster software on henrysx6 and it appears to work but I have temporarily disabled it since I'd like to test it more thoroughly when I'm online, which should be tomorrow.22:42.01 
sebras Robin_Watts tor8: pushed some more text to sebras/doc23:15.42 
  I must read up on the text device to be able to document it. also I would appreciate if Robin could write something about fz_set_aa_level() because that code is beyond my comprehension.23:17.26 
tor8 sebras: I'm rewriting the text device (slowly...) so don't spend any time on that yet23:45.32 