Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

	2012/02/14
mvrhel_laptop	alexcher are you around?	00:59.59
alexcher	mvrhel_laptop: yes, I've received your message.	02:08.41
mvrhel_laptop	alexcher: ok great. did it all make sense to you?	03:46.15
noobirc	MuPDF: Makefile:160, install cbz/mucbz.h $(PDF_APPS) $(XPS_APPS) $(MUPDF) $(bindir)	06:47.34
	I guess mucbz.h goes to the wrong place =)	06:47.50
ray_laptop	Robin_Watts: henrys: I got the image rotation change working -- once I worked through the transformations, I was able to do it with just the ImageMatrix mods. The good news is that (at least with the simulator) the performance boost is awesome -- 2.8x (2.3 sec for the 4 pages of PWTTQ1CC vs. 6.6)	07:26.06
	Robin_Watts: I haven't had a chance to throw much at it, but I will tomorrow. I will put the change into HEAD and run a cluster regression as well as looking at the simulator output for the files I've found that will take advantage.	07:27.20
	g'nite all...	07:31.03
	cluster run started ...	07:36.36
Robin_Watts	tor8: mudraw pushed for your delight and delectation.	11:18.12
	tor8: I've saned it locally and it passes.	11:18.27
	(with a modified sane, obviously)	11:18.34
chrisl	Robin_Watts: where are you at with the cust 532 performance issues?	11:21.39
Robin_Watts	I believe that with Ray's changes, which he was cluster testing earlier, we should hit the performance targets for all but 1 file.	11:23.32
	I spent some time staring at that file yesterday and didn't really get far.	11:23.48
chrisl	It's just, I think I need to ask you to look at the patterns problem again :-(	11:24.20
Robin_Watts	I spotted a potential optimisation in the clist code to do with eliminating clipping path use when it wasn't required, but my patch ran into problems.	11:24.32
	I mailed ray about it, but he's (understandably) not got back to me yet.	11:24.48
	chrisl: OK.	11:24.58
	I'm just heading out for a run now, but if you put details on here, I'll look when I get back.	11:25.22
chrisl	If you look at http://www.ghostscript.com/~regression/chrisl/compare3.html and test 60, you can see the issues	11:25.25
Robin_Watts	And why is that a problem?	11:26.19
	because it's more ziggy zaggy ?	11:26.44
chrisl	Yes, there's also a case where there is a gap that wasn't there before - I'll find an example for after you've been running	11:27.25
Robin_Watts	ok.	11:27.35
chrisl	Robin_Watts: (for when you return) test 83 on: http://www.ghostscript.com/~regression/chrisl/compare5.html	11:57.29
	and test 96 on: http://www.ghostscript.com/~regression/chrisl/compare6.html	11:57.52
sebras	tor8: you did pick up on noobirc's remark..?	11:58.04
tor8	sebras: yeah, I fixed that last week	11:58.52
sebras	ah, sweet.	11:59.07
	now, back to the meetings...	11:59.13
tor8	commit 8e8fc85624b66e59e5e284de7f532522e776a353	11:59.16
chrisl	Robin_Watts: Both the above show gaps that weren't there before. On some tests, Acrobat shows gaps where we didn't before, and do with our proposed change (can't compare PCL/PXL).	12:00.02
Robin_Watts	chrisl: We are in an area where differences are permissible.	12:46.29
	The spec deliberately doesn't specify exact behaviour.	12:46.49
chrisl	Robin_Watts: I know, and my first reaction was exactly that. But I'd like a second opinion before going with that	12:47.31
Robin_Watts	Is your gut feeling that this is an improvement or a regression?	12:47.59
	My gut says regression, but I've only looked at a few.	12:48.28
chrisl	Overall, I feel this is an improvement - there are a lot that we are closer to Acrobat with the change than before - it's just unfortunate that the ones that look worse look very obviously worse.....	12:49.43
Robin_Watts	How did you run the bmpcmp ?	12:51.50
chrisl	-t 16 -w 3, and I filtered the initial push with filter=ppm	12:52.30
Robin_Watts	Right.	12:52.37
chrisl	With a lot of those files, halftoned output was confusing as hell!	12:53.22
Robin_Watts	oh yes...	12:53.37
	And seeing pbm/pgm etc variants of the files really doesn't add anything.	12:54.10
chrisl	Exactly	12:54.20
Robin_Watts	Test 66 is a shame	12:57.43
chrisl	Test 66, I think, is actually closer to Acrobat's output - although not the same.....	12:58.54
Robin_Watts	Acrobat gives the gaps?	13:00.00
chrisl	Not exactly, but it doesn't show a constant shading as we originally did	13:00.45
	Robin_Watts: actually, the 300dpi tiff output from Acrobat does show the gaps, just like the bmpcmp	13:01.46
Robin_Watts	So the question occurs - do we want to attempt to emulate Acrobat's flaws?	13:02.26
	bbs	13:02.37
chrisl	I would say "no". But I do think the pattern stepping we have is sufficiently off (for example, with fts_06_0618.pdf) we should address it - I doubt we'd get away with waving the spec at cust 532.	13:07.03
	If addressing that stepping problem results in "poorer" output from the file in test 66, that nevertheless matches Acrobat (more closely), then so be it. But there are others which I'm less confident about.	13:08.49
	FWIW: Acrobat's output: http://www.ghostscript.com/~chrisl/Bug690637.tif	13:10.12
	Oops, better idea (compressed): http://www.ghostscript.com/~chrisl/Bug690637.tif.zip	13:13.18
Robin_Watts	I wonder if it's worth pulling 532 into this?	13:18.37
	We could send them a mail, saying that this is an area of the spec where there is considerable latitude left to the consumer, and everyone works differently.	13:19.07
	Say that we have a possible change that helps their test case, but it hurts in other cases.	13:19.23
	There is no one "correct" thing to do; so if they want to adopt the change, then they must be sure they are happy with it, and the onus for testing it should be on them.	13:19.54
chrisl	Regardless of what 532 said, I still think there's an issue	13:22.13
Robin_Watts	Right, but the time pressure is from them	13:22.36
	I have to run to the shop for hallmark day, then it will be lunchtime. back after that sorry.	13:23.38
chrisl	I'll be going to squash shortly, too	13:23.56
jvervier	Hi all	14:04.43
	I'm joining you because I'm looking to know if something is possible when using ghostscript on windows	14:05.18
	Is it possible to launch a gs32win.exe command with parameters without opening any other ghostview window during the process? (Like with linux distribution when I launch the command line)	14:06.08
	?	14:06.10
Robin_Watts	gswin32c.exe	14:11.14
	rather than gswin32.exe	14:11.48
jvervier	indeed !	14:19.49
	Forget this one	14:19.55
	sorry :D	14:19.58
	It's working this way	14:20.06
Robin_Watts	fab.	14:20.15
wwww	quit	15:02.13
Robin_Watts	There is scope for accelerating the skip_black_pixels, skip_white_pixels stuff in the bitmap compression code by using platform specific code.	15:53.07
	chrisl: You here?	16:13.48
chrisl	Robin_Watts: yes, just getting a cuppa	16:15.12
Robin_Watts	The 532 file I'm currently looking at is WWTTM1CT.	16:15.38
	It mostly has text in it.	16:15.43
	but lots of time is spent in gx_fill_path	16:15.58
chrisl	That's plausible depending on what the text rendering mode is in force	16:17.03
Robin_Watts	which seems odd to me. I thought rasterisation was supposed to be happening through freetype ?	16:17.17
	Oh, so what modes stay in our stuff ?	16:17.32
chrisl	Well, actually, cust 532's code pre-dates freetype anyway	16:17.49
Robin_Watts	OK, that makes sense then.	16:17.58
chrisl	For base 136 fonts, their code is using UFST, but not for any embedded fonts	16:19.22
ray_laptop	hi, all	16:38.00
chrisl	hi, ray_laptop, how's it going?	16:38.22
ray_laptop	OK. Got the kids off to school with lunches, books, instruments, ... I'll do a quick market in a bit, then go see Karen. She's scheduled for the OR on Wed.	16:39.51
henrys	ray_laptop:please wish Karen well for me.	16:40.43
Robin_Watts	ray_laptop: Delayed from yesterday?	16:41.02
	ray_laptop: Great news about the speedups (I saw the logs)	16:41.17
	Did that test out OK? Is there anything we can do to help with that?	16:41.40
	(I thought Karen was due for the OR late yesterday?)	16:42.06
ray_laptop	Robin_Watts: looks like there are a couple of files to look into 035-01.ps 148-01.ps 148.11 Bug687603.ps Bug687889.pdf Bug691554.eps, ...	16:43.43
Robin_Watts	I'll try to look at some of those.	16:44.17
	I'm failing dismally to come up with anything to speed up WWTTM1CT.pdf	16:44.33
ray_laptop	Robin_Watts: she was, but a scheduling conflict (something urgent) came up. Hers is not as time critical -- 4 days post injury for repair is not uncommon	16:44.38
	Robin_Watts: I'll submit a bmpcmp now, OK ?	16:44.59
Robin_Watts	ray_laptop: Right, that makes sense.	16:45.02
	fab.	16:45.04
ray_laptop	Robin_Watts: frankly I was surprised that the performance difference was so large, but the wheel output looks OK, and the same patch on HEAD works with no diffs.	16:46.40
Robin_Watts	I'm not :)	16:47.22
henrys	WWMTM1CT.pdf is primarily text?	16:47.54
Robin_Watts	WWTTM1CT.pdf, yes.	16:48.09
henrys	are we sure the cache hit rate is okay on the target?	16:48.24
ray_laptop	I'll drop Eric a note and tell him that the performance improvements look encouraging, but we're looking at a few files that show diff.	16:48.30
Robin_Watts	henrys: No idea.	16:48.42
	The whole runtime is 75 seconds (ish)	16:49.37
ray_laptop	henrys: WWTT{MN}1CT are text. 'M' is all black. 'N' is blocks of 4 different color text in 4 sections down the page	16:49.44
	henrys: they are both 50 page files.	16:49.55
henrys	well there should be a call in the profile for rendering from cache for each character.	16:49.56
ray_laptop	henrys: there are	16:50.16
Robin_Watts	The clist playback is 39 seconds or so.	16:50.23
henrys	hmm not something we have a history for being slow on.	16:50.51
ray_laptop	henrys: iirc, the first page subset file had 5069 "render from cache" calls	16:50.52
	iirc, gx_image_cached_char	16:51.18
Robin_Watts	I see 31595 calls to that.	16:52.05
	oh, but that's on all the pages.	16:52.12
henrys	this is the one they print faster than the blank pages?	16:52.13
ray_laptop	Robin_Watts: henrys: I looked and I had thought that forcing it to not use the image_init, ..., path and using the 'copy_mono' would be better, but it didn't look like it on the simulator	16:52.30
	henrys: there is something funny about the 50_blank_pages on their engine, but we haven't focused on that.	16:53.06
	henrys: our blank page time is faster than the page with the text	16:53.34
henrys	so the ufst renderer is faster than ours ... But with a good cache hit rate I wouldn't expect that to matter much.	16:54.04
ray_laptop	our blank pages is faster than their time with the text at least	16:54.18
Robin_Watts	The char rendering is being done in our renderer, I believe.	16:54.29
henrys	I assume we're using our renderer and 5th gen is ufst	16:54.30
ray_laptop	henrys: this is using UFST I think. Another file (J11) uses our type42 and type1 rendering for embedded fonts	16:55.09
chrisl	henrys: 6th gen is using UFST for MT fonts, but embedded fonts still use the AFS code	16:55.24
Robin_Watts	ray_laptop: Not using UFST for this, I'm fairly sure.	16:55.38
ray_laptop	Robin_Watts: let me check the profile...	16:55.53
Robin_Watts	47 seconds is in gx_default_fill_path (I think, their profile is hard to read)	16:55.54
henrys	chrisl:right ...	16:56.01
Robin_Watts	22+25	16:56.03
henrys	I assumed embedded.	16:56.09
	Robin_Watts:and you are profiling 8.something for this or just using the simulator?	16:56.49
Robin_Watts	I'm looking at the profiles they supply.	16:57.43
	My attempts to profile their stuff have failed so far.	16:58.11
chrisl	Robin_Watts: what type of text is it (western, kanji etc)?	16:58.32
Robin_Watts	(And in fact, I can't seem to get Very Sleepy to find symbols for anything since I installed Windows 7 :( )	16:58.55
	western.	16:58.59
	It's english text. About cowboys.	16:59.09
chrisl	Maybe Ghostscript doesn't like the old west?	16:59.33
ray_laptop	chrisl: on the 1200 bit profile of a 6-page subset file, I see 31,595 calls to gx_image_cached_char -- there are 980 calls to gs_type1_interp_init and only 490 calls to gs_type1_interpret	17:00.09
henrys	ray_laptop:is karen a regular skater? That must have been quite a fall.	17:00.32
	well I guess we should start the meeting.	17:01.14
Robin_Watts	ray_laptop: 38 seconds is spent in clist_fill_mask	17:01.17
	Of which 17 seconds are in clist_change_bits - most of which is (I believe) doing 2d compression of the character.	17:01.54
chrisl	ray_laptop: I'm bemused about the calls to gs_type1_interp_init and gs_type1_interpret - I would expect the other way around!	17:02.17
henrys	mvrhel and alexcher are setup.	17:02.27
ray_laptop	hmm... I looked at the PDF file. It is an embedded Type1 font subset "/BaseFont /KAKDNP+TimesNewRomanPSMT"	17:02.48
henrys	mvrhel and I keep bumping the priority of the icc user params in pcl, but think I can start today.	17:02.54
mvrhel	ah ok. if you need anything from me, let me know henrys	17:03.11
henrys	mvrhel are you and alexcher good?	17:03.42
mvrhel	well, I would like to get an "ok I understand" or an "I don't have any idea what you want" back from him	17:04.09
chrisl	ray_laptop: double the number of calls to interp compared to interp_init would be plausible for various reasons, but the other way round is slightly odd	17:04.11
henrys	this hpgl/2 stuff I have is painful, 3x3 foot plots all over the place.	17:04.51
alexcher	henrys: the patch looks OK but I didn't think much about it.	17:04.58
henrys	thinking is good ;-)	17:05.45
mvrhel	alexcher: so do you know what I need you to do on the interpreter end for the output intent support?	17:06.03
Robin_Watts	alexcher: You don't think much of it, or you haven't thought much about it?	17:06.09
alexcher	mvrhel: I'll review the patch today.	17:07.11
mvrhel	well, it's not a patch now	17:07.19
	it's in the trunk	17:07.22
Robin_Watts	(Sorry, I misread alexcher's initial reply, hence my question. I think it's clear he meant the latter, sorry)	17:07.41
mvrhel	review the steps that I said I would like you to do in the interpreter	17:07.46
	please	17:07.50
ray_laptop	Robin_Watts: looking at the 6_page profile, there is 110 seconds in gx_ht_alloc_cache called from gx_dc_default_fill_masked. That seems like a lot	17:07.54
Robin_Watts	ray_laptop: I'm looking at the profiles from Build_0036. Those are the most recent, I believe.	17:08.30
ray_laptop	Robin_Watts: OK. I'll switch over -- I was probably looking at an older one.	17:09.26
Robin_Watts	WWTTM1CT_1200_1bit_hprof.txt	17:09.29
henrys	can we ask them to reproduce the profiling results in the simulator? I assume they can profile the simulator and if the results are quite different than the target we know there isn't a lot we can do.	17:09.41
	other than hardware platform stuff.	17:10.09
Robin_Watts	henrys: Well, assuming the call graph is the same (which we'd hope it would be, otherwise, what's the point?) then we can identify issues by examining the profiles they supply, and then use the simulator to step through.	17:10.57
ray_laptop	henrys: their simulator profile was screwy enough that I was ignoring it. Also they don't seem to be able to get the profiler to run with a non-DEBUG build, so the timings are not comparable	17:11.46
Robin_Watts	But I find it hard to believe that we spent 77 seconds in 22 calls to gs_push_boolean	17:11.46
henrys	I am just concerned there is something wrong on the printer - integration or something, that would be obvious comparing the target and simulator but after ray_laptop's comment nvm.	17:14.03
	chrisl:do you have anything for the meeting?	17:14.37
	marcosw_:I've installed all the packages?	17:14.52
	s/?//	17:14.58
chrisl	henrys: no nothing from me this week, I don't think.....	17:15.13
Robin_Watts	The mupdf customer has been experimenting with multithreaded rendering, and I think he's got it working.	17:15.29
	He reported some problems, and I pointed him at fixes, and he's gone quiet, so presumably that's a good thing.	17:16.08
henrys	Robin_Watts:that is very good. We need to think about where we want to show off the new android app now that paul is finished.	17:16.17
Robin_Watts	henrys: Well, presumably we want to get it listed on the android app store, in the same way that we are on the ios app store?	17:17.14
henrys	yes, has tor8 finished reviewing?	17:17.42
Robin_Watts	But we can offer download links to it on mupdf.com already.	17:17.44
henrys	I assume the android store has fewer police.	17:18.07
Robin_Watts	henrys: Yes, you get to keep your testicles, rather than having to hand them to apple in a jar.	17:18.41
henrys	I guess we'll release this on the world with the upcoming mupdf release.	17:19.42
Robin_Watts	ray_laptop: It seems odd to me that clist_playback_file_bands can take 39 seconds, where gx_image_cached_char takes 42.	17:20.48
	Are we imaging the cached char BEFORE the clist ?	17:21.12
ray_laptop	Robin_Watts: w.r.t gs_push_boolean -- the profile says it is calling gs_interpret, so I think their tool is getting the function names wrong. It was probably supposed to be gs_main_interpret which is also in the same .o (imain.c)	17:21.52
Robin_Watts	ray_laptop: Yeah, I figured it was something like that.	17:22.12
	It's EXTREMELY annoying that their tool does that.	17:22.20
chrisl	Robin_Watts: yes, I believe that's how glyph rendering and clist work - otherwise the clist wouldn't be self contained	17:22.48
ray_laptop	Robin_Watts: gx_image_cached_char happens during clist writing. It writes 'bits' to the tile cache. The rendering does 'copy_mono' from those bits	17:22.51
	the clist doesn't know anything about characters	17:23.11
Robin_Watts	Right. So on every glyph we image, we recompress the bitmap into the clist ?	17:23.14
	Is there no way we can only compress each image into the clist once ?	17:23.43
ray_laptop	Robin_Watts: when we put the tiles into the cache, I think that's where we compress	17:23.54
	Robin_Watts: there _should_ be good hit rate in the cache. Maybe we are seeing that the tile cache isn't large enough for 1200 dpi	17:24.42
tor8	Robin_Watts: about the android app, the ChoosePDFActivity should probably be a file browser so you can find documents in other locations too (to match what other android apps do)	17:24.56
	and I'll want to go over the icons a bit before release	17:25.09
Robin_Watts	tor8: File browser - sure, but I don't think we can ask Paul to do that (at least not as part of the original quote).	17:25.50
tor8	I also wonder if it's possible to constrain the vertical scroll bouncing	17:26.02
henrys	tor8:how are the docs (sebras) coming? Is there a work in progress somewhere we can look at?	17:26.11
Robin_Watts	tor8: Of the pages when zoomed out ?	17:26.18
tor8	yeah. when flipping left and right it's a bit disconcerting how it wobbles up and down at the same time	17:26.35
Robin_Watts	henrys: sebras has a branch in his git repo.	17:26.45
	(on casper)	17:26.50
henrys	oh okay I'll check it out.	17:27.12
ray_laptop	Robin_Watts: I'll let you focus on mupdf and we can discuss WWTT performance later. I have to run a couple of errands and go over to see Karen	17:27.12
Robin_Watts	ray_laptop: OK.	17:27.25
henrys	let's call the meeting done.	17:27.34
Robin_Watts	It might be worth us trying to up the cache a bit in the clist.	17:27.39
ray_laptop	Robin_Watts: If you want to look at some of the bmpcmp results on the image rotation, just mention it in IRC and I'll check the logs before diving in.	17:27.59
Robin_Watts	ray_laptop: Will do.	17:28.07
tor8	henrys: http://git.ghostscript.com/?p=user/sebras/mupdf.git;a=tree;h=refs/heads/doc;hb=refs/heads/doc	17:28.14
henrys	tor8:thanks	17:28.32
ray_laptop	Robin_Watts: we should get the same cache hit/miss sequence in the simulator (at 1200 dpi)	17:28.37
tor8	henrys: some docs in doc/ and others in fitz/fitz.h and pdf/mupdf.h	17:28.53
ray_laptop	Robin_Watts: and we can look at its size and effectiveness	17:29.03
Robin_Watts	ray_laptop: Indeed. I'll try and figure out how :)	17:29.09
ray_laptop	bye for now....	17:29.09
Robin_Watts	cu	17:30.48
henrys	Robin_Watts:I wonder if we could keep the memory okay without compression - I assume that would speed things up significantly.	17:36.57
Robin_Watts	Not as much as just making the cache large enough might.	17:38.30
	This is odd.	17:41.17
	It's finding things in the cache, and then compressing them anyway.	17:41.28
chrisl	That's probably bad......	17:41.44
Robin_Watts	oh, each bitmap gets compressed once per band.	17:42.21
henrys	I assume it pulls the bitmap from the cache and compresses it to stuff it in a band, right?	17:42.39
Robin_Watts	I'm confused then.	17:42.59
	We render glyphs to bitmaps. Those go into the font cache.	17:43.19
chrisl	glyph cache	17:43.33
henrys	It's been ages since I looked at it.	17:43.33
Robin_Watts	Then we pull bitmaps out from the font cache, and pass them to clist_fill_mask	17:43.41
	chrisl: sorry, glyph cache, not font cache (I stand corrected)	17:44.00
	clist_fill_mask puts the bitmaps into a tile cache.	17:44.16
henrys	ah I didn't know it was using the tile cache I thought it would just do copy mono on the clist.	17:45.39
chrisl	Robin_Watts: It's been a while, but isn't there a cache for objects which applies to all bands?	17:46.17
Robin_Watts	Then it seems to encode each tile once per band.	17:46.35
	Potentially stupid question here, so please bear with me...	17:47.26
	The same time will compress the same regardless of which band it goes into, right?	17:47.48
	So why don't we just copy the compressed version from one band to another.	17:48.04
	(That may be akin to saying "So why don't we just make boats that can't sink?")	17:48.37
mvrhel	bboab	17:48.44
	bbiab	17:48.46
Robin_Watts	s/time/tile/	17:49.31
marcosw_	Sorry I missed the meeting. Was there anything for me other than that henrysx6 is ready to be re-enabled as a cluster node.	17:50.05
chrisl	Robin_Watts: Does it do that for every glyph?	17:50.06
Robin_Watts	Yes.	17:50.13
chrisl	Regardless of which band(s) the glyph occupies	17:50.37
Robin_Watts	As far as I can tell, every glyph causes a call to clist_fill_mask. That then looks for the tile in the cache.	17:50.51
	If it finds the tile it checks to see if that tile was signed as being in the required band - if not, it resends the tile for the new band.	17:51.31
	So, in this case, it means it's going to resend every tile for every band it's used in.	17:51.56
chrisl	Do we have one tile cache? Or a tile cache per band?	17:52.17
Robin_Watts	Whereas if it was smart enough to have said "this one will be in every band" to start with, it would only have to send it once.	17:52.24
	We have one tile cache. Each entry has a set of bits, one per band to say which bands it's in.	17:52.43
	Now, maybe we NEED to send it to every band - I confess, that I still haven't got my head entirely around the clist stuff.	17:53.26
chrisl	Robin_Watts: I was wondering if we could special case tiles which represent glyphs, so they automatically get associated with every band - my feeling is that would be a useful thing for a lot of documents	17:54.08
Robin_Watts	If so, then this may be almost the best we can do - but why go to the trouble of recompressing the bitmap again and again? Why not just copy the compressed representation from one band to another ?	17:54.18
	chrisl: That would be nice. But that may require us to send the glyph to every band anyway.	17:55.08
	which would mean we'd actually be worse off than we are now.	17:55.36
	'Just' keeping a note in the tile cache of whether we've compressed a tile before, and where we can copy the data from would give better results, I think.	17:56.10
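The idea of keeping a note of an earlier compression can be sketched with a toy model. This is not the clist code: the class, method names and counts are hypothetical, and zlib stands in for whatever bitmap compressor is really used. It only shows how many compressions happen when a tile is resent to each band, with and without reusing the compressed bytes.

```python
import zlib

class TileCache:
    """Toy model: one cache; each entry remembers which bands already have it."""

    def __init__(self, reuse_compressed: bool):
        self.reuse_compressed = reuse_compressed
        self.entries = {}          # tile id -> {"bits", "bands", "packed"}
        self.compress_calls = 0

    def _compress(self, bits: bytes) -> bytes:
        self.compress_calls += 1
        return zlib.compress(bits)   # stand-in for the real bitmap compressor

    def fill_mask(self, tile_id: str, bits: bytes, band: int, band_data: dict) -> None:
        entry = self.entries.setdefault(
            tile_id, {"bits": bits, "bands": set(), "packed": None})
        if band in entry["bands"]:
            return                       # this band already has the tile
        if self.reuse_compressed and entry["packed"] is not None:
            packed = entry["packed"]     # copy the earlier compressed bytes
        else:
            packed = self._compress(entry["bits"])
            entry["packed"] = packed
        band_data.setdefault(band, []).append(packed)
        entry["bands"].add(band)

def count_compressions(reuse: bool, bands: int = 20, uses_per_band: int = 5) -> int:
    """Use one glyph several times in every band and count compressor calls."""
    cache, band_data = TileCache(reuse), {}
    glyph = bytes(30 * 240)              # blank 240x240 1bpp bitmap
    for band in range(bands):
        for _ in range(uses_per_band):
            cache.fill_mask("e", glyph, band, band_data)
    return cache.compress_calls

print(count_compressions(False), count_compressions(True))
```

In this model the tile is still written to every band either way; only the repeated compression work goes away.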
chrisl	I'm confused - I guess I don't understand the format of entries in the tile cache.....	17:56.29
Robin_Watts	The tile cache contains uncompressed data, I believe.	17:57.19
	When we send a tile entry to the band we 2d compress it, and copy it into the band data (I think!)	17:57.39
henrys	so storing the glyph compressed in the tile cache would work okay too?	17:58.27
Robin_Watts	henrys: Yes, was just pondering that.	17:59.07
	I don't know what else the tile cache is used for.	17:59.18
chrisl	Sounds like there's mileage in compressing all tiles in the cache, rather than on writing to the clist.	17:59.26
Robin_Watts	chrisl: Yeah. Presumably nothing goes into the tile cache that won't be compressed into a band anyway.	17:59.57
	so we can't be any worse off.	18:00.04
chrisl	Robin_Watts: exactly. I'm assuming from what Ray said earlier, after the tile cache, the clist has no idea the tile represents a glyph.	18:00.47
Robin_Watts	I don't think it even knows at the tile cache level.	18:01.20
henrys	but if the performance problem goes away just bypassing compression it is worth a look, that is such a trivial change.	18:01.22
Robin_Watts	image_fill_mask doesn't even know that it's a glyph, as far as I can tell.	18:01.59
henrys	the memory optimization may no longer be relevant.	18:02.16
Robin_Watts	henrys: Possibly, yes. But at 1200dpi, what will that do for memory use?	18:02.22
	Typical glyph = 1/5 inch?	18:03.03
chrisl	Presumably we only write each glyph once to each band, we don't write every use of every glyph?	18:03.53
Robin_Watts	So 240x240 = 7.2K per glyph uncompressed.	18:04.17
	chrisl: indeed.	18:04.22
	So for upper and lower caps for a single font in a single size that would be 388K per band.	18:05.12
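A quick check of the arithmetic above, as a sketch. It assumes a square glyph box, 1 bit per pixel with rows padded to whole bytes, and 52 glyphs (upper and lower case A to Z); the helper name is made up for illustration.

```python
def glyph_bitmap_bytes(dpi: int, glyph_inches: float, bits_per_pixel: int = 1) -> int:
    """Uncompressed size of a square glyph bitmap, rows padded to whole bytes."""
    side = round(dpi * glyph_inches)              # pixels per side
    row_bytes = (side * bits_per_pixel + 7) // 8  # bytes per scanline
    return row_bytes * side

per_glyph = glyph_bitmap_bytes(1200, 0.2)   # 240 x 240 pixels at 1bpp
per_font = 52 * per_glyph                   # A-Z and a-z, one font, one size
print(per_glyph, per_font)
```

Under those assumptions this gives 7200 bytes per glyph and about 374K for 52 glyphs, in the same ballpark as the per-band figure quoted above.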
henrys	and you're probably getting 2:1 with the run length?	18:06.10
	maybe a little better.	18:06.19
Robin_Watts	cmd_compress_cfe = fax compression, right?	18:06.37
	fax compression gets 4:1 (according to prof google)	18:07.50
	I'd hope more than that on such high res glyphs.	18:08.16
	group 4 fax gets 15:1 (again from google). Don't know what we use.	18:08.32
henrys	too many variables we'd need to look at a page full of glyphs - we aren't accounting for any overhead - each entry in the tile cache has a header etc.	18:09.47
Robin_Watts	it's not space in the tile cache, it's space in the band data.	18:10.19
	(unless I'm not following)	18:10.27
	but I take your point.	18:10.31
henrys	we have stats for that in the code.	18:11.36
Robin_Watts	Let me see if I can find out how to print the completed band size.	18:11.43
henrys	at 600 dpi 1 bit I don't see why we are banding at all but I won't go there.	18:12.19
Robin_Watts	1200dpi, 1 bit, but yes...	18:12.56
henrys	yes	18:13.03
	so this job is okay at 600 or no?	18:13.25
Robin_Watts	Urm...	18:14.04
henrys	I should have the mail in front of me, sorry	18:14.35
Robin_Watts	Yes.	18:14.48
	The last set of results we had from him...	18:15.02
	PWTTQ1CC was slow in all 3 modes, but Ray's rotation should sort that.	18:15.22
	WWTTM1CT and WWTTN1CT were both slow in 1200 1bpp only.	18:15.44
	And that's all the results they'd put in bold in the email, so I assume we're off the hook for anything else.	18:16.00
chrisl	Robin_Watts: sorry, just a thought - what about compressing with rle - it should be considerably faster than fax, and given the relatively small size of the bitmaps in question, I'm not sure that fax will really get into its stride	18:16.17
Robin_Watts	chrisl: That is a good idea.	18:16.41
	Let me try and measure the differences in band size of these files.	18:17.01
	Any hints etc... :)	18:17.33
chrisl	Erm, -Zl?	18:17.56
henrys	right -ZLl says everything I thought.	18:18.19
	literally ... everything	18:18.44
	marcosw_:Robin_Watts' recent change reminds me a smoke test for all the devices would be useful. Whatever happened to that?	18:22.30
	marcosw_:nothing else I know of for the meeting.	18:22.41
	it seems odd we have so many bands at 1200	18:25.46
marcosw_	henrys:I have a simple script that runs "gs -h" captures the output and then runs all the devices using each of the input files from the examples directory. It isn't run automatically, but could be.	18:26.11
henrys	seems like a simple thing to do along with the regular regression run.	18:27.04
	there may be a bunch that fail and will need to be excluded or ignored.	18:27.32
marcosw_	the only problem is that lockups are not currently detected, the script just stops and I have to control-C and deal with it. I'll fix that and run it automatically daily, sending out an email for seg faults, error, or lockups.	18:31.50
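A smoke test along the lines described might look like the sketch below. It is not marcosw_'s script: the function names, the timeout value, the /dev/null output target and the sample help text are all assumptions, and some devices will need extra switches or to be excluded.

```python
import subprocess

def parse_devices(help_text: str) -> list:
    """Extract device names from the 'Available devices:' block of `gs -h`."""
    devices, in_block = [], False
    for line in help_text.splitlines():
        if line.startswith("Available devices:"):
            in_block = True
        elif in_block and line.startswith(" "):
            devices.extend(line.split())
        elif in_block:
            break                        # first unindented line ends the block
    return devices

def smoke_test(device: str, input_file: str, timeout_s: int = 60) -> str:
    """Run one device on one file; classify the outcome."""
    cmd = ["gs", "-q", "-dBATCH", "-dNOPAUSE", "-dSAFER",
           "-sDEVICE=" + device, "-sOutputFile=/dev/null", input_file]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "lockup"                  # child is killed when the timeout hits
    if result.returncode < 0:
        return "crash"                   # killed by a signal on POSIX, e.g. SIGSEGV
    return "ok" if result.returncode == 0 else "error"

# Abbreviated, illustrative help text; real `gs -h` output lists many more devices.
SAMPLE_HELP = """Usage: gs [switches] [file1.ps file2.ps ...]
Available devices:
   bbox bit bitcmyk
   pbm pgm ppm
Search path:
   /usr/share/ghostscript
"""

print(parse_devices(SAMPLE_HELP))
```

The timeout is what turns a lockup into a reportable result instead of a hung script.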
henrys	marcosw_:sounds good.	18:32.15
chrisl	Robin_Watts: my ride to my next squash match is due any minute so.... I seem to remember there is quite a bit of initialization involved in fax encode and decode, and possibly some non-trivial flushing for encode, as well as the overhead of a fairly complex encoding: there is probably a "critical mass" where the bitmap is sufficiently large to warrant all the extra effort over simple rle.	18:37.30
Robin_Watts	chrisl: Yeah, I'm almost there...	18:37.44
chrisl	Robin_Watts: I just wanted to mention that before I have to disappear. I'd hope rle would get decent results, given the nature of a glyph bitmap	18:38.32
Robin_Watts	With fax band sizes are typically 230K ish.	18:41.24
henrys	chrisl_away, Robin_Watts:I guess switching the filters should be easy too.	18:41.25
Robin_Watts	With rle 1 meg.	18:41.37
	I'd be really hard pushed to tell the difference between rle or none.	18:44.47
henrys	quite a difference	18:44.57
	that is really quite surprising I would think rle would work well	18:45.34
Robin_Watts	yeah, but fax gives rle in both directions, effectively.	18:45.54
	oh rle == none, is slightly surprising yes.	18:46.14
henrys	yes that is what I meant	18:46.24
	anyway I've got to get back to my hpgl/2 foibles but I'll be about.	18:48.26
Robin_Watts	rle does run length encoding of bytes, not bits, right ?	18:52.30
	So it'll only really kick in for 1bpp stuff when we have runs of > 24 pixels.	18:53.06
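The point about byte-wise run lengths on 1bpp data can be illustrated with a small sketch. It assumes MSB-first packing and uses hypothetical helper names; it is not Ghostscript's RLE encoder, it only shows what byte runs such an encoder would have to work with.

```python
from itertools import groupby

def pack_row(bits: str) -> bytes:
    """Pack a string of '0'/'1' pixels into bytes, MSB first, zero padded."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))

def byte_runs(data: bytes) -> list:
    """Return (byte value, run length) pairs, the unit a byte-wise RLE sees."""
    return [(value, len(list(group))) for value, group in groupby(data)]

# The same 24-pixel black run, byte aligned and shifted by 4 pixels.
aligned = pack_row("0" * 8 + "1" * 24 + "0" * 8)
shifted = pack_row("0" * 4 + "1" * 24 + "0" * 12)
print(byte_runs(aligned))
print(byte_runs(shifted))
```

The aligned row yields a run of three 0xFF bytes; the shifted one yields only two, with partial bytes on either side, so short pixel runs give a byte-wise encoder little to repeat.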
	I'm going to bail soon, so I'll send ray an email about this stuff. He may have thoughts.	18:54.09
henrys	yes I was thinking that as well but we do trim the white space border from the glyph and a 1200 dpi glyph should give nice repeated "full" bytes.	18:55.08
mvrhel_laptop	ok this may be better	19:19.04
	Robin_Watts: so would it help at all if the glyphs were put in a location that was shared amongst the bands?	19:19.55
	similar to what is done with the icc profiles in the clist?	19:20.07
Robin_Watts	mvrhel_laptop: That would mean we didn't recompress them more than once, yes.	19:20.28
	But I thought we wanted the band data to be completely separate per band ?	19:20.50
mvrhel_laptop	ok. we added the profile stuff in somewhat generically to have this capability for other objects	19:22.20
Robin_Watts	Well, that may be perfect then.	19:22.42
	but it's a question of how hard it would be to do (bear in mind they are using 8.71 + patches)	19:22.59
mvrhel_laptop	oh.	19:23.05
	darn. yes. all this stuff was added with 9.0	19:24.37
	and I don't think they have those changes	19:24.59
	let me show you where the functions are hold on	19:25.26
	so if you search on ICC_BAND_OFFSET that should show you what you need	19:27.43
	basically the write, and read of the data	19:28.27
	The icc_table is a structure that has offsets stored in it that point to the icc profiles clist. I am not sure how you would want to do this for the glyphs. This may be a bit too much of a project given the time constraints	19:30.15
Robin_Watts	So, you have a pseudoband to which you write all the icc stuff.	19:31.27
mvrhel_laptop	yes	19:31.31
Robin_Watts	and presumably the idea is we'd have another one to which we write all the compressed bitmaps.	19:31.42
mvrhel_laptop	yes	19:31.47
Robin_Watts	I suspect with the learning curve involved with me getting over my fear of the clist, this will take too long.	19:32.15
mvrhel_laptop	if it was in 9.0+ I think it would go quickly. But working from some cobbled 8.71+ version I have to agree	19:32.46
Robin_Watts	I'm secretly hoping that ray will say "oh, well that's easy..." to either 3 or 4 from my email.	19:33.02
mvrhel_laptop	I have to think 3 would be easy	19:34.13
Robin_Watts	It sounds easy if you say it fast.	19:35.43
mvrhel_laptop	and I was thinking that you would do that as mentioned in 4	19:35.46
Robin_Watts	It depends if we're OK to jump back and read from a file.	19:36.05
mvrhel_laptop	with 4 there is no read yes?	19:36.34
Robin_Watts	4 may be easy - but I don't have a complete picture of how the tile cache is used.	19:36.35
	It depends if we need to have uncompressed access to the tile cache too.	19:36.53
mvrhel_laptop	Can we not have both?	19:37.06
Robin_Watts	I'm hoping ray knows all this stuff just off the top of his memory.	19:37.11
	mvrhel_laptop: You mean keep both compressed and uncompressed data in the tile cache?	19:37.25
mvrhel_laptop	yes	19:37.30
	during writing	19:37.34
Robin_Watts	We could, but that will bloat the tile cache.	19:37.35
	Maybe that's acceptable though.	19:37.50
mvrhel_laptop	true. I don't see why we would need the uncompressed	19:38.09
Robin_Watts	I was thinking that if we only had to hold compressed data, we'd (presumably) be better off than we are now too.	19:38.10
mvrhel_laptop	yes	19:38.18
Robin_Watts	Ah. If I'm remembering this right... the tile cache is used both in writing and reading.	19:43.02
	The data is uncompressed from the band into the tile cache.	19:43.31
	at the same offsets etc at which it was compressed INTO the band.	19:43.47
	So if we do move to holding compressed data in the tile cache for writing, then we'd need to ensure that we at least kept the spacing large enough for the uncompressed data.	19:45.17
	So it's not a showstopper, just an extra bit of complexity.	19:45.44
	When we go to put something into the tile cache during writing, we'd have to calculate the space the uncompressed thing would use.	19:46.12
	Then attempt to compress into that space - if it fits, great. If not, we'd just copy the uncompressed thing.	19:46.29
	Then writing into the band we just copy the data (compressed or uncompressed, doesn't matter).	19:46.47
	Then the reading side can always copy out/uncompress knowing it will be large enough.	19:47.03
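
A minimal sketch of the slot scheme described above, assuming zlib as a stand-in for whatever codec the clist actually uses. The names store_tile and load_tile are hypothetical and are not Ghostscript functions. The slot is always sized for the uncompressed bitmap, so the reader can expand into it safely.

```python
import zlib

def store_tile(bits: bytes):
    """Return (is_compressed, payload, reserved_size) for one tile-cache slot."""
    reserved = len(bits)            # slot is sized for the uncompressed bitmap
    packed = zlib.compress(bits)    # zlib is only a stand-in codec (assumption)
    if len(packed) <= reserved:     # it fits: keep the compressed form
        return True, packed, reserved
    return False, bits, reserved    # it does not fit: keep the raw copy

def load_tile(is_compressed: bool, payload: bytes, reserved: int) -> bytes:
    """Reader side: copy out or uncompress, knowing the slot is large enough."""
    raw = zlib.decompress(payload) if is_compressed else payload
    assert len(raw) <= reserved
    return raw
```

Writing into the band would then copy the payload as-is, whichever form it is in.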
mvrhel_laptop	ok	19:49.15
	bbiaw	19:55.38
Robin_Watts	tor8: ping	20:12.20
	So... any thoughts on me deleting mupdfdraw and muxpsdraw? Any reason not to just offer mudraw ?	20:13.14
	Should we rename mupdf to muview ?	20:13.32
ray_laptop	Robin_Watts: have you determined why the cache is getting re-loaded so much -- does it need to be larger. Are we seeing the same "traffic" at 600 dpi ?	20:17.22
Robin_Watts	ray_laptop: The cache doesn't seem to be being reloaded much.	20:17.41
	In my tests, I didn't see any evictions (though I may have if I had run for longer)	20:18.17
ray_laptop	Robin_Watts: I thought we saw it being written into 30K + times for 30K characters	20:18.26
Robin_Watts	Were we?	20:18.54
	clist_change_bits is called 30K times (as we'd expect)	20:19.48
	but cmd_put_bits is only called 6000 times.	20:20.05
ray_laptop	the profile shows 30K+ calls to clist_change_bits	20:20.45
Robin_Watts	That's what I just said :)	20:21.01
	clist_change_bits looks to see if it's in the cache. If it is, and it's been sent already for this band it just exits.	20:21.34
	If it's not in the cache, it puts it in. If it has not been sent for this band, it then calls cmd_put_bits.	20:22.10
	So the number of calls to cmd_put_bits correspond to the number of 'cache misses'.	20:22.33
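
The bookkeeping described above can be modelled roughly as follows. This is a toy model, not the real clist_change_bits: count_put_bits is a hypothetical name, eviction is not modelled, and glyphs are just hashable ids.

```python
def count_put_bits(requests):
    """requests: iterable of (band, glyph_id) pairs in drawing order."""
    cache = set()   # glyphs currently in the tile cache (no eviction modelled)
    sent = set()    # (band, glyph) pairs already written into that band
    put_bits_calls = 0
    for band, glyph in requests:
        if glyph in cache and (band, glyph) in sent:
            continue                    # cached and already sent: just exit
        cache.add(glyph)                # not in the cache: put it in
        if (band, glyph) not in sent:   # not yet sent for this band
            sent.add((band, glyph))
            put_bits_calls += 1         # stands in for a cmd_put_bits call
    return put_bits_calls
```

In this model the same glyph drawn twice in one band costs one call, while the same glyph drawn in two bands costs two.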
ray_laptop	Robin_Watts: OK, I was working from (faulty) memory. I just opened the code and agree.	20:22.59
	Robin_Watts: sorry.	20:23.06
Robin_Watts	no worries!	20:23.12
	I'm new to this code, so could easily be making stupid mistakes.	20:23.23
	The call to cmd_put_bits does the number crunching of the compression.	20:23.45
	And that adds up to 17 seconds or so out of the 79 for this file.	20:24.06
ray_laptop	Robin_Watts: so the compression or not will only address the cmd_put_bits part of the time -- 16.6 seconds	20:24.22
Robin_Watts	Given this is all latin text at the same font size, I can't believe that we're actually compressing more than 64 glyphs.	20:25.03
	Indeed.	20:25.38
ray_laptop	The other question -- is it reasonable to have 6000 cache misses ? Looking at the file, that's hard to imagine. There are only a couple of embedded fonts	20:25.43
Robin_Watts	So that's 21% of overall time we're playing for.	20:26.00
	It's the single biggest hotspot, I believe.	20:26.17
ray_laptop	Robin_Watts: and besides compression, we have decompression.	20:26.43
Robin_Watts	ray_laptop: Right, but that's elsewhere.	20:26.58
	my proposal doesn't affect that.	20:27.05
ray_laptop	but I think investigating why the cache is so ineffective is the simplest	20:27.09
Robin_Watts	Suppose we have 64 glyphs in play. on 53 pages.	20:27.29
ray_laptop	Robin_Watts: the 6000 cache misses are only on _6_ pages !!	20:27.57
Robin_Watts	Really?	20:28.17
ray_laptop	Yes, look at the print_page_copies call count	20:28.36
Robin_Watts	OK. so 64 glyphs on 6 pages is 384 calls. That's our baseline minimum.	20:28.55
	How many bands per page?	20:29.06
ray_laptop	I wonder if this file paints text more than once on each page -- let me dump the text of a page and look at how many chars per page	20:29.47
tor8	Robin_Watts: go ahead and zap pdfdraw and xpsdraw. we must remember to fix the manpage as well to reflect the new name and capabilities.	20:31.37
Robin_Watts	I reckon we have 52 bands per page.	20:31.53
	312 calls to clist_playback_band; 312/6 = 52	20:32.16
ray_laptop	Robin_Watts: OK. the text wc gives: 300 lines, 5541 words, 34580 chars	20:32.30
Robin_Watts	So... if every glyph appears on every band of every page, we'd expect 52*64*6 cache misses.	20:32.46
	= 19968	20:33.02
	So 6000 seems quite reasonable to me.	20:33.13
	That's the best the current scheme can do, regardless of cache size. Agreed?	20:33.42
	(sorry, cache misses is a bad term. "bitmap compression operations" would be better)	20:34.13
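
The estimates in this exchange can be reproduced with a few lines of arithmetic. The figure of 64 distinct glyphs is the guess made in the discussion, not a measured value.

```python
playback_calls = 312    # calls to clist_playback_band in the profile
pages = 6               # pages covered by the profile
glyphs = 64             # assumed number of distinct glyphs (a guess from the chat)

bands_per_page = playback_calls // pages        # 52
lower_bound = glyphs * pages                    # each glyph compressed once per page
upper_bound = bands_per_page * glyphs * pages   # every glyph on every band of every page

put_bits_seconds = 16.6
total_seconds = 79
share_percent = round(100 * put_bits_seconds / total_seconds)

print(bands_per_page, lower_bound, upper_bound, share_percent)
```

The observed 6000 compression operations sit between the two bounds.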
ray_laptop	Robin_Watts: so your concept is for a shared tile cache ?	20:34.22
Robin_Watts	No.	20:34.28
	At the moment, we hold uncompressed data in the tile cache.	20:34.56
	And we compress it into the band.	20:35.03
	we end up compressing the same data multiple times into multiple bands.	20:35.24
	Instead, I'd like to compress when we create the tile cache entry, and then just 'copy' into the band.	20:35.53
ray_laptop	If we have a number of bitmaps that are available to ALL bands, then might be able to get much better hit rate for text pages	20:36.05
Robin_Watts	We'd be reducing the 1000 compression operations per page to 384.	20:36.38
	Yes, mvrhel appeared earlier and suggested keeping the image data in a shared area, akin to how ICC profiles are done.	20:37.14
	And I pointed out that the code to do that is only in 9.0x, not in 8.71.	20:37.31
	Given how well the cache is working in this case, we'd not actually gain that much by having a shared area; we'd compress the data just once per page in both schemes.	20:38.35
ray_laptop	Robin_Watts: the clist 'pseudo_band' accessors are not too hard to back port (I had done something like it about 3 years ago for band complexity, so it doesn't rely on recent clist innovations)	20:38.45
Robin_Watts	The win would be that we didn't have to copy the compressed data into each band.	20:38.47
ray_laptop	clist innovations -- there's an oxymoron for you ;-)	20:39.03
Robin_Watts	ray_laptop: Well, if you want to take that on, then I won't complain about how you want to do it :)	20:39.17
	but I am aware that you have other priorities right now.	20:39.48
ray_laptop	Robin_Watts: we'd still have 30K decompressions, right ?	20:40.00
Robin_Watts	Yes, but then we have that under any scheme, right ?	20:40.14
ray_laptop	Robin_Watts: well if a bitmap was in the page level tile cache, there'd be less need for compression (a subset of the total)	20:41.23
Robin_Watts	A page level tile cache presents problems though.	20:41.43
	You can't safely reuse entries in the cache during rendering because different rendering threads might need it at different times.	20:42.21
ray_laptop	thinking on the fly here ... If a glyph occurs a second time stick it in the page level cache (uncompressed) until the page level is full	20:42.29
Robin_Watts	Hence you'd have to insist that the tile cache was large enough to hold all possible glyphs.	20:42.41
ray_laptop	it's not really a cache in that it's there for the entire page.	20:42.48
	Robin_Watts: no -- when the page level bitmap storage was full we fallback to the current per-band scheme	20:43.25
Robin_Watts	Right. So you'd have to have an additional block of memory for this page level tile storage block, and you'd fill it with tiles until it was full, and then drop back to the old method ?	20:43.36
ray_laptop	Robin_Watts: and recall the page level tiles go into the clist.	20:44.00
Robin_Watts	Ok, so consider a typical page, where we have a title in a large font, followed by lots of body text in a smaller font.	20:44.27
ray_laptop	so if we reduce the number of tiles per page from 1000 to, say, 100 we are still OK.	20:44.38
	the page level bitmap storage doesn't need to be a constrained size	20:44.54
	Robin_Watts: and if it is in a pseudo-band, then the memory based clist compression fallback will automatically compress that too	20:45.25
Robin_Watts	We'd use all the page level bitmap storage up putting the headline glyphs in, then have no room left for the body text ones which are the ones it would really help with.	20:45.27
	I'm confused now.	20:45.48
	How can we random access into a band if it's compressed ?	20:46.09
ray_laptop	the clist is memory based anyway, so anything we do to only store a bitmap glyph once is a win	20:46.22
	gxclmem does that -- it has a LRU cache of decompressed blocks	20:46.49
Robin_Watts	Well, clearly, if you can arrange to only store bitmaps once per page, rather than once per band that's a major win.	20:47.04
ray_laptop	Robin_Watts: that's what I was thinking -- both in performance (if we don't compress) and in space	20:47.50
Robin_Watts	Even if we compress it'd still be a 50fold improvement (on writing) on what we have now.	20:48.33
	because we'd only compress once per page rather than once per band.	20:48.51
ray_laptop	But for disk based tiles, we want some set of LRU "blocks" of bitmaps that get reloaded on demand from the clist.	20:49.17
Robin_Watts	And it would be a similar improvement on reading too, as we'd only decompress once too, rather than once per band.	20:49.21
ray_laptop	Robin_Watts: If it's stored compressed we still have to decompress to use it in copy_mono	20:49.54
Robin_Watts	This all sounds nice in theory, but it's a scarily large change to try to patch on top of the 8.71 + random stuff that 532 is using.	20:50.24
ray_laptop	Robin_Watts: well, it's bits and pieces of stuff that is already done.	20:50.50
Robin_Watts	Whereas, I'd hoped that 4) would be manageable.	20:50.55
ray_laptop	mvrhel or I could probably port the pseudo-band accessor functions readily enough. The changes are mostly in clist_fill_mask I guess	20:53.41
	Robin_Watts: (4) ???	20:54.05
	Robin_Watts: nm -- I just saw your email	20:55.23
Robin_Watts	sorry, was on phone.	20:55.44
	Ah, you'd not seen the email. I'm surprised I was making any sense at all :)	20:56.06
ray_laptop	Robin_Watts: well, it's not like I haven't been pondering this as well :-)	20:56.50
	Robin_Watts: but the more I think about it, the idea of having the bitmaps stored (uncompressed) in a pseudo band, and the only cache be in the reader that knows how to reload a tile from the pseudo band, the simpler it sounds.	20:58.13
Robin_Watts	So we'd do away with the tile cache completely ?	20:58.45
ray_laptop	The only reason for the writer trying to manage the tile cache is so the reading of the bands can be 'streamed'	20:59.08
	Robin_Watts: no, we'd use a tile cache in the reader so that when a certain bitmap was requested, it would load from the pseudo-band if needed, and re-use of slots is allowed since we can reload as needed	21:00.28
	The reader tile cache is what prevents "thrashing" if we are disk based.	21:01.08
	so we NEVER store a bitmap more than once in the clists (in the pseudo band) unlike now where we store it multiple times for bands that need the glyph	21:02.11
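
A rough sketch of the reader-side idea above, assuming the pseudo band can be treated as a lookup from tile id to bitmap bytes. ReaderTileCache and its fields are hypothetical names, and a plain dict stands in for the pseudo band.

```python
from collections import OrderedDict

class ReaderTileCache:
    """LRU cache in the reader; each bitmap lives exactly once in the pseudo band."""

    def __init__(self, pseudo_band, slots):
        self.pseudo_band = pseudo_band   # tile_id -> bitmap bytes, stored once
        self.slots = slots               # number of cache slots available
        self.cache = OrderedDict()       # ordered from least to most recently used
        self.reloads = 0                 # how often the pseudo band was read

    def get(self, tile_id):
        if tile_id in self.cache:
            self.cache.move_to_end(tile_id)   # mark as most recently used
            return self.cache[tile_id]
        if len(self.cache) >= self.slots:
            self.cache.popitem(last=False)    # slot reuse is safe: tiles can be reloaded
        self.reloads += 1
        bits = self.pseudo_band[tile_id]      # reload from the pseudo band
        self.cache[tile_id] = bits
        return bits
```

Because any evicted tile can be fetched again, the cache size only affects how often reloads happen, not correctness.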
	Robin_Watts: this follows somewhat what I have been thinking about for high-level image data as well -- store in a pseudo band. This is less of a clist size win since we only store a subset of an image in any particular band (what fits in the band plus the 'support')	21:03.48
henrys	thinking about Robin_Watts compression numbers I imagine we could get a quick fix to their problem with no compression. At the end of the day it is peak memory usage that matters and it would be hard to imagine a page of text is going to rival typical graphics that would be printed on a usable printer.	21:05.09
ray_laptop	Robin_Watts: but the image pseudo-band storage would be a HUGE win for skewed images that either greatly extend the number of support lines or punt to rectangles now	21:05.11
	Robin_Watts: any idea what having the 6000 bitmaps uncompressed vs. the current compression will be ? (iirc, we use G4 now which is pretty good)	21:06.18
	henrys: I must have missed "Robin_Watts compression numbers" was that an email or IRC ?	21:07.36
henrys	I think it is great to improve the current code. Getting something quickly working for an 8.71 branch seems a different project, we need something very simple.	21:08.03
ray_laptop	henrys: and I have no objection to Robin_Watts taking a quicker approach that we later do differently	21:08.19
henrys	1 meg bands vs. 256k bands.	21:08.32
	have you customized their band setup?	21:08.48
	ray_laptop?	21:08.56
ray_laptop	henrys: they run BandHeight=256	21:09.30
henrys	so I see 52 bands for 1200.	21:10.44
	I imagine Robin_Watts numbers we not done for that BandHeight	21:12.29
	s/we/were	21:12.35
Robin_Watts	sorry, off phone now. let me read the logs.	21:13.15
	52 bands for 1200, yes.	21:14.30
	Unless the simulator was set up differently, that's what I'd expect I was running at.	21:15.04
	Helen is calling me for dinner. It's Hallmark day, so I'd better go...	21:15.43
	I'll check back in later to see if you have any more thoughts.	21:15.55
henrys	okay	21:16.28
Robin_Watts	I could send Eric an email tomorrow suggesting that he tries removing compression (with a patch) and then he can see if that makes a timing difference.	21:16.35
ray_laptop	because that's what their raster pipeline needs -- this matters on the older color product, but here the band height	21:16.52
	It's easy enough to take the BandHeight param out of the ***_gs_main and let the BufferSpace determine the band height (their code won't be able to tell on the back end)	21:18.01
	Robin_Watts: I have to go now too	21:18.19
	Robin_Watts: but if you email Eric, suggest that he increase the BandHeight param or remove it	21:18.55
sebras	tor8: hm. are the elements in a matrix allowed to take the float value inf?	21:53.01
tor8	no.	21:53.33
sebras	tor8: when reading fz_invert_matrix() it seems like this may actually happen whenever two of abcd are zero e.g...	21:53.56
	a pdf with a pattern matrix of [ 0 0 0 0 0 0 ] triggers the bug.	21:54.31
tor8	not all matrices are invertible	21:54.34
sebras	I know, but mupdf doesn't check for that.	21:54.48
tor8	and that matrix is indeed degenerate	21:54.54
sebras	sure, so are you confident that we handle inf "correctly" i.e. without crashing elsewhere?	21:55.20
tor8	right, so we should probably check that the det != 0 before inverting	21:55.39
	no, we'll crash big time	21:55.49
sebras	actually no. :)	21:56.06
	I just tried it on a pattern and it works fine, don't ask me why though...	21:56.21
tor8	maybe the pattern isn't used :)	21:56.31
	or the bbox is set properly and it doesn't contain any graphics	21:56.44
	in which case the matrix isn't actually used	21:56.52
sebras	tor8: that could be the case.	21:57.50
	tor8: will you push a fix?	21:58.03
	tor8: I assume that we'll return the matrix unchanged instead?	22:08.30
tor8	sebras: I don't have sane working a.t.m. but there is a patch on my users/tor repo	22:11.42
sebras	tor8: excellent.	22:12.51
	then my comment fits. :)	22:12.59
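
A small sketch of the determinant check discussed above, using the six-element [a b c d e f] affine layout. Returning the input unchanged for a degenerate matrix is the assumption voiced in the chat, not confirmed behaviour of the actual patch, and invert_matrix is a hypothetical stand-in rather than fz_invert_matrix itself.

```python
def invert_matrix(m):
    """Invert an affine matrix (a, b, c, d, e, f); return m unchanged if singular."""
    a, b, c, d, e, f = m
    det = a * d - b * c
    if det == 0:                 # degenerate: dividing would produce inf or nan
        return m                 # assumed fallback, see the note above
    ia, ib = d / det, -b / det
    ic, id_ = -c / det, a / det
    ie = -(e * ia + f * ic)      # the inverse must send (e, f) back to the origin
    if_ = -(e * ib + f * id_)
    return (ia, ib, ic, id_, ie, if_)
```

With this check the all-zero pattern matrix from the discussion comes back untouched instead of filling the result with non-finite values.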
marcosw_	henrys: I've reinstalled the cluster software on henrysx6 and it appears to work but I have temporarily disabled it since I'd like to test it more thoroughly when I'm online, which should be tomorrow.	22:42.01
sebras	Robin_Watts tor8: pushes some more text to sebras/doc	23:15.42
	I must read up on the text device to be able to document it. also I would appreciate if Robin could write something about fz_set_aa_level() because that code is beyond my comprehension.	23:17.26
tor8	sebras: I'm rewriting the text device (slowly...) so don't spend any time on that yet	23:45.32