MuPDF IRC logs

	<<<Back 1 day (to 2017/05/22)	20170523
tor8	avih: morning.	08:58.41
	finally got some time to look at your patch	08:58.47
	adding a special fast case for integers in jsV_numbertostring does speed things up quite a bit	08:59.10
	even on top of the grisu2 code	08:59.21
	avih: is there a particular reason for the isnormal() and range checks, rather than just if ((double)(int)f == f) ?	09:00.49
avih	tor8: i didn't want to handle anything around 0. for instance, without it, that fast path would be entered for -0.0 and i preferred to leave that to the more complete code (which in your case handled it a the ==0 test, and i'm not sure it's right). the <= INT_MAX is required since because casting an out of range double to int is undefined behavior	09:37.07
	and there are also subnormals, which i preferred not handle. i.e. i took the easy case of numbers which are non-0 integers	09:39.12
	(the <= INT_MAX also validly rule out nan and inf, and otherwise casting any of them to int is also UB)	09:42.06
tor8	implementation defined, or undefined?	09:43.33
avih	(but isnormal filters them too, so the isnormal part only addition is for things around 0)	09:43.51
	i believe undefined	09:44.09
tor8	then range checks and (double)(int)f == f should be enough	09:46.40
	that should take care of subnormals too	09:46.49
avih	how so? or maybe add range also > 0 ?	09:47.08
	everything around 0 is dodgy though. i think (f > 0) can still be subnormal, depending on compiler/cpu modes	09:48.43
tor8	in jsV_numbertostring we don't care about the signedness of 0	09:49.34
avih	and also not about subnormals? if both are correct, then it just needs d >= INT_MIN && d <= INT_MAX && d == (int)d	09:50.41
	actually, no. it must be non negative for js_itoa. so >= 0 && <= INT_MAX would be enough	09:51.44
tor8	I can change js_itoa to allow signed numbers.	09:52.44
	In fact, I did that as part of this change.	09:52.52
avih	i considered it too, but didn't know what its role is in the grand scheme of things	09:53.09
tor8	minor :)	09:53.15
avih	:)	09:53.18
tor8	it's a convenience function for js_getindex and several other places that handle array indices	09:53.34
	which are generally only positive	09:53.40
	look at mujs tor/master branch for a commit	09:54.09
avih	right. i think more than generally though, isn't it? iirc the es5 spec defines "valid array index" as equivalent to anything which uint32 can hold	09:54.44
	or maybe UINT32_MAX - 1 or some such	09:56.22
tor8	yes. there's a bunch of voodoo for array objects (auto updating the .length property if you set a property which is classified as an "array index")	09:56.37
avih	indeed	09:57.11
tor8	given that we work with signed integers (mixing signed and unsigned math is just too bug prone, IMO) our use of that is slightly more limited	09:57.13
avih	nothing prevents you from using uint32 at hasindex etc	09:57.48
	tor8: you have a bug. you're entering undefined behavior for f == INT_MIN. because -f is bigger than INT_MAX	09:59.34
tor8	avih: where?	10:00.54
avih	you could test > INT_MIN instead of >=, but I don't think you get guarantees that intmin is -intmax - 1	10:01.10
	oh, sorry, js_itoa now handles negs	10:02.13
	what's 'a' used for?	10:03.16
	ah, it's at js_itoa	10:03.34
	it's ok. same as my patch, it misses half the range which uint can handle and int cannot	10:05.03
	tor8: did you measure any perf impact the patch has on top of grisu2?	10:06.01
	also, i think unrelated to the patch, if you notice the numbers i posted for "frac strings" then grisu2 improves x3+, which i don't understand why. do you?	10:07.12
tor8	the grisu2 patch also adds a new faster strtod	10:08.58
	I'm sure it's got different bugs from the other strtod we used...	10:09.29
avih	yes, but it shouldn't be entered for a["1.5"], right?	10:09.29
tor8	if a is an Array object, it is entered	10:09.55
	to check whether the property is an integer	10:10.02
avih	oh?	10:10.04
tor8	js_isarrayindex	10:10.12
	I'm rewriting that to be faster too	10:10.17
avih	but it doesn't seem to affect perf of a["1"]	10:10.26
	but a["0.5"] is now more than 3x faster	10:10.53
tor8	probably due to the difference in behaviour of old strtod?	10:11.18
avih	in general though, i think the next step in array access improvement would be to use C arrays under some conditions (probably e.g. at least N items and sparseness factor up to 2 or some such) without going through the properties search code.	10:12.18
	that grisu2 is fantastic.	10:13.02
tor8	a is the unsigned integer used to convert the integer to a decimal string	10:13.27
	avih: I've got timings doing 'a[2] = a[0] + a[1]' one million times	10:15.55
avih	how does it compare?	10:16.10
tor8	pre-grisu2: 9000ms, grisu2: 652ms, fast-path numbertostring: 627ms, fast-path isarrayindex: 600ms	10:18.52
avih	so 10% at most?	10:19.37
	(for this test case)	10:19.45
tor8	for this integer only test case, yes	10:19.52
avih	well, a[0] is already fastpath in the pre-grisu2 code, so you're only testing two out of three cases which fast paths improve.	10:21.12
	you should test a[2] = a[1] + a[1] + a[1] + a[1] at least IMO	10:21.38
tor8	a[2] and a[1] both exercise all the fast path codes	10:23.30
	the a[0] is largely irrelevant and just adds a bit more fixed overhead	10:23.40
avih	they do, but your loop does a lot more than accessing a[2] and a[1]	10:23.53
	which is why the diff you're seeing does not represent the actual improvement	10:24.26
tor8	of course. percentages are only percentages of the current program, not of the time spent in number conversions.	10:25.14
avih	yep. the higher the percentage of the time the test prog spends in using the fastpath, the closer the diff becomes to the actual improvement.	10:26.39
tor8	just moving the loop into a function (as opposed to at top level script scope) takes it from 600ms to 400ms	10:27.11
avih	heh	10:27.23
tor8	because of the variable accesses being optimized to direct stack slots rather than using globals in the global scope object	10:27.34
avih	don't start unrolling loop and manually inlining code :)	10:27.43
tor8	nah, I doubt unrolling a js loop is going to have any measurable impact at all	10:28.16
avih	oh, you meant at the js code.	10:28.24
tor8	it's the property accesses that cost	10:28.26
avih	yeah, that's why i think it's the next meaningful step, and till then, i think grisu fills the gap very nicely	10:29.26
	and don't forget the new fastpath also has some cost when it's not entered.	10:30.56
	small, but everything counts	10:31.16
	especially when the fastpath now is only slightly faster than grisu2, it means it could be almost doubling the evaluation time when not entered	10:32.54
	(less actually, only the checks are made, not the js_itoa code)	10:33.45
	seems that "milo" is almost twice faster than grisu2 https://github.com/miloyip/dtoa-benchmark	10:35.03
	not sure what "null" is. if it's actually a generic conversion then it's another x2-3 faster.	10:36.31
	ah "the null implementation does nothing"	10:37.00
tor8	"milo" is C++	10:37.03
avih	hmm.. other than some static_cast<...> i don't see cpp stuff in a glance, looking here https://github.com/miloyip/dtoa-benchmark/blob/master/src/milo/dtoa_milo.h	10:39.26
tor8	what about the classes then?	10:42.30
avih	must have missed them? :)	10:42.40
	where?	10:42.52
	the code does have some annoying compiler tests and different codepaths	10:44.02
jogux	sebras: okay, I'm here :)	10:44.42
avih	tor8: is the grisu2 code expected to work on arm and other processors?	10:44.56
tor8	avih: yes. anything that has uint64_t.	10:45.18
avih	that's good	10:45.30
tor8	no special voodoo defines required like gay's dtoa	10:45.44
avih	IMO then don't add my/your fastpath patch. grisu2 is really good, and the next step should be IMO C array access where possible to avoid property search	10:46.50
tor8	faster number <-> string conversions for integers is still worth doing, and it's easy low hanging fruit	10:47.38
	avoiding property search is non-trivial	10:47.54
avih	i get both :)	10:48.29
	and while it does hang low, it's a pretty small fruit :) but hey, it's not nothing :)	10:49.22
sebras	jogux: nice!	10:54.17
	jogux: can I trouble you with attempting to compile and test this one? http://git.ghostscript.com/?p=user/sebras/mupdf-ios-viewer.git;a=commit;h=087aef04	10:54.59
	jogux: I hope I got it reasonable correct but objc is not my forte.	10:55.12
tor8	avih: timings with optimised loop at 10 million entries, grisu: 4.2s, fast-path numbertostring: 3.3s, fast-path isarrayindex: 2.1s	11:02.46
	avih: commits on tor/master if you want to try your own timings	11:03.09
avih	tor8: right. so it's worth it. what's the last case though?	11:03.22
	i'll give it a go, yes.	11:03.39
jogux	there's some problem with iOS mupdf builds btw, Xcode now frequently gets stuck in the custom build step. I don't know if it's a Xcode upgrade or a mupdf change that's caused it :(	11:03.55
tor8	rewrote js_isarrayindex to do fast string to integer conversion with early bailout on non-integer and overflow	11:03.59
jogux	sebras: That seems to work. I see 'MuPDF 1.11 Documents' in the homescreen toolbar now.	11:04.27
avih	gotcha.	11:04.56
sebras	jogux: nice.	11:05.08
tor8	instead of the "this is what the spec says, make it work, don't care about speed" implementation it used to have	11:05.20
sebras	jogux: I'm worried that the title string might leak.	11:05.26
tor8	which did a string-to-number + number-to-string + strcmp	11:05.31
sebras	jogux: I'm not sure how reference counting and stuff works here.	11:05.36
jogux	sebras: all the methods you are calling return autoreleased strings.	11:05.48
	(so it won't leak)	11:05.55
sebras	jogux: alright, how would I find this out? (so I don't have to worry about this next time)	11:06.22
jogux	there's some rule like unless the method has alloc or new or something obvious like that in the name it returns an autoreleased thing.	11:06.37
avih	tor8: that's some roundtrip :)	11:06.44
	i hope it enjoyed the view at least	11:07.04
jogux	sebras: But no one worries about this anymore, we should convert mupdf to use ARC (Automatic Reference Counting) sometime.	11:07.10
	Other than mupdf it's years since I've had to do manual memory management on iOS.	11:07.21
sebras	jogux: do you want me to check this fix in, or is it better that you do it?	11:07.48
jogux	sebras: you might as well :)	11:08.00
sebras	or is it merge/commit/push/argh?	11:08.07
	will do.	11:08.13
	jogux: thanks for helping out. :)	11:09.57
jogux	no problem	11:10.03
avih	tor8: your branch is a meaningful speed up for the test case i posted at the PR. i say push :)	11:19.46
	(just tested)	11:19.59
	compared to mujs master with grisu, the "int numbers" case is 30% faster, "int strings" and "frac strings" ~15% faster, and "frac numbers" ~8% faster	11:22.48
	that's pretty good in my book	11:22.53
	however, in a real world code which was suffering terribly before grisu, and which made me investigate and submit the PR, the diff from grisu2 to your branch is barely measurable. probably 2% or so. though surely there are real world cases which will benefit more. as reference, grisu/my-PR sped up my "real world" code almost x2.	11:27.37
tor8	yes, pre-grisu to grisu was a 20x speed difference	11:30.38
avih	yes	11:30.44
tor8	the extra 2x is barely noticeable after that :)	11:30.53
avih	:)	11:31.18
	tor8: fwiw, my code which was too-slow-for-my-taste is the set_timer and insert_sorted functions here https://0x0.st/EqT.txt	11:32.49
	(this iterative-move insert_sorted actually ends up faster than splice, which initially surprised me, but then i realized splice also creates a new array, and has to iterate all the items pretty much the same as insert_sorted now does anyway)	11:34.09
tor8	yes, and it needs to deal with odd cases that your code doesn't need to worry about	11:35.13
avih	yes	11:36.28
	when splice is implemented with memmove i might get back to splice ;)	11:52.00
	tor8: fwiw, on the test cases from the PR, with your branch, duktape is now almost 2x faster for "int numbers", ~30% faster on int/frac strings, and finally almost 2x slower on frac numbers :)	12:13.51
	(admittedly though, accessing an array with fractional numbers is not too common ;) )	12:14.33
	but i _think_ it uses C arrays where possible, and if that's true, then the always-search-property implementation is really nice and fast	12:19.53
tor8	duktape also has 6x as many lines of code...	12:36.21
avih	very true indeed	12:40.24
	2x slower in something which doesn't pretend to be blazingly fast to begin with is something i'd take every time over x6 LOC	12:41.25
	hence "FWIW". just to give you some motivation ;)	12:42.08
	tor8: do string vars assignments duplicate chars? or just char*?	15:33.34
tor8	avih: it depends.	15:49.23
avih	whether it fits in a value or not?	15:49.33
	and if not?	15:49.39
tor8	short strings (< 16 bytes) fit in the value	15:49.41
	other strings are all by pointer	15:49.47
avih	cool. thx	15:49.52
	Forward 1 day (to 2017/05/24)>>>

Log of #mupdf at irc.freenode.net.