Log of #mupdf at irc.freenode.net.

Search:
 <<<Back 1 day (to 2017/05/22)20170523 
tor8 avih: morning.08:58.41 
  finally got some time to look at your patch08:58.47 
  adding a special fast case for integers in jsV_numbertostring does speed things up quite a bit08:59.10 
  even on top of the grisu2 code08:59.21 
  avih: is there a particular reason for the isnormal() and range checks, rather than just if ((double)(int)f == f) ?09:00.49 
avih tor8: i didn't want to handle anything around 0. for instance, without it, that fast path would be entered for -0.0 and i preferred to leave that to the more complete code (which in your case handled it a the ==0 test, and i'm not sure it's right). the <= INT_MAX is required since because casting an out of range double to int is undefined behavior09:37.07 
  and there are also subnormals, which i preferred not handle. i.e. i took the easy case of numbers which are non-0 integers09:39.12 
  (the <= INT_MAX also validly rule out nan and inf, and otherwise casting any of them to int is also UB)09:42.06 
tor8 implementation defined, or undefined?09:43.33 
avih (but isnormal filters them too, so the isnormal part only addition is for things around 0)09:43.51 
  i believe undefined09:44.09 
tor8 then range checks and (double)(int)f == f should be enough09:46.40 
  that should take care of subnormals too09:46.49 
avih how so? or maybe add range also > 0 ?09:47.08 
  everything around 0 is dodgy though. i think (f > 0) can still be subnormal, depending on compiler/cpu modes09:48.43 
tor8 in jsV_numbertostring we don't care about the signedness of 009:49.34 
avih and also not about subnormals? if both are correct, then it just needs d >= INT_MIN && d <= INT_MAX && d == (int)d09:50.41 
  actually, no. it must be non negative for js_itoa. so >= 0 && <= INT_MAX would be enough09:51.44 
tor8 I can change js_itoa to allow signed numbers.09:52.44 
  In fact, I did that as part of this change.09:52.52 
avih i considered it too, but didn't know what its role is in the grand scheme of things09:53.09 
tor8 minor :)09:53.15 
avih :)09:53.18 
tor8 it's a convenience function for js_getindex and several other places that handle array indices09:53.34 
  which are generally only positive09:53.40 
  look at mujs tor/master branch for a commit09:54.09 
avih right. i think more than generally though, isn't it? iirc the es5 spec defines "valid array index" as equivalent to anything which uint32 can hold09:54.44 
  or maybe UINT32_MAX - 1 or some such09:56.22 
tor8 yes. there's a bunch of voodoo for array objects (auto updating the .length property if you set a property which is classified as an "array index")09:56.37 
avih indeed09:57.11 
tor8 given that we work with signed integers (mixing signed and unsigned math is just too bug prone, IMO) our use of that is slightly more limited09:57.13 
avih nothing prevents you from using uint32 at hasindex etc09:57.48 
  tor8: you have a bug. you're entering undefined behavior for f == INT_MIN. because -f is bigger than INT_MAX09:59.34 
tor8 avih: where?10:00.54 
avih you could test > INT_MIN instead of >=, but I don't think you get guarantees that intmin is -intmax - 110:01.10 
  oh, sorry, js_itoa now handles negs10:02.13 
  what's 'a' used for?10:03.16 
  ah, it's at js_itoa10:03.34 
  it's ok. same as my patch, it misses half the range which uint can handle and int cannot10:05.03 
  tor8: did you measure any perf impact the patch has on top of grisu2?10:06.01 
  also, i think unrelated to the patch, if you notice the numbers i posted for "frac strings" then grisu2 improves x3+, which i don't understand why. do you?10:07.12 
tor8 the grisu2 patch also adds a new faster strtod10:08.58 
  I'm sure it's got different bugs from the other strtod we used...10:09.29 
avih yes, but it shouldn't be entered for a["1.5"], right?10:09.29 
tor8 if a is an Array object, it is entered10:09.55 
  to check whether the property is an integer10:10.02 
avih oh?10:10.04 
tor8 js_isarrayindex10:10.12 
  I'm rewriting that to be faster too10:10.17 
avih but it doesn't seem to affect perf of a["1"]10:10.26 
  but a["0.5"] is now more than 3x faster10:10.53 
tor8 probably due to the difference in behaviour of old strtod?10:11.18 
avih in general though, i think the next step in array access improvement would be to use C arrays under some conditions (probably e.g. at least N items and sparseness factor up to 2 or some such) without going through the properties search code.10:12.18 
  that grisu2 is fantastic.10:13.02 
tor8 a is the unsigned integer used to convert the integer to a decimal string10:13.27 
  avih: I've got timings doing 'a[2] = a[0] + a[1]' one million times10:15.55 
avih how does it compare?10:16.10 
tor8 pre-grisu2: 9000ms, grisu2: 652ms, fast-path numbertostring: 627ms, fast-path isarrayindex: 600ms10:18.52 
avih so 10% at most?10:19.37 
  (for this test case)10:19.45 
tor8 for this integer only test case, yes10:19.52 
avih well, a[0] is already fastpath in the pre-grisu2 code, so you're only testing two out of three cases which fast paths improve.10:21.12 
  you should test a[2] = a[1] + a[1] + a[1] + a[1] at least IMO10:21.38 
tor8 a[2] and a[1] both exercise all the fast path codes10:23.30 
  the a[0] is largely irrelevant and just adds a bit more fixed overhead10:23.40 
avih they do, but your loop does a lot more than accessing a[2] and a[1]10:23.53 
  which is why the diff you're seeing does not represent the actual improvement10:24.26 
tor8 of course. percentages are only percentages of the current program, not of the time spent in number conversions.10:25.14 
avih yep. the higher the percentage of the time the test prog spends in using the fastpath, the closer the diff becomes to the actual improvement.10:26.39 
tor8 just moving the loop into a function (as opposed to at top level script scope) takes it from 600ms to 400ms10:27.11 
avih heh10:27.23 
tor8 because of the variable accesses being optimized to direct stack slots rather than using globals in the global scope object10:27.34 
avih don't start unrolling loop and manually inlining code :)10:27.43 
tor8 nah, I doubt unrolling a js loop is going to have any measurable impact at all10:28.16 
avih oh, you meant at the js code.10:28.24 
tor8 it's the property accesses that cost10:28.26 
avih yeah, that's why i think it's the next meaningful step, and till then, i think grisu fills the gap very nicely10:29.26 
  and don't forget the new fastpath also has some cost when it's not entered.10:30.56 
  small, but everything counts10:31.16 
  especially when the fastpath now is only slightly faster than grisu2, it means it could be almost doubling the evaluation time when not entered10:32.54 
  (less actually, only the checks are made, not the js_itoa code)10:33.45 
  seems that "milo" is almost twice faster than grisu2 https://github.com/miloyip/dtoa-benchmark10:35.03 
  not sure what "null" is. if it's actually a generic conversion then it's another x2-3 faster.10:36.31 
  ah "the null implementation does nothing"10:37.00 
tor8 "milo" is C++10:37.03 
avih hmm.. other than some static_cast<...> i don't see cpp stuff in a glance, looking here https://github.com/miloyip/dtoa-benchmark/blob/master/src/milo/dtoa_milo.h10:39.26 
tor8 what about the classes then?10:42.30 
avih must have missed them? :)10:42.40 
  where?10:42.52 
  the code does have some annoying compiler tests and different codepaths10:44.02 
jogux sebras: okay, I'm here :)10:44.42 
avih tor8: is the grisu2 code expected to work on arm and other processors?10:44.56 
tor8 avih: yes. anything that has uint64_t.10:45.18 
avih that's good10:45.30 
tor8 no special voodoo defines required like gay's dtoa10:45.44 
avih IMO then don't add my/your fastpath patch. grisu2 is really good, and the next step should be IMO C array access where possible to avoid property search10:46.50 
tor8 faster number <-> string conversions for integers is still worth doing, and it's easy low hanging fruit10:47.38 
  avoiding property search is non-trivial10:47.54 
avih i get both :)10:48.29 
  and while it does hang low, it's a pretty small fruit :) but hey, it's not nothing :)10:49.22 
sebras jogux: nice!10:54.17 
  jogux: can I trouble you with attempting to compile and test this one? http://git.ghostscript.com/?p=user/sebras/mupdf-ios-viewer.git;a=commit;h=087aef0410:54.59 
  jogux: I hope I got it reasonable correct but objc is not my forte.10:55.12 
tor8 avih: timings with optimised loop at 10 million entries, grisu: 4.2s, fast-path numbertostring: 3.3s, fast-path isarrayindex: 2.1s11:02.46 
  avih: commits on tor/master if you want to try your own timings11:03.09 
avih tor8: right. so it's worth it. what's the last case though?11:03.22 
  i'll give it a go, yes.11:03.39 
jogux there's some problem with iOS mupdf builds btw, Xcode now frequently gets stuck in the custom build step. I don't know if it's a Xcode upgrade or a mupdf change that's caused it :(11:03.55 
tor8 rewrote js_isarrayindex to do fast string to integer conversion with early bailout on non-integer and overflow11:03.59 
jogux sebras: That seems to work. I see 'MuPDF 1.11 Documents' in the homescreen toolbar now.11:04.27 
avih gotcha.11:04.56 
sebras jogux: nice.11:05.08 
tor8 instead of the "this is what the spec says, make it work, don't care about speed" implementation it used to have11:05.20 
sebras jogux: I'm worried that the title string might leak.11:05.26 
tor8 which did a string-to-number + number-to-string + strcmp11:05.31 
sebras jogux: I'm not sure how reference counting and stuff works here.11:05.36 
jogux sebras: all the methods you are calling return autoreleased strings.11:05.48 
  (so it won't leak)11:05.55 
sebras jogux: alright, how would I find this out? (so I don't have to worry about this next time)11:06.22 
jogux there's some rule like unless the method has alloc or new or something obvious like that in the name it returns an autoreleased thing.11:06.37 
avih tor8: that's some roundtrip :)11:06.44 
  i hope it enjoyed the view at least11:07.04 
jogux sebras: But no one worries about this anymore, we should convert mupdf to use ARC (Automatic Reference Counting) sometime.11:07.10 
  Other than mupdf it's years since I've had to do manual memory management on iOS.11:07.21 
sebras jogux: do you want me to check this fix in, or is it better that you do it?11:07.48 
jogux sebras: you might as well :)11:08.00 
sebras or is it merge/commit/push/argh?11:08.07 
  will do.11:08.13 
  jogux: thanks for helping out. :)11:09.57 
jogux no problem11:10.03 
avih tor8: your branch is a meaningful speed up for the test case i posted at the PR. i say push :)11:19.46 
  (just tested)11:19.59 
  compared to mujs master with grisu, the "int numbers" case is 30% faster, "int strings" and "frac strings" ~15% faster, and "frac numbers" ~8% faster11:22.48 
  that's pretty good in my book11:22.53 
  however, in a real world code which was suffering terribly before grisu, and which made me investigate and submit the PR, the diff from grisu2 to your branch is barely measurable. probably 2% or so. though surely there are real world cases which will benefit more. as reference, grisu/my-PR sped up my "real world" code almost x2.11:27.37 
tor8 yes, pre-grisu to grisu was a 20x speed difference11:30.38 
avih yes11:30.44 
tor8 the extra 2x is barely noticeable after that :)11:30.53 
avih :)11:31.18 
  tor8: fwiw, my code which was too-slow-for-my-taste is the set_timer and insert_sorted functions here https://0x0.st/EqT.txt11:32.49 
  (this iterative-move insert_sorted actually ends up faster than splice, which initially surprised me, but then i realized splice also creates a new array, and has to iterate all the items pretty much the same as insert_sorted now does anyway)11:34.09 
tor8 yes, and it needs to deal with odd cases that your code doesn't need to worry about11:35.13 
avih yes11:36.28 
  when splice is implemented with memmove i might get back to splice ;)11:52.00 
  tor8: fwiw, on the test cases from the PR, with your branch, duktape is now almost 2x faster for "int numbers", ~30% faster on int/frac strings, and finally almost 2x slower on frac numbers :)12:13.51 
  (admittedly though, accessing an array with fractional numbers is not too common ;) )12:14.33 
  but i _think_ it uses C arrays where possible, and if that's true, then the always-search-property implementation is really nice and fast12:19.53 
tor8 duktape also has 6x as many lines of code...12:36.21 
avih very true indeed12:40.24 
  2x slower in something which doesn't pretend to be blazingly fast to begin with is something i'd take every time over x6 LOC12:41.25 
  hence "FWIW". just to give you some motivation ;)12:42.08 
  tor8: do string vars assignments duplicate chars? or just char*?15:33.34 
tor8 avih: it depends.15:49.23 
avih whether it fits in a value or not?15:49.33 
  and if not?15:49.39 
tor8 short strings (< 16 bytes) fit in the value15:49.41 
  other strings are all by pointer15:49.47 
avih cool. thx15:49.52 
 Forward 1 day (to 2017/05/24)>>> 
ghostscript.com #ghostscript
Search: