Log of #mupdf at irc.freenode.net.

Search:
 <<<Back 1 day (to 2020/01/29)Fwd 1 day (to 2020/01/31)>>>20200130 
avih ator: without walks, and intentionally verbose (can be cleaned up later): https://0x0.st/isSW.txt comments?11:28.56 
  i seriously hate the signed/unsigned stuff11:32.26 
  js_isarrayindex should be proper uint3211:32.59 
ator you mean 2^32-111:35.55 
  if you hate signed/unsigned, just live with the limit 2^31-1 instead11:36.13 
  it's what I did, mujs is going to run out of memory long before an array hits that limit11:36.43 
avih ator: i don't understand. uint32 is perfectly fit for all array index purposes. a valid index is < 2^31-1, so maximum size is 2^31-1. it should just work11:39.27 
ator 2^31-1 fits in a signed int3211:40.43 
  you need uint32 if you want to fit 2^32-111:41.01 
avih what?!11:43.48 
  how does 2^31-a fit signed int?11:44.01 
  -1 *11:44.06 
  (assuming int is 32 bits)11:44.20 
  oh, wait, you're right...11:45.14 
  wait, so the spec only allow arrays which don't use the MSB of uint32?11:45.49 
  i failed to grasp that...11:46.24 
ator the spec allows it, mujs doesn't treat numbers >2^31 with the magic array 'length' handling11:47.41 
avih though still, uint is more appropriate IMHO for array indexes. int can be negative...11:47.50 
ator see ecma-262-5.1 section 15.4 for the definition of "array index"11:48.45 
  yes, but then you run into the problem that the C language *really* doesn't want you to use unsigned integers11:49.11 
avih yeah, so why "A property name P (in the form of a String value) is an array index if and only if ToString(ToUint32(P)) is equal to P and ToUint32(P) is not equal to 2 32 1" and not just Int32 ?11:49.17 
  i mentally implied that Uint32 is used because Int32 cannot be used11:50.03 
ator I originally had used unsigned integers -- and after static analysis checks finding hundreds of possible issues with comparing signed and unsigned values, and the insane C automatic integer promotion rules, I just had enough11:50.16 
avih ator: ^ blame mupdf (which i think SumatraPDF uses) for not copying "^" at 2^31 ? :)11:51.04 
ator way too many possible hidden bugs by using unsigned for my liking11:51.04 
avih ah, there's no ^, the 32 is superscript11:51.37 
  31*11:51.49 
ator you pasted a U+F02D 'invalid character'11:52.02 
avih I did? :)11:52.12 
ator "... is not equal to 2 32 {U+F02D}1"11:53.13 
avih huh, quassel suppresses it on display, apparently11:53.40 
ator probably depends on the font/terminal/whatever11:54.05 
avih though the invalid point seems to be the minus. the superscript marking is just gone regardless11:54.34 
ator I have the 'ttf-unifont' package installed, which helps show broken characters11:54.46 
avih i use noto sans11:54.57 
ator yeah, the '-' probably has a bad encoding in the PDF file11:55.00 
  unifont has glyphs for all invalid characters too11:55.25 
  so it's handy as it ends up being the last resort fallback font for these cases11:55.40 
  in hexchat and gtk+ applications anyway11:55.55 
avih ator: anyway, i'll clean up the signgedness a bit (minor), what do you think of the approach/code/etc?11:56.16 
  i did add repack when it becomes very sparse, but the other way around (make it optimized again) is not there for now11:57.05 
  i'm not sure it's worth the complexity/performance11:57.27 
  (though obviously use cases can be crafted where it will be worth it)11:57.43 
ator avih: what does the 'u' in uprops stand for?11:58.50 
avih unsigned int keys11:59.02 
ator 'a' for array would be my preference :)11:59.13 
avih i did not use array because it applies to normal objects too11:59.23 
  but i don't feel strongly about that11:59.37 
ator some test cases, see if it actually helps in common code. it does add a fair bit of complexity and some memory bloat.11:59.48 
  'array part' of any object :)12:00.03 
avih sure. i don't mind :)12:00.14 
ator because as you said, it's extra complexity that may not be worth it in the end12:00.32 
avih re test cases, planned, don't have it setup yet (except for a trivial one which only tests dense arrays)12:00.38 
ator given how slow mujs is overall, optimizing for speed has never been a priority12:00.51 
  if I were, I'd make the tree a hash table12:02.07 
avih well, here's my test case and timing. insertion is meaningfully slow, lookup is reasonable but not blazing:12:02.16 
  var now = mp.get_time_ms /* (new Date()) should work, untested */, n=1000000, a = [], s = 0, t0 = now(); for (var i=0; i<n; i++) s=0; var t1=now(); for (var i=0; i<n; i++) a[i] = 0; t2=now(); for (var i=0; i<n; i++) s=a[i]; var t3=now(); print(t1-t0, t2-t1, t3-t2)12:03.01 
  representative timings of repeatitive runs: 221 1162 57112:03.33 
  for for indexes 0-1M, 220ms just for the loop, 1200ms for insertions, 570ms for lookups12:05.28 
ator Date.now() returns the current wall clock time in millis12:06.03 
avih yeah, i wrote it should probably work, just didn't try it12:06.19 
ator new Date() creates a whole new object, needlessly bloated :)12:06.35 
  just Date.now() returns a number12:06.41 
avih agreed12:06.42 
  yes, that's best.12:06.50 
  ator: i expect that it provides very meaningful speedup when dense, and very little cost otherwise, but clearly it doesn't worth much without proper numbers. I intended to use the benchmarks which quickjs uses, but i don't have it setup yet12:09.58 
ator in debug builds, I get these numbers (pre and post your patch): 406 1840 778 -> 401 1295 64312:09.59 
  for that one test case12:10.18 
avih interesting. very bad12:10.22 
ator so a measurable performance improvement12:10.54 
avih it's a very small one. not worth the code IMHO if normal builds have similar order of magnitude12:11.32 
ator 278 1402 623 -> 308 831 440 in a release build12:12.10 
avih still not worth it IMO12:12.24 
  the optimized insertion is barely twice faster...12:12.44 
  and that's the most costly operation otherwise12:13.00 
ator yeah, it's not an order of magintude faster. I fear it's going to be dwarfed by other costs.12:13.23 
avih (i don't scuff at twice faster, but it was meant to be quite better)12:13.37 
ator if I make the benchmark a lightweigt function, 61 1087 317 -> 61 500 11112:15.07 
  that takes out the overhead of assigning to local variables stored in an environment object in the loops, using the stack directly12:15.33 
avih hmm.. lookup actually gained the most. if you remove the loop overhead, average insertion is ~x2.2 faster, average lookup is ~x5 faster12:18.04 
  how come insertion gained so much less for so many keys?12:18.52 
ator avih: my old test hack is still available on tor/wip branch12:19.29 
avih ator: how does it perform with this test?12:19.44 
ator the "WIP: Fast Array implementation." that avoids using properties at all (just to see how fast it could go)12:19.48 
  there's also a more-or-less working Typed Array extension, that would be more helpful if doing number crunching12:20.11 
  65 502 11212:21.12 
avih pretty much the same as mine then12:21.28 
ator no wonder, I was running the old code :(12:21.45 
avih (61 500 111)12:21.45 
ator 64 115 10112:21.50 
  so even faster insertion, but the same lookup12:22.00 
  (but it loses all hope of enumeration and readonly/etc attributes)12:22.14 
avih no, both insertion and lookup of your branch are pretty much identical to mine, no?12:22.26 
ator see the new numbers I pasted where insertion is 115ms12:22.45 
  I was running the wrong code12:22.50 
avih wait, what's the diff which causes 500/115?12:23.06 
ator WIP: Fast Array implementation.12:23.17 
  cherry-pickable from the tor/wip branch12:23.28 
avih ah, so 500 is wrong/unrelated?12:23.31 
ator yes, that's the numbers with your commit12:23.46 
avih right12:23.52 
  i do wonder how come insertion is so relatively slow with my code.12:24.06 
  your numbers is kinda what i hoped/expected12:24.21 
  i can guess that the most expensive part of insert is malloc of the new object, and your implementation is probably array of objects and not pointers?12:25.19 
ator array of js_Value12:25.38 
  so no extra mallocs at all12:25.50 
avih right, same in this context12:25.51 
ator and no js_intern of the property name12:26.06 
avih right12:26.28 
  makes sense12:26.33 
  though indeed the size overhead becomes much more meaningful for sparse content12:27.45 
  though probably still worth it. malloc'ed blobs have overhead too in terms of space12:28.28 
  ator: so will you complete your test branch? i think my approach can be improved (array of properties and not pointers) and complete (include walks and everything else) relatively easily, and i can work on it if you'll include it if it ens up good12:39.14 
  i'll probably separate the structs to TProps (level/left/right and js_Properties value) and AProps (my uprops as array of objects rather than pointers)12:40.55 
  array of js_Properties *12:41.18 
  not sure yet about interning the name.12:41.27 
  but i won't work on it if you have no intention to merge such approach12:42.18 
  maybe only TProp should have interned name12:43.33 
  though looking at jsS_insert, it has no mallocs, so possibly it's fast enough12:45.25 
  from a holistic perspective though, the name is part of the property, and especially tied to atts12:47.32 
  even if not strictly required to be stored with the property12:48.07 
ator separating TProps and AProps sounds like a good idea, and array of AProps rather than pointers.12:50.41 
  AProp shouldn't need the name12:50.47 
  since it's implicit by the location in the array12:51.20 
  (but it may make it slightly easier to enumerate)12:51.35 
 <<<Back 1 day (to 2020/01/29)Forward 1 day (to 2020/01/31)>>> 
ghostscript.com #ghostscript
Search: