| <<<Back 1 day (to 2020/03/26) | Fwd 1 day (to 2020/03/28)>>> | 20200327 |
avih | ator: i managed to make a dent via gc threold strategy. splay becomes x10 faster with a strategy of next threshold = average of GC surviving allocations and GC'ed allocations. basically, in splay it adds huge amount of objects (nearly million) and very little is freed. so distancing GC makes a huge difference. on other tests it has less impact, but the threshold basically adapts itself to the amount of allocation action. | 14:38.09 |
| e.g. that's the last gc of splay: | 14:39.41 |
| garbage collected: thresh:1182733 8/11 envs, 3/52 funs, 863886/864328 objs, 318588/318590 strs | 14:39.42 |
| so the threshold for gccounter is just over a million at this stage. in other tests it auto balances to the number of allocations the program uses. | 14:40.33 |
| well, the last gc is maybe less interesting. this is 2 GCs earlier: | 14:42.16 |
| garbage collected: thresh:781032 1/8 envs, 0/52 funs, 129961/898605 objs, 138687/394751 strs | 14:42.16 |
| so it has huge number of objects which just stay there. | 14:42.51 |
ator | that's a case where generational and incremental gc would help performance | 14:54.59 |
| but I think if you're chasing performance there are much easier to reach low hanging fruit to go for | 14:55.13 |
| like changing the property lookup in objects into actual hash tables rather than simple self-balancing trees | 14:55.37 |
| keeping a pool of previously-allocated-but-now-free objects is a minor (but good) change I'm glad you're seeing benefits from it | 14:56.59 |
| I would gravitate towards "never" freeing the lists, except when constrained for memory | 14:58.20 |
| could also look into allocating the different types from a pre-allocated pool of memory | 14:58.30 |
| but again, all this is counter to the initial philosophy of "keep it stupidly simple" | 14:59.15 |
avih | ator: yeah, i checked "never" as well. it does help a bit for raytrace, not much elsewhere | 15:03.07 |
| s/checked/tried/ | 15:03.15 |
| not sure how much generational would help with splay though, because it only affects sweep, but if objects just remain there, then mark is the bottleneck | 15:04.09 |
| ator: well, keeping avail-lists is fairly simple (with or without freeing them) and useful. gc threshold strategy is also trivial with huge impact in some cases and no harm with others. | 15:06.14 |
| hash could indeed have an impact, as well as C-arrays - which we already measured with big impact, even if the patches were not complete | 15:07.12 |
| my own gripe with the code is the interning. of all property names and string object values | 15:08.51 |
| as i said, yesterday, the "avoid interning short property names" is buggy, probably due to use after free where it's likely pushed as literal in an iterator or some such and tries to live after the object dies | 15:09.51 |
ator | why does the interning bother you? it's intended to remove the overhead for duplicating strings everywhere they are used and not have to GC them | 15:10.42 |
avih | i know what it does, but to me it feels like a cheap hack | 15:11.46 |
ator | if it's because the interning table is a slow binary tree, make a hash table. | 15:11.48 |
avih | no, my main issue with it is that it never frees | 15:12.10 |
ator | the only strings that are interned are string constants from source code | 15:12.48 |
avih | i.e. there are valid program patterns which should be stable but will grow forever with the interning | 15:12.56 |
ator | it's a very common method for programming languages | 15:13.11 |
avih | of course not. all property names are interned, as well as any string which temporarily become an object to get its length or when using str[n] with it etc | 15:13.40 |
| (i have a patch to not intern String value) | 15:14.43 |
| (you never responded to it eventhough i posted it few times here) | 15:15.07 |
ator | I think having 'short' strings inline in a struct (like in js_Value) will go a long way towards solving the same issue with repeated property names | 15:15.35 |
avih | yes, i think property names should be js_Value | 15:15.59 |
ator | might look into putting string constants from the parser, the internal string in String objects, and properties use a subset of something like js_Value | 15:16.16 |
avih | this is trivial to do, but needs a bit more work to be used throughout the code. e.g. every place which does setproperty(.. name) would need to pass a js_Value where possible | 15:16.42 |
ator | js_StringValue or something like that, which has TSHRSTR and TMEMSTR and TLITSTR | 15:16.43 |
| or just SHRSTR and MEMSTR | 15:16.48 |
avih | exactly | 15:17.10 |
| but js_Value itself is fine as well | 15:17.19 |
ator | it needs to be only string values, js_Value allows too much | 15:17.24 |
avih | i don't disagree | 15:17.52 |
| but if you assert that it's only a string value, then it's equivalent, even if less pretty | 15:18.19 |
| ator: that's my patch to not intern String object values: https://0x0.st/iBxU.txt | 15:20.28 |
| so you see that jsV_newstring assert that it's a string value which is made an object, and the value is simply copied into the object. | 15:22.41 |
| ator: i don't think just SHRSTR and MEMSTR is enough. it needs literal as well, because it needs to hold any mujs string. | 15:25.38 |
| so the VM main loop can just use string value regardless of its source when it sets a property name | 15:26.27 |
ator | avih: okay, here's a counter proposal: | 15:46.08 |
| intern every string instead, and GC the string table | 15:46.18 |
avih | not bad | 15:46.58 |
ator | pass const char * throughout the API (and get the js_StringNode using str + offsetof(js_StringNode, p) | 15:47.02 |
| str - offsetof(...) | 15:47.08 |
avih | what does it counter? | 15:47.13 |
ator | would remove the complexity of MEMSTR, LITSTR, and SHRSTR | 15:47.22 |
avih | yeah, nice | 15:47.49 |
| and then at a later stage you could/would replace the tree with a hashtable | 15:48.04 |
ator | but would need more thorough GC scanning of strings (js_Function string constant table, function name, etc) | 15:48.11 |
| avih: yes. | 15:48.16 |
avih | indeed re more thorough | 15:48.25 |
ator | it would also remove the caveat of having to keep the stack slot alive in the C api to not corrupt char * | 15:48.47 |
| well, not completely, it could still be GC'd | 15:49.10 |
| but you would generally have to worry less | 15:49.22 |
avih | ator: so the assumption underneath this change is that looking a string up in a hashtable/tree is cheaper than allocating? | 15:49.27 |
ator | counter proposal b is to duplicate every string and make everything a MEMSTR | 15:50.00 |
avih | yes, did did consider that as well (all strings distinctly allocated) | 15:50.25 |
ator | a trade-off, not duplicating memory for lots of strings vs time spent interning on the C api border | 15:50.28 |
| less pressure on the GC as well, every new string need not be a new object that has to be walked and freed | 15:50.55 |
| s/object/allocation/ | 15:51.07 |
avih | i think both are less involved maybe that moving js_Value around, but ultimately i think the js_Value approach is more correct. | 15:51.33 |
| than* | 15:51.49 |
ator | for holding the internal value of a String object, I can agree | 15:52.07 |
avih | just ignore char * APIs internally, and replace it with js_StringValue, which could also be a js_Value | 15:52.30 |
ator | for the property list in objects, I'm not as convinced (that one is (my gut feeling) the main performance bottle-neck in a lot of code) | 15:52.43 |
avih | why would it be more code? | 15:53.03 |
ator | have to check 2 or 3 cases before being able to extract and compare the strings | 15:53.19 |
avih | it does need a lot of change, but i don't think it would end up in meaningfully more code | 15:53.26 |
| ator: but before all of this, my conclusion for now is that we don't really know where the time is spent. we have some anecdotal datapoints, but that's about it. and trying to improve things without a clear grasp of the current state is kinda meh, even if we both appreciate our prior experience, i already noticed it fail to predict the perf impact with mujs | 15:55.18 |
| it=my prior experience | 15:55.54 |
ator | not permanently growing memory when using String() and accessing properties using computed strings is the main concern I'm listening to | 15:57.12 |
avih | the facts which we do have is that gc threshold strategy and C-arrays can have big impact with susceptible use cases. we really don't know much beyond these. | 15:57.13 |
ator | performance is, meh, it's slow anyway :) | 15:57.19 |
avih | i hear you | 15:57.40 |
ator | "leaking" memory for the String() and property name case could become a real problem for some use cases | 15:57.55 |
avih | but i do want to at least know why it's slower than other js VMs. what's the main thing, if there is one | 15:58.15 |
ator | btw, historically the string types we have have grown a bit organically and maybe not the best way | 15:58.21 |
| started out by interning all strings, and *always* leaking them | 15:58.30 |
avih | yeah, as i said, cheap/lazy (and working) hack :) | 15:58.52 |
ator | then I added short strings to avoid interning and leaking lots of small strings (like every 'number' when accessing arrays) | 15:58.54 |
| and then I added memory strings to do garbage collection | 15:59.01 |
| I wanted to avoid having to GC and mess with GC'd strings in the lexer | 15:59.15 |
avih | right, i grok the evolution | 15:59.23 |
ator | revamping it to GC all strings and intern them to avoid having to duplicate all property names is something I could look into next I spend significant time on mujs | 16:00.00 |
avih | well, i do like the all strings are in a tree/hash thing. it does make the code simpler. | 16:01.02 |
ator | yes. | 16:01.25 |
avih | it's less correct, and slower, than passing js_value around, but maybe it can improve things. | 16:01.51 |
ator | define "correct" | 16:02.04 |
avih | correct is that you don't have to locate a char* pointer in a "DB" where one OPCODE earlier you already had it. | 16:02.42 |
ator | you wouldn't, you could always use the char* pointer directly | 16:03.10 |
avih | s/where/when/ | 16:03.14 |
ator | in the interpreter and internals | 16:03.22 |
avih | because it's already interned? | 16:03.27 |
ator | every place a string is entered into the stack or a value, it's interned | 16:03.48 |
| js_pushstring will intern the string | 16:03.55 |
| if it's coming from user space | 16:04.06 |
| the parser will intern strings going into the function opcodes, etc. | 16:04.20 |
avih | right. i need to think about it a bit, but if the interning itself only happens on external APIs of when new strings are created in the vm (concat, etc), then it sounds super fine to me | 16:04.43 |
ator | so only the touch points to the public C api where you are adding strings as values need interning | 16:04.46 |
| pushstring, and setproperty being the main ones | 16:05.00 |
avih | but only setproperty at the external API, yes? the internal api will use a version of setproperty which assumes it's already interned, yes? | 16:05.41 |
ator | the bytecode interpreter will just be passing char* pointers (embedded in a js_Value) around untouched | 16:05.42 |
| yes. the public js_setproperty will intern, the internal jsV_setproperty etc will assume already interned string | 16:06.05 |
avih | yeah, that's similar to my idea of passing jsvalue around except when char * comes from the external api. but it's better than mine, because it can remain char* internally as well | 16:06.47 |
ator | yeah. we'll lose the SHRSTR optimization for js_Value but I'm willing to pay that cost | 16:07.13 |
avih | but this means no short string, right? | 16:07.17 |
| right | 16:07.20 |
| fwiw, i actually found that interning the property names had impact even on read because the data was local. so shortstring, or embedded values in general, does have good performance impact due to data locality | 16:08.31 |
| ator: so if we do this and string never moved, is there anything which blocs from using dynamic stack size etc? | 16:10.29 |
| strings never move* | 16:10.46 |
| i think the only concern with moving js_values around is shortstrings | 16:11.16 |
| if that's gone, then it could be fully reallocatable | 16:11.34 |
| earlier: found out that NOT interning property names (i.e. using embedded buffer) had good perf impact* | 16:12.40 |
ator | avih: yes, re: reallocing stack. | 16:13.59 |
| avih: interning vs duplicating is a memory vs speed tradeoff | 16:14.45 |
| embedded buffers (and short strings) can be a performance boost due to cache performance, less chasing pointers off into uncached memory | 16:15.13 |
avih | exactly | 16:15.24 |
| we could start with allocating all char* unconditionally, see how that goes, because i think it's simpler than splitting the APIs to external which does allocate and internal which doesn't except when creating a different string. | 16:16.49 |
| once that is complete, splitting the API should be simpler. and then we can also hold all the strings at a db and see how much impact it has in terms of memory/performance | 16:17.45 |
| ator: that's a summary run on windows with my patches and without. my patches increase the score in 40%, splay is the most affected by x10 https://0x0.st/iB3G.txt | 16:43.44 |
| raytrace is affected mostly by the reused allocations | 16:44.26 |
| so the improvements are not meaningless, but also definitely not something to write (too much) home about | 16:46.26 |
| i think C-arrays could have a meaningful impact as well, but this needs some work. | 16:47.05 |
| <<<Back 1 day (to 2020/03/26) | Forward 1 day (to 2020/03/28)>>> | |