Log of #mupdf at irc.freenode.net.

Search:
 <<<Back 1 day (to 2020/03/26)Fwd 1 day (to 2020/03/28)>>>20200327 
avih ator: i managed to make a dent via gc threold strategy. splay becomes x10 faster with a strategy of next threshold = average of GC surviving allocations and GC'ed allocations. basically, in splay it adds huge amount of objects (nearly million) and very little is freed. so distancing GC makes a huge difference. on other tests it has less impact, but the threshold basically adapts itself to the amount of allocation action.14:38.09 
  e.g. that's the last gc of splay:14:39.41 
  garbage collected: thresh:1182733 8/11 envs, 3/52 funs, 863886/864328 objs, 318588/318590 strs14:39.42 
  so the threshold for gccounter is just over a million at this stage. in other tests it auto balances to the number of allocations the program uses.14:40.33 
  well, the last gc is maybe less interesting. this is 2 GCs earlier:14:42.16 
  garbage collected: thresh:781032 1/8 envs, 0/52 funs, 129961/898605 objs, 138687/394751 strs14:42.16 
  so it has huge number of objects which just stay there.14:42.51 
ator that's a case where generational and incremental gc would help performance14:54.59 
  but I think if you're chasing performance there are much easier to reach low hanging fruit to go for14:55.13 
  like changing the property lookup in objects into actual hash tables rather than simple self-balancing trees14:55.37 
  keeping a pool of previously-allocated-but-now-free objects is a minor (but good) change I'm glad you're seeing benefits from it14:56.59 
  I would gravitate towards "never" freeing the lists, except when constrained for memory14:58.20 
  could also look into allocating the different types from a pre-allocated pool of memory14:58.30 
  but again, all this is counter to the initial philosophy of "keep it stupidly simple"14:59.15 
avih ator: yeah, i checked "never" as well. it does help a bit for raytrace, not much elsewhere15:03.07 
  s/checked/tried/15:03.15 
  not sure how much generational would help with splay though, because it only affects sweep, but if objects just remain there, then mark is the bottleneck15:04.09 
  ator: well, keeping avail-lists is fairly simple (with or without freeing them) and useful. gc threshold strategy is also trivial with huge impact in some cases and no harm with others.15:06.14 
  hash could indeed have an impact, as well as C-arrays - which we already measured with big impact, even if the patches were not complete15:07.12 
  my own gripe with the code is the interning. of all property names and string object values15:08.51 
  as i said, yesterday, the "avoid interning short property names" is buggy, probably due to use after free where it's likely pushed as literal in an iterator or some such and tries to live after the object dies15:09.51 
ator why does the interning bother you? it's intended to remove the overhead for duplicating strings everywhere they are used and not have to GC them15:10.42 
avih i know what it does, but to me it feels like a cheap hack15:11.46 
ator if it's because the interning table is a slow binary tree, make a hash table.15:11.48 
avih no, my main issue with it is that it never frees15:12.10 
ator the only strings that are interned are string constants from source code15:12.48 
avih i.e. there are valid program patterns which should be stable but will grow forever with the interning15:12.56 
ator it's a very common method for programming languages15:13.11 
avih of course not. all property names are interned, as well as any string which temporarily become an object to get its length or when using str[n] with it etc15:13.40 
  (i have a patch to not intern String value)15:14.43 
  (you never responded to it eventhough i posted it few times here)15:15.07 
ator I think having 'short' strings inline in a struct (like in js_Value) will go a long way towards solving the same issue with repeated property names15:15.35 
avih yes, i think property names should be js_Value15:15.59 
ator might look into putting string constants from the parser, the internal string in String objects, and properties use a subset of something like js_Value15:16.16 
avih this is trivial to do, but needs a bit more work to be used throughout the code. e.g. every place which does setproperty(.. name) would need to pass a js_Value where possible15:16.42 
ator js_StringValue or something like that, which has TSHRSTR and TMEMSTR and TLITSTR15:16.43 
  or just SHRSTR and MEMSTR15:16.48 
avih exactly15:17.10 
  but js_Value itself is fine as well15:17.19 
ator it needs to be only string values, js_Value allows too much15:17.24 
avih i don't disagree15:17.52 
  but if you assert that it's only a string value, then it's equivalent, even if less pretty15:18.19 
  ator: that's my patch to not intern String object values: https://0x0.st/iBxU.txt15:20.28 
  so you see that jsV_newstring assert that it's a string value which is made an object, and the value is simply copied into the object.15:22.41 
  ator: i don't think just SHRSTR and MEMSTR is enough. it needs literal as well, because it needs to hold any mujs string.15:25.38 
  so the VM main loop can just use string value regardless of its source when it sets a property name15:26.27 
ator avih: okay, here's a counter proposal:15:46.08 
  intern every string instead, and GC the string table15:46.18 
avih not bad15:46.58 
ator pass const char * throughout the API (and get the js_StringNode using str + offsetof(js_StringNode, p)15:47.02 
  str - offsetof(...)15:47.08 
avih what does it counter?15:47.13 
ator would remove the complexity of MEMSTR, LITSTR, and SHRSTR15:47.22 
avih yeah, nice15:47.49 
  and then at a later stage you could/would replace the tree with a hashtable15:48.04 
ator but would need more thorough GC scanning of strings (js_Function string constant table, function name, etc)15:48.11 
  avih: yes.15:48.16 
avih indeed re more thorough15:48.25 
ator it would also remove the caveat of having to keep the stack slot alive in the C api to not corrupt char *15:48.47 
  well, not completely, it could still be GC'd15:49.10 
  but you would generally have to worry less15:49.22 
avih ator: so the assumption underneath this change is that looking a string up in a hashtable/tree is cheaper than allocating?15:49.27 
ator counter proposal b is to duplicate every string and make everything a MEMSTR15:50.00 
avih yes, did did consider that as well (all strings distinctly allocated)15:50.25 
ator a trade-off, not duplicating memory for lots of strings vs time spent interning on the C api border15:50.28 
  less pressure on the GC as well, every new string need not be a new object that has to be walked and freed15:50.55 
  s/object/allocation/15:51.07 
avih i think both are less involved maybe that moving js_Value around, but ultimately i think the js_Value approach is more correct.15:51.33 
  than*15:51.49 
ator for holding the internal value of a String object, I can agree15:52.07 
avih just ignore char * APIs internally, and replace it with js_StringValue, which could also be a js_Value15:52.30 
ator for the property list in objects, I'm not as convinced (that one is (my gut feeling) the main performance bottle-neck in a lot of code)15:52.43 
avih why would it be more code?15:53.03 
ator have to check 2 or 3 cases before being able to extract and compare the strings15:53.19 
avih it does need a lot of change, but i don't think it would end up in meaningfully more code15:53.26 
  ator: but before all of this, my conclusion for now is that we don't really know where the time is spent. we have some anecdotal datapoints, but that's about it. and trying to improve things without a clear grasp of the current state is kinda meh, even if we both appreciate our prior experience, i already noticed it fail to predict the perf impact with mujs15:55.18 
  it=my prior experience15:55.54 
ator not permanently growing memory when using String() and accessing properties using computed strings is the main concern I'm listening to15:57.12 
avih the facts which we do have is that gc threshold strategy and C-arrays can have big impact with susceptible use cases. we really don't know much beyond these.15:57.13 
ator performance is, meh, it's slow anyway :)15:57.19 
avih i hear you15:57.40 
ator "leaking" memory for the String() and property name case could become a real problem for some use cases15:57.55 
avih but i do want to at least know why it's slower than other js VMs. what's the main thing, if there is one15:58.15 
ator btw, historically the string types we have have grown a bit organically and maybe not the best way15:58.21 
  started out by interning all strings, and *always* leaking them15:58.30 
avih yeah, as i said, cheap/lazy (and working) hack :)15:58.52 
ator then I added short strings to avoid interning and leaking lots of small strings (like every 'number' when accessing arrays)15:58.54 
  and then I added memory strings to do garbage collection15:59.01 
  I wanted to avoid having to GC and mess with GC'd strings in the lexer15:59.15 
avih right, i grok the evolution15:59.23 
ator revamping it to GC all strings and intern them to avoid having to duplicate all property names is something I could look into next I spend significant time on mujs16:00.00 
avih well, i do like the all strings are in a tree/hash thing. it does make the code simpler.16:01.02 
ator yes.16:01.25 
avih it's less correct, and slower, than passing js_value around, but maybe it can improve things.16:01.51 
ator define "correct"16:02.04 
avih correct is that you don't have to locate a char* pointer in a "DB" where one OPCODE earlier you already had it.16:02.42 
ator you wouldn't, you could always use the char* pointer directly16:03.10 
avih s/where/when/16:03.14 
ator in the interpreter and internals16:03.22 
avih because it's already interned?16:03.27 
ator every place a string is entered into the stack or a value, it's interned16:03.48 
  js_pushstring will intern the string16:03.55 
  if it's coming from user space16:04.06 
  the parser will intern strings going into the function opcodes, etc.16:04.20 
avih right. i need to think about it a bit, but if the interning itself only happens on external APIs of when new strings are created in the vm (concat, etc), then it sounds super fine to me16:04.43 
ator so only the touch points to the public C api where you are adding strings as values need interning16:04.46 
  pushstring, and setproperty being the main ones16:05.00 
avih but only setproperty at the external API, yes? the internal api will use a version of setproperty which assumes it's already interned, yes?16:05.41 
ator the bytecode interpreter will just be passing char* pointers (embedded in a js_Value) around untouched16:05.42 
  yes. the public js_setproperty will intern, the internal jsV_setproperty etc will assume already interned string16:06.05 
avih yeah, that's similar to my idea of passing jsvalue around except when char * comes from the external api. but it's better than mine, because it can remain char* internally as well16:06.47 
ator yeah. we'll lose the SHRSTR optimization for js_Value but I'm willing to pay that cost16:07.13 
avih but this means no short string, right?16:07.17 
  right16:07.20 
  fwiw, i actually found that interning the property names had impact even on read because the data was local. so shortstring, or embedded values in general, does have good performance impact due to data locality16:08.31 
  ator: so if we do this and string never moved, is there anything which blocs from using dynamic stack size etc?16:10.29 
  strings never move*16:10.46 
  i think the only concern with moving js_values around is shortstrings16:11.16 
  if that's gone, then it could be fully reallocatable16:11.34 
  earlier: found out that NOT interning property names (i.e. using embedded buffer) had good perf impact*16:12.40 
ator avih: yes, re: reallocing stack.16:13.59 
  avih: interning vs duplicating is a memory vs speed tradeoff16:14.45 
  embedded buffers (and short strings) can be a performance boost due to cache performance, less chasing pointers off into uncached memory16:15.13 
avih exactly16:15.24 
  we could start with allocating all char* unconditionally, see how that goes, because i think it's simpler than splitting the APIs to external which does allocate and internal which doesn't except when creating a different string.16:16.49 
  once that is complete, splitting the API should be simpler. and then we can also hold all the strings at a db and see how much impact it has in terms of memory/performance16:17.45 
  ator: that's a summary run on windows with my patches and without. my patches increase the score in 40%, splay is the most affected by x10 https://0x0.st/iB3G.txt16:43.44 
  raytrace is affected mostly by the reused allocations16:44.26 
  so the improvements are not meaningless, but also definitely not something to write (too much) home about16:46.26 
  i think C-arrays could have a meaningful impact as well, but this needs some work.16:47.05 
 <<<Back 1 day (to 2020/03/26)Forward 1 day (to 2020/03/28)>>> 
ghostscript.com #ghostscript
Search: