MuPDF IRC logs

	<<<Back 1 day (to 2020/01/26)	Fwd 1 day (to 2020/01/28)>>>	20200127
ator	hisacro: PDF is a random access file format, and mupdf needs a "seekable" file. You have to save to a temporary file and open that, mupdf cannot open content from stdin.		10:23.27
	emfipp: we don't ship an installer. all the mupdf exe files are standalone and don't need any installation, just run them from wherever.		10:24.08
	emfipp: mupdf-gl takes a -r option on both windows and unixes		10:24.38
	emfipp: mupdf-gl does not (due to how it uses FreeGLUT for windowing) support extracting the dpi from the monitor		10:25.24
	not at the same level as mupdf-w32/x11, but we do look at it for scaling the UI		10:27.11
sebras	ator: one day I shall install docker in a VM instead of running it native. :-P		10:27.16
avih	ator: i've been thinking about optimized object array keys (index) access/storage. that's my rough initial approach: https://0x0.st/ir5D.txt thoughts?		12:41.06
	everything is o(1) and keys never move between optimized and normal storage		12:43.05
	i didn't consider enumeration yet		12:49.15
	the only cost this has (other than more logic) is unused memory if an array becomes sparse after it's been not-sparse. but i think it's a fair tradeoff, because we've already had to pay this price earlier anyway, so it doesn't add a new price, just doesn't has a "no refunds" policy		13:02.35
	just has* a "no refunds"..		13:03.06
	and even this can be modified, e.g. if sparseness becomes extreme, unoptimize the storage once but keep it frozen.		13:04.28
ator	avih: with a bit of reshuffling, you/I/we/someone could add a "small integer" property array part in js_Object to go along with the current property tree		13:10.12
	this would end up being somewhat similar to lua's approach to their tables		13:10.21
	where they have an 'integer indexed array' part and a 'general hash table' part		13:10.33
avih	ator: why small?		13:10.59
ator	in Lua it the "small" part grows if you extend it by one		13:11.23
avih	also, i don't think "-1" is common. i think the js definition of index being uint < 2^31-1 is good		13:11.45
ator	but add an integer key that is larger than that, will make that key "sparse"		13:11.48
	each entry in the array will still need to be a full property, with attributes and getter and setter		13:12.10
	so basically the same as the current table, but a faster way to index the properties		13:12.30
	s/index/find/		13:12.40
avih	of course, my only concern was whether it's stored at the tree or in an optimized c array		13:12.47
ator	enumeration and deletion are the "tricky" parts		13:13.17
	a deleted property needs to not exist, and is different from a property with value 'undefined'		13:13.34
	which is where my previous attempts to do a quick hack for this have fallen down		13:13.46
avih	ator: i'm not familiar with the lua mechanics especially if a key changes its storage, but i think my approach covers it well, and can cover the lua one too with tunning of the values		13:14.03
ator	but at the time, I also attempted to keep the same behavior as chrome/spidermonkey where enumeration returned properties in the same order they were inserted		13:14.20
	I gave that up, which should make the task easier now		13:14.39
avih	ator: i considered deleted. not yet enumeration (but should be easy, because order is implementation defined)		13:14.41
ator	... since the order is implementation defined		13:14.59
avih	ator: how does lua handle a "non small integer" which later becomes "small" because gaps have been filled?		13:15.45
ator	don't know, it may keep track of fill rate and rearrange storage if it grows too sparse		13:16.04
avih	well, in my approach it's well defined.		13:16.22
	ator: did you look at my paste?		13:16.45
ator	why 'freeze'? (and I assume this is not related to Object.freeze())		13:17.04
avih	correct, it's related to optimized storage size		13:17.44
	and i freeze the size because it makes everything else O(1)		13:18.06
	otherwise you need O(N) when you delete		13:18.31
ator	in lua, contiguous integer keys starting from 1 (would be 0 in mujs) always go in the array part, everything else goes into the hash part		13:18.46
avih	(to maintain count, in case you delete ix size-1)		13:18.57
ator	we could maintain a bit in the array part saying 'deleted' and enumeration would skip it		13:19.17
	keep count of how much in the array part is deleted, and if it grows beyond a certain ratio, repack the object properties		13:19.35
	by either shrinking (if they are all at the end) or moving everything past the first hole into the hash/tree part		13:20.09
avih	well, i considered it allocated objects with NULL at the c array, but it can also be an array of objects with is_deleted key too		13:20.14
ator	the array part will need to be some variant of js_Property struct (missing the name, left, right, and level fields; keeping atts, value, setter, getter)		13:20.50
avih	yes		13:21.12
ator	one bit in the 'atts' field can store whether the key exists or not		13:21.42
avih	re repack, my approach avoids it. it's a one-way approach, but sure, expensive repacking is possible too		13:21.52
ator	better than freezing it completely, I think		13:22.10
avih	however, i think that the cost is likely to be small even without repacking		13:22.21
ator	{ 1:'foo', 2:'bar', 5000:'zot' } should still be unfrozen, IMO		13:22.37
avih	and it's way simpler without repacking		13:22.40
	ator: in theory i agree (1/2/500), in practice this is hard to maintain, but i aborted this approach early, it might be simpler than i guessed		13:23.27
	the thing is, my approach has everything O(1). repacking etc can be O(N)		13:24.26
	(even just checking if repacking is needed)		13:24.41
	however, i do maintain sparseness in O(1), so if repacking depends on sparseness, checking it is O(1) even if the repacking is O(Nlogn)		13:25.40
	the thing i afraid the most is that some usage patterns can end in frequent expensive repacking.		13:27.29
ator	implement without repacking, or check for repacking (which is O(1) -- just compare 'deleted' count with 'added') only when doing GC		13:31.15
avih	'deleted existing' with 'added non-existing', but yes		13:33.46
ator	avih: would you keep the array part 2^m sized, and the sparseness check is whether it fits in the current allocation or is exactly one past the edge?		13:34.05
avih	i thought the array size is 2^m, but it doesn't have to be. it's not part of the sparseness calc		13:34.38
	the sparseness only counts highest inserted ix and count		13:35.05
ator	problem is when growing, the new array part may need to swallow up existing properties in the hash part		13:35.08
	or fall back to looking at the hash part		13:35.14
avih	not sure i follow		13:35.26
ator	cause if you start with {1:foo, 2:bar, 10:zot}		13:35.38
	you would have an array part with 1 and 2, hash part with 10		13:35.48
avih	in my approach (without repacking), keys mover move between optimized and tree		13:35.51
ator	but if you keep growing the array part up to 10		13:35.53
avih	never*		13:36.00
ator	then 10:zot would need to migrate		13:36.02
	yeah, you'd 'freeze' as soon asy ou see the 10 and the array part would never expand, right?		13:36.22
avih	yes		13:36.30
	repacking would indeed involve moving between storages		13:36.52
	the thought is that if we're adding items which make it sparse, it's likely to remain sparse		13:37.43
	but as long as it's reasonably dense, we keep it optimized		13:38.05
ator	yeah, either you treat it as a compact 'array' and get the benefits		13:38.21
	deviate from that and you get punished		13:38.34
avih	it's not impossible to add a repack. but considering it's O(N log n), it must not happen frequently		13:38.57
	where N is the sparse length. e.g. {1:"foo", 1000:"bar"} N is 1000		13:39.31
	however, with smart code N can be min(num-existing-keys, sparse-length)		13:40.42
	(you're alternating normal enumeration steps at the tree with n=n+1 at the optimized storage)		13:41.25
	my thought was that storage_base-len is 4 or 8, and grow is 2 or 1.5 or something along these lines		13:44.52
	also note that in my approach, on insertion we don't care about sparseness when it fits the storage. so it can become sparse and then regain density without freezing the size		13:46.29
	i.e. we only consider sparseness just before we're about to grow the storage		13:47.39
	ator: look at it this way: if it becomes sparse, it falls back to current behavior and that's it, except that ix which fit the storage size are still optimizd.		13:52.26
	i don't disagree it's nicer if it's not one-way, but the cost (logic, runtime) can be high		13:53.44
	and in my approach the cost is very reasonable i think		13:54.13
	just "no refunds" for things you've already paid		13:54.33
ator	sure. that's not where the real complexity lies though -- that's in getting enumeration, deletion, attribute properties, and the prototype lookup chaining, setters and getters,		13:56.38
	interactions with "classes" special behavior (String, Array), userdata fallback setters and getters		13:57.07
	and object and property level permissions		13:57.37
avih	i'm not sure i get that. all those things (maybe except "virtual" properties like String) remain as is. it's just lookup/insertion/enumeration which changes.		13:58.49
	lookup/deletion/insertion are trivial. enumeration should be too (scan the optimized storage, then do normal tree scan)		13:59.44
	virtual keys like String - no idea. i never looked at that code, but i imagine it shouldn't change much either, because it only modifies how a key is accessed IF it's stored		14:02.47
ator	avih: if you hide the details in jsV_getproperty and family of friends, yes, you're probably right		14:02.56
	it's been a while, I'm not as familiar with the code as I used to be :)		14:03.06
avih	yes, that was my intent (before trying to touch the code)		14:03.17
	heh		14:03.22
	the optimization is a property of the storage. tree or tree and c-array. whether the storage is abstracted well, or if it's leaky (e.g. things elsewhere use left/right) - i don't know		14:04.52
ator	nothing else uses the left/right/level in js_Property		14:08.21
avih	i imagined the storage as having the API: has_key(s), get_key(s), del_key(s), set_key(s, val) (maybe returns the prev val object existed). and users of the storage should not care how the key is mapped internally to a value. it can be hash, tree, array, whatever		14:08.36
	ator: i don't see a js_property.h, what's the API?		14:17.37
	jsproperty.h *		14:17.44
	would be easier to approach it if the API is clear		14:18.10
	i'm also not seeing the definition of struct js_Property ...		14:21.35
	ah, in jsvalue.h. github omitted this result in its search		14:23.07
	well, left, right, level should not be global. they should be local to a wrapper struct at jsproperty.c /me thinks		14:24.25
	not sure what atts is, but i imagine it's part of the property itself and not part of the storage (i.e. should remain global)		14:25.30
	and not sure if name should be global or not. depending if it's only used for access (so local) or also used elsewhere		14:26.14
	jsG_freeproperty should probably move to jsproperty.c		14:28.59
ator	avih: sorry, having internet troubles here		15:00.37
	avih: atts has the DONTENUM, DONTCONF, READONLY bits for the property		15:00.56
	avih: the files are organized to minimize the number of functions that need to be publicly visible, not by data structure		15:01.28
	<<<Back 1 day (to 2020/01/26)	Forward 1 day (to 2020/01/28)>>>

Log of #mupdf at irc.freenode.net.