MuPDF IRC logs

	<<<Back 1 day (to 2020/02/11)	Fwd 1 day (to 2020/02/13)>>>	20200212
avih	ator: do you think mujs should provide utilities to_utf8 and from_utf8? i think most applications generally use utf8, and mujs is the one which (potentially) requires conversion to and from CESU-8		11:11.17
	or maybe just make the C interface utf8 and internally check if conversion is required or not?		11:12.09
	it should be trivial when sending strings into mujs, but might require memory management when reading string values from mujs		11:13.11
	the CESU-8 nature of the interface requires extra care at the API usage. I think that checking if conversion is required at all should be fairly quick and the vast majority of the cases would not require any conversion		11:15.59
ator	avih: converting inside mujs would require memory management as you say, and would be non-trivial. adding cesu8_from_utf8 and utf8_from_cesu8 helper functions for the few folk who would know or care about the distinction should suffice I think.		11:30.15
avih	ator: in that case, also two functions which check if conversion is required?		11:31.42
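
A check of the kind asked about here could be sketched as below. This is a minimal sketch, not part of the mujs API: the names `utf8_needs_cesu8` and `cesu8_needs_utf8` are hypothetical, and both functions assume the input is already valid, NUL-terminated UTF-8 or CESU-8. In valid UTF-8 only the lead byte of a 4-byte sequence is 0xF0 or higher, and in CESU-8 a high surrogate always starts with 0xED followed by a byte in 0xA0..0xAF, so a single linear scan is enough.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of mujs.
 * Returns 1 if the valid UTF-8 string s contains a 4-byte sequence,
 * i.e. a code point above U+FFFF that CESU-8 stores as a surrogate pair.
 * Returns 0 if s is already valid CESU-8 and can be passed through as is. */
int utf8_needs_cesu8(const char *s)
{
	const unsigned char *p = (const unsigned char *)s;
	assert(s != NULL);
	for (; *p; ++p)
		if (*p >= 0xF0) /* lead byte of a 4-byte sequence */
			return 1;
	return 0;
}

/* Hypothetical helper, not part of mujs.
 * Returns 1 if the CESU-8 string s contains an encoded high surrogate
 * (U+D800..U+DBFF, bytes ED A0..AF xx), so it differs from plain UTF-8. */
int cesu8_needs_utf8(const char *s)
{
	const unsigned char *p = (const unsigned char *)s;
	assert(s != NULL);
	for (; *p; ++p)
		/* p[1] is at worst the terminating NUL, so this read is in bounds */
		if (p[0] == 0xED && (p[1] & 0xF0) == 0xA0)
			return 1;
	return 0;
}
```

For text that stays inside the Basic Multilingual Plane both checks return 0, which matches the expectation above that most strings need no conversion.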
ator	depends on how you use it, if the JS code you write doesn't care and is just passing strings along and not actually looking at them, you can get away with using utf8 (you just won't see the surrogate pairs, but a string suddenly has a character value > 65535)		11:31.43
avih	define "see"		11:32.16
ator	we already have the C-friendly limitation that mujs javascript strings can't hold '\0' characters		11:32.25
	js_pushstring("<U+1FFFF>")		11:33.12
	s[0] == 0x1ffff;		11:33.24
	if you don't care about the semantics (i.e. did you expect to see a surrogate pair there?), mujs will let you get away with it		11:34.12
avih	hmm.. not sure i understand if that's an issue or not. seems it's ok? IOW, where would there be an issue?		11:34.17
ator	it would only be an issue if you have javascript code that expects utf-16 surrogate pairs		11:34.44
	which is what the spec says strings should be		11:34.59
	although lots of people just use javascript strings as 16-bit integer arrays for passing random binary data		11:35.43
	(and that falls apart on mujs due to us not supporting '\0' in strings)		11:35.56
avih	so arbitrary utf8 strings which are passed into and out of mujs will remain as is, and the only issue is at js code if you expect certain things from the structure of the strings?		11:37.29
	e.g. js_pushstring(J, "hello <extended-charset-emoji>") and at js code you return str.substring(6), and at c code the string you read will be of the emoji only in utf8?		11:39.39
	emoji-only *		11:40.05
	but if you did str.substring(-1) then there could be issues, right?		11:41.11
	because in utf8 it's one codepoint, but in js it's two utf8 values?		11:41.54
	two utf16* values		11:42.08
ator	if you pushstring a utf-8 that has values > 65535 then those will be passed to JS as-is (which could cause issues if your JS code expects characters > 65535 to be surrogate pair encoded)		11:45.02
	in your example, str[5] will be the <extended-charset-emoji>		11:46.13
avih	hello+<space> is 6, so str[6], yes?		11:46.44
ator	whereas if we were to do an enforced automatic conversion to cesu8, str[5] would be the first in a surrogate pair		11:46.50
	yes, sorry, str[6]		11:46.58
	so the documentation that talks about cesu8 is that if you expect conformant JS behavior, pass mujs cesu8 strings		11:47.26
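
To make "pass mujs cesu8 strings" concrete, here is a rough sketch of the UTF-8 to CESU-8 direction. It borrows the name `cesu8_from_utf8` from the helper suggested earlier in the discussion, but the signature, the buffer contract and the `put3` helper are assumptions for illustration only; none of this is mujs code. It assumes valid UTF-8 input and splits every code point above U+FFFF into a UTF-16 surrogate pair, writing each half as a 3-byte sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Write the 16-bit value u as a 3-byte UTF-8 style sequence. */
static unsigned char *put3(unsigned char *d, unsigned int u)
{
	*d++ = (unsigned char)(0xE0 | (u >> 12));
	*d++ = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
	*d++ = (unsigned char)(0x80 | (u & 0x3F));
	return d;
}

/* Hypothetical helper, not the mujs API.
 * Converts valid NUL-terminated UTF-8 in src to CESU-8 in dst.
 * dst must hold at least strlen(src) * 3 / 2 + 1 bytes, because each
 * 4-byte sequence grows to 6 bytes. Returns the bytes written, without NUL. */
size_t cesu8_from_utf8(char *dst, const char *src)
{
	const unsigned char *s = (const unsigned char *)src;
	unsigned char *d = (unsigned char *)dst;
	assert(dst != NULL && src != NULL);
	while (*s) {
		/* the s[1..3] checks stop a truncated sequence from overrunning */
		if (*s >= 0xF0 && s[1] && s[2] && s[3]) {
			unsigned long cp = ((unsigned long)(s[0] & 0x07) << 18)
				| ((unsigned long)(s[1] & 0x3F) << 12)
				| ((unsigned long)(s[2] & 0x3F) << 6)
				| (unsigned long)(s[3] & 0x3F);
			cp -= 0x10000; /* valid input guarantees cp >= 0x10000 */
			d = put3(d, (unsigned int)(0xD800 + (cp >> 10)));   /* high surrogate */
			d = put3(d, (unsigned int)(0xDC00 + (cp & 0x3FF))); /* low surrogate */
			s += 4;
		} else {
			*d++ = *s++; /* 1 to 3 byte sequences are identical in CESU-8 */
		}
	}
	*d = 0;
	return (size_t)(d - (unsigned char *)dst);
}
```

With this sketch, U+1F4A9 (UTF-8 bytes F0 9F 92 A9) becomes the pair U+D83D U+DCA9, encoded as ED A0 BD ED B2 A9.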
	the mujs core is encoding agnostic there, and allows for codepoints > 65535		11:48.03
avih	is it not a bug that a char in mujs js can have a value bigger than 0xffff ?		11:48.05
ator	the bug is in the user code, don't pass values bigger than that!		11:48.30
avih	right, but even if the user used it incorrectly, mujs still behaves gracefully by making the codepoint one char in js string, yes?		11:49.09
ator	if there is a bug, it's that we don't validate strings in js_pushstring		11:49.22
	to make sure they are valid cesu8 strings and not use code points outside the BMP		11:49.37
avih	i find it a useful behavior. rejecting extended chars would break for code which just uses utf8		11:49.57
ator	however, for usability sake, to be the least surprising, we do what we do now and just pass the data through as is		11:49.59
	and if you don't know any better, then things will just work		11:50.19
avih	"as is", but the utf8 implementation still recognizes it as extended codepoint and sets its value correctly		11:50.49
	ator: in that case, is it possible to abuse the implementation to hold arbitrary binary data by converting it to 16-bits pairs where value of 0 gets the codepoint 0xffff+1, and when reading it doing % 0x10000 ?		11:53.30
ator	ehm, no, I'm sorry I lied		11:53.30
	we encode code points > 65535 as invalid characters		11:53.47
avih	oh		11:53.54
ator	I wasn't sure which version of the UTF8 encoding I was using in mujs		11:54.08
	we are using the one that limits it to 3-byte sequences (which only encode up to 16 bits)		11:54.29
avih	so in this case passing extended charsets into and out of mujs via the c interface will modify the string, yes?		11:54.42
ator	could be replaced with one that does 4 byte sequences (up to 21 bits)		11:54.48
avih	(passing into - as utf8, that is)		11:54.59
	that sounds good (improving to behave like you described earlier)		11:55.22
ator	avih: the string will be as is, but iterating over the characters would give you "bad encoding" characters		11:55.30
avih	so what happens with substring?		11:55.48
ator	it iterates, like indexing also does		11:56.19
avih	so would .substring(6) and/or .substring(-1) break when passed back to the c interface or in any other way?		11:57.07
	sorry, for -1 it should be slice, not substring		11:59.30
ator	s.charCodeAt(6) would in your case return 65533		12:20.25
	var s = "Hello 💩!";		12:22.07
	for (var i=0; i < s.length; ++i) print(s.charCodeAt(i))		12:22.07
	that shows what ends up in the string		12:22.39
	72		12:23.49
	101		12:23.50
	108		12:23.50
	111		12:23.50
	32		12:23.51
	65533		12:23.52
	33		12:24.00
avih	ator: is js_loadfile(js_State *J, const char *filename); buggy then if it doesn't convert a source file to CESU-8 ? the docs mention only C-API, but supposedly js_loadfile should expect utf8, right?		15:46.39
ator	avih: yes.		17:19.04
openbsdtai123	Is mupdf reconsidering using getchar() to move prev/next slide as well ? see Mplayer, feh,... and co.		18:59.14
sebras	openbsdtai123: have you tried "xdotool search --class MuPDF keydown Next" yet?		19:10.12
openbsdtai123	xdotool is a linux solution ;)		19:17.08
	getchar ;)		19:17.10
sebras	openbsdtai123: really? https://openports.se/x11/xdotool		19:19.41
openbsdtai123	xdotool is slow and non working. It is a bad idea,		19:20.08
	while getchar is just : int c ; c = getchar(); and do sthg .		19:20.23
sebras	openbsdtai123: did you read ator's response at the bottom of this log? https://ghostscript.com/mupdfirclogs/2020/01/21.html		19:21.46
openbsdtai123	I havent read it... I see not much difficulty to add a getchar(). this is simple.		19:23.27
sebras	openbsdtai123: ator didn't seem keen on adding it and he is the main mupdf developer.		19:25.22
openbsdtai123	it needs to learn termios ... this is outdated today.		19:28.06
angry	I need help		21:44.57
	I installed this from source but messed some of the files up. Now I'm trying to uninstall and found that there's no `make uninstall` provided.		21:45.44
	Can any of you help with this?		21:51.11

Log of #mupdf at irc.freenode.net.