Log of #mupdf at irc.freenode.net.

2020/02/12
avih ator: do you think mujs should provide utilities to_utf8 and from_utf8? i think most applications generally use utf8, and mujs is the one which (potentially) requires conversion to and from CESU-8 11:11.17
  or maybe just make the C interface utf8 and internally check if conversion is required or not? 11:12.09
  it should be trivial when sending strings into mujs, but might require memory management when reading string values from mujs 11:13.11
  the CESU-8 nature of the interface requires extra care when using the API. I think that checking if conversion is required at all should be fairly quick and the vast majority of the cases would not require any conversion 11:15.59
ator avih: converting inside mujs would require memory management as you say, and would be non-trivial. adding cesu8_from_utf8 and utf8_from_cesu8 helper functions for the few folk who would know or care about the distinction should suffice I think. 11:30.15
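A minimal sketch of what such a helper might look like, in C. The name `cesu8_from_utf8` is taken from ator's message, but the signature, the caller-supplied buffer, and the error convention are assumptions made for illustration; nothing here is part of the mujs API. Each 4-byte UTF-8 sequence is re-encoded as a surrogate pair of two 3-byte sequences, and every other byte is copied through unchanged.

```c
#include <stddef.h>

/* Store one byte if it fits (always leaving room for '\0'), and count it. */
static void put_byte(char *dst, size_t cap, size_t *n, unsigned b)
{
    if (*n + 1 < cap)
        dst[*n] = (char)b;
    ++*n;
}

/* Hypothetical helper, not a mujs function: re-encode UTF-8 as CESU-8.
 * Returns the number of bytes required (excluding '\0'), so a result
 * >= cap means the output was truncated. Returns (size_t)-1 on a
 * malformed 4-byte sequence; dst is then left unterminated. */
size_t cesu8_from_utf8(char *dst, size_t cap, const char *src)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s) {
        if (*s >= 0xF0) {
            unsigned long cp;
            unsigned hi, lo;
            /* Short-circuiting stops the checks at an early '\0'. */
            if (*s > 0xF4 || (s[1] & 0xC0) != 0x80 ||
                (s[2] & 0xC0) != 0x80 || (s[3] & 0xC0) != 0x80)
                return (size_t)-1;
            cp = ((unsigned long)(s[0] & 0x07) << 18) |
                 ((unsigned long)(s[1] & 0x3F) << 12) |
                 ((unsigned long)(s[2] & 0x3F) << 6) |
                 (unsigned long)(s[3] & 0x3F);
            if (cp < 0x10000UL || cp > 0x10FFFFUL)
                return (size_t)-1;
            cp -= 0x10000UL;
            hi = 0xD800u + (unsigned)(cp >> 10);
            lo = 0xDC00u + (unsigned)(cp & 0x3FF);
            /* Each surrogate becomes a 3-byte sequence starting with 0xED. */
            put_byte(dst, cap, &n, 0xED);
            put_byte(dst, cap, &n, 0x80 | ((hi >> 6) & 0x3F));
            put_byte(dst, cap, &n, 0x80 | (hi & 0x3F));
            put_byte(dst, cap, &n, 0xED);
            put_byte(dst, cap, &n, 0x80 | ((lo >> 6) & 0x3F));
            put_byte(dst, cap, &n, 0x80 | (lo & 0x3F));
            s += 4;
        } else {
            put_byte(dst, cap, &n, *s++);
        }
    }
    if (cap > 0)
        dst[n < cap ? n : cap - 1] = '\0';
    return n;
}
```

With a caller-supplied buffer the memory management stays on the application side, which is one way around the concern raised above. The output is at most 1.5 times the input length, since 4 bytes grow to 6.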
avih ator: in that case, also two functions which check if conversion is required? 11:31.42
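The two checks avih asks about could be as cheap as a single pass over the bytes. The sketch below is in C; the function names `utf8_needs_cesu8` and `cesu8_needs_utf8` are hypothetical and do not exist in mujs. It assumes well-formed input: in valid UTF-8 a byte of 0xF0 or above only occurs as the lead of a 4-byte sequence, and an encoded surrogate (U+D800..U+DFFF) always starts with 0xED followed by a byte in 0xA0..0xBF.

```c
/* Hypothetical check: does this UTF-8 string contain any code point
 * above U+FFFF, i.e. would it change when re-encoded as CESU-8? */
int utf8_needs_cesu8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    for (; *p; ++p)
        if (*p >= 0xF0)
            return 1;
    return 0;
}

/* Hypothetical check: does this CESU-8 string contain an encoded
 * surrogate, i.e. would it change when re-encoded as UTF-8? */
int cesu8_needs_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    for (; *p; ++p)
        /* p[1] is at worst the terminating '\0', so the read is safe. */
        if (p[0] == 0xED && (p[1] & 0xE0) == 0xA0)
            return 1;
    return 0;
}
```

For strings limited to the Basic Multilingual Plane both checks return 0, which matches the expectation that the vast majority of strings need no conversion.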
ator depends on how you use it, if the JS code you write doesn't care and is just passing strings along and not actually looking at them, you can get away with using utf8 (you just won't see the surrogate pairs, but a string suddenly has a character value > 65535) 11:31.43
avih define "see" 11:32.16
ator we already have the C-friendly limitation that mujs javascript strings can't hold '\0' characters 11:32.25
  js_pushstring("<U+1FFFF>") 11:33.12
  s[0] == 0x1ffff; 11:33.24
  if you don't care about the semantics (i.e. did you expect to see a surrogate pair there?), mujs will let you get away with it 11:34.12
avih hmm.. not sure i understand if that's an issue or not. seems it's ok? IOW, where would there be an issue? 11:34.17
ator it would only be an issue if you have javascript code that expects utf-16 surrogate pairs 11:34.44
  which is what the spec says strings should be 11:34.59
  although lots of people just use javascript strings as 16-bit integer arrays for passing random binary data 11:35.43
  (and that falls apart on mujs due to us not supporting '\0' in strings) 11:35.56
avih so arbitrary utf8 strings which are passed into and out of mujs will remain as is, and the only issue is at js code if you expect certain things from the structure of the strings? 11:37.29
  e.g. js_pushstring(J, "hello <extended-charset-emoji>") and at js code you return str.substring(6), and at c code the string you read will be of the emoji only in utf8? 11:39.39
  emoji-only * 11:40.05
  but if you did str.substring(-1) then there could be issues, right? 11:41.11
  because in utf8 it's one codepoint, but in js it's two utf8 values? 11:41.54
  two utf16* values 11:42.08
ator if you pushstring a utf-8 that has values > 65535 then those will be passed to JS as-is (which could cause issues if your JS code expects characters > 65535 to be surrogate pair encoded) 11:45.02
  in your example, str[5] will be the <extended-charset-emoji> 11:46.13
avih hello+<space> is 6, so str[6], yes? 11:46.44
ator whereas if we were to do an enforced automatic conversion to cesu8, str[5] would be the first in a surrogate pair 11:46.50
  yes, sorry, str[6] 11:46.58
  so the documentation that talks about cesu8 is that if you expect conformant JS behavior, pass mujs cesu8 strings 11:47.26
  the mujs core is encoding agnostic there, and allows for codepoints > 65535 11:48.03
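For reference, the surrogate pair that conformant JS code would expect at str[6] and str[7] follows from standard UTF-16 arithmetic. The C sketch below uses a hypothetical function name, `utf16_surrogates`, chosen for illustration only.

```c
/* Split a supplementary code point (U+10000..U+10FFFF) into its UTF-16
 * high and low surrogates. Returns 1 on success, or 0 if cp is in the
 * BMP or out of range, in which case no surrogate pair exists. */
int utf16_surrogates(unsigned long cp, unsigned *hi, unsigned *lo)
{
    if (cp < 0x10000UL || cp > 0x10FFFFUL)
        return 0;
    cp -= 0x10000UL;                         /* 20 bits remain */
    *hi = 0xD800u + (unsigned)(cp >> 10);    /* top 10 bits */
    *lo = 0xDC00u + (unsigned)(cp & 0x3FF);  /* bottom 10 bits */
    return 1;
}
```

For the U+1FFFF example mentioned earlier this gives 0xD83F and 0xDFFF, so a conformant string would hold two 16-bit units where the pass-through behavior described here would hold one value above 65535.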
avih is it not a bug that a char in mujs js can have a value bigger than 0xffff ? 11:48.05
ator the bug is in the user code, don't pass values bigger than that! 11:48.30
avih right, but even if the user used it incorrectly, mujs still behaves gracefully by making the codepoint one char in js string, yes? 11:49.09
ator if there is a bug, it's that we don't validate strings in js_pushstring 11:49.22
  to make sure they are valid cesu8 strings and not use code points outside the BMP 11:49.37
avih i find it a useful behavior. rejecting extended chars would break for code which just uses utf8 11:49.57
ator however, for usability's sake, to be the least surprising, we do what we do now and just pass the data through as is 11:49.59
  and if you don't know any better, then things will just work 11:50.19
avih "as is", but the utf8 implementation still recognizes it as extended codepoint and sets its value correctly 11:50.49
  ator: in that case, is it possible to abuse the implementation to hold arbitrary binary data by converting it to 16-bits pairs where value of 0 gets the codepoint 0xffff+1, and when reading it doing % 0x10000 ? 11:53.30
ator ehm, no, I'm sorry I lied 11:53.30
  we encode code points > 65535 as invalid characters 11:53.47
avih oh 11:53.54
ator I wasn't sure which version of the UTF8 encoding I was using in mujs 11:54.08
  we are using the one that limits it to 3-byte sequences (which only encode up to 16 bits) 11:54.29
avih so in this case passing extended charsets into and out of mujs via the c interface will modify the string, yes? 11:54.42
ator could be replaced with one that does 4 byte sequences (up to 21 bits) 11:54.48
avih (passing into - as utf8, that is) 11:54.59
  that sounds good (improving to behave like you described earlier) 11:55.22
ator avih: the string will be as is, but iterating over the characters would give you "bad encoding" characters 11:55.30
avih so what happens with substring? 11:55.48
ator it iterates, like indexing also does 11:56.19
avih so would .substring(6) and/or .substring(-1) break when passed back to the c interface or in any other way? 11:57.07
  sorry, for -1 it should be slice, not substring 11:59.30
ator s.charCodeAt(6) would in your case return 65533 12:20.25
  var s = "Hello 💩!"; 12:22.07
  for (var i=0; i < s.length; ++i) print(s.charCodeAt(i)) 12:22.07
  that shows what ends up in the string 12:22.39
  72 12:23.49
  101 12:23.50
  108 12:23.50
  111 12:23.50
  32 12:23.51
  65533 12:23.52
  33 12:24.00
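The 65533 in that listing is U+FFFD, the replacement character a decoder reports for input it rejects. The C sketch below shows how a decoder limited to 3-byte sequences produces it for a 4-byte emoji. This is not the mujs decoder: the names `decode_bmp_only` and `decode_all` are hypothetical, and emitting one U+FFFD per rejected sequence (rather than one per rejected byte) is an assumption, since that detail is implementation-specific.

```c
#include <stddef.h>

enum { RUNE_ERROR = 0xFFFD };

/* Decode one code point, accepting only 1- to 3-byte UTF-8 sequences.
 * Returns the number of bytes consumed. Overlong forms are not checked,
 * to keep the sketch short. Must not be called on an empty string. */
size_t decode_bmp_only(unsigned *rune, const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t n = 1;

    if (s[0] < 0x80) {
        *rune = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *rune = ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80) {
        *rune = ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6) |
                (s[2] & 0x3Fu);
        return 3;
    }
    /* Rejected (this includes every 4-byte lead): report U+FFFD and
     * skip the lead byte plus any continuation bytes that follow. */
    *rune = RUNE_ERROR;
    while ((s[n] & 0xC0) == 0x80)
        n++;
    return n;
}

/* Decode a whole string into out[0..cap-1]; returns the rune count. */
size_t decode_all(unsigned *out, size_t cap, const char *s)
{
    size_t count = 0;
    while (*s) {
        unsigned r;
        s += decode_bmp_only(&r, s);
        if (count < cap)
            out[count] = r;
        count++;
    }
    return count;
}
```

Swapping the final branch for a real 4-byte case (up to 21 bits) is the kind of replacement ator mentions above.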
avih ator: is js_loadfile(js_State *J, const char *filename); buggy then if it doesn't convert a source file to CESU-8 ? the docs mention only C-API, but supposedly js_loadfile should expect utf8, right? 15:46.39
ator avih: yes. 17:19.04
openbsdtai123 Is mupdf reconsidering using getchar() to move prev/next slide as well? see Mplayer, feh,... and co. 18:59.14
sebras openbsdtai123: have you tried "xdotool search --class MuPDF keydown Next" yet? 19:10.12
openbsdtai123 xdotool is a linux solution ;) 19:17.08
  getchar ;) 19:17.10
sebras openbsdtai123: really? https://openports.se/x11/xdotool 19:19.41
openbsdtai123 xdotool is slow and non working. It is a bad idea, 19:20.08
  while getchar is just: int c; c = getchar(); and do something. 19:20.23
sebras openbsdtai123: did you read ator's response at the bottom of this log? https://ghostscript.com/mupdfirclogs/2020/01/21.html 19:21.46
openbsdtai123 I haven't read it... I see not much difficulty to add a getchar(). this is simple. 19:23.27
sebras openbsdtai123: ator didn't seem keen on adding it and he is the main mupdf developer. 19:25.22
openbsdtai123 it needs to learn termios ... this is outdated today. 19:28.06
angry I need help 21:44.57
  I installed this from source but messed some of the files up. Now I'm trying to uninstall and found that there's no `make uninstall` provided. 21:45.44
  Can any of you help with this? 21:51.11