Log of #mupdf at irc.freenode.net.

20200514 
myopia is artifex planning on muhtml after mujs? all current browser and html engines suffer from memory leaks from those I have tried10:51.11 
  a clean-room C implementation of a leak-proof html engine to be coupled with mujs would be nice10:52.01 
  a candidate to be squared away for a truly scriptable minimalistic browser10:52.36 
  scriptable *and* minimalistic10:52.54 
  probably also coupled with security proofs from a haskell code model10:53.39 
ator myopia: no. a minimalist browser that is usable on today's web would be a universe-imploding type paradox.11:49.46 
  sebras: a couple of mujs commits on tor/master that could do with a review if you got time11:50.53 
  avih: that goes for you too. I'm thinking of finally pushing the WTF-8 mess...11:51.52 
avih ator: summary? also what is the "WTF8 mess"?11:52.20 
ator allow >16bit characters in JS strings that would pass from C to C untouched, without automatic SMP conversion11:52.55 
avih (i've been using this successfully https://github.com/avih/mpv/commit/81306684 )11:52.57 
  well, i know what the problem is quite exactly. i mean, what's your solution?11:53.27 
ator basically the same hack as duktape11:53.27 
  my solution is to do nothing and leave it up to the user :)11:53.47 
avih ah, so js char is a codepoint?11:53.47 
  (that's the implication of char > 16 bit, right?)11:54.25 
ator with the commits I'm proposing, you won't be able to create a >16bit char from Javascript, nor by loading it from source11:55.08 
avih but it will preserve them if they came from the C api?11:55.35 
  what about slice etc?11:55.40 
ator but if you pass in a string with >16bit characters via js_pushstring(), they will pass through unmolested back to js_tostring()11:56.08 
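For reference, a ">16bit character" on the C side is a single 4-byte UTF-8 sequence. A minimal standalone sketch of that encoding follows; `utf8_encode` is a hypothetical helper written for illustration and is not part of the mujs API.

```c
#include <assert.h>
#include <stdint.h>

/* Encode one code point (U+0000..U+10FFFF) as UTF-8.
 * dst must have room for 4 bytes. Returns the number of bytes written.
 * Hypothetical helper for illustration; not a mujs function. */
int utf8_encode(char *dst, uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {                /* 1 byte: plain ASCII */
        dst[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {               /* 2 bytes */
        dst[0] = (char)(0xC0 | (cp >> 6));
        dst[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {             /* 3 bytes: rest of the BMP */
        dst[0] = (char)(0xE0 | (cp >> 12));
        dst[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 4 bytes: everything above 16 bits */
    dst[0] = (char)(0xF0 | (cp >> 18));
    dst[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    dst[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    dst[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}
```

With this sketch, U+10005 comes out as the four bytes F0 90 80 85, which is the form that would be handed to js_pushstring() under the scheme ator describes.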
avih what happens with CP > BMP in source files/code?11:56.15 
ator SMP characters in UTF-8 source code will be converted to surrogate pairs11:56.37 
avih not fun11:57.02 
  the C client will still need to handle surrogates then, in addition to UTF8 as non-surrogates11:57.28 
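The surrogate handling a C client would need is small. A minimal sketch of the standard UTF-16 arithmetic follows; the function names `to_surrogates` and `from_surrogates` are made up for this illustration and do not come from mujs or from avih's patch.

```c
#include <assert.h>
#include <stdint.h>

/* Split a code point above the BMP (U+10000..U+10FFFF) into a
 * UTF-16 surrogate pair. Hypothetical helper for illustration. */
void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                              /* 20 bits remain */
    *hi = (uint16_t)(0xD800 + (cp >> 10));      /* top 10 bits */
    *lo = (uint16_t)(0xDC00 + (cp & 0x3FF));    /* bottom 10 bits */
}

/* Combine a high and a low surrogate back into one code point. */
uint32_t from_surrogates(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);
    return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}
```

For example, U+1F600 splits into the pair D83D DE00, and combining that pair gives U+1F600 back.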
  ator: i don't know why you keep messing with half solutions. make it proper. let the C api be UTF8 and handle everything which results inside mujs, IMO11:58.55 
  be graceful with invalid codepoints, but for valid ones just make them work. remove this burden from the user.11:59.33 
ator fine. then I'll be strict and only allow 16-bit characters as originally intended by ECMAScript.12:00.07 
avih my solution is external and complete. internally it will be a bit simpler, but there's no getting away from converting every C api string if required.12:00.37 
  you can create another version of all string APIs which will be used internally only, so it doesn't get tested for conversion, but user-facing APIs should just be UTF8.12:02.39 
ator I'm honestly more tempted to say "screw compatibility" and allow >16bit code points in the JS code12:03.21 
avih i think this would be better than half-baked solutions. not as good as both UTF8 API and 16 bits chars, but better than half-solutions IMO12:06.10 
ator the main problem with that is representing strings as ascii. there's no escape syntax for >16bit unicode characters.12:06.56 
  \x is 1-byte, and \u is 2-byte only12:07.09 
avih i don't follow. example?12:07.13 
  hmm12:07.16 
  isn't there \U ?12:07.24 
ator (of course, I could just not escape them)12:07.26 
avih so this is only for JSON?12:07.39 
ator JSON and console.log()/print()12:08.20 
avih iirc JSON allows both surrogate and \U<any-codepoint>. I don't know if the js JSON spec requires differently12:08.42 
  I was wrong. JSON allows either surrogates or unescaped utf812:09.54 
  https://en.wikipedia.org/wiki/JSON#Data_portability_issues12:10.19 
ator I'm leaning towards not escaping SMP characters12:10.39 
avih 15.12.1.1 says "SourceCharacter but not one of " or \ or U+0000 through U+001F"12:13.32 
  i think it could be interpreted that unescaped SMP is allowed?12:13.49 
  actually, any source utf8 except 0-31 seems to be allowed12:14.34 
ator well, the JSON.stringify algorithm actually only says to escape 0-31, not >12712:25.29 
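The escaping rule under discussion (escape the quote, the backslash and 0-31, pass every other byte through) can be sketched in a few lines of C. This is an illustration under simplifying assumptions, not the mujs implementation: `json_quote` is a hypothetical name, and every control character is written as \u00XX, whereas JSON.stringify uses short forms such as \n for some of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Quote a NUL-terminated UTF-8 string as a JSON string literal.
 * Only '"', '\\' and bytes below 0x20 are escaped; bytes >= 0x80
 * (including 4-byte SMP sequences) are copied through unescaped.
 * dst must have room for 6 * strlen(s) + 3 bytes.
 * Returns the length written, excluding the terminator.
 * Hypothetical helper for illustration; not a mujs function. */
size_t json_quote(char *dst, const char *s)
{
    static const char hex[] = "0123456789abcdef";
    size_t k = 0;

    assert(dst != NULL && s != NULL);
    dst[k++] = '"';
    for (; *s; ++s) {
        unsigned char c = (unsigned char)*s;
        if (c == '"' || c == '\\') {
            dst[k++] = '\\';
            dst[k++] = (char)c;
        } else if (c < 0x20) {
            /* control character: always use the \u00XX form here */
            dst[k++] = '\\';
            dst[k++] = 'u';
            dst[k++] = '0';
            dst[k++] = '0';
            dst[k++] = hex[c >> 4];
            dst[k++] = hex[c & 15];
        } else {
            dst[k++] = (char)c;     /* ASCII and UTF-8 bytes pass through */
        }
    }
    dst[k++] = '"';
    dst[k] = 0;
    return k;
}
```

The output stays valid JSON as long as the input is valid UTF-8, since JSON allows unescaped non-ASCII characters inside strings.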
  avih: okay, how about what's on tor/master now then?12:35.42 
avih ator: i assume only HEAD~1 and HEAD~2 ?12:37.26 
ator yes. feel free to look at the others too, but those are the ones I think you're most interested in.12:37.54 
avih ator: so the first commit only converts 0 in js_pushlstring into whatever else you encode it as?12:40.36 
ator no. it does not affect pushlstring. it affects how "foo\0bar" is parsed12:41.20 
avih in source code only?12:41.44 
ator yes.12:41.48 
avih hmm..12:41.52 
  well, i think it should be applied in js_pushlstring as well12:42.10 
  (not sure yet how to retrieve it. there's no js_tolstring...)12:42.37 
ator in other words, we represent embedded 0 in strings as <c080>. same as Java does in the .class bytecode and elsewhere.12:42.42 
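A minimal sketch of the <c080> representation, assuming only what is said above: an embedded 0 is stored as the two bytes C0 80 so the string can remain zero-terminated. The helper names `encode_nul` and `decode_nul` are hypothetical and are not taken from the mujs source.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n bytes from src to dst, writing each 0 byte as C0 80.
 * dst must have room for 2 * n + 1 bytes.
 * Returns the encoded length, excluding the terminator.
 * Hypothetical helper for illustration; not a mujs function. */
size_t encode_nul(char *dst, const char *src, size_t n)
{
    size_t i, k = 0;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; ++i) {
        if (src[i] == 0) {
            dst[k++] = (char)0xC0;
            dst[k++] = (char)0x80;
        } else {
            dst[k++] = src[i];
        }
    }
    dst[k] = 0;     /* the result is an ordinary C string */
    return k;
}

/* Reverse the encoding: every C0 80 pair becomes a 0 byte.
 * src is zero-terminated; dst needs at most as many bytes as src has.
 * Returns the number of bytes written to dst (no terminator added). */
size_t decode_nul(char *dst, const char *src)
{
    size_t k = 0;

    assert(dst != NULL && src != NULL);
    while (*src) {
        if ((unsigned char)src[0] == 0xC0 && (unsigned char)src[1] == 0x80) {
            dst[k++] = 0;
            src += 2;
        } else {
            dst[k++] = *src++;
        }
    }
    return k;
}
```

Because the decoder returns a length rather than relying on a terminator, the caller can recover the 12 bytes of "Hello\x00World!" from its 13-byte encoded form.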
  js_pushlstring is to push a non-zero-terminated string12:43.12 
avih exactly, because it may include embedded 0 which you don't want to terminate it12:43.39 
ator making it a public function was a mistake12:43.45 
  no, because you're pushing a slice of another string as a new string12:43.56 
  like when the regexp code finds a match12:44.07 
avih that may have been your intention, but generally non-0-terminated also means a binary blob12:44.26 
ator pushing strings with an embedded zero was never the intention :)12:44.31 
avih anyway, i think this is off topic. source files rarely have 0 in them, so only for source files i don't think you should push it.12:45.20 
  (i.e. just terminate the parsing on 0, or skip it, or whatever, it's not worth all this code IMO)12:45.59 
  (fwiw, i don't disagree that pushlstring should not handle embedded 0. if you want blob, use whatever which supports arbitrary sequences - which is not *tf8. it could be some array buffer or some encoding etc)12:47.44 
  "+If you have Javascript code that expects to work with UTF-16 surrogate pairs,12:49.24 
  +you will need to manually convert any extended characters to surrogate pairs12:49.24 
  +and back when passing strings between C and Javascript."12:49.24 
  i think that if you already broke it, at least make it complete - allow >16bit also from js, and provide these functions which convert to/from surrogate pairs12:49.57 
  (i can write them for you once you allow >16bit from inside js)12:50.31 
  however, my solution is still complete, does not break the standard, does not have perf implications for running js code, and has effectively negligible perf impact for C APIs.12:52.31 
  you keep trying to find half solutions, even with this.12:53.28 
  (i understand why.. but you can also choose to solve it correctly)12:54.34 
ator nobody wants the "correct" solution because it's a piece of crap and the world has moved on to UTF-8 and is not stuck in 1993 when windows NT came out with its 2-byte "unicode" support...12:57.23 
  nobody in their right mind wants to mess with surrogate pairs.12:57.35 
avih i agree, but it's still the correct js way12:58.03 
ator your patch should still work unaffected by what is on tor/master12:58.06 
avih i know. i was hoping to not need it.12:58.29 
ator but now, if someone wants to use mujs with utf-8, unrestricted by having to convert to surrogate pairs, they can. if they just don't know or don't care, their code will work.12:58.58 
avih regardless, if you allow SMP chars from inside js and provide the conversion function, i think it will be good enough12:59.00 
ator you talk as if you haven't actually looked at the code :)12:59.20 
avih i haven't12:59.29 
  iirc you said it won't work from inside js12:59.41 
ator here I am thinking you object to something in the code, when you're actually objecting to something that doesn't exist >.<13:02.47 
avih i wasn't objecting to anything. i said that if you support SMP chars in js, then you should also support it from inside js if it's not supported already.13:03.47 
  and the docs should provide the js conversion to/from surrogates, and i can write them for you once using SMP from inside js is supported.13:05.16 
  i.e. str[1]=0x10005 should work13:05.54 
  or rather FromCharCode(0x10005) should work13:07.02 
ator the current patch supports them inside and outside of js. String.fromCharCode and String.prototype.charCodeAt() are not limited to 16-bits (as per the commit message)13:09.25 
avih great. then it should be good enough. i can write the to/from surrogate pairs conversion if you want to add them to the docs. i expect ~5-10 LOC each13:11.17 
  (probably more like 5)13:13.11 
  i still think supporting embedded 0 in source files is a waste of code. but up to you.13:13.47 
ator avih: again you haven't read the source. I don't support embedded 0 in the source code *file* ... I'm supporting it in strings, like this: var myZeroString = "Hello\x00World!";13:14.46 
  or \u0000. not a literal 0-value byte in the source text.13:15.47 
avih yeah, i've looked at it but didn't follow it, which is why i asked what it does, and from your replies i thought it only applies to source files. i probably understood wrong.13:16.00 
  anyway. i don't have a strong opinion about the embedded 0 and i think the other patch to support SMP chars is good enough according to what you said.13:17.10 
ator fair enough. it's hard to convey the meaning in english :)13:17.26 
  language is too ambiguous.13:17.33 
avih sorry, it was probably me not reading carefully enough. shit happens :)13:18.50 