| | 20141107 |
rayjj | For the logs: I replied to Saito-san about halftones (dither patterns) with Ghostscript. Hopefully, they will come back with a description of what they actually want and let us help them. | 06:21.28 |
| what a pain. Have to do the health forms all over again :-( Good thing I saved a copy | 06:23.05 |
avih | tor, at the docs for js_newcfunction, it says "The length argument is the number of arguments to the function. If the function is called with fewer arguments, the argument list will be padded with undefined." should probably start with ".. is the number of arguments the function assumes to exist on its stack." and possibly add "The function can check the actual number of arguments on its stack using js_gettop(J) - 1. This number will never be smaller than | 10:21.44 |
| length." | 10:21.44 |
tor8 | avih: I've figured out why array.pop() is so slow | 14:25.11 |
avih | O(n)? | 14:25.34 |
| but at the worst case for AA tree? | 14:25.44 |
| bug* | 14:25.47 |
tor8 | O(n*n) | 14:25.49 |
avih | heh | 14:25.52 |
tor8 | it's the handling of the "magic" length property | 14:26.08 |
avih | i guess that should give some nice boost to any js code which deals with big objects. | 14:26.27 |
tor8 | for(i=0;i<1000;++i) --x.length will have the same slowdown | 14:26.56 |
avih | hmm.. | 14:27.21 |
tor8 | setting an array length needs to delete all numeric properties >= the length | 14:27.40 |
avih | yup | 14:27.47 |
tor8 | the culprit is the jsV_resizearray function | 14:28.24 |
avih | wait, but iirc at jsarray.c you manually delete elements past the new length.. don't you? | 14:28.26 |
tor8 | avih: yeah. I can fix the Ap_pop function to shortcut that | 14:28.43 |
avih | so that's redundant? | 14:28.46 |
tor8 | but then the --x.length case will still be slow | 14:28.49 |
avih | yeah, because it implies pop, just without the return value | 14:29.09 |
tor8 | yeah. and x.length = y needs to create an iterator to loop through and delete all the properties | 14:29.39 |
| and creating an iterator is a bit slow | 14:29.45 |
| not to mention looping through it | 14:29.51 |
avih | splice as well iirc, though maybe not after the latest splice fix | 14:30.00 |
| yeah | 14:30.18 |
tor8 | anywhere js_setlength is called, really | 14:30.33 |
| it's a bit of a pickle doing this well | 14:31.29 |
avih | so you always use this generic object structure for arrays? iirc there's some CARRAY stuff there, i initially assumed that's for array of objects. though i didn't try to follow the semantics there | 14:31.33 |
tor8 | I can of course loop through the actual numeric indices rather than use the enumerator | 14:31.47 |
| but then it will behave badly for sparse arrays | 14:31.55 |
| avih: the CARRAY stuff is just to trigger the automatic length handling | 14:32.16 |
avih | n log n is better than n^2 ;) | 14:32.20 |
tor8 | underneath that is just a regular object | 14:32.23 |
avih | though n log n for pop sounds nasty, or for any change of length | 14:32.49 |
| so, you got a nice approach to fix this? | 14:33.19 |
tor8 | for compact arrays, looping over the actual affected indexes (which is just the delta between the new and old length) will be well behaved | 14:33.22 |
| but for a sparse array, it could get nasty fast | 14:33.35 |
| x=[]; x.length = 1000000; x.length = 1 | 14:33.46 |
avih | yeah, that's sparse, but 1 -> 1000 ar[i]=x isn't | 14:34.08 |
| or shouldn't. | 14:34.24 |
tor8 | each individual pop would be O(log n) | 14:35.03 |
avih | yeah, that's the ideal case with aa | 14:35.18 |
tor8 | which is as good as it's going to get here | 14:36.02 |
avih | did you think of representing arrays, up to some limit sparseness factor, as actual arrays? | 14:36.07 |
| i.e. c arrays | 14:36.25 |
| tracking the sparse factor is O(1) | 14:36.54 |
tor8 | yes, I have thought about it. shouldn't be too hard, but it does break the simplicity of everything handled the same way, adding a lot of special cases. | 14:37.08 |
| although we already have a lot of special cases for CARRAY etc | 14:37.36 |
avih | yeah. though tbh, o(log n) isn't bad, once it works | 14:38.06 |
| it's probably best to get a reasonably stable implementation using the current generic object approach, and then start optimizing. | 14:39.05 |
| as it stands, i think it's not mature enough, or at least not tested enough, to start optimizing this stuff | 14:39.23 |
tor8 | yeah. I'm going to need to track the actual number of properties vs length in order to deal gracefully with the sparse array case | 14:39.25 |
avih | yes, though that's o(1) | 14:39.57 |
| (when i approached mujs i thought it's more mature/tested than it was. i still like this project a lot, and the response is great, and the code is nice and solid, so i'm sticking around ;) ) | 14:41.28 |
tor8 | avih: I appreciate you sticking around and pointing out these oddities that still need to be ironed out | 14:42.14 |
| as you've noticed, it doesn't take long to fix things once they've been spotted | 14:42.46 |
avih | indeed, which is great | 14:43.06 |
| tor8: btw, i noticed it was originally copyright-you, then changed to agpl. what's the history of mujs? | 14:44.27 |
| was it a sort of private project which you added to MuPDF later? | 14:45.11 |
tor8 | it started as a weekend project, then it became useful for MuPDF and I started spending work time on it | 14:45.13 |
| yes. | 14:45.39 |
avih | yeah, that's what i thought. well done. just straighten up the kinks, and it's gonna be amazing :) | 14:45.49 |
| btw, what did you think of the cygwin comment? the fix is real small and you get another target platform | 14:46.55 |
nsz | btw there are very slow array tests in the official ecma test suite | 14:47.09 |
| https://github.com/tc39/test262/blob/master/test/suite/ch15/15.4/15.4.4/15.4.4.15/15.4.4.15-5-12.js | 14:47.20 |
avih | nsz: did you try running the suite on mujs? | 14:47.34 |
nsz | eg this one does a[2^32 - 2] = null; | 14:47.34 |
| yes | 14:47.39 |
avih | how does it do? | 14:47.45 |
nsz | lot of bugs | 14:47.52 |
avih | heh | 14:47.55 |
| crashes? | 14:47.58 |
nsz | i have a list of failed tests if you are interested but probably easy to fix | 14:48.04 |
| i started with the hardest one: floating-point printing and parsing | 14:48.17 |
avih | i'm mostly a user of mujs, but the thought of running the test suite did cross my mind when i became aware of the fact that it's not as mature as i thought it was. | 14:48.50 |
nsz | i have correct floating-point printing code now | 14:48.51 |
avih | cool | 14:49.15 |
| (i'm adding js scripting support to mpv as a hobby project) | 14:49.39 |
nsz | http://port70.net/~nsz/failed.tar.gz | 14:51.10 |
| if you uncompress this into the mujs repo then you can easily run the relevant subset of tests | 14:51.39 |
avih | test.log only for the results, yes? | 14:52.03 |
nsz | ah yes | 14:52.25 |
| that was the result the last time i ran the suite | 14:52.56 |
| running the whole suite takes a lot of time that's why i collected the failed ones | 14:53.18 |
avih | oh, but these are only the failures, right? what's the percentage of passes? (even if percentage doesn't mean much.. still.. it's a figure ;) ) | 14:53.23 |
nsz | i dont remember and dont have that code here | 14:53.50 |
avih | yeah, well, tor found out that decreasing the length of an array is O(N^2), so hopefully making it o(log n) should make things a bit faster ;) | 14:54.22 |
nsz | these conformance tests often test useless corner cases | 14:54.27 |
avih | indeed | 14:54.45 |
| i spent considerable amount of time with the conformance tests of my Promise implementation. | 14:55.03 |
tor8 | avih: changing the length of an array was O(n) | 14:55.31 |
avih | oh.. | 14:55.48 |
tor8 | avih: now it's going to be anywhere between O(delta-n) and O(n) | 14:55.56 |
avih | delta? | 14:56.09 |
tor8 | if it's sparse, O(n), if it's not sparse O(log n) * (old length - new length) | 14:56.36 |
avih | but O(log n) on average, even for the case of loop 1000 pop? | 14:56.38 |
tor8 | so pop() will be O(log n) | 14:56.48 |
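The trade-off discussed above (walk the affected indices for compact arrays, enumerate the stored properties for sparse ones) can be sketched in C. This is only an illustration under stated assumptions: `array_stats`, `truncate_plan` and `plan_truncate` are hypothetical names, and the decision rule (compare the length delta with the stored property count) is an assumption, not MuJS's actual code.

```c
#include <stddef.h>

/* Hypothetical bookkeeping for an array-like object (not MuJS's real layout). */
struct array_stats {
    size_t length; /* value of the "magic" length property */
    size_t count;  /* number of numeric properties actually stored */
};

enum truncate_plan {
    DELETE_BY_INDEX,       /* look up each index in [new_length, length) */
    DELETE_BY_ENUMERATION  /* iterate over the stored properties instead */
};

/* Pick the cheaper way to delete all numeric properties >= new_length.
 * Assumed rule: walking indices costs about `delta` lookups, enumerating
 * costs about `count` visits, so choose whichever is smaller. */
enum truncate_plan plan_truncate(const struct array_stats *a, size_t new_length)
{
    size_t delta;

    if (new_length >= a->length)
        return DELETE_BY_INDEX; /* growing or unchanged: nothing to delete */

    delta = a->length - new_length;
    return delta <= a->count ? DELETE_BY_INDEX : DELETE_BY_ENUMERATION;
}
```

With this rule a pop() on a compact array (delta of 1) takes the per-index path, while `x=[]; x.length = 1000000; x.length = 1` takes the enumeration path because no properties are stored. Keeping `count` up to date is a constant-time update on each insert or delete.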
nsz | btw my mujs hack to unscramble youtube direct streaming urls worked fine | 14:57.57 |
| however youtube html5player.js is about 1M | 14:58.09 |
| to parse this in and run the appropriate unscrambling function in it takes >400ms with mujs | 14:59.14 |
| (and about 150ms in nodejs from which 40-50ms is the startup time) | 14:59.39 |
avih | tor8: if the elements were ordered at the tree by the array index, then truncating an array would only be O(delta) | 14:59.40 |
| or delta log n | 15:00.11 |
nsz | (i dont understand how ppl are not fed up with the web: 100ms is a huge latency on a ui) | 15:00.22 |
avih | nsz: indeed it is. | 15:00.38 |
| or at least should be considered big enough to be uncomfortable | 15:00.57 |
nsz | anyway i can play youtube videos now ..with mpv :) | 15:00.58 |
avih | :) | 15:01.07 |
nsz | it just takes a bit of time to do the parsing | 15:01.31 |
avih | you parse the google js code in js? | 15:01.43 |
nsz | but it is still much less than the time you wait in a browser | 15:01.44 |
| i parse js with mujs | 15:02.18 |
| and eval one function with a short string argument | 15:02.39 |
avih | and look for the stream url? | 15:02.43 |
nsz | stream url is in the html, but for some videos it is 'obfuscated' | 15:03.05 |
| and the deobfuscator is in js in html5player.js | 15:03.20 |
avih | so that's what you parse for? | 15:03.21 |
nsz | yes | 15:03.26 |
avih | k | 15:03.28 |
| so how does it work? you input a youtube url into your tool, and it "imports" the video? | 15:04.02 |
tor8 | nsz: glad you finally got it working! | 15:04.06 |
nsz | avih: i have a yurl tool that takes a youtube url or video id (or even playlist id used to work) and then tries its best to download the webpage, parse all sorts of relevant info from it, download the js if needed and then output the results in a nice format | 15:05.14 |
| and then i have yget and yplay that use this yurl tool | 15:05.32 |
| (they can select stream format etc) | 15:05.42 |
avih | nsz: and the result being, stream[s] url? | 15:05.44 |
nsz | and the way i use it is from w3m browser | 15:05.52 |
| i have a key if i press it on a youtube link then the video starts playing in mpv | 15:06.17 |
avih | is unfamiliar with w3m.. he thinks | 15:06.22 |
nsz | w3m is console browser | 15:06.33 |
avih | oh | 15:06.41 |
| and why do you use a console browser? :) | 15:06.51 |
| there are some nicely modern UI based browsers these days, you know ;) | 15:07.13 |
nsz | gui browsers are too much pain when you are programming in terminal | 15:07.55 |
| *terminals | 15:07.59 |
avih | and the video? ascii art? :) | 15:08.36 |
nsz | no, i run x | 15:08.59 |
| just with many terminals | 15:09.09 |
avih | dunno.. never felt comfortable with text browsers. and i've been playing with gopher before there was mosaic :) | 15:09.40 |
tor8 | avih: delete performance fix has been pushed. I hope that solves the performance hits you've been encountering. | 15:09.52 |
| avih: w3m is a usable lynx :) | 15:10.24 |
avih | tor8: noticing. my code doesn't have such long arrays, but i was testing it anyway... :) | 15:10.38 |
tor8 | avih: no, it was a good spot | 15:10.47 |
| and well worth fixing | 15:10.50 |
avih | absolutely :) | 15:10.58 |
nsz | avih: you cannot use a gui browser without a pointer device and they are incredibly slow to start up | 15:11.16 |
tor8 | avih: nsz: I'm not super stressed out about passing all the ecma test suite tests; a lot of them are silly and many of the test suites have outright bugs | 15:11.37 |
| in that they don't test what they claim to be testing | 15:11.44 |
avih | nsz: indeed. though there are alternative navigation methods | 15:11.48 |
tor8 | but if you spot any actual crashes, I'm more concerned | 15:12.03 |
nsz | ok | 15:12.15 |
avih | tor8: crashes, or clearly incorrect behavior, i hope ;) | 15:12.33 |
tor8 | I will eventually get around to passing the tests, it's not that I don't care about them it's just I've got a lot on my plate and I have to prioritize | 15:12.34 |
| avih: yes. | 15:12.39 |
| anything that makes it actually unusable :) | 15:12.54 |
avih | sure, i understand. and it is making very nice progress. just need some more users to use it for some heavier duty stuff.. and your bugmail will grow quickly ;) | 15:13.29 |
tor8 | avih: yeah. you always need users for a project to mature. | 15:14.05 |
| garbage collecting strings is probably the highest priority TODO item on MuJS at the moment | 15:14.31 |
| but that may have to wait a few weeks while I'm focusing on the EPUB module of MuPDF | 15:14.54 |
avih | it does seem considerably solid though, considering the very few users it has... | 15:14.56 |
| GC for API strings? or internal? | 15:15.25 |
tor8 | all strings | 15:15.46 |
| the lifetime will be as we discussed earlier, and the external API shouldn't change | 15:15.59 |
avih | so currently you don't GC strings at all? | 15:16.50 |
| yeah, lifetime == as long as on stack sounds perfectly reasonable. and decent too. | 15:17.22 |
| w00t! 1000x pop instantly! :) | 15:38.12 |
| not even twice as slow as push... : | 15:38.49 |
| [test] 1000x push: 2.2 ms | 15:38.50 |
| [test] 1000x pop: 2.0 ms | 15:38.50 |
| 1000x pop used to be ~900ms | 15:39.22 |
tor8 | avih: yeah. picking a better algorithm helps. :) | 15:39.46 |
avih | :p | 15:39.52 |
| tor8: btw, i recall an issue on github where someone said that Date.now() doesn't change after a very long loop. i bumped into this issue too, but since i'm embedded, i exposed another get_time function, but Date.now() should probably be fixed | 15:41.00 |
| or that it does change but in 1s multiples | 15:41.19 |
| also, if you want users, i think you should keep the issues section open. take the burden on yourself to duplicate the bugs into ghostscript.com and post a url to the ghostscript bug at the github page imo | 15:45.09 |
| people will post bugs to github quickly, but an order of magnitude less will create an account to file at ghostscript | 15:46.05 |
nsz | github? | 15:47.23 |
avih | https://github.com/ccxvii/mujs | 15:47.50 |
nsz | i dont think mujs is hosted on github | 15:47.55 |
avih | tor updates it frequently after updates to master | 15:48.14 |
nsz | i see | 15:48.19 |
avih | though still manually | 15:48.28 |
nsz | well github is useful for mirroring | 15:48.32 |
avih | yes | 15:48.38 |
| and bugs | 15:48.43 |
nsz | i dont know about bugs.. or any service that needs a github account | 15:50.07 |
| but maybe js ppl use github a lot | 15:50.18 |
avih | dunno, i don't use gh a lot, but it does have many many more registered users than bugzilla at ghostscript ;) | 15:51.17 |
chrisl | Well, after an immense amount of faffing about, I finally have two instrument based Visual Studio profiles from the 8.71+ and 9.06+ versions of cust 532's simulator. And the results are..... complete garbage, nonsense of highest order. What a crock of sh*t :-( | 16:09.36 |
Robin_Watts | instrumented profiles are rarely useful IME. | 16:10.47 |
chrisl | Well, sampled ones are completely useless | 16:11.38 |
| Or, I should say, sampled profiles are largely useless when profiling an interpreter | 16:12.56 |
| But sampling also falls apart when an application uses threads | 16:13.30 |
| I would, however, expect that at least the call counts would be meaningful in an instrumented profile | 16:14.07 |
Robin_Watts | sampling with threads falls apart, (in the absence of integration with the threading system) | 16:15.44 |
chrisl | With Ghostscript, especially doing PDF, I find sampled profiling alone isn't as helpful because it doesn't tell you whether the function is taking more time, or is being called more often than before | 16:17.42 |
| Personally, I find instrumented profiles give me a better hint where to look, but that's probably because I'm used to reading them. But in this case, the output is genuinely nonsensical | 16:21.22 |
| Also, the fact that VS takes over 25 minutes to "analyze" the profile so you can actually view it is just totally insane........ | 16:23.38 |
Diemex | Can I detect if I'm dealing with a scanned/picture only pdf? That would be great so I don't have to redraw when zooming in. I'm using mupdf on android | 18:45.29 |
Robin_Watts | Diemex: In what way wouldn't you have to redraw? | 18:50.49 |
Diemex | Well if I know that the page is scanned I can render it once at the resolution of the scan. When I zoom in using pinch to zoom I would usually redraw the page at a higher resolution so text becomes crisp. There is no point in redrawing if the page is a picture. It would just waste cpu and more memory for the larger picture size. | 18:52.48 |
Robin_Watts | Diemex: So you're doing your own blitting of the MuPDF rendered image? | 18:56.32 |
| In the current MuPDF android app, the bitmap is drawn at the requested resolution, and we scale any images to that resolution. | 18:57.07 |
| When you zoom, we redraw the page at a higher res, causing images to be rescaled. | 18:57.24 |
| To do what you suggest would mean you replacing part of that pipeline. | 18:57.37 |
Diemex | Well I use mupdf in my own app and am looking for a function like isScannedPdf(). If it's scanned I wont scale it. What is "blitting"? | 19:00.12 |
Robin_Watts | blitting: I'm informally using to mean "taking a prerendered bitmap and shoving it on the screen" | 19:00.55 |
| Diemex: OK, so there is no isScannedPdf function. | 19:01.13 |
| But it would be relatively easy to write one. | 19:01.22 |
| So you speak C ? | 19:01.44 |
| Do you speak C ? (sorry) | 19:02.00 |
Diemex | My c is very bad. It's like "You say I say" | 19:02.42 |
tor5 | Robin_Watts: didn't we have code somewhere that detected an image-only page by looking at the display list? | 19:03.03 |
Robin_Watts | tor5: I was going to suggest writing a device that looked at the calls to see if it was only images. | 19:03.24 |
Diemex | I was thinking that one could look at the list of "pdf objects" if it's only images -> scanned | 19:04.19 |
Robin_Watts | Diemex: Right. The easiest way to do that is to run the page through a custom device, and look at the page objects you are called with. | 19:05.18 |
| And to avoid processing the page twice, you'd use a display list. | 19:05.35 |
| But bear in mind that not all the images on the page might be the same resolution, so you'd need to figure out the max res. | 19:05.54 |
Diemex | It would be good to get a "bounding box" around all the images. Basically the size of the page in pixels. Then render the page in that resolution. | 19:07.08 |
Robin_Watts | Diemex: I disagree with that way of characterising it. | 19:08.38 |
| PDF pages have a well defined physical size. | 19:09.01 |
| So for each image you'd look at how it was scaled onto the page and calculate a dpi for it. | 19:09.36 |
| You find the maximum dpi of any image on the screen. | 19:09.54 |
| Then you can figure out how big the page would be when rendered at that dpi. | 19:10.05 |
| And that's what you render. | 19:10.13 |
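The calculation described above can be sketched in C as follows. The names `placed_image`, `max_image_dpi` and `page_pixels` are hypothetical and not part of MuPDF. The sketch assumes you already know how many points each image covers on the page (for an unrotated placement that would come from the scale terms of its transform; a rotated placement would need the lengths of the transformed edges instead).

```c
/* One image as placed on the page: pixel size and the size it covers
 * on the page in points (1 point = 1/72 inch). Hypothetical struct. */
struct placed_image {
    int w_px, h_px;
    float w_pt, h_pt;
};

/* Resolution in dots per inch of `px` pixels spread over `pt` points. */
static float dpi_of(int px, float pt)
{
    return pt > 0 ? px * 72.0f / pt : 0;
}

/* Highest resolution (either axis) of any image on the page. */
float max_image_dpi(const struct placed_image *imgs, int n)
{
    float best = 0;
    int i;

    for (i = 0; i < n; i++) {
        float x = dpi_of(imgs[i].w_px, imgs[i].w_pt);
        float y = dpi_of(imgs[i].h_px, imgs[i].h_pt);
        if (x > best) best = x;
        if (y > best) best = y;
    }
    return best;
}

/* Size in pixels of a page dimension given in points, rendered at `dpi`,
 * rounded to the nearest pixel. */
int page_pixels(float page_pt, float dpi)
{
    return (int)(page_pt * dpi / 72.0f + 0.5f);
}
```

For example, a 2550 x 3300 pixel scan covering a 612 x 792 point page works out to 300 dpi on both axes, so the page would be rendered 2550 pixels wide.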
Diemex | That sounds better. Now I need to figure out where to look and add the functionality. | 19:14.14 |
Robin_Watts | Diemex: It will require you to write a new fz_device in C. | 19:16.11 |
| Look at source/fitz/test-device.c for an example. | 19:16.53 |
Diemex | Are there any docs on what every class/file does? What is a fz_device? | 19:19.43 |
Robin_Watts | Diemex: The only docs are the source. The header files are fairly well commented. | 19:21.07 |
| Diemex: Basically when MuPDF interprets a PDF file, it breaks the page down into a series of drawing operations. | 19:21.40 |
| For each of these drawing operations, it calls a function in an fz_device structure (a method of an fz_device class, if you prefer OO terms) | 19:22.17 |
| We have devices that render these to screen. | 19:22.33 |
| Or store them into a display list to be played back later. | 19:22.43 |
| Or that extract text from the objects. | 19:22.53 |
| Or that work out if a page is black and white or color. | 19:23.02 |
| etc, | 19:23.04 |
| You're writing another such 'consumer of page objects', right? | 19:23.15 |
Diemex | Okay: So I listen to the draw calls. If a draw call for something other than an image is called it is not scanned. I could set a bool variable to false and would have my isScanned() function | 19:25.12 |
| I'm a bit confused. C is not oo, right? How would I add a global bool variable to the device? | 19:26.16 |
Robin_Watts | Indeed, C is not OO. | 19:28.22 |
| But you can use C in an OO style. | 19:28.28 |
| You don't have classes and members natively in C. | 19:28.44 |
| but we achieve the same result using structures with function pointers in. | 19:28.56 |
| If you look at test-device.c you'll see a device that looks to see if all the incoming drawing requests are greyscale or color. | 19:29.34 |
| That should give you a good starting point. | 19:29.42 |
Diemex | This is function pointer ? | 19:51.42 |
| void (*fill_image)(fz_device *, fz_image *img, const fz_matrix *ctm, float alpha); | 19:51.43 |
| type void?? | 19:51.52 |
Robin_Watts | It's a pointer to a function. | 19:54.50 |
| The function takes the list of arguments on the right, and returns void. | 19:55.04 |
Diemex | Oh I have an idea. If I wanted to be super memory efficient I could detect if a page is greyscale. Then I would only need one byte per pixel instead of 4 byte. Sadly android only has a bitmap that saves a translucency. | 19:59.31 |
| I could render a white background and then render a completely black picture above | 20:00.30 |
| The rendered picture with the translucency values could be used as a mask for the black overlay | 20:00.55 |
| Robin_Watts: If I want to save a variable in my device I have to do it in a struct right? Or where do I define "member variables" | 20:13.13 |
Robin_Watts | Diemex: In the user data in the struct. | 20:20.12 |
| which can be a pointer to another struct if you need more than one. | 20:20.36 |
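The "struct with function pointers plus user data" pattern described above can be sketched in C like this. It is a simplified stand-in, not the real `fz_device`: `sketch_device`, `scan_state` and the callback signatures are hypothetical, and the real structure has many more callbacks with different arguments (see source/fitz/test-device.c for the actual shape).

```c
#include <stddef.h>

/* Hypothetical, cut-down device: a struct of function pointers ("methods")
 * plus a user pointer that carries the "member variables". */
struct sketch_device {
    void *user;
    void (*fill_path)(struct sketch_device *dev);
    void (*fill_text)(struct sketch_device *dev);
    void (*fill_image)(struct sketch_device *dev, int w, int h);
};

/* State kept for one page; one device (and one state) is used per page. */
struct scan_state {
    int saw_image; /* at least one image was drawn */
    int saw_other; /* anything scalable (paths, text) was drawn */
};

static void scan_fill_path(struct sketch_device *dev)
{
    ((struct scan_state *)dev->user)->saw_other = 1;
}

static void scan_fill_text(struct sketch_device *dev)
{
    ((struct scan_state *)dev->user)->saw_other = 1;
}

static void scan_fill_image(struct sketch_device *dev, int w, int h)
{
    (void)w; /* sizes unused here; a fuller version could track resolution */
    (void)h;
    ((struct scan_state *)dev->user)->saw_image = 1;
}

/* Wire the callbacks and the per-page state into the device. */
void scan_device_init(struct sketch_device *dev, struct scan_state *state)
{
    state->saw_image = 0;
    state->saw_other = 0;
    dev->user = state;
    dev->fill_path = scan_fill_path;
    dev->fill_text = scan_fill_text;
    dev->fill_image = scan_fill_image;
}

/* A page counts as "scanned" here if it drew images and nothing else. */
int is_scanned(const struct scan_state *state)
{
    return state->saw_image && !state->saw_other;
}
```

The interpreter would call `dev->fill_image(dev, ...)` and friends while running the page; afterwards the caller reads the answer out of the `scan_state` it passed in. Because the state hangs off the `user` pointer, nothing needs to be global.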
Diemex | is a device per page or per pdf file? | 20:22.36 |
Robin_Watts | per page. | 20:23.41 |
Diemex | What determines the speed of rendering a pdf? I noticed that there is a big difference in rendering time ranging from about 100ms to 1000ms per page. 100ms for a text only pdf and 1000ms for a scanned pdf, which is the reason why I want to not have to redraw scanned pdfs. Pdfs from publishers seem to be rather fast. Pdfs that have been autogenerated by ms office or similar seem to be generally slower | 20:30.06 |
Robin_Watts | Diemex: The delay is almost certainly the time taken to smooth scale the decoded image to the screen. | 20:30.52 |
| text only PDFs don't involve bitmap scaling. | 20:31.05 |
Diemex | If I would render scanned pdfs at their native resolution without scaling it would be faster? | 20:31.37 |
Robin_Watts | You'd avoid the scaling stage there yes. | 20:31.56 |
| but you'd then have to scale the native res thing onto the screen. | 20:32.05 |
| which would be just as slow. | 20:32.10 |
Diemex | scaling with a canvas is pretty quick | 20:32.43 |
| So I'm trying to determine which function calls would make it a scanned pdf: If one of these is called it has scalable content: fill_path, stroke_path, clip_path, clip_stroke_path, fill_text, stroke_text, clip_text, clip_stroke_text If one of these is called it has an image: fill_image_mask, clip_image_mask, fill_image what about these: ignore_text, fill_shade, pop_clip, begin_mask, end_mask, begin_group, end_group, begin_t | 20:39.02 |