IRC Logs

Log of #ghostscript at irc.freenode.net.

2014/11/07
rayjj For the logs: I replied to Saito-san about halftones (dither patterns) with Ghostscript. Hopefully, they will come back with a description of what they actually want and let us help them.06:21.28 
  what a pain. Have to do the health forms all over again :-( Good thing I saved a copy06:23.05 
avih tor, in the docs for js_newcfunction, it says "The length argument is the number of arguments to the function. If the function is called with fewer arguments, the argument list will be padded with undefined." It should probably start with "... is the number of arguments the function can assume exist on its stack." and possibly add "The function can check the actual number of arguments on its stack using js_gettop(J) - 1. This number will never be smaller than length."10:21.44
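To make the proposed wording concrete, here is a self-contained toy model of the calling convention described above. It is not mujs code: `toy_stack`, `toy_call`, `toy_nargs` and `UNDEF` are hypothetical. Slot 0 holds `this`, the arguments follow, and the frame is padded with undefined up to the declared length, so the count derived from the top of the stack is never smaller than `length`.

```c
#define UNDEF (-1)      /* hypothetical stand-in for the JS undefined value */
#define MAX_SLOTS 16

/* Toy model of a callee's stack frame: slot 0 is `this`, 1..n are arguments. */
struct toy_stack {
    int slot[MAX_SLOTS];
    int top;            /* number of occupied slots, playing the role of js_gettop(J) */
};

/* Build the frame for a call with `argc` actual arguments to a function
 * declared with `length` parameters, padding missing ones with UNDEF.
 * Assumes argc + 1 and length + 1 both fit in MAX_SLOTS. */
static void toy_call(struct toy_stack *s, int this_val,
                     const int *args, int argc, int length)
{
    int i;
    s->top = 0;
    s->slot[s->top++] = this_val;
    for (i = 0; i < argc; ++i)
        s->slot[s->top++] = args[i];
    while (s->top - 1 < length)         /* pad up to the declared length */
        s->slot[s->top++] = UNDEF;
}

/* Arguments visible to the callee: top minus the `this` slot. */
static int toy_nargs(const struct toy_stack *s)
{
    return s->top - 1;
}
```

Calling with one argument and a declared length of 3 yields three visible arguments, the last two undefined; calling with more arguments than the declared length keeps all of them, so the count can exceed `length` but never fall below it.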
tor8 avih: I've figured out why array.pop() is so slow14:25.11 
avih O(n)?14:25.34 
  bug at the worst case for AA tree?14:25.44
tor8 O(n*n)14:25.49 
avih heh14:25.52 
tor8 it's the handling of the "magic" length property14:26.08 
avih i guess that should give some nice boost to any js code which deals with big objects.14:26.27 
tor8 for(i=0;i<1000;++i) --x.length will have the same slowdown14:26.56 
avih hmm..14:27.21 
tor8 setting an array length needs to delete all numeric properties >= the length14:27.40 
avih yup14:27.47 
tor8 the culprit is the jsV_resizearray function14:28.24 
avih wait, but iirc at jsarray.c you manually delete elements past the new length.. don't you?14:28.26 
tor8 avih: yeah. I can fix the Ap_pop function to shortcut that14:28.43 
avih so that's redundant?14:28.46 
tor8 but then the --x.length case will still be slow14:28.49 
avih yeah, because it implies pop, just without the return value14:29.09 
tor8 yeah. and x.length = y needs to create an iterator to loop through and delete all the properties14:29.39 
  and creating an iterator is a bit slow14:29.45 
  not to mention looping through it14:29.51 
avih splice as well iirc, though maybe not after the latest splice fix14:30.00 
  yeah14:30.18 
tor8 anywhere js_setlength is called, really14:30.33 
  it's a bit of a pickle doing this well14:31.29 
avih so you always use this generic object structure for arrays? iirc there's some CARRAY stuff there, i initially assumed that's for array of objects. though i didn't try to follow the semantics there14:31.33
tor8 I can of course loop through the actual numeric indices rather than use the enumerator14:31.47 
  but then it will behave badly for sparse arrays14:31.55 
  avih: the CARRAY stuff is just to trigger the automatic length handling14:32.16 
avih n log n is better than n^2 ;)14:32.20
tor8 underneath that is just a regular object14:32.23 
avih though n log n for pop sounds nasty, or for any change of length14:32.49 
  so, you got a nice approach to fix this?14:33.19 
tor8 for compact arrays, looping over the actual affected indexes (which is just the delta between the new and old length) will be well behaved14:33.22 
  but for a sparse array, it could get nasty fast14:33.35 
  x=[]; x.length = 1000000; x.length = 114:33.46 
avih yeah, that's sparse, but 1 -> 1000 ar[i]=x isn't14:34.08 
  or shouldn't.14:34.24 
tor8 each individual pop would be O(log n)14:35.03 
avih yeah, that's the ideal case with aa14:35.18 
tor8 which is as good as it's going to get here14:36.02 
avih did you think of representing arrays, up to some limit sparsness factor, as actual arrays?14:36.07 
  i.e. c arrays14:36.25 
  tracking the sparse factor is O(1)14:36.54 
tor8 yes, I have thought about it. shouldn't be too hard, but it does break the simplicity of everything handled the same way, adding a lot of special cases.14:37.08 
  although we already have a lot of special cases for CARRAY etc14:37.36 
avih yeah. though tbh, o(log n) isn't bad, once it works14:38.06 
  it's probably best to get a reasonably stable implementation using the current generic object approach, and then start optimizing.14:39.05 
  as it stands, i think it's not mature enough, or at least not tested enough, to start optimizing this stuff14:39.23
tor8 yeah. I'm going to need to track the actual number of properties vs length in order to deal gracefully with the sparse array case14:39.25 
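tor8's plan (keep a count of the actual properties and compare it with the size of the length change) can be sketched as a self-contained toy. This is not the mujs implementation: `toy_array`, `toy_setlength` and the returned work counter are hypothetical, and a sorted C array stands in for the property tree. The point is the choice of loop: probe the (old length - new length) affected indices when that range is smaller than the number of properties, otherwise walk the properties, so `x.length = 1` after `x.length = 1000000` stays cheap.

```c
#include <string.h>

/* Toy sparse array: the indices that actually exist, kept sorted. */
struct toy_array {
    unsigned *idx;      /* present indices, ascending */
    unsigned count;     /* number of present indices, maintained in O(1) */
    unsigned length;    /* the JS-visible length */
};

/* Binary search: position of the first present index >= key. */
static unsigned lower_bound(const struct toy_array *a, unsigned key)
{
    unsigned lo = 0, hi = a->count;
    while (lo < hi) {
        unsigned mid = lo + (hi - lo) / 2;
        if (a->idx[mid] < key) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Remove one index if present (a tree delete would be O(log n) here). */
static void toy_delete(struct toy_array *a, unsigned key)
{
    unsigned pos = lower_bound(a, key);
    if (pos == a->count || a->idx[pos] != key)
        return;
    memmove(&a->idx[pos], &a->idx[pos + 1],
            (a->count - pos - 1) * sizeof *a->idx);
    a->count--;
}

/* Set the length; returns how many indices/properties were visited. */
static unsigned long toy_setlength(struct toy_array *a, unsigned newlen)
{
    unsigned long work = 0;
    if (newlen < a->length) {
        unsigned delta = a->length - newlen;
        if (delta <= a->count) {
            /* Compact enough: probe each affected index directly. */
            unsigned i;
            for (i = newlen; i < a->length; ++i, ++work)
                toy_delete(a, i);
        } else {
            /* Sparse: visit the existing properties, not the whole delta. */
            unsigned r, w = 0;
            for (r = 0; r < a->count; ++r, ++work)
                if (a->idx[r] < newlen)
                    a->idx[w++] = a->idx[r];
            a->count = w;
        }
    }
    a->length = newlen;
    return work;
}
```

A pop on a compact array visits a single index, while truncating a million-element sparse array with two properties visits two properties instead of 999999 indices.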
avih yes, though that's o(1)14:39.57 
  (when i approached mujs i thought it was more mature/tested than it was. i still like this project a lot, and the response is great, and the code is nice and solid, so i'm sticking around ;) )14:41.28
tor8 avih: I appreciate you sticking around and pointing out these oddities that still need to be ironed out14:42.14 
  as you've noticed, it doesn't take long to fix things once they've been spotted14:42.46 
avih indeed, which is great14:43.06 
  tor8: btw, i noticed it was originally copyright-you, then changed to agpl. what's the history of mujs?14:44.27 
  was it a sort of private project which you added to MuPDF later?14:45.11
tor8 it started as a weekend project, then it became useful for MuPDF and I started spending work time on it14:45.13
  yes.14:45.39 
avih yeah, that's what i thought. well done. just straighten up the kinks, and it's gonna be amazing :)14:45.49 
  btw, what did you think of the cygwin comment? the fix is real small and you get another target platform14:46.55 
nsz btw there are very slow array tests in the official ECMA test suite14:47.09
  https://github.com/tc39/test262/blob/master/test/suite/ch15/15.4/15.4.4/15.4.4.15/15.4.4.15-5-12.js14:47.20 
avih nsz: did you try running the suite on mujs?14:47.34 
nsz eg this one does a[2^32 - 2] = null;14:47.34 
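For reference, ECMA-262 5.1 (§15.4) defines an array index as a property name P for which ToString(ToUint32(P)) equals P and ToUint32(P) is not 2^32 - 1. So 2^32 - 2 is the largest index, and assigning to it forces the length to 2^32 - 1. A minimal C check under that definition; the function name is hypothetical, not taken from mujs:

```c
/* Returns 1 and stores the value if s is an ES5 array index: the canonical
 * decimal form (no sign, no leading zeros) of an integer in 0 .. 2^32 - 2. */
static int as_array_index(const char *s, unsigned long *out)
{
    unsigned long v = 0;
    if (s[0] == '\0' || (s[0] == '0' && s[1] != '\0'))
        return 0;                       /* empty, or a leading zero like "01" */
    for (; *s; ++s) {
        unsigned long d;
        if (*s < '0' || *s > '9')
            return 0;                   /* "-1", "1.5", "1e3", "length", ... */
        d = (unsigned long)(*s - '0');
        if (v > (4294967294UL - d) / 10)
            return 0;                   /* would exceed 2^32 - 2 */
        v = v * 10 + d;
    }
    *out = v;
    return 1;
}
```

"4294967294" is accepted while "4294967295" is an ordinary property name that does not affect `length`.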
  yes14:47.39 
avih how does it do?14:47.45 
nsz lot of bugs14:47.52 
avih heh14:47.55 
  crashes?14:47.58 
nsz i have a list of failed tests if you are interested but probably easy to fix14:48.04 
  i started with the hardest one: floating-point printing and parsing14:48.17 
avih i'm mostly a user of mujs, but the thought of running the test suite did cross my mind when i became aware of the fact that it's not as mature as i thought it was.14:48.50 
nsz i have correct floating-point printing code now14:48.51
avih cool14:49.15 
  (i'm adding js scripting support to mpv as a hobby project)14:49.39 
nsz http://port70.net/~nsz/failed.tar.gz14:51.10 
  if you uncompress this into the mujs repo then you can easily run the relevant subset of tests14:51.39 
avih test.log only for the results, yes?14:52.03 
nsz ah yes14:52.25 
  that was the result the last time i ran the suite14:52.56 
  running the whole suite takes a lot of time that's why i collected the failed ones14:53.18 
avih oh, but these are only the failures, right? what's the percentage of passes? (even if percentage doesn't mean much.. still.. it's a figure ;) )14:53.23 
nsz i dont remember and dont have that code here14:53.50 
avih yeah, well, tor found out that decreasing the length of an array is O(N^2), so hopefully making it o(log n) should make things a bit faster ;)14:54.22 
nsz these conformance tests often test useless corner cases14:54.27 
avih indeed14:54.45 
  i spent a considerable amount of time with the conformance tests of my Promise implementation.14:55.03
tor8 avih: changing the length of an array was O(n)14:55.31 
avih oh..14:55.48 
tor8 avih: now it's going to be anywhere between O(delta-n) and O(n)14:55.56 
avih delta?14:56.09 
tor8 if it's sparse, O(n), if it's not sparse O(log n) * (old length - new length)14:56.36 
avih but O(log n) on average, even for the case of loop 1000 pop?14:56.38
tor8 so pop() will be O(log n)14:56.48 
nsz btw my mujs hack to unscramble youtube direct streaming urls worked fine14:57.57 
  however youtube html5player.js is about 1M14:58.09 
  to parse this in and run the appropriate unscrambling function in it takes >400ms with mujs14:59.14 
  (and about 150ms in nodejs from which 40-50ms is the startup time)14:59.39 
avih tor8: if the elements were ordered in the tree by the array index, then truncating an array would only be O(delta)14:59.40
  or delta log n15:00.11 
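avih's point relies on the tree order matching numeric index order. Assuming the tree compares property names as plain strings (an assumption about mujs internals), "10" sorts before "9", so the entries to drop are not one contiguous run. A hypothetical comparator that puts canonical index names first, in numeric order, makes every index at or above the new length a contiguous tail of the numeric range. It relies on the fact that, for decimal strings without leading zeros, a shorter string is a smaller number:

```c
#include <string.h>

/* Nonzero if s is a canonical decimal index: digits only, no leading zero.
 * (Ignores the 2^32 - 2 upper bound for brevity.) */
static int is_canonical_digits(const char *s)
{
    if (s[0] == '\0' || (s[0] == '0' && s[1] != '\0'))
        return 0;
    for (; *s; ++s)
        if (*s < '0' || *s > '9')
            return 0;
    return 1;
}

/* Hypothetical property-name ordering: canonical numeric names first, in
 * numeric order (shorter string = smaller number, equal lengths compared
 * with strcmp), then all other names in plain strcmp order. */
static int key_compare(const char *a, const char *b)
{
    int na = is_canonical_digits(a), nb = is_canonical_digits(b);
    if (na != nb)
        return na ? -1 : 1;
    if (na) {
        size_t la = strlen(a), lb = strlen(b);
        if (la != lb)
            return la < lb ? -1 : 1;
    }
    return strcmp(a, b);
}
```

With this ordering a truncation can locate the first index >= the new length in O(log n) and remove the run after it, which is the O(delta log n) bound mentioned above.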
nsz (i dont understand how ppl are not fed up with the web: 100ms is a huge latency on a ui)15:00.22 
avih nsz: indeed it is.15:00.38 
  or at least should be considered big enough to be uncomfortable15:00.57 
nsz anyway i can play youtube videos now ..with mpv :)15:00.58 
avih :)15:01.07 
nsz it just takes a bit of time to do the parsing15:01.31 
avih you parse the google js code in js?15:01.43 
nsz but it is still much less than the time you wait in a browser15:01.44 
  i parse js with mujs15:02.18 
  and eval one function with a short string argument15:02.39 
avih and look for the stream url?15:02.43 
nsz stream url is in the html, but for some videos it is 'obfuscated'15:03.05 
  and the deobfuscator is in js in html5player.js15:03.20 
avih so that's what you parse for?15:03.21 
nsz yes15:03.26 
avih k15:03.28 
  so how does it work? you input a youtube url into your tool, and it "imports" the video?15:04.02 
tor8 nsz: glad you finally got it working!15:04.06 
nsz avih: i have a yurl tool that takes a youtube url or video id (or even playlist id used to work) and then tries its best to download the webpage, parse all sorts of relevant info from it, download the js if needed and then output the results in a nice format15:05.14 
  and then i have yget and yplay that use this yurl tool15:05.32 
  (they can select stream format etc)15:05.42 
avih nsz: and the result being, stream[s] url?15:05.44 
nsz and the way i use it is from w3m browser15:05.52 
  i have a key if i press it on a youtube link then the video starts playing in mpv15:06.17 
avih is unfamiliar with w3m.. he thinks15:06.22 
nsz w3m is console browser15:06.33 
avih oh15:06.41 
  and why do you use a console browser? :)15:06.51 
  there are some nicely modern UI based browsers these days, you know ;)15:07.13 
nsz gui browsers are too much pain when you are programming in terminals15:07.55
avih and the video? ascii art? :)15:08.36 
nsz no, i run x15:08.59 
  just with many terminals15:09.09 
avih dunno.. never felt comfortable with text browsers. and i've been playing with gopher before there was mosaic :)15:09.40 
tor8 avih: delete performance fix has been pushed. I hope that solves the performance hits you've been encountering.15:09.52 
  avih: w3m is a usable lynx :)15:10.24 
avih tor8: noticing. my code doesn't have such long arrays, but i was testing it anyway... :)15:10.38 
tor8 avih: no, it was a good spot15:10.47 
  and well worth fixing15:10.50 
avih absolutely :)15:10.58 
nsz avih: you cannot use a gui browser without a pointer device and they are incredibly slow to start up15:11.16 
tor8 avih: nsz: I'm not super stressed out about passing all the ecma test suite tests; a lot of them are silly and many of the test suites have outright bugs15:11.37 
  in that they don't test what they claim to be testing15:11.44 
avih nsz: indeed. though there are alternative navigation methods15:11.48 
tor8 but if you spot any actual crashes, I'm more concerned15:12.03 
nsz ok15:12.15 
avih tor8: crashes, or clearly incorrect behavior, i hope ;)15:12.33 
tor8 I will eventually get around to passing the tests, it's not that I don't care about them it's just I've got a lot on my plate and I have to prioritize15:12.34 
  avih: yes.15:12.39 
  anything that makes it actually unusable :)15:12.54 
avih sure, i understand. and it is making very nice progress. just need some more users to use it for some heavier duty stuff.. and your bugmail will grow quickly ;)15:13.29 
tor8 avih: yeah. you always need users for a project to mature.15:14.05 
  garbage collecting strings is probably the highest priority TODO item on MuJS at the moment15:14.31 
  but that may have to wait a few weeks while I'm focusing on the EPUB module of MuPDF15:14.54 
avih it does seem quite solid though, considering the very few users it has...15:14.56
  GC for API strings? or internal?15:15.25 
tor8 all strings15:15.46 
  the lifetime will be as we discussed earlier, and the external API shouldn't change15:15.59 
avih so currently you don't GC strings at all?15:16.50 
  yeah, lifetime == as long as on stack sounds perfectly reasonable. and decent too.15:17.22 
  w00t! 1000x pop instantly! :)15:38.12 
  not even twice as slow as push... :15:38.49 
  [test] 1000x push: 2.2 ms15:38.50 
  [test] 1000x pop: 2.0 ms15:38.50 
  1000x pop used to be ~900ms15:39.22 
tor8 avih: yeah. picking a better algorithm helps. :)15:39.46 
avih :p15:39.52 
  tor8: btw, i recall an issue on github where someone said that Date.now() doesn't change after a very long loop. i bumped into this issue too, but since i'm embedded, i exposed another get_time function, but Date.now() should probably be fixed15:41.00
  or that it does change but in 1s multiples15:41.19 
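One plausible cause, not verified against the mujs source, is a Date.now() built on a seconds-resolution clock such as time(). A sketch of a millisecond-resolution replacement using C11 timespec_get; the function name `now_ms` is hypothetical:

```c
#include <time.h>

/* Milliseconds since the Unix epoch, truncated to a whole number of
 * milliseconds as JS time values are integral, returned as a double
 * like a JS number. Falls back to 1 s resolution if timespec_get fails. */
static double now_ms(void)
{
    struct timespec ts;
    if (timespec_get(&ts, TIME_UTC) != TIME_UTC)
        return (double)time(NULL) * 1000.0;
    return (double)ts.tv_sec * 1000.0 + (double)(ts.tv_nsec / 1000000);
}
```

Two calls a few milliseconds apart then differ by a few milliseconds instead of jumping in steps of 1000.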
  also, if you want users, i think you should keep the issues section open. take the burden on yourself to duplicate the bugs into ghostscript.com and post a url to the ghostscript bug at the github page imo15:45.09 
  people will post bugs to github quickly, but an order of magnitude fewer will create an account to file at ghostscript15:46.05
nsz github?15:47.23 
avih https://github.com/ccxvii/mujs15:47.50 
nsz i dont think mujs is hosted on github15:47.55 
avih tor updates it frequently after updates to master15:48.14 
nsz i see15:48.19 
avih though still manually15:48.28 
nsz well github is useful for mirroring15:48.32 
avih yes15:48.38 
  and bugs15:48.43 
nsz i dont know about bugs.. or any service that needs a github account15:50.07 
  but maybe js ppl use github a lot15:50.18 
avih dunno, i don't use gh a lot, but it does have many many more registered users than bugzilla at ghostscript ;)15:51.17 
chrisl Well, after an immense amount of faffing about, I finally have two instrument based Visual Studio profiles from the 8.71+ and 9.06+ versions of cust 532's simulator. And the results are..... complete garbage, nonsense of highest order. What a crock of sh*t :-(16:09.36 
Robin_Watts instrumented profiles are rarely useful IME.16:10.47 
chrisl Well, sampled ones are completely useless16:11.38 
  Or, I should say, sampled profiles are largely useless when profiling an interpreter16:12.56 
  But sampling also falls apart when an application uses threads16:13.30 
  I would, however, expect that at least the call counts would be meaningful in an instrumented profile16:14.07 
Robin_Watts sampling with threads falls apart, (in the absence of integration with the threading system)16:15.44 
chrisl With Ghostscript, especially doing PDF, I find sampled profiling alone isn't as helpful because it doesn't tell you whether the function is taking more time, or is being called more often than before16:17.42
  Personally, I find instrumented profiles give me a better hint where to look, but that's probably because I'm used to reading them. But in this case, the output is genuinely nonsensical16:21.22
  Also, the fact that VS takes over 25 minutes to "analyze" the profile so you can actually view it is just totally insane........16:23.38 
Diemex Can I detect if I'm dealing with a scanned/picture only pdf? That would be great so I don't have to redraw when zooming in. I'm using mupdf on android18:45.29 
Robin_Watts Diemex: In what way wouldn't you have to redraw?18:50.49 
Diemex Well if I know that the page is scanned I can render it once at the resolution of the scan. When I zoom in using pinch to zoom I would usually redraw the page at a higher resolution so text becomes crisp. There is no point in redrawing if the page is a picture. It would just waste cpu and more memory for the larger picture size.18:52.48 
Robin_Watts Diemex: So you're doing your own blitting of the MuPDF rendered image?18:56.32 
  In the current MuPDF android app, the bitmap is drawn at the requested resolution, and we scale any images to that resolution.18:57.07 
  When you zoom, we redraw the page at a higher res, causing images to be rescaled.18:57.24 
  To do what you suggest would mean you replacing part of that pipeline.18:57.37 
Diemex Well I use mupdf in my own app and am looking for a function like isScannedPdf(). If it's scanned I wont scale it. What is "blitting"?19:00.12 
Robin_Watts blitting: I'm informally using to mean "taking a prerendered bitmap and shoving it on the screen"19:00.55 
  Diemex: OK, so there is no isScannedPdf function.19:01.13 
  But it would be relatively easy to write one.19:01.22
  Do you speak C?19:02.00
Diemex My c is very bad. It's like "You say I say"19:02.42 
tor5 Robin_Watts: didn't we have code somewhere that detected an image-only page by looking at the display list?19:03.03 
Robin_Watts tor5: I was going to suggest writing a device that looked at the calls to see if it was only images.19:03.24 
Diemex I was thinking that one could look at the list of "pdf objects" if it's only images -> scanned19:04.19 
Robin_Watts Diemex: Right. The easiest way to do that is to run the page through a custom device, and look at the page objects you are called with.19:05.18 
  And to avoid processing the page twice, you'd use a display list.19:05.35 
  But bear in mind that not all the images on the page might be the same resolution, so you'd need to figure out the max res.19:05.54 
Diemex It would be good to get a "bounding box" around all the images. Basically the size of the page in pixels. Then render the page at that resolution.19:07.08
Robin_Watts Diemex: I disagree with that way of characterising it.19:08.38 
  PDF pages have a well defined physical size.19:09.01 
  So for each image you'd look at how it was scaled onto the page and calculate a dpi for it.19:09.36 
  You find the maximum dpi of any image on the screen.19:09.54 
  Then you can figure out how big the page would be when rendered at that dpi.19:10.05 
  And that's what you render.19:10.13 
Diemex That sounds better. Now I need to figure out where to look and add the functionality.19:14.14 
Robin_Watts Diemex: It will require you to write a new fz_device in C.19:16.11 
  Look at source/fitz/test-device.c for an example.19:16.53 
Diemex Are there any docs on what every class/file does? What is a fz_device?19:19.43 
Robin_Watts Diemex: The only docs is the source. The header files are fairly well commented.19:21.07 
  Diemex: Basically when MuPDF interprets a PDF file, it breaks the page down into a series of drawing operations.19:21.40 
  For each of these drawing operations, it calls a function in an fz_device structure (a method of an fz_device class, if you prefer OO terms)19:22.17 
  We have devices that render these to screen.19:22.33 
  Or store them into a display list to be played back later.19:22.43 
  Or that extract text from the objects.19:22.53 
  Or that work out if a page is black and white or color.19:23.02 
  etc,19:23.04 
  You're writing another such 'consumer of page objects', right?19:23.15 
Diemex Okay: So I listen to the draw calls. If a draw call for something other than an image is called it is not scanned. I could set a bool variable to false and would have my isScanned() function19:25.12 
  I'm a bit confused. C is not oo, right? How would I add a global bool variable to the device?19:26.16 
Robin_Watts Indeed, C is not OO.19:28.22 
  But you can use C in an OO style.19:28.28 
  You don't have classes and members natively in C.19:28.44 
  but we achieve the same result using structures with function pointers in.19:28.56 
  If you look at test-device.c you'll see a device that looks to see if all the incoming drawing requests are greyscale or color.19:29.34 
  That should give you a good starting point.19:29.42 
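The "structures with function pointers" pattern, applied to Diemex's image-only check, looks roughly like this. It is a simplified, self-contained stand-in: `toy_device`, `image_only_state` and the three callbacks are hypothetical, and MuPDF's real fz_device has many more callbacks with different signatures (see its device header and test-device.c). The per-page flag lives in a struct reached through the device's user pointer:

```c
/* Simplified device: a user pointer plus one function pointer per drawing
 * operation, which plays the role of a virtual method table. */
struct toy_device {
    void *user;
    void (*fill_path)(struct toy_device *dev);
    void (*fill_text)(struct toy_device *dev);
    void (*fill_image)(struct toy_device *dev, int w, int h);
};

/* Per-page state the detector keeps behind dev->user. */
struct image_only_state {
    int image_only;     /* stays 1 until a non-image call arrives */
    int images_seen;
};

static void mark_not_image_only(struct toy_device *dev)
{
    struct image_only_state *st = dev->user;
    st->image_only = 0;
}

static void detect_fill_path(struct toy_device *dev) { mark_not_image_only(dev); }
static void detect_fill_text(struct toy_device *dev) { mark_not_image_only(dev); }

static void detect_fill_image(struct toy_device *dev, int w, int h)
{
    struct image_only_state *st = dev->user;
    (void)w; (void)h;   /* a real detector could record the dpi here */
    st->images_seen++;
}

/* "Constructor": wire up the callbacks and attach the state. */
static void new_image_only_device(struct toy_device *dev,
                                  struct image_only_state *st)
{
    st->image_only = 1;
    st->images_seen = 0;
    dev->user = st;
    dev->fill_path = detect_fill_path;
    dev->fill_text = detect_fill_text;
    dev->fill_image = detect_fill_image;
}

/* Treat a page as scanned if it drew at least one image and nothing else. */
static int page_is_scanned(const struct image_only_state *st)
{
    return st->image_only && st->images_seen > 0;
}
```

Many scanned PDFs also carry an invisible OCR text layer, so a real detector would need to decide how to treat that text rather than flagging every text call.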
Diemex This is function pointer ?19:51.42 
  void (*fill_image)(fz_device *, fz_image *img, const fz_matrix *ctm, float alpha);19:51.43 
  type void??19:51.52 
Robin_Watts It's a pointer to a function.19:54.50 
  The function takes the list of arguments on the right, and returns void.19:55.04 
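Reading such a declaration from the name outward: `fill_image` is a pointer (`*`) to a function taking the parenthesized arguments, and `void` is that function's return type, meaning it returns nothing. A small example with simplified parameter types (`struct dev` and `count_image` are hypothetical stand-ins, not MuPDF types):

```c
struct dev;   /* forward declaration so the typedef can mention it */

/* A function type: takes (struct dev *, int, int, float), returns nothing. */
typedef void fill_image_fn(struct dev *d, int width, int height, float alpha);

struct dev {
    /* Spelled out, the same way the fz_device header declares it. */
    void (*fill_image)(struct dev *d, int width, int height, float alpha);
    /* Exactly the same type, written via the typedef. */
    fill_image_fn *fill_image_too;
    int calls;
};

/* A function matching that type; it only updates state, returning nothing. */
static void count_image(struct dev *d, int width, int height, float alpha)
{
    (void)width; (void)height; (void)alpha;   /* unused in this example */
    d->calls++;
}
```

Assigning `count_image` to either member and then calling `d.fill_image(&d, ...)` invokes it through the pointer; the caller gets nothing back.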
Diemex Oh I have an idea. If I wanted to be super memory efficient I could detect if a page is greyscale. Then I would only need one byte per pixel instead of 4 bytes. Sadly android's only one-byte bitmap stores just translucency.19:59.31
  I could render a white background and then render a completely black picture above 20:00.30
  The rendered picture with the translucency values could be used as a mask for the black overlay20:00.55
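The trick works because compositing black at alpha a over white gives 255 - a, so setting alpha = 255 - grey reproduces the grey page exactly with one byte per pixel. A sketch of the conversion; the function names are hypothetical, and holding the result in an Android ALPHA_8 bitmap is an assumption about how the app would store it:

```c
#include <stddef.h>
#include <stdint.h>

/* Convert an 8-bit grey rendering (255 = white paper, 0 = black ink) into
 * an alpha mask for a black overlay on a white background:
 *   out = 255 * (1 - alpha/255) + 0 * (alpha/255) = 255 - alpha = grey. */
static void grey_to_alpha(const uint8_t *grey, uint8_t *alpha, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        alpha[i] = (uint8_t)(255 - grey[i]);
}

/* Reference blend of black at alpha `a` over white, rounded, for checking. */
static uint8_t black_over_white(uint8_t a)
{
    return (uint8_t)((255 * (255 - a) + 127) / 255);
}
```

For grey values 0, 128 and 255 the mask holds 255, 127 and 0, and blending black through it over white gives back 0, 128 and 255.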
  Robin_Watts: If I want to save a variable in my device I have to do it in a struct right? Or where do I define "member variables"20:13.13 
Robin_Watts Diemex: In the user data in the struct.20:20.12 
  which can be a pointer to another struct if you need more than one.20:20.36 
Diemex is a device per page or per pdf file?20:22.36 
Robin_Watts per page.20:23.41 
Diemex What determines the speed of rendering a pdf? I noticed that there is a big difference in rendering time ranging from about 100ms to 1000ms per page. 100ms for a text only pdf and 1000ms for a scanned pdf, which is the reason why I want to not have to redraw scanned pdfs. Pdfs from publishers seem to be rather fast. Pdfs that have been autogenerated by ms office or similar seem to be generally slower20:30.06 
Robin_Watts Diemex: The delay is almost certainly the time taken to smooth scale the decoded image to the screen.20:30.52 
  text only PDFs don't involve bitmap scaling.20:31.05 
Diemex If I would render scanned pdfs at their native resolution without scaling it would be faster?20:31.37 
Robin_Watts You'd avoid the scaling stage there yes.20:31.56 
  but you'd then have to scale the native res thing onto the screen.20:32.05 
  which would be just as slow.20:32.10 
Diemex scaling with a canvas is pretty quick20:32.43 
  So I'm trying to determine which function calls would make it a scanned pdf: If one of these is called it has scalable content: fill_path, stroke_path, clip_path, clip_stroke_path, fill_text, stroke_text, clip_text, clip_stroke_text If one of these is called it has an image: fill_image_mask, clip_image_mask, fill_image what about these: ignore_text, fill_shade, pop_clip, begin_mask, end_mask, begin_group, end_group, begin_t20:39.02 
ghostscript.com