Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

	20141107
rayjj	For the logs: I replied to Saito-san about halftones (dither patterns) with Ghostscript. Hopefully, they will come back with a description of what they actually want and let us help them.	06:21.28
	what a pain. Have to do the health forms all over again :-( Good thing I saved a copy	06:23.05
avih	tor, in the docs for js_newcfunction, it says "The length argument is the number of arguments to the function. If the function is called with fewer arguments, the argument list will be padded with undefined." It should probably start with ".. is the number of arguments the function assumes to exist on its stack." and possibly add "The function can check the actual number of arguments on its stack using js_gettop(J) - 1. This number will never be smaller than length."	10:21.44
tor8	avih: I've figured out why array.pop() is so slow	14:25.11
avih	O(n)?	14:25.34
	but at the worst case for AA tree?	14:25.44
	bug*	14:25.47
tor8	O(n*n)	14:25.49
avih	heh	14:25.52
tor8	it's the handling of the "magic" length property	14:26.08
avih	i guess that should give some nice boost to any js code which deals with big objects.	14:26.27
tor8	for(i=0;i<1000;++i) --x.length will have the same slowdown	14:26.56
avih	hmm..	14:27.21
tor8	setting an array length needs to delete all numeric properties >= the length	14:27.40
avih	yup	14:27.47
tor8	the culprit is the jsV_resizearray function	14:28.24
avih	wait, but iirc at jsarray.c you manually delete elements past the new length.. don't you?	14:28.26
tor8	avih: yeah. I can fix the Ap_pop function to shortcut that	14:28.43
avih	so that's redundant?	14:28.46
tor8	but then the --x.length case will still be slow	14:28.49
avih	yeah, because it implies pop, just without the return value	14:29.09
tor8	yeah. and x.length = y needs to create an iterator to loop through and delete all the properties	14:29.39
	and creating an iterator is a bit slow	14:29.45
	not to mention looping through it	14:29.51
avih	splice as well iirc, though maybe not after the latest splice fix	14:30.00
	yeah	14:30.18
tor8	anywhere js_setlength is called, really	14:30.33
	it's a bit of a pickle doing this well	14:31.29
avih	so you always use this generic object structure for arrays? iirc there's some CARRAY stuff there; i initially assumed that's for arrays of objects, though i didn't try to follow the semantics there	14:31.33
tor8	I can of course loop through the actual numeric indices rather than use the enumerator	14:31.47
	but then it will behave badly for sparse arrays	14:31.55
	avih: the CARRAY stuff is just to trigger the automatic length handling	14:32.16
avih	n log n is better than n^2 ;)	14:32.20
tor8	underneath that is just a regular object	14:32.23
avih	though n log n for pop sounds nasty, or for any change of length	14:32.49
	so, you got a nice approach to fix this?	14:33.19
tor8	for compact arrays, looping over the actual affected indexes (which is just the delta between the new and old length) will be well behaved	14:33.22
	but for a sparse array, it could get nasty fast	14:33.35
	x=[]; x.length = 1000000; x.length = 1	14:33.46
avih	yeah, that's sparse, but 1 -> 1000 ar[i]=x isn't	14:34.08
	or shouldn't.	14:34.24
tor8	each individual pop would be O(log n)	14:35.03
avih	yeah, that's the ideal case with aa	14:35.18
tor8	which is as good as it's going to get here	14:36.02
avih	did you think of representing arrays, up to some sparseness limit, as actual arrays?	14:36.07
	i.e. c arrays	14:36.25
	tracking the sparse factor is O(1)	14:36.54
tor8	yes, I have thought about it. shouldn't be too hard, but it does break the simplicity of everything handled the same way, adding a lot of special cases.	14:37.08
	although we already have a lot of special cases for CARRAY etc	14:37.36
avih	yeah. though tbh, o(log n) isn't bad, once it works	14:38.06
	it's probably best to get a reasonably stable implementation using the current generic object approach, and then start optimizing.	14:39.05
	as it stands, i think it's not mature enough, or at least not tested enough, to start optimizing this stuff	14:39.23
tor8	yeah. I'm going to need to track the actual number of properties vs length in order to deal gracefully with the sparse array case	14:39.25
avih	yes, though that's o(1)	14:39.57
	(when i approached mujs i thought it was more mature/tested than it was. i still like this project a lot, and the response is great, and the code is nice and solid, so i'm sticking around ;) )	14:41.28
tor8	avih: I appreciate you sticking around and pointing out these oddities that still need to be ironed out	14:42.14
	as you've noticed, it doesn't take long to fix things once they've been spotted	14:42.46
avih	indeed, which is great	14:43.06
	tor8: btw, i noticed it was originally copyright-you, then changed to agpl. what's the history of mujs?	14:44.27
	was it a sort of private project which you added to MuPDF later?	14:45.11
tor8	it started as a weekend project, then it became useful for MuPDF and I started spending work time on it	14:45.13
	yes.	14:45.39
avih	yeah, that's what i thought. well done. just straighten up the kinks, and it's gonna be amazing :)	14:45.49
	btw, what did you think of the cygwin comment? the fix is real small and you get another target platform	14:46.55
nsz	btw there are very slow array tests in the official ECMA test suite	14:47.09
	https://github.com/tc39/test262/blob/master/test/suite/ch15/15.4/15.4.4/15.4.4.15/15.4.4.15-5-12.js	14:47.20
avih	nsz: did you try running the suite on mujs?	14:47.34
nsz	eg this one does a[2^32 - 2] = null;	14:47.34
	yes	14:47.39
avih	how does it do?	14:47.45
nsz	lot of bugs	14:47.52
avih	heh	14:47.55
	crashes?	14:47.58
nsz	i have a list of failed tests if you are interested but probably easy to fix	14:48.04
	i started with the hardest one: floating-point printing and parsing	14:48.17
avih	i'm mostly a user of mujs, but the thought of running the test suite did cross my mind when i became aware of the fact that it's not as mature as i thought it was.	14:48.50
nsz	i have correct floating-point printing code now	14:48.51
avih	cool	14:49.15
	(i'm adding js scripting support to mpv as a hobby project)	14:49.39
nsz	http://port70.net/~nsz/failed.tar.gz	14:51.10
	if you uncompress this into the mujs repo then you can easily run the relevant subset of tests	14:51.39
avih	test.log only for the results, yes?	14:52.03
nsz	ah yes	14:52.25
	that was the result the last time i ran the suite	14:52.56
	running the whole suite takes a lot of time that's why i collected the failed ones	14:53.18
avih	oh, but these are only the failures, right? what's the percentage of passes? (even if percentage doesn't mean much.. still.. it's a figure ;) )	14:53.23
nsz	i dont remember and dont have that code here	14:53.50
avih	yeah, well, tor found out that decreasing the length of an array is O(N^2), so hopefully making it o(log n) should make things a bit faster ;)	14:54.22
nsz	these conformance tests often test useless corner cases	14:54.27
avih	indeed	14:54.45
	i spent considerable amount of time with the conformance tests of my Promise implementation.	14:55.03
tor8	avih: changing the length of an array was O(n)	14:55.31
avih	oh..	14:55.48
tor8	avih: now it's going to be anywhere between O(delta-n) and O(n)	14:55.56
avih	delta?	14:56.09
tor8	if it's sparse, O(n), if it's not sparse O(log n) * (old length - new length)	14:56.36
avih	but O(log n) on average, even for the case of a loop of 1000 pops?	14:56.38
tor8	so pop() will be O(log n)	14:56.48
nsz	btw my mujs hack to unscramble youtube direct streaming urls worked fine	14:57.57
	however youtube html5player.js is about 1M	14:58.09
	to parse this in and run the appropriate unscrambling function in it takes >400ms with mujs	14:59.14
	(and about 150ms in nodejs from which 40-50ms is the startup time)	14:59.39
avih	tor8: if the elements were ordered at the tree by the array index, then truncating an array would only be O(delta)	14:59.40
	or delta log n	15:00.11
nsz	(i dont understand how ppl are not fed up with the web: 100ms is a huge latency on a ui)	15:00.22
avih	nsz: indeed it is.	15:00.38
	or at least should be considered big enough to be uncomfortable	15:00.57
nsz	anyway i can play youtube videos now ..with mpv :)	15:00.58
avih	:)	15:01.07
nsz	it just takes a bit of time to do the parsing	15:01.31
avih	you parse the google js code in js?	15:01.43
nsz	but it is still much less than the time you wait in a browser	15:01.44
	i parse js with mujs	15:02.18
	and eval one function with a short string argument	15:02.39
avih	and look for the stream url?	15:02.43
nsz	stream url is in the html, but for some videos it is 'obfuscated'	15:03.05
	and the deobfuscator is in js in html5player.js	15:03.20
avih	so that's what you parse for?	15:03.21
nsz	yes	15:03.26
avih	k	15:03.28
	so how does it work? you input a youtube url into your tool, and it "imports" the video?	15:04.02
tor8	nsz: glad you finally got it working!	15:04.06
nsz	avih: i have a yurl tool that takes a youtube url or video id (or even playlist id used to work) and then tries its best to download the webpage, parse all sorts of relevant info from it, download the js if needed and then output the results in a nice format	15:05.14
	and then i have yget and yplay that use this yurl tool	15:05.32
	(they can select stream format etc)	15:05.42
avih	nsz: and the result being, stream[s] url?	15:05.44
nsz	and the way i use it is from w3m browser	15:05.52
	i have a key if i press it on a youtube link then the video starts playing in mpv	15:06.17
*avih*	is unfamiliar with w3m.. he thinks	15:06.22
nsz	w3m is console browser	15:06.33
avih	oh	15:06.41
	and why do you use a console browser? :)	15:06.51
	there are some nicely modern UI based browsers these days, you know ;)	15:07.13
nsz	gui browsers are too much pain when you are programming in terminal	15:07.55
	*terminals	15:07.59
avih	and the video? ascii art? :)	15:08.36
nsz	no, i run x	15:08.59
	just with many terminals	15:09.09
avih	dunno.. never felt comfortable with text browsers. and i've been playing with gopher before there was mosaic :)	15:09.40
tor8	avih: delete performance fix has been pushed. I hope that solves the performance hits you've been encountering.	15:09.52
	avih: w3m is a usable lynx :)	15:10.24
avih	tor8: noticing. my code doesn't have such long arrays, but i was testing it anyway... :)	15:10.38
tor8	avih: no, it was a good spot	15:10.47
	and well worth fixing	15:10.50
avih	absolutely :)	15:10.58
nsz	avih: you cannot use a gui browser without a pointer device and they are incredibly slow to start up	15:11.16
tor8	avih: nsz: I'm not super stressed out about passing all the ecma test suite tests; a lot of them are silly and many of the test suites have outright bugs	15:11.37
	in that they don't test what they claim to be testing	15:11.44
avih	nsz: indeed. though there are alternative navigation methods	15:11.48
tor8	but if you spot any actual crashes, I'm more concerned	15:12.03
nsz	ok	15:12.15
avih	tor8: crashes, or clearly incorrect behavior, i hope ;)	15:12.33
tor8	I will eventually get around to passing the tests, it's not that I don't care about them it's just I've got a lot on my plate and I have to prioritize	15:12.34
	avih: yes.	15:12.39
	anything that makes it actually unusable :)	15:12.54
avih	sure, i understand. and it is making very nice progress. just need some more users to use it for some heavier duty stuff.. and your bugmail will grow quickly ;)	15:13.29
tor8	avih: yeah. you always need users for a project to mature.	15:14.05
	garbage collecting strings is probably the highest priority TODO item on MuJS at the moment	15:14.31
	but that may have to wait a few weeks while I'm focusing on the EPUB module of MuPDF	15:14.54
avih	it does seem considerably solid though, considering the very few users it has...	15:14.56
	GC for API strings? or internal?	15:15.25
tor8	all strings	15:15.46
	the lifetime will be as we discussed earlier, and the external API shouldn't change	15:15.59
avih	so currently you don't GC strings at all?	15:16.50
	yeah, lifetime == as long as on stack sounds perfectly reasonable. and decent too.	15:17.22
	w00t! 1000x pop instantly! :)	15:38.12
	not even twice as slow as push... :	15:38.49
	[test] 1000x push: 2.2 ms	15:38.50
	[test] 1000x pop: 2.0 ms	15:38.50
	1000x pop used to be ~900ms	15:39.22
tor8	avih: yeah. picking a better algorithm helps. :)	15:39.46
avih	:p	15:39.52
	tor8: btw, i recall an issue on github where someone said that Date.now() doesn't change after a very long loop. i bumped into this issue too, but since i'm embedding mujs, i exposed another get_time function. Date.now() should probably be fixed, though	15:41.00
	or that it does change but in 1s multiples	15:41.19
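If Date.now() only advances in whole-second steps, one plausible cause is a clock source with one-second resolution such as time(). That is an assumption; the log does not say how MuJS reads the clock. A millisecond-resolution alternative on POSIX systems could use gettimeofday(); `now_ms` is a hypothetical helper name.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: whole milliseconds since the Unix epoch, which
   is what Date.now() is specified to return. gettimeofday() reports
   microseconds, unlike time(), which only counts whole seconds. */
static double now_ms(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec * 1000.0 + (double)(tv.tv_usec / 1000);
}
```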
	also, if you want users, i think you should keep the issues section open. take the burden on yourself to duplicate the bugs into ghostscript.com and post a url to the ghostscript bug at the github page imo	15:45.09
	people will post bugs to github quickly, but an order of magnitude less will create an account to file at ghostscript	15:46.05
nsz	github?	15:47.23
avih	https://github.com/ccxvii/mujs	15:47.50
nsz	i dont think mujs is hosted on github	15:47.55
avih	tor updates it frequently after updates to master	15:48.14
nsz	i see	15:48.19
avih	though still manually	15:48.28
nsz	well github is useful for mirroring	15:48.32
avih	yes	15:48.38
	and bugs	15:48.43
nsz	i dont know about bugs.. or any service that needs a github account	15:50.07
	but maybe js ppl use github a lot	15:50.18
avih	dunno, i don't use gh a lot, but it does have many many more registered users than bugzilla at ghostscript ;)	15:51.17
chrisl	Well, after an immense amount of faffing about, I finally have two instrument based Visual Studio profiles from the 8.71+ and 9.06+ versions of cust 532's simulator. And the results are..... complete garbage, nonsense of highest order. What a crock of sh*t :-(	16:09.36
Robin_Watts	instrumented profiles are rarely useful IME.	16:10.47
chrisl	Well, sampled ones are completely useless	16:11.38
	Or, I should say, sampled profiles are largely useless when profiling an interpreter	16:12.56
	But sampling also falls apart when an application uses threads	16:13.30
	I would, however, expect that at least the call counts would be meaningful in an instrumented profile	16:14.07
Robin_Watts	sampling with threads falls apart, (in the absence of integration with the threading system)	16:15.44
chrisl	With Ghostscript, especially doing PDF, I find sampled profiling alone isn't as helpful because it doesn't tell you whether the function is taking more time, or is being called more often than before	16:17.42
	Personally, I find instrumented profiles give me a better hint where to look, but that's probably because I'm used to reading them. But in this case, the output is genuinely nonsensical	16:21.22
	Also, the fact that VS takes over 25 minutes to "analyze" the profile so you can actually view it is just totally insane........	16:23.38
Diemex	Can I detect if I'm dealing with a scanned/picture only pdf? That would be great so I don't have to redraw when zooming in. I'm using mupdf on android	18:45.29
Robin_Watts	Diemex: In what way wouldn't you have to redraw?	18:50.49
Diemex	Well if I know that the page is scanned I can render it once at the resolution of the scan. When I zoom in using pinch to zoom I would usually redraw the page at a higher resolution so text becomes crisp. There is no point in redrawing if the page is a picture. It would just waste cpu and more memory for the larger picture size.	18:52.48
Robin_Watts	Diemex: So you're doing your own blitting of the MuPDF rendered image?	18:56.32
	In the current MuPDF android app, the bitmap is drawn at the requested resolution, and we scale any images to that resolution.	18:57.07
	When you zoom, we redraw the page at a higher res, causing images to be rescaled.	18:57.24
	To do what you suggest would mean you replacing part of that pipeline.	18:57.37
Diemex	Well I use mupdf in my own app and am looking for a function like isScannedPdf(). If it's scanned I wont scale it. What is "blitting"?	19:00.12
Robin_Watts	blitting: I'm informally using to mean "taking a prerendered bitmap and shoving it on the screen"	19:00.55
	Diemex: OK, so there is no isScannedPdf function.	19:01.13
	But it would be relatively easy to write one.	19:01.22
	So you speak C ?	19:01.44
	Do you speak C ? (sorry)	19:02.00
Diemex	My c is very bad. It's like "You say I say"	19:02.42
tor5	Robin_Watts: didn't we have code somewhere that detected an image-only page by looking at the display list?	19:03.03
Robin_Watts	tor5: I was going to suggest writing a device that looked at the calls to see if it was only images.	19:03.24
Diemex	I was thinking that one could look at the list of "pdf objects" if it's only images -> scanned	19:04.19
Robin_Watts	Diemex: Right. The easiest way to do that is to run the page through a custom device, and look at the page objects you are called with.	19:05.18
	And to avoid processing the page twice, you'd use a display list.	19:05.35
	But bear in mind that not all the images on the page might be the same resolution, so you'd need to figure out the max res.	19:05.54
Diemex	It would be good to get a "bounding box" around all the images. Basically the size of the page in pixels. Then render the page in that resoluton.	19:07.08
Robin_Watts	Diemex: I disagree with that way of characterising it.	19:08.38
	PDF pages have a well defined physical size.	19:09.01
	So for each image you'd look at how it was scaled onto the page and calculate a dpi for it.	19:09.36
	You find the maximum dpi of any image on the screen.	19:09.54
	Then you can figure out how big the page would be when rendered at that dpi.	19:10.05
	And that's what you render.	19:10.13
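Robin_Watts's recipe can be worked through numerically. A minimal sketch, assuming the page is run with an identity transform (so the image matrix maps the unit square onto the page in points, 1/72 inch) and that the image is not rotated or skewed, so only the a and d entries of the matrix matter; `image_dpi` and `render_pixels` are hypothetical names.

```c
/* Hypothetical helper: effective resolution of a w x h pixel image
   placed by a matrix whose scale entries are a (x) and d (y), in points.
   Assumes an unrotated image; a rotated one would need the lengths of
   the (a, b) and (c, d) vectors instead. */
static double image_dpi(int w, int h, double a, double d)
{
    double width_in = (a < 0 ? -a : a) / 72.0;   /* placed width, inches */
    double height_in = (d < 0 ? -d : d) / 72.0;  /* placed height, inches */
    double dpi_x = w / width_in;
    double dpi_y = h / height_in;
    return dpi_x > dpi_y ? dpi_x : dpi_y;        /* keep the finer axis */
}

/* Pixels needed for one page dimension (in points) at a given dpi,
   rounded up so no image detail is lost. */
static int render_pixels(double page_pt, double dpi)
{
    double px = page_pt / 72.0 * dpi;
    int n = (int)px;
    return (px > n) ? n + 1 : n;
}
```

For example, an 850 x 1100 pixel scan filling a 612 x 792 point (US Letter) page works out to 100 dpi, so the page would be rendered at 850 x 1100 pixels; with several images you would take the maximum dpi over all of them.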
Diemex	That sounds better. Now I need to figure out where to look and add the functionality.	19:14.14
Robin_Watts	Diemex: It will require you to write a new fz_device in C.	19:16.11
	Look at source/fitz/test-device.c for an example.	19:16.53
Diemex	Are there any docs on what every class/file does? What is a fz_device?	19:19.43
Robin_Watts	Diemex: The only docs is the source. The header files are fairly well commented.	19:21.07
	Diemex: Basically when MuPDF interprets a PDF file, it breaks the page down into a series of drawing operations.	19:21.40
	For each of these drawing operations, it calls a function in an fz_device structure (a method of an fz_device class, if you prefer OO terms)	19:22.17
	We have devices that render these to screen.	19:22.33
	Or store them into a display list to be played back later.	19:22.43
	Or that extract text from the objects.	19:22.53
	Or that work out if a page is black and white or color.	19:23.02
	etc,	19:23.04
	You're writing another such 'consumer of page objects', right?	19:23.15
Diemex	Okay: So I listen to the draw calls. If a draw call for something other than an image is called it is not scanned. I could set a bool variable to false and would have my isScanned() function	19:25.12
	I'm a bit confused. C is not oo, right? How would I add a global bool variable to the device?	19:26.16
Robin_Watts	Indeed, C is not OO.	19:28.22
	But you can use C in an OO style.	19:28.28
	You don't have classes and members natively in C.	19:28.44
	but we achieve the same result using structures with function pointers in.	19:28.56
	If you look at test-device.c you'll see a device that looks to see if all the incoming drawing requests are greyscale or color.	19:29.34
	That should give you a good starting point.	19:29.42
Diemex	This is function pointer ?	19:51.42
	void (*fill_image)(fz_device *, fz_image *img, const fz_matrix *ctm, float alpha);	19:51.43
	type void??	19:51.52
Robin_Watts	It's a pointer to a function.	19:54.50
	The function takes the list of arguments on the right, and returns void.	19:55.04
Diemex	Oh I have an idea. If I wanted to be super memory efficient I could detect if a page is greyscale. Then I would only need one byte per pixel instead of 4 bytes. Sadly, Android's only one-byte-per-pixel bitmap stores just translucency (alpha).	19:59.31
	Ah, I could render a white background and then render a completely black picture above it	20:00.30
	The rendered picture's translucency values could be used as a mask for the black overlay	20:00.55
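Diemex's trick checks out arithmetically: drawing opaque black over a white background with alpha a (0 to 255) gives 255 - a, so storing a = 255 - grey in a one-byte-per-pixel alpha bitmap reproduces each grey value exactly. A minimal sketch with hypothetical helper names; on Android the mask would presumably live in an ALPHA_8 bitmap.

```c
#include <stdint.h>

/* Hypothetical helper: alpha to store for a grey pixel (0 = black,
   255 = white) so that black drawn with that alpha over white gives
   back the same grey. */
static uint8_t gray_to_alpha(uint8_t gray)
{
    return (uint8_t)(255 - gray);
}

/* "Source over" compositing of opaque black (0) onto white (255)
   with alpha a, in integer arithmetic on 0..255 values. */
static uint8_t black_over_white(uint8_t a)
{
    unsigned src = 0, dst = 255;
    return (uint8_t)((src * a + dst * (255u - a)) / 255u);
}
```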
	Robin_Watts: If I want to save a variable in my device I have to do it in a struct right? Or where do I define "member variables"	20:13.13
Robin_Watts	Diemex: In the user data in the struct.	20:20.12
	which can be a pointer to another struct if you need more than one.	20:20.36
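A self-contained model of the design described above: a struct of function pointers standing in for fz_device, with a user-data pointer to a struct holding the "only images so far" flag. Everything here is hypothetical and simplified (`toy_device`, `scan_state` and the callback signatures are made up); a real implementation would fill in MuPDF's own fz_device callbacks, using source/fitz/test-device.c as the model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for fz_device: the real struct has
   many more callbacks, with different parameters. */
typedef struct toy_device toy_device;
struct toy_device {
    void *user;                                  /* per-device state */
    void (*fill_image)(toy_device *dev, int w, int h);
    void (*fill_text)(toy_device *dev);
    void (*fill_path)(toy_device *dev);
};

/* The "member variables" live in a struct reached through user. */
typedef struct {
    bool only_images;   /* stays true while nothing but images is drawn */
    int image_count;
} scan_state;

static void scan_fill_image(toy_device *dev, int w, int h)
{
    scan_state *s = dev->user;
    (void)w; (void)h;   /* a real device could record sizes for the dpi */
    s->image_count++;
}

/* Any text or vector drawing means the page is not a pure scan. */
static void scan_not_scanned(toy_device *dev)
{
    scan_state *s = dev->user;
    s->only_images = false;
}

static toy_device new_scan_device(scan_state *s)
{
    s->only_images = true;
    s->image_count = 0;
    toy_device dev = { s, scan_fill_image, scan_not_scanned, scan_not_scanned };
    return dev;
}

/* Scanned: at least one image was drawn and nothing else. */
static bool page_is_scanned(const scan_state *s)
{
    return s->only_images && s->image_count > 0;
}
```

Since a device is per page, each page would get a fresh scan_state before its display list is run through the device.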
Diemex	is a device per page or per pdf file?	20:22.36
Robin_Watts	per page.	20:23.41
Diemex	What determines the speed of rendering a pdf? I noticed that there is a big difference in rendering time ranging from about 100ms to 1000ms per page. 100ms for a text only pdf and 1000ms for a scanned pdf, which is the reason why I want to not have to redraw scanned pdfs. Pdfs from publishers seem to be rather fast. Pdfs that have been autogenerated by ms office or similar seem to be generally slower	20:30.06
Robin_Watts	Diemex: The delay is almost certainly the time taken to smooth scale the decoded image to the screen.	20:30.52
	text only PDFs don't involve bitmap scaling.	20:31.05
Diemex	If I rendered scanned pdfs at their native resolution without scaling, would it be faster?	20:31.37
Robin_Watts	You'd avoid the scaling stage there yes.	20:31.56
	but you'd then have to scale the native res thing onto the screen.	20:32.05
	which would be just as slow.	20:32.10
Diemex	scaling with a canvas is pretty quick	20:32.43
	So I'm trying to determine which function calls would make it a scanned pdf: If one of these is called it has scalable content: fill_path, stroke_path, clip_path, clip_stroke_path, fill_text, stroke_text, clip_text, clip_stroke_text If one of these is called it has an image: fill_image_mask, clip_image_mask, fill_image what about these: ignore_text, fill_shade, pop_clip, begin_mask, end_mask, begin_group, end_group, begin_t	20:39.02