| 20160407 |
tor8 | Robin_Watts: so... looking at adding in the svg parser | 11:40.00 |
Robin_Watts | tor8: cool. | 11:40.17 |
tor8 | I've got it running as a fz_document thing | 11:40.24 |
| but it would be nice to be able to hook it into a fz_image | 11:40.33 |
| so that epub can just use it as any other image | 11:40.46 |
Robin_Watts | tor8: Ah. Interesting. | 11:40.47 |
| It would be nice to do that generically. | 11:41.01 |
| fz_images are decoded synchronously. | 11:41.58 |
tor8 | question is ... do it for svg specifically or handle it via fz_document | 11:42.01 |
Robin_Watts | Just pondering what locks are taken while they are decoded... | 11:42.16 |
| do it via fz_document if at all possible. | 11:42.34 |
| Feel free to leave that for me, if you want. | 11:42.59 |
tor8 | I vaguely recall seeing some epubs where the xhtml has svg embedded, not as a separate file | 11:43.12 |
| but it should be easy enough to instantiate an svg document from a fz_xml thing | 11:43.38 |
Robin_Watts | fz_document can be instantiated from an fz_stream, right? | 11:44.26 |
tor8 | yes. | 11:44.36 |
Robin_Watts | So that should work out OK. | 11:44.50 |
tor8 | but it should be trivial to add a special 'init an svg fz_document from this fz_xml tree' for use in epub | 11:44.53 |
| when we run into that problem | 11:45.04 |
Robin_Watts | Oh, you mean, you want to run it post parsing rather than from the underlying stream. | 11:45.38 |
| I was thinking that we'd open a stream on the subsection of the file in question. | 11:45.57 |
tor8 | it might be an 'inline' svg in the html document | 11:46.10 |
Robin_Watts | tor8: yeah, so we find the byte range of the html document and make an fz_stream from that. | 11:46.34 |
tor8 | Robin_Watts: wasteful re-parsing of xml; and we have long since lost the byte range by that time. | 11:47.03 |
Robin_Watts | tor8: yeah. | 11:47.13 |
tor8 | there's another gotcha we might need to look out for | 11:47.27 |
Robin_Watts | I dislike special cases though. | 11:47.32 |
tor8 | if an svg document refers to external resources | 11:47.35 |
| like png images, etc | 11:47.43 |
| those need to be looked for in the parent context (i.e. the epub zip file) | 11:47.57 |
| so I fear there are going to be quite a few special cases here :( | 11:48.08 |
| One drawback of handling svg images as fz_image ... the pdfwrite device will write them as rasters, not as vectors. | 11:49.13 |
Robin_Watts | That one doesn't bother me so much, cos having an open_document_in_this_context (or something, where the 'context' gives access to resources) doesn't feel like a massive upheaval. | 11:49.16 |
| pdfwrite is capable of asking "what format is the underlying image" and reacting appropriately. | 11:49.53 |
| The fallback position is to write them as rasters. | 11:50.01 |
tor8 | Right. so an svg image could then have a case to write as an XObject form thing. | 11:50.20 |
Robin_Watts | (actually to write them as flate-compressed lossless things). | 11:50.21 |
| We have code that spots JPEGs and writes them unchanged. | 11:50.40 |
| So we could extend that to spot other types too. | 11:50.57 |
tor8 | one alternative is to do FZ_IMAGE_VECTOR with a fz_display_list | 11:51.00 |
Robin_Watts | tor8: Yes. That sounds nice in fact. | 11:51.15 |
tor8 | rather than go via fz_document | 11:51.22 |
Robin_Watts | Well, going via document and having an fz_image_vector are not necessarily mutually exclusive. | 11:52.15 |
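
A minimal sketch of the FZ_IMAGE_VECTOR idea floated above: an fz_image backed by an fz_display_list, so that vector content such as parsed SVG only rasterises on demand, and pdfwrite could spot the vector case and emit a Form XObject rather than falling back to a raster. All structure, function names and signatures below are illustrative assumptions, not the actual MuPDF API:

    #include "mupdf/fitz.h"

    /* Hypothetical vector-image type: an fz_image whose "decode" step
     * replays a display list instead of decompressing a bitstream. */
    typedef struct
    {
        fz_image super;          /* base image: width, height, etc. */
        fz_display_list *list;   /* the vector content, e.g. parsed SVG */
    } fz_vector_image;

    /* "Decode" by rasterising the display list at the requested size. */
    static fz_pixmap *
    vector_image_get_pixmap(fz_context *ctx, fz_vector_image *img, int w, int h)
    {
        fz_matrix ctm;
        fz_pixmap *pix = fz_new_pixmap(ctx, fz_device_rgb(ctx), w, h);
        fz_device *dev;

        fz_clear_pixmap_with_value(ctx, pix, 0xff); /* white background */
        fz_scale(&ctm, (float)w / img->super.w, (float)h / img->super.h);
        dev = fz_new_draw_device(ctx, pix);
        fz_run_display_list(ctx, img->list, dev, &ctm, &fz_infinite_rect, NULL);
        fz_drop_device(ctx, dev);
        return pix;
    }

As the exchange concludes, this and going via fz_document are not mutually exclusive: an SVG fz_document page could be run once into the display list that backs such an image.
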
sh4rm4^bnc | i wanted to update jbig2dec to 0.12... and using my google-fu i was finally able to find the tarball | 13:24.00 |
| however it's incomplete | 13:24.11 |
| lacks install-sh, depcomp, config.sub, config.guess and probably others | 13:25.00 |
| is there a chance a fixed tarball gets uploaded ? | 13:26.51 |
kens | Can't you just use the Git repository ? (I believe it's in a Git repository these days) | 13:27.30 |
| http://git.ghostscript.com/?p=jbig2dec.git;a=summary | 13:29.05 |
| The 0.12 release is tagged there 15 months back | 13:29.22 |
sh4rm4^bnc | nope, my distro works with release tarballs only | 13:29.40 |
Robin_Watts | Also missing those files though. | 13:29.43 |
kens | Hmm well Git and jbig2dec are not my areas of competence :) | 13:30.07 |
Robin_Watts | I think you're expected to use ./autogen.sh | 13:30.08 |
kens | That would seem likely | 13:30.16 |
sh4rm4^bnc | no, because that pulls in unwanted build deps | 13:30.29 |
Robin_Watts | sh4rm4^bnc: Such as ? | 13:30.42 |
sh4rm4^bnc | autoconf, automake, libtool, m4... | 13:30.43 |
Robin_Watts | Right. | 13:30.47 |
kens | Well yes.... | 13:30.52 |
Robin_Watts | the build process for the code as *we* supply it is to use autogen.sh | 13:31.07 |
kens | If you don't want to use autogen I suspect you are on your own | 13:31.13 |
sh4rm4^bnc | i wrote about this here https://github.com/sabotage-linux/sabotage/wiki/Why-github-downloads-suck | 13:31.33 |
| see "autoconf-dilemma" | 13:31.46 |
Robin_Watts | If you want to avoid those dependencies (which you may be able to do because you know about your distro), then it's up to you. | 13:31.50 |
kens | The fact that you don't like autogen doesn't really signify. That's the way we build it and the way we expect you to build it. Of course, if you don't like it you don't have to use it, but we aren't going to build it the way you want us to, it's not how we work. | 13:33.06 |
Robin_Watts | sh4rm4^bnc: I believe that as part of the release process, chrisl generates configure scripts from autogen.sh (for ghostscript at least). | 13:33.26 |
| He may not do that for jbig2dec. | 13:33.37 |
kens | He might, but I sort of doubt it | 13:34.03 |
| It might also fall out as part of doing Ghostscript, I don't know enough and he's not here | 13:34.27 |
sh4rm4^bnc | tl;dr: using autogen.sh is a PITA | 13:41.12 |
| it requires that you have the right versions of everything | 13:41.30 |
| including m4 macros for deps | 13:41.37 |
kens | You won't find any arguments here, our build maintainer (chrisl) doesn't like it either, but we agreed some time back to use autogen so at present complaining about it won't make any difference. | 13:42.21 |
| I strongly suggest you come back when chrisl is online | 13:42.41 |
sh4rm4^bnc | ok | 13:42.48 |
Robin_Watts | kens: cluster seems unhappy. | 13:56.57 |
kens | I was thinking that | 13:57.05 |
Robin_Watts | I'm going to kill my job to give yours a chance to run. | 13:57.11 |
kens | Hmm, OK | 13:57.18 |
| The last one I ran was OK (I screwed up my code and got thousands of errors, but that's my fault) | 13:57.51 |
| I hope I haven't filled up a scratch volume or something | 13:58.06 |
Robin_Watts | The non bmpcmp version that I just ran was fine. | 13:58.08 |
kens | Oh, well I guess that's encouraging, but I'll need a bmpcmp too if this works OK :-( | 13:58.50 |
sh4rm4^bnc | is there any important security-related bugfix in 0.12 ? | 13:58.57 |
kens | I guess I can try it and see what happens | 13:58.59 |
sh4rm4^bnc | or is it safe to continue using 0.11 until the configure script is fixed ? | 13:59.17 |
kens | sh4rm4^bnc : It was 15 months ago, I can't recall what the changes were, but if you look at our Gitweb interface you can see them all | 13:59.50 |
| However a quick scan would suggest the answer is yes | 14:00.41 |
sh4rm4^bnc | well CVEs are usually easily remembered | 14:00.47 |
kens | I don't recall ever seeing a CVE for jbig2dec, which doesn't mean there aren't important security fixes | 14:01.07 |
sh4rm4^bnc | i see | 14:01.21 |
kens | Eg http://git.ghostscript.com/?p=jbig2dec.git;a=commit;h=6e1f2259115efac14cd6c7ad9d119b43013a32a1 | 14:01.45 |
| http://git.ghostscript.com/?p=jbig2dec.git;a=commit;h=4e682afbfcb79ea61b096af38f4fa703274c192d | 14:02.04 |
| I also see numerous segv fixes, prevention of heap overflow (3 of) etc | 14:02.41 |
| And in fact back in 2013 there were 7 fixes for CERT reported issues | 14:03.25 |
sh4rm4^bnc | ouch | 14:03.26 |
kens | Err 2012, I can't read dates now it seems | 14:03.47 |
| Robin_Watts : going to try a bmpcmp now..... | 14:10.03 |
| Yeah that looks totally broken. Weird, it was working OK earlier | 14:15.34 |
Robin_Watts | I see 106 rsyncs running. | 14:17.39 |
kens | O.O | 14:17.52 |
| Oh, it looks like it completed | 14:18.04 |
| But I sent an abort, so I don't know what really happened | 14:18.15 |
et^ | Hi! Anyone got a few mins to help me a bit? Trying to print a pdf as an A6 pagesize, but it comes out as A4. :) (Ghostscript.Net) | 14:19.34 |
kens | Ghostscript.NET is not, I'm afraid, anything to do with us. Although it does use Ghostscript, it's developed, maintained and supported by someone else (j habjan) | 14:20.24 |
et^ | ah, ok :) | 14:20.40 |
kens | So it's likely we won't really be able to help you, but I'm willing to listen | 14:20.43 |
chrisl | sh4rm4^bnc: Hmm, I should probably have done a jbig2dec release last month - it slipped my mind..... | 14:23.20 |
sh4rm4^bnc | chrisl, would you be so kind as to add all the autogen-generated files ? | 14:25.15 |
| it really makes life much nicer | 14:25.28 |
chrisl | sh4rm4^bnc: that's one of the fixes I did last year (hence should have done a release this time around) | 14:26.02 |
| sh4rm4^bnc: for some reason I cannot fathom, automake defaults to creating symbolic links to several of its files (rather than copying them) hence the results of the default automake are specific to my system - which, AFAICT, is totally the opposite of the intent of the autotools | 14:29.28 |
kens | Interesting article from the MS Build: | 14:36.46 |
| http://www.theregister.co.uk/2016/04/07/microsoft_rethinks_the_windows_application_platform_one_more_time/ | 14:36.46 |
| Robin_Watts : my bmpcmp looks like it's completing now | 14:39.52 |
| You might like to retry yours | 14:40.11 |
sh4rm4^bnc | chrisl, weird. maybe it's a good idea then to untar the tarball to /tmp/foo or something and check if configure and make work as intended before publishing it | 14:48.35 |
| (not wanting to sound like a smart-ass, but eh) | 14:48.53 |
chrisl | sh4rm4^bnc: That wouldn't be enough - I'd have to uninstall the autotools, or have a "fresh" machine - which I will do. I just didn't realise automake was being so stupid..... | 14:50.20 |
sh4rm4^bnc | cool thanks | 14:55.32 |
chrisl | sh4rm4^bnc: if you check tomorrow about this time, the release should be ready, all being well | 14:56.06 |
sh4rm4^bnc | great <3 | 14:56.31 |
Robin_Watts | 132 rsyncs. (Well, actually twice that number, cos rsync calls rsync it seems, but...) that can't be right. | 15:06.28 |
kens | It seems excessive | 15:08.11 |
| it does seem to be running though | 15:09.43 |
Robin_Watts | marcosw loops around each rsync call 5 times to allow for retrying. | 15:10.05 |
| I wonder if that's going wrong, and it's actually running all of them at once. | 15:10.23 |
marcosw | Robin_Watts: problem with the cluster? | 15:10.33 |
Robin_Watts | marcosw: When a bmpcmp is run, casper has a massive load of rsyncs run before the jobs get started properly. | 15:11.07 |
marcosw | before the cluster jobs are run on the nodes? | 15:11.43 |
Robin_Watts | It sits there at 30/1000 jobs, with ~145 ──sshd─┬─145*[sshd───sshd───tcsh───authprogs───rsync───rsync] | 15:11.47 |
| with those being: rsync --server -logDtpre.iLs . /home/regression/cluster/bmpcmp/. | 15:12.05 |
| marcosw: Yes. | 15:12.11 |
marcosw | off hand I don't know why that would be but I'll look into it. | 15:12.39 |
jogux | robin: I /think/ rsync forks rather than calling itself (bicbw). | 15:12.48 |
Robin_Watts | jogux: The line above is from pstree. It shows (or seems to show) 145 instances of sshd calling sshd calling tcsh calling rsync calling rsync | 15:13.53 |
| but I could be reading it wrong. | 15:13.58 |
jogux | I think sshd is forking too, I think ps tends to show forks as separate child processes because 'unix'. | 15:14.28 |
chrisl | forks *are* separate processes | 15:15.07 |
Robin_Watts | jogux: OK, but 145 of them seems excessive. | 15:15.27 |
jogux | that part I definitely agree with :-) | 15:15.37 |
Robin_Watts | I wonder whether 33 nodes * 5 retries each is in the right kind of ballpark. | 15:15.52 |
jogux | chrisl: well, true, yes :-) | 15:17.31 |
marcosw | but the retries happen sequentially | 15:18.31 |
| and there isn't anything in the logs on the nodes suggesting that the retries are necessary | 15:20.42 |
jogux | could one node be running multiple jobs at the same time, all of which are calling the rsync? | 15:21.38 |
marcosw | jogux: that's true | 15:23.54 |
| but that shouldn't just occur at the beginning of the run | 15:24.49 |
Robin_Watts | jogux: At the point at which I'm seeing the rsyncs, the dashboard reports 30 jobs have been sent. | 15:24.54 |
| hence if that was the case, I'd (naively) expect 30 rsyncs max. | 15:25.11 |
jogux | marcosw: probably the beginning of the run would be the only time they'd happen all at the same time, later ones would be staggered I would guess as jobs process at different rates. | 15:25.30 |
| Robin_Watts: Your argument seems sound. Don't know enough about the cluster to counter :) | 15:26.03 |
Robin_Watts | jogux: Different nodes build at different speeds. | 15:26.09 |
jogux | nods. | 15:26.17 |
Robin_Watts | And (AIUI) nodes are triggered by polling the clustermaster, rather than vice versa, so there is an additional "at any time within a 30 second period" factor there to. | 15:26.55 |
| too | 15:26.57 |
jogux | Robin_Watts: Hm. Makes it harder, but 33 nodes, that's still going to be an average of over one a second starting an rsync and I would bet the rsync rarely completes in under a second. | 15:29.10 |
kens | No idea if it's relevant, but my last bmpcmp came back with "rsync retry 1" 5 times | 15:29.42 |
Robin_Watts | yes, but it lessens the difference between the start of the run and the middle of the run, I expect. | 15:30.10 |
jogux | Robin_Watts: true. | 15:30.55 |
| it's happening again now | 15:32.03 |
marcosw | i'm seeing ~100 rsync jobs running, but that's after 870 jobs have been sent and all the cluster nodes are running | 15:32.04 |
Robin_Watts | dashboard says "40/1000" sent. | 15:32.28 |
marcosw | Regression marcos bmpcmp started at 04/07/16 15:27:38 UTC - 1000/1000 sent - 100% | 15:32.44 |
| that's using the console dashboard | 15:32.52 |
jogux | makes it about 150 rsyncs now | 15:32.58 |
marcosw | presumably the http dashboard is delayed? | 15:33.12 |
Robin_Watts | marcosw: I guess it must be. | 15:33.22 |
marcosw | I don't see any easy way of preventing this. The cluster nodes have to upload the completed bmpcmp output and if they don't all do it at once it's going to slow down cluster jobs. | 15:34.03 |
| I suppose they could gather output into a .tar.gz file and upload it in chunks. | 15:34.23 |
Robin_Watts | marcosw: Can't we upload all the files from a node at once at the end of the run? | 15:37.59 |
| That would keep the number of rsyncs going on casper to the number of nodes. | 15:38.29 |
| Would be slower of course, as we wouldn't start transferring files until they were all done. | 15:39.02 |
| Best of all worlds might be to queue rsyncs from a node, so that no node ever has more than one rsync going at a time. | 15:39.29 |
| So we'd get maximum use of bandwidth still, and not kill casper each time. | 15:39.56 |
marcosw | the problem with queueing jobs is that it makes each job no longer independent. | 15:40.10 |
Robin_Watts | Possibly that might be as simple as taking/dropping a mutex around the rsync call ? | 15:40.16 |
| In what way not independent ? | 15:40.29 |
marcosw | the rsync command is built into the job that is sent to the cluster node. | 15:41.14 |
| the node software just runs a bunch of these jobs in parallel. it doesn't know that the job contains an rsync. | 15:41.40 |
Robin_Watts | marcosw: Can we change it from rsync to my_rsync? And then have my_rsync be a script on each node that takes a lock, calls rsync, then drops a lock ? | 15:41.59 |
marcosw | that should work... | 15:42.36 |
jogux | marcosw: possibly /home/regression/bin/authprogs could be tweaked to only allow <x> rsyncs at once. | 15:43.01 |
| though Robin's idea works just as well | 15:43.10 |
Robin_Watts | jogux: We don't want rsyncs to fail. We want them to block though. | 15:43.22 |
jogux | Robin_Watts: Yeah, I mean, sleep if there are more than <x> | 15:43.36 |
| 'allow' was the wrong word | 15:43.49 |
Robin_Watts | We could have a lockfile/rm pair in the script, and still use vanilla rsync? | 15:44.02 |
| the target of lockfile could be set in an environment variable that could be set locally on each cluster node ? | 15:44.34 |
marcosw | Robin_Watts: I like the lockfile idea, but why does it need to be different on each cluster node? can't we just use ./rsync.lock? | 15:46.03 |
Robin_Watts | marcosw: We could, yes. | 15:46.16 |
| I was worried that we'd want to use /tmp/blah or something, and /tmp might be different on windows nodes or something. | 15:46.44 |
| or MacOSX nodes. | 15:46.50 |
| but ./rsync.lock sounds fine. | 15:47.03 |
marcosw | luckily we don't have any windows nodes :-) and I'm pretty sure that /tmp works on mac os x | 15:47.17 |
Robin_Watts | as long as we don't go changing directory. | 15:47.21 |
| marcosw: We *could* have windows nodes though. The cluster stuff runs under cygwin. Or did a couple of years ago at least. | 15:47.42 |
marcosw | (yeah, it's just a symlink to /private/tmp) | 15:47.48 |
| cygwin has /tmp | 15:47.55 |
Robin_Watts | And with the new windows 10 bash stuff, it might run better when that's out of beta. | 15:48.18 |
marcosw | pretty sure every unix program would break if /tmp didn't exist, but in any case we can use the directory that the bmpcmps are being generated in (./temp). | 15:48.48 |
chrisl | You could make it consult the environment for which temp directory to use | 15:49.53 |
marcosw | you mean TMPDIR? | 15:50.36 |
chrisl | Or we could have our own "CLUSTER_TEMPDIR" | 15:50.56 |
Robin_Watts | An environment variable? Crazy! | 15:50.56 |
chrisl | Better than the registry <sigh> | 15:51.29 |
marcosw | the cluster node already has a variable $temp that it uses for a temp directory (currently set to "./temp"). I will just use that. | 15:52.15 |
chrisl | So, that'll work, then..... | 15:52.33 |
marcosw | at one point I was experimenting with using ramdisks, so $temp was /dev/shm/temp | 15:52.54 |
| lockfile doesn't exist on mac os x, so that's a problem... | 15:53.36 |
Robin_Watts | https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/lockfile.1.html | 15:54.11 |
jogux | marcosw: flock is a bit more portable | 15:54.40 |
marcosw | marcos@macbookpro:[37]% lockfile | 15:55.02 |
jogux | lockfile is technically part of procmail iirc. and doesn't exist on my Mac. | 15:55.03 |
marcosw | lockfile: Command not found. | 15:55.03 |
Robin_Watts | http://stackoverflow.com/questions/10526651/mac-os-x-equivalent-of-linux-flock1-command | 15:55.42 |
marcosw | jogux: flock doesn't seem to exist on mac os x either | 15:55.50 |
jogux | marcosw: uh. so it doesn't. ignore me :( | 15:56.04 |
| perl it is then :-S | 15:56.17 |
marcosw | perl -MFcntl=:flock -e '$|=1; $f=shift; print("starting\n"); open(FH,$f) || die($!); flock(FH,LOCK_EX); print("got lock\n"); system(join(" ",@ARGV)); print("unlocking\n"); flock(FH,LOCK_UN); ' /tmp/longrunning.sh /tmp/longrunning.sh | 15:56.21 |
Robin_Watts | https://github.com/discoteq/flock | 15:56.26 |
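
A sketch of the "my_rsync" wrapper idea above in C, using the flock(2) system call, which exists on both Linux and Mac OS X even where the flock(1) and lockfile(1) utilities are missing. The CLUSTER_RSYNC_LOCK variable name is made up for illustration; ./rsync.lock is the default agreed above:

    /* my_rsync.c - serialise rsync runs on a node with an exclusive lock.
     * Sketch only: the env variable name CLUSTER_RSYNC_LOCK is hypothetical. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/file.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *lockpath = getenv("CLUSTER_RSYNC_LOCK");
        int fd, status = 0;
        pid_t pid;

        if (!lockpath)
            lockpath = "./rsync.lock";

        fd = open(lockpath, O_CREAT | O_RDWR, 0644);
        if (fd < 0) { perror("open lockfile"); return 1; }

        /* Block until no other transfer from this node holds the lock. */
        if (flock(fd, LOCK_EX) < 0) { perror("flock"); return 1; }

        pid = fork();
        if (pid < 0) { perror("fork"); return 1; }
        if (pid == 0)
        {
            argv[0] = "rsync";   /* pass our arguments straight through */
            execvp("rsync", argv);
            _exit(127);          /* exec failed */
        }
        waitpid(pid, &status, 0);

        flock(fd, LOCK_UN);      /* also released automatically on exit */
        close(fd);
        return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
    }

This keeps each node using its full upload bandwidth while capping casper at one incoming rsync per node, as Robin suggests.
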
marcosw | flock seems to be working. I've disabled the macpro node temporarily. | 16:42.13 |
| |-sshd-+-24*[sshd---sshd---tcsh---authprogs---rsync---rsync] | 16:42.26 |
Robin_Watts | marcosw: Nice one. | 16:44.27 |
marcosw | Robin_Watts: thx, but you found the problem and the solution. | 16:44.58 |
jogux | as marcosw suspected, the performance of aws general purpose SSD is pretty poor - I make the guaranteed speed for casper's root approx 8MBps, theoretically burstable up to 11.7MBps (contrast with a modern SSD that should achieve around 500MBps). | 16:55.32 |
| so that would explain why casper feels like it has a spinny disc :-) | 16:55.44 |
Robin_Watts | jogux: We don't *generally* do much compute-intensive work on casper. | 17:30.41 |
| Running the git server is probably the most intensive thing. | 17:31.00 |
| The cluster master shouldn't really be compute intensive. The recent problems were due to join going crazy cos I'd given it filenames with spaces in. | 17:31.31 |
| obviously the cluster nodes themselves are compute intensive :) | 17:32.09 |
| But even then, probably not disc intensive. | 17:32.21 |
sebras | Robin_Watts: jogux: are you seeing performance issues with casper? | 17:34.25 |
jogux | we saw one where bmpcmp rsyncs were maxing out casper's I/O bandwidth several times over, but hopefully Marcos has fixed that. | 17:35.14 |
sebras | jogux: right. | 17:35.42 |
| jogux: a simple read of a file gave me 37Mbyte/s which equates to almost 300Mbps. but that was without any processing at all. | 17:36.42 |
jogux | Hm. I wonder if I got my IOPS -> MBps calc wrong. afaict, amazon guarantees us 2,000ish IOPS. | 17:37.28 |
| (and should let us burst to 3,0000 IOPS temporarily) | 17:40.31 |
Robin_Watts | jogux: Are those "Random 4k IOPS" ? | 18:34.52 |
jogux | Robin_Watts: Urm, pass. https://aws.amazon.com/ebs/details/ is the entire extent of my knowledge on the subject :) | 18:35.38 |
Robin_Watts | In which case 2000 IOPS is ~8MBPS. | 18:35.47 |
| Do you mean 30,000 or 3,000 burst? :) | 18:36.04 |
jogux | Robin_Watts: that page says 'Max IOPS Burst Performance3,000 for volumes <= 1 TB' | 18:36.23 |
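
For reference, the arithmetic behind those figures, assuming the 4 KiB operation size Robin's question refers to and the ~2,000 guaranteed IOPS jogux quotes:

    2,000 IOPS * 4 KiB/op =  8,000 KiB/s ≈  7.8 MB/s  (the guaranteed figure)
    3,000 IOPS * 4 KiB/op = 12,000 KiB/s ≈ 11.7 MB/s  (the burst figure)

which matches the "approx 8MBps, burstable to 11.7MBps" numbers given earlier.
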
| I don't know how sebras got 37Mbyte/s. That would probably imply casper had at least some of the file cached in ram then I guess. | 18:37.23 |
Robin_Watts | That page says Max throughput of 160MB/sec, which presumably matches the 10,000 IOPS/volume. | 18:38.42 |
| If we take 30% of that (because of the <= 1TB thing) that's 160 * 0.3 = 48Mbyte/sec max. | 18:39.14 |
| which is ~sebras' figure. | 18:39.24 |
jogux | Robin_Watts: I think when it says "max" it means "if you had a huge volume such that you had a burstable 10,000" | 18:43.39 |
| I think our max is 3,000 | 18:43.52 |
| admittedly this may depend on our I/O size. | 18:45.09 |
| http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-io-characteristics.html says a bit more | 18:45.17 |
| interestingly running hdparm -t -T I get: | 18:46.49 |
| Timing buffered disk reads: 248 MB in 3.00 seconds = 82.53 MB/sec | 18:46.50 |
| but I never understand exactly what that means :-) | 18:47.06 |
sebras | Robin_Watts: jogux: the first times I ran dd on my file I got 37Mbyte/s, and the fastest iteration was 97.8Mbyte/s but by then it was definitely cached. | 21:56.09 |