Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

	<<<Back 1 day (to 2015/01/28)	20150129
tor8	Robin_Watts: got a minute?	11:43.09
Robin_Watts	sure.	11:45.38
	<FX: Runs to fetch caffeine>	11:45.48
tor8	I nuked the embedded contexts in fz_stream and fz_output. the results are nicer than I thought, but I'll need to run some benchmarks to see if it's made a performance impact.	11:46.19
Robin_Watts	tor8: So everywhere you pass a stream or an output you now need to pass context too?	11:47.00
tor8	my instinct tells me it should be a wash, passing an extra argument vs doing a pointer dereference to get the context back out	11:47.02
	Robin_Watts: yeah. making the API symmetrical, always take a fz_context no brain activity required	11:47.20
Robin_Watts	symmetry is a good argument.	11:47.40
tor8	and hopefully removing the need for the rebind magic that can go wrong so easily	11:47.52
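The two calling conventions being compared can be sketched in C roughly as follows. This is only an illustration under assumptions: ctx_t, stream_embedded, stream_plain and both read functions are hypothetical stand-ins, not the real fz_context or fz_stream API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a context; not the real fz_context. */
typedef struct { int error_count; } ctx_t;

/* Style 1: the stream embeds its context, so callers pass only the stream. */
typedef struct { ctx_t *ctx; const unsigned char *rp, *wp; } stream_embedded;

int read_byte_embedded(stream_embedded *stm)
{
    ctx_t *ctx = stm->ctx; /* extra pointer dereference to get the context back out */
    if (stm->rp == stm->wp) { ctx->error_count++; return -1; }
    return *stm->rp++;
}

/* Style 2: no embedded context; every function takes the context explicitly. */
typedef struct { const unsigned char *rp, *wp; } stream_plain;

int read_byte_explicit(ctx_t *ctx, stream_plain *stm)
{
    if (stm->rp == stm->wp) { ctx->error_count++; return -1; }
    return *stm->rp++;
}
```

In the second style there is nothing to rebind when a stream moves between contexts, at the cost of one extra argument per call.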
Robin_Watts	I still really want to replace fz_stream with estream.	11:48.03
tor8	what is estream?	11:48.14
Robin_Watts	estream is the stream abstraction we use in sot.	11:48.26
tor8	I'm not opposed to rewriting the fz_stream api and merging fz_stream and fz_output somehow	11:48.30
	the entry points in most places should be similar with read/write	11:48.51
Robin_Watts	I had not considered merging stream and output.	11:49.09
tor8	should let us filter stuff both on input and output, so we can compress using the same filters	11:49.32
	though our current filters only decompress	11:49.38
	not something I consider hugely important, though	11:50.03
Robin_Watts	no.	11:50.09
tor8	but we should be using fz_stream and fz_output for all of our i/o needs	11:50.26
	and also more consistently use fz_buffer for memory buffers	11:50.47
Robin_Watts	I think estreams are easier to think about/code, but we have stuff working at the moment, so it's purely a refinement at the moment.	11:50.59
tor8	anyway, what I wanted to discuss is the fz_device interface	11:51.12
Robin_Watts	got a patch on line?	11:51.13
tor8	yeah, tor/drop branch	11:51.19
	lots of search-and-replace style commits	11:51.43
	I was hoping to make the device callback functions of the form foo_fill_path(fz_context, void *user, ...other arguments...) rather than fz_device	11:52.31
Robin_Watts	That might tie in with something else.	11:53.40
	You have a stylistic thing of doing structs of a fixed size (like fz_device) and then having a void * user pointer.	11:54.17
	My instinct is to have a struct like fz_device, and then have other structs based off that, like: fz_device_foo being { fz_device base; extra fields... }	11:55.24
tor8	you mean to embed the fz_device at the beginning of the user data instead?	11:55.31
	yeah. I've considered that	11:55.35
Robin_Watts	It means less mallocing generally.	11:55.51
tor8	needs nasty type casting though, but less mallocing and means passing the same pointer semi-transparently	11:56.04
Robin_Watts	I think it's important that in device callbacks we always pass the device.	11:56.16
tor8	if only C had the plan9 C extension where an anonymous struct at the beginning of a struct would type pun	11:56.23
Robin_Watts	s/callbacks/functions/	11:56.28
	tor8: yeah.	11:56.35
	You don't need type casting.	11:56.57
	You can just pass &dev->base rather than dev	11:57.21
tor8	somewhere you need to cast back or to fz_device_foo	11:57.58
Robin_Watts	and if you're consistent about using 'base', you can #define BASE(x) (&x->base)	11:58.08
tor8	or you mean to let the user see fz_device_foo structs?	11:58.21
Robin_Watts	tor8: True.	11:58.24
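A minimal sketch of the embedded-base pattern under discussion, assuming made-up miniature types (device, counting_device) rather than the real fz_device. The BASE macro follows the one suggested above, with extra parentheses around the argument. As noted in the conversation, the callback still has to cast back to the derived type somewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature base type for illustration; not the real fz_device. */
typedef struct device device;
struct device {
    int hints;
    void (*fill_path)(device *dev, int path_id);
};

/* A derived device embeds the base as its FIRST member. */
typedef struct {
    device base;
    int paths_seen; /* extra state lives here, no separate user allocation */
} counting_device;

#define BASE(x) (&(x)->base)

/* The callback receives the base pointer and casts back to the derived type.
 * C guarantees a pointer to a struct also points at its first member. */
void counting_fill_path(device *dev, int path_id)
{
    counting_device *cdev = (counting_device *)dev;
    (void)path_id;
    cdev->paths_seen++;
}

void counting_device_init(counting_device *cdev)
{
    cdev->base.hints = 0;
    cdev->base.fill_path = counting_fill_path;
    cdev->paths_seen = 0;
}

/* Generic wrapper, as used by code that only knows about 'device'. */
void fill_path(device *dev, int path_id)
{
    if (dev->fill_path)
        dev->fill_path(dev, path_id);
}
```

Because the whole device is passed to the callback, a filtering device written this way can forward the call to another device as well as reach its own fields.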
tor8	Robin_Watts: this is what we do for fz_document/pdf_document	11:59.09
Robin_Watts	Right. Possibly cos I was the last one to touch that? :)	11:59.33
tor8	but annoyingly they have duplicated fields... both fz_document and pdf_document have a ctx, so there's both doc->ctx and doc->super.ctx :/	11:59.33
	it's newish code and I might have been influenced by you :)	11:59.46
Robin_Watts	tor8: yeah that's not great.	11:59.52
tor8	no, but I think we might be able to get rid of the fz_document ctx	12:00.09
Robin_Watts	But to return to the device functions...	12:00.10
tor8	but I'll deal with *_document later, devices are my current focus	12:00.24
Robin_Watts	I think we do need to pass dev into every device function, not just dev->user	12:00.47
tor8	yeah. we need the 'hints' flag in the subdevices	12:02.00
	the error depth thing is done at the wrapper layer that calls the function pointers IIRC	12:02.12
Robin_Watts	And if you want to do a device that (say) maps stroked text to text, you want the dev so you can pass on.	12:02.42
tor8	Robin_Watts: true enough	12:03.45
	okay, so we really do need to pass the fz_device along rather than just the user pointer	12:04.07
Robin_Watts	I feel vaguely uneasy about passing context everywhere in speed critical places.	12:04.14
tor8	which makes the choice of user pointer or embedded struct moot, the subdevices can do whichever	12:04.33
	Robin_Watts: yeah, but I'm going to measure some benchmarks now and see if it actually has an impact	12:04.48
Robin_Watts	tor8: perfect.	12:04.56
tor8	I wouldn't be surprised if it's actually faster for those functions that actually use the context	12:05.00
Robin_Watts	the user pointer/embedded struct is an issue.	12:05.05
	cos we currently have fz_new_device (or something like that) that allocates a basic fz_device struct.	12:05.36
tor8	Robin_Watts: right. pass a size_of_extra argument as well?	12:06.18
Robin_Watts	We would need an fz_init_device (or an fz_new_device that took sizeof(required struct)) to neatly allow the oo way of working.	12:06.21
	tor8: I think we should standardise on one or the other.	12:06.37
tor8	Robin_Watts: the construction of fz_document is a nasty mess	12:06.43
Robin_Watts	but yes, that would be a good start.	12:06.47
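One way the "pass the size of the required struct" idea might look, as a hedged sketch. The names new_device_of_size, foo_device, new_foo_device and drop_device are invented for illustration and are not existing MuPDF functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical base type; the real fz_device has many more fields. */
typedef struct { int hints; } device;

/* Allocates 'size' zeroed bytes; 'size' must cover at least the base struct,
 * and the derived struct is assumed to start with a 'device' member. */
device *new_device_of_size(size_t size)
{
    if (size < sizeof(device))
        return NULL;
    return calloc(1, size);
}

/* A derived device: one allocation holds both base and extra fields. */
typedef struct {
    device base;
    int page_count;
} foo_device;

foo_device *new_foo_device(void)
{
    foo_device *foo = (foo_device *)new_device_of_size(sizeof(foo_device));
    if (foo)
        foo->page_count = 0;
    return foo;
}

void drop_device(device *dev)
{
    free(dev);
}
```

The alternative mentioned, an fz_init_device style call, would instead fill in a base struct that the caller had already allocated.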
tor8	so we could consider cleaning both of these up at the same time	12:06.59
Robin_Watts	fz_document is slightly weird cos of pdf_no_run maybe.	12:07.15
tor8	also, the pdf_process struct uses another pattern, init/fin on a stack allocated struct	12:07.26
	Robin_Watts: possibly, but I think it just grew organically and mutated in odd ways	12:07.41
Robin_Watts	tor8: init/fin may not be a bad thing to follow.	12:07.57
tor8	which is a pattern which we could standardise on, but I'm not sure the gain (one malloc/free pair) is worth the extra API cognitive load	12:08.23
Robin_Watts	init/fin are construct and destruct without malloc/free	12:08.26
tor8	Robin_Watts: yeah. I'm not convinced of its benefit in non-performance related code, since we're heavy malloc users everywhere else	12:08.55
Robin_Watts	yeah, but where it's needed for performance it's a win.	12:10.15
	init/fin can be thought of as an internal part of new/free.	12:10.36
	and we only expose it in circumstances where it's required.	12:10.54
	it's not a huge change to the way we work.	12:11.14
tor8	no, it's fine for our internal interfaces	12:11.29
	but not something I'd want to expose to the public (since it requires non-opaque data types)	12:11.48
Robin_Watts	indeed.	12:12.41
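The relationship between init/fin and new/free described above can be sketched like this. The processor type and its functions are hypothetical; they only illustrate that new/free can be thin wrappers around init/fin, and that init/fin needs the struct layout to be visible to the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical non-opaque type; callers using init/fin must see its layout. */
typedef struct {
    int depth;
    int in_use;
} processor;

/* init/fin: construct and destruct in caller-provided storage, no malloc. */
void processor_init(processor *p)
{
    p->depth = 0;
    p->in_use = 1;
}

void processor_fin(processor *p)
{
    p->in_use = 0;
}

/* new/free: the same construction wrapped in one malloc/free pair. */
processor *processor_new(void)
{
    processor *p = malloc(sizeof *p);
    if (p)
        processor_init(p);
    return p;
}

void processor_free(processor *p)
{
    if (!p)
        return;
    processor_fin(p);
    free(p);
}
```

A public API could expose only processor_new and processor_free and keep the struct opaque, leaving init/fin for internal, performance-sensitive callers.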
tor8	agh, don't quit the editor when focused on xchat!	12:14.32
paulgardiner	init and fin are a pain in some places within sot. Forces to have object plus flag saying whether the object is allocated or not, although I suppose it would be possible to find special values within the struct to mark something as not allocated	12:21.33
Robin_Watts	paulgardiner: Yes, they can be a pain, but they can also be powerful.	12:22.39
	SOT has them for a reason.	12:22.54
paulgardiner	Not in the case I'm referring to.	12:23.09
	I think the argument was "well pthreads uses it and that's Linux so it must be right"	12:23.35
	If the malloc is the only possible failure then that can be a good reason for init/fin	12:24.03
*kens*	heads for some lunch	12:27.41
tor8	Robin_Watts: inconclusive first benchmark results indicate that passing the ctx everywhere is actually faster	13:09.35
	than fetching it out of a struct	13:09.51
Robin_Watts	ok...	13:10.01
tor8	and on other test files it's marginally slower	13:11.29
	pdfref17 is faster with context everywhere (by ~50 ms over the whole document)	13:12.54
	it's in the hard to measure category...	13:13.53
	and about the same diff slower on a more graphically intensive document	13:15.54
	so I'd say not to worry about it for now, the API benefits outweigh any performance differences	13:16.13
Robin_Watts	I guess that if it's hard to measure, it's in the 'we don't care' area.	13:16.22
tor8	especially since it seems to be a wash	13:16.24
	some things marginally faster, others marginally slower	13:16.38
	I expect we might be slower now than in the previous case and only for functions that don't use the ctx themselves	13:17.15
	otherwise the time spent passing the extra argument is just spent fetching it back out of the struct	13:17.28
	and those functions are pretty rare	13:17.36
	and any inlined code should be faster, since there's no pointers that have to be fetched	13:18.05
Robin_Watts	We use ctx a lot. Either to pass it to other functions, or in try/catch.	13:18.31
tor8	Robin_Watts: have you got an arm platform to test on?	13:18.33
Robin_Watts	tor8: I have a beagleboard.	13:18.43
	and a pi.	13:18.50
tor8	could you do a "time mudraw -5 pdfref17.pdf" on those comparing master to tor/drop?	13:19.26
	if you're not busy?	13:19.36
Robin_Watts	not immediately.	13:19.43
tor8	it would be good to know if arm makes more difference, intel cpus are so finicky and hard to predict	13:20.05
Robin_Watts	it would.	13:20.15
kens	Hmm Office on Android now available:	13:23.23
	http://www.theregister.co.uk/2015/01/29/apple_office_android_tablets/	13:23.23
pedro_mac	seems to be cloud only though, can't open docs already on your device	13:37.01
	(or at least not directly from their app)	13:37.31
	and can only save edits to a cloud share	13:39.03
kens	No idea, it's clearly limited since you need an Office 365 subscription to create/edit docs	13:42.10
	Makes it a viewer then I guess	13:42.24
pedro_mac	it has dropbox & sharepoint support, so you don't need a subscription but it doesn't let you save/load to device	13:46.58
	strange choice	13:47.06
kens	The article says you need a subscription to Office 265 to edit or create documents, is that not correct ?	13:47.53
	Office 365*	13:48.01
pedro_mac	I have it on my phone and just use dropbox	13:48.24
Robin_Watts	Office 265 is the european public sector version where they take more holidays.	13:48.32
kens	:-D	13:48.39
	Some of the comments on the web site seem to indicate that indeed you don't need a subscription	13:50.25
pedro_mac	it's a massively cut-down editing experience too	13:52.43
	I get the choice of making my text red, yellow or green	13:52.57
kens	Well, that's good news for SOT right ?	13:53.11
Robin_Watts	The news is not as bad as it might have been.	13:53.38
pedro_mac	no font selection either - just size, style and a choice of 3 colours	13:54.18
	probably enough to encourage people to buy the pro version if/when it arrives	13:54.45
	they have 1 million downloads so far though, and a 4 star rating	13:55.48
	for a 1 star feature set - hey, what do you want for nothing?	13:56.17
neves	Hi! I'm developing an android app, which should download a pdf file from a url. My problem is, I must not download all pages at one time, but separately, page by page, so the user shouldn't have to wait until the whole document is downloaded. Is it possible using MuPDF?	13:56.25
Robin_Watts	neves: MuPDF has code in its core to allow documents to be displayed as they are downloaded.	13:57.32
	With a suitable linearized file, MuPDF will therefore show pages as they appear.	13:58.07
	If you have an http fetcher that can do byte range requests, you can even jump ahead in the file, and pages will be preferentially loaded as you look at them.	13:58.46
	BUT... that code is not hooked up for the android version.	13:59.02
	You can probably hook it up yourself if you want if you are a competent C programmer.	13:59.41
	neves: Are you aware of the licensing situation with MuPDF?	14:00.08
neves	No,I'm not.Yet	14:00.27
Robin_Watts	MuPDF is developed by Artifex (us).	14:01.39
	We release it in 2 ways.	14:01.47
	Firstly, if you are happy to abide by the terms of the GNU GPL, then you can use MuPDF under that license.	14:02.07
	This means (among other things) that you must give away all the source to your application.	14:02.28
	(sorry, that should read GNU AGPL, but the difference is probably moot in this case).	14:02.49
	If the terms of the GNU AGPL are impossible for you to live with, then we can sell you a commercial license that lets you do what you want.	14:03.47
chrisl	To clarify, with (A)GPL you don't have to "give away all the source....", you retain the copyright, but the source must be openly available/modifiable and re-distributable under the same license terms	14:04.56
neves	Ok, thanks. But since I'm an android developer it would be very hard to implement changes in the mupdf core, I think..	14:09.47
Robin_Watts	neves: The changes required are not in the mupdf core.	14:11.52
	They are in the android specific wrappers around the core.	14:12.02
	but that will require some C/JNI/Java	14:12.49
	Or with a suitable commercial contract, we could possibly do the work for you.	14:13.13
*Robin_Watts*	foods. bbs.	14:15.04
neves	Ok, thanks for your help! sorry, I can't offer you a commercial contract since I'm just a developer	14:37.22
Robin_Watts	neves: No worries. If you decide to tackle it, let us know.	15:01.01
henrys	kens: the problem I punted to you? Did you see it?	15:41.26
kens	Yes, there are several parts to it	15:44.24
	I was going to send round an email for comment, to tech	15:44.43
	But I was hoping to fix an actual limitation exposed by the code first.	15:44.58
	Which I'm getting nowhere with at the moment	15:45.06
	I'll finish up the email and send it for comment	15:45.23
henrys	kens: they contacted me again yesterday for a schedule so if it's a big "todo" let's make a bug with your analysis and point them to it.	15:51.07
kens	It can be partially solved 'reasonably' quickly, partly on their end, partly on ours. A major portion is not trivial and would be weeks to months of work. I'll finish this email and you can read it.	15:51.51
henrys	chrisl: we missed your gs font expertise yesterday.	16:00.38
chrisl	henrys: I saw some of the discussion - but didn't read it all in details	16:01.19
henrys	chrisl: probably don't need to.	16:04.32
chrisl	henrys: what I can say is that we can't use the same trickery for TTFs that we do for UFST/Microtype - we could do something sort of similar, but it would be potentially much more complicated	16:05.42
henrys	chrisl: don't worry about that... but otf cff is the direction.	16:06.37
chrisl	henrys: Okay, I've been doing a little experimenting, although I've been using just CFF, not OTF......	16:07.25
henrys	why does adobe acrobat ship with otf instead of cff?	16:07.59
chrisl	henrys: I assume for greater compatibility - the Windows font engine can (sort of) handle OTF/CFF, but not bare CFF	16:08.41
henrys	chrisl: well we can think about converting what urw is going to deliver but it doesn't look like a big savings over otf.	16:10.03
chrisl	henrys: the base URW gs font set (the latest ones we just got) goes from 2.4Mb in Type 1 pfb format, to 1.1Mb in "bare" CFF	16:10.27
henrys	yes there is about a 40 to 50 percent saving from type 1 to cff, but the savings I was talking about are the difference between cff and otf with cff outlines	16:11.29
chrisl	Yeh, I'm just not sure right now how to poke fontforge's scripting interface to produce OTF/CFF - hence I tried CFF first	16:12.19
henrys	tor posted a script to pastebin I've been using.	16:12.46
	well I just took out the cff extension and used otf in his script and it seems to work. But how good is fontforge? I feel like I'm depending on this thing for these numbers and have no clue if it's producing something reasonable. Have you done a cluster push with a cff substitute for a type 1?	16:14.34
chrisl	I haven't clusterpushed yet, the CFF fonts don't quite work with Ghostscript because of some of the crazy sh*t we do when loading fonts	16:15.38
henrys	chrisl: I think I can do a cluster push with the type 1 courier converted to otf with pcl. Be interesting to see the bmpcmp	16:16.20
chrisl	henrys: I'm not sure that will work.	16:16.56
kens	henrys mail sent to tech, its a bit lengthy I'm afraid but its hard to explain this problem quickly. I really would like you at least to read it and consider what (if anything) we should do about this particular issue. Other opinions welcomed by the way (hint hint; chrisl, ray, Robin etc)	16:18.05
Robin_Watts	kens: 'form cache' sounds like something that would be done in MuPDF using a display list.	16:20.28
	The idea of having to use clist to do it in gs makes me go cold(er).	16:20.46
kens	Robin_Watts : there's lots of ways to do it, GS doesn't do it at all	16:20.52
	It doesn't have to be done for low level devices at all. It's possible that we could store an /Implementation in the form dictionary and have the form code check it; if it finds that, it doesn't execute the form, just sends the Implementation to the high level device (for an example)	16:21.42
Robin_Watts	"MD65"	16:21.56
kens	Ooops	16:22.01
Robin_Watts	13 times as good as MD5.	16:22.09
kens	13 times slower too ?	16:22.17
Robin_Watts	"WHich"	16:22.52
	"/R19as"	16:23.21
kens	I was in a hurry writing a lot of this.....	16:23.33
Robin_Watts	"WHen"	16:23.38
kens	Really I'm more interested in comments about the facts and implications than spelling mistakes	16:23.56
henrys	and Shapr is the one I noticed ;-)	16:24.04
Robin_Watts	"R18 Do"	16:24.19
	kens: yes, just mentioning stuff as I go.	16:24.35
henrys	how is this different than pdf/vt that I'm constantly badgered about at tradeshows is it completely separate?	16:24.37
kens	This is PostScript input	16:24.48
	PDF/VT is a way of doing the same task with PDF input (sort of)	16:25.06
henrys	right but presumably if we had PDF/VT machinery in the code.... then it would be useful to this problem.	16:25.42
Robin_Watts	kens: Reads well to me.	16:26.07
kens	He could rewrite his contents as PDF/VT, yes	16:26.07
chrisl	kens: I'm assuming that the Implementation key could simply be an integer index, and that would be sufficient for high level devices?	16:26.47
kens	He would have to convert the fixed portion to PDF separately (3 pages) then add the variable portion (which is all 'aaa' and similar in his test file) to the 'VT' definition, which I don't recall offhand	16:26.47
	chrisl yeah I was thinking the object number already in the PDF file would be easy	16:27.05
	It would work for ps2write and pdfwrite, the PS front-end doesn't need to know what it is, its very presence implies 'send the associated value direct to the device'	16:27.38
chrisl	kens: doesn't that complicate things by pdfwrite having to communicate that back up to the interpreter?	16:27.49
kens	chrisl, indeed it does, yes	16:27.58
	I didn't say it would be easy :-(	16:28.04
chrisl	I was thinking of just adding it in at the interpreter end....	16:28.24
kens	I'm also wondering if we should have a Forms cache for rendering, though its much harder to justify that	16:28.25
	We could certainly add an ID at the interpreter, pdfwrite could use that instead. It would be as complex though, possibly	16:29.15
chrisl	In theory, it's practically the same as a Type 1 pattern cache, but.....	16:29.17
kens	Much easier if the object number relates directly to the existing stored object	16:29.28
	Forms can be much bigger than (sensible) pattern tiles though	16:29.55
henrys	from a marketing perspective if we can call whatever we do for this customer pdf/vt it makes a lot more sense to undertake it, if we can't I"d want to push back.	16:30.24
chrisl	That doesn't matter, if the tile is too big, it uses a clist	16:30.25
Robin_Watts	clist pattern tiles and a form cache would.... what chrisl said.	16:30.34
kens	henrys we absolutely cannot call it PDF/VT since it doesn't involve that at all	16:30.44
henrys	push back if we can't find a simple solution.	16:30.50
kens	I can solve 'part' of the problem	16:31.08
chrisl	kens: a spec_op for the interpreter to say to the device "I have a form, give me a 'something' for the implementation key"?	16:31.44
kens	The [/Pattern] colour space should be fixed anyway, its wrong	16:31.49
Robin_Watts	kens: So... pdfmarks cause a problem cos they write Illustrator metadata into the file.	16:31.57
	Does the illustrator metadata differ for each instance?	16:32.12
	What format is illustrator metadata in?	16:33.16
kens2	D'oh bad time for the net to die. I was just saying that we don't know the ID for the form until after its stored, so it would be best if the interpreter sent a spec_op after the endform saying 'can I put an implementation in here'	16:34.02
Robin_Watts	kens: So... pdfmarks cause a problem cos they write Illustrator metadata into the file.	16:34.16
	Does the illustrator metadata differ for each instance?	16:34.17
	What format is illustrator metadata in?	16:34.19
henrys	kens: I'm fine with the prose except the spelling stuff. lgtm	16:34.38
kens2	Robin_Watts : the illustrator metadata is the same for each instance of the form, therefore using a form cache would resolve that problem, as well as all the others, including performance	16:34.51
	henrys, spelling corrected already	16:34.59
	Robin_Watts : the Illustrator metadata is XML, but actually it could be anything, and there are other kinds of pdfmarks	16:35.27
Robin_Watts	ok, let me rephrase the question a bit...	16:35.27
henrys	kens: if we did pdf/vt would we be able to use that machinery to solve his problem was my question.	16:35.39
Robin_Watts	but pdfmarks write what? arbitrary streams? or an arbitrary pdf object? or multiple objects?	16:36.26
kens2	henrys, yes, but the customer would have to alter their workflow away from PostScript to manufacture the files as PDF/VT. I don't know why they want these files as PDF, but I'm assuming they want them as real PDF files, they aren't intending to use the PDF for printing, otherwise they'd be better staying with PostScript	16:36.35
Robin_Watts	(I am, as you can probably tell ignorant of what pdfmarks are, other than being "some magic that lets you set some pdf stuff from postscript")	16:37.23
kens2	Robin_Watts : pdfmarks can write pretty much anything. This particular one writes a Properties dictionary which references a stream. The Properties dictionary can contain anything which is valid for a dictionary. So this data could be anything which is valid as 'general' PDF	16:37.29
	It can't, for example, write an xref, or a Pages tree or anything like that	16:37.58
Robin_Watts	kens2: pdfmarks are postscript code?	16:38.18
kens2	Yes they are	16:38.23
	But they create PDF objects	16:38.29
chrisl	Hmm, disabling pdfmarks during an execform wouldn't be a general solution to the problem :-(	16:38.45
kens2	They are, as you said, a magic way to construct 'stuff' in a PDF file	16:38.47
Robin_Watts	So there are specific operators that can be called by pdfmarks that generate pdf objects ?	16:38.59
kens2	pdfmark is the operator, the arguments define what type of object is written (and where)	16:39.26
	chrisl I agree, a form cache is a much better solution	16:39.37
Robin_Watts	If I read your email correctly, a form cache would not solve the problem with a change to avoid pdfmarks too ?	16:40.15
kens2	But trying to identify if a random pdfmark matches some random object which we've already written to the file is too much for me to take on	16:40.17
Robin_Watts	s/with/without/	16:40.24
chrisl	But then implementing a full form cache would be quite a lot of work for really very little real world benefit........	16:40.45
kens2	Robin_Watts : yes it would, because we would not execute the form again, so we wouldn't execute the pdfmarks in the form, and so wouldn't end up with different form content streams	16:40.52
Robin_Watts	Ah, I see.	16:41.04
kens2	chrisl a quick one for pdfwrite/ps2write would work well though	16:41.10
Robin_Watts	a form cache does sound like a nice solution.	16:41.15
	And if we can leverage the pattern clist code to do it...	16:41.28
kens2	From my POV its the best solution, but I have no real clue how long it would take to write.	16:41.40
chrisl	kens2: yes, I was thinking of a full blown cache	16:41.44
Robin_Watts	(could we even reuse the pattern cache maybe?)	16:41.53
chrisl	Robin_Watts: this is Ghostscript we're talking about......	16:42.08
	;-)	16:42.13
kens2	: A full-blown cache would take longer than a quick and dirty pdfwrite solution. I guess we could do aomething with the pattern cache code	16:42.15
	I doubt we could reuse it, maybe take some hints	16:42.30
Robin_Watts	What's the lifespan of the pattern cache? per page?	16:42.47
chrisl	The lifetime of the color space object	16:43.08
kens2	Hmm, I assumed it was the lifetime of the job	16:43.08
	That makes more sense chrisl	16:43.26
	No point in keeping the pattern bitmap after the colour space goes away	16:43.43
henrys	kens2: have you had technical conversations with these folks before, are they going to have any idea what you are saying?	16:44.03
Robin_Watts	kens2: Ordinarily I'd be really scared to do anything that involved the clist, but given that michael/ray have already done the pattern clist stuff, I'm guessing that the really nasty decoupling of page/clist has been done already.	16:44.09
kens2	henrys, nope as far as I know I've never spoken to them	16:44.19
chrisl	I'm wary about devoting a lot of time to a form cache because forms are rarely used, and almost never used "properly"	16:44.34
kens2	I've no clue if they will understand any of this, one reason I wanted to run it past you	16:44.35
	I just don't have a good idea how long a 'full' form cache would take to implement. I suspect I could do a quick and dirty implementation for the high level devices quite quickly	16:45.24
	Just add a spec_op after the endform to get a number to store in the Implementation. Check the form dict before beginform and if we have an Implementation, send a different spec_op to the device to say 'draw this form'. If it returns an error, go through the full execform for safety's sake	16:46.39
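The flow just outlined can be modelled with a small toy in C. Everything here is an assumption made for illustration: form_dict, hl_device, dev_get_form_id and dev_draw_form are invented names standing in for the form dictionary, a high level device and the two proposed spec_ops. None of it is Ghostscript code.

```c
#include <assert.h>

#define NO_IMPLEMENTATION (-1)

/* Toy form dictionary: holds the Implementation value once one is known. */
typedef struct {
    int implementation;
    int times_executed; /* how often the full form was actually run */
} form_dict;

/* Toy high level device (think pdfwrite) that can store and replay forms. */
typedef struct {
    int stored_forms;
    int supports_forms;
    int forms_drawn; /* replays served from the stored copy */
} hl_device;

/* Stand-in for the spec_op sent after endform: returns an ID or an error. */
int dev_get_form_id(hl_device *dev)
{
    if (!dev->supports_forms)
        return -1;
    return dev->stored_forms++;
}

/* Stand-in for the 'draw this form' spec_op: 0 on success, -1 on error. */
int dev_draw_form(hl_device *dev, int id)
{
    if (!dev->supports_forms || id < 0 || id >= dev->stored_forms)
        return -1;
    dev->forms_drawn++;
    return 0;
}

void exec_form(hl_device *dev, form_dict *form)
{
    int id;

    /* If an Implementation exists and the device accepts it, skip the form. */
    if (form->implementation != NO_IMPLEMENTATION &&
        dev_draw_form(dev, form->implementation) == 0)
        return;

    /* Otherwise fall back to the full execform, then ask for an ID. */
    form->times_executed++;
    id = dev_get_form_id(dev);
    form->implementation = (id >= 0) ? id : NO_IMPLEMENTATION;
}
```

With a device that supports the two calls, a form used many times is executed once and replayed afterwards; with a device that rejects them, every use falls back to the full execution path.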
chrisl	That would also improve the speed a lot, 'cause you could skip the checksumming	16:47.47
kens2	And also the execution of the form, which I think is where all the time is going	16:48.08
	I'm sure that's how Distiller is getting such performance on this file, if it was running the forms 5000 times it couldn't possibly (and yes, the customer example file is nearly 5,000 pages with 3 different form definitions.....)	16:48.54
henrys	kens2: what does your PaintProc get them? Is it an improvement?	16:49.07
kens2	Its smaller henrys	16:49.19
	About 63Mb instead of 81 Mb	16:49.32
	The problem form is the biggest one and that still ends up in the file 1200 times	16:49.48
rayjj	kens2: that's not much of an improvement compared to Distiller	16:49.53
henrys	kens2: but not anywhere near adobe	16:49.57
kens2	rayjj ^^	16:49.58
	Like I said, if I fix the [/Pattern] so that the shadings don't mess up the form stream, that will almost certainly improve dramatically.	16:50.35
	I obviously can't say for certain without getting the problem fixed, and it's turning out to be surprisingly difficult	16:51.03
rayjj	kens2: and that's the problem with the Shading (Pattern) colorspace, right	16:51.06
kens2	I thought it would be quick to fix, half an hour or so, but its been all afternoon and I'm nowhere with it at the moment	16:51.43
	The problem is that the way the code works when it finds an uncolored pattern it doesn't write the [/Pattern] as a colour space at all	16:52.22
rayjj	if we can pass a PDF_obj_id into some of the dicts (images, Patterns, etc.) it becomes a lot more straightforward to recognize that we already have it, right ?	16:52.30
henrys	kens2: I'd be inclined to put everything you know in a bug, tell them we are still in a "research mode". If you want to just create the bug I'll talk to the customer.	16:52.33
kens2	So our code for finding duplicate colour spaces doesn't work	16:52.34
	henrys OK I can crib the bug content from the email	16:52.48
henrys	okay when I see the bug I'll write the customer and contact support.	16:53.18
kens2	rayjj not really. We have to check at definition whether a defined object is the same as an existing one	16:53.29
	henrys no problem, I'll go do it now.	16:53.46
henrys	kens2: when I talked to them privately I did say it didn't look like something you were going to fix quickly so I think they are "braced"	16:54.01
rayjj	every PDF object is unique -- if we knew the PDF obj#, then we know it's the same isn't it?	16:54.32
kens2	henrys if you and Miles think it's worth it a 'quick' solution would be as I outlined above, to have pdfwrite say 'this form has this ID' and have the form code tell pdfwrite each time it is about to rerun a form. That would get them everything they want, performance and small size (I believe). What I don't know is how long it will take to implement	16:55.16
rayjj	kens2: you just need to know what object you've created for which source PDF object	16:55.28
kens2	rayjj the input is PostScript, so no object numbers	16:55.28
rayjj	kens2: oh. That part I missed.	16:55.49
kens2	That is, unfortunately part of the problem :-(	16:56.08
	BTW we don't even attempt to spot duplicate forms in PDF files, so if they were to take the Distiller output of this file and run it back through pdfwrite they would still get a monster file.	16:57.00
henrys	chrisl: put this in batch.ff:	17:06.45
	Open($argv[2])	17:06.50
	Generate($fontname + "." + $argv[1])	17:06.51
	then arg 1 is otf and arg 2 is a font file.	17:07.38
chrisl	henrys: I got it - I just wasn't sure if "otf" would result in CFF outlines, so I tried it	17:08.05
henrys	chrisl: I looked at that and it didn't seem it converted to TT	17:09.03
chrisl	It's rather poor use of the TLA since OTF doesn't mean it's definitely CFF outlines	17:09.13
henrys	chrisl: I imagine if you started with a TT it wouldn't go to cff... but as I was saying I don't know how good fontforge is, if you start from a pfb and generate pfb you get something larger which is alarming but not completely unexpected.	17:11.27
kens2	henrys one (I hope) comprehensive description in bug #695805	17:12.36
	Also has the customer number and such	17:12.45
chrisl	henrys: of course, there will be loads of cluster diffs, even just changing Type 1 to CFF.....	17:12.50
henrys	kens2: I think we should "sit on it" and discuss it next meeting after I notify the customer	17:13.18
kens2	OK not a problem for me	17:13.27
	I will try and fix the definite bug though	17:13.37
	as a low priority	17:13.41
henrys	kens2: right.	17:13.45
kens2	I'm feeling cr*p again, this bug seems to hit me as the day wears on, so I'm off for the night, see you all tomorrow.	17:14.50
henrys	has anyone not been sick in January?	17:15.30
chrisl	henrys: with those latest fonts from URW, if I have fontforge regenerate pfb's from the ones we got, I get smaller files out: 2.0Mb vs 2.4Mb	17:15.45
henrys	what version of ff?	17:16.30
chrisl	fontforge 20120731	17:16.57
henrys	chrisl: same thing, likely I had it backwards.	17:17.48
	details ;-)	17:18.17
chrisl	I could decrypt the fonts and work out why, but I don't think it's worth it	17:18.37
henrys	chrisl: yeah, I'm sort of annoyed not to have the fonts from the vendor. He's created the fonts in a tool like fontforge where any format is a button push... geez.	17:25.54
	the otf with cff outlines that is.	17:26.32
chrisl	henrys: I'd have thought/hoped they'd be amenable to the request	17:28.44
henrys	I somehow missed this talk when it came out, I wish there was a short written summary of it somewhere, anyway worth a listen if you're into tech and civics: https://www.usenix.org/conference/usenixsecurity13/dr-felten-goes-washington-lessons-18-months-government	17:31.10
chrisl	henrys: there's my bmpcmp (-t 16 -w 3) on the regression dashboard which is gs with the base fonts in CFF (all except symbol and dingbats)	17:33.12
Robin_Watts	chrisl: bmpcmp -filter=.ppmraw :)	17:34.26
chrisl	Robin_Watts: I just forgot.... and it didn't seem worth rerunning when the fuzzy got it down to such a manageable number	17:35.13
henrys	chrisl: oh that's why my office is warm...	17:35.27
rayjj	chrisl: is there a simple way to just get a few devices built into gs (other than autogen.sh and just edit Makefile) ? I want just bit, bitrgb, bitcmyk, bitrgbtags	17:35.27
Robin_Watts	Are there any non halftoned ones there?	17:36.10
chrisl	rayjj: with configure, do: --with-drivers=bit,bitrgb,bitcmyk,bitrgbtags	17:36.35
	rayjj: But you'll need pdfwrite, too, or gs won't work....	17:36.54
rayjj	Robin_Watts: all of the bit devices can be any depth: 1, 2, 4 or 8 bits per component with -dGrayValues=2, 4, 16, 256	17:36.57
henrys	chrisl: I've gone through a page and a half and don't see anything that wouldn't pass "fuzzy"	17:37.04
Robin_Watts	rayjj: Different conversation :)	17:37.13
henrys	chrisl: are these the new fonts converted?	17:37.21
chrisl	henrys: yes	17:37.26
Robin_Watts	henrys: They don't pass fuzzy cos they are halftoned :)	17:37.35
chrisl	Wot Robin_Watts just said.....	17:38.00
rayjj	Robin_Watts: sorry -- that makes sense that fuzzy doesn't work with halftoned images	17:38.14
chrisl	There's a few on page 9 that are more noticeable, but not "wrong"	17:38.55
henrys	chrisl: can you do the filter so we can all look at them quickly?	17:39.16
chrisl	henrys: running now	17:41.43
henrys	thanks	17:41.57
chrisl	Hmm, except it's not appeared in the queue......	17:42.34
henrys	rayjj: did you send out the email to the potential customer?	17:42.49
cryptopsy	how can i move with arrows around a large picture opened in mupdf?	17:42.57
henrys	I didn't see it.	17:42.57
rayjj	henrys: still collecting numbers on linux x86. It's easy enough to also provide the ARM ROM sizes for the builds so I have those, but collecting the clist RAM size is harder. And I am doing mono as well as color based on the printers you had in that link (all at 600 and 1200)	17:45.10
henrys	okay great	17:45.53
rayjj	I will send it to tech for comment BEFORE it goes to the customer, just in case anyone has comments or questions	17:46.15
	and I have the Font size broken out so if we have the 136 CFF we can plug those in (presumably compressed)	17:47.08
chrisl	So, for the 136 fonts from URW, converting from Type 1 to CFF goes from 7Mb to 4Mb and OTF/CFF comes in at 4.5Mb (and TTFs from URW comes in at 12Mb).	17:54.36
rayjj	chrisl: what about zipping each font -- what's the total then ? (that's what romfs would do if we enabled compression)	17:55.24
chrisl	rayjj: which ones, the T1 or the CFF?	17:55.53
rayjj	chrisl: the CFF or the OTF's	17:56.13
henrys	chrisl: right but we want to know the numbers with the new glyphs and we don't have those. I hope to extrapolate from the 3 fonts they sent us but that looks precarious	17:56.21
	I had hoped to extrapolate ^^^	17:56.54
rayjj	I am just curious how compressible the CFF's will be	17:57.22
chrisl	rayjj: ah, give me a sec, I made a mistake there.....	17:58.11
rayjj	based on what we did at CalComp (with Peter's wrfont stuff) zip gave us about 80% of the original; bzip2 got it to 70%	17:58.35
chrisl	rayjj: ~2.9Mb gzipping the cffs individually	17:59.33
rayjj	chrisl: great! so about 75% of the original size	17:59.59
chrisl	rayjj: yeh, but I'd worry about the impact on performance......	18:00.28
rayjj	and since current romfs doesn't compress, that's a reduction down from 7Mb to 2.9	18:00.52
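The size comparison above can be reproduced with a short script. This is only a sketch: `gzipped_total` and `percent_of` are hypothetical helper names, not part of Ghostscript or romfs, and the megabyte figures are the rounded ones quoted in the discussion (7Mb Type 1, 4Mb CFF, ~2.9Mb for the CFFs gzipped individually). Compressing each file on its own is assumed to approximate per-file compression in romfs.

```python
import gzip
from pathlib import Path


def gzipped_total(paths, level=9):
    """Return (raw_bytes, gzipped_bytes) summed over files compressed one at a time."""
    raw = packed = 0
    for path in paths:
        data = Path(path).read_bytes()
        raw += len(data)
        # Each file is compressed independently, as when gzipping fonts individually.
        packed += len(gzip.compress(data, compresslevel=level))
    return raw, packed


def percent_of(new_size, old_size):
    """Express new_size as a percentage of old_size."""
    return 100.0 * new_size / old_size


# Rounded figures quoted in the discussion, in megabytes.
T1_MB, CFF_MB, CFF_GZ_MB = 7.0, 4.0, 2.9
print(f"gzipped CFF vs CFF:    {percent_of(CFF_GZ_MB, CFF_MB):.1f}%")
print(f"gzipped CFF vs Type 1: {percent_of(CFF_GZ_MB, T1_MB):.1f}%")
```

To measure a real font set, something like `gzipped_total(Path("fonts").glob("*.cff"))` could be used, where the `fonts` directory and the `.cff` extension are placeholders.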
Robin_Watts	Hey marcosw. Feeling better?	18:00.55
rayjj	chrisl: fonts get loaded rarely	18:01.23
henrys	chrisl: do you have current numbers for the ufst?	18:01.40
chrisl	henrys: I don't think you want to know them.......	18:01.55
rayjj	and gzip is pretty fast at decompression (unlike bzip2)	18:01.56
	The UFST 80 is about 800Kb iirc	18:02.19
	but it's been a while since I checked	18:02.45
chrisl	The 135 PS3 fonts FCO is 1.2Mb	18:03.19
rayjj	chrisl: that seems reasonable. Of course, we don't know what glyph set it has	18:03.47
chrisl	rayjj: The glyph set is rather bonkers, frankly	18:04.13
henrys	rayjj: I hope we do; we just did a big analysis of urw vs. ufst, didn't we?	18:04.31
rayjj	chrisl: as is the UFST quality, IMHO	18:04.33
chrisl	You also have to add another ~150Kb for the plugin and the other fco which I forget what it's for....	18:05.14
rayjj	chrisl: I think that's symbols or dingbats or something	18:05.38
chrisl	Yeh, something like that.....	18:05.51
henrys	I do wonder how many duplicate glyphs we could find in the 136 or at least visually the same or don't care.	18:06.25
chrisl	henrys: We're still not getting anywhere near the glyph set of the MT fonts if you allow the multitudes of "unstyled" "non-standard" glyphs they include	18:06.43
Robin_Watts	chrisl: You and Ken argued the other day that postscript can do 'things' with the fonts which means that we have to have CFF rather than TTF. I don't want to open that particular argument again, but I was wondering what things they could do? Other than 'get the outlines for a given glyph' ?	18:06.50
	(sorry, feel free to ignore that until after the existing conversation dies down)	18:07.42
rayjj	it might be interesting to pick a fairly common font like "Arial/Helvetica" and compare the glyph quality between URW and UFST and find some particularly ugly UFST glyphs	18:07.43
henrys	chrisl: i.e. cjk?	18:07.44
	Robin_Watts: release the kraken	18:08.18
chrisl	henrys: no, those crazy geometric shapes and "symbols"	18:08.47
rayjj	Robin_Watts: PS can (and often does) add glyphs to the CharStr dict and plug them in -- they add in Type 1	18:08.51
henrys	anyway I'm going to do a run, be back in an hour or so. I'll write kens' customer when I return.	18:09.24
rayjj	Robin_Watts: and PS sometimes tries to diddle with the matrices to do artificial slant typeface	18:09.48
henrys	chrisl: I thought we put those in the order for urw - the box things?	18:09.49
Robin_Watts	gs can handle truetype fonts - presumably if someone adds glyphs to those fonts it "works"?	18:09.53
chrisl	henrys: some of them, we left out the crazier ones	18:10.11
Robin_Watts	(i.e. I bet we don't actually ever add to the real font)	18:10.16
chrisl	Robin_Watts: no that doesn't work.	18:10.31
henrys	chrisl: okay I think that's reasonable.	18:10.35
Robin_Watts	chrisl: Ah, so we really do manipulate the cff internals for that ?	18:10.55
chrisl	Robin_Watts: yes, or Type 1 internals - the point is, it needs to be a Postscript font, not a Postscript layer on top of another font format	18:11.36
Robin_Watts	chrisl: OK. Curiosity doused for now. Thanks :)	18:12.00
chrisl	Robin_Watts: the problem is, if we try to make a glyph from a "real" charstring, and the dictionary isn't from a charstring based font, bad things could happen - like calling a subr, expecting another charstring, and getting an integer back	18:13.13
rayjj	Robin_Watts: plus, the TTF's (at least before stripping out tables) are 12Mb compared to 4Mb for CFF. I'm not sure you'd get that back by stripping tables	18:13.27
henrys	Robin_Watts: I've seen many postscript programs that do a condition if it is type 1 and assume it is type 2 if the condition fails on an internal font - I recall the position of the euro when adobe first released cff and moved the euro around.	18:13.28
Robin_Watts	rayjj: I am not advocating the use of TTF at all.	18:13.49
rayjj	Robin_Watts: good. otherwise you might get a midnight visit from some angry Scots ;-)	18:14.35
chrisl	And the reason we can "hack" around all that for the UFST/MT fonts is because to render a glyph from those, we only use one standard, and two non-standard keys from the font dictionary, so the rest of the dictionary can be made to look just like a "real" type 1 font.	18:15.48
rayjj	chrisl: I am curious about your statement that gs won't run without pdfwrite. I built it and it runs fine (at least tiger)	18:17.13
chrisl	rayjj: really? Our startup code specifically loads pdfwrite initially or, at least, did not that long ago....	18:18.06
rayjj	chrisl: Note that in order to get it to build I do need a patch I haven't uploaded yet.	18:18.19
	chrisl: hmm... it might be when doing PDFs: annots.pdf fails with: Error: /undefined in --run-- Operand stack: --nostringval-- OutputIntent --nostringval--	18:21.32
	I guess I'll fix that as well since we really don't want printers to require pdfwrite	18:22.08
chrisl	No, during startup we (did?) load pdfwrite and do a getdeviceparams - presumably as pretty much every other device uses a subset of the params pdfwrite uses	18:22.54
rayjj	chrisl: that may have been fixed in gs_pdfwr.ps that now uses "IsDistiller" spec_op	18:24.55
chrisl	Ah, possibly. I did discuss it with kens a while ago	18:25.21
	I'm going to have to finish now, I'm starting to get a headache (late night, last night!).......	18:28.46
cryptopsy	bye for now	18:47.11
henrys	marcosw: are you back to work?	19:27.00
Robin_Watts	mvrhel_laptop: For the logs... 1 of the top 5 commits on robin/master is a fix for SOT builds in your code. Trivial thing. Let me know if you're not happy with it.	20:07.29
henrys	is the cluster broken? I get back all segv's that I can't reproduce locally.	20:58.21
	ah never mind, it's perfectly correct: all pcl -> pdf jobs are failing with otf, which makes sense.	21:05.40