| 20150129 |
tor8 | Robin_Watts: got a minute? | 11:43.09 |
Robin_Watts | sure. | 11:45.38 |
| <FX: Runs to fetch caffeine> | 11:45.48 |
tor8 | I nuked the embedded contexts in fz_stream and fz_output. the results are nicer than I thought, but I'll need to run some benchmarks to see if it's made a performance impact. | 11:46.19 |
Robin_Watts | tor8: So everywhere you pass a stream or an output you now need to pass context too? | 11:47.00 |
tor8 | my instinct tells me it should be a wash, passing an extra argument vs doing a pointer dereference to get the context back out | 11:47.02 |
| Robin_Watts: yeah. making the API symmetrical: always take a fz_context, no brain activity required | 11:47.20 |
Robin_Watts | symmetry is a good argument. | 11:47.40 |
tor8 | and hopefully removing the need for the rebind magic that can go wrong so easily | 11:47.52 |
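A minimal sketch of the two calling conventions tor8 is comparing, assuming a toy in-memory stream. `ctx_t`, `stream_old`, `stream_new`, `read_old` and `read_new` are hypothetical stand-ins, not MuPDF's actual `fz_stream` API. In the old style the context is fetched back out of the stream (and has to be rebound if the stream is used from another context); in the new style it is simply the first argument of every call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for fz_context (per-thread error/alloc state). */
typedef struct ctx_t { int error_depth; } ctx_t;

/* Old style: the stream remembers the context it was created with,
 * so it must be "rebound" before use from a different context. */
typedef struct stream_old {
    ctx_t *ctx;
    const unsigned char *data;
    size_t len, pos;
} stream_old;

/* New style: no embedded context at all. */
typedef struct stream_new {
    const unsigned char *data;
    size_t len, pos;
} stream_new;

/* Copy up to n bytes from data[*pos..len) into buf. */
static size_t copy_bytes(const unsigned char *data, size_t len, size_t *pos,
                         unsigned char *buf, size_t n)
{
    size_t avail = len - *pos;
    if (n > avail)
        n = avail;
    memcpy(buf, data + *pos, n);
    *pos += n;
    return n;
}

/* Old style: one pointer dereference to get the context back out. */
size_t read_old(stream_old *stm, unsigned char *buf, size_t n)
{
    ctx_t *ctx = stm->ctx;
    assert(ctx != NULL);   /* a real reader would use ctx for error handling */
    return copy_bytes(stm->data, stm->len, &stm->pos, buf, n);
}

/* New style: the context is always the first argument, like every
 * other call in the API. */
size_t read_new(ctx_t *ctx, stream_new *stm, unsigned char *buf, size_t n)
{
    assert(ctx != NULL);   /* a real reader would use ctx for error handling */
    return copy_bytes(stm->data, stm->len, &stm->pos, buf, n);
}
```

Either way the reader ends up holding the same context pointer; the difference is only where it comes from, which is why tor8 expects the cost to be a wash.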
Robin_Watts | I still *really* want to replace fz_stream with estream. | 11:48.03 |
tor8 | what is estream? | 11:48.14 |
Robin_Watts | estream is the stream abstraction we use in sot. | 11:48.26 |
tor8 | I'm not opposed to rewriting the fz_stream api and merging fz_stream and fz_output somehow | 11:48.30 |
| the entry points in most places should be similar with read/write | 11:48.51 |
Robin_Watts | I had not considered merging stream and output. | 11:49.09 |
tor8 | should let us filter stuff both on input and output, so we can compress using the same filters | 11:49.32 |
| though our current filters only decompress | 11:49.38 |
| not something I consider hugely important, though | 11:50.03 |
Robin_Watts | no. | 11:50.09 |
tor8 | but we should be using fz_stream and fz_output for all of our i/o needs | 11:50.26 |
| and also more consistently use fz_buffer for memory buffers | 11:50.47 |
Robin_Watts | I think estreams are easier to think about/code, but we have stuff working now, so it's purely a refinement at the moment. | 11:50.59 |
tor8 | anyway, what I wanted to discuss is the fz_device interface | 11:51.12 |
Robin_Watts | got a patch on line? | 11:51.13 |
tor8 | yeah, tor/drop branch | 11:51.19 |
| lots of search-and-replace style commits | 11:51.43 |
| I was hoping to make the device callback functions of the form foo_fill_path(fz_context, void *user, ...other arguments...) rather than fz_device | 11:52.31 |
Robin_Watts | That might tie in with something else. | 11:53.40 |
| You have a stylistic thing of doing structs of a fixed size (like fz_device) and then having a void * user pointer. | 11:54.17 |
| My instinct is to have a struct like fz_device, and then have other structs based off that, like: fz_device_foo being { fz_device base; extra fields... } | 11:55.24 |
tor8 | you mean to embed the fz_device at the beginning of the user data instead? | 11:55.31 |
| yeah. I've considered that | 11:55.35 |
Robin_Watts | It means less mallocing generally. | 11:55.51 |
tor8 | needs nasty type casting though, but less mallocing and means passing the same pointer semi-transparently | 11:56.04 |
Robin_Watts | I think it's important that in device callbacks we always pass the device. | 11:56.16 |
tor8 | if only C had the plan9 C extension where an anonymous struct at the beginning of a struct would type pun | 11:56.23 |
Robin_Watts | s/callbacks/functions/ | 11:56.28 |
| tor8: yeah. | 11:56.35 |
| You don't need type casting. | 11:56.57 |
| You can just pass &dev->base rather than dev | 11:57.21 |
tor8 | somewhere you need to cast back to fz_device_foo | 11:57.58 |
Robin_Watts | and if you're consistent about using 'base', you can #define BASE(x) (&(x)->base) | 11:58.08 |
tor8 | or you mean to let the user see fz_device_foo structs? | 11:58.21 |
Robin_Watts | tor8: True. | 11:58.24 |
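A sketch of the embedded-base-struct pattern Robin_Watts describes, assuming hypothetical `device` and `counting_device` types rather than MuPDF's real `fz_device`. The device functions receive the device pointer itself (as Robin argues they must), the derived struct embeds the base as its first member so one allocation covers both, and the cast back happens once, inside the callback.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ctx_t ctx_t;   /* hypothetical opaque context */
typedef struct device device;

/* Base "class". Every device function receives the device itself,
 * so a filtering device can forward to the device it wraps. */
struct device {
    int hints;
    void (*fill_path)(ctx_t *ctx, device *dev, int path_id);
};

/* Derived "class": the base struct is the first member. */
typedef struct counting_device {
    device base;
    int fills;
} counting_device;

/* Upcast helper; the argument is parenthesised so BASE(p + 1) works. */
#define BASE(x) (&(x)->base)

static void counting_fill_path(ctx_t *ctx, device *dev, int path_id)
{
    /* Downcast: valid because base is the first member. */
    counting_device *cdev = (counting_device *)dev;
    (void)ctx;
    (void)path_id;
    cdev->fills++;
}

counting_device *new_counting_device(void)
{
    counting_device *cdev = calloc(1, sizeof *cdev);   /* one allocation */
    if (cdev)
        cdev->base.fill_path = counting_fill_path;
    return cdev;
}

/* Generic entry point; callers only ever see the base type. */
void device_fill_path(ctx_t *ctx, device *dev, int path_id)
{
    assert(dev != NULL);
    if (dev->fill_path)
        dev->fill_path(ctx, dev, path_id);
}
```

Because `base` is the first member, converting between `device *` and `counting_device *` is well defined in standard C; the Plan 9 extension tor8 mentions would only make the upcast implicit.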
tor8 | Robin_Watts: this is what we do for fz_document/pdf_document | 11:59.09 |
Robin_Watts | Right. Possibly cos I was the last one to touch that? :) | 11:59.33 |
tor8 | but annoyingly they have duplicated fields... both fz_document and pdf_document have a ctx, so there's both doc->ctx and doc->super.ctx :/ | 11:59.33 |
| it's newish code and I might have been influenced by you :) | 11:59.46 |
Robin_Watts | tor8: yeah that's not great. | 11:59.52 |
tor8 | no, but I think we might be able to get rid of the fz_document ctx | 12:00.09 |
Robin_Watts | But to return to the device functions... | 12:00.10 |
tor8 | but I'll deal with *_document later, devices are my current focus | 12:00.24 |
Robin_Watts | I think we *do* need to pass dev into every device function, not just dev->user | 12:00.47 |
tor8 | yeah. we need the 'hints' flag in the subdevices | 12:02.00 |
| the error depth thing is done at the wrapper layer that calls the function pointers IIRC | 12:02.12 |
Robin_Watts | And if you want to do a device that (say) maps stroked text to text, you want the dev so you can pass on. | 12:02.42 |
tor8 | Robin_Watts: true enough | 12:03.45 |
| okay, so we really do need to pass the fz_device along rather than just the user pointer | 12:04.07 |
Robin_Watts | I feel vaguely uneasy about passing context everywhere in speed critical places. | 12:04.14 |
tor8 | which makes the choice of user pointer or embedded struct moot, the subdevices can do whichever | 12:04.33 |
| Robin_Watts: yeah, but I'm going to measure some benchmarks now and see if it actually has an impact | 12:04.48 |
Robin_Watts | tor8: perfect. | 12:04.56 |
tor8 | I wouldn't be surprised if it's actually faster for those functions that actually use the context | 12:05.00 |
Robin_Watts | the user pointer/embedded struct is an issue. | 12:05.05 |
| cos we currently have fz_new_device (or something like that) that allocates a basic fz_device struct. | 12:05.36 |
tor8 | Robin_Watts: right. pass a size_of_extra argument as well? | 12:06.18 |
Robin_Watts | We would need an fz_init_device (or an fz_new_device that took sizeof(required struct)) to neatly allow the oo way of working. | 12:06.21 |
| tor8: I think we should standardise on one or the other. | 12:06.37 |
tor8 | Robin_Watts: the construction of fz_document is a nasty mess | 12:06.43 |
Robin_Watts | but yes, that would be a good start. | 12:06.47 |
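One way the size-taking constructor could work, sketched with hypothetical names (`new_device`, `bbox_device`): the caller passes `sizeof` its derived struct, the base constructor allocates and zeroes that much and initialises the shared fields, and the derived constructor fills in the rest. This illustrates the idea under discussion, not the existing `fz_new_device` signature.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical base device with the fields every device shares. */
typedef struct device {
    int refs;
    int hints;
} device;

/* Hypothetical derived device that tracks a bounding box. */
typedef struct bbox_device {
    device base;              /* must be the first member */
    double x0, y0, x1, y1;
} bbox_device;

/* Generic constructor: allocate (and zero) `size` bytes, which must be
 * at least the base struct, then initialise the shared fields. */
device *new_device(size_t size)
{
    assert(size >= sizeof(device));
    device *dev = calloc(1, size);
    if (dev)
        dev->refs = 1;
    return dev;
}

/* Derived constructor: request its own size, then fill in the extras. */
bbox_device *new_bbox_device(void)
{
    bbox_device *bdev = (bbox_device *)new_device(sizeof(bbox_device));
    if (bdev) {
        bdev->x0 = bdev->y0 = 1e30;     /* empty box: min > max */
        bdev->x1 = bdev->y1 = -1e30;
    }
    return bdev;
}
```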
tor8 | so we could consider cleaning both of these up at the same time | 12:06.59 |
Robin_Watts | fz_document is slightly weird cos of pdf_no_run maybe. | 12:07.15 |
tor8 | also, the pdf_process struct uses another pattern, init/fin on a stack allocated struct | 12:07.26 |
| Robin_Watts: possibly, but I think it just grew organically and mutated in odd ways | 12:07.41 |
Robin_Watts | tor8: init/fin may not be a bad thing to follow. | 12:07.57 |
tor8 | which is a pattern which we could standardise on, but I'm not sure the gain (one malloc/free pair) is worth the extra API cognitive load | 12:08.23 |
Robin_Watts | init/fin are construct and destruct without malloc/free | 12:08.26 |
tor8 | Robin_Watts: yeah. I'm not convinced of its benefit in non-performance related code, since we're heavy malloc users everywhere else | 12:08.55 |
Robin_Watts | yeah, but where it's needed for performance it's a win. | 12:10.15 |
| init/fin can be thought of as an internal part of new/free. | 12:10.36 |
| and we only expose it in circumstances where it's required. | 12:10.54 |
| it's not a huge change to the way we work. | 12:11.14 |
tor8 | no, it's fine for our internal interfaces | 12:11.29 |
| but not something I'd want to expose to the public (since it requires non-opaque data types) | 12:11.48 |
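A sketch of init/fin as the internal halves of new/free, using a hypothetical `processor` type. `processor_init`/`processor_fin` construct and destruct in caller-supplied storage (for example on the stack, as `pdf_process` is described as doing); `processor_new`/`processor_free` wrap them with the malloc/free pair. Only the new/free pair works with an opaque type, which is tor8's reason for keeping init/fin internal.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical non-opaque type: callers must know its size to place it
 * on the stack, which is why init/fin stays an internal interface. */
typedef struct processor {
    int *stack;
    int depth;
    int cap;
} processor;

/* init: construct in caller-provided storage (no malloc of the struct). */
int processor_init(processor *p, int cap)
{
    assert(cap > 0);
    p->stack = malloc(sizeof(int) * (size_t)cap);
    p->depth = 0;
    p->cap = p->stack ? cap : 0;
    return p->stack ? 0 : -1;
}

/* fin: destruct, leaving the storage itself to the caller. */
void processor_fin(processor *p)
{
    free(p->stack);
    p->stack = NULL;
    p->cap = 0;
}

/* new/free: the public pair, built on init/fin. */
processor *processor_new(int cap)
{
    processor *p = malloc(sizeof *p);
    if (p && processor_init(p, cap) != 0) {
        free(p);
        p = NULL;
    }
    return p;
}

void processor_free(processor *p)
{
    if (p) {
        processor_fin(p);
        free(p);
    }
}
```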
Robin_Watts | indeed. | 12:12.41 |
tor8 | agh, don't quit the editor when focused on xchat! | 12:14.32 |
paulgardiner | init and fin are a pain in some places within sot. They force you to have an object plus a flag saying whether the object is allocated or not, although I suppose it would be possible to find special values within the struct to mark something as not allocated | 12:21.33 |
Robin_Watts | paulgardiner: Yes, they can be a pain, but they can also be powerful. | 12:22.39 |
| SOT has them for a reason. | 12:22.54 |
paulgardiner | Not in the case I'm refering to. | 12:23.09 |
| I think the argument was "well pthreads uses it and that's Linux so it must be right" | 12:23.35 |
| If the malloc is the only possible failure then that can be a good reason for init/fin | 12:24.03 |
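A sketch of the bookkeeping paulgardiner describes, with hypothetical types. When an owner embeds init/fin objects and init can fail part way through, cleanup must know which members were initialised: either the owner keeps a flag per member, or a sentinel value (here `data == NULL`) makes `fin` safe to call on a zeroed struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical embedded object with init/fin. */
typedef struct buffer {
    unsigned char *data;
    size_t len;
} buffer;

static int buffer_init(buffer *b, size_t len)
{
    assert(len > 0);
    b->data = malloc(len);
    b->len = b->data ? len : 0;
    return b->data ? 0 : -1;
}

static void buffer_fin(buffer *b)
{
    free(b->data);        /* free(NULL) is a no-op */
    b->data = NULL;
    b->len = 0;
}

/* Variant 1: the owner records which members were initialised. */
typedef struct owner_flagged {
    buffer a, b;
    bool a_inited, b_inited;
} owner_flagged;

void owner_flagged_fin(owner_flagged *o)
{
    if (o->b_inited)
        buffer_fin(&o->b);
    if (o->a_inited)
        buffer_fin(&o->a);
    o->a_inited = o->b_inited = false;
}

int owner_flagged_init(owner_flagged *o)
{
    o->a_inited = o->b_inited = false;
    if (buffer_init(&o->a, 64) != 0)
        return -1;
    o->a_inited = true;
    if (buffer_init(&o->b, 64) != 0) {
        owner_flagged_fin(o);   /* only a is torn down */
        return -1;
    }
    o->b_inited = true;
    return 0;
}

/* Variant 2: a sentinel (data == NULL) means "not initialised", so fin
 * is safe on a zero-initialised owner and no flags are needed. */
typedef struct owner_sentinel {
    buffer a, b;
} owner_sentinel;

void owner_sentinel_fin(owner_sentinel *o)
{
    buffer_fin(&o->b);
    buffer_fin(&o->a);
}
```

The sentinel variant works here only because `free(NULL)` is a no-op and `= {0}` initialisation yields null pointers; objects whose fin has other side effects would still need the flag.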
kens | heads for some lunch | 12:27.41 |
tor8 | Robin_Watts: inconclusive first benchmark results indicate that passing the ctx everywhere is actually faster | 13:09.35 |
| than fetching it out of a struct | 13:09.51 |
Robin_Watts | ok... | 13:10.01 |
tor8 | and on other test files it's marginally slower | 13:11.29 |
| pdfref17 is faster with context everywhere (by ~50 ms over the whole document) | 13:12.54 |
| it's in the hard to measure category... | 13:13.53 |
| and about the same diff slower on a more graphically intensive document | 13:15.54 |
| so I'd say not to worry about it for now, the API benefits outweigh any performance differences | 13:16.13 |
Robin_Watts | I guess that if it's hard to measure, it's in the 'we don't care' area. | 13:16.22 |
tor8 | especially since it seems to be a wash | 13:16.24 |
| some things marginally faster, others marginally slower | 13:16.38 |
| I expect we might be slower now than before, but only for functions that don't use the ctx themselves | 13:17.15 |
| otherwise the time spent passing the extra argument is just spent fetching it back out of the struct | 13:17.28 |
| and those functions are pretty rare | 13:17.36 |
| and any inlined code should be faster, since there are no pointers that have to be fetched | 13:18.05 |
Robin_Watts | We use ctx a lot. Either to pass it to other functions, or in try/catch. | 13:18.31 |
tor8 | Robin_Watts: have you got an arm platform to test on? | 13:18.33 |
Robin_Watts | tor8: I have a beagleboard. | 13:18.43 |
| and a pi. | 13:18.50 |
tor8 | could you do a "time mudraw -5 pdfref17.pdf" on those comparing master to tor/drop? | 13:19.26 |
| if you're not busy? | 13:19.36 |
Robin_Watts | not immediately. | 13:19.43 |
tor8 | it would be good to know if arm makes more difference, intel cpus are so finicky and hard to predict | 13:20.05 |
Robin_Watts | it would. | 13:20.15 |
kens | Hmm Office on Android now available: | 13:23.23 |
| http://www.theregister.co.uk/2015/01/29/apple_office_android_tablets/ | 13:23.23 |
pedro_mac | seems to be cloud only though, can't open docs already on your device | 13:37.01 |
| (or at least not directly from their app) | 13:37.31 |
| and can only save edits to a cloud share | 13:39.03 |
kens | No idea, it's clearly limited since you need an Office 365 subscription to create/edit docs | 13:42.10 |
| Makes it a viewer then I guess | 13:42.24 |
pedro_mac | it has dropbox & sharepoint support, so you don't need a subscription, but it doesn't let you save/load to device | 13:46.58 |
| strange choice | 13:47.06 |
kens | The article says you need a subscription to Office 265 to edit or create documents, is that not correct ? | 13:47.53 |
| Office 365* | 13:48.01 |
pedro_mac | I have it on my phone and just use dropbox | 13:48.24 |
Robin_Watts | Office 265 is the european public sector version where they take more holidays. | 13:48.32 |
kens | :-D | 13:48.39 |
| Some of the comments on the web site seem to indicate that indeed you don't need a subscription | 13:50.25 |
pedro_mac | its a massively cut-down editing experience too | 13:52.43 |
| I get the choice of making my text red, yellow or green | 13:52.57 |
kens | Well, that's good news for SOT right ? | 13:53.11 |
Robin_Watts | The news is not as bad as it might have been. | 13:53.38 |
pedro_mac | no font selection either - just size, style and a choice of 3 colours | 13:54.18 |
| probably enough to encourage people to buy the pro version if/when it arrives | 13:54.45 |
| they have 1 million downloads so far though, and a 4 star rating | 13:55.48 |
| for a 1 star feature set - hey, what do you want for nothing? | 13:56.17 |
neves | Hi! I'm developing an Android app which should download a PDF file from a URL. My problem is that I must not download all pages at once, but each page separately, so the user doesn't have to wait until the whole document has been downloaded. Is this possible using MuPDF? | 13:56.25 |
Robin_Watts | neves: MuPDF has code in its core to allow documents to be displayed as they are downloaded. | 13:57.32 |
| With a suitable linearized file, MuPDF will therefore show pages as they appear. | 13:58.07 |
| If you have an http fetcher that can do byte range requests, you can even jump ahead in the file, and pages will be preferentially loaded as you look at them. | 13:58.46 |
| BUT... that code is not hooked up for the android version. | 13:59.02 |
| You can probably hook it up yourself if you want if you are a competent C programmer. | 13:59.41 |
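Fetching only the parts of the file a page needs relies on HTTP byte-range requests. The helper below is a hypothetical sketch that formats the `Range` header for a byte span; it is not part of MuPDF, and wiring it into MuPDF's progressive loading on Android would still need the C/JNI work described here. HTTP byte ranges are inclusive at both ends, and a server that honours the request answers `206 Partial Content`.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format an HTTP/1.1 Range header for bytes
 * [offset, offset + length). Because HTTP ranges are inclusive, the
 * last byte is offset + length - 1. Returns the number of characters
 * written, or -1 for an empty/overflowing range or a too-small buffer. */
int format_range_header(char *out, size_t outsize,
                        unsigned long long offset, unsigned long long length)
{
    assert(out != NULL);
    if (length == 0 || offset > ULLONG_MAX - (length - 1))
        return -1;
    int n = snprintf(out, outsize, "Range: bytes=%llu-%llu",
                     offset, offset + length - 1);
    if (n < 0 || (size_t)n >= outsize)
        return -1;
    return n;
}
```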
| neves: Are you aware of the licensing situation with MuPDF? | 14:00.08 |
neves | No, I'm not. Yet | 14:00.27 |
Robin_Watts | MuPDF is developed by Artifex (us). | 14:01.39 |
| We release it in 2 ways. | 14:01.47 |
| Firstly, if you are happy to abide by the terms of the GNU GPL, then you can use MuPDF under that license. | 14:02.07 |
| This means (among other things) that you must give away all the source to your application. | 14:02.28 |
| (sorry, that should read GNU AGPL, but the difference is probably moot in this case). | 14:02.49 |
| If the terms of the GNU AGPL are impossible for you to live with, then we can sell you a commercial license that lets you do what you want. | 14:03.47 |
chrisl | To clarify, with (A)GPL you don't have to "give away all the source....", you retain the copyright, but the source must be openly available/modifiable and re-distributable under the same license terms | 14:04.56 |
neves | OK, thanks. But since I'm an Android developer, I think it would be very hard to implement changes in the MuPDF core.. | 14:09.47 |
Robin_Watts | neves: The changes required are not in the mupdf core. | 14:11.52 |
| They are in the android specific wrappers around the core. | 14:12.02 |
| but that will require some C/JNI/Java | 14:12.49 |
| Or with a suitable commercial contract, we could possibly do the work for you. | 14:13.13 |
Robin_Watts | foods. bbs. | 14:15.04 |
neves | OK, thanks for your help! Sorry, I can't offer you a commercial contract since I'm just a developer | 14:37.22 |
Robin_Watts | neves: No worries. If you decide to tackle it, let us know. | 15:01.01 |
henrys | kens: the problem I punted to you? Did you see it? | 15:41.26 |
kens | Yes, there are several parts to it | 15:44.24 |
| I was going to send round an email for comment, to tech | 15:44.43 |
| But I was hoping to fix an actual limitation exposed by the code first. | 15:44.58 |
| Which I'm getting nowhere with at the moment | 15:45.06 |
| I'll finish up the email and send it for comment | 15:45.23 |
henrys | kens: they contacted me again yesterday for a schedule so if it's a big "todo" let's make a bug with your analysis and point them to it. | 15:51.07 |
kens | It can be partially solved 'reasonably' quickly, partly on their end, partly on ours. A major portion is not trivial and would be weeks to months of work. I'll finish this email and you can read it. | 15:51.51 |
henrys | chrisl: we missed your gs font expertise yesterday. | 16:00.38 |
chrisl | henrys: I saw some of the discussion - but didn't read it all in detail | 16:01.19 |
henrys | chrisl: probably don't need to. | 16:04.32 |
chrisl | henrys: what I can say is that we can't use the same trickery for TTFs that we do for UFST/Microtype - we could do something sort of similar, but it would be potentially *much* more complicated | 16:05.42 |
henrys | chrisl: don't worry about that... but otf cff is the direction. | 16:06.37 |
chrisl | henrys: Okay, I've been doing a little experimenting, although I've been using just CFF, not OTF...... | 16:07.25 |
henrys | why does adobe acrobat ship with otf instead of cff? | 16:07.59 |
chrisl | henrys: I assume for greater compatibility - the Windows font engine can (sort of) handle OTF/CFF, but not bare CFF | 16:08.41 |
henrys | chrisl: well we can think about converting what urw is going to deliver but it doesn't look like a big savings over otf. | 16:10.03 |
chrisl | henrys: the base URW gs font set (the latest ones we just got) went from 2.4Mb in Type 1 pfb format, to 1.1Mb in "bare" CFF | 16:10.27 |
henrys | yes, there is about 40 to 50 percent from type 1 to cff, but as for the savings, I was talking about the difference between cff and otf with cff outlines | 16:11.29 |
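For reference, the arithmetic on chrisl's figures for the base URW set (2.4Mb as Type 1 pfb, 1.1Mb as bare CFF):

$$
\frac{1.1}{2.4} \approx 0.458, \qquad 1 - \frac{1.1}{2.4} \approx 0.542
$$

so the bare CFF set comes out at roughly 46% of the Type 1 size, a reduction of about 54%.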
chrisl | Yeh, I'm just not sure right now how to poke fontforge's scripting interface to produce OTF/CFF - hence I tried CFF first | 16:12.19 |
henrys | tor posted a script to pastebin I've been using. | 16:12.46 |
| well I just took out the cff extension and used otf in his script and it seems to work. But how good is fontforge? I feel like I'm depending on this thing for these numbers and have no clue if it's producing something reasonable. Have you done a cluster push with a cff substitute for a type 1? | 16:14.34 |
chrisl | I haven't clusterpushed yet, the CFF fonts don't *quite* work with Ghostscript because of some of the crazy sh*t we do when loading fonts | 16:15.38 |
henrys | chrisl: I think I can do a cluster push with the type 1 courier converted to otf with pcl. Be interesting to see the bmpcmp | 16:16.20 |
chrisl | henrys: I'm not sure that will work. | 16:16.56 |
kens | henrys: mail sent to tech, it's a bit lengthy I'm afraid but it's hard to explain this problem quickly. I really would like you at least to read it and consider what (if anything) we should do about this particular issue. Other opinions welcomed by the way (hint hint; chrisl, ray, Robin etc) | 16:18.05 |
Robin_Watts | kens: 'form cache' sounds like something that would be done in MuPDF using a display list. | 16:20.28 |
| The idea of having to use clist to do it in gs makes me go cold(er). | 16:20.46 |
kens | Robin_Watts : there's lots of ways to do it, GS doesn't do it at all | 16:20.52 |
| It doesn't have to be done for low level devices at all. It's possible that we could store an /Implementation in the form dictionary and have the form code check it; if it finds that, it doesn't execute the form, just sends the Implementation to the high level device (for example) | 16:21.42 |
Robin_Watts | "MD65" | 16:21.56 |
kens | Ooops | 16:22.01 |
Robin_Watts | 13 times as good as MD5. | 16:22.09 |
kens | 13 times slower too ? | 16:22.17 |
Robin_Watts | "WHich" | 16:22.52 |
| "/R19as" | 16:23.21 |
kens | I was in a hurry writing a lot of this..... | 16:23.33 |
Robin_Watts | "WHen" | 16:23.38 |
kens | Really I;'m more interested in comments about the facts and implications than spelling mistakes | 16:23.56 |
henrys | and Shapr is the one I noticed ;-) | 16:24.04 |
Robin_Watts | "R18 Do" | 16:24.19 |
| kens: yes, just mentioning stuff as I go. | 16:24.35 |
henrys | how is this different from the pdf/vt that I'm constantly badgered about at tradeshows? Is it completely separate? | 16:24.37 |
kens | This is PostScript input | 16:24.48 |
| PDF/VT is a way of doing the same task with PDF input (sort of) | 16:25.06 |
henrys | right but presumably if we had PDF/VT machinery in the code.... then it would be useful to this problem. | 16:25.42 |
Robin_Watts | kens: Reads well to me. | 16:26.07 |
kens | He could rewrite his contents as PDF/VT, yes | 16:26.07 |
chrisl | kens: I'm assuming that the Implementation key could simply be an integer index, and that would be sufficient for high level devices? | 16:26.47 |
kens | He would have to convert the fixed portion to PDF separately (3 pages), then add the variable portion (which is all 'aaa' and similar in his test file) to the 'VT' definition, which I don't recall offhand | 16:26.47 |
| chrisl yeah I was thinking the object number already in the PDF file would be easy | 16:27.05 |
| It would work for ps2write and pdfwrite; the PS front-end doesn't need to know what it is, its very presence implies 'send the associated value direct to the device' | 16:27.38 |
chrisl | kens: doesn't that complicate things by pdfwrite having to communicate that back up to the interpreter? | 16:27.49 |
kens | chrisl, indeed it does, yes | 16:27.58 |
| I didn't say it would be easy :-( | 16:28.04 |
chrisl | I was thinking of just adding it in at the interpreter end.... | 16:28.24 |
kens | I'm also wondering if we should have a Forms cache for rendering, though it's much harder to justify that | 16:28.25 |
| We could certainly add an ID at the interpreter, pdfwrite could use that instead. It would be as complex though, possibly | 16:29.15 |
chrisl | In theory, it's practically the same as a Type 1 pattern cache, but..... | 16:29.17 |
kens | Much easier if the object number relates directly to the existing stored object | 16:29.28 |
| Forms can be much bigger than (sensible) pattern tiles though | 16:29.55 |
henrys | from a marketing perspective if we can call whatever we do for this customer pdf/vt it makes a lot more sense to undertake it; if we can't I'd want to push back. | 16:30.24 |
chrisl | That doesn't matter, if the tile is too big, it uses a clist | 16:30.25 |
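A sketch of the storage decision chrisl alludes to, assuming a form cache modelled on the Type 1 pattern cache: keep a rendered bitmap when it is small enough, otherwise fall back to a command list (clist) that can be replayed. `choose_form_storage`, `cache_kind` and the byte limit are hypothetical; Ghostscript's real pattern cache has its own thresholds and structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: how a cached form would be stored. */
typedef enum { CACHE_BITMAP, CACHE_CLIST } cache_kind;

/* Keep a rendered bitmap if it fits in max_bitmap_bytes, otherwise
 * record a command list (clist) to replay, as large pattern tiles do. */
cache_kind choose_form_storage(size_t width, size_t height,
                               size_t bytes_per_pixel, size_t max_bitmap_bytes)
{
    assert(bytes_per_pixel > 0);
    /* Divide rather than multiply so huge forms cannot overflow. */
    if (width != 0 && height > max_bitmap_bytes / width)
        return CACHE_CLIST;
    if (width * height > max_bitmap_bytes / bytes_per_pixel)
        return CACHE_CLIST;
    return CACHE_BITMAP;
}
```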
Robin_Watts | clist pattern tiles and a form cache would.... what chrisl said. | 16:30.34 |
kens | henrys we absolutely cannot call it PDF/VT since it doesn't involve that at all | 16:30.44 |
henrys | push back if we can't find a simple solution. | 16:30.50 |
kens | I can solve 'part' of the problem | 16:31.08 |
chrisl | kens: a spec_op for the interpreter to say to the device "I have a form, give me a 'something' for the implementation key"? | 16:31.44 |
kens | The [/Pattern] colour space should be fixed anyway, its wrong | 16:31.49 |
Robin_Watts | kens: So... pdfmarks cause a problem cos they write Illustrator metadata into the file. | 16:31.57 |
| Does the illustrator metadata differ for each instance? | 16:32.12 |
| What format is illustrator metadata in? | 16:33.16 |
kens2 | D'oh bad time for the net to die. I was just saying that we don't know the ID for the form until after its stored, so it would be best if the interpreter sent a spec_op after the endform saying 'can I put an implementation in here' | 16:34.02 |
Robin_Watts | kens: So... pdfmarks cause a problem cos they write Illustrator metadata into the file. | 16:34.16 |
| Does the illustrator metadata differ for each instance? | 16:34.17 |
| What format is illustrator metadata in? | 16:34.19 |
henrys | kens: I'm fine with the prose except the spelling stuff. lgtm | 16:34.38 |
kens2 | Robin_Watts : the illustrator metadata is the same for each instance of the form, therefore using a form cache would resolve that problem, as well as all the others, including performance | 16:34.51 |
| henrys, spelling corrected already | 16:34.59 |
| Robin_Watts : the Illustrator metadata is XML, but actually it *could* be anything, and there are other kinds of pdfmarks | 16:35.27 |
Robin_Watts | ok, let me rephrase the question a bit... | 16:35.27 |
henrys | kens: if we did pdf/vt would we be able to use that machinery to solve his problem was my question. | 16:35.39 |
Robin_Watts | but pdfmarks write what? arbitrary streams? or an arbitrary pdf object? or multiple objects? | 16:36.26 |
kens2 | henrys, yes, but the customer would have to alter their workflow away from PostScript to manufacture the files as PDF/VT. I don't know why they want these files as PDF, but I'm assuming they want them as *real* PDF files and aren't intending to use the PDF for printing, otherwise they'd be better staying with PostScript | 16:36.35 |
Robin_Watts | (I am, as you can probably tell ignorant of what pdfmarks are, other than being "some magic that lets you set some pdf stuff from postscript") | 16:37.23 |
kens2 | Robin_Watts : pdfmarks can write pretty much anything. This particular one writes a Properties dictionary which references a stream. The Properties dictionary can contain anything which is valid for a dictionary. So this data could be anything which is valid as 'general' PDF | 16:37.29 |
| It can't, for example, write an xref, or a Pages tree or anything like that | 16:37.58 |
Robin_Watts | kens2: pdfmarks are postscript code? | 16:38.18 |
kens2 | Yes they are | 16:38.23 |
| But they create PDF objects | 16:38.29 |
chrisl | Hmm, disabling pdfmarks during an execform wouldn't be a general solution to the problem :-( | 16:38.45 |
kens2 | THey are, as you said, a magic way to construct 'stuff' in a PDF file | 16:38.47 |
Robin_Watts | So there are specific operators that can be called by pdfmarks that generate pdf objects ? | 16:38.59 |
kens2 | pdfmark is the operator, the arguments define what type of object is written (and where) | 16:39.26 |
| chrisl I agree, a form cache is a much better solution | 16:39.37 |
Robin_Watts | If I read your email correctly, a form cache would not solve the problem with a change to avoid pdfmarks too ? | 16:40.15 |
kens2 | But trying to identify if a random pdfmark matches some random object which we've already written to the file is too much for me to take on | 16:40.17 |
Robin_Watts | s/with/without/ | 16:40.24 |
chrisl | But then implementing a full form cache would be quite a lot of work for really very little real world benefit........ | 16:40.45 |
kens2 | Robin_Watts : yes it would, because we would not execute the form again, so we wouldn't execute the pdfmarks in the form, and so wouldn't end up with different form content streams | 16:40.52 |
Robin_Watts | Ah, I see. | 16:41.04 |
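Why checksumming alone struggles here, as a sketch: checksum-based deduplication hashes each form's content stream and reuses an earlier object when the digest matches, so two executions of the same form that differ by even one byte (a resource name, or per-instance pdfmark output) are written out twice. FNV-1a stands in for the MD5 digest purely for brevity; `seen_form` and `dedup_form` are hypothetical. A form cache sidesteps this by not re-executing the form at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a: a short stand-in for the MD5 digest. */
static uint64_t fnv1a(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Hypothetical record of a form stream already written to the output. */
typedef struct seen_form {
    uint64_t digest;
    int obj_num;
} seen_form;

/* Return the object number of an earlier stream with the same digest,
 * or 0 if none (recording this one as next_obj). A real implementation
 * would also compare the bytes to rule out a hash collision. */
int dedup_form(seen_form *table, int *count, int capacity,
               const char *content, int next_obj)
{
    assert(table != NULL && count != NULL);
    uint64_t d = fnv1a(content, strlen(content));
    for (int i = 0; i < *count; i++)
        if (table[i].digest == d)
            return table[i].obj_num;
    if (*count < capacity) {
        table[*count].digest = d;
        table[*count].obj_num = next_obj;
        (*count)++;
    }
    return 0;
}
```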
kens2 | chrisl a quick one for pdfwrite/ps2write would work well though | 16:41.10 |
Robin_Watts | a form cache does sound like a nice solution. | 16:41.15 |
| And if we can leverage the pattern clist code to do it... | 16:41.28 |
kens2 | From my POV it's the best solution, but I have no real clue how long it would take to write. | 16:41.40 |
chrisl | kens2: yes, I was thinking of a full blown cache | 16:41.44 |
Robin_Watts | (could we even reuse the pattern cache maybe?) | 16:41.53 |
chrisl | Robin_Watts: this is Ghostscript we're talking about...... | 16:42.08 |
| ;-) | 16:42.13 |
kens2 | : A full-blown cache would take longer than a quick and dirty pdfwrite solution. I guess we could do something with the pattern cache code | 16:42.15 |
| I doubt we could reuse it, maybe take some hints | 16:42.30 |
Robin_Watts | What's the lifespan of the pattern cache? per page? | 16:42.47 |
chrisl | The lifetime of the color space object | 16:43.08 |
kens2 | Hmm, I assumed it was the lifetime of the job | 16:43.08 |
| That makes more sense chrisl | 16:43.26 |
| No point in keeping the pattern bitmap after the colour space goes away | 16:43.43 |
henrys | kens2: have you had technical conversations with these folks before, are they going to have any idea what you are saying? | 16:44.03 |
Robin_Watts | kens2: Ordinarily I'd be really scared to do anything that involved the clist, but given that michael/ray have already done the pattern clist stuff, I'm guessing that the really nasty decoupling of page/clist has been done already. | 16:44.09 |
kens2 | henrys, nope as far as I know I've never spoken to them | 16:44.19 |
chrisl | I'm wary about devoting a lot of time to a form cache because forms are rarely used, and almost never used "properly" | 16:44.34 |
kens2 | I've no clue if they will understand any of this, one reason I wanted to run it past you | 16:44.35 |
| I just don't have a good idea how long a 'full' form cache would take to implement. I suspect I could do a quick and dirty implementation for the high level devices quite quickly | 16:45.24 |
| Just add a spec_op after the endform to get a number to store in the Implementation. Check the form dict before beginform and if we have an Implementation, send a different spec_op to the device to say 'draw this form'. If it returns an error, go through the full execform for safety's sake | 16:46.39 |
chrisl | That would also improve the speed a lot, 'cause you could skip the checksumming | 16:47.47 |
kens2 | And also the execution of the form, which I think is where all the time is going | 16:48.08 |
| I'm sure that's how Distiller is getting such performance on this file, if it was running the forms 5000 times it couldn't possibly (and yes, the customer example file is nearly 5,000 pages with 3 different form definitions.....) | 16:48.54 |
henrys | kens2: what does your PaintProc get them? Is it an improvement? | 16:49.07 |
kens2 | It's smaller henrys | 16:49.19 |
| About 63Mb instead of 81 Mb | 16:49.32 |
| The problem form is the biggest one and that still ends up in the file 1200 times | 16:49.48 |
rayjj | kens2: that's not much of an improvement compared to Distiller | 16:49.53 |
henrys | kens2: but not anywhere near adobe | 16:49.57 |
kens2 | rayjj ^^ | 16:49.58 |
| Like I said, if I fix the [/Pattern] so that the shadings don't mess up the form stream, that will almost certainly improve dramatically. | 16:50.35 |
| I obviously can't say for certain without getting the problem fixed, and it's turning out to be surprisingly difficult | 16:51.03 |
rayjj | kens2: and that's the problem with the Shading (Pattern) colorspace, right | 16:51.06 |
kens2 | I thought it would be quick to fix, half an hour or so, but it's been all afternoon and I'm nowhere with it at the moment | 16:51.43 |
| The problem is that the way the code works when it finds an uncolored pattern it doesn't write the [/Pattern] as a colour space at all | 16:52.22 |
rayjj | if we can pass a PDF_obj_id into some of the dicts (images, Patterns, etc.) it becomes a lot more straightforward to recognize that we already have it, right ? | 16:52.30 |
henrys | kens2: I'd be inclined to put everything you know in a bug, tell them we are still in a "research mode". If you want to just create the bug I'll talk to the customer. | 16:52.33 |
kens2 | So our code for finding duplicate colour spaces doesn't work | 16:52.34 |
| henrys OK I can crib the bug content from the email | 16:52.48 |
henrys | okay, when I see the bug I'll write the customer and contact support. | 16:53.18 |
kens2 | rayjj not really. We have to check at definition whether a defined object is the same as an existing one | 16:53.29 |
| henrys no problem, I'll go do it now. | 16:53.46 |
henrys | kens2: when I talked to them privately I did say it didn't look like something you were going to fix quickly so I think they are "braced" | 16:54.01 |
rayjj | every PDF object is unique -- if we knew the PDF obj#, then we know it's the same isn't it? | 16:54.32 |
kens2 | henrys if you and Miles think it's worth it, a 'quick' solution would be as I outlined above, to have pdfwrite say 'this form has this ID' and have the form code tell pdfwrite each time it is about to rerun a form. That would get them everything they want, performance and small size (I believe). What I don't know is how long it will take to implement | 16:55.16 |
rayjj | kens2: you just need to know what object you've created for which source PDF object | 16:55.28 |
kens2 | rayjj the input is PostScript, so no object numbers | 16:55.28 |
rayjj | kens2: oh. That part I missed. | 16:55.49 |
kens2 | That is, unfortunately part of the problem :-( | 16:56.08 |
| BTW we don't even attempt to spot duplicate forms in PDF files, so if they were to take the Distiller output of this file and run it back through pdfwrite they would still get a monster file. | 16:57.00 |
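rayjj's suggestion, sketched for the PDF-input case only (kens2's point stands: PostScript input has no object numbers). A hypothetical table maps each source object number to the output object already written for it, so a second reference reuses the existing object instead of writing a duplicate. `obj_map` and its functions are illustrative names, not pdfwrite internals.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical map from input PDF object numbers to the object
 * numbers already written to the output file (0 = not written yet). */
typedef struct obj_map {
    int *out;
    int size;
} obj_map;

int obj_map_init(obj_map *m, int size)
{
    m->out = calloc((size_t)size, sizeof *m->out);
    m->size = m->out ? size : 0;
    return m->out ? 0 : -1;
}

void obj_map_fin(obj_map *m)
{
    free(m->out);
    m->out = NULL;
    m->size = 0;
}

/* Return the output object for src_num, "writing" it only the first
 * time it is referenced. */
int obj_map_get_or_write(obj_map *m, int src_num, int *next_out_num)
{
    assert(src_num > 0 && src_num < m->size);
    if (m->out[src_num] == 0)
        m->out[src_num] = (*next_out_num)++;
    return m->out[src_num];
}
```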
henrys | chrisl: put this in batch.ff: | 17:06.45 |
| Open($argv[2]) | 17:06.50 |
| Generate($fontname + "." + $argv[1]) | 17:06.51 |
| then arg 1 is otf and arg 2 onward is a font file. | 17:07.38 |
chrisl | henrys: I got it - I just wasn't sure if "otf" would result in CFF outlines, so I tried it | 17:08.05 |
henrys | chrisl: I looked at that and it didn't seem it converted to TT | 17:09.03 |
chrisl | It's rather poor use of the TLA since OTF doesn't mean it's definitely CFF outlines | 17:09.13 |
henrys | chrisl: I imagine if you started with a TT it wouldn't go to cff... but as I was saying I don't know how good fontforge is, if you start from a pfb and generate pfb you get something larger which is alarming but not completely unexpected. | 17:11.27 |
kens2 | henrys one (I hope) comprehensive description in bug #695805 | 17:12.36 |
| Also has the customer number and such | 17:12.45 |
chrisl | henrys: of course, there will be loads of cluster diffs, even just changing Type 1 to CFF..... | 17:12.50 |
henrys | kens2: I think we should "sit on it" and discuss it next meeting after I notify the customer | 17:13.18 |
kens2 | OK not a problem for me | 17:13.27 |
| I will try and fix the definite bug though | 17:13.37 |
| as a low priority | 17:13.41 |
henrys | kens2: right. | 17:13.45 |
kens2 | I'm feeling cr*p again, this bug seems to hit me as the day wears on, so I'm off for the night, see you all tomorrow. | 17:14.50 |
henrys | has anyone not been sick in January? | 17:15.30 |
chrisl | henrys: with those latest fonts from URW, if I have fontforge regenerate pfb's from the ones we got, I get smaller files out: 2.0Mb vs 2.4Mb | 17:15.45 |
henrys | what version of ff? | 17:16.30 |
chrisl | fontforge 20120731 | 17:16.57 |
henrys | chrisl: same thing, likely I had it backwards. | 17:17.48 |
| details ;-) | 17:18.17 |
chrisl | I could decrypt the fonts and work out why, but I don't think it's worth it | 17:18.37 |
henrys | chrisl: yeah, I'm sort of annoyed not to have the fonts from the vendor. He's created the fonts in a tool like fontforge where any format is a button push... geez. | 17:25.54 |
| the otf with cff outlines that is. | 17:26.32 |
chrisl | henrys: I'd have thought/hoped they'd be amenable to the request | 17:28.44 |
henrys | I somehow missed this talk when it came out; I wish there was a short written summary of it somewhere. Anyway, worth a listen if you're into tech and civics: https://www.usenix.org/conference/usenixsecurity13/dr-felten-goes-washington-lessons-18-months-government | 17:31.10 |
chrisl | henrys: there's my bmpcmp (-t 16 -w 3) on the regression dashboard which is gs with the base fonts in CFF (all except symbol and dingbats) | 17:33.12 |
Robin_Watts | chrisl: bmpcmp -filter=.ppmraw :) | 17:34.26 |
chrisl | Robin_Watts: I just forgot.... and it didn't seem worth rerunning when the fuzzy got it down to such a manageable number | 17:35.13 |
henrys | chrisl: oh that's why my office is warm... | 17:35.27 |
rayjj | chrisl: is there a simple way to just get a few devices built into gs (other than autogen.sh and just edit Makefile) ? I want just bit, bitrgb, bitcmyk, bitrgbtags | 17:35.27 |
Robin_Watts | Are there any non halftoned ones there? | 17:36.10 |
chrisl | rayjj: with configure, do: --with-drivers=bit,bitrgb,bitcmyk,bitrgbtags | 17:36.35 |
| rayjj: But you'll need pdfwrite, too, or gs won't work.... | 17:36.54 |
rayjj | Robin_Watts: all of the bit devices can be any depth: 1, 2, 4 or 8 bits per component with -dGrayValues=2, 4, 16, 256 | 17:36.57 |
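The depth/GrayValues pairs rayjj lists follow one rule: GrayValues = 2^(bits per component). A minimal Python sketch of that mapping, assuming only the four depths he names are valid; the helper names are hypothetical and are not Ghostscript code.

```python
SUPPORTED_DEPTHS = (1, 2, 4, 8)  # bits per component, as listed for the bit devices


def gray_values_for_depth(bits_per_component: int) -> int:
    """Return the -dGrayValues setting that selects this depth (2 ** bits)."""
    if bits_per_component not in SUPPORTED_DEPTHS:
        raise ValueError("bit devices support 1, 2, 4 or 8 bits per component")
    return 1 << bits_per_component


def depth_for_gray_values(gray_values: int) -> int:
    """Inverse mapping: the bits per component implied by a GrayValues value."""
    bits = gray_values.bit_length() - 1
    # Reject values that are not exact powers of two or map to an unsupported depth.
    if gray_values != 1 << bits or bits not in SUPPORTED_DEPTHS:
        raise ValueError(f"unsupported GrayValues: {gray_values}")
    return bits


if __name__ == "__main__":
    for bits in SUPPORTED_DEPTHS:
        print(f"{bits} bit(s) per component -> -dGrayValues={gray_values_for_depth(bits)}")
```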
henrys | chrisl: I've gone through a page and a half and don't see anything that wouldn't pass "fuzzy" | 17:37.04 |
Robin_Watts | rayjj: Different conversation :) | 17:37.13 |
henrys | chrisl: are these the new fonts converted? | 17:37.21 |
chrisl | henrys: yes | 17:37.26 |
Robin_Watts | henrys: They don't pass fuzzy cos they are halftoned :) | 17:37.35 |
chrisl | Wot Robin_Watts just said..... | 17:38.00 |
rayjj | Robin_Watts: sorry -- that makes sense that fuzzy doesn't work with halftoned images | 17:38.14 |
chrisl | There's a few on page 9 that are more noticeable, but not "wrong" | 17:38.55 |
henrys | chrisl: can you do the filter so we can all look at them quickly? | 17:39.16 |
chrisl | henrys: running now | 17:41.43 |
henrys | thanks | 17:41.57 |
chrisl | Hmm, except it's not appeared in the queue...... | 17:42.34 |
henrys | rayjj: did you send out the email to the potential customer? | 17:42.49 |
cryptopsy | how can i move with arrows around a large picture opened in mupdf? | 17:42.57 |
henrys | I didn't see it. | 17:42.57 |
rayjj | henrys: still collecting numbers on linux x86. It's easy enough to also provide the ARM ROM sizes for the builds so I have those, but collecting the clist RAM size is harder. And I am doing mono as well as color based on the printers you had in that link (all at 600 and 1200) | 17:45.10 |
henrys | okay great | 17:45.53 |
rayjj | I will send it to tech for comment BEFORE it goes to the customer, just in case anyone has comments or questions | 17:46.15 |
| and I have the Font size broken out so if we have the 136 CFF we can plug those in (presumably compressed) | 17:47.08 |
chrisl | So, for the 136 fonts from URW, converting from Type 1 to CFF goes from 7Mb to 4Mb and OTF/CFF comes in at 4.5Mb (and TTFs from URW comes in at 12Mb). | 17:54.36 |
rayjj | chrisl: what about zipping each font -- what's the total then ? (that's what romfs would do if we enabled compression) | 17:55.24 |
chrisl | rayjj: which ones, the T1 or the CFF? | 17:55.53 |
rayjj | chrisl: the CFF or the OTF's | 17:56.13 |
henrys | chrisl: right but we want to know the numbers with the new glyphs and we don't have those. I hope to extrapolate from the 3 fonts they sent us but that looks precarious | 17:56.21 |
| I had hoped to extrapolate ^^^ | 17:56.54 |
rayjj | I am just curious how compressible the CFF's will be | 17:57.22 |
chrisl | rayjj: ah, give me a sec, I made a mistake there..... | 17:58.11 |
rayjj | based on what we did at CalComp (with Peter's wrfont stuff), zip gave us about 80% of the original; bzip2 got it to 70% | 17:58.35 |
chrisl | rayjj: ~2.9Mb gzipping the cffs individually | 17:59.33 |
rayjj | chrisl: great! so about 75% of the original size | 17:59.59 |
chrisl | rayjj: yeh, but I'd worry about the impact on performance...... | 18:00.28 |
rayjj | and since current romfs doesn't compress, that's a reduction down from 7Mb to 2.9 | 18:00.52 |
Robin_Watts | Hey marcosw. Feeling better? | 18:00.55 |
rayjj | chrisl: fonts get loaded rarely | 18:01.23 |
henrys | chrisl: do you have current numbers for the ufst? | 18:01.40 |
chrisl | henrys: I don't think you want to know them....... | 18:01.55 |
rayjj | and gzip is pretty fast at decompression (unlike bzip2) | 18:01.56 |
| The UFST 80 is about 800Kb iirc | 18:02.19 |
| but it's been a while since I checked | 18:02.45 |
chrisl | The 135 PS3 fonts FCO is 1.2Mb | 18:03.19 |
rayjj | chrisl: that seems reasonable. Of course, we don't know what glyph set it has | 18:03.47 |
chrisl | rayjj: The glyph set is rather bonkers, frankly | 18:04.13 |
henrys | rayjj: I hope we do; we just did a big analysis of urw vs. ufst, didn't we? | 18:04.31 |
rayjj | chrisl: as is the UFST quality, IMHO | 18:04.33 |
chrisl | You also have to add another ~150Kb for the plugin and the other fco which I forget what it's for.... | 18:05.14 |
rayjj | chrisl: I think that's symbols or dingbats or something | 18:05.38 |
chrisl | Yeh, something like that..... | 18:05.51 |
henrys | I do wonder how many duplicate glyphs we could find in the 136 or at least visually the same or don't care. | 18:06.25 |
chrisl | henrys: We're still not getting anywhere near the glyph set of the MT fonts *if* you allow the multitudes of "unstyled" "non-standard" glyphs they include | 18:06.43 |
Robin_Watts | chrisl: You and Ken argued the other day that postscript can do 'things' with the fonts which means that we have to have CFF rather than TTF. I don't want to open that particular argument again, but I was wondering what things they could do? Other than 'get the outlines for a given glyph' ? | 18:06.50 |
| (sorry, feel free to ignore that until after the existing conversation dies down) | 18:07.42 |
rayjj | it might be interesting to pick a fairly common font like "Arial/Helvetica" and compare the glyph quality between URW and UFST and find some particularly ugly UFST glyphs | 18:07.43 |
henrys | chrisl: i.e. cjk? | 18:07.44 |
| Robin_Watts: release the Kraken | 18:08.18 |
chrisl | henrys: no, those crazy geometric shapes and "symbols" | 18:08.47 |
rayjj | Robin_Watts: PS can (and often does) add glyphs to the CharStrings dict and plug them in -- they add them as Type 1 charstrings | 18:08.51 |
henrys | anyway, I'm going to go for a run, back in an hour or so. I'll write to Ken's customer when I return. | 18:09.24 |
rayjj | Robin_Watts: and PS sometimes tries to diddle with the matrices to make an artificially slanted typeface | 18:09.48 |
henrys | chrisl: I thought we put those in the order for urw - the box things? | 18:09.49 |
Robin_Watts | gs can handle truetype fonts - presumably if someone adds glyphs to those fonts it "works"? | 18:09.53 |
chrisl | henrys: some of them, we left out the crazier ones | 18:10.11 |
Robin_Watts | (i.e. I bet we don't actually ever add to the real font) | 18:10.16 |
chrisl | Robin_Watts: no that doesn't work. | 18:10.31 |
henrys | chrisl: okay I think that's reasonable. | 18:10.35 |
Robin_Watts | chrisl: Ah, so we really do manipulate the cff internals for that ? | 18:10.55 |
chrisl | Robin_Watts: yes, or Type 1 internals - the point is, it needs to be a Postscript font, not a Postscript layer on top of another font format | 18:11.36 |
Robin_Watts | chrisl: OK. Curiosity doused for now. Thanks :) | 18:12.00 |
chrisl | Robin_Watts: the problem is, if we try to make a glyph from a "real" charstring, and the dictionary isn't from a charstring-based font, bad things could happen - like calling a subr, expecting another charstring, and getting an integer back | 18:13.13 |
rayjj | Robin_Watts: plus, the TTF's (at least before stripping out tables) are 12Mb compared to 4Mb for CFF. I'm not sure you'd get that back by stripping tables | 18:13.27 |
henrys | Robin_Watts: I've seen many postscript programs that test whether an internal font is type 1 and assume it is type 2 if that test fails - I recall the position of the euro when adobe first released cff and moved the euro around. | 18:13.28 |
Robin_Watts | rayjj: I am not advocating the use of TTF at all. | 18:13.49 |
rayjj | Robin_Watts: good. otherwise you might get a midnight visit from some angry Scots ;-) | 18:14.35 |
chrisl | And the reason we can "hack" around all that for the UFST/MT fonts is because to render a glyph from those, we only use one standard, and two non-standard keys from the font dictionary, so the rest of the dictionary can be made to look just like a "real" type 1 font. | 18:15.48 |
rayjj | chrisl: I am curious about your statement that gs won't run without pdfwrite. I built it and it runs fine (at least tiger) | 18:17.13 |
chrisl | rayjj: really? Our startup code specifically loads pdfwrite initially or, at least, did not that long ago.... | 18:18.06 |
rayjj | chrisl: Note that in order to get it to build I do need a patch I haven't uploaded yet. | 18:18.19 |
| chrisl: hmm... it might be when doing PDFs: annots.pdf fails with: Error: /undefined in --run-- Operand stack: --nostringval-- OutputIntent --nostringval-- | 18:21.32 |
| I guess I'll fix that as well since we really don't want printers to require pdfwrite | 18:22.08 |
chrisl | No, during startup we (did?) load pdfwrite and do a getdeviceparams - presumably as pretty much every other device uses a subset of the params pdfwrite uses | 18:22.54 |
rayjj | chrisl: that may have been fixed in gs_pdfwr.ps that now uses "IsDistiller" spec_op | 18:24.55 |
chrisl | Ah, possibly. I did discuss it with kens a while ago | 18:25.21 |
| I'm going to have to finish now, I'm starting to get a headache (late night, last night!)....... | 18:28.46 |
cryptopsy | bye for now | 18:47.11 |
henrys | marcosw: are you back to work? | 19:27.00 |
Robin_Watts | mvrhel_laptop: For the logs... 1 of the top 5 commits on robin/master is a fix for SOT builds in your code. Trivial thing. Let me know if you're not happy with it. | 20:07.29 |
henrys | is the cluster broken? I get back all segvs that I can't reproduce locally. | 20:58.21 |
| ah, never mind, it's perfectly correct: all pcl -> pdf jobs are failing with otf, which makes sense. | 21:05.40 |