| 20151110 |
marcosw | Robin_Watts: WOW, the SM951 is fast! | 04:37.09 |
| bonnie++ reports 1.5 GB/s writes and 2.3 GB/s reads. Amusingly most of the results are ++++, indicating the test finished too fast to measure accurately, at least with the default parameters. | 04:53.43 |
| The SM951 does get warm; it went from 40C to 81C in a couple of minutes with continuous read/writes. The limit is 85C but it appeared to start throttling when it hit 80C. The M2 slot doesn't get much airflow with the case I'm using. | 05:37.46 |
James | Hi | 11:13.52 |
ghostbot | Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line. | 11:13.52 |
James | Is anyone there | 11:13.56 |
kens | No | 11:14.09 |
Guest49912 | Ha ha | 11:14.15 |
| Quick question for you | 11:14.21 |
| Mupdf looks great | 11:14.50 |
| Does it support interactive PDF for page navigation? | 11:15.09 |
kens | I don't know what you mean by that, are you talking about following hyperlinks ? | 11:15.38 |
Guest49912 | Yes | 11:15.48 |
| Exactly | 11:15.50 |
| So users can navigate from a contents page | 11:16.03 |
Robin_Watts | MuPDF is a portable C library for opening/manipulating PDF (and other files). | 11:16.10 |
Guest49912 | Instead of flicking through hundreds of pages | 11:16.14 |
Robin_Watts | The core of mupdf certainly reads that information from the PDF files. | 11:16.37 |
kens | It may depend on the OS; the demo viewer varies with the platform. However, it's certainly possible in the core library and I thought it was present in the viewers. I am not a MuPDF developer though | 11:16.38 |
tor8 | Guest49912: Yes. The MuPDF viewer supports hyperlinks. The mobile and the new desktop viewers also support the outline/table of contents for quick navigation. | 11:16.58 |
Robin_Watts | The viewers for various different platforms are just thin wrappers around the core. not all of them expose all of the functionality of the core. | 11:17.03 |
Guest49912 | Ok, we're running it on android from google play | 11:17.04 |
Robin_Watts | So, depending on what platform you want to run it on, ymmv. | 11:17.17 |
Guest49912 | Just trying to work out the best way of getting it to work | 11:17.18 |
| It's failing at the moment with hyperlinks set from acrobat | 11:17.36 |
Robin_Watts | Guest49912: Yes the android app supports link following. But only if you click the 'link' icon in the top bar first :) | 11:17.49 |
| Look for the icon that looks like a chain. | 11:17.59 |
mhayden | just stumbled onto mupdf and i love it -- thanks for the work there, folks | 13:52.55 |
sebras | mhayden: are you using the android app? or the pre-built apps for windows/ios/linux/winrt? | 13:55.22 |
mhayden | i'm using it in linux (Fedora) | 13:55.45 |
sebras | mhayden: alright, it is always interesting to know. thanks! :) | 14:44.39 |
mhayden | no problem ;) | 14:53.52 |
henrys | Robin_Watts: tiffscaled has really become an important device for us, great stuff! | 15:15.20 |
Robin_Watts | henrys: We should consider a tiffscaledets device maybe at some point. | 15:16.19 |
henrys | Robin_Watts: good idea. | 15:18.07 |
Robin_Watts | All the smarts of tiffscaled are in the downscaler which is used by lots of devices. It would be a matter of updating that to know how to do ETS. | 15:20.55 |
| and then we'd use the existing tiffscaled etc, but with -dDownScaleMode=1 to tell it to use ETS. | 15:21.29 |
henrys | Robin_Watts: I'll add it to the project list on the agenda. Maybe something rayjj would be interested in doing. | 15:22.42 |
marcosw | Robin_Watts: thanks for the Phil email analysis. | 15:23.32 |
Robin_Watts | marcosw: np. | 15:23.42 |
henrys | Other than asking about the mupdf release I'm not really feeling the need for a meeting. | 15:24.59 |
| tkamppeter: were you able to help out the folks trying to use the old konica minolta printers? I visited their office last week and mentioned the community was having printing problems. | 15:27.23 |
| tkamppeter: not the HQ but the "printer language group" | 15:28.01 |
| I saw Spy last night and I'm still laughing about it. Good fun movie. | 15:28.44 |
mvrhel_laptop | I was wondering if anyone had gone ahead and reinstalled the gsview beta (for windows) that I had chrisl put up a couple of weeks ago. wanted to make sure there were not any major issues | 15:28.50 |
henrys | mvrhel_laptop: I will today at some point | 15:29.28 |
mvrhel_laptop | thanks | 15:29.36 |
henrys | chrisl: does it make sense to test the final commercial release or do you think the last problem we had is isolated and not likely to recur | 15:30.03 |
| ? | 15:30.05 |
Robin_Watts | mvrhel_laptop: This is still called gsview_setup_6.0.exe ? | 15:30.51 |
chrisl | henrys: I do test it - but I rarely use the VS projects, and almost always build at the command line | 15:30.53 |
marcosw | Robin_Watts: take a look at the 85 and 170 spots in the tiffscaled output, the error diffusion fails in an interesting way. | 15:31.03 |
mvrhel_laptop | Robin_Watts: yeah we probably should have put a beta_# after it | 15:31.16 |
| let me look what the tag was | 15:31.30 |
Robin_Watts | marcosw: That's not an unexpected failure. | 15:31.59 |
| mvrhel_laptop: Just wanted to make sure I was downloading the right version. | 15:32.19 |
| We should have a build number in there. | 15:32.26 |
tkamppeter | henrys, these printers need XPS, there is someone (Helge Blischke) who wrote a simple filter in Perl for using GS with xpswrite, but I think that later on something in cups-filters would be needed. Also a rastertoxps to print from phones would be a nice thing. | 15:32.29 |
Robin_Watts | gsview_setup_6.0_0001.exe etc | 15:32.43 |
mvrhel_laptop | yes | 15:32.55 |
| I will get that added in the nsis script | 15:33.41 |
henrys | tkamppeter: raster to xps should be pretty trivial | 15:33.48 |
tkamppeter | henrys, but before investing more time into a thing like rastertoxps I would need to know how many modern printers are actually XPS-only, as all Google results about these Konica Minolta printers are from 7 or 8 years ago. | 15:33.52 |
mvrhel_laptop | XPS-only has the be very rare | 15:34.18 |
| I can't imagine | 15:34.25 |
| s/the/to/ | 15:34.41 |
henrys | tkamppeter: I don't think it's a good thing to spend a lot of time on but writing a rastertoxps is probably 3 hours so... | 15:35.03 |
rayjj | Robin_Watts: marcosw: rather than ETS (or other error diffusion), Phil may like the output using the stochastic threshold array. This should be faster to halftone and won't have strange effects | 15:35.03 |
tkamppeter | henrys, most work on a rastertoxps is to find out about the format, the structure, starting pages, selecting resolution and color space, ... the bitmap itself probably goes the same way as the rastertopdf filter for example. | 15:36.09 |
tor8 | Robin_Watts: could you compile the android release apks? the android sdk keeps breaking on my machine... stupid thing *still* doesn't work properly on 64-bit linux | 15:36.24 |
Robin_Watts | tor8: Sure. | 15:36.44 |
| Did the VS2005 64bit building issues get sorted? | 15:36.54 |
| mvrhel_laptop: gsview seems to work for me. | 15:38.04 |
mvrhel_laptop | thanks Robin_Watts | 15:38.19 |
henrys | rayjj: this does sound like something for a twiki page... when to use halftone in the pipeline vs postprocessing. Really hard for me to imagine when you'd want in pipeline for host based stuff. | 15:38.32 |
| rayjj: it would be interesting to see some timing numbers | 15:39.01 |
| tor8: is that the only problem for the release? | 15:40.02 |
tor8 | henrys: as far as I can tell, yes. | 15:40.24 |
| oh, and ios builds | 15:40.32 |
| Robin_Watts: there are two commits on tor/master that we could probably sneak into the 1.8 release | 15:40.51 |
mvrhel_laptop | tor8: so do we want to add something like a pdf-create.c in the pdf directory to add in the stuff that I am pulling out from pdf-device.c (e.g. the code to create the image and font resources) | 15:40.52 |
tor8 | mvrhel_laptop: yes. | 15:41.12 |
mvrhel_laptop | ok. I will work on the image stuff today thanks | 15:41.30 |
tor8 | mvrhel_laptop: not sure I'm sold on the exact file name, but something to that effect will do fine :) | 15:41.40 |
Robin_Watts | mvrhel_laptop, tor8: That's something we should think about for a mo, maybe. | 15:41.41 |
| Do we want to separate the pdf in and out codebases? | 15:42.11 |
tor8 | Robin_Watts: Going even further and separating the pdf module into in/out/common directories? | 15:43.19 |
Robin_Watts | tor8: I was just about to suggest that. | 15:43.36 |
tor8 | A lot of the PDF data structures are read/write (like the xref) but separating reading and writing clearly would make it easier to navigate the source | 15:44.09 |
Robin_Watts | yeah. | 15:44.15 |
tor8 | It has gotten to be a bit of a jungle, with odds and bits spread out throughout | 15:44.36 |
henrys | anybody else have meeting stuff? | 15:44.41 |
mvrhel_laptop | ok. so pdf-create.c is clearly not the right name/way | 15:44.41 |
tor8 | mvrhel_laptop: the right way, but we might want to stop and think about doing a big reorganize. | 15:45.17 |
| oh boy, is that going to upset zeniko even more :) | 15:45.24 |
| though I suspect he gave up trying to maintain that big patch of his already | 15:45.51 |
henrys | tkamppeter: isn't all that logic already in rastertopdf and you just need to find xps substitutes for the generated pdf? or am I missing something? | 15:47.32 |
chrisl | I'd be quite surprised if those bizhub printers seriously support XPS - 32Mb and 120Mhz seems quite low end for full XPS support..... | 15:49.00 |
henrys | chrisl: yeah my friend who worked directly on those printers said to use the gdi driver. | 15:49.46 |
mvrhel_laptop | tor8: so should I continue with the way that I am going for now? or is there something else that I need to do. I understand the desire/need for the reorg. I think it will be easier for me to get up to speed if I finish up this bit for the mutool create first though. Pulling stuff out of pdf-device.c that is going to be used for both will move us toward the goal of having the common stuff... | 15:50.09 |
| ...in one place | 15:50.10 |
marcosw | rayjj: thanks for the suggestion. I just tried stocht.ps and it's not symmetric (the difference between the 0 and 1 patches is significant and the 253, 254, 255 patches are identically entirely white). Is this a bug or is ht_ccsto.ps tuned for a particular device (it appears to be doing negative dot gain). | 15:50.14 |
Robin_Watts | mvrhel_laptop: You push through as best suits you. | 15:50.27 |
| We can always rejig stuff. git FTW :) | 15:50.38 |
chrisl | henrys: I'm surprised there's much interest in GDI printers these days. either...... | 15:50.48 |
mvrhel_laptop | yes that is my thought, and I fully expect that to happen | 15:50.51 |
tor8 | mvrhel_laptop: Just keep going. The re-org may happen at any time either Robin or I get bored, but we're good with the git voodoo to fix things up :) | 15:51.04 |
tkamppeter | henrys, partially, for PDF I use the QPDF library which generates the PDF pieces for me. There seems to be no XPS processor or generator library for Linux/free software. | 15:51.19 |
henrys | chrisl: I think what he meant was the gdi driver will work, XPS not so much, but he didn't say that explicitly. | 15:52.14 |
Robin_Watts | tor8: Have you tagged 1.8 ? | 15:52.36 |
chrisl | henrys: I wasn't specifically referring to that - just slightly surprised we're still talking about GDI printers these days! | 15:53.07 |
| tkamppeter: has anyone actually tested printing XPS to these devices? | 15:53.28 |
tkamppeter | chrisl, I don't know. | 15:53.44 |
henrys | chrisl: true and when you look at printer prices why is anyone fooling with one of these bizhub bricks in the first place | 15:54.07 |
chrisl | tkamppeter: It just seems to me we/you could devote a lot of time to this, only to find they don't really work | 15:54.15 |
tkamppeter | henrys, the "gdi" output device of GS is only for Samsung printers, especially older bw printers. | 15:54.31 |
tor8 | Robin_Watts: I have only tagged the RC-1 locally, and that tag is what currently is on origin/master | 15:54.57 |
chrisl | henrys: Well, we've had a couple of people complain about us dropping our Postscript Level 1 output, so....... | 15:55.04 |
henrys | tkamppeter: I sent you a link to a gdi thing that my friend said would work. I have no idea how it works | 15:55.16 |
tor8 | Robin_Watts: I think we should get the two oldest commits on tor/master into 1.8 | 15:55.18 |
| if you've looked them over I can push and then we can tag and build releases | 15:55.44 |
Robin_Watts | looking now. | 15:55.49 |
henrys | skype in 5 minutes. | 15:56.11 |
marcosw | I have to run to a doctor's appointment in a few minutes. does anyone have anything for me? | 15:56.19 |
chrisl | tkamppeter: I assume that the samsung gdi printer is really just a wrapper around a raster for each page | 15:56.37 |
Robin_Watts | tor8: screen_w - 20 etc | 15:56.39 |
henrys | marcosw: I'm good | 15:56.42 |
Robin_Watts | I dislike magic numbers. | 15:56.44 |
tor8 | Robin_Watts: they're *really* magic this time... | 15:56.58 |
Robin_Watts | Presumably that 20 is for window furniture width etc? | 15:56.58 |
tor8 | basically accounting for furniture as you said | 15:57.05 |
chrisl | #define MAGIC_NUMBER 20 | 15:57.21 |
Robin_Watts | enum { FURNITURE_WIDTH = 20, FURNITURE_HEIGHT=40} ; ? | 15:57.31 |
| likewise, there are layout_w/layout_h/layout_em that could be DEFAULT_LAYOUT_{W,H,EM} | 15:58.29 |
| Have all such things defined at the top of the file rather than buried in the middle of it. | 15:58.43 |
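A minimal sketch of the suggestion above, for illustration only. The names FURNITURE_WIDTH, FURNITURE_HEIGHT and DEFAULT_LAYOUT_{W,H,EM} come from the discussion; the DEFAULT_LAYOUT_* values and the helper max_window_width are placeholders invented here, not taken from the MuPDF source.

```c
#include <assert.h>

/* Named constants grouped at the top of the file instead of bare
 * numbers buried in the code. The FURNITURE_* values are the ones
 * quoted in the chat. */
enum {
    FURNITURE_WIDTH = 20,
    FURNITURE_HEIGHT = 40
};

/* Placeholder values: the real defaults are not given in the log. */
enum {
    DEFAULT_LAYOUT_W = 450,
    DEFAULT_LAYOUT_H = 600,
    DEFAULT_LAYOUT_EM = 12
};

/* Hypothetical helper: widest window that still fits on screen once
 * the window furniture is accounted for. */
int max_window_width(int screen_w)
{
    assert(screen_w >= FURNITURE_WIDTH);
    return screen_w - FURNITURE_WIDTH;
}
```

An enum keeps the names visible to a debugger and scoped by the compiler, which is one common reason to prefer it over #define for integer constants.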
mvrhel_laptop | bbiab | 16:01.20 |
Robin_Watts | In the exception fiddling commit, rather than changing from error to ctx->error everywhere, we should do error = ctx->error at the top of the function. No need to force C to keep dereffing (pointer aliasing etc) | 16:01.57 |
tkamppeter | chrisl, that is the case, but the wrapper itself is specific to Samsung's printers. There is no PDL called GDI. GDI in reality is a printer driver API of Windows, and by "GDI printer" one understands a device without a standard PDL and with a Windows driver ("GDI driver") designed only for use with Windows. With knowledge of the printer's PDL/communication protocol one can make it work with any OS, but the manufacturers keep this info secret, probably to not show how dumb the printer is. | 16:02.31 |
tor8 | Robin_Watts: can do that. | 16:03.02 |
Robin_Watts | tor8: Actually... I'm having trouble seeing what the actual change in that commit is. Why the change from ctx->error to ctx when calling the functions ? | 16:03.52 |
tor8 | the ctx->error thing shouldn't really matter -- I'm just keeping things symmetrical (the throw/catch macros dereference the ctx->error multiple times) | 16:03.53 |
chrisl | tkamppeter: sure. What I meant was that might be a better route to a working solution than XPS | 16:03.56 |
Robin_Watts | Oh, so you can use fz_throw ? | 16:04.01 |
tor8 | I needed to pass the ctx not the error context so I could use throw | 16:04.22 |
Robin_Watts | yeah. | 16:04.28 |
tkamppeter | chrisl, you mean reverse-engineering the proprietary language of Konica Minolta printers? This helps only for Konica Minolta and also Konica Minolta will not necessarily keep their proprietary language for longer time. So the safer investment of time is XPS, it could also serve for sending jobs to Windows servers (assuming that XPS is still used nowadays). | 16:12.14 |
chrisl | tkamppeter: XPS is not going to be a safer investment of time if it doesn't actually work | 16:13.36 |
tor8 | Robin_Watts: third commit on tor/master has magic number constants with names | 16:13.48 |
tkamppeter | But XPS is probably easier to make work as specs are published. | 16:15.43 |
| chrisl, ^^ | 16:15.49 |
chrisl | tkamppeter: yes, but it is also a full PDL, and the specs of at least those bizhub printers make me suspicious about the level to which they actually support XPS | 16:16.45 |
rayjj | marcosw: sorry, I had a minor issue I had to take care of. | 16:17.08 |
Robin_Watts | tor8: looks ok to me. | 16:17.10 |
| I have a preference for enums rather than #defines personally, but... | 16:17.26 |
tkamppeter | chrisl, you mean that the Konica Minolta devices do not really fully support XPS and so one still needs a model-specific driver for it, even if one sends XPS to the printer? | 16:18.16 |
chrisl | tkamppeter: Yes. | 16:18.44 |
rayjj | marcosw: yes, the ht_ccsto.ps is a 167x167 with a transfer function 'baked in'. We can generate any dimension stochastic threshold array and then apply any transfer function desired (including linear) to it | 16:19.00 |
| if the array is large enough, even with the transfer function, we will always get 256 shades | 16:19.58 |
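For readers unfamiliar with threshold-array halftoning, here is a rough sketch of the general technique being discussed. It is not Ghostscript's implementation: the function names, the 1 = white convention and the tiny 2x2 array used in testing are assumptions made for illustration. Each contone pixel is compared against the threshold at its position in a tiled array, so a tile of N cells can give at most N+1 distinct coverage levels; that is why a large array such as 167x167 has room for 256 shades.

```c
#include <assert.h>
#include <stddef.h>

/* Decide one output bit: 1 = white (no dot), 0 = black (dot).
 * thresh is a tw x th array of 8-bit thresholds, tiled across the page. */
int toy_halftone_pixel(const unsigned char *thresh, int tw, int th,
    int x, int y, unsigned char value)
{
    assert(thresh != NULL && tw > 0 && th > 0 && x >= 0 && y >= 0);
    return value > thresh[(size_t)(y % th) * (size_t)tw + (size_t)(x % tw)];
}

/* Count white pixels when a flat patch of 'value' covers one full tile. */
int toy_count_white(const unsigned char *thresh, int tw, int th,
    unsigned char value)
{
    int x, y, n = 0;

    for (y = 0; y < th; y++)
        for (x = 0; x < tw; x++)
            n += toy_halftone_pixel(thresh, tw, th, x, y, value);
    return n;
}
```

A transfer function could be applied either by remapping the thresholds once when the array is built or by passing each pixel value through a lookup table before the comparison; the sketch leaves that step out.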
tor8 | Robin_Watts: updated commit with enums instead | 16:20.11 |
chrisl | tkamppeter: I'm not saying *don't* pursue the XPS route, but I am saying, make sure it's going to work (enough) before heading down that route. | 16:20.34 |
rayjj | marcosw: and we can generate stochastic arrays with a minimum dot size (for laser/led engines) | 16:20.37 |
Robin_Watts | tor8: lovely! | 16:20.42 |
| rayjj, marcosw: tiffscaled supports a minimum feature size thing too. | 16:21.14 |
| -dMinFeatureSize can be 1,2 or 3, IIRC. | 16:22.11 |
rayjj | marcosw: I'll put together information on that for Phil, along with timings for the tiffscaled error diffusion vs. stochastic threshold array. I'll use a linear 256x256 array so images will be similar | 16:22.30 |
tkamppeter | chrisl, seems that XPS is not really worth the time, too few printers and one does not really know whether one makes them all work. | 16:23.26 |
rayjj | Robin_Watts: that sounds right. The stochastic array generator is a bit more specific in that you can choose the size/shape of the minimum dot 1x2, 2x1, 2x2, ... | 16:23.50 |
Robin_Watts | marcosw: What Phil *should* do is to use tiff24nc to generate a contone version. | 16:24.19 |
| Then he can try all the different methods he can think of to process that down to 1bpp. | 16:24.40 |
| When he finds the way that best works for him, we can try to reproduce it within gs. | 16:24.52 |
chrisl | tkamppeter: that is my feeling, but I am somewhat remote from consumer printing. It may be a case of "not worth the time, just now.... but worth keeping an eye on" | 16:24.55 |
rayjj | Robin_Watts: marcosw: I don't really know where Phil is going with this. Part of the RIP technology they got from the company they used (then bought) -- cust 850, was their "special" halftoning | 16:28.11 |
| in the JaNe device | 16:28.37 |
| they used gs to render to contone Lab and then the JaNe device took it from there | 16:29.13 |
| The other strange thing is the performance issue with the -dFirstPage (Quick Q thread) mentioned a G850 CPU with only 2Gb RAM, where previously they were using massive Xeon based 8 core (or more) 8Gb systems | 16:31.14 |
Robin_Watts | Tor8: so, you want me to build off golden/master now ? | 16:34.25 |
tor8 | Robin_Watts: yes please | 16:34.57 |
| I've got to pop out for dinner, but I'll check back in a couple of hours | 16:35.14 |
rayjj | Robin_Watts: The 'muddy' output from tiffscaled is probably due to their engine not working well with single dispersed dots. I'll mention that in my follow up as well | 16:36.37 |
Robin_Watts | rayjj: Right, so -dMinFeatureSize=2 may be enough to sort that. | 16:37.18 |
rayjj | Robin_Watts: I'll send him samples of it with that as well as stochastic threshold array (can we just say Blue Noise Mask, or BNM, now???) with single and 2x2 min dot as well -- both with no xfer function and timings | 16:39.03 |
| the reason the stochastic generator supports shaped minimum dots is that some engines have 2x1 resolutions such as 1200x600 | 16:42.02 |
| cust 532 has that mode -- they call it "fast 1200" | 16:42.37 |
Robin_Watts | rayjj: I understand the idea. I just didn't code it :) | 16:43.07 |
rayjj | I guess marketing thought that sounded better than "half assed 1200" | 16:43.28 |
| ;-) | 16:43.35 |
mvrhel_laptop | Robin_Watts: tor8 has stepped out so let me ask you | 16:51.09 |
| In his email he mentioned the cache that is used to avoid putting in an existing image into the resource object. In pdf-device.c this is a storage of md5s that are stored in a structure on the device. Where do we envision this structure residing when we add to our resources for mutool create | 16:55.07 |
Robin_Watts | Just give me a mo to have a look. | 16:56.15 |
mvrhel_laptop_ | hmm network issues here | 16:57.19 |
Robin_Watts | mvrhel_laptop: OK, so... we have fz_store. | 16:57.25 |
mvrhel_laptop_ | not sure if you had all my messages | 16:57.37 |
Robin_Watts | fz_store is the generic 'cache' for all sorts of things. | 16:57.42 |
mvrhel_laptop | ok | 16:58.04 |
Robin_Watts | Anything can be put into the fz_store, as long as its structure starts with an fz_storable struct. | 16:58.44 |
| That contains a reference count and a type pointer. | 17:00.17 |
| This was originally put in so that we could store stuff like decoded images; we'd decode the image and use it, and put it in the store. Then we'd drop our pointer to it. | 17:00.58 |
| So the store was the only thing holding a reference to it. | 17:01.08 |
| When we run low on memory the store would look through to see the things that have a single reference (i.e. just the store) holding them, and would bin them oldest first until we have enough memory to continue. | 17:01.54 |
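The eviction policy described above can be sketched as a toy, assuming a flat array of items rather than whatever structure the real fz_store uses. The names toy_item and toy_store_scavenge are invented for this example and are not MuPDF API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int refs;       /* reference count; 1 means only the store holds it */
    size_t size;    /* bytes occupied */
    unsigned age;   /* insertion order: lower is older */
    int live;       /* 0 once evicted */
} toy_item;

/* Evict items that only the store references, oldest first, until at
 * least 'needed' bytes are freed or nothing evictable remains.
 * Returns the number of bytes freed. */
size_t toy_store_scavenge(toy_item *items, int n, size_t needed)
{
    size_t freed = 0;

    assert(items != NULL || n == 0);
    while (freed < needed) {
        int oldest = -1;
        int i;

        for (i = 0; i < n; i++) {
            if (!items[i].live || items[i].refs != 1)
                continue; /* already gone, or still in use elsewhere */
            if (oldest < 0 || items[i].age < items[oldest].age)
                oldest = i;
        }
        if (oldest < 0)
            break; /* nothing evictable left */
        items[oldest].live = 0;
        items[oldest].refs = 0;
        freed += items[oldest].size;
    }
    return freed;
}
```

Items with more than one reference are never touched, which matches the point that the store only bins things nobody else is holding.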
mvrhel_laptop | ok | 17:02.04 |
| So I am having trouble seeing in send_image in pdf-device.c where fz_store is coming into play | 17:02.27 |
Robin_Watts | The question is, do we want (or can we) use fz_store as part of the pdf-device. | 17:02.44 |
mvrhel_laptop | or is the image already in the store when this is called? | 17:03.00 |
| I am missing something | 17:03.21 |
Robin_Watts | When writing a PDF file out, we want to keep a list of the images we've already written, and then check whether a new image we want to write has been written to the pdf file already. | 17:03.42 |
| mvrhel_laptop: (bear with me, the exposition is a way of me getting myself back up to speed) | 17:04.06 |
mvrhel_laptop | no problem. I feel bad having to bother you as I know you are busy | 17:04.25 |
Robin_Watts | So, we don't really want to be checking the fz_store, cos that only has stuff that we happen to have in memory, not an exhaustive list of stuff that exists in the resources for this page already. | 17:05.24 |
mvrhel_laptop | right | 17:05.38 |
Robin_Watts | So I think fz_store (and hence pdf_store) is a red herring. | 17:05.41 |
| OK, so in pdf-device.c, we have a structure definition for pdf_device. | 17:06.41 |
mvrhel_laptop | it seems that we need an array of md5 sums in some place that is not hooked to the pdf_device | 17:06.43 |
Robin_Watts | That contains num_imgs/max_imgs/images. | 17:07.06 |
| Where image_entry *images = a block of max_imgs image_entries of which num_imgs are populated. | 17:07.40 |
mvrhel_laptop | yes. this is exactly where I am | 17:08.11 |
| and that all seems reasonable | 17:08.21 |
Robin_Watts | Every time we meet a new image, we take a digest of it, and check to see if that digest is listed in the existing images; if so we do (I assume) a slow exhaustive compare to see if it matches. If not, we insert it as a new image. | 17:08.52 |
mvrhel_laptop | right | 17:08.58 |
Robin_Watts | So... that's all well and good for the case where we are creating completely new pages, and we don't worry about images being already in the PDF file. | 17:09.43 |
| But it falls down when appending new content to a page, or when reusing an image on a page that's already used elsewhere. | 17:10.25 |
| OK, I suspect I'm now caught up to where you were at the start of this conversation :) | 17:10.39 |
mvrhel_laptop | exactly | 17:10.45 |
| so tor8 had the following int pdf_create_image(fz_context *ctx, pdf_document *doc, fz_image *image) where we return an object number | 17:11.51 |
| My problem is that I don't see where I know what's already in doc | 17:12.16 |
Robin_Watts | mvrhel_laptop: Right. Me either. | 17:12.50 |
| SOMEWHERE, we need to have a 'check to see if this image exists in the document already' mechanism. | 17:13.20 |
| Whether that's hidden under pdf_create_image, or is expected to be called before pdf_create_image is called is a different discussion. | 17:13.47 |
| I reckon it might be nicest to hide it under there. | 17:14.45 |
| The fz_image *image pointer gives us everything we need to compare the image with existing ones. | 17:15.17 |
| The pdf_document *doc pointer gives us everything we need to see what's in the current document. | 17:15.41 |
| One idea might be to have a new structure as part of the doc that contains info for every image in the document. | 17:16.17 |
mvrhel_laptop | I was wondering that. Do we do some initial search/set up of the md5s in the existing doc and then search that | 17:16.32 |
Robin_Watts | We could populate it on first access. | 17:16.33 |
mvrhel_laptop | :) | 17:16.37 |
Robin_Watts | So it wouldn't slow every file open down. Only when we start writing images. | 17:16.57 |
mvrhel_laptop | yes | 17:17.04 |
Robin_Watts | We'd run through every object in the file looking for it to be a /Type/Image (or whatever it is) | 17:17.24 |
| and then stash details of W/H/compression. | 17:17.47 |
| Then if we write an image, we can quickly check if we have any potential matches. | 17:18.07 |
| For potential matches, we can update the info to also contain a hash of the compressed data. | 17:18.42 |
| Then we can exhaustively check if the hashes match. | 17:19.06 |
mvrhel_laptop | oh, ok, so as a first cut just compare the obvious diffs and then do the hashes if there may be a match | 17:19.17 |
| can the pdf-device use this too instead of pdev->images[i].digest | 17:19.48 |
Robin_Watts | Absolutely. pdf device should move over to using this. | 17:20.10 |
| This is a 'better' version of the limited thing I hacked up for pdf-device. | 17:20.27 |
mvrhel_laptop | ok. I think I have enough to keep me busy for a bit. thanks Robin_Watts | 17:21.11 |
| I may ping you again as I push on | 17:21.23 |
Robin_Watts | I suspect that fetching just X/Y/compression type etc from the file should be fast enough to not be a massive hit. | 17:21.31 |
| Actually.... | 17:21.40 |
| I'd suggested doing this by walking the entire file looking for objects that were /Type/Image. | 17:22.26 |
| That might be slow in the case where we have a file with lots of compressed objects. | 17:22.43 |
mvrhel_laptop | ok | 17:23.00 |
Robin_Watts | One possible way to be faster would be to walk the resources tree, and just check the Image objects listed there. | 17:23.17 |
| s/walk the resources tree/walk the page tree and just check the image objects in the resources dictionaries there/ | 17:23.48 |
| Cos all the Image objects in the file *should* be listed in the page resources tree, and that'll be a smaller set to walk. | 17:24.18 |
| But... another thought... | 17:25.09 |
mvrhel_laptop | Robin_Watts: ok. I imagine there must be code to do that already | 17:25.15 |
Robin_Watts | To walk the pages tree? Yes. To look for the image resources in the way I've just talked about, no. | 17:25.46 |
mvrhel_laptop | what does mutool extract do | 17:26.03 |
Robin_Watts | The other thought that occurs to me, is that images are not the only thing we're going to be doing this for. | 17:26.08 |
mvrhel_laptop | Robin_Watts: true | 17:26.21 |
Robin_Watts | We're going to need to do this for things like fonts, and potentially patterns etc. | 17:26.28 |
mvrhel_laptop | He wants to do the same thing with fonts | 17:26.36 |
| yes | 17:26.38 |
Robin_Watts | Which leads me to the idea that we should generalise what we do to not just the image case. | 17:26.54 |
| Maybe we can think of this as being a 'pdf resources indexer' | 17:27.16 |
| i.e. we maintain an index of the resources within the pdf file. | 17:27.34 |
| A 'resource' could be any of an image/font/pattern/+others and would have some 'cheap' info (X/Y/Compression in the case of images, font name/font type in the case of fonts etc), and then a shared digest field. | 17:30.23 |
| Does that sound like enough of an idea for the shape of it? | 17:30.44 |
| (can you think of any problems?) | 17:30.54 |
mvrhel_laptop | Robin_Watts: yes. so this would be a member variable of pdf_document? And we would still do the lazy initialization as well as the lazy fill in of the digest field. | 17:31.37 |
Robin_Watts | Yes, exactly. | 17:31.48 |
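One possible shape for the resource index, sketched under loud assumptions: every name here (res_kind, res_entry, toy_digest, res_find_image) is hypothetical, the "compressed data" is just a byte buffer, and FNV-1a stands in for the MD5 digest mentioned in the discussion. The point it illustrates is the two-stage check: reject on cheap fields first, and only compute digests for entries that survive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { RES_IMAGE, RES_FONT } res_kind;

typedef struct {
    res_kind kind;
    int w, h, compression;      /* cheap info (image case) */
    const unsigned char *data;  /* stands in for the compressed stream */
    size_t len;
    int has_digest;             /* digest is filled in lazily */
    uint64_t digest;
    int obj_num;                /* object number in the document */
} res_entry;

/* FNV-1a: a stand-in for MD5, chosen only to keep the sketch short. */
static uint64_t toy_digest(const unsigned char *p, size_t n)
{
    uint64_t h = 14695981039346656037ULL;
    size_t i;

    for (i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Return the object number of an identical existing image, or -1. */
int res_find_image(res_entry *idx, int n, int w, int h, int compression,
    const unsigned char *data, size_t len)
{
    uint64_t want = 0;
    int have_want = 0;
    int i;

    assert(data != NULL);
    for (i = 0; i < n; i++) {
        res_entry *e = &idx[i];

        if (e->kind != RES_IMAGE || e->w != w || e->h != h ||
            e->compression != compression || e->len != len)
            continue; /* cheap mismatch: no hashing needed */
        if (!e->has_digest) {
            e->digest = toy_digest(e->data, e->len);
            e->has_digest = 1;
        }
        if (!have_want) {
            want = toy_digest(data, len);
            have_want = 1;
        }
        /* Digest match is only a hint; confirm byte for byte. */
        if (e->digest == want && memcmp(e->data, data, len) == 0)
            return e->obj_num;
    }
    return -1;
}
```

A font entry would carry different cheap fields (name, type) but could share the same digest slot, which is the generalisation proposed above.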
| All the resources can be found by walking the pdf page tree and looking at the resources dictionaries we pass. | 17:32.00 |
mvrhel_laptop | Robin_Watts: ok. I think I have enough to make a nice mess of things | 17:32.33 |
Robin_Watts | I suspect that at least 1 version of the PDF spec said that XObjects can have their own resource dictionaries, so we'd need to descend those parts of the trees too. | 17:32.36 |
mvrhel_laptop | Robin_Watts: ok one other question | 17:32.55 |
Robin_Watts | mvrhel_laptop: Tor will arrive tomorrow and describe a much simpler way of doing all this, no doubt :) | 17:33.01 |
mvrhel_laptop | so in walking the page tree is that going to be any different than going through all the objects like is done in pdfextract.c | 17:33.31 |
| where we find all the font and images | 17:33.43 |
| or is there something more sophisticated | 17:34.01 |
| or intelligent | 17:34.11 |
Robin_Watts | mvrhel_laptop: I think it'll be very similar to that. | 17:34.30 |
mvrhel_laptop | ok | 17:34.34 |
Robin_Watts | We should be careful to allow for malicious page trees. | 17:34.47 |
mvrhel_laptop | Thanks for the discussion Robin_Watts. | 17:34.47 |
| Robin_Watts: can you explain? | 17:35.14 |
Robin_Watts | i.e. ones where X is a descendent of X. | 17:35.24 |
mvrhel_laptop | ah | 17:35.28 |
Robin_Watts | We cope with that by doing pdf_mark and pdf_unmark as we descend to check we don't hit cycles. | 17:35.49 |
| (search in the source for pdf_mark and you'll find code that walks the page tree, I think) | 17:36.14 |
| Also, we should watch out for deeply nested trees (recursion may not be the best option, though it might be acceptable as a first step to getting something working) | 17:37.20 |
| I've seen documents that basically have a page tree that's a linked list. So 1500 pages works out as 1500 levels of recursion in the native tree walker :) | 17:38.18 |
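Both hazards mentioned here (cycles and very deep trees) can be handled together. The sketch below is a toy, not MuPDF code: toy_node and toy_count_pages are invented names, and instead of marking on the way down and unmarking on the way back up, it keeps a visited flag set for the whole walk and clears all flags at the end. It uses an explicit work list rather than recursion, so a page tree shaped like a 1500-element linked list costs 1500 list slots, not 1500 stack frames.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_node {
    int marked;               /* visited flag */
    int is_page;              /* leaf page vs. intermediate node */
    int nkids;
    struct toy_node **kids;
} toy_node;

/* Count the pages reachable from root without recursing.
 * max_nodes must be at least the number of distinct nodes.
 * Returns -1 on allocation failure or if max_nodes is too small. */
int toy_count_pages(toy_node *root, int max_nodes)
{
    toy_node **list;
    int head = 0, tail = 0, pages = 0, i;

    assert(root != NULL && max_nodes > 0);
    list = malloc(sizeof(*list) * (size_t)max_nodes);
    if (list == NULL)
        return -1;

    root->marked = 1;
    list[tail++] = root;
    while (head < tail && pages >= 0) {
        toy_node *n = list[head++];

        if (n->is_page) {
            pages++;
            continue;
        }
        for (i = 0; i < n->nkids; i++) {
            toy_node *k = n->kids[i];

            if (k->marked)
                continue; /* already seen: a cycle or a repeat */
            if (tail == max_nodes) {
                pages = -1; /* work list too small */
                break;
            }
            k->marked = 1;
            list[tail++] = k;
        }
    }
    /* Every node we marked is in list[0..tail), so unmark them all. */
    for (i = 0; i < tail; i++)
        list[i]->marked = 0;
    free(list);
    return pages;
}
```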
mvrhel_laptop | wow | 17:38.33 |
| so the pdf_mark_obj and pdf_unmark_obj are used so that we can identify that we have already visited this object during searches? | 17:41.40 |
Robin_Watts | Yes. | 17:43.59 |
| There is a secret bit in our in memory representation of pdf_dicts for that. | 17:44.30 |
mvrhel_laptop | Robin_Watts: just looking at this. Another question. pdf_load_colorspace_imp does a throw if it detects a recursion in color space object. Shouldn't the call to pdf_load_colorspace_imp in pdf_load_colorspace be wrapped up in the try to rethrow? | 17:46.59 |
Robin_Watts | s/native tree walker/naive tree walker/ sorry. | 17:47.15 |
mvrhel_laptop | This is in pdf-colorspace.c . | 17:47.37 |
Robin_Watts | Why? | 17:47.59 |
| It only needs to be wrapped up if it needs to tidy up in the case of a throw. | 17:48.25 |
mvrhel_laptop | ok | 17:48.35 |
Robin_Watts | We have no tidying up to do here, so all we'd do is rethrow it anyway. | 17:48.45 |
| One of the (many) big attractions of try/catch over explicit error passing is that we don't need to clutter functions with extra stuff, unless there is actual cleanup to do. | 17:49.32 |
Robin_Watts | will convert you :) | 17:49.49 |
mvrhel_laptop | :) In this case, I am confused as to what gets returned | 17:50.26 |
Robin_Watts | mvrhel_laptop: What gets returned where? | 17:50.38 |
mvrhel_laptop | by pdf_load_colorspace_imp | 17:50.48 |
Robin_Watts | If pdf_load_colorspace_imp throws, then nothing is returned. | 17:50.55 |
| We don't go through C's return mechanism. | 17:51.13 |
mvrhel_laptop | ok | 17:51.16 |
Robin_Watts | We longjmp out to the previous catch. | 17:51.23 |
mvrhel_laptop | ah | 17:51.32 |
| I see | 17:51.41 |
| excuse my ignorance on that | 17:52.01 |
Robin_Watts | mvrhel_laptop: No worries. fz_try/fz_catch are best thought of as sausages. | 17:52.27 |
mvrhel_laptop | :) | 17:52.42 |
Robin_Watts | A fabulous invention that makes our lives better, but you don't want to look into what goes into them. | 17:52.54 |
mvrhel_laptop | ok Thanks for all the help Robin_Watts . bbiab | 17:54.18 |
Robin_Watts | no worries. | 17:54.41 |
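For readers who do want a peek inside the sausage, the control flow discussed above can be approximated with plain `setjmp`/`longjmp`. This is a sketch under loud assumptions: `ctx`, `throw_error`, `load_imp`, `load` and `caller` are made-up names, and MuPDF's real `fz_try`/`fz_catch` macros are more involved than this. It only shows why a function with no cleanup to do needs no try of its own.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical context; not MuPDF's fz_context. */
typedef struct {
    jmp_buf *top; /* innermost active catch point */
} ctx;

/* "Throw": jump straight to the innermost catch point. Never returns. */
static void throw_error(ctx *c)
{
    assert(c->top != NULL);
    longjmp(*c->top, 1);
}

/* Stand-in for a loader that throws when it detects recursion. */
static int load_imp(ctx *c, int recursive)
{
    if (recursive)
        throw_error(c); /* nothing is returned through C's return path */
    return 42;
}

/* Has nothing to tidy up, so it needs no try/catch of its own:
 * a throw from load_imp simply passes straight through it. */
static int load(ctx *c, int recursive)
{
    return load_imp(c, recursive);
}

/* The "previous catch": returns 0 and stores the value on success,
 * or -1 if a throw was caught (*out is then left untouched). */
static int caller(ctx *c, int recursive, int *out)
{
    jmp_buf here;
    jmp_buf *saved = c->top;
    int status = 0;

    c->top = &here;
    if (setjmp(here) == 0)
        *out = load(c, recursive); /* the "try" body */
    else
        status = -1;               /* the "catch" body */
    c->top = saved;                /* restore the outer catch point */
    return status;
}
```

When `load_imp` throws, the assignment to `*out` never happens, which matches the point made in the discussion: nothing is returned, control just resumes in the catch.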
rayjj | mvrhel_laptop: Robin_Watts: Does mupdf also handle "inline" images, w.r.t. extracting images or writing PDFs ? | 18:09.36 |
Robin_Watts | rayjj: MuPDF handles inline images for reading, certainly. | 18:22.53 |
| For writing, we never write inline. | 18:23.08 |
| (currently) | 18:23.11 |
| For extracting images using the pdf structured text device (or any other device) you see inline and non-inline images as identical things. | 18:23.42 |
| For extracting images using the mutool extract, that works via object number, so, no, inline objects can't be accessed. | 18:24.15 |
| For the purposes of the discussion I just had with michael, we can ignore inline images, cos it makes no sense to try to match a previous inline image as we can't reuse it. | 18:24.47 |
mvrhel_laptop | right. that is a good point | 18:25.13 |
Robin_Watts | which is good, cos it doesn't invalidate the "look in the page tree resources dicts" plan :) | 18:26.16 |
mvrhel_laptop | yes | 18:26.25 |
Robin_Watts | mvrhel_laptop: Random thought... is it worth us writing a generic 'map this function over page tree entries' function ? | 18:27.05 |
mvrhel_laptop | Robin_Watts: I don't quite follow what you mean. I suspect you are at a higher abstraction level... | 18:27.54 |
Robin_Watts | Part of our current task is "For every node in the page tree, check the resources", right? | 18:28.26 |
mvrhel_laptop | yes | 18:28.31 |
Robin_Watts | So I'm thinking that we could do with a function MapOverPageTree(X) that would walk the page tree and call X(page) | 18:29.10 |
| so for our job X would do "check the resources for page" | 18:29.32 |
| Did that make it clearer? | 18:30.58 |
rayjj | Robin_Watts: if we have small images that are determined to be unique when the PDF is created, it is slightly more efficient to write them as inline images, but that may not be worth it (but gs does it) | 18:31.48 |
mvrhel_laptop | Robin_Watts: So we would have some defined proc prototype which might be set to check for font matches or might be set to check for image matches etc | 18:32.56 |
Robin_Watts | mvrhel_laptop: Yes. We'd have a function that walks the tree, and one of the params to it would be a function to run on each node. | 18:33.36 |
| Let me look in the existing code. | 18:33.56 |
mvrhel_laptop | I am going to need to start writing all of this down.... | 18:34.26 |
rayjj | mvrhel_laptop: on a different topic, since I'm putting something together for Phil about halftoning, I was going to mention gen_ordered as well as gen_stochastic -- back in 2011 I gave you a snapshot of the stochastic stuff. Did you ever do anything to/with it ? | 18:34.41 |
Robin_Watts | mvrhel_laptop: Currently we have a function 'pdf_lookup_page_loc' for example. | 18:35.15 |
| That knows how to efficiently and safely walk the pagetree allowing for cycles and not recursing too much. | 18:35.57 |
| It'd be lovely to refactor that somehow into a form whereby our other actions on the page tree could share that same intelligence. | 18:36.48 |
rayjj | mvrhel_laptop: and I realized that although we checked the tools into git, neither of us ever committed the linearize_threshold | 18:37.30 |
mvrhel_laptop | Robin_Watts: ok I understand | 18:38.08 |
Robin_Watts | mvrhel_laptop: It might require some cleverness so that we remain efficient in both cases. I haven't thought it through. | 18:39.05 |
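One possible shape for the "map this function over page tree entries" idea is sketched below. Everything here is an assumption made for illustration: `page_node`, `node_fn`, `map_over_tree` and `count_node` are hypothetical, and this is not how `pdf_lookup_page_loc` is written. The sketch uses an explicit heap-allocated stack instead of recursion, so a page tree shaped like a linked list does not exhaust the C stack, and it reuses the mark-bit trick to refuse cycles.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical page tree node; not MuPDF's pdf_obj. */
typedef struct page_node {
    struct page_node **kids; /* may be NULL when n_kids is 0 */
    size_t n_kids;
    int marked;              /* set while the node is on the current path */
} page_node;

/* The function applied to every node, plus a caller-supplied argument. */
typedef void (*node_fn)(page_node *n, void *arg);

typedef struct {
    page_node *n;
    size_t next; /* index of the next child to visit */
} frame;

/* Call fn on every node reachable from root, parents before children.
 * Returns 0 on success, -1 on a cycle or allocation failure. */
static int map_over_tree(page_node *root, node_fn fn, void *arg)
{
    size_t cap = 16, top = 0;
    int status = 0;
    frame *stack = malloc(cap * sizeof *stack);

    assert(root != NULL && fn != NULL);
    if (stack == NULL)
        return -1;

    root->marked = 1;
    fn(root, arg);
    stack[top].n = root;
    stack[top].next = 0;
    top++;

    while (top > 0) {
        frame *f = &stack[top - 1];
        page_node *kid;

        if (f->next == f->n->n_kids) { /* all children done: pop */
            f->n->marked = 0;
            top--;
            continue;
        }
        kid = f->n->kids[f->next++];
        if (kid->marked) {             /* kid is its own ancestor */
            status = -1;
            break;
        }
        if (top == cap) {              /* grow the explicit stack */
            frame *bigger = realloc(stack, 2 * cap * sizeof *stack);
            if (bigger == NULL) {
                status = -1;
                break;
            }
            stack = bigger;
            cap *= 2;
        }
        kid->marked = 1;
        fn(kid, arg);
        stack[top].n = kid;
        stack[top].next = 0;
        top++;
    }

    while (top > 0)                    /* unmark after an early exit */
        stack[--top].n->marked = 0;
    free(stack);
    return status;
}

/* Example callback: count the nodes visited. A real one might instead
 * check each node's resources for a matching font or image. */
static void count_node(page_node *n, void *arg)
{
    (void)n;
    ++*(int *)arg;
}
```

Different jobs (font matching, image matching) would then differ only in the callback and its argument, while the traversal logic lives in one place.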
mvrhel_laptop | rayjj: oh I see that we never added the linearization | 18:40.35 |
| I thought we had done that | 18:40.50 |
rayjj | mvrhel_laptop: you never made any changes to it, did you ? | 18:41.04 |
mvrhel_laptop | No. I did not | 18:41.10 |
| I remember playing with it | 18:41.24 |
| and feeding it into the gen_ordered stuff I think | 18:41.36 |
| it was too long ago | 18:41.41 |
rayjj | mvrhel_laptop: ISTR that you made gen_ordered so it could emit a file of the same format as gen_stochastic does so that it could be run through linearize_threshold | 18:42.03 |
mvrhel_laptop | yes | 18:42.10 |
| that is what I remember | 18:42.14 |
| ed | 18:42.16 |
rayjj | mvrhel_laptop: well, I'll just commit the version I have, and maybe a README on it (and gen_stochastic) | 18:42.43 |
mvrhel_laptop | rayjj: the readme in the ordered generation does mention using thresh_remap | 18:44.29 |
| A problem though if it's not there.... | 18:44.40 |
| using the turn on sequence (tos) output | 18:45.04 |
tor8 | mvrhel_laptop: Robin_Watts: the approach robin outlined with adding a structure that tracks resources to pdf_document looks good | 18:46.52 |
mvrhel_laptop | tor8: ok good | 18:47.06 |
tor8 | we might think about only seeding it from resources discovered when parsing the pdf | 18:47.13 |
Robin_Watts | waits for the other shoe... | 18:47.17 |
tor8 | since if we're reusing resources, then we've already loaded them at least once | 18:47.29 |
rayjj | mvrhel_laptop: oh, should I rename it ? | 18:47.32 |
tor8 | mvrhel_laptop: I'd hand waved that resource tracker thing in my email, robin seems to have figured out what needs to be done. I might come up with some ideas once I've slept on it or seen it in action :) | 18:48.34 |
| and the purging of resources in low memory conditions does indeed mean the fz_store is unsuitable | 18:48.59 |
rayjj | mvrhel_laptop: I might as well rename it, since it's more descriptive of what it does | 18:49.08 |
tor8 | I expect a plain md5 (or sha1) sum of the contents should be good enough that we don't need to hang on to the actual data | 18:49.18 |
| Robin_Watts: in pdf-page.c there's a resource walker that looks for blending operations which we call whenever we load a page | 18:50.11 |
mvrhel_laptop | I saw that | 18:50.19 |
tor8 | pdf_resources_use_blending | 18:50.22 |
mvrhel_laptop | rayjj: I wonder where I got that name | 18:50.49 |
| so with mutool create -f F0 Times.ttf -i Im0 logo.png -i Im1 photo.jpg contents.txt if we find the document already has logo.png, how do we handle the replacement of the reference to Im0 in contents.txt | 18:52.24 |
| i.e. the indirect reference will be different | 18:52.46 |
| so I would need to filter contents.txt | 18:52.54 |
| tor8, Robin_Watts ^^, or am I missing something | 18:53.11 |
tor8 | mvrhel_laptop: if looking at my original email, the function pdf_store_image would rummage through the resource tracker looking for a match and if it finds it, returns the object number | 18:53.52 |
| the names (Im0 and Im1) are only used in the resource dictionary for that specific content stream to map a name to an object number | 18:54.32 |
| I expect that mapping to be different for each page | 18:54.39 |
| so "mutool create -i Im0 logo.png -i Im1 logo.png contents.txt" would create one resource for logo.png, and then map both Im0 and Im1 to that same object number | 18:55.27 |
| it would load logo.png twice, but the resource tracker would catch the duplication | 18:55.45 |
mvrhel_laptop | ok | 18:55.54 |
tor8 | does that make sense? | 18:55.59 |
mvrhel_laptop | that makes sense | 18:56.00 |
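The deduplication described above (two names, one object) might look something like this. It is a sketch with invented names: `tracker`, `track_resource` and `MAX_TRACKED` are hypothetical, and a 64-bit FNV-1a hash is swapped in for the MD5 or SHA-1 digest suggested in the discussion purely to keep the example self-contained. A real tracker would want a proper digest and a growable table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 64 /* arbitrary fixed capacity for the sketch */

/* Hypothetical resource tracker: remembers a digest of each resource's
 * contents and the object number it was written as. */
typedef struct {
    uint64_t digest[MAX_TRACKED];
    int objnum[MAX_TRACKED];
    size_t count;
    int next_objnum; /* object number to hand out next */
} tracker;

/* 64-bit FNV-1a, standing in for md5/sha1 in this sketch. */
static uint64_t fnv1a(const unsigned char *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Return the object number for these contents. If the same contents
 * were seen before, the earlier number is reused; otherwise a new one
 * is assigned. Returns -1 if the table is full. */
static int track_resource(tracker *t, const unsigned char *data, size_t len)
{
    uint64_t d;
    size_t i;

    assert(t != NULL && data != NULL);
    d = fnv1a(data, len);
    for (i = 0; i < t->count; i++)
        if (t->digest[i] == d)
            return t->objnum[i]; /* duplicate: reuse the object */
    if (t->count == MAX_TRACKED)
        return -1;
    t->digest[t->count] = d;
    t->objnum[t->count] = t->next_objnum;
    t->count++;
    return t->next_objnum++;
}
```

In the `-i Im0 logo.png -i Im1 logo.png` case from the discussion, both calls would hand back the same object number, and the per-page resource dictionary would simply map `Im0` and `Im1` to it. Only the digest is kept, so the image data itself need not be held in memory.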
tor8 | we might want to rename my example functions pdf_store_* into something different, not to get them confused with the fz_store | 18:57.21 |
| pdf_track_image (calls a generic pdf_track_resource) maybe | 18:57.47 |
| you or robin may come up with a better name | 18:57.58 |
Robin_Watts | We already have pdf_store functions that build on the fz_store ones. | 19:00.00 |
| Any new functions should avoid the 'store' name, I reckon, for the benefit of easily confused people like me. | 19:00.31 |
| Is the pdf_resource name used anywhere ? | 19:00.52 |
mvrhel_laptop | so track does not do a whole lot for me | 19:00.58 |
| pdf_resource seems to not be used except for in pdf_resources_use_blending | 19:01.39 |
Robin_Watts | tor8: MuPDF android builds are done and uploaded to beta test. | 19:02.31 |
| And to my public_html | 19:02.49 |
| MuPDF-8{0,1,2,3}.apk | 19:02.59 |
hyper_ch | hi there, ghostscript is replacing Helvetica and other fonts in pdfs when I process them. Is that really necessary? Can't those fonts just stay? | 19:03.52 |
Robin_Watts | hyper_ch: gs is not "replacing" fonts in a PDF when it processes them, because gs does not 'process' a PDF :) | 19:04.36 |
| It consumes one PDF and throws out another PDF that hopefully looks like the one that came in. | 19:04.57 |
| But other than hopefully looking the same, the two PDFs are unrelated. | 19:05.27 |
| Now, if you're saying that the fonts are not surviving the conversion process, that's a valid concern. | 19:05.53 |
| The guy you need to speak to about this is kens, and he's gone for the night. | 19:06.08 |
| He'll be back in about 14 hours time, I guess. | 19:06.28 |
hyper_ch | Robin_Watts: https://paste.simplylinux.ch/view/20d726b7 | 19:07.52 |
Robin_Watts | Standard answer: Have you tried using an up to date version of gs? | 19:08.46 |
hyper_ch | it's the newest debian ;) | 19:09.01 |
Robin_Watts | So? | 19:09.30 |
hyper_ch | just saying :) | 19:09.46 |
Robin_Watts | I suspect the problem is that your input file names the fonts without embedding them. | 19:10.06 |
hyper_ch | that's also possible | 19:10.21 |
Robin_Watts | If gs can't find the fonts, then it substitutes them. | 19:10.35 |
hyper_ch | so if I installed helvetica, things would be well then? | 19:10.48 |
Robin_Watts | No... you've got it backwards. | 19:11.02 |
| The file says "Use LiberationSans" and gs says "I can't find LiberationSans, so I'm using Helvetica" | 19:11.22 |
hyper_ch | can't find -> can't find in the pdf/gs | 19:11.22 |
| ah | 19:11.45 |
Robin_Watts | You either need to get the source documents to include LiberationSans, OR you need to make it so that gs can find LiberationSans. | 19:12.00 |
| Now, how you make it so that gs can find LiberationSans, is a good question that I'm not entirely sure of the answer to. | 19:12.29 |
| chrisl, rayjj, or kens would probably know. | 19:13.05 |
hyper_ch | when you mention rayjj, he leaves the channel :) | 19:14.58 |
Robin_Watts | Is GS_FONTPATH set on your system ? | 19:15.26 |
| If not, try setting it to the path to where LiberationSans can be found. | 19:16.26 |
| If that doesn't work, you may need to fiddle with FontMap, and at that point, I run away, sorry. | 19:16.46 |
| http://www.ghostscript.com/doc/current/Use.htm#Font_lookup | 19:16.57 |
hyper_ch | will do so | 19:19.02 |
mvrhel_laptop | lunch time | 19:37.44 |