IRC Logs

Log of #ghostscript at irc.freenode.net.

2015/11/10
marcosw Robin_Watts: WOW, the SM951 is fast!04:37.09 
  bonnie++ reports 1.5 GB/s writes and 2.3 GB/s reads. Amusingly most of the results are ++++, indicating the test finished too fast to measure accurately, at least with the default parameters.04:53.43 
  The SM951 does get warm; it went from 40C to 81C in a couple of minutes with continuous read/writes. The limit is 85C but it appeared to start throttling when it hit 80C. The M2 slot doesn't get much airflow with the case I'm using.05:37.46 
James Hi11:13.52 
ghostbot Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.11:13.52 
James Is anyone there11:13.56 
kens No11:14.09 
Guest49912 Ha ha11:14.15 
  Quick question for you11:14.21 
  Mupdf looks great11:14.50 
  Does it support interactive PDF for page navigation?11:15.09 
kens I don't know what you mean by that, are you talking about following hyperlinks ?11:15.38 
Guest49912 Yes11:15.48 
  Exactly11:15.50 
  So users can navigate from a contents page11:16.03 
Robin_Watts MuPDF is a portable C library for opening/manipulating PDF (and other files).11:16.10 
Guest49912 Instead of flicking through hundreds of pages11:16.14 
Robin_Watts The core of mupdf certainly reads that information from the PDF files.11:16.37 
kens It may depend on the OS; the demo viewer varies with the platform. However it's certainly possible in the core library and I thought it was present in the viewers. I am not a MuPDF developer though11:16.38 
tor8 Guest49912: Yes. The MuPDF viewer supports hyperlinks. The mobile and the new desktop viewers also support the outline/table of contents for quick navigation.11:16.58 
Robin_Watts The viewers for various different platforms are just thin wrappers around the core. not all of them expose all of the functionality of the core.11:17.03 
Guest49912 Ok, we're running it on android from google play11:17.04 
Robin_Watts So, depending on what platform you want to run it on, ymmv.11:17.17 
Guest49912 Just trying to work out the best way of getting it to work11:17.18 
  It's failing at the moment with hyperlinks set from acrobat 11:17.36 
Robin_Watts Guest49912: Yes the android app supports link following. But only if you click the 'link' icon in the top bar first :)11:17.49 
  Look for the icon that looks like a chain.11:17.59 
mhayden just stumbled onto mupdf and i love it -- thanks for the work there, folks13:52.55 
sebras mhayden: are you using the android app? or the pre-built apps for windows/ios/linux/winrt?13:55.22 
mhayden i'm using it in linux (Fedora)13:55.45 
sebras mhayden: alright, it is always interesting to know. thanks! :)14:44.39 
mhayden no problem ;)14:53.52 
henrys Robin_Watts: tiffscaled has really become an important device for us, great stuff!15:15.20 
Robin_Watts henrys: We should consider a tiffscaledets device maybe at some point.15:16.19 
henrys Robin_Watts: good idea.15:18.07 
Robin_Watts All the smarts of tiffscaled are in the downscaler which is used by lots of devices. It would be a matter of updating that to know how to do ETS.15:20.55 
  and then we'd use the existing tiffscaled etc, but with -dDownScaleMode=1 to tell it to use ETS.15:21.29 
henrys Robin_Watts: I'll add it to the project list on the agenda. Maybe something rayjj would be interested in doing.15:22.42 
marcosw Robin_Watts: thanks for the Phil email analysis. 15:23.32 
Robin_Watts marcosw: np.15:23.42 
henrys Other than asking about the mupdf release I'm not really feeling the need for a meeting.15:24.59 
  tkamppeter: were you able to help out the folks trying to use the old konica minolta printers? I visited their office last week and mentioned the community was having printing problems.15:27.23 
  tkamppeter: not the HQ but the "printer language group"15:28.01 
  I saw Spy last night and I'm still laughing about it. Good fun movie.15:28.44 
mvrhel_laptop I was wondering if anyone had gone ahead and reinstalled the gsview beta (for windows) that I had chrisl put up a couple of weeks ago. wanted to make sure there were not any major issues15:28.50 
henrys mvrhel_laptop: I will today at some point15:29.28 
mvrhel_laptop thanks15:29.36 
henrys chrisl: does it make sense to test the final commercial release or do you think the last problem we had is isolated and not likely to recur15:30.03 
  ?15:30.05 
Robin_Watts mvrhel_laptop: This is still called gsview_setup_6.0.exe ?15:30.51 
chrisl henrys: I do test it - but I rarely use the VS projects, and almost always build at the command line15:30.53 
marcosw Robin_Watts: take a look at the 85 and 170 spots in the tiffscaled output, the error diffusion fails in an interesting way.15:31.03 
mvrhel_laptop Robin_Watts: yeah we probably should have put a beta_# after it15:31.16 
  let me look what the tag was15:31.30 
Robin_Watts marcosw: That's not an unexpected failure.15:31.59 
  mvrhel_laptop: Just wanted to make sure I was downloading the right version.15:32.19 
  We should have a build number in there.15:32.26 
tkamppeter henrys, these printers need XPS, there is someone (Helge Blischke) who wrote a simple filter in Perl for using GS with xpswrite, but I think that later on something in cups-filters would be needed. Also a rastertoxps to print from phones would be a nice thing.15:32.29 
Robin_Watts gsview_setup_6.0_0001.exe etc15:32.43 
mvrhel_laptop yes15:32.55 
  I will get that added in the nsis script15:33.41 
henrys tkamppeter: raster to xps should be pretty trivial15:33.48 
tkamppeter henrys, but before investing more time into a thing like rastertoxps I would need to know how many modern printers are actually XPS-only, as all Google results about these Konica Minolta printers are from 7 or 8 years ago.15:33.52 
mvrhel_laptop XPS-only has the be very rare15:34.18 
  I can't imagine15:34.25 
  s/the/to/15:34.41 
henrys tkamppeter: I don't think it's a good thing to spend a lot of time on but writing a rastertoxps is probably 3 hours so...15:35.03 
rayjj Robin_Watts: marcosw: rather than ETS (or other error diffusion), Phil may like the output using the stochastic threshold array. This should be faster to halftone and won't have strange effects15:35.03 
tkamppeter henrys, most work on a rastertoxps is to find out about the format, the structure, starting pages, selecting resolution and color space, ... the bitmap itself probably goes the same way as the rastertopdf filter for example.15:36.09 
tor8 Robin_Watts: could you compile the android release apks? the android sdk keeps breaking on my machine... stupid thing *still* doesn't work properly on 64-bit linux15:36.24 
Robin_Watts tor8: Sure.15:36.44 
  Did the VS2005 64bit building issues get sorted?15:36.54 
  mvrhel_laptop: gsview seems to work for me.15:38.04 
mvrhel_laptop thanks Robin_Watts 15:38.19 
henrys rayjj: this does sound like something for a twiki page... when to use halftone in the pipeline vs postprocessing. Really hard for me to imagine when you'd want in pipeline for host based stuff.15:38.32 
  rayjj: it would be interesting to see some timing numbers15:39.01 
  tor8: is that the only problem for the release?15:40.02 
tor8 henrys: as far as I can tell, yes.15:40.24 
  oh, and ios builds15:40.32 
  Robin_Watts: there are two commits on tor/master that we could probably sneak into the 1.8 release15:40.51 
mvrhel_laptop tor8: so do we want to add something like a pdf-create.c in the pdf directory to add in the stuff that I am pulling out from pdf-device.c (e.g. the code to create the image and font resources)15:40.52 
tor8 mvrhel_laptop: yes.15:41.12 
mvrhel_laptop ok. I will work on the image stuff today thanks15:41.30 
tor8 mvrhel_laptop: not sure I'm sold on the exact file name, but something to that effect will do fine :)15:41.40 
Robin_Watts mvrhel_laptop, tor8: That's something we should think about for a mo, maybe.15:41.41 
  Do we want to separate the pdf in and out codebases?15:42.11 
tor8 Robin_Watts: Going even further and separating the pdf module into in/out/common directories?15:43.19 
Robin_Watts tor8: I was just about to suggest that.15:43.36 
tor8 A lot of the PDF data structures are read/write (like the xref) but separating reading and writing clearly would make it easier to navigate the source15:44.09 
Robin_Watts yeah.15:44.15 
tor8 It has gotten to be a bit of a jungle, with odds and bits spread out throughout15:44.36 
henrys anybody else have meeting stuff?15:44.41 
mvrhel_laptop ok. so pdf-create.c is clearly not the right name/way15:44.41 
tor8 mvrhel_laptop: the right way, but we might want to stop and think about doing a big reorganize.15:45.17 
  oh boy, is that going to upset zeniko even more :)15:45.24 
  though I suspect he gave up trying to maintain that big patch of his already15:45.51 
henrys tkamppeter: isn't all that logic already in rastertopdf and you just need to find xps substitutes for the generated pdf? or am I missing something?15:47.32 
chrisl I'd be quite surprised if those bizhub printers seriously support XPS - 32MB and 120MHz seems quite low end for full XPS support.....15:49.00 
henrys chrisl: yeah my friend who worked directly on those printers said to use the gdi driver.15:49.46 
mvrhel_laptop tor8: so should I continue with the way that I am going for now? or is there something else that I need to do. I understand the desire/need for the reorg. I think it will be easier for me to get up to speed if I finish up this bit for the mutool create first though. Pulling stuff out of pdf-device.c that is going to be used for both will move us toward the goal of having the common stuff...15:50.09 
  ...in one place15:50.10 
marcosw rayjj: thanks for the suggestion. I just tried stocht.ps and it's not symmetric (the difference between the 0 and 1 patches is significant and the 253, 254, 255 patches are identically entirely white). Is this a bug or is ht_ccsto.ps tuned for a particular device (it appears to be doing negative dot gain).15:50.14 
Robin_Watts mvrhel_laptop: You push through as best suits you.15:50.27 
  We can always rejig stuff. git FTW :)15:50.38 
chrisl henrys: I'm surprised there's much interest in GDI printers these days. either......15:50.48 
mvrhel_laptop yes that is my thought, and I fully expect that to happen15:50.51 
tor8 mvrhel_laptop: Just keep going. The re-org may happen at any time either Robin or I get bored, but we're good with the git voodoo to fix things up :)15:51.04 
tkamppeter henrys, partially, for PDF I use the QPDF library which generates the PDF pieces for me. There seems to be no XPS processor or generator library for Linux/free software.15:51.19 
henrys chrisl: I think what he meant was the gdi driver will work XPS not so much, but he didn't say that explicitly.15:52.14 
Robin_Watts tor8: Have you tagged 1.8 ?15:52.36 
chrisl henrys: I wasn't specifically referring to that - just slightly surprised we're still talking about GDI printers these days!15:53.07 
  tkamppeter: has anyone actually tested printing XPS to these devices?15:53.28 
tkamppeter chrisl, I don't know.15:53.44 
henrys chrisl: true and when you look at printer prices why is anyone fooling with one of these bizhub bricks in the first place15:54.07 
chrisl tkamppeter: It just seems to me we/you could devote a lot of time to this, only to find they don't really work15:54.15 
tkamppeter henrys, the "gdi" output device of GS is only for Samsung printers, especially older bw printers.15:54.31 
tor8 Robin_Watts: I have only tagged the RC-1 locally, and that tag is what currently is on origin/master15:54.57 
chrisl henrys: Well, we've had a couple of people complain about us dropping our Postscript Level 1 output, so.......15:55.04 
henrys tkamppeter: I sent you a link to a gdi thing that my friend said would work. I have no idea how it works15:55.16 
tor8 Robin_Watts: I think we should get the two oldest commits on tor/master into 1.815:55.18 
  if you've looked them over I can push and then we can tag and build releases15:55.44 
Robin_Watts looking now.15:55.49 
henrys skype in 5 minutes.15:56.11 
marcosw I have to run to a doctor's appointment in a few minutes. does anyone have anything for me?15:56.19 
chrisl tkamppeter: I assume that the samsung gdi printer is really just a wrapper around a raster for each page15:56.37 
Robin_Watts tor8: screen_w - 20 etc15:56.39 
henrys marcosw: I'm good15:56.42 
Robin_Watts I dislike magic numbers.15:56.44 
tor8 Robin_Watts: they're *really* magic this time...15:56.58 
Robin_Watts Presumably that 20 is for window furniture width etc?15:56.58 
tor8 basically accounting for furniture as you said15:57.05 
chrisl #define MAGIC_NUMBER 2015:57.21 
Robin_Watts enum { FURNITURE_WIDTH = 20, FURNITURE_HEIGHT=40} ; ? 15:57.31 
  likewise, there are layout_w/layout_h/layout_em that could be DEFAULT_LAYOUT_{W,H,EM}15:58.29 
  Have all such things defined at the top of the file rather than buried in the middle of it.15:58.43 
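The suggestion above can be sketched roughly as follows. The two enum values come from the discussion; the helper function names are hypothetical and only illustrate how a bare `screen_w - 20` would read once the constants are named at the top of the file.

```c
#include <assert.h>

/* Values taken from the suggestion in the log; the helper functions
 * below are hypothetical and not part of the viewer source. */
enum {
    FURNITURE_WIDTH = 20,   /* horizontal space taken by window furniture */
    FURNITURE_HEIGHT = 40   /* vertical space taken by window furniture */
};

/* Largest content width that fits on a screen of the given width. */
static int max_content_width(int screen_w)
{
    assert(screen_w >= FURNITURE_WIDTH);
    return screen_w - FURNITURE_WIDTH;
}

/* Largest content height that fits on a screen of the given height. */
static int max_content_height(int screen_h)
{
    assert(screen_h >= FURNITURE_HEIGHT);
    return screen_h - FURNITURE_HEIGHT;
}
```

An enum keeps the constants visible to a debugger and scoped like ordinary identifiers, which is one common reason to prefer it over `#define`.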
mvrhel_laptop bbiab16:01.20 
Robin_Watts In the exception fiddling commit, rather than changing from error to ctx->error everywhere, we should do error = ctx->error at the top of the function. No need to force C to keep dereffing (pointer aliasing etc)16:01.57 
tkamppeter chrisl, that is the case, but the wrapper itself is specific to Samsung's printers. There is no PDL called GDI. GDI in reality is a printer driver API of Windows, and by GDI printer one understands a device without a standard PDL and with a Windows driver ("GDI driver") designed only for use with Windows. With knowledge about the printer's PDL/communication protocol one can make it work with any OS, but the manufacturers keep this info secret, probably to not show how dumb the printer is.16:02.31 
tor8 Robin_Watts: can do that.16:03.02 
Robin_Watts tor8: Actually... I'm having trouble seeing what the actual change in that commit is. Why the change from ctx->error to ctx when calling the functions ?16:03.52 
tor8 the ctx->error thing shouldn't really matter -- I'm just keeping things symmetrical (the throw/catch macros dereference the ctx->error multiple times)16:03.53 
chrisl tkamppeter: sure. What I meant was that might be a better route to a working solution than XPS16:03.56 
Robin_Watts Oh, so you can use fz_throw ?16:04.01 
tor8 I needed to pass the ctx not the error context so I could use throw16:04.22 
Robin_Watts yeah.16:04.28 
tkamppeter chrisl, you mean reverse-engineering the proprietary language of Konica Minolta printers? This helps only for Konica Minolta and also Konica Minolta will not necessarily keep their proprietary language for longer time. So the safer investment of time is XPS, it could also serve for sending jobs to Windows servers (assuming that XPS is still used nowadays).16:12.14 
chrisl tkamppeter: XPS is not going to be a safer investment of time if it doesn't actually work 16:13.36 
tor8 Robin_Watts: third commit on tor/master has magic number constants with names16:13.48 
tkamppeter But XPS is probably easier to make it to work as specs are published.16:15.43 
  chrisl, ^^16:15.49 
chrisl tkamppeter: yes, but it is also a full PDL, and the specs of at least those bizhub printers make me suspicious about the level to which they actually support XPS16:16.45 
rayjj marcosw: sorry, I had a minor issue I had to take care of.16:17.08 
Robin_Watts tor8: looks ok to me.16:17.10 
  I have a preference for enums rather than #defines personally, but...16:17.26 
tkamppeter chrisl, you mean that the Konica Minolta devices do not really fully support XPS and so one still needs a model-specific driver for it, even if one sends XPS to the printer?16:18.16 
chrisl tkamppeter: Yes.16:18.44 
rayjj marcosw: yes, the ht_ccsto.ps is a 167x167 with a transfer function 'baked in'. We can generate any dimension stochastic threshold array and then apply any transfer function desired (including linear) to it16:19.00 
  if the array is large enough, even with the transfer function, we will always get 256 shades16:19.58 
tor8 Robin_Watts: updated commit with enums instead16:20.11 
chrisl tkamppeter: I'm not saying *don't* pursue the XPS route, but I am saying, make sure it's going to work (enough) before heading down that route.16:20.34 
rayjj marcosw: and we can generate stochastic arrays with a minimum dot size (for laser/led engines)16:20.37 
Robin_Watts tor8: lovely!16:20.42 
  rayjj, marcosw: tiffscaled supports a minimum feature size thing too.16:21.14 
  -dMinFeatureSize can be 1,2 or 3, IIRC.16:22.11 
rayjj marcosw: I'll put together information on that for Phil, along with timings for the tiffscaled error diffusion vs. stochastic threshold array. I'll use a linear 256x256 array so images will be similar16:22.30 
tkamppeter chrisl, seems that XPS is not really worth the time, too few printers and one does not really know whether one makes them all work.16:23.26 
rayjj Robin_Watts: that sounds right. The stochastic array generator is a bit more specific in that you can choose the size/shape of the minimum dot 1x2, 2x1, 2x2, ...16:23.50 
Robin_Watts marcosw: What Phil *should* do is to use tiff24nc to generate a contone version.16:24.19 
  Then he can try all the different methods he can think of to process that down to 1bpp.16:24.40 
  When he finds the way that best works for him, we can try to reproduce it within gs.16:24.52 
chrisl tkamppeter: that is my feeling, but I am somewhat remote from consumer printing. It may be a case of "not worth the time, just now.... but worth keeping an eye on"16:24.55 
rayjj Robin_Watts: marcosw: I don't really know where Phil is going with this. Part of the RIP technology they got from the company they used (then bought) -- cust 850, was their "special" halftoning16:28.11 
  in the JaNe device16:28.37 
  they used gs to render to contone Lab and then the JaNe device took it from there16:29.13 
  The other strange thing is the performance issue with the -dFirstPage (Quick Q thread) mentioned a G850 CPU with only 2Gb RAM, where previously they were using massive Xeon based 8 core (or more) 8Gb systems16:31.14 
Robin_Watts Tor8: so, you want me to build off golden/master now ?16:34.25 
tor8 Robin_Watts: yes please16:34.57 
  I've got to pop out for dinner, but I'll check back in a couple of hours16:35.14 
rayjj Robin_Watts: The 'muddy' output from tiffscaled is probably due to their engine not working well with single dispersed dots. I'll mention that in my follow up as well16:36.37 
Robin_Watts rayjj: Right, so -dMinFeatureSize=2 may be enough to sort that.16:37.18 
rayjj Robin_Watts: I'll send him samples of it with that as well as stochastic threshold array (can we just say Blue Noise Mask, or BNM, now???) with single and 2x2 min dot as well -- both with no xfer function and timings16:39.03 
  the reason the stochastic generator supports shaped minimum dots is that some engines have 2x1 resolutions such as 1200x60016:42.02 
  cust 532 has that mode -- they call it "fast 1200"16:42.37 
Robin_Watts rayjj: I understand the idea. I just didn't code it :)16:43.07 
rayjj I guess marketing thought that sounded better than "half assed 1200"16:43.28 
  ;-)16:43.35 
mvrhel_laptop Robin_Watts: tor8 has stepped out so let me ask you 16:51.09 
  In his email he mentioned the cache that is used to avoid putting in an existing image into the resource object. In pdf-device.c this is a storage of md5s that are stored in a structure on the device. Where do we envision this structure residing when we add to our resources for mutool create16:55.07 
Robin_Watts Just give me a mo to have a look.16:56.15 
mvrhel_laptop_ hmm network issues here16:57.19 
Robin_Watts mvrhel_laptop: OK, so... we have fz_store.16:57.25 
mvrhel_laptop_ not sure if you had all my messages16:57.37 
Robin_Watts fz_store is the generic 'cache' for all sorts of things.16:57.42 
mvrhel_laptop ok16:58.04 
Robin_Watts Anything can be put into the fz_store, as long as its structure starts with an fz_storable struct.16:58.44 
  That contains a reference count and a type pointer.17:00.17 
  This was originally put in so that we could store stuff like decoded images; we'd decode the image and use it, and put it in the store. Then we'd drop our pointer to it.17:00.58 
  So the store was the only thing holding a reference to it.17:01.08 
  When we run low on memory the store would look through to see the things that have a single reference (i.e. just the store) holding them, and would bin them oldest first until we have enough memory to continue.17:01.54 
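The eviction policy described above can be sketched as below. This is a toy model under stated assumptions: `toy_item`, `toy_store` and `toy_store_scavenge` are hypothetical names, and the real fz_store and fz_storable structures are more involved than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not MuPDF's real fz_storable/fz_store types. */
typedef struct {
    int refs;     /* reference count; 1 means only the store holds it */
    size_t size;  /* bytes used by the cached item */
    int live;     /* non-zero while the slot is occupied */
} toy_item;

enum { TOY_STORE_MAX = 8 };

typedef struct {
    toy_item items[TOY_STORE_MAX];  /* kept in insertion order, oldest first */
    int count;
} toy_store;

/* Evict items that only the store references, oldest first, until at
 * least `needed` bytes have been released. Returns the bytes freed. */
static size_t toy_store_scavenge(toy_store *store, size_t needed)
{
    size_t freed = 0;
    int i;

    assert(store != NULL);
    for (i = 0; i < store->count && freed < needed; i++) {
        toy_item *it = &store->items[i];
        if (it->live && it->refs == 1) {
            freed += it->size;
            it->live = 0;
            it->refs = 0;
        }
    }
    return freed;
}
```

Items with more than one reference are skipped because something outside the store is still using them.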
mvrhel_laptop ok17:02.04 
  So I am having trouble seeing in send_image in pdf-device.c where fz_store is coming into play17:02.27 
Robin_Watts The question is, do we want (or can we) use fz_store as part of the pdf-device.17:02.44 
mvrhel_laptop or is the image already in the store when this is called?17:03.00 
  I am missing something17:03.21 
Robin_Watts When writing a PDF file out, we want to keep a list of the images we've already written, and then check whether a new image we want to write has been written to the pdf file already.17:03.42 
  mvrhel_laptop: (bear with me, the exposition is a way of me getting myself back up to speed)17:04.06 
mvrhel_laptop no problem. I feel bad having to bother you as I know you are busy17:04.25 
Robin_Watts So, we don't really want to be checking the fz_store, cos that only has stuff that we happen to have in memory, not an exhaustive list of stuff that exists in the resources for this page already.17:05.24 
mvrhel_laptop right17:05.38 
Robin_Watts So I think fz_store (and hence pdf_store) is a red herring.17:05.41 
  OK, so in pdf-device.c, we have a structure definition for pdf_device.17:06.41 
mvrhel_laptop it seems that we need an array of md5 sums in some place that is not hooked to the pdf_device 17:06.43 
Robin_Watts That contains num_imgs/max_imgs/images.17:07.06 
  Where image_entry *images = a block of max_imgs image_entries of which num_imgs are populated.17:07.40 
mvrhel_laptop yes. this is exactly where I am17:08.11 
  and that all seems reasonable17:08.21 
Robin_Watts Every time we meet a new image, we take a digest of it, and check to see if that digest is listed in the existing images; if so we do (I assume) a slow exhaustive compare to see if it matches. If not, we insert it as a new image.17:08.52 
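A minimal sketch of that digest-then-compare lookup, under assumptions: the table layout only loosely mirrors the num_imgs/max_imgs/images fields mentioned above, `find_or_add_image` is a hypothetical name, and a 32-bit FNV-1a hash stands in for the md5 digest to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MAX_IMGS = 16 };

typedef struct {
    uint32_t digest;            /* stand-in for the md5 digest */
    const unsigned char *data;  /* image bytes, not owned by the table */
    size_t len;
} image_entry;

typedef struct {
    image_entry images[MAX_IMGS];
    int num_imgs;
} image_table;

/* FNV-1a, used here only as a cheap substitute for md5. */
static uint32_t fnv1a(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    while (n--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

/* Returns the index of a matching entry, adding a new one if there is
 * no match. Returns -1 if the table is full. */
static int find_or_add_image(image_table *t, const unsigned char *data, size_t len)
{
    uint32_t d;
    int i;

    assert(t != NULL && data != NULL);
    d = fnv1a(data, len);
    for (i = 0; i < t->num_imgs; i++) {
        const image_entry *e = &t->images[i];
        /* Digest first (cheap), then the slow exhaustive compare. */
        if (e->digest == d && e->len == len && memcmp(e->data, data, len) == 0)
            return i;
    }
    if (t->num_imgs == MAX_IMGS)
        return -1;
    t->images[t->num_imgs].digest = d;
    t->images[t->num_imgs].data = data;
    t->images[t->num_imgs].len = len;
    return t->num_imgs++;
}
```

The exhaustive compare guards against two different images that happen to share a digest.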
mvrhel_laptop right17:08.58 
Robin_Watts So... that's all well and good for the case where we are creating completely new pages, and we don't worry about images being already in the PDF file.17:09.43 
  But it falls down when appending new content to a page, or when reusing an image on a page that's already used elsewhere.17:10.25 
  OK, I suspect I'm now caught up to where you were at the start of this conversation :)17:10.39 
mvrhel_laptop exactly17:10.45 
  so tor8 had the following int pdf_create_image(fz_context *ctx, pdf_document *doc, fz_image *image) where we return an object number17:11.51 
  My problem is that I don't see where I know what's already in doc17:12.16 
Robin_Watts mvrhel_laptop: Right. Me either.17:12.50 
  SOMEWHERE, we need to have a 'check to see if this image exists in the document already' mechanism.17:13.20 
  Whether that's hidden under pdf_create_image, or is expected to be called before pdf_create_image is called is a different discussion.17:13.47 
  I reckon it might be nicest to hide it under there.17:14.45 
  The fz_image *image pointer gives us everything we need to compare the image with existing ones.17:15.17 
  The pdf_document *doc pointer gives us everything we need to see what's in the current document.17:15.41 
  One idea might be to have a new structure as part of the doc that contains info for every image in the document.17:16.17 
mvrhel_laptop I was wondering that. Do we do some initial search/set up of the md5s in the existing doc and then search that 17:16.32 
Robin_Watts We could populate it on first access.17:16.33 
mvrhel_laptop :)17:16.37 
Robin_Watts So it wouldn't slow every file open down. Only when we start writing images.17:16.57 
mvrhel_laptop yes17:17.04 
Robin_Watts We'd run through every object in the file looking for it to be a /Type/Image (or whatever it is)17:17.24 
  and then stash details of W/H/compression.17:17.47 
  Then if we write an image, we can quickly check if we have any potential matches.17:18.07 
  For potential matches, we can update the info to also contain a hash of the compressed data.17:18.42 
  Then we can exhaustively check if the hashes match.17:19.06 
mvrhel_laptop oh, ok, so as a first cut just compare the obvious diffs and then do the hashes if there may be a match17:19.17 
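The two-stage check summarised above might look something like this. Everything here is hypothetical: the `indexed_image` layout, the `compression` code, and the FNV-1a hash used in place of a real digest of the compressed data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int w, h;                   /* cheap info stashed when the index is built */
    int compression;            /* hypothetical compression-type code */
    const unsigned char *data;  /* compressed bytes */
    size_t len;
    int has_digest;             /* digest is filled in only when needed */
    uint32_t digest;
} indexed_image;

/* FNV-1a as a stand-in for the real hash of the compressed data. */
static uint32_t toy_digest(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    while (n--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

/* Compute the digest on first use and remember it. */
static uint32_t get_digest(indexed_image *img)
{
    if (!img->has_digest) {
        img->digest = toy_digest(img->data, img->len);
        img->has_digest = 1;
    }
    return img->digest;
}

/* Returns the index of the first entry matching `candidate`, or -1. */
static int find_match(indexed_image *index, int n, indexed_image *candidate)
{
    int i;

    assert(index != NULL && candidate != NULL);
    for (i = 0; i < n; i++) {
        indexed_image *e = &index[i];
        /* First cut: reject on the obvious differences without hashing. */
        if (e->w != candidate->w || e->h != candidate->h ||
            e->compression != candidate->compression)
            continue;
        /* Potential match: only now pay for the digests. */
        if (get_digest(e) == get_digest(candidate))
            return i;
    }
    return -1;
}
```

Entries that never pass the cheap test never get hashed, which is the point of deferring the digest.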
  can the pdf-device use this too instead of pdev->images[i].digest 17:19.48 
Robin_Watts Absolutely. pdf device should move over to using this.17:20.10 
  This is a 'better' version of the limited thing I hacked up for pdf-device.17:20.27 
mvrhel_laptop ok. I think I have enough to keep me busy for a bit. thanks Robin_Watts 17:21.11 
  I may ping you again as I push on17:21.23 
Robin_Watts I suspect that fetching just X/Y/compression type etc from the file should be fast enough to not be a massive hit.17:21.31 
  Actually....17:21.40 
  I'd suggested doing this by walking the entire file looking for objects that were /Type/Image.17:22.26 
  That might be slow in the case where we have a file with lots of compressed objects.17:22.43 
mvrhel_laptop ok17:23.00 
Robin_Watts One possible way to be faster would be to walk the resources tree, and just check the Image objects listed there.17:23.17 
  s/walk the resources tree/walk the page tree and just check the image objects in the resources dictionaries there/17:23.48 
  Cos all the Image objects in the file *should* be listed in the page resources tree, and that'll be a smaller set to walk.17:24.18 
  But... another thought...17:25.09 
mvrhel_laptop Robin_Watts: ok. I imagine there must be code to do that already17:25.15 
Robin_Watts To walk the pages tree? Yes. To look for the image resources in the way I've just talked about, no.17:25.46 
mvrhel_laptop what does mutool extract do17:26.03 
Robin_Watts The other thought that occurs to me, is that images are not the only thing we're going to be doing this for.17:26.08 
mvrhel_laptop Robin_Watts: true17:26.21 
Robin_Watts We're going to need to do this for things like fonts, and potentially patterns etc.17:26.28 
mvrhel_laptop He wants to do the same thing with fonts17:26.36 
  yes17:26.38 
Robin_Watts Which leads me to the idea that we should generalise what we do to not just the image case.17:26.54 
  Maybe we can think of this as being a 'pdf resources indexer'17:27.16 
  i.e. we maintain an index of the resources within the pdf file.17:27.34 
  A 'resource' could be any of an image/font/pattern/+others and would have some 'cheap' info (X/Y/Compression in the case of images, font name/font type in the case of fonts etc), and then a shared digest field.17:30.23 
  Does that sound like enough of an idea for the shape of it?17:30.44 
  (can you think of any problems?)17:30.54 
mvrhel_laptop Robin_Watts: yes. so this would be a member variable of pdf_document? And we would still do the lazy initialization as well as the lazy fill in of the digest field. 17:31.37 
Robin_Watts Yes, exactly.17:31.48 
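One possible shape for an entry in such a resource index is sketched below, purely as an assumption about how the idea could be laid out. `res_entry`, `res_kind` and `cheap_match` are hypothetical names, and only the image and font cases from the discussion are modelled.

```c
#include <assert.h>
#include <string.h>

typedef enum { RES_IMAGE, RES_FONT } res_kind;

typedef struct {
    res_kind kind;
    int obj_num;                 /* object number of the resource in the file */
    union {
        struct { int w, h, compression; } image;
        struct { const char *name; int font_type; } font;
    } cheap;                     /* per-kind 'cheap' info */
    int has_digest;              /* zero until the digest is lazily filled in */
    unsigned char digest[16];    /* digest field shared by all kinds */
} res_entry;

/* Returns non-zero if the cheap info of two entries agrees, meaning the
 * pair is worth the cost of a digest comparison. */
static int cheap_match(const res_entry *a, const res_entry *b)
{
    assert(a != NULL && b != NULL);
    if (a->kind != b->kind)
        return 0;
    switch (a->kind) {
    case RES_IMAGE:
        return a->cheap.image.w == b->cheap.image.w &&
               a->cheap.image.h == b->cheap.image.h &&
               a->cheap.image.compression == b->cheap.image.compression;
    case RES_FONT:
        return a->cheap.font.font_type == b->cheap.font.font_type &&
               strcmp(a->cheap.font.name, b->cheap.font.name) == 0;
    }
    return 0;
}
```

Patterns and other resource kinds would add further union members and switch cases.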
  All the resources can be found by walking the pdf page tree and looking at the resources dictionaries we pass.17:32.00 
mvrhel_laptop Robin_Watts: ok. I think I have enough to make a nice mess of things17:32.33 
Robin_Watts I suspect that at least 1 version of the PDF spec said that XObjects can have their own resource dictionaries, so we'd need to descend those parts of the trees too.17:32.36 
mvrhel_laptop Robin_Watts: ok one other question17:32.55 
Robin_Watts mvrhel_laptop: Tor will arrive tomorrow and describe a much simpler way of doing all this, no doubt :)17:33.01 
mvrhel_laptop so in walking the page tree is that going to be any different than going through all the objects like is done in pdfextract.c17:33.31 
  where we find all the font and images17:33.43 
  or is there something more sophisticated17:34.01 
  or intelligent17:34.11 
Robin_Watts mvrhel_laptop: I think it'll be very similar to that.17:34.30 
mvrhel_laptop ok17:34.34 
Robin_Watts We should be careful to allow for malicious page trees.17:34.47 
mvrhel_laptop Thanks for the discussion Robin_Watts. 17:34.47 
  Robin_Watts: can you explain?17:35.14 
Robin_Watts i.e. ones where X is a descendant of X.17:35.24 
mvrhel_laptop ah17:35.28 
Robin_Watts We cope with that by doing pdf_mark and pdf_unmark as we descend to check we don't hit cycles.17:35.49 
  (search in the source for pdf_mark and you'll find code that walks the page tree, I think)17:36.14 
  Also, we should watch out for deeply nested trees (recursion may not be the best option, though it might be acceptable as a first step to getting something working)17:37.20 
  I've seen documents that basically have a page tree that's a linked list. So 1500 pages works out as 1500 levels of recursion in the native tree walker :)17:38.18 
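Both concerns, cycles and deep nesting, can be handled together with a mark bit and an explicit stack. The sketch below is an illustration only: `node`, `frame` and `count_pages` are hypothetical, the `marked` field stands in for the mark bit handled by pdf_mark_obj/pdf_unmark_obj, and a node with no kids is treated as a page.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    struct node **kids;  /* child nodes; unused when num_kids is 0 */
    int num_kids;        /* 0 means this node is a page (leaf) */
    int marked;          /* set while the node is on the current path */
} node;

typedef struct { node *n; int next; } frame;

/* Counts the pages under root without recursion. Returns -1 if a node
 * turns out to be its own descendant, or if memory runs out. */
static int count_pages(node *root)
{
    size_t cap = 16, top = 0;
    frame *stack;
    int pages = 0;

    assert(root != NULL);
    stack = malloc(cap * sizeof *stack);
    if (stack == NULL)
        return -1;
    root->marked = 1;
    stack[top].n = root;
    stack[top].next = 0;
    top++;
    while (top > 0) {
        frame *f = &stack[top - 1];
        node *kid;
        if (f->next == f->n->num_kids) {
            /* All kids visited (or none exist): unmark and pop. */
            if (f->n->num_kids == 0)
                pages++;
            f->n->marked = 0;
            top--;
            continue;
        }
        kid = f->n->kids[f->next++];
        if (kid->marked) {  /* kid is already on the current path: a cycle */
            pages = -1;
            break;
        }
        if (top == cap) {
            frame *grown = realloc(stack, 2 * cap * sizeof *stack);
            if (grown == NULL) {
                pages = -1;
                break;
            }
            stack = grown;
            cap *= 2;
        }
        kid->marked = 1;
        stack[top].n = kid;
        stack[top].next = 0;
        top++;
    }
    while (top > 0)  /* clear marks left behind by an early exit */
        stack[--top].n->marked = 0;
    free(stack);
    return pages;
}
```

Because the pending work lives on the heap, a 1500-level linked-list page tree costs 1500 small stack frames in a growable array rather than 1500 levels of C recursion.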
mvrhel_laptop wow17:38.33 
  so the pdf_mark_obj and pdf_unmark_obj are used so that we can identify that we have already visited this object during searches?17:41.40 
Robin_Watts Yes.17:43.59 
  There is a secret bit in our in memory representation of pdf_dicts for that.17:44.30 
mvrhel_laptop Robin_Watts: just looking at this. Another question. pdf_load_colorspace_imp does a throw if it detects a recursion in color space object. Shouldn't the call to pdf_load_colorspace_imp in pdf_load_colorspace be wrapped up in the try to rethrow?17:46.59 
Robin_Watts s/native tree walker/naive tree walker/ sorry.17:47.15 
mvrhel_laptop This is in pdf-colorspace.c . 17:47.37 
Robin_Watts Why?17:47.59 
  It only needs to be wrapped up if it needs to tidy up in the case of a throw.17:48.25 
mvrhel_laptop ok17:48.35 
Robin_Watts We have no tidying up to do here, so all we'd do is rethrow it anyway.17:48.45 
  One of the (many) big attractions of try/catch over explicit error passing is that we don't need to clutter functions with extra stuff, unless there is actual cleanup to do.17:49.32 
Robin_Watts will convert you :)17:49.49 
mvrhel_laptop :) In this case, I am confused as to what gets returned17:50.26 
Robin_Watts mvrhel_laptop: What gets returned where?17:50.38 
mvrhel_laptop by pdf_load_colorspace_imp17:50.48 
Robin_Watts If pdf_load_colorspace_imp throws, then nothing is returned.17:50.55 
  We don't go through C's return mechanism.17:51.13 
mvrhel_laptop ok17:51.16 
Robin_Watts We longjmp out to the previous catch.17:51.23 
mvrhel_laptop ah17:51.32 
  I see17:51.41 
  excuse my ignorance on that17:52.01 
Robin_Watts mvrhel_laptop: No worries. fz_try/fz_catch are best thought of as sausages.17:52.27 
mvrhel_laptop :)17:52.42 
Robin_Watts A fabulous invention that makes our lives better, but you don't want to look into what goes into them.17:52.54 
mvrhel_laptop ok Thanks for all the help Robin_Watts . bbiab17:54.18 
Robin_Watts no worries.17:54.41 
rayjj mvrhel_laptop: Robin_Watts: Does mupdf also handle "inline" images, w.r.t. extract images or writing PDFs ?18:09.36 
Robin_Watts rayjj: MuPDF handles inline images for reading, certainly.18:22.53 
  For writing, we never write inline.18:23.08 
  (currently)18:23.11 
  For extracting images using the pdf structured text device (or any other device) you see inline and non-inline images as identical things.18:23.42 
  For extracting images using the mutool extract, that works via object number, so, no, inline objects can't be accessed.18:24.15 
  For the purposes of the discussion I just had with michael, we can ignore inline images, cos it makes no sense to try to match a previous inline image as we can't reuse it.18:24.47 
mvrhel_laptop right. that is a good point18:25.13 
Robin_Watts which is good, cos it doesn't invalidate the "look in the page tree resources dicts" plan :)18:26.16 
mvrhel_laptop yes18:26.25 
Robin_Watts mvrhel_laptop: Random thought... is it worth us writing a generic 'map this function over page tree entries' function ?18:27.05 
mvrhel_laptop Robin_Watts: I don't quite follow what you mean. I suspect you are at a higher abstraction level...18:27.54 
Robin_Watts Part of our current task is "For every node in the page tree, check the resources", right?18:28.26 
mvrhel_laptop yes18:28.31 
Robin_Watts So I'm thinking that we could do with a function MapOverPageTree(X) that would walk the page tree and call X(page)18:29.10 
  so for our job X would do "check the resources for page"18:29.32 
  Did that make it clearer?18:30.58 
rayjj Robin_Watts: if we have small images that are determined to be unique when the PDF is created, it is slightly more efficient to write them as inline images, but that may not be worth it (but gs does it)18:31.48 
mvrhel_laptop Robin_Watts: So we would have some defined proc prototype which might be set to check for font matches or might be set to check for image matches etc18:32.56 
Robin_Watts mvrhel_laptop: Yes. We'd have a function that walks the tree, and one of the params to it would be a function to run on each node.18:33.36 
  Let me look in the existing code.18:33.56 
mvrhel_laptop I am going to need to start writing all of this down....18:34.26 
rayjj mvrhel_laptop: on a different topic, since I'm putting something together for Phil about halftoning, I was going to mention gen_ordered as well as gen_stochastic -- back in 2011 I gave you a snapshot of the stochastic stuff. Did you ever do anything to/with it ?18:34.41 
Robin_Watts mvrhel_laptop: Currently we have a function 'pdf_lookup_page_loc' for example.18:35.15 
  That knows how to efficiently and safely walk the pagetree allowing for cycles and not recursing too much.18:35.57 
  It'd be lovely to refactor that somehow into a form whereby our other actions on the page tree could share that same intelligence.18:36.48 
rayjj mvrhel_laptop: and I realized that although we checked the tools into git, neither of us ever committed the linearize_threshold18:37.30 
mvrhel_laptop Robin_Watts: ok I understand18:38.08 
Robin_Watts mvrhel_laptop: It might require some cleverness so that we remain efficient in both cases. I haven't thought it through.18:39.05 
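The "map this function over page tree entries" idea discussed above can be sketched as a tree walker that takes a callback and guards against cycles and excessive depth. This is only an illustration under assumptions: toy_node, toy_map_page_tree, toy_count and TOY_MAX_DEPTH are hypothetical names, the node type is a stand-in rather than a MuPDF object, and the simple recursive walk with a depth cap is not necessarily how pdf_lookup_page_loc does it.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEPTH 64   /* arbitrary guard against runaway recursion */

/* Hypothetical stand-in for a page tree node; not a MuPDF type. */
typedef struct toy_node {
    struct toy_node **kids;  /* NULL for a leaf (a page) */
    size_t nkids;
    int visiting;            /* set while the node is on the current path */
} toy_node;

typedef void (toy_visit_fn)(toy_node *node, void *arg);

/* Calls fn on every node reachable from node.
 * Returns 0 on success, -1 if a cycle or an over-deep tree is found. */
int toy_map_page_tree(toy_node *node, toy_visit_fn *fn, void *arg, int depth)
{
    size_t i;
    int rc = 0;

    assert(fn != NULL);
    if (node == NULL)
        return 0;
    if (node->visiting || depth > TOY_MAX_DEPTH)
        return -1;

    node->visiting = 1;
    fn(node, arg);
    for (i = 0; i < node->nkids && rc == 0; i++)
        rc = toy_map_page_tree(node->kids[i], fn, arg, depth + 1);
    node->visiting = 0;
    return rc;
}

/* Example callback; a real one might "check the resources for page". */
void toy_count(toy_node *node, void *arg)
{
    (void)node;
    ++*(int *)arg;
}

/* Usage: a root with two pages; returns the number of nodes visited. */
int toy_demo_count(void)
{
    toy_node page1 = { NULL, 0, 0 };
    toy_node page2 = { NULL, 0, 0 };
    toy_node *kids[2];
    toy_node root;
    int count = 0;

    kids[0] = &page1;
    kids[1] = &page2;
    root.kids = kids;
    root.nkids = 2;
    root.visiting = 0;
    if (toy_map_page_tree(&root, toy_count, &count, 0) != 0)
        return -1;
    return count;
}

/* Usage: a node that lists itself as a kid is reported as a cycle. */
int toy_demo_cycle(void)
{
    toy_node root = { NULL, 0, 0 };
    toy_node *kids[1];
    int count = 0;

    kids[0] = &root;
    root.kids = kids;
    root.nkids = 1;
    return toy_map_page_tree(&root, toy_count, &count, 0);
}
```

Passing the action as a function pointer plus an opaque argument is what would let a font check and an image check share one walker, as suggested in the discussion.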
mvrhel_laptop rayjj: oh I see that we never added the linearization18:40.35 
  I thought we had done that18:40.50 
rayjj mvrhel_laptop: you never made any changes to it, did you ?18:41.04 
mvrhel_laptop No. I did not18:41.10 
  I remember playing with it18:41.24 
  and feeding it into the gen_ordered stuff I think18:41.36 
  it was too long ago18:41.41 
rayjj mvrhel_laptop: ISTR that you made gen_ordered so it could emit a file of the same format as gen_stochastic does so that it could be run through linearize_threshold18:42.03 
mvrhel_laptop yes18:42.10 
  that is what I remember18:42.14 
  ed18:42.16 
rayjj mvrhel_laptop: well, I'll just commit the version I have, and maybe a README on it (and gen_stochastic)18:42.43 
mvrhel_laptop rayjj: the readme in the ordered generation does mention using thresh_remap 18:44.29 
  A problem though if it's not there....18:44.40 
  using the turn on sequence (tos) output18:45.04 
tor8 mvrhel_laptop: Robin_Watts: the approach robin outlined with adding a structure that tracks resources to pdf_document looks good18:46.52 
mvrhel_laptop tor8: ok good18:47.06 
tor8 we might think about only seeding it from resources discovered when parsing the pdf18:47.13 
Robin_Watts waits for the other shoe...18:47.17 
tor8 since if we're reusing resources, then we've already loaded them at least once18:47.29 
rayjj mvrhel_laptop: oh, should I rename it ?18:47.32 
tor8 mvrhel_laptop: I'd hand waved that resource tracker thing in my email, robin seems to have figured out what needs to be done. I might come up with some ideas once I've slept on it or seen it in action :)18:48.34 
  and the purging of resources in low memory conditions does indeed mean the fz_store is unsuitable18:48.59 
rayjj mvrhel_laptop: I might as well rename it, since it's more descriptive of what it does18:49.08 
tor8 I expect a plain md5 (or sha1) sum of the contents should be good enough that we don't need to hang on to the actual data18:49.18 
  Robin_Watts: in pdf-page.c there's a resource walker that looks for blending operations which we call whenever we load a page18:50.11 
mvrhel_laptop I saw that18:50.19 
tor8 pdf_resources_use_blending18:50.22 
mvrhel_laptop rayjj: I wonder where I got that name18:50.49 
  so with mutool create -f F0 Times.ttf -i Im0 logo.png -i Im1 photo.jpg contents.txt if we find the document already has logo.png, how do we handle the replacement of the reference to Im0 in contents.txt18:52.24 
  i.e. the indirect reference will be different18:52.46 
  so I would need to filter contents.txt18:52.54 
  tor8, Robin_Watts ^^, or am I missing something18:53.11 
tor8 mvrhel_laptop: if looking at my original email, the function pdf_store_image would rummage through the resource tracker looking for a match and if it finds it, returns the object number18:53.52 
  the names (Im0 and Im1) are only used in the resource dictionary for that specific content stream to map a name to an object number18:54.32 
  I expect that mapping to be different for each page18:54.39 
  so "mutool create -i Im0 logo.png -i Im1 logo.png contents.txt" would create one resource for logo.png, and then map both Im0 and Im1 to that same object number18:55.27 
  it would load logo.png twice, but the resource tracker would catch the duplication18:55.45 
mvrhel_laptop ok18:55.54 
tor8 does that make sense?18:55.59 
mvrhel_laptop that makes sense18:56.00 
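The behaviour described above (load logo.png twice, detect the duplicate by a digest of its contents, and map both Im0 and Im1 to one object number) can be sketched roughly as follows. All names here (toy_tracker, toy_digest, toy_track_image, toy_demo_shared) are hypothetical, and FNV-1a is swapped in for the MD5 or SHA-1 sum mentioned in the discussion purely to keep the example free of dependencies; a real tracker would presumably use a proper cryptographic digest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RESOURCES 32

/* Hypothetical tracker: remembers a digest per resource, not the data itself. */
typedef struct {
    uint64_t digest[TOY_MAX_RESOURCES];
    int objnum[TOY_MAX_RESOURCES];
    size_t count;
    int next_objnum;
} toy_tracker;

/* FNV-1a, standing in for MD5/SHA-1 in this sketch. */
static uint64_t toy_digest(const unsigned char *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

/* Returns the object number for data, reusing an existing one when the
 * digest matches. Returns -1 if the table is full. */
int toy_track_image(toy_tracker *t, const unsigned char *data, size_t len)
{
    uint64_t d;
    size_t i;

    assert(t != NULL);
    d = toy_digest(data, len);
    for (i = 0; i < t->count; i++)
        if (t->digest[i] == d)
            return t->objnum[i];
    if (t->count == TOY_MAX_RESOURCES)
        return -1;
    t->digest[t->count] = d;
    t->objnum[t->count] = t->next_objnum;
    t->count++;
    return t->next_objnum++;
}

/* Usage: Im0 and Im1 both come from the same bytes, Im2 from different ones.
 * Returns 1 when Im0 and Im1 share an object number and Im2 does not. */
int toy_demo_shared(void)
{
    static const unsigned char logo[] = "logo bytes";
    static const unsigned char photo[] = "photo bytes";
    toy_tracker t = { {0}, {0}, 0, 1 };
    int im0 = toy_track_image(&t, logo, sizeof logo);
    int im1 = toy_track_image(&t, logo, sizeof logo);
    int im2 = toy_track_image(&t, photo, sizeof photo);

    return im0 == im1 && im2 != im0;
}
```

The names Im0 and Im1 never reach the tracker in this sketch; they would live only in each content stream's resource dictionary, pointing at whatever object number the tracker hands back.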
tor8 we might want to rename my example functions pdf_store_* into something different, not to get them confused with the fz_store18:57.21 
  pdf_track_image (calls a generic pdf_track_resource) maybe18:57.47 
  you or robin may come up with a better name18:57.58 
Robin_Watts We already have pdf_store functions that build on the fz_store ones.19:00.00 
  Any new functions should avoid the 'store' name, I reckon, for the benefit of easily confused people like me.19:00.31 
  Is the pdf_resource name used anywhere ?19:00.52 
mvrhel_laptop so track does not do a whole lot for me19:00.58 
  pdf_resource seems to not be used except for in pdf_resources_use_blending19:01.39 
Robin_Watts tor8: MuPDF android builds are done and uploaded to beta test.19:02.31 
  And to my public_html19:02.49 
  MuPDF-8{0,1,2,3}.apk19:02.59 
hyper_ch hi there, ghostscript is replacing Helvetica and other fonts in pdfs when I process them. Is that really necessary? Can't those fonts just stay?19:03.52 
Robin_Watts hyper_ch: gs is not "replacing" fonts in a PDF when it processes them, because gs does not 'process' a PDF :)19:04.36 
  It consumes one PDF and throws out another PDF that hopefully looks like the one that came in.19:04.57 
  But other than hopefully looking the same, the two PDFs are unrelated.19:05.27 
  Now, if you're saying that the fonts are not surviving the conversion process, that's a valid concern.19:05.53 
  The guy you need to speak to about this is kens, and he's gone for the night.19:06.08 
  He'll be back in about 14 hours' time, I guess.19:06.28 
hyper_ch Robin_Watts: https://paste.simplylinux.ch/view/20d726b719:07.52 
Robin_Watts Standard answer: Have you tried using an up to date version of gs?19:08.46 
hyper_ch it's the newest debian ;)19:09.01 
Robin_Watts So?19:09.30 
hyper_ch just saying :)19:09.46 
Robin_Watts I suspect the problem is that your input file names the fonts without embedding them.19:10.06 
hyper_ch that's also possible19:10.21 
Robin_Watts If gs can't find the fonts, then it substitutes them.19:10.35 
hyper_ch so if I installed helvetica, things would be well then?19:10.48 
Robin_Watts No... you've got it backwards.19:11.02 
  The file says "Use LiberationSans" and gs says "I can't find LiberationSans, so I'm using Helvetica"19:11.22 
hyper_ch can't find -> can't find in the pdf/gs19:11.22 
  ah19:11.45 
Robin_Watts You either need to get the source documents to include LiberationSans, OR you need to make it so that gs can find LiberationSans.19:12.00 
  Now, how you make it so that gs can find LiberationSans, is a good question that I'm not entirely sure of the answer to.19:12.29 
  chrisl, rayjj, or kens would probably know.19:13.05 
hyper_ch when you mention rayjj, he leaves the channel :)19:14.58 
Robin_Watts Is GS_FONTPATH set on your system ?19:15.26 
  If not, try setting it to the path to where LiberationSans can be found.19:16.26 
  If that doesn't work, you may need to fiddle with FontMap, and at that point, I run away, sorry.19:16.46 
  http://www.ghostscript.com/doc/current/Use.htm#Font_lookup19:16.57 
hyper_ch will do so19:19.02 
mvrhel_laptop lunch time19:37.44 