Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

	20151110
marcosw	Robin_Watts: WOW, the SM951 is fast!	04:37.09
	bonnie++ reports 1.5 GB/s writes and 2.3 GB/s reads. Amusingly most of the results are ++++, indicating the test finished too fast to measure accurately, at least with the default parameters.	04:53.43
	The SM951 does get warm; it went from 40C to 81C in a couple of minutes with continuous reads/writes. The limit is 85C, but it appeared to start throttling when it hit 80C. The M.2 slot doesn't get much airflow with the case I'm using.	05:37.46
James	Hi	11:13.52
ghostbot	Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.	11:13.52
James	Is anyone there	11:13.56
kens	No	11:14.09
Guest49912	Ha ha	11:14.15
	Quick question for you	11:14.21
	Mupdf looks great	11:14.50
	Does it support interactive PDF for page navigation?	11:15.09
kens	I don't know what you mean by that, are you talking about following hyperlinks ?	11:15.38
Guest49912	Yes	11:15.48
	Exactly	11:15.50
	So users can navigate from a contents page	11:16.03
Robin_Watts	MuPDF is a portable C library for opening/manipulating PDF (and other files).	11:16.10
Guest49912	Instead of flicking through hundreds of pages	11:16.14
Robin_Watts	The core of mupdf certainly reads that information from the PDF files.	11:16.37
kens	It may depend on the OS; the demo viewer varies with the platform. However, it's certainly possible in the core library, and I thought it was present in the viewers. I am not a MuPDF developer, though.	11:16.38
tor8	Guest49912: Yes. The MuPDF viewer supports hyperlinks. The mobile and the new desktop viewers also support the outline/table of contents for quick navigation.	11:16.58
Robin_Watts	The viewers for various different platforms are just thin wrappers around the core. not all of them expose all of the functionality of the core.	11:17.03
Guest49912	OK, we're running it on Android from Google Play.	11:17.04
Robin_Watts	So, depending on what platform you want to run it on, ymmv.	11:17.17
Guest49912	Just trying to work out the best way of getting it to work	11:17.18
	It's failing at the moment with hyperlinks set from Acrobat.	11:17.36
Robin_Watts	Guest49912: Yes the android app supports link following. But only if you click the 'link' icon in the top bar first :)	11:17.49
	Look for the icon that looks like a chain.	11:17.59
mhayden	just stumbled onto mupdf and i love it -- thanks for the work there, folks	13:52.55
sebras	mhayden: are you using the android app? or the pre-built apps for windows/ios/linux/winrt?	13:55.22
mhayden	i'm using it in linux (Fedora)	13:55.45
sebras	mhayden: alright, it is always interesting to know. thanks! :)	14:44.39
mhayden	no problem ;)	14:53.52
henrys	Robin_Watts: tiffscaled has really become an important device for us, great stuff!	15:15.20
Robin_Watts	henrys: We should consider a tiffscaledets device maybe at some point.	15:16.19
henrys	Robin_Watts: good idea.	15:18.07
Robin_Watts	All the smarts of tiffscaled are in the downscaler which is used by lots of devices. It would be a matter of updating that to know how to do ETS.	15:20.55
	and then we'd use the existing tiffscaled etc, but with -dDownScaleMode=1 to tell it to use ETS.	15:21.29
henrys	Robin_Watts: I'll add it to the project list on the agenda. Maybe something rayjj would be interested in doing.	15:22.42
marcosw	Robin_Watts: thanks for the Phil email analysis.	15:23.32
Robin_Watts	marcosw: np.	15:23.42
henrys	Other than asking about the mupdf release I'm not really feeling the need for a meeting.	15:24.59
	tkamppeter: were you able to help out the folks trying to use the old konica minolta printers? I visited their office last week and mentioned the community was having printing problems.	15:27.23
	tkamppeter: not the HQ but the "printer language group"	15:28.01
	I saw Spy last night and I'm still laughing about it. Good fun movie.	15:28.44
mvrhel_laptop	I was wondering if anyone had gone ahead and reinstalled the gsview beta (for windows) that I had chrisl put up a couple of weeks ago. wanted to make sure there were not any major issues	15:28.50
henrys	mvrhel_laptop: I will today at some point	15:29.28
mvrhel_laptop	thanks	15:29.36
henrys	chrisl: does it make sense to test the final commercial release or do you think the last problem we had is isolated and not likely to recur	15:30.03
	?	15:30.05
Robin_Watts	mvrhel_laptop: This is still called gsview_setup_6.0.exe ?	15:30.51
chrisl	henrys: I do test it - but I rarely use the VS projects, and almost always build at the command line	15:30.53
marcosw	Robin_Watts: take a look at the 85 and 170 spots in the tiffscaled output; the error diffusion fails in an interesting way.	15:31.03
mvrhel_laptop	Robin_Watts: yeah, we probably should have put a beta_# after it	15:31.16
	let me look what the tag was	15:31.30
Robin_Watts	marcosw: That's not an unexpected failure.	15:31.59
	mvrhel_laptop: Just wanted to make sure I was downloading the right version.	15:32.19
	We should have a build number in there.	15:32.26
tkamppeter	henrys, these printers need XPS, there is someone (Helge Blischke) who wrote a simple filter in Perl for using GS with xpswrite, but I think that later on something in cups-filters would be needed. Also a rastertoxps to print from phones would be a nice thing.	15:32.29
Robin_Watts	gsview_setup_6.0_0001.exe etc	15:32.43
mvrhel_laptop	yes	15:32.55
	I will get that added in the nsis script	15:33.41
henrys	tkamppeter: raster to xps should be pretty trivial	15:33.48
tkamppeter	henrys, but before investing more time into a thing like rastertoxps I would need to know how many modern printers are actually XPS-only, as all Google results about these Konica Minolta printers are from 7 or 8 years ago.	15:33.52
mvrhel_laptop	XPS-only has the be very rare	15:34.18
	I can't imagine	15:34.25
	s/the/to/	15:34.41
henrys	tkamppeter: I don't think it's a good thing to spend a lot of time on but writing a rastertoxps is probably 3 hours so...	15:35.03
rayjj	Robin_Watts: marcosw: rather than ETS (or other error diffusion), Phil may like the output using the stochastic threshold array. This should be faster to halftone and won't have strange effects	15:35.03
tkamppeter	henrys, most work on a rastertoxps is to find out about the format, the structure, starting pages, selecting resolution and color space, ... the bitmap itself probably goes the same way as the rastertopdf filter for example.	15:36.09
tor8	Robin_Watts: could you compile the android release apks? the android sdk keeps breaking on my machine... stupid thing still doesn't work properly on 64-bit linux	15:36.24
Robin_Watts	tor8: Sure.	15:36.44
	Did the VS2005 64bit building issues get sorted?	15:36.54
	mvrhel_laptop: gsview seems to work for me.	15:38.04
mvrhel_laptop	thanks Robin_Watts	15:38.19
henrys	rayjj: this does sound like something for a twiki page... when to halftone in the pipeline vs. postprocessing. It's really hard for me to imagine when you'd want it in the pipeline for host-based stuff.	15:38.32
	rayjj: it would be interesting to see some timing numbers	15:39.01
	tor8: is that the only problem for the release?	15:40.02
tor8	henrys: as far as I can tell, yes.	15:40.24
	oh, and ios builds	15:40.32
	Robin_Watts: there are two commits on tor/master that we could probably sneak into the 1.8 release	15:40.51
mvrhel_laptop	tor8: so do we want to add something like a pdf-create.c in the pdf directory to add in the stuff that I am pulling out from pdf-device.c (e.g. the code to create the image and font resources)	15:40.52
tor8	mvrhel_laptop: yes.	15:41.12
mvrhel_laptop	ok. I will work on the image stuff today thanks	15:41.30
tor8	mvrhel_laptop: not sure I'm sold on the exact file name, but something to that effect will do fine :)	15:41.40
Robin_Watts	mvrhel_laptop, tor8: That's something we should think about for a mo, maybe.	15:41.41
	Do we want to separate the pdf in and out codebases?	15:42.11
tor8	Robin_Watts: Going even further and separating the pdf module into in/out/common directories?	15:43.19
Robin_Watts	tor8: I was just about to suggest that.	15:43.36
tor8	A lot of the PDF data structures are read/write (like the xref) but separating reading and writing clearly would make it easier to navigate the source	15:44.09
Robin_Watts	yeah.	15:44.15
tor8	It has gotten to be a bit of a jungle, with odds and bits spread out throughout	15:44.36
henrys	anybody else have meeting stuff?	15:44.41
mvrhel_laptop	ok. so pdf-create.c is clearly not the right name/way	15:44.41
tor8	mvrhel_laptop: the right way, but we might want to stop and think about doing a big reorganize.	15:45.17
	oh boy, is that going to upset zeniko even more :)	15:45.24
	though I suspect he gave up trying to maintain that big patch of his already	15:45.51
henrys	tkamppeter: isn't all that logic already in rastertopdf and you just need to find xps substitutes for the generated pdf? or am I missing something?	15:47.32
chrisl	I'd be quite surprised if those bizhub printers seriously support XPS - 32MB and 120MHz seems quite low end for full XPS support.....	15:49.00
henrys	chrisl: yeah my friend who worked directly on those printers said to use the gdi driver.	15:49.46
mvrhel_laptop	tor8: so should I continue with the way that I am going for now? Or is there something else that I need to do? I understand the desire/need for the reorg. I think it will be easier for me to get up to speed if I finish this bit for mutool create first, though. Pulling stuff out of pdf-device.c that is going to be used for both will move us toward the goal of having the common stuff...	15:50.09
	...in one place	15:50.10
marcosw	rayjj: thanks for the suggestion. I just tried stocht.ps and it's not symmetric (the difference between the 0 and 1 patches is significant, and the 253, 254 and 255 patches are identical, entirely white). Is this a bug, or is ht_ccsto.ps tuned for a particular device? (It appears to be doing negative dot gain.)	15:50.14
Robin_Watts	mvrhel_laptop: You push through as best suits you.	15:50.27
	We can always rejig stuff. git FTW :)	15:50.38
chrisl	henrys: I'm surprised there's much interest in GDI printers these days, either......	15:50.48
mvrhel_laptop	yes that is my thought, and I fully expect that to happen	15:50.51
tor8	mvrhel_laptop: Just keep going. The re-org may happen at any time either Robin or I get bored, but we're good with the git voodoo to fix things up :)	15:51.04
tkamppeter	henrys, partially, for PDF I use the QPDF library which generates the PDF pieces for me. There seems to be no XPS processor or generator library for Linux/free software.	15:51.19
henrys	chrisl: I think what he meant was that the gdi driver will work and XPS not so much, but he didn't say that explicitly.	15:52.14
Robin_Watts	tor8: Have you tagged 1.8 ?	15:52.36
chrisl	henrys: I wasn't specifically referring to that - just slightly surprised we're still talking about GDI printers these days!	15:53.07
	tkamppeter: has anyone actually tested printing XPS to these devices?	15:53.28
tkamppeter	chrisl, I don't know.	15:53.44
henrys	chrisl: true and when you look at printer prices why is anyone fooling with one of these bizhub bricks in the first place	15:54.07
chrisl	tkamppeter: It just seems to me we/you could devote a lot of time to this, only to find they don't really work	15:54.15
tkamppeter	henrys, the "gdi" output device of GS is only for Samsung printers, especially older bw printers.	15:54.31
tor8	Robin_Watts: I have only tagged the RC-1 locally, and that tag is what currently is on origin/master	15:54.57
chrisl	henrys: Well, we've had a couple of people complain about us dropping our Postscript Level 1 output, so.......	15:55.04
henrys	tkamppeter: I sent you a link to a gdi thing that my friend said would work. I have no idea how it works	15:55.16
tor8	Robin_Watts: I think we should get the two oldest commits on tor/master into 1.8	15:55.18
	if you've looked them over I can push and then we can tag and build releases	15:55.44
Robin_Watts	looking now.	15:55.49
henrys	skype in 5 minutes.	15:56.11
marcosw	I have to run to a doctor's appointment in a few minutes. does anyone have anything for me?	15:56.19
chrisl	tkamppeter: I assume that the samsung gdi printer is really just a wrapper around a raster for each page	15:56.37
Robin_Watts	tor8: screen_w - 20 etc	15:56.39
henrys	marcosw: I'm good	15:56.42
Robin_Watts	I dislike magic numbers.	15:56.44
tor8	Robin_Watts: they're really magic this time...	15:56.58
Robin_Watts	Presumably that 20 is for window furniture width etc?	15:56.58
tor8	basically accounting for furniture as you said	15:57.05
chrisl	#define MAGIC_NUMBER 20	15:57.21
Robin_Watts	enum { FURNITURE_WIDTH = 20, FURNITURE_HEIGHT=40} ; ?	15:57.31
	likewise, there are layout_w/layout_h/layout_em that could be DEFAULT_LAYOUT_{W,H,EM}	15:58.29
	Have all such things defined at the top of the file rather than buried in the middle of it.	15:58.43
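A minimal sketch of the convention Robin describes: tunables gathered in one enum at the top of the file rather than left as bare numbers mid-function. FURNITURE_WIDTH/FURNITURE_HEIGHT use the 20/40 values quoted above; the DEFAULT_LAYOUT_* values and the max_window_size helper are illustrative assumptions, not taken from the MuPDF viewer source.

```c
/* Tunables collected at the top of the file instead of appearing as bare
 * numbers in the middle of it. FURNITURE_* use the values quoted in the
 * log; the DEFAULT_LAYOUT_* values are illustrative placeholders. */
enum {
	FURNITURE_WIDTH = 20,    /* horizontal space taken by window decorations */
	FURNITURE_HEIGHT = 40,   /* vertical space taken by window decorations */
	DEFAULT_LAYOUT_W = 450,  /* placeholder reflow page width */
	DEFAULT_LAYOUT_H = 600,  /* placeholder reflow page height */
	DEFAULT_LAYOUT_EM = 12   /* placeholder reflow font size */
};

static int layout_w = DEFAULT_LAYOUT_W;
static int layout_h = DEFAULT_LAYOUT_H;
static int layout_em = DEFAULT_LAYOUT_EM;

/* Largest window that fits on the screen once furniture is allowed for,
 * replacing expressions like screen_w - 20. */
static void max_window_size(int screen_w, int screen_h, int *win_w, int *win_h)
{
	*win_w = screen_w - FURNITURE_WIDTH;
	*win_h = screen_h - FURNITURE_HEIGHT;
}
```

Using an enum rather than #define also tends to keep the names visible in a debugger, which is one common reason for the preference Robin mentions a little later.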
mvrhel_laptop	bbiab	16:01.20
Robin_Watts	In the exception fiddling commit, rather than changing from error to ctx->error everywhere, we should do error = ctx->error at the top of the function. No need to force C to keep dereffing (pointer aliasing etc)	16:01.57
tkamppeter	chrisl, that is the case, but the wrapper itself is specific to Samsung's printers. There is no PDL called GDI. GDI is really a printer driver API of Windows, and a "GDI printer" means a device without a standard PDL whose Windows driver ("GDI driver") is designed only for use with Windows. With knowledge of the printer's PDL/communication protocol one can make it work with any OS, but manufacturers keep this info secret, probably so as not to show how dumb the printer is.	16:02.31
tor8	Robin_Watts: can do that.	16:03.02
Robin_Watts	tor8: Actually... I'm having trouble seeing what the actual change in that commit is. Why the change from ctx->error to ctx when calling the functions ?	16:03.52
tor8	the ctx->error thing shouldn't really matter -- I'm just keeping things symmetrical (the throw/catch macros dereference ctx->error multiple times)	16:03.53
chrisl	tkamppeter: sure. What I meant was that might be a better route to a working solution than XPS	16:03.56
Robin_Watts	Oh, so you can use fz_throw ?	16:04.01
tor8	I needed to pass the ctx not the error context so I could use throw	16:04.22
Robin_Watts	yeah.	16:04.28
tkamppeter	chrisl, you mean reverse-engineering the proprietary language of Konica Minolta printers? This helps only for Konica Minolta and also Konica Minolta will not necessarily keep their proprietary language for longer time. So the safer investment of time is XPS, it could also serve for sending jobs to Windows servers (assuming that XPS is still used nowadays).	16:12.14
chrisl	tkamppeter: XPS is not going to be a safer investment of time if it doesn't actually work	16:13.36
tor8	Robin_Watts: third commit on tor/master has magic number constants with names	16:13.48
tkamppeter	But XPS is probably easier to make it to work as specs are published.	16:15.43
	chrisl, ^^	16:15.49
chrisl	tkamppeter: yes, but it is also a full PDL, and the specs of at least those bizhub printers make me suspicious about the level to which they actually support XPS	16:16.45
rayjj	marcosw: sorry, I had a minor issue I had to take care of.	16:17.08
Robin_Watts	tor8: looks ok to me.	16:17.10
	I have a preference for enums rather than #defines personally, but...	16:17.26
tkamppeter	chrisl, you mean that the Konica Minolta devices do not really fully support XPS and so one still needs a model-specific driver for it, even if one sends XPS to the printer?	16:18.16
chrisl	tkamppeter: Yes.	16:18.44
rayjj	marcosw: yes, the ht_ccsto.ps is a 167x167 with a transfer function 'baked in'. We can generate any dimension stochastic threshold array and then apply any transfer function desired (including linear) to it	16:19.00
	if the array is large enough, even with the transfer function, we will always get 256 shades	16:19.58
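To illustrate rayjj's point about array size: with a threshold array of N cells holding distinct thresholds, a flat input can switch on anywhere from 0 to N cells, so a 256-cell array can express every 8-bit level. The sketch below uses a 16x16 Bayer (ordered) matrix only because it is easy to generate in a few lines; it stands in for, and is not, the stochastic/blue-noise array produced by gen_stochastic, and it applies no transfer function. All function names are made up for the example.

```c
#include <stdint.h>

enum { TILE_BITS = 4, TILE_SIZE = 1 << TILE_BITS }; /* 16x16 tile = 256 cells */

/* Threshold for cell (x, y) of a 2^bits square Bayer matrix built from the
 * 2x2 base pattern {{0, 2}, {3, 1}}. Each value in 0 .. 4^bits - 1 occurs
 * exactly once, so the tile has as many distinct thresholds as cells. */
static unsigned bayer_threshold(unsigned x, unsigned y, unsigned bits)
{
	unsigned v = 0;
	for (unsigned i = 0; i < bits; i++) {
		unsigned xb = (x >> i) & 1u;
		unsigned yb = (y >> i) & 1u;
		/* finer coordinate bits end up with the larger weight */
		v = (v << 2) | (((xb ^ yb) << 1) | yb);
	}
	return v;
}

/* Halftone one pixel: mark it where the level exceeds the threshold. */
static int halftone_pixel(uint8_t level, unsigned x, unsigned y)
{
	return level > bayer_threshold(x % TILE_SIZE, y % TILE_SIZE, TILE_BITS);
}

/* Number of cells switched on in one tile for a flat input level. */
static int tile_coverage(uint8_t level)
{
	int on = 0;
	for (unsigned y = 0; y < TILE_SIZE; y++)
		for (unsigned x = 0; x < TILE_SIZE; x++)
			on += halftone_pixel(level, x, y);
	return on;
}
```

Each input level 0..255 switches on exactly that many cells, so all 256 shades are distinct. A smaller tile would collapse neighbouring levels onto the same pattern.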
tor8	Robin_Watts: updated commit with enums instead	16:20.11
chrisl	tkamppeter: I'm not saying don't pursue the XPS route, but I am saying, make sure it's going to work (enough) before heading down that route.	16:20.34
rayjj	marcosw: and we can generate stochastic arrays with a minimum dot size (for laser/led engines)	16:20.37
Robin_Watts	tor8: lovely!	16:20.42
	rayjj, marcosw: tiffscaled supports a minimum feature size thing too.	16:21.14
	-dMinFeatureSize can be 1,2 or 3, IIRC.	16:22.11
rayjj	marcosw: I'll put together information on that for Phil, along with timings for the tiffscaled error diffusion vs. the stochastic threshold array. I'll use a linear 256x256 array so the images will be similar.	16:22.30
tkamppeter	chrisl, seems that XPS is not really worth the time, too few printers and one does not really know whether one makes them all work.	16:23.26
rayjj	Robin_Watts: that sounds right. The stochastic array generator is a bit more specific in that you can choose the size/shape of the minimum dot 1x2, 2x1, 2x2, ...	16:23.50
Robin_Watts	marcosw: What Phil should do is to use tiff24nc to generate a contone version.	16:24.19
	Then he can try all the different methods he can think of to process that down to 1bpp.	16:24.40
	When he finds the way that best works for him, we can try to reproduce it within gs.	16:24.52
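One method Phil could try on a contone tiff24nc rendering is plain Floyd-Steinberg error diffusion. The sketch below assumes the RGB output has already been reduced to a single 8-bit gray plane (0 = black, 255 = white). It is the textbook algorithm, not the error diffusion in the gs downscaler and not ETS, and fs_dither is a made-up name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Floyd-Steinberg error diffusion of an 8-bit gray image to 1 bit per
 * pixel. out[] receives 0 (black) or 1 (white) per pixel.
 * Returns 0 on success, -1 if allocation fails. */
static int fs_dither(const uint8_t *in, uint8_t *out, int w, int h)
{
	/* Error for the current and next row, padded by one pixel each side. */
	int *cur = calloc((size_t)w + 2, sizeof *cur);
	int *next = calloc((size_t)w + 2, sizeof *next);
	if (!cur || !next) {
		free(cur);
		free(next);
		return -1;
	}
	for (int y = 0; y < h; y++) {
		for (int x = 0; x < w; x++) {
			int v = in[y * w + x] + cur[x + 1];
			int bit = v >= 128;
			int err = v - (bit ? 255 : 0);
			out[y * w + x] = (uint8_t)bit;
			/* 7/16 right, 3/16 below-left, 5/16 below, 1/16 below-right;
			 * error pushed into the padding is simply discarded */
			cur[x + 2] += err * 7 / 16;
			next[x] += err * 3 / 16;
			next[x + 1] += err * 5 / 16;
			next[x + 2] += err * 1 / 16;
		}
		int *tmp = cur;
		cur = next;
		next = tmp;
		memset(next, 0, ((size_t)w + 2) * sizeof *next);
	}
	free(cur);
	free(next);
	return 0;
}
```

Running a few such variants outside gs on the same contone file makes it easy to compare them before deciding what, if anything, should move into the device.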
chrisl	tkamppeter: that is my feeling, but I am somewhat remote from consumer printing. It may be a case of "not worth the time, just now.... but worth keeping an eye on"	16:24.55
rayjj	Robin_Watts: marcosw: I don't really know where Phil is going with this. Part of the RIP technology they got from the company they used (then bought) -- cust 850, was their "special" halftoning	16:28.11
	in the JaNe device	16:28.37
	they used gs to render to contone Lab and then the JaNe device took it from there	16:29.13
	The other strange thing is that the -dFirstPage performance issue (Quick Q thread) mentioned a G850 CPU with only 2GB RAM, where previously they were using massive Xeon-based 8-core (or more) 8GB systems.	16:31.14
Robin_Watts	Tor8: so, you want me to build off golden/master now ?	16:34.25
tor8	Robin_Watts: yes please	16:34.57
	I've got to pop out for dinner, but I'll check back in a couple of hours	16:35.14
rayjj	Robin_Watts: The 'muddy' output from tiffscaled is probably due to their engine not working well with single dispersed dots. I'll mention that in my follow up as well	16:36.37
Robin_Watts	rayjj: Right, so -dMinFeatureSize=2 may be enough to sort that.	16:37.18
rayjj	Robin_Watts: I'll send him samples of it with that as well as stochastic threshold array (can we just say Blue Noise Mask, or BNM, now???) with single and 2x2 min dot as well -- both with no xfer function and timings	16:39.03
	the reason the stochastic generator supports shaped minimum dots is that some engines have 2x1 resolutions such as 1200x600	16:42.02
	cust 532 has that mode -- they call it "fast 1200"	16:42.37
Robin_Watts	rayjj: I understand the idea. I just didn't code it :)	16:43.07
rayjj	I guess marketing thought that sounded better than "half assed 1200"	16:43.28
	;-)	16:43.35
mvrhel_laptop	Robin_Watts: tor8 has stepped out so let me ask you	16:51.09
	In his email he mentioned the cache that is used to avoid putting an existing image into the resource object again. In pdf-device.c this is a set of MD5s stored in a structure on the device. Where do we envision this structure residing when we add to our resources for mutool create?	16:55.07
Robin_Watts	Just give me a mo to have a look.	16:56.15
mvrhel_laptop_	hmm network issues here	16:57.19
Robin_Watts	mvrhel_laptop: OK, so... we have fz_store.	16:57.25
mvrhel_laptop_	not sure if you had all my messages	16:57.37
Robin_Watts	fz_store is the generic 'cache' for all sorts of things.	16:57.42
mvrhel_laptop	ok	16:58.04
Robin_Watts	Anything can be put into the fz_store, as long as its structure starts with an fz_storable struct.	16:58.44
	That contains a reference count and a type pointer.	17:00.17
	This was originally put in so that we could store stuff like decoded images; we'd decode the image and use it, and put it in the store. Then we'd drop our pointer to it.	17:00.58
	So the store was the only thing holding a reference to it.	17:01.08
	When we run low on memory the store would look through to see the things that have a single reference (i.e. just the store) holding them, and would bin them oldest first until we have enough memory to continue.	17:01.54
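A minimal sketch of the store behaviour Robin describes: every cached object starts with a common header holding a reference count, the store takes its own reference, and under memory pressure it frees objects that only it still references, oldest first. The names (storable, store, store_scavenge, blob), the fixed-size array, and the drop function standing in for the type pointer are hypothetical simplifications, not MuPDF's fz_store API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Common header placed first in every cacheable object. */
typedef struct storable {
	int refs;                            /* includes the store's reference */
	size_t size;                         /* bytes charged to the store */
	void (*drop)(struct storable *self); /* frees the concrete object */
} storable;

enum { STORE_MAX_ITEMS = 64 };

typedef struct {
	storable *items[STORE_MAX_ITEMS];    /* oldest first */
	int count;
	size_t used;
} store;

/* Put an object in the store; the store takes its own reference. */
static int store_put(store *s, storable *obj)
{
	if (s->count == STORE_MAX_ITEMS)
		return -1;
	obj->refs++;
	s->items[s->count++] = obj;
	s->used += obj->size;
	return 0;
}

/* Release one reference; free the object when none remain. */
static void storable_drop(storable *obj)
{
	if (--obj->refs == 0)
		obj->drop(obj);
}

/* Free objects held only by the store, oldest first, until at least
 * `needed` bytes are reclaimed. Returns the number of bytes freed. */
static size_t store_scavenge(store *s, size_t needed)
{
	size_t freed = 0;
	int i = 0;
	while (i < s->count && freed < needed) {
		storable *obj = s->items[i];
		if (obj->refs == 1) {            /* only the store holds it */
			freed += obj->size;
			s->used -= obj->size;
			for (int j = i; j < s->count - 1; j++)
				s->items[j] = s->items[j + 1]; /* keep age order */
			s->count--;
			storable_drop(obj);
		} else {
			i++;                         /* still in use elsewhere */
		}
	}
	return freed;
}

/* A trivial concrete object: the header must be the first member. */
typedef struct { storable base; unsigned char *data; } blob;

static void blob_free(storable *self)
{
	blob *b = (blob *)self;
	free(b->data);
	free(b);
}

static blob *blob_new(size_t size)
{
	blob *b = malloc(sizeof *b);
	if (!b)
		return NULL;
	b->data = malloc(size);
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->base.refs = 1;                    /* the creator's reference */
	b->base.size = size;
	b->base.drop = blob_free;
	return b;
}
```

As in the description above, a caller that decodes something, puts it in the store and then drops its own pointer leaves the store as the sole owner, which is exactly what makes the object eligible for eviction.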
mvrhel_laptop	ok	17:02.04
	So I am having trouble seeing in send_image in pdf-device.c where fz_store is coming into play	17:02.27
Robin_Watts	The question is, do we want (or can we) use fz_store as part of the pdf-device.	17:02.44
mvrhel_laptop	or is the image already in the store when this is called?	17:03.00
	I am missing something	17:03.21
Robin_Watts	When writing a PDF file out, we want to keep a list of the images we've already written, and then check whether a new image we want to write has been written to the pdf file already.	17:03.42
	mvrhel_laptop: (bear with me, the exposition is a way of me getting myself back up to speed)	17:04.06
mvrhel_laptop	no problem. I feel bad having to bother you as I know you are busy	17:04.25
Robin_Watts	So, we don't really want to be checking the fz_store, cos that only has stuff that we happen to have in memory, not an exhaustive list of stuff that exists in the resources for this page already.	17:05.24
mvrhel_laptop	right	17:05.38
Robin_Watts	So I think fz_store (and hence pdf_store) is a red herring.	17:05.41
	OK, so in pdf-device.c, we have a structure definition for pdf_device.	17:06.41
mvrhel_laptop	it seems that we need an array of md5 sums in some place that is not hooked to the pdf_device	17:06.43
Robin_Watts	That contains num_imgs/max_imgs/images.	17:07.06
	Where image_entry *images = a block of max_imgs image_entries of which num_imgs are populated.	17:07.40
mvrhel_laptop	yes. this is exactly where I am	17:08.11
	and that all seems reasonable	17:08.21
Robin_Watts	Every time we meet a new image, we take a digest of it, and check to see if that digest is listed in the existing images; if so we do (I assume) a slow exhaustive compare) to see if it matches. If not, we insert it as a new image.	17:08.52
mvrhel_laptop	right	17:08.58
Robin_Watts	So... that's all well and good for the case where we are creating completely new pages, and we don't worry about images being already in the PDF file.	17:09.43
	But it falls down when appending new content to a page, or when reusing an image on a page that's already used elsewhere.	17:10.25
	OK, I suspect I'm now caught up to where you were at the start of this conversation :)	17:10.39
mvrhel_laptop	exactly	17:10.45
	so tor8 had the following: int pdf_create_image(fz_context *ctx, pdf_document *doc, fz_image *image), where we return an object number	17:11.51
	My problem is that I don't see where I know what's already in doc	17:12.16
Robin_Watts	mvrhel_laptop: Right. Me either.	17:12.50
	SOMEWHERE, we need to have a 'check to see if this image exists in the document already' mechanism.	17:13.20
	Whether that's hidden under pdf_create_image, or is expected to be called before pdf_create_image is called is a different discussion.	17:13.47
	I reckon it might be nicest to hide it under there.	17:14.45
	The fz_image *image pointer gives us everything we need to compare the image with existing ones.	17:15.17
	The pdf_document *doc pointer gives us everything we need to see what's in the current document.	17:15.41
	One idea might be to have a new structure as part of the doc that contains info for every image in the document.	17:16.17
mvrhel_laptop	I was wondering that. Do we do some initial search/set up of the md5s in the existing doc and then search that	17:16.32
Robin_Watts	We could populate it on first access.	17:16.33
mvrhel_laptop	:)	17:16.37
Robin_Watts	So it wouldn't slow every file open down. Only when we start writing images.	17:16.57
mvrhel_laptop	yes	17:17.04
Robin_Watts	We'd run through every object in the file looking for it to be a /Type/Image (or whatever it is)	17:17.24
	and then stash details of W/H/compression.	17:17.47
	Then if we write an image, we can quickly check if we have any potential matches.	17:18.07
	For potential matches, we can update the info to also contain a hash of the compressed data.	17:18.42
	Then we can exhaustively check if the hashes match.	17:19.06
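The two-stage check (compare cheap /Width, /Height and /Filter facts first, hash the compressed stream only for candidates, and cache that hash) might look like this. indexed_image, find_match and the integer filter code are hypothetical; an in-memory byte array stands in for reading the stream from the file, FNV-1a stands in for MD5, and the digests_computed counter exists only to show that non-candidates are never hashed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cheap facts from an image dictionary, plus a lazily computed digest. */
typedef struct {
	int w, h;                    /* /Width, /Height */
	int filter;                  /* hypothetical code for /Filter */
	const unsigned char *stream; /* stands in for reading the stream */
	size_t len;
	int have_digest;
	uint64_t digest;
} indexed_image;

static int digests_computed; /* only here to demonstrate laziness */

static uint64_t digest_bytes(const unsigned char *p, size_t n)
{
	uint64_t h = 0xcbf29ce484222325ull;  /* FNV-1a 64-bit offset basis */
	while (n--) {
		h ^= *p++;
		h *= 0x100000001b3ull;           /* FNV-1a 64-bit prime */
	}
	digests_computed++;
	return h;
}

/* Compute the digest on first use and cache it in the entry. */
static uint64_t lazy_digest(indexed_image *e)
{
	if (!e->have_digest) {
		e->digest = digest_bytes(e->stream, e->len);
		e->have_digest = 1;
	}
	return e->digest;
}

/* Index of an existing image identical to want, or -1. */
static int find_match(indexed_image *idx, int n, indexed_image *want)
{
	for (int i = 0; i < n; i++) {
		indexed_image *e = &idx[i];
		if (e->w != want->w || e->h != want->h || e->filter != want->filter)
			continue;                    /* rejected without hashing */
		if (lazy_digest(e) != lazy_digest(want))
			continue;
		/* digests agree: confirm byte for byte */
		if (e->len == want->len && memcmp(e->stream, want->stream, e->len) == 0)
			return i;
	}
	return -1;
}
```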
mvrhel_laptop	oh, ok, so as a first cut just compare the obvious diffs and then do the hashes if there may be a match	17:19.17
	can the pdf-device use this too instead of pdev->images[i].digest	17:19.48
Robin_Watts	Absolutely. pdf device should move over to using this.	17:20.10
	This is a 'better' version of the limited thing I hacked up for pdf-device.	17:20.27
mvrhel_laptop	ok. I think I have enough to keep me busy for a bit. thanks Robin_Watts	17:21.11
	I may ping you again as I push on	17:21.23
Robin_Watts	I suspect that fetching just X/Y/compression type etc from the file should be fast enough to not be a massive hit.	17:21.31
	Actually....	17:21.40
	I'd suggested doing this by walking the entire file looking for objects that were /Type/Image.	17:22.26
	That might be slow in the case where we have a file with lots of compressed objects.	17:22.43
mvrhel_laptop	ok	17:23.00
Robin_Watts	One possible way to be faster would be to walk the resources tree, and just check the Image objects listed there.	17:23.17
	s/walk the resources tree/walk the page tree and just check the image objects in the resources dictionaries there/	17:23.48
	Cos all the Image objects in the file should be listed in the page resources tree, and that'll be a smaller set to walk.	17:24.18
	But... another thought...	17:25.09
mvrhel_laptop	Robin_Watts: ok. I imagine there must be code to do that already	17:25.15
Robin_Watts	To walk the pages tree? Yes. To look for the image resources in the way I've just talked about, no.	17:25.46
mvrhel_laptop	what does mutool extract do	17:26.03
Robin_Watts	The other thought that occurs to me, is that images are not the only thing we're going to be doing this for.	17:26.08
mvrhel_laptop	Robin_Watts: true	17:26.21
Robin_Watts	We're going to need to do this for things like fonts, and potentially patterns etc.	17:26.28
mvrhel_laptop	He wants to do the same thing with fonts	17:26.36
	yes	17:26.38
Robin_Watts	Which leads me to the idea that we should generalise what we do to not just the image case.	17:26.54
	Maybe we can think of this as being a 'pdf resources indexer'	17:27.16
	i.e. we maintain an index of the resources within the pdf file.	17:27.34
	A 'resource' could be any of an image/font/pattern/+others and would have some 'cheap' info (X/Y/Compression in the case of images, font name/font type in the case of fonts etc), and then a shared digest field.	17:30.23
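One possible shape for such a generalised "pdf resources indexer" entry: a kind tag, a union of cheap per-kind facts, and a single digest field shared by every kind. The type and field names, and the choice of cheap facts for patterns, are assumptions for illustration only.

```c
#include <stdint.h>
#include <string.h>

typedef enum { RES_IMAGE, RES_FONT, RES_PATTERN } res_kind;

/* Hypothetical index entry for one resource in the document. */
typedef struct {
	res_kind kind;
	int object_num;
	union {
		struct { int w, h, filter; } image;           /* /Width /Height /Filter */
		struct { char name[64]; int font_type; } font; /* font name and type */
		struct { int pattern_type; } pattern;
	} cheap;
	int have_digest;  /* digest filled in lazily, only for candidates */
	uint64_t digest;  /* shared by every kind of resource */
} res_entry;

/* Do two entries agree on their cheap facts? Only if this returns
 * non-zero is it worth computing and comparing digests. */
static int res_cheap_equal(const res_entry *a, const res_entry *b)
{
	if (a->kind != b->kind)
		return 0;
	switch (a->kind) {
	case RES_IMAGE:
		return a->cheap.image.w == b->cheap.image.w &&
			a->cheap.image.h == b->cheap.image.h &&
			a->cheap.image.filter == b->cheap.image.filter;
	case RES_FONT:
		return a->cheap.font.font_type == b->cheap.font.font_type &&
			strcmp(a->cheap.font.name, b->cheap.font.name) == 0;
	case RES_PATTERN:
		return a->cheap.pattern.pattern_type == b->cheap.pattern.pattern_type;
	}
	return 0;
}
```

Keeping the digest outside the union means the lazy "hash only on a cheap match" step can be written once for every resource kind.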
	Does that sound like enough of an idea for the shape of it?	17:30.44
	(can you think of any problems?)	17:30.54
mvrhel_laptop	Robin_Watts: yes. so this would be a member variable of pdf_document? And we would still do the lazy initialization as well as the lazy fill in of the digest field.	17:31.37
Robin_Watts	Yes, exactly.	17:31.48
	All the resources can be found by walking the pdf page tree and looking at the resources dictionaries we pass.	17:32.00
mvrhel_laptop	Robin_Watts: ok. I think I have enough to make a nice mess of things	17:32.33
Robin_Watts	I suspect that at least 1 version of the PDF spec said that XObjects can have their own resource dictionaries, so we'd need to descend those parts of the trees too.	17:32.36
mvrhel_laptop	Robin_Watts: ok one other question	17:32.55
Robin_Watts	mvrhel_laptop: Tor will arrive tomorrow and describe a much simpler way of doing all this, no doubt :)	17:33.01
mvrhel_laptop	so in walking the page tree is that going to be any different than going through all the objects like is done in pdfextract.c	17:33.31
	where we find all the font and images	17:33.43
	or is there something more sophisticated	17:34.01
	or intelligent	17:34.11
Robin_Watts	mvrhel_laptop: I think it'll be very similar to that.	17:34.30
mvrhel_laptop	ok	17:34.34
Robin_Watts	We should be careful to allow for malicious page trees.	17:34.47
mvrhel_laptop	Thanks for the discussion Robin_Watts.	17:34.47
	Robin_Watts: can you explain?	17:35.14
Robin_Watts	i.e. ones where X is a descendent of X.	17:35.24
mvrhel_laptop	ah	17:35.28
Robin_Watts	We cope with that by doing pdf_mark and pdf_unmark as we descend to check we don't hit cycles.	17:35.49
	(search in the source for pdf_mark and you'll find code that walks the page tree, I think)	17:36.14
	Also, we should watch out for deeply nested trees (recursion may not be the best option, though it might be acceptable as a first step to getting something working)	17:37.20
	I've seen documents that basically have a page tree that's a linked list. So 1500 pages works out as 1500 levels of recursion in the native tree walker :)	17:38.18
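A sketch of a tree walk that copes with both problems: an explicit stack instead of recursion, so a 1500-level linked-list "tree" costs heap memory rather than C stack, and a mark bit set on entry and cleared on exit, in the spirit of pdf_mark_obj/pdf_unmark_obj, so a kid that points back at an ancestor is detected and skipped. The node representation (indices into an array) and count_pages are hypothetical, not MuPDF's pdf_obj API.

```c
#include <stdlib.h>

/* Hypothetical page-tree node: intermediate nodes list kids by index. */
typedef struct {
	const int *kids;   /* indices into the node array */
	int nkids;         /* 0 for pages */
	int is_page;
	int marked;        /* set while the node is on the current path */
} tree_node;

typedef struct {
	int node;          /* node being visited */
	int next_kid;      /* next kid to descend into */
} frame;

/* Count pages reachable from root without recursing. A kid that is
 * already marked is an ancestor on the current path (a cycle), so it is
 * skipped and counted in *cycles; out-of-range kid indices are ignored.
 * Returns the page count, or -1 on bad arguments or allocation failure. */
static int count_pages(tree_node *nodes, int n, int root, int *cycles)
{
	*cycles = 0;
	if (n <= 0 || root < 0 || root >= n)
		return -1;
	/* Marked nodes are never pushed again, so depth never exceeds n. */
	frame *stack = malloc((size_t)n * sizeof *stack);
	if (!stack)
		return -1;

	int depth = 0, pages = 0;
	nodes[root].marked = 1;
	pages += nodes[root].is_page;
	stack[depth++] = (frame){ root, 0 };

	while (depth > 0) {
		frame *f = &stack[depth - 1];
		tree_node *nd = &nodes[f->node];
		if (f->next_kid >= nd->nkids) {
			nd->marked = 0;        /* leaving this node: unmark it */
			depth--;
			continue;
		}
		int kid = nd->kids[f->next_kid++];
		if (kid < 0 || kid >= n)
			continue;              /* malformed reference */
		if (nodes[kid].marked) {
			(*cycles)++;           /* kid is its own ancestor */
			continue;
		}
		nodes[kid].marked = 1;
		pages += nodes[kid].is_page;
		stack[depth++] = (frame){ kid, 0 };
	}
	free(stack);
	return pages;
}
```

Because marks are cleared on the way out, they catch cycles only; a node legitimately shared by two parents would be visited once per path.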
mvrhel_laptop	wow	17:38.33
	so the pdf_mark_obj and pdf_unmark_obj are used so that we can identify that we have already visited this object during searches?	17:41.40
Robin_Watts	Yes.	17:43.59
	There is a secret bit in our in memory representation of pdf_dicts for that.	17:44.30
mvrhel_laptop	Robin_Watts: just looking at this. Another question: pdf_load_colorspace_imp does a throw if it detects a recursion in a color space object. Shouldn't the call to pdf_load_colorspace_imp in pdf_load_colorspace be wrapped up in a try, to rethrow?	17:46.59
Robin_Watts	s/native tree walker/naive tree walker/ sorry.	17:47.15
mvrhel_laptop	This is in pdf-colorspace.c .	17:47.37
Robin_Watts	Why?	17:47.59
	It only needs to be wrapped up if it needs to tidy up in the case of a throw.	17:48.25
mvrhel_laptop	ok	17:48.35
Robin_Watts	We have no tidying up to do here, so all we'd do is rethrow it anyway.	17:48.45
	One of the (many) big attractions of try/catch over explicit error passing is that we don't need to clutter functions with extra stuff, unless there is actual cleanup to do.	17:49.32
*Robin_Watts*	will convert you :)	17:49.49
mvrhel_laptop	:) In this case, I am confused as to what gets returned	17:50.26
Robin_Watts	mvrhel_laptop: What gets returned where?	17:50.38
mvrhel_laptop	by pdf_load_colorspace_imp	17:50.48
Robin_Watts	If pdf_load_colorspace_imp throws, then nothing is returned.	17:50.55
	We don't go through C's return mechanism.	17:51.13
mvrhel_laptop	ok	17:51.16
Robin_Watts	We longjmp out to the previous catch.	17:51.23
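The point that nothing is returned can be seen in a toy model of exception handling built directly on setjmp/longjmp from <setjmp.h>. MuPDF's fz_try/fz_catch also rest on longjmp, as Robin says, but carry much more machinery; err_ctx, throw_msg, run and the other names here are invented for illustration. The sketch shows that a throwing call never returns, and that an intermediate function with nothing to clean up needs no handler of its own.

```c
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

enum { MAX_HANDLERS = 16 };

/* Toy error context: a stack of jump buffers plus the last message. */
typedef struct {
	jmp_buf bufs[MAX_HANDLERS];
	int depth;
	char message[128];
} err_ctx;

/* Record the message and jump to the innermost handler. Never returns. */
static void throw_msg(err_ctx *ctx, const char *msg)
{
	strncpy(ctx->message, msg, sizeof ctx->message - 1);
	ctx->message[sizeof ctx->message - 1] = '\0';
	if (ctx->depth == 0)
		abort();                         /* nobody to catch it */
	longjmp(ctx->bufs[--ctx->depth], 1); /* pop the handler and jump */
}

static int parse_positive(err_ctx *ctx, int v)
{
	if (v <= 0)
		throw_msg(ctx, "value must be positive"); /* no value comes back */
	return v;
}

/* Nothing to tidy up here, so no handler: a throw passes straight through. */
static int double_it(err_ctx *ctx, int v)
{
	return 2 * parse_positive(ctx, v);
}

/* The catch site. Returns 2 * v, or -1 with ctx->message set. */
static int run(err_ctx *ctx, int v)
{
	volatile int result = -1;  /* volatile: read after a possible longjmp */
	if (ctx->depth == MAX_HANDLERS)
		return -1;
	ctx->depth++;                                  /* push a handler */
	if (setjmp(ctx->bufs[ctx->depth - 1]) == 0) {
		result = double_it(ctx, v);                /* may longjmp back */
		ctx->depth--;                              /* normal exit: pop */
	}
	/* On a throw, throw_msg has already popped the handler. */
	return result;
}
```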
mvrhel_laptop	ah	17:51.32
	I see	17:51.41
	excuse my ignorance on that	17:52.01
Robin_Watts	mvrhel_laptop: No worries. fz_try/fz_catch are best thought of as sausages.	17:52.27
mvrhel_laptop	:)	17:52.42
Robin_Watts	A fabulous invention that makes our lives better, but you don't want to look into what goes into them.	17:52.54
mvrhel_laptop	ok Thanks for all the help Robin_Watts . bbiab	17:54.18
Robin_Watts	no worries.	17:54.41
rayjj	mvrhel_laptop: Robin_Watts: Does mupdf also handle "inline" images, w.r.t. extract images or writing PDF's ?	18:09.36
Robin_Watts	rayjj: MuPDF handles inline images for reading, certainly.	18:22.53
	For writing, we never write inline.	18:23.08
	(currently)	18:23.11
	For extracting images using the pdf structured text device (or any other device) you see inline and non-inline images as identical things.	18:23.42
	For extracting images using the mutool extract, that works via object number, so, no, inline objects can't be accessed.	18:24.15
	For the purposes of the discussion I just had with michael, we can ignore inline images, cos it makes no sense to try to match a previous inline image as we can't reuse it.	18:24.47
mvrhel_laptop	right. that is a good point	18:25.13
Robin_Watts	which is good, cos it doesn't invalidate the "look in the page tree resources dicts" plan :)	18:26.16
mvrhel_laptop	yes	18:26.25
Robin_Watts	mvrhel_laptop: Random thought... is it worth us writing a generic 'map this function over page tree entries' function ?	18:27.05
mvrhel_laptop	Robin_Watts: I don't quite follow what you mean. I suspect you are at a higher abstraction level...	18:27.54
Robin_Watts	Part of our current task is "For every node in the page tree, check the resources", right?	18:28.26
mvrhel_laptop	yes	18:28.31
Robin_Watts	So I'm thinking that we could do with a function MapOverPageTree(X) that would walk the page tree and call X(page)	18:29.10
	so for our job X would do "check the resources for page"	18:29.32
	Did that make it clearer?	18:30.58
rayjj	Robin_Watts: if we have small images that are determined to be unique when the PDF is created, it is slightly more efficient to write them as inline images, but that may not be worth it (but gs does it)	18:31.48
mvrhel_laptop	Robin_Watts: So we would have some defined proc prototype which might be set to check for font matches or might be set to check for image matches etc	18:32.56
Robin_Watts	mvrhel_laptop: Yes. We'd have a function that walks the tree, and one of the params to it would be a function to run on each node.	18:33.36
	Let me look in the existing code.	18:33.56
mvrhel_laptop	I am going to need to start writing all of this down....	18:34.26
rayjj	mvrhel_laptop: on a different topic, since I'm putting something together for Phil about halftoning, I was going to mention gen_ordered as well as gen_stochastic -- back in 2011 I gave you a snapshot of the stochastic stuff. Did you ever do anything to/with it ?	18:34.41
Robin_Watts	mvrhel_laptop: Currently we have a function 'pdf_lookup_page_loc' for example.	18:35.15
	That knows how to efficiently and safely walk the pagetree allowing for cycles and not recursing too much.	18:35.57
	It'd be lovely to refactor that somehow into a form whereby our other actions on the page tree could share that same intelligence.	18:36.48
rayjj	mvrhel_laptop: and I realized that although we checked the tools into git, neither of us ever committed the linearize_threshold	18:37.30
mvrhel_laptop	Robin_Watts: ok I understand	18:38.08
Robin_Watts	mvrhel_laptop: It might require some cleverness so that we remain efficient in both cases. I haven't thought it through.	18:39.05
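A rough sketch of the generic "MapOverPageTree(X)" walker proposed above, under stated assumptions: `node`, `map_over_page_tree`, `page_list` and `collect_page` are hypothetical, and the tree is a toy in-memory structure rather than real pdf_obj dictionaries. It keeps the two properties attributed to pdf_lookup_page_loc above (no deep recursion, safe against cycles) by using an explicit stack and a visited mark. It does not attempt any of the efficiency tricks for jumping straight to a given page.

```c
#include <stdlib.h>

/* Hypothetical in-memory page tree node; NOT MuPDF's pdf_obj. */
typedef struct node {
    int is_page;         /* 1 = page leaf, 0 = interior "Pages" node */
    struct node **kids;  /* children of an interior node */
    int nkids;
    int visited;         /* mark used to break cycles */
} node;

/* The "X": run on each page; return nonzero to stop the walk early. */
typedef int (page_fn)(node *page, void *arg);

/* Walk with an explicit stack (no recursion), skipping nodes already seen
 * so a malformed, cyclic tree cannot loop forever. Visited marks are left
 * set; a real implementation would clear them afterwards.
 * Returns the number of pages visited, or -1 on allocation failure. */
static int map_over_page_tree(node *root, page_fn *fn, void *arg)
{
    int cap = 16, top = 0, count = 0;
    node **stack = malloc(cap * sizeof *stack);
    if (!stack)
        return -1;
    stack[top++] = root;
    while (top > 0) {
        node *n = stack[--top];
        if (n == NULL || n->visited)
            continue;
        n->visited = 1;
        if (n->is_page) {
            count++;
            if (fn(n, arg))
                break;
            continue;
        }
        /* Push kids in reverse so they pop in document order. */
        for (int i = n->nkids - 1; i >= 0; i--) {
            if (top == cap) {
                node **grown = realloc(stack, 2 * cap * sizeof *stack);
                if (!grown) {
                    free(stack);
                    return -1;
                }
                stack = grown;
                cap *= 2;
            }
            stack[top++] = n->kids[i];
        }
    }
    free(stack);
    return count;
}

/* Example callback: record each page, in order, into a fixed-size list. */
#define PAGE_LIST_MAX 64
typedef struct {
    node *pages[PAGE_LIST_MAX];
    int n;
} page_list;

static int collect_page(node *page, void *arg)
{
    page_list *list = arg;
    if (list->n < PAGE_LIST_MAX)
        list->pages[list->n++] = page;
    return 0; /* keep walking */
}
```

A font-matching or image-matching pass would just be another callback with the `page_fn` prototype, which is the point of the proposal: the walking logic lives in one place.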
mvrhel_laptop	rayjj: oh I see that we never added the linearization	18:40.35
	I thought we had done that	18:40.50
rayjj	mvrhel_laptop: you never made any changes to it, did you ?	18:41.04
mvrhel_laptop	No. I did not	18:41.10
	I remember playing with it	18:41.24
	and feeding it into the gen_ordered stuff I think	18:41.36
	it was too long ago	18:41.41
rayjj	mvrhel_laptop: ISTR that you made gen_ordered so it could emit a file of the same format as gen_stochastic does so that it could be run through linearize_threshold	18:42.03
mvrhel_laptop	yes	18:42.10
	that is what I remember	18:42.14
	ed	18:42.16
rayjj	mvrhel_laptop: well, I'll just commit the version I have, and maybe a README on it (and gen_stochastic)	18:42.43
mvrhel_laptop	rayjj: the readme in the ordered generation does mention using thresh_remap	18:44.29
	A problem, though, if it's not there....	18:44.40
	using the turn on sequence (tos) output	18:45.04
tor8	mvrhel_laptop: Robin_Watts: the approach robin outlined with adding a structure that tracks resources to pdf_document looks good	18:46.52
mvrhel_laptop	tor8: ok good	18:47.06
tor8	we might think about only seeding it from resources discovered when parsing the pdf	18:47.13
*Robin_Watts*	waits for the other shoe...	18:47.17
tor8	since if we're reusing resources, then we've already loaded them at least once	18:47.29
rayjj	mvrhel_laptop: oh, should I rename it ?	18:47.32
tor8	mvrhel_laptop: I'd hand waved that resource tracker thing in my email, robin seems to have figured out what needs to be done. I might come up with some ideas once I've slept on it or seen it in action :)	18:48.34
	and the purging of resources in low memory conditions does indeed mean the fz_store is unsuitable	18:48.59
rayjj	mvrhel_laptop: I might as well rename it, since it's more descriptive of what it does	18:49.08
tor8	I expect a plain md5 (or sha1) sum of the contents should be good enough that we don't need to hang on to the actual data	18:49.18
	Robin_Watts: in pdf-page.c there's a resource walker that looks for blending operations which we call whenever we load a page	18:50.11
mvrhel_laptop	I saw that	18:50.19
tor8	pdf_resources_use_blending	18:50.22
mvrhel_laptop	rayjj: I wonder where I got that name	18:50.49
	so with mutool create -f F0 Times.ttf -i Im0 logo.png -i Im1 photo.jpg contents.txt if we find the document already has logo.png, how do we handle the replacement of the reference to Im0 in contents.txt	18:52.24
	i.e. the indirect reference will be different	18:52.46
	so I would need to filter contents.txt	18:52.54
	tor8, Robin_Watts ^^, or am I missing something	18:53.11
tor8	mvrhel_laptop: if you look at my original email, the function pdf_store_image would rummage through the resource tracker looking for a match and, if it finds one, return the object number	18:53.52
	the names (Im0 and Im1) are only used in the resource dictionary for that specific content stream to map a name to an object number	18:54.32
	I expect that mapping to be different for each page	18:54.39
	so "mutool create -i Im0 logo.png -i Im1 logo.png contents.txt" would create one resource for logo.png, and then map both Im0 and Im1 to that same object number	18:55.27
	it would load logo.png twice, but the resource tracker would catch the duplication	18:55.45
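A sketch of the resource-tracker idea, assuming a hypothetical `track_resource()` (loosely echoing the pdf_track_resource name floated later in this conversation; it is not an existing MuPDF function). Each resource is reduced to a content digest plus its length and mapped to an object number, so loading logo.png a second time finds the existing entry and Im0 and Im1 end up pointing at the same object. FNV-1a stands in for the md5/sha1 sum suggested above only to keep the example short; it is not collision-resistant, so a real tracker would want a cryptographic digest or a byte comparison on a hit. Object numbers come from a simple counter, not a real PDF document.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a. Illustrative stand-in for an md5/sha1 digest:
 * NOT collision resistant. */
static uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL; /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/* Hypothetical tracker: (digest, length) -> object number. */
#define MAX_TRACKED 256
typedef struct {
    uint64_t digest[MAX_TRACKED];
    size_t length[MAX_TRACKED];
    int num[MAX_TRACKED];
    int count;
    int next_num; /* next free object number (simulated) */
} resource_tracker;

/* Return an object number for this content, reusing the existing object
 * when identical content was tracked before. Returns -1 if full. */
static int track_resource(resource_tracker *t,
                          const unsigned char *data, size_t len)
{
    uint64_t d = fnv1a64(data, len);
    for (int i = 0; i < t->count; i++)
        if (t->digest[i] == d && t->length[i] == len)
            return t->num[i];             /* duplicate: reuse it */
    if (t->count == MAX_TRACKED)
        return -1;
    t->digest[t->count] = d;
    t->length[t->count] = len;
    t->num[t->count] = t->next_num++;     /* "create" a new object */
    return t->num[t->count++];
}
```

Only the digests are kept, not the image data, which matches the point that the tracker need not hang on to the actual bytes. Each page's resource dictionary then maps its own names (Im0, Im1, ...) to whatever numbers the tracker hands back.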
mvrhel_laptop	ok	18:55.54
tor8	does that make sense?	18:55.59
mvrhel_laptop	that makes sense	18:56.00
tor8	we might want to rename my example functions pdf_store_* into something different, not to get them confused with the fz_store	18:57.21
	pdf_track_image (calls a generic pdf_track_resource) maybe	18:57.47
	you or robin may come up with a better name	18:57.58
Robin_Watts	We already have pdf_store functions that build on the fz_store ones.	19:00.00
	Any new functions should avoid the 'store' name, I reckon, for the benefit of easily confused people like me.	19:00.31
	Is the pdf_resource name used anywhere ?	19:00.52
mvrhel_laptop	so track does not do a whole lot for me	19:00.58
	pdf_resource seems to not be used except for in pdf_resources_use_blending	19:01.39
Robin_Watts	tor8: MuPDF android builds are done and uploaded to beta test.	19:02.31
	And to my public_html	19:02.49
	MuPDF-8{0,1,2,3}.apk	19:02.59
hyper_ch	hi there, ghostscript is replacing Helvetica and other fonts in pdfs when I process them. Is that really necessary? Can't those fonts just stay?	19:03.52
Robin_Watts	hyper_ch: gs is not "replacing" fonts in a PDF when it processes them, because gs does not 'process' a PDF :)	19:04.36
	It consumes one PDF and throws out another PDF that hopefully looks like the one that came in.	19:04.57
	But other than hopefully looking the same, the two PDFs are unrelated.	19:05.27
	Now, if you're saying that the fonts are not surviving the conversion process, that's a valid concern.	19:05.53
	The guy you need to speak to about this is kens, and he's gone for the night.	19:06.08
	He'll be back in about 14 hours' time, I guess.	19:06.28
hyper_ch	Robin_Watts: https://paste.simplylinux.ch/view/20d726b7	19:07.52
Robin_Watts	Standard answer: Have you tried using an up-to-date version of gs?	19:08.46
hyper_ch	it's the newest debian ;)	19:09.01
Robin_Watts	So?	19:09.30
hyper_ch	just saying :)	19:09.46
Robin_Watts	I suspect the problem is that your input file names the fonts without embedding them.	19:10.06
hyper_ch	that's also possible	19:10.21
Robin_Watts	If gs can't find the fonts, then it substitutes them.	19:10.35
hyper_ch	so if I installed helvetica, things would be well then?	19:10.48
Robin_Watts	No... you've got it backwards.	19:11.02
	The file says "Use LiberationSans" and gs says "I can't find LiberationSans, so I'm using Helvetica"	19:11.22
hyper_ch	can't find -> can't find in the pdf/gs	19:11.22
	ah	19:11.45
Robin_Watts	You either need to get the source documents to include LiberationSans, OR you need to make it so that gs can find LiberationSans.	19:12.00
	Now, how you make it so that gs can find LiberationSans, is a good question that I'm not entirely sure of the answer to.	19:12.29
	chrisl, rayjj, or kens would probably know.	19:13.05
hyper_ch	when you mention rayjj, he leaves the channel :)	19:14.58
Robin_Watts	Is GS_FONTPATH set on your system ?	19:15.26
	If not, try setting it to the path to where LiberationSans can be found.	19:16.26
	If that doesn't work, you may need to fiddle with FontMap, and at that point, I run away, sorry.	19:16.46
	http://www.ghostscript.com/doc/current/Use.htm#Font_lookup	19:16.57
hyper_ch	will do so	19:19.02
mvrhel_laptop	lunch time	19:37.44