Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

2013/06/28
sebras	after some updating I get this instead:	00:35.46
	error: Annotation object not a dictionary	00:35.49
	warning: ignoring broken annotation	00:35.49
	this is on a Make magazine pdf, but I just noticed that I get the same on pdfref17.pdf...	00:41.08
	Robin_Watts: looks like it might be 9d20a4f3a69fdea855f8678c1ad50b5db7472d81 causing the problem.	00:44.22
	wait... no.	00:46.20
	git blames paul in f07fedd9. I mistook the setting of is_dict = 0 for the problem.	00:46.56
	it is not.	00:46.59
	but in f07fedd9 paul changed the if (!is_dict) continue; into if (!is_dict) fz_throw(), which appears to be incorrect. as I read table 8.15 page 606 of pdfref17, AP is not required, hence we shouldn't error out...?	00:48.08
Guest51945	!list	02:10.58
jamma	hi !	04:24.35
	Can I merge PDF files and keep bookmarks ?	04:25.17
	seems that yes, but the dest pages are wrong..	04:25.34
ray_laptop	morning, kens	07:22.15
kens	Hi ray_laptop you're up late	07:22.23
	I replied to Mani because I thought you colonials would all be in bed :-)	07:22.41
ray_laptop	I saw your response. Thansk	07:23.02
	thanks	07:23.06
	we are both offloading the mupdf folks :-)	07:23.34
kens	Well, I have enough to do, but given those folks are in India, I thought it would be good to get them a response in their working day	07:24.13
ray_laptop	would be nice if we got a real customer	07:24.17
kens	One day :-)	07:24.24
	But it's really hard to make money on apps	07:24.39
ray_laptop	kens: what I don't grok is the proliferation of video games at > $25 prices	07:28.53
	most are junk	07:29.03
kens	Absolutely, but people will pay for them, while apps are free in most people's mind (or nearly so, 50c is nothing)	07:29.38
	Robin_Watts : tor8 ping ?	07:51.26
tor8	hi kens	07:51.35
kens	shobhit has opened a SO question about his compilation problem:	07:51.55
	http://stackoverflow.com/questions/17347594/errors-while-compiling-the-mupdf-for-android-platform	07:51.56
	He doesn't seem to have done what Robin_Watts asked him, so perhaps an answer on SO might prompt him to do as was suggested.	07:52.25
	There's also someone wanting to generate a thumbnail using MuPDF, but it's not clear to me what he wants to do exactly, I'd have thought mudraw was what he wanted:	07:53.28
	http://stackoverflow.com/questions/17348382/mupdf-generate-thumbnail	07:53.29
tor8	kens: is there a bugzilla bug for shobhit's question?	07:55.29
	e-book app ... yet another gpl violator? :)	07:55.57
	and his question is incoherent... or incomprehensible	07:56.56
	"thumbnail for each pdf file in my project" huh?	07:57.08
kens	tor8 no, he hasn't reported a bugzilla bug, he was on here (and off, and on and off) yesterday	08:10.27
	tor8 yes, that was what I meant when I said I didn't really understand his question. Presumably if he wants an image he can just use mudraw at a lower resolution...	08:11.07
tor8	kens: yeah. I'll answer him how to use mudraw.	08:14.13
kens	tor8 thanks	08:14.19
tor8	Robin_Watts: paulgardiner: I think I have a suspicion why opening a document like pdfref17 takes so long...	09:09.42
	it's finding the page numbers for the link destinations that's taking ages	09:09.54
	(I made it even slower, and now it's really slow...)	09:10.11
paulgardiner	I hadn't realised it did that when opening the document. Had thought that was a page-load thing	09:12.12
tor8	it happens when loading the bookmark outline thing	09:12.58
paulgardiner	Oh I see	09:13.11
tor8	we currently do a linear search through the page tree to find a hit	09:13.44
	one way to speed that up would be to cache the page numbers in the pdf_xref_entry	09:14.31
	but I'm going to try an approach that follows the Parent links and adds up the Count, but I suspect that'll be both tricky to write and not very fast	09:15.15
Robin_Watts	Let me answer him.	09:15.41
tor8	I think the parent link following is most robust in the long run. less duplicated state that needs to be kept in sync in the face of edits.	09:16.26
	another way would be to make links opaque with callback to get the page number out	09:19.28
paulgardiner	That sounds quite attractive	09:20.52
tor8	but I think first make it a dynamic lookup through the page tree, and then callbackify it to reduce the startup overhead	09:21.42
	Robin_Watts: I forgot that your page tree editing existed... they will need some work too.	09:22.09
Robin_Watts	tor8: ok.	09:22.23
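Editor's sketch of the Parent/Count idea tor8 describes above: start at the page, walk up the Parent links, and at each level add up the Count of every sibling that comes before the node just visited. The node struct and the function name page_number_from_parents are hypothetical stand-ins for illustration, not MuPDF's pdf_obj API or the code on tor/pagetree.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a PDF page tree node (not MuPDF's pdf_obj). */
typedef struct node
{
	struct node *parent;	/* Parent link; NULL at the root */
	struct node **kids;	/* Kids array; NULL for a page (leaf) */
	int nkids;
	int count;		/* Count: pages at or below this node; 1 for a leaf */
} node;

/* Zero-based page number of 'page', or -1 if a Parent link points at a
 * node whose Kids array does not contain the child. */
int page_number_from_parents(const node *page)
{
	int number = 0;
	const node *child = page;
	const node *parent;

	for (parent = child->parent; parent != NULL; parent = child->parent)
	{
		int i, found = 0;

		/* Every sibling to the left contributes its whole Count. */
		for (i = 0; i < parent->nkids; i++)
		{
			if (parent->kids[i] == child)
			{
				found = 1;
				break;
			}
			number += parent->kids[i]->count;
		}
		if (!found)
			return -1;
		child = parent;
	}
	return number;
}
```

Nothing is cached here, so an edit to the page tree cannot leave a stale page number behind, which is the robustness argument made above. The cost is one scan of a Kids array per level, and this toy version has no guard against a cycle of Parent links.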
	http://prism.andrevv.com/	09:38.20
kens	:-)	09:38.52
Robin_Watts	And for those of you not on facebook...	09:39.49
	Heisenberg is driving down the motorway when a cop pulls him over. "Sir, Do you know how fast you were going?" "No, but I know where I am!"	09:40.35
	"You were doing 100 miles an hour!" "Great. Now I'm lost!"	09:40.53
kens	Hmm, but if Heisenberg knows where he is, the cop can't know how fast he was going....	09:41.27
*kens*	thinks the GS PDF Form implementation might be slow	09:43.18
paulgardiner	Robin_Watts: Oh, I do like that. :-)	09:50.55
kens	OK so it's not the form processing per se that's slow, it's a particular form that's slow	09:51.26
sebras	paulgardiner: did you see my discovery about issues with pdfref?	09:55.02
	paulgardiner: basically your commit (see the logs) requires "AP" in the annotation dict, which is not correct.	09:55.29
tor8	Robin_Watts: going for lunch, there are some preliminary pagetree commits on tor/pagetree	09:55.40
	I need to fix the create/delete/insert page functions as well	09:55.55
	(or if you want to tackle that while I'm out)	09:56.05
	there's some problem with annotations with those changes though...	09:56.23
	oh, those are broken before the page tree commits as well... so nvm that.	09:57.14
paulgardiner	Oh right. Thanks sebras.	10:01.27
	Hmmm. But old behaviour if no ap was to continue, which would step to the next array element without creating an annot struct for the current one. Now we're creating an annot struct but later removing it if no ap. I don't understand what's wrong.	10:06.27
	sebras: ^	10:07.21
Robin_Watts	paulgardiner: There are 2 changes on robin/master.	10:27.02
paulgardiner	Robin_Watts: okay	10:27.25
Robin_Watts	I think tor was happy with them last night, but if you could nod at them too, I'd be grateful.	10:27.27
	tor8: (For when you get back). It's not quite a "linear" search, is it? You'll skip whole subtrees with the count stuff.	10:37.43
	And I don't think the parent thing will help you, will it?	10:38.53
paulgardiner	fz_disable_device_hints(dev, FZ_IGNORE_IMAGE): should I read that as a double negative?	10:51.35
Robin_Watts	Yes.	10:51.49
	The FZ_IGNORE_IMAGE hint says to "ignore images".	10:52.06
paulgardiner	Right. So default is no images	10:52.10
	Both look good to me	10:52.23
Robin_Watts	Default for what device?	10:52.23
*Robin_Watts*	wonders what paul is looking at.	10:52.41
paulgardiner	HTML extraction ...	10:53.01
Robin_Watts	no, the two before that.	10:53.08
	sorry.	10:53.10
paulgardiner	Ah. No my fault.	10:53.28
	I was looking at 2 and 3	10:53.39
	2 is good	10:53.43
	Both good	10:54.10
Robin_Watts	Thanks.	10:54.15
paulgardiner	Robin_Watts, tor8: I have two ready to go too, although not rebased past those two yet.	10:55.40
	cluster test of mujstest is now happy with them	10:55.58
Robin_Watts	paulgardiner: As soon as I finish looking at tor8's I will look at yours.	10:56.10
paulgardiner	ta	10:56.16
	No hurry. I'm mostly not intending to work today. I was just trying to get that one finished	10:56.48
Robin_Watts	tor8: In your first commit you have: for (i = 0; i < pdf_array_len(kids); i++)	10:59.29
	That's bad - calling a function every time around the loop.	10:59.40
	Also, in general, our handling for pdf arrays is frequently bad.	11:00.22
	Actually, scratch that, sorry.	11:01.43
	tor8: Second commit looks fine, except: 1) Where you mark/unmark you need try/catch logic.	11:12.14
	2) Rather than limiting pdf_lookup_page_number by depth you should use mark/unmark.	11:12.36
	In the third commit, should really mark/unmark in pdf_lookup_inherited_page_item too.	11:14.11
	tor8: I'm going to prod at the commits a bit. Yell when you're back.	11:14.42
*tor8*	yells at Robin_Watts	12:03.43
	for 2) the depth, lookup_page_number is fairly performance critical at load time now so I'm wary of doing a lot of extra book-keeping and cleanup logic	12:04.48
Robin_Watts	tor8: right.	12:06.12
	I have new versions of the commits on robin/pagetree.	12:06.24
	And I am working on a version now that avoids doing try/catch at every level, but is still safe.	12:06.46
tor8	and unwinding the stack of mark/unmark needs a dynamic array (or recursion) to recover, which means a lot of extra try/catch	12:06.47
	the depth limit should cover our bases, and the normal page loading should catch most errors	12:07.27
Robin_Watts	Leaving marks throughout the page tree is bad. I think I have a way to get us both speed and correctness. Let me bash on it for a bit.	12:08.01
tor8	for correctness, recursion instead of iteration is cleanest. but also slower.	12:10.49
	a depth counter is fast and catches the error eventually	12:11.08
Robin_Watts	recursion to a depth of 100 would be bad.	12:11.33
tor8	yes, but we only iterate to a depth of 100 a.t.m.	12:12.03
	if we recurse we'll have to mark and stop and unwind	12:12.16
	or keep a separate stack and check both depth and then mark/unwind	12:12.34
	but bash on it a while longer if you wish, maybe you can figure something elegant out	12:12.57
Robin_Watts	but I've seen files that do: 1 0 obj << /Count 1000 /Kids [ 2 0 R .... 11 0 R ] >> ... 11 0 R << /Count 990 /Kids [12 0 R ... 21 0 R ] >> etc	12:13.03
tor8	I'm more concerned with how page tree edits will affect links	12:13.10
	Robin_Watts: so a very tilted page tree?	12:13.37
Robin_Watts	yeah.	12:13.41
tor8	ick.	12:13.44
	right. depth of 100 will hurt, but so will unwinding marks on errors :(	12:14.04
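Editor's sketch of the trade-off being argued here, on a toy chain of parent links. A depth counter is cheap and needs no cleanup, but it also rejects a legitimate tree that is merely very deep (the "tilted" case). Marks detect a real cycle exactly, but every mark set has to be cleared again, including on the error path. The tnode struct, MAX_DEPTH and both function names are hypothetical; this is not the code on robin/pagetree.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node: just a parent link and a mark flag. */
typedef struct tnode
{
	struct tnode *parent;
	int mark;
} tnode;

enum { MAX_DEPTH = 100 };

/* Depth counter: number of steps to the root, or -1 once the limit is hit.
 * A cycle is caught eventually; so is a valid tree deeper than the limit. */
int depth_with_limit(const tnode *n)
{
	int depth = 0;

	while (n->parent != NULL)
	{
		if (++depth > MAX_DEPTH)
			return -1;
		n = n->parent;
	}
	return depth;
}

/* Marks: number of steps to the root, or -1 if a node is visited twice. */
int depth_with_marks(tnode *start)
{
	tnode *n = start;
	tnode *m;
	int depth = 0;
	int result;

	for (;;)
	{
		if (n->mark)
		{
			result = -1;	/* already visited: cycle */
			break;
		}
		n->mark = 1;
		if (n->parent == NULL)
		{
			result = depth;
			break;
		}
		n = n->parent;
		depth++;
	}

	/* Unwind: clear every mark we set, on success and on failure alike. */
	for (m = start; m != NULL && m->mark; m = m->parent)
		m->mark = 0;

	return result;
}
```

With a chain 150 nodes deep, depth_with_limit reports failure while depth_with_marks returns 150. With a two-node cycle both report failure, and the marked version leaves no marks behind.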
Robin_Watts	At the moment, the code won't compile, right?	12:14.13
tor8	build/debug/mupdf will compile	12:14.25
Robin_Watts	Right, but pdfclean doesn't.	12:14.37
tor8	but not mutool (due to the uncommented page creation code)	12:14.39
Robin_Watts	right.	12:14.47
tor8	the try/catch safety commit looks fine	12:16.46
	anywhere we save page numbers external to the pdf_obj hierarchy should be refitted to do dynamic page lookups from object numbers IMO	12:17.41
	so we can do safe page edits etc	12:17.50
Robin_Watts	tor8: I think you're right.	12:17.53
tor8	which means we have to make fz_link etc subclassed by the document types	12:18.03
Robin_Watts	There should be an FZ_NORETURN change on there.	12:18.07
tor8	have you tried the noreturn changes on gcc?	12:18.39
Robin_Watts	and your first commit is tweaked slightly - I rolled the 'delete dead code' commit into it and tweaked some for loops to not call functions.	12:18.48
	Yes, I just cluster tested it.	12:18.57
	I'm being called for lunch. back in 40 mins, sorry.	12:19.37
tor8	fab.	12:19.48
	oh bollocks. merge cockup with winrt/mupdfwinrt/status.h	12:23.10
Robin_Watts	back.	12:57.04
tor8	Robin_Watts: so how do you want to progress now?	13:00.51
	pdf_insert_page and pdf_delete_page should be able to work on the existing page tree with some careful thought	13:02.59
	pdf_balance_page_tree could rebuild a new balanced page tree, if we take the naive approach when implementing insert/delete page	13:03.57
Robin_Watts	tor8: How is the speed at the moment ?	13:11.09
tor8	which speed?	13:11.26
Robin_Watts	The speed of opening a large PDF file.	13:11.40
	Is that better or worse than before ?	13:11.45
tor8	pdfref17 is too fast on my desktop to notice. with the naive lookup_page_number it was noticeably slow though.	13:12.23
Robin_Watts	So this code is not noticeably worse than the old one? That's all I was worried about.	13:14.01
	It feels like it should be faster not having to load the whole tree when we start.	13:14.21
tor8	let me do some comparisons	13:14.44
	it's faster :)	13:15.07
	new code loads and prints the outline in 127ms with little variation	13:15.42
	the old code takes between 120 and 150ms with a lot of variation	13:16.00
Robin_Watts	fab.	13:16.35
tor8	this is on debug builds though	13:17.18
	but still, a measurable difference at these time scales is pretty good to have	13:17.35
Robin_Watts	yes. nice one.	13:18.28
	Of course, this is all a complete pain in the ass for progressive loading.	13:18.57
tor8	let me try just nuking the outline loading to see what happens then	13:18.59
Robin_Watts	because we can't use the page tree at all.	13:19.13
tor8	Robin_Watts: why?	13:19.23
	or rather, why not?	13:19.26
Robin_Watts	page tree = last thing in the file.	13:19.33
	pretty much.	13:19.38
tor8	how did that work before then?	13:19.40
Robin_Watts	My previous progressive loading stuff would find the first page, and populate the entry manually.	13:20.05
	Then when the rest of the file arrived, it would read the page tree.	13:20.15
	The right way to do it is to either use hintstreams to tell you where pages begin/end or spot /Type/Page as you read objects.	13:20.59
tor8	if we skip loading the outline, the document loads in 90ms	13:20.59
Robin_Watts	tor8: Right, for progressive stuff I need to skip the outline loading.	13:21.17
	cos that's also at the end of the file.	13:21.23
tor8	the outline is loaded manually by the clients	13:21.38
Robin_Watts	Right. In pdfapp though it's loaded up front.	13:22.05
tor8	and I suspect if we delay that (in all apps and the android in particular) to when the outline view is actually shown we could gain significant load time speedups	13:22.29
Robin_Watts	For progressive operation, I'll need to have a list of page objects I can populate as it loads.	13:22.45
	and then we'll bin that and go back to the normal mode of operation at the end of loading.	13:22.58
tor8	ouch. hasOutlineInternal loads and tosses the outline	13:23.15
Robin_Watts	so no editing documents etc while they load - which I don't think is unreasonable :)	13:23.16
	tor8: yeah.	13:23.21
tor8	then it's reloaded in getOutlineInternal	13:23.21
Robin_Watts	but this feels like a much better way to be working.	13:23.40
tor8	Robin_Watts: you can't be "sure" that the first /Type/Page you encounter is page 1 though	13:23.48
Robin_Watts	tor8: The linearised object at the top of the file tells me what the first page is.	13:24.19
	After that I am guaranteed to get them in order.	13:24.31
tor8	right. so look at that for page 0 in these functions?	13:24.34
	and for all pages > 0 require the page tree	13:24.49
Robin_Watts	tor8: These functions will gain a bit that says: if (doc->progressive) { ... look it up in the array ... } else { do what we do now }	13:25.18
tor8	I think, as much as we can, we should be working at this level now that we support pdf file editing	13:25.26
	all this loading stuff into internal c structs to simplify other bits of code that uses it has to go :(	13:25.46
	which array?	13:26.17
Robin_Watts	tor8: I'll need to have a temporary array that maps page number -> object number or something for use during loading.	13:27.01
tor8	that stuff isn't part of the hint stream?	13:27.19
Robin_Watts	and that will get binned at the end of progressive loading.	13:27.21
sebras	paulgardiner: sorry for the delay. I think you're mistaken, if "AP" is not present we throw an error, should we do that?	13:27.24
Robin_Watts	1) We can't rely on the hint stream	13:27.34
sebras	paulgardiner: i.e. if is_dict == FALSE.	13:27.35
Robin_Watts	cos no one uses it, therefore no one generates it right.	13:27.47
	2) Even if we could rely on the hint stream it's compressed, so we'd need to unpack it into such an array anyway.	13:28.09
tor8	Robin_Watts: right. I'd be more comfortable just doing page 0 that way than all the pages (since no one generates files properly...)	13:28.23
Robin_Watts	so I'd rather have an array and populate it as we go.	13:28.29
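Editor's sketch of the temporary page number -> object number array Robin_Watts describes: while the document is still arriving, lookups go through an array filled in as page objects are seen; once loading finishes the array is binned and lookups go back to the page tree. The toy_doc struct and all function names are hypothetical, and a plain array of object numbers stands in for the real page tree. The sketch also assumes, as stated above, that pages arrive in order.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_SEEN = 16 };

typedef struct toy_doc
{
	int progressive;	/* non-zero while the file is still loading */
	int seen[MAX_SEEN];	/* page number -> object number, filled as pages arrive */
	int nseen;
	const int *tree;	/* stand-in for the real page tree, in page order */
	int ntree;
} toy_doc;

/* Called when the loader spots the next page object in the incoming data. */
void note_page_object(toy_doc *doc, int objnum)
{
	if (doc->nseen < MAX_SEEN)
		doc->seen[doc->nseen++] = objnum;
}

/* Object number for a zero-based page, or -1 if it is not available (yet). */
int lookup_page_objnum(const toy_doc *doc, int page)
{
	if (page < 0)
		return -1;
	if (doc->progressive)
		return page < doc->nseen ? doc->seen[page] : -1;
	return page < doc->ntree ? doc->tree[page] : -1;
}

/* The whole file has arrived: bin the temporary array, use the tree. */
void finish_loading(toy_doc *doc, const int *tree, int ntree)
{
	doc->progressive = 0;
	doc->nseen = 0;
	doc->tree = tree;
	doc->ntree = ntree;
}
```

Page edits would have to be refused while progressive is set, since the temporary array is not kept in sync with anything.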
	tor8: Well, chrome manages to display pages as they load. and we're aiming to be as good as chrome.	13:28.52
sebras	paulgardiner: I don't think there is a requirement in pdfref that annotations must have "AP".	13:28.56
tor8	but anyway, I just changed all the accesses to page_objs/refs to pdf_lookup_page_obj (or should that be page_ref?)	13:29.00
sebras	hence I argue that we shouldn't throw in this case.	13:29.06
Robin_Watts	tor8: right. I am absolutely in favour of all your changes.	13:29.27
	I've just pushed a new commit to robin/pagetree	13:29.44
	Should give us safety with speed.	13:29.56
paulgardiner	Hi sebras. I thought that was correct because the catch clause doesn't bomb out completely: it just removes that annot from the list, which is the same effect as the continue that was in place before, because the continue was before the creation of the annot structure	13:30.57
sebras	paulgardiner: hm... but why is the error printed to the console in that case?	13:32.03
paulgardiner	There's still an fz_warn in there. But there was before, so I wouldn't expect an error printed out where there wasn't before.	13:32.57
sebras	paulgardiner: I guess that is my main gripe. if we do internal error handling using throw to skip over annotations that don't have AP then that is fine, but we shouldn't complain about it.	13:32.58
tor8	Robin_Watts: ugh. so. much. code.	13:33.07
Robin_Watts	tor8: Well, my worry was that I didn't want to make the code slower. If it's not slow, then we don't necessarily need the faster version.	13:33.45
sebras	paulgardiner: hm. can you try rendering pdfref17.pdf, just the first few pages. if you see no errors/warnings printed for those pages then there might be something strange with my build.	13:33.55
tor8	I think lookup_page_number is the one that has to be fast, not lookup_page_obj	13:34.06
	since that one is used to load all links and link dests	13:34.16
sebras	paulgardiner: though I rebuilt everything from scratch last night, so I would be really surprised if I messed up. :)	13:34.16
tor8	though if we make fz_link opaque with dynamic page lookup on demand, neither needs to be blazing fast	13:34.35
paulgardiner	sebras: and this is an error that didn't previously appear, before these changes?	13:34.57
	Ah! Robin spotted a problem with pdfref17.pdf, but that's supposedly fixed now.	13:35.42
Robin_Watts	sebras: there was a problem that used to spit lots of "can't load object 333278 into cache" or something like that errors	13:36.34
	tor8: So are we really happy with just a depth counter?	13:37.51
	I'd like to have a go at pdf_mark/unmarking there.	13:38.43
tor8	I think I'm happy, but maybe bump it up beyond 100 if we run into these bad deeply nested cases too often	13:43.14
	you could try a recursive variant with mark/unmark though	13:43.37
	but that's going to hit stack problems with those same files	13:43.50
	I'm having a go at insert/delete page	13:44.06
	which needs a more generic lookup_obj function that returns the parent and index in kids array	13:44.22
Robin_Watts	tor8: I think I can do a non-recursive version with mark/unmark.	13:44.57
paulgardiner	sebras: I do see "broken annotation" warnings with pdfref17.pdf. We could not warn in that situation, but what I'm more concerned about is if these warnings are newly occurring. Something isn't working as I thought if these warnings weren't appearing before these changes.	13:47.41
Robin_Watts	actually, I think there is a flaw in my faster version with less try/catching.	13:48.02
	By using try/catch in the cleanup path, I think I lose the ability to rethrow the error.	13:48.29
	actually, no, I should be fine.	13:49.07
paulgardiner	sebras: oh hang on. You're right. We'd have avoided a warning there before. Damn! that will be awkward to handle neatly without using a special error code, or a local flag	13:50.33
	Robin_Watts, tor8: now we have error codes, do you think it's reasonable to use them in earnest even for cases where it's just so that we easily share error clean up code?	13:55.15
Robin_Watts	paulgardiner: No point in having 'em if we don't use 'em.	13:56.14
paulgardiner	In this case I'd be introducing a missing-appearance-stream error, just so that I can test for it and avoid issuing a warning.	13:56.22
Robin_Watts	Argh. That's the excuse for all C++ code being awful isn't it ?	13:56.25
paulgardiner	:-)	13:56.53
	sebras: anyway, thanks for spotting that, and sorry for being so slow to spot what was going on.	13:57.24
tor8	Robin_Watts: you know what, how about just leaving dirt marks around if there are errors? that'll screw up recovery on incremental loading though won't it?	13:57.26
Robin_Watts	It'll screw everything up.	13:58.12
	but in particular incremental loading.	13:58.26
tor8	I hate error handling... it always ends up being five times as much code and completely obscures what the code is supposed to be doing, exceptions or error code returns doesn't matter it's all just the same ugly crap in the end :(	13:59.40
	and our allman-style brace convention really adds to the number of semi-blank useless lines for try/always/catch	14:00.16
paulgardiner	But exceptions at least avoid it polluting a large number of functions	14:00.19
Robin_Watts	Exceptions generally make the error handling much easier to read, but in this case, I'm working to avoid the try/catch overhead.	14:00.28
paulgardiner	At least you don't have to explicitly pass errors through functions that don't even need clean up	14:01.02
Robin_Watts	Just thinking about it... if we hit a file with a large unbalanced tree of pages, we'll overflow the stack in the pdf_lookup_page_obj stuff.	14:01.12
tor8	Robin_Watts: we did that in the old code too though	14:01.54
	that one was also recursive	14:01.57
Robin_Watts	http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=0e4c512bb4b73d575cfa6640efe6ed6372f6d060	14:03.10
tor8	if (mark) both sets and tests?	14:06.24
Robin_Watts	it does.	14:06.31
tor8	won't the always block hit the same cycle and loop forever?	14:07.11
	while (parent2 != parent)	14:07.32
Robin_Watts	eh?	14:08.01
	I just updated my patch. How did you find the new one?	14:08.11
	http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=8438125a08d50467a1ffc25ec8009ec3af853730	14:08.28
	oh, you didn't yet. but I think the above solves it.	14:08.47
tor8	yeah, it looks reasonable. head deep in some code of my own so haven't considered all the cases.	14:09.50
	or is that neck deep?	14:09.58
sebras	paulgardiner: no problem, I meant to fix it last night, but I have slept badly the entire week so I opted for just telling you instead of spending the night bugfixing mupdf. ;)	14:10.24
dogisfat	Why does the *_plane_data portion of a driver provide a height, from my experience I have only seen one passed in as the height. Is it not possible to process the image all at once?	14:10.33
sebras	paulgardiner: it doesn't matter that you broke it as long as we fix it. :)	14:10.46
Robin_Watts	dogisfat: No, it is not always possible to process all the image at once.	14:12.09
paulgardiner	sebras: yeah, but still good to know one way or the other and the mechanism by which it's broken.	14:12.31
Robin_Watts	tor8: I will do the same modification to pdf_lookup_inherited_page_item.	14:12.33
dogisfat	Robin_Watts: Can you give me an example? I am trying to better understand how images work within GS.	14:12.49
Robin_Watts	dogisfat: The device interface is driven from all sorts of different places.	14:13.46
	Sometimes the places that are calling it don't have whole images available.	14:14.06
	So they have to be able to pass the data they do have.	14:14.14
	In particular we might get images in in some obscure compressed format.	14:14.50
dogisfat	That makes sense. Will one row always be passed at a time, never more?	14:14.50
Robin_Watts	dogisfat: No. We can pass as many lines as we have.	14:15.07
dogisfat	Ah, thanks!	14:15.18
	Also I understand that PDFs are flattened when there is transparency within the image. What would happen if there were a PNG with transparency in the PDF, will that be passed as a transparent image to *_plane_data?	14:16.02
Robin_Watts	dogisfat: Different type of transparency, I believe.	14:17.18
	actually, no. ignore me.	14:17.29
	PDFs do not ever have PNGs in them.	14:17.42
chrisl	We don't support PNG in PDF, IIRC	14:17.42
kens	pdfwrite doesn't, no	14:17.55
chrisl	PNG is an almost never implemented optional filter in PDF, IIRC	14:18.13
dogisfat	Let's say then any image format with transparency found in a PDF.	14:18.40
kens	pdfwrite only flattens transparency when going to a version of PDF < 1.4 or to PostScript	14:19.02
	But that's more to do with transparency than images	14:19.38
Robin_Watts	chrisl: PNG is not even mentioned in the spec!	14:19.56
tor8	Robin_Watts: ugh. missing functions I want: pdf_array_delete and pdf_array_insert_drop	14:20.51
Robin_Watts	dogisfat: Most image formats (including all of those used in PDF I believe) support (at most) 'on' or 'off' alpha channels (akin to gif)	14:20.56
chrisl	Robin_Watts: Oh, maybe I dreamed that, then......	14:21.12
kens	I don't think images carry transparency as such in PDF; they can be associated with an SMask, or in a transparency group, but that is all (I think)	14:21.19
dogisfat	Ok, thank you very much	14:21.32
Robin_Watts	The only way to get 'proper' alpha transparency is to use, as kens says, either an SMask, or a transparency group.	14:21.39
kens	As robin says, there is also the imagemask but that's not exactly transparency	14:21.43
Robin_Watts	kens: PDF can do on/off transparency using colormasking things too.	14:22.06
	but again that's not real transparency	14:22.22
kens	Robin_Watts : so can PostScript, but it's still masking, not really transparency	14:22.25
	type 4 images if I recall are chroma keyed	14:22.49
dogisfat	I am really interested in extracting elements from different document types and trying to figure out how the data will be available to me.	14:23.29
Robin_Watts	dogisfat: Using gs? You'll always get the uncompressed image data fed to you.	14:27.41
	If you're writing your own device, then you can avoid using the clist, and thus hopefully avoid getting images clipped into several bands.	14:28.31
dogisfat	Robin_Watts: Right, but imagine I have a PDF as an input file with two overlapping images with transparency. How will those arrive in my driver? Obviously this depends on the way the driver is written, but in my case I am interested in how it will arrive at the *_plane_data function	14:30.02
Robin_Watts	PDF files are intrinsically drawn with the painters algorithm right?	14:31.18
	start at the back, move forwards.	14:31.26
	New things overlay old things.	14:31.41
	So generally you'll expect to get things in order.	14:31.58
	HOWEVER, for transparency, we do all sorts of stuff with compositor groups etc.	14:32.15
	so it can get complex.	14:32.34
	Can I ask, is postscript important to you?	14:32.43
dogisfat	Yes, as much so as PDF if not more	14:32.53
Robin_Watts	ok.	14:33.01
	so I can't suggest that you look at mupdf then.	14:33.12
dogisfat	I am writing a driver to intercept PS files from the OSX print system and break them into graphical elements. I also wish to support as many other file types as possible and PDF happens to be an easy one to test with.	14:34.22
	Yeah, I have evaluated a handful of packages but thus far GS supports the most features.	14:34.57
Robin_Watts	dogisfat: I think there is a trace device that can be built into gs.	14:36.23
dogisfat	Robin_Watts: I am using it as a model.	14:36.44
Robin_Watts	right, so with that you can see exactly how the device is tickled.	14:37.50
dogisfat	I will, however, I need to see with what it is supplied and I am having trouble getting said information out. I am so inexperienced with GS that performing simple tasks such as writing data out to a file can be very cumbersome.	14:40.11
Robin_Watts	dogisfat: If you enhance the trace device to trace with more information than it currently does, then let us know. We'd be interested to see what you come up with.	14:41.26
dogisfat	I will be sure to do that!	14:41.53
Robin_Watts	tor8: New version of the latest commit on robin/pagetree	14:42.15
	tor8: Is there a separable bit of work I can do to help you?	14:42.37
	Oh, I should review paulgardiners changes, unless you have.	14:42.50
dogisfat	One last question if I may. Will plane_data be called sequentially until all of an images information has been passed?	14:44.12
Robin_Watts	Yes.	14:44.52
dogisfat	Awesome	14:45.08
Robin_Watts	I mean, a caller might decide to decimate an image and send it as several smaller images over the device interface.	14:45.18
	As long as the device interface sees all it needs to render stuff, it doesn't care.	14:45.33
	but why would they?	14:45.37
	99 times out of 100 callers do "the thing that is simplest for them", which in this case is generally to run through an image sending all its data in conveniently sized lumps.	14:46.31
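Editor's sketch of what "conveniently sized lumps" means for the receiving side: the receiver cannot assume one row per call or the whole image per call, so it tracks how many rows it has seen and appends whatever height it is handed. The image_sink struct and the sink_begin/sink_rows functions are hypothetical and are not the signature of Ghostscript's *_plane_data device procedure.

```c
#include <assert.h>
#include <string.h>

enum { MAX_IMAGE_BYTES = 4096 };

typedef struct image_sink
{
	int row_bytes;		/* bytes per row of image data */
	int total_rows;		/* height of the complete image */
	int rows_seen;		/* rows received so far */
	unsigned char pixels[MAX_IMAGE_BYTES];
} image_sink;

/* Start a new image. Returns 0 on success, -1 if it would not fit. */
int sink_begin(image_sink *s, int row_bytes, int total_rows)
{
	if (row_bytes <= 0 || total_rows <= 0)
		return -1;
	if (row_bytes > MAX_IMAGE_BYTES / total_rows)
		return -1;
	s->row_bytes = row_bytes;
	s->total_rows = total_rows;
	s->rows_seen = 0;
	return 0;
}

/* Accept 'height' more rows. Returns 1 when the image is complete,
 * 0 when more data is expected, -1 if the caller sends too many rows. */
int sink_rows(image_sink *s, const unsigned char *data, int height)
{
	if (height < 0 || height > s->total_rows - s->rows_seen)
		return -1;
	memcpy(s->pixels + (size_t)s->rows_seen * s->row_bytes,
		data, (size_t)height * s->row_bytes);
	s->rows_seen += height;
	return s->rows_seen == s->total_rows;
}
```

A 4-row image delivered as one row followed by three rows ends up in the same buffer as if it had arrived in a single call.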
dogisfat	Ok, that makes sense, I just remember reading that calls to one of the image handlers was such that you had to keep track of which image was being processed.	14:47.45
tor8	Robin_Watts: more pagetree stuff on tor/master	14:48.17
Robin_Watts	dogisfat: Right. For some PS image functions, you get image and alpha interleaved.	14:48.20
tor8	reimplemented lookup_page_obj, that needs try/catch sanity checking	14:48.33
	and check for off-by-one errors in the array_insert/drop	14:48.45
dogisfat	Robin_Watts: Thank you for all of your help.	14:48.49
Robin_Watts	and they are possibly sent as separate images.	14:48.51
	np.	14:48.53
	tor8: Let me look.	14:49.06
	The first of paulgardiners commits looks fine. I still need to look at the second one.	14:49.29
	previously pdf_array_insert would have inserted at the front ?	14:50.27
	if we need that we should call it pdf_array_prepend (and likewise pdf_array_append)	14:51.18
tor8	we have pdf_array_push for adding to the end	14:51.32
	insert used to insert at the front, yes	14:51.51
Robin_Watts	pdf_array_insert is broken.	14:52.15
	second arg to memmove is wrong I think. Needs a +i ?	14:52.24
tor8	d'oh! it used to have a+i	14:52.38
	must've undone once too many somewhere	14:52.53
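Editor's sketch of the memmove bug just spotted, on a plain int array rather than a pdf array. To insert at index i, the tail starting at i has to move up one slot, so both the destination and the source are offset by i. Dropping the +i on the source copies from the front of the array instead. The function name int_array_insert is hypothetical, and the caller is assumed to have reserved room for one more item.

```c
#include <assert.h>
#include <string.h>

/* Insert 'value' at index i of items[0..len). The caller guarantees
 * 0 <= i <= len and capacity for len + 1 items. Returns the new length. */
int int_array_insert(int *items, int len, int i, int value)
{
	/* Shift items[i..len) up by one. Note the +i on the SOURCE as well
	 * as the destination; memmove is used because the ranges overlap. */
	memmove(items + i + 1, items + i, (size_t)(len - i) * sizeof(items[0]));
	items[i] = value;
	return len + 1;
}
```

With i == 0 this behaves as a prepend and with i == len as an append, so the old insert-at-front behaviour and pdf_array_push are both special cases of the indexed version.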
	the retainpages code is probably broken now in pdfclean, with the resources etc being inherited from the page tree	14:53.20
Robin_Watts	does pdfclean now build?	14:53.42
tor8	everything builds now	14:53.49
	retainpages could be rejigged to call pdf_delete_page	14:54.28
	and then make a tree rebalancer (or just recreation function, based on the #if 0'd code)	14:54.45
	that should help us not have to recreate the dests name tree as well	14:55.40
	we still have to take care of page resources	14:56.22
Robin_Watts	so, have you taken on my try/catch stuff as a basis for these commits ?	14:56.27
	yes, you took on the simple versions.	14:57.01
	OK.	14:57.03
tor8	no. the lookup_page_obj had to be rewritten, and I haven't cherry-picked over your second one	14:57.03
Robin_Watts	You took on: Add try/catch safety to pdf_lookup_page_obj_imp	14:57.28
tor8	yes, I used your simple version	14:57.30
	with the skip counter instead of offset	14:57.37
Robin_Watts	right, that makes the diffs make sense :)	14:57.38
	You can't pass &needle in to pdf_lookup_page_loc_imp	14:58.52
	because if !hit you need the original value of needle for the error message.	14:59.10
	That's why I had int skip = needle; and passed &skip	14:59.21
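Editor's sketch of the point about needle and skip. The recursive helper counts a copy of the requested page number down as it passes pages and skips whole subtrees by their Count, so the original value is still intact for the error message if nothing is found. The pnode struct, the function names and the message text are hypothetical, loosely modelled on the pdf_lookup_page_loc_imp discussion; page numbers are zero-based.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy page tree node (not MuPDF's pdf_obj). */
typedef struct pnode
{
	struct pnode **kids;	/* Kids array; NULL for a page (leaf) */
	int nkids;
	int count;		/* Count: pages at or below this node; 1 for a leaf */
} pnode;

/* Consumes *skip while walking; returns the page once *skip reaches 0. */
static const pnode *lookup_imp(const pnode *node, int *skip)
{
	int i;

	for (i = 0; i < node->nkids; i++)
	{
		const pnode *kid = node->kids[i];

		if (kid->kids == NULL)
		{
			if (*skip == 0)
				return kid;
			(*skip)--;
		}
		else if (*skip >= kid->count)
		{
			*skip -= kid->count;	/* skip the whole subtree */
		}
		else
		{
			const pnode *hit = lookup_imp(kid, skip);
			if (hit)
				return hit;
		}
	}
	return NULL;
}

/* Returns the page, or NULL with a message in 'err' naming the page asked for. */
const pnode *lookup_page(const pnode *root, int needle, char *err, size_t errlen)
{
	int skip = needle;	/* work on a copy: needle is needed below */
	const pnode *hit = needle < 0 ? NULL : lookup_imp(root, &skip);

	if (hit == NULL)
		snprintf(err, errlen, "cannot find page %d in page tree", needle);
	return hit;
}
```

Had &needle been passed down instead, a failed lookup for page 4 in a four-page tree would report whatever was left after the countdown rather than 4.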
tor8	oh, right you are	14:59.42
	fixes pushed	15:00.12
Robin_Watts	For pdf_lookup_page_loc, I might be tempted to have a structure with a pdf_obj * and an int * in.	15:00.25
	And then pass the pointer to that rather than 2 separate pointers.	15:00.31
	on ARM having 4 or less params is nicer.	15:00.46
tor8	too many types! I see your point, but ugh.	15:00.48
Robin_Watts	minor thing.	15:00.55
	It means that if we ever have to update what a "location" is, we can do it without changing too much code.	15:01.20
tor8	hopefully won't see too many calls there	15:01.23
	we could hoist the Count skip thing up to before we call the recursion	15:01.43
Robin_Watts	Root/Pages is guaranteed not to be a Page is it?	15:04.26
tor8	I believe it must be a page tree node (as opposed to page tree leaf)	15:05.53
	since it must have a Count at the top somewhere	15:06.14
Robin_Watts	tor8: Are you going to rebase -i fixup those fix commits ?	15:07.21
tor8	sure. let me just try hoisting the count skip update first.	15:07.45
Robin_Watts	ok. I'm not sure I see how that will work, but I'll wait for it :)	15:08.27
tor8	pull :)	15:08.52
	not sure if that catches every corner case though	15:09.39
	ignore that one, I'm too tired for it... off by one bugs everywhere too	15:10.21
	I'll squish the rest up and push a cleaned fixup	15:10.37
dogisfat	Are there routines available that I can call to convert binary data into Base64 within G?	15:10.48
	GS*	15:10.51
Robin_Watts	tor8: oh, I see. On the grounds that we should never be out of range of the topmost item, hence pushing in doesn't hurt.	15:11.24
tor8	yeah.	15:11.36
Robin_Watts	Well, you've lost my Mariotastic comment :(	15:11.48
	if (skip >= count) { skip -= count; break; } hit = pdf_lookup_page_loc_imp;	15:12.48
	no need for if(hit)break then, and fewer lines overall?	15:13.05
	We had a rule at my previous place that we should avoid right hand creep where possible, so it's become a habit.	15:13.39
	Oh, ignore me! We still need if (hit)break.	15:14.28
	so, I approve of your first 2 commits, and the fixes for them.	15:15.04
	I still have to read 3 and 4.	15:15.15
	oh, 4 is another fix. That's fine too.	15:15.34
tor8	Robin_Watts: squished and resorted commits on tor/pagetree now	15:15.34
	still need to rebase past an area of conflict...	15:15.54
Robin_Watts	First one is still broken :)	15:16.10
	+1 != +i	15:16.17
tor8	shoot me now.	15:16.32
SparFux	Hi all. In Fedora F18, I get a foomatic-rip error: http://pastebin.com/ec2Lgc7A	15:16.47
Robin_Watts	It's a Friday.	15:16.48
kens	SparFux : looks like your PostScript file is broken. If you want someone to look more closely, open a bug report	15:17.38
Robin_Watts	SparFux: You either need to speak to tkamppeter, or you need to reduce the problem to a direct invocation of ghostscript.	15:17.49
kens	SparFux : the last error is 'broken pipe' so perhaps you need to go back to the foomatic-rip people	15:18.43
SparFux	Yeah, gs can actually display the file on screen.	15:19.16
kens	SparFux : then it sounds like it's not us....	15:19.33
SparFux	so I guess it's not broken, but it might be the foomatic-rip stuff.	15:19.34
	kens: thx :-)	15:19.39
Robin_Watts	tor8: Your second commit has: "// TODO: inherit" in it.	15:20.04
	just after you add the inheritance code :)	15:20.12
	actually, there is a problem with pdf_array_insert I think.	15:21.58
	Suppose I have an array with 2 things in, and a cap of 4. And I want to insert at position 10.	15:22.26
tor8	yeah, we're not good about bounds checking there	15:22.50
	should probably add some	15:22.59
	0 <= i <= array->len	15:23.21
Robin_Watts	same issue in pdf_insert_page	15:23.26
	tor8: That feels reasonable.	15:23.37
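The bounds check discussed above (`0 <= i <= array->len`) might look like this on a plain growable array. This is a sketch under assumptions: `int_array` and `array_insert` are hypothetical and stand in for `pdf_array_insert`, whose real signature and error handling differ.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array; not MuPDF's pdf array object. */
typedef struct {
    int len;     /* items in use */
    int cap;     /* items allocated */
    int *items;
} int_array;

/* Insert value at index i. Returns 0 on success, -1 if i is out of
 * range (valid range is 0 <= i <= len) or if allocation fails. */
static int array_insert(int_array *a, int value, int i)
{
    if (i < 0 || i > a->len)
        return -1;
    if (a->len == a->cap) {
        int new_cap = a->cap ? a->cap * 2 : 4;
        int *p = realloc(a->items, (size_t)new_cap * sizeof *p);
        if (!p)
            return -1;
        a->items = p;
        a->cap = new_cap;
    }
    /* Shift the tail up by one slot to open a gap at i. */
    memmove(&a->items[i + 1], &a->items[i],
            (size_t)(a->len - i) * sizeof *a->items);
    a->items[i] = value;
    a->len++;
    return 0;
}
```

With 2 items and a capacity of 4, an insert at position 10 is rejected up front instead of writing past the allocation.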
tor8	I gotta go soon, I've pushed rebased version on tor/pagetree	15:23.42
	don't push the hoist one whatever you do, I think it's broken	15:23.50
Robin_Watts	I can do these little fixes if you want.	15:23.55
tor8	go for it. I'm not fit for more coding today, considering how many times I tried to fix array_insert and failed >.<	15:24.19
Robin_Watts	ok.	15:24.51
	I'm gonna take a 20 minute break.	15:25.05
tor8	make sure to pull before you start again, there's one more fix up	15:27.41
Robin_Watts	will do	15:28.26
dogisfat	How does the ImageMatrix describe the translation/transformation of an image?	15:46.22
kens	dogisfat, that's really outside the scope of an IRC discussion, you should read the PostScript language reference manual or the PDF reference manual	15:47.08
dogisfat	Ok, thank you.	15:47.21
SparFux	Sorry guys! Was definitely my fault. My direct printing script and the cups configuration used an old IP.	16:04.19
kens	Good to know you've sorted it out SparFux	16:04.42
SparFux	kens: yeah, just didn't want to leave anybody in doubt that it might have been PostScript, or that some software has a bug not found yet :-)	16:05.26
Robin_Watts	tor8: If we have a /Page with a /Count in it, we'll go wrong.	16:25.49
tor8	Robin_Watts: in which function?	16:28.30
Robin_Watts	pdf_count_pages_before_kid	16:28.53
	robin/pagetree has the out of bounds fixes on.	16:30.10
	proposed fix for pdf_count_pages_before_kid on robin/master.	16:32.41
	robin/pagetree	16:32.47
	oops. broken. but you get the idea.	16:34.21
	Fixed one there now.	16:35.58
	and now a version that compiles too.	16:37.22
	tor8: If I don't speak to you before, btw, I hope the move goes well.	16:38.36
tor8	Robin_Watts: right, they both look fine	16:55.53
	and thanks :)	16:56.03
Robin_Watts	tor8: np.	16:56.12
	cluster testing shows a file that goes wrong.	16:57.35
	will look into it.	16:57.38
	And a file that now works :)	16:58.01
	ghostpcl/tests_private/pdf/PDF_1.7_FTS/fts_07_0704.pdf has page 2 with no /Type/Page	17:00.36
	oh, it's a /Type /Template	17:01.02
	cluster problems aren't really problems, so it looks good.	17:23.31
ray_laptop	Robin_Watts: Scott's trying to skype you	20:57.00