Ghostscript IRC logs

	<<<Back 1 day (to 2016/12/22)	20161223
sebras	tor8: there are two commits on sebras/master, do they make sense to you?	12:12.23
tor8	sebras: I'd rather fix the segfault to allow for NULL resource dicts	13:05.48
	resource dictionaries in pages are optional, after all	13:06.00
sebras	tor8: notice that this NULL, not null.	13:07.43
tor8	should not matter. pdf_dict_get(page_obj, "Resources") will return NULL not null.	13:08.38
	...if the resource dictionary is missing	13:09.04
	resource dictionaries are not required to be indirect objects either, as you state in the commit comment.	13:09.37
	they can be inline dictionaries	13:09.43
	and that is what you create in your patch as well...	13:10.00
sebras	in the file that was originally created, we create an indirect reference to an object that does not exist.	13:11.42
	this causes pdf_load_page to call pdf_resource_use_blending which eventually tries to resolve the object.	13:12.16
tor8	sebras: I'm confused. We don't create a file in make_fake_doc.	13:12.25
sebras	this cannot be done because there is no file backing it so fz_seek() segfaults in the end.	13:12.29
tor8	we pass NULL to pdf_add_page	13:12.42
sebras	tor8: exactly, so NULL is stored in the xref.	13:13.09
tor8	huh, why are we trying to seek in a newly created document?!	13:13.19
sebras	xref entry->obj	13:13.20
tor8	ah, pdf_add_page creates an indirect object for the resources dictionary	13:13.48
sebras	let me switch irc computer...	13:13.49
tor8	if (resources) pdf_dict_put_drop(...Resources) then?	13:13.58
	same with Contents, they are also optional	13:14.03
sebras	that might work.	13:15.16
	since this an in-memory only object we cannot afford to seek in the non-existing stream.	13:15.43
tor8	hm. reading the spec more closely, it says that if there are no resources we should create an empty dictionary (or else it will go looking for inherited resource dictionaries in the page tree)	13:16.34
sebras	tor8: right, so those if-statements in pdf_add_page() also fixes it.	13:16.35
tor8	sebras: I think the core issue here is calling pdf_add_object with NULL	13:16.51
	but crashing on such errors is also bad	13:17.24
sebras	tor8: maybe pdf_add_page() should be able to handle a NULL resource argument and create the empty dict in that case?	13:17.43
tor8	yeah. I think that would be best.	13:18.35
	and it should possible avoid creating a new indirect object as well	13:18.55
sebras	OTOH, if we do that you cannot add pages which you want to inherit the resource dict fro mthe page tree..	13:19.16
tor8	no. we don't want to get involved in that insanity :)	13:19.44
	that's just a brain dead bit of spec which we shall avoid at all costs when creating our own files.	13:20.17
sebras	this is actually what the file I'm using for testing happens to use. :)	13:20.37
tor8	we don't allow access to the page tree so there's no way for anyone to even do that	13:20.41
	using the pdf_add_page and pdf_insert_page apis	13:20.51
sebras	right, then the path to resolving this is clera.	13:21.02
tor8	yes. the only question is whether to put the resource dictionary inline or create a new numbered object for it.	13:21.42
	most files tend to use numbered objects for it	13:22.19
sebras	inline if it is empty and indirect if it is provided?	13:22.23
tor8	and that will let our garbage collection dupe detection coalesce them	13:22.34
	yeah, inline if it is empty and indirect if it is provided and not already indirect is probably best.	13:22.51
sebras	so the issue causing all of this is that there is no stream backing the fake document.	13:25.05
	this means that having the resource dict be an indirect reference to a non-existing object in a normal file ought to work fine.	13:25.35
tor8	yeah. there is normally no way to create a new document and have a bogus indirect reference that will try to load the underlying file.	13:25.49
sebras	no, but you could have a file we parse that says /Resource 99 0 R but not have 99 0 obj in there, and in thta case well error out on "object out of range" which results e.g. in "cannot find Font dictionary".	13:27.17
tor8	if it's a non-existent object, we'll throw exceptions based on it being out of range of xref->len and not try to load	13:27.27
	if it's created with pdf_create_object it'll be a 'f' free entry and thus not loaded	13:27.42
	and if it's updated with pdf_update_object then it'll have a proper object	13:27.54
	but if we call pdf_update_object with NULL all hell breaks loose	13:28.07
	so we should catch that right there	13:28.12
	and add a 'null' object	13:28.20
	or just no-op it	13:28.24
	sebras: try the two top commits on tor/master	13:31.16
sebras	that works for the tests I have been doing. mention bug 697183 in the commit message though.	13:36.47
	though I'm worried what will happen if csi->rdb is a null object and we look for a resource other than a font though.	13:37.33
tor8	sebras: in the page interpretation?	13:38.00
sebras	yes.	13:38.04
	do we handle not having a resource dict gracefully everywhere.	13:38.37
	in pdf-interpret.c	13:38.43
tor8	sebras: we'll just get NULL	13:39.50
	see pdf_process_Do where we look for image resources	13:40.04
	xres = dict_get(csi->rdb, "XObject") will return NULL if rdb in NULL	13:40.29
sebras	yes, we check xres for NULL.	13:40.35
tor8	and even if it were not checked, we'd eventually end up pdf_load_image with xobj==NULL where we'd fail on width==0	13:41.44
	sebras: was there somewhere specific you were looking at?	13:42.00
sebras	no, I'm looking at all uses of csi->rdb	13:42.18
	to make sure they're all robust.	13:42.43
tor8	resolve_properties is the only one that doesn't have an explicit check	13:43.22
sebras	true but then again the caller checks for !cooked.	13:43.49
tor8	which is for the marked content stuff used for optional content groups	13:43.53
	but that can cope with blank and missing entries well enough	13:44.05
	yeah.	13:44.28
sebras	load_font_or_hail_mary() is the only one I haven't looked at yet.	13:44.45
tor8	sebras: the other fixes in your make_fake_doc look reasonable	13:44.54
	missing the 's' stroke command on the content stream makes me wonder if this has ever been properly tested :)	13:45.17
sebras	yeah, I'll bring your commits over and create separate commits for that.	13:45.24
	I think not.	13:45.35
	also the bogus error handling seems to suggest that no... ;)	13:45.50
tor8	sebras: there is one other place where we create a 'fake' document	13:45.51
	in pdfportfolio.c when creating a new portfolio pdf	13:45.59
sebras	right, I'm actually attempting to add portfolio support to the x11 viewer just as an experiment.	13:46.22
tor8	but there we add some actual text and not just a cross	13:46.33
	sebras: okay. as an extra command line argument which portfolio entry to open?	13:47.04
sebras	oh, pdf_load_type3_font() checks for rdb being NULL. so I think we handle NULL everywhere in that case.	13:47.09
tor8	sebras: yes, we should. the loading code is pretty robust these days.	13:47.31
	it's the creation code that's had less field testing...	13:47.40
sebras	tor8: no, I wanted to generate a fake page with GoToE links. this is why i was looking into the make_fake_doc() code.	13:48.11
tor8	take a look at pdfportfolio.c	13:48.31
sebras	I did.	13:48.35
tor8	okay.	13:48.39
sebras	as of now, when you press F, you'll get a fake page with the names of the embedded files	13:49.03
	but as of yet no GoToE links.	13:49.12
	we don't support those, but they would be handy in this case.	13:49.33
tor8	sebras: right. we don't support those...	13:49.41
sebras	in order to support them fully we'd need to keep a hierarchy of pdf's in memory. :-P	13:50.17
tor8	I think if we allow mupdf to be called with a file:// url argument and crack that to go to the correct file, embedded doc, and page	13:50.24
	as an external URL	13:50.34
sebras	yes, I have already implemented an ugly hack to support mupdf-x11 file:///absolute/path/to/document.pdf#page=42	13:51.03
tor8	sebras: the other alternative is to have a pdf portfolio container fz_document which just puts all the portfolio entries in sequence	13:51.10
sebras	along with an initial fake page used as index?	13:52.15
	if we want.	13:52.19
tor8	yeah.	13:52.49
sebras	tor8: so pdfportfolio leaks a lot where the fake document is created. in case of error.	13:53.07
tor8	BUT ... pdf portfolios are such a bad idea I don't fancy the idea of wasting effort supporting them.	13:53.15
	sebras: yeah, but it's a command line tool so we don't really care.	13:53.28
	but if you want to clean it up, go ahead! :)	13:53.51
sebras	the resource dict is being fully populated though so there are no issues there.	13:54.23
tor8	if we get an error, we'll end up calling abort() with uncaught exception	13:54.27
sebras	yeah, I see that.	13:54.39
	tor8: I have another thing pending where I think the gif decoder needs to enable old_tiff LZW-decoding.	14:05.59
	tor8: but I don't have access to my entire gif suite now so I'm not sure if there will be regressions.	14:06.21
tor8	sebras: https://helpx.adobe.com/acrobat/kb/link-html-pdf-page-acrobat.html for specific pdf page numbers and named destinations	14:11.26
	but I can't find an equivalent syntax to go to an embedded portfolio file	14:11.36
sebras	yeah I found that link in git log and read it.	14:12.16
tor8	https://forums.adobe.com/thread/2111745 looks like the answer is "no can do"	14:14.07
	but that doesn't prevent us inventing our own syntax, like #subfile=N,page=M	14:14.51
	or #portfolio=N to be clearer	14:15.21
sebras	is N the name or a number though?	14:23.00
	tor8: the commits on sebras/master clustered fine, final LGTM?	14:23.40
serge111	Thanks for the guidance Robin_Watts, I'll fiddle with MuPDF and see what I can do.	14:34.40
karl_	any mupdf developer here ?	22:02.41
	Forward 1 day (to 2016/12/24)>>>

Log of #ghostscript at irc.freenode.net.