Ghostscript IRC logs

20160922
sebras	tor8: so http://bugs.ghostscript.com/show_bug.cgi?id=697018 basically requires gatherresources() to be rewritten to not be recursive.	11:07.24
tor8	sebras: cycles in the object structure?	11:07.56
	sebras: look at pdf_resources_use_blending for how we use pdf_mark_obj to prevent infinite loops	11:09.28
sebras	tor8: kind of. cycle when repairing pdf.	11:09.34
	tor8: so it looks like there's a cycle in the object structure.	11:09.40
tor8	oh, gatherresources triggers a reparation?	11:09.49
sebras	tor8: yes, and after that a recursion.	11:10.06
tor8	we repair recursively?	11:10.27
sebras	tor8: but I guess you could conceivably have a syntactically correct pdf that is recursive too.	11:10.39
tor8	if the problem is that a pdf has cyclical reference chains: page has a resource has an xobject has the same resource has an xobject has a stack smash	11:11.40
	then pdf_mark_obj is the way we deal with those	11:11.58
	basically flagging an object as "we've already been here, no need to try again"	11:12.13
sebras	tor8: yes, something like that would work.	11:12.36
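The "we've already been here" flag tor8 describes can be illustrated with a minimal sketch. The `node` type and `count_visits` function below are hypothetical toy names, not MuPDF's `pdf_obj` or `pdf_mark_obj`; the sketch only assumes that each object can carry a mark bit that is set while the walk is inside it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy node, not MuPDF's pdf_obj. */
typedef struct node {
	int marked;             /* set while the node is on the current walk */
	size_t nkids;
	struct node **kids;
} node;

/* Counts node visits starting at n. A node that is already on the
 * current path contributes 0, so a cycle cannot recurse forever. */
static int count_visits(node *n)
{
	size_t i;
	int total = 1;

	if (n == NULL || n->marked)
		return 0;       /* already been here, no need to try again */

	n->marked = 1;
	for (i = 0; i < n->nkids; i++)
		total += count_visits(n->kids[i]);
	n->marked = 0;          /* unmark on the way out so later walks start clean */

	return total;
}
```

With two nodes that reference each other, the walk stops as soon as it reaches the node it started from instead of smashing the stack.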
	tor8: I see now that the repair is triggered before reaching gatherresources()	11:12.58
	tor8: btw, when are you travelling to the meeting?	11:27.09
tor8	28th	11:27.30
Robin_Watts	repairs should never recurse.	11:30.58
sebras	Robin_Watts: it's not the repair that recurses.	11:33.49
	Robin_Watts: it's the repaired PDF that has recursive dicts.	11:34.08
	tor8: Robin_Watts: commit on sebras/master the reproducer.pdf now works fine in valgrind and it clusters.	11:34.40
Robin_Watts	sebras: Right	11:43.00
tor8	sebras: LGTM	11:44.08
sebras	tor8: pushed.	11:44.49
tor8	sebras: Robin_Watts: the stuff on tor/master could do with a review	11:56.13
sebras	tor8: the patch for PDFObject_size() is no longer necessary, right?	11:57.12
tor8	right. I just zapped it, should be gone if you pull again	11:57.39
sebras	tor8: didn't the new annot interfaces find their way into the first commit?	11:58.53
	the latter changes in annot.h after the addition of PDF_*	11:59.14
	I'm guessing you planned on keeping those separate?	11:59.25
tor8	sebras: which commit are you talking about?	11:59.41
sebras	tor8: 4a740677267dd3ede0325d070fbe1140cf2cd3a4	11:59.55
	tor8: include/mupdf/pdf/annot.h	12:00.06
tor8	oh, right. yeah, that chunk should move to another commit.	12:00.40
sebras	I think it _used_ to be separate, so... a rebase gone awry?	12:02.54
tor8	sebras: yes. updated commits online.	12:04.08
	argh. I fail at basic git rebase ...	12:04.47
sebras	tor8: PDF_NAME_M?	12:06.52
Robin_Watts	tor8, sebras: So... this problem with just in time repair.	12:08.01
	During the repair, we read objects out of the file, and then discard them.	12:08.47
	We never write new objects over old objects.	12:08.55
	Hence all existing references should be fine.	12:09.05
	The sole exceptions to that are:	12:09.12
	1) When we replace the length of streams with the measured length.	12:09.31
	2) When we replace xref->trailer	12:10.09
	So my plan is to keep the old versions of those objects around in an orphan list.	12:11.16
	and bin that list when the document is destroyed.	12:11.49
sebras	Robin_Watts: how come an updated stream length causes the fontdict ref to be invalid?	12:12.54
tor8	Robin_Watts: sounds reasonable	12:14.21
	the stream length is often an indirect reference	12:14.31
	and setting that to the actual length (an integer) would drop the reference held to the numbered object with the length	12:15.16
Robin_Watts	sebras: The loop is running through a dictionary that purports to be the list of fonts, reading each value in turn.	12:15.19
tor8	still not sure why that would be a problem	12:15.23
Robin_Watts	But fontdict actually returns an image dictionary (actually a stream).	12:15.44
	OR rather, a reference to one.	12:16.06
	An indirect object that points to one, I should say.	12:16.24
	That object was listed in the dictionary as /Length <indirection>	12:16.57
	When we ask "is it a dictionary?" it triggers a repair of the file.	12:17.13
	As part of the repair of the file, we replace /Length <indirection> with /Length <the actual measured length>	12:17.46
	hence the <indirection> object is dropped.	12:17.58
	and fontdict is left pointing at an invalid pointer.	12:18.09
tor8	Robin_Watts: right. gotcha.	12:18.24
	an orphan list sounds good.	12:18.35
	or a 'deferred drop' list of sorts	12:18.59
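A rough sketch of the orphan (or deferred drop) list idea follows. All types and function names here are hypothetical toy stand-ins, not MuPDF's; the sketch assumes a single replaceable slot standing in for a stream's /Length entry, and it parks the replaced object on a list owned by the document instead of dropping it immediately.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy types, not MuPDF's. */
typedef struct obj {
	int refs;
	int value;
} obj;

typedef struct orphan {
	obj *o;
	struct orphan *next;
} orphan;

typedef struct doc {
	obj *length;        /* stands in for a stream's /Length entry */
	orphan *orphans;    /* replaced objects, kept alive until the doc dies */
} doc;

static obj *new_obj(int value)
{
	obj *o = malloc(sizeof *o);
	if (o) { o->refs = 1; o->value = value; }
	return o;
}

static void drop_obj(obj *o)
{
	if (o && --o->refs == 0)
		free(o);
}

/* Replace doc->length, parking the old object instead of dropping it.
 * Returns 0 on success, -1 on allocation failure (doc left unchanged). */
static int replace_length(doc *d, obj *fresh)
{
	orphan *node = malloc(sizeof *node);
	if (!node)
		return -1;
	node->o = d->length;      /* the list takes over the doc's reference */
	node->next = d->orphans;
	d->orphans = node;
	d->length = fresh;
	return 0;
}

/* Release everything, orphans included, when the document is destroyed. */
static void destroy_doc(doc *d)
{
	while (d->orphans) {
		orphan *next = d->orphans->next;
		drop_obj(d->orphans->o);
		free(d->orphans);
		d->orphans = next;
	}
	drop_obj(d->length);
	d->length = NULL;
}
```

A caller that still holds a borrowed pointer to the old object, like the fontdict loop described above, keeps seeing valid memory until `destroy_doc` runs.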
sebras	Robin_Watts: btw, do you want me to close the ULL-bug and point to the commit you checked in?	12:19.24
	Robin_Watts: I already have the comment written. :)	12:19.32
	Robin_Watts: ok, now I understand the puzzle.	12:20.40
Robin_Watts	sebras: Please, go for it.	12:23.21
	It dawned on me this morning that a better solution to the ULL bug would have been:	12:23.38
	#ifndef INT64_C #define INT64_C(x) x ## ull #endif	12:24.21
	(or something like that)	12:24.27
sebras	Robin_Watts: UINT64_C(), but yes.	12:24.50
tor8	Robin_Watts: that's exactly what's in the linux stdint.h file	12:24.57
	but given that all compilers we care about support ULL suffix, why bother?	12:25.10
Robin_Watts	cos that way if we ever meet a platform where ull doesn't work, and INT64_C isn't defined in the headers, then they can define it themselves.	12:25.14
tor8	Robin_Watts: or we can tell them to upgrade to the 17 year old standard C :)	12:25.53
Robin_Watts	tor8: It's precisely so that in situations when we that "given" isn't, it's easier to cope with.	12:25.56
*Robin_Watts*	invents something that's not quite english.	12:26.14
*sebras*	throws new SemanticParseException("eh?!");	12:27.51
Robin_Watts	EWHACHOTALKINBOUTWILLIS	12:29.41
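The fallback macro discussed above, with sebras' correction to `UINT64_C`, might look like the sketch below. The function `two_to_the_32` is a hypothetical example of a constant that needs 64 bits; a platform lacking `<stdint.h>` altogether would also need its own `uint64_t` typedef, which this sketch does not cover.

```c
#include <assert.h>
#include <stdint.h>

/* Fallback for toolchains whose headers do not define the macro.
 * A port to an unusual compiler can predefine UINT64_C itself. */
#ifndef UINT64_C
#define UINT64_C(x) x ## ULL
#endif

/* Hypothetical example: a constant that does not fit in 32 bits. */
static uint64_t two_to_the_32(void)
{
	return UINT64_C(0x100000000);
}
```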
	tor8, sebras: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=1e03c06456d997435019fb3526fa2d4be7dbc6ec	12:53.17
tor8	Robin_Watts: LGTM	13:19.43
sebras	tor8: 78f22b3 does not compile.	13:38.58
tor8	sigh. maybe I should just squash it all into one big fat commit :/	13:40.22
sebras	tor8: it is worth the effort though!	13:40.41
	tor8: this is what it is like for me to do a doc commit... ;)	13:41.06
	tor8: but I have no fallback.	13:41.25
	tor8: annot->changed is missing.	13:41.30
	tor8: so it looks like you want the Remove separate ... annotation lists before Add annotation property accessors.	13:42.02
tor8	yes, and that doesn't rebase cleanly without hundreds of lines of diffs	13:42.16
	conflicts*	13:42.23
sebras	tor8: when that happens I sometimes just create a backup branch and keep cherry-picking the commits in the order I want. surprisingly it happens that _those_ conflicts are smaller than the conflicts I get during a rebase. I've always been puzzled as to why.	13:44.30
tor8	sebras: I squash, rebase to the top, then pick diff hunks individually to recreate the commits	13:44.52
sebras	tor8: but you'd have at least one hunk which belongs to two commits...?	13:45.42
	tor8: right?	13:45.45
	tor8: or rather, that you'd _want_ to partly be in one commit and partly be in another.	13:46.09
tor8	git gui ... I can pick lines out of a hunk	13:47.00
sebras	tor8: ok, you never mentioned that above though. :)	13:47.32
	tor8: if pdf_array_push_drop() fails in pdf_set_annot_color() we leak the array object named obj..?	13:56.23
	tor8: and if n != {1,3,4} we still set /C [], is that what we want?	13:56.56
tor8	sebras: can you wait, I'm in the middle of rebase hell	13:57.21
	i.e. the code is all in bits on the floor. if I sneeze, I won't ever be able to find everything again :)	13:57.44
sebras	tor8: sure, np. I'm not requiring immediate attention. :)	13:57.50
tor8	sebras: okay, hopefully the commits on tor/master will now build	14:10.24
	sebras: yes, the error handling is incomplete for several of these functions	14:11.01
	n = 0 is also a valid color	14:11.24
	in which case /C [] is what we want	14:11.29
	an empty /C array means 'no color, make it transparent'	14:11.39
sebras	tor8: oh!	14:13.17
	tor8: do we want to fix the error handling now?	14:14.17
	tor8: you want to postpone that until later?	14:14.22
	tor8: new branch compiles and builds up until 92132c0	14:22.38
	tor8: and also LGTM so far.	14:22.47
	tor8: looking at the js, annot lists and editing functions now.	14:23.06
Robin_Watts	Do we assume that %*s works in printfs?	14:36.19
sebras	Robin_Watts: no.	14:36.45
	Robin_Watts: not in fz_vsnprintf() as far as I know.	14:36.57
Robin_Watts	ok, will work around it.	14:37.17
sebras	tor8: ok, I have reviewed up until tor/master not.	14:51.05
	now.	14:51.09
	tor8: LGTM, but I have a hard time remembering how the js-stuff works.	14:51.24
	tor8: but at the same time we know that these functions are missing proper error handling.	14:51.57
Robin_Watts	tor8, sebras: Is that fixable? Or can we at least put some FIXMEs in there?	14:53.51
sebras	Robin_Watts: of course it is fixable. I'm not sure whether tor8 wanted to fix it now, or wants to play around with the design of the interface a bit more before committing it to master.	14:54.49
Robin_Watts	tor8, sebras: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=3f185ffbda20e823b07f21021c69f29a0f607224	14:56.13
sebras	Robin_Watts: isn't gsdll_stderr() supposed to print to stderr?	15:01.41
	Robin_Watts: it prints to stdout now.	15:01.51
Robin_Watts	sebras: cut and paste whoopsie, sorry.	15:02.54
sebras	Robin_Watts: yup, got that. :)	15:03.16
Robin_Watts	Fixed: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=1c46ee8d0463c17cb2efd36e1151e6a8ad6c7c0f	15:04.02
sebras	Robin_Watts: so these strings are never null-terminated? is that due to ps?	15:06.21
kens	No, it's due to Ghostscript	15:07.10
sebras	ok.	15:08.20
	Robin_Watts: LGTM	15:08.24
ago	Robin_Watts: hello, do you know which stable version will get the fix you made today for the bug I opened?	15:10.20
Robin_Watts	ago. 1.10	15:13.29
	Which will be released... soon.	15:13.47
	sebras: Thanks.	15:14.04
ago	Robin_Watts: do you think that you will fix the other from fuzzing before release 1.10 ?	15:17.07
Robin_Watts	ago: I don't know. I hope so.	15:17.16
sebras	ago: are you depending on them being fixed for some mupdf-based tool you are building or is this more of a personal interest? :)	15:18.04
ago	sebras: personal, usually I blog about issues spotted by asan/fuzzer	15:27.09
Robin_Watts	ago: Cool. We appreciate you looking.	15:27.31
	We've had the likes of google feed us fuzzing issues before, and we've battled through them. Any new issues are always interesting.	15:28.06
ago	Robin_Watts: I'll re-fuzz on 1.10 when it will be out :)	15:28.12
sebras	ago: ok, I was just being curious. :)	15:35.37
	ago: if you find asan-related (or american fuzzy lop issues) using mutool draw -s t $FILE then those are definitely interesting.	15:36.37
	Robin_Watts: is the fix for 695015 ready for review?	15:41.29
Robin_Watts	Is that not committed?	15:41.43
sebras	Robin_Watts: I think one of the other fuzzing issues ends up in the same case.	15:41.47
Robin_Watts	http://ghostscript.com/regression/cgi-bin/clustermonitor.cgi?report=1e03c06456d997435019fb3526fa2d4be7dbc6ec&project=mupdf	15:42.10
sebras	Robin_Watts: oh... /me needs to fetch.	15:42.12
Robin_Watts	:)	15:42.18
ago	sebras: I will check	15:44.18
sebras	ago: 697019 appears to be fixed by the same change as 697015.	15:46.09
Robin_Watts	sebras: How do I find the length of a PDFObject that I know to be an array?	15:46.36
	obj.size();	15:46.39
	?	15:46.41
sebras	Robin_Watts: something like that. sec.	15:46.59
Robin_Watts	yeah, I see it in the JNI.	15:47.09
sebras	Robin_Watts: yeah, looks right.	15:47.17
ago	sebras: do you think that is another bug but fixed by the same commit or just the same bug?	15:49.38
sebras	ago: I believe it is the same bug.	15:49.58
ago	sebras: ok	15:50.05
sebras	ago: I can see valgrind backtraces indicating that the cause is the same when I revert Robin's fix. when re-applying the fix, valgrind no longer indicates problems.	15:53.40
ago	sebras: great. thanks	15:55.11
	sebras: another OT question. Who is in charge to fix the fuzzing bugs on gs? I see bugs opened since november	16:00.19
sebras	ago: I don't think there's anyone specifically appointed to fix fuzzing bugs.	16:01.15
Robin_Watts	ago: Fuzzing bugs get fixed when we have time between fighting other fires :)	16:01.53
	sebras: How do I get a name from a PDFObject?	16:02.17
sebras	Robin_Watts: there should be a .get() for you.	16:03.35
	Robin_Watts: note that there are two variants, one for arrays one for dicts.	16:03.48
	Robin_Watts: so PDFObject.get(String) is the one you want.	16:04.10
Robin_Watts	.toByteString looks like it will return a result from either a name or a stream buffer.	16:04.25
	no. I have a PDFObject obj.	16:04.37
	I think it's a name.	16:04.45
	I want to compare it to another known name.	16:04.51
	so obj.toByteString and then a comparison.	16:05.07
sebras	Robin_Watts: yes, that might work.	16:05.18
	Robin_Watts: maybe we need to expose pdf_obj_cmp()?	16:05.33
Robin_Watts	but that's bad, cos I can't tell between a name or an str_buf without explicitly checking.	16:05.35
	pdf_obj_cmp would be good, but it'd be an expensive way to compare names.	16:06.06
	I'd like a .toByteString to be split into 2 functions I think.	16:07.19
	One that works on names, one that works on strings?	16:07.35
sebras	Robin_Watts: so basically what you want is pdf_name_eq()?	16:08.12
Robin_Watts	sebras: That's something I'd like, yes.	16:08.27
sebras	Robin_Watts: because you don't really care about the bytestrings themselves.	16:08.36
Robin_Watts	But having that would not counter what looks like a hole in our API.	16:09.05
sebras	Robin_Watts: ehm... but that one depends on pdf_objcmp() in the end!	16:09.09
	Robin_Watts: there are probably quite a number of more holes.	16:10.14
Robin_Watts	sebras: Having a java thing that called pdf_name_eq would not require making a new java object to do the comparison.	16:10.17
sebras	Robin_Watts: we don't expose the _entire_ C level API.	16:10.24
	Robin_Watts: right.	16:10.39
	Robin_Watts: I'll add it to my TODO, is that ok?	16:10.58
Robin_Watts	Exposing pdf_objcmp and using that would require us to make a new name PDFObject from a string, and then call pdf_objcmp.	16:11.02
	Sure.	16:11.05
	I can use toByteString for now.	16:11.15
	We don't expose the entire C level API. Yet :)	16:11.29
sebras	Robin_Watts: what are you playing with?	16:11.30
Robin_Watts	the gproof demo.	16:11.41
	I need to reach into the pdf document and pull out if there is an embedded profile.	16:12.03
sebras	Robin_Watts: is that another interface function we are missing?	16:14.08
	Robin_Watts: maybe even at the C-level?	16:14.11
Robin_Watts	sebras: It's not clear to me that this should be done at the C level.	16:14.30
sebras	Robin_Watts: or this is the type of PDF object trickery better not fixed by a convenience function?	16:14.30
	Robin_Watts: jstest_main.c is used in the cluster, right?	16:15.35
Robin_Watts	sebras: Certainly, I think it's worth doing this at the java level, because a) it is possible it will need tweaking, and b) it shows that the API is complete or not.	16:15.36
	jstest_main.c is used in the cluster.	16:15.45
	Fixing that is lower priority, cos it's not useful for anyone other than us, and us only with specific files.	16:16.19
sebras	Robin_Watts: right. I think this one is trivial.	16:17.22
	Robin_Watts: in 928939c346e12ecce75d8de573b13c411f1bebd5 you actually did most of the changes in jstest_main.c	16:17.40
	Robin_Watts: now we have a filename variable we copy _to_ but never read.	16:17.49
Robin_Watts	jstest_main.c is a hacked version of the old app, I think, that just renders to bitmaps.	16:18.27
sebras	Robin_Watts: yeah, I realize that.	16:18.41
Robin_Watts	In the C, we have it so that everything copes nicely if we do:	16:21.53
	pdf_to_name(ctx, pdf_dict_get(ctx, pdf_dict_get(ctx, dict, "Foo"), "Bar")) etc	16:22.42
	even if there is no "Foo".	16:22.49
	Does the java have that property, or does it throw exceptions?	16:22.59
sebras	Robin_Watts: if we fz_throw() it will throw java exceptions	16:25.46
	also it might throw because the JVM JNI interface throws an exception for whatever reason.	16:26.05
Robin_Watts	sebras: No, the C deliberately doesn't throw.	16:26.07
sebras	(e.g. accessing elements out of range or out of memory)	16:26.18
Robin_Watts	pdf_dict_get will return NULL if it's not there.	16:26.25
	If you ask pdf_dict_get to lookup in NULL it returns NULL.	16:26.37
	If you ask pdf_to_name on NULL, it returns "" etc.	16:26.50
	all so we don't need to check at every turn.	16:27.07
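The NULL-tolerant chaining Robin describes can be sketched with a toy object model. The `toy_obj`, `toy_dict_get` and `toy_to_name` names are hypothetical and only mimic the behaviour described above (a lookup in NULL returns NULL, a name read from NULL returns ""); they are not MuPDF's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy object model, not MuPDF's pdf_obj. */
typedef struct toy_obj toy_obj;

typedef struct toy_entry {
	const char *key;
	const toy_obj *val;
} toy_entry;

struct toy_obj {
	const char *name;          /* non-NULL for a name object */
	const toy_entry *entries;  /* non-NULL for a dictionary */
	size_t len;                /* number of dictionary entries */
};

/* Returns the value stored under key, or NULL if dict is NULL,
 * is not a dictionary, or has no such key. Never fails loudly. */
static const toy_obj *toy_dict_get(const toy_obj *dict, const char *key)
{
	size_t i;
	if (dict == NULL || dict->entries == NULL)
		return NULL;
	for (i = 0; i < dict->len; i++)
		if (strcmp(dict->entries[i].key, key) == 0)
			return dict->entries[i].val;
	return NULL;
}

/* Returns the name, or "" for NULL and for non-name objects. */
static const char *toy_to_name(const toy_obj *obj)
{
	if (obj == NULL || obj->name == NULL)
		return "";
	return obj->name;
}
```

Because every accessor absorbs NULL, a nested lookup such as `toy_to_name(toy_dict_get(toy_dict_get(root, "Foo"), "Bar"))` needs no intermediate checks even when "Foo" is missing.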
sebras	Robin_Watts: yeah, we mirror that behaviour from pdf_dict_gets()	16:27.08
Robin_Watts	sebras: Fab.	16:27.15
sebras	Robin_Watts: maybe with the exception of toByteString()	16:27.35
	Robin_Watts: since it uses pdf_is_name() to determine if it is a name (which it is not if it is NULL)	16:28.00
	Robin_Watts: but then again pdf_to_str_buf() also returns "" for NULL.	16:28.36
Robin_Watts	sounds good then,	16:28.49
	I can try this and see.	16:28.53
sebras	Robin_Watts: do you want to review three trivial patches over at sebras/master?	16:29.25
Robin_Watts	just let me finish this or I'll never remember where I got to.	16:29.44
	sebras: All 3 lgtm.	16:34.40
sebras	Robin_Watts: they cluster fine, so then I'll push?	16:34.58
Robin_Watts	go for it.	16:35.04
	toByteString also seems to return null terminated arrays.	16:41.38
	That seems wrong.	16:41.41
sebras	Robin_Watts: want to review another one?	17:04.27
Robin_Watts	desperately.	17:04.36
sebras	it's a oneliner.	17:04.38
	Robin_Watts: you're programming Java, I take it.	17:05.02
Robin_Watts	yeah. I might just have a fiddle in PDFObject.java for you to look at.	17:05.24
	sebras: lgtm.	17:07.53
	Is it 2am again there?	17:08.05
sebras	Robin_Watts: yes.	17:08.13
Robin_Watts	You should sleep.	17:08.19
sebras	ago: still here?	17:14.09
	ago: it was rather easy to resolve the remaining fuzzing cases now that I knew where they were.	17:14.31
	ago: and how to reproduce. I think I have resolved all of them tonight, if you care to you can retest on git HEAD now, or you can wait for the 1.10 release.	17:15.30
Robin_Watts	So things like outline->title are in PDFDocEncoding, I think.	17:26.37
	(or they might have a BOM to identify their encoding)	17:26.50
	Currently the JNI code calls NewStringUTF on them, which is Bad(TM), I think.	17:27.18
sebras	Robin_Watts: oh. yeah, we should convert them to Modified UTF-8.	17:28.15
	Robin_Watts: there's something special with JNI and UTF-8 concerning NULL I think.	17:28.35
*sebras*	sleeps.	17:28.38
Robin_Watts	Night sebras:	17:28.43
	The question is, should we do that conversion at the C level, or at the JNI level?	17:29.15
	For now I propose to ignore it.	17:29.20
tor8	sebras: Robin_Watts: oh, I thought there would be a toName() for name objects, and toByteString for string objects.	22:54.47
Robin_Watts	tor8: We are hamstrung by java into having a toString that converts objects of all types to be a printable version.	22:55.20
	I therefore think that our stuff should be 'asBoolean' 'asInteger' 'asString' etc.	22:55.45
	cos 'asName' and 'asString' can both return strings that way.	22:56.02
	I have a commit to achieve that on my master.	22:56.11
tor8	sebras: dict.get("Name").toNumber() will crash with an NPE if the dict doesn't have /Name	22:56.53
Robin_Watts	Also, I have changes there to allow us to do: blah.get("Foo").get("Bar").get("Baz").asInteger() etc without needing lots of null checks.	22:56.56
	tor8: Not with my changes :)	22:57.05
tor8	Robin_Watts: ah, fab. you're returning a 'null' PDFObject instead?	22:57.41
	blah.has("Foo") to check if a key exists then perhaps?	22:57.53
Robin_Watts	tor8: I am.	22:58.13
tor8	cool.	22:58.35
Robin_Watts	blah.has("Foo") would be !blah.get("Foo").isNull()	22:58.56
tor8	toString (or asString, though I'm not too fond of that name, it may be best in the long run) should apply PDFDocEncoding etc to convert to a proper unicode java string	22:59.13
	toByteString gets the raw bytes	22:59.19
	toName would get the PDF name as a java string	22:59.26
	was my intention, and matches what the JS code does	22:59.34
Robin_Watts	tor8: OK.	22:59.46
	Currently we assume that the strings are UTF format.	23:00.02
	So that needs to be fixed in the JNI. I was going to look at that tomorrow.	23:00.17
tor8	well, the JS stuff is more automagic and uses the toString/valueOf callbacks to convert objects to primitive types	23:00.22
Robin_Watts	JNI can either take Unicode, or UTF.	23:00.45
	If we make byte [] from it, then the String constructor in javascript will take more encodings, but none are quite right	23:01.11
tor8	yeah, a raw PDF string should just be a byte array, IMO	23:01.26
Robin_Watts	hence I think we need a helper function to go from string -> UTF.	23:01.44
tor8	like the encryption ID strings, etc	23:01.47
Robin_Watts	tor8: Having the ability to get it as a byte string makes sense, yes.	23:02.08
tor8	pdf_to_utf8 ?	23:02.10
Robin_Watts	pdf_to_javas_wierd_utf8_variant.	23:02.30
tor8	that looks for the BOM or PDFDocEncoding	23:02.32
Robin_Watts	yes.	23:02.35
tor8	we have pdf_to_utf8, which should suffice?	23:02.46
	or are there more quirks to java's weird utf-8 variant?	23:02.55
Robin_Watts	tor8: javas utf is different.	23:03.03
tor8	we also have pdf_to_ucs2	23:03.05
Robin_Watts	pdf_to_ucs2 may be the easiest way.	23:03.18
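For reference, the way Java's Modified UTF-8 differs from standard UTF-8 can be sketched in a few lines: U+0000 is written as the two bytes C0 80 (so the encoded string never contains an embedded zero byte), and each UTF-16 surrogate half is encoded separately as three bytes. The function name `to_modified_utf8` below is hypothetical, and the sketch assumes the input is already a sequence of UTF-16 code units, for example as produced by a UCS-2 conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode n UTF-16 code units as Modified UTF-8 into out (capacity cap).
 * Returns the number of bytes written, excluding the terminating 0,
 * or (size_t)-1 if the buffer is too small. */
static size_t to_modified_utf8(const uint16_t *in, size_t n,
                               unsigned char *out, size_t cap)
{
	size_t i, o = 0;

	for (i = 0; i < n; i++) {
		uint16_t c = in[i];
		size_t need = (c != 0 && c < 0x80) ? 1 : (c < 0x800) ? 2 : 3;

		if (o + need + 1 > cap)   /* keep room for the terminator */
			return (size_t)-1;

		if (need == 1) {
			out[o++] = (unsigned char)c;
		} else if (need == 2) {
			/* also covers U+0000, which becomes C0 80 */
			out[o++] = (unsigned char)(0xC0 | (c >> 6));
			out[o++] = (unsigned char)(0x80 | (c & 0x3F));
		} else {
			/* surrogate halves are encoded one by one, 3 bytes each */
			out[o++] = (unsigned char)(0xE0 | (c >> 12));
			out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
			out[o++] = (unsigned char)(0x80 | (c & 0x3F));
		}
	}

	if (o + 1 > cap)
		return (size_t)-1;
	out[o] = 0;
	return o;
}
```

The resulting buffer is the form JNI's `NewStringUTF` expects, which is why feeding it PDFDocEncoding or standard UTF-8 bytes directly can go wrong.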
tor8	hm, the JS stuff needs some extending	23:03.42
	to get a string as an array of numbers to deal with raw strings not unicode encoded	23:03.59
	etc	23:03.59
	I only have obj.toIndirect as an explicit call	23:04.14
	all the others are implicit using JS's type coercion	23:04.22
	Robin_Watts: oh, and I made PDFDocument be a proper subclass of Document in JS. will need to update the JNI to use a factory constructor.	23:07.54
	to do the same there	23:07.58
	so you won't need mDoc.toPDFDocument(), you could just do if (mDoc instanceof PDFDocument) mPDF = (PDFDocument)mDoc;	23:08.30
	Robin_Watts: and I'll follow suit in JS with 'asBoolean' etc	23:09.17
	asFloat should probably be asReal to match PDF spec terminology	23:09.49
	in JS I'm just going to call it asNumber	23:10.03
	Robin_Watts: we could make a static final PDFObject nullObject instead of creating new 'null' wrappers all the time	23:13.26
	anyway, I'm off. ttytm.	23:14.08
Robin_Watts	We could. For getDictionary and getArray, if we're called on the null object, I return the same object, so we don't make one in that case.	23:14.10
	yeah. night.	23:14.13

Log of #ghostscript at irc.freenode.net.