Log of #ghostscript at irc.freenode.net.

20160922
sebras tor8: so http://bugs.ghostscript.com/show_bug.cgi?id=697018 basically requires gatherresources() to be rewritten to not be recursive.11:07.24 
tor8 sebras: cycles in the object structure?11:07.56 
  sebras: look at pdf_resources_use_blending for how we use pdf_mark_obj to prevent infinite loops11:09.28 
sebras tor8: kind of. cycle when repairing pdf.11:09.34 
  tor8: so it looks like there's a cycle in the object structure.11:09.40 
tor8 oh, gatherresources triggers a reparation?11:09.49 
sebras tor8: yes, and after that a recursion.11:10.06 
tor8 we repair recursively?11:10.27 
sebras tor8: but I guess you could conceivably have a syntactically correct pdf that is recursive too.11:10.39 
tor8 if the problem is that a pdf has cyclical reference chains: page has a resource has an xobject has the same resource has an xobject has a stack smash11:11.40 
  then pdf_mark_obj is the way we deal with those11:11.58 
  basically flagging an object as "we've already been here, no need to try again"11:12.13 
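A minimal sketch of the "we've already been here" idea tor8 describes, using a toy node type rather than MuPDF's real objects. The `node` struct and the `mark`, `unmark` and `count_visits` helpers are hypothetical names made up for illustration; only the general approach (flag an object on entry, bail out if the flag is already set, clear it on the way out) comes from the discussion.

```c
#include <stddef.h>

/* Hypothetical toy node; this is not MuPDF's pdf_obj. */
typedef struct node {
    struct node *kids[4];
    int nkids;
    int marked; /* set while this node is on the current traversal path */
} node;

/* Returns 1 if the node was already marked; otherwise marks it and returns 0. */
static int mark(node *n)
{
    if (n->marked)
        return 1;
    n->marked = 1;
    return 0;
}

static void unmark(node *n)
{
    n->marked = 0;
}

/* Counts node visits below n, cutting a path off as soon as it loops back. */
int count_visits(node *n)
{
    int i, total = 1;
    if (n == NULL)
        return 0;
    if (mark(n))
        return 0; /* already on this path: stop instead of recursing forever */
    for (i = 0; i < n->nkids; i++)
        total += count_visits(n->kids[i]);
    unmark(n);
    return total;
}
```

With two nodes that point at each other, the recursion stops the second time it reaches the first node, so the stack depth is bounded by the number of distinct objects rather than by the length of the cycle.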
sebras tor8: yes, something like that would work.11:12.36 
  tor8: I see now that the repair is triggered before reaching gatherresources()11:12.58 
  tor8: btw, when are you travelling to the meeting?11:27.09 
tor8 28th11:27.30 
Robin_Watts repairs should never recurse.11:30.58 
sebras Robin_Watts: it's not the repair that recurses.11:33.49 
  Robin_Watts: it's the repaired PDF that has recursive dicts.11:34.08 
  tor8: Robin_Watts: commit on sebras/master the reproducer.pdf now works fine in valgrind and it clusters.11:34.40 
Robin_Watts sebras: Right11:43.00 
tor8 sebras: LGTM11:44.08 
sebras tor8: pushed.11:44.49 
tor8 sebras: Robin_Watts: the stuff on tor/master could do with a review11:56.13 
sebras tor8: the patch for PDFObject_size() is no longer necessary, right?11:57.12 
tor8 right. I just zapped it, should be gone if you pull again11:57.39 
sebras tor8: didn't the new annot interfaces find their way into the first commit?11:58.53 
  the latter changes in annot.h after the addition of PDF_*11:59.14 
  I'm guessing you planned on keeping those separate?11:59.25 
tor8 sebras: which commit are you talking about?11:59.41 
sebras tor8: 4a740677267dd3ede0325d070fbe1140cf2cd3a411:59.55 
  tor8: include/mupdf/pdf/annot.h12:00.06 
tor8 oh, right. yeah, that chunk should move to another commit.12:00.40 
sebras I think it _used_ to be separate, so... a rebase gone awry?12:02.54 
tor8 sebras: yes. updated commits online.12:04.08 
  argh. I fail at basic git rebase ...12:04.47 
sebras tor8: PDF_NAME_M?12:06.52 
Robin_Watts tor8, sebras: So... this problem with just in time repair.12:08.01 
  During the repair, we read objects out of the file, and then discard them.12:08.47 
  We never write new objects over old objects.12:08.55 
  Hence all existing references should be fine.12:09.05 
  The sole exceptions to that are:12:09.12 
  1) When we replace the length of streams with the measured length.12:09.31 
  2) When we replace xref->trailer12:10.09 
  So my plan is to keep the old versions of those objects around in an orphan list.12:11.16 
  and bin that list when the document is destroyed.12:11.49 
sebras Robin_Watts: how come an updated stream length causes the fontdict ref to be invalid?12:12.54 
tor8 Robin_Watts: sounds reasonable12:14.21 
  the stream length is often an indirect reference12:14.31 
  and setting that to the actual length (an integer) would drop the reference held to the numbered object with the length12:15.16 
Robin_Watts sebras: The loop is running through a dictionary that purports to be the list of fonts, reading each value in turn.12:15.19 
tor8 still not sure why that would be a problem12:15.23 
Robin_Watts But fontdict actually returns an image dictionary (actually a stream).12:15.44 
  OR rather, a reference to one.12:16.06 
  An indirect object that points to one, I should say.12:16.24 
  That object was listed in the dictionary as /Length <indirection>12:16.57 
  When we ask "is it a dictionary?" it triggers a repair of the file.12:17.13 
  As part of the repair of the file, we replace /Length <indirection> with /Length <the actual measured length>12:17.46 
  hence the <indirection> object is dropped.12:17.58 
  and fontdict is left pointing at an invalid pointer.12:18.09 
tor8 Robin_Watts: right. gotcha.12:18.24 
  an orphan list sounds good.12:18.35 
  or a 'deferred drop' list of sorts12:18.59 
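A rough sketch of the orphan (or "deferred drop") list idea, assuming a toy refcounted object and a document with a single replaceable slot. All names here (`obj`, `orphan`, `doc`, `replace_length`, `destroy_doc`) are hypothetical and are not MuPDF's actual structures; the point is only that a replaced object is parked on a list instead of being dropped, so pointers borrowed before the repair stay valid until the document goes away.

```c
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a parsed PDF object. */
typedef struct obj {
    int refs;
    int value;
} obj;

typedef struct orphan {
    obj *o;
    struct orphan *next;
} orphan;

typedef struct doc {
    obj *length;     /* slot that a repair may overwrite */
    orphan *orphans; /* replaced objects kept alive until the document dies */
} doc;

obj *new_obj(int value)
{
    obj *o = malloc(sizeof *o);
    if (o) {
        o->refs = 1;
        o->value = value;
    }
    return o;
}

void drop_obj(obj *o)
{
    if (o && --o->refs == 0)
        free(o);
}

/* Replaces doc->length, parking the old object on the orphan list
 * instead of dropping it. Returns 0 on success, or -1 if the list
 * node cannot be allocated (in which case nothing is changed). */
int replace_length(doc *d, obj *replacement)
{
    orphan *node = malloc(sizeof *node);
    if (!node)
        return -1;
    node->o = d->length;
    node->next = d->orphans;
    d->orphans = node;
    d->length = replacement;
    return 0;
}

/* Releases the orphans and the current slot; returns the orphan count. */
int destroy_doc(doc *d)
{
    int n = 0;
    while (d->orphans) {
        orphan *next = d->orphans->next;
        drop_obj(d->orphans->o);
        free(d->orphans);
        d->orphans = next;
        n++;
    }
    drop_obj(d->length);
    d->length = NULL;
    return n;
}
```

A caller that fetched `d->length` before the replacement can keep reading through that pointer afterwards, which is the property the fontdict loop needs.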
sebras Robin_Watts: btw, do you want me to close the ULL-bug and point to the commit you checked in?12:19.24 
  Robin_Watts: I already have the comment written. :)12:19.32 
  Robin_Watts: ok, now I understand the puzzle.12:20.40 
Robin_Watts sebras: Please, go for it.12:23.21 
  It dawned on me this morning that a better solution to the ULL bug would have been:12:23.38 
  #ifndef INT64_C #define INT64_C(x) x ## ull #endif12:24.21 
  (or something like that)12:24.27 
sebras Robin_Watts: UINT64_C(), but yes.12:24.50 
tor8 Robin_Watts: that's exactly what's in the linux stdint.h file12:24.57 
  but given that all compilers we care about support ULL suffix, why bother?12:25.10 
Robin_Watts cos that way if we ever meet a platform where ull doesn't work, and INT64_C isn't defined in the headers, then they can define it themselves.12:25.14 
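The fallback being discussed could be sketched as below, using the unsigned `UINT64_C` spelling that sebras points out. This assumes `<stdint.h>` is available for `uint64_t`; `high_bit` is a hypothetical example function, not something from the MuPDF source.

```c
#include <stdint.h>

/* Fallback used only when the platform headers do not provide the macro.
 * A platform where the ULL suffix does not work could define UINT64_C
 * itself before this point. */
#ifndef UINT64_C
#define UINT64_C(x) x ## ULL
#endif

/* Example use: a constant that does not fit in 32 bits. */
uint64_t high_bit(void)
{
    return UINT64_C(0x8000000000000000);
}
```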
tor8 Robin_Watts: or we can tell them to upgrade to the 17 year old standard C :)12:25.53 
Robin_Watts tor8: It's precisely so that in situations when we that "given" isn't, it's easier to cope with.12:25.56 
Robin_Watts invents something that's not quite english.12:26.14 
sebras throws new SemanticParseException("eh?!");12:27.51 
Robin_Watts EWHACHOTALKINBOUTWILLIS12:29.41 
  tor8, sebras: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=1e03c06456d997435019fb3526fa2d4be7dbc6ec12:53.17 
tor8 Robin_Watts: LGTM13:19.43 
sebras tor8: 78f22b3 does not compile.13:38.58 
tor8 sigh. maybe I should just squash it all into one big fat commit :/13:40.22 
sebras tor8: it is worth the effort though!13:40.41 
  tor8: this is what it is like for me to do a doc commit... ;)13:41.06 
  tor8: but I have no fallback.13:41.25 
  tor8: annot->changed is missing.13:41.30 
  tor8: so it looks like you want the Remove separate ... annotation lists before Add annotation property accessors.13:42.02 
tor8 yes, and that doesn't rebase cleanly without hundreds of lines of diffs13:42.16 
  conflicts*13:42.23 
sebras tor8: when that happens I sometimes just create a backup branch and keep cherry-picking the commits in the order I want. surprisingly it happens that _those_ conflicts are smaller than the conflicts I get during a rebase. I've always been puzzled as to why.13:44.30 
tor8 sebras: I squash, rebase to the top, then pick diff hunks individually to recreate the commits13:44.52 
sebras tor8: but you'd have at least one hunk which belongs to two commits...?13:45.42 
  tor8: right?13:45.45 
  tor8: or rather, that you'd _want_ to partly be in one commit and partly be in another.13:46.09 
tor8 git gui ... I can pick lines out of a hunk13:47.00 
sebras tor8: ok, you never mentioned that above though. :)13:47.32 
  tor8: if pdf_array_push_drop() fails in pdf_set_annot_color() we leak the array object named obj..?13:56.23 
  tor8: and if n != {1,3,4} we still set /C [] is that what we want?13:56.56 
tor8 sebras: can you wait, I'm in the middle of rebase hell13:57.21 
  i.e. the code is all in bits on the floor. if I sneeze, I won't ever be able to find everything again :)13:57.44 
sebras tor8: sure, np. I'm not requiring immediate attention. :)13:57.50 
tor8 sebras: okay, hopefully the commits on tor/master will now build14:10.24 
  sebras: yes, the error handling is incomplete for several of these functions14:11.01 
  n = 0 is also a valid color14:11.24 
  in which case /C [] is what we want14:11.29 
  an empty /C array means 'no color, make it transparent'14:11.39 
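For reference, the component counts mentioned here line up with how the PDF specification interprets an annotation's /C array. A small sketch, where `annot_color_kind` is a hypothetical helper and not part of MuPDF:

```c
#include <stddef.h>

/* Maps the number of entries in an annotation /C array to the color
 * space it implies, or returns NULL for a count that is not valid. */
const char *annot_color_kind(int n)
{
    switch (n) {
    case 0: return "transparent"; /* empty array: no color */
    case 1: return "DeviceGray";
    case 3: return "DeviceRGB";
    case 4: return "DeviceCMYK";
    default: return NULL;
    }
}
```

A setter could use a check like this to reject n = 2 or n > 4 up front instead of writing an empty array for them.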
sebras tor8: oh!14:13.17 
  tor8: do we want to fix the error handling now?14:14.17 
  tor8: you want to postpone that until later?14:14.22 
  tor8: new branch compiles and builds up until 92132c014:22.38 
  tor8: and also LGTM so far.14:22.47 
  tor8: looking at the js, annot lists and editing functions now.14:23.06 
Robin_Watts Do we assume that %*s works in printfs?14:36.19 
sebras Robin_Watts: no.14:36.45 
  Robin_Watts: not in fz_vsnprintf() as far as I know.14:36.57 
Robin_Watts ok, will work around it.14:37.17 
sebras tor8: ok, I have reviewed up until tor/master not.14:51.05 
  now.14:51.09 
  tor8: LGTM, but I have a hard time remembering how the js-stuff works.14:51.24 
  tor8: but at the same time we know that these functions are missing proper error handling.14:51.57 
Robin_Watts tor8, sebras: Is that fixable? Or can we at least put some FIXMEs in there?14:53.51 
sebras Robin_Watts: of course it is fixable. I'm not sure whether tor8 wanted to fix it now, or wants to play around with the design of the interface a bit more before committing it to master.14:54.49 
Robin_Watts tor8, sebras: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=3f185ffbda20e823b07f21021c69f29a0f60722414:56.13 
sebras Robin_Watts: isn't gsdll_stderr() supposed to print to stderr?15:01.41 
  Robin_Watts: it prints to stdout now.15:01.51 
Robin_Watts sebras: cut and paste whoopsie, sorry.15:02.54 
sebras Robin_Watts: yup, got that. :)15:03.16 
Robin_Watts Fixed: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=1c46ee8d0463c17cb2efd36e1151e6a8ad6c7c0f15:04.02 
sebras Robin_Watts: so these strings are never null-terminated? is that due to ps?15:06.21 
kens No, it's due to Ghostscript15:07.10 
sebras ok.15:08.20 
  Robin_Watts: LGTM15:08.24 
ago Robin_Watts: hello, do you know which stable version will include the fix you made today for the bug I opened?15:10.20 
Robin_Watts ago. 1.1015:13.29 
  Which will be released... soon.15:13.47 
  sebras: Thanks.15:14.04 
ago Robin_Watts: do you think that you will fix the other from fuzzing before release 1.10 ?15:17.07 
Robin_Watts ago: I don't know. I hope so.15:17.16 
sebras ago: are you depending on them being fixed for some mupdf-based tool you are building or is this more of a personal interest? :)15:18.04 
ago sebras: personal, usually I blog about issues spotted by asan/fuzzer15:27.09 
Robin_Watts ago: Cool. We appreciate you looking.15:27.31 
  We've had the likes of google feed us fuzzing issues before, and we've battled through them. Any new issues are always interesting.15:28.06 
ago Robin_Watts: I'll re-fuzz on 1.10 when it will be out :)15:28.12 
sebras ago: ok, I was just being curious. :)15:35.37 
  ago: if you find asan-related (or american fuzzy lop issues) using mutool draw -s t $FILE then those are definitely interesting.15:36.37 
  Robin_Watts: is the fix for 695015 ready for review?15:41.29 
Robin_Watts Is that not committed?15:41.43 
sebras Robin_Watts: I think one of the other fuzzing issues end up in the same case.15:41.47 
Robin_Watts http://ghostscript.com/regression/cgi-bin/clustermonitor.cgi?report=1e03c06456d997435019fb3526fa2d4be7dbc6ec&project=mupdf15:42.10 
sebras Robin_Watts: oh... /me needs to fetch.15:42.12 
Robin_Watts :)15:42.18 
ago sebras: I will check15:44.18 
sebras ago: 697019 appears to be fixed by the same change as 697015.15:46.09 
Robin_Watts sebras: How do I find the length of a PDFObject that I know to be an array?15:46.36 
  obj.size();15:46.39 
  ?15:46.41 
sebras Robin_Watts: something like that. sec.15:46.59 
Robin_Watts yeah, I see it in the JNI.15:47.09 
sebras Robin_Watts: yeah, looks right.15:47.17 
ago sebras: do you think that is another bug but fixed by the same commit or just the same bug?15:49.38 
sebras ago: I believe it is the same bug.15:49.58 
ago sebras: ok15:50.05 
sebras ago: I can see valgrind backtraces indicating that the cause is the same when I revert Robin's fix. when re-applying the fix, valgrind no longer indicates problems.15:53.40 
ago sebras: great. thanks15:55.11 
  sebras: another OT question. Who is in charge to fix the fuzzing bugs on gs? I see bugs opened since november16:00.19 
sebras ago: I don't think there's anyone specifically appointed to fix fuzzing bugs.16:01.15 
Robin_Watts ago: Fuzzing bugs get fixed when we have time between fighting other fires :)16:01.53 
  sebras: How do I get a name from a PDFObject?16:02.17 
sebras Robin_Watts: there should be a .get() for you.16:03.35 
  Robin_Watts: note that there are two variants, one for arrays one for dicts.16:03.48 
  Robin_Watts: so PDFObject.get(String) is the one you want.16:04.10 
Robin_Watts .toByteString looks like it will return a result from either a name or a stream buffer.16:04.25 
  no. I have a PDFObject obj.16:04.37 
  I think it's a name.16:04.45 
  I want to compare it to another known name.16:04.51 
  so obj.toByteString and then a comparison.16:05.07 
sebras Robin_Watts: yes, that might work.16:05.18 
  Robin_Watts: maybe we need to expose pdf_obj_cmp()?16:05.33 
Robin_Watts but that's bad, cos I can't tell between a name or an str_buf without explicitly checking.16:05.35 
  pdf_obj_cmp would be good, but it'd be an expensive way to compare names.16:06.06 
  I'd like a .toByteString to be split into 2 functions I think.16:07.19 
  One that works on names, one that works on strings?16:07.35 
sebras Robin_Watts: so basically what you want is pdf_name_eq()?16:08.12 
Robin_Watts sebras: That's something I'd like, yes.16:08.27 
sebras Robin_Watts: because you don't really care about the bytestrings themselves.16:08.36 
Robin_Watts But having that would not counter what looks like a hole in our API.16:09.05 
sebras Robin_Watts: ehm... but that one depends on pdf_objcmp() in the end!16:09.09 
  Robin_Watts: there are probably quite a number of more holes.16:10.14 
Robin_Watts sebras: Having a java thing that called pdf_name_eq would not require making a new java object to do the comparison.16:10.17 
sebras Robin_Watts: we don't expose the _entire_ C level API.16:10.24 
  Robin_Watts: right.16:10.39 
  Robin_Watts: I'll add it to my TODO, is that ok?16:10.58 
Robin_Watts Exposing pdf_objcmp and using that would require us to make a new name PDFObject from a string, and then call pdf_objcmp.16:11.02 
  Sure.16:11.05 
  I can use toByteString for now.16:11.15 
  We don't expose the entire C level API. *Yet* :)16:11.29 
sebras Robin_Watts: what are you playing with?16:11.30 
Robin_Watts the gproof demo.16:11.41 
  I need to reach into the pdf document and pull out if there is an embedded profile.16:12.03 
sebras Robin_Watts: is that another interface function we are missing?16:14.08 
  Robin_Watts: maybe even at the C-level?16:14.11 
Robin_Watts sebras: It's not clear to me that this should be done at the C level.16:14.30 
sebras Robin_Watts: or this is the type of PDF object trickery better not fixed by a convenience function?16:14.30 
  Robin_Watts: jstest_main.c is used in the cluster, right?16:15.35 
Robin_Watts sebras: Certainly, I think it's worth doing this at the java level, because a) it is possible it will need tweaking, and b) it shows that the API is complete or not.16:15.36 
  jstest_main.c is used in the cluster.16:15.45 
  Fixing that is lower priority, cos it's not useful for anyone other than us, and us only with specific files.16:16.19 
sebras Robin_Watts: right. I think this one is trivial.16:17.22 
  Robin_Watts: in 928939c346e12ecce75d8de573b13c411f1bebd5 you actually did most of the changes in jstest_main.c16:17.40 
  Robin_Watts: now we have a filename variable we copy _to_ but never read.16:17.49 
Robin_Watts jstest_main.c is a hacked version of the old app, I think, that just renders to bitmaps.16:18.27 
sebras Robin_Watts: yeah, I realize that.16:18.41 
Robin_Watts In the C, we have it so that everything copes nicely if we do:16:21.53 
  pdf_to_name(ctx, pdf_dict_get(ctx, pdf_dict_get(ctx, dict, "Foo"), "Bar")) etc16:22.42 
  even if there is no "Foo".16:22.49 
  Does the java have that property, or does it throw exceptions?16:22.59 
sebras Robin_Watts: if we fz_throw() it will throw java exceptions16:25.46 
  also it might throw because the JVM JNI interface throws an exception for whatever reason.16:26.05 
Robin_Watts sebras: No, the C deliberately doesn't throw.16:26.07 
sebras (e.g. accessing elements out of range or out of memory)16:26.18 
Robin_Watts pdf_dict_get will return NULL if it's not there.16:26.25 
  If you ask pdf_dict_get to lookup in NULL it returns NULL.16:26.37 
  If you ask pdf_to_name on NULL, it returns "" etc.16:26.50 
  all so we don't need to check at every turn.16:27.07 
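The NULL-tolerant chaining Robin describes can be illustrated with a toy dictionary type. `tobj`, `entry`, `dict_get` and `to_name` below are hypothetical stand-ins, not MuPDF's `pdf_dict_get` or `pdf_to_name`; the sketch only mirrors the behaviour stated above (lookup in NULL gives NULL, name of NULL gives "").

```c
#include <stddef.h>
#include <string.h>

typedef struct tobj tobj;

typedef struct {
    const char *key;
    tobj *val;
} entry;

/* Hypothetical toy object: either a name or a dictionary. */
struct tobj {
    const char *name;     /* non-NULL only for name objects */
    const entry *entries; /* non-NULL only for dictionaries */
    int len;
};

/* Looks up key; returns NULL if dict is NULL, is not a dictionary,
 * or has no such key. Never fails in any other way. */
tobj *dict_get(tobj *dict, const char *key)
{
    int i;
    if (dict == NULL || dict->entries == NULL)
        return NULL;
    for (i = 0; i < dict->len; i++)
        if (strcmp(dict->entries[i].key, key) == 0)
            return dict->entries[i].val;
    return NULL;
}

/* Returns the name, or "" for NULL and for non-name objects. */
const char *to_name(tobj *o)
{
    if (o == NULL || o->name == NULL)
        return "";
    return o->name;
}
```

Because every step accepts NULL, `to_name(dict_get(dict_get(d, "Foo"), "Bar"))` is safe whether or not "Foo" exists, and the caller only has to check the final result.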
sebras Robin_Watts: yeah, we mirror that behaviour from pdf_dict_gets()16:27.08 
Robin_Watts sebras: Fab.16:27.15 
sebras Robin_Watts: maybe with the exception of toByteString()16:27.35 
  Robin_Watts: since it uses pdf_is_name() to determine if it is a name (which it is not if it is NULL)16:28.00 
  Robin_Watts: but then again pdf_to_str_buf() also returns "" for NULL.16:28.36 
Robin_Watts sounds good then,16:28.49 
  I can try this and see.16:28.53 
sebras Robin_Watts: do you want to review three trivial patches over at sebras/master?16:29.25 
Robin_Watts just let me finish this or I'll never remember where I got to.16:29.44 
  sebras: All 3 lgtm.16:34.40 
sebras Robin_Watts: they cluster fine, so then I'll push?16:34.58 
Robin_Watts go for it.16:35.04 
  toByteString also seems to return null terminated arrays.16:41.38 
  That seems wrong.16:41.41 
sebras Robin_Watts: want to review another one?17:04.27 
Robin_Watts desperately.17:04.36 
sebras it's a oneliner.17:04.38 
  Robin_Watts: you're programming Java I take it.17:05.02 
Robin_Watts yeah. I might just have a fiddle in PDFObject.java for you to look at.17:05.24 
  sebras: lgtm.17:07.53 
  Is it 2am again there?17:08.05 
sebras Robin_Watts: yes.17:08.13 
Robin_Watts You should sleep.17:08.19 
sebras ago: still here?17:14.09 
  ago: it was rather easy to resolve the remaining fuzzing cases now that I knew where they were.17:14.31 
  ago: and how to reproduce. I think I have resolved all of them tonight, if you care to you can retest on git HEAD now, or you can wait for the 1.10 release.17:15.30 
Robin_Watts So things like outline->title are in PDFDocEncoding, I think.17:26.37 
  (or they might have a BOM to identify their encoding)17:26.50 
  Currently the JNI code calls NewStringUTF on them, which is Bad(TM), I think.17:27.18 
sebras Robin_Watts: oh. yeah, we should convert them to Modified UTF-8.17:28.15 
  Robin_Watts: there's something special with JNI and UTF-8 concerning NULL I think.17:28.35 
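The special case sebras is thinking of is that JNI's Modified UTF-8 encodes U+0000 as the two bytes C0 80 (so an encoded string never contains an embedded zero byte) and encodes each UTF-16 surrogate separately as three bytes instead of using a four-byte sequence. A sketch of such an encoder, working from UTF-16 code units; `to_modified_utf8` is a hypothetical helper, not an existing MuPDF or JNI function:

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes n UTF-16 code units as Modified UTF-8 into out (capacity cap,
 * including room for a terminating zero byte). Returns the number of
 * bytes written, not counting the terminator, or (size_t)-1 if out is
 * too small. */
size_t to_modified_utf8(const uint16_t *in, size_t n, unsigned char *out, size_t cap)
{
    size_t i, o = 0;
    for (i = 0; i < n; i++) {
        uint16_t c = in[i];
        size_t need = (c != 0 && c < 0x80) ? 1 : (c < 0x800) ? 2 : 3;
        if (o + need + 1 > cap)
            return (size_t)-1;
        if (need == 1) {
            out[o++] = (unsigned char)c;
        } else if (need == 2) {
            /* This branch also covers U+0000, which becomes C0 80. */
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            /* Surrogates land here too: each is encoded on its own. */
            out[o++] = (unsigned char)(0xE0 | (c >> 12));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    if (o + 1 > cap)
        return (size_t)-1;
    out[o] = 0;
    return o;
}
```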
sebras sleeps.17:28.38 
Robin_Watts Night sebras:17:28.43 
  The question is, should we do that conversion at the C level, or at the JNI level?17:29.15 
  For now I propose to ignore it.17:29.20 
tor8 sebras: Robin_Watts: oh, I thought there would be a toName() for name objects, and toByteString for string objects.22:54.47 
Robin_Watts tor8: We are hamstrung by java into having a toString that converts objects of all types to be a printable version.22:55.20 
  I therefore think that our stuff should be 'asBoolean' 'asInteger' 'asString' etc.22:55.45 
  cos 'asName' and 'asString' can both return strings that way.22:56.02 
  I have a commit to achieve that on my master.22:56.11 
tor8 sebras: dict.get("Name").toNumber() will crash with an NPE if the dict doesn't have /Name22:56.53 
Robin_Watts Also, I have changes there to allow us to do: blah.get("Foo").get("Bar").get("Baz").asInteger() etc without needing lots of null checks.22:56.56 
  tor8: Not with my changes :)22:57.05 
tor8 Robin_Watts: ah, fab. you're returning a 'null' PDFObject instead?22:57.41 
  blah.has("Foo") to check if a key exists then perhaps?22:57.53 
Robin_Watts tor8: I am.22:58.13 
tor8 cool.22:58.35 
Robin_Watts blah.has("Foo") would be !blah.get("Foo").isNull()22:58.56 
tor8 toString (or asString, though I'm not too fond of that name, it may be best in the long run) should apply PDFDocEncoding etc to convert to a proper unicode java string22:59.13 
  toByteString gets the raw bytes22:59.19 
  toName would get the PDF name as a java string22:59.26 
  was my intention, and matches what the JS code does22:59.34 
Robin_Watts tor8: OK.22:59.46 
  Currently we assume that the strings are UTF format.23:00.02 
  So that needs to be fixed in the JNI. I was going to look at that tomorrow.23:00.17 
tor8 well, the JS stuff is more automagic and uses the toString/valueOf callbacks to convert objects to primitive types23:00.22 
Robin_Watts JNI can either take Unicode, or UTF.23:00.45 
  If we make byte [] from it, then the String constructor in javascript will take more encodings, but none are quite right23:01.11 
tor8 yeah, a raw PDF string should just be a byte array, IMO23:01.26 
Robin_Watts hence I think we need a helper function to go from string -> UTF.23:01.44 
tor8 like the encryption ID strings, etc23:01.47 
Robin_Watts tor8: Having the ability to get it as a byte string makes sense, yes.23:02.08 
tor8 pdf_to_utf8 ?23:02.10 
Robin_Watts pdf_to_javas_wierd_utf8_variant.23:02.30 
tor8 that looks for the BOM or PDFDocEncoding23:02.32 
Robin_Watts yes.23:02.35 
tor8 we have pdf_to_utf8, which should suffice?23:02.46 
  or are there more quirks to java's weird utf-8 variant?23:02.55 
Robin_Watts tor8: javas utf is different.23:03.03 
tor8 we also have pdf_to_ucs223:03.05 
Robin_Watts pdf_to_ucs2 may be the easiest way.23:03.18 
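A sketch of the "look for the BOM or PDFDocEncoding" step, producing 16-bit code units of the kind a UCS-2 style conversion would hand to Java. `toy_decode_text_string` is a hypothetical name and this is not how MuPDF's `pdf_to_ucs2` is implemented. Note the loud assumption in the fallback branch: only printable ASCII is mapped and everything else becomes U+FFFD, because the full PDFDocEncoding table is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes a PDF text string into 16-bit code units. Returns the number
 * of units written, or (size_t)-1 if out is too small. */
size_t toy_decode_text_string(const unsigned char *s, size_t n, uint16_t *out, size_t cap)
{
    size_t i, o = 0;
    if (n >= 2 && s[0] == 0xFE && s[1] == 0xFF) {
        /* UTF-16BE with BOM: combine byte pairs, ignoring a trailing odd byte. */
        for (i = 2; i + 1 < n; i += 2) {
            if (o >= cap)
                return (size_t)-1;
            out[o++] = (uint16_t)((s[i] << 8) | s[i + 1]);
        }
        return o;
    }
    for (i = 0; i < n; i++) {
        if (o >= cap)
            return (size_t)-1;
        /* Placeholder mapping (assumption): a real version needs the
         * complete PDFDocEncoding table here. */
        out[o++] = (uint16_t)((s[i] >= 0x20 && s[i] < 0x7F) ? s[i] : 0xFFFD);
    }
    return o;
}
```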
tor8 hm, the JS stuff needs some extending23:03.42 
  to get a string as an array of numbers to deal with raw strings not unicode encoded23:03.59 
  etc23:03.59 
  I only have obj.toIndirect as an explicit call23:04.14 
  all the others are implicit using JS's type coercion23:04.22 
  Robin_Watts: oh, and I made PDFDocument be a proper subclass of Document in JS. will need to update the JNI to use a factory constructor.23:07.54 
  to do the same there23:07.58 
  so you won't need mDoc.toPDFDocument(), you could just do if (mDoc instanceof PDFDocument) mPDF = (PDFDocument)mDoc;23:08.30 
  Robin_Watts: and I'll follow suit in JS with 'asBoolean' etc23:09.17 
  asFloat should probably be asReal to match PDF spec terminology23:09.49 
  in JS I'm just going to call it asNumber23:10.03 
  Robin_Watts: we could make a static final PDFObject nullObject instead of creating new 'null' wrappers all the time23:13.26 
  anyway, I'm off. ttytm.23:14.08 
Robin_Watts We could. For getDictionary and getArray, if we're called on the null object, I return the same object, so we don't make one in that case.23:14.10 
  yeah. night.23:14.13 