| <<<Back 1 day (to 2016/09/21) | 20160922 |
sebras | tor8: so http://bugs.ghostscript.com/show_bug.cgi?id=697018 basically requires gatherresources() to be rewritten to not be recursive. | 11:07.24 |
tor8 | sebras: cycles in the object structure? | 11:07.56 |
| sebras: look at pdf_resources_use_blending for how we use pdf_mark_obj to prevent infinite loops | 11:09.28 |
sebras | tor8: kind of. cycle when repairing pdf. | 11:09.34 |
| tor8: so it looks like there's a cycle in the object structure. | 11:09.40 |
tor8 | oh, gatherresources triggers a reparation? | 11:09.49 |
sebras | tor8: yes, and after that a recursion. | 11:10.06 |
tor8 | we repair recursively? | 11:10.27 |
sebras | tor8: but I guess you could conceivably have a syntactically correct pdf that is recursive too. | 11:10.39 |
tor8 | if the problem is that a pdf has cyclical reference chains: page has a resource has an xobject has the same resource has an xobject has a stack smash | 11:11.40 |
| then pdf_mark_obj is the way we deal with those | 11:11.58 |
| basically flagging an object as "we've already been here, no need to try again" | 11:12.13 |
sebras | tor8: yes, something like that would work. | 11:12.36 |
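[Editor's sketch] The marking idea tor8 describes can be illustrated on a toy object graph. This is not MuPDF code: `node`, `mark`, `count_visits` and `demo_cycle` are hypothetical names invented here, and the real mechanism is `pdf_mark_obj` as used in `pdf_resources_use_blending`. The sketch only shows why a per-object flag stops a cyclic reference chain from recursing forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object; not a MuPDF type. */
typedef struct node {
	int marked;            /* nonzero while the node is on the current walk */
	struct node *kids[2];  /* outgoing references, which may form a cycle */
} node;

/* Marks n; returns 1 if it was already marked, 0 otherwise. */
static int mark(node *n)
{
	if (n->marked)
		return 1;
	n->marked = 1;
	return 0;
}

/* Counts node visits from n, refusing to re-enter a node that is
 * already on the current path. */
int count_visits(node *n)
{
	int total = 1;
	size_t i;

	if (n == NULL || mark(n))
		return 0; /* nothing here, or "we've already been here" */
	for (i = 0; i < 2; i++)
		total += count_visits(n->kids[i]);
	n->marked = 0; /* unmark on the way out so later walks start clean */
	return total;
}

/* Builds a two-node cycle (a -> b -> a) and walks it. */
int demo_cycle(void)
{
	node a = { 0, { NULL, NULL } };
	node b = { 0, { NULL, NULL } };
	int visited;

	a.kids[0] = &b;
	b.kids[0] = &a;
	visited = count_visits(&a);
	assert(a.marked == 0 && b.marked == 0);
	return visited;
}
```

Without the `mark` check, `count_visits(&a)` on this graph would recurse until the stack overflows; with it, each node on the cycle is entered once.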
| tor8: I see now that the repair is triggered before reaching gatherresources() | 11:12.58 |
| tor8: btw, when are you travelling to the meeting? | 11:27.09 |
tor8 | 28th | 11:27.30 |
Robin_Watts | repairs should never recurse. | 11:30.58 |
sebras | Robin_Watts: it's not the repair that recurses. | 11:33.49 |
| Robin_Watts: it's the repaired PDF that has recursive dicts. | 11:34.08 |
| tor8: Robin_Watts: with the commit on sebras/master, reproducer.pdf now works fine in valgrind, and it clusters. | 11:34.40 |
Robin_Watts | sebras: Right | 11:43.00 |
tor8 | sebras: LGTM | 11:44.08 |
sebras | tor8: pushed. | 11:44.49 |
tor8 | sebras: Robin_Watts: the stuff on tor/master could do with a review | 11:56.13 |
sebras | tor8: the patch for PDFObject_size() is no longer necessary, right? | 11:57.12 |
tor8 | right. I just zapped it, should be gone if you pull again | 11:57.39 |
sebras | tor8: didn't the new annot interfaces find their way into the first commit? | 11:58.53 |
| the latter changes in annot.h after the addition of PDF_* | 11:59.14 |
| I'm guessing you planned on keeping those separate? | 11:59.25 |
tor8 | sebras: which commit are you talking about? | 11:59.41 |
sebras | tor8: 4a740677267dd3ede0325d070fbe1140cf2cd3a4 | 11:59.55 |
| tor8: include/mupdf/pdf/annot.h | 12:00.06 |
tor8 | oh, right. yeah, that chunk should move to another commit. | 12:00.40 |
sebras | I think it _used_ to be separate, so... a rebase gone awry? | 12:02.54 |
tor8 | sebras: yes. updated commits online. | 12:04.08 |
| argh. I fail at basic git rebase ... | 12:04.47 |
sebras | tor8: PDF_NAME_M? | 12:06.52 |
Robin_Watts | tor8, sebras: So... this problem with just in time repair. | 12:08.01 |
| During the repair, we read objects out of the file, and then discard them. | 12:08.47 |
| We never write new objects over old objects. | 12:08.55 |
| Hence all existing references should be fine. | 12:09.05 |
| The sole exceptions to that are: | 12:09.12 |
| 1) When we replace the length of streams with the measured length. | 12:09.31 |
| 2) When we replace xref->trailer | 12:10.09 |
| So my plan is to keep the old versions of those objects around in an orphan list. | 12:11.16 |
| and bin that list when the document is destroyed. | 12:11.49 |
sebras | Robin_Watts: how come an updated stream length causes the fontdict ref to be invalid? | 12:12.54 |
tor8 | Robin_Watts: sounds reasonable | 12:14.21 |
| the stream length is often an indirect reference | 12:14.31 |
| and setting that to the actual length (an integer) would drop the reference held to the numbered object with the length | 12:15.16 |
Robin_Watts | sebras: The loop is running through a dictionary that purports to be the list of fonts, reading each value in turn. | 12:15.19 |
tor8 | still not sure why that would be a problem | 12:15.23 |
Robin_Watts | But fontdict actually returns an image dictionary (actually a stream). | 12:15.44 |
| OR rather, a reference to one. | 12:16.06 |
| An indirect object that points to one, I should say. | 12:16.24 |
| That object was listed in the dictionary as /Length <indirection> | 12:16.57 |
| When we ask "is it a dictionary?" it triggers a repair of the file. | 12:17.13 |
| As part of the repair of the file, we replace /Length <indirection> with /Length <the actual measured length> | 12:17.46 |
| hence the <indirection> object is dropped. | 12:17.58 |
| and fontdict is left pointing at an invalid pointer. | 12:18.09 |
tor8 | Robin_Watts: right. gotcha. | 12:18.24 |
| an orphan list sounds good. | 12:18.35 |
| or a 'deferred drop' list of sorts | 12:18.59 |
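[Editor's sketch] The orphan list (or "deferred drop" list) discussed above can be sketched roughly as follows. Everything here is a hypothetical toy: `obj`, `doc`, `replace_length` and `destroy_doc` are invented for illustration and are not MuPDF types or functions. The point is only that the replaced object is parked on a list instead of being dropped, so a borrowed pointer held by a caller stays valid until the document is destroyed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object and document. */
typedef struct obj { int refs; int value; } obj;
typedef struct orphan { obj *o; struct orphan *next; } orphan;
typedef struct doc { obj *length; orphan *orphans; } doc;

static obj *new_obj(int value)
{
	obj *o = malloc(sizeof *o);
	if (o) { o->refs = 1; o->value = value; }
	return o;
}

static void drop_obj(obj *o)
{
	if (o && --o->refs == 0)
		free(o);
}

/* Swaps in a new length object, parking the old one on the orphan list
 * instead of dropping it. Returns 0 on success, -1 on allocation failure. */
int replace_length(doc *d, obj *replacement)
{
	orphan *entry;

	assert(d != NULL && replacement != NULL);
	entry = malloc(sizeof *entry);
	if (!entry)
		return -1;
	entry->o = d->length;     /* the list takes over the doc's reference */
	entry->next = d->orphans;
	d->orphans = entry;
	d->length = replacement;
	return 0;
}

/* Bins the orphan list along with everything else the document owns. */
void destroy_doc(doc *d)
{
	while (d->orphans)
	{
		orphan *next = d->orphans->next;
		drop_obj(d->orphans->o);
		free(d->orphans);
		d->orphans = next;
	}
	drop_obj(d->length);
	d->length = NULL;
}

/* A borrowed pointer stays readable after the replacement. */
int demo_orphans(void)
{
	doc d = { NULL, NULL };
	obj *borrowed, *measured;
	int seen = -1;

	d.length = new_obj(10);
	measured = new_obj(42);
	if (d.length && measured)
	{
		borrowed = d.length; /* no extra reference taken */
		if (replace_length(&d, measured) == 0)
		{
			measured = NULL; /* now owned by the document */
			seen = borrowed->value;
		}
	}
	drop_obj(measured);
	destroy_doc(&d);
	return seen;
}
```

The cost of this design is that replaced objects live as long as the document, which seems acceptable given that only stream lengths and the trailer are ever replaced during a repair.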
sebras | Robin_Watts: btw, do you want me to close the ULL-bug and point to the commit you checked in? | 12:19.24 |
| Robin_Watts: I already have the comment written. :) | 12:19.32 |
| Robin_Watts: ok, now I understand the puzzle. | 12:20.40 |
Robin_Watts | sebras: Please, go for it. | 12:23.21 |
| It dawned on me this morning that a better solution to the ULL bug would have been: | 12:23.38 |
| #ifndef INT64_C #define INT64_C(x) x ## ull #endif | 12:24.21 |
| (or something like that) | 12:24.27 |
sebras | Robin_Watts: UINT64_C(), but yes. | 12:24.50 |
tor8 | Robin_Watts: that's exactly what's in the linux stdint.h file | 12:24.57 |
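[Editor's sketch] A guarded fallback along the lines Robin_Watts and sebras describe might look like this. The helper `demo_uint64_c` is hypothetical and exists only to show the macro in use; whether MuPDF should adopt such a guard is exactly what is being debated here.

```c
#include <assert.h>
#include <stdint.h>

/* Fallback for platforms whose headers lack the macro. A port whose
 * compiler rejects the ULL suffix could predefine UINT64_C itself. */
#ifndef UINT64_C
#define UINT64_C(x) x ## ULL
#endif

/* Hypothetical check: returns 1 if the constant kept its top (64th) bit. */
int demo_uint64_c(void)
{
	unsigned long long v = UINT64_C(0x8000000000000000);

	assert(v != 0);
	return (int)(v >> 63);
}
```

On a platform that already provides `UINT64_C` in `<stdint.h>`, the `#ifndef` leaves the system definition untouched.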
| but given that all compilers we care about support ULL suffix, why bother? | 12:25.10 |
Robin_Watts | cos that way if we ever meet a platform where ull doesn't work, and INT64_C isn't defined in the headers, then they can define it themselves. | 12:25.14 |
tor8 | Robin_Watts: or we can tell them to upgrade to the 17 year old standard C :) | 12:25.53 |
Robin_Watts | tor8: It's precisely so that in situations when we that "given" isn't, it's easier to cope with. | 12:25.56 |
Robin_Watts | invents something that's not quite english. | 12:26.14 |
sebras | throws new SemanticParseException("eh?!"); | 12:27.51 |
Robin_Watts | EWHACHOTALKINBOUTWILLIS | 12:29.41 |
| tor8, sebras: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=1e03c06456d997435019fb3526fa2d4be7dbc6ec | 12:53.17 |
tor8 | Robin_Watts: LGTM | 13:19.43 |
sebras | tor8: 78f22b3 does not compile. | 13:38.58 |
tor8 | sigh. maybe I should just squash it all into one big fat commit :/ | 13:40.22 |
sebras | tor8: it is worth the effort though! | 13:40.41 |
| tor8: this is what it is like for me to do a doc commit... ;) | 13:41.06 |
| tor8: but I have no fallback. | 13:41.25 |
| tor8: annot->changed is missing. | 13:41.30 |
| tor8: so it looks like you want the Remove separate ... annotation lists before Add annotation property accessors. | 13:42.02 |
tor8 | yes, and that doesn't rebase cleanly without hundreds of lines of diffs | 13:42.16 |
| conflicts* | 13:42.23 |
sebras | tor8: when that happens I sometimes just create a backup branch and keep cherry-picking the commits in the order I want. surprisingly it happens that _those_ conflicts are smaller than the conflicts I get during a rebase. I've always been puzzled as to why. | 13:44.30 |
tor8 | sebras: I squash, rebase to the top, then pick diff hunks individually to recreate the commits | 13:44.52 |
sebras | tor8: but you'd have at least one hunk which belongs to two commits...? | 13:45.42 |
| tor8: right? | 13:45.45 |
| tor8: or rather, that you'd _want_ to partly be in one commit and partly be in another. | 13:46.09 |
tor8 | git gui ... I can pick lines out of a hunk | 13:47.00 |
sebras | tor8: ok, you never mentioned that above though. :) | 13:47.32 |
| tor8: if pdf_array_push_drop() fails in pdf_set_annot_color() we leak the array object named obj..? | 13:56.23 |
| tor8: and if n != {1,3,4} we still set /C []; is that what we want? | 13:56.56 |
tor8 | sebras: can you wait, I'm in the middle of rebase hell | 13:57.21 |
| i.e. the code is all in bits on the floor. if I sneeze, I won't ever be able to find everything again :) | 13:57.44 |
sebras | tor8: sure, np. I'm not requiring immediate attention. :) | 13:57.50 |
tor8 | sebras: okay, hopefully the commits on tor/master will now build | 14:10.24 |
| sebras: yes, the error handling is incomplete for several of these functions | 14:11.01 |
| n = 0 is also a valid color | 14:11.24 |
| in which case /C [] is what we want | 14:11.29 |
| an empty /C array means 'no color, make it transparent' | 14:11.39 |
sebras | tor8: oh! | 14:13.17 |
| tor8: do we want to fix the error handling now? | 14:14.17 |
| tor8: you want to postpone that until later? | 14:14.22 |
| tor8: new branch compiles and builds up until 92132c0 | 14:22.38 |
| tor8: and also LGTM so far. | 14:22.47 |
| tor8: looking at the js, annot lists and editing functions now. | 14:23.06 |
Robin_Watts | Do we assume that %*s works in printfs? | 14:36.19 |
sebras | Robin_Watts: no. | 14:36.45 |
| Robin_Watts: not in fz_vsnprintf() as far as I know. | 14:36.57 |
Robin_Watts | ok, will work around it. | 14:37.17 |
sebras | tor8: ok, I have reviewed up until tor/master not. | 14:51.05 |
| now. | 14:51.09 |
| tor8: LGTM, but I have a hard time remembering how the js-stuff works. | 14:51.24 |
| tor8: but at the same time we know that these functions are missing proper error handling. | 14:51.57 |
Robin_Watts | tor8, sebras: Is that fixable? Or can we at least put some FIXMEs in there? | 14:53.51 |
sebras | Robin_Watts: of course it is fixable. I'm not sure whether tor8 wanted to fix it now, or wants to play around with the design of the interface a bit more before committing it to master. | 14:54.49 |
Robin_Watts | tor8, sebras: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=3f185ffbda20e823b07f21021c69f29a0f607224 | 14:56.13 |
sebras | Robin_Watts: isn't gsdll_stderr() supposed to print to stderr? | 15:01.41 |
| Robin_Watts: it prints to stdout now. | 15:01.51 |
Robin_Watts | sebras: cut and paste whoopsie, sorry. | 15:02.54 |
sebras | Robin_Watts: yup, got that. :) | 15:03.16 |
Robin_Watts | Fixed: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=1c46ee8d0463c17cb2efd36e1151e6a8ad6c7c0f | 15:04.02 |
sebras | Robin_Watts: so these strings are never null-terminated? is that due to ps? | 15:06.21 |
kens | No, it's due to Ghostscript | 15:07.10 |
sebras | ok. | 15:08.20 |
| Robin_Watts: LGTM | 15:08.24 |
ago | Robin_Watts: hello, do you know which stable version will include the fix you made today for the bug I opened? | 15:10.20 |
Robin_Watts | ago. 1.10 | 15:13.29 |
| Which will be released... soon. | 15:13.47 |
| sebras: Thanks. | 15:14.04 |
ago | Robin_Watts: do you think that you will fix the others from fuzzing before release 1.10? | 15:17.07 |
Robin_Watts | ago: I don't know. I hope so. | 15:17.16 |
sebras | ago: are you depending on them being fixed for some mupdf-based tool you are building or is this more of a personal interest? :) | 15:18.04 |
ago | sebras: personal, I usually blog about issues spotted by asan/fuzzer | 15:27.09 |
Robin_Watts | ago: Cool. We appreciate you looking. | 15:27.31 |
| We've had the likes of google feed us fuzzing issues before, and we've battled through them. Any new issues are always interesting. | 15:28.06 |
ago | Robin_Watts: I'll re-fuzz on 1.10 when it will be out :) | 15:28.12 |
sebras | ago: ok, I was just being curious. :) | 15:35.37 |
| ago: if you find asan-related (or american fuzzy lop issues) using mutool draw -s t $FILE then those are definitely interesting. | 15:36.37 |
| Robin_Watts: is the fix for 695015 ready for review? | 15:41.29 |
Robin_Watts | Is that not committed? | 15:41.43 |
sebras | Robin_Watts: I think one of the other fuzzing issues ends up in the same case. | 15:41.47 |
Robin_Watts | http://ghostscript.com/regression/cgi-bin/clustermonitor.cgi?report=1e03c06456d997435019fb3526fa2d4be7dbc6ec&project=mupdf | 15:42.10 |
sebras | Robin_Watts: oh... /me needs to fetch. | 15:42.12 |
Robin_Watts | :) | 15:42.18 |
ago | sebras: I will check | 15:44.18 |
sebras | ago: 697019 appears to be fixed by the same change as 697015. | 15:46.09 |
Robin_Watts | sebras: How do I find the length of a PDFObject that I know to be an array? | 15:46.36 |
| obj.size(); | 15:46.39 |
| ? | 15:46.41 |
sebras | Robin_Watts: something like that. sec. | 15:46.59 |
Robin_Watts | yeah, I see it in the JNI. | 15:47.09 |
sebras | Robin_Watts: yeah, looks right. | 15:47.17 |
ago | sebras: do you think that is another bug but fixed by the same commit or just the same bug? | 15:49.38 |
sebras | ago: I believe it is the same bug. | 15:49.58 |
ago | sebras: ok | 15:50.05 |
sebras | ago: I can see valgrind backtraces indicating that the cause is the same when I revert Robin's fix. when re-applying the fix, valgrind no longer indicates problems. | 15:53.40 |
ago | sebras: great. thanks | 15:55.11 |
| sebras: another OT question. Who is in charge to fix the fuzzing bugs on gs? I see bugs opened since november | 16:00.19 |
sebras | ago: I don't think there's anyone specifically appointed to fix fuzzing bugs. | 16:01.15 |
Robin_Watts | ago: Fuzzing bugs get fixed when we have time between fighting other fires :) | 16:01.53 |
| sebras: How do I get a name from a PDFObject? | 16:02.17 |
sebras | Robin_Watts: there should be a .get() for you. | 16:03.35 |
| Robin_Watts: note that there are two variants, one for arrays one for dicts. | 16:03.48 |
| Robin_Watts: so PDFObject.get(String) is the one you want. | 16:04.10 |
Robin_Watts | .toByteString looks like it will return a result from either a name or a stream buffer. | 16:04.25 |
| no. I have a PDFObject obj. | 16:04.37 |
| I think it's a name. | 16:04.45 |
| I want to compare it to another known name. | 16:04.51 |
| so obj.toByteString and then a comparison. | 16:05.07 |
sebras | Robin_Watts: yes, that might work. | 16:05.18 |
| Robin_Watts: maybe we need to expose pdf_objcmp()? | 16:05.33 |
Robin_Watts | but that's bad, cos I can't tell between a name or an str_buf without explicitly checking. | 16:05.35 |
| pdf_objcmp would be good, but it'd be an expensive way to compare names. | 16:06.06 |
| I'd like a .toByteString to be split into 2 functions I think. | 16:07.19 |
| One that works on names, one that works on strings? | 16:07.35 |
sebras | Robin_Watts: so basically what you want is pdf_name_eq()? | 16:08.12 |
Robin_Watts | sebras: That's something I'd like, yes. | 16:08.27 |
sebras | Robin_Watts: because you don't really care about the bytestrings themselves. | 16:08.36 |
Robin_Watts | But having that would not counter what looks like a hole in our API. | 16:09.05 |
sebras | Robin_Watts: ehm... but that one depends on pdf_objcmp() in the end! | 16:09.09 |
| Robin_Watts: there are probably quite a number of other holes. | 16:10.14 |
Robin_Watts | sebras: Having a java thing that called pdf_name_eq would not require making a new java object to do the comparison. | 16:10.17 |
sebras | Robin_Watts: we don't expose the _entire_ C level API. | 16:10.24 |
| Robin_Watts: right. | 16:10.39 |
| Robin_Watts: I'll add it to my TODO, is that ok? | 16:10.58 |
Robin_Watts | Exposing pdf_objcmp and using that would require us to make a new name PDFObject from a string, and then call pdf_objcmp. | 16:11.02 |
| Sure. | 16:11.05 |
| I can use toByteString for now. | 16:11.15 |
| We don't expose the entire C level API. *Yet* :) | 16:11.29 |
sebras | Robin_Watts: what are you playing with? | 16:11.30 |
Robin_Watts | the gproof demo. | 16:11.41 |
| I need to reach into the pdf document and pull out if there is an embedded profile. | 16:12.03 |
sebras | Robin_Watts: is that another interface function we are missing? | 16:14.08 |
| Robin_Watts: maybe even at the C-level? | 16:14.11 |
Robin_Watts | sebras: It's not clear to me that this should be done at the C level. | 16:14.30 |
sebras | Robin_Watts: or this is the type of PDF object trickery better not fixed by a convenience function? | 16:14.30 |
| Robin_Watts: jstest_main.c is used in the cluster, right? | 16:15.35 |
Robin_Watts | sebras: Certainly, I think it's worth doing this at the java level, because a) it is possible it will need tweaking, and b) it shows whether the API is complete or not. | 16:15.36 |
| jstest_main.c is used in the cluster. | 16:15.45 |
| Fixing that is lower priority, cos it's not useful for anyone other than us, and us only with specific files. | 16:16.19 |
sebras | Robin_Watts: right. I think this one is trivial. | 16:17.22 |
| Robin_Watts: in 928939c346e12ecce75d8de573b13c411f1bebd5 you actually did most of the changes in jstest_main.c | 16:17.40 |
| Robin_Watts: now we have a filename variable we copy _to_ but never read. | 16:17.49 |
Robin_Watts | jstest_main.c is a hacked version of the old app, I think, that just renders to bitmaps. | 16:18.27 |
sebras | Robin_Watts: yeah, I realize that. | 16:18.41 |
Robin_Watts | In the C, we have it so that everything copes nicely if we do: | 16:21.53 |
| pdf_to_name(ctx, pdf_dict_get(ctx, pdf_dict_get(ctx, dict, "Foo"), "Bar")) etc | 16:22.42 |
| even if there is no "Foo". | 16:22.49 |
| Does the java have that property, or does it throw exceptions? | 16:22.59 |
sebras | Robin_Watts: if we fz_throw() it will throw java exceptions | 16:25.46 |
| also it might throw because the JVM JNI interface throws an exception for whatever reason. | 16:26.05 |
Robin_Watts | sebras: No, the C deliberately doesn't throw. | 16:26.07 |
sebras | (e.g. accessing elements out of range or out of memory) | 16:26.18 |
Robin_Watts | pdf_dict_get will return NULL if it's not there. | 16:26.25 |
| If you ask pdf_dict_get to lookup in NULL it returns NULL. | 16:26.37 |
| If you ask pdf_to_name on NULL, it returns "" etc. | 16:26.50 |
| all so we don't need to check at every turn. | 16:27.07 |
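[Editor's sketch] The "NULL in, NULL out" convention Robin_Watts describes can be mimicked with a toy dictionary. The types and functions below (`obj`, `entry`, `dict_get`, `to_name`, `demo_chain`) are hypothetical stand-ins, not the MuPDF API, and they omit the `ctx` argument the real calls take. They only show how nested lookups stay safe when an intermediate key is missing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy objects: either a name or a small dictionary. */
typedef struct entry entry;
typedef struct obj {
	const char *name;    /* non-NULL for name objects */
	const entry *items;  /* non-NULL for dictionaries */
	size_t count;
} obj;
struct entry { const char *key; const obj *val; };

/* Looks up key; returns NULL if dict is NULL, not a dictionary,
 * or has no such key. */
const obj *dict_get(const obj *dict, const char *key)
{
	size_t i;

	assert(key != NULL);
	if (dict == NULL || dict->items == NULL)
		return NULL;
	for (i = 0; i < dict->count; i++)
		if (strcmp(dict->items[i].key, key) == 0)
			return dict->items[i].val;
	return NULL;
}

/* Returns the name, or "" for NULL and for non-name objects. */
const char *to_name(const obj *o)
{
	return (o != NULL && o->name != NULL) ? o->name : "";
}

/* Chains lookups with and without the intermediate key present. */
int demo_chain(void)
{
	static const obj bar = { "Baz", NULL, 0 };
	static const entry inner_items[] = { { "Bar", &bar } };
	static const obj inner = { NULL, inner_items, 1 };
	static const entry outer_items[] = { { "Foo", &inner } };
	static const obj outer = { NULL, outer_items, 1 };

	const char *hit = to_name(dict_get(dict_get(&outer, "Foo"), "Bar"));
	const char *miss = to_name(dict_get(dict_get(&outer, "Nope"), "Bar"));

	return strcmp(hit, "Baz") == 0 && miss[0] == '\0';
}
```

Because every function accepts NULL and returns a harmless value, the caller needs a single check at the end of the chain rather than one per step.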
sebras | Robin_Watts: yeah, we mirror that behaviour from pdf_dict_gets() | 16:27.08 |
Robin_Watts | sebras: Fab. | 16:27.15 |
sebras | Robin_Watts: maybe with the exception of toByteString() | 16:27.35 |
| Robin_Watts: since it uses pdf_is_name() to determine if it is a name (which it is not if it is NULL) | 16:28.00 |
| Robin_Watts: but then again pdf_to_str_buf() also returns "" for NULL. | 16:28.36 |
Robin_Watts | sounds good then, | 16:28.49 |
| I can try this and see. | 16:28.53 |
sebras | Robin_Watts: do you want to review three trivial patches over at sebras/master? | 16:29.25 |
Robin_Watts | just let me finish this or I'll never remember where I got to. | 16:29.44 |
| sebras: All 3 lgtm. | 16:34.40 |
sebras | Robin_Watts: they cluster fine, so then I'll push? | 16:34.58 |
Robin_Watts | go for it. | 16:35.04 |
| toByteString also seems to return null terminated arrays. | 16:41.38 |
| That seems wrong. | 16:41.41 |
sebras | Robin_Watts: want to review another one? | 17:04.27 |
Robin_Watts | desperately. | 17:04.36 |
sebras | it's a oneliner. | 17:04.38 |
| Robin_Watts: you're programming Java, I take it. | 17:05.02 |
Robin_Watts | yeah. I might just have a fiddle in PDFObject.java for you to look at. | 17:05.24 |
| sebras: lgtm. | 17:07.53 |
| Is it 2am again there? | 17:08.05 |
sebras | Robin_Watts: yes. | 17:08.13 |
Robin_Watts | You should sleep. | 17:08.19 |
sebras | ago: still here? | 17:14.09 |
| ago: it was rather easy to resolve the remaining fuzzing cases now that I knew where they were. | 17:14.31 |
| ago: and how to reproduce. I think I have resolved all of them tonight, if you care to you can retest on git HEAD now, or you can wait for the 1.10 release. | 17:15.30 |
Robin_Watts | So things like outline->title are in PDFDocEncoding, I think. | 17:26.37 |
| (or they might have a BOM to identify their encoding) | 17:26.50 |
| Currently the JNI code calls NewStringUTF on them, which is Bad(TM), I think. | 17:27.18 |
sebras | Robin_Watts: oh. yeah, we should convert them to Modified UTF-8. | 17:28.15 |
| Robin_Watts: there's something special with JNI and UTF-8 concerning NULL I think. | 17:28.35 |
sebras | sleeps. | 17:28.38 |
Robin_Watts | Night sebras: | 17:28.43 |
| The question is, should we do that conversion at the C level, or at the JNI level? | 17:29.15 |
| For now I propose to ignore it. | 17:29.20 |
tor8 | sebras: Robin_Watts: oh, I thought there would be a toName() for name objects, and toByteString for string objects. | 22:54.47 |
Robin_Watts | tor8: We are hamstrung by java into having a toString that converts objects of all types to be a printable version. | 22:55.20 |
| I therefore think that our stuff should be 'asBoolean' 'asInteger' 'asString' etc. | 22:55.45 |
| cos 'asName' and 'asString' can both return strings that way. | 22:56.02 |
| I have a commit to achieve that on my master. | 22:56.11 |
tor8 | sebras: dict.get("Name").toNumber() will crash with an NPE if the dict doesn't have /Name | 22:56.53 |
Robin_Watts | Also, I have changes there to allow us to do: blah.get("Foo").get("Bar").get("Baz").asInteger() etc without needing lots of null checks. | 22:56.56 |
| tor8: Not with my changes :) | 22:57.05 |
tor8 | Robin_Watts: ah, fab. you're returning a 'null' PDFObject instead? | 22:57.41 |
| blah.has("Foo") to check if a key exists then perhaps? | 22:57.53 |
Robin_Watts | tor8: I am. | 22:58.13 |
tor8 | cool. | 22:58.35 |
Robin_Watts | blah.has("Foo") would be !blah.get("Foo").isNull() | 22:58.56 |
tor8 | toString (or asString, though I'm not too fond of that name, it may be best in the long run) should apply PDFDocEncoding etc to convert to a proper unicode java string | 22:59.13 |
| toByteString gets the raw bytes | 22:59.19 |
| toName would get the PDF name as a java string | 22:59.26 |
| was my intention, and matches what the JS code does | 22:59.34 |
Robin_Watts | tor8: OK. | 22:59.46 |
| Currently we assume that the strings are UTF format. | 23:00.02 |
| So that needs to be fixed in the JNI. I was going to look at that tomorrow. | 23:00.17 |
tor8 | well, the JS stuff is more automagic and uses the toString/valueOf callbacks to convert objects to primitive types | 23:00.22 |
Robin_Watts | JNI can either take Unicode, or UTF. | 23:00.45 |
| If we make byte [] from it, then the String constructor in javascript will take more encodings, but none are quite right | 23:01.11 |
tor8 | yeah, a raw PDF string should just be a byte array, IMO | 23:01.26 |
Robin_Watts | hence I think we need a helper function to go from string -> UTF. | 23:01.44 |
tor8 | like the encryption ID strings, etc | 23:01.47 |
Robin_Watts | tor8: Having the ability to get it as a byte string makes sense, yes. | 23:02.08 |
tor8 | pdf_to_utf8 ? | 23:02.10 |
Robin_Watts | pdf_to_javas_wierd_utf8_variant. | 23:02.30 |
tor8 | that looks for the BOM or PDFDocEncoding | 23:02.32 |
Robin_Watts | yes. | 23:02.35 |
tor8 | we have pdf_to_utf8, which should suffice? | 23:02.46 |
| or are there more quirks to java's weird utf-8 variant? | 23:02.55 |
Robin_Watts | tor8: javas utf is different. | 23:03.03 |
tor8 | we also have pdf_to_ucs2 | 23:03.05 |
Robin_Watts | pdf_to_ucs2 may be the easiest way. | 23:03.18 |
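[Editor's sketch] For context on why Java's variant differs: JNI's `NewStringUTF` expects Modified UTF-8, in which U+0000 is encoded as the two bytes C0 80 and characters outside the BMP are written as a surrogate pair with each surrogate encoded separately in three bytes. A minimal encoder from UTF-16 code units might look like the following. `mutf8_encode` and `demo_mutf8` are hypothetical names, and this is not what the MuPDF JNI code does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Encodes n UTF-16 code units as Modified UTF-8 plus a terminating 0 byte.
 * The caller must supply at least 3 * n + 1 bytes. Returns the encoded
 * length, not counting the terminator.
 */
size_t mutf8_encode(const unsigned short *in, size_t n,
	unsigned char *out, size_t cap)
{
	size_t i, len = 0;

	assert(cap >= 3 * n + 1);
	for (i = 0; i < n; i++)
	{
		unsigned int c = in[i] & 0xFFFFu;

		if (c >= 0x0001 && c <= 0x007F)
			out[len++] = (unsigned char)c;
		else if (c <= 0x07FF)
		{
			/* Includes U+0000, which becomes C0 80 rather than a 0 byte. */
			out[len++] = (unsigned char)(0xC0 | (c >> 6));
			out[len++] = (unsigned char)(0x80 | (c & 0x3F));
		}
		else
		{
			/* Surrogates are encoded one by one, three bytes each. */
			out[len++] = (unsigned char)(0xE0 | (c >> 12));
			out[len++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
			out[len++] = (unsigned char)(0x80 | (c & 0x3F));
		}
	}
	out[len] = 0;
	return len;
}

/* Encodes "A", U+0000, U+00E9 and compares against the expected bytes. */
int demo_mutf8(void)
{
	static const unsigned short text[] = { 'A', 0x0000, 0x00E9 };
	static const unsigned char expected[] = { 0x41, 0xC0, 0x80, 0xC3, 0xA9 };
	unsigned char out[3 * 3 + 1];
	size_t len = mutf8_encode(text, 3, out, sizeof out);

	return len == sizeof expected && memcmp(out, expected, len) == 0;
}
```

Going through UTF-16 code units, as the `pdf_to_ucs2` suggestion implies, sidesteps this encoder entirely if the string is handed to JNI as 16-bit units instead.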
tor8 | hm, the JS stuff needs some extending | 23:03.42 |
| to get a string as an array of numbers to deal with raw strings not unicode encoded | 23:03.59 |
| etc | 23:03.59 |
| I only have obj.toIndirect as an explicit call | 23:04.14 |
| all the others are implicit using JS's type coercion | 23:04.22 |
| Robin_Watts: oh, and I made PDFDocument be a proper subclass of Document in JS. will need to update the JNI to use a factory constructor. | 23:07.54 |
| to do the same there | 23:07.58 |
| so you won't need mDoc.toPDFDocument(), you could just do if (mDoc instanceof PDFDocument) mPDF = (PDFDocument)mDoc; | 23:08.30 |
| Robin_Watts: and I'll follow suit in JS with 'asBoolean' etc | 23:09.17 |
| asFloat should probably be asReal to match PDF spec terminology | 23:09.49 |
| in JS I'm just going to call it asNumber | 23:10.03 |
| Robin_Watts: we could make a static final PDFObject nullObject instead of creating new 'null' wrappers all the time | 23:13.26 |
| anyway, I'm off. ttytm. | 23:14.08 |
Robin_Watts | We could. For getDictionary and getArray, if we're called on the null object, I return the same object, so we don't make one in that case. | 23:14.10 |
| yeah. night. | 23:14.13 |