| <<<Back 1 day (to 2016/09/20) | 20160921 |
sebras | jogux: you might be on to something. | 12:01.37 |
| jogux: UTIN64_C() is a C99 construct. | 12:01.44 |
jogux | so is int64_t isn't it? :) (technically at least) | 12:02.06 |
| I did have a quick reading of the C standard, afaict hex literals should be open to automatical promotion to unsigned long long, so I still don't understand why gcc is moaning. | 12:02.51 |
sebras | jogux: yes, but I think the old compiler he is uinsg 4.4.7 does not by default compile using C99, so I wonder if you had to explicitly suffix your constants back then..? | 12:03.20 |
tor8 | sebras: the default gcc mode since a long long time has been gnu99 | 12:03.40 |
| the bigger question is, does MSVC 2005 support the UINT64_C macro? | 12:04.06 |
sebras | tor8: well, up until gcc 4.9.4 it was gnu90... https://gcc.gnu.org/onlinedocs/gcc-4.9.4/gcc/Standards.html | 12:05.34 |
| in gcc 5.4.0 and later it is gnu11 https://gcc.gnu.org/onlinedocs/gcc-5.4.0/gcc/Standards.html#Standards but I have no idea what happened earlier on in the 5.x-series. | 12:06.23 |
| tor8: we could define it to be a passthrough macro if it is not alreayd defined. | 12:06.37 |
| but then again, we could also just use the ULL suffix..? | 12:06.49 |
| without using the macro. | 12:06.57 |
tor8 | again, not sure if that works with MSVC | 12:07.28 |
sebras | then my best suggestion is to define a passthru macro if one is not already defined. | 12:08.01 |
| tor8: https://github.com/jemalloc/jemalloc/issues/65 this seems to indicate that LLU (or ULL?) is not supported by msvc. | 12:10.19 |
tor8 | sebras: hah, I was just looking at that exact same page :) | 12:10.55 |
sebras | also there's a __STDC_CONSTANT_MACROS macro, which I'm not quite sure what it controls. | 12:11.18 |
| but I think it is for C++-only. | 12:11.26 |
tor8 | UINT64_C on linux is just a macro to concatenate UL or ULL to the integer constant | 12:11.44 |
sebras | tor8: http://stackoverflow.com/questions/126279/c99-stdint-h-header-and-ms-visual-studio seems to indicate that they added stdint.h to MSVC 2010. | 12:12.30 |
tor8 | https://msdn.microsoft.com/en-us/library/00a1awxf.aspx seems to indicate that MSVC supports ULL suffixes | 12:13.37 |
| at least in C++ mode | 12:13.45 |
sebras | tor8: right, but then there is no Since-tag there, so it could be since MSVC2017... ;) | 12:14.19 |
Robin_Watts | I can test whatever it is you want tested on vs2005 trivially. | 12:14.56 |
sebras | oh! there's a pulldown menu to choose the versions! :) | 12:14.58 |
| Robin_Watts: please do! :) is the UINT64_C() macro available in stdint.h? | 12:15.18 |
Robin_Watts | which constant needs it added? | 12:15.42 |
sebras | source/fitz/ftoa.c:static uint64_t powers_ten | 12:15.52 |
tor8 | Robin_Watts: links on the irc web log are broken. they're all of the form http:/<wbr>/... | 12:15.57 |
sebras | tor8: yeah, I forgot to mention that. | 12:16.24 |
tor8 | sebras: on the jemalloc github issue, they mention the suffix "LLU" | 12:16.39 |
| which is not supported by that grammar, it must be ULL for MSVC | 12:16.47 |
| if ULL works, that's what I suggest we use | 12:17.09 |
| easier and less ugly than UINT64_C macros | 12:17.21 |
sebras | tor8: but it's declared using a stdint type. | 12:17.55 |
| tor8: but I do agree with you about the uglyness. | 12:18.07 |
tor8 | sebras: irrelevant. it's the literal constant that doesn't fit (though I'd have hoped compilers were smarter about large constants) | 12:18.40 |
| i.e., where that literal constant is used is irrelevant, it's the fact that the constant is too big to fit 'int' which is the default type for constants that matters. as I understand it. | 12:20.06 |
sebras | tor8: well, I arguing that since we use the type we might as well use the macro that's "supposed" to be used for this purpose. | 12:20.10 |
tor8 | sebras: you asked about annotations yesterday. I've added InkList and QuadPoints accessors on tor/master | 12:22.49 |
sebras | tor8: ok. I'll have quick look. | 12:23.08 |
| tor8: using pdf_set_annot_ink_list() you set the entire inklist. | 12:24.35 |
| tor8: would it be somehow better to set each path of the inklist on its own? | 12:24.57 |
tor8 | not sure. this is how the old code worked, and it feels reasonable. | 12:25.27 |
| but given how it just prods and pokes at pdf_obj arrays under the hood, doing an incremental interface would also work | 12:25.53 |
sebras | tor8: I think so. | 12:26.35 |
| tor8: and if a user has an undo-button (which I think NUI will have in the end) | 12:26.49 |
Robin_Watts | Sorry, phone call. | 12:27.00 |
sebras | then the user would probably prefer to undo stroke by stroke... | 12:27.04 |
Robin_Watts | OK, INT64_C(x) does not work. | 12:27.07 |
tor8 | sebras: undo would work at a much higher/lower level I think. undoing stroke by stroke would mean the gui keeps track of strokes, and can just recreate the annot for each undo step. | 12:27.43 |
sebras | Robin_Watts: how about ULL? | 12:27.45 |
Robin_Watts | suffixing with ull does. | 12:27.46 |
tor8 | though I *seriously* doubt we'll be adding undo at that level | 12:27.54 |
| sebras: or wait until the user hits 'save' to actually create the annot | 12:28.40 |
sebras | tor8: well, fred mentioned wanting to have a C level call to begin and end undo transactions. and then an API to undo/redo them. | 12:28.45 |
tor8 | save or confirm, etc | 12:28.48 |
| right. begin/end undo transactions would have to be on the underlying pdf_obj layer. probably using incremental xrefs. | 12:29.20 |
| will not be as easy as that though | 12:29.25 |
| since pdf_obj's are mutable | 12:29.35 |
sebras | tor8: I found the NUI pdf I saw some time ago and it does mention a undo button but doesn't specify it a whole lot more. | 12:29.57 |
| tor8: yeah I know. | 12:30.17 |
tor8 | undo as it stands would have to be by saving and reverting to a saved state | 12:30.33 |
| reverting to a saved state would mean reloading the pdf | 12:30.45 |
| the core is just not designed with undo/redo in mind | 12:30.55 |
sebras | tor8: not yet! ;) | 12:31.24 |
tor8 | and the incremental save stuff we added for digital signatures is complicated | 12:31.25 |
| sebras: pdf_dict_put/pdf_array_put would have to be smart and add versioned states | 12:31.55 |
| or use some copy on write hackery | 12:32.16 |
sebras | tor8: yeah, and depending on incremental xrefs is probably not enough, unless we make each alteration a new version of some sort. | 12:33.33 |
| so for fred's idea I guess begin_transation() would create a new incremental xref and then we never modify old objects, just create new ones and relink | 12:34.06 |
| and for the next begin_transacation() we create yet another xref part. | 12:34.23 |
| I guess that mean we could rewind to the a precious xref version. | 12:34.48 |
tor8 | sebras: yes. and something to handle copy-on-write when mutating existing objects. | 12:34.50 |
sebras | tor8: indeed. but... ugh. this will not be done anytime soon. :) | 12:35.11 |
tor8 | and to reset the copies to the old state when rewinding | 12:35.11 |
| indeed, it will not. | 12:35.17 |
Robin_Watts | sebras: We could do something like that, and possibly add something to 'collapse' multiple xrefs to a single one on a save. | 12:35.43 |
| but that's a lot of work. | 12:35.51 |
tor8 | Robin_Watts: one would hope that undo works across a 'save' though | 12:36.04 |
Robin_Watts | tor8: urgh. | 12:36.15 |
tor8 | I mean, hitting ^S should not prevent us from undoing past that :) | 12:36.28 |
Robin_Watts | tor8: Right, yes, but not after a reload. | 12:36.41 |
tor8 | not meaning undo should work after reloading/opening a file | 12:36.51 |
Robin_Watts | That's not incompatible with what I suggested. | 12:36.53 |
tor8 | yeah, that depends on whether pdf_save_document actually merges the xref's in memory or just as they're written out | 12:37.16 |
ago | Hello | 12:45.13 |
ghostbot | Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line. | 12:45.13 |
ago | mupdf developers here? I filed multiple security bugs time ago with no response... | 12:46.04 |
Robin_Watts | ago: Hi. | 12:50.18 |
| Yes, we're here. Which ones? | 12:50.26 |
| Ah, you're the fuzzing guy? | 12:51.36 |
| 697015 onwards? | 12:51.43 |
sebras | Robin_Watts: I actually tried to reproduce http://bugs.ghostscript.com/show_bug.cgi?id=697018 last night, but failed. | 12:52.33 |
| both on 1.9 and 1.9a on both 32- and 64-bit. | 12:52.44 |
ago | Robin_Watts: yes... | 13:04.04 |
| 697015 onwards | 13:04.21 |
Robin_Watts | ago: So, yes, we are interested. Yes, we will look. No, I haven't personally had time yet. | 13:04.39 |
| And sebras has tried and failed to reproduce at least one issue. | 13:05.00 |
ago | Robin_Watts: ok...I'm always available in freenode as ago also if I'm not in this channel (maybe because of a restart of irssi) | 13:05.34 |
| so if you need something, just ask... | 13:05.41 |
Robin_Watts | Exactly what version of the source were you testing, and on what platform please? | 13:05.43 |
| ago: Thanks. | 13:05.47 |
ago | the first hint I can give if you can't reproduce is compile with clang and asan | 13:06.01 |
sebras | ago: right, I have been compiling using gcc. | 13:06.19 |
ago | sebras: it is usual the I found bugs that upstream is unable to reproduce :D it happened with libarchive and imagemagick too | 13:06.59 |
| Robin_Watts: as the trace says, 1.9a on x86_64 | 13:16.33 |
sebras | ago: git checkout 1.9a && git submodule update --init && make -j10 nuke && make -j10 CC=clang XCFLAGS=-fsanitize=address LDFLAGS=-fsanitize=address && ./build/debug/mutool draw -s t ./reproducer.pdf works fine for me | 13:18.40 |
| ago: what am I doing wrong? | 13:18.43 |
| ago: the same goes for on the current origin/master. | 13:20.10 |
| ago: of course mutool draw still complains about syntax errors in the file, but no infinite loop. | 13:21.03 |
ago | sebras: give me the time to recompile and I'll check | 13:21.31 |
| sebras: which clang? | 13:21.43 |
sebras | ago: 3.6.2-3. | 13:21.56 |
ago | sebras: pretty old, I used 3.8.0 | 13:22.15 |
| now I'm retesting on 3.8.1 | 13:22.33 |
sebras | ago: also I'm on 32-bit. | 13:23.04 |
| ago: git checkout 1.9a && git submodule update --init && make -j10 nuke && make -j10 CC=clang-3.8 XCFLAGS=-fsanitize=address LDFLAGS=-fsanitize=address && ./build/debug/mutool draw -s t ./reproducer.pdf | 13:37.54 |
| ago: the same with clang 3.8.1-9 for me. | 13:38.08 |
ago | sebras: if asan is working correctly, you should be unable to compile | 13:40.37 |
| because of undefined reference to `__asan* | 13:41.04 |
sebras | ago: but I set LDFLAGS=-fsanitize=address on the command-line... | 13:41.20 |
ago | sebras: dunno what you are doing, I'm using my package manager | 13:43.04 |
| anyway, -Wl,--no-undefined does not work with asan, you need to filter it | 13:43.23 |
sebras | ago: package manager?! I'm not using that to compile or link mupdf. I did use it to install the debian clang-3.8 package though. | 13:44.16 |
| ago: then I'm checking out 1.9a as you mentioned, then cleaning the directory compiling and linking and trying to render the reproducer pdf you attached in 697018 | 13:45.19 |
| ago: This is the command I'm using: git checkout 1.9a && git submodule update --init && make -j10 nuke && LDFLAGS=-fsanitize=address make -j10 CC=clang-3.8 XCFLAGS=-fsanitize=address && ./build/debug/mutool draw -s t ./reproducer.pdf | 13:45.34 |
| ago: I tested using LDFLAGS=-fsanitize=address as a make argument too (which means it cannot be alterated by any Makefile statements), but for this case this doesn't appear to matter. | 13:46.24 |
| ago: so I still cannot reproduce the issue you report in 697018 | 13:46.40 |
ago | sebras: compiled now and I'm able to reproduce | 13:47.21 |
sebras | ago: are you using the same command as I am? | 13:48.10 |
ago | sebras: no, I'm not using draw but just info | 13:48.36 |
| mutool info $FILE | 13:48.43 |
sebras | ago: ok. having retested with mutool info $FILE the results are for me... *waiting* not the same! :) | 13:49.24 |
| ago: now I also see something address sanitizer-related. | 13:50.17 |
| ago: same on git HEAD. cool. now that I can reproduce it I'll take a look at it. thanks for helping out! | 13:51.28 |
| ago: oh, and I assume you used mutool info on all the other bugs too? | 13:51.38 |
ago | sebras: yes... | 13:55.21 |
| sebras: so you didn't reproduce because of draw instead of info? | 13:55.41 |
sebras | ago: indeed. | 13:55.51 |
| ago: using mutool info I can reproduce it using gcc without asan too. | 13:56.09 |
ago | sebras: well I'm happy that is reproducible without complications :) | 13:56.31 |
sebras | ago: right, so gatherresourceinfo() in mutool info is recursive and your specially crafte .pdf file causes it to reparse the same object infinitely many times. | 13:57.42 |
| ago: I don't have more time to spend on this tonight, but now I know where to begin when I do, thanks! :) | 13:58.07 |
ago | sebras: yes...so is the summary fine based on your analysis? | 13:58.13 |
sebras | ago: summary? | 13:58.29 |
ago | sebras: title of the bug in the bugzilla | 13:58.45 |
| mutool/mupdf: stack-based buffer overflow after an infinite loop in fz_flush_warnings (error.c) | 13:58.52 |
| since it was not clear at all I asked to edit if it could be described better | 13:59.26 |
sebras | ago: ah. the summary is probably ok. but it would have helped if you had included the exact command you used to reproduce it. :) | 13:59.34 |
| but that could be a comment. | 13:59.40 |
| I added it myself just now. | 13:59.44 |
ago | sebras: ok, right but if you open it with mupdf directly? | 14:01.06 |
| sebras / Robin_Watts: I added a comment about the reproduction command in each bug I opened, included mujstest. | 14:04.00 |
sebras | ago: using mupdf-x11 it also works for me. | 14:08.09 |
| ago: the main issue is in gatherresourceinfo() which is only part of mutool info | 14:08.27 |
Robin_Watts | sebras: OK, I'm going to have a look at 697015 until either fred or mvrhel appear. | 14:09.44 |
| You've not started on that one yet? | 14:09.52 |
sebras | Robin_Watts: no. | 14:10.16 |
| Robin_Watts: looks like mutool info accidentally accesses a pdf object after it has been freed though. | 14:11.17 |
| or possibly never allocated. | 14:11.21 |
ago | sebras: so the title should be adjusted to pint to gatherresourceinfo instead of fz_flush_warnings or not? | 14:37.09 |
Robin_Watts | sebras, tor8: So bug 697015 is an interesting one. | 15:00.25 |
| fontdict = pdf_dict_get_val(); | 15:00.37 |
| if (!pdf_is_dict(ctx, fontdict)) | 15:00.52 |
| in the pdf_is_dict(ctx, fontdict) call, we try to resolve the indirection. It turns out that the object pointed to is broken, so we repair the file. | 15:01.30 |
| A side effect of that repair is that fontdict is freed. | 15:01.41 |
sebras | ago: doesn't matter so much when we know how to reproduce the issues. :) | 15:09.16 |
| Robin_Watts: so what is the result of pdf_is_dict()? 0? | 15:09.49 |
Robin_Watts | indeed 0. | 15:10.03 |
sebras | Robin_Watts: ok.. how come we can free the dict? | 15:10.27 |
| Robin_Watts: I think it is ok to error out on it and return 0, but not free it..?! | 15:10.41 |
Robin_Watts | sebras: The problem is that the "just in time" file repair code is invalidating the reference. | 15:12.53 |
sebras | Robin_Watts: but that is not allowed, is it? | 15:17.29 |
| Robin_Watts: it is ok for it to be unable to resolve the reference, but not to invalidate the reference itself. | 15:17.53 |
Robin_Watts | sebras: Yes, it's bad. | 15:20.33 |
| Trying to understand more now. | 15:20.41 |
sebras | Robin_Watts: btw, do you mind taking another look at the logs. | 15:25.03 |
| Robin_Watts: we have <wbr> inside the href attribute in anchor tags. | 15:26.00 |
| Robin_Watts: like this https:/<wbr>/gcc.gnu.org/... | 15:26.10 |
| Robin_Watts: but not for any other / though. Surprisingly. | 15:26.26 |
Robin_Watts | sebras: Sure. | 15:27.28 |
| sebras: I'm just going to back out that change, cos I can't see a trivial fix. | 15:31.31 |
sebras | Robin_Watts: where can I see the code? | 15:32.24 |
Robin_Watts | on casper /home/ghostbot/irc2html.pl | 15:32.47 |
| And I apologise in advance. | 15:32.54 |
| Gah. This "just in time" fix is a pain. | 17:09.15 |
| When you call pdf_dict_get_val you don't own the reference you are returned, you just borrow it. | 17:11.42 |
| so if the dict you got it from is rewritten as part of the repair, the object can get dropped and your ref goes invalid. | 17:12.19 |
| Not sure how to fix that :( | 17:12.28 |
sebras | Robin_Watts: 84909fb LGTM. | 17:12.38 |
Robin_Watts | Ta. | 17:13.12 |
sebras | Robin_Watts: so.. we'd need to keep? | 17:13.25 |
Robin_Watts | sebrsa | 17:13.32 |
sebras | robni | 17:13.42 |
Robin_Watts | sorry, premature return :) | 17:13.55 |
| The "neatest" way would be to make pdf_dict_get_val always keep the reference it returns, but OH MY GOD, what chaos that would cause. | 17:15.13 |
sebras | Robin_Watts: well, the borrowed ref is alright until I have to resolve it. | 17:16.28 |
| but that is pretty much all pdf_obj functions. | 17:16.34 |
| before those were explicit and then you could imagine that you'd have to check the reference after that. | 17:16.54 |
| if it became null. | 17:17.06 |
| or something. | 17:17.08 |
| Robin_Watts: when we repair the file does that mean that _all_ refs are invalidated? | 17:17.50 |
| or just some? | 17:17.57 |
| Robin: also when repairing you could conceivably keep the dictionary object, but change its contents to fit what we repaired from the file. but that would also be a major pain. | 17:19.21 |
Robin_Watts | sebras: Potentially all. | 17:21.42 |
| We've thought about this in the past, and I thought we'd come up with a way that it was safe, but I can't remember it now. | 17:22.59 |
sebras | Robin_Watts: well then, in that case as soon as we discover we have to repair we have to throw away all object reference and start from the beginning on that page. | 17:34.56 |
| that would be a pain of we at that point have unsaved annotations! :) | 17:34.58 |
Robin_Watts | sebras: Yes. | 17:35.23 |
| The alternative is to only be able to repair files that you spot are broken during the initial xref read. | 17:36.05 |
sebras | Robin_Watts: but then we'd have to scan the entire file. | 17:36.49 |
| Robin_Watts: like that 2Gbyte file someone is reporting bugs about... | 17:37.16 |
Robin_Watts | sebras: Sorry, to be clear... that alternative is what we used to do. | 17:37.26 |
| (and it meant we had whole classes of files we failed to fix) | 17:37.42 |
sebras | Robin_Watts: btw, what are the symptoms here? sigsegv I guess? | 17:38.42 |
Robin_Watts | yes. | 17:38.59 |
sebras | Robin_Watts: I wonder if there is a way for us to return a pointer which we upon the next interface invocation could spot is invalid and just error out. | 17:40.35 |
Robin_Watts | sebras: The problem is we have no idea which pointers have ever been read. | 17:41.34 |
sebras | Robin_Watts: from the set of all pdf object pointers you mean? | 17:42.06 |
Robin_Watts | sebras: Yes. | 17:43.25 |
| We could have a special mode that we enter when we start a repair. | 17:45.15 |
| Whereby any pdf objects that get 'dropped' to 0 refs, are not actually dropped. | 17:45.38 |
| We would set a flag in them to mean "this is now invalid". | 17:46.01 |
| And any access to such an object in future would throw an exception. | 17:46.17 |
sebras | Robin_Watts: wait... it is only when it is dropped by the repair code that it's harmful. | 17:48.21 |
| if the client drops the object then that's benign. | 17:48.30 |
| we _want_ them to drop. | 17:48.34 |
Robin_Watts | Yes. | 17:48.48 |
| Hence only drops "during a repair" would be affected. | 17:48.57 |
sebras | Robin_Watts: yes, the problem is then that there is a big risk that different parts of the code refer to the same dict keys using different object pointers. | 17:49.37 |
| Robin_Watts: I can't forsee what will happen in that case, probably something horribly bad. | 17:49.53 |
Robin_Watts | sebras: The idea is that we still have only 1 'valid' pointer for an object. | 17:50.25 |
| It's just that rather than crashing when we try to use an old 'invalid' pointer, we'd throw an exception which could trigger the operation being retried. | 17:50.59 |
sebras | Robin_Watts: right, so we're basically depening on that future interface calls using the invalidated object will throw. | 17:51.57 |
Robin_Watts | Yes. | 17:52.06 |
sebras | that _could_ potentially work. | 17:52.08 |
Robin_Watts | The alternative is to ALWAYS throw after a repair. | 17:52.26 |
sebras | and once all non-repair references to the invalidated object are dropped then the object is garbagecollected as normal. | 17:52.39 |
| basically that's what we will be doing. | 17:53.05 |
Robin_Watts | sebras: The problem is that there are no (official) references to the objects. | 17:53.13 |
sebras | just delaying it a few statements. | 17:53.21 |
Robin_Watts | No, not quite. | 17:53.30 |
sebras | Robin_Watts: why is that a problem? | 17:53.35 |
Robin_Watts | Most of the time, the repair is fine, and we don't have a problem with refs. | 17:53.59 |
| In such cases, we'll never access an invalid reference, and so the code will continue forwards blissfully unaware. | 17:54.39 |
| In the case where we do have references, they aren't "official" references (i.e. a variable happens to be holding a pointer to an object, but that object has no idea that someone has a pointer to it - it's reference count has not been updated to cope with that). | 17:55.50 |
| We have no way of knowing when it's safe to drop such a pointer. | 17:56.14 |
| s/a pointer/an object/ | 17:56.25 |
sebras | Robin_Watts: so basically you are saying that it is only unresolved references that are problematic when we realize we have to repair. | 17:59.39 |
| Robin_Watts: that seems reasonable. | 17:59.43 |
Robin_Watts | sebras: unresolved is a bad word. | 17:59.58 |
| It's "transient" references that are problematic. | 18:00.16 |
sebras | Robin_Watts: well, we use that term in the interface. | 18:00.19 |
| Robin_Watts: but here you mean something else, I get that. | 18:00.35 |
Robin_Watts | sebras: Do we? Then I worry we are talking at cross purposes. | 18:00.38 |
| unresolved references are fine, AIUI. | 18:00.50 |
| because if I have an object that means "5 0 R", then that will still mean "5 0 R" after a repair. | 18:01.13 |
sebras | Robin_Watts: true, but the dict where you read that 5 0 R from might now say 42 0 R | 18:01.45 |
| Robin_Watts: because in our infinite wisdom we repaired it and picked another object. | 18:02.03 |
| dict. | 18:02.06 |
| not object. | 18:02.08 |
Robin_Watts | yes, the dict where previously we had 5 0 R might now have 42 0 R. | 18:02.37 |
| but that problem is not restricted to references. | 18:02.46 |
| any object can have this problem. | 18:02.51 |
sebras | Robin_Watts: is that true? I though it was only containing objects (dict, array) that have references and they may all of a sudden have keys/elements being a different reference. | 18:04.24 |
| so if you cached the _old_ reference prior to repair then you don't know if it is valid in the new, repaired context. | 18:04.47 |
| s/context/document/ | 18:04.54 |
Robin_Watts | Only dicts and arrays have object references. | 18:08.31 |
| Object references can be direct or indirect. | 18:08.51 |
sebras | indeed. | 18:08.59 |
Robin_Watts | direct object refs are just pointers to other objects. | 18:09.12 |
| indirect object refs are objects with a doc/obj/gen triple in them. | 18:09.42 |
sebras | oh.. and if e.g. a dict is replaced by repair then even direct objects inside it could be replaced. | 18:10.05 |
| I see what you mean now. | 18:10.08 |
Robin_Watts | Yes. | 18:10.12 |
| Sorry, too much overloading of language here. | 18:10.21 |
sebras | np. | 18:10.34 |
| well. how is this different from throwin TryLater? | 18:10.58 |
| but in this case TryLaterBecauseWeJustInvalidatedAllObjects? | 18:11.20 |
| so basically rerender the page. | 18:11.26 |
Robin_Watts | sebras: In many cases we are never holding any references to old objects. | 18:11.54 |
sebras | and because we cannot enter repair again (right?) then if we fail to render the 2nd time, then we can just throw normally. | 18:12.01 |
Robin_Watts | hence when the repair is triggered, we can safely continue. | 18:12.07 |
sebras | Robin_Watts: right, so it is only when we have those transient borrowed pointers from dicts/arrays that we are suceptible to this issue. | 18:12.45 |
Robin_Watts | So our choices are to either ALWAYS throw on a repair, which will force operations to restart, possibly unnecessarily... | 18:13.03 |
| or to throw when a newly illegal object is accessed. | 18:13.27 |
| The latter introduces questions about how long lived the 'illegal' objects have to be. | 18:13.59 |
sebras | yes. and the former alternative is Safe<tm> but wastes time. | 18:14.42 |
Robin_Watts | I think I prefer SafeButSimpler(TM). | 18:15.25 |
sebras | to always throw? | 18:15.41 |
Robin_Watts | Just trying to walk through this file to understand this exact example. | 18:15.52 |
| sebras: yes. | 18:15.56 |
sebras | Robin_Watts: what do we do about unsaved annotation changes? | 18:16.06 |
| I vote for scrapping them. | 18:16.14 |
Robin_Watts | sebras: Yes. | 18:16.20 |
sebras | Robin_Watts: so... let's imagine that this happens to the android app. | 18:16.45 |
| Robin_Watts: will it see the throw? I think it will bound up into the Java-layer that has to retry it's operation. | 18:17.14 |
| what ever it was. | 18:17.18 |
| probably loadPage() but quite possibly also changeAnnotationState() (which may rigger content stream rendering which may trigger repair. | 18:17.55 |
| Robin_Watts: what do we do with PDFObject refs that the Java code has created prior to this happening? | 18:18.55 |
| Robin_Watts: those objects are now stale, but how do we know? | 18:19.05 |
Robin_Watts | sebras: Have we exposed any PDF objects into java at the moment ? | 18:19.16 |
| I thought we were all at the fz level at the moment, rather than pdf stuff ? | 18:19.33 |
sebras | Robin_Watts: of course. you can do Document.createNewBoolean() or whatever it is called. | 18:19.40 |
| no, that's why we have Document and PDFDocument. the latter you can use to create objects. | 18:20.03 |
Robin_Watts | sebras: Objects at the JNI level all have REAL references. | 18:20.15 |
| hence they will never be invalidated. | 18:20.34 |
sebras | Robin_Watts: ah, yes, true. | 18:20.59 |
| Robin_Watts: how do we handled newly created objects when we do a repair?! do we merge the xrefs?! | 18:21.25 |
| maybe some special xref flag for created but not saved objects. | 18:21.45 |
Robin_Watts | sebras: Ah, ok, so the way we do a repair is that we run through the file parsing objects in turn. For each object we insert it into the current xref. | 18:24.56 |
sebras | Robin_Watts: when do we drop that borrowed dict? | 18:26.33 |
Robin_Watts | what borrowed dict? | 18:26.51 |
sebras | Robin_Watts: in this case it was a fontdict. | 18:27.12 |
Robin_Watts | Let me try and finish. This one may be a special case. | 18:30.33 |
| So when we repair, we run through the file parsing objects in turn, reading offset/generation/number (and stream offset/length, if appropriate) into a list. | 18:33.38 |
tor8 | Robin_Watts: oh, right. our "borrowed reference" assumption doesn't hold. | 18:35.16 |
Robin_Watts | We then dump that list back into the current xref (expanded if required), but we NEVER replace existing objects. | 18:35.46 |
| That's how we know borrowed refs are safe. | 18:35.58 |
| BUT... in this special case, we'd screwing that up, because we update the length of streams. | 18:36.22 |
tor8 | I don't think we thought through all the corner cases when we added the ability to repair on the fly | 18:36.26 |
Robin_Watts | and this object that we're holding happens to be a Length value. | 18:36.44 |
| so that's a smaller wrinkle that we might be able to solve, maybe. | 18:36.57 |
sebras | re-reads again. | 18:40.11 |
Robin_Watts | Sorry, my multitasking falls to bits when being assailed with color science on the other tab as well as hairy object reference problems. | 18:47.51 |
sebras | Robin_Watts: np. I should be sleeping anyway. | 18:48.24 |
Robin_Watts | sebras: You go to bed. We can talk about this tomorrow (lucky you!) | 18:48.50 |
| tor8, sebras: I have a partial fix that solves that bug. | 20:20.42 |
| I now leak 2 objects from that PDF file though. | 20:21.01 |
| so I plan to put those into a orphan list or something and deal with them when the doc is closed. | 20:21.38 |
| Will worry about that tomorrow. | 20:21.48 |
| Forward 1 day (to 2016/09/22)>>> | |