Log of #mupdf at irc.freenode.net.

Search:
 <<<Back 1 day (to 2018/01/17)20180118 
tor8 sebras: fredross-perry: there is a function to immediately release all resources for a java Document without forcing the gc to run: Document.destroy()10:48.23 
  that will call the finalizer and set the internal C pointer in the java wrapping object to NULL10:48.50 
  fredross-perry: the fz_set_context function sets a global value. I don't understand how you mean to have it contain a flag for whether the most recently opened stream was an InputStream or OutputStream and then act on that information.10:50.32 
  but that code seems to have gone now, so that's good :)10:51.35 
titanous tor8: do you mind taking a look at https://github.com/google/oss-fuzz/pull/106713:31.00 
tor8 titanous: in pdf_fuzzer.cc you can just call the fz-prefixed functions. fz_open_document_with_stream, fz_count_pages, and fz_drop_document.13:34.39 
  just pass in "pdf" as the 'magic' argument13:35.07 
  then you don't need to do the &doc->super stuff13:35.22 
titanous got it13:35.27 
tor8 and that will then also make it work for the other input formats we support13:35.59 
titanous what email address should I use for you? and who else should be CCed? they need to be Google accounts13:36.17 
tor8 titanous: I think sebras was looking into setting up a google account for this. he's been a bit busy the last couple of days so I don't know what's happened with that.13:37.04 
  how much traffic is this going to generate?13:37.39 
  you can also pass 0 to fz_new_pixmap_from_page_number and save a bit of memory13:39.06 
  as the final argument13:39.12 
titanous there will be one email thread for each bug found, but they are very easy to filter into a folder or whatever13:39.30 
  they will be from Google's monorail issue tracker13:39.43 
  then there is a web UI that can be used to access the stats and bugs found13:39.57 
  the bugs will automatically be closed after the issue is fixed13:40.20 
  everyone that you want to have access to the details should be in the YAML file13:41.20 
  I can add the initial set and then you can send PRs to oss-fuzz in the future13:41.34 
tor8 titanous: you can put tor.andersson@artifex.com in there to start with13:49.57 
  artifex.com is a google account13:50.30 
titanous perfect13:50.35 
  do you mind just posting a comment that says it looks good if everything is okay?13:50.58 
  I've pushed all of the changes13:51.07 
tor8 paulgardiner: you around?14:21.21 
  I finally got around to reviewing your signature-work branch14:21.43 
  Only found some minor issues. Two typos spotted: "verifcation" in a comment in "Further changes to sig..." and "potention" in a commit message14:22.42 
  fz_output_as_stream ought to be named fz_stream_from_output (and the internal callback can be either as_stream, to_stream, or stream_from, I have no strong opinion there)14:23.09 
  I find that X_from_Y naming "x = X_from_Y(y)" reads better than "x = Y_to_X(y)"14:24.08 
  in complete_signatures, you create a new stream from the output for each byte range14:24.50 
  you could hoist that creation outside the for (unsaved_sigs) loop and just seek/read inside the loop14:25.10 
  I have nothing to say about how the openssl API is used; I'll trust you on those bits :)14:26.45 
paulgardiner That's great. I can sort most of those out. The repeatedly creating a new stream might have to remain. Robin requested the condition that the output is not used while a derived stream is in existence.14:34.34 
Robin_Watts tor8: yeah, I felt it was important that we not have to cope with output and streams both existing at the same time and being used interleaved.14:41.18 
tor8 Robin_Watts: paulgardiner: that's not the case here though, or have I missed something? the only functions being called (other than create/seek/read/drop of the stream) in the loop is pdf_obj_parent_num and pdf_dict_getl14:46.34 
Robin_Watts tor8: I haven't looked at the latest patch, but I thought when I looked before it was an issue.14:47.18 
tor8 I'm talking about the first loop only14:47.25 
  pdf_write_digest should probably create the stream from its fz_output inside the function itself14:48.06 
  since that's where the interleaving happens14:48.12 
  this case I'm talking about the second loop that does create/pdf_write_digest/drop14:48.54 
paulgardiner Ah, right. That makes sense14:54.00 
  Robin_Watts: I've sorted the issue you spotted before. Just sounds like I've overdone it on the fix14:54.43 
  tor8: I've addressed all your comments and pushed a new version to the branch. I'll retest on the cluster too.15:55.40 
  cluster says it's okay.16:07.53 
fredross-perry sebras - you there?16:54.09 
sebras fredross-perry: I am.16:57.32 
fredross-perry So I am chasing down unreleased references to the document in my Java code. But I am still somewhat uncomfortable forcing GC. I think maybe we forgo the close() parts of the stream code, and I'll do the right thing with secureFS on my side.16:59.23 
  thoughts?17:00.04 
sebras I sent this message privately earlier today "I spoke with tor8 and he reminded me about Document.destroy() which will call the finalizer and set the pointer to the native fz_document to NULL. that's something that you could call just prior to setting the last java reference to a document to null in order to trigger it to release all its resources. without having to wait for GC. I had forgotten about it,17:00.54 
  sorry. :)"17:01.00 
  fredross-perry: I have been forcing GC myself when I have needed to control freeing, so I have forced more work on myself as well...17:02.10 
Robin_Watts sebras: You've been "forcing gc" ?17:05.57 
  By calling System.gc(); ?17:06.09 
  In general, we should never "force gc".17:06.52 
sebras Robin_Watts: when I have been looking for reference leaks, yes.17:07.16 
fredross-perry Right. But in my case the only reason I am needing the _close to be called is so I can turn around and tell secureFS to close. I know it's good that we're revealing cleanup that needs to be done, but I also want to reliably call close. We can leave the _close functionality in, and say that my code has the issue. IOW, trying to get something reasonable pushed into mupdf soon.17:07.17 
Robin_Watts But we should free as much as we can without relying on gc, IMAO.17:07.20 
fredross-perry Robin - I agree.17:07.28 
Robin_Watts sebras: Ah, so for debug only, gotcha.17:07.31 
Robin_Watts scuttles back under rock.17:07.41 
sebras Robin_Watts: yes, for runtime I don't care if the resources used by a document are freed at a later point.17:08.19 
  Robin_Watts: unless the resources are somewhat limited.17:08.26 
  fredross-perry: so securefs has two close calls? one to close an opened file and one to unmount/close the entire securefs from future use? (until it is mounted/openend again of course)17:09.16 
fredross-perry sebras - not like that. The model for a single document is open, read, seek, close. Right now close is invoked via the fz_stream we create, which is subject to all the Java refs being cleaned up. I'd rather handle close explicitly on the Java side without relying on mupdf at this point.17:11.36 
sebras fredross-perry: ok, then I misunderstood you. what happens if you call Document.destroy() at the place where you would want to call Document.close()?17:12.51 
  fredross-perry: given what I read in Document.destroy() this ought to free all its resources.17:13.25 
fredross-perry That's what I was doing, and the ref count for the document (in mupdf) was 2, not 1. Which has led to me hunting down unreleased refs on my side.17:13.59 
  The other reason I don't want to rely on that is that even if I do find and squash all the refs, there's no guarantee that we'll introduce on in the future, and that seems fragile to me.17:14.59 
sebras fredross-perry: it was? you didn't mention that yesterday when I proposed nulling the references and forcing GC.17:15.02 
fredross-perry Don't get me wrong. Nulling references and forcing GC is leading to fixing some issues on my side. I just don't want to rely on getting that right in order for close() to get called.17:16.41 
Robin_Watts fredross-perry: Yeah, this is why we have the _destroy() calls, IIRC.17:17.29 
sebras Robin_Watts: and yesterday I had forgotten about that so I never suggested using it. (as I haven't even used it myself.)17:19.05 
fredross-perry destroy() just calls finalize() directly.17:19.05 
Robin_Watts fredross-perry: IIRC destroy calls finalize, and sets everything in the java object to NULL.17:19.49 
  That way, if (when) the gc happens and the java object gets thrown away, the finalize call doesn't do anything.17:20.13 
  It means that after a destroy we have a 'zombie' java object, if you like - a husk with no real contents.17:20.34 
sebras Robin_Watts: but if there is a reference leak of fz_stream somewhere in the native code then that stream (and hence the SeekableStream) would still stay around even if Document is now finalized.17:21.01 
Robin_Watts sebras: Yes.17:21.17 
sebras Robin_Watts: and fredross-perry seems to suggest that even if that happens he wants SeekableStream.close() to be called during Document.finalize().17:21.56 
Robin_Watts My contention would that if we are calling destroy() reliably then it's easier to rule out that leaks aren't happening just because gc hasn't happened.17:22.06 
sebras Robin_Watts: i.e. even if we have reference leaks in the native code the java code shouldn't suffer.17:22.12 
Robin_Watts (cos ISTR that even calling System.gc() is not guaranteed to call all finalizers immediately)17:22.22 
sebras Robin_Watts: yes, if don't want to wait for GC then we must call .destroy(), but if I get fredross-perry correctly his worry is shielding the java code against reference leaks in the native code.17:23.07 
Robin_Watts I think we have to work on the basis that if the native code gets reference counts wrong, we're screwed.17:23.08 
fredross-perry I believe that I've got a stray Java reference to the document that still exists at the time Document_finalize gets called. That's what I am hunting down.17:23.11 
sebras Robin_Watts: no, you'd have to call java.lang.System.runFinalizers() too.17:23.25 
Robin_Watts fredross-perry: Right.17:23.26 
fredross-perry I don't think the issue is in the native code.17:23.33 
Robin_Watts fredross-perry: I can't remember if I got Memento working with the JNI code.17:23.48 
  If so, that might help track leaks.17:24.07 
  Cos for any given block the details stuff can store the history of where reference counts were changed.17:24.33 
  tor8, sebras: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=79aca97f74ebf4a9971ffbc02bebf36de4c475fc17:25.42 
fredross-perry Is there a Java-only way to track references? 'Cause I don't believe I've got a JNI issue at this point.17:26.06 
  I'm going to the office now to fiddle w/miles. Back up later.17:26.45 
sebras Robin_Watts: so in lcmsart you are basically taking the new upstream and adding two shelly fixes?17:30.13 
  I can't judge the fixes themselves, but the branching seems reasonable to me.17:30.32 
  Robin_Watts: it clusters well I guess?17:32.02 
Robin_Watts it does.17:32.10 
  sebras: yes, upstream + 2 fixes.17:32.20 
sebras Robin_Watts: are you familiar with securefs?17:33.40 
Robin_Watts at a highish level17:33.51 
sebras Robin_Watts: I want to know why it is important that the underlying stream is closed when we close a document?17:34.00 
Robin_Watts (i.e. I understand the intent)17:34.13 
sebras is it just a cleanliness matter or may something actually stop working?17:34.30 
Robin_Watts sebras: The idea is that an integrator can give us a securefs implementation, and we do file fetching/writing through that.17:34.50 
  so perhaps it could wrap their own cloud service.17:35.02 
  Or maybe it could do DRM etc.17:35.11 
  If we don't close the underlying securefs thing, how can the cloud service know that it can free the connection (or the local cached copy of the file etc)?17:35.54 
sebras Robin_Watts: so this is similar to on Linux having a program with a file open on a file system that you want to unmount: you can't.17:36.29 
Robin_Watts sebras: sounds similar, yes.17:36.45 
sebras Robin_Watts: why would the cloud service want to free the connection before all the files have been closed?17:37.34 
Robin_Watts sebras: It would want to free the connection when all the files have been closed.17:38.58 
sebras Robin_Watts: I guess one way around this is to have the SeekableInputStream implementation have a SeekableInputStream.freeAllResourcesAndHaveFutureInterfaceCallsThrowIOException() that can be called...?17:39.11 
Robin_Watts I'm clearly missing something here in this discussion.17:39.31 
  When we go to open a document, we open a "file" (which is actually a securefs stream), yes?17:39.51 
sebras Robin_Watts: I'm too so I have the eerie feeling that I'm missing something.17:39.55 
  yes17:39.59 
Robin_Watts we read the document in.17:40.01 
sebras yes17:40.03 
Robin_Watts we keep that stream open all the time we have the document open.17:40.14 
sebras yes17:40.18 
Robin_Watts when we close the document, we close the stream.17:40.25 
sebras yes17:40.29 
  or rather... the document drops its reference17:40.40 
  but the stream is not closed.17:40.43 
Robin_Watts At that point, we want to make sure that the underlying stream really is closed, rather than it being "closed-pending-a-gc".17:41.08 
sebras right.17:41.19 
  and barring no errors you would to Document.destroy() and that would happily close the stream.17:41.41 
  but if there is a mistake in the code so that there is still a reference to Document, or in the JNI so that it has leaked a reference to the stream, then the stream would not close.17:42.09 
Robin_Watts hence rather than the document just doing: doc.stream = null;, we want to do doc.stream.destroy(); doc.stream = null;17:42.10 
  sebras: Yes.17:42.19 
sebras agree, but fred's suggesting that doing so wouldn't be enough if there are also leaks of Document in java or stream in the native code.17:42.42 
Robin_Watts So the correct thing to do is to make sure there is no such mistake in the code.17:42.50 
sebras yes, but fred's suggesting that we might introduce another one.17:43.06 
  and then the stream wouldn't be closed.17:43.12 
Robin_Watts another mistake?17:43.23 
sebras yes.17:43.26 
Robin_Watts Let's try not to introduce such mistakes :)17:43.33 
sebras there is one now, but even if we fix that we might introduce another one.17:43.51 
Robin_Watts Indeed.17:43.58 
sebras and that's true we've messed up reference counting before.17:44.00 
Robin_Watts but hopefully such things should be easy to spot, because securefs won't close properly.17:44.17 
sebras but I have a hard time both relying on reference counting to free resources and then subverting it in order to free resources when I want.17:44.40 
  I would hope so. though finding reference leaks is pretty difficult sometimes. especially for complicated documents.17:45.30 
  Robin_Watts: apparently some software can show all java object references: https://stackoverflow.com/questions/6155464/is-it-possible-see-all-the-references-to-an-object-in-execution-time#615598617:47.00 
  not sure mupdf would build in netbeans though.17:47.11 
Robin_Watts sebras: Urm...17:47.52 
  The way I think about this stuff is that all the native stuff should be freed directly, without relying on gc.17:48.45 
  That's why we have .destroy.17:48.53 
  The only thing that gc does for us is to get rid of the 'husk' java objects.17:49.23 
sebras right.17:49.29 
  I'm not sure we call .destroy() in our own apps at the moment.17:49.40 
Robin_Watts So I don't feel we are ever "relying on reference counting to free resources and then subverting it in order to free resources when I want"17:49.51 
  Are we not? We should be.17:50.01 
sebras well, .destroy() is kind of saying that I don't trust the reference couting to free resources for me, so do it NOW! :)17:50.23 
  maybe I'm viewing it incorrectly. :)17:50.43 
Robin_Watts destroy() is saying "I want to not rely on when finalize() is called", yes.17:52.16 
  IF we always call destroy, then we are NEVER relying on gc.17:52.41 
sebras true.17:52.50 
Robin_Watts (for native objects).17:52.53 
  And that's what we should be shooting for, I believe.17:53.00 
sebras but would we want a Document.destroy_and_disconnect_the_java_objects_from_their_native_counterparts() as an alternative?17:53.32 
Robin_Watts sebras: Was there a lgtm about that commit ?17:54.01 
sebras i.e. in this case that the SeekableStream thingies are closed and so when the fz_stream callbacks are called they fz_throw?17:54.04 
  Robin_Watts: lcms2? yes. the branching looks correct to me. I assume that it clustered well..?17:54.31 
Robin_Watts It did.17:54.37 
sebras Robin_Watts: LGTM.17:54.42 
Robin_Watts Ta.17:54.44 
sebras np.17:54.47 
Robin_Watts sebras: Document.close(), presumably calls stream.close(), right?17:55.30 
sebras no.17:55.37 
Robin_Watts Why not?17:55.49 
sebras Document.destroy() -> Document.finalize() -> fz_drop_document() -> pdf_drop_document_imp() -> fz_drop_stream() -> close-callback -> SeekableStream.close()17:57.03 
  but if someone leaked a reference to the stream fz_drop_stream() wouldn't call the callback.17:57.23 
Robin_Watts Right, so let's not leak references to that stream.17:57.43 
sebras agreed, but then fred needs to find the reference leak that he is facing now.17:58.03 
Robin_Watts On platforms where Memento runs, and the details stuff works, you can see for any given memory block where the reference counts were updated.17:58.28 
  so you can see precisely where they were leaked.17:58.39 
  (with a C backtrace)17:58.59 
sebras Robin_Watts: yes, though I'm not sure it works in muso.17:59.09 
Robin_Watts sebras: Right, but that may be something we can solve.17:59.22 
  fundamentally there is nothing about Memento that shouldn't work on Android.17:59.39 
  We just need to figure out the right backtrace method to use.17:59.51 
sebras that might help. otherwise it is a lot of breakpoints and checking keep/drops against the backtraces of each triggered breakpoint (i.e. the old fashioned way)18:00.02 
Robin_Watts sebras: Supposedly android has backtrace and backtrace_symbols_fd, so using MEMENTO_STACKTRACE_METHOD=1 should work.18:01.47 
sebras that'd be nice.18:01.59 
  then a way to build mupdf using memento is the thing missing.18:02.12 
  Robin_Watts: would we need to include the java backtrace too?18:02.35 
Robin_Watts sebras: The stacktrace would run back as far as the JNI function, which should be enough I'd hope?18:03.20 
sebras right, and then point to the JVM code that executes the javabyte code presumably.18:03.55 
Robin_Watts sebras: I mean, we shouldn't need the java backtrace.18:04.20 
  cos the C backtrace alone should tell us what java calls were being made.18:04.44 
  (just not the callstacks for the java calls)18:05.15 
sebras Robin_Watts: if there are resource leaks then they mostly be pointing to the member that runs the page.18:05.30 
  right. perhaps they are not needed.18:05.50 
  ok. so we need to figure out how to build memento under android then, got it.18:06.06 
Robin_Watts -DMEMENTO=118:06.12 
fredross-perry Just to be clear, I don't believe there is an issue in JNI at this point. Rather, there is a Java doc reference that still exists at the time Document_finalize is called. It's related to the way I'm handling page-related objects in NUI. I just have to hunt them down. So when Document_finalize drops the doc, call_SeekableInputStream_close is NOT called, so we've lost the opportunity to tell secureFS to close. I18:10.06 
  can handle getting secureFS closed another way.18:10.07 
sebras fredross-perry: but calling Document.destroy() ought to freel the internal structure of the object pointed to by that stray Document reference. so then the callback ought to be called.18:12.51 
  despite the stray java Document reference.18:13.00 
  at least the way I understand it.18:13.10 
fredross-perry that's what's not happening. Remember, Document.destroy() is in Java, which just calls finalize(). Finalize calls fz_drop_document.18:14.24 
sebras Robin_Watts: wee! the latest version of android NDK deprecates armeabi, mips and mips64! :)18:14.44 
fredross-perry do we need those?18:15.17 
sebras fredross-perry: yes, and then fz_drop_document calls pdf_drop_document_imp which calls fz_drop_stream. and that ought to close the stream.18:15.21 
  fredross-perry: re: archs, I bet there aren't many users with devices on those platforms. but it certainly reduces build times!18:15.44 
fredross-perry it doesn't when I step thru it, though.18:15.46 
Robin_Watts fredross-perry: So there is another java object that is holding a Stream reference.18:16.02 
sebras or native code leaking a reference.18:16.21 
Robin_Watts sebras: Fred's claim is that it's a java object rather than a native one.18:16.41 
  (AIUI)18:16.45 
fredross-perry that's my claim.18:16.50 
sebras Robin_Watts: src/mupdf-android-fitz/build/intermediates/ndkBuild/release/obj/local/mips64/libmupdf_core.a(memento.o): In function `Memento_showStacktrace':18:17.00 
  Robin_Watts: src/mupdf-android-fitz/libmupdf/source/fitz/memento.c:794: undefined reference to `__cxa_demangle'18:17.10 
  ugh.18:17.13 
Robin_Watts So... presumably we should be able to find that java object from Document.18:17.23 
fredross-perry Right, I am in the process of hunting it down.18:17.50 
  heading back home now, more later.18:18.54 
Robin_Watts so we can hopefully add a 'destroy' method to that java object so: doc.mysterious_object.destroy() calls mysterious_object.stream.destroy(); and sets mysterious_object.stream = null;18:18.57 
  So, yes, finding that object is key. I agree with your plan :)18:19.24 
  sebras: Ugh.18:19.34 
  MEMENTO is deliberately set up so we can use different stacktrace mechanisms easily.18:20.00 
  possibly we need to copy the standard linux one (the one that uses backtrace etc) and then tweak it a bit.18:20.23 
sebras Robin_Watts: would we need to link with some special library to get this symbol?18:21.34 
  seems like it ought to be part of libstdc++, but I'm not sure.18:22.17 
Robin_Watts sebras: not a clue, offhand, sorry.18:24.17 
  We can probably live without demangling.18:24.37 
  It just means we need to mentally decode the function names ourselves.18:24.49 
sebras right. I'll comment it out. someone suggested compiling __cxa_demangle from the libstdc++ manually, but that seems awful (because android may at one point introduce it!)18:25.53 
Robin_Watts sebras: #ifndef MEMENTO_ANDROID maybe.18:26.50 
  MEMENTO_ANDROID is set already.18:27.10 
sebras Robin_Watts: fred: https://pastebin.com/raw/tXx7iEAa with this patch I get a memento build of libmupdf for android. hopefully that can be used to track down the problem easier.18:30.51 
  I need to run out for a while, but when I'm back I'll try to sort that patch out cleanly with ifdefs.18:31.15 
fredross-perry sebras - just to confirm, in Document_finalize, if I force doc->refs = 1 before dropping the doc, it all works as I expect. So that's my evidence that I've got a stray doc reference.19:09.55 
  I believe I've found my issue. com.artifex.mupdf.fitz.Page is holding the doc reference. I have to call destroy() on it. When I do, all is well, including no forcing GC.19:33.50 
  sebras - OK finally, http://git.ghostscript.com/?p=user/fred/mupdf.git;a=commit;h=48c2ebc8f2a1ec009fd474169d0374b74c0d530423:46.32 
 Forward 1 day (to 2018/01/19)>>> 
ghostscript.com #ghostscript
Search: