MuPDF IRC logs

	<<<Back 1 day (to 2018/02/04)	20180205
sebras	fredross-perry: ok, so I looked at it a bit more.	01:30.54
	fredross-perry: we are requiring android api level 16. this was released in jelly bean android 4.1 on july 9th 2012	01:31.15
	that corresponds roughly to the NDK r8b release.	01:31.26
fredross-perry	right.	01:31.34
sebras	I attempted to build with the earliest NDK available for download over at https://developer.android.com/ndk/downloads/older_releases.html which id NDK r10e	01:31.45
	and if I add #include <limits.h> to memory.c like you say I can confirm that the build error cocerning use of SIZE_MAX is resolved.	01:32.16
	the worrying thing is that SIZE_MAX according to opengroup ought to be defined in stdint.h, not limits.h	01:32.41
	Google appears to have rectified this in android api level 21. where SIZE_MAX all of a sudden is in stdint.h (while being in limits.h in api level 18)	01:33.23
	now... later versions of the NDK unified the include hierarchy for all api levels	01:33.49
	this appears to have happend in NDK r14.	01:35.58
	so if we state we require NDK r14 or later we should be including stdint.h	01:36.26
	if you want to support NDK r10e through NDK r13b we should be including limits.h	01:36.58
	what a mess.	01:37.01
fredross-perry	I will say that I see the problem in r14b, but not in r15c. But in any case, I'd like to see a strategy where we can use a reasonable range of NDK versions. 14b was only March of last year.	01:38.07
sebras	fredross-perry: oh, yeah. that's pretty late.	01:38.34
fredross-perry	Also, the NDK with which you build is different than the API you run against. We should be able to build with recent NDKs and still run on API 16.	01:39.15
sebras	fredross-perry: NDK r10e was may 2015.	01:39.23
	btw, POSIX/opengroup specifies that SIZE_MAX ought to go in stdint.h while SSIZE_MAX ought to go in limits.h	01:40.46
fredross-perry	Maybe we can test something that should be defined in stdint.h, like SIZE_MAX, and if it's not defined, then include limits.h.	01:41.14
	re: POSIX, yes, I think this is an issue with earlier NDKs that got fixed.	01:41.46
sebras	fredross-perry: mm, read on https://android.googlesource.com/platform/ndk/+/ndk-release-r16/docs/UnifiedHeaders.md that starting NDK r15 the unified headers are used by default.	01:42.38
	perhaps starting at that point the issue is now permanently fixed for all API levels if you have NDK r15 or later.	01:43.02
fredross-perry	yes, that's why I don't have an issue with r15c	01:44.14
sebras	makes sense.	01:44.33
	we do use SIZE_MAX elsewhere so I wonder if those locations are also problematic or if we by chance happen to include the correct header there.	01:44.56
	(indirectly)\	01:44.59
	also I took the liberty to look for all _MAX and _MIN locations in the mupdf code, and I think I have found a few where we ought to include limits.h/stdint.h but don't (i.e. we're relying on these headers being pulled in by something else (which may be removed in the future)).	01:46.47
fredross-perry	it woud only be problematic if it's part of the Android build. memory.c is, but others may not be.	01:46.48
sebras	it is in source/fitz/store.c where we use SIZE_MAX, and this ought to be part of the android build.	01:47.14
fredross-perry	that one does include limits.h	01:47.39
	So I guess I'm back to, is it safe to just include limits.h in memory.c?	01:48.16
sebras	yes it should be safe, but it is papering over the bug in android. :-/	01:48.49
fredross-perry	Understood. So, #ifdef HAVE_ANDROID ?	01:49.59
	or, #ifndef SIZE_MAX, or something	01:51.07
sebras	now that I changed memory.c to include limits.h instead I did arrive at a link error. :-(	01:51.20
	timegm is missing.	01:51.24
	so that means that NDK r10e is probably not recent enough for us to build.	01:51.43
fredross-perry	with which NDK? I did the same thing yesterday with r14b and no such error.	01:52.05
	ah yes.	01:52.23
sebras	NDK r10e	01:52.55
fredross-perry	've got to go out now. Leave me a not on what you decide. I need this change so I can build with ATAS.	01:52.57
sebras	yeah I got that.	01:53.11
paulgardiner	I have a file for which mupdf prints "assert: overwrite hash slot woth different value!". There's a comment above the code that prints the message, which says "This is legal, but should rarely happen". The same file apparently also leads to crashes, but none that I've been able to reproduce yet.	14:58.06
tor8	paulgardiner: toss me or sebras the file. we've been trying to find and eliminate these issues.	15:06.28
	paulgardiner: they shouldn't be happening ... the comment refers to cases where multi-threading might mean the same item might be rendered and stored in the cache by two separate threads	15:07.20
	because neither thread could find the item they were looking for in the cache, and both start rendering it at the same time	15:07.43
	(or decoding or loading, depending on the type of resource)	15:08.03
	but in single-threaded use, it should never occur	15:08.10
paulgardiner	Interesting. Maybe connected with the crash then. I suppose there's some possibility that MuSO is causing memory corruption and this is how it is showing up, but it's consistently this message	15:09.50
	File sent	15:11.28
tor8	paulgardiner: did you still have some problem saving a file with encryption? I could take a look at that this week, if you could just remind me of the bug number...	15:14.15
paulgardiner	That would be great: 698918	15:15.14
	tor8: with that hash clash, the existing_tile returned by fz_store_item has a width of 1329134352. That's a bad sign, right?	15:36.42
tor8	paulgardiner: yeah, that does sound very bad	15:37.07
	except I can't reproduce it with mupdf-gl	15:38.06
	nor with mutool draw	15:40.56
paulgardiner	Can you think of anything I might be doing wrongly in calling mupdf that would specifically trigger that?	15:44.42
	I would expect random memory corruption to fail in a less consistent way. Also would hope that memento would moan.	15:46.03
tor8	not really. if you could print out the options you pass to the rendering function, maybe we can reproduce it if it depends on zoom levels	15:46.57
paulgardiner	It seems to require more than visiting each page once.	15:47.26
tor8	hmm.	15:48.18
	do you have a backtrace from where it triggers?	15:48.26
paulgardiner	* frame #0: 0x0000000101e1e710 sodk`do_hash_insert(ctx=0x0000000108d7bed0, table=0x000000010b79ef00, key=0x000000016f875ff8, val=0x000000011de3d960) at hash.c:126	15:50.47
	frame #1: 0x0000000101e1e33c sodk`fz_hash_insert(ctx=0x0000000108d7bed0, table=0x000000010b79ef00, key=0x000000016f875ff8, val=0x000000011de3d960) at hash.c:220	15:50.49
	frame #2: 0x0000000101e68e4c sodk`fz_store_item(ctx=0x0000000108d7bed0, key=0x000000011ad61a70, val_=0x0000000119d7ad00, itemsize=38259, type=0x00000001050bc178) at store.c:457	15:50.50
	frame #3: 0x0000000101e204f8 sodk`fz_get_pixmap_from_image(ctx=0x0000000108d7bed0, image=0x00000001133576f0, subarea=0x000000016f876240, ctm=0x000000016f8762a0, dw=0x000000016f876290, dh=0x000000016f87628c) at image.c:710	15:50.52
	frame #4: 0x0000000101ddd8c4 sodk`fz_draw_fill_image(ctx=0x0000000108d7bed0, devp=0x0000000106b3b260, image=0x00000001133576f0, in_ctm=0x000000016f8764c8, alpha=1, color_params=0x000000016f876500) at draw-device.c:1760	15:50.53
	frame #5: 0x0000000101dbc9fc sodk`fz_fill_image(ctx=0x0000000108d7bed0, dev=0x0000000106b3b260, image=0x00000001133576f0, ctm=0x000000016f8764c8, alpha=1, color_params=0x000000016f876500) at device.c:329	15:50.55
	frame #6: 0x0000000101e25be4 sodk`fz_run_display_list(ctx=0x0000000108d7bed0, list=0x0000000117a23cd0, dev=0x0000000106b3b260, top_ctm=0x0000000102b06eb0, scissor=0x0000000102b06ec8, cookie=0x0000000000000000) at list-device.c:1731	15:50.56
	I can trigger it consistently now by scrolling slowly through from first page to last and then scrolling back a bit. This is in a side bar of thumbs that render on demand	15:54.12
	It is possibly always page 25 (or 26 starting from 1)	16:05.05
tor8	paulgardiner: what happens if you change the fz_get_pixmap_from_image function to always use the "if (image->decode)" case	16:33.34
	i.e. ignore the subarea and l2factor downsampling	16:33.48
	just change that if-case to if (1) or something like it	16:34.02
paulgardiner	That's difficult for me to test immediately because I've just this moment provoked a crash with access to a freed block.	16:34.45
	Actually 0za9a9a9a9a9a9a979. What does the 7 mean?	16:35.21
tor8	I have a vague suspicion of where it may be going wrong	16:35.24
	but if the problem persists after nobbling the subarea and/or get_pixmap with an l2factor then I am truly lost	16:37.02
paulgardiner	I will try in just a moment	16:38.35
	This is the crash:	16:39.21
	frame #0: 0x0000000101a66ff8 sodk`safe_find_block(ptr="gUO8gUO8g") at memento.c:1759	16:39.29
	frame #1: 0x0000000101a6846c sodk`do_dropRef(blk="gUO8gUO8g") at memento.c:2054	16:39.30
	frame #2: 0x0000000101a683d8 sodk`Memento_dropRef(blk="gUO8gUO8g") at memento.c:2083	16:39.32
	* frame #3: 0x0000000101a952f8 sodk`ensure_space(ctx=0x000000012f0379f0, tofree=846253) at store.c:364	16:39.33
	frame #4: 0x0000000101a95030 sodk`fz_store_item(ctx=0x000000012f0379f0, key=0x000000012fe19590, val_=0x00000001317b5e00, itemsize=852472, type=0x0000000104ce8178) at store.c:511	16:39.35
	frame #5: 0x0000000101a4c4ec sodk`fz_get_pixmap_from_image(ctx=0x000000012f0379f0, image=0x00000001364e6900, subarea=0x000000016f8ea240, ctm=0x000000016f8ea2a0, dw=0x000000016f8ea290, dh=0x000000016f8ea28c) at image.c:710	16:39.36
	frame #6: 0x0000000101a098b8 sodk`fz_draw_fill_image(ctx=0x000000012f0379f0, devp=0x000000012c189a60, image=0x00000001364e6900, in_ctm=0x000000016f8ea4c8, alpha=1, color_params=0x000000016f8ea500) at draw-device.c:1760	16:39.38
	frame #7: 0x00000001019e89f0 sodk`fz_fill_image(ctx=0x000000012f0379f0, dev=0x000000012c189a60, image=0x00000001364e6900, ctm=0x000000016f8ea4c8, alpha=1, color_params=0x000000016f8ea500) at device.c:329	16:39.39
	frame #8: 0x0000000101a51bd8 sodk`fz_run_display_list(ctx=0x000000012f0379f0, list=0x0000000136579de0, dev=0x000000012c189a60, top_ctm=0x0000000102732eb0, scissor=0x0000000102732ec8, cookie=0x0000000000000000) at list-device.c:1731	16:39.41
	Any feeling whether that could have a common cause?	16:40.29
tor8	it could be, yes.	16:41.19
	they're both happening when converting a fz_image into a fz_pixmap for rendering	16:41.35
paulgardiner	I'll try your test then rather than trying to analise this further	16:41.46
	So I replace if (image_decoded) wtih if (1) ?	16:43.22
tor8	y	16:49.48
paulgardiner	Just rebuilding	16:50.02
	That looks to stop that happening.	16:52.31
tor8	fab! now you know where to send robin hunting ;)	16:52.54
	this is his area of code	16:53.04
sebras	tor8: are you guys also looking at fz_decomp_image_from_stream() ?	16:54.40
	I'm looking at the subarea code at the beginning of that function and if there is a read error it leaves uninitialized memory in the newly allocated samples (but execution continues!)	16:55.50
	tor8: found the issue. I'll badger you for more reviews tomorrow.	17:33.35
	Robin_Watts: though if you have spare time, there are a number of patches concerning separations that I'd like for you to review.	17:34.57
	and knockout group error handling.	17:35.15
Robin_Watts	sebras: Not at the moment, sorry. I'm running late on this code already, and I am trying to avoid distractions.	17:35.29
sebras	Robin_Watts: np, do you know approximately for how long I should avoid distracting you? a week? more?	17:36.08
Robin_Watts	sebras: I had hoped to have this done a week ago.	17:36.25
	I now hope that I'll be done in a week.	17:36.32
sebras	Robin_Watts: ok, so pi weeks then. :)	17:36.54
	Robin_Watts: noted.	17:36.57
fredross-perry	sebras - what's the verdict on limits.h ?	19:38.22
	Forward 1 day (to 2018/02/06)>>>

Log of #mupdf at irc.freenode.net.