| <<<Back 1 day (to 2018/02/04) | 20180205 |
sebras | fredross-perry: ok, so I looked at it a bit more. | 01:30.54 |
| fredross-perry: we are requiring android api level 16. this was released in jelly bean android 4.1 on july 9th 2012 | 01:31.15 |
| that corresponds roughly to the NDK r8b release. | 01:31.26 |
fredross-perry | right. | 01:31.34 |
sebras | I attempted to build with the earliest NDK available for download over at https://developer.android.com/ndk/downloads/older_releases.html which id NDK r10e | 01:31.45 |
| and if I add #include <limits.h> to memory.c like you say I can confirm that the build error cocerning use of SIZE_MAX is resolved. | 01:32.16 |
| the worrying thing is that SIZE_MAX according to opengroup ought to be defined in stdint.h, not limits.h | 01:32.41 |
| Google appears to have rectified this in android api level 21. where SIZE_MAX all of a sudden is in stdint.h (while being in limits.h in api level 18) | 01:33.23 |
| now... later versions of the NDK unified the include hierarchy for all api levels | 01:33.49 |
| this appears to have happend in NDK r14. | 01:35.58 |
| so if we state we require NDK r14 or later we should be including stdint.h | 01:36.26 |
| if you want to support NDK r10e through NDK r13b we should be including limits.h | 01:36.58 |
| what a mess. | 01:37.01 |
fredross-perry | I will say that I see the problem in r14b, but not in r15c. But in any case, I'd like to see a strategy where we can use a reasonable range of NDK versions. 14b was only March of last year. | 01:38.07 |
sebras | fredross-perry: oh, yeah. that's pretty late. | 01:38.34 |
fredross-perry | Also, the NDK with which you build is different than the API you run against. We should be able to build with recent NDKs and still run on API 16. | 01:39.15 |
sebras | fredross-perry: NDK r10e was may 2015. | 01:39.23 |
| btw, POSIX/opengroup specifies that SIZE_MAX ought to go in stdint.h while SSIZE_MAX ought to go in limits.h | 01:40.46 |
fredross-perry | Maybe we can test something that should be defined in stdint.h, like SIZE_MAX, and if it's not defined, then include limits.h. | 01:41.14 |
| re: POSIX, yes, I think this is an issue with earlier NDKs that got fixed. | 01:41.46 |
sebras | fredross-perry: mm, read on https://android.googlesource.com/platform/ndk/+/ndk-release-r16/docs/UnifiedHeaders.md that starting NDK r15 the unified headers are used by default. | 01:42.38 |
| perhaps starting at that point the issue is now permanently fixed for all API levels if you have NDK r15 or later. | 01:43.02 |
fredross-perry | yes, that's why I don't have an issue with r15c | 01:44.14 |
sebras | makes sense. | 01:44.33 |
| we do use SIZE_MAX elsewhere so I wonder if those locations are also problematic or if we by chance happen to include the correct header there. | 01:44.56 |
| (indirectly)\ | 01:44.59 |
| also I took the liberty to look for all _MAX and _MIN locations in the mupdf code, and I think I have found a few where we ought to include limits.h/stdint.h but don't (i.e. we're relying on these headers being pulled in by something else (which may be removed in the future)). | 01:46.47 |
fredross-perry | it woud only be problematic if it's part of the Android build. memory.c is, but others may not be. | 01:46.48 |
sebras | it is in source/fitz/store.c where we use SIZE_MAX, and this ought to be part of the android build. | 01:47.14 |
fredross-perry | that one does include limits.h | 01:47.39 |
| So I guess I'm back to, is it safe to just include limits.h in memory.c? | 01:48.16 |
sebras | yes it should be safe, but it is papering over the bug in android. :-/ | 01:48.49 |
fredross-perry | Understood. So, #ifdef HAVE_ANDROID ? | 01:49.59 |
| or, #ifndef SIZE_MAX, or something | 01:51.07 |
sebras | now that I changed memory.c to include limits.h instead I did arrive at a link error. :-( | 01:51.20 |
| timegm is missing. | 01:51.24 |
| so that means that NDK r10e is probably not recent enough for us to build. | 01:51.43 |
fredross-perry | with which NDK? I did the same thing yesterday with r14b and no such error. | 01:52.05 |
| ah yes. | 01:52.23 |
sebras | NDK r10e | 01:52.55 |
fredross-perry | 've got to go out now. Leave me a not on what you decide. I need this change so I can build with ATAS. | 01:52.57 |
sebras | yeah I got that. | 01:53.11 |
paulgardiner | I have a file for which mupdf prints "assert: overwrite hash slot woth different value!". There's a comment above the code that prints the message, which says "This is legal, but should rarely happen". The same file apparently also leads to crashes, but none that I've been able to reproduce yet. | 14:58.06 |
tor8 | paulgardiner: toss me or sebras the file. we've been trying to find and eliminate these issues. | 15:06.28 |
| paulgardiner: they *shouldn't* be happening ... the comment refers to cases where multi-threading might mean the same item might be rendered and stored in the cache by two separate threads | 15:07.20 |
| because neither thread could find the item they were looking for in the cache, and both start rendering it at the same time | 15:07.43 |
| (or decoding or loading, depending on the type of resource) | 15:08.03 |
| but in single-threaded use, it should never occur | 15:08.10 |
paulgardiner | Interesting. Maybe connected with the crash then. I suppose there's some possibility that MuSO is causing memory corruption and this is how it is showing up, but it's consistently this message | 15:09.50 |
| File sent | 15:11.28 |
tor8 | paulgardiner: did you still have some problem saving a file with encryption? I could take a look at that this week, if you could just remind me of the bug number... | 15:14.15 |
paulgardiner | That would be great: 698918 | 15:15.14 |
| tor8: with that hash clash, the existing_tile returned by fz_store_item has a width of 1329134352. That's a bad sign, right? | 15:36.42 |
tor8 | paulgardiner: yeah, that does sound very bad | 15:37.07 |
| except I can't reproduce it with mupdf-gl | 15:38.06 |
| nor with mutool draw | 15:40.56 |
paulgardiner | Can you think of anything I might be doing wrongly in calling mupdf that would specifically trigger that? | 15:44.42 |
| I would expect random memory corruption to fail in a less consistent way. Also would hope that memento would moan. | 15:46.03 |
tor8 | not really. if you could print out the options you pass to the rendering function, maybe we can reproduce it if it depends on zoom levels | 15:46.57 |
paulgardiner | It seems to require more than visiting each page once. | 15:47.26 |
tor8 | hmm. | 15:48.18 |
| do you have a backtrace from where it triggers? | 15:48.26 |
paulgardiner | * frame #0: 0x0000000101e1e710 sodk`do_hash_insert(ctx=0x0000000108d7bed0, table=0x000000010b79ef00, key=0x000000016f875ff8, val=0x000000011de3d960) at hash.c:126 | 15:50.47 |
| frame #1: 0x0000000101e1e33c sodk`fz_hash_insert(ctx=0x0000000108d7bed0, table=0x000000010b79ef00, key=0x000000016f875ff8, val=0x000000011de3d960) at hash.c:220 | 15:50.49 |
| frame #2: 0x0000000101e68e4c sodk`fz_store_item(ctx=0x0000000108d7bed0, key=0x000000011ad61a70, val_=0x0000000119d7ad00, itemsize=38259, type=0x00000001050bc178) at store.c:457 | 15:50.50 |
| frame #3: 0x0000000101e204f8 sodk`fz_get_pixmap_from_image(ctx=0x0000000108d7bed0, image=0x00000001133576f0, subarea=0x000000016f876240, ctm=0x000000016f8762a0, dw=0x000000016f876290, dh=0x000000016f87628c) at image.c:710 | 15:50.52 |
| frame #4: 0x0000000101ddd8c4 sodk`fz_draw_fill_image(ctx=0x0000000108d7bed0, devp=0x0000000106b3b260, image=0x00000001133576f0, in_ctm=0x000000016f8764c8, alpha=1, color_params=0x000000016f876500) at draw-device.c:1760 | 15:50.53 |
| frame #5: 0x0000000101dbc9fc sodk`fz_fill_image(ctx=0x0000000108d7bed0, dev=0x0000000106b3b260, image=0x00000001133576f0, ctm=0x000000016f8764c8, alpha=1, color_params=0x000000016f876500) at device.c:329 | 15:50.55 |
| frame #6: 0x0000000101e25be4 sodk`fz_run_display_list(ctx=0x0000000108d7bed0, list=0x0000000117a23cd0, dev=0x0000000106b3b260, top_ctm=0x0000000102b06eb0, scissor=0x0000000102b06ec8, cookie=0x0000000000000000) at list-device.c:1731 | 15:50.56 |
| I can trigger it consistently now by scrolling slowly through from first page to last and then scrolling back a bit. This is in a side bar of thumbs that render on demand | 15:54.12 |
| It is possibly always page 25 (or 26 starting from 1) | 16:05.05 |
tor8 | paulgardiner: what happens if you change the fz_get_pixmap_from_image function to always use the "if (image->decode)" case | 16:33.34 |
| i.e. ignore the subarea and l2factor downsampling | 16:33.48 |
| just change that if-case to if (1) or something like it | 16:34.02 |
paulgardiner | That's difficult for me to test immediately because I've just this moment provoked a crash with access to a freed block. | 16:34.45 |
| Actually 0za9a9a9a9a9a9a979. What does the 7 mean? | 16:35.21 |
tor8 | I have a vague suspicion of where it may be going wrong | 16:35.24 |
| but if the problem persists after nobbling the subarea and/or get_pixmap with an l2factor then I am truly lost | 16:37.02 |
paulgardiner | I will try in just a moment | 16:38.35 |
| This is the crash: | 16:39.21 |
| frame #0: 0x0000000101a66ff8 sodk`safe_find_block(ptr="gUO8gUO8g") at memento.c:1759 | 16:39.29 |
| frame #1: 0x0000000101a6846c sodk`do_dropRef(blk="gUO8gUO8g") at memento.c:2054 | 16:39.30 |
| frame #2: 0x0000000101a683d8 sodk`Memento_dropRef(blk="gUO8gUO8g") at memento.c:2083 | 16:39.32 |
| * frame #3: 0x0000000101a952f8 sodk`ensure_space(ctx=0x000000012f0379f0, tofree=846253) at store.c:364 | 16:39.33 |
| frame #4: 0x0000000101a95030 sodk`fz_store_item(ctx=0x000000012f0379f0, key=0x000000012fe19590, val_=0x00000001317b5e00, itemsize=852472, type=0x0000000104ce8178) at store.c:511 | 16:39.35 |
| frame #5: 0x0000000101a4c4ec sodk`fz_get_pixmap_from_image(ctx=0x000000012f0379f0, image=0x00000001364e6900, subarea=0x000000016f8ea240, ctm=0x000000016f8ea2a0, dw=0x000000016f8ea290, dh=0x000000016f8ea28c) at image.c:710 | 16:39.36 |
| frame #6: 0x0000000101a098b8 sodk`fz_draw_fill_image(ctx=0x000000012f0379f0, devp=0x000000012c189a60, image=0x00000001364e6900, in_ctm=0x000000016f8ea4c8, alpha=1, color_params=0x000000016f8ea500) at draw-device.c:1760 | 16:39.38 |
| frame #7: 0x00000001019e89f0 sodk`fz_fill_image(ctx=0x000000012f0379f0, dev=0x000000012c189a60, image=0x00000001364e6900, ctm=0x000000016f8ea4c8, alpha=1, color_params=0x000000016f8ea500) at device.c:329 | 16:39.39 |
| frame #8: 0x0000000101a51bd8 sodk`fz_run_display_list(ctx=0x000000012f0379f0, list=0x0000000136579de0, dev=0x000000012c189a60, top_ctm=0x0000000102732eb0, scissor=0x0000000102732ec8, cookie=0x0000000000000000) at list-device.c:1731 | 16:39.41 |
| Any feeling whether that could have a common cause? | 16:40.29 |
tor8 | it could be, yes. | 16:41.19 |
| they're both happening when converting a fz_image into a fz_pixmap for rendering | 16:41.35 |
paulgardiner | I'll try your test then rather than trying to analise this further | 16:41.46 |
| So I replace if (image_decoded) wtih if (1) ? | 16:43.22 |
tor8 | y | 16:49.48 |
paulgardiner | Just rebuilding | 16:50.02 |
| That looks to stop that happening. | 16:52.31 |
tor8 | fab! now you know where to send robin hunting ;) | 16:52.54 |
| this is his area of code | 16:53.04 |
sebras | tor8: are you guys also looking at fz_decomp_image_from_stream() ? | 16:54.40 |
| I'm looking at the subarea code at the beginning of that function and if there is a read error it leaves uninitialized memory in the newly allocated samples (but execution continues!) | 16:55.50 |
| tor8: found the issue. I'll badger you for more reviews tomorrow. | 17:33.35 |
| Robin_Watts: though if you have spare time, there are a number of patches concerning separations that I'd like for you to review. | 17:34.57 |
| and knockout group error handling. | 17:35.15 |
Robin_Watts | sebras: Not at the moment, sorry. I'm running late on this code already, and I am trying to avoid distractions. | 17:35.29 |
sebras | Robin_Watts: np, do you know approximately for how long I should avoid distracting you? a week? more? | 17:36.08 |
Robin_Watts | sebras: I had hoped to have this done a week ago. | 17:36.25 |
| I now hope that I'll be done in a week. | 17:36.32 |
sebras | Robin_Watts: ok, so pi weeks then. :) | 17:36.54 |
| Robin_Watts: noted. | 17:36.57 |
fredross-perry | sebras - what's the verdict on limits.h ? | 19:38.22 |
| Forward 1 day (to 2018/02/06)>>> | |