| <<<Back 1 day (to 2020/10/13) | Fwd 1 day (to 2020/10/15)>>> | 20201014 |
sebras | ator: Wizzup: ah I see what ator mentioned now, I first needed to do this: | 04:23.21 |
| diff --git a/source/cbz/muimg.c b/source/cbz/muimg.c | 04:23.21 |
| @@ -168,7 +168,6 @@ img_open_document_with_stream(fz_context *ctx, fz_stream *file) | 04:23.30 |
| else if (fmt == FZ_IMAGE_JBIG2) | 04:23.33 |
| { | 04:23.35 |
| to be able to reproduce the problem. | 04:23.36 |
| doc->page_count = fz_load_jbig2_subimage_count(ctx, data, len); | 04:23.39 |
| - doc->load_subimage = fz_load_jbig2_subimage; | 04:23.42 |
| $ ./build/sanitize/mutool draw -st ./x/wizzup/img.jbig2 | 04:23.46 |
| page ./x/wizzup/img.jbig2 1warning: jbig2dec error: page has no image, cannot be completed (segment 4294967295) | 04:23.46 |
| error: cannot complete jbig2 image | 04:23.49 |
| warning: read error; treating as end of file | 04:23.51 |
| warning: padding truncated image | 04:23.54 |
| so what jbig2enc created for you is a JBIG2 file. | 04:24.13 |
| files come with an initial header: an 8 byte magic, a 1 byte flags field and (usually) a 4 byte page count. | 04:24.55 |
| the internal JBIG2 image segments do not start until offset 13 in img.jbig2 | 04:25.18 |
| the problem is that JBIG2 files can be organized in two ways: | 04:25.32 |
| 1) <file header><segment1 header><segment1 data><segment2 header><segment2 data><segment3 header><segment3 data> | 04:30.58 |
| or | 04:31.01 |
| 2) <file header><segment1 header><segment2 header><segment3 header><segment1 data><segment2 data><segment3 data> | 04:31.05 |
| I'm suspecting that it is organized the wrong way. | 04:31.14 |
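Editor's sketch: the file header described above can be read with a few lines of C. This assumes the layout from the JBIG2 specification (8 byte magic, one flags byte whose bit 0 selects the organisation and whose bit 1 marks an unknown page count, then an optional big-endian 4 byte page count). The struct and function names are hypothetical and are not part of mupdf or jbig2dec.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type, not part of mupdf or jbig2dec. */
struct jbig2_file_header {
	int sequential;      /* 1: organisation 1), 0: organisation 2) */
	int pages_known;     /* 0 if the page count field is absent */
	uint32_t page_count; /* valid only when pages_known is 1 */
	size_t header_len;   /* offset of the first segment header */
};

static const unsigned char jbig2_magic[8] =
	{ 0x97, 0x4a, 0x42, 0x32, 0x0d, 0x0a, 0x1a, 0x0a };

/* Returns 1 and fills *hdr if buf starts with a JBIG2 file header, else 0. */
static int parse_jbig2_file_header(const unsigned char *buf, size_t len,
	struct jbig2_file_header *hdr)
{
	unsigned char flags;

	if (len < 9 || memcmp(buf, jbig2_magic, 8) != 0)
		return 0;

	flags = buf[8];
	hdr->sequential = flags & 1;     /* bit 0: sequential organisation */
	hdr->pages_known = !(flags & 2); /* bit 1 set: page count unknown */
	hdr->page_count = 0;
	hdr->header_len = 9;

	if (hdr->pages_known)
	{
		if (len < 13)
			return 0;
		hdr->page_count = ((uint32_t)buf[9] << 24) | ((uint32_t)buf[10] << 16) |
			((uint32_t)buf[11] << 8) | (uint32_t)buf[12];
		hdr->header_len = 13;
	}
	return 1;
}
```

With a sequential file that declares its page count, header_len comes out as 13, which matches the offset at which the segments are said to start.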
Wizzup | What is the right way from mupdf's perspective? | 07:03.03 |
Tamir_Evan | In Visual Studio, how do I make a release build of mupdf, that has the tesseract functionality, but doesn't require libluratech.lib? | 07:56.46 |
| The 'Release' configuration builds successfully without needing libluratech, but doesn't compile (and link in) the tesseract and leptonica libs (even when sources are in place). | 07:57.01 |
| The 'ReleaseTesseract' configuration compiles the tesseract and leptonica libs, but fails with "LNK1181: cannot open input file '<path to mupdf source>\platform\win32\Release\libluratech.lib'" at link time. | 07:57.07 |
ator | sebras: problem I see is we can read it one way but not the other. | 08:36.19 |
| muimg has a special case to read it the way it works, but when loading it using the normal image API, it doesn't work | 08:36.54 |
sebras | ator: well, PDF expects JBIG2 streams in embedded organisation, which means no file header, and each segment is stored as its header followed directly by its data, like in organisation 1) above. | 08:57.05 |
| JBIG2 _files_ are normally stored with a file header but can use either organisation | 08:57.29 |
| in file mode all segments must be stored in numerically increasing order, but in PDF's embedded mode that is not a requirement. | 08:58.13 |
ator | sebras: fz_load_jbig2_subimage works, fz_new_image_from_buffer doesn't, on the same file. | 08:58.38 |
| why does one work and the other fail? | 08:59.10 |
| I guess it comes down to differences in fz_load_jbig2 and fz_open_jbig2d | 09:01.00 |
sebras | (gdb) bt | 09:01.15 |
| #0 jbig2_ctx_new_imp (allocator=0x555557c62bf0, options=JBIG2_OPTIONS_EMBEDDED, global_ctx=0x0, error_callback=0x55555572aa88 <error_callback>, error_callback_data=0x555557c2c2a0, jbig2_version_major=0, jbig2_version_minor=18) at thirdparty/jbig2dec/jbig2.c:107 | 09:01.16 |
| #1 0x000055555572ae49 in fz_open_jbig2d (ctx=0x555557c2c2a0, chain=0x555557c43b10, globals=0x0) at source/fitz/filter-jbig2.c:432 | 09:01.19 |
| #2 0x0000555555709830 in fz_open_image_decomp_stream (ctx=0x555557c2c2a0, tail=0x555557c43b10, params=0x555557c5b740, l2factor=0x7fffffffc924) at source/fitz/compressed-buffer.c:72 | 09:01.22 |
| #3 0x000055555570965b in fz_open_image_decomp_stream_from_buffer (ctx=0x555557c2c2a0, buffer=0x555557c5b740, l2factor=0x7fffffffc924) at source/fitz/compressed-buffer.c:25 | 09:01.25 |
| #4 0x0000555555611dc5 in compressed_image_get_pixmap (ctx=0x555557c2c2a0, image_=0x555557c435d0, subarea=0x7fffffffc904, w=2414, h=3560, l2factor=0x7fffffffc924) at source/fitz/image.c:572 | 09:01.28 |
| #5 0x0000555555612758 in fz_get_pixmap_from_image (ctx=0x555557c2c2a0, image=0x555557c435d0, subarea=0x7fffffffc9d0, ctm=0x7fffffffca20, dw=0x7fffffffca14, dh=0x7fffffffca10) at source/fitz/image.c:770 | 09:01.32 |
| #6 0x00005555555e66f9 in fz_draw_fill_image (ctx=0x555557c2c2a0, devp=0x555557c5b860, image=0x555557c435d0, in_ctm=..., alpha=1, color_params=...) at source/fitz/draw-device.c:1746 | 09:01.36 |
| #7 0x00005555555d9c86 in fz_fill_image (ctx=0x555557c2c2a0, dev=0x555557c5b860, image=0x555557c435d0, ctm=..., alpha=1, color_params=...) at source/fitz/device.c:329 | 09:01.39 |
| #8 0x0000555555618cf2 in fz_run_display_list (ctx=0x555557c2c2a0, list=0x555557c5b7b0, dev=0x555557c5b860, top_ctm=..., scissor=..., cookie=0x7fffffffd470) at source/fitz/list-device.c:1742 | 09:01.42 |
| #9 0x00005555555a22d6 in drawband (ctx=0x555557c2c2a0, page=0x555557c5b5e0, list=0x555557c5b7b0, ctm=..., tbounds=..., cookie=0x7fffffffd470, band_start=0, pix=0x555557c445e0, bit=0x7fffffffd010) at source/tools/mudraw.c:584 | 09:01.46 |
| #10 0x00005555555a453b in dodrawpage (ctx=0x555557c2c2a0, page=0x555557c5b5e0, list=0x555557c5b7b0, pagenum=1, cookie=0x7fffffffd470, start=0, interptime=0, fname=0x7fffffffeb25 "./x/wizzup/img.jbig2", bg=0, seps=0x0) at source/tools/mudraw.c:1062 | 09:01.50 |
| #11 0x00005555555a550a in drawpage (ctx=0x555557c2c2a0, doc=0x555557c44640, pagenum=1) at source/tools/mudraw.c:1385 | 09:01.54 |
| #12 0x00005555555a56c4 in drawrange (ctx=0x555557c2c2a0, doc=0x555557c44640, range=0x555555a1f75c "") at source/tools/mudraw.c:1424 | 09:01.57 |
| #13 0x00005555555a7e89 in mudraw_main (argc=3, argv=0x7fffffffe820) at source/tools/mudraw.c:2363 | 09:02.01 |
| #14 0x00005555555a10cc in main (argc=4, argv=0x7fffffffe818) at source/tools/mutool.c:130 | 09:02.03 |
| #0 jbig2_ctx_new_imp (allocator=0x7fffffffd0c0, options=(unknown: 0), global_ctx=0x0, error_callback=0x55555561f47f <error_callback>, error_callback_data=0x555557c2c2a0, jbig2_version_major=0, jbig2_version_minor=18) at thirdparty/jbig2dec/jbig2.c:107 | 09:02.06 |
| #1 0x000055555561f6af in jbig2_read_image (ctx=0x555557c2c2a0, jbig2=0x7fffffffd120, buf=0x555557c45b50 "\227JB2\r\n\032\n\001", len=69098, only_metadata=0, subimage=0) at source/fitz/load-jbig2.c:310 | 09:02.10 |
| #2 0x000055555561fb95 in fz_load_jbig2_subimage (ctx=0x555557c2c2a0, buf=0x555557c45b50 "\227JB2\r\n\032\n\001", len=69098, subimage=0) at source/fitz/load-jbig2.c:413 | 09:02.14 |
| #3 0x00005555557011ca in img_load_page (ctx=0x555557c2c2a0, doc_=0x555557c44640, chapter=0, number=0) at source/cbz/muimg.c:95 | 09:02.17 |
| #4 0x00005555555dc13c in fz_load_chapter_page (ctx=0x555557c2c2a0, doc=0x555557c44640, chapter=0, number=0) at source/fitz/document.c:534 | 09:02.21 |
| #5 0x00005555555dbc5d in fz_load_page (ctx=0x555557c2c2a0, doc=0x555557c44640, number=0) at source/fitz/document.c:404 | 09:02.23 |
| #6 0x00005555555a4c98 in drawpage (ctx=0x555557c2c2a0, doc=0x555557c44640, pagenum=1) at source/tools/mudraw.c:1233 | 09:02.26 |
| #7 0x00005555555a56c4 in drawrange (ctx=0x555557c2c2a0, doc=0x555557c44640, range=0x555555a1f75c "") at source/tools/mudraw.c:1424 | 09:02.30 |
| #8 0x00005555555a7e89 in mudraw_main (argc=3, argv=0x7fffffffe820) at source/tools/mudraw.c:2363 | 09:02.33 |
| #9 0x00005555555a10cc in main (argc=4, argv=0x7fffffffe818) at source/tools/mutool.c:130 | 09:02.36 |
| likely because options have different values. | 09:02.38 |
ator | JBIG2_OPTIONS_EMBEDDED in one but not the other? | 09:03.03 |
sebras | yes | 09:03.07 |
ator | can we pre-scan for the magic file headers and set the option appropriately? | 09:03.32 |
sebras | per spec PDF requires JBIG2 streams to be in embedded organisation | 09:04.02 |
ator | yeah, but we're able to load JBIG2 streams from files too | 09:04.21 |
sebras | per spec PDF does not accept the magic file header. | 09:04.32 |
ator | or we shouldn't have added JBIG2 to muimg and fz_new_image_from_buffer | 09:04.35 |
Wizzup | How would you know if a file is a JBIG2 image file if it doesn't have the header? | 09:04.55 |
ator | or we need to pass that along with the compressed_buffer constructor | 09:05.05 |
| a separate enum for JBIG2_EMBEDDED and JBIG2_FILE | 09:05.16 |
sebras | Wizzup: in PDF the object which contains the JBIG2 data stream states that it should be decoded using JBIG2Decode | 09:05.39 |
ator | sebras: yeah, if I change the JBIG2_OPTIONS_EMBEDDED to 0 in fz_open_jbig2d the file converts and opens with the default image api | 09:06.44 |
| sebras: is there a magic number in the file header we can look for? | 09:07.11 |
sebras | I don't remember when I added JBIG2 to muimg and fz_new_image_from_buffer(). likely because I had no other JBIG2 viewer and I wanted to be able to view the JBIG2 streams that I could extract from a PDF. | 09:07.34 |
| ator: there is. | 09:07.39 |
| ator: but the file that Wizzup sent _does_ contain the file header as indicated by the 8 byte magic number: | 09:08.24 |
| $ xxd -g1 x/wizzup/img.jbig2 | head | 09:08.27 |
| 00000000: 97 4a 42 32 0d 0a 1a 0a 01 00 00 00 01 00 00 00 .JB2............ | 09:08.30 |
| 00000010: 00 30 00 01 00 00 00 13 00 00 09 6e 00 00 0d e8 .0.........n.... | 09:08.32 |
| if you change it to 0, then a file header is required. | 09:09.13 |
ator | can't jbig2dec detect the file header automagically and just do the right thing? | 09:09.32 |
| is there info in the file header that is required for decoding, or is it just magic packaging? | 09:09.53 |
sebras | the organisation type and the number of pages. | 09:10.59 |
ator | thinking about how to take a jbig2 with a file header, and embed it in PDF when doing pdf_add_image | 09:11.02 |
sebras | if you want to do that you must strip out the file header. | 09:11.26 |
ator | it looks like we should be having separate enums rather than one FZ_IMAGE_JBIG2 | 09:11.38 |
| (or just detect the file header everywhere) | 09:11.56 |
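Editor's sketch: the "detect the file header everywhere" idea could look roughly like this. The enum and function names are hypothetical, loosely following the JBIG2_EMBEDDED / JBIG2_FILE suggestion above, and are not existing mupdf or jbig2dec identifiers. Data without the magic is only assumed to be embedded; nothing in the bytes can confirm that.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum; not an existing mupdf or jbig2dec type. */
enum jbig2_wrapping { JBIG2_EMBEDDED, JBIG2_FILE };

/* Classify a buffer by looking for the 8 byte JBIG2 file magic. */
static enum jbig2_wrapping detect_jbig2_wrapping(const unsigned char *buf, size_t len)
{
	static const unsigned char magic[8] =
		{ 0x97, 0x4a, 0x42, 0x32, 0x0d, 0x0a, 0x1a, 0x0a };

	if (len >= 8 && memcmp(buf, magic, 8) == 0)
		return JBIG2_FILE;

	/* No magic: assume embedded organisation (cannot be verified here). */
	return JBIG2_EMBEDDED;
}
```

A caller could then pass JBIG2_OPTIONS_EMBEDDED to jbig2dec only for the JBIG2_EMBEDDED case and 0 otherwise, which is the switch being discussed for fz_open_jbig2d.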
sebras | the user supplied the options to jbig2dec prior to passing it any data from the byte stream | 09:12.26 |
| I think raph might have viewed this as the expectation of the data in the stream. | 09:13.50 |
malc_ | ator: FWIW that's the closest i came up with regarding the typeface in that jbig picture http://www.identifont.com/identify?34+.+7VX+8N+6B+PAF+5L+4O+8F+3AM+35YW+JI7+19+2Z3D+J+8C+7PQ+2Z38+5V+9A+4A+1KS+2M+19U+53K+1KK+1U6+7VR+3Z+2ZGN+1LA+7G+1QY+8B+A0 | 09:13.53 |
sebras | so if the data is incorrectly formatted jbig2dec will try to parse it and get lots of out of range values and probably error out. | 09:14.33 |
ator | malc_: it looked like a modern font, but too low res to really tell. | 09:14.49 |
| ralph, not raph :) | 09:15.05 |
malc_ | ator: i'm not sure what "modern" means | 09:15.15 |
sebras | commit 0a9304dad738268b27556717bf83936c15618506 | 09:15.18 |
| Author: raph <raph@ded80894-8fb9-0310-811b-c03f3676ab4d> | 09:15.18 |
malc_ | FIGHT! | 09:15.27 |
ator | guess he had his fingers in the pie too | 09:15.28 |
sebras | ator: looks like raph to me. :) | 09:15.32 |
| ator: he did. | 09:15.37 |
ator | sebras: sure does! | 09:15.39 |
sebras | early on. | 09:15.40 |
| the commit above introduced JBIG2_OPTIONS_EMBEDDED. | 09:18.52 |
| ator: oh, and as PDF requires there not to be a file header, then JBIG2 images _there_ can't | 09:19.40 |
| be multi page, while normal .jbig2 files, e.g. annex-h.jbig2, can (and that one does). | 09:20.08 |
Wizzup | Looks like the 'jbig2' encoding tool has a flag '-p --pdf': produce PDF ready data - maybe this is what you are talking about. | 09:22.18 |
sebras | Wizzup: that is likely to enable embedded mode, yes. | 09:22.34 |
| try it -- you shouldn't be getting the first 13 bytes of the hex output I quoted earlier. | 09:23.21 |
Wizzup | It looks like the file is 35 bytes smaller, and does seem to omit those headers. | 09:25.16 |
sebras | Wizzup: that's more than 16 bytes. I wonder what else they've omitted. :) | 09:30.27 |
Wizzup | I wonder if it also deals with some of other problems you observed - regarding multi page? | 09:30.47 |
sebras | Wizzup: do you mind providing the pdf-ized jbig2 file? | 09:33.44 |
Wizzup | https://wizzup.org/img-pdf.jbig2 | 09:36.08 |
sebras | Wizzup: ah yes, I had forgotten that end of page and end of file segments are also omitted. | 09:37.50 |
Wizzup | If one were to provide mupdf with a file containing JBIG2 data in embedded organisation, it is still not clear to me how mupdf would know what image type it is. fz_recognize_image_format cannot know that without the header, right? | 09:42.43 |
ator | Wizzup: correct, it would not be able to recognize it. | 09:43.29 |
Wizzup | As a user (who assumed jbig2 writing was in place, which isn't fully true it seems, but that's fine), I would expect mupdf to read the jbig2 file with header, and strip the header if that is what is required to add the jbig2 into the pdf. | 09:44.06 |
malc_ | ator: this UI toolkit that you wrote and mupdf-gl uses really blows goats on my monitor, sorry for being so abrasive but c'mon | 09:58.27 |
ator | too big? too small? | 09:59.18 |
sebras | too goaty. | 09:59.25 |
ator | I like goats. | 09:59.31 |
sebras | malc_: my avatar is a goat. | 09:59.34 |
ator | with their freaky square eyes | 09:59.40 |
malc_ | sebras: that explains it | 09:59.44 |
| ator: too small | 09:59.48 |
| and i would say that img-pdf.jbig2 causes mupdf to return negative values to somewhere | 10:00.34 |
| at least that's the effect it has on llpp trying to view the thing | 10:00.51 |
| nuc:~ | 10:01.09 |
| - llpp img-pdf.jbig2 | 10:01.09 |
| error processing '"pdim 0 0 0 -2147483648"': Stdlib.Scanf.Scan_failure("scanf: bad input at char number 6: character '-' is not a decimal digit") | 10:01.09 |
| | 10:01.09 |
| naughty naughty fitz | 10:01.36 |
| sebras: https://www.youtube.com/watch?v=G3dooqt15kY | 10:02.24 |
ator | malc_: see ui_init for the threshold values when calculating ui_scale | 10:03.24 |
| your monitor needs to report being >= 144, 192, or 288 dpi for the ui scaling to come into effect | 10:03.53 |
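Editor's sketch: the threshold logic described here could be written as below. Only the three dpi thresholds come from the discussion; the function name and the scale factor attached to each threshold are illustrative assumptions and are not taken from gl-ui.c.

```c
/* Hypothetical helper; thresholds are from the discussion above,
 * the returned scale factors are assumed for illustration. */
static float pick_ui_scale(int dpi)
{
	if (dpi >= 288)
		return 3.0f;
	if (dpi >= 192)
		return 2.0f;
	if (dpi >= 144)
		return 1.5f;
	return 1.0f; /* below the first threshold: no scaling */
}
```

A monitor that reports a value below 144 gets no scaling at all under this scheme, whatever its real pixel density is.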
malc_ | ator: my monitor lies through its teeth when it comes to the dpi to keep windows satisfied i suppose, on top of that i also modify dpi by hand to make my eyes happy | 10:05.29 |
| mupdf being outlier in suckage here | 10:05.43 |
ator | malc_: well, you could hardwire it in gl-ui.c | 10:06.20 |
| "use the source, luke"! | 10:06.25 |
malc_ | ator: then again i don't use mupdf, just something you probably should be aware of in an event my spawns will take over the world and start lynching swedes named Tor | 10:06.51 |
| F... T... S...! I'm not touching the source that uses tabs with a ten foot pole! | 10:07.24 |
| https://github.com/moosotc/snippets/blob/master/.xinitrc | 10:09.00 |
| https://github.com/moosotc/snippets/blob/master/bin/setdpi | 10:09.00 |
| dpi that keeps my eyes and brain happy is 163 | 10:09.30 |
| whoops make that 190 | 10:10.30 |
ator | malc_: where are you going to breed your spawn if you throw out the bathwater? | 10:10.45 |
malc_ | ator: in space! | 10:15.41 |
| wi hev a lot ov speis in matha rusha! | 10:16.04 |
ator | everything is better ... IN SPAAAACE! | 10:33.28 |
malc_ | heh | 10:36.46 |
| depends on how good your stabilizers are i s'pose | 10:37.08 |
Tamir_Evan | ator: Did you mean to push all those 'WIP' commits ("WIP: Save PDF file", 3 "WIP Bug 702350:..." commits, "WIP refactor uses of zlib", "WIP: mutool run TODO list" and "WIP: Remove support for luratech jbig2 and JPEG2000 decoders") to the main mupdf repo? | 11:01.20 |
ator | Tamir_Evan: no. did I? | 11:11.51 |
| gah! that was an accident. | 11:12.08 |
sebras | ator: my origin/master is on 32e4e8b4b | 11:12.44 |
ator | okay, should be reset now | 11:13.23 |
| cluster may need kicking in the future. Robin_Watts | 11:13.28 |
| Tamir_Evan: the windows luratech/tesseract build issue is something Robin_Watts will have to answer | 11:14.00 |
Robin_Watts | ator: So golden/master should be 32e4e8b4b ? | 11:16.20 |
ator | Robin_Watts: yes | 11:17.43 |
malc_ | ator: another thing that blows: 'make build=native' inside mupdf checkout that has those WIP commits, 'git remote update; git pull' 'make build=native' considers everything up-to-date... sigh | 11:18.02 |
| thirdparty uptodatedness tracking leaves a lot to be desired too... oh well, at least those Makefiles _must_ have tabs | 11:20.43 |
| shrug | 11:20.46 |
Robin_Watts | ator: OK, reset, I think. | 11:22.26 |
ator | Robin_Watts: thanks. | 11:25.51 |
Robin_Watts | Tamir_Evan: That shouldn't happen. ReleaseTesseract should be what you want. I'll check in a bit when the current panic is over. | 11:27.10 |
Tamir_Evan | Robin_Watts: O.K., thanks for looking into it. | 11:30.20 |
malc_ | ator: fitz spews a lot of diagnostics for - http://port70.net/~nsz/c/c11/n1570.html | 12:25.28 |
Robin_Watts | Tamir_Evan: OK, back. | 13:58.00 |
| So, you're working from git, or from the release? | 13:58.16 |
| I think the answer is to build Release first, then ReleaseTesseract. Just checking now. I'll get it fixed. | 14:01.08 |
Tamir_Evan | Robin_Watts: I'm working from git. Last time I tried was this morning, from commit 32e4e8b4bcbacbf92af7c88337efae21986d9603 ("Bug 702958: Fix overflow in fz_clear_pixmap_with_value"), on the master branch. | 16:49.58 |
| Robin_Watts: "Release first, then ReleaseTesseract" seems a bit convoluted to me, but... O.K.. | 16:50.16 |
| Robin_Watts: Maybe I can build just a few key projects in Release, and then build the whole solution in ReleaseTesseract. I'll look into it tomorrow. | 16:50.29 |
Robin_Watts | Tamir_Evan: I'll have a fix tomorrow for you. | 16:50.43 |
| It's a simple config error on my part, but my attention is being occupied elsewhere at the moment. | 16:51.01 |
Tamir_Evan | Robin_Watts: O.K., thank you! I'll have a look sometime tomorrow, or the day after. | 16:52.38 |