| <<<Back 1 day (to 2020/11/04) | Fwd 1 day (to 2020/11/06)>>> | 20201105 |
ator | sebras: how far did you get in reviewing tor/master? | 13:05.30 |
sam_ | Robin_Watts: looks like you got there? :) | 13:05.55 |
| Sorry, I conked out last night. | 13:06.01 |
sebras | ator: Re "Remove support for Luratech JBIG2 and JPEG2000 decoders." I asked, should we go first before gs? | 13:06.19 |
ator | sebras: ah, I missed that q | 13:06.34 |
Robin_Watts | sam_: No problem. Touch wood, it appears to be working. Thanks for the suggestion. | 13:06.36 |
sebras | ator: we could, but what's the plan for gs? | 13:06.48 |
| ator: LGTM "mutool run: Support image mask when creating images from file names." | 13:06.54 |
sam_ | Robin_Watts: no problem, let me know if I can help further | 13:07.04 |
Robin_Watts | sebras: I believe we didn't supply luratech this time in the commercial release, but the support code is there. | 13:07.31 |
ator | maybe we should get HenryStiles to weigh in on that? | 13:07.37 |
Robin_Watts | The plan is to remove the support code either next release, or the one after. | 13:07.43 |
sebras | ator: Re "Add 'embedded' option to JBIG2 compressed buffers." I got stuck because I got confused about what this change is actually affecting since I constantly confuse the entire image api. :)P | 13:07.55 |
| Robin_Watts: I know that, but what is the plan for removeing the support code? | 13:08.10 |
| ator: so we're adding support for the embedded option in the jbig2 stream API, i.e. source/fitz/filter-jbig2.c but not in the file API in source/fitz/load-jbig2.c | 13:10.23 |
| and then in the next commit "Only use jbig2 subimage loading for multi-page images." you change muimg to use source/fitz/load-jbig2.c for multipage files and source/fitz/filter-jbig2.c for single page files..? | 13:11.17 |
Robin_Watts | sebras: Those two messages just crossed, right? | 13:11.21 |
sebras | Robin_Watts: yes, I didn't see yours. sorry. | 13:11.46 |
Robin_Watts | no worries. :) | 13:12.00 |
ator | sebras: yes. the first change is so that we can use fz_new_image_from_buffer for non-embedded non-PDF jbig2 images too | 13:13.21 |
| sebras: the second side-steps the "load everything as a pixmap" behavior of multi-page images that mucbz has special cased code for | 13:13.44 |
| so that when we open single-page jbig2 images, we use the compressed_buffer code for them, instead of always loading via a fz_pixmap | 13:14.34 |
sebras | ator: I need more hand holding. so when the PDF-stuff normally decodes a jpeg stream it ends up in pdf_load_image_imp() | 13:28.48 |
| this then calls pdf_load_compressed_stream() that decodes the stream into a buffer. this decoded buffer is then passed to fz_new_image_from_compressed()? | 13:29.32 |
| let's skip inline images right now. | 13:29.45 |
| hm... maybe that's not possible because what I just said is true if cstm == NULL. | 13:31.04 |
ator | pdf_load_image -> ... -> fz_new_image_from_compressed_buffer | 13:31.31 |
sebras | if cstm != NULL then we frist call fz_new_image_from_compressed_buffer() and _then_ call pdf_load_compressed_inline_image(). | 13:31.42 |
ator | and pdf_load_compressed_stream, which does the actual work of creating the compressed buffer | 13:32.07 |
| pdf_load_image_imp calls pdf_load_compressed_stream, and then fz_new_image_from_compressed_buffer wraps the compressed_buffer into a fz_image | 13:32.50 |
sebras | ah, shit. there's fz_compressed_buffer too. I expected fz_buffer. | 13:33.17 |
| and was confused as to why we decoded it. | 13:33.32 |
ator | pdf_load_compressed_stream -> pdf_load_image_stream -> pdf_open_image_stream -> pdf_open_filter -> build_filter -> fz_open_image_decomp_stream -> fz_open_jbig2d | 13:34.19 |
sebras | right, and what is the corresponding path for muimg? | 13:34.43 |
ator | a fz_image comes in several flavors. one is backed with compressed_buffer, another is backed with fz_pixmap. | 13:34.53 |
sebras | after your patch for single page and multi page jbig2 files. | 13:34.58 |
ator | fz_new_image_from_buffer -> fz_load_jbig2_info + malloc + fz_new_image_from_compressed_buffer | 13:36.14 |
| however, in the PDF case, since we have a compressed_buffer, build_filter short-stops when it gets to the jbig2 filter (since it knows that the compressed_buffer will take over from there) | 13:38.24 |
| sebras: so there are 3 paths for us to read jbig2 data | 13:39.09 |
sebras | mm, but then compressed_image_get_pixmap -> fz_open_image_decomp_stream_from_buffer + fz_decomp_image_from_stream ->? argghghghghghg. | 13:39.19 |
ator | 1) by just opening the PDF stream | 13:39.20 |
| 2) by wrapping the PDF stream in a compressed_buffer stuffed in a fz_image | 13:39.33 |
| 3) by using the multi-page fz_load_jbig2_subimage | 13:40.06 |
sebras | when are 1) and 2) used? | 13:40.20 |
| in what cases. | 13:40.32 |
| this is _utterly_ confusing to me. | 13:40.40 |
ator | mutool show uses (1) | 13:40.43 |
| to just decode the stream, without knowing or caring that it's an image | 13:40.54 |
| case (2) is when we know we're decoding an image, and want to pass the compressed image data to the device interface without decompressing it in the interpreter | 13:41.22 |
| case (3) you added. | 13:41.28 |
sebras | yes, case 3 I know I added. :) | 13:41.39 |
| doesn't mean I understand what I did. :) | 13:41.57 |
ator | both case (1) and case (2) end up with fz_open_jbig2d | 13:41.57 |
| the compressed_buffer is just a way to "pause" the chain of filters at a known level | 13:42.15 |
sebras | right, and the latter ends up in fz_load_jbig2(). | 13:42.16 |
ator | and then we can resume the chain when we want to use the image data, or we can use the compressed data as is | 13:42.38 |
| this is so we can go from an encrypted + ASCII85 + DCT encoded image to actual JPEG data stored in a fz_image + compressed_buffer | 13:43.35 |
| rather than having to decode the full image into a fz_pixmap when we encounter it | 13:43.45 |
sebras | yes. | 13:43.55 |
ator | then the device can decide whether it wants a pixmap, in which case it'd resume decoding the DCT/jbig2/whatever | 13:44.19 |
| or if it wants to say write a new PDF file on the output, it can copy the compressed image data | 13:44.31 |
sebras | so in build_compression_params() you set + params->u.jbig2.embedded = 1; /* jbig2 streams are always embedded without file headers */ | 13:44.53 |
| ARGH1 | 13:44.56 |
ator | why is that argh? | 13:45.11 |
sebras | (pasted the wrong thing) you set params->u.jbig2.embedded = 1 becuase that is the location where all PDF stuff will go throught. | 13:45.24 |
ator | jbig2 images in PDF files are always embedded, so we want that to be 1 | 13:45.36 |
sebras | but the file case will not pass throught build_filter() | 13:45.39 |
ator | fz_new_image_from_buffer takes a standalone JBIG2 image, and there you don't want that flag | 13:45.57 |
| this is so we can pass the correct flag to jbig2dec | 13:46.17 |
sebras | right, but at the right spot. | 13:46.35 |
| and that's what I was trying to understand. | 13:46.46 |
| i.e. would we require embedded jbig2 bitstreams in the file case too? | 13:47.03 |
| but since build_filter() will not be called in that case, then we won't. | 13:47.13 |
| because muimg will end up in fz_new_image_from_buffer(). | 13:47.23 |
| and not in pdf_load_compressed_stream(). | 13:47.35 |
| did I get that right? | 13:47.44 |
ator | fz_new_image_from_buffer leaves the flag zero from fz_malloc_struct(fz_compressed_buffer) | 13:47.48 |
| yes. | 13:47.58 |
| we could get rid a bit of spaghetti, by hooking the multi-page image loaders into the main ball of pasta :) | 13:50.01 |
| then there would be no case (3) | 13:50.11 |
| fz_load_pnm could go via either a compressed_buffer, or a fz_image.get_pixmap | 13:50.52 |
sebras | and in the muimg case we end up in fz_open_image_decomp_stream_from_buffer() only for two formats: jpeg and jbig2...? | 13:51.32 |
| yes, adding the multipage thing there might be the better option. | 13:52.03 |
ator | sebras: no, we end up there for all PDF images, I think | 13:52.24 |
sebras | well compressed_image_get_pixmap() (used by muimg) loads png/gif/bmp/tiff/pnm/jxr/jpx but jpeg and jbig2 are handled by calling fz_open_image_decomp_stream_from_buffer(). | 13:54.02 |
ator | raw, flate, ccittfax, etc. | 13:54.03 |
sebras | yes, in the pdf case, I meant in the muimg case. | 13:54.16 |
ator | I think it would be possible to add a fz_load_jbig2_subimage case to compressed_image_get_pixmap | 13:55.30 |
sebras | we don't open raw,flat,cittfax, etc in the muimg case..? | 13:55.31 |
ator | but we don't for historic reasons (since we need to be able to decode as a non-image anyway) | 13:55.42 |
| fz_new_image_from_buffer cannot create those | 13:56.24 |
sebras | ok, so LGTM "Add 'embedded' option to JBIG2 compressed buffers." and LGTM "Only use jbig2 subimage loading for multi-page images." | 13:57.01 |
ator | since those are not standalone image formats, there's no image width/height/colorspace etc headers | 13:57.02 |
sebras | because w/h/cs is specified in PDF dicts. | 13:57.43 |
ator | yup. | 13:57.53 |
| all PDF images end up in the default: case of compressed_image_get_pixmap | 13:58.47 |
sebras | and xps/svg/fb2/html/etc call use fz_new_image_from_buffer() since they don't use embedded streams like PDF, but rather actual _files_. | 13:59.19 |
ator | y. | 13:59.26 |
| sebras: did you nod at "mutool run: Support image mask when creating images from file names." already? | 14:01.58 |
sebras | y | 14:05.07 |
| "Add JBIG2 support to pdf_add_image." is next. | 14:05.20 |
ator | sebras: I think I missed a drop so we leak the globals->data buffer | 14:09.44 |
| fixup for that on tor/master | 14:10.16 |
Wizzup | I pulled in the 5 JBIG2 patches, rebuild mupdf locally, and I've been able to generate the PDFs I wanted to generate with JBIG2 images instead of ccitt - thank you! | 14:38.39 |
malc_ | sobras, ator: is it (or will ever be) possible to load img-pdf.jbig2? | 14:45.04 |
Robin_Watts | malc_: IIRC, the big problem is that jbig2 doesn't have a header on it. | 15:00.39 |
| so you can't immediately spot that something is a jbig2 from looking at the file. | 15:01.00 |
malc_ | Robin_Watts: perhaps, it's still a bit puzzling (for an outsider - me) that img.jbig2 is okay but img-pdf.jbig2 is problematic | 15:02.44 |
Robin_Watts | ok, there is clearly stuff I don't follow here, so ignore my comments. | 15:03.23 |
malc_ | can do | 15:05.26 |
ator | malc_: there's no way for us to know that it is a jbig2 file -- no magic signature | 15:12.53 |
malc_ | ator: gotcha | 15:14.18 |
| ator: img.jbig2 has this JB2 near the start of the file, "non mandatory" filler? | 15:16.40 |
ator | malc_: "JB2\r\n\x1a\n" to be accurate | 15:18.28 |
malc_ | ator: now pray tell how my description WASN'T accurate? | 15:20.12 |
ator | img- | 15:21.27 |
| malc_: there are 3 formats of jbig2 data. 2 have a file header, one is embedded data that is not a valid standalone file. | 15:22.21 |
| img-pdf.jbig2 is the latter. | 15:22.38 |
malc_ | ator: tack | 15:23.09 |
sebras | so if one wants to add a jbig2 _file_ to a PDF, then this file header must be stripped out, and the naked jbig2 _bitstream_ is put inside the PDF. | 15:40.38 |
| this is what the next patch to review is all about I think. | 15:40.51 |
ator | sebras: yes. | 16:06.05 |
| the next patch is all about stripping the file header and maybe reorganizing the segment headers to form an embedded jbig2 stream | 16:06.25 |
sebras | yeah, I'm looking. | 16:08.15 |
| ator: in pdf_parse_jbig2_segment_header, where does 11 in the first length check come from? 4 byte segment number + 1 byte flags + 4 bytes referred to segment count ought to be the worst case..? | 16:30.01 |
| the message for the last throw in pdf_copy_jbig2_random_segments() should probably refer to "segment data" not "segment header". | 16:39.28 |
| ator: I believe that the end of file segment is optional for files that are sequential. | 16:42.52 |
| nvm the last message, I mixed up the random and sequential cases in your reordering. | 16:56.07 |
| ator: right, so in the future we'd want add_image_res() to supply a image index number to fz_new_image_from_file() so we can ask for a particular page. | 17:18.08 |
| but that means the entire fz_image must be multipage. | 17:18.17 |
ator | sebras: or that the image/compressed buffer has a page number associated with it | 17:18.55 |
sebras | ator: Re "Add JBIG2 support to pdf_add_image." I wonder about the 11 and the error messages (2 of them) mentioned above. otherwise LGTM | 17:19.16 |
| also: JBIG2 is a convenient format. :-/ | 17:19.38 |
| ator: anyhow multipage is next patch. :) | 17:20.40 |
ator | sebras: 11 is the minimum file header size | 17:20.57 |
| the second segment header error is an error from parsing the header, even if we're at the data | 17:21.29 |
| the header said there would be more data than there is | 17:21.37 |
| they could of course be reworded better, error messages can always be improved :) | 17:22.00 |
sebras | yes please. | 17:22.06 |
| why does the minimum file header size matter when you are parsing a segment header? | 17:22.47 |
| and isn't the file header size 9 bytes (8 byte id + 1 byte flags (indicating unknown number of pages) | 17:24.08 |
| the minimum page header appears to be 10 bytes? (4 byte segment number + 1 byte segment flags + 1 byte referred to segment count + 4 byte segment data length)... | 17:26.08 |
| I must be missing something. | 17:26.17 |
| ator: LGTM "Bug 703077: Confirm reload if the document has unsaved changes." | 17:30.30 |
Robin_Watts | test | 18:13.12 |
ator | sebras: you missed the return to segment data | 18:22.25 |
| sebras: in any case, I took the number from jbig2dec jbig2_segment.c line 55 | 18:23.28 |
sebras | ator: return to segment data? | 18:24.17 |
| ator: referred to segment data? but "count of referred-to segments" might be < 4 in which case the field in 7.2.4 only occupies a single byte which includes both the count _and_ the retain bits. | 18:31.46 |
| I think giles counted them wrong. | 18:32.16 |
| too | 18:32.22 |
ator | I thought so too, and recounted and got 11 but I don't remember how anymore | 18:43.25 |
sebras | ator: at least it is consistent with jbig2dec. | 18:43.54 |
| ator: do you still have your WIP Makelists commit somewhere? | 18:44.39 |
| ator: I reviewed that one, but want to see what you have changed since if anything. | 18:44.58 |
| I've lost it and neither git fsck --dangling nor git reflog finds it. | 18:45.25 |
ator | sebras: there's an older version of it on tor/wip4 | 18:46.15 |
sebras | ator: thanks. | 18:56.45 |
| ator: why does HAVE_LEPTONICA, HAVE_TESSERACT and HAVE_EXTRACT all end up in CFLAGS when other libraries like jpegxr get their flags added to THIRD_CFLAGS? | 18:59.08 |
| ator: CURL's variables are not set similar to any other other libraries. | 19:02.15 |
| I know it is the odd one out since we never build it ourselves. | 19:02.32 |
| but still. | 19:02.43 |
| ator: the part that you really wanted my opinion on was the android parts, and to me they look good. I have not testcompiled. should I? | 19:07.16 |
Robin_Watts | HAVE_LEPTONICA is not a flag for leptonica. | 19:17.26 |
| It's a flag that the main code needs to get to determine whether or not it uses the appropriate bits of thirdparty code. | 19:18.05 |
sebras | I know, but then again HAVE_JPEGXR wouldn't be a flag for jpegxr either. | 19:18.09 |
Robin_Watts | OK. I can't argue that. | 19:18.19 |
sebras | HAVE_JPEEGXR is supplied to THIRD_CFLAGS, HAVE_LEPTONICA to CFLAGS. my observation is that it is inconsistent and my question is "why?". | 19:18.55 |
| not sure if it is an oversight. :) | 19:19.21 |
| <<<Back 1 day (to 2020/11/04) | Forward 1 day (to 2020/11/06)>>> | |