MuPDF IRC logs

	<<<Back 1 day (to 2020/11/04)	Fwd 1 day (to 2020/11/06)>>>	20201105
ator	sebras: how far did you get in reviewing tor/master?		13:05.30
sam_	Robin_Watts: looks like you got there? :)		13:05.55
	Sorry, I conked out last night.		13:06.01
sebras	ator: Re "Remove support for Luratech JBIG2 and JPEG2000 decoders." I asked, should we go first before gs?		13:06.19
ator	sebras: ah, I missed that q		13:06.34
Robin_Watts	sam_: No problem. Touch wood, it appears to be working. Thanks for the suggestion.		13:06.36
sebras	ator: we could, but what's the plan for gs?		13:06.48
	ator: LGTM "mutool run: Support image mask when creating images from file names."		13:06.54
sam_	Robin_Watts: no problem, let me know if I can help further		13:07.04
Robin_Watts	sebras: I believe we didn't supply luratech this time in the commercial release, but the support code is there.		13:07.31
ator	maybe we should get HenryStiles to weigh in on that?		13:07.37
Robin_Watts	The plan is to remove the support code either next release, or the one after.		13:07.43
sebras	ator: Re "Add 'embedded' option to JBIG2 compressed buffers." I got stuck because I got confused about what this change is actually affecting since I constantly confuse the entire image api. :)P		13:07.55
	Robin_Watts: I know that, but what is the plan for removeing the support code?		13:08.10
	ator: so we're adding support for the embedded option in the jbig2 stream API, i.e. source/fitz/filter-jbig2.c but not in the file API in source/fitz/load-jbig2.c		13:10.23
	and then in the next commit "Only use jbig2 subimage loading for multi-page images." you change muimg to use source/fitz/load-jbig2.c for multipage files and source/fitz/filter-jbig2.c for single page files..?		13:11.17
Robin_Watts	sebras: Those two messages just crossed, right?		13:11.21
sebras	Robin_Watts: yes, I didn't see yours. sorry.		13:11.46
Robin_Watts	no worries. :)		13:12.00
ator	sebras: yes. the first change is so that we can use fz_new_image_from_buffer for non-embedded non-PDF jbig2 images too		13:13.21
	sebras: the second side-steps the "load everything as a pixmap" behavior of multi-page images that mucbz has special cased code for		13:13.44
	so that when we open single-page jbig2 images, we use the compressed_buffer code for them, instead of always loading via a fz_pixmap		13:14.34
sebras	ator: I need more hand holding. so when the PDF-stuff normally decodes a jpeg stream it ends up in pdf_load_image_imp()		13:28.48
	this then calls pdf_load_compressed_stream() that decodes the stream into a buffer. this decoded buffer is then passed to fz_new_image_from_compressed()?		13:29.32
	let's skip inline images right now.		13:29.45
	hm... maybe that's not possible because what I just said is true if cstm == NULL.		13:31.04
ator	pdf_load_image -> ... -> fz_new_image_from_compressed_buffer		13:31.31
sebras	if cstm != NULL then we frist call fz_new_image_from_compressed_buffer() and _then_ call pdf_load_compressed_inline_image().		13:31.42
ator	and pdf_load_compressed_stream, which does the actual work of creating the compressed buffer		13:32.07
	pdf_load_image_imp calls pdf_load_compressed_stream, and then fz_new_image_from_compressed_buffer wraps the compressed_buffer into a fz_image		13:32.50
sebras	ah, shit. there's fz_compressed_buffer too. I expected fz_buffer.		13:33.17
	and was confused as to why we decoded it.		13:33.32
ator	pdf_load_compressed_stream -> pdf_load_image_stream -> pdf_open_image_stream -> pdf_open_filter -> build_filter -> fz_open_image_decomp_stream -> fz_open_jbig2d		13:34.19
sebras	right, and what is the corresponding path for muimg?		13:34.43
ator	a fz_image comes in several flavors. one is backed with compressed_buffer, another is backed with fz_pixmap.		13:34.53
sebras	after your patch for single page and multi page jbig2 files.		13:34.58
ator	fz_new_image_from_buffer -> fz_load_jbig2_info + malloc + fz_new_image_from_compressed_buffer		13:36.14
	however, in the PDF case, since we have a compressed_buffer, build_filter short-stops when it gets to the jbig2 filter (since it knows that the compressed_buffer will take over from there)		13:38.24
	sebras: so there are 3 paths for us to read jbig2 data		13:39.09
sebras	mm, but then compressed_image_get_pixmap -> fz_open_image_decomp_stream_from_buffer + fz_decomp_image_from_stream ->? argghghghghghg.		13:39.19
ator	1) by just opening the PDF stream		13:39.20
	2) by wrapping the PDF stream in a compressed_buffer stuffed in a fz_image		13:39.33
	3) by using the multi-page fz_load_jbig2_subimage		13:40.06
sebras	when are 1) and 2) used?		13:40.20
	in what cases.		13:40.32
	this is _utterly_ confusing to me.		13:40.40
ator	mutool show uses (1)		13:40.43
	to just decode the stream, without knowing or caring that it's an image		13:40.54
	case (2) is when we know we're decoding an image, and want to pass the compressed image data to the device interface without decompressing it in the interpreter		13:41.22
	case (3) you added.		13:41.28
sebras	yes, case 3 I know I added. :)		13:41.39
	doesn't mean I understand what I did. :)		13:41.57
ator	both case (1) and case (2) end up with fz_open_jbig2d		13:41.57
	the compressed_buffer is just a way to "pause" the chain of filters at a known level		13:42.15
sebras	right, and the latter ends up in fz_load_jbig2().		13:42.16
ator	and then we can resume the chain when we want to use the image data, or we can use the compressed data as is		13:42.38
	this is so we can go from an encrypted + ASCII85 + DCT encoded image to actual JPEG data stored in a fz_image + compressed_buffer		13:43.35
	rather than having to decode the full image into a fz_pixmap when we encounter it		13:43.45
sebras	yes.		13:43.55
ator	then the device can decide whether it wants a pixmap, in which case it'd resume decoding the DCT/jbig2/whatever		13:44.19
	or if it wants to say write a new PDF file on the output, it can copy the compressed image data		13:44.31
sebras	so in build_compression_params() you set + params->u.jbig2.embedded = 1; /* jbig2 streams are always embedded without file headers */		13:44.53
	ARGH1		13:44.56
ator	why is that argh?		13:45.11
sebras	(pasted the wrong thing) you set params->u.jbig2.embedded = 1 becuase that is the location where all PDF stuff will go throught.		13:45.24
ator	jbig2 images in PDF files are always embedded, so we want that to be 1		13:45.36
sebras	but the file case will not pass throught build_filter()		13:45.39
ator	fz_new_image_from_buffer takes a standalone JBIG2 image, and there you don't want that flag		13:45.57
	this is so we can pass the correct flag to jbig2dec		13:46.17
sebras	right, but at the right spot.		13:46.35
	and that's what I was trying to understand.		13:46.46
	i.e. would we require embedded jbig2 bitstreams in the file case too?		13:47.03
	but since build_filter() will not be called in that case, then we won't.		13:47.13
	because muimg will end up in fz_new_image_from_buffer().		13:47.23
	and not in pdf_load_compressed_stream().		13:47.35
	did I get that right?		13:47.44
ator	fz_new_image_from_buffer leaves the flag zero from fz_malloc_struct(fz_compressed_buffer)		13:47.48
	yes.		13:47.58
	we could get rid a bit of spaghetti, by hooking the multi-page image loaders into the main ball of pasta :)		13:50.01
	then there would be no case (3)		13:50.11
	fz_load_pnm could go via either a compressed_buffer, or a fz_image.get_pixmap		13:50.52
sebras	and in the muimg case we end up in fz_open_image_decomp_stream_from_buffer() only for two formats: jpeg and jbig2...?		13:51.32
	yes, adding the multipage thing there might be the better option.		13:52.03
ator	sebras: no, we end up there for all PDF images, I think		13:52.24
sebras	well compressed_image_get_pixmap() (used by muimg) loads png/gif/bmp/tiff/pnm/jxr/jpx but jpeg and jbig2 are handled by calling fz_open_image_decomp_stream_from_buffer().		13:54.02
ator	raw, flate, ccittfax, etc.		13:54.03
sebras	yes, in the pdf case, I meant in the muimg case.		13:54.16
ator	I think it would be possible to add a fz_load_jbig2_subimage case to compressed_image_get_pixmap		13:55.30
sebras	we don't open raw,flat,cittfax, etc in the muimg case..?		13:55.31
ator	but we don't for historic reasons (since we need to be able to decode as a non-image anyway)		13:55.42
	fz_new_image_from_buffer cannot create those		13:56.24
sebras	ok, so LGTM "Add 'embedded' option to JBIG2 compressed buffers." and LGTM "Only use jbig2 subimage loading for multi-page images."		13:57.01
ator	since those are not standalone image formats, there's no image width/height/colorspace etc headers		13:57.02
sebras	because w/h/cs is specified in PDF dicts.		13:57.43
ator	yup.		13:57.53
	all PDF images end up in the default: case of compressed_image_get_pixmap		13:58.47
sebras	and xps/svg/fb2/html/etc call use fz_new_image_from_buffer() since they don't use embedded streams like PDF, but rather actual _files_.		13:59.19
ator	y.		13:59.26
	sebras: did you nod at "mutool run: Support image mask when creating images from file names." already?		14:01.58
sebras	y		14:05.07
	"Add JBIG2 support to pdf_add_image." is next.		14:05.20
ator	sebras: I think I missed a drop so we leak the globals->data buffer		14:09.44
	fixup for that on tor/master		14:10.16
Wizzup	I pulled in the 5 JBIG2 patches, rebuild mupdf locally, and I've been able to generate the PDFs I wanted to generate with JBIG2 images instead of ccitt - thank you!		14:38.39
malc_	sobras, ator: is it (or will ever be) possible to load img-pdf.jbig2?		14:45.04
Robin_Watts	malc_: IIRC, the big problem is that jbig2 doesn't have a header on it.		15:00.39
	so you can't immediately spot that something is a jbig2 from looking at the file.		15:01.00
malc_	Robin_Watts: perhaps, it's still a bit puzzling (for an outsider - me) that img.jbig2 is okay but img-pdf.jbig2 is problematic		15:02.44
Robin_Watts	ok, there is clearly stuff I don't follow here, so ignore my comments.		15:03.23
malc_	can do		15:05.26
ator	malc_: there's no way for us to know that it is a jbig2 file -- no magic signature		15:12.53
malc_	ator: gotcha		15:14.18
	ator: img.jbig2 has this JB2 near the start of the file, "non mandatory" filler?		15:16.40
ator	malc_: "JB2\r\n\x1a\n" to be accurate		15:18.28
malc_	ator: now pray tell how my description WASN'T accurate?		15:20.12
ator	img-		15:21.27
	malc_: there are 3 formats of jbig2 data. 2 have a file header, one is embedded data that is not a valid standalone file.		15:22.21
	img-pdf.jbig2 is the latter.		15:22.38
malc_	ator: tack		15:23.09
sebras	so if one wants to add a jbig2 _file_ to a PDF, then this file header must be stripped out, and the naked jbig2 _bitstream_ is put inside the PDF.		15:40.38
	this is what the next patch to review is all about I think.		15:40.51
ator	sebras: yes.		16:06.05
	the next patch is all about stripping the file header and maybe reorganizing the segment headers to form an embedded jbig2 stream		16:06.25
sebras	yeah, I'm looking.		16:08.15
	ator: in pdf_parse_jbig2_segment_header, where does 11 in the first length check come from? 4 byte segment number + 1 byte flags + 4 bytes referred to segment count ought to be the worst case..?		16:30.01
	the message for the last throw in pdf_copy_jbig2_random_segments() should probably refer to "segment data" not "segment header".		16:39.28
	ator: I believe that the end of file segment is optional for files that are sequential.		16:42.52
	nvm the last message, I mixed up the random and sequential cases in your reordering.		16:56.07
	ator: right, so in the future we'd want add_image_res() to supply a image index number to fz_new_image_from_file() so we can ask for a particular page.		17:18.08
	but that means the entire fz_image must be multipage.		17:18.17
ator	sebras: or that the image/compressed buffer has a page number associated with it		17:18.55
sebras	ator: Re "Add JBIG2 support to pdf_add_image." I wonder about the 11 and the error messages (2 of them) mentioned above. otherwise LGTM		17:19.16
	also: JBIG2 is a convenient format. :-/		17:19.38
	ator: anyhow multipage is next patch. :)		17:20.40
ator	sebras: 11 is the minimum file header size		17:20.57
	the second segment header error is an error from parsing the header, even if we're at the data		17:21.29
	the header said there would be more data than there is		17:21.37
	they could of course be reworded better, error messages can always be improved :)		17:22.00
sebras	yes please.		17:22.06
	why does the minimum file header size matter when you are parsing a segment header?		17:22.47
	and isn't the file header size 9 bytes (8 byte id + 1 byte flags (indicating unknown number of pages)		17:24.08
	the minimum page header appears to be 10 bytes? (4 byte segment number + 1 byte segment flags + 1 byte referred to segment count + 4 byte segment data length)...		17:26.08
	I must be missing something.		17:26.17
	ator: LGTM "Bug 703077: Confirm reload if the document has unsaved changes."		17:30.30
Robin_Watts	test		18:13.12
ator	sebras: you missed the return to segment data		18:22.25
	sebras: in any case, I took the number from jbig2dec jbig2_segment.c line 55		18:23.28
sebras	ator: return to segment data?		18:24.17
	ator: referred to segment data? but "count of referred-to segments" might be < 4 in which case the field in 7.2.4 only occupies a single byte which includes both the count _and_ the retain bits.		18:31.46
	I think giles counted them wrong.		18:32.16
	too		18:32.22
ator	I thought so too, and recounted and got 11 but I don't remember how anymore		18:43.25
sebras	ator: at least it is consistent with jbig2dec.		18:43.54
	ator: do you still have your WIP Makelists commit somewhere?		18:44.39
	ator: I reviewed that one, but want to see what you have changed since if anything.		18:44.58
	I've lost it and neither git fsck --dangling nor git reflog finds it.		18:45.25
ator	sebras: there's an older version of it on tor/wip4		18:46.15
sebras	ator: thanks.		18:56.45
	ator: why does HAVE_LEPTONICA, HAVE_TESSERACT and HAVE_EXTRACT all end up in CFLAGS when other libraries like jpegxr get their flags added to THIRD_CFLAGS?		18:59.08
	ator: CURL's variables are not set similar to any other other libraries.		19:02.15
	I know it is the odd one out since we never build it ourselves.		19:02.32
	but still.		19:02.43
	ator: the part that you really wanted my opinion on was the android parts, and to me they look good. I have not testcompiled. should I?		19:07.16
Robin_Watts	HAVE_LEPTONICA is not a flag for leptonica.		19:17.26
	It's a flag that the main code needs to get to determine whether or not it uses the appropriate bits of thirdparty code.		19:18.05
sebras	I know, but then again HAVE_JPEGXR wouldn't be a flag for jpegxr either.		19:18.09
Robin_Watts	OK. I can't argue that.		19:18.19
sebras	HAVE_JPEEGXR is supplied to THIRD_CFLAGS, HAVE_LEPTONICA to CFLAGS. my observation is that it is inconsistent and my question is "why?".		19:18.55
	not sure if it is an oversight. :)		19:19.21
	<<<Back 1 day (to 2020/11/04)	Forward 1 day (to 2020/11/06)>>>

Log of #mupdf at irc.freenode.net.