MuPDF IRC logs

20201014
sebras	ator: Wizzup: ah I see what ator mentioned now, I first needed to do this:		04:23.21
	diff --git a/source/cbz/muimg.c b/source/cbz/muimg.c		04:23.21
	@@ -168,7 +168,6 @@ img_open_document_with_stream(fz_context *ctx, fz_stream *file)		04:23.30
	else if (fmt == FZ_IMAGE_JBIG2)		04:23.33
	{		04:23.35
	to be able to reproduce the problem.		04:23.36
	doc->page_count = fz_load_jbig2_subimage_count(ctx, data, len);		04:23.39
	- doc->load_subimage = fz_load_jbig2_subimage;		04:23.42
	$ ./build/sanitize/mutool draw -st ./x/wizzup/img.jbig2		04:23.46
	page ./x/wizzup/img.jbig2 1warning: jbig2dec error: page has no image, cannot be completed (segment 4294967295)		04:23.46
	error: cannot complete jbig2 image		04:23.49
	warning: read error; treating as end of file		04:23.51
	warning: padding truncated image		04:23.54
	so what jbig2enc created for you is a JBIG2 file.		04:24.13
	files come with an initial header with 8 byte magic, a 1 byte flags field and a 4 byte page count.		04:24.55
	the internal JBIG2 image segments do not start until offset 13 in img.jbig2		04:25.18
	the problem is that JBIG2 files can be organized in two ways:		04:25.32
	1) <file header><segment1 header><segment1 data><segment2 header><segment2 data><segment3 header><segment3 data>		04:30.58
	or		04:31.01
	2) <file header><segment1 header><segment2 header><segment3 header><segment1 data><segment2 data><segment3 data>		04:31.05
	I'm suspecting that it is organized the wrong way.		04:31.14
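Which of the two layouts a file uses is recorded in its header. The following is a minimal sketch, assuming the layout given in the JBIG2 specification: the 8 byte magic, then one flags byte (bit 0 set for layout 1, bit 1 set when the page count is unknown), then a 4 byte page count when known. That is consistent with segments starting at offset 13. The enum and function names are hypothetical and not part of MuPDF or jbig2dec.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch; not MuPDF or jbig2dec API. */
enum sketch_jbig2_org
{
	SKETCH_JBIG2_NOT_A_FILE,
	SKETCH_JBIG2_SEQUENTIAL,    /* layout 1: header, data, header, data, ... */
	SKETCH_JBIG2_RANDOM_ACCESS  /* layout 2: all headers, then all data */
};

/*
 * If buf starts with the JBIG2 file magic, report the organisation and
 * store the offset of the first segment header in *segments_at.
 */
static enum sketch_jbig2_org
sketch_jbig2_file_organisation(const unsigned char *buf, size_t len, size_t *segments_at)
{
	static const unsigned char magic[8] =
		{ 0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A };
	unsigned char flags;
	size_t header_len;

	if (len < 9 || memcmp(buf, magic, 8) != 0)
		return SKETCH_JBIG2_NOT_A_FILE;

	flags = buf[8];
	/* Bit 1 set: page count unknown, so its 4 byte field is absent. */
	header_len = (flags & 2) ? 9 : 13;
	if (len < header_len)
		return SKETCH_JBIG2_NOT_A_FILE;

	*segments_at = header_len;
	return (flags & 1) ? SKETCH_JBIG2_SEQUENTIAL : SKETCH_JBIG2_RANDOM_ACCESS;
}
```

A caller would pass the first bytes of the file and branch on the returned value before deciding how to feed the data to a decoder.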
Wizzup	What is the right way from mupdf's perspective?		07:03.03
Tamir_Evan	In Visual Studio, how do I make a release build of mupdf, that has the tesseract functionality, but doesn't require libluratech.lib?		07:56.46
	The 'Release' configuration builds successfully without needing libluratech, but doesn't compile (and link in) the tesseract and leptonica libs (even when sources are in place).		07:57.01
	The 'ReleaseTesseract' configuration compiles the tesseract and leptonica libs, but fails with "LNK1181: cannot open input file '<path to mupdf source>\platform\win32\Release\libluratech.lib'" at link time.		07:57.07
ator	sebras: problem I see is we can read it one way but not the other.		08:36.19
	muimg has a special case to read it the way it works, but when loading it using the normal image API, it doesn't work		08:36.54
sebras	ator: well, PDF expects JBIG2 streams in embedded organisation, which means no file header and then each segment is stored by combining its header and data. like in organisation 1) above.		08:57.05
	JBIG2 _files_ are normally stored with a file header but can use either organisation		08:57.29
	in file mode all segments must be stored in numerically increasing order, but in PDF's embedded mode that is not a requirement.		08:58.13
ator	sebras: fz_load_jbig2_subimage works, fz_new_image_from_buffer doesn't, on the same file.		08:58.38
	why does one work and the other fail?		08:59.10
	I guess it comes down to differences in fz_load_jbig2 and fz_open_jbig2d		09:01.00
sebras	(gdb) bt		09:01.15
	#0 jbig2_ctx_new_imp (allocator=0x555557c62bf0, options=JBIG2_OPTIONS_EMBEDDED, global_ctx=0x0, error_callback=0x55555572aa88 <error_callback>, error_callback_data=0x555557c2c2a0, jbig2_version_major=0, jbig2_version_minor=18) at thirdparty/jbig2dec/jbig2.c:107		09:01.16
	#1 0x000055555572ae49 in fz_open_jbig2d (ctx=0x555557c2c2a0, chain=0x555557c43b10, globals=0x0) at source/fitz/filter-jbig2.c:432		09:01.19
	#2 0x0000555555709830 in fz_open_image_decomp_stream (ctx=0x555557c2c2a0, tail=0x555557c43b10, params=0x555557c5b740, l2factor=0x7fffffffc924) at source/fitz/compressed-buffer.c:72		09:01.22
	#3 0x000055555570965b in fz_open_image_decomp_stream_from_buffer (ctx=0x555557c2c2a0, buffer=0x555557c5b740, l2factor=0x7fffffffc924) at source/fitz/compressed-buffer.c:25		09:01.25
	#4 0x0000555555611dc5 in compressed_image_get_pixmap (ctx=0x555557c2c2a0, image_=0x555557c435d0, subarea=0x7fffffffc904, w=2414, h=3560, l2factor=0x7fffffffc924) at source/fitz/image.c:572		09:01.28
	#5 0x0000555555612758 in fz_get_pixmap_from_image (ctx=0x555557c2c2a0, image=0x555557c435d0, subarea=0x7fffffffc9d0, ctm=0x7fffffffca20, dw=0x7fffffffca14, dh=0x7fffffffca10) at source/fitz/image.c:770		09:01.32
	#6 0x00005555555e66f9 in fz_draw_fill_image (ctx=0x555557c2c2a0, devp=0x555557c5b860, image=0x555557c435d0, in_ctm=..., alpha=1, color_params=...) at source/fitz/draw-device.c:1746		09:01.36
	#7 0x00005555555d9c86 in fz_fill_image (ctx=0x555557c2c2a0, dev=0x555557c5b860, image=0x555557c435d0, ctm=..., alpha=1, color_params=...) at source/fitz/device.c:329		09:01.39
	#8 0x0000555555618cf2 in fz_run_display_list (ctx=0x555557c2c2a0, list=0x555557c5b7b0, dev=0x555557c5b860, top_ctm=..., scissor=..., cookie=0x7fffffffd470) at source/fitz/list-device.c:1742		09:01.42
	#9 0x00005555555a22d6 in drawband (ctx=0x555557c2c2a0, page=0x555557c5b5e0, list=0x555557c5b7b0, ctm=..., tbounds=..., cookie=0x7fffffffd470, band_start=0, pix=0x555557c445e0, bit=0x7fffffffd010) at source/tools/mudraw.c:584		09:01.46
	#10 0x00005555555a453b in dodrawpage (ctx=0x555557c2c2a0, page=0x555557c5b5e0, list=0x555557c5b7b0, pagenum=1, cookie=0x7fffffffd470, start=0, interptime=0, fname=0x7fffffffeb25 "./x/wizzup/img.jbig2", bg=0, seps=0x0) at source/tools/mudraw.c:1062		09:01.50
	#11 0x00005555555a550a in drawpage (ctx=0x555557c2c2a0, doc=0x555557c44640, pagenum=1) at source/tools/mudraw.c:1385		09:01.54
	#12 0x00005555555a56c4 in drawrange (ctx=0x555557c2c2a0, doc=0x555557c44640, range=0x555555a1f75c "") at source/tools/mudraw.c:1424		09:01.57
	#13 0x00005555555a7e89 in mudraw_main (argc=3, argv=0x7fffffffe820) at source/tools/mudraw.c:2363		09:02.01
	#14 0x00005555555a10cc in main (argc=4, argv=0x7fffffffe818) at source/tools/mutool.c:130		09:02.03
	#0 jbig2_ctx_new_imp (allocator=0x7fffffffd0c0, options=(unknown: 0), global_ctx=0x0, error_callback=0x55555561f47f <error_callback>, error_callback_data=0x555557c2c2a0, jbig2_version_major=0, jbig2_version_minor=18) at thirdparty/jbig2dec/jbig2.c:107		09:02.06
	#1 0x000055555561f6af in jbig2_read_image (ctx=0x555557c2c2a0, jbig2=0x7fffffffd120, buf=0x555557c45b50 "\227JB2\r\n\032\n\001", len=69098, only_metadata=0, subimage=0) at source/fitz/load-jbig2.c:310		09:02.10
	#2 0x000055555561fb95 in fz_load_jbig2_subimage (ctx=0x555557c2c2a0, buf=0x555557c45b50 "\227JB2\r\n\032\n\001", len=69098, subimage=0) at source/fitz/load-jbig2.c:413		09:02.14
	#3 0x00005555557011ca in img_load_page (ctx=0x555557c2c2a0, doc_=0x555557c44640, chapter=0, number=0) at source/cbz/muimg.c:95		09:02.17
	#4 0x00005555555dc13c in fz_load_chapter_page (ctx=0x555557c2c2a0, doc=0x555557c44640, chapter=0, number=0) at source/fitz/document.c:534		09:02.21
	#5 0x00005555555dbc5d in fz_load_page (ctx=0x555557c2c2a0, doc=0x555557c44640, number=0) at source/fitz/document.c:404		09:02.23
	#6 0x00005555555a4c98 in drawpage (ctx=0x555557c2c2a0, doc=0x555557c44640, pagenum=1) at source/tools/mudraw.c:1233		09:02.26
	#7 0x00005555555a56c4 in drawrange (ctx=0x555557c2c2a0, doc=0x555557c44640, range=0x555555a1f75c "") at source/tools/mudraw.c:1424		09:02.30
	#8 0x00005555555a7e89 in mudraw_main (argc=3, argv=0x7fffffffe820) at source/tools/mudraw.c:2363		09:02.33
	#9 0x00005555555a10cc in main (argc=4, argv=0x7fffffffe818) at source/tools/mutool.c:130		09:02.36
	likely because options have different values.		09:02.38
ator	JBIG2_OPTIONS_EMBEDDED in one but not the other?		09:03.03
sebras	yes		09:03.07
ator	can we pre-scan for the magic file headers and set the option appropriately?		09:03.32
sebras	per spec PDF requires JBIG2 streams to be in embedded organisation		09:04.02
ator	yeah, but we're able to load JBIG2 streams from files too		09:04.21
sebras	per spec PDF does not accept the magic file header.		09:04.32
ator	or we shouldn't have added JBIG2 to muimg and fz_new_image_from_buffer		09:04.35
Wizzup	How would you know if a file is a JBIG2 image file if it doesn't have the header?		09:04.55
ator	or we need to pass that along with the compressed_buffer constructor		09:05.05
	a separate enum for JBIG2_EMBEDDED and JBIG2_FILE		09:05.16
sebras	Wizzup: in PDF the object which contains the JBIG2 data stream states that it should be decoded using JBIG2Decode		09:05.39
ator	sebras: yeah, if I change the JBIG2_OPTIONS_EMBEDDED to 0 in fz_open_jbig2d the file converts and opens with the default image api		09:06.44
	sebras: is there a magic number in the file header we can look for?		09:07.11
sebras	I don't remember when I added JBIG2 to muimg and fz_new_image_from_buffer(). likely because I had no other JBIG2 viewer and I wanted to be able to view the JBIG2 streams that I could extract from a PDF.		09:07.34
	ator: there is.		09:07.39
	ator: but the file that Wizzup sent _does_ contain the file header as indicated by the 8 byte magic number:		09:08.24
	$ xxd -g1 x/wizzup/img.jbig2 | head		09:08.27
	00000000: 97 4a 42 32 0d 0a 1a 0a 01 00 00 00 01 00 00 00 .JB2............		09:08.30
	00000010: 00 30 00 01 00 00 00 13 00 00 09 6e 00 00 0d e8 .0.........n....		09:08.32
	if you change it to 0, then a file header is required.		09:09.13
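The two dump lines above can be read by hand. The sketch below assumes the segment header layout from the JBIG2 specification in its shortest form (4 byte segment number, 1 byte flags whose low 6 bits are the type, 1 byte referred-to count, 1 byte page association, 4 byte data length) and that a page information segment (type 48) begins with a 4 byte width and a 4 byte height. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields this sketch extracts. */
struct sketch_first_segment
{
	uint32_t number;
	unsigned type;
	uint32_t data_len;
	uint32_t width;
	uint32_t height;
};

/* Read a 32 bit big-endian value. */
static uint32_t sketch_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
		((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode a shortest-form segment header at buf[off], followed by page
 * information data. Returns 0 on success, -1 for any other shape.
 */
static int
sketch_decode_first_segment(const unsigned char *buf, size_t len, size_t off,
	struct sketch_first_segment *out)
{
	const unsigned char *p;

	if (off > len || len - off < 11 + 8)
		return -1;
	p = buf + off;

	/* 4 byte page association or referred-to segments: not handled here. */
	if ((p[4] & 0x40) || (p[5] >> 5) != 0)
		return -1;

	out->number = sketch_be32(p);
	out->type = p[4] & 0x3F;
	out->data_len = sketch_be32(p + 7);
	if (out->type != 48)
		return -1;

	out->width = sketch_be32(p + 11);
	out->height = sketch_be32(p + 15);
	return 0;
}
```

Fed the 32 bytes of the dump with an offset of 13, this yields segment 0 of type 48 with 19 bytes of data, width 2414 and height 3560, the same w and h that appear in frame #4 of the first backtrace.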
ator	can't jbig2dec detect the file header automagically and just do the right thing?		09:09.32
	is there info in the file header that is required for decoding, or is it just magic packaging?		09:09.53
sebras	the organisation type and the number of pages.		09:10.59
ator	thinking about how to take a jbig2 with a file header, and embed it in PDF when doing pdf_add_image		09:11.02
sebras	if you want to do that you must strip out the file header.		09:11.26
ator	it looks like we should be having separate enums rather than one FZ_IMAGE_JBIG2		09:11.38
	(or just detect the file header everywhere)		09:11.56
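The "detect the file header" idea could look roughly like the sketch below. It assumes the caller already knows the data is JBIG2 and only needs to decide between file mode and embedded mode. The enum values are local stand-ins for this example. jbig2dec's real flag is JBIG2_OPTIONS_EMBEDDED, and how MuPDF would thread the result through to fz_open_jbig2d is not shown.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-ins for this sketch, not the jbig2dec definitions. */
enum
{
	SKETCH_OPTIONS_FILE = 0,     /* decoder expects a file header */
	SKETCH_OPTIONS_EMBEDDED = 1  /* decoder expects bare segments */
};

/*
 * Choose decoder options by sniffing for the 8 byte JBIG2 file magic.
 * Anything without the magic is assumed to be an embedded stream.
 */
static int
sketch_jbig2_options_for(const unsigned char *buf, size_t len)
{
	static const unsigned char magic[8] =
		{ 0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A };

	if (len >= 8 && memcmp(buf, magic, 8) == 0)
		return SKETCH_OPTIONS_FILE;
	return SKETCH_OPTIONS_EMBEDDED;
}
```

The trade-off is that the sniffing has to happen before any data is handed to the decoder, since the options are fixed when the decoder context is created.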
sebras	the user supplied the options to jbig2dec prior to passing it any data from the byte stream		09:12.26
	I think raph might have viewed this as the expectation of the data in the stream.		09:13.50
malc_	ator: FWIW that's the closest i came up with regarding the typeface in that jbig picture http://www.identifont.com/identify?34+.+7VX+8N+6B+PAF+5L+4O+8F+3AM+35YW+JI7+19+2Z3D+J+8C+7PQ+2Z38+5V+9A+4A+1KS+2M+19U+53K+1KK+1U6+7VR+3Z+2ZGN+1LA+7G+1QY+8B+A0		09:13.53
sebras	so if the data is incorrectly formatted jbig2dec will try to parse it and get lots of out of range values and probably error out.		09:14.33
ator	malc_: it looked like a modern font, but too low res to really tell.		09:14.49
	ralph, not raph :)		09:15.05
malc_	ator: i'm not sure what "modern" means		09:15.15
sebras	commit 0a9304dad738268b27556717bf83936c15618506		09:15.18
	Author: raph <raph@ded80894-8fb9-0310-811b-c03f3676ab4d>		09:15.18
malc_	FIGHT!		09:15.27
ator	guess he had his fingers in the pie too		09:15.28
sebras	ator: looks like raph to me. :)		09:15.32
	ator: he did.		09:15.37
ator	sebras: sure does!		09:15.39
sebras	early on.		09:15.40
	the commit above introduced JBIG2_OPTIONS_EMBEDDED.		09:18.52
	ator: oh, and as PDF requires there not to be a file header, JBIG2 images _there_ can't be multi page, while normal .jbig2 files, e.g. annex-h.jbig2, can (and that one does).		09:19.40
Wizzup	Looks like the 'jbig2' encoding tool has a flag '-p --pdf': produce PDF ready data - maybe this is what you are talking about.		09:22.18
sebras	Wizzup: that is likely to enable embedded mode, yes.		09:22.34
	try it -- you shouldn't be getting the first 13 bytes of the hex output I quoted earlier.		09:23.21
Wizzup	It looks like the file is 35 bytes smaller, and does seem to omit those headers.		09:25.16
sebras	Wizzup: that's more than 16 bytes. I wonder what else they've omitted. :)		09:30.27
Wizzup	I wonder if it also deals with some of other problems you observed - regarding multi page?		09:30.47
sebras	Wizzup: do you mind providing the pdf-ized jbig2 file?		09:33.44
Wizzup	https://wizzup.org/img-pdf.jbig2		09:36.08
sebras	Wizzup: ah yes, I had forgotten that end of page and end of file segments are also omitted.		09:37.50
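Converting a sequentially organised file to the embedded form could be sketched as below: skip the file header, then copy every segment except end of page (type 49) and end of file (type 51). This assumes the segment header layout from the JBIG2 specification and bails out on the cases it does not handle (random-access files, long-form referred-to counts, unknown data lengths). The function name is hypothetical and this is not what jbig2enc or MuPDF is confirmed to do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t sketch_read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
		((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Copy in[] to out[] without the file header and without end-of-page and
 * end-of-file segments. out must have room for in_len bytes.
 * Returns the number of bytes written, or -1 on anything unhandled.
 */
static long
sketch_jbig2_file_to_embedded(const unsigned char *in, size_t in_len, unsigned char *out)
{
	static const unsigned char magic[8] =
		{ 0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A };
	size_t pos, n = 0;

	/* Require the magic and the sequential organisation bit. */
	if (in_len < 9 || memcmp(in, magic, 8) != 0 || !(in[8] & 1))
		return -1;
	pos = (in[8] & 2) ? 9 : 13; /* page count field absent when unknown */
	if (in_len < pos)
		return -1;

	while (pos < in_len)
	{
		size_t hdr, refs, ref_size, total;
		uint32_t number, data_len;
		unsigned type;

		if (in_len - pos < 6)
			return -1;
		number = sketch_read_be32(in + pos);
		type = in[pos + 4] & 0x3F;
		refs = in[pos + 5] >> 5;
		if (refs > 4)
			return -1; /* long-form referred-to count: not handled */
		ref_size = number <= 256 ? 1 : number <= 65536 ? 2 : 4;
		hdr = 4 + 1 + 1 + refs * ref_size + ((in[pos + 4] & 0x40) ? 4 : 1) + 4;
		if (in_len - pos < hdr)
			return -1;
		data_len = sketch_read_be32(in + pos + hdr - 4);
		if (data_len == 0xFFFFFFFFu || data_len > in_len - pos - hdr)
			return -1; /* unknown or truncated data length */
		total = hdr + data_len;

		if (type != 49 && type != 51)
		{
			memcpy(out + n, in + pos, total);
			n += total;
		}
		pos += total;
	}
	return (long)n;
}
```

For a file with a 13 byte header, one end-of-page segment and one end-of-file segment, each an 11 byte header with no data, this drops 13 + 11 + 11 = 35 bytes. That would match the size difference reported above, though it is an inference and not something confirmed in the log.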
Wizzup	If one were to provide mupdf with an embedded JBIG2 file in a file, it is still not clear to me how mupdf would know what image type it is. fz_recognize_image_format cannot know that without the header, right?		09:42.43
ator	Wizzup: correct, it would not be able to recognize it.		09:43.29
Wizzup	As a user (who assumed jbig2 writing was in place, which isn't fully true it seems, but that's fine), I would expect mupdf to read the jbig2 file with header, and strip the header if that is what is required to add the jbig2 into the pdf.		09:44.06
malc_	ator: this UI toolkit that you wrote and mupdf-gl uses really blows goats on my monitor, sorry for being so abrasive but c'mon		09:58.27
ator	too big? too small?		09:59.18
sebras	too goaty.		09:59.25
ator	I like goats.		09:59.31
sebras	malc_: my avatar is a goat.		09:59.34
ator	with their freaky square eyes		09:59.40
malc_	sebras: that explains it		09:59.44
	ator: too small		09:59.48
	and i would say that img-pdf.jbig2 causes mupdf to return negative values to somewhere		10:00.34
	at least that's the effect it has on llpp trying to view the thing		10:00.51
	nuc:~		10:01.09
	- llpp img-pdf.jbig2		10:01.09
	error processing '"pdim 0 0 0 -2147483648"': Stdlib.Scanf.Scan_failure("scanf: bad input at char number 6: character '-' is not a decimal digit")		10:01.09
			10:01.09
	naughty naughty fitz		10:01.36
	sebras: https://www.youtube.com/watch?v=G3dooqt15kY		10:02.24
ator	malc_: see ui_init for the threshold values when calculating ui_scale		10:03.24
	your monitor needs to report being >= 144, 192, or 288 dpi for the ui scaling to come into effect		10:03.53
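The thresholds can be pictured as a simple step function. In the sketch below only the 144, 192 and 288 dpi thresholds come from the message above. The scale factors are assumptions (each threshold divided by a 96 dpi baseline), and the function name is hypothetical rather than the actual code in ui_init.

```c
/*
 * Hypothetical helper: map a reported dpi to a ui scale factor.
 * Thresholds are from the discussion; factors assume a 96 dpi baseline.
 */
static float
sketch_ui_scale_for_dpi(int dpi)
{
	if (dpi >= 288)
		return 3.0f;
	if (dpi >= 192)
		return 2.0f;
	if (dpi >= 144)
		return 1.5f;
	return 1.0f; /* below the first threshold: no scaling */
}
```

Under these assumed factors a monitor that reports less than 144 dpi gets no scaling at all, regardless of its true pixel density.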
malc_	ator: my monitor lies through its teeth when it comes to the dpi to keep windows satisfied i suppose, on top of that i also modify dpi by hand to make my eyes happy		10:05.29
	mupdf being outlier in suckage here		10:05.43
ator	malc_: well, you could hardwire it in gl-ui.c		10:06.20
	"use the source, luke"!		10:06.25
malc_	ator: then again i don't use mupdf, just something you probably should be aware of in an event my spawns will take over the world and start lynching swedes named Tor		10:06.51
	F... T... S...! I'm not touching the source that uses tabs with a ten foot pole!		10:07.24
	https://github.com/moosotc/snippets/blob/master/.xinitrc		10:09.00
	https://github.com/moosotc/snippets/blob/master/bin/setdpi		10:09.00
	dpi that keeps my eyes and brain happy is 163		10:09.30
	whoops make that 190		10:10.30
ator	malc_: where are you going to breed your spawn if you throw out the bathwater?		10:10.45
malc_	ator: in space!		10:15.41
	wi hev a lot ov speis in matha rusha!		10:16.04
ator	everything is better ... IN SPAAAACE!		10:33.28
malc_	heh		10:36.46
	depends on how good your stabilizers are i s'pose		10:37.08
Tamir_Evan	ator: Did you mean to push all those 'WIP' commits ("WIP: Save PDF file", 3 "WIP Bug 702350:..." commits, "WIP refactor uses of zlib", "WIP: mutool run TODO list" and "WIP: Remove support for luratech jbig2 and JPEG2000 decoders") to the main mupdf repo?		11:01.20
ator	Tamir_Evan: no. did I?		11:11.51
	gah! that was an accident.		11:12.08
sebras	ator: my origin/master is on 32e4e8b4b		11:12.44
ator	okay, should be reset now		11:13.23
	cluster may need kicking in the future. Robin_Watts		11:13.28
	Tamir_Evan: the windows luratech/tesseract build issue is something Robin_Watts will have to answer		11:14.00
Robin_Watts	ator: So golden/master should be 32e4e8b4b ?		11:16.20
ator	Robin_Watts: yes		11:17.43
malc_	ator: another thing that blows: 'make build=native' inside mupdf checkout that has those WIP commits, 'git remote update; git pull' 'make build=native' considers everything up-to-date... sigh		11:18.02
	thirdparty uptodatedness tracking leaves a lot to be desired too... oh well, at least those Makefiles _must_ have tabs		11:20.43
	shrug		11:20.46
Robin_Watts	ator: OK, reset, I think.		11:22.26
ator	Robin_Watts: thanks.		11:25.51
Robin_Watts	Tamir_Evan: That shouldn't happen. ReleaseTesseract should be what you want. I'll check in a bit when the current panic is over.		11:27.10
Tamir_Evan	Robin_Watts: O.K., thanks for looking into it.		11:30.20
malc_	ator: fitz spews a lot of diagnostics for - http://port70.net/~nsz/c/c11/n1570.html		12:25.28
Robin_Watts	Tamir_Evan: OK, back.		13:58.00
	So, you're working from git, or from the release?		13:58.16
	I think the answer is to build Release first, then ReleaseTesseract. Just checking now. I'll get it fixed.		14:01.08
Tamir_Evan	Robin_Watts: I'm working from git. Last time I tried was this morning, from commit 32e4e8b4bcbacbf92af7c88337efae21986d9603 ("Bug 702958: Fix overflow in fz_clear_pixmap_with_value"), on the master branch.		16:49.58
	Robin_Watts: "Release first, then ReleaseTesseract" seems a bit convoluted to me, but... O.K..		16:50.16
	Robin_Watts: Maybe I can build just a few key projects in Release, and then build the whole solution in ReleaseTesseract. I'll look into it tomorrow.		16:50.29
Robin_Watts	Tamir_Evan: I'll have a fix tomorrow for you.		16:50.43
	It's a simple config error on my part, but my attention is being occupied elsewhere at the moment.		16:51.01
Tamir_Evan	Robin_Watts: O.K., thank you! I'll have a look sometime tomorrow, or the day after.		16:52.38

Log of #mupdf at irc.freenode.net.