MuPDF IRC logs

	<<<Back 1 day (to 2020/10/12)	Fwd 1 day (to 2020/10/14)>>>	20201013
muevoid	Hello! I am trying to build mupdf with clang and clangutils however I get this error: ld: error: target emulation unknown: -m or at least one .o file required. Any ideas?		07:45.16
sebras	muevoid: do you have the full compilation log somewhere?		07:48.21
muevoid	Yes I can get it and upload to termbin		07:48.35
sebras	muevoid: and what version of mupdf are you compiling git HEAD?		07:48.46
muevoid	1.18.0		07:48.59
sebras	ok.		07:49.13
muevoid	Here is the build log: https://termbin.com/vyzl. I also have the following patch but don't believe it would affect this: https://termbin.com/ltoj		07:50.09
	Actually sorry for wasting your time. I missed a package! My apologies		07:55.22
sebras	ok.		07:55.28
	muevoid: do you mind telling me what package?		07:55.53
	tesseract/leptonica perhaps?		07:56.03
muevoid	gnu-as, I run a distro which gets rid of a lot of gnu things and ld.bfd is needed for linking since clang ld doesn't have a default output		07:57.08
sebras	ok.		07:57.23
muevoid	Sorry for any inconvenience		07:57.32
Wizzup	Hi - I'm trying to save a JBIG2 file via pymupdf, but I get this error: `mupdf: cannot complete jbig2 image`. Any clue how I can get more info? I looked at the source code and it seems like in all cases where the function jbig2_complete_page would return < 0, there should be an error printed with level JBIG2_SEVERITY_WARNING, but I am not seeing any.		10:52.16
	Maybe it is a jbig2 version mismatch.		12:02.06
ator	Wizzup: I haven't a clue, I'm afraid. You might have more luck asking the pymupdf developers.		12:39.41
Wizzup	I am in contact with him. In case you are interested: https://github.com/pymupdf/PyMuPDF/issues/685		12:41.16
ator	Wizzup: completely off topic, but I'd advise against using JPX -- it's a horribly slow and memory hungry image format		12:42.22
	JPEG would be a much better choice, IMO.		12:42.31
	Wizzup: the functions that pymupdf uses don't internally support jbig2 compression. however, we detect and compress monochrome 1-bit images using CCITT fax encoding.		12:46.18
	if you want to create a PDF from known input data, and have the jbig2 stream, it's actually pretty easy to just "printf" the PDF file from scratch without using a library		12:46.55
	basically just write a bunch of template strings, write the image data, while keeping track of the file offsets of each object to create the 'xref' table at the end		12:47.31
	images loaded and added using the pdf_add_image function only support a subset of formats (and not jbig2 for complicated reasons -- the "global" dictionary being the main problem)		12:49.01
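A minimal Python sketch of the "printf the PDF from scratch" approach ator describes above: write each object, remember the byte offset where it starts, and emit the 'xref' table at the end. The helper names (build_pdf, stream, example_pdf) are hypothetical, and the image is a tiny uncompressed 1-bit bitmap so the example stays self-contained. A real JBIG2 stream would additionally need a /Filter /JBIG2Decode entry and runs into the header and globals complications ator mentions, which this sketch does not handle.

```python
def stream(dict_body, data):
    """Wrap raw bytes as a PDF stream object body with a /Length entry."""
    head = b"<< " + dict_body + b" /Length %d >>\nstream\n" % len(data)
    return head + data + b"\nendstream"


def build_pdf(objects):
    """Serialize object bodies (numbered from 1) and append xref and trailer."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # file offset where this object starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def example_pdf():
    """One 72x72pt page showing an 8x8 1-bit checkerboard image."""
    image_data = bytes([0xAA, 0x55] * 4)  # 8 rows, 1 byte per row
    image = stream(
        b"/Type /XObject /Subtype /Image /Width 8 /Height 8"
        b" /ColorSpace /DeviceGray /BitsPerComponent 1",
        image_data,
    )
    contents = stream(b"", b"q 72 0 0 72 0 0 cm /Im0 Do Q")
    return build_pdf([
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 72 72]"
        b" /Resources << /XObject << /Im0 4 0 R >> >> /Contents 5 0 R >>",
        image,
        contents,
    ])
```

Writing the bytes returned by example_pdf() to a file opened in binary mode is intended to give a one-page PDF; the same offset bookkeeping applies however many objects are written.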
vtorri__	ator hello		12:49.41
ator	hi		12:49.51
mubot	Welcome to #mupdf, the channel for MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.		12:49.51
vtorri__	ator does the PDF format allow to support any kind of image format ?		12:50.19
	ator like as an extension		12:50.35
ator	define "format"		12:50.35
	you mean compression/file format?		12:50.47
vtorri__	webp, avif etc...		12:50.53
ator	no. it does not.		12:50.57
vtorri__	container and codec		12:51.00
	ok, thank you		12:51.10
ator	it supports a specific set of formats, but no way to extend it.		12:51.23
vtorri__	ok		12:51.29
	as well as pdf 2?		12:51.45
ator	correct.		12:52.08
vtorri__	thank you		12:53.31
ator	Wizzup: "If we pass a JBIG2 stream to write to the PDF, why would MuPDF want to decode it?" it decodes it because pdf_add_image doesn't know how to write JBIG2, so it decodes it first, then chooses a compression for output after looking at the data.		12:53.50
Wizzup	ator: I do not wish to just copy a PDF from known input data, that was just my first test		13:11.36
ator	Wizzup: do you have an example jbig2 file such as one you want to create the data from?		13:13.47
Wizzup	Sure: https://wizzup.org/img.jbig2		13:14.20
	(Created using https://github.com/agl/jbig2enc)		13:14.35
ator	403 forbidden		13:14.35
Wizzup	oops. sorry.		13:14.40
	Should work now (I created the initial file with mkstemp)		13:15.07
ator	got it		13:15.56
malc_	ator: any idea what typeface was used in the book this jbig file is an image of?		13:17.35
ator	Wizzup: it looks to be loaded as a compressed jbig2 image internally		13:19.07
	Wizzup: but for some reason it's unable to decompress the image		13:20.57
Wizzup	jbig2dec can decompress the image: jbig2dec img.jbig2 -t png -o /tmp/out.png		13:21.28
ator	I suspect our code expects jbig2 images to only come from the streams embedded in a PDF file, not an external jbig2		13:22.17
Wizzup	So I have tried stripping the 8 byte header, but then mupdf says it cannot recognize what image it is dealing with (which makes sense)		13:23.03
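Why stripping the header makes the file unrecognizable can be illustrated with a small Python sketch, assuming the loader identifies formats by their leading signature bytes. The 8 bytes in question are the ID string that opens a standalone JBIG2 file. The function name and the sample bytes are hypothetical, and this is not MuPDF's actual detection code.

```python
# The 8-byte ID string at the start of a standalone JBIG2 file.
JBIG2_FILE_ID = bytes([0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A])


def looks_like_jbig2_file(data: bytes) -> bool:
    """Return True if data starts with the JBIG2 file signature."""
    return data[:len(JBIG2_FILE_ID)] == JBIG2_FILE_ID


# Hypothetical file contents: the signature followed by placeholder bytes.
sample = JBIG2_FILE_ID + b"\x01\x00\x00\x00\x01"
print(looks_like_jbig2_file(sample))      # True
print(looks_like_jbig2_file(sample[8:]))  # False: signature stripped
```

With the signature gone, nothing in the remaining bytes marks the data as JBIG2, so a sniffing loader has nothing to match on.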
ator	Wizzup: yeah. this looks to be a bug in mupdf.		13:23.33
	sebras: (when you're back from dinner) the muimg.c code has a special case for load_subimage for jbig2, if I remove that special case and try to load a jbig2 image file "normally", I fail with the same error as Wizzup is seeing		13:24.30
Wizzup	Check, can you let the pymupdf developer know, or should I? (To prevent him from going on a ghost chase.)		13:24.39
ator	I'll write a comment		13:24.54
Wizzup	thanks!		13:25.11
ator	you'll still not be able to save it as a jbig2 even if I fix this error		13:25.49
	at least not until I extend pdf_add_image to support writing jbig2 streams (but that may not be easy, due to the jbig2 file header thing)		13:26.23
	but you should at least be able to load them and mupdf should compress as ccitt fax!		13:27.02
Wizzup	That would be a great start. :-)		13:29.49
	(I'm hoping to release this tool to compress PDFs with MRC in the next week or so, because I couldn't find anything that did it for me.)		13:30.51
ator	Wizzup: you could start with JPEG and CCITT fax (using monochrome png as input) :)		13:31.17
Wizzup	And on JPX, well noted. I'm going with it mostly because of the (perceived?) quality/size ratio.		13:31.26
ator	or pbm, if you want to keep it really simple		13:31.37
	it's maybe 10% smaller, but 10x slower to decompress		13:32.01
Wizzup	Right, but the main goal of applying MRC is to compress the images a lot, so that we can fit ~300 large images in a PDF and still keep it relatively small		13:32.14
ator	99% of slow PDF files are due to using JPX compressed images		13:32.24
	just use heavier jpeg compression :)		13:32.38
	JPX is especially bad on huge full-page high-resolution images		13:32.51
Wizzup	JPX is JPEG2000, right?		13:33.15
ator	yes		13:33.19
	sorry :)		13:33.22
	it's "JPXDecode" internally in PDF		13:33.32
Wizzup	Yeah, I realised		13:33.45
	Interesting, so the Internet Archive stores most of their photos of books as JPEG2000, and I must say I am impressed with the quality/size ratio compared to normal JPEG		13:34.06
	But I haven't done much investigation on my own into the perceived benefits.		13:34.26
	(And yes, JPEG2000 is a pain to deal with)		13:34.30
ator	a full page jpeg2000 image at 300dpi is about 8 megapixels. that needs ~150MB of ram to decompress.		13:35.08
	due to jpeg2000 decoders being slow and stupid about pixel format representation (storing each color component as a full integer)		13:35.33
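A rough back-of-the-envelope check of those figures in Python, under assumptions not stated in the log: a US Letter page (8.5 x 11 inches), three color components, and one 4-byte integer per component. It only counts the decoded component planes; working buffers and the final output pixmap would come on top, which is consistent with the ~150MB ator quotes being in the right range.

```python
def decode_plane_bytes(width_in, height_in, dpi, components=3, bytes_per_sample=4):
    """Return (pixel count, bytes needed to hold every component as an integer)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    pixels = width_px * height_px
    return pixels, pixels * components * bytes_per_sample


# Assumed page size: US Letter at 300dpi.
pixels, plane_bytes = decode_plane_bytes(8.5, 11, 300)
print(f"{pixels / 1e6:.1f} megapixels")               # 8.4 megapixels
print(f"{plane_bytes / 1e6:.0f} MB for the planes")   # 101 MB for the planes
```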
Wizzup	Is that with OpenJPEG?		13:35.36
ator	yeah		13:35.40
	jasper was equally bad, and even slower		13:35.49
Wizzup	Yeah, I've found OpenJPEG to be quite slow. I've also used this "kakadu" thing and it's much, much faster but not FOSS.		13:36.05
ator	there are no good open source jpeg2000 decoders :(		13:36.22
	plenty of proprietary ones		13:36.28
Wizzup	Yeah, indeed.		13:36.37
ator	it's a terrible format, horribly over complicated, so I don't blame open source developers for shunning it :)		13:36.46
Wizzup	agreed		13:37.00
ator	djvu uses iw44 for its color images, which is based on the same compression principles as jpeg2000 but not completely insane		13:37.51
Wizzup	It is, however, mostly not my decision. But I could try to do this with JPEG as well, but right now I am trying to be at least on-par with this closed source tool that I am re-implementing in python (using mupdf and some science libs), so for an apples-to-apples comparison, I'd likely initially want to have JPX in there as well		13:37.55
ator	wavelets, etc.		13:37.57
	right. well, jpeg2000 + ccittfax should work today.		13:38.15
Wizzup	Yeah, that sounds good. I could probably encode it to ccittfax myself, since I encode the png to jbig2 to begin with.		13:38.37
ator	jbig2 output would need creating the pdf file from scratch, or wait until we can pass through jbig2 images from our image loader to the PDF writer		13:38.48
	Wizzup: that wouldn't be worth bothering with, mupdf would just decompress and recompress it		13:39.20
	just pass it the png and mupdf will figure out to use ccitt compression if it's a monochrome bitmap		13:39.45
Wizzup	Let me try that, I think I was getting type "image" before of size >1MB, but maybe I was doing it wrong.		13:40.03
ator	sebras: same issue with "mutool convert -o out.pdf img.jbig2"		13:41.19
Wizzup	hm, so I am getting this when I pass a png as a smask, but maybe that is a pymupdf problem (output from pdfimages -list): 1 2 smask 2414 3560 gray 1 1 image no 7 0 305 305 1050K 100%		13:48.02
ator	Wizzup: I tested a similar script using 'mutool run' and I get a 'SMask' too (but monochrome and CCITTFaxDecode)		14:06.08
	but the image is compressed with CCITT, so it should make no practical difference		14:09.22
Wizzup	Check, so I'm doing something wrong probably. The only time I've seen pymupdf/mupdf create a ccitt is when I was copying the jbig2 stream and writing it back, in which case it gets encoded to ccitt. The PNG I am loading is 1 bit grayscale.		14:10.39
ator	mutool show out.pdf pages/1/Resources/XObject/*/SMask/Filter		14:11.41
Wizzup	$ mutool show test.pdf pages/1/Resources/XObject/Im1/SMask/Filter		14:14.45
	null		14:14.45
	$ mutool show test.pdf pages/1/Resources/XObject/Im1/SMask | grep 'type' /Subtype /Image		14:14.49
ator	Wizzup: http://ghostscript.com/~tor/example.pdf		14:19.10
	that's what my script creates		14:19.15
	given 2 jpeg images (background, foreground) and a monochrome PNG mask		14:19.45
	save this https://pastebin.com/raw/32DKgFA2 as "foo.js" and run "mutool run foo.js" with the page.jpeg, text.jpeg, and mask.png images in the same directory. it will write a file "out.pdf"		14:20.43
	ah, it needs a commit that isn't pushed yet		14:21.29
	but you get the idea		14:21.36
Wizzup	I do, thanks. Your pdf looks fine to me indeed.		14:30.03
sebras	ator: after the meeting.		14:38.20
	Wizzup: ping?		19:16.48
	Wizzup: I'm able to open img.jbig2 that you supplied using mupdf-gl and mupdf-x11. I'm confused what is problematic for you?		19:25.50
	Wizzup: did you change the img.jbig2 file on your website?		19:36.02
Wizzup	sebras: I did not change the image		19:51.55
sebras	Wizzup: when I run the very latest git HEAD version of mupdf on img.jbig2 I see something like this: http://ghostscript.com/~sebras/tmp/img.png		19:56.01
	Wizzup: it looks reasonable. but possibly inverted?		19:56.18
	Wizzup: when you said it didn't render, what did you mean?		19:56.37
	ator: I don't see a problem with mutool convert either?		19:59.22
Wizzup	sebras: For me, loading the jbig resulted in the error shared in the pymupdf bug report earlier		21:00.34
	sebras: I think ator managed to reproduce it too; the question is I think: how does one load jbig2 encoded files and render them as smask? I didn't manage to load them from a file and save them to a pdf.		21:03.48
	I am not at my machine atm, will try to get more info soon.		21:04.12
	sebras: What I am attempting to do is load a JPX image and a JBIG2 image (as smask) and write those to a PDF. When I do that, I get this: https://pastebin.com/SBR4maWE		23:09.53
	The only way I've been able to handle a JBIG2 image is by reading the xref of an existing pdf with a JBIG2 image in it, and writing the stream to another pdf doc, even though mupdf will convert it to ccitt. I haven't been able to load a jbig2 from a file and insert that into a pdf, but running 'mutool convert -o /tmp/out.pdf /tmp/mask.jbig2' does work for me, so I don't know what that means.		23:11.37
nickberger	how do i use noto fonts in "mupdf create" to enter this text string "अ" which is "u+0905".		23:34.38

Log of #mupdf at irc.freenode.net.