| <<<Back 1 day (to 2020/10/12) | Fwd 1 day (to 2020/10/14)>>> | 20201013 |
muevoid | Hello! I am trying to build mupdf with clang and clangutils, but I get this error: ld: error: target emulation unknown: -m or at least one .o file required. Any ideas? | 07:45.16 |
sebras | muevoid: do you have the full compilation log somewhere? | 07:48.21 |
muevoid | Yes I can get it and upload to termbin | 07:48.35 |
sebras | muevoid: and what version of mupdf are you compiling? git HEAD? | 07:48.46 |
muevoid | 1.18.0 | 07:48.59 |
sebras | ok. | 07:49.13 |
muevoid | Here is the build log: https://termbin.com/vyzl. I also have the following patch but don't believe it would affect this: https://termbin.com/ltoj | 07:50.09 |
| Actually sorry for wasting your time. I missed a package! My apologies | 07:55.22 |
sebras | ok. | 07:55.28 |
| muevoid: do you mind telling me what package? | 07:55.53 |
| tesseract/leptonica perhaps? | 07:56.03 |
muevoid | gnu-as. I run a distro which gets rid of a lot of GNU things, and ld.bfd is needed for linking since clang ld doesn't have a default output | 07:57.08 |
sebras | ok. | 07:57.23 |
muevoid | Sorry for any inconvenience | 07:57.32 |
Wizzup | Hi - I'm trying to save a JBIG2 file via pymupdf, but I get this error: `mupdf: cannot complete jbig2 image`. Any clue how I can get more info? I looked at the source code and it seems like in all cases where the function jbig2_complete_page would return < 0, there should be an error printed with level JBIG2_SEVERITY_WARNING, but I am not seeing any. | 10:52.16 |
| Maybe it is a jbig2 version mismatch. | 12:02.06 |
ator | Wizzup: I haven't a clue, I'm afraid. You might have more luck asking the pymupdf developers. | 12:39.41 |
Wizzup | I am in contact with him. In case you are interested: https://github.com/pymupdf/PyMuPDF/issues/685 | 12:41.16 |
ator | Wizzup: completely off topic, but I'd advise against using JPX -- it's a horribly slow and memory hungry image format | 12:42.22 |
| JPEG would be a much better choice, IMO. | 12:42.31 |
| Wizzup: the functions that pymupdf uses don't internally support jbig2 compression. however, we detect and compress monochrome 1-bit images using CCITT fax encoding. | 12:46.18 |
| if you want to create a PDF from known input data, and have the jbig2 stream, it's actually pretty easy to just "printf" the PDF file from scratch without using a library | 12:46.55 |
| basically just write a bunch of template strings, write the image data, while keeping track of the file offsets of each object to create the 'xref' table at the end | 12:47.31 |
| images loaded and added using the pdf_add_image function only support a subset of formats (and not jbig2 for complicated reasons -- the "global" dictionary being the main problem) | 12:49.01 |
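A minimal Python sketch of the "printf the PDF from scratch" approach described above: write each object from template strings, record the byte offset where it starts, then emit the xref table and trailer. It assumes `jbig2_data` is already a single-page JBIG2 stream in PDF embedded form (no file header) that uses no global segments; a stream that needs a global dictionary would also need a separate stream referenced through `/DecodeParms << /JBIG2Globals ... >>`, which this sketch leaves out. The name `build_pdf` and the one-pixel-per-point page size are illustrative choices, not anything taken from MuPDF.

```python
def build_pdf(jbig2_data: bytes, width: int, height: int) -> bytes:
    """Hand-write a one-page PDF that shows a single JBIG2 image (hypothetical helper)."""
    # Content stream: scale the unit image square to width x height points and paint it.
    content = f"q {width} 0 0 {height} 0 0 cm /Im1 Do Q".encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         f"/Resources << /XObject << /Im1 4 0 R >> >> /Contents 5 0 R >>").encode("ascii"),
        # The JBIG2 bytes are copied verbatim; no decoding or re-encoding happens here.
        (f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
         f"/ColorSpace /DeviceGray /BitsPerComponent 1 /Filter /JBIG2Decode "
         f"/Length {len(jbig2_data)} >>\nstream\n").encode("ascii") + jbig2_data + b"\nendstream",
        f"<< /Length {len(content)} >>\nstream\n".encode("ascii") + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n")  # header plus a binary-marker comment
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed for the xref table
        out += f"{num} 0 obj\n".encode("ascii") + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # object 0 is always the head of the free list
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("ascii")  # each entry is exactly 20 bytes
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode("ascii")
    return bytes(out)
```

Usage would be along the lines of `open("out.pdf", "wb").write(build_pdf(data, w, h))`. Because the image stream is passed through untouched, this route avoids the decode-and-recompress step that pdf_add_image performs.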
vtorri__ | ator hello | 12:49.41 |
ator | hi | 12:49.51 |
mubot | Welcome to #mupdf, the channel for MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line. | 12:49.51 |
vtorri__ | ator does the PDF format allow support for any kind of image format? | 12:50.19 |
| ator like as an extension | 12:50.35 |
ator | define "format" | 12:50.35 |
| you mean compression/file format? | 12:50.47 |
vtorri__ | webp, avif etc... | 12:50.53 |
ator | no. it does not. | 12:50.57 |
vtorri__ | container and codec | 12:51.00 |
| ok, thank you | 12:51.10 |
ator | it supports a specific set of formats, but no way to extend it. | 12:51.23 |
vtorri__ | ok | 12:51.29 |
| and pdf 2 as well? | 12:51.45 |
ator | correct. | 12:52.08 |
vtorri__ | thank you | 12:53.31 |
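For reference, the set ator means is the closed list of standard stream filters in ISO 32000-1; four of them are image codecs, and, as ator says, there is no way to add new ones such as WebP or AVIF. A small Python lookup sketch; the `classify_filter` name and the `WebPDecode` probe are made up for illustration.

```python
# Standard stream filters listed in ISO 32000-1 (PDF 1.7).
STANDARD_FILTERS = frozenset({
    "ASCIIHexDecode", "ASCII85Decode", "LZWDecode", "FlateDecode",
    "RunLengthDecode", "CCITTFaxDecode", "JBIG2Decode", "DCTDecode",
    "JPXDecode", "Crypt",
})
# The subset that are image codecs rather than generic byte-level transforms.
IMAGE_CODEC_FILTERS = frozenset({"CCITTFaxDecode", "JBIG2Decode", "DCTDecode", "JPXDecode"})


def classify_filter(name: str) -> str:
    """Classify a /Filter name; anything outside the fixed list is unsupported."""
    if name in IMAGE_CODEC_FILTERS:
        return "image codec"
    if name in STANDARD_FILTERS:
        return "generic filter"
    return "not a standard PDF filter"


for name in ("DCTDecode", "FlateDecode", "WebPDecode"):  # "WebPDecode" is hypothetical
    print(name, "->", classify_filter(name))
```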
ator | Wizzup: "If we pass a JBIG2 stream to write to the PDF, why would MuPDF want to decode it?" it decodes it because pdf_add_image doesn't know how to write JBIG2, so it decodes it first, then chooses a compression for output after looking at the data. | 12:53.50 |
Wizzup | ator: I do not wish to just copy a PDF from known input data, that was just my first test | 13:11.36 |
ator | Wizzup: do you have an example jbig2 file such as one you want to create the data from? | 13:13.47 |
Wizzup | Sure: https://wizzup.org/img.jbig2 | 13:14.20 |
| (Created using https://github.com/agl/jbig2enc) | 13:14.35 |
ator | 403 forbidden | 13:14.35 |
Wizzup | oops. sorry. | 13:14.40 |
| Should work now (I created the initial file with mkstemp) | 13:15.07 |
ator | got it | 13:15.56 |
malc_ | ator: any idea what typeface was used in the book this jbig file is an image of? | 13:17.35 |
ator | Wizzup: it looks to be loaded as a compressed jbig2 image internally | 13:19.07 |
| Wizzup: but for some reason it's unable to decompress the image | 13:20.57 |
Wizzup | jbig2dec can decompress the image: jbig2dec img.jbig2 -t png -o /tmp/out.png | 13:21.28 |
ator | I suspect our code expects jbig2 images to only come from the streams embedded in a PDF file, not an external jbig2 | 13:22.17 |
Wizzup | So I have tried stripping the 8-byte header, but then mupdf says it cannot recognize what image it is dealing with (which makes sense) | 13:23.03 |
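Stripping only the 8-byte ID string leaves the rest of the JBIG2 file header in place: per the JBIG2 standard (ITU-T T.88), the ID string is followed by a 1-byte flags field and, unless flag bit 1 ("unknown number of pages") is set, a 4-byte page count. A minimal Python sketch that removes the whole header, assuming a file in sequential organisation; `strip_jbig2_file_header` is a hypothetical helper, not a MuPDF or jbig2dec function. PDF embedding additionally expects the end-of-page and end-of-file segments to be absent and any global segments to live in a separate JBIG2Globals stream, which this sketch does not handle.

```python
JBIG2_MAGIC = b"\x97JB2\r\n\x1a\n"  # 8-byte JBIG2 file ID string


def strip_jbig2_file_header(data: bytes) -> bytes:
    """Return the segment data that follows the JBIG2 file header (hypothetical helper)."""
    if len(data) < 9 or not data.startswith(JBIG2_MAGIC):
        raise ValueError("not a JBIG2 file (missing ID string or flags byte)")
    flags = data[8]
    # Bit 0: 1 = sequential organisation, 0 = random-access (all segment headers first).
    if not flags & 0x01:
        raise ValueError("random-access organisation; segments would need reordering")
    # Bit 1: 1 = number of pages unknown, so the 4-byte page-count field is absent.
    header_len = 9 if flags & 0x02 else 13
    return data[header_len:]
```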
ator | Wizzup: yeah. this looks to be a bug in mupdf. | 13:23.33 |
| sebras: (when you're back from dinner) the muimg.c code has a special case for load_subimage for jbig2, if I remove that special case and try to load a jbig2 image file "normally", I fail with the same error as Wizzup is seeing | 13:24.30 |
Wizzup | Check, can you let the pymupdf developer know, or should I? (To prevent him from going on a wild goose chase.) | 13:24.39 |
ator | I'll write a comment | 13:24.54 |
Wizzup | thanks! | 13:25.11 |
ator | you'll still not be able to save it as a jbig2 even if I fix this error | 13:25.49 |
| at least not until I extend pdf_add_image to support writing jbig2 streams (but that may not be easy, due to the jbig2 file header thing) | 13:26.23 |
| but you should at least be able to load them and mupdf should compress as ccitt fax! | 13:27.02 |
Wizzup | That would be a great start. :-) | 13:29.49 |
| (I'm hoping to release this tool to compress PDFs with MRC in the next week or so, because I couldn't find anything that did it for me.) | 13:30.51 |
ator | Wizzup: you could start with JPEG and CCITT fax (using monochrome png as input) :) | 13:31.17 |
Wizzup | And on JPX, well noted. I'm going with it mostly because of the (perceived?) quality/size ratio. | 13:31.26 |
ator | or pbm, if you want to keep it really simple | 13:31.37 |
| it's maybe 10% smaller, but 10x slower to decompress | 13:32.01 |
Wizzup | Right, but the main goal of applying MRC is to compress the images a lot, so that we can fit ~300 large images in a PDF and still keep it relatively small | 13:32.14 |
ator | 99% of slow PDF files are due to using JPX compressed images | 13:32.24 |
| just use heavier jpeg compression :) | 13:32.38 |
| JPX is especially bad on huge full-page high-resolution images | 13:32.51 |
Wizzup | JPX is JPEG2000, right? | 13:33.15 |
ator | yes | 13:33.19 |
| sorry :) | 13:33.22 |
| it's "JPXDecode" internally in PDF | 13:33.32 |
Wizzup | Yeah, I realised | 13:33.45 |
| Interesting, so the Internet Archive stores most of their photos of books as JPEG2000, and I must say I am impressed with the quality/size ratio compared to normal JPEG | 13:34.06 |
| But I haven't done much investigation on my own into the perceived benefits. | 13:34.26 |
| (And yes, JPEG2000 is a pain to deal with) | 13:34.30 |
ator | a full page jpeg2000 image at 300dpi is about 8 megapixels. that needs ~150MB of ram to decompress. | 13:35.08 |
| due to jpeg2000 decoders being slow and stupid about pixel format representation (storing each color component as a full integer) | 13:35.33 |
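A back-of-the-envelope check of the numbers above, assuming a US Letter page (8.5 x 11 in) at 300 dpi with 3 colour components. OpenJPEG's `opj_image_comp_t` stores each sample as an `OPJ_INT32`, so every component plane costs 4 bytes per pixel instead of 1. The helper name is hypothetical, and the remainder of the ~150MB figure is presumably decoder working memory, which this estimate does not model.

```python
def plane_bytes(width_px: int, height_px: int, components: int, bytes_per_sample: int) -> int:
    """Memory for the decoded component planes only (hypothetical helper)."""
    return width_px * height_px * components * bytes_per_sample


# US Letter at 300 dpi; the page size is an assumption for illustration.
w, h = int(8.5 * 300), 11 * 300
pixels = w * h
as_int32 = plane_bytes(w, h, 3, 4)  # one 32-bit integer per sample, as in OpenJPEG
as_uint8 = plane_bytes(w, h, 3, 1)  # packed 8-bit RGB for comparison
print(f"{pixels / 1e6:.1f} MP, int32 planes: {as_int32 / 1e6:.0f} MB, 8-bit: {as_uint8 / 1e6:.0f} MB")
# prints: 8.4 MP, int32 planes: 101 MB, 8-bit: 25 MB
```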
Wizzup | Is that with OpenJPEG? | 13:35.36 |
ator | yeah | 13:35.40 |
| jasper was equally bad, and even slower | 13:35.49 |
Wizzup | Yeah, I've found OpenJPEG to be quite slow. I've also used this "kakadu" thing and it's much, much faster but not FOSS. | 13:36.05 |
ator | there are no good open source jpeg2000 decoders :( | 13:36.22 |
| plenty of proprietary ones | 13:36.28 |
Wizzup | Yeah, indeed. | 13:36.37 |
ator | it's a terrible format, horribly overcomplicated, so I don't blame open source developers for shunning it :) | 13:36.46 |
Wizzup | agreed | 13:37.00 |
ator | djvu uses iw44 for its color images, which is based on the same compression principles as jpeg2000 but not completely insane | 13:37.51 |
Wizzup | It is, however, mostly not my decision. I could try to do this with JPEG as well, but right now I am trying to be at least on par with this closed-source tool that I am re-implementing in python (using mupdf and some science libs), so for an apples-to-apples comparison I'd initially want to have JPX in there as well | 13:37.55 |
ator | wavelets, etc. | 13:37.57 |
| right. well, jpeg2000 + ccittfax should work today. | 13:38.15 |
Wizzup | Yeah, that sounds good. I could probably encode it to ccittfax myself, since I encode the png to jbig2 to begin with. | 13:38.37 |
ator | jbig2 output would need creating the pdf file from scratch, or wait until we can pass through jbig2 images from our image loader to the PDF writer | 13:38.48 |
| Wizzup: that wouldn't be worth bothering with, mupdf would just decompress and recompress it | 13:39.20 |
| just pass it the png and mupdf will figure out to use ccitt compression if it's a monochrome bitmap | 13:39.45 |
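To confirm that an input PNG really is a 1-bit greyscale bitmap before handing it over, one can read the IHDR chunk directly. A minimal stdlib sketch, assuming a well-formed PNG; `png_is_monochrome` is a hypothetical helper, and it only inspects the input: whether MuPDF then picks CCITT fax is decided by MuPDF's own logic. A 1-bit palette image (colour type 3) with a black/white palette is not counted as monochrome here.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_is_monochrome(data: bytes) -> bool:
    """True if the PNG header says 1-bit greyscale (hypothetical helper)."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # The first chunk must be IHDR, whose data section is always 13 bytes long.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    _width, _height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    # Colour type 0 is greyscale; bit depth 1 means one bit per pixel.
    return bit_depth == 1 and color_type == 0
```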
Wizzup | Let me try that; I think I was getting encoding type "image" of size >1MB before, but maybe I was doing it wrong. | 13:40.03 |
ator | sebras: same issue with "mutool convert -o out.pdf img.jbig2" | 13:41.19 |
Wizzup | hm, so I am getting this when I pass a png as a smask, but maybe that is a pymupdf problem (output from pdfimages -list): 1 2 smask 2414 3560 gray 1 1 image no 7 0 305 305 1050K 100% | 13:48.02 |
ator | Wizzup: I tested a similar script using 'mutool run' and I get a 'SMask' too (but monochrome and CCITTFaxDecode) | 14:06.08 |
| but the image is compressed with CCITT, so it should make no practical difference | 14:09.22 |
Wizzup | Check, so I'm doing something wrong probably. The only time I've seen pymupdf/mupdf create a ccitt is when I was copying the jbig2 stream and writing it back, in which case it gets encoded to ccitt. The PNG I am loading is 1 bit grayscale. | 14:10.39 |
ator | mutool show out.pdf pages/1/Resources/XObject/*/SMask/Filter | 14:11.41 |
Wizzup | $ mutool show test.pdf pages/1/Resources/XObject/Im1/SMask/Filter | 14:14.45 |
| null | 14:14.45 |
| $ mutool show test.pdf pages/1/Resources/XObject/Im1/SMask | grep 'type' /Subtype /Image | 14:14.49 |
ator | Wizzup: http://ghostscript.com/~tor/example.pdf | 14:19.10 |
| that's what my script creates | 14:19.15 |
| given 2 jpeg images (background, foreground) and a monochrome PNG mask | 14:19.45 |
| save this https://pastebin.com/raw/32DKgFA2 as "foo.js" and run "mutool run foo.js" with the page.jpeg, text.jpeg, and mask.png images in the same directory. it will write a file "out.pdf" | 14:20.43 |
| ah, it needs a commit that isn't pushed yet | 14:21.29 |
| but you get the idea | 14:21.36 |
Wizzup | I do, thanks. Your pdf looks fine to me indeed. | 14:30.03 |
sebras | ator: after the meeting. | 14:38.20 |
| Wizzup: ping? | 19:16.48 |
| Wizzup: I'm able to open img.jbig2 that you supplied using mupdf-gl and mupdf-x11. I'm confused what is problematic for you? | 19:25.50 |
| Wizzup: did you change the img.jbig2 file on your website? | 19:36.02 |
Wizzup | sebras: I did not change the image | 19:51.55 |
sebras | Wizzup: when I run the very latest git HEAD version of mupdf on img.jbig2 I see something like this: http://ghostscript.com/~sebras/tmp/img.png | 19:56.01 |
| Wizzup: it looks reasonable. but possibly inverted? | 19:56.18 |
| Wizzup: when you said it didn't render, what did you mean? | 19:56.37 |
| ator: I don't see a problem with mutool convert either? | 19:59.22 |
Wizzup | sebras: For me, loading the jbig2 resulted in the error shared in the pymupdf bug report earlier | 21:00.34 |
| sebras: I think ator managed to reproduce it too; the question, I think, is: how would one load jbig2-encoded files and render them as an smask? I didn't manage to load them from a file and save them to a pdf. | 21:03.48 |
| I am not at my machine atm, will try to get more info soon. | 21:04.12 |
| sebras: What I am attempting to do is load a JPX image and a JBIG2 image (as smask) and write those to a PDF. When I do that, I get this: https://pastebin.com/SBR4maWE | 23:09.53 |
| The only way I've been able to handle a JBIG2 image is by reading the xref of an existing pdf with a JBIG2 image in it, and writing the stream to another pdf doc, even though mupdf will convert it to ccitt. I haven't been able to load a jbig2 from a file and insert that into a pdf, but running 'mutool convert -o /tmp/out.pdf /tmp/mask.jbig2' does work for me, so I don't know what that means. | 23:11.37 |
nickberger | how do i use noto fonts in "mupdf create" to enter this text string "अ" which is "u+0905". | 23:34.38 |