Log of #mupdf at irc.freenode.net.

20201013
muevoid Hello! I am trying to build mupdf with clang and clangutils however I get this error: ld: error: target emulation unknown: -m or at least one .o file required. Any ideas?07:45.16 
sebras muevoid: do you have the full compilation log somewhere?07:48.21 
muevoid Yes I can get it and upload to termbin07:48.35 
sebras muevoid: and what version of mupdf are you compiling git HEAD?07:48.46 
muevoid 1.18.007:48.59 
sebras ok.07:49.13 
muevoid Here is the build log: https://termbin.com/vyzl. I also have the following patch but don't believe it would affect this: https://termbin.com/ltoj07:50.09 
  Actually sorry for wasting your time. I missed a package! My apologies07:55.22 
sebras ok.07:55.28 
  muevoid: do you mind telling me what package?07:55.53 
  tesseract/leptonica perhaps?07:56.03 
muevoid gnu-as, I run a distro which gets rid of a lot of gnu things and ld.bfd is needed for linking since clang ld doesn't have a default output07:57.08 
sebras ok.07:57.23 
muevoid Sorry for any inconvenience07:57.32 
Wizzup Hi - I'm trying to save a JBIG2 file via pymupdf, but I get this error: `mupdf: cannot complete jbig2 image`. Any clue how I can get more info? I looked at the source code and it seems like in all cases where the function jbig2_complete_page would return < 0, there should be an error printed with level JBIG2_SEVERITY_WARNING, but I am not seeing any.10:52.16 
  Maybe it is a jbig2 version mismatch.12:02.06 
ator Wizzup: I haven't a clue, I'm afraid. You might have more luck asking the pymupdf developers.12:39.41 
Wizzup I am in contact with him. In case you are interested: https://github.com/pymupdf/PyMuPDF/issues/68512:41.16 
ator Wizzup: completely off topic, but I'd advise against using JPX -- it's a horribly slow and memory hungry image format12:42.22 
  JPEG would be a much better choice, IMO.12:42.31 
  Wizzup: the functions that pymupdf uses don't internally support jbig2 compression. however, we detect and compress monochrome 1-bit images using CCITT fax encoding.12:46.18 
  if you want to create a PDF from known input data, and have the jbig2 stream, it's actually pretty easy to just "printf" the PDF file from scratch without using a library12:46.55 
  basically just write a bunch of template strings, write the image data, while keeping track of the file offsets of each object to create the 'xref' table at the end12:47.31 
  images loaded and added using the pdf_add_image function only support a subset of formats (and not jbig2 for complicated reasons -- the "global" dictionary being the main problem)12:49.01 
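The "printf the PDF from scratch" approach ator describes above can be sketched roughly as follows. This is a minimal illustration, not MuPDF code: `build_pdf` is a hypothetical helper name, and the three objects only describe a blank page. A real JBIG2 image would additionally need an image XObject stream and a content stream, which are left out here. The point is only the bookkeeping: record the byte offset of each object as it is written, then emit the 'xref' table and trailer at the end.

```python
def build_pdf(objects):
    """Assemble a PDF from a list of object bodies (bytes).

    Object numbers are assigned 1..N in list order; object 1 is assumed
    to be the document catalog. Hypothetical helper for illustration.
    """
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        # Remember where this object starts; the xref table needs it.
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    # Entry 0 is always the head of the free list. Each entry is 20 bytes.
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


# Template strings for a single blank US Letter page.
pdf = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
])

with open("blank.pdf", "wb") as f:
    f.write(pdf)
```

A stream object (such as image data) would be one more entry in the list, with a body of the form `<< ... /Length n >>`, then `stream`, the raw data, and `endstream`.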
vtorri__ ator hello12:49.41 
ator hi12:49.51 
mubot Welcome to #mupdf, the channel for MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.12:49.51 
vtorri__ ator does the PDF format allow to support any kind of image format ?12:50.19 
  ator like as an extension12:50.35 
ator define "format"12:50.35 
  you mean compression/file format?12:50.47 
vtorri__ webp, avif etc...12:50.53 
ator no. it does not.12:50.57 
vtorri__ container and codec12:51.00 
  ok, thank you12:51.10 
ator it supports a specific set of formats, but no way to extend it.12:51.23 
vtorri__ ok12:51.29 
  as well as pdf 2 ?12:51.45 
ator correct.12:52.08 
vtorri__ thank you12:53.31 
ator Wizzup: "If we pass a JBIG2 stream to write to the PDF, why would MuPDF want to decode it?" it decodes it because pdf_add_image doesn't know how to write JBIG2, so it decodes it first, then chooses a compression for output after looking at the data.12:53.50 
Wizzup ator: I do not wish to just copy a PDF from known input data, that was just my first test13:11.36 
ator Wizzup: do you have an example jbig2 file such as one you want to create the data from?13:13.47 
Wizzup Sure: https://wizzup.org/img.jbig213:14.20 
  (Created using https://github.com/agl/jbig2enc)13:14.35 
ator 403 forbidden13:14.35 
Wizzup oops. sorry.13:14.40 
  Should work now (I created the initial file with mkstemp)13:15.07 
ator got it13:15.56 
malc_ ator: any idea what typeface was used in the book this jbig file is an image of?13:17.35 
ator Wizzup: it looks to be loaded as a compressed jbig2 image internally13:19.07 
  Wizzup: but for some reason it's unable to decompress the image13:20.57 
Wizzup jbig2dec can decompress the image: jbig2dec img.jbig2 -t png -o /tmp/out.png13:21.28 
ator I suspect our code expects jbig2 images to only come from the streams embedded in a PDF file, not an external jbig213:22.17 
Wizzup So I have tried stripping the 8 byte header, but then mupdf says it cannot recognize what image it is dealing with (which makes sense)13:23.03 
ator Wizzup: yeah. this looks to be a bug in mupdf.13:23.33 
  sebras: (when you're back from dinner) the muimg.c code has a special case for load_subimage for jbig2, if I remove that special case and try to load a jbig2 image file "normally", I fail with the same error as Wizzup is seeing13:24.30 
Wizzup Check, can you let the pymupdf developer know, or should I? (To prevent him from going on a ghost chase.)13:24.39 
ator I'll write a comment13:24.54 
Wizzup thanks!13:25.11 
ator you'll still not be able to save it as a jbig2 even if I fix this error13:25.49 
  at least not until I extend pdf_add_image to support writing jbig2 streams (but that may not be easy, due to the jbig2 file header thing)13:26.23 
  but you should at least be able to load them and mupdf should compress as ccitt fax!13:27.02 
Wizzup That would be a great start. :-)13:29.49 
  (I'm hoping to release this tool to compress PDFs with MRC in the next week or so, because I couldn't find anything that did it for me.)13:30.51 
ator Wizzup: you could start with JPEG and CCITT fax (using monochrome png as input) :)13:31.17 
Wizzup And on JPX, well noted. I'm going with it mostly because of the (perceived?) quality/size ratio.13:31.26 
ator or pbm, if you want to keep it really simple13:31.37 
  it's maybe 10% smaller, but 10x slower to decompress13:32.01 
Wizzup Right, but the main goal of applying MRC is to compress the images a lot, so that we can fit ~300 large images in a PDF and still keep it relatively small13:32.14 
ator 99% of slow PDF files are due to using JPX compressed images13:32.24 
  just use heavier jpeg compression :)13:32.38 
  JPX is especially bad on huge full-page high-resolution images13:32.51 
Wizzup JPX is JPEG2000, right?13:33.15 
ator yes13:33.19 
  sorry :)13:33.22 
  it's "JPXDecode" internally in PDF13:33.32 
Wizzup Yeah, I realised13:33.45 
  Interesting, so the Internet Archive stores most of their photos of books as JPEG2000, and I must say I am impressed with the quality/size ratio compared to normal JPEG13:34.06 
  But I haven't done much investigation on my own into the perceived benefits.13:34.26 
  (And yes, JPEG2000 is a pain to deal with)13:34.30 
ator a full page jpeg2000 image at 300dpi is about 8 megapixels. that needs ~150MB of ram to decompress.13:35.08 
  due to jpeg2000 decoders being slow and stupid about pixel format representation (storing each color component as a full integer)13:35.33 
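The arithmetic behind that estimate can be checked with a few lines. This is only a back-of-the-envelope sketch under stated assumptions: a US Letter page (8.5 x 11 inches), three color components, and 4 bytes per sample to stand in for "a full integer". The function name is hypothetical. Under those assumptions the decoded component planes alone come to roughly 100 MB; the rest of the ~150MB figure presumably comes from decoder working buffers, which this sketch does not model.

```python
def component_plane_bytes(width_in, height_in, dpi, components=3, bytes_per_sample=4):
    """Return (pixel count, bytes needed) when every color component of
    every pixel is stored as a full integer. Page size, component count,
    and sample width are assumptions for illustration."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    pixels = width_px * height_px
    return pixels, pixels * components * bytes_per_sample


pixels, total = component_plane_bytes(8.5, 11, 300)
print(f"{pixels / 1e6:.1f} megapixels, {total / 1e6:.0f} MB for decoded component planes")
```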
Wizzup Is that with OpenJPEG?13:35.36 
ator yeah13:35.40 
  jasper was equally bad, and even slower13:35.49 
Wizzup Yeah, I've found OpenJPEG to be quite slow. I've also used this "kakadu" thing and it's much, much faster but not FOSS.13:36.05 
ator there are no good open source jpeg2000 decoders :(13:36.22 
  plenty of proprietary ones13:36.28 
Wizzup Yeah, indeed.13:36.37 
ator it's a terrible format, horribly over complicated, so I don't blame open source developers for shunning it :)13:36.46 
Wizzup agreed13:37.00 
ator djvu uses iw44 for its color images, which is based on the same compression principles as jpeg2000 but not completely insane13:37.51 
Wizzup It is, however, mostly not my decision. But I could try to do this with JPEG as well, but right now I am trying to be at least on-par with this closed source tool that I am re-implementing in python (using mupdf and some science libs), so for an apples-to-apples comparison, I'd likely initially want to have JPX in there as well13:37.55 
ator wavelets, etc.13:37.57 
  right. well, jpeg2000 + ccittfax should work today.13:38.15 
Wizzup Yeah, that sounds good. I could probably encode it to ccittfax myself, since I encode the png to jbig2 to begin with.13:38.37 
ator jbig2 output would need creating the pdf file from scratch, or wait until we can pass through jbig2 images from our image loader to the PDF writer13:38.48 
  Wizzup: that wouldn't be worth bothering with, mupdf would just decompress and recompress it13:39.20 
  just pass it the png and mupdf will figure out to use ccitt compression if it's a monochrome bitmap13:39.45 
Wizzup Let me try that, I think I was getting type "image" before of size >1MB, but maybe I was doing it wrong.13:40.03 
ator sebras: same issue with "mutool convert -o out.pdf img.jbig2"13:41.19 
Wizzup hm, so I am getting this when I pass a png as a smask, but maybe that is a pymupdf problem (output from pdfimages -list): 1 2 smask 2414 3560 gray 1 1 image no 7 0 305 305 1050K 100%13:48.02 
ator Wizzup: I tested a similar script using 'mutool run' and I get a 'SMask' too (but monochrome and CCITTFaxDecode)14:06.08 
  but the image is compressed with CCITT, so it should make no practical difference14:09.22 
Wizzup Check, so I'm doing something wrong probably. The only time I've seen pymupdf/mupdf create a ccitt is when I was copying the jbig2 stream and writing it back, in which case it gets encoded to ccitt. The PNG I am loading is 1 bit grayscale.14:10.39 
ator mutool show out.pdf pages/1/Resources/XObject/*/SMask/Filter14:11.41 
Wizzup $ mutool show test.pdf pages/1/Resources/XObject/Im1/SMask/Filter14:14.45 
  null14:14.45 
  $ mutool show test.pdf pages/1/Resources/XObject/Im1/SMask | grep 'type' /Subtype /Image14:14.49 
ator Wizzup: http://ghostscript.com/~tor/example.pdf14:19.10 
  that's what my script creates14:19.15 
  given 2 jpeg images (background, foreground) and a monochrome PNG mask14:19.45 
  save this https://pastebin.com/raw/32DKgFA2 as "foo.js" and run "mutool run foo.js" with the page.jpeg, text.jpeg, and mask.png images in the same directory. it will write a file "out.pdf"14:20.43 
  ah, it needs a commit that isn't pushed yet14:21.29 
  but you get the idea14:21.36 
Wizzup I do, thanks. Your pdf looks fine to me indeed.14:30.03 
sebras ator: after the meeting.14:38.20 
  Wizzup: ping?19:16.48 
  Wizzup: I'm able to open img.jbig2 that you supplied using mupdf-gl and mupdf-x11. I'm confused what is problematic for you?19:25.50 
  Wizzup: did you change the img.jbig2 file on your website?19:36.02 
Wizzup sebras: I did not change the image19:51.55 
sebras Wizzup: when I run the very latest git HEAD version of mupdf on img.jbig2 I see something like this: http://ghostscript.com/~sebras/tmp/img.png19:56.01 
  Wizzup: it looks reasonable. but possibly inverted?19:56.18 
  Wizzup: when you said it didn't render, what did you mean?19:56.37 
  ator: I don't see a problem with mutool convert either?19:59.22 
Wizzup sebras: For me, loading the jbig resulted in the error shared in the pymupdf bug report earlier21:00.34 
  sebras: I think ator managed to reproduce it too; the question is I think: how does one load jbig2 encoded files and render them as smask? I didn't manage to load them from a file and save them to a pdf.21:03.48 
  I am not at my machine atm, will try to get more info soon.21:04.12 
  sebras: What I am attempting to do is load a JPX image and a JBIG2 image (as smask) and write those to a PDF. When I do that, I get this: https://pastebin.com/SBR4maWE23:09.53 
  The only way I've been able to handle a JBIG2 image is by reading the xref of an existing pdf with a JBIG2 image in it, and writing the stream to another pdf doc, even though mupdf will convert it to ccitt. I haven't been able to load a jbig2 from a file and insert that into a pdf, but running 'mutool convert -o /tmp/out.pdf /tmp/mask.jbig2' does work for me, so I don't know what that means.23:11.37 
nickberger how do i use noto fonts in "mupdf create" to enter this text string "अ" which is "u+0905".23:34.38 