Log of #mupdf at irc.freenode.net.

20200521
kens not strained.06:52.49 
Zsolt hello09:46.18 
mubot Welcome to #mupdf, the channel for MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.09:46.18 
Zsolt I have a pdf related question, not exactly related to the mupdf software, but a question I got no answer for it.09:48.58 
sebras Zsolt: ok, what is it?09:49.18 
  Zsolt: I'm not sure I can help, but I can try. :)09:49.40 
kens sebras is too modest....09:49.56 
Zsolt for some pdf's I'm downloading from archive.org (books in pdf format) the rendering of pages is quite slow (I'm using SumatraPDF for pdf view).09:50.17 
sebras Zsolt: SumatraPDF uses mupdf to render PDF files.09:50.49 
Zsolt I read on the net that other people have the same problem with some pdf books downloaded from archive.org09:50.51 
  yes?? I did not know !09:51.01 
kens There's no way to diagnose the problem without seeing an example (preferably a small example)09:51.16 
Zsolt I also have Adobe acrobat pro DC, and I tried literally a dozen saving methods to speed up the pdf's but with no results . . .09:51.46 
kens It could be due to transparency, excessive use of patterns, silly colour spaces, strange image masking, all kinds of things09:51.46 
sebras though it is not unlikely that the files contain each page scanned as a JPEG2000 image at a high resolution. these do tend to be quite slow to render.09:51.52 
  Zsolt: do you have a link for a pdf on archive.org that is slow for you?09:52.21 
Zsolt kens, it is OK if I upload 1 10-15 page book to google drive that renders slowly?09:53.03 
kens Or you could just give us a URL09:53.17 
sebras directly to archive.org09:53.26 
kens But google drive is OK too, or dropbox, or...09:53.28 
Zsolt I will go to google drive, because sending a direct link will not work: the books in question are drm protected and you need dedrm to remove the protection (which I did). But I don't want to bother you with such things . . .09:54.57 
  So I will upload to google drive09:55.05 
kens OK that works09:55.10 
Zsolt 28 pages which is just 1.5 MB, but renders very slow to display 1 file. (10-12 sec on my laptop). 2GHz dual core CPU, with 2 GB RAM10:05.44 
  *1 page, not 1 file . . .10:06.19 
kens OK I have the file, give me a minute10:07.14 
Zsolt ok10:07.26 
kens decompressed the file is 58MB10:08.00 
Zsolt that is big10:08.22 
kens It appears to be a scan of a paper book10:08.27 
Zsolt yes10:08.33 
  the whole book is 5xx pages10:08.45 
kens Someone has applied OCR and layered invisible text on top of the images in order to make it searchable10:08.57 
Zsolt so you say the OCR layer slows it down?10:09.35 
kens No, just describing how it's constructed10:09.54 
  10 0 obj10:10.46 
Zsolt what helped (with other such slow books) was to save the book in Adobe pro DC with jpeg quality medium, but that introduced artifacts around the letters/characters . . .10:10.50 
kens Resaving a JPEG always does that.10:11.05 
  Caused by quantising an already quantised set of image data10:11.24 
  JPEG (and JPEG 2000) is a lossy format10:11.34 
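A minimal sketch of kens' point about quantising already quantised data. It assumes plain scalar quantisation of integer sample values; real JPEG quantises DCT coefficients, so this only illustrates the principle. The step sizes 10 and 7 are arbitrary illustration values, not taken from any encoder.

```python
def quantise(value, step):
    """Round a non-negative integer to the nearest multiple of step."""
    return ((value + step // 2) // step) * step

samples = range(256)  # every 8-bit sample value

# First save: one quantisation pass with step 10 (arbitrary choice).
once = [quantise(v, 10) for v in samples]
# Resave: the already quantised values are quantised again with step 7.
twice = [quantise(v, 7) for v in once]

# Worst-case distance from the original sample after each stage.
max_err_once = max(abs(v - q) for v, q in zip(samples, once))
max_err_twice = max(abs(v - q) for v, q in zip(samples, twice))

print("worst error after one pass:", max_err_once)
print("worst error after two passes:", max_err_twice)
```

The second pass cannot recover what the first pass discarded, and because its grid does not line up with the first one it pushes some samples further from their original values.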
  Yes, the images are JPX (ie JPEG2000) encoded.10:11.54 
  That's a highly complex format which takes a long time to decompress10:12.09 
  The only way to improve the processing speed (IMO) is to resave the file with different (simpler) compression.10:13.29 
Zsolt but that will increase the book size (I presume), no?10:14.34 
kens Yes, unless you also downsample the images to a lower resolution10:14.52 
Zsolt by the way10:15.03 
kens Which will, of course, reduce the quality and make it harder to read10:15.05 
Zsolt in case of a book scan10:15.29 
  every page is an actual image?10:15.37 
kens Yes, correct10:15.43 
Zsolt like jpg, png, tiff . . .10:15.46 
kens Every page is a single JPX image10:15.48 
Zsolt ok10:15.54 
kens JPEG2000 image10:15.55 
  Looks like 100 dpi10:16.22 
  So it's already a pretty low resolution scan10:16.35 
  You could convert it to grayscale or Monochrome. Monochrome would allow use of CCITTFaxEncoding10:17.08 
Zsolt I tried to understand the pdf resolution (DPI). What will change if I downsample a 300dpi pdf to 100dpi?10:17.13 
  because I can read it the same way, no?10:17.24 
kens You'll get fewer image samples10:17.30 
  If a PDF page is (let's say for example) 8 inches by 10 inches, and you have an image which is 800 samples wide by 1000 samples long, and you draw the image so that it covers the entire page, then the image is drawn with a resolution of 100 dpi.10:18.52 
  If I take the exact same image, draw it on a PDF file where the page is only 4 inches by 5, but still filling the entire page, then the resolution will be 200 dpi10:19.22 
  Of course, on screen that may not appear any different (because you zoom the page in), but when you come to print it on paper, it will10:20.21 
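The arithmetic in kens' example can be written out as a short sketch. The function name `effective_dpi` is made up for illustration; the numbers are the ones from the messages above.

```python
def effective_dpi(samples, inches):
    """Image samples spread over a physical length give samples per inch."""
    return samples / inches

# An 800 x 1000 sample image drawn to fill an 8 x 10 inch page.
print(effective_dpi(800, 8), effective_dpi(1000, 10))

# The same image drawn to fill a 4 x 5 inch page.
print(effective_dpi(800, 4), effective_dpi(1000, 5))
```

The image data is identical in both cases; only the physical area it is drawn over changes, which is why the resolution is a property of the placement and not of the image alone.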
Zsolt I understand10:20.50 
malc_ ator: https://vimeo.com/4029657110:21.07 
ator kens: it's worse than that, there are two images. one a low res and a second high-res.10:22.06 
Zsolt kens, I'm checking the pdf resolution and acrobat reader tells me it is 144 by 144 pixels . . .10:22.19 
kens Oh really ? I hadn't seen the hires one10:22.22 
  The Acrobat display looks quite low res10:22.29 
ator image 10 and then 1110:22.33 
  first one is 1118x1614 pixels, second one is 3352x484210:23.02 
  on each page10:23.03 
kens Ah yes10:23.12 
  Three times the resolution10:23.22 
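A quick check of the "three times" figure, using the image dimensions ator quotes. Note that tripling the linear resolution means roughly nine times as many pixels to decode and resample.

```python
low = (1118, 1614)   # first image on each page, in pixels
high = (3352, 4842)  # second image on each page, in pixels

# Ratio along one axis, and ratio of total pixel counts.
linear_ratio = high[0] / low[0]
pixel_ratio = (high[0] * high[1]) / (low[0] * low[1])

print(round(linear_ratio, 2), round(pixel_ratio, 2))
```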
Zsolt are you using Acrobat?10:23.43 
ator and followed by a third jbig2-encoded image mask10:23.45 
  for the black-and-white data10:23.52 
  so this is basically the DjVu approach in PDF10:24.08 
  one low-res color image as the page backdrop10:24.15 
  followed by a high-res color image, masked by a high-res monochrome image10:24.27 
kens Or in other words, about as slow as you can make it10:24.43 
ator if you look at the extracted high-res color image it has a lot of garbage in the pixels that are masked away by the jbig2 image10:24.45 
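The layering ator describes can be sketched per pixel as follows. This is a toy model, not how MuPDF implements it: it assumes all three layers have already been resampled to the same size, and it uses strings in place of colour values.

```python
def composite(background, foreground, mask):
    """Per pixel: take the foreground where the mask is set, else the background."""
    return [
        [fg if m else bg for bg, fg, m in zip(bg_row, fg_row, m_row)]
        for bg_row, fg_row, m_row in zip(background, foreground, mask)
    ]

# Toy 2 x 3 page.
background = [["paper"] * 3 for _ in range(2)]   # low-res colour backdrop
foreground = [["ink", "junk", "ink"],            # high-res colour image
              ["junk", "ink", "junk"]]
mask = [[1, 0, 1],                               # monochrome image mask
        [0, 1, 0]]

page = composite(background, foreground, mask)
print(page)
```

The "junk" entries stand for the garbage pixels in the high-res colour image: they never reach the page because the mask is 0 there, but they still have to be decoded.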
Zsolt yes I see10:25.11 
ator you could extract all the image masks, and create a black&white PDF from only those. you'd lose the coloration but keep the text shapes.10:25.48 
Zsolt or save the pages as lossless images with no compression? that should theoretically speed up page rendering, no?10:26.52 
ator given how it's constructed, there are two things that make it slow:10:27.30 
kens It would be faster, but you're still facing drawing multiple sets of image data, one set through a mask10:27.30 
ator 1) JPEG2000 and JBIG2 compression (both of these are ridiculously slow to decompress)10:27.45 
  2) High resolution images with masking, take a while to process and resample to fit the display resolution.10:28.03 
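One rough way to check whether a file leans on the two slow filters ator names is to count filter names in the raw bytes. This is only a heuristic sketch, not a PDF parser: it counts every `/...Decode` name it sees, wherever it appears. `JPXDecode` and `JBIG2Decode` are the standard PDF filter names for JPEG2000 and JBIG2; the function name and sample bytes are made up for illustration.

```python
import re
from collections import Counter

def count_filters(pdf_bytes):
    """Count occurrences of /SomethingDecode names in raw PDF bytes."""
    return Counter(re.findall(rb"/(\w+Decode)\b", pdf_bytes))

# Made-up fragment standing in for the contents of a real file.
sample = (b"<< /Filter /JPXDecode >> << /Filter /JBIG2Decode >> "
          b"<< /Filter /JPXDecode >>")

counts = count_filters(sample)
print(counts[b"JPXDecode"], counts[b"JBIG2Decode"])
```

For a real file the bytes would come from `open(path, "rb").read()`, where `path` is whatever PDF you want to inspect.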
Zsolt JBIG is some sort of jpeg?10:29.24 
sebras Zsolt: no, JBIG2 is a different format.10:29.34 
  Zsolt: it is used for black and white images, which is why it is used for the image mask in this case.10:30.02 
Zsolt ok, I understand10:30.15 
  thanks for the help, I will return if I face some other pdf problems10:32.07 
  I downloaded about 20 books from archive.org, and every one is slow . . .10:32.41 
  on your computers, is my sample pdf also slow?10:40.44 
kens yes, because it requires a lot of processing10:41.19 
Zsolt ok, so it is not because I have a slow pc . . .10:41.48 
ator no. I have a fast pc and the file is still crazy slow!10:42.04 
Zsolt I need to go now, maybe I will be back later if I can solve this pdf issue.10:43.44 
  thanks for help10:43.48 
  and bye10:43.50 
hisacro is there a way to predefine page to fit width, as of now I made mupdf to read the next file every time I quit but need 'W' each time10:59.30 
ator hisacro: get a bigger monitor, or start with a smaller resolution (-r) so the page fits your screen :)11:27.36 
  the default is to start with the full page showing, window sized to fit11:27.56 
  but a small monitor or big page size can get in the way of that, we don't create a window larger than will fit on your screen11:28.12 
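If the goal is a page that is exactly as wide as the window, the value to pass to the -r option ator mentions could be estimated like this. This is a sketch under the assumption that -r is the rendering resolution in dpi and that page widths are given in PDF points (72 per inch); the window width and page size below are hypothetical.

```python
POINTS_PER_INCH = 72  # PDF page sizes are measured in points

def fit_width_resolution(window_width_px, page_width_pt):
    """Resolution (dpi) at which the page is exactly window_width_px wide."""
    return window_width_px * POINTS_PER_INCH / page_width_pt

# Hypothetical setup: a 1000 pixel wide window and a US Letter page (612 pt wide).
resolution = fit_width_resolution(1000, 612)
print(round(resolution))
```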
malc_ ator: what's your definition of "bigger monitor"?11:31.31 
ator kens: Zsolt: I wrote a small mutool run script to convert this file (and others like it) to contain only the image mask: http://ix.io/2mTj11:31.38 
kens how much faster is that ator ?11:31.57 
ator malc_: I can easily fit a4 and letter sized documents on my 1920x1200 24" monitor11:32.01 
  takes a while to grind through though11:32.11 
kens :-)11:32.11 
ator kens: try the result in casper:/home/tor/out.pdf11:32.35 
  hm, but it's not fax compressing it as it should :(11:33.11 
  need to do more work...11:33.19 
kens That's a load faster certainly11:34.02 
  essentially instant as opposed to 3-4 seconds11:34.09 
  per page that is11:34.56 
malc_ ator: hmmm, my (now broken) 25in one is 2560x1440 and this new 27 (3840x2160) are both bigger than 24... still i think they sport higher dpi than your 24 one... explains why you enable antialiasing (as seen in your github snippets repository config files)11:35.21 
ator kens: and try it now! bugs fixed so now it fax compresses the images too.11:47.08 
  takes the file size down to <1MB11:47.27 
kens That's pretty good11:47.43 
  easier to read in monochrome also11:47.48 
ator but uh oh, I think I have a bug in the fax compression :/11:47.50 
  page 8 is broken11:47.56 
  much easier to read!11:49.18 
  yeah. fax g3 saving works. g4 creates broken streams. bah!11:49.41 
  g4 makes soooo much smaller files, gotta fix that bug11:50.06 
malc_ and for all typography connoisseurs out there - https://www.mcsweeneys.net/articles/im-comic-sans-asshole14:06.11 
  i used to have a (n optional) mode that used fz_install_load_system_font_funcs to use fontconfig to load fonts for rendering, today i've tried to install a system fallback function to override default behavior (at least when rendering html, to avoid falling back to charis sil), but the callback function was never called, should it have been?17:37.34 
  comment describing fz_load_system_fallback_font_fn in include/mupdf/fitz/font.h is wrong (copy-pasted?) it lists `name` argument - this function has no argument called name.18:07.12 
txt23 Hey guys. I have customers constantly upload super large scanned versions of their docs. How can I resize them to letter size using MuPDF?21:59.34 