| <<<Back 1 day (to 2020/05/20) | Fwd 1 day (to 2020/05/22)>>> | 20200521 |
kens | not strained. | 06:52.49 |
Zsolt | hello | 09:46.18 |
mubot | Welcome to #mupdf, the channel for MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line. | 09:46.18 |
Zsolt | I have a pdf related question, not exactly related to the mupdf software, but one I have not found an answer to. | 09:48.58 |
sebras | Zsolt: ok, what is it? | 09:49.18 |
| Zsolt: I'm not sure I can help, but I can try. :) | 09:49.40 |
kens | sebras is too modest.... | 09:49.56 |
Zsolt | for some pdf's I'm downloading from archive.org (books in pdf format) the rendering of pages is quite slow (I'm using SumatraPDF for pdf view). | 09:50.17 |
sebras | Zsolt: SumatraPDF uses mupdf to render PDF files. | 09:50.49 |
Zsolt | I saw on the net that other people have the same problem with some pdf books downloaded from archive.org | 09:50.51 |
| yes?? I did not know ! | 09:51.01 |
kens | There's no way to diagnose the problem without seeing an example (preferably a small example) | 09:51.16 |
Zsolt | I also have Adobe acrobat pro DC, and I tried literally a dozen saving methods to speed up the pdf's, but with no results . . . | 09:51.46 |
kens | It could be due to transparency, excessive use of patterns, silly colour spaces, strange image masking, all kinds of things | 09:51.46 |
sebras | though it is not unlikely that the files contain each page scanned as a JPEG2000 image at a high resolution. these do tend to be quite slow to render. | 09:51.52 |
| Zsolt: do you have a link for a pdf on archive.org that is slow for you? | 09:52.21 |
Zsolt | kens, is it OK if I upload a 10-15 page book that renders slowly to google drive? | 09:53.03 |
kens | Or you could just give us a URL | 09:53.17 |
sebras | directly to archive.org | 09:53.26 |
kens | But google drive is OK too, or dropbox, or... | 09:53.28 |
Zsolt | I will go with google drive, because sending a direct link will not work: the books in question are drm protected and you need dedrm to remove the protection (which I did). But I don't want to bother you with such things . . . | 09:54.57 |
| So I will upload to google drive | 09:55.05 |
kens | OK that works | 09:55.10 |
Zsolt | 28 pages which is just 1.5 MB, but renders very slow to display 1 file. (10-12 sec on my laptop). 2GHz dual core CPU, with 2 GB RAM | 10:05.44 |
| https://drive.google.com/open?id=1c7i_oHpu8HKvyIbUepXUHWDOKGtVVx-N | 10:05.45 |
| *1 page, not 1 file . . . | 10:06.19 |
kens | OK I have the file, give me a minute | 10:07.14 |
Zsolt | ok | 10:07.26 |
kens | decompressed the file is 58MB | 10:08.00 |
Zsolt | that is big | 10:08.22 |
kens | It appears to be a scan of a paper book | 10:08.27 |
Zsolt | yes | 10:08.33 |
| the whole book is 5xx pages | 10:08.45 |
kens | Someone has applied OCR and layered invisible text on top of the images in order to make it searchable | 10:08.57 |
Zsolt | so you say the OCR layer slows it down? | 10:09.35 |
kens | No, just describing how it's constructed | 10:09.54 |
| 10 0 obj | 10:10.46 |
Zsolt | what helped (with other such slow books) was to save the book in Adobe pro DC with jpeg quality medium, but that introduced artifacts around the letters/characters . . . | 10:10.50 |
kens | Resaving a JPEG always does that. | 10:11.05 |
| Caused by quantising an already quantised set of image data | 10:11.24 |
| JPEG (and JPEG 2000) is a lossy format | 10:11.34 |
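The remark about quantising already quantised data can be illustrated with a toy model. This is only a sketch: real JPEG quantises DCT coefficients per block, whereas the code below rounds plain sample values, and the step sizes 10 and 7 are arbitrary assumptions standing in for two different quality settings.

```python
def quantise(value: int, step: int) -> int:
    """Round a non-negative sample to the nearest multiple of `step`."""
    return step * ((value + step // 2) // step)


def mean_abs_error(transform, samples) -> float:
    """Average absolute difference between original and transformed samples."""
    samples = list(samples)
    return sum(abs(transform(s) - s) for s in samples) / len(samples)


samples = range(256)  # every 8-bit sample value once

# Saved once with step 7 (hypothetical "second" quality setting).
single = mean_abs_error(lambda s: quantise(s, 7), samples)

# Saved with step 10 first, then re-saved with step 7.
double = mean_abs_error(lambda s: quantise(quantise(s, 10), 7), samples)

print(f"single save: {single:.2f}  re-save: {double:.2f}")
```

In this toy model the re-saved data ends up further from the original than data saved once at the same final step, which is the effect seen as artifacts around the letters.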
| Yes, the images are JPX (ie JPEG2000) encoded. | 10:11.54 |
| That's a highly complex format which takes a long time to decompress | 10:12.09 |
| The only way to improve the processing speed (IMO) is to resave the file with different (simpler) compression. | 10:13.29 |
Zsolt | but that will increase the book size (I presume), no? | 10:14.34 |
kens | Yes, unless you also downsample the images to a lower resolution | 10:14.52 |
Zsolt | by the way | 10:15.03 |
kens | Which will, of course, reduce the quality and make it harder to read | 10:15.05 |
Zsolt | in case of a book scan | 10:15.29 |
| every page is an actual image? | 10:15.37 |
kens | Yes, correct | 10:15.43 |
Zsolt | like jpg, png, tiff . . . | 10:15.46 |
kens | Every page is a single JPX image | 10:15.48 |
Zsolt | ok | 10:15.54 |
kens | JPEG2000 image | 10:15.55 |
| Looks like 100 dpi | 10:16.22 |
| So it's already a pretty low resolution scan | 10:16.35 |
| You could convert it to grayscale or monochrome. Monochrome would allow use of CCITT fax encoding | 10:17.08 |
Zsolt | I tried to understand the pdf resolution (DPI). What will change if I downsample a 300dpi pdf to 100dpi? | 10:17.13 |
| because I can read it the same way, no? | 10:17.24 |
kens | You'll get fewer image samples | 10:17.30 |
| If a PDF page is (let's say, for example) 8 inches by 10 inches, and you have an image which is 800 samples wide by 1000 samples long, and you draw the image so that it covers the entire page, then the image is drawn with a resolution of 100 dpi. | 10:18.52 |
| If I take the exact same image, draw it on a PDF file where the page is only 4 inches by 5, but still filling the entire page, then the resolution will be 200 dpi | 10:19.22 |
| Of course, on screen that may not appear any different (because you zoom the page in), but when you come to print it on paper, it will | 10:20.21 |
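The arithmetic in the explanation above can be sketched in a few lines. The function names are hypothetical and only the 800 x 1000 sample image and the two page sizes come from the discussion; the 2400 x 3000 scan in the last line is an assumed example of a 300 dpi page downsampled to 100 dpi.

```python
def effective_dpi(samples: int, drawn_inches: float) -> float:
    """Resolution of an image whose `samples` are spread over `drawn_inches`."""
    if drawn_inches <= 0:
        raise ValueError("drawn size must be positive")
    return samples / drawn_inches


def downsampled_size(width: int, height: int, from_dpi: int, to_dpi: int):
    """Sample dimensions after resampling, with the drawn size unchanged."""
    scale = to_dpi / from_dpi
    return round(width * scale), round(height * scale)


# Same 800 x 1000 sample image drawn on two different page sizes.
print(effective_dpi(800, 8.0), effective_dpi(1000, 10.0))  # 100.0 100.0
print(effective_dpi(800, 4.0), effective_dpi(1000, 5.0))   # 200.0 200.0

# Assumed example: an 8 x 10 inch page scanned at 300 dpi, reduced to 100 dpi.
print(downsampled_size(2400, 3000, 300, 100))
```

Dropping from 300 dpi to 100 dpi keeps one third of the samples in each direction, so roughly one ninth of the image data remains.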
Zsolt | I understand | 10:20.50 |
malc_ | ator: https://vimeo.com/40296571 | 10:21.07 |
ator | kens: it's worse than that, there are two images. one a low res and a second high-res. | 10:22.06 |
Zsolt | kens, I'm checking the pdf resolution and acrobat reader tells me it is 144 by 144 pixels . . . | 10:22.19 |
kens | Oh really ? I hadn't seen the hires one | 10:22.22 |
| The Acrobat display looks quite low res | 10:22.29 |
ator | image 10 and then 11 | 10:22.33 |
| first one is 1118x1614 pixels, second one is 3352x4842 | 10:23.02 |
| on each page | 10:23.03 |
kens | Ah yes | 10:23.12 |
| Three times the resolution | 10:23.22 |
Zsolt | are you using Acrobat? | 10:23.43 |
ator | and followed by a third jbig2-encoded image mask | 10:23.45 |
| for the black-and-white data | 10:23.52 |
| so this is basically the DjVu approach in PDF | 10:24.08 |
| one low-res color image as the page backdrop | 10:24.15 |
| followed by a high-res color image, masked by a high-res monochrome image | 10:24.27 |
kens | Or in other words, about as slow as you can make it | 10:24.43 |
ator | if you look at the extracted high-res color image it has a lot of garbage in the pixels that are masked away by the jbig2 image | 10:24.45 |
Zsolt | yes I see | 10:25.11 |
ator | you could extract all the image masks, and create a black&white PDF from only those. you'd lose the coloration but keep the text shapes. | 10:25.48 |
Zsolt | or save the pages as lossless images with no compression? that should theoretically speed up page rendering, no? | 10:26.52 |
ator | given how it's constructed, there are two things that make it slow: | 10:27.30 |
kens | It would be faster, but you're still facing drawing multiple sets of image data, one set through a mask | 10:27.30 |
ator | 1) JPEG2000 and JBIG2 compression (both of these are ridiculously slow to decompress) | 10:27.45 |
| 2) High resolution images with masking, take a while to process and resample to fit the display resolution. | 10:28.03 |
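To get a feel for the second point, the per-page layers described above can be modelled and their sample counts added up. This is a rough sketch: the two colour image sizes are the ones quoted in the discussion, but the mask dimensions are an assumption (taken to match the high-res image), and the `Layer` class is purely illustrative, not part of MuPDF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    width: int
    height: int
    codec: str

    @property
    def samples(self) -> int:
        """Number of pixels that must be decoded for this layer."""
        return self.width * self.height


page_layers = [
    Layer("low-res colour backdrop", 1118, 1614, "JPEG2000"),
    Layer("high-res colour image", 3352, 4842, "JPEG2000"),
    # Assumption: the mask has the same dimensions as the high-res image.
    Layer("monochrome image mask", 3352, 4842, "JBIG2"),
]

total_samples = sum(layer.samples for layer in page_layers)
linear_ratio = page_layers[1].width / page_layers[0].width

for layer in page_layers:
    print(f"{layer.name}: {layer.samples:,} samples ({layer.codec})")
print(f"total per page: {total_samples:,} samples")
print(f"high-res / low-res linear ratio: {linear_ratio:.2f}")
```

Under these assumptions a viewer has to decode tens of millions of samples per page before any resampling or masking takes place, which fits the observation that the file is slow even on a fast machine.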
Zsolt | JBIG is some sort of jpeg? | 10:29.24 |
sebras | Zsolt: no, JBIG2 is a different format. | 10:29.34 |
| Zsolt: it is used for black and white images, which is why it is used for the image mask in this case. | 10:30.02 |
Zsolt | ok, I understand | 10:30.15 |
| thanks for the help, I will return if I face some other pdf problems | 10:32.07 |
| I downloaded about 20 books from archive.org, and every one is slow . . . | 10:32.41 |
| is my sample pdf also slow on your computers? | 10:40.44 |
kens | yes, because it requires a lot of processing | 10:41.19 |
Zsolt | ok, so it is not because I have a slow pc . . . | 10:41.48 |
ator | no. I have a fast pc and the file is still crazy slow! | 10:42.04 |
Zsolt | I need to go now, maybe I will be back later if I can solve this pdf issue. | 10:43.44 |
| thanks for help | 10:43.48 |
| and bye | 10:43.50 |
| *can't | 10:44.04 |
hisacro | is there a way to make fit-to-width the default? as of now I've made mupdf read the next file every time I quit, but I need to press 'W' each time | 10:59.30 |
ator | hisacro: get a bigger monitor, or start with a smaller resolution (-r) so the page fits your screen :) | 11:27.36 |
| the default is to start with the full page showing, window sized to fit | 11:27.56 |
| but a small monitor or big page size can get in the way of that, we don't create a window larger than will fit on your screen | 11:28.12 |
malc_ | ator: what's your definition of "bigger monitor"? | 11:31.31 |
ator | kens: Zsolt: I wrote a small mutool run script to convert this file (and others like it) to contain only the image mask: http://ix.io/2mTj | 11:31.38 |
kens | how much faster is that ator ? | 11:31.57 |
ator | malc_: I can easily fit a4 and letter sized documents on my 1920x1200 24" monitor | 11:32.01 |
| MUCH! | 11:32.05 |
| takes a while to grind through though | 11:32.11 |
kens | :-) | 11:32.11 |
ator | kens: try the result in casper:/home/tor/out.pdf | 11:32.35 |
| hm, but it's not fax compressing it as it should :( | 11:33.11 |
| need to do more work... | 11:33.19 |
kens | That's a load faster certainly | 11:34.02 |
| essentially instant as opposed to 3-4 seconds | 11:34.09 |
| per page that is | 11:34.56 |
malc_ | ator: hmmm, my (now broken) 25in one is 2560x1440 and this new 27 (3840x2160) are both bigger than 24... still i think they sport higher dpi than your 24 one... explains why you enable antialiasing (as seen in your github snippets repository config files) | 11:35.21 |
ator | kens: and try it now! bugs fixed so now it fax compresses the images too. | 11:47.08 |
| takes the file size down to <1MB | 11:47.27 |
kens | That's pretty good | 11:47.43 |
| easier to read in monochrome also | 11:47.48 |
ator | but uh oh, I think I have a bug in the fax compression :/ | 11:47.50 |
| page 8 is broken | 11:47.56 |
| much easier to read! | 11:49.18 |
| yeah. fax g3 saving works. g4 creates broken streams. bah! | 11:49.41 |
| g4 makes soooo much smaller files, gotta fix that bug | 11:50.06 |
malc_ | and for all typography connoisseurs out there - https://www.mcsweeneys.net/articles/im-comic-sans-asshole | 14:06.11 |
| i used to have a (n optional) mode that used fz_install_load_system_font_funcs to use fontconfig to load fonts for rendering, today i've tried to install system fallback function to override default behavior (at least when rendering html, to avoid falling back to charis sil), but the callback function was never called, should it have been? | 17:37.34 |
| comment describing fz_load_system_fallback_font_fn in include/mupdf/fitz/font.h is wrong (copy-pasted?) it lists `name` argument - this function has no argument called name. | 18:07.12 |
txt23 | Hey guys. I have customers constantly upload super large scanned versions of their docs. How can I resize them to letter size using MuPDF? | 21:59.34 |