| <<<Back 1 day (to 2020/05/04) | Fwd 1 day (to 2020/05/06)>>> | 20200505 |
zebrag | The images this document is made with seem to be very low definition: https://homepages.inf.ed.ac.uk/gdp/publications/cbn_cbv_lambda.pdf | 05:55.50 |
| I have been extracting the size of page 7 with PyMuPDF; the bindings are version 1.16.18. | 05:57.30 |
| Size of the page is (792, 561, 3) | 05:58.11 |
| I get similar results using `mutool run` | 05:58.53 |
| When I multiply the size of the image (x3) using supposedly good zoom tools, | 05:59.35 |
| I get very blurry results | 05:59.54 |
| If instead I visualize the page from the original pdf file using `mupdf` | 06:00.20 |
| and then zoom through pressing the `+` key | 06:00.52 |
| I get results that are far crisper | 06:01.22 |
| However many times I repeat the operation | 06:01.35 |
| First possibility: I misunderstood the size of the document; I did something wrong when extracting the image, and the image should have been of much better quality to begin with | 06:02.55 |
| Second possibility: There are much better solutions for zooming the image | 06:03.26 |
| and starting from a 561x792 image, you can get, by zooming repeatedly, an image that looks great on a 4K display. | 06:04.32 |
| Here is the PyMuPDF script I used: http://paste.debian.net/1145058/ | 06:05.41 |
| There are about 40 lines per page; with 800 pixels in height that makes the body of each char something like 15 pixels in height | 06:09.55 |
| That is consistent with what I see when I zoom the extracted image with feh. | 06:11.16 |
| But that is not at all what I see when I zoom with `+` key from within mupdf | 06:12.12 |
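The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and the 0.75 body-to-line-pitch ratio is an assumption chosen to match the "something like 15 pixels" figure, not a measured value.

```python
def line_pitch_px(page_height_px, lines_per_page):
    """Vertical space available per text line, in pixels."""
    return page_height_px / lines_per_page


def glyph_body_px(page_height_px, lines_per_page, body_fraction=0.75):
    """Rough height of a character body.

    body_fraction is an assumed share of the line pitch taken by the
    glyph itself (the rest being leading); 0.75 is illustrative only.
    """
    return line_pitch_px(page_height_px, lines_per_page) * body_fraction


# About 40 lines on a page roughly 800 pixels high, as in the log.
pitch = line_pitch_px(800, 40)
body = glyph_body_px(800, 40)
```

Enlarging the extracted bitmap afterwards does not change these numbers: each character is still described by the same handful of source pixels, which is consistent with the blur seen in feh.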
ator | zebrag: how are you "extracting the image"? it's a page that is drawn using multiple source images. | 08:32.33 |
velix | Can muPDF overlay PDF pages? | 14:55.32 |
kens | velix what do you mean by 'overlay' ? | 14:56.44 |
| and to which output format ? | 14:57.00 |
velix | mupdf old.pdf overlay.pdf --output new.pdf | 14:57.13 |
kens | But what are you expecting in the output? | 14:57.27 |
velix | For example, overlay.pdf has a red box. The red box then should lay over the content of old.pdf | 14:58.01 |
| like a layer. | 14:58.08 |
| (it worked in the past with another PDF tool, but I never tried with mupdf) | 14:58.24 |
| I think, I used qpdf or something? | 14:58.31 |
kens | I'll have to defer to the MuPDF developers for that one, but I think the answer is 'no' | 14:58.41 |
velix | ooops wait. not MuPDF, but MuTools ! | 14:59.31 |
| I'm always mixing those two. | 14:59.36 |
| mutool* | 14:59.50 |
kens | The MuPDF developers are currently in a meeting, hopefully they will be finished shortly and can answer. | 15:04.46 |
velix | Sure, no problems. Had 4 phone meetings today. | 15:05.35 |
| Ah, it's called "Stamping" in other pdf tools | 15:17.26 |
sebras | velix: so you're saying you had this working with mutool before, but it stopped working? | 15:19.00 |
velix | No, I never used mutool for this before, but I'm using mutools for a lot of stuff. I just asked myself: hey, does mutools have this, too? | 15:19.39 |
| mutool* | 15:19.47 |
sebras | velix: right, then it makes more sense, because I don't recall mutool supporting that type of processing. | 15:20.20 |
velix | oh, why? | 15:21.43 |
ator | velix: it does not, but you could script it with 'mutool run' | 15:22.40 |
sebras | velix: it might be that mutool run can be coaxed into doing it using a custom javascript. | 15:22.45 |
velix | Uh, no. No javascript please :D | 15:23.56 |
| Okay, I'll use qpdf for this job then :( | 15:24.04 |
| But thanks for your look into it :) | 15:24.15 |
| Okay, done in 283 PDFs. | 15:34.14 |
| One of my students had a wrong date on the second page :D | 15:34.36 |
zebrag | ator: Yes, I sort of figured out the "page that is drawn" part. The document I'm working on is a scanned article. I'm trying to access the best underlying image. | 16:37.09 |
| The size of the "generated image", is the same as the value returned by the `page.bound();` operation. | 16:39.36 |
| But it is very poor for my purpose; and it doesn't look like the source used by `mupdf` rendering, which looks far superior. | 16:40.39 |
| ator: I've used `mutool extract` on the document, and the result is 7252 images, each one like the following: | 16:51.35 |
| img-0353.png: PNG image data, 3120 x 83, 8-bit grayscale, non-interlaced | 16:51.53 |
| So the quality is way better there. 3120 / 561 ≈ 5.6 times better. | 16:54.16 |
| So now I have to reproduce it with `mutool run`, and for a specific page. And if the images were already stitched together, all the better. | 16:55.33 |
ator | zebrag: render at 600dpi (page.bound() and the default rendering are measured at 96dpi) | 17:37.13 |
zebrag | ator: I understand; How do I configure that? How do I get `getPixmap()` to use 600dpi instead of 96dpi? | 17:49.42 |
| I think I'm using `Page#toPixmap(transform, colorspace, alpha, skipAnnotations) ` https://mupdf.com/docs/manual-mutool-run.html | 18:00.11 |
| through a python binding called `getPixmap()`; in any case, with `mutool run` I haven't found how to change the default "dpi". | 18:01.23 |
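A target resolution can be turned into a scale factor by dividing it by the resolution at which the page bounds are expressed. The sketch below keeps that base resolution as an explicit parameter: the log quotes 96 dpi, while PDF user-space units are 1/72 inch by default, so it is worth checking which one applies to the binding in use. Both function names are hypothetical.

```python
def scale_for_dpi(target_dpi, base_dpi):
    """Scale factor that takes a page measured at base_dpi to target_dpi."""
    if target_dpi <= 0 or base_dpi <= 0:
        raise ValueError("resolutions must be positive")
    return target_dpi / base_dpi


def scaled_size(width, height, scale):
    """Approximate pixel size of the rendered page after scaling."""
    return round(width * scale), round(height * scale)


# 600 dpi against the 96 dpi base quoted in the log.
scale = scale_for_dpi(600, 96)
size = scaled_size(561, 792, scale)
```

The resulting factor is what goes into both axes of the transform passed to the rendering call.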
| ator: Fixed. | 18:29.35 |
| https://pymupdf.readthedocs.io/en/latest/faq/#how-to-increase-image-resolution | 18:29.40 |
| I've introduced a "zoom" matrix with factor `3120.0 / 561.0`, obtained by dividing the width of one extracted "stripe" image by the width from `Page#bound()`. | 18:32.09 |
| I still have a lot to understand... | 18:33.58 |
| Thanks for the "dpi" tip; it did speed up things a lot. | 18:34.31 |
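A minimal sketch of the zoom-matrix fix described above, assuming PyMuPDF is installed and importable as `fitz`. The helper names are hypothetical and this is not the script from the paste link. Older PyMuPDF releases (such as the 1.16.x series mentioned earlier) name the method `getPixmap`, newer ones `get_pixmap`, so the sketch looks up whichever is present.

```python
def zoom_from_stripe(stripe_width_px, page_width):
    """Zoom factor that makes the rendered page as wide as a source stripe."""
    return stripe_width_px / page_width


def render_page(pdf_path, page_index, zoom):
    """Render one page (0-based index) scaled by `zoom` on both axes."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    doc = fitz.open(pdf_path)
    page = doc[page_index]
    matrix = fitz.Matrix(zoom, zoom)  # same factor horizontally and vertically
    # Method name differs between PyMuPDF versions.
    render = getattr(page, "get_pixmap", None) or getattr(page, "getPixmap")
    return render(matrix=matrix)


# Stripe width from `mutool extract`, page width from Page#bound().
zoom = zoom_from_stripe(3120, 561)
```

Calling `render_page(path, index, zoom)` with this factor should give a pixmap about 3120 pixels wide, so the scanned stripes are drawn at roughly their native resolution instead of being downsampled and then enlarged.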