| | 20180516 |
moolc | is it somehow possible to convince mutool to subset fonts used in a pdf? | 05:59.36 |
tor8 | Robin_Watts: I thought fz_image was supposed to hold the image data in its compressed form? | 09:22.25 |
| looking at pdf_load_image ... I see we *always* decompress and store the decompressed data. | 09:22.45 |
Robin_Watts | It does in most cases, why? | 09:22.50 |
tor8 | Bug 699345 has a file with a lot of jbig2 images | 09:23.38 |
| we're decompressing all of them even if we don't render | 09:23.49 |
Robin_Watts | jbig2 gets decompressed, cos jbig2 is special. | 09:24.09 |
| jbig2 was always done in a "special" way in MuPDF for as long as I can remember. | 09:24.46 |
| I'd love for it not to be. | 09:24.51 |
tor8 | pdf_load_image calls pdf_load_compressed_stream calls pdf_load_image_stream calls pdf_open_image_stream and fz_read_all | 09:25.01 |
| pdf_open_image_stream calls pdf_open_filter which always builds the full decompression chain | 09:25.37 |
Robin_Watts | Well, then how do we end up with anything in the fz_compressed_data case? | 09:26.24 |
tor8 | I have no idea | 09:26.34 |
| this code is a twisty maze of functions, all alike | 09:26.43 |
| I tried with a JPEG and that doesn't seem to hit the same code path | 09:28.16 |
Robin_Watts | I've just tried with lion.pdf (a jpeg) | 09:29.12 |
| and we get to pdf_load_compressed_stream. | 09:29.20 |
| and that returns with jpeg data in the buffer. | 09:30.10 |
tor8 | ah, spotted it. | 09:31.07 |
| hidden in build_filter we stop short if we have a compression_params object | 09:31.27 |
Robin_Watts | /* If we were using params we were passed in, and we successfully | 09:31.36 |
| * recognised the image type, we can use the existing filter and | 09:31.38 |
| * shortstop here. */ | 09:31.39 |
| if (params != &local_params && params->type != FZ_IMAGE_RAW) | 09:31.41 |
| return fz_keep_stream(ctx, chain); /* nothing to do */ | 09:31.42 |
| yeah. | 09:31.44 |
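The short-stop quoted above can be illustrated with a toy model. This is only a sketch of the idea, not MuPDF's code: every `toy_` name is hypothetical, and the set of filters treated as image codecs is an assumption made for the example. It counts how many decoders a chain builder would instantiate, and leaves the final image codec undecoded when the caller passed in a params struct to receive the image type.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical image types; RAW means "no codec recognised". */
enum toy_image_type { TOY_IMAGE_RAW = 0, TOY_IMAGE_JPEG, TOY_IMAGE_JBIG2 };

struct toy_params { enum toy_image_type type; };

/* Map a PDF filter name to a toy image type (illustrative subset only). */
enum toy_image_type toy_classify(const char *filter)
{
    if (strcmp(filter, "DCTDecode") == 0) return TOY_IMAGE_JPEG;
    if (strcmp(filter, "JBIG2Decode") == 0) return TOY_IMAGE_JBIG2;
    return TOY_IMAGE_RAW;
}

/*
 * Return how many decoders would be built for the given filter list.
 * If the caller supplies params and the last filter is a recognised
 * image codec, record its type and stop before building that decoder,
 * so the data stays in its compressed form.
 */
size_t toy_build_chain(const char *const *filters, size_t n,
                       struct toy_params *caller_params)
{
    struct toy_params local_params = { TOY_IMAGE_RAW };
    struct toy_params *params = caller_params ? caller_params : &local_params;
    size_t built = 0;
    size_t i;

    params->type = TOY_IMAGE_RAW;
    for (i = 0; i < n; i++) {
        if (i + 1 == n) {
            /* Only the final filter may be left undecoded. */
            params->type = toy_classify(filters[i]);
            if (params != &local_params && params->type != TOY_IMAGE_RAW)
                return built; /* short-stop: nothing more to do */
        }
        built++; /* stands in for wrapping the stream in a decoder */
    }
    return built;
}
```

With the filter list `{ "ASCIIHexDecode", "DCTDecode" }` and a caller-supplied params struct, only one decoder is built and the type comes back as JPEG. With a NULL params pointer the whole chain is built, which is the "always decompress" behaviour tor8 first noticed.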
tor8 | so why can't we do it for jbig2dec? | 09:32.08 |
Robin_Watts | It's the same trick I did in the PDF agent for Picsel about 20 years ago :) | 09:32.10 |
| tor8: I don't know. | 09:32.14 |
tor8 | is it because it's awkward to stow away the jbig2 'globals' stream data as well? | 09:32.39 |
Robin_Watts | As I say, jbig2dec has always been a special case. | 09:32.43 |
| tor8: Not a clue, but if that was it, I'd have thought we could cope. | 09:33.02 |
tor8 | JPXDecode is the one I remember as being really special (because of how it was shoehorned into the spec) | 09:33.24 |
Robin_Watts | oh, crap. | 09:34.18 |
| Then jbig2dec is NOT special. | 09:34.24 |
| Sorry, I always get jbig2 and jp2k confused, cos I am a fool. | 09:34.39 |
tor8 | whoever named these formats is the bigger fool ... | 09:35.00 |
| jpeg, jbig2, jpeg2000, jpeg-xr ... give it a rest already! | 09:35.16 |
inflex | jErks. | 09:53.20 |
tor8 | Robin_Watts: some commits on tor/master for review (including one that puts jbig2 images in compressed buffers) | 10:39.50 |
| I think you've already approved the first 4 but I'm not 100% sure | 10:40.27 |
Robin_Watts | looking | 10:41.43 |
| The resources one... | 10:46.38 |
| Ok, I'm happy, I think. | 10:47.21 |
tor8 | fab. I'll just let the cluster finish before I push though. | 10:48.23 |
Robin_Watts | ok, all look good. | 10:48.35 |
razi | Hi there, is there sample code that shows how to copy the text contained in a rectangle? | 15:44.03 |
inflex | razi, the mupdf-gl (or others) viewer has that in there, stext_search or something I think. | 15:50.41 |
| stext-search.c: fz_highlight_selection() | 15:52.16 |
| sorry, I meant fz_copy_selection() | 15:52.56 |
razi | inflex: Indeed, I used fz_copy_selection() to extract text, but sometimes it returns the whole line when I just want to select a single word. So I think maybe I used it the wrong way. | 15:54.58 |
| inflex: maybe you can check this code, please? https://beepaste.io/paste/view/Nj3642 | 15:59.44 |
inflex | I just did a word-copy variation the other day, relies on a single-click (right) and it selects the whole word around that | 16:04.12 |
| it's likely exceedingly inefficient, but it does work atm | 16:04.25 |
| razi, might be of help to you - https://github.com/inflex/mupdf/blob/master/source/fitz/stext-search.c#L223 | 16:08.05 |
razi | inflex: thanks I'll check it. | 16:08.55 |
inflex | I don't know how it'll work on selecting a word within a selection; my code relies just on pagea = pageb | 16:10.31 |
| for all I know, there's probably a vastly more succinct way to do what I did, I just tried to stumble through the dark with this one. | 16:11.22 |
razi | inflex: Well, you didn't use fz_copy_selection(). I will also try extracting text by traversing text blocks/lines to see if it works. | 16:16.12 |
inflex | razi, the enumerate function gets used by the fz_copy_selection() | 16:18.21 |
razi | inflex: yeah, you are right. Is your application using this single-click feature public? | 16:32.34 |
inflex | The code is there on github. It's been done for a specific task, choosing to optimise workflow at the expense of other things. I was considering changing it from right-click to double-left. | 16:39.10 |
razi | inflex: Ok, thanks for your time. | 16:45.57 |
inflex | you're welcome, best of luck. | 16:47.04 |
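The single-click word selection inflex describes can be sketched independently of MuPDF. The following is a minimal illustration, assuming the line's characters are already available as a plain C string and the click has been resolved to a character index; `toy_word_at` is a hypothetical helper, not a function from MuPDF or from inflex's fork. It expands left and right from the hit character until it reaches whitespace.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Find the word containing line[hit].
 * On success, returns 1 and sets the half-open range [*start, *end).
 * Returns 0 if hit is out of range or lands on whitespace.
 */
int toy_word_at(const char *line, size_t hit, size_t *start, size_t *end)
{
    size_t len = strlen(line);
    size_t s, e;

    if (hit >= len || isspace((unsigned char)line[hit]))
        return 0;

    /* Walk left while the previous character is part of the word. */
    s = hit;
    while (s > 0 && !isspace((unsigned char)line[s - 1]))
        s--;

    /* Walk right until whitespace or end of line. */
    e = hit + 1;
    while (e < len && !isspace((unsigned char)line[e]))
        e++;

    *start = s;
    *end = e;
    return 1;
}
```

In a real viewer the string and hit index would presumably come from walking the structured text of the page, and the resulting character range would then be mapped back to points for the selection call; that mapping is left out here.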