| 20171220 |
sebras | Robin_Watts: if you are around, maybe you can take a look at sebras/master? | 13:17.50 |
| Robin_Watts: I tried to fix a couple of issues I've noticed. | 13:17.58 |
Robin_Watts | sebras: Sure. | 13:17.59 |
sebras | Robin_Watts: I found a couple of pdfs over at http://www.pdfill.com and decided to run mudraw over those. that tripped ASAN/valgrind a bit. | 13:19.07 |
Robin_Watts | All 3 look good to me. | 13:20.42 |
| A repository for sick PDFs? :) | 13:21.06 |
sebras | perhaps, but I don't think so. | 13:21.41 |
| the company develops some kind of windows app for signing pdfs (and editing?) | 13:21.57 |
Robin_Watts | pdf "ill". Never mind :) | 13:22.45 |
sebras | Robin_Watts: I'm not sure I appreciate the GUI they show off on their landing page (you really ought to take a look...) | 13:23.53 |
| Robin_Watts: on a more serious note: in source/pdf/pdf-op-filter.c:filter_push:102 we keep the text fonts in pending and sent. | 13:25.03 |
Robin_Watts | Yeah, ribbon overload. | 13:25.06 |
sebras | Robin_Watts: shouldn't we equally keep the cs, pat and shd parts in cs, CS, sc and SC? | 13:25.25 |
| Robin_Watts: seems to me like the new_gstate will otherwise just borrow those references from the next gstate. | 13:25.48 |
| perhaps that is ok, but if it is, then why keep the fonts? | 13:26.00 |
| I looked at all *variable = *variable2 assignments after having found the reference counting mistake in fz_new_pixmap_from_pixmap(). | 13:26.41 |
| so I'd like to add something like fz_keep_colorspace(ctx, new_gstate->pending.cs.cs); | 13:27.17 |
| but perhaps I'm missing something. | 13:27.22 |
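The pitfall sebras describes can be shown with a toy reference-counted object. The names `toy_cs`, `toy_keep`, `toy_drop` and `toy_gstate` are hypothetical stand-ins, not MuPDF's types: a struct assignment copies the pointer but not the reference, so if both structs will later drop it, the copy has to take its own reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a colorspace. */
typedef struct { int refs; } toy_cs;

static toy_cs *toy_new(void)
{
	toy_cs *cs = malloc(sizeof *cs);
	if (cs)
		cs->refs = 1;
	return cs;
}

static toy_cs *toy_keep(toy_cs *cs)
{
	if (cs)
		cs->refs++;
	return cs;
}

/* Returns 1 if this call freed the object. */
static int toy_drop(toy_cs *cs)
{
	if (cs && --cs->refs == 0) {
		free(cs);
		return 1;
	}
	return 0;
}

/* Hypothetical gstate holding a single owned pointer. */
typedef struct { toy_cs *cs; } toy_gstate;

/* The struct copy duplicates the pointer, so take a matching
 * reference before both copies can be dropped independently. */
static void toy_gstate_copy(toy_gstate *dst, const toy_gstate *src)
{
	*dst = *src;
	toy_keep(dst->cs);
}
```

Without the `toy_keep` call in `toy_gstate_copy`, the first drop would free the object and the second drop would touch freed memory.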
Robin_Watts | sebras: You may be right. I am a bit buried in other stuff at the mo. | 13:27.26 |
sebras | Robin_Watts: ok, I'll look further, you continue to stay buried. I wouldn't want to dig you up... ;) | 13:28.00 |
Robin_Watts | I'm reluctant to take references more than we need to. | 13:28.03 |
| cos colorspaces might only have signed 8 bit reference counts. | 13:28.20 |
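Robin's concern about small counters can be illustrated with a saturating signed 8-bit reference count. This is an assumed scheme for illustration, not necessarily how MuPDF implements it: once the count hits `INT8_MAX`, keeps and drops stop changing it, so the object becomes immortal and is never freed. Every reference taken that isn't strictly needed pushes objects toward that ceiling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical saturating 8-bit refcount. Returns the new count. */
static int8_t keep8(int8_t *refs)
{
	if (*refs > 0 && *refs < INT8_MAX)
		(*refs)++;
	return *refs;
}

/* Returns 1 when the caller should free the object. A saturated
 * count is never decremented, so that object is simply leaked. */
static int drop8(int8_t *refs)
{
	if (*refs == INT8_MAX)
		return 0;
	if (*refs > 0 && --(*refs) == 0)
		return 1;
	return 0;
}
```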
sebras | ah. | 13:28.42 |
| Robin_Watts: are you still buried? | 17:54.54 |
KnoteAI | Hello guys! | 18:28.39 |
Robin_Watts | sebras: I can unbury myself for a bit, sure. | 18:29.31 |
| KnoteAI: Hi | 18:29.42 |
sebras | Robin_Watts: fz_get_pixmap_from_image() can be used to decode subareas of an image. | 18:31.17 |
Robin_Watts | sebras: Sorry, I've just been called away for the next 15 mins or so. she who must be obeyed. | 18:31.45 |
sebras | Robin_Watts: first the requested subarea is recomputed into a rect. | 18:31.59 |
| no worries. | 18:32.04 |
| we look for this subarea in the store. | 18:32.16 |
| if we don't find it we decide to decode the image. | 18:32.24 |
| when we decode the image we change the rect that represents the subarea and then use this as part of the key when storing the decoded image into the store. | 18:33.15 |
| so the next time we request another part of the image we will try the same thing, i.e. we will fail to find the fully decoded image in the store since we initially look only for the subarea we wanted. | 18:33.55 |
| of course we don't find it, then we *again* decide to decode the entire image and then we *again* try to store the fully decoded image into the store, thereby overwriting the existing entry. | 18:34.40 |
| that seems a bit silly. | 18:34.44 |
| I'm thinking we could first look for the subarea we want and if we find it, use it. | 18:35.02 |
| if we don't find it, try to look for not a subarea but the entire image fully decoded in the store. | 18:35.23 |
| if we find it, use the full image and adapt ctm correspondingly | 18:35.45 |
| and if we can't find that either, only *then* will we attempt to decode the full image and try to put it into the store. | 18:36.11 |
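sebras's three-step lookup order can be sketched against a toy store. This is only a model, not MuPDF's `fz_store`: `toy_store`, `store_find`, `store_put` and `get_area` are hypothetical, lookups are exact matches (as with a key that includes the rect), and the decode itself is replaced by a counter so the saving is visible.

```c
#include <assert.h>
#include <stddef.h>

#define STORE_SLOTS 16

typedef struct { int x0, y0, x1, y1; } irect;
typedef struct { int image_id; irect area; } entry;
typedef struct { entry items[STORE_SLOTS]; int count; int decodes; } toy_store;

static int rect_eq(irect a, irect b)
{
	return a.x0 == b.x0 && a.y0 == b.y0 && a.x1 == b.x1 && a.y1 == b.y1;
}

/* Exact-match lookup, like a key that includes the rect. */
static entry *store_find(toy_store *s, int id, irect area)
{
	int i;
	for (i = 0; i < s->count; i++)
		if (s->items[i].image_id == id && rect_eq(s->items[i].area, area))
			return &s->items[i];
	return NULL;
}

static entry *store_put(toy_store *s, int id, irect area)
{
	entry *e = store_find(s, id, area);
	if (e)
		return e;                /* already stored: don't duplicate */
	if (s->count == STORE_SLOTS)
		return NULL;
	e = &s->items[s->count++];
	e->image_id = id;
	e->area = area;
	return e;
}

/* 1. the exact subarea, 2. the fully decoded image, 3. decode in full. */
static entry *get_area(toy_store *s, int id, irect want, irect full)
{
	entry *e = store_find(s, id, want);
	if (e)
		return e;
	e = store_find(s, id, full);
	if (e)
		return e;                /* caller adapts the ctm to use part of it */
	s->decodes++;                /* stand-in for actually decoding the image */
	return store_put(s, id, full);
}
```

Requesting the top half and then the bottom half of one image decodes it only once; with a subarea-only lookup the second request would decode again.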
Robin_Watts | sebras: back. | 18:44.51 |
| That does indeed seem suboptimal. | 18:45.02 |
| What we ought to be doing is looking for an entry in the store that includes the given subarea. | 18:45.44 |
| That was the original intent. | 18:45.53 |
| To do that we'd need to form the hash key from everything *without* the rectangle, and allow multiple entries in the hash store with the same key. | 18:47.00 |
| Then we'd linearly probe through those entries to look for one with an appropriate rectangle | 18:47.32 |
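Robin's proposal (hash on everything except the rect, allow several entries with the same key, then probe for one whose rect covers the request) can be sketched with a toy open-addressing table. Everything here is hypothetical rather than MuPDF's `fz_hash_table`. In linear probing, entries sharing a key lie on one probe run that ends at an empty slot, but entries for other keys can be interleaved in that run, so the walk skips them instead of stopping.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 16

typedef struct { int x0, y0, x1, y1; } irect;

typedef struct {
	int used;
	int image_id;   /* the key: everything except the rect */
	irect area;     /* stored with the entry, but not hashed */
} slot;

static slot table[NSLOTS];

static unsigned hash_key(int image_id)
{
	return (unsigned)image_id * 2654435761u % NSLOTS;
}

static int covers(irect have, irect want)
{
	return have.x0 <= want.x0 && have.y0 <= want.y0 &&
	       have.x1 >= want.x1 && have.y1 >= want.y1;
}

/* Duplicate keys are allowed. Returns 0 if the table is full. */
static int table_insert(int image_id, irect area)
{
	unsigned i = hash_key(image_id);
	int n;
	for (n = 0; n < NSLOTS; n++, i = (i + 1) % NSLOTS) {
		if (!table[i].used) {
			table[i].used = 1;
			table[i].image_id = image_id;
			table[i].area = area;
			return 1;
		}
	}
	return 0;
}

/* Walk the probe run until an empty slot, returning the first
 * entry with this key whose rect covers the wanted area. */
static slot *table_find_covering(int image_id, irect want)
{
	unsigned i = hash_key(image_id);
	int n;
	for (n = 0; n < NSLOTS && table[i].used; n++, i = (i + 1) % NSLOTS)
		if (table[i].image_id == image_id && covers(table[i].area, want))
			return &table[i];
	return NULL;
}
```

A real store would also need deletion (which in linear probing needs tombstones or re-insertion) and a policy for choosing among several covering entries.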
sebras | Robin_Watts: ok, because we are guaranteed that values with the same key are stored linearly in the hash. | 18:51.01 |
| i see. | 18:51.04 |
| wouldn't we still need to have the rect though to know what part was actually decoded? | 18:51.43 |
Robin_Watts | sebras: The mechanism isn't important really. It's the idea of looking for a set of possibilities. | 18:51.45 |
sebras | yes, I realize that. | 18:51.55 |
Robin_Watts | The rect would still be stored, it just wouldn't form part of the actual key. | 18:52.11 |
sebras | right so fz_make_hash_image_key would still store the rect in the key, but fz_cmp_image_key() wouldn't take it into account. | 18:53.23 |
| if the rect is not stored there, I'm lost as to where it would be stored. | 18:53.35 |
Robin_Watts | sebras: Is it not stored in the fz_pixmap ? | 18:53.48 |
| x, y, x+width, y+height should be the rect, right? | 18:54.07 |
sebras | I don't think we get the correct x,y from the pixmap though. | 18:55.11 |
| but the width/height is certainly stored there. | 18:55.19 |
| well, or larger. | 18:55.33 |
| also I have noticed that in compressed_image_get_pixmap() we may invert the CMYK jpegs for XPS. | 18:56.01 |
Robin_Watts | sebras: Why wouldn't we get the correct x/y ? | 18:56.18 |
sebras | but this is not present in the key, so we could end up wanting a non-inverted variant and the store would give us the inverted one. | 18:56.24 |
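The general rule behind the XPS inversion issue is that any parameter that changes the decoded pixels has to take part in the key comparison, or the store can hand back the wrong variant. A sketch with a hypothetical key struct; the field names are illustrative and not MuPDF's actual image key.

```c
#include <assert.h>

/* Hypothetical image cache key. */
typedef struct {
	int image_id;
	int l2factor;
	int x0, y0, x1, y1;
	int invert_cmyk;  /* decoded output differs when set, so compare it */
} image_key;

static int key_eq(const image_key *a, const image_key *b)
{
	return a->image_id == b->image_id && a->l2factor == b->l2factor &&
	       a->x0 == b->x0 && a->y0 == b->y0 &&
	       a->x1 == b->x1 && a->y1 == b->y1 &&
	       a->invert_cmyk == b->invert_cmyk;
}
```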
Robin_Watts | one problem at a time, eh? :) | 18:56.41 |
sebras | Robin_Watts: sure, but I need to blurt it out before I forget! :) | 18:56.54 |
Robin_Watts | I can understand that :) | 18:57.02 |
sebras | Robin_Watts: the pixmap wouldn't know what x,y it should be at since it might be used in several locations. | 18:57.20 |
Robin_Watts | sebras: The pixmap x/y should be the position of the rectangle within the source that is decoded. | 18:57.48 |
sebras | when we finsihed decoding the image I imagine that x == y == 0 | 18:57.49 |
Robin_Watts | i.e. if we have a 300x300 image, and we decode just the middle 100x100 of it, I'd expect x,y,w,h to all be 100. | 18:58.25 |
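Robin's worked example maps directly onto his earlier formula, `x, y, x+width, y+height`. A minimal sketch with a hypothetical `toy_pixmap` that holds just those four fields; the resulting rect is in source-image pixels and doesn't depend on where or at what scale the pixmap is drawn.

```c
#include <assert.h>

typedef struct { int x, y, w, h; } toy_pixmap;  /* hypothetical subset of pixmap fields */
typedef struct { int x0, y0, x1, y1; } irect;

/* The area of the source image that this pixmap holds. */
static irect decoded_area(const toy_pixmap *pix)
{
	irect r;
	assert(pix->w >= 0 && pix->h >= 0);
	r.x0 = pix->x;
	r.y0 = pix->y;
	r.x1 = pix->x + pix->w;
	r.y1 = pix->y + pix->h;
	return r;
}
```

For the middle 100x100 of a 300x300 image, `{100, 100, 100, 100}` gives the rect (100,100)-(200,200).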
| Where, and at what scale that is displayed shouldn't affect the rectangle. | 18:58.53 |
sebras | ok, not the x,y on the page, got it. | 18:59.56 |
| Robin_Watts: I think you try to do the corresponding thing with the subsampling factor, you try to look in the store for the most appropriate factor first. | 19:01.22 |
| if that is not found, you decrease it and try to look again. | 19:01.36 |
| until the subsampling factor is 0. | 19:01.42 |
Robin_Watts | yeah, with subsample factors, there are a limited number of possibilities, so I can check 'em all. | 19:01.58 |
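The subsample search is a short loop precisely because there are only a few factors to try. In this sketch `cached` is a hypothetical bitmask of the factors already in the store, and the factor is treated as a log2 subsampling amount where 0 means full resolution (an assumption). Stepping down accepts a higher-resolution decode, which can be scaled down to serve the request.

```c
#include <assert.h>

/* Bit f of cached set means a decode at factor f is in the store.
 * Returns the best usable factor, or -1 if the caller must decode. */
static int find_cached_factor(unsigned cached, int want)
{
	int f;
	assert(want >= 0 && want < 32);
	for (f = want; f >= 0; f--)
		if (cached & (1u << f))
			return f;
	return -1;
}
```

No such loop exists for rectangles, since the set of rects that would cover a request is unbounded.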
| I can't check all the possible rectangles that would satisfy me though. | 19:02.11 |
sebras | depends on how we implement the compare function, right? | 19:02.28 |
| what if the comparison would say FOUND! as long as the desired rectangle is covered..? | 19:02.44 |
| i.e. even if the pixmap is larger than you originally wanted. | 19:02.59 |
| but that would only solve the linear probing. | 19:03.32 |
| ok. | 19:03.33 |
KnoteAI | any plan to change the xmin, ymin, xmax, ymax to x,y,w,h ??? | 19:03.41 |
Robin_Watts | KnoteAI: What, where? | 19:04.10 |
sebras | KnoteAI: you need much more context to get a decent answer. what xmin/ymin/etc..? | 19:04.23 |
Robin_Watts | sebras: The intent is that the comparison should say "Found!" as long as the desired rectangle is covered. | 19:04.47 |
sebras | right. | 19:04.55 |
Robin_Watts | But that can't work with the current way we're driving the hash table. | 19:05.15 |
| We need the "check multiple entries" stuff (which we can do by linear probing, I hope) | 19:05.37 |
sebras | 0 | 19:05.44 |
| if the hash is the same the entries will be stored linearly without gaps as far as I know. | 19:06.06 |
Robin_Watts | sebras: That is my understanding too. | 19:06.17 |
| We could have a hash_nextMatch or something. | 19:06.28 |
sebras | we don't have a hash table interface to say, ok I have this entry, give me the next one. | 19:06.34 |
| yes. | 19:06.38 |
Robin_Watts | but in fz_underscore_style | 19:06.53 |
sebras | fz_hash_find_next() I assume. | 19:07.06 |
Robin_Watts | sebras: That'd work for me I think. | 19:07.17 |
| I'm being distracted by my SSE4.1 intrinsics almost, but not quite, working. | 19:07.39 |
KnoteAI | Sorry I was talking about the span bounding box, I am wondering if you were on the point to change the bbox xmin, ymin, xmax, ymax to x,y,w,h? I'm just curious. | 19:08.15 |
sebras | Robin_Watts: I discovered this while trying to help malc_ debug a problem in llpp, he's doing tiling when he's rendering, even if he's rendering a PNG, hence he triggered this. mupdf doesn't seem to be doing that so we haven't seen it. | 19:08.20 |
| KnoteAI: I haven't heard of any such plans. | 19:08.47 |
KnoteAI | ok ok thank you for your answer. I just made a bad conclusion with your message (<@Robin_Watts> i.e. if we have a 300x300 image, and we decode just the middle 100x100 of it, I'd expect x,y,w,h to all be 100.) | 19:10.01 |
sebras | KnoteAI: that is about an entirely different data structure. | 19:10.41 |
KnoteAI | got it thanks ;) | 19:11.18 |
sebras | Robin_Watts: SSE4.1 intrinsics is for SO? | 19:12.12 |
| Robin_Watts: thanks for setting me on the right track, I'll try to do something smart with this. | 19:12.52 |
Robin_Watts | sebras: gs, image scaling. | 19:22.29 |
Diemex | Hi! | 19:27.12 |
| Couple of quick questions: | 19:27.21 |
| Is there a way to get the source/javadoc from the maven repository? | 19:27.45 |
Robin_Watts | Diemex: What source? What javadoc? What maven repository? :) | 19:28.50 |
Diemex | https://mupdf.com/docs/android-sdk.html | 19:29.03 |
| I can't see the javadoc in Android Studio for MuPdf methods | 19:29.39 |
Robin_Watts | Everything is in the git repos listed on that page. | 19:29.40 |
| It's entirely possible that there IS no javadoc as yet. | 19:30.02 |
Diemex | Would be cool if that could be included in the repo if possible | 19:30.45 |
| I'm using MuPdf from Android. If I hold on to the DisplayLists can I call MuPdf from multiple Java Threads to render multiple pages at once? | 19:31.44 |
| If yes is there anything I need to watch out for? | 19:32.42 |
sebras | Diemex: once you have a displaylist you should be able to have multiple threads read that list and render it. | 19:34.40 |
| Diemex: e.g. if you have two threads and one renders the upper half while the other renders the lower half. | 19:35.01 |
| be sure to only use one thread to parse the document though. | 19:35.17 |
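sebras's upper-half/lower-half example generalizes to N threads, each rendering one horizontal band of the page bbox from the shared display list. This sketch only does the band arithmetic, with hypothetical names. In MuPDF's C API each worker would also need its own context (the multi-threading example clones one with `fz_clone_context` from a base context created with locking callbacks); the Java side isn't covered here.

```c
#include <assert.h>

typedef struct { int x0, y0, x1, y1; } irect;

/* Band i of n near-equal horizontal bands of bbox. Leftover rows
 * go one each to the first bands, so the bands tile bbox exactly. */
static irect band_for_thread(irect bbox, int n, int i)
{
	int h, base, extra, y0, y1;
	irect r;
	assert(n > 0 && i >= 0 && i < n);
	h = bbox.y1 - bbox.y0;
	base = h / n;
	extra = h % n;
	y0 = bbox.y0 + i * base + (i < extra ? i : extra);
	y1 = y0 + base + (i < extra ? 1 : 0);
	r.x0 = bbox.x0;
	r.y0 = y0;
	r.x1 = bbox.x1;
	r.y1 = y1;
	return r;
}
```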
| Diemex: also, no there is no javadoc at all. | 19:35.31 |
Robin_Watts | Diemex: If you want to write it for us, we'll add it, sure. | 19:35.34 |
sebras | Robin_Watts: nice timing. :) | 19:35.46 |
Robin_Watts | Diemex: You may find: http://ghostscript.com/~robin/mupdf_explored.pdf of interest. | 19:36.15 |
| That describes the C API, upon which the Java API is based. | 19:36.37 |
| I haven't written the chapters on the java bindings yet. | 19:36.48 |
Diemex | Robin_Watts: Nice! Thanks for that | 19:36.57 |
| I just returned to Android Dev from an absence of over a year. It's a breeze. Tools have improved a lot. And Kotlin is just WOW | 19:38.41 |
| A walk in the park with a yummy icecream compared to FPGA dev XD | 19:39.13 |
Robin_Watts | did FPGA dev 20 years ago. | 19:40.07 |
Diemex | Niiiice. Which FPGA/vendor? | 19:40.42 |
Robin_Watts | Diemex: We used Xilinx mostly, but then we were writing the tools to take software descriptions of programs, and compile them to netlists that then got laid out for the chips. | 19:41.37 |
| so I was insulated from the hardware itself most of the time. | 19:41.47 |
| Gotta go, sorry! | 19:41.53 |
Diemex | Robin_Watts: cu :-) | 19:42.09 |
| Does loading a page and holding on to the page object require a lot of memory? I have just "benchmarked" the loadPage function and it only requires about 0.3 ms per call for the PDF I tried. I expected it to take longer. Is there a possibility that I would run into memory troubles if I would f.e load every page of a large pdf and not free the objects? | 20:13.30 |
sebras | Diemex: that depends on how much memory you have and the number of pages of course. | 20:16.54 |
| Diemex: and probably other things too. | 20:17.07 |
malc_ | sebras: it depends on the pages too :) | 20:17.13 |
sebras | malc_: certainly. | 20:17.53 |
| malc_: the more complicated pages are the more memory they use. | 20:18.07 |
malc_ | sebras: not only that, they can contain humongous embedded images for instance | 20:18.34 |
| there goes your memory | 20:18.37 |
Diemex | Ah I see. So embedded images are loaded into memory. Can I get the memory usage from the Java API? | 20:19.35 |
| To decide if I should release the page? | 20:19.50 |
| Or I just keep it simple and only hold on to the last 5 pages or so | 20:20.11 |
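Diemex's "hold on to the last 5 pages" idea is a small LRU cache. A sketch with hypothetical names: the page handle is an opaque pointer, `drop` stands for whatever releases a page in your binding (NULL is allowed), and `cache_put` assumes the page isn't already cached.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_PAGES 5

typedef struct { int number; void *page; } cached_page;

typedef struct {
	cached_page slots[CACHE_PAGES];
	int count;                    /* slots[0] is the least recently used */
	void (*drop)(void *page);
} page_cache;

/* Returns the cached handle, or NULL. A hit moves to the MRU end. */
static void *cache_get(page_cache *c, int number)
{
	int i;
	for (i = 0; i < c->count; i++) {
		if (c->slots[i].number == number) {
			cached_page hit = c->slots[i];
			for (; i + 1 < c->count; i++)
				c->slots[i] = c->slots[i + 1];
			c->slots[c->count - 1] = hit;
			return hit.page;
		}
	}
	return NULL;
}

/* Adds a freshly loaded page, evicting the LRU page if full. */
static void cache_put(page_cache *c, int number, void *page)
{
	int i;
	assert(c->count <= CACHE_PAGES);
	if (c->count == CACHE_PAGES) {
		if (c->drop)
			c->drop(c->slots[0].page);
		for (i = 0; i + 1 < c->count; i++)
			c->slots[i] = c->slots[i + 1];
		c->count--;
	}
	c->slots[c->count].number = number;
	c->slots[c->count].page = page;
	c->count++;
}
```

A fixed count is the simple choice; as malc_ points out, one page with a huge embedded image can outweigh many plain ones.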
sebras | Diemex: I don't think we have any API that gives you the memory usage even at the C level. | 20:27.38 |
malc_ | sebras: custom allocator in fz_new_context might help some, then again no idea if libjpeg etc is tracked by that | 20:30.54 |
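malc_'s custom-allocator idea can be sketched as counting callbacks. They are shaped like the `malloc`/`realloc`/`free` members of MuPDF's `fz_alloc_context` (each taking a `void *user` first), but check the header of your MuPDF version before wiring them up; `alloc_stats` and the `count_*` names are hypothetical. As sebras notes next, allocations that bypass the context are not counted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header before each block recording its size; the union keeps the
 * returned pointer aligned for common types. */
typedef union { size_t size; long double ld; void *p; long long ll; } alloc_header;

typedef struct { size_t current; size_t peak; } alloc_stats;

static void *count_malloc(void *user, size_t size)
{
	alloc_stats *st = user;
	alloc_header *h;
	if (size > SIZE_MAX - sizeof *h)
		return NULL;
	h = malloc(sizeof *h + size);
	if (!h)
		return NULL;
	h->size = size;
	st->current += size;
	if (st->current > st->peak)
		st->peak = st->current;
	return h + 1;
}

static void count_free(void *user, void *ptr)
{
	alloc_stats *st = user;
	alloc_header *h;
	if (!ptr)
		return;
	h = (alloc_header *)ptr - 1;
	st->current -= h->size;
	free(h);
}

static void *count_realloc(void *user, void *ptr, size_t size)
{
	alloc_stats *st = user;
	alloc_header *h, *nh;
	size_t old;
	if (!ptr)
		return count_malloc(user, size);
	if (size == 0) {
		count_free(user, ptr);
		return NULL;
	}
	h = (alloc_header *)ptr - 1;
	old = h->size;
	if (size > SIZE_MAX - sizeof *h)
		return NULL;
	nh = realloc(h, sizeof *nh + size);
	if (!nh)
		return NULL;             /* original block is still valid */
	nh->size = size;
	st->current = st->current - old + size;
	if (st->current > st->peak)
		st->peak = st->current;
	return nh + 1;
}
```

The `user` pointer would be the address of an `alloc_stats`, so reading `current` gives a rough view of memory use for deciding when to release pages.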
sebras | malc_: the intent is that they should be, but I quite recently discovered allocations in the libraries that are not. | 20:34.04 |
| malc_: I seem to recall that e.g. harfbuzz uses malloc directly. | 20:34.15 |
malc_ | sebras: aye.. simple grep reveals naked calloc there in harfbuzz | 20:36.58 |
sebras | malc_: might be replaced by #define calloc() though. | 21:22.53 |
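The `#define calloc()` trick redirects a library's naked calls without editing its sources. A sketch with a hypothetical `my_calloc` that counts calls so the redirect is observable. In a real build the macro would live in a header force-included into the third-party sources (e.g. with GCC/Clang's `-include` option). Formally the C standard reserves library function names, so this is a pragmatic build trick rather than strictly portable C.

```c
#include <assert.h>
#include <stdlib.h>

static size_t redirected_calls;

static void *my_calloc(size_t n, size_t size)
{
	redirected_calls++;
	return calloc(n, size);   /* real calloc: the macro isn't defined yet */
}

/* From here on, calloc(...) in this translation unit calls my_calloc. */
#define calloc(n, size) my_calloc((n), (size))

/* Stands in for unmodified third-party code. */
static int *third_party_code(void)
{
	return calloc(4, sizeof(int));
}
```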
| malc_: I think some libraries do it that way. | 21:23.00 |
malc_ | sebras: thus moving deeper into the realm of upstream deviation | 21:25.25 |
janzo | i see mupdf can output xml. im trying to get the boxes/lines/words of a pdf. is this a possible way? | 22:24.12 |
| i stumbled on the stext output | 22:37.34 |