MuPDF IRC logs

		20171220
sebras	Robin_Watts: if you are around, maybe you can take a look at sebras/master?	13:17.50
	Robin_Watts: I tried to fix a couple of issues I've noticed.	13:17.58
Robin_Watts	sebras: Sure.	13:17.59
sebras	Robin_Watts: I found a couple of pdfs over at http://www.pdfill.com and decided to run mudraw over those. that tripped ASAN/valgrind a bit.	13:19.07
Robin_Watts	All 3 look good to me.	13:20.42
	A repository for sick PDFs? :)	13:21.06
sebras	perhaps, but I don't think so.	13:21.41
	the company develops some kind of windows app for signing pdfs (and editing?)	13:21.57
Robin_Watts	pdf "ill". Never mind :)	13:22.45
sebras	Robin_Watts: I'm not sure I appreciate the GUI they show off on their landing page (you really ought to take a look...)	13:23.53
	Robin_Watts: on a more serious note: in source/pdf/pdf-op-filter.c:filter_push:102 we keep the text fonts in pending and sent.	13:25.03
Robin_Watts	Yeah, ribbon overload.	13:25.06
sebras	Robin_Watts: shouldn't we equally keep the cs, pat and shd parts in cs, CS, sc and SC?	13:25.25
	Robin_Watts: seems to me like the new_gstate will otherwise just borrow those references from the next gstate.	13:25.48
	perhaps that is ok, but if it is, then why keep the fonts?	13:26.00
	I looked at all variable = variable2 assignments after having found the reference counting mistake in fz_new_pixmap_from_pixmap().	13:26.41
	so I'd like to add something like fz_keep_colorspace(ctx, new_gstate->pending.cs.cs);	13:27.17
	but perhaps I'm missing something.	13:27.22
Robin_Watts	sebras: You may be right. I am a bit buried in other stuff at the mo.	13:27.26
sebras	Robin_Watts: ok, I'll look further, you continue to stay buried. I wouldn't want to dig you up... ;)	13:28.00
Robin_Watts	I'm reluctant to take references more than we need to.	13:28.03
	cos colorspaces might only have signed 8 bit reference counts.	13:28.20
sebras	ah.	13:28.42
	Robin_Watts: are you still buried?	17:54.54
KnoteAI	Hello guys!	18:28.39
Robin_Watts	sebras: I can unbury myself for a bit, sure.	18:29.31
	KnoteAI: Hi	18:29.42
sebras	Robin_Watts: fz_get_pixmap_from_image() can be used to decode subareas of an image.	18:31.17
Robin_Watts	sebras: Sorry, I've just been called away for the next 15 mins or so. she who must be obeyed.	18:31.45
sebras	Robin_Watts: first we look for the requested subarea recomputed into a rect.	18:31.59
	no worries.	18:32.04
	we look for this subarea in the store.	18:32.16
	if we don't find it we decide to decode the image.	18:32.24
	when we decode the image we change the rect that represents the subarea and then use this as part of the key when storing the decoded image into the store.	18:33.15
	so the next time we request another part of the image we will try the same thing, i.e. we will fail to find the fully decoded image in the store since we initially look only for the subarea we wanted.	18:33.55
	of course we don't find it, then we again decide to decode the entire image and then we again try to store the fully decoded image into the store, thereby overwriting the existing entry.	18:34.40
	that seems a bit silly.	18:34.44
	I'm thinking we could first look for the subarea we want and if we find it, use it.	18:35.02
	if we don't find it, try to look for not a subarea but the entire image fully decoded in the store.	18:35.23
	if we find it, use the full image and adapt ctm correspondingly	18:35.45
	and if we can't find that either, only then will we attempt to decode the full image and try to put it into the store.	18:36.11
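The lookup order sebras proposes above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `toy_store`, `store_find`, `decode_full` and `get_area` are hypothetical stand-ins, not MuPDF types or functions, and the "store" is a plain array rather than the real fz_store.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are MuPDF types or functions. */
typedef struct { int x0, y0, x1, y1; } rect;
typedef struct { int image_id; rect area; } entry;

enum { STORE_MAX = 16 };
typedef struct { entry items[STORE_MAX]; int len; int decodes; } toy_store;

static int rect_eq(rect a, rect b)
{
	return a.x0 == b.x0 && a.y0 == b.y0 && a.x1 == b.x1 && a.y1 == b.y1;
}

/* Exact-match lookup, mirroring a store keyed on (image, rect). */
static const entry *store_find(const toy_store *s, int image_id, rect area)
{
	int i;
	for (i = 0; i < s->len; i++)
		if (s->items[i].image_id == image_id && rect_eq(s->items[i].area, area))
			return &s->items[i];
	return NULL;
}

/* Pretend to decode the whole image and store it under its full rect. */
static const entry *decode_full(toy_store *s, int image_id, rect full)
{
	if (s->len == STORE_MAX)
		return NULL; /* toy store is full */
	s->items[s->len].image_id = image_id;
	s->items[s->len].area = full;
	s->decodes++;
	return &s->items[s->len++];
}

static const entry *get_area(toy_store *s, int image_id, rect wanted, rect full)
{
	const entry *e;
	assert(s != NULL);
	e = store_find(s, image_id, wanted);        /* 1: the subarea itself */
	if (e == NULL)
		e = store_find(s, image_id, full);  /* 2: the fully decoded image */
	if (e == NULL)
		e = decode_full(s, image_id, full); /* 3: decode only as a last resort */
	return e;
}
```

With this order, requesting two different tiles of the same image decodes it once; the caller still has to adjust for the returned entry covering more than was asked for.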
Robin_Watts	sebras: back.	18:44.51
	That does indeed seem suboptimal.	18:45.02
	What we ought to be doing is looking for an entry in the store that includes the given subarea.	18:45.44
	That was the original intent.	18:45.53
	To do that we'd need to form the hash key from everything without the rectangle, and allow multiple entries in the hash store with the same key.	18:47.00
	Then we'd linearly probe through those entries to look for one with an appropriate rectangle	18:47.32
sebras	Robin_Watts: ok, because we are guaranteed that values with the same key are stored linearly in the hash.	18:51.01
	i see.	18:51.04
	wouldn't we still need to have the rect though to know what part was actually decoded?	18:51.43
Robin_Watts	sebras: The mechanism isn't important really. It's the idea of looking for a set of possibilities.	18:51.45
sebras	yes, I realize that.	18:51.55
Robin_Watts	The rect would still be stored, it just wouldn't form part of the actual key.	18:52.11
sebras	right so fz_make_hash_image_key would still store the rect in the key, but fz_cmp_image_key() wouldn't take it into account.	18:53.23
	if the rect is not stored there, I'm lost as to where it would be stored.	18:53.35
Robin_Watts	sebras: Is it not stored in the fz_pixmap ?	18:53.48
	x, y, x+width, y+height should be the rect, right?	18:54.07
sebras	I don't think we get the correct x,y from the pixmap though.	18:55.11
	but the width/height is certainly stored there.	18:55.19
	well, or larger.	18:55.33
	also I have noticed that in compressed_image_get_pixmap() we may invert the CMYK jpegs for XPS.	18:56.01
Robin_Watts	sebras: Why wouldn't we get the correct x/y ?	18:56.18
sebras	but this is not present in the key, so we could end up wanting a non-inverted variant and the store would give us the inverted one.	18:56.24
Robin_Watts	one problem at a time, eh? :)	18:56.41
sebras	Robin_Watts: sure, but I need to blurt it out before I forget! :)	18:56.54
Robin_Watts	I can understand that :)	18:57.02
sebras	Robin_Watts: the pixmap wouldn't know what x,y it should be at since it might be used in several locations.	18:57.20
Robin_Watts	sebras: The pixmap x/y should be the position of the rectangle within the source that is decoded.	18:57.48
sebras	when we finished decoding the image I imagine that x == y == 0	18:57.49
Robin_Watts	i.e. if we have a 300x300 image, and we decode just the middle 100x100 of it, I'd expect x,y,w,h to all be 100.	18:58.25
	Where, and at what scale that is displayed shouldn't affect the rectangle.	18:58.53
sebras	ok, not the x,y on the page, got it.	18:59.56
	Robin_Watts: I think you try to do the corresponding thing with the subsampling factor, you try to look in the store for the most appropriate factor first.	19:01.22
	if that is not found, you decrease it and try to look again.	19:01.36
	until the subsampling is 0.	19:01.42
Robin_Watts	yeah, with subsample factors, there are a limited number of possibilities, so I can check 'em all.	19:01.58
	I can't check all the possible rectangles that would satisfy me though.	19:02.11
sebras	depends on how we implement the compare function, right?	19:02.28
	what if the comparison would say FOUND! as long as the desired rectangle is covered..?	19:02.44
	i.e. even if the pixmap is larger than you originally wanted.	19:02.59
	but that would only solve the linear probing.	19:03.32
	ok.	19:03.33
KnoteAI	any plan to change the xmin, ymin, xmax, ymax to x,y,w,h ???	19:03.41
Robin_Watts	KnoteAI: What, where?	19:04.10
sebras	KnoteAI: you need much more context to get a decent answer. what xmin/ymin/etc..?	19:04.23
Robin_Watts	sebras: The intent is that the comparison should say "Found!" as long as the desired rectangle is covered.	19:04.47
sebras	right.	19:04.55
Robin_Watts	But that can't work with the current way we're driving the hash table.	19:05.15
	We need the "check multiple entries" stuff (which we can do by linear probing, I hope)	19:05.37
sebras	0	19:05.44
	if the hash is the same the entries will be stored linearly without gaps as far as I know.	19:06.06
Robin_Watts	sebras: That is my understanding too.	19:06.17
	We could have a hash_nextMatch or something.	19:06.28
sebras	we don't have a hash table interface to say, ok I have this entry, give me the next one.	19:06.34
	yes.	19:06.38
Robin_Watts	but in fz_underscore_stylee	19:06.53
sebras	fz_hash_find_next() I assume.	19:07.06
Robin_Watts	sebras: That'd work for me I think.	19:07.17
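The "check multiple entries" idea discussed above might look roughly like the following C sketch. Everything here is an assumption for illustration: `toy_hash`, `toy_hash_insert`, `toy_hash_find_next` and `find_covering` are hypothetical names, the table is a tiny fixed-size one with linear probing and no deletion, and the key deliberately leaves out the rectangle. It is not the MuPDF hash table, and fz_hash_find_next() is only a name floated in this conversation.

```c
#include <stddef.h>

/* Hypothetical toy types; not the MuPDF hash table or store. */
typedef struct { int x0, y0, x1, y1; } rect;
typedef struct { int used; unsigned key; rect area; } slot;

enum { TABLE_SIZE = 8 };
typedef struct { slot slots[TABLE_SIZE]; } toy_hash;

/* True if outer fully contains inner. */
static int rect_covers(rect outer, rect inner)
{
	return outer.x0 <= inner.x0 && outer.y0 <= inner.y0 &&
		outer.x1 >= inner.x1 && outer.y1 >= inner.y1;
}

/* Insert without replacing: several entries may share one key. */
static int toy_hash_insert(toy_hash *t, unsigned key, rect area)
{
	int n;
	for (n = 0; n < TABLE_SIZE; n++)
	{
		slot *s = &t->slots[(key + n) % TABLE_SIZE];
		if (!s->used)
		{
			s->used = 1;
			s->key = key;
			s->area = area;
			return 1;
		}
	}
	return 0; /* table full */
}

/* Resume probing at step *n. Returns the next slot holding key,
 * or NULL once an empty slot (the end of the probe chain) is hit. */
static const slot *toy_hash_find_next(const toy_hash *t, unsigned key, int *n)
{
	while (*n < TABLE_SIZE)
	{
		const slot *s = &t->slots[(key + (*n)++) % TABLE_SIZE];
		if (!s->used)
			return NULL;
		if (s->key == key)
			return s;
	}
	return NULL;
}

/* Walk every entry with this key until one covers the wanted rect. */
static const slot *find_covering(const toy_hash *t, unsigned key, rect wanted)
{
	int n = 0;
	const slot *s;
	while ((s = toy_hash_find_next(t, key, &n)) != NULL)
		if (rect_covers(s->area, wanted))
			return s;
	return NULL;
}
```

The rectangle stays with the stored entry but plays no part in hashing or key equality; it is only consulted by the covering test while walking the candidates.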
	I'm being distracted by my SSE4.1 intrinsics almost, but not quite working.	19:07.39
KnoteAI	Sorry I was talking about the span bounding box, I am wondering if you were on the point to change the bbox xmin, ymin, xmax, ymax to x,y,w,h? I'm just curious.	19:08.15
sebras	Robin_Watts: I discovered this while trying to help malc_ debug a problem in llpp, he's doing tiling when he's rendering, even if he's rendering a PNG, hence he triggered this. mupdf doesn't seem to be doing that so we haven't seen it.	19:08.20
	KnoteAI: I haven't heard of any such plans.	19:08.47
KnoteAI	ok ok thank you for your answer. I just made a bad conclusion with your message (<@Robin_Watts> i.e. if we have a 300x300 image, and we decode just the middle 100x100 of it, I'd expect x,y,w,h to all be 100.)	19:10.01
sebras	KnoteAI: that is about an entirely different data structure.	19:10.41
KnoteAI	got it thanks ;)	19:11.18
sebras	Robin_Watts: SSE4.1 intrinsics is for SO?	19:12.12
	Robin_Watts: thanks for setting me on the right track, I'll try to do something smart with this.	19:12.52
Robin_Watts	sebras: gs, image scaling.	19:22.29
Diemex	Hi!	19:27.12
	Couple of quick questions:	19:27.21
	Is there a way to get the source/javadoc from the maven repository?	19:27.45
Robin_Watts	Diemex: What source? What javadoc? What maven repository? :)	19:28.50
Diemex	https://mupdf.com/docs/android-sdk.html	19:29.03
	I can't see the javadoc in Android Studio for MuPdf methods	19:29.39
Robin_Watts	Everything is in the git repos listed on that page.	19:29.40
	It's entirely possible that there IS no javadoc as yet.	19:30.02
Diemex	Would be cool if that could be included in the repo if possible	19:30.45
	I'm using MuPdf from Android. If I hold on to the DisplayLists can I call MuPdf from multiple Java Threads to render multiple pages at once?	19:31.44
	If yes is there anything I need to watch out for?	19:32.42
sebras	Diemex: once you have a displaylist you should be able to have multiple threads read that list and render it.	19:34.40
	Diemex: e.g. if you have two threads and one renders the upper half while the other renders the lower half.	19:35.01
	be sure to only use one thread to parse the document though.	19:35.17
	Diemex: also, no there is no javadoc at all.	19:35.31
Robin_Watts	Diemex: If you want to write it for us, we'll add it, sure.	19:35.34
sebras	Robin_Watts: nice timing. :)	19:35.46
Robin_Watts	Diemex: You may find: http://ghostscript.com/~robin/mupdf_explored.pdf of interest.	19:36.15
	That describes the C API, upon which the Java API is based.	19:36.37
	I haven't written the chapters on the java bindings yet.	19:36.48
Diemex	Robin_Watts: Nice! Thanks for that	19:36.57
	I just returned to Android Dev from an absence of over a year. It's a breeze. Tools have improved a lot. And Kotlin is just WOW	19:38.41
	A walk in the park with a yummy icecream compared to FPGA dev XD	19:39.13
Robin_Watts	:did FPGA dev 20 years ago.	19:40.02
*Robin_Watts*	did FPGA dev 20 years ago.	19:40.07
Diemex	Niiiice. Which FPGA/vendor?	19:40.42
Robin_Watts	Diemex: We used Xilinx mostly, but then we were writing the tools to take software descriptions of programs, and compile them to netlists that then got laid out for the chips.	19:41.37
	so I was insulated from the hardware itself most of the time.	19:41.47
	Gotta go, sorry!	19:41.53
Diemex	Robin_Watts: cu :-)	19:42.09
	Does loading a page and holding on to the page object require a lot of memory? I have just "benchmarked" the loadPage function and it only requires about 0.3 ms per call for the PDF I tried. I expected it to take longer. Is there a possibility that I would run into memory troubles if I would e.g. load every page of a large pdf and not free the objects?	20:13.30
sebras	Diemex: that depends on how much memory you have and the number of pages of course.	20:16.54
	Diemex: and probably other things too.	20:17.07
malc_	sebras: it depends on the pages too :)	20:17.13
sebras	malc_: certainly.	20:17.53
	malc_: the more complicated pages are the more memory they use.	20:18.07
malc_	sebras: not only that, they can contain humongous embedded images for instance	20:18.34
	there goes your memory	20:18.37
Diemex	Ah I see. So embedded images are loaded into memory. Can I get the memory usage from the Java API?	20:19.35
	To decide if I should release the page?	20:19.50
	Or I just keep it simple and only hold on to the last 5 pages or so	20:20.11
sebras	Diemex: I don't think we have any API that gives you the memory usage even at the C level.	20:27.38
malc_	sebras: custom allocator in fz_new_context might help some, then again no idea if libjpeg etc is tracked by that	20:30.54
sebras	malc_: the intent is that they should be, but I quite recently discovered allocations in the libraries that are not.	20:34.04
	malc_: I seem to recall that e.g. harfbuzz uses malloc directly.	20:34.15
malc_	sebras: aye.. simple grep reveals naked calloc there in harfbuzz	20:36.58
sebras	malc_: might be replaced by #define calloc() though.	21:22.53
	malc_: I think some libraries do it that way.	21:23.00
malc_	sebras: thus moving deeper into the realm of upstream deviation	21:25.25
janzo	i see mupdf can output xml. im trying to get the boxes/lines/words of a pdf. is this a possible way?	22:24.12
	i stumbled on the stext output	22:37.34

Log of #mupdf at irc.freenode.net.