MuPDF IRC logs

	<<<Back 1 day (to 2017/08/14)	20170815
avih	tor8: btw, i tested the threaded code with heap allocation instead of stack (with size of 1000), it's almost 10x slower :/ also, so you plan to leave the two implementations with that #if ?	11:28.12
	(and half off topic, why don't you leave the code you're working on at the top commits, and just add commits until it's stabilized, instead of updating the commit and force push it?)	11:31.24
	i get the need for clean history, but while you're working on something, at least for me, i find it useful to just keep adding commits as ugly as the history might look, and only squash once i'm happy with it all	11:31.24
tor8	avih: I'm working on a completely different implementation as well	11:31.52
avih	i thought you are :) the "lockstep" one?	11:32.12
tor8	avih: on tor/wip are two commits you could give a try	11:36.28
	I expect the lockstep one to be much slower for simple regexes (more book-keeping per step) but scale better with 'nasty' regexes	11:37.03
avih	tor8: will do. is the lockstep one still partial due to not handling I_REF, I_PLA, or I_NLA ?	11:39.44
tor8	I fall back to the recursive one if the program has either of those instructions	11:40.19
avih	also, so the recursive one is unlimited and will just blow off the stack if it overflows it?	11:40.33
tor8	so it should be complete (modulo the use of system malloc and no error handling for malloc failures)	11:40.36
	it will blow up the C stack if it overflows, yes	11:40.49
avih	up. right :)	11:41.02
tor8	but at least it doesn't always use 96k of stack :)	11:41.25
avih	280k !	11:41.54
	also, i think on windows the default stack is 1M, so it's actually not that far off it, in scale terms	11:42.49
	so the new recursive is slightly (~5-8%) slower than the previous recursive, but possibly because previously the flags were a mess. but it works. now trying the lockstep	11:49.32
	tor8: the lockstep one segfaults for me on the same test where the recursive doesn't - repeat("hello world ", 500 * 1000).match(/hello/g).length	11:53.03
	(i'm actually using "hello world ".repeat(500000) using my polyfill https://pastebin.mozilla.org/9029766 )	11:55.13
tor8	ugh, yes, there must still be some bugs left :(	11:55.24
	avih: top commit on tor/wip	11:57.10
avih	tor8: huh.. 50x slower than recursive for that ^ match	11:59.24
	(specifically, with 500k repeats the recursive is ~900ms, the heap-allocated MAXTHREADS ~8000ms, and the lockstep with the fix - i thought it hangs and aborted, but then i reduced the repeats to 5000 and it was ~450ms	12:00.52
tor8	avih: yes, that's about the slowdown I expected.	12:01.37
	I shall not be putting the lockstep one into master (not before optimizing the hell out of the Resub state managment, which is what's causing the slowdown)	12:01.59
avih	tor8: is there a real reason to not use the recursive by default? i don't see how the array size helps with the "threaded" implementation if you don't know what the stack size is to begin with. it can blow the stack either way, but the recursive is not limited to 1000, and only blows the stack once it actually needs to. i think overall it's better.	12:03.41
tor8	not really, it makes it somewhat harder to debug if you smash the stack, but that's pretty much the only reason	12:04.10
avih	what about incremental heap allocations? start with stack array of pointers to Rethread, and allocate more items as you need them? this way you can actually catch exhaustion	12:05.32
tor8	I'm happy enough with the simple recursive backtracking implementation.	12:06.02
avih	except blowing the stack ungracefully	12:06.20
	(which was the main reason against it, if i understand correctly)	12:06.38
tor8	that, and I was halfway towards the lockstep implementation when I figured it was good enough and moved on	12:07.27
avih	k	12:07.48
tor8	but back-references and positive/negative lookahead mess up a lockstep implementation and pretty much mandate a backtracking implementation	12:08.00
	it's not a simple DFA/NFA machine anymore with back-references	12:08.22
avih	in general though, this stuff is, IMO, classic for heap allocation. you can't know how much space you'll need in advance, and that's what the heap is for.. after all..	12:09.23
	not to mention it's order[s] of magnitude bugger than stack space, even with glibc	12:09.57
	i*	12:10.31
sebras	tor8: ok, I see.	12:36.26
	tor8: did you see my weekend ramblings? https://ghostscript.com/mupdfirclogs/2017/08/12.html	12:36.56
tor8	sebras: I'm not too fond of the fz_indexed_lookup function name	12:37.01
	and we have a rather too many awkward 'is this colorspace X' type of constructs in the code	12:37.21
	I think it wuold be good to add some more fz_colorspace_is_rgb and _is_cmyk functions	12:37.36
	sebras: yes, I saw. with that said, the printf commits LGTM.	12:37.56
	sebras: like the if-else chains in 'Add support for writing pdfs usin Lab colorspaces.'	12:38.51
	we have a bunch of other similar if tests elsewhere too	12:39.00
	where we test n==3 and assume RGB	12:39.04
	I think it'd be good if we could change all those to colorspace_is_rgb calls	12:39.26
	and add such a function	12:39.33
sebras	tor8: ok, so you'd rather we have explicit functions and test for each colorspace and not rely on interpreting the number of components?	12:39.48
tor8	and likewise for colorspace_is_gray	12:39.49
sebras	ok.	12:39.59
tor8	a DeviceN and Separation colorspace could mistakenly be considered as rgb or gray	12:40.26
sebras	that's a good point.	12:40.40
tor8	sebras: yeah. now that we support more than devicegray/rgb/cmyk/lab it's a bit trickier I think	12:40.47
	even with lab before, but that was such a rare colorspace	12:41.01
sebras	ok, let me rearrange my commits a bit to have those that are acceptable be merged immediately.	12:41.13
tor8	sebras: so apart from those two I mentioned, the rest LGTM	12:41.28
	though 'Bug 698168: Add support for writing indexed images to pdfs.' depends on the fz_indexed_lookup commit	12:41.49
sebras	tor8: I know.	12:42.00
tor8	fz_(get_)indexed_colorspace_palette?	12:42.51
sebras	tor8: so if you look at sebras/master now, all commits are LGTM, right?	12:42.55
	tor8: that's indeed a better name!	12:43.05
tor8	I'd have said fz_lookup_indexed_colorspace(ctx, cs, index, char color[]) but that won't match with how you use it	12:43.28
sebras	tor8: that way seems slower since we'd need to make one call per index..?	12:44.14
	tor8: do we care?	12:44.20
tor8	sebras: I don't like moving S_ISDIR to a public header file, but we've got too many messes like that already (which I intended to clean up before being side tracked)	12:44.37
sebras	tor8: we're potentially writing a GIGANTIC pdf so 10ns more might not register.	12:44.45
tor8	so I'll just move that to a private header file when I get around to it	12:44.52
	we don't care :)	12:45.02
sebras	tor8: ok... so nopt to a public header, but there are many things in that header that are already system-specific, why not this once too?	12:45.30
tor8	but change the name to something less confusing (lookup is a verb, hence the confusion)	12:45.37
	and it's fine by me	12:45.43
sebras	tor8: right, ok.	12:45.45
tor8	sebras: all of the cruft we do to paste over missing posix stuff on windows should live in a private header	12:46.10
	so we don't pollute system headers and namespaces for users of our library	12:46.32
sebras	tor8: oh, we never expose FILE * or something like that in external API:s?	12:46.32
	tor8: right, that argument resonates with me. :)	12:46.48
tor8	we unfortunately expose FILE* which makes a big mess of how to deal with LARGEFILE :(	12:46.54
	if we were to avoid FILE* we can get away with only exposing two system header files at all: stddef.h and stdint.h	12:47.20
	and we only need those for size_t and int64_t etc	12:47.45
sebras	tor8: I see, well. I'll let you deal with that in the cleanup in that case and remove the S_ISDIR commit.	12:49.00
tor8	no, leave the S_ISDIR commit. there's more stuff in that header file that needs to move along with it. we're not making things any worse with that commit!	12:49.24
	and this moves them to one place where I'll remember to find them	12:49.41
	I dislike having #ifdef's in the C files (if we move to using private headers)	12:49.53
sebras	tor8: then I misunderstood you. I thought you saw this one as unncessary churn. :)	12:50.56
tor8	nah, some churn is good	12:51.15
	especially as I don't know when I'll get around to the next bit of churning :)	12:51.27
sebras	tor8: ok, so sebras/master LGTMed? it clusters well.	12:52.16
	tor8: oh and refetch because I might have updated.	12:52.29
*sebras*	is worried he might push stuff which is not yet LGTMed.	12:52.44
tor8	we should make sure to have the same 'javah' tool for mupdf_native.h	12:53.14
	your javah and mine seem to output the symbols in a completely different order	12:53.30
sebras	tor8: oh, is that why we have an issue? I was wondering what was happning.	12:53.44
tor8	which leads to a LOT of needless churn in mupdf_native.h	12:53.46
sebras	tor8: do we need to check mupdf_native.h in?	12:54.02
	tor8: if we depend on javah regenerating it anyway...?	12:54.13
tor8	yes, we need to check it in. you know how android losers are...	12:55.25
	sebras: try changing platform/java/Makfile to use $(sort $(LIBRARY_JAVA_CLASSES)) for the javah command	12:55.52
sebras	tor8: in that case we should just check in libmupdf.so into the java repos...	12:55.58
tor8	and see if you still get so many changes	12:56.01
sebras	tor8: readlink -f $(which javah) give what output?	12:56.44
tor8	for me it's /opt/jdk1.8.0_121/bin/javah	12:57.01
	javah -version	12:57.20
	javah version "1.8.0_121"	12:57.21
sebras	tor8: oracle? openjdk?	12:57.41
	javah version "1.8.0_141" here.	12:58.07
tor8	Oracle, I think	13:00.02
	sebras: have you tried feeding it a sorted list?	13:00.23
sebras	tor8: nope.	13:03.05
	tor8: I'm testing now however.	13:04.08
	tor8: still a gigantic diff.	13:05.45
	tor8: the same if I switch to oracles: javah version "1.8.0_111"	13:06.42
	tor8: javah version "1.8.0_144" also generates a large diff.	13:20.31
	tor8: and the same one as is in df4f2031b	13:21.25
	tor8: this is with $(sort $(...)) so.. I'm not sure how to fix this.	13:21.45
tor8	I guess the only way is to do as you say: git rm platform/java/mupdf_native.h	13:31.05
	but that will cause problems with Android.mk	13:31.37
sebras	tor8: wow.. I haven't rebuilt -mini for a while and now gradle decided to download a crapload of packages of different versions. I'm still stunned.	13:33.22
	tor8: I was doing make clean in -mini btw.. :-P	13:33.38
tor8	sebras: alternatively we make sure to run the same version of jdk	13:42.31
sebras	tor8: I run 1.8.0u144 right now.	13:46.25
	tor8: downloaded it just now.	13:46.49
	tor8: http://kn-gloryo.github.io/Build_NDK_AndroidStudio_detail/	13:48.07
	tor8: https://medium.com/@siantlords/opencv-and-android-ndk-integration-in-android-studio-883a810189e2	13:48.09
	seems like there is no real consensus on how to configure android studio to autogenerate the headers. :-/	13:48.27
Guest74051	Hi sebras, good morning!	13:48.55
sebras	Guest74051: hi bruno.	13:49.34
Guest74051	I still don't know how to draw an image from png in the right dimensions and aspect ratio. Can you help me?	13:49.36
sebras	Guest74051: did you read the logs over at https://ghostscript.com/mupdfirclogs/2017/08/14.html ?	13:50.17
Guest74051	Yes, I did. Now I have two methods: the first method draws the image in a page and the second method adds the image directly to resources branch. But, in both cases, the image is distorced and cropped.	13:54.10
	The problem is probably in the matrix.	13:55.38
tor8	Guest74051: the problem is likely as you said in the matrix	14:10.49
	how do you calculate the matrix?	14:10.56
Guest74051	fz_document_writer *out = fz_new_document_writer(ctx, "page0.pdf", "pdf", "compress");	14:11.40
	fz_matrix ctm = *fz_copy_matrix(&ctm, &fz_identity);	14:11.45
tor8	it should be [ image_width_in_points_on_the_page 0 0 image_height_in_points_on_the_page left_coordinate_in_points bottom_coordinate_in_points ]	14:11.47
Guest74051	float xs = image->w * 96.0 / image->xres;	14:11.51
	float ys = image->h * 96.0 / image->yres;	14:11.57
	fz_scale(&ctm, xs, ys);	14:12.01
	fz_rect mediabox = {0, 0, xs, ys};	14:12.08
	fz_device *dev = fz_begin_page(ctx, out, &mediabox);	14:12.14
	fz_fill_image(ctx, dev, image, &ctm, 1);	14:12.19
	fz_end_page(ctx, out);	14:12.24
tor8	Guest74051: are you sure the image->xres and image->yres values are good (some images have bogus values there)	14:12.30
	Guest74051: I see nothing obviously wrong with that code	14:12.58
Guest74051	It shows 502 for image->xres and image->yres	14:12.59
tor8	except the mediabox	14:13.06
	which may or may not be what you want	14:13.15
	502 dpi? that's pretty high res	14:13.32
Guest74051	fz_rect mediabox = {0, 0, image->w, image->h};	14:14.32
tor8	Guest74051: if you change the mediabox to be { 0, 0, 1000, 1000 } and ctm to be { 1000, 0, 0, 1000, 0, 0 } you should get a full page image which is square (so wrong aspect ratio, but would exclude any other problems)	14:14.40
Guest74051	I still get cropped image and wrong aspect ratio	14:15.22
tor8	Guest74051: can you post somewhere the input image and output pdf you get?	14:15.55
Guest74051	The image has width=4299 and height=6086	14:16.35
	I just tried with other images and the code actually works. But the initial image that I tried, that is giving problems, is proprietary.	14:31.02
	Is it possible, that giving the dimensions of the image, it is split in tiles?	14:31.44
tor8	Guest74051: it is possibly a bug in the image loading code then	14:31.46
Guest74051	is it split in tiles?	14:31.52
tor8	what does it look like if you just open the image file with mupdf?	14:32.00
	you could also try "mutool convert -o out.pdf input.png"	14:32.53
	and see if that has the same errors	14:32.59
sebras	tor8: I think I have hunted down the javah issue: 7f0a9b9f commit introduced a few changes to the graft api which required 8 lines of mupdf_native.h to change	14:34.16
	tor8: later dd58ea5 introduced lcms2 and updated mupdf_native.h but not in the same was as my javah.	14:34.50
Guest74051	tor8: Yes, same problem. The image is cropped exactly in half and the aspect ratio is wrong.	14:37.16
sebras	tor8: those changes originate from ddf38c5 and 8174ab2 on robin/lcms. I'm not sure what version of javah he used to fix mupdf_native.h or if robin did the changes manually (presumably not?)	14:38.11
Guest74051	I did the following experiment: reduced the image to 1000x1416 pixels and the code worked. Then, yes I believe that fz_new_image_from_file is not handling big images properly.	14:40.46
tor8	Guest74051: if you could open a bug and attach the file, we can mark it as private	14:47.23
	or if you could find a non-private file that exhibits the same problems	14:47.44
sebras	Guest74051: tor8: does https://upload.wikimedia.org/wikipedia/commons/3/39/Periodic_table_large.png exhibit the problem?	14:48.16
	it is supposedly 6000x3300 pixels.	14:48.29
	tor8: so I guess the javah changes stem from the lcms merge and no one noticed because it was in platform/java... ;)	14:49.23
tor8	sebras: some changes but it looks like the whole file is being rearranged with your commit	14:50.00
	it's moving the "Header for class ...RenderingIntent" section down a few lines, etc	14:50.50
	and changing constant suffixes:	14:51.18
	-#define com_artifex_mupdf_fitz_TryLaterException_serialVersionUID -7034897190745766939i64	14:51.22
	+#define com_artifex_mupdf_fitz_TryLaterException_serialVersionUID -7034897190745766939LL	14:51.22
sebras	tor8: not my fault: git show ddf38c5 platform/java/mupdf_native.h \| tail -20	14:51.55
	tor8: is this a javah/win vs javah/other issue?	14:52.15
tor8	quite likely	14:52.33
	I would have said if I didn't generate the same exact file locally	14:52.50
sebras	tor8: https://bugs.openjdk.java.net/browse/JDK-5085132 aw crap, seems like they did this on purpose.	14:53.12
tor8	ah, hang on.	14:53.33
	I may be confused because on tor/master there is another fix for mupdf_native.h	14:53.46
sebras	tor8: that one seems quite similar to mine!	14:54.25
	tor8: but perhaps not identical	14:54.32
tor8	so yeah, I think we're down to windows vs linux javah	14:54.41
sebras	tor8: apparently there was time when the c compiler on windows didn't support the LL suffix, hence they generate i64 on that platform.	14:55.20
tor8	commit your change, and we'll poke robin and remind him not to do the mupdf_native.h regen from windows	14:55.21
sebras	tor8: ok.	14:55.34
Guest74051	The command "mutool convert -o page1.pdf Periodic_table_large.png" converts it properly. My code almost does: it has the correct aspect ratio and it is not cropped, but the image dimensions are smaller than the dimensions of the paper.	14:55.37
tor8	sebras: it may be that the version of MSVC we use (2005) doesn't...	14:55.57
	but my windows VM is still broken...	14:56.08
sebras	tor8: see the discussion about MSVC on the other channel btw.	14:56.34
	tor8: so, then we are back to thinking that sebras/master is LGTM..?	14:57.37
tor8	sebras: we are!	14:58.24
sebras	tor8: sweet! I'll push it.	14:58.32
	tor8: btw, we should consider disabling -Wmisleading-indentation somehow.	15:00.12
	tor8: it triggers in lcms.	15:00.18
	tor8: or we could fix lcms perhaps.	15:00.27
tor8	sebras: we could disable it for lcms in Makethird	15:05.01
sebras	tor8: in 3cbe36b63 I disabled freetypes stream support. this means that freetype no longer will be able to load files from the file system, but you have to supply it with a memory buffer.	15:07.10
	tor8: llpp apparently used this feature in freetype. mupdf doesn't platform/java/mupdf_native.c:load_not() e.g. uses fz_new_font_from_file() which loads to a fz_buffer before creating the font.	15:08.04
tor8	sebras: we do anyway, fz_new_font_from_file loads the font into a fz_buffer before handing to freetype	15:08.38
sebras	tor8: I wasn't aware that I would be restricting how system font loading would work (and neither does the cluster exercise this bit)	15:08.39
tor8	it's so we can always get the backing file out for example for pdfwrite	15:08.50
sebras	tor8: ok, now llpp did this: https://github.com/moosotc/llpp/commit/af256f5b4eefcb9c1aacd113906c54f90cc826a3	15:09.43
tor8	it shouldn't affect system font loading...	15:09.43
	the fz_load_system_font_fn callback is supposed to return a fz_font... ideally loaded with fz_new_font_from_file	15:10.06
sebras	tor8: ok, let me rephrase that: I wasn't aware that apps were accessing the file system directly to load fonts not related to mupdf using the same library.	15:11.11
tor8	right. using our linked freetype.	15:11.37
Guest74051	tor8, sebras: just found the steps to reproduce it	15:11.42
sebras	tor8: yes.	15:11.42
tor8	for non-mupdf purposes.	15:11.44
	yeah, I see the problem now.	15:11.49
sebras	tor8: quite.	15:11.49
tor8	I guess we should back that change out then	15:11.58
Guest74051	How do I upload the image?	15:12.14
sebras	Guest74051: you can attach it to the bug you create at bugs.ghostscript.com	15:12.37
	Guest74051: thanks for taking the time to reproduce with an accessible file btw. most people wouldn't go through the trouble to do so, but it valuable to us. :)	15:13.14
Guest74051	No problem :)	15:13.55
sebras	tor8: the llpp people also asked me if the fonts embedded in mupdf are accessible through some reasonable interface.	15:21.10
tor8	sebras: yes, the fz_lookup_xxx_font functions	15:21.46
	the ones in noto.c	15:21.58
	fz_lookup_base14_font, fz_lookup_builtin_font, fz_lookup_cjk_font, fz_lookup_noto_font, etc.	15:22.45
sebras	tor8: alright, I'll tell them.	15:23.35
	tor8: sebras/master LGTMed?	15:26.09
Guest74051	tor8, sebras: I created the bug https://bugs.ghostscript.com/show_bug.cgi?id=698357#c0	15:28.12
	I tested on mupdf 1.11, but bugzilla only showed 1.10.	15:29.09
sebras	Guest74051: ok, I just confirmed that I see the bug. thanks.	15:35.57
mvrhel_laptop	tor8: thanks for fixing the oc issue. rebased the branch and it looks good. I see one blend issue I have to fix though...	16:15.27
	but first I am going to fool with seeing how the colors are looking on the ipad	16:16.06
	oh the iOS app was moved to its own repo	16:40.53
	I was wondering what was going on	16:41.00
	Forward 1 day (to 2017/08/16)>>>

Log of #mupdf at irc.freenode.net.