| <<<Back 1 day (to 2017/08/14) | 20170815 |
avih | tor8: btw, i tested the threaded code with heap allocation instead of stack (with size of 1000), it's almost 10x slower :/ also, so you plan to leave the two implementations with that #if ? | 11:28.12 |
| (and half off topic, why don't you leave the code you're working on at the top commits, and just add commits until it's stabilized, instead of updating the commit and force push it?) | 11:31.24 |
| i get the need for clean history, but while you're working on something, at least for me, i find it useful to just keep adding commits as ugly as the history might look, and only squash once i'm happy with it all | 11:31.24 |
tor8 | avih: I'm working on a completely different implementation as well | 11:31.52 |
avih | i thought you are :) the "lockstep" one? | 11:32.12 |
tor8 | avih: on tor/wip are two commits you could give a try | 11:36.28 |
| I expect the lockstep one to be much slower for simple regexes (more book-keeping per step) but scale better with 'nasty' regexes | 11:37.03 |
avih | tor8: will do. is the lockstep one still partial due to not handling I_REF, I_PLA, or I_NLA ? | 11:39.44 |
tor8 | I fall back to the recursive one if the program has either of those instructions | 11:40.19 |
avih | also, so the recursive one is unlimited and will just blow off the stack if it overflows it? | 11:40.33 |
tor8 | so it should be complete (modulo the use of system malloc and no error handling for malloc failures) | 11:40.36 |
| it will blow up the C stack if it overflows, yes | 11:40.49 |
avih | up. right :) | 11:41.02 |
tor8 | but at least it doesn't always use 96k of stack :) | 11:41.25 |
avih | 280k ! | 11:41.54 |
| also, i think on windows the default stack is 1M, so it's actually not that far off it, in scale terms | 11:42.49 |
| so the new recursive is slightly (~5-8%) slower than the previous recursive, but possibly because previously the flags were a mess. but it works. now trying the lockstep | 11:49.32 |
| tor8: the lockstep one segfaults for me on the same test where the recursive doesn't - repeat("hello world ", 500 * 1000).match(/hello/g).length | 11:53.03 |
| (i'm actually using "hello world ".repeat(500000) using my polyfill https://pastebin.mozilla.org/9029766 ) | 11:55.13 |
tor8 | ugh, yes, there must still be some bugs left :( | 11:55.24 |
| avih: top commit on tor/wip | 11:57.10 |
avih | tor8: huh.. 50x slower than recursive for that ^ match | 11:59.24 |
| (specifically, with 500k repeats the recursive is ~900ms, the heap-allocated MAXTHREADS ~8000ms, and the lockstep with the fix - i thought it hangs and aborted, but then i reduced the repeats to 5000 and it was ~450ms | 12:00.52 |
tor8 | avih: yes, that's about the slowdown I expected. | 12:01.37 |
| I shall not be putting the lockstep one into master (not before optimizing the hell out of the Resub state managment, which is what's causing the slowdown) | 12:01.59 |
avih | tor8: is there a real reason to not use the recursive by default? i don't see how the array size helps with the "threaded" implementation if you don't know what the stack size is to begin with. it can blow the stack either way, but the recursive is not limited to 1000, and only blows the stack once it actually needs to. i think overall it's better. | 12:03.41 |
tor8 | not really, it makes it somewhat harder to debug if you smash the stack, but that's pretty much the only reason | 12:04.10 |
avih | what about incremental heap allocations? start with stack array of pointers to Rethread, and allocate more items as you need them? this way you can actually catch exhaustion | 12:05.32 |
tor8 | I'm happy enough with the simple recursive backtracking implementation. | 12:06.02 |
avih | except blowing the stack ungracefully | 12:06.20 |
| (which was the main reason against it, if i understand correctly) | 12:06.38 |
tor8 | that, and I was halfway towards the lockstep implementation when I figured it was good enough and moved on | 12:07.27 |
avih | k | 12:07.48 |
tor8 | but back-references and positive/negative lookahead mess up a lockstep implementation and pretty much mandate a backtracking implementation | 12:08.00 |
| it's not a simple DFA/NFA machine anymore with back-references | 12:08.22 |
avih | in general though, this stuff is, IMO, classic for heap allocation. you can't know how much space you'll need in advance, and that's what the heap is for.. after all.. | 12:09.23 |
| not to mention it's order[s] of magnitude bugger than stack space, even with glibc | 12:09.57 |
| i* | 12:10.31 |
sebras | tor8: ok, I see. | 12:36.26 |
| tor8: did you see my weekend ramblings? https://ghostscript.com/mupdfirclogs/2017/08/12.html | 12:36.56 |
tor8 | sebras: I'm not too fond of the fz_indexed_lookup function name | 12:37.01 |
| and we have a rather too many awkward 'is this colorspace X' type of constructs in the code | 12:37.21 |
| I think it wuold be good to add some more fz_colorspace_is_rgb and _is_cmyk functions | 12:37.36 |
| sebras: yes, I saw. with that said, the printf commits LGTM. | 12:37.56 |
| sebras: like the if-else chains in 'Add support for writing pdfs usin Lab colorspaces.' | 12:38.51 |
| we have a bunch of other similar if tests elsewhere too | 12:39.00 |
| where we test n==3 and assume RGB | 12:39.04 |
| I think it'd be good if we could change all those to colorspace_is_rgb calls | 12:39.26 |
| and add such a function | 12:39.33 |
sebras | tor8: ok, so you'd rather we have explicit functions and test for each colorspace and not rely on interpreting the number of components? | 12:39.48 |
tor8 | and likewise for colorspace_is_gray | 12:39.49 |
sebras | ok. | 12:39.59 |
tor8 | a DeviceN and Separation colorspace could mistakenly be considered as rgb or gray | 12:40.26 |
sebras | that's a good point. | 12:40.40 |
tor8 | sebras: yeah. now that we support more than devicegray/rgb/cmyk/lab it's a bit trickier I think | 12:40.47 |
| even with lab before, but that was such a rare colorspace | 12:41.01 |
sebras | ok, let me rearrange my commits a bit to have those that are acceptable be merged immediately. | 12:41.13 |
tor8 | sebras: so apart from those two I mentioned, the rest LGTM | 12:41.28 |
| though 'Bug 698168: Add support for writing indexed images to pdfs.' depends on the fz_indexed_lookup commit | 12:41.49 |
sebras | tor8: I know. | 12:42.00 |
tor8 | fz_(get_)indexed_colorspace_palette? | 12:42.51 |
sebras | tor8: so if you look at sebras/master now, all commits are LGTM, right? | 12:42.55 |
| tor8: that's indeed a better name! | 12:43.05 |
tor8 | I'd have said fz_lookup_indexed_colorspace(ctx, cs, index, char color[]) but that won't match with how you use it | 12:43.28 |
sebras | tor8: that way seems slower since we'd need to make one call per index..? | 12:44.14 |
| tor8: do we care? | 12:44.20 |
tor8 | sebras: I don't like moving S_ISDIR to a public header file, but we've got too many messes like that already (which I intended to clean up before being side tracked) | 12:44.37 |
sebras | tor8: we're potentially writing a GIGANTIC pdf so 10ns more might not register. | 12:44.45 |
tor8 | so I'll just move that to a private header file when I get around to it | 12:44.52 |
| we don't care :) | 12:45.02 |
sebras | tor8: ok... so nopt to a public header, but there are many things in that header that are already system-specific, why not this once too? | 12:45.30 |
tor8 | but change the name to something less confusing (lookup is a verb, hence the confusion) | 12:45.37 |
| and it's fine by me | 12:45.43 |
sebras | tor8: right, ok. | 12:45.45 |
tor8 | sebras: all of the cruft we do to paste over missing posix stuff on windows should live in a private header | 12:46.10 |
| so we don't pollute system headers and namespaces for users of our library | 12:46.32 |
sebras | tor8: oh, we never expose FILE * or something like that in external API:s? | 12:46.32 |
| tor8: right, that argument resonates with me. :) | 12:46.48 |
tor8 | we unfortunately expose FILE* which makes a big mess of how to deal with LARGEFILE :( | 12:46.54 |
| if we were to avoid FILE* we can get away with only exposing two system header files at all: stddef.h and stdint.h | 12:47.20 |
| and we only need those for size_t and int64_t etc | 12:47.45 |
sebras | tor8: I see, well. I'll let you deal with that in the cleanup in that case and remove the S_ISDIR commit. | 12:49.00 |
tor8 | no, leave the S_ISDIR commit. there's more stuff in that header file that needs to move along with it. we're not making things any worse with that commit! | 12:49.24 |
| and this moves them to one place where I'll remember to find them | 12:49.41 |
| I dislike having #ifdef's in the C files (if we move to using private headers) | 12:49.53 |
sebras | tor8: then I misunderstood you. I thought you saw this one as unncessary churn. :) | 12:50.56 |
tor8 | nah, some churn is good | 12:51.15 |
| especially as I don't know when I'll get around to the next bit of churning :) | 12:51.27 |
sebras | tor8: ok, so sebras/master LGTMed? it clusters well. | 12:52.16 |
| tor8: oh and refetch because I might have updated. | 12:52.29 |
sebras | is worried he might push stuff which is not yet LGTMed. | 12:52.44 |
tor8 | we should make sure to have the same 'javah' tool for mupdf_native.h | 12:53.14 |
| your javah and mine seem to output the symbols in a completely different order | 12:53.30 |
sebras | tor8: oh, is that why we have an issue? I was wondering what was happning. | 12:53.44 |
tor8 | which leads to a LOT of needless churn in mupdf_native.h | 12:53.46 |
sebras | tor8: do we need to check mupdf_native.h in? | 12:54.02 |
| tor8: if we depend on javah regenerating it anyway...? | 12:54.13 |
tor8 | yes, we need to check it in. you know how android losers are... | 12:55.25 |
| sebras: try changing platform/java/Makfile to use $(sort $(LIBRARY_JAVA_CLASSES)) for the javah command | 12:55.52 |
sebras | tor8: in that case we should just check in libmupdf.so into the java repos... | 12:55.58 |
tor8 | and see if you still get so many changes | 12:56.01 |
sebras | tor8: readlink -f $(which javah) give what output? | 12:56.44 |
tor8 | for me it's /opt/jdk1.8.0_121/bin/javah | 12:57.01 |
| javah -version | 12:57.20 |
| javah version "1.8.0_121" | 12:57.21 |
sebras | tor8: oracle? openjdk? | 12:57.41 |
| javah version "1.8.0_141" here. | 12:58.07 |
tor8 | Oracle, I think | 13:00.02 |
| sebras: have you tried feeding it a sorted list? | 13:00.23 |
sebras | tor8: nope. | 13:03.05 |
| tor8: I'm testing now however. | 13:04.08 |
| tor8: still a gigantic diff. | 13:05.45 |
| tor8: the same if I switch to oracles: javah version "1.8.0_111" | 13:06.42 |
| tor8: javah version "1.8.0_144" also generates a large diff. | 13:20.31 |
| tor8: and the same one as is in df4f2031b | 13:21.25 |
| tor8: this is with $(sort $(...)) so.. I'm not sure how to fix this. | 13:21.45 |
tor8 | I guess the only way is to do as you say: git rm platform/java/mupdf_native.h | 13:31.05 |
| *but* that will cause problems with Android.mk | 13:31.37 |
sebras | tor8: wow.. I haven't rebuilt -mini for a while and now gradle decided to download a crapload of packages of different versions. I'm still stunned. | 13:33.22 |
| tor8: I was doing make clean in -mini btw.. :-P | 13:33.38 |
tor8 | sebras: alternatively we make sure to run the same version of jdk | 13:42.31 |
sebras | tor8: I run 1.8.0u144 right now. | 13:46.25 |
| tor8: downloaded it just now. | 13:46.49 |
| tor8: http://kn-gloryo.github.io/Build_NDK_AndroidStudio_detail/ | 13:48.07 |
| tor8: https://medium.com/@siantlords/opencv-and-android-ndk-integration-in-android-studio-883a810189e2 | 13:48.09 |
| seems like there is no real consensus on how to configure android studio to autogenerate the headers. :-/ | 13:48.27 |
Guest74051 | Hi sebras, good morning! | 13:48.55 |
sebras | Guest74051: hi bruno. | 13:49.34 |
Guest74051 | I still don't know how to draw an image from png in the right dimensions and aspect ratio. Can you help me? | 13:49.36 |
sebras | Guest74051: did you read the logs over at https://ghostscript.com/mupdfirclogs/2017/08/14.html ? | 13:50.17 |
Guest74051 | Yes, I did. Now I have two methods: the first method draws the image in a page and the second method adds the image directly to resources branch. But, in both cases, the image is distorced and cropped. | 13:54.10 |
| The problem is probably in the matrix. | 13:55.38 |
tor8 | Guest74051: the problem is likely as you said in the matrix | 14:10.49 |
| how do you calculate the matrix? | 14:10.56 |
Guest74051 | fz_document_writer *out = fz_new_document_writer(ctx, "page0.pdf", "pdf", "compress"); | 14:11.40 |
| fz_matrix ctm = *fz_copy_matrix(&ctm, &fz_identity); | 14:11.45 |
tor8 | it should be [ image_width_in_points_on_the_page 0 0 image_height_in_points_on_the_page left_coordinate_in_points bottom_coordinate_in_points ] | 14:11.47 |
Guest74051 | float xs = image->w * 96.0 / image->xres; | 14:11.51 |
| float ys = image->h * 96.0 / image->yres; | 14:11.57 |
| fz_scale(&ctm, xs, ys); | 14:12.01 |
| fz_rect mediabox = {0, 0, xs, ys}; | 14:12.08 |
| fz_device *dev = fz_begin_page(ctx, out, &mediabox); | 14:12.14 |
| fz_fill_image(ctx, dev, image, &ctm, 1); | 14:12.19 |
| fz_end_page(ctx, out); | 14:12.24 |
tor8 | Guest74051: are you sure the image->xres and image->yres values are good (some images have bogus values there) | 14:12.30 |
| Guest74051: I see nothing obviously wrong with that code | 14:12.58 |
Guest74051 | It shows 502 for image->xres and image->yres | 14:12.59 |
tor8 | except the mediabox | 14:13.06 |
| which may or may not be what you want | 14:13.15 |
| 502 dpi? that's pretty high res | 14:13.32 |
Guest74051 | fz_rect mediabox = {0, 0, image->w, image->h}; | 14:14.32 |
tor8 | Guest74051: if you change the mediabox to be { 0, 0, 1000, 1000 } and ctm to be { 1000, 0, 0, 1000, 0, 0 } you should get a full page image which is square (so wrong aspect ratio, but would exclude any other problems) | 14:14.40 |
Guest74051 | I still get cropped image and wrong aspect ratio | 14:15.22 |
tor8 | Guest74051: can you post somewhere the input image and output pdf you get? | 14:15.55 |
Guest74051 | The image has width=4299 and height=6086 | 14:16.35 |
| I just tried with other images and the code actually works. But the initial image that I tried, that is giving problems, is proprietary. | 14:31.02 |
| Is it possible, that giving the dimensions of the image, it is split in tiles? | 14:31.44 |
tor8 | Guest74051: it is possibly a bug in the image loading code then | 14:31.46 |
Guest74051 | is it split in tiles? | 14:31.52 |
tor8 | what does it look like if you just open the image file with mupdf? | 14:32.00 |
| you could also try "mutool convert -o out.pdf input.png" | 14:32.53 |
| and see if that has the same errors | 14:32.59 |
sebras | tor8: I think I have hunted down the javah issue: 7f0a9b9f commit introduced a few changes to the graft api which required 8 lines of mupdf_native.h to change | 14:34.16 |
| tor8: later dd58ea5 introduced lcms2 and updated mupdf_native.h but not in the same was as my javah. | 14:34.50 |
Guest74051 | tor8: Yes, same problem. The image is cropped exactly in half and the aspect ratio is wrong. | 14:37.16 |
sebras | tor8: those changes originate from ddf38c5 and 8174ab2 on robin/lcms. I'm not sure what version of javah he used to fix mupdf_native.h or if robin did the changes manually (presumably not?) | 14:38.11 |
Guest74051 | I did the following experiment: reduced the image to 1000x1416 pixels and the code worked. Then, yes I believe that fz_new_image_from_file is not handling big images properly. | 14:40.46 |
tor8 | Guest74051: if you could open a bug and attach the file, we can mark it as private | 14:47.23 |
| or if you could find a non-private file that exhibits the same problems | 14:47.44 |
sebras | Guest74051: tor8: does https://upload.wikimedia.org/wikipedia/commons/3/39/Periodic_table_large.png exhibit the problem? | 14:48.16 |
| it is supposedly 6000x3300 pixels. | 14:48.29 |
| tor8: so I guess the javah changes stem from the lcms merge and no one noticed because it was in platform/java... ;) | 14:49.23 |
tor8 | sebras: some changes but it looks like the whole file is being rearranged with your commit | 14:50.00 |
| it's moving the "Header for class ...RenderingIntent" section down a few lines, etc | 14:50.50 |
| and changing constant suffixes: | 14:51.18 |
| -#define com_artifex_mupdf_fitz_TryLaterException_serialVersionUID -7034897190745766939i64 | 14:51.22 |
| +#define com_artifex_mupdf_fitz_TryLaterException_serialVersionUID -7034897190745766939LL | 14:51.22 |
sebras | tor8: not my fault: git show ddf38c5 platform/java/mupdf_native.h | tail -20 | 14:51.55 |
| tor8: is this a javah/win vs javah/other issue? | 14:52.15 |
tor8 | quite likely | 14:52.33 |
| I would have said if I didn't generate the same exact file locally | 14:52.50 |
sebras | tor8: https://bugs.openjdk.java.net/browse/JDK-5085132 aw crap, seems like they did this on purpose. | 14:53.12 |
tor8 | ah, hang on. | 14:53.33 |
| I may be confused because on tor/master there is *another* fix for mupdf_native.h | 14:53.46 |
sebras | tor8: that one seems quite similar to mine! | 14:54.25 |
| tor8: but perhaps not identical | 14:54.32 |
tor8 | so yeah, I think we're down to windows vs linux javah | 14:54.41 |
sebras | tor8: apparently there was time when the c compiler on windows didn't support the LL suffix, hence they generate i64 on that platform. | 14:55.20 |
tor8 | commit your change, and we'll poke robin and remind him not to do the mupdf_native.h regen from windows | 14:55.21 |
sebras | tor8: ok. | 14:55.34 |
Guest74051 | The command "mutool convert -o page1.pdf Periodic_table_large.png" converts it properly. My code almost does: it has the correct aspect ratio and it is not cropped, but the image dimensions are smaller than the dimensions of the paper. | 14:55.37 |
tor8 | sebras: it may be that the version of MSVC we use (2005) doesn't... | 14:55.57 |
| but my windows VM is still broken... | 14:56.08 |
sebras | tor8: see the discussion about MSVC on the other channel btw. | 14:56.34 |
| tor8: so, then we are back to thinking that sebras/master is LGTM..? | 14:57.37 |
tor8 | sebras: we are! | 14:58.24 |
sebras | tor8: sweet! I'll push it. | 14:58.32 |
| tor8: btw, we should consider disabling -Wmisleading-indentation somehow. | 15:00.12 |
| tor8: it triggers in lcms. | 15:00.18 |
| tor8: or we could fix lcms perhaps. | 15:00.27 |
tor8 | sebras: we could disable it for lcms in Makethird | 15:05.01 |
sebras | tor8: in 3cbe36b63 I disabled freetypes stream support. this means that freetype no longer will be able to load files from the file system, but you have to supply it with a memory buffer. | 15:07.10 |
| tor8: llpp apparently used this feature in freetype. mupdf doesn't platform/java/mupdf_native.c:load_not() e.g. uses fz_new_font_from_file() which loads to a fz_buffer before creating the font. | 15:08.04 |
tor8 | sebras: we do anyway, fz_new_font_from_file loads the font into a fz_buffer before handing to freetype | 15:08.38 |
sebras | tor8: I wasn't aware that I would be restricting how system font loading would work (and neither does the cluster exercise this bit) | 15:08.39 |
tor8 | it's so we can always get the backing file out for example for pdfwrite | 15:08.50 |
sebras | tor8: ok, now llpp did this: https://github.com/moosotc/llpp/commit/af256f5b4eefcb9c1aacd113906c54f90cc826a3 | 15:09.43 |
tor8 | it shouldn't affect system font loading... | 15:09.43 |
| the fz_load_system_font_fn callback is supposed to return a fz_font... ideally loaded with fz_new_font_from_file | 15:10.06 |
sebras | tor8: ok, let me rephrase that: I wasn't aware that apps were accessing the file system directly to load fonts not related to mupdf using the same library. | 15:11.11 |
tor8 | right. using our linked freetype. | 15:11.37 |
Guest74051 | tor8, sebras: just found the steps to reproduce it | 15:11.42 |
sebras | tor8: yes. | 15:11.42 |
tor8 | for non-mupdf purposes. | 15:11.44 |
| yeah, I see the problem now. | 15:11.49 |
sebras | tor8: quite. | 15:11.49 |
tor8 | I guess we should back that change out then | 15:11.58 |
Guest74051 | How do I upload the image? | 15:12.14 |
sebras | Guest74051: you can attach it to the bug you create at bugs.ghostscript.com | 15:12.37 |
| Guest74051: thanks for taking the time to reproduce with an accessible file btw. most people wouldn't go through the trouble to do so, but it valuable to us. :) | 15:13.14 |
Guest74051 | No problem :) | 15:13.55 |
sebras | tor8: the llpp people also asked me if the fonts embedded in mupdf are accessible through some reasonable interface. | 15:21.10 |
tor8 | sebras: yes, the fz_lookup_xxx_font functions | 15:21.46 |
| the ones in noto.c | 15:21.58 |
| fz_lookup_base14_font, fz_lookup_builtin_font, fz_lookup_cjk_font, fz_lookup_noto_font, etc. | 15:22.45 |
sebras | tor8: alright, I'll tell them. | 15:23.35 |
| tor8: sebras/master LGTMed? | 15:26.09 |
Guest74051 | tor8, sebras: I created the bug https://bugs.ghostscript.com/show_bug.cgi?id=698357#c0 | 15:28.12 |
| I tested on mupdf 1.11, but bugzilla only showed 1.10. | 15:29.09 |
sebras | Guest74051: ok, I just confirmed that I see the bug. thanks. | 15:35.57 |
mvrhel_laptop | tor8: thanks for fixing the oc issue. rebased the branch and it looks good. I see one blend issue I have to fix though... | 16:15.27 |
| but first I am going to fool with seeing how the colors are looking on the ipad | 16:16.06 |
| oh the iOS app was moved to its own repo | 16:40.53 |
| I was wondering what was going on | 16:41.00 |
| Forward 1 day (to 2017/08/16)>>> | |