| 20180515 |
tor8 | sebras: why do you deep copy the resource dict in pdf_filter_page_contents in the non-sanitize case? | 08:10.05 |
Robin_Watts | tor8: Without that he was seeing resources not being copied across. | 08:16.46 |
tor8 | Robin_Watts: different issue, I think. just pdf_keep_obj would suffice, why the *deep* copy is my question | 08:17.20 |
Robin_Watts | tor8: Oh, I see. I'll shut up. | 08:17.37 |
tor8 | and another, larger, question would be "should we also filter out unused resources when only cleaning?" | 08:19.18 |
avih | tor8: do you want the jsarray qsort implementation? | 08:22.18 |
| several weeks ago you'll say you don't mind taking it if i wrote one, and i've pinged you twice on it, but got no comment from you. | 08:23.13 |
| you said* | 08:23.23 |
| i don't mind at all if you implement it or modify my code, it's just orders of magnitude faster than the current sort, and with relatively small LOC, so i think mujs would benefit from it. | 08:26.20 |
tor8 | avih: I've been on vacation, just got back | 08:27.20 |
avih | it also barely uses additional memory - only additional log2(N) on average | 08:27.22 |
moolc | tor8: got the epub problem e-mail? | 08:27.39 |
avih | oh, hope you enjoyed it :) | 08:27.40 |
tor8 | it's supposed to sort in place; I'm not thrilled about creating the temporary 'stack' array | 08:28.15 |
| so I've been meaning to rewrite it, but haven't had time | 08:28.28 |
avih | it does sort in place | 08:28.29 |
| the stack is of yet-to-be-partitioned ranges | 08:28.58 |
tor8 | it would be even faster if not using js arrays for the intermediate arrays | 08:29.00 |
| which I can do if I reimplement it in C | 08:29.19 |
avih | correct, and i did have such implementation, but it's only marginally faster - barely measurable, and you have the additional memory to manage yourself | 08:29.33 |
tor8 | avih: did your C version use js array objects? | 08:30.14 |
avih | the original implementation uses fixed stack array of 2log2(64 bits) items, and just resets the sort on overflow. and because the pivot is random, it'd succeed with this stack size sooner rather than later. however, this doesn't take into account an adversary compare function - which the user can supply, which can cause it to reset over and over and never finish | 08:31.33 |
| so it has to account for the worst case - so the stack must be growable | 08:32.20 |
| tor8: no, the c version used the js alloc function and threw on errors. | 08:32.55 |
| the js array is actually an elegant approach. you get the memory management for free, and it's actually small part of the runtime. most of the time is at the partition function anyway. the stack is used an order of magnitude less than comparisons | 08:34.53 |
| and it's a rather tiny array. for 1000 items the stack array is ~20 items | 08:37.20 |
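A minimal sketch of the explicit-stack quicksort idea discussed above, written in JavaScript for illustration. This is not avih's patch; the function name `stackQuickSort` and the returned `maxDepth` value are hypothetical. It assumes a random pivot and always continues with the smaller partition while pushing the larger one, which is what keeps the stack of yet-to-be-partitioned ranges at roughly log2(N) entries.

```javascript
// Sorts arr in place; returns the largest number of pending ranges
// that were ever on the explicit stack (each range uses two slots).
function stackQuickSort(arr, cmp) {
  cmp = cmp || function (a, b) { return a < b ? -1 : a > b ? 1 : 0; };
  var stack = [];      // flat list of pending [lo, hi] pairs
  var maxDepth = 0;
  var lo = 0, hi = arr.length - 1;
  for (;;) {
    while (lo < hi) {
      // Pick a random pivot and move it to the end of the range.
      var p = lo + Math.floor(Math.random() * (hi - lo + 1));
      var pivot = arr[p]; arr[p] = arr[hi]; arr[hi] = pivot;
      // Lomuto partition: items smaller than the pivot go to the left.
      var store = lo;
      for (var i = lo; i < hi; i++) {
        if (cmp(arr[i], pivot) < 0) {
          var t = arr[i]; arr[i] = arr[store]; arr[store] = t;
          store++;
        }
      }
      arr[hi] = arr[store]; arr[store] = pivot;
      // Push the larger side for later, keep working on the smaller one.
      if (store - lo < hi - store) {
        stack.push(store + 1, hi);
        hi = store - 1;
      } else {
        stack.push(lo, store - 1);
        lo = store + 1;
      }
      if (stack.length / 2 > maxDepth) maxDepth = stack.length / 2;
    }
    if (stack.length === 0) break;
    hi = stack.pop();
    lo = stack.pop();
  }
  return maxDepth;
}
```

Because the range that is processed next is never more than half of the current one, the number of pending ranges grows by at most one per halving, so a growable stack only needs to grow when the input is large, not when the comparison function misbehaves.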
tor8 | avih: yeah. a ~20 item array that doesn't use the full blown js_Object array implementation with properties would be even nicer though. | 08:38.24 |
avih | oh? is there a size limit where it becomes "full blown" array? | 08:39.10 |
tor8 | no. but even a simple 20-element js array will be pretty heavy and slow compared to what we can get away with in C | 08:40.02 |
| it will always do string-to-number and number-to-string conversions and self balancing binary tree accesses for every array access | 08:40.30 |
avih | yes, i do get this | 08:40.37 |
| but iirc i measured it and it's barely measurable | 08:40.48 |
| the diff, that is | 08:40.55 |
tor8 | right. I can see how it could be dwarfed by other things | 08:41.10 |
avih | we're talking (roughly speaking) 300ms vs 320 ms for 10k items array | 08:41.29 |
| (or was it 40k? anyway, very very small diff, and way simpler implementation - worth it in my book) | 08:42.13 |
| sec, let me try to dig up the c implementation and measure again. | 08:44.54 |
| actually, i'll just test it with fixed stack big enough for the sorts i'll try, and throw on overflow. it won't get faster than that with c stack implementation. | 08:46.33 |
moolc | source/pdf/pdf-object.c:1258:16: warning: variable ‘doc’ set but not used [-Wunused-but-set-variable] | 08:58.38 |
tor8 | moolc: saw your email, haven't had time to look yet | 08:59.38 |
avih | tor8: the only diff is at the stack macros at the top: https://pastebin.mozilla.org/9085451 | 09:00.19 |
| for 80k items, the c code is ~680 ms, and the jsarray code is ~710ms (roughly on average with several repetitions) | 09:01.15 |
moolc | tor8: cool | 09:01.33 |
avih | tor8: obviously it could also be implemented with recursion, but then the heap could grow quite big on worst case scenarios. i think the own-stack solution is way nicer. | 09:10.25 |
| way nicer to use the heap for such things imo. | 09:10.53 |
| the stack* could grow quite big... | 09:11.15 |
| (for these 81920 items, the max stack depth was 34) | 09:26.32 |
| fwiw, the array i was testing - without a comparison function so the internal toString based comparison is this, doubled 11 times: | 09:32.44 |
| var arr = ["1000X Radonius Maximus","10X Radonius","200X Radonius","20X Radonius","20X Radonius Prime","30X Radonius","40X Radonius","Allegia 50 Clasteron","Allegia 500 Clasteron","Allegia 51 Clasteron","Allegia 51B Clasteron","Allegia 52 Clasteron","Allegia 60 Clasteron","Alpha 100","Alpha 2","Alpha 200","Alpha 2A","Alpha 2A-8000","Alpha 2A-900","Callisto Morphamax","Callisto Morphamax 500","Callisto Morphamax 5000","Callisto Morphamax 600","Callisto | 09:32.46 |
| Morphamax 700","Callisto Morphamax 7000","Callisto Morphamax 7000 SE","Callisto Morphamax 7000 SE2","QRS-60 Intrinsia Machine","QRS-60F Intrinsia Machine","QRS-62 Intrinsia Machine","QRS-62F Intrinsia Machine","Xiph Xlater 10000","Xiph Xlater 2000","Xiph Xlater 300","Xiph Xlater 40","Xiph Xlater 5","Xiph Xlater 50","Xiph Xlater 500","Xiph Xlater 5000","Xiph Xlater 58"]; | 09:32.46 |
| hmm.. it looked smaller in my editor :) | 09:33.34 |
moolc | avih: as opposed to what? (IOW where does it look larger?) | 09:41.47 |
avih | looks bigger at the irc paste, i.e. sorry for the apam-ish paste :) | 09:42.18 |
| spam*-ish | 09:42.28 |
| in my editor it's 4 lines... | 09:42.44 |
moolc | avih: i pity you... my editor _is_ my irc client | 09:43.09 |
| and mail and news... | 09:43.16 |
avih | you pray to the emacs gods? | 09:43.47 |
moolc | avih: hope https://boblycat.org/~malc/scratch/viper2.png answers your question | 09:45.35 |
avih | well, i don't pity you, even if i should :) | 09:46.20 |
| i say everyone uses what's bet for them. | 09:46.53 |
| best | 09:46.58 |
moolc | avih: we'd be living in paradise were that assertion true | 09:50.38 |
| last i checked we aren't | 09:50.45 |
tor8 | avih: try this on for comparison: https://pastebin.mozilla.org/9085454 | 09:50.47 |
| that's using the system qsort with a temporary js_Value array | 09:50.59 |
avih | moolc: correction: what they think is best for them :) | 09:51.57 |
moolc | tor8: http://ix.io/ | 09:52.03 |
| avih: warmer! :) | 09:52.16 |
avih | :) | 09:52.29 |
| (though it was implied, TBH :) ) | 09:53.09 |
tor8 | moolc: hmm, I shall try to remember that one | 09:54.03 |
moolc | avih: irc is notoriously bad at properly conveying sarcasm innuendo and intentions | 09:54.09 |
| tor8: thought it's right up your alley :) | 09:54.26 |
avih | tor8: wouldn't this break for two js_States in two threads sorting concurrently? | 09:54.33 |
tor8 | avih: yes, hence the TODO: qsort_r | 09:54.55 |
avih | ah | 09:55.03 |
tor8 | but before I start getting into that nastiness, I figured it'd be worth having your input | 09:55.30 |
avih | sure. sec | 09:55.42 |
| tor8: i believe qsort_r/s are way less available than plain qsort. i did consider using the native qsort, and it's possible too with your own context reachable from the items, just wasn't worth the effort IMO. | 09:56.59 |
| tor8: definitely worth it. about 10x faster - ~70ms for 80k items, and on the face of it the sort seems correct. | 10:01.47 |
| (with mingw gcc 7.3 64) | 10:02.30 |
| now i should look at what it does. | 10:03.02 |
| hmm.. so you use O(N) additional memory | 10:05.36 |
| probably worth it though | 10:06.03 |
| it does use "internal" implementation knowledge though. fair enough for a built in implementation, but i _think_ an implementation which uses the mujs API couldn't get away easily with this approach | 10:08.17 |
| but yeah, definitely nice that it bypasses the "official" accessors | 10:09.10 |
| quite the overhead it seems. about time you implement js array as c array for some subsets of js arrays :) | 10:10.42 |
moolc | tor8: btw. have you ever considered using Symbola instead of Charis? it doesn't have variants, but other than that... (on the plus side your symbols will be covered by it too) | 10:13.46 |
avih | tor8: does your implementation account for empty items? i think it doesn't. for an empty item i think it uses the last non empty one. | 10:19.18 |
| a correct and more efficient implementation would be to: pass #1: count the empty and undefined items and collect the non-empty items. pass #2: sort only the non empty ones. pass #3: copy the sorted items to the beginning, #4: append the number of undefined items, #5: clear the rest of the array to the original length | 10:21.45 |
| for very sparse (and big) arrays, such as maybe hash table implementations, sorting only non empty items could yield orders of magnitude faster sort | 10:24.15 |
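The passes avih lists can be sketched in plain JavaScript as follows. This is only an illustration of the ordering (defined items, then undefined, then empty slots), not the mujs C code; `sortSparse` is a hypothetical name, the built-in `sort` stands in for whatever sort is used in pass 2, and the scan here still walks 0..length rather than only the existing properties.

```javascript
// Sorts arr in place so that defined values come first, then the
// undefined values, then the empty slots (holes). Length is unchanged.
function sortSparse(arr, cmp) {
  var len = arr.length;
  var defined = [];
  var undefinedCount = 0;
  // Pass 1: collect defined items, count undefined ones, skip holes.
  for (var i = 0; i < len; i++) {
    if (!(i in arr)) continue;            // empty slot
    if (arr[i] === undefined) undefinedCount++;
    else defined.push(arr[i]);
  }
  // Pass 2: sort only the defined items.
  defined.sort(cmp);
  // Pass 3: copy the sorted items back to the beginning.
  var k = 0;
  for (; k < defined.length; k++) arr[k] = defined[k];
  // Pass 4: append the undefined items.
  for (var u = 0; u < undefinedCount; u++) arr[k++] = undefined;
  // Pass 5: turn everything up to the original length back into holes.
  for (; k < len; k++) delete arr[k];
  return arr;
}
```

With this shape the comparison function is only ever called on defined items, which is where the saving for very sparse arrays would come from.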
tor8 | avih: yes. an "external" implementation can't use the trick of using js_Value directly (since that exposes too many dangerous surfaces for clueless users) | 10:24.35 |
avih | yeah | 10:24.44 |
tor8 | sorting sparse arrays is going to be problematic no matter what | 10:25.24 |
avih | yeah, but it'd be O(N) rather than multiplied by log(N). could be very meaningful | 10:25.57 |
tor8 | but the 'flattening' when copying into the temporary array could possibly handle that reasonably easily | 10:26.01 |
| handling sparse arrays is 'implementation defined' according to the spec | 10:26.15 |
avih | yes, since you go over them anyway. | 10:26.17 |
tor8 | I could iterate over the keys rather than 0..array.length | 10:26.32 |
| but then I should still only be looking at numeric keys | 10:26.47 |
avih | not sure i follow. you do the same allocation (worst case all items are non empty), and same going over the items, but only copy non empty ones, and count empty and undefined. | 10:27.17 |
| copy non empty and non undefined | 10:27.41 |
tor8 | comparing undefined items is fast in this implementation | 10:28.03 |
inflex | Does Tamir Evan show up here very frequently? | 10:28.32 |
tor8 | handling sparse arrays would create a shorter temporary array | 10:28.33 |
avih | yes, but potentially it multiplies the number of comparison by log(N) for no reason. all undefined go after the defined, and before the empty | 10:28.39 |
inflex | Wanted to just pass him an update to fix the git submodule issue we ran in to last night | 10:28.52 |
avih | but yes, if you count the undefined and empty before mallocing the temp array, then sparse array would only add O(defined items) temp memory | 10:30.19 |
| careful that if you realloc the temp array, then you start getting issues with longjmp protections | 10:31.11 |
| i _think_ volatile won't be enough on such case, as the qsort implementation doesn't necessarily with with volatile types | 10:31.49 |
| +work | 10:32.03 |
| (i did go there during my attempts) | 10:32.17 |
| but if you have an efficient way to count the properties, even if it includes undefined but non-empty items, then it would be good enough for sparse arrays. | 10:33.50 |
| tor8: btw, re js arrays as c array, i think i have a reasonably useful approach. not strictly c array - at least not for sparse arrays, but still way more efficient than the current code. the implementation is continuous memory of items ordered by their index value - where the index is part of the item itself. access is binary search for the item, which would be O(1) for non sparse array. splice etc would be implemented with memmov + rewrite of the indices for | 10:46.41 |
| moved items. you'd get O(N) for splice, O(1) for push/pop, O(1) for access of non sparse arrays, and O(log(N)) for access of sparse arrays, but pure integers in c and no need to compare string property names. | 10:46.42 |
| where N is the non empty items. so you get sparse fur 100% free | 10:50.26 |
| for* | 10:50.34 |
| the nice thing about it is that sparse doesn't need special considerations, and for non sparse it _is_ a c array. i think it's really quite nice. | 10:51.32 |
inflex | That's quite elegant. | 10:51.46 |
avih | and if the array is sparse relatively evenly, then that O(log(N)) access will typically become O(1), because you start the search from a position relative to the array length. e.g. if length is 100 and it has 10 non-empty items spread evenly, and you want to access index 80, you start at index 8, and in 1-2 iterations you find your actual item | 10:55.47 |
| actually, the continuous memory would be just the sorted indices where each points to a jsvalue item. it will have a lot less overhead than the current implementation, and splice/copy etc would just be a matter of memcpy/mov and rewriting a bunch of indices without ever touching the actual values | 11:02.30 |
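A rough JavaScript model of the layout avih proposes: a sorted list of integer indices, each paired with a value, searched by binary search that starts from a position proportional to the requested index. Everything here is hypothetical (the `IndexedStore` name, its fields and methods) and it ignores per-property attributes such as readonly or getters and setters, which is exactly the complication tor8 raises next.

```javascript
// indices[] is kept sorted; values[i] belongs to indices[i].
function IndexedStore() {
  this.indices = [];
  this.values = [];
  this.length = 0;   // like array.length: highest index + 1
}

// Returns the slot holding `index`, or the bitwise complement of the
// slot where it would have to be inserted (always negative).
IndexedStore.prototype.find = function (index) {
  var lo = 0, hi = this.indices.length - 1;
  if (hi < 0) return ~0;
  // First probe: scale the index by the fill ratio. For a dense
  // store this lands exactly on the item.
  var mid = Math.min(hi, Math.floor(index * this.indices.length / this.length));
  while (lo <= hi) {
    var cur = this.indices[mid];
    if (cur === index) return mid;
    if (cur < index) lo = mid + 1; else hi = mid - 1;
    mid = (lo + hi) >> 1;   // ordinary binary search from here on
  }
  return ~lo;
};

IndexedStore.prototype.get = function (index) {
  var pos = this.find(index);
  return pos >= 0 ? this.values[pos] : undefined;
};

IndexedStore.prototype.set = function (index, value) {
  var pos = this.find(index);
  if (pos >= 0) {
    this.values[pos] = value;
  } else {
    // Insert while keeping indices sorted; appending at the end
    // (the push case) does not move any existing items.
    this.indices.splice(~pos, 0, index);
    this.values.splice(~pos, 0, value);
  }
  if (index + 1 > this.length) this.length = index + 1;
};
```

Iterating `indices` visits only the items that exist, which is the O(defined items) iteration property mentioned later in the discussion.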
tor8 | using a c array for js array objects has a few issues that complicate matters | 11:03.42 |
| we still need to store key properties since each property can have metadata attributes, like readonly, no-delete, getter/setter accessor functions, etc. | 11:04.42 |
avih | it's not strictly a c array. just continuous memory of index values which point to the actual values. | 11:04.50 |
| right. | 11:05.07 |
| i'm not very familiar with the internal implementation, but i _think_ the approach, in a nutshell, is relatively solid. of course, the devil is always in the details, but still, solid approach is a nice starting point | 11:06.16 |
tor8 | the benefit would be had from accessing the js_Property as a mixture of array and tree lookups, instead of just tree lookups | 11:06.18 |
| take a look in jsvalue.h the struct layouts there should be telling enough | 11:06.35 |
| all values (the stuff stored on the stack) are js_Value structs (which are 16 bytes) | 11:07.40 |
avih | "telling enough" - depends who's listening :) | 11:07.49 |
tor8 | values that point to objects point to a js_Object struct which lives on the heap | 11:08.01 |
| and each js_Object has a js_Property *properties binary tree of properties | 11:08.23 |
avih | i roughly know that much, yes | 11:08.42 |
tor8 | where each js_Property is a string name, some attributes, and a getter/setter | 11:08.43 |
| doing js arrays as c arrays would mean having two structures for holding properties | 11:09.06 |
| a tree, and an array | 11:09.10 |
avih | correct | 11:09.15 |
tor8 | I have experimented with it, but the code got massively more complicated last time I tried | 11:09.38 |
avih | the tree for non array-index properties, and the continuous indices for index items | 11:09.48 |
tor8 | and it didn't cope with all the weird corner cases of property attributes | 11:09.49 |
| but something simpler than I tried then might work (I was hoping to avoid the js_Property stuff altogether) | 11:10.12 |
avih | you tried you mean the arraybuffer branch (or whatever its name was)? | 11:11.10 |
| also, jsproperty could be enhanced a bit to use the "c array" if the object is an array | 11:11.49 |
| tor8: yeah, that's an unfortunately ugly way for qsort context. regardless, i don't get how it handles empty values. what does js_getindex(J, 0, i); do for an empty item? | 11:26.18 |
| (in what you just pushed) | 11:26.33 |
| wouldn't it just use undefined? and then fill the array with undefined value for every empty value? that would be incorrect IMO | 11:28.48 |
| not to mention way more memory used for sparse arrays after sort | 11:29.18 |
tor8 | avih: correct on all points (undefined) | 11:31.10 |
avih | you should just collect the non empty values, then copy them back, then do js_setLength to number of collected items, then back to the original value | 11:31.26 |
tor8 | all js_array functions behave similarly; sparse arrays are not handled well by the js spec | 11:31.30 |
avih | all implementations sort defined items first, then undefined, then empty last. and you could do that easily too | 11:32.08 |
| tor8: (untested) https://pastebin.mozilla.org/9085466 | 11:37.42 |
tor8 | it would mean bloating the sortslot array with yet one more field | 11:38.02 |
| but given its alignment requirements, that's probably not an issue | 11:38.38 |
| so let me give it a try | 11:38.42 |
avih | tor8: sorry https://pastebin.mozilla.org/9085467 | 11:39.06 |
| yes, the temp array is O(len) rather than O(non empty), but not worse than your approach, and if you have an efficient way to count the number of non empty items (or even all the own properties) in O(defined properties), then you can allocate a more efficient amount of memory. | 11:43.06 |
| oh, and btw, one of the biggest advantages of my suggested approach for continuous memory arrays is that iterating them is O(defined items). | 11:45.25 |
| which is highly useful for map, filter, etc. | 11:46.09 |
tor8 | hm, try tor/master | 11:51.34 |
| avih: ^ | 11:51.37 |
avih | tor8: is delindex for the rest more efficient than two setlength? | 11:53.17 |
| (i do understand setlength can imply those delindex or equivalent) | 11:53.49 |
tor8 | no, but it is clearer (and it matches the same behavior as the initial pulling-to-temp-array) | 11:53.54 |
| setlength is slightly optimized, and can be faster than the equivalent delindex loop | 11:54.18 |
avih | then IMO put that as a comment and use setlength | 11:54.46 |
tor8 | setlength has a special case for handling sparse arrays, other than that it still calls delindex behind the scenes | 11:54.48 |
avih | yeah, i assumed so, but possibly can do that with less searching. | 11:55.17 |
tor8 | unfortunately that optimization involves creating an iterator object (which mallocs a lot of stuff, since an iterator has to be stable if properties are deleted while it is running) | 11:56.10 |
avih | behavior wise, i don't think it's different as far as the user can tell. | 11:56.11 |
| gotcha. | 11:56.21 |
tor8 | I was going to say -- premature optimization :) | 11:57.37 |
| now if you want to handle sparse arrays properly, you'd iterate the properties instead of the array length when creating the temporary array too | 11:58.04 |
avih | tor8: i _think_ it doesn't behave fully well with empty items. | 12:01.21 |
| (empirically). trying to come up with a test case. in a nutshell though, it seems it can make empty items disappear or become empty strings. | 12:01.59 |
| tor8: no, it's ok. it's concat which removes empty items. | 12:05.09 |
| without concat, it does "put" the empty ones at the end, just after the undefined ones. | 12:05.37 |
| so you push it to master? | 12:07.04 |
| <tor8> now if you want to handle sparse arrays properly, you'd iterate the properties instead of the array length when creating the temporary array too <-- that's always the best way if you can do so efficiently, isn't it? but in all your iteration function you always go from 0 to len, i.e. including empty items. | 12:09.14 |
tor8 | avih: you'll have to define "iterate the properties" to be more specific -- which properties? ;) | 12:10.24 |
| the own properties, or also those of the inherited objects | 12:10.33 |
| the spec is pretty clear about iterating over the integers, not the properties | 12:11.01 |
avih | hmm.. | 12:12.27 |
| i never thought of array items as inherited, though i guess inheritance should work here the same as everywhere else | 12:13.12 |
tor8 | and the setlength trick will only work for actual Array objects | 12:13.41 |
| not other objects which you can pass to Array.prototype.sort.apply() | 12:13.55 |
| a = Object.create([5,4,3]); a.sort(); a is *not* an array with magic .length handling | 12:15.27 |
| but it has Array.prototype.sort in its prototype chain | 12:15.43 |
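To make tor8's example concrete, here is a small standard-JavaScript demonstration of how such an object differs from a real array. The variable names are only for illustration; the behavior shown is that of ordinary objects and arrays in the language, not anything specific to mujs.

```javascript
var proto = [5, 4, 3];
var a = Object.create(proto);

// a is an ordinary object whose prototype happens to be an array.
var isRealArray = Array.isArray(a);                     // false
var inheritsSort = (a.sort === Array.prototype.sort);   // true
var inheritedLength = a.length;                         // read from proto

// Assigning length on a just creates a plain own property;
// nothing is truncated, and the prototype is untouched.
a.length = 1;
var stillVisible = a[2];          // still reachable through proto
var protoLength = proto.length;

// On a real array the same assignment deletes the trailing items.
var real = [5, 4, 3];
real.length = 1;
var truncated = real[2];          // undefined
```

This is why a clean-up step based on setting the length only behaves as intended for actual Array objects.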
avih | huh | 12:15.53 |
tor8 | trying to be clever when JS is involved is guaranteed to backfire :) | 12:16.36 |
avih | lol | 12:16.42 |
| so you're pushing the delindex thingy to master? | 12:17.00 |
tor8 | I will | 12:17.57 |
| inflex: sorry, missed your question first time around. relative git submodules probably don't work nicely with githubs automated buttons. | 12:20.20 |
| then again, githubs automatic button voodoo seldom does what I want/expect anyway :) | 12:20.53 |
| moolc/malc/malc_: (for the logs) I need the variants, and it looks more like computer modern than Charter | 12:24.07 |
| not to mention the incompatible licensing | 12:26.08 |
inflex | tor8, already sorted it out, just had to do a bit of manual adjustment to the .gitmodules and it's working fine now | 12:37.29 |
| ( made them point directly to the Artifex repos on github - https://github.com/inflex/mupdf/blob/master/.gitmodules ) | 12:38.19 |
sebras | tor8: wrt deep copy vs. keep: yes it probably ought to have been keep. | 13:00.40 |
| tor8: wrt to the question of cleaning out resources or not... I wasn't entirely sure what we wanted, after a quick discussion with robin I understood it like the clean shouldn't clean out the resources while sanitize should. perhaps I was mistaken? | 13:01.40 |
tor8 | 'clean' is intended to pretty-print the syntax | 13:02.39 |
| 'sanitize' does fancy processing and removes redundant state changes | 13:02.54 |
sebras | ok, is sanitize only operating inside content streams? | 13:03.15 |
| because we do cleaning of duplicate objects etc as well. | 13:03.30 |
tor8 | both the 'clean' and 'sanitize' operate on content streams | 13:03.32 |
| and recursively the resources used by the content stream | 13:03.58 |
| so that type3 fonts and patterns and other XObject forms will also have their content streams cleaned/sanitized | 13:04.15 |
sebras | tor8: right, but if we ignore content streams for the time being. clean doesn't remove any other objects that are redundant otherwise, right? | 13:08.06 |
| tor8: if that's the case perhaps we should only clean out redundant resources if we actually sanitize the stream? | 13:08.34 |
| tor8: or add another -d flag..? ;) | 13:08.43 |
tor8 | sebras: the -c flag to mutool clean touches every page's content stream data, nothing else | 13:09.14 |
| sebras: yeah, it's probably fine to just leave the resource dict as-is for '-c' | 13:09.37 |
| but then there is no way to get it to remove unused resources other than the full-blown -s sanitizing filter | 13:09.59 |
sebras | tor8: ah, right. I forgot about the -c flag. | 13:11.00 |
tor8 | this stuff *only* happens when asking for -c or -s | 13:11.20 |
moolc | tor8: i wanted to extract fonts from one pdf and failed.. i'm 99% positive some reincarnation of mutool was able to do that, is my memory once again at fault? | 13:11.30 |
sebras | tor8: mmm, so in that case clean without flags leaves resources and content streams intact, clean -c removes redundant resources but leaves the content stream intact, while clean -cs would remove redundant resources and also sanitize the stream. that seems reasonable somehow. | 13:11.44 |
tor8 | as of today: -c: pretty-print content streams, leave resource dictionary intact. -s: recreate content stream and remove redundant state changes, and remove unused resources from the resource dictionary. | 13:13.01 |
| -s combined with -g will drop unused resources from the file | 13:13.12 |
| I guess -c and -s are conflicting flags | 13:13.42 |
sebras | tor8 sounds to me like -sg and just -s are not different wrt to content streams and their resources (they do affect _other_ type of objects differently of course) | 13:15.10 |
tor8 | clean=syntax|state might be a better way to phrase it | 13:15.21 |
| -s only recreates the /Resources dictionary (by removing stuff that is unreferenced from the content stream) | 13:15.53 |
| -g eventually removes unreferenced resource objects from the file, if nothing else uses them | 13:16.18 |
| moolc: it should still work. | 13:16.43 |
sebras | tor8: right, so with -s the resource objects would still be there but no longer referenced by the resource dict. | 13:17.31 |
| tor8: this is a bit of a mess. :) | 13:17.40 |
tor8 | yes. it is. | 13:17.48 |
| many of these 'mutool clean' flags should just be separately available operations that act on a pdf_document | 13:18.12 |
| not baked into the magic pdf_save_document options | 13:18.22 |
moolc | tor8: what exactly should? '$ mutool huh pdf'? (i've completely forgotten what "huh" should be ;( ) | 13:18.33 |
tor8 | mutool extract | 13:18.40 |
moolc | tor8: and object should be? (root?) | 13:21.29 |
tor8 | https://mupdf.com/docs/manual-mutool-extract.html | 13:22.44 |
sebras | tor8: do you want me to make another commit to replace pdf_deep_copy_obj() with pdf_keep_obj()? | 13:23.39 |
tor8 | sebras: I already have one on tor/master | 13:23.55 |
sebras | tor8: ok. | 13:24.04 |
tor8 | I was just wondering if there was a deeper reason that I didn't understand :) | 13:24.23 |
moolc | tor8: well sure, but i started asking because NOTHING is produced when i run mutool extract on this pdf here | 13:24.40 |
sebras | tor8: no, I was probably just confused as usual. | 13:24.46 |
tor8 | then there are likely no fonts in it. try mutool info. | 13:24.54 |
moolc | tor8: Fonts (4): | 13:25.20 |
| all four are Type0 if that's of any relevance | 13:25.57 |
tor8 | Type3 would be the relevant type ... since they don't have an embedded font file. | 13:26.50 |
| the fonts could also be non-embedded in which case extracting them would be impossible | 13:27.13 |
moolc | tor8: they are embedded | 13:27.39 |
| the one page pdf is a whopping 400K in size, and i can read it just fine | 13:27.54 |
| and llpp reports that they are four subsets of calibri | 13:28.13 |
| Fonts (4): | 13:32.45 |
| 1(3 0 R):Type0 'CIDFont+F1' Identity-H (11 0 R) | 13:32.45 |
| 1(3 0 R):Type0 'CIDFont+F2' Identity-H (19 0 R) | 13:32.45 |
| 1(3 0 R):Type0 'CIDFont+F3' Identity-H (27 0 R) | 13:32.45 |
| 1(3 0 R):Type0 'CIDFont+F4' Identity-H (35 0 R) | 13:32.48 |
| | 13:32.50 |
| is what mutool info says | 13:32.54 |
tor8 | moolc: what does 'mutool show $file 11' say? | 13:35.08 |
moolc | tor8: http://ix.io/1anO | 13:35.49 |
tor8 | ah. it's not working because the FontDescriptor is not a numbered object. | 13:36.19 |
| quoth the specification: FontDescriptor dictionary (Required except for the standard 14 fonts; must be an indirect reference) | 13:36.55 |
moolc | tor8: so, in essence, the pdf producer that msword uses blows goats? | 13:37.28 |
tor8 | moolc: in a word, yes. | 13:38.41 |
moolc | tack *expletive* | 13:39.00 |
tor8 | moolc: you can get the data using 'mutool show' though | 13:39.30 |
| mutool show -b $file 11/DescendantFonts/FontDescriptor/FontFile2 | 13:40.01 |
moolc | tor8: no doubt, but all of this is just a measuring dicks contest between me and an ex co-worker, my cv in pdf form was 10x times smaller than his, and i wanted to know why | 13:40.18 |
| tor8: /tmp | 13:41.38 |
| - ~/x/rcs/git/mupdf/build/native/mutool show -b $file 11/DescendantFonts/FontDescriptor/FontFile2 | 13:41.39 |
| null | 13:41.39 |
| | 13:41.39 |
| tor8: https://boblycat.org/~malc/scratch/bravocnntypographers.png | 13:44.02 |
sebras | tor8: 10-line bugfix on sebras/master | 13:47.40 |
| tor8: it clusters well. | 14:13.20 |
tor8 | sebras: hmm. do you have a file for that? | 14:14.43 |
| sebras: it might make sense to just assume a sane default if it's set to 0 instead | 14:14.59 |
| like 1000 or 2048 (type1/truetype default values) | 14:15.09 |
sebras | tor8: ok, that's why I asked for a review. :) | 14:16.11 |
| tor8: the test file is in the bug report, but it is a fuzzed file, so I doubt it will make you happy. | 14:16.32 |
tor8 | if (units_per_EM == 0) units_per_EM = (ft_kind(face) == TRUETYPE) ? 2048 : 1000; should do the trick I think | 14:20.56 |
sebras | tor8: no warning? | 14:23.19 |
tor8 | sebras: nah. nobody looks at warnings anyway... :P | 14:23.39 |
| I think you could probably just get away with setting it to 2048 no ft_kind check required | 14:24.14 |
sebras | learns that he's a nobody. | 14:24.15 |
tor8 | I just need to check the freetype implementation to see where it gets/sets the units_per_EM for non-truetype files | 14:24.35 |
| I suspect it can't be 0 for type1/cff files | 14:25.27 |
| so we only need to worry about the truetype case (where 2048 is a decent fall-back) | 14:25.42 |
| if (units_per_EM == 0) units_per_EM = 2048 (with or without warning) | 14:26.06 |
sebras | tor8: new commit, clustering as we speak. | 14:29.26 |
tor8 | sebras: you don't need the (FT_Face) cast. font->ft_face is a void* | 14:30.51 |
sebras | tor8: what cast? | 14:31.43 |
| tor8: look again. | 14:31.47 |
tor8 | FT_Face face = (FT_Face) fontdesc->font->ft_face; | 14:31.50 |
sebras | tor8: that's not the code from sebras/master... ;) | 14:32.01 |
tor8 | not now it isn't... :) | 14:32.29 |
| sebras: LGTM. | 14:37.34 |
sebras | tor8: done! | 14:39.26 |
avih | tor8: btw, some numbers comparing the new native sort with pure js implementation for 10k items of pure strings: 1. for default toString sort, the native is ~40 times faster than the js sort (8ms / 300ms). 2. for a trivial compare function (return a > b ? 1 ... ) it's 10x faster (20 ms / 200 ms - weird, not sure why it's 200 here and 300 without a function), and 3. for a slightly less trivial but still fairly fast compare function it's only twice as fast (150ms/300ms). | 15:22.56 |
| that's the "internal" compare function which yields 300ms: return (a = ''+a) > (b = ''+b) ? 1 : a == b ? 0 : -1;, and that's the external trivial compare function which yields 200ms: return a > b ? 1 : a == b ? 0 : -1; | 15:25.29 |
| i guess the assignment and creation of new string is expensive-ish, though it's required if one doesn't know in advance the items are strings. | 15:26.26 |
| maybe it could be special cased (concat of string and an empty string) to avoid creation of a new value on such case | 15:27.32 |
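The two compare functions avih quotes differ in more than speed when the items are not all strings. A small standard-JavaScript illustration (the function names `coercingCompare` and `trivialCompare` are made up here; the bodies are the ones quoted above):

```javascript
// Converts both sides to strings first, like the default sort does.
function coercingCompare(a, b) {
  return (a = '' + a) > (b = '' + b) ? 1 : a == b ? 0 : -1;
}

// Compares the values as they are, with no conversion.
function trivialCompare(a, b) {
  return a > b ? 1 : a == b ? 0 : -1;
}

// With numbers the two orders differ: "10" sorts before "9" as a string.
var byString = [10, 9, 1].sort(coercingCompare);  // [1, 10, 9]
var byValue = [10, 9, 1].sort(trivialCompare);    // [1, 9, 10]
```

So the string conversion is what makes the first function safe as a stand-in for the default sort when the item types are not known in advance, at the cost of creating new strings on every comparison.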
| anyway, just fyi. thanks for adding qsort :) | 15:28.56 |
inflex | Tamir_Evan, I sorted out that .gitmodules problem for myself. Not sure if you care, but I can share the file if you preferred. | 17:23.04 |
Tamir_Evan | inflex: I saw the changes you made, and will probably do something similar in my own repo in the near future. | 17:23.27 |
inflex | np, thanks again for your work, things have progressed nicely | 17:23.46 |
Tamir_Evan | inflex: Thank you for bringing the issue with the repo forking to my attention, and for making use of my repo. | 17:25.49 |
inflex | Well, wasn't really an issue with your fork per se, more just seems the way the original mupdf was done. btw, what sort of changes did you have to do in order to get MinGW to build the GL version on Windows? | 17:27.13 |
Tamir_Evan | inflex: It was mainly changes to the 'Makethird', and a few lines in the 'Makefile'. They were mainly done here: https://github.com/TamirEvan/mupdf/commit/019f3a09e7adb3b5c023b2067c3e43af8050b33d , but some of the changes were done in other commits (both before and after). | 17:42.07 |
inflex | okay, so not overly dramatic, but certainly important. Surprised it wasn't in there by default. | 17:45.24 |
pihug12 | Hi! I was wondering if "mupdf-android-viewer-1.13.0-universal.apk" & "mupdf-android-viewer-mini-1.13.0-universal.apk" on https://mupdf.com/downloads/ were generated from the same Git project? | 20:17.33 |
tor8 | pihug12: no, they are created from separate git repositories. | 20:30.21 |
pihug12 | - http://git.ghostscript.com/?p=mupdf-android-viewer-mini.git --> last commit for v1.13 | 20:35.55 |
| - http://git.ghostscript.com/?p=mupdf-android-viewer.git --> last commit for v1.12 | 20:36.05 |
| The first APK is built with the 2nd repository despite the "wrong" version? | 20:37.02 |
tor8 | pihug12: try pulling again. | 20:37.54 |
pihug12 | Seems good now :) | 20:40.16 |
| And the tag is missing for v1.13 in the "mupdf-android-viewer-mini.git" repository | 20:41.00 |
| The version from the APKs seem to be still v1.12. Some strings may need to be updated in these files: | 21:00.14 |
| - http://git.ghostscript.com/?p=mupdf-android-viewer.git;a=blob;f=app/build.gradle;hb=HEAD | 21:00.25 |
| - http://git.ghostscript.com/?p=mupdf-android-viewer-mini.git;a=blob;f=app/build.gradle;hb=HEAD | 21:00.33 |
tor8 | pihug12: hm, yes, seems like a number or two has been missed | 21:08.27 |
pihug12 | Thanks! | 21:11.35 |
tor8 | I'll rebuild the apk binaries tomorrow. | 21:11.56 |
pihug12 | Is it possible to put the 1.13.0 tag on these last 2 commits? | 21:11.56 |
| I think F-Droid builds are based on tags | 21:12.42 |
tor8 | pihug12: yeah, no problem. | 21:12.51 |
pihug12 | Cool! Perfect! | 21:13.49 |
| Thanks for your time. I will check with F-Droid now. | 21:14.13 |