| <<<Back 1 day (to 2016/10/12) | 20161013 |
sebras | Robin_Watts: so w is the u plane and you << a single step each time through the loop and compares this to the largest w for any component. | 10:40.04 |
| Robin_Watts: and i is how many times you shifted. | 10:40.34 |
| Robin_Watts: what is size? | 10:40.35 |
Robin_Watts | sebras: yes. | 10:40.36 |
| size allows for the fact that when you shift up you don't get exactly what you expect. | 10:41.02 |
| i.e. if I have a U that's 129 wide, you'd naively expect the Y to be 129<<1 wide. | 10:41.41 |
| but it might actually be (129<<1)-1 wide. | 10:41.59 |
| or it might be (129<<2) wide, or (129<<2)-{1,2,3} wide. | 10:42.32 |
sebras | ok, so the computation is never further from the correct max_w than the value in size. | 10:42.38 |
Robin_Watts | Though I think my test should be w <= max_w - size | 10:42.39 |
| or something. hmm. | 10:42.39 |
sebras | Robin_Watts: let me put it like this. we have already computed i, and this is returned. | 10:42.40 |
| Robin_Watts: why is it important that we throw if w < max_w - size? | 10:42.41 |
Robin_Watts | sebras: Suppose the largest component is 256 wide. | 10:42.42 |
sebras | Robin_Watts: are we risking stepping outside the destination samples buffer? | 10:42.42 |
| ok, 256 samples wide. | 10:42.42 |
Robin_Watts | we can accept smaller components of 128, 64, 32 size. | 10:42.42 |
| The intention is that I whinge if we get one of (say) 96 wide. | 10:42.42 |
sebras | Robin_Watts: right. | 10:42.42 |
Robin_Watts | ok, that second test is just broken. | 10:42.42 |
| if (w >= max_w + size) | 10:42.42 |
| that might work better. | 10:42.43 |
sebras | Robin_Watts: can we do while (w < max_w) and do the check and fz_throw() after the loop? | 10:42.49 |
| right now there is no need for the trailing return i;, we can never end up there. | 10:42.50 |
Robin_Watts | yes, nicer. | 10:42.50 |
sebras | Robin_Watts: so if I understand this correctly: if the widest component is 256 samples, and one of the subsampled components is 64 samples wide w == 256, size == 4 and i == 2 | 10:42.52 |
| but we would also accept: a sub sampled component of 65, where w == 260, size = 4 and i == 2..? | 10:42.52 |
Robin_Watts | Updated version with new code that's hopefully easier to follow. | 10:44.21 |
sebras | Robin_Watts: I'm still not confident I truely understand the if-statement. | 10:56.05 |
| so basically you determine the number of shifts necessary to get w up to at least max_w size (possibly bigger) | 10:56.26 |
Robin_Watts | And then make sure it's not too much bigger. | 10:56.52 |
sebras | then you say, ok. take the width of the component that is subsampled, subtract one and shift it equally far. this may not be at least a big or bigger than the widest component. | 10:57.09 |
| why -1? | 10:57.45 |
| note that the code might be perfectly fine, this might just be me needting to fold my brain paths correctly to understand it. | 10:58.25 |
Robin_Watts | Ah, wait, I have a new formulation. | 10:58.52 |
| Try now. | 11:00.13 |
sebras | Robin_Watts: the reason for me asking for this is because I want to implement subsampling support for tiff, and I had trouble doing so. | 11:00.50 |
| Robin_Watts: so I'm going to steal your code later. | 11:01.00 |
Robin_Watts | sebras: I may already have that implemented for SOT. | 11:01.09 |
| I moved the tiff stuff over to SOT. | 11:01.21 |
| and in so doing, I had to rewrite various bits of it to avoid the need for decode_tile etc. | 11:01.36 |
| I moved a lot of stuff to working on streams. | 11:01.55 |
| I plan to port that back at some stage. | 11:02.09 |
sebras | ok, so now you are saying: take the widest component, pretend to subsample that, if the that width matche that of the relevant component then we konw how many steps to subsample. but if we cannot find a subsample depth that corresponds with the widest component, error out. | 11:06.09 |
| now I get it. :) | 11:06.16 |
Robin_Watts | cool. | 11:06.20 |
sebras | that probably means that the code now is inefficient. ;) | 11:06.28 |
| given the last few days of discussions. ;) | 11:06.45 |
| Robin_Watts: why is jpx->comps[k].data indexed by yy? this is the location where we read the data from. | 11:13.00 |
| Robin_Watts: isn't the component data packed more tightly for the subsampled components? | 11:13.18 |
Robin_Watts | The *old* code indexes using v = jpx->comps[k].data[y * w + x]; | 11:16.14 |
| The new code therefore needs to use v = jpx->comps[k].data[(y>>sw) * w + (x>>sh)]; | 11:16.39 |
| I avoid redoing lots of that calculation, by setting yy = (s<<sw) and data = &jpx->comps[k].data[yy]; | 11:17.24 |
| then I can just do: v = data[s>>sh]; for each pixel. | 11:17.39 |
sebras | so the subsampled component values are not packed tightly inside data? | 11:17.52 |
Robin_Watts | sebras: I don't understand. | 11:18.18 |
sebras | Robin_Watts: I.e. it looks something like: sample nothing nothing nothing sample nothing nothing nothing sample if that component is subsampled 2 times..? | 11:18.35 |
Robin_Watts | Noooo. | 11:18.42 |
sebras | Robin_Watts: right, if it did then I'd be surprised. | 11:18.56 |
Robin_Watts | x and y run to the unsampled width. | 11:19.12 |
| (and height0 | 11:19.17 |
| so (x>>sw) runs from 0 to the subsampled width. | 11:19.32 |
sebras | Robin_Watts: but I can't see how the contents of the subsampled component can be: sample0 sample1 sample2 and we index in the array using the sample indicies as for the widest component. | 11:19.41 |
Robin_Watts | data[x] would be indexing as if we used the widest component. | 11:20.02 |
| data[x>>sw] is indexing as if we use the subsampled component. | 11:20.23 |
| Note that the old code nested the loops y, x, k. | 11:20.41 |
| The new code nests them y, k, x. | 11:20.50 |
sebras | yes, I know. | 11:21.06 |
| oh! I'm so silly. >> divides, << multiplies. :) | 11:21.37 |
| now it all makes sense. | 11:21.42 |
Robin_Watts | :) | 11:21.45 |
sebras | I has looking at the l2subfactor() and was blindly thinking that >> was shifting left... | 11:22.14 |
| Robin_Watts: ok, so now it truely LGTM. :) | 11:27.49 |
Robin_Watts | Thanks. | 11:29.38 |
| Does the openjpeg code in gs cope with subsampled things? | 12:25.27 |
kens | No clue | 12:29.10 |
| Do you have an exmple ? | 12:29.19 |
Robin_Watts | I do, I'll run a test in a mo, but I figured sebras might know. | 12:33.54 |
| evidently it does. | 12:35.02 |
sebras | Robin_Watts: but not in the luratech case. | 12:37.05 |
Robin_Watts | The results in gs are awful for tests_private/pdf/sumatra/jpx_image_components_have_different_dimensions.pdf | 12:38.43 |
| I wonder if it's subsampling all the planes to match rather than upsampling the subsampled ones? | 12:38.59 |
sebras | Robin_Watts: using luratech or openjpeg? | 12:39.25 |
Robin_Watts | openjpeg | 12:39.36 |
| So the jpx image is 480x640. | 12:42.33 |
| If I use -g480x640 -dFitToPage I should see it 1:1, right? I must be getting that wrong, cos it's cropped. | 12:43.00 |
| -dPDFFitPage, d'oh. | 12:44.12 |
| ok, that looks horribly subsampled. | 12:44.19 |
sebras | Robin_Watts: can you help me an look at the log from my latest cluster push. | 14:05.40 |
| Robin_Watts: seems like quite a number of .jp2-files are different from trunk. | 14:05.59 |
Robin_Watts | ok... | 14:06.01 |
| Have you pulled in my changes? | 14:06.14 |
sebras | Robin_Watts: not if their are not on master. | 14:06.26 |
Robin_Watts | They are on golden master | 14:06.32 |
sebras | Robin_Watts: ok. then they are in my run. | 14:06.44 |
Robin_Watts | No, they aren't :) | 14:07.15 |
sebras | Robin_Watts: so then why could my clusterpush (consisting of pnm-changes and a leak-fix) claim differences in the errors from j2k-files? | 14:07.17 |
Robin_Watts | If you look in the diffs shown at the bottom of the log, all my subsampling and ycc->rgb changes are showing as missing in your cluster code. | 14:07.48 |
sebras | Robin_Watts: oh. | 14:12.46 |
| Robin_Watts: tor8: up for grabs: a number of pnm commits and a leak fix on sebras/master | 14:14.06 |
Robin_Watts | sebras: *dp++ = v ? 0 : 0xff; ? | 14:15.49 |
| if (cond) *dp++ = foo; else *dp++ = bar; might be better expressed as: *dp++ = cond ? foo : bar; | 14:16.58 |
| dunno if all compilers are smart enough to optimise the two the same. | 14:17.18 |
| The ==255 case is by far the most common, so I don't necessarily see the aesthetic argument for not having it as the first one in the if chain. | 14:18.45 |
| if (pnm->maxval == 255) memcpy(dp, p, pnm->width * pnm->height * img->n); ? | 14:20.23 |
| (I know that some of that code predates you) | 14:22.02 |
| if (pnm->maxval < 0 || pnm->maxval >= 65536) | 14:22.59 |
| I submit that pnm->maxval == 0 presents problems too :) | 14:23.12 |
| If we get multiple TUPLTTYPE lines, should we not at least warn? | 14:25.42 |
| Why are we strdupping tupltype rather than using an enum ? | 14:26.34 |
sebras | is still working on the memcpy... | 14:28.02 |
Robin_Watts | You've seen cmyk pnm files, with alpha? | 14:28.56 |
sebras | Robin_Watts: I didn't see any cmyk files at all. | 14:29.27 |
| also creating files using alpha is quite difficult. | 14:29.41 |
| but doable. | 14:29.43 |
Robin_Watts | Then I don't understand 32eeb29c2091 | 14:29.49 |
sebras | Robin_Watts: I haven't been looking for them but they are in the spec. | 14:30.13 |
| so I created one from scratch. | 14:30.31 |
Robin_Watts | This code all suffers from a pet peeve of mine in that we use foo->bar as an upper loop bound, and then do pointer stuff in the loop. | 14:32.15 |
| int n = img->n; for (k =0; k < n; k++) .... is better IMHO than for (k=0; k < img->n; k++) ... | 14:33.30 |
| cos the moment you hit any memory in the .... bit, the compiler has to assume that img->n may have changed, and so loads it again. | 14:34.02 |
| and in the last commit I disapprove of the changes to fz_drop_glyph_cache_context | 14:37.13 |
sebras | TUPLTYPE is specified weirdly in PAM. the spec says you should concatenate all TUPLTYPE lines, but doesn't really say what happens with the samples if you have TUPLTYPE RGB\nTUPLTYPE GRAYSCALE\n. in what order do the samples come? R G B gr? maybe only gr (i.e. the 1st line is ignored)? maybe only R G B (i.e. the 2nd line is ignored) | 14:38.55 |
Robin_Watts | sebras: Wow, broken specs R us. | 14:39.38 |
sebras | also netpbm itself checks for TUPLTYPE == "RGB" e.g. so... | 14:39.59 |
Robin_Watts | I can only imagine: | 14:40.18 |
sebras | maybe we should just error out if we see those..? | 14:40.22 |
Robin_Watts | TUPLETYPE R | 14:40.44 |
| TUPLETYPE G | 14:40.45 |
| TUPLETYPE B | 14:40.47 |
| to get 'RGB' maybe? | 14:40.56 |
| but in the absence of examples I'd error out, I think. | 14:41.06 |
sebras | Robin_Watts: if I also make tupltype an enum the risk of leaks vanish. | 14:42.46 |
Robin_Watts | sebras: yeah. | 14:43.07 |
| and it's not like we can actually cope with any types other than the fixed (enummed) set we know about. | 14:44.00 |
| i.e. we can't hold onto the TUPLTYPE in order to parrot it out again later. | 14:44.27 |
sebras | Robin_Watts: sebras/master updated according to review. | 16:33.18 |
| Robin_Watts: including fz_drop_glyph_cache() which may now be to your liking..? | 16:33.29 |
| you never really said what was wrong, but I took a guess. | 16:33.41 |
Robin_Watts | unnecessary locking. | 16:33.51 |
sebras | Robin_Watts: I also converted the header token parsing to use the enum stuff. | 16:36.07 |
| I anticipated you would comment on that next. | 16:36.13 |
| and I think all the -> in for loops are gond. | 16:36.23 |
Robin_Watts | sebras: Will look in a mo, just in the middle of talking to mvrhel. | 16:36.34 |
sebras | np. | 16:36.48 |
Robin_Watts | right. | 16:53.42 |
| sebras: First commit looks lovely now. | 16:55.00 |
| We *could* improve it further by having loops that count down rather than up, but let's not hold it up for that. | 16:55.30 |
| second commit is good too. | 16:56.47 |
| The use of memcmp in the third worries me a bit. | 16:57.41 |
| Why memcmp, not strcmp ? | 16:57.46 |
| Suppose s points to "A" (i.e. a 1 char string). | 16:58.03 |
| memcmp("A", "WIDTH", 5) will overrun the "A" buffer | 16:58.39 |
| strncmp would be better ? | 16:58.49 |
| strncmp(s, tokens[i].str, len) == 0 && isspace(s[len]) ? | 16:59.36 |
sebras | because the input buffer is not necessarily nul-terminated. | 17:00.14 |
Robin_Watts | ok, so strncmp, not strcmp. | 17:00.39 |
sebras | oh. | 17:00.47 |
| ok. | 17:00.48 |
Robin_Watts | strncmp stops on a 0, or after the maximum length, I think. | 17:00.59 |
sebras | yes. | 17:01.11 |
Robin_Watts | and we should check that the char after the matching token is whitespace. | 17:01.26 |
sebras | makes sense. | 17:02.00 |
Robin_Watts | (but otherwise, a nice looking commit) | 17:02.23 |
| same comment for the next commit. | 17:02.30 |
sebras | hm.. wait. we already know that the char after the token is white, eol or eof. | 17:03.24 |
Robin_Watts | We do? | 17:03.36 |
sebras | because I while on iswhiteeol() to find the end of the token. | 17:03.36 |
Robin_Watts | sebras: Ah. If you use strncmp you don't need that. | 17:04.10 |
| You skip any whitespace you find, then strncmp looking for matches. If you match, you know the length from the match. | 17:04.51 |
| You miss the nice optimisation about checking len == len though that way... | 17:05.20 |
| yeah, either way is fine. | 17:05.40 |
| I still don't quite understand the premultiply alpha one. | 17:05.56 |
| but all the others look good. | 17:06.30 |
sebras | hm.. isn't it a problem to use strncmp() when the strings I'm comparing the input to are prefixes of each other? | 17:06.39 |
| i.e. BLACKANDWHITE is a prefix of BLACKANDWHITE_ALPHA | 17:07.04 |
Robin_Watts | not if you continue to check your lengths first. | 17:07.05 |
| so your counting of the len and then checking len == len solves that. | 17:07.27 |
| so your way wins :) | 17:07.37 |
sebras | it's rare, but it happens! | 17:08.00 |
Robin_Watts | The last commit looks like it changes what happens in fz_drop_font_context. | 17:09.02 |
sebras | Robin_Watts: why are we dropping the ctx->font->symbol before the &ctx->font->ctx_refs == 0 ? | 17:09.57 |
Robin_Watts | I think your new formulation may be correct though :) | 17:10.13 |
sebras | ok. | 17:10.18 |
| * Only drop font context objects when dropping the last reference. | 17:10.27 |
| this is in the commit message. | 17:10.30 |
Robin_Watts | I wish I understood why it was written the other way. | 17:10.41 |
sebras | mistake? | 17:10.58 |
Robin_Watts | Ah, it was a commit of tors, and you know how you can't trust his code :) | 17:11.51 |
| Yeah, I suspect it was a slip. | 17:12.02 |
sebras | Robin_Watts: commit 52ed5832db9ea4bcc9dcfd2f0391af89f5aadd95 introduces the dropping of things before the call to fz_drop_imp() | 17:12.02 |
Robin_Watts | So lgtm. | 17:12.14 |
sebras | /* CMYK is a subtractive colorspace, we want additive for premul alpha */ | 17:12.57 |
| this is what load-tiff says before doing premultiply. | 17:13.07 |
| this is the reason that many image loaders convert cmyk to rgb before calling premultiply. | 17:13.51 |
Robin_Watts | Ah! | 17:14.02 |
| Copy that same comment into your commit, and I'm happy. | 17:14.36 |
sebras | Robin_Watts: new version up (and being clustered) | 17:15.43 |
Robin_Watts | All lgtm now. | 17:17.30 |
sebras | Robin_Watts: one more fetch please. | 17:20.03 |
| Robin_Watts: didn't notice gcc complaint about types. | 17:21.29 |
| we have too many warnings when recompling... :-( | 17:21.38 |
Robin_Watts | sebras: We don't have that many do we? Mainly in thirdparty stuff? | 17:22.35 |
| MSVC is warning free in the non third-party stuff (except for 2 in memento) | 17:22.53 |
| what changed? :) | 17:23.18 |
sebras | Robin_Watts: (char *) when passing buffer to strncmp() | 17:23.32 |
Robin_Watts | oic. OK. | 17:23.54 |
| lgtm. | 17:23.59 |
sebras | I get two warnings in load-jpx (presumably because of the new code drop), one in gl-input.c and 107 warnings in in pdf-cmap-table.c and one in curl (thirdparty, but not compiled with the thirdparty stuff for reasons unknown). | 17:26.25 |
| Robin_Watts: I think 726.j2k and p0_13.j2k are indeterminisms with openjpeg. | 17:27.14 |
Robin_Watts | curl is outside our control. | 17:27.19 |
| sebras: ooh. | 17:27.27 |
sebras | Robin_Watts: what did you find? | 17:27.57 |
Robin_Watts | was commenting on indeterminisms. | 17:28.28 |
sebras | ah. | 17:28.52 |
| Robin_Watts: ok, then I push the pnm-stuff and the context dropping. | 17:32.15 |
Robin_Watts | cool. | 17:34.18 |
sebras | Robin_Watts: tor8: why do we not have OPENSSL in Makethird? | 18:01.51 |
| are we conciously deciding not to make it easy to compile mupdf using OPENSSL? | 18:02.23 |
Robin_Watts | sebras: the openssl stuff is too large and hairy to have as a thirdparty dependency. | 18:05.56 |
| Having the build rules in the Makefile in case someone DOES import it would be good though. | 18:06.19 |
| paul did the fiddling with that though. | 18:07.10 |
sebras | Robin_Watts: yeah, and I was looking at a bug report concerning pauls code. | 18:07.27 |
| and discovered that I have never built mupdf using openssl despite having it installed on my system. | 18:07.45 |
| maybe we want to avoid using the system's openssl too..? | 18:07.54 |
| Robin_Watts: http://git.ghostscript.com/?p=user/sebras/mupdf.git;a=commitdiff;h=65059bf99be52e80f282b75cd43c0108d8f91616 this makes me compile using my system's openssl all the time. | 18:08.59 |
| Robin_Watts: but maybe we don't want to make that the default? | 18:09.25 |
Robin_Watts | I'm going to defer to tor8 on this. | 18:14.06 |
tor8 | sebras: Robin_Watts: because openssl has a hairy build system and cannot be built by simply compiling the source files | 22:59.43 |
| openssl does not exist on macosx | 23:00.29 |
| Forward 1 day (to 2016/10/14)>>> | |