| | 20160701 |
sebras_ | tor8: a trivial patch on sebras/master for you | 10:53.13 |
tor8 | sebras_: case insensitivity patch LGTM | 10:53.47 |
| sebras_: fz_load_jpx LGTM as well | 10:54.29 |
sebras_ | tor8: excellent! I'll reorder the patches so that is first in the queue in that case. | 10:54.38 |
| tor8: yeah, re: jpx: I've reduced the set of failing .j2k-files to about 20 now. I can see differences between the old rendering and new rendering though. | 10:54.59 |
| tor8: sometimes luratech improves upon openjpeg and is equal to gs, but sometimes it looks wrong. seems like quite a few of the testcases that fail have a linear gradient superimposed on the images, and that is not handled correctly. | 10:56.12 |
| tor8: why those are handled correctly when you use openjpeg I don't know (yet). | 10:56.27 |
| tor8: the trivial patch on sebras/master is about adding support for J2K/JP2 to CBZ. that makes it much easier to test things rather than iterating over subdirectories. Yes I'm lazy. :) | 10:58.40 |
| ok. have to head out for dinner. but I'll be back to read the logs later! thanks for the prompt reviews so far! it's appreciated. :) | 10:59.13 |
tor8 | Robin_Watts: got a minute? | 11:07.18 |
Robin_Watts | sure. | 11:07.25 |
tor8 | in pdf-write.c, removeduplicateobjects | 11:07.43 |
| after a = pdf_get_xref_entry | 11:07.55 |
| we do a = pdf_resolve_indirect | 11:08.02 |
| that seems wrong to me | 11:08.14 |
Robin_Watts | Can an xref object never be a duplicate? | 11:09.42 |
| There wouldn't seem to be much benefit in having one as being a duplicate, but I don't know that it's illegal. | 11:10.07 |
| and we want to compare the eventual object that's pointed to, right? | 11:10.23 |
tor8 | we're comparing the actual numbered objects | 11:10.45 |
| 5 0 obj 66 0 R endobj, 6 0 obj 66 0 R endobj | 11:11.52 |
| for instance, here a and b after get_xref_entry will be two separate pdf_obj's with the same contents (both being indirect references) | 11:12.22 |
| I think we want to compare objects 5 and 6, rather than dereference and compare what they point to | 11:12.44 |
| and I also think that anybody who creates pdf objects that are just references to other objects, ought to be disqualified from being a programmer :P | 11:13.25 |
| Robin_Watts: anyway, I noticed this while digging through the code trying to simplify and make more robust how object numbers and generation numbers are handled | 11:13.57 |
Robin_Watts | In that example, if we fetch 5 and 6 as a and b, and resolve them, they both end up as object 66. | 11:15.04 |
tor8 | yes. | 11:15.13 |
Robin_Watts | so they'll be spotted as duplicates and one will be dropped. | 11:15.14 |
| How is that not the right thing? | 11:15.19 |
tor8 | the same will happen whether we resolve them or not | 11:15.26 |
| but resolving them, we compare the pointed to object (but not comparing the streams) | 11:15.53 |
Robin_Watts | OK. So consider the case where we are comparing object 5 and object 66. | 11:16.07 |
tor8 | so if 5 and 6 point to stream dictionaries 55 and 66 which both have the same length | 11:16.20 |
Robin_Watts | If we don't resolve them, then they will be taken as unequal, and we won't drop the duplicate. | 11:16.35 |
tor8 | we might accidentally drop one of them | 11:16.42 |
| Robin_Watts: no? | 11:16.54 |
Robin_Watts | we compare the streams explicitly. | 11:16.59 |
tor8 | they're both 66 0 R | 11:17.00 |
Robin_Watts | Suppose we are comparing objects 5 and 66. | 11:17.23 |
tor8 | 5 0 obj 55 0 R endobj, 6 0 obj 66 0 R endobj, 55 0 obj << /Length 1 >> stream A endstream, 66 0 obj << /Length 1 >> stream B endstream | 11:17.36 |
| that might go wrong | 11:17.40 |
Robin_Watts | a direct pdf_objcmp of 5 and 66 will return 'different'. | 11:17.54 |
tor8 | since we'll compare object 55's dictionary with object 66's dictionary, but the streams of object 5 and 6, when comparing objects 5 and 6 | 11:18.20 |
| they'll look the same, and we'll drop object 6 | 11:18.25 |
Robin_Watts | tor8: OK, so the pdf_is_stream tests should be done on the resolved a and b ? | 11:21.07 |
tor8 | I think we should not resolve a and b at all | 11:21.38 |
| comparing 5 and 6 directly, without resolving, (in the case where they both point to the same object number) will do the right thing | 11:22.10 |
| 5 0 obj 66 0 R endobj, 6 0 obj 66 0 R endobj will compare 66 0 R and 66 0 R (different pointers, same content) | 11:22.31 |
| different pdf_obj*, same content | 11:22.45 |
Robin_Watts | Yes, the 5 and 6 case works, but the 5 and 66 case does not. | 11:23.17 |
| The only fully correct solution I can see is to resolve, and to check pdf_is_stream on the resolved thing. | 11:24.00 |
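The three strategies discussed above can be compared in a small toy model. This is only a sketch in Python, not MuPDF code: the tuple layout, the helper names (`resolve`, `body`, `stream_of`) and the three `dup_*` functions are all hypothetical, and the only assumption carried over from the log is that the structural compare looks at the object body and never at stream data.

```python
# Toy model, not MuPDF code. A numbered object is either
#   ("ref", target_number)  or  ("dict", entries, stream_bytes_or_None).

def resolve(objects, obj):
    """Follow references until a non-reference object is reached (no cycles assumed)."""
    while obj[0] == "ref":
        obj = objects[obj[1]]
    return obj

def body(obj):
    """What a structural compare sees: the reference or dictionary, never the stream."""
    return obj[:2]

def stream_of(obj):
    """Stream bytes attached to a dictionary object, or None if there is no stream."""
    return obj[2] if obj[0] == "dict" else None

def dup_resolve_body_only(objects, m, n):
    """Resolve before comparing bodies, but test streams on the unresolved objects."""
    a, b = objects[m], objects[n]
    return (stream_of(a) == stream_of(b)
            and body(resolve(objects, a)) == body(resolve(objects, b)))

def dup_no_resolve(objects, m, n):
    """Compare the numbered objects exactly as stored, without resolving."""
    a, b = objects[m], objects[n]
    return stream_of(a) == stream_of(b) and body(a) == body(b)

def dup_resolve_everything(objects, m, n):
    """Resolve first, then test both body and stream on the resolved objects."""
    a, b = resolve(objects, objects[m]), resolve(objects, objects[n])
    return stream_of(a) == stream_of(b) and body(a) == body(b)

# 5 and 6 point at two streams with equal dictionaries but different data.
TWO_STREAMS = {
    5: ("ref", 55),
    6: ("ref", 66),
    55: ("dict", {"Length": 1}, b"A"),
    66: ("dict", {"Length": 1}, b"B"),
}

# 5 is nothing but a reference to 66.
ALIAS = {
    5: ("ref", 66),
    66: ("dict", {"Length": 1}, b"B"),
}
```

In this model `dup_resolve_body_only(TWO_STREAMS, 5, 6)` is true, so object 6 would wrongly be merged into 5, while `dup_no_resolve` and `dup_resolve_everything` both report the pair as different. For the 5 versus 66 case, `dup_no_resolve(ALIAS, 5, 66)` is false (safe, but the redundant object survives) and `dup_resolve_everything(ALIAS, 5, 66)` is true.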
tor8 | right, you want to remove the needless indirect reference object and make the references that used to point to 5 dereference and point directly to 66 instead? | 11:24.02 |
Robin_Watts | tor8: Yes. | 11:24.11 |
tor8 | Robin_Watts: you know what. we're already broken on that case | 11:24.53 |
| garbage collection doesn't follow the 66 0 R ... | 11:24.54 |
| mutool clean -g on a file like that will drop the 66 object even though it's referenced | 11:24.54 |
Robin_Watts | really? | 11:25.14 |
| Then we should fix that. | 11:25.35 |
tor8 | yeah... | 11:25.43 |
| Robin_Watts: and I know why... pdf_resolve_indirect resolves multiple steps of indirection | 11:34.41 |
| so if an intermediate step is a numbered object, our mark/sweep will skip marking the intermediate object | 11:35.00 |
| gah. and pdf_graft_object also gets this case wrong :/ | 11:38.19 |
Robin_Watts | tor8: ah.... | 11:38.44 |
tor8 | if I make resolve_indirect only do one step of resolving, pdf_graft_object gets it right | 11:39.49 |
| but mutool clean -g still fails | 11:39.54 |
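The interaction between multi-step resolving and the mark phase described above can be sketched as a toy model. This is Python, not the real MuPDF mark/sweep: `mark`, `reachable`, the `single_step` flag and the tuple layout are hypothetical names made up for illustration. As the log notes, single-step resolving alone did not fix `mutool clean -g`, so the real code path differs from this sketch, and which object in the chain goes unmarked depends on the actual implementation.

```python
# Toy model, not MuPDF code. Values are ("ref", number), ("dict", entries),
# or plain Python values; OBJECTS maps object numbers to values.

def is_ref(obj):
    return isinstance(obj, tuple) and obj[0] == "ref"

def is_dict(obj):
    return isinstance(obj, tuple) and obj[0] == "dict"

def resolve(objects, ref, single_step):
    """Look up a reference: one hop only, or all the way through a chain.

    Assumes the table contains no cycle made purely of references.
    """
    obj = objects[ref[1]]
    while not single_step and is_ref(obj):
        obj = objects[obj[1]]
    return obj

def mark(objects, obj, used, single_step):
    """Record every object number that is reached through a visible reference."""
    if is_ref(obj):
        if obj[1] in used:
            return
        used.add(obj[1])
        # With multi-step resolving, any reference hidden inside the chain
        # is never seen here, so its object number is never recorded.
        mark(objects, resolve(objects, obj, single_step), used, single_step)
    elif is_dict(obj):
        for value in obj[1].values():
            mark(objects, value, used, single_step)

def reachable(objects, root, single_step):
    """Return the set of object numbers marked as used, starting from root."""
    used = set()
    mark(objects, ("ref", root), used, single_step)
    return used

OBJECTS = {
    1: ("dict", {"Page": ("ref", 5)}),  # root, refers to 5
    5: ("ref", 66),                     # object that is only a reference
    66: ("dict", {"Length": 1}),        # the real target
    7: ("dict", {}),                    # genuinely unreferenced
}
```

In this model `reachable(OBJECTS, 1, single_step=False)` gives `{1, 5}`, so a sweep would discard 66 even though it is referenced, whereas `reachable(OBJECTS, 1, single_step=True)` gives `{1, 5, 66}`. Object 7 is unmarked in both cases, as it should be.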
| Robin_Watts: a couple of possibly contentious commits on tor/master | 12:07.46 |
| including a fix for the garbage collection issues | 12:07.57 |
Robin_Watts | Oh Yes! | 12:17.24 |
| Just got an LZW tiff decoded from SmartOffice. | 12:17.37 |
| (and a "difficult" old format one at that) | 12:17.53 |