| <<<Back 1 day (to 2020/04/06) | Fwd 1 day (to 2020/04/08) >>> | 20200407 |
kens | chrisl Nancy_iMac any thoughts about recording object and generation numbers ? | 15:01.18 |
| There is a compromise solution (which I'm not ecstatic about) which would involve holding parent numbers only for strings, arrays and dictionaries | 15:01.55 |
chrisl | So, complete disclosure: I scanned your email briefly, and forgot to go back to it..... | 15:01.55 |
kens | :-) | 15:02.01 |
Nancy_iMac | I personally like to keep it simple and add the 2 new 'parent' object and generation numbers. But I understand the objection about the size. If you really don't want to do that, then the other solution needs to be super clear and with macros/inline functions. | 15:02.07 |
kens | I'm not wild about adding 12 bytes of storage to every object | 15:02.27 |
Nancy_iMac | still going to be way more efficient than the gs implementation? | 15:02.47 |
kens | I have no clue how the GS implementation handles this to be honest | 15:03.00 |
Nancy_iMac | I just mean generally, the size of each object there? | 15:03.18 |
kens | Oh I have no idea how big they will be, huge I imagine | 15:03.32 |
| Lots of overhead | 15:03.51 |
| I would probably prefer inline functions over macros. I don't think there are actually many places where we look at object numbers, let me have a quick grep | 15:04.36 |
Nancy_iMac | how about having the parent info for only strings, arrays and dicts? That seems reasonable to me. | 15:04.55 |
kens | That's what I said up above | 15:05.23 |
Nancy_iMac | yeah I don't have a pref between macros and inline functions. Just that we are gunshy about macros in this code base, so inline functions. | 15:05.26 |
| Yes I am saying I like what you said above. :) | 15:05.33 |
kens | Oh, I don't really like it :-) | 15:05.45 |
| It means we have to do special stuff when creating certain kinds of objects which we don't do for other kinds | 15:06.10 |
| Turns out there are quite a few places we look at object_num | 15:06.52 |
| 73 matches | 15:06.57 |
Nancy_iMac | yeah I figured object_num used a lot | 15:07.00 |
kens | Well mostly in pdf_int.c | 15:07.09 |
chrisl | How do you handle inheritance of things like resources? | 15:07.21 |
kens | Which is unsurprising because that's where we create and deal with objects on the whole | 15:07.28 |
| chrisl we deal with that when unrolling the pages tree | 15:07.46 |
| Any inherited resources are copied directly into the page resources dictionary | 15:08.03 |
chrisl | So, there's no "parent" entry in other objects | 15:08.10 |
kens | So we can have multiple copies of the same data | 15:08.12 |
| No, we don't have a parent entry anywhere right now | 15:08.24 |
| For objects under the page tree, we keep a copy of the page dictionary in the context, so we can refer to that. | 15:09.18 |
Nancy_iMac | kens: if you're going to have to add special stuff to deal with this anyway, I don't see that handling strings/dicts/arrays special on creation is a big deal? | 15:09.23 |
kens | It's more the principle, I don't like handling these objects differently | 15:09.49 |
| I'd prefer to handle all objects the same way | 15:10.02 |
Nancy_iMac | have two types of objects, "complex" and "simple". Handle "complex" objects differently? | 15:10.36 |
kens | We only have a single routine to allocate objects | 15:10.49 |
chrisl | There are already atomic and composite objects | 15:11.18 |
kens | Unless we have two different routines, which means checking the type up front and I *really* don't want to do this, then we have to pass the 'parent' numbers to the allocation routine anyway | 15:11.23 |
| regardless of the object type | 15:11.42 |
| Sure we can discard the numbers for simple objects | 15:11.52 |
| But I dislike special handling really | 15:12.07 |
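A minimal C sketch of the compromise discussed above, assuming a single allocation routine that always receives the parent numbers and keeps them only for strings, arrays and dictionaries. All type and function names here (`obj_header`, `composite_obj`, `obj_alloc`, `obj_parent_num`) are hypothetical and are not taken from the actual code base; the 64-bit and 32-bit widths are an assumption chosen to match the 12 bytes mentioned earlier.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical object kinds, for illustration only. */
typedef enum { OBJ_INT, OBJ_STRING, OBJ_ARRAY, OBJ_DICT } obj_type;

/* Header shared by every object. */
typedef struct {
    obj_type type;
} obj_header;

/* Composite objects carry the extra parent fields (8 + 4 bytes). */
typedef struct {
    obj_header h;          /* first member, so the two pointer types are interchangeable */
    uint64_t parent_num;
    uint32_t parent_gen;
} composite_obj;

static int obj_is_composite(obj_type t)
{
    return t == OBJ_STRING || t == OBJ_ARRAY || t == OBJ_DICT;
}

/* Single allocation routine: the parent numbers are always passed in
 * and simply discarded for simple objects. */
static obj_header *obj_alloc(obj_type type, uint64_t parent_num, uint32_t parent_gen)
{
    size_t size = obj_is_composite(type) ? sizeof(composite_obj) : sizeof(obj_header);
    obj_header *o = calloc(1, size);

    if (o == NULL)
        return NULL;
    o->type = type;
    if (obj_is_composite(type)) {
        composite_obj *c = (composite_obj *)o;
        c->parent_num = parent_num;
        c->parent_gen = parent_gen;
    }
    return o;
}

/* Accessor written as a function rather than a macro; returns 0 for simple objects. */
static uint64_t obj_parent_num(const obj_header *o)
{
    return obj_is_composite(o->type) ? ((const composite_obj *)o)->parent_num : 0;
}
```

The type check inside `obj_alloc` is exactly the kind of special handling being objected to here: the caller stays uniform, but the allocator and every accessor have to know which object kinds are composite.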
Nancy_iMac | I think keeping it simple and putting the parent fields in everything will win then? | 15:12.12 |
kens | But I don't like that either because it'll make things like an in much bigger | 15:12.32 |
| int* | 15:12.38 |
Nancy_iMac | So which solution do you like? | 15:13.13 |
kens | Well I don't like any of them really :-( | 15:13.23 |
chrisl | How about adding it to every object, but packing the object # and generation in to int32_t ? | 15:13.38 |
kens | We can't keep the object number as a 32-bit value | 15:14.01 |
| They can range up to 10 decimal digits | 15:14.10 |
Nancy_iMac | so object number is 10 decimal digits. How many bits is that? | 15:14.15 |
kens | 34 | 15:14.21 |
chrisl | I thought that was the byte offset | 15:14.42 |
kens | In practice it shouldn't be a problem, but we know how long it'll be before someone turns up with such an insane file | 15:14.45 |
Nancy_iMac | pack it into a int64? | 15:14.52 |
kens | chrisl it *is* the byte offset, but...... | 15:14.55 |
| There is no defined limit on the object number, however, the free list uses the byte offset to point to the object number of the next entry in the free list, which limits it to 10 decimal digits. | 15:15.29 |
chrisl | "indirect object 8,388,607 Maximum number of indirect objects in a PDF file." | 15:15.44 |
kens | Nancy_iMac: the generation number is a 32-bit number, so that's 66 in total | 15:15.53 |
| chrisl where did you find that ? I spent ages looking for it | 15:16.07 |
chrisl | "Implementation Limits" | 15:16.23 |
kens | Oh yeah, but that just means Acrobat, not the real spec | 15:16.40 |
chrisl | Well, that is true, but realistically ..... | 15:17.29 |
kens | So Acrobat uses 24 bits, interesting | 15:17.31 |
| well 23 | 15:17.45 |
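The bit counts quoted above can be checked with a small C helper. This is only a sketch; `bits_needed` is a name made up for the illustration.

```c
#include <stdint.h>

/* Number of bits needed to represent v as an unsigned value
 * (by this definition, 0 needs 0 bits). */
static unsigned bits_needed(uint64_t v)
{
    unsigned n = 0;

    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}
```

With this helper, `bits_needed(9999999999ULL)` (the largest 10-digit decimal value) is 34 and `bits_needed(8388607ULL)` (the Acrobat implementation limit, which is 2^23 - 1) is 23. Adding a 32-bit generation number to a 34-bit object number gives the 66 bits mentioned in the discussion.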
Nancy_iMac | so if we change object numbers to be 32 bits, saving all that space? | 15:17.49 |
kens | We could do that, I guess | 15:18.02 |
| I admit keeping a separate number would be easier, but I was kind of horrified at the potential space explosion | 15:18.40 |
Nancy_iMac | if Adobe's limit is 24 bits and we use 32 bits, we are way better than them :) | 15:18.52 |
kens | Won't be the first time :-) | 15:19.15 |
chrisl | kens: Even just keeping a pointer to the parent would add 8 bytes on most modern systems | 15:19.30 |
kens | chrisl is there an implementation limit on the generation number ? | 15:19.36 |
chrisl | Not that I can see | 15:20.00 |
kens | Oh well, would have been nice but never mind | 15:20.16 |
| TBH I only need the bottom 3 bytes of the object number and the bottom 2 bytes of the generation, so making the parent object 32 bits and the parent generation 16 bits would cost us 6 bytes, we'd gain back 4 from making the object number 32 bits | 15:21.58 |
| So it would cost 2 bytes, I guess that's OK | 15:23.12 |
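The byte accounting above can be written out as a short C sketch. It assumes, as the discussion implies, that the object number is currently 64 bits and the generation number 32 bits; `net_cost_bytes` is a hypothetical name, and only the field sizes are summed, so compiler padding could change the real `sizeof` of a struct holding these fields.

```c
#include <stdint.h>

/* Sum of per-object field sizes before and after the proposed change.
 * This is bookkeeping only: struct padding is deliberately ignored. */
static int net_cost_bytes(void)
{
    int before = (int)(sizeof(uint64_t)    /* object number, 64 bits (assumed) */
                     + sizeof(uint32_t));  /* generation number, 32 bits */

    int after  = (int)(sizeof(uint32_t)    /* object number narrowed to 32 bits */
                     + sizeof(uint32_t)    /* generation number, unchanged */
                     + sizeof(uint32_t)    /* parent object number, 32 bits */
                     + sizeof(uint16_t));  /* parent generation number, 16 bits */

    return after - before;
}
```

The result is 14 - 12 = 2 bytes per object, matching the figure in the discussion. Whether an actual struct grows by exactly 2 bytes depends on field ordering and alignment.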
Nancy_iMac | that's my vote then. Will be the most readable solution | 15:23.31 |
kens | OK I'll start on that. | 15:23.43 |
chrisl | It also makes it pretty easy to change it if we find a need for larger numbers | 15:23.52 |
kens | Well I guess yes. | 15:24.00 |
| Nancy_iMac: when hacking the transparency spot stuff, watch out for the SpotNames dictionary | 15:24.22 |
| It's a temporary dictionary but it gets stored in the context | 15:24.37 |
Nancy_iMac | yeah I saw that. I have been in that code before. Not a problem :) | 15:25.02 |
kens | Need to allocate it and count it down properly. It's already initialised to NULL in context creation | 15:25.05 |
| OK wanted to mention it because its usage in the existing function is spread about a bit (near the top to init it, in the middle where it's used and at the end where it's counted down) | 15:25.43 |
Nancy_iMac | kens: Arguably the argument to all those routines should be the SpotNames dict, not the number of spots? Then count them at the top? I could make that change I suppose. | 15:28.42 |
| gets rid of that weird variable in ctx which was really always a hack. | 15:29.03 |
kens | Hmmm | 15:29.03 |
| We use the SpotNames dict to hold the names of all the spots we've found, so that we can tell if a name is new. | 15:29.40 |
Nancy_iMac | changes a bunch of code, but I think it would be an improvement. | 15:29.40 |
kens | We still need to be able to tell if a name is new, even if it's just for this page | 15:29.58 |
Nancy_iMac | yeah, I am not saying to get rid of that dict, I am saying pass that around everywhere, instead of the spots count. | 15:30.24 |
| Pretty sure it would work out and be more clear. | 15:30.31 |
kens | You could do that, then count the number of spots in the dictionary instead I guess | 15:30.42 |
Nancy_iMac | yeah only the top wants to know the number. Everywhere else it just adds to the dictionary if needed. | 15:31.00 |
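A rough C sketch of the refactoring idea above: pass the name set around, add a name only if it is new, and derive the spot count from the set at the top level. The `spot_names` type, its fixed capacity and both functions are hypothetical stand-ins for this illustration; the real code uses its own dictionary object.

```c
#include <string.h>

#define MAX_SPOTS 64   /* arbitrary capacity chosen for this sketch */

/* Stand-in for the SpotNames dictionary: a small set of names. */
typedef struct {
    const char *names[MAX_SPOTS];
    int count;
} spot_names;

/* Add a name if it has not been seen yet.
 * Returns 1 if added, 0 if already present, -1 if the set is full. */
static int spot_names_add(spot_names *d, const char *name)
{
    int i;

    for (i = 0; i < d->count; i++) {
        if (strcmp(d->names[i], name) == 0)
            return 0;
    }
    if (d->count >= MAX_SPOTS)
        return -1;
    d->names[d->count++] = name;
    return 1;
}

/* Only the top-level caller needs the count, so it is read from the set
 * instead of being threaded through every routine. */
static int spot_names_count(const spot_names *d)
{
    return d->count;
}
```

Lower-level routines would take a `spot_names *` and call `spot_names_add`; only the caller at the top would call `spot_names_count`.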
| I think it will be cleaner. I don't mind fixing it up while I'm in there. | 15:31.11 |
kens | Feel free to do that if you like | 15:31.16 |
Nancy_iMac | I suppose I am avoiding going back to annotations :) | 15:31.51 |
kens | I wonder why I didn't do that before anyway. | 15:31.52 |
Nancy_iMac | probably it was a thing that evolved | 15:31.59 |
kens | Could be yes | 15:32.06 |
| You can have decryption instead if you don't want to do annotations :-) | 15:32.17 |
Nancy_iMac | I mean the design evolves and then you need to step back and look at it from a different angle sometimes | 15:32.28 |
| Heh you have those almost done I bet. | 15:32.38 |
kens | Oh I wouldn't say that :-( | 15:32.47 |
Nancy_iMac | I wonder if I should take another pass to look for some memory leaks. | 15:32.49 |
| I am sure we have new memory leaks | 15:32.57 |
kens | I've done revision 2 & 3, revision 4 needs the string object numbers, and I have to get AES decryption working. Then I need to look at revision 6 which uses all new password authentication techniques, which I think sebras wrote, not me. | 15:33.35 |
| I'm not sure we can do certificate security | 15:33.57 |
| I'm pretty sure GS doesn't. | 15:34.06 |
Nancy_iMac | seems fair to only handle what gs does, in first pass. | 15:34.30 |
kens | Memory leaks, reference counting problems are all good if you want a break | 15:34.32 |
| I'm not sure what else needs implementing, did you look at optional content ? | 15:35.02 |
Nancy_iMac | I did OC for images, I am pretty sure. | 15:35.20 |
| at some point we will need to do a compare between gs and gpdf and then we will find a zillion things we missed | 15:35.56 |
kens | Absolutely no doubt about that at all | 15:36.17 |
Nancy_iMac | I am pretty sure that unfortunately your halftones aren't identical to the gs ones, so there are small differences that arguably don't necessarily matter other than messing up a bmpcmp | 15:36.21 |
kens | I haven't (yet) updated the code to actually set up the same halftones as the PostScript interpreter | 15:36.46 |
| So they are still different, yes. But we do (or should) have the ability to sort that out now | 15:37.04 |
| OK so given it's nearly 5pm I'll start in on the parent object numbers tomorrow | 15:38.03 |
Nancy_iMac | kens: Sounds good. | 15:38.24 |
kens | Oh I merged in master today as well, so we should be up to date again | 15:38.43 |
Nancy_iMac | cool | 15:39.46 |