| <<<Back 1 day (to 2016/12/22) | 20161223 |
sebras | tor8: there are two commits on sebras/master, do they make sense to you? | 12:12.23 |
tor8 | sebras: I'd rather fix the segfault to allow for NULL resource dicts | 13:05.48 |
| resource dictionaries in pages are optional, after all | 13:06.00 |
sebras | tor8: notice that this NULL, not null. | 13:07.43 |
tor8 | should not matter. pdf_dict_get(page_obj, "Resources") will return NULL not null. | 13:08.38 |
| ...if the resource dictionary is missing | 13:09.04 |
| resource dictionaries are not required to be indirect objects either, as you state in the commit comment. | 13:09.37 |
| they can be inline dictionaries | 13:09.43 |
| and that is what you create in your patch as well... | 13:10.00 |
sebras | in the file that was originally created, we create an indirect reference to an object that does not exist. | 13:11.42 |
| this causes pdf_load_page to call pdf_resource_use_blending which eventually tries to resolve the object. | 13:12.16 |
tor8 | sebras: I'm confused. We don't create a file in make_fake_doc. | 13:12.25 |
sebras | this cannot be done because there is no file backing it so fz_seek() segfaults in the end. | 13:12.29 |
tor8 | we pass NULL to pdf_add_page | 13:12.42 |
sebras | tor8: exactly, so NULL is stored in the xref. | 13:13.09 |
tor8 | huh, why are we trying to seek in a newly created document?! | 13:13.19 |
sebras | xref entry->obj | 13:13.20 |
tor8 | ah, pdf_add_page creates an indirect object for the resources dictionary | 13:13.48 |
sebras | let me switch irc computer... | 13:13.49 |
tor8 | if (resources) pdf_dict_put_drop(...Resources) then? | 13:13.58 |
| same with Contents, they are also optional | 13:14.03 |
sebras | that might work. | 13:15.16 |
| since this an in-memory only object we cannot afford to seek in the non-existing stream. | 13:15.43 |
tor8 | hm. reading the spec more closely, it says that if there are no resources we should create an empty dictionary (or else it will go looking for inherited resource dictionaries in the page tree) | 13:16.34 |
sebras | tor8: right, so those if-statements in pdf_add_page() also fixes it. | 13:16.35 |
tor8 | sebras: I think the core issue here is calling pdf_add_object with NULL | 13:16.51 |
| but crashing on such errors is also bad | 13:17.24 |
sebras | tor8: maybe pdf_add_page() should be able to handle a NULL resource argument and create the empty dict in that case? | 13:17.43 |
tor8 | yeah. I think that would be best. | 13:18.35 |
| and it should possible avoid creating a new indirect object as well | 13:18.55 |
sebras | OTOH, if we do that you cannot add pages which you want to inherit the resource dict fro mthe page tree.. | 13:19.16 |
tor8 | no. we don't want to get involved in that insanity :) | 13:19.44 |
| that's just a brain dead bit of spec which we shall avoid at all costs when creating our own files. | 13:20.17 |
sebras | this is actually what the file I'm using for testing happens to use. :) | 13:20.37 |
tor8 | we don't allow access to the page tree so there's no way for anyone to even do that | 13:20.41 |
| using the pdf_add_page and pdf_insert_page apis | 13:20.51 |
sebras | right, then the path to resolving this is clera. | 13:21.02 |
tor8 | yes. the only question is whether to put the resource dictionary inline or create a new numbered object for it. | 13:21.42 |
| most files tend to use numbered objects for it | 13:22.19 |
sebras | inline if it is empty and indirect if it is provided? | 13:22.23 |
tor8 | and that will let our garbage collection dupe detection coalesce them | 13:22.34 |
| yeah, inline if it is empty and indirect if it is provided and not already indirect is probably best. | 13:22.51 |
sebras | so the issue causing all of this is that there is no stream backing the fake document. | 13:25.05 |
| this means that having the resource dict be an indirect reference to a non-existing object in a normal file ought to work fine. | 13:25.35 |
tor8 | yeah. there is normally no way to create a new document and have a bogus indirect reference that will try to load the underlying file. | 13:25.49 |
sebras | no, but you could have a file we parse that says /Resource 99 0 R but not have 99 0 obj in there, and in thta case well error out on "object out of range" which results e.g. in "cannot find Font dictionary". | 13:27.17 |
tor8 | if it's a non-existent object, we'll throw exceptions based on it being out of range of xref->len and not try to load | 13:27.27 |
| if it's created with pdf_create_object it'll be a 'f' free entry and thus not loaded | 13:27.42 |
| and if it's updated with pdf_update_object then it'll have a proper object | 13:27.54 |
| *but* if we call pdf_update_object with NULL all hell breaks loose | 13:28.07 |
| so we should catch that right there | 13:28.12 |
| and add a 'null' object | 13:28.20 |
| or just no-op it | 13:28.24 |
| sebras: try the two top commits on tor/master | 13:31.16 |
sebras | that works for the tests I have been doing. mention bug 697183 in the commit message though. | 13:36.47 |
| though I'm worried what will happen if csi->rdb is a null object and we look for a resource other than a font though. | 13:37.33 |
tor8 | sebras: in the page interpretation? | 13:38.00 |
sebras | yes. | 13:38.04 |
| do we handle not having a resource dict gracefully everywhere. | 13:38.37 |
| in pdf-interpret.c | 13:38.43 |
tor8 | sebras: we'll just get NULL | 13:39.50 |
| see pdf_process_Do where we look for image resources | 13:40.04 |
| xres = dict_get(csi->rdb, "XObject") will return NULL if rdb in NULL | 13:40.29 |
sebras | yes, we check xres for NULL. | 13:40.35 |
tor8 | and even if it were not checked, we'd eventually end up pdf_load_image with xobj==NULL where we'd fail on width==0 | 13:41.44 |
| sebras: was there somewhere specific you were looking at? | 13:42.00 |
sebras | no, I'm looking at all uses of csi->rdb | 13:42.18 |
| to make sure they're all robust. | 13:42.43 |
tor8 | resolve_properties is the only one that doesn't have an explicit check | 13:43.22 |
sebras | true but then again the caller checks for !cooked. | 13:43.49 |
tor8 | which is for the marked content stuff used for optional content groups | 13:43.53 |
| but that can cope with blank and missing entries well enough | 13:44.05 |
| yeah. | 13:44.28 |
sebras | load_font_or_hail_mary() is the only one I haven't looked at yet. | 13:44.45 |
tor8 | sebras: the other fixes in your make_fake_doc look reasonable | 13:44.54 |
| missing the 's' stroke command on the content stream makes me wonder if this has ever been properly tested :) | 13:45.17 |
sebras | yeah, I'll bring your commits over and create separate commits for that. | 13:45.24 |
| I think not. | 13:45.35 |
| also the bogus error handling seems to suggest that no... ;) | 13:45.50 |
tor8 | sebras: there is one other place where we create a 'fake' document | 13:45.51 |
| in pdfportfolio.c when creating a new portfolio pdf | 13:45.59 |
sebras | right, I'm actually attempting to add portfolio support to the x11 viewer just as an experiment. | 13:46.22 |
tor8 | but there we add some actual text and not just a cross | 13:46.33 |
| sebras: okay. as an extra command line argument which portfolio entry to open? | 13:47.04 |
sebras | oh, pdf_load_type3_font() checks for rdb being NULL. so I think we handle NULL everywhere in that case. | 13:47.09 |
tor8 | sebras: yes, we should. the loading code is pretty robust these days. | 13:47.31 |
| it's the creation code that's had less field testing... | 13:47.40 |
sebras | tor8: no, I wanted to generate a fake page with GoToE links. this is why i was looking into the make_fake_doc() code. | 13:48.11 |
tor8 | take a look at pdfportfolio.c | 13:48.31 |
sebras | I did. | 13:48.35 |
tor8 | okay. | 13:48.39 |
sebras | as of now, when you press F, you'll get a fake page with the names of the embedded files | 13:49.03 |
| but as of yet no GoToE links. | 13:49.12 |
| we don't support those, but they would be handy in this case. | 13:49.33 |
tor8 | sebras: right. we don't support those... | 13:49.41 |
sebras | in order to support them fully we'd need to keep a hierarchy of pdf's in memory. :-P | 13:50.17 |
tor8 | I think if we allow mupdf to be called with a file:// url argument and crack that to go to the correct file, embedded doc, and page | 13:50.24 |
| as an external URL | 13:50.34 |
sebras | yes, I have already implemented an ugly hack to support mupdf-x11 file:///absolute/path/to/document.pdf#page=42 | 13:51.03 |
tor8 | sebras: the other alternative is to have a pdf portfolio container fz_document which just puts all the portfolio entries in sequence | 13:51.10 |
sebras | along with an initial fake page used as index? | 13:52.15 |
| if we want. | 13:52.19 |
tor8 | yeah. | 13:52.49 |
sebras | tor8: so pdfportfolio leaks a lot where the fake document is created. in case of error. | 13:53.07 |
tor8 | BUT ... pdf portfolios are such a bad idea I don't fancy the idea of wasting effort supporting them. | 13:53.15 |
| sebras: yeah, but it's a command line tool so we don't really care. | 13:53.28 |
| but if you want to clean it up, go ahead! :) | 13:53.51 |
sebras | the resource dict is being fully populated though so there are no issues there. | 13:54.23 |
tor8 | if we get an error, we'll end up calling abort() with uncaught exception | 13:54.27 |
sebras | yeah, I see that. | 13:54.39 |
| tor8: I have another thing pending where I think the gif decoder needs to enable old_tiff LZW-decoding. | 14:05.59 |
| tor8: but I don't have access to my entire gif suite now so I'm not sure if there will be regressions. | 14:06.21 |
tor8 | sebras: https://helpx.adobe.com/acrobat/kb/link-html-pdf-page-acrobat.html for specific pdf page numbers and named destinations | 14:11.26 |
| but I can't find an equivalent syntax to go to an embedded portfolio file | 14:11.36 |
sebras | yeah I found that link in git log and read it. | 14:12.16 |
tor8 | https://forums.adobe.com/thread/2111745 looks like the answer is "no can do" | 14:14.07 |
| but that doesn't prevent us inventing our own syntax, like #subfile=N,page=M | 14:14.51 |
| or #portfolio=N to be clearer | 14:15.21 |
sebras | is N the name or a number though? | 14:23.00 |
| tor8: the commits on sebras/master clustered fine, final LGTM? | 14:23.40 |
serge111 | Thanks for the guidance Robin_Watts, I'll fiddle with MuPDF and see what I can do. | 14:34.40 |
karl_ | any mupdf developer here ? | 22:02.41 |
| Forward 1 day (to 2016/12/24)>>> | |