| 2013/06/28 |
sebras | after some updating I get this instead: | 00:35.46 |
| error: Annotation object not a dictionary | 00:35.49 |
| warning: ignoring broken annotation | 00:35.49 |
| this is on a Make magazine PDF, but I just noticed that I get the same on pdfref17.pdf... | 00:41.08 |
| Robin_Watts: looks like it might be 9d20a4f3a69fdea855f8678c1ad50b5db7472d81 causing the problem. | 00:44.22 |
| wait... no. | 00:46.20 |
| git blames paul in f07fedd9. I mistook the setting of is_dict = 0 for the problem. | 00:46.56 |
| it is not. | 00:46.59 |
| but in f07fedd9 paul changed the if (!is_dict) continue; into if (!is_dict) fz_throw(), which appears to be incorrect. As I read table 8.15 (page 606 of pdfref17), AP is not required, hence we shouldn't error out...? | 00:48.08 |
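For illustration, a minimal C sketch of the pre-f07fedd9 behaviour sebras describes: an annotation without an appearance stream is skipped with `continue` instead of being treated as an error. The `annot_entry` type and `collect_drawable()` function are hypothetical simplifications, not MuPDF's annotation loader.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified annotation entry: has_ap is 0 when the
   optional /AP (appearance stream) entry is absent. */
typedef struct {
    int id;
    int has_ap;
} annot_entry;

/* Collect the ids of drawable annotations into out[], silently skipping
   entries without /AP, as "if (!is_dict) continue;" did.
   Returns how many ids were stored. */
static size_t collect_drawable(const annot_entry *annots, size_t n, int *out)
{
    size_t kept = 0;
    assert(annots != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (!annots[i].has_ap)
            continue; /* optional entry missing: skip, no error, no warning */
        out[kept++] = annots[i].id;
    }
    return kept;
}
```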
Guest51945 | !list | 02:10.58 |
jamma | hi ! | 04:24.35 |
| Can I merge PDF files and keep bookmarks ? | 04:25.17 |
| seems that yes, but the dest pages are wrong.. | 04:25.34 |
ray_laptop | morning, kens | 07:22.15 |
kens | Hi ray_laptop you're up late | 07:22.23 |
| I replied to Mani because I thought you colonials would all be in bed :-) | 07:22.41 |
ray_laptop | I saw your response. Thansk | 07:23.02 |
| thanks | 07:23.06 |
| we are both offloading the mupdf folks :-) | 07:23.34 |
kens | Well, I have enough to do, but given those folks are in India, I thought it would be good to get them a response in their working day | 07:24.13 |
ray_laptop | would be nice if we got a real customer | 07:24.17 |
kens | One day :-) | 07:24.24 |
| But it's really hard to make money on apps | 07:24.39 |
ray_laptop | kens: what I don't grok is the proliferation of video games at > $25 prices | 07:28.53 |
| most are junk | 07:29.03 |
kens | Absolutely, but people will pay for them, while apps are free in most people's mind (or nearly so; 50c is nothing) | 07:29.38 |
| Robin_Watts : tor8 ping ? | 07:51.26 |
tor8 | hi kens | 07:51.35 |
kens | shobhit has opened a SO question about his compilation problem: | 07:51.55 |
| http://stackoverflow.com/questions/17347594/errors-while-compiling-the-mupdf-for-android-platform | 07:51.56 |
| He doesn't seem to have done what Robin_Watts asked him, so perhaps an answer on SO might prompt him to do as was suggested. | 07:52.25 |
| There's also someone wanting to generate a thumbnail using MuPDF, but it's not clear to me what he wants to do exactly; I'd have thought Mudraw was what he wanted: | 07:53.28 |
| http://stackoverflow.com/questions/17348382/mupdf-generate-thumbnail | 07:53.29 |
tor8 | kens: is there a bugzilla bug for shobhit's question? | 07:55.29 |
| e-book app ... yet another gpl violator? :) | 07:55.57 |
| and his question is incoherent... or incomprehensible | 07:56.56 |
| "thumbnail for each pdf file in my project" huh? | 07:57.08 |
kens | tor8 no, he hasn't reported a bugzilla bug, he was on here (and off, and on and off) yesterday | 08:10.27 |
| tor8 yes, that was what I meant when I said I didn't really understand his question. Presumably if he wants an image he can just use mudraw at a lower resolution... | 08:11.07 |
tor8 | kens: yeah. I'll answer him how to use mudraw. | 08:14.13 |
kens | tor8 thanks | 08:14.19 |
tor8 | Robin_Watts: paulgardiner: I think I have a suspicion why opening a document like pdfref17 takes so long... | 09:09.42 |
| it's finding the page numbers for the link destinations that's taking ages | 09:09.54 |
| (I made it even slower, and now it's really slow...) | 09:10.11 |
paulgardiner | I hadn't realised it did that when opening the document. Had thought that was a page-load thing | 09:12.12 |
tor8 | it happens when loading the bookmark outline thing | 09:12.58 |
paulgardiner | Oh I see | 09:13.11 |
tor8 | we currently do a linear search through the page tree to find a hit | 09:13.44 |
| one way to speed that up would be to cache the page numbers in the pdf_xref_entry | 09:14.31 |
| but I'm going to try an approach that follows the Parent links and adds up the Count, but I suspect that'll be both tricky to write and not very fast | 09:15.15 |
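One way the Parent-link idea might look, as a hedged sketch on a hypothetical in-memory tree (`node`, `page_number_of()` and `MAX_TREE_DEPTH` are illustrative names, not MuPDF API): walk up from the page, and at each level add the page counts of every sibling that precedes the current node in its parent's /Kids. It assumes the /Count values and /Parent links are consistent.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TREE_DEPTH 100 /* hypothetical guard against cyclic /Parent links */

/* Hypothetical page tree node: nkids == 0 marks a leaf page; for an
   intermediate node, count mirrors /Count (leaf pages beneath it). */
typedef struct node {
    struct node *parent;
    struct node **kids;
    int nkids;
    int count;
} node;

static int pages_in(const node *n)
{
    return n->nkids == 0 ? 1 : n->count;
}

/* 0-based page number of a leaf, or -1 if the tree is inconsistent. */
static int page_number_of(const node *page)
{
    int number = 0;
    int depth = 0;
    const node *child = page;
    const node *parent;

    assert(page != NULL);
    for (parent = page->parent; parent != NULL; parent = parent->parent)
    {
        int i;
        if (++depth > MAX_TREE_DEPTH)
            return -1;
        /* Add up the pages in every sibling that comes before child. */
        for (i = 0; i < parent->nkids && parent->kids[i] != child; i++)
            number += pages_in(parent->kids[i]);
        if (i == parent->nkids)
            return -1; /* child missing from its parent's /Kids */
        child = parent;
    }
    return number;
}
```

Each level still scans the parent's /Kids linearly, which is in line with tor8's doubt that this would be very fast.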
Robin_Watts | Let me answer him. | 09:15.41 |
tor8 | I think the parent link following is most robust in the long run. less duplicated state that needs to be kept in sync in the face of edits. | 09:16.26 |
| another way would be to make links opaque with callback to get the page number out | 09:19.28 |
paulgardiner | That sounds quite attractive | 09:20.52 |
tor8 | but I think first make it a dynamic lookup through the page tree, and then callbackify it to reduce the startup overhead | 09:21.42 |
| Robin_Watts: I forgot that your page tree editing existed... they will need some work too. | 09:22.09 |
Robin_Watts | tor8: ok. | 09:22.23 |
| http://prism.andrevv.com/ | 09:38.20 |
kens | :-) | 09:38.52 |
Robin_Watts | And for those of you not on facebook... | 09:39.49 |
| Heisenberg is driving down the motorway when a cop pulls him over. "Sir, do you know how fast you were going?" "No, but I know where I am!" | 09:40.35 |
| "You were doing 100 miles an hour!" "Great. Now I'm lost!" | 09:40.53 |
kens | Hmm, but if Heisenberg knows where he is, the cop can't know how fast he was going.... | 09:41.27 |
kens | thinks the GS PDF Form implementation might be slow | 09:43.18 |
paulgardiner | Robin_Watts: Oh, I do like that. :-) | 09:50.55 |
kens | OK, so it's not the form processing per se that's slow, it's a particular form that's slow | 09:51.26 |
sebras | paulgardiner: did you see my discovery about issues with pdfref? | 09:55.02 |
| paulgardiner: basically your commit (see the logs) requires "AP" in the annotation dict, which is not correct. | 09:55.29 |
tor8 | Robin_Watts: going for lunch, there are some preliminary pagetree commits on tor/pagetree | 09:55.40 |
| I need to fix the create/delete/insert page functions as well | 09:55.55 |
| (or if you want to tackle that while I'm out) | 09:56.05 |
| there's some problem with annotations with those changes though... | 09:56.23 |
| oh, those are broken before the page tree commits as well... so nvm that. | 09:57.14 |
paulgardiner | Oh right. Thanks sebras. | 10:01.27 |
| Hmmm. But old behaviour if no ap was to continue, which would step to the next array element without creating an annot struct for the current one. Now we're creating an annot struct but later removing it if no ap. I don't understand what's wrong. | 10:06.27 |
| sebras: ^ | 10:07.21 |
Robin_Watts | paulgardiner: There are 2 changes on robin/master. | 10:27.02 |
paulgardiner | Robin_Watts: okay | 10:27.25 |
Robin_Watts | I *think* tor was happy with them last night, but if you could nod at them too, I'd be grateful. | 10:27.27 |
| tor8: (For when you get back). It's not quite a "linear" search, is it? You'll skip whole subtrees with the count stuff. | 10:37.43 |
| And I don't think the parent thing will help you, will it? | 10:38.53 |
paulgardiner | fz_disable_device_hints(dev, FZ_IGNORE_IMAGE): should I read that as a double negative? | 10:51.35 |
Robin_Watts | Yes. | 10:51.49 |
| The FZ_IGNORE_IMAGE hint says to "ignore images". | 10:52.06 |
paulgardiner | Right. So default is no images | 10:52.10 |
| Both look good to me | 10:52.23 |
Robin_Watts | Default for what device? | 10:52.23 |
Robin_Watts | wonders what paul is looking at. | 10:52.41 |
paulgardiner | HTML extraction ... | 10:53.01 |
Robin_Watts | no, the two before that. | 10:53.08 |
| sorry. | 10:53.10 |
paulgardiner | Ah. No my fault. | 10:53.28 |
| I was looking at 2 and 3 | 10:53.39 |
| 2 is good | 10:53.43 |
| Both good | 10:54.10 |
Robin_Watts | Thanks. | 10:54.15 |
paulgardiner | Robin_Watts, tor8: I have two ready to go too, although not rebased past those two yet. | 10:55.40 |
| cluster test of mujstest is now happy with them | 10:55.58 |
Robin_Watts | paulgardiner: As soon as I finish looking at tor8's I will look at yours. | 10:56.10 |
paulgardiner | ta | 10:56.16 |
| No hurry. I'm mostly not intending to work today. I was just trying to get that one finished | 10:56.48 |
Robin_Watts | tor8: In your first commit you have: for (i = 0; i < pdf_array_len(kids); i++) | 10:59.29 |
| That's bad - calling a function every time around the loop. | 10:59.40 |
| Also, in general, our handling for pdf arrays is frequently bad. | 11:00.22 |
| Actually, scratch that, sorry. | 11:01.43 |
| tor8: Second commit looks fine, except: 1) Where you mark/unmark you need try/catch logic. | 11:12.14 |
| 2) Rather than limiting pdf_lookup_page_number by depth you should use mark/unmark. | 11:12.36 |
| In the third commit, should really mark/unmark in pdf_lookup_inherited_page_item too. | 11:14.11 |
| tor8: I'm going to prod at the commits a bit. Yell when you're back. | 11:14.42 |
tor8 | yells at Robin_Watts | 12:03.43 |
| for 2) the depth, lookup_page_number is fairly performance critical at load time now so I'm wary of doing a lot of extra book-keeping and cleanup logic | 12:04.48 |
Robin_Watts | tor8: right. | 12:06.12 |
| I have new versions of the commits on robin/pagetree. | 12:06.24 |
| And I am working on a version now that avoids doing try/catch at every level, but is still safe. | 12:06.46 |
tor8 | and unwinding the stack of mark/unmark needs a dynamic array (or recursion) to recover, which means a lot of extra try/catch | 12:06.47 |
| the depth limit should cover our bases, and the normal page loading should catch most errors | 12:07.27 |
Robin_Watts | Leaving marks throughout the page tree is bad. I think I have a way to get us both speed and correctness. Let me bash on it for a bit. | 12:08.01 |
tor8 | for correctness, recursion instead of iteration is cleanest. but also slower. | 12:10.49 |
| a depth counter is fast and catches the error eventually | 12:11.08 |
Robin_Watts | recursion to a depth of 100 would be bad. | 12:11.33 |
tor8 | yes, but we only iterate to a depth of 100 a.t.m. | 12:12.03 |
| if we recurse we'll have to mark and stop and unwind | 12:12.16 |
| or keep a separate stack and check both depth and then mark/unwind | 12:12.34 |
| but bash on it a while longer if you wish, maybe you can figure something elegant out | 12:12.57 |
Robin_Watts | but I've seen files that do: 1 0 obj << /Count 1000 /Kids [ 2 0 R .... 11 0 R ] >> ... 11 0 R << /Count 990 /Kids [12 0 R ... 21 0 R ] >> etc | 12:13.03 |
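A sketch of the iterative lookup with a plain depth counter, under the assumption of a simplified node type (`tnode`, `lookup_page()` and `MAX_DEPTH` are hypothetical): subtrees whose /Count is smaller than the remaining offset are skipped whole, and the walk gives up after 100 levels instead of tracking visited nodes. The offset is copied into `skip` so the original page number is still available for the error message.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical node: nkids == 0 marks a leaf page; count mirrors /Count. */
typedef struct tnode {
    struct tnode **kids;
    int nkids;
    int count;
} tnode;

#define MAX_DEPTH 100 /* mirrors the depth limit discussed above */

/* Find the needle-th (0-based) leaf under root, or NULL on failure. */
static tnode *lookup_page(tnode *root, int needle)
{
    int skip = needle; /* keep needle intact for the error message */
    tnode *node = root;
    int depth = 0;

    assert(root != NULL);
    while (node->nkids > 0)
    {
        tnode *next = NULL;
        if (++depth > MAX_DEPTH)
            break; /* cyclic, or just too deep: give up */
        for (int i = 0; i < node->nkids; i++)
        {
            tnode *kid = node->kids[i];
            int n = kid->nkids == 0 ? 1 : kid->count;
            if (skip < n)
            {
                next = kid; /* the target lies inside this kid */
                break;
            }
            skip -= n; /* skip this whole subtree */
        }
        if (next == NULL)
            break; /* /Count promised more pages than /Kids holds */
        node = next;
    }
    if (node->nkids > 0 || skip != 0)
    {
        fprintf(stderr, "cannot find page %d in page tree\n", needle);
        return NULL;
    }
    return node;
}
```

With a tilted tree like the one Robin_Watts describes, a valid but very deep file is rejected once it passes 100 levels, which is exactly the trade-off under discussion.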
tor8 | I'm more concerned with how page tree edits will affect links | 12:13.10 |
| Robin_Watts: so a very tilted page tree? | 12:13.37 |
Robin_Watts | yeah. | 12:13.41 |
tor8 | ick. | 12:13.44 |
| right. depth of 100 will hurt, but so will unwinding marks on errors :( | 12:14.04 |
Robin_Watts | At the moment, the code won't compile, right? | 12:14.13 |
tor8 | build/debug/mupdf will compile | 12:14.25 |
Robin_Watts | Right, but pdfclean doesn't. | 12:14.37 |
tor8 | but not mutool (due to the uncommented page creation code) | 12:14.39 |
Robin_Watts | right. | 12:14.47 |
tor8 | the try/catch safety commit looks fine | 12:16.46 |
| anywhere we save page numbers external to the pdf_obj hierarchy should be refitted to do dynamic page lookups from object numbers IMO | 12:17.41 |
| so we can do safe page edits etc | 12:17.50 |
Robin_Watts | tor8: I think you're right. | 12:17.53 |
tor8 | which means we have to make fz_link etc subclassed by the document types | 12:18.03 |
Robin_Watts | There should be an FZ_NORETURN change on there. | 12:18.07 |
tor8 | have you tried the noreturn changes on gcc? | 12:18.39 |
Robin_Watts | and your first commit is tweaked slightly - I rolled the 'delete dead code' commit into it and tweaked some for loops to not call functions. | 12:18.48 |
| Yes, I just cluster tested it. | 12:18.57 |
| I'm being called for lunch. back in 40 mins, sorry. | 12:19.37 |
tor8 | fab. | 12:19.48 |
| oh bollocks. merge cockup with winrt/mupdfwinrt/status.h | 12:23.10 |
Robin_Watts | back. | 12:57.04 |
tor8 | Robin_Watts: so how do you want to progress now? | 13:00.51 |
| pdf_insert_page and pdf_delete_page should be able to work on the existing page tree with some careful thought | 13:02.59 |
| pdf_balance_page_tree could rebuild a new balanced page tree, if we take the naive approach when implementing insert/delete page | 13:03.57 |
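A hedged sketch of what a rebuild-from-scratch balancer could do, assuming a hypothetical `bnode` type instead of real pdf_obj dictionaries: group the leaf pages into parents of at most `fanout` kids, level by level, summing /Count along the way. Because the loop always runs at least once, the root is an intermediate node even for a single-page document. Setting /Parent entries and writing objects back are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node: leaves have nkids == 0; intermediate nodes own a
   malloc'd kids array and count the leaf pages beneath them. */
typedef struct bnode {
    struct bnode **kids;
    int nkids;
    int count;
} bnode;

/* Build a balanced tree over n leaves. Returns the root, or NULL on
   allocation failure (earlier allocations are leaked in this sketch). */
static bnode *build_balanced(bnode **leaves, int n, int fanout)
{
    bnode **level = leaves;
    bnode *root;

    assert(n > 0 && fanout >= 2);
    do
    {
        int m = (n + fanout - 1) / fanout; /* parents needed one level up */
        bnode **up = malloc((size_t)m * sizeof *up);
        if (up == NULL)
            return NULL;
        for (int j = 0; j < m; j++)
        {
            int first = j * fanout;
            int k = n - first < fanout ? n - first : fanout;
            bnode *p = calloc(1, sizeof *p);
            bnode **kids = malloc((size_t)k * sizeof *kids);
            if (p == NULL || kids == NULL)
            {
                free(p);
                free(kids);
                free(up);
                return NULL;
            }
            p->kids = kids;
            p->nkids = k;
            for (int i = 0; i < k; i++)
            {
                kids[i] = level[first + i];
                p->count += kids[i]->nkids == 0 ? 1 : kids[i]->count;
            }
            up[j] = p;
        }
        if (level != leaves)
            free(level); /* previous level's pointer array, not its nodes */
        level = up;
        n = m;
    } while (n > 1);

    root = level[0];
    free(level); /* always malloc'd here: the loop ran at least once */
    return root;
}
```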
Robin_Watts | tor8: How is the speed at the moment ? | 13:11.09 |
tor8 | which speed? | 13:11.26 |
Robin_Watts | The speed of opening a large PDF file. | 13:11.40 |
| Is that better or worse than before ? | 13:11.45 |
tor8 | pdfref17 is too fast on my desktop to notice. with the naive lookup_page_number it was noticeably slow though. | 13:12.23 |
Robin_Watts | So this code is not noticably worse than the old one? That's all I was worried about. | 13:14.01 |
| It feels like it should be faster not having to load the whole tree when we start. | 13:14.21 |
tor8 | let me do some comparisons | 13:14.44 |
| it's faster :) | 13:15.07 |
| new code loads and prints the outline in 127ms with little variation | 13:15.42 |
| the old code takes between 120 and 150ms with a lot of variation | 13:16.00 |
Robin_Watts | fab. | 13:16.35 |
tor8 | this is on debug builds though | 13:17.18 |
| but still, a measurable difference at these time scales is pretty good to have | 13:17.35 |
Robin_Watts | yes. nice one. | 13:18.28 |
| Of course, this is all a complete pain in the ass for progressive loading. | 13:18.57 |
tor8 | let me try just nuking the outline loading to see what happens then | 13:18.59 |
Robin_Watts | because we can't use the page tree at all. | 13:19.13 |
tor8 | Robin_Watts: why? | 13:19.23 |
| or rather, why not? | 13:19.26 |
Robin_Watts | page tree = last thing in the file. | 13:19.33 |
| pretty much. | 13:19.38 |
tor8 | how did that work before then? | 13:19.40 |
Robin_Watts | My previous progressive loading stuff would find the first page, and populate the entry manually. | 13:20.05 |
| Then when the rest of the file arrived, it would read the page tree. | 13:20.15 |
| The right way to do it is to either use hintstreams to tell you where pages begin/end or spot /Type/Page as you read objects. | 13:20.59 |
tor8 | if we skip loading the outline, the document loads in 90ms | 13:20.59 |
Robin_Watts | tor8: Right, for progressive stuff I need to skip the outline loading. | 13:21.17 |
| cos that's also at the end of the file. | 13:21.23 |
tor8 | the outline is loaded manually by the clients | 13:21.38 |
Robin_Watts | Right. In pdfapp though it's loaded up front. | 13:22.05 |
tor8 | and I suspect if we delay that (in all apps and the android in particular) to when the outline view is actually shown we could gain significant load time speedups | 13:22.29 |
Robin_Watts | For progressive operation, I'll need to have a list of page objects I can populate as it loads. | 13:22.45 |
| and then we'll bin that and go back to the normal mode of operation at the end of loading. | 13:22.58 |
tor8 | ouch. hasOutlineInternal loads and tosses the outline | 13:23.15 |
Robin_Watts | so no editing documents etc while they load - which I don't think is unreasonable :) | 13:23.16 |
| tor8: yeah. | 13:23.21 |
tor8 | then it's reloaded in getOutlineInternal | 13:23.21 |
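A minimal sketch of loading the outline lazily and only once. `load_outline()`, `get_outline()` and `has_outline()` are hypothetical stand-ins, not the real JNI entry points: the "is there an outline?" query reuses the cached result instead of loading the tree and throwing it away.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: `outline` for the loaded bookmark tree and
   load_outline() for the expensive parse that resolves every link
   destination to a page number. */
typedef struct { int nentries; } outline;

static int load_calls; /* counts how often the expensive path runs */

static outline *load_outline(void)
{
    outline *o = calloc(1, sizeof *o);
    load_calls++;
    return o;
}

typedef struct {
    outline *cached;
    int loaded; /* set after the first attempt, even if no outline exists */
} doc_state;

/* Load on first use; hand back the same tree afterwards. */
static outline *get_outline(doc_state *d)
{
    assert(d != NULL);
    if (!d->loaded)
    {
        d->cached = load_outline();
        d->loaded = 1;
    }
    return d->cached;
}

/* Reuses the cached result rather than loading and discarding. */
static int has_outline(doc_state *d)
{
    return get_outline(d) != NULL;
}
```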
Robin_Watts | but this feels like a much better way to be working. | 13:23.40 |
tor8 | Robin_Watts: you can't be "sure" that the first /Type/Page you encounter is page 1 though | 13:23.48 |
Robin_Watts | tor8: The linearised object at the top of the file tells me what the first page is. | 13:24.19 |
| After that I am guaranteed to get them in order. | 13:24.31 |
tor8 | right. so look at that for page 0 in these functions? | 13:24.34 |
| and for all pages > 0 require the page tree | 13:24.49 |
Robin_Watts | tor8: These functions will gain a bit that says: if (doc->progressive) { ... look it up in the array ... } else { do what we do now } | 13:25.18 |
tor8 | I think, as much as we can, we should be working at this level now that we support pdf file editing | 13:25.26 |
| all this loading stuff into internal c structs to simplify other bits of code that uses it has to go :( | 13:25.46 |
| which array? | 13:26.17 |
Robin_Watts | tor8: I'll need to have a temporary array that maps page number -> object number or something for use during loading. | 13:27.01 |
tor8 | that stuff isn't part of the hint stream? | 13:27.19 |
Robin_Watts | and that will get binned at the end of progressive loading. | 13:27.21 |
sebras | paulgardiner: sorry for the delay. I think you're mistaken, if "AP" is not present we throw an error, should we do that? | 13:27.24 |
Robin_Watts | 1) We can't rely on the hint stream | 13:27.34 |
sebras | paulgardiner: i.e. if is_dict == FALSE. | 13:27.35 |
Robin_Watts | cos no one uses it, therefore no one generates it right. | 13:27.47 |
| 2) Even if we could rely on the hint stream it's compressed, so we'd need to unpack it into such an array anyway. | 13:28.09 |
tor8 | Robin_Watts: right. I'd be more comfortable just doing page 0 that way than all the pages (since no one generates files properly...) | 13:28.23 |
Robin_Watts | so I'd rather have an array and populate it as we go. | 13:28.29 |
| tor8: Well, chrome manages to display pages as they load. and we're aiming to be as good as chrome. | 13:28.52 |
sebras | paulgardiner: I don't think there is a requirement in pdfref that annotations must have "AP". | 13:28.56 |
tor8 | but anyway, I just changed all the accesses to page_objs/refs to pdf_lookup_page_obj (or should that be page_ref?) | 13:29.00 |
sebras | hence I argue that we shouldn't throw in this case. | 13:29.06 |
Robin_Watts | tor8: right. I am absolutely in favour of all your changes. | 13:29.27 |
| I've just pushed a new commit to robin/pagetree | 13:29.44 |
| Should give us safety with speed. | 13:29.56 |
paulgardiner | Hi sebras. I thought that was correct because the catch clause doesn't bomb out completely: it just removes that annot from the list, which is the same effect as the continue that was in place before, because the continue was before the creation of the annot structure | 13:30.57 |
sebras | paulgardiner: hm... but why is the error printed to the console in that case? | 13:32.03 |
paulgardiner | There's still an fz_warn in there. But there was before, so I wouldn't expect an error printed out where there wasn't before. | 13:32.57 |
sebras | paulgardiner: I guess that is my main gripe. if we do internal error handling using throw to skip over annotations that dont have AP then that is fine, but we shouldn't complain about it. | 13:32.58 |
tor8 | Robin_Watts: ugh. so. much. code. | 13:33.07 |
Robin_Watts | tor8: Well, my worry was that I didn't want to make the code slower. If it's not slow,then we don't necessarily need the faster version. | 13:33.45 |
sebras | paulgardiner: hm. can you try rendering pdfref17.pdf, just the first few pages. if you see no errors/warnings printed for those pages then there might be something strange with my build. | 13:33.55 |
tor8 | I think lookup_page_number is the one that has to be fast, not lookup_page_obj | 13:34.06 |
| since that one is used to load all links and link dests | 13:34.16 |
sebras | paulgardiner: though I rebuilt everything from scratch last night, so I would be really surprised if I messed up. :) | 13:34.16 |
tor8 | though if we make fz_link opaque with dynamic page lookup on demand, neither needs to be blazing fast | 13:34.35 |
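To illustrate the opaque-link idea, a sketch in which a link stores the destination's object number plus a resolver callback supplied by the document type. The page number is then computed on demand and stays correct after page edits. `lazy_link`, `toy_doc` and `toy_resolve()` are hypothetical; the toy resolver is a linear search standing in for a page-tree lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Resolver supplied by the document type: object number -> page number. */
typedef int (*resolve_page_fn)(void *doc, int objnum);

/* Hypothetical opaque link: no page number is cached at load time. */
typedef struct {
    int dest_objnum;
    void *doc;
    resolve_page_fn resolve;
} lazy_link;

static int link_page(const lazy_link *l)
{
    assert(l != NULL && l->resolve != NULL);
    return l->resolve(l->doc, l->dest_objnum);
}

/* Toy document: page_objs[i] is the object number of page i. */
typedef struct {
    int *page_objs;
    int npages;
} toy_doc;

static int toy_resolve(void *doc, int objnum)
{
    const toy_doc *d = doc;
    for (int i = 0; i < d->npages; i++)
        if (d->page_objs[i] == objnum)
            return i;
    return -1; /* destination page no longer exists */
}
```

Reordering the toy document's pages changes what `link_page()` reports without touching the link itself.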
paulgardiner | sebras: and this is an error that didn't previously appear, before these changes? | 13:34.57 |
| Ah! Robin spotted a problem with pdfref17.pdf, but that's supposedly fixed now. | 13:35.42 |
Robin_Watts | sebras: there was a problem that used to spit lots of "can't load object 333278 into cache" or something like that errors | 13:36.34 |
| tor8: So are we really happy with just a depth counter? | 13:37.51 |
| I'd like to have a go at pdf_mark/unmarking there. | 13:38.43 |
tor8 | I think I'm happy, but maybe bump it up beyond 100 if we run into these bad deeply nested cases too often | 13:43.14 |
| you could try a recursive variant with mark/unmark though | 13:43.37 |
| but that's going to hit stack problems with those same files | 13:43.50 |
| I'm having a go at insert/delete page | 13:44.06 |
| which needs a more generic lookup_obj function that returns the parent and index in kids array | 13:44.22 |
Robin_Watts | tor8: I think I can do a non-recursive version with mark/unmark. | 13:44.57 |
paulgardiner | sebras: I do see "broken annotation" warnings with pdfref17.pdf. We could not warn in that situation, but what I'm more concerned about is if these warnings are newly occurring. Something isn't working as I thought if these warnings weren't appearing before these changes. | 13:47.41 |
Robin_Watts | actually, I think there is a flaw in my faster version with less try/catching. | 13:48.02 |
| By using try/catch in the cleanup path, I think I lose the ability to rethrow the error. | 13:48.29 |
| actually, no, I should be fine. | 13:49.07 |
paulgardiner | sebras: oh hang on. You're right. We'd have avoided a warning there before. Damn! that will be awkard to handle neatly without using a special error code, or a local flag | 13:50.33 |
| Robin_Watts, tor8: now we have error codes, do you think it's reasonable to use them in earnest even for cases where it's just so that we easily share error clean up code? | 13:55.15 |
Robin_Watts | paulgardiner: No point in having 'em if we don't use 'em. | 13:56.14 |
paulgardiner | In this case I'd be introducing a missing-appearance-stream error, just so that I can test for it and avoid issuing a warning. | 13:56.22 |
Robin_Watts | Argh. That's the excuse for all C++ code being awful isn't it ? | 13:56.25 |
paulgardiner | :-) | 13:56.53 |
| sebras: anyway, thanks for spotting that, and sorry for being so slow to spot what was going on. | 13:57.24 |
tor8 | Robin_Watts: you know what, how about just leaving dirt marks around if there are errors? That'll screw up recovery on incremental loading though, won't it? | 13:57.26 |
Robin_Watts | It'll screw everything up. | 13:58.12 |
| but in particular incremental loading. | 13:58.26 |
tor8 | I hate error handling... it always ends up being five times as much code and completely obscures what the code is supposed to be doing, exceptions or error code returns doesn't matter it's all just the same ugly crap in the end :( | 13:59.40 |
| and our Allman-style brace convention really adds to the number of semi-blank useless lines for try/always/catch | 14:00.16 |
paulgardiner | But exceptions at least avoid it polluting a large number of functions | 14:00.19 |
Robin_Watts | Exceptions generally make the error handling much easier to read, but in this case, I'm working to avoid the try/catch overhead. | 14:00.28 |
paulgardiner | At least you don't have to explicitly pass errors through functions that don't even need clean up | 14:01.02 |
Robin_Watts | Just thinking about it... if we hit a file with a large unbalanced tree of pages, we'll overflow the stack in the pdf_lookup_page_obj stuff. | 14:01.12 |
tor8 | Robin_Watts: we did that in the old code too though | 14:01.54 |
| that one was also recursive | 14:01.57 |
Robin_Watts | http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=0e4c512bb4b73d575cfa6640efe6ed6372f6d060 | 14:03.10 |
tor8 | if (mark) both sets and tests? | 14:06.24 |
Robin_Watts | it does. | 14:06.31 |
tor8 | won't the always block hit the same cycle and loop forever? | 14:07.11 |
| while (parent2 != parent) | 14:07.32 |
Robin_Watts | eh? | 14:08.01 |
| I just updated my patch. How did you find the new one? | 14:08.11 |
| http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=8438125a08d50467a1ffc25ec8009ec3af853730 | 14:08.28 |
| oh, you didn't yet. but I think the above solves it. | 14:08.47 |
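Robin_Watts' actual patch is in the linked commits; as an independent sketch of a mark/unmark variant (hypothetical `mnode` type and `lookup_page_marked()`), `mark()` sets the flag and returns its previous value, so one call both tests and sets. The walk records its own `up` pointers instead of trusting /Parent, so the unmarking loop follows a chain containing only nodes marked in this walk and cannot cycle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node with a scratch mark and a back pointer filled in by
   the traversal itself (the file's /Parent entries are not trusted). */
typedef struct mnode {
    struct mnode **kids;
    int nkids;
    int count;
    int marked;
    struct mnode *up;
} mnode;

/* Set the mark and return its previous value: test-and-set in one call. */
static int mark(mnode *n)
{
    int was = n->marked;
    n->marked = 1;
    return was;
}

static mnode *lookup_page_marked(mnode *root, int needle)
{
    mnode *node = root, *found = NULL;
    int skip = needle;

    assert(root != NULL);
    if (mark(root))
        return NULL; /* already inside another traversal */
    root->up = NULL;
    for (;;)
    {
        mnode *next = NULL;
        if (node->nkids == 0)
        {
            if (skip == 0)
                found = node;
            break;
        }
        for (int i = 0; i < node->nkids && next == NULL; i++)
        {
            mnode *kid = node->kids[i];
            int n = kid->nkids == 0 ? 1 : kid->count;
            if (skip < n)
                next = kid;
            else
                skip -= n;
        }
        if (next == NULL || mark(next))
            break; /* page not present, or a cycle in /Kids */
        next->up = node;
        node = next;
    }
    /* Unwind: the up chain links only nodes marked in this walk. */
    for (mnode *n = node; n != NULL; n = n->up)
        n->marked = 0;
    return found;
}
```

Real code must also unmark when loading an object throws, which is where the try/catch cost discussed above comes in; this sketch has no error path other than returning NULL.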
tor8 | yeah, it looks reasonable. head deep in some code of my own so haven't considered all the cases. | 14:09.50 |
| or is that neck deep? | 14:09.58 |
sebras | paulgardiner: no problem, I meant to fix it last night, but I have slept badly the entire week so I opted for just telling you instead of spending the night bugfixing mupdf. ;) | 14:10.24 |
dogisfat | Why does the *_plane_data portion of a driver provide a height? In my experience I have only seen one passed in as the height. Is it not possible to process the image all at once? | 14:10.33 |
sebras | paulgardiner: it doesn't matter that you broke it as long as we fix it. :) | 14:10.46 |
Robin_Watts | dogisfat: No, it is not always possible to process all the image at once. | 14:12.09 |
paulgardiner | sebras: yeah, but still good to know one way or the other and the mechanism by which it's broken. | 14:12.31 |
Robin_Watts | tor8: I will do the same modification to pdf_lookup_inherited_page_item. | 14:12.33 |
dogisfat | Robin_Watts: Can you give me an example? I am trying to better understand how images work within GS. | 14:12.49 |
Robin_Watts | dogisfat: The device interface is driven from all sorts of different places. | 14:13.46 |
| Sometimes the places that are calling it don't have whole images available. | 14:14.06 |
| So they have to be able to pass the data they do have. | 14:14.14 |
| In particular we might get images in in some obscure compressed format. | 14:14.50 |
dogisfat | That makes sense. Will one row aways be passed at a time, never more? | 14:14.50 |
Robin_Watts | dogisfat: No. We can pass as many lines as we have. | 14:15.07 |
dogisfat | Ah, thanks! | 14:15.18 |
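To picture that contract, a sketch of an image sink that accepts any number of rows per call and copies each chunk into place until the image is complete. `image_sink` and `sink_rows()` are hypothetical and do not reproduce Ghostscript's actual plane_data device procedure.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sink for one image of `height` rows. */
typedef struct {
    unsigned char *buf; /* width_bytes * height bytes */
    int width_bytes;
    int height;
    int rows_done;
} image_sink;

/* Accept nrows rows (one, several or all). Returns 1 once the image is
   complete, 0 if more rows are expected, -1 if too many rows arrive. */
static int sink_rows(image_sink *s, const unsigned char *data, int nrows)
{
    assert(s != NULL && nrows >= 0);
    if (nrows > s->height - s->rows_done)
        return -1;
    if (nrows > 0)
    {
        memcpy(s->buf + (size_t)s->rows_done * s->width_bytes, data,
               (size_t)nrows * s->width_bytes);
        s->rows_done += nrows;
    }
    return s->rows_done == s->height;
}
```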
| Also I understand that PDFs are flattened when there is transparency within the image. What would happen if there were a PNG with transparency in the PDF, will that be passed as a transparent image to *_plane_data? | 14:16.02 |
Robin_Watts | dogisfat: Different type of transparency, I believe. | 14:17.18 |
| actually, no. ignore me. | 14:17.29 |
| PDFs do not ever have PNGs in them. | 14:17.42 |
chrisl | We don't support PNG in PDF, IIRC | 14:17.42 |
kens | pdfwrite doesn't, no | 14:17.55 |
chrisl | PNG is an almost never implemented optional filter in PDF, IIRC | 14:18.13 |
dogisfat | Let's say, then, any image format with transparency found in a PDF. | 14:18.40 |
kens | pdfwrite only flattens transparency when going to a version of PDF < 1.4 or to PostScript | 14:19.02 |
| But that's more to do with transparency than images | 14:19.38 |
Robin_Watts | chrisl: PNG is not even mentioned in the spec! | 14:19.56 |
tor8 | Robin_Watts: ugh. missing functions I want: pdf_array_delete and pdf_array_insert_drop | 14:20.51 |
Robin_Watts | dogisfat: Most image formats (including all of those used in PDF I believe) support (at most) 'on' or 'off' alpha channels (akin to GIF) | 14:20.56 |
chrisl | Robin_Watts: Oh, maybe I dreamed that, then...... | 14:21.12 |
kens | I don't think images carry transparency as such in PDF; they can be associated with an SMask, or in a transparency group, but that is all (I think) | 14:21.19 |
dogisfat | Ok, thank you very much | 14:21.32 |
Robin_Watts | The only way to get 'proper' alpha transparency is to use, as kens says, either an SMask, or a transparency group. | 14:21.39 |
kens | As Robin says, there is also the imagemask, but that's not exactly transparency | 14:21.43 |
Robin_Watts | kens: PDF can do on/off transparency using colormasking things too. | 14:22.06 |
| but again that's not real transparency | 14:22.22 |
kens | Robin_Watts : so can PostScript, but it's still masking, not really transparency | 14:22.25 |
| type 4 images if I recall are chroma keyed | 14:22.49 |
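For the chroma-key case, a small sketch of the color-key test that PDF's /Mask array describes: a sample is masked out (not painted) only when every component falls inside its [min, max] range. The result is a yes/no decision per sample, which is why this is masking rather than real, partial transparency. `color_key_masked()` is a hypothetical helper.

```c
#include <assert.h>

/* ranges holds n (min, max) pairs, one per color component, in the same
   order as a /Mask array. Returns 1 if the sample is masked out. */
static int color_key_masked(const int *sample, const int *ranges, int n)
{
    assert(n > 0);
    for (int c = 0; c < n; c++)
        if (sample[c] < ranges[2 * c] || sample[c] > ranges[2 * c + 1])
            return 0; /* one component outside its range: paint it */
    return 1;
}
```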
dogisfat | I am really interested in extracting elements from different document types and trying to figure out how the data will be available to me. | 14:23.29 |
Robin_Watts | dogisfat: Using gs? You'll always get the uncompressed image data fed to you. | 14:27.41 |
| If you're writing your own device, then you can avoid using the clist, and thus hopefully avoid getting images clipped into several bands. | 14:28.31 |
dogisfat | Robin_Watts: Right, but imagine I have a PDF as an input file with two overlapping images with transparency. How will those arrive in my driver? Obviously this depends on the way the driver is written, but in my case I am interested in how it will arrive at the *_plane_data function | 14:30.02 |
Robin_Watts | PDF files are intrinsically drawn with the painters algorithm right? | 14:31.18 |
| start at the back, move forwards. | 14:31.26 |
| New things overlay old things. | 14:31.41 |
| So generally you'll expect to get things in order. | 14:31.58 |
| HOWEVER, for transparency, we do all sorts of stuff with compositor groups etc. | 14:32.15 |
| so it can get complex. | 14:32.34 |
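A toy illustration of the painter's algorithm on a one-row page, assuming only opaque objects (`paint()`, `span_item` and `PAGE_W` are hypothetical): objects are painted in content order, and a later one simply overwrites an earlier one where they overlap. Transparency groups and compositing, as noted above, do not reduce to this.

```c
#include <assert.h>

#define PAGE_W 8 /* toy page width in pixels */

/* An opaque horizontal span [x0, x1) painted with a single value. */
typedef struct { int x0, x1; unsigned char value; } span_item;

/* Paint items back to front: later items overwrite earlier ones. */
static void paint(unsigned char page[PAGE_W], const span_item *items, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        for (int x = items[i].x0 < 0 ? 0 : items[i].x0;
             x < items[i].x1 && x < PAGE_W; x++)
            page[x] = items[i].value;
}
```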
| Can I ask, is postscript important to you? | 14:32.43 |
dogisfat | Yes, as much so as PDF if not more | 14:32.53 |
Robin_Watts | ok. | 14:33.01 |
| so I can't suggest that you look at mupdf then. | 14:33.12 |
dogisfat | I am writing a driver to intercept PS files from the OSX print system and break them into graphical elements. I also wish to support as many other file types as possible and PDF happens to be an easy one to test with. | 14:34.22 |
| Yeah, I have evaluated a handful of packages but thus far GS supports the most features. | 14:34.57 |
Robin_Watts | dogisfat: I think there is a trace device that can be built into gs. | 14:36.23 |
dogisfat | Robin_Watts: I am using it as a model. | 14:36.44 |
Robin_Watts | right, so with that you can see exactly how the device is tickled. | 14:37.50 |
dogisfat | I will, however, I need to see with what it is supplied and I am having trouble getting said information out. I am so inexperienced with GS that performing simple tasks such as writing data out to a file can be very cumbersome. | 14:40.11 |
Robin_Watts | dogisfat: If you enhance the trace device to trace with more information than it currently does, then let us know. We'd be interested to see what you come up with. | 14:41.26 |
dogisfat | I will be sure to do that! | 14:41.53 |
Robin_Watts | tor8: New version of the latest commit on robin/pagetree | 14:42.15 |
| tor8: Is there a separable bit of work I can do to help you? | 14:42.37 |
| Oh, I should review paulgardiner's changes, unless you have. | 14:42.50 |
dogisfat | One last question if I may. Will plane_data be called sequentially until all of an images information has been passed? | 14:44.12 |
Robin_Watts | Yes. | 14:44.52 |
dogisfat | Awesome | 14:45.08 |
Robin_Watts | I mean, a caller might decide to decimate an image and send it as several smaller images over the device interface. | 14:45.18 |
| As long as the device interface sees all it needs to render stuff, it doesn't care. | 14:45.33 |
| but why would they? | 14:45.37 |
| 99 times out of 100 callers do "the thing that is simplest for them", which in this case is generally to run through an image sending all its data in conveniently sized lumps. | 14:46.31 |
dogisfat | OK, that makes sense. I just remember reading that calls to one of the image handlers were such that you had to keep track of which image was being processed. | 14:47.45 |
tor8 | Robin_Watts: more pagetree stuff on tor/master | 14:48.17 |
Robin_Watts | dogisfat: Right. For some PS image functions, you get image and alpha interleaved. | 14:48.20 |
tor8 | reimplemented lookup_page_obj, that needs try/catch sanity checking | 14:48.33 |
| and check for off-by-one errors in the array_insert/drop | 14:48.45 |
dogisfat | Robin_Watts: Thank you for all of your help. | 14:48.49 |
Robin_Watts | and they are possibly sent as separate images. | 14:48.51 |
| np. | 14:48.53 |
| tor8: Let me look. | 14:49.06 |
| The first of paulgardiner's commits looks fine. I still need to look at the second one. | 14:49.29 |
| previously pdf_array_insert would have inserted at the front ? | 14:50.27 |
| if we need that we should call it pdf_array_prepend (and likewise pdf_array_append) | 14:51.18 |
tor8 | we have pdf_array_push for adding to the end | 14:51.32 |
| insert used to insert at the front, yes | 14:51.51 |
Robin_Watts | pdf_array_insert is broken. | 14:52.15 |
| second arg to memmove is wrong I think. Needs a +i ? | 14:52.24 |
tor8 | d'oh! it used to have a+i | 14:52.38 |
| must've undone once too many somewhere | 14:52.53 |
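A sketch of the corrected insert, plus the delete tor8 wanted, on a hypothetical array of ints standing in for a pdf_obj array (`int_array`, `array_insert()` and `array_delete()` are illustrative names). The point of the fix is the memmove source: shifting `items + i`, not `items`, up by one slot.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dynamic array standing in for a pdf_obj array. */
typedef struct { int *items; int len, cap; } int_array;

/* Insert v at index i (0 <= i <= len). Returns 0, or -1 on error. */
static int array_insert(int_array *a, int v, int i)
{
    assert(a != NULL);
    if (i < 0 || i > a->len)
        return -1;
    if (a->len == a->cap)
    {
        int cap = a->cap ? a->cap * 2 : 8;
        int *p = realloc(a->items, (size_t)cap * sizeof *p);
        if (p == NULL)
            return -1;
        a->items = p;
        a->cap = cap;
    }
    /* Move items[i..len-1] up one slot; the source must be items + i. */
    memmove(a->items + i + 1, a->items + i, (size_t)(a->len - i) * sizeof *a->items);
    a->items[i] = v;
    a->len++;
    return 0;
}

/* Remove the item at index i, shifting later items down by one. */
static int array_delete(int_array *a, int i)
{
    assert(a != NULL);
    if (i < 0 || i >= a->len)
        return -1;
    memmove(a->items + i, a->items + i + 1, (size_t)(a->len - i - 1) * sizeof *a->items);
    a->len--;
    return 0;
}
```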
| the retainpages code is probably broken now in pdfclean, with the resources etc being inherited from the page tree | 14:53.20 |
Robin_Watts | does pdfclean now build? | 14:53.42 |
tor8 | everything builds now | 14:53.49 |
| retainpages could be rejigged to call pdf_delete_page | 14:54.28 |
| and then make a tree rebalancer (or just recreation function, based on the #if 0'd code) | 14:54.45 |
| that should help us not have to recreate the dests name tree as well | 14:55.40 |
| we still have to take care of page resources | 14:56.22 |
Robin_Watts | so, have you taken on my try/catch stuff as a basis for these commits? | 14:56.27 |
| yes, you took on the simple versions. | 14:57.01 |
| OK. | 14:57.03 |
tor8 | no. the lookup_page_obj had to be rewritten, and I haven't cherry-picked over your second one | 14:57.03 |
Robin_Watts | You took on: Add try/catch safety to pdf_lookup_page_obj_imp | 14:57.28 |
tor8 | yes, I used your simple version | 14:57.30 |
| with the skip counter instead of offset | 14:57.37 |
Robin_Watts | right, that makes the diffs make sense :) | 14:57.38 |
| You can't pass &needle in to pdf_lookup_page_loc_imp | 14:58.52 |
| because if !hit you need the original value of needle for the error message. | 14:59.10 |
| That's why I had int skip = needle; and passed &skip | 14:59.21 |
tor8 | oh, right you are | 14:59.42 |
| fixes pushed | 15:00.12 |
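A sketch of why the working copy matters, using a hypothetical tree type rather than MuPDF's `pdf_obj`. `lookup_imp` consumes `*skip` as it passes pages, so `lookup_page` hands it a copy and keeps `needle` intact for the error message. The message text is illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical page-tree node: a leaf is a page, an interior node has kids. */
typedef struct node {
    int is_page;
    int nkids;
    struct node **kids;
} node;

/* Depth-first walk that decrements *skip once per page passed and returns
 * the page where *skip reaches 0, or NULL if the tree runs out first.
 * *skip is modified, so callers must not pass their only copy. */
static node *lookup_imp(node *n, int *skip)
{
    if (n->is_page) {
        if (*skip == 0)
            return n;
        (*skip)--;
        return NULL;
    }
    for (int k = 0; k < n->nkids; k++) {
        node *hit = lookup_imp(n->kids[k], skip);
        if (hit)
            return hit;
    }
    return NULL;
}

static node *lookup_page(node *root, int needle, char *err, size_t errlen)
{
    int skip = needle;  /* working copy: lookup_imp will change it */
    node *hit;
    assert(root != NULL && err != NULL && errlen > 0);
    hit = lookup_imp(root, &skip);
    if (!hit)  /* report the page the caller asked for, not the leftover skip */
        snprintf(err, errlen, "cannot find page %d in page tree", needle);
    return hit;
}
```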
Robin_Watts | For pdf_lookup_page_loc, I might be tempted to have a structure with a pdf_obj * and an int * in. | 15:00.25 |
| And then pass the pointer to that rather than 2 separate pointers. | 15:00.31 |
| on ARM having 4 or less params is nicer. | 15:00.46 |
tor8 | too many types! I see your point, but ugh. | 15:00.48 |
Robin_Watts | minor thing. | 15:00.55 |
| It means that if we ever have to update what a "location" is, we can do it without changing too much code. | 15:01.20 |
tor8 | hopefully won't see too many calls there | 15:01.23 |
| we could hoist the Count skip thing up to before we call the recursion | 15:01.43 |
Robin_Watts | Root/Pages is guaranteed not to be a Page is it? | 15:04.26 |
tor8 | I believe it must be a page tree node (as opposed to page tree leaf) | 15:05.53 |
| since it must have a Count at the top somewhere | 15:06.14 |
Robin_Watts | tor8: Are you going to rebase -i fixup those fix commits? | 15:07.21 |
tor8 | sure. let me just try hoisting the count skip update first. | 15:07.45 |
Robin_Watts | ok. I'm not sure I see how that will work, but I'll wait for it :) | 15:08.27 |
tor8 | pull :) | 15:08.52 |
| not sure if that catches every corner case though | 15:09.39 |
| ignore that one, I'm too tired for it... off by one bugs everywhere too | 15:10.21 |
| I'll squish the rest up and push a cleaned fixup | 15:10.37 |
dogisfat | Are there routines available within G that I can call to convert binary data into Base64? | 15:10.48 |
| GS* | 15:10.51 |
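The question goes unanswered in the log. For reference, a stand-alone encoder for the standard Base64 alphabet (RFC 4648, with `=` padding) looks roughly like this. It is not a Ghostscript routine, and `base64_encode` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Encode n bytes from in as Base64 into out, NUL-terminated.
 * out must have room for 4 * ((n + 2) / 3) + 1 bytes.
 * Returns the number of characters written, excluding the NUL. */
static size_t base64_encode(const unsigned char *in, size_t n, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t o = 0;
    assert(out != NULL && (in != NULL || n == 0));
    for (size_t i = 0; i < n; i += 3) {
        /* pack up to 3 input bytes into a 24-bit group */
        unsigned long v = (unsigned long)in[i] << 16;
        if (i + 1 < n) v |= (unsigned long)in[i + 1] << 8;
        if (i + 2 < n) v |= in[i + 2];
        /* emit four 6-bit symbols, padding the ones with no input behind them */
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = (i + 1 < n) ? tbl[(v >> 6) & 63] : '=';
        out[o++] = (i + 2 < n) ? tbl[v & 63] : '=';
    }
    out[o] = '\0';
    return o;
}
```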
Robin_Watts | tor8: oh, I see. On the grounds that we should never be out of range of the topmost item, hence pushing in doesn't hurt. | 15:11.24 |
tor8 | yeah. | 15:11.36 |
Robin_Watts | Well, you've lost my Mariotastic comment :( | 15:11.48 |
| if (*skip >= count) { *skip -= count; break; } hit = pdf_lookup_page_loc_imp; | 15:12.48 |
| no need for if(hit)break then, and fewer lines overall? | 15:13.05 |
| We had a rule at my previous place that we should avoid right hand creep where possible, so it's become a habit. | 15:13.39 |
| Oh, ignore me! We still need if (hit)break. | 15:14.28 |
| so, I approve of your first 2 commits, and the fixes for them. | 15:15.04 |
| I still have to read 3 and 4. | 15:15.15 |
| oh, 4 is another fix. That's fine too. | 15:15.34 |
tor8 | Robin_Watts: squished and resorted commits on tor/pagetree now | 15:15.34 |
| still need to rebase past an area of conflict... | 15:15.54 |
Robin_Watts | First one is still broken :) | 15:16.10 |
| +1 != +i | 15:16.17 |
tor8 | shoot me now. | 15:16.32 |
SparFux | Hi all. In Fedora F18, I get a foomatic-rip error: http://pastebin.com/ec2Lgc7A | 15:16.47 |
Robin_Watts | It's a friday. | 15:16.48 |
kens | SparFux : looks like your PostScript file is broken. If you want someone to look more closely, open a bug report | 15:17.38 |
Robin_Watts | SparFux: You either need to speak to tkamppeter, or you need to reduce the problem to a direct invocation of ghostscript. | 15:17.49 |
kens | SparFux : the last error is 'broken pipe', so perhaps you need to go back to the foomatic-rip people | 15:18.43 |
SparFux | Yeah, gs can actually display the file on screen. | 15:19.16 |
kens | SparFux : then it sounds like it's not us.... | 15:19.33 |
SparFux | so I guess it's not broken, but it might be the foomatic-rip stuff. | 15:19.34 |
| kens: thx :-) | 15:19.39 |
Robin_Watts | tor8: Your second commit has: "// TODO: inherit" in it. | 15:20.04 |
| just after you add the inheritance code :) | 15:20.12 |
| actually, there is a problem with pdf_array_insert I think. | 15:21.58 |
| Suppose I have an array with 2 things in, and a cap of 4. And I want to insert at position 10. | 15:22.26 |
tor8 | yeah, we're not good about bounds checking there | 15:22.50 |
| should probably add some | 15:22.59 |
| 0 <= i <= array->len | 15:23.21 |
Robin_Watts | same issue in pdf_insert_page | 15:23.26 |
| tor8: That feels reasonable. | 15:23.37 |
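A sketch of the `0 <= i <= len` check on a simplified growable array (a stand-in type, not MuPDF's `pdf_obj`). The bound is `len`, not `cap`: in Robin's example (len 2, cap 4, insert at 10) an unchecked call would compute `len - i` as -8, which converts to a huge `size_t` byte count in `memmove`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified growable int array (hypothetical). */
typedef struct {
    int len, cap;
    int *items;
} int_array;

/* Insert v at index i. Valid indices are 0..len (i == len appends).
 * Returns 0 on success, -1 on a bad index or allocation failure. */
static int array_insert_checked(int_array *a, int i, int v)
{
    assert(a != NULL);
    if (i < 0 || i > a->len)
        return -1;
    if (a->len == a->cap) {  /* grow before shifting */
        int newcap = a->cap > 0 ? a->cap * 2 : 4;
        int *p = realloc(a->items, (size_t)newcap * sizeof *p);
        if (p == NULL)
            return -1;
        a->items = p;
        a->cap = newcap;
    }
    memmove(a->items + i + 1, a->items + i, (size_t)(a->len - i) * sizeof *a->items);
    a->items[i] = v;
    a->len++;
    return 0;
}
```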
tor8 | I gotta go soon, I've pushed rebased version on tor/pagetree | 15:23.42 |
| don't push the hoist one whatever you do, I think it's broken | 15:23.50 |
Robin_Watts | I can do these little fixes if you want. | 15:23.55 |
tor8 | go for it. I'm not fit for more coding today, considering how many times I tried to fix array_insert and failed >.< | 15:24.19 |
Robin_Watts | ok. | 15:24.51 |
| I'm gonna take a 20 minute break. | 15:25.05 |
tor8 | make sure to pull before you start again, there's one more fix up | 15:27.41 |
Robin_Watts | will do | 15:28.26 |
dogisfat | How does the ImageMatrix describe the translation/transformation of an image? | 15:46.22 |
kens | dogisfat, that's really outside the scope of an IRC discussion; you should read the PostScript Language Reference Manual or the PDF Reference Manual | 15:47.08 |
dogisfat | Ok, thank you. | 15:47.21 |
SparFux | Sorry guys! Was definitely my fault. My direct printing script and the cups configuration used an old IP. | 16:04.19 |
kens | Good to know you've sorted it out SparFux | 16:04.42 |
SparFux | kens: yeah, just didn't want to leave anybody in doubt that it might have been PostScript, or that some software has a bug not found yet :-) | 16:05.26 |
Robin_Watts | tor8: If we have a /Page with a /Count in it, we'll go wrong. | 16:25.49 |
tor8 | Robin_Watts: in which function? | 16:28.30 |
Robin_Watts | pdf_count_pages_before_kid | 16:28.53 |
| robin/pagetree has the out of bounds fixes on. | 16:30.10 |
| proposed fix for pdf_count_pages_before_kid on robin/master. | 16:32.41 |
| robin/pagetree | 16:32.47 |
| oops. broken. but you get the idea. | 16:34.21 |
| Fixed one there now. | 16:35.58 |
| and now a version that compiles too. | 16:37.22 |
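A sketch of the idea behind the fix, not the actual change on robin/pagetree: when summing the pages that precede a kid, only /Type /Pages nodes should contribute their /Count, and a leaf page contributes exactly one even if it carries a stray /Count. The types are hypothetical and cover a single level of the tree.

```c
#include <assert.h>

/* Hypothetical kid record: is_pages mirrors /Type /Pages, count mirrors
 * whatever /Count the object carries (a malformed /Page may have one). */
typedef struct { int is_pages; int count; } kid_info;

/* Number of pages that precede kids[target] within one parent. */
static int count_pages_before_kid(const kid_info *kids, int nkids, int target)
{
    int total = 0;
    assert(target >= 0 && target < nkids);
    for (int i = 0; i < target; i++)
        total += kids[i].is_pages ? kids[i].count : 1;  /* ignore /Count on leaves */
    return total;
}
```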
| tor8: If I don't speak to you before, btw, I hope the move goes well. | 16:38.36 |
tor8 | Robin_Watts: right, they both look fine | 16:55.53 |
| and thanks :) | 16:56.03 |
Robin_Watts | tor8: np. | 16:56.12 |
| cluster testing shows a file that goes wrong. | 16:57.35 |
| will look into it. | 16:57.38 |
| And a file that now works :) | 16:58.01 |
| ghostpcl/tests_private/pdf/PDF_1.7_FTS/fts_07_0704.pdf has page 2 with no /Type/Page | 17:00.36 |
| oh, it's a /Type /Template | 17:01.02 |
| cluster problems aren't really problems, so it looks good. | 17:23.31 |
ray_laptop | Robin_Watts: Scott's trying to skype you | 20:57.00 |
| Forward 1 day (to 2013/06/29)>>> | |