| 20151209 |
onionhammer | hey guys, question.. if I have an fz_buffer, what's the idiomatic way to output that to a file | 02:46.49 |
| i guess i can just fwrite the data ptr? | 02:52.49 |
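Writing the buffer out amounts to one fwrite of its data pointer and length. The sketch below assumes a buffer that exposes those two fields. `demo_buffer` and `write_buffer_to_file` are hypothetical names standing in for `fz_buffer`, whose field layout and accessors vary between MuPDF versions (newer headers declare `fz_buffer_storage()` for getting at the data), so check the headers you actually build against.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for fz_buffer: only the fields needed here. */
typedef struct {
    unsigned char *data; /* start of the buffer's bytes */
    size_t len;          /* number of valid bytes */
} demo_buffer;

/* Write every byte of buf to path in binary mode.
   Returns 0 on success, -1 if opening, writing or closing fails. */
static int write_buffer_to_file(const demo_buffer *buf, const char *path)
{
    assert(buf != NULL && path != NULL);
    FILE *f = fopen(path, "wb"); /* "b" avoids newline translation */
    if (f == NULL)
        return -1;
    size_t written = fwrite(buf->data, 1, buf->len, f);
    /* fclose flushes pending output, so its result matters too. */
    int close_failed = fclose(f) != 0;
    return (written == buf->len && !close_failed) ? 0 : -1;
}
```

Checking the fclose result as well as the fwrite count matters because buffered data is only guaranteed to reach the file when the stream is flushed and closed.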
kens | chrisl: removing the ReusableStreamDecode from the ICCBased colour space resolve code reduces the memory usage from > 1.9GB to 800MB. It's still a lot, but it indicates a fairly serious problem :-( | 08:37.53 |
chrisl | kens: Well, first question is: is this happening in global VM? | 08:42.02 |
kens | Hmm, let me check | 08:42.43 |
| No, definitely not | 08:43.08 |
| The ReusableStream is stored as /DataSource in a dictionary, which is stored in an array. On return from that function I 'pop' the array, so the array and dict should be released. | 08:43.58 |
chrisl | Well, they won't be released immediately | 08:44.18 |
kens | Yes, but the ReusableStream seems never to be released | 08:44.38 |
| I only mention the sequence to show that the dict and therefore the stream *is* released; it's not a dangling reference | 08:45.09 |
chrisl | I'll hack together a test file, and see if I can spot something obvious.... | 08:45.46 |
kens | I'm not sure how easy it will be to tell :-( | 08:46.15 |
| Bearing in mind this is all inside the PDF interpreter | 08:46.27 |
chrisl | True, but if there's something amiss with the ReusableStreamDecode it *should* be visible from a fairly simple PS example | 08:47.15 |
kens | I'd hope so..... | 08:47.26 |
chrisl | Are there any other filters involved? | 08:49.34 |
kens | Hmm, maybe.... | 08:49.52 |
| Looks like Flate as well | 08:51.05 |
chrisl | So it could be a problem with Flate | 08:51.29 |
kens | Well, I suppose possibly, but if I knock out *just* the reusable stream the problem gets better | 08:52.09 |
| (not right, but better) | 08:52.15 |
| Note that the way this is set up, I never actually use the stream | 08:52.44 |
chrisl | Huh, invalidaccess on setfileposition | 08:54.00 |
kens | Odd, what *have* you done ? | 08:54.22 |
| OK seems like they all have Flate and nothing else | 08:55.03 |
chrisl | Open a file for reading and do "<file> 0 setfileposition" | 08:55.12 |
kens | Well that seems like it should work | 08:55.27 |
| FWIW we don't use ReusableStreamDecode that much in the PDF interpreter, far less than I thought | 08:58.10 |
chrisl | Well, I'm not seeing increasing memory use with repeated ReusableStreamDecode filters in a simple PS example | 09:00.52 |
kens | I feared that might be the case | 09:01.05 |
chrisl | Is it viable to replace it with a SubFileDecode? | 09:01.38 |
kens | Umm, no idea | 09:01.48 |
| I don't know why we even put a ReusableStream in there | 09:02.10 |
chrisl | Just to be clear: preventing the ICC profile creation, but leaving the filter in place still showed the problem? | 09:03.01 |
kens | I'm not entirely knocking out the use of ICC profiles. I've disabled it for images only, by replacing a bunch of functions in pdf_draw.ps | 09:03.50 |
| Now, with the ReusableStream removed, the ICC code can't read the profile (I believe, let me check) | 09:04.42 |
chrisl | Well, I ask because when we create an ICC profile object, we store the ICC profile data in a buffer attached to the profile object | 09:04.59 |
kens | Yes, but I showed that was being freed appropriately | 09:05.17 |
| OK, I popped the stream and replaced the DataSource in the dictionary with null | 09:05.59 |
| and the file still runs, so obviously I'm not using the DataSource | 09:06.13 |
| peak memory usage seems to be ~250 MB | 09:06.54 |
| Umm, it's going up again, probably more shadings | 09:07.35 |
| Which seem to have the same problem, unsurprisingly, since they use the same code to resolve the ICC profile | 09:08.05 |
| OK that peaked at 794MB, let me put back the ReusableStreamDecode and I bet it will run out of memory | 09:09.21 |
| One of the problems is that I can't edit the PDF file, as decompressed it's > 800MB | 09:10.38 |
| Aha, interesting | 09:11.05 |
| It looks like not putting the ReusableStream into the dictionary solves the problem; it's only if I add it to the dict as a /DataSource that the problem arises. Which suggests that either popping the array is not freeing the dictionary (maybe there's another reference) or freeing the dictionary is not releasing the ReusableStream | 09:12.26 |
chrisl | Um, that actually makes even less sense :-( | 09:13.17 |
kens | Well, if the dictionary created for the image has another reference then it won't be released, which means the DataSource won't be released either | 09:14.02 |
chrisl | Oh, okay. Well, we'll be storing it when we dereference the PDF object reference | 09:15.37 |
kens | Yes, that's what's happening. We call 'resolvecolorspace' which converts the array '[/ICCBased object 0 R]' into '[/ICCBased -dict-]' | 09:16.35 |
| The dict contains (amongst other stuff) a /DataSource | 09:16.54 |
| If I store the stream in /DataSource then the memory increases, if I store a null in DataSource and pop the stream, the problem goes away | 09:17.32 |
| So it seems like either there is more than one reference to the dictionary (in addition to the array that contains it) or popping the array does not (somehow) release the DataSource | 09:18.38 |
| Let me send you a pdf_draw.ps | 09:19.27 |
chrisl | I'm going to get a coffee..... | 09:20.51 |
kens | Wise move..... | 09:20.59 |
chrisl | So, I'd guess that we keep the resolved color space dictionary in the same way we keep (most) other objects - replacing the resolving procedure(s) with the resolved objects, so that if we reuse them, we don't have to re-resolve | 09:32.39 |
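The hypothesis can be illustrated with a toy model. The C sketch below uses reference counts as a stand-in for reachability (Ghostscript's PostScript VM is garbage collected, and every name here is hypothetical): popping the array releases the dictionary and its /DataSource stream, unless something else, such as a cache of resolved objects, still holds the dictionary.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object; refcounts stand in for reachability.
   Hypothetical model, not Ghostscript's garbage-collected VM. */
typedef struct obj {
    int refs;
    struct obj *child; /* e.g. a dictionary's /DataSource stream */
} obj;

static int live_objects = 0; /* objects allocated but not yet freed */

/* Create an object that takes over the caller's reference to child. */
static obj *obj_new(obj *child)
{
    obj *o = malloc(sizeof *o);
    if (o == NULL)
        abort();
    o->refs = 1;
    o->child = child;
    live_objects++;
    return o;
}

static obj *obj_keep(obj *o)
{
    o->refs++;
    return o;
}

/* Drop one reference; freeing a container also drops its child. */
static void obj_drop(obj *o)
{
    if (o == NULL)
        return;
    assert(o->refs > 0);
    if (--o->refs == 0) {
        obj_drop(o->child);
        free(o);
        live_objects--;
    }
}

/* Build [/ICCBased <<... /DataSource stream>>], optionally record the
   resolved dictionary in a cache, then 'pop' the array. */
static void resolve_then_pop(obj **cache)
{
    obj *stream = obj_new(NULL);
    obj *dict = obj_new(stream); /* dict holds the /DataSource */
    obj *array = obj_new(dict);  /* array holds the dict */
    if (cache != NULL)
        *cache = obj_keep(dict); /* the extra reference */
    obj_drop(array);
}
```

Without the cache entry, popping the array frees all three objects. With it, the dictionary and its stream both survive, which is consistent with the observation that storing null in /DataSource makes the growth go away while storing the stream does not.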
kens | Hmm, that's a good point, I bet we do | 09:32.55 |
| But, in that case, we should not resolve the space again, and we certainly do | 09:33.20 |
| Or at least, I think we do | 09:33.35 |
chrisl | In forms? | 09:33.46 |
kens | I think the PDF interpreter stores them by object number | 09:34.08 |
| So it doesn't (or shouldn't) matter where they are used. | 09:34.20 |
chrisl | But aren't forms run in a save/restore? | 09:34.27 |
kens | I would have thought so, yes, but this is (again, I think) mostly done in a single level of form nesting. And if we restored away the resolved space, shouldn't the DataSource go with it? | 09:35.25 |
chrisl | Hrm, yes.... | 09:35.56 |
| Oh, maybe each case is a different color space object, referencing the same ICC profile? | 09:36.29 |
kens | I admit it does look like we are resolving the spaces more than once | 09:36.32 |
| Hmm, that's possible, the colour space definition for the images is unusual | 09:36.59 |
| Instead of '/ColorSpace xxx 0 R' these are defined as '/ColorSpace [/ICCBased xxx 0 R]' | 09:37.43 |
| So perhaps that is what's going on. It would also explain why extracting the page with Acrobat solves it | 09:38.24 |
chrisl | Right, so we won't cache the color space array, but we will cache the image dict containing it | 09:38.43 |
kens | I imagine Acrobat alters the ColorSpace definitions | 09:38.44 |
| This looks like it's going to be a nightmare to solve; I'll need to think about it | 09:39.22 |
chrisl | My first thought is to take a copy of the image dictionary, do all the resolving and rendering in the *copy*, and throw away the copy, leaving the original intact | 09:41.25 |
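The copy-and-discard idea can be sketched in C under a simplified, hypothetical model: when resolution happens in place, each image dictionary the interpreter keeps around ends up owning its resolved profile data, whereas resolving in a scratch copy and discarding the copy leaves the original unresolved and the memory reclaimable. All type and function names are invented for illustration; the real change would presumably be PostScript in pdf_draw.ps, not C.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: an image dictionary whose /ColorSpace entry starts
   as an indirect reference and, once resolved, owns the profile data. */
typedef struct {
    int obj_num;            /* the "N 0 R" reference, kept for illustration */
    unsigned char *profile; /* resolved profile bytes, or NULL */
    size_t profile_len;
} colorspace_entry;

typedef struct {
    colorspace_entry cs;
} image_dict;

static size_t bytes_live = 0; /* profile bytes currently allocated */

/* Load the profile into cs unless it is already resolved. */
static void cs_resolve(colorspace_entry *cs, size_t len)
{
    assert(len > 0);
    if (cs->profile != NULL)
        return;
    cs->profile = malloc(len);
    if (cs->profile == NULL)
        abort();
    cs->profile_len = len;
    bytes_live += len;
}

/* Free the resolved data and mark the entry unresolved again. */
static void cs_release(colorspace_entry *cs)
{
    free(cs->profile);
    bytes_live -= cs->profile_len;
    cs->profile = NULL;
    cs->profile_len = 0;
}

/* Resolve in place: the data stays attached to the long-lived dict. */
static void render_in_place(image_dict *img, size_t len)
{
    cs_resolve(&img->cs, len);
    /* ... draw using img->cs.profile ... */
}

/* Resolve in a scratch copy and discard it, leaving img untouched. */
static void render_with_copy(const image_dict *img, size_t len)
{
    image_dict scratch = *img; /* shallow copy */
    int loaded_here = scratch.cs.profile == NULL;
    cs_resolve(&scratch.cs, len);
    /* ... draw using scratch.cs.profile ... */
    if (loaded_here)
        cs_release(&scratch.cs); /* only free what the copy loaded */
}
```

Rendering the same image many times through `render_with_copy` leaves nothing allocated, while `render_in_place` leaves one profile attached to every distinct image dictionary. Because the copy is shallow, an already-resolved original shares its data with the copy, which is why only data loaded by the copy is freed.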
kens | Hmm, possibly. Like I said, I'll need to think about it. Got to go and do a few chores before I head out. Thanks for the help though | 09:42.33 |
chrisl | Okay, back to trying to figure out why my dictionary parsing works on the plain file Fontmap.GS, but fails on it in romfs :-( | 09:52.39 |
| Or maybe I will shoot round to the shops first.... | 09:56.28 |
leopatras_ | Hi, I stumbled across mujs: looks really neat! However, I can't find the tests/ subdir in the git repo. Any pointers on that? Also, what's planned next? Are there thoughts about a built-in debugger? | 11:20.48 |
Robin_Watts | leopatras_: Hi. I don't think we're likely to be doing a debugger. | 11:33.59 |
| You really need to talk to tor8 when he appears. | 11:34.12 |
leopatras | What was the goal for mujs ? Which projects are using it ? | 14:46.05 |
kens | It was written for MuPDF, MuPDF is using it. Beyond that, we wouldn't necessarily know | 14:46.37 |
leopatras | OK, understood. How many lines of JS are there in MuPDF? | 14:47.33 |
kens | Umm, it doesn't work like that.... PDF files can contain JavaScript, MuPDF executes the JavaScript it encounters in PDF files. | 14:48.11 |
| According to my (limited) understanding | 14:48.35 |
leopatras | aah.. didn't know that PDF can contain js. Always learn something new:-) | 14:48.58 |
| I really like the minimalist approach of the C sources. A JS interpreter in about 10 KLoC is amazing. However, the samples are pretty sparse... I'll need to come back once I've fiddled around with it a bit. Thanks! | 14:54.00 |
Robin_Watts | leopatras: If you feel like contributing examples etc, that'd be great. | 15:27.28 |