| 2013/10/25 |
stanv | Is it necessary to fill BBox with real numbers ? I fill it with INT | 09:14.51 |
kens2 | Depends on the BBox, I suggest you check the spec | 09:18.44 |
stanv | I don't understand what a content stream is. Is it PS ? Or what ? I have stream and rg, Tf Td tj res ref …. | 09:22.26 |
| Where I can read about it ? | 09:22.37 |
kens2 | The PDF Reference Manual | 09:22.54 |
stanv | I can't find ( | 09:23.02 |
kens2 | Open the PDF Reference Manual and search for 'content stream'. The first result is section 3.7, "Content streams and resources". Why is this hard ? | 09:25.22 |
stanv | I did it | 09:26.21 |
| But I can't find anything about "res" "ref" | 09:26.39 |
kens2 | No reason you would | 09:27.30 |
Robin_Watts | chrisl: ping | 10:49.53 |
| I'm going to have to ignore the alignment issues etc of memory buffers until ray gets here I think. The clist (as ever) complicates things massively. | 11:19.46 |
| Not to mention the problems I suspect we will have with the garbage collector. | 11:21.15 |
| gc will probably want to move the buffer so it's not aligned any more. | 11:21.36 |
| So their SSE code is bonkers. | 11:37.36 |
| hey paulgardiner. Back home ? | 11:37.44 |
paulgardiner | Yep. Back to UK weather, although I guess it's not as bad today as it has been from all accounts. | 11:39.36 |
| Was a thoroughly good holiday. Jumped up and down on the top of vesuvius but didn't manage to set it off. | 11:41.17 |
Robin_Watts | glad to hear it. | 11:44.38 |
chrisl | Robin_Watts: sorry, forgot to change my status..... back now | 11:56.22 |
Robin_Watts | chrisl: No worries. | 11:56.32 |
| I was talking yesterday with ray about how we'd like the buffer devices to be aligned better. | 11:57.09 |
| (i.e. align to 16 bytes and pad the raster to 16 bytes to enable SSE to work on it) | 11:57.27 |
chrisl | Yeh, I saw some of the discussion..... | 11:57.37 |
Robin_Watts | BUT... the garbage collector will bugger that up, right? | 11:57.47 |
| But having looked at their SSE, I don't think they need it to be aligned anyway, so I propose to ignore the problem. | 11:58.22 |
chrisl | Maybe, but we could deal with that - IF raster memory is allocated in GC memory, which seems rather pointless to me. We could tweak gc to retain alignment, but I wonder if it would be better to allocate raster memory from non-gc memory....... | 11:59.40 |
Robin_Watts | chrisl: It looks to me like the clist allocates a large block of memory and the raster memory is magically in the middle of it somewhere. | 12:01.03 |
| So i'm very glad to be ignoring the problem :) | 12:01.15 |
chrisl | Yeh, I think you're right - let's ignore it unless it really is an issue..... If it does become an issue, the other option would be to copy data out of the raster buffer into an aligned buffer, do the operations and copy them back again, until you reach the first aligned address in the raster buffer, then do the operations in place from there on. | 12:04.11 |
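A minimal sketch of the fallback chrisl describes, assuming a power-of-two alignment such as 16 bytes for SSE. The helper name `bytes_to_alignment` is hypothetical and is not a Ghostscript function; it only computes how many leading bytes would have to be handled via a scratch buffer before in-place aligned operations could start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Ghostscript): returns the number of
 * bytes between p and the first address at or after p that is a
 * multiple of align. align must be a non-zero power of two. */
static size_t bytes_to_alignment(const void *p, size_t align)
{
    uintptr_t addr = (uintptr_t)p;

    assert(align != 0 && (align & (align - 1)) == 0);
    /* Distance up to the next boundary; 0 if p is already aligned. */
    return (size_t)((align - (addr & (align - 1))) & (align - 1));
}
```

The unaligned head (that many bytes) would go through the copy-out/copy-back path; everything from the first aligned address onward could be processed in place.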
Robin_Watts | Their SSE is bonkers, as far as I can tell. | 12:11.27 |
| Well, inefficient. | 12:11.42 |
| And their output isn't really 4 bit, it's 0-7 | 12:20.29 |
kens2 | chrisl ping | 13:30.39 |
chrisl | kens2: pong | 13:30.55 |
kens2 | do you have a URL for the 'how to get stuff out of CUPS' page ? | 13:31.18 |
| For bug 694743 | 13:31.36 |
chrisl | https://wiki.ubuntu.com/DebuggingPrintingProblems#Capturing_print_job_data | 13:31.50 |
kens2 | Thanks | 13:31.55 |
chrisl | That's going to be a 'mare to work on, without direct access to the printer in question..... | 13:32.37 |
kens2 | Well possibly, though it may be that it's the PXLmono driver that's slow | 13:33.12 |
chrisl | That's true...... | 13:33.58 |
kens2 | I'm just trying to get some sensible info from him | 13:34.16 |
chrisl | Once again, I really feel the CUPS devs should be the ones to work out how to reproduce the problem in Ghostscript - rather than us and the end user..... | 13:37.11 |
kens2 | Me too, but they never do | 13:37.21 |
Robin_Watts | chrisl, kens2: Frankly, I feel that we ought to push back on this. | 13:37.53 |
kens2 | Robin_Watts : we've tried that before | 13:38.04 |
chrisl | Robin_Watts: the problem is, we end up penalising the user, and it's usually not their fault | 13:38.19 |
Robin_Watts | If the cups people can't give us a command line to reproduce the problem, then they can't expect us to solve it. | 13:38.43 |
| And as nice as it is to help users out, ultimately, that's not our job. | 13:39.07 |
chrisl | I think in this case we may not have any option...... | 13:40.16 |
Robin_Watts | chrisl: RESOLVED INVALID | 13:40.36 |
| Has there been any indication that the cups people have even looked at the problem? | 13:41.16 |
kens2 | Robin_Watts : I've spent less than 5 minutes on this, it's not a great investment so far. | 13:43.00 |
| I spend longer than that on Stack Overflow each day | 13:43.15 |
Robin_Watts | Yes, but this could easily turn into a time sink, and the idea is that all the time on SO comes back multiple times, as the answers are there as references. | 13:44.05 |
| But it's you guys who get to deal with this, so I'll shut up. | 13:44.18 |
kens2 | chrisl this one may be of some interest to us in future. Someone wanting to know how to 'sign the Ghostscript print driver' on Windows 8. It seems there's a reasonably simple way to do it, and he's been kind enough to outline the steps too: | 13:48.00 |
| http://stackoverflow.com/questions/19520540/signed-ghostscript-postscript-print-driver/19535192?noredirect=1#comment29067490_19535192 | 13:48.00 |
chrisl | Interesting, I was under the impression we need stuff from MS for it, but apparently not...... | 13:49.25 |
kens2 | Maybe the certificate he talks about in step 1 | 13:49.38 |
| Anyway, it's far from vital | 13:49.47 |
| Just interesting | 13:49.52 |
Robin_Watts | Morning mvrhel_laptop | 13:50.03 |
| You speak SSE, right ? | 13:50.11 |
mvrhel_laptop | good morning | 13:50.13 |
| Robin_Watts: I used to | 13:50.24 |
Robin_Watts | could I get you to sanity check some code later for me please? | 13:50.36 |
kens2 | A weasel reply :-) | 13:50.37 |
mvrhel_laptop | Robin_Watts: yes that would be fine | 13:51.17 |
Robin_Watts | Thanks. | 13:51.22 |
kens2 | mvrhel_laptop : ping | 14:02.10 |
mvrhel_laptop | kens2: I will be back in about 45 minutes | 14:12.29 |
kens2 | OK NP | 14:12.34 |
Robin_Watts | Morning ray_laptop | 14:59.37 |
ray_laptop | Robin_Watts: I saw the comment earlier about buffer alignment, but I have to take my son in now, so I'll be back in about 30 m | 14:59.51 |
Robin_Watts | Can I get some time with you today to talk about this padding stuff? | 15:00.00 |
| Ah, great, thanks. | 15:00.04 |
ray_laptop | Robin_Watts: and the GC doesn't move around chunks -- it just moves small stuff to pack chunks | 15:01.00 |
| and since everything larger than about 10k gets its own chunk, the "buffer" is safe | 15:01.34 |
| Robin_Watts: tweaking the line pointer setup to leave padding, and/or perform alignment should be possible, but I'll have a look when I get back | 15:02.35 |
Robin_Watts | ray_laptop: Thanks. | 15:03.06 |
| While it looks like their SSE code doesn't NEED alignment, it does NEED padding. | 15:03.27 |
ray_laptop | It'd be a good check to make sure that no place assumes that lines are at a given stride, rather than using the line pointers | 15:03.38 |
Robin_Watts | And alignment would let the SSE code run a lot faster. | 15:03.46 |
| I am always slightly surprised that getbits doesn't return the raster. | 15:04.16 |
ray_laptop | Robin_Watts: AIUI even if it doesn't *need* alignment, reading aligned data is faster | 15:04.21 |
Robin_Watts | get_bits_rectangle, rather. | 15:04.25 |
| ray_laptop: yes. but SSE has separate instructions for aligned and unaligned data. | 15:04.50 |
| and the unaligned ones are slower. | 15:04.57 |
ray_laptop | Robin_Watts: yes, that's what I was trying to say | 15:05.15 |
| bbiab. ~30m | 15:05.57 |
mvrhel_laptop | kens2: I am back | 15:20.52 |
kens2 | Hi michael | 15:21.03 |
| I think I see a problem with gx_concretize_ICC | 15:21.21 |
mvrhel_laptop | ok | 15:21.27 |
kens2 | when I compare the code with gx_remap_ICC it does not cater for Lab spaces (specifically remapping the components into the correct range) | 15:21.56 |
| with the result that I get incorrect colours for Lab spaces | 15:22.14 |
| in gx_remap_ICC is this: | 15:23.05 |
| if (pcs->cmm_icc_profile_data->data_cs == gsCIELAB || | 15:23.05 |
| pcs->cmm_icc_profile_data->islab) { | 15:23.05 |
| but I see no equivalent in gx_concretize_ICC | 15:23.18 |
mvrhel_laptop | hmm that is true. is there a case when going to a raster device that this causes an issue or only for what you are doing in the new pdfwrite code | 15:25.25 |
kens2 | the rendering code doesn't seem to use concretize, but the pdfwrite (actually ps2write) code does | 15:25.54 |
| and it gets different (wrong) answers compared to rendering | 15:26.05 |
mvrhel_laptop | I understand | 15:26.16 |
kens2 | I think the same special case range remapping needs to be done in concretize | 15:26.34 |
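For readers unfamiliar with the range remapping being discussed, here is a rough sketch of what such a step typically looks like. It assumes the conventional CIELAB ranges (L* in 0..100, a* and b* in -128..127) and maps each component into 0..1; the function name `lab_to_unit_range` is hypothetical, and this is not the actual body of gx_remap_ICC or gx_concretize_ICC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not Ghostscript code): remap CIELAB components
 * into the 0..1 range that a colour management module would expect.
 * Assumed input ranges: L* in [0, 100], a* and b* in [-128, 127]. */
static void lab_to_unit_range(const float lab[3], float out[3])
{
    assert(lab != NULL && out != NULL);
    out[0] = lab[0] / 100.0f;            /* L*: 0..100    -> 0..1 */
    out[1] = (lab[1] + 128.0f) / 255.0f; /* a*: -128..127 -> 0..1 */
    out[2] = (lab[2] + 128.0f) / 255.0f; /* b*: -128..127 -> 0..1 */
}
```

Without a step like this on one path but not the other, the same Lab input would reach the colour conversion in two different ranges, which would explain rendering and ps2write disagreeing.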
mvrhel_laptop | kens2: yes it looks that way | 15:26.42 |
| do you want to go ahead and make the change | 15:26.53 |
kens2 | I'll do it tomorrow if that's OK with you | 15:27.04 |
mvrhel_laptop | I am fine with that | 15:27.09 |
kens2 | OK thanks. I'll put it past you for review before I commit | 15:27.24 |
mvrhel_laptop | ok. I am more interested in the cluster push results. I think this is just going to be a cut and paste job | 15:27.58 |
kens2 | yeah it'll be easy code, I'll cluster push it separately | 15:28.18 |
mvrhel_laptop | ok sounds good | 15:28.27 |
Robin_Watts | http://www.meteorwatch.org/uk-iss-passes-october-2013.html | 15:37.36 |
| Today should be a good day for seeing the space station. | 15:37.48 |
ray_laptop | Robin_Watts: OK, I'm back. | 15:42.26 |
Robin_Watts | ray_laptop: OK, so let me find the bit of code that had me interested earlier... | 15:42.54 |
| gdev_mem_set_line_ptrs, in gdevmem.c | 15:43.25 |
ray_laptop | Robin_Watts: this gets called by gx_default_setup_buf_device. | 15:44.05 |
Robin_Watts | Indeed. Look at what happens to the value of raster that is passed in. | 15:44.22 |
| If base is non null (i.e. if we are being given a buffer) then we take on the value of raster we are given, and we store it in mdev->raster | 15:45.17 |
| but then when we get into the bottom loop to set up the line pointers, we completely ignore raster and calculate it again. | 15:45.46 |
ray_laptop | Robin_Watts: right, screwy | 15:45.54 |
Robin_Watts | Now, as long as the raster we pass in is the same as the calculated one, no harm, no foul. | 15:46.21 |
| but this morning I was watching a planar device in the debugger. | 15:46.43 |
| I was getting a raster value of 24792 or something like that on entry. | 15:47.03 |
ray_laptop | so even if we feed it a padded bytes_per_line, (called raster in that func), it would ignore it and use the "width" | 15:47.11 |
Robin_Watts | which was for a width of 4900 or something. | 15:48.03 |
| with 5 planes. | 15:48.10 |
| So the *actual* raster is 4900ish, not the stored value of 24792. | 15:48.50 |
| and if you actually calculated the size of the buffer based upon the stored raster, you'd be off too, as 5*padded(w) != padded(5*w) | 15:49.02 |
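The point that 5*padded(w) != padded(5*w) can be checked with a small sketch. The helper `round_up` is hypothetical, and the figures in the comment (a 4958-byte plane, 8-byte alignment, 5 planes) are illustrative assumptions chosen to land near the values quoted above, not numbers taken from the debugger session.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round n up to the next multiple of mod. */
static size_t round_up(size_t n, size_t mod)
{
    assert(mod != 0);
    return (n + mod - 1) / mod * mod;
}

/* Illustrative figures only:
 *   per-plane padded raster: round_up(4958, 8)     = 4960
 *   5 padded planes:         5 * 4960              = 24800
 *   padding the total:       round_up(5 * 4958, 8) = 24792
 * so sizing a planar buffer from the "combined" raster comes up short. */
```

Padding per plane and padding the combined width therefore give different buffer sizes, which is why the per-plane raster is the value that needs to be stored and passed around.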
| So.... I'm thinking that we should remove the calculation from this function and only use the value passed in. | 15:49.33 |
| And we should ensure that in planar cases, we pass in the correct value. | 15:49.45 |
ray_laptop | Robin_Watts: I agree. That seems better | 15:49.51 |
Robin_Watts | That way we can potentially pad higher up and pass in a padded width here without ill effects. | 15:50.18 |
| OK, so let me find the next bit that had me confused. | 15:50.34 |
ray_laptop | and that would let us set the pad in a custom setup_buf_device | 15:51.08 |
Robin_Watts | clist_rasterize_lines calls gx_default_setup_buf_device which calls gdev_mem_set_line_ptrs | 15:51.29 |
| In particular mdev->base is set to be mdata, which is calculated at line 707 of gxclread.c | 15:52.21 |
ray_laptop | Robin_Watts: but the "gotcha" is that the buffer size needed for a band height needs to know about the pad (or the band height calculated from the buffer size) | 15:52.44 |
Robin_Watts | So that looks to me like we have a single clist data block that contains first the page_tile_cache, then the rendered image. | 15:53.22 |
| ray_laptop: Yes, I can see that this is all a maze of twisty passages. | 15:53.40 |
ray_laptop | Robin_Watts: yes, and the mdata area is actually both the raster lines AND the line pointers (by default) | 15:54.05 |
Robin_Watts | ray_laptop: Right. | 15:54.13 |
| How does this compare with multi-threaded rendering? | 15:54.58 |
ray_laptop | Robin_Watts: there is a facility for "foreign line pointers" that aren't in the block, but separately allocated | 15:55.04 |
Robin_Watts | Does each thread have it's own data block ? | 15:55.21 |
ray_laptop | Robin_Watts: I don't understand your question | 15:55.26 |
Robin_Watts | ray_laptop: Yes, I am aware of that, but that wasn't bothering me too much. The complexity is not in the line_pointers. | 15:55.34 |
ray_laptop | Robin_Watts: oh, yes with MT, each thread gets its own buffer area, | 15:56.00 |
Robin_Watts | So we're still dealing with this same piece of code? | 15:56.22 |
ray_laptop | yes | 15:56.30 |
Robin_Watts | OK, that's a relief. I was afraid there might be another variant of this to think about as well. | 15:56.48 |
ray_laptop | The only funky bit is that we swap the buffers between the main thread and the thread that just rendered a band we need (to avoid a copy) | 15:57.45 |
Robin_Watts | ray_laptop: Yeah, I saw the logic for that when I was doing process_page. | 15:58.39 |
ray_laptop | then at the end we have to make sure the main thread's buffer is set back to the one it allocated since we don't want it pointing at one we free thinking that a thread owns it | 15:58.43 |
Robin_Watts | So, as far as I can make out we need: | 15:59.15 |
ray_laptop | drum roll... | 15:59.28 |
Robin_Watts | 1) Some changes to the buffer size prediction code to find out about/allow for alignment/padding. | 15:59.37 |
ray_laptop | agrees with 1 | 16:00.05 |
Robin_Watts | 2) A way of identifying the aligned start position of the render stuff within the crdev->data | 16:00.29 |
| Possibly that's just modifying page_tile_cache_size up a bit? But maybe we want a separate member that stores a padded value? | 16:01.15 |
| 3) A way of storing the actual padded raster used, rather than calculating it from clist_plane_raster | 16:01.48 |
| 4) to ensure that we pass the real value of raster in to gdev_mem_set_line_ptrs rather than an incorrect one in planar mode | 16:02.35 |
| 5) to ensure that gdev_mem_set_line_ptrs doesn't recalculate the raster it's given. | 16:02.51 |
| That's all I have so far. | 16:03.53 |
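Points 4 and 5 above amount to "trust the raster the caller passes in". A minimal sketch of that idea follows; `set_line_ptrs` and its signature are hypothetical and much simpler than the real gdev_mem_set_line_ptrs, which also has to deal with planes and the device structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (not the Ghostscript function): fill in one
 * pointer per scan line using the raster supplied by the caller,
 * without recomputing it from the width. Any padding the caller chose
 * is therefore preserved in the line stride. */
static void set_line_ptrs(unsigned char **line_ptrs, unsigned char *base,
                          size_t raster, int height)
{
    int y;

    assert(line_ptrs != NULL && base != NULL && height >= 0);
    for (y = 0; y < height; y++)
        line_ptrs[y] = base + (size_t)y * raster;
}
```

With this shape, a caller higher up can pad the raster to 16 bytes and everything that goes through the line pointers sees the padded stride automatically.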
ray_laptop | on 2, we need to do the alignment of 'mdata', but once we do the setup_buf_device, everyone _should_ access via the line_pointers, so no one else needs to know about the alignment or padding, right ? | 16:04.46 |
ray_laptop | agrees with 3, 4 and 5 | 16:05.31 |
Robin_Watts | ray_laptop: For users of the device, I agree. | 16:05.42 |
| I don't know how many places use the address in the code that sets up the device. | 16:06.52 |
ray_laptop | Robin_Watts: ??? | 16:07.43 |
| which address ? | 16:07.55 |
Robin_Watts | I was thinking that if there were lots of places in the code that did: blah = crdev->data + page_tile_cache_size; (such as the line I pointed you to earlier) then we might want to have a new member so we could do: | 16:08.40 |
| blah = crdev->data + start_of_rendered_data; | 16:08.55 |
ray_laptop | so making clist_rasterize_lines have enough info to perform the padding and alignment is needed. | 16:08.57 |
Robin_Watts | but if that's the only place that does the calculation, we can just pad there, and be happy. | 16:09.26 |
| s/pad/align/ | 16:09.34 |
| I'm not entirely convinced that clist_rasterize_lines needs to be the one to have the brains in. | 16:11.18 |
ray_laptop | Robin_Watts: OK, so it looks like clist_render_thread also calls setup_buf_device and create_buf_device | 16:11.35 |
Robin_Watts | Certainly clist_rasterize_lines needs to be able to find the right aligned address, and to pass the raster into the sub functions. | 16:11.56 |
ray_laptop | I think re-factoring that logic to a shared function might be good | 16:12.04 |
| Robin_Watts: we can ignore gdevprna.c (I am going to delete that) | 16:12.28 |
Robin_Watts | but the decision on the size of the data block/page_cache_size/padding etc, may be better done elsewhere, and just some information can be left in the structure for clist_rasterize_lines to read. | 16:12.54 |
| I don't have a clear enough view on this code to see how it could be refactored nicely, but I could easily believe that such a refactoring would be beneficial. | 16:13.25 |
ray_laptop | Robin_Watts: yes, we need the alignment and padding info, probably as values in the device | 16:13.55 |
Robin_Watts | Now, this needs to be made to work for the non-clist case too. | 16:14.35 |
ray_laptop | Robin_Watts: I just grepped for page_tile_cache_size, and it's really only used in gxclread (clist_rasterize_lines) and the uses in gxclthrd.c | 16:14.38 |
| Robin_Watts: and probably for non gx_device_printer devices | 16:15.13 |
Robin_Watts | And the buffer functions are only called from the clist case I believe. | 16:15.20 |
| right. | 16:15.23 |
| I had vaguely wondered about a gxdso call. | 16:15.41 |
ray_laptop | which means that raster line alignment and padding should be values in gx_device * | 16:15.55 |
Robin_Watts | gxdso_render_alignment | 16:16.06 |
| ray_laptop: Do we need separate alignment and padding values? | 16:16.23 |
ray_laptop | What's wrong with values in the device struct | 16:16.27 |
Robin_Watts | I'm generally averse to changing all devices when we don't have to. | 16:17.00 |
ray_laptop | Robin_Watts: I was thinking separate because a device may not need to constrain alignment, but does want to pad to a certain multiple | 16:17.17 |
Robin_Watts | especially given the horrific state of affairs we have with macros initialising everything already. | 16:17.26 |
ray_laptop | Robin_Watts: I admit it's easier to not have to change all the device initializers. | 16:17.50 |
| Robin_Watts: and this is infrequently used so a call out is OK | 16:18.12 |
Robin_Watts | I'm not averse to having both alignment and padding values. | 16:18.26 |
ray_laptop | so, a dso is OK | 16:18.33 |
Robin_Watts | ray_laptop: How are you set for panicked customers at the moment? | 16:19.01 |
| Are you in a position to help me with this? | 16:19.08 |
| I get the feeling that you'd make changes in the clist code with far less pain/breakage than me. | 16:19.30 |
ray_laptop | Robin_Watts: yes, I was going to volunteer to help. No P1 customers chewing on my behind | 16:19.34 |
Robin_Watts | Fab. That's a huge relief! :) | 16:19.44 |
| ray_laptop: So how do we proceed here? | 16:21.59 |
ray_laptop | Robin_Watts: so, I'm thinking that 'pad' will ensure that each line is a multiple of 'pad' bytes, and that 4 is the minimum (to avoid breakage in the gdevmem functions). Values returned < 4 use 4. | 16:22.01 |
Robin_Watts | Values returned smaller than align_bitmap ? | 16:22.24 |
| or align_bitmap_mod or whatever the thing used in bitmap_raster is. | 16:22.51 |
ray_laptop | and that alignment be align_bitmap_mod as a minimum | 16:23.01 |
Robin_Watts | One thing that struck me earlier. | 16:23.19 |
ray_laptop | also to avoid breakage of code that assumes that align_bitmap_mod is applied. So alignment may be > align_bitmap_mod | 16:23.48 |
Robin_Watts | gs guarantees that if you have a w * h bitmap, that you can access from base to base + (pad(w) * (h-1)) + w | 16:24.10 |
| i.e. the padding guarantees are different at the end of the last line. | 16:24.26 |
| For SSE operation, we want to guarantee that we always safely read the last part of the line using an SSE op. | 16:24.56 |
ray_laptop | Robin_Watts: yeah, that's a bit funky | 16:24.57 |
Robin_Watts | So we should guarantee that you can do base + (pad(w) * h) | 16:25.20 |
ray_laptop | seems like a size of (pad(w) * h) should be used | 16:25.25 |
Robin_Watts | I agree. | 16:25.32 |
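The difference between the two guarantees can be made concrete with a short sketch. All three helper names are hypothetical, and widths are taken in bytes for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: raster (bytes per line) rounded up to pad. */
static size_t padded_line(size_t w, size_t pad)
{
    assert(pad != 0);
    return (w + pad - 1) / pad * pad;
}

/* Size if the last line is truncated to its real width. */
static size_t size_truncated_last_line(size_t w, size_t h, size_t pad)
{
    assert(h > 0);
    return padded_line(w, pad) * (h - 1) + w;
}

/* Size if every line, including the last, is fully padded, so a
 * 16-byte SSE read at the end of the final line stays in bounds. */
static size_t size_fully_padded(size_t w, size_t h, size_t pad)
{
    return padded_line(w, pad) * h;
}
```

The extra cost of the fully padded form is at most pad - 1 bytes per buffer, which is the "space savings doesn't seem worth it" point made just below.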
ray_laptop | I think that the clist is the only place where it tried to truncate the final line, but the space savings doesn't seem worth it | 16:26.42 |
Robin_Watts | ray_laptop: I'm not suggesting we try to lift that restriction everywhere. | 16:27.07 |
ray_laptop | Robin_Watts: do you recall offhand where that is mentioned | 16:27.21 |
Robin_Watts | Just that we should ensure that the buffers we create in the new code should all be the larger size. | 16:27.28 |
ray_laptop | I have no problem with that | 16:28.18 |
Robin_Watts | I can't find the reference any more :( | 16:30.01 |
ray_laptop | Robin_Watts: np. | 16:30.53 |
Robin_Watts | so, ray_laptop, what's the plan here? | 16:36.05 |
| Do you have enough information to make progress? | 16:36.05 |
| If so, I'll keep bashing on the rest of the device. | 16:36.05 |
ray_laptop | Robin_Watts: I think it's time for me to clean up the stuff in gdev_prn_allocate w.r.t. guessing if we need to band | 16:37.03 |
| yuck. Looks like there are a whole mess of functions that need to understand padding and alignment, but most use gdev_mem_bits_size. I hope changing that doesn't break too much :-( | 16:39.22 |
| but since it will use a dso callout, it shouldn't change other devices | 16:40.51 |
| Robin_Watts: I'll start on it now, and shout out if I run into something I'd like to discuss before changing. | 16:42.53 |
Robin_Watts | ray_laptop: sorry, was on phone. That sounds great. Thanks. | 16:46.49 |
ray_laptop | Robin_Watts: BTW, the comment in gxbitmap.h line 50 seems to say that bitmaps *must* be of size pad(w)*h | 16:46.59 |
| * The padding requirement is that if the last data byte being operated on | 16:47.26 |
| * is at offset B relative to the start of the scan line, bytes up to and | 16:47.28 |
| * including offset ROUND_UP(B + 1, align_bitmap_mod) - 1 may be accessed, | 16:47.29 |
| * and therefore must be allocated (not cause hardware-level access faults). | 16:47.31 |
| nothing there about the last line being smaller | 16:47.56 |
Robin_Watts | ok, cool. | 16:48.24 |
ray_laptop | I think that business about a shorter last line was confined to the clist writing | 16:48.55 |
| an area I think you had occasion to swear about | 16:49.20 |
| Robin_Watts: So you are thinking that a device that has a special alignment and padding will provide a dso and will *NOT* assume the line stride is bitmap_raster(width*depth) (as your fpng code does, for one) | 16:56.33 |
Robin_Watts | ray_laptop: Absolutely. | 16:56.48 |
ray_laptop | so I am thinking we need a bitmap_raster_padded(int width_bits, int pad) | 16:57.25 |
| oops, or bitmap_raster_padded(dev) | 16:57.55 |
| the latter would call the dso, and use the dev width and depth | 16:58.29 |
| what's your opinion ? | 16:58.53 |
Robin_Watts | urm... | 16:59.37 |
| So you're suggesting a bitmap_raster_padded(dev) that would call the dso, and if that replies, it uses that value, otherwise it uses bitmap_raster(dev->width * ...) ? | 17:00.47 |
ray_laptop | Robin_Watts: yes | 17:01.07 |
Robin_Watts | The only worry I have with that is that bitmap_raster_padded(target) would give the correct value, but bitmap_raster_padded(mdev) would not. | 17:01.27 |
| and people that didn't know better might be confused. | 17:01.40 |
ray_laptop | Robin_Watts: or we could have bitmap_raster_padded(gx_device *dev, int width_bits, int pad) | 17:01.47 |
Robin_Watts | I would be tempted to have bitmap_raster_padded as you just said, where you pass the pad value in, and it does the calculation. That way the gxdso call is kept explicit. | 17:02.46 |
| Is that what you were suggesting ? | 17:02.52 |
ray_laptop | Robin_Watts: true, so not passing the dev makes the caller get the pad | 17:03.02 |
| yes, that was my first suggestion. Thanks. | 17:03.23 |
| bitmap_raster_padded(int width_bits, int pad) | 17:03.36 |
Robin_Watts | Right. | 17:03.57 |
| Possibly we could offer both bitmap_raster_padded(int width_bits, int pad) and also 'device_raster(target)' | 17:04.26 |
| where device_raster would do the gxdso etc and call bitmap_raster_padded with the results. | 17:05.02 |
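A sketch of what the proposed bitmap_raster_padded(int width_bits, int pad) might compute, under the rules floated earlier in the discussion (each line a multiple of 'pad' bytes, with pad values below 4 treated as 4). The name `bitmap_raster_padded_sketch` is used to make clear this is an illustration of the proposal, not code from the Ghostscript tree; the gxdso lookup that device_raster would perform is deliberately left out.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative sketch of the proposed helper (not Ghostscript code):
 * bytes per scan line for width_bits bits, rounded up to a multiple
 * of pad bytes. Pad values below 4 are treated as 4, following the
 * minimum suggested in the discussion. */
static unsigned int bitmap_raster_padded_sketch(unsigned int width_bits,
                                                unsigned int pad)
{
    unsigned int bytes;

    assert(width_bits <= UINT_MAX - 7);
    if (pad < 4)
        pad = 4;
    bytes = (width_bits + 7) / 8;        /* whole bytes needed */
    return (bytes + pad - 1) / pad * pad; /* round up to pad    */
}
```

Keeping the pad as an explicit argument means the caller has to obtain it (for example from the target device), which is the "gxdso call is kept explicit" property Robin_Watts asks for.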
ray_laptop | Robin_Watts: so device_raster(target_dev) would be like the second suggestion. | 17:05.14 |
Robin_Watts | yeah | 17:05.23 |
| it would need to figure out if it was a planar device, cos that affects what bit depth you use. | 17:05.44 |
| The planar support in gs is slightly complicated in that some of it is written as if we were going to allow different bit depths in each plane, and all the important stuff is written so we absolutely do not allow different bit depths in each plane :) | 17:06.33 |
| so personally, I'd just assume that all the planes have the same bitdepth if it's planar. | 17:06.56 |
| as that makes the code easier, and lots of other places assume that already. | 17:07.11 |
ray_laptop | I'm thinking I need to add alignment and pad the gx_device_memory struct since gdev_mem_bits_size will need it | 17:07.14 |
Robin_Watts | s/the/to the/ ? | 17:07.54 |
ray_laptop | Robin_Watts: yeah, I saw that in gdev_mem_bits_size width * planes[pi].depth | 17:08.02 |
| yes, /to the/ | 17:08.19 |
Robin_Watts | Ok, that makes sense, yes :) | 17:08.27 |
ray_laptop | Oh, great. We have a gx_device_raster that takes a 'bool pad'. I wonder if we should have it call the dso if 'pad' is true ? | 17:18.37 |
| if 'pad' is false, it just returns the number of bytes, so that's OK | 17:19.28 |
| oops. and gdev_prn_raster returns the *unpadded* line_size :-( | 17:22.03 |
| and gdev_prn_copy_scan_lines assumes that the dest buffer is 'packed' (no alignment or pad observed since it uses the above) | 17:23.05 |
| If this doesn't blow up, I'm going to be surprised. :-/ | 17:25.21 |
| well, lots of devices use the (deprecated) copy_scan_lines, so I guess the packing issue is OK (or they'd all be broken) | 17:29.16 |
Robin_Watts | sheesh, so this device... | 17:56.15 |
ray_laptop | Robin_Watts: so, is it OK to define mdev->raster to be the size of a padded scan_line and if so, what do we do about different depths per plane ? | 17:56.20 |
Robin_Watts | ray_laptop: I think we should define mdev->raster to be the size of a padded scanline. | 17:56.38 |
ray_laptop | Robin_Watts: or do we disallow, retroactively, differing depths per plane (I think we should) | 17:56.55 |
Robin_Watts | Further, I think that for planar operation, mdev->raster should be the size of the raster for a plane. | 17:57.06 |
ray_laptop | We can error out if they differ | 17:57.14 |
Robin_Watts | We disallow differing depths per plane. | 17:57.17 |
| We already throw an error if plane depths differ. | 17:57.32 |
ray_laptop | OK. So if they differ, we'll return an error | 17:57.45 |
| from gdev_mem_bits_size | 17:57.53 |
Robin_Watts | I think Peter started to code it allowing for different depths, then realised that wasn't going to be trivial, and started to code around it with the idea he'd come back and fix it later. | 17:58.09 |
| Then I think he just gave up :) | 17:58.14 |
ray_laptop | Robin_Watts: yes, for planar, mdev->raster will be the scanline size for a plane | 17:58.38 |
Robin_Watts | This device opens 1 FILE * per plane. | 17:58.43 |
ray_laptop | not just the total | 17:58.47 |
Robin_Watts | Then at the end of the page, closes them. | 17:58.54 |
| It never writes to those files. | 17:59.00 |
| It DOES open other files and write to them. | 17:59.09 |
| but it collects ALL the data from every plane into a buffer, then writes each plane out individually, possibly with compression. | 17:59.32 |
ray_laptop | Robin_Watts: maybe so. Way, way back there were devices like VGA 256 color that had different number of bits per colorant, but they weren't planar | 17:59.41 |
Robin_Watts | ray_laptop: yeah, I did LOTS of work on 565 devices in the past. | 18:00.02 |
ray_laptop | Robin_Watts: we all did. | 18:00.17 |
| although I was really glad when we upgraded to 12-bit color at our college (in 1974) on our Ramtek color display | 18:01.10 |
| it still had a CLUT, so we could set it for 256 gray shades | 18:01.57 |
| but for video games, 4096 color was preferred (and naughty pictures) | 18:02.28 |
| not that any of us ever did the latter | 18:02.41 |
| ;-) | 18:02.51 |
| Robin_Watts: I'll "fix" the definition of 'raster' to be as we discussed for planar devices. | 18:03.25 |
Robin_Watts | The suns I used at college had mono screens (literally mono, no greys). | 18:03.37 |
ray_laptop | Sun didn't exist yet, afaik | 18:03.53 |
Robin_Watts | Then we got some sparc boxes that did 8 bits. | 18:04.07 |
| I cut my teeth on BBC micros, so we had a tradeoff between 'high res' mono, or very low res 16 colors. | 18:04.36 |
ray_laptop | At CalComp we had the early Suns that were diskless ethernet (NAS) only | 18:04.44 |
Robin_Watts | where 16 colors = 8 colors + flashing :) | 18:04.53 |
ray_laptop | and those were also 1-bit mono | 18:05.03 |
| Robin_Watts: yeah, like EGA | 18:05.14 |
Robin_Watts | The physics department had one Next, and that was GORGEOUS. | 18:05.31 |
| Even though they only had the greyscale one, not the color one. | 18:05.46 |
ray_laptop | I don't think I ever saw a real Next | 18:06.22 |
Robin_Watts | but nowadays, anything short of 24 bit in full HD is unusable. | 18:06.26 |
| ray_laptop: They were massively ahead of their time. | 18:06.36 |
ray_laptop | what year was that ? | 18:07.05 |
Robin_Watts | 1990ish | 18:07.36 |
ray_laptop | I see. In 1980 we built a prototype CAD system that had 256 shade 1024x1024 with h/w line drawing with AA and h/w flood fill -- driven from 68000's | 18:09.38 |
Robin_Watts | There was a sudden transition on PCs where suddenly graphics cards changed from "maybe" being able to do 16 or 24bit color but at a lower res, to having it as an absolutely standard thing to do that at any res you wanted. | 18:09.40 |
| Dedicated cad systems had been around for a while at that stage, but they weren't standard off the shelf units you could just buy. | 18:10.44 |
ray_laptop | that was fairly advanced for the time, but the project got killed because it was too inexpensive compared to the 'cash cow' vector displays the parent company sold to IBM for CAD | 18:10.45 |
Robin_Watts | heh. | 18:10.54 |
ray_laptop | We had negotiated with CADAM to run their s/w on the 68000 (all FORTRAN of course). They were excited, but the parent company had a fit when they got wind of it since the IBM displays on "big iron" were so profitable. | 18:12.31 |
| This was just a couple of years before AutoCad cleaned their clock | 18:13.00 |
| anyway, back to work... | 18:13.40 |
Robin_Watts | ray_laptop: When you looked at their SSE the other day, did you end up making sense of it? | 18:15.39 |
| I looked earlier, and I think I understand it now. | 18:15.51 |
| including the bluenoise levels they are using etc. | 18:16.02 |
| but, as far as I can tell, they only ever output in 3 of the 4 bits. | 18:16.22 |
| i.e. their maximum value is 7, not 15. | 18:16.43 |
ray_laptop | Robin_Watts Yes, I saw maxdrop coming in as 7 | 18:17.35 |
Robin_Watts | ray_laptop: Right, not just maxdrop ==7, the actual values that the SSE would end up returning would be 7. | 18:18.10 |
ray_laptop | Robin_Watts: but I still don't quite understand how they don't get extra 'noise' if all 0 comes in, adding the noise doesn't seem right | 18:18.28 |
| the BNM tables range from 0 to 255 | 18:18.54 |
Robin_Watts | they'd take a value in the 0..255 range, expand it to be in the 0..256 range, multiply it by 7 (so giving a value in the 0..0x700 range). | 18:18.59 |
| Then they'd add the noise to it (giving a value in the 0..0x7ff range) and throw away the bottom 8 bits. | 18:19.30 |
| so the noise level seems right to me. | 18:20.00 |
ray_laptop | OK, so a 0x80 comes in, stays as 128 ? then becomes 896 (0x380). So the noise > 0x80 becomes level 4. OK. | 18:23.37 |
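The arithmetic Robin_Watts describes above can be sketched in plain C as follows. This is a minimal illustration, not the customer's SSE code: the function names are hypothetical, and the 0..255 to 0..256 expansion shown is only one common choice (the log does not say which one is used). Note that with this particular expansion 0x80 becomes 129, whereas ray's worked example assumes it stays at 128, so `quantize_level` takes the already expanded value to keep the two steps separate.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one common way to stretch 0..255 onto 0..256.
 * The log does not state which expansion the real code uses. */
unsigned expand_to_256(uint8_t v)
{
    return (unsigned)v + (v >> 7);
}

/* expanded: value in 0..256; noise: one blue-noise table entry, 0..255.
 * Returns the output level, which is always in 0..7. */
unsigned quantize_level(unsigned expanded, uint8_t noise)
{
    assert(expanded <= 256);
    unsigned scaled = expanded * 7;   /* 0..0x700 */
    return (scaled + noise) >> 8;     /* 0..0x7ff, bottom 8 bits dropped */
}
```

With an expanded value of 0 the sum never reaches 0x100, so the result is level 0 whatever the noise is; with an expanded value of 128 (0x380 after scaling) a noise value of 0x80 or more gives level 4 and anything lower gives level 3.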
henrys | fairly important that mvrhel_laptop be up to speed on all the 801 changes, since he'll be visiting the customer. | 18:24.18 |
Robin_Watts | henrys: ok. | 18:24.38 |
| I'll draft a reply to them about their device within the next couple of days and we can kick it about between us. | 18:25.08 |
henrys | sounds good. Are we really using unaligned scan lines in ghostscript - I'd like to see alignment be a requirement for all devices. | 18:26.15 |
Robin_Watts | Essentially, I shall highlight the ways in which we think we can improve on their current system: 1) Using the planar device (new technology that they probably didn't know about). 2) Using process_page (new APIs based upon their cunning code to do stuff on the threads, but more portable), 3) Using SSE aligned buffers (something we've just added for their needs). | 18:27.00 |
ray_laptop | miles is starting in position 81 (out of 84). I wonder what the S/T in the time column means ? | 18:27.19 |
Robin_Watts | ray_laptop: That means "did not prequalify" | 18:27.38 |
| i.e. he skipped the running yesterday. | 18:28.00 |
ray_laptop | henrys: no, we align to 'align_bitmap_mod' which is a compile time setting | 18:28.06 |
Robin_Watts | henrys: We have always had "aligned" buffers (where "aligned" means "to a small degree") | 18:28.16 |
| as ray says, to align_bitmap_mod, which is 1-4 or 8. (4 on windows or 32 bit linux, 64 on 64bit linux). | 18:28.48 |
| Essentially to the size of a long. | 18:28.54 |
| but for SSE etc, you need to align to 16 bytes. | 18:29.05 |
| And if we push everything up to aligning to 16 bytes, that'll play hell with the clist compression for bitmaps etc. | 18:29.30 |
| oops, 8 on 64bit linux, sorry. | 18:29.51 |
henrys | oh I didn't realize sse2 was 16 | 18:30.08 |
ray_laptop | henrys: 16 is more efficient | 18:30.25 |
Robin_Watts | mvrhel_laptop hit the SSE alignment stuff in his thresholding code so had to do padding etc in there. | 18:31.05 |
henrys | oh right the sse2 registers are 128 bit. | 18:31.09 |
ray_laptop | Robin_Watts: so if align_mod is > pad, we will actually use align_mod, but if 'pad' comes in with something screwy (like 6 when align_mod == 4) we will pad by 8. Sound right ? | 18:37.26 |
| Robin_Watts: in other words we guarantee alignment AND provide the requested pad as a minimum | 18:37.57 |
Robin_Watts | ray_laptop: OK. | 18:38.13 |
ray_laptop | Just want to make sure it makes sense. | 18:38.35 |
| thanks. | 18:38.42 |
Robin_Watts | So the caller gets "at least the padding it requested" | 18:38.42 |
ray_laptop | and the alignment they need | 18:38.58 |
| usually align_mod and pad will be the same | 18:39.26 |
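The rule ray_laptop describes (guarantee the alignment, and treat the requested pad as a minimum) could be expressed roughly as below. This is a sketch under assumptions: `round_up` and `effective_pad` are hypothetical names, not Ghostscript functions, and the real code may compute this differently.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of m (m must be non-zero). */
size_t round_up(size_t n, size_t m)
{
    return ((n + m - 1) / m) * m;
}

/* Hypothetical helper: the pad actually applied is a multiple of
 * align_mod, never smaller than align_mod, and never smaller than
 * the pad the caller asked for. */
size_t effective_pad(size_t align_mod, size_t pad)
{
    assert(align_mod > 0);
    size_t p = round_up(pad, align_mod);
    return p < align_mod ? align_mod : p;   /* pad == 0 still yields align_mod */
}
```

For the case quoted in the chat, a pad of 6 with align_mod of 4 comes out as 8; a pad of 4 with align_mod of 16 comes out as 16.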
Robin_Watts | OK. | 18:39.39 |
| For this stuff, I may well want to use 16 and 32 actually. | 18:39.54 |
ray_laptop | Oh. Why ? | 18:40.17 |
Robin_Watts | because for the downsampling case, I read in 2 16 byte SSE registers worth from each of 2 scanlines. | 18:40.39 |
| Then I combine those to give me 1 16 byte SSE register full of the bluenoised average of those 4 and store it back. | 18:41.08 |
ray_laptop | Robin_Watts: OK. I didn't know you were going to tackle the downsample average with SSE as well. | 18:41.27 |
Robin_Watts | hence I need a padding of 32 for the reading. | 18:41.30 |
| I coded it earlier. Never done SSE before. I have 2 possible versions of the code, one that needs SSE3. | 18:41.56 |
ray_laptop | I still don't know why you need the extra pad. You are reading in 32 bytes, but from 3 different scanlines | 18:42.14 |
| 2 different | 18:42.26 |
| only 16 bytes from each | 18:42.40 |
| oh, nm. 2 16 byte *per* scanline. | 18:43.15 |
| sorry | 18:43.16 |
| yeah, that makes sense. And that should be a *lot* more efficient than what they had | 18:43.55 |
Robin_Watts | because I can only write out 16 at a time, I need to read 64 in (32 per scanline) at a time. | 18:44.05 |
ray_laptop | yes, I got it. sorry for the misread. | 18:44.20 |
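A vanilla C version of the 2x2 averaging step makes the read/write ratio explicit: every output byte consumes two bytes from each of two scanlines, so producing 16 output bytes reads 32 bytes per scanline (64 in total). The sketch below is illustrative only; the function name is hypothetical, the round-to-nearest choice is an assumption, and the blue-noise step is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Average each 2x2 block of 8-bit samples taken from two adjacent
 * scanlines. Writes out_len bytes to out and reads 2 * out_len bytes
 * from each of row0 and row1. */
void downsample_2x2(const uint8_t *row0, const uint8_t *row1,
                    uint8_t *out, size_t out_len)
{
    for (size_t i = 0; i < out_len; i++) {
        unsigned sum = row0[2 * i] + row0[2 * i + 1]
                     + row1[2 * i] + row1[2 * i + 1];
        out[i] = (uint8_t)((sum + 2) >> 2);   /* rounding is an assumption */
    }
}
```

Calling it with `out_len` of 16 corresponds to one 16-byte store in the SSE version, which is why the input rows would need to be readable in 32-byte steps.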
Robin_Watts | ray_laptop: It would be nice to get the packing to 4 bits into the SSE too. | 18:44.29 |
| but I need to think about that. | 18:45.02 |
ray_laptop | yeah. shifts and or's and stuff | 18:46.38 |
Robin_Watts | ray_laptop: It's not just the shifts and ors, it's the fact that everything needs to shuffle up. | 18:47.14 |
| I've got the results as 16 8 bit values. | 18:47.24 |
| and I need to repack that as 32 4 bit values. | 18:47.36 |
| Possibly I can do it by repacking the register as 16s to another register of 8s, then shifting the original and repacking again and combining. | 18:49.15 |
| but then I will need to read 64 values per row and I might run out of SSE registers and confuse the compiler. | 18:49.54 |
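For reference, the repacking Robin_Watts is after looks like this in scalar C: two bytes that each hold a 4-bit value become one output byte, so 32 input values fill 16 output bytes. The function name is hypothetical, and the nibble order (first value in the high nibble) is an assumption that the log does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack pairs of 4-bit values (stored one per input byte) into single
 * bytes. Reads 2 * n_out bytes from in and writes n_out bytes to out.
 * Assumption: the first value of each pair goes in the high nibble. */
void pack_nibbles(const uint8_t *in, uint8_t *out, size_t n_out)
{
    for (size_t i = 0; i < n_out; i++) {
        assert(in[2 * i] <= 15 && in[2 * i + 1] <= 15);
        out[i] = (uint8_t)((in[2 * i] << 4) | in[2 * i + 1]);
    }
}
```

The "everything needs to shuffle up" problem is visible here: output byte i depends on input bytes 2i and 2i+1, so the data has to move across lanes rather than just being shifted in place.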
ray_laptop | Robin_Watts: OK. I'll leave it to you. Going to lunch. | 18:52.46 |
| can't be any worse than the rotate raster bitmap right I did in SSE (first in C of course). | 18:54.07 |
mvrhel_laptop | henrys: yes. I do need to understand what is going on with 801 before I go | 19:19.19 |
| bbiab | 21:03.26 |
ray_laptop | Robin_Watts: did you figure out the packing ? | 21:13.44 |
| I think RL4 A, MOV to B from A, RL4 B, OR A B to A (which doesn't matter) then the same thing for the other 16 bytes, result in, say B, then PACKUSWB A and B into A. Since the data is 0-7, the sign bit of the result is never set, so we never "saturate" to 0 and the "garbage" in the low bytes should get discarded. Just an idea | 21:23.46 |
| caveat: I haven't read the Intel manual thoroughly enough yet, and when I last did SSE code was MMX :-) | 21:25.12 |
| Robin_Watts: BTW, the intrinsic for that last op is: __m128i _mm_packus_epi16 (__m128i a, __m128i b) | 21:50.27 |
| and the description is: Convert packed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst. (dst is a) | 21:51.16 |
| Each destination byte comes from dst[7:0] := Saturate_Int16_To_UnsignedInt8 (a[15:0]), etc. | 21:52.10 |
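A scalar model of the unsigned-saturating pack may help when reasoning about what `_mm_packus_epi16` does to each lane. The helper names below are hypothetical; only the behaviour (signed 16-bit inputs clamped to 0..255, with the lanes of `a` filling the low eight result bytes and the lanes of `b` the high eight) reflects the documented semantics of the instruction.

```c
#include <stdint.h>

/* Clamp a signed 16-bit value into the unsigned 8-bit range. */
uint8_t saturate_i16_to_u8(int16_t v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Scalar model of the pack: a supplies result bytes 0..7,
 * b supplies result bytes 8..15. */
void packus_epi16_model(const int16_t a[8], const int16_t b[8],
                        uint8_t dst[16])
{
    for (int i = 0; i < 8; i++) {
        dst[i]     = saturate_i16_to_u8(a[i]);
        dst[i + 8] = saturate_i16_to_u8(b[i]);
    }
}
```

In this model a lane holding 0x0700 clamps to 255 rather than yielding 7, because the whole 16-bit value is saturated and not just its high byte; the wanted value has to be in the low byte of each lane before packing.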
| hmm... there may be a SRL 8 needed on the two registers to get the data to the low bytes :-/ | 21:55.29 |
| once I finish with cust 532 fire drill, I'll play with it | 21:55.58 |
Robin_Watts | ray_laptop: For the logs: I've not thought any more about the SSE since I mentioned it before. I want to get the rest of the device working first. | 23:00.18 |
| I'll probably write a vanilla C version first. | 23:00.31 |