IRC Logs

Log of #ghostscript at irc.freenode.net.

2013/10/25
stanv It is necessary to fill BBox with real numbers ? I fill it with INT09:14.51 
kens2 Depends on the BBox, I suggest you check the spec09:18.44 
stanv I don't understand what is content stream ? Is it PS ? Or what ? I have stream and rg, Tf Td tj res ref …. 09:22.26 
  Where I can read about it ?09:22.37 
kens2 The PDF Reference Manual09:22.54 
stanv I can't find (09:23.02 
kens2 Open the PDF reference manual and search for 'content stream'. First result is section 3.7 "Content streams and resources". Why is this hard ?09:25.22
stanv I did it09:26.21 
  But i can't find anything about "res" "ref"09:26.39 
kens2 No reason you would09:27.30
Robin_Watts chrisl: ping10:49.53 
  I'm going to have to ignore the alignment issues etc of memory buffers until ray gets here I think. The clist (as ever) complicates things massively.11:19.46 
  Not to mention the problems I suspect we will have with the garbage collector.11:21.15 
  gc will probably want to move the buffer so it's not aligned any more.11:21.36 
  So their SSE code is bonkers.11:37.36 
  hey paulgardiner. Back home ?11:37.44 
paulgardiner Yep. Back to UK weather, although I guess it's not as bad today as it has been from all accounts.11:39.36 
  Was a thoroughly good holiday. Jumped up and down on the top of vesuvius but didn't manage to set it off.11:41.17 
Robin_Watts glad to hear it.11:44.38 
chrisl Robin_Watts: sorry, forgot to change my status..... back now11:56.22 
Robin_Watts chrisl: No worries.11:56.32 
  I was talking yesterday with ray about how we'd like the buffer devices to be aligned better.11:57.09 
  (i.e. align to 16 bytes and pad the raster to 16 bytes to enable SSE to work on it)11:57.27 
chrisl Yeh, I saw some of the discussion.....11:57.37 
Robin_Watts BUT... the garbage collector will bugger that up, right?11:57.47
  But having looked at their SSE, I don't think they need it to be aligned anyway, so I propose to ignore the problem.11:58.22 
chrisl Maybe, but we could deal with that - IF raster memory is allocated in GC memory, which seems rather pointless to me. We could tweak gc to retain alignment, but I wonder if it would be better to allocate raster memory from non-gc memory.......11:59.40 
Robin_Watts chrisl: It looks to me like the clist allocates a large block of memory and the raster memory is magically in the middle of it somewhere.12:01.03 
  So i'm very glad to be ignoring the problem :)12:01.15 
chrisl Yeh, I think you're right - let's ignore it unless it really is an issue..... if it does become an issue, the other option would be to copy data out of the raster buffer into an aligned buffer, do the operations and copy them back again, until you reach the first aligned address in the raster buffer, then do the operations in place from there on.12:04.11
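A minimal sketch of the fallback chrisl describes, assuming a power-of-two alignment such as the 16 bytes mentioned for SSE. `unaligned_head` is a hypothetical helper, not a Ghostscript function; it only reports how many leading bytes would need the copy-out/copy-back treatment before in-place processing can start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Ghostscript API): how many bytes at the start
 * of [p, p + len) sit before the first address aligned to 'align'.
 * Those bytes would go through an aligned scratch buffer; everything from
 * the first aligned address onwards could be processed in place. */
static size_t unaligned_head(const void *p, size_t len, size_t align)
{
    uintptr_t addr = (uintptr_t)p;
    size_t head;

    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    head = (size_t)((align - (addr & (align - 1))) & (align - 1));
    return head < len ? head : len;
}
```

For a buffer starting 3 bytes past a 16-byte boundary, the first 13 bytes would take the scratch-buffer path.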
Robin_Watts Their SSE is bonkers, as far as I can tell.12:11.27 
  Well, inefficient.12:11.42 
  And their output isn't really 4 bit, it's 0-712:20.29 
kens2 chrisl ping13:30.39 
chrisl kens2: pong13:30.55 
kens2 do you have a URL for the 'how to get stuff out of CUPS' page ?13:31.18 
  For bug 69474313:31.36 
chrisl https://wiki.ubuntu.com/DebuggingPrintingProblems#Capturing_print_job_data13:31.50 
kens2 Thanks13:31.55
chrisl That's going to be a 'mare to work on, without direct access to the printer in question.....13:32.37 
kens2 Well possibly, though it may be that it's the PXLmono driver that's slow13:33.12
chrisl That's true......13:33.58 
kens2 I'm just trying to get some sensible info from him13:34.16
chrisl Once again, I really feel the CUPS devs should be the ones to work out how to reproduce the problem in Ghostscript - rather than us and the end user.....13:37.11 
kens2 Me too, but they never do13:37.21 
Robin_Watts chrisl, kens2: Frankly, I feel that we ought to push back on this.13:37.53 
kens2 Robin_Watts : we've tried that before13:38.04 
chrisl Robin_Watts: the problem is, we end up penalising the user, and it's usually not their fault13:38.19 
Robin_Watts If the cups people can't give us a command line to reproduce the problem, then they can't expect us to solve it.13:38.43 
  And as nice as it is to help users out, ultimately, that's not our job.13:39.07 
chrisl I think in this case we may not have any option......13:40.16 
Robin_Watts chrisl: RESOLVED INVALID13:40.36 
  Has there been any indication that the cups people have even looked at the problem?13:41.16 
kens2 Robin_Watts : I've spent less than 5 minutes on this, it's not a great investment so far.13:43.00
  I spend longer than that on Stack Overflow each day13:43.15 
Robin_Watts yes, but this could easily turn into a time sink, and the idea is that all the time on SO comes back multiple times as the answers are there as references.13:44.05
  But it's you guys who get to deal with this, so I'll shut up.13:44.18 
kens2 chrisl this one may be of some interest to us in future. Someone wanting to know how to 'sign the Ghostscript print driver' on Windows 8. It seems there's a reasonably simple way to do it, and he's been kind enough to outline the steps too:13:48.00
  http://stackoverflow.com/questions/19520540/signed-ghostscript-postscript-print-driver/19535192?noredirect=1#comment29067490_1953519213:48.00 
chrisl Interesting, I was under the impression we need stuff from MS for it, but apparently not......13:49.25 
kens2 Maybe the certificate he talks about in step 113:49.38
  Anyway, it's far from vital13:49.47
  Just interesting13:49.52 
Robin_Watts Morning mvrhel_laptop13:50.03 
  You speak SSE, right ?13:50.11 
mvrhel_laptop good morning13:50.13 
  Robin_Watts: I used to13:50.24 
Robin_Watts could I get you to sanity check some code later for me please?13:50.36 
kens2 A weasel reply :-)13:50.37 
mvrhel_laptop Robin_Watts: yes that would be fine13:51.17 
Robin_Watts Thanks.13:51.22 
kens2 mvrhel_laptop : ping14:02.10 
mvrhel_laptop kens2: I will be back in about 45 minutes14:12.29 
kens2 OK NP14:12.34 
Robin_Watts Morning ray_laptop 14:59.37 
ray_laptop Robin_Watts: I saw the comment earlier about buffer alignment, but I have to take my son in now, so I'll be back in about 30 m14:59.51 
Robin_Watts Can I get some time with you today to talk about this padding stuff?15:00.00 
  Ah, great, thanks.15:00.04 
ray_laptop Robin_Watts: and the GC doesn't move around chunks -- it just moves small stuff to pack chunks15:01.00 
  and since everything larger than about 10k gets its own chunk, the "buffer" is safe15:01.34
  Robin_Watts: tweaking the line pointer setup to leave padding, and/or perform alignment should be possible, but I'll have a look when I get back15:02.35 
Robin_Watts ray_laptop: Thanks.15:03.06 
  While it looks like their SSE code doesn't NEED alignment, it does NEED padding.15:03.27 
ray_laptop It'd be a good check to make sure that no place assumes that lines are at a given stride, rather than using the line pointers15:03.38 
Robin_Watts And alignment would let the SSE code run a lot faster.15:03.46 
  I am always slightly surprised that getbits doesn't return the raster.15:04.16 
ray_laptop Robin_Watts: AIUI even if it doesn't *need* alignment, reading aligned data is faster15:04.21 
Robin_Watts get_bits_rectangle, rather.15:04.25 
  ray_laptop: yes. but SSE has separate instructions for aligned and unaligned data.15:04.50 
  and the unaligned ones are slower.15:04.57 
ray_laptop Robin_Watts: yes, that's what I was trying to say15:05.15 
  bbiab. ~30m15:05.57 
mvrhel_laptop kens2: I am back15:20.52 
kens2 Hi michael15:21.03 
  I think I see a problem with gx_concretize_ICC15:21.21
mvrhel_laptop ok15:21.27 
kens2 when I compare the code with gx_remap_ICC it does not cater for Lab spaces (specifically remapping the components into the correct range)15:21.56
  with the result that I get incorrect colours for Lab spaces15:22.14 
  in gx_remap_ICC is this:15:23.05
  if (pcs->cmm_icc_profile_data->data_cs == gsCIELAB ||15:23.05 
  pcs->cmm_icc_profile_data->islab) {15:23.05 
  but I see no equivalent in gx_concretize_ICC15:23.18 
mvrhel_laptop hmm that is true. is there a case when going to a raster device that this causes an issue or only for what you are doing in the new pdfwrite code15:25.25 
kens2 the rendering code doesn't seem to use concretize, but the pdfwrite (actually ps2write) code does15:25.54 
  and it gets different (wrong) answers compared to rendering15:26.05 
mvrhel_laptop I understand15:26.16 
kens2 I think the same special case range remapping needs to be done in concretize15:26.34
mvrhel_laptop kens2: yes it looks that way15:26.42 
  do you want to go ahead and make the change15:26.53 
kens2 I'll do it tomorrow if that's OK with you15:27.04 
mvrhel_laptop I am fine with that15:27.09 
kens2 OK thanks. I'll put it past you for review before I commit15:27.24 
mvrhel_laptop ok. I am more interested in the cluster push results. I think this is just going to be a cut and paste job15:27.58 
kens2 yeah it'll be easy code, I'll cluster push it separately15:28.18
mvrhel_laptop ok sounds good15:28.27 
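The kind of range remapping kens2 refers to can be sketched roughly as below. This is not the code from gx_remap_ICC; it assumes L* runs over 0..100 and a*, b* over -128..127, and that the colour management step wants each component in 0..1. `lab_to_unit_range` and `clamp01` are hypothetical names.

```c
#include <assert.h>

/* Clamp x to the range 0..1. */
static float clamp01(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Hypothetical sketch: map Lab components into 0..1.
 * Assumed input ranges: L* in 0..100, a* and b* in -128..127. */
static void lab_to_unit_range(const float lab[3], float out[3])
{
    assert(lab != 0 && out != 0);
    out[0] = clamp01(lab[0] / 100.0f);
    out[1] = clamp01((lab[1] + 128.0f) / 255.0f);
    out[2] = clamp01((lab[2] + 128.0f) / 255.0f);
}
```

Without a step of this kind, a Lab value would be handed on in its native range, which is consistent with the wrong colours described above.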
Robin_Watts http://www.meteorwatch.org/uk-iss-passes-october-2013.html15:37.36 
  Today should be a good day for seeing the space station.15:37.48 
ray_laptop Robin_Watts: OK, I'm back.15:42.26 
Robin_Watts ray_laptop: OK, so let me find the bit of code that had me interested earlier...15:42.54 
  gdev_mem_set_line_ptrs, in gdevmem.c15:43.25 
ray_laptop Robin_Watts: this gets called by gx_default_setup_buf_device.15:44.05 
Robin_Watts Indeed. Look at what happens to the value of raster that is passed in.15:44.22 
  If base is non null (i.e. if we are being given a buffer) then we take on the value of raster we are given, and we store it in mdev->raster15:45.17 
  but then when we get into the bottom loop to set up the line pointers, we completely ignore raster and calculate it again.15:45.46 
ray_laptop Robin_Watts: right, screwy15:45.54 
Robin_Watts Now, as long as the raster we pass in is the same as the calculated one, no harm, no foul.15:46.21 
  but this morning I was watching a planar device in the debugger.15:46.43 
  I was getting a raster value of 24792 or something like that on entry.15:47.03 
ray_laptop so even if we feed it a padded bytes_per_line, (called raster in that func), it would ignore it and use the "width"15:47.11 
Robin_Watts which was for a width of 4900 or something.15:48.03 
  with 5 planes.15:48.10 
  So the *actual* raster is 4900ish, not the stored value of 24792.15:48.50 
  and if you actually calculated the size of the buffer based upon the stored raster, you'd be off too, as 5*padded(w) != padded(5*w)15:49.02 
  So.... I'm thinking that we should remove the calculation from this function and only use the value passed in.15:49.33 
  And we should ensure that in planar cases, we pass in the correct value.15:49.45 
ray_laptop Robin_Watts: I agree. That seems better15:49.51 
Robin_Watts That way we can potentially pad higher up and pass in a padded width here without ill effects.15:50.18 
  OK, so let me find the next bit that had me confused.15:50.34 
ray_laptop and that would let us set the pad in a custom setup_buf_device15:51.08 
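The mismatch Robin_Watts describes (5*padded(w) != padded(5*w)) can be illustrated with a small sketch. The plane width of 4958 bytes and the padding modulus of 8 are assumptions picked so the numbers resemble the ones quoted; `pad_to` and `set_line_ptrs` are hypothetical stand-ins, and the plane-after-plane layout is also an assumption rather than a description of gdev_mem_set_line_ptrs.

```c
#include <assert.h>
#include <stddef.h>

/* Round nbytes up to a multiple of mod. */
static size_t pad_to(size_t nbytes, size_t mod)
{
    assert(mod != 0);
    return (nbytes + mod - 1) / mod * mod;
}

/* Hypothetical stand-in for the line pointer setup: use the per-plane
 * raster supplied by the caller and never recompute it from the width.
 * Assumed layout: planes stored one after another, 'height' lines each. */
static void set_line_ptrs(unsigned char **line_ptrs, unsigned char *base,
                          size_t raster, int height, int num_planes)
{
    int pi, y;

    for (pi = 0; pi < num_planes; pi++)
        for (y = 0; y < height; y++)
            line_ptrs[pi * height + y] =
                base + ((size_t)pi * height + y) * raster;
}
```

With these assumed numbers, padding the total of five planes gives 24792 bytes, while five individually padded planes need 24800, so a buffer sized from one figure and walked with the other would drift.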
Robin_Watts clist_rasterize_lines calls gx_default_setup_buf_device which calls gdev_mem_set_line_ptrs15:51.29 
  In particular mdev->base is set to be mdata, which is calculated at line 707 of gxclread.c15:52.21 
ray_laptop Robin_Watts: but the "gotcha" is that the buffer size needed for a band height needs to know about the pad (or the band height calculated from the buffer size)15:52.44 
Robin_Watts So that looks to me like we have a single clist data block that contains first the page_tile_cache, then the rendered image.15:53.22 
  ray_laptop: Yes, I can see that this is all a maze of twisty passages.15:53.40 
ray_laptop Robin_Watts: yes, and the mdata area is actually both the raster lines AND the line pointers (by default)15:54.05 
Robin_Watts ray_laptop: Right.15:54.13 
  How does this compare with multi-threaded rendering?15:54.58 
ray_laptop Robin_Watts: there is a facility for "foreign line pointers" that aren't in the block, but separately allocated15:55.04 
Robin_Watts Does each thread have its own data block ?15:55.21
ray_laptop Robin_Watts: I don't understand your question15:55.26 
Robin_Watts ray_laptop: Yes, I am aware of that, but that wasn't bothering me too much. The complexity is not in the line_pointers.15:55.34 
ray_laptop Robin_Watts: oh, yes with MT, each thread gets its own buffer area, 15:56.00
Robin_Watts So we're still dealing with this same piece of code?15:56.22 
ray_laptop yes15:56.30 
Robin_Watts OK, that's a relief. I was afraid there might be another variant of this to think about as well.15:56.48 
ray_laptop The only funky bit is that we swap the buffers between the main thread and the thread that just rendered a band we need (to avoid a copy)15:57.45 
Robin_Watts ray_laptop: Yeah, I saw the logic for that when I was doing process_page.15:58.39 
ray_laptop then at the end we have to make sure the main thread's buffer is set back to the one it allocated since we don't want it pointing at one we free thinking that a thread owns it15:58.43 
Robin_Watts So, as far as I can make out we need:15:59.15 
ray_laptop drum roll...15:59.28 
Robin_Watts 1) Some changes to the buffer size prediction code to find out about/allow for alignment/padding.15:59.37 
ray_laptop agrees with 116:00.05 
Robin_Watts 2) A way of identifying the aligned start position of the render stuff within the crdev->data16:00.29 
  Possibly that's just modifying page_tile_cache_size up a bit? But maybe we want a separate member that stores a padded value?16:01.15 
  3) A way of storing the actual padded raster used, rather than calculating it from clist_plane_raster16:01.48 
  4) to ensure that we pass the real value of raster in to gdev_mem_set_line_ptrs rather than an incorrect one in planar mode16:02.35 
  5) to ensure that gdev_mem_set_line_ptrs doesn't recalculate the raster it's given.16:02.51 
  That's all I have so far.16:03.53 
ray_laptop on 2, we need to do the alignment of 'mdata', but once we do the setup_buf_device, everyone _should_ access via the line_pointers, so no one else needs to know about the alignment or padding, right ?16:04.46 
ray_laptop agrees with 3, 4 and 516:05.31 
Robin_Watts ray_laptop: For users of the device, I agree.16:05.42 
  I don't know how many places use the address in the code that sets up the device.16:06.52 
ray_laptop Robin_Watts: ???16:07.43 
  which address ?16:07.55 
Robin_Watts I was thinking that if there were lots of places in the code that did: blah = crdev->data + page_tile_cache_size; (such as the line I pointed you to earlier) then we might want to have a new member so we could do:16:08.40 
  blah = crdev->data + start_of_rendered_data;16:08.55 
ray_laptop so making clist_rasterize_lines have enough info to perform the padding and alignment is needed.16:08.57 
Robin_Watts but if that's the only place that does the calculation, we can just pad there, and be happy.16:09.26
  s/pad/align/16:09.34 
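Aligning the start of the rendered data inside the clist data block might look something like this. It is only a sketch: `aligned_offset` is a hypothetical helper, and the idea that its result would play the role of `start_of_rendered_data` is an assumption based on the lines above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given a block starting at 'data' and a minimum
 * offset into it (for example the size of the tile cache), return the
 * smallest offset >= min_offset whose address is aligned to 'align'. */
static size_t aligned_offset(const unsigned char *data, size_t min_offset,
                             size_t align)
{
    uintptr_t addr = (uintptr_t)data + min_offset;
    uintptr_t aligned;

    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    aligned = (addr + align - 1) & ~(uintptr_t)(align - 1);
    return min_offset + (size_t)(aligned - addr);
}
```

Note that the alignment is applied to the address, not just to the offset, since the data block itself may not start on a 16-byte boundary.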
  I'm not entirely convinced that clist_rasterize_lines needs to be the one to have the brains in.16:11.18 
ray_laptop Robin_Watts: OK, so it looks like clist_render_thread also calls setup_buf_device and create_buf_device16:11.35 
Robin_Watts Certainly clist_rasterize_lines needs to be able to find the right aligned address, and to pass the raster into the sub functions.16:11.56 
ray_laptop I think re-factoring that logic to a shared function might be good16:12.04 
  Robin_Watts: we can ignore gdevprna.c (I am going to delete that)16:12.28 
Robin_Watts but the decision on the size of the data block/page_cache_size/padding etc, may be better done elsewhere, and just some information can be left in the structure for clist_rasterize_lines to read.16:12.54 
  I don't have a clear enough view on this code to see how it could be refactored nicely, but I could easily believe that such a refactoring would be beneficial.16:13.25 
ray_laptop Robin_Watts: yes, we need the alignment and padding info, probably as values in the device16:13.55 
Robin_Watts Now, this needs to be made to work for the non-clist case too.16:14.35 
ray_laptop Robin_Watts: I just grepped for page_tile_cache_size, and it's really only used in gxclread (clist_rasterize_lines) and the uses in gxclthrd.c16:14.38
  Robin_Watts: and probably for non gx_device_printer devices16:15.13 
Robin_Watts And the buffer functions are only called from the clist case I believe.16:15.20 
  right.16:15.23 
  I had vaguely wondered about a gxdso call.16:15.41 
ray_laptop which means that raster line alignment and padding should be values in gx_device *16:15.55
Robin_Watts gxdso_render_alignment16:16.06 
  ray_laptop: Do we need separate alignment and padding values?16:16.23 
ray_laptop What's wrong with values in the device struct16:16.27 
Robin_Watts I'm generally averse to changing all devices when we don't have to.16:17.00 
ray_laptop Robin_Watts: I was thinking separate because a device may not need to constrain alignment, but does want to pad to a certain multiple16:17.17 
Robin_Watts especially given the horrific state of affairs we have with macros initialising everything already.16:17.26 
ray_laptop Robin_Watts: I admit it's easier to not have to change all the device initializers.16:17.50
  Robin_Watts: and this is infrequently used so a call out is OK16:18.12 
Robin_Watts I'm not averse to having both alignment and padding values.16:18.26 
ray_laptop so, a dso is OK16:18.33 
Robin_Watts ray_laptop: How are you set for panicked customers at the moment?16:19.01
  Are you in a position to help me with this?16:19.08 
  I get the feeling that you'd make changes in the clist code with far less pain/breakage than me.16:19.30 
ray_laptop Robin_Watts: yes, I was going to volunteer to help. No P1 customers chewing on my behind16:19.34 
Robin_Watts Fab. That's a huge relief! :)16:19.44 
  ray_laptop: So how do we proceed here?16:21.59 
ray_laptop Robin_Watts: so, I'm thinking that 'pad' will ensure that each line is a multiple of 'pad' bytes, and that 4 is the minimum (to avoid breakage in the gdevmem functions). Values returned < 4 use 4.16:22.01
Robin_Watts Values returned smaller than align_bitmap ?16:22.24 
  or align_bitmap_mod or whatever the thing used in bitmap_raster is.16:22.51 
ray_laptop and that alignment be align_bitmap_mod as a minimum16:23.01 
Robin_Watts One thing that struck me earlier.16:23.19 
ray_laptop also to avoid breakage of code that assumes that align_bitmap_mod is applied. So alignment may be > align_bitmap_mod16:23.48 
Robin_Watts gs guarantees that if you have a w * h bitmap, that you can access from base to base + (pad(w) * h-1) + w16:24.10 
  i.e. the padding guarantees are different at the end of the last line.16:24.26 
  For SSE operation, we want to guarantee that we always safely read the last part of the line using an SSE op.16:24.56 
ray_laptop Robin_Watts: yeah, that's a bit funky16:24.57 
Robin_Watts So we should guarantee that you can do base + (pad(w) * h)16:25.20 
ray_laptop seems like a size of (pad(w) * h) should be used16:25.25 
Robin_Watts I agree.16:25.32 
ray_laptop I think that the clist is the only place where it tried to truncate the final line, but the space savings doesn't seem worth it16:26.42 
Robin_Watts ray_laptop: I'm not suggesting we try to lift that restriction everywhere.16:27.07
ray_laptop Robin_Watts: do you recall offhand where that is mentioned16:27.21 
Robin_Watts Just that we should ensure that the buffers we create in the new code should all be the larger size.16:27.28 
ray_laptop I have no problem with that16:28.18 
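The two buffer sizes being compared can be written out as a small sketch, taking w as the unpadded line length in bytes and pad(w) as that length rounded up to the padding modulus. All three function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round nbytes up to a multiple of mod. */
static size_t pad_to(size_t nbytes, size_t mod)
{
    assert(mod != 0);
    return (nbytes + mod - 1) / mod * mod;
}

/* Size if the last line is truncated to its real width:
 * pad(w) * (h - 1) + w. */
static size_t bitmap_size_truncated(size_t w, size_t h, size_t mod)
{
    return h == 0 ? 0 : pad_to(w, mod) * (h - 1) + w;
}

/* Size if every line, including the last, is fully padded: pad(w) * h. */
static size_t bitmap_size_full(size_t w, size_t h, size_t mod)
{
    return pad_to(w, mod) * h;
}
```

For a 100-byte line padded to 16, a 16-byte read starting at offset 96 of the last line ends at byte 112, which only the fully padded size covers.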
Robin_Watts I can't find the reference any more :(16:30.01 
ray_laptop Robin_Watts: np.16:30.53 
Robin_Watts so, ray_laptop, what's the plan here?16:36.05 
  Do you have enough information to make progress?16:36.05 
  If so, I'll keep bashing on the rest of the device.16:36.05 
ray_laptop Robin_Watts: I think it's time for me to clean up the stuff in gdev_prn_allocate w.r.t. guessing if we need to band16:37.03 
  yuck. Looks like there are a whole mess of functions that need to understand padding and alignment, but most use gdev_mem_bits_size. I hope changing that doesn't break too much :-(16:39.22 
  but since it will use a dso callout, it shouldn't change other devices16:40.51 
  Robin_Watts: I'll start on it now, and shout out if I run into something I'd like to discuss before changing.16:42.53 
Robin_Watts ray_laptop: sorry, was on phone. That sounds great. Thanks.16:46.49 
ray_laptop Robin_Watts: BTW, the comment in gxbitmap.h line 50 seems to say that bitmaps *must* be of size pad(w)*h16:46.59
  * The padding requirement is that if the last data byte being operated on16:47.26 
  * is at offset B relative to the start of the scan line, bytes up to and16:47.28 
  * including offset ROUND_UP(B + 1, align_bitmap_mod) - 1 may be accessed,16:47.29 
  * and therefore must be allocated (not cause hardware-level access faults).16:47.31 
  nothing there about the last line being smaller16:47.56 
Robin_Watts ok, cool.16:48.24 
ray_laptop I think that business about a shorter last line was confined to the clist writing16:48.55 
  an area I think you had occasion to swear about16:49.20 
  Robin_Watts: So you are thinking that a device that has a special alignment and padding will provide a dso and will *NOT* assume the line stride is bitmap_raster(width*depth) (as your fpng code does, for one)16:56.33 
Robin_Watts ray_laptop: Absolutely.16:56.48 
ray_laptop so I am thinking we need a bitmap_raster_padded(int width_bits, int pad)16:57.25 
  oops, or bitmap_raster_padded(dev)16:57.55 
  the latter would call the dso, and use the dev width and depth16:58.29 
  what's your opinion ?16:58.53 
Robin_Watts urm...16:59.37 
  So you're suggesting a bitmap_raster_padded(dev) that would call the dso, and if that replies, it uses that value, otherwise it uses bitmap_raster(dev->width * ...) ?17:00.47 
ray_laptop Robin_Watts: yes17:01.07 
Robin_Watts The only worry I have with that is that bitmap_raster_padded(target) would give the correct value, but bitmap_raster_padded(mdev) would not.17:01.27 
  and people that didn't know better might be confused.17:01.40 
ray_laptop Robin_Watts: or we could have bitmap_raster_padded(gx_device *dev, int width_bits, int pad)17:01.47 
Robin_Watts I would be tempted to have bitmap_raster_padded as you just said, where you pass the pad value in, and it does the calculation. That way the gxdso call is kept explicit.17:02.46 
  Is that what you were suggesting ?17:02.52 
ray_laptop Robin_Watts: true, so not passing the dev makes the caller get the pad17:03.02 
  yes, that was my first suggestion. Thanks.17:03.23 
  bitmap_raster_padded(int width_bits, int pad)17:03.36 
Robin_Watts Right.17:03.57 
  Possibly we could offer both bitmap_raster_padded(int width_bits, int pad) and also 'device_raster(target)'17:04.26 
  where device_raster would do the gxdso etc and call bitmap_raster_padded with the results.17:05.02 
ray_laptop Robin_Watts: so device_raster(target_dev) would be like the second suggestion.17:05.14 
Robin_Watts yeah17:05.23 
  it would need to figure out if it was a planar device, cos that affects what bit depth you use.17:05.44 
  The planar support in gs is slightly complicated in that some of it is written as if we were going to allow different bit depths in each plane, and all the important stuff is written so we absolutely do not allow different bit depths in each plane :)17:06.33 
  so personally, I'd just assume that all the planes have the same bitdepth if it's planar.17:06.56 
  as that makes the code easier, and lots of other places assume that already.17:07.11 
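The split discussed above, a pure calculation plus a device-level wrapper that does the query, could be sketched as follows. Everything here is illustrative: `toy_device` and its `query_pad` callback are hypothetical stand-ins for a real device and the gxdso call, `DEFAULT_ALIGN_MOD` is an assumed value standing in for align_bitmap_mod, and only the two function names and the `(int width_bits, int pad)` idea come from the conversation.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_ALIGN_MOD 8 /* assumed stand-in for align_bitmap_mod */

/* Hypothetical device: 'depth' is bits per pixel (per plane if planar,
 * assuming every plane has the same depth). */
typedef struct toy_device {
    int width;
    int depth;
    /* Stand-in for the gxdso query; NULL or a result <= 0 means
     * "this device has no special padding requirement". */
    int (*query_pad)(const struct toy_device *dev);
} toy_device;

/* Pure calculation: bytes per line for width_bits, rounded up to pad. */
static size_t bitmap_raster_padded(size_t width_bits, size_t pad)
{
    size_t bytes = (width_bits + 7) / 8;

    assert(pad != 0);
    return (bytes + pad - 1) / pad * pad;
}

/* Wrapper: ask the device for its pad, fall back to the default. */
static size_t device_raster(const toy_device *dev)
{
    int pad = dev->query_pad ? dev->query_pad(dev) : 0;

    if (pad <= 0)
        pad = DEFAULT_ALIGN_MOD;
    return bitmap_raster_padded((size_t)dev->width * (size_t)dev->depth,
                                (size_t)pad);
}

/* Example callback: a device that wants lines padded to 16 bytes. */
static int pad16(const toy_device *dev)
{
    (void)dev;
    return 16;
}
```

Keeping the query in the wrapper means the caller decides which device is asked, which addresses the worry about getting a different answer from the target and from the memory device.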
ray_laptop I'm thinking I need to add alignment and pad the gx_device_memory struct since gdev_mem_bits_size will need it17:07.14 
Robin_Watts s/the/to the/ ?17:07.54 
ray_laptop Robin_Watts: yeah, I saw that in gdev_mem_bits_size width * planes[pi].depth17:08.02 
  yes, /to the/17:08.19 
Robin_Watts Ok, that makes sense, yes :)17:08.27 
ray_laptop Oh, great. We have a gx_device_raster that takes a 'bool pad'. I wonder if we should have it call the dso if 'pad' is true ?17:18.37 
  if 'pad' is false, it just returns the number of bytes, so that's OK17:19.28 
  oops. and gdev_prn_raster returns the *unpadded* line_size :-(17:22.03 
  and gdev_prn_copy_scan_lines assumes that the dest buffer is 'packed' (no alignment or pad observed since it uses the above)17:23.05 
  If this doesn't blow up, I'm going to be surprised. :-/17:25.21 
  well, lots of devices use the (deprecated) copy_scan_lines, so I guess the packing issue is OK (or they'd all be broken)17:29.16 
Robin_Watts sheesh, so this device...17:56.15 
ray_laptop Robin_Watts: so, is it OK to define mdev->raster to be the size of a padded scan_line and if so, what do we do about different depths per plane ?17:56.20 
Robin_Watts ray_laptop: I think we should define mdev->raster to be the size of a padded scanline.17:56.38 
ray_laptop Robin_Watts: or do we disallow, retroactively, differing depths per plane (I think we should)17:56.55 
Robin_Watts Further, I think that for planar operation, mdev->raster should be the size of the raster for a plane.17:57.06 
ray_laptop We can error out if they differ17:57.14 
Robin_Watts We disallow differing depths per plane.17:57.17 
  We already throw an error if plane depths differ.17:57.32
ray_laptop OK. So if they differ, we'll return an error17:57.45 
  from gdev_mem_bits_size17:57.53 
Robin_Watts I think Peter started to code it allowing for different depths, then realised that wasn't going to be trivial, and started to code around it with the idea he'd come back and fix it later.17:58.09 
  Then I think he just gave up :)17:58.14 
ray_laptop Robin_Watts: yes, for planar, mdev->raster will be the scanline size for a plane17:58.38 
Robin_Watts This device opens 1 FILE * per plane.17:58.43
ray_laptop not just the total17:58.47 
Robin_Watts Then at the end of the page, closes them.17:58.54 
  It never writes to those files.17:59.00 
  It DOES open other files and write to them.17:59.09 
  but it collects ALL the data from every plane into a buffer, then writes each plane out individually, possibly with compression.17:59.32 
ray_laptop Robin_Watts: maybe so. Way, way back there were devices like VGA 256 color that had different number of bits per colorant, but they weren't planar17:59.41 
Robin_Watts ray_laptop: yeah, I did LOTS of work on 565 devices in the past.18:00.02 
ray_laptop Robin_Watts: we all did.18:00.17 
  although I was really glad when we upgraded to 12-bit color at our college (in 1974) on our Ramtek color display18:01.10 
  it still had a CLUT, so we could set it for 256 gray shades18:01.57 
  but for video games, 4096 color was preferred (and naughty pictures)18:02.28
  not that any of us ever did the latter18:02.41 
  ;-)18:02.51 
  Robin_Watts: I'll "fix" the definition of 'raster' to be as we discussed for planar devices.18:03.25 
Robin_Watts The suns I used at college had mono screens (literally mono, no greys).18:03.37 
ray_laptop Sun didn't exist yet, afaik18:03.53 
Robin_Watts Then we got some sparc boxes that did 8 bits.18:04.07 
  I cut my teeth on BBC micros, so we had a tradeoff between 'high res' mono, or very low res 16 colors.18:04.36 
ray_laptop At CalComp we had the early Sun's that were diskless ethernet (NAS) only18:04.44 
Robin_Watts where 16 colors = 8 colors + flashing :)18:04.53 
ray_laptop and those were also 1-bit mono18:05.03 
  Robin_Watts: yeah, like EGA18:05.14 
Robin_Watts The physics department had one Next, and that was GORGEOUS.18:05.31 
  Even though they only had the greyscale one, not the color one.18:05.46 
ray_laptop I don't think I ever saw a real Next 18:06.22 
Robin_Watts but nowadays, anything short of 24 bit in full HD is unusable.18:06.26 
  ray_laptop: They were massively ahead of their time.18:06.36 
ray_laptop what year was that ?18:07.05 
Robin_Watts 1990ish18:07.36 
ray_laptop I see. In 1980 we built a prototype CAD system that had 256 shade 1024x1024 with h/w line drawing with AA and h/w flood fill -- driven from 68000's 18:09.38
Robin_Watts There was a sudden transition on PCs where suddenly graphics cards changed from "maybe" being able to do 16 or 24bit color but at a lower res, to having it as an absolutely standard thing to do that at any res you wanted.18:09.40 
  Dedicated cad systems had been around for a while at that stage, but they weren't standard off the shelf units you could just buy.18:10.44 
ray_laptop that was fairly advanced for the time, but the project got killed because it was too inexpensive compared to the 'cash cow' vector displays the parent company sold to IBM for CAD18:10.45 
Robin_Watts heh.18:10.54 
ray_laptop We had negotiated with CADAM to run their s/w on the 68000 (all FORTRAN of course). They were excited, but the parent company had a fit when they got wind of it since the IBM displays on "big iron" were so profitable.18:12.31
  This was just a couple of years before AutoCad cleaned their clock 18:13.00 
  anyway, back to work...18:13.40 
Robin_Watts ray_laptop: When you looked at their SSE the other day, did you end up making sense of it?18:15.39 
  I looked earlier, and I think I understand it now.18:15.51 
  including the bluenoise levels they are using etc.18:16.02 
  but, as far as I can tell, they only ever output in 3 of the 4 bits.18:16.22 
  i.e. their maximum value is 7, not 15.18:16.43 
ray_laptop Robin_Watts Yes, I saw maxdrop coming in as 718:17.35 
Robin_Watts ray_laptop: Right, not just maxdrop ==7, the actual values that the SSE would end up returning would be 7.18:18.10 
ray_laptop Robin_Watts: but I still don't quite understand how they don't get extra 'noise' if all 0 comes in, adding the noise doesn't seem right18:18.28 
  the BNM tables range from 0 to 25518:18.54 
Robin_Watts they'd take a value in the 0..255 range, expand it to be in the 0..256 range, multiply it by 7 (so giving a value in the 0..0x700 range).18:18.59 
  Then they'd add the noise to it (giving a value in the 0..0x7ff range) and throw away the bottom 8 bits.18:19.30 
  so the noise level seems right to me.18:20.00 
ray_laptop OK, so a 0x80 comes in, stays as 128 ? then becomes 896 (0x380). So the noise > 0x80 becomes level 4. OK.18:23.37 
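The arithmetic described in the last few lines can be checked with a scalar sketch. It starts from the value already expanded to the 0..256 range, because the log does not say how the expansion from 0..255 is done; `quantize_level` is a hypothetical name and this is not the customer's SSE code.

```c
#include <assert.h>

/* Scalar sketch of the quantisation step as described:
 *   v256  : contone value already expanded to 0..256
 *   noise : blue noise mask value, 0..255
 * Multiply by 7 (0..0x700), add the noise (0..0x7ff), then throw away
 * the bottom 8 bits, leaving an output level in 0..7. */
static unsigned quantize_level(unsigned v256, unsigned noise)
{
    assert(v256 <= 256 && noise <= 255);
    return (v256 * 7u + noise) >> 8;
}
```

With v256 = 128 the scaled value is 896 (0x380), so noise values of 0x80 and above give level 4 and lower values give level 3. An input of 0 gives level 0 for every noise value, since the sum is at most 255, and the largest possible result is 7.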
henrys fairly important that mvrhel_laptop be up to speed on all the 801 changes he'll be visiting the customer.18:24.18 
Robin_Watts henrys: ok.18:24.38 
  I'll draft a reply to them about their device within the next couple of days and we can kick it about between us.18:25.08 
henrys sounds good. Are we really using unaligned scan lines in ghostscript - I'd like to see alignment be a requirement for all devices.18:26.15 
Robin_Watts Essentially, I shall highlight the ways in which we think we can improve on their current system: 1) Using the planar device (new technology that they probably didn't know about). 2) Using process_page (new APIs based upon their cunning code to do stuff on the threads, but more portable), 3) Using SSE aligned buffers (something we've just added for their needs).18:27.00 
ray_laptop miles is starting in position 81 (out of 84). I wonder what the S/T in the time column means ?18:27.19 
Robin_Watts ray_laptop: That means "did not prequalify"18:27.38 
  i.e. he skipped the running yesterday.18:28.00 
ray_laptop henrys: no, we align to 'align_bitmap_mod' which is a compile time setting18:28.06
Robin_Watts henrys: We have always had "aligned" buffers (where "aligned" means "to a small degree")18:28.16 
  as ray says, to align_bitmap_mod, which is 1-4 or 8. (4 on windows or 32 bit linux, 64 on 64bit linux).18:28.48 
  Essentially to the size of a long.18:28.54 
  but for SSE etc, you need to align to 16 bytes.18:29.05 
  And if we push everything up to aligning to 16 bytes, that'll play hell with the clist compression for bitmaps etc.18:29.30 
  oops, 8 on 64bit linux, sorry.18:29.51 
henrys oh I didn't realize sse2 was 1618:30.08 
ray_laptop henrys: 16 is more efficient18:30.25 
Robin_Watts mvrhel_laptop hit the SSE alignment stuff in his thresholding code so had to do padding etc in there.18:31.05 
henrys oh right the sse2 registers are 128 bit.18:31.09 
ray_laptop Robin_Watts: so if align_mod is > pad, we will actually use align_mod, but if 'pad' comes in with something screwy (like 6 when align_mod == 4) we will pad by 8. Sound right ?18:37.26 
  Robin_Watts: in other words we guarantee alignment AND provide the requested pad as a minimum18:37.57 
Robin_Watts ray_laptop: OK.18:38.13 
ray_laptop Just want to make sure it makes sense.18:38.35 
  thanks.18:38.42 
Robin_Watts So the caller gets "at least the padding it requested"18:38.42 
ray_laptop and the alignment they need18:38.58 
  usually align_mod and pad will be the same18:39.26 
Robin_Watts OK.18:39.39 
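[Editor's sketch] One reading of the rule agreed above ("guarantee alignment AND provide the requested pad as a minimum") is to round the requested pad up to a multiple of align_mod and then round the raster up to that modulus. The function names below are hypothetical and this is not the Ghostscript implementation; align_mod is assumed to be non-zero.

```c
#include <stddef.h>

/* Hypothetical: smallest multiple of align_mod that is >= pad,
 * and never less than align_mod itself. */
static size_t effective_pad(size_t align_mod, size_t pad)
{
    if (pad <= align_mod)
        return align_mod;
    return ((pad + align_mod - 1) / align_mod) * align_mod;
}

/* Hypothetical: round a scanline width in bytes up to the effective pad. */
static size_t padded_raster(size_t width_bytes, size_t align_mod, size_t pad)
{
    size_t mod = effective_pad(align_mod, pad);
    return ((width_bytes + mod - 1) / mod) * mod;
}
```

Under this reading, a pad of 6 with align_mod == 4 becomes 8, matching the example in the discussion.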
  For this stuff, I may well want to use 16 and 32 actually.18:39.54 
ray_laptop Oh. Why ?18:40.17 
Robin_Watts because for the downsampling case, I read in 2 16 byte SSE registers worth from each of 2 scanlines.18:40.39 
  Then I combine those to give me 1 16 byte SSE register full of the bluenoised average of those 4 and store it back.18:41.08 
ray_laptop Robin_Watts: OK. I didn't know you were going to tackle the downsample average with SSE as well.18:41.27 
Robin_Watts hence I need a padding of 32 for the reading.18:41.30 
  I coded it earlier. Never done SSE before. I have 2 possible versions of the code, one that needs SSE3.18:41.56 
ray_laptop I still don't know why you need the extra pad. You are reading in 32 bytes, but from 3 different scanlines18:42.14 
  2 different18:42.26 
  only 16 bytes from each18:42.40 
  oh, nm. 2 16 byte *per* scanline.18:43.15 
  sorry18:43.16 
  yeah, that makes sense. And that should be a *lot* more efficient than what they had18:43.55 
Robin_Watts because I can only write out 16 at a time, I need to read 64 in (32 per scanline) at a time.18:44.05 
ray_laptop yes, I got it. sorry for the misread.18:44.20 
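[Editor's sketch] The access pattern described above (32 bytes read from each of two scanlines, 16 bytes written) can be shown with a plain scalar C reference. The function name is hypothetical, the blue-noise step is left out, and the rounding choice is an assumption; the SSE version discussed in the log is not shown anywhere in it.

```c
#include <stdint.h>

enum { OUT_PIXELS = 16 };

/* Hypothetical scalar reference: average each 2x2 block of samples.
 * Reads 2 * OUT_PIXELS bytes from each of row0 and row1 and
 * writes OUT_PIXELS bytes to out. */
static void downsample_2x2_block(const uint8_t *row0, const uint8_t *row1,
                                 uint8_t *out)
{
    for (int i = 0; i < OUT_PIXELS; i++) {
        unsigned sum = row0[2 * i] + row0[2 * i + 1]
                     + row1[2 * i] + row1[2 * i + 1];
        out[i] = (uint8_t)((sum + 2) >> 2); /* rounded average (assumed) */
    }
}
```

Because each call consumes 32 input bytes per scanline, a row padded to a multiple of 32 bytes can be processed without a special case for the tail.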
Robin_Watts ray_laptop: It would be nice to get the packing to 4 bits into the SSE too.18:44.29 
  but I need to think about that.18:45.02 
ray_laptop yeah. shifts and or's and stuff18:46.38 
Robin_Watts ray_laptop: It's not just the shifts and ors, it's the fact that everything needs to shuffle up.18:47.14 
  I've got the results as 16 8 bit values.18:47.24 
  and I need to repack that as 32 4 bit values.18:47.36 
  Possibly I can do it by repacking the register as 16s to another register of 8s, then shifting the original and repacking again and combining.18:49.15 
  but then I will need to read 64 values per row and I might run out of SSE registers and confuse the compiler.18:49.54 
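[Editor's sketch] The repacking problem above (8 bit results that need to end up as 4 bit values) looks like this in scalar C. The function name and the nibble order (first value in the high nibble) are assumptions for illustration; count is assumed to be even.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: pack 'count' values, each already in
 * the range 0..15, into count / 2 output bytes, two values per byte.
 * The first value of each pair goes in the high nibble (assumed order). */
static void pack_nibbles(const uint8_t *in, size_t count, uint8_t *out)
{
    for (size_t i = 0; i + 1 < count; i += 2)
        out[i / 2] = (uint8_t)(((in[i] & 0x0f) << 4) | (in[i + 1] & 0x0f));
}
```

This is where the "shuffle up" comes from: every output byte takes data from two neighbouring input bytes, so 32 input values collapse into 16 output bytes.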
ray_laptop Robin_Watts: OK. I'll leave it to you. Going to lunch.18:52.46 
  can't be any worse than the rotate raster bitmap right I did in SSE (first in C of course).18:54.07 
mvrhel_laptop henrys: yes. I do need to understand what is going on with 801 before I go19:19.19 
  bbiab21:03.26 
ray_laptop Robin_Watts: did you figure out the packing ?21:13.44 
  I think RL4 A, MOV to B from A, RL4 B, OR A B to A (which doesn't matter) then the same thing for the other 16 bytes, result in, say B, then PACKUSWB A and B into A. Since the data is 0-7, the sign bit of the result is never set, so we never "saturate" to 0 and the "garbage" in the low bytes should get discarded. Just an idea21:23.46 
  caveat: I haven't read the Intel manual thoroughly enough yet, and when I last did SSE code was MMX :-)21:25.12 
  Robin_Watts: BTW, the intrinsic for that last op is: __m128i _mm_packus_epi16 (__m128i a, __m128i b)21:50.27 
  and the description is: Convert packed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst. (dst is a)21:51.16 
  Each destination byte comes from dst[7:0] := Saturate_Int16_To_UnsignedInt8 (a[15:0]), etc.21:52.10 
  hmm... there may be a SRL 8 needed on the two registers to get the data to the low bytes :-/21:55.29 
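[Editor's sketch] The saturating behaviour of PACKUSWB / _mm_packus_epi16 quoted above can be modelled in scalar C as follows. The function names are hypothetical; this models the documented semantics and is not SSE code.

```c
#include <stdint.h>

/* Signed 16-bit to unsigned 8-bit with saturation:
 * negative values become 0, values above 255 become 255. */
static uint8_t saturate_i16_to_u8(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Hypothetical scalar model of _mm_packus_epi16(a, b):
 * the 8 words of a fill dst[0..7], the 8 words of b fill dst[8..15]. */
static void packus_epi16_model(const int16_t a[8], const int16_t b[8],
                               uint8_t dst[16])
{
    for (int i = 0; i < 8; i++) {
        dst[i]     = saturate_i16_to_u8(a[i]);
        dst[i + 8] = saturate_i16_to_u8(b[i]);
    }
}
```

In this model a word whose high byte is non-zero is clamped to 0 or 255 rather than truncated, which is consistent with the suspicion above that the data has to be moved to the low bytes first.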
  once I finish with cust 532 fire drill, I'll play with it21:55.58 
Robin_Watts ray_laptop: For the logs: I've not thought any more about the SSE since I mentioned it before. I want to get the rest of the device working first.23:00.18 
  I'll probably write a vanilla C version first.23:00.31 