| 20151109 |
kens | Hmm so a problem that doesn't exhibit with a public device, only with a custom one, but it's a bug in Ghostscript ? Well maybe..... | 08:58.36 |
| chrisl I just finished up a bug, and was about to take a quick poke at Zoltan's one, unless you are already looking at it ? | 10:38.24 |
chrisl | kens: I'm not looking at it, no.... | 10:39.17 |
kens | OK I'll take a quick stab then | 10:39.29 |
| Though it looks more like Ray's area..... | 10:39.37 |
chrisl | TBH, my first thought was that somewhere we don't protect against the band size reaching zero..... | 10:41.12 |
kens | It's possible, I'll have to start off by modifying the makefile to build in the device, and then I'll look at it. I'm a bit puzzled that a regular device does not exhibit the same problem though, suspicious to my mind, but we'll see.... | 10:42.02 |
chrisl | I should have said *someone* isn't protecting against the band size reaching zero | 10:43.07 |
kens | :-) | 10:43.16 |
| I'll make a start on it, even if it's Ray's I may be able to save him some time | 10:43.40 |
| Well the first thing I notice is that his device includes its own create_compositor method, I wouldn't be at all surprised if that's the culprit right there. | 10:55.39 |
| Looks like Zoltan's problem is that at a low enough resolution (2200 dpi) a pattern does not get written as a clist. At a higher resolution, it does. And if it's a clist, it (possibly) doesn't work. He could probably increase some threshold somewhere to avoid the clist. | 11:23.20 |
chrisl | But the pattern clist should work.... | 11:24.38 |
kens | Agreed, but it doesn't (or if it does it takes a stupendously long time) | 11:25.01 |
| Just an observation at the moment | 11:25.10 |
| I'm going to reduce the PDF file, it decompresses to a mere 42MB | 11:25.31 |
| Well... it looks like pattern clists are just abominably slow to be honest | 12:08.15 |
chrisl | Hmm, that's not good :-( | 12:09.27 |
kens | The pattern itself is quite complex, so I reduced it to a simpler fill. That now runs 'quicker' but its still terribly slow | 12:09.59 |
chrisl | Is there anything in the pattern that requires a compositor device? | 12:11.13 |
kens | Not really, the compositor seems to be only there for overprinting. I may already have removed that | 12:11.37 |
| I guess I need to check | 12:11.45 |
chrisl | I just wondered if the device using its own push compositor meant we didn't use an optimisation of some sort | 12:12.29 |
kens | It's entirely possible | 12:12.38 |
| The answer appears to be that there is no longer anything calling the create_compositor, when I get done with this run I'll put a breakpoint and check for certain | 12:14.07 |
| It does set a custom halftone but that appears to be all | 12:15.50 |
| OK so there are now no calls to create_compositor, that's a definite red herring. I've removed all the offending GStates | 12:23.27 |
| Simplifying the pattern improves the speed, but it's still horrendously slow | 12:23.53 |
| I think this is one for Ray to profile, it just seems to me that pattern clists are very, very slow | 12:24.15 |
chrisl | So, running the file with, say, psdcmyk doesn't trigger the same problem? | 12:25.09 |
kens | Haven't tried yet, I was about to, I was just diff'ing the very slow and excruciatingly slow files to make sure I hadn't done something dumb | 12:25.44 |
| OK time to try a different device. | 12:26.19 |
| Other devices use tile_colored_fill instead of tile_pattern_clist, I don't know why yet. However that route is (comparatively) fast. I used psdcmyk in order to get the separations, even though I don't think we're actually setting any. However tiff24nc behaved exactly the same. | 12:30.01 |
| I need to find out where the method gets set. Going to grab some lunch quickly now though | 12:30.39 |
| So, not exactly a Ghostscript bug. But the performance of pattern clists does look like a worry. | 14:22.53 |
Robin_Watts | kens: Did you figure out what caused it to use the pattern clist rather than the tile_colored_fill? | 14:24.10 |
kens | Yes I put it in the bug, the customer's device has a bit depth of 64 instead of the normal maximum of 32 | 14:24.39 |
| This means we need a bitmap tile twice the size to hold the rendered pattern | 14:24.54 |
| Since they didn't increase the MaxPatternBitmapSize, it exceeded that and went to clists | 14:25.18 |
| Double the size (to match the bit depth) and the problem goes away | 14:25.33 |
Robin_Watts | kens: Ah, sorry, hadn't seen the bug. | 14:26.59 |
kens | Yeah it can take time for the mail to arrive | 14:27.19 |
| I'm just testing a standard device (tiff24nc) with -dMaxPatternBitmap=3,000,000 and I see the same problem. The normal size is 10,000,000 | 14:28.09 |
| So I'm pretty sure this is the root of the performance drop | 14:28.32 |
| Whether it's reasonable I'm less sure. We are ending up drawing a lot of pattern tiles by laboriously executing a (quite complex) pattern over and over, but it does seem slow all the same, which is why I haven't closed it, but have given it to Ray | 14:29.19 |
| Tiff turned out to be a bad choice, can't write a file that big :-) | 14:30.33 |
| I guess the fact that the customer has so many planes may slow it down some as well | 14:30.58 |
henrys | funny when the code crashes in the typical ghostscript macro, in gdb you do mac expand to see what the macro does and then say to yourself, well okay I need a different approach to debugging this ... | 15:14.27 |
| mac expand WRITE_UNALIGNED(WRITE_OR, WRITE_OR_MASKED) | 15:14.30 |
| expands to: bits = (case_right ? ((skew) < 8 ? (((((const bits16 *)(bptr))[0]) >> (skew)) & right_masks2[skew]) + ((((const bits16 *)(bptr))[0]) << (cskew)) : ((bits16)*(const byte *)(bptr) << (cskew)) & 0xff00) : ((cskew) < 8 ? (((((const bits16 *)(bptr - ((int)(sizeof(bits16)))))[0]) << (cskew)) & left_masks2[cskew]) + ((((const bits16 *)(bptr - ((int)(sizeof(bits16)))))[0]) >> (skew)) + (((bits16)(((const byte *)(bptr - | 15:14.30 |
| ((int)(sizeof(bits16)))))[2]) << (cskew)) & 0xff00) : ((((((const bits16 *)(bptr - ((int)(sizeof(bits16)))))[0]) & 0xff00) >> (skew)) & 0xff) + (((((const bits16 *)(bptr - ((int)(sizeof(bits16)))))[1]) >> (skew)) & right_masks2[skew]) + ((((const bits16 *)(bptr - ((int)(sizeof(bits16)))))[1]) << (cskew)))); ((bits16 *)dbptr)[0] |= (((bits) ^ invert) & mask); while ( count >= (((int)(sizeof(bits16)))*8) ) { bits = ((cskew) < 8 ? | 15:14.30 |
| (((((const bits16 *)(bptr))[0]) << (cskew)) & left_masks2[cskew]) + ((((const bits16 *)(bptr))[0]) >> (skew)) + (((bits16)(((const byte *)(bptr))[2]) << (cskew)) & 0xff00) : ((((((const bits16 *)(bptr))[0]) & 0xff00) >> (skew)) & 0xff) + (((((const bits16 *)(bptr))[1]) >> (skew)) & right_masks2[skew]) + ((((const bits16 *)(bptr))[1]) << (cskew))); bptr += ((int)(sizeof(bits16))); dbptr += ((int)(sizeof(bits16))); *((bits16 *)dbptr) |= | 15:14.33 |
| ((bits) ^ invert); count -= (((int)(sizeof(bits16)))*8); } if ( count > 0 ) { bits = ((cskew) < 8 ? (((((const bits16 *)(bptr))[0]) << (cskew)) & left_masks2[cskew]) + ((((const bits16 *)(bptr))[0]) >> (skew)) : (((((const bits16 *)(bptr))[0]) & 0xff00) >> (skew)) & 0xff); if ( count > skew ) bits += ((skew) < 8 ? (((((const bits16 *)(bptr + ((int)(sizeof(bits16)))))[0]) >> (skew)) & right_masks2[skew]) + ((((const bits16 *)(bptr + | 15:14.33 |
| ((int)(sizeof(bits16)))))[0]) << (cskew)) : ((bits16)*(const byte *)(bptr + ((int)(sizeof(bits16)))) << (cskew)) & 0xff00); ((bits16 *)dbptr)[1] |= (((bits) ^ invert) & rmask); } | 15:14.36 |
sebras | henrys: beautiful! :) | 15:16.48 |
henrys | that bug is clear as day now... | 15:17.38 |
Robin_Watts | but, to be fair, it probably compiles down to 3 instructions :) | 15:27.51 |
henrys | Robin_Watts: right I should just generate assembly and debug that. | 15:28.21 |
Robin_Watts | You want something that only partially expands the macros. | 15:28.57 |
| Or allows for the simplifications allowed by the constant values at that point. | 15:29.25 |
chrisl | Or sane macros..... | 15:29.47 |
henrys | I studied this ages ago and there was some reason it didn't fit nicely into an inline function. | 15:30.02 |
| but I don't remember what the reason was | 15:30.20 |
chrisl | Possibly because the macro takes macros as parameters | 15:30.46 |
| Actually, if you do macro expand WRITE_UNALIGNED(a, b) it should expand just the first level | 15:38.26 |
henrys | chrisl: no it seems to be expanding everything for me. | 15:42.38 |
kens | manual expansion in the code | 15:43.11 |
chrisl | Sorry, what I meant was it'll not expand WRITE_OR and WRITE_OR_MASKED | 15:44.13 |
henrys | chrisl: With the work I was doing it was easy to get the old language switch working with a separate instance for ps/pdf and I'm tripping over this which is really stinking of a global somewhere. Like the font cache maybe? | 15:48.14 |
chrisl | The font cache isn't in a global | 15:49.17 |
henrys | chrisl: no, no device sharing, everything is separate. But I might have screwed up somewhere. | 15:49.22 |
chrisl | henrys: does it happen with all/most devices? | 15:53.06 |
henrys | chrisl: btw I'm just babbling, I can work on it. No, the display device works, ljet4 does not ... | 15:53.47 |
| chrisl: ppmraw is wrong - any mono device seems to trip it up | 15:55.41 |
| sorry ppmraw is good ... | 15:55.50 |
chrisl | Well, I'd have to guess that's in the rendering code - nothing in the font caching code should be different for contone vs mono | 15:57.20 |
henrys | can we write PS and PCL front ends for MuPDF, add support to MuPDF for high-level devices and move on? ;-) I guess not. | 15:57.59 |
chrisl | Given that we can't even tidy up the ghostscript apis...... | 15:59.45 |
henrys | chrisl: a lot of it is to do with PS, I get it... but ghostscript kicks you at every turn, I can't believe Jaws was as difficult to change. | 16:02.33 |
chrisl | henrys: no it wasn't. But Jaws also got a *hefty* internal and API revision every five years or so - tossing out the crap, fixing the hacks | 16:03.44 |
| Jaws made no pretence about maintaining compatibility back to 1989...... | 16:05.28 |
mvrhel_laptop | I am going to be out most of this morning. Need to pick up my father at the airport | 16:44.30 |