| | 20180802 |
rnissl | Hi kens, I once again have a question regarding pdfwrite ;-) | 07:50.29 |
kens | OK | 07:51.10 |
rnissl | Is there a way to tell ghostscript that the same piece of postscript code appears on every page, hence pdfwrite could reuse the object of page 1? | 07:51.37 |
kens | Yes, a PostScript form | 07:51.48 |
rnissl | is this supported by now? | 07:51.59 |
kens | Note that pdfwrite won't reuse PDF form XObjects | 07:52.03 |
| pdfwrite will embed a PostScript form as a PDF Form XObject and will use that once instead of including it each time it is used | 07:52.31 |
| But if the input is PDF, then we don't do the same trick with PDF Form XObjects | 07:52.51 |
| Because while PostScript programs generally use Forms sensibly, PDF files generally do not | 07:53.09 |
rnissl | the last time I read about PS form support in ghostscript, the documentation mentioned that it will simply render the content each time. | 07:53.44 |
kens | That's for rendering | 07:53.56 |
| pdfwrite isn't a rendering device. It may also be that I added the support since you last read the docs | 07:54.24 |
| The code to use forms has been present for more than 3 years, because I see a bug fix against it in February 2015 | 07:55.26 |
| That commit improves performance by not rerunning the PaintProc when a form is a duplicate of an earlier one | 07:56.01 |
| http://git.ghostscript.com/?p=ghostpdl.git;a=commit;h=17131c7a7fdf9d8537c4715e538c49b29c8945a8 | 07:56.24 |
| That reduced an 80MB output file to a 4MB output file and reduced the time from 20 minutes to 16 seconds | 07:56.56 |
kens | fetches coffee | 08:04.33 |
rnissl | thanks kens, I'll give PS form a try :-) | 08:13.51 |
kens | OK | 08:14.01 |
rnissl | bye | 08:14.05 |
kens | bb | 08:14.09 |
| chrisl so I'm looking at this text problem with pdfwrite | 13:25.55 |
| It's broadly similar to the problem you fixed before | 13:26.04 |
| The font has a FontMatrix of [0.001 0 0 0.001 0 0] | 13:26.17 |
| And a FontBBox of [0 0 2 1] | 13:26.24 |
chrisl | Oh, the disappearing text issue - right | 13:26.44 |
kens | In this case though, the glyph is only partially drawn, because it's clipped | 13:26.46 |
| Because the estimated size of the glyph lies completely outside the clip, we drop it | 13:27.02 |
| Because the estimate is, of course, minuscule | 13:27.12 |
| I see no easy solution to this | 13:27.20 |
| We don't want to ignore the clip, because then we'd emit text that is totally invisible, inflating the size of the PDF | 13:27.42 |
| We don't want to run the CharString because that would be expensive | 13:28.00 |
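The failure described above can be reproduced with a little arithmetic. The following is a minimal sketch, not Ghostscript's code: the function names, the 100 unit text size, the clip rectangle and the glyph origin are all hypothetical, and only the FontMatrix and FontBBox values come from the log.

```python
def transform_bbox(bbox, matrix):
    """Map (llx, lly, urx, ury) through a PostScript-style matrix [a b c d tx ty]."""
    a, b, c, d, tx, ty = matrix
    llx, lly, urx, ury = bbox
    corners = [(llx, lly), (llx, ury), (urx, lly), (urx, ury)]
    pts = [(a * x + c * y + tx, b * x + d * y + ty) for x, y in corners]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def estimate_glyph_box(font_bbox, font_matrix, size, origin):
    """Estimate a glyph's box from FontBBox x FontMatrix, scaled and placed at origin."""
    llx, lly, urx, ury = transform_bbox(font_bbox, font_matrix)
    ox, oy = origin
    return (ox + llx * size, oy + lly * size, ox + urx * size, oy + ury * size)


def boxes_intersect(a, b):
    """True if two (llx, lly, urx, ury) rectangles overlap."""
    return a[0] < b[2] and a[2] > b[0] and a[1] < b[3] and a[3] > b[1]


FONT_MATRIX = (0.001, 0, 0, 0.001, 0, 0)   # from the log
FONT_BBOX = (0, 0, 2, 1)                   # from the log (degenerate)

clip = (10.0, 0.0, 200.0, 200.0)           # hypothetical clip; left edge at x=10
origin = (0.0, 50.0)                       # hypothetical glyph origin, left of the clip

estimate = estimate_glyph_box(FONT_BBOX, FONT_MATRIX, 100.0, origin)
real_box = (0.0, 50.0, 60.0, 120.0)        # hypothetical box of the drawn outline

print(boxes_intersect(estimate, clip))     # False: the tiny estimate misses the clip
print(boxes_intersect(real_box, clip))     # True: the real glyph is partly visible
```

With a sane FontBBox the estimate would straddle the clip edge and the glyph would be kept; here the estimate is a fraction of a unit wide, so the partly visible glyph is judged fully clipped.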
chrisl | The only full solution is to run the charstring, get the "real" bbox | 13:28.11 |
kens | Yeah but I'd rather not do that, I'm not even sure how to do it from the point in the code we're at. | 13:28.30 |
chrisl | If it's the same code I changed, we can do it, pretty easily, actually | 13:28.56 |
kens | Really? It's in the same routine | 13:29.08 |
chrisl | My first solution was to do that | 13:29.09 |
kens | Oh, well I guess maybe that would be best if you have a way to do that. | 13:29.22 |
| It'll be expensive but my only other solution is a hack | 13:29.31 |
| To look at the product of the FontBBox and the FontMatrix, and if it's tiny, use the inverse of the FontMatrix for the FontBBox | 13:29.53 |
| Getting the real width would be better, if slower. | 13:30.13 |
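The hack mentioned above might look roughly like this. It is a sketch under stated assumptions: the function name and the 0.01 threshold are hypothetical, and it assumes an axis-aligned FontMatrix (b = c = 0) with non-zero scale factors.

```python
def fallback_font_bbox(font_bbox, font_matrix, tiny=0.01):
    """Return font_bbox, or a box derived from the inverse FontMatrix if it looks degenerate.

    Assumes an axis-aligned matrix [a 0 0 d tx ty] with a and d non-zero.
    The 'tiny' threshold is an arbitrary value chosen for illustration.
    """
    a, _b, _c, d, _tx, _ty = font_matrix
    llx, lly, urx, ury = font_bbox
    width = abs((urx - llx) * a)    # box width after applying the FontMatrix
    height = abs((ury - lly) * d)   # box height after applying the FontMatrix
    if width < tiny or height < tiny:
        # Inverse of the scale: a box that maps to one unit square in text space.
        return (0.0, 0.0, 1.0 / a, 1.0 / d)
    return font_bbox


degenerate = fallback_font_bbox((0, 0, 2, 1), (0.001, 0, 0, 0.001, 0, 0))
sane = fallback_font_bbox((-100, -200, 1000, 900), (0.001, 0, 0, 0.001, 0, 0))
print(degenerate)   # roughly a 1000 x 1000 box
print(sane)         # unchanged
```

This only guesses a plausible size; it says nothing about the glyph's real extent, which is why running the charstring is the better, slower option.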
chrisl | I take it, in this case, the glyphs are out of the top, or bottom of the clip? | 13:30.40 |
kens | Top, bottom, left, right | 13:30.51 |
| Generally 'outside' | 13:31.00 |
chrisl | Left and right, we should be using the advance width of the glyph | 13:31.32 |
kens | The left side is clipped away | 13:31.46 |
| Which means that the code thinks the entire glyph is clipped out | 13:32.01 |
| Because the size is too small | 13:32.07 |
| The advance width is 0 in this case, it's an xyshow | 13:32.20 |
chrisl | Stupid, stupid..... | 13:32.34 |
kens | You should see what else the EPS does..... | 13:32.49 |
| It draws one glyph 3 times, apparently just to make it look really ugly | 13:33.06 |
chrisl | Nice! | 13:33.14 |
kens | Every single one of the charpath or show operations for that glyph lies outside the clip path as well | 13:33.43 |
| Because the 'size' of the glyph is so small | 13:33.56 |
| Crap font used in a crap way | 13:34.08 |
chrisl | Of course, I need to remember how to run the charstring.... <sigh> | 13:37.10 |
kens | Well you can probably figure it out faster than me, I'm clueless..... | 13:37.27 |
chrisl | Do you have a cut down test file? | 13:37.47 |
kens | Yes, *very* much smaller, I'll mail it | 13:37.59 |
chrisl | Thanks | 13:38.06 |
kens | On its way | 13:39.22 |
chrisl | Hmm, no sign yet..... | 13:41.34 |
kens | ah it's stuck, 1 sec | 13:41.59 |
| Gone now | 13:42.10 |
chrisl | Got it | 13:42.28 |
kens | That's a relief | 13:42.39 |
chrisl | Do I need to use -dEPSFitPage ? | 13:43.33 |
kens | -dEPSCrop | 13:43.42 |
| I haven't got round to translating the page | 13:43.50 |
| -dEPSFitPage probably works too | 13:44.10 |
| need coffee brb | 13:45.44 |
chrisl | kens: so, doing this: http://git.ghostscript.com/?p=user/chrisl/ghostpdl.git;a=commitdiff;h=c0aabc06a131def424c3e881df54162399e868c7 | 14:00.10 |
| Means the glyph_info call runs the charstring, gets the path, and gives a real bbox for the glyph | 14:00.33 |
kens | Really ? Huh well there you go, I'd assumed it would be much harder than that | 14:00.46 |
chrisl | Obviously, the current code in process_text_estimate_bbox() doesn't *use* the bbox, so that needs adding | 14:01.00 |
kens | Right I can do that, thanks | 14:01.13 |
chrisl | The other thing is, I probably have more GLYPH_INFO_* options in there than I actually need, but the way that gs_type1_glyph_info() handles those is.... well, baffling, to me | 14:02.30 |
kens | I've no idea what most of those do | 14:02.49 |
| I'd 'guess' the 'PIECES' ones are for compound glyphs | 14:03.01 |
chrisl | In a Type 1? | 14:03.18 |
kens | Yeah you know SEAC glyphs | 14:03.27 |
chrisl | I guess... | 14:03.48 |
kens | Couldn't remember the term at first | 14:03.53 |
| The rest of it, I've no idea | 14:03.59 |
chrisl | The point is, with all the options, we'll run the charstring once all the way through, then again, dropping out early after the hsbw operator | 14:04.41 |
kens | That should allow me to get rid of the degenerate FontBBox test too | 14:04.45 |
chrisl | If we can work out the right combination to avoid the "width_members" conditional, but keep the "default_members", we'll only run the charstring once | 14:05.28 |
kens | That would be handy, let me try it like this and then poke it with a sharp stick to see if I can get what I want | 14:05.53 |
| I think I did work through all this a long time ago for a similar problem | 14:06.17 |
chrisl | Actually, with the real bbox, we don't care about the "width" do we? | 14:07.47 |
kens | I suspect not no | 14:07.58 |
| There looks to be a lot of possible simplification in this code with the real glyph bbox | 14:08.15 |
chrisl | Given we just need the bbox: http://git.ghostscript.com/?p=user/chrisl/ghostpdl.git;a=commitdiff;h=dfd125010cfe5664f0164585efae2d3e91fbbcf9 | 14:10.36 |
kens | 1 second | 14:10.51 |
| Well a quick hack 'nearly' works | 14:12.07 |
| The top of the 'F' is still missing, that's a vertical problem | 14:12.28 |
| Need to look at that in a minute, of course if we only get hsbw then that won't give us the height | 14:13.09 |
chrisl | One downside about this is that we'll run this for every clipped instance of a glyph, until we reach an unclipped one.... but I guess that shouldn't be too onerous | 14:18.27 |
kens | Clipped glyphs aren't really that common | 14:18.47 |
| I'm screwing up the y value somewhere :-( | 14:19.08 |
chrisl | Well, there's a lot going on in there - but you are now getting a populated bbox? | 14:19.41 |
kens | Oh yes, it's already much better | 14:20.32 |
chrisl | So, a simple matter of programming then.... | 14:20.47 |
kens | Yeah I just need to figure out what I've broken :-) Thanks for the assist! | 14:21.03 |
chrisl | NP, I kind of wish I'd just gone with this the last time - 20/20 hindsight etc | 14:21.36 |
kens | Well there's no real way to see this in advance I don't think, it never occurred to me either | 14:22.03 |
chrisl | Yeh, it's just ironic: if I had been *lazier*, this issue would already have been fixed :-) | 14:23.11 |
kens | LOL | 14:23.19 |
| OK screwup fixed. | 14:23.55 |
| Ugly text now appears as expected, so that's a good fix. I just want to mimic it in the /.notdef case, and see if I can optimise away some of this. | 14:25.09 |
| Think I'll do a cluster run first...... | 14:25.18 |
| Well that's unfortunate, lots of seg faults | 14:56.27 |
| chrisl it looks like trying to do that bbox on a type 42 font (TrueType, it's a PDF) causes a crash | 15:03.03 |
chrisl | Well, that's bad - it ought to work. Which file? | 15:03.43 |
kens | gxttfb.c, gx_ttfReader_set_font, self is NULL | 15:03.55 |
chrisl | Which test file | 15:04.07 |
kens | I was using Bug692242.pdf | 15:04.09 |
| and the pdfwrite device obviously | 15:04.17 |
| My debugger is giving me nonsensical results on the back trace | 15:15.26 |
| Ah, no it's not, it's one of those stupid if ( function() || function() ) things | 15:16.05 |
| OK so the 'pair' returned from fm_pair has ttr = NULL | 15:18.01 |
chrisl | Yeh, I have a horrid, hacky solution for that - but the bbox is coming back as all zeros.... | 15:18.53 |
kens | Oh.... | 15:19.20 |
| I could try applying this only when the FontType is 1 or 3 | 15:19.36 |
chrisl | Um, that might be just one glyph though, the next one looks sensible.... | 15:19.58 |
kens | And fallback to the FontBBox otherwise | 15:19.59 |
| Oh! | 15:20.06 |
chrisl | kens: So, when I said hacky.... the change in base/gstype42.c: | 15:21.49 |
| http://git.ghostscript.com/?p=user/chrisl/ghostpdl.git;a=commitdiff;h=3d9fb7afbf6ac2d6365be6ed90a6485d95556248 | 15:21.53 |
kens | Well, I have no real clue what's going on there, it's all font'y | 15:22.29 |
| But not crashing is good :-) | 15:22.39 |
chrisl | The gx_provide_fm_pair_attributes() creates the ttf reader, and the ttf reader font (really a hacked up Freetype 1 ttf font) | 15:24.04 |
kens | Hmm, does it return reasonable values ? I haven't actually checked the return from the info proc | 15:24.28 |
| The PDF output file looks reasonable at a quick glance | 15:25.02 |
chrisl | It returns values.... I'm not totally sure what would constitute "reasonable" without going through it in a *lot* more detail | 15:25.23 |
kens | Hmm the glyph bbox is all 0 for me | 15:25.39 |
chrisl | In Bug692242.pdf? | 15:26.00 |
kens | Yes | 15:26.04 |
| Page 2 | 15:26.06 |
| First time it hits that piece of code in gstype42.c | 15:26.21 |
| When I get back to process_text_estimate_bbox the info.bbox is all 0 | 15:26.45 |
chrisl | Right, but the second and subsequent times | 15:26.47 |
kens | Hmm, haven't checked the second hit. | 15:27.18 |
| But regardless, I still need to check if it's all 0 for any result | 15:27.30 |
| Unless you think that all 0 is correct? I suppose it could be a .notdef or something | 15:27.45 |
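The all-zero check with a fallback to the FontBBox could be sketched as follows. The function name is hypothetical and this is not the pdfwrite code; it only illustrates the decision being discussed, and it deliberately treats an all-zero box as "no usable information" even though an empty glyph can legitimately produce one.

```python
def choose_bbox(glyph_bbox, font_bbox):
    """Prefer the real glyph bbox; fall back to the FontBBox if it is all zeros.

    Returns (bbox, source), where source records which box was used.
    """
    if all(v == 0 for v in glyph_bbox):
        # Could be an empty glyph (e.g. a zero-length outline) or a failed query;
        # either way it cannot be used for the clip test.
        return font_bbox, "font"
    return glyph_bbox, "glyph"


print(choose_bbox((0, 0, 0, 0), (0, 0, 1000, 1000)))      # falls back to the FontBBox
print(choose_bbox((12, -3, 540, 710), (0, 0, 1000, 1000)))  # keeps the glyph bbox
```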
| Hmm yeah second and subsequent glyphs are OK | 15:28.36 |
| maybe it's just a real stupid glyph | 15:28.44 |
| I'll give a cluster run a bash | 15:28.53 |
chrisl | So, the first glyph used is glyph_index 0, and the size of the glyph is 0 bytes, so..... | 15:29.03 |
| Sorry, glyph_index 1 | 15:29.10 |
kens | NP | 15:29.14 |
| Dumb glyphs deserve to be treated that way | 15:29.27 |
chrisl | I think we used to throw an error for that, and I changed it to treat it as an empty glyph | 15:30.03 |
kens | Oh, OK well I'm not going to worry about it too much. Just annoying we had to trip over that first! | 15:30.27 |
chrisl | I *suspect* it's a dumb way to subset a TTF | 15:30.43 |
kens | OK well the seg faults go away, which is good. 200 diffs to look at | 15:50.45 |
| Well Altona_Technical_v20_x4.pdf comes out wrong :-( | 16:03.01 |
| Looks like some clipping is occurring | 16:03.20 |
| Text is certainly disappearing, a job for tomorrow I think | 16:03.52 |