| 20160210 |
sebras | tor8: I left two commits on sebras/master: 1. support for text-top/-bottom, and 2. a patch to support CJK punctuation in iscjk(). Not having that included confused me a bunch when I was trying to understand why my Chinese input got breaks where I didn't expect them. ;) | 00:19.22 |
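As an illustration of the kind of check sebras describes, here is a hypothetical `iscjk()` that also treats CJK punctuation (for example `。` and `，`) as CJK, presumably so that breaking around it behaves as it does around ideographs. The Unicode block ranges are standard; the function itself is a sketch under that assumption, not MuPDF's actual code.

```c
/* Hypothetical sketch of an iscjk() test that includes CJK punctuation.
 * The ranges are Unicode block boundaries; a real implementation may
 * cover more (or fewer) blocks. */
static int iscjk(int c)
{
	if (c >= 0x3000 && c <= 0x303F) return 1; /* CJK Symbols and Punctuation */
	if (c >= 0x3040 && c <= 0x30FF) return 1; /* Hiragana and Katakana */
	if (c >= 0x4E00 && c <= 0x9FFF) return 1; /* CJK Unified Ideographs */
	if (c >= 0xFF00 && c <= 0xFFEF) return 1; /* Halfwidth and Fullwidth Forms */
	return 0;
}
```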
halabund | I asked this question a year or so ago, but I still don't have a solution, and I am hoping that some new tools have become available since then: | 10:34.47 |
| Is there a way to scale a single-page PDF document by some factor? | 10:35.04 |
| Nothing should visibly change when displayed on screen, however the print size should change. E.g. when scaled by 0.5, if the original page size was 10 by 10 centimetres, it should become 5 by 5. | 10:35.45 |
| Both the page and the contents should scale proportionally. | 10:35.57 |
| Can recent Ghostscript versions do this? | 10:36.18 |
chrisl | You can't really have different scaling for viewing and printing, AFAIK | 10:37.14 |
halabund | chrisl: No, I don't want different scaling for viewing and printing :) I was just trying to explain that everything should scale proportionally. This question is often misunderstood as wanting to place the same graphics on a different page size, say A4 -> Letter or something like that. That's not what I want, I just want a simple scaling of *everything* by the same factor. | 10:40.36 |
chrisl | halabund: Ghostscript can't manipulate a PDF that way, but it can probably create a *new* PDF scaled as you want. | 10:41.42 |
halabund | chrisl: What does that mean? What I need is that the old and the new one look identical (other than the scaling). I do not mind if the internal structure changes in some way. | 10:42.50 |
chrisl | It should look the same, but internally it will not be the same, and a fair amount of metadata will be different or missing altogether | 10:43.34 |
| But you won't be able to do "scale by 0.5" with Ghostscript - you'll have to get the page size, work out the new page size, and then "convert" | 10:43.41 |
halabund | chrisl: Do you mean that I cannot use a scaling factor, only a target size in centimetres? That is okay, because I do know the starting page size. | 10:44.45 |
chrisl | Yes, exactly | 10:44.56 |
halabund | But only the width. | 10:44.56 |
| I do not know the height. | 10:45.00 |
chrisl | No, you need to know both | 10:45.15 |
halabund | I have an approximation of the height, I guess that might do. If I know the precise target size, how would I get Ghostscript to do this scaling for me? | 10:46.05 |
chrisl | If you add "-dDEVICEWIDTHPOINTS=w -dDEVICEHEIGHTPOINTS=h -dFIXEDMEDIA -dFitPage" to your command line, it should do what you want | 10:47.47 |
halabund | where w and h are the target, right? | 10:48.10 |
chrisl | Yes, in points | 10:48.35 |
| So, something like: gs -sDEVICE=pdfwrite -o new.pdf -dDEVICEWIDTHPOINTS=w -dDEVICEHEIGHTPOINTS=h -dFIXEDMEDIA -dFitPage old.pdf | 10:49.35 |
halabund | OK, let me try. I'm trying to work around a Mathematica bug where it introduces inaccuracies when exporting at small sizes (due to some stupid internal rounding, I guess). I want to export at 5x the size, then rescale to the target size. | 10:50.05 |
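Because `-dDEVICEWIDTHPOINTS` and `-dDEVICEHEIGHTPOINTS` take absolute sizes in points (1/72 inch), a scale factor first has to be turned into a target size. A small sketch of that arithmetic, assuming the original page size is known either in centimetres or as a MediaBox in points; the helper names are made up for illustration.

```c
/* 1 inch = 2.54 cm = 72 PostScript points */
static double cm_to_points(double cm)
{
	return cm / 2.54 * 72.0;
}

/* Scale a MediaBox [llx lly urx ury], given in points, by 'factor'.
 * *w and *h are the values to pass as -dDEVICEWIDTHPOINTS and
 * -dDEVICEHEIGHTPOINTS. */
static void scaled_size(const double box[4], double factor, double *w, double *h)
{
	*w = (box[2] - box[0]) * factor;
	*h = (box[3] - box[1]) * factor;
}
```

For halabund's example, a 10 cm by 10 cm page is about 283.46 pt square, so scaling by 0.5 means passing roughly 141.73 for both values.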
chrisl | halabund: there is a Postscript utility we ship called pdf_info.ps which (amongst many other things) will give you the original page size of the PDF (or, more accurately, the dimensions of the various bounding boxes a PDF must/may contain) | 10:54.05 |
halabund | chrisl: Just tried, it works very well with my figures! Thank you. Also, I see that if I give the wrong height, it won't change the aspect ratio. That is very good. It means that it is not a problem if my height is a bit inaccurate (as long as it is not so small that it crops the graphics). | 10:55.32 |
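The behaviour halabund observes is consistent with fit-to-page scaling choosing a single uniform factor for both axes. A sketch of that calculation, assuming the common rule of taking the smaller of the two ratios and splitting the leftover space evenly (centring); this is an illustration, not Ghostscript's code.

```c
/* Result of fitting a source page into a destination page. */
typedef struct { double scale, xoff, yoff; } fit;

/* Uniform fit-to-page scale: the smaller of the two ratios, so the
 * content never overflows either dimension and keeps its aspect ratio.
 * Assumed: the leftover space is split evenly on both sides. */
static fit fit_page(double src_w, double src_h, double dst_w, double dst_h)
{
	fit f;
	double sx = dst_w / src_w;
	double sy = dst_h / src_h;
	f.scale = sx < sy ? sx : sy;
	f.xoff = (dst_w - src_w * f.scale) / 2;
	f.yoff = (dst_h - src_h * f.scale) / 2;
	return f;
}
```

Under this assumed rule, an overestimated height only adds a margin (a nonzero `yoff`), while an underestimated one makes the height ratio the smaller one and shrinks the whole drawing.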
chrisl | halabund: well, as I said above, you can use pdf_info.ps to get the exact original size | 10:56.36 |
halabund | Well, there's one small problem: it recompresses some images as JPEG and the JPEG artefacts are visible now. Can I instruct it to use lossless compression? | 10:57.19 |
| or very high quality | 10:57.33 |
chrisl | Erm, you can - I'll have to look it up, though, hang on...... | 10:57.44 |
tor8 | sebras: both lgtm, I'll push. | 10:58.06 |
chrisl | halabund: try adding "-dAutoFilterColorImages=false -dAutoFilterGrayImages=false" to your command line | 11:00.27 |
halabund | chrisl: Thanks! It didn't solve the quality degradation completely, but by googling for this option I found -dColorImageFilter=/FlateEncode which does fix the problem entirely. In my case fortunately it doesn't blow up the file size at all. http://comp.lang.postscript.narkive.com/vwkyi2e5/how-to-tell-ghostscript-to-leave-bitmap-images-alone | 11:04.43 |
| chrisl: Thank you for all the help! :-) | 11:05.48 |
chrisl | Hmm, strange, our docs suggest that disabling the autofilter should have us always using flate - possibly a bug (either docs or code) there | 11:06.20 |
| halabund: NP | 11:07.01 |
halabund | I am using 9.16, maybe I should upgrade to 9.18. I notice it's available for Mac now. http://pages.uoregon.edu/koch/ | 11:09.51 |
chrisl | halabund: whilst we always recommend using an up to date version, it's probably not critical unless you hit a problem - then you should definitely update and try it before reporting it, otherwise, you'll get your wrist slapped ;-) | 11:17.10 |
tor8 | Robin_Watts: rats, dirn_matches/fz_bidi_fragment_text/detect_flow_directionality gets stuck in an everlasting loop for some files | 11:18.40 |
Robin_Watts | tor8: It does? Throw the file at me, and I'll see what I can find. | 11:19.08 |
tor8 | http://ghostscript.com/~tor/stuff/0.epub | 11:19.35 |
| Robin_Watts: if you ever use linux, and have something that's taking longer than you expected: "sudo perf top" is your friend :) | 11:20.48 |
| like top, but it actually looks at which functions in the process are taking time :) | 11:21.17 |
Robin_Watts | nice. | 11:35.36 |
tor8 | Robin_Watts: malc_ has found an even simpler test case for the hang | 12:01.19 |
malc_ | wtf? | 12:01.30 |
| i haven't found it | 12:01.33 |
| i MADE it | 12:01.36 |
| by hand | 12:01.38 |
| some courtesy please | 12:01.43 |
Robin_Watts | malc_: Oh, fabulous. I'd love to see it. | 12:02.14 |
| (Sorry, I'm buried in gs at the moment, hope to get to this in a short while) | 12:02.29 |
malc_ | <html> | 12:02.35 |
| <pre> | 12:02.35 |
| 25EF ◯ LARGE CIRCLE | 12:02.35 |
| → 20DD ⃝ combining enclosing circle | 12:02.35 |
| → 25CB ○ white circle | 12:02.35 |
| → 2B24 ⬤ black large circle | 12:02.37 |
| → 2B55 ⭕ heavy large circle | 12:02.40 |
| → 3007 〇 ideographic number zero | 12:02.42 |
| âá | 12:02.45 |
| aaâa | 12:02.47 |
| latinããããÙاÙسةÙÙ
Ù
شا ÙÙ Ùشاةش٠ÙÙ ÙشاÙØ©à°¹à±à±à°à°à±à°µà±à°°à±à°¨à±à°°à±ÑÑÑÑкий | 12:02.50 |
| </pre> | 12:02.53 |
| </html> | 12:02.55 |
| i guess only the line with arabic is relevant though | 12:02.58 |
| lemme test | 12:03.04 |
| nope | 12:03.25 |
| <html> | 12:04.00 |
| <pre> | 12:04.00 |
| latinããããÙاÙسةÙÙ
Ù
شا ÙÙ Ùشاةش٠ÙÙ ÙشاÙØ©à°¹à±à±à°à°à±à°µà±à°°à±à°¨à±à°°à±ÑÑÑÑкий | 12:04.00 |
| </pre> | 12:04.03 |
tor8 | if (broken) break; drops it out of the loop but leaves 'end' unchanged so we enter an eternal loop | 12:04.05 |
malc_ | </html> | 12:04.05 |
| drop <pre> and the hang disappears | 12:04.08 |
tor8 | malc_: the <pre> adds the equivalent of <br/> on all newlines to our internal data structure | 12:05.11 |
| malc_: you only need a <pre> tag to trigger the bug. no need for actual bidi text. | 12:06.56 |
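To illustrate tor8's point that `<pre>` turns every newline into the equivalent of a `<br/>`, here is a hypothetical sketch that splits preformatted text into text runs separated by break markers. The node representation is invented for this example and is not MuPDF's internal data structure.

```c
#include <string.h>

/* Hypothetical node kinds for a flattened text flow. */
enum pre_kind { PRE_TEXT, PRE_BREAK };

/* Split preformatted text at '\n', writing up to 'max' node kinds into
 * 'out'. Every newline becomes a PRE_BREAK (like an implicit <br/>), and
 * every non-empty line becomes a PRE_TEXT node. Returns the node count. */
static int split_pre(const char *s, enum pre_kind *out, int max)
{
	int n = 0;
	while (*s && n < max) {
		const char *nl = strchr(s, '\n');
		size_t len = nl ? (size_t)(nl - s) : strlen(s);
		if (len > 0)
			out[n++] = PRE_TEXT;
		if (!nl)
			break;
		if (n < max)
			out[n++] = PRE_BREAK;
		s = nl + 1;
	}
	return n;
}
```

So even plain ASCII inside `<pre>` produces break nodes, which fits tor8's finding that no bidi text is needed to reach the affected code.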
malc_ | tor8: don't you grow a new type of appreciation of mozilla/etc developers? ;) | 12:08.34 |
Robin_Watts | tor8: How about... moving end = end->next; to be just before if (broken) break; ? | 12:09.01 |
| That should remove the possibility of end being unchanged, so we'll always make progress. | 12:09.31 |
| Nothing after that point uses end at all. | 12:09.37 |
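A reduced sketch of the loop shape under discussion, with invented names: if `end` were only advanced after the `if (broken) break;` test, a fragment starting on a broken node would never move `end`, and the outer loop would restart from the same node forever. Advancing `end` first, as Robin_Watts suggests, guarantees progress. This is not the actual `fz_bidi_fragment_text` code.

```c
#include <stddef.h>

typedef struct node {
	int broken;          /* hypothetical: this node ends a fragment */
	struct node *next;
} node;

/* Count the fragments in a list, where each fragment ends after a
 * node with 'broken' set (or at the end of the list). */
static int count_fragments(node *head)
{
	int count = 0;
	node *start = head;
	while (start) {
		node *end = start;
		while (end) {
			int broken = end->broken;
			end = end->next; /* advance BEFORE the break, so 'end' always moves */
			if (broken)
				break;
		}
		count++;
		start = end; /* the next fragment starts after this one */
	}
	return count;
}
```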
tor8 | Robin_Watts: ta, that looks like it fixes the problem. | 12:09.56 |
Robin_Watts | Fab. | 12:10.04 |
| tor8, malc_: Thanks. | 12:10.11 |
| tor8: Can i let you commit that? | 12:10.23 |
malc_ | Robin_Watts: and thank you for not indulging yourself with 'ta' and 'fab' | 12:10.31 |
tor8 | Robin_Watts: yes, I can commit that. | 12:11.01 |
| Robin_Watts: that and one more short commit on tor/master | 12:12.07 |
| one to add a build=sanitize flag to use clang/gcc's address sanitizer | 12:12.20 |
malc_ | tor8: you should add sanitize=undefined too :) | 12:12.47 |
| would be fun to experience the fallout of that | 12:13.18 |
Robin_Watts | tor8: lgtm. | 12:16.14 |
sebras | tor8: great, thanks! | 12:52.54 |
Robin_Watts | tor8: I've got a commit on robin/master to sort out that common code. See what you think. | 14:03.14 |
tor8 | Robin_Watts: I don't see any new commits | 14:11.18 |
Robin_Watts | tor8: oops. | 14:53.26 |
| tor8: sorry, look now. | 14:55.58 |
| That runs with no diffs. | 14:56.10 |
| (as you might expect) | 14:56.16 |
| Do we have any epub files in the cluster? | 14:56.28 |
HenryStiles | z/OS ... seriously? | 14:56.51 |
| oh, I guess it is more recent than I thought; I'm confusing it with their older mainframe OSes | 14:59.25 |
tor8 | Robin_Watts: looks pretty good. maybe call it string_shaper rather than walker? | 15:00.30 |
Robin_Watts | tor8: Could do. | 15:00.42 |
tor8 | not sure which is clearer, walker is pretty obvious :) | 15:00.53 |
Robin_Watts | I originally had the shaping separate to the walking, but then I twigged I could put it all together. | 15:01.14 |
| I think I prefer walker. | 15:01.45 |
kens | HenryStiles : The described condition does not occur for me running the file on Windows, but in the absence of a command line..... | 15:01.49 |
HenryStiles | kens: I was going to leave it with marcosw for now | 15:02.15 |
Robin_Watts | cos it makes more sense that a "walker" gets called multiple times whereas a "shaper" might be expected to be called only once. | 15:02.22 |
kens | As I said to Chrisl I was waiting for a regression run so I gave it a quick try | 15:02.34 |
tor8 | Robin_Watts: yeah. probably best to just leave it as is. | 15:03.18 |
kens | I find it hard to see how they get to that line with size->y being 0, since there's an explicit test and return against it higher up | 15:03.28 |
tor8 | Robin_Watts: LGTM. | 15:03.46 |
Robin_Watts | ta. | 15:03.53 |
| tor8: Do you have a set of epub files you use to test? We should put that in the cluster. | 15:07.24 |
tor8 | Robin_Watts: I do not, I just write simple html files for testing new features... | 15:12.08 |
Robin_Watts | tor8: OK. | 15:12.23 |
tor8 | that, and my private ebook collection | 15:12.38 |
| which is mostly simple fiction so doesn't exercise any fancy features | 15:12.53 |
| Robin_Watts: I think sebras collected a bunch of epub files a while back | 15:15.28 |
| they should be on casper somewhere | 15:15.33 |
Robin_Watts | tor8: so, next thing to think about... | 15:27.28 |
| for some lines, when we shape things, the combined shaped text has a taller bbox than any of the individual glyphs. | 15:28.03 |
| hence we ought to up the line spacing on such lines. | 15:28.27 |
tor8 | Robin_Watts: yeah... that's a difficult problem. uneven line heights are really ugly. | 15:29.27 |
| we could just bump our default line spacing by a fair bit | 15:29.45 |
Robin_Watts | More generally than that, I wonder how baselines compare for different scripts. | 15:29.50 |
| We calculate bboxes as strictly positive things. | 15:30.16 |
| i.e. from (0,0) + (w,h) | 15:30.25 |
tor8 | the html specification and implementations all make terrible typographic mistakes, with extra line height added for stuff like <sup> tags | 15:30.56 |
Robin_Watts | if we have languages that 'hang' from the baseline, then we might need both an 'ascender' and 'descender' value maybe. | 15:31.14 |
tor8 | Robin_Watts: measure_line measures the ascenders and descenders and figures out the baseline and total line height | 15:31.50 |
Robin_Watts | oh, right, cool. | 15:32.16 |
tor8 | it returns the line height, line width and baseline values | 15:32.29 |
| so I guess the problem we have now is that the font ascender value doesn't always match the final height of shaped stuff? | 15:33.04 |
Robin_Watts | AIUI, no. | 15:33.12 |
| So currently it assumes 80% above the baseline, and 20% below ? | 15:33.58 |
tor8 | Robin_Watts: oh, yeah... that code should probably look at the node->font :) | 15:34.40 |
| or we add two floats to the struct (to your horror) | 15:35.22 |
| the node->font is not necessarily the font that will be used, and the metrics may not match the fallback font | 15:36.18 |
Robin_Watts | So layout_flow calls measure_word | 15:36.29 |
| and after a few of those calls flush_line which calls measure_line | 15:36.44 |
| While we are measuring the word we could keep the max/min ascenders/descenders. | 15:37.19 |
| (We have the correct font etc in measure_word) | 15:37.27 |
| those max/mins could be fed into measure_line? | 15:37.52 |
tor8 | node->y is the final calculated baseline to use for the node | 15:38.38 |
| right, so do the max_a, max_d and line height calculations on the fly in measure_word instead? | 15:39.43 |
| I'm thinking we could probably simplify a bit of this line layout code by using a packer structure or something | 15:40.34 |
Robin_Watts | 'on the fly' ? | 15:41.05 |
tor8 | but we don't want to paint ourselves into a corner, so we can't do TeX-style line layout | 15:41.07 |
| Robin_Watts: sorry, that was a bit unclear. I mean we keep track of the ascender, descender and max height values while we loop over measure_word and eliminate the measure_line call | 15:41.52 |
Robin_Watts | certainly we'd keep track of those values over the calls to measure_word. | 15:42.31 |
tor8 | then we can get accurate ascenders and descenders from the actual fonts used | 15:42.33 |
Robin_Watts | I haven't got far enough to actually get to the fact that measure_line could go yet. | 15:42.50 |
tor8 | yeah. I believe we would not need the measure_line function at all then. | 15:42.54 |
Robin_Watts | but possibly, yes. | 15:42.59 |
tor8 | well, all measure_line does is figure out the final width but we already have that at the call site | 15:43.49 |
| or we wouldn't know to call flush_line | 15:43.53 |
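A sketch of what tracking the metrics during `measure_word` might look like: each word reports its extent above and below the baseline (from whichever font actually rendered it), and the running line state keeps the maxima, so `flush_line` already has what `measure_line` would otherwise recompute. Keeping separate ascent and descent values, rather than a box from (0,0) to (w,h), follows Robin_Watts's suggestion. The struct and function names are hypothetical.

```c
/* Hypothetical per-line state, updated once per measured word. */
typedef struct {
	float width;    /* total advance of the words so far */
	float ascent;   /* max extent above the baseline */
	float descent;  /* max extent below the baseline (positive) */
} line_metrics;

static void line_add_word(line_metrics *line, float w, float asc, float desc)
{
	line->width += w;
	if (asc > line->ascent)
		line->ascent = asc;
	if (desc > line->descent)
		line->descent = desc;
}

/* Distance from the top of the line to its baseline. */
static float line_baseline(const line_metrics *line) { return line->ascent; }

/* Total height of the line, without any extra leading. */
static float line_total_height(const line_metrics *line) { return line->ascent + line->descent; }
```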
| Robin_Watts: still, I think if we try to measure several different possible layouts (as for TeX layout) we want to be able to loop over the nodes and figure out the same | 15:44.54 |
| BUT, here's my gripe, as soon as we hit a fallback character the line height for that line will differ from all the others | 15:45.40 |
| which is going to look absolutely terrible :( | 15:45.53 |
| if we have problems with our default fonts being too close together we can adjust this line instead: | 15:47.17 |
| style->line_height = number_from_property(match, "line-height", 1.2f, N_SCALE); | 15:47.23 |
| change the 1.2 to something bigger, like 1.3 | 15:47.29 |
Robin_Watts | tor8: I understand your objection to differing line heights, and I broadly agree. | 15:50.25 |
| What is TeXs solution to that problem? | 15:50.44 |
tor8 | not a clue | 15:51.09 |
Robin_Watts | I think we mostly want to lay out lines based upon 1.2 times the max ascender-descender | 15:51.41 |
| (of the font) | 15:52.17 |
tor8 | Robin_Watts: that's what we currently do, but we use the max of any images in the line and the ascender and descender based on the em-size | 15:52.36 |
Robin_Watts | Most of the time our lines will fit comfortably within that (cos most glyphs don't use the full extent of the glyph bbox) | 15:52.50 |
tor8 | if we just pick the actual node->font used I'd be okay with it | 15:52.52 |
| s/used // | 15:53.02 |
| and ignore any ascender/descender values in the fallback fonts | 15:53.18 |
Robin_Watts | When we use a fallback char, we might get a different font. | 15:53.28 |
tor8 | then a sudden missing character won't throw off the line spacing | 15:53.31 |
| but if someone picks a font with a specific ascender/descender/line-height we'll use that | 15:54.00 |
| and we can make sure our defaults (in the absence of any user fonts) are sane | 15:54.18 |
Robin_Watts | Or when we shape we might get glyphs that are offset outside that range. | 15:54.29 |
| I reckon we want to stick with the current 1.2 * max, and only increase that if we have glyph combinations that have an actual measured min/max larger than that. | 15:55.10 |
| So a random inserted bit of (say) arabic that slopes upwards *can* change the line spacing, but only if it genuinely would have run into the stuff above us. | 15:56.30 |
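One way to express Robin_Watts's proposal, as a sketch: keep the default spacing of 1.2 times (font ascender + descender), spread the extra space evenly above and below as CSS half-leading does, and grow the line only when the measured shaped glyphs actually extend past that space. Ascender and descender are taken as positive distances from the baseline; the function name is hypothetical.

```c
/* Default line box: 1.2 x the font's ascender+descender, with the extra
 * (half-leading) split evenly above and below the content. The line only
 * grows when the measured shaped-glyph extents exceed that box. */
static float shaped_line_height(float font_asc, float font_desc,
	float glyph_top, float glyph_bottom)
{
	float content = font_asc + font_desc;
	float half_leading = (1.2f * content - content) / 2;
	float above = font_asc + half_leading;
	float below = font_desc + half_leading;
	if (glyph_top > above)
		above = glyph_top;
	if (glyph_bottom > below)
		below = glyph_bottom;
	return above + below;
}
```

With the 80%/20% split (ascender 0.8 em, descender 0.2 em) this gives 0.9 em above and 0.3 em below the baseline, i.e. the existing 1.2 em line, until some shaped run genuinely extends further.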
tor8 | Robin_Watts: fonts also have a line height property which we currently ignore | 15:56.32 |
| well, we currently ignore everything except the CSS set em-size | 15:56.57 |
sebras | tor8: did I? did you find the epub stash? | 16:00.41 |
tor8 | Robin_Watts: https://www.w3.org/TR/CSS2/visudet.html#line-height | 16:00.53 |
Robin_Watts | sebras: I could not find it. | 16:02.12 |
sebras | Robin_Watts: any particular type that you're looking for? r2l ones I imagine? | 16:04.23 |
| Robin_Watts: I guess I have them on my desktop. I'll let you know if I find them a little later today. | 16:05.09 |
Robin_Watts | tor8: sorry about that. PC bluescreened. | 16:13.53 |
| tor8: I just tried to read that css line-height spec, and my brain blue screened. | 16:24.31 |
tor8 | Robin_Watts: that tends to happen when you try to read the css spec. | 16:25.49 |
Robin_Watts | I'm going to park this for a few hours while I ponder on it some more. | 16:26.57 |
| reboot. | 16:38.55 |
tor8 | Robin_Watts: commit on tor/master for tracking serif/bold/italic so we can use them when looking for fallback fonts | 17:09.10 |
Robin_Watts | tor8: using a char to hold a bool ? | 17:10.31 |
tor8 | Robin_Watts: yes. | 17:10.38 |
| as far as I am concerned, they could just be ints | 17:11.03 |
Robin_Watts | flag word :) | 17:11.15 |
tor8 | masking with constants is annoying... | 17:11.33 |
| and bitfields are overkill | 17:11.41 |
Robin_Watts | static inline int fz_font_is_bold(fz_font *b) { return !!(b->flags & FZ_FONT_BOLD); } | 17:12.45 |
| but sure, it looks fine. | 17:12.54 |
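Robin_Watts's "flag word" alternative, sketched with hypothetical names: the style bits share one integer, and small inline accessors hide the masking that tor8 finds annoying. The test needs bitwise `&`; a logical `&&` would report every font as bold as soon as any flag is set.

```c
/* Hypothetical flag bits; MuPDF's real names and layout may differ. */
enum {
	FONT_BOLD   = 1 << 0,
	FONT_ITALIC = 1 << 1,
	FONT_SERIF  = 1 << 2
};

typedef struct { unsigned int flags; } font_info;

/* Bitwise & isolates one bit; !! normalizes the result to 0 or 1. */
static inline int font_is_bold(const font_info *f)   { return !!(f->flags & FONT_BOLD); }
static inline int font_is_italic(const font_info *f) { return !!(f->flags & FONT_ITALIC); }
static inline int font_is_serif(const font_info *f)  { return !!(f->flags & FONT_SERIF); }
```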
tor8 | Robin_Watts: thanks. | 17:14.14 |
| now we get serif fallbacks when available :) | 17:14.27 |
Robin_Watts | Nice. | 17:15.10 |
tor8 | and should we one day decide to bloat the binary with italic and bold versions, we can get those too | 17:15.33 |
Robin_Watts | If you ask for a bold font, and we don't have one, does it use fake_bold? | 17:17.29 |
tor8 | it does not, but it would be easy to add | 17:20.46 |
| doing the same for italic would be bad though, unless we restrict it to latin/cyrillic/greek scripts where we know italic as slanted works | 17:21.27 |
| but I worry that artificial boldening may make things illegible in some scripts | 17:21.48 |
| currently we only do it for XPS where you can explicitly ask for fake bold | 17:22.15 |
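tor8's restriction on synthetic italic could be sketched as a script check before slanting: only Latin, Greek and Cyrillic codepoints get the artificial shear. The ranges are the main Unicode blocks for those scripts; the helper names and the 0.2 shear (about 11 degrees) are assumptions for illustration, and real code would also need the extended blocks.

```c
/* Hypothetical: allow fake (slanted) italic only for scripts where an
 * oblique glyph reads as italic. Main blocks only. */
static int can_fake_italic(int c)
{
	if (c >= 0 && c < 0x0250) return 1;        /* Basic Latin .. Latin Extended-B */
	if (c >= 0x0370 && c <= 0x03FF) return 1;  /* Greek and Coptic */
	if (c >= 0x0400 && c <= 0x04FF) return 1;  /* Cyrillic */
	return 0;
}

/* Horizontal shear for a synthetic italic; 0 means draw upright. */
static float fake_italic_shear(int c)
{
	return can_fake_italic(c) ? 0.2f : 0.0f;
}
```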
Robin_Watts | We had a complaint about SOT recently that bold is very important for CJKV, and we weren't supplying a bold font. | 17:34.57 |