| 2013/05/01 |
enginuitor | Is -dQFactor=x.x the correct parameter to use to control the "quality" of images compressed by DCTEncode? | 02:34.42 |
| Because I'm trying to get some control over the same, and regardless of what value I pass for QFactor I get the exact same output | 02:35.48 |
| hmm, okay, apparently the answer is that you can't control the quality when using DCTEncode | 04:19.26 |
| I had to hard-code in a call to gs_jpeg_set_quality | 04:21.21 |
Guest11688 | hey, I could really use some help on how to use gs to re-render a pdf with new resolutions for images, is anyone up for a quick look at the command I'm trying to use? | 07:24.50 |
kens | There are a whole slew of settings you need to set | 07:25.27 |
Guest11688 | so, taking the most basic approach I've been trying: | 07:26.13 |
kens | Please note that the pdfwrite device is not actually intended for this purpose; it's really intended to convert PostScript into PDF | 07:26.22 |
| Anything else is effectively a bonus | 07:26.37 |
Guest11688 | gs -dAutoRotatePages=/None -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -sOUTPUTFILE=<OUTPUT_PDF> -dDownscaleColorImages=true -dColorImageDownsampleType=/Bicubic -dColorImageResolution=<RES> -q <INPUT_PDF> | 07:26.40 |
kens | Bicubic is not supported (well, it is now, but not released) | 07:27.03 |
| You also need to set the DownsampleThreshold | 07:27.20 |
Guest11688 | ah - I see, I'll just try that out thanks | 07:28.57 |
| Any better approach for re-writing the pdf like that than using the pdfwrite device? | 07:29.18 |
kens | There are tools intended for the job of manipulating PDF files, I'm sure one or more will offer the option to reduce the effective resolution of images. | 07:30.09 |
| I'm not saying pdfwrite won't work, but you need to understand the limits under which it operates | 07:30.34 |
Guest11688 | My output pdf seems to have the same size no matter what <RES> value I set. I tried setting the DownsampleThreshold to a very small value like 0.1 to ensure that it would downsample and a low resolution like 12.0 or something, but the size is the same as not setting any of the parameters for scaling at all | 07:48.17 |
| ex: gs -dAutoRotatePages=/None -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -sOUTPUTFILE=<outputpdf> -dDownscaleColorImages=true -dColorImageResolution=12.0 -dDownsampleThreshold=0.1 -q <inputpdf> | 07:50.56 |
kens | If Guest11688 comes back, point out to him that DownscaleColorImages is actually called DownsampleColorImages. If he uses the right switch it'll probably work. | 08:50.43 |
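A minimal sketch of the corrected invocation kens describes, written as a Python helper that only assembles the argument list. It assumes the pdfwrite parameter spellings `DownsampleColorImages`, `ColorImageResolution`, `ColorImageDownsampleThreshold` (the full name of the threshold switch) and `-sOutputFile`; the helper name and the default values are illustrative and do not come from the conversation.

```python
def pdfwrite_downsample_args(input_pdf, output_pdf, resolution=72, threshold=1.0):
    """Assemble a gs argument list asking pdfwrite to downsample color images."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dNOPAUSE", "-dBATCH", "-dSAFER", "-q",
        "-dAutoRotatePages=/None",
        # The switch is Downsample..., which is the correction kens points out.
        "-dDownsampleColorImages=true",
        # Bicubic was not in a released build at the time, so average instead.
        "-dColorImageDownsampleType=/Average",
        f"-dColorImageResolution={resolution}",
        # Assumed default: 1.0 is meant to downsample anything above the target.
        f"-dColorImageDownsampleThreshold={threshold}",
        f"-sOutputFile={output_pdf}",
        input_pdf,
    ]


args = pdfwrite_downsample_args("in.pdf", "out.pdf", resolution=96)
print(" ".join(args))
```

The list can be handed to `subprocess.run(args, check=True)` unchanged. Grayscale and monochrome images have their own parallel parameters, so this sketch only affects color images.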
Robin_Watts | kens, chrisl: Stupid fonty fallback question if I may | 09:22.04 |
| In gs, fonts fallback to a range of fonts, right? | 09:22.25 |
| and cidfonts fallback to... just droidsansfallback ? | 09:22.39 |
| (as configured by default at least) | 09:23.33 |
chrisl | Robin_Watts: yes, there's some hairy/horrid heuristics for fonts | 09:23.59 |
Robin_Watts | The bug I have here, we have a request for a CIDFont called "ArialMT". We were falling back to a non-CIDFont for Helvetica, and so all the glyphs were garbled. | 09:25.12 |
chrisl | Robin_Watts: Yes, I mentioned that to Tor quite a while ago, but it wasn't a user bug, just something I stumbled across | 09:25.51 |
Robin_Watts | I've arranged a patch so we fallback from cidfont to cidfont (droidsans) | 09:26.34 |
| but I was wondering if it was feasible to fallback from cidfont -> font if we remapped some chars. | 09:27.01 |
chrisl | It would be possible, but why bother? | 09:27.24 |
Robin_Watts | so if someone requests a cidfont "Arial" we'd fallback to Helvetica where chars exist, and then to droidsans if they didn't. | 09:27.49 |
| chrisl: For better matching. | 09:27.55 |
chrisl | That's not better matching, that's worse - if you ask for a cidfont, you should get a cidfont | 09:28.17 |
| It's pretty horrible, because you'd be relying totally on knowing the glyph ordering of the cidfont *and* the contents of the font, neither of which you are sure to know | 09:29.31 |
| Personally, I don't like going to excessive effort to "fake up" substitute fonts because it hides real problems with files | 09:31.13 |
Robin_Watts | ok, so my approach seems right then. | 09:31.34 |
| We'll just let tor8 pull my implementation to bits later :) | 09:31.50 |
chrisl | I find it hard to believe that Helvetica would contain glyphs missing from DroidSansFallback anyway | 09:32.11 |
Robin_Watts | It's not that they might be missing. They might be a poor visual match. | 09:32.37 |
chrisl | Well, that's the risk you run relying on a substitute font - moral of the story: embed fonts!! | 09:33.09 |
mcs | Hi, can anyone explain to me the significance/role of Document, View and Screen Space? | 09:39.56 |
Robin_Watts | pgogna (b647b07e@gateway/web/freenode/ip.182.71.176.126) | 09:42.12 |
| mcs (b647b07e@gateway/web/freenode/ip.182.71.176.126) | 09:42.19 |
mcs | I am his colleague, we are working on the same network | 09:42.50 |
Robin_Watts | Ah, right. | 09:43.04 |
| Each page in a PDF file is described in terms of "document space" coordinates. | 09:43.25 |
mcs | thanks | 09:43.38 |
Robin_Watts | When the PDF file says "put this text here, at these coordinates", it's talking in document space coords. | 09:44.02 |
mcs | hmm | 09:44.16 |
Robin_Watts | So 0,0 is the top left of the page, and it extends to page_width/page_height at the bottom right. | 09:44.56 |
mcs | thanks | 09:45.38 |
Robin_Watts | (actually, there may be some other scalings/margins etc in there due to various things, but as far as the raw data exposed by mupdf, you can imagine that what I've just said is correct) | 09:45.48 |
mcs | ok | 09:46.03 |
Robin_Watts | In the MuPDF Android build, we then hold each page in a 'view'. | 09:46.11 |
mcs | So for iOS, you are not doing it? | 09:46.38 |
Robin_Watts | Each view also extends from (0,0) in the top left corner, to view_width/view_height | 09:46.54 |
| In order to zoom the page, we increase/decrease the view_width/view_height. | 09:47.19 |
mcs | I see | 09:47.46 |
Robin_Watts | The iOS app has a different internal structure - it's to do with the different mechanisms for scrolling etc that exist in android/iOS. | 09:48.13 |
| but it's broadly similar, I would imagine. | 09:48.23 |
| So, that's view space. Converting from document space <-> view space is easy, as long as you know page_width/page_height/view_width/view_height. | 09:49.04 |
| Views are then positioned on screen according to their left hand/top edges. | 09:49.42 |
| If a view has its left hand edge off the screen, then the left hand position will be -something. | 09:50.19 |
| If there is a gap to the left of the page, then the left hand position will be +something. | 09:50.50 |
mcs | hmm | 09:51.06 |
Robin_Watts | So, adding in these offsets is how you go from view space to screen space. | 09:51.28 |
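The three spaces Robin describes can be sketched as two small conversions: a scale between document and view space, and an offset between view and screen space. This is only an illustration of the explanation above; the `View` class and its field names are hypothetical and are not taken from the MuPDF Android source.

```python
from dataclasses import dataclass


@dataclass
class View:
    page_width: float    # document-space size of the page
    page_height: float
    view_width: float    # current size of the view (grows when zooming in)
    view_height: float
    left: float          # screen position of the view's left edge (may be negative)
    top: float           # screen position of the view's top edge

    def doc_to_view(self, x, y):
        # Document -> view is a pure scale; both spaces start at (0, 0) top left.
        return (x * self.view_width / self.page_width,
                y * self.view_height / self.page_height)

    def view_to_screen(self, x, y):
        # View -> screen just adds the position of the view on screen.
        return (x + self.left, y + self.top)

    def doc_to_screen(self, x, y):
        return self.view_to_screen(*self.doc_to_view(x, y))

    def screen_to_doc(self, x, y):
        # Undo the offset first, then undo the scale.
        vx, vy = x - self.left, y - self.top
        return (vx * self.page_width / self.view_width,
                vy * self.page_height / self.view_height)


# A 600x800 page zoomed to 2x, scrolled so its left edge is 100 px off screen.
view = View(600, 800, 1200, 1600, left=-100, top=50)
print(view.doc_to_screen(300, 400))   # (500.0, 850.0)
print(view.screen_to_doc(500, 850))   # (300.0, 400.0)
```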
mcs | Is there any relationship between the view space (view_width / view_height) and device screen size? | 09:52.51 |
Robin_Watts | When a page is first opened, we scale it so that either the width or the height of the view fills the width or the height of the screen. | 09:54.17 |
| but as we zoom in and out of pages, no there is no fixed relationship left. | 09:54.55 |
mcs | So they are independent? | 09:55.06 |
| ok | 09:55.08 |
| What is the relationship between view space and document space? I am a bit murky on view space here. | 09:56.45 |
| Do we see a document through a view? | 09:58.20 |
| What if the dimensions of the view and screen differ? (If it's a valid question) | 09:59.50 |
| Ok, here is a scenario | 10:01.39 |
| Imagine a PDF is opened when its best fit to height - in other words, you can see the entire PDF height wise | 10:02.21 |
Robin_Watts | sorry, was on phone. | 10:02.48 |
| We do indeed see a document through a view. | 10:03.01 |
mcs | then, if we try to get a bounding box of text, it appears almost correctly (we are drawing an overlay of the bbox) | 10:03.08 |
Robin_Watts | Think of a view as being a bitmap image of the entire page at screen scale. | 10:03.41 |
| So initially, in the case you describe, if the PDF is opened to fit best to height, screen_height = view_height, and screen_width >= view_width. | 10:04.28 |
mcs | Now, you pinch zoom, so you are now looking at a zoomed in view of the PDF, now we double tap to get the bbox | 10:04.42 |
| we still get it as if the view is not zoomed in - in other words, the PDF is fit to height | 10:05.14 |
| Any opinions on the scenario? | 10:05.24 |
Robin_Watts | Right. When you zoom in, view_height = screen_height * 2 (say) | 10:05.28 |
| When you double tap to get the bbox, if you're getting bbox coords in doc space, then they will come back to be the same regardless of the zoom factor. | 10:06.37 |
| If you're getting them in view space, then they'll come back differently - and you'd expect them to be twice as big. | 10:07.06 |
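To make the point about zoom concrete, here is a small sketch that scales one document-space bbox into view space at two zoom levels. The page size, view sizes and bbox are made-up numbers, and the function name is hypothetical; nothing is known here about what processing mcs's code actually does.

```python
def doc_rect_to_view(rect, page_size, view_size):
    """Scale an (x0, y0, x1, y1) rectangle from document space into view space."""
    sx = view_size[0] / page_size[0]
    sy = view_size[1] / page_size[1]
    x0, y0, x1, y1 = rect
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)


page = (600.0, 800.0)
bbox_doc = (100.0, 200.0, 250.0, 220.0)   # identical at every zoom level

fit_view = (750.0, 1000.0)                # fit to height: view_height == screen_height
zoomed_view = (1500.0, 2000.0)            # after zooming: view_height == screen_height * 2

fit = doc_rect_to_view(bbox_doc, page, fit_view)
zoomed = doc_rect_to_view(bbox_doc, page, zoomed_view)
print(fit)      # (125.0, 250.0, 312.5, 275.0)
print(zoomed)   # (250.0, 500.0, 625.0, 550.0)
```

An overlay that keeps using the fit-to-height view size after a pinch zoom would draw the first rectangle where the second one belongs, which is one way to get the symptom described in the scenario.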
mcs | I see | 10:07.08 |
| So what space do you suspect we are getting the bbox in that might be causing this behaviour? | 10:07.42 |
Robin_Watts | I have no idea. | 10:08.00 |
| because I don't know what processing you are doing on the values. | 10:08.16 |
| and frankly, I don't want to know. | 10:08.30 |
mcs | hmm | 10:08.31 |
| ok | 10:08.41 |
Robin_Watts | You've got all the screen sizes/page sizes etc. | 10:09.04 |
| You should be capable of doing some tests to check that the values you get at each stage look reasonable. | 10:09.20 |
mcs | hmm | 10:09.31 |
| yes, we are trying that | 10:09.49 |
| it's almost working | 10:09.55 |
Robin_Watts | And if they are, then it's a simple matter to convert from one space to another. | 10:10.04 |
mcs | yes, that's why I am learning the significance of spaces | 10:10.25 |
| It's working if we zoom the PDF to fit height | 10:10.47 |
| otherwise, it doesn't work, because internally it always believes that the zoom level is fit to eight or as you mentioned, screen_height = view_height | 10:11.37 |
| eight = height | 10:12.01 |
| thanks for your help | 10:14.10 |
| let me know if some idea comes to your mind regarding the scenario | 10:14.42 |
Robin_Watts | Morning tor8 | 10:21.28 |
tor8 | hi robin. | 10:25.00 |
Robin_Watts | A review on robin master for you. | 10:25.11 |
| Essentially, it alters font fallback so cidfonts can't fall back to fonts, only to cidfonts. | 10:25.32 |
tor8 | that could be dangerous... the font fallback stuff is very delicate :( | 10:26.07 |
Robin_Watts | tor8: Well, it's an advance for this file, and I can't see offhand how it can break anything else. | 10:26.38 |
| but have a look and see if you can see something I've missed. | 10:26.47 |
tor8 | lots of non-cjk fonts are implemented as cid-fonts | 10:27.11 |
| mostly with identity-h encodings | 10:27.25 |
Robin_Watts | right. and if we ever fall back from a cidfont to a font, we get garbled text. | 10:27.33 |
| because the orders on fonts and cidfonts are different. | 10:27.43 |
| s/orders/orderings/ | 10:27.51 |
tor8 | define ordering | 10:28.00 |
| do you mean encoding? | 10:29.12 |
Robin_Watts | the default position of a glyph (say '3') is different in a cidfont with an identity-h encoding, than it is in a font. | 10:29.13 |
| do I ? | 10:29.16 |
tor8 | the glyph id of a character varies randomly from font to font, has nothing to do with cid or non-cid...? | 10:29.44 |
Robin_Watts | tor8: Hmm. I don't know the terminology here, so bear with me. | 10:30.09 |
tor8 | if we don't have a way to get at an actual encoding, it's all just random guessing | 10:30.11 |
Robin_Watts | The situation I have is that I have a file that asks for a CIDfont ArialMT, which supposedly has Identity-H encoding. | 10:30.41 |
tor8 | most commonly when the creator has not embedded a truetype font and assumes the substitute font will always be a specific microsoft ttf | 10:30.45 |
| and then it (erroneously) assumes that it can encode the glyph id directly | 10:31.04 |
Robin_Watts | and the text is then "30 Rue des halles" or something. | 10:31.14 |
| If we fallback to a font, we get "1?sdsdfjas". | 10:31.40 |
tor8 | which, if we have the actual microsoft font, will work. but if we use any other font (as would be allowed by the spec) all bets are off what you'll actually see | 10:31.40 |
| the ToUnicode mapping is only used for searchability, never for glyph lookup | 10:31.57 |
Robin_Watts | If instead we fallback to DroidSansFallback, we get perfect results. | 10:31.58 |
| gs/Acrobat etc all get it right. | 10:32.11 |
tor8 | that would probably be a random side effect of droidsansfallback having the same internal glyph ordering as ArialMT on windows | 10:32.31 |
| gs and acrobat may have access to the real ArialMT file | 10:32.51 |
| acrobat certainly will, not sure about gs | 10:32.57 |
Robin_Watts | Talking to chris this morning, he says (I'm sure he'll correct me if I'm wrong) that gs never falls back from CIDFont to Font for exactly this reason. | 10:33.13 |
tor8 | do you have the bug#? | 10:33.19 |
Robin_Watts | He spotted this problem ages ago. | 10:33.23 |
| It's in my commit message. | 10:33.29 |
| http://bugs.ghostscript.com/show_bug.cgi?id=693931 | 10:33.39 |
tor8 | pdf.js gets that one right too | 10:34.47 |
| my guess would be that both pdf.js and sumatra have access to arial.ttf | 10:35.00 |
Robin_Watts | gs doesn't. | 10:35.19 |
| AIUI gs only has droidsansfallback for cidfonts. | 10:35.42 |
tor8 | Robin_Watts: is that true for all cidfonts, or just cjk fonts? | 10:36.27 |
Robin_Watts | This isn't a CJK font. | 10:36.58 |
| or at least, the characters in question are all latin. | 10:37.12 |
tor8 | that's what I mean, so maybe it isn't using droidsansfallback for it | 10:37.14 |
| the odd thing about that file is that some of the fonts look alright but some don't | 10:37.20 |
Robin_Watts | tor8: Right. There are 3 fonts, only one of which is a cidfont. | 10:37.44 |
| no, sorry. There are 3 fonts. At least 1 is not a cidfont, at least 1 is. | 10:37.57 |
tor8 | they're all the same | 10:38.14 |
| all three are type0 with identity-h | 10:38.26 |
Robin_Watts | 6 0 obj = Arial-BoldMT as a Type0 with a CIDFontType2. | 10:39.28 |
| 7 0 obj = ArialMT as a Type0 with a CIDFontType2 descendant. | 10:40.32 |
| and 9 is the same but italic. | 10:41.10 |
| Hmm. OK. I don't know how one works and the others don't. | 10:41.21 |
tor8 | they should all be having the same problems... | 10:41.23 |
| maybe we should add some optional font parsing logging to all the steps we do when picking encodings | 10:42.08 |
| okay, if you compile with -DNODROIDFONT we get the base14 as substitute fonts | 10:43.50 |
| in that case *all* of them end up garbled | 10:43.57 |
Robin_Watts | ah. | 10:45.41 |
tor8 | we normally use droidsans.ttf as a fallback for all non-cjk fonts | 10:46.00 |
| droidsansfallback is reserved for CJK | 10:46.08 |
Robin_Watts | so ensuring that we only fallback from cidfont to cidfont still sounds right to me. | 10:46.22 |
tor8 | it will make no difference. droidsansfallback is not a cidfont | 10:46.42 |
| it may share internal glyph ordering with arial.ttf but that's only coincidence | 10:47.08 |
Robin_Watts | Oh. | 10:47.18 |
tor8 | for non-embedded CJK cid fonts we use the CMaps in the cmap directory to look up the corresponding unicode code point for each character | 10:47.48 |
| then map that through droidsansfallback.ttf's unicode encoding | 10:47.58 |
Robin_Watts | Can we do that for ALL non embedded cid fonts? not just CJK ones? | 10:48.23 |
tor8 | nope. because of Identity-H which means that the characters in the PDF file are supposed to be used directly as the glyph index, bypassing the encoding tables | 10:49.00 |
| that's where the problem lies. the file specifies a specific glyph index in a specific truetype file. but it "forgot" to embed it. | 10:49.54 |
| and relies on the viewer to be on windows and have access to exactly the same font file that was used to create the pdf | 10:50.16 |
chrisl | tor8: but it doesn't bypass the cmap tables in the TTF, does it? | 10:50.27 |
tor8 | chrisl: identity-h bypasses the cmap in the ttf | 10:50.39 |
| it may have a secondary lookup via the CIDToGID array in the font descriptor | 10:50.54 |
| but an identity-h will never use the ttf cmap | 10:51.04 |
chrisl | IIRC, Ghostscript uses a "special" CMap to handle this case..... | 10:51.29 |
Robin_Watts | Ok. How about "for all non embedded, cidfonts where we can't use ToUnicode to help us, only fallback to fonts that we know have a 'standard' glyph ordering (i.e. droidsansfallback)". | 10:51.42 |
tor8 | yeah. what we could do is set up some CMap tables giving identity-glyph-index-to-unicode mappings for the common windows fonts | 10:52.03 |
chrisl | Frankly, that will work the vast majority of the time | 10:52.13 |
Robin_Watts | tor8: That sounds better, but much more work. | 10:52.36 |
tor8 | Robin_Watts: we would have to look at the "standard" glyph ordering of all of Microsoft's fonts first to see if there actually is one | 10:52.55 |
chrisl | Okay, Ghostscript uses Identity-UTF16-H to map from Identity-H to a Unicode value we can pass to the TTF font | 10:53.08 |
tor8 | and if there is, even better, we just need to make one CMap to go from microsoft-glyph-id to unicode | 10:53.18 |
sebras | heh... I remember writing about this file a few days ago. | 10:53.31 |
| tor8: wouldn't that be the same as ToUnicode..? | 10:53.47 |
tor8 | sebras: it might be, but should work even in the absence of ToUnicode | 10:54.13 |
| a proper ToUnicode would be useless here, because it may contain N-to-M mappings as well | 10:54.30 |
| but as a fallback, it may be the path of least resistance | 10:54.46 |
| if cidfont and identity-h and not embedded: use tounicode and the same approach as cjk fonts (but with the same fallback font as used today, not necessarily droidsansfallback) | 10:55.36 |
sebras | tor8: so if there is a /ToUnicode, use that, if there isn't use a builtin cmap ms->unicode instead? | 10:55.37 |
chrisl | Frankly, I don't think it's worth all the effort - if it works, great. If it doesn't work, that's the price for relying on substitute cidfonts | 10:56.06 |
tor8 | chrisl: so that's just taking the character values and pretending they're unicode? | 10:56.18 |
sebras | tor8: isn't that what mupdf does now..? | 10:56.39 |
chrisl | tor8: yes, or use a one-size-fits-all mapping, like GS does | 10:56.45 |
tor8 | personally I'd be perfectly happy to say "screw this file, it's broken" | 10:56.49 |
| sebras: we don't use the ToUnicode for encoding | 10:57.04 |
| sebras: or do you mean "taking character values and pretending they're unicode"? | 10:57.24 |
chrisl | tor8: I still think it would be good for mupdf to allow some means of accessing other font files, though | 10:57.36 |
sebras | tor8: the latter. | 10:57.41 |
tor8 | we are currently taking character values and using them as glyph ids as the spec says, but without access to the correct font, we get garbage | 10:57.55 |
| chrisl: sumatrapdf have their own substitute font lookup that finds windows system fonts | 10:58.19 |
| which is why they can draw this file correctly | 10:58.28 |
chrisl | tor8: but that's a Windows specific solution - I'd rather see a general purpose one available in the core | 10:59.08 |
tor8 | chrisl, Robin_Watts I think I may have a simple fix for this | 11:00.32 |
| by using the ToUnicode if available | 11:00.41 |
Robin_Watts | tor8: Great. | 11:00.50 |
| but I nobbled the ToUnicode, and Acrobat/gs still got it right. | 11:01.02 |
chrisl | Acrobat will be using Arial, and gs does its own mapping stuff | 11:01.57 |
| Robin_Watts: IIRC, only the vector devices ever use ToUnicode ings | 11:02.34 |
| s/ings/in gs | 11:02.42 |
| I can't remember if Freetype lets you pass in gids, skipping the cmap tables..... | 11:03.49 |
tor8 | Robin_Watts: right. so we could assemble our own ToUnicode CMaps for Microsoft's truetype fonts | 11:04.02 |
| and dump them in cmaps/ | 11:04.09 |
| chrisl: we always pass in gids to freetype in mupdf | 11:04.23 |
| and we use freetype to explicitly look things up in freetype cmaps if needed | 11:04.53 |
chrisl | tor8: okay, so couldn't you just skip the cmap table lookup? | 11:05.03 |
tor8 | chrisl: I do. that's why it's ending up as garbage. | 11:05.17 |
| because the glyph ids we get from the pdf don't "match" the same character in our substitute font as arial.ttf | 11:05.40 |
chrisl | Oh, right, sorry - brain's a bit fuzzy :-( | 11:06.07 |
tor8 | here's what happens in mupdf: | 11:06.10 |
| 1) decode the character from the content stream (using the multibyte codespacerange stuff in the CMap if the font has one) | 11:06.36 |
| 2) convert the code point from (1) into a CID using the CMap if it exists | 11:06.53 |
| these steps happen in pdf_show_string in pdf_interpret.c and use fontdesc->encoding | 11:08.20 |
| pdf_decode_cmap for step 1 | 11:08.29 |
| pdf_lookup_cmap for step 2 | 11:08.35 |
Robin_Watts | runs. I trust you'll have this all sorted by the time I return :) | 11:08.54 |
tor8 | in this file, fontdesc->encoding is a 2-byte identity encoding | 11:08.58 |
| okay, if robin watts isn't interested I'll just shut up now. | 11:09.44 |
sebras | ehm... hi..? | 11:10.32 |
tor8 | sebras: right, so you're still listening :) | 11:10.49 |
| continuing then | 11:10.55 |
sebras | I am. | 11:11.05 |
| because I looked at the file. | 11:11.12 |
tor8 | so the value from step 2 gets passed to pdf_show_char as the CID | 11:11.16 |
chrisl | tor8: right, IIRC, Ghostscript creates a CIDDecoding resource (an inverse CMap) for use when substituting to go from a CID to a Unicode code. | 11:11.38 |
tor8 | chrisl: we do the same in mupdf, but only for CJK fonts | 11:12.05 |
chrisl | tor8: gs does it for any CIDfont substitution | 11:12.21 |
tor8 | I have a hack here to use the ToUnicode in this case | 11:12.33 |
| chrisl: right. for identity-h and an embedded font that CIDDecoding should be identity, correct? | 11:12.56 |
sebras | tor8: why can't we rely on ToUnicode always? (if it's present) | 11:13.14 |
tor8 | or do you always go through the truetype cmap? in which case you'll have to use the inverse of the cmap | 11:13.29 |
chrisl | tor8: that's where we use the custom CMAP Identity-UTF16-H | 11:13.35 |
tor8 | sebras: because it isn't supposed to be used to encode characters for display. the spec says so :) | 11:13.48 |
chrisl | tor8: I *think* we always go through the TTF cmap table | 11:14.03 |
tor8 | chrisl: I suspect that you may be mistaken there, because there are lots of cases where you shouldn't be using the ttf cmap table | 11:14.55 |
| but if this is igor's old code, all bets are off. he probably invented some Rube Goldberg contraption to invert the cmap first :) | 11:15.38 |
chrisl | tor8: the problem is, when using substitute fonts like this, all bets are off, anyway | 11:16.03 |
tor8 | 3) the cid is converted to a glyph id by calling pdf_font_cid_to_gid | 11:16.21 |
| 4) the gid is passed to freetype to render | 11:16.38 |
chrisl | tor8: I know when we have a "real" CIDFont, and the ordering is Identity, we skip the cmap table lookup (or possibly, the cmap table we lookup is a "fake" identity mapping) | 11:17.35 |
tor8 | 3.a) if fontdesc has a to_ttf_cmap use that to look up a unicode value which is then passed through freetype's cmap handling | 11:18.45 |
| 3.b) if fontdesc has a cid_to_gid table, use that to get the gid instead | 11:19.11 |
| 3.c) neither of the above, assume the cid is a gid | 11:19.24 |
| 3.a is what happens for CJK cidfonts | 11:19.32 |
sebras | and 3.c in this file, right..? | 11:19.40 |
tor8 | 3.b is what happens for embedded cid fonts | 11:19.41 |
| 3.c can also happen for embedded cid fonts, if there is no CIDToGID array in the font descriptor | 11:20.10 |
| 3.c is what happens in this file | 11:20.16 |
| but shouldn't end up here because the font really should be embedded but isn't | 11:20.31 |
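The CID to glyph id step (3.a, 3.b, 3.c above) can be sketched as follows. This is a simplified model of tor8's description, not MuPDF's code: `FontDesc` and its fields are stand-ins for the structures he names, and the glyph orders in the demo are invented toy data, not real Arial or Helvetica tables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class FontDesc:
    to_ttf_cmap: Optional[Dict[int, int]] = None      # CID -> Unicode (used by 3.a)
    cid_to_gid: Optional[Dict[int, int]] = None       # CID -> glyph id (used by 3.b)
    ttf_cmap: Optional[Callable[[int], int]] = None   # Unicode -> glyph id in the font file


def cid_to_gid(font: FontDesc, cid: int) -> int:
    if font.to_ttf_cmap is not None and font.ttf_cmap is not None:
        # 3.a: CID -> Unicode via a CMap, then through the font's own cmap table.
        return font.ttf_cmap(font.to_ttf_cmap.get(cid, 0))
    if font.cid_to_gid is not None:
        # 3.b: an explicit CID -> glyph id table.
        return font.cid_to_gid.get(cid, 0)
    # 3.c: neither table is present, so assume the CID already is a glyph id.
    return cid


# Toy glyph orders, invented for illustration only.
creator_font = ["", "3", "0", "R"]   # glyph id -> what the creator's font draws
substitute = ["", "s", "d", "f"]     # a substitute with a different glyph order

cids = [1, 2, 3]                     # Identity-H: the codes in the file are glyph ids
plain = FontDesc()                   # not embedded, no tables: falls through to 3.c
gids = [cid_to_gid(plain, c) for c in cids]
print("".join(creator_font[g] for g in gids))   # 30R
print("".join(substitute[g] for g in gids))     # sdf
```

The second line of output shows why 3.c only gives sensible text when the substitute happens to share the original font's glyph order.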
sebras | if the cmap is Identity-H as in this case and ToUnicode is not a direct mapping, couldn't this be used to tell whether the cmap is reliable or not..? | 11:21.31 |
| or rather, that you can't know which one is wrong/right so you need to do some guess work. | 11:21.57 |
tor8 | okay, I'm on to one difference in the font loading | 11:30.53 |
| one of the three fonts ends up as a "builtin" font, the other two as "system" fonts | 11:31.10 |
| ArialMT -> Helvetica -- builtin | 11:31.21 |
sebras | and this is the one that fails. | 11:31.30 |
tor8 | Arial-BoldMT,Bold -> system | 11:31.31 |
| and the builtin font is the fail one | 11:31.42 |
sebras | and this all comes down to the difference between BaseFont and the cleaned up font name..? | 11:32.32 |
tor8 | builtin fonts don't get the substitute font treatment | 11:32.42 |
| which is where our actual bug may be | 11:32.48 |
| sebras: yeah. | 11:32.55 |
sebras | though both Helvetica and ArialMT are both in teh base_font_names table? | 11:33.26 |
| teh! teh?! the! :p | 11:33.37 |
tor8 | pdfref1.7 (maybe earlier) abolished the notion of builtin fonts, but there are a lot of old files out there that may break if we drop the special handling for builtin fonts | 11:33.48 |
| Robin_Watts: so you were on the right track after all, but for the wrong reasons :) | 11:36.42 |
| Robin_Watts: one of the three fonts ended up using the builtin Helvetica font, and the other two using DroidSans | 11:37.16 |
sebras | tor8: I'm confused. so where is the error? it couldn't be because we're missing Arial-BoldMT,Bold in the lookup table, right? because if it was present it'd be substituted by Helvetica-Bold which likely would cause all the text to be garbled..? | 11:39.44 |
tor8 | sebras: two bugs here | 11:40.09 |
| one -- we're treating a cid font as a builtin (which should never happen; only simple fonts should get that) | 11:40.27 |
| two -- the one we've been discussing still remains. droidsans.ttf is lucky to have the same glyph indexes as arial so that's why the other two seemed like they worked. | 11:41.04 |
| if you compile pdf_fontfile.c with -DNODROIDFONT all end up as garbage | 11:41.18 |
sebras | tor8: I see. so you'd still want the microsoft-to-unicode cmap for nr 2. | 11:41.50 |
| how would you know when to apply it? | 11:42.20 |
tor8 | if (identity-encoding && substitute-font && not-CJK) | 11:42.59 |
| or well, for microsoft-to-unicode we'd have to find a CMap based on the fontname | 11:43.25 |
| but we can always try the ToUnicode in the above case | 11:43.34 |
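The rule tor8 and sebras converge on can be written as a small decision function. This is a hedged reading of the discussion, not the patch that was pushed: the function name, its parameters and the returned labels are all hypothetical, and the per-font-name CMap is something that was only proposed.

```python
def pick_glyph_lookup(identity_encoding, substitute_font, is_cjk,
                      has_to_unicode, has_name_cmap=False):
    """Return a label for the lookup strategy to use for a CID font."""
    if not substitute_font:
        # The real font is available, so its own tables can be trusted.
        return "font-own-tables"
    if is_cjk:
        # Existing behaviour: CMap -> Unicode -> fallback font's cmap.
        return "cjk-cmap-to-unicode"
    if identity_encoding:
        if has_to_unicode:
            # "we can always try the ToUnicode in the above case"
            return "tounicode"
        if has_name_cmap:
            # Proposed: a CMap chosen by font name (glyph id -> Unicode).
            return "per-font-name-cmap"
    # Nothing better is known: treat the CID as a glyph id and hope.
    return "cid-as-gid"


print(pick_glyph_lookup(True, True, False, True))    # tounicode
print(pick_glyph_lookup(True, True, False, False))   # cid-as-gid
```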
sebras | ok, now it makes sense to me. now I'll let you get back to whipping up a patch. :) | 11:44.43 |
tor8 | Robin_Watts: sebras: two patches on tor/master | 11:52.52 |
sebras | tor8: looks good to me. | 12:08.39 |
tor8 | runs through sane suite with progressions | 12:08.51 |
sebras | nice. | 12:10.32 |
tor8 | running on the cluster now to get the bmpcmps | 12:12.33 |
Robin_Watts | tor8: I told you you'd have it sorted by the time I returned :) | 12:25.44 |
tor8 | Robin_Watts: indeed :) | 12:25.53 |
| progressions on both sane and bmpcmp and sebras said LGTM so I've pushed | 12:26.06 |
Robin_Watts | I'm afraid fonts are an area where I have tried to know as little as possible. | 12:26.13 |
| fab. | 12:26.14 |
tor8 | Robin_Watts: it's a hairy area, so I can't blame you for staying away from it | 12:26.29 |
sebras | it has always annoyed me that I don't Get It<tm>... :( | 12:30.30 |
kens | tor8 Robin_Watts : http://stackoverflow.com/questions/16307966/current-page-using-mupdf | 12:34.28 |
| There's an answer there but it's not terribly helpful | 12:34.44 |
Robin_Watts | will scribble something. | 12:40.42 |
sebras | Robin_Watts: hm... this probably only shows that I ought to populate the docs-directory a bit more... | 12:42.32 |
Robin_Watts | sebras: I suspect they want a nice friendly java class they can call - and it doesn't exist yet. | 12:43.00 |
sebras | wonders how people that can manage 100+ git branches in a single repo are wired... | 13:24.29 |
Robin_Watts | suspects they simply don't know about git branch -D :) | 13:50.19 |
sebras | Robin_Watts: no, that's not the case. they made it their branching policy to create branches for _every_ bugfix and every topic, and every project (consisting of bugfixes and topics) which are then merged into master. and all branches stay in the repo since they want to be able to combine these differently for other products. | 13:54.16 |
| please, never do that in mupdf... | 13:54.24 |
Robin_Watts | 100s of active branches is just nuts. | 13:54.47 |
sebras | Robin_Watts: it is. sorry about griping here... :) | 13:56.34 |
henrys | more snow - we've gone from total drought to average snowpack in a month. | 14:31.06 |
Robin_Watts | but no more forest fires? :) | 14:40.06 |
henrys | yes that is the good news | 14:40.57 |
paulgardiner | makefiles: 2 paul: 0 | 14:43.01 |
kens | If this is the GS makefile, that's normal | 14:43.26 |
paulgardiner | I even tried threatening them with deletion. | 14:45.03 |
henrys | we've tried that they won't go away. | 14:45.30 |
chrisl | paulgardiner: careful, they'll fight back.... | 14:45.35 |
paulgardiner | :-) | 14:46.11 |
chrisl | paulgardiner: actually, one big thing to remember with the Windows makefiles is that we rely on calling nmake recursively, and you have to explicitly new parameters to the subcalls | 14:47.20 |
| s/explicitly/explicitly add | 14:47.43 |
paulgardiner | That's exactly what I'm not understanding. I cannot see explicitly added parameters, other than defines. | 14:48.29 |
Robin_Watts | paulgardiner: Can you give us an example of a parameter that you're adding? | 14:49.11 |
paulgardiner | I'm just trying to understand how msvc32.mak debug works. msvc32.mak includes msvc32.mak includes msvc.mak which defines MAKEFILE=msvc32.mak and then the debug target calls nmake -f $(MAKEFILE) with a load of defines but no new target that I can see. | 14:51.33 |
| Oops! Too many 32s | 14:51.54 |
Robin_Watts | paulgardiner: Right. It doesn't set a target, just defines. | 14:52.10 |
| so it gets the default target. | 14:52.17 |
paulgardiner | yeah, but that looks to be debug: | 14:52.34 |
Robin_Watts | The net effect is that "make debug" does exactly the same as "make" except that it has lots of other defines in it. | 14:52.49 |
| It's not. There are includes in there I think. | 14:53.01 |
paulgardiner | Oh of course | 14:53.11 |
| So the first include with a target? | 14:53.31 |
Robin_Watts | dosdefault:\n\tdefault | 14:54.14 |
| in msvccmd.mak | 14:54.19 |
| "dosdefault: default" even. | 14:54.49 |
paulgardiner | Ah ok. | 14:55.32 |
Robin_Watts | gs/base/gs.mak has "all default : $(GS_XE) ... | 14:56.04 |
| " | 14:56.08 |
mvrhel_laptop | good morning | 15:38.18 |
Robin_Watts | morning | 15:53.29 |
| henrys: When you did svgout from gs, how smart were you about not sending data multiple times? | 16:14.42 |
| like repeatedly sending colors/stroke widths etc ? | 16:14.52 |
| henrys: Did you check your svgwrite code in? | 16:16.20 |
henrys | ralph did svg out - but I'm familiar with it | 16:16.31 |
Robin_Watts | I thought you had been working on it recently? | 16:16.55 |
henrys | devices/vectors/gdevsvg.c | 16:16.57 |
| you're thinking of xps | 16:17.03 |
Robin_Watts | Ah. | 16:17.09 |
henrys | I might have fixed some stuff. | 16:17.15 |
| but yes the state is written infrequently - when something changes. | 16:17.56 |
Robin_Watts | ok. | 16:18.10 |
| If I'm reading the spec correctly, I can only set colors, line states etc at group level. | 16:19.18 |
henrys | search for the poorly named svg->dirty_flag | 16:19.19 |
Robin_Watts | Right, so colors/stroke state are sent as groups, and any change in any of it causes it to be rewritten. | 16:21.21 |
| and transforms are never sent. | 16:21.33 |
henrys | just at begin page time. the coordinates have already been put in "device" space | 16:22.32 |
Robin_Watts | Right. In mupdf the transform changes throughout the page. | 16:22.47 |
| so either I need to roll that into ever object I output (flatten the objects w.r.t. transforms), or I need to be smarter. | 16:23.24 |
| s/ever/every/ | 16:23.30 |
henrys | simple enough to do a transform with all the other state stuff right? | 16:23.35 |
Robin_Watts | henrys: Right, but then I need to reoutput all the state stuff whenever a color or a stroke state or a transform changes. | 16:24.18 |
henrys | stuff being color, line ends etc. | 16:24.22 |
Robin_Watts | which means I end up rewriting the state pretty much every time. | 16:24.34 |
henrys | oh is it very common in mupdf? | 16:24.39 |
Robin_Watts | I fear it's common, yes. | 16:25.02 |
henrys | scale the coordinates instead? | 16:25.25 |
Robin_Watts | Think of how often something changes between pdf write operations. | 16:25.29 |
| yes, that's what I meant by flattening the objects w.r.t transforms. | 16:25.51 |
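
A minimal sketch of what "flattening the objects w.r.t. transforms" could look like: each path point is pushed through the current matrix before output, so the SVG needs no `transform` attribute. The helper names (`apply_matrix`, `flatten_path`, `path_data`) are hypothetical, not mupdf or gs functions. The matrix is assumed to use the six-value (a, b, c, d, e, f) layout shared by PDF and SVG.

```python
def apply_matrix(m, pt):
    """Map a point through a 6-value affine matrix (a, b, c, d, e, f)."""
    a, b, c, d, e, f = m
    x, y = pt
    return (a * x + c * y + e, b * x + d * y + f)


def flatten_path(m, points):
    """Return the path points already transformed into device space."""
    return [apply_matrix(m, p) for p in points]


def path_data(points):
    """Build an SVG path 'd' string from straight-line segments only."""
    head, *rest = points
    parts = ["M %g %g" % head] + ["L %g %g" % p for p in rest]
    return " ".join(parts)


# Illustrative values: scale by 2, then translate by (10, 20).
ctm = (2, 0, 0, 2, 10, 20)
flat = flatten_path(ctm, [(0, 0), (5, 0), (5, 5)])
d = path_data(flat)
```

The cost is that the original user-space coordinates are lost in the output, which is the trade-off discussed in the conversation.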
henrys | okay fine I'll read what you've right geez | 16:26.13 |
| ;-) | 16:26.20 |
Robin_Watts | :) | 16:26.22 |
henrys | or write evne | 16:26.27 |
Robin_Watts | ok, for consistency, I'm going to go with the way gs does it. | 16:27.38 |
| and for the lack of any better ideas for now :) | 16:27.47 |
tor8 | Robin_Watts: either make a state tracker, or dump explicit for each element. I'd do the latter, and let gzip worry about file sizes. | 16:31.54 |
Robin_Watts | tor8: Yeah. In pdfwrite I track the state through gstates (pushes and pops etc) | 16:32.39 |
| In svg I'm just going to have a dirty flag. | 16:32.54 |
henrys | svg support gzip ? | 16:32.58 |
tor8 | henrys: all web servers that send svg support gzip | 16:33.27 |
| Robin_Watts: how would a dirty flag work? | 16:33.45 |
| Robin_Watts: a new <g> every time state changes? | 16:33.58 |
Robin_Watts | yes. | 16:34.03 |
| but I won't include the transform in the state. | 16:34.16 |
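
A rough sketch of the dirty-flag idea described here, assuming a writer that opens a new `<g>` only when some tracked attribute actually changed since the last object. The `SvgWriter` class and its methods are hypothetical and are not the gs `gdevsvg.c` or mupdf code. Transforms are deliberately left out of the tracked state, as in the discussion.

```python
class SvgWriter:
    """Toy writer: graphics state is emitted as <g> groups, only when dirty."""

    def __init__(self):
        self.out = []
        self.state = {}
        self.dirty = False
        self.group_open = False

    def set_state(self, attrs):
        # Mark the state dirty only if a value really changed.
        for key, value in attrs.items():
            if self.state.get(key) != value:
                self.state[key] = value
                self.dirty = True

    def path(self, d):
        if self.dirty or not self.group_open:
            if self.group_open:
                self.out.append("</g>")
            attrs = " ".join('%s="%s"' % kv for kv in sorted(self.state.items()))
            self.out.append("<g %s>" % attrs if attrs else "<g>")
            self.group_open = True
            self.dirty = False
        self.out.append('<path d="%s"/>' % d)

    def close(self):
        if self.group_open:
            self.out.append("</g>")
            self.group_open = False
        return "\n".join(self.out)


w = SvgWriter()
w.set_state({"fill": "red", "stroke-width": "1"})
w.path("M 0 0 L 1 1")
w.path("M 1 1 L 2 2")
w.set_state({"fill": "red"})   # same value, so no new group
w.path("M 2 2 L 3 3")
w.set_state({"fill": "blue"})  # real change, so the group is rewritten
w.path("M 3 3 L 4 4")
svg = w.close()
```

Four paths end up in two groups here. If the transform were part of the tracked state and changed with most objects, nearly every path would get its own group.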
tor8 | that will be terrible for editing the svg in inkscape later | 16:34.18 |
henrys | right but it isn't part of the format so it really is not very good to depend on it. | 16:34.43 |
Robin_Watts | tor8: Presumably inkscape has the ability to 'flatten' groups ? | 16:35.39 |
tor8 | you could run the graphics through a pre-pass to collect the most common values to use as defaults, and then just override individual attributes where they differ | 16:35.51 |
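
One way the pre-pass suggestion could be sketched: count attribute values over all elements, pick the most common value per attribute as the default, then keep only the attributes that differ on each element. The function names are hypothetical, and the sketch assumes every element carries every attribute, since an element lacking one would otherwise silently inherit the default.

```python
from collections import Counter


def common_defaults(elements):
    """First pass: choose the most frequent value for each attribute."""
    counts = {}
    for attrs in elements:
        for key, value in attrs.items():
            counts.setdefault(key, Counter())[value] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def overrides(attrs, defaults):
    """Second pass: keep only attributes that differ from the defaults."""
    return {k: v for k, v in attrs.items() if defaults.get(k) != v}


# Illustrative data, not taken from any real file.
elements = [
    {"fill": "black", "stroke-width": "1"},
    {"fill": "black", "stroke-width": "2"},
    {"fill": "red", "stroke-width": "1"},
]
defaults = common_defaults(elements)
per_element = [overrides(e, defaults) for e in elements]
```

The defaults would go on an outer `<g>`, and each element would carry only its own overrides.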
henrys | does inkscape import pdf? | 16:36.02 |
ray_laptop | mvrhel_laptop: thanks for looking over my plan to finish the AutoColorDetection stuff for cust 801. As I replied in the email, I plan on adding a 'pnmcmyk' device that will use this to write either a pgm (neutral true) or pamcmyk32 format. | 16:36.06 |
tor8 | svg is xml, I wouldn't worry about bloat | 16:36.07 |
| since xml is going to be bloated no matter what... | 16:36.17 |
henrys | tor8:there is that. | 16:36.24 |
tor8 | and since xml is bloated, anybody who cares about size is going to gzip | 16:36.41 |
Robin_Watts | Well, just outputting explicit values each time is easy for me. | 16:36.43 |
henrys | ray_laptop:so we only have 1 page ahead with your background printing? | 16:36.55 |
Robin_Watts | and I can stay with using groups for transforms. | 16:36.56 |
mvrhel_laptop | ray_laptop: no problem thank you for talking it over | 16:37.13 |
tor8 | it seems to me that the main reason for wanting svg out is to edit the file | 16:37.24 |
Robin_Watts | completely forgot to look at ray_laptop's commit :( | 16:37.32 |
tor8 | and for that, explicit values and using <g> in "natural" locations would help | 16:37.36 |
kens2 | Right, I'm off. Goodnight all | 16:38.55 |
Robin_Watts | Well, I now have mupdf outputting tiger as an SVG properly. That counts as complete, right? | 17:27.33 |
henrys | Robin_Watts:right you're done. | 17:42.18 |
marcosw | Robin_Watts: Based on your tiger.pdf -> SVG success I've contacted the customer and told him the software is ready to ship... | 17:42.19 |
Robin_Watts | woo hoo! | 17:42.48 |
henrys | a fuzzy diff with a raster device is a quick way to find problems | 17:43.26 |
| probably stating the obvious | 17:43.44 |
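
A simple reading of "fuzzy diff", sketched under the assumption that both renderings are available as same-sized grayscale rasters (lists of rows of 0-255 values): report only the pixels that differ by more than a tolerance. The `fuzzy_diff` function is hypothetical, and a real comparison tool may also tolerate small positional shifts, which this sketch does not.

```python
def fuzzy_diff(a, b, tolerance=2):
    """Return (x, y) positions where two rasters differ by more than tolerance."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("rasters must have the same dimensions")
    return [
        (x, y)
        for y, (ra, rb) in enumerate(zip(a, b))
        for x, (pa, pb) in enumerate(zip(ra, rb))
        if abs(pa - pb) > tolerance
    ]


# Illustrative 3x2 rasters: one pixel is off by 1, one is off by 55.
reference = [[0, 0, 255], [0, 255, 255]]
candidate = [[1, 0, 255], [0, 200, 255]]
diff = fuzzy_diff(reference, candidate)
```

Only the large difference is reported. The off-by-one pixel is treated as rendering noise.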
Robin_Watts | henrys: Anything that involves text or bitmaps would find problems quickly :) | 17:51.29 |
| or clipping. | 17:51.46 |
| or blending. | 17:51.54 |
| ok, simple bitmaps work too. | 19:01.46 |
| SVG can't express bitmap masks, or bitmap clips though :( | 19:02.03 |
| marti has now taken all but one of my lcms2 patches. | 19:45.54 |
| but the one he hasn't taken yet is the biggest of the set. | 19:46.10 |
| OK, I've coded clip paths... can anyone think of a good test file for them offhand? | 19:47.59 |
henrys | Robin_Watts:if you just want to smoke test use itext examples there is a pdf on this page: http://itextpdf.com/examples/iia.php?id=191 | 20:41.07 |
Robin_Watts | henrys: Thanks. I found a couple of examples earlier, and (after some tweaking) they both work now. | 23:24.16 |
| Next stop... text. | 23:24.34 |
| Just watched the end of S5 of Sons of Anarchy. | 23:24.52 |
| It is unquestionably the best plotted thing on TV. | 23:25.02 |