| 2013/07/26 |
sebras | tor7: alright, done. what is over at sebras/master is the best I can do. I did go over the manpage _again_ to make it consistently say "Pan" instead of "Pans", e.g. so you'd better re-read it again. I also reworded a few commit messages (It just looked daft to have four in sequence of the form "Update ..."). | 00:17.31 |
| tor7: unless you ask me I don't intend to do anything further with those commits. | 00:18.04 |
| s/me I/me to, I/ | 00:18.26 |
sebras | sleeps. | 00:18.30 |
scarpino | which advantages does the commercial license give me? | 07:31.38 |
| or if I buy the commercial license, could I quickly get support to fix the bugs I found? | 07:32.32 |
| "or if you wish to pay for technical support, you will need to acquire a commercial license from Artifex." | 07:33.09 |
chrisl | scarpino: you really need to discuss things like that with our VP of sales, he can give you all the information like that | 07:34.41 |
scarpino | chrisl: ok, any idea about the cost? | 07:34.55 |
chrisl | scarpino: that will generally depend on your specific case. Scott (the sales VP) will usually ask you a bunch of questions and from that, he'll put together a proposal | 07:36.29 |
scarpino | chrisl: thank you. I'll mail sales@ then | 07:36.44 |
chrisl | scarpino: cool - sorry I can't be more help, but I'm just a simple engineer ;-) | 07:37.10 |
scarpino | chrisl: no problem, I understand these are "private" stuff ;-) | 07:37.31 |
tor7 | sebras: thanks. I've pushed. | 07:43.50 |
sebras | tor7: yey, no changes! that means my English skills were somewhat up to par last night! :) | 07:59.15 |
| tor7: didn't you build mupdf with clang?! | 08:03.53 |
| tor7: I get simd compilation errors in openjpeg. | 08:04.11 |
tor7 | sebras: I build with clang-3.4 | 08:05.23 |
sebras | tor7: right, I have clang 3.0-6.2 | 08:06.08 |
paulgardiner | tor7: The values I get back from FT_Get_Advance, I seem to have to divide by face->size->metrics.x_ppem. I'm guessing that I shouldn't be doing that explicitly. Any idea what correct usage is? | 08:19.25 |
chrisl | paulgardiner: I can't remember exactly, but that doesn't sound unreasonable - the advance will probably be in scaled font coordinates, and you probably want it in PDF user units. | 08:26.24 |
sebras | tor7: should we be calling opj_stream_destroy_v3() instead of opj_stream_destroy() which appears to be deprecated in the version of openjpeg we're using..? | 08:27.04 |
tor7 | paulgardiner: when I get the metrics from freetype, I always start by setting the face to size 1 | 08:28.28 |
| FT_Set_Char_Size(face, 64, 64, 72, 72) | 08:28.43 |
sebras | tor7: yes, we should. | 08:28.54 |
tor7 | then the metrics are in fixed point format, just divide by 65536 to get the metrics in unscaled space | 08:29.26 |
paulgardiner | chrisl: yeah, it seems reasonable to me. It's just that I wonder if there is a function that I should be using to interrogate the value, or whether there is a setup call to request advance back in other units. I get the impression that x_ppem isn't supposed to be public | 08:29.30 |
tor7 | paulgardiner: xps_measure_font_glyph() | 08:29.48 |
chrisl | Well, I would have a fitz api call for retrieving metrics, but that's just me...... | 08:30.17 |
tor7 | paulgardiner: also in fz_text_extract() | 08:30.22 |
paulgardiner | tor7: ah right. Just the sort of thing I was looking for. So why does that set the face to size 1? | 08:30.49 |
tor7 | the size argument is 26.6 fixed point format | 08:31.26 |
| and 72 is the nominal DPI | 08:31.39 |
| paulgardiner: though I think it may be worth doing as chrisl suggests, make a proper fitz api for getting font metrics | 08:32.09 |
| there hasn't been a lot of need for it before, hence the three places where we do it by interrogating freetype directly | 08:32.34 |
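Editorial sketch of the fixed-point arithmetic tor7 describes above, written in Python rather than against FreeType's C API. The helper names are hypothetical and nothing here calls FreeType. It only shows why a char size of 64 in 26.6 format at 72 dpi means "size 1", and why dividing a 16.16 value by 65536 gives a plain number.

```python
# Illustration only: helper names are hypothetical, no FreeType calls are made.

def from_26_6(value: int) -> float:
    """Convert a 26.6 fixed-point integer (units of 1/64) to a float."""
    return value / 64.0


def from_16_16(value: int) -> float:
    """Convert a 16.16 fixed-point integer (units of 1/65536) to a float."""
    return value / 65536.0


def pixels_per_em(char_size_26_6: int, dpi: int) -> float:
    """A point is 1/72 inch, so pixels per em = size in points * dpi / 72."""
    return from_26_6(char_size_26_6) * dpi / 72.0


if __name__ == "__main__":
    # The arguments of FT_Set_Char_Size(face, 64, 64, 72, 72):
    # 64/64 = 1 point at 72 dpi, i.e. one pixel per em.
    print(pixels_per_em(64, 72))  # 1.0
    # A 16.16 value of 32768 is half an em at that size.
    print(from_16_16(32768))      # 0.5
```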
paulgardiner | Right okay good | 08:32.50 |
tor7 | sebras: _v3 is broken (according to robin_watts) | 08:32.59 |
sebras | Robin_Watts: is it? | 08:33.09 |
tor7 | it casts the void* to a FILE* and calls fclose on it... | 08:33.10 |
sebras | tor7: yes, but the void* is NULL in our case. | 08:33.28 |
| tor7: in other cases they might set the void * to point to a FILE*. | 08:33.42 |
tor7 | paulgardiner: I'd base it on xps_measure_font_glyph, but take care of supporting type3 fonts as well | 08:33.44 |
chrisl | tor7: isn't there a performance penalty in (re-)setting the font size like that? | 08:34.03 |
tor7 | sebras: or something like it... you'll have to ask robin | 08:34.06 |
sebras | tor7: will do. | 08:34.15 |
| tor7: maybe we set that void* somewhere... | 08:35.24 |
paulgardiner | tor7: I might leave adding the api to another day. This task is dragging me into enough pain as it is | 08:35.30 |
| I don't really want to be thinking about the type 3 case just now if I can help it | 08:35.59 |
tor7 | chrisl: good question, actually. | 08:36.59 |
paulgardiner | Might it make sense to call FT_Set_Char_Size when loading the font? | 08:37.24 |
tor7 | paulgardiner: the type3 case should be trivial (just font->t3widths) | 08:37.41 |
| we use 3 different char sizes for nefarious purposes | 08:38.01 |
| 64 and 1000 for metrics, and also 65536 to work around rounding issues in the outline extraction and scaling logic inside freetype | 08:38.44 |
| but I think we could probably make it work with 65536 everywhere | 08:38.56 |
| freetype stupidly scales (and rounds to integers) the outline twice | 08:39.24 |
| once by the char size, and once by the matrix | 08:39.41 |
| with limited precision on one of them | 08:39.47 |
| so char size 64 is really too low precision... | 08:40.07 |
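Editorial sketch of the precision problem tor7 describes: scaling a coordinate in two steps and rounding to an integer after each step. This is a simplified model, not FreeType's actual code. The function name and the sample numbers (a 556-unit coordinate in a 1000-unit em, rendered at 12 points) are assumptions chosen for illustration.

```python
# Simplified model of "scale and round twice"; not taken from FreeType.

def scale_twice(coord_units: int, units_per_em: int,
                char_size_26_6: int, matrix_scale: float) -> int:
    """Scale a font-unit coordinate by the char size, round, then by the matrix, round."""
    # First rounding: font units -> integer 26.6 units at the requested char size.
    at_char_size = round(coord_units * char_size_26_6 / units_per_em)
    # Second rounding: apply the matrix scale and round again.
    return round(at_char_size * matrix_scale)


if __name__ == "__main__":
    # Exact answer for 12 points: 556 / 1000 * 12 * 64 = 427.008 (in 1/64 units).
    # Char size 64 (1 point), matrix scales by 12: the first rounding error is multiplied by 12.
    print(scale_twice(556, 1000, 64, 12.0))               # 432
    # Char size 65536 (1024 points), matrix scales down by 12/1024: the error shrinks instead.
    print(scale_twice(556, 1000, 65536, 12 * 64 / 65536))  # 427
```

In this model the small char size loses about five 1/64ths of a pixel on a single coordinate, which matches the remark that char size 64 is too low precision.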
| paulgardiner: I'll stick on the ROADMAP to clean up and isolate the uses of freetype into some more common font api | 08:40.28 |
paulgardiner | Right. I was just going to suggest a bug, but the ROADMAP would be better | 08:40.58 |
tor7 | the bug tracker is too burdened with bugs... I think ROADMAP is clearer for short term projects | 08:42.07 |
paulgardiner | That's weird. Using 64, 64, 72, 72 gives me slightly further spaced out text than when I simply divided by x_ppem | 08:42.24 |
tor7 | should probably put it on a wiki instead of hosting it on a branch in the git though... if only we had one :) | 08:42.26 |
sebras | Robin_Watts: tor7: a patch over at sebras/master to look at. I reset the userdata pointer just before calling the broken _v3 function. this is a bit ugly but ought to work and give no deprecation warnings. | 08:43.31 |
tor7 | paulgardiner: there's also face->units_per_EM but that's only for some other unscaled values in the face struct | 08:43.40 |
paulgardiner | Very strange. It now falls slightly outside the box I previously calculated using pdf_measure_text | 08:45.14 |
tor7 | sebras: I don't understand why there is an opj_stream_destroy_v3 and why it's so stupidly broken by design... | 08:47.24 |
| [trunk] add functions to avoid to use FILE* into the API (thanks winfried). | 08:48.54 |
| Update issue 120 and update issue 198 | 08:48.54 |
Robin_Watts | sebras: IIRC, we use the userdata pointer to store a context in or something. | 08:49.55 |
| hence it can't safely be cast to FILE *. | 08:50.08 |
sebras | Robin_Watts: tor7: I agree, so this is why I reset the pointer to NULL before calling the _v3 interface. | 08:51.02 |
| and yes that sucks. :-P | 08:51.07 |
tor7 | we should mention that their bug fixes for issues 120 and 198 are broken | 08:51.07 |
| https://code.google.com/p/openjpeg/issues/detail?id=120 | 08:51.19 |
| https://code.google.com/p/openjpeg/issues/detail?id=198 | 08:51.31 |
sebras | tor7: if they hadn't made the assumption that the userdata is a FILE* then it would have been alright. | 08:52.09 |
tor7 | they fixed the issue by not thinking it through and abusing the existing semantics... | 08:52.26 |
dbrgn | hi. does the mupdf library for android support extraction of text/tables from pdf? | 08:52.35 |
sebras | tor7: agreed. | 08:52.35 |
tor7 | they *should* have added a free_user_data callback function, then it would have worked | 08:52.46 |
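Editorial sketch of the design tor7 is suggesting: the stream owns an opaque user-data value plus an optional callback that knows how to release it, so destroying the stream never has to guess what the value is. The class and attribute names are hypothetical and are not OpenJPEG's API.

```python
# Hypothetical model of a stream with a free_user_data callback; not OpenJPEG code.
from typing import Any, Callable, Optional


class Stream:
    def __init__(self, user_data: Any = None,
                 free_user_data: Optional[Callable[[Any], None]] = None) -> None:
        self.user_data = user_data
        self.free_user_data = free_user_data

    def destroy(self) -> None:
        """Release user data only through the callback the caller supplied."""
        if self.user_data is not None and self.free_user_data is not None:
            self.free_user_data(self.user_data)
        self.user_data = None


if __name__ == "__main__":
    closed = []
    # A caller that stores a file-like object supplies a callback that closes it.
    Stream(user_data="file handle", free_user_data=closed.append).destroy()
    # A caller that stores a context object supplies no callback; nothing is freed.
    Stream(user_data={"ctx": 1}).destroy()
    print(closed)  # ['file handle']
```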
Robin_Watts | sebras: I'd rather call the deprecated function, because the deprecated function does the right thing without requiring us to jump through hoops. | 08:52.51 |
| I hope they will undeprecate it in future. | 08:53.02 |
| or reprecate it :) | 08:53.09 |
sebras | Robin_Watts: they probably will create a opj_stream_destroy_v4()... | 08:53.28 |
sebras | wonders why they went for v3 instead of v2 though... | 08:53.40 |
| tor7: https://code.google.com/p/openjpeg/issues/detail?id=227 | 09:01.34 |
| tor7: apparently winfried has already complained, but for a different reason. | 09:01.51 |
paulgardiner | tor7: I tried replacing the 64s by 256*64 and then dividing the returned advance by 256, but still I get the same slightly more spaced out text than with the direct div by x_ppem | 09:02.30 |
tor7 | paulgardiner: maybe there are more factors involved than just the x_ppem? | 09:03.15 |
| it could also be hinting related | 09:03.20 |
paulgardiner | I guess, although the results using div by x_ppem seemed to match my bounding box. Of course that could mean there are matching inaccuracies in pdf_measure_text | 09:04.58 |
tor7 | paulgardiner: x_ppem * x_scale | 09:05.22 |
| is what the results from FT_Get_Advance should be scaled by, if my reading of the docs is correct | 09:05.53 |
| paulgardiner: or it could just be that the x_ppem is rounded to pixels, and lost precision | 09:06.40 |
| paulgardiner: http://freetype.org/freetype2/docs/reference/ft2-quick_advance.html#FT_Get_Advance | 09:07.48 |
| you could get the metrics unscaled I guess, and divide by the face->units_per_EM | 09:08.36 |
| in fact, I think that's probably a better idea all around | 09:09.08 |
| than what I've been doing | 09:09.15 |
| http://freetype.org/freetype2/docs/reference/ft2-base_interface.html#FT_LOAD_XXX | 09:10.09 |
| NO_SCALE | NO_HINTING | 09:10.36 |
| actually, just NO_SCALE should do | 09:10.53 |
paulgardiner | Hmmm. units_per_EM is 1000 whereas the divisor I was using and seemed to be working (x_ppem) is 16 | 09:11.04 |
| Ah right. | 09:11.16 |
| I'll give that a go. | 09:11.36 |
tor7 | x_ppem is integer, where it ought to be float or fixed point | 09:11.54 |
paulgardiner | I was just thinking that given that 16 was working, 1000 wouldn't, but I missed that you were suggesting sending different flags | 09:12.48 |
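Editorial sketch of the approach settled on above: take the advance unscaled (in font design units, as requested with the NO_SCALE load flag) and divide by the face's units_per_EM. The function names and sample numbers are assumptions for illustration; no FreeType call is made.

```python
# Illustration only: plain arithmetic on values that would come from FreeType.

def advance_in_em(advance_font_units: int, units_per_em: int) -> float:
    """Convert an unscaled advance in font design units to a fraction of an em."""
    return advance_font_units / units_per_em


def advance_at_size(advance_font_units: int, units_per_em: int,
                    font_size: float) -> float:
    """Advance in user units for text set at font_size."""
    return advance_in_em(advance_font_units, units_per_em) * font_size


if __name__ == "__main__":
    # A 556-unit advance in a 1000-unit em (typical for Type 1 fonts).
    print(advance_in_em(556, 1000))
    # The same idea with a 2048-unit em (common for TrueType fonts).
    print(advance_in_em(1024, 2048))
    print(advance_at_size(556, 1000, 12.0))
```

Because units_per_EM is the font's own design grid, this avoids depending on an integer pixels-per-em value.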
dbrgn | no mupdf devs here? does it support content extraction from pdf files? | 09:21.28 |
tor7 | dbrgn: yes, we are here. yes, we support extracting all kinds of content from pdf files. | 09:21.48 |
dbrgn | tor7: great! thanks. | 09:21.55 |
tor7 | you'll just need to be more specific when asking questions :) | 09:21.55 |
dbrgn | tor7: sorry :) ok, so more specific: is it possible to extract images and text tables from pdf files on android? | 09:22.50 |
tor7 | dbrgn: mupdf is primarily a C library. the JNI bindings in the android app are not general purpose. we have a project planned to make more general bindings for JNI, but that's several months away at least. | 09:23.12 |
paulgardiner | :-) It still overflows the box. But I agree that what you've just suggested looks to be the best approach. I guess the overflow is caused elsewhere. | 09:23.34 |
tor7 | dbrgn: you could however base your work off the existing JNI bindings and adapt them to suit your needs. | 09:23.41 |
dbrgn | tor7: ok. but i guess in this case i could create my own specific JNI bindings? | 09:23.44 |
| tor7: ok, thanks. i'll take a look at it. | 09:23.50 |
tor7 | dbrgn: the "mudraw" tool has code to extract text in various amounts of detail | 09:24.21 |
| extracting tables as tables is tricky though, since all semantic information in a PDF is lost | 09:24.41 |
| PDF just has "draw this character at this position" information | 09:24.54 |
| the "text extraction" device in mupdf tries to reassemble the text back into paragraphs and columns and tables, but it doesn't always work | 09:25.55 |
| there are plenty of cases that are problematic | 09:26.08 |
dbrgn | yeah, i know about the difficulties. i'll try and see if i succeed :) | 09:26.13 |
tor7 | I'd suggest looking at the various flavors of mudraw -t output to see if they'll do what you want first | 09:26.34 |
| curiosity makes me ask, what do you intend to do with this? | 09:26.58 |
dbrgn | tor7: it would be great if i could extract the table on the second page here: http://www.skyguide.ch/fileadmin/dabs-today/DABS_20130726.pdf | 09:27.38 |
tor7 | the android app has a reflow mode (one of the buttons in the toolbar) that runs a pdf -> html extraction and displays the resulting html in a webview | 09:27.43 |
dbrgn | tor7: as well as the first page as image. | 09:28.02 |
chrisl | paulgardiner: I'd expect the advance width to be larger than the width of the bbox - if I follow correctly, that's what you're seeing? | 09:28.18 |
paulgardiner | tor7: Do you think I should use FT_Get_PFR_Metrics to get units_per_EM, or is there something better? | 09:28.20 |
| chrisl: Yes, I see what you mean, but it's not that. I'm seeing the text going outside the box returned by pdf_measure_text. | 09:29.35 |
tor7 | dbrgn: the image on the first page is an image we can extract. I was worried it'd be line art, in which case you'll need to render the page to get a bitmap out | 09:29.43 |
dbrgn | tor7: ok, that sounds great. | 09:30.01 |
tor7 | getting the table data out as a table, we can't do that very well yet | 09:30.48 |
| our text-assembly doesn't recognize the table cells as such | 09:31.24 |
| but you can try your own assembly by looking at the raw text data yourself of course | 09:31.45 |
dbrgn | ok, but maybe i can get the content line-by-line? | 09:31.52 |
chrisl | paulgardiner: do you know what freetype the bbox from pdf_measure_text() comes from? | 09:32.04 |
| s/freetype/freetype call | 09:32.14 |
tor7 | paulgardiner: pdf_measure_text uses the /Widths array in the font descriptor. is this a base14 font you're trying? | 09:34.01 |
| if it's a substitute font I'd expect the metrics from freetype and pdf_measure_text to vary wildly | 09:34.19 |
paulgardiner | tor7: It's one I created without a /Widths array. | 09:34.25 |
| So I'm guessing we create our own hmtx table | 09:34.44 |
tor7 | yeah. the hmtx table is seeded with the FT_Get_Advance widths | 09:35.08 |
paulgardiner | More of a problem at the moment is getting access to units_per_EM. | 09:35.31 |
| tor7: Right. So it should be consistent | 09:35.41 |
tor7 | paulgardiner: actually, that's a slight lie. the hmtx table is seeded with the advances returned by FT_Load_Glyph | 09:36.29 |
paulgardiner | It also uses fz_bound_glyph. It could be just the width of the last char that is being miscalculated | 09:36.41 |
tor7 | those sometimes vary minutely from the metrics returned by FT_Get_Advance | 09:36.42 |
| due to the various bits of rounding in the processing steps | 09:36.48 |
| and it uses fz_bound_glyph for the glyphs at the pen locations. I would guess that the final pen location is outside the bbox (since the next glyph would start beyond it) | 09:37.55 |
| paulgardiner: to be more consistent, we should rename pdf_measure_text to pdf_bound_text | 09:39.02 |
paulgardiner | Yeah | 09:39.12 |
tor7 | it returns the bounding box, the other measure functions return the advance metrics | 09:39.16 |
| the pdf_text_stride should have the pdf_measure_text name :) | 09:39.36 |
paulgardiner | Didn't understand what you said about the final pen location. I'd have thought that would cause the box to be overestimated. | 09:40.10 |
| I'm probably misunderstanding what you meant | 09:40.22 |
tor7 | the bbox of "abcd" would cover those glyphs | 09:41.10 |
| the pen location after advancing through "abcd" would be past the end of the bbox | 09:41.21 |
| ready where the "e" would start, if there had been an "e" | 09:41.35 |
paulgardiner | Yeah, but I'm seeing text slightly truncated | 09:41.51 |
| So part of the d is missing | 09:42.04 |
tor7 | so the pen location is short? oh. that's odd. | 09:42.15 |
paulgardiner | The box doesn't quite encompass it | 09:42.16 |
tor7 | now I'm confused. | 09:42.36 |
paulgardiner | When I was simply dividing by x_ppem the text fit within the box that pdf_measure_text calculated | 09:43.01 |
tor7 | the pdf_measure_text bbox, is that narrower or larger than the FT_Get_Advance sum? | 09:43.05 |
paulgardiner | pdf_measure_text uses the advance to decide the char positions, and for each calls fz_bound_glyph for each and unions the results | 09:44.19 |
| But with less "for each"s | 09:44.43 |
tor7 | the char position x is += h.w after each glyph. but the last position is never used. | 09:45.16 |
| so far so good? | 09:45.24 |
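Editorial sketch of the algorithm being described: step a pen along by each glyph's advance, place each glyph's bounding box at the current pen position, and union the boxes. It is a model of the description in the chat, not MuPDF's code. The function name and the glyph numbers are made up.

```python
# Model of "advance the pen, bound each glyph, union the results"; not MuPDF code.
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def bound_text(glyphs: List[Tuple[float, Rect]]) -> Tuple[Optional[Rect], float]:
    """glyphs is a list of (advance, bbox relative to the glyph origin).

    Returns the union of the placed glyph boxes and the final pen position.
    """
    pen_x = 0.0
    box: Optional[Rect] = None
    for advance, (x0, y0, x1, y1) in glyphs:
        placed = (pen_x + x0, y0, pen_x + x1, y1)
        if box is None:
            box = placed
        else:
            box = (min(box[0], placed[0]), min(box[1], placed[1]),
                   max(box[2], placed[2]), max(box[3], placed[3]))
        # The pen moves after the glyph is bounded, so the last advance never affects the box.
        pen_x += advance
    return box, pen_x


if __name__ == "__main__":
    # Four identical glyphs standing in for "abcd", each with side bearings of 0.05.
    glyph = (0.5, (0.05, 0.0, 0.45, 0.7))
    box, pen = bound_text([glyph] * 4)
    print(box, pen)
```

With these numbers the box ends a little before the final pen position, which is the "ready where the e would start" situation tor7 describes. A glyph whose outline extends past its advance would push the box beyond the pen instead.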
paulgardiner | Oh. There's a hardwired 1000.0! | 09:46.12 |
tor7 | the hmtx is in 1000.0 font space (that's hardcoded in so many places in the pdf spec) | 09:46.34 |
paulgardiner | I wonder if that's the units_per_EM that I'm struggling to find an API to interrogate | 09:46.40 |
| Oh Okay | 09:46.53 |
tor7 | paulgardiner: for type1 fonts, it is usually 1000 | 09:47.00 |
| for ttf fonts, more commonly 2048 or 1024 | 09:47.07 |
| ft_face->units_per_EM is right there in the face struct | 09:47.31 |
paulgardiner | Maybe I need another include | 09:48.15 |
tor7 | http://freetype.org/freetype2/docs/reference/ft2-base_interface.html#FT_FaceRec | 09:48.17 |
| that is part of the base freetype.h include | 09:49.03 |
paulgardiner | I thought the fact that I could call FT_Get_Advance implied I had whatever freetype include I needed, and the lack of units_per_EM was because the struct was opaque. | 09:50.36 |
Robin_Watts | tor7: fz_pixmap_bbox could be fz_bound_pixmap - but it's an irect not a rect. | 09:50.38 |
tor7 | robin_watts: yes, I think that's the reason for the different name | 09:51.04 |
sebras | tor7: Robin_Watts: I posted a recommendation and a potential patch for them to look at over at https://code.google.com/p/openjpeg/issues/detail?id=227 | 09:51.14 |
| in the meantime I guess I revert my patch at sebras/master | 09:51.28 |
Robin_Watts | sebras: Either your patch to openjpeg or just reverting the deprecation would seem fine to me. | 09:52.23 |
tor7 | sebras: maybe that should be set_user_data_v3 | 09:52.27 |
Robin_Watts | tor7: No, _v3 would seem to insist that userdata == FILE *. | 09:52.47 |
sebras | tor7: I thought about it, but I sincerely dislike versioning interface... | 09:52.51 |
| interfaces. | 09:53.03 |
tor7 | me three, but when in rome... | 09:53.30 |
sebras | tor7: if they want to have _v3 they'll have to add it themselves. :-P | 09:54.09 |
tor7 | I'm thinking of backwards compatibility with existing source, which I guess is their motivation for _v3 | 09:54.32 |
| but let them deal with it | 09:54.53 |
sebras | mmm. I'll have breakfast instead! :) | 09:55.03 |
tor7 | robin_watts: we could just remove the deprecation in the opj headers in our bugfix branch | 09:55.14 |
| and get rid of the warnings that way | 09:55.19 |
| anyway, I think it's time to tag and cook a release candidate... | 09:55.44 |
| robin_watts, paulgardiner: the version string in android "1.2 (Build 50/armv7a)", how is that to be updated? | 10:04.03 |
Robin_Watts | tor7: strings.xml in the android/res dir. | 10:23.52 |
| I update that when I do a build. | 10:23.58 |
| tor7: Removing the deprecation warnings from our branch would be the neatest solution, IMHO. | 10:37.42 |
tor7 | robin_watts: okay, so no magic script. | 10:47.34 |
Robin_Watts | no, sorry. | 10:48.09 |
| The ant build system seems to cope poorly with the idea of having to build different configurations | 10:48.45 |
tor7 | I'll just bump it to 1.3 and leave the build and arch for you when you do the builds. | 10:48.50 |
Robin_Watts | google's solution is "copy the code into several directories and maintain several versions" | 10:49.09 |
tor7 | robin_watts: ugh. | 10:49.21 |
Robin_Watts | No magic "we'll cpp the source tree and build it several times according to this config script" or anything | 10:49.40 |
tor7 | robin_watts: is the string on tor/master now okay with you? | 10:49.47 |
| robin_watts: even we can do that with our fairly simple makefile based system... | 10:50.27 |
Robin_Watts | I'd be tempted to just leave it as 1.3 (GIT Build) | 10:50.33 |
tor7 | right. that'd work too. | 10:50.49 |
Robin_Watts | as otherwise people will assume that $REV/$BUILD should be filled out. | 10:50.59 |
tor7 | re-fetch | 10:51.29 |
Robin_Watts | I have to run helen to the station in a mo. She has to take the dogs on the train. should be "fun" | 10:51.36 |
tor7 | my sympathies... | 10:51.46 |
Robin_Watts | looks good. | 10:51.50 |
tor7 | I'll make some tar balls and builds then. | 10:52.02 |
| I'll leave the android to you | 10:52.06 |
| when you get back | 10:52.13 |
| no point worrying about iOS for a while... apple's developer site is still down | 10:52.30 |
| https://developer.apple.com/support/system-status/ | 10:53.28 |
paulgardiner | tor7: can you remind me how, given an fz_font, I can tell it's a base 14? I thought we decided the data would be pointing to a known array, but I'm seeing ft_data NULL here. On the other hand the name is "Helvetica" which is perhaps sufficient. | 11:07.07 |
tor7 | paulgardiner: let me check | 11:07.29 |
| ft_data may be null because it's pointing to static data that shouldn't be freed | 11:07.41 |
| so that may indeed be the indication you're looking for | 11:07.49 |
paulgardiner | So perhaps if data non NULL throw. If NULL use the name as BaseFont in the pdf descriptor I create? | 11:08.41 |
| Maybe I should check that the name is one of the base 14 names, but perhaps that is unnecessary because of the data NULL check. | 11:10.53 |
tor7 | paulgardiner: pdf_load_embedded_font (and xps_load_font) both set ft_data | 11:15.45 |
| so base14 and substitute fonts have ft_data==NULL | 11:15.57 |
| and substitute fonts have ft_substitute set to true | 11:16.15 |
paulgardiner | Right. Thanks | 11:16.29 |
tor7 | so a check for ft_data==NULL && !ft_substitute should tell you whether it's a base14 font or not | 11:16.59 |
| alternatively we can add an is_base14 flag to fz_font | 11:17.23 |
paulgardiner | And if it is base14 I can use the name, it seems | 11:17.26 |
tor7 | what do you need the name for? | 11:17.35 |
| for the font descriptor? | 11:18.01 |
paulgardiner | So that the pdf device can fill in BaseFont in the descriptor it creates | 11:18.10 |
| Yep | 11:18.13 |
tor7 | that should work. the font->name gets passed in when the font is constructed. | 11:19.25 |
paulgardiner | Great. | 11:19.39 |
tor7 | for base14 fonts, it's always been through the base_font_names cleanup array. | 11:19.42 |
| so you could make a more future-proof is-base14-test by just comparing the font name | 11:20.33 |
| but that might still give some false positives | 11:20.52 |
| in case it's an embedded font with the same name | 11:20.59 |
| so nevermind that idea | 11:21.11 |
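Editorial sketch of the test tor7 arrives at above: a font is treated as base14 when it has no ft_data and is not flagged as a substitute. The Font class below is a hypothetical stand-in with only the fields named in the chat. It is not the real fz_font struct and has not been checked against the MuPDF source.

```python
# Hypothetical stand-in for the fz_font fields mentioned in the chat.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Font:
    name: str
    ft_data: Optional[bytes] = None  # set for embedded fonts, per the chat
    ft_substitute: bool = False      # set for substitute fonts, per the chat


def is_base14(font: Font) -> bool:
    """The rule as stated in the chat: ft_data == NULL && !ft_substitute."""
    return font.ft_data is None and not font.ft_substitute


if __name__ == "__main__":
    print(is_base14(Font("Helvetica")))                         # True
    print(is_base14(Font("Helvetica", ft_data=b"font bytes")))  # False: embedded font with the same name
    print(is_base14(Font("SomeFont", ft_substitute=True)))      # False: substitute font
```

The second case is why the name alone is not a reliable test: an embedded font can carry a base14 name.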
paulgardiner | I just need to map the gids to winansi and then I think I have first version we can commit | 11:23.22 |
| Is that likely to be something we have done elsewhere? | 11:26.46 |
tor7 | pdf-unicode.c | 11:31.24 |
| but only half of it :) | 11:31.30 |
paulgardiner | Bleh! I guess I need to get FT to tell me the corresponding character name and then reverse look it up in pdf_win_ansi | 11:31.42 |
tor7 | paulgardiner: pdf_load_tounicode maps from encoding to gid, and then gid back to unicode, and saves the result as a cmap | 11:32.00 |
| what you need is from gid to unicode (or winansi) in the pdfwrite device right? | 11:32.34 |
| because when creating the text object, you just need to map unicode to gids | 11:32.47 |
| paulgardiner: I wonder, is winansi not a subset of latin-1? | 11:33.16 |
paulgardiner | Well the fact that I'm cheating at the moment and passing the winansi char to fz_add_text, rather than unicode might help :-) | 11:34.02 |
tor7 | :) | 11:34.34 |
| that won't render too well if you pass the same fz_text object to a draw device :P | 11:34.49 |
paulgardiner | I know. But I promise never to do so. | 11:35.17 |
| So winansi subset latin-1 means? Would that imply that all the unicode chars I get back in this situation don't need converting? | 11:36.16 |
| i.e. winansi => gid => unicode is the identity | 11:38.26 |
| Ah. The pdf device is already using it->ucs when outputting Tj. So for WinAnsi that should just work. | 11:42.10 |
| That was an easy change: zero code. | 11:42.26 |
tor7 | ew. that it->ucs thing won't work too well in the real world, but for starters I'm sure it'll do | 11:43.42 |
| winansi is a superset of latin-1 with some characters being different... | 11:44.11 |
| so not strictly unicode | 11:44.17 |
| windows codepage 1252 | 11:44.59 |
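Editorial sketch of the difference tor7 points out, using Python's built-in cp1252 and latin-1 codecs. Treating Python's cp1252 codec as a stand-in for PDF's WinAnsiEncoding is an assumption: the PDF encoding is based on code page 1252 but is defined by its own table. The function name is hypothetical.

```python
# Compare code page 1252 with latin-1 byte by byte using Python's standard codecs.
from typing import List


def differing_bytes() -> List[int]:
    """Byte values that cp1252 decodes differently from latin-1, or leaves undefined."""
    result = []
    for value in range(256):
        raw = bytes([value])
        try:
            if raw.decode("cp1252") != raw.decode("latin-1"):
                result.append(value)
        except UnicodeDecodeError:
            # Python's cp1252 codec has no mapping for a few byte values.
            result.append(value)
    return result


if __name__ == "__main__":
    diff = differing_bytes()
    print(hex(min(diff)), hex(max(diff)))   # 0x80 0x9f
    # Byte 0x80 is the euro sign in cp1252 but a control character in latin-1.
    print(b"\x80".decode("cp1252") == "\u20ac")  # True
```

Outside the 0x80 to 0x9F range the two agree, so for plain Latin text the byte value and the Unicode code point coincide. Inside that range the character code has to be converted rather than passed through.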
paulgardiner | I could shoot for a different restriction of use. I could demand that ucs is set to unicode. Then I need to convert from unicode to winansi, rather than from gid. | 11:48.24 |
| I mean not handle the case where we have no known mapping to unicode, which is not much of a restriction seeing as we are looking to handle only WinAnsi at the moment. | 11:51.51 |
| And then assume that in fz_text items ->ucs is set correctly, which presumably it isn't for the text from some pdf docs. | 11:52.46 |
tor7 | ucs *is* unicode. but ucs is *not* the glyph that's printed on the page. it may not even be set. there can be a N-to-M mapping between it->gid and it->ucs by either one being set to -1 | 12:12.06 |
| of course, that won't be the case for any base-14 fonts | 12:12.41 |
| but it can and will happen for embedded fonts (usually in the XPS case) | 12:13.03 |
| so for base14 I think your assumption is safe enough (but only by lucky coincidence) | 12:13.46 |
| paulgardiner: on the other hand, I don't think it should be that difficult to map from gid -> winansi using a scheme similar to what exists in pdf_load_unicode | 12:14.38 |
| oh... but the pdf_load_unicode function has a big TODO shaped hole :) | 12:15.33 |
paulgardiner | Oh okay. I was hoping that it->ucs would be reliable in the XPS case. Not so good a restriction then | 12:16.06 |
tor7 | right where the logic I thought you could reuse is supposed to be :) | 12:16.08 |
| it->ucs is only reliable for the base14 fonts | 12:16.26 |
paulgardiner | Not also for fonts that have a tounicode cmap? | 12:16.49 |
tor7 | even regular embedded or substitute fonts may not have a usable to_unicode table, which will put garbage in the ucs | 12:16.53 |
| when you are creating the fz_text object, you are mapping from a unicode(-ish) encoding using freetype to the gid | 12:18.03 |
| what I'm suggesting is to create a reverse of that encoding table and use that in the pdfwrite device to get back unicode or winansi values | 12:18.40 |
| for embedded fonts, you can cheat and use a CIDFont fontdescriptor with an Identity-H encoding and just use the gid values directly | 12:19.14 |
| for substitute fonts, I don't have a plan yet | 12:19.27 |
paulgardiner | Yeah, I got that. Was just taking a detour wondering if the ucs value could be used instead, but I guess not then | 12:19.28 |
tor7 | right. in the general case, we won't want to. but for the narrow case of getting base14 to work, you can take the shortcut of using the ucs. | 12:20.05 |
| it all depends on how hard it'll be to create a reverse lookup table | 12:20.29 |
| mapping from gid -> unicode using freetype's tables | 12:20.50 |
| the biggest problem here is that simple fonts (as they're called in PDF) can't do more than 255 characters | 12:21.49 |
| so anything outside WinAnsi for base14 fonts can't be represented in our scheme | 12:22.11 |
| (not that I think it really matters for now) | 12:22.31 |
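Editorial sketch of the reverse lookup table tor7 suggests: invert a character-code to glyph-id mapping so the pdfwrite side can go from gid back to a character code. The input dict is made-up sample data. In practice it would have to be filled from the font's charmap, for example by walking it with FreeType, and that step is not shown. The function name and the tie-breaking rule (keep the lowest code) are assumptions.

```python
# Invert a char-code -> gid mapping; sample data only, no FreeType involved.
from typing import Dict


def invert_cmap(char_to_gid: Dict[int, int]) -> Dict[int, int]:
    """Build gid -> char code, keeping the lowest code when several share a glyph."""
    gid_to_char: Dict[int, int] = {}
    for code in sorted(char_to_gid):
        gid_to_char.setdefault(char_to_gid[code], code)
    return gid_to_char


if __name__ == "__main__":
    # Space and no-break space often share one glyph, so the mapping is not one-to-one.
    sample = {0x20: 3, 0xA0: 3, 0x41: 36, 0x42: 37}
    print(invert_cmap(sample))  # {3: 32, 36: 65, 37: 66}
```

The inversion is lossy whenever two codes share a glyph, and glyphs with no code at all never appear in the result, which is one reason the general case is harder than the base14 one.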
paulgardiner | Yeah. I was just hoping to make as much of the code I put in also correct for the more general case. | 12:22.56 |
| I'm wondering now whether to stick with what I have for the initial commit, seeing as making use of ucs isn't a longterm solution | 12:23.55 |
| At the moment I'm pretending that WinAnsi is a subset of unicode | 12:24.33 |
sebras | tor7: I have a pdf here which has a space after the last word at the end of every line. when I copy/paste the text, do we want to keep this, or do we want to trim the initial/trailing whitespace? | 12:24.42 |
tor7 | sebras: I don't know. | 12:25.11 |
sebras | tor7: I can't imagine there being a case where you'd want to keep those spaces. newlines are a different case of course! :) | 12:25.50 |
tor7 | it's a matter of policy, how much to clean up :) | 12:26.08 |
| do we also want to remove double spaces after a period? | 12:26.16 |
sebras | I would vote yes on that too, but I know there are differing opinions on that. | 12:29.02 |
| I'm cutting and pasting a bit from this pdf so I noticed that I need to remove the trailing whitespace which set me off thinking about it. | 12:29.47 |
tor7 | sebras: if there are trailing space characters at the end of lines, it makes it easier to recover hyphenated words | 12:32.56 |
| but we can't rely on that anyway, so I guess we could just strip whitespace off the ends of copied (and extracted) text | 12:33.14 |
sebras | mmm, I'm thinking that we want to retain the whitespace for mudraw -t though. | 12:33.37 |
| because in that case we want it to be 100% accurate. | 12:33.45 |
| stripping the whitespace is more of a convenience. | 12:33.53 |
tor7 | even that isn't 100% accurate, we insert spaces using heuristics | 12:34.02 |
sebras | true, hm.. | 12:34.15 |
| after the rc! :) | 12:34.24 |
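Editorial sketch of the clean-up sebras proposes for copied text: trim spaces and tabs at the start and end of every line while leaving the newlines alone. The function name is hypothetical and this is not MuPDF's copy/paste code. It also deliberately leaves double spaces inside a line untouched, since that policy was left open above.

```python
# Trim leading/trailing spaces and tabs per line, preserving line breaks.

def strip_line_ends(text: str) -> str:
    """Remove spaces and tabs at both ends of each line; keep every newline."""
    return "\n".join(line.strip(" \t") for line in text.split("\n"))


if __name__ == "__main__":
    copied = "first line \nsecond  line\t\n"
    print(repr(strip_line_ends(copied)))  # 'first line\nsecond  line\n'
```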
Robin_Watts | reads the logs "fonts fonts fonts...". OK I didn't miss anything :) | 12:36.50 |
zeniko | Robin_Watts: ping | 12:47.10 |
Robin_Watts | zeniko: hi | 12:47.25 |
zeniko | I've managed to test my openjpeg submodule changes on the cluster | 12:48.20 |
| Looks like you have not yet had the time to merge the additional test files I've sent you | 12:48.38 |
Robin_Watts | zeniko: no indeed, not yet. | 12:48.55 |
zeniko | There seem to be some hundred differences with Shelly's changes, | 12:50.20 |
| but most of them seem minor enough that I couldn't tell whether they're progressions, regressions or simply noise | 12:50.46 |
| I also still have to run the changes through the fuzzing files locally (since I assume they aren't tested by the cluster) | 12:51.42 |
| Another question was: does the cluster also run through the XPS files? | 12:52.20 |
Robin_Watts | I don't think so. | 12:52.52 |
zeniko | Would it be possible to add them (after the holiday and release season)? | 12:53.32 |
Robin_Watts | probably, (sorry, phone went) | 12:55.22 |
| Just looking now. | 12:55.34 |
zeniko | marcosw: FYI: my bmpcmp e-mail notifications consist mainly of errors ("/bin/sh: 1: pkg-config: not found", "cp: cannot stat `/home/marcos/cluster/users/zeniko/ghostpdl/urwfonts': No such file or directory" and "make: pkg-config: Command not found"), the website works fine, though | 12:56.58 |
Robin_Watts | zeniko: OK. A mupdf cluster test includes the .xps files. | 12:58.48 |
| at least the xpsfts ones. | 12:58.56 |
zeniko | Robin_Watts: Thanks. It'd be great if you could also include the ones I've sent you (which cover multiple edge cases MuXPS still gets wrong) | 13:00.30 |
Robin_Watts | It's possible those haven't made it into our cluster repo yet :( | 13:01.12 |
| I'll need to talk to marcosw about that. | 13:01.27 |
paulgardiner | tor7, robin_watts: Initial commits for freetext-annotation support are on paul/signature-appearance | 13:09.11 |
zeniko | Robin_Watts: thanks, but first enjoy your holidays! | 13:12.44 |
Robin_Watts | tor7: When you talk about me building android stuff "when I get back", did you mean "when I get back from dropping Helen at the station" or "when I get back from holiday" ? | 13:56.18 |
| The latter I hope. | 13:56.20 |
tor7 | robin_watts, paulgardiner: mupdf.com/news2 | 15:19.46 |
Robin_Watts | gimme 5 mins. | 15:19.58 |
dbrgn | can mudraw output image data to stdout | 15:20.54 |
| ? | 15:20.58 |
Robin_Watts | dbrgn: You could try "-o -", but I suspect not as standard. Trivial change though. | 15:22.26 |
dbrgn | Robin_Watts: nope, "-" is not implemented. | 15:22.53 |
paulgardiner | tor7: LGTM | 15:23.40 |
tor7 | dbrgn: -o /dev/stdout | 15:23.58 |
Robin_Watts | tor7: Looks good to me. | 15:24.16 |
tor7 | robin_watts, paulgardiner: thanks. it's live now! | 15:25.28 |
| chrisl: henrys: MuPDF 1.3 RC 1 is out. | 15:25.38 |
Robin_Watts | did you tag it? | 15:26.04 |
henrys | tor7:early this time. | 15:26.39 |
dbrgn | tor7: doesn't seem to work. possibly because it needs the filetype suffix. | 15:27.08 |
tor7 | robin_watts: tag pushed to origin | 15:27.24 |
| henrys: robin_watts is going on vacation, figured we'll do the big release once he gets back. | 15:27.42 |
| and now's a good time, there's a lull in the commits and changes :) | 15:28.03 |
Robin_Watts | dbrgn: Indeed, it looks for the suffix. | 15:33.08 |
| but if it doesn't find one, it'll send PNG format data. | 15:33.51 |
dbrgn | Robin_Watts: hm, but "mudraw -r 120 -o image.png DABS_20130726.pdf 1" works while "mudraw -r 120 -o /dev/stdout DABS_20130726.pdf 1" doesn't (return code 0). | 15:35.27 |
| there is simply nothing returned. | 15:35.33 |
Robin_Watts | and the changes needed inside mudraw aren't trivial if we want them to work portably on windows too. | 15:35.47 |
| dbrgn: Yes, you may be out of luck. | 15:36.27 |
| Can you use named pipes? | 15:36.39 |
tor7 | robin_watts: consider adding a flag to let "mudraw -f pgm -o -" work? | 15:36.44 |
Robin_Watts | tor7: The problem is we call fz_write_pnm etc. | 15:37.03 |
| and they take a char*filename. | 15:37.13 |
henrys | marcosw:you here? | 15:37.14 |
tor7 | robin_watts: oh, right! | 15:37.15 |
dbrgn | Robin_Watts: named pipes would probably work, but then you're using the filesystem and can use regular files as well. | 15:37.25 |
Robin_Watts | so we'd need to push the - => stdout thing in there, and that's nasty. | 15:37.31 |
tor7 | they could take "-" as the filename and use stdout, portably enough | 15:37.31 |
| but yeah, that's nasty | 15:37.38 |
| -f format -o /dev/stdout should be easier | 15:37.54 |
Robin_Watts | possibly we should generalise those functions to take a fz_output *. | 15:38.29 |
| then the filename versions can be trivially implemented in terms of them. | 15:38.56 |
| Less than 24 hours til I leave the house for holiday. I should probably pack at some point. | 15:39.45 |
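A minimal sketch of the refactoring Robin_Watts suggests above: the writer takes an already-open output stream, and the filename version is a thin wrapper around it, so "-" can be mapped to stdout in one place. This is illustrative Python, not MuPDF code; `write_pgm`, `write_pgm_to_path` and the "-" convention are hypothetical stand-ins for fz_write_pnm and an fz_output-based variant.

```python
import io
import sys


def write_pgm(out, width, height, samples):
    """Write 8-bit greyscale samples as a binary PGM (P5) to an open binary stream."""
    if len(samples) != width * height:
        raise ValueError("expected width * height samples")
    out.write(b"P5\n%d %d\n255\n" % (width, height))
    out.write(bytes(samples))


def write_pgm_to_path(path, width, height, samples):
    """Filename wrapper; "-" (a hypothetical convention) selects stdout."""
    if path == "-":
        write_pgm(sys.stdout.buffer, width, height, samples)
        sys.stdout.buffer.flush()
    else:
        with open(path, "wb") as out:
            write_pgm(out, width, height, samples)


# The stream version works with any binary sink, e.g. an in-memory buffer.
buf = io.BytesIO()
write_pgm(buf, 2, 2, [0, 64, 128, 255])
```

With this split, the "-" special case lives only in the wrapper, and the stream-based writer never needs to know about filenames.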
tor7 | robin_watts: I thought we sort of did | 15:43.23 |
Robin_Watts | sort of did what? | 15:48.18 |
ablemike | Hi all, I have a small bash script that I am using to crop PDFs using CropBox | 15:49.46 |
| I am a bit confused as to what to search next. | 15:49.53 |
| My PDFs are definitely being cropped and rendered correctly in viewers that respect the CropBox definitions. | 15:50.19 |
| However, I need to actually take that CropBox and make a true crop, EG. actually trimming the paper size down to that dimmension. | 15:50.56 |
| dimension* | 15:51.04 |
| any help here is much appreciated. | 15:51.47 |
Robin_Watts | ablemike: It sounds to me like you want to trim the MediaBox to be the same as the CropBox. | 15:52.44 |
| I am not aware of any tools that will do that for you. | 15:52.54 |
ablemike | if I can access MediaBox via CLI that might work :) | 15:53.14 |
| I just didn't know the terms | 15:53.22 |
henrys | for most folks we just say have a nice vacation but for Robin_Watts "Good Luck" is more appropriate! | 15:56.17 |
paulgardiner | As in "Don't get eaten"? | 16:00.22 |
henrys | or hope the inoculations were effective | 16:02.35 |
tor7 | robin_watts: sort of generalise the functions to take a fz_output. | 16:03.08 |
Robin_Watts | henrys: Didn't need any inoculations. | 16:03.25 |
tor7 | robin_watts: where are you off to for vacation? | 16:03.40 |
Robin_Watts | Namibia | 16:03.45 |
tor7 | ah, right! I should've remembered that :) | 16:04.19 |
| stay safe! don't get mugged. | 16:04.30 |
Robin_Watts | Yeah. Helen promises me that car jacking doesn't happen in Namibia, just in south africa. | 16:05.04 |
chrisl | There's no point wishing Robin a good/safe holiday now, as he'll be back online in the car to Heathrow, in Heathrow, the airport at the other end.......... ;-) | 16:06.11 |
Robin_Watts | chrisl: I have booked an airport lounge with wifi while we wait for the flight in south africa, yes :) | 16:06.43 |
chrisl | So, better to wait until you actually are off into the wilderness! | 16:07.06 |
Robin_Watts | tor7: I think we should have fz_output_pnm that takes an fz_output | 16:07.47 |
| marcosw1: Hey. | 16:07.51 |
| I've added some new pdfs from zeniko into the tests_private/pdf/sumatra directory. | 16:08.14 |
| I've also added his xps files into tests_private/xps/sumatra and enabled that dir in build.pl | 16:08.41 |
marcosw1 | robin_watts: yeah, that's a bit of problem. | 16:08.49 |
henrys | marcosw1:did you actually build 801's system? | 16:08.53 |
Robin_Watts | I note that we aren't testing the ms xps files. | 16:08.59 |
| marcosw1: oh, how so? | 16:09.06 |
marcosw1 | Two of the files have utf-8 characters in their name, this confounds the cluster | 16:09.07 |
henrys | marcosw1:looks like a windows setup | 16:09.15 |
marcosw1 | Fri Jul 26 09:03:31 PDT 2013: svn: svn: Can't convert string from 'UTF-8' to native encoding: | 16:09.16 |
| Fri Jul 26 09:03:31 PDT 2013: svn: svn: /home/marcos/cluster/tests_private/pdf/sumatra/0_-_opwd_?\195?\164?\195?\182?\195?\188?\226?\130?\172.pdf | 16:09.16 |
| Fri Jul 26 09:03:31 PDT 2013: svn: svn: Can't convert string from 'UTF-8' to native encoding: | 16:09.16 |
| Fri Jul 26 09:03:31 PDT 2013: svn: svn: /home/marcos/cluster/tests_private/pdf/sumatra/0_-_password_?\195?\164?\195?\182?\195?\188?\226?\130?\172__fails__.pdf | 16:09.16 |
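The `?\195?\164...` runs in the svn errors above look like the filename's UTF-8 bytes printed as decimal escapes. A small sketch for decoding them, assuming that reading of the format is right (`decode_svn_escapes` is a hypothetical helper, not part of svn or the cluster code):

```python
import re

# One or more consecutive "?\NNN" escapes, each assumed to be one byte in decimal.
ESCAPE_RUN = re.compile(r"(?:\?\\\d{1,3})+")


def decode_svn_escapes(text):
    """Replace runs of ?\\NNN escapes with the UTF-8 text they encode."""
    def repl(match):
        values = re.findall(r"\?\\(\d{1,3})", match.group(0))
        raw = bytes(int(v) for v in values)  # raises ValueError for values > 255
        return raw.decode("utf-8", errors="replace")
    return ESCAPE_RUN.sub(repl, text)


name = decode_svn_escapes(r"0_-_opwd_?\195?\164?\195?\182?\195?\188?\226?\130?\172.pdf")
print(name)  # -> 0_-_opwd_äöü€.pdf
```

So both offending files carry "äöü€" in their names, which a non-UTF-8 locale on the cluster node cannot represent.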
Robin_Watts | marcosw1: Oh, sorry. Delete those files? | 16:09.23 |
marcosw1 | Presumably renaming them would work, but I can't do an svn_update so I'm not sure how to proceed. | 16:09.52 |
| I've temporarily disabled the svn update tests_private step to fix the cluster | 16:10.28 |
Robin_Watts | marcosw1: If we rename them, we lose the password, right? | 16:10.50 |
| hence just deleting them seems easiest. | 16:10.58 |
| want me to do that? | 16:11.02 |
marcosw1 | robin_watts: the password should be stored in a file with the same name as the pdf with the extension .pwd (see 0_-_password_letmein.pdf.pwd 0_-_password_password_crypt_level_5.pdf.pwd) | 16:11.42 |
| robin_watts: since I can't do an svn update to get the files I don't think I can rename or delete them | 16:12.11 |
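A sketch of the `.pwd` convention marcosw1 describes, where the password lives in a sidecar file rather than in the PDF's name. The helper name, the UTF-8 encoding and the trailing-newline handling are assumptions, not taken from the cluster scripts.

```python
from pathlib import Path


def password_for(pdf_path):
    """Return the password stored in '<pdf_path>.pwd', or None if there is none."""
    pwd_file = Path(str(pdf_path) + ".pwd")
    if not pwd_file.is_file():
        return None
    # Assumption: the file holds just the password, possibly followed by a newline.
    return pwd_file.read_text(encoding="utf-8").rstrip("\r\n")
```

Under this scheme a file can be renamed to an ASCII-only name without losing its password, as long as the `.pwd` file is renamed with it.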
Robin_Watts | I will try. | 16:12.17 |
marcosw1 | I don't think the files in tests_private/xps/sumatra are included in a cluster run, I'll add that directory. | 16:13.17 |
| actually they are, but only for GhostXPS | 16:13.47 |
Robin_Watts | marcosw1: I added that dir about 1/2 hour ago. | 16:14.15 |
| You'll note I added the MS directories too, but left them commented out. | 16:14.32 |
| Is there a reason we don't test those files? | 16:14.38 |
marcosw1 | henrys: I did not build customer #801 project, I don't have a windows setup. In any case, I'm not sure what that would help, I presume they are correct re. the missing blue plane. | 16:15.22 |
| robin_watts: give me a sec to look at the build.pl code | 16:16.10 |
| the tests_private/xps/sumatra files are being included in a mupdf test | 16:19.12 |
Robin_Watts | marcosw1: Right, cos I added them :) | 16:21.01 |
| But the ms ones aren't being tested either for ghostxps or mupdf. | 16:21.18 |
marcosw1 | sorry, I thought you were saying that wasn't working | 16:21.34 |
Robin_Watts | no, I was just telling you that I'd been fiddling. | 16:22.08 |
marcosw1 | is there a reason you left the ms ones disabled? | 16:22.10 |
Robin_Watts | Well, the sumatra ones are new, and we specifically wanted them tested, so I enabled them. | 16:22.40 |
marcosw1 | robin_watts: if we were better about committing the changes to the cluster code I could just read the logs and figure out what's been going on :-) | 16:22.48 |
Robin_Watts | The ms ones have been there for a while, and weren't being tested, so I wanted to check with you as to why they weren't being tested already in case there was a good reason. | 16:23.23 |
marcosw1 | I don't think there is any reason, just fewer tests means faster cluster runs. | 16:24.33 |
Robin_Watts | marcosw1: OK, want to try svn update now ? | 16:25.02 |
marcosw1 | works. | 16:25.31 |
marcosw1 | thinks that miles has developed a hardware fault, it's gone down twice this week (points is my fault, I rebooted it after a configuration change). | 16:28.37 |
henrys | marcosw:I know you keep gs executable for testing do you also have old pcl's so we can narrow down the performance issue just reported? | 16:30.52 |
| marcosw1 ^^^ | 16:31.29 |
marcosw1 | henrys: no, ghostpcl regressions are so rare that I just run git bisect when they do occur. | 16:31.52 |
Robin_Watts | marcosw1: You suckup :) | 16:33.07 |
henrys | the key is not to change it ;-) | 16:33.41 |
| marcosw1:just from looking at the email I'd say if mono blame Robin_Watts else blame mvrhel_laptop ;-0 | 16:36.05 |
| mvrhel_laptop: are you around? | 16:37.13 |
mvrhel_laptop | henrys: yes | 16:46.03 |
| henrys: is this the cyan and blue data bug? | 16:46.56 |
henrys | if you have any great ideas about 801 and pcl color I'm all ears. PCL really needs to have an RGB contone device to work properly. | 16:47.01 |
mvrhel_laptop | 694435? | 16:47.01 |
| henrys: let me look it over and get caught up. i was surprised to see non-RGB used for PCL | 16:47.40 |
| robin_watts: if I miss you before you sign off for the day, have a great holiday | 16:48.27 |
Robin_Watts | thanks. | 16:48.45 |
mvrhel_laptop | henrys: so how does PCL behave when going out to a CMYK device now? I know we had this discussion many times with respect to approximations etc. The "right" way to do this is to have a device that maintains the RGB contone buffer until the end and then does the mapping from RGB to CMYK + spot | 16:54.45 |
henrys | it simply doesn't behave correctly with cmyk. | 16:55.35 |
| we used to have a some rube goldberg contraption but we've gotten rid of it. | 16:56.22 |
| pcl is completely rgb I can see how you would get K but not sure how there would ever be a spot color. Is there something in icc that would map an rgb triple to something that used the spot color? | 16:58.08 |
mvrhel_laptop | henrys: yes | 16:58.21 |
| my thought is this | 16:58.26 |
ray_laptop | if your device profile has 5 components out, then that can produce 'Blue' for certain selected bluish colors, right ? | 16:59.44 |
mvrhel_laptop | similar to the set up that we had before, we add in another profile option that is a device link profile that can map from RGB to CMYK + spots which will get applied in a final step by the device. to the graphics lib, the device behaves like RGB and the graphics lib would not have any knowledge about this profile | 17:00.49 |
| ray_laptop: yes, we can do that now. we can specify N-color ICC profiles for tiffsep and psdcmyk | 17:01.12 |
ray_laptop | other than picking the right RGB, PCL doesn't have any way to directly map to Blue | 17:01.23 |
henrys | mvrhel_laptop: obviously a band at a time. | 17:02.16 |
mvrhel_laptop | what we are talking about here, is another device that is really RGB but will use a N-color device link profile at the end | 17:02.16 |
| yes | 17:02.20 |
ray_laptop | Not sure how they come up with such an ICC profile that mixes in a blue colorant sometimes, but I've seen 6 color profiles that use Orange and Green | 17:02.32 |
mvrhel_laptop | well that is for the customer to worry about in my opinion. How they want that generated is another problem. It is no problem to specify with an ICC profile | 17:03.32 |
ray_laptop | in the package they posted, it had some ICC profiles, iirc | 17:03.35 |
mvrhel_laptop | oh ok. | 17:03.43 |
| henrys: we could possibly alter the psdrgb device to demo this | 17:03.58 |
ray_laptop | mvrhel_laptop: they sent their entire code snapshot | 17:04.19 |
mvrhel_laptop | oh | 17:04.23 |
| so we could just fix up their device | 17:04.37 |
ray_laptop | Since it was in henrys' lap, I didn't look into how they set their device profiles | 17:04.53 |
mvrhel_laptop | me either. but I am getting a sinking feeling.... | 17:05.12 |
ray_laptop | why should I have all the fun ? Let henrys enjoy, too ;-) | 17:05.25 |
henrys | mvrhel_laptop, ray_laptop:I'm just starting setting up their system now. I'll let you know. | 17:05.57 |
| I was surprised they sent windows code - I thought they were linux | 17:06.17 |
ray_laptop | The "gotcha" for PCL is that RasterOps need to be able to read back colors that have been painted -- AS RGB | 17:06.35 |
mvrhel_laptop | right | 17:06.47 |
| that keeps us with rgb contone as far as the device appears to the graphics lib | 17:07.03 |
| if we want to keep our sanity | 17:07.20 |
ray_laptop | so hopefully their ICC profile is bi-directional and can get us from CMYKB backwards | 17:07.27 |
mvrhel_laptop | I dont think one can rely upon that | 17:07.49 |
henrys | ray_laptop:no mvrhel_laptop is saying we have to keep the rgb | 17:08.00 |
| as I parse it. | 17:08.10 |
mvrhel_laptop | yes. | 17:08.15 |
| that I believe is the safest approach | 17:08.23 |
ray_laptop | oh, and then transform the RGB buffer to CMYKB at the end ? | 17:08.29 |
mvrhel_laptop | yes | 17:08.34 |
ray_laptop | OK. That _does_ make it a lot simpler (if lower performance). | 17:09.17 |
mvrhel_laptop | yes, but from my limited understanding of PCL needs and my knowledge of ICC round tripping I think it is the better approach | 17:10.02 |
ray_laptop | and we are in the same boat as with TIS. We don't know if non-idempotent RasterOps are used until the page is done. | 17:10.12 |
henrys | bbiam | 17:10.31 |
mvrhel_laptop | TIS? | 17:10.56 |
ray_laptop | so we have to select between a 3 component imaging device if non-idempotent RasterOps are needed, or the (preferred) 5-component CMYKB imaging device if not. | 17:11.59 |
| because we really want to avoid having to transform the entire buffer | 17:12.21 |
mvrhel_laptop | so do you know that ahead of time | 17:13.44 |
ray_laptop | their target printers are > 150 ppm and PCL can be fast. Adding a full transform step to text pages would slow things down | 17:13.45 |
mvrhel_laptop | I agree it would slow things down | 17:13.57 |
ray_laptop | mvrhel_laptop: in clist mode we do. Collecting that info may have bit-rotted slightly, but we can fix it if so | 17:14.28 |
mvrhel_laptop | certainly knowing if a band is blank or even has color would help | 17:14.28 |
ray_laptop | mvrhel_laptop: well we know if it has color (the code you added) :-) | 17:14.55 |
mvrhel_laptop | right | 17:15.00 |
| that would keep things from slowing down for a lot of documents | 17:15.24 |
ray_laptop | mvrhel_laptop: but I think that was collected on a page basis only. Unlike what I did for PDF which is maintained for each band | 17:15.41 |
mvrhel_laptop | ray_laptop: you are talking about the presence of non-idempotent RasterOps? | 17:16.55 |
Robin_Watts | mvrhel_laptop: Just skimming the discussion so far. | 17:17.03 |
| Perhaps what we want is a call to getbits that will convert from rgb to cmyk. | 17:17.29 |
| using mvrhel's profile. | 17:17.39 |
| That way we minimise the special code in the device. | 17:18.00 |
henrys | the profile is invertible? | 17:18.28 |
mvrhel_laptop | henrys: not guaranteed. | 17:18.44 |
ray_laptop | robin_watts: The key thing is that we don't want to have to convert on every page | 17:18.49 |
| for the entire page | 17:18.55 |
henrys | so get bits won't work. | 17:19.00 |
Robin_Watts | henrys: getbits will work fine if we draw EVERYTHING in rgb, and then the final 'getbits' done by the device asks for the conversion to cmyk. | 17:19.38 |
| But that idea doesn't fit with rays idea of avoiding transforming the entire buffer. | 17:20.00 |
mvrhel_laptop | unless the getbits had some other information about the buffer/band | 17:20.17 |
ray_laptop | Since it is such a special device, I think just doing the conversion in their device is fine, sort of like what is done for monochrome mode when page_is_neutral | 17:20.19 |
Robin_Watts | essentially, I was proposing that we do what mvrhel_laptop initially suggested (render entirely in rgb, then convert to cmyk in the device), but was trying to avoid having to have that rgb->cmyk step in every device. | 17:21.01 |
| I'm not sure I see how ray_laptop hopes to avoid converting every pixel. | 17:21.25 |
henrys | mvrhel_laptop: I thought about this a long time ago ... and I really think there must be a way to create an invertible profile with extra information stored when the transformation is done. That would solve the problem nicely with any device. | 17:21.26 |
ray_laptop | except with 'page_is_neutral' we still render to CMYK, then do a simple transform to K | 17:21.27 |
| robin_watts: we didn't. To do that would require a custom 'image' device that takes in CMYK colors and produces only a single K plane. We haven't done that (yet). | 17:22.14 |
| But the CMYK->K transform is fast compared with ICC link profile transformation | 17:23.00 |
mvrhel_laptop | henrys: It is certainly possible (and easier) to create the profile that round trips ok from RGB and back. Compared to CMYK(B) to RGB and back (really impossible) | 17:24.38 |
ray_laptop | We probably want to do RasterOps in a compositor device, then it can operate in RGB. Then the per-band RasterOp needed can skip the compositor on bands that don't need it. | 17:25.02 |
mvrhel_laptop | henrys: if we had such a profile, we would want to hook in the inversion from CMYK(B) to RGB in any graphic lib getbits calls. Is that correct? | 17:25.33 |
ray_laptop | The 'put_image' for such a RasterOp compositor would be the place that transforms from RGB to device colors (like pdf14_put_image does now) | 17:26.29 |
mvrhel_laptop | ray_laptop: oh that is an interesting thought | 17:27.03 |
ray_laptop | I have to run. bbiaw | 17:27.24 |
mvrhel_laptop | so we treat this how we do pdf14 which can have these color spaces different than the device | 17:27.28 |
| henrys: see my above comment though | 17:27.44 |
henrys | mvrhel_laptop: I'm not thinking of a round trip exactly, I'm thinking of a profile and extra bookkeeping to record the conversions that don't invert. | 17:28.02 |
mvrhel_laptop | oh. well as I think of this more, having the round trip work properly RGB --> CMYK(B) --> RGB is really not that hard | 17:28.56 |
| and could be a requirement for this to work properly | 17:29.08 |
| we are never going to have a CMYK(B) that does not have a related RGB value. | 17:29.47 |
| granted with the interpolations there are some points in CMYK(B) that may be used that could cause an issue though | 17:30.14 |
henrys | yes be it does need to go back to its original rgb value - hence the extra bookkeeping. | 17:30.26 |
| s/be/but | 17:30.48 |
mvrhel_laptop | I don't think you would need bookkeeping though if one spent time getting the profile right with respect to this | 17:31.07 |
Robin_Watts | mvrhel_laptop: Lots of potential for loss of accuracy in the roundtrip though. | 17:31.29 |
mvrhel_laptop | there may be some minor issues on the RGB gamut edges in the CMYK(B) space | 17:31.40 |
| This is why I would prefer my original approach | 17:31.58 |
henrys | I don't think they'll be able to tune the profile for their device and keep that requirement. | 17:32.01 |
mvrhel_laptop | it is much safer | 17:32.15 |
Robin_Watts | mvrhel_laptop: Keeping it in rgb and just converting at the end? | 17:32.31 |
mvrhel_laptop | syes | 17:32.35 |
| yes | 17:32.37 |
Robin_Watts | That sounds the sanest approach to me. | 17:32.38 |
henrys | agreed what is appealing about the other way is it will work with any device. | 17:33.04 |
mvrhel_laptop | if we know a band had neutral only or is all white, then we can avoid transforming it | 17:33.23 |
| with the profile | 17:33.26 |
Robin_Watts | To do what henry is suggesting (mapping an n dimensional space down to an m+k dimensional space, where m+k = n and m dimensions are what you really want, and k dimensions are extra bookkeeping) seems like a HARD problem | 17:34.00 |
| Of course, we could always map the n dimensional space down to an m+n dimensional space. | 17:34.26 |
| Where m is what we really want, and n is just a copy of where we came from. | 17:34.43 |
henrys | do you have any sense of how many colors will not transform back properly? | 17:35.08 |
mvrhel_laptop | oh I see what you are saying | 17:35.08 |
| henrys: any that are on the RGB gamut boundary in CMYK(B) space | 17:35.31 |
| it really depends upon the mapping | 17:35.38 |
| picture a 3-D surface in a 5 D space | 17:35.55 |
| all those points on the surface will get interpolated by points that dont have real RGB values | 17:36.16 |
henrys | yeah I'm trying to figure out how much space the bookkeeping is going to take, if it is feasible. | 17:36.22 |
mvrhel_laptop | If I had to put a number on it | 17:37.35 |
| It would be on the order of 256^2 | 17:37.43 |
| as opposed to the entire volume which is 256^3 | 17:38.08 |
| does that make sense? | 17:38.38 |
henrys | then there are practical considerations jot shrink what is needed - we can rebuild the bookkeeping each page - | 17:38.58 |
| we don't have to worry about the entire space. | 17:39.17 |
| s/jot/to | 17:39.54 |
| it is sounding like a research project though and we should probably just focus on your original suggestion. | 17:40.31 |
mvrhel_laptop | How much time is it going to take to convert the buffers. | 17:40.31 |
Robin_Watts | mvrhel_laptop: This is why we have caching transforms :) | 17:40.54 |
mvrhel_laptop | right | 17:40.58 |
Robin_Watts | (well, lcms has a 1 place cache) | 17:41.11 |
| but we can extend that if required. | 17:41.28 |
mvrhel_laptop | we can | 17:41.31 |
| I would suggest we do the original plan and look at adding in speed ups (e.g. caching) if we find it is slow | 17:42.17 |
| also, I think having the knowledge if a buffer has color would be good | 17:42.35 |
| or is all white | 17:42.42 |
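A rough sketch of the "original plan" with the two speed-ups mentioned above: render in RGB, convert each band at the end, skip bands known to be all white, and cache repeated colors. The textbook RGB to CMYK formula stands in for the customer's ICC link profile, and the unbounded dict cache is an assumption for illustration (lcms itself is said above to keep a 1 place cache); none of this is Ghostscript code.

```python
def naive_rgb_to_cmyk(rgb):
    """Stand-in for an ICC link transform: textbook RGB -> CMYK, 8 bits per channel."""
    r, g, b = rgb
    k = 255 - max(r, g, b)
    if k == 255:
        return (0, 0, 0, 255)
    scale = 255 - k
    return (
        round(255 * (255 - r - k) / scale),
        round(255 * (255 - g - k) / scale),
        round(255 * (255 - b - k) / scale),
        k,
    )


WHITE = (255, 255, 255)


def convert_band(band, transform, cache):
    """Convert one band (a list of RGB tuples) to device colors."""
    if all(px == WHITE for px in band):
        # All-white band: skip the transform and emit "no ink" directly
        # (assumes white maps to zero ink in the target profile).
        return [(0, 0, 0, 0)] * len(band)
    out = []
    for px in band:
        if px not in cache:
            cache[px] = transform(px)  # only the first occurrence pays for the transform
        out.append(cache[px])
    return out
```

A neutral-only band could be short-circuited in the same way as the white one, by writing only the K channel.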
Robin_Watts | mvrhel_laptop: I am always in favour of doing the simple thing first, and then fixing it if it isn't good enough. | 17:43.04 |
| (unless to do the more complex version sounds like more fun :) ) | 17:43.26 |
mvrhel_laptop | at least we have something then. also, we don't know if there is an issue with the simpler approach | 17:43.39 |
henrys | mvrhel_laptop: so did you want to modify psdrgb as a demo? | 17:46.26 |
mvrhel_laptop | henrys; I can do that if you would like | 17:47.04 |
| I need to look at that device and make sure it even works. If I recall I may have an open bug with respect to it | 17:47.50 |
henrys | I think it would be a good thing to have anyway, in the meantime I'll set up 801's system and study that. | 17:47.51 |
mvrhel_laptop | ok. bbiab | 17:49.24 |
| henrys: quick question for you | 19:12.13 |
| priority-wise: I am at a stopping point in the phone right now. I have everything working except text search and hyperlinks. do you want me to stop and work on psdrgb, finish phone, or work on both? | 19:13.03 |
| I had to fix a binding issue with the zoom and just got that working this morning. the windows phone app is looking pretty good so far | 19:13.43 |
| bbiaw | 19:31.24 |
ray_laptop | Using a RasterOP compositor device would work for any devices. It would not install itself IFF the device is RGB. I think that's better than modifying our psdrgb, then they have to modify their device. Also it can more readily lead to optimization when the clist is used (not installing the device when a band doesn't have non-idempotent RasterOps) | 19:39.26 |
| I have to take my daughter to her music class. BBIAB. In the meantime, I'll go back to tracking down the bug I found in my saved-pages rendering. | 19:41.14 |
| I added a '--saved-pages-test' command line arg that runs files in saved-pages mode, then after the file it does the --saved-pages-print='print normal flush' | 19:42.09 |
| that will let us easily do regression testing. If a device doesn't support saved-pages, then --saved-pages-test is simply ignored (i.e., pdfwrite, ps2write, pxl...) | 19:43.21 |
| but running some files showed a segfault. | 19:44.19 |
mvrhel_laptop | based upon ray's comments I will hold off on psdrgb until we talk about it more | 21:57.09 |
henrys | mvrhel_laptop: yes stick with the phone for now. | 22:49.31 |
mvrhel_laptop | ok. will do. working now on binding canvas of rectangles over my rendered pages now. of course the way I did things in the windows 8 app does not work | 22:50.25 |
henrys | I see the theoretical argument to 256^2 but 1) I doubt all those colors are used and 2) there are many colors in gamut that can't be mapped back even if the gamut were the same size, many rgb triples could map to the same cmyk(b) value and we wouldn't be able to get back. I should be able to set up a simple lcms program that goes through all rgb triples and see how many are recoverable using the inverse table, right? | 23:03.02 |
| for a given icc profile | 23:03.32 |
| mvrhel_laptop: ^^^ | 23:05.06 |
mvrhel_laptop | henrys: while it is possible that the different rgb values map to the same cmyk(b) value, that is unlikely. It is generally true in the other way around, that is different cmyk(b) values get mapped to the same RGB value. Since we are starting out with RGB we are in a much better place | 23:07.27 |
| henrys: but it would be interesting to do what you suggest, which is push all the values through to see what % don't roundtrip | 23:08.04 |
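The experiment henrys proposes could be sketched as below: push RGB triples through a forward and an inverse transform and count the ones that do not come back unchanged. The naive 8-bit RGB/CMYK formulas are placeholders for transforms built from a real ICC profile (for example with lcms); `count_roundtrip_failures` and the `step` parameter are hypothetical.

```python
from itertools import product


def naive_rgb_to_cmyk(rgb):
    """Placeholder forward transform: textbook RGB -> CMYK, 8 bits per channel."""
    r, g, b = rgb
    k = 255 - max(r, g, b)
    if k == 255:
        return (0, 0, 0, 255)
    scale = 255 - k
    return tuple(round(255 * (255 - v - k) / scale) for v in (r, g, b)) + (k,)


def naive_cmyk_to_rgb(cmyk):
    """Placeholder inverse transform for the formula above."""
    c, m, y, k = cmyk
    return tuple(round((255 - v) * (255 - k) / 255) for v in (c, m, y))


def count_roundtrip_failures(forward, inverse, step=1):
    """Return (failures, total) over RGB triples sampled every `step` levels."""
    levels = range(0, 256, step)
    total = failures = 0
    for rgb in product(levels, repeat=3):
        total += 1
        if tuple(inverse(forward(rgb))) != rgb:
            failures += 1
    return failures, total


# step=17 samples 16 levels per channel (4096 triples); step=1 would cover
# all 256^3 = 16,777,216 triples but is slow in pure Python.
failures, total = count_roundtrip_failures(naive_rgb_to_cmyk, naive_cmyk_to_rgb, step=17)
print(f"{failures} of {total} sampled RGB triples did not round-trip")
```

The interesting number is the failure percentage for the customer's actual profile, which this placeholder cannot predict.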
Robin_Watts | This still feels like you're trying to fit a squid into an octopus's sweater. | 23:09.03 |
| However clever you are, you're going to be 2 sleeves too short. | 23:09.21 |
henrys | that's 6d and 8d this is 3d and 5d | 23:10.30 |
| or squid have 10 legs don't they? | 23:11.28 |
Robin_Watts | Yeah 10, I think. | 23:11.35 |
| You can kinda visualise this as transforming the axes of a coordinate space. | 23:12.24 |
ray_laptop | IMHO, a RasterOps compositor is the way to go. TOTALLY avoids the RGB<->CMYK(B) issue | 23:12.47 |
Robin_Watts | Any sane colorspace will keep the axes orthogonal. | 23:12.57 |
henrys | it's too expensive for most pcl customers; they already whine with what we've got. | 23:13.21 |
ray_laptop | the compositor operates in RGB. The compositor isn't used when it isn't needed (i.e., a band doesn't have non-idempotent RasterOps or the device is RGB) | 23:13.59 |
| henrys: what's too expensive ? a compositor | 23:14.30 |
| ? | 23:14.32 |
Robin_Watts | The compositor is a nice idea, if I'm understanding it correctly. | 23:14.41 |
ray_laptop | I thought mvrhel_laptop thought so too | 23:15.00 |
henrys | yes, it's always needed pcl is transparent by default. | 23:15.10 |
Robin_Watts | basically it'll only kick in if required, and then it'll render in rgb and convert to cmyk at the end. | 23:15.14 |
henrys | like I said always required that's why we can't do anything intelligent with pdfwrite | 23:15.37 |
mvrhel_laptop | my only concern is, do we know when we need it | 23:15.42 |
ray_laptop | Robin_Watts: right, on a band-by-band basis | 23:15.49 |
Robin_Watts | the compositor is required for non-neutral stuff or for stuff with rops, AIUI. | 23:15.50 |
mvrhel_laptop | as we are doing the clist writing, how do we know that we should have pushed it a while back | 23:16.29 |
Robin_Watts | BUT the code required to write the compositor, is, it seems to me, a more complex refactoring of the code that would be required to do the simple version. | 23:16.31 |
| i.e. we should write the simple version, cos it's simple, and it might be good enough. | 23:16.48 |
| If it's not, then we have a jumping off point to get to the more complex version from. | 23:17.00 |
| It's not like it's wasted effort. | 23:17.12 |
henrys | like I said by default pcl is always transparent so the compositor would always be required. | 23:17.12 |
Robin_Watts | henrys: aiui, the compositor is not about transparency in this case. | 23:17.34 |
| it's about rops or non-neutral colors. | 23:17.58 |
ray_laptop | henrys: b0x1cy ??? | 23:17.59 |
Robin_Watts | r@y iz 50 l33t. | 23:18.39 |
ray_laptop | henrys: I thought the default RasterOp was 252 which is idempotent | 23:18.41 |
| R\0b\in_W(h)\at*s ??? | 23:19.24 |
Robin_Watts | ray_laptop: ah, there was a control char in there. henrys meant "by" | 23:19.27 |
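To make the "idempotent RasterOp" condition concrete, here is a sketch that tests an 8-bit ROP3 code by brute force and then applies ray_laptop's per-band rule for when the compositor is needed. It assumes "idempotent" means f(f(D)) = f(D) for every source and texture bit, with the usual ROP3 bit layout (T = 0xF0, S = 0xCC, D = 0xAA); Ghostscript's own test may be defined differently, and all function names here are hypothetical.

```python
def rop3_apply(op, t, s, d):
    """Evaluate an 8-bit ROP3 code on single bits (T=0xF0, S=0xCC, D=0xAA layout)."""
    return (op >> ((t << 2) | (s << 1) | d)) & 1


def rop3_is_idempotent(op):
    """True if applying the operation twice equals applying it once."""
    for t in (0, 1):
        for s in (0, 1):
            for d in (0, 1):
                once = rop3_apply(op, t, s, d)
                twice = rop3_apply(op, t, s, once)
                if once != twice:
                    return False
    return True


def band_needs_compositor(device_is_rgb, band_rops):
    """Per-band rule from the discussion: skip the compositor for RGB devices
    and for bands whose RasterOps are all idempotent."""
    return (not device_is_rgb) and any(not rop3_is_idempotent(op) for op in band_rops)


print(rop3_is_idempotent(252))   # 252 = T | S, ignores the destination -> True
print(rop3_is_idempotent(0x66))  # 0x66 = D ^ S -> False
```

Because ROP3 codes act bitwise, checking single bits covers every pixel value.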
ray_laptop | henrys: but even if PCL is transparent by default, we can at least optimize some bands that aren't touched, or are monochrome, right? | 23:21.08 |
mvrhel_laptop | I agree with robin_watts. we do the simple case now and then look at a compositor later | 23:21.30 |
ray_laptop | and painting over unpainted area doesn't really involve transparency (equivalent to painting in PDF when the BG is alpha == 0) | 23:22.05 |
| mvrhel_laptop: maybe, but it is far from the first time this has come up. Also, now we (I) have a better understanding of what is possible with a compositor and optimizing post-clist | 23:23.17 |
| This is likely to come up with any PCL printer company since they don't print in RGB unless they have h/w color conversion post rendering | 23:24.23 |
| and Takane-san will want to be able to sell "real PCL" that has decent performance. On printers, non-hw color conversion tends to be a SERIOUS performance hit, particularly if we have to convert the entire page. | 23:25.53 |
| The CPU's tend to be much lower powered than our development systems | 23:26.27 |
| but doing the simple one is fine, but I recommend doing it in the customer's device. Nobody cares about the psdrgb device | 23:27.23 |
| then have an enhancement to "do it right" | 23:28.21 |
henrys | first I'm going to get it running, see what proportion of pcl files I can print correctly without doing anything at all, then we can argue ;-) | 23:28.49 |
mvrhel_laptop | ray_laptop: you know better than me on this | 23:29.00 |
| if you think it is relatively easy, I will help you out with it | 23:29.15 |
ray_laptop | unless I misunderstand the "simple" approach, it will ALWAYS transform the entire page | 23:29.16 |
mvrhel_laptop | ray_laptop: except where the band is white or all neutral | 23:29.35 |
ray_laptop | henrys: "get it running" ??? | 23:29.41 |
mvrhel_laptop | henrys: you mean do the round tripping? | 23:30.03 |
| ray_laptop: on the above, I will help you out even if it is not relatively easy. so I guess we would need to make a new compositor device? | 23:31.02 |
henrys | ray_laptop:no I am going to get the system they sent me working with their device and see what it prints. PCL does print a fairly large number of test files with a CMYK device, I don't see why it shouldn't do the same with their device. | 23:31.13 |
mvrhel_laptop | henrys: that is a good point | 23:31.33 |
henrys | they have sent us a system to debug - see your email. | 23:31.57 |
mvrhel_laptop | bbiab... | 23:32.46 |
ray_laptop | henrys: OK. That's a good start | 23:33.17 |
| Adding is_non_neutral to gx_color_usage_s would be needed to collect the areas / bands needed for non-neutral rendering. Since slow_rop is only a bool, we probably don't need a bbox. Currently the trans_bbox is only used on a band basis as well even though we collect the info | 23:37.12 |
| bbbiaw: My daughter's musical is on in a bit. | 23:37.47 |