# IRC Logs

## Log of #ghostscript at irc.freenode.net.

 <<size->metrics.x_ppem. I'm guessing that I shouldn't be doing that explicitly. Any idea what correct usage is? 08:19.25 chrisl paulgardiner: I can't remember exactly, but that doesn't sound unreasonable - the advance will probably be in scaled font coordinates, and you probably want it in PDF user units. 08:26.24 sebras tor7: should we be calling opj_stream_destroy_v3() instead of opj_stream_destroy() which appears to be deprecated in the version of openjpeg we're using..? 08:27.04 tor7 paulgardiner: when I get the metrics from freetype, I always start by setting the face to size 1 08:28.28 FT_Set_Char_Size(face, 64, 64, 72, 72) 08:28.43 sebras tor7: yes, we should. 08:28.54 tor7 then the metrics are in fixed point format, just divide by 65536 to get the metrics in unscaled space 08:29.26 paulgardiner chrisl: yeah, it seems reasonable to me. It's just that I wonder if there is a function that I should be using to interrogate the value, or whether there is a setup call to request advance back in other units. I get the impression that x_ppem isn't supposed to be public 08:29.30 tor7 paulgardiner: xps_measure_font_glyph() 08:29.48 chrisl Well, I would have a fitz api call for retrieving metrics, but that's just me...... 08:30.17 tor7 paulgardiner: also in fz_text_extract() 08:30.22 paulgardiner tor7: ah right. Just the sort of thing I was looking for. So why does that set the face to size 1? 08:30.49 tor7 the size argument is 26.6 fixed point format 08:31.26 and 72 is the nominal DPI 08:31.39 paulgardiner: though I think it may be worth doing as chrisl suggests, make a proper fitz api for getting font metrics 08:32.09 there hasn't been a lot of need for it before, hence the three places where we do it by interrogating freetype directly 08:32.34 paulgardiner Right okay good 08:32.50 tor7 sebras: _v3 is broken (according to robin_watts) 08:32.59 sebras Robin_Watts: is it? 08:33.09 tor7 it casts the void* to a FILE* and calls fclose on it... 
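A minimal sketch of the fixed-point arithmetic tor7 describes above, in plain Python rather than against FreeType itself. It assumes, as the FreeType documentation states, that the char size passed to FT_Set_Char_Size is 26.6 fixed point and that scaled advances come back as 16.16 fixed point. The helper names are made up for illustration.

```python
def from_26_6(value):
    """Convert a 26.6 fixed-point integer (6 fractional bits) to a float."""
    return value / 64.0


def from_16_16(value):
    """Convert a 16.16 fixed-point integer (16 fractional bits) to a float."""
    return value / 65536.0


def pixels_per_em(char_size_26_6, dpi):
    """Pixels per em for a char size in 26.6 points at the given resolution.

    One point is 1/72 inch, so at 72 dpi one point is exactly one pixel.
    """
    return from_26_6(char_size_26_6) * dpi / 72.0


# Char size 64 (1.0 point in 26.6) at 72 dpi gives a face of 1 pixel per em,
# so a scaled advance divided by 65536 is a fraction of the em.
ppem = pixels_per_em(64, 72)
half_em = from_16_16(32768)
```

With the face set to one pixel per em, an advance of 32768 in 16.16 format corresponds to half an em, which is why dividing by 65536 is all that is needed to get unscaled metrics.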
08:33.10 sebras tor7: yes, but the void* is NULL in our case. 08:33.28 tor7: in other cases they might set the void * to point to a FILE*. 08:33.42 tor7 paulgardiner: I'd base it on xps_measure_glyph, but take care of supporting type3 fonts as well 08:33.44 chrisl tor7: isn't there a performance penalty in (re-)setting the font size like that? 08:34.03 tor7 sebras: or something like it... you'll have to ask robin 08:34.06 sebras tor7: will do. 08:34.15 tor7: maybe we set that void* somewhere... 08:35.24 paulgardiner tor7: I might leave adding the api to another day. This task is dragging me into enough pain as it is 08:35.30 I don't really want to be thinking about the type 3 case just now if I can help it 08:35.59 tor7 chrisl: good question, actually. 08:36.59 paulgardiner Might it make sense to call FT_Set_Char_Size when loading the font? 08:37.24 tor7 paulgardiner: the type3 case should be trivial (just font->t3widths) 08:37.41 we use 3 different char sizes for nefarious purposes 08:38.01 64 and 1000 for metrics, and also 65536 to work around rounding issues in the outline extraction and scaling logic inside freetype 08:38.44 but I think we could probably make it work with 65536 everywhere 08:38.56 freetype stupidly scales (and rounds to integers) the outline twice 08:39.24 once by the char size, and once by the matrix 08:39.41 with limited precision on one of them 08:39.47 so char size 64 is really too low precision... 08:40.07 paulgardiner: I'll stick on the ROADMAP to clean up and isolate the uses of freetype into some more common font api 08:40.28 paulgardiner Right. I was just going to suggest a bug, but the ROADMAP would be better 08:40.58 tor7 the bug tracker is too burdened with bugs... I think ROADMAP is clearer for short term projects 08:42.07 paulgardiner That's weird. 
Using 64, 64, 72, 72 gives me slightly further spaced out text than when I simply divided by x_ppem 08:42.24 tor7 should probably put it on a wiki instead of hosting it on a branch in the git though... if only we had one :) 08:42.26 sebras Robin_Watts: tor7: a patch over at sebras/master to look at. I reset the userdata pointer just before calling the broken _v3 function. this is a bit ugly but ought to work and give no deprecatation warnings. 08:43.31 tor7 paulgardiner: there's also face->units_per_EM but that's only for some other unscaled values in the face struct 08:43.40 paulgardiner Very strange. It now falls slightly outside the box I previously calculated using pdf_measure_text 08:45.14 tor7 sebras: I don't understand why there is an opj_stream_destroy_v3 and why it's so stupidly broken by design... 08:47.24 [trunk] add functions to avoid to use FILE* into the API (thanks winfried). 08:48.54 Update issue 120 and update issue 198 08:48.54 Robin_Watts sebras: IIRC, we use the userdata pointer to store a context in or something. 08:49.55 hence it can't safely be cast to FILE *. 08:50.08 sebras Robin_Watts: tor7: I agree, so this is why I reset the pointer to NULL before calling the _v3 interface. 08:51.02 and yes that sucks. :-P 08:51.07 tor7 we should mention that their bug fix to issues 120 and 198 are broken 08:51.07 https://code.google.com/p/openjpeg/issues/detail?id=120 08:51.19 https://code.google.com/p/openjpeg/issues/detail?id=198 08:51.31 sebras tor7: if they hadn't made the assumption that the userdata is a FILE* then it would have been alright. 08:52.09 tor7 they fixed the issue by not thinking it through and abusing the existing semantics... 08:52.26 dbrgn hi. does the mupdf library for android support extraction of text/tables from pdf? 08:52.35 sebras tor7: agreed. 
08:52.35 tor7 they *should* have added a free_user_data callback function, then it would have worked 08:52.46 Robin_Watts sebras: I'd rather call the deprecated function, because the deprecated function does the right thing without requiring us to jump through hoops. 08:52.51 I hope they will undeprecate it in future. 08:53.02 or reprecate it :) 08:53.09 sebras Robin_Watts: they probably will create a opj_stream_destroy_v4()... 08:53.28 sebras wonders why they went for v3 instead of v2 though... 08:53.40 tor7: https://code.google.com/p/openjpeg/issues/detail?id=227 09:01.34 tor7: apparently winfried has already complained, but for a different reason. 09:01.51 paulgardiner tor7: I tried replacing the 64s by 256*64 and then dividing the returned advance by 256, but still I get the same slightly more spaced out text than with the direct div by x_ppem 09:02.30 tor7 paulgardiner: maybe there are more factors involved than just the x_ppem? 09:03.15 it could also be hinting related 09:03.20 paulgardiner I guess, although the results using div by x_ppem seemed to match my bounding box. Of course that could mean there are matching inaccuracies in pdf_measure_text 09:04.58 tor7 paulgardiner: x_ppem * x_scale 09:05.22 is what the results from FT_Get_Advance should be scaled by, if my reading of the docs are correct 09:05.53 paulgardiner: or it could just be that the x_ppem is rounded to pixels, and lost precision 09:06.40 paulgardiner: http://freetype.org/freetype2/docs/reference/ft2-quick_advance.html#FT_Get_Advance 09:07.48 you could get the metrics unscaled I guess, and divide by the face->units_per_EM 09:08.36 in fact, I think that's probably a better idea all around 09:09.08 than what I've been doing 09:09.15 http://freetype.org/freetype2/docs/reference/ft2-base_interface.html#FT_LOAD_XXX 09:10.09 NO_SCALE | NO_HINTING 09:10.36 actually, just NO_SCALE should do 09:10.53 paulgardiner Hmmm. 
units_per_EM is 1000 whereas the divisor I was using and seemed be working (x_ppem) is 16 09:11.04 Ah right. 09:11.16 I'll give that a go. 09:11.36 tor7 x_ppem is integer, where it ought to be float or fixed point 09:11.54 paulgardiner I was just thinking that given that 16 was working, 1000 wouldn't, but I missed that you were suggesting sending different flags 09:12.48 dbrgn no mupdf devs here? does it support content extraction from pdf files? 09:21.28 tor7 dbrgn: yes, we are here. yes, we support extracting all kinds of content from pdf files. 09:21.48 dbrgn tor7: great! thanks. 09:21.55 tor7 you'll just need to be more specific when asking questions :) 09:21.55 dbrgn tor7: sorry :) ok, so more specific: is it possible to extract images and text tables from pdf files on android? 09:22.50 tor7 dbrgn: mupdf is primarily a C library. the JNI bindings in the android app are not general purpose. we have a project planned to make more general bindings for JNI, but that's several months away at least. 09:23.12 paulgardiner :-) It still overflows the box. But I agree that what you've just suggested looks to be the best approach. I guess the overflow is caused elsehwere. 09:23.34 tor7 dbrgn: you could however base your work off the existing JNI bindings and adapt them to suit your needs. 09:23.41 dbrgn tor7: ok. but i guess in this case i could create my own specific JNI bindings? 09:23.44 tor7: ok, thanks. i'll take a look at it. 09:23.50 tor7 dbrgn: the "mudraw" tool has code to extract text in various amounts of detail 09:24.21 extracting tables as tables is tricky though, since all semantic information in a PDF is lost 09:24.41 PDF just has "draw this character at this position" information 09:24.54 the "text extraction" device in mupdf tries to reassemble the text back into paragraphs and columns and tables, but it doesn't always work 09:25.55 there are plenty of cases that are problematic 09:26.08 dbrgn yeah, i know about the difficulties. 
i'll try and see if i succeed :) 09:26.13 tor7 I'd suggest looking at the various flavors of mudraw -t output to see if they'll do what you want first 09:26.34 curiousity makes me ask, what do you intend to do with this? 09:26.58 dbrgn tor7: it would be great if i could extract the table on the second page here: http://www.skyguide.ch/fileadmin/dabs-today/DABS_20130726.pdf 09:27.38 tor7 the android app has a reflow mode (one of the button in the toolbar) that runs a pdf -> html extraction and displays the resulting html in a webview 09:27.43 dbrgn tor7: as well as the first page as image. 09:28.02 chrisl paulgardiner: I'd expect the advance width to be larger than the width of the bbox - if I follow correctly, that's what you're seeing? 09:28.18 paulgardiner tor7: Do you think I should use FT_Get_PFR_Metrics to get units_per_EM, or is there something better? 09:28.20 chrisl: Yes, I see what you mean, but it's not that. I'm seeing the text going outside the box returned by pdf_measure_text. 09:29.35 tor7 dbrgn: the image on the first page is an image we can extract. I was worried it'd be line art, in which case you'll need to render the page to get a bitmap out 09:29.43 dbrgn tor7: ok, that sounds great. 09:30.01 tor7 getting the table data out as a table, we can't do that very well yet 09:30.48 our text-assembly doesn't recognize the table cells as such 09:31.24 but you can try your own assembly by looking at the raw text data yourself of course 09:31.45 dbrgn ok, but maybe i can get the content line-by-line? 09:31.52 chrisl paulgardiner: do you know what freetype the bbox from pdf_measure_text() comes from? 09:32.04 s/freetype/freetype call 09:32.14 tor7 paulgardiner: pdf_measure_text uses the /Widths array in the font descriptor. is this a base14 font you're trying? 09:34.01 if it's a substitute font I'd expect the metrics from freetype and pdf_measure_text to vary wildly 09:34.19 paulgardiner tor7: It's one I created without a /Widths array. 
09:34.25 So I'm guessing we create are own hmtx table 09:34.44 tor7 yeah. the hmtx table is seeded with the FT_Get_Advance widths 09:35.08 paulgardiner More of a problem at the moment is getting access to units_per_EM. 09:35.31 tor7: Right. So it should be consistent 09:35.41 tor7 paulgardiner: actually, that's a slight lie. the hmtx table is seeded with the advances returned by FT_Load_Glyph 09:36.29 paulgardiner It also uses fz_bound_glyph. It could be just the width of the last char that is being miscalculated 09:36.41 tor7 those sometimes vary minutely from the metrics returned by FT_Get_Advance 09:36.42 due to the various bits of rounding in the processing steps 09:36.48 and it uses fz_bound_glyph for the glyphs at the pen locations. I would guess that the final pen location is outside the bbox (since the next glyph would start beyond it) 09:37.55 paulgardiner: to be more consistent, we should rename pdf_measure_text to pdf_bound_text 09:39.02 paulgardiner Yeah 09:39.12 tor7 it returns the bounding box, the other measure functions return the advance metrics 09:39.16 the pdf_text_stride should have the pdf_measure_text name :) 09:39.36 paulgardiner Didn't understand what you said about the final pen location. I'd have though that would cause the box to be overestimated. 09:40.10 I'm probably misunderstanding what you meant 09:40.22 tor7 the bbox of "abcd" would cover those glyphs 09:41.10 the pen location after advancing through "abcd" would be past the end of the bbox 09:41.21 ready where the "e" would start, if there had been an "e" 09:41.35 paulgardiner Yeah, but I'm seeing text slightly truncated 09:41.51 So part of the d is missing 09:42.04 tor7 so the pen location is short? oh. that's odd. 09:42.15 paulgardiner The box doesn't quite encompasse it 09:42.16 tor7 now I'm confused. 
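The difference between the bounding box of a run of glyphs and the final pen position, as tor7 explains it above, can be sketched like this. The glyph advances and extents are made-up numbers in 1000-unit glyph space, and `measure_run` is a hypothetical helper, not the actual pdf_measure_text implementation.

```python
def measure_run(glyphs):
    """Return ((xmin, xmax), pen) for a horizontal run of glyphs.

    Each glyph is (advance, (left, right)), with the ink extents given
    relative to the glyph origin. The bounding box is the union of the
    ink extents at each pen position; the pen keeps advancing after the
    last glyph, so it normally ends up past the right edge of the box.
    """
    pen = 0
    xmin = None
    xmax = None
    for advance, (left, right) in glyphs:
        x0 = pen + left
        x1 = pen + right
        xmin = x0 if xmin is None else min(xmin, x0)
        xmax = x1 if xmax is None else max(xmax, x1)
        pen += advance
    return (xmin, xmax), pen


# Four illustrative glyphs standing in for "abcd" (values are invented).
run = [
    (556, (36, 530)),
    (556, (58, 517)),
    (500, (30, 477)),
    (556, (26, 500)),
]
bbox, final_pen = measure_run(run)
```

For this data the box ends at 2112 while the pen ends at 2168, which is the normal situation: the pen sits where the next glyph would start. Text being cut off inside the box, as paulgardiner reports, points to a mismatch elsewhere.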
09:42.36 paulgardiner When I was simply dividing by x_ppem the text fit within the box that pdf_measure_text calculated 09:43.01 tor7 the pdf_measure_text bbox, is that narrower or larger than the FT_Get_Advance sum? 09:43.05 paulgardiner pdf_Measure_text uses the advance to decide the char positions, and for each calls fz_bound_glyph for each and unions the results 09:44.19 But with less "for each"s 09:44.43 tor7 the char position x is += h.w after each glyph. but the last position is never used. 09:45.16 so far so good? 09:45.24 paulgardiner Oh. There's a hardwired 1000.0! 09:46.12 tor7 the hmtx is in 1000.0 font space (that's hardcoded in so many places in the pdf spec) 09:46.34 paulgardiner I wonder if that's the units_per_EM that I'm struggling to find an API to interrogate 09:46.40 Oh Okay 09:46.53 tor7 paulgardiner: for type1 fonts, it is usually 1000 09:47.00 for ttf fonts, more commonly 2048 or 1024 09:47.07 ft_face->units_per_EM is right there in the face struct 09:47.31 paulgardiner Maybe I need another include 09:48.15 tor7 http://freetype.org/freetype2/docs/reference/ft2-base_interface.html#FT_FaceRec 09:48.17 that is part of the base freetype.h include 09:49.03 paulgardiner I thought the fact that I could call FT_Get_Advance implied I had whatever freetype include I needed, and the lack of units_per_EM was because the struct was opaque. 09:50.36 Robin_Watts tor7: fz_pixmap_bbox could be fz_bound_pixmap - but it's an irect not a rect. 09:50.38 tor7 robin_watts: yes, I think that's the reason for the different name 09:51.04 sebras tor7: Robin_Watts: I posted a recommendation and a potential patch for them to look at over at https://code.google.com/p/openjpeg/issues/detail?id=227 09:51.14 in the meantime I guess I revert my patch at sebras/master 09:51.28 Robin_Watts sebras: Either your patch to openjpeg or just reverting the deprecation would seem fine to me. 
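The unit conversion discussed above can be sketched as plain arithmetic: an advance loaded unscaled (in font units) is divided by the face's units_per_EM and multiplied by the 1000 that PDF glyph space uses. The function name is hypothetical and this is not MuPDF code.

```python
PDF_GLYPH_SPACE_UNITS = 1000.0  # the hardwired 1000.0 mentioned in the log


def font_units_to_glyph_space(advance_font_units, units_per_em):
    """Convert an unscaled advance (font units) to PDF 1000-unit glyph space."""
    if units_per_em <= 0:
        raise ValueError("units_per_em must be positive")
    return advance_font_units * PDF_GLYPH_SPACE_UNITS / units_per_em


# A Type 1 style face (1000 units per em) needs no rescaling.
type1_width = font_units_to_glyph_space(556, 1000)

# A TrueType style face (2048 units per em) does.
ttf_width = font_units_to_glyph_space(1024, 2048)
```

Working from unscaled font units avoids the integer x_ppem value entirely, so the precision loss tor7 suspects from rounding to pixels does not enter the calculation.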
09:52.23 tor7 sebras: maybe that should be set_user_data_v3 09:52.27 Robin_Watts tor7: No, _v3 would seem to insist that userdata == FILE *. 09:52.47 sebras tor7: I thought about it, but I sincerely dislike versioning interface... 09:52.51 interfaces. 09:53.03 tor7 me three, but when in rome... 09:53.30 sebras tor7: if they want to have _v3 they'll have to add it themselves. :-P 09:54.09 tor7 I'm thinking of backwards compatibility with existing source, which I guess is their motivation for _v3 09:54.32 but let them deal with it 09:54.53 sebras mmm. I'll have breakfast instead! :) 09:55.03 tor7 robin_watts: we could just remove the deprecation in the opj headers in our bugfix branch 09:55.14 and get rid of the warnings that way 09:55.19 anyway, I think it's time to tag and cook a release candidate... 09:55.44 robin_watts, paulgardiner: the version string in android "1.2 (Build 50/armv7a)", how is that to be updated? 10:04.03 Robin_Watts tor7: strings.xml in the android/res dir. 10:23.52 I update that when I do a build. 10:23.58 tor7: Removing the deprecation warnings from our branch would be the neatest solution, IMHO. 10:37.42 tor7 robin_watts: okay, so no magic script. 10:47.34 Robin_Watts no, sorry. 10:48.09 The ant build system seems to cope poorly with the idea of having to build different configurations 10:48.45 tor7 I'll just bump it to 1.3 and leave the build and arch for you when you do the builds. 10:48.50 Robin_Watts googles solution is "copy the code into several directories and maintain several versions" 10:49.09 tor7 robin_watts: ugh. 10:49.21 Robin_Watts No magic "we'll cpp the source tree and build it several times according to this config script" or anything 10:49.40 tor7 robin_watts: is the string on tor/master now okay with you? 10:49.47 robin_watts: even we can do that with our fairly simple makefile based system... 10:50.27 Robin_Watts I'd be tempted to just leave it as 1.3 (GIT Build) 10:50.33 tor7 right. that'd work too. 
10:50.49 Robin_Watts as otherwise people will assume that $REV/$BUILD should be filled out. 10:50.59 tor7 re-fetch 10:51.29 Robin_Watts I have to run helen to the station in a mo. She has to take the dogs on the train. should be "fun" 10:51.36 tor7 my sympathies... 10:51.46 Robin_Watts looks good. 10:51.50 tor7 I'll make some tar balls and builds then. 10:52.02 I'll leave the android to you 10:52.06 when you get back 10:52.13 no point worrying about iOS for a while... apple's developer site is still down 10:52.30 https://developer.apple.com/support/system-status/ 10:53.28 paulgardiner tor7: can you remind me how, given an fz_font, I can tell it's a base 14? I thought we decided the data would be pointing to a known array, but I'm seeing ft_data NULL here. On the other hand the name is "Helvetica" which is perhaps sufficient. 11:07.07 tor7 paulgardiner: let me check 11:07.29 ft_data may be null because it's pointing to static data that shouldn't be freed 11:07.41 so that may indeed be the indication you're looking for 11:07.49 paulgardiner So perhaps if data non NULL throw. If NULL use the name as Basename in the pdf descriptor I create? 11:08.41 Maybe I should check that the name is one of the base 14 names, but perhaps that is unnecessary because of the data NULL check. 11:10.53 tor7 paulgardiner: pdf_load_embedded_font (and xps_load_font) both set ft_data 11:15.45 so base14 and substitute fonts have ft_data==NULL 11:15.57 and substitute fonts have ft_substitute set to true 11:16.15 paulgardiner Right. Thanks 11:16.29 tor7 so a check for ft_data==NULL && !ft_substitute should tell you whether it's a base14 font or not 11:16.59 alternatively we can add an is_base14 flag to fz_font 11:17.23 paulgardiner And if it is base14 I can use the name, it seems 11:17.26 tor7 what do you need the name for? 11:17.35 for the font descriptor? 11:18.01 paulgardiner So that the pdf device can fill in Basefont in the descriptor it creates 11:18.10 Yep 11:18.13 tor7 that should work. 
the font->name gets passed in when the font is constructed. 11:19.25 paulgardiner Great. 11:19.39 tor7 for base14 fonts, it's always been through the base_font_names cleanup array. 11:19.42 so you could make a more future-proof is-base14-test by just comparing the font name 11:20.33 but that might still give some false positives 11:20.52 in case it's an embedded font with the same name 11:20.59 so nevermind that idea 11:21.11 paulgardiner I just need to map the gids to winansi and then I think I have first version we can commit 11:23.22 Is that likely to be something we have done elsewhere? 11:26.46 tor7 pdf-unicode.c 11:31.24 but only half of it :) 11:31.30 paulgardiner Bleh! I guess I need to get FT to tell me the corresponding character name and then reverse look it up in pdf_win_ansi 11:31.42 tor7 paulgardiner: pdf_load_tounicode maps from encoding to gid, and then gid back to unicode, and saves the result as a cmap 11:32.00 what you need is from gid to unicode (or winansi) in the pdfwrite device right? 11:32.34 because when creating the text object, you just need to map unicode to gids 11:32.47 paulgardiner: I wonder, is winansi not a subset of latin-1? 11:33.16 paulgardiner Well that fact that I'm cheating at the moment and passing the winansi char to fz_add_text, rather than unicode might help :-) 11:34.02 tor7 :) 11:34.34 that won't render too well if you pass the same fz_text object to a draw device :P 11:34.49 paulgardiner I know. But I promise never to do so. 11:35.17 So winansi subset latin-1 means? Would that imply that all the unicode chars I get back in this situation don't need converting? 11:36.16 i.e. winansi => gid => unicode is the identity 11:38.26 Ah. The pdf device is already using it->ucs when outputting Tj. So for WinAnsi that should just work. 11:42.10 That was an easy change: zero code. 11:42.26 tor7 ew. 
that it->ucs thing won't work too well in the real world, but for starters I'm sure it'll do 11:43.42 winansi is a superset of latin-1 with some characters being different... 11:44.11 so not strictly unicode 11:44.17 windows codepage 1252 11:44.59 paulgardiner I could shoot for a different restriction of use. I could demand that ucs is set to unicode. Then I need to convert from unicode to winansi, rather than from gid. 11:48.24 I mean not handle the case where we have no known mapping to unicode, which is not much of a restriction seeing as we are looking to handle only WinAnsi at the moment. 11:51.51 And then assume that in fz_text items ->ucs is set correctly, which presumably it isn't for the text from some pdf docs. 11:52.46 tor7 ucs *is* unicode. but ucs is *not* the glyph that's printed on the page. it may not even be set. there can be a N-to-M mapping between it->gid and it->ucs by either one being set to -1 12:12.06 of course, that won't be the case for any base-14 fonts 12:12.41 but it can and will happen for embedded fonts (usually in the XPS case) 12:13.03 so for base14 I think your assumption is safe enough (but only by lucky coincidence) 12:13.46 paulgardiner: on the other hand, I don't think it should be that difficult to map from gid -> winansi using a scheme similar to what exists in pdf_load_unicode 12:14.38 oh... but the pdf_load_unicode function has a big TODO shaped hole :) 12:15.33 paulgardiner Oh okay. I was hoping that it->ucs would be reliable in the XPS case. Not so good a restriction then 12:16.06 tor7 right where the logic I thought you could reuse is supposed to be :) 12:16.08 it->ucs is only reliable for the base14 fonts 12:16.26 paulgardiner Not also for fonts that have a tounicode cmap? 
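A small sketch of tor7's point that WinAnsi is not simply Latin-1. It uses Python's built-in cp1252 codec as a stand-in for WinAnsi, which is an assumption: PDF's WinAnsiEncoding may differ from the codec in a few code points. The function name is made up for illustration.

```python
def cp1252_latin1_differences():
    """Map each byte whose cp1252 meaning differs from Latin-1.

    The value is the cp1252 character, or None where Python's cp1252
    codec leaves the byte undefined.
    """
    diffs = {}
    for byte in range(256):
        raw = bytes([byte])
        latin1_char = raw.decode("latin-1")
        try:
            cp1252_char = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252_char = None
        if cp1252_char != latin1_char:
            diffs[byte] = cp1252_char
    return diffs


diffs = cp1252_latin1_differences()
```

Every difference falls in the range 0x80 to 0x9F, where Latin-1 has control codes and cp1252 has printable characters such as the euro sign. Passing a WinAnsi byte value through as if it were Unicode is therefore only safe outside that range.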
12:16.49 tor7 even regular embedded or substitute fonts may not have a usable to_unicode table, which will put garbage in the ucs 12:16.53 when you are creating the fz_text object, you are mapping from a unicode(-ish) encoding using freetype to the gid 12:18.03 what I'm suggesting is to create a reverse of that encoding table and use that in the pdfwrite device to get back unicode or winansi values 12:18.40 for embedded fonts, you can cheat and use a CIDFont fontdescriptor with an Identity-H encoding and just use the gid values directly 12:19.14 for substitute fonts, I don't have a plan yet 12:19.27 paulgardiner Yeah, I got that. Was just taking a detour wondering if the ucs value could be used instead, but I guess not then 12:19.28 tor7 right. in the general case, we won't want to. but for the narrow case of getting base14 to work, you can take the shortcut of using the ucs. 12:20.05 it all depends on how hard it'll be to create a reverse lookup table 12:20.29 mapping from gid -> unicode using freetype's tables 12:20.50 the biggest problem here is that simple fonts (as they're called in PDF) can't do more than 255 characters 12:21.49 so anything outside WinAnsi for base14 fonts can't be represented in our scheme 12:22.11 (not that I think it really matters for now) 12:22.31 paulgardiner Yeah. I was just hoping to make as much of the code I put in also correct for the more general case. 12:22.56 I'm wondering now whether to stick with what I have for the initial commit, seeing as making use of ucs isn't a longterm solution 12:23.55 At the moment I'm pretending that WinAnsi is a subset of unicode 12:24.33 sebras tor7: I have a pdf here which has a space after the last word of at the end of every line. when I copy/paste the text, do we want to keep this, or do we want to trim the initial/trailing whitespace? 12:24.42 tor7 sebras: I don't know. 12:25.11 sebras tor7: I can't imagine there being a case where you'd want to keep those spaces. 
newlines is a different case of course! :) 12:25.50 tor7 it's a matter of policy, how much to clean up :) 12:26.08 do we also want to remove double spaces after a period? 12:26.16 sebras I would vote yes on that too, but I know there are differing opinions on that. 12:29.02 I'm cutting a pasting a bit from this pdf so I noticed that I need to remove the trailing whitespace which set me of thinking about it. 12:29.47 tor7 sebras: if there are trailing space characters at the end of lines, it makes it easier to recover hyphenated words 12:32.56 but we can't rely on that anyway, so I guess we could just strip whitespace off the ends of copied (and extracted) text 12:33.14 sebras mmm, I'm thinking that we want to retain the whitespace for mudraw -t though. 12:33.37 because in that case we want it to be 100% accurate. 12:33.45 stripping the whitespace is more of a convenience. 12:33.53 tor7 even that isn't 100% accurate, we insert spaces using heuristics 12:34.02 sebras true, hm.. 12:34.15 after the rc! :) 12:34.24 Robin_Watts reads the logs "fonts fonts fonts...". OK I didn't miss anything :) 12:36.50 zeniko Robin_Watts: ping 12:47.10 Robin_Watts zeniko: hi 12:47.25 zeniko I've managed to test my openjpeg submodule changes on the cluster 12:48.20 Looks like you have not yet had the time to merge the additional test files I've sent you 12:48.38 Robin_Watts zeniko: no indeed, not yet. 12:48.55 zeniko There seem to be some hundred differences with Shelly's changes, 12:50.20 but most of them seem minor enough that I couldn't tell whether they're progressions, regressions or simply noise 12:50.46 I also still have to run the changes through the fuzzing files locally (since I assume they aren't tested by the cluster) 12:51.42 Another question was: does the cluster also run through the XPS files? 12:52.20 Robin_Watts I don't think so. 12:52.52 zeniko Would it be possible to add them (after the holiday and release season)? 
12:53.32 Robin_Watts probably, (sorry, phone went) 12:55.22 Just looking now. 12:55.34 zeniko marcosw: FYI: my bmpcmp e-mail notifications consist mainly of errors ("/bin/sh: 1: pkg-config: not found", "cp: cannot stat `/home/marcos/cluster/users/zeniko/ghostpdl/urwfonts': No such file or directory" and "make: pkg-config: Command not found"), the website works fine, though 12:56.58 Robin_Watts zeniko: OK. A mupdf cluster test includes the .xps files. 12:58.48 at least the xpsfts ones. 12:58.56 zeniko Robin_Watts: Thanks. I'd be great if you could also include the ones I've sent you (which cover multiple edge cases MuXPS still gets wrong) 13:00.30 Robin_Watts It's possible those haven't made it into our cluster repo yet :( 13:01.12 I'll need to talk to marcosw about that. 13:01.27 paulgardiner tor7, robin_watts: Initial commits for freetext-annotation support are on paul/signature-appearance 13:09.11 zeniko Robin_Watts: thanks, but first enjoy your holidays! 13:12.44 Robin_Watts tor7: When you talk about me building android stuff "when I get back", did you mean "when I get back from dropping Helen at the station" or "when I get back from holiday" ? 13:56.18 The latter I hope. 13:56.20 tor7 robin_watts, paulgardiner: mupdf.com/news2 15:19.46 Robin_Watts gimme 5 mins. 15:19.58 dbrgn can mudraw output image data to stdout 15:20.54 ? 15:20.58 Robin_Watts dbrgn: You could try "-o -", but I suspect not as standard. Trivial change though. 15:22.26 dbrgn Robin_Watts: nope, "-" is not implemented. 15:22.53 paulgardiner tor7: LGTM 15:23.40 tor7 dbrgn: -o /dev/stdout 15:23.58 Robin_Watts tor7: Looks good to me. 15:24.16 tor7 robin_watts, paulgardiner: thanks. it's live now! 15:25.28 chrisl: henrys: MuPDF 1.3 RC 1 is out. 15:25.38 Robin_Watts did you tag it? 15:26.04 henrys tor7:early this time. 15:26.39 dbrgn tor7: doesn't seem to work. possibly because it needs the filetype suffix. 
15:27.08 tor7 robin_watts: tag pushed to origin 15:27.24 henrys: robin_watts is going on vacation, figured we'll do the big release once he gets back. 15:27.42 and now's a good time, there's a lull in the commits and changes :) 15:28.03 Robin_Watts dbrgn: Indeed, it looks for the suffix. 15:33.08 but if it doesn't find one, it'll send PNG format data. 15:33.51 dbrgn Robin_Watts: hm, but "mudraw -r 120 -o image.png DABS_20130726.pdf 1" works while "mudraw -r 120 -o /dev/stdout DABS_20130726.pdf 1" doesn't (return code 0). 15:35.27 there is simply nothing returned. 15:35.33 Robin_Watts and the changes needed inside mudraw aren't trivial if we want them to work portably on windows too. 15:35.47 dbrgn: Yes, you may be out of luck. 15:36.27 Can you use named pipes? 15:36.39 tor7 robin_watts: consider adding a flag to let "mudraw -f pgm -o -" work? 15:36.44 Robin_Watts tor7: The problem is we call fz_write_pnm etc. 15:37.03 and they take a char*filename. 15:37.13 henrys marcosw:you here? 15:37.14 tor7 robin_watts: oh, right! 15:37.15 dbrgn Robin_Watts: named pipes would probably work, but then you're using the filesystem and can use regular files as well. 15:37.25 Robin_Watts so we'd need to push the - => stdout thing in there, and that's nasty. 15:37.31 tor7 they could take "-" as the filename and use stdout, portably enough 15:37.31 but yeah, that's nasty 15:37.38 -f format -o /dev/stdout should be easier 15:37.54 Robin_Watts possibly we should generalise those functions to take a fz_output *. 15:38.29 then the filename versions can be trivially implemented in terms of them. 15:38.56 Less than 24 hours til I leave the house for holiday. I should probably pack at some point. 15:39.45 tor7 robin_watts: I thought we sort of did 15:43.23 Robin_Watts sort of did what? 15:48.18 ablemike Hi all, I have a small bash script that I am using to crop PDFs using CropBox 15:49.46 I am a bit confused as to what to search next. 
15:49.53 My PDFs are definitely being cropped and rendered correctly in viewers that respect the CropBox definitions. 15:50.19 However, I need to actually take that CropBox and make a true crop, EG. actually trimming the paper size down to that dimmension. 15:50.56 dimension* 15:51.04 any help here is much appreciated. 15:51.47 Robin_Watts ablemike: It sounds to me like you want to trim the MediaBox to be the same as the CropBox. 15:52.44 I am not aware of any tools that will do that for you. 15:52.54 ablemike if I can access MediaBox via CLI that might work :) 15:53.14 I just didn't know the terms 15:53.22 henrys for most folks we just say have a nice vacation but for Robin_Watts "Good Luck" is more appropriate! 15:56.17 paulgardiner As in "Don't get eaten"? 16:00.22 henrys or hope the inoculations were effective 16:02.35 tor7 robin_watts: sort of generalise the functions to take a fz_output. 16:03.08 Robin_Watts henrys: Didn't need any innoculations. 16:03.25 tor7 robin_watts: where are you off to for vacation? 16:03.40 Robin_Watts Namibia 16:03.45 tor7 ah, right! I should've remembered that :) 16:04.19 stay safe! don't get mugged. 16:04.30 Robin_Watts Yeah. Helen promises me that car jacking doesn't happen in Namibia, just in south africa. 16:05.04 chrisl There's no point wishing Robin a good/safe holiday now, as he'll be back online in the car to Heathrow, in Heathrow, the airport at the other end.......... ;-) 16:06.11 Robin_Watts chrisl: I have booked an airport lounge with wifi while we wait for the flight in south africa, yes :) 16:06.43 chrisl So, better to wait until you actually are off into the wilderness! 16:07.06 Robin_Watts tor7: I think we should have fz_output_pnm that takes an fz_output 16:07.47 marcosw1: Hey. 16:07.51 I've added some new pdfs from zeniko into the tests_private/pdf/sumatra directory. 
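Robin_Watts's suggestion, writers that take an output stream with the filename versions built on top, could look roughly like the sketch below. It is Python rather than C, the function names are hypothetical and are not MuPDF's API, and treating "-" as standard output is the convention floated in the discussion rather than existing mudraw behaviour.

```python
import io
import sys


def write_pgm(out, width, height, samples):
    """Write 8-bit grayscale samples as a binary PGM (P5) to a binary stream."""
    if len(samples) != width * height:
        raise ValueError("sample count does not match width * height")
    out.write(b"P5\n%d %d\n255\n" % (width, height))
    out.write(bytes(samples))


def write_pgm_file(filename, width, height, samples):
    """Filename wrapper around write_pgm; "-" selects standard output."""
    if filename == "-":
        write_pgm(sys.stdout.buffer, width, height, samples)
        sys.stdout.buffer.flush()
        return
    with open(filename, "wb") as handle:
        write_pgm(handle, width, height, samples)


# The stream version can be exercised without touching the filesystem.
buffer = io.BytesIO()
write_pgm(buffer, 2, 1, [0, 255])
```

Keeping the format logic in the stream version means the stdout special case lives in one thin wrapper instead of inside every format writer.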
16:08.14 I've also added his xps files into tests_private/xps/sumatra and enabled that dir in build.pl 16:08.41 marcosw1 robin_watts: yeah, that's a bit of problem. 16:08.49 henrys marcosw1:did you actually build 801's system? 16:08.53 Robin_Watts I note that we aren't testing the ms xps files. 16:08.59 marcosw1: oh, how so? 16:09.06 marcosw1 Two of the files have utf-8 characters in their name, this confounds the cluster 16:09.07 henrys marcosw1:looks like a windows setup 16:09.15 marcosw1 Fri Jul 26 09:03:31 PDT 2013: svn: svn: Can't convert string from 'UTF-8' to native encoding: 16:09.16 Fri Jul 26 09:03:31 PDT 2013: svn: svn: /home/marcos/cluster/tests_private/pdf/sumatra/0_-_opwd_?\195?\164?\195?\182?\195?\188?\226?\130?\172.pdf 16:09.16 Fri Jul 26 09:03:31 PDT 2013: svn: svn: Can't convert string from 'UTF-8' to native encoding: 16:09.16 Fri Jul 26 09:03:31 PDT 2013: svn: svn: /home/marcos/cluster/tests_private/pdf/sumatra/0_-_password_?\195?\164?\195?\182?\195?\188?\226?\130?\172__fails__.pdf 16:09.16 Robin_Watts marcosw1: Oh, sorry. Delete those files? 16:09.23 marcosw1 Presumably renaming them would work, but I can't do an svn_update so I'm not sure how to proceed. 16:09.52 I've temporarily disabled the svn update tests_private step to fix the cluster 16:10.28 Robin_Watts marcosw1: If we rename them, we lose the password, right? 16:10.50 hence just deleting them seems easiest. 16:10.58 want me to do that? 16:11.02 marcosw1 robin_watts: the password should be stored in a file with the same name as the pdf with the extension .pwd (see 0_-_password_letmein.pdf.pwd 0_-_password_password_crypt_level_5.pdf.pwd) 16:11.42 robin_watts: since I can't do an svn update to get the files I don't think I can rename or delete them 16:12.11 Robin_Watts I will try. 16:12.17 marcosw1 I don't think the files in tests_private/xps/sumatra are included in a cluster run, I'll add that directory. 
16:13.17 actually they are, but only for GhostXPS 16:13.47 Robin_Watts marcosw1: I added that dir about 1/2 hour ago. 16:14.15 You'll note I added the MS directories too, but left them commented out. 16:14.32 Is there a reason we don't test those files? 16:14.38 marcosw1 henrys: I did not build customer #801 project, I don't have a windows setup. In any case, I'm not sure what that would help, I presume they are correct re. the missing blue plane. 16:15.22 robin_watts: give me a sec to look at the build.pl code 16:16.10 the tests_private/xps/sumatra files are being included in a mupdf test 16:19.12 Robin_Watts marcosw1: Right, cos I added them :) 16:21.01 But the ms ones aren't being tested either for ghostxps or mupdf. 16:21.18 marcosw1 sorry, I thought you were saying that wasn't working 16:21.34 Robin_Watts no, I was just telling you that I'd been fiddling. 16:22.08 marcosw1 is there a reason you left the ms ones disabled? 16:22.10 Robin_Watts Well, the sumatra ones are new, and we specifically wanted them tested, so I enabled them. 16:22.40 marcosw1 robin_watts: if we were better about committing the changes to the cluster code I could just read the logs and figure out what's been going on :-) 16:22.48 Robin_Watts The ms ones have been there for a while, and weren't being tested, so I wanted to check with you as to why they weren't being tested already in case there was a good reason. 16:23.23 marcosw1 I don't think there is any reason, just fewer tests means faster cluster runs. 16:24.33 Robin_Watts marcosw1: OK, want to try svn update now ? 16:25.02 marcosw1 works. 16:25.31 marcosw1 thinks that miles has developed a hardware fault, it's gone down twice this week (points is my fault, I rebooted it after a configuration change). 16:28.37 henrys marcosw:I know you keep gs executable for testing do you also have old pcl's so we can narrow down the performance issue just reported? 
16:30.52 marcosw1 ^^^ 16:31.29 marcosw1 henrys: no, ghostpcl regressions are so rare that I just run git bisect when they do occur. 16:31.52 Robin_Watts marcosw1: You suckup :) 16:33.07 henrys the key is not to change it ;-) 16:33.41 marcosw1:just from looking at the email I'd say if mono blame Robin_Watts else blame mvrhel_laptop ;-0 16:36.05 mvrhel_laptop: are you around? 16:37.13 mvrhel_laptop henrys: yes 16:46.03 henrys: is this the cyan and blue data bug? 16:46.56 henrys if you have any great ideas about 801 and pcl color I'm all ears. PCL really needs to have an RGB con tone device to work properly. 16:47.01 mvrhel_laptop 694435? 16:47.01 henrys: let me look it over and get caught up. i was surprised to see non-RGB used for PCL 16:47.40 robin_watts: if I miss you before you sign off for the day, have a great holiday 16:48.27 Robin_Watts thanks. 16:48.45 mvrhel_laptop henrys: so how does PCL behave when going out to a CMYK device now? I know we had this discussion many times with respect to approximations etc. The "right" way to do this is to have a device that maintains the RGB conton buffer until the end and then does the mapping from RGB to CMYK + spot 16:54.45 henrys it simply doesn't behave correctly with cmyk. 16:55.35 we used to have a some rube goldberg contraption but we've gotten rid of it. 16:56.22 pcl is completely rgb I can see how you would get K but not sure how there would ever be a spot color. Is there something in icc that would map an rgb triple to something that used the spot color? 16:58.08 mvrhel_laptop henrys: yes 16:58.21 my thought is this 16:58.26 ray_laptop if your device profile has 5 components out, then that can produce 'Blue' for certain selected bluish colors, right ? 16:59.44 mvrhel_laptop similar to the set up that we had before, we add in another profile option that is a device link profile that can map from RGB to CMYK + spots which will get applied in a final step by the device. 
to the graphics lib, the device behaves like RGB and the graphics lib would not have any knowledge about this profile 17:00.49 ray_laptop: yes, we can do that now. we can specify N-color ICC profiles for tiffsep and psdcmky 17:01.12 ray_laptop other than picking the right RGB, PCL doesn't have any way to directly map to Blue 17:01.23 henrys mvrhel_laptop: obviously a band at a time. 17:02.16 mvrhel_laptop what we are talking about here, is another device that is really RGB but will use a N-color device link profile at the end 17:02.16 yes 17:02.20 ray_laptop Not sure how they come up with such an ICC profile that mixes in a blue colorant sometimes, but I've seen 6 color profiles that use Orange and Green 17:02.32 mvrhel_laptop well that is for the customer to worry about in my opinion. How they want that generated is another problem. It is no profile to specify with an ICC profile 17:03.32 ray_laptop in the package they posted, it had some ICC profiles, iirc 17:03.35 mvrhel_laptop oh ok. 17:03.43 henrys: we could possible alter the psdrgb device to demo this 17:03.58 ray_laptop mvrhel_laptop: they sent their entire code snapshot 17:04.19 mvrhel_laptop oh 17:04.23 so we could just fix up their device 17:04.37 ray_laptop Since it was in henrys' lap, I didn't look into how they set their device profiles 17:04.53 mvrhel_laptop me either. but I am getting a sinking feeling.... 17:05.12 ray_laptop why should I have all the fun ? Let henrys enjoy, too ;-) 17:05.25 henrys mvrhel_laptop, ray_laptop:I'm just starting setting up their system now. I'll let you know. 
17:05.57 I was surprised they sent windows code - I thought they were linux 17:06.17 ray_laptop The "gotcha" for PCL is that RasterOps need to be able to read back colors that have been painted -- AS RGB 17:06.35 mvrhel_laptop right 17:06.47 that keeps us with rgb contone as far as the device appears to the graphics lib 17:07.03 if we want to keep our sanity 17:07.20 ray_laptop so hopefully their ICC profile is bi-directional and can get us from CMYKB backwards 17:07.27 mvrhel_laptop I dont think one can rely upon that 17:07.49 henrys ray_laptop:no mvrhel_laptop is saying we have to keep the rgb 17:08.00 as I parse it. 17:08.10 mvrhel_laptop yes. 17:08.15 that I believe is the safest approach 17:08.23 ray_laptop oh, and then transform the RGB buffer to CMYKB at the end ? 17:08.29 mvrhel_laptop yes 17:08.34 ray_laptop OK. That _does_ make it a lot simpler (if lower performance). 17:09.17 mvrhel_laptop yes, but from my limited understanding of PCL needs and my knowledge of ICC round tripping I think it is the better approach 17:10.02 ray_laptop and we are in the same boat as with TIS. We don't know if non-idempotent RasterOps are used until the page is done. 17:10.12 henrys bbiam 17:10.31 mvrhel_laptop TIS? 17:10.56 ray_laptop so we have to select between a 3 component imaging device if non-idempotent RasterOps are needed, or the (preferred) 5-component CMYKB imaging device if not. 17:11.59 because we really want to avoid having to transform the entire buffer 17:12.21 mvrhel_laptop so do you know that ahead of time 17:13.44 ray_laptop their target printers are > 150 ppm and PCL can be fast. Adding a full transform step to text pages would slow things down 17:13.45 mvrhel_laptop I agree it would slow things down 17:13.57 ray_laptop mvrhel_laptop: in clist mode we do. 
Collecting that info may have bit-rotted slightly, but we can fix it if so 17:14.28 mvrhel_laptop certainly know if a band is blank or even has color would help 17:14.28 ray_laptop mvrhel_laptop: well we know if it has color (the code you added) :-) 17:14.55 mvrhel_laptop right 17:15.00 that would keep things from slowing down for a lot of documents 17:15.24 ray_laptop mvrhel_laptop: but I think that was collected on a page basis only. Unlike what I did for PDF which is maintained for each band 17:15.41 mvrhel_laptop ray_laptop: you are talking about the presence of non-idempotent RasterOps? 17:16.55 Robin_Watts mvrhel_laptop: Just skimming the discussion so far. 17:17.03 Perhaps what we want is a call to getbits that will convert from rgb to cmyk. 17:17.29 using mvrhel's profile. 17:17.39 That way we minimise the special code in the device. 17:18.00 henrys the profile is invertible? 17:18.28 mvrhel_laptop henrys: not guaranteed. 17:18.44 ray_laptop robin_watts: The key thing is that we don't want to have to convert on every page 17:18.49 for the entire page 17:18.55 henrys so get bits won't work. 17:19.00 Robin_Watts henrys: getbits will work fine if we draw EVERYTHING in rgb, and then the final 'getbits' done by the device asks for the conversion to cmyk. 17:19.38 But that idea doesn't fit with rays idea of avoiding transforming the entire buffer. 17:20.00 mvrhel_laptop unless the getbits had some other information about the buffer/band 17:20.17 ray_laptop Since it is such a special device, I think just doing the conversion in their device is fine, sort of like what is done for monochrome mode when page_is_neutral 17:20.19 Robin_Watts essentially, I was proposing that we do what mvrhel_laptop initially suggested (render entirely in rgb, then convert to cmyk in the device), but was trying to avoid having to have that rgb->cmyk step in every device. 17:21.01 I'm not sure I see how ray_laptop hopes to avoid converting every pixel. 
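One way to picture the "render in RGB, convert at the end, but skip cheap bands" idea discussed above is the sketch below. It is a toy under stated assumptions: `toy_link` is a naive RGB to CMYK mapping with an always-zero fifth channel standing in for a real N-color device link profile, and `classify_band` and `convert_page` are hypothetical helpers, not Ghostscript code.

```python
WHITE = (255, 255, 255)


def classify_band(band):
    """Return 'white', 'neutral' or 'color' for a list of (r, g, b) pixels."""
    if all(p == WHITE for p in band):
        return "white"
    if all(p[0] == p[1] == p[2] for p in band):
        return "neutral"
    return "color"


def toy_link(rgb):
    """Stand-in for a device link transform (NOT ICC): naive RGB -> CMYK + spot."""
    r, g, b = rgb
    k = 255 - max(r, g, b)
    if k == 255:
        return (0, 0, 0, 255, 0)
    scale = 255 - k
    c = (255 - r - k) * 255 // scale
    m = (255 - g - k) * 255 // scale
    y = (255 - b - k) * 255 // scale
    return (c, m, y, k, 0)


def convert_page(bands):
    """Convert RGB bands to 5-channel output, paying for the full transform
    only on bands that contain color."""
    out = []
    stats = {"white": 0, "neutral": 0, "color": 0}
    for band in bands:
        kind = classify_band(band)
        stats[kind] += 1
        if kind == "white":
            # Nothing painted: emit empty ink without touching the transform.
            out.append([(0, 0, 0, 0, 0)] * len(band))
        elif kind == "neutral":
            # Grey only: a cheap per-pixel map to the K channel.
            out.append([(0, 0, 0, 255 - p[0], 0) for p in band])
        else:
            out.append([toy_link(p) for p in band])
    return out, stats


page = [[WHITE] * 4, [(128, 128, 128)] * 4, [(255, 0, 0), WHITE]]
converted, stats = convert_page(page)
```

Because the page stays in RGB until this final step, anything that needs to read back painted pixels still sees RGB values.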
17:21.25 henrys mvrhel_laptop: I thought about this a long time ago … and I really think there must be a way to create an invertible profile with extra information stored when the transformation is done. That would solve the problem nicely with any device. 17:21.26 ray_laptop except with 'page_is_neutral' we still render to CMYK, then do a simple transform to K 17:21.27 robin_watts: we didn't. To do that would require a custom 'image' device that takes in CMYK colors and produces only a single K plane. We haven't done that (yet). 17:22.14 But the CMYK->K transform is fast compared with ICC link profile transformation 17:23.00 mvrhel_laptop henrys: It is certainly possible (and easier) to create the profile that round trips ok from RGB and back. Compared to CMYK(B) to RGB and back (really impossible) 17:24.38 ray_laptop We probably want to do RasterOps in a compositor device, then it can operate in RGB. Then the per-band RasterOp needed can skip the compositor on bands that don't need. it 17:25.02 mvrhel_laptop henrys: if we had such a profile, we would want to hook in the inversion from CMYK(B) to RGB in any graphic lib getbits calls. Is that correct? 17:25.33 ray_laptop The 'put_image' for such a RasterOp compositor would be the place that transforms from RGB to device colors (like pdf14_put_image does now) 17:26.29 mvrhel_laptop ray_laptop: oh that is an interesting thought 17:27.03 ray_laptop I have to run. bbiaw 17:27.24 mvrhel_laptop so we treat this how we do pdf14 which can have these color spaces different than the device 17:27.28 henrys: see my above comment though 17:27.44 henrys mvrhel_laptop: I'm not thinking of a round trip exactly, I'm thinking of a profile and extra bookkeeping to record the conversions that don't invert. 17:28.02 mvrhel_laptop oh.
well as I think of this more, having the round trip work properply RGB --> CMYK(B) --> RGB is really not that hard 17:28.56 and could be a requirement for this to work properly 17:29.08 we are never going to have a CMYK(B) that does not have a related RGB value. 17:29.47 granted with the interpolations there are some points in CMYK(B) that may be used that could cause an issue though 17:30.14 henrys yes be it does need to got back to it's original rgb value - hence the extra bookkeeping. 17:30.26 s/be/but 17:30.48 mvrhel_laptop I don't think you would need bookkeeping though if one spent time getting the profile right with respect to this 17:31.07 Robin_Watts mvrhel_laptop: Lots of potential for loss of accuracy in the roundtrip though. 17:31.29 mvrhel_laptop there may be some minor issues on the RGB gamut edges in the CMYK(B) space 17:31.40 This is why I would prefer my original approach 17:31.58 henrys I don't think they'll be able to tune the profile for their device and keep that requirement. 17:32.01 mvrhel_laptop it is much safer 17:32.15 Robin_Watts mvrhel_laptop: Keeping it in rgb and just converting at the end? 17:32.31 mvrhel_laptop syes 17:32.35 yes 17:32.37 Robin_Watts That sounds the sanest approach to me. 17:32.38 henrys agreed what is appealing about the other way is it will work with any device. 17:33.04 mvrhel_laptop if we know a band had neutral only or is all white, then we can avoid transforming it 17:33.23 with the profile 17:33.26 Robin_Watts To do what henry is suggesting (mapping an n dimensional space down to an m+k dimentional space, where m+k = n and m dimensions are what you really want, and k dimensions are extra bookeeping) seems like a HARD problem 17:34.00 Of course, we could always map the n dimensional space down to an m+n dimensional space. 17:34.26 Where m is what we really want, and n is just a copy of where we came from. 17:34.43 henrys do you have any sense of how many colors will not transform back properly? 
17:35.08 mvrhel_laptop oh I see what you are saying 17:35.08 henrys: any that are on the RGB gamut boundary in CMYK(B) space 17:35.31 it really depends upon the mapping 17:35.38 picture a 3-D surface in a 5 D space 17:35.55 all those points on the surface will get interpolated by points that dont have real RGB values 17:36.16 henrys yeah I'm trying to figure out how much space the bookkeeping is going to take, if it is feasible. 17:36.22 mvrhel_laptop If I had to put a number on it 17:37.35 It would be on the order of 256^2 17:37.43 as opposed to the entire volume which is 256^3 17:38.08 does that make sense? 17:38.38 henrys then there are practical considerations jot shrink what is needed - we can rebuild the bookkeeping each page - 17:38.58 we don't have to worry about the entire space. 17:39.17 s/jot/to 17:39.54 it is sounding like a research project though and we should probably just focus on your original suggestion. 17:40.31 mvrhel_laptop How much time is it going to take to convert the buffers. 17:40.31 Robin_Watts mvrhel_laptop: This is why we have caching transforms :) 17:40.54 mvrhel_laptop right 17:40.58 Robin_Watts (well, lcms has a 1 place cache) 17:41.11 but we can extend that if required. 17:41.28 mvrhel_laptop we can 17:41.31 I would suggest we do the original plan and look at adding in speed ups (e.g. caching) if we find it is slow 17:42.17 also, I think having the knowledge if a buffer has color would be good 17:42.35 or is all white 17:42.42 Robin_Watts mvrhel_laptop: I am always in favour of doing the simple thing first, and then fixing it if it isn't good enough. 17:43.04 (unless to do the more complex version sounds like more fun :) ) 17:43.26 mvrhel_laptop at least we have something then. also, we don't know if there is an issue with the simpler approach 17:43.39 henrys mvrhel_laptop: so did you want to modify psdrgb as a demo? 
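The 256^2 versus 256^3 estimate above can be checked with a little arithmetic, assuming "gamut boundary" is read as the outer faces of the 8-bit RGB cube (an interpretation, not something stated in the log). The helper name `cube_counts` is made up for this sketch.

```python
def cube_counts(levels=256):
    """Return (volume, surface) point counts for a levels^3 lattice.

    Surface points are those with at least one coordinate at 0 or levels-1,
    i.e. everything except the (levels-2)^3 interior.
    """
    volume = levels ** 3
    interior = (levels - 2) ** 3
    return volume, volume - interior


volume, surface = cube_counts()
print(volume, surface, surface / volume)
```

For 256 levels the surface count is 6 * 256^2 minus the shared edges and corners, so it is of order 256^2 and only a few percent of the full volume.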
17:46.26 mvrhel_laptop henrys; I can do that if you would like 17:47.04 I need to look at that device and make sure it even works. If I recall I may have an open bug with respect to it 17:47.50 henrys I think it would be a good thing to have anyway, in the meantime I'll set up 801's system and study that. 17:47.51 mvrhel_laptop ok. bbiab 17:49.24 henrys: quick question for you 19:12.13 priority-wise: I am at a stopping point in the phone right now. I have everything working except text search and hyperlinks. do you want me to stop and work on psdrgb, finish phone, or work on both? 19:13.03 I had to fix an binding issue with the zoom and just got that working this morning. the windows phone app is looking pretty good so far 19:13.43 bbiaw 19:31.24 ray_laptop Using a RasterOP compositor device would work for any devices. It would not install itself IFF the device is RGB. I think that's better than modifying one our psdrgb, then they have to modify their device. Also it can more readily lead to optimization when the clist is used (not installing the device when a band doesn't have non-idempotent RasterOps 19:39.26 I have to take my daughter to her music class. BBIAB. In the meantime, I'll go back to tracking down the bug I found in my saved-pages rendering. 19:41.14 I added a '--saved-pages-test' command line arg that runs files in saved-pages mode, then after the file it does the --saved-pages-print='print normal flush' 19:42.09 that will let us easily do regression testing. If a device doesn't support saved-pages, then --saved-pages-test is simply ignored (i.e., pdfwrite, ps2write, pxl...) 19:43.21 but running some files showed a segfault. 19:44.19 mvrhel_laptop based upon ray's comments I will hold off on psdrgb until we talk about it more 21:57.09 henrys mvrhel_laptop: yes stick with the phone for now. 22:49.31 mvrhel_laptop ok. will do. working now on binding canvas of rectangles over my rendered pages now. 
of course the way I did things in the windows 8 app does not work 22:50.25 henrys I see the theoretical argument to 256^2 but 1) I doubt all those colors are used and 2) there are many colors in gamut that can't be mapped back even if the gamut were the same size, many rgb triples could map to the same cmyk(b) value and we wouldn't be able to get back. I should be able to set up a simple lcms program that goes through all rgb triples and see how many are recoverable using the inverse table, right? 23:03.02 for a given icc profile 23:03.32 mvrhel_laptop: ^^^ 23:05.06 mvrhel_laptop henrys: while it is possible that the different rgb values map to the same cmyk(b) value, that is unlikely. It is generally true in the other way around, that is different cmyk(b) values get mapped to the same RGB value. Since we are starting out with RGB we are in a much better place 23:07.27 henrys: but it would be interesting to do what you suggest, which is push all the values through to see what % don't roundtrip 23:08.04 Robin_Watts This still feels like you're trying to fit a squid into an octopuses sweater. 23:09.03 However clever you are, you're going to be 2 sleeves too short. 23:09.21 henrys that's 6d and 8d this is 3d and 5d 23:10.30 or squid have 10 legs don't they? 23:11.28 Robin_Watts Yeah 10, I think. 23:11.35 You can kinda visualise this as transforming the axes of a coordinate space. 23:12.24 ray_laptop IMHO, a RasterOps compositor is the way to go. TOTALLY avoids the RGB<->CMYK(B) issue 23:12.47 Robin_Watts Any sane colorspace will keep the axes orthogonal. 23:12.57 henrys it's too expensive for most pcl customers they already wine with what we've got. 23:13.21 ray_laptop the compositor operates in RGB. The compositor isn't used when it isn't needed (i.e., a band doesn't have non-idempotent RasterOps or the device is RGB) 23:13.59 henrys: what's too expensive ? a compositor 23:14.30 ? 23:14.32 Robin_Watts The compositor is a nice idea, if I'm understanding it correctly. 
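The experiment suggested above (push RGB triples through the forward and inverse mappings and count how many fail to come back) could look roughly like this. It is only a harness sketch: `toy_forward` and `toy_inverse` are a naive 8-bit RGB/CMYK pair invented for illustration, not lcms calls and not an ICC profile, and the grid is subsampled so it runs quickly in pure Python.

```python
from itertools import product


def toy_forward(rgb):
    """Naive RGB -> CMYK with 8-bit rounding (stand-in for a real profile)."""
    r, g, b = rgb
    top = max(r, g, b)
    k = 255 - top
    if top == 0:
        return (0, 0, 0, 255)
    c, m, y = (((top - v) * 255 + top // 2) // top for v in (r, g, b))
    return (c, m, y, k)


def toy_inverse(cmyk):
    """Approximate inverse of toy_forward, again with 8-bit rounding."""
    c, m, y, k = cmyk
    top = 255 - k
    return tuple(top - (v * top + 127) // 255 for v in (c, m, y))


def roundtrip_report(forward, inverse, step=17):
    """Return (total, failures, collisions) over a subsampled RGB grid.

    failures:   inputs where inverse(forward(rgb)) != rgb
    collisions: inputs whose forward value was already produced by another input
    """
    values = range(0, 256, step)
    total = failures = 0
    seen = set()
    for rgb in product(values, repeat=3):
        out = forward(rgb)
        seen.add(out)
        total += 1
        if inverse(out) != rgb:
            failures += 1
    return total, failures, total - len(seen)


total, failures, collisions = roundtrip_report(toy_forward, toy_inverse)
print(f"{failures}/{total} failed to round trip, {collisions} collisions")
```

Swapping in transforms built from an actual profile, and setting `step=1` for the exhaustive run, would give the percentage asked about; the numbers from the toy pair say nothing about any real device.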
23:14.41 ray_laptop I thought mvrhel_laptop thought so too 23:15.00 henrys yes, it's always needed pcl is transparent by default. 23:15.10 Robin_Watts basically it'll only kick in if required, and then it'll render in rgb and convert to cmyk at the end. 23:15.14 henrys like I said always required that's why we can't do anything intelligent with pdfwrite 23:15.37 mvrhel_laptop my only concern is, do we know when we need it 23:15.42 ray_laptop Robin_Watts: right, on a band-by-band basis 23:15.49 Robin_Watts the compositor is required for non-neutral stuff or for stuff with rops, AIUI. 23:15.50 mvrhel_laptop as we are doing the clist writing, how do we know that we should have pushed it a while back 23:16.29 Robin_Watts BUT the code required to write the compositor, is, it seems to me, a more complex refactoring of the code that would be required to do the simple version. 23:16.31 i.e. we should write the simple version, cos it's simple, and it might be good enough. 23:16.48 If it's not, then we have a jumping off point to get to the more complex version from. 23:17.00 It's not like it's wasted effort. 23:17.12 henrys like I said by default pcl is always transparent so the compositor would always be required. 23:17.12 Robin_Watts henrys: aiui, the compositor is not about transparency in this case. 23:17.34 it's about rops or non-neutral colors. 23:17.58 ray_laptop henrys: b0x1cy ??? 23:17.59 Robin_Watts r@y iz 50 l33t. 23:18.39 ray_laptop henrys: I thought the default RasterOp was 252 which is idempotent 23:18.41 R\0b\in_W(h)\at*s ??? 23:19.24 Robin_Watts ray_laptop: ah, there was a control char in there. henrys meant "by" 23:19.27 ray_laptop henrys: but even if PCL is transparent by default, we can at least optimize some bands that are aren't touched, or are monochrome, right? 23:21.08 mvrhel_laptop I agree with robin_watts. 
we do the simple case now and then look at a compositor later 23:21.30 ray_laptop and painting over unpainted area doesn't really involve transparency (equivalent to painting in PDF when the BG is alpha == 0) 23:22.05 mvrhel_laptop: maybe, but it is far from the first time this has come up. Also, now we (I) have a better understanding of what is possible with a compositor and optimizing post-clist 23:23.17 This is likely to come up with any PCL printer company since they don't print in RGB unless they have h/w color conversion post rendering 23:24.23 and Takane-san will want to be able to sell "real PCL" that has decent performance. On printers, non-hw color conversion tends to be a SERIOUS performance hit, particularly if we have to convert the entire page. 23:25.53 The CPU's tend to be much lower powered than our development \systems 23:26.27 but doing the simple one is fine, but I recommend doing it in the customer's device. Nobody cares about the psdrgb device 23:27.23 then have an enhancement to "do it right" 23:28.21 henrys first I'm going to get it running, see what proportion of pcl files I can print correctly without doing anything at all, then we can argue ;-) 23:28.49 mvrhel_laptop ray_laptop: you know better than me on this 23:29.00 if you think it is relatively easy, I will help you out with it 23:29.15 ray_laptop unless I misunderstand the "simple" approach, it will ALWAYS transform the entire page 23:29.16 mvrhel_laptop ray_laptop: except where the band is white or all neutral 23:29.35 ray_laptop henrys: "get it running" ??? 23:29.41 mvrhel_laptop henrys: you mean do the round tripping? 23:30.03 ray_laptop: on the above, I will help you out even if it is not relatively easy. so I guess we would need to make a new compositor device? 23:31.02 henrys ray_laptop:no I am going to get the system they sent me working with their device and see what it prints. 
PCL does print a fairly large number of test files with a CMYK device, I don't see why it shouldn't do the same with their device. 23:31.13 mvrhel_laptop henrys: that is a good point 23:31.33 henrys they have sent us a system to debug - see your email. 23:31.57 mvrhel_laptop bbiab... 23:32.46 ray_laptop henrys: OK. That's a good start 23:33.17 Adding is_non_neutral to gx_color_usage_s would be needed to collect the areas / bands needed for non-neutral rendering. Since slow_rop is only a bool, we probably don't need a bbox. Currently the trans_bbox is only used on a band basis as well even though we collect the info 23:37.12 bbbiaw: My daughter's musical is on in a bit. 23:37.47