| 20160115 |
tor8 | chrisl: thanks for helping mvrhel with his CMap confusions! | 00:15.49 |
| Robin_Watts: <br> is tricky, but it should work... | 00:16.30 |
Robin_Watts | <p>foo<br>bar</p><p>baz</p> is rendered as foo\nbaz | 00:23.51 |
| <p>foo<br></br>bar</p><p>baz</p> is rendered as foo\nbar\nbaz | 00:24.12 |
| tor8: Let's talk about this stuff tomorrow. Too late now. | 00:24.31 |
bofh_ | ugh so I decided to try to implement something like xpdf's pdftotext -layout option to the span extraction and this has revealed to me just how complicated of a mess pdf structured text extraction is. | 02:31.57 |
| also, unrelated: is there any benefit to caching decoded images/pixmaps? on several test systems I had there was none (in fact it was often slower), but these were all x86_64 desktops or fast armv7 boards. | 02:32.49 |
| no idea if it differs on mobile, but that should always be memory-bound, but re-decoding means less data copied than pulling a decoded copy out of the store, at least for anything non-tiny | 02:33.32 |
| (this is all mupdf btw) | 02:33.46 |
kub | hi, I want to obtain cmyk raster with 3 drop sizes. | 09:19.49 |
| What works: -sProcessColorModel=DeviceRGB -sDEVICE=ppmraw -dGrayValues=3 | 09:20.06 |
kens | So you are talking about Ghostscript | 09:20.18 |
| I have no idea what you mean when you say '3 drop sizes' | 09:20.31 |
kub | yes | 09:20.34 |
kens | And if you use ProcessColorModel=DeviceRGB then you are not producing CMYK | 09:20.53 |
chrisl | Also, ppm is an RGB raster format | 09:21.16 |
kub | with DeviceCMYK I obtain a Unrecoverable error: rangecheck in .putdeviceprops | 09:21.23 |
kens | That's because (as Chris just said) ppmraw is an RGB format | 09:21.42 |
| So you can't use CMYK with it | 09:21.48 |
kub | the tiff* devices produce all the rangecheck error | 09:21.49 |
kens | You will need to supply an example file and command line for us to look at it. Probably best if you just open a bug report. | 09:22.17 |
| And I still don't know what you mean by 'drop sizes' | 09:22.28 |
kub | :-) 3 drop sizes can our InkJet head print | 09:23.25 |
kens | I doubt you can have 3 shades of gray | 09:23.58 |
| So I presume there's a 'nothing' drop size for a total of 4 values | 09:24.16 |
| So you need 2 bpp | 09:24.22 |
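The arithmetic behind that conclusion can be sketched as follows. This is only an illustration: the function name is made up here, and it assumes one extra "no drop" level on top of the printable drop sizes, as kens presumes above.

```python
def bits_per_component(drop_sizes: int) -> int:
    """Bits needed per ink to encode `drop_sizes` drops plus a 'no drop' level."""
    levels = drop_sizes + 1  # assumption: one extra level for "no ink"
    # (levels - 1).bit_length() is the integer form of ceil(log2(levels)).
    return max(1, (levels - 1).bit_length())


# Three drop sizes plus "none" is four levels, which fits in two bits.
print(bits_per_component(3))
```

With four levels per component, `-dGrayValues=4` is the matching setting discussed further down.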
| By the way, are you a commercial customer ? | 09:25.03 |
| Setting GrayValues to 3 looks like an invalid number, it should be 1, 2, 4, 8 | 09:25.46 |
chrisl | FWIW, doing a simple "showpage", the above command line (with the addition of an output file) works without an error for me, with the current version | 09:25.57 |
kens | Looks like GrayValues should be 4 in this case | 09:26.39 |
| Goodness knows what that looks like with halftoning | 09:26.54 |
kub | gs -q -dPARANOIDSAFER -dNOPAUSE -dBATCH -r600x600 -sProcessColorModel=DeviceRGB -sDEVICE=ppmraw -dGrayValues=3 -sOutputFile=temp.ppm ~/.local/share/ghostscript/9.18/examples/text_graph_image_cmyk_rgb.pdf | 09:27.43 |
| that works -^ | 09:27.54 |
kens | But it doesn't do what you think it does. | 09:28.13 |
kub | maybe | 09:28.21 |
kens | It's not CMYK output and I doubt that it is 2 bits per pixel either | 09:28.34 |
kub | is there some documentation in GS about halftoning, diffusion, and levels-of-gray parameters? I did not find any so far | 09:29.03 |
kens | kub please answer the earlier question, are you a commercial customer, and if not are you representing a printer manufacturer (You say 'our print head' for example) | 09:29.14 |
| Ghostscript implements the PostScript halftoning method and there are specific Ghostscript techniques | 09:29.54 |
| Please answer the previous question | 09:30.02 |
kub | I am R&D and contacted sales@ but obtained no reply so far. ATM I evaluate different RIP's including GS | 09:31.50 |
kens | OK that's fine. When did you contact sales ? | 09:32.13 |
chrisl | I think the documentation about GrayValues is misleading: "-sDEVICE=ppmraw -dGrayValues=16 will make this the default device and set the number of bits per component to 4" - PPM only works in 1 or 2 byte samples. | 09:32.21 |
kens | Sometimes they lose emails, if you haven't heard we can poke them for you | 09:32.24 |
kub | 11.1. | 09:32.35 |
| kens thanks, would be glad to read back | 09:33.11 |
kens | 4 days is too long, can you forward the email to support (@artifex.com) and I will ensure they contact you | 09:33.14 |
| What version of GS are you using, and on what platform ? (Linux, Windows, something else) | 09:33.41 |
kub | kens - forward done; I develop mainly on Linux and will deploy Windows | 09:34.51 |
kens | OK but are you using Linux right now ? We need to reproduce your problem before we can help you | 09:35.22 |
kub | ys | 09:35.29 |
| yes | 09:35.32 |
kens | OK which Linux, and where did you get the version of GS you are using ? Did you get a package or build it yourself ? What version is it ? and are you using the 64-bit version ? | 09:36.13 |
| Better yet, post the whole back channel output when you run the failing setup | 09:36.42 |
| And please let us know the command line that *doesn't* work | 09:36.58 |
kub | oS-Leap41 with gs-9.18 tar ball | 09:37.11 |
kens | Because I don't think ppmraw is going to be a good format for you | 09:37.12 |
chrisl | And preferably *stop* using "-q" | 09:37.22 |
kens | I'm assuming that your printer is CMYK, so you need to use either a separating device (4 rasters, one each of C, M, Y and K) or a composite CMYK device. TIFF seems like a good choice for an evaluation. So let's get that working. | 09:38.45 |
kub | http://www.behrmann.name/temp/gsgrayvalues3.output.txt first version | 09:39.51 |
kens | Can you do that without -q please | 09:40.12 |
kub | bbl | 09:40.16 |
chrisl | tiffsep does not support GrayValues, I think | 09:41.44 |
kens | Possibly true | 09:41.57 |
| Which might explain the rangecheck error | 09:42.05 |
| We must have some way to produce CMYK halftoned output though | 09:42.41 |
| bitcmyk maybe ? | 09:43.00 |
chrisl | I think bitcmyk is the only way to get *2 bit* CMYK. tiffsep1 will produce 1bpp halftoned output | 09:43.55 |
kub | chrisl http://www.behrmann.name/temp/gsgrayvalues3withoutPoption.output.txt | 09:44.32 |
kens | Well he'll need 2 bit if he wants 3 drop sizes (plus none of course) | 09:44.33 |
| kub the first thing is that is 9.15, not 9.18 | 09:44.54 |
kub | I was ;-) | 09:45.02 |
| I saw | 09:45.09 |
kens | So you aren't using the version you think you are :-) | 09:45.23 |
| I'd guess that is a version of GS bundled into the operating system | 09:45.36 |
| From what Chris and I can see you need to use the bitcmyk device and GrayValues=4 in order to get CMYK rasters, halftoned, with 4 values (0, 1, 2, 3) | 09:46.16 |
kub | checked with 9.18 its the same rangecheck message with different version info | 09:46.38 |
| I'll try, | 09:47.00 |
kens | kub it looks like our mail server marked your mail to sales as spam, I'm just forwarding it on now. If you don't hear back in a day or so, please do let me know and I'll poke them again. | 09:47.12 |
| If you give me a minute, I'll try and come up with a command line for you (unless Chris beats me to it) | 09:47.31 |
chrisl | Note that bitcmyk will output *raw* data - it won't be wrapped in any image file format | 09:47.43 |
kub | -sDEVICE=bitcmyk -dGrayValues=4 gives no error | 09:48.15 |
kens | OK well that should be producing what you need, but as raw pixels, as Chris says, it's not an image format | 09:49.50 |
| I have no idea what you could use to read that | 09:50.15 |
| There are other screening possibilities, in addition to the standard PostScript methods, but I'm not really up to date with them. | 09:51.19 |
| I guess the first thing is to experiment with this output format, and come back when you have more questions. We'll do our best to answer them, or get answers for you from the developers who know more about the screening (they are in the US so won't be here for a few hours) | 09:52.43 |
chrisl | Photoshop can read raw files, but I don't know about 2bpp | 09:52.48 |
kens | chrisl is the output separate files or one composite ? | 09:53.07 |
kub | I can wrap the bits, np | 09:53.29 |
kens | Oh, OK | 09:53.35 |
chrisl | IIRC, the bit* devices are composite | 09:53.41 |
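For anyone wanting to wrap or inspect that raw output, here is a minimal unpacking sketch. It rests on an unverified assumption about the byte layout: one byte per pixel, holding four 2-bit components in C, M, Y, K order from the most significant bits down, with no row padding. Check this against real bitcmyk output before relying on it; the function name is hypothetical.

```python
def unpack_cmyk_2bit(raw: bytes, width: int, height: int):
    """Split packed pixels into rows of (c, m, y, k) tuples, each value 0..3.

    Assumed layout (not confirmed in the discussion): one byte per pixel,
    bits 7-6 = C, 5-4 = M, 3-2 = Y, 1-0 = K.
    """
    if len(raw) != width * height:
        raise ValueError("expected exactly one byte per pixel")
    rows = []
    for y in range(height):
        row = []
        for byte in raw[y * width:(y + 1) * width]:
            row.append(((byte >> 6) & 3, (byte >> 4) & 3, (byte >> 2) & 3, byte & 3))
        rows.append(row)
    return rows


# Two hand-made pixels: C=3 M=2 Y=1 K=0, then pure K at the largest drop size.
sample = bytes([0b11_10_01_00, 0b00_00_00_11])
print(unpack_cmyk_2bit(sample, width=2, height=1))
```

Each value 0..3 would then map to "no drop" or one of the three drop sizes.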
kens | can't decide if composite or separated is easier :-) | 09:54.06 |
| kub is that enough for you to start with ? | 09:54.18 |
kub | It's a start to estimate how beautiful the raster will be printed | 09:56.28 |
kens | OK then I suggest you start that way, and we'll ask our colour expert about screening this afternoon when he comes online. | 09:57.06 |
kub | thanks for your help | 09:57.32 |
kens | If you can come back in about 6 or 7 hours then we can give you some more help | 09:57.34 |
| Or drop in on Monday and we'll tell you what we've found out:-) | 09:57.49 |
| kub your web site appears to be non functional this morning. All I get is a blank page.... | 10:15.18 |
kub | kens: http://www.behrmann.name looks fine from at least two locations | 10:17.51 |
chrisl | kub: I think kens was meaning dropjet.com | 10:22.11 |
kens | Yes, http://www.dropjet.com doesn't do anything for me | 10:26.06 |
tor8 | Robin_Watts: a lone BR tag in xhtml must be closed <br/> | 10:26.20 |
Robin_Watts | tor8: Ah, so we need to use <br></br> then. | 10:26.56 |
| That does work. | 10:27.01 |
tor8 | so the "<p>foo<br>bar</p><p>baz</p>" is parsed as <p>foo<br>bar</br><p>baz</p>[and implicit </p>" since I cheat and don't actually check that a closing tag matches | 10:27.45 |
| Robin_Watts: or just <br/> | 10:28.04 |
Robin_Watts | tor8: right. | 10:28.34 |
chrisl | tor8: So, I'm hoping I helped clarify the CIDFont/CMap/cmap/ToUnicode things for mvrhel_laptop, and not made things worse! | 10:29.17 |
Robin_Watts | So, experimenting with the code, I see that: <p>A<b>B</b>C</p> results in 3 calls to generate text with "A" "B" and "C" respectively. | 10:29.35 |
tor8 | chrisl: everything you said was perfectly clear and accurate! (to me, at least... I guess I also have font related job security...) | 10:29.53 |
kens | But do you want it..... | 10:30.33 |
Robin_Watts | The bidirectional algorithm needs to be passed whole lines (or paragraphs) at a time because the directionality of certain chars depends on context. | 10:30.45 |
| So I reckon it needs to be done at a higher level than generate_text. | 10:31.12 |
tor8 | Robin_Watts: can we keep the current paragraph directionality as an in-out parameter that gets passed to generate_text? | 10:32.09 |
Robin_Watts | tor8: That won't help. | 10:32.26 |
tor8 | or do you need to look both before and after the current char to determine directionality? | 10:33.15 |
Robin_Watts | Some chars have different directionality according to the stuff around them, not just the 'current paragraph directionality'. | 10:33.15 |
| "A" "(" "B" for example. | 10:33.37 |
tor8 | you could put those dependent bits in a fragment of its own and set the directionality bit to 'depends' in generate_text, and resolve it as a post process? | 10:33.55 |
Robin_Watts | The "(" is only L2R if B is. | 10:33.58 |
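The context dependence described here can be illustrated with a deliberately simplified sketch. This is not the Unicode reference code and not MuPDF's implementation: it only handles strong left/right characters and treats everything else (including digits) as neutral, giving a neutral the direction of its strong neighbours when they agree and the paragraph direction otherwise.

```python
import unicodedata

# Map Unicode bidi classes for strong characters onto a plain L/R direction.
STRONG = {"L": "L", "R": "R", "AL": "R"}


def resolve_directions(text: str, paragraph_dir: str = "L") -> list:
    """Assign 'L' or 'R' to every character using a simplified neutral rule."""
    classes = [STRONG.get(unicodedata.bidirectional(ch)) for ch in text]
    resolved = []
    for i, cls in enumerate(classes):
        if cls is not None:
            resolved.append(cls)
            continue
        # Nearest strong character on each side, else the paragraph direction.
        before = next((c for c in reversed(classes[:i]) if c), paragraph_dir)
        after = next((c for c in classes[i + 1:] if c), paragraph_dir)
        resolved.append(before if before == after else paragraph_dir)
    return resolved


print(resolve_directions("A(B"))            # "(" sits between two L2R letters
print(resolve_directions("\u05d0(\u05d1"))  # same "(" between two Hebrew letters
```

The same "(" resolves differently in the two calls, which is why a per-fragment direction passed into generate_text is not enough on its own.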
tor8 | it'll mean splitting unneccessarily (gah, I can't spell that word) but I think that'll be easier than splitting afterwards | 10:35.01 |
kub | kens chrisl - indeed that link is down, thanks for letting me know | 10:37.15 |
kens | NP | 10:37.21 |
tor8 | Robin_Watts: so, the lex_number diff is on tor's bmpcmp now | 10:37.35 |
kens | I just thought I'd look at dropJet's products to get some familiarity, I was able to get a good overview from the ESMA site though | 10:37.56 |
tor8 | Robin_Watts: most of them actually look like progressions | 10:39.54 |
Robin_Watts | tor8: the rules for exactly how to recognise fragments are hard. That's why everyone uses the same piece of example code from unicode to do it. | 10:40.48 |
| My preference, I think would be to do a pass over the text after we've parsed it. | 10:41.11 |
tor8 | okay, then we'll need to either split flow nodes during that pass or keep directionality per character rather than per flow node | 10:42.08 |
Robin_Watts | Run through the flow a paragraph at a time, gathering the text up. Feed that text into the unicode algorithm to get directions out, and split the flow accordingly. | 10:42.31 |
| So at the end of that pass we have the boxes tagged with appropriate directions. | 10:42.58 |
| Then layout just needs to be updated to cope with using those directions. | 10:43.13 |
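A rough sketch of the splitting step in that pass, under stated assumptions: the flow nodes of one paragraph are modelled as plain strings, and the per-character directions are taken as given (in the plan above they would come out of the Unicode bidi algorithm). The function and parameter names are invented for illustration.

```python
from itertools import groupby


def split_flow_by_direction(words, directions):
    """Split each flow node's text into runs of uniform direction.

    words: text of each flow node in one paragraph, in order.
    directions: one 'L' or 'R' per character of the concatenated text.
    Returns a list of (text, direction) pairs.
    """
    out = []
    pos = 0
    for word in words:
        dirs = directions[pos:pos + len(word)]
        pos += len(word)
        start = 0
        for direction, group in groupby(dirs):
            length = len(list(group))
            out.append((word[start:start + length], direction))
            start += length
    return out


words = ["abc", " ", "\u05d0\u05d1x"]
directions = list("LLL" + "L" + "RRL")
print(split_flow_by_direction(words, directions))
```

Nodes that are already uniform pass through unchanged; only a node containing a direction change is split.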
tor8 | Robin_Watts: right. so same result as I was thinking of, but done as a separate post-process by splitting nodes | 10:44.27 |
Robin_Watts | yes. | 10:44.46 |
tor8 | okay. good. | 10:47.47 |
Robin_Watts | How do you explain the difference in 9 ? | 10:47.51 |
| 28 looks like a clear progression. | 10:47.51 |
| 34 too. | 10:47.51 |
kens | I see 2 clear progressions, a bunch of pixely 'who cares' diffs and some oddities | 10:48.32 |
| KenEg catx5720.pdf | 10:48.52 |
| Oh damn | 10:48.52 |
| I mean catx5720.pdf | 10:49.00 |
| #80 looks like a progression too, not surprising given the bug title | 10:50.03 |
| Also 86 | 10:50.20 |
| 92 looks worse | 10:50.56 |
chrisl | We should probably edit that out of the logs ^^ | 10:51.01 |
kens | Yeah please, if someone could do that, sorry | 10:51.16 |
chrisl | Robin_Watts: can you do the honours please? | 10:51.38 |
Robin_Watts | Ok, so if you're happy with 9, 80, 88 and 92, then I'm happy. | 10:52.09 |
tor8 | Robin_Watts: syntax error in the font descriptor object | 10:52.14 |
kens | For number 92 MuPDF matches GS but not Acrobat, this may be a first example of Acrobat doing something other than setting to 0 | 10:52.19 |
tor8 | it has "/ItalicAngle -17.-21823" which got turned into 3 tokens | 10:52.29 |
| and then parsing failed because -21823 is not a valid dictionary key | 10:52.42 |
kens | It 'looks like' Acrobat turns '--' into '-' | 10:52.57 |
Robin_Watts | so, -17.-21823 goes to -17.21823 ? | 10:53.26 |
kens | No, | 10:53.37 |
| --17.2 goes to -17.2 | 10:53.47 |
| #92 has lots of values with double negatives | 10:54.00 |
tor8 | no, it goes to "-17." and the rest is discarded | 10:54.03 |
Robin_Watts | edits logs. | 10:54.12 |
kens | defers to tor | 10:54.13 |
| GS interprets the '--' as invalid and sets the numbers to 0 | 10:54.35 |
| Which gives the same result as MuPDF | 10:54.45 |
| However Acrobat differs | 10:54.54 |
| It looks to me like Acrobat is turning the '--' into a '-' and using the rest of the number | 10:55.12 |
tor8 | kens: I'm still talking about number 9 | 10:55.34 |
kens | Oh sorry, I was talking about #92 | 10:55.46 |
| Which is actually a slight regression | 10:56.17 |
tor8 | kens: so for 92, if I change the '-' detection to a while loop it looks like the pdf creator intended | 10:57.47 |
| so instead of if (c == '-') I just while (c == '-') to eat them all | 10:58.13 |
kens | So you truncate the '--' back to a '-' ? | 10:58.15 |
| Right | 10:58.19 |
| It's the first instance I've seen where a malformed number is corrected instead of set to 0 | 10:58.36 |
tor8 | but I don't actually negate the sign twice | 10:58.38 |
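A minimal sketch of the behaviour being described, written in Python rather than MuPDF's C and not taken from its source: eat every leading '-', treat the number as negative if there was at least one, parse digits and an optional fraction, and discard whatever follows. The function name mirrors lex_number only for readability.

```python
DIGITS = "0123456789"


def lex_number(token: str) -> float:
    """Leniently parse a PDF-style number, tolerating repeated minus signs."""
    i = 0
    negative = False
    while i < len(token) and token[i] == "-":  # 'while', not 'if': eat them all
        negative = True                         # set once, never negated twice
        i += 1
    start = i
    while i < len(token) and token[i] in DIGITS:
        i += 1
    if i < len(token) and token[i] == ".":
        i += 1
        while i < len(token) and token[i] in DIGITS:
            i += 1
    text = token[start:i]  # anything after this point is discarded
    value = float(text) if any(ch in DIGITS for ch in text) else 0.0
    return -value if negative else value


for sample in ["--17.2", "-17.-21823", "40+60", "160-1"]:
    print(sample, lex_number(sample))
```

Under these rules "--17.2" reads as -17.2 and "-17.-21823" reads as -17, matching the two cases discussed above.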
kens | Yeah that's what I think is 'correct' or at least 'same as Acrobat' | 10:58.57 |
| I guess I'll have to try and do that in GS as well :-( | 10:59.20 |
| For #9 I don't see a great difference even with the GS output which sets the -18 to 0 | 10:59.39 |
| err -17 that is | 10:59.55 |
tor8 | kens: the font is embedded so I don't expect the -17 to actually show up in the render | 11:00.18 |
| we dropped the embedded font because we got confused while parsing the dictionary and errored out | 11:00.42 |
kens | Hmm, OK but I thought it might affect GS, seems it doesn't | 11:00.42 |
| Oh OK | 11:00.52 |
Robin_Watts | tor8: Are you up for the parliament trip in June? | 11:01.18 |
kens | overall I'd say it's a distinct improvement, and if you treat '--' as '-' it's even better | 11:01.20 |
tor8 | 85 is probably the best test to see if adobe sets to 0 or parses the initial bit | 11:02.10 |
kens | Hmm, let me look at that one again | 11:02.38 |
| Acrobat actually throws a warning | 11:03.24 |
| But it has only the largest square | 11:03.33 |
| So it is different to MuPDF and GS | 11:03.48 |
tor8 | kens: 40.-40 40+60 160 160-1 re s 80e0 80.abc 80 80 re s | 11:04.00 |
kens | bangs head on table | 11:04.19 |
| Obviously a hand-broken file | 11:04.33 |
tor8 | yes, it is. but it would show what acrobat does. what kind of warning does it toss out? | 11:04.54 |
kens | the usual 'something is wrong and the page may not display as expected' | 11:05.14 |
tor8 | right, I forgot how useful adobes errors are :) | 11:05.32 |
kens | From prior experience, Acrobat stops processing when it throws that error | 11:05.33 |
| So some part of that broken rect is causing Acrobat to give up | 11:06.04 |
| I could repair bits of it to see where it stops I guess | 11:06.17 |
tor8 | Robin_Watts: not sure; how soon do I have to make up my mind? | 11:06.20 |
kens | It looks like it's OK with the first rect, but on the second it throws an error | 11:06.52 |
| I'm guessing it's the .abc | 11:07.01 |
tor8 | kens: I suspect the 80e0 since it looks hexadecimal | 11:07.30 |
kens | Give me a second, just changing it | 11:07.40 |
tor8 | but if not, then both 80e0 and 80.abc should be the same, number followed by a word | 11:08.00 |
kens | If I take out the e0 then it just displays the first rectangle | 11:08.07 |
tor8 | 80 then e0 and 80. then abc | 11:08.09 |
Robin_Watts | tor8: Just trying to get an idea of numbers. | 11:08.23 |
kens | So it looks like the e0 actually makes Acrobat error out | 11:08.29 |
Robin_Watts | I'll put you down as a maybe. | 11:08.41 |
kens | Interesting, the 80.abc is not treated as 0 | 11:09.34 |
tor8 | Robin_Watts: Thanks. My disdain for politicians and everything they do might be overcome by the group's enthusiasm. | 11:09.48 |
kens | Nor is it treated as 80, wtf ? | 11:10.03 |
tor8 | kens: huh, that's .... odd | 11:10.16 |
kens | repeats the tests | 11:10.27 |
tor8 | does it try to parse it as 0x80.abc ? | 11:10.28 |
kens | Hard to say | 11:11.04 |
| If I change it to 0 80.abc 80 80 re s then it shows nothing | 11:11.21 |
| If I change it to 0 80 80 80 re s then it strokes a rectangle at 0,80 | 11:11.40 |
| If I change it to 0 0 80 80 re s then it strokes a rectangle at 0 0 | 11:12.03 |
| SO what's it doing with the .abc ? | 11:12.16 |
tor8 | very inconsistent handling of numbers then! | 11:12.18 |
kens | and the .abc doesn't throw an error either..... | 11:12.46 |
tor8 | 0 80 0 80 [ignored 80] re maybe? | 11:13.00 |
kens | Hmm, that could be | 11:13.11 |
| That'd be a 0 width rect | 11:13.19 |
| let me put the .abc in the first number | 11:13.34 |
tor8 | it'd still show up, it's stroked not filled | 11:13.51 |
kens | It's not showing up at all with .abc no matter what I do | 11:14.20 |
| And not giving an error | 11:14.26 |
| Taking away one of the operands doesn't do anything either | 11:14.55 |
| It looks like Acrobat is silently ignoring the error | 11:15.06 |
| OK so if I *deliberately* create a rect with too few operands, Acrobat silently ignores it | 11:15.43 |
| Hmm | 11:16.18 |
| Oh boy | 11:16.48 |
| It looks like Acrobat throws away the malformed number, then because there are too few operands for the 're' it doesn't draw it. But it doesn't throw an error either. | 11:17.18 |
| What a pile of poo | 11:17.25 |
| Obviously it 'fixes' at least one of the numbers in the larger rectangle | 11:18.04 |
tor8 | kens: indeed, I think the conclusion is, acrobat does arbitrary stuff to broken numbers. | 11:18.49 |
kens | So this: "0.abc 0 0 80 80 re s" produces an 80 rectangle at 0,0 | 11:19.12 |
| Whereas this : "0.abc 0 80 80 re s" silently produces no rectangle | 11:19.32 |
| I don't think it's really worth trying to duplicate this insane behaviour | 11:19.56 |
| At a guess, Acrobat ignores numerals and signs in the middle of a number, truncating the number from that point. So "40.-40 40+60 160 160-1 re s" becomes " 40 40 160 160 re s" | 11:21.32 |
| But alphas in a number it throws an error on | 11:21.56 |
| The 80e0 I'm not so sure what it's doing | 11:22.31 |
| Except that it throws an actual error on that one | 11:23.18 |
| Yeah 80e0 throws an error, 80.e0 does not. Madness | 11:24.47 |
tor8 | so the integer and real parsers differ in how they handle errors. madness indeed! | 11:25.11 |
kens | Well I wouldn't like to guess what's going on behind the screen | 11:25.29 |
| It might be that they are saying that a .x is a missing trailing 0, whereas an alpha in a number is not a missing whitespace | 11:26.09 |
| In any event, I think your current approach is more than good enough | 11:26.29 |
| I'll have a poke at GS and see if I can get it to treat '--' as '-' as well :-( | 11:26.48 |
| Hah, GS already treats 40.-40 as 40.0, I didn't know that | 11:27.57 |
| But 160-1 gets turned into a 0 | 11:28.10 |
Robin_Watts | ok, so, tor8: I need to code up that second pass now. | 11:33.08 |
| Am I right in thinking that to find all the text in a paragraph I do a depth first search breaking the text at each 'break' node ? | 11:34.09 |
| Or is this the time we should be looking at http://www.unicode.org/reports/tr14/ ? | 11:36.45 |
tor8 | I've sort-of hacked a partial implementation of tr14 already -- it's what creates the 'break' nodes in the first place | 11:47.30 |
| sorry, 'glue' nodes | 11:47.43 |
| the fz_html boxes that get spit out from generate_box come in four flavours | 11:48.43 |
| BLOCK, BREAK, FLOW and INLINE | 11:48.54 |
| for bidi you only care about the FLOW boxes | 11:49.13 |
| ugh, I can barely remember how these things are strung together | 11:50.27 |
| anyway, each BOX_FLOW has a paragraph or possibly more, depending on the presence of <br/> tags or being a <pre> tag, which will show up as FLOW_BREAK nodes | 11:51.44 |
kens | lunches | 12:29.53 |
NTQ | Is there an example on how to create PDF/A-2a? At the moment I always get PDF/A-2b. If there is no example, can I upload you my test scenario? | 12:37.16 |
chrisl | NTQ: I don't know for sure, but I suspect that PDF/A-2a has many of the same requirements as PDF/A-1a. In which case, the answer is covered here: http://ghostscript.com/FAQ.html | 13:01.34 |
NTQ | chrisl: Thank you. So because PDF/A-1a is not implemented, you also did not implement PDF/A-2a I guess. | 13:10.01 |
| Because both of them have nearly the same restrictions. | 13:10.24 |
Robin_Watts | tor8: Gotcha, ta, I'll give that a whirl. | 13:11.31 |
chrisl | NTQ: The information to produce A-1a (and I assume A-2a) is not available by the time we (Ghostscript) see the input. | 13:13.28 |
kens | Chris is correct, we cannot make PDF/A-xa files | 13:16.10 |
| The spec (PDF/A-1a) specifically says you are not supposed to guess at the document structure and without that, you cannot make an 'a' file. | 13:16.34 |
NTQ | kens: Thank you. Then I will ask our customer if he also would accept PDF/A-2b. | 13:17.33 |
| The main reason why I want to use PDF/A-2 is transparency. We sometimes received PDF documents with transparent images. After creating a PDF/A-1b from such a document the whole page gets rendered as an image, so text too. And a few weeks ago I heard from you that it is not possible to only render the image again, excluding the text. | 13:20.19 |
kens | You cannot easily tell whether any portion of the text is partially or fully transparent, so you have to render it all. | 13:21.07 |
tor8 | Robin_Watts: I expect we'll need to add arabic/hebrew fonts to mupdf now then? | 13:22.09 |
Robin_Watts | tor8: some kind of fallback mechanism, yes. | 13:22.36 |
tor8 | Robin_Watts: it'll be easy enough to merge in DroidSansArabic and DroidSansHebrew into DroidSansFallback | 13:23.24 |
Robin_Watts | but that doesn't solve for other languages. | 13:24.00 |
| would be nicer to have a generic fallback system that could cascade through a set of script fallbacks. | 13:24.24 |
tor8 | Robin_Watts: yeah, agreed | 13:24.32 |
| we only have a two-level fallback now | 13:24.37 |
NTQ | kens: Sorry, I am not a PDF expert. But if I create a PDF/A-1b with Adobe Acrobat it recognizes exactly which parts of a page have to be rendered new and which not. What makes it hard to identify these parts of a page where a transparent image has any effect? | 13:28.16 |
tor8 | Robin_Watts: I'll take a stab at making a cascading fallback font system | 13:30.05 |
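As a rough picture of what a cascading fallback could look like (a sketch only; the chain entries and the coverage ranges are illustrative, and a real implementation would ask each font whether it has a glyph rather than consult hard-coded ranges):

```python
def pick_font(codepoint, fallback_chain):
    """Return the name of the first font in the chain covering `codepoint`."""
    for name, coverage in fallback_chain:
        if any(lo <= codepoint <= hi for lo, hi in coverage):
            return name
    return None  # nothing in the chain covers this character


# Hypothetical chain: document font first, then per-script fallbacks.
chain = [
    ("document font", [(0x0020, 0x007E)]),
    ("Hebrew fallback", [(0x0590, 0x05FF)]),
    ("Arabic fallback", [(0x0600, 0x06FF)]),
    ("CJK fallback", [(0x4E00, 0x9FFF)]),
]

for ch in ["A", "\u05d0", "\u0627", "\u4e2d"]:
    print(hex(ord(ch)), pick_font(ord(ch), chain))
```

Adding another script then means appending an entry to the chain rather than merging more glyphs into a single fallback font.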
kens | NTQ I didn't say it was impossible,I said 'easily' | 13:32.35 |
| Say I draw some text, then paint some more stuff, then create a transparency group and draw through it. If part of that group intersects the text, then the text must be rendered to an image | 13:33.30 |
| But by the time we get to the transparency operation, we've already stored the text in the output PDF file. | 13:33.50 |
| It's not impossible to preparse the entire PDF file, but it would mean totally rewriting our PDF output device, and frankly that's not going to happen | 13:34.31 |
| The benefit is small, the cost is huge | 13:34.47 |
NTQ | kens: Alright. Thank you. I fully understand now. | 13:35.11 |
chrisl | We can produce PDF/A-2b IIRC | 13:35.55 |
kens | We can, yes | 13:36.02 |
| Possibly even PDF/A-3 now I think | 13:36.16 |
tor8 | Robin_Watts: on tor/master there's a quick fix that merges DroidSansArabic and Hebrew into the CJK fallback font | 13:59.21 |
Robin_Watts | Ta. | 13:59.41 |
tor8 | Robin_Watts: there's also a "direction" property in CSS that I don't currently pass on | 14:00.35 |
| Robin_Watts: cocked up something with the encoding in that one, there's a new version of the commit up now | 14:13.11 |
Robin_Watts | is just boggling at these html structures. | 14:51.31 |
| surely they take a HUGE amount of memory ? | 14:52.26 |
| 44 bytes for every flow entry. And there is a flow entry for every word, plus another for every space. | 14:53.14 |
| The type and expand can be combined into a flags word. | 14:55.11 |
tor8 | Robin_Watts: not to mention just how damned many of them there are! the fz_html_flow struct is overdue for a diet | 14:55.34 |
Robin_Watts | I reckon we should be reference counting styles and sharing them where possible. | 14:55.59 |
tor8 | Robin_Watts: the *style is a pointer to the box's embedded struct | 14:56.13 |
Robin_Watts | Ok, so can't we just omit that and always pass both a box pointer and a flow pointer ? | 14:56.45 |
tor8 | a flow box is always a child of a block box | 14:57.23 |
Robin_Watts | fz_html_flow is always a child of an fz_html, you mean ? | 14:57.56 |
tor8 | but the inline boxes are also children of the block box, but the text content of the inline box lives in their sibling flow box | 14:57.59 |
| and fz_html_flow is a child of a fz_html with the FLOW_BOX type | 14:58.21 |
| but the flow->style does not necessarily point to the parent box's style | 14:58.35 |
| it may point to its uncle or cousin box's style | 14:58.48 |
Robin_Watts | So... if I have <p>Mary had a <b>little</b>lamb</p> | 14:59.42 |
| we'd have an inline box for the <b> section, then a flow box with "Mary" " " "had" " " "a" " " "little" " " "lamb" | 15:00.56 |
tor8 | you get a box tree: { block[p] { inline(b) {}, flow {"Mary had a ", "little", "lamb" } } | 15:01.03 |
| yeah | 15:01.05 |
Robin_Watts | and the style for "little" would point to the inline box. | 15:01.09 |
tor8 | yeah. I figured I'd save a *little* bit of memory (considering how much I already waste) by not making every flow node have its own style | 15:01.56 |
Robin_Watts | Gotcha. | 15:02.08 |
tor8 | the inline boxes I don't use for anything other than creating the flow nodes, but I have to keep them around just because they hold the styles | 15:02.20 |
Robin_Watts | tor8: I'd consider having a global style dictionary. | 15:02.28 |
tor8 | the inline boxes are needed for the css matching | 15:02.29 |
| Yeah. that'd probably save a fair bit of memory. | 15:02.47 |
Robin_Watts | and then instead of having pointers to the style, have indexes into the dictionary. | 15:02.52 |
tor8 | considering that each html node may have unique style attributes, but the vast majority of them will be shared | 15:03.14 |
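The style dictionary idea amounts to interning: identical styles are stored once and nodes keep a small index instead of a pointer. A minimal sketch, with a hypothetical class name and styles modelled as plain dicts of hashable values rather than fz_css_style structs:

```python
class StyleTable:
    """Stores each distinct style once and hands out integer indexes."""

    def __init__(self):
        self._index = {}   # canonical style key -> index
        self.styles = []   # index -> style

    def intern(self, style: dict) -> int:
        key = tuple(sorted(style.items()))  # order-independent, hashable key
        if key not in self._index:
            self._index[key] = len(self.styles)
            self.styles.append(dict(style))
        return self._index[key]


table = StyleTable()
plain = table.intern({"font-weight": "normal"})
bold = table.intern({"font-weight": "bold"})
plain_again = table.intern({"font-weight": "normal"})
print(plain, bold, plain_again, len(table.styles))
```

Because most nodes share a handful of styles, the table stays small while every flow node carries only an index.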
Robin_Watts | Would we still need the inline boxes then? | 15:03.30 |
tor8 | no, then we could free them once we're done | 15:03.41 |
| but everything is allocated using the pool allocator now | 15:03.56 |
Robin_Watts | We could allocate inlines using a different pool allocator. | 15:04.11 |
| and then free that pool. | 15:04.21 |
tor8 | yeah. | 15:04.23 |
| the fz_css_style could use bitfields for a lot of its fields | 15:04.47 |
Robin_Watts | Paragraphs never extend outside a flow block, right? | 15:05.15 |
tor8 | and a lot of the flow properties are computable with a bit of care, so don't need to be stored | 15:05.31 |
| define "extend" | 15:05.52 |
Robin_Watts | block { flow { "This is a different paragraph" } block { flow "to this" } } | 15:06.14 |
| block { flow { "This is a different paragraph" } block { flow { "to this" } } } | 15:06.30 |
| When computing the directions of the text, I need to pass whole paragraphs to the code at once. | 15:07.09 |
tor8 | a single paragraph is never split into multiple flow boxes | 15:07.24 |
Robin_Watts | That means passing the contents of a whole 'flow' at once, never having to combine multiple flows together. | 15:07.33 |
| Cool. | 15:07.35 |
tor8 | a line break is always at the end of a flow box | 15:07.47 |
| A smarter/dumber way is to not have the flow nodes at all and just have an array of where the spaces and breaks are in the text | 15:09.23 |
| it'll mean more work during rendering, but would save huge amounts of memory | 15:09.34 |
| and assign styles to spans of text | 15:10.07 |
| so the flow box would look something like struct { char *text; char **spaces; char **breaks; style *styles; char **style_starts; } | 15:11.05 |
| and then another array of breaks actually taken | 15:11.49 |
Robin_Watts | Or, make use of some of the utf8 unused codes. | 15:11.52 |
tor8 | or just plain old escape codes | 15:12.25 |
Robin_Watts | so char *text becomes a list of either valid utf8 codes, or invalid ones that act as escapes for 'break', 'change style' etc. | 15:12.40 |
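A small sketch of that encoding, using the fact that the bytes 0xFE and 0xFF never occur in valid UTF-8. The specific assignments (0xFF for a break, 0xFE followed by a one-byte style index) are assumptions made for illustration, which also limits this toy version to 256 styles.

```python
BREAK = 0xFF  # never appears in valid UTF-8
STYLE = 0xFE  # never appears in valid UTF-8; followed by one style-index byte


def encode_flow(items):
    """Pack ('text', str), ('style', int) and ('break', None) items into bytes."""
    out = bytearray()
    for kind, value in items:
        if kind == "text":
            out += value.encode("utf-8")
        elif kind == "break":
            out.append(BREAK)
        elif kind == "style":
            out += bytes([STYLE, value])  # raises ValueError if value > 255
        else:
            raise ValueError(f"unknown item kind: {kind}")
    return bytes(out)


def decode_flow(data):
    """Inverse of encode_flow."""
    items, text, i = [], bytearray(), 0
    while i < len(data):
        b = data[i]
        if b in (BREAK, STYLE):
            if text:  # flush pending text before the escape
                items.append(("text", text.decode("utf-8")))
                text = bytearray()
            if b == BREAK:
                items.append(("break", None))
                i += 1
            else:
                items.append(("style", data[i + 1]))
                i += 2
        else:
            text.append(b)
            i += 1
    if text:
        items.append(("text", text.decode("utf-8")))
    return items


flow = [("style", 1), ("text", "Mary had a "), ("style", 2), ("text", "little"), ("break", None)]
print(encode_flow(flow))
```

The encoded form costs one or two bytes per escape, compared with a full flow node per word and per space.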
tor8 | though I think we should hold off optimizing this too much until we've implemented a bit more | 15:13.52 |
| bidi, floating around images, tables, hyphenation and tex-style global line breaking optimization | 15:14.18 |
Robin_Watts | tor8: Yeah. | 15:14.25 |
tor8 | this structure is wasteful, but it's designed for rapid prototyping | 15:14.40 |
Robin_Watts | I need to add a direction flag to fz_html_flow_s. | 15:14.46 |
| so to do that I'll move expand and type and direction into a single bitfield. | 15:15.02 |
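Roughly what combining those three fields into one flags word could look like. The bit layout below is invented for illustration (two bits for the node type, one for expand, one for direction) and is not taken from the fz_html_flow struct.

```python
TYPE_MASK = 0b0011   # bits 0-1: node type (hypothetical values 0..3)
EXPAND_BIT = 0b0100  # bit 2: expand flag
RTL_BIT = 0b1000     # bit 3: direction, set for right-to-left


def pack_flags(node_type: int, expand: bool, rtl: bool) -> int:
    """Combine type, expand and direction into a single small integer."""
    if not 0 <= node_type <= TYPE_MASK:
        raise ValueError("node_type does not fit in two bits")
    return node_type | (EXPAND_BIT if expand else 0) | (RTL_BIT if rtl else 0)


def unpack_flags(flags: int):
    """Return (node_type, expand, rtl) from a packed flags value."""
    return flags & TYPE_MASK, bool(flags & EXPAND_BIT), bool(flags & RTL_BIT)


packed = pack_flags(2, expand=True, rtl=False)
print(bin(packed), unpack_flags(packed))
```

All three fields then fit in a single byte per node instead of three separate members.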
tor8 | Robin_Watts: sounds good. | 15:15.12 |
| you could put text and image in a union | 15:15.36 |
Robin_Watts | will do. | 15:16.04 |
tor8 | the x,y,w,h stuff is used for line layout so needs to stay | 15:17.51 |
| the 'em' is calculated from the style, but depends on the tree context and the current font size set during layout so needs to be stored as well | 15:18.59 |
Robin_Watts | tor8: I understand the need for w and h (to avoid repeated measuring). I don't get the need for x and y to be in the structure. | 15:19.42 |
tor8 | it's where the layout puts them so the drawing code can draw the node without redoing the layout | 15:20.11 |
| Robin_Watts: one way to skip the x,y,w,h,em fields would be to create the fz_text node during layout instead of during drawing | 15:27.50 |
HenryStiles | Robin_Watts, mvrhel_laptop: have you guys tried to login to RSA? I went through the entire process and now it says it doesn't know me. Pretty sure I did everything right. | 16:33.38 |
Robin_Watts | HenryStiles: Yeah, worked for me. | 16:36.11 |
| Well, I've registered etc, if that's what you meant. | 16:36.36 |
HenryStiles | huh, it worked the second time around. | 16:40.50 |
kub | mvrhel_laptop: hello | 17:27.04 |
| mvrhel_laptop: how is GS Even Tone Screening invoked. We need it with 8/16bit CMYK for producing contone colors in 4 levels of gray. | 17:41.27 |
Robin_Watts | kub: Hi. Are you a commercial customer of Artifex? | 17:48.35 |
jogux | Robin_Watts: jub is the person kens was talking to this morning who hasn't yet had a reply from sales@, iirc. | 17:50.04 |
| kub, even, sorry. | 17:50.17 |
kens | scott has replied, I've seen the email | 17:50.30 |
jogux | ah, I'd also not noticed kens had reconnected :) | 17:50.44 |
kens | yeah network is flaky | 17:50.57 |
Robin_Watts | Ok. I was interested to know if we (Artifex) had supplied the separate ETS code to kub, or whether he was trying to use the version of it that's pickled into the rinkj device in gs. | 17:51.40 |
kens | We won't have supplied any new code, at least as yet | 17:52.26 |
Robin_Watts | Ok, so I would expect it to be quite hard for kub to do any serious evaluation until he gets the latest version from us. | 17:53.02 |
kens | kub did you get an email from Scott Sackett ? | 17:54.49 |
| OK I'm off for the night, have a good weekend everyone | 18:05.19 |
kub | bbl | 18:45.27 |
| Robin_Watts: not yet, but interested in becoming a commercial one | 19:06.42 |
| kens: Scott Sackett did send me an email, and I replied. | 19:07.12 |
Robin_Watts | kub: OK. It sounds like we need to get you a copy of the latest code for evaluation. I'm not sure I'm authorised to just send it out. | 19:08.11 |
kub | Robin_Watts: is the ETS (EvenTone Screening) not inside AGPL GS? | 19:08.12 |
Robin_Watts | HenryStiles: What's the process? | 19:08.20 |
| kub: There is an old version of the code in gs, as part of the rinkj device. | 19:08.40 |
| If you're doing customisation and tuning, then we have a standalone version that is probably easier to work with. | 19:09.21 |
kub | aha | 19:09.21 |
| ok | 19:09.28 |
| -sDEVICE=rinkj -dGrayValues=4 gives a rangecheck error and without -dGrayValues I get a crash | 19:15.06 |
| Robin_Watts: yes, the ETS code is appreciated for evaluating | 19:16.04 |
Robin_Watts | kub: I need to get the OK from HenryStiles to send it out. His OK may or may not be conditional on getting a signed evaluation agreement between you and Scott. | 19:17.10 |
kub | ah, git cloned, but that appears not sufficient from your wording | 19:21.09 |
| Robin_Watts: will wait for your ping or/and email | 19:22.03 |
HenryStiles | Robin_Watts: sorry at lunch, it's fine to send it. | 19:47.26 |
Robin_Watts | kub: Email address? | 20:14.02 |
HenryStiles | it's fine not having the latest stuff out but is ets the reason for rinkj not working? I guess we don't know that. | 20:27.21 |
| rinkj should work. | 20:27.35 |
Robin_Watts | I have never used rinkj in my life. | 20:29.32 |
kub | rinkj appears to need some setup, which I omitted | 20:30.31 |
| http://ghostscript.com/doc/current/Devices.htm#Rinkj | 20:31.14 |
HenryStiles | kub: breaks for me with setup too. | 20:36.30 |
| kub: so the first sentence of the documentation is spot on ;-) | 20:40.01 |
| Robin_Watts: the device uses a color manager directly, and it crashes lcms. What is truly bizarre is that we make this call when lcms_deshandle is NULL: des_color_space = cmsGetPCS(lcms_deshandle). But the first thing that function does is dereference lcms_deshandle, so something is awry in the code generally (rinkj aside) | 20:54.14 |
| mvrhel_laptop: ^^^ | 20:54.19 |
| gsicc_lcms2.c:523 des_color_space = cmsGetPCS(lcms_deshandle); | 20:55.53 |
| sorry I'll rewrite that gibberish if needed | 20:57.57 |
mvrhel_laptop | kub are you still there | 22:28.25 |
| HenryStiles: I was able to login to the RSA, but it appears that I already had an account with my artifex email | 22:29.10 |
| HenryStiles: It looks like rinkj is really screwed up. I will see if I can get it working after I finish up this font stuff in mupdf | 22:32.15 |
HenryStiles | mvrhel_laptop: a little more worried about gsicc_lcms2.c, maybe that case can never happen? | 22:58.44 |
mvrhel_laptop | oh let me look hold on | 22:59.04 |
| that makes no sense.. hold on | 23:01.02 |
| HenryStiles: I am going to have to take a closer look into this to understand when or how this case could occur. My comment /* We must have a device link profile. */ is a clue. | 23:05.09 |
HenryStiles | mvrhel_laptop: don't interrupt your mupdf stuff, but I thought you'd want to know about it. | 23:07.30 |
mvrhel_laptop | HenryStiles: thanks. I suspect it is supposed to be lcms_srchandle in line 526 but I will take a closer look at it later. | 23:08.01 |
| I will open a bug to remind myself | 23:08.07 |
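The pattern HenryStiles describes (a branch taken because a handle is NULL, which then passes that same handle to a function that dereferences it) can be reduced to the sketch below. It follows mvrhel_laptop's guess that the source handle was intended in the NULL branch. All the names here (`profile`, `profile_pcs`, `pick_color_space`) are hypothetical stand-ins; this is not the code in gsicc_lcms2.c and does not use the lcms API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a colour profile handle. */
typedef struct { int pcs; } profile;

/* Stand-in for an accessor that dereferences its argument
 * unconditionally, so it must never be handed NULL. */
static int profile_pcs(const profile *p)
{
	assert(p != NULL);
	return p->pcs;
}

/* With no destination profile (the "device link" case in the
 * comment quoted above), query the source handle instead of the
 * NULL destination handle. Assumption: this is the intended fix. */
static int pick_color_space(const profile *src, const profile *des)
{
	assert(src != NULL);
	if (des == NULL)
		return profile_pcs(src);
	return profile_pcs(des);
}
```

Whether the NULL case can occur at all in practice is the open question in the conversation; the sketch only shows a form of the branch that would be safe if it does.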