Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

	20160115
tor8	chrisl: thanks for helping mvrhel with his CMap confusions!	00:15.49
	Robin_Watts: <br> is tricky, but it should work...	00:16.30
Robin_Watts	<p>foo<br>bar</p><p>baz</p> is rendered as foo\nbaz	00:23.51
	<p>foo<br></br>bar</p><p>baz</p> is rendered as foo\nbar\nbaz	00:24.12
	tor8: Let's talk about this stuff tomorrow. Too late now.	00:24.31
bofh_	ugh so I decided to try to implement something like xpdf's pdftotext -layout option to the span extraction and this has revealed to me just how complicated of a mess pdf structured text extraction is.	02:31.57
	also, unrelated: is there any benefit to caching decoded images/pixmaps? on several test systems I had there was none (in fact it was often slower), but these were all x86_64 desktops or fast armv7 boards.	02:32.49
	no idea if it differs on mobile, but that should always be memory-bound, but re-decoding means less data copied than pulling a decoded copy out of the store, at least for anything non-tiny	02:33.32
	(this is all mupdf btw)	02:33.46
kub	hi, I want to obtain cmyk raster with 3 drop sizes.	09:19.49
	What works: -sProcessColorModel=DeviceRGB -sDEVICE=ppmraw -dGrayValues=3	09:20.06
kens	So you are talking about Ghostscript	09:20.18
	I have no idea what you mean when you say '3 drop sizes'	09:20.31
kub	yes	09:20.34
kens	And if you use ProcessColorModel=DeviceRGB then you are not producing CMYK	09:20.53
chrisl	Also, ppm is an RGB raster format	09:21.16
kub	with DeviceCMYK I obtain a Unrecoverable error: rangecheck in .putdeviceprops	09:21.23
kens	That's because (as Chris just said) ppmraw is an RGB format	09:21.42
	So you can't use CMYK with it	09:21.48
kub	the tiff* devices produce all the rangecheck error	09:21.49
kens	You will need to supply an example file and command line for us to look at it. Probably best if you just open a bug report.	09:22.17
	And I still don't know what you mean by 'drop sizes'	09:22.28
kub	:-) 3 drop sizes can our InkJet head print	09:23.25
kens	I doubt you can have 3 shades of gray	09:23.58
	So I presume there's a 'nothing' drop size for a total of 4 values	09:24.16
	So you need 2 bpp	09:24.22
	By the way, are you a commercial customer ?	09:25.03
	Setting GrayValues to 3 looks like an invalid number, it should be 1, 2, 4, 8	09:25.46
chrisl	FWIW, doing a simple "showpage", the above command line (with the addition of an output file) works without an error for me, with the current version	09:25.57
kens	Looks like GrayValues should be 4 in this case	09:26.39
	Goodness knows what that looks like with halftoning	09:26.54
kub	gs -q -dPARANOIDSAFER -dNOPAUSE -dBATCH -r600x600 -sProcessColorModel=DeviceRGB -sDEVICE=ppmraw -dGrayValues=3 -sOutputFile=temp.ppm ~/.local/share/ghostscript/9.18/examples/text_graph_image_cmyk_rgb.pdf	09:27.43
	that works -^	09:27.54
kens	But it doesn't do what you think it does.	09:28.13
kub	maybe	09:28.21
kens	It's not CMYK output and I doubt that it is 2 bits per pixel either	09:28.34
kub	is in GS some documentation about halftoning, diffusion, levels of gray parameters, I did not find so far	09:29.03
kens	kub please answer the earlier question, are you a commercial customer, and if not are you representing a printer manufacturer (You say 'our print head' for example)	09:29.14
	Ghostscript implements the PostScript halftoning method and there are specific Ghostscript techniques	09:29.54
	Please answer the previous question	09:30.02
kub	I am R&D and contacted sales@ but obtained no reply so far. ATM I evaluate different RIP's including GS	09:31.50
kens	OK that's fine. When did you contact sales ?	09:32.13
chrisl	I think the documentation about GrayValues is misleading: "-sDEVICE=ppmraw -dGrayValues=16 will make this the default device and set the number of bits per component to 4" - PPM only works in 1 or 2 byte samples.	09:32.21
kens	Sometimes they lose emails, if you haven't heard we can poke them for you	09:32.24
kub	11.1.	09:32.35
	kens thanks, would be glad to read back	09:33.11
kens	4 days is too long, can you forward the email to support (@artifex.com) and I will ensure they contact you	09:33.14
	What version of GS are you using, and on what platform ? (Linux, Windows, something else)	09:33.41
kub	kens - forward done; I develop mainly on Linux and will deploy Windows	09:34.51
kens	OK but are you using Linux right now ? We need to reproduce your problem before we can help you	09:35.22
kub	ys	09:35.29
	yes	09:35.32
kens	OK which Linux, and where did you get the version of GS you are using ? Did you get a package or build it yourself ? What version is it ? and are you using the 64-bit version ?	09:36.13
	Better yet, post the whole back channel output when you run the failing setup	09:36.42
	And please let us know the command line that doesn't work	09:36.58
kub	oS-Leap41 with gs-9.18 tar ball	09:37.11
kens	Because I don't think ppmraw is going to be a good format for you	09:37.12
chrisl	And preferably stop using "-q"	09:37.22
kens	I'm assuming that your printer is CMYK, so you need to use either a separating device (4 rasters, one each of C, M, Y and K) or a composite CMYK device. TIFF seems like a good choice for an evaluation. So let's get that working.	09:38.45
kub	http://www.behrmann.name/temp/gsgrayvalues3.output.txt first version	09:39.51
kens	Can you do that without -q please	09:40.12
kub	bbl	09:40.16
chrisl	tiffsep does not support GrayValues, I think	09:41.44
kens	Possibly true	09:41.57
	Which might explain the rangecheck error	09:42.05
	We must have some way to produce CMYK halftoned output though	09:42.41
	bitcmyk maybe ?	09:43.00
chrisl	I think bitcmyk is the only way to get 2 bit CMYK. tiffsep1 will produce 1bpp halftoned output	09:43.55
kub	chrisl http://www.behrmann.name/temp/gsgrayvalues3withoutPoption.output.txt	09:44.32
kens	Well he'll need 2 bit if he wants 3 drop sizes (plus none of course)	09:44.33
	kub the first thing is that is 9.15, not 9.18	09:44.54
kub	I was ;-)	09:45.02
	I saw	09:45.09
kens	So you aren't using the version you think you are :-)	09:45.23
	I'd guess that is a version of GS bundled into the operating system	09:45.36
	From what Chris and I can see you need to use the bitcmyk device and GrayValues=4 in order to get CMYK rasters, halftoned, with 4 values (0, 1, 2, 3)	09:46.16
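The advice above (bitcmyk with GrayValues=4, so 2 bits per component for "none" plus three drop sizes) can be sketched as a small command-line builder. This is only an illustration: the helper names `bits_per_component` and `bitcmyk_args` are hypothetical, the file names are placeholders, and the flags are the ones quoted in this log rather than a checked reference invocation.

```python
def bits_per_component(levels):
    """Smallest number of bits that can hold `levels` distinct values."""
    return max(1, (levels - 1).bit_length())


def bitcmyk_args(input_pdf, output_raw, levels=4, dpi=600):
    """Build a Ghostscript argument list using only flags quoted in the log.

    Hypothetical helper: it does not run gs, it only assembles the arguments.
    """
    return [
        "gs",
        "-dNOPAUSE",
        "-dBATCH",
        f"-r{dpi}x{dpi}",
        "-sDEVICE=bitcmyk",
        f"-dGrayValues={levels}",
        f"-sOutputFile={output_raw}",
        input_pdf,
    ]


if __name__ == "__main__":
    # 3 drop sizes plus "no drop" gives 4 levels, which needs 2 bits.
    print(bits_per_component(4))
    print(" ".join(bitcmyk_args("input.pdf", "output.raw")))
```

Note that -q is left out on purpose, following the request earlier in the log to keep the back channel output visible.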
kub	checked with 9.18 its the same rangecheck message with different version info	09:46.38
	I'll try,	09:47.00
kens	kub it looks like our mail server marked your mail to sales as spam, I'm just forwarding it on now. If you don't hear back in a day or so, please do let me know and I'll poke them again.	09:47.12
	If you give me a minute, I'll try and come up with a command line for you (unless Chris beats me to it)	09:47.31
chrisl	Note that bitcmyk will output raw data - it won't be wrapped in any image file format	09:47.43
kub	-sDEVICE=bitcmyk -dGrayValues=4 gives no error	09:48.15
kens	OK well that should be producing what you need, but as raw pixels, as Chris says, it's not an image format	09:49.50
	I have no idea what you could use to read that	09:50.15
	There are other screening possibilities, in addition to the standard PostScript methods, but I'm not really up to date with them.	09:51.19
	I guess the first thing is to experiment with this output format, and come back when you have more questions. We'll do our best to answer them, or get answers for you from the developers who know more about the screening (they are in the US so won't be here for a few hours)	09:52.43
chrisl	Photoshop can read raw files, but I don't know about 2bpp	09:52.48
kens	chrisl is the output separate files or one composite ?	09:53.07
kub	I can wrap the bits, np	09:53.29
kens	Oh, OK	09:53.35
chrisl	IIRC, the bit* devices are composite	09:53.41
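Since the raw output has no file wrapper, reading it back means unpacking the bits by hand. The sketch below assumes a layout that this log does not confirm: one byte per pixel, holding four 2-bit levels in the order C, M, Y, K from the most significant bits down, with rows stored back to back and no padding. The function names are hypothetical; check the real layout against actual output before relying on it.

```python
def unpack_pixel(byte):
    """Split one byte into four 2-bit levels (assumed order: C, M, Y, K)."""
    return ((byte >> 6) & 3, (byte >> 4) & 3, (byte >> 2) & 3, byte & 3)


def unpack_raster(data, width):
    """Turn raw bytes into rows of (c, m, y, k) tuples.

    Assumes one byte per pixel and no row padding (both are assumptions).
    """
    rows = []
    for start in range(0, len(data), width):
        rows.append([unpack_pixel(b) for b in data[start:start + width]])
    return rows


if __name__ == "__main__":
    sample = bytes([0xFF, 0x00, 0x1B, 0xE4])  # made-up 2x2 test raster
    for row in unpack_raster(sample, 2):
        print(row)
```

Each level 0..3 would then map to "no drop" or one of the three drop sizes.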
*kens*	can't decide if composite or separated is easier :-)	09:54.06
	kub is that enough for you to start with ?	09:54.18
kub	It's a start to estimate how beautiful the raster will be printed	09:56.28
kens	OK then I suggest you start that way, and we'll ask our colour expert about screening this afternoon when he comes online.	09:57.06
kub	thanks for your help	09:57.32
kens	If you can come back in about 6 or 7 hours then we can give you some more help	09:57.34
	Or drop in on Monday and we'll tell you what we've found out:-)	09:57.49
	kub your web site appears to be non functional this morning. All I get is a blank page....	10:15.18
kub	kens http://www.behrmann.name looks fine from at least two locations	10:17.51
chrisl	kub: I think kens was meaning dropjet.com	10:22.11
kens	Yes, http://www.dropjet.com doesn't do anything for me	10:26.06
tor8	Robin_Watts: a lone BR tag in xhtml must be closed <br/>	10:26.20
Robin_Watts	tor8: Ah, so we need to use <br></br> then.	10:26.56
	That does work.	10:27.01
tor8	so the "<p>foo<br>bar</p><p>baz</p>" is parsed as "<p>foo<br>bar</br><p>baz</p>[and implicit </p>]" since I cheat and don't actually check that a closing tag matches	10:27.45
	Robin_Watts: or just <br/>	10:28.04
Robin_Watts	tor8: right.	10:28.34
chrisl	tor8: So, I'm hoping I helped clarify the CIDFont/CMap/cmap/ToUnicode things for mvrhel_laptop, and not made things worse!	10:29.17
Robin_Watts	So, experimenting with the code, I see that: <p>A<b>B</b>C</p> results in 3 calls to generate text with "A" "B" and "C" respectively.	10:29.35
tor8	chrisl: everything you said was perfectly clear and accurate! (to me, at least... I guess I also have font related job security...)	10:29.53
kens	But do you want it.....	10:30.33
Robin_Watts	The bidirectional algorithm needs to be passed whole lines (or paragraphs) at a time because the directionality of certain chars depends on context.	10:30.45
	So I reckon it needs to be done at a higher level than generate_text.	10:31.12
tor8	Robin_Watts: can we keep the current paragraph directionality as an in-out parameter that gets passed to generate_text?	10:32.09
Robin_Watts	tor8: That won't help.	10:32.26
tor8	or do you need to look both before and after the current char to determine directionality?	10:33.15
Robin_Watts	Some chars have different directionality according to the stuff around them, not just the 'current paragraph directionality'.	10:33.15
	"A" "(" "B" for example.	10:33.37
tor8	you could put those dependent bits in a fragment of its own and set the directionality bit to 'depends' in generate_text, and resolve it as a post process?	10:33.55
Robin_Watts	The "(" is only L2R if B is.	10:33.58
tor8	it'll mean splitting unneccessarily (gah, I can't spell that word) but I think that'll be easier than splitting afterwards	10:35.01
kub	kens chrisl - indeed that link is down, thanks for letting me know	10:37.15
kens	NP	10:37.21
tor8	Robin_Watts: so, the lex_number diff is on tor's bmpcmp now	10:37.35
kens	I just thought I'd look at dropJet's products to get some familiarity, I was able to get a good overview from the ESMA site though	10:37.56
tor8	Robin_Watts: most of them actually look like progressions	10:39.54
Robin_Watts	tor8: the rules for exactly how to recognise fragments are hard. That's why everyone uses the same piece of example code from unicode to do it.	10:40.48
	My preference, I think would be to do a pass over the text after we've parsed it.	10:41.11
tor8	okay, then we'll need to either split flow nodes during that pass or keep directionality per character rather than per flow node	10:42.08
Robin_Watts	Run through the flow a paragraph at a time, gathering the text up. Feed that text into the unicode algorithm to get directions out, and split the flow accordingly.	10:42.31
	So at the end of that pass we have the boxes tagged with appropriate directions.	10:42.58
	Then layout just needs to be updated to cope with using those directions.	10:43.13
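The post-process pass described above (gather a paragraph's text, resolve directions, split the flow into runs) can be illustrated with a toy resolver. This is a drastic simplification and not the Unicode bidirectional algorithm or MuPDF code: it only looks at strong characters via `unicodedata.bidirectional`, treats everything else (including digits) as neutral, and gives a neutral run the direction of its neighbours when they agree, otherwise the paragraph's base direction. All function names are hypothetical.

```python
import unicodedata


def strong_dir(ch):
    """Return 'L' or 'R' for strongly directional characters, else None."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    return None


def resolve_directions(paragraph, base="L"):
    """Assign a direction to every character of one whole paragraph."""
    dirs = [strong_dir(ch) for ch in paragraph]
    n = len(dirs)
    i = 0
    while i < n:
        if dirs[i] is not None:
            i += 1
            continue
        j = i
        while j < n and dirs[j] is None:
            j += 1
        before = dirs[i - 1] if i > 0 else base
        after = dirs[j] if j < n else base
        fill = before if before == after else base
        for k in range(i, j):
            dirs[k] = fill
        i = j
    return dirs


def split_runs(paragraph, base="L"):
    """Split the paragraph into (direction, text) runs."""
    runs = []
    for ch, d in zip(paragraph, resolve_directions(paragraph, base)):
        if runs and runs[-1][0] == d:
            runs[-1] = (d, runs[-1][1] + ch)
        else:
            runs.append((d, ch))
    return runs
```

The "(" in "A(B" cannot be classified on its own, which is why the pass needs the whole paragraph rather than one generate_text fragment at a time.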
tor8	Robin_Watts: right. so same result as I was thinking of, but done as a separate post-process by splitting nodes	10:44.27
Robin_Watts	yes.	10:44.46
tor8	okay. good.	10:47.47
Robin_Watts	How do you explain the difference in 9 ?	10:47.51
	28 looks like a clear progression.	10:47.51
	34 too.	10:47.51
kens	I see 2 clear progressions, a bunch of pixely 'who cares' diffs and some oddities	10:48.32
	KenEg catx5720.pdf	10:48.52
	Oh damn	10:48.52
	I mean catx5720.pdf	10:49.00
	#80 looks like a progression too, not surprising given the bug title	10:50.03
	Also 86	10:50.20
	92 looks worse	10:50.56
chrisl	We should probably edit that out of the logs ^^	10:51.01
kens	Yeah please, if someone could do that, sorry	10:51.16
chrisl	Robin_Watts: can you do the honours please?	10:51.38
Robin_Watts	Ok, so if you're happy with 9, 80, 88 and 92, then I'm happy.	10:52.09
tor8	Robin_Watts: syntax error in the font descriptor object	10:52.14
kens	For number 92 MuPDF matches GS but not Acrobat, this may be a first example of Acrobat doing something other than setting to 0	10:52.19
tor8	it has "/ItalicAngle -17.-21823" which got turned into 3 tokens	10:52.29
	and then parsing failed because -21823 is not a valid dictionary key	10:52.42
kens	It 'looks like' Acrobat turns '--' into '-'	10:52.57
Robin_Watts	so, -17.-21823 goes to -17.21823 ?	10:53.26
kens	No,	10:53.37
	--17.2 goes to -17.2	10:53.47
	~92 has lots of values with double negatives	10:54.00
tor8	no, it goes to "-17." and the rest is discarded	10:54.03
*Robin_Watts*	edits logs.	10:54.12
*kens*	defers to tor	10:54.13
	GS interprets the '--' as invalid and sets the numbers to 0	10:54.35
	Which gives the same result as MuPDF	10:54.45
	However Acrobat differs	10:54.54
	It looks to me like Acrobat is turning the '--' into a '-' and using the rest of the number	10:55.12
tor8	kens: I'm still talking about number 9	10:55.34
kens	Oh sorry, I was talking about #92	10:55.46
	Which is actually a slight regression	10:56.17
tor8	kens: so for 92, if I change the '-' detection to a while loop it looks like the pdf creator intended	10:57.47
	so instead of if (c == '-') I just while (c == '-') to eat them all	10:58.13
kens	So you truncate the '--' back to a '-' ?	10:58.15
	Right	10:58.19
	It's the first instance I've seen where a malformed number is corrected instead of set to 0	10:58.36
tor8	but I don't actually negate the sign twice	10:58.38
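The change discussed here (a while loop that eats every leading '-', with the sign applied only once) can be sketched as below. This is not MuPDF's lexer; `lex_number` is a hypothetical stand-in that returns the value together with the number of characters consumed, and it leaves any trailing garbage for the caller to discard.

```python
DIGITS = "0123456789"


def lex_number(s):
    """Parse a leading number from s, returning (value, chars_consumed).

    Any run of '-' signs is eaten, but the sign is only applied once,
    so '--17.2' reads as -17.2 rather than +17.2.
    """
    i = 0
    negative = False
    while i < len(s) and s[i] == "-":  # 'while', not 'if': eat them all
        negative = True
        i += 1
    start = i
    while i < len(s) and s[i] in DIGITS:
        i += 1
    if i < len(s) and s[i] == ".":
        i += 1
        while i < len(s) and s[i] in DIGITS:
            i += 1
    text = s[start:i]
    value = float(text) if text not in ("", ".") else 0.0
    return (-value if negative else value), i
```

With this sketch "-17.-21823" stops after "-17.", which matches the behaviour described for number 9 above.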
kens	Yeah that's what I think is 'correct' or at least 'same as Acrobat'	10:58.57
	I guess I'll have to try and do that in GS as well :-(	10:59.20
	For #9 I don't see a great difference even with the GS output which sets the -18 to 0	10:59.39
	err -17 that is	10:59.55
tor8	kens: the font is embedded so I don't expect the -17 to actually show up in the render	11:00.18
	we dropped the embedded font because we got confused while parsing the dictionary and errored out	11:00.42
kens	Hmm, OK but I thought it might affect GS, seems it doesn't	11:00.42
	Oh OK	11:00.52
Robin_Watts	tor8: Are you up for the parliament trip in June?	11:01.18
kens	overall I'd say it's a distinct improvement, and if you treat '--' as '-' it's even better	11:01.20
tor8	85 is probably the best test to see if adobe sets to 0 or parses the initial bit	11:02.10
kens	Hmm, let me look at that one again	11:02.38
	Acrobat actually throws a warning	11:03.24
	But it has only the largest square	11:03.33
	So it is different to MuPDF and GS	11:03.48
tor8	kens: 40.-40 40+60 160 160-1 re s 80e0 80.abc 80 80 re s	11:04.00
*kens*	bangs head on table	11:04.19
	Obviously a hand-broken file	11:04.33
tor8	yes, it is. but it would show what acrobat does. what kind of warning does it toss out?	11:04.54
kens	the usual 'something is wrong and the page may not display as expected'	11:05.14
tor8	right, I forgot how useful Adobe's errors are :)	11:05.32
kens	From prior experience, Acrobat stops processing when it throws that error	11:05.33
	So some part of that broken rect is causing Acrobat to give up	11:06.04
	I could repair bits of it to see where it stops I guess	11:06.17
tor8	Robin_Watts: not sure; how soon do I have to make up my mind?	11:06.20
kens	It looks like it's OK with the first rect, but the second it throws an error	11:06.52
	I'm guessing it's the .abc	11:07.01
tor8	kens: I suspect the 80e0 since it looks hexadecimal	11:07.30
kens	Give me a second, just changing it	11:07.40
tor8	but if not, then both 80e0 and 80.abc should be the same, number followed by a word	11:08.00
kens	If I take out the e0 then it just displays the first rectangle	11:08.07
tor8	80 then e0 and 80. then abc	11:08.09
Robin_Watts	tor8: Just trying to get an idea of numbers.	11:08.23
kens	So it looks like the e0 actually makes Acrobat error out	11:08.29
Robin_Watts	I'll put you down as a maybe.	11:08.41
kens	Interesting, the 80.abc is not treated as 0	11:09.34
tor8	Robin_Watts: Thanks. My disdain for politicians and everything they do might be overcome by the group's enthusiasm.	11:09.48
kens	Nor is it treated as 80, wtf ?	11:10.03
tor8	kens: huh, that's .... odd	11:10.16
*kens*	repeats the tests	11:10.27
tor8	does it try to parse it as 0x80.abc ?	11:10.28
kens	Hard to say	11:11.04
	If I change it to 0 80.abc 80 80 re s then it shows nothing	11:11.21
	If I change it to 0 80 80 80 re s then it strokes a rectangle at 0,80	11:11.40
	If I change it to 0 0 80 80 re s then it strokes a rectangle at 0 0	11:12.03
	So what's it doing with the .abc ?	11:12.16
tor8	very inconsistent handling of numbers then!	11:12.18
kens	and the .abc doesn't throw an error either.....	11:12.46
tor8	0 80 0 80 [ignored 80] re maybe?	11:13.00
kens	Hmm, that could be	11:13.11
	That'd be a 0 width rect	11:13.19
	let me put the .abc in the first number	11:13.34
tor8	it'd still show up, it's stroked not filled	11:13.51
kens	It's not showing up at all with .abc no matter what I do	11:14.20
	And not giving an error	11:14.26
	Taking away one of the operands doesn't do anything either	11:14.55
	It looks like Acrobat is silently ignoring the error	11:15.06
	OK so if I deliberately create a rect with too few operands, Acrobat silently ignores it	11:15.43
	Hmm	11:16.18
	Oh boy	11:16.48
	It looks like Acrobat throws away the malformed number, then because there are too few operands for the 're' it doesn't draw it. But it doesn't throw an error either.	11:17.18
	What a pile of poo	11:17.25
	Obviously it 'fixes' at least one of the numbers in the larger rectangle	11:18.04
tor8	kens: indeed, I think the conclusion is, acrobat does arbitrary stuff to broken numbers.	11:18.49
kens	So this: "0.abc 0 0 80 80 re s" produces an 80 rectangle at 0,0	11:19.12
	Whereas this : "0.abc 0 80 80 re s" silently produces no rectangle	11:19.32
	I don't think it's really worth trying to duplicate this insane behaviour	11:19.56
	At a guess, Acrobat ignores numerals and signs in the middle of a number, truncating the number from that point. So "40.-40 40+60 160 160-1 re s" becomes " 40 40 160 160 re s"	11:21.32
	But alphas in a number it throws an error on	11:21.56
	The 80e0 I'm not so sure what it's doing	11:22.31
	Except that it throws an actual error on that one	11:23.18
	Yeah 80e0 throws an error, 80.e0 does not. Madness	11:24.47
tor8	so the integer and real parsers differ in how they handle errors. madness indeed!	11:25.11
kens	Well I wouldn't like to guess what's going on behind the screen	11:25.29
	It might be that they are saying that a .x is a missing trailing 0, whereas an alpha in a number is not a missing whitespace	11:26.09
	In any event, I think your current approach is more than good enough	11:26.29
	I'll have a poke at GS and see if I can get it to treat '--' as '-' as well :-(	11:26.48
	Hah, GS already treats 40.-40 as 40.0, I didn't know that	11:27.57
	But 160-1 gets turned into a 0	11:28.10
Robin_Watts	ok, so, tor8: I need to code up that second pass now.	11:33.08
	Am I right in thinking that to find all the text in a paragraph I do a depth first search breaking the text at each 'break' node ?	11:34.09
	Or is this the time we should be looking at http://www.unicode.org/reports/tr14/ ?	11:36.45
tor8	I've sort-of hacked a partial implementation of tr14 already -- it's what creates the 'break' nodes in the first place	11:47.30
	sorry, 'glue' nodes	11:47.43
	the fz_html boxes that get spit out from generate_box come in four flavours	11:48.43
	BLOCK, BREAK, FLOW and INLINE	11:48.54
	for bidi you only care about the FLOW boxes	11:49.13
	ugh, I can barely remember how these things are strung together	11:50.27
	anyway, each BOX_FLOW has a paragraph or possibly more, depending on the presence of <br/> tags or being a <pre> tag, which will show up as FLOW_BREAK nodes	11:51.44
*kens*	lunches	12:29.53
NTQ	Is there an example on how to create PDF/A-2a? At the moment I always get PDF/A-2b. If there is no example, can I upload you my test scenario?	12:37.16
chrisl	NTQ: I don't know for sure, but I suspect that PDF/A-2a has many of the same requirements as PDF/A-1a. In which case, the answer is covered here: http://ghostscript.com/FAQ.html	13:01.34
NTQ	chrisl: Thank you. So because PDF/A-1a is not implemented, you also did not implement PDF/A-2a I guess.	13:10.01
	Because both of them have nearly the same restrictions.	13:10.24
Robin_Watts	tor8: Gotcha, ta, I'll give that a whirl.	13:11.31
chrisl	NTQ: The information to produce A-1a (and I assume A-2a) is not available by the time we (Ghostscript) see the input.	13:13.28
kens	Chris is correct, we cannot make PDF/A-xa files	13:16.10
	The spec (PDF/A-1a) specifically says you are not supposed to guess at the document structure and without that, you cannot make an 'a' file.	13:16.34
NTQ	kens: Thank you. Then I will ask our customer if he also would accept PDF/A-2b.	13:17.33
	The main reason why I want to use PDF/A-2 is transparency. We sometimes received PDF documents with transparent images. After creating a PDF/A-1b from such a document the whole page gets rendered as an image, so text too. And a few weeks ago I heard from you that it is not possible to only render the image again, excluding the text.	13:20.19
kens	You cannot easily tell whether any portion of the text is partially or fully transparent, so you have to render it all.	13:21.07
tor8	Robin_Watts: I expect we'll need to add arabic/hebrew fonts to mupdf now then?	13:22.09
Robin_Watts	tor8: some kind of fallback mechanism, yes.	13:22.36
tor8	Robin_Watts: it'll be easy enough to merge in DroidSansArabic and DroidSansHebrew into DroidSansFallback	13:23.24
Robin_Watts	but that doesn't solve for other languages.	13:24.00
	would be nicer to have a generic fallback system that could cascade through a set of script fallbacks.	13:24.24
tor8	Robin_Watts: yeah, agreed	13:24.32
	we only have a two-level fallback now	13:24.37
NTQ	kens: Sorry, I am not a PDF expert. But if I create a PDF/A-1b with Adobe Acrobat it recognizes exactly which parts of a page have to be rendered new and which not. What makes it hard to identify these parts of a page where a transparent image has any effect?	13:28.16
tor8	Robin_Watts: I'll take a stab at making a cascading fallback font system	13:30.05
kens	NTQ I didn't say it was impossible,I said 'easily'	13:32.35
	Say I draw some text, then paint some more stuff, then create a transparency group and draw through it. If part of that group intersects the text, then the text must be rendered to an image	13:33.30
	But by the time we get to the transparency operation, we've already stored the text in the output PDF file.	13:33.50
	It's not impossible to preparse the entire PDF file, but it would mean totally rewriting our PDF output device, and frankly that's not going to happen	13:34.31
	The benefit is small, the cost is huge	13:34.47
NTQ	kens: Alright. Thank you. I fully understand now.	13:35.11
chrisl	We can produce PDF/A-2b IIRC	13:35.55
kens	We can, yes	13:36.02
	Possibly even PDF/A-3 now I think	13:36.16
tor8	Robin_Watts: on tor/master there's a quick fix that merges DroidSansArabic and Hebrew into the CJK fallback font	13:59.21
Robin_Watts	Ta.	13:59.41
tor8	Robin_Watts: there's also a "direction" property in CSS that I don't currently pass on	14:00.35
	Robin_Watts: cocked up something with the encoding in that one, there's a new version of the commit up now	14:13.11
*Robin_Watts*	is just boggling at these html structures.	14:51.31
	surely they take a HUGE amount of memory ?	14:52.26
	44 bytes for every flow entry. And there is a flow entry for every word, plus another for every space.	14:53.14
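A rough feel for the numbers quoted above: with 44 bytes per flow entry, one entry per word and one per space, the overhead can be estimated as below. The 44-byte figure is taken from the log; the counting rule (one space entry between each pair of words) and the helper names are assumptions made for this sketch.

```python
FLOW_ENTRY_BYTES = 44  # per-entry size quoted in the log


def flow_entries(text):
    """Count flow entries: one per word plus one per space between words."""
    words = text.split()
    spaces = max(len(words) - 1, 0)
    return len(words) + spaces


def flow_bytes(text):
    """Estimated bytes of flow structures for the given text."""
    return flow_entries(text) * FLOW_ENTRY_BYTES


if __name__ == "__main__":
    sample = "Mary had a little lamb"
    print(flow_entries(sample), flow_bytes(sample), len(sample))
```

For the 22-character sample this comes to 9 entries and 396 bytes of bookkeeping, before counting the text itself.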
	The type and expand can be combined into a flags word.	14:55.11
tor8	Robin_Watts: not to mention just how damned many of them there are! the fz_html_flow struct is overdue for a diet	14:55.34
Robin_Watts	I reckon we should be reference counting styles and sharing them where possible.	14:55.59
tor8	Robin_Watts: the *style is a pointer to the box's embedded struct	14:56.13
Robin_Watts	Ok, so can't we just omit that and always pass both a box pointer and a flow pointer ?	14:56.45
tor8	a flow box is always a child of a block box	14:57.23
Robin_Watts	fz_html_flow is always a child of an fz_html, you mean ?	14:57.56
tor8	but the inline boxes are also children of the block box, but the text content of the inline box lives in their sibling flow box	14:57.59
	and fz_html_flow is a child of a fz_html with the FLOW_BOX type	14:58.21
	but the flow->style does not necessarily point to the parent box's style	14:58.35
	it may point to its uncle or cousin box's style	14:58.48
Robin_Watts	So... if I have <p>Mary had a <b>little</b>lamb</p>	14:59.42
	we'd have an inline box for the <b> section, then a flow box with "Mary" " " "had" " " "a" " " "little" " " "lamb"	15:00.56
tor8	you get a box tree: { block[p] { inline(b) {}, flow {"Mary had a ", "little", "lamb" } } }	15:01.03
	yeah	15:01.05
Robin_Watts	and the style for "little" would point to the inline box.	15:01.09
tor8	yeah. I figured I'd save a little bit of memory (considering how much I already waste) by not making every flow node have its own style	15:01.56
Robin_Watts	Gotcha.	15:02.08
tor8	the inline boxes I don't use for anything other than creating the flow nodes, but I have to keep them around just because they hold the styles	15:02.20
Robin_Watts	tor8: I'd consider having a global style dictionary.	15:02.28
tor8	the inline boxes are needed for the css matching	15:02.29
	Yeah. that'd probably save a fair bit of memory.	15:02.47
Robin_Watts	and then instead of having pointers to the style, have indexes into the dictionary.	15:02.52
tor8	considering that each html node may have unique style attributes, but the vast majority of them will be shared	15:03.14
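The global style dictionary idea (store each distinct style once and refer to it by index) amounts to interning. A minimal sketch, with a hypothetical `StyleTable` class and plain dicts standing in for fz_css_style, which this log does not describe in detail:

```python
class StyleTable:
    """Stores each distinct style once and hands out small integer indexes."""

    def __init__(self):
        self._index = {}   # hashable style key -> index
        self.styles = []   # index -> style

    def intern(self, style):
        """Return the index for `style`, adding it only if it is new."""
        key = tuple(sorted(style.items()))
        idx = self._index.get(key)
        if idx is None:
            idx = len(self.styles)
            self._index[key] = idx
            self.styles.append(dict(style))
        return idx


if __name__ == "__main__":
    table = StyleTable()
    normal = table.intern({"font-weight": "normal"})
    bold = table.intern({"font-weight": "bold"})
    again = table.intern({"font-weight": "normal"})
    print(normal, bold, again, len(table.styles))
```

Flow nodes would then carry an index instead of a pointer into an inline box, which is what would allow the inline boxes to be freed after matching.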
Robin_Watts	Would we still need the inline boxes then?	15:03.30
tor8	no, then we could free them once we're done	15:03.41
	but everything is allocated using the pool allocator now	15:03.56
Robin_Watts	We could allocate inlines using a different pool allocator.	15:04.11
	and then free that pool.	15:04.21
tor8	yeah.	15:04.23
	the fz_css_style could use bitfields for a lot of its fields	15:04.47
Robin_Watts	Paragraphs never extend outside a flow block, right?	15:05.15
tor8	and a lot of the flow properties are computable with a bit of care, so don't need to be stored	15:05.31
	define "extend"	15:05.52
Robin_Watts	block { flow { "This is a different paragraph" } block { flow "to this" } }	15:06.14
	block { flow { "This is a different paragraph" } block { flow { "to this" } } }	15:06.30
	When computing the directions of the text, I need to pass whole paragraphs to the code at once.	15:07.09
tor8	a single paragraph is never split into multiple flow boxes	15:07.24
Robin_Watts	That means passing the contents of a whole 'flow' at once, never having to combine multiple flows together.	15:07.33
	Cool.	15:07.35
tor8	a line break is always at the end of a flow box	15:07.47
	A smarter/dumber way is to not have the flow nodes at all and just have an array of where the spaces and breaks are in the text	15:09.23
	it'll mean more work during rendering, but would save huge amounts of memory	15:09.34
	and assign styles to spans of text	15:10.07
	so the flow box would look something like struct { char *text; char *spaces; char *breaks; style *styles; char **style_starts; }	15:11.05
	and then another array of breaks actually taken	15:11.49
Robin_Watts	Or, make use of some of the utf8 unused codes.	15:11.52
tor8	or just plain old escape codes	15:12.25
Robin_Watts	so char *text becomes a list of either valid utf8 codes, or invalid ones that act as escapes for 'break', 'change style' etc.	15:12.40
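The escape idea can be sketched using two byte values that never occur in well-formed UTF-8, 0xFE and 0xFF. The choice of those bytes, the one-byte style index that follows 0xFE, and the event tuples are all assumptions for illustration; nothing here is taken from MuPDF.

```python
BREAK = 0xFF  # never appears in valid UTF-8
STYLE = 0xFE  # never appears in valid UTF-8; followed by one style index byte


def encode(events):
    """Pack ('text', str), ('style', int 0..255) and ('break', None) events."""
    out = bytearray()
    for kind, value in events:
        if kind == "text":
            out += value.encode("utf-8")
        elif kind == "style":
            out += bytes([STYLE, value])
        elif kind == "break":
            out.append(BREAK)
    return bytes(out)


def decode(data):
    """Recover the event list from the packed byte string."""
    events = []
    text = bytearray()

    def flush():
        if text:
            events.append(("text", text.decode("utf-8")))
            text.clear()

    i = 0
    while i < len(data):
        b = data[i]
        if b == BREAK:
            flush()
            events.append(("break", None))
            i += 1
        elif b == STYLE:
            flush()
            events.append(("style", data[i + 1]))
            i += 2
        else:
            text.append(b)
            i += 1
    flush()
    return events
```

The decoder has to skip the index byte by position, since a style index may itself be any byte value.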
tor8	though I think we should hold off optimizing this too much until we've implemented a bit more	15:13.52
	bidi, floating around images, tables, hyphenation and tex-style global line breaking optimization	15:14.18
Robin_Watts	tor8: Yeah.	15:14.25
tor8	this structure is wasteful, but it's designed for rapid prototyping	15:14.40
Robin_Watts	I need to add a direction flag to fz_html_flow_s.	15:14.46
	so to do that I'll move expand and type and direction into a single bitfield.	15:15.02
tor8	Robin_Watts: sounds good.	15:15.12
	you could put text and image in a union	15:15.36
Robin_Watts	will do.	15:16.04
tor8	the x,y,w,h stuff is used for line layout so needs to stay	15:17.51
	the 'em' is calculated from the style, but depends on the tree context and the current font size set during layout so needs to be stored as well	15:18.59
Robin_Watts	tor8: I understand the need for w and h (to avoid repeated measuring). I don't get the need for x and y to be in the structure.	15:19.42
tor8	it's where the layout puts them so the drawing code can draw the node without redoing the layout	15:20.11
	Robin_Watts: one way to skip the x,y,w,h,em fields would be to create the fz_text node during layout instead of during drawing	15:27.50
HenryStiles	Robin_Watts, mvrhel_laptop: have you guys tried to login to RSA? I went through the entire process and now it says it doesn't know me. Pretty sure I did everything right.	16:33.38
Robin_Watts	HenryStiles: Yeah, worked for me.	16:36.11
	Well, i've registered etc, if that's what you meant.	16:36.36
HenryStiles	huh, it worked the second time around.	16:40.50
kub	mvrhel_laptop: hello	17:27.04
	mvrhel_laptop: how is GS Even Tone Screening invoked. We need it with 8/16bit CMYK for producing contone colors in 4 levels of gray.	17:41.27
Robin_Watts	kub: Hi. Are you a commercial customer of Artifex?	17:48.35
jogux	Robin_Watts: jub is the person kens was talking to this morning who hasn't yet had a reply from sales@, iirc.	17:50.04
	kub, even, sorry.	17:50.17
kens	scott has replied, I've seen the email	17:50.30
jogux	ah, I'd also not noticed kens had reconnected :)	17:50.44
kens	yeah network is flaky	17:50.57
Robin_Watts	Ok. I was interested to know if we (Artifex) had supplied the separate ETS code to kub, or whether he was trying to use the version of it that's pickled into the rinkj device in gs.	17:51.40
kens	We won't have supplied any new code, at least as yet	17:52.26
Robin_Watts	Ok, so I would expect it to be quite hard for kub to do any serious evaluation until he gets the latest version from us.	17:53.02
kens	kub did you get an email from Scott Sackett ?	17:54.49
	OK I'm off for the night, have a good weekend everyone	18:05.19
kub	bbl	18:45.27
	Robin_Watts: not yet, but interested in becoming a commercial one	19:06.42
	kens: Scott Sackett did send me an email, and I replied.	19:07.12
Robin_Watts	kub: OK. It sounds like we need to get you a copy of the latest code for evaluation. I'm not sure I'm authorised to just send it out.	19:08.11
kub	Robin_Watts: is the ETS (EvenTone Screening) not inside AGPL GS?	19:08.12
Robin_Watts	HenryStiles: What's the process?	19:08.20
	kub: There is an old version of the code in gs, as part of the rinkj device.	19:08.40
	If you're doing customisation and tuning, then we have a standalone version that is probably easier to work with.	19:09.21
kub	aha	19:09.21
	ok	19:09.28
	-sDEVICE=rinkj -dGrayValues=4 gives a rangecheck error and without -dGrayValues I get a crash	19:15.06
	Robin_Watts: yes, the ETS code is appreciated for evaluating	19:16.04
Robin_Watts	kub: I need to get the OK from HenryStiles to send it out. His OK may or may not be conditional on getting a signed evaluation agreement between you and Scott.	19:17.10
kub	ah, git cloned, but that appears not sufficient from your wording	19:21.09
	Robin_Watts: will wait for your ping or/and email	19:22.03
HenryStiles	Robin_Watts: sorry at lunch, it's fine to send it.	19:47.26
Robin_Watts	kub: Email address?	20:14.02
HenryStiles	it's fine not having the latest stuff out but is ets the reason for rinkj not working? I guess we don't know that.	20:27.21
	rinkj should work.	20:27.35
Robin_Watts	I have never used rinkj in my life.	20:29.32
kub	rinkj appears to need some setup, which I omitted	20:30.31
	http://ghostscript.com/doc/current/Devices.htm#Rinkj	20:31.14
HenryStiles	kub: breaks for me with setup too.	20:36.30
	kub: so the first sentence of the documentation is spot on ;-)	20:40.01
	Robin_Watts: the device uses a color manager directly, it crashes lcms. What is truly bizarre is we have we make this call if lcms_deshandle is NULL, des_color_space = cmsGetPCS(lcms_deshandle) but the first thing that function does is dereference lcms_deshandle so something is awry in the code generally (rinkj aside)	20:54.14
	mvrhel_laptop: ^^^	20:54.19
	gsicc_lcms2.c:523 des_color_space = cmsGetPCS(lcms_deshandle);	20:55.53
	sorry I'll rewrite that gibberish if needed	20:57.57
mvrhel_laptop	kub are you still there	22:28.25
	HenryStiles: I was able to login to the RSA, but it appears that I already had an account with my artifex email	22:29.10
	HenryStiles: It looks like rinkj is really screwed up. I will see if I can get it working after I finish up this font stuff in mupdf	22:32.15
HenryStiles	mvrhel_laptop: a little more worried about gsicc_lcms2.c, maybe that case can never happen?	22:58.44
mvrhel_laptop	oh let me look hold on	22:59.04
	that makes no sense.. hold on	23:01.02
	HenryStiles: I am going to have to take a closer look into this to understand when or how this case could occur. My comment /* We must have a device link profile. */ is a clue.	23:05.09
HenryStiles	mvrhel_laptop: don't interrupt your mupdf stuff, but I thought you'd want to know about it.	23:07.30
mvrhel_laptop	HenryStiles: thanks. I suspect it is supposed to be lcms_srchandle in line 526 but I will take a closer look at it later.	23:08.01
	I will open a bug to remind myself	23:08.07