Ghostscript IRC logs

	<<<Back 1 day (to 2016/10/23)	20161024
Robin_Watts	sebras: For the logs: I will look at your commit tomorrow, but I agree tor8 should take a look too.	00:10.24
	I have a proposed fix on my branch for the slowdown bug. Just clustering it now.	00:10.41
	gotta check one file that is now timing out.	00:25.14
sebras	tor8: morning?	11:16.40
tor8	morning!	11:18.15
	sebras: I have a few comments on your commits	11:18.24
	FZ_STEXT_EXPAND_WHITESPACE .. is not really expanding whitespace	11:18.36
	normalize, clean, or maybe collapse (to match css/html whitespace terminology)	11:19.02
	the comment says replace with "space characters of variable width" ... variable width? I think you mean "regular space characters"	11:19.25
sebras	tor8: well, codepoint-wise yes, but width-wise no.	11:19.47
	tor8: this was due to a file that fred had issues with that contained tab chars IIRC.	11:20.17
tor8	well, stext is mainly about the character codes so I think being clear that it means U+0032 SPACE is important	11:21.24
	ahem, U+0020	11:21.33
sebras	tor8:2yes2i2agree.	11:21.58
tor8	in the last bug fix for 697236, you change the actual output	11:23.07
	it used to be that we'd emit an array for a ligature with charater code/glyph indexes [glyph-id-of-ffi/f, -1/f, -1/i)	11:24.00
	now we emit [-1/f, -1/f, -1/i]	11:24.12
	the intent of the output array is to match what XPS calls 'cluster mapping'	11:24.35
	where you have N-to-M glyph index to character code clusters	11:24.50
	we do that by passing -1 for character codes or glyph indexes that belong to the cluster but don't have a separate character code or glyph	11:25.30
sebras	ok, right but besides getting the gids wrong, how about the computed widths? are you happy with this approach?	11:26.17
	the gids ought to be easy to fix I belive.	11:26.24
	tor8: here, you get an extra character for the previous message: e	11:26.47
tor8	yes, that's a reasonable approach. it doesn't quite match what happens with harfbuzz and how ligatures are emitted in say HTML	11:26.50
	and opentype	11:26.53
	sebras: thanks. can't have to many eeees	11:27.04
sebras	tor8: how are they handled in hb/html?	11:27.12
tor8	sebras: oh, and have an extra 'o'	11:27.12
	the widths are all put on the glyphs, and the characters have zero width	11:27.34
	there is no reasonable way to do otherwise for things like the indic language syllable reshaping	11:27.53
	sebras: have you got a sample file somewhere called "cluster.xhtml"?	11:28.35
	or is that not committed to the cluster test files?	11:28.43
sebras	tor8: what is cluster.xhtml?	11:29.15
tor8	http://ghostscript.com/~tor/epub-tests/	11:29.22
sebras	oh.	11:29.27
	no, that file is not mine.	11:29.31
tor8	it has a bunch of text that exercises harfbuzz	11:29.35
sebras	ah.	11:29.43
tor8	samples of many-to-one (f,i -> fi ligature), one-to-many, and many-to-many mappings	11:30.20
sebras	tor8: my commits originate from reading bug 697236 and discussing it with robin last night.	11:31.10
	s/night/sleep cycle/	11:31.20
tor8	sebras: http://pastebin.com/raw/ii2jtGLj has the output of what is drawn for cluster.xhtml	11:33.07
	and that comes out crap, with an extra space character after 'fi'	11:34.02
	I thought I'd fixed that... sigh	11:34.07
sebras	tor8: I know why it happenbs.	11:34.20
	tor8: we add characters when expanding the ligatures.	11:34.37
	and the trm is never updated between characters.	11:35.00
tor8	<char bbox="48 46.696878 83677230000000000 111.595317" x="48" y="92.4" c="i"/> is what comes out with sebras/master with stext device	11:35.23
sebras	so eventually we are so far from the left hand side of the span that we decide to add a space.	11:35.24
	apparently I need to download cluster.xhtml	11:35.48
	tor8: did my explanation about the eeeeeeeeeeeeeeextraneous space make sense?	11:41.40
	tor8: ehm... crouton decided to give you a few extra e:s.	11:42.05
tor8	the trim should be updated between glyphs, and if there's a diff between what the font's advance says it sohuld be and where actually ends up we add a space	11:44.03
	trim? trm!	11:44.11
	I've borrowed kens keypboard today too....	11:44.18
kens	:-)	11:44.27
	Comedy keyboards R us	11:44.36
sebras	kens: :)	11:47.27
	tor8: so on origin/master	11:48.05
	in source/fitz/stext-device.c in fz_add_stext_char_imp()	11:48.28
	around line 617 we compute p and q for the character	11:48.43
	because trm is never updated between constituent characters in ligatures	11:49.04
	shouldn't we accumulate the advance per character?	11:49.48
	because i think delta will later be computed incorrectly, if we do not.	11:50.31
	IIUC this is the source of the extraneous space.	11:51.03
tor8	sebras: yes. it seems like fz_add_stext_char doesn't cope with gid==-1 and/or c==	11:55.41
	-1	11:55.43
sebras	well, that's not the mistake I think.	11:56.47
	tor8: basically if gid < 0 we simply make sure not to add any extra space between the chars.	11:57.07
	tor8: which ought to be fine for ligatures.	11:57.17
tor8	if we look at the EPUB case where we want the c/glyph/adv tuples for "fi" to be [char_f/glyph_fi/100, char_i/-1/0]	11:57.21
sebras	tor8: 100 and 0 being the widths of the characters?	11:57.51
tor8	yes	11:57.56
	100 is the width of the 'fi' ligature	11:58.01
	0 is the width of nothing	11:58.22
sebras	:) yes, i got that much!	11:58.38
tor8	for EPUB we don't do the split the width evenly because the ligature handling is all done by harfbuzz	11:58.39
sebras	does harfbuzz split ligatures and figure out the constituent chars individual widths?	11:59.21
	or for epub that is simply never desirable?	11:59.35
tor8	unfortunately harfbuzz emits the 'fi' and -1 characters at the same spot, and I think that triggers the addition of a space	11:59.47
	<g unicode="f" glyph="fi" x="36" y="68.4" />	12:00.27
	<g unicode="i" glyph="-1" x="36" y="68.4" />	12:00.27
	<g unicode="l" glyph="l" x="58.640626" y="68.4" />	12:00.27
	that's what harfbuzz spits out for "fil" when it makes a ligature	12:01.21
sebras	tor8: yes, if that happens then we'll detect the just from 36 to 58.640626 and compare that to our space detection.	12:01.30
tor8	and the delta between 'f' and 'i' will be negative	12:01.53
	and thus ignored (as kerning)	12:02.15
	and then the delta between 'i' and 'l' is really big	12:02.25
	so yes, I understand what's going on but not quite sure how we fix it	12:02.36
sebras	tor8: the width of the ligature, right..?	12:02.41
tor8	well, we fix it by ignoring adding spaces between any -1 glyphs or characters	12:02.58
	yeah, the width of the ligature, erroneously compared with the width of .notdef	12:03.18
sebras	neither before, nor after -1 characters.	12:03.22
tor8	adv = fz_advance_glyph(ctx, style->font, -1, wmode); in fz_add_stext_char	12:03.32
sebras	yeah, after i moved that code in there.	12:04.02
	but maybe that was not a good idea.	12:04.10
tor8	the pen advancement space addition stuff should just ignore all glyph==-1 \|\| character==-1	12:04.11
	the closing brace after EXPAND_WHITESPACE is on the wrong indent	12:04.38
sebras	tor8: it is indeed.	12:06.28
	wait... from where do you infer that fz_advance_glyph() uses gid == -1?	12:07.58
	tor8: it will compute the width of the ligature, not the -1 chars.	12:08.14
	tor8: that's why I take pains to call fz_encode_character_with_fallback() in fz_add_stext_ligature().\	12:08.55
tor8	on the 'i' character	12:08.56
	this is for epub. we don't call fz_add_stext_ligature.	12:09.18
sebras	oh.	12:09.29
tor8	that's only for when we use compatibility unicode character 'fi'	12:09.47
sebras	so for epub harfbuzz has split up the ligatures before we end up in stext?	12:09.50
tor8	harfbuzz has created a ligature and created a fz_text object with the ligature pre-split	12:10.25
*sebras*	can feel tor8's frustration from .tw	12:10.46
tor8	the input to harfbuzz is 'f', 'i' and the output is f/fi, i/-1	12:11.08
	the input to pdf is 'fi' and output is 'fi' and we want to split that in the stext-	12:11.24
	device	12:11.27
	sebras: :)	12:11.46
sebras	ok my concept of when things happen for epub is a bit hazy (as with most things)	12:12.09
tor8	I'm not sure fz_encode_character_with_fallback is the proper approach in fz_add_stext_ligature	12:12.22
	that will load fallback fonts for PDF	12:12.31
	which is not really desired	12:12.39
sebras	tor8: I needed to figure out the gid somehow and this was the best I could come up with.	12:12.53
tor8	fz_encode_character, and if that fails, take a wild guess	12:13.10
sebras	but if we keep the gid == -1 and the widths == 0 then this is not needed.	12:13.14
tor8	but let's discuss that after we fix the epub case where ligatures are pre-split	12:13.37
	the reason I just split evenly in the old ligature code is because of not being able to guarantee I can get the advances for the parts of the ligature	12:14.36
	due to funky embedded pdf fonts that have stripped out th encoding tables and use Identity-H encodings	12:15.10
sebras	tor8: right, I also worried that when asking the font it might not have the constituent chars.	12:15.15
tor8	sebras: I need to think about this for a bit	12:17.13
	how to handle incoming ligatures in the stext device	12:17.24
	I might experiment a bit with your commits	12:17.36
sebras	tor8: no worries.	12:17.43
	this was more complicated than i anticipated.	12:18.00
tor8	sebras: it always is :)	12:18.11
sebras	tor8: basically i don't see any reason to keep my commits around, because they feel... incorrect.	12:18.55
tor8	sebras: Robin_Watts: what would you call the flag for turning all whitespace into regular space characters?	12:18.56
	I'd probably go with FZ_STEXT_NORMALIZE_WHITESPACE	12:19.24
sebras	tor8: ok.	12:19.58
tor8	sebras: the lastchar commits should be good, if you fix the commit message and indentation	12:20.34
	sebras: the change from PRESERVE to EXPAND ... is that to change the default option?	12:21.08
sebras	tor8: basically the bug reporter wanted to be able to disable both types of preservation.	12:22.03
	tor8: and if I inverted the interpretation of the flags then this was possible.	12:22.19
	tor8: otherwise I'd need a special value to denote: disable all.	12:22.34
tor8	sebras: something must have gone wrong with the old code...	12:23.00
	the default should be OFF for both preserves	12:23.08
Robin_Watts	tor8: "FZ_STEXT_NORMALIZE_WHITESPACE" sounds good to me.	12:24.41
sebras	tor8: really!? I thought you wanted stext to _not_ expand ligatures or translate tabs into variable width space characters by default..?	12:25.17
tor8	my original thought is that we parse them with fz_has_option("preserve-whitespace")	12:25.21
Robin_Watts	2 commits on robin/master.	12:25.30
tor8	which is why the default should be to normalize/expand and you can turn it off by setting the preserve option	12:25.45
	but that must have gotten lost in translation somewhere	12:25.51
Robin_Watts	The first commit tested out all but for 1 file last night. The second commit should fix that I hope. Clustering now.	12:25.53
sebras	tor8: ehm.. i think this is quite similar to what the code looked like from the beginning, albeit I parsed everything in fz_device instead of in fz_stext.	12:26.46
tor8	Robin_Watts: both LGTM	12:26.55
sebras	tor8: I remeber having this discussion with robin at least.	12:27.13
	tor8: but then we'd have to change the interface for the hints for the device.	12:27.30
tor8	sebras: yes, we moved it (but apparently not completely) to the subclass device options rather than device hints	12:27.34
sebras	tor8: no, i messed up the default values, sorry.	12:27.56
	tor8: and if we have hints as a bitfield in device I think it makes sense to have the same for stext.	12:28.25
tor8	IMO the defaults should be options that are 'off', and we turn them 'on' when we want the other behaviour	12:28.30
sebras	thought personally i'd prefer them to be strings (as these are more extensible and we don't create a milion devices per second in any case)	12:28.52
tor8	sebras: pdf-write and draw-device has options parsing	12:29.09
	by strings	12:29.12
	but there we pre-parse them into option structs with bitfields	12:29.52
	we could do something similar with stext_device, fz_parse_stext_options returns the bitmask to pass as the option argument	12:30.18
Robin_Watts	tor8: clustered and pushed, thanks.	12:30.38
tor8	or even a struct, if we want to add more tunable heuristic arguments	12:30.41
	sebras: shouldn't the last line in fz_new_stext_device just be dev->options = options without a ternary expression?	12:32.58
	and that would fix the bug with the not being able to disable both	12:33.16
	and we keep the 'preserve' name	12:33.26
sebras	so if I set options to FZ_STEXT_PRESERVE_LIGATURES then I'd actually expand them..?	12:34.14
tor8	and add device option parsing to mutool draw -Fstext	12:34.18
sebras	you wanted ligature expansion to be inactive by default, no..?	12:34.51
	and having to be explicitly activated.	12:35.01
tor8	I want cleaning up crap to be the default	12:35.09
	so PRESERVE should be off by default	12:35.24
	and fredross-perry can turn it on to preserve tabs in his app	12:35.35
sebras	tor8: fred actually wanted the output to contain spaces instead of tabs to be able to detect "words: properly.	12:36.41
	("words" are hard to define though, but whitespace delimited entities is what he was looking IIRC)	12:37.12
	looking for.	12:37.21
	so he wanted conversion from 0x09 -> 0x20. he never really asked for being able to control ligatures but it is a similar type of interpretation that a use might want to control.	12:38.31
tor8	sebras: right, so fred should be okay with the defaults being to normalize and expand (which should be options field 0)	12:47.52
sebras	tor8: indeed.	12:48.20
Robin_Watts	I need a review for: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=e332187c7f1259756b1accf8ef5b49f3068fc79b	13:18.01
	I will prod Paul and Joseph too, cos it's androidy (and fred if he comes on line)	13:18.15
sebras	Robin_Watts: i'm not well equipped to reivew that code.	13:20.49
*Robin_Watts*	either. It looks plausible, but...	13:21.31
jogux	Robin_Watts: If I understand everything correctly, then 'cancelAndWait' seems like a bad name for a function that doesn't wait.	13:25.02
	It otherwise seems plausible, but I think I'd rename that function rather than leaving it as a trap for the next person that works with the code :)	13:28.15
Robin_Watts	s/cancelAndWait/cancel/ then ? seems reasonable.	13:30.20
jogux	yeah, assuming cancel isn't reserved/in use :-)	13:34.48
Robin_Watts	jogux: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=a76bbd0f0b2f966a04a560cb02ba2d926f0e4adb	13:48.23
	That seems to work.	13:48.27
jogux	Robin_Watts: Cool. Seems to means we're somewhat away from what Matt originally tried to achieve in http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commit;f=platform/android/viewer/src/com/artifex/mupdfdemo/CancellableAsyncTask.java;h=d7ece4132d6219ee10ba9ed85a9f2a052a6bb92c - but... no worse than we currently are, except with the bonus of not crashing.	13:51.20
tor8	sebras: ligature space fix on tor/master	15:33.19
*sebras*	fetches.	15:38.20
	dfa1d0e ought to mention fixing bug 697236..?	15:39.59
	tor8: besides that it LGTM.	15:42.19
tor8	sebras: ah, that's the bug number.	15:51.22
	I lost it when you deleted your commits :)	15:51.33
	so, right now ligatures will look the same whether they come from epub or pdf	15:52.29
	when they come from pdf we can try to make their bboxes be more like if they weren't	15:53.00
	but I think aiming for consistency is probably better here	15:53.23
Robin_Watts	tor8: Ideally bboxes should be 'sane' for text selection.	15:54.35
tor8	sebras: hmmm. it looks like the final character in a word has its bbox merged with inserted fake space characters	15:54.48
	Robin_Watts: define sane for indic languages with syllable rearrangements, or thai with one-to-many mappings :)	15:55.12
Robin_Watts	ie. the bbox for 'ffi' should be approximately the same as 'f' 'f' 'i'.	15:55.28
	for such things, not sane is fine :)	15:55.34
tor8	Robin_Watts: the output from harfbuzz has lost that information though	15:55.54
	and for PDF we can't reliably recover it since we can't trust the font encoding	15:56.19
	we can try to evenly divide the ligature/cluster into equal parts	15:56.34
Robin_Watts	So we can't know that 'ffi' is actually 'ffi' and not something else ?	15:56.57
tor8	Robin_Watts: we don't know which glyph is the 'f' character, if it even exists in the font	15:57.26
Robin_Watts	Right.	15:57.33
tor8	so we can't guarantee we can measure the width of 'f' and 'i' to do the split proportionally	15:57.43
	we can definitely try, but that may be more wrong than just dividing it evenly	15:57.59
	many-to-one mappings (such as f,f,i to ffi) we can divide up in the stext-device if we rearchitect it a bit	15:58.57
	one-to-many (like a hypothetical Ã¥ in some fancy font that decomposes it to a,aring glyphs)	15:59.45
	those we just drop the 'aring' part in the stext device as it is written now	16:00.24
	so, dividing the many-to-one mappings evenly might be the sanest approach for everything that doesn't involve digging through opentype tables for caret positions	16:01.02
	it might be that harfbuzz can give us those, in which case we really need to rearchitect things	16:01.17
sebras	tor8: welll if can_append == 0 we add a new span	16:03.11
	tor8: and in that span the first thing we might do is insert a space and we use p for both the space and the char.	16:03.36
tor8	sebras: so, adding a fake space character seems to get the p,q values wrong too	16:03.41
sebras	tor8: meybe we should update p and q accordingly when inserting the space?	16:03.49
tor8	the p should be the old pen value	16:04.00
sebras	tor8: and q shoudl be adv more advanced.	16:04.56
	basically increment both p and q by r and you ought to be fine..?	16:05.18
tor8	hm, add_space has the 'r' and 'p' arguments to add_char reversed	16:05.34
sebras	no I don;t think so.	16:05.52
	tor8: i think the idea is that r extends 0.2f to the left of trm.e	16:06.11
tor8	the idea is that 'r' is the start of the character following the space	16:06.14
	I mean, 'p' is the start of the character following the space	16:06.24
	so 'r' is calculated to be a space character's width before that	16:06.32
sebras	ah...	16:06.47
tor8	so we should be adding space,r,p then c,p,q	16:06.56
sebras	but wait...	16:07.05
	does this take wmode into account?	16:07.12
tor8	or better yet, take 'r' from the cur_span->max.x if possible	16:07.12
	nope, it does not	16:08.00
sebras	ok, cool, then I at least spotted something.	16:08.15
tor8	sebras: yes! thanks for spotting that. I have a fix on tor/master.	16:11.07
	which uses the actual space width if available (which I think it should always be possible to do)	16:11.42
	but I left the old 0.2 width case in there for safety	16:11.58
sebras	yeah, but the old 0.2 doesn't take wmode into account. :)	16:13.43
	now, I don't think that we can end up in the situation where we want to add a space without having a cur_span though.	16:14.29
	correct. if cur_span == NULL then can_append == 0 and add_space is consequently never set.	16:15.04
	why is there an ifdef around setting spacing = 0 in this function?	16:16.23
	around prints() seems sane though.	16:16.45
tor8	sebras: I'm guessing because it's only used by the ifdef'd printf in the next ifdef chunk	16:18.52
sebras	tor8: we init spacing to 0, kill the ifdef please. :)	16:20.01
	oh, wrong if-statement!	16:20.22
tor8	I can kill the ifdef anyway, and always set spacing=0 if you like	16:20.33
sebras	maybe I should just keep quiet. :)	16:20.37
	tor8: maybe with this new found knowledge it is a good time to tackle http://bugs.ghostscript.com/show_bug.cgi?id=697074 ?	16:24.57
tor8	sebras: ohhhh	16:31.50
	I spotted another "problem" with out search	16:31.58
sebras	tor8: ok..?	16:32.06
tor8	if I search for 'mm' I will find overlapping hits :)	16:32.08
	both the start and end of 'mmm' will result in a hit	16:32.16
	(mm)m and m(mm)	16:32.25
	so double highlighted!	16:32.31
sebras	tor8: elegant!	16:33.48
	tor8: luckily the x11 viewer uses Xor for highlighting so I never get to see the double highlighting. ;)	16:34.16
tor8	and that file runs into the 'too many results' problem with having a fixed array of result hit boxes	16:35.57
sebras	tor8: oh, I simply created a test file: <html><body>commmit :)	16:37.33
tor8	okay, more commits on tor/master	16:43.28
sebras	tor8: LGTM.	16:48.21
	tor8: this doesn't cover the top three commits on tor/master though.	16:50.42
tor8	sebras: nope, those need to wait until post release	16:51.52
	sebras: and thanks!	16:51.59
sebras	tor8: did you decide to tackle 697074 too or, did you give up on that one?	16:53.44
tor8	sebras: that one is fixed by the space coordinate commit	17:02.18
	I guess I could put the bug number in. I spotted the same error before you mentioned the bug :)	17:02.31
	bah, too late. I already pushed.	17:02.40
sebras	tor8: :)	17:02.45
	tor8: well, close the bug at least!	17:02.50
tor8	I shall	17:04.36
	hm, thinknig a bit here, but isn't most kerning negative (as in bringing characters closer) rather than positive. thinking our heuristic for space detection could be a bit tighter if we only look for positive spaces...	17:05.34
	but that could fall into traps with things like apple's word processor adding even character spacing between all characters for justification sometimes	17:06.03
Robin_Watts	All justification is done by adding space, AIUI.	17:07.44
	I've never knowingly seen justification that squashed text together.	17:08.11
tor8	Robin_Watts: I'm talking about the abomination that adds space between all characters	17:08.44
	not the usual kind that stretches the space characters	17:09.03
	some justification can shrink space characters though. I've seen that often enough.	17:09.36
sebras	tor8: for 696710 (that also mentioned stext): both evince and xpdf manage to see the spaces on the first line of the abstract.	17:10.44
tor8	ah, no, looking through the AFM files we have for kerning pairs, 1/5th of pairs are positive	17:11.47
sebras	and PDF.js	17:11.51
tor8	sebras: we can too, if we change the SPACE_DIST constant in stext-device	17:13.51
sebras	tor8: I know.	17:14.13
	I wonder how the other viewers do it though.	17:14.37
	to avoid false positives.	17:14.51
tor8	probably check the sign of the extra spacing	17:15.33
	sebras: we should add text output to the cluster	17:17.43
sebras	tor8: that'd be useful.	17:19.34
tor8	okay, commit on tor master. if you could test that with a bunch of weird files I'd appreciate it :)	17:20.48
	but now I need to eat dinner and you need to sleep, I think...	17:21.09
sebras	tor8: hm...	17:22.44
	tor8: similar code exists in strain_soup().	17:28.02
	I'm thinking of the move ob fabsf().	17:30.17
	but I'm to tired to figure out how strain_soup() would handle negative spacing.	17:30.46
	maybe it cannot be negative due to other reasons though.	17:30.57
*sebras*	sleeps.	18:45.57
	Forward 1 day (to 2016/10/25)>>>

Log of #ghostscript at irc.freenode.net.