Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

20160203
sebras	tor8: there is a new tentative patch over at sebras/master. it doesn't implement the full width property support, but it is a simple start. at least the file in question is now displaying properly.	00:45.24
	tor8: I also took a stab at soft hyphens, just to familiarize myself with the epub code. I think I might be on to something, but there is still something about those glue nodes that I don't fully get.	00:46.13
Robin_Watts	sebras: hyphens definitely need to be thought about in the context of shaping.	09:40.46
	Morning tor8.	11:04.10
	tor8, sebras: Am I right in thinking that glue nodes don't have content, or position or styles etc?	11:06.08
	In fact, couldn't we represent a glue node by a bit on a node that says: "And this is followed by a glue node"?	11:06.48
tor8	Robin_Watts: glue nodes should be able to have different widths (for hairline spaces, em-spaces, etc)	11:21.50
Robin_Watts	tor8: OK.	11:22.11
tor8	Robin_Watts: about Java, I added the JavaDevice to get around some trouble I had with constructors and finalizers but I've managed to solve things without adding the extra class now	11:22.37
Robin_Watts	So every node could have a width for the glue node that follows it, where width = 0 means no node ?	11:22.41
	tor8: Ah, cool.	11:22.55
tor8	Robin_Watts: also non-breaking spaces that still need to be adjusted for width	11:23.05
	I was hoping to be able to get the actual java exception message out and throw that in fz_throw_java	11:23.29
Robin_Watts	tor8: So, width and a flag to say whether they are non-breaking or not?	11:24.08
tor8	and about the style, I too prefer having the opening brace for java functions on a separate line, but this is the conventional java style	11:24.22
	Robin_Watts: yeah. that should save us a bunch of nodes.	11:24.40
Robin_Watts	I have 9 bits left in the flagword :)	11:24.51
tor8	:)	11:24.58
	then go for it!	11:25.03
	I did notice you added some 8-bit bitfields, can't those just be unsigned chars?	11:25.31
Robin_Watts	They could, but I'm not convinced that compilers are smart enough to pack bitfields into chars/shorts rather than ints	11:28.54
tor8	http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-141270.html#381	11:28.56
	Robin_Watts: ah, right. so you're filling it up until you hit 32 bits.	11:29.27
Robin_Watts	tor8: yeah.	11:29.38
	well, I have 32bits in my head as the amount to try to stay within.	11:30.01
	scripts take just over 7 bits.	11:30.16
	lang isn't done yet, and I need to remove the fallback one.	11:30.34
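The idea above of folding the glue node into the preceding node's 32-bit flag word can be sketched as follows. This is only an illustration under stated assumptions: the field names (`script`, `glue_kind`, `glue_nobreak`) and their widths are hypothetical and are not taken from the MuPDF source, and the spare-bit count it computes applies to this made-up layout, not to the real struct.

```python
# Hypothetical flag-word layout. Field names and widths are assumptions
# made for illustration; they are not the actual MuPDF node struct.
FIELDS = [
    ("script", 8),        # "just over 7 bits", rounded up to 8
    ("glue_kind", 3),     # 0 = no glue follows; 1..7 = assumed width classes
    ("glue_nobreak", 1),  # 1 = the following glue is a non-breaking space
]
WORD_BITS = 32


def build_layout(fields):
    """Assign each field a (shift, bits) slot, lowest bits first."""
    layout, shift = {}, 0
    for name, bits in fields:
        layout[name] = (shift, bits)
        shift += bits
    if shift > WORD_BITS:
        raise ValueError("fields do not fit in the flag word")
    return layout, WORD_BITS - shift


LAYOUT, SPARE_BITS = build_layout(FIELDS)


def pack(**values):
    """Pack named field values into one integer flag word."""
    word = 0
    for name, value in values.items():
        shift, bits = LAYOUT[name]
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
        word |= value << shift
    return word


def unpack(word):
    """Recover every field from a packed flag word."""
    return {name: (word >> shift) & ((1 << bits) - 1)
            for name, (shift, bits) in LAYOUT.items()}
```

With a layout like this, a node followed by a non-breaking space of some width class needs no separate glue node; `glue_kind == 0` plays the role of "no glue follows".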
	As to the paren style, we don't follow recommendations anywhere else :)	11:31.01
	brace style, I mean.	11:31.07
	given that we push to a new line for if () {, I don't see why we don't do the same for int function(...) { too.	11:31.42
	cos ultimately you want to be able to easily look back to check stuff matches.	11:31.58
tor8	our C code uses the 'allman' brace style, but for java the convention is more like K&R. personally I like fewer mostly blank lines just so I can add real blank lines so they stand out better to separate code into paragraphs of sorts	11:33.27
Robin_Watts	Crap. Stupid windows vs2015 runtime has a bug.	11:34.06
	_read, given a single carriage return in text mode, returns 0.	11:34.33
tor8	a "\r" not "\r\n" ?	11:39.22
Robin_Watts	The _read implementation checks for \r\n and converts it to \n	11:39.57
	but in this case, it gets just \r, converts it to \n, and forgets that it read it (so returns 0 bytes read)	11:40.21
	It's a clear change in behaviour from previous VS builds.	11:40.38
	and it means the gs code detects it as being EOF.	11:40.53
	and closes the console down.	11:41.11
	I have a hacky fix.	11:41.25
tor8	eww. nasty.	11:58.15
Robin_Watts	tor8: I might look at simon91's atof thing now ?	12:52.47
torand	Robin_Watts: sure	13:13.10
sebras	torand: Robin_Watts: I have been toying around with letting glue nodes be used for hyphens.	13:56.16
Robin_Watts	sebras: So you'd insert glue nodes in the middle of words?	13:56.45
sebras	I'm not sure if that is a good idea. I'm just playing with it to understand it a bit better.	13:56.46
	Robin_Watts: indeed.	13:56.50
Robin_Watts	sebras: How do you know where it is safe to insert a hyphen?	13:57.13
sebras	Robin_Watts: with content.text set to "-", and I'm thinking that maybe it would only show if it is at the end of the line.	13:57.22
	Robin_Watts: actually in my case the original text contains &shy; which is a soft hyphen.	13:57.43
torand	sebras: my original idea was to put soft hyphens as break nodes	13:57.56
	with alternate text content for broken and unbroken	13:58.06
sebras	Robin_Watts: they are normally not shown, unless they are used as a line breaking point at the end of the line and then they are rendered as normal hyphens.	13:58.12
Robin_Watts	sebras: Gotcha.	13:58.28
sebras	torand: ah! yeah, that is another idea.	13:58.33
	tor8: welcome back!	13:58.37
	tor8: I'll try to use the break nodes instead in that case.	13:59.01
	tor8: but at least now I understand the layout functions a bit better. :)	13:59.18
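The break-node idea discussed above (a soft hyphen becomes a break opportunity that carries one text for the broken case and another for the unbroken case) can be sketched with a simple greedy layout. This is a minimal illustration, not the epub layout code: the node tuples, `to_nodes` and `layout` are hypothetical names, and widths are measured in characters rather than shaped glyph advances.

```python
SOFT_HYPHEN = "\u00ad"


def to_nodes(text):
    """Split text into word nodes and break nodes.

    A break node is ("break", unbroken_text, broken_text):
    an ordinary space is (" ", "") and a soft hyphen is ("", "-").
    """
    nodes = []
    for i, word in enumerate(text.split()):
        if i:
            nodes.append(("break", " ", ""))
        for j, frag in enumerate(word.split(SOFT_HYPHEN)):
            if j:
                nodes.append(("break", "", "-"))
            nodes.append(("word", frag))
    return nodes


def layout(nodes, width):
    """Greedy line filling: on overflow, rewind to the last usable break."""
    lines, line, best = [], "", None
    i = 0
    while i < len(nodes):
        node = nodes[i]
        if node[0] == "word":
            if len(line) + len(node[1]) > width and best is not None:
                pos, text = best          # text already ends with the broken form
                lines.append(text)
                line, best, i = "", None, pos + 1
                continue
            line += node[1]
        else:
            _, unbroken, broken = node
            # Remember this break if the broken form fits, or if nothing
            # better exists (so an overfull line can still end somewhere).
            if best is None or len(line + broken) <= width:
                best = (i, line + broken)
            line += unbroken
        i += 1
    if line:
        lines.append(line)
    return lines
```

Because the hyphen only exists as the broken text of a break node, it shows up at the end of a line and nowhere else, which matches the behaviour described for soft hyphens.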
Robin_Watts	The 'interesting' question for hyphens is whether we can support hyphenating text that is not marked with &shy;	13:59.23
	auto-	13:59.41
tor8	Robin_Watts: okay, updated java commit that does away with the JavaDevice and rethrows java exceptions using the exception.toString() string to fitz	13:59.42
Robin_Watts	matically.	13:59.44
	tor8: will look in just a mo.	14:00.02
sebras	Robin_Watts: to do that we'd need to know the language and how hyphenation works in that language, no..?	14:00.12
tor8	Robin_Watts: we could, given the presence of lang="" html attributes, preprocess the text using the TeX hyphenation tables to insert &shy; characters	14:00.19
Robin_Watts	tor8: That would be a reasonable approach.	14:00.37
	I did wonder about just encoding hyphenatable positions into the text as an illegal unicode value.	14:01.06
tor8	the "problem" is we'll need to insert some sort of penalty calculation -- when do we prefer to break a word in the middle or leave extra whitespace instead	14:01.16
	breaking at every possible hyphenation point is going to be ugly, there needs to be some tweakable value	14:02.02
Robin_Watts	So rather than having "notif" "ic" "ation" we'd have "notifXicAation"	14:02.04
tor8	all as one flow node?	14:02.29
Robin_Watts	Yes.	14:02.36
	It means we need to be smart about splitting the last flow node on a line.	14:02.55
tor8	my original thoughts when designing the flow nodes was to be ["notif"] + ["-"/""] + "ic" + ["-"/""] + ["ation"]	14:03.12
Robin_Watts	but it avoids needlessly exploding the number of nodes.	14:03.19
*Robin_Watts*	is called for lunch by the boss. bbs.	14:03.47
tor8	the nodes are there to eventually allow for a shortest-path search to optimize the line breaks	14:03.57
Robin_Watts	tor8: You envisage more than a simple greedy algorithm eventually then ?	14:04.33
tor8	I do. I hope to get the TeX layout algorithm in there eventually.	14:05.00
Robin_Watts	I'm not sure how well the TeX algorithm fits with the unicode stuff and shaping. But that's ignorance on my part.	14:05.41
tor8	and do small typographic tweaks like hanging punctuation (in essence kerning . and , against the margin)	14:05.42
Robin_Watts	bbs.	14:06.01
tor8	the tex algorithm is just picking the line breaks and hyphenation points to minimize the number of hyphenations and lines with exceeding amounts of stretching for justification	14:06.53
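The trade-off described above (break a word, or leave extra whitespace) can be sketched as a small dynamic program that picks all breaks at once instead of greedily. This is a simplified stand-in for the TeX algorithm, not the algorithm itself: the cost is squared leftover space per line plus a tweakable `hyphen_penalty`, the last line is free, widths are counted in characters, and the function name and parameters are hypothetical. It assumes soft hyphens (U+00AD) never sit at the start or end of a word.

```python
def break_lines(words, width, hyphen_penalty=50):
    """Choose line breaks minimising total cost over the whole paragraph."""
    frags, soft = [], []          # soft[k]: fragment k follows a soft hyphen
    for word in words:
        for j, frag in enumerate(word.split("\u00ad")):
            frags.append(frag)
            soft.append(j > 0)
    n = len(frags)

    def line_text(i, j):
        """Render fragments i..j-1 as one line, adding '-' if broken mid-word."""
        out = frags[i]
        for k in range(i + 1, j):
            out += ("" if soft[k] else " ") + frags[k]
        if j < n and soft[j]:
            out += "-"
        return out

    inf = float("inf")
    cost = [inf] * n + [0]        # cost[i]: best cost of laying out frags[i:]
    nxt = [n] * (n + 1)           # nxt[i]: where the line starting at i ends
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            text = line_text(i, j)
            if len(text) > width and j > i + 1:
                break             # longer candidates cannot fit either
            slack = width - len(text)
            c = 0 if j == n else slack * slack
            if j < n and soft[j]:
                c += hyphen_penalty
            if c + cost[j] < cost[i]:
                cost[i], nxt[i] = c + cost[j], j

    lines, i = [], 0
    while i < n:
        lines.append(line_text(i, nxt[i]))
        i = nxt[i]
    return lines
```

Raising `hyphen_penalty` makes the layout tolerate more leftover space before it splits a word; lowering it makes hyphenation cheap. That is the kind of tweakable value the discussion asks for.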
Robin_Watts	tor8: fair enough.	14:40.27
	tor8: Couple of commits on robin/master	16:23.45
tor8	'effectievely' typo in commit message	16:29.11
Robin_Watts	Fixed.	16:30.11
tor8	blank line between fz_runetochar and fz_runelen looks like it has disappeared	16:30.28
	not sure why you broke the comment into two lines, are you using an 80-column editor?	16:31.00
	have not read strtof.c but other than my comments above LGTM	16:31.28
Robin_Watts	tor8: I am not using an 80 char editor, but I try to keep comments to shorter than 72 chars.	16:31.36
tor8	I dislike broken comments (because of the extra '*' at the beginning of the line)	16:32.08
	sometimes I wish we could just use '//' comments	16:32.36
	for the last comment -- do we really care that much about matching adobe's arguably broken behaviour?	16:33.19
	s/comment/commit/	16:33.24
Robin_Watts	oh, that comment change was simon :)	16:33.51
kens	Customers complain if we don't :-(	16:33.58
tor8	kens: this is the insane number overflow parsing	16:34.22
kens	Yeah, and like I said :-)	16:34.31
tor8	kens: bah! stupid customers!	16:34.40
	kens: too bad I can't just wish them to go away ;)	16:34.48
Robin_Watts	tor8: There are files in our test suite that go horribly wrong without it.	16:34.49
kens	The reason I changed GS's handling was to match (better) Adobe	16:34.50
tor8	Robin_Watts: kens: okay, fair enough. it stays then I guess.	16:35.03
*kens*	doesn't like it one bit either.....	16:35.14
Robin_Watts	Ok, so I'm going to look at the gs tiling problem bug.	16:35.31
	tor8: Thanks for the review.	16:35.40
mvrhel_laptop	tor8: finally getting going on the pdfwrite stuff	16:35.50
Robin_Watts	Let me know when the java stuff is in a state to be picked up again.	16:35.53
kens	I'm writing documentation, I hate writing documentation :-(	16:35.59
mvrhel_laptop	tor8: so I will get it to share the image handling that I addec	16:36.05
	added	16:36.07
Robin_Watts	actually, better go and get the dog back from the vets.	16:36.09
tor8	Robin_Watts: downloading the latest android sdk's, ndk's and studio... taking AGES due to slow pipes from google	16:36.14
Robin_Watts	As I dropped him at the vets yesterday, I spied an interesting figure on the computer screen: "Billings this period: 1944 quid"	16:36.55
	I told myself that that couldn't possibly be right, but I'm beginning to fear it might be...	16:37.37
tor8	Robin_Watts: if that's the figure for january, then he's not going to stay open much longer I can't imagine...	16:38.08
	mvrhel_laptop: cool!	16:38.15
mvrhel_laptop	tor8: do we want to do anything with the type0 font stuff?	16:38.30
Robin_Watts	tor8: No. That's the figure that they've billed me, I think.	16:39.06
tor8	Robin_Watts: ouch. expensive dogs.	16:39.17
	mvrhel_laptop: define "do anything"	16:39.22
Robin_Watts	dog was free. dog upkeep not free.	16:39.33
tor8	Robin_Watts: just like kids, from what I've heard...	16:39.48
mvrhel_laptop	tor8: is there anything that I need to do with pdfwrite and the type0 work I did. Right now it works for pdfcreate	16:39.49
Robin_Watts	It's like the whole printer/printer toner thing :)	16:39.49
mvrhel_laptop	tor8: But I was not sure if there was any other uses for it	16:40.02
	tor8: At some point, it would be nice to add the capability via an api to add in an image or text to a page in an existing document	16:40.36
	that would be useful in gsview	16:40.39
	tor8: I suspect I should be able to leverage what I have for that	16:40.52
tor8	mvrhel_laptop: ah! well, we should look into using it for pdfwrite. I believe pdfwrite's current font handling is not up to task when it comes to outputting text.	16:41.05
	now that you have the ability to create cid fonts, everything that comes into pdfwrite should come out as a cid font. we'll tackle type3 fonts later I think.	16:41.50
mvrhel_laptop	tor8: ok. I will work on the images first and then the font	16:42.10
tor8	mvrhel_laptop: yes, having an api to add stuff to pages in existing documents can be done two ways.	16:42.24
	one: interpret the page through the device interface and use pdfwrite to recreate the graphics, and then add drawing commands to the end of that	16:42.48
	two: use the 'pdf_processor' interface to clean up the syntax, and then add extra pdf graphics commands to the end	16:43.12
	the latter is probably better in the long run; it doesn't change the actual contents of the page by reinterpreting them	16:43.33
mvrhel_laptop	tor8: yes. that sounds good to me	16:43.42
tor8	so most of the file is going to be intact, with the existing fonts and encodings etc. less risk for trouble.	16:43.53
mvrhel_laptop	not sure what you mean by "clean up the syntax" though	16:43.55
tor8	and the 'sanitize' mode to the pdf_processor will make sure to balance the q/Q push states so we can safely add stuff on top at the end	16:44.13
mvrhel_laptop	ah ok	16:44.20
tor8	mvrhel_laptop: right. so we have another interface that sits between the pdf content stream parser and device interface	16:44.52
	we currently have three implementations of this interface	16:45.08
	the interface is basically just a bag of function pointers, one per content stream operator	16:45.33
	so we have functions for 'q' and 'Q' and moveto and lineto and fill etc	16:45.46
mvrhel_laptop	ok. I recall seeing that	16:46.03
tor8	one of the implementations does the pdf interpretation and turns these commands into calls to the device interface	16:46.09
	one of them just printf's the commands back out to a buffer	16:46.26
	so if we use the second one, we can get a syntax pretty-printed content stream	16:46.48
kens	tor8 you may not want to turn incoming non-CIDFonts into CIDFonts, or did I misunderstand you above ?	16:47.09
tor8	kens: you understood correctly. I think turning everything into an Identity-H CIDFont is the road to least complexity.	16:47.34
kens	It's less complex, true, but.....	16:47.45
tor8	except Type3 fonts, they'll need to be special cased.	16:47.47
	mvrhel_laptop: the code for this processing interface is in pdf-op-run.c and pdf-op-buffer.c	16:48.10
kens	If the incoming text is in a regular Font, and uses ASCII character codes, then it's searchable in Acrobat. But if you turn it into a CIDFont, then it isn't	16:48.19
tor8	for the two implementations I've mentioned so far	16:48.20
mvrhel_laptop	tor8: ok great. what is the third interface?	16:48.31
tor8	kens: even if we create a ToUnicode table?	16:48.34
kens	OK if you create a ToUnicode then it's fine	16:48.45
tor8	mvrhel_laptop: that's the fancy implementation -- pdf-op-filter.c	16:48.46
	it tracks state changes and omits redundant ones	16:49.03
	so you get a minimized content stream out	16:49.10
*kens*	has an enhancement bug somewhere to create ToUnicode CMaps in pdfwrite if we don't have one and think we can.	16:49.48
tor8	so the pdf_new_filter_processor takes another processor as an argument and forwards the "optimized" calls to it	16:50.02
mvrhel_laptop	kens: you can borrow what I wrote for mupdf	16:50.03
kens	mvrhel_laptop : That's not the problem :-)	16:50.14
mvrhel_laptop	tor8: ok that makes sense	16:50.17
tor8	so if you chain the filter and the buffer processor together you can get a nicely formatted and somewhat cleaned up content stream out	16:50.39
kens	GS already has loads of code to write ToUnicode, it's identifying the condition where such a thing is sensible	16:50.42
tor8	this is what the '-s' flag to mutool clean does	16:50.45
	so if we're editing pages to add images on top, etc. what I think we could do is run the page through said filters and then just concatenate on the graphics commands to draw the image.	16:51.22
	after the 'pdf-op-filter' pass, the q and Q's should be balanced and the graphics state is back to the initial defaults	16:51.55
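The chained-processor arrangement described above can be sketched roughly as follows. This is an illustration of the idea only, not the code in pdf-op-buffer.c or pdf-op-filter.c: the real interface is a C struct of function pointers with one entry per operator, whereas this sketch uses a single hypothetical `op` method, made-up class names, and tracks only a handful of graphics state operators.

```python
# Operators treated as simple graphics-state settings in this sketch
# (line width, cap, join, fill and stroke RGB colour).
STATE_OPS = {"w", "J", "j", "rg", "RG"}


class BufferProcessor:
    """Writes each operator back out as text, one operator per line."""

    def __init__(self):
        self.out = []

    def op(self, name, *args):
        self.out.append(" ".join([*(str(a) for a in args), name]))

    def text(self):
        return "\n".join(self.out)


class FilterProcessor:
    """Forwards operators to the next processor, dropping redundant
    state changes and keeping q/Q balanced."""

    def __init__(self, chain):
        self.chain = chain
        self.stack = [{}]         # one dict of known settings per q level

    def op(self, name, *args):
        if name == "q":
            self.stack.append(dict(self.stack[-1]))
        elif name == "Q":
            if len(self.stack) == 1:
                return            # unmatched Q: drop it
            self.stack.pop()
        elif name in STATE_OPS:
            if self.stack[-1].get(name) == args:
                return            # same value already set: redundant
            self.stack[-1][name] = args
        self.chain.op(name, *args)

    def close(self):
        """Emit a Q for every q still open so the stream ends balanced."""
        while len(self.stack) > 1:
            self.stack.pop()
            self.chain.op("Q")
```

Once the filter has been closed, the q/Q nesting of the buffered stream is back at its starting level, so extra drawing commands (for example a `q ... Do Q` sequence for an image) can be appended to the buffer without inheriting leftover state from the page.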
	kens: yeah, given garbage in we're going to have garbage out. I expect this will fail in some cases; I think mvrhel is building the ToUnicode from the actual font cmaps	16:52.50
mvrhel_laptop	tor8: ok. I believe I follow that. So is the added content stored with the existing content in a pdfobj at that point?	16:52.59
tor8	so if those are missing, we're in trouble	16:53.01
kens	tor8 fonts don't have CMaps, only CIDFonts :-)	16:53.10
	But it shouldn't be any worse than the original anyway	16:53.28
tor8	mvrhel_laptop: I haven't got that far. the current code just gives you the primitive operations; nothing is tied into automatically updating pdf_obj's and associated streams.	16:53.40
mvrhel_laptop	tor8: sorry	16:53.50
tor8	kens: lowercase truetype cmaps, sorry for the confusion	16:53.58
kens	:-)	16:54.05
tor8	kens: or glyph names	16:54.06
mvrhel_laptop	gawd. more font syntax goofiness	16:54.18
kens	Yeah glyph names was my approach, we have the Adobe Glyph List already in GS	16:54.25
tor8	mvrhel_laptop: I took a quick glance through your latest commits.	16:56.59
	It looks like you suffer the same affliction as Robin and most other people who use syntax coloring...	16:57.20
mvrhel_laptop	what is that	16:57.45
tor8	you don't put blank lines before a new section is introduced with a comment, for example in pdf_add_cid_to_unicode at the "Now output non-zero entries" comment I would've put a blank line before	16:58.10
mvrhel_laptop	tor8: ah ok	16:59.02
tor8	I possibly overdo the blank lines personally; I think about code in paragraphs. and it helps me navigate with my editor, there are quick short cuts to skip to the next/prev blank line	16:59.08
mvrhel_laptop	I can see where without color that would be harder to see	16:59.14
tor8	I haven't used syntax coloring in over a decade; I find it extremely distracting.	16:59.38
mvrhel_laptop	and I can understand the editor usefulness	16:59.39
	I love it	16:59.44
	I will fix these	16:59.50
tor8	turn it off for a few weeks; code formatting will only improve once you don't have coloring as a crutch ;)	17:00.19
mvrhel_laptop	what about at line 2032	17:00.34
	do you put a blank line before that comment?	17:00.47
	the line before it is a {	17:00.53
tor8	mvrhel_laptop: just a sec, I'm looking in gitk	17:01.01
	mvrhel_laptop: nah, the '{' on a line of its own is enough of a paragraph separator for me	17:02.07
mvrhel_laptop	tor8: ok good. oh also a question. for pdfcreate do we want to add a command line option to use either the simple / or type 0?	17:02.23
tor8	I'd vote for simple, or distinguish using a different command line flag to add the font?	17:02.50
	the use case for the tool is to hand craft simple pdf files; and there the simple fonts would be easier to work with	17:03.27
mvrhel_laptop	yes	17:03.32
	tor8: ok. great. thanks for all the details about the options after the pdf content stream parser. I will take a look at this once I get pdfwrite changes in place	17:05.04
	maybe around that time we can merge this work in?	17:05.12
	perhaps after the release though	17:05.23
	I worry about the testing for pdfwrite	17:05.29
	do we currently have any regression testing in place for it?	17:05.42
tor8	mvrhel_laptop: yes. I wouldn't worry much, I certainly hope we don't have anybody actually using our pdfwrite yet.	17:05.48
	I'm not sure about testing, Robin would know	17:06.02
mvrhel_laptop	gsview uses it a lot	17:06.07
	or at least I use it quite a bit with gsview	17:06.13
tor8	mvrhel_laptop: ah!	17:06.20
mvrhel_laptop	mainly for expanding content	17:06.23
tor8	good :)	17:06.24
	we have a user!	17:06.32
mvrhel_laptop	anytime I have a pdf file that I need to fool with I open with gsview and save as expanded pdf	17:06.50
tor8	mvrhel_laptop: typedef struct resource_tables_s resource_tables; that publicly visible struct really needs to have our pdf_ namespace prefix	17:07.25
mvrhel_laptop	tor8: ok. that makes sense	17:07.39
tor8	same with res_table and res_search_fn	17:07.57
mvrhel_laptop	ok	17:08.07
tor8	if structs and functions are local to a file, I don't mind skipping the prefix but we shouldn't pollute the namespace with externally visible symbols	17:08.46
mvrhel_laptop	right.	17:08.56
tor8	I'll probably go and make you rename every single function once we're ready to merge. we haven't been perfectly consistent in our naming,	17:09.54
	but I have tried to make some effort to clean up some of the places lately. as you've no doubt run into and had to fix your code for.	17:10.24
	I'm sure you've already looked at docs/naming.txt ... I ought to add some more examples and motivations to that document.	17:11.09
	and I'm sure there are things I've forgotten to put in it in the first place	17:11.30
mvrhel_laptop	tor8: yes. no problem. I will take another look at that document also.	17:11.47
	tor8: tbh I may not have read naming.txt before or it may have been a long time ago. Just been copying the style that I saw. This clears up a few things...	17:17.46
tor8	mvrhel_laptop: the pdf_obj function naming does not follow the style guide	17:18.29
	Robin prefers functions of the noun_verb kind (object oriented) whereas I prefer verb_noun (it reads better, and I prefer functional programming to o.o.)	17:19.19
	and we've had verb_noun since the beginning even though sometimes it would be clearer with noun_verb	17:20.12
	conversion functions are generally of the x = x_from_y(y) form, easier to keep track of what's what with the alliteration of variable names and the function name parts	17:21.32
	x = y_to_x(y) is just awkward, IMO	17:21.46
mvrhel_laptop	certainly it is good to pick one and do it for all	17:22.08
Robin_Watts	mvrhel_laptop: You get used to it. tor has just completely renamed everything in the java stuff I did :)	17:24.25
	I note that we now have to_Matrix rather than fz_matrix_to_Matrix.	17:24.42
mvrhel_laptop	tor8: added the pdf prefix to the visible structures. question for you: should pdf_resource_table_free not use the "free" wording, since it is not really rc'd? I see in naming.txt you say that that word is reserved for rc schemes. I could change it to release	17:33.52
	just trying to reduce the amount of rewrite as I move forward	17:34.32
Robin_Watts	keep/drop are the rc words.	17:36.12
	free seems fine to me, I think.	17:37.03
mvrhel_laptop	ok	17:40.04
	bbiab	17:52.04
tor8	mvrhel_laptop: pdf_drop_resource_table is the preferred naming	22:22.33
	Robin_Watts: we use drop for both free and refcounted things. saves us having to rename (and remember which is which) should we change	22:23.13