Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

	20150910
srini	Mudraw html does not recognize decorations like underlines...?	06:43.11
kens	chrisl, you having any network problems this morning?	06:54.17
	I'm wondering if my ISP's DNS is shafted	06:54.32
chrisl	kens: I haven't so far, but then I just sat down	06:56.10
	Nope, everything seems fine	06:56.56
kens2	Weird	06:57.02
	Shifting to the Google DNS server helps, in that I can now resolve the IP addresses, but I can't ping a boatload of sites	06:57.34
	Eg http://www.bbc.co.uk is OK, but http://www.dilbert.com isn't	06:58.08
	It does seem to be progressively improving, I suspect my ISP has a problem	06:58.44
chrisl	dilbert.com works fine for me - actually, everything seems quite quick this morning	06:59.07
kens2	It is fine for me too, now.....	06:59.28
	Oh it seems dilbert just doesn't respond to pings	07:00.06
	I am having to use the Google DNS though	07:00.26
chrisl	Well, lots of sites ignore pings these days	07:00.50
kens2	yeah, but I know it used to be OK, which is why I'm using it, they must have changed	07:01.10
chrisl	Yeh, it's not responding to ping for me either.	07:02.18
kens2	Not to worry, now I switched DNS everything is working. I've had to do that before with my ISP's DNS	07:02.46
srini	What is the best time to catch up with mupdf developers?	07:11.23
kens2	an hour or two from now	07:11.33
srini	kens2, Thank you!	07:11.47
tor8	Robin_Watts: so you're finally doing that forwarding device? I remember running into some trouble doing that before.	08:02.58
	the way paths and a few other things require back-and-forth communication	08:03.20
	s/paths/patterns/	08:03.26
	but I guess enough time might have passed for those issues to have been fixed in the passing	08:04.55
Robin_Watts	tor8: they do?	08:34.07
tor8	to be fair I was trying a 'tee' device, to feed two devices from the same source	08:34.34
	that didn't fare well with the pattern device call having a return code and that the interpreter can change behaviour based on	08:35.01
Robin_Watts	tor8: Ah, yes.	08:35.10
tor8	a straight up forwarding/decorating device shouldn't have the same troubles	08:35.33
	when I get some time over I would like to expose the device and interpreter interfaces to javascript, so we could script this kind of stuff without having to recompile	08:36.17
	a "mutool run script.js"	08:36.35
Robin_Watts	tor8: Yeah, I have some prototype code somewhere that exposes the device interface into android java.	08:43.51
tor8	Robin_Watts: is there a quick way to get the bounding box for a display list?	08:44.15
	I'm thinking of adding two fz_rects to the display list struct	08:44.33
	one for the mediabox passed to the fz_begin_page call and another for the actual inked area	08:44.43
	as would be calculated by the bounding box device	08:44.59
Robin_Watts	What if you have more than one beginpage?	08:45.03
tor8	you shouldn't, ever	08:45.12
Robin_Watts	Suppose you open a display device, and then fz_run_contents, then fz_run_annotations.	08:45.41
tor8	but yes, that is a fair question. I guess we could take the union.	08:45.43
	fz_run_annotations wouldn't call beginpage	08:45.51
Robin_Watts	So fz_end_page is not the last thing in a display list?	08:46.07
tor8	only for actual pages	08:46.41
Robin_Watts	Or suppose you want to 'merge' 2 pages, by running 2 pages into the same device with different CTMs ?	08:46.55
tor8	fz_run_page wraps the page contents and the annots with begin/end page calls	08:47.06
	that's a better scenario	08:47.30
Robin_Watts	tor8: Right, so fz_run_page gives different results from running the contents and annots separately.	08:47.34
tor8	and in that case, taking the union is the right thing to do, but I haven't really considered how to handle that case on the backend side with the pdfwrite device etc	08:48.12
Robin_Watts	tor8: I agree that taking the union would be the right thing to do.	08:48.38
	I'm disquieted by the difference in behaviour between fz_run_page and fz_run_contents/annots.	08:49.06
	Holding a couple of top level rects doesn't offend me massively.	08:49.38
tor8	if you're calling fz_run_page_contents, I expect you to call fz_begin_page manually as part of the contract for doing things the low level way	08:49.41
Robin_Watts	I worry that computing the second one might slow down display list collection.	08:50.00
tor8	I'm looking through the code and wondering what the different rects that the display list collection calculates are	08:50.30
	if it might be easy to just stash them away at an opportune spot into a high level rect	08:50.58
Robin_Watts	pdfapp.c does not call fz_begin_page, end_page then.	08:51.35
tor8	Robin_Watts: on tor/master is the main reason for this ... a few high level functions to wrap the most common ways of creating and using devices	08:51.46
	Robin_Watts: and in adding a few more to that I ran into the issue with the bounding box missing from the display list.	08:52.33
	I want to call fz_new_display_list_from_page() and then fz_new_pixmap_from_display_list()	08:52.53
	without having to query the page mediabox manually, so the display list needs a way to bound itself for the latter call	08:53.14
Robin_Watts	Nor does platform/android/jni/mupdf.c	08:53.18
tor8	preferably using the mediabox	08:53.19
	those bits of code predate the addition of the begin_page/end_page calls	08:53.39
	it might be that we should call begin_page/end_page for the run_page_contents call	08:54.02
Robin_Watts	right, but it's an area that should be cleaned up. Not blaming anyone.	08:54.03
tor8	Robin_Watts: agreed.	08:54.07
Robin_Watts	I agree that we'd like a way to get the media box out of the displaylist.	08:55.07
	A top level rect seems reasonable.	08:55.26
	which is the union of all the begin_pages in the list.	08:55.40
tor8	but if we're going to use the beginpage/endpage calls in high level output devices then getting the endpage before the annotations is going to be awkward	08:55.49
	yes, that sounds ideal	08:55.59
Robin_Watts	It's the other calculation (the inked area) which may be trickier to achieve without doing excess work.	08:56.20
tor8	but also means that if I make separate display lists for annotations, which bounding box do I use for those?	08:56.34
Robin_Watts	I'd be tempted to just use fz_bound_page for that one, and let it work the usual way.	08:56.40
tor8	could be that each annotation has its own bounding box that I could feed into the begin_page calls when creating the display list for them	08:57.07
Robin_Watts	Could we add a flag to begin_page where we can say if it's a page or an annotation?	08:57.30
	Actually... do we ever want annotations to change the media size of the page ?	08:57.54
tor8	we don't	08:58.16
	so going to be a bit of a hack	08:58.42
Robin_Watts	So, there are 3 bboxes we need.	08:58.52
	1) the mediabox (the union of the page boxes)	08:59.03
tor8	I need the page mediabox and the annotation for rendering annotations to temporary bitmaps	08:59.32
Robin_Watts	2) the inked region (the union of all the marking operations for a page intersected with 1))	08:59.40
	3) the bbox that completely covers both the page and the annotations.	08:59.59
tor8	I'm planning to do several separate bitmaps and then compose the final screen from those rather than re-rendering the page + annotations each update from display lists	09:00.06
Robin_Watts	tor8: Right.	09:00.32
tor8	I don't think we need 3)	09:00.37
Robin_Watts	Well, we might be able to live without 3) if we can fz_bound_annot.	09:00.47
tor8	ah, we do have a fz_bound_annot function	09:01.03
Robin_Watts	Maybe hold separate displaylists for each annotation?	09:01.13
tor8	and I can live with just 1) for now	09:01.16
	I was going to hold separate display lists for each annotation and the page	09:01.25
	so I should just need to make sure to begin/endpage each display list with its appropriate bbox	09:01.39
Robin_Watts	So each annot displaylist will have no 'mediabox', as it won't have any begin_page/end_page.	09:01.49
tor8	being the mediabox or annotation box	09:01.50
	or as you said, add a flag to begin_page	09:02.18
Robin_Watts	Either annotations have begin/end_page, or they don't.	09:02.31
	If they do have it, then we can't get a proper mediabox from a displaylist generated by run_page.	09:02.57
tor8	how about making sure the devices can cope with nested begin/endpage calls?	09:03.04
Robin_Watts	Not sure that helps.	09:03.25
tor8	take the union of the top level ones and intersect child beginpage boxes	09:04.00
	we could pass a 'mediabox' to fz_new_display_list...	09:04.49
Robin_Watts	I still can't see how that nesting scheme solves this situation, and it would require changes to the bbox device to properly clip, or it'd be nonsensical.	09:05.33
tor8	but I rather like the begin_page / end_page calls as they are	09:05.54
Robin_Watts	tor8: But we don't necessarily know the mediabox at display list creation time.	09:06.02
tor8	yeah, ignore the nesting thing, that's just adding bad complexity	09:06.12
	actually, I think we do	09:06.29
Robin_Watts	I am tempted to say leave begin_page as it is (except properly insert it on run_page_contents).	09:06.51
	We can keep the media box generated by that.	09:07.01
tor8	but that was one of the reasons for adding the begin_page/end_page calls	09:07.02
Robin_Watts	We should either add a begin/end_annotation call (or add a flag to begin_page).	09:08.17
	annotations don't include a mediabox.	09:08.27
	Actually, let me start again.	09:09.23
	Add a flag to begin_page to say if it's a page or an annotation.	09:09.39
	We keep 3 rects in the displaylist.	09:09.48
	1) The mediabox of the pages.	09:09.52
	2) the mediabox of the annotations	09:09.58
	3) the union of the inked regions of the page	09:10.43
	That gives us everything we could possibly need I think.	09:11.06
	and every call is clearly defined.	09:11.19
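The three-rect proposal above can be sketched roughly as follows. This is only an illustration under stated assumptions: `rect`, `list_bounds`, `list_begin_page`, `list_mark` and `list_end_page` are hypothetical names invented here, not MuPDF's real display list API, and the `is_annot` flag is the one being proposed in the discussion rather than an existing parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not MuPDF's real types. */
typedef struct { float x0, y0, x1, y1; } rect;

/* x0 > x1 marks "nothing recorded yet", the identity for a union. */
static const rect RECT_EMPTY = { 1, 1, -1, -1 };

static bool rect_is_empty(rect r) { return r.x0 > r.x1 || r.y0 > r.y1; }

static rect rect_union(rect a, rect b)
{
    rect r;
    if (rect_is_empty(a)) return b;
    if (rect_is_empty(b)) return a;
    r.x0 = a.x0 < b.x0 ? a.x0 : b.x0;
    r.y0 = a.y0 < b.y0 ? a.y0 : b.y0;
    r.x1 = a.x1 > b.x1 ? a.x1 : b.x1;
    r.y1 = a.y1 > b.y1 ? a.y1 : b.y1;
    return r;
}

static rect rect_intersect(rect a, rect b)
{
    rect r;
    r.x0 = a.x0 > b.x0 ? a.x0 : b.x0;
    r.y0 = a.y0 > b.y0 ? a.y0 : b.y0;
    r.x1 = a.x1 < b.x1 ? a.x1 : b.x1;
    r.y1 = a.y1 < b.y1 ? a.y1 : b.y1;
    return rect_is_empty(r) ? RECT_EMPTY : r;
}

typedef struct {
    rect page_box;   /* 1) union of the boxes passed for pages */
    rect annot_box;  /* 2) union of the boxes passed for annotations */
    rect inked_box;  /* 3) union of page marks, clipped to their page box */
    rect current;    /* box of the begin_page currently open */
    bool open;       /* inside a begin/end pair? */
    bool is_annot;   /* flag given to the open begin_page */
} list_bounds;

void list_bounds_init(list_bounds *lb)
{
    lb->page_box = lb->annot_box = lb->inked_box = lb->current = RECT_EMPTY;
    lb->open = false;
    lb->is_annot = false;
}

void list_begin_page(list_bounds *lb, rect box, bool is_annot)
{
    assert(!lb->open); /* no nesting, matching the sequence preferred above */
    lb->open = true;
    lb->is_annot = is_annot;
    lb->current = box;
    if (is_annot)
        lb->annot_box = rect_union(lb->annot_box, box);
    else
        lb->page_box = rect_union(lb->page_box, box);
}

/* Record the bounding box of one marking operation. */
void list_mark(list_bounds *lb, rect bbox)
{
    /* Only page marks inside a begin/end pair count toward rect 3). */
    if (lb->open && !lb->is_annot)
        lb->inked_box = rect_union(lb->inked_box,
                                   rect_intersect(bbox, lb->current));
}

void list_end_page(list_bounds *lb)
{
    assert(lb->open);
    lb->open = false;
}
```

Running two pages into the same list simply grows `page_box` to their union, which is the behaviour agreed on earlier for the merged-pages case. Whether computing rect 3) during collection costs too much is exactly the open question raised above; this sketch does not measure it.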
tor8	Robin_Watts: okay. how do you envision the call nesting for the begin_page(is_annot) calls?	09:12.55
Robin_Watts	tor8: every fz_run_annot starts with a begin annot and ends with an end_annot.	09:19.03
	every fz_run_contents starts with a begin_page and ends with an end_page	09:19.25
	So it's simple to see from the display list where each thing came from.	09:19.38
tor8	so basically: begin_page, page contents, end_page, begin_annot, ..., end_annot, begin_annot ... end_annot	09:20.13
	I was otherwise thinking begin_page ... begin_annot ... end_annot begin_annot ... end_annot end_page	09:20.52
	Robin_Watts: on tor/master, a basic first step that just records page bounding boxes and doesn't attempt to cope with per-annotation separate display lists	09:23.32
Robin_Watts	tor8: I prefer the former to the latter.	09:24.27
tor8	hm, maybe we shouldn't be passing the ctm to fz_bound_display_list	09:24.48
	Robin_Watts: actually, now I remember another reason why I added the calls	09:25.04
Robin_Watts	the latter makes it impossible to do fz_run_page_contents then fz_run_annot.	09:25.06
tor8	so that we can use them in the pdfwrite device so you can get natural page delineations without having to recreate the device for each page	09:25.41
	which makes a lie of my earlier statement that I only expect to see one begin_page call	09:26.04
	and means that we really ought not to call begin_page automatically in run_page_contents, since that would foul up any n-up style devices	09:26.37
Robin_Watts	tor8: ok... just mulling that.	09:29.25
	tor8: No... I think I disagree.	09:30.14
	when we call fz_run_page, that must call fz_begin_page, ...contents ..., fz_end_page.	09:30.33
	If I run that to a display list and then run the display list out again, I want to get the exact same sequence of events.	09:31.03
tor8	fz_run_page draws the whole page and wraps it with begin_page/end_page and that's fine	09:31.18
Robin_Watts	If we have an n-up device, then that should swallow the begin_page/end_pages, and regenerate a new one.	09:31.39
tor8	but when calling run_page_contents, I think we should make the begin_page/end_page wrapping the clients responsibility	09:31.41
Robin_Watts	ok, I can't immediately see an objection to that.	09:32.17
tor8	run_page_contents is for the guts of one page without the begin/end_page package	09:32.24
	and n-up can be done in two ways here I think	09:32.51
	1) at the fz_document level, by calling run_page_contents with various matrices	09:33.07
	2) at the fz_device level by eating begin_page/end_page calls and decorating and forwarding calls	09:33.39
kens	ah drat, I just created a branch on master I didn't mean to	09:44.39
	tor8 or Robin_Watts would one of you mind killing the branch dynamic_PJL_allocs please ?	09:45.34
Robin_Watts	kens: on golden?	09:47.25
kens	yes please	09:47.33
Robin_Watts	git push golden :dynamic_PJL_allocs	09:47.37
srini	I need help in mupdf text extraction to html	09:47.45
kens	Having screwed it up I don't think I trust myself to undo it	09:47.45
srini	is there a way to get the text-decorations? Underline StrikeThrough etc.... and also is there any patch on h1 h2 h3 detection?	09:48.24
Robin_Watts	done.	09:48.25
kens	thanks Robin_Watts	09:48.30
srini	font-color also	09:48.41
Robin_Watts	srini: The text decorations aren't defined in PDF.	09:48.44
srini	Robin_Watts, so we have not way to detect underlined text?	09:49.11
	*no way	09:49.32
kens	You could detect the presence of a thin line parallel to, and in close proximity to, a glyph	09:50.10
	Which 'might' be an underlined piece of text, or it might be the top of the frame for a figure right underneath it	09:50.39
	So you would have to further test that the path extended from the beginning of the text to the end of the text, and no further	09:51.14
srini	How about h1 h2 h3 detections	09:52.10
kens	No idea what you are talking about there	09:52.22
Robin_Watts	srini: In the PDF file, all we get is 'put this glyph in this font, at this position'.	09:52.30
	And things like underlines/strikeouts are added on later using line art.	09:52.45
	There is no concept of 'structure'.	09:52.55
	Hence we have no h1/h2/h3 information.	09:53.03
srini	Hmmm ... got it...	09:53.09
Robin_Watts	We don't even have paragraph information.	09:53.11
	Now, when we do text extraction we do some funky heuristics that try to reassemble chars to words, words to lines, lines to paragraphs, etc.	09:53.40
	but those are all prone to failure.	09:53.47
kens	chrisl, looks like I was suffering from this problem this morning:	09:53.51
	http://www.theregister.co.uk/2015/09/10/major_plusnet_outage_dns_goes_titsup/	09:53.53
srini	I actually tried a way sometime.... find most used font and size, considering this a base font size, continue searching fonts greater than base font size and mark them into h1 h2 h3 etc	09:54.15
	but did not actually work well that way	09:54.27
Robin_Watts	srini: That's exactly the kind of think you'd need to do.	09:54.44
	your best bet is to operate on the structures you get back from the fz_stext_device.	09:55.12
kens	All these things are, in the absence of data, heuristic. If your heuristic doesn't work well, then you need to refine it	09:55.13
Robin_Watts	s/think/thing/	09:55.28
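The font-size heuristic srini describes could look something like the sketch below. It is an assumption-laden illustration, not MuPDF code: the `span` struct is a hypothetical simplification of whatever you collect from the text extraction structures, and capping at three heading levels is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one extracted text span. */
typedef struct {
    float size;  /* font size of the span */
    int chars;   /* number of characters in the span */
} span;

/* The base size is the one carrying the most characters overall.
 * Sizes are compared exactly; real input may need rounding first. */
float base_font_size(const span *spans, size_t n)
{
    float best = 0;
    long best_count = -1;
    size_t i, j;

    assert(spans != NULL && n > 0);
    for (i = 0; i < n; i++) {
        long count = 0;
        for (j = 0; j < n; j++)
            if (spans[j].size == spans[i].size)
                count += spans[j].chars;
        if (count > best_count) {
            best_count = count;
            best = spans[i].size;
        }
    }
    return best;
}

/* Returns 0 for body text, or 1..3 for h1..h3.
 * The largest size above the base maps to h1, the next to h2, and so on. */
int heading_level(const span *spans, size_t n, float size)
{
    float base = base_font_size(spans, n);
    int larger = 0;
    size_t i, j;

    if (size <= base)
        return 0;
    /* Count the distinct sizes that are bigger than this one. */
    for (i = 0; i < n; i++) {
        int seen = 0;
        if (spans[i].size <= size)
            continue;
        for (j = 0; j < i; j++)
            if (spans[j].size == spans[i].size)
                seen = 1;
        if (!seen)
            larger++;
    }
    return larger + 1 > 3 ? 3 : larger + 1;
}
```

As noted in the discussion, this kind of guess is prone to failure: a document with a large pull quote or a big page number will produce false headings, so expect to refine it against real files.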
srini	Any official/unofficial patch available for text decorations?	09:56.49
	And, how about font-color?	09:57.17
Robin_Watts	nope.	09:57.20
kens	I thought the current color was in the XML output (I could be wrong)	09:57.41
Robin_Watts	kens: Not in the text extraction output, I thought.	09:58.04
kens	Ah....	09:58.11
	It's been a while (ie more than 10 minutes) and therefore I cannot recall for certain	09:58.34
Robin_Watts	oh, wait, you may be right.	09:58.50
	the color might be in the style.	09:58.56
srini	Robin_Watts, kens - many things are not available in the text extraction... they are there in the library	09:59.01
Robin_Watts	no, the text color is passed to fz_lookup_text_style, but is not actually used.	09:59.50
	You could extend that fairly simply if you wanted.	09:59.58
*kens*	thinks PlusNet is having a very bad day, I'm glad I'm not on their Hell Desk	10:01.26
srini	I am not a seasoned hand on coding... decoration especially underline... can someone help me add code?	10:03.33
Robin_Watts	srini: Forget about underline. That ain't gonna happen easily.	10:04.22
kens	likewise strikethrough I'd have thought	10:04.35
Robin_Watts	same deal, yes.	10:04.41
	color should be a simple addition for anyone that knows C.	10:04.53
kens	that one might be easier to detect, but it's still a project	10:04.54
Robin_Watts	kens: strikethrough and underline are both basically the same problem (dot products with the baseline and examine the offset)	10:05.47
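A rough sketch of that dot-product test follows, combined with the earlier point that the line must run from the start of the text to the end and no further. Everything here is assumed for illustration: the `point` type, the function name, a y-up coordinate space, and all the threshold values are guesses rather than anything taken from MuPDF. It only looks at stroked line segments, so it cannot see decorations drawn some other way.

```c
#include <assert.h>

typedef struct { float x, y; } point;
typedef enum { LINE_UNRELATED, LINE_UNDERLINE, LINE_STRIKETHROUGH } line_kind;

/* Guessed tuning values; expect to adjust them against real files. */
#define EXTENT_SLACK 0.05f /* over/undershoot allowed, fraction of run length */
#define PARALLEL_TOL 0.05f /* drift allowed between line ends, in font sizes */
#define UNDER_MIN 0.02f    /* underline band below the baseline, in font sizes */
#define UNDER_MAX 0.35f
#define STRIKE_MIN 0.20f   /* strikethrough band above the baseline */
#define STRIKE_MAX 0.45f

/* True when raw / run_length / size lies in [lo, hi]. The comparison is
 * done on squares so that no square root is needed. */
static int in_band(float raw, float len2, float size, float lo, float hi)
{
    float scale = size * size * len2;
    float sq = raw * raw;
    return raw >= 0 && sq >= lo * lo * scale && sq <= hi * hi * scale;
}

/* b0..b1 is the baseline of a text run, p..q a candidate line segment. */
line_kind classify_line(point b0, point b1, float size, point p, point q)
{
    float dx = b1.x - b0.x, dy = b1.y - b0.y;
    float len2 = dx * dx + dy * dy;
    float tp, tq, op, oq, tmin, tmax, drift, mid;

    assert(len2 > 0 && size > 0);

    /* Dot products with the baseline: position along the run (0 = start, 1 = end). */
    tp = ((p.x - b0.x) * dx + (p.y - b0.y) * dy) / len2;
    tq = ((q.x - b0.x) * dx + (q.y - b0.y) * dy) / len2;
    /* Dot products with the normal (-dy, dx): offset times run length,
     * positive above the baseline when y points up. */
    op = (p.y - b0.y) * dx - (p.x - b0.x) * dy;
    oq = (q.y - b0.y) * dx - (q.x - b0.x) * dy;

    /* The line must span the run from start to end, and no further. */
    tmin = tp < tq ? tp : tq;
    tmax = tp < tq ? tq : tp;
    if (tmin < -EXTENT_SLACK || tmin > EXTENT_SLACK)
        return LINE_UNRELATED;
    if (tmax < 1 - EXTENT_SLACK || tmax > 1 + EXTENT_SLACK)
        return LINE_UNRELATED;

    /* The line must be parallel to the baseline. */
    drift = op - oq;
    if (drift * drift > PARALLEL_TOL * PARALLEL_TOL * size * size * len2)
        return LINE_UNRELATED;

    /* Finally, examine the offset of the line from the baseline. */
    mid = 0.5f * (op + oq);
    if (in_band(-mid, len2, size, UNDER_MIN, UNDER_MAX))
        return LINE_UNDERLINE;
    if (in_band(mid, len2, size, STRIKE_MIN, STRIKE_MAX))
        return LINE_STRIKETHROUGH;
    return LINE_UNRELATED;
}
```

The extent check is what separates an underline from the top edge of a figure frame sitting just below the text, since a frame edge will usually run well past the ends of the run.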
kens	Actually, I wonder if the underline and strikethrough are done as part of the font definition, in which case they are inherently undetectable in PDF anyway	10:05.54
	Robin_Watts : that assumes that the work isn't done by defining a type 3 font which first draws the underlying font glyph, then draws a line	10:06.25
Robin_Watts	kens: true.	10:06.38
kens	I'd be fairly certain that's how QuarkXPress would do it, but I doubt if MS Office does (for instance), but I might be wrong	10:06.55
	There's a bunch of weird stuff in the MS PS output that I've never needed to look at	10:07.23
srini	mostly I come across text merging issues.... they are generally because of very small space <less than general tab space> at the beginning of the paragraphs...	10:25.46
	can we have a method to alter them from the command line...	10:26.19
hj__	is there a difference in handling file names between 9.16 and 9.07? we have an input file name using Chinese characters and ghostscript 9.16 gives an open error	13:27.14
kens	You probably need to use the latest code. There may be a work-around, I don't recall	13:27.55
hj__	in our debug it looks that "os_status" is called with the UTF8 name passed to stat. are there no specific os_status/os_rename etc for windows doing the wide char version	13:31.07
	when converting the utf8 name in os_status to wchar and calling _wstat instead of stat it seems to work	13:34.18
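The workaround hj__ describes can be sketched as below: convert the UTF-8 name to wide characters and call `_wstat` on Windows, falling back to plain `stat` elsewhere. This is only an illustration under stated assumptions: `stat_utf8` is a hypothetical helper name, not Ghostscript's actual `os_status` implementation, and the fixed `MAX_PATH` buffer is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

#ifdef _WIN32
#include <windows.h>
#include <wchar.h>
#endif

/* Hypothetical helper: returns 0 and stores the file size on success,
 * or -1 if the path cannot be converted or does not exist. */
int stat_utf8(const char *utf8_path, long long *size)
{
#ifdef _WIN32
    wchar_t wide[MAX_PATH];
    struct _stat st;
#else
    struct stat st;
#endif

    assert(utf8_path != NULL && size != NULL);

#ifdef _WIN32
    /* The narrow stat() would interpret the bytes in the ANSI code page,
     * so convert the UTF-8 name to UTF-16 and use the wide-char call. */
    if (MultiByteToWideChar(CP_UTF8, 0, utf8_path, -1, wide, MAX_PATH) == 0)
        return -1;
    if (_wstat(wide, &st) != 0)
        return -1;
#else
    /* Elsewhere the file system takes the bytes as given. */
    if (stat(utf8_path, &st) != 0)
        return -1;
#endif

    *size = (long long)st.st_size;
    return 0;
}
```

The same conversion step would presumably apply to rename and delete via `_wrename` and `_wremove`, which is the gap hj__ goes on to ask about.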
	a quick look in the code...the os_rename/os_status/os_delete should have a windows specific implementation	13:35.14
chrisl	hj__: as kens said, that's been fixed, a few months ago	13:40.35
hj__	thanks.	13:41.37
chrisl	hj__: here's the commit: http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=1572afcd	13:43.00
hj__	chrisl: thanks for the stat commit. do you know if the rename/remove also have got a similar change?	13:46.52
	i just looked in the git source. the os_delete and os_rename just call the default c functions. this might need some changes for windows wide char	13:51.54
kens	If you think you've found a bug, feel free to open a bug report. Don't forget to include a sample file and instructions for reproducing the bug.	13:53.57
hj__	oke	13:54.20
henrys	kens: begrudgingly I'm good with it, it was intended never to fail, only doing allocations at startup; the job manager should always be able to run on a pcl printer, but I realize that's too inflexible.	14:02.31
kens	henrys I could go with that if it wasn't for the fact that we don't action the PJL immediately but save it all up. To do that, I need a bigger (potentially much bigger) allocation	14:04.50
	By the way, it won't fail if it can't allocate a new string or table, it just keeps on using the old one	14:05.36
	If it only runs at startup, a failure to allocate one of these small strings would be only one step ahead of the whole interpreter folding up anyway I'd have thought	14:06.20
	Anyway, I'll move ahead to the distillerparams	14:07.16
henrys	kens: sounds good.	14:08.21
kens	Well, this will be the more interesting part :)	14:08.48
henrys	kens: where are the device parameter setting going to take place exactly as I'm rewriting much of plmain and friends.	14:09.00
	?	14:09.09
kens	It'll be simple strings	14:09.22
	I have yet to decide on delimiters for arrays and dicts, probably I'll just stick with the PS ones	14:09.57
	henrys, what do you see with C705.bin and -dFirstPage=2 -dLastPage=4 ?	14:10.28
	D'oh I clearly cannot read	14:10.42
	I will plan to process the distillerparams at the same time as all the other device stuff	14:11.05
	So all the params will get assembled into a big set of storage, then we'll call some magic procedure which will turn it all into param lists and call putdeviceparams	14:11.37
	IIRC this is pretty much what happens for the existing settings ?	14:12.02
henrys	kens: I haven't looked at C705.bin but it can't work, it installs XL's device in pcl's state the page count is gone.	14:14.56
kens	The page count is held by the First/Last page device	14:15.25
	It's independent of the interpreter	14:15.31
	Err, I think.....	14:15.50
	It does work on Windows	14:16.02
henrys	XL and PCL have their own device and their own subclass.	14:16.28
kens	Well each should operate independently, assuming you pass the device params to both	14:17.09
	Like I said, I did try this on Windows and it seemed to do what I expected	14:17.36
henrys	oh maybe I should look again.	14:18.52
kens	It didn't work on Linux, but it didn't work without -dFirstPage either	14:19.12
	I have no clue why	14:19.28
henrys	I will look at it, it was "a note to self" type bug. It can wait.	14:21.52
kens	No problem, I'll leave it with you	14:22.30
mvrhel_laptop	tor8: is there any particular reason that fz_keep_page does not have a prototype in document.h?	16:48.14
	I was maybe going to make use of it in my page cache	16:49.04
	or Robin_Watts if tor8 is not around...	16:50.11
Robin_Watts	mvrhel_laptop: Sounds like an omission to me.	16:50.37
mvrhel_laptop	ok	16:50.52
	I will add it	16:51.47
	Robin_Watts: the tiny minor commit is on my mupdf repos if you can take a 2 sec look	17:00.07
Robin_Watts	looking.	17:00.30
	Can't argue with that.	17:01.01
mvrhel_laptop	I hope not :)	17:01.10
	what day do you fly in to chicago?	17:01.24
	Robin_Watts: are you guys coming early for the show?	17:02.22
Robin_Watts	mvrhel_laptop: No, flying on wednesday.	17:03.14
	We have to stay til sunday night, so that's quite long enough away from home (or so I'm told).	17:03.35
	Who will deal with the spiders in my absence? :)	17:03.44
mvrhel_laptop	Robin_Watts: :)	17:03.50
Robin_Watts	mvrhel_laptop: I have tea.	17:04.05
mvrhel_laptop	Robin_Watts: I may try to meet up with you all for dinner on Wed. what time do you arrive?	17:04.05
	oh great. do you want anything like dark chocolate malt balls, or dark chocolate pb cups?	17:04.28
Robin_Watts	We land at 12:30	17:04.30
mvrhel_laptop	oh early	17:04.33
Robin_Watts	mvrhel_laptop: Definitely not!	17:04.39
mvrhel_laptop	I need to bring something	17:04.45
Robin_Watts	Still working off the holiday fat :(	17:04.52
	Helen loved the last ones but said "Tell Michael never to bring these again!"	17:05.09
mvrhel_laptop	they are my favorite	17:05.26
rayjj	tor8: Scott is on the phone and a potential customer wants to know if we support EPUB 4 OS	17:11.08
Robin_Watts	wtf is EPUB 4 OS ?	17:11.23
mvrhel_laptop	it's the one after 3	17:11.53
Robin_Watts	MuPDF supports epub v1.	17:12.06
	v2 is 'fixed format' rather than 'reflowable' so moving towards pdf. We don't support that properly.	17:12.32
rayjj	that's what I didn't know either. As far as I knew there is a v3 (that we can't handle)	17:12.58
Robin_Watts	hmm. Maybe we support v2, but not v3.	17:13.54
	Certainly I've never heard of a v4.	17:14.00
	Googling "EPUB 4 OS" is unhelpful.	17:14.13
	I suspect the customer is parroting something requested by their engineering team, and filtered through scott it's arriving corrupted.	17:14.53
	No idea where the corruption might be occurring.	17:15.02
rayjj	Robin_Watts: it's a .jp company, so that doesn't help	17:15.51
Robin_Watts	Get scott to ask them to forward him some links about what they want supported.	17:16.21
	and we can look at those.	17:16.30
rayjj	It might be a question about which OS's	17:19.16
henrys	I hope our implementation is compatible with epub 2, that's what we are saying to the world	17:19.50
Robin_Watts	and a link about the OS they want supported would make that clear.	17:19.53
rayjj	Scott is asking them. I had him list that we support Windows, unix/linux, Mac OS X, iOS and Android in case that answers their question	17:20.31
Robin_Watts	henrys: Right. I think I was wrong earlier. We have a v2 implementation, but we don't support the new stuff in v3.	17:20.32
henrys	yeah what we've said is 2 compatible and we'd add 3 features as time permits if they are sane	17:21.25
rayjj	Robin_Watts: that's what I recalled as well. But I also hadn't heard about v4. V3 was just released 2014, so it's a little soon for a v4 :-)	17:21.25
	henrys: Robin_Watts: thanks. That's what I told Scott. v2, not v3, but if they need v3 we'd have to discuss it (and get sample files)	17:22.29
henrys	sounds fine rayjj	17:22.46
nvs	Hi! Are there any mupdf guys here?	17:33.02
mvrhel_laptop	bbiaw	17:46.32