Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

20160223
mvrhel_laptop	Robin_Watts: for the logs, there is a commit on my mupdf repo for you to review. It fixes some memory leaks in the pdf-device. When I was checking for leaks in my pdf_create branch I stumbled upon these in master.	00:19.43
Robin_Watts	let me look.	00:21.37
	looks good to me.	00:22.32
mvrhel_laptop	oh Thanks Robin_Watts did not expect you to be up	01:06.45
	I will push to golden then	01:06.52
halabund	Can Ghostscript convert everything in a PDF to RGB (even if it is CMYK)? Also, can it get rid of layers in the PDF (whatever those are)?	10:14.02
kens	It can certainly colour convert, it's in the documentation.	10:14.23
	As for layers, it depends what you mean by layers	10:14.31
halabund	My new colleagues don't know LaTeX, so I am forced to work in that abomination of MS Word and it keeps destroying some embedded PDFs. Google tells me that these two things may be the reason	10:14.33
kens	But basically Ghostscript will attempt to maintain the content of a PDF file when processing it. So probably not, but like I said, it depends what you mean by layers	10:15.17
	Oh of course you could always convert it into an EPS and embed that into MS Word	10:18.08
	But depending on the content of the PDF you may not be happy with the result	10:18.30
Robin_Watts	tor8: You here?	11:55.10
tor8	Robin_Watts: yes.	11:56.41
Robin_Watts	So, was pondering this shapy texty thing when running this morning.	11:56.56
tor8	okay.	11:57.24
Robin_Watts	At the moment we're managing to keep stuff fairly high level in the device interface.	11:57.40
	It would seem to be a bit of a shame if we lose the ability to pass high level text through the device interface.	11:58.17
	So, how would you feel about fz_text_spans gaining text direction information?	11:58.49
tor8	you mean the bidi direction?	11:59.08
Robin_Watts	yes, ish.	11:59.35
	For every piece of text, we potentially have some extra information.	11:59.56
	1) What language it's specified to be in in the source text.	12:00.06
	2) The direction given to it in the source text.	12:00.16
	3) The direction of the text (unset/l2r/r2l/number-in-r2l-context)	12:01.09
tor8	I am not entirely sold on the idea ... for PDF, XPS, etc we won't have any of this information set	12:01.18
Robin_Watts	tor8: And so for PDF/XPS we can ignore it.	12:01.30
	But for html (and other sources) we can pass it through.	12:01.50
tor8	which means you'll get varying results depending on what you do on text that looks identical but comes from different sources	12:01.52
Robin_Watts	tor8: Yes.	12:02.07
	but with the language information, text may not look identical, for instance.	12:03.01
	(once we get that hooked up to harfbuzz)	12:03.09
	We already have wmode as an int.	12:03.46
	If we change that to be a bitfield, then we get all the other stuff included for no extra size.	12:03.59
tor8	Robin_Watts: ahem. I just zapped the wmode (and put it back into the fz_font where it belongs)...	12:04.09
	so there's space for another field to take its place	12:04.15
Robin_Watts	ok.	12:04.20
tor8	on the other hand, if we make it into a bitfield, we could put the wmode and all other extra bits you want here back into it	12:04.47
Robin_Watts	We can also have a field in there for 'should be shaped' or not.	12:05.04
	so PDF can leave that blank.	12:05.23
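The bitfield idea discussed above (wmode, text direction, a 'should be shaped' flag and language sharing the space of one int) can be sketched roughly as follows. This is only a sketch under stated assumptions: the struct, the enum, the field widths and the 5-bits-per-letter language packing are all hypothetical, and are not MuPDF's actual fz_text layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; not taken from the MuPDF sources. */
enum text_dir {
	TEXT_DIR_UNSET = 0,
	TEXT_DIR_L2R = 1,
	TEXT_DIR_R2L = 2,
	TEXT_DIR_NUM_IN_R2L = 3	/* a number inside a right-to-left context */
};

struct span_flags {
	unsigned int wmode : 1;		/* 0 = horizontal, 1 = vertical writing */
	unsigned int dir : 2;		/* one of enum text_dir */
	unsigned int shaped : 1;	/* 1 = glyphs were positioned by a shaper */
	unsigned int lang : 15;		/* packed language code, 0 = unspecified */
};

/* Pack up to three lowercase letters ("en", "heb", ...) into 15 bits,
 * 5 bits per letter, with 0 meaning "no letter". Assumed scheme. */
static unsigned int pack_lang(const char *code)
{
	unsigned int packed = 0;
	int i;

	assert(code != NULL);
	for (i = 0; i < 3 && code[i] >= 'a' && code[i] <= 'z'; i++)
		packed |= (unsigned int)(code[i] - 'a' + 1) << (5 * i);
	return packed;
}

/* Build the flags for one span. A PDF or XPS source would pass
 * TEXT_DIR_UNSET, shaped = 0 and an empty language string. */
static struct span_flags make_flags(int wmode, enum text_dir dir,
	int shaped, const char *lang)
{
	struct span_flags f;

	f.wmode = wmode ? 1u : 0u;
	f.dir = (unsigned int)dir;
	f.shaped = shaped ? 1u : 0u;
	f.lang = pack_lang(lang);
	return f;
}
```

The fields add up to 19 bits, so on common compilers the struct should take no more room than the single int it would replace. A caller handling HTML input might write make_flags(0, TEXT_DIR_R2L, 0, "he"), while a PDF caller would leave everything unset.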
tor8	I was just being annoyed by passing around the wmode argument to all the text functions just because of XPS having IsSideways as an extra attribute not part of the font	12:05.32
Robin_Watts	and we can have a routine that takes an unshaped fz_text to a shaped one.	12:05.43
tor8	but I could change my mind if we want to pass around other extra bits of information	12:05.43
	hmmm, you mean stuffing raw unpositioned text into the fz_text and then shaping that to position it? not sure that belongs in there.	12:06.47
	maybe if we add the extra bits of info that pdf text objects have, like the leading and charspace etc	12:07.04
	so we can replicate the PDF text commands that go between BT and ET with the fz_text functions	12:07.29
	text layout interfaces are complicated ... I would like to keep the fz_text simple, as a plain container for already laid out text	12:08.28
	http://git.ghostscript.com/?p=user/tor/mupdf.git;a=blob;f=source/fitz/text.c;h=de4211cc8569eb61bcd30f9df7073e7cae43a5a5;hb=a1066e62b3337e3cb4c1108070f5f4b89d8fab3b#l99	12:09.17
	I'm pretty sure vertical text layout with that function is still "broken" -- we don't offset the origins using the metrics, etc	12:10.44
	but if you could stuff harfbuzz into that function, that's all I wanted to start with	12:11.11
	annotating the text spans with language and bidi levels; we could do that to let the text extraction device be smarter	12:11.38
	or rather, let it be dumber by reading that info instead of trying to guess	12:11.54
Robin_Watts	The problem is that if we have fz_text as a really dumb low-level "just put this text here" block of data, it means that text extraction etc or html-write or whatever has to work MUCH harder to extract the original information.	12:13.46
tor8	Robin_Watts: yeah, but it already needs to work that hard for PDF	12:14.18
Robin_Watts	Having something that carries high level information which the low level info can be easily obtained from covers both ends.	12:14.32
tor8	still, I can see the point of having some high level information in there about bidi at least would be useful	12:14.53
	seeing as we already carry along the unicode values	12:15.06
Robin_Watts	yes, it needs to work hard for PDF, but it would be nice to remove some of the guesswork for cases that we can get away with.	12:15.09
	yeah.	12:15.10
	The bidi stuff is enough that we can work backwards from the shaped stuff losslessly, I think.	12:15.42
tor8	then I'm okay with adding bidi levels; and simply make the pdf/xps guess the bidi info	12:16.40
	and then simplify the structured text bidi reversing stuff	12:17.00
	in fact, I'd be perfectly okay with starting over the structured text extraction from scratch :)	12:17.19
Robin_Watts	tor8: The text extraction falls loosely into 2 parts.	12:44.26
	There is the gluing of text fragments back into spans, and then the derivation of things like columns etc from those spans.	12:44.55
	I'm broadly happy with the approach we take for the first half of that problem (certainly it's better than things we've done before, cos it copes with text at an angle etc).	12:45.31
	but it could probably be improved a bit.	12:45.49
	The second half of the problem is a horrible nightmare though. I started it with good intentions and ended up just happy to get out alive.	12:46.21
tor8	Robin_Watts: yes, the first half is probably okay... it's the second half I'm having doubts about.	12:49.09
Robin_Watts	tor8: The second half is a horrible problem. I am absolutely sure that it's possible to do a better job.	12:49.43
	I'm also sure that it's a potential black hole for time.	12:49.59
	It feels like a university level research project to me.	12:50.28
	i.e. go away, and spend some time on it, and at the end of 3 years you might not have anything that works, but you should have enough stuff to write up a thesis on things you tried.	12:51.07
tor8	Robin_Watts: yes. not really something that belongs in a shipping product...	12:54.06
Robin_Watts	The only saving graces of the stuff we have is that 1) it kinda works, and 2) it's optional.	12:55.07
	You text extract, then you can call the analysis or not.	12:55.22
ediee	Hi	13:05.38
ghostbot	Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.	13:05.38
ediee	I want to know whether images can be extracted from a pdf page??	13:06.07
tor8	Robin_Watts: I wonder if (with the bidi flags added) we could skip the extraction step for stuff like search and copy&paste	13:06.23
kens	If by 'images' you mean bitmaps, then yes	13:06.26
ediee	ok	13:06.48
tor8	just get the fz_text objects and work from there. we'd still need to add the space insertion heuristics for pdf files that don't emit spaces.	13:06.56
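The space insertion heuristic tor8 mentions is not spelled out in the log. One common approach, shown here purely as an illustration, is to compare the gap between consecutive glyphs with the previous glyph's advance. The glyph record, the function name and the gap_ratio threshold are all assumptions, and the sketch only handles a single left-to-right line of ASCII text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical glyph record: character code, pen x position and
 * advance width, all in the same units. */
struct glyph {
	int ucs;
	float x;
	float advance;
};

/* Copy the glyphs of one left-to-right line into out as characters,
 * inserting a space wherever the gap to the previous glyph is wider
 * than gap_ratio times that glyph's advance. Output is truncated to
 * fit cap bytes. Returns the length written, excluding the
 * terminating NUL. */
static size_t extract_with_spaces(const struct glyph *g, size_t n,
	float gap_ratio, char *out, size_t cap)
{
	size_t len = 0;
	size_t i;

	assert(out != NULL && cap > 0);
	for (i = 0; i < n; i++) {
		/* Never add a space next to one the file already emitted. */
		if (i > 0 && g[i].ucs != ' ' && g[i - 1].ucs != ' ') {
			float gap = g[i].x - (g[i - 1].x + g[i - 1].advance);
			if (gap > gap_ratio * g[i - 1].advance && len + 1 < cap)
				out[len++] = ' ';
		}
		if (len + 1 < cap)
			out[len++] = (char)g[i].ucs;
	}
	out[len] = '\0';
	return len;
}
```

With glyphs for "a", "b", "c", "d" at x = 0, 5, 13, 18 and an advance of 5 each, a gap_ratio of 0.3 puts a space between "b" and "c" only. A real implementation would also have to cope with rotated text, right-to-left runs and kerning.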
ediee	In mupdf's reflow mode, why are the images not showing??	13:07.16
	In mupdf's reflow mode, why are the images not showing??	13:08.39
kens	tor8 Robin_Watts that question is for you	13:08.50
Robin_Watts	ediee: Dunno.	13:09.04
	Presumably this is on Android ?	13:09.32
ediee	yess...	13:09.39
Robin_Watts	Is it all images, or just specific ones?	13:09.48
ediee	In mupdf's reflow mode, if images exist in the pdf page then they are not shown	13:09.57
	all the images	13:10.05
	it shows only text	13:10.09
Robin_Watts	I would have expected jpegs to work. Others should be converted to PNGs and get shown too.	13:10.36
ediee	but how??	13:10.53
	because we won't get any images array to do that	13:11.11
kens	Are you writing an app ediee ?	13:11.40
ediee	yess... planning to do so	13:12.06
	but reflow mode is in dilemma	13:12.16
kens	http://www.bbc.co.uk/news/education-35631030 OK are you clear on the licensing terms	13:12.18
Robin_Watts	ediee: OK, so before we go any further, let's just check you understand the licensing terms.	13:12.24
kens	D'oh	13:12.25
Robin_Watts	MuPDF is released under 2 licences. You must use one of the licenses, or you can't distribute your app at all.	13:12.53
ediee	ok... i didnt know that	13:13.12
Robin_Watts	The first license is the GNU AGPL.	13:13.19
ediee	wat are the 2 licenses... ??	13:13.24
Robin_Watts	This is a free license. It says (basically) that you can use the code for free, but in exchange, you must be prepared to give away the source for your ENTIRE app to any end user of your app that asks for it.	13:14.09
	i.e. if fred bloggs gets your app, he gets the right to ask for the entire source code, which he can then pass on to anyone else he wants.	13:15.03
	So, most people writing commercial applications think that that's a non-starter.	13:15.24
	If you're writing a free app, then that may be fine though.	13:15.48
ediee	im writing a free app only..	13:16.09
Robin_Watts	And you're happy to give away the full source code too ?	13:16.24
ediee	and wat abt the second license?	13:17.09
Robin_Watts	(Some people write free apps that talk to their own specific services, so they are unhappy to give away the source code.)	13:17.15
	The second license is the Artifex Commercial license.	13:17.29
	This costs money, but in exchange you are freed from all the strictures of the GNU AGPL.	13:17.57
ediee	ok... wat are all the features I will get in v	13:18.43
	Commercial license	13:18.44
	ok... wat are all the features I will get in Commercial license??	13:18.58
Robin_Watts	ediee: Exactly the same code.	13:19.03
	Exactly the same features.	13:19.09
	Just you get to distribute it without having to abide by the terms of the GNU AGPL.	13:19.32
ediee	ok	13:19.42
	can I get solved with the reflow issue?	13:19.50
	wat i described previously?	13:19.57
Robin_Watts	ediee: Some commercial licenses come with support included.	13:20.17
ediee	means?	13:20.28
Robin_Watts	(or you can buy a separate support contract).	13:20.30
	ediee: We're generally a friendly bunch, and will (time permitting) help out where we can.	13:20.51
	Problems for commercial customers take priority of course.	13:21.06
ediee	ok	13:21.11
	so can u solve my problem?	13:21.18
	for reflow mode?	13:21.24
Robin_Watts	So, the way the reflow stuff works is that the page is run through the text extraction device.	13:21.36
	This gives us a set of structures at the end (lines of text on the page etc).	13:21.59
ediee	ok... but text extraction has no issues... the issue is with images	13:22.11
Robin_Watts	We then have some code that converts those structures back into HTML.	13:22.18
	And that's what the reflow code uses.	13:22.25
ediee	if the page has images like mathematical formulae, scientific notations, etc....	13:22.41
	they all wont be displayed in reflow mode	13:22.52
Robin_Watts	If you set a flag on the text extraction device then it will keep images as part of that text extraction process too.	13:22.54
	This did all work fine before.	13:23.07
ediee	set a flag?	13:23.16
	whr?	13:23.20
Robin_Watts	It's possible it's been broken and we haven't noticed it.	13:23.24
ediee	can u show some sample code?	13:23.25
Robin_Watts	ediee: Are you using our example MuPDF app as a basis?	13:23.51
kens	At this point, sharing an example file that does not work might be helpful	13:24.16
ediee	Robin : yesss	13:24.45
	kens : ok.. then can u plz share some links	13:24.56
	which i can refer	13:25.01
kens	No, I'm suggesting you share a file with us	13:25.12
Robin_Watts	ediee: OK, so in platform/android/jni/mupdf.c	13:25.27
ediee	ok	13:25.45
	robin : can u plz elaborate?	13:26.01
Robin_Watts	ediee: I'm telling you to load that into an editor.	13:26.46
	Then look for the JNI_FN(MuPDFCore_textAsHtml) function	13:27.03
	In there, you should see a call:	13:27.19
	dev = fz_new_stext_device(ctx, sheet, text);	13:27.27
	After that, try adding:	13:27.32
	fz_disable_device_hints(ctx, dev, FZ_IGNORE_IMAGE);	13:27.56
	That should tell the text extraction to stop ignoring images.	13:28.20
	Then try that out.	13:28.36
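The suggestion above relies on the text extraction device starting out with an "ignore images" hint that the caller then clears. How such a hint mask behaves can be modelled with a plain bitmask. The toy below is a stand-in only: its struct, function names and flag values are assumptions for illustration and are not copied from the MuPDF sources.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; names and values are assumptions, not MuPDF's. */
enum {
	HINT_IGNORE_IMAGE = 1,
	HINT_IGNORE_SHADE = 2
};

struct toy_device {
	int hints;	/* bitwise OR of the HINT_* flags currently set */
};

/* Set the given hint bits on the device. */
static void toy_enable_hints(struct toy_device *dev, int hints)
{
	assert(dev != NULL);
	dev->hints |= hints;
}

/* Clear the given hint bits, leaving all other hints untouched. */
static void toy_disable_hints(struct toy_device *dev, int hints)
{
	assert(dev != NULL);
	dev->hints &= ~hints;
}

/* The device keeps images only when the ignore-image hint is clear. */
static int toy_wants_images(const struct toy_device *dev)
{
	assert(dev != NULL);
	return (dev->hints & HINT_IGNORE_IMAGE) == 0;
}
```

In this model a device created with both hints set reports toy_wants_images() as 0. After toy_disable_hints(&dev, HINT_IGNORE_IMAGE) it reports 1, while the shade hint stays set.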
ediee	ok	13:29.07
	Robin : let me try and get back to u	13:29.17
	textAsHtml is used for reflow mode??	13:29.40
Robin_Watts	I believe so.	13:32.19
ediee	Robin : stop ignoring images means I assume that it should include image... right?	13:33.09
Robin_Watts	Yes.	13:33.18
ediee	Robin : what happens if the pdf page itself is an image.. for e.g., a scan copy...	13:34.58
Robin_Watts	ediee: Then reflow ain't gonna help much :)	13:38.28
ediee	ok... but it will display the page... i presume	13:38.45
Robin_Watts	ediee: Should do.	13:38.52
ediee	ok... :)	13:39.27
	Robin : let me try this	13:39.37
	Robin : it does not show images	13:42.07
	i have tried	13:42.10
	i think there is no img tag in JNI_FN(MuPDFCore_textAsHtml)	13:42.34
	there we write all the html	13:42.42
Robin_Watts	fz_print_stext_page_html(ctx, out, text) knows how to write img tags.	13:42.58
	OK, so presumably you are either on a windows or a linux box ?	13:43.30
ediee	linux	13:43.43
Robin_Watts	OK, so build "mutool" for linux.	13:43.56
ediee	but I want so file... to include in my android app	13:44.23
Robin_Watts	Should be as easy as doing "make build=debug" in the top level.	13:44.30
	ediee: Yes, I know what you want, this is a test.	13:44.54
ediee	how to build mutool	13:45.30
	?	13:45.31
Robin_Watts	Should be as easy as doing "make build=debug" in the top level.	13:45.47
	Once you've built that, run: mutool draw -o out.html in.pdf	13:46.09
ediee	ok	13:46.09
Robin_Watts	and then hopefully there should be images in the out.html file.	13:46.37
ediee	Robin : ok let me check	13:47.00
	im getting fatal error while doing "make build=debug"	13:48.13
	error is : fatal error: X11/Xcursor/Xcursor.h: No such file or directory	13:48.18
Robin_Watts	make build=debug HAVE_X11=no	13:49.12
HenryStiles	kens: I meant to tell you sometime ago there isn't intended to be a "set" in pjl. The only way to set something is through the language. I wanted to keep that as is. Do you need that for some reason, it looked like you just added it for completeness.	13:52.16
kens	I don't remember adding a SET, is this the C code ?	13:52.53
	Because as I recall it only works with DEFAULT	13:53.11
ediee	Robin : I am not able to use mutool draw	13:55.21
Robin_Watts	ediee: Why not?	13:55.45
ediee	i dont know	13:55.58
	the draw option is not there	13:56.05
Robin_Watts	ediee: What version of mupdf are you using?	13:56.18
ediee	1.8	13:56.24
Robin_Watts	Do you have build/debug/mudraw ?	13:57.03
	(You were running build/debug/mutool draw, right?)	13:57.28
HenryStiles	kens: you added pjl_set_envvar and pjl_set_defvar, no?	13:57.57
kens	Err probably	13:58.09
	And yes, I think I added set_defvar for completeness	13:58.37
	Also possibly because there was a C warning, but I'm unsure of that now. If it's a problem then you can pull it back out	13:59.00
HenryStiles	kens: yeah just verifying you didn't need it for something with PDF/A	13:59.46
ediee	Robin : yes, it works now	13:59.57
	im checking the output	14:00.08
	Robin : no, the output does not look like the pdf page	14:01.49
Robin_Watts	ediee: That's not what I asked.	14:02.07
	What I asked was "are there images in the output" ?	14:02.13
ediee	yess.. there is images in the output	14:02.33
Robin_Watts	ediee: Right.	14:02.47
kens	HenryStiles : If I need it I'd be calling it, so removing them will stop it compiling :-)	14:03.01
ediee	Robin : but the page is not in the format that the original pdf has	14:03.44
	?	14:03.46
Robin_Watts	ediee: So, if you've done the alteration to mupdf.c as I described above, and rebuilt correctly, then there will be images in the page that is sent to the webview for reflow.	14:04.13
	The layout not being correct is an entirely different question :)	14:04.32
ediee	ok... now wat abt the layout??	14:04.59
	its getting different	14:05.03
Robin_Watts	ediee: Well, I can't comment on that without seeing an example file.	14:05.14
	And even then, this is likely to be something that will require me to invest some time into looking at it.	14:05.38
ediee	Robin : ok	14:07.20
	I will try	14:07.26
	and let u knw	14:07.30
	thanks you for ur support	14:08.51
	will u be available tomorrow?	14:08.59
Robin_Watts	ediee: I will be here tomorrow, yes.	14:11.12
ediee	ok let me try today.. I will chat with u tomorrow abt today's progress	14:11.45
inarus	Hi	15:06.44
ghostbot	Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.	15:06.44
inarus	I need to concatenate PDFs. I am working for a company. Can I use the publicly available software or do I need the commercial one?	15:10.27
kens	Which software are you referring to ?	15:10.50
	In either case (MuPDF, Ghostscript) the software is provided under the terms of the GNU AGPL, provided you abide by the terms of the licence you can use it. Otherwise you need a commercial licence.	15:11.40
	Please note that if you are referring to Ghostscript it does NOT concatenate PDF files.	15:12.00
inarus	Ok. My bad, I read some Web pages explaining how to concatenate pdf files with ghostscript. I must have misunderstood	15:14.12
kens	Many people think that Ghostscript concatenates PDF files, it does not. It interprets the input and can create a new PDF file which is visually the same as the input(s). However, the actual contents of the PDF files are not reflected in the output, so it is not concatenating the files.	15:15.24
inarus	Do you mean that content and/or formating could be missed?	15:19.37
kens	The visual appearance should be the same. Metadata may not be carried over and the internal representation will not be the same	15:20.06
Robin_Watts	inarus: Stuff like Outlines or Annotations etc	15:20.23
kens	No, Outlines and Annotations are preserved	15:20.33
Robin_Watts	kens: Stuff like Outlines and Annotations :)	15:21.00
kens	But the Creator won't be, nor will some other elements, and the fonts may be described differently, the character codes could be different, images may be compressed differently etc	15:21.07
inarus	Ok I get it. That might be a major issue for me, thank you	15:22.21
kens	NP	15:22.27
rayjj	inarus: the logs may not have caught up, but for most purposes, gs can combine PDFs into a single PDF. Links that specify a page number may be a problem (kens can address that)	15:27.12
	kens: does pdfwrite adjust the page number destination in links for PDFs after the first input ?	15:28.00
kens	Up to a point yes	15:28.37
tor8	Robin_Watts: a bunch of commits on tor/master for review. sebras' stuff is LGTM but a second pair of eyes wouldn't hurt.	21:34.40
marcosw	HenryStiles: ping	23:06.44
HenryStiles	marcosw: hi	23:38.15