Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

	20160825
k-man	hi	02:23.58
ghostbot	Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.	02:23.58
k-man	is there a way to print from mupdf?	02:24.11
fontdebug	Hi, there, I'd like to get coordinates from glyphs/chars in PDF. Tried -sDEVICE=txtwrite -dTextFormat=4 with ghostscript 9.19. But bbox y values are the same, meaning bbox=flat(?)	08:45.52
	corr. -dTextFormat=0, of course	08:46.08
kens	txtwrite will give the co-ordinates if you request XML output.	08:46.15
	What version of Ghostscript are you using ?	08:46.23
fontdebug	9.19. My problem is the bbox y-values	08:46.41
kens	And can you put a copy of the PDF somewhere public so we can look at it ?	08:46.50
fontdebug	just a moment... let's see...	08:47.10
kens	DropBox or something is fine, as long as you don't mind the file being public	08:47.46
fontdebug	here is an example: https://www.zvdd.de/fileadmin/AGSDD-Redaktion/zvdd_MODS_Application_Profile_2.1.pdf	08:48.19
sebras	k-man: I think the mupdf app for Android can print through the cloud.	08:48.45
fontdebug	output of gs-9.19 is e. g. <char bbox="113 47 121 47" c="M"/>	08:48.54
kens	Give me a minute, are you looking at page 1 ?	08:49.06
fontdebug	yes, page 1	08:49.17
kens	Which font ? A lot of them are not embedded	08:49.31
fontdebug	oops... ABCDEE+Calibri	08:49.50
kens	OK Calibri is embedded	08:50.02
sebras	k-man: The other apps (e.g. the one for Linux and) don't support printing yet.	08:50.12
fontdebug	the <char bbox="113 47 121 47" c="M"/> is the first letter on the first page of the mentioned pdf.	08:51.19
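The flat box quoted above can be checked mechanically. This is only a minimal sketch: it assumes the four bbox numbers are ordered x0 y0 x1 y1 (which matches the complaint that the two y values are equal), and the helper names `parse_char` and `is_flat` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_char(line):
    """Parse one <char .../> element into (character, (x0, y0, x1, y1)).

    Assumes the bbox attribute holds four numbers in x0 y0 x1 y1 order.
    """
    elem = ET.fromstring(line)
    x0, y0, x1, y1 = (float(v) for v in elem.get("bbox").split())
    return elem.get("c"), (x0, y0, x1, y1)


def is_flat(bbox):
    """Return True when the box has no extent in the y direction."""
    _, y0, _, y1 = bbox
    return y0 == y1


# The element quoted in the log for the first letter on page 1.
char, bbox = parse_char('<char bbox="113 47 121 47" c="M"/>')
print(char, bbox, is_flat(bbox))
```

Running a check like this over a whole page of txtwrite output would show whether every glyph is affected or only some fonts.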
kens	Yeah I see that	08:51.33
	Finding out why might take a little longer....	08:52.00
fontdebug	I've also tried mupdf draw -F stext, but in mupdf chars below 32(dec) are put out as "?"	08:52.01
kens	Does MuPDF get the bbox correct ?	08:52.19
	The text output from MuPDF has generally had more work on it than the Ghostscript one	08:52.33
fontdebug	(in mupdf bbox seems correct)	08:53.15
	I have some rather obscure fonts in other PDFs where ghostscript says char is (correct), but I need proper coordinates to inspect glyphs.	08:54.25
kens	character codes below 32 are not unusual, especially with embedded subsets which often start from character code 1	08:55.02
fontdebug	Did a workaround by adding/subtracting a few pixels in y, but there seem to be cases where "the whole page" is a single glyph bbox...	08:56.24
kens	That could be possible, we've seen some pretty badly created fonts over the years	08:56.52
fontdebug	Will compare this with mutool. If mutool is correct, perhaps I get mutool to put out instead of "?"	08:57.34
	thanks so far.	09:03.10
kens	Can't say we've helped at all. At the moment I'm struggling to see where the bbox is set up, which is embarrassing since I wrote the code....	09:03.39
fontdebug	perhaps in devices/vector/gdevtxtw.c	09:06.34
kens	Well yes, since that's the txtwrite device.....	09:06.48
	I meant within the code path	09:07.10
fontdebug	uuh... there are a lot of start.y & end.y ...	09:08.34
kens	Indeed, and many of them are nothing to do with the glyph, they relate to the positions of text fragments	09:08.59
	Trying to piece together text out of a PDF file is a non-trivial task	09:09.15
fontdebug	PS: the y coordinates returned are slightly above the baseline	09:09.27
kens	Probably the y-co-ordinate is the starting position of the text, though I haven't checked	09:10.07
	OK the reason is that the font is a horizontal writing font, not vertical	09:10.41
	So we deal with horizontals, but not verticals.	09:10.54
	I believe that we don't currently have a decent way to get a proper glyph bounding box, which is why it's only reliable in the font's writing direction.	09:11.21
	So if you need an accurate BBox you are going to have to use MuPDF for now	09:11.37
fontdebug	Yes, because "putting text output together" is non-trivial, we've bought PDFlib TET, but I didn't know of these features of ghostscript+mutool before.	09:12.24
kens	Well, it's all very heuristic, but as I say the MuPDF one has had more attention than the Ghostscript one	09:12.58
	They don't share the same code base, or even the same approach, though. So it's possible that sometimes the Ghostscript one will perform better	09:13.30
	In any event, the reason the char bbox is 0 in the y direction is because it's not a vertical font. We might well change that at some point in the future, but for now that's the way it is. Better to stick with MuPDF I think.	09:16.01
	I'm sure you can change it to emit character codes < 32 if no Unicode code point information is available, and it's something maybe tor8 or Robin_Watts might consider anyway	09:16.56
fontdebug	in mupdf, pdf-unicode.c, approx. line 100: change font->cid_to_ucs[cpt] = '?' to font->cid_to_ucs[cpt] = cpt	09:39.27
	corr.: line 99 (mupdf-1.9a)	09:40.30
kens	I suspected it would be straight-forward	09:40.56
fontdebug	here the output of "hg difff", gzipped and base64'd: H4sIAD69vlcAA41OTY+CMBQ8t79ibrqp1RbBD4wu/oe9mQ3BQrGRUAL0ZPa/b0EP64FkD2/e5M3Mey83WoO3WG/kVYSRiORWorOuVcWqyfVQ3NVG2bxYKso5R7aaksnXzeHsSgQRpIwDEa9DBEJuwEQgBGWM4frfdChjufubThLwfbTYgXncI0koCHkMQIzGvOtbU5fdRTX998c4JdrWPT8pk6e9TZ16ajjCn00ra++uSbOyek8eKPfJouqKkUzumH3ODpS9rHiMdNLs29P8Mzw2gC/6C0O2Au97AQAA	09:41.36
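For anyone reading the log later, a blob described as "gzipped and base64'd" can be turned back into text with the Python standard library. This is a sketch only: `decode_patch` is a hypothetical helper name, and the round trip below uses a made-up sample string, not the blob from the log.

```python
import base64
import gzip


def decode_patch(blob):
    """Undo base64 encoding, then gzip compression, and return the text."""
    raw = base64.b64decode(blob)
    return gzip.decompress(raw).decode("utf-8", errors="replace")


# Round trip on a made-up sample to show that the two steps invert each other.
sample = "font->cid_to_ucs[cpt] = cpt;\n"
encoded = base64.b64encode(gzip.compress(sample.encode("utf-8"))).decode("ascii")
print(decode_patch(encoded), end="")
```

To read the actual patch, paste the string from the log line above as the argument to `decode_patch`.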
tor8	fontdebug: if you want the raw glyph positions, mutool draw -Ftrace (but you'll also need to apply the matrices to the coords)	09:42.51
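"Applying the matrices" here means running each raw coordinate through an affine transform. The sketch below assumes the usual PDF-style six-number matrix (a, b, c, d, e, f); the layout of the trace output itself is not shown in this log, so the matrices and points are made-up values and the function names are illustrative.

```python
def apply_matrix(m, x, y):
    """Transform point (x, y) by the affine matrix m = (a, b, c, d, e, f)."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def concat(m1, m2):
    """Return the single matrix equivalent to applying m1 first, then m2."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


# Made-up example: scale by 2, then translate by (10, 20).
scale = (2, 0, 0, 2, 0, 0)
translate = (1, 0, 0, 1, 10, 20)
combined = concat(scale, translate)
print(apply_matrix(combined, 3, 4))

# Made-up example: flip the y axis on a page assumed to be 792 units tall.
flip = (1, 0, 0, -1, 0, 792)
print(apply_matrix(flip, 100, 700))
```

When several matrices apply to one glyph, combining them once with `concat` and then transforming every point with the result avoids repeating the multiplication per point.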
kens	tor8 question for you on #artifex	09:44.26
fontdebug	did you mean me?	09:47.38
	(-kens)?	09:47.45
kens	Nope, I meant tor8	09:47.48
tor8	fontdebug: the 'cpt' there is essentially a random number with no known correlation to unicode, which is why we set it to '?'	10:02.59
kens	It's the character code ?	10:03.13
tor8	kens: It's the character code, but usually the glyph index for an Identity-H encoded font missing a ToUnicode cmap.	10:04.09
kens	Right	10:04.18
tor8	so using that for text extraction and searching would, well, it wouldn't be much better than '?' :)	10:04.35
kens	Agreed, but if it's what fontdebug wants.....	10:04.50
tor8	though we should probably be using U+FFFD (REPLACEMENT CHARACTER)	10:05.33
	if it's what he wants, he's free to hack his own source, just beware of the risks of false positives	10:06.10
fontdebug	Yes, U+FFFD seems a bit clearer than "?".	10:34.54
	or something like U+FF00+cpt	10:37.07
tor8	fontdebug: cpt may be >= 256 for multibyte encodings	10:41.19
	though that's obviously not the case for that particular bit of code	10:42.02
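The fallback choices discussed above can be put side by side in a small sketch. This is not MuPDF code: `fallback_codepoint` and the scheme names are hypothetical, and the input values are arbitrary character codes chosen for illustration.

```python
REPLACEMENT_CHARACTER = 0xFFFD  # U+FFFD, as suggested in the discussion


def fallback_codepoint(cpt, scheme="replacement"):
    """Pick a code point for a character code with no Unicode mapping."""
    if scheme == "question":
        return ord("?")          # the behaviour described for MuPDF 1.9a
    if scheme == "replacement":
        return REPLACEMENT_CHARACTER
    if scheme == "raw":
        return cpt               # fontdebug's local patch
    if scheme == "offset":
        return 0xFF00 + cpt      # fontdebug's U+FF00+cpt idea
    raise ValueError("unknown scheme: " + scheme)


for cpt in (0x01, 0x41, 0x100):
    row = [fallback_codepoint(cpt, s) for s in ("question", "replacement", "raw", "offset")]
    print(hex(cpt), [hex(v) for v in row])
```

The "offset" scheme shows the concern raised about larger codes: 0xFF00 + 0x100 is 0x10000, which is already outside the Basic Multilingual Plane, and for small codes the results land on real characters such as U+FF41 (FULLWIDTH LATIN SMALL LETTER A), so false positives in searches remain possible.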
fontdebug	bye.	11:20.54