IRC Logs

Log of #ghostscript at irc.freenode.net.

2016/08/25
k-man hi02:23.58 
ghostbot Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.02:23.58 
k-man is there a way to print from mupdf?02:24.11 
fontdebug Hi, there, I'd like to get coordinates from glyphs/chars in PDF. Tried -sDEVICE=txtwrite -dTextFormat=4 with ghostscript 9.19. But bbox y values are the same, meaning bbox=flat(?)08:45.52 
  corr. -dTextFormat=0, of course08:46.08 
kens txtwrite will give the co-ordinates, if you request XML output,08:46.15 
  What version of Ghostscript are you using ?08:46.23 
fontdebug 9.19. My problem is the bbox y-values08:46.41 
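For reference, the txtwrite invocation discussed above can be sketched as a small Python wrapper. The `-sDEVICE=txtwrite` and `-dTextFormat=0` switches come from the log; the remaining switches (`-q`, `-dNOPAUSE`, `-dBATCH`, `-dFirstPage`, `-dLastPage`, `-sOutputFile=-`) are standard Ghostscript options chosen here for illustration, and the function names and the `gs` executable name are assumptions.

```python
import shutil
import subprocess


def txtwrite_command(pdf_path, first_page=1, last_page=1, gs="gs"):
    """Build a Ghostscript command line that dumps per-character text info.

    The executable name "gs" is an assumption; on Windows it is usually
    gswin64c or similar.
    """
    return [
        gs,
        "-q",                      # suppress startup banner
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=txtwrite",
        "-dTextFormat=0",          # XML-like output with a bbox per character
        f"-dFirstPage={first_page}",
        f"-dLastPage={last_page}",
        "-sOutputFile=-",          # write the result to stdout
        pdf_path,
    ]


def run_txtwrite(pdf_path, **kwargs):
    """Run Ghostscript and return its stdout as text (needs gs installed)."""
    cmd = txtwrite_command(pdf_path, **kwargs)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"Ghostscript executable not found: {cmd[0]}")
    result = subprocess.run(
        cmd, check=True, capture_output=True, text=True, encoding="utf-8"
    )
    return result.stdout
```

Calling `run_txtwrite("input.pdf")` would return the page 1 output as a string, provided Ghostscript is on the PATH.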
kens And can you put a copy of the PDF somewhere public so we can look at it ?08:46.50 
fontdebug just a moment... let's see..08:47.10 
kens DropBox or something is fine, as long as you don't mind the file being public08:47.46 
fontdebug here is an example: https://www.zvdd.de/fileadmin/AGSDD-Redaktion/zvdd_MODS_Application_Profile_2.1.pdf08:48.19 
sebras k-man: I think the mupdf app for Android can print through the cloud.08:48.45 
fontdebug output of gs-9.19 is e. g. <char bbox="113 47 121 47" c="M"/>08:48.54 
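A minimal sketch of how such a line could be checked for a flat bounding box. It assumes the four bbox numbers are ordered x0 y0 x1 y1 (which fits the sample, where only the x values differ) and that they are integers, as in the sample; the function name and the returned dictionary layout are made up for illustration.

```python
import re

# Matches lines shaped like the sample: <char bbox="113 47 121 47" c="M"/>
CHAR_RE = re.compile(
    r'<char bbox="(-?\d+) (-?\d+) (-?\d+) (-?\d+)" c="([^"]*)"\s*/>'
)


def parse_char(line):
    """Return bbox, character and extents for one <char .../> line, or None."""
    match = CHAR_RE.search(line)
    if match is None:
        return None
    x0, y0, x1, y1 = (int(match.group(i)) for i in range(1, 5))
    return {
        "bbox": (x0, y0, x1, y1),
        # Kept as the raw attribute text, so an entity such as &#x1; stays visible.
        "char": match.group(5),
        "width": x1 - x0,
        "height": y1 - y0,
    }


info = parse_char('<char bbox="113 47 121 47" c="M"/>')
print(info["width"], info["height"])
```

A height of 0 marks exactly the degenerate case described here.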
kens Give me a minute, are you looking at apge 1 ?08:49.06 
  page*08:49.10 
fontdebug yes, page 108:49.17 
kens Which font ? A lot of them are not embedded08:49.31 
fontdebug oops... ABCDEE+Calibri08:49.50 
kens OK Calibri is embedded08:50.02 
sebras k-man: The other apps (e.g. the one for Linux and) don't support printing yet.08:50.12 
fontdebug the <char bbox="113 47 121 47" c="M"/> is the first letter on the first page of the mentioned pdf.08:51.19 
kens Yeah I see that08:51.33 
  Finding out why might take a little longer....08:52.00 
fontdebug I've also tried mupdf draw -F stext, but in mupdf chars below 32(dec) are put out as "?"08:52.01 
kens Does MuPDF get the bbox correct ?08:52.19 
  The text output from MuPDF has generally had more work on it than the Ghostscript one08:52.33 
fontdebug (in mupdf bbox seems correct)08:53.15 
  I have some rather obscure fonts in other PDFs where ghostscript says char is &#x1; (correct), but I need proper coordinates to inspect glyphs.08:54.25 
kens character codes below 32 are not unusual, especially with embedded subsets which often start from character code 108:55.02 
fontdebug Did a workaround by adding/subtracting a few pixels in y, but there seem to be cases where "the whole page" is a single glyph bbox...08:56.24 
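The workaround mentioned here might look roughly like the following. The padding amount, the 90% threshold and both function names are assumptions for illustration; the log does not say which values were actually used.

```python
def pad_flat_bbox(bbox, pad=3):
    """Give a zero-height bbox some vertical extent by padding both sides.

    bbox is (x0, y0, x1, y1). Boxes that already have height are returned
    unchanged. The padding is symmetric because the log does not establish
    which way the y axis points.
    """
    x0, y0, x1, y1 = bbox
    if y0 != y1:
        return bbox
    return (x0, y0 - pad, x1, y1 + pad)


def covers_page(bbox, page_width, page_height, fraction=0.9):
    """Flag a bbox that spans most of the page in both directions."""
    x0, y0, x1, y1 = bbox
    return (
        abs(x1 - x0) >= fraction * page_width
        and abs(y1 - y0) >= fraction * page_height
    )
```

Boxes flagged by `covers_page` would need separate handling, since padding does nothing useful for them.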
kens That could be possible, we've seen some pretty badly created fonts over the years08:56.52 
fontdebug Will compare this with mutool. If mutool is correct, perhaps I get mutool to put out &#x1; instead of "?"08:57.34 
  thanks so far.09:03.10 
kens Can't say we've helped at all. At the moment I'm struggling to see where the bbox is set up, which is embarrassing since I wrote the code....09:03.39 
fontdebug perhaps in devices/vector/gdevtxtw.c09:06.34 
kens Well yes, since that's the txtwrite device.....09:06.48 
  I meant within the code path09:07.10 
fontdebug uuh... there are a lot of start.y & end.y ...09:08.34 
kens Indeed, and many of them are nothing to do with the glyph, they relate to the positions of text fragments09:08.59 
  Trying to piece together text out of a PDF file is a non-trivial task09:09.15 
fontdebug PS: the y coordinates returned are slightly above the baseline09:09.27 
kens Probably the y-co-ordinate is the starting position of the text, though I haven't checked09:10.07 
  OK the reason is that the font is a horizontal writing font, not vertical09:10.41 
  So we deal with horizontals, but not verticals.09:10.54 
  I believe that we don't currently have a decent way to get a proper glyph bounding box which is why it's only reliable in the font writing direction.09:11.21 
  So if you need an accurate BBox you are going to have to use MuPDF for now09:11.37 
fontdebug Yes, because "putting text output together" is non-trivial, we've bought PDFlib TET, but I didn't know about these features of ghostscript+mutool before.09:12.24 
kens Well, it's all very heuristic, but as I say the MuPDF one has had more attention than the Ghostscript one09:12.58 
  They don't share the same code base, or even the same approach, though. So it's possible that sometimes the Ghostscript one will perform better09:13.30 
  In any event, the reason the char bbox is 0 in the y direction is because it's not a vertical font. We might well change that at some point in the future, but for now that's the way it is. Better to stick with MuPDF I think.09:16.01 
  I'm sure you can change it to emit character codes < 32 if no Unicode code point information is available, and it's something maybe tor8 or Robin_Watts might consider anyway09:16.56 
fontdebug in mupdf, pdf-unicode.c, approx. line 100: change font->cid_to_ucs[cpt] = '?' to font->cid_to_ucs[cpt] = cpt09:39.27 
  corr.: line 99 (mupdf-1.9a)09:40.30 
kens I suspected it would be straight-forward09:40.56 
fontdebug here the output of "hg difff", gzipped and base64'd: H4sIAD69vlcAA41OTY+CMBQ8t79ibrqp1RbBD4wu/oe9mQ3BQrGRUAL0ZPa/b0EP64FkD2/e5M3Mey83WoO3WG/kVYSRiORWorOuVcWqyfVQ3NVG2bxYKso5R7aaksnXzeHsSgQRpIwDEa9DBEJuwEQgBGWM4frfdChjufubThLwfbTYgXncI0koCHkMQIzGvOtbU5fdRTX998c4JdrWPT8pk6e9TZ16ajjCn00ra++uSbOyek8eKPfJouqKkUzumH3ODpS9rHiMdNLs29P8Mzw2gC/6C0O2Au97AQAA09:41.36 
tor8 fontdebug: if you want the raw glyph positions, mutool draw -Ftrace (but you'll also need to apply the matrices to the coords)09:42.51 
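Applying a matrix to the coordinates, as suggested here, amounts to a 2D affine transform. The sketch below assumes the usual PDF/MuPDF six-number form (a b c d e f); how the matrices and glyph positions are laid out in the trace output is not shown in this log, so reading them is left out, and the function names are illustrative.

```python
def transform_point(matrix, x, y):
    """Apply a six-number affine matrix (a, b, c, d, e, f) to a point."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


def concat(first, second):
    """Combine two matrices so that `first` is applied before `second`."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = second
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


# Example: scale by 2, then translate by (10, 20).
scale = (2, 0, 0, 2, 0, 0)
translate = (1, 0, 0, 1, 10, 20)
print(transform_point(concat(scale, translate), 3, 4))
```

When several matrices apply to a glyph, they can be folded into one with `concat` before transforming each point.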
kens tor8 question for you on #artifex09:44.26 
fontdebug did you mean me?09:47.38 
  (-kens)?09:47.45 
kens Nope, I meant tor809:47.48 
tor8 fontdebug: the 'cpt' there is essentially a random number with no known correlation to unicode, which is why we set it to '?'10:02.59 
kens It's the character code ?10:03.13 
tor8 kens: It's the character code, but usually the glyph index for an Identity-H encoded font missing a ToUnicode cmap.10:04.09 
kens Right10:04.18 
tor8 so using that for text extraction and searching would, well, it wouldn't be much better than '?' :)10:04.35 
kens Agreed, but if its what fontdebug wants.....10:04.50 
tor8 though we should probably be using U+FFFD (REPLACEMENT CHARACTER)10:05.33 
  if it's what he wants, he's free to hack his own source, just beware of the risks of false positives10:06.10 
fontdebug Yes, U+FFFD seems a bit clearer than "?".10:34.54 
  or something like U+FF00+cpt10:37.07 
tor8 fontdebug: cpt may be >= 256 for multibyte encodings10:41.19 
  though that's obviously not the case for that particular bit of code10:42.02 
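The three fallback options discussed above can be compared with a short sketch. This is an illustration in Python, not MuPDF's code; the strategy names and the function are made up here.

```python
import unicodedata

REPLACEMENT_CHARACTER = "\ufffd"  # U+FFFD


def fallback_char(code, strategy="replacement"):
    """Pick a stand-in character for a code with no Unicode mapping.

    "replacement": always U+FFFD, as tor8 suggests.
    "raw":         the code itself, as in fontdebug's local patch.
    "offset":      U+FF00 + code, as floated by fontdebug.
    """
    if strategy == "replacement":
        return REPLACEMENT_CHARACTER
    if strategy == "raw":
        return chr(code)
    if strategy == "offset":
        return chr(0xFF00 + code)
    raise ValueError(f"unknown strategy: {strategy}")


# The offset scheme lands on characters that already have a meaning:
print(unicodedata.name(fallback_char(1, "offset")))
# and a code of 256 or more leaves the U+FFxx range entirely:
print(hex(ord(fallback_char(0x100, "offset"))))
```

U+FF01 is FULLWIDTH EXCLAMATION MARK, so the offset variant can collide with real text, which is the false-positive risk raised in the discussion.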
fontdebug bye.11:20.54 