Log of #mupdf at irc.freenode.net.

2020/10/30
pedr0 hi all - I am getting this error when compiling 1.18: error: call of overloaded ‘abs(float)’ is ambiguous [09:49.28]
  https://pastebin.pl/view/52294434 [09:50.05]
  Is there any new dependency needed which wasn't needed prior to 1.18 ? [09:50.39]
kens pedr0: there isn't anyone around who can answer that right at the moment, either stick around a bit or check back later in the logs [09:53.26]
pedr0 Thanks. I think this has something to do with the CPP bindings which have been introduced, is there a way not to compile them at all ? [09:57.42]
  Did this chap here get its way into the 1.18 release ? It does not look like that looking at the bug tracker [09:58.40]
  https://bugs.ghostscript.com/show_bug.cgi?id=702715 [09:58.40]
kens pedr0: sorry I'm not one of the MuPDF developers and I can't answer any of your questions so far :-( [10:00.48]
  There will be developers along in the next few hours who will be better placed to help [10:01.12]
pedr0 Yes, no problem at all, I'm just leaving a few lines here so when they log in they can answer, or tell me that the questions are terribly silly :-) [10:02.27]
kens you'll want either sebras (who is in Taiwan at the moment), ator (who is a late starter) or Robin_Watts_ (who is probably off shooting at clay things right now). One or more of them should be around 'soon'. I'm just filling in to say 'don't give up!' if anyone asks questions :-) [10:04.26]
Robin_Watts_ pedr0: That commit did make it to 1.18.0, yes. [10:05.29]
kens is surprised to see Robin_Watts_ [10:05.39]
Robin_Watts_ Too windy to shoot clays this morning, apparently :) [10:05.52]
kens Oh! never thought of that.... [10:06.14]
Robin_Watts_ pedr0: 1.18.0 includes tesseract and leptonica as dependencies. [10:07.50]
pedr0 ok, thanks. Are you guys scraping text from images or anything of that sort ? [10:16.39]
  I did not know MULIB had such capabilities [10:16.51]
  I still get the same error: [10:17.47]
  usr/src/app/w/src/TextFilter.h: In member function ‘bool Character::operator==(const Character&) const’: [10:17.49]
  if( abs( this->m_x0 - a.m_x0 ) > 1) [10:17.49]
  ^ [10:17.49]
  In file included from /usr/include/c++/6/cstdlib:75:0, [10:17.49]
kens All new :-) [10:17.49]
pedr0 from /usr/include/c++/6/stdlib.h:36, [10:17.52]
  from /usr/local/include/mupdf/memento.h:187, [10:17.54]
  from /usr/local/include/mupdf/fitz/system.h:36, [10:17.56]
  from /usr/local/include/mupdf/fitz.h:10, [10:17.58]
  from /usr/src/app/w/src/TextFilter.h:8, [10:18.00]
  from /usr/src/app/w/src/TextFilter.cpp:14: [10:18.02]
  extern int abs (int __x) __THROW __attribute__ ((__const__)) __wur; [10:18.04]
  ^~~ [10:18.06]
  That's what I've installed: tesseract-ocr leptonica-progs [10:18.28]
  I've the feeling this has to do with the C++ STDLIB I've installed on my system. What do you compile this stuff with ? g++ ? [10:18.59]
sh4rm4^bnc use std::abs in your C++ code [10:19.00]
pedr0 that's not my code tough :) [10:20.24]
  *though* [10:20.32]
sh4rm4^bnc oh, mupdf uses C++ now too ? eek [10:20.44]
kens MuPDF does not, tesseract does [10:21.01]
  If you want to build Leptonica and Tesseract you need to use C++, same goes for Harfbuzz [10:21.21]
sh4rm4^bnc *harfbutt [10:21.36]
kens laughs [10:21.48]
sh4rm4^bnc oh lol leptonica is C++ now too? wtf [10:21.55]
kens But AIUI you *ought* to be able to write pure C code and link it, however this is distinctly not my field [10:22.15]
  I'm not certain about Leptonica [10:22.25]
  but Tesseract needs it and so they got pulled in together [10:22.40]
sh4rm4^bnc the versions i used used to be in C [10:22.44]
kens It's possible it still is [10:23.06]
sh4rm4^bnc but dan bloomberg used to stuff so many example programs into his tarball that it went from 4MB to 10MB in a year so i stopped updating it [10:23.19]
kens Robin_Watts_ did the integration for both MuPDF and Ghostscript, so I'm hazy on the details. I tend to lump the two together as they arrived together [10:23.43]
sh4rm4^bnc <pedr0> if( abs( this->m_x0 - a.m_x0 ) > 1) [10:24.07]
  anyway the fix is to make this std::abs [10:24.15]
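A minimal sketch of sh4rm4's fix, for anyone landing here with the same error. Only `Character::operator==`, `m_x0` and the `> 1` test come from the log; the rest of the class and the helpers `exceeds_tolerance_truncated`/`exceeds_tolerance_fixed` are hypothetical. A likely explanation of the "it compiled before" puzzle (an assumption, not confirmed in the channel): with older GCC, `<stdlib.h>` was the plain C header, so the unqualified call resolved to `int abs(int)` and silently truncated the float; GCC 6's libstdc++ ships its own `<stdlib.h>` wrapper (note `/usr/include/c++/6/stdlib.h` in the trace, reached here via `mupdf/memento.h`) that also exposes the `long` and `long long` overloads, so `abs(float)` has no single best match.

```cpp
#include <cassert>
#include <cmath>    // std::abs overloads for float/double/long double
#include <cstdlib>  // std::abs overloads for int/long/long long

// Hypothetical reconstruction: only m_x0, operator== and the "> 1" test
// appear in the log; the real class presumably has more members.
struct Character {
    float m_x0;

    bool operator==(const Character& a) const {
        // Qualified std::abs picks the exact float overload: no ambiguity,
        // and the fractional part of the difference is kept.
        if (std::abs(this->m_x0 - a.m_x0) > 1)
            return false;
        return true;  // comparisons of any other fields omitted
    }
};

// What the unqualified call amounted to when only the C declaration
// "int abs(int)" was visible: the float is truncated toward zero first.
bool exceeds_tolerance_truncated(float diff) {
    return std::abs(static_cast<int>(diff)) > 1;
}

// Behaviour after switching to std::abs.
bool exceeds_tolerance_fixed(float diff) {
    return std::abs(diff) > 1;
}
```

With a difference of `1.9f` the truncating version reports "within tolerance" (1 is not greater than 1) while the fixed one does not, so the fix can change which characters compare equal, not just silence the compiler.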
kens IIRC there's a build option (possibly --without-tesseract) which won't try to build them in, or you can just delete the leptonica and tesseract directories [10:24.58]
  Obviously that means the OCR stuff won't be built either [10:25.18]
ator pedr0: um, TextFilter.cpp is not a file that's part of mupdf... [10:25.33]
  harfbuzz is implemented in C++ but has no dependency on the C++ standard library, and exposes a pure C api. [10:26.17]
  leptonica is still C [10:26.30]
sh4rm4^bnc <3 [10:26.35]
ator tesseract is icky C++ though [10:26.39]
  neither tesseract nor leptonica should build by default, you need "make tesseract=yes" to enable it [10:27.16]
kens Morning ator [10:27.18]
ator kens: Morning! [10:27.46]
kens I'm glad you and Robin are here, I can stop talking cautiously about stuff I don't understand :-) [10:28.14]
ator thanks for holding down the fort! [10:29.19]
  or however that saying goes [10:29.28]
kens Sounds right to me :-) [10:29.57]
pedr0 How can I take advantage of the new features that use tesseract/leptonica ? What puzzles me is that I've a system where it compiles no problem, and another system, based on debian 9, which does not. [10:30.42]
sh4rm4^bnc pedr0, i think it depends on the GCC version used [10:31.38]
  i have a program that compiled fine with GCC 4.7.4, but threw the exact same abs error with GCC 6.5.0 [10:32.16]
ator pedr0: exactly what are you building? the error you show is in a file that is NOT a part of any mupdf or thirdparty sources shipped with mupdf. [10:32.17]
  pedr0: to take advantage of tesseract, you pass "tesseract=yes" as an argument to make [10:33.29]
pedr0 I just run make [10:37.54]
  hang on [10:38.15]
  am I a complete fool ? [10:38.28]
sh4rm4^bnc so where's usr/src/app/w/src/TextFilter.h from? [10:39.10]
pedr0 The reason why I thought it was is that the only thing that I've done is to upgrade the MULIB library version [10:39.23]
  Yeah it was my application source [10:42.40]
  what an idiot, why on earth was this not a problem prior to the upgrade of the release .. I do not know. [10:43.03]
  Sorry everybody for bothering such a thing. [10:43.28]
  *for such a thing [10:43.35]
  my English is leaving me [10:43.41]
sh4rm4^bnc take it easy [10:43.59]
pedr0 :-) [10:44.13]
  I'm quite interested in the new OCR/tesseract feature, is it documented somewhere ? [10:44.24]
kens Hmm documentation... Novel concept :-) [10:45.43]
sh4rm4^bnc hehe [10:45.55]
kens It's probably in the release notes [10:46.03]
pedr0 :-) [10:49.11]
  https://bugs.ghostscript.com/show_bug.cgi?id=702715 [10:49.12]
  I reckon that is still on the master branch and it hasn't been released as yet [10:49.36]
Robin_Watts_ pedr0: As I said earlier, that DID make it to 1.18.0 [10:49.55]
pedr0 Oh. Sorry, trying to do too many things at the same time. [10:51.50]
  Yeah the release note mentions that 'api: Optional use of Tesseract to use OCR to extract text.' Is there an example anywhere ? [11:11.58]
ator pedr0: grep for fz_new_ocr_device [11:14.24]
pedr0 Thanks [11:15.30]
Robin_Watts_ pedr0: Are you calling at the C level? Or are you calling mutool ? [11:44.16]
pedr0 I am interested in both really [11:45.17]
  but I am used to navigating through the device's definition in the C files to read the documentation there, which is generally good [11:45.48]
  can I OCR a file from mutool as well ? [11:45.57]
Robin_Watts_ ok, so at the mutool level, formats with .ocr in them will use the OCR stuff. [11:46.10]
  mutool draw -o out.ocr.txt -r200 in.pdf [11:46.28]
pedr0 Sorry, maybe I am getting confused. Does that mean that simply using the suffix 'ocr' will cause the program to try to OCR the images within a given PDF ? [11:47.58]
Robin_Watts_ pedr0: Not quite. [11:48.15]
  We look at the suffix to the file given on -o to guess a format. [11:48.31]
  or you can specify a format using -F. [11:48.40]
  And certain formats, namely: ocr.txt, ocr.html, ocr.xhtml, ocr.stext will trigger the use of ocr. [11:49.06]
  Also ocr.pdf will trigger the use of bitmap-wrapped-as-pdf with OCR. [11:49.28]
  see the usage message for mutool. [11:49.46]
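To make Robin's description concrete, here is a hypothetical C++ sketch (not mutool's actual source) of deciding whether OCR kicks in from the `-o` file name, with an explicit `-F` value taking precedence. The suffix list is taken from the log; treating `-F ocr.txt` as equivalent to an `.ocr.txt` suffix is an assumption, and the names `OcrMode`, `ends_with` and `ocr_mode_for` are made up for illustration.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical illustration only: NOT mutool's source, just the rule
// described in the channel expressed as code.
enum class OcrMode {
    None,        // ordinary output, no OCR
    TextOutput,  // ocr.txt, ocr.html, ocr.xhtml, ocr.stext
    OcrPdf       // ocr.pdf: page bitmap wrapped as PDF, plus OCR
};

bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// output_name: the -o argument; format_flag: the -F argument ("" if absent).
// Assumption: an explicit -F value overrides the guess from the suffix.
OcrMode ocr_mode_for(const std::string& output_name,
                     const std::string& format_flag = "") {
    const std::string target =
        format_flag.empty() ? output_name : "." + format_flag;
    for (const char* text_suffix :
         {".ocr.txt", ".ocr.html", ".ocr.xhtml", ".ocr.stext"})
        if (ends_with(target, text_suffix))
            return OcrMode::TextOutput;
    if (ends_with(target, ".ocr.pdf"))
        return OcrMode::OcrPdf;
    return OcrMode::None;
}
```

Under this reading, `mutool draw -o out.ocr.txt -r200 in.pdf` falls into the text-output case, while a plain `out.txt` does no OCR at all.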
  You'll need traineddata for tesseract to use. [11:50.24]
  like eng.traineddata from here: https://github.com/tesseract-ocr/tessdata_fast [11:50.50]
  and you can specify the language(s) to use by using "-t eng" or "-t eng+ara" etc. [11:51.17]
  default is "eng". [11:51.34]
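A small hypothetical helper showing how a `-t` value maps onto the traineddata files you would need to download: languages are joined with `+`, and an empty value falls back to the default `eng`. The `<lang>.traineddata` naming follows the `eng.traineddata` example above; the function name `traineddata_files` is made up, and where tesseract actually looks for those files is not covered in the log.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of mutool): turn a "-t" language string such
// as "eng+ara" into the traineddata file names it requires.
std::vector<std::string> traineddata_files(const std::string& langs_arg) {
    // Per the log, the default language is "eng".
    const std::string langs = langs_arg.empty() ? std::string("eng") : langs_arg;

    std::vector<std::string> files;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type plus = langs.find('+', start);
        // Length of this language code: up to the next '+' or to the end.
        const std::string::size_type len =
            (plus == std::string::npos) ? std::string::npos : plus - start;
        const std::string lang = langs.substr(start, len);
        if (!lang.empty())  // tolerate stray '+' such as "eng++ara"
            files.push_back(lang + ".traineddata");
        if (plus == std::string::npos)
            break;
        start = plus + 1;
    }
    return files;
}
```

For example, `traineddata_files("eng+ara")` yields `eng.traineddata` and `ara.traineddata`, both of which would need to be present before running with `-t eng+ara`.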
pedr0 I think I get the gist, is it OCR *only* images or does it try to OCR the whole pdf seen as an image ? it must be the former now that I think about it [11:52.49]
Robin_Watts_ pedr0: We render the pdf to a bitmap, images/text/everything. Then we ocr that. [11:53.26]
pedr0 I see, thanks a lot as usual, really helpful. I will let you know how it went. [11:54.12]
Robin_Watts_ specifically we want to OCR stuff that's text already, cos frequently, in badly generated PDFs, which are *way* too common, the text might look right, but it has crap unicode values, so cutting and pasting doesn't work. [11:54.37]
pedr0 ah I see [11:55.19]
  and does it work well generally ? Or does it require a lot of manual intervention thereafter [11:57.18]
  no need to answer to that one, I will check myself, it depends on an awful lot of factors I would answer myself :-) [11:58.26]
Robin_Watts_ pedr0: It uses the tesseract engine, which is reputedly a good one, but as you say, all sorts of factors involved. [11:58.48]