Log of #mupdf at irc.freenode.net.

20201117 
sebras @paulgardiner yes, as far as I can see we were setting the ReadOnly flag in /Ff in check_field_locking(), which is called during signing.04:36.05 
  I lost your question in the log yesterday...04:37.28 
artifexirc-bot <sebras> @Robin_Watts that commit, "Fix lgtm issues: int * int promoted to size_t can lose bits.", looks reasonable to me.11:12.58 
  <Robin_Watts> @sebras Thanks.11:21.10 
  <sebras> @Robin_Watts you saw that I updated sebras/signatures and that I tried to explain the ReadOnly NULL-check above..?11:22.39 
  <Robin_Watts> @sebras I missed that you'd updated it. Looking now.11:23.20 
  <Robin_Watts> @sebras The name == NULL thing looks great.11:24.35 
  <Robin_Watts> It was all lgtm before, so it's even more so now.11:25.02 
sebras @Robin_Watts ah, I thought I needed one last lgtm before pushing.11:27.37 
  @Robin_Watts unless you tell me no I want to push my signature tests now.15:06.17 
artifexirc-bot <Robin_Watts> please go for it!15:06.29 
  <Robin_Watts> @sebras @ator Top 2 commits here: https://git.ghostscript.com/?p=user/robin/mupdf.git15:08.45 
ator sebras: which repo has the signature tests?15:09.13 
  @Robin_Watts both LGTM15:09.42 
artifexirc-bot <Robin_Watts> @ator Thanks.15:09.54 
sebras @ator tests_private/pdf/js15:10.06 
  @ator I think ffi_PDFDocument_validateChangeHistory() is wrong.15:17.05 
  it pushes a boolean but should push a number.15:17.15 
  if you look at the description for it in include/mupdf/pdf/xref.h it says: "return the number of the last version that checked out OK."15:18.04 
ator @sebras looks that way15:18.08 
sebras and do_info() treats it as a number too!15:18.32 
ator @sebras fix on tor/master15:19.18 
sebras LGTM15:19.32 
  I get the sinking feeling that this might have been the only reason for me wanting to do the coerscscssccssion.15:20.43 
  maybe it is polish? coerszion.15:21.20 
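
The boolean-versus-number point above can be illustrated with a minimal, self-contained sketch. Nothing here is MuPDF or MuJS code: the `value` type, `make_boolean`, `make_number`, `to_number` and `last_good_version` are hypothetical stand-ins, assumed only for illustration. The sketch shows why pushing a boolean loses information when the function is documented to return a version number.

```c
#include <assert.h>

/* Hypothetical tagged value, standing in for a scripting engine's stack slot. */
typedef enum { VAL_BOOLEAN, VAL_NUMBER } val_type;

typedef struct {
	val_type type;
	double num;
} value;

/* A boolean slot can only hold 0 or 1, so any non-zero count collapses to 1. */
static value make_boolean(int v)
{
	value x = { VAL_BOOLEAN, v ? 1.0 : 0.0 };
	return x;
}

/* A number slot keeps the count itself. */
static value make_number(double v)
{
	value x = { VAL_NUMBER, v };
	return x;
}

/* What a caller sees when it reads the slot back as a number. */
static double to_number(value v)
{
	assert(v.type == VAL_BOOLEAN || v.type == VAL_NUMBER);
	return v.num;
}

/* Hypothetical stand-in for a routine that returns the number of the
 * last version that checked out OK. */
static int last_good_version(void)
{
	return 3;
}
```

With these stand-ins, `to_number(make_boolean(last_good_version()))` yields 1, while `to_number(make_number(last_good_version()))` yields 3, so a caller that treats the result as a number only gets the right answer from the second form.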
artifexirc-bot <Robin_Watts> Coverity is having a bad hair day.15:22.56 
  <Robin_Watts> Top 3 on here now: https://git.ghostscript.com/?p=user/robin/mupdf.git15:23.23 
  <Robin_Watts> zoom time in 7 minutes.15:23.36 
ator @Robin_Watts ugh, these (size_t) multiplication casts everywhere are starting to itch15:26.31 
  all 3 LGTM15:26.36 
artifexirc-bot <Robin_Watts> @ator Well, arguably we should be using size_t rather than ints in the code.15:28.56 
  <Robin_Watts> @ator Well, arguably we should be using size_t rather than ints more prevalently in the code.15:29.07 
  <ator> @Robin_Watts signed numbers in C were a huge mistake :)15:30.08 
  <ator> the only lesson I've learned here is to NEVER mix signed and unsigned int in the same codebase...15:30.47 
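
For readers unfamiliar with the "int * int promoted to size_t" issue discussed above, here is a minimal sketch of the kind of cast being talked about. The helper `buffer_size` and its parameters are hypothetical and are not taken from the MuPDF source; the sketch only assumes non-negative dimensions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a MuPDF function: byte count of a
 * w x h buffer with n bytes per pixel. */
static size_t buffer_size(int w, int h, int n)
{
	assert(w >= 0 && h >= 0 && n >= 0);
	/* Convert each operand first so the multiplication is done in size_t.
	 * Writing (size_t)(w * h * n) instead would multiply in int and only
	 * widen the result afterwards, by which point a large product has
	 * already overflowed (undefined behaviour for signed int). */
	return (size_t)w * (size_t)h * (size_t)n;
}
```

On a platform with a 32-bit `int` and a 64-bit `size_t`, `buffer_size(65536, 65536, 4)` gives 17179869184 (2^34), a value the plain `int` product cannot represent. Using `size_t` for the dimensions in the first place, as suggested in the discussion, would make the casts unnecessary.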
Zsolt hello16:10.56 
mubot Welcome to #mupdf, the channel for MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.16:10.56 
Zsolt I have a pdf with searchable text. The problem is that between the letters of every word a newline is present. I know this by copy pasting the text.16:16.03 
  Also using Adobe function to search for potential font problems it gives me "text references glyph, but glyph does not have a contour"16:16.04 
artifexirc-bot <Robin_Watts> Zsolt: We are in a meeting at the moment, so bear with the lag.16:17.03 
Zsolt ok, no problem16:17.17 
ator Zsolt: did you build from source, or are you using a linux distro version?16:17.26 
  there have been recent bugs in freetype that can have this effect, so if you've got an unlucky combination you can get this16:17.55 
Zsolt ator: no, I'm on windows. It is about a document downloaded from the internet.16:18.16 
artifexirc-bot <Robin_Watts> Zsolt: Your PDF will not have a newline present, cos those aren't present in PDFs. What you probably mean is that "when Mupdf extracts the text, it puts a newline in between each character".16:18.19 
ator Zsolt: open a bug at bugs.ghostscript.com and attach the file, and we can take a look16:18.54 
Zsolt ator: thanks. But you did not understand me. I'm not talking about mupdf. I just ask for help with a pdf I downloaded from the internet and I have font problems with it.16:20.30 
  It is not about mupdf16:20.35 
  I just ask for advice16:20.42 
  :)16:20.45 
ator Zsolt: if you point me at the file I can take a quick look16:22.29 
Zsolt I'm trying to upload the file16:26.28 
sebras @Robin_Watts I don't have permissions to put the new test cases into the regression system, do you mind checking those in? I stored them under tmp/tests_private on my casper account.16:30.56 
artifexirc-bot <Robin_Watts> sebras: Of course. will do that now.16:31.10 
Zsolt ator: https://dfiles.eu/files/6ic8nxpth16:31.22 
sebras @Robin_Watts probably safer than giving me sudo rights. ;)16:31.30 
Zsolt there are some advertisements on the webpage, but there is a big button with "download regular"16:31.57 
sebras @Robin_Watts I have of course run this locally without any issues.16:33.59 
Zsolt ator: https://filetransfer.io/data-package/xkmoxfnk16:34.32 
  this is a better link, with no advertisement16:34.48 
artifexirc-bot <Robin_Watts> sebras: committed.16:36.16 
sebras @Robin_Watts thanks! now hold your ankles, cross your I:s and dot your T:s!16:37.02 
ator Zsolt: I have the file now16:46.20 
Zsolt ator: thank you in advance16:54.58 
  if I copy paste text from the pdf to a text file, it will be pasted as:17:37.36 
  e17:37.37 
  x17:37.38 
  a17:37.38 
  m17:37.39 
  p17:37.40 
  l17:37.42 
  e17:37.43 
  instead of 'example'17:37.49 
  and because of this I can't search the pdf although it has a searchable text layer17:38.38 
sebras Zsolt: if you don't get an answer from ator today (it is evening in europe), hang around until tomorrow and he'll probably be back.17:41.10 
Zsolt sebras: ok, thank you, I'm also from Europe, maybe I could exit and enter again tomorrow afternoon17:43.17 
sebras Zsolt: of course. :) just wanted to ask you to be patient, even if it might have looked like ator started to work on it immediately.17:45.11 
Zsolt ok17:46.21 
malc_ Zsolt: is there an example (pun intended) document available?18:59.37 
Zsolt yes19:00.02 
  malc_: https://filetransfer.io/data-package/xkmoxfnk19:00.18 
  this is a 3 page example from the document in question19:00.46 
malc_ Zsolt: thanks, looking19:01.59 
Zsolt thank you19:04.28 
malc_ Zsolt: "mutool draw -o sample.txt sample.pdf" produces txt where (almost) every character is followed by a new-line, VeryDOC produced something funny, and ator is indeed your best hope for getting an answer why19:06.25 
Zsolt malc_: thank you, so there is some problem with the document text layer?19:12.23 
malc_ Zsolt: the layout is mupdf unfriendly, that's all i can say :(19:14.03 
Zsolt ok19:14.27 
  malc_: ator is one of the main developers of mupdf?19:15.13 
malc_ Zsolt: yes19:15.38 
artifexirc-bot <KenSharp> The original author19:15.46 
malc_ KenSharp: wonder what the situation wrt text extraction from this document is when GS is used19:16.56 
Zsolt I will quit, I will be back tomorrow and talk to ator19:23.11 
  thank you for trying19:23.33 
  to help me19:23.38 
  bye19:25.09 
artifexirc-bot <KenSharp> malc no idea, let me get the file and I can try it19:46.29 
malc_ KenSharp: thanks, an interesting exercise19:47.06 
artifexirc-bot <KenSharp> Seems I need to build a binary first19:47.11 
malc_ KenSharp: pdf.js has problems with copy&paste on this document, not as severe but they are still there19:48.40 
artifexirc-bot <KenSharp> Well I only get the 1st page, but there's a known bug with txtwrite at the moment, and I suspect it's triggering it19:49.08 
  <KenSharp> I get a 'reasonable' approximation, but some of the spacing is missing so words run together19:49.32 
  <KenSharp> EG "F.3.2 Packetsimpliﬁcation"19:49.45 
  <KenSharp> Was there a particular line of interest on the first page ?19:50.03 
  <KenSharp> Oh wait, that's a different file!19:50.37 
  <KenSharp> Yeah I'm afraid this is hitting the known bug19:52.01 
  <KenSharp> OK a slightly older version works19:53.36 
  <KenSharp> The text is not sane19:53.56 
  <KenSharp> There are loads of spaces between characters19:54.39 
malc_ KenSharp: i fail to parse that, it either works and the output (text) is sane, or it does not, which is it?19:54.40 
artifexirc-bot <KenSharp> An older version parses the code and the output contains text, but.....19:55.15 
  <KenSharp> The txtwrite default output model is to try to recreate the layout using ASCII19:55.29 
  <KenSharp> And the layout is utterly mad, it has many spaces between characters which, in the original, are consecutive19:55.49 
malc_ KenSharp: gotcha, thanks for an explanation19:56.43 
artifexirc-bot <KenSharp> The 'XML-like' output looks better19:56.58 
  <KenSharp> some very strange Unicode code points too19:58.13 
  <KenSharp> Oh copyright symbol, emdash, I guess that is reasonable19:59.14 