| <<<Back 1 day (to 2020/11/16) | Fwd 1 day (to 2020/11/18)>>> | 20201117 |
sebras | @paulgardiner yes, as far as I can see we were setting the ReadOnly flag in /Ff in check_field_locking(), which is called during signing. | 04:36.05 |
| I lost your question in the log yesterday... | 04:37.28 |
artifexirc-bot | <sebras> @Robin_Watts that commit, "Fix lgtm issues: int * int promoted to size_t can lose bits.", looks reasonable to me. | 11:12.58 |
| <Robin_Watts> @sebras Thanks. | 11:21.10 |
| <sebras> @Robin_Watts you saw that I updated sebras/signatures and that I tried to explain the ReadOnly NULL-check above..? | 11:22.39 |
| <Robin_Watts> @sebras I missed that you'd updated it. Looking now. | 11:23.20 |
| <Robin_Watts> @sebras The name == NULL thing looks great. | 11:24.35 |
| <Robin_Watts> It was all lgtm before, so it's even more so now. | 11:25.02 |
sebras | @Robin_Watts ah, I thought I needed one last lgtm before pushing. | 11:27.37 |
| @Robin_Watts unless you tell me no I want to push my signature tests now. | 15:06.17 |
artifexirc-bot | <Robin_Watts> please go for it! | 15:06.29 |
| <Robin_Watts> @sebras @ator Top 2 commits here: https://git.ghostscript.com/?p=user/robin/mupdf.git | 15:08.45 |
ator | sebras: which repo has the signature tests? | 15:09.13 |
| @Robin_Watts both LGTM | 15:09.42 |
artifexirc-bot | <Robin_Watts> @ator Thanks. | 15:09.54 |
sebras | @ator tests_private/pdf/js | 15:10.06 |
| @ator I think ffi_PDFDocument_validateChangeHistory() is wrong. | 15:17.05 |
| it pushes a boolean but should push a number. | 15:17.15 |
| if you look at the description for it in include/mupdf/pdf/xref.h it says: "return the number of the last version that checked out OK." | 15:18.04 |
ator | @sebras looks that way | 15:18.08 |
sebras | and do_info() treats it as a number too! | 15:18.32 |
ator | @sebras fix on tor/master | 15:19.18 |
sebras | LGTM | 15:19.32 |
| I get the sinking feeling that this might have been the only reason for me wanting to do the coerscscssccssion. | 15:20.43 |
| maybe it is polish? coerszion. | 15:21.20 |
artifexirc-bot | <Robin_Watts> Coverity is having a bad hair day. | 15:22.56 |
| <Robin_Watts> Top 3 on here now: https://git.ghostscript.com/?p=user/robin/mupdf.git | 15:23.23 |
| <Robin_Watts> zoom time in 7 minutes. | 15:23.36 |
ator | @Robin_Watts ugh, these (size_t) multiplication casts everywhere are starting to itch | 15:26.31 |
| all 3 LGTM | 15:26.36 |
artifexirc-bot | <Robin_Watts> @ator Well, arguably we should be using size_t rather than ints more prevalently in the code. | 15:29.07 |
| <ator> @Robin_Watts signed numbers in C were a huge mistake :) | 15:30.08 |
| <ator> the only lesson I've learned here is to NEVER mix signed and unsigned int in the same codebase... | 15:30.47 |
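The "(size_t) multiplication casts" discussed above can be illustrated with a minimal sketch. It is not taken from the MuPDF commit: the function names are hypothetical, uint64_t stands in for size_t so the result does not depend on the platform, and unsigned operands are used so that the wraparound is well-defined (with signed int, as in the original code, the overflow would be undefined behaviour).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the product is computed in 32 bits and only
 * widened afterwards, so any high bits are already lost. */
uint64_t area_late_cast(uint32_t w, uint32_t h)
{
	uint32_t product = w * h; /* truncated to 32 bits here */
	return (uint64_t)product;
}

/* Hypothetical helper: one operand is widened first, so the
 * multiplication itself is carried out in 64 bits. */
uint64_t area_early_cast(uint32_t w, uint32_t h)
{
	return (uint64_t)w * h;
}

/* Small self-check showing where the two variants diverge. */
void area_selfcheck(void)
{
	/* Small values: both variants agree. */
	assert(area_late_cast(3u, 4u) == area_early_cast(3u, 4u));
	/* 65536 * 65536 = 2^32 does not fit in 32 bits. */
	assert(area_late_cast(65536u, 65536u) != area_early_cast(65536u, 65536u));
}
```

Casting one operand before the multiply is the pattern that produces the many casts ator mentions; using a wide unsigned type for sizes throughout, as Robin_Watts suggests, would make most of them unnecessary.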
Zsolt | hello | 16:10.56 |
mubot | Welcome to #mupdf, the channel for MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line. | 16:10.56 |
Zsolt | I have a pdf with searchable text. The problem is that between the letters of every word a newline is present. I know this by copy pasting the text. | 16:16.03 |
| Also using Adobe function to search for potential font problems it gives me "text references glyph, but glyph does not have a contour" | 16:16.04 |
artifexirc-bot | <Robin_Watts> Zsolt: We are in a meeting at the moment, so bear with the lag. | 16:17.03 |
Zsolt | ok, no problem | 16:17.17 |
ator | Zsolt: did you build from source, or are you using a linux distro version? | 16:17.26 |
| there have been recent bugs in freetype that can have this effect, so if you've got an unlucky combination you can get this | 16:17.55 |
Zsolt | ator: no, I'm on windows. It is about a document downloaded from the internet. | 16:18.16 |
artifexirc-bot | <Robin_Watts> Zsolt: Your PDF will not have a newline present, cos those aren't present in PDFs. What you probably mean is that "when Mupdf extracts the text, it puts a newline in between each character". | 16:18.19 |
ator | Zsolt: open a bug at bugs.ghostscript.com and attach the file, and we can take a look | 16:18.54 |
Zsolt | ator: thanks. But you did not understand me. I'm not talking about mupdf. I'm just asking for help with a pdf I downloaded from the internet and I have font problems with it. | 16:20.30 |
| It is not about mupdf | 16:20.35 |
| I just ask for advice | 16:20.42 |
| :) | 16:20.45 |
ator | Zsolt: if you point me at the file I can take a quick look | 16:22.29 |
Zsolt | I'm trying to upload the file | 16:26.28 |
sebras | @Robin_Watts I don't have permissions to put the new test cases into the regression system, do you mind checking those in? I stored them under tmp/tests_private on my casper account. | 16:30.56 |
artifexirc-bot | <Robin_Watts> sebras: Of course. will do that now. | 16:31.10 |
Zsolt | ator: https://dfiles.eu/files/6ic8nxpth | 16:31.22 |
sebras | @Robin_Watts probably safer than giving me sudo rights. ;) | 16:31.30 |
Zsolt | there are some advertisements on the webpage, but there is a big button with "download regular" | 16:31.57 |
sebras | @Robin_Watts I have of course run this locally without any issues. | 16:33.59 |
Zsolt | ator: https://filetransfer.io/data-package/xkmoxfnk | 16:34.32 |
| this is a better link, with no advertisement | 16:34.48 |
artifexirc-bot | <Robin_Watts> sebras: committed. | 16:36.16 |
sebras | @Robin_Watts thanks! now hold your ankles, cross your I:s and dot your T:s! | 16:37.02 |
ator | Zsolt: I have the file now | 16:46.20 |
Zsolt | ator: thank you in advance | 16:54.58 |
| if I copy paste text from the pdf to a text file, it will be pasted as: | 17:37.36 |
| e | 17:37.37 |
| x | 17:37.38 |
| a | 17:37.38 |
| m | 17:37.39 |
| p | 17:37.40 |
| l | 17:37.42 |
| e | 17:37.43 |
| instead of 'example' | 17:37.49 |
| and because of this I can't search the pdf although it has a searchable text layer | 17:38.38 |
sebras | Zsolt: if you don't get an answer from ator today (it is evening in europe), hang around until tomorrow and he'll probably be back. | 17:41.10 |
Zsolt | sebras: ok, thank you, I'm also from Europe, maybe I could exit and enter again tomorrow afternoon | 17:43.17 |
sebras | Zsolt: of course. :) just wanted to ask you to be patient, even if it might have looked like ator started to work on it immediately. | 17:45.11 |
Zsolt | ok | 17:46.21 |
malc_ | Zsolt: is there an example (pun intended) document available? | 18:59.37 |
Zsolt | yes | 19:00.02 |
| malc_: https://filetransfer.io/data-package/xkmoxfnk | 19:00.18 |
| this is a 3 page example from the document in question | 19:00.46 |
malc_ | Zsolt: thanks, looking | 19:01.59 |
Zsolt | thank you | 19:04.28 |
malc_ | Zsolt: "mutool draw -o sample.txt sample.pdf" produces txt where (almost) every character is followed by a new-line, VeryDOC produced something funny, and ator is indeed your best hope for getting an answer why | 19:06.25 |
Zsolt | malc_: thank you, so there is some problem with the document text layer? | 19:12.23 |
malc_ | Zsolt: the layout is mupdf unfriendly that's all i can say :( | 19:14.03 |
Zsolt | ok | 19:14.27 |
| malc_: ator is one of the main developers of mupdf? | 19:15.13 |
malc_ | Zsolt: yes | 19:15.38 |
artifexirc-bot | <KenSharp> The original author | 19:15.46 |
malc_ | KenSharp: wonder what the situation wrt text extraction from this document is when GS is used | 19:16.56 |
Zsolt | I will quit, I will be back tomorrow and talk to ator | 19:23.11 |
| thank you for trying | 19:23.33 |
| to help me | 19:23.38 |
| bye | 19:25.09 |
artifexirc-bot | <KenSharp> malc no idea, let me get the file and I can try it | 19:46.29 |
malc_ | KenSharp: thanks, an interesting exercise | 19:47.06 |
artifexirc-bot | <KenSharp> Seems I need to build a binary first | 19:47.11 |
malc_ | KenSharp: pdf.js has problems with copy&paste on this document, not as severe but they are still there | 19:48.40 |
artifexirc-bot | <KenSharp> Well I only get the 1st page, but there's a known bug with txtwrite at the moment, and I suspect it's triggering it | 19:49.08 |
| <KenSharp> I get a 'reasonable' approximation, but some of the spacing is missing so words run together | 19:49.32 |
| <KenSharp> EG "F.3.2 Packetsimpliﬁcation" | 19:49.45 |
| <KenSharp> Was there a particular line of interest on the first page ? | 19:50.03 |
| <KenSharp> Oh wait, that's a different file! | 19:50.37 |
| <KenSharp> Yeah I'm afraid this is hitting the known bug | 19:52.01 |
| <KenSharp> OK a slightly older version works | 19:53.36 |
| <KenSharp> The text is not sane | 19:53.56 |
| <KenSharp> There are loads of spaces between characters | 19:54.39 |
malc_ | KenSharp: i fail to parse that, it either works and the output (text) is sane, or it does not, which is it? | 19:54.40 |
artifexirc-bot | <KenSharp> An older version parses the code and the output contains text, but..... | 19:55.15 |
| <KenSharp> The txtwrite default output model is to try to recreate the layout using ASCII | 19:55.29 |
| <KenSharp> And the layout is utterly mad, it has many spaces between characters which, in the original, are consecutive | 19:55.49 |
malc_ | KenSharp: gotcha, thanks for an explanation | 19:56.43 |
artifexirc-bot | <KenSharp> The 'XML-like' output looks better | 19:56.58 |
| <KenSharp> some very strange Unicode code points too | 19:58.13 |
| <KenSharp> Oh copyright symbol, emdash, I guess that is reasonable | 19:59.14 |