| <<<Back 1 day (to 2019/08/19) | Fwd 1 day (to 2019/08/21)>>> | 20190820 |
bsdfan12 | hello world | 06:37.58 |
| why is Python needed by mupdf? Isn't mupdf meant to be fast and efficient? | 06:38.24 |
pink_mist | is it? https://mupdf.com/docs/building.html doesn't mention anything about python at all | 07:07.35 |
| where did you get that information from? | 07:07.45 |
ELIZABETH21 | Hello, I'm looking to hire a front end architect that is capable of leading a small development team in London. Consequently I had hoped that some people here might like to discuss further. I can be reached at JamesBTobin (at) Gmail (dot) Com | 10:01.22 |
pink_mist | ator: jt4 has the same ip and ident as ELIZABETH21 | 10:15.47 |
kens | recruitment spam on IRC <boggles> | 10:16.12 |
pietrop | hi all. In trying to "clean" my PDFs I use mutool clean with the following flags '-asdifggg' and it works fine as I can see the PDF's structure. However I'd like the stream to be readable as well, namely I'd like to read the drawing instructions in there as well. Is it possible to get mutool to do that ? | 10:57.26 |
kens | Using -d to decompress should result in the page and form content streams being decompressed. So that should show the drawing commands | 10:58.07 |
sebras | bsdfan12: mupdf doesn't require python to run. where did you get that impression? | 11:10.50 |
| paulgardiner: perhaps it is easier to discuss the annotations here? | 12:28.44 |
| paulgardiner: or over at #artifex, which ever you prefer. | 12:28.55 |
pietrop | I am still seeing it as encoded in one PDF, another one (which I've created by hand using libreoffice) comes out "clear" | 12:54.19 |
sebras | pietrop: can you attach the original file and list the exact commands you used as a bug report over at https://bugs.ghostscript.com/ ? | 12:56.41 |
pietrop | sebras: will do | 14:06.55 |
sebras | pietrop: thank you! | 14:14.02 |
pietrop | https://bugs.ghostscript.com/show_bug.cgi?id=701449 | 14:26.09 |
| hopefully what I've reported is not utter non-sense | 14:26.29 |
| thanks for having a look | 14:26.39 |
kens | Well the content stream for page 1 is certainly compressed, or at least, binary | 14:27.38 |
| Weirdly it has no filter stated. | 14:28.28 |
| Ah, it's password encrypted, that's why | 14:29.10 |
| It has been decompressed, but the decompressed stream has been encrypted, because the original file was password-protected | 14:29.44 |
| I'll have to leave it to sebras to come up with a way to prevent that, I have no ideas | 14:30.16 |
sebras | kens: thanks for looking into it! | 14:31.18 |
| pietrop: to disable encryption you add the -D flag to mutool clean | 14:31.30 |
| ator: did we take on these changes? http://ix.io/1NPr | 14:31.46 |
kens | sebras my mutool (bit old) doesn't have a -D | 14:34.56 |
sebras | kens: yeah, -D was recently added I think. | 14:35.34 |
| kens: if pietrop's version is missing that it would be best to upgrade. | 14:35.58 |
ator | sebras: I think we did take those on | 14:42.19 |
sebras | ator: then I'll delete my TODO-note. :) | 14:45.32 |
pietrop | I did try but it does not work for me folks | 14:57.07 |
| sorry | 14:58.34 |
| I does work | 14:58.40 |
| it does work | 14:58.46 |
| thanks a lot | 14:59.07 |
| I did not know -D even existed | 14:59.16 |
| :-) | 14:59.20 |
| I am not sure my request makes sense at all, but would it be possible to format | 15:01.22 |
| ... | 15:01.32 |
| to format TJ [<DSADSADAS>] with a readable string ? | 15:01.50 |
| <HEX> | 15:01.57 |
kens | I'm afraid that's a 'does not make sense' | 15:02.56 |
| The character Encoding need not be (often isn't) ASCII | 15:03.08 |
pietrop | but if it is unicode (and does not need to be) is it possible to print it out or that's gibberish as well ? | 15:05.17 |
kens | It would very often be nonsense | 15:05.39 |
ator | pietrop: add -a to encode binary strings as ascii hex | 15:05.49 |
kens | character Encoding in PDF can be totally arbitrary | 15:05.49 |
ator | pietrop: that won't change the content streams, only PDF strings | 15:06.02 |
kens | ator he's referring to an argument to TJ | 15:06.23 |
ator | pietrop: if you want to clean up the content stream syntax too, use the (somewhat experimental) '-c' option | 15:06.31 |
| pietrop: mutool clean -d -D -a -c input.pdf output.pdf | 15:06.54 |
| that will decompress, decrypt, ascii hex encode, and rewrite content streams | 15:07.06 |
kens | I need to update my checkout.... | 15:07.13 |
ator | -dif is probably best, to preserve images and fonts compressed | 15:07.20 |
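For anyone scripting the command ator gives above, here is a minimal Python sketch. The function names are made up for illustration, the flag meanings are taken only from this conversation, and `-D` needs a recent mutool (kens' older build does not have it).

```python
import shutil
import subprocess


def mutool_clean_argv(src, dst, keep_images_and_fonts_compressed=True):
    """Build the argument list for the `mutool clean` call from the chat.

    Flag meanings as described in the conversation:
      -d decompress streams, -D drop encryption, -a ASCII hex encode,
      -c rewrite content streams (described as somewhat experimental),
      -i / -f keep images / fonts compressed (the "-dif" suggestion).
    """
    flags = ["-d", "-D", "-a", "-c"]
    if keep_images_and_fonts_compressed:
        flags += ["-i", "-f"]
    return ["mutool", "clean", *flags, src, dst]


def run_clean(src, dst):
    """Run the command, failing early if mutool is not installed."""
    if shutil.which("mutool") is None:
        raise RuntimeError("mutool not found on PATH")
    subprocess.run(mutool_clean_argv(src, dst), check=True)


print(" ".join(mutool_clean_argv("input.pdf", "output.pdf")))
```

Building the argument list separately from running it makes it easy to check the exact command before touching any file.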
sebras | kens: :) | 15:07.22 |
pietrop | it does not work but I would appreciate if I can get this right. The hex data contains the code to be used to address the font dictionary to get the glyph to draw ? | 15:09.12 |
| (a series of codes, the ones forming the string) | 15:09.35 |
kens | depending on whether its a Font or a CIDFont the character code (which may involve multiple bytes) is used to index the Encoding or CMap. | 15:11.00 |
| For a Font the Encoding is an array which maps the character code to a glyph name and that is looked up in the CharStrings dictionary ( for type 1C fonts) to find the glyph description. TrueType fonts end up with a GID which they use. | 15:12.10 |
| CIDFonts end up with a CID which is used to find the glyph program | 15:12.21 |
| The Encoding can be totally arbitrary, and for subset fonts usually is. | 15:12.38 |
| So the first character used in a font might get index 1, the second index 2 etc. | 15:12.54 |
pietrop | is there a reference you'd recommend me reading ? Other than the PDF reference. | 15:13.29 |
kens | So Hello World would be character codes 1, 2, 3, 3, 4, 5, 6, 4, 7, 3, 8 | 15:13.29 |
| The PDF Reference is all there is, everything is in there | 15:13.41 |
| Of course some PDF producers do write the text encodings as ASCII | 15:14.01 |
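kens' "Hello World" numbering can be reproduced with a small Python sketch. It assumes the simplest possible subsetting scheme, where each new character gets the next free code starting at 1. Real PDF producers are free to number glyphs differently, and `subset_codes` is a hypothetical helper, not part of any PDF library.

```python
def subset_codes(text):
    """Assign character codes in order of first use, starting at 1."""
    code_for = {}
    codes = []
    for ch in text:
        if ch not in code_for:
            code_for[ch] = len(code_for) + 1
        codes.append(code_for[ch])
    return codes, code_for


codes, table = subset_codes("Hello World")
print(codes)  # [1, 2, 3, 3, 4, 5, 6, 4, 7, 3, 8]

# The same codes written the way they would appear in a hex string operand.
print("<" + "".join(f"{c:02X}" for c in codes) + ">")  # <0102030304050604070308>
```

The hex string on its own carries no hint of which letters were meant, which is why it cannot simply be printed as readable text.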
pietrop | I think I get the gist: you can't really print a string given just the HEX data, it is way more complex than that. You may not be able to know the character code at all, which is why sometimes you can render a pdf but you can't copy and paste text from it | 15:14.23 |
| Oks I will have the n-th read then | 15:14.47 |
| to the PDF reference | 15:14.56 |
kens | Yes, this is exactly why copy/paste from some files won't work. You can also add a ToUnicode CMap which maps the character code to a Unicode point, which will allow copy/paste/search | 15:15.04 |
pietrop | how do you extract a font from a PDF to inspect its map ? | 15:16.06 |
kens | Fonts don't have a 'map' there's an Encoding or a CMap depending if its a Font or CIDFont | 15:16.44 |
| These are stored in the Font dictionary in the PDF file | 15:16.54 |
| The actual font data is given by the FontFile key | 15:17.06 |
| In the Font Descriptor if memory serves | 15:17.13 |
ator | pietrop: it's a combination of PDF objects and the embedded font | 15:17.16 |
pietrop | it's fairly complicated, requires time to pick it up. | 15:17.52 |
ator | the Encoding (that kens mentioned) is combined with the encoding of the embedded font file in many strange and fragile ways | 15:18.11 |
kens | No! It's not fairly complicated, it's *hideously* complicated :-) | 15:18.15 |
pietrop | eheh | 15:18.23 |
ator | it is the stuff of nightmares. | 15:18.28 |
pietrop | I recall of a tool I used to extract a font from a PDF and open it up in a windowing system, I must say I understood 5% of what I was doing at the time. Do you recall the name ? It seemed a fairly old-school Unix kind of thing (the best ones) | 15:20.17 |
| Google does not say | 15:20.28 |
| thanks all for the explanation by the way | 15:20.45 |
kens | I think mutool will extract fonts, you would probably want fontforge after that I guess | 15:20.56 |
ator | mutool extract will pull out the embedded font file | 15:21.09 |
| but that won't give you all the info you need for the TJ string's encoding | 15:21.46 |
pietrop | what do you usually use for that ? | 15:24.57 |
ator | I use mutool show to quickly find and print PDF objects, faster than opening the file and reading it in a text editor | 15:26.11 |
| but generally, I have to read the PDF file and look at all the font dictionary stuff | 15:26.29 |
| mutool show input.pdf pages/1/Resources/Font | 15:27.04 |
pietrop | I get to the toUnicode and from there I get a list of <XX> <YY> <ZZ> triples. | 15:31.49 |
| mutool is more powerful than I thoght | 15:32.05 |
| thought | 15:32.07 |
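The triples pietrop mentions look like the `<start> <end> <destination>` entries of a ToUnicode `bfrange` section. Below is a rough Python sketch of how they could be used to turn a hex string back into text. The CMap fragment is invented for illustration. The sketch assumes single-byte codes and one BMP code point per destination, and it ignores `bfchar` sections and array destinations, both of which real files can use.

```python
import re

# Hypothetical ToUnicode fragment, not taken from any real file.
SAMPLE_CMAP = """
1 begincodespacerange
<00> <FF>
endcodespacerange
8 beginbfrange
<01> <01> <0048>
<02> <02> <0065>
<03> <03> <006C>
<04> <04> <006F>
<05> <05> <0020>
<06> <06> <0057>
<07> <07> <0072>
<08> <08> <0064>
endbfrange
"""

TRIPLE = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfrange(cmap_text):
    """Map each character code in the bfrange sections to a Unicode character."""
    mapping = {}
    for section in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst in TRIPLE.findall(section):
            start, end, base = int(lo, 16), int(hi, 16), int(dst, 16)
            for offset in range(end - start + 1):
                mapping[start + offset] = chr(base + offset)
    return mapping


def decode_hex_string(hex_string, mapping, code_bytes=1):
    """Decode a <...> hex string; unmapped codes become U+FFFD."""
    digits = hex_string.strip("<>")
    step = 2 * code_bytes
    codes = [int(digits[i:i + step], 16) for i in range(0, len(digits), step)]
    return "".join(mapping.get(code, "\ufffd") for code in codes)


mapping = parse_bfrange(SAMPLE_CMAP)
print(decode_hex_string("<0102030304050604070308>", mapping))  # Hello World
```

Without a ToUnicode CMap there is nothing to build `mapping` from, which is the situation kens describes where copy and paste fails.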
bsdfan12 | Netbsd uses it in the packages (python). I guess as well openbsd if I can remember. | 16:49.03 |
sebras | bsdfan12: there are python packages that wrap mupdf, such as pymupdf https://pypi.org/project/PyMuPDF/ | 17:13.10 |
| it contains python bindings for mupdf | 17:14.27 |
| bsdfan12: is this what you are talking about? | 17:16.18 |
| bsdfan12: while this might not be the fastest way to interact with mupdf, having bindings for different languages doesn't impact the mupdf library itself. so why are you asking why mupdf is not fast and efficient? | 17:17.57 |
| I must be misunderstanding you somehow. | 17:18.13 |
bsdfan12 | sebras: in the netbsd stable, there is python. I guess it is rather a non necessary requirement since mupdf is meant to be less fat than other PDF viewers (okular,...): | 17:51.11 |
pink_mist | bsdfan12: again, there is _NO_ requirement from mupdf's side for python. if your netbsd package requires python, that is netbsd's fault entirely. | 17:52.14 |
bsdfan12 | but we cannot do things to allow developers to create better packages. It is not the first time that mupdf has fat package making. debian based OSses it is the very same. | 17:57.46 |
sebras | pink_mist: bsdfan12: we do have a few python scripts to build cmap tables in mupdf, but none of those are used at runtime. if netbsd erroneously added a runtime dependency on python to their mupdf package then you need to file a bug with netbsd. | 17:58.00 |
| bsdfan12: https://packages.debian.org/sid/mupdf on debian mupdf does not depend on python. | 17:58.20 |
bsdfan12 | Maybe if developers and package maintainers could compile mupdf with ./configure ; make ; ... method, this would allow maybe their work to be simpler. | 17:58.50 |
| Maybe they would not bring mess and stuffs into the clean mupdf. | 17:59.16 |
sebras | bsdfan12: then that is a problem with the netbsd package maintainers. you probably need to talk to them. | 18:01.07 |
| pink_mist: are you familiar with bsd? do you know if they have a page similar to debian's showing the dependencies? | 18:02.16 |
pink_mist | well, there's pkgsrc, which doesn't list a dependency on python: http://pkgsrc.se/print/mupdf | 18:05.48 |
sebras | pink_mist: oh, that's better than the cvs I found: http://cvsweb.netbsd.org/bsdweb.cgi/~checkout~/pkgsrc/print/mupdf/patches/ | 18:06.16 |
| pink_mist: thanks. | 18:06.19 |
| then I don't understand what the issue is, bsdfan12. | 18:07.26 |
bsdfan12 | sebras: there are no issues, it is just a discussion about it. We cannot do anything about it, Unix like system do compile their mupdf the way they want. | 18:10.32 |
pink_mist | sebras: ah, I found http://cdn.netbsd.org/pub/pkgsrc/current/pkgsrc/print/mupdf/README.html too .. which does list python as a runtime dependency | 18:11.54 |
sebras | pink_mist: I think they might be listing the recursive dependencies for libraries we use. | 18:14.45 |
| pink_mist: e.g. they say we depend on harfbuzz, which is correct, but then http://cdn.netbsd.org/pub/pkgsrc/current/pkgsrc/fonts/harfbuzz/README.html states that harfbuzz requires python to run. | 18:15.13 |
pink_mist | ah, that makes some sense then | 18:15.47 |
sebras | I'm confused though. because harfbuzz itself appears to be depending on libicu which is claimed to be depending on python | 18:17.33 |
| and indeed libicu depends on python at runtime according to http://cdn.netbsd.org/pub/pkgsrc/current/pkgsrc/textproc/icu/README.html | 18:17.57 |
| but when i look up libicu63 on debian there is no such dependency!? | 18:18.59 |
bsdfan12 | come on - harfbuzz is really meant to be there ? | 18:19.14 |
sebras | bsdfan12: yes, harfbuzz is used for right to left text processing. | 18:19.44 |
bsdfan12 | bsd or linux are usually putting more stuffs into their packages, it is not first time. | 18:20.05 |
sebras | bsdfan12: in our own releases we do build harfbuzz into mupdf itself without depending on the system libharfbuzz, thereby actually removing some dependencies. | 18:20.45 |
bsdfan12 | I must say that mupdf is not the easiest to compile, so likely they might add more into the package. | 18:21.47 |
sebras | the changes they make to the netbsd package appear to be available here: http://cdn.netbsd.org/pub/pkgsrc/current/pkgsrc/print/mupdf/patches/ | 18:22.58 |
bsdfan12 | yeap :( | 18:24.04 |
sebras | Robin_Watts: ator: on sebras/master I attempt to add support for inverting pixmap luminance. | 18:40.33 |
| I did a simple patch to support this in desktop java, perhaps you'd want to leave that out, but I added it so you can make that decision. | 18:41.08 |
| oh and the very first RFC commit on sebras/wip... why am I seeing those changes to the header file? did we forget to add some changes to another commit | 18:41.46 |
| ? | 18:41.49 |
Robin_Watts | Passing 3 pointers to a function for every pixel.... | 18:42.13 |
| could invert luminance be marked static inline ? | 18:42.34 |
| otherwise lgtm. | 18:43.39 |
sebras | ator: ok, I've updated sebras/master according to previous review comments. now you're up! :) | 23:59.31 |