Log of #mupdf at irc.freenode.net.

2019/08/20
bsdfan12 hello world06:37.58 
  why is Python needed by mupdf? Isn't mupdf meant to be fast and efficient?06:38.24 
pink_mist is it? https://mupdf.com/docs/building.html doesn't mention anything about python at all07:07.35 
  where did you get that information from?07:07.45 
ELIZABETH21 Hello, I'm looking to hire a front end architect that is capable of leading a small development team in London. Consequently I had hoped that some people here might like to discuss further. I can be reached at JamesBTobin (at) Gmail (dot) Com10:01.22 
pink_mist ator: jt4 has the same ip and ident as ELIZABETH2110:15.47 
kens recruitment spam on IRC <boggles>10:16.12 
pietrop hi all. In trying to "clean" my PDFs I use mutool clean with the following flags '-asdifggg' and it works fine as I can see the PDF's structure. However I'd like the stream to be readable as well, namely I'd like to read the drawing instructions in there as well. Is it possible to get mutool to do that ?10:57.26 
kens Using -d to decompress should result in the page and form content streams being decompressed. So that should show the drawing commands10:58.07 
sebras bsdfan12: mupdf doesn't require python to run. where did you get that impression?11:10.50 
  paulgardiner: perhaps it is easier to discuss the annotations here?12:28.44 
  paulgardiner: or over at #artifex, which ever you prefer.12:28.55 
pietrop I am still seeing it as encoded in one PDF, another one (which I've created by hand using libreoffice) comes out "clear"12:54.19 
sebras pietrop: can you attach the original file and list the exact commands you used as a bug report over at https://bugs.ghostscript.com/ ?12:56.41 
pietrop sebras: will do14:06.55 
sebras pietrop: thank you!14:14.02 
pietrop https://bugs.ghostscript.com/show_bug.cgi?id=701449 14:26.09 
  hopefully what I've reported is not utter non-sense14:26.29 
  thanks for having a look14:26.39 
kens Well the content stream for page 1 is certainly compressed, or at least, binary14:27.38 
  Weirdly it has no filter stated.14:28.28 
  Ah, it's password encrypted, that's why14:29.10 
  It has been decompressed, but the decompressed stream has been encrypted, because the original file was password-protected14:29.44 
  I'll have to leave it to sebras to come up with a way to prevent that, I have no ideas14:30.16 
sebras kens: thanks for looking into it!14:31.18 
  pietrop: to disable encryption you add the -D flag to mutool clean14:31.30 
  ator: did we take on these changes? http://ix.io/1NPr 14:31.46 
kens sebras my mutool (bit old) doesn't have a -D14:34.56 
sebras kens: yeah, -D was recently added I think.14:35.34 
  kens: if pietrop's version is missing that it would be best to upgrade.14:35.58 
ator sebras: I think we did take those on14:42.19 
sebras ator: then I'll delete my TODO-note. :)14:45.32 
pietrop I did try but it does not work for me folks14:57.07 
  sorry14:58.34 
  I does work14:58.40 
  it does work14:58.46 
  thanks a lot14:59.07 
  I did not know -D even existed14:59.16 
  :-)14:59.20 
  I am not sure my request makes sense at all, but would it be possible to format Did not work - added output of15:01.22 
  ...15:01.32 
  to format TJ [<DSADSADAS>] with a readable string ?15:01.50 
  <HEX>15:01.57 
kens I'm afraid that's a 'does not make sense'15:02.56 
  The character Encoding need not be (often isn't) ASCII15:03.08 
pietrop but if it is unicode (and does not need to be) is it possible to print it out or that's gibberish as well ?15:05.17 
kens It would very often be nonsense15:05.39 
ator pietrop: add -a to encode binary strings as ascii hex15:05.49 
kens character Encoding in PDF can be totally arbitrary15:05.49 
ator pietrop: that won't change the content streams, only PDF strings15:06.02 
kens ator he's referring to an argument to TJ15:06.23 
ator pietrop: if you want to clean up the content stream syntax too, use the (somewhat experimental) '-c' option15:06.31 
  pietrop: mutool clean -d -D -a -c input.pdf output.pdf15:06.54 
  that will decompress, decrypt, ascii hex encode, and rewrite content streams15:07.06 
kens I need to update my checkout....15:07.13 
ator -dif is probably best, to preserve images and fonts compressed15:07.20 
sebras kens: :)15:07.22 
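A minimal sketch of the invocation ator describes, assuming mutool is on the PATH and supports the flags named above (-d, -D, -a, -c, plus -i and -f to keep images and fonts compressed). The helper names `clean_command` and `run_clean` are hypothetical; the sketch only assembles the argument list and does not run anything by itself.

```python
import subprocess

def clean_command(src, dst, keep_images_fonts_compressed=True):
    """Build the argv for the mutool clean call discussed above."""
    # -dif: decompress, but leave image and font streams compressed
    # -d:   decompress everything
    first = "-dif" if keep_images_fonts_compressed else "-d"
    # -D: decrypt, -a: ASCII hex encode strings, -c: rewrite content streams
    return ["mutool", "clean", first, "-D", "-a", "-c", src, dst]

def run_clean(src, dst):
    """Run mutool clean; raises CalledProcessError if mutool reports failure."""
    subprocess.run(clean_command(src, dst), check=True)

print(" ".join(clean_command("input.pdf", "output.pdf")))
```

Passing an argument list rather than a shell string avoids quoting problems with file names that contain spaces.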
pietrop it does not work but I would appreciate if I can get this right. The hex data contains the code to be used to address the font dictionary to get the glyph to draw ?15:09.12 
  (a series of codes, the ones forming the string)15:09.35 
kens depending on whether it's a Font or a CIDFont the character code (which may involve multiple bytes) is used to index the Encoding or CMap.15:11.00 
  For a Font the Encoding is an array which maps the character code to a glyph name and that is looked up in the CharStrings dictionary (for Type 1C fonts) to find the glyph description. TrueType fonts end up with a GID which they use.15:12.10 
  CIDFonts end up with a CID which is used to find the glyph program15:12.21 
  The Encoding can be totally arbitrary, and for subset fonts usually is.15:12.38 
  So the first character used in a font might get index 1, the second index 2 etc.15:12.54 
pietrop is there a reference you'd recommend me reading ? Other than the PDF reference.15:13.29 
kens So Hello World would be character codes 1, 2, 3, 3, 4, 5, 6, 4, 7, 3, 815:13.29 
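A minimal sketch of the numbering kens describes, assuming a subset font whose producer hands out character codes in order of first use, starting at 1. The function name `subset_codes` is hypothetical, and real producers are free to choose any other assignment.

```python
def subset_codes(text):
    """Assign each distinct character a code in order of first use, starting at 1."""
    table = {}   # character -> assigned code
    codes = []   # code sequence that would end up in the content stream
    for ch in text:
        if ch not in table:
            table[ch] = len(table) + 1
        codes.append(table[ch])
    return codes, table

codes, table = subset_codes("Hello World")
print(codes)   # [1, 2, 3, 3, 4, 5, 6, 4, 7, 3, 8]
```

The resulting codes carry no meaning outside this one font, which is why the hex data alone cannot be turned back into text.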
  The PDF Reference is all there is, everything is in there15:13.41 
  Of course some PDF producers do write the text encodings as ASCII15:14.01 
pietrop I think I get the gist, you can't really print a string just given the HEX data, it is way more complex than that. You may not be able to know the character code at all, which is why sometimes you can render a pdf but you can't copy and paste text from it15:14.23 
  Oks I will have the n-th read then15:14.47 
  to the PDF reference15:14.56 
kens Yes, this is exactly why copy/paste from some files won't work. You can also add a ToUnicode CMap which maps the character code to a Unicode code point, which will allow copy/paste/search15:15.04 
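A rough sketch of what a ToUnicode mapping buys you, assuming one-byte character codes and a mapping that has already been read into a Python dict. The function name `decode_hex_string` and the sample table are hypothetical; a real reader has to take both the code width and the mapping from the font dictionary.

```python
def decode_hex_string(hex_string, to_unicode, code_bytes=1):
    """Split a PDF hex string such as '<0102>' into character codes and
    map each code through a ToUnicode-style table."""
    digits = "".join(hex_string.split()).strip("<>")
    if len(digits) % 2:
        digits += "0"   # PDF rule: a missing final hex digit is taken to be 0
    raw = bytes.fromhex(digits)
    out = []
    for i in range(0, len(raw), code_bytes):
        code = int.from_bytes(raw[i:i + code_bytes], "big")
        # Codes without an entry become U+FFFD, the replacement character
        out.append(to_unicode.get(code, "\ufffd"))
    return "".join(out)

# Hypothetical table for a subset font that numbered its glyphs by first use
to_unicode = {1: "H", 2: "e", 3: "l", 4: "o", 5: " ", 6: "W", 7: "r", 8: "d"}
print(decode_hex_string("<0102030304050604070308>", to_unicode))   # Hello World
```

With an empty table every code falls through to the replacement character, which matches the case where a file renders fine but cannot be copied or searched.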
pietrop how do you extract a font from a PDF to inspect its map ?15:16.06 
kens Fonts don't have a 'map'; there's an Encoding or a CMap, depending on whether it's a Font or a CIDFont15:16.44 
  These are stored in the Font dictionary in the PDF file15:16.54 
  The actual font data is given by the FontFile key15:17.06 
  In the Font Descriptor if memory serves15:17.13 
ator pietrop: it's a combination of PDF objects and the embedded font15:17.16 
pietrop it's fairly complicated, requires time to pick it up.15:17.52 
ator the Encoding (that kens mentioned) is combined with the encoding of the embedded font file in many strange and fragile ways15:18.11 
kens No! It's not fairly complicated, it's *hideously* complicated :-)15:18.15 
pietrop eheh15:18.23 
ator it is the stuff of nightmares.15:18.28 
pietrop I recall of a tool I used to extract a font from a PDF and open it up in a windowing system, I must say I understood 5% of what I was doing at the time. Do you recall the name ? It seemed a fairly old-school Unix kind of thing (the best ones)15:20.17 
  Google does not say15:20.28 
  thanks all for the explanation by the way15:20.45 
kens I think mutool will extract fonts; you would probably want fontforge after that, I guess15:20.56 
ator mutool extract will pull out the embedded font file15:21.09 
  but that won't give you all the info you need for the TJ string's encoding15:21.46 
pietrop what do you usually use for that ?15:24.57 
ator I use mutool show to quickly find and print PDF objects, faster than opening the file and reading it in a text editor15:26.11 
  but generally, I have to read the PDF file and look at all the font dictionary stuff15:26.29 
  mutool show input.pdf pages/1/Resources/Font15:27.04 
pietrop I get to the ToUnicode and from there I get a list of <XX> <YY> <ZZ> triples.15:31.49 
  mutool is more powerful than I thoght15:32.05 
  thought15:32.07 
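A sketch of how such triples could be expanded, assuming they are bfrange entries of a ToUnicode CMap: first source code, last source code, and the UTF-16BE destination for the first code, with later codes incrementing the last character. The function name `parse_bfrange` and the sample data are hypothetical; the array form of the destination and bfchar pairs are not handled here.

```python
import re

TRIPLE = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")

def parse_bfrange(section):
    """Expand '<lo> <hi> <dst>' triples into a {character code: text} mapping."""
    mapping = {}
    for lo, hi, dst in TRIPLE.findall(section):
        start, end = int(lo, 16), int(hi, 16)
        base = bytes.fromhex(dst).decode("utf-16-be")
        for offset, code in enumerate(range(start, end + 1)):
            # Each later code in the range bumps the last character by one
            mapping[code] = base[:-1] + chr(ord(base[-1]) + offset)
    return mapping

sample = """
<0001> <0003> <0041>
<0010> <0010> <0020>
"""
print(parse_bfrange(sample))   # {1: 'A', 2: 'B', 3: 'C', 16: ' '}
```

A dict built this way has the same shape as a code-to-text table, so it can be used directly to turn the codes from a TJ string into readable text.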
bsdfan12 Netbsd uses it in the packages (python). I guess as well openbsd if I can remember.16:49.03 
sebras bsdfan12: there are python packages that wrap mupdf, such as pymupdf https://pypi.org/project/PyMuPDF/ 17:13.10 
  it contains python bindings for mupdf17:14.27 
  bsdfan12: is this what you are talking about?17:16.18 
  bsdfan12: while this might not be the fastest way to interact with mupdf, having bindings for different languages doesn't impact the mupdf library itself. so why are you asking why mupdf is not fast and efficient?17:17.57 
  I must be misunderstanding you somehow.17:18.13 
bsdfan12 sebras: in the netbsd stable, there is python. I guess it is rather a non necessary requirement since mupdf is meant to be less fat than other PDF viewers (okular,...):17:51.11 
pink_mist bsdfan12: again, there is _NO_ requirement from mupdf's side for python. if your netbsd package requires python, that is netbsd's fault entirely.17:52.14 
bsdfan12 but we cannot do things to allow developers to create better packages. It is not the first time that mupdf has fat package making. debian based OSses it is the very same.17:57.46 
sebras pink_mist: bsdfan12: we do have a few python scripts to build cmap tables in mupdf, but none of them are used at runtime. if netbsd erroneously added a runtime dependency on python to their mupdf package then you need to file a bug with netbsd.17:58.00 
  bsdfan12: https://packages.debian.org/sid/mupdf on debian mupdf does not depend on python.17:58.20 
bsdfan12 Maybe if developers and package maintainers could compile mupdf with ./configure ; make ; ... method, this would allow maybe their work to be simpler.17:58.50 
  Maybe they would not bring mess and stuffs into the clean mupdf.17:59.16 
sebras bsdfan12: then that is a problem with the netbsd package maintainers. you probably need to talk to them.18:01.07 
  pink_mist: are you familiar with bsd? do you know if they have a page similar to debians showing the dependencies?18:02.16 
pink_mist well, there's pkgsrc, which doesn't list a dependency on python: http://pkgsrc.se/print/mupdf 18:05.48 
sebras pink_mist: oh, that's better than the cvs I found: http://cvsweb.netbsd.org/bsdweb.cgi/~checkout~/pkgsrc/print/mupdf/patches/ 18:06.16 
  pink_mist: thanks.18:06.19 
  then I don't understand what the issue is, bsdfan12.18:07.26 
bsdfan12 sebras: there are no issues, it is just a discussion about it. We cannot do anything about it, Unix like system do compile their mupdf the way they want.18:10.32 
pink_mist sebras: ah, I found http://cdn.netbsd.org/pub/pkgsrc/current/pkgsrc/print/mupdf/README.html too .. which does list python as a runtime dependency18:11.54 
sebras pink_mist: I think they might be listing the recursive dependencies for libraries we use.18:14.45 
  pink_mist: e.g. they say we depend on harfbuzz, which is correct, but then http://cdn.netbsd.org/pub/pkgsrc/current/pkgsrc/fonts/harfbuzz/README.html states that harfbuzz requires python to run.18:15.13 
pink_mist ah, that makes some sense then18:15.47 
sebras I'm confused though. because harfbuzz itself appears to be depending on libicu which is claimed to be depending on python18:17.33 
  and indeed libicu depends on python at runtime according to http://cdn.netbsd.org/pub/pkgsrc/current/pkgsrc/textproc/icu/README.html 18:17.57 
  but when i look up libicu63 on debian there is no such dependency!?18:18.59 
bsdfan12 come on - harfbuzz is really meant to be there ?18:19.14 
sebras bsdfan12: yes, harfbuzz is used for right to left text processing.18:19.44 
bsdfan12 bsd or linux are usually putting more stuffs into their packages, it is not first time.18:20.05 
sebras bsdfan12: in our own releases we do build harfbuzz into mupdf itself without depending on the system libharfbuzz, thereby actually removing some dependencies.18:20.45 
bsdfan12 I must say that mupdf is not the easiest to compile, so likely they might add more into the package.18:21.47 
sebras the changes they do to the netbsd package appear to be available here: http://cdn.netbsd.org/pub/pkgsrc/current/pkgsrc/print/mupdf/patches/ 18:22.58 
bsdfan12 yeap :(18:24.04 
sebras Robin_Watts: ator: on sebras/master I attempt to add support for inverting pixmap luminance.18:40.33 
  I did a simple patch to support this in desktop java, perhaps you'd want to leave that out, but I added it so you can make that decision.18:41.08 
  oh and the very first RFC commit on sebras/wip... why am I seeing those changes to the header file? did we forget to add some changes to another commit18:41.46 
  ?18:41.49 
Robin_Watts Passing 3 pointers to a function for every pixel....18:42.13 
  could invert luminance be marked static inline ?18:42.34 
  otherwise lgtm.18:43.39 
sebras ator: ok, I've updated sebras/master according to previous review comments. now you're up! :)23:59.31 