Log of #mupdf at irc.freenode.net.

20200508 
Franciman Hi, how can I use mutool to extract annotations data from a pdf?11:20.55 
sebras Franciman: is this a one off for just a few files, or is this something you want to automate?11:38.23 
Franciman I would like to automate11:38.37 
sebras then my recommendation would be to look at mutool run11:39.26 
  Franciman: https://mupdf.com/docs/manual-mutool-run.html11:40.28 
Franciman ok thanks11:40.31 
  and to have a list of all annotations in the pdf?11:40.40 
  this time this would be a one off, I want to look with my eyes11:40.52 
  to understand what happens11:40.56 
sebras mutool show my.pdf pages.42.Annots would show the annotations on page 42 if I remember correctly.11:44.05 
  mutool show can be used to show any PDF object in the file. mutool show my.pdf trailer e.g.11:44.53 
Franciman https://bpaste.net/JTUA I get this output when running11:45.54 
  mutool show my.pdf pages.1.Annots11:46.02 
  uhm probably I have an older version11:49.09 
sebras maybe, we just released 1.17.0 i think11:49.24 
Franciman 1.12 :>11:49.43 
  damn u ubuntu!11:49.47 
  thank sebras11:49.54 
ator Franciman: you can also use wildcards, like so: mutool show my.pdf pages.*.Annots.*11:54.27 
Franciman oh great11:54.35 
  thanks11:54.36 
ator probably not in 1.12 though, you probably ought to compile from source :)11:54.53 
  (no need to 'make install', the mutool binary is standalone statically linked, no other dependencies)11:55.10 
Franciman ator, can I only compile mutool?11:59.24 
ator make build/release/mutool11:59.40 
Franciman or else I need to install opengl libs11:59.42 
  perfect11:59.44 
  thanks11:59.46 
ator or the simpler: "make tools"11:59.56 
Franciman perfect ty12:00.35 
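For the automation case raised at the start, the `mutool show` invocations quoted above could be wrapped in a script. The following is only a minimal sketch: the helper names `mutool_show_command` and `list_annotations` are hypothetical, and it assumes a `mutool` binary on the PATH that is recent enough to accept the wildcard path ator mentions (probably not 1.12).

```python
import subprocess


def mutool_show_command(pdf_path, object_path="pages.*.Annots.*"):
    """Build the argument list for the `mutool show` call quoted in the log."""
    # Passing a list (no shell) means the '*' wildcards reach mutool
    # literally instead of being expanded by the shell.
    return ["mutool", "show", pdf_path, object_path]


def list_annotations(pdf_path):
    """Run mutool and return whatever it prints for the annotation objects."""
    result = subprocess.run(
        mutool_show_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if mutool exits with an error
    )
    return result.stdout
```

Calling `list_annotations("my.pdf")` returns the raw text output; parsing it is left out because the output format is not shown in this log.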
  sebras, don't know if you remember about those pdfs with embedded audio19:09.04 
  finally I was able to extract the audio19:09.09 
  and listen to it!19:09.18 
sebras Franciman: oh, nice!19:17.13 
  Franciman: when you mentioned it, I do remember it. :)19:17.38 
  Franciman: was it worth the effort to extract it? :)19:17.54 
Franciman oh yes19:17.59 
  the teacher is continuing to give lectures in this style19:18.10 
  she attaches audio to pdfs19:18.15 
  so it's really important19:18.17 
sebras Franciman: right. do you remember the name of the file you gave me?19:18.39 
malc_ would be "interesting" if the audio just dictated the pdf in its entirety19:18.40 
Franciman unfortunately i don't sebras19:19.04 
  but I know what stopped us the last time you tried19:19.10 
  the audio format info was given by the pdf, and correctly too19:19.21 
  you just need to explicitly tell the media reader what it is19:19.33 
  it works with ffmpeg and mpv19:19.37 
  so i can convert it to mp319:19.41 
sebras do you have the name of the latest pdf?19:23.46 
  Franciman: I bet it is similar. :)19:23.54 
Franciman uhm19:24.10 
  w-wait19:24.13 
sebras Franciman: is she Garbagnati?19:24.40 
Franciman prolly19:24.47 
  it is19:25.00 
  she is19:25.02 
sebras ok.19:25.03 
Franciman not my teacher19:25.07 
  but of a fellow friend19:25.13 
sebras then I found the file among the 140 other pdfs in my test directory. :)19:25.17 
Franciman :D19:25.23 
  last time you managed to extract the audio19:25.44 
sebras unfortunately pdfs accumulate faster there than I'm able to resolve problems.19:25.46 
Franciman now if you want to play it, it's easy19:25.52 
malc_ sebras: i'm surprised your test directory is that small :)19:26.02 
Franciman you need to tell your player that it's unsigned 8 bit samples19:26.03 
  the sample rate is 1102519:26.15 
  and there is 1 channel19:26.18 
  if you are using something based off ffmpeg, the file format is u819:26.25 
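The parameters given above (unsigned 8-bit samples, 11025 Hz, 1 channel) are enough to make the extracted data playable without telling each player the format by hand. A minimal sketch, assuming the audio has already been extracted and decompressed into a bytes object; the function name `raw_u8_to_wav` is hypothetical, and it uses only Python's standard `wave` module:

```python
import io
import wave


def raw_u8_to_wav(raw, sample_rate=11025, channels=1):
    """Wrap headerless unsigned 8-bit PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(1)  # 1 byte per sample; 8-bit WAV data is unsigned
        wav.setframerate(sample_rate)
        wav.writeframes(raw)
    return buf.getvalue()
```

Writing the returned bytes to a `.wav` file gives something ordinary players should open directly, since the header now carries the sample format.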
  now, the cool thing is that she must have upgraded her recording set, cause in the next lessons kek19:26.57 
  there is a signed 16 bit floating point samples19:27.06 
  2 channels19:27.07 
  44100 Hz19:27.11 
  top notch!19:27.29 
  there are*19:27.48 
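The newer recordings could be wrapped the same way. This sketch rests on assumptions the log does not confirm: that the samples are signed 16-bit integers (the "floating point" remark is left unresolved) and that they are stored big-endian, so each sample's two bytes must be swapped for WAV, which expects little-endian. The function name `raw_s16be_to_wav` is hypothetical.

```python
import io
import wave


def raw_s16be_to_wav(raw, sample_rate=44100, channels=2):
    """Wrap headerless signed 16-bit PCM in a WAV container.

    Assumes big-endian input (an assumption, not stated in the log).
    """
    if len(raw) % 2:
        raise ValueError("16-bit sample data must have an even byte count")
    swapped = bytearray(len(raw))
    swapped[0::2] = raw[1::2]  # low byte first for little-endian WAV
    swapped[1::2] = raw[0::2]  # high byte second
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample; 16-bit WAV data is signed
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(swapped))
    return buf.getvalue()
```

If the result plays as loud noise, the byte-order assumption is the first thing to revisit.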
  and the pdf is like around 150mb :P19:28.19 
  thank you adobe reader for allowing this19:28.32 
  only thing, sebras I couldn't manage to use the javascript api to get the audio annotations :<19:29.15 
sebras Franciman: I see what I might have done wrong last time! when I extracted the audio binaries I overlooked that they were flate compressed.19:34.11 
  Franciman: so when loading the audio into audacity it was just static all the time.19:34.22 
  d'oh. sorry.19:34.25 
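The static described here is what compressed bytes sound like when played as samples. Flate-compressed data is zlib-format data, so it has to be inflated before it is handed to an audio tool. A minimal sketch, assuming the stream bytes were copied out of the PDF unchanged and that Flate is the only filter applied; the function name `inflate_stream` is hypothetical:

```python
import zlib


def inflate_stream(data):
    """Decompress flate (zlib) compressed stream bytes into raw samples."""
    # zlib.error is raised if the bytes are not valid zlib data, for example
    # when the stream was already decoded or uses a different filter.
    return zlib.decompress(data)
```

The output of `inflate_stream` is the headerless sample data that then needs the format parameters (sample width, rate, channels) supplied by hand.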
  Franciman: if you are not able to extract the audio using a javascript via mutool run then that's something we probably should address at some point.19:35.29 
  even if mupdf itself doesn't play the audio our API should be able to give the data to an app that might know what to do with it.19:36.49 
  otoh, few PDFs appear to be using embedded audio/videos.19:37.15 
Franciman fortunately19:42.24 
  <sebras> Franciman: if you are not able to extract the audio using a javascript via mutool run then that's something we probably should address at some point.19:42.27 
  probably I misused the api19:42.31 
  but I couldn't find a way to access the raw dictionary representing the annotation19:42.44 
  and in the PDFAnnotation class there was no suitable method19:42.55 