| <<<Back 1 day (to 2020/05/07) | Fwd 1 day (to 2020/05/09)>>> | 20200508 |
Franciman | Hi, how can I use mutool to extract annotations data from a pdf? | 11:20.55 |
sebras | Franciman: is this a one off for just a few files, or is this something you want to automate? | 11:38.23 |
Franciman | I would like to automate | 11:38.37 |
sebras | then my recommendation would be to look at mutool run | 11:39.26 |
| Franciman: https://mupdf.com/docs/manual-mutool-run.html | 11:40.28 |
Franciman | ok thanks | 11:40.31 |
| and to have a list of all annotations in the pdf? | 11:40.40 |
| this time this would be a one off, I want to look with my eyes | 11:40.52 |
| to understand what happens | 11:40.56 |
sebras | mutool show my.pdf pages.42.Annots would show the annotations on page 42 if I remember correctly. | 11:44.05 |
| mutool show can be used to show any PDF object in the file. mutool show my.pdf trailer e.g. | 11:44.53 |
Franciman | https://bpaste.net/JTUA I get this output when running | 11:45.54 |
| mutool show my.pdf pages.1.Annots | 11:46.02 |
| uhm probably I have an older version | 11:49.09 |
sebras | maybe, we just released 1.17.0 i think | 11:49.24 |
Franciman | 1.12 :> | 11:49.43 |
| damn u ubuntu! | 11:49.47 |
| thanks sebras | 11:49.54 |
ator | Franciman: you can also use wildcards, like so: mutool show my.pdf pages.*.Annots.* | 11:54.27 |
Franciman | oh great | 11:54.35 |
| thanks | 11:54.36 |
ator | probably not in 1.12 though, you probably ought to compile from source :) | 11:54.53 |
| (no need to 'make install', the mutool binary is standalone statically linked, no other dependencies) | 11:55.10 |
Franciman | ator, can I only compile mutool? | 11:59.24 |
ator | make build/release/mutool | 11:59.40 |
Franciman | or else I need to install opengl libs | 11:59.42 |
| perfect | 11:59.44 |
| thanks | 11:59.46 |
ator | or the simpler: "make tools" | 11:59.56 |
Franciman | perfect ty | 12:00.35 |
| sebras, don't know if you remember about those pdfs with embedded audio | 19:09.04 |
| finally I was able to extract the audio | 19:09.09 |
| and listen to it! | 19:09.18 |
sebras | Franciman: oh, nice! | 19:17.13 |
| Franciman: when you mentioned it, I do remember it. :) | 19:17.38 |
| Franciman: was it worth the effort to extract it? :) | 19:17.54 |
Franciman | oh yes | 19:17.59 |
| the teacher is continuing to give lectures in this style | 19:18.10 |
| she attaches audio to pdfs | 19:18.15 |
| so it's really important | 19:18.17 |
sebras | Franciman: right. do you remember the name of the file you gave me? | 19:18.39 |
malc_ | would be "interesting" if the audio just dictated the pdf in its entirety | 19:18.40 |
Franciman | unfortunately i don't sebras | 19:19.04 |
| but I know what stopped us the last time you tried | 19:19.10 |
| the audio format info was given by the pdf, and correctly too | 19:19.21 |
| you just need to explicitly tell the media reader what it is | 19:19.33 |
| it works with ffmpeg and mpv | 19:19.37 |
| so i can convert it to mp3 | 19:19.41 |
sebras | do you have the name of the latest pdf? | 19:23.46 |
| Franciman: I bet it is similar. :) | 19:23.54 |
Franciman | uhm | 19:24.10 |
| w-wait | 19:24.13 |
sebras | Franciman: is she Garbagnati? | 19:24.40 |
Franciman | prolly | 19:24.47 |
| it is | 19:25.00 |
| she is | 19:25.02 |
sebras | ok. | 19:25.03 |
Franciman | not my teacher | 19:25.07 |
| but of a fellow friend | 19:25.13 |
sebras | then I found the file among the 140 other pdfs in my test directory. :) | 19:25.17 |
Franciman | :D | 19:25.23 |
| last time you managed to extract the audio | 19:25.44 |
sebras | unfortunately pdfs accumulate faster there than I'm able to resolve problems. | 19:25.46 |
Franciman | now if you want to play it, it's easy | 19:25.52 |
malc_ | sebras: i'm surprised your test directory is that small :) | 19:26.02 |
Franciman | you need to tell your player that it's unsigned 8 bit samples | 19:26.03 |
| the sample rate is 11025 | 19:26.15 |
| and there is 1 channel | 19:26.18 |
| if you are using something based off ffmpeg, the file format is u8 | 19:26.25 |
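For players that cannot be told the sample format by hand, the parameters above (unsigned 8-bit samples, 11025 Hz, 1 channel) can be written into a WAV header instead. What follows is only a minimal sketch using Python's standard `wave` module; the function name `wrap_raw_pcm_as_wav` is hypothetical, and it assumes the audio bytes have already been extracted and decompressed. It relies on the fact that 8-bit WAV samples are unsigned, so the raw bytes can be copied unchanged.

```python
import io
import wave


def wrap_raw_pcm_as_wav(raw, sample_rate=11025, channels=1, sample_width=1):
    """Return WAV file bytes for headerless PCM data.

    The defaults mirror the values quoted in the chat: 11025 Hz, mono,
    1 byte per sample. 8-bit WAV audio is unsigned, which matches the
    "u8" format, so no sample conversion is needed.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(raw)
    return buf.getvalue()


if __name__ == "__main__":
    # One second of silence (128 is the midpoint for unsigned 8-bit audio).
    silence = bytes([128]) * 11025
    with open("lecture.wav", "wb") as out:  # hypothetical output name
        out.write(wrap_raw_pcm_as_wav(silence))
```

The resulting file carries its own format description, so audacity, mpv and similar tools should open it without extra options.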
| now, the cool thing is that she must have upgraded her recording set, cause in the next lessons kek | 19:26.57 |
| there is a signed 16 bit floating point samples | 19:27.06 |
| 2 channels | 19:27.07 |
| 44100 Hz | 19:27.11 |
| top notch! | 19:27.29 |
| there are* | 19:27.48 |
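Both sets of parameters mentioned here can be turned into an ffmpeg command line for headerless input. The sketch below only builds the argument list; the helper name `ffmpeg_raw_input_args` and the file names are hypothetical. It assumes integer PCM samples, and it assumes 16-bit samples are stored big-endian, which is how I read the PDF sound object rules, so check against the actual file before trusting the `s16be` choice.

```python
def ffmpeg_raw_input_args(bits, signed, rate, channels, src, dst):
    """Build an ffmpeg argument list for raw (headerless) PCM input.

    -f names the raw sample format, -ar the sample rate and -ac the
    channel count; all three must come before -i so they apply to the input.
    """
    if bits == 8:
        fmt = "s8" if signed else "u8"
    elif bits == 16:
        # Assumption: multi-byte samples are big-endian in the PDF stream.
        fmt = "s16be" if signed else "u16be"
    else:
        raise ValueError("unsupported sample size: %d bits" % bits)
    return ["ffmpeg", "-f", fmt, "-ar", str(rate), "-ac", str(channels),
            "-i", src, dst]


if __name__ == "__main__":
    # Early lectures: unsigned 8-bit, 11025 Hz, mono.
    print(" ".join(ffmpeg_raw_input_args(8, False, 11025, 1, "old.raw", "old.mp3")))
    # Later lectures: signed 16-bit, 44100 Hz, stereo.
    print(" ".join(ffmpeg_raw_input_args(16, True, 44100, 2, "new.raw", "new.mp3")))
```

The list can be passed to `subprocess.run` as is; if the output sounds like noise, trying `s16le` instead is a cheap first check.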
| and the pdf is like around 150mb :P | 19:28.19 |
| thank you adobe reader for allowing this | 19:28.32 |
| only thing, sebras I couldn't manage to use the javascript api to get the audio annotations :< | 19:29.15 |
sebras | Franciman: I see what I might have done wrong last time! when I extracted the audio binaries I overlooked that they were flate compressed. | 19:34.11 |
| Franciman: so when loading the audio into audacity it was just static all the time. | 19:34.22 |
| d'oh. sorry. | 19:34.25 |
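The static described above is what raw flate-compressed bytes sound like when played as samples. A minimal sketch of the missing step, assuming the stream uses a single `/FlateDecode` filter with no predictor, and that the still-compressed bytes have already been saved; `inflate_stream` is a hypothetical helper name. FlateDecode data is in zlib format, so Python's standard `zlib` module can inflate it.

```python
import zlib


def inflate_stream(raw_stream):
    """Decompress the raw bytes of a /FlateDecode PDF stream."""
    return zlib.decompress(raw_stream)


if __name__ == "__main__":
    # Stand-in for extracted audio: compress some fake samples, then inflate.
    samples = bytes(range(256)) * 4
    compressed = zlib.compress(samples)
    restored = inflate_stream(compressed)
    print(len(compressed), "compressed bytes ->", len(restored), "sample bytes")
```

Only the inflated bytes should be handed to the audio player or converter.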
| Franciman: if you are not able to extract the audio using a javascript via mutool run then that's something we probably should address at some point. | 19:35.29 |
| even if mupdf itself doesn't play the audio our API should be able to give the data to an app that might know what to do with it. | 19:36.49 |
| otoh, few PDFs appear to be using embedded audio/videos. | 19:37.15 |
Franciman | fortunately | 19:42.24 |
| <sebras> Franciman: if you are not able to extract the audio using a javascript via mutool run then that's something we probably should address at some point. | 19:42.27 |
| probably I misused the api | 19:42.31 |
| but I couldn't find a way to access the raw dictionary representing the annotation | 19:42.44 |
| and in the PDFAnnotation class there was no suitable method | 19:42.55 |