| <<<Back 1 day (to 2019/06/09) | Fwd 1 day (to 2019/06/11)>>> | 20190610 |
pulsarpietro | Hi all, I am experimenting with the MuPDF library and I've had a read through the docs folder, however I couldn't find the API documentation. | 08:34.24 |
kens | [ul | 08:34.38 |
pulsarpietro | Is it somewhere in the source path? Or is it generated at build time? | 08:34.50 |
kens | All the documentation is, as far as I know, in the source tree, either as PDF or in the header files. I am not, however, a MuPDF developer | 08:35.25 |
| Everyone is travelling back from a staff meeting currently, so it may be a few hours before they are able to respond | 08:35.50 |
| If you stick around, or read the logs at your convenience, someone more knowledgeable than me will reply | 08:36.16 |
pulsarpietro | Thanks a lot | 08:36.53 |
| are you guys based in the US ? | 08:50.11 |
kens | All over the place :-) | 08:50.21 |
pulsarpietro | eheh OK | 08:50.27 |
kens | The MuPDF developers are mostly European | 08:50.33 |
pulsarpietro | So no chance I can get the right timetable | 08:50.42 |
| ah OK. That's good I am based in Europe too | 08:50.56 |
| *right timezone | 08:51.01 |
kens | Well, there's someone around 'most' of the time. One of the MuPDF developers spends up to 6 months at a time in Taiwan, one is in Sweden, one in the UK, some in California | 08:51.23 |
| But when we have staff meetings we are all in the same place. Then we all have to get home, and people often tack a little holiday on the end too, if we are somewhere interesting for the meeting | 08:52.25 |
pulsarpietro | Have you ever built the library from source? | 09:02.54 |
kens | I do fairly frequently, yes | 09:03.06 |
pulsarpietro | What target do you pass to make? I am using the default, but it fails with an obscure assembler error | 09:04.04 |
| Fatal error: can't create build/release/thirdparty/freeglut/src/fg_callbacks.o: Permission denied | 09:04.05 |
kens | You must be using some flavour of Linux, I usually use Windows | 09:04.36 |
Robin_Watts | That's not an assembler error. | 09:04.44 |
pulsarpietro | Right, I use Debian | 09:04.44 |
Robin_Watts | That's a permission error. | 09:04.52 |
kens | Morning Robin_Watts | 09:04.56 |
Robin_Watts | which suggests that you have a problem with the writability of that file. | 09:05.16 |
pulsarpietro | Indeed, a bit of context may help | 09:05.16 |
Robin_Watts | That's out of our control. | 09:05.21 |
| kens: Morning. | 09:05.24 |
pulsarpietro | CC build/release/thirdparty/freeglut/src/fg_callbacks.o Assembler messages: Fatal error: can't create build/release/thirdparty/freeglut/src/fg_callbacks.o: Permission denied Makethird:372: recipe for target 'build/release/thirdparty/freeglut/src/fg_callbacks.o' failed make: *** [build/release/thirdparty/freeglut/src/fg_callbacks.o] Error 2 | 09:05.25 |
| ahhrg, bad formatting. Sorry | 09:05.34 |
Robin_Watts | is not here for another hour at least, really. | 09:05.47 |
pulsarpietro | Assembler messages: | 09:06.02 |
| Fatal error: can't create build/release/thirdparty/freeglut/src/fg_callbacks.o: Permission denied | 09:06.03 |
| Oks, thanks. Will double check my stuff. | 09:06.14 |
kens | Well, it 'seems like' the compiler can't create the file, either because the directory doesn't exist, or you don't have write permissions in the directory | 09:06.15 |
| Hmm, wait, why is it creating the .o file in the *src* directory ? | 09:06.44 |
| Did you pull the sources from our Git repository ? | 09:07.10 |
kens | boots up a Linux | 09:07.36 |
pulsarpietro | Yes but wait - I was the silly man in the room | 09:08.31 |
| :-) | 09:08.39 |
kens | Ah, well I needed to clone the repository in my current setup anyway | 09:08.55 |
pulsarpietro | It's all right. I probably started a build as a root user before - while installing the needed packages. | 09:09.12 |
| sorry about that | 09:09.21 |
kens | Oh, yes, that would probably not work | 09:09.23 |
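A build started as root (as pulsarpietro suspects happened while installing packages) can leave root-owned directories under build/, which a later unprivileged make cannot write into, giving exactly the "Permission denied" above. A minimal, POSIX-only sketch for spotting such leftovers; the function name is hypothetical and this is not part of MuPDF's build system. Removing or chown-ing the offending tree as root is the usual remedy.

```python
import os
from pathlib import Path


def files_not_owned_by_me(root: str) -> list[Path]:
    """List entries under `root` whose owner differs from the current user.

    POSIX-only: relies on os.getuid(). Symlinks are checked themselves
    (lstat), not the files they point to.
    """
    me = os.getuid()
    offenders: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.lstat().st_uid != me:
                offenders.append(path)
    return offenders


if __name__ == "__main__":
    # e.g. run from the top of the mupdf source tree
    for p in files_not_owned_by_me("build"):
        print(p)
```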
| Not a problem, I should have a MuPDF available to me in Linux as well, but I upgraded Ubuntu a while back and hadn't gotten round to it yet | 09:09.54 |
| So this at least prompted me to do that | 09:10.08 |
| cd mupdf | 09:10.25 |
| LOL | 09:10.30 |
pulsarpietro | I downloaded the 1.15.0 source tarball; I am not on mainstream. I think it is reasonable to tell you what I am trying to experiment with. I'd like to "open" a PDF and scan all the objects in a given page, gathering information as I go. This is not for a specific reason as yet, more for my understanding. An initial experiment would be to see if there is a recurrent object in all pages which can be identified as a watermark. | 09:14.59 |
| Please tell me if you reckon I am completely crazy and what I am trying to do makes no sense; even knowing that I don't know enough would help... it's something, I guess | 09:16.03 |
kens | I'm 'reasonably' sure you can do that, but I'm no expert on the internals. A watermark might be a single object common to all pages, or it might be a number of (identical) objects, one per page. It would be easy to spot the first case, less so the second. Though an annotation would be easier to see. | 09:16.28 |
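Kens's first case (one object shared by every page) can be found by counting which object numbers each page refers to. A sketch, assuming you have already collected, per page, the set of object numbers its resources point at (for example from what `mutool show file.pdf pages/1/Resources` prints); that input format and the function name are hypothetical. Identical per-page copies have different object numbers, so the second case would need a content comparison instead.

```python
from collections import Counter


def recurring_objects(pages: list[set[int]], min_fraction: float = 1.0) -> list[int]:
    """Return object numbers referenced by at least `min_fraction` of the pages.

    `pages[i]` is the set of object numbers page i refers to (hypothetical
    input; how it is gathered is up to the caller).
    """
    counts = Counter(num for page in pages for num in page)
    threshold = min_fraction * len(pages)
    return sorted(num for num, seen_on in counts.items() if seen_on >= threshold)


# Object 12 is on every page, so it is a watermark candidate.
print(recurring_objects([{12, 40}, {12, 41}, {12, 42}]))  # [12]
```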
| I just pulled the current HEAD from Git and it's compiling now | 09:17.22 |
| Hmm, no GL, so it failed to build | 09:18.26 |
pulsarpietro | I needed to install CCcc2018 | 09:20.06 |
| ops | 09:20.09 |
| libglu1-mesa-dev freeglut3-dev | 09:20.19 |
kens | It's OpenGL I'm missing, just installing it | 09:20.22 |
| <sigh> one step forward..... | 09:22.06 |
pulsarpietro | libxrandr-dev libxi-dev | 09:25.28 |
| that's what I needed to install as well | 09:25.33 |
kens | I need X11 | 09:25.40 |
pulsarpietro | would you have, off the top of your head, a good example of a C file which traverses a PDF file and gets some data out of it? :-) | 09:26.21 |
kens | Not me, sorry | 09:26.38 |
pulsarpietro | np | 09:26.45 |
kens | OK looks like that build completed | 09:28.10 |
pulsarpietro | ;-) | 09:37.29 |
ator | jarindyk2: the mutool run javascript tool can be used to traverse and print the values of form fields fairly easily | 09:37.49 |
| jarindyk2: there's also "mutool show file.pdf form" that prints all the form fields (but I'm not sure if it prints the current value, if not that should be trivial to add, I use it to find out what javascripts and actions are hooked up to the fields when debugging files) | 09:38.36 |
kens | Morning ator, was your flight OK ? Looked busy..... | 09:38.38 |
ator | kens: very busy, very late | 09:38.50 |
| but we got home eventually | 09:38.55 |
kens | :-( | 09:38.56 |
| 30 minute delay for us | 09:39.18 |
ator | yeah, I saw your flight (and many others) having delays | 09:39.46 |
| and passport control back in sweden was another 30 minute wait in line... that's a first :( | 09:40.01 |
kens | 30 minutes at the end of the day is not unexpected really. Of course we didn't have to do passport control.... | 09:40.18 |
ator | kens: no, but usually you have 10hrs of flying to do, 30 minute delay on a 30 minute flight is quite a bit more proportionally :) | 09:40.56 |
kens | Well, it's a 70 minute flight, but yes | 09:41.15 |
ator | kens: oh, that long? | 09:41.25 |
kens | We should have a meeting in Sweden. Or at least Copenhagen | 09:41.32 |
| ator most of the time is spent going up and then down. | 09:41.43 |
ator | kens: yeah. and circling waiting for a landing slot :) | 09:41.57 |
kens | Its 300+ miles, so that's only 30 minutes at cruising speed | 09:41.59 |
| Gatwick is better than Heathrow when it comes to holding patterns :-) | 09:42.39 |
ator | pulsarpietro: there are some undocumented features that could help with what you're wanting to do, if you're willing to dig through the source a bit | 09:44.25 |
| I'm working on a new set of API documentation, but it's far from complete yet. | 09:44.50 |
| https://ghostscript.com/~robin/mupdf_explored.pdf can also be a handy bit of documentation. bits of it may be a bit out of date, but as a general overview it should be good. | 09:45.58 |
| section 31.4 "PDF Operator Processors" is what I think may help you analyse and find common patterns | 09:47.15 |
| probably easier than at the "Device" level where it's all baked down into low level graphics drawing commands | 09:47.41 |
pulsarpietro | many thanks indeed. I will have a look at both. I'd like to dig into the sources but it's probably going to take me a LONG time. | 09:48.20 |
| I am still reading a lot of the PDF standards and the book "Developing with PDF", which I'm finding very useful | 09:48.50 |
kens | Hmm, don't know that one | 09:49.28 |
pulsarpietro | If you think the easiest way to start off is to use the mutool run javascript I'd go for it | 09:49.30 |
ator | pulsarpietro: my suggestion would be to read the PDF Reference 1.3 (the last "good" version before Adobe started bloating the format with lots of useless features) | 09:49.31 |
pulsarpietro | I need to start somewhere :) | 09:49.34 |
ator | it's short and readable, and if you understand PDF 1.3, then you've got most of the basics | 09:49.51 |
pulsarpietro | Which is the one I've got here, only reason is that it was the cheapest :) | 09:50.04 |
ator | PDF 1.4 adds a horribly complicated transparency model | 09:50.14 |
| PDF 1.5 adds compressed object streams, which are just bleh, but trivial to understand if you already know the PDF format | 09:50.45 |
| PDF 1.6 and 1.7 just add more annotation types and encryption algorithms | 09:51.01 |
| https://www.adobe.com/devnet/pdf/pdf_reference_archive.html here's where you can download the different specs | 09:51.39 |
pulsarpietro | cheers | 09:51.59 |
kens | ator, ran across this interesting SO question regarding Acrobat and encryption | 09:53.09 |
| https://stackoverflow.com/questions/56507280/trying-to-figure-out-why-a-pdf-is-invalid-for-acrobat-reader-but-opens-fine-in-a?noredirect=1#comment99621141_56507280: | 09:53.09 |
| Looks like Acrobat doesn't like Version 2 encryption handler with a PDF 1.5 file using compressed objects and xref | 09:53.39 |
| GS and MuPDF, of course, are entirely happy with it | 09:53.50 |
| But for creation of encrypted PDF files, it may be something to consider; maybe sebras should look at this with his work on producing encrypted PDF files. | 09:54.55 |
ator | kens: huh, that's ... sad but not surprising | 09:55.40 |
kens | Yeah :-( | 09:55.48 |
ator | we don't create compressed object streams, but should we, that'd definitely be something to keep in mind | 09:56.00 |
kens | I can't prove the hypothesis, but everyone except Acrobat can open the file, and I can't make a file like that from Acrobat | 09:56.11 |
| Ah, wasn't aware you didn't do compressed streams yet. That's a TODO for me too | 09:56.30 |
ator | kens: the whole "new" security handler stuff they added in recentish versions of adobe with the per stream CryptFilt stuff seems not very well thought through | 09:57.01 |
| kens: I hate the very idea, it makes the files just gobble up more memory | 09:57.16 |
kens | I'd say the whole encryption thing is not well thought through :-) | 09:57.24 |
ator | if you're concerned about transfer bandwidth, web servers can do gzip compression, and if you worry about disk space, stop creating hugely bloated PDF files :) | 09:57.47 |
| kens: yeah. there's just so many ways the information can leak. | 09:58.04 |
kens | Totally true. Also the compressed objects and xrefs really don't save much, generally, which is why I've not put any priority on it | 09:58.24 |
ator | IMO it's only there to "enforce" the permissions by making it a lot of work to get around it | 09:58.43 |
| s/O/belief/ | 09:59.08 |
kens | And there are plenty of sites that will do it for you, or tell you how. | 09:59.08 |
ator | kens: or just run it through any open source software.... | 09:59.23 |
kens | Yes, many of them recommend using GS | 09:59.33 |
ator | now you can use mutool clean -D (or -E with a new password, using sebras' code) to strip or change the password, but don't mention it to management. | 10:01.07 |
kens | :-D | 10:01.18 |
ator | convincing them what a pointlessly bad idea it would be to start enforcing permissions in open source software would be ... a waste of everybody's time. | 10:01.58 |
kens | Well, GS does enforce permissions, but if you open a PDF file and run it through pdfwrite, you end up with a PDF file with no restrictions | 10:02.45 |
| Assuming you don't need a user password to even open it of course | 10:03.05 |
ator | kens: how do you enforce the "no-print" permission? | 10:03.15 |
kens | I think we simply refuse to process the file | 10:03.34 |
| Because we're, after all, a 'printer' | 10:03.43 |
ator | kens: so how do you run it through pdfwrite in that case? | 10:03.54 |
kens | Well, in that case you can't, of course. | 10:04.05 |
ator | kens: supporting 'no-annotate' is trivial in GS :) | 10:04.15 |
kens | But as you say, its trivial to change the software | 10:04.15 |
| Actually, we don't support no-annotate, and with GS and pdfmark, you can do it! | 10:04.45 |
ator | kens: oh! :) | 10:04.57 |
kens | I really don't propose to try and deal with that, we use pdfmark internally for too many things. Like passing existing annotations | 10:05.46 |
ator | yeah, and as you said, in open source software, it's trivial to remove any such checks anyway, so why waste time making life harder for users? | 10:07.16 |
pulsarpietro | hi all, I am trying to use the mutool run trace-device.js but I get an error when trying to do so. I apologise in advance if I haven't spotted a silly mistake I've made .. | 10:42.58 |
| mutool run docs/examples/trace-device.js ~/pdfs/minimal.pdf 1 | 10:43.07 |
| ReferenceError: 'scriptArgs' is not defined | 10:43.19 |
ator | pulsarpietro: sounds like your mutool is too old | 10:52.48 |
| pulsarpietro: what does 'mutool -v' say? | 10:52.52 |
pulsarpietro | 1.9a | 10:58.08 |
| I've got this tarball, is it too old mupdf-1.15.0-source ? | 10:58.35 |
ator | well, 1.9 is several years old by now | 11:00.30 |
| that tarball is new enough, but that's not the mutool you ran (it should print "1.15.0") | 11:00.58 |
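One way to catch an old mutool earlier on the PATH is to check the version before running a script. Comparing version strings as plain text gets this exact case wrong: "1.9a" sorts after "1.15.0". A small sketch that compares numerically instead; the function name is hypothetical, and the minimum version that provides scriptArgs isn't stated in the log, so choosing it is left to the caller.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Pull the first dotted number out of text like 'mutool version 1.15.0'.

    Trailing letters ('1.9a') are ignored; the result compares numerically.
    """
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


old, new = "1.9a", "mutool version 1.15.0"
print("1.9a" > "1.15.0")                        # True: string order misleads
print(parse_version(old) < parse_version(new))  # True: (1, 9) < (1, 15, 0)
```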
pulsarpietro | shall I clone the GIT repo and jump to a branch ? | 11:01.00 |
| oh dear | 11:01.19 |
| forgot about it | 11:01.23 |
| forget about it - I meant to override the path but I may have opened another terminal ... | 11:01.43 |
| sorry about that II | 11:02.29 |
sebras | Robin_Watts: did you ever try Alexei Podtelezhnikov's suggestion of including intrin.h and using some pragma? | 11:29.50 |
Robin_Watts | sebras: No. It's on my list to try. | 11:32.03 |
sebras | Robin_Watts: ok, I was worried that you might forget after the meeting. :) | 11:32.30 |
| (and having forgotten myself, I found the conversation in my inbox just now) | 11:32.50 |
pulsarpietro | hello, sorry for hammering here. I am getting a bit lost in the pdf-run/murun.c files. Is there a quick win to show, for each page leaf, the objects referenced in the "Contents" key using the javascript binding? | 12:07.02 |
sebras | Robin_Watts: that result still means that the change is in freetype, right? | 12:13.14 |
Robin_Watts | sebras: Yes. We still need to have a slightly changed freetype thirdparty thing until we take a new release from them. | 12:13.43 |
ator | pulsarpietro: the "Contents" entry of a page object is a stream, which contains drawing commands | 12:14.28 |
Robin_Watts | but I was figuring we'd wait for them to put a commit on their dev branch, then cherry pick that onto a branch that hangs off the last release tag. | 12:14.29 |
| then use that. | 12:14.35 |
sebras | Robin_Watts: alright, I'll keep an eye out for that one, and update the thirdparty html accordingly. ;) | 12:14.40 |
Robin_Watts | sebras: Cool. | 12:14.47 |
ator | pulsarpietro: some drawing commands can refer to other resources (fonts, images, other content streams) that are defined in the page Resources dictionaries | 12:15.17 |
sebras | Robin_Watts: seems like it is this one..? https://git.savannah.gnu.org/cgit/freetype/freetype2.git/commit/?id=e13c1f46dc1afb1b2287849be5fa74ef70e0607b | 12:15.40 |
Robin_Watts | sebras: Looks like it, yes. | 12:16.02 |
| Do you want to handle that or should I? | 12:16.23 |
ator | pulsarpietro: if you want to look at the contents of a page, the 'docs/example/trace-device.js' is one place to start | 12:16.51 |
sebras | Robin_Watts: you can compile test this, so it is better if you do it. | 12:16.54 |
Robin_Watts | sebras: OK. I'll get a review up after lunch. | 12:17.06 |
sebras | Robin_Watts: I was referring to the 2.10.1 or 2.11.0 upstream tag. :) | 12:17.08 |
pulsarpietro | Doesn't it reference a stream, or an array of streams? It may be me referring to an older PDF spec though. | 12:18.25 |
| https://web.archive.org/web/20101214132912/http://partners.adobe.com/public/developer/en/pdf/PDFReference13.pdf | 12:18.25 |
| page 620 (or 604 in the actual document) | 12:18.43 |
Robin_Watts | Page contents are a stream or array of streams, yes | 12:19.06 |
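Since /Contents may take either form, code that walks pages usually normalises it to a list first. A sketch that does this on the textual form of the value (as `mutool show` prints it, e.g. the array pulsarpietro quotes further down); the function is hypothetical, and it does not handle the case where a single reference points at an array object rather than a stream.

```python
import re

# An indirect reference: object number, generation number, then 'R'.
INDIRECT_REF = re.compile(r"(\d+)\s+(\d+)\s+R\b")


def contents_refs(value: str) -> list[tuple[int, int]]:
    """Return the (object, generation) pairs in a page's /Contents value.

    Accepts a single reference ('364 0 R') or an array of them
    ('[ 364 0 R 359 0 R ]'); for an array, order is drawing order.
    """
    value = value.strip()
    is_array = value.startswith("[")
    if is_array != value.endswith("]"):
        raise ValueError(f"unbalanced array: {value!r}")
    body = value[1:-1] if is_array else value
    refs = [(int(num), int(gen)) for num, gen in INDIRECT_REF.findall(body)]
    if not refs:
        raise ValueError(f"no indirect references in {value!r}")
    return refs


print(contents_refs("[ 364 0 R 359 0 R 63 0 R 360 0 R 365 0 R ]"))
print(contents_refs("12 0 R"))
```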
| The trace device converts those to a printable list of graphics operations. | 12:20.05 |
pulsarpietro | the trace-device goes really low level I reckon, right into the stream's commands. In my silly attempt I would like to print all *references* contained in the "Content" section for all page leaves. | 12:21.34 |
| *The silly example I am working on ... | 12:22.00 |
| In other words, I need only to access the Page Objects | 12:23.01 |
ator | pulsarpietro: if you run the file through "mutool clean -d input.pdf output.pdf" then you can open up the 'output.pdf' file in a text editor and have an easier way of looking at it | 12:23.49 |
| the '-d' option decompresses all the streams, so you can look at them in the editor without compression getting in the way | 12:24.11 |
| the "mutool show file.pdf pages/1" will display the page object for page 1 | 12:25.35 |
| mutool show file.pdf pages/1/Contents will show the content stream | 12:25.56 |
| mutool show file.pdf pages/1/Resources will show the resource dictionaries | 12:26.06 |
| etc. | 12:26.06 |
sebras | ator: how about the first (two?) commits on sebras/master? | 13:23.31 |
ator | sebras: LGTM. | 13:25.39 |
Robin_Watts | ator, sebras: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=3c75966a611468faeac8d8ef290817df508dd50c | 13:39.51 |
pulsarpietro | ator: thanks | 13:40.00 |
ator | Robin_Watts: LGTM | 13:41.42 |
Robin_Watts | Ta. I'll push it once the (belt and braces) cluster run finishes. | 13:42.26 |
sebras | Robin_Watts: LGTM2. | 13:47.20 |
Robin_Watts | Ta2. | 13:47.37 |
pulsarpietro | ator: can I render a single content item only on the screen somehow ? For instance if I've got an array of Contents which is [ 364 0 R 359 0 R 63 0 R 360 0 R 365 0 R ] and I want to render them one by one on the screen to see they are. | 13:56.11 |
| *what they are | 13:56.27 |
kens | Not reliably | 13:56.40 |
| Each content stream need not be independent | 13:56.49 |
| The graphics state may be set up by a prior content stream, and relied upon by a future one | 13:57.16 |
Robin_Watts | pulsarpietro: As an example, your complete stream might be "0 0 100 100 re f" say. | 13:57.31 |
pulsarpietro | yeah it makes sense | 13:57.46 |
Robin_Watts | And your first stream could be "0 0 1" and the second "00 100 re f" | 13:57.50 |
| You can't even assume it'll break at a token boundary. | 13:58.10 |
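Robin's example can be reproduced with a deliberately naive tokenizer: the PDF specification treats an array of content streams as if the streams were concatenated in order, so tokenizing each piece on its own gives a different (and meaningless) result. Both helpers are hypothetical and split on whitespace only, ignoring strings, comments and inline images.

```python
def tokens(stream: bytes) -> list[bytes]:
    """Naive tokenizer: whitespace-separated only (ignores strings, comments, ...)."""
    return stream.split()


def page_tokens(parts: list[bytes]) -> list[bytes]:
    """Tokenize a page whose /Contents is an array of streams, as one stream.

    Joining with no separator reproduces Robin's example. The trade-off: a
    reader that inserts whitespace between parts would read '0 0 1 00 100 re f'
    here instead, while joining with nothing fuses tokens when one part ends
    with, say, 'f' and the next starts with 'q'.
    """
    return tokens(b"".join(parts))


parts = [b"0 0 1", b"00 100 re f"]
print(page_tokens(parts))  # the tokens of '0 0 100 100 re f'
print(tokens(parts[0]))    # first part alone: three operands, no operator
```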
pulsarpietro | My naive idea was that a watermark would be one of the listed contents shared by all pages | 13:58.44 |
Robin_Watts | pulsarpietro: Are you looking to REMOVE watermarks? | 13:59.07 |
pulsarpietro | but it could be anything, "embedded" within all page's streams | 13:59.19 |
Robin_Watts | Yes. | 13:59.23 |
pulsarpietro | I am playing with it yes | 13:59.28 |
Robin_Watts | It's a fairly frequent thing that the first thing a stream does is to clear the entire backdrop with a fill operation. | 13:59.52 |
| (It's not required, but lots of PDF producers do it). | 14:00.04 |
| so any watermark done by prepending a "draw the watermark" bit of content, then the original content wouldn't work. | 14:00.32 |
| so watermarks are more likely to be done using transparency operations AFTER the page content has been drawn. | 14:00.53 |
| But, you can't be sure that the PDF graphics state at the end of the page writing will be sensible. | 14:01.28 |
pulsarpietro | Oks - I am completely off track then. I did not expect it to be simple though. | 14:01.34 |
Robin_Watts | So you might think you can do "q" <original PDF content> "Q" <watermark content> | 14:02.04 |
| BUT even that can fail, as the original PDF content might not have matching q/Q counts. | 14:02.24 |
| This is part of the reason that our pdf filter processor exists; so we can sanitize existing streams to make sure they are 'sane' so we can append watermarks etc properly. | 14:03.23 |
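The q/Q problem Robin describes can be illustrated with a small, hypothetical helper: count unmatched operators in the original content, put enough extra q's in front to absorb stray Q's and enough Q's at the end to close unclosed q's, then append the watermark. This only illustrates the idea and is not MuPDF's filter processor; it tokenizes on whitespace, so a q or Q inside strings, comments or inline image data would fool it, and it assumes the watermark content is itself balanced.

```python
def append_watermark(content: bytes, watermark: bytes) -> bytes:
    """Append `watermark` so it is drawn from the page's initial graphics
    state, even if `content` has unbalanced q/Q operators.

    Naive: operators are found by splitting on whitespace only.
    """
    depth = 0            # q-nesting depth reached inside the original content
    unmatched_close = 0  # Q operators in the content with no q to pop
    for tok in content.split():
        if tok == b"q":
            depth += 1
        elif tok == b"Q":
            if depth:
                depth -= 1
            else:
                unmatched_close += 1
    # Extra q's up front are consumed by the unmatched Q's, so the outermost
    # save survives; the Q's at the end close the content's open q's plus ours.
    prefix = b"q\n" * (unmatched_close + 1)
    suffix = b"\nQ" * (depth + 1)
    return prefix + content + suffix + b"\n" + watermark


# '/Wm Do' paints a made-up XObject named Wm.
print(append_watermark(b"q 1 0 0 1 50 50 cm", b"/Wm Do").decode())
```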
pulsarpietro | This is complicated stuff, it looks like. | 14:06.21 |
kens | It would be a lot less complicated if PDF producers created decent files | 14:06.49 |
pulsarpietro | sorry folks, I run mutool clean -d input.txt output.txt 1 | 14:09.32 |
| but the output file is unreadable (vim). The input source is uncompressed, not sure if that makes any difference | 14:10.22 |
| and by saying that I mean that I can read PDF objects (tags and this kind of stuff, not streams). I'd like to see the streams' contents (aka the operations) | 14:11.33 |
kens | Objects are often uncompressed (compressed object and xref streams are a PDF 1.5 feature), so that doesn't mean it's an uncompressed file. | 14:12.09 |
| When you decompress it, any images and fonts will be uncompressed as well | 14:12.30 |
| Which means you end up with a file with loads of binary in it, this often confuses editors | 14:12.46 |
pulsarpietro | is there a specific tool for handling "source" pdfs ? | 14:14.57 |
kens | Not sure what you mean. | 14:15.13 |
Robin_Watts | mutool clean -difggg input.txt output.txt 1 | 14:15.50 |
| D'Oh. wait. | 14:16.03 |
| mutool clean -difggg input.pdf output.pdf 1 | 14:16.11 |
kens | was still reading the 'help' | 14:16.31 |
Robin_Watts | That takes input.pdf and produces output.pdf from page 1 of it. | 14:16.34 |
| -d says "decompress the contents" | 14:16.47 |
| -i says "don't decompress images" | 14:16.54 |
| -f says "don't decompress fonts" | 14:17.02 |
pulsarpietro | Yeah I am reading the documentation | 14:17.13 |
Robin_Watts | -ggg says "garbage collect away as many objects as possible" | 14:17.19 |
pulsarpietro | It does work | 14:17.29 |
| :) | 14:17.34 |
Robin_Watts | So output.pdf should be a fairly readable form. | 14:17.51 |
| If you throw in a -s, then we'll "sanitize" the page contents too. | 14:18.09 |
| but then you're not really looking at the source. | 14:18.25 |
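If you run this often, the options Robin lists can be assembled programmatically. A small, hypothetical Python wrapper; it only uses the flags named above (-d, -i, -f, -g repeated up to three times, -s) in the same combined form Robin types, treats three g's as the maximum on the assumption that only the levels discussed here apply, and leaves running the command to the caller.

```python
def clean_argv(infile: str, outfile: str, pages: str = "", *,
               decompress: bool = True, skip_images: bool = True,
               skip_fonts: bool = True, gc_level: int = 3,
               sanitize: bool = False) -> list[str]:
    """Build a `mutool clean` command line.

    -d decompress, -i leave images compressed, -f leave fonts compressed,
    -g/-gg/-ggg garbage collection level, -s sanitize content streams.
    """
    if not 0 <= gc_level <= 3:
        raise ValueError("gc_level must be 0..3")
    flags = "".join([
        "d" if decompress else "",
        "i" if skip_images else "",
        "f" if skip_fonts else "",
        "g" * gc_level,
        "s" if sanitize else "",
    ])
    argv = ["mutool", "clean"]
    if flags:
        argv.append("-" + flags)
    argv += [infile, outfile]
    if pages:
        argv.append(pages)
    return argv


if __name__ == "__main__":
    cmd = clean_argv("input.pdf", "output.pdf", "1")
    print(" ".join(cmd))  # mutool clean -difggg input.pdf output.pdf 1
    # subprocess.run(cmd, check=True) would actually execute it
```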
ator | pulsarpietro: you can add -a to the mutool clean options, to make sure everything is ASCII (but this will sometimes hide stream contents by asciihex encoding them if they have binary data) | 14:37.40 |
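ASCIIHex is the standard PDF filter (ASCIIHexDecode) that ator's "asciihex encoding" refers to: every byte becomes two hex digits, whitespace is ignored and '>' marks the end of the data, which is why binary streams stay unreadable even though the file becomes pure ASCII. A minimal encode/decode sketch of that filter; the line width is an arbitrary choice, and nothing here is taken from mutool's code.

```python
import binascii

PDF_WHITESPACE = b"\x00\t\n\x0c\r "  # the six whitespace bytes in PDF


def ascii_hex_encode(data: bytes, line_width: int = 64) -> bytes:
    """Encode for /ASCIIHexDecode: hex digit pairs, then the '>' EOD marker."""
    hexed = binascii.hexlify(data)
    lines = [hexed[i:i + line_width] for i in range(0, len(hexed), line_width)]
    return b"\n".join(lines) + b">"


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode /ASCIIHexDecode data: skip whitespace, stop at '>', and treat an
    odd final digit as if a 0 followed it."""
    digits = bytes(c for c in data.split(b">", 1)[0] if c not in PDF_WHITESPACE)
    if len(digits) % 2:
        digits += b"0"
    return binascii.unhexlify(digits)


print(ascii_hex_encode(b"\x00\xffBT"))  # b'00ff4254>'
```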
pulsarpietro | Guys - I'd like to thank you for your generosity. I know it can be a pain to explain things to a newbie. I've managed to remove the watermark from my PDF, as it is an image listed under XObject and it is applied using transparency. I don't think it is a universal solution, but it does the job in my case and, more importantly, I got a better understanding of the document's structure. | 14:54.46 |
kens | It's always good to have a project to give you a goal | 14:55.19 |
| It's not entirely unusual to have watermarks defined that way, but they are relatively easy to remove; the producers got wise to that and made it harder in later versions of their software | 14:55.55 |
sebras | wrt 701182 I was also thinking about named destinations and outlines which I suspect may both leak information. | 14:58.01 |
| making redactions watertight is a hard problem imho. | 14:59.06 |
ator | sebras: garbage collecting everything not referenced anymore after redacting would probably be necessary -- name tables, resources, table of content outline entries, etc. | 15:01.19 |
| name trees* | 15:01.31 |
sebras | ator: yes, and since redaction is an internal destructive operation I'd expect that removing anything unused (but not actively redacted) would not pose a problem. | 15:06.51 |
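The garbage collection ator mentions is, at its core, a reachability pass: mark everything reachable from the trailer and drop the rest. A sketch on a simplified object graph (object number to the set of object numbers it references); the representation and names are hypothetical. Note the limit sebras and ator are circling: name-tree and outline entries are themselves reachable from the catalog, so a pure reachability pass keeps them, and whatever text they contain; they would have to be pruned explicitly first, which is not shown.

```python
from collections import deque


def unreachable(refs: dict[int, set[int]], roots: set[int]) -> set[int]:
    """Return object numbers not reachable from `roots` (mark, then sweep).

    refs[n] is the set of object numbers that object n references;
    roots would typically be the objects the trailer points at.
    """
    seen: set[int] = set()
    queue = deque(roots)
    while queue:
        num = queue.popleft()
        if num in seen:
            continue
        seen.add(num)
        queue.extend(refs.get(num, ()))
    return set(refs) - seen


graph = {1: {2}, 2: {3, 4}, 3: set(), 4: set(), 5: {3}, 6: {6}}
print(sorted(unreachable(graph, roots={1})))  # [5, 6]
```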
pulsarpietro | hello, I've stumbled upon a bizarre behaviour of mutool clean - if I give it an awkward file name it seems to hang. Try mutool clean -asdifggg pdffile ./pdf.p59B6bGnE, for example. | 17:13.44 |
ator | pulsarpietro: try with one fewer 'g' | 17:21.18 |
| the first two are useful, the third one makes things a *lot* slower | 17:21.44 |
| I do not know why it hangs with three g's, that would be a question for Robin_Watts | 17:22.17 |
| the -ggg deduplication can make for some fairly small savings, but at a huge processing cost | 17:22.54 |
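What the third g adds is merging duplicate objects. The log doesn't say how mutool implements it; as an assumption, here is the general idea using hashes of each object's serialized bytes, which finds exact byte-for-byte duplicates in one pass. Objects that differ only in pointing at two different-but-identical objects need the pass repeated after references are rewritten, one way in which deduplication gets more involved.

```python
import hashlib


def duplicate_map(objects: dict[int, bytes]) -> dict[int, int]:
    """Map each object number to the lowest-numbered object with identical bytes.

    Single pass: objects that would only become identical after their own
    references are rewritten are not merged. SHA-256 collisions are ignored;
    a byte comparison on each match would make it exact.
    """
    first_seen: dict[bytes, int] = {}
    mapping: dict[int, int] = {}
    for num in sorted(objects):
        digest = hashlib.sha256(objects[num]).digest()
        mapping[num] = first_seen.setdefault(digest, num)
    return mapping


objs = {1: b"<< /Type /Font /BaseFont /Helvetica >>",
        2: b"<< /Length 0 >>",
        3: b"<< /Type /Font /BaseFont /Helvetica >>"}
print(duplicate_map(objs))  # {1: 1, 2: 2, 3: 1}
```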
Robin_Watts | Really? The filename makes a difference? | 17:22.59 |
ator | Robin_Watts: no, but -ggg with those other options do | 17:23.09 |
Robin_Watts | ator: pulsarpietro seems to be claiming the name makes a difference. | 17:24.05 |
pulsarpietro | No, no, here the filename makes all the difference | 17:24.09 |
Robin_Watts | I agree that using 3 g's will make it much slower (so much so, potentially, that you might think it's hung) | 17:24.33 |
pulsarpietro | I've cut down my doc to be 2 pages so to make that clear | 17:24.57 |
| mutool clean -asdifggg TAKKO_1Q17.pdf.sub ./pdf.p59B6bGnE | 17:25.05 |
| that does not terminate - waited for a few mins | 17:25.15 |
| time mutool clean -asdifggg TAKKO_1Q17.pdf.sub ./a.pdf  →  real 0m1.104s, user 0m1.000s, sys 0m0.088s | 17:25.29 |
| I can't rule out I am making some silly mistake, but I can't see what at the moment | 17:25.51 |
ator | pulsarpietro: it does not treat your second argument as an output filename; since it doesn't end with ".pdf" it tries to parse it as a page number/range | 17:26.02 |
pulsarpietro | mutool version 1.15.0 | 17:26.02 |
Robin_Watts | ator: Ah... | 17:26.25 |
pulsarpietro | right so the extension does make the difference | 17:26.32 |
ator | I just stumbled on a file which hangs (or takes more than a minute) to do -ggg | 17:26.38 |
Robin_Watts | ator: pdf_reference17.pdf :) | 17:26.52 |
ator | pulsarpietro: it gets stuck parsing your filename as an infinite sequence of "1-1" page ranges | 17:28.21 |
| the page range parsing doesn't do a lot of error checking :) | 17:28.31 |
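ator's diagnosis is that page-range parsing does little error checking, so an argument it doesn't recognise turns into an endless run of 1-1 ranges. A stricter parser would reject such input outright. This is a hypothetical Python sketch, not mutool's parser; the syntax it accepts (comma-separated pages and ranges, N for the last page, ranges that may run backwards) is an assumption modelled on mutool's usage text rather than something stated in the log. The practical fix for pulsarpietro's case is simply an output name ending in .pdf.

```python
import re

# One comma-separated item: a page number or N, optionally '-' and another.
ITEM = re.compile(r"(\d+|N)(?:-(\d+|N))?")


def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Expand e.g. '1,3-5,N' into page numbers; raise ValueError on anything else."""
    def page(token: str) -> int:
        number = page_count if token == "N" else int(token)
        if not 1 <= number <= page_count:
            raise ValueError(f"page {number} outside 1-{page_count}")
        return number

    pages: list[int] = []
    for item in spec.split(","):
        match = ITEM.fullmatch(item.strip())
        if match is None:
            raise ValueError(f"not a page or range: {item!r}")
        start = page(match.group(1))
        end = page(match.group(2)) if match.group(2) else start
        step = 1 if end >= start else -1
        pages.extend(range(start, end + step, step))
    return pages


print(parse_page_range("1,3-5,N", 10))  # [1, 3, 4, 5, 10]
try:
    parse_page_range("./pdf.p59B6bGnE", 2)
except ValueError as err:
    print("rejected:", err)
```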
| Robin_Watts: almost, pdfref17.pdf cut down to the first page by a previous mutool clean :) | 17:28.59 |