| | 20160630 |
tor8 | Robin_Watts: I'm looking at the code in pdf-clean-file.c that creates subsets of pages | 12:52.41 |
| it looks like it's going to fail whenever any page items are inherited from the page tree (such as MediaBox and Resources) | 12:53.21 |
inkbottle | How can I check for "existing embedded annotations" in a PDF file? (looking for a workaround to https://bugs.kde.org/show_bug.cgi?id=357526) | 14:47.16 |
kens | Well you could grep for Annotation | 14:47.45 |
Robin_Watts | That's an okular bug, right? | 14:48.51 |
kens | Make that 'Annots' | 14:48.59 |
inkbottle | Robin_Watts: kens: yes... because of it I cannot annotate the document in okular | 14:50.21 |
kens | Not *our* problem though | 14:50.34 |
inkbottle | there are 2 examples of files that trigger the bug in the bug page | 14:50.43 |
kens | Why not ask the Okular people ? | 14:50.45 |
| Not being funny, but why should we spend any time looking at this ? | 14:51.06 |
inkbottle | no, my question is: I would like to try to see if there are such annotations | 14:51.22 |
| and 2, remove them | 14:51.33 |
kens | I gave you one answer, you didn't say anything about removing them | 14:51.47 |
| It would be possible to do that with Ghostscript, but you'd have to hack the PDF interpreter so that it didn't pass on annotations. Of course, you'd be getting a different PDF file, but then, that's what you want | 14:52.39 |
inkbottle | kens: OK, yes you did, I'll do the grep then and be back | 14:52.44 |
Robin_Watts | Neater solution would be to do it by manipulating the PDF file using "mutool run". | 14:53.33 |
| tor8: ^ | 14:54.04 |
inkbottle | kens: I did the search and it yielded a positive match; and a negative one with a modified version obtained through "pdfjam" (the modified version lost TOC) | 14:57.44 |
| Robin_Watts: ok for mutool run | 14:58.00 |
Robin_Watts | inkbottle: A simple grep finding "Annots" is a pretty good indication that a file has annotations. There may be false positives. | 14:58.29 |
| Certainly there will be false negatives however. | 14:58.38 |
kens | Hmm, which annotations can you have that are not represented by an Annots in the page dictionary? | 14:59.11 |
Robin_Watts | kens: If object compression is on, a grep will fail. | 14:59.43 |
kens | Are page objects compressed ? I can never remember | 15:00.03 |
Robin_Watts | All objects can be compressed, except for the trailer stuff, AIUI. | 15:00.23 |
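The grep caveat above can be sketched in Python. This is a minimal illustration, not a tool from either project: the function names `grep_annots` and `verdict` are hypothetical, and the check for `/ObjStm` is an assumption about how to spot object streams in the raw bytes. A file with object streams and no visible `/Annots` is reported as inconclusive rather than as annotation-free.

```python
import re


def grep_annots(pdf_bytes: bytes) -> dict:
    """Scan raw PDF bytes the way a plain grep would (hypothetical helper)."""
    return {
        # Count occurrences of the /Annots name in the uncompressed bytes.
        "annots_hits": len(re.findall(rb"/Annots\b", pdf_bytes)),
        # Object streams (/Type /ObjStm) can hide page dictionaries from grep.
        "has_object_streams": b"/ObjStm" in pdf_bytes,
    }


def verdict(pdf_bytes: bytes) -> str:
    """Turn the raw scan into a hedged answer."""
    info = grep_annots(pdf_bytes)
    if info["annots_hits"] > 0:
        # May still be a false positive, e.g. the text sits in unrelated data.
        return "annotations likely"
    if info["has_object_streams"]:
        # Page objects may be compressed, so a negative grep proves nothing.
        return "inconclusive: object streams present"
    return "no /Annots found"


# Usage (path is illustrative):
#   with open("hello.pdf", "rb") as f:
#       print(verdict(f.read()))
```

Running `mutool clean -d` on the file first, as mentioned later in the log, would decompress it and make the plain scan more trustworthy.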
kens | If so then you could do it by having Ghostscript do the job for you, in the manner of pdf_info.ps | 15:00.24 |
inkbottle | Robin_Watts: yes to "false positive", but the second clue obtained with the same file treated with "pdfjam" increases the "likelihood" | 15:00.25 |
| Reading mutool man page | 15:01.13 |
kens | wonders why I can never find pdf_info.ps; is it in toolbin? | 15:01.29 |
| Oh, yes it is | 15:01.40 |
| Well a relatively simple modification to pdf_info.ps would allow it to determine if Annotations are present | 15:02.18 |
| The code already checks for Annots in order to find out which fonts are used, so it would be trivial to add a detection for it | 15:03.02 |
Robin_Watts | A mutool run based solution seems better to me. 1) It would preserve more of the original file structure, 2) it requires coding in javascript, not Postscript - I know which one most people will find preferable. | 15:04.05 |
kens | I'm simply offering alternatives. And it would be very easy to add the detection to the existing info in pdf_info.ps | 15:04.42 |
inkbottle | Robin_Watts: I did "mutool clean -d hello.pdf hello_clean.pdf" (http://ghostscript.com/pipermail/gs-cvs/2016-April/019748.html about adding mutool run documentation) | 15:06.56 |
| Robin_Watts: As usual, you won't advise me to remove the Annots by hand... | 15:07.34 |
| ;-) | 15:07.46 |
kens | Apparently in that file only page 1 uses Annotations, and it is also the only page which uses transparency | 15:07.57 |
Robin_Watts | inkbottle: mutool clean -d will produce a decompressed version. You don't need to do that if you use mutool run. | 15:08.13 |
| You do need to write a javascript script to manipulate the PDF file. | 15:08.36 |
kens | will commit the modified pdf_info.ps, may as well have more information available. | 15:08.38 |
Robin_Watts | Various examples of such scripts are given in the documentation added in that commit. | 15:08.55 |
inkbottle | Robin_Watts: OK... | 15:09.27 |
Robin_Watts | The expert here is tor8. | 15:09.44 |
inkbottle | ... | 15:10.36 |
kens | Ah Ghostscript already has a feature to do this. If you set ShowAnnots to false I believe pdfwrite will produce a PDF with the Annotations stripped out | 15:10.54 |
| So gs -sDEVICE=pdfwrite -dShowAnnots=false -sOutputFile=stripped.pdf in.pdf should do it | 15:11.17 |
| Ghostscript isn't very happy about page 1 of that file anyway, looks like it's been modified and broken in the process | 15:13.33 |
| Of course, a smaller file would be better | 15:13.49 |
| Seems GS doesn't like a lot of pages in that file, there's a bad Form in there. It doesn't affect the output as far as I can tell, but I'm not going to carefully check 1442 pages | 15:15.09 |
| The resulting file has no annotations. So there is a complete answer. | 15:16.05 |
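For anyone scripting the Ghostscript route, here is a minimal Python sketch that builds the command kens gave. The helper names `strip_annotations_cmd` and `strip_annotations` are hypothetical. The `-dBATCH` and `-dNOPAUSE` switches are not in the command quoted in the log; they are added here on the assumption that the run should be non-interactive. As noted in the log, pdfwrite produces a new PDF rather than editing the original, so the output should be checked.

```python
import subprocess


def strip_annotations_cmd(src: str, dst: str, gs: str = "gs") -> list:
    """Build the argv for the pdfwrite command quoted in the log."""
    return [
        gs,
        "-sDEVICE=pdfwrite",
        "-dShowAnnots=false",    # from the log: do not pass annotations on
        "-dBATCH",               # assumption: exit when processing is done
        "-dNOPAUSE",             # assumption: no prompt between pages
        f"-sOutputFile={dst}",
        src,
    ]


def strip_annotations(src: str, dst: str, gs: str = "gs") -> None:
    """Run Ghostscript; raises CalledProcessError on a non-zero exit."""
    subprocess.run(strip_annotations_cmd(src, dst, gs), check=True)


# Usage (file names are illustrative):
#   strip_annotations("in.pdf", "stripped.pdf")
```

Keeping the argv construction separate from the call makes it possible to inspect or log the exact command before running it.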
inkbottle | kens: simply astonishing | 15:16.05 |
kens | So I had to modify ghostpdl/toolbin/pdf_info.ps, I added: | 15:17.52 |
| dup /Annots pget { | 15:17.53 |
| pop | 15:17.53 |
| ( Page contains Annotations) print | 15:17.53 |
| } if | 15:17.53 |
| At line 115 of that file | 15:17.53 |
inkbottle | Robin_Watts: Ken: I will have to dig into mutool sometime ;-) (Also I'm happy I didn't today because javascript is not my forte) | 15:18.15 |
kens | The caveats that Robin mentioned apply: MuPDF would do less interpretation of the PDF file, and the resulting output would be 'more like' the original than the Ghostscript output. As I said, there are some problems with that file (there's a form with unbalanced q and Q operators) and it's always possible that the result will not be acceptable. MuPDF is generally better for 'manipulating' PDF files, but it happens that we already have a canned so | 15:20.26 |
HenryStiles | I created an email list for mupdf mobile stuff. If you didn't just get an email then you are not on it; if you want to be on it, let me know | 15:23.04 |
inkbottle | kens: Also I'm in the process of learning ps and pdf (the pdf doc was PLRM.pdf), so I'm happy there was an easy solution | 15:24.46 |
| kens: Robin_Watts: thanks a lot | 15:25.01 |
kens | You're welcome | 15:25.06 |
inkbottle | kens: I'm putting the gs command as a workaround in the aforementioned kde bug, giving the #ghostscript irc channel as the source: I hope that's all right with you | 15:45.01 |
kens | Certainly, please do include a warning about the conversion, you can point people to the Ghostscript documentation, particularly the overview in VectorDevices.htm | 15:45.39 |
tor8 | inkbottle: if you check out mupdf, the mutool run docs are in docs/mutool/run.html with several examples in docs/mutool/examples/ | 21:39.18 |
inkbottle | tor8: got it, thanks | 22:03.04 |