| <<<Back 1 day (to 2020/04/02) | Fwd 1 day (to 2020/04/04) >>> | 20200403 |
chrisl | Wobak: ask your question - it may not get answered immediately, but the channel is logged so someone will see it eventually (depending on timezones) | 05:38.49 |
Wobak | chrisl, thanks :) | 07:51.59 |
| I am using ghostscript to try and apply a "watermark" to a PDF file. And it's working pretty fine. My problem is that on some PDF files, it just doesn't work, and I have no idea why. | 07:52.30 |
chrisl | Wobak: For anyone to answer that, we'd probably need to see an example PDF and the exact method you are using to apply the watermark | 08:06.03 |
Wobak | ok | 08:10.49 |
| let me try and give a one pager of a file that works and a file that doesn't work | 08:11.03 |
| and the ps file I use to apply the watermark | 08:11.12 |
kens | The one that doesn't work is more interesting | 08:11.16 |
Wobak | So. The ps file I'm using to create the watermark is : https://pastebin.com/4YvTLmnC | 10:51.08 |
| The command line i'm using is : gs -dBATCH -dNOPAUSE -dQUIET -dPDFSETTINGS=/prepress -sProcessColorModel=DeviceRGB -sOutputFile="working-watermarked.pdf" -SDEVICE=pdfwrite mark.ps working-extract.pdf | 10:51.14 |
| and I've put 2 extracts (working and non-working), plus the output after running that command line | 10:51.52 |
| in this share : https://cloud.wobak.fr/index.php/s/RBdByYpKqtANuxw (pw : ghost ) | 10:51.59 |
| so when I watermark the working extract => I get working-watermarked, and when I watermark the notworking-extract, I get notworking-watermarked | 10:52.26 |
| except the watermark isn't there :( | 10:52.42 |
kens | The likely problem is that you are drawing the 'watermark' on the page first, then drawing the PDF content over the top of it. Anything in the PDF file will therefore obliterate what's already there. If the PDF file starts by (for example) drawing a white rectangle over the page, then your watermark will obviously vanish | 10:52.47 |
Wobak | hmmmm ok | 10:53.01 |
| so it might depend on how the PDF is build | 10:53.12 |
| built* | 10:53.13 |
kens | I'll look at the files, need a minute | 10:53.16 |
Wobak | if I understand you properly ? | 10:53.19 |
kens | Essentially, yes. | 10:53.26 |
Wobak | let me know if I can help by being clearer | 10:53.32 |
| I'm a 100% newbie in GS, as I found that method on the web, and trying to adapt it to my needs :) | 10:53.51 |
kens | not_working extract is the input file ? | 10:54.18 |
Wobak | yes | 10:54.54 |
| and notworking-watermarked the output | 10:55.00 |
| I've put the cmdline.txt file | 10:55.05 |
| to show the command I'm using | 10:55.09 |
| and the mark.ps file | 10:55.11 |
| in the share as well | 10:55.16 |
kens | Yeah I really just need to look at what's in the input file | 10:55.16 |
Wobak | ok | 10:55.20 |
| curiosity question : how do you look at it ? | 10:55.30 |
| if I get what you're saying properly, my watermark may actually be there, but behind the PDF | 10:55.52 |
kens | I decompress it then use a binary editor | 10:55.53 |
Wobak | ok | 10:55.56 |
kens | In the watermarked file yes. Consider if the whole page was a bitmap image, the bitmap would be on top of your watermark and so would obscure it | 10:56.37 |
| Ah, your watermark is rather discreet, I couldn't see it at all at first | 10:59.39 |
chrisl | It's kind of worrying if EndPage isn't working correctly | 11:01.06 |
kens | Yes, I hadn't realised it was an EndPage procedure | 11:01.28 |
| I just noticed that. | 11:01.33 |
| I can see the linework in the working file, need to look again at the not working case | 11:02.11 |
chrisl | Is there transparency involved? | 11:02.21 |
kens | Oh yes :-) | 11:02.29 |
| page group at least | 11:02.33 |
chrisl | So, maybe the opacity setting is hanging around | 11:02.50 |
| Or blend mode... or something | 11:02.59 |
kens | multiple forms, each with groups, Multiply blend mode.... | 11:03.01 |
| It 'looks like' the linework simply isn't there in the not working case. | 11:04.07 |
| Which would imply either that the EndPage isn't being executed, or that pdfwrite isn't putting the content into the page stream | 11:05.19 |
Wobak | so just for the record, I'm kind of not bad in what I do, but right now I feel like a kid listening to adults talking high level philosophy :D :D :D | 11:06.23 |
kens | Nothing looks wrong, off the top of my head I can't see why there would be a problem | 11:06.46 |
| I'm just putting the command together now to try it here | 11:06.57 |
Wobak | I see that as both a good thing and a bad thing | 11:07.02 |
| good because it means I didn't do something stupid | 11:07.09 |
| and bad because it's weird that you can't see anything wrong :D | 11:07.18 |
| (btw, thank you very much for taking the time to help me out guys) | 11:07.34 |
kens | You may have to open a bug report and leave it with us for later, I'm in the middle of a different project and don't want to spend too much time on this. I'll give it a try here but if it still continues not to work then some serious debugging is indicated. BTW what version of GS are you using ? | 11:08.35 |
Wobak | GPL Ghostscript 9.25 (2018-09-13) | 11:08.55 |
| yum package on CentOS 7 | 11:09.00 |
kens | I'd recommend you update to at least 9.50 | 11:09.27 |
Wobak | ok | 11:09.30 |
kens | Though I doubt it would make a difference | 11:09.37 |
Wobak | let me see if I can do that easily | 11:09.57 |
kens | 9.25 has some rather well-documented security flaws | 11:09.58 |
Wobak | but I get that it would be easier to be bugged anyway | 11:10.08 |
| as it might be the first answer | 11:10.14 |
| "does it happen with the latest release" | 11:10.19 |
kens | That's usually the first question yes :-) | 11:11.01 |
Wobak | I worked in tech support | 11:11.10 |
| so I get it :D | 11:11.13 |
| I've downloaded the binary for 9.52 | 11:12.07 |
| let me see if it does the same or not | 11:12.12 |
kens | Well the EndPage procedure is run | 11:12.49 |
| But the content is not in the output file. | 11:13.45 |
Wobak | refresh the share | 11:14.08 |
| I've added the file after running the procedure in 9.52 | 11:14.20 |
kens | At which point I'm going to have to stop, it's clearly not something simple. I'd suggest you open a bug report and I'll look at it when I have time. I have to warn you it's not likely to be soon | 11:14.33 |
Wobak | ok | 11:15.36 |
| do you have the link for the bug submission ? | 11:15.45 |
kens | bugs.ghostscript.com you'll need to make an account on bugzilla so you'll need an email address, even if it's a one-shot address | 11:16.25 |
| Sorry I can't give you a solution though | 11:17.57 |
Wobak | no worries | 11:23.10 |
| I'm happy I didn't miss something obvious | 11:23.18 |
| I was comparing the available fonts in the 2 PDF | 11:23.28 |
| thinking this might be at some point a reason | 11:23.35 |
| but wasn't sure | 11:23.38 |
| so I'll bug this right away | 11:23.42 |
kens | No it won't be the fonts, for some reason the EndPage content isn't making it into the PDF file, offhand I have no clue why that would be. | 11:24.07 |
Wobak | ok | 11:24.14 |
| was there enough info in the share ? | 11:24.22 |
| or should I add more into this ? | 11:24.27 |
kens | More than enough, just the PostScript EndPage code, input file and command line is enough. Please do put all of that in the bug report though, because by the time I get to look at it I will have forgotten everything. | 11:25.12 |
| You can reference the #ghostscript IRC logs for April 3rd at around 12:00 hours as well if you like, to avoid having to reprise all this | 11:25.45 |
Wobak | ok | 11:26.15 |
| Component PDF Writer ? | 11:30.45 |
kens | yep | 11:30.50 |
Wobak | should I try on my windows machine | 11:31.09 |
| to see if it's the linux version ? | 11:31.13 |
kens | No it fails on mine | 11:31.16 |
Wobak | ok | 11:31.18 |
| "Ghostscript Endpage procedure runs but no result on the PDF" | 11:32.00 |
| is that a proper title ? | 11:32.03 |
kens | Good enough for me, far better than we usually get :-) | 11:32.14 |
Wobak | :D | 11:32.20 |
| so in attachment : the notworking-extract and the mark.ps files ? | 11:32.47 |
| that should be enough ? | 11:32.50 |
kens | Yes I believe so | 11:32.56 |
| I was able to reproduce it with just those. It might be worth mentioning in the text where the watermark should appear and what it should say | 11:33.24 |
Wobak | yeah, I'll put the working one so that people can compare | 11:33.39 |
kens | That would help, I was initially looking for something more obvious :-) | 11:33.59 |
Wobak | yeah I should've cleared that info :D | 11:34.10 |
kens | It's not a problem, most people use really *BIG* watermarks though, usually across the whole page | 11:34.36 |
Wobak | is there any markdown | 11:35.44 |
| or something to make the bug a bit clearer ? | 11:35.49 |
kens | Not sure what you mean | 11:36.02 |
| The title is good, you've supplied the example to reproduce it, so it all sounds good to me. | 11:36.30 |
Wobak | yeah but there isn't any formatting in the bug template | 11:36.44 |
| it's full plain text | 11:36.46 |
| I'm guessing ? | 11:36.48 |
kens | just plain text yes | 11:36.54 |
Wobak | ok | 11:36.57 |
kens | Lunch time for me, back in a bit | 11:38.22 |
Wobak | bug submitted, thanks for the guidance :) | 11:41.16 |
kens | OK assigned the bug to me so it doesn't get missed | 12:08.09 |
Wobak | ok kens I think I get it. So basically there's an "invisible" box that exists around the actually displayed PDF, and I'm watermarking that before printing the content of the actually displayed PDF with pdfwrite | 14:51.16 |
| am I understanding that right ? | 14:51.20 |
| and therefore, as my watermark is NOT in the displayed area, I do not see it in the end result | 14:51.47 |
kens | Yeah more or less. There's the entire page, then there's a 'CropBox' inside that that the Acrobat display is reduced to | 14:51.49 |
Wobak | ok | 14:51.55 |
kens | Indeed, your watermark is in the gap between the MediaBox and the CropBox and because we throw away anything in there (because we clip it) your watermark is invisible twice over | 14:52.25 |
| You have to shift the position of the watermark so that it's inside the CropBox | 14:52.43 |
Wobak | yeah OK | 14:52.57 |
| and to figure out where the Cropbox begins | 14:53.03 |
kens | You can determine the lower left x and y of the CropBox and then just add those values to 1,1 and it should work | 14:53.04 |
Wobak | I need to use something else | 14:53.17 |
| that I have no idea how it works | 14:53.21 |
| I'll try and figure that out, but first, some physical exercise :) | 14:53.33 |
kens | If you use ghostpdl/pdf_info.ps it will tell you the Box sizes | 14:53.33 |
Wobak | ok | 14:53.43 |
kens | Ah, I had my run this morning :-) | 14:53.44 |
Wobak | I had that in your comment | 14:53.53 |
| it's just that I have no idea on how it's used :) | 14:54.00 |
| (and that's on me) | 14:54.10 |
chrisl | kens: Could clippath pathbbox work to get the imageable area? | 14:54.16 |
Wobak | as that PDF is generated for me to be watermarked | 14:54.31 |
| I'm also talking to the person generating it | 14:54.37 |
kens | Oh well you just do: | 14:54.40 |
| gs -dNOSAFER -sFile=....input.pdf ghostpdl/lib/pdf_info.ps | 14:54.40 |
| chrisl umm, no I don't think so | 14:54.50 |
Wobak | (because it is a printed magazine originally) | 14:54.58 |
kens | But I didn't try it (chrisl) | 14:55.08 |
Wobak | and they're trying to sell old numbers using PDF with a slight watermark | 14:55.13 |
| so the PDF is generated for that specific purpose | 14:55.24 |
| probably using Acrobat Pro | 14:55.27 |
kens | I see, makes sense. BTW there's a better solution than -dNOSAFER but you'll have to read the documentation in ghostpdl/Use.htm (look for -dSAFER) to permit specific files to be read | 14:55.49 |
Wobak | ok | 14:56.06 |
kens | --permit-file-reading | 14:56.09 |
| The default behaviour for Ghostscript is to prevent random PostScript programs from reading files off disk | 14:56.34 |
Wobak | oh ok | 14:56.40 |
kens | So you have to tell it you want to allow that | 14:56.41 |
Wobak | well it's a non-shared server where i'm running my script | 14:56.51 |
kens | either by -dNOSAFER (which is OK for a quick test) or by specifying the file(s) which are allowed, which is better for production | 14:57.07 |
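A minimal sketch of the "specify the file(s)" variant, assuming Ghostscript 9.50 or later, where Use.htm documents the option as --permit-file-read. The helper name pdf_info_cmd and both paths are placeholders for illustration; the function only prints the command line so it can be checked before being run.

```shell
#!/usr/bin/env bash
# Sketch only: build (and print) a gs command that permits reading one named
# input file instead of switching SAFER off with -dNOSAFER.
# pdf_info_cmd is a hypothetical helper name; both arguments are placeholders.
pdf_info_cmd() {
    pdf="$1"      # the PDF that pdf_info.ps should be allowed to read
    script="$2"   # wherever pdf_info.ps lives on your system
    printf 'gs --permit-file-read=%s -sFile=%s %s\n' "$pdf" "$pdf" "$script"
}

# Print the command for the file names used elsewhere in this log.
pdf_info_cmd sourcefile.pdf ghostpdl-9.52/lib/pdf_info.ps
```

The permitted path probably needs to be spelled the same way as the path given to -sFile; that is an assumption worth checking against the -dSAFER section of Use.htm.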
Wobak | so that level of security is probably the last level I need | 14:57.10 |
| as I'm making things safer before that | 14:57.20 |
| but I see your point :) | 14:57.23 |
kens | Well it can write files too | 14:57.31 |
| Obviously limited by the permissions of the user, but still | 14:57.44 |
Wobak | ghostpdl is bundled with ghostscript ? | 14:57.46 |
| or is it different ? | 14:58.11 |
kens | GhostPDL is the entire family; Ghostscript, GhostPCL, GhostXPS | 14:58.18 |
| Also a specific product GhostPDL which does all of the above and reads image files too | 14:58.39 |
| It's all open source though | 14:58.56 |
Wobak | hmmmmm ok | 14:59.15 |
chrisl | --permit-file-read and co are for 9.50 and later: https://ghostscript.com/doc/9.50/Use.htm#Safer | 14:59.39 |
kens | Ghostscript reads PostScript and PDF, GhostXPS reads XPS files and GhostPCL reads PCL and PCL-XL (PCL6) files | 14:59.49 |
Wobak | ok seems that ghostpdl-9.52 belongs in /usr/share | 15:01.07 |
| is there a guideline to where to put the resources ? | 15:01.18 |
chrisl | The distros go their own way | 15:01.45 |
kens | We don't really care where you put Ghostscript :-) | 15:01.49 |
Wobak | :D | 15:01.56 |
| ok so there's a MediaBox, a CropBox, a BleedBox and a TrimBox | 15:03.31 |
| with each 4 values | 15:03.35 |
| I can pretend they all make sense to me | 15:03.43 |
kens | And ArtBox | 15:03.47 |
Wobak | I don't see ArtBox in pdf_info.ps output | 15:03.57 |
kens | The numbers are in pairs | 15:04.00 |
| Then your PDF file doesn't use it, but it's another possibility | 15:04.11 |
Wobak | ok | 15:04.14 |
kens | The pairs are x and y co-ordinates | 15:04.25 |
Wobak | ok | 15:04.29 |
| pair = 1 2 then 3 4 ? | 15:04.39 |
kens | Normally you will get the llx and lly first, but technically you can get it the other way around | 15:04.44 |
Wobak | or is it 1 - 3 and 2 - 4 ? | 15:04.47 |
kens | Which is terribly confusing | 15:04.53 |
| Sometimes it's llx, lly, urx, ury | 15:05.03 |
Wobak | yeah as if the rest was already crystal clear xDDDDD | 15:05.03 |
kens | Usually in fact | 15:05.06 |
| But sometimes it's written as urx, ury, llx, lly | 15:05.15 |
Wobak | ok and if I remember well, my script is using llx and lly | 15:05.24 |
kens | So you actually have to 'normalise' the values (i.e. look for the smallest x and y) | 15:05.35 |
Wobak | so I should take the CropBox llx & lly value | 15:05.39 |
| add 1 to each | 15:05.42 |
| and consider that to be the starting point of my watermark | 15:05.50 |
kens | yes exactly right | 15:05.51 |
Wobak | correct ? | 15:05.52 |
| ok | 15:06.03 |
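The arithmetic agreed above (normalise the CropBox corners, then add 1 to the lower-left x and y) can be sketched as follows. This is an illustration under stated assumptions: watermark_origin is a hypothetical helper name, and the sample numbers are made up rather than taken from the real file.

```shell
#!/usr/bin/env bash
# Sketch: turn a CropBox (four numbers, corners possibly in either order) into
# the point to use in place of "1 1" before moveto in the watermark PostScript.
# watermark_origin is a hypothetical helper, not part of Ghostscript.
watermark_origin() {
    # $1 $2 = first corner (x y), $3 $4 = second corner (x y), $5 = inset (default 1)
    awk -v x1="$1" -v y1="$2" -v x2="$3" -v y2="$4" -v inset="${5:-1}" 'BEGIN {
        llx = (x1 + 0 < x2 + 0) ? x1 + 0 : x2 + 0   # smallest x is the left edge
        lly = (y1 + 0 < y2 + 0) ? y1 + 0 : y2 + 0   # smallest y is the bottom edge
        print llx + inset, lly + inset              # shift just inside the CropBox
    }'
}

# Example values only (not from the real magazine PDF).
watermark_origin 28.35 28.35 623.62 870.24
```

Because the smaller value of each pair is picked explicitly, the helper gives the same answer whether the box is written as llx lly urx ury or as urx ury llx lly.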
| let me try and see if I manage to do that without breaking the Internet on its own | 15:06.15 |
| (that is a possible side effect of me touching stuff I'm not sure) | 15:06.26 |
kens | Oh I usually just break my computer | 15:06.41 |
Wobak | :D | 15:06.49 |
| this is on a remote server on the internet | 15:06.54 |
| so I might have the ability to break the Internet as a whole | 15:07.05 |
| but I doubt it | 15:07.08 |
| I'm not that good | 15:07.10 |
kens | Could be fun, if my connection drops I'll know who to blame | 15:07.25 |
Wobak | ahha | 15:07.32 |
| true | 15:07.33 |
| can you confirm that it's the line .90 setgray 1 1 moveto | 15:07.41 |
| that I need to change ? | 15:07.45 |
| in my ps file ? | 15:07.48 |
| this is llx lly | 15:07.56 |
| before the moveto | 15:07.59 |
kens | yes it's the 1 1 moveto that needs to change | 15:08.02 |
Wobak | right ? | 15:08.02 |
| ok cool | 15:08.06 |
| ok so I'll put that in variables first | 15:08.17 |
kens | Sounds good to me | 15:08.27 |
Wobak | is there a way to request pdf_info of just the first page ? | 15:09.06 |
| I don't care if the watermark is missing on a few pages that don't have the same size | 15:09.17 |
| they're supposed to have the same size anyway | 15:09.23 |
kens | Umm I have no idea, but you could pick the PostScript program apart and look at it | 15:09.26 |
Wobak | don't bother, awk is a tool I master way better than PostScript :D | 15:09.43 |
kens | Sorry but I have another conversation in a different window, I'm going to be distracted for a bit | 15:09.44 |
Wobak | np, thanks for the help ! | 15:09.50 |
kens | You're welcome, hope you get it sorted out | 15:10.04 |
Wobak | for those interested, I'll paste it for the logs. If you save the pdf_info.ps output to a file, you can get the lowest llx / lly for the Page using this small awk script : | 15:20.06 |
| read llx lly <<< $(awk '/Page 2 / { gsub(/.*CropBox: \[/,"",$0) ; gsub(/\] Bleedbox.*/,"",$0) ; if ($1<$3) { print $1 } else { print $3 } ; if ($2<$4) { print $2 } else { print $4 }}' dainfo.txt) | 15:20.07 |
| where dainfo.txt is the file generated from : | 15:20.31 |
| gs -dNOSAFER -sFile=sourcefile.pdf ghostpdl-9.52/lib/pdf_info.ps | 15:20.52 |
| > dainfo.txt | 15:20.52 |
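A slightly more defensive variant of the awk one-liner above, offered as a sketch rather than a replacement. It assumes only what the one-liner already relies on: each page's line in the saved output contains "Page N " and "CropBox: [a b c d]". The helper name cropbox_ll is hypothetical. It prints both values on one line, because a two-line result fed to read through an unquoted here-string may leave lly empty on newer bash versions, which no longer word-split here-strings.

```shell
#!/usr/bin/env bash
# Sketch: print "llx lly" for one page from saved pdf_info.ps output.
# cropbox_ll is a hypothetical helper; the line layout it expects is an
# assumption inferred from the awk one-liner in this log.
cropbox_ll() {
    page="$1"        # page number to look up
    info_file="$2"   # file holding the saved pdf_info.ps output
    awk -v page="$page" '
        index($0, "Page " page " ") && match($0, /CropBox: *\[[^]]*\]/) {
            box = substr($0, RSTART, RLENGTH)
            gsub(/CropBox: *\[|\]/, "", box)   # keep only the four numbers
            split(box, v, " ")
            llx = (v[1] + 0 < v[3] + 0) ? v[1] : v[3]   # corners may be swapped
            lly = (v[2] + 0 < v[4] + 0) ? v[2] : v[4]
            print llx, lly                     # one line, so one read gets both
            exit
        }' "$info_file"
}

# Usage (bash): read -r llx lly < <(cropbox_ll 1 dainfo.txt)
```

Matching the bracketed CropBox value directly means the result does not depend on which box name follows it on the line.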
| kens, sorry to bother you again. Side question : is there an easy way for me to do a different value for a specific page ? Or do I have to extract the page, mark it, mark the rest of the pages and then collate them back ? | 17:06.58 |
kens | You can track the page number and run different procedures for different page numbers | 17:07.44 |
| requires some PostScript programming | 17:07.59 |
Wobak | ok | 18:13.52 |