Log of #ghostscript at irc.freenode.net.

20191127 
cryptopsy any pdfcropping tool which can do bbox ?13:14.00 
  i want to trim all margin whitespace without calculating myself13:14.10 
chrisl There probably isn't a trivial tool for that, because it will need to interpret and (at least) scan convert the contents.13:16.42 
cryptopsy there used to be13:16.56 
chrisl Oh, well use that then13:17.05 
cryptopsy the scan is the easy part13:17.07 
kens Ghostscript's bbox tool can tell you the bbox of the page contents13:17.14 
cryptopsy yep13:17.26 
kens Not 'scan' but 'scan-convert'13:17.26 
cryptopsy gs \13:17.45 
  -q -dBATCH -dNOPAUSE \13:17.48 
  -sDEVICE=bbox \13:17.50 
  "$1"13:17.52 
  it just needs to do a little string parsing13:18.08 
kens What ? I think you are mistaken13:18.21 
cryptopsy to calculate -g5400x7200 \13:18.24 
chrisl No, it really, really needs to do a *lot* more than that13:18.26 
cryptopsy -c "<</PageOffset [-36 -36]>> setpagedevice" \13:18.26 
  it just needs to set that 5400, 7200, 3613:18.40 
kens Setting PageOffset has nothing to do with cropping content13:18.46 
cryptopsy this uses the bounding box for the perfect page not the largest bounding box, which would require a little more arithmetic13:18.57 
  for the first page*13:19.06 
kens Well since we have not seen your file we have no clue what is on the first page, nor its content. So your numbers make no sense13:19.27 
cryptopsy the 2nd gs command will crop for sure, its all about getting the right numbers13:19.34 
kens Also I cannot see why you are using the resolution and declaring the media in pixels, you'd do better to stay in points13:19.47 
cryptopsy i could easily just use the 2nd page or 3rd page13:19.58 
  its a book13:20.03 
  true that the 1st page is often a different size than the rest13:20.10 
kens No, the second Ghostscript command does no cropping. It simply moves the marking content with respect to the underlying media13:20.22 
cryptopsy those are pts13:20.30 
chrisl If you think you can get an accurate bounding box of the marking operations without scan converting, I'd be most interested to hear how13:20.41 
kens -g5400x7200 is not points13:20.42 
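The distinction being drawn here: -g takes a size in device pixels, so what it means in points depends on the resolution set with -r, whereas -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS take points (1/72 inch) directly. A minimal sketch of the conversion follows. The helper name px_to_pt is hypothetical, and the 720 dpi resolution is an assumption, since the log never states which resolution was in effect.

```shell
#!/bin/sh
# px_to_pt PIXELS DPI
# Hypothetical helper: convert a -g pixel dimension to points (1 pt = 1/72 inch).
px_to_pt() {
    awk -v px="$1" -v dpi="$2" 'BEGIN { printf "%.2f\n", px * 72 / dpi }'
}

# Assuming 720 dpi (an assumption, not stated in the log):
px_to_pt 5400 720   # 540.00
px_to_pt 7200 720   # 720.00
```

Under that assumption, -g5400x7200 would describe a 540 x 720 point page; at any other resolution the same pixel counts describe a different page size, which is why staying in points is less error-prone.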
cryptopsy i can guess the bb based on some page's bb13:24.16 
  this seems reasonable?13:25.39 
kens Since you have not supplied an example file, it's impossible to tell13:25.54 
cryptopsy i mean, theoretically13:26.35 
  i have some files here but i'm currently writing the script13:26.54 
kens To be frank you have not defined the problem sufficiently well for me to have an informed opinion.13:27.00 
  Bluntly; I haven't a clue what you are talking about now13:27.15 
cryptopsy is -sDEVICE=bbox adequate in guessing a bbox?13:27.37 
chrisl It doesn't guess13:27.47 
kens It's not guessing anything13:27.47 
cryptopsy while in and of itself it doesn't guess, used in this context it provides a guess by using the bbox of a page to represent the entirety of the pages13:28.22 
kens No, there is no guesswork13:28.37 
cryptopsy the guessing is me accepting that bbox as a possibility to represent the entirety of the pages13:28.58 
kens Well that'll be wrong then13:29.06 
cryptopsy why would it be wrong in a book?13:29.15 
kens Many PDF files have pages of different sizes and orientations13:29.18 
cryptopsy yes, i know13:29.24 
  the question is whether bbox is good at returning the bbox values13:29.44 
kens You keep talking generally then jumping to 'I'm talking about a book' but won't supply an example, so it's not surprising we're talking at cross-purposes13:29.53 
cryptopsy i could pull a pdf at random what does that prove13:30.13 
  would you like to test it on 10k pdf books?13:30.22 
kens It gives us a concrete example to talk about13:30.27 
cryptopsy i could provide you with 10, 10013:30.41 
kens OK this is not a forum for discussing random problems. The answer to your question is that the bbox device will provide the precise bounding box of a given page from a file, at a given resolution.13:31.36 
cryptopsy here is a pdf chrome-extension://mhjfbmdgcfjbbpaeojofohoefgiehjai/index.html?https://www.ema.europa.eu/en/documents/report/meeting-report-paediatric-high-grade-glioma-medicines-expert-workshop_en.pdf13:32.09 
  err13:32.16 
kens I've answered your question above13:32.23 
cryptopsy you wanted a pdf, here is your pdf https://www.ema.europa.eu/en/documents/report/meeting-report-paediatric-high-grade-glioma-medicines-expert-workshop_en.pdf13:32.33 
kens No, frankly I'm not discussing this any longer.13:32.46 
cryptopsy lol13:32.50 
  i am taking your replies one at a time13:32.58 
kens If you have a Ghostscript question ask it. If you have random PDF questions, please take them elsewhere13:33.27 
cryptopsy i am truly baffled why you changed your tone this way13:33.44 
  ¯\_(ツ)_/¯13:33.54 
  book was not the right choice of word, i should have said 'document' since a book can be a pdf with scanned images of book pages13:35.51 
  so what conditions would make bbox return wrong values?13:36.44 
kens None13:36.55 
cryptopsy i have mentioned one above, a scanned image. Can pdfs come broken somehow, such that no margins are set, but rather the text is somehow confined to a space on the space to pretend there was a margin?13:37.34 
  how does bbox work?13:37.40 
kens Well the source is available to you13:38.01 
cryptopsy does it compare pixels? probably not ?13:38.03 
  oh, yea, the source ...13:38.10 
  neat13:38.20 
kens And yes, it does render the input and (effectively) count pixels13:39.09 
cryptopsy does gs offer something like poppler's pdfinfo ?13:40.24 
kens I don't know what poppler's thing does. Ghostscript has pdf_infp.ps13:40.47 
  pdf_info13:40.56 
cryptopsy didnt come with my 9.50 ghostscript-gpl-9.5013:41.22 
kens And you got that from where ?13:42.14 
chrisl http://git.ghostscript.com/?p=ghostpdl.git;a=blob;f=toolbin/pdf_info.ps13:42.20 
cryptopsy normal gentoo install13:42.23 
kens Then you probably want to complain to the Gentoo package maintainer13:42.36 
cryptopsy there was no options for compiling 'examples' or 'scripts' or anything extra13:42.40 
kens Well no13:42.48 
cryptopsy you'd think so but that would not solve things13:42.54 
kens Then your solution is to get the code from us and build it yourself.13:43.21 
cryptopsy thats the difference between bb and hiresbb ?13:48.02 
  %%BoundingBox: 18 0 567 68313:48.31 
  %%HiResBoundingBox: 18.593999 0.000000 566.927983 682.86597913:48.33 
kens I assume you mean 'what' rather than 'that', and the answer is one is more accurate13:48.56 
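In the sample pasted above, the integer %%BoundingBox looks like the %%HiResBoundingBox rounded outward to whole points, and both list the lower-left corner followed by the upper-right corner (llx lly urx ury). A small sketch of pulling the high-resolution values out of the bbox device's report is given below. The function name hires_bbox is hypothetical; the sample text is the one from the log.

```shell
#!/bin/sh
# hires_bbox: read bbox-device output on stdin and print the four
# %%HiResBoundingBox values (llx lly urx ury) of the first page reported.
hires_bbox() {
    awk '/^%%HiResBoundingBox:/ { print $2, $3, $4, $5; exit }'
}

# Sample taken from the log. With a real file you might run something like
#   gs -q -dBATCH -dNOPAUSE -sDEVICE=bbox -sPageList=1 file.pdf 2>&1 | hires_bbox
# (the bbox device writes its report to stderr, hence the 2>&1).
printf '%s\n' \
    '%%BoundingBox: 18 0 567 683' \
    '%%HiResBoundingBox: 18.593999 0.000000 566.927983 682.865979' |
    hires_bbox
```

Taking only the first match keeps the sketch to one page; a multi-page run would need the `exit` removed and each line handled separately.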
cryptopsy can gs hide Substituting font text when processing pages so that only page numbers are seen?13:53.37 
kens This is covered in the documentation13:54.04 
  However, this is important information. It tells you that the fonts being used are not present. Using different fonts will potentially affect the bounding box13:55.13 
cryptopsy yep13:56.18 
  my input file doesnt contain any /CropBox lines but -c "[/CropBox [36 36 60 60] /PAGES pdfmark" \ failed to crop it, even though the change was made to the file14:09.09 
  i am putting together an example now14:09.28 
kens Adding a CropBox doesn't crop the file in any meaningful sense, it's merely an instruction to the consumer that when rendering the content should be cropped in this fashion *if* the consumer chooses to use the CropBox14:10.17 
cryptopsy i am using mupdf-gl does it not use cropbox by default?14:10.45 
kens No idea, that would be a MuPDF question14:10.58 
cryptopsy in that case i should use DEVICEWIDTHPOINTS DEVICEHEIGHTPOINTS, -c translate and -c rectclip14:12.31 
  i was not able to crop this file https://clbin.com/yWOIV with this command gs -o cropped.pdf -sDEVICE=pdfwrite -dDEVICEWIDTHPOINTS=595 -dDEVICEHEIGHTPOINTS=842 -dFIXEDMEDIA -c "24 72 translate" -c " 0 0 235 422 rectclip" -f $114:16.57 
  or maybe ...14:17.19 
kens That file really needs to go to dropbox or something14:17.42 
  Copying and pasting from a browser isn't likely to work correctly14:17.55 
cryptopsy its a 170kb pdf i dont have a commandline tool for uploading files so i just used that clbin14:18.06 
  file opens if renamed14:18.15 
kens Maybe on your system, I'm using Windows14:18.31 
cryptopsy save-as "file.pdf"14:18.44 
  still doesn't ?14:18.47 
chrisl It's probably borked the binary14:19.08 
cryptopsy i think windows uses \r\n instead of \n14:19.21 
  it crops, rectclip wasnt sufficient14:20.26 
  is the documentation for rectclip online?14:22.03 
kens IIRC each page of a PDF file from the PDF interpreter will execute initgraphics, which will reset the clip14:22.06 
  rectclip is part of the PostScript Language14:22.18 
  And as I said above, it won't do you any good14:22.36 
cryptopsy man that's old https://ghostscript.com/pipermail/gs-text-api/2004-January/000146.html14:22.46 
kens The PDF interpreter executes 'initgraphics' before starting each page of the PDF file. That operator resets the graphics state, which blows away the clip you have set14:23.35 
cryptops1 crash14:26.43 
  can i quiet the startup message but not the page processing? -q quiets all14:31.17 
kens No14:31.30 
cryptops1 i can't find the translate documentation on the doc page https://www.ghostscript.com/doc/current/Use.htm14:36.16 
kens translate is a PostScript operator14:36.35 
  We don't provide documentation for the PostScript language14:37.02 
cryptops1 what about rectclip?14:37.55 
  welp, i am out of ideas14:38.07 
kens That too is part of the PostScript language14:38.09 
cryptops1 how do i run gs on one page? is sPageList=pagenumber deprecated?14:41.38 
kens -sPageList works14:41.52 
cryptops1 it says: "These command line options are no longer specific to PDF, but have some specific differences with PDF files"14:41.54 
  ok14:41.56 
kens As do FirstPage and LastPage14:41.59 
cryptopsy .j $postscript15:20.19 
  oh, that channel is dead ...15:20.39 
  i can't get my document to translate, does gs offer some ability for this?15:21.39 
  my attempt was with -c "-$x -$y translate" 15:22.16 
kens I've already said, twice, that using translate in PostScript preceding the PDF page is not going to work. That's because the PDF interpreter executes a setpagedevice before every page from the PDF file (in order to set the required media size) and setpagedevice does an implicit 'initgraphics'. The initgraphics operator resets the graphics state to its default. That means it throws away your translate.15:23.16 
  So you can't use that and expect anything to happen15:23.54 
cryptopsy even with -dFIXEDMEDIA ?15:23.56 
kens That has nothing to do with it15:24.08 
cryptopsy saw a many upvoted thread on stackoverflow doing it15:24.11 
kens Then it's wrong, or it's not doing what you think it is15:24.30 
cryptopsy yes it must be wrong since it isn't working15:24.41 
  what can i do?15:24.44 
kens Well that depends entirely on what you are trying to achieve15:24.57 
cryptopsy i achieved a crop i would like to center the content15:25.16 
kens You can use <</PageOffset x y>> setpagedevice to move the content across the media. Because the PageOffset key is in the page device dictionary it is preserved when setpagedevice is executed, which means it continues to take effect on every page15:26.29 
cryptopsy i found your SO thread15:26.43 
  reading now15:26.45 
  https://stackoverflow.com/questions/46051517/ghostscript-crop-pdf-not-correctly15:26.53 
kens Actually that should be <</PageOffset [x y]>> setpagedevice15:26.59 
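Because PageOffset lives in the page device dictionary, it survives the per-page setpagedevice, unlike a bare translate. A hedged sketch of building that PostScript fragment from a bounding box's lower-left corner is shown here. The helper name page_offset_ps is hypothetical, and the sample coordinates are the ones reported later in this log.

```shell
#!/bin/sh
# page_offset_ps LLX LLY
# Hypothetical helper: print the PostScript that shifts page content so the
# bounding box's lower-left corner lands on the media origin.
# (0 - x) is used instead of -x so that an input of 0 does not print as -0.
page_offset_ps() {
    awk -v x="$1" -v y="$2" \
        'BEGIN { printf "<</PageOffset [%.6f %.6f]>> setpagedevice\n", 0 - x, 0 - y }'
}

page_offset_ps 62.387998 42.479999
# The result would be passed to gs with -c, followed by -f and the input file,
# in the same way the -c option is used elsewhere in this log.
```

This only moves the content relative to the media; the media size itself still has to be set separately.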
cryptopsy hmm i actually i had a script with that ...15:28.49 
  its really nasty it processes one page at a time in a loop15:29.14 
  https://i.imgur.com/UTqjX8q.png15:29.42 
kens doesn't grok bash scripting15:30.06 
cryptopsy i hope it will work once i add this PageOffset thing15:30.42 
  mm15:32.10 
  it over translated15:33.35 
  -c "<</PageOffset [-$x -$y]>>setpagedevice" \15:34.33 
  oops15:34.35 
  https://i.imgur.com/j9eRlJR.png15:34.42 
  there were the x y w h from gs -q -dBATCH -dNOPAUSE -sDEVICE=bbox -sPageList=1 "$1"15:35.06 
kens Looks correct to me15:37.42 
  the left and bottom extents of the content seem to be at the left and bottom of the output15:38.06 
cryptopsy x y w h before 62.387998 42.479999 532.691984 827.63997515:38.13 
  x y w h after 0.000000 0.000000 470.303986 785.16004615:38.20 
  i guess bbox didn't box as i expected15:39.14 
  since there is still a top and right margin15:39.20 
  this is the part i was dreading , bash arithmetic15:40.12 
  i have to subtract another margin from w and h15:40.26 
  why did bbox do this?15:40.49 
kens I suspect that the logo contains white15:41.16 
  That gets counted as marking the page15:41.30 
cryptopsy which logo?15:41.42 
kens The blue circle at the top of the page15:41.54 
cryptopsy lets try bbox on page 315:42.18 
  https://i.imgur.com/UKYjXf7.png15:43.16 
  lets try another file15:43.43 
kens Well I'd have to dissect the PDF file to figure out why it's doing that, but my guess is still that the PDF draws white on the page15:43.51 
cryptopsy no problem - could be luck of a draw - i pull that off the net15:44.09 
kens Looks like the majority of those pages are images.15:45.25 
cryptopsy https://i.imgur.com/GziQxQt.png15:45.34 
  this seems like there is exactly another margin's width of whitespace there15:46.09 
kens Well as I say, that file seems to consist mostly of images. Images can contain white space, and it is still counted as being marked15:46.39 
cryptopsy this was another file15:46.59 
  pure text you can select it with the mouse15:47.17 
kens Being able to select text does not mean it is not an image15:47.31 
  This is a common technique used by OCR packages, the text is actually invisible15:47.43 
cryptopsy if i zoom into this it isn't scanned text 15:48.16 
  the background is a perfect white15:48.28 
kens Well I don't have the file so I can't comment.15:48.44 
cryptopsy you would know better i am not familiar with this kind of trickery15:48.48 
kens TBH we are well past the 'goodwill' point here, we don't give technical support to open source users. If you think you've found a bug feel free to report it in our bug tracker.15:49.39 
  I need to concentrate on actual paid work15:49.48 
cryptopsy https://gofile.io/?c=nJTeMk15:50.03 
  alright15:50.05 
  ♥ тнαηк уσυ ѕєηραι ♥15:50.08 
  subtracting another x,y from w,h, crops perfectly. I guess most pdf are set up that way16:01.39 
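One possible reading of the figures above, which nobody in the log states outright: a bounding box lists the lower-left and upper-right corners (llx lly urx ury), not x y w h. On that reading the last two numbers are coordinates, and the content's width and height are urx - llx and ury - lly. The "after" values in the log (470.303986 and 785.160046) match 532.691984 - 62.387998 and 827.639975 - 42.479999 to within rounding, which would explain why the extra subtraction gave a tight crop. A minimal sketch, with the helper name crop_size being hypothetical:

```shell
#!/bin/sh
# crop_size LLX LLY URX URY
# Hypothetical helper: print the width and height (in points) of a bounding
# box given as lower-left and upper-right corners.
crop_size() {
    awk -v llx="$1" -v lly="$2" -v urx="$3" -v ury="$4" \
        'BEGIN { printf "%.2f %.2f\n", urx - llx, ury - lly }'
}

# Values from the "before" line in the log:
crop_size 62.387998 42.479999 532.691984 827.639975   # 470.30 785.16
```

The two results are what would go into -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS (with -dFIXEDMEDIA), alongside a PageOffset of [-llx -lly].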