Log of #ghostscript at irc.freenode.net.

20201206
velix Hey, please don't kill me for this question. Can Ghostscript or muTools add missing BBOX tag to an existing PDF?22:35.06 
  oops: the *images* are lacking the BBOX tag22:35.20 
artifexirc-bot <RayJohnston> velix: Images don't have a BBOX22:37.33 
velix RayJohnston: According to PDF/UA they do (and this seems to be the problem).22:38.14 
  I think I'll decode the PDF using mutools and add them manually using a script.22:38.41 
ray_laptop velix: presumably mutool clean -d ? 22:39.04 
velix yep, f.e.22:39.13 
  I'm always feeling like a surgeon when working on such a PDF :D22:43.31 
  you can pull out lots of code and the PDF still can be loaded ;)22:43.52 
ray_laptop the hard part when editing a PDF is not messing it up so badly that mutool clean can't repair it (rebuild the xref at least)22:45.34 
velix 16 0 obj22:48.11 
  [ 70.85 437.51 524.45 695.91 ]22:48.11 
  endobj22:48.11 
  That's what it wants.22:48.16 
  In Acrobat, this is shown as bbox22:48.23 
ray_laptop since I don't have the file you're talking about, I don't know what object 16 is used for22:49.50 
velix Oops. Sorry, It's an image, tagged as figure (PDF/UA with Matterhorn protocol)22:50.24 
  ray_laptop: You can't help me much here. I can create a demo, but I think I have to do all the coding.22:50.38 
  It's not a standard covered by GS22:50.50 
ray_laptop velix: that's fine. Have fun22:50.55 
velix ray_laptop: but if you want, I can create an example PDF for you22:51.07 
ray_laptop or mupdf22:51.11 
  velix: that's OK. I have plenty of other stuff to do. I steer clear of tagged PDF as much as possible (those pages of my PDF reference are in pristine state :-) )22:52.06 
velix The problem actually is Microsoft Office. The PDF it exports is 99% fine, but it lacks the BBOX on the images. Since, as you said, this ain't standard to PDFs.22:52.58 
  The PDF/UA tools only warn about the missing tag... but... I don't want a warning ;)22:53.20 
  Ah, here we go. Object 14 contains "/BBox 16 0 R", which links to object 16, which is a BBOX.22:56.13 
  [0 0 0 0] is recognized as valid... it's ugly, but a work around.22:56.37 
  Then I could reference all images to this object.22:56.44 
  Ha!22:56.46 
  ray_laptop: thanks :D22:56.49 
ray_laptop that does seem less than useful. Presumably the tag is so that an area of the page that contains the image can be identified for accessibility, so [ 0 0 0 0 ] wouldn't help 22:58.03 
velix ray_laptop: Actually, 90% of the guys on the web simply remove the tag of the image to make it validate.23:00.06 
  ray_laptop: so [ 0 0 0 0] at least will keep it tagged23:00.36 
  Sorry, my DSL line disconnected23:00.38 
ray_laptop If the BBOX is specified in page coordinates, it would be hard to generate, needing the current effective CTM which is difficult to get. A debug GS _could_ probably provide the actual values using -Zb and -dPDFDEBUG23:01.09 
velix ray_laptop: okay, I'll play with that23:02.46 
ray_laptop velix: you get a LOT of noise from that, but it would contain: [b]Image: w=1223 h=1188 [0.24 0 0 -0.239838 936.416 815.932] for an image that was painted by: 23:07.22 
  293.52 0 0 284.928 936.416 531.004 cm23:07.24 
  /Im1 Do23:07.25 
  (I just looked at that output from an Altona_Visual PDF since I knew it uses images and picked the first such image painted)23:07.27 
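The debug line quoted above can be turned into a bounding box with a small script. This is only a sketch: the line format is inferred from the single example in this log, the helper name `bbox_from_debug_line` is made up for illustration, and it assumes the bracketed matrix maps image pixel coordinates (0..w, 0..h) to page space at 72 dpi, which is consistent with the `cm` operands shown.

```python
import re

# Pattern inferred from the one example line in this log (an assumption,
# not a documented Ghostscript output format).
IMAGE_LINE = re.compile(r"\[b\]Image: w=(\d+) h=(\d+) \[([^\]]+)\]")

def bbox_from_debug_line(line):
    """Return (x0, y0, x1, y1) for an image debug line, or None if no match."""
    m = IMAGE_LINE.search(line)
    if m is None:
        return None
    w, h = int(m.group(1)), int(m.group(2))
    a, b, c, d, e, f = (float(v) for v in m.group(3).split())
    # Transform the four corners of the pixel rectangle and take the extremes,
    # so rotated or flipped images are handled too.
    corners = [(0, 0), (w, 0), (0, h), (w, h)]
    xs = [a * x + c * y + e for x, y in corners]
    ys = [b * x + d * y + f for x, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))

line = "[b]Image: w=1223 h=1188 [0.24 0 0 -0.239838 936.416 815.932]"
print(bbox_from_debug_line(line))
```

For the example line this gives a box of roughly (936.4, 531.0) to (1229.9, 815.9), in the same units as the `cm` operands.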
velix Thanks for the research. My main problem will be to find the right object in the PDFs. In the final image, there are maaany images.23:08.21 
ray_laptop that image, when rendered to a 72dpi page spans (936,531)-(1228,814) (approximately, using cursor to get position). Seems to correspond, however.23:14.30 
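The span quoted above can also be derived from the `cm` operands alone, since `Do` paints an image into the unit square of the current coordinate system. A minimal sketch follows, assuming the `cm` shown is the only transform in effect (otherwise nested matrices have to be concatenated first). The function names `concat` and `bbox_from_ctm` are illustrative, not part of any Ghostscript or MuPDF API.

```python
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)

def concat(m, ctm):
    """Return m x ctm for PDF matrices [a b c d e f] (what 'cm' does to the CTM)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )

def bbox_from_ctm(ctm):
    """Bounding box (x0, y0, x1, y1) of the unit square under the given CTM."""
    a, b, c, d, e, f = ctm
    corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
    xs = [a * x + c * y + e for x, y in corners]
    ys = [b * x + d * y + f for x, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))

# Operands of: 293.52 0 0 284.928 936.416 531.004 cm
ctm = concat((293.52, 0, 0, 284.928, 936.416, 531.004), IDENTITY)
print(bbox_from_ctm(ctm))
```

The result is close to (936.4, 531.0) to (1229.9, 815.9), which agrees with the approximate cursor readings to within a couple of points.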
velix Okay, I hope to get this working somehow :/23:15.12 
  ray_laptop: Can mutools decode /Alt streams? 23:15.35 
artifexirc-bot <RayJohnston> velix: I don't know. But calling mupdf via the API lets you get at almost anything, I imagine. We've released API bindings for mupdf javascript, and now Python as well23:19.47 
  <RayJohnston> but the learning curve may be a bit steep23:20.10 
velix True... okay, I'll keep on digging23:20.25 