Log of #mupdf at irc.freenode.net.

20200501 
pedr0 Hi all, I am dealing with a very bizarre PDF document, bizarre for me at least. It results in some very bizarre hidden text being present in the page, a lot of 'llll' scattered around. It is caused by an XObject but I don't understand why. The XObject is called with a Do instruction and within its own stream it calls ... itself through another 'Do' operation, which does not make any sense to me but I am obviously missing something.07:55.18 
  I wonder if what causes the transparency is the /Group/S leaf, set to 'Transparency' but I could not find confirmation on the PDF 1.7 specs. When I say hidden I mean that the text rendering of it, through mutool draw for instance, does show such material. Any help appreciated as I am completely lost.07:55.19 
kens It almost certainly is not calling itself07:55.44 
pedr0 That's the page giving me troubles: https://anonymousfiles.io/UDyZ85XY/07:55.46 
kens It's calling another object with the same name, but the name has been redefined locally07:55.58 
  I'm in the middle of writing a complex email so it'll be a few minutes before I can look at the file07:56.17 
pedr0 Oh right, there is another Resources dictionary in the XObject itself, I missed that07:56.56 
  of course, whenever you like07:57.05 
kens Hmm, my browser says anonymousfiles.io has been reported as a malware page07:59.28 
pedr0 ah07:59.34 
  It did not say a word in my case - I am using Chrome07:59.49 
  Need to use sth else07:59.54 
kens If you would please :-)08:00.09 
pedr0 https://file.io/YsTWYx1x08:00.57 
kens That one's OK :-)08:01.08 
  Hmm big file08:01.26 
pedr0 Yap. 20 MB08:01.35 
  = Images around08:01.40 
  I guess08:01.42 
kens one page, 20MB, that's really very big08:02.15 
pedr0 it is uncompressed though, to be more precise, that's the result of mutool clean + options to read the stream in a text editor08:02.20 
kens Yes I see it's uncompressed, that'll make it much larger of course08:02.55 
pedr0 yeah sorry I always forget to compress it back ... always08:03.13 
kens Doesn't matter I'd only have to decompress it again, I'm not good at reading compressed byte streams in my head08:03.40 
pedr0 :-)08:03.51 
kens So which of the numerous forms are you looking at ?08:03.58 
  Actually it probably doesn't matter.08:05.38 
pedr0 where did you understand it's a form ?08:05.40 
  main stream - page's stream08:05.57 
  name /Fm008:06.01 
  subtype08:06.20 
kens Right and that is itself a Form XObject with its own Resources, one of which is an XObject dictionary which defines the name /Fm0 to be a different object.08:06.38 
pedr0 Correct08:07.08 
kens This file uses (many) nested forms, each form has its own Resources, and many of those have XObjects. When those are Form XObjects it defines the names of the Forms as /Fm<x> where <x> starts at 0 and goes up to the number of Form XObjects used by that Form08:07.55 
pedr0 But what are forms used for ?08:08.26 
kens Resource names are not unique in a PDF file, they depend on the context08:08.30 
  Forms are a kind of reusable resource08:08.40 
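The point about /Fm0 being redefined locally can be illustrated with a toy model. This is only a sketch: the object numbers, the `objects` table and the `walk` helper are made up for illustration, and real content streams need a proper tokenizer. Each `Do` looks its name up in the Resources of the object whose stream is currently executing, so the same name leads to a different object at each level.

```python
# Toy model of a PDF: object id -> its XObject resources and content stream.
# Object numbers and stream contents are hypothetical.
objects = {
    10: {"kind": "page", "xobjects": {"/Fm0": 20}, "content": "/Fm0 Do"},
    20: {"kind": "form", "xobjects": {"/Fm0": 30}, "content": "/Fm0 Do"},
    30: {"kind": "form", "xobjects": {}, "content": "0 0 10 10 re f"},
}


def walk(obj_id, trail=None):
    """Follow every 'name Do' pair, resolving the name in the current object's resources."""
    trail = [] if trail is None else trail
    obj = objects[obj_id]
    trail.append(obj_id)
    tokens = obj["content"].split()
    for i, token in enumerate(tokens):
        if token == "Do" and i > 0:
            name = tokens[i - 1]
            # The lookup uses this object's own dictionary, not the page's.
            walk(obj["xobjects"][name], trail)
    return trail


print(walk(10))
```

Both the page and the first form execute `/Fm0 Do`, yet the walk visits three distinct objects, so nothing is calling itself.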
pedr0 How do they differ from non-form XObject ?08:09.02 
kens There are only 2 kinds of XObjects (I'm ignoring PS XObjects because they were a silly idea and deprecated decades ago). There are image XObjects and Form XObjects.08:09.49 
pedr0 I can tell by experience that I've seen docs where XObjects are used to draw the page contents and I could not possibly understand why those operations were not contained within the page's stream08:09.53 
kens Image XObjects are bitmaps, Form XObjects contain content streams08:10.02 
pedr0 right08:10.09 
kens You don't *have* to use Forms08:10.17 
  The original intention was that they were, literally, a form08:10.30 
pedr0 there I get lost in translation, I am not a native English speaker. Do you mean a mask with fields to be filled up by users ?08:11.04 
kens Then they got used as a way of making a resource you could call. Say you have a company logo (in vector format) You could stick that in the file once, then call it whenever you want to draw the logo08:11.16 
  Obviously (if the logo is complex) that makes a much smaller PDF file08:11.29 
  Originally a Form meant just that, the kind of thing the government sends you to fill in, or your bank wants you to fill out when opening a bank account.08:12.02 
  But now, a form is a 'thing'. You don't have to know what's in it, you can just use it.08:12.28 
  Sadly, this has again been hijacked08:12.39 
pedr0 but you can put whatever you like in there, there are no constraints - it's just a stream of operations to be executed.08:13.02 
kens Yes, any valid PDF operations08:13.22 
pedr0 hijacked = misused08:13.23 
kens :-)08:13.27 
  Many applications now use a PDF 'Form' when their own native object defines a layer or an object.08:13.35 
  So Cairo uses Forms *extensively* because of the way its transparency works08:13.52 
  For example08:13.57 
  So instead of making things simpler, Forms these days actually make things more complicated08:14.19 
  Most PDF files today that contain forms use those forms only once.08:15.02 
  Oh, there is one more genuine reason for having a form, and that's to do with transparency. If you want to create a transparent group, you have to do it using a Form XObject.08:15.45 
pedr0 here we're getting to the group Group/S/ : Transparency08:16.23 
kens Yes, exactly08:16.37 
  That's a whole new level of complicated08:16.52 
pedr0 It made me feel bad while reading it in the PDF Ref. couldn't get it08:17.25 
kens It's insanely over-specified, after literally decades of work on it we are *still* fixing problems in Ghostscript. (I imagine the same is true of MuPDF but I don't know)08:18.22 
  I admit these are the edge and corner cases, and the parts that are rarely used, but still.....08:18.38 
pedr0 But ... I've two questions, three actually: can I read Group/S Transparency as 'this won't be shown within the image'08:19.23 
  >08:19.25 
  ?08:19.27 
kens No08:19.33 
  It starts a new transparency Group08:19.42 
  Which will then be blended with what's already been drawn08:20.19 
  Depending on whether the group is Isolated or non-Isolated either the whole Group content is drawn, and then that is blended with the backdrop, or the contents are blended with the backdrop as the content is drawn08:21.01 
  And of course groups can contain groups08:21.11 
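The isolated versus non-isolated difference can be sketched numerically. What follows is a rough illustration, not how Ghostscript or MuPDF implement it: it applies the basic compositing formula from the PDF Reference to a single grey component, and it assumes one fully opaque element painted with the Multiply blend mode, which lets the sketch skip the backdrop-removal step that non-isolated groups otherwise need. The names `composite`, `PAGE` and `ELEMENT` are hypothetical.

```python
def normal(cb, cs):
    return cs            # Normal blend mode: the source colour wins


def multiply(cb, cs):
    return cb * cs       # Multiply blend mode


def composite(cb, ab, cs, a_s, blend):
    """Composite source (cs, a_s) over backdrop (cb, ab); returns (colour, alpha)."""
    ar = ab + a_s - ab * a_s                 # union of the two alphas
    if ar == 0:
        return 0.0, 0.0
    ratio = a_s / ar
    cr = (1 - ratio) * cb + ratio * ((1 - ab) * cs + ab * blend(cb, cs))
    return cr, ar


PAGE = (0.5, 1.0)      # opaque mid-grey page backdrop: (colour, alpha)
ELEMENT = (0.5, 1.0)   # opaque mid-grey element, painted with Multiply


def isolated():
    # The group starts from a fully transparent backdrop, so Multiply has
    # nothing to blend with while the group content is drawn.
    gc, ga = composite(0.0, 0.0, *ELEMENT, multiply)
    # The finished group is then blended onto the page (Normal mode here).
    return composite(*PAGE, gc, ga, normal)[0]


def non_isolated():
    # The group starts from the page backdrop, so the element is blended
    # with it as it is drawn. The group ends up fully opaque, so removing
    # the backdrop contribution afterwards would change nothing.
    gc, _ = composite(*PAGE, *ELEMENT, multiply)
    return gc


print(isolated(), non_isolated())
```

With Normal blending everywhere the two cases give the same answer; the difference only shows up once a non-Normal blend mode is used inside the group.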
pedr0 omg08:21.28 
kens Also you can change the colour space being used to blend in08:21.42 
  On a group by group basis08:21.47 
  So your page could be blended in CMYK but the group drawn in it might be blended with the page backdrop using RGB08:22.14 
  I'm afraid that's only the start of the complexity. It's an area I try to stay well away from08:22.46 
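For reference, the attributes discussed here live in the Form XObject's /Group dictionary: /S (set to /Transparency), /I (isolated), /K (knockout, from the PDF Reference but not discussed in this log) and /CS (the blending colour space). A minimal sketch of reading them, assuming the form dictionary has already been parsed into a plain Python dict by some library; the `describe_group` helper and the dict layout are hypothetical:

```python
def describe_group(form):
    """Summarise the transparency group attributes of a parsed Form XObject dict."""
    group = form.get("Group")
    if not group or group.get("S") != "Transparency":
        return None                              # not a transparency group
    return {
        "isolated": group.get("I", False),       # /I defaults to false
        "knockout": group.get("K", False),       # /K defaults to false
        "blend_space": group.get("CS"),          # None: no /CS given on this group
    }


form = {"Subtype": "Form",
        "Group": {"S": "Transparency", "I": True, "CS": "DeviceRGB"}}
print(describe_group(form))
```

Nothing in these keys means "do not show this content"; they only control how the group is blended with what is already on the page.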
pedr0 It did seem complex yes08:24.06 
  other than the PDF Reference, is there any other material you would suggest to understand this format better ? As a general suggestion, is there anything that springs to mind ?08:25.34 
kens Unfortunately, no, the PDF Reference is all there is. My only recommendation would be to use the Adobe specifications as far as possible and only use the ISO ones if you need something specific to PDF 2.0. The ISO specifications are really hard to use.08:26.29 
pedr0 Another dream I've from time to time is a kind of PDF 'debugger' - sth that shows the state of the graphic state, text state etc. etc. at a given point in a stream of instructions08:26.54 
  oks - thanks for that08:26.59 
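The 'debugger' idea can be sketched in a few lines, with heavy caveats: this toy tracer splits on whitespace, so it cannot handle strings, arrays or inline images, and it tracks only the line width plus the q/Q save and restore stack. The `trace` function is hypothetical and not part of MuPDF, Ghostscript or PoDoFo.

```python
def trace(content):
    """Return a list of (operator, operands, state after the operator)."""
    state = {"line_width": 1.0}   # PDF's initial line width is 1.0
    saved = []                    # stack used by q (save) and Q (restore)
    operands, steps = [], []
    for token in content.split():
        if token.startswith("/"):
            operands.append(token)            # a name operand such as /Fm0
            continue
        try:
            operands.append(float(token))     # a numeric operand
            continue
        except ValueError:
            pass                              # not a number: treat as an operator
        if token == "q":
            saved.append(dict(state))
        elif token == "Q":
            state = saved.pop()
        elif token == "w":
            state["line_width"] = operands[-1]
        steps.append((token, operands, dict(state)))
        operands = []
    return steps


for op, args, st in trace("q 2 w /Fm0 Do Q 0 0 m"):
    print(op, args, st)
```

A real tool would also have to descend into each Form XObject on `Do`, using that form's own Resources.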
kens I'm writing a new PDF interpreter for Ghostscript, and it started as a PDF 'lint', it was intended to show a load of interpretation related stuff. Since then it's become a real interpreter though08:27.45 
pedr0 I've started writing a PDF interpreter myself using PoDoFo - but it's a lot of work and I am always afraid of making mistakes and I need to double check with a 'real' implementation08:29.31 
kens The biggest problem with writing a PDF interpreter is all the broken PDF files which Acrobat will happily ignore and display.08:30.01 
pedr0 I thought GhostScript would only understand PS08:30.02 
kens Ghostscript has a PDF interpreter, which is written in PostScript currently :-)08:30.24 
pedr0 I am reading the FAQ just now. Listen I'd like to thank you a lot for your help, I am often in this chat and hopefully I will be able to give back sooner or later.08:34.27 
kens Not a problem, I do need to go out for a run now though. If you've more questions Tor and co should be here soon.08:35.33 
ator sebras: both LGTM, should go onto the release branch (currently tor/release)09:35.30 
cgdae Robin_Watts: you around?19:32.40 
ntat Hi21:56.44 
mubot Welcome to #mupdf, the channel for MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.21:56.44 
ntat How can I run mupdf with Z parameter (fit page size to window)? Now I must every time click shift+z to do this.21:58.49 