| <<<Back 1 day (to 2020/04/30) | Fwd 1 day (to 2020/05/02)>>> | 20200501 |
pedr0 | Hi all, I am dealing with a very bizarre PDF document, bizarre for me at least. It results in some very bizarre hidden text being present in the page, a lot of 'llll' scattered around. It is caused by an XObject but I don't understand why. The XObject is called with a Do instruction and within its own stream it calls ... itself through another 'Do' operation, which does not make any sense to me, but I am obviously missing something. | 07:55.18 |
| I wonder if what causes the transparency is the /Group/S leaf, set to 'Transparency', but I could not find confirmation in the PDF 1.7 specs. When I say hidden I mean that the text rendering of it, through mutool draw for instance, does show such material. Any help appreciated as I am completely lost. | 07:55.19 |
kens | It almost certainly is not calling itself | 07:55.44 |
pedr0 | That's the page giving me troubles: https://anonymousfiles.io/UDyZ85XY/ | 07:55.46 |
kens | It's calling another object with the same name, but the name has been redefined locally | 07:55.58 |
| I'm in the middle of writing a complex email so it'll be a few minutes before I can look at the file | 07:56.17 |
pedr0 | Oh right, there is another Resources dictionary in the XObject itself, I missed that | 07:56.56 |
| of course, whenever you like | 07:57.05 |
kens | Hmm, my browser says anonymousfiles.io has been reported as a malware page | 07:59.28 |
pedr0 | ah | 07:59.34 |
| It did not say a word in my case - I am using Chrome | 07:59.49 |
| Need to use sth else | 07:59.54 |
kens | If you would please :-) | 08:00.09 |
pedr0 | https://file.io/YsTWYx1x | 08:00.57 |
kens | That one's OK :-) | 08:01.08 |
| Hmm big file | 08:01.26 |
pedr0 | Yap. 20 MB | 08:01.35 |
| = Images around | 08:01.40 |
| I guess | 08:01.42 |
kens | one page, 20MB, that's really very big | 08:02.15 |
pedr0 | it is uncompressed though, to be more precise, that's the result of mutool clean + options to read the stream in a text editor | 08:02.20 |
kens | Yes I see it's uncompressed, that'll make it much larger of course | 08:02.55 |
pedr0 | yeah sorry I always forget to compress it back ... always | 08:03.13 |
kens | Doesn't matter I'd only have to decompress it again, I'm not good at reading compressed byte streams in my head | 08:03.40 |
pedr0 | :-) | 08:03.51 |
kens | So which of the numerous forms are you looking at ? | 08:03.58 |
| Actually it probably doesn't matter. | 08:05.38 |
pedr0 | where did you understand it's a form ? | 08:05.40 |
| main stream - page's stream | 08:05.57 |
| name /Fm0 | 08:06.01 |
| subtype | 08:06.20 |
kens | Right and that is itself a Form XObject with its own Resources, one of which is an XObject dictionary which defines the name /Fm0 to be a different object. | 08:06.38 |
pedr0 | Correct | 08:07.08 |
kens | This file uses (many) nested forms, each form has its own Resources, and many of those have XObjects. When those are Form XObjects it defines the names of the Forms as /Fm<x> where <x> starts at 0 and goes up to the number of Form XObjects used by that Form | 08:07.55 |
pedr0 | But what are forms used for ? | 08:08.26 |
kens | Resource names are not unique in a PDF file, they depend on the context | 08:08.30 |
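The scoping kens describes can be sketched in a few lines of Python. This is a simplified model, not how MuPDF or Ghostscript implement it: the object numbers (10, 11), the dictionary layout and the `run` helper are all invented for illustration. It only shows that `/Fm0` is looked up in the Resources of whichever stream is currently executing, so the same name can point at different objects at different nesting levels.

```python
from typing import Dict, List

# Hypothetical object numbers and contents, invented for illustration.
OBJECTS: Dict[int, dict] = {
    # Form 10 has its own resources, where /Fm0 means object 11.
    10: {"xobjects": {"Fm0": 11}, "ops": ["/Fm0 Do"]},
    # Form 11 draws the text and calls nothing else.
    11: {"xobjects": {}, "ops": ["(llll) Tj"]},
}

# In the page's resources, /Fm0 means object 10.
PAGE = {"xobjects": {"Fm0": 10}, "ops": ["/Fm0 Do"]}


def run(stream: dict, trace: List[str], depth: int = 0) -> List[str]:
    """Walk a stream, resolving names in that stream's own resources."""
    indent = "  " * depth
    for op in stream["ops"]:
        if op.endswith(" Do"):
            name = op.split()[0].lstrip("/")
            # The lookup uses the resources of the *current* stream only.
            target = stream["xobjects"][name]
            trace.append(f"{indent}/{name} Do -> object {target}")
            run(OBJECTS[target], trace, depth + 1)
        else:
            trace.append(f"{indent}{op}")
    return trace


trace = run(PAGE, [])
print("\n".join(trace))
```

What looks like a form calling itself is really two different objects that happen to share the local name `/Fm0`.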
| Forms are a kind of reusable resource | 08:08.40 |
pedr0 | How do they differ from non-form XObject ? | 08:09.02 |
kens | There are only 2 kinds of XObjects (I'm ignoring PS XObjects because they were a silly idea and deprecated decades ago). There are Image XObjects and Form XObjects. | 08:09.49 |
pedr0 | I can tell by experience that I've seen docs where XObjects are used to draw the page contents and I could not possibly understand why those operations were not contained within the page's stream | 08:09.53 |
kens | Image XObjects are bitmaps, Form XObjects contain content streams | 08:10.02 |
pedr0 | right | 08:10.09 |
kens | You don't *have* to use Forms | 08:10.17 |
| The original intention was that they were, literally, a form | 08:10.30 |
pedr0 | there I get lost in translation, I am not a native English speaker. Do you mean a mask with fields to be filled up by users ? | 08:11.04 |
kens | Then they got used as a way of making a resource you could call. Say you have a company logo (in vector format) You could stick that in the file once, then call it whenever you want to draw the logo | 08:11.16 |
| Obviously (if the logo is complex) that makes a much smaller PDF file | 08:11.29 |
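A minimal sketch of that reuse, assuming a Form XObject is already registered in the page's Resources under the hypothetical name `/Logo`. The `place_form` helper is made up for this example; it only builds the page content stream text, where each placement is a translation (`cm`) followed by `Do`, wrapped in `q`/`Q` so the translation does not leak into later drawing.

```python
from typing import Iterable, Tuple


def place_form(name: str, positions: Iterable[Tuple[int, int]]) -> str:
    """Build content-stream text that draws one form at several positions."""
    lines = []
    for x, y in positions:
        # q/Q save and restore the graphics state around each placement;
        # "1 0 0 1 x y cm" translates the origin to (x, y).
        lines.append(f"q 1 0 0 1 {x} {y} cm /{name} Do Q")
    return "\n".join(lines)


content = place_form("Logo", [(36, 750), (36, 400), (36, 50)])
print(content)
```

The logo's drawing operations live once in the form's own stream; the page only pays for one short line per use.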
| Originally a Form meant just that, the kind of thing the government sends you to fill in, or your bank wants you to fill out when opening a bank account. | 08:12.02 |
| But now, a form is a 'thing'. You don't have to know what's in it, you can just use it. | 08:12.28 |
| Sadly, this has again been hijacked | 08:12.39 |
pedr0 | but you can put whatever you like in there, there are no constraints - it's just a stream of operations to be executed. | 08:13.02 |
kens | Yes, any valid PDF operations | 08:13.22 |
pedr0 | hijacked = misused | 08:13.23 |
kens | :-) | 08:13.27 |
| Many applications now use a PDF 'Form' when their own native object defines a layer or an object. | 08:13.35 |
| So Cairo uses Forms *extensively* because of the way its transparency works | 08:13.52 |
| For example | 08:13.57 |
| So instead of making things simpler, Forms these days actually make things more complicated | 08:14.19 |
| Most PDF files today that contain forms use those forms only once. | 08:15.02 |
| Oh, there is one more genuine reason for having a form, and that's to do with transparency. If you want to create a transparent group, you have to do it using a Form XObject. | 08:15.45 |
pedr0 | here we're getting to the group Group/S/ : Transparency | 08:16.23 |
kens | Yes, exactly | 08:16.37 |
| That's a whole new level of complicated | 08:16.52 |
pedr0 | It made me feel bad while reading it in the PDF Ref. couldn't get it | 08:17.25 |
kens | It's insanely over-specified, after literally decades of work on it we are *still* fixing problems in Ghostscript. (I imagine the same is true of MuPDF but I don't know) | 08:18.22 |
| I admit these are the edge and corner cases, and the parts that are rarely used, but still..... | 08:18.38 |
pedr0 | But ... I've two questions, three actually: can I read Group/S Transparency as 'this won't be shown within the image' | 08:19.23 |
| > | 08:19.25 |
| ? | 08:19.27 |
kens | No | 08:19.33 |
| It starts a new transparency Group | 08:19.42 |
| Which will then be blended with what's already been drawn | 08:20.19 |
| Depending on whether the group is Isolated or non-Isolated, either the whole Group content is drawn and then that is blended with the backdrop, or the contents are blended with the backdrop as the content is drawn | 08:21.01 |
| And of course groups can contain groups | 08:21.11 |
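A rough single-channel sketch of the Isolated versus non-Isolated difference, under loud assumptions: one fully opaque object, a fully opaque page backdrop, the object drawn with the Multiply blend mode, and the finished group blended onto the page with Normal. It leaves out the backdrop-removal step that the specification defines for non-isolated groups, on the assumption that it changes nothing for a fully opaque object. The function names are invented for this example.

```python
from typing import Callable, Tuple

Blend = Callable[[float, float], float]


def normal(cb: float, cs: float) -> float:
    return cs


def multiply(cb: float, cs: float) -> float:
    return cb * cs


def composite(cb: float, ab: float, cs: float, a_s: float,
              blend: Blend) -> Tuple[float, float]:
    """Composite a source (cs, a_s) over a backdrop (cb, ab), one channel."""
    ar = ab + a_s - ab * a_s          # union of the two alphas
    if ar == 0:
        return 0.0, 0.0
    # Where the backdrop is present the blend mode applies; elsewhere
    # the source colour is used as-is.
    mixed = (1 - ab) * cs + ab * blend(cb, cs)
    cr = (1 - a_s / ar) * cb + (a_s / ar) * mixed
    return cr, ar


def isolated(page, obj, blend: Blend):
    # The group starts from a fully transparent backdrop, so the blend
    # mode has nothing to blend against inside the group...
    gc, ga = composite(0.0, 0.0, obj[0], obj[1], blend)
    # ...and the finished group is then blended onto the page.
    return composite(page[0], page[1], gc, ga, normal)


def non_isolated(page, obj, blend: Blend):
    # The object is blended against the page backdrop as it is drawn.
    return composite(page[0], page[1], obj[0], obj[1], blend)


page = (0.5, 1.0)   # mid-grey, opaque backdrop (colour, alpha)
obj = (0.8, 1.0)    # light, opaque object

iso = isolated(page, obj, multiply)
non = non_isolated(page, obj, multiply)
print("isolated:", iso, "non-isolated:", non)
```

In this model the isolated group ends up with the object's own colour, while the non-isolated one is darkened by the backdrop, which is the difference described above.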
pedr0 | omg | 08:21.28 |
kens | Also you can change the colour space being used to blend in | 08:21.42 |
| On a group by group basis | 08:21.47 |
| So your page could be blended in CMYK but the group drawn in it might be blended with the page backdrop using RGB | 08:22.14 |
| I'm afraid that's only the start of the complexity. It's an area I try to stay well away from | 08:22.46 |
pedr0 | It did seem complex yes | 08:24.06 |
| other than the PDF Reference, is there any other material you would suggest to understand this format better ? As a general suggestion, is there anything that springs to mind ? | 08:25.34 |
kens | Unfortunately, no, the PDF Reference is all there is. My only recommendation would be to use the Adobe specifications as far as possible and only use the ISO ones if you need something specific to PDF 2.0. The ISO specifications are really hard to use. | 08:26.29 |
pedr0 | Another dream I've from time to time is a kind of PDF 'debugger' - sth that shows the graphics state, text state etc. etc. at a given point in a stream of instructions | 08:26.54 |
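A toy version of that 'debugger' idea, as a sketch only. It tracks just the current transformation matrix through `q`, `Q` and `cm`, and assumes a stream whose tokens are separated by whitespace (no strings, arrays, dictionaries or inline images). `ctm_at` and `concat` are hypothetical helpers written for this example, not part of MuPDF, Ghostscript or PoDoFo.

```python
from typing import List, Tuple

Matrix = Tuple[float, float, float, float, float, float]
IDENTITY: Matrix = (1, 0, 0, 1, 0, 0)


def concat(m: Matrix, ctm: Matrix) -> Matrix:
    """Return m x ctm, which is how `cm` updates the current matrix."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def ctm_at(stream: str, stop: int) -> Matrix:
    """CTM after executing the first `stop` operators of a simple stream."""
    ctm: Matrix = IDENTITY
    stack: List[Matrix] = []
    operands: list = []
    executed = 0
    for tok in stream.split():
        if executed == stop:
            break
        try:
            operands.append(float(tok))   # numeric operand
            continue
        except ValueError:
            pass
        if tok.startswith("/"):
            operands.append(tok)          # name operand, e.g. /Fm0
            continue
        # Anything else is treated as an operator.
        if tok == "q":
            stack.append(ctm)             # save graphics state
        elif tok == "Q":
            ctm = stack.pop()             # restore graphics state
        elif tok == "cm":
            ctm = concat(tuple(operands[-6:]), ctm)
        operands.clear()
        executed += 1
    return ctm


stream = "q 2 0 0 2 10 20 cm q 1 0 0 1 5 5 cm /Fm0 Do Q Q"
for n in range(8):
    print(n, ctm_at(stream, n))
```

A real tool would need a proper content-stream tokenizer and the rest of the graphics and text state, but the save/restore stack is the core of it.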
| oks - thanks for that | 08:26.59 |
kens | I'm writing a new PDF interpreter for Ghostscript, and it started as a PDF 'lint', it was intended to show a load of interpretation related stuff. Since then it's become a real interpreter though | 08:27.45 |
pedr0 | I've started writing a PDF interpreter myself using PoDoFo - but it's a lot of work and I am always afraid of making mistakes and I need to double check with a 'real' implementation | 08:29.31 |
kens | The biggest problem with writing a PDF interpreter is all the broken PDF files which Acrobat will happily ignore and display. | 08:30.01 |
pedr0 | I thought Ghostscript would only understand PS | 08:30.02 |
kens | Ghostscript has a PDF interpreter, which is written in PostScript currently :-) | 08:30.24 |
pedr0 | I am reading the FAQ just now. Listen I'd like to thank you a lot for your help, I am often in this chat and hopefully I will be able to give back sooner or later. | 08:34.27 |
kens | Not a problem, I do need to go out for a run now though. If you've more questions Tor and co should be here soon. | 08:35.33 |
ator | sebras: both LGTM, should go onto the release branch (currently tor/release) | 09:35.30 |
cgdae | Robin_Watts: you around? | 19:32.40 |
ntat | Hi | 21:56.44 |
mubot | Welcome to #mupdf, the channel for MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line. | 21:56.44 |
ntat | How can I run mupdf with Z parameter (fit page size to window)? Now I must every time click shift+z to do this. | 21:58.49 |