| <<<Back 1 day (to 2021/02/15) | Fwd 1 day (to 2021/02/17) >>> | 20210216 |
mnestorov | Hello gentlemen :) . A quick follow-up to our conversation yesterday. | 14:25.17 |
| 1. Using pdfa_dep.ps and removing useless command line arguments solved 6.2.3 (DeviceRGB may be used only if the file has a PDF/A-1 OutputIntent that uses an RGB colour space). | 14:25.17 |
| 2. I compiled and ran 9.54.0 (commit:master 1efe1f702) locally. Running with parameter `-dPDFACompatibilityPolicy=2` against the "problematic" PDF gives: | 14:25.18 |
| `GPL Ghostscript GIT PRERELEASE 9.54.0: Text string detected in DOCINFO cannot be represented in XMP for PDF/A1, aborting conversion.` | 14:25.18 |
| `GPL Ghostscript GIT PRERELEASE 9.54.0: Unrecoverable error, exit code 255` | 14:25.19 |
| 2.1 However, if the command is run with `-dPDFACompatibilityPolicy=1`, the warning is still printed, but no abort is made. The command completes successfully and produces a completely valid PDF/A document, as validated by veraPDF. | 14:25.19 |
| I mean, it is logical for the compatibility policy to act like that; maybe it's just helpful for you to know what the logging of 9.54.0 looks like in this particular case | 14:26.07 |
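[Editor's sketch] The exact command line used above is not in the log, so the following is only an illustration of how the two policy values might be passed to Ghostscript from Python. The helper name `build_pdfa_command`, the file names, and the choice of `-sColorConversionStrategy=RGB` are assumptions; `def_file` stands in for the PDF/A definition file mentioned in point 1 (Ghostscript ships a template named `PDFA_def.ps` in its `lib` directory).

```python
from typing import List


def build_pdfa_command(src: str, dst: str, policy: int = 1,
                       def_file: str = "PDFA_def.ps") -> List[str]:
    """Build a Ghostscript argument list for a PDF/A-1 conversion.

    Hypothetical helper. policy mirrors -dPDFACompatibilityPolicy:
    1 = drop the offending feature and keep going (warning only),
    2 = abort the conversion with an error.
    """
    if policy not in (0, 1, 2):
        raise ValueError("policy must be 0, 1 or 2")
    return [
        "gs",
        "-dPDFA=1",
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=RGB",   # assumed colour strategy
        f"-dPDFACompatibilityPolicy={policy}",
        f"-sOutputFile={dst}",
        def_file,                          # definition file comes before the input PDF
        src,
    ]


# To actually run it (requires gs on PATH):
#   import subprocess
#   subprocess.run(build_pdfa_command("in.pdf", "out.pdf", policy=1), check=True)
```

With `policy=2` the same input would be expected to end in the "aborting conversion" error quoted above, while `policy=1` only prints the warning.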
artifexirc-bot | <KenSharp> mnestorov yes that behaviour is what I would expect if the Title can't be handled | 14:28.40 |
| <KenSharp> Thanks for reporting back! | 14:28.48 |
mnestorov | I think that would qualify as confirmation that your fix works :) | 14:29.04 |
artifexirc-bot | <KenSharp> Yeah it sounds like it is the same problem I fixed a week or so back, something weird about the way recent versions of ImageMagick are creating their PDF files | 14:29.35 |
| <KenSharp> It's not illegal, but I really don't think it's what they intend | 14:29.52 |
mnestorov | Is the DOCINFO referring to a title? | 14:30.52 |
artifexirc-bot | <KenSharp> DOCINFO is the pdfmark for the Document Information Dictionary, which is where the /Title is located in the PDF. There is a duplicate (in XML obviously) in the XMP metadata block | 14:32.12 |
| <KenSharp> For PDF/A they have to be byte by byte identical | 14:32.28 |
| <KenSharp> Which only works if the /Title is in PDFDocEncoding and limited to the values representable in one byte by UTF-8 | 14:32.50 |
mnestorov | Yes, that is what I understood from the PDF/A protocol | 14:32.57 |
artifexirc-bot | <KenSharp> The /Title from ImageMagick contains NUL characters | 14:33.16 |
mnestorov | aha | 14:33.24 |
artifexirc-bot | <KenSharp> Which basically means we can't handle them | 14:33.35 |
mnestorov | Should it contain NUL? | 14:33.45 |
artifexirc-bot | <KenSharp> So the only alternatives are 1) drop the /Title or 2) don't make a PDF/A | 14:33.51 |
| <KenSharp> I don't believe that it should contain a NUL. (I'm not certain about your PDF obviously, this relates to the ImageMagick ones) | 14:34.20 |
| <KenSharp> Each character is written as 2 bytes | 14:34.34 |
| <KenSharp> Each initial byte is a 0x00 | 14:34.42 |
| <KenSharp> That looks awfully like it's UTF-16 | 14:34.49 |
| <KenSharp> But there is no byte order mark | 14:34.58 |
| <KenSharp> Additionally the string is terminated with 0x00 0x00 | 14:35.09 |
| <KenSharp> Which looks like someone read a C string, including the terminator | 14:35.21 |
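[Editor's sketch] The byte pattern described above (two bytes per character, each leading byte 0x00, no byte order mark, trailing 0x00 0x00) can be checked with a small heuristic. This is only an illustration: the function name `describe_title` and the sample bytes are hypothetical, not taken from the ImageMagick file or from Ghostscript's code.

```python
BOM_UTF16_BE = b"\xfe\xff"  # byte order mark that marks a UTF-16BE PDF text string


def describe_title(raw: bytes) -> str:
    """Heuristically classify the raw bytes of a PDF /Title string."""
    if raw.startswith(BOM_UTF16_BE):
        return "utf-16be with BOM"
    # Even length and every first byte of each pair is 0x00: looks like
    # UTF-16BE text in the Latin range written without the BOM.
    if len(raw) >= 2 and len(raw) % 2 == 0 and all(b == 0 for b in raw[0::2]):
        if raw.endswith(b"\x00\x00"):
            return "bom-less utf-16be with NUL terminator"
        return "bom-less utf-16be"
    if b"\x00" in raw:
        return "contains NUL"
    return "single-byte (PDFDocEncoding candidate)"


# Hypothetical sample: "Hi" as 2-byte characters plus a C-style terminator.
sample = "Hi".encode("utf-16-be") + b"\x00\x00"
print(describe_title(sample))
```

A string with the BOM in place would instead be reported as `utf-16be with BOM`, which is the form the earlier ImageMagick output is described as using later in this log.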
mnestorov | Hah... I don't know if that can be considered as a bug on IM side | 14:36.21 |
artifexirc-bot | <KenSharp> It is possible that this is deliberate, it is legal in PDFDocEncoding to use the NUL character (it turns into a /.notdef glyph) but it seems highly unlikely | 14:36.23 |
mnestorov | I see | 14:36.33 |
artifexirc-bot | <KenSharp> If you look at the document properties of such a file using Acrobat it displays an empty string for the /Title | 14:36.58 |
mnestorov | So they drop it as well? | 14:38.05 |
| I mean, the iText guys | 14:38.18 |
artifexirc-bot | <KenSharp> Acrobat doesn't display it in the original PDF. I can't comment about iText, I don't use it. | 14:38.35 |
mnestorov | I don't either, but I thought that Adobe used iText underneath | 14:39.03 |
artifexirc-bot | <KenSharp> Heck no! | 14:39.09 |
| <KenSharp> Adobe invented PDF 🙂 | 14:39.14 |
mnestorov | hah....misinformation on my part then | 14:39.35 |
| sorry for being silly | 14:39.41 |
artifexirc-bot | <KenSharp> Not a problem | 14:39.50 |
| <KenSharp> I do suspect the broken /Title from ImageMagick is a bug, but I'm not inclined to sign up and report it, not least because I'd have to find a way to reproduce it | 14:40.19 |
mnestorov | I understand that. Maybe someone else got around to it. | 14:41.14 |
artifexirc-bot | <KenSharp> I did ask the person who reported the bug to us to report the problem to IM, but I have no idea if they did | 14:41.44 |
mnestorov | When was this initially reported here? | 14:42.06 |
artifexirc-bot | <KenSharp> Give me a minute and I'll look | 14:42.18 |
| <KenSharp> someone pinging me on another channel | 14:42.25 |
mnestorov | no worries | 14:42.29 |
artifexirc-bot | <KenSharp> OK the bug report is here | 14:43.36 |
| <KenSharp> https://bugs.ghostscript.com/show_bug.cgi?id=703486 | 14:43.37 |
| <KenSharp> From the 6th February | 14:43.47 |
| <KenSharp> That was actually their second attempt to report a bug to us I think, the first time round I couldn't reproduce a problem, because the file they sent had been produced by an earlier version of IM, which properly included the byte order mark | 14:45.16 |
mnestorov | I might just dig around the bug tracker in IM and see if someone reported it. But other than that, I truly appreciate your help with my situation and your fixes in the latest gs! :) If my feedback is of any use, the gs documentation, namely these files here https://ghostscript.com/doc/current/Psfiles.htm and the compilation docs here | 14:49.08 |
| https://www.ghostscript.com/doc/9.50/Make.htm were very useful. The only thing I couldn't get is why your canonical repo at https://git.ghostscript.com/?p=ghostpdl.git;a=summary is closed to the public, in terms of getting snapshots or pulling. I had to use the mirror at GitHub. | 14:49.09 |
artifexirc-bot | <KenSharp> I think that's due to restricting bandwidth | 14:49.51 |
| <KenSharp> We were serving too many requests from our own servers | 14:50.02 |
| <KenSharp> But maybe I'm wrong about that, it ought to be possible to clone our own repo | 14:50.37 |
artifexirc-bot | <KenSharp> is not a Git expert | 14:50.41 |
mnestorov | It didn't cause a problem, of course, I managed to find you guys on GH. :) | 14:51.22 |
artifexirc-bot | <KenSharp> I think we prefer people to pick up the code from GitHub, but I believe it ought to work from our own server..... | 14:51.59 |
chrisl | The snapshot feature was a magnet for (D)DOS attacks, so we had to disable it | 14:52.48 |
artifexirc-bot | <KenSharp> Ah, that would be it | 14:53.11 |
mnestorov | Hah, I'm hoping it's not the competition ddos-ing you :) | 14:54.25 |
chrisl | Given other experiences, it was *probably* "security researchers" | 14:55.02 |
artifexirc-bot | <KenSharp> It may be because we run a bug bounty program for our **products** that we get an awful lot of script kiddies: 'your website has the following problems.... where's my bounty ?' | 14:55.11 |
mnestorov | Ugh | 14:55.45 |
| I don't know how you guys keep up with all of the work, especially when you have to deal with such things...it doesn't sound like it's your first time external forces keep you busy (from the fun of programming) | 14:57.14 |
| first time dealing with* | 14:57.24 |
artifexirc-bot | <KenSharp> Oh we get a few questions, it's not a huge number | 14:57.36 |
| <KenSharp> And the bug reports keep the product nice and tight | 14:57.49 |
| <KenSharp> Which is good for our commercial customers | 14:57.57 |
| <KenSharp> Everyone needs a break from programming now and then | 14:58.23 |
mnestorov | I hope you (and all of the nice people from gs and friends) get it :) | 14:59.56 |
artifexirc-bot | <KenSharp> Well ordinarily we'd travel to our company meetings, but that's been out of the question for the last 12 months 😦 | 15:00.31 |
| <KenSharp> A short weekend break in San Francisco is always nice | 15:00.48 |