| <<<Back 1 day (to 2018/03/12) | 20180313 |
kens | velix your question was about XMP metadata, not rectangles. I've already explained (and pointed to documentation) how pdfwrite works, rectangles and rectangular paths may be written either way, it just depends. | 08:08.03 |
| Weird, my change to ICC profile handling is resulting in changes in the output of files which are PostScript converted to PDF. I think that means we are converting CIE spaces to ICC profiles which aren't valid for embedding in PDF. | 09:33.32 |
| So that's another conversation to have with Michael I guess. I'll investigate further first | 09:34.17 |
| Seems our profiles generated internally for CIE colour spaces have a profile->data_cs which is GS_UNDEFINED. But when we write the profile, it is correctly defined. So it looks like some kind of lazy evaluation going on. I need to tackle Michael about it. For the time being I can treat GS_UNDEFINED as valid in the checking code in pdfwrite. | 09:55.29 |
| chrisl commit f06d2b00d005012379d4a2f6f47aab99b7b13b5c fixes the XPS problem, it only affects that file. Commit cb018b64dea61f2d6146652dce349d3fadab1709 squashes a compiler warning introduced by the PCL /.notdef fix. I'd ignore the other commit if I were you, it only prevents us writing a colour value twice, so no real effect other than a tiny optimisation. | 11:05.29 |
| I believe that addresses all the concerns with changes detected in the release testing. | 11:06.35 |
velix | kens: Okay, then I didn't understand it ;) | 13:16.02 |
| kens: need to re-read the docs. | 13:16.07 |
kens | Ghostscript interprets its input (from whatever source) and converts that to its internal representation of graphics primitives | 13:16.40 |
| A graphics primitive is the lowest level 'object' type supported by the graphics model or library. | 13:17.04 |
| So a rectangular path will be converted internally to a path. | 13:17.23 |
| When we come to write paths back out using one of the high level devices (pdfwrite and friends) we try to spot paths which are rectangles, and emit them as rectangles, rather than a sequence of path operations. | 13:18.04 |
| However, due to rounding errors, and other more subtle complications, we may not be able to identify a rectangular path as a rectangle. And sometimes we might identify a path as a rectangle. | 13:18.41 |
velix | ah, I see. | 13:18.58 |
kens | So you cannot be certain that what started as a rectangle in the input, will end up as a 're' operation in the output. | 13:19.02 |
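The rectangle spotting kens describes can be illustrated with a small sketch. This is not pdfwrite's actual logic; the function name `as_rectangle` and the tolerance parameter `tol` are assumptions made purely for illustration. It shows why a tiny rounding error in one coordinate can decide whether a closed four-point path is written as a single 're' or as a sequence of path operations.

```python
def as_rectangle(points, tol=0.0):
    """Return (x, y, w, h) if the closed 4-point path is an axis-aligned
    rectangle within `tol`, otherwise None. Hypothetical helper, not
    taken from Ghostscript."""
    if len(points) != 4:
        return None
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = points

    def close(a, b):
        return abs(a - b) <= tol

    # Edges must alternate horizontal/vertical, starting with either kind.
    horizontal_first = (close(y0, y1) and close(x1, x2)
                        and close(y2, y3) and close(x3, x0))
    vertical_first = (close(x0, x1) and close(y1, y2)
                      and close(x2, x3) and close(y3, y0))
    if not (horizontal_first or vertical_first):
        return None

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))


exact = [(0, 0), (10, 0), (10, 5), (0, 5)]
noisy = [(0, 0), (10, 0.0004), (10, 5), (0, 5)]  # one corner slightly off
print(as_rectangle(exact))            # recognised as a rectangle
print(as_rectangle(noisy))            # strict comparison: not a rectangle
print(as_rectangle(noisy, tol=1e-3))  # tolerant comparison: rectangle again
```

With a strict comparison the noisy path is rejected, and with a loose tolerance a path that was never quite a rectangle gets reported as one, which matches both failure modes mentioned above.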
velix | Yeah, got it. | 13:19.28 |
kens | This is a consequence of the way Ghostscript and the pdfwrite device operate, and it has benefits as well as drawbacks. | 13:19.50 |
| It allows us to do colour conversion, image downsampling, rescaling, dropping of content etc very easily | 13:20.13 |
| But it does mean that the low level of the PDF file is not the same as the input file would have been | 13:20.28 |
velix | kens: So for PDF manipulation, I should use mupdf ? | 13:20.36 |
kens | Which could be regarded as a drawback | 13:20.36 |
| Depends on the manipulation you have in mind | 13:20.51 |
velix | kens: Cut, merge, move pages. | 13:20.59 |
kens | And whether the change in the low level description is a problem, I can't really see why it would be. | 13:21.17 |
| To be honest, I don't know enough about how MuPDF modifies files to have a decent opinion. I'd expect it to preserve the content more closely, but I could be mistaken. | 13:22.04 |
| You'd want Robin_Watts or tor8 to comment on that. | 13:22.15 |
velix | Okay. I'll also do some tests with more complex objects. | 13:22.30 |
| kens: Funny thing: PDFs created by Ghostscript are easier to manipulate in Acrobat than the ones printed with Acrobat's PDF reader ;) | 13:23.18 |
| eeeh PDF printer* | 13:23.28 |
kens | :-) | 13:23.32 |
| I don't really know why that would be, possibly it makes more Form XObjects, they are difficult to edit in Acrobat | 13:23.53 |
velix | I'll make some screenshot at work tomorrow. | 13:24.06 |
kens | OK | 13:24.52 |
tor8 | velix: yes, you can use mupdf to do high level manipulations of pdf (like rearranging pages, etc) | 13:25.04 |
velix | tor8: without the >possible< little problems of GS's pdfwrite? | 13:25.28 |
tor8 | velix: yes, we can do manipulation of the PDF objects that define the page structure, without affecting the drawing commands on each page | 13:26.02 |
velix | tor8: Okay, I'll have a look ;) | 13:26.12 |
| thx | 13:26.17 |
tor8 | velix: "mutool clean" can create a PDF with a subset or rearranged pages | 13:26.17 |
| and "mutool run" allows you to write javascript scripts to manipulate documents and do all kinds of things, if 'mutool clean' isn't enough for your needs | 13:26.46 |
| there's also "mutool merge" to combine documents | 13:27.23 |
velix | I really need to compare qpdf, mutool, PDFtk etc. :) | 13:27.33 |
tor8 | now be aware that surprising things may happen to the list of bookmarks and forms and things like that | 13:27.40 |
| when merging or subsetting documents, these things may break or be removed | 13:27.59 |
velix | sure | 13:28.17 |
| I understand this. | 13:28.21 |
kens | So... ICC profile version 3.3 is the lowest supported by PDF. This means falling back to version 2 is always guaranteed to be OK. Version 4 is supported by 1.5 and above. 1.7 supports 4.2.0.0, 1.6 supports something in between..... Ah version 4.1.0. So I need to know if a profile is 4.0, 4.1 or 4.2 and compare that against the PDF versions. If they aren't valid then I need to downgrade to V 2. | 13:28.54 |
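The check kens outlines here could look roughly like the sketch below. It encodes only the pairings stated in the message above (version 2 always acceptable, 4.0 from PDF 1.5, 4.1 from PDF 1.6, 4.2 from PDF 1.7); the names `MIN_PDF_FOR_ICC_V4` and `profile_allowed` are hypothetical and are not Ghostscript code.

```python
# Minimum PDF version for each ICC v4 profile version, as stated in the
# chat message above. Versions are (major, minor) tuples.
MIN_PDF_FOR_ICC_V4 = {
    (4, 0): (1, 5),
    (4, 1): (1, 6),
    (4, 2): (1, 7),
}


def profile_allowed(icc_version, pdf_version):
    """Return True if a profile of `icc_version` may be embedded in a PDF
    of `pdf_version`. Hypothetical helper for illustration only."""
    if icc_version[0] < 4:
        # Version 2 profiles are treated as always acceptable.
        return True
    needed = MIN_PDF_FOR_ICC_V4.get(icc_version)
    if needed is None:
        # Unknown v4 minor version: be conservative and refuse.
        return False
    return pdf_version >= needed


for icc, pdf in [((2, 4), (1, 3)), ((4, 2), (1, 6)), ((4, 1), (1, 6))]:
    verdict = "embed as-is" if profile_allowed(icc, pdf) else "downgrade to V 2"
    print(icc, pdf, verdict)
```

When `profile_allowed` returns False the caller would take the fallback described above and downgrade the profile to version 2.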
velix | Working with ICC profiles is always fun ;) | 13:29.49 |
kens | Well, it's more the case of someone wanting to take an existing PDF file and downgrade it to an earlier version | 13:30.10 |
| Or PDF/A or something | 13:30.20 |
velix | ah, I see. | 13:30.23 |
| I like Ghostscript's NoFont...thing. | 13:30.35 |
| .) | 13:30.42 |
kens | That flattens the fonts to vectors I think | 13:30.48 |
velix | yes. | 13:30.51 |
| Very nice feature. | 13:30.57 |
kens | It used to be done with -dNOCACHE | 13:31.05 |
| But we wanted to remove that I think | 13:31.12 |
velix | oh | 13:31.20 |
kens | Also I think the NOCACHE meant that it slowed performance | 13:31.30 |
| But I can't recall exactly right now | 13:31.40 |
velix | Ok :) | 13:31.46 |
kens | I'm fairly sure there was a good reason for it | 13:32.29 |
| According to the commit (http://git.ghostscript.com/?p=ghostpdl.git;a=commit;h=8d3081c0403a1d911a79dce57008ede4279d050a) it was something to do with a discussion on gs-devel | 13:33.35 |
| Ah, and it was people using pswrite which triggered it. We deprecated that device years ago, and then removed it. Which upset some people who hadn't been paying attention | 13:34.05 |
mvrhel_laptop | Hi kens Do you want to chat about the color issue now? | 14:49.42 |
kens | If you think the meeting is over :-) | 14:49.53 |
mvrhel_laptop | I think my part is | 14:50.00 |
kens | What I encountered was in lifting the code from znumicc_components, when I ran 405-01.ps through the code it was tripping over the ICC profile having a cs_data set to GS_UNDEFINED. | 14:50.32 |
| So we didn't write the profile. | 14:50.37 |
| Yet before I wrote that code (so we embedded the profile) it would work. | 14:50.56 |
| Turns out the profile when read back from the PDF file has a cs_data which is not GS_UNDEFINED | 14:51.12 |
| For now I've chosen to permit embedding ICC profiles where the cs_data is GS_UNDEFINED, but this really doesn't feel right to me. | 14:51.45 |
| If you want to take a look at the code, just pull the latest code from master and I'll tell you where to set breakpoints | 14:52.12 |
mvrhel_laptop | I thought my part was over.... | 14:53.14 |
kens | Nearly :-) | 14:53.41 |
mvrhel_laptop | kens ok let me get updated | 14:53.58 |
kens | Yes probably best | 14:54.03 |
| Hehe not over yet either :-) | 14:54.15 |
mvrhel_laptop | kens ok I am up and running | 15:02.14 |
kens | :-) | 15:02.17 |
mvrhel_laptop | don't know if I have 405-01.ps | 15:02.23 |
kens | So in gdevpdfk.c around line 296 | 15:02.30 |
| I do a switch on cmm_icc_profile_data->data_cs | 15:02.45 |
| What I see is that certain PostScript files get here with an ICC profile which has (I believe) been created from a PostScript CIEBased space | 15:03.23 |
| Those profiles have a cs_data which is set to GS_UNDEFINED | 15:03.38 |
mvrhel_laptop | hmm is that in the current code? | 15:03.59 |
| i.e. the switch on cmm_icc_profile_data->data_cs | 15:04.10 |
kens | Yes, I committed it earlier | 15:04.18 |
mvrhel_laptop | hmm hold on | 15:04.27 |
kens | This commit: | 15:04.41 |
| http://git.ghostscript.com/?p=ghostpdl.git;a=commit;h=f06d2b00d005012379d4a2f6f47aab99b7b13b5c | 15:04.41 |
mvrhel_laptop | yes ok hold on | 15:04.55 |
| ok your line number was off | 15:08.39 |
| not 296 | 15:08.43 |
kens | Hmm, sorry must have misread it | 15:08.49 |
mvrhel_laptop | 796 | 15:08.56 |
kens | Ah yes, did misread it | 15:09.02 |
| 2 for 7 | 15:09.08 |
| So as I say when I process the PostScript file, the profile has a cs_data which is GS_UNDEFINED. | 15:09.33 |
| However, if I embed the profile anyway, then process the PDF file with Ghostscript, when we get to znumicc_components() the cs_data is either 2 or 3, not 0 (GS_UNDEFINED) | 15:10.14 |
mvrhel_laptop | right. as it has read the data out of the profile | 15:10.38 |
| I think we just need to get this thing populated when we create the profile | 15:10.49 |
kens | Yep that was my suspicion, it's simply not being set | 15:11.00 |
| For the release I'm happy enough with treating GS_UNDEFINED as acceptable, but it would be nice to get it turned into something else | 15:11.33 |
| The other thing was the version number | 15:11.43 |
mvrhel_laptop | yes the version number | 15:11.56 |
| I started working on that | 15:12.00 |
| and then got distracted with my paper | 15:12.06 |
kens | Oh cool | 15:12.06 |
mvrhel_laptop | :) | 15:12.07 |
kens | There's no rush | 15:12.11 |
| I won't put this in the release | 15:12.17 |
| But it would be useful to have. | 15:12.24 |
mvrhel_laptop | ok, but I will try to fix both of these this week. | 15:12.34 |
kens | Some versions of PDF can accept different versions of ICC profile | 15:12.39 |
mvrhel_laptop | I don't want to forget about them | 15:12.42 |
kens | version 4 ICC profile that is | 15:12.45 |
mvrhel_laptop | right | 15:12.50 |
kens | I should check the PDF 2.0 spec too, to see if that's changed..... | 15:13.02 |
| Didn't occur to me before | 15:13.06 |
| Though I think 4.2.0 is the current version anyway | 15:13.18 |
| Yeah no change there | 15:14.20 |
| I'd suggest you do your paper first since there's a deadline on that | 15:14.37 |
mvrhel_laptop | well I think I can split my day and do both | 15:15.06 |
kens | Up to you :-) I'll leave the bug open for now as a reminder | 15:15.19 |
mvrhel_laptop | thanks kens | 15:15.25 |
kens | Thanks to you for looking at it :-) | 15:15.33 |
mvrhel_laptop | np | 15:15.43 |
kens | Tomorrow I can do the release bitmap checking and then back to paragraphs and columns again | 15:15.59 |
velix | Can mupdf or ghostscript flatten a PDF's layers? | 20:08.34 |
kens | for the logs, this question was answered on the #mupdf channel | 20:39.44 |
| (flattening layers) | 20:40.04 |
velix | kens: Guess what? MS is using a tool programmed by "Softek", which embeds EPS into WMF/EMF. | 21:05.06 |
| http://www.accesssoftek.com/file-format-conversions | 21:05.27 |
| Hehe. References: "Reverse-engineered Corel Draw files for conversion into WMF/EMF formats for the Microsoft Office Suite." | 21:05.46 |
| "The development of a complete EPS parser and interpreter " | 21:06.27 |
| They're really calling it "parser". | 21:06.34 |
| Yeah, they're just embedding the EPS. | 21:15.12 |
| VERY interesting. | 21:15.15 |
| EPS in WMF | 21:15.19 |
| kens: hehe yeah, gs really re-creates rectangles: 500 499.996 1000 1000 re | 22:48.31 |
| It takes the bad rounding from Corel and creates a new rectangle. | 22:48.43 |
| Perhaps rounding 499.996 to 500 could be implemented with a user switch? | 22:49.04 |
| -dRoundPrecision=2 | 22:49.26 |
| Interesting rectangle rounding: 250.0002 49.9997 100 100.0001 re | 23:35.59 |
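To make the suggestion concrete, here is a minimal sketch of what rounding the 're' operands to two decimal places would do to the two rectangles quoted above. The -dRoundPrecision switch is only velix's proposal in this log, not an existing Ghostscript option, and the helper name `round_re_operands` is hypothetical.

```python
def round_re_operands(operands, places=2):
    """Round the four operands of a PDF 're' operator to `places` decimals.
    Hypothetical post-processing step, not part of Ghostscript."""
    return [round(value, places) for value in operands]


# The two rectangles quoted in the log.
for operands in ([500, 499.996, 1000, 1000],
                 [250.0002, 49.9997, 100, 100.0001]):
    rounded = round_re_operands(operands)
    print(" ".join(f"{value:g}" for value in rounded), "re")
```

Rounding each operand independently can shift an edge slightly relative to neighbouring content, so a real implementation would need to consider that before snapping coordinates.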
| Can I just render a specific font to paths? | 23:44.56 |