| <<<Back 1 day (to 2018/03/27) | 20180328 |
ray_laptop | I've pushed a patch to rename lcms2art to lcms2mt to my repository. Any comments (RobinWatts and chrisl in particular) | 04:56.41 |
kens | Well GhostXPS comprehensively corrupts non-Latin filenames :-( | 08:28.48 |
| I think because it passes them through a param list | 08:29.15 |
| It looks like the same effect happens for PDF files too, which is more than unfortunate | 08:29.35 |
chrisl | In such cases, they should be converted to UTF8 before being handed onto the "internals" | 08:32.02 |
kens | Yeah it is converted to UTF-8 | 08:35.35 |
| But I don't think its converted back again when we try to open the file. I may be mistaken | 08:35.49 |
| Really I only just started looking at the problem. | 08:36.01 |
chrisl | Hmm, that's strange.... not sure how the device end would know the difference | 08:37.09 |
kens | Its not the device end, its the front end opening the file. Or at least, I think it is | 08:37.40 |
| I'm at pl_cursor_open and the filename is UTF-8, about to see where this goes | 08:38.08 |
chrisl | Ah, I bet it doesn't go through the gp_* API | 08:38.38 |
kens | Very probably not, debugging it in a second..... | 08:38.51 |
| Well, it calls gp_fopen() | 08:40.38 |
| Oh, no it doesn't | 08:40.49 |
chrisl | And I think it's the gp_fopen that converts from UTF-8 to wchar_t | 08:41.34 |
kens | My bad, it does call gp_fopen, which returns NULL | 08:41.53 |
| Round again to see why | 08:42.00 |
| utf8_to_wchar() returns garbage it seems | 08:42.46 |
chrisl | Well, that's probably not good.... | 08:43.24 |
kens | I'm going to have to guess that's the problem :) | 08:43.35 |
| Its possible the UTF-8 is garbage before I get there of course, I'll need to check that out in a minute | 08:44.00 |
| Ah, looks like utf8_to_wchar() simply isn''t dealing properly with non-Latin characters. Looks like it stops when it finds one. Odd, I'd have though GS used all this too | 08:44.49 |
| Well it definitely isn't processing all theinput characters | 08:47.05 |
| OK so in gp_wutf8.c, utf8_to_wchar(), line 42: | 08:49.58 |
| if ((c & 0xC0) != 0x80) | 08:49.58 |
| return -1; | 08:49.58 |
| That's where the code stops | 08:50.04 |
chrisl | Huh, I rather thought gs used the same code | 08:51.00 |
kens | and returns an error, which (of course!) we don't check | 08:51.01 |
| Yes I thought that too | 08:51.07 |
| I'm going to run the same filename through GS in a minute and see what's different. | 08:51.21 |
| We do detect the fact the the fopen failed, because the FILE* is NULL, so we correctly return an ioerror | 08:52.17 |
| OK lets see what GS does | 08:52.28 |
| OK so I'm clearly mistaken somewhere, GS does call the same code with the same result | 08:54.13 |
| And gives me the same error as the PDF interpreter did. | 08:55.41 |
| I suspect this may be due to me running the windowed binary, lets try the command line | 08:56.09 |
chrisl | And that code hasn't changed since 2011 | 08:56.24 |
kens | OK so if I run the command line code, then teh filename is different | 08:57.12 |
| And the utf8_to_wchar works as expected | 08:58.25 |
| So I have to beleive that the UTF8 in GXPS is in fact wrong. | 08:58.39 |
chrisl | I don't follow - command line code?? | 08:58.59 |
kens | gswin32c instead of gswin32 | 08:59.10 |
chrisl | Oh, that's not good | 08:59.26 |
kens | Nope :-) | 08:59.34 |
| Running gswin32c <filename> appears to be subtly different from running gswin32 <filename> | 08:59.55 |
| gswin32c works | 09:00.02 |
chrisl | Eek! | 09:00.06 |
kens | It 'looks like' the UTF8 when I run gswin32 is wrong the same way as when I run gxps so I think I'll go back to gxps because the code is simpler to follow | 09:01.07 |
| Need to see where the filename is converted to UTF8 because it does appear that whatever we're seeing is not correct for some reason. | 09:01.36 |
| Ignore me while I burble randomly..... | 09:01.56 |
| So it 'loooks like' the filename is UTF8 when we run gswin32c and its wchar when we run gxps and gswin32c. Unsurprisingly, trying to take a wchar, treat it as UTF8 and create a wchar from it results in bad voodoo | 09:12.40 |
| I can understnad that when we deal with filenames in the PostScript world "(....) (r) file" then the string has to be UTF8. I'm less clear on why it *isn''t* UTF8 when we run gswin32, but is when we run gswin32c. It also looks like the GXPS code needs to convert the wchar to UTF8 and its not doing that step...... | 09:14.19 |
| This is, of course, Windows-specific code, Linux already gives us UTF8 filenames | 09:14.40 |
| Hmm, the GXPS code looks like it does convert to UTF8. So what am I missing ? :-( | 09:15.42 |
chrisl | gswin32c converts to utf8 | 09:15.51 |
kens | Yeah I think gswin32 doesn't though | 09:16.09 |
| I don't know why..... | 09:16.14 |
| The filenames are different when I look in gp_fopen() | 09:16.30 |
| In GS we end up here from zlibsystemfile or something like that | 09:16.44 |
chrisl | I think gswin32 conversion to UTF8 maybe wrong | 09:17.01 |
kens | That could be it, I'm sort of concentrating o ngxsp at the moment | 09:17.23 |
| Becaus eon the whole it looks simpler :) | 09:17.30 |
| gxps definitely converts the filename to UTF8 :-( | 09:18.21 |
chrisl | It is wchar_t to start with, though? | 09:18.45 |
kens | Yes it is | 09:18.59 |
| It seems like there's a double conversion going on somewhere, by the time I get to gp_fopen() the filename has changed again | 09:19.24 |
| I need to find out where that is happening | 09:19.31 |
| Trying to remember what sequence of binary is correct (out of three different ones) is taxing me this morning. I shoudl probably write them all down | 09:20.15 |
chrisl | kens: I'd look at what's happening in base/gsargs.c | 09:22.59 |
kens | Yeah I'm in there now :-( | 09:23.22 |
| I'm thinking that;s the problem | 09:23.32 |
| I hate param lists :-( | 09:24.12 |
| The filename is definitely b*ggered by the time we try to process it in pl_main_process_options | 09:25.48 |
| OOps, I shoudln't be here! | 09:25.54 |
| Back in a couple of hours | 09:25.59 |
Robin_Watts | ray_laptop: (For the logs) Commit looks good. | 10:37.07 |
| ray_laptop: (For the logs) We need to update the WhyThisFork.txt file to mention that we changed cmsChangeBuffersFormat. | 10:37.41 |
kens | Anyone know if the JPEG devices are actually capable of writing to stdout ? | 12:49.57 |
| If not I'll try and find out in a bit. | 12:50.19 |
| Hmm, well a quick look doesn't seem like it should be a problem. Oh well, another problem to look at. | 12:53.01 |
| OK so the problem with the Cyrillic filename is (as Chris suggested) that in gsargs.c, arg_next() we do : | 12:57.42 |
| i += codepoint_to_utf8(&cstr[i], c); | 12:57.42 |
| Since the filename argument has laready been converted from wchar to UTF8, this pretty much corrupts it I think. | 12:57.42 |
Robin_Watts | codepoint_to_utf8 should do nothing in the case where the arg encoding is UTF8. | 13:03.17 |
kens | How does it know ? | 13:03.28 |
Robin_Watts | hmm. ignore me maybe. | 13:04.38 |
| Does this only go wrong with gxps builds? | 13:04.47 |
kens | It definitely does change the content | 13:04.49 |
| At the moment I can say it goes wrong with gxps and gswin32 on Windows, it does not go wrong with gswin32c on Windows (no idea why yet), I have not yet tried gpcl | 13:05.24 |
Robin_Watts | Are gswin32 and gswin32c entered differently? | 13:06.02 |
kens | The symptom for gswin32 and gxps are the same, the filename is corrupted, its the same filename in each case, corrutped the same way, and it appears to be due to applying UTF8 encoding twice | 13:06.02 |
| I don't know yet, I decided to concentrate o gxps because the code is easier to follow. Ghostscritp gets the arguments from PostScript.... | 13:06.29 |
| Don't feel you have to respond to me BTW I'm just thinking out loud here. | 13:06.50 |
chrisl | It'll go wrong with gpcl6, too because it uses the same front end as gxps | 13:06.56 |
Robin_Watts | yeah, WinMain vs wmain | 13:07.01 |
kens | Yes I expected that, but since I hadn't tried it, I didn't like to say | 13:07.09 |
| My suspicion woudl be that gswin32c doesn't UTF8 encode the filename twice :-) | 13:07.34 |
| Let me puddle onwards.... | 13:07.43 |
| Looks like its not that simple :-( | 13:08.49 |
| If I don't do the doube encoding I stil get the wrong answer out of utf8_to_wchar | 13:09.10 |
Robin_Watts | I suspect new_main in dwmain.c should be setting the arg encoding to UTF8. | 13:09.53 |
kens | Sorry that's a bit of code I'm not looking at right now. I didn't even know we could state the arg encoding | 13:10.21 |
Robin_Watts | WinMain (and wmain) are both called with wchar_t's. | 13:10.41 |
kens | Right | 13:10.50 |
Robin_Watts | Both convert the wchar_t's into utf8. | 13:10.53 |
kens | Yeah same as the gxps code | 13:11.01 |
Robin_Watts | That then gets fed into the gsapi_init_with_args stuff. | 13:11.09 |
kens | gp_fopen converts the UTF8 back to wchar | 13:11.17 |
Robin_Watts | kens: as it should. | 13:11.25 |
kens | and uses Windows calls to open the file | 13:11.30 |
| Yes, when the name is correctly UTF9 encoded, this works (gswin32c) | 13:11.42 |
Robin_Watts | the difference, I think between the WinMain and wmain routes is that wmain calls main_utf8, which calls gsdll.set_arg_encoding(instance, GS_ARG_ENCODING_UTF8); | 13:12.13 |
| and the new_main route does not call that. | 13:12.24 |
| That call should happen before the init_with_args call. | 13:12.33 |
kens | Well that sort of sounds like it | 13:12.40 |
Robin_Watts | see dwmain.c line 680 | 13:12.53 |
kens | Though I don't think that will hel pwith gxps | 13:12.55 |
Robin_Watts | see dwmainc.c line 680 | 13:13.00 |
| I bet I forgot to call it in the gxps case too. | 13:13.13 |
kens | Possibly, but gsargs doesn't seem to cater for it. Unless possibly the copdepoin_to_utf8 does ? | 13:13.51 |
| I haven't actually looked in that routine | 13:13.59 |
Robin_Watts | kens: That code is an utter mindfuck. | 13:14.15 |
kens | Its certainly challenging me this morning :-) | 13:14.28 |
Robin_Watts | Take if from me that if the code is passing in UTF8, and hasn't set the arg encoding to UTF8 that routine will go wrong. | 13:14.56 |
kens | Hmm, I see the call to gsdll.set_arg_encoding | 13:14.56 |
| Yeah it does go wrong that's for certain :-D | 13:15.07 |
| I'll poke this with a shaper stick for a bit, thanks | 13:15.17 |
Robin_Watts | Good luck, and thanks. | 13:15.25 |
kens | Ah Henry came to the same conclusion about the PCL problem as I did, the enmuerated value is totally weird. | 13:22.54 |
| Afternoon HenryStiles I notice you concluded the same as I did about that PCL report. That's a very peculiar value | 13:26.13 |
| I checked the latest PXL spec I could find and it only has the same three values | 13:26.28 |
HenryStiles | kens: yeah I really hate putting exceptions like that in the interpreter, but my printer works. | 13:27.20 |
kens | Oh, I never thought to try my printer, I guess I shoul dhave | 13:27.34 |
| Do you think HP ignores it ? | 13:28.03 |
HenryStiles | seems to do that | 13:28.15 |
kens | Oh, well I suppose that's understandable at least | 13:28.29 |
chrisl | That seems like the kind of PCL5 crap they were trying to get away from with PXL - ignore crap input, and carry on regardless :-( | 13:29.01 |
kens | Hmm so we set (gs_main_instance *) minst->get_codepoint based on the type of argument encoding. Which is fine. Presumably we use that somewhere to 'decode' the UTF8 and because we set it to NULL, it simply doesn;'t so anything. So at least the first part of the problem is that the code in gsargs doesn't do that, its just calls codepoint_to_utf8 | 13:30.38 |
HenryStiles | chrisl: yeah for the most part XL does a good job with that, but there are exceptions. | 13:30.48 |
kens | Oooh, but some code in gsargs *does* use get-codepoint. | 13:31.41 |
HenryStiles | chrisl, kens : I thought Robin_Watts fixed all the filename argument stuff sometime ago. | 13:31.51 |
kens | Yeah it seems not | 13:31.58 |
Robin_Watts | kens: Is the arg encoding being set to utf8 ? | 13:32.10 |
kens | Robin_Watts : not in gxps, but at the moment the gxps code isn't calling anything which uses minst->get_codepoint either | 13:32.40 |
| I was just rambling through the gswin32c code seeing what setting the encoding actually did. | 13:33.03 |
Robin_Watts | kens: Ok. I would strongly suggest that you make it be set to utf8 before you do anything else, otherwise it will never work. | 13:33.14 |
kens | Yes, one of the reasons for doing this was exactly to see how to do it :-) | 13:33.41 |
Robin_Watts | I'm perfectly prepared to believe there are cases in there that are still wrong. | 13:33.46 |
kens | Its sort of weird certainly | 13:33.57 |
| Its definitly a complicated area | 13:34.12 |
Robin_Watts | The code is an utter mindfuck. it has to cope with args being decoded, then pushed back into the arg list, so you mustn't decode them a second time. | 13:34.34 |
kens | plmain.c seems to set gp_local_arg_encoding() which is probably not correct :-) | 13:34.47 |
| But it looks like gxps doesn't set it at all | 13:35.10 |
chrisl | There's a coffee out there with my name on it - I'm going to track it down....... | 13:35.46 |
kens | Aha, I see the get_codepoint call | 13:35.52 |
Robin_Watts | I am being defeated by simple ****ing arithmetic at the moment. | 13:36.32 |
kens | Ah, its set by arg_init. Right, finally feel I'm making some forward progress instead of floundering | 13:36.34 |
| Arithmetic is always a trial for real mathematicians :-) | 13:36.58 |
| OK looks like getting the encoding correct will do the job, manually hacking it seems to work. Thanks for the hint Robin_Watts | 13:46.52 |
Robin_Watts | kens: No worries. | 13:47.00 |
kens | Robin_Watts : car to cast a quick eye over: | 14:25.46 |
| http://git.ghostscript.com/?p=user/ken/ghostpdl.git;a=commit;h=33701ac07115cb2f634a40ab73a5c127ca870ad8 | 14:25.46 |
| Particularly the change in plmain.c. I think dwmain.c is non-controversial because its a copy of what's in dwmainc.c | 14:25.46 |
Robin_Watts | I can't really comment on plmain.c without some digging, cos I have no idea what the code there does. | 14:31.04 |
kens | Oh it sets arg_codepoint, which is broadly the same as the set_encoding done in dwminc.c | 14:31.43 |
Robin_Watts | Do gxps etc now set the arg encoding with this change? | 14:31.43 |
kens | It doesn't set the encoding as such, but arg_init gets set up with the correct 'encoding' | 14:32.10 |
Robin_Watts | right. | 14:32.15 |
kens | Normally that is set from minst->encoding (or something liek that0 which is set by set_encoding | 14:32.26 |
| This has the same effect overall | 14:32.33 |
Robin_Watts | It looks to me like arg_get_codepoint will now be set to NULL in almost all cases. | 14:32.54 |
kens | Yes | 14:33.05 |
Robin_Watts | cos we "never" build with GS_NO_UTF8 | 14:33.11 |
kens | You would have to set GS_NO_UTF8 to get it non-NULL | 14:33.17 |
Robin_Watts | right. Well, if this works, I'm happy. | 14:33.27 |
kens | It definitely works for me, and it fixes gxps and gpcl at the same time | 14:33.42 |
| OK so I'll commit it. We may need further work, but its defintiely better than it was. | 14:33.57 |
| I guess this is something we may also need to address with language switching. I don't htink the current PL approach is going to be sufficient, its too primitive | 14:34.31 |
Robin_Watts | mvrhel_laptop: ping. | 14:38.40 |
kens | OK on to JPEG output to stdout. | 14:40.00 |
| Coffee first I fancy | 14:40.06 |
| I see you scared Michael off Robin_Watts | 14:40.18 |
Robin_Watts | Indeed. | 14:40.33 |
| mvrhel_laptop: ping ? | 15:01.02 |
mvrhel_laptop | Morning Robin_Watts | 15:01.12 |
Robin_Watts | morning. | 15:01.16 |
| Quick question about the thresholding stuff. | 15:01.32 |
mvrhel_laptop | ok | 15:01.53 |
Robin_Watts | You apply a sign fix to both contone and screen before doing the subs. | 15:02.03 |
mvrhel_laptop | ok | 15:02.13 |
Robin_Watts | Is there a reason why you don't preapply that to the screen data? | 15:02.18 |
mvrhel_laptop | no | 15:02.29 |
Robin_Watts | good answer :) | 15:02.44 |
| Also, did we try doing the bit reversal in SSE? | 15:03.34 |
mvrhel_laptop | I thought you or Steve did | 15:03.45 |
Robin_Watts | By using the shuffle instruction before we movemask? | 15:03.49 |
mvrhel_laptop | and it was slower | 15:03.53 |
| but I dont know | 15:03.56 |
| for sure | 15:03.59 |
Robin_Watts | I have this memory of trying it, but I'm trying to think how it could have been slower! :) | 15:04.09 |
ray_laptop | Robin_Watts: Thanks for the comments on the lcms2mt rename | 15:13.45 |
| Robin_Watts: oh, and I haven't yet gotten rid of cmsChangeBuffersFormat -- I've added the comment about "replacing" it with the Clone function in the "Why" doc. I'll get rid of the Change function and push an amended change | 15:23.32 |
| later, however. I have an errand to run. | 15:23.54 |
Robin_Watts | ray_laptop: Yeah, but we should document the new function in the "Why" doc, and say why it's a problem. | 15:24.46 |
| I am happy to do that after you put your commit in though if you'd rather. | 15:25.28 |
ray_laptop | Robin_Watts: (part of) the explanation I had was "replace cmsChangeBuffersFormat which cannot be used to alter a link profile that might be in use by another thread." It doesn't explain "why" it can't be used by another thread. Should I state that as well? | 16:27.57 |
Robin_Watts | ray_laptop: I'd say something like "The goal of this work is to allow transforms to be used in multiple threads at the same time. Sadly cmsChangeBuffersFormat conflicts with this aim. Accordingly, we've replaced it with cmsCloneTransformWithChangedFormats (or whatever it was), which allows safe operation with (almost) as low a cost." | 16:30.19 |
ray_laptop | Robin_Watts: mvrhel_laptop: I've moved the relevant part of cmsChangeBuffersFormat into CreateNamedColorDevicelink where it was used. | 16:51.59 |
mvrhel_laptop | ray_laptop: sounds ood | 16:52.10 |
ray_laptop | and gotten rid of the former | 16:52.12 |
mvrhel_laptop | and even good | 16:52.14 |
Robin_Watts | cool. | 16:52.16 |
mvrhel_laptop | Robin_Watts, and ray_laptop: ok another version now on casper | 17:24.13 |
Robin_Watts | looking. | 17:24.21 |
mvrhel_laptop | hopefully we are pretty much there now | 17:24.26 |
ray_laptop | Robin_Watts: I didn't use your wording exactly, but I added more information that discusses the problem and reasoning for the change. | 17:27.00 |
| mvrhel_laptop: looking... | 17:27.06 |
Robin_Watts | cool. | 17:27.09 |
ray_laptop | mvrhel_laptop: well, actually still downloading. | 17:28.25 |
mvrhel_laptop | Yeah I need to replace that pdf of the images with just a single image | 17:29.08 |
| I did that with the altona files | 17:29.23 |
ray_laptop | mvrhel_laptop: oh, yeah, that might help. It doesn't need 95M pixels :-) | 17:29.44 |
mvrhel_laptop | including the v2 altona one natively really made it large (and slow) | 17:29.46 |
Robin_Watts | So, what name did we settle on? :) | 17:31.38 |
| Introduction, paragraph 3, "with out" | 17:32.49 |
mvrhel_laptop | name | 17:33.06 |
| ? | 17:33.07 |
Robin_Watts | lcms-mt or lcms2-mt ? | 17:33.21 |
| There was discussion of this yesterday on here. | 17:33.34 |
mvrhel_laptop | ok. Not that I was part of | 17:33.43 |
| :) | 17:33.45 |
| I am fine with the code having an lcms2 designation if that is what lcms has | 17:34.08 |
| ie. lcms2 become lcms2mt | 17:34.20 |
Robin_Watts | "This severely limits performance capabilities of applications..." "performance capabilities" ? not just "performance" ? | 17:34.24 |
mvrhel_laptop | Robin_Watts: I would prefer if you would write up your comments in an email if you could | 17:35.11 |
| it would make it easier for me | 17:35.15 |
Robin_Watts | Sure. | 17:35.19 |
| On the website he calls it "LittleCMS" | 17:35.30 |
mvrhel_laptop | yes | 17:35.33 |
Robin_Watts | but all the downloads etc are "lcms2-2.9", and the directories are "lcms2-2.9" etc. | 17:35.51 |
| so either we should be "LittleCMS-MT" or we should be "lcms2-mt", IMAO. | 17:36.18 |
mvrhel_laptop | I might be ok with LittleCMS-MT. lcms2-mt I am not ok with in the paper | 17:37.10 |
ray_laptop | I think LittleCMS-MT is consistent with Marti's docs and usage | 17:38.52 |
mvrhel_laptop | yes | 17:39.04 |
| we will go with LittleCMS-MT | 17:39.38 |
Robin_Watts | cool. | 17:40.19 |
ray_laptop | chrisl: I'd appreciate it if you could give the configure.ac and makefile changes a look. I know it builds on linux, but that is the extent of the validation I have (my repos for renaming to lcms2mt) | 17:44.39 |
| Robin_Watts: please have a look at the WhyThisFork in what I just pushed, please (mvrhel, too??) | 17:48.08 |
Robin_Watts | Will do, still reading. | 17:48.20 |
mvrhel_laptop | Robin_Watts: getting ready to head out for a bit. I will look at what you have when I return... | 17:59.37 |
ray_laptop | mvrhel_laptop: I don't see problems with the paper now | 18:33.13 |
chrisl | ray_laptop: Oops, sorry I forgot to reply - I looked the lcms2mt changes earlier, and I think it's okay | 19:16.13 |
ray_laptop | chrisl: thanks | 19:20.50 |
mvrhel_laptop | Robin_Watts, or ray_laptop you around? | 20:52.31 |
| or HenryStiles ... | 20:52.59 |
HenryStiles | yup | 20:55.32 |
mvrhel_laptop | HenryStiles: could you bring chains off the regression testing for me while I run some timing tests? | 20:55.55 |
| I dont have super user ability | 20:56.02 |
| to do this | 20:56.05 |
| there it goes. Thanks HenryStiles | 21:00.36 |
| I will let you know when I am done | 21:00.46 |
HenryStiles | okay I never did it before and had to figure it out | 21:00.48 |
| mvrhel_laptop: leave a note here and I'll enable it later, have an errand | 21:27.03 |
mvrhel_laptop | HenryStiles: ok thanks. It is going to be awhile | 21:27.24 |
| bbiab | 22:10.10 |
| Forward 1 day (to 2018/03/29)>>> | |