| | 2012/12/12 |
tkamppeter | chrisl, you asked for me yesterday? | 09:15.22 |
chrisl | tkamppeter: yeh, we've got a cups bug that's rumbling on in launchpad..... | 09:15.54 |
| Well, cups with gs, I should say | 09:16.07 |
| tkamppeter: https://bugs.launchpad.net/bugs/978120 | 09:16.30 |
tkamppeter | chrisl, I have seen the last comments. Can you post on the bug with which parameters you generated the working PostScript file? Then I can add the appropriate exception to the pdftops filter, for Toshiba printers. | 09:18.58 |
chrisl | tkamppeter: I didn't generate it, I modified it by hand. But, IIRC, we already have a couple of printers that disable *all* the compression from ps2write, so it should be the same as those. | 09:20.50 |
| But I suggest we wait to hear back from a few other people (hopefully) trying it out | 09:21.23 |
| tkamppeter: as long as that bug is "on your radar" again, I don't think we need anything from you until we get a few more people running the test job..... | 09:35.39 |
| and now I'm off to play squash | 09:35.51 |
Robin_Watts | samples_mupdf_001.zip and samples_gs_001.zip are now uploaded to my casper home dir. | 14:45.13 |
| Hi marcosw. | 16:33.01 |
marcosw | morning Robin_Watts | 16:33.08 |
Robin_Watts | You know the company we visited last week? Well, their security team has been looking at gs and mupdf (independent of our visit). | 16:33.36 |
| They've sent a couple of archives of files that cause crashes. | 16:33.54 |
| I've uploaded them to casper in my home directory. | 16:34.09 |
| samples_{gs,mupdf}_001.zip | 16:34.17 |
| chrisl (or anyone else)... so we have the gsapi interface designed for people to drive the gs lib. | 16:36.31 |
| We also have a gsdll interface. | 16:36.39 |
| Which seems to wrap the gsapi one. | 16:36.56 |
| Is gsdll windows specific ? | 16:37.01 |
marcosw | Robin_Watts: There appear to be several :-) files in each archive | 16:37.23 |
Robin_Watts | no, I'm seeing macos stuff in there too... | 16:37.31 |
| marcosw: Oh yes. All with "unique stack traces" apparently. | 16:37.49 |
marcosw | so I should enter one bug for each file and assign them all to you? | 16:38.22 |
Robin_Watts | marcosw: I have no idea. I've not had a chance to look at any of them yet. | 16:38.46 |
| But henrys said I should share the archives with you, so... there you are. | 16:39.01 |
henrys | marcosw:they should be treated like customer bugs | 16:39.21 |
Robin_Watts | henrys: There are quite a few files... | 16:39.42 |
marcosw | something close to 2000 PDF files | 16:39.56 |
| though that's both the mupdf and gs archives, so there may be overlap. | 16:40.15 |
| henrys: I'll ask miles/joann to generate a customer number of the potential customer, so that we can track the bugs. | 16:40.48 |
henrys | marcosw:okay. | 16:41.43 |
| the dll is supposed to work on linux, windows and mac | 16:42.42 |
| last I looked at it. | 16:42.52 |
Robin_Watts | ok. | 16:42.56 |
henrys | marcosw:since there are so many do you want to split up the tests among the staff? | 16:48.19 |
Robin_Watts | henrys, marcosw: Presumably marcosw is going to look at the gs ones and leave the mupdf ones to me ? | 16:50.02 |
marcosw | henrys: presumably the mupdf ones are the most important, since that's what they are discussing licensing, so I was going to look at those first. Is there anybody other than tor8 and Robin_Watts I should assign bugs to? | 16:50.05 |
henrys | not for mupdf bugs | 16:50.54 |
marcosw | Robin_Watts: I was going to go through all the bugs and check to see if they can be duplicated with master before entering them. | 16:51.16 |
Robin_Watts | If it turns out that some/most of the bugs have been fixed already, then that would be appreciated, as it will save me/tor8 time. | 16:51.47 |
| If on the other hand, every file causes a bug to be opened, it may be more efficient to just have us do it as we go. | 16:52.15 |
marcosw | do we know what version of mupdf they tested with? | 16:52.24 |
Robin_Watts | marcosw: I would imagine 1.1 or earlier. | 16:52.39 |
marcosw | so there is hope that some have been fixed. | 16:53.08 |
henrys | were all these bugs produced in an android environment? | 16:53.14 |
marcosw | is there any suggestion that some of these are valid pdf files? i.e. are they expecting output or just not a segfault? | 16:54.07 |
Robin_Watts | henrys: The crash logs show what looks like x86 assembly to me. | 16:54.55 |
henrys | Was the associated email mailed to support and I missed it? | 16:55.05 |
marcosw | a lot of these are crashing in j2k_decode | 16:55.16 |
Robin_Watts | And it is indeed Mupdf-1.1 | 16:55.48 |
marcosw | or jbig2 or fz_paint | 16:55.52 |
henrys | I am certainly hoping they haven't found 2000 unique crashes, that would be very bad news. | 16:56.00 |
marcosw | only 1246 for mupdf :-( | 16:56.27 |
| what does "asan" mean? (the files all have SIGSEGV or asan as part of the filename). | 16:57.00 |
Robin_Watts | "address sanitiser" | 16:57.25 |
| That's the tool they've used to detect problems. | 16:57.36 |
marcosw | actually many of the files are duplicated, i.e. | 16:57.42 |
| 340 -rw-r----- 1 marcos marcos 340409 2012-12-01 12:40 925.pdf.asan.13.4249 | 16:57.44 |
| 340 -rw-r----- 1 marcos marcos 340409 2012-12-01 12:41 925.pdf.asan.38.4249 | 16:57.45 |
| 340 -rw-r----- 1 marcos marcos 340409 2012-12-01 12:40 925.pdf.asan.40.4249 | 16:57.46 |
| 340 -rw-r----- 1 marcos marcos 340409 2012-12-01 12:41 925.pdf.asan.50.4249 | 16:57.48 |
| 340 -rw-r----- 1 marcos marcos 340409 2012-12-01 12:40 925.pdf.asan.6b.4249 | 16:57.49 |
| 340 -rw-r----- 1 marcos marcos 340409 2012-12-01 12:41 925.pdf.asan.8.4249 | 16:57.51 |
| 340 -rw-r----- 1 marcos marcos 340409 2012-12-01 12:41 925.pdf.SIGSEGV.48.4249 | 16:57.52 |
| 340 -rw-r----- 1 marcos marcos 340409 2012-12-01 12:40 925.pdf.SIGSEGV.5fa.4249 | 16:57.54 |
| 340 -rw-r----- 1 marcos marcos 340409 2012-12-01 12:40 925.pdf.SIGSEGV.745.4249 | 16:57.55 |
| 340 -rw-r----- 1 marcos marcos 340409 2012-12-01 12:41 925.pdf.SIGSEGV.f4c.4249 | 16:57.55 |
Robin_Watts | http://code.google.com/p/address-sanitizer/wiki/AddressSanitizer | 16:58.23 |
marcosw | any idea what that means? Does 925.pdf crash in 10 different ways? | 16:58.34 |
henrys | 4249 is likely the process number so how many of those do we have? | 16:58.46 |
Robin_Watts | I suspect that each file is for a different detected callstack? | 16:59.15 |
henrys | with the same exact size? | 16:59.30 |
marcosw | there appear to be 542 unique PDF files. | 16:59.45 |
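The counting step above can be sketched as follows. This is a minimal illustration, assuming every sample is named `<name>.pdf.<kind>.<tag>.<id>` as in the listing; the meaning of the trailing fields is only guessed at in the discussion, so the sketch treats them as opaque. The helper name `group_by_pdf` is hypothetical.

```python
from collections import defaultdict


def group_by_pdf(filenames):
    """Group crash-sample names such as '925.pdf.asan.13.4249' by PDF name."""
    groups = defaultdict(list)
    for name in filenames:
        marker = name.find(".pdf.")
        if marker == -1:
            # Name does not follow the assumed pattern; keep it as its own group.
            groups[name].append(name)
            continue
        base = name[: marker + len(".pdf")]
        groups[base].append(name)
    return dict(groups)


samples = [
    "925.pdf.asan.13.4249",
    "925.pdf.asan.38.4249",
    "925.pdf.SIGSEGV.48.4249",
]
groups = group_by_pdf(samples)
print(len(groups), "unique PDF name(s)")
```

Grouping by name only shows how many distinct inputs there are; confirming that same-named samples are byte-identical would need a content comparison as well.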
Robin_Watts | but honestly, I've just downloaded the zips and reuploaded them at this point. I haven't done any digging beyond checking that each file is indeed a PDF file, and that the first one does indeed crash mupdf. | 17:00.02 |
| I'm trying to avoid being distracted from unicode hell. | 17:00.15 |
henrys | see, in 10 minutes we've reduced the workload to 1/4; at this rate we should be done in a few more minutes ;-) | 17:00.27 |
marcosw | Robin_Watts: give me a couple of hours to look at the files; i'll send an email when I have something to report. | 17:01.04 |
Robin_Watts | marcosw: Great, thanks. | 17:01.16 |
marcosw | Oops, it's 9:00, I have to run. Back online in a bit. | 17:01.19 |
Robin_Watts | I'll dig up their emails and post them to support. | 17:01.32 |
henrys | Robin_Watts: thanks | 17:01.55 |
kens2 | time to go, goodnight all | 17:03.16 |
Robin_Watts | Night kens2 | 17:03.32 |
| henrys, ray, anyone else interested: I've just put a new version of my Unicode changes up on bug 692381. My very limited testing suggests it works, but it would be good to have it sanity checked before I go any further. | 17:21.33 |
| If any of you have the time to look at the proposed change and decide whether you think it's reasonable etc, I would be very grateful. | 17:22.04 |
henrys | okay let's get chrisl to weigh in also before a commit | 17:23.57 |
Robin_Watts | I'd kinda like Sags to OK it too, given that he's the one who has been finding faults so far. | 17:24.39 |
| ok. Now to document GS_THREADSAFE etc. | 17:25.26 |
mvrhel_laptop | Hi Robin_Watts | 17:30.45 |
| so does the PAM format support 16 bit values? Do you just set the max value to 65535 and do you use big or little endian encoding? | 17:31.29 |
Robin_Watts | mvrhel_laptop: Hi | 17:38.52 |
| Hold on... | 17:38.54 |
| http://netpbm.sourceforge.net/doc/pam.html | 17:39.39 |
marcosw | Robin_Watts and henrys: the good news is that the crashes are all easily reproducible: mupdf <filename> is sufficient in all cases I've tried. Also none of the files appear to be valid, i.e. Acrobat can't read them. Also in many cases mupdf prints warnings and errors before crashing, i.e.: | 17:39.42 |
| mupdf(7372) malloc: *** mmap(size=1010085888) failed (error code=12) | 17:39.45 |
| *** error: can't allocate region | 17:39.45 |
| *** set a breakpoint in malloc_error_break to debug | 17:39.47 |
| error: malloc of array (-1047476 x 3136 bytes) failed (integer overflow) | 17:39.48 |
| error: out of memory | 17:39.50 |
| error: cannot draw xobject/image | 17:39.51 |
| warning: Ignoring errors during rendering | 17:39.52 |
| Bus error | 17:39.53 |
| Exit 138 | 17:39.54 |
Robin_Watts | That suggests that maxval can be 65535. | 17:40.06 |
marcosw | you'd think "out of memory" would not be a "Ignoring errors during rendering" condition :-) | 17:40.22 |
mvrhel_laptop | Robin_Watts: yes. OK thanks | 17:40.25 |
marcosw | I think I'll triage the files based on where the crash occurs (jp2k, jbig, fz_, etc). and open one bug for each category. | 17:40.39 |
Robin_Watts | and the data should be most significant byte first (i.e. the wrong way round :) ) | 17:40.40 |
mvrhel_laptop | alright. I am trying to nudge Max then to use this format. The one that he sent me does not include any depth (number of colorants) in the header. I was just not sure about the 16 bit handling of PAM but it would appear to be just fine | 17:41.46 |
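The 16-bit PAM layout discussed above can be sketched as follows. This is a minimal illustration, assuming the header fields from the netpbm PAM page linked earlier (WIDTH, HEIGHT, DEPTH, MAXVAL, ENDHDR) and two-byte, most-significant-byte-first samples when MAXVAL exceeds 255. The function name `encode_pam` is hypothetical and is not part of Ghostscript or MuPDF.

```python
import struct


def encode_pam(width, height, depth, samples, maxval=65535, tupltype=None):
    """Return PAM (P7) bytes for a flat sequence of integer samples."""
    if not 1 <= maxval <= 65535:
        raise ValueError("MAXVAL must be in 1..65535")
    if len(samples) != width * height * depth:
        raise ValueError("sample count does not match WIDTH*HEIGHT*DEPTH")
    if any(s < 0 or s > maxval for s in samples):
        raise ValueError("sample outside 0..MAXVAL")

    header = (
        f"P7\nWIDTH {width}\nHEIGHT {height}\n"
        f"DEPTH {depth}\nMAXVAL {maxval}\n"
    )
    if tupltype:
        header += f"TUPLTYPE {tupltype}\n"
    header += "ENDHDR\n"

    # One byte per sample up to MAXVAL 255, otherwise two bytes, big-endian.
    code = "H" if maxval > 255 else "B"
    body = struct.pack(f">{len(samples)}{code}", *samples)
    return header.encode("ascii") + body


# A 1x1 image with two colorants, 16 bits each.
data = encode_pam(1, 1, 2, [0x1234, 0xABCD])
print(data)
```

DEPTH carries the number of colorants per pixel, which is the field reported missing from the other format.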
Robin_Watts | marcosw: The idea is that in the case of such errors we ignore them and continue as best we can, so the user gets *something* on the screen. | 17:41.48 |
henrys | Robin_Watts:the patch seems reasonable to me except the trivial nit that I don't like the term "rune" | 17:41.57 |
Robin_Watts | but we leave an indication there that the rendering is incomplete so that callers can expose that to the user somehow. | 17:42.40 |
mvrhel_laptop | rats. I am at the coffee shop and left my external drive with the PDF FTS files at home. I needed it to work on my 2 P1 customer bugs | 17:42.46 |
Robin_Watts | henrys: I should perhaps have used 'codepoint' | 17:42.55 |
mvrhel_laptop | with SVN I should be able to get the individual files though | 17:43.06 |
Robin_Watts | mvrhel_laptop: If there is a particular file, I could mail it. | 17:43.11 |
mvrhel_laptop | oh that would work | 17:43.19 |
| hold on | 17:43.22 |
| fts_25_2526.pdf | 17:43.41 |
Robin_Watts | or you can scp from peeves if you have that set up. | 17:43.54 |
mvrhel_laptop | and fts_14_1418.pdf | 17:44.00 |
| I used to be able to get to peeves. Is that where they are? | 17:44.16 |
Robin_Watts | They are on every cluster node. | 17:44.28 |
mvrhel_laptop | let me try peeves | 17:44.36 |
Robin_Watts | /home/marcos/cluster/tests/... or /home/marcos/cluster/tests_private/... | 17:44.53 |
mvrhel_laptop | ok I am on peeves | 17:44.57 |
| ok. this should work just fine. thanks Robin_Watts | 17:46.13 |
Robin_Watts | np. | 17:46.19 |
| hey sags | 17:50.57 |
sags | @Robin_Watts, about the GSDLL interface (for the logs): A comment in base\gsdll.h says /* This interface is deprecated and will be removed in future ghostscript releases. Use the interface described in API.htm and iapi.h. */. | 17:50.59 |
| However, I have absolutely no idea how many people use (or rely on) it. Maybe it's better to keep it. | 17:51.07 |
Robin_Watts | sags: Yeah, I added the obvious entrypoint just in case. | 17:51.34 |
| Does the proposed patch meet with your approval? | 17:52.01 |
sags | I didn't know there is a new patch until now, so have not looked. | 17:52.33 |
Robin_Watts | sags: Ah. I only uploaded it 20 minutes ago or so. I thought that was what had prompted you to appear :) | 17:53.19 |
mvrhel_laptop | hmm my ubuntu under hyper-v seems snappier today | 17:53.39 |
sags | Anyway, I'm thinking about a different way to handle the @files charset, which eliminates the need to store it in the context/ etc -- "sniff" the charset. | 17:53.41 |
| As a side effect, it solves a problem that I always forget about, the BOM. On Windows, it's usual that UTF-8 files contain a BOM at the beginning, and this has to be skipped. Also UTF-16 files need a BOM to know the endianness. | 17:55.29 |
Robin_Watts | sags: I'm not sure I follow. With the proposed patch, we set the encoding up front. "sniffing" implies guesswork to me... | 17:55.37 |
| Possibly we should update the patch to skip the BOM if it is met (and is the right type), and to error out if the wrong type is met. | 17:56.48 |
sags | Yes, some guesswork, but I think it's better overall. In the end, an @file can be created by a different tool than the GSAPI client (for example be more of a "config" file), so there's not an absolute connection between the GSAPI function used and the encoding of @files. | 17:57.28 |
Robin_Watts | Well, there is if we define there to be :) | 17:58.11 |
| I could conceive of a system where the format of the @ files is assumed to be the same as the configured encoding format, but where we swap to a different encoding for the @ file being processed if we hit a BOM. | 17:59.41 |
marcosw | occasionally gets confused about which window the pointer is in and types shell commands into chrome; surprisingly this often works (i.e. 'man sed') | 17:59.55 |
sags | If the BOM is present there's no guessing. If there's no kind of BOM, then we can either: | 18:00.37 |
| (1) assume host encoding (which translates to ANSI on Windows, UTF-8 on Linux) | 18:01.06 |
| or (2) Check whether there are any NUL bytes in the first 1024 bytes or so of the file. If yes, consider the file as UTF-16 "HE" (host endianness). If no, assume native host encoding. | 18:02.21 |
| Then convert string to UTF-8 in the wrapper function, and you don't need to handle varying encodings anywhere else. | 18:03.43 |
| (The JIT-recoding is still needed but only for the @files.) | 18:04.28 |
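The sniffing scheme outlined above can be sketched as follows. This is a minimal illustration, not the proposed patch: the function names `sniff_encoding` and `to_utf8` are hypothetical, the 1024-byte window comes from option (2), and the host encoding is taken from Python's `locale.getpreferredencoding` as a stand-in for "ANSI on Windows, UTF-8 on Linux". UTF-32 BOMs are deliberately not handled.

```python
import codecs
import locale
import sys


def sniff_encoding(data, host_encoding=None):
    """Return (encoding, bom_length) guessed for the raw bytes of an @file."""
    # A BOM is unambiguous, so check for one first.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8", len(codecs.BOM_UTF8)
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", len(codecs.BOM_UTF16_LE)
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be", len(codecs.BOM_UTF16_BE)
    # No BOM: a NUL byte near the start suggests UTF-16 in host byte order.
    if b"\x00" in data[:1024]:
        host_utf16 = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
        return host_utf16, 0
    # Otherwise fall back to the host encoding.
    return host_encoding or locale.getpreferredencoding(False), 0


def to_utf8(data, host_encoding=None):
    """Decode with the sniffed encoding, skip any BOM, and re-encode as UTF-8."""
    encoding, skip = sniff_encoding(data, host_encoding)
    return data[skip:].decode(encoding).encode("utf-8")


print(to_utf8(codecs.BOM_UTF16_LE + "é.ps".encode("utf-16-le")))
# A BOM-less UTF-16 file with no NUL bytes falls through to the host encoding.
print(sniff_encoding("中".encode("utf-16-le"), "cp1252"))
```

The last call shows the weakness of the heuristic: a BOM-less UTF-16 file whose characters all lie above U+00FF contains no NUL bytes, so it is treated as host-encoded.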
Robin_Watts | sags: I am tempted to go with what we have in the patch for now. We can add 'sniffing' later if we have a call for it. | 18:07.52 |
| The problem with anything that involves guesswork is that someone will find a case where it fails, and complain. | 18:08.58 |
| (like someone will come up with an @ file that contains a single character (> 128) and no BOM, and we'll pick the wrong encoding) | 18:09.51 |
sags | As long as it will not need any change to the GSAPI interface, yes it can be added later. Even if the behaviour will change a bit, that will be "documented as an enhancement" :) | 18:10.12 |
Robin_Watts | sags: I believe it should not need any API changes. | 18:10.36 |
| Have a look over the patch at your leisure and let me know what you think. | 18:10.51 |
| I hope it addresses all your concerns. | 18:10.58 |
sags | Yes, there definitely are cases when the encoding cannot be guessed. Example: a UTF-16 @file containing a single Chinese filename, without drive/ directory/ extension and without a terminating newline and without a BOM. So no character in the range U+0000..U+00FF. There nobody can tell whether it's a UTF-16 or an ANSI one. | 18:13.15 |
| Ok, I'll take a look at that patch, most likely this weekend. | 18:14.04 |
mvrhel_laptop | bbiaw | 19:13.49 |
marcosw | cut down the number of problem mupdf files by another 23, they had included some ghostscript errors in the mupdf .zip file (haven't looked into the ghostscript .zip file to see if they made the opposite error as well, so this may be a short lived reduction). | 20:28.31 |