| <<<Back 1 day (to 2018/08/21) | 20180822 |
moriarty | hey folks | 00:25.04 |
paulgardiner | I have a file with a too-long name. When I view it, the problem object is loaded via pdf_resolve_indirect, which catches the error and the resulting render looks AFAICT the same as with AR. When I try to save the file, the problem object is loaded via pdf_load_object, nothing catches the error and the save fails. Any thoughts? | 09:40.52 |
tor8 | paulgardiner: how long is the name? | 09:44.51 |
| the spec limits names to 127 bytes | 09:45.14 |
paulgardiner | Not sure. At least twice that, I think. | 09:46.06 |
tor8 | Hm. I think we should probably return PDF_TOK_ERROR instead of throwing. Let me cook up a patch for you to try. | 09:58.45 |
| paulgardiner: try the commit on the top of tor/master | 10:00.56 |
paulgardiner | Oh great. Thanks Tor | 10:01.11 |
tor8 | I have no idea if that will help, but I think it will make it consistent with other types of syntax errors at least | 10:01.36 |
paulgardiner | Ah, now pdf_parse_dict throws "invalid key in dict" because it's got a PDF_TOK_ERROR where it expects a PDF_TOK_NAME. | 10:09.22 |
| I wonder if it's worth my trying catching the original error where the save function preloads all the objects... whether saving would complete with a missing object. | 10:11.00 |
tor8 | oh ... that just moves the error to a higher level, when parsing not lexing. | 10:11.19 |
paulgardiner | yeah | 10:11.33 |
tor8 | paulgardiner: or we could tweak the parsing to not throw syntax errors, and just put 'null' in the place when assembling duff arrays and dictionaries | 10:13.22 |
paulgardiner | Yes, I wondered that. Just wasn't sure how the rest of the code might behave in the presence of those 'null's. | 10:15.24 |
| I'm just can try out catching the errors in the preload | 10:15.49 |
sebras | it was me adding the 127 byte limit. I remember having a file that had a too long name that caused trouble, but I didn't note which one so we might be causing that problem to reappear. | 10:17.10 |
tor8 | paulgardiner: try the next commit on tor/master too | 10:18.29 |
| sebras: we're not removing the limit. I changed the error handling from fz_throw() to return PDF_TOK_ERROR | 10:18.55 |
sebras | tor8: ok, I was worried that you were considering removing it altogether. | 10:19.24 |
paulgardiner | tor8: Whoop! That works. | 10:24.36 |
| I think I may have seen this before and opened a bug for it. | 10:25.02 |
| 699620, although I seem to be suggesting the error is inappropriate there. I think it may be that a single sequence of chars is used both as a name and as a string within the document, so that although the name is ignored, the string still appears on the page. | 10:28.44 |
tor8 | paulgardiner: that content object stream has a dictionary with a name that exceeds the max length: /Yes:#20If#20"Yes",#20print#20your | 10:31.41 |
| #20former#20name#20exactly#20as#20it#20appears#20on#20your#20present#20license#20or#20non-driver#20ID#20#28Identification#29#20card. | 10:31.41 |
| with the patches on tor/master it goes through and jsut drops that dictionary key/value pair | 10:32.22 |
paulgardiner | That string still appears in the document even with your change though, so I think it also occurs as a string. | 10:33.27 |
| Actually, maybe not that string, but the one I first ran into as a too-long name: "Yes - (If you marked Yes the vehicle..." | 10:37.12 |
tor8 | paulgardiner: the offending name is the name of one of the AP/D states | 10:39.15 |
| annotation object 1672 in that file | 10:39.27 |
| has an /AP/D dictionary with an /Off key and a /YES#20-#20Complete#20 | 10:39.51 |
| Voter#20Registration#20Application#20Section#20#28Not#20necessary#20if#20you#20 | 10:39.51 |
| bring#20this#20form#20to#20a#20DMV#20#28Department#20of#20Motor#20Vehicles#29#2 | 10:39.51 |
| 0office#29. 1574 0 R>>/N<</YES#20-#20Complete#20Voter#20Registration#20Applicat | 10:39.51 |
| ion#20Section#20#28Not#20necessary#20if#20you#20bring#20this#20form#20to#20a#20 | 10:39.51 |
| DMV#20#28Department#20of#20Motor#20Vehicles#29#20office#29. key | 10:39.51 |
paulgardiner | Yes. All I'm saying is I think the same sequence of characters occurs elsewhere in the document as a string, which is what confused me into suggesting that the error was inappropriate in the bug. | 10:40.37 |
tor8 | now, I see two ways of handling this -- either we drop the offending too long name completely or we truncate it... I think dropping it is probably better. | 10:40.42 |
| paulgardiner: yes, the same string shows up in the document text | 10:40.56 |
| but it does mean we cannot actually select the "YES - | 10:42.22 |
| Co | 10:42.22 |
| mp | 10:42.22 |
| lete Vo | 10:42.22 |
| ter | 10:42.22 |
| Registr | 10:42.23 |
| atio | 10:42.25 |
| n | 10:42.27 |
| Ap | 10:42.29 |
| p | 10:42.31 |
| licatio | 10:42.33 |
| n | 10:42.35 |
| Sectio | 10:42.37 |
| n | 10:42.39 |
| ( | 10:42.41 |
| No | 10:42.43 |
| t n | 10:42.45 |
| ecessar | 10:42.47 |
| y if yo | 10:42.49 |
| u br | 10:42.51 |
| in | 10:42.53 |
| g this fo | 10:42.55 |
paulgardiner | Your current patch fixes both the occurrences of this I've so far run into. | 10:42.56 |
tor8 | r | 10:42.57 |
| m to | 10:42.59 |
| a DMV o | 10:43.01 |
| ffice)" checkbox | 10:43.03 |
| oops, copy paste didn't go well from that document | 10:43.05 |
sebras | tor8: I'm surprised you weren't rate limited. :) | 10:43.28 |
tor8 | paulgardiner: I hope you mean "now fixed" in bug 694804 | 11:27.51 |
sebras | tor8: hm... I get bmpcmp differences with the nested_depth removal. | 11:33.10 |
tor8 | sebras: yes. I'm getting other errors too. | 11:34.08 |
sebras | tor8: I've gotten a few random timetouts. | 11:34.39 |
| tor8: but my bmpcmp results are consistent Bug688796b.pdf is drawn differently. | 11:34.59 |
| I can reproduce that locally. | 11:35.03 |
tor8 | yeah. that's possibly due to the change in pdf-op-run.c where the condition for render_direct is different | 11:35.35 |
sebras | ah, yes. I did notice that expression last night, but had forgotten about ti. | 11:39.17 |
| tor8: I'm guessing that we can consider this one a progression? if so I'll go ahead and push sebras/master..? | 11:39.46 |
| well the first 5 patches. | 11:40.07 |
tor8 | sebras: I would like to figure out the other problems first, but up to and including "Bug 699666: Forbid cycles in Type3 font CharProcs." are good to push | 11:40.35 |
sebras | tor8: I only get the timeouts on some of the machines. | 11:44.46 |
| tor8: locally I could reproduce a long rendering time for one of the files ( | 11:45.09 |
tor8 | did it get longer? | 11:45.23 |
sebras | but the other two PDFs both 0 bytes) | 11:45.29 |
| tor8: I think it was just statistical variation. I did run it inside gdb and looked at the backtrace and it seemed to be rendering a 2000x5000 pixmap | 11:46.28 |
| tor8: rerunning this I can see that Bug692933.pdf results in a 5760x2268 pixmap, so I'm not surprised that it takes 4 sec on my machine to render. and possibly too long, triggering a time out, on some of the cluster's slower(?) machines.. | 11:50.15 |
| Bug690237.pdf and Bug692160.pdf are both 0 byte files, so why these triggered timeouts I'm not sure. | 11:50.47 |
tor8 | that sounds like a cluster problem -- those files are not 0 bytes | 11:52.47 |
sebras | tor8: oh really? I downloaded them from the cluster. | 11:55.45 |
sebras | takes a look locally. | 11:55.50 |
tor8 | ~/src/tests_private/comparefiles $ ll Bug690237.pdf | 11:56.10 |
| -rw-r--r-- 1 tor tor 12M Nov 4 2016 Bug690237.pdf | 11:56.10 |
sebras | tor8: redownloading helped for that file. I'll try the other. | 11:58.30 |
tor8 | your report says they stopped timing out, not that they started timing out... | 11:58.56 |
| and they are huge pages, so no wonder they're on the edge of timing out | 11:59.23 |
sebras | oh! I see what happened now. | 12:01.42 |
| tor8: I grabbed the files from my download folder before they were fully downloaded. firefox creates a file.pdf.part and a file.pdf which is empty while it is being downloaded. then once the download is done the .part-file is renamed to the correct name. | 12:02.34 |
| tor8: I guess my location doesn't help with the download time... | 12:05.44 |
| tor8: so with that I believe this is now sorted..? | 12:16.48 |
| i.e. I was just being silly. | 12:17.03 |
paulgardiner | tor8: looks like that too-long name breaks the form and stops the checkboxes from being checked, even with your fix (probably unsurprisingly). | 12:38.17 |
tor8 | paulgardiner: yeah. at least the content shows up with the fix :) | 12:46.35 |
paulgardiner | Yes,definitely an improvement. :-) | 12:47.12 |
| Somehow AR is working around it. | 12:47.21 |
tor8 | maybe they ignore the limit in the spec | 12:47.36 |
paulgardiner | Yeah, could be. | 12:47.44 |
sebras | tor8: so if we convert such a file, do we shorten the name to make it conform to the spec or do we emit a broken PDF? | 12:50.38 |
| or clean or whatever. | 12:50.46 |
tor8 | sebras: with my current patch, we drop the offending name (turn it into 'null') | 12:51.13 |
| another approach would be to truncate and warn | 12:51.32 |
paulgardiner | I have a file that wont clean because of the issue | 12:51.47 |
| .. resulting file wont load in AR | 12:52.14 |
sebras | can we test if AR has any upper limit at all? what about a 1M character name? | 12:53.23 |
| perhaps they just raised it a bit. | 12:53.35 |
tor8 | paulgardiner: with or without the patches on tor/master? | 12:56.21 |
paulgardiner | I tried both, I believe. | 12:57.18 |
sebras | pdfref20 mentions the 127 char name limit as an architectural consideration, not a strict limit. | 12:57.23 |
paulgardiner | Without gave an empty file, With gave a sensibly sized file that wouldn't load. | 12:57.59 |
tor8 | it's probably missing a vital part. have you got the file somewhere? | 12:58.20 |
paulgardiner | Just emailed it. | 13:02.44 |
sebras | tor8: may I bother you with a review of sebras/master? | 13:58.37 |
tor8 | sebras: sebras/master LGTM | 14:03.22 |
sebras | tor8: it clusters well, I'll go ahead and push then. | 14:04.56 |
tor8 | sebras: go for it. | 14:05.10 |
| paulgardiner: yeah, I see some weird stuff going on with mutool clean with that patch | 14:10.03 |
| it's dropping '0' bytes and truncating output | 14:10.14 |
paulgardiner | Does sound weird | 14:11.06 |
tor8 | maybe encryption-related | 14:12.50 |
paulgardiner | Oh, I hadn't noticed it was encrypted. | 14:13.29 |
tor8 | but with the 'truncation' patch, it goes through properly... | 14:14.11 |
| valgrind says it's writing uninitialized data | 14:16.45 |
| paulgardiner: I'm inclined to believe that the encryption writing is what's breaking the cleaning of that file | 14:39.41 |
| the fact that it also has too long names is just the icing on the cake | 14:39.55 |
paulgardiner | Yes, now you mention it. I believe I've seen other cases where encryption breaks cleaning. | 14:40.47 |
| There are some checkboxes on the second page that don't work though, probably because of the too-long names. | 14:41.56 |
tor8 | yes. truncating instead of dropping those names make the checkboxes work. | 14:42.24 |
| but cleaning is still broke | 14:42.30 |
paulgardiner | Ah okay. That's fine then. | 14:42.51 |
sebras | tor8: how do you feel about "Remember to end groups/softmasks even upon exception."? | 14:46.38 |
| tor8: that's on sebras/master and will be the last thing I'll do today. | 15:14.19 |
kens | tor8 can you answer this question ?: | 15:21.30 |
| https://stackoverflow.com/questions/51969592/get-mutool-to-output-structured-text-as-xml | 15:21.30 |
sebras | tor8: ok, I'll leave that for another day, need to go. bye! | 15:34.27 |
tor8 | kens: answered | 15:36.28 |
kens | THaniks tor, I wasn't absolutely sure what the question meant.... | 15:36.43 |
tor8 | sebras: my brain is too mushy today, I'll tackle your last commit tomorrow (or when you're back from HK) | 15:36.50 |
sebras | tor8 (for the logs): after is good enough, thank you. :) | 22:51.59 |
| Forward 1 day (to 2018/08/23)>>> | |