| 2013/02/07 |
marcosw | alexcher: appears that the libcups2-dev package wasn't installed | 00:01.46 |
| I don't understand why no one noticed this earlier, but thanks for catching it now. | 00:03.02 |
| wait, this may have broken in the last day or two. I upgraded both angstroms and points from 10.04 to 12.04 and both were missing cups packages. | 00:07.50 |
Robin_Watts | tor8: Morning | 10:14.39 |
tor8 | Robin_Watts: hi. | 10:15.20 |
Robin_Watts | Android fix online. | 10:15.28 |
tor8 | big changes, looks like we forgot android completely there. | 10:17.51 |
| bloody hell. xcode had a minor update ... now it's downloaded gigs of data and has to reinstall all the "components" like the damn compiler. | 10:19.27 |
Robin_Watts | yeah. gotta love apple. | 10:19.40 |
tor8 | not to mention that they've changed the build system. again. breaking all the build scripts. again. for no visible gain. again. | 10:25.09 |
| can I please just rm -rf ios and get on with my day? | 10:25.27 |
kens | Hmm, well that's all the segv's out of my new code, now for the errors and differences | 10:31.22 |
Robin_Watts | oh, bugger. | 12:53.23 |
| the heights of the bboxes collected by the text code are based on the font bbox, not the char bboxes. | 12:53.43 |
paulgardiner | Robin_Watts: I see you found the vargs problem. Nice... well the problem isn't nice, but it's good you found it. | 13:04.32 |
Robin_Watts | paulgardiner: Yeah. Hole in the original var_args spec :( | 13:09.09 |
tor8 | oh forf... now the ios app crashes when freeing pages. probably something apple changed in the automagic reference counting Blocks™ crap. | 13:15.57 |
| I should probably take a break and go for a long walk before I destroy my mac laptop in a fit of rage... | 13:16.27 |
Robin_Watts | tor8: You're in the same position now I was in with the va_args stuff last night. Some 'compiler magic' is failing. | 13:17.06 |
| You have my sympathies. | 13:17.12 |
tor8 | Robin_Watts: I've just given up trying to make the "make generate" step work from Xcode | 13:17.28 |
Robin_Watts | tor8: I bet this means the gs xcode build won't work either now :( | 13:17.56 |
tor8 | some crap environment setting magically changes the compiler architecture setting so it builds for a different architecture than it later wants to link with | 13:18.02 |
| Robin_Watts: well, it's complicated because of the deployment target cross compiling madness with "SDKROOTs" and whatnot | 13:18.24 |
| but yes, they seem to break everything that isn't done "The Apple Way" every single damn point release | 13:18.50 |
Robin_Watts | I wouldn't mind if they documented what apples fucking way was :( | 13:19.18 |
paulgardiner | It's conservation of ease of use. Every bit of UI simplicity presented to end users has to be clawed back from the developers. That's the only way xcode makes sense to me. | 13:21.40 |
Robin_Watts | tor8: To do proper line distance measurements, we need the baseline, not just the bbox of the line. | 13:47.53 |
| bboxes are the wrong thing to store. | 13:51.26 |
tor8 | Robin_Watts: right. that should go in the fz_text_line struct then. | 13:51.49 |
| brb, going out to cool off so I can finish the ios fixes then I'll work on the block/line/char thing | 13:52.14 |
Robin_Watts | tor8: So we're giving up on all hope of ever dealing with non-orthogonal text? | 13:52.29 |
| Sure. Go out, cool off. | 13:52.42 |
| I think we can revamp the block/line/span/char thing into something more useful with a bit of care. | 13:53.10 |
tor8 | Robin_Watts: I think that'll be a reasonable thing. I've never seen non-orthogonal text used in anything except logos and graphs. | 13:53.57 |
Robin_Watts | How about holding: | 13:55.14 |
| fz_text_char = { char c, float x, float y } | 13:56.45 |
| fz_text_span = { fz_text_char chars[], style, dir, end_x, end_y, ascender, descender } | 13:56.47 |
| So each char is just the position at which it is placed. | 13:57.13 |
| The span contains sets of chars placed with the same direction in the same style. | 13:57.40 |
paulgardiner | Robin_Watts, tor8: for the annotation work, the centre line and base line would be useful.... if such exist | 13:57.44 |
Robin_Watts | end_x, end_y, ascender, descender could be calculated each time, but are probably worth storing in the struct. | 13:58.21 |
tor8 | Robin_Watts: the direction could be part of the style. | 13:58.21 |
| and so would ascender and descender, being dependent on the font and font size | 13:58.43 |
Robin_Watts | tor8: Right. Like I say they could be calculated/fetched each time but for ease we might want to store locally. | 13:59.23 |
| Gathering chars into spans is easy (look for matching style/direction, and end_x, end_y being in line with the new x/y and direction) | 14:00.08 |
| Generating the bbox from that information for a given span is easy. | 14:00.29 |
tor8 | Robin_Watts: I'm not sure we want to keep the span though | 14:00.47 |
| having to worry only about lines and blocks would simplify a lot of things | 14:01.04 |
Robin_Watts | Ah, I was thinking that we definitely want spans. | 14:01.15 |
tor8 | spans are basically just coding in that a run of chars in a line share a style | 14:01.33 |
| so a memory optimisation | 14:01.40 |
Robin_Watts | Yes and no. | 14:01.55 |
tor8 | unless we want to add more than just that info to it | 14:01.58 |
Robin_Watts | I'd like spans to mean 'words' I think. | 14:02.06 |
tor8 | hmm | 14:02.20 |
| that's something different again | 14:02.28 |
| by words do you mean line-breaking units? | 14:02.46 |
Robin_Watts | so when we get big gaps in a line (say in tables of contents) we hold "1.1" and "Transparency in PDF" as separate spans. | 14:02.49 |
| yeah, I don't mean words. | 14:02.59 |
tor8 | well, I'd make those be separate line objects | 14:03.09 |
Robin_Watts | For spans I mean "groups of chars that are clearly supposed to be together" | 14:03.33 |
| I have to have lunch. You go cool off, we can think about this and talk about it more later. | 14:04.08 |
tor8 | okay. | 14:04.18 |
Robin_Watts | I'm sure that a move away from bboxes would be a good idea though. | 14:04.32 |
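[Editor's sketch, not from the log] The layout floated above (chars stored only as positions, with end point and ascender/descender kept on the span) can be illustrated roughly as follows. The `sk_` names are hypothetical and are not MuPDF's API. The sketch assumes a horizontal baseline, y increasing upward, and ascender/descender already scaled by font size, with the descender negative.

```c
#include <stddef.h>

/* Hypothetical types for illustration only; not MuPDF's fz_text_* structs. */
typedef struct { int c; float x, y; } sk_char;      /* char code + pen position */
typedef struct { float x0, y0, x1, y1; } sk_rect;

typedef struct {
    const sk_char *chars;       /* chars placed in one style and direction */
    size_t len;
    float end_x, end_y;         /* pen position after the last char */
    float ascender, descender;  /* assumed scaled; descender is negative */
} sk_span;

/* Derive a bbox from the stored positions instead of storing one per char.
 * The x extent is found by scanning, so no char order is assumed. */
static sk_rect sk_span_bbox(const sk_span *span)
{
    float min_x = span->end_x;
    float max_x = span->end_x;
    sk_rect r;

    for (size_t i = 0; i < span->len; i++) {
        if (span->chars[i].x < min_x) min_x = span->chars[i].x;
        if (span->chars[i].x > max_x) max_x = span->chars[i].x;
    }
    r.x0 = min_x;
    r.x1 = max_x;
    r.y0 = span->end_y + span->descender;  /* below the baseline */
    r.y1 = span->end_y + span->ascender;   /* above the baseline */
    return r;
}
```

Keeping the baseline (here `end_y`) in the span is what makes line distance measurements possible; the bbox stays available as a derived value.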
tor8 | Robin_Watts: we need hit boxes for hit detection, but we should definitely add baseline info | 15:13.56 |
Robin_Watts | tor8: We can trivially generate bboxes from the other information. | 15:14.46 |
tor8 | Robin_Watts: yes, so no need to store that in the text_char | 15:15.24 |
Robin_Watts | The current comments say that a text line is a list of text spans that share the same baseline. | 15:16.01 |
| I wonder if we should just adopt that more thoroughly. | 15:16.24 |
| spans would have a list of chars, and a style, and an end point. | 15:17.00 |
| lines would then have the direction, and maximum ascender/descender values and a list of spans. | 15:18.02 |
tor8 | am I correct in assuming you want basically: chars, chunks of chars, hboxes and vboxes? | 15:18.19 |
Robin_Watts | tor8: I think so. | 15:23.31 |
| I'm toying with the structures locally to get it straight in my head. | 15:23.56 |
tor8 | if we're going with orthogonal only text to maintain sanity and clear algorithms, I'd suggest having: char, line, block only | 15:27.30 |
proge | hello, wikipedia says about mupdf "provides support for other operations such as searching and listing the table of contents", but so far I haven't found any info for this in the man pages for mupdf 0.9-2 on debian. Any tips? Thanks! | 15:27.53 |
tor8 | proge: hit '/' to start searching | 15:28.13 |
| and lines are broken by large horizontal gaps, like in tables | 15:28.18 |
proge | what about listing the table of contents or outline of a pdf? | 15:28.42 |
Robin_Watts | So a text line like: "This is an /ITALIC/ thing". would be what? | 15:29.03 |
| I like the idea of that being a line containing 3 spans. | 15:29.16 |
tor8 | each char has a pointer to its style | 15:29.23 |
Robin_Watts | ooh, no, don't like that. | 15:29.36 |
tor8 | when emitting, the emitter can group them back into spans | 15:29.38 |
| there'll be a lot of faffing about to split and merge spans when assembling the lines and reordering RTL text | 15:30.11 |
| the spans don't add any spatial information that's useful in analyzing text | 15:30.42 |
| ...layout | 15:30.49 |
Robin_Watts | tor8: They do. | 15:30.56 |
| Look at page 5 of pdf_reference17.pdf | 15:31.53 |
| I don't want a single line that says "1.1 About This Book 25". | 15:32.27 |
tor8 | proge: not possible in the x11 app. you can use the command line to list the table of contents. | 15:32.29 |
Robin_Watts | I want a line that's 3 spans: "1.1" "About This Book" "25" | 15:32.53 |
tor8 | Robin_Watts: right. In my scenario that would be 3 "lines" | 15:33.43 |
proge | tor8: what would be the command? I tried looking at pdfshow, but that wasn't very helpful. | 15:33.51 |
tor8 | proge: mudraw -l file.pdf | 15:34.40 |
| Robin_Watts: so you're probably right in that we want chunk, hbox and vbox | 15:35.03 |
| hopefully that's flexible enough to reassemble tables down the line | 15:35.16 |
Robin_Watts | So effectively you're suggesting we drop lines, and keep spans. | 15:35.19 |
tor8 | question is, do we enforce chunks inside hboxes inside vboxes | 15:35.52 |
| or just have chunks and boxes with direction as parameter | 15:36.08 |
proge | tor8: hmm... I have mupdf or pdfdraw installed but neither have a -l option. | 15:36.48 |
tor8 | Robin_Watts: the current code makes three lines from that example. the spans are only used for amortizing the cost of a style pointer | 15:36.56 |
| proge: then you'll need to upgrade. 0.9 is rather old by now. | 15:37.14 |
| mupdf is a fast moving project. | 15:37.24 |
Robin_Watts | is slow: chars = chunks, spans = hboxes, blocks = vboxes ? | 15:37.31 |
tor8 | chunk is a spatially contiguous sequence of chars | 15:38.24 |
proge | tor8: ok will do. I tried upgrading before and had troubles with dependencies, but should give it another try... thanks for the help! | 15:38.25 |
| exit | 15:38.37 |
| quit | 15:38.40 |
tor8 | what is currently a span would disappear | 15:39.03 |
Robin_Watts | chunk = spans, hboxes = lines, vboxes = blocks then. | 15:39.04 |
tor8 | Robin_Watts: roughly, yes | 15:39.13 |
Robin_Watts | OK. | 15:39.16 |
tor8 | but I'd keep the styles out of the chunk, so one chunk could have multiple fonts | 15:39.44 |
Robin_Watts | I prefer thinking in lines/blocks rather than hboxes/vboxes. | 15:39.45 |
| as for vertical text lines are vertical, and chunks are horizontal, if you see what I mean. | 15:40.15 |
tor8 | so in the most common case, each line would consist of one chunk | 15:40.16 |
Robin_Watts | yes. | 15:40.27 |
tor8 | but for tabulated stuff, each line would be grouped into chunks | 15:40.37 |
Robin_Watts | right, but we have enough information to dive in and split the table later. | 15:40.59 |
tor8 | we could possibly get away with a non-hierarchical structure for this. encode the line/block info in a separate structure | 15:41.21 |
| when assembling text, we should only care about the chunks | 15:41.33 |
Robin_Watts | Different algorithms work at different levels though. | 15:41.49 |
tor8 | consider the table of contents on page 5 | 15:42.02 |
Robin_Watts | paragraph analysis works at the line level. | 15:42.25 |
tor8 | the "1.1" is really in a column of its own | 15:42.46 |
Robin_Watts | Our current code outputs "1.1 1.2 1.3 1.4" "About This Book 25 Introduction to PDF 1.7 Features 28. ..." | 15:43.21 |
tor8 | the "About This Book" and "25" are far apart, but the "25" is not vertically aligned with anything else | 15:43.26 |
| yeah, that's the column detection we have at work | 15:44.00 |
Robin_Watts | I'd rather gather that into lines and spans to start with, and then split it later. | 15:44.12 |
tor8 | Robin_Watts: how about span soup, and then separate indexes to these spans by lines and by column | 15:44.38 |
| well, span soup to start with at least | 15:45.34 |
Robin_Watts | Yeah. | 15:45.40 |
tor8 | then find the column breakers by marked region analysis | 15:45.44 |
Robin_Watts | The only instant qualms I have about that are: | 15:46.19 |
tor8 | and I guess that could work horizontally as well (for say newspapers with top half and bottom half running different columns) | 15:46.37 |
Robin_Watts | 1) Adding to span soup could easily become n^2. But then if we stick with our current thing of 'appending to current one or make a new current one' we'd be no worse off than now. | 15:47.13 |
| 2) Whenever I see a data structure that has pointers into another data structure, I wonder if that's supposed to mean that elements of the second data structure can be used more than once (or not at all). | 15:48.03 |
tor8 | yes. current approach to initially create. then sorting them and doing a separate merging pass afterwards would work better than n^2 | 15:48.15 |
Robin_Watts | Whereas we will want to have a 1:1 mapping. | 15:48.20 |
| tor8: absolutely. | 15:48.38 |
tor8 | Robin_Watts: well, essentially we'd have a span pool which the layout assembly pass would draw from. | 15:49.16 |
Robin_Watts | So span soup to collect, (post process to collate spans if possible), (collate spans into lines), (collate lines into paragraphs). | 15:50.03 |
tor8 | yes. | 15:50.27 |
Robin_Watts | Sounds reasonable to me. | 15:50.37 |
tor8 | insert a step to "figure out column breaker regions" that is used by the line and paragraph collators | 15:50.47 |
| so that spans and lines are not collected across the regions | 15:51.07 |
Robin_Watts | tor8: yeah, the region stuff would do that well, I think. | 15:51.38 |
| The important thing is, I think, that we get something up and running fast, and then can go back and insert extra stages later. | 15:52.09 |
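[Editor's sketch, not from the log] The "span soup" collection step discussed above (append to the current span when the new char lines up with its end point, otherwise start a new span) might look roughly like this. All `sk_` names, the fixed-size arrays, and the `tolerance` parameter are assumptions made for the example; the later sort, merge, and line/paragraph collation passes are not shown.

```c
#include <stddef.h>

#define SK_MAX_CHARS 64
#define SK_MAX_SPANS 16

/* Hypothetical types for illustration only; not MuPDF's API. */
typedef struct { int c; float x, y; } sk_char;
typedef struct { sk_char chars[SK_MAX_CHARS]; int len; float end_x, end_y; } sk_span;
typedef struct { sk_span spans[SK_MAX_SPANS]; int len; } sk_soup;

static float sk_absf(float v) { return v < 0 ? -v : v; }

/* Add one char to the soup. 'advance' is the char's width along the line.
 * Only the most recent span is considered, so each add is O(1).
 * Returns 0 on success, -1 if the soup is full. */
static int sk_soup_add(sk_soup *soup, int c, float x, float y,
                       float advance, float tolerance)
{
    sk_span *cur = soup->len > 0 ? &soup->spans[soup->len - 1] : NULL;

    /* Start a new span if there is none, it is full, or the new char
     * does not continue from where the current span ended. */
    if (cur == NULL || cur->len == SK_MAX_CHARS ||
        sk_absf(x - cur->end_x) > tolerance ||
        sk_absf(y - cur->end_y) > tolerance) {
        if (soup->len == SK_MAX_SPANS)
            return -1;
        cur = &soup->spans[soup->len++];
        cur->len = 0;
    }

    cur->chars[cur->len].c = c;
    cur->chars[cur->len].x = x;
    cur->chars[cur->len].y = y;
    cur->len++;
    cur->end_x = x + advance;
    cur->end_y = y;
    return 0;
}
```

With a table-of-contents line, the large gap between "1.1" and the title makes the title's first char fail the end-point check, so it lands in a second span on the same baseline.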
sebras | tor8: sounds like your flood-fill-based algorithm a bit... | 15:52.29 |
tor8 | the RTL pass, should that run before the line pass on a span itself | 15:52.33 |
Robin_Watts | How does the RTL stuff work? | 15:52.49 |
tor8 | fz_text_char { int c; float x, y; fz_text_style *s; } would help to get that up and running | 15:52.52 |
| BiDi algorithm, but simplified | 15:52.58 |
Robin_Watts | How do you know that given text needs RTL ? | 15:53.12 |
tor8 | run through and detect regions of chars that have RTL directionality, and reverse them in the list | 15:53.19 |
| Robin_Watts: that's what the unicode database I sucked in is for | 15:53.31 |
| look at the unicode character bidi class, if it has strong RTL direction I'll treat it as needing to be reversed | 15:53.58 |
Robin_Watts | so essentially if we see a sequence of chars output from L to R on the page, where they all should be R to L, we swap them in the span? | 15:54.23 |
tor8 | thinking about it, it should definitely run at the span level | 15:54.30 |
Robin_Watts | Possibly we should just set a flag in the span? | 15:54.40 |
tor8 | Robin_Watts: yes. but they could be mixed with LTR characters in the same span | 15:54.57 |
Robin_Watts | OK, so we split the span to have just R2L or L2R in each one. | 15:55.15 |
tor8 | Robin_Watts: we want logical ordering so that when we dump the span to html it comes out the right way | 15:55.43 |
| but from the PDF it's all in visual LTR order | 15:55.55 |
Robin_Watts | tor8: We want to have it so that we *can* drop it to html and have it come out the right way. | 15:56.16 |
tor8 | Robin_Watts: not sure why we'd want to split the span | 15:56.19 |
| it's still a contiguous chunk of text as far as the rest of the analysis goes | 15:56.41 |
Robin_Watts | If each char in the span is absolutely positioned, then we can do it by reordering stuff within the span, yes. | 15:56.52 |
| It means if we ever go back from our analysed text to PDF then we'd output R2L text in R2L order. | 15:57.41 |
tor8 | Robin_Watts: right. the chars in a span would have to be strictly ordered by wmode direction | 15:57.49 |
Robin_Watts | eh? I didn't follow that. | 15:58.08 |
| Suppose I have a PDF that has 'ABCDEF' on the page, where those are all really R2L things. | 15:58.31 |
tor8 | sorted in ascending X order if wmode is horizontal, in the user space | 15:58.39 |
Robin_Watts | We'd collect them into a span as "ABCDEF". | 15:58.44 |
| Your R2L would then change them to be "FEDCBA", right ? | 15:58.58 |
tor8 | Robin_Watts: correct | 15:59.04 |
Robin_Watts | So we'd end up with a span where the X would be descending. | 15:59.19 |
tor8 | Robin_Watts: also correct | 15:59.29 |
| but consider "abcDEFxyz" where upper case is RTL and lower case is LTR. the bidi pass would turn that into "abcFEDxyz" | 16:00.03 |
Robin_Watts | Yeah. That seems sane to me. | 16:00.17 |
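[Editor's sketch, not from the log] The simplified reordering pass described above can be illustrated with tor8's own convention, where upper case stands in for strongly RTL characters. The function names are hypothetical. A real pass would look up the Unicode bidi class instead of calling `isupper`, and the full bidi algorithm also handles neutrals, digits, and nesting, which this ignores.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a Unicode bidi class lookup: upper case means "strong RTL". */
static int sk_is_rtl(int c)
{
    return isupper((unsigned char)c) != 0;
}

/* Reverse each maximal run of RTL chars in place, turning visual
 * left-to-right order into logical order within one span. */
static void sk_reverse_rtl_runs(char *s)
{
    size_t n = strlen(s);
    size_t i = 0;

    while (i < n) {
        size_t j, a, b;

        if (!sk_is_rtl(s[i])) {
            i++;
            continue;
        }
        /* Find the end of this RTL run: [i, j). */
        j = i;
        while (j < n && sk_is_rtl(s[j]))
            j++;
        /* Swap inwards from both ends of the run. */
        for (a = i, b = j - 1; a < b; a++, b--) {
            char tmp = s[a];
            s[a] = s[b];
            s[b] = tmp;
        }
        i = j;
    }
}
```

Applied to "abcDEFxyz" this gives "abcFEDxyz", matching the example in the discussion. Because only the order of entries changes, each char keeps its own x/y position.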
tor8 | doing this without having to worry about crossing span borders would be very helpful | 16:00.49 |
Robin_Watts | That means that spans are going to need to have some sort of 'min x' and 'max x' field as we can't just assume that the left of the first char and the right of the last char give us our bbox. | 16:00.56 |
tor8 | Robin_Watts: right. I'd assume the span would have baseline start and end, and a cached bbox | 16:01.19 |
Robin_Watts | tor8: suppose we have ABC where the B is a different style. | 16:01.45 |
tor8 | so those would be unaffected, but you wouldn't get those from the chars after the initial assembly pass | 16:01.48 |
| Robin_Watts: that's why I want to stick style in the text char | 16:02.07 |
Robin_Watts | That bloats stuff a lot. | 16:02.37 |
| but sod it. | 16:02.52 |
tor8 | we're unbloating it by going from rect to x,y, so we're break even | 16:02.56 |
Robin_Watts | ok. | 16:03.04 |
tor8 | and yes, it's insignificant but simplifies all algorithms | 16:03.23 |
| uhm, should be another word there. "but" doesn't really fit :) | 16:03.42 |
| needs more sugar and coffeine | 16:04.03 |
Robin_Watts | OK. Let me bash on the structures for a bit and see how it falls out. | 16:04.08 |
| I'll let you get back to iOS, cos I know you love it. | 16:04.22 |
tor8 | alright. and I'll bash my head against the desk trying to get this apple insanity working. | 16:04.28 |
marcosw | sebras: okay if I reboot casper? (you are the only other person logged on) | 16:24.39 |
henrys | marcosw:since you have this performance regression stuff setup is it easy to do say 9.06 vs. 9.07 before we ship? | 16:25.54 |
marcosw | henrys: it should already be done. we just need to decide what are reasonable 9.06 and 9.07 git hashes and take a look at the files on miles. | 16:27.16 |
henrys | oh I didn't know you saved them that far back. | 16:28.04 |
marcosw | I never throw anything away (you should see the amount of paper in my office). | 16:28.44 |
henrys | so you have some script that reads 2 files and does a diff? | 16:29.08 |
| and you can email that to tech? | 16:29.22 |
marcosw | will do. | 16:29.32 |
henrys | maybe we don't want to know ;-( | 16:29.59 |
paulgardiner | Robin_Watts, tor8: A recent commit has messed up text selection (lines no longer in line, I think). Not sure which yet, but I guess it has to be something to do with reflow | 16:41.51 |
tor8 | paulgardiner: Robin_Watts made a commit that changed how lines were broken | 16:42.17 |
| paulgardiner: http://git.ghostscript.com/?p=mupdf.git;a=commit;h=0399332d547b92c79bfea20982a3a1492f6df272 | 16:42.39 |
paulgardiner | tor8: I'm just about to test before that commit | 16:43.15 |
sebras | marcosw: no, rebooting is not a problem. thanks for the heads up though. :) | 16:58.02 |
marcosw | sebras: when you didn't answer i figured you weren't doing anything, so I went ahead. | 16:58.39 |
sebras | marcosw: that's fine. | 17:03.54 |
paulgardiner | Robin_Watts, tor8: Three commits on paulg/master if you get a moment. Robin, the last one is the patch you just gave me | 17:10.01 |
Robin_Watts | paulgardiner: I have everything in bits around me at the mo. I will try to look when I get back to sanity. | 17:13.52 |
paulgardiner | no hurry | 17:14.21 |
kens2 | Goodnight folks | 17:18.00 |
ray_laptop | is chrisl_away still really "away" ? | 18:34.40 |
henrys | ray_laptop:marcosw script thresholds at 5% right marcosw? | 18:36.30 |
Robin_Watts | he headed out a while ago to squash. | 18:36.35 |
ray_laptop | Robin_Watts: thanks | 18:36.43 |
| henrys: OK. I am sort of curious how we are doing on the 'improvement' side as well (and overall, at least for high z tests) | 18:37.27 |
Robin_Watts | An overall "average % change" field would be nice :) | 18:38.06 |
henrys | yes has the data it should just be a tweak to the script for that | 18:38.08 |
| s/yes/yes he/ | 18:38.21 |
ray_laptop | henrys: but I agree that some of the files that go from under a second to multiple seconds is worth looking into. | 18:42.51 |
henrys | FWIW hunting down a particular day for the regression I did this: ssh miles.ghostscript.com grep tests_private__customer_tests__wltnt10 /home/marcos/performance/results/*, I guess I could search my email too, but that gives a list of every >5% increase and its date | 18:43.46 |
ray_laptop | not too many PS or PDF files. A LOT of PCL files surfaced, however | 18:44.21 |
henrys | yeah one thing is enabling RTL via PJL all those files were just treated as PCL before and only rendered very small portions of the page but some of these are alarming | 18:46.02 |
ray_laptop | only 14 of the pdf tests are more than 100% slower (out of the 114 total that are at least 100% slower) | 18:48.27 |
henrys | a lot of duplication though the file above (wltn10 I was looking at) has 8 entries | 18:50.28 |
| why don't we divide these up for review. Obviously I'll look at PCL maybe Alex can do PDF and we need a volunteer for postscript | 18:52.37 |
| alexcher ^^ | 18:54.22 |
| crickets ;-) | 18:58.19 |
ray_laptop | henrys: for PS, I see 002-21.ps , Bug691335.eps , bug690338.ps , 473-01.ps, Bug692378 , 12-14*.PS , Bug692331.ps , 483-05-fixed.ps , 09-47*.PS , 12-07A.PS , 446-01-fixed.PS , Bug692330.ps and self-intersect2.ps | 19:00.22 |
| that's in order from worst change down, but the 12-14 and 09-47 cases sort of spread around | 19:01.20 |
| so about 13 files to look into (assuming that the 09-47 and 12-14 cases are the same root cause for the various suffix letter versions of the file) | 19:02.37 |
alexcher | henrys: please file a bug report. | 19:03.59 |
ray_laptop | henrys: as long as this isn't holding up the release (i.e., drop everything), I'll do the PS files unless someone else steps up. | 19:04.05 |
henrys | I don't think any of it should hold up the release - review seems prudent | 19:04.35 |
ray_laptop | henrys: good. Particularly since so many are PCL, and our PCL customers tend to take a while to pick up new versions | 19:05.24 |
henrys | ray_laptop:right | 19:05.37 |
ray_laptop | hi, marcosw | 19:05.48 |
| we were asking for an overall look at performance of 9.06 vs. 9.07 | 19:06.12 |
marcosw | ray_laptop: good morning | 19:06.15 |
| do you mean total time for processing all files? | 19:07.24 |
ray_laptop | marcosw: so is it correct that we were only 5% or more slower on 547 of the 60K plus files ? | 19:07.54 |
| marcosw: right. overall, is 9.07 better or worse (at least for the high z files) | 19:08.24 |
| marcosw: it'd be nice to know if we are making things at all better, or just worse | 19:09.19 |
marcosw | ray_laptop: that's a valid question that I don't have the answer to. Files that are faster aren't reported. Let me run the total processing time for all 60k files and let you know. | 19:09.20 |
ray_laptop | marcosw: OK. Thanks. | 19:09.40 |
| marcosw: I thought you might have the times captured in a log so you wouldn't have to re-run | 19:10.00 |
marcosw | I do have the times, but can't just add up the columns, since there are files added to the repository, so have to run an inner join or some such thing. | 19:10.43 |
ray_laptop | marcosw: Oh, true. there are more files in the 9.07 set than we had for the 9.06 runs | 19:11.23 |
marcosw | yup. | 19:11.31 |
henrys | alexcher:they aren't bugs yet marcosw prepared a list of performance regressions - the list needs to be filtered for bugs - if we are printing the file correctly now then it is not a regression. | 19:22.36 |
| I'm filtering the pcl problems, ray postscript and you pdf | 19:23.12 |
alexcher | henrys: yes. I'll do PDF. | 19:23.56 |
marcosw | ray_laptop: so overall we are doing better, 9.06 took a total of 45,625 seconds and 9.07rc1 took 40,261 seconds | 19:23.57 |
| pdfwrite output and input also slightly improved. | 19:24.37 |
henrys | I have some huge changes in pcl and I can't see what might have caused it, I'll keep looking | 19:25.51 |
marcosw | henrys, ray_laptop, alexcher: will look at the output and open bugs for files that took a significant performance hit without an improvement in output. | 19:27.03 |
| ^will^I will | 19:27.12 |
henrys | okay great it probably is best you do it then we can dive right into fixing them as they are reported | 19:27.52 |
Robin_Watts | tor8: I've got the code rewritten here with the new types. Various bits are #if 0'd out. | 20:16.31 |
| And I haven't coded strain_soup(); yet. | 20:16.40 |
| Work in progress pushed to my reflow branch for your comments. | 20:17.32 |
kens | Hmm Michael isn't here tonight ? | 20:34.15 |
Robin_Watts | He fell off irc a while ago. | 20:38.30 |
| but he hasn't said he's not about. | 20:38.37 |
kens | :-( | 20:47.06 |
| Oh well, off again I guess | 20:54.16 |
sags | @kens (for the logs), about bug 693614 "pdfmark: accented character ..." : there was no bug in there, [unpatched] GS and Reader are both OK. The problem is exclusively with the user's PS code. | 21:43.58 |
| First, '\n' == 0x0A == LF and '\r' == 0x0D == CR, not the other way around. The patch gets this right, but the comments on the bug report do not. | 21:43.58 |
| Then, if the outline title in the PDF is "(\376\377\001\n)" then it means "FE FF 01 0a" which is U+010A "LATIN CAPITAL LETTER C WITH DOT ABOVE". | 21:43.58 |
| The question is how the CR in the source PS stream turns into a LF in the output PDF. The answer is in the PLRM section 3.8.1 "Basic File Operators" under "End-of-Line Conventions". There it says that any unescaped EOL (single CR, single LF, or CR+LF pair) found by the PS language scanner when it is scanning a literal string is converted to a single LF in the resulting PS string object. So the "'(' FE FF 01 0D ')'" byte sequence in the | 21:43.58 |
| input stream is the syntactic representation of a string object of length 4 containing the bytes "FE FF 01 0A". So, mystery solved. | 21:44.01 |
| In conclusion, no bug and no patch needed. The committed patch is not wrong, just that the new code uses a few more CPU cycles, there are a few more bytes in the output PDF, and it does a few extra allocs/frees. | 21:44.03 |
| The user needs to fix the code, s/he cannot just emit some bytes inside "()" and expect to get a correct string. The obvious byte values to handle specially are ")" and "\". Also "(", otherwise the closing ")" may pair with a stray "(" and thus not close the string. Less obvious byte values to watch for are 0x0D and 0x0A (cf that section 3.8.1). | 21:44.06 |
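[Editor's sketch, not from the log] One way a generator could make arbitrary bytes safe inside a PostScript "()" literal, along the lines sags describes, is to escape the special bytes before emitting them. `ps_escape_string` is a hypothetical helper, not Ghostscript code. As a design choice of this sketch, every parenthesis is escaped (balanced ones are legal unescaped, but escaping all of them is always safe) and non-printable bytes are written as three-digit octal escapes.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Escape 'n' raw bytes for use between "(" and ")" in PostScript source.
 * Writes a NUL-terminated result into 'out' (capacity 'cap').
 * Returns the number of chars written, or -1 if 'out' is too small. */
static int ps_escape_string(const unsigned char *in, size_t n,
                            char *out, size_t cap)
{
    size_t o = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char b = in[i];
        char tmp[5];
        size_t len;

        if (b == '(' || b == ')' || b == '\\') {
            tmp[0] = '\\'; tmp[1] = (char)b; len = 2;
        } else if (b == '\r') {
            /* A raw CR would be converted to LF by the scanner. */
            tmp[0] = '\\'; tmp[1] = 'r'; len = 2;
        } else if (b == '\n') {
            tmp[0] = '\\'; tmp[1] = 'n'; len = 2;
        } else if (b < 0x20 || b > 0x7e) {
            /* Three octal digits, so a following digit cannot be absorbed. */
            len = (size_t)snprintf(tmp, sizeof tmp, "\\%03o", (unsigned)b);
        } else {
            tmp[0] = (char)b; len = 1;
        }

        if (o + len + 1 > cap)
            return -1;
        memcpy(out + o, tmp, len);
        o += len;
    }
    out[o] = '\0';
    return (int)o;
}
```

For the bytes FE FF 01 0D this yields `\376\377\001\r`, so the scanner sees an escaped CR and the string object keeps 0x0D, that is U+010D rather than U+010A.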
henrys | mvrhel_laptop:kens was looking for you | 22:17.08 |
mvrhel_laptop | oh | 22:19.07 |
| let me check the logs. I had a dental cleaning from 9 to 10 this morning | 22:19.27 |
JakeSays | is there code in mupdf to extract pages from a pdf in to another pdf? | 22:19.48 |
mvrhel_laptop | oh it was just a bit ago darn | 22:20.19 |
| I was off to lunch | 22:20.26 |
| I will get on early tomorrow to catch him | 22:20.38 |
| bbiab | 22:25.19 |
JakeSays | ah so does this make sense? open the pdf, delete all the pages but the ones i want, then do pdf_write_document() to a new file | 22:28.40 |