| | 2011/10/13 |
henrys | I'll follow up with him. | 00:03.42 |
LaoLang_cool | hello, doesn't mupdf support hyperlink in pdf? | 02:18.01 |
AlecTaylor | hi | 03:12.13 |
ghostbot | salut | 03:12.13 |
AlecTaylor | http://bugs.ghostscript.com/show_bug.cgi?id=692586 | 03:33.25 |
| My XML output from pdfdraw is not parsable by LibXML2.... any ideas? | 03:48.25 |
mvrhel2 | oops. bed time for me | 06:55.41 |
chrisl | reboots for update....... | 07:27.29 |
kens | Hmm 2 bug reports for me opened and closed overnight, that's a good result :-) | 07:38.30 |
chrisl | Well, I did one - turned out to be an evince/poppler bug, rather than pdfwrite | 07:39.02 |
| That was the mangled clipping path one that came up on here just before you finished yesterday | 07:39.47 |
kens | Yes, which was nice. Someone spotted the complex output problem in txtwrite, I'd changed the types from int to float, but not changed the sprintf. | 07:39.51 |
chrisl | Oops, easy oversight....... | 07:40.26 |
kens | Indeed, especially since I was concentrating on the 'simple' output. The difference is that the simple output requires lots of processing to reassemble the text while the 'complex' output just dumps everything out :-) | 07:41.07 |
Robin_Watts | Le Roi est mort... http://boingboing.net/2011/10/12/dennis-ritchie-1941-2011-computer-scientist-unix-co-creator-c-co-inventor.html | 09:54.12 |
kens | Yes, I just saw it. | 09:54.24 |
| On the Register | 09:54.28 |
| But it's a bit of a rumour, unless they've found more concrete evidence | 09:54.49 |
chrisl | I missed that: that's sad news...... | 09:58.41 |
AlecTaylor | hi | 10:48.11 |
ghostbot | hi | 10:48.11 |
AlecTaylor | What's the relationships between xPDF, MuPDF and ghostscript? | 10:49.14 |
kens | xpdf, nothing. | 10:50.03 |
| MuPDF and GS are licensed by Artifex | 10:50.15 |
| And so have some of the same developers | 10:50.30 |
AlecTaylor | But don't share codebase? | 10:56.21 |
kens | no | 10:56.33 |
| at least, not much | 10:56.43 |
AlecTaylor | hmm, kk. Thanks for clearing that up | 10:56.44 |
| oh, kK!! | 10:56.50 |
| Finally; http://bugs.ghostscript.com/show_bug.cgi?id=692586 - XML output from pdfdraw not parsable by LibXML2 | 10:57.17 |
| kens: My XML output from pdfdraw is not parsable by LibXML2. How to fix? | 11:08.28 |
sebras | AlecTaylor: I'm trying to reproduce your problem. | 11:09.36 |
| Not sure where the problem is or how to fix it yet. | 11:09.48 |
AlecTaylor | Ah, okay, no worries. I'm thinking to patch it to use libxml2. That way it'll be easier to extend with my extra stuff | 11:11.36 |
AlecTaylor | has to figure out how to use it first | 11:12.40 |
sebras | AlecTaylor: http://pastebin.com/MP78cJkL | 11:20.03 |
| I don't have permissions to push to the repo and I don't have my own text-branch as of yet. | 11:20.29 |
| tor8: you may want that patch too... | 11:21.13 |
AlecTaylor | line 809 in it, testing now | 11:21.46 |
tor8 | right. | 11:25.58 |
sebras | tor8: uploaded to bugzilla so we don't lose it. | 11:26.34 |
tor8 | sebras: we should probably go through all xml output modes and make sure everything is correct. which reminds me, some places probably should have the escape quoting when printing strings as well. | 11:28.08 |
AlecTaylor | value=4 1 char 1 | 11:28.29 |
| file:///G%3A/Library/law4.xml:2850: parser error | 11:28.30 |
| : Extra content at the end of the document | 11:28.30 |
| <page> | 11:28.30 |
| ^ | 11:28.30 |
| 4 14 #text 0 | 11:28.30 |
| value=G:\Users\Samuel\University\COMP350\Library\law4.xml : failed to parse | 11:28.30 |
| Hmm, I'll add it to the bugtracker | 11:28.53 |
tor8 | sebras: or, just dump it as JSON instead of XML ;) | 11:28.56 |
sebras | tor8: that'd be an option. | 11:29.15 |
AlecTaylor | I found this project in my research - https://sourceforge.net/projects/pdf2xml/ - It seems rather good, if I could only get it to compile... (used a precompiled version, which gave quite useful output) | 11:29.38 |
sebras | tor8: is json less fussy about well-formedness..? | 11:30.16 |
tor8 | it's less redundantly verbose, and a bit easier to parse | 11:30.38 |
sebras | tor8: and requires a multitude of libraries to parse? ;) | 11:31.01 |
tor8 | of course, most people who use XML use these huge bloated libraries which are way sledgehammery overkill | 11:31.02 |
sebras | tor8: I'm giving the full doc another round of testing, but it looked fine for a single page... | 11:31.53 |
tor8 | we're missing the <?xml?> header tag and sometimes a top-level wrapping element since xml only allows one top-level tag | 11:32.45 |
sebras | tor8: right, it's the top-level tag that causes it. | 11:34.35 |
| tor8: fixing that likely requires additions in drawrange() or similar, but I don't want to mess it up more than necessary. I'll leave the header and top-level tags to you. | 11:37.26 |
AlecTaylor | http://bugs.ghostscript.com/show_bug.cgi?id=692586 - Updated with patch compiled output | 11:38.27 |
sebras | AlecTaylor: as you probably can see from the discussion above, if you add <document> at the top of the xml-file and </document> at the end you'll have a reasonably good xml. | 11:39.37 |
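A minimal sketch of the workaround sebras describes, assuming the pdfdraw output is a run of sibling <page> elements with no XML declaration. The sample string and the helper names wrap_pages and parses are hypothetical; only the <document> wrapper comes from the discussion. It uses Python's standard xml.etree.ElementTree instead of LibXML2, purely to show that a single top-level element is what makes the text well-formed.

```python
import xml.etree.ElementTree as ET


def wrap_pages(raw_xml: str, root_tag: str = "document") -> str:
    """Add an XML declaration and enclose all top-level elements in one root."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<%s>\n%s\n</%s>\n" % (root_tag, raw_xml.strip(), root_tag)
    )


def parses(text: str) -> bool:
    """Return True if the text is well-formed XML."""
    try:
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


# Hypothetical stand-in for the tool's output: two top-level <page> elements.
sample = (
    "<page>\n<line>first page</line>\n</page>\n"
    "<page>\n<line>second page</line>\n</page>\n"
)

wrapped = wrap_pages(sample)
root = ET.fromstring(wrapped.encode("utf-8"))
page_count = len(root.findall("page"))
```

The unwrapped sample fails to parse at the second <page>, which matches the "Extra content at the end of the document" error pasted earlier; the wrapped text parses and exposes both pages as children of the root.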
AlecTaylor | trying now | 11:39.55 |
sebras | AlecTaylor: I'm sure we'll get around to fixing it properly later, but please be advised that origin/text is one of the development branches. it is not stable, so don't expect too much. ;) | 11:40.41 |
Robin_Watts | tor8: You know the bbox of blocks is always 0 0 0 0 in the current code ? | 11:41.00 |
tor8 | Robin_Watts: I probably forgot to update them somewhere | 11:41.20 |
| the code is a bit of a jumble at the moment with a lot of approaches to assembling the text | 11:41.38 |
AlecTaylor | So standardise with an XML or JSON library? | 11:45.21 |
Robin_Watts | AlecTaylor: The use of an XML (or JSON) library is NOT the issue. | 11:52.28 |
| You're talking about whether we put metallic or matt paint on the car, when we're still tinkering with the engine design. | 11:53.14 |
AlecTaylor | k | 11:56.25 |
| Final question before I go back to my research; is there any chance you can make the output as comprehensive as the pdftohtml -xml project? | 11:58.45 |
| *I mean the pdf2xml project | 11:59.08 |
tor8 | AlecTaylor: pastebin a sample of the pdf2xml output | 12:05.35 |
AlecTaylor | http://pastebin.com/6BaP4m05 | 12:09.39 |
| (that's just an extract of a few pages + original top&bottom) | 12:10.36 |
| From this: http://mises.org/Books/humanaction.pdf | 12:10.49 |
tor8 | that's less information than we provide... | 12:11.04 |
AlecTaylor | tor8: But is there a way to display that amount of information with your tool? | 12:11.32 |
tor8 | if you want to change the format, look for the function fz_print_text_page() in fitz/dev_text.c | 12:13.17 |
sebras | AlecTaylor: what information are you missing in the output? | 12:13.34 |
| AlecTaylor: remember that the bbox combines x, y, width and height. | 12:13.56 |
AlecTaylor | sebras: The text all being per lines, not per character | 12:14.03 |
| That's what I'd like the most from this | 12:14.16 |
tor8 | "Use the source, Luke" | 12:14.52 |
| kens: how about we bash our heads together and figure out a common format for the textwrite device and the pdfdraw text extraction? | 12:15.50 |
| kens: I was thinking if there's some standard or commonly used format used by OCR tools | 12:16.22 |
sebras | tor8: hm.. when i pdfclean lawofthehayes00ewinrich.pdf it triggers the compression bomb logic. | 12:18.07 |
AlecTaylor | Groups contiguous text blocks into a single <text> tag, so that separate small text elements (usually a few characters at a time) are grouped into complete lines of text. | 12:18.16 |
tor8 | AlecTaylor: we already have that grouping. what you want is to throw away the per-character bounding box information in the xml output. | 12:19.04 |
AlecTaylor | yeah | 12:19.34 |
tor8 | all you need to do is change 5 lines in fz_print_text_page_xml in fitz/dev_text.c and you're done | 12:20.10 |
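A rough sketch of the same idea done as post-processing instead of patching fitz/dev_text.c: drop the per-character elements and keep only the text of each line. The element and attribute names (<span>, <char>, c, bbox) and the "x0 y0 x1 y1" bbox layout are assumptions made for illustration, not the actual pdfdraw output format, and both helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET


def collapse_chars(span: ET.Element) -> None:
    """Replace a span's <char> children with their concatenated characters."""
    chars = span.findall("char")
    span.text = "".join(ch.get("c", "") for ch in chars)
    for ch in chars:
        span.remove(ch)


def bbox_to_xywh(bbox: str):
    """Convert an assumed 'x0 y0 x1 y1' bbox string to (x, y, width, height)."""
    x0, y0, x1, y1 = (float(v) for v in bbox.split())
    return x0, y0, x1 - x0, y1 - y0


# Hypothetical input: one line of text with a bounding box per character.
sample = (
    "<page>"
    '<span bbox="10 20 22 32">'
    '<char c="H" bbox="10 20 16 32"/>'
    '<char c="i" bbox="16 20 22 32"/>'
    "</span>"
    "</page>"
)

page = ET.fromstring(sample)
# Materialise the list first so the tree is not modified while iterating it.
for span in list(page.iter("span")):
    collapse_chars(span)

result = ET.tostring(page, encoding="unicode")
line_box = bbox_to_xywh(page.find("span").get("bbox"))
```

The line-level bbox is kept, so position and size remain recoverable per line even after the per-character boxes are thrown away.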
AlecTaylor | Which 5 lines? | 12:20.36 |
| & to what? | 12:21.32 |
tor8 | maybe line 793 that prints the xml element that you don't want...? | 12:21.35 |
| I'm sorry if I can't be more specific, if you need help programming you'll have to ask someone else, I don't have time to tutor you | 12:22.04 |
kens | tor8 sorry was at lunch. | 12:22.13 |
tor8 | kens: np. at least you saw my question :) | 12:22.29 |
kens | I'm happy to take your suggestions on board. The 'complex' output from txtwrite has not had much work and is well behind the 'simple' output | 12:22.52 |
tor8 | kens: it's no rush, just all this talk about the text extraction reminded me :) | 12:28.26 |
AlecTaylor | :) | 12:30.35 |
kens | tor8 unfortunately I'm swamped :-( Nearly finished the text rendering modes though. 'A few days' that turned into 4 weeks. | 12:31.20 |
tor8 | yeah... like this iphone stuff. reading miles of documents trying to understand their way overarchitected class libraries is draining | 12:32.32 |
| I'm not content just copy-pasting bits of code from stackoverflow which I assume 90% of iphone developers resort to | 12:32.54 |
kens | Almost certainly ! | 12:33.06 |
| In my case it's a nasty intersection between the PDF interpreter (written in PostScript), the PostScript interpreter, and the pdfwrite code in C :-( | 12:33.30 |
| The C code interaction with the interpreter (pattern colour spaces) is quite tricky.... | 12:33.49 |
tor8 | doesn't help that the ios docs are full of "let's explain this in excruciating detail, while pretending you're a complete moron" ... the linux man pages are a better read than that... | 12:33.56 |
kens | That's quite a damning comment ;-) | 12:34.15 |
tor8 | the docs are full of helpful comments like "using a higher resolution image will make for a crisper image on a high resolution device" | 12:34.49 |
Robin_Watts | The problem I found with iOS and android docs both was that they explain the minutiae, but never give the big picture. | 12:34.59 |
kens | well duh! | 12:35.00 |
tor8 | Robin_Watts: exactly! | 12:35.12 |
| the few bits of big picture they have are sadly lacking in detail | 12:35.35 |
| I'm trying to figure out how they exactly intend you to switch between different "views" so I can swap between the file browser, outline view and document view. | 12:36.50 |
Robin_Watts | ponders putting windows 7 and a RAM upgrade on my existing machine... | 12:38.12 |
tor8 | Robin_Watts: what are you running now? | 12:40.31 |
Robin_Watts | 32bit Windows XP. | 12:40.39 |
| On a Core 2 Quad (Q6700) | 12:40.48 |
kens | From the current comments on Windows 8 I think I may have to upgrade before it gets released in order to avoid it. | 12:40.58 |
tor8 | Robin_Watts: ah. yeah, I would recommend that upgrade. | 12:41.13 |
Robin_Watts | with 4Gig RAM, with a 512Meg graphics card, so only 3Gig is being used. | 12:41.23 |
| In theory I can go to 16Gig without changing any other hardware. | 12:41.47 |
| Windows 7 Professional sounds like the one to go for. | 12:42.18 |
tor8 | Robin_Watts: waste of money, IMO. home premium is more than adequate. | 12:42.36 |
Robin_Watts | Ultimate has nothing I want, and premium lacks the Remote Desktop. | 12:42.44 |
kens | Don't think I want Remote Desktop either | 12:43.01 |
Robin_Watts | With XP, home was crippled with regards to file sharing too. | 12:43.19 |
| (limited access to permissions on fileshares etc) | 12:43.45 |
tor8 | Robin_Watts: remote desktop *server* is the only thing missing. | 12:43.52 |
Robin_Watts | tor8: Right. So you can't connect *in*. | 12:44.08 |
| which admittedly I don't do much, but would be nice occasionally. | 12:44.41 |
tor8 | tightvnc apparently has good support on windows. still doesn't come near the real thing though. too bad the real thing isn't cross platform :( | 12:45.25 |
| macosx has a vnc server built in, but yikes is it slow | 12:45.50 |
AlecTaylor | is running Windows 8 Dev Preview | 12:49.41 |
| (x64) | 12:49.45 |
| Robin_Watts: UltraVNC | 12:51.55 |
kens | One more time round the clusterpush merry-go-round... | 12:52.58 |
Robin_Watts | foods | 12:53.33 |
| mvrhel2: Hey. | 16:09.16 |
| I've got a wird one here. | 16:09.23 |
| wierd one. | 16:09.26 |
| weird one. | 16:09.34 |
| got there eventually :) | 16:09.40 |
mvrhel2 | hi Robin_Watts | 16:10.11 |
Robin_Watts | In the pamcmyk4 vs plank tests, I have a postscript file that's giving different results. | 16:10.20 |
| And this happens regardless of banding. | 16:10.35 |
BubbaH57 | I upgraded my win32 system today to 9.04 and it seems that the same command from 9.02 are running much slower. Is this a known issue? | 16:10.38 |
Robin_Watts | BubbaH57: Try with -dNOTRANSPARENCY | 16:11.37 |
mvrhel2 | Robin_Watts: does it have patterns? | 16:11.42 |
Robin_Watts | Yes. | 16:11.45 |
ray_laptop | BubbaH57: please open a bug report and we'll have a look. We generally don't like to get slower. | 16:12.45 |
Robin_Watts | The pattern in question is a halftoned magenta blob, with a halftoned blue blob on top of it, and just extending off to the left hand side. | 16:12.49 |
| In the plank case, the bit of the blue blob that extends out of the magenta blob is cyan, not blue. | 16:13.18 |
| Let me find you a png to look at. | 16:13.40 |
mvrhel2 | sounds like it may not be getting all the channels | 16:13.51 |
BubbaH57 | Robin_Watts: Doesn't seem to help. This command http://pastebin.com/XP34hXPu would execute in a matter of 10-20 seconds before. Now, it's taking over a minute. | 16:14.00 |
Robin_Watts | -dNOINTERPOLATE | 16:14.12 |
kens | Time for me to go, (without Lewis Carroll impersonations tonight). Goodnight all | 16:15.24 |
Robin_Watts | http://ghostscript.com/~robin/plank.png | 16:17.34 |
| http://ghostscript.com/~robin/pamcmyk4.png | 16:17.38 |
BubbaH57 | Robin_Watts: no change. has been running 1m30s and just finished | 16:18.16 |
| It is working ... just slooow | 16:18.26 |
Robin_Watts | Is the quality the same? | 16:18.36 |
BubbaH57 | visually appears to be | 16:18.58 |
Robin_Watts | Generally when we see a slowdown like that, it's because we've fixed a bug and it's now taking longer to do something right :) | 16:19.02 |
mvrhel2 | Robin_Watts: we should probably chop this file down to just a fill with that pattern | 16:19.04 |
Robin_Watts | mvrhel2: I've tried. | 16:19.11 |
| Tiny changes in the file make it work :( | 16:19.27 |
mvrhel2 | oh no | 16:19.37 |
| that is odd | 16:19.50 |
| maybe we should have alexcher chop it down | 16:20.06 |
Robin_Watts | By the time I get to the code that *uses* the pattern, the pattern as loaded into the dev_color is fine. | 16:20.07 |
| s/is fine/shows what we get in the output/ | 16:20.28 |
| So the problem is in the construction of the pattern tile. | 16:20.38 |
mvrhel2 | I see | 16:20.45 |
| did you use the debug option to dump out what the pattern is when it is created | 16:21.03 |
Robin_Watts | So I need to look at what operations the pattern calls. | 16:21.03 |
mvrhel2 | just to catch it after it is constructed | 16:21.12 |
Robin_Watts | No.. wossat option then ? | 16:21.23 |
mvrhel2 | hold on | 16:21.27 |
| RAW_PATTERN_DUMP | 16:23.08 |
Robin_Watts | Will try that, thanks. | 16:23.46 |
mvrhel2 | if you would like me to dig a bit at it let me know. still working on the screen creation stuff | 16:24.17 |
Robin_Watts | I'll keep bashing. | 16:25.43 |
| I'd got it into my head that if it was going wrong at pattern creation time it wasn't my problem - but of course it'll be using the planar stuff for that too. | 16:26.12 |
ray_laptop | BubbaH57: try setting -dMaxPatternBitmap=100000000 (just to see if you are tripping over pattern-clist) | 16:27.03 |
BubbaH57 | yes sir | 16:27.45 |
| that helped, thankee | 16:28.55 |
ray_laptop | I had fixed a bug in the calculation of the number of bytes that a pattern would need as a tile, and even though the default limit for triggering pattern clist mode was the same, more patterns would now exceed the limit | 16:31.00 |
| that change/fix was made Jul 31 2010, so 9.02 should have been the same. BubbaH57 can you send the file ? email to me is fine if you don't want to open a bug. | 16:39.27 |
| BubbaH57: if something else changed in the way pattern clist performs, I'd like to look into it. I'm looking at performance in that area right now. | 16:42.39 |
Robin_Watts | marcosw_: Good news about lcms1 vs 2 - thanks. | 16:43.03 |
marcosw_ | no problem. I'm running the complete test suite today and should have confirmation that everything is okay by tomorrow morning (my time). | 16:43.41 |
mvrhel2 | nice | 16:44.28 |
henrys | marcosw_, ray_laptop:I didn't hear from miles today did you guys want to have a meeting on IRC? | 16:48.42 |
marcosw_ | I have a meeting in 10 minutes; can we meet this afternoon or tomorrow morning? | 16:49.28 |
henrys | okay well next time the 3 of us are here we'll do it. | 16:50.42 |
marcosw_ | sounds good. thanks. | 16:53.29 |
| we might want to think about moving the regular meeting, since my 10:00am Thursday meeting at school has become a regular thing. The simplest thing would be to move it to Thursday @ 9:00am, but we should probably check with Miles. | 16:54.46 |
henrys | fine by me. | 16:56.55 |
BubbaH57 | After all of that, it appears that the root cause was environmental. I rebooted the laptop (to pick up some MS updates) and now I'm seeing the same results ... as I expected. | 17:03.03 |
| I suspect that I crossed up environment variables or something. perhaps. | 17:03.27 |
| I'm embarrassed. | 17:04.04 |
Robin_Watts | I have a postscripty question. | 17:12.12 |
| I have a file that's setting up a pattern. | 17:12.25 |
| The pattern definition creates a path, does a gsave, sets the color, fills, grestores, then strokes. | 17:13.43 |
| i.e. the color for the fill is explicit within the pattern definition. | 17:14.03 |
| the color for the stroke is not. | 17:14.09 |
| What is the stroke color? | 17:14.17 |
| Whatever stroke color was in place when the pattern was defined? | 17:14.31 |
henrys | I think it depends on the PaintType: for colored patterns, the time of makepattern; for uncolored, the time of the PaintProc? At least that is my interpretation of PaintType in the PLRM. | 17:23.59 |
mvrhel2 | bbiaw. need to head to the airport to pick up my father | 17:27.49 |
Robin_Watts | PaintType=1, PatternType=1, TilingType=1 | 17:28.02 |
henrys | I have no idea how that is supposed to work with our pattern cache, probably doesn't and we just don't have a test for it. | 17:28.09 |
Robin_Watts | This is 09-47N.ps | 17:28.30 |
| and it SEEMS to be correctly giving me a pattern color when it comes to draw the stroke. | 17:29.00 |
| The question is, is that pattern color correct (and the stroke is drawing it wrong) or is that pattern color wrong already. | 17:29.37 |
henrys | the stroke color should be the color when makepattern was executed. | 17:30.23 |
Robin_Watts | which is another pattern. | 17:30.35 |
henrys | oh my what sort of pattern was that? | 17:31.26 |
Robin_Watts | 1,1,1 again | 17:32.23 |
chrisl | Robin_Watts: is this the file with the "pacman" shape in it? | 17:33.36 |
Robin_Watts | Lots of different shapes, no obvious pacman. | 17:34.20 |
| The bit in question is stars filled with patterns. | 17:34.33 |
| (My God, it's full of patterns) | 17:34.42 |
chrisl | ;-) There's a few QL tests that seem to have patterns in patterns - I hate that...... | 17:35.08 |
| IIRC, henrys is correct, a colored tiling pattern will inherit the color(space) from the current graphics state when the pattern is instantiated - which could be a pattern color space. | 17:37.18 |
henrys | We'd have to look at some tests to see what adobe really does, I think the plrm definition is not clear enough to handle the recursive situation. | 17:37.34 |
| but my first guess would be the color for the stroke should be the pattern active when the first pattern was made. | 17:38.55 |
chrisl | No, it's not clear - the really horrible one I banged me head on a few years ago was colored pattern inheriting an uncolored pattern space, and the other way round (an uncolored pattern painted using colored pattern). | 17:39.16 |
Robin_Watts | Right. That's consistent with what I'm seeing, I think. | 17:39.22 |
henrys | chrisl, Robin_Watts:messy | 17:39.24 |
Robin_Watts | Why is it never the simple cases that go wrong? :) | 17:39.41 |
chrisl | henrys: I seem to remember that's the conclusion I came to as well - otherwise "makepattern" makes no sense, as it is supposed to create the pattern cache entry. | 17:40.41 |
| Anyway, I need to go eat - I have a squash match later on...... | 17:41.14 |
Robin_Watts | mvhrel2: With the RAW_PATTERN_DUMP, what order are the planes in? | 18:08.42 |
| (when in chunky format) | 18:08.57 |
| Ha! | 18:31.46 |
| memset the tile to have all its planes fully set, and the pattern only drew the first plane. | 18:32.10 |
| So the problem must be in whatever call the stroke ends up calling. | 18:32.26 |
mvrhel2 | Robin_Watts: just got back. did you figure out the planes in the raw data? | 19:15.00 |
| it is CMYK | 19:15.03 |
| or RGB | 19:15.05 |
Robin_Watts | c being the high bit, yeah. | 19:15.15 |
| thanks. | 19:15.18 |
mvrhel2 | ok great | 19:15.23 |
Robin_Watts | I can see where the code is going wrong. | 19:15.26 |
| I have ABSOLUTELY no idea why though. | 19:15.34 |
mvrhel2 | oh good | 19:15.35 |
| oh | 19:15.39 |
Robin_Watts | I'll keep stepping. | 19:15.40 |
| It'll be something absurdly stupid, I'm sure. | 19:15.55 |
| And if it involves a macro I shall stick another pin in the voodoo doll. | 19:16.32 |
mvrhel2 | ha | 19:16.40 |
Robin_Watts | I think I see it. | 19:21.43 |
| in mem_planar_copy_plane, if the plane depth is 1, we call copy_mono to do the copy_plane. | 19:22.11 |
| and we offset the line pointers to allow for each new plane. | 19:22.22 |
| but I bet copy_mono doesn't use the line_pointers. | 19:22.32 |
| Damn. Not that. | 19:27.34 |
| Ah, the line pointers are incorrect. | 19:37.17 |
pickcoder | I am on 9.02 and possibly ran across a pdfmarks issue with 1.4 compat. With v1.3 set the output is fine. Adobe reader says the 1.4 format cannot be read. Have you seen this before and could this be a content issue as opposed to a code problem? | 19:47.09 |
Robin_Watts | They are getting smushed as part of a garbage collection thing. I think the area can't be being alloced big enough. | 19:56.14 |
| mvhrel2: You here? | 20:00.48 |
mvrhel2 | Robin_Watts: sorry | 20:26.40 |
| you misspelled my name so it was not blinking or beeping. | 20:26.51 |
| I need to have Chatzilla set up to catch the misspellings | 20:27.08 |
| ok. I think I have it set up to catch those now | 20:29.16 |
| need to head out to get kids from school in a bit | 20:29.41 |
pickcoder | Robin_Watts: was that directed to me or elsewhere? | 20:46.32 |
Robin_Watts | pickcoder: Something else. | 21:08.51 |
| vmr2hle: Sorry! | 21:09.12 |
| gah. chunk memory vm reclaim hell. enough for tonight. | 23:51.41 |