IRC Logs

Log of #ghostscript at irc.freenode.net.

2011/10/13
henrys I'll follow up with him.00:03.42 
LaoLang_cool hello, doesn't mupdf support hyperlink in pdf?02:18.01 
AlecTaylor hi03:12.13 
ghostbot salut03:12.13 
AlecTaylor http://bugs.ghostscript.com/show_bug.cgi?id=692586 03:33.25
  My XML output from pdfdraw is not parsable by LibXML2.... any ideas?03:48.25 
mvrhel2 oops. bed time for me06:55.41 
chrisl reboots for update.......07:27.29 
kens Hmm 2 bug reports for me opened and closed overnight, that's a good result :-)07:38.30 
chrisl Well, I did one - turned out to be an evince/poppler bug, rather than pdfwrite07:39.02 
  That was the mangled clipping path one that came up on here just before you finished yesterday07:39.47 
kens Yes, which was nice. Someone spotted the complex output problem in txtwrite, I'd changed the types from int to float, but not changed the sprintf.07:39.51
chrisl Oops, easy oversight.......07:40.26 
kens Indeed, especially since I was concentrating on the 'simple' output. The difference is that the simple output requires lots of processing to reassemble the text while the 'complex' output just dumps everything out :-)07:41.07
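The oversight kens describes (a value's type changed from int to float while the format string stayed the same) can be illustrated with a rough Python analogue. This is only a sketch: the txtwrite code is C, where a mismatched sprintf conversion is undefined behaviour rather than a silent truncation, and the variable names below are hypothetical.

```python
# Hypothetical coordinate that used to be an int and is now a float.
x = 12.75

# Stale format string, still written for an integer: Python truncates silently.
stale = "%d" % x

# Format string updated to match the new float type.
fixed = "%.2f" % x

print(stale, fixed)
```

In Python the stale format merely loses the fractional part; in C the same mismatch can print garbage, which is why the format string has to change together with the type.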
Robin_Watts Le Roi est mort... http://boingboing.net/2011/10/12/dennis-ritchie-1941-2011-computer-scientist-unix-co-creator-c-co-inventor.html 09:54.12
kens Yes, I just saw it.09:54.24 
  On the Register09:54.28 
  But it's a bit of a rumour, unless they've found more concrete evidence09:54.49 
chrisl I missed that: that's sad news......09:58.41 
AlecTaylor hi10:48.11 
ghostbot hi10:48.11 
AlecTaylor What's the relationship between xPDF, MuPDF and ghostscript?10:49.14
kens xpdf, nothing.10:50.03 
  MuPDF and GS are licensed by Artifex10:50.15
  And so have some of the same developers10:50.30 
AlecTaylor But don't share codebase?10:56.21 
kens no10:56.33 
  at least, not much10:56.43 
AlecTaylor hmm, kk. Thanks for clearing that up10:56.44 
  oh, kK!!10:56.50 
  Finally; http://bugs.ghostscript.com/show_bug.cgi?id=692586 - XML output from pdfdraw not parsable by LibXML210:57.17 
  kens: My XML output from pdfdraw is not parsable by LibXML2. How to fix?11:08.28 
sebras AlecTaylor: I'm trying to reproduce your problem.11:09.36 
  Not sure where the problem is or how to fix it yet.11:09.48 
AlecTaylor Ah, okay, no worries. I'm thinking to patch it to use libxml2. That way it'll be easier to extend with my extra stuff11:11.36 
AlecTaylor has to figure out how to use it first11:12.40 
sebras AlecTaylor: http://pastebin.com/MP78cJkL 11:20.03
  I don't have permissions to push to the repo and I don't have my own text-branch as of yet.11:20.29 
  tor8: you may want that patch too...11:21.13 
AlecTaylor line 809 in it, testing now11:21.46 
tor8 right.11:25.58 
sebras tor8: uploaded to bugzilla so we don't lose it.11:26.34 
tor8 sebras: we should probably go through all xml output modes and make sure everything is correct. which reminds me, some places probably should have the escape quoting when printing strings as well.11:28.08 
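tor8's point about escape quoting when printing strings can be sketched in Python with the standard library. The element name `char` and attribute name `c` are hypothetical stand-ins, not the actual pdfdraw output format; the idea is only that characters such as `<`, `&` and `"` must be escaped before being written into an attribute.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

def char_element(c: str) -> str:
    """Emit one hypothetical <char/> element with a safely quoted attribute."""
    # quoteattr escapes &, < and > and picks a quote style that is safe for c.
    return f"<char c={quoteattr(c)}/>"

def round_trips(c: str) -> bool:
    """Check that a parser recovers exactly the character that was written."""
    return ET.fromstring(char_element(c)).get("c") == c

# Characters that would break naive string printing.
tricky = ["<", ">", "&", '"', "'", "a"]
results = {c: round_trips(c) for c in tricky}
```

Printing the raw character between hard-coded quotes would produce unparsable output for the first few entries in `tricky`; routing every string through one escaping helper avoids that.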
AlecTaylor value=4 1 char 1 11:28.29 
  file:///G%3A/Library/law4.xml:2850: parser error 11:28.30 
  : Extra content at the end of the document 11:28.30 
  <page> 11:28.30 
  ^ 11:28.30 
  4 14 #text 0 11:28.30 
  value=G:\Users\Samuel\University\COMP350\Library\law4.xml : failed to parse 11:28.30 
  Hmm, I'll add it to the bugtracker11:28.53 
tor8 sebras: or, just dump it as JSON instead of XML ;)11:28.56 
sebras tor8: that'd be an option.11:29.15 
AlecTaylor I found this project in my research - https://sourceforge.net/projects/pdf2xml/ - It seems rather good, if I could only get it to compile... (used a precompiled version, which gave quite useful output)11:29.38 
sebras tor8: is json less fussy about well-formedness..?11:30.16
tor8 it's less redundantly verbose, and a bit easier to parse11:30.38 
sebras tor8: and requires a multitude of libraries to parse? ;)11:31.01 
tor8 of course, most people who use XML use these huge bloated libraries which are way sledgehammery overkill11:31.02 
sebras tor8: I'm giving the full doc another round of testing, but it looked fine for a single page...11:31.53 
tor8 we're missing the <?xml?> header tag and sometimes a top-level wrapping element since xml only allows one top-level tag11:32.45 
sebras tor8: right, it's the top-level tag that causes it.11:34.35
  tor8: fixing that likely requires additions in drawrange() or similar, but I don't want to mess it up more than necessary. I'll leave the header and top-level tags to you.11:37.26 
AlecTaylor http://bugs.ghostscript.com/show_bug.cgi?id=692586 - Updated with patch compiled output11:38.27 
sebras AlecTaylor: as you probably can see from the discussion above, if you add <document> at the top of the xml-file and </document> at the end you'll have a reasonably good xml.11:39.37 
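The workaround sebras describes can be sketched in Python. The sample input is hypothetical (two bare `<page>` elements standing in for the real pdfdraw output), and the helper name `wrap_in_root` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

def wrap_in_root(fragment: str, root_tag: str = "document") -> str:
    """Wrap a run of sibling elements in a single top-level element."""
    return f"<{root_tag}>\n{fragment}\n</{root_tag}>"

# Hypothetical output with one element per page and no enclosing root.
raw = "<page>one</page>\n<page>two</page>"

# XML allows only one top-level element, so the raw text is rejected.
try:
    ET.fromstring(raw)
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

# With the wrapper in place the same content parses.
root = ET.fromstring(wrap_in_root(raw))
page_count = len(root.findall("page"))
```

This matches the "Extra content at the end of the document" error reported earlier: the parser accepts the first `<page>` and then objects to the second one.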
AlecTaylor trying now11:39.55 
sebras AlecTaylor: I'm sure we'll get around to fixing it properly later, but please be advised that origin/text is one of the development branches. it is not stable, so don't expect too much. ;)11:40.41
Robin_Watts tor8: You know the bbox of blocks is always 0 0 0 0 in the current code ?11:41.00 
tor8 Robin_Watts: I probably forgot to update them somewhere11:41.20
  the code is a bit of a jumble at the moment with a lot of approaches to assembling the text11:41.38 
AlecTaylor So standardise with an XML or JSON library?11:45.21 
Robin_Watts AlecTaylor: The use of an XML (or JSON) library is NOT the issue.11:52.28 
  You're talking about whether we put metallic or matt paint on the car, when we're still tinkering with the engine design.11:53.14 
AlecTaylor k11:56.25 
  Final question before I go back to my research; is there any chance you can make the output as comprehensive as the pdftohtml -xml project?11:58.45 
  *I mean the pdf2xml project11:59.08 
tor8 AlecTaylor: pastebin a sample of the pdf2xml output12:05.35 
AlecTaylor http://pastebin.com/6BaP4m05 12:09.39
  (that's just an extract of a few pages + original top&bottom)12:10.36 
  From this: http://mises.org/Books/humanaction.pdf 12:10.49
tor8 that's less information than we provide...12:11.04 
AlecTaylor tor8: But is there a way to display that amount of information with your tool?12:11.32 
tor8 if you want to change the format, look for the function fz_print_text_page() in fitz/dev_text.c12:13.17 
sebras AlecTaylor: what information are you missing in the output?12:13.34 
  AlecTaylor: remember that the bbox combines x, y, width and height.12:13.56 
AlecTaylor sebras: The text all being per lines, not per character12:14.03 
  That's what I'd like the most from this12:14.16 
tor8 "Use the source, Luke"12:14.52 
  kens: how about we bash our heads together and figure out a common format for the txtwrite device and the pdfdraw text extraction?12:15.50
  kens: I was thinking if there's some standard or commonly used format used by OCR tools12:16.22 
sebras tor8: hm.. when i pdfclean lawofthehayes00ewinrich.pdf it triggers the compression bomb logic.12:18.07 
AlecTaylor Groups contiguous text blocks into a single <text> tag, so that separate small text elements (usually a few characters at a time) are grouped into complete lines of text.12:18.16 
tor8 AlecTaylor: we already have that grouping. what you want is to throw away the per-character bounding box information in the xml output.12:19.04 
AlecTaylor yeah12:19.34 
tor8 all you need to do is change 5 lines in fz_print_text_page_xml in fitz/dev_text.c and you're done12:20.10 
AlecTaylor Which 5 lines?12:20.36 
  & to what?12:21.32 
tor8 maybe line 793 that prints the xml element that you don't want...?12:21.35 
  I'm sorry if I can't be more specific, if you need help programming you'll have to ask someone else, I don't have time to tutor you12:22.04 
kens tor8 sorry was at lunch.12:22.13 
tor8 kens: np. at least you saw my question :)12:22.29 
kens I'm happy to take your suggestions on board. The 'complex' output from txtwrite has not had much work and is well behind the 'simple' output12:22.52 
tor8 kens: it's no rush, just all this talk about the text extraction reminded me :)12:28.26 
AlecTaylor :)12:30.35 
kens tor8 unfortunately I'm swamped :-( Nearly finished the text rendering modes though. 'A few days' that turned into 4 weeks.12:31.20 
tor8 yeah... like this iphone stuff. reading miles of documents trying to understand their way overarchitected class libraries is draining12:32.32 
  I'm not content just copy-pasting bits of code from stackoverflow which I assume 90% of iphone developers resort to12:32.54 
kens Almost certainly !12:33.06 
  In my case it's a nasty intersection between the PDF interpreter (written in PostScript) the PostScript interpreter, and the pdfwrite code in C :-(12:33.30 
  The C code interaction with the interpreter (pattern colour spaces) is quite tricky....12:33.49 
tor8 doesn't help that the ios docs are full of "let's explain this in excruciating detail, while pretending you're a complete moron" ... the linux man pages are a better read than that...12:33.56 
kens That's quite a damning comment ;-)12:34.15 
tor8 the docs are full of helpful comments like "using a higher resolution image will make for a crisper image on a high resolution device"12:34.49 
Robin_Watts The problem I found with iOS and android docs both was that they explain the minutiae, but never give the big picture.12:34.59 
kens well duh!12:35.00 
tor8 Robin_Watts: exactly!12:35.12 
  the few bits of big picture they have are sadly lacking in detail12:35.35 
  I'm trying to figure out how they exactly intend you to switch between different "views" so I can swap between the file browser, outline view and document view.12:36.50 
Robin_Watts ponders putting windows 7 and a RAM upgrade on my existing machine...12:38.12 
tor8 Robin_Watts: what are you running now?12:40.31 
Robin_Watts 32bit Windows XP.12:40.39 
  On a Core 2 Quad (Q6700)12:40.48 
kens From the current comments on Windows 8 I think I may have to upgrade before it gets released in order to avoid it.12:40.58
tor8 Robin_Watts: ah. yeah, I would recommend that upgrade.12:41.13 
Robin_Watts with 4Gig RAM, with a 512Meg graphics card, so only 3Gig is being used.12:41.23 
  In theory I can go to 16Gig without changing any other hardware.12:41.47 
  Windows 7 Professional sounds like the one to go for.12:42.18 
tor8 Robin_Watts: waste of money, IMO. home premium is more than adequate.12:42.36 
Robin_Watts Ultimate has nothing I want, and premium lacks the Remote Desktop.12:42.44 
kens Don't think I want Remote Desktop either12:43.01 
Robin_Watts With XP, home was crippled with regards to file sharing too.12:43.19 
  (limited access to permissions on fileshares etc)12:43.45 
tor8 Robin_Watts: remote desktop *server* is the only thing missing.12:43.52 
Robin_Watts tor8: Right. So you can't connect *in*.12:44.08 
  which admittedly I don't do much, but would be nice occasionally.12:44.41 
tor8 tightvnc apparently has good support on windows. still doesn't come near the real thing though. too bad the real thing isn't cross platform :(12:45.25 
  macosx has a vnc server built in, but yikes is it slow12:45.50 
AlecTaylor is running Windows 8 Dev Preview12:49.41 
  (x64)12:49.45 
  Robin_Watts: UltraVNC12:51.55 
kens One more time round the clusterpush merry-go-round...12:52.58 
Robin_Watts foods12:53.33 
  mvrhel2: Hey.16:09.16 
  I've got a wird one here.16:09.23 
  wierd one.16:09.26 
  weird one.16:09.34 
  got there eventually :)16:09.40 
mvrhel2 hi Robin_Watts16:10.11 
Robin_Watts In the pamcmyk4 vs plank tests, I have a postscript file that's giving different results.16:10.20 
  And this happens regardless of banding.16:10.35 
BubbaH57 I upgraded my win32 system today to 9.04 and it seems that the same commands from 9.02 are running much slower. Is this a known issue?16:10.38
Robin_Watts BubbaH57: Try with -dNOTRANSPARENCY16:11.37 
mvrhel2 Robin_Watts: does it have patterns?16:11.42 
Robin_Watts Yes.16:11.45 
ray_laptop BubbaH57: please open a bug report and we'll have a look. We generally don't like to get slower.16:12.45 
Robin_Watts The pattern in question is a halftoned magenta blob, with a halftoned blue blob on top of it, and just extending off to the left hand side.16:12.49 
  In the plank case, the bit of the blue blob that extends out of the magenta blob is cyan, not blue.16:13.18 
  Let me find you a png to look at.16:13.40 
mvrhel2 sounds like it may not be getting all the channels16:13.51 
BubbaH57 Robin_Watts: Doesn't seem to help. This command http://pastebin.com/XP34hXPu would execute in a matter of 10-20 seconds before. Now, it's taking over a minute.16:14.00
Robin_Watts -dNOINTERPOLATE16:14.12 
kens Time for me to go, (without Lewis Carroll impersonations tonight). Goodnight all16:15.24
Robin_Watts http://ghostscript.com/~robin/plank.png 16:17.34 
  http://ghostscript.com/~robin/pamcmyk4.png 16:17.38 
BubbaH57 Robin_Watts: no change. has been running 1m30s and just finished16:18.16 
  It is working ... just slooow16:18.26 
Robin_Watts Is the quality the same?16:18.36 
BubbaH57 visually appears to be16:18.58 
Robin_Watts Generally when we see a slowdown like that, it's because we've fixed a bug and it's now taking longer to do something right :)16:19.02 
mvrhel2 Robin_Watts: we should probably chop this file down to just a fill with that pattern16:19.04 
Robin_Watts mvrhel2: I've tried.16:19.11 
  Tiny changes in the file make it work :(16:19.27 
mvrhel2 oh no16:19.37 
  that is odd16:19.50 
  maybe we should have alexcher chop it down16:20.06 
Robin_Watts By the time I get to the code that *uses* the pattern, the pattern as loaded into the dev_color is fine.16:20.07 
  s/is fine/shows what we get in the output/16:20.28 
  So the problem is in the construction of the pattern tile.16:20.38 
mvrhel2 I see16:20.45 
  did you use the debug option to dump out what the pattern is when it is created16:21.03 
Robin_Watts So I need to look at what operations the pattern calls.16:21.03 
mvrhel2 just to catch it after it is constructed16:21.12 
Robin_Watts No.. wossat option then ?16:21.23 
mvrhel2 hold on16:21.27 
  RAW_PATTERN_DUMP16:23.08 
Robin_Watts Will try that, thanks.16:23.46 
mvrhel2 if you would like me to dig a bit at it let me know. still working on the screen creation stuff16:24.17 
Robin_Watts I'll keep bashing.16:25.43 
  I'd got it into my head that if it was going wrong at pattern creation time it wasn't my problem - but of course it'll be using the planar stuff for that too.16:26.12 
ray_laptop BubbaH57: try setting -dMaxPatternBitmap=100000000 (just to see if you are tripping over pattern-clist)16:27.03 
BubbaH57 yes sir16:27.45 
  that helped, thankee16:28.55 
ray_laptop I had fixed a bug in the calculation of the number of bytes that a pattern would need as a tile, and even though the default limit for triggering pattern clist mode was the same, more patterns would now exceed the limit16:31.00 
  that change/fix was made Jul 31 2010, so 9.02 should have been the same. BubbaH57 can you send the file ? email to me is fine if you don't want to open a bug.16:39.27 
  BubbaH57: if something else changed in the way pattern clist performs, I'd like to look into it. I'm looking at performance in that area right now.16:42.39 
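The threshold behaviour ray_laptop describes (a tile whose computed size exceeds the limit is handled in pattern-clist mode) can be sketched as follows. The size formula here is a generic rows-times-bytes-per-row estimate and an assumption for illustration, not Ghostscript's actual calculation; only the `MaxPatternBitmap` name and the 100000000 value come from the discussion above.

```python
def tile_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Assumed estimate: bytes per row (rounded up) times number of rows."""
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px

def exceeds_limit(width_px: int, height_px: int, bits_per_pixel: int,
                  max_pattern_bitmap: int) -> bool:
    """True when the estimated tile size is over the configured limit."""
    return tile_bytes(width_px, height_px, bits_per_pixel) > max_pattern_bitmap

# Hypothetical 1000x1000 tile at 32 bits per pixel.
size = tile_bytes(1000, 1000, 32)
over_small_limit = exceeds_limit(1000, 1000, 32, 1_000_000)
over_raised_limit = exceeds_limit(1000, 1000, 32, 100_000_000)
```

Under this sketch, correcting the size estimate upward pushes more tiles over an unchanged limit, and raising the limit on the command line moves them back under it.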
Robin_Watts marcosw_: Good news about lcms1 vs 2 - thanks.16:43.03 
marcosw_ no problem. I'm running the complete test suite today and should have confirmation that everything is okay by tomorrow morning (my time).16:43.41 
mvrhel2 nice16:44.28 
henrys marcosw_, ray_laptop: I didn't hear from Miles today; did you guys want to have a meeting on IRC?16:48.42
marcosw_ I have a meeting in 10 minutes; can we meet this afternoon or tomorrow morning?16:49.28 
henrys okay well next time the 3 of us are here we'll do it.16:50.42 
marcosw_ sounds good. thanks.16:53.29 
  we might want to think about moving the regular meeting, since my 10:00am Thursday meeting at school has become a regular thing. The simplest thing would be to move it to Thursday @ 9:00am, but we should probably check with Miles.16:54.46 
henrys fine by me.16:56.55 
BubbaH57 After all of that, it appears that the root cause was environmental. I rebooted the laptop (to pick up some MS updates) and now I'm seeing the same results ... as I expected.17:03.03 
  I suspect that I crossed up environment variables or something. perhaps.17:03.27 
  I'm embarrassed. 17:04.04 
Robin_Watts I have a postscripty question.17:12.12 
  I have a file that's setting up a pattern.17:12.25 
  The pattern definition creates a path, does a gsave, sets the color, fills, grestores, then strokes.17:13.43 
  i.e. the color for the fill is explicit within the pattern definition.17:14.03 
  the color for the stroke is not.17:14.09 
  What is the stroke color?17:14.17 
  Whatever stroke color was in place when the pattern was defined?17:14.31 
henrys I think it depends on the PaintType: colored, at the time of makepattern; uncolored, at the time of the PaintProc? At least that is my interpretation of PaintType in the PLRM.17:23.59
mvrhel2 bbiaw. need to head to the airport to pick up my father17:27.49 
Robin_Watts PaintType=1, PatternType=1, TilingType=1 17:28.02
henrys I have no idea how that is supposed to work with our pattern cache, probably doesn't and we just don't have a test for it.17:28.09 
Robin_Watts This is 09-47N.ps17:28.30 
  and it SEEMS to be correctly giving me a pattern color when it comes to draw the stroke.17:29.00 
  The question is, is that pattern color correct (and the stroke is drawing it wrong) or is that pattern color wrong already.17:29.37 
henrys the stroke color should be the color when makepattern was executed.17:30.23 
Robin_Watts which is another pattern.17:30.35 
henrys oh my what sort of pattern was that?17:31.26 
Robin_Watts 1,1,1 again17:32.23 
chrisl Robin_Watts: is this the file with the "pacman" shape in it?17:33.36 
Robin_Watts Lots of different shapes, no obvious pacman.17:34.20 
  The bit in question is stars filled with patterns.17:34.33 
  (My God, it's full of patterns)17:34.42 
chrisl ;-) There's a few QL tests that seem to have patterns in patterns - I hate that......17:35.08 
  IIRC, henrys is correct, a colored tiling pattern will inherit the color(space) from the current graphics state when the pattern is instantiated - which could be a pattern color space.17:37.18 
henrys We'd have to look at some tests to see what adobe really does, I think the plrm definition is not clear enough to handle the recursive situation.17:37.34 
  but my first guess would be the color for the stroke should be the pattern active when the first pattern was made.17:38.55 
chrisl No, it's not clear - the really horrible one I banged me head on a few years ago was colored pattern inheriting an uncolored pattern space, and the other way round (an uncolored pattern painted using colored pattern).17:39.16 
Robin_Watts Right. That's consistent with what I'm seeing, I think.17:39.22 
henrys chrisl, Robin_Watts: messy17:39.24
Robin_Watts Why is it never the simple cases that go wrong? :)17:39.41 
chrisl henrys: I seem to remember that's the conclusion I came to as well - otherwise "makepattern" makes no sense, as it is supposed to create the pattern cache entry.17:40.41 
  Anyway, I need to go eat - I have a squash match later on......17:41.14 
Robin_Watts mvhrel2: With the RAW_PATTERN_DUMP, what order are the planes in?18:08.42 
  (when in chunky format)18:08.57 
  Ha!18:31.46 
  memset the tile to have all its planes fully set, and the pattern only drew the first plane.18:32.10
  So the problem must be in whatever call the stroke ends up calling.18:32.26 
mvrhel2 Robin_Watts: just got back. did you figure out the planes in the raw data?19:15.00 
  it is CMYK19:15.03 
  or RGB19:15.05 
Robin_Watts c being the high bit, yeah.19:15.15 
  thanks.19:15.18 
mvrhel2 ok great19:15.23 
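The plane order just confirmed (CMYK, with C as the high bit) can be sketched as a small unpacking routine. It assumes 4-bit chunky pixels with one bit per component, which is a guess based on the pamcmyk4 device name; the function name and the sample values are hypothetical, and nothing here reflects the real RAW_PATTERN_DUMP file layout.

```python
def split_chunky_cmyk(pixels):
    """Split 4-bit chunky CMYK pixels (C in the high bit) into four planes."""
    planes = {"C": [], "M": [], "Y": [], "K": []}
    for p in pixels:
        planes["C"].append((p >> 3) & 1)  # bit 3: cyan
        planes["M"].append((p >> 2) & 1)  # bit 2: magenta
        planes["Y"].append((p >> 1) & 1)  # bit 1: yellow
        planes["K"].append(p & 1)         # bit 0: black
    return planes

# Hypothetical pixels: cyan, magenta, cyan+magenta (blue), black.
sample = [0b1000, 0b0100, 0b1100, 0b0001]
planes = split_chunky_cmyk(sample)
```

With this layout, a blue pixel that loses its magenta bit reads back as cyan, which is the symptom described for the plank output.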
Robin_Watts I can see where the code is going wrong.19:15.26 
  I have ABSOLUTELY no idea why though.19:15.34 
mvrhel2 oh good19:15.35 
  oh19:15.39 
Robin_Watts I'll keep stepping.19:15.40 
  It'll be something absurdly stupid, I'm sure.19:15.55 
  And if it involves a macro I shall stick another pin in the voodoo doll.19:16.32 
mvrhel2 ha19:16.40 
Robin_Watts I think I see it.19:21.43 
  in mem_planar_copy_plane, if the plane depth is 1, we call copy_mono to do the copy_plane.19:22.11 
  and we offset the line pointers to allow for each new plane.19:22.22 
  but I bet copy_mono doesn't use the line_pointers.19:22.32 
  Damn. Not that.19:27.34 
  Ah, the line pointers are incorrect.19:37.17 
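The idea of offsetting line pointers per plane can be sketched for a generic planar buffer. This assumes the planes are stored one after another in a single allocation, each `height` rows of `raster` bytes; that layout and the function name are assumptions for illustration, not Ghostscript's actual memory device structures.

```python
def plane_line_offsets(height: int, raster: int, plane: int) -> list[int]:
    """Byte offset of each scan line of one plane in an assumed planar buffer."""
    plane_size = raster * height  # bytes occupied by a single plane
    return [plane * plane_size + y * raster for y in range(height)]

# Hypothetical 3-line tile with 4 bytes per line.
first_plane = plane_line_offsets(3, 4, 0)
second_plane = plane_line_offsets(3, 4, 1)
```

If the offsets for the later planes are wrong, or the table holding them is overwritten, writes intended for those planes land elsewhere, which is consistent with only the first plane being drawn.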
pickcoder I am on 9.02 and possibly ran across a pdfmarks issue with 1.4 compat. With v1.3 set the output is fine. Adobe reader says the 1.4 format can not be read. Have you seen this before and could this be a content issue as opposed to a code problem?19:47.09
Robin_Watts They are getting smushed as part of a garbage collection thing. I think the area can't be being alloced big enough.19:56.14 
  mvhrel2: You here?20:00.48 
mvrhel2 Robin_Watts: sorry20:26.40 
  you misspelled my name so it was not blinking or beeping.20:26.51 
  I need to have Chatzilla set up to catch the misspellings20:27.08
  ok. I think I have it set up to catch those now20:29.16 
  need to head out to get kids from school in a bit20:29.41 
pickcoder Robin_Watts: was that directed to me or elsewhere?20:46.32 
Robin_Watts pickcoder: Something else.21:08.51 
  vmr2hle: Sorry!21:09.12 
  gah. chunk memory vm reclaim hell. enough for tonight.23:51.41 