IRC Logs

Log of #ghostscript at

2011/11/16
Robin_Watts henrys: You're welcome. Glad to hear you think it'll fix it.01:24.51 
arthurf tor8: Hi. Tried to do some quick searching with the iOS app on a PDF file, but not much happened. I'm guessing there is more work to do? 02:52.24 
tor8 arthurf: yes. it still doesn't show any results :)02:52.50 
arthurf tor8: Okay thanks. Just wanted to check that I wasn't completely missing something. Thanks. :)02:53.22 
robingower I'm having a bit of nightmare compiling ghostscript on a mac02:54.44 
  I filed this ticket at macports months ago02:54.54 
  I can't get it to build outside of macports either02:55.18 
  any ideas of what else I could do to diagnose my problem?02:55.49 
tor8 robingower: that's odd, many of us (developers) use macs as development machines when working on ghostscript02:56.02 
robingower indeed. I've tried the usual suspects like fiddling with arch settings and I've removed fink and macports etc02:57.41 
tor8 my normal procedure is: git clone, ./, make02:59.30 
  I don't use macports or anything like that, just a plain mac install with Xcode03:00.10 
robingower 'k thanks tor8 I'll give that a try03:08.37 
  I've cloned from git:// and done autogen & make03:23.01 
  but I still get03:23.04 
  ../gs/base/gstype42.c:68: error: conflicting types for ‘gs_type42_read_data’03:23.09 
  ../gs/base/gxfont42.h:141: error: previous declaration of ‘gs_type42_read_data’ was here03:23.14 
  I've checked the offending lines (dozens of times!) and they look the same to me03:23.36
  i.e. same parameter types etc03:23.49 
AlecTaylor hi04:53.39 
ghostbot hola04:53.39 
AlecTaylor Any MuPDF developers around?04:53.48 
  How would I go about reverse-engineering an XML file back into the PDF?05:06.41 
alexcher AlecTaylor: XML describes a general structured document, e.g. an IP datagram.05:15.12 
AlecTaylor just posted on the mailing-list05:15.30 
alexcher AlecTaylor: PDF is a page description language.05:15.46 
  AlecTaylor: How can one be converted to the other?05:16.48
AlecTaylor alexcher: I was able to output an XML document using MuPDF's pdfextract tool. I have extended the output to detect header/footers and page numbers. Now I want to push the *proper* page numbers and additional per-page logical structure information back into the PDF05:17.18
alexcher AlecTaylor: MuPDF developers will be here in a few hours.05:20.18 
AlecTaylor kk05:20.22 
  BTW: Did I post to the right mailing-list?05:20.32 
alexcher AlecTaylor: yes05:23.56 
AlecTaylor Great :)05:24.07 
AlecTaylor has been writing these extensions for the poppler libraries, but they don't seem to have anything for moving from XML->PDF05:24.29 
kens Morning marcosw_08:46.36 
AlecTaylor How can I reverse-engineer an XML file generated with MuPDF's pdfextract tool back into the PDF?09:01.06
kens Write a tool to do it ?09:01.37 
  Open it with an XML reader and print to PDF ?09:01.53
AlecTaylor kens: Back into the PDF it was created with09:04.01 
kens You cannot recover the *same* PDF (as requested somewhere else); it's been interpreted and information has been lost.09:04.44
  You can create a similar, visually more or less identical PDF, but it's not the same PDF.09:05.07
  And we don't (as far as I know) provide a means to put it back. Why would you bother? You have the original.09:05.37
  Actually, I guess you could do so, but I still don't think there's a tool to do it. I still don't see why you would want to either; you have the original file.09:07.22
AlecTaylor kens: I am adding logical structure information and *proper* page numbers into the PDF. No actual text will be changed, just some internal information09:35.45 
AlecTaylor has developed a solution using just the stdlib and the header-only RapidXML library09:36.15 
kens Ah, editing a non-editable format :-)09:40.05 
AlecTaylor kens: eh?09:40.48 
kens Well PDF files aren't really meant to be edited, even for metadata.09:43.07 
AlecTaylor kens: True, but there are billions of PDFs without proper metadata. I would like to fix them all up!09:50.41 
AlecTaylor is sure there's a way09:50.55 
kens Creating a PDF file from the XML output of pdfextract is possible, but you need to regenerate the xref. So you need an application to do it, and we haven't written one.09:52.50
  You could use such a thing to create a Linearized PDF as well mind you09:53.07 
  But that might require more analysis09:53.25 
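A minimal sketch of what "regenerating the xref" involves, assuming a classic (non-stream) cross-reference table and uncompressed objects. The helper name `build_pdf` and the two toy objects are illustrative only and are not part of MuPDF or Ghostscript; a real tool would also have to handle streams, generation numbers and incremental updates.

```python
def build_pdf(objects):
    """Serialise object bodies and rebuild the xref table from scratch.

    `objects` is a list of bytes; item i becomes object number i + 1.
    Object 1 is assumed to be the document catalog.
    """
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)  # startxref must point at the "xref" keyword
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 heads the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # every entry is exactly 20 bytes

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out), offsets


# Toy document: a catalog and an empty page tree.
pdf_bytes, object_offsets = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [] /Count 0 >>",
])
print(object_offsets)
```

Because every offset is recomputed from the bytes actually written, changing the length of any object simply shifts the later entries; that is the bookkeeping an editing tool has to redo after touching a file.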
AlecTaylor Hmm09:56.49 
  kens: Would there be another open-source project which provides the XML->PDF feature?09:57.48 
kens I doubt it10:09.54 
AlecTaylor :(10:10.28 
kens But you know, you can be the first to write one :-)10:10.43 
AlecTaylor yeah, but been working the last few months on this project, want it completed already!10:11.02 
kens will be back later10:25.27 
Robin_Watts marcosw_: ping?12:58.50 
  Hmm. The cluster code is going to need some changes to cope with building on a windows system.13:49.03 
kens I would expect so13:49.12 
Robin_Watts Presumably we want to call the projects rather than the makefiles.13:49.15 
  (otherwise we'd be testing a cygwin build rather than an MSVC one, and defeating the point)13:49.37 
kens Yes, not good13:49.51 
  Unless you explicitly call nmake and force use of VC instead of gcc ?13:50.41 
Robin_Watts I think an explicit call to nmake is the way to go.13:50.59 
  Or, probably better, a command line call to msdev, because that checks the projects.13:51.28 
kens Will it use the right compiler though ?13:51.31
  OK sounds good13:51.40 
Robin_Watts Normally, you have to make something run once a minute on a crontab... that's going to need tweaking too.13:52.06 
  I am NOT having 2 copies of tests_private checked out on my machine. Time for some symlinks :)13:53.02 
kens can't use task scheduling ?13:53.11 
Robin_Watts I'd rather run something to start the cluster, and kill it to stop it.13:54.20 
  (I've just spent 250 quid on a graphics card - no point in that if I'm going to have the machine drop to a crawl when someone commits :) )13:55.09 
  (this is not an Artifex owned machine :) )13:57.00 
kens So buy another one for Artifex.13:57.13 
Robin_Watts If we're going to invest in a dedicated windows node, then putting somewhere with faster network access might make sense.13:58.09 
kens Chris's office ;-)13:58.26 
chrisl as long as it's *very* quiet!13:58.55 
kens Or perhaps Alex's basement13:58.56 
  being slightly more serious for a minute13:59.15 
Robin_Watts My new graphics card is slightly more noisy than my last one :(13:59.31 
  Helen's new passport has just been delivered. I guess she'll be coming to Miami after all...14:12.33
kens Glad to hear it :-)14:12.55 
tkamppeter_ kens, can you have a look at bug 692687? If there is a simple fix for it I would like to issue this fix as a patch release for Oneiric, too.14:20.43 
kens OK14:21.56 
  I would always advise against 'round tripping' files though14:23.20
tkamppeter kens, the problem here is that we are in a transition between PostScript-centric and PDF-centric CUPS filtering. In Debian and Ubuntu we have switched to PDF-centric filtering, but there are still a few applications which send print jobs in PS and a few printer drivers which require PS input.14:27.04
kens Yeah, just saying14:27.37 
  Each conversion potentially throws stuff away14:28.39 
AlecTaylor bn14:32.29 
tor8 Robin_Watts: looking through your patches now. they look good to me, but I must have something to complain about! I really do prefer: while (span) { ... span = next; } to the do { ... span = next; } while (span != NULL) form for loops like in the "stack overflow in text handling" commit14:36.43 
kens tkamppeter it seems to be something to do with multiple subset TrueType fonts. It looks like a lot of glyphs are being replaced with .notdef14:40.48
  It's going to take a while just to reduce the original PostScript file to something simple enough to investigate. I'm rather busy with a different problem right now, how urgent is this ?14:41.40
daviddoria In the help of a latex package it says I can run a line like this to convert all of the pages to separate png files: gs -sDEVICE=png16m -dTextAlphaBits=4 -r300 -dGraphicsAlphaBits=4 -dSAFER -q -dNOPAUSE -sOutputFile=equation%d.png Equations.pdf - however, when I do that, i just get GS> prompt?14:43.23 
kens Any error message ?14:46.00 
  Did you properly specify the input file ?14:46.07 
  Oh, and you need to add -dBATCH to exit afterwards or type 'quit'14:46.32 
daviddoria kens, oh haha it actually worked... I am just surprised it didn't exit automatically but rather took me to a GS> prompt?14:46.39
kens See comment above14:46.50 
daviddoria yep, that did the trick, thanks14:47.02 
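A small sketch of the corrected invocation, built as an argument list in Python. Every flag comes from the command quoted above plus the `-dBATCH` that kens suggested; the function name `gs_png_command` and its parameters are made up for illustration, and the sketch only prints the command rather than running Ghostscript.

```python
import shlex


def gs_png_command(pdf_path, output_pattern="equation%d.png", resolution=300):
    """Return the gs argument list for rendering each page to a PNG."""
    return [
        "gs",
        "-sDEVICE=png16m",
        "-dTextAlphaBits=4",
        "-dGraphicsAlphaBits=4",
        f"-r{resolution}",
        "-dSAFER",
        "-q",
        "-dNOPAUSE",   # do not pause at the end of each page
        "-dBATCH",     # exit after the last file instead of showing GS>
        f"-sOutputFile={output_pattern}",
        pdf_path,
    ]


# Print a shell-quoted form of the command for inspection.
print(shlex.join(gs_png_command("Equations.pdf")))
```

Passing the same list to `subprocess.run(..., check=True)` would run it, assuming a `gs` binary is on the PATH.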
kens no problem14:47.10 
  tkamppeter did you see my earlier question ?14:54.35 
tkamppeter kens, it is not overly urgent, but if it is done in a week it would be great.15:09.36 
Robin_Watts tor8: I did it that way, because it exactly matched what was there before.15:12.12 
kens tkamppeter, well maybe, I can't be certain15:12.29 
Robin_Watts But, yes, it would be nicer to have used while () { ... } as that would have protected against span being NULL on entry.15:12.39 
kens It'll take me several hours just to reduce the problem until I can work on it15:12.42
Robin_Watts tor8: I'll fix that if you want.15:13.06 
kens The file contains 4 fonts, 3 of them are composite and contain up to 15 sub-fonts15:13.09
tor8 Robin_Watts: right. in other places where I've used an iteration to free a linked list instead of the lazy recursion I use while (foo) {}15:13.09 
kens tkamppeter, also this file started life as a PDF file.15:13.34 
  So it's a PDF->PS->PDF->PS15:13.44
tor8 Robin_Watts: also get rid of the != NULL explicitness, we don't do that in mupdf ;)15:14.04 
Robin_Watts but we should :)15:14.24 
tor8 then we may as well rewrite it all in pascal...15:14.41 
  do you want to amend your commit or make another one?15:15.07 
  (ie, should I push to master or wait)15:15.16 
Robin_Watts If you haven't merged yet, I'll fix mine.15:15.28
  but if you have, push what you've got and I'll do new ones.15:15.43 
  I can't do this immediately - in the middle of fighting the cluster.15:15.52 
tor8 I haven't merged anything yet15:15.53 
Robin_Watts by "merged" I mean "pulled my changes in", sorry.15:16.12 
tor8 I pulled them in but I haven't done anything other than looked at them15:16.29 
Robin_Watts ok, I'll redo them. Will be a nicer history that way.15:16.40 
tor8 so if you want to amend, that's no trouble for me15:16.43 
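The loop shapes being discussed are C, but the idea carries over. Below is a rough Python analogue, assuming a hypothetical `Span` node with a `next` link (not MuPDF's actual structure): a test-first `while span:` loop copes with an empty list on entry, and walking iteratively avoids the deep call chain that "lazy recursion" builds up on long lists.

```python
class Span:
    """Toy singly linked node standing in for a text span."""

    def __init__(self, next_span=None):
        self.next = next_span


def make_chain(length):
    """Build a linked list of `length` spans and return its head (or None)."""
    head = None
    for _ in range(length):
        head = Span(head)
    return head


def count_recursive(span):
    # One call frame per node: fine for short lists, but a very long
    # list can exhaust the interpreter's recursion limit.
    if span is None:
        return 0
    return 1 + count_recursive(span.next)


def count_iterative(span):
    # The condition is tested before the first iteration, so a None
    # head is handled without any special case.
    count = 0
    while span:
        count += 1
        span = span.next
    return count


print(count_iterative(None), count_iterative(make_chain(5000)))
```

A do/while-style loop would need an explicit check before entry to be safe with an empty list, which is the protection Robin_Watts mentions.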
tkamppeter kens, the problem here is also that okular sends print job in PS format and it should send PDF, see 
AlecTaylor Reverse XML containing amended page number and new logical structure information (partition header/footer from other page content) using MuPDF? - If so, how?15:45.03 
Robin_Watts AlecTaylor: You're starting from the wrong place.15:46.26 
  When you convert from pdf to xml you are getting out a certain subset of the information from within the PDF.15:46.58 
kens tkamppeter it just increases the complexity, and the number of conversions. Makes it harder to sort out the real problem. It's going to take me quite a while just to simplify the problem. A file with 1 font instead of 4 and no graphics would help.15:47.03
Robin_Watts If you discard the PDF you are then losing all the data.15:47.13 
AlecTaylor Robin_Watts: I needed XML so that I could have per page in "normal" text15:47.21 
Robin_Watts Far smarter to get the xml out, process it to get what you want to add, then do a process to 'add' your new information back into the original pdf.15:47.43 
AlecTaylor Robin_Watts: Also, can't I just modify the PDF that the XML was generated from15:47.49 
  yeah, that15:47.59 
  How can I do that?15:48.09 
Robin_Watts AlecTaylor: Someone suggested on the mailing list that you add it into the end of the contents streams.15:48.30 
  That's possible, but a bad idea, IMHO.15:48.44 
kens Which will break the stream length (need to recalculate it) and the xref.15:48.50
Robin_Watts A better idea would be to add your new content as annotations.15:48.56 
AlecTaylor So what should I do?15:48.57 
Robin_Watts kens: Yeah, but easy enough to do with a modified pdfclean.15:49.08 
AlecTaylor Robin_Watts: As annotations?15:49.12 
Robin_Watts Echo...15:49.26 
  PDF provides the facility to add annotations to a document.15:49.54 
  Read the PDF spec for more details.15:50.19 
  I suspect you'd end up taking pdfclean (which reads a pdf in, 'does stuff' to it, and then writes it out) and tweaking it.15:51.01 
  You want to read a pdf in, add annotations to it, and then write it out.15:51.13 
kens Could use pdfwrite and pdfmarks15:51.43 
Robin_Watts A 'free text' annotation seems like the one you want.15:52.51 
AlecTaylor Hmm, I don't think annotation is what I am after. The displayed page number I'd like changed to whatever I have heuristically determined.15:54.20 
Robin_Watts You use an /AP entry to point at a new xobject that you define, and that xobject can have a stream of pdf operators in it.15:54.21 
kens What do you mean by 'displayed page number' ?15:54.43 
  The rendered value, or the ordinal displayed by Acrobat ?15:54.57
Robin_Watts AlecTaylor: AIUI, you want to add specified content to every page (different on each page), right?15:55.33 
kens thinks not15:55.41 
AlecTaylor kens: The number displayed on each page in the PDF viewer15:56.10 
kens So like I thought, not the page content stream15:56.42 
AlecTaylor The other feature I want to add is a per page logical structure, which specifies what's page content, and what header/footer15:56.49 
kens I think Acrobat gets that from the Outlines tree if present15:56.53
  So you need to add an Outlines15:57.08 
Robin_Watts (or amend an existing one)15:57.19 
AlecTaylor has written and prototyped algorithms that do this, and is outputting to an XML file. Now wants to push that to a PDF15:57.19 
kens As for your other info, what PDF structure do you propose to use to hold that information, marked content ?15:57.44
Robin_Watts Right, so you want some code that takes the pdf and the xml and outputs a new pdf.15:58.05 
AlecTaylor Yeah15:58.14 
Robin_Watts AIUI, the complete scheme for what you have does:15:58.18 
  original.pdf -> temp.xml (using mupdf)15:58.33 
  temp.xml -> temp2.xml (using your processing code)15:58.51 
AlecTaylor pdfextract original.pdf; someothertool original.xml original.pdf15:59.08 
Robin_Watts Then you want to go: temp2.xml + original.pdf -> final.pdf (using the bit of code we are talking about now)15:59.18 
AlecTaylor Now original.pdf has new information from original.xml15:59.23 
Robin_Watts OK. So you want something like a modified pdfclean. You can strip that down so it just opens and reoutputs a pdf (and lose all the garbage collection, page extraction code etc)16:00.45 
  Then you can start to build it up so that it reads your xml and adds extra objects to the PDF before it reoutputs it.16:01.14 
  But you have to realise that pdfextract goes to some lengths to present you with a 'coherent' version of content.16:02.09 
  Just because you see 2 columns of content in the xml output, doesn't mean it appears like that within the PDF source.16:02.57 
  The pdf source could have bits of column 0 and bits of column 1 interleaved with one another.16:03.14 
  hence 'tagging content' at that level is likely to be hard.16:03.45 
  I still don't entirely understand what your aim is here.16:04.00 
  I thought from what you had said before that it was to add headers/footers to pages.16:04.44 
  If you're now saying you want to 'reflow' page content that's a hugely different thing.16:05.01 
kens AIUI the aim is to add metadata to PDF files with things like the 'tagged' (marked data) concept to delineate header/footer/page number etc. And also to add Outlines with the 'correct' page info (content, index etc)16:05.03
Robin_Watts Using marked data will require (I reckon) a modified pdf interpreter; it'd need to run the PDF, and catch the marking of the page; as it marks into the appropriate regions (identified by the previous xml processing), then it would need to rewrite the streams to include marked content markers.16:08.12 
  That's a huge job.16:08.18 
  But I'd like to hear a clear description of what's required from AlecTaylor before we go any further, cos I could be barking up the wrong tree.16:09.01 
  Is the idea to 'mark' the headers/footers/stories (say by adding visual boxes to the page)?16:09.43 
AlecTaylor Hmm16:10.35 
  "<kens> AIUI the aim is to add metadata to PDF files with things like the 'tagged' (marked data) concept to delineate header/footer/page number etc. And also to add Outlines with the 'correct' page info (content, index etc)"16:10.46
  That's what I am trying to do16:10.53 
  as well as "repair" the page numbers displayed by the PDF viewer to be the same as the page number "printed" on the page16:11.25 
Robin_Watts So what's the purpose of adding metadata? How will my viewing experience differ by me looking at a processed file rather than the original one?16:12.03 
kens That depends (I think) on the Outlines. If there are no Outlines Acrobat displays the ordinal page number (1/50 and so on)16:12.11
  Robin_Watts : If you use (eg) a PDF to speech package, it won't read page numbers if they are identified as such, for example16:12.59
  tagged PDF is mandated by a number of government agencies for this reason16:13.14
Robin_Watts OK. That's what I was after. Is that the reason, AlecTaylor ?16:13.35 
kens Because they don't really understand what the creation implies16:13.48 
  Also header/footer may be omitted etc.16:14.16 
AlecTaylor Yeah, accessibility and searching are my purpose for specifying what on the page is the header/footer and what is the other stuff.16:14.45
kens Searching already works, or if it doesn't you won't be able to fix it16:15.05 
Robin_Watts AlecTaylor: Right. So annotations are not the way to go.16:15.12 
  You do need to rewrite the content streams.16:15.28 
kens I see it as /Outlines plus marked content (tagged PDF)16:15.39 
  I *think* you can insert marked content inside text blocks, but I'm not certain16:16.03
  If you can't then that's going to be really hairy16:16.12
Robin_Watts And pdfclean is a reasonable starting point (it reads in/writes out, and will enable you to add Outlines).16:16.18 
  But it's far from easy.16:16.39 
AlecTaylor Hmm16:16.56 
Robin_Watts Because you're going to have to interpret the PDF in order to know where each text rendering operation puts its text (and hence in what section it goes).16:17.24 
AlecTaylor Well on the bright side, at least IT'S POSSIBLE in this library (rather than poppler, which I've been working off)16:17.29 
Robin_Watts You could easily have a text object that wrote to the header, the footer, and multiple stories within it.16:18.02
  AlecTaylor: Anything is possible with any library depending on how much you're prepared to rewrite :/16:18.47 
AlecTaylor Robin_Watts: I envision a find/replace function, search for this text "aaa 44" (whatever's in the XML tag), add information around that text tagging it up16:18.47 
Robin_Watts AlecTaylor: Sadly, no.16:18.58 
AlecTaylor :\16:19.07 
kens That text may not appear in the PDF16:19.18 
Robin_Watts You might have got information in your extracted XML telling you that the word "hello" appeared on the page.16:19.45 
  and you might want to tag that as being part of the header.16:19.52 
  Within the pdf, you might have "hello", or you might have had "h", "e", "l", "l", "o"16:20.24 
  or "he" "ll" "o".16:20.30 
AlecTaylor Ahh, because it isn't shown as a big line of text16:20.34 
Robin_Watts or "o" "l" "e" "h" "l"16:20.45 
kens Or even random text between, but positioned separately16:20.47 
AlecTaylor wait16:20.50 
  That last one, I don't get it16:20.57 
kens Font encoding16:21.03 
  What you see in the XML is the Unicode from the ToUnicode CMap16:21.16
Robin_Watts kens: I wasn't even talking about Font encoding yet :)16:21.18 
kens Or a guess based on other heuristics16:21.25 
Robin_Watts AlecTaylor: Letters can be sent to the page in any order.16:21.37 
kens Yep.16:21.46 
Robin_Watts PDF is basically a stream of 'page marking' operations.16:21.51 
sebras Robin_Watts: mmm, or what characters are actually part of the font embedded in the pdf...16:21.53 
kens And often are in CJKV or right to left languages16:21.58 
AlecTaylor :S16:22.30 
Robin_Watts As such any marks that add up to give the correct final result are fine; letters can be sent in any order.16:22.39
  You could send all the 'a's then all the 'b's etc.16:22.51 
  You need to redo the page interpretation, and watch where each char is placed; if it happens to be outputting a char to a place you know is in a header, then you can rewrite the stream to mark it as being in the header.16:24.46 
  It's not a trivial job.16:24.57 
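A toy illustration of the point about placement order, assuming each glyph has already been reduced to a `(char, x, y)` triple by some interpreter pass. The function name, the tuple layout and the region thresholds are all hypothetical; a real implementation would also have to deal with font encodings, transforms and the rewriting of the stream itself.

```python
from collections import defaultdict


def rebuild_regions(glyphs, header_min_y, footer_max_y):
    """Group glyphs by page region and recover reading order.

    `glyphs` is an iterable of (char, x, y) in the order the content
    stream emitted them; y grows upward, as in PDF user space.
    """
    regions = defaultdict(list)
    for char, x, y in glyphs:
        if y >= header_min_y:
            region = "header"
        elif y <= footer_max_y:
            region = "footer"
        else:
            region = "body"
        regions[region].append((y, x, char))

    text = {}
    for region, items in regions.items():
        # Top-to-bottom (descending y), then left-to-right (ascending x).
        items.sort(key=lambda item: (-item[0], item[1]))
        text[region] = "".join(char for _, _, char in items)
    return text


# "hello" sent as o, l, e, h, l, interleaved with the footer's "44".
emitted = [
    ("o", 50, 800), ("l", 30, 800), ("4", 310, 20), ("e", 20, 800),
    ("h", 10, 800), ("l", 40, 800), ("4", 300, 20),
]
print(rebuild_regions(emitted, header_min_y=750, footer_max_y=50))
```

The emission order carries no meaning here; only the positions decide which region a glyph belongs to, which is why a plain text search over the content stream cannot find "hello".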
AlecTaylor How about extending pdfclean to fix up the streams for this operation?16:28.12 
Robin_Watts AlecTaylor: pdfclean will get the streams into memory in a raw, uncompressed form.16:28.48 
  You can rewrite them there.16:29.02 
AlecTaylor Hmm16:29.08 
kens It's not as simple as 'fixing' them, you have to identify where the stream writes the glyphs you are interested in and modify it16:29.13
AlecTaylor Seems quite extensive and complicated16:29.19 
Robin_Watts BUT... knowing how to rewrite them is the hard part.16:29.21 
kens AlecTaylor : Yes, that's what we've been saying :-)16:29.35 
AlecTaylor :(16:29.43 
Robin_Watts AlecTaylor: You didn't want to pick a simple project for your thesis, right?16:29.44 
AlecTaylor My thesis uses something else, haven't started it yet (hasn't been approved yet).16:30.07 
  This was a one-semester undergrad research project16:30.19 
  Semester ends in 7 days16:30.58 
  Also have 2 exams for other subjects within these 7 days16:31.22 
kens Well you can write up the research16:32.07
AlecTaylor Yeah, that's what I'll do instead.16:32.40 
  This seems wayyyyy too extensive for something I can do in 7 days, even if I wasn't working on anything else16:32.58 
kens Yeah, not a hope of doing it in 7 days16:33.12 
Robin_Watts indeed not.16:33.18 
kens You could do an 80/20 probably16:33.21 
AlecTaylor wah16:33.35 
Robin_Watts AlecTaylor: If you have reliably detected headers/footers/page numbers, then that's a big job done.16:33.58 
kens I think that's still information Tor and I would be interested in getting16:34.31
Robin_Watts You could easily write a 'further work' section describing how you'd like to have a tool to 'reinsert' that information back into the original pdf.16:34.49 
AlecTaylor All well, you guys want my research prototype which creates an XML from the XML generated by pdfextract, with header/footer tags with proper page numbers, implemented in only stdlib (with the header-only RapidXML library for reading in the XML)?16:34.55 
AlecTaylor is giving it the poppler project, but no reason can't give it to you guys too16:35.13 
Robin_Watts AlecTaylor: I think we'd all be interested in seeing it, yes, thanks.16:35.23 
kens I'd certainly be interested in it16:35.24 
AlecTaylor Sure, I'll send over a patch soon16:35.35 
Robin_Watts In fact, when you write up your paper, give us a link :)16:35.39 
kens At the least we can add the information to the XML output from MuPDF and Ghostscript16:35.41
AlecTaylor Will do :)16:35.46 
  I might add in some fuzzy stuff, are OCR errors prominent enough to warrant it?16:36.53 
kens My experience is OCR is too good to need it today16:37.19 
Robin_Watts 'fuzzy' ?16:37.51 
  OCR is at least as good at spelling as your average youtube commentard :)16:38.09 
AlecTaylor Robin_Watts: So I could employ Levenshtein distance with a barrier of 1 or so16:38.13
  On spell-checkers, Google has gotten terrible. It overly uses its result statistics to spell-check16:38.46 
  But sure, if OCR is fine at the moment, I'll leave it be16:39.44 
  Is there an article I can reference so that I can legitimately skip it?16:39.58
kens None that I know of. My only real experience of OCR is 30 years out of date16:40.33 
  Back then it was an interesting problem16:40.51
Robin_Watts AlecTaylor: I don't see that you need to cite a reference; stating that it's a problem that you are ignoring should be enough.16:41.15 
  The original document scanner is far better placed to do such fixes.16:41.37 
AlecTaylor So can I safely say "Levenshtein distance is utilised at scan stage so I won't do it here"16:42.02
  or something along those lines?16:42.07
kens I don't know what techniques OCR people use today.16:42.37 
  I'd be inclined to say that it 'could be used as a further refinement, if experience demonstrates a problem'16:42.57
Robin_Watts AlecTaylor: Yeah, don't tie yourself to a single possible algorithm.16:43.41 
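For reference, a compact sketch of the "fuzzy" comparison AlecTaylor describes: the standard dynamic-programming Levenshtein distance plus a threshold check. The name `fuzzy_equal` and the default threshold of 1 are illustrative assumptions; nothing here reflects what any OCR package actually does.

```python
def levenshtein(a, b):
    """Return the edit distance (insert, delete, substitute) between a and b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # delete char_a
                current[j - 1] + 1,                     # insert char_b
                previous[j - 1] + (char_a != char_b),   # substitute or match
            ))
        previous = current
    return previous[-1]


def fuzzy_equal(a, b, max_distance=1):
    """Treat two strings as the same if they differ by at most max_distance edits."""
    return levenshtein(a, b) <= max_distance


# A page header repeated across pages, with one OCR-style slip ("1" read as "l").
print(fuzzy_equal("Chapter 1", "Chapter l"))
```

With a threshold of 1 a single misread character still matches, at the cost of also matching genuinely different short strings such as consecutive page numbers, which is one reason to treat it as an optional refinement.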
AlecTaylor k16:44.06 
  All well, gotta head off to grab a few hours sleep soon16:44.22 
  (3:44AM here)16:44.27 
  Just gotta figure out this file upload problem for a conference I am organising (which is in 7 days)16:44.43 
  Robin_Watts, kens: Porting MuPDF to javascript: possible/impossible?16:47.06 
  (just for a viewer)16:47.13 
kens Re-writing in JavaScript ? Impossible (or rather takes too long)16:47.41
Robin_Watts AlecTaylor: You wouldn't rewrite, you'd write from scratch.16:48.42 
AlecTaylor Would it be a waste of time? - Is it what Google did?16:49.13 
Robin_Watts Someone else has done it.16:49.19 
  It's not what google did.16:49.26 
  Yes, it would be a HUGE waste of time (unless you're interested in doing it to drive javascript development)16:49.47 
AlecTaylor Nope, just interested16:50.43 
AlecTaylor was thinking to port the Qt Poppler viewer stuff to Wt16:51.01 
mvrhel_laptop good morning16:52.57 
Robin_Watts Morning16:53.05 
kens Hi mvrhel_laptop16:53.49 
  Would you mind looking at bug #692680 ?16:54.14 
  Just to read it!16:54.24 
mvrhel_laptop sure16:56.07 
kens Thanks, if I'm gone when Ray arrives, could you ask him to read it too please ?16:56.35 
mvrhel_laptop yes16:56.50 
kens At the moment it looks like Alex's suggestion is the only concrete possibility for a solution, but you and Ray were talking about high level colour so I'd like you to bear this problem in mind ;-)16:57.17 
mvrhel_laptop ok. making my way through the comments16:58.08 
kens Yeah, it's a bit complicated.16:58.23
  Feel free to ask if it's not clear16:58.44
  Cool technology:16:59.18 
mvrhel_laptop wow this eps file sounds terrible16:59.49 
kens I understand why it's doing it, but I don't like it17:00.02
AlecTaylor I like that green flashing laser17:04.30
  makes me hungry for fake 3d fish17:04.37 
kens I want my real 3D telly now please.17:05.00 
AlecTaylor You already have one17:06.00 
kens proper 3D display17:06.09 
  fake ones don't work for me17:06.16 
AlecTaylor You have a 2D TV? - Amazing!17:09.03 
  You're such a square :P17:09.11 
kens display....17:09.22 
  ray_laptop : I'm about to go off, but can you review and think about bug #692680 please.17:12.47 
  Not for a solution, just to consider in terms of future work relating to colour17:13.02 
ray_laptop kens I saw your comment and the gs-bugs email with17:13.11 
kens Thanks Ray17:13.21 
ray_laptop kens: I agree that it looks AWFUL17:13.31 
kens ray_laptop : well on reflection I understand what's done and why, but it's nasty.17:13.50
ray_laptop kens: yes, and they didn't even 'bind' the procset :-(17:14.11 
kens Actually, even if they did it wouldn't help17:14.25 
  Because we actually need to do what Acrobat does. Send the Indexed sample through both colour spaces, to get 6 colour values for each image sample17:14.48 
  Then set up /DeviceN [/Black /Spot1 /Spot2 /None /None /None] /DeviceRGB17:15.21 
ray_laptop oh, yuck!17:15.45 
mvrhel_laptop wow17:15.50 
kens So if the inks are present we ignore the last three values; if the inks aren't present then the tint transform to RGB does {pop pop pop} to remove the spot inks and leave the RGB17:15.54
ray_laptop kens: I understand now.17:16.37 
henrys ray_laptop:it looks like marcos dumped 692674 on you without history investigation - that customer did get poor service with their problem, will you have a chance to look at it soon? Or should we send it back to marcos for history?17:16.51 
ray_laptop henrys: I've started looking at it (on peeves)17:17.12 
  henrys: I'll post something today. I think Marcos just gave it to me today17:17.32 
  henrys: and I agree that they sort of got ignored, but then, they aren't my favorite customer either17:18.04 
AlecTaylor Damn .htaccess file, it's working now17:18.24 
kens Did they get ignored, they opened it one day and started bleating 24 hours later17:18.28 
henrys ray_laptop:I didn't follow it carefully they whined in the bug report.17:19.12 
ray_laptop henrys: these were the guys that had me work on trimming down the initialization time of gs because they were timing 5,000 simple jobs and starting gs each time17:19.16 
henrys oh same folks, sigh17:19.38 
mvrhel_laptop I wonder what the *huge* difference is17:19.44 
ray_laptop henrys: Miles had them pay extra for that.17:19.47 
  I'll post my comments to the bug once I do the timings on peeves.17:20.06 
  (I'll do profile as well)17:20.28 
kens TBH I think the answer is 'there are bug fixes in newer releases, sometimes these cause performance penalties for correct behaviour. You can have fast or good, your choice' ;-)17:20.33
ray_laptop kens: may be -- but first I want to get the facts17:20.52 
kens Yeah, I was thinking of looking at it, but got diverted, and then realised it wasn't pdfwrite17:21.23 
Robin_Watts kens: performance differences between 8.x and 9 is probably color management - but ray_laptop is right that we'd like to be sure of that.17:21.28 
mvrhel_laptop sure. blame it on the color... :)17:21.56 
ray_laptop mvrhel_laptop: if it is the color, I'll assign the bug to you :-)17:22.15 
kens It could be almost anything, including the fact that it uses FreeType, which is what fixed their *original* problem.17:22.28
  Bear in mind that Chris has done some performance fixes in that area since the release of 9.0417:22.54
ray_laptop kens: iirc, FT is slightly faster than the AFS (Artifex Font Scaler)17:23.13 
kens But that's what made me think of 'yes, it's right now, but slower' as an answer17:23.13
  ray_laptop : There was something that made it slower, Chris fixed it.17:23.27 
ray_laptop kens: don't worry, I will also time HEAD17:23.33 
kens I think it was something about freeing memory17:23.39 
chrisl Yeh, confusion about what Freetype does and when.17:24.30 
kens Anyway, time for me to go, goodnight all17:24.34 
AlecTaylor Alright, well thanks very much Robin_Watts and kens. I will be sure to include the additional relevant information in my research paper, and proceed without reprocessing the PDF file, but just showing the XML as PoC17:24.35
chrisl ray_laptop: there was a FAPI/Freetype performance regression in 9.04, which I fixed, and actually fixed what I was trying to do when I introduced the problem - in theory, it should be slightly faster than 9.02, but I suspect it won't be measurable17:26.46 
ray_laptop chrisl: we'll see. I'll let you know if the profile shows that it is font related in the HEAD rev.17:27.36 
chrisl ray_laptop: okay, thanks.,17:27.50 
  ray_laptop: and just in case, the -dDisableFAPI command option still works if you want back-to-back FAPI/AFS numbers.17:29.08 
ray_laptop chrisl: thanks for the hint17:30.05 
henrys chrisl:so how can shelly run the clusters but not have an account?17:31.50 
chrisl henrys: I have no idea - does git have its own permissions?17:32.27 
Robin_Watts chrisl: He does have an ssh account.17:32.41 
henrys and he has gs-priv17:33.05 
  he's in the group17:33.11 
  my guess is he's trying to check into http read only.17:33.24 
chrisl Okay, let me talk to him, and see if it's a settings problem......17:33.28 
Robin_Watts If he's using "git push" then he doesn't need to have an account.17:33.31 
  sorry "git cluster".17:33.41 
henrys we want him to be able to commit and I don't see why he can't.17:34.33 
Robin_Watts Me either. Tell him to log in here, and we'll talk him through it.17:36.34 
ray_laptop if he cloned from the http, then his remote.origin may be wrong17:36.45 
chrisl If the worst comes to the worst, he can come down here, and I'll hit his computer with a mallet until it works.17:36.52 
henrys ray_laptop:that's what I suspect17:36.55 
ray_laptop have him check his git config -l17:37.01 
Robin_Watts git remote -v17:37.16 
  chrisl: Is there a reason he doesn't log in here ?17:37.46 
ray_laptop Robin_Watts: yeah, that has the info too :-)17:37.47 
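Both checks suggested above (`git config -l` and `git remote -v`) expose the URL behind `origin`. As a minimal sketch, assuming henrys's guess is right and the http clone URL is read-only, the Python below scans `git remote -v` style output and flags remotes whose push URL uses an http scheme. The function name and the example.org URLs are hypothetical and not taken from the project's actual setup.

```python
from urllib.parse import urlparse


def read_only_push_remotes(remote_v_output):
    """Return names of remotes whose push URL uses http or https.

    Assumption (from the discussion, not verified): the http clone URL is
    read-only, so committing upstream needs an ssh-style URL instead.
    """
    suspects = []
    for line in remote_v_output.splitlines():
        parts = line.split()
        # Each line looks like "<name> <url> (fetch)" or "<name> <url> (push)".
        if len(parts) != 3 or parts[2] != "(push)":
            continue
        name, url = parts[0], parts[1]
        if urlparse(url).scheme in ("http", "https"):
            suspects.append(name)
    return suspects


# Hypothetical output; example.org stands in for the real server.
sample = (
    "origin\thttp://example.org/ghostpdl.git (fetch)\n"
    "origin\thttp://example.org/ghostpdl.git (push)\n"
    "backup\tssh://user@example.org/ghostpdl.git (fetch)\n"
    "backup\tssh://user@example.org/ghostpdl.git (push)\n"
)
print(read_only_push_remotes(sample))
```

If a remote is flagged, `git remote set-url <name> <new-url>` repoints it without a fresh clone.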
chrisl Robin_Watts: timing, mostly.17:38.16 
ray_laptop with so many people here with strange nicks, how can you tell he's not17:38.27 
Robin_Watts He'd have to be up at VERY odd hours for someone not to be here :)17:38.44 
henrys he's been on before - IRC may not be compatible with his work situation.17:39.10 
ray_laptop there's always webchat17:39.30 
Robin_Watts henrys: Presumably he's not working for us while *at* work..17:39.47 
ray_laptop Robin_Watts: why not ?17:40.04 
henrys that's what I'd do, why waste my free time ;-)17:40.11 
chrisl Robin_Watts: I'd probably best not comment on that......17:40.39 
AlecTaylor :P17:41.35 
chrisl henrys: I've dropped a mail to Shelly, saying what we think - he'll give me a call if it's not what we think17:47.01 
henrys chrisl:please do invite him to irc next time you talk, I assume I don't need to answer his last email since you are talking to him.17:47.08 
  oops should have said that sooner.17:47.25 
chrisl henrys: I do mention IRC whenever we talk - I'll keep doing so17:48.36 
henrys thanks17:49.01 
chrisl henrys: it would be good if you could reply to him wrt to the billing - seeing us right for the bounty fix that was wrong17:50.48 
henrys chrisl:will do.17:51.37 
chrisl Thanks - that's the proverbial "above my pay grade" ;-)17:52.01 
henrys for those of us that have been kicked upstairs before we break something ;-)17:52.59 
chrisl refuses to comment ;-)17:53.49 
henrys I must warn you guys before the meeting I've cut my hair so try to recognize me. I've had some confusing encounters...17:54.42 
chrisl Wow! Was it weighing you down too much?17:55.11 
Robin_Watts henrys: What would you have got if you'd won the bet?17:56.11 
henrys as you age it gets thinner and you've got to cut it back so it'll get thicker.17:56.45 
Robin_Watts henrys: I'm not sure that works... by that token I'd have *really* thick hair.17:57.26 
henrys according to 23andme I have the genes for male pattern baldness but it hasn't taken hold yet.17:59.30 
ray_laptop henrys: have you told Miles and Scott ? (it would be fun to see their reaction)17:59.32 
henrys no I haven't told them.17:59.59 
  I've had a ponytail for 15 years, I didn't go down easily ;-)18:01.25 
mvrhel_laptop wow18:02.00 
  I had told my kids about your long hair. Luckily Alden had seen you before otherwise they won't believe me when they see you in Miami18:02.37 
henrys it's still longer than Robin_Watts' ;-)18:03.23 
Robin_Watts mvrhel_laptop: Tell them an alligator got to it.18:04.10 
henrys my kids were shocked, they don't have memory of me with short hair.18:05.12 
mvrhel_laptop You should be faster in your runs now18:05.52 
henrys anyway speaking of Florida it'll be great to have a get together with everybody, I'm glad Miles invited spouses, families and all.18:06.15 
Robin_Watts mvrhel_laptop: You're forgetting the Samson effect.18:06.20 
mvrhel_laptop good point18:06.27 
henrys the stats so far indicate a slowdown but we'll see.18:06.43 
AlecTaylor lolol18:07.21 
henrys chrisl:I notice the jbig2 stuff is going to get complicated if he has sjbig2.c changes - have we looked at git externals yet? This business of having 2 repos for jbig2 is a problem.18:14.14 
chrisl henrys: I thought we wanted to avoid externals18:15.09 
Robin_Watts god yes, we want to avoid externals.18:17.13 
  not even sure it's possible with git.18:17.18 
chrisl git submodule18:17.34 
henrys well I thought git externals were better than svn externals and tor was going to ease us into it. But I haven't read about git externals myself.18:17.39 
ray_laptop my kids were shocked when I showed them pictures of me with a beard18:20.58 
  henrys: did you donate your hair ?18:21.23 
  my beard wasn't long enough to donate ;-)18:21.55 
AlecTaylor needs a haircut, his hair is almost down to his lobes!18:22.01 
AlecTaylor shaved for the first time in 2.5 weeks a few days ago18:22.17 
henrys ray_laptop:yes locks of love18:22.18 
chrisl Robin_Watts, henrys: git submodules pulls in a specific revision of the sub-project - so we'd have the same issue of not being able to commit to it.18:22.28 
ray_laptop henrys: good going !!18:22.30 
henrys chrisl:so what to do about jbig2 continue to maintain it in 2 repos?18:24.19 
Robin_Watts Why can't we make the ghostpdl repo the 'one true repo' for jbig2? We can do releases in turn with the ghostscript releases.18:25.30 
chrisl henrys: if I'm honest, my inclination would be to make the ghostscript jbig2dec the "canonical" repos, and kill the other one18:25.35 
henrys chrisl:the mail I've sent so far suggests an API change requires updates in both repos and regular fixes go to the standalone jbig2, then you'll grab that as you please.18:25.38 
  I tried that and tor didn't like it.18:25.58 
  but I guess he can be outvoted - I didn't know you guys would buy into that also.18:27.00 
tor8 henrys, chrisl: that'll be painful for mupdf thirdparty libraries, but as long as we provide release tar balls of jbig2dec we're not in a worse spot than with zlib18:27.34 
chrisl tor8: where do you get the jbig2dec source for the third party libs?18:28.13 
tor8 at the moment I have a (private, on my machine) git repo which uses submodules to pull in all third party libs18:28.22 
  from the jbig2dec git on casper18:28.34 
  there are many ways to have multiple repos for one project with git18:29.14 
  submodules is just one way18:29.20 
  android uses the 'repo' tool18:29.26 
henrys chrisl:so can I tell shelly to just check into gs?18:29.29 
chrisl henrys: for now, I think so. Let's see if we can hammer out a solution keeping the separate repositories - if we do, we'll update it from the gs code18:30.56 
tor8 henrys: if we want to start using submodules I can set that up, but it means all of us have to run a few more git invocations to keep things in sync18:31.04 
  chrisl: I wonder if git cherrypick will work for that, though I think the different paths will pose a problem18:31.57 
chrisl tor8: could we use some triggers so that commits into the gs/jbig2dec get mirrored to the jbig2dec repos and vice versa?18:32.22 
henrys tor8:maybe something for the meeting - I wouldn't expect a warm reception to new git training.18:32.38 
tor8 chrisl: I think it'd be relatively easy to make a script pull and update the jbig2dec sources in gs if we use the external jbig2dec git as a master18:32.56 
  henrys: yeah. conceptually git submodules work great, but they aren't trivial to use :(18:33.27 
  you need to run a few extra commands to keep the subrepositories updated18:33.47 
chrisl Yeh, it was the other way (gs->jbig2dec) I was wondering about18:34.01 
Robin_Watts tor8: Why not have a script that automatically recommits any changes to the ghostscript version into the other repo ?18:34.01 
ray_laptop since jbig2dec doesn't get updated frequently, it might work18:34.14 
henrys tor8:the immediate issue is shelly has a fix that requires a change to gs (sjbig2.c) and jbig2 - somehow this must be synchronized.18:34.21 
ray_laptop henrys: but jbig2 _is_ in the gs repos18:34.56 
henrys ray_laptop:it is in 2 places.18:35.16 
ray_laptop is the problem the shared lib staying in sync ?18:35.39 
tor8 henrys: that's not a problem if we use git submodules (since we'd update the submodule version in the same commit as sjbig2.c)18:35.44 
ray_laptop votes for disallowing shared lib support in gs builds ;-)18:36.59 
tor8 henrys: one approach is to make the jbig2dec.git auto-generated by filtering out all non-jbig2dec related stuff from the main repo18:37.08 
tor8 agrees with ray! I hate shared libraries.18:37.24 
henrys fair enough - the simplest thing is to make gs canonical and do jbig2 tarballs but that solution seems to have slipped away.18:37.25 
Robin_Watts What tor8 just said.18:37.36 
tor8 henrys: let's start with that, and I'll work on a script to create a read-only jbig2dec subset git18:37.52 
Robin_Watts (about auto-generating from the main repo).18:37.54 
henrys tor8:I'm sure chrisl will appreciate that.18:38.13 
tor8 it's very similar to what I did when converting the svn to git repos18:38.16 
  henrys: put it on the tech agenda so you can remind me later18:39.11 
chrisl tor8: if it's doable that'd be great, but if it's a problem I don't mind adding updating of the jbig2dec repos to the Ghostscript release process - we just need to make sure people don't generally commit to it.18:39.55 
tor8 chrisl: we can add hooks to disable commits, or just change the file permissions.18:40.30 
henrys tor8:will do about the agenda for now shelly will commit to gs - and any bug that affects mupdf you'll get one way or another.18:40.43 
tor8 the issue is filtering the whole gs history to create a jbig2dec repo is rather cpu expensive, so we probably don't want to do that on every commit18:41.05 
chrisl henrys: is this a jbig2dec API change, or just a change in how GS uses it?18:41.35 
  tor8: once a week is probably (more than!) enough18:41.59 
tor8 chrisl: indeed!18:42.28 
chrisl And if even that uses too much CPU time - once a month......18:43.19 
Robin_Watts tor8: Once we've generated it, can we not do incremental updates?18:43.38 
  i.e. use a hook on the golden repo on casper so that whenever we commit to ghostpdl in the jbig2dec dir, it recommits to the jbig2dec standalone ?18:44.23 
tor8 Robin_Watts: I'll have to look into it, but I think it should be possible.18:44.32 
Robin_Watts That way we get it instantly in sync, for low cost.18:44.39 
tor8 git filter-branch is what I was thinking of using18:44.51 
  git filter-branch --subdirectory-filter gs/jbig2dec --prune-empty ...revs...18:45.25 
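The incremental variant Robin_Watts describes depends on knowing which commits touch the jbig2dec directory, and what their paths look like once that directory becomes the repository root (the effect of `--subdirectory-filter`). A minimal Python sketch of that path mapping follows, assuming the `gs/jbig2dec` prefix from the command above. The function name and the sample file list are illustrative only; nothing here reflects an actual hook on casper.

```python
def subdirectory_view(changed_paths, subdir="gs/jbig2dec"):
    """Keep only paths under subdir and strip the prefix.

    This mirrors what a subdirectory filter does to file paths: anything
    outside the subdirectory disappears, and the rest is re-rooted.
    """
    prefix = subdir.rstrip("/") + "/"
    return [p[len(prefix):] for p in changed_paths if p.startswith(prefix)]


# Hypothetical list of files touched by one commit in the main repo.
commit_files = [
    "gs/base/sjbig2.c",
    "gs/jbig2dec/jbig2.c",
    "gs/jbig2dec/jbig2_page.c",
]
print(subdirectory_view(commit_files))
```

A commit for which this returns an empty list would not need mirroring at all, which is what keeps the per-commit cost low compared with refiltering the whole history.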
henrys chrisl:now I can't find the patch, hang on.18:45.51 
chrisl henrys: it's really for tor8's benefit - if the jbig2dec API changes, mupdf will need to be aware of it, too.18:46.44 
falko_ hi is it possible to create a rpt port to ghostscript and if i print i print to a printer, create a psfile and after that start an executable (that moves the ps file around) ?18:48.18 
  at the moment i have a script that checks every 3 sec for a new file, does the printing to the printer and the moving around .. but that is not a very nice solution18:49.32 
henrys chrisl:apparently the sjbig2.c is referenced in his email as being in the patch but it isn't there?18:49.49 
chrisl henrys: he did ask how we wanted it - I may have misinterpreted what he said.18:51.02 
henrys figure it out later if he commits it's easy enough to back out.18:55.30 
ray_laptop mvrhel_laptop: the color stuff is DEFINITELY in the top of the profile for the performance (904 vs 871) :-(18:58.34 
mvrhel_laptop :(18:59.00 
  ray_laptop: if you want to dump this on me, that is fine18:59.19 
ray_laptop mvrhel_laptop: cmsSample3DGrid is particularly heavy.18:59.34 
  mvrhel_laptop: I think the issue is that 871 runs with 'simple' color and 90x effectively always runs with -dUseCIEColor19:00.12 
chrisl henrys: Okay, getting Shelly able to commit should resolve the API question from a GS point of view, and I'll make sure to tell him to inform tor8 if the API really has changed.19:00.56 
mvrhel_laptop ray_laptop: yes. I guess what I need to do is to get the "dumb" CMM in place19:01.18 
  for those who don't want color management19:01.28 
ray_laptop mvrhel_laptop: we'd need a 'dumb' cms that totally ignored the ICC profiles 19:01.33 
mvrhel_laptop hehe19:01.38 
  that is on my todo list. and I did start it19:02.04 
henrys what is the effect of just using the dump profiles for now?19:02.17 
ray_laptop mvrhel_laptop: but the complication is that we can only do that for the 'default' colorspaces -- If Lab or ICC comes in, we still need to handle it19:02.30 
mvrhel_laptop henrys: they will still use lcms19:02.52 
  ray_laptop: yes19:03.00 
  I need to think a bit about this19:03.07 
henrys mvrhel_laptop:it may be unique to pcl but many jobs run significantly faster with the dump profiles.19:03.29 
mvrhel_laptop interesting.19:03.39 
  I do have an idea though how to do this19:03.46 
  basically we have an interface between the graphics library and the CMM and we know when the profile was a substituted one for a defaultRGB etc19:04.24 
  this is needed for PDFwrite19:04.29 
ray_laptop mvrhel_laptop: if the "quick and dirty" cmm recognized the input ICC profiles enough to recognize that it is RGB, Gray or CMYK then the 'link' profile it returns can be a suitable dummy that the 'quick and dirty' color conversion code can recognize and shortcut19:04.36 
henrys I think there are many cases in the code where you don't get "pure" colors with the fancy profiles.19:04.44 
mvrhel_laptop yes exactly19:04.45 
  let me look at adding that as an option 19:05.07 
ray_laptop mvrhel_laptop: OK. But let's decide when that fits with your other priorities19:05.36 
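The "quick and dirty" idea above amounts to this: if both ends of a link are plain Gray, RGB or CMYK, skip the ICC machinery and use fixed arithmetic; for anything else (Lab, real ICC profiles) fall back to the full CMM. A minimal Python sketch is given below, assuming PostScript-style device color formulas with full black generation and undercolor removal. All names are hypothetical and this is not the Ghostscript or lcms API.

```python
def rgb_to_gray(r, g, b):
    # NTSC-weighted sum, as used for PostScript device RGB to gray.
    return 0.3 * r + 0.59 * g + 0.11 * b


def rgb_to_cmyk(r, g, b):
    # Assumes black generation and undercolor removal are both identity.
    c, m, y = 1.0 - r, 1.0 - g, 1.0 - b
    k = min(c, m, y)
    return (c - k, m - k, y - k, k)


def cmyk_to_rgb(c, m, y, k):
    # Add black to each ink and clamp at full coverage.
    return tuple(1.0 - min(1.0, x + k) for x in (c, m, y))


# Pairs of default color spaces the shortcut recognizes.
SHORTCUTS = {
    ("RGB", "Gray"): lambda comps: (rgb_to_gray(*comps),),
    ("RGB", "CMYK"): lambda comps: rgb_to_cmyk(*comps),
    ("CMYK", "RGB"): lambda comps: cmyk_to_rgb(*comps),
}


def quick_convert(src, dst, comps):
    """Return converted components, or None if the real CMM is needed."""
    if src == dst:
        return tuple(comps)
    shortcut = SHORTCUTS.get((src, dst))
    return shortcut(comps) if shortcut else None


print(quick_convert("RGB", "CMYK", (1.0, 0.0, 0.0)))
print(quick_convert("Lab", "RGB", (50.0, 0.0, 0.0)))
```

Returning `None` for unrecognized pairs reflects the complication ray_laptop raises: Lab or ICC input still has to go through the real color management path.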
Robin_Watts I'm about to risk breaking the cluster.19:07.22 
  Anyone have any jobs they want to get in desperately?19:07.32 
ray_laptop Robin_Watts: not me19:07.39 
mvrhel_laptop well I am doing a run right now19:07.49 
  oh it just finished19:07.55 
Robin_Watts This will only affect new jobs.19:08.01 
mvrhel_laptop I am done. Going to do a commit now19:08.23 
Robin_Watts ok, I've swapped to a modified that should at least get the build right on windows cluster machines.19:08.58 
mvrhel_laptop actually I may run one more real quick19:09.03 
Robin_Watts Go for it.19:09.09 
  In theory with no windows cluster machines you should see no differences.19:09.19 
mvrhel_laptop ok19:09.27 
henrys Robin_Watts:oh are you getting the windows node going?19:10.20 
Robin_Watts henrys: trying to.19:10.29 
tor8 chrisl, kens: more ammo if you don't like the _t suffix ... it's reserved by POSIX so we shouldn't be using it in the first place!19:17.38 
ray_laptop you mean like tin64_t ?19:19.23 
tor8 yeah19:19.36 
ray_laptop int64_t that is :-/19:19.39 
tor8 posix makes the claim on *_t in ANY header19:20.11 
ray_laptop that seems a bit excessive19:20.47 
  what happened to C standard 19:21.07 
tor8 yeah, but _t is an ugly wart so I don't complain :)19:21.15 
  the C standard, it got let out in the wild... :/19:21.45 
  looking at the list of name spaces that posix lays claim on, it's pretty excessive even without *_t!19:22.42 
Robin_Watts ok, we have a windows cluster node... and it's pinging the cluster, let's try doing a build.19:22.56 
  My cygwin installation appears to be having problems. Gah.19:25.38 
ray_laptop btw, the HEAD rev is virtually the same on this test as 904, so FT isn't bollixing anything up19:25.42 
henrys bbiab19:29.07 
mvrhel_laptop bbiaw19:45.50 
Robin_Watts mvrhel_laptop: I've taken my node offline, so your job should run without a problem.19:48.35 
mvrhel Robin_Watts: you around?20:51.25 
Robin_Watts I am.20:51.31 
mvrhel so I was just testing 692512 to see if we could close this20:51.55 
  and I see that we get horizontal lines at 600dpi20:52.09 
  I thought we had this fixed20:52.18 
Robin_Watts so did I :(20:52.46 
mvrhel the bug was originally for vertical lines20:52.50 
  I have a single page version of this file if you want to take a look at it20:53.06 
Robin_Watts Assign it to me, and I'll have a look when I uncluster myself.20:53.09 
mvrhel ok.20:53.15 
Robin_Watts Damn. Looks like I really am going to need to check out another copy of tests/tests_private.21:08.01 
mvrhel_laptop stormy here. we may lose power23:43.32 