IRC Logs

Log of #ghostscript at irc.freenode.net.

2011/11/16
Robin_Watts henrys: You're welcome. Glad to hear you think it'll fix it.01:24.51 
arthurf tor8: Hi. Tried to do some quick searching with the iOS app on a PDF file, but not much happened. I'm guessing there is more work to do? 02:52.24 
tor8 arthurf: yes. it still doesn't show any results :)02:52.50 
arthurf tor8: Okay thanks. Just wanted to check that I wasn't completely missing something. Thanks. :)02:53.22 
robingower I'm having a bit of a nightmare compiling ghostscript on a mac02:54.44
  I filed this ticket at macports months ago02:54.54 
  https://trac.macports.org/ticket/28825 02:54.55
  I can't get it to build outside of macports either02:55.18 
  any ideas of what else I could do to diagnose my problem?02:55.49 
tor8 robingower: that's odd, many of us (developers) use macs as development machines when working on ghostscript02:56.02 
robingower indeed. I've tried the usual suspects like fiddling with arch settings and I've removed fink and macports etc02:57.41 
tor8 my normal procedure is: git clone, ./autogen.sh, make02:59.30 
  I don't use macports or anything like that, just a plain mac install with Xcode03:00.10 
robingower 'k thanks tor8 I'll give that a try03:08.37 
  I've cloned from git://git.ghostscript.com/ghostpdl.git and done autogen & make03:23.01 
  but I still get03:23.04 
  ../gs/base/gstype42.c:68: error: conflicting types for ‘gs_type42_read_data’03:23.09 
  ../gs/base/gxfont42.h:141: error: previous declaration of ‘gs_type42_read_data’ was here03:23.14 
  I've checked the offending lines (dozens of times!) and they look the same to me03:23.36
  i.e. same parameter types etc03:23.49 
AlecTaylor hi04:53.39 
ghostbot hola04:53.39 
AlecTaylor Any MuPDF developers around?04:53.48 
  How would I go about reverse-engineering an XML file back into the PDF?05:06.41 
alexcher AlecTaylor: XML describes a general structured document, e.g. an IP datagram.05:15.12 
AlecTaylor just posted on the mailing-list05:15.30 
alexcher AlecTaylor: PDF is a page description language.05:15.46 
  AlecTaylor: How one can be converted to another?05:16.48 
AlecTaylor alexcher: I was able to output an XML document using MuPDF's pdfextract tool. I have extended the output to detect header/footers and page numbers. Now I want to push the *proper* page numbers and additional per-page logical structure information back into the PDF05:17.18
alexcher AlecTaylor: MuPDF developers will be here in a few hours.05:20.18 
AlecTaylor kk05:20.22 
  BTW: Did I post to the right mailing-list?05:20.32 
  http://ghostscript.com/pipermail/gs-devel/2011-November/009091.html 05:23.14
alexcher AlecTaylor: yes05:23.56 
AlecTaylor Great :)05:24.07 
AlecTaylor has been writing these extensions for the poppler libraries, but they don't seem to have anything for moving from XML->PDF05:24.29 
kens Morning marcosw_08:46.36 
AlecTaylor How can I reverse-engineer an XML file generated with MuPDF's pdfextract tool back into the PDF?09:01.06
kens Write a tool to do it ?09:01.37 
  Open it with a XML reader and print to PDF ?09:01.53 
AlecTaylor kens: Back into the PDF it was created with09:04.01 
kens You cannot recover the *same* PDF (as requested somewhere else), it's been interpreted and information has been lost.09:04.44
  You can create a similar, visually more or less identical PDF, but it's not the same PDF.09:05.07
  And we don't (as far as I know) provide a means to put it back. Why would you bother, you have the original.09:05.37
  Actually, I guess you could do so, but I still don't think there's a tool to do it. I still don't see why you would want to either, you have the original file09:07.22
AlecTaylor kens: I am adding logical structure information and *proper* page numbers into the PDF. No actual text will be changed, just some internal information09:35.45 
AlecTaylor has developed a solution using just the stdlib and the header-only RapidXML library09:36.15 
kens Ah, editing a non-editable format :-)09:40.05 
AlecTaylor kens: eh?09:40.48 
kens Well PDF files aren't really meant to be edited, even for metadata.09:43.07 
AlecTaylor kens: True, but there are billions of PDFs without proper metadata. I would like to fix them all up!09:50.41 
AlecTaylor is sure there's a way09:50.55 
kens Creating a PDF file from the XML output of pdfextract is possible, but you need to regenerate the xref. So you need an application to do it, and we haven't written one.09:52.50
  You could use such a thing to create a Linearized PDF as well mind you09:53.07 
  But that might require more analysis09:53.25 
AlecTaylor Hmm09:56.49 
  kens: Would there be another open-source project which provides the XML->PDF feature?09:57.48 
kens I doubt it10:09.54 
AlecTaylor :(10:10.28 
kens But you know, you can be the first to write one :-)10:10.43 
AlecTaylor yeah, but been working the last few months on this project, want it completed already!10:11.02 
kens will be back later10:25.27 
Robin_Watts marcosw_: ping?12:58.50 
  Hmm. The cluster code is going to need some changes to cope with building on a windows system.13:49.03 
kens I would expect so13:49.12 
Robin_Watts Presumably we want to call the projects rather than the makefiles.13:49.15 
  (otherwise we'd be testing a cygwin build rather than an MSVC one, and defeating the point)13:49.37 
kens Yes, not good13:49.51 
  Unless you explicitly call nmake and force use of VC instead of gcc ?13:50.41 
Robin_Watts I think an explicit call to nmake is the way to go.13:50.59 
  Or, probably better, a command line call to msdev, because that checks the projects.13:51.28 
kens Will it use the right compiler though ?13:51.31
  OK sounds good13:51.40 
Robin_Watts Normally, you have to make something run once a minute on a crontab... that's going to need tweaking too.13:52.06 
  I am NOT having 2 copies of tests_private checked out on my machine. Time for some symlinks :)13:53.02 
kens can't use task scheduling ?13:53.11 
Robin_Watts I'd rather run something to start the cluster, and kill it to stop it.13:54.20 
  (I've just spent 250 quid on a graphics card - no point in that if I'm going to have the machine drop to a crawl when someone commits :) )13:55.09 
  (this is not an Artifex owned machine :) )13:57.00 
kens So buy another one for Artifex.13:57.13 
Robin_Watts If we're going to invest in a dedicated windows node, then putting somewhere with faster network access might make sense.13:58.09 
kens Chris's office ;-)13:58.26 
chrisl as long as it's *very* quiet!13:58.55 
kens Or perhaps Alex's basement13:58.56 
  being slightly more serious for a minute13:59.15 
Robin_Watts My new graphics card is slightly more noisy than my last one :(13:59.31 
  Helens new passport has just been delivered. I guess she'll be coming to Miami after all...14:12.33 
kens Glad to hear it :-)14:12.55 
tkamppeter_ kens, can you have a look at bug 692687? If there is a simple fix for it I would like to issue this fix as a patch release for Oneiric, too.14:20.43 
kens OK14:21.56 
  I would always advise against 'round tripping' files though14:23.20
tkamppeter kens, the problem here is that we are in a transition between PostScript-centric and PDF-centric CUPS filtering. In Debian and Ubuntu we have switched to PDF-centric filtering, but there are still a few applications which send print jobs in PS and a few printer drivers which require PS input.14:27.04
kens Yeah, just saying14:27.37 
  Each conversion potentially throws stuff away14:28.39 
AlecTaylor bn14:32.29 
tor8 Robin_Watts: looking through your patches now. they look good to me, but I must have something to complain about! I really do prefer: while (span) { ... span = next; } to the do { ... span = next; } while (span != NULL) form for loops like in the "stack overflow in text handling" commit14:36.43 
kens tkamppeter it seems to be something to do with multiple subset TrueType fonts. It looks like a lot of glyphs are being replaced with .notdef14:40.48
  It's going to take a while just to reduce the original PostScript file to something simple enough to investigate. I'm rather busy with a different problem right now, how urgent is this ?14:41.40
daviddoria In the help of a latex package it says I can run a line like this to convert all of the pages to separate png files: gs -sDEVICE=png16m -dTextAlphaBits=4 -r300 -dGraphicsAlphaBits=4 -dSAFER -q -dNOPAUSE -sOutputFile=equation%d.png Equations.pdf - however, when I do that, i just get GS> prompt?14:43.23 
kens Any error message ?14:46.00 
  Did you properly specify the input file ?14:46.07 
  Oh, and you need to add -dBATCH to exit afterwards or type 'quit'14:46.32 
daviddoria kens, oh haha it actually worked... I am just surprised it didn't exit automatically but rather took me to a GS> prompt?14:46.39
kens See comment above14:46.50 
daviddoria yep, that did the trick, thanks14:47.02 
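A minimal Python sketch of the fix kens describes: the same command line daviddoria quoted, with -dBATCH added so Ghostscript exits after the last page instead of dropping to the GS> prompt. The helper name gs_png_command and its parameters are made up for illustration; only the flags themselves come from the conversation.

```python
import shlex


def gs_png_command(pdf_path, output_pattern="equation%d.png", resolution=300):
    """Build the argument list for rendering each PDF page to its own PNG."""
    return [
        "gs",
        "-sDEVICE=png16m",
        "-dTextAlphaBits=4",
        "-dGraphicsAlphaBits=4",
        f"-r{resolution}",
        "-dSAFER",
        "-q",
        "-dNOPAUSE",   # do not pause between pages
        "-dBATCH",     # exit after the last file instead of showing GS>
        f"-sOutputFile={output_pattern}",
        pdf_path,
    ]


cmd = gs_png_command("Equations.pdf")
print(shlex.join(cmd))

# To actually run it (assumes a gs binary on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the list rather than a single string avoids shell quoting surprises with the %d in the output file name.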
kens no problem14:47.10 
  tkamppeter did you see my earlier question ?14:54.35 
tkamppeter kens, it is not overly urgent, but if it is done in a week it would be great.15:09.36 
Robin_Watts tor8: I did it that way, because it exactly matched what was there before.15:12.12 
kens tkamppeter, well maybe, I can't be certain15:12.29 
Robin_Watts But, yes, it would be nicer to have used while () { ... } as that would have protected against span being NULL on entry.15:12.39 
kens It'll take me several hours just to reduce the problem until I can work on it15:12.42
Robin_Watts tor8: I'll fix that if you want.15:13.06
kens The file contains 4 fonts, 3 of them are composite and contain up to 15 sub-fonts15:13.09
tor8 Robin_Watts: right. in other places where I've used an iteration to free a linked list instead of the lazy recursion I use while (foo) {}15:13.09 
kens tkamppeter, also this file started life as a PDF file.15:13.34 
  So it's a PDF->PS->PDF->PS15:13.44
tor8 Robin_Watts: also get rid of the != NULL explicitness, we don't do that in mupdf ;)15:14.04 
Robin_Watts but we should :)15:14.24 
tor8 then we may as well rewrite it all in pascal...15:14.41 
  do you want to amend your commit or make another one?15:15.07 
  (ie, should I push to master or wait)15:15.16 
Robin_Watts If you haven't merged yet, I'll fix mine.15:15.28
  but if you have, push what you've got and I'll do new ones.15:15.43 
  I can't do this immediately - in the middle of fighting the cluster.15:15.52 
tor8 I haven't merged anything yet15:15.53 
Robin_Watts by "merged" I mean "pulled my changes in", sorry.15:16.12 
tor8 I pulled them in but I haven't done anything other than looked at them15:16.29 
Robin_Watts ok, I'll redo them. Will be a nicer history that way.15:16.40 
tor8 so if you want to amend, that's no trouble for me15:16.43 
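The loop-style point tor8 and Robin_Watts settle on can be sketched as follows. The code under discussion is C in MuPDF; this is a Python restatement with a made-up Span class, and since Python has no do/while the tail-tested form is emulated with a break. It only illustrates why the head-tested loop is safe when span is empty on entry.

```python
class Span:
    """Hypothetical singly linked list node standing in for a text span."""

    def __init__(self, text, next=None):
        self.text = text
        self.next = next


def walk_head_tested(span):
    """Equivalent of: while (span) { ...; span = next; }"""
    visited = []
    while span is not None:
        nxt = span.next
        visited.append(span.text)
        span = nxt
    return visited


def walk_tail_tested(span):
    """Equivalent of: do { ...; span = next; } while (span != NULL)"""
    visited = []
    while True:
        nxt = span.next  # fails if span is None on entry
        visited.append(span.text)
        span = nxt
        if span is None:
            break
    return visited


chain = Span("a", Span("b", Span("c")))
print(walk_head_tested(chain))
print(walk_head_tested(None))
```

Both forms visit the same nodes for a non-empty list; only the head-tested one copes with an empty list.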
tkamppeter kens, the problem here is also that okular sends print job in PS format and it should send PDF, see https://bugs.launchpad.net/ubuntu/+source/okular/+bug/891199 15:44.43
AlecTaylor Reverse XML containing amended page number and new logical structure information (partition header/footer from other page content) using MuPDF? - If so, how?15:45.03 
Robin_Watts AlecTaylor: You're starting from the wrong place.15:46.26 
  When you convert from pdf to xml you are getting out a certain subset of the information from within the PDF.15:46.58 
kens tkamppeter it just increases the complexity, and the number of conversions. Makes it harder to sort out the real problem. It's going to take me quite a while just to simplify the problem. A file with 1 font instead of 4 and no graphics would help.15:47.03
Robin_Watts If you discard the PDF you are then losing all the data.15:47.13 
AlecTaylor Robin_Watts: I needed XML so that I could have per page in "normal" text15:47.21 
Robin_Watts Far smarter to get the xml out, process it to get what you want to add, then do a process to 'add' your new information back into the original pdf.15:47.43 
AlecTaylor Robin_Watts: Also, can't I just modify the PDF that the XML was generated from15:47.49 
  yeah, that15:47.59 
  How can I do that?15:48.09 
Robin_Watts AlecTaylor: Someone suggested on the mailing list that you add it into the end of the contents streams.15:48.30 
  That's possible, but a bad idea, IMHO.15:48.44 
kens Which will break the stream length (need to recalculate it) and the xref.15:48.50
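Why appending to a contents stream breaks both the stream length and the xref can be shown with a toy writer. This is a sketch only, not how pdfclean works: build_pdf, stream_object and make_bodies are hypothetical helpers, and the four-object file is the bare minimum needed to show byte offsets moving.

```python
def stream_object(data):
    """Wrap raw bytes as a stream object body with a matching /Length."""
    return b"<< /Length %d >>\nstream\n" % len(data) + data + b"\nendstream"


def build_pdf(bodies):
    """Write bodies as objects 1..n, then an xref table built from their offsets."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(bodies, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(bodies) + 1)
    out += b"0000000000 65535 f \n"  # object 0 heads the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # every entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out), offsets


def make_bodies(content):
    """Catalog, page tree, contents stream (object 3), page (object 4)."""
    return [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [4 0 R] /Count 1 >>",
        stream_object(content),
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] /Contents 3 0 R >>",
    ]


original = b"0 0 100 100 re f"
extra = b"\nq Q"  # stand-in for content appended to the stream
pdf_before, before = build_pdf(make_bodies(original))
pdf_after, after = build_pdf(make_bodies(original + extra))
print(before)
print(after)
```

The /Length value changes, and every object after the edited stream (plus the startxref value) moves by the number of bytes added, so a tool that appends to a stream in place has to rewrite all three.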
Robin_Watts A better idea would be to add your new content as annotations.15:48.56 
AlecTaylor So what should I do?15:48.57 
Robin_Watts kens: Yeah, but easy enough to do with a modified pdfclean.15:49.08 
AlecTaylor Robin_Watts: As annotations?15:49.12 
Robin_Watts Echo...15:49.26 
  PDF provides the facility to add annotations to a document.15:49.54 
  Read the PDF spec for more details.15:50.19 
  I suspect you'd end up taking pdfclean (which reads a pdf in, 'does stuff' to it, and then writes it out) and tweaking it.15:51.01 
  You want to read a pdf in, add annotations to it, and then write it out.15:51.13 
kens Could use pdfwrite and pdfmarks15:51.43 
Robin_Watts A 'free text' annotation seems like the one you want.15:52.51 
AlecTaylor Hmm, I don't think annotation is what I am after. The displayed page number I'd like changed to whatever I have heuristically determined.15:54.20 
Robin_Watts You use an /AP entry to point at a new xobject that you define, and that xobject can have a stream of pdf operators in it.15:54.21 
kens What do you mean by 'displayed page number' ?15:54.43 
  The rendered value, or the ordinal displayed by Acrobat ?15:54.57
Robin_Watts AlecTaylor: AIUI, you want to add specified content to every page (different on each page), right?15:55.33 
kens thinks not15:55.41 
AlecTaylor kens: The number displayed on each page in the PDF viewer15:56.10 
kens So like I thought, not the page content stream15:56.42 
AlecTaylor The other feature I want to add is a per page logical structure, which specifies what's page content, and what header/footer15:56.49 
kens I think Acrobat gets that from the Outlines tree if present15:56.53
  So you need to add an Outlines15:57.08 
Robin_Watts (or amend an existing one)15:57.19 
AlecTaylor has written and prototyped algorithms that do this, and is outputting to an XML file. Now wants to push that to a PDF15:57.19 
  yeah15:57.32 
kens As for your other info, what PDF structure do you propose to use to hold that information, marked content ?15:57.44
Robin_Watts Right, so you want some code that takes the pdf and the xml and outputs a new pdf.15:58.05 
AlecTaylor Yeah15:58.14 
Robin_Watts AIUI, the complete scheme for what you have does:15:58.18 
  original.pdf -> temp.xml (using mupdf)15:58.33 
  temp.xml -> temp2.xml (using your processing code)15:58.51 
AlecTaylor pdfextract original.pdf; someothertool original.xml original.pdf15:59.08 
Robin_Watts Then you want to go: temp2.xml + original.pdf -> final.pdf (using the bit of code we are talking about now)15:59.18 
AlecTaylor Now original.pdf has new information from original.xml15:59.23 
  yeah15:59.33 
Robin_Watts OK. So you want something like a modified pdfclean. You can strip that down so it just opens and reoutputs a pdf (and lose all the garbage collection, page extraction code etc)16:00.45 
  Then you can start to build it up so that it reads your xml and adds extra objects to the PDF before it reoutputs it.16:01.14 
  But you have to realise that pdfextract goes to some lengths to present you with a 'coherent' version of content.16:02.09 
  Just because you see 2 columns of content in the xml output, doesn't mean it appears like that within the PDF source.16:02.57 
  The pdf source could have bits of column 0 and bits of column 1 interleaved with one another.16:03.14 
  hence 'tagging content' at that level is likely to be hard.16:03.45 
  I still don't entirely understand what your aim is here.16:04.00 
  I thought from what you had said before that it was to add headers/footers to pages.16:04.44 
  If you're now saying you want to 'reflow' page content that's a hugely different thing.16:05.01 
kens AIUI the aim is to add metadata to PDF files with things like the 'tagged' (marked data) concept to delineate header/footer/page number etc. And also to add Outlines with the 'correct' page info (content, index etc)16:05.03
Robin_Watts Using marked data will require (I reckon) a modified pdf interpreter; it'd need to run the PDF, and catch the marking of the page; as it marks into the appropriate regions (identified by the previous xml processing), then it would need to rewrite the streams to include marked content markers.16:08.12 
  That's a huge job.16:08.18 
  But I'd like to hear a clear description of what's required from AlecTaylor before we go any further, cos I could be barking up the wrong tree.16:09.01 
  Is the idea to 'mark' the headers/footers/stories (say by adding visual boxes to the page)?16:09.43 
AlecTaylor Hmm16:10.35 
  "<kens> AIUI the aim is to add metadata to PDF files with things like the 'tagged' (marked data) concept to delineate header/footer/page number etc. And also to add Outlines with the 'correct' page info (content, index etc)"16:10.46
  That's what I am trying to do16:10.53 
  as well as "repair" the page numbers displayed by the PDF viewer to be the same as the page number "printed" on the page16:11.25 
Robin_Watts So what's the purpose of adding metadata? How will my viewing experience differ by me looking at a processed file rather than the original one?16:12.03 
kens That depends (I think) on the Outlines. If there are no Outlines Acrobat displays the ordinal page number (1/50 and so on)16:12.11
  Robin_Watts : If you use (eg) a PDF to speech package, it won't read page numbers if they are identified as such, for example16:12.59
  tagged PDF is mandated by a number of government agencies for this reason16:13.14
  (accessibility)16:13.20 
Robin_Watts OK. That's what I was after. Is that the reason, AlecTaylor ?16:13.35 
kens Because they don't really understand what the creation implies16:13.48 
  Also header/footer may be omitted etc.16:14.16 
AlecTaylor Yeah, for accessibility and searching is my purpose for specifying what on the page is the header/footer and what is the other stuff.16:14.45
kens Searching already works, or if it doesn't you won't be able to fix it16:15.05 
Robin_Watts AlecTaylor: Right. So annotations are not the way to go.16:15.12 
  You do need to rewrite the content streams.16:15.28 
kens I see it as /Outlines plus marked content (tagged PDF)16:15.39 
  I *think* you can insert marked content inside text blocks, but I'm not certain16:16.03
  If you can't then that's going to be really hairy16:16.12
Robin_Watts And pdfclean is a reasonable starting point (it reads in/writes out, and will enable you to add Outlines).16:16.18 
  But it's far from easy.16:16.39 
AlecTaylor Hmm16:16.56 
Robin_Watts Because you're going to have to interpret the PDF in order to know where each text rendering operation puts its text (and hence in what section it goes).16:17.24 
AlecTaylor Well on the bright side, at least IT'S POSSIBLE in this library (rather than poppler, which I've been working off)16:17.29 
Robin_Watts You could easily have a text object that wrote to both header, footer, and multiple stories within it.16:18.02
  AlecTaylor: Anything is possible with any library depending on how much you're prepared to rewrite :/16:18.47 
AlecTaylor Robin_Watts: I envision a find/replace function, search for this text "aaa 44" (whatever's in the XML tag), add information around that text tagging it up16:18.47 
Robin_Watts AlecTaylor: Sadly, no.16:18.58 
AlecTaylor :\16:19.07 
kens That text may not appear in the PDF16:19.18 
Robin_Watts You might have got information in your extracted XML telling you that the word "hello" appeared on the page.16:19.45 
  and you might want to tag that as being part of the header.16:19.52 
  Within the pdf, you might have "hello", or you might have had "h", "e", "l", "l", "o"16:20.24 
  or "he" "ll" "o".16:20.30 
AlecTaylor Ahh, because it isn't shown as a big line of text16:20.34 
Robin_Watts or "o" "l" "e" "h" "l"16:20.45 
kens Or even random text between, but positioned separately16:20.47 
AlecTaylor wait16:20.50 
  what16:20.52 
  That last one, I don't get it16:20.57 
kens Font encoding16:21.03 
  What you see in the XML is the Unicode from the ToUnicode CMap16:21.16
Robin_Watts kens: I wasn't even talking about Font encoding yet :)16:21.18 
kens Or a guess based on other heuristics16:21.25 
Robin_Watts AlecTaylor: Letters can be sent to the page in any order.16:21.37 
kens Yep.16:21.46 
Robin_Watts PDF is basically a stream of 'page marking' operations.16:21.51 
sebras Robin_Watts: mmm, or what characters are actually part of the font embedded in the pdf...16:21.53 
kens And often are in CJKV or right to left languages16:21.58 
AlecTaylor :S16:22.30 
Robin_Watts As such any marks that add up to give the correct final result are fine; letters can be sent in any order.16:22.39
  You could send all the 'a's then all the 'b's etc.16:22.51 
  You need to redo the page interpretation, and watch where each char is placed; if it happens to be outputting a char to a place you know is in a header, then you can rewrite the stream to mark it as being in the header.16:24.46 
  It's not a trivial job.16:24.57 
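A small Python sketch of why a find/replace on the stream fails and why positions have to be tracked instead. The placements list is invented data standing in for what an interpreter would report (character plus x, y position, in emission order), and the header band of y >= 750 is an assumed region, not something taken from a real file.

```python
# (char, x, y) in the order a hypothetical content stream emits the glyphs.
# The header word is sent as "o" "l" "e" "h" "l", as in the example above.
placements = [
    ("o", 40, 780), ("l", 20, 780), ("e", 10, 780), ("h", 0, 780), ("l", 30, 780),
    ("B", 0, 400), ("o", 10, 400), ("d", 20, 400), ("y", 30, 400),
]


def emitted_text(placements):
    """Text as it would appear if you just read the stream in order."""
    return "".join(ch for ch, _, _ in placements)


def text_in_region(placements, y_min, y_max):
    """Collect glyphs whose y falls in the band, ordered left to right."""
    inside = [(x, ch) for ch, x, y in placements if y_min <= y <= y_max]
    return "".join(ch for _, ch in sorted(inside))


print(emitted_text(placements))
print(text_in_region(placements, 750, 800))
print(text_in_region(placements, 0, 749))
```

Searching the emission order for the header word finds nothing, while grouping by position recovers it. A real tool would also have to handle font encodings and multiple lines, which this sketch ignores.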
AlecTaylor How about extending pdfclean to fix up the streams for this operation?16:28.12 
Robin_Watts AlecTaylor: pdfclean will get the streams into memory in a raw, uncompressed form.16:28.48 
  You can rewrite them there.16:29.02 
AlecTaylor Hmm16:29.08 
kens It's not as simple as 'fixing' them, you have to identify where the stream writes the glyphs you are interested in and modify it16:29.13
AlecTaylor Seems quite extensive and complicated16:29.19 
Robin_Watts BUT... knowing how to rewrite them is the hard part.16:29.21 
kens AlecTaylor : Yes, that's what we've been saying :-)16:29.35 
AlecTaylor :(16:29.43 
Robin_Watts AlecTaylor: You didn't want to pick a simple project for your thesis, right?16:29.44 
AlecTaylor My thesis uses something else, haven't started it yet (hasn't been approved yet).16:30.07 
  This was a one-semester undergrad research project16:30.19 
  Semester ends in 7 days16:30.58 
  Also have 2 exams for other subjects within these 7 days16:31.22 
kens Well you can weite up the research16:32.07 
  write16:32.12 
AlecTaylor Yeah, that's what I'll do instead.16:32.40 
  This seems wayyyyy too extensive for something I can do in 7 days, even if I wasn't working on anything else16:32.58 
kens Yeah, not a hope of doing it in 7 days16:33.12 
Robin_Watts indeed not.16:33.18 
kens You could do an 80/20 probably16:33.21 
AlecTaylor wah16:33.35 
Robin_Watts AlecTaylor: If you have reliably detected headers/footers/page numbers, then that's a big job done.16:33.58 
kens I think that's still information Tor and I would be interested in getting16:34.31
Robin_Watts You could easily write a 'further work' section describing how you'd like to have a tool to 'reinsert' that information back into the original pdf.16:34.49 
AlecTaylor All well, you guys want my research prototype which creates an XML from the XML generated by pdfextract, with header/footer tags with proper page numbers, implemented in only stdlib (with the header-only RapidXML library for reading in the XML)?16:34.55 
AlecTaylor is giving it the poppler project, but no reason can't give it to you guys too16:35.13 
Robin_Watts AlecTaylor: I think we'd all be interested in seeing it, yes, thanks.16:35.23 
kens I'd certainly be interested in it16:35.24 
AlecTaylor Sure, I'll send over a patch soon16:35.35 
Robin_Watts In fact, when you write up your paper, give us a link :)16:35.39 
kens At the least we can add the information to the XML output from MuPDF and Ghostscript16:35.41
AlecTaylor Will do :)16:35.46 
  I might add in some fuzzy stuff, are OCR errors prominent enough to warrant it?16:36.53 
kens My experience is OCR is too good to need it today16:37.19 
Robin_Watts 'fuzzy' ?16:37.51 
  OCR is at least as good at spelling as your average youtube commentard :)16:38.09 
AlecTaylor Robin_Watts: So I could employ Levenshtein Distance with a barrier of 1 or so16:38.13
  On spell-checkers, Google has gotten terrible. It overly uses its result statistics to spell-check16:38.46 
  But sure, if OCR is fine at the moment, I'll leave it be16:39.44 
  Is there an article I can reference so that I can legitimately skip it?16:39.58
kens None that I know of. My only real experience of OCR is 30 years out of date16:40.33 
  Back then it was an interesting problem16:40.51
Robin_Watts AlecTaylor: I don't see that you need to cite a reference; stating that it's a problem that you are ignoring should be enough.16:41.15 
  The original document scanner is far better placed to do such fixes.16:41.37 
AlecTaylor So can I safely say "Levenshtein distance is utilised at scan stage so I won't do it here"16:42.02
  or something along those lines?16:42.07
kens I don't know what techniques OCR people use today.16:42.37 
  I'd be inclined to say that it 'could be used as a further refinement, if experience demonstrates a problem'16:42.57
Robin_Watts AlecTaylor: Yeah, don't tie yourself to a single possible algorithm.16:43.41 
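For reference, the fuzzy match AlecTaylor has in mind (Levenshtein distance with a threshold of 1) can be sketched in a few lines of Python. This is the textbook dynamic-programming version; the function names and the sample strings are illustrative and are not taken from his prototype.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the working row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_equal(a, b, max_distance=1):
    """True when the two strings are within max_distance edits of each other."""
    return levenshtein(a, b) <= max_distance


print(levenshtein("kitten", "sitting"))
print(fuzzy_equal("Chapter 1", "Chapter l"))
```

With a threshold of 1, a single misread character in a repeated header would still match, at the cost of also matching genuinely different strings such as "Page 4" and "Page 5", which is a reason to treat it as an optional refinement.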
AlecTaylor k16:44.06 
  All well, gotta head off to grab a few hours sleep soon16:44.22 
  (3:44AM here)16:44.27 
  Just gotta figure out this file upload problem for a conference I am organising (which is in 7 days)16:44.43 
  Robin_Watts, kens: Porting MuPDF to javascript: possible/impossible?16:47.06 
  (just for a viewer)16:47.13 
kens Re-writing in JavaScript ? Impossible (or rather takes too long)16:47.41
Robin_Watts AlecTaylor: You wouldn't rewrite, you'd write from scratch.16:48.42 
AlecTaylor Would it be a waste of time? - Is it what Google did?16:49.13 
Robin_Watts Someone else has done it.16:49.19 
  It's not what google did.16:49.26 
  Yes, it would be a HUGE waste of time (unless you're interested in doing it to drive javascript development)16:49.47 
AlecTaylor Nope, just interested16:50.43 
AlecTaylor was thinking to port the Qt Poppler viewer stuff to Wt16:51.01 
mvrhel_laptop good morning16:52.57 
Robin_Watts Morning16:53.05 
kens Hi mvrhel_laptop16:53.49 
  Would you mind looking at bug #692680 ?16:54.14 
  Just to read it!16:54.24 
mvrhel_laptop sure16:56.07 
kens Thanks, if I'm gone when Ray arrives, could you ask him to read it too please ?16:56.35 
mvrhel_laptop yes16:56.50 
kens At the moment it looks like Alex's suggestion is the only concrete possibility for a solution, but you and Ray were talking about high level colour so I'd like you to bear this problem in mind ;-)16:57.17 
mvrhel_laptop ok. making my way through the comments16:58.08 
kens Yeah, its a bit complicated.16:58.23 
  Feel free to ask if its not clear16:58.44 
  Cool technology:16:59.18 
  http://www.reghardware.com/2011/11/16/3d_holographic_displays_become_reality/ 16:59.18
mvrhel_laptop wow this eps file sounds terrible16:59.49 
kens I understand why it's doing it, but I don't like it17:00.02
AlecTaylor I like that green flashing laser17:04.30
  makes me hungry for fake 3d fish17:04.37 
kens I want my real 3D telly now please.17:05.00 
AlecTaylor You already have one17:06.00 
kens proper 3D display17:06.09 
  fake ones don't work for me17:06.16 
AlecTaylor You have a 2D TV? - Amazing!17:09.03 
  You're such a square :P17:09.11 
kens display....17:09.22 
  ray_laptop : I'm about to go off, but can you review and think about bug #692680 please.17:12.47 
  Not for a solution, just to consider in terms of future work relating to colour17:13.02 
ray_laptop kens I saw your comment and the gs-bugs email with17:13.11 
kens Thanks Ray17:13.21 
ray_laptop kens: I agree that it looks AWFUL17:13.31 
kens ray_laptop : well on reflection I understand what's done and why, but it's nasty.17:13.50
ray_laptop kens: yes, and they didn't even 'bind' the procset :-(17:14.11 
kens Actually, even if they did it wouldn't help17:14.25 
  Because we actually need to do what Acrobat does. Send the Indexed sample through both colour spaces, to get 6 colour values for each image sample17:14.48 
  Then set up /DeviceN [/Black /Spot1 /Spot2 /None /None /None] /DeviceRGB17:15.21 
ray_laptop oh, yuck!17:15.45 
mvrhel_laptop wow17:15.50 
kens So if the inks are present we ignore the last three values, if the inks aren't present then the tint transform to RGB does {pop pop pop} to remove the spot inks and leave the RGB17:15.54
ray_laptop kens: I understand now.17:16.37 
henrys ray_laptop:it looks like marcos dumped 692674 on you without history investigation - that customer did get poor service with their problem, will you have a chance to look at it soon? Or should we send it back to marcos for history?17:16.51 
ray_laptop henrys: I've started looking at it (on peeves)17:17.12 
  henrys: I'll post something today. I think Marcos just gave it to me today17:17.32 
  henrys: and I agree that they sort of got ignored, but then, they aren't my favorite customer either17:18.04 
AlecTaylor Damn .htaccess file, it's working now17:18.24 
  WOOT17:18.25 
kens Did they get ignored, they opened it one day and started bleating 24 hours later17:18.28 
henrys ray_laptop:I didn't follow it carefully they whined in the bug report.17:19.12 
ray_laptop henrys: these were the guys that had me work on trimming down the initialization time of gs because they were timing 5,000 simple jobs and starting gs each time17:19.16 
henrys oh same folks, sigh17:19.38 
mvrhel_laptop I wonder what the *huge* difference is17:19.44 
ray_laptop henrys: Miles had them pay extra for that.17:19.47 
  I'll post my comments to the bug once I do the timings on peeves.17:20.06 
  (I'll do profile as well)17:20.28 
kens TBH I think the answer is 'there are bug fixes in newer releases, sometimes these cause performance penalties for correct behaviour. You can have fast or good, your choice' ;-)17:20.33
ray_laptop kens: may be -- but first I want to get the facts17:20.52 
kens Yeah, I was thinking of looking at it, but got diverted, and then realised it wasn't pdfwrite17:21.23 
Robin_Watts kens: performance differences between 8.x and 9 is probably color management - but ray_laptop is right that we'd like to be sure of that.17:21.28 
mvrhel_laptop sure. blame it on the color... :)17:21.56 
ray_laptop mvrhel_laptop: if it is the color, I'll assign the bug to you :-)17:22.15 
kens It could be almost anything, including the fact that it uses FreeType, which is what fixed their *original* problem.17:22.28
  Bear in mind that Chris has done some performance fixes in that area since the release of 9.0417:22.54
ray_laptop kens: iirc, FT is slightly faster than the AFS (Artifex Font Scaler)17:23.13 
kens But that's what made me think of 'yes,. its right now, but slower' as an answer17:23.13 
  ray_laptop : There was something that made it slower, Chris fixed it.17:23.27 
ray_laptop kens: don't worry, I will also time HEAD17:23.33 
kens I think it was something about freeing memory17:23.39 
chrisl Yeh, confusion about what Freetype does and when.17:24.30 
kens Anyway, time for me to go, goodnight all17:24.34 
AlecTaylor Alright, well thanks very much Robin_Watts and kens. I will be sure to include the additional relevent information in my research paper, and proceed without reprocessing the PDF file, but just showing the XML as PoC17:24.35 
chrisl ray_laptop: there was a FAPI/Freetype performance regression in 9.04, which I fixed, and actually fixed what I was trying to do when I introduced the problem - in theory, it should be slightly faster than 9.02, but I suspect it won't be measurable17:26.46 
ray_laptop chrisl: we'll see. I'll let you know if the profile shows that it is font related in the HEAD rev.17:27.36 
chrisl ray_laptop: okay, thanks.,17:27.50 
  ray_laptop: and just in case, the -dDisableFAPI command option still works if you want back-to-back FAPI/AFS numbers.17:29.08 
ray_laptop chrisl: thanks for the hint17:30.05 
henrys chrisl:so how can shelly run the clusters but not have an account?17:31.50 
chrisl henrys: I have no idea - does git have its own permissions?17:32.27 
Robin_Watts chrisl: He does have an ssh account.17:32.41 
henrys and he has gs-priv17:33.05 
  he's in the group17:33.11 
  my guess is he's trying to check into http read only.17:33.24 
chrisl Okay, let me talk to him, and see if it's a settings problem......17:33.28 
Robin_Watts If he's using "git push" then he doesn't need to have an account.17:33.31 
  sorry "git cluster".17:33.41 
henrys we want him to be able to commit and I don't see why he can't.17:34.33 
Robin_Watts Me either. Tell him to log in here, and we'll talk him through it.17:36.34 
ray_laptop if he cloned from the http, then his remote.origin may be wrong17:36.45 
chrisl If the worst comes to the worst, he can come down here, and I'll hit his computer with a mallet until it works.17:36.52 
henrys ray_laptop:that's what I suspect17:36.55 
ray_laptop have him check his git config -l17:37.01 
Robin_Watts git remote -v17:37.16 
  chrisl: Is there a reason he doesn't log in here ?17:37.46 
ray_laptop Robin_Watts: yeah, that has the info too :-)17:37.47 
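The check suggested above can be sketched roughly as follows. `remote_kind` and `check_origin` are hypothetical helper names, and the ssh URL in the final comment is a placeholder, not the project's real push address. The assumption, taken from henrys' guess, is that an http (or anonymous git://) origin is read-only while an ssh origin accepts pushes.

```shell
# remote_kind is a hypothetical helper: classify a remote URL by transport.
remote_kind() {
  case "$1" in
    http://*|https://*) echo http ;;
    git://*)            echo git ;;
    ssh://*|*@*:*)      echo ssh ;;
    *)                  echo other ;;
  esac
}

# check_origin reports the current repository's origin URL and its transport.
check_origin() {
  url=$(git config --get remote.origin.url) || { echo "no origin configured"; return 1; }
  echo "origin = $url ($(remote_kind "$url"))"
}

# If origin turns out to be a read-only URL, it can be repointed, e.g.:
#   git remote set-url origin ssh://USER@HOST/PATH/ghostpdl.git   # placeholder URL
```

`git remote -v` and `git config -l` show the same URL; the helper only adds the classification.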
chrisl Robin_Watts: timing, mostly.17:38.16 
ray_laptop with so many people here with strange nicks, how can you tell he's not17:38.27 
Robin_Watts He'd have to be up at VERY odd hours for someone not to be here :)17:38.44 
henrys he's been on before - IRC may not be compatible with his work situation.17:39.10 
ray_laptop there's alway webchat17:39.30 
  s/alway/always/17:39.41 
Robin_Watts henrys: Presumably he's not working for us while *at* work..17:39.47 
ray_laptop Robin_Watts: why not ?17:40.04 
  ;-)17:40.10 
henrys that's what I'd do, why waste my free time ;-)17:40.11 
chrisl Robin_Watts: I'd probably best not comment on that......17:40.39 
AlecTaylor :P17:41.35 
chrisl henrys: I've dropped a mail to Shelly, saying what we think - he'll give me a call if it's not what we think17:47.01 
henrys chrisl:please do invite him to irc next time you talk, I assume I don't need to answer his last email since you are talking to him.17:47.08 
  oops should have said that sooner.17:47.25 
chrisl henrys: I do mention IRC whenever we talk - I'll keep doing so17:48.36 
henrys thanks17:49.01 
chrisl henrys: it would be good if you could reply to him wrt to the billing - seeing us right for the bounty fix that was wrong17:50.48 
henrys chrisl:will do.17:51.37 
chrisl Thanks - that's the proverbial "above my pay grade" ;-)17:52.01 
henrys for those of us that have been kicked upstairs before we break something ;-)17:52.59 
chrisl refuses to comment ;-)17:53.49 
henrys I must warn you guys before the meeting I've cut my hair so try to recognize me. I've had some confusing encounters...17:54.42 
chrisl Wow! Was it weighing you down too much?17:55.11 
Robin_Watts henrys: What would you have got if you'd won the bet?17:56.11 
henrys as you age it gets thinner and you've got to cut it back so it'll get thicker.17:56.45 
  ;-)17:56.49 
Robin_Watts henrys: I'm not sure that works... by that token I'd have *really* thick hair.17:57.26 
  :)17:57.39 
henrys according to 23andme I have the genes for male pattern baldness but it hasn't taken hold yet.17:59.30 
ray_laptop henrys: have you told Miles and Scott ? (it would be fun to see their reaction)17:59.32 
henrys no I haven't told them.17:59.59 
  I've had a ponytail for 15 years, I didn't go down easily ;-)18:01.25 
mvrhel_laptop wow18:02.00 
  I had told my kids about your long hair. Luckily Alden had seen you before otherwise they won't believe me when they see you in Miami18:02.37 
henrys it's still longer than Robin_Watts' ;-)18:03.23 
Robin_Watts mvrhel_laptop: Tell them an alligator got to it.18:04.10 
henrys my kids were shocked, they don't have memory of me with short hair.18:05.12 
mvrhel_laptop You should be faster in your runs now18:05.52 
henrys anyway speaking of Florida it'd be great to have a get together with everybody, I'm glad Miles invited spouses, families and all.18:06.15 
Robin_Watts mvrhel_laptop: You're forgetting the Samson effect.18:06.20 
mvrhel_laptop good point18:06.27 
henrys the stats so far indicate a slowdown but we'll see.18:06.43 
AlecTaylor lolol18:07.21 
henrys chrisl:I notice the jbig2 stuff is going to get complicated if he has sjbig2.c changes - have we looked at git externals yet? This business of having 2 repos for jbig2 is a problem.18:14.14 
chrisl henrys: I thought we wanted to avoid externals18:15.09 
Robin_Watts god yes, we want to avoid externals.18:17.13 
  not even sure it's possible with git.18:17.18 
chrisl git submodule18:17.34 
henrys well I thought git externals were better than svn externals and tor was going to ease us into it. But I haven't read about git externals myself.18:17.39 
ray_laptop my kids were shocked when I showed them pictures of me with a beard18:20.58 
  henrys: did you donate your hair ?18:21.23 
  my beard wasn't long enough to donate ;-)18:21.55 
AlecTaylor needs a haircut, his hair is almost down to his lobes!18:22.01 
AlecTaylor shaved for the first time in 2.5 weeks a few days ago18:22.17 
henrys ray_laptop:yes locks of love18:22.18 
chrisl Robin_Watts, henrys: git submodules pulls in a specific revision of the sub-project - so we'd have the same issue of not being able to commit to it.18:22.28 
ray_laptop henrys: good going !!18:22.30 
henrys chrisl:so what to do about jbig2 continue to maintain it in 2 repos?18:24.19 
Robin_Watts Why can't we make the ghostpdl repo the 'one true repo' for jbig2? We can do releases in turn with the ghostscript releases.18:25.30 
chrisl henrys: if I'm honest, my inclination would be to make the ghostscript jbig2dec the "canonical" repos, and kill the other one18:25.35 
henrys chrisl:the mail I've sent so far suggests an API change requires updates in both repos and regular fixes go to the standalone jbig2, then you'll grab that as you please.18:25.38 
  I tried that and tor didn't like it.18:25.58 
  but I guess he can be outvoted - I didn't know you guys would buy into that also.18:27.00 
tor8 henrys, chrisl: that'll be painful for mupdf thirdparty libraries, but as long as we provide release tar balls of jbig2dec we're not in a worse spot than with zlib18:27.34 
chrisl tor8: where do you get the jbig2dec source for the third party libs?18:28.13 
tor8 at the moment I have a (private, on my machine) git repo which uses submodules to pull in all third party libs18:28.22 
  from the jbig2dec git on casper18:28.34 
  there are many ways to have multiple repos for one project with git18:29.14 
  submodules is just one way18:29.20 
  android uses the 'repo' tool18:29.26 
henrys chrisl:so can I tell shelly to just check into gs?18:29.29 
chrisl henrys: for now, I think so. Let's see if we can hammer out a solution keeping the separate repositories - if we do, we'll update it from the gs code18:30.56 
tor8 henrys: if we want to start using submodules I can set that up, but it means all of us have to run a few more git invocations to keep things in sync18:31.04 
  chrisl: I wonder if git cherry-pick will work for that, though I think the different paths will pose a problem18:31.57 
chrisl tor8: could we use some triggers so that commits into the gs/jbig2dec get mirrored to the jbig2dec repos and vice versa?18:32.22 
henrys tor8:maybe something for the meeting - I wouldn't expect a warm reception to new git training.18:32.38 
tor8 chrisl: I think it'd be relatively easy to make a script pull and update the jbig2dec sources in gs if we use the external jbig2dec git as a master18:32.56 
  henrys: yeah. conceptually git submodules work great, but they aren't trivial to use :(18:33.27 
  you need to run a few extra commands to keep the subrepositories updated18:33.47 
chrisl Yeh, it was the other way (gs->jbig2dec) I was wondering about18:34.01 
Robin_Watts tor8: Why not have a script that automatically recommits any changes to the ghostscript version into the other repo ?18:34.01 
ray_laptop since jbig2dec doesn't get updated frequently, it might work18:34.14 
henrys tor8:the immediate issue is shelly has a fix that requires a change to gs (sjbig2.c) and jbig2 - somehow this must be synchronized.18:34.21 
ray_laptop henrys: but jbig2 _is_ in the gs repos18:34.56 
henrys ray_laptop:it is in 2 places.18:35.16 
ray_laptop is the problem the shared lib staying in sync ?18:35.39 
tor8 henrys: that's not a problem if we use git submodules (since we'd update the submodule version in the same commit as sjbig2.c)18:35.44 
ray_laptop votes for disallowing shared lib support in gs builds ;-)18:36.59 
tor8 henrys: one approach is to make the jbig2dec.git auto-generated by filtering out all non-jbig2dec related stuff from the main repo18:37.08 
tor8 agrees with ray! I hate shared libraries.18:37.24 
henrys fair enough - the simplest thing is to make gs canonical and do jbig2 tarballs but that solution seems to have slipped away.18:37.25 
Robin_Watts What tor8 just said.18:37.36 
tor8 henrys: let's start with that, and I'll work on a script to create a read-only jbig2dec subset git18:37.52 
Robin_Watts (about auto-generating from the main repo).18:37.54 
henrys tor8:I'm sure chrisl will appreciate that.18:38.13 
tor8 it's very similar to what I did when converting the svn to git repos18:38.16 
  henrys: put it on the tech agenda so you can remind me later18:39.11 
chrisl tor8: if it's doable that'd be great, but if it's a problem I don't mind adding updating of the jbig2dec repos to the Ghostscript release process - we just need to make sure people don't generally commit to it.18:39.55 
tor8 chrisl: we can add hooks to disable commits, or just change the file permissions.18:40.30 
henrys tor8:will do about the agenda for now shelly will commit to gs - and any bug that affects mupdf you'll get one way or another.18:40.43 
tor8 the issue is filtering the whole gs history to create a jbig2dec repo is rather cpu expensive, so we probably don't want to do that on every commit18:41.05 
chrisl henrys: is this a jbig2dec API change, or just a change in how GS uses it?18:41.35 
  tor8: once a week is probably (more than!) enough18:41.59 
tor8 chrisl: indeed!18:42.28 
chrisl And if even that uses too much CPU time - once a month......18:43.19 
Robin_Watts tor8: Once we've generated it, can we not do incremental updates?18:43.38 
  i.e. use a hook on the golden repo on casper so that whenever we commit to ghostpdl in the jbig2dec dir, it recommits to the jbig2dec standalone ?18:44.23 
tor8 Robin_Watts: I'll have to look into it, but I think it should be possible.18:44.32 
Robin_Watts That way we get it instantly in sync, for low cost.18:44.39 
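One way the hook Robin_Watts proposes might look, as a rough sketch only. Git does pass `<old> <new> <ref>` lines to a `post-receive` hook on standard input; everything else here (the helper names, the message, and the omitted re-commit step) is hypothetical and not taken from the log.

```shell
#!/bin/sh
# touches_jbig2dec is a hypothetical helper: it succeeds if any path read
# from stdin lies under gs/jbig2dec/.
touches_jbig2dec() {
  grep -q '^gs/jbig2dec/'
}

# Sketch of a post-receive hook body. The actual re-commit into the
# standalone jbig2dec repository is left out; this only detects the need.
# Pushes that create or delete a ref (all-zero old or new id) are not handled.
hook_main() {
  while read -r old new ref; do
    if git diff --name-only "$old" "$new" | touches_jbig2dec; then
      echo "jbig2dec changed on $ref; resync needed"
    fi
  done
}

# In a real hook the script would end with:  hook_main
```

Because the check only looks at the paths changed by each push, it stays cheap compared with refiltering the whole history.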
tor8 git filter-branch is what I was thinking of using18:44.51 
  git filter-branch --subdirectory-filter gs/jbig2dec --prune-empty ...revs...18:45.25 
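A hedged sketch of how the `git filter-branch` invocation above could be wrapped to produce a standalone subset repository. The helper name `make_subset` and its arguments are hypothetical, and this is not tor8's script; it assumes the subset is built in a throwaway clone because filter-branch rewrites history in place.

```shell
# make_subset is a hypothetical helper.
# usage: make_subset SRC_REPO DEST_DIR [SUBDIR]
make_subset() {
  src=$1; dest=$2; subdir=${3:-gs/jbig2dec}
  # Work on a scratch clone so the source repository is never rewritten.
  git clone -q --no-hardlinks "$src" "$dest" || return 1
  (
    cd "$dest" || exit 1
    # Newer Git releases print a warning and pause before filter-branch
    # runs; this variable suppresses that and is ignored by older releases.
    FILTER_BRANCH_SQUELCH_WARNING=1 \
      git filter-branch --prune-empty --subdirectory-filter "$subdir" HEAD
  )
}

# Example (paths are placeholders):
#   make_subset /path/to/ghostpdl /tmp/jbig2dec-subset
```

After it runs, the contents of the subdirectory sit at the root of the new clone and only the commits that touched it remain on the rewritten branch.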
henrys chrisl:now I can't find the patch, hang on.18:45.51 
chrisl henrys: it's really for tor8's benefit - if the jbig2dec API changes, mupdf will need to be aware of it, too.18:46.44 
falko_ hi is it possible to create a rpt port to ghostscript and if i print i print to a printer, create a psfile and after that start an executable (that moves the ps file around) ?18:48.18 
  at the moment i have a script that checks every 3 sec for a new file ,.. does the printing to the printer and the moving around .. but that is not a very nice solution18:49.32 
henrys chrisl:apparently the sjbig2.c is referenced in his email as being in the patch but it isn't there?18:49.49 
chrisl henrys: he did ask how we wanted it - I may have misinterpreted what he said.18:51.02 
henrys figure it out later if he commits it's easy enough to back out.18:55.30 
ray_laptop mvrhel_laptop: the color stuff is DEFINITELY in the top of the profile for the performance (904 vs 871) :-(18:58.34 
mvrhel_laptop :(18:59.00 
  ray_laptop: if you want to dump this on me, that is fine18:59.19 
ray_laptop mvrhel_laptop: cmsSample3DGrid is particularly heavy.18:59.34 
  mvrhel_laptop: I think the issue is that 871 runs with 'simple' color and 90x effectively always runs with -dUseCIEColor19:00.12 
chrisl henrys: Okay, getting Shelly able to commit should resolve the API question from a GS point of view, and I'll make sure to tell him to inform tor8 if the API really has changed.19:00.56 
mvrhel_laptop ray_laptop: yes. I guess what I need to do is to get the "dumb" CMM in place19:01.18 
  for those who don't want color management19:01.28 
ray_laptop mvrhel_laptop: we'd need a 'dumb' cms that totally ignored the ICC profiles 19:01.33 
mvrhel_laptop hehe19:01.38 
  that is on my todo list. and I did start it19:02.04 
henrys what is the effect of just using the dump profiles for now?19:02.17 
ray_laptop mvrhel_laptop: but the complication is that we can only do that for the 'default' colorspaces -- If Lab or ICC comes in, we still need to handle it19:02.30 
mvrhel_laptop henrys: they will still use lcms19:02.52 
  ray_laptop: yes19:03.00 
  I need to think a bit about this19:03.07 
henrys mvrhel_laptop:it may be unique to pcl but many jobs run significantly faster with the dump profiles.19:03.29 
mvrhel_laptop interesting.19:03.39 
  I do have an idea though how to do this19:03.46 
  basically we have an interface between the graphics library and the CMM and we know when the profile was a substituted one for a defaultRGB etc19:04.24 
  this is needed for PDFwrite19:04.29 
ray_laptop mvrhel_laptop: if the "quick and dirty" cmm recognized the input ICC profiles enough to recognize that it is RGB, Gray or CMYK then the 'link' profile it returns can be a suitable dummy that the 'quick and dirty' color conversion code can recognize and shortcut19:04.36 
henrys I think there are many cases in the code where you don't get "pure" colors with the fancy profiles.19:04.44 
mvrhel_laptop yes exactly19:04.45 
  let me look at adding that as an option 19:05.07 
ray_laptop mvrhel_laptop: OK. But let's decide when that fits with your other priorities19:05.36 
Robin_Watts I'm about to risk breaking the cluster.19:07.22 
  Anyone have any jobs they want to get in desperately?19:07.32 
ray_laptop Robin_Watts: not me19:07.39 
mvrhel_laptop well I am doing a run right now19:07.49 
  oh it just finished19:07.55 
Robin_Watts This will only affect new jobs.19:08.01 
mvrhel_laptop I am done. Going to do a commit now19:08.23 
Robin_Watts ok, I've swapped to a modified run.pl that should at least get the build right on windows cluster machines.19:08.58 
mvrhel_laptop actually I may run one more real quick19:09.03 
Robin_Watts Go for it.19:09.09 
  In theory with no windows cluster machines you should see no differences.19:09.19 
mvrhel_laptop ok19:09.27 
henrys Robin_Watts:oh are you getting the windows node going?19:10.20 
Robin_Watts henrys: trying to.19:10.29 
tor8 chrisl, kens: more ammo if you don't like the _t suffix ... it's reserved by POSIX so we shouldn't be using it in the first place!19:17.38 
ray_laptop you mean like tin64_t ?19:19.23 
tor8 yeah19:19.36 
ray_laptop int64_t that is :-/19:19.39 
tor8 posix makes the claim on *_t in ANY header19:20.11 
ray_laptop that seems a bit excessive19:20.47 
  what happened to C standard 19:21.07 
tor8 yeah, but _t is an ugly wart so I don't complain :)19:21.15 
  the C standard, it got let out in the wild... :/19:21.45 
  looking at the list of name spaces that posix lays claim on, it's pretty excessive even without *_t!19:22.42 
Robin_Watts ok, we have a windows cluster node... and it's pinging the cluster, let's try doing a build.19:22.56 
  My cygwin installation appears to be having problems. Gah.19:25.38 
ray_laptop btw, the HEAD rev is virtually the same on this test as 904, so FT isn't bollixing anything up19:25.42 
henrys bbiab19:29.07 
mvrhel_laptop bbiaw19:45.50 
Robin_Watts mvrhel_laptop: I've taken my node offline, so your job should run without a problem.19:48.35 
mvrhel Robin_Watts: you around?20:51.25 
Robin_Watts I am.20:51.31 
mvrhel so I was just testing 692512 to see if we could close this20:51.55 
  and I see that we get horizontal lines at 600dpi20:52.09 
  I thought we had this fixed20:52.18 
Robin_Watts so did I :(20:52.46 
mvrhel the bug was originally for vertical lines20:52.50 
  I have a single page version of this file if you want to take a look at it20:53.06 
Robin_Watts Assign it to me, and I'll have a look when I uncluster myself.20:53.09 
mvrhel ok.20:53.15 
  bbiaw20:55.36 
Robin_Watts Damn. Looks like I really am going to need to check out another copy of tests/tests_private.21:08.01 
  dinnertime.21:08.21 
mvrhel_laptop stormy here. we may lose power23:43.32 