Ghostscript IRC logs

Log of #ghostscript at irc.freenode.net.

2011/11/16
Robin_Watts	henrys: You're welcome. Glad to hear you think it'll fix it.	01:24.51
arthurf	tor8: Hi. Tried to do some quick searching with the iOS app on a PDF file, but not much happened. I'm guessing there is more work to do?	02:52.24
tor8	arthurf: yes. it still doesn't show any results :)	02:52.50
arthurf	tor8: Okay thanks. Just wanted to check that I wasn't completely missing something. Thanks. :)	02:53.22
robingower	I'm having a bit of a nightmare compiling ghostscript on a mac	02:54.44
	I filed this ticket at macports months ago	02:54.54
	https://trac.macports.org/ticket/28825	02:54.55
	I can't get it to build outside of macports either	02:55.18
	any ideas of what else I could do to diagnose my problem?	02:55.49
tor8	robingower: that's odd, many of us (developers) use macs as development machines when working on ghostscript	02:56.02
robingower	indeed. I've tried the usual suspects like fiddling with arch settings and I've removed fink and macports etc	02:57.41
tor8	my normal procedure is: git clone, ./autogen.sh, make	02:59.30
	I don't use macports or anything like that, just a plain mac install with Xcode	03:00.10
robingower	'k thanks tor8 I'll give that a try	03:08.37
	I've cloned from git://git.ghostscript.com/ghostpdl.git and done autogen & make	03:23.01
	but I still get	03:23.04
	../gs/base/gstype42.c:68: error: conflicting types for ‘gs_type42_read_data’	03:23.09
	../gs/base/gxfont42.h:141: error: previous declaration of ‘gs_type42_read_data’ was here	03:23.14
	I've checked the offending lines (dozens of times!) and they look the same to me	03:23.36
	i.e. same parameter types etc	03:23.49
AlecTaylor	hi	04:53.39
ghostbot	hola	04:53.39
AlecTaylor	Any MuPDF developers around?	04:53.48
	How would I go about reverse-engineering an XML file back into the PDF?	05:06.41
alexcher	AlecTaylor: XML describes a general structured document, e.g. an IP datagram.	05:15.12
*AlecTaylor*	just posted on the mailing-list	05:15.30
alexcher	AlecTaylor: PDF is a page description language.	05:15.46
	AlecTaylor: How one can be converted to another?	05:16.48
AlecTaylor	alexcher: I was able to output an XML document using MuPDF's pdfextract tool. I have extended the output to detect header/footers and page numbers. Now I want to push the proper page numbers and additional per-page logical structure information back into the PDF	05:17.18
alexcher	AlecTaylor: MuPDF developers will be here in a few hours.	05:20.18
AlecTaylor	kk	05:20.22
	BTW: Did I post to the right mailing-list?	05:20.32
	http://ghostscript.com/pipermail/gs-devel/2011-November/009091.html	05:23.14
alexcher	AlecTaylor: yes	05:23.56
AlecTaylor	Great :)	05:24.07
*AlecTaylor*	has been writing these extensions for the poppler libraries, but they don't seem to have anything for moving from XML->PDF	05:24.29
kens	Morning marcosw_	08:46.36
AlecTaylor	How can I reverse-engineer an XML file generated with MuPDF's pdfextract tool back into the PDF?	09:01.06
kens	Write a tool to do it ?	09:01.37
	Open it with an XML reader and print to PDF ?	09:01.53
AlecTaylor	kens: Back into the PDF it was created with	09:04.01
kens	You cannot recover the same PDF (as requested somewhere else); it's been interpreted and information has been lost.	09:04.44
	You can create a similar, visually more or less identical PDF, but it's not the same PDF.	09:05.07
	And we don't (as far as I know) provide a means to put it back. Why would you bother? You have the original.	09:05.37
	Actually, I guess you could do so, but I still don't think there's a tool to do it. I still don't see why you would want to either, you have the original file	09:07.22
AlecTaylor	kens: I am adding logical structure information and proper page numbers into the PDF. No actual text will be changed, just some internal information	09:35.45
*AlecTaylor*	has developed a solution using just the stdlib and the header-only RapidXML library	09:36.15
kens	Ah, editing a non-editable format :-)	09:40.05
AlecTaylor	kens: eh?	09:40.48
kens	Well PDF files aren't really meant to be edited, even for metadata.	09:43.07
AlecTaylor	kens: True, but there are billions of PDFs without proper metadata. I would like to fix them all up!	09:50.41
*AlecTaylor*	is sure there's a way	09:50.55
kens	Creating a PDF file from the XML output of pdfextract is possible, but you need to regenerate the xref. So you need an application to do it, and we haven't written one.	09:52.50
	You could use such a thing to create a Linearized PDF as well mind you	09:53.07
	But that might require more analysis	09:53.25
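A minimal sketch of what "regenerate the xref" involves, assuming a classic (non-stream) cross-reference table and a tiny hand-made object list. The `build_pdf` helper and the three objects are hypothetical illustrations, not part of MuPDF or Ghostscript.

```python
def build_pdf(objects):
    """Serialise object bodies (bytes) as objects 1..N and emit a matching xref."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 heads the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # every entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out), offsets


# Hypothetical minimal document: catalog, page tree, one empty page.
objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
]
pdf, offsets = build_pdf(objects)
print(offsets)
```

Any edit that changes the size of an object shifts every later offset, which is why the table has to be rebuilt after the objects are written rather than patched by hand.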
AlecTaylor	Hmm	09:56.49
	kens: Would there be another open-source project which provides the XML->PDF feature?	09:57.48
kens	I doubt it	10:09.54
AlecTaylor	:(	10:10.28
kens	But you know, you can be the first to write one :-)	10:10.43
AlecTaylor	yeah, but been working the last few months on this project, want it completed already!	10:11.02
*kens*	will be back later	10:25.27
Robin_Watts	marcosw_: ping?	12:58.50
	Hmm. The cluster code is going to need some changes to cope with building on a windows system.	13:49.03
kens	I would expect so	13:49.12
Robin_Watts	Presumably we want to call the projects rather than the makefiles.	13:49.15
	(otherwise we'd be testing a cygwin build rather than an MSVC one, and defeating the point)	13:49.37
kens	Yes, not good	13:49.51
	Unless you explicitly call nmake and force use of VC instead of gcc ?	13:50.41
Robin_Watts	I think an explicit call to nmake is the way to go.	13:50.59
	Or, probably better, a command line call to msdev, because that checks the projects.	13:51.28
kens	Will it use the right compiler though ?	13:51.31
	OK sounds good	13:51.40
Robin_Watts	Normally, you have to make something run once a minute on a crontab... that's going to need tweaking too.	13:52.06
	I am NOT having 2 copies of tests_private checked out on my machine. Time for some symlinks :)	13:53.02
kens	can't use task scheduling ?	13:53.11
Robin_Watts	I'd rather run something to start the cluster, and kill it to stop it.	13:54.20
	(I've just spent 250 quid on a graphics card - no point in that if I'm going to have the machine drop to a crawl when someone commits :) )	13:55.09
	(this is not an Artifex owned machine :) )	13:57.00
kens	So buy another one for Artifex.	13:57.13
Robin_Watts	If we're going to invest in a dedicated windows node, then putting somewhere with faster network access might make sense.	13:58.09
kens	Chris's office ;-)	13:58.26
chrisl	as long as it's very quiet!	13:58.55
kens	Or perhaps Alex's basement	13:58.56
	being slightly more serious for a minute	13:59.15
Robin_Watts	My new graphics card is slightly more noisy than my last one :(	13:59.31
	Helen's new passport has just been delivered. I guess she'll be coming to Miami after all...	14:12.33
kens	Glad to hear it :-)	14:12.55
tkamppeter_	kens, can you have a look at bug 692687? If there is a simple fix for it I would like to issue this fix as a patch release for Oneiric, too.	14:20.43
kens	OK	14:21.56
	I would always advise against 'round tripping' files though	14:23.20
tkamppeter	kens, the problem here is that we are in a transition between PostScript-centric and PDF-centric CUPS filtering. In Debian and Ubuntu we have switched to PDF-centric filtering, but there are still some few applications which send print jobs in PS and some few printer drivers which require PS input.	14:27.04
kens	Yeah, just saying	14:27.37
	Each conversion potentially throws stuff away	14:28.39
AlecTaylor	bn	14:32.29
tor8	Robin_Watts: looking through your patches now. they look good to me, but I must have something to complain about! I really do prefer: while (span) { ... span = next; } to the do { ... span = next; } while (span != NULL) form for loops like in the "stack overflow in text handling" commit	14:36.43
kens	tkamppeter it seems to be something to do with multiple subset TrueType fonts. It looks like a lot of glyphs are being replaced with .notdef	14:40.48
	It's going to take a while just to reduce the original PostScript file to something simple enough to investigate. I'm rather busy with a different problem right now, how urgent is this ?	14:41.40
daviddoria	In the help of a latex package it says I can run a line like this to convert all of the pages to separate png files: gs -sDEVICE=png16m -dTextAlphaBits=4 -r300 -dGraphicsAlphaBits=4 -dSAFER -q -dNOPAUSE -sOutputFile=equation%d.png Equations.pdf - however, when I do that, i just get GS> prompt?	14:43.23
kens	Any error message ?	14:46.00
	Did you properly specify the input file ?	14:46.07
	Oh, and you need to add -dBATCH to exit afterwards or type 'quit'	14:46.32
daviddoria	kens, oh haha it actually worked... I am just surprised it didn't exit automatically but rather took me to a GS> prompt?	14:46.39
kens	See comment above	14:46.50
daviddoria	yep, that did the trick, thanks	14:47.02
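A small sketch of the fix discussed above: the same switches daviddoria quoted, with -dBATCH added so Ghostscript exits after the last page instead of waiting at the GS> prompt. The `gs_png_args` helper is a hypothetical wrapper; only the switches themselves come from the conversation.

```python
def gs_png_args(pdf_path, out_pattern="equation%d.png", resolution=300):
    """Build a Ghostscript argument list that renders each page to a PNG."""
    return [
        "gs",
        "-sDEVICE=png16m",
        "-dTextAlphaBits=4",
        "-dGraphicsAlphaBits=4",
        f"-r{resolution}",
        "-dSAFER",
        "-q",
        "-dNOPAUSE",
        "-dBATCH",  # exit when the input is done instead of dropping to GS>
        f"-sOutputFile={out_pattern}",
        pdf_path,
    ]


args = gs_png_args("Equations.pdf")
print(" ".join(args))
```

The list can be handed to `subprocess.run(args, check=True)` on a machine where `gs` is installed.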
kens	no problem	14:47.10
	tkamppeter did you see my earlier question ?	14:54.35
tkamppeter	kens, it is not overly urgent, but if it is done in a week it would be great.	15:09.36
Robin_Watts	tor8: I did it that way, because it exactly matched what was there before.	15:12.12
kens	tkamppeter, well maybe, I can't be certain	15:12.29
Robin_Watts	But, yes, it would be nicer to have used while () { ... } as that would have protected against span being NULL on entry.	15:12.39
kens	It'll take me several hours just to reduce the problem until I can work on it	15:12.42
Robin_Watts	tor8: I'll fix that if you want.	15:13.06
kens	The file contains 4 fonts, 3 of them are composite and contain up to 15 sub-fonts	15:13.09
tor8	Robin_Watts: right. in other places where I've used an iteration to free a linked list instead of the lazy recursion I use while (foo) {}	15:13.09
kens	tkamppeter, also this file started life as a PDF file.	15:13.34
	So it's a PDF->PS->PDF->PS	15:13.44
tor8	Robin_Watts: also get rid of the != NULL explicitness, we don't do that in mupdf ;)	15:14.04
Robin_Watts	but we should :)	15:14.24
tor8	then we may as well rewrite it all in pascal...	15:14.41
	do you want to amend your commit or make another one?	15:15.07
	(ie, should I push to master or wait)	15:15.16
Robin_Watts	If you haven't merged yet, I'll fix mine.	15:15.28
	but if you have, push what you've got and I'll do new ones.	15:15.43
	I can't do this immediately - in the middle of fighting the cluster.	15:15.52
tor8	I haven't merged anything yet	15:15.53
Robin_Watts	by "merged" I mean "pulled my changes in", sorry.	15:16.12
tor8	I pulled them in but I haven't done anything other than looked at them	15:16.29
Robin_Watts	ok, I'll redo them. Will be a nicer history that way.	15:16.40
tor8	so if you want to amend, that's no trouble for me	15:16.43
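The loop-shape point tor8 and Robin_Watts settle on can be illustrated with a Python analogue. This is a sketch only: the `Span` class and both functions are hypothetical stand-ins, not the MuPDF code, and the post-tested variant emulates C's do { ... } while (span) since Python has no such statement.

```python
class Span:
    """Hypothetical stand-in for a linked-list node."""

    def __init__(self, name, next_span=None):
        self.name = name
        self.next = next_span


def free_spans_while(span):
    """Pre-tested loop: safe when span is None on entry."""
    freed = []
    while span:
        nxt = span.next  # save the link before letting go of the node
        freed.append(span.name)
        span = nxt
    return freed


def free_spans_do_while(span):
    """Post-tested loop: the body runs once before any test is made."""
    freed = []
    while True:
        nxt = span.next  # fails if span is None on entry
        freed.append(span.name)
        span = nxt
        if span is None:
            break
    return freed


chain = Span("a", Span("b", Span("c")))
print(free_spans_while(chain))  # ['a', 'b', 'c']
print(free_spans_while(None))   # []
```

Both forms walk a non-empty list identically; only the pre-tested one tolerates an empty list.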
tkamppeter	kens, the problem here is also that okular sends print job in PS format and it should send PDF, see https://bugs.launchpad.net/ubuntu/+source/okular/+bug/891199.	15:44.43
AlecTaylor	Reverse XML containing amended page number and new logical structure information (partition header/footer from other page content) using MuPDF? - If so, how?	15:45.03
Robin_Watts	AlecTaylor: You're starting from the wrong place.	15:46.26
	When you convert from pdf to xml you are getting out a certain subset of the information from within the PDF.	15:46.58
kens	tkamppeter it just increases the complexity, and the number of conversions. Makes it harder to sort out the real problem. It's going to take me quite a while just to simplify the problem. A file with 1 font instead of 4 and no graphics, would help.	15:47.03
Robin_Watts	If you discard the PDF you are then losing all the data.	15:47.13
AlecTaylor	Robin_Watts: I needed XML so that I could have per page in "normal" text	15:47.21
Robin_Watts	Far smarter to get the xml out, process it to get what you want to add, then do a process to 'add' your new information back into the original pdf.	15:47.43
AlecTaylor	Robin_Watts: Also, can't I just modify the PDF that the XML was generated from	15:47.49
	yeah, that	15:47.59
	How can I do that?	15:48.09
Robin_Watts	AlecTaylor: Someone suggested on the mailing list that you add it into the end of the contents streams.	15:48.30
	That's possible, but a bad idea, IMHO.	15:48.44
kens	Which will break the stream length (need to recalculate it) and the xref.	15:48.50
Robin_Watts	A better idea would be to add your new content as annotations.	15:48.56
AlecTaylor	So what should I do?	15:48.57
Robin_Watts	kens: Yeah, but easy enough to do with a modified pdfclean.	15:49.08
AlecTaylor	Robin_Watts: As annotations?	15:49.12
Robin_Watts	Echo...	15:49.26
	PDF provides the facility to add annotations to a document.	15:49.54
	Read the PDF spec for more details.	15:50.19
	I suspect you'd end up taking pdfclean (which reads a pdf in, 'does stuff' to it, and then writes it out) and tweaking it.	15:51.01
	You want to read a pdf in, add annotations to it, and then write it out.	15:51.13
kens	Could use pdfwrite and pdfmarks	15:51.43
Robin_Watts	A 'free text' annotation seems like the one you want.	15:52.51
AlecTaylor	Hmm, I don't think annotation is what I am after. The displayed page number I'd like changed to whatever I have heuristically determined.	15:54.20
Robin_Watts	You use an /AP entry to point at a new xobject that you define, and that xobject can have a stream of pdf operators in it.	15:54.21
kens	What do you mean by 'displayed page number' ?	15:54.43
	The rendered value, or the ordinal displayed by Acrobat ?	15:54.57
Robin_Watts	AlecTaylor: AIUI, you want to add specified content to every page (different on each page), right?	15:55.33
*kens*	thinks not	15:55.41
AlecTaylor	kens: The number displayed on each page in the PDF viewer	15:56.10
kens	So like I thought, not the page content stream	15:56.42
AlecTaylor	The other feature I want to add is a per page logical structure, which specifies what's page content, and what header/footer	15:56.49
kens	I think Acrobat gets that from the Outlines tree if present	15:56.53
	So you need to add an Outlines	15:57.08
Robin_Watts	(or amend an existing one)	15:57.19
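For reference, the PDF specification also defines a /PageLabels number tree in the document catalog, which describes the label a viewer shows for each page. A minimal sketch of building such an entry, assuming simple ranges with no prefix; the `page_labels` helper and the sample ranges are hypothetical.

```python
def page_labels(ranges):
    """Build a /PageLabels value from (first_page_index, style, start) tuples.

    style is a PDF numbering-style name such as 'D' (decimal) or
    'r' (lowercase roman); page indices are zero-based.
    """
    parts = []
    for index, style, start in sorted(ranges):
        entry = f"{index} << /S /{style}"
        if start != 1:
            entry += f" /St {start}"  # /St defaults to 1, so omit it then
        parts.append(entry + " >>")
    return "<< /Nums [ " + " ".join(parts) + " ] >>"


# Four front-matter pages labelled i..iv, then decimal labels starting at 1.
labels = page_labels([(0, "r", 1), (4, "D", 1)])
print(labels)
```

The resulting dictionary would be stored under /PageLabels in the catalog by whatever tool rewrites the file.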
*AlecTaylor*	has written and prototyped algorithms that do this, and is outputting to an XML file. Now wants to push that to a PDF	15:57.19
	yeah	15:57.32
kens	As for your other info, what PDF structure do you propose to use to hold that information, marked content ?	15:57.44
Robin_Watts	Right, so you want some code that takes the pdf and the xml and outputs a new pdf.	15:58.05
AlecTaylor	Yeah	15:58.14
Robin_Watts	AIUI, the complete scheme for what you have does:	15:58.18
	original.pdf -> temp.xml (using mupdf)	15:58.33
	temp.xml -> temp2.xml (using your processing code)	15:58.51
AlecTaylor	pdfextract original.pdf; someothertool original.xml original.pdf	15:59.08
Robin_Watts	Then you want to go: temp2.xml + original.pdf -> final.pdf (using the bit of code we are talking about now)	15:59.18
AlecTaylor	Now original.pdf has new information from original.xml	15:59.23
	yeah	15:59.33
Robin_Watts	OK. So you want something like a modified pdfclean. You can strip that down so it just opens and reoutputs a pdf (and lose all the garbage collection, page extraction code etc)	16:00.45
	Then you can start to build it up so that it reads your xml and adds extra objects to the PDF before it reoutputs it.	16:01.14
	But you have to realise that pdfextract goes to some lengths to present you with a 'coherent' version of content.	16:02.09
	Just because you see 2 columns of content in the xml output, doesn't mean it appears like that within the PDF source.	16:02.57
	The pdf source could have bits of column 0 and bits of column 1 interleaved with one another.	16:03.14
	hence 'tagging content' at that level is likely to be hard.	16:03.45
	I still don't entirely understand what your aim is here.	16:04.00
	I thought from what you had said before that it was to add headers/footers to pages.	16:04.44
	If you're now saying you want to 'reflow' page content that's a hugely different thing.	16:05.01
kens	AIUI the aim is to add metadata to PDF files with things like the 'tagged' (marked data) concept to delineate header/footer/page number etc. And also to add Outlines with the 'correct' page info (content, index etc)	16:05.03
Robin_Watts	Using marked data will require (I reckon) a modified pdf interpreter; it'd need to run the PDF, and catch the marking of the page; as it marks into the appropriate regions (identified by the previous xml processing), then it would need to rewrite the streams to include marked content markers.	16:08.12
	That's a huge job.	16:08.18
	But I'd like to hear a clear description of what's required from AlecTaylor before we go any further, cos I could be barking up the wrong tree.	16:09.01
	Is the idea to 'mark' the headers/footers/stories (say by adding visual boxes to the page)?	16:09.43
AlecTaylor	Hmm	16:10.35
	"<kens> AIUI the aim is to add metadata to PDF files with things like the 'tagged' (marked data) concept to delineate header/footer/page number etc. And also to add Outlines with the 'correct' page info (content, index etc)"	16:10.46
	That's what I am trying to do	16:10.53
	as well as "repair" the page numbers displayed by the PDF viewer to be the same as the page number "printed" on the page	16:11.25
Robin_Watts	So what's the purpose of adding metadata? How will my viewing experience differ by me looking at a processed file rather than the original one?	16:12.03
kens	That depends (I think) on the Outlines. If there are no Outlines Acrobat displays the ordinal page number (1/50 and so on)	16:12.11
	Robin_Watts : If you use (eg) a PDF to speech package, it won't read page numbers if they are identified as such, for example	16:12.59
	tagged PDF is mandated by a number of government agencies for this reason	16:13.14
	(accessibility)	16:13.20
Robin_Watts	OK. That's what I was after. Is that the reason, AlecTaylor ?	16:13.35
kens	Because they don't really understand what the creation implies	16:13.48
	Also header/footer may be omitted etc.	16:14.16
AlecTaylor	Yeah, for accessibility and searching is my purpose for specifying what on the page is the header/footer and what is the other stuff.	16:14.45
kens	Searching already works, or if it doesn't you won't be able to fix it	16:15.05
Robin_Watts	AlecTaylor: Right. So annotations are not the way to go.	16:15.12
	You do need to rewrite the content streams.	16:15.28
kens	I see it as /Outlines plus marked content (tagged PDF)	16:15.39
	I think you can insert marked content inside text blocks, but I'm not certain	16:16.03
	If you can't then that's going to be really hairy	16:16.12
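A sketch of the easy half of the job, assuming the header text has already been located and sits in its own text object. It brackets a whole BT ... ET block with a marked-content sequence and recomputes the stream /Length that kens mentions earlier. The helper names and the sample content stream are hypothetical; the Artifact property list follows the PDF specification's artifact entries.

```python
def wrap_as_header(fragment):
    """Bracket content-stream operators with an Artifact marked-content sequence."""
    return (
        b"/Artifact << /Type /Pagination /Subtype /Header >> BDC\n"
        + fragment
        + b"\nEMC"
    )


def stream_object(data):
    """Emit a stream whose /Length matches the (possibly rewritten) data."""
    return b"<< /Length %d >>\nstream\n" % len(data) + data + b"\nendstream"


# Hypothetical page header drawn by a single text object.
original = b"BT /F1 10 Tf 72 760 Td (Chapter 1) Tj ET"
tagged = wrap_as_header(original)
print(stream_object(tagged).decode("ascii"))
```

Wrapping the complete text object sidesteps the question of nesting inside BT ... ET. Finding which operators actually paint the header is the hard part the rest of the conversation describes.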
Robin_Watts	And pdfclean is a reasonable starting point (it reads in/writes out, and will enable you to add Outlines).	16:16.18
	But it's far from easy.	16:16.39
AlecTaylor	Hmm	16:16.56
Robin_Watts	Because you're going to have to interpret the PDF in order to know where each text rendering operation puts its text (and hence in what section it goes).	16:17.24
AlecTaylor	Well on the bright side, at least IT'S POSSIBLE in this library (rather than poppler, which I've been working off)	16:17.29
Robin_Watts	You could easily have a text object that wrote to both header, footer, and multiple stories within it.	16:18.02
	AlecTaylor: Anything is possible with any library depending on how much you're prepared to rewrite :/	16:18.47
AlecTaylor	Robin_Watts: I envision a find/replace function, search for this text "aaa 44" (whatever's in the XML tag), add information around that text tagging it up	16:18.47
Robin_Watts	AlecTaylor: Sadly, no.	16:18.58
AlecTaylor	:\	16:19.07
kens	That text may not appear in the PDF	16:19.18
Robin_Watts	You might have got information in your extracted XML telling you that the word "hello" appeared on the page.	16:19.45
	and you might want to tag that as being part of the header.	16:19.52
	Within the pdf, you might have "hello", or you might have had "h", "e", "l", "l", "o"	16:20.24
	or "he" "ll" "o".	16:20.30
AlecTaylor	Ahh, because it isn't shown as a big line of text	16:20.34
Robin_Watts	or "o" "l" "e" "h" "l"	16:20.45
kens	Or even random text between, but positioned separately	16:20.47
AlecTaylor	wait	16:20.50
	what	16:20.52
	That last one, I don't get it	16:20.57
kens	Font encoding	16:21.03
	What you see in the XML is the Unicode from the ToUnicode CMap	16:21.16
Robin_Watts	kens: I wasn't even talking about Font encoding yet :)	16:21.18
kens	Or a guess based on other heuristics	16:21.25
Robin_Watts	AlecTaylor: Letters can be sent to the page in any order.	16:21.37
kens	Yep.	16:21.46
Robin_Watts	PDF is basically a stream of 'page marking' operations.	16:21.51
sebras	Robin_Watts: mmm, or what characters are actually part of the font embedded in the pdf...	16:21.53
kens	And often are in CJKV or right to left languages	16:21.58
AlecTaylor	:S	16:22.30
Robin_Watts	As such any marks that add up to give the correct final result are fine; letters can be sent in any order.	16:22.39
	You could send all the 'a's then all the 'b's etc.	16:22.51
	You need to redo the page interpretation, and watch where each char is placed; if it happens to be outputting a char to a place you know is in a header, then you can rewrite the stream to mark it as being in the header.	16:24.46
	It's not a trivial job.	16:24.57
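The idea of watching where each char is placed, rather than the order it arrives in, can be sketched as follows. This assumes glyph positions have already been recovered by an interpreter, that y grows upwards as in PDF user space, and that each region holds a single line of text. The `classify` function and the sample coordinates are hypothetical.

```python
def classify(glyphs, header_min_y, footer_max_y):
    """Bucket (char, x, y) glyphs by position, ignoring emission order."""
    regions = {"header": [], "body": [], "footer": []}
    for ch, x, y in glyphs:
        if y >= header_min_y:
            regions["header"].append((x, ch))
        elif y <= footer_max_y:
            regions["footer"].append((x, ch))
        else:
            regions["body"].append((x, ch))
    # Within one line, reading order is left to right, so sort by x.
    return {
        name: "".join(ch for _, ch in sorted(items))
        for name, items in regions.items()
    }


# "hello" emitted as o, l, e, h, l, plus a page number near the bottom edge.
glyphs = [
    ("o", 40, 780), ("l", 30, 780), ("e", 10, 780),
    ("h", 0, 780), ("l", 20, 780), ("7", 300, 20),
]
result = classify(glyphs, header_min_y=750, footer_max_y=50)
print(result)  # {'header': 'hello', 'body': '', 'footer': '7'}
```

A text search for "hello" in the raw stream would miss this page entirely, which is why the find/replace approach does not work.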
AlecTaylor	How about extending pdfclean to fix up the streams for this operation?	16:28.12
Robin_Watts	AlecTaylor: pdfclean will get the streams into memory in a raw, uncompressed form.	16:28.48
	You can rewrite them there.	16:29.02
AlecTaylor	Hmm	16:29.08
kens	It's not as simple as 'fixing' them, you have to identify where the stream writes the glyphs you are interested in and modify it	16:29.13
AlecTaylor	Seems quite extensive and complicated	16:29.19
Robin_Watts	BUT... knowing how to rewrite them is the hard part.	16:29.21
kens	AlecTaylor : Yes, that's what we've been saying :-)	16:29.35
AlecTaylor	:(	16:29.43
Robin_Watts	AlecTaylor: You didn't want to pick a simple project for your thesis, right?	16:29.44
AlecTaylor	My thesis uses something else, haven't started it yet (hasn't been approved yet).	16:30.07
	This was a one-semester undergrad research project	16:30.19
	Semester ends in 7 days	16:30.58
	Also have 2 exams for other subjects within these 7 days	16:31.22
kens	Well you can weite up the research	16:32.07
	write	16:32.12
AlecTaylor	Yeah, that's what I'll do instead.	16:32.40
	This seems wayyyyy too extensive for something I can do in 7 days, even if I wasn't working on anything else	16:32.58
kens	Yeah, not a hope of doing it in 7 days	16:33.12
Robin_Watts	indeed not.	16:33.18
kens	You could do an 80/20 probably	16:33.21
AlecTaylor	wah	16:33.35
Robin_Watts	AlecTaylor: If you have reliably detected headers/footers/page numbers, then that's a big job done.	16:33.58
kens	I think that's still information Tor and I would be interested in getting	16:34.31
Robin_Watts	You could easily write a 'further work' section describing how you'd like to have a tool to 'reinsert' that information back into the original pdf.	16:34.49
AlecTaylor	All well, you guys want my research prototype which creates an XML from the XML generated by pdfextract, with header/footer tags with proper page numbers, implemented in only stdlib (with the header-only RapidXML library for reading in the XML)?	16:34.55
*AlecTaylor*	is giving it the poppler project, but no reason can't give it to you guys too	16:35.13
Robin_Watts	AlecTaylor: I think we'd all be interested in seeing it, yes, thanks.	16:35.23
kens	I'd certainly be interested in it	16:35.24
AlecTaylor	Sure, I'll send over a patch soon	16:35.35
Robin_Watts	In fact, when you write up your paper, give us a link :)	16:35.39
kens	At the least we can add the information to the XML output from MuPDF and Ghostscript	16:35.41
AlecTaylor	Will do :)	16:35.46
	I might add in some fuzzy stuff, are OCR errors prominent enough to warrant it?	16:36.53
kens	My experience is OCR is too good to need it today	16:37.19
Robin_Watts	'fuzzy' ?	16:37.51
	OCR is at least as good at spelling as your average youtube commentard :)	16:38.09
AlecTaylor	Robin_Watts: So I could employ Levenshtein Distance with a barrier of 1 or so	16:38.13
	On spell-checkers, Google has gotten terrible. It overly uses its result statistics to spell-check	16:38.46
	But sure, if OCR is fine at the moment, I'll leave it be	16:39.44
	Is there an article I can reference so that I can legitimately skip it?	16:39.58
kens	None that I know of. My only real experience of OCR is 30 years out of date	16:40.33
	Back then it was an interesting problem	16:40.51
Robin_Watts	AlecTaylor: I don't see that you need to cite a reference; stating that it's a problem that you are ignoring should be enough.	16:41.15
	The original document scanner is far better placed to do such fixes.	16:41.37
AlecTaylor	So can I safely say "Levenshtein distance is utilised at scan stage so I won't do it here"	16:42.02
	or something along those lines?	16:42.07
kens	I don't know what techniques OCR people use today.	16:42.37
	I'd be inclined to say that it 'could be used as a further refinement, if experience demonstrates a problem'	16:42.57
Robin_Watts	AlecTaylor: Yeah, don't tie yourself to a single possible algorithm.	16:43.41
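For readers unfamiliar with the technique AlecTaylor mentions, here is a minimal sketch of Levenshtein distance with a match threshold of 1. It uses the standard two-row dynamic programming formulation; `fuzzy_equal` and the sample strings are hypothetical illustrations of how a header might be matched despite a single OCR error.

```python
def levenshtein(a, b):
    """Edit distance counting single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def fuzzy_equal(a, b, max_distance=1):
    """Treat two strings as the same if they differ by at most max_distance edits."""
    return levenshtein(a, b) <= max_distance


print(levenshtein("kitten", "sitting"))       # 3
print(fuzzy_equal("Chapter 1", "Chaptcr 1"))  # True
```

As the discussion suggests, this is one possible refinement rather than a required step.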
AlecTaylor	k	16:44.06
	All well, gotta head off to grab a few hours sleep soon	16:44.22
	(3:44AM here)	16:44.27
	Just gotta figure out this file upload problem for a conference I am organising (which is in 7 days)	16:44.43
	Robin_Watts, kens: Porting MuPDF to javascript: possible/impossible?	16:47.06
	(just for a viewer)	16:47.13
kens	Re-writing in JavaScript ? Impossible (or rather takes too long)	16:47.41
Robin_Watts	AlecTaylor: You wouldn't rewrite, you'd write from scratch.	16:48.42
AlecTaylor	Would it be a waste of time? - Is it what Google did?	16:49.13
Robin_Watts	Someone else has done it.	16:49.19
	It's not what google did.	16:49.26
	Yes, it would be a HUGE waste of time (unless you're interested in doing it to drive javascript development)	16:49.47
AlecTaylor	Nope, just interested	16:50.43
*AlecTaylor*	was thinking to port the Qt Poppler viewer stuff to Wt	16:51.01
mvrhel_laptop	good morning	16:52.57
Robin_Watts	Morning	16:53.05
kens	Hi mvrhel_laptop	16:53.49
	Would you mind looking at bug #692680 ?	16:54.14
	Just to read it!	16:54.24
mvrhel_laptop	sure	16:56.07
kens	Thanks, if I'm gone when Ray arrives, could you ask him to read it too please ?	16:56.35
mvrhel_laptop	yes	16:56.50
kens	At the moment it looks like Alex's suggestion is the only concrete possibility for a solution, but you and Ray were talking about high level colour so I'd like you to bear this problem in mind ;-)	16:57.17
mvrhel_laptop	ok. making my way through the comments	16:58.08
kens	Yeah, it's a bit complicated.	16:58.23
	Feel free to ask if it's not clear	16:58.44
	Cool technology:	16:59.18
	http://www.reghardware.com/2011/11/16/3d_holographic_displays_become_reality/	16:59.18
mvrhel_laptop	wow this eps file sounds terrible	16:59.49
kens	I understand why it's doing it, but I don't like it	17:00.02
AlecTaylor	I like that green flashing laser	17:04.30
	makes me hungry for fake 3d fish	17:04.37
kens	I want my real 3D telly now please.	17:05.00
AlecTaylor	You already have one	17:06.00
kens	proper 3D display	17:06.09
	fake ones don't work for me	17:06.16
AlecTaylor	You have a 2D TV? - Amazing!	17:09.03
	You're such a square :P	17:09.11
kens	display....	17:09.22
	ray_laptop : I'm about to go off, but can you review and think about bug #692680 please.	17:12.47
	Not for a solution, just to consider in terms of future work relating to colour	17:13.02
ray_laptop	kens I saw your comment and the gs-bugs email with	17:13.11
kens	Thanks Ray	17:13.21
ray_laptop	kens: I agree that it looks AWFUL	17:13.31
kens	ray_laptop : well on reflection I understand what's done and why, but it's nasty.	17:13.50
ray_laptop	kens: yes, and they didn't even 'bind' the procset :-(	17:14.11
kens	Actually, even if they did it wouldn't help	17:14.25
	Because we actually need to do what Acrobat does. Send the Indexed sample through both colour spaces, to get 6 colour values for each image sample	17:14.48
	Then set up /DeviceN [/Black /Spot1 /Spot2 /None /None /None] /DeviceRGB	17:15.21
ray_laptop	oh, yuck!	17:15.45
mvrhel_laptop	wow	17:15.50
kens	So if the inks are present we ignore the last three values, if the inks aren't present then the tint transform to RGB does {pop pop pop} to remove the spot inks and leave the RGB	17:15.54
ray_laptop	kens: I understand now.	17:16.37
henrys	ray_laptop:it looks like marcos dumped 692674 on you without history investigation - that customer did get poor service with their problem, will you have a chance to look at it soon? Or should we send it back to marcos for history?	17:16.51
ray_laptop	henrys: I've started looking at it (on peeves)	17:17.12
	henrys: I'll post something today. I think Marcos just gave it to me today	17:17.32
	henrys: and I agree that they sort of got ignored, but then, they aren't my favorite customer either	17:18.04
AlecTaylor	Damn .htaccess file, it's working now	17:18.24
	WOOT	17:18.25
kens	Did they get ignored, they opened it one day and started bleating 24 hours later	17:18.28
henrys	ray_laptop:I didn't follow it carefully they whined in the bug report.	17:19.12
ray_laptop	henrys: these were the guys that had me work on trimming down the initialization time of gs because they were timing 5,000 simple jobs and starting gs each time	17:19.16
henrys	oh same folks, sigh	17:19.38
mvrhel_laptop	I wonder what the huge difference is	17:19.44
ray_laptop	henrys: Miles had them pay extra for that.	17:19.47
	I'll post my comments to the bug once I do the timings on peeves.	17:20.06
	(I'll do profile as well)	17:20.28
kens	TBH I think the answer is 'there are bug fixes in newer releases, sometimes these cause performance penalties for correct behaviour. You can have fast or good, your choice' ;-)	17:20.33
ray_laptop	kens: may be -- but first I want to get the facts	17:20.52
kens	Yeah, I was thinking of looking at it, but got diverted, and then realised it wasn't pdfwrite	17:21.23
Robin_Watts	kens: performance differences between 8.x and 9 is probably color management - but ray_laptop is right that we'd like to be sure of that.	17:21.28
mvrhel_laptop	sure. blame it on the color... :)	17:21.56
ray_laptop	mvrhel_laptop: if it is the color, I'll assign the bug to you :-)	17:22.15
kens	It could be almost anything, including the fact that it uses FreeType, which is what fixed their original problem.	17:22.28
	Bear in mind that Chris has done some performance fixes in that area since the release of 9.04	17:22.54
ray_laptop	kens: iirc, FT is slightly faster than the AFS (Artifex Font Scaler)	17:23.13
kens	But that's what made me think of 'yes, it's right now, but slower' as an answer	17:23.13
	ray_laptop : There was something that made it slower, Chris fixed it.	17:23.27
ray_laptop	kens: don't worry, I will also time HEAD	17:23.33
kens	I think it was something about freeing memory	17:23.39
chrisl	Yeh, confusion about what Freetype does and when.	17:24.30
kens	Anyway, time for me to go, goodnight all	17:24.34
AlecTaylor	Alright, well thanks very much Robin_Watts and kens. I will be sure to include the additional relevant information in my research paper, and proceed without reprocessing the PDF file, but just showing the XML as PoC	17:24.35
chrisl	ray_laptop: there was a FAPI/Freetype performance regression in 9.04, which I fixed, and actually fixed what I was trying to do when I introduced the problem - in theory, it should be slightly faster than 9.02, but I suspect it won't be measurable	17:26.46
ray_laptop	chrisl: we'll see. I'll let you know if the profile shows that it is font related in the HEAD rev.	17:27.36
chrisl	ray_laptop: okay, thanks.	17:27.50
	ray_laptop: and just in case, the -dDisableFAPI command option still works if you want back-to-back FAPI/AFS numbers.	17:29.08
ray_laptop	chrisl: thanks for the hint	17:30.05
henrys	chrisl:so how can shelly run the clusters but not have an account?	17:31.50
chrisl	henrys: I have no idea - does git have its own permissions?	17:32.27
Robin_Watts	chrisl: He does have an ssh account.	17:32.41
henrys	and he has gs-priv	17:33.05
	he's in the group	17:33.11
	my guess is he's trying to check into http read only.	17:33.24
chrisl	Okay, let me talk to him, and see if it's a settings problem......	17:33.28
Robin_Watts	If he's using "git push" then he doesn't need to have an account.	17:33.31
	sorry "git cluster".	17:33.41
henrys	we want him to be able to commit and I don't see why he can't.	17:34.33
Robin_Watts	Me either. Tell him to log in here, and we'll talk him through it.	17:36.34
ray_laptop	if he cloned from the http, then his remote.origin may be wrong	17:36.45
chrisl	If the worst comes to the worst, he can come down here, and I'll hit his computer with a mallet until it works.	17:36.52
henrys	ray_laptop:that's what I suspect	17:36.55
ray_laptop	have him check his git config -l	17:37.01
Robin_Watts	git remote -v	17:37.16
	chrisl: Is there a reason he doesn't log in here ?	17:37.46
ray_laptop	Robin_Watts: yeah, that has the info too :-)	17:37.47
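[Editor's sketch] The check suggested above (inspect `git remote -v` or `git config -l` to see whether the clone came from a read-only URL) can be illustrated with a small Python helper. This is only a sketch under assumptions: the function name `classify_remotes` is hypothetical, the example.com URLs are placeholders, and treating every http(s):// or git:// URL as read-only reflects henrys' guess about this particular server, not a general rule of git.

```python
def classify_remotes(remote_v_output):
    """Guess, per remote and direction, whether the URL looks read-only.

    Input is the text printed by `git remote -v`, whose lines have the
    form: <name> <url> (fetch|push).
    """
    result = {}
    for line in remote_v_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, url, direction = parts
        direction = direction.strip("()")
        # Assumption: on this server http(s) and git:// access is read-only,
        # while ssh-style URLs (user@host:path or ssh://) may allow pushes.
        if url.startswith(("http://", "https://", "git://")):
            guess = "likely read-only"
        else:
            guess = "possibly writable"
        result[(name, direction)] = guess
    return result


if __name__ == "__main__":
    sample = (
        "origin\thttp://example.com/ghostpdl.git (fetch)\n"
        "origin\thttp://example.com/ghostpdl.git (push)\n"
    )
    for key, guess in classify_remotes(sample).items():
        print(key, guess)
```

If the push URL turns out to be the read-only one, `git remote set-url origin <ssh-url>` is the usual way to repoint the existing clone without re-cloning.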
chrisl	Robin_Watts: timing, mostly.	17:38.16
ray_laptop	with so many people here with strange nicks, how can you tell he's not	17:38.27
Robin_Watts	He'd have to be up at VERY odd hours for someone not to be here :)	17:38.44
henrys	he's been on before - IRC may not be compatible with his work situation.	17:39.10
ray_laptop	there's alway webchat	17:39.30
	s/alway/always/	17:39.41
Robin_Watts	henrys: Presumably he's not working for us while at work..	17:39.47
ray_laptop	Robin_Watts: why not ?	17:40.04
	;-)	17:40.10
henrys	that's what I'd do, why waste my free time ;-)	17:40.11
chrisl	Robin_Watts: I'd probably best not comment on that......	17:40.39
AlecTaylor	:P	17:41.35
chrisl	henrys: I've dropped a mail to Shelly, saying what we think - he'll give me a call if it's not what we think	17:47.01
henrys	chrisl:please do invite him to irc next time you talk, I assume I don't need to answer his last email since you are talking to him.	17:47.08
	oops should have said that sooner.	17:47.25
chrisl	henrys: I do mention IRC whenever we talk - I'll keep doing so	17:48.36
henrys	thanks	17:49.01
chrisl	henrys: it would be good if you could reply to him wrt to the billing - seeing us right for the bounty fix that was wrong	17:50.48
henrys	chrisl:will do.	17:51.37
chrisl	Thanks - that's the proverbial "above my pay grade" ;-)	17:52.01
henrys	for those of us that have been kicked upstairs before we break something ;-)	17:52.59
*chrisl*	refuses to comment ;-)	17:53.49
henrys	I must warn you guys before the meeting I've cut my hair so try to recognize me. I've had some confusing encounters...	17:54.42
chrisl	Wow! Was it weighing you down too much?	17:55.11
Robin_Watts	henrys: What would you have got if you'd won the bet?	17:56.11
henrys	as you age it gets thinner and you've got to cut it back so it'll get thicker.	17:56.45
	;-)	17:56.49
Robin_Watts	henrys: I'm not sure that works... by that token I'd have really thick hair.	17:57.26
	:)	17:57.39
henrys	according to 23andme I have the genes for male pattern baldness but it hasn't taken hold yet.	17:59.30
ray_laptop	henrys: have you told Miles and Scott ? (it would be fun to see their reaction)	17:59.32
henrys	no I haven't told them.	17:59.59
	I've had a ponytail for 15 years, I didn't go down easily ;-)	18:01.25
mvrhel_laptop	wow	18:02.00
	I had told my kids about your long hair. Luckily Alden had seen you before otherwise they won't believe me when they see you in Miami	18:02.37
henrys	it's still longer than Robin_Watts' ;-)	18:03.23
Robin_Watts	mvrhel_laptop: Tell them an alligator got to it.	18:04.10
henrys	my kids were shocked, they don't have memory of me with short hair.	18:05.12
mvrhel_laptop	You should be faster in your runs now	18:05.52
henrys	anyway speaking of Florida it'll be great to have a get together with everybody, I'm glad Miles invited spouses, families and all.	18:06.15
Robin_Watts	mvrhel_laptop: You're forgetting the Samson effect.	18:06.20
mvrhel_laptop	good point	18:06.27
henrys	the stats so far indicate a slowdown but we'll see.	18:06.43
AlecTaylor	lolol	18:07.21
henrys	chrisl:I notice the jbig2 stuff is going to get complicated if he has sjbig2.c changes - have we looked at git externals yet? This business of having 2 repos for jbig2 is a problem.	18:14.14
chrisl	henrys: I thought we wanted to avoid externals	18:15.09
Robin_Watts	god yes, we want to avoid externals.	18:17.13
	not even sure it's possible with git.	18:17.18
chrisl	git submodule	18:17.34
henrys	well I thought git externals were better than svn externals and tor was going to ease us into it. But I haven't read about git externals myself.	18:17.39
ray_laptop	my kids were shocked when I showed them pictures of me with a beard	18:20.58
	henrys: did you donate your hair ?	18:21.23
	my beard wasn't long enough to donate ;-)	18:21.55
*AlecTaylor*	needs a haircut, his hair is almost down to his lobes!	18:22.01
*AlecTaylor*	shaved for the first time in 2.5 weeks a few days ago	18:22.17
henrys	ray_laptop:yes locks of love	18:22.18
chrisl	Robin_Watts, henrys: git submodules pulls in a specific revision of the sub-project - so we'd have the same issue of not being able to commit to it.	18:22.28
ray_laptop	henrys: good going !!	18:22.30
henrys	chrisl:so what to do about jbig2 continue to maintain it in 2 repos?	18:24.19
Robin_Watts	Why can't we make the ghostpdl repo the 'one true repo' for jbig2? We can do releases in turn with the ghostscript releases.	18:25.30
chrisl	henrys: if I'm honest, my inclination would be to make the ghostscript jbig2dec the "canonical" repos, and kill the other one	18:25.35
henrys	chrisl:the mail I've sent so far suggests an API change requires updates in both repos and regular fixes go to the standalone jbig2, then you'll grab that as you please.	18:25.38
	I tried that and tor didn't like it.	18:25.58
	but I guess he can be outvoted - I didn't know you guys would buy into that also.	18:27.00
tor8	henrys, chrisl: that'll be painful for mupdf thirdparty libraries, but as long as we provide release tarballs of jbig2dec we're not in a worse spot than with zlib	18:27.34
chrisl	tor8: where do you get the jbig2dec source for the third party libs?	18:28.13
tor8	at the moment I have a (private, on my machine) git repo which uses submodules to pull in all third party libs	18:28.22
	from the jbig2dec git on casper	18:28.34
	there are many ways to have multiple repos for one project with git	18:29.14
	submodules is just one way	18:29.20
	android uses the 'repo' tool	18:29.26
henrys	chrisl:so can I tell shelly to just check into gs?	18:29.29
chrisl	henrys: for now, I think so. Let's see if we can hammer out a solution keeping the separate repositories - if we do, we'll update it from the gs code	18:30.56
tor8	henrys: if we want to start using submodules I can set that up, but it means all of us have to run a few more git invocations to keep things in sync	18:31.04
	chrisl: I wonder if git cherry-pick will work for that, though I think the different paths will pose a problem	18:31.57
chrisl	tor8: could we use some triggers so that commits into the gs/jbig2dec get mirrored to the jbig2dec repos and vice versa?	18:32.22
henrys	tor8:maybe something for the meeting - I wouldn't expect a warm reception to new git training.	18:32.38
tor8	chrisl: I think it'd be relatively easy to make a script pull and update the jbig2dec sources in gs if we use the external jbig2dec git as a master	18:32.56
	henrys: yeah. conceptually git submodules work great, but they aren't trivial to use :(	18:33.27
	you need to run a few extra commands to keep the subrepositories updated	18:33.47
chrisl	Yeh, it was the other way (gs->jbig2dec) I was wondering about	18:34.01
Robin_Watts	tor8: Why not have a script that automatically recommits any changes to the ghostscript version into the other repo ?	18:34.01
ray_laptop	since jbig2dec doesn't get updated frequently, it might work	18:34.14
henrys	tor8:the immediate issue is shelly has a fix that requires a change to gs (sjbig2.c) and jbig2 - somehow this must be synchronized.	18:34.21
ray_laptop	henrys: but jbig2 _is_ in the gs repos	18:34.56
henrys	ray_laptop:it is in 2 places.	18:35.16
ray_laptop	is the problem the shared lib staying in sync ?	18:35.39
tor8	henrys: that's not a problem if we use git submodules (since we'd update the submodule version in the same commit as sjbig2.c)	18:35.44
*ray_laptop*	votes for disallowing shared lib support in gs builds ;-)	18:36.59
tor8	henrys: one approach is to make the jbig2dec.git auto-generated by filtering out all non-jbig2dec related stuff from the main repo	18:37.08
*tor8*	agrees with ray! I hate shared libraries.	18:37.24
henrys	fair enough - the simplest thing is to make gs canonical and do jbig2 tarballs but that solution seems to have slipped away.	18:37.25
Robin_Watts	What tor8 just said.	18:37.36
tor8	henrys: let's start with that, and I'll work on a script to create a read-only jbig2dec subset git	18:37.52
Robin_Watts	(about auto-generating from the main repo).	18:37.54
henrys	tor8:I'm sure chrisl will appreciate that.	18:38.13
tor8	it's very similar to what I did when converting the svn to git repos	18:38.16
	henrys: put it on the tech agenda so you can remind me later	18:39.11
chrisl	tor8: if it's doable that'd be great, but if it's a problem I don't mind adding updating of the jbig2dec repos to the Ghostscript release process - we just need to make sure people don't generally commit to it.	18:39.55
tor8	chrisl: we can add hooks to disable commits, or just change the file permissions.	18:40.30
henrys	tor8:will do about the agenda for now shelly will commit to gs - and any bug that affects mupdf you'll get one way or another.	18:40.43
tor8	the issue is filtering the whole gs history to create a jbig2dec repo is rather cpu expensive, so we probably don't want to do that on every commit	18:41.05
chrisl	henrys: is this a jbig2dec API change, or just a change in how GS uses it?	18:41.35
	tor8: once a week is probably (more than!) enough	18:41.59
tor8	chrisl: indeed!	18:42.28
chrisl	And if even that uses too much CPU time - once a month......	18:43.19
Robin_Watts	tor8: Once we've generated it, can we not do incremental updates?	18:43.38
	i.e. use a hook on the golden repo on casper so that whenever we commit to ghostpdl in the jbig2dec dir, it recommits to the jbig2dec standalone ?	18:44.23
tor8	Robin_Watts: I'll have to look into it, but I think it should be possible.	18:44.32
Robin_Watts	That way we get it instantly in sync, for low cost.	18:44.39
tor8	git filter-branch is what I was thinking of using	18:44.51
	git filter-branch --subdirectory-filter gs/jbig2dec --prune-empty ...revs...	18:45.25
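[Editor's sketch] What `--subdirectory-filter gs/jbig2dec --prune-empty` does to the history can be modelled in a few lines of Python. This is a toy model under stated assumptions, not how git implements it: commits are represented as (id, list of paths) pairs, the function name `subdirectory_filter` and the sample file names are hypothetical, and the commit ids are kept unchanged for readability, whereas the real command rewrites every commit and so produces new hashes.

```python
def subdirectory_filter(commits, subdir):
    """Keep only changes under `subdir`, re-rooted at that directory.

    `commits` is a list of (commit_id, [paths touched]) tuples in history
    order. Commits that touch nothing under `subdir` are dropped, which is
    the effect --prune-empty has on commits left empty by the filter.
    """
    prefix = subdir.rstrip("/") + "/"
    filtered = []
    for commit_id, paths in commits:
        # Strip the prefix so the subdirectory becomes the repository root.
        kept = [p[len(prefix):] for p in paths if p.startswith(prefix)]
        if kept:
            filtered.append((commit_id, kept))
    return filtered


if __name__ == "__main__":
    history = [
        ("c1", ["gs/base/gstype42.c"]),
        ("c2", ["gs/jbig2dec/jbig2.c", "gs/base/sjbig2.c"]),
        ("c3", ["gs/jbig2dec/jbig2_image.c"]),
    ]
    for commit in subdirectory_filter(history, "gs/jbig2dec"):
        print(commit)
```

The model also shows why the full rewrite is expensive: every commit in the main repository has to be visited, even though most of them contribute nothing to the jbig2dec subset.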
henrys	chrisl:now I can't find the patch, hang on.	18:45.51
chrisl	henrys: it's really for tor8's benefit - if the jbig2dec API changes, mupdf will need to be aware of it, too.	18:46.44
falko_	hi is it possible to create a rpt port to ghostscript and if i print i print to a printer, create a psfile and after that start an executable (that moves the ps file around) ?	18:48.18
	at the moment i have a script that checks every 3 sec for a new file ,.. does the printing to the printer and the moving around .. but that is not a very nice solution	18:49.32
henrys	chrisl:apparently the sjbig2.c is referenced in his email as being in the patch but it isn't there?	18:49.49
chrisl	henrys: he did ask how we wanted it - I may have misinterpreted what he said.	18:51.02
henrys	figure it out later if he commits it's easy enough to back out.	18:55.30
ray_laptop	mvrhel_laptop: the color stuff is DEFINITELY in the top of the profile for the performance (904 vs 871) :-(	18:58.34
mvrhel_laptop	:(	18:59.00
	ray_laptop: if you want to dump this on me, that is fine	18:59.19
ray_laptop	mvrhel_laptop: cmsSample3DGrid is particularly heavy.	18:59.34
	mvrhel_laptop: I think the issue is that 871 runs with 'simple' color and 90x effectively always runs with -dUseCIEColor	19:00.12
chrisl	henrys: Okay, getting Shelly able to commit should resolve the API question from a GS point of view, and I'll make sure to tell him to inform tor8 if the API really has changed.	19:00.56
mvrhel_laptop	ray_laptop: yes. I guess what I need to do is to get the "dumb" CMM in place	19:01.18
	for those who don't want color management	19:01.28
ray_laptop	mvrhel_laptop: we'd need a 'dumb' cms that totally ignored the ICC profiles	19:01.33
mvrhel_laptop	hehe	19:01.38
	that is on my todo list. and I did start it	19:02.04
henrys	what is the effect of just using the dump profiles for now?	19:02.17
ray_laptop	mvrhel_laptop: but the complication is that we can only do that for the 'default' colorspaces -- If Lab or ICC comes in, we still need to handle it	19:02.30
mvrhel_laptop	henrys: they will still use lcms	19:02.52
	ray_laptop: yes	19:03.00
	I need to think a bit about this	19:03.07
henrys	mvrhel_laptop:it may be unique to pcl but many jobs run significantly faster with the dump profiles.	19:03.29
mvrhel_laptop	interesting.	19:03.39
	I do have an idea though how to do this	19:03.46
	basically we have an interface between the graphics library and the CMM and we know when the profile was a substituted one for a defaultRGB etc	19:04.24
	this is needed for PDFwrite	19:04.29
ray_laptop	mvrhel_laptop: if the "quick and dirty" cmm recognized the input ICC profiles enough to recognize that it is RGB, Gray or CMYK then the 'link' profile it returns can be a suitable dummy that the 'quick and dirty' color conversion code can recognize and shortcut	19:04.36
henrys	I think there are many cases in the code where you don't get "pure" colors with the fancy profiles.	19:04.44
mvrhel_laptop	yes exactly	19:04.45
	let me look at adding that as an option	19:05.07
ray_laptop	mvrhel_laptop: OK. But let's decide when that fits with your other priorities	19:05.36
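[Editor's sketch] The "quick and dirty" CMM idea discussed above (recognize that a profile is plain Gray, RGB or CMYK, hand back a dummy link, and shortcut the conversion) might look roughly like this in Python. Everything here is an illustrative assumption rather than Ghostscript code: `make_link` and `convert` are hypothetical names, the profile is reduced to its four-character ICC data colour space signature, and the conversion formulas are the naive textbook ones. Returning `None` stands for the case ray_laptop mentions, where Lab or other ICC input still has to go to the real CMM.

```python
SIMPLE_SPACES = {"GRAY", "RGB ", "CMYK"}  # ICC colour space signatures


def make_link(src_space, dst_space):
    """Return a dummy (src, dst) link, or None if a real CMM is needed."""
    if src_space in SIMPLE_SPACES and dst_space in SIMPLE_SPACES:
        return (src_space, dst_space)
    return None


def convert(link, color):
    """Convert one color (components in 0..1) using naive formulas."""
    src, dst = link
    if src == dst:
        return tuple(color)
    # Step 1: bring the source color to RGB.
    if src == "GRAY":
        rgb = (color[0], color[0], color[0])
    elif src == "RGB ":
        rgb = tuple(color)
    else:  # CMYK
        c, m, y, k = color
        rgb = ((1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k))
    # Step 2: take RGB to the destination space.
    r, g, b = rgb
    if dst == "RGB ":
        return rgb
    if dst == "GRAY":
        return (0.3 * r + 0.59 * g + 0.11 * b,)
    k = 1 - max(rgb)
    if k == 1:
        return (0.0, 0.0, 0.0, 1.0)  # pure black, avoid dividing by zero
    return tuple((1 - v - k) / (1 - k) for v in rgb) + (k,)


if __name__ == "__main__":
    link = make_link("RGB ", "CMYK")
    print(convert(link, (1.0, 0.0, 0.0)))
    print(make_link("Lab ", "RGB "))
```

The point of the dummy link is that the caller's interface stays the same: it still asks for a link and applies it, but no ICC profile data is parsed and no lookup table is built for the simple cases.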
Robin_Watts	I'm about to risk breaking the cluster.	19:07.22
	Anyone have any jobs they want to get in desperately?	19:07.32
ray_laptop	Robin_Watts: not me	19:07.39
mvrhel_laptop	well I am doing a run right now	19:07.49
	oh it just finished	19:07.55
Robin_Watts	This will only affect new jobs.	19:08.01
mvrhel_laptop	I am done. Going to do a commit now	19:08.23
Robin_Watts	ok, I've swapped to a modified run.pl that should at least get the build right on windows cluster machines.	19:08.58
mvrhel_laptop	actually I may run one more real quick	19:09.03
Robin_Watts	Go for it.	19:09.09
	In theory with no windows cluster machines you should see no differences.	19:09.19
mvrhel_laptop	ok	19:09.27
henrys	Robin_Watts:oh are you getting the windows node going?	19:10.20
Robin_Watts	henrys: trying to.	19:10.29
tor8	chrisl, kens: more ammo if you don't like the _t suffix ... it's reserved by POSIX so we shouldn't be using it in the first place!	19:17.38
ray_laptop	you mean like tin64_t ?	19:19.23
tor8	yeah	19:19.36
ray_laptop	int64_t that is :-/	19:19.39
tor8	posix makes the claim on *_t in ANY header	19:20.11
ray_laptop	that seems a bit excessive	19:20.47
	what happened to C standard	19:21.07
tor8	yeah, but _t is an ugly wart so I don't complain :)	19:21.15
	the C standard, it got let out in the wild... :/	19:21.45
	looking at the list of name spaces that posix lays claim on, it's pretty excessive even without *_t!	19:22.42
Robin_Watts	ok, we have a windows cluster node... and it's pinging the cluster, let's try doing a build.	19:22.56
	My cygwin installation appears to be having problems. Gah.	19:25.38
ray_laptop	btw, the HEAD rev is virtually the same on this test as 904, so FT isn't bollixing anything up	19:25.42
henrys	bbiab	19:29.07
mvrhel_laptop	bbiaw	19:45.50
Robin_Watts	mvrhel_laptop: I've taken my node offline, so your job should run without a problem.	19:48.35
mvrhel	Robin_Watts: you around?	20:51.25
Robin_Watts	I am.	20:51.31
mvrhel	so I was just testing 692512 to see if we could close this	20:51.55
	and I see that we get horizontal lines at 600dpi	20:52.09
	I thought we had this fixed	20:52.18
Robin_Watts	so did I :(	20:52.46
mvrhel	the bug was originally for vertical lines	20:52.50
	I have a single page version of this file if you want to take a look at it	20:53.06
Robin_Watts	Assign it to me, and I'll have a look when I uncluster myself.	20:53.09
mvrhel	ok.	20:53.15
	bbiaw	20:55.36
Robin_Watts	Damn. Looks like I really am going to need to check out another copy of tests/tests_private.	21:08.01
	dinnertime.	21:08.21
mvrhel_laptop	stormy here. we may lose power	23:43.32
