Ghostscript IRC logs

	20170206
tor8	sebras: for the logs. in PDF the 0 byte is whitespace. see table 3.1 white-space characters.	11:17.39
	I suspect Robin_Watts just picked the first iswhite() function to hand for use in mudraw :)	11:18.13
	the use of octal characters bothers me though... I'd just use decimal if I were writing it today.	11:19.26
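A minimal sketch of a PDF-aware white-space test, with the character codes from PDF 1.7 Table 3.1 written in decimal as tor8 suggests. The function name is_pdf_white is hypothetical; it is not the helper mudraw actually uses. Note how it differs from C's isspace(): it accepts NUL (0) and rejects vertical tab (11).

```c
#include <stdbool.h>

/* PDF white-space characters (PDF 1.7, Table 3.1), in decimal. */
bool is_pdf_white(int c)
{
    switch (c) {
    case 0:   /* NUL, null            */
    case 9:   /* HT,  horizontal tab  */
    case 10:  /* LF,  line feed       */
    case 12:  /* FF,  form feed       */
    case 13:  /* CR,  carriage return */
    case 32:  /* SP,  space           */
        return true;
    default:  /* includes VT (11), which isspace() would accept */
        return false;
    }
}
```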
Robin_Watts	tor8: i think I have a couple of commits pending.	11:30.03
tor8	Robin_Watts: fix win32 and -i to invert LGTM, though maybe use capital -I for invert?	11:31.09
	to match the -I invert flag of mudraw	11:31.39
Robin_Watts	Good point. Will fix.	11:31.47
	fixed commit, plus 1 more tiny one.	11:36.04
tor8	Robin_Watts: might want to fix '-i' in the commit message too?	11:37.31
Robin_Watts	D'Oh.	11:37.41
	Done.	11:38.25
tor8	Robin_Watts: those 3 commits LGTM.	11:38.57
Robin_Watts	Ta	11:39.12
tor8	I tried an approach to improve staying on the same "page" when doing text re-layout	11:40.01
	I'm not entirely happy with it though	11:40.07
	I take a temporary 'bookmark' to a location in the text (the first bit of text on a page), lay out the document with the new font size, and find the page that contains the same marked location again	11:41.06
	but it's obviously not reversible. changing font size up and then back down to the same, you end up at a different page	11:41.32
Robin_Watts	How about... take a temporary bookmark to a location in the text.	11:42.05
tor8	maybe I'm just trying too hard, and there's a simpler approach by just using current_page / page_count?	11:42.07
Robin_Watts	Keep that temporary bookmark until you change page.	11:42.17
	That way if you zoom up and down, the bookmark is still in the same place.	11:42.39
tor8	taking the bookmarks is expensive, but keeping it around unless a page change occurs might help that use case yeah	11:43.11
	my other gripe is how they are temporary and that's going to lead users into confusion trying to save the bookmarks in a preference file or something	11:43.38
	at the moment I just use a raw pointer to the fz_html_flow node	11:44.09
Robin_Watts	I'm not sure what else you could use, unless you start using counts of html flow entries or something - and that'll be screwed when we change the structure at all.	11:45.05
tor8	yeah. still not thrilled about the API implications though.	11:45.35
acharles	Hi.	19:57.02
ray_laptop	acharles: hi back	21:07.06
	(1 hr later)	21:07.20
acharles	haha	21:07.36
	I had a few questions about ghostscript and the way it handles pdf files.	21:07.57
ray_laptop	acharles: go ahead	21:08.10
	I can most likely answer those	21:09.02
	the general answer is "very well" :-)	21:09.38
acharles	Is the PostScript interpreter used to process PDF files as well, or does PDF have a different code path?	21:10.09
ray_laptop	the PS interpreter processes the PDF input, invoking PS operators to actually do stuff (images, text, other graphics)	21:13.24
	the PostScript that does that is in Resource/Init/pdf*.ps	21:13.47
	Note that the "scanner" that processes PDF input is in C -- it's not like PostScript is trying to read the PDF directly (except at the very start to find out if the input is PDF or not)	21:15.30
acharles	Ah, so you take advantage of the fact that pdf is a subset of postscript to use postscript functions to process the pdf.	21:15.51
ray_laptop	acharles: well, PDF is actually a disjoint set (not a subset), but yes, the syntax is similar enough that the scanner has only a few special "tweaks" for PDF	21:17.23
	but things like << ... ... >> defining a dictionary and strings being enclosed in (...) are the same	21:18.17
acharles	Ah, I didn't know it was disjoint.	21:18.41
ray_laptop	well, at the operator level, PDF has transparency and the concept of "streams" but doesn't have some of the noisome PS operators like file manipulation	21:20.01
acharles	Ah, I guess that makes sense.	21:21.02
ray_laptop	but our scanner has "PDF_SCAN_RULES" for a couple of exceptions	21:23.47
kens	lurking	21:23.49
ray_laptop	hi kens. ISTR there is something about names in PDF as well, right ?	21:24.24
kens	spaces in names	21:24.37
	and other non-printable characters	21:24.44
	or even printable	21:24.52
	The original point was that the graphics model of PDF matched that of PostScript, so a one-to-one mapping was trivial; it therefore made sense to write a PDF processor in PostScript.	21:25.36
	Since then, well things have changed....	21:25.45
	And many PDF files break the specification, but Acrobat opens them so we have to too. Which makes our handling much, much more complicated than it should be.	21:26.42
ray_laptop	ah, it is that ANY character except NUL can be in a name using hex	21:26.43
kens	I think a NULL can be in a name too, you just escape it with #	21:27.03
ray_laptop	kens: PDF 1.7 spec section 3.2 (p 57) excludes NUL	21:27.52
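The rule ray_laptop quotes (any byte except NUL may appear in a name via a # followed by two hex digits) can be sketched as a small decoder. The function decode_pdf_name is hypothetical and not taken from Ghostscript or MuPDF; it is deliberately strict, treating a # without two hex digits as an error, whereas real-world readers are often more lenient with broken files.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Expand #xx escapes in the body of a PDF name (the text after '/').
   Writes the result, NUL-terminated, into out. Returns false on a
   malformed escape, on #00 (NUL is excluded from names), or if out
   is too small. */
bool decode_pdf_name(const char *in, char *out, size_t out_size)
{
    size_t n = 0;
    if (out_size == 0)
        return false;
    while (*in) {
        int c = (unsigned char)*in++;
        if (c == '#') {
            int hi = hex_value((unsigned char)in[0]);
            /* only look at in[1] if in[0] was a valid digit (not NUL) */
            int lo = hi < 0 ? -1 : hex_value((unsigned char)in[1]);
            if (hi < 0 || lo < 0)
                return false;      /* '#' not followed by two hex digits */
            c = hi * 16 + lo;
            in += 2;
            if (c == 0)
                return false;      /* NUL may not appear in a name */
        }
        if (n + 1 >= out_size)
            return false;          /* no room for this byte + terminator */
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return true;
}
```

For example, the name /paired#28#29parentheses decodes to paired()parentheses, while /Bad#00 is rejected.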
kens	acharles is there a reason for wanting to know all this? It's probably not useful....	21:27.59
	ray_laptop well, this is all from memory for me, I don't have the spec open in front of me	21:28.12
acharles	Yes, there is. :)	21:28.16
ray_laptop	kens: I cheated and opened the spec :-)	21:28.30
kens	It might be easier to explain what your goal is	21:28.35
acharles	My goal is "secure pdf processing", but that's vague. I'm just doing some investigative work and I figured asking here made more sense than reading the code for days on end. :)	21:30.20
kens	Well, PDF is pretty secure if you avoid JavaScript	21:30.39
acharles	Yes, but postscript is not	21:30.57
kens	Though its also a good idea to prevent PostScript XObjects	21:31.01
ray_laptop	kens: I think GS disables PS XObjects by default	21:31.38
kens	acharles, but you cannot execute random PostScript in a PDF file using Ghostscript	21:31.45
	Obviously if you send PostScript that's a different matter	21:32.03
ray_laptop	acharles: and GS has "SAFER" mode that is supposed to make it more secure	21:32.04
	(for PS or PDF input)	21:32.30
acharles	Yeah, I'm assuming -dSAFER is enabled.	21:32.44
ray_laptop	and since GS doesn't use JS, there isn't a problem there	21:32.57
acharles	How does GS detect pdf vs ps input?	21:33.08
kens	Though (as the recent news showed) if you are running a job server it's a good idea to set the job server password to something other than 0 :-)	21:33.26
ray_laptop	acharles: using PS code in Resource/Init/pdf_main.ps	21:33.30
kens	acharles it depends on how you invoke it	21:33.38
	ray_laptop you can use pdfrun directly	21:33.46
ray_laptop	kens: true, then we don't even try to "detect"	21:34.04
acharles	how does pdfrun work?	21:34.21
ray_laptop	acharles: basically if you don't use pdfrun and just "run" an input file, it looks at the first 1024 bytes for the PDF header	21:34.53
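The detection step ray_laptop describes amounts to a bounded search for the %PDF- header marker. The sketch below is an illustration in C, not Ghostscript's actual check (which is PostScript in Resource/Init/pdf_main.ps); the function name looks_like_pdf is hypothetical, and requiring the whole marker to lie inside the first 1024 bytes is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if "%PDF-" occurs entirely within the first 1024 bytes of buf.
   Leading junk before the header (common in real files) is tolerated. */
bool looks_like_pdf(const unsigned char *buf, size_t len)
{
    static const char marker[] = "%PDF-";
    const size_t mlen = sizeof marker - 1;      /* exclude the terminator */
    const size_t limit = len < 1024 ? len : 1024;

    for (size_t i = 0; i + mlen <= limit; i++) {
        if (memcmp(buf + i, marker, mlen) == 0)
            return true;
    }
    return false;
}
```

A PostScript file starting with %!PS-Adobe-3.0 fails this test, which is why plain "run" falls through to the PostScript interpreter for it.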
kens	It's an internal Ghostscript thing, you give it a filename and it runs it as a PDF file	21:34.54
	Hmm, actually that may not be entirely correct.	21:35.32
	Probably best not to rely on memory at this time of night	21:35.43
ray_laptop	kens: actually, I think you have to send it a PS file	21:35.50
	filetype, not a string that contains the filename	21:36.07
kens	ah runpdfbegin maybe	21:36.31
	Oops no there it is, runpdf, which calls runpdfbegin :-)	21:36.57
ray_laptop	acharles: so you need to make a PDF file type, which can be done with: (filename.pdf) (r) file runpdf	21:37.14
	kens: right -- they both expect a filetype	21:37.37
kens	Indeed	21:37.42
	It would be trivial to define a function to take a filename, but why bother....	21:38.07
ray_laptop	kens: agreed	21:38.19
acharles	Ah, that's not exposed as a command line option?	21:38.23
kens	You can use -c and -f	21:38.33
	to send PostScript directly	21:38.40
	so -c "(filename.pdf) (r) runpdf" -f	21:39.01
ray_laptop	kens: the -f doesn't really do anything other than get out of -c mode, so is rather useless if -dBATCH is given	21:40.15
kens	Note that the pdf*.ps files constitute a rather large PostScript program; one of the things it will do is attempt to validate the PDF file. So if you send it a PostScript file it won't* run it, it will just complain it's not a valid PDF file	21:40.25
acharles	What does the (r) parameter mean? I mean, it pushes r on the stack.	21:40.35
kens	makes it readable, like "r" in C's fopen()	21:40.45
ray_laptop	acharles: you can also do: echo (filename.pdf (r) file runpdf | gs ... -	21:40.47
kens	If you wanted a writable file you would use (w)	21:41.11
acharles	Ah	21:41.36
ray_laptop	acharles: for that refer to the PLRM	21:41.38
acharles	Ah, file is the PS operator for opening a file	21:42.16
	that makes sense.	21:42.26
kens	yes exactly. It will leave a file object on the stack, which is then consumed by the runpdf procedure	21:42.41
ray_laptop	darn, I forgot the ) after the filename.pdf and kens forgot the "file" operator, but acharles, I assume you get the idea	21:42.49
kens	Hey, it's late here :)	21:43.01
acharles	I do	21:43.45
*kens*	thinks I'm doing well to be making any sense at all....	21:44.16
acharles	I only first read the PLRM on Friday and I'm not used to stack-based languages. But I think I'm learning fast. :P	21:44.53
kens	If you only want to process PDF files, why not use MuPDF ?	21:45.29
ray_laptop	acharles: so if you want to use "runpdf", if the input file is NOT PDF, it will confuse the pdf_main.ps code that is trying to open it as a PDF and won't expose you to accidentally executing PS	21:45.30
kens	I wouldn't say it will confuse it exactly, it will reject it as an invalid and unfixable PDF file	21:46.16
ray_laptop	e.g., gs -c "(examples/colorcir.ps) (r) file runpdf quit"	21:48.27
	gives: Error: /syntaxerror in pdfopen	21:48.43
	acharles: well, you don't have to read ALL 912 pages -- just the first 700 or so ;-)	21:49.43
acharles	Does MuPDF offer pdf compression?	21:49.44
kens	I still think that if you don't need PostScript (or PCL) input, MuPDF is probably more appropriate.	21:49.45
ray_laptop	acharles: yes	21:49.50
kens	acharles ah, you want to modify the PDF files ?	21:49.59
ray_laptop	acharles: but pdf output from mupdf is rather limited	21:50.39
*kens*	was assuming rendering the PDF files was the goal	21:50.39
acharles	read an input file and create an output file that contains the same pdf, but linearized and compressed (perhaps with lower quality)	21:51.03
kens	Currently Ghostscript has more options for doing that.	21:51.21
ray_laptop	acharles: I don't know which (if any) mupdf can do,	21:51.41
acharles	And the runpdf command gives me an error about invalid file access, which I assume is due to using -dSAFER	21:51.54
kens	MuPDF can compress and linearize the file (though linearization is pointless) but I don't think it can currently do things like downsample images or subset fonts	21:52.02
	acharles yes, it will be.	21:52.15
	You only really need to worry about -dSAFER if you are using PostScript, PDF has no file operators	21:52.56
	Umm actually that's not totally true.	21:53.09
	It can link to other files.	21:53.17
	other PDF files I should say	21:53.24
	Anyway, I have to be off. Got to go and feed the cat	21:54.16
	Goodnight all	21:54.21
acharles	Night	21:54.30
	Thanks	21:54.33
ray_laptop	acharles: I am in PST, so I'll be around for a while yet	21:55.08
acharles	I'm also PST	21:55.59
ray_laptop	acharles: -dSAFER will limit the files you can read and write to.	21:56.01
acharles	Can I use -dSAFER and read from the pdf input file?	21:56.48
ray_laptop	if you use -dDELAYSAFER and open the input file, such as with (filename.pdf) (r) file, then you can use .setsafe to go into SAFER mode before running the file (with "run" or "runpdf")	21:57.31
	the filenames named on the command line as arguments are automatically allowed in SAFER mode	21:58.14
acharles	And we use -dColorImageResolution and -dSubsetFonts.	21:58.14
	So, I guess MuPDF isn't an option	21:58.25
ray_laptop	acharles: yes, those are GS options (pdfwrite options)	21:58.35
	acharles: the use of .setsafe is in doc/Language.htm, which also discusses PermitFileReading, PermitFileWriting, etc.	21:59.38
acharles	Should SAFER prevent using the status command on files when processing PostScript files? (unrelated to pdf processing)	22:00.40
	so, I'm running `gs -dSAFER -dDELAYSAFER -c "(file.pdf) (r) file .setsafe runpdf" -f`	22:04.28
ray_laptop	acharles: that is a question	22:04.29
acharles	It seems to work.	22:04.32
	And it gives me an error if I give it a PS file.	22:04.52
ray_laptop	hmm... I need to look into SAFER mode. It isn't doing what I expect (at least on Windows). I wonder if it is bitrotted	22:07.18
	this is NOT good	22:10.39
can-of-bees	having a hard time googling this -- is there a way for ghostscript to return the version of a PDF? e.g. can I feed gs a PDF and have it tell me if it is PDF/A?	22:12.30
	thanks in advance	22:12.40
ray_laptop	can-of-bees: not currently. It is contained in the XML Metadata, but our toolbin/pdf_info.ps doesn't currently dump any of the Metadata	23:08.07
	it is possible to write PS (or extend pdf_info.ps) to allow you to dump all or part of the Metadata	23:08.44
	The Metadata object is in the Catalog object (the document Root object from the trailer)	23:10.43
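Once the Metadata stream has been pulled out of the Catalog and decompressed, the PDF/A claim can be read from it. The sketch below assumes the XMP packet is already available as a C string, and relies on the PDF/A identification schema (the pdfaid:part property, from ISO 19005), which the chat does not mention. The function pdfa_part_from_xmp is hypothetical and uses naive text search rather than an XML parser. The property only records what the file claims; it does not prove the file actually conforms.

```c
#include <stdlib.h>
#include <string.h>

/* Return the PDF/A part number (1, 2, 3, ...) declared in an XMP
   packet, or 0 if none is found. Handles the two common spellings:
     pdfaid:part="1"                 (attribute form)
     <pdfaid:part>1</pdfaid:part>    (element form)
   Naive search: whitespace around '=' or odd namespace prefixes are
   not handled; a real tool should parse the XML properly. */
int pdfa_part_from_xmp(const char *xmp)
{
    const char *p = strstr(xmp, "pdfaid:part");
    while (p) {
        const char *q = p + strlen("pdfaid:part");
        if (*q == '=' && (q[1] == '"' || q[1] == '\''))
            return atoi(q + 2);             /* attribute value */
        if (*q == '>')
            return atoi(q + 1);             /* element content */
        p = strstr(q, "pdfaid:part");       /* keep looking */
    }
    return 0;
}
```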
acharles	ray_laptop: Did you determine if SAFER is working as intended?	23:48.01
ray_laptop	acharles: haven't had a chance to look into it yet	23:49.30
	sorry	23:49.34

Log of #ghostscript at irc.freenode.net.