Ghostscript IRC logs

20200831
myopia	reading the pdf specs, how the pdf interpreter resolves indirect object references has me perplexed. say, assuming the interpreter reads from beginning to end, and we are now at "1 0 obj", which references "2 0 R" in its Length entry. however, "1 0 obj" contains a purely binary data stream that includes multiple instances of "endobj" and "endstream" before the actual PDF endobj/endstream. to make matters worse, it also features		09:00.16
	"2 0 obj" after some of the endobj/endstream. then, how is the pdf interpreter supposed to know where the real "2 0 obj" begins?		09:00.16
chrisl	myopia: That's what the xref table is for		09:02.20
myopia	erh... what if the octet stream also included "xref", "trailer" and "startxref"...		09:03.12
	for "1 0 obj", that is		09:03.21
	then shouldn't the spec define a special offset that gives the real xref?		09:03.52
chrisl	It does		09:04.01
myopia	can you give the section number in the form of x.y.z that deals with this?		09:06.36
chrisl	I only have the PDF 1.7 ref manual currently to hand, but in there, you should read 3.4.4 ("File Trailer")		09:08.38
myopia	my thanks		09:09.19
	gotta say I prefer random access dictionaries at the front, with padding to leave room for growth		09:12.25
ator	myopia: also the /Length entry in the object states how many bytes are between the stream and endstream keywords		09:24.11
	myopia: to answer your #mupdf question from yesterday, "mutool convert" should be able to convert JP2 images to PDF.		09:24.38
myopia	except that Length could refer to a "next" object which again goes to the "3.4.4 File Trailer"		09:26.10
ator	myopia: correct. you need the xref in order to read a PDF file correctly in all cases.		09:26.50
	most times you can recover a PDF file that has a missing xref table, but as you say, if the stream has "endstream" in the content then you're screwed.		09:27.38
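The difference between reading a stream by its /Length and scanning for the "endstream" keyword can be sketched as follows. This is a minimal illustration, not how Ghostscript or MuPDF do it: the function names and the toy file are made up for the example, and /Length is assumed to be a direct integer that is already known (an indirect /Length such as "2 0 R" would first have to be resolved through the xref table).

```python
import re

STREAM_KEYWORD = re.compile(rb"stream(\r\n|\n)")

def read_stream_by_length(data: bytes, obj_offset: int, length: int) -> bytes:
    """Return exactly `length` payload bytes of the object at obj_offset."""
    m = STREAM_KEYWORD.search(data, obj_offset)
    if m is None:
        raise ValueError("no stream keyword after object offset")
    start = m.end()
    return data[start:start + length]

def read_stream_by_scanning(data: bytes, obj_offset: int) -> bytes:
    """Naive recovery: stop at the first 'endstream'. May truncate the payload."""
    m = STREAM_KEYWORD.search(data, obj_offset)
    if m is None:
        raise ValueError("no stream keyword after object offset")
    start = m.end()
    end = data.index(b"endstream", start)
    return data[start:end]

# Toy object whose payload contains misleading keywords (hypothetical data).
payload = b"abc endstream endobj 2 0 obj xyz"
pdf = (b"1 0 obj\n<< /Length %d >>\nstream\n" % len(payload)
       + payload + b"\nendstream\nendobj\n")

by_length = read_stream_by_length(pdf, 0, len(payload))
by_scanning = read_stream_by_scanning(pdf, 0)
```

Here `by_length` recovers the whole payload, while `by_scanning` stops at the fake keyword inside it.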
myopia	ator: and thanks for the file conversion part, but I later tried to write my own script for the express purpose of generating a minimum pdf file		09:27.56
ator	but most PDF streams are compressed, so the likelihood of that is extremely small		09:27.57
	mutool convert will write pretty much a minimal PDF file		09:28.58
	the only thing it could write smaller is to inline the resources dictionary		09:29.23
myopia	continuing on xrefs, a binary file format could also solve the problem of xrefs. say, we have a file that is an array of records, each of the form <record type><record length><further data>. let's say the first record has to be a master record dictionary which keeps a list of offsets of later records. this special record has the form <record type = 0><record length><record next offset><further data>. the <record next offset> allows the file to append at		09:39.05
	the end of the file for the purpose of incremental updates. and constructing a full master dictionary can entail reading a linked list of records		09:39.05
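One possible reading of the record layout proposed above, as a rough sketch only. Every concrete detail here is an assumption made for illustration: the 1-byte type, the 4-byte little-endian length (counting the bytes after the header), the 8-byte offsets, the use of 0 as "no next master record", and all function names.

```python
import struct

HEADER = struct.Struct("<BI")   # assumed: 1-byte record type, 4-byte payload length
OFFSET = struct.Struct("<Q")    # assumed: 8-byte absolute file offset

def pack_record(rtype: int, payload: bytes) -> bytes:
    return HEADER.pack(rtype, len(payload)) + payload

def pack_master(next_offset: int, offsets: list) -> bytes:
    """Master record (type 0): next-master offset, then a list of record offsets."""
    body = OFFSET.pack(next_offset) + b"".join(OFFSET.pack(o) for o in offsets)
    return pack_record(0, body)

def read_master_chain(data: bytes) -> list:
    """Follow the linked list of master records starting at offset 0."""
    found, pos, seen = [], 0, set()
    while True:
        if pos in seen:
            raise ValueError("cycle in master record chain")
        seen.add(pos)
        rtype, length = HEADER.unpack_from(data, pos)
        if rtype != 0:
            raise ValueError("expected a master record")
        body = data[pos + HEADER.size:pos + HEADER.size + length]
        (next_offset,) = OFFSET.unpack_from(body, 0)
        found += [o for (o,) in OFFSET.iter_unpack(body[OFFSET.size:])]
        if next_offset == 0:      # assumed sentinel: end of chain
            return found
        pos = next_offset

def build_demo() -> bytes:
    """Two master records, each listing one data record (type 1)."""
    master_size = HEADER.size + OFFSET.size * 2   # next offset + one entry
    rec_a = pack_record(1, b"hello")
    rec_b = pack_record(1, b"world")
    a_off = master_size
    m2_off = a_off + len(rec_a)
    b_off = m2_off + master_size
    return pack_master(m2_off, [a_off]) + rec_a + pack_master(0, [b_off]) + rec_b

demo = build_demo()
record_offsets = read_master_chain(demo)
```

Note that in this layout, appending a new master record means patching the next-offset field of the previous one in place, so the file is not strictly append-only.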
kens	Feel free to propose that to Adobe as a replacement for PDF.		09:39.31
	In the meantime we have to live with what the specification says		09:39.44
ator	myopia: the xrefs are a binary file format, with fixed length records. it also happens to be mostly-human-readable.		09:48.06
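To illustrate the fixed-length records: in a classic xref table, each entry is 20 bytes (a 10-digit offset, a 5-digit generation number, an "n" or "f" flag, and a 2-byte end-of-line), so entries can be indexed without scanning. A minimal sketch, with made-up names and sample data, that ignores the subsection headers and xref streams:

```python
from typing import List, NamedTuple

ENTRY_SIZE = 20  # classic xref entries are exactly 20 bytes, end-of-line included

class XrefEntry(NamedTuple):
    offset: int       # byte offset of the object (next free object number for 'f')
    generation: int
    in_use: bool

def parse_xref_entries(block: bytes, count: int) -> List[XrefEntry]:
    """Parse `count` consecutive entries from the start of `block`."""
    entries = []
    for i in range(count):
        rec = block[i * ENTRY_SIZE:(i + 1) * ENTRY_SIZE]
        if len(rec) != ENTRY_SIZE:
            raise ValueError("truncated xref entry")
        kind = rec[17:18]
        if kind not in (b"n", b"f"):
            raise ValueError("bad xref entry type")
        entries.append(XrefEntry(int(rec[0:10]), int(rec[11:16]), kind == b"n"))
    return entries

# Hypothetical sample: one free entry followed by two in-use entries.
sample = (b"0000000000 65535 f \n"
          b"0000000017 00000 n \n"
          b"0000000081 00000 n \n")
entries = parse_xref_entries(sample, 3)
```

Because every entry has the same size, entry number i starts at byte `i * ENTRY_SIZE` of the subsection.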
myopia	(and the text-binary pdf format could assign/mandate an entry "TrailerOffset" or "XRefOffset" alongside "Version" and "Producer" without breaking memory caching order by reading from the end)		09:51.16
ator	startxref at the end of the file has the offset of the first xref section to read (the most recent one); the trailer at the end of that xref may have a /Prev which links to earlier xref sections. that's what's used to link together incremental updates.		09:52.37
	you need to start reading from the end, because when incrementally updating you don't want to change what's already there, you just append.		09:53.34
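Reading from the end and following /Prev can be sketched roughly like this. It is a simplified illustration with made-up function names: it only handles classic xref tables (not xref streams), assumes the trailer dictionary has no nested dictionaries, and the demo data is a toy file that is not a complete, valid PDF.

```python
import re

def find_startxref(data: bytes) -> int:
    """Return the offset that follows the last 'startxref' near the end of file."""
    pos = data.rfind(b"startxref", max(0, len(data) - 1024))
    if pos < 0:
        raise ValueError("startxref not found near end of file")
    m = re.compile(rb"startxref\s+(\d+)").match(data, pos)
    if m is None:
        raise ValueError("no offset after startxref")
    return int(m.group(1))

def xref_chain(data: bytes) -> list:
    """Offsets of all xref sections, newest first, following /Prev links."""
    offsets = []
    offset = find_startxref(data)
    while offset is not None and offset not in offsets:  # guard against loops
        offsets.append(offset)
        t = data.find(b"trailer", offset)
        if t < 0:
            break
        end = data.find(b">>", t)
        if end < 0:
            end = len(data)
        m = re.search(rb"/Prev\s+(\d+)", data[t:end])
        offset = int(m.group(1)) if m else None
    return offsets

def build_demo():
    """Toy two-revision file; returns (data, first_xref, second_xref)."""
    data = b"%PDF-1.7\n"
    obj1 = len(data)
    data += b"1 0 obj\n<< >>\nendobj\n"
    first = len(data)
    data += b"xref\n0 2\n0000000000 65535 f \n" + b"%010d 00000 n \n" % obj1
    data += b"trailer\n<< /Size 2 >>\n"
    data += b"startxref\n" + str(first).encode() + b"\n%%EOF\n"
    # Incremental update: only append, never touch the bytes above.
    obj2 = len(data)
    data += b"2 0 obj\n<< >>\nendobj\n"
    second = len(data)
    data += b"xref\n2 1\n" + b"%010d 00000 n \n" % obj2
    data += b"trailer\n<< /Size 3 /Prev " + str(first).encode() + b" >>\n"
    data += b"startxref\n" + str(second).encode() + b"\n%%EOF\n"
    return data, first, second

demo, first, second = build_demo()
chain = xref_chain(demo)
```

The chain starts at the xref section written by the latest update and walks back to the original one.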
myopia	(you could also pad the number with whitespace characters, say "/TrailerOffset 200 SP SP SP... CR LF", but once we're on the boat of reading from the end, this is becoming a pattern of convenience, I guess, and this is probably what gives PDF the random access pattern)		09:57.43

Log of #ghostscript at irc.freenode.net.