Log of #ghostscript at irc.freenode.net.

20200831
myopia reading the pdf specs, how the pdf interpreter resolves indirect object references has me perplexed. say, assuming the interpreter reads from beginning to end, and we are now at "1 0 obj", which references "2 0 R" in its Length entry. however, "1 0 obj" contains a purely binary data stream which includes multiple instances of "endobj" and "endstream" before the actual PDF endobj/endstream. to make matters worse, it also features 09:00.16 
  "2 0 obj" after some of the endobj/endstream. then, how is the pdf interpreter supposed to know where the real "2 0 obj" begins?09:00.16 
chrisl myopia: That's what the xref table is for09:02.20 
myopia erh... what if the octet stream also included "xref", "trailer" and "startxref"...09:03.12 
  for "1 0 obj", that is09:03.21 
  then shouldn't the spec define a special offset that gives the *real* xref?09:03.52 
chrisl It does09:04.01 
myopia can you give the section number in the form of x.y.z that deals with this?09:06.36 
chrisl I only have the PDF 1.7 ref manual currently to hand, but in there, you should read 3.4.4 ("File Trailer")09:08.38 
myopia my thanks09:09.19 
  gotta say I prefer random access dictionaries at the front, with padding to leave room for growth09:12.25 
ator myopia: also the /Length entry in the object states how many bytes are between the stream and endstream keywords09:24.11 
  myopia: to answer your #mupdf question from yesterday, "mutool convert" should be able to convert JP2 images to PDF.09:24.38 
myopia except that Length could refer to a "next" object which again goes to the "3.4.4 File Trailer"09:26.10 
ator myopia: correct. you need the xref in order to read a PDF file correctly in all cases.09:26.50 
  most times you can recover a PDF file that has a missing xref table, but as you say, if the stream has "endstream" in the content then you're screwed.09:27.38 
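[Editor's note] The point about /Length and embedded "endstream" bytes can be illustrated with a minimal Python sketch. It is not how Ghostscript or MuPDF read streams; the function names are hypothetical, and it assumes the length is already known as a number (for an indirect /Length such as "2 0 R" it would first have to be resolved through the xref table, as discussed above).

```python
def _payload_start(pdf: bytes, obj_offset: int) -> int:
    """Return the index of the first payload byte after the 'stream' keyword."""
    # Assumption: the object's dictionary does not itself contain b"stream".
    start = pdf.index(b"stream", obj_offset) + len(b"stream")
    # The keyword is followed by CRLF or a lone LF before the data begins.
    if pdf[start:start + 2] == b"\r\n":
        return start + 2
    if pdf[start:start + 1] == b"\n":
        return start + 1
    return start


def read_stream_by_length(pdf: bytes, obj_offset: int, length: int) -> bytes:
    """Read exactly `length` payload bytes, ignoring whatever they contain."""
    start = _payload_start(pdf, obj_offset)
    return pdf[start:start + length]


def read_stream_by_scanning(pdf: bytes, obj_offset: int) -> bytes:
    """Naive recovery-style read: stop at the first b'endstream' seen."""
    start = _payload_start(pdf, obj_offset)
    return pdf[start:pdf.index(b"endstream", start)]


# A payload that happens to contain PDF keywords, as in the question above.
payload = b"junk endstream endobj 2 0 obj junk"
pdf = (b"1 0 obj\n<< /Length " + str(len(payload)).encode() + b" >>\nstream\n"
       + payload + b"\nendstream\nendobj\n")

by_length = read_stream_by_length(pdf, 0, len(payload))
by_scanning = read_stream_by_scanning(pdf, 0)
```

Here the length-based read returns the whole payload, while the scanning read is cut short at the "endstream" that sits inside the data.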
myopia ator: and thanks for the file conversion part, but I later tried to write my own script for the express purpose of generating a *minimum* pdf file09:27.56 
ator but most PDF streams are compressed, so the likelihood of that is extremely small09:27.57 
  mutool convert will write pretty much a minimal PDF file09:28.58 
  the only thing it could write smaller is to inline the resources dictionary09:29.23 
myopia continuing on xrefs, a binary file format could also solve the problem of xrefs. say, we have a file that is an array of records, each of the form <record type><record length><further data>. let's say the first record has to be a master record dictionary which keeps a list of offsets of later records. this special record has the form <record type = 0><record length><record next offset><further data>. the <record next offset> allows appending at 09:39.05 
  the end of the file for the purpose of incremental updates. and constructing a full master dictionary can entail reading a linked list of records09:39.05 
kens Feel free to propose that to Adobe as a replacement for PDF.09:39.31 
  In the meantime we have to live with what the specification says09:39.44 
ator myopia: the xrefs are a binary file format, with fixed length records. it also happens to be mostly-human-readable.09:48.06 
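[Editor's note] As a rough illustration of the fixed-length records mentioned here: in a classic (non-stream) cross-reference table each entry is 20 bytes, made of a 10-digit byte offset, a 5-digit generation number, an "n" (in use) or "f" (free) flag, and a 2-byte end-of-line. The Python sketch below parses such entries; the names XrefEntry and parse_xref_entries are hypothetical, and the input is assumed to start at the first entry of a subsection, after its "first count" header line.

```python
from typing import List, NamedTuple

ENTRY_SIZE = 20  # bytes per classic xref entry, including the 2-byte EOL


class XrefEntry(NamedTuple):
    offset: int       # byte offset of the object (next free object if free)
    generation: int   # generation number
    in_use: bool      # True for 'n' entries, False for 'f' entries


def parse_xref_entries(table: bytes, count: int) -> List[XrefEntry]:
    """Parse `count` fixed-width entries from the start of `table`."""
    entries = []
    for i in range(count):
        # Fixed width means entry i always sits at i * ENTRY_SIZE.
        rec = table[i * ENTRY_SIZE:(i + 1) * ENTRY_SIZE]
        if len(rec) != ENTRY_SIZE:
            raise ValueError("truncated xref entry %d" % i)
        kind = rec[17:18]
        if kind not in (b"n", b"f"):
            raise ValueError("bad entry type in xref entry %d" % i)
        entries.append(XrefEntry(int(rec[0:10]), int(rec[11:16]), kind == b"n"))
    return entries


# Three entries as they would appear in a file: readable text, fixed width.
sample = (b"0000000000 65535 f \n"
          b"0000000017 00000 n \n"
          b"0000000081 00000 n \n")
entries = parse_xref_entries(sample, 3)
```

Because every record has the same size, a reader can jump straight to the entry for object N within a subsection without parsing the entries before it.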
myopia (and the text-binary pdf format could assign/mandate an entry "TrailerOffset" or "XRefOffset" alongside "Version" and "Producer" without breaking memory caching order by reading from the end)09:51.16 
ator startxref at the end of the file has the offset of the first xref section, the trailer at the end of the xref may have a /Prev which links to other xref sections. that's what's used to link together incremental updates.09:52.37 
  you need to start reading from the end, because when incrementally updating you don't want to change what's already there, you just append.09:53.34 
myopia (you could also pre-pad the number of characters with whitespaces, say "/TrailerOffset 200 SP SP SP... CR LF", but once we were onto the boat of reading from the end, this is becoming a pattern of convenience, I guess, and this is probably what gives PDF the random access pattern)09:57.43 
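[Editor's note] The read-from-the-end scheme ator describes (startxref, then /Prev links between trailers) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a real parser: it handles only classic xref tables (no cross-reference streams), assumes each trailer dictionary contains no nested dictionaries, and the names find_startxref and xref_chain are hypothetical.

```python
import re
from typing import List


def find_startxref(pdf: bytes, tail_size: int = 1024) -> int:
    """Return the offset stored after the last 'startxref' near the file end."""
    tail = pdf[-tail_size:]
    idx = tail.rfind(b"startxref")
    if idx < 0:
        raise ValueError("no startxref keyword found near end of file")
    m = re.match(rb"startxref\s+(\d+)", tail[idx:])
    if not m:
        raise ValueError("startxref is not followed by an offset")
    return int(m.group(1))


def xref_chain(pdf: bytes) -> List[int]:
    """List xref section offsets, newest first, by following /Prev links."""
    offsets: List[int] = []
    offset = find_startxref(pdf)
    while offset not in offsets:  # guard against a /Prev cycle
        offsets.append(offset)
        t = pdf.find(b"trailer", offset)
        if t < 0:
            break
        end = pdf.find(b">>", t)
        # Assumption: the first '>>' after 'trailer' closes the dictionary.
        trailer = pdf[t:end] if end >= 0 else pdf[t:]
        m = re.search(rb"/Prev\s+(\d+)", trailer)
        if not m:
            break  # oldest section reached
        offset = int(m.group(1))
    return offsets
```

Each incremental update appends a new xref section, trailer and startxref, so the chain returned here runs from the most recent update back to the original file.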