MuPDF IRC logs

		20180105
tor8	Robin_Watts: 3 commits for review on tor/master	10:58.21
Robin_Watts	tor8: Including the random one?	11:44.26
tor8	all excluding the freetype update one	11:44.44
Robin_Watts	I need to get the fix for cody in, and that's gated on the random one.	11:44.45
tor8	yes, that includes the random one	11:44.55
Robin_Watts	I see it, thanks.	11:45.34
	yeah, those 3 look good. ta.	11:46.20
	I shall rebase the ones on robin/master then.	11:46.29
	So there are 3 commits of mine on top of yours in robin/master ready for review.	11:47.47
tor8	Robin_Watts: Fix "being able to search for redacted text" bug. LGTM	11:49.41
	Robin_Watts: "Enable saving of encrypted PDF files." is unchanged, right?	11:52.05
Robin_Watts	yes.	11:52.12
	but it too was gated on the random thing I think.	11:52.20
tor8	The "Add ascii option to PDF object output." is not what we discussed, and it's broken too.	11:52.34
	the first two LGTM, but hold off on the "Add ascii option" commit	11:52.59
	I thought you were making the 'if not ascii' option print raw unescaped binary strings to save space	11:55.08
	it looks now like 'if ascii' we print ALL strings as <hexstrings> instead, which is not what we discussed (nor do I see the value of such a behavior)	11:56.12
Robin_Watts	tor8: ok, then we were at cross purposes.	11:57.46
tor8	Anyway, after thinking about it over my vacation, I am happy with encrypted strings going out as your current "Enable saving" patch does it.	11:59.01
Robin_Watts	ok, so I'll put that in and look again at the isascii patch.	11:59.26
tor8	I thought you were looking to squeeze more bytes out of it by saving unescaped binary strings.	11:59.51
Robin_Watts	tor8: I wasn't aware that unescaped strings were actually an option.	12:00.12
tor8	where if the 'isascii' is false, fmt_str would not do octal escapes	12:00.15
Robin_Watts	I'll need to reread that bit of the spec.	12:00.38
tor8	aha. then I see where we may have talked across each other, yes.	12:00.38
	you can have all bytes in a PDF string except '(', ')', and '\'	12:01.26
	parentheses must be balanced (or escaped) and backslashes must be escaped for obvious reasons	12:01.45
Robin_Watts	tor8: Right, so it's simple enough. I'll put that on the list.	12:03.26
tor8	so we should end up with 3 ways to write strings: hexstrings, raw strings, and escaped strings. raw strings if !ascii, escaped/hex strings whichever is smaller if ascii.	12:04.18
	and most things should default to ascii, IMO.	12:04.53
	like in the JNI bindings	12:05.04
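The policy tor8 outlines above (raw strings if !ascii, otherwise whichever of the escaped or hex forms is smaller) can be sketched roughly as follows. This is an illustrative Python sketch, not MuPDF's fmt_str; the names pdf_string, _escape_raw and _escape_ascii are hypothetical. One assumption goes beyond the log: a bare carriage return is also escaped in raw mode, because a PDF reader treats an unescaped end-of-line marker inside a literal string as a line feed.

```python
def _escape_raw(data: bytes) -> bytes:
    """Escape only what a raw (binary) literal string cannot carry as-is."""
    out = bytearray()
    for b in data:
        if b in b"()\\":
            out += b"\\" + bytes([b])   # always escape; simpler than balancing parens
        elif b == 0x0D:
            out += b"\\r"               # assumption beyond the log: keep CR round-trippable
        else:
            out.append(b)
    return bytes(out)


def _escape_ascii(data: bytes) -> bytes:
    """Escape so that the output contains printable ASCII only."""
    out = bytearray()
    for b in data:
        if b in b"()\\":
            out += b"\\" + bytes([b])
        elif 32 <= b <= 126:
            out.append(b)
        else:
            out += b"\\%03o" % b        # three-digit octal escape, e.g. \377
    return bytes(out)


def pdf_string(data: bytes, ascii_only: bool = True) -> bytes:
    """Serialize bytes as a PDF string object (hypothetical helper)."""
    if not ascii_only:
        return b"(" + _escape_raw(data) + b")"
    escaped = b"(" + _escape_ascii(data) + b")"
    hexed = b"<" + data.hex().encode("ascii") + b">"
    # In ASCII mode pick the shorter form; ties go to the escaped form.
    return escaped if len(escaped) <= len(hexed) else hexed
```

Mostly-text strings come out in the escaped form, while mostly-binary strings switch to hex, since each octal escape costs four bytes against two for a hex pair.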
Robin_Watts	tor8: I believe that's how I have it set up.	12:07.02
Guest66018	sebras, thanks that cleared some of it up	15:08.57
	i guess what i'm trying to do is figure out why the example code i was given is producing entirely different output	15:09.31
	here is what the code i was given produces: https://pastebin.com/Hyp5auPA	15:09.39
	and here is what mutool show grep produces: https://pastebin.com/Bwmd5Svu	15:10.49
	and I can't figure out how to reconcile them	15:10.55
	ideally I need to produce the same output as the code I was given gets, but it appears to be traversing things differently and getting different results	15:11.25
	both of those were examining the same file btw	15:11.40
sebras	Guest66018: the python code you show with the /Pages/Kids/Parent/MediaBox style "paths" is indeed resolving PDF object references and trying to express how these objects are related.	15:12.21
	Guest66018: in the greppable output, if you look at object 1 (search for :1:) you can see that there is a /Metadata entry in that dictionary.	15:14.04
	Guest66018: its value is 81 0 R which means that is an object reference to object 81.	15:14.23
Guest66018	right, object 1 maps to the first few paths	15:14.24
sebras	Guest66018: next search for :81:	15:14.27
Guest66018	but after that it seems to diverge	15:14.33
sebras	in that dictionary you have Length Subtype and Type entries.	15:14.44
	now, in the python output if you look at e.g. /Metadata/PDFStream/Length you can see that it started in object 1, found the Metadata entry, realized that the Metadata entry points to an object which is a PDFStream and then lists the entries in that stream object's dictionary part.	15:15.41
Guest66018	okay, i follow that	15:16.24
	so if i am using the mupdf library, i can just extract the dictionary from the object?	15:16.40
sebras	how does pdf_paths.py work in detail and what objects does it start with? I don't know. perhaps object 1 is the /Root object in the trailer of the PDF..? does pdf_paths.py ignore some objects? I don't know. :)	15:16.46
	Guest66018: you can manipulate objects programmatically from C (or Java) yes. so you ought to be able to list the entries in objects.	15:17.32
Guest66018	pdf_paths does ignore some objects yes	15:17.53
sebras	Guest66018: pdf_trailer() would e.g. give you the trailer that contains the /Root entry which is presumably object :1: in your particular file.	15:18.03
Guest66018	it starts with a pdfquery.PDFQuery(name).doc.catalog object	15:18.38
sebras	Guest66018: pdf_paths.py also makes up fake object names in its paths, like "PDFStream".	15:18.47
Guest66018	then after deque-ing it, walks it from there	15:18.48
	ah, yes, it is doing exactly that	15:19.16
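The walk discussed above (start at the catalog, queue it, follow each indirect reference, and emit one path per leaf) might look something like the sketch below. It is not pdf_paths.py's actual code, which nobody in the channel had read in detail; Ref, walk_paths and OBJECTS are hypothetical names. The object contents are made up for illustration, apart from object 1 carrying a /Metadata entry that points at object 81 with Length, Subtype and Type entries. It also leaves out the synthetic "PDFStream" path segment.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """An indirect reference such as '81 0 R' (hypothetical stand-in type)."""
    num: int
    gen: int = 0


def walk_paths(objects, root_num):
    """Breadth-first walk from the root object, returning one path per leaf."""
    paths = []
    seen = set()                       # object numbers already expanded
    queue = deque([("", Ref(root_num))])
    while queue:
        path, value = queue.popleft()
        if isinstance(value, Ref):
            if value.num in seen:      # e.g. /Parent pointing back up the tree
                paths.append(path)
                continue
            seen.add(value.num)
            value = objects[value.num]
        if isinstance(value, dict):
            for key, child in value.items():
                queue.append((path + "/" + key, child))
        elif isinstance(value, list):
            for index, child in enumerate(value):
                queue.append((path + "/" + str(index), child))
        else:
            paths.append(path)
    return paths


# Made-up sample objects; only the 1 -> 81 /Metadata link mirrors the log.
OBJECTS = {
    1: {"Type": "/Catalog", "Metadata": Ref(81), "Pages": Ref(2)},
    2: {"Type": "/Pages", "Kids": [Ref(3)]},
    3: {"Type": "/Page", "Parent": Ref(2)},
    81: {"Type": "/Metadata", "Subtype": "/XML", "Length": 1234},
}

for p in walk_paths(OBJECTS, 1):
    print(p)
```

The seen set is what keeps back-references such as /Parent from sending the walk into a loop; a tool that tracks visited objects differently, or skips some objects, will produce a different list of paths for the same file.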
	thanks, this helps a lot	15:20.37
	i may have other questions if that's alright	15:20.46
sebras	Guest66018: ok, so you need to open the document with something like fz_open_document(), then call pdf_specifics() to access the PDF part of the document, then call pdf_trailer(). After that you might need pdf_dict_get_key() to iterate, or pdf_dict_gets() if you already know the name of the thing.	15:20.51
Guest66018	also, "81 0 R"	15:21.07
	81 is the object index, what is 0 and R?	15:21.13
sebras	0 is the generation number. think of it like a version number. they used to be used when documents were updated, but recent PDFs don't really make use of them.	15:22.19
	R means it is an indirect object reference.	15:22.27
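As a small illustration of the "81 0 R" notation explained above, the sketch below splits such a token into its object number and generation number. parse_ref is a hypothetical helper written for this example, not part of MuPDF or pdf_paths.py.

```python
import re


def parse_ref(token: str):
    """Return (object_number, generation) for a token like '81 0 R', else None."""
    m = re.fullmatch(r"\s*(\d+)\s+(\d+)\s+R\s*", token)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


print(parse_ref("81 0 R"))
```

The generation number is kept in the result even though it is 0 in most recent files, since a reference is only fully identified by the pair.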
	you also have pdf_print_obj() (and fz_stdout()) if you want to print an object in its entirety (note that indirect references are not resolved)	15:23.25
	Guest66018: I hope this will get you started. :)	15:23.45
Guest66018	thanks, this is much better start than where i was	15:24.39
sebras	Guest66018: all of this presumes you are writing it in C.	15:24.52
Guest66018	yep, in C	15:24.57
sebras	Guest66018: there ought to be similar calls in javascript which you can try using mutool run documented over at https://mupdf.com/docs/manual-mutool-run.html if you like.	15:25.15
Guest66018	thanks, i'll take a look at that too	15:31.20
hellion	Hello! Has anyone on here successfully added mupdf to heroku for use with rails 5.2 ActiveStorage?	17:07.18
Robin_Watts	hellion: Are you familiar with the works of Gary Larson? :)	17:08.49
	https://c1.staticflickr.com/1/47/153603564_7281ad0588.jpg	17:09.26
	"blah blah blah blah MuPDF blah blah blah" :)	17:09.45
hellion	I am not familiar	17:13.54
Robin_Watts	He drew a cartoon strip called "The Far Side"	17:14.26
hellion	The Far Side....I do know him	17:14.39
Robin_Watts	My point was that most of that question went straight over my head :)	17:15.10
hellion	And, yet here you are....in the #mupdf discussion :)	17:15.39
Robin_Watts	MuPDF was the one bit I understood.	17:16.01
hellion	perhaps I should move over to an ActiveStorage discussion	17:16.10
Robin_Watts	I know nothing about heroku or rails.	17:16.22
	Best of luck.	17:16.37
hellion	thanks!	17:16.41

Log of #mupdf at irc.freenode.net.