Log of #mupdf at irc.freenode.net.

Date: 2018/01/05
tor8 Robin_Watts: 3 commits for review on tor/master10:58.21 
Robin_Watts tor8: Including the random one?11:44.26 
tor8 all excluding the freetype update one11:44.44 
Robin_Watts I need to get the fix for cody in, and that's gated on the random one.11:44.45 
tor8 yes, that includes the random one11:44.55 
Robin_Watts I see it, thanks.11:45.34 
  yeah, those 3 look good. ta.11:46.20 
  I shall rebase the ones on robin/master then.11:46.29 
  So there are 3 commits of mine on top of yours in robin/master ready for review.11:47.47 
tor8 Robin_Watts: Fix "being able to search for redacted text" bug. LGTM11:49.41 
  Robin_Watts: "Enable saving of encrypted PDF files." is unchanged, right?11:52.05 
Robin_Watts yes.11:52.12 
  but it too was gated on the random thing I think.11:52.20 
tor8 The "Add ascii option to PDF object output." is not what we discussed, and it's broken too.11:52.34 
  the first two LGTM, but hold off on the "Add ascii option" commit11:52.59 
  I thought you were making the 'if not ascii' option print raw unescaped binary strings to save space11:55.08 
  it looks now like 'if ascii' we print ALL strings as <hexstrings> instead, which is not what we discussed (nor do I see the value of such a behavior)11:56.12 
Robin_Watts tor8: ok, then we were at cross purposes.11:57.46 
tor8 Anyway, after thinking about it over my vacation, I am happy with encrypted strings going out as your current "Enable saving" patch does it.11:59.01 
Robin_Watts ok, so I'll put that in and look again at the isascii patch.11:59.26 
tor8 I thought you were looking to squeeze more bytes out of it by saving unescaped binary strings.11:59.51 
Robin_Watts tor8: I wasn't aware that unescaped strings were actually an option.12:00.12 
tor8 where if the 'isascii' is false, fmt_str would not do octal escapes12:00.15 
Robin_Watts I'll need to reread that bit of the spec.12:00.38 
tor8 aha. then I see where we may have talked across each other, yes.12:00.38 
  you can have all bytes in a PDF string except '(', ')', and '\'12:01.26 
  parentheses must be balanced (or escaped) and backslashes must be escaped for obvious reasons12:01.45 
Robin_Watts tor8: Right, so it's simple enough. I'll put that on the list.12:03.26 
tor8 so we should end up with 3 ways to write strings: hexstrings, raw strings, and escaped strings. raw strings if !ascii, escaped/hex strings whichever is smaller if ascii.12:04.18 
  and most things should default to ascii, IMO.12:04.53 
  like in the JNI bindings12:05.04 
Robin_Watts tor8: I believe that's how I have it set up.12:07.02 
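
[Editor's sketch] The string-writing policy tor8 describes above (raw strings if !ascii, otherwise the smaller of escaped and hex) could look roughly like the following C. All function names here are hypothetical and this is not MuPDF's fmt_str. It also escapes CR in raw mode, which is an addition to what was said in the channel: the PDF specification reads a bare end-of-line inside a literal string back as LF. For simplicity it escapes every parenthesis rather than tracking balance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not MuPDF's fmt_str. */

/* Store one byte if a buffer was supplied; always count it. */
static void emit(char *out, size_t *k, int c)
{
    if (out)
        out[*k] = (char)c;
    (*k)++;
}

/* Write s as a (literal string). out may be NULL to measure only. */
static size_t write_literal(char *out, const unsigned char *s, size_t n, int ascii)
{
    size_t i, k = 0;
    emit(out, &k, '(');
    for (i = 0; i < n; i++) {
        int c = s[i];
        if (c == '(' || c == ')' || c == '\\') {
            emit(out, &k, '\\');
            emit(out, &k, c);
        } else if (c == '\r') {
            /* A bare CR would be read back as LF, so escape it. */
            emit(out, &k, '\\');
            emit(out, &k, 'r');
        } else if (ascii && (c < 32 || c > 126)) {
            /* ASCII mode: non-printable bytes become 3-digit octal escapes. */
            emit(out, &k, '\\');
            emit(out, &k, '0' + ((c >> 6) & 7));
            emit(out, &k, '0' + ((c >> 3) & 7));
            emit(out, &k, '0' + (c & 7));
        } else {
            emit(out, &k, c); /* raw mode passes every other byte through */
        }
    }
    emit(out, &k, ')');
    return k;
}

/* Write s as a <hexstring>. out may be NULL to measure only. */
static size_t write_hex(char *out, const unsigned char *s, size_t n)
{
    static const char digits[] = "0123456789abcdef";
    size_t i, k = 0;
    emit(out, &k, '<');
    for (i = 0; i < n; i++) {
        emit(out, &k, digits[s[i] >> 4]);
        emit(out, &k, digits[s[i] & 15]);
    }
    emit(out, &k, '>');
    return k;
}

/* Returns the number of bytes written (excluding the trailing NUL), or 0 if cap is too small. */
size_t sketch_write_pdf_string(char *out, size_t cap, const unsigned char *s, size_t n, int ascii)
{
    size_t lit = write_literal(NULL, s, n, ascii);
    size_t hex = write_hex(NULL, s, n);
    int use_hex = ascii && hex < lit; /* hex is only considered in ASCII mode */
    size_t need = use_hex ? hex : lit;

    assert(out != NULL);
    if (need + 1 > cap)
        return 0;
    if (use_hex)
        write_hex(out, s, n);
    else
        write_literal(out, s, n, ascii);
    out[need] = '\0'; /* convenience only; raw output may itself contain NUL bytes */
    return need;
}
```

With this sketch, mostly printable text stays a literal string in ASCII mode, while mostly binary data flips to a hexstring because each octal escape costs four bytes against two for hex.
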
Guest66018 sebras, thanks that cleared some of it up15:08.57 
  i guess what i'm trying to do is figure out why the example code i was given is producing entirely different output15:09.31 
  here is what the code i was given produces: https://pastebin.com/Hyp5auPA15:09.39 
  and here is what mutool show grep produces: https://pastebin.com/Bwmd5Svu15:10.49 
  and I can't figure out how to reconcile them15:10.55 
  ideally I need to produce the same output as the code I was given gets but it appears to be traversing things differently and getting different results15:11.25 
  both of those were examining the same file btw15:11.40 
sebras Guest66018: the python code you show, with the /Pages/Kids/Parent/MediaBox style "paths", is indeed resolving PDF object references and trying to express how these objects are related.15:12.21 
  Guest66018: in the greppable output, if you look at object 1 (search for :1:) you can see that there is a /Metadata entry in that dictionary.15:14.04 
  Guest66018: its value is 81 0 R which means that is an object reference to object 81.15:14.23 
Guest66018 right, object 1 maps to the first few paths15:14.24 
sebras Guest66018: next search for :81:15:14.27 
Guest66018 but after that it seems to diverge15:14.33 
sebras in that dictionary you have Length Subtype and Type entries.15:14.44 
  now, in the python output if you look at e.g. /Metadata/PDFStream/Length you can see that it started in object 1, found the Metadata entry, realized that the Metadata entry points to an object which is a PDFStream and then lists the entries in that stream object's dictionary part.15:15.41 
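
[Editor's sketch] The path construction sebras describes above can be imitated on a toy object table in C. Everything here is hypothetical: the struct layout, the function names, and the choice to print only leaf entries are assumptions for illustration, and this does not reproduce pdf_paths.py or use the MuPDF API. It only shows how following "81 0 R" from object 1 yields a path such as /Metadata/PDFStream/Length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy model: a dictionary entry is a key plus an optional reference. */
struct entry {
    const char *key;
    int ref; /* referenced object number, or 0 for a direct value */
};

struct object {
    int num;       /* object number, e.g. 81 in "81 0 R" */
    int is_stream; /* nonzero if the object is a stream */
    const struct entry *entries;
    int count;
};

static const struct object *find_object(const struct object *objs, int n, int num)
{
    int i;
    for (i = 0; i < n; i++)
        if (objs[i].num == num)
            return &objs[i];
    return NULL;
}

static void walk(const struct object *objs, int n, const struct object *obj,
                 const char *prefix, int depth, char *out, size_t cap)
{
    char path[256];
    int i;

    if (depth > 8)
        return; /* crude guard against reference cycles such as /Kids and /Parent */

    for (i = 0; i < obj->count; i++) {
        const struct entry *e = &obj->entries[i];
        const struct object *target = e->ref ? find_object(objs, n, e->ref) : NULL;
        if (target) {
            /* Resolve the reference; streams get the made-up "PDFStream" step. */
            snprintf(path, sizeof path, "%s/%s%s", prefix, e->key,
                     target->is_stream ? "/PDFStream" : "");
            walk(objs, n, target, path, depth + 1, out, cap);
        } else {
            /* Direct value (or unresolvable reference): append one path line. */
            size_t used = strlen(out);
            snprintf(out + used, cap - used, "%s/%s\n", prefix, e->key);
        }
    }
}

/* Writes one path per line into out, starting from object number start. */
void sketch_list_paths(const struct object *objs, int n, int start, char *out, size_t cap)
{
    const struct object *root = find_object(objs, n, start);
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    if (root)
        walk(objs, n, root, "", 0, out, cap);
}
```

Feeding it an object 1 whose only entry is Metadata pointing at object 81, and an object 81 marked as a stream with Length, Subtype and Type entries, produces the three /Metadata/PDFStream/... paths.
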
Guest66018 okay, i follow that15:16.24 
  so if i am using the mupdf library, i can just extract the dictionary from the object?15:16.40 
sebras how does pdf_paths.py work in detail and what objects does it start with? I don't know. perhaps object 1 is the /Root object in the trailer of the PDF..? does pdf_paths.py ignore some objects? I don't know. :)15:16.46 
  Guest66018: you can manipulate objects programmatically from C (or Java) yes. so you ought to be able to list the entries in objects.15:17.32 
Guest66018 pdf_paths does ignore some objects yes15:17.53 
sebras Guest66018: pdf_trailer() would e.g. give you the trailer that contains the /Root entry which is presumably object :1: in your particular file.15:18.03 
Guest66018 it starts with a pdfquery.PDFQuery(name).doc.catalog object15:18.38 
sebras Guest66018: pdf_paths.py also makes up fake object names in its paths, like "PDFStream".15:18.47 
Guest66018 then after deque-ing it, walks it from there15:18.48 
  ah, yes, it is doing exactly that15:19.16 
  thanks, this helps a lot15:20.37 
  i may have other questions if that's alright15:20.46 
sebras Guest66018: ok, so you need to open the document with something like fz_open_document(), next call pdf_specifics() to access the PDF part of the document, then call pdf_trailer(). after that you might need pdf_dict_get_key() to iterate, or pdf_dict_gets() if you already know the name of the thing.15:20.51 
Guest66018 also, "81 0 R"15:21.07 
  81 is the object index, what is 0 and R?15:21.13 
sebras 0 is the generation number. think of it like a version number. they used to be used when documents were updated, but recent PDFs don't really make use of them.15:22.19 
  R means it is an indirect object reference.15:22.27 
  you also have pdf_print_obj() (and fz_stdout()) if you want to print an object in its entirety (note that indirect references are not resolved)15:23.25 
  Guest66018: I hope this will get you started. :)15:23.45 
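
[Editor's sketch] The "81 0 R" notation explained above (object number, generation number, R for an indirect reference) can be pulled apart with a few lines of C. The function name is hypothetical and this is only a toy parser for the textual form as it appears in greppable output, not how MuPDF parses PDF files.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Parse text of the form "<object number> <generation number> R".
 * Returns 1 and fills *num and *gen on success, 0 otherwise.
 * On failure the output values may have been partially written.
 */
int sketch_parse_ref(const char *text, int *num, int *gen)
{
    char kind = 0;
    int used = 0;

    /* %n records how many characters were consumed; it is not counted in the result. */
    if (sscanf(text, "%d %d %c%n", num, gen, &kind, &used) != 3)
        return 0;
    if (kind != 'R' || *num < 0 || *gen < 0)
        return 0;

    /* Allow trailing whitespace only. */
    while (text[used] != '\0' && isspace((unsigned char)text[used]))
        used++;
    return text[used] == '\0';
}
```

For the Metadata value in the log this gives object number 81 and generation 0, while something like "81 0 obj" is rejected because the third token is not R.
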
Guest66018 thanks, this is much better start than where i was15:24.39 
sebras Guest66018: all of this presumes you are writing it in C.15:24.52 
Guest66018 yep, in C15:24.57 
sebras Guest66018: there ought to be similar calls in javascript which you can try using mutool run documented over at https://mupdf.com/docs/manual-mutool-run.html if you like.15:25.15 
Guest66018 thanks, i'll take a look at that too15:31.20 
hellion Hello! Has anyone on here successfully added mupdf to heroku for use with rails 5.2 ActiveStorage?17:07.18 
Robin_Watts hellion: Are you familiar with the works of Gary Larson? :)17:08.49 
  https://c1.staticflickr.com/1/47/153603564_7281ad0588.jpg17:09.26 
  "blah blah blah blah MuPDF blah blah blah" :)17:09.45 
hellion I am not familiar17:13.54 
Robin_Watts He drew a cartoon strip called "The Far Side"17:14.26 
hellion The Far Side....I do know him17:14.39 
Robin_Watts My point was that most of that question went straight over my head :)17:15.10 
hellion And, yet here you are....in the #mupdf discussion :)17:15.39 
Robin_Watts MuPDF was the one bit I understood.17:16.01 
hellion perhaps I should move over to an ActiveStorage discussion17:16.10 
Robin_Watts I know nothing about heroku or rails.17:16.22 
  Best of luck.17:16.37 
hellion thanks!17:16.41 