IRC Logs

Log of #ghostscript at irc.freenode.net.

2014/12/24
Robin_Watts henrys: Please don't feel you should miss out on skiing because of us.00:07.36 
henrys Robin_Watts: nope looking forward to going up to the park with all of you.00:11.33 
Robin_Watts Booked flights and hotels today. We now have 3 holidays queued up :)00:12.15 
henrys nice00:12.31 
Robin_Watts malc_: You were asking about pdf_lookup_page_loc_imp22:03.30 
malc_ aye22:03.35 
Robin_Watts Essentially that used to be implemented as a simple recursive function.22:03.50 
  but that turned out to be bad, for 2 reasons...22:04.04 
malc_ stackblowup22:04.10 
  i suppose22:04.13 
  try's22:04.14 
Robin_Watts Firstly, some files could cause stack blowup, yes.22:04.29 
  In particular we saw some files that were particularly pathological. We saw page trees of the form:22:05.37 
  [ <Page1> [<Page 2> [<Page3> [<Page4> ...] ] ] ]22:06.00 
  The only other complexities here are 1) the need to keep a parent pointer for a given node, and 2) the need to ensure we don't go into an infinite loop (which we do by marking nodes as we search)22:07.17 
malc_ saw that yeah22:07.54 
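
[Editor's sketch] The non-recursive lookup described above might look roughly like this. It is a minimal sketch on a toy tree, not MuPDF's code: the `node` struct, its fields, and `lookup_page` are hypothetical names invented for illustration, and the root is assumed to be an interior node. It walks down iteratively, skips whole subtrees using their page counts, reports the parent of the page it finds, and marks nodes on the way down so a malformed, self-referencing tree cannot loop forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy node; this is NOT MuPDF's pdf_obj. */
typedef struct node {
    struct node **kids;  /* NULL for a leaf page */
    int nkids;           /* 0 for a leaf page */
    int count;           /* claimed number of leaf pages below (interior nodes) */
    int marked;          /* set while the node is on the current search path */
    struct node *up;     /* filled in during a search: node we descended from */
} node;

/* Find leaf page number `needle` (0-based) without recursion.
 * Returns NULL if the index is out of range or a cycle is detected.
 * If parentp is non-NULL it receives the interior node holding the page. */
static node *lookup_page(node *root, int needle, node **parentp)
{
    node *cur = root;
    node *last = NULL;   /* most recently marked node */
    node *hit = NULL;

    assert(root != NULL && needle >= 0);
    if (parentp)
        *parentp = NULL;

    while (cur != NULL && hit == NULL)
    {
        node *next = NULL;
        int i;

        if (cur->marked)
            break;                      /* cycle: already on our path */
        cur->marked = 1;
        cur->up = last;
        last = cur;

        for (i = 0; i < cur->nkids; i++)
        {
            node *kid = cur->kids[i];
            if (kid->nkids == 0)        /* leaf page */
            {
                if (needle == 0)
                {
                    hit = kid;
                    if (parentp)
                        *parentp = cur;
                    break;
                }
                needle--;
            }
            else if (needle < kid->count)
            {
                next = kid;             /* the page lives in this subtree */
                break;
            }
            else
                needle -= kid->count;   /* skip the whole subtree */
        }
        cur = next;
    }

    /* Clear the marks we left on the way down. */
    while (last != NULL)
    {
        node *up = last->up;
        last->marked = 0;
        last->up = NULL;
        last = up;
    }
    return hit;
}
```

Because the descent is a loop rather than a call chain, a tree nested one level per page costs one loop iteration per level instead of one C stack frame per level.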
Robin_Watts So ideally, a file would produce a nice balanced node tree for the pages.22:09.38 
  I'm not sure what facilities we have within mupdf for making page trees.22:11.17 
  You could drive the low level PDF object manipulation functions yourself.22:11.39 
  What are you hoping to achieve?22:11.43 
malc_ have a linear array of all the objects representing pages22:13.45 
  for fast lookup22:13.54 
  basically the way it was before22:14.14 
Robin_Watts malc_: The problem with that is that when we do manipulations that change the page tree, that gets out of date.22:17.20 
malc_ sure22:17.27 
  but22:17.28 
Robin_Watts Also it means we have to read the entire tree to start with.22:17.31 
malc_ unless you do that22:17.42 
Robin_Watts A better scheme might be to have a page cache.22:17.50 
malc_ you can not present the user with the information of where the hell he is22:17.55 
  think scrollbar or somesuch22:18.01 
Robin_Watts malc_: Eh?22:18.08 
  We can know how many pages there are, without having loaded them all.22:18.30 
malc_ you need to know how tall is the entire document, you can't know that unless you count the individual pages heights22:18.46 
Robin_Watts malc_: Right, yes.22:18.54 
malc_ "..ent is,"22:18.55 
Robin_Watts But we have to be a tad circumspect about this, especially as we add support for new formats, like the forthcoming epub support.22:19.24 
  We want to move away from knowing the number of pages at load time - cos for epub that requires us to lay out the whole damn document.22:19.46 
malc_ even el'cheepo ebooks show you how many pages there are22:20.43 
  but that's beside the point i suppose22:20.58 
Robin_Watts malc_: Right, but if you load a book in an ebook reader it will tell you 'x' pages, but it lies.22:21.25 
  (at least many of them lie). Change the font size, and the number of pages doesn't change.22:21.42 
  Moving pages doesn't always change the page number.22:21.49 
  The smart way to do this would be to have the app load the PDF file, read the size of the first page, read the number of pages, and then guess at a size.22:22.52 
  Then you could run through in the background reading a page at a time and adjusting the height to be correct.22:23.12 
  but that's app level cleverness, not core cleverness, to my mind.22:23.34 
  To have to load 5158 pages before you show the first one just because you want the scrollbar size to be exact seems... excessive.22:24.07 
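
[Editor's sketch] The "guess, then refine in the background" idea could be kept in a small app-side structure like the one below. This is only a sketch under assumptions: `height_estimate` and its functions are hypothetical names, nothing here is MuPDF API, and using the running average of measured pages for the unmeasured ones is an extension of the "guess from the first page" suggestion rather than something stated in the discussion.

```c
#include <assert.h>

/* Hypothetical app-level bookkeeping for a scrollbar height guess. */
typedef struct {
    int page_count;       /* total pages, known without loading them */
    int measured;         /* pages whose real height is known so far */
    double measured_sum;  /* sum of the known heights */
} height_estimate;

/* Start from the first page's height alone. */
static void estimate_init(height_estimate *e, int page_count, double first_height)
{
    assert(page_count > 0);
    e->page_count = page_count;
    e->measured = 1;
    e->measured_sum = first_height;
}

/* Called (for example from a background thread's results) as each
 * further page height becomes known. */
static void estimate_add(height_estimate *e, double height)
{
    assert(e->measured < e->page_count);
    e->measured++;
    e->measured_sum += height;
}

/* Current guess: exact for measured pages, average height for the rest. */
static double estimate_total(const height_estimate *e)
{
    double avg = e->measured_sum / e->measured;
    return e->measured_sum + avg * (e->page_count - e->measured);
}
```

The guess becomes exact once every page has been reported, so the scrollbar only needs to be resized as measurements arrive.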
malc_ i don't have to load 5158 pages22:26.08 
  only get their mediaboxen22:26.13 
  big difference22:26.16 
  and guesswork is inadequate if you support whitespace trimming and such22:27.11 
Robin_Watts That requires us to hunt through though.22:27.20 
malc_ Robin_Watts: sorry for being obtuse, but i'm still a bit unsure how to achieve what i want22:58.47 
  drop "a bit"22:59.13 
Robin_Watts malc_: If your intention is to speedily run through the entire file fetching the page boxes, then we probably need to write some new code.23:00.30 
  Possibly to have a page iterator.23:00.40 
  Or page 'map' function.23:00.52 
  But that will require some coding within the mupdf core, which you could do.23:01.11 
  Alternatively, if you just want to remain a 'user' of the core, then you'd need to do some cleverer coding. Like guessing at a size based on the first page, and then refining that guess on a background thread.23:02.25 
malc_ Robin_Watts: i was thinking about stealing https://github.com/sumatrapdfreader/sumatrapdf/blob/master/src/PdfEngine.cpp (line 1097 and below) but prefer to understand what i'm doing...23:05.21 
Robin_Watts malc_: The simplest 'fast' way of traversing all the pages would be to copy that pdf_lookup_page_loc_imp function, as a new function and modifying it.23:05.32 
  where pdf_lookup_page_loc_imp skips to a particular page within the page tree, just make it run through every single entry.23:06.07 
  And for each entry it finds, call a function pointer that's passed in.23:06.25 
  That gives you a 'map this function across every page' function, right?23:06.38 
  You can then call that new 'map' function with a function that extracts the page mediabox.23:07.16 
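
[Editor's sketch] The shape of such a 'map' function, with a caller-supplied function pointer, could be sketched as follows. Everything here is hypothetical and stands in for the real thing: `node`, `rect`, `page_fn`, `map_pages` and `collect_height` are invented names on a toy tree, not MuPDF types or calls. For brevity this version is recursive and has no cycle protection, so it shows only the callback interface, not a production traversal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; these are NOT MuPDF's. */
typedef struct { float x0, y0, x1, y1; } rect;

typedef struct node {
    struct node **kids;  /* NULL for a leaf page */
    int nkids;           /* 0 for a leaf page */
    rect mediabox;       /* meaningful for leaf pages only */
} node;

/* Callback invoked once per page, in document order. */
typedef void (page_fn)(void *arg, int page_index, const node *page);

static int map_pages_rec(const node *n, int next_index, page_fn *fn, void *arg)
{
    int i;

    if (n->nkids == 0)
    {
        fn(arg, next_index, n);
        return next_index + 1;
    }
    for (i = 0; i < n->nkids; i++)
        next_index = map_pages_rec(n->kids[i], next_index, fn, arg);
    return next_index;
}

/* Call fn on every page under root; returns the number of pages visited. */
static int map_pages(const node *root, page_fn *fn, void *arg)
{
    assert(root != NULL && fn != NULL);
    return map_pages_rec(root, 0, fn, arg);
}

/* Example callback: store each page's mediabox height in an array. */
typedef struct { float *heights; int cap; } height_sink;

static void collect_height(void *arg, int page_index, const node *page)
{
    height_sink *sink = arg;

    if (page_index < sink->cap)
        sink->heights[page_index] = page->mediabox.y1 - page->mediabox.y0;
}
```

Keeping the per-page work in the callback means the traversal itself never needs to know whether the caller wants mediaboxes, a cache, or something else.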
malc_ Robin_Watts: that's what i was trying to do :)23:07.19 
Robin_Watts Ok.23:07.23 
malc_ but i'm failing23:07.30 
Robin_Watts ok.23:07.40 
  The current 'stack' in there is used to store each node as we pass through it.23:09.28 
  At the moment we only ever move down the tree - hence the stack only gets extended (within the traversal)23:09.49 
  and then gets wound up at the end.23:09.56 
  You'll need to change that to step both up and down the tree. (basically you're doing a depth first search)23:10.16 
  You can probably lose indexp and parentp.23:10.36 
  and skip goes away too.23:10.43 
  You may be best dropping back to a simple recursive implementation while you get it right, and then turn that into an iterative one later - it all depends what you're most comfortable with.23:11.39 
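
[Editor's sketch] An iterative depth-first walk that steps both down and up an explicit stack might look like this. It is a sketch on a toy tree under stated assumptions: `node`, `frame`, `map_pages_iter` and `log_page` are hypothetical names, not MuPDF code, and a node with no kids is treated as a page. Interior nodes are marked while they sit on the stack, so a tree that refers back to one of its own ancestors is reported as an error instead of looping.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy node; this is NOT MuPDF's pdf_obj. */
typedef struct node {
    struct node **kids;  /* NULL for a leaf page */
    int nkids;           /* 0 for a leaf page */
    int marked;          /* set while the node is on the stack */
} node;

typedef void (page_fn)(void *arg, int page_index, node *page);

/* One stack entry: an interior node and the next kid to visit. */
typedef struct { node *n; int next_kid; } frame;

/* Visit every page under root in document order.
 * Returns the page count, or -1 on a cycle or allocation failure
 * (pages seen before the failure have already been passed to fn). */
static int map_pages_iter(node *root, page_fn *fn, void *arg)
{
    frame *stack = NULL;
    int cap = 0, len = 0, pages = 0, ok = 1;
    node *pending = root;

    assert(root != NULL && fn != NULL);

    while (ok && (pending != NULL || len > 0))
    {
        if (pending != NULL)
        {
            node *n = pending;
            pending = NULL;
            if (n->nkids == 0)
            {
                fn(arg, pages++, n);          /* leaf: hand it to the caller */
                continue;
            }
            if (n->marked)
            {
                ok = 0;                       /* cycle back to an ancestor */
                break;
            }
            if (len == cap)
            {
                int ncap = cap ? cap * 2 : 16;
                frame *ns = realloc(stack, ncap * sizeof *ns);
                if (ns == NULL)
                {
                    ok = 0;
                    break;
                }
                stack = ns;
                cap = ncap;
            }
            n->marked = 1;
            stack[len].n = n;
            stack[len].next_kid = 0;
            len++;
        }
        else
        {
            frame *top = &stack[len - 1];
            if (top->next_kid < top->n->nkids)
                pending = top->n->kids[top->next_kid++];  /* step down */
            else
            {
                top->n->marked = 0;                       /* step up */
                len--;
            }
        }
    }

    /* On an early exit, clear any marks still on the stack. */
    while (len > 0)
        stack[--len].n->marked = 0;
    free(stack);
    return ok ? pages : -1;
}

/* Example callback: remember the order in which pages were visited. */
typedef struct { node **seen; int cap; } visit_log;

static void log_page(void *arg, int page_index, node *page)
{
    visit_log *v = arg;

    if (page_index < v->cap)
        v->seen[page_index] = page;
}
```

The stack lives on the heap and grows on demand, so even a tree nested one level per page is limited by memory rather than by the C call stack.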
malc_ Robin_Watts: yes, i was pondering that route too, thanks23:12.05 
  Robin_Watts: and done.. thanks a lot23:31.11 
Robin_Watts malc_: Fab. let me know how you get on. This may be something we should consider generalising.23:42.06 
  not entirely sure how given we want it to work with lots of file formats, but...23:42.25 
malc_ Robin_Watts: i was discussing it with Tor and he was against the idea of having some sort of visitor function supplied to lookup23:43.06 
  which would have allowed one to cache the stuff23:43.16 
Robin_Watts malc_: I think I agree with him that we don't want to clutter the existing lookup functions.23:44.03 
  What we're talking about here is a different beast. An addition to the API if you like.23:44.31 
malc_ Robin_Watts: sure23:46.08 