[gs-bugs] [Bug 691974] New: after a page has been searched for a given string the fact of its presence or absence should be cached

bugzilla-daemon at ghostscript.com bugzilla-daemon at ghostscript.com
Tue Feb 15 16:36:13 UTC 2011


http://bugs.ghostscript.com/show_bug.cgi?id=691974

           Summary: after a page has been searched for a given string the
                    fact of its presence or absence should be cached
           Product: MuPDF
           Version: unspecified
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P4
         Component: apps
        AssignedTo: tor.andersson at artifex.com
        ReportedBy: dsdutkiewicz at msn.com
         QAContact: gs-bugs at ghostscript.com
   Estimated Hours: 0.0


Created an attachment (id=7253)
 --> (http://bugs.ghostscript.com/attachment.cgi?id=7253)
proof of concept [NON WORKING]

I have a way to do this that does not require the program to cache anything,
and it can easily be extended to support caching.

This is hint-based, as described in Lampson's "Hints for Computer System
Design".

When you do a search, you remember for each page whether there was a hit on
it. This requires only O(number of pages) storage. The hint starts out saying
that every page has a match on it. While searching, you look at the hint: if
it says 'x', the page might match, so it is loaded and searched for the next
hit. If you search a page and don't find anything, you set the hint for that
page to 'o'.

You have an array: char *match_hit = malloc(sizeof(char) * page_count), where
page_count is the number of pages in the document.
You set match_hit to all 'x', i.e. memset(match_hit, 'x', sizeof(char) * page_count).
This gives you a vector of 'x' marks, one per page.

When you start a new search, you reset it to all 'x'.
When looking for the next hit:
- if match_hit[current page] == 'o', you skip the page (there is no need to
  check its text for possible hits because there are none)
- if you look at a page and don't find a hit, you set
  match_hit[current page] = 'o'
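
The steps above could be sketched roughly as follows. This is only an
illustration, not the attached patch and not MuPDF code: page_count,
page_search_fn, hints_new, hints_reset, find_next_hit, struct demo_doc and
demo_search are all hypothetical names, and the real page search is replaced
by a callback. The scan wraps around at the end of the document but visits
each page at most once, so it always terminates.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the viewer's real page search:
 * returns nonzero if page `page` contains `needle`. */
typedef int (*page_search_fn)(int page, const char *needle, void *ctx);

/* Allocate the hint array; every page starts as 'x' (might match). */
static char *hints_new(int page_count)
{
    char *match_hit = malloc((size_t)page_count);
    if (match_hit)
        memset(match_hit, 'x', (size_t)page_count);
    return match_hit;
}

/* A new search term invalidates all hints: back to all 'x'. */
static void hints_reset(char *match_hit, int page_count)
{
    memset(match_hit, 'x', (size_t)page_count);
}

/* Return the first page at or after `start` (wrapping around) that has a
 * hit, or -1 if no page matches. Pages hinted 'o' are skipped unsearched. */
static int find_next_hit(char *match_hit, int page_count, int start,
                         const char *needle, page_search_fn search, void *ctx)
{
    assert(page_count == 0 || (start >= 0 && start < page_count));
    for (int i = 0; i < page_count; i++) {
        int page = (start + i) % page_count;
        if (match_hit[page] == 'o')
            continue;               /* known miss: do not load the page */
        if (search(page, needle, ctx))
            return page;            /* hint stays 'x' for this page */
        match_hit[page] = 'o';      /* searched, nothing found */
    }
    return -1;
}

/* Toy document used only to exercise the sketch. */
struct demo_doc {
    const char **pages;  /* text of each page */
    int searches;        /* how many pages were actually searched */
};

static int demo_search(int page, const char *needle, void *ctx)
{
    struct demo_doc *doc = ctx;
    doc->searches++;
    return strstr(doc->pages[page], needle) != NULL;
}
```

Because a hint only ever says "this page certainly has no hit", a stale 'x'
costs one extra page search and nothing more; a wrong 'o' would hide a real
match, which is why the array is reset whenever the term changes.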

--

To cache this across more than one search, you just put match_hit in a cache
keyed by the search term. When you start a search, you store the current
term and its vector in the cache, then fetch the vector that matches the new
term (or start a fresh one if there is none). This way you could have a
limited cache of, say, 8 entries without needing any data structure beyond
the ones you already have.
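
A minimal sketch of such a term-keyed cache, under several assumptions that
are not in the report: a fixed table of 8 slots, a maximum term length of
HINT_TERM_MAX - 1 characters, and least-recently-used eviction when the table
is full. The names struct hint_cache, hint_cache_init, hint_cache_get and
hint_cache_free are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HINT_CACHE_SLOTS 8    /* "say 8 or so" */
#define HINT_TERM_MAX    128  /* assumed bound on term length, incl. NUL */

struct hint_slot {
    char term[HINT_TERM_MAX];
    char *match_hit;          /* NULL while the slot is unused */
    unsigned long last_used;  /* 0 while unused, else a clock stamp */
};

struct hint_cache {
    struct hint_slot slots[HINT_CACHE_SLOTS];
    int page_count;
    unsigned long clock;
};

static void hint_cache_init(struct hint_cache *c, int page_count)
{
    assert(page_count > 0);
    memset(c, 0, sizeof *c);
    c->page_count = page_count;
}

/* Return the hint array for `term`. On a miss, the least recently used
 * slot is recycled and its hints reset to all 'x'. Returns NULL if the
 * term is too long or allocation fails. */
static char *hint_cache_get(struct hint_cache *c, const char *term)
{
    struct hint_slot *victim = &c->slots[0];

    if (strlen(term) >= HINT_TERM_MAX)
        return NULL;
    for (int i = 0; i < HINT_CACHE_SLOTS; i++) {
        struct hint_slot *s = &c->slots[i];
        if (s->match_hit && strcmp(s->term, term) == 0) {
            s->last_used = ++c->clock;  /* hit: keep the old hints */
            return s->match_hit;
        }
        if (s->last_used < victim->last_used)
            victim = s;                 /* unused slots (0) win first */
    }
    if (!victim->match_hit) {
        victim->match_hit = malloc((size_t)c->page_count);
        if (!victim->match_hit)
            return NULL;
    }
    memset(victim->match_hit, 'x', (size_t)c->page_count);
    strcpy(victim->term, term);         /* length checked above */
    victim->last_used = ++c->clock;
    return victim->match_hit;
}

static void hint_cache_free(struct hint_cache *c)
{
    for (int i = 0; i < HINT_CACHE_SLOTS; i++)
        free(c->slots[i].match_hit);
    memset(c, 0, sizeof *c);
}
```

Repeating an earlier search term then gets back the array with its 'o' marks
intact, so pages already known to have no hit are skipped straight away.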

--

I have attached a patch that attempts this, but it doesn't work. For some
reason, after it reaches the end of the document it starts marking off pages
that do have a match on them until there are none left, and then it gets
stuck in a loop.
