Dynamic authority-based keyword search algorithms, such as Object Rank and personalized PageRank, leverage semantic link information to provide high quality, high recall search in databases, and the Web. Conceptually, these algorithms require a querytime PageRank-style iterative computation over the full graph. This computation is too expensive for large graphs and not feasible at query time. Alternatively, building an index of precomputed results for some or all keywords involves very expensive preprocessing. We introduce BinRank, a system that approximates ObjectRank results by utilizing a hybrid approach inspired by materialized views in traditional query processing. We materialize a number of relatively small subsets of the data graph in such a way that any keyword query can be answered by running ObjectRank on only one of the subgraphs. BinRank generates the subgraphs by partitioning all the terms in the corpus based on their co-occurrence, executing ObjectRank for each partition using the terms to generate a set of random walk starting points, and keeping only those objects that receive non-negligible scores. The intuition is that a subgraph that contains all objects and links relevant to a set of related terms should have all the information needed to rank objects with respect to one of these terms. We demonstrate that BinRank can achieve subsecond query execution time on the English Wikipedia data set, while producing high-quality search results that closely approximate the results of ObjectRank on the original graph. The Wikipedia link graph contains about 108 edges, which is at least two orders of magnitude larger than what prior state of the art dynamic authority-based search systems have been able to demonstrate. Our experimental evaluation investigates the trade-off between query execution time, quality of the results, and storage requirements of BinRank.
- PageRank algorithm utilizes the Web graph link structure to assign global importance to Web pages. It works by modeling the behavior of a œrandom Web surfer who starts at a random Web page and follows outgoing links with uniform probability.
- The PageRank score is independent of a keyword query.
- Personalized PageRank (PPR) for Web graph data sets and ObjectRank for graph-modeled databases results. Therefore, the issue of scalability of PPR has attracted a lot of attention.
- ObjectRank extends (personalized) PageRank to perform keyword search in databases. ObjectRank uses a query term posting list as a set of random walk starting points and conducts the walk on the instance graph of the database.
- In this project, a BinRank system that employs a hybrid approach where query time can be traded off for preprocessing time and storage. BinRank closely approximates ObjectRank scores by running the same ObjectRank algorithm on a small subgraph, instead of the full data graph.
- BinRank query execution easily scales to large clusters by distributing the subgraphs between the nodes of the cluster.
- We are proposing the BinRank algorithm for the trade time of search. Our alogorithm solves the time consuming problem in query execution. Time will be reduced because of cache storage and redundant query handling method.
- As we are implementing Materialized sub-graph, query execution time is low
- Ã˜ Quality of results generated is good
- Query execution time is high, thus CPU cost is increased
- Quality of results generated is low
- Processor : Any Processor above 500 MHz.
- Ram : 128Mb.
- Hard Disk : 10 GB.
- Input device : Standard Keyboard and Mouse.
- Output device : VGA and High Resolution Monitor.
- Operating System : Windows Family.
- Pages developed using : Java Server Pages and HTML.
- Techniques : Apache Tomcat Web Server 5.0, JDK 1.5 or higher
- Web Browser : Microsoft Internet Explorer.
- Data Bases : MySQL 5.0
- Client Side Scripting : Java Script