Dynamic personalized pagerank in entity-relation graphs

Chakrabarti, Soumen (2007) Dynamic personalized pagerank in entity-relation graphs In: WWW '07 Proceedings of the 16th International Conference on World Wide Web, May 8-12, Banff, Alberta, Canada.

[img]
Preview
PDF - Other
483kB

Official URL: http://dl.acm.org/citation.cfm?id=1242650

Abstract

Extractors and taggers turn unstructured text into entity-relation(ER) graphs where nodes are entities (email, paper, person,conference, company) and edges are relations (wrote, cited,works-for). Typed proximity search of the form type = personNEAR company~"IBM, paper~"XML" is an increasingly usefulsearch paradigm in ER graphs. Proximity search implementations either perform a Pagerank-like computation at query time, which is slow, or precompute, store and combine per-word Pageranks, which can be very expensive in terms of preprocessing time and space. We present HubRank, a new system for fast, dynamic, space-efficient proximity searches in ER graphs. During preprocessing, HubRank computesand indexes certain "sketchy" random walk fingerprints for a small fraction of nodes, carefully chosen using query log statistics. At query time, a small "active" subgraph is identified, bordered bynodes with indexed fingerprints. These fingerprints are adaptively loaded to various resolutions to form approximate personalized Pagerank vectors (PPVs). PPVs at remaining active nodes are now computed iteratively. We report on experiments with CiteSeer's ER graph and millions of real Cite Seer queries. Some representative numbers follow. On our testbed, HubRank preprocesses and indexes 52 times faster than whole-vocabulary PPV computation. A text index occupies 56 MB. Whole-vocabulary PPVs would consume 102GB. If PPVs are truncated to 56 MB, precision compared to true Pagerank drops to 0.55; incontrast, HubRank has precision 0.91 at 63MB. HubRank's average querytime is 200-300 milliseconds; query-time Pagerank computation takes 11 seconds on average.

Item Type:Conference or Workshop Item (Paper)
Source:Copyright of this article belongs to article belongs to WWW '07 Proceedings of the 16th International Conference, Association for Computing Machinery.
Keywords:Personalized Pagerank,; Graph Proximity Search
ID Code:100063
Deposited On:12 Feb 2018 12:27
Last Modified:12 Feb 2018 12:27

Repository Staff Only: item control page