Gupta, Rahul ; Sarawagi, Sunita (2009) Answering table augmentation queries from unstructured lists on the web Proceedings of the VLDB Endowment, 2 (1). pp. 289-300. ISSN 2150-8097
Official URL: http://doi.org/10.14778/1687627.1687661
Related URL: http://dx.doi.org/10.14778/1687627.1687661
Abstract
We present the design of a system for assembling a table from a few example rows by harnessing the huge corpus of information-rich but unstructured lists on the web. We developed a totally unsupervised end-to-end approach that, given the sample query rows, (a) retrieves HTML lists relevant to the query from a pre-indexed crawl of web lists, (b) segments the list records and maps the segments to the query schema using a statistical model, (c) consolidates the results from multiple lists into a unified merged table, and (d) presents to the user the consolidated records ranked by their estimated membership in the target relation. The key challenges in this task include the construction of new rows from very few examples, and an abundance of noisy and irrelevant lists that swamp the consolidation and ranking of rows. We propose modifications to statistical record segmentation models, and present novel consolidation and ranking techniques that can process input tables of arbitrary schema without requiring any human supervision. Experiments with Wikipedia target tables and 16 million unstructured lists show that even with just three sample rows, our system is very effective at recreating Wikipedia tables, with a mean runtime of around 20 seconds.
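The four stages (a) to (d) can be illustrated with a toy sketch. This is not the authors' implementation. All function names are hypothetical. Delimiter splitting stands in for the statistical segmentation model. A count of supporting lists stands in for the estimated membership score.

```python
from collections import defaultdict


def retrieve(lists, query_rows):
    """(a) Keep lists that mention at least one cell from the query rows."""
    cells = {c.lower() for row in query_rows for c in row}
    return [lst for lst in lists
            if any(cell in rec.lower() for rec in lst for cell in cells)]


def segment(record, n_cols, sep=","):
    """(b) Assumed stand-in for statistical segmentation: split on a delimiter.

    Returns None when the record does not yield exactly n_cols segments.
    """
    parts = [p.strip() for p in record.split(sep)]
    return tuple(parts) if len(parts) == n_cols else None


def consolidate(lists, n_cols):
    """(c) Merge duplicate rows across lists, tracking which lists support each."""
    support = defaultdict(set)
    for i, lst in enumerate(lists):
        for rec in lst:
            row = segment(rec, n_cols)
            if row is not None:
                # Case-insensitive match is a crude duplicate test.
                support[tuple(c.lower() for c in row)].add(i)
    return support


def rank(support):
    """(d) Order rows by number of supporting lists, ties broken alphabetically."""
    return sorted(support, key=lambda r: (-len(support[r]), r))


def augment(lists, query_rows):
    """Run the four stages end to end on in-memory lists."""
    n_cols = len(query_rows[0])
    relevant = retrieve(lists, query_rows)
    return rank(consolidate(relevant, n_cols))
```

Given one query row such as `("France", "Paris")`, `augment` discards lists that share no cell with the query, then ranks rows found in several surviving lists ahead of rows found in only one.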
Item Type: | Article |
---|---|
Source: | Copyright of this article belongs to ACM, Inc |
ID Code: | 128383 |
Deposited On: | 20 Oct 2022 03:52 |
Last Modified: | 20 Oct 2022 03:52 |