Open-domain quantity queries on web tables: annotation, response and consensus models

Sarawagi, Sunita; Chakrabarti, Soumen (2014) Open-domain quantity queries on web tables: annotation, response and consensus models. In: KDD '14 Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Full text not available from this repository.

Official URL: http://dl.acm.org/citation.cfm?id=2623749

Abstract

Over 40% of columns in hundreds of millions of Web tables contain numeric quantities. Tables are a richer source of structured knowledge than free text. We harness Web tables to answer queries whose target is a quantity with natural variation, such as net worth of zuckerburg, battery life of ipad, half life of plutonium, and calories in pizza. Our goal is to respond to such queries with a ranked list of quantity distributions, suitably represented. Apart from the challenges of informal schema and noisy extractions, which have been known since tables were used for non-quantity information extraction, we face additional problems of noisy number formats, as well as unit specifications that are often contextual and ambiguous. Early "hardening" of extraction decisions at a table level leads to poor accuracy. Instead, we use a Probabilistic Context Free Grammar (PCFG) based unit extractor on the tables, and retain several top-scoring extractions of quantity and numerals. Then we inject these into a new collective inference framework that makes global decisions about the relevance of candidate table snippets, the interpretation of the query's target quantity type, the value distributions to be ranked and presented, and the degree of consensus that can be built to support the proposed quantity distributions. Experiments with over 25 million Web tables and 350 diverse queries show robust, large benefits from our quantity catalog, unit extractor, and collective inference.

Item Type: Conference or Workshop Item (Paper)
Source: Copyright of this article belongs to KDD '14 Proceedings of the 20th ACM SIGKDD International Conference.
ID Code: 99984
Deposited On: 12 Feb 2018 12:26
Last Modified: 12 Feb 2018 12:26
