Open-domain quantity queries on web tables

Sarawagi, Sunita; Chakrabarti, Soumen (2014) Open-domain quantity queries on web tables. ACM SIGKDD, pp. 711-720.

Official URL: http://doi.org/10.1145/2623330.2623749

Related URL: http://dx.doi.org/10.1145/2623330.2623749

Abstract

Over 40% of columns in hundreds of millions of Web tables contain numeric quantities. Tables are a richer source of structured knowledge than free text. We harness Web tables to answer queries whose target is a quantity with natural variation, such as "net worth of zuckerburg", "battery life of ipad", "half life of plutonium", and "calories in pizza". Our goal is to respond to such queries with a ranked list of quantity distributions, suitably represented. Apart from the challenges of informal schema and noisy extractions, which have been known since tables were first used for non-quantity information extraction, we face the additional problems of noisy number formats and unit specifications that are often contextual and ambiguous. Early "hardening" of extraction decisions at the table level leads to poor accuracy. Instead, we run a probabilistic context-free grammar (PCFG) based unit extractor on the tables and retain several top-scoring extractions of quantities and numerals. We then inject these into a new collective inference framework that makes global decisions about the relevance of candidate table snippets, the interpretation of the query's target quantity type, the value distributions to be ranked and presented, and the degree of consensus that can be built to support the proposed quantity distributions. Experiments with over 25 million Web tables and 350 diverse queries show robust, large benefits from our quantity catalog, unit extractor, and collective inference.
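
The contrast between early "hardening" and retaining several top-scoring extractions can be illustrated with a toy sketch. This is not the paper's PCFG extractor or its collective inference framework. The candidate scores, the `TO_HOURS` conversion table, the `CELLS` data, and the agreement rule below are all assumptions invented for illustration. Each table cell keeps more than one (value, unit) reading, and the final reading is picked by how much support it gets from the other cells.

```python
from dataclasses import dataclass

# Hypothetical conversion factors to one canonical unit (hours).
TO_HOURS = {"hour": 1.0, "minute": 1.0 / 60.0}


@dataclass(frozen=True)
class Candidate:
    value: float
    unit: str
    score: float  # made-up extractor confidence, not from the paper

    def canonical(self) -> float:
        """Value expressed in hours."""
        return self.value * TO_HOURS[self.unit]


def harden_early(cells):
    """Commit to the single best-scoring reading of each cell in isolation."""
    return [max(cands, key=lambda c: c.score) for cands in cells]


def agrees(a, b, rel_tol=0.1):
    """True when two readings are within rel_tol of each other in hours."""
    x, y = a.canonical(), b.canonical()
    return abs(x - y) <= rel_tol * max(abs(x), abs(y))


def pick_by_consensus(cells, rel_tol=0.1):
    """Keep all readings, then pick per cell the one best supported by other cells."""
    chosen = []
    for i, cands in enumerate(cells):
        def total(c, i=i):
            # Support is the summed score of agreeing readings in other cells.
            support = sum(
                other.score
                for j, others in enumerate(cells) if j != i
                for other in others if agrees(c, other, rel_tol)
            )
            return c.score * (1.0 + support)
        chosen.append(max(cands, key=total))
    return chosen


# Illustrative cells for a query like "battery life of ipad".
CELLS = [
    [Candidate(10, "hour", 0.6), Candidate(10, "minute", 0.4)],
    [Candidate(600, "hour", 0.55), Candidate(600, "minute", 0.45)],
    [Candidate(9.5, "hour", 0.7), Candidate(9.5, "minute", 0.3)],
]

if __name__ == "__main__":
    print([(c.value, c.unit) for c in harden_early(CELLS)])
    print([(c.value, c.unit) for c in pick_by_consensus(CELLS)])
```

In this toy data, hardening each cell in isolation reads the second cell as 600 hours, whereas deferring the decision lets the other two cells pull it toward 600 minutes (about 10 hours). The scoring rule is deliberately simplistic and stands in for a much richer joint model.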

Item Type: Article
Source: Copyright of this article belongs to ACM, Inc
ID Code: 128350
Deposited On: 19 Oct 2022 10:00
Last Modified: 19 Oct 2022 10:00
