Annotating and searching web tables using entities, types and relationships

Limaye, Girija; Sarawagi, Sunita; Chakrabarti, Soumen (2010). Annotating and searching web tables using entities, types and relationships. Proceedings of the VLDB Endowment, 3 (1-2). pp. 1338-1347. ISSN 2150-8097

Official URL: http://doi.org/10.14778/1920841.1921005

Related URL: http://dx.doi.org/10.14778/1920841.1921005

Abstract

Tables are a universal idiom to present relational data. Billions of tables on Web pages express entity references, attributes and relationships. This representation of relational world knowledge is usually considerably better than completely unstructured, free-format text. At the same time, unlike manually-created knowledge bases, relational information mined from "organic" Web tables need not be constrained by availability of precious editorial time. Unfortunately, in the absence of any formal, uniform schema imposed on Web tables, Web search cannot take advantage of these high-quality sources of relational information. In this paper we propose new machine learning techniques to annotate table cells with entities that they likely mention, table columns with types from which entities are drawn for cells in the column, and pairs of table columns with relations that they seek to express. We propose a new graphical model for making all these labeling decisions for each table simultaneously, rather than making separate local decisions for entities, types and relations. Experiments using the YAGO catalog, DBpedia, tables from Wikipedia, and over 25 million HTML tables from a 500-million-page Web crawl uniformly show the superiority of our approach. We also evaluate the impact of better annotations on a prototype relational Web search tool. We demonstrate clear benefits of our annotations beyond indexing tables in a purely textual manner.
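
The contrast the abstract draws between joint and separate local labeling decisions can be illustrated with a small sketch. This is not the paper's graphical model: it is a toy brute-force example under stated assumptions. The catalog, the similarity scores, the `TYPE_BONUS` weight and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy catalog mapping each entity to its types (not taken from YAGO).
CATALOG: Dict[str, Set[str]] = {
    "Paris_(city)": {"city"},
    "Paris_Hilton": {"person"},
    "Berlin_(city)": {"city"},
    "Irving_Berlin": {"person"},
    "Rome_(city)": {"city"},
}

# Hypothetical text-similarity scores between a cell's text and candidate entities.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "Paris": {"Paris_(city)": 0.5, "Paris_Hilton": 0.6},
    "Berlin": {"Berlin_(city)": 0.7, "Irving_Berlin": 0.4},
    "Rome": {"Rome_(city)": 0.9},
}

# Assumed reward when a cell's entity belongs to the type chosen for the column.
TYPE_BONUS = 0.3


def local_labels(cells: List[str]) -> List[str]:
    """Label each cell independently with its highest-similarity entity."""
    return [max(CANDIDATES[c], key=CANDIDATES[c].get) for c in cells]


def joint_score(cell: str, entity: str, col_type: str) -> float:
    """Similarity plus a bonus if the entity is compatible with the column type."""
    bonus = TYPE_BONUS if col_type in CATALOG[entity] else 0.0
    return CANDIDATES[cell][entity] + bonus


def joint_labels(cells: List[str]) -> Tuple[str, List[str]]:
    """Choose the column type and all cell entities together.

    For a fixed column type the cells decouple, so we try every type,
    pick the best entity per cell under that type, and keep the type
    with the highest total score.
    """
    all_types = sorted({t for types in CATALOG.values() for t in types})
    best_total, best_type, best_labels = float("-inf"), "", []
    for col_type in all_types:
        labels = [
            max(CANDIDATES[c], key=lambda e, c=c: joint_score(c, e, col_type))
            for c in cells
        ]
        total = sum(joint_score(c, e, col_type) for c, e in zip(cells, labels))
        if total > best_total:
            best_total, best_type, best_labels = total, col_type, labels
    return best_type, best_labels


if __name__ == "__main__":
    column = ["Paris", "Berlin", "Rome"]
    print("local:", local_labels(column))
    print("joint:", joint_labels(column))
```

In this toy column the local decision labels "Paris" with the person entity because its similarity score is slightly higher, while the joint decision selects the column type "city" and moves "Paris" to the city entity, since the other cells in the column support that type.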

Item Type: Article
Source: Copyright of this article belongs to ACM, Inc.
ID Code: 128365
Deposited On: 19 Oct 2022 10:43
Last Modified: 19 Oct 2022 10:43
