Annotating and searching web tables using entities, types and relationships

Limaye, Girija ; Sarawagi, Sunita ; Chakrabarti, Soumen (2010) Annotating and searching web tables using entities, types and relationships. In: Proceedings of VLDB 2010, 36th International Conference on Very Large Data Bases, Singapore, 13 to 17 Sept 2010.

Official URL: http://www.vldb2010.org/proceedings/files/vldb2010...

Related URL: http://dx.doi.org/10.14778/1920841.1921005

Abstract

Tables are a universal idiom to present relational data. Billions of tables on Web pages express entity references, attributes and relationships. This representation of relational world knowledge is usually considerably better than completely unstructured, free-format text. At the same time, unlike manually-created knowledge bases, relational information mined from "organic" Web tables need not be constrained by availability of precious editorial time. Unfortunately, in the absence of any formal, uniform schema imposed on Web tables, Web search cannot take advantage of these high-quality sources of relational information. In this paper we propose new machine learning techniques to annotate table cells with entities that they likely mention, table columns with types from which entities are drawn for cells in the column, and relations that pairs of table columns seek to express. We propose a new graphical model for making all these labeling decisions for each table simultaneously, rather than make separate local decisions for entities, types and relations. Experiments using the YAGO catalog, DB-Pedia, tables from Wikipedia, and over 25 million HTML tables from a 500 million page Web crawl uniformly show the superiority of our approach. We also evaluate the impact of better annotations on a prototype relational Web search tool. We demonstrate clear benefits of our annotations beyond indexing tables in a purely textual manner.

Item Type: Conference or Workshop Item (Paper)
Source: Copyright of this article belongs to Proceedings of the VLDB Endowment International Conference.
ID Code: 100011
Deposited On: 12 Feb 2018 12:26
Last Modified: 12 Feb 2018 12:26
