Curating probabilistic databases from information extraction models

Gupta, Rahul; Sarawagi, Sunita (2006) Curating probabilistic databases from information extraction models. In: 32nd International Conference on Very Large Data Bases.

Abstract

Many real-life applications depend on databases automatically curated from unstructured sources through imperfect structure extraction tools. Such databases are best treated as imprecise representations of multiple extraction possibilities. State-of-the-art statistical models of extraction provide a sound probability distribution over extractions but are not easy to represent and query in a relational framework. In this paper we address the challenge of approximating such distributions as imprecise data models. In particular, we investigate a model that captures both row-level and column-level uncertainty and show that this representation provides significantly better approximation compared to models that use only row or only column level uncertainty. We present efficient algorithms for finding the best approximating parameters for such a model: our algorithm exploits the structure of the model to avoid enumerating the exponential number of extraction possibilities.

Item Type: Conference or Workshop Item (Paper)
Source: Copyright of this article belongs to ResearchGate GmbH
ID Code: 128392
Deposited On: 20 Oct 2022 04:33
Last Modified: 14 Nov 2022 11:15
