Alias: An active learning led interactive deduplication system

Sarawagi, Sunita ; Bhamidipaty, Anuradha ; Kirpal, Alok ; Mouli, Chandra (2002) Alias: An active learning led interactive deduplication system. VLDB '02: Proceedings of the 28th International Conference on Very Large Data Bases. pp. 1103-1106.

Full text not available from this repository.

Official URL: http://doi.org/10.1016/B978-155860869-6/50119-0

Abstract

Deduplication, a key operation in integrating data from multiple sources, is a time-consuming, labor-intensive and domain-specific task. We present the design of ALIAS, which uses a novel approach to ease this task by limiting the manual effort to supplying simple, domain-specific attribute similarity functions and interactively labeling a small number of record pairs. We describe how active learning is useful in selecting informative examples of duplicates and non-duplicates that can be used to train a deduplication function. ALIAS provides a mechanism for efficiently applying the learned function to large lists of records using a novel cluster-based execution model.
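The core loop described in the abstract (user-supplied attribute similarity functions turn each record pair into a feature vector, and active learning picks the pairs a human should label) can be sketched as follows. This is a minimal illustration under stated assumptions, not ALIAS's actual implementation: the two similarity functions, the `name` field, and the fixed three-member committee of linear threshold classifiers are all hypothetical, and "informative" is approximated by committee disagreement (a query-by-committee style of uncertainty sampling).

```python
from difflib import SequenceMatcher
from itertools import combinations


# Hypothetical attribute similarity functions; each maps two strings to [0, 1].
def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lowercase whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def char_ratio(a: str, b: str) -> float:
    """Character-level similarity from difflib (2 * matched chars / total length)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


SIMILARITY_FUNCTIONS = [token_jaccard, char_ratio]


def pair_features(r1: dict, r2: dict, fields: list) -> list:
    """Apply every similarity function to every chosen field of a record pair."""
    return [f(r1[name], r2[name]) for name in fields for f in SIMILARITY_FUNCTIONS]


def duplicate_vote_fraction(features, committee) -> float:
    """Fraction of committee members that label the pair a duplicate.

    Assumption: each member is a (weights, threshold) linear classifier.
    """
    votes = [
        sum(w * x for w, x in zip(weights, features)) >= threshold
        for weights, threshold in committee
    ]
    return sum(votes) / len(votes)


def most_informative_pairs(records, fields, committee, k):
    """Return the k record-index pairs on which the committee disagrees most."""
    scored = []
    for i, j in combinations(range(len(records)), 2):
        features = pair_features(records[i], records[j], fields)
        frac = duplicate_vote_fraction(features, committee)
        # 0 when the committee is unanimous, larger the more evenly it splits.
        disagreement = 1.0 - abs(2.0 * frac - 1.0)
        scored.append((disagreement, (i, j)))
    # Stable sort: ties keep their enumeration order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in scored[:k]]


if __name__ == "__main__":
    records = [
        {"name": "Sunita Sarawagi"},
        {"name": "Sunita Sarawagi"},
        {"name": "S. Sarawagi"},
        {"name": "Alok Kirpal"},
    ]
    committee = [((0.5, 0.5), 0.3), ((0.5, 0.5), 0.6), ((0.5, 0.5), 0.9)]
    print(most_informative_pairs(records, ["name"], committee, k=2))
```

Running the example prints `[(0, 2), (1, 2)]`: the exact duplicates and the clearly unrelated name get unanimous votes, so only the abbreviated "S. Sarawagi" pairs are worth asking a human about. Scoring every pair exhaustively is quadratic in the number of records; the cluster-based execution model mentioned in the abstract targets that cost and is not modeled in this sketch.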

Item Type: Article
Source: Copyright of this article belongs to Elsevier B.V.
ID Code: 128415
Deposited On: 20 Oct 2022 08:52
Last Modified: 20 Oct 2022 08:52