Sarawagi, Sunita ; Bhamidipaty, Anuradha ; Kirpal, Alok ; Mouli, Chandra (2002) Alias: An active learning led interactive deduplication system. VLDB '02: Proceedings of the 28th International Conference on Very Large Databases. pp. 1103-1106.
Full text not available from this repository.
Official URL: http://doi.org/10.1016/B978-155860869-6/50119-0
Related URL: http://dx.doi.org/10.1016/B978-155860869-6/50119-0
Abstract
Deduplication, a key operation in integrating data from multiple sources, is a time-consuming, labor-intensive and domain-specific operation. We present our design of ALIAS, which uses a novel approach to ease this task by limiting the manual effort to inputting simple, domain-specific attribute similarity functions and interactively labeling a small number of record pairs. We describe how active learning is useful in selecting informative examples of duplicates and non-duplicates that can be used to train a deduplication function. ALIAS provides a mechanism for efficiently applying the function on large lists of records using a novel cluster-based execution model.
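The active-learning loop described in the abstract can be illustrated with a minimal sketch. It is not the ALIAS implementation: the abstract does not say how informative pairs are chosen, so the sketch assumes plain uncertainty sampling with a small logistic model, and every name in it (`similarity_features`, `train_logistic`, `most_uncertain`, the toy feature vectors) is hypothetical.

```python
import math
from difflib import SequenceMatcher


def similarity_features(a, b):
    """Per-attribute string similarity in [0, 1] for two records (dicts with the same keys)."""
    return [SequenceMatcher(None, a[k].lower(), b[k].lower()).ratio() for k in sorted(a)]


def predict_proba(w, x):
    """Probability that a pair with feature vector x is a duplicate; w[-1] is the bias."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=500, lr=0.5):
    """Fit a small logistic regression by batch gradient descent (assumed stand-in learner)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def most_uncertain(w, candidates):
    """Index of the unlabeled pair whose predicted probability is closest to 0.5."""
    return min(range(len(candidates)),
               key=lambda i: abs(predict_proba(w, candidates[i]) - 0.5))


# Toy similarity vectors for pairs a user has already labeled (1 = duplicate).
labeled_X = [[0.95, 0.90], [0.90, 1.00], [0.10, 0.20], [0.20, 0.10]]
labeled_y = [1, 1, 0, 0]

# Unlabeled pairs: clearly similar, ambiguous, clearly different.
unlabeled_X = [[0.98, 0.95], [0.55, 0.50], [0.05, 0.10]]

weights = train_logistic(labeled_X, labeled_y)
query_index = most_uncertain(weights, unlabeled_X)  # pair to show the user next
```

In an interactive session, the pair at `query_index` would be shown to the user, its label appended to the training set, and the model retrained; `similarity_features` shows how such vectors could be derived from raw records given per-attribute similarity functions.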
Item Type: | Article |
---|---|
Source: | Copyright of this article belongs to Elsevier B.V. |
ID Code: | 128415 |
Deposited On: | 20 Oct 2022 08:52 |
Last Modified: | 20 Oct 2022 08:52 |