Scalable focussed entity resolution

B. N., Ranganath; Bhatnagar, Shalabh (2016) Scalable focussed entity resolution. In: International Joint Conference on Neural Networks (IJCNN), 24-29 July 2016, Vancouver, BC, Canada.

Full text not available from this repository.

Official URL: http://doi.org/10.1109/IJCNN.2016.7727658

Abstract

The problem of entity resolution is widely studied in the research community. Its goal is to identify the real users associated with the user references in documents. We focus on entity resolution in dyadic data, where two kinds of association are observed. One is between a pair of domain entities such as documents and words. The other is between another pair, such as documents and users. Bibliographic data is one example of this setting. For entity resolution in bibliographic data, we propose a Bayesian nonparametric "Sparse entity resolution model" (SERM). SERM explores the sparse relationships between the grouped data (i.e., groups of documents) and the topics and author entities in each group. We also exploit the sparseness between an author entity and its associated author aliases. The documents are grouped using the stick-breaking prior for the Dirichlet process (DP). To achieve sparseness, we introduce separate Indian buffet process (IBP) priors over the topics and over the author entities for the groups. We also use a k-NN mechanism to select the author aliases for each author entity. We propose a scalable inference scheme for SERM that combines three components: the partially collapsed Gibbs sampling scheme of the Focussed topic model (FTM), the inference scheme used for the parametric IBP prior, and the k-NN mechanism. Experiments on the bibliographic datasets Citeseer and Rexa compare SERM with the state-of-the-art baseline. They show that SERM improves the accuracy of entity resolution by finding relevant author entities through modeling sparse relationships, and that it is scalable.
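
As a rough illustration of the two priors named in the abstract, the Python sketch below draws two things. The first is truncated stick-breaking weights for a Dirichlet process, which could serve as proportions for grouping documents. The second is a sparse binary group-by-feature matrix from the finite (beta-Bernoulli) approximation to the IBP, which could mark which topics or author entities each group may use. This is not the authors' SERM implementation. The truncation level, the hyperparameter values, and all function names are hypothetical choices made for illustration. The inference procedure (partially collapsed Gibbs sampling) and the k-NN alias step are not shown.

```python
import random
from typing import List


def stick_breaking_weights(alpha: float, truncation: int, rng: random.Random) -> List[float]:
    """Truncated stick-breaking construction of Dirichlet-process weights.

    v_k ~ Beta(1, alpha) and pi_k = v_k * prod_{j<k} (1 - v_j).
    The last piece takes the whole remaining stick (v = 1), so the
    truncated weights sum to 1. The truncation level is an assumption.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for k in range(truncation):
        v = 1.0 if k == truncation - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def finite_ibp_matrix(alpha: float, num_rows: int, num_features: int,
                      rng: random.Random) -> List[List[int]]:
    """Finite (parametric) approximation to the Indian buffet process.

    pi_k ~ Beta(alpha / K, 1) and z_gk ~ Bernoulli(pi_k). Row g is a sparse
    binary mask saying which of the K features (e.g. topics or author
    entities, in this hypothetical use) group g is allowed to use.
    """
    probs = [rng.betavariate(alpha / num_features, 1.0) for _ in range(num_features)]
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(num_rows)]


if __name__ == "__main__":
    rng = random.Random(42)
    num_groups, num_topics, num_authors = 10, 50, 200  # hypothetical sizes

    # Group proportions from the truncated DP stick-breaking prior.
    group_weights = stick_breaking_weights(alpha=1.0, truncation=num_groups, rng=rng)

    # Assign five hypothetical documents to groups in proportion to the weights.
    doc_groups = rng.choices(range(num_groups), weights=group_weights, k=5)

    # Separate sparse masks over topics and over author entities for each group.
    topic_mask = finite_ibp_matrix(5.0, num_groups, num_topics, rng)
    author_mask = finite_ibp_matrix(5.0, num_groups, num_authors, rng)

    for g in sorted(set(doc_groups)):
        print(f"group {g}: {sum(topic_mask[g])} topics, "
              f"{sum(author_mask[g])} author entities active")
```

Under the finite IBP approximation, the expected number of active features per row is alpha / (1 + alpha / K). This stays close to alpha as K grows, which is what keeps each group's topic set and author-entity set small.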

Item Type: Conference or Workshop Item (Paper)
Source: Copyright of this article belongs to the Institute of Electrical and Electronics Engineers.
ID Code: 116652
Deposited On: 12 Apr 2021 07:18
Last Modified: 12 Apr 2021 07:18