Cross training: learning probabilistic mappings between topics

Sarawagi, Sunita; Chakrabarti, Soumen; Godbole, Shantanu (2003) Cross training: learning probabilistic mappings between topics. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 177.


Official URL: http://doi.org/10.1145/956750.956773


Abstract

Classification is a well-established operation in text mining. Given a set of labels A and a set D_A of training documents tagged with these labels, a classifier learns to assign labels to unlabeled test documents. Suppose we also had available a different set of labels B, together with a set of documents D_B marked with labels from B. If A and B have some semantic overlap, can the availability of D_B help us build a better classifier for A, and vice versa? We answer this question in the affirmative by proposing cross-training: a new approach to semi-supervised learning in the presence of multiple label sets. We give distributional and discriminative algorithms for cross-training and show, through extensive experiments, that cross-training can discover and exploit probabilistic relations between two taxonomies for more accurate classification.
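
The idea of letting D_B help a classifier for A can be illustrated with a small sketch. This is not the paper's distributional or discriminative algorithm; it is a simplified illustration that assumes a plain multinomial naive Bayes learner and feeds the predicted B label to the A classifier as one extra synthetic token. The toy documents, the label names, the "B=" token prefix, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Train a multinomial naive Bayes model with add-one smoothing."""
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    return vocab, class_counts, token_counts

def predict_nb(model, doc):
    """Return the most probable label for a tokenized document."""
    vocab, class_counts, token_counts = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_tokens = sum(token_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in doc:
            if tok in vocab:  # tokens unseen in training are ignored
                count = token_counts[label][tok]
                score += math.log((count + 1) / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def augment(docs, model_other, prefix):
    """Append the other taxonomy's predicted label as a synthetic token."""
    return [doc + [prefix + predict_nb(model_other, doc)] for doc in docs]

# Hypothetical toy data: D_A is labeled with taxonomy A, D_B with taxonomy B.
docs_a = [["goal", "match", "team"], ["league", "team", "score"],
          ["election", "vote", "party"], ["senate", "vote", "bill"]]
labels_a = ["Sports", "Sports", "Politics", "Politics"]
docs_b = [["football", "goal", "stadium", "team"], ["tennis", "match", "stadium"],
          ["parliament", "vote", "minister"], ["minister", "bill", "campaign"]]
labels_b = ["Athletics", "Athletics", "Government", "Government"]

# Step 1: learn taxonomy B from D_B.
model_b = train_nb(docs_b, labels_b)
# Step 2: learn taxonomy A from D_A, with each document carrying its predicted B label.
model_a = train_nb(augment(docs_a, model_b, "B="), labels_a)

def classify_a(doc):
    """Label a document with taxonomy A, using the B prediction as a feature."""
    return predict_nb(model_a, augment([doc], model_b, "B=")[0])

# This document shares no words with D_A, only with D_B.
test_doc = ["parliament", "minister", "campaign"]
print(classify_a(test_doc))
```

In this toy setup the test document contains no words seen in D_A, so a classifier trained on D_A alone has nothing but class priors to go on. The predicted B label is the only usable signal, and the A classifier learns its association with A labels from the augmented training documents. The same construction can be run in the other direction to help a classifier for B.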

Item Type: Article
Source: Copyright of this article belongs to ACM, Inc.
ID Code: 128409
Deposited On: 20 Oct 2022 06:33
Last Modified: 20 Oct 2022 06:33
