Sarawagi, Sunita; Chakrabarti, Soumen; Godbole, Shantanu (2003). Cross training: learning probabilistic mappings between topics. ACM SIGKDD international conference on Knowledge discovery and data mining, p. 177.
Official URL: http://doi.org/10.1145/956750.956773
Related URL: http://dx.doi.org/10.1145/956750.956773
Abstract
Classification is a well-established operation in text mining. Given a set of labels A and a set D_A of training documents tagged with these labels, a classifier learns to assign labels to unlabeled test documents. Suppose we also had available a different set of labels B, together with a set of documents D_B marked with labels from B. If A and B have some semantic overlap, can the availability of D_B help us build a better classifier for A, and vice versa? We answer this question in the affirmative by proposing cross-training: a new approach to semi-supervised learning in the presence of multiple label sets. We give distributional and discriminative algorithms for cross-training and show, through extensive experiments, that cross-training can discover and exploit probabilistic relations between two taxonomies for more accurate classification.
Item Type: | Article |
---|---|
Source: | Copyright of this article belongs to ACM, Inc |
ID Code: | 128409 |
Deposited On: | 20 Oct 2022 06:33 |
Last Modified: | 20 Oct 2022 06:33 |