Sarawagi, Sunita ; Chakrabarti, Soumen ; Godbole, Shantanu (2003) Cross-training: learning probabilistic mappings between topics In: KDD '03 Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24-27, Washington, DC, USA.
Official URL: http://dl.acm.org/citation.cfm?id=956773&CFID=8626...
Abstract
Classification is a well-established operation in text mining. Given a set of labels A and a set D_A of training documents tagged with these labels, a classifier learns to assign labels to unlabeled test documents. Suppose we also had available a different set of labels B, together with a set of documents D_B marked with labels from B. If A and B have some semantic overlap, can the availability of D_B help us build a better classifier for A, and vice versa? We answer this question in the affirmative by proposing cross-training: a new approach to semi-supervised learning in the presence of multiple label sets. We give distributional and discriminative algorithms for cross-training and show, through extensive experiments, that cross-training can discover and exploit probabilistic relations between two taxonomies for more accurate classification.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Source: | Copyright of this article belongs to KDD '03 Proceedings of the Ninth ACM SIGKDD International Conference. |
Keywords: | Semi-Supervised Multi-Task Learning; Document Classification; EM; Support Vector Machines |
ID Code: | 100101 |
Deposited On: | 12 Feb 2018 12:27 |
Last Modified: | 12 Feb 2018 12:27 |