Cross-training: learning probabilistic mappings between topics

Sarawagi, Sunita; Chakrabarti, Soumen; Godbole, Shantanu (2003) Cross-training: learning probabilistic mappings between topics. In: KDD '03 Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24-27, Washington, DC, USA.

Official URL: http://dl.acm.org/citation.cfm?id=956773&CFID=8626...

Abstract

Classification is a well-established operation in text mining. Given a set of labels A and a set D_A of training documents tagged with these labels, a classifier learns to assign labels to unlabeled test documents. Suppose we also had available a different set of labels B, together with a set of documents D_B marked with labels from B. If A and B have some semantic overlap, can the availability of D_B help us build a better classifier for A, and vice versa? We answer this question in the affirmative by proposing cross-training: a new approach to semi-supervised learning in the presence of multiple label sets. We give distributional and discriminative algorithms for cross-training and show, through extensive experiments, that cross-training can discover and exploit probabilistic relations between two taxonomies for more accurate classification.
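
The setting in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the record does not describe the distributional or discriminative methods, so the code below only assumes one simple way to let D_B help a classifier for A, namely appending the B label predicted by a classifier trained on D_B as an extra feature. The `NaiveBayes` class, the `augment` helper, and all toy documents and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes with Laplace smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total_docs)
            for t in tokens:
                if t in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical label set B with its training documents D_B.
docs_b = [["goal", "match", "team"], ["league", "team", "coach"],
          ["stock", "market", "shares"], ["bank", "market", "loan"]]
labels_b = ["football", "football", "finance", "finance"]

# Hypothetical label set A with a much smaller training set D_A.
docs_a = [["match", "coach", "win"], ["shares", "loan", "profit"]]
labels_a = ["sports", "business"]


def augment(tokens, clf_b):
    """Append the B label predicted for this document as a pseudo-token."""
    return tokens + ["B=" + clf_b.predict(tokens)]


# Assumed cross-training step: the classifier for A sees B's predictions as features.
clf_b = NaiveBayes().fit(docs_b, labels_b)
clf_a = NaiveBayes().fit([augment(d, clf_b) for d in docs_a], labels_a)

test_doc = ["goal", "league", "team"]
prediction = clf_a.predict(augment(test_doc, clf_b))
```

In this toy data the test document shares no words with D_A, so the only usable evidence for the A classifier is the pseudo-token derived from the B classifier. That is the sense in which a semantic overlap between A and B can carry information across label sets.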

Item Type:	Conference or Workshop Item (Paper)
Source:	Copyright of this article belongs to KDD '03 Proceedings of the Ninth ACM SIGKDD International Conference.
Keywords:	Semi-Supervised Multi-Task Learning; Document Classification; EM; Support Vector Machines
ID Code:	100101
Deposited On:	12 Feb 2018 12:27
Last Modified:	12 Feb 2018 12:27
