Learning Dirichlet Processes from Partially Observed Groups

Dubey, Avinava; Bhattacharya, Indrajit; Das, Mrinal; Faruquie, Tanveer; Bhattacharyya, Chiranjib (2011). Learning Dirichlet Processes from Partially Observed Groups. In: IEEE International Conference on Data Mining (ICDM), 11-14 December 2011, Vancouver, BC, Canada.

Official URL: http://doi.org/10.1109/ICDM.2011.85

Abstract

Motivated by the task of vernacular news analysis using known news topics from national newspapers, we study the task of topic analysis where, given source datasets with observed topics, data items from a target dataset need to be assigned either to observed source topics or to new ones. Using Hierarchical Dirichlet Processes for this task imposes unnecessary and often inappropriate generative assumptions on the observed source topics. In this paper, we explore Dirichlet Processes with partially observed groups (POG-DP). POG-DP avoids modeling the given source topics. Instead, it directly models the conditional distribution of the target data as a mixture of a Dirichlet Process and the posterior distribution of a Hierarchical Dirichlet Process with known groups and topics. This introduces coupling between the selection probabilities of all topics within a source, leading to effective identification of source topics. We further improve on this with a Combinatorial Dirichlet Process with partially observed groups (POG-CDP) that captures finer-grained coupling between related topics by choosing intersections between sources. We evaluate our models in three different real-world applications. Using extensive experimentation, we compare against several baselines to show that our model performs significantly better in all three applications.
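To make the idea of "a mixture of a Dirichlet Process and known source groups" and the resulting within-source coupling more concrete, here is a toy Python sketch. It is not the authors' POG-DP sampler: the function `pog_style_predictive`, the mixing weight `mix`, the source smoothing `gamma`, and the rule of choosing a source from counts pooled over all of its topics are hypothetical assumptions chosen only for illustration. The Dirichlet Process part uses the standard Chinese restaurant process predictive rule.

```python
from collections import Counter


def crp_predictive(counts, alpha):
    """Chinese restaurant process predictive rule for a Dirichlet Process.

    An existing cluster k is chosen with prob n_k / (n + alpha) and a new
    cluster with prob alpha / (n + alpha). Requires alpha > 0.
    """
    n = sum(counts.values())
    probs = {k: c / (n + alpha) for k, c in counts.items()}
    probs["<new>"] = alpha / (n + alpha)
    return probs


def pog_style_predictive(source_topics, target_source_counts,
                         target_new_counts, alpha, gamma, mix):
    """Toy topic-selection probabilities for the next target item.

    HYPOTHETICAL illustration, not the paper's model:
    - with weight `mix`, pick a known source s with prob proportional to
      (target items already assigned to any topic of s) + gamma, then a
      topic inside s proportional to its known source weight;
    - with weight 1 - mix, use a plain DP (CRP) over target-only topics.
    Because a source is chosen from counts pooled over all of its topics,
    using one topic of s raises the probability of every topic in s.
    """
    probs = Counter()
    # Source component: pooled per-source counts couple topics within a source.
    total = sum(target_source_counts.get(s, 0) + gamma for s in source_topics)
    for s, topic_weights in source_topics.items():
        p_source = (target_source_counts.get(s, 0) + gamma) / total
        z = sum(topic_weights.values())
        for k, w in topic_weights.items():
            probs[(s, k)] += mix * p_source * w / z
    # DP component: an existing target-only topic or a brand-new one.
    for k, p in crp_predictive(target_new_counts, alpha).items():
        probs[k] += (1 - mix) * p
    return dict(probs)


if __name__ == "__main__":
    sources = {"A": {"politics": 3.0, "economy": 1.0}, "B": {"sports": 2.0}}
    # Four target items already use topics of source A.
    probs = pog_style_predictive(sources, {"A": 4}, {"local_festival": 2},
                                 alpha=1.0, gamma=1.0, mix=0.8)
    for key, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(key, round(p, 4))  # ("A", "politics") is about 0.5
```

Comparing the output with `target_source_counts={}` shows the coupling effect in this toy setting: the probability of ("A", "economy") rises from about 0.1 to about 0.167 once other target items have used topics from source A, even though none of them used "economy" itself.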

Item Type: Conference or Workshop Item (Other)
Keywords: Copyright of this article belongs to IEEE
ID Code: 127765
Deposited On: 13 Oct 2022 11:01
Last Modified: 13 Oct 2022 11:01