Persistence of plug-in rule in classification of high dimensional multivariate binary data

Park, Junyong ; Ghosh, Jayanta K. (2007) Persistence of plug-in rule in classification of high dimensional multivariate binary data Journal of Statistical Planning and Inference, 137 (11). pp. 3687-3705. ISSN 0378-3758

Full text not available from this repository.

Official URL: http://linkinghub.elsevier.com/retrieve/pii/S03783...

Related URL: http://dx.doi.org/10.1016/j.jspi.2007.03.043

Abstract

In this paper, we consider the classification problem when the predictors are multivariate binary random variables. The variables are modeled as independent, but not necessarily identically distributed, Bernoulli variables. A triangular array of parameters, (P11(n), ..., P1d(n), P21(n), ..., P2d(n)), is assumed, which allows the parameters to change and the number of variables, d, to increase as the sample size, n, increases, so that more flexible models can be adopted. Our results are obtained under moderate assumptions on the triangular array of the probability vectors. We use maximum likelihood estimators for the parameters and plug them into the Bayes classifier. This is a plug-in classifier, a sort of objective Bayes rule. It is shown in Wilbur et al. [2002. Variable selection in high-dimensional multivariate binary data with application to the analysis of microbial DNA fingerprints. Biometrics 58, 378-386] via simulations that the plug-in rule classifies quite well even when the assumption of independence is violated. The main interest in this paper is in the complex case of d/n^v → c for some v > 0 and c > 0, for which very little is known. Using linearity of the plug-in rule, we show its persistence, a generalization of the notion of consistency, when the variance of the plug-in rule, or a quantity measuring the signal-to-noise ratio, is divergent; otherwise we show that there exists an example of non-persistence of the plug-in rule. In the case of non-persistence, we introduce the notion of sparsity and overcome non-persistence by selecting a subset of the variables. This shows why a variable selection procedure may be effective, especially for contemporary practical problems with high dimensional data [Wilbur et al., 2002. Variable selection in high-dimensional multivariate binary data with application to the analysis of microbial DNA fingerprints. Biometrics 58, 378-386].
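
The plug-in rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `fit_plugin` and `classify`, the equal class priors, and the clipping constant `eps` (used to keep the logarithms finite when a sample proportion is exactly 0 or 1) are all assumptions made for illustration. Under the independent Bernoulli model the log-likelihood ratio is linear in the binary vector x, which is the linearity the abstract refers to.

```python
import math

def fit_plugin(class1, class2, eps=1e-3):
    """Return (weights, intercept) of the linear plug-in rule.

    class1, class2: lists of equal-length 0/1 lists (training samples).
    eps: hypothetical clipping constant, an assumption of this sketch.
    """
    def mle(rows):
        n, d = len(rows), len(rows[0])
        # Maximum likelihood estimate of each Bernoulli parameter is the
        # sample proportion; clip it away from 0 and 1 (assumption).
        return [min(max(sum(r[j] for r in rows) / n, eps), 1 - eps)
                for j in range(d)]

    p1, p2 = mle(class1), mle(class2)
    # Log-likelihood ratio: sum_j x_j*log(p1j/p2j) + (1-x_j)*log((1-p1j)/(1-p2j)),
    # rearranged into intercept + sum_j weights[j] * x_j.
    weights = [math.log(a / b) - math.log((1 - a) / (1 - b))
               for a, b in zip(p1, p2)]
    intercept = sum(math.log((1 - a) / (1 - b)) for a, b in zip(p1, p2))
    return weights, intercept

def classify(x, weights, intercept):
    """Assign x to class 1 if the plug-in score is positive, else class 2."""
    score = intercept + sum(w * xj for w, xj in zip(weights, x))
    return 1 if score > 0 else 2
```

For example, `weights, intercept = fit_plugin(train1, train2)` followed by `classify(x, weights, intercept)` labels a new binary vector `x`. Ties (a score of exactly zero) go to class 2 in this sketch, which is an arbitrary choice.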

Item Type: Article
Source: Copyright of this article belongs to Elsevier Science.
Keywords: Persistence; Triangular Array; High Dimensional Multivariate Binary Data; Plug-in Rule; Sparsity
ID Code: 22527
Deposited On: 24 Nov 2010 08:24
Last Modified: 02 Jun 2011 06:37
