Ghosh, K. Anil; Chaudhuri, Probal (2004) Optimal smoothing in kernel discriminant analysis. Statistica Sinica, 14. pp. 457-483. ISSN 1017-0405
Full text not available from this repository.
Official URL: http://www3.stat.sinica.edu.tw/statistica/j14n2/j1...
Abstract
One well-known use of kernel density estimates is in nonparametric discriminant analysis, and its popularity is evident in its implementation in some commonly used statistical software (e.g., SAS). In this paper, we make a critical investigation into the influence of the value of the bandwidth on the behavior of the average misclassification probability of a classifier that is based on kernel density estimates. In the course of this investigation, we have observed some counter-intuitive results. For instance, the use of bandwidths that minimize mean integrated square errors of kernel estimates of population densities may lead to rather poor average misclassification rates. Further, the best choice of smoothing parameters in classification problems depends not only on the underlying true densities and sample sizes but also on prior probabilities. In particular, if the prior probabilities are all equal, the behavior of the average misclassification probability turns out to be quite interesting when both the sample sizes and the bandwidths are large. Our theoretical analysis provides some new insights into the problem of smoothing in nonparametric discriminant analysis. We also observe that popular cross-validation techniques (e.g., leave-one-out or V-fold) may not be very effective for selecting the bandwidth in practice. As a by-product of our investigation, we present a method for choosing appropriate values of the bandwidths when kernel density estimates are fitted to the training sample in a classification problem. The performance of the proposed method has been demonstrated using some simulation experiments as well as analysis of benchmark data sets, and its asymptotic properties have been studied under some regularity conditions.
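As a rough illustration of the setting described in the abstract, the sketch below builds a classifier from Gaussian kernel density estimates and measures its test-set misclassification rate for several bandwidths. It is not the bandwidth selection method proposed in the paper. The two normal populations, the sample sizes, the equal priors, the bandwidth grid, and all function names (`gaussian_kde_1d`, `kernel_classify`, `misclassification_rate`) are assumptions chosen only for this example, and the rate is computed for a single training sample rather than averaged over training samples.

```python
import numpy as np


def gaussian_kde_1d(x, sample, h):
    """Gaussian kernel density estimate at points x from a 1-D sample, bandwidth h."""
    x = np.asarray(x, dtype=float)[:, None]
    sample = np.asarray(sample, dtype=float)[None, :]
    z = (x - sample) / h
    norm = sample.shape[1] * h * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / norm


def kernel_classify(x, samples, priors, h):
    """Assign each x to the class with the largest prior * estimated density."""
    scores = np.stack(
        [p * gaussian_kde_1d(x, s, h) for s, p in zip(samples, priors)]
    )
    return scores.argmax(axis=0)


def misclassification_rate(h, train, test, priors):
    """Prior-weighted test error of the kernel classifier for one bandwidth."""
    error = 0.0
    for label, (x_test, p) in enumerate(zip(test, priors)):
        pred = kernel_classify(x_test, train, priors, h)
        error += p * np.mean(pred != label)
    return error


# Hypothetical two-class location-shift example: N(0, 1) versus N(1.5, 1).
rng = np.random.default_rng(0)
priors = [0.5, 0.5]
train = [rng.normal(0.0, 1.0, 50), rng.normal(1.5, 1.0, 50)]
test = [rng.normal(0.0, 1.0, 2000), rng.normal(1.5, 1.0, 2000)]

# Same bandwidth for both classes; the grid is arbitrary.
rates = {
    h: misclassification_rate(h, train, test, priors)
    for h in (0.05, 0.3, 1.0, 5.0)
}
for h, r in rates.items():
    print(f"h = {h:4.2f}  estimated misclassification rate = {r:.3f}")
```

Using one common bandwidth for both classes keeps the sketch short. Separate bandwidths per class, or unequal priors, can be tried by changing `priors` and passing class-specific values of `h` into `gaussian_kde_1d`.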
Item Type: | Article |
---|---|
Source: | Copyright of this article belongs to Academia Sinica. |
Keywords: | Average Misclassification Probability; Bandwidth Selection; Bayes' Risk; Cross-validation Techniques; Location-shift Models; Scale Space; Spherical Symmetry |
ID Code: | 74632 |
Deposited On: | 17 Dec 2011 10:36 |
Last Modified: | 17 Dec 2011 10:36 |