|
|
|
|
|
|
|
|
On the Finite Sample Performance of the Nearest Neighbor Classifier
|
|
|
|
|
|
|
|
|
|
|
|
IEEE Transactions on Information Theory |
|
|
|
|
|
|
The finite sample performance of a nearest neighbor classifier is analyzed for a two-class pattern recognition problem. An exact integral expression is derived for the m-sample risk Rm given that a reference m-sample of labeled points is available to the classifier. The statistical setup assumes that the pattern classes arise in nature with fixed a priori probabilities and that points representing the classes are drawn from Euclidean n-space according to fixed class-conditional probability distributions. The sample is assumed to consist of m independently generated class-labeled points. For a family of smooth class-condition distributions characterized by asymptotic expansions in general form, it is shown that the m-sample risk Rm has a complete asymptotic expansion...where Rinfinity denotes the nearest neighbor risk in the infinite sample limit and the coefficients ck are distribution-dependent constants independent of the sample size m. This analysis thus provides further analytic validation of Bellman's curse of dimensionality. Numerical simulations corroborating the formal results are included, and extensions of the theory discussed. The analysis also contains a novel application of Laplace's asymptotic method of integration to a multidimensional integral where the integrand attains its maximum on a continuum of points.
|
|
|
|
|
|
|
|
|
|
|
|
nearest neighbor classifier, pattern classification, curse of dimensionality, Laplace's method of integration
|
|