J. Japan Statist. Soc., Vol. 37 (No. 1), pp. 53-86, 2007

Multivariate Theory for Analyzing High Dimensional Data

M. S. Srivastava

Abstract. In this article, we develop a multivariate theory for analyzing multivariate datasets that have fewer observations than dimensions. More specifically, we consider the problem of testing the hypothesis that the mean vector μ of a p-dimensional random vector x is a zero vector when N, the number of independent observations on x, is less than the dimension p. It is assumed that x is normally distributed with mean vector μ and unknown nonsingular covariance matrix Σ. We propose the test statistic $F^+ = n^{-2}(p - n + 1) N \bar{x}' S^+ \bar{x}$, where $n = N - 1 < p$, $\bar{x}$ and $S$ are the sample mean vector and the sample covariance matrix respectively, and $S^+$ is the Moore-Penrose inverse of $S$. It is shown that a suitably normalized version of the $F^+$ statistic is asymptotically normally distributed under the hypothesis. The asymptotic non-null distribution in the one-sample case is given. The case when the covariance matrix Σ is singular of rank r but the sample size N is larger than r is also considered. The corresponding results for the cases of two samples and k samples, known as MANOVA, are given.
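As a rough illustration of the statistic defined in the abstract, the following Python sketch computes $F^+$ for an N × p data matrix with N - 1 < p. It is not the author's code: the function name `f_plus_statistic` is hypothetical, and the sketch assumes that S is the usual sample covariance matrix with divisor n = N - 1, which the abstract does not state explicitly. The normalization that leads to the asymptotic normal distribution is not given in the abstract and is therefore not attempted here.

```python
import numpy as np


def f_plus_statistic(X):
    """Compute F+ = n^{-2} (p - n + 1) N xbar' S^+ xbar for an N x p data matrix.

    Assumption (not stated in the abstract): S is the sample covariance
    matrix with divisor n = N - 1.
    """
    X = np.asarray(X, dtype=float)
    N, p = X.shape
    n = N - 1
    if n >= p:
        raise ValueError("this sketch covers only the case n = N - 1 < p")

    xbar = X.mean(axis=0)            # sample mean vector
    S = np.cov(X, rowvar=False)      # p x p sample covariance, divisor N - 1
    S_pinv = np.linalg.pinv(S)       # Moore-Penrose inverse of the singular S

    return (p - n + 1) / n**2 * N * (xbar @ S_pinv @ xbar)


# Example with simulated data: N = 10 observations in p = 50 dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 50))
print(f_plus_statistic(X))
```

Because S has rank at most n < p, the ordinary inverse does not exist, which is why the Moore-Penrose inverse is used. The statistic is unchanged when every observation is multiplied by the same nonzero constant, which gives a quick sanity check on an implementation.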

Key words and phrases: Distribution of test statistics, DNA microarray data, fewer observations than dimension, multivariate analysis of variance, singular Wishart.
