The identification and application of common principal components
[Abstract] ENGLISH ABSTRACT: When estimating the covariance matrices of two or more populations, the covariance matrices are often assumed to be either equal or completely unrelated. The common principal components (CPC) model provides an alternative which is situated between these two extreme assumptions: the population covariance matrices are assumed to share the same set of eigenvectors, but to have different sets of eigenvalues.

An important question in the application of the CPC model is whether it is appropriate for the data under consideration. Flury (1988) proposed two methods, based on likelihood estimation, to address this question. However, the assumption of multivariate normality is untenable for many real data sets, making the application of these parametric methods questionable. A number of non-parametric methods, based on bootstrap replications of eigenvectors, are proposed to select an appropriate common eigenvector model for two population covariance matrices. Using simulation experiments, it is shown that the proposed selection methods outperform the existing parametric selection methods.

If appropriate, the CPC model can provide covariance matrix estimators that are less biased than when assuming equality of the covariance matrices, and of which the elements have smaller standard errors than the elements of the ordinary unbiased covariance matrix estimators. A regularised covariance matrix estimator under the CPC model is proposed, and Monte Carlo simulation results show that it provides more accurate estimates of the population covariance matrices than the competing covariance matrix estimators.

Covariance matrix estimation forms an integral part of many multivariate statistical methods. Applications of the CPC model in discriminant analysis, biplots and regression analysis are investigated. It is shown that, in cases where the CPC model is appropriate, CPC discriminant analysis provides significantly smaller misclassification error rates than both ordinary quadratic discriminant analysis and linear discriminant analysis. A framework for the comparison of different types of biplots for data with distinct groups is developed, and CPC biplots constructed from common eigenvectors are compared to other types of principal component biplots using this framework.

A subset of data from the Vermont Oxford Network (VON), of infants admitted to participating neonatal intensive care units in South Africa and Namibia during 2009, is analysed using the CPC model. It is shown that the proposed non-parametric methodology offers an improvement over the known parametric methods in the analysis of this data set, which originated from a non-normally distributed multivariate population.

CPC regression is compared to principal component regression and partial least squares regression in the fitting of models to predict neonatal mortality and length of stay for infants in the VON data set. The fitted regression models, using readily available day-of-admission data, can be used by medical staff and hospital administrators to counsel parents and improve the allocation of medical care resources. Predicted values from these models can also be used in benchmarking exercises to assess the performance of neonatal intensive care units in the Southern African context, as part of larger quality improvement programmes.
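
The CPC assumption stated in the abstract, that the population covariance matrices share one set of eigenvectors but have different eigenvalues, can be illustrated with a minimal NumPy sketch. This is an illustration only, not the estimation or model-selection procedure of the thesis: the dimension, the eigenvalues and the helper names `random_orthogonal` and `cpc_covariances` are assumptions chosen for the example.

```python
import numpy as np


def random_orthogonal(p, rng):
    """Return a p x p orthogonal matrix (hypothetical helper).

    The Q factor of the QR decomposition of a Gaussian matrix is orthogonal.
    """
    q, _ = np.linalg.qr(rng.standard_normal((p, p)))
    return q


def cpc_covariances(B, eigenvalue_sets):
    """Build covariance matrices B diag(lambda_i) B^T with common eigenvectors B."""
    return [B @ np.diag(lam) @ B.T for lam in eigenvalue_sets]


rng = np.random.default_rng(0)
B = random_orthogonal(3, rng)          # common eigenvectors (columns of B)
lam1 = np.array([5.0, 2.0, 0.5])       # eigenvalues of population 1 (assumed)
lam2 = np.array([1.0, 4.0, 3.0])       # eigenvalues of population 2 (assumed)
sigma1, sigma2 = cpc_covariances(B, [lam1, lam2])

# Under the CPC model the same B diagonalises both covariance matrices.
d1 = B.T @ sigma1 @ B
d2 = B.T @ sigma2 @ B
off1 = d1 - np.diag(np.diag(d1))       # off-diagonal part, numerically zero
off2 = d2 - np.diag(np.diag(d2))

# Symmetric matrices with common eigenvectors commute.
commutator = sigma1 @ sigma2 - sigma2 @ sigma1

print(np.round(np.diag(d1), 6), np.round(np.diag(d2), 6))
print(np.abs(off1).max(), np.abs(off2).max(), np.abs(commutator).max())
```

Note that the two eigenvalue sets are deliberately ordered differently: the CPC model constrains only the eigenvectors, so the component with the largest variance in one population need not be the largest in the other.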
[Publication date] [Publishing institution] Stellenbosch University