Aspects of the pre- and post-selection classification performance of discriminant analysis and logistic regression
[Abstract] Discriminant analysis and logistic regression are techniques that can be used to classify entities of unknown origin into one of a number of groups. However, the underlying models and assumptions for application of the two techniques differ. In this study, the two techniques are compared with respect to classification of entities.

Firstly, the two techniques were compared in situations where no data-dependent variable selection took place. Several underlying distributions were studied: the normal distribution, the double exponential distribution and the lognormal distribution. The number of variables, the sample sizes from the different groups and the correlation structure between the variables were varied to obtain a large number of different configurations. The cases of two and three groups were studied. The most important conclusions are: for normal and double exponential data, linear discriminant analysis outperforms logistic regression, especially in cases where the ratio of the number of variables to the total sample size is large. For lognormal data, logistic regression should be preferred, except in cases where the ratio of the number of variables to the total sample size is large.

Variable selection is frequently the first step in statistical analyses. A large number of potentially important variables are observed, and an optimal subset has to be selected for use in further analyses. Although variable selection is often used, the influence of a selection step on further analyses of the same data is often completely ignored. An important aim of this study was to develop new selection techniques for use in discriminant analysis and logistic regression. New estimators of the post-selection error rate were also developed. A new selection technique, cross model validation (CMV), which can be applied in both discriminant analysis and logistic regression, was developed. This technique combines the selection of variables and the estimation of the post-selection error rate. It provides a method to determine the optimal model dimension, to select the variables for the final model and to estimate the post-selection error rate of the discriminant rule. An extensive Monte Carlo simulation study comparing the CMV technique to existing procedures in the literature was undertaken. In general, this technique outperformed the other methods, especially with respect to the accuracy of estimating the post-selection error rate.

Finally, pre-test type variable selection was considered. A pre-test estimation procedure was adapted for use as a selection technique in linear discriminant analysis. In a simulation study, this technique was compared to CMV and was found to perform well, especially with respect to correct selection. However, this technique is only valid for uncorrelated normal variables, and its applicability is therefore limited.

A numerically intensive approach was used throughout the study, since the problems that were investigated are not amenable to an analytical approach.
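
The abstract does not spell out the CMV algorithm, so the following is only a minimal sketch of the general idea it points to: if variable selection is repeated inside every cross-validation fold, the resulting error estimate reflects the selection step instead of ignoring it. Everything here is an assumption made for illustration, not the procedure developed in the thesis: the function names (`fit_lda`, `select_top_k`, `cv_error_with_selection`, `choose_dimension`) are hypothetical, ranking variables by a two-sample t-type statistic stands in for whatever selection rule is actually used, and only the two-group linear discriminant case is covered.

```python
import numpy as np

def fit_lda(X, y):
    """Two-group linear discriminant rule (pooled covariance, equal priors)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix.
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(np.atleast_2d(S), m1 - m0)
    c = w @ (m0 + m1) / 2.0  # cut-off halfway between the projected group means
    return w, c

def predict_lda(model, X):
    w, c = model
    return (X @ w > c).astype(int)

def select_top_k(X, y, k):
    """Illustrative selection rule: keep the k variables with the largest |t|."""
    X0, X1 = X[y == 0], X[y == 1]
    se = np.sqrt(X0.var(axis=0, ddof=1) / len(X0) + X1.var(axis=0, ddof=1) / len(X1))
    t = np.abs(X1.mean(axis=0) - X0.mean(axis=0)) / se
    return np.sort(np.argsort(t)[-k:])

def cv_error_with_selection(X, y, k, n_folds=5, seed=0):
    """Cross-validated error rate with selection redone on each training fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errors = 0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        # Selection sees only the training part, never the held-out cases.
        cols = select_top_k(X[train], y[train], k)
        model = fit_lda(X[train][:, cols], y[train])
        errors += int(np.sum(predict_lda(model, X[test][:, cols]) != y[test]))
    return errors / len(y)

def choose_dimension(X, y, max_k, n_folds=5, seed=0):
    """Pick the model dimension with the smallest cross-validated error."""
    errs = [cv_error_with_selection(X, y, k, n_folds, seed) for k in range(1, max_k + 1)]
    best_k = int(np.argmin(errs)) + 1
    # Final variables are selected on the full sample at the chosen dimension.
    return best_k, select_top_k(X, y, best_k), errs
```

A call such as `choose_dimension(X, y, max_k=5)` returns the chosen dimension, the indices of the selected variables and the error estimate for each candidate dimension. Note that the minimum of those estimates is itself somewhat optimistic, because the dimension was chosen to minimise it.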
[Publication date]  [Publishing institution] Stellenbosch University
[Level of authority]  [Subject classification]
[Keywords]  [Timeliness]