Preconditioning for feature selection in classification
[Abstract] ENGLISH SUMMARY: Increased dimensionality of data is a clear trend that has been observed over the past few decades. However, analysing high-dimensional data in order to predict an outcome can be problematic. In certain cases, such as when analysing genomic data, a predictive model that is both interpretable and accurate is required. Many techniques pursue these two goals simultaneously; however, when the data are high-dimensional and noisy, such an approach may perform poorly. Preconditioning is a two-stage technique that aims to reduce the noise inherent in the training data before making final predictions. In doing so, it addresses the issues of interpretability and accuracy separately. The literature on this technique focuses on the regression case, but in this thesis the technique is applied in a classification setting. An overview of the theory surrounding this method is provided, as well as an empirical analysis of the method. A simulation study evaluates the performance of the technique under various scenarios and compares the results to those obtained by standard (non-preconditioned) models. Thereafter, the models are applied to real-world datasets and their performances are compared. Based on the results of the empirical work, it appears that, at their best, preconditioned classifiers can only reach a performance that is on par with standard classifiers. This is in contrast to the regression case, where the literature has shown that preconditioning can outperform standard regression models in high-dimensional settings.
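The two-stage idea described in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation: the summary does not say which models are used in either stage, so the choices here are assumptions made purely for illustration. Stage 1 builds a denoised ("preconditioned") label from the first principal component of a few screened features, and stage 2 fits an L1-penalised logistic regression to those labels to select features. The function names `precondition_labels` and `l1_logistic`, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np

def precondition_labels(X, y, n_screen=10):
    """Stage 1 (assumed form): denoise labels via a supervised principal component."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Screen features by absolute correlation with the observed labels.
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    keep = np.argsort(-np.abs(corr))[:n_screen]
    # Score each observation on the first principal component of the screened features.
    _, _, vt = np.linalg.svd(Xc[:, keep], full_matrices=False)
    score = Xc[:, keep] @ vt[0]
    # Orient the score so that larger values point towards class 1.
    if score[y == 1].mean() < score[y == 0].mean():
        score = -score
    # Threshold halfway between the class means to get preconditioned labels.
    cut = 0.5 * (score[y == 1].mean() + score[y == 0].mean())
    return (score > cut).astype(float)

def l1_logistic(X, y, lam=0.05, n_iter=500):
    """Stage 2 (assumed form): L1-penalised logistic regression by proximal gradient."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # Step size 1/L, where L bounds the curvature of the mean logistic loss.
    step = 4.0 * n / (np.linalg.norm(Xs, 2) ** 2)
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        z = np.clip(Xs @ w + b, -30.0, 30.0)
        prob = 1.0 / (1.0 + np.exp(-z))
        grad_w = Xs.T @ (prob - y) / n
        grad_b = np.mean(prob - y)
        # Gradient step followed by soft-thresholding (the L1 proximal operator).
        w = w - step * grad_w
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
        b = b - step * grad_b
    return w, b

# Synthetic high-noise data: only the first 5 of 50 features carry signal.
rng = np.random.default_rng(0)
n, p = 200, 50
latent = rng.normal(size=n)
X = rng.normal(size=(n, p))
X[:, :5] += 2.0 * latent[:, None]
y = (latent + 0.5 * rng.normal(size=n) > 0).astype(float)  # noisy observed labels

y_tilde = precondition_labels(X, y)   # stage 1: preconditioned labels
w, b = l1_logistic(X, y_tilde)        # stage 2: sparse fit on preconditioned labels
selected = np.flatnonzero(w)          # indices of the selected features
print("selected features:", selected)
```

A standard (non-preconditioned) baseline in this sketch would be `l1_logistic(X, y)`, fitted directly to the observed labels; comparing the two sets of selected features mirrors, in miniature, the comparison the summary describes.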
[Publication date] [Publishing institution] Stellenbosch University