Performance of a genetic algorithm for mass spectrometry proteomics

[摘要]

Background

Recently, mass spectrometry data have been mined using a genetic algorithm to produce discriminatory models that distinguish healthy individuals from those with cancer. This algorithm is the basis for claims of 100% sensitivity and specificity in two related publicly available datasets. To date, no detailed attempts have been made to explore the properties of this genetic algorithm within proteomic applications. Here the algorithm's performance on these datasets is evaluated relative to other methods.

Results

In reproducing the method, some modifications of the algorithm as it is described are necessary to get good performance. After modification, a cross-validation approach to model selection is used. The overall classification accuracy is comparable though not superior to other approaches considered. Also, some aspects of the process rely upon random sampling and thus for a fixed dataset the algorithm can produce many different models. This raises questions about how to choose among competing models. How this choice is made is important for interpreting sensitivity and specificity results as merely choosing the model with lowest test set error rate leads to overestimates of model performance.

Conclusions

The algorithm needs to be modified to reduce variability and care must be taken in how to choose among competing models. Results derived from this algorithm must be accompanied by a full description of model selection procedures to give confidence that the reported accuracy is not overstated.

[发布日期] 2004-11-19 [发布机构]

[效力级别] [学科分类]

[关键词] [时效性]

浏览次数：2

统一登录查看全文激活码登录查看全文