Machine Learning Classification Techniques for Breast Cancer Diagnosis
[Abstract] Breast cancer is one of the most widespread diseases and the second leading cause of cancer death among women. It starts when malignant (cancerous) lumps begin to grow from breast cells. Doctors may wrongly diagnose a benign (non-cancerous) tumor as malignant. There is therefore a need for computer-aided detection (CAD) systems that use machine learning to provide accurate diagnosis of breast cancer. These CAD systems can aid in detecting breast cancer at an early stage. When breast cancer is detected early enough, the survival rate increases because better treatment can be provided. This paper investigates Support Vector Machine (with a radial basis function kernel), Artificial Neural Networks and Naïve Bayes classifiers on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The focus is to integrate these machine learning techniques with feature selection and feature extraction methods and to compare their performance in order to identify the most suitable approach, combining the advantages of dimensionality reduction and machine learning. The paper proposes a hybrid approach for breast cancer diagnosis that reduces the high dimensionality of the features using linear discriminant analysis (LDA) and then trains a Support Vector Machine on the reduced feature set. The proposed approach obtained an accuracy of 98.82%, a sensitivity of 98.41%, a specificity of 99.07% and an area under the receiver operating characteristic curve of 0.9994.
[Publication date] [Publishing institution] Curtin University, CDT 250, Sarawak, Miri; 98009, Malaysia^1;Curtin University, Kent St, Bentley; WA; 6102, Australia^2
[Validity level] [Subject classification] Industrial Engineering
[Keywords] Breast Cancer;Computer aided detection systems;Dimensionality reduction;Linear discriminant analysis;Machine learning approaches;Machine learning classification;Machine learning techniques;Receiver operating characteristic curves [Timeliness]
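
The abstract reports accuracy, sensitivity and specificity without defining them. The sketch below shows how these three figures are conventionally derived from a binary confusion matrix, assuming "malignant" is treated as the positive class (the record does not state which class the paper used). The class name ConfusionCounts, the function diagnostic_metrics and the counts in the example are hypothetical and chosen only for illustration; they are not the paper's data or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (assumption: malignant = positive class)."""
    tp: int  # malignant cases predicted malignant
    fn: int  # malignant cases predicted benign (missed cancers)
    tn: int  # benign cases predicted benign
    fp: int  # benign cases predicted malignant (false alarms)


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, sensitivity and specificity as fractions in [0, 1]."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        # Share of all cases that were classified correctly.
        "accuracy": (c.tp + c.tn) / total,
        # Share of malignant cases that were caught (true positive rate).
        "sensitivity": c.tp / (c.tp + c.fn),
        # Share of benign cases that were correctly cleared (true negative rate).
        "specificity": c.tn / (c.tn + c.fp),
    }


# Hypothetical counts for illustration only; NOT taken from the paper.
example = ConfusionCounts(tp=48, fn=2, tn=95, fp=5)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

In a diagnostic setting the two error types are not equally costly: low sensitivity means missed cancers, while low specificity means benign tumors flagged as malignant, which is the misdiagnosis the abstract is concerned with. Reporting both alongside accuracy keeps either failure from being hidden by class imbalance.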