Classification of Non-Small Cell Lung Cancer Based on Gene Expression in Cases of Smokers and Non-Smokers using Ensemble Methods with Statistical Based Feature Selection
[Abstract] Lung cancer is one of the leading causes of death globally. One of the main risk factors for lung cancer is smoking, which causes more than 90% of lung cancer cases. There are two types of lung cancer, Small Cell Lung Cancer (SCLC) and Non-Small Cell Lung Cancer (NSCLC), of which the latter is the more common. One way to detect cancer is to apply machine learning to gene expression data, an approach that promises good classification performance. This study aimed to predict the presence of NSCLC from gene expression, i.e., to classify each sample as NSCLC or normal. We used three datasets, GSE10072, GSE19804, and GSE19188, which relate to cases of NSCLC in smokers and non-smokers. Prediction was carried out using six ensemble methods: Random Forest, Adaptive Boosting, Extra Tree, Gradient Boosting, Extreme Gradient Boosting, and Categorical Boosting. Feature selection was carried out by scoring the relationship between each feature and the target using statistical measures: ANOVA, Mutual Information (MI), and a combination of ANOVA and MI. The resulting prediction models outperformed related studies on two similar datasets, with accuracies of 100%, 97.22%, and 100% on the GSE10072, GSE19804, and GSE19188 datasets, respectively.
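The pipeline summarized in the abstract (statistical feature selection followed by an ensemble classifier) might look roughly like the sketch below, written with scikit-learn. Several parts are assumptions rather than the authors' method: the data are synthetic (from `make_classification`) instead of the GEO datasets, the "combination" of ANOVA and MI is taken here to be the union of the top-k features under each score, and `select_features`, `k`, and all hyperparameters are hypothetical. Only Random Forest is shown; the other five ensemble methods would slot in the same way.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.model_selection import train_test_split


def select_features(X, y, k=20, seed=0):
    """Return sorted indices of features in the top k by ANOVA F or by MI.

    Taking the union of the two rankings is an assumption; the abstract
    does not say how ANOVA and MI are combined.
    """
    f_scores, _ = f_classif(X, y)
    f_scores = np.nan_to_num(f_scores)  # constant features yield NaN scores
    mi_scores = mutual_info_classif(X, y, random_state=seed)
    top_anova = set(np.argsort(f_scores)[-k:].tolist())
    top_mi = set(np.argsort(mi_scores)[-k:].tolist())
    return sorted(top_anova | top_mi)


# Synthetic stand-in for an expression matrix: rows are samples, columns are
# genes, labels are 1 (tumor) or 0 (normal). Not real GEO data.
X, y = make_classification(
    n_samples=120, n_features=200, n_informative=10,
    n_redundant=0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Score features on the training split only, so the held-out samples
# do not influence which features are kept.
selected = select_features(X_train, y_train, k=20)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train[:, selected], y_train)
accuracy = model.score(X_test[:, selected], y_test)
print(f"{len(selected)} features kept, held-out accuracy = {accuracy:.3f}")
```

Selecting features inside the training split is a deliberate choice in this sketch: with thousands of genes and few samples, ranking features on the full dataset before splitting tends to inflate the reported accuracy.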
[Subject classification] Computer Science (General)
[Keywords] Lung Cancer;NSCLC;Gene Expression;Ensemble Methods;Smoking