Feature selection approach using ensemble learning for network anomaly detection
[Abstract] Feature selection is essential for prioritising important attributes in data to improve prediction quality in machine learning algorithms. Because different selection techniques identify different feature sets, relying on a single method may result in risky decisions. The authors propose an ensemble approach that uses union and quorum combination techniques with five primary individual selection methods: analysis of variance, variance threshold, sequential backward search, recursive feature elimination, and least absolute shrinkage and selection operator (LASSO). The proposed method reduces features in three rounds: (i) redundant features are discarded using pairwise correlation, (ii) each individual method selects its own feature set independently, and (iii) the individual feature sets are equalised. The equalised individual feature sets are then combined using the union and quorum techniques. Both the combined and the individual sets are tested for network anomaly detection using random forest, decision tree, K-nearest neighbours, Gaussian Naive Bayes, and logistic regression classifiers. The experimental results on the UNSW-NB15 data set show that random forest with the union and quorum feature sets yields F1 scores of 99% and 99.02% with a minimum of 6 and 12 features, respectively. The results on the NSL-KDD data set show that random forest with union and quorum achieves F1 scores of 99.34% and 99.21% with a minimum of 28 and 18 features, respectively.
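The union and quorum combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-method feature lists, and the default quorum (a strict majority of the selectors) are all assumptions made for illustration, since the abstract does not state the quorum threshold.

```python
from collections import Counter


def union_combine(feature_sets):
    """Keep every feature chosen by at least one selection method."""
    combined = set()
    for features in feature_sets:
        combined |= set(features)
    return sorted(combined)


def quorum_combine(feature_sets, quorum=None):
    """Keep features chosen by at least `quorum` selection methods.

    Assumption: when no quorum is given, a strict majority is used.
    """
    feature_sets = [set(features) for features in feature_sets]
    if quorum is None:
        quorum = len(feature_sets) // 2 + 1
    # Count how many methods voted for each feature.
    votes = Counter(f for features in feature_sets for f in features)
    return sorted(f for f, n in votes.items() if n >= quorum)


# Hypothetical, equal-sized outputs of the five individual methods.
# The feature names are placeholders, not results from the paper.
selected = {
    "anova": ["sbytes", "dbytes", "sttl"],
    "variance_threshold": ["sbytes", "sttl", "rate"],
    "sequential_backward_search": ["sbytes", "dbytes", "dttl"],
    "recursive_feature_elimination": ["sbytes", "sttl", "dttl"],
    "lasso": ["sbytes", "rate", "sttl"],
}

union_features = union_combine(selected.values())
quorum_features = quorum_combine(selected.values())

print("union:", union_features)
print("quorum:", quorum_features)
```

In this sketch the union keeps any feature that at least one method selected, so it tends to be the larger set, while the quorum keeps only features that enough methods agree on. Setting `quorum=1` reduces the quorum rule to the union.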
[Publication date] [Publisher]
[Validity level] [Subject classification] Mathematics (general)
[Keywords] feature selection;decision trees;data mining;pattern classification;security of data;support vector machines;Bayes methods;feature extraction;regression analysis;random forests;machine learning algorithms;primary individual selection methods;recursive feature elimination;absolute selection;shrinkage operator;discard redundant features;individual methods;equalise individual feature sets;equalised individual feature sets;quorum techniques;network anomaly detection;random forest;quorum feature;feature selection approach;ensemble learning [Timeliness]