Smoothing target encoding and class center-based firefly algorithm for handling missing values in categorical variable
[Abstract] One of the most common causes of incompleteness is missing data, which occurs when no value is stored for a variable in an observation. An adaptive model based on the class center-based Firefly algorithm, which incorporates attribute correlations into the imputation process (C3FA), was previously developed and outperformed other numerical methods in classification problems. However, this model has not been tested on categorical data, whose handling is essential in the preprocessing stage. Encoding converts text or Boolean values in categorical data into numeric parameters, and the target encoding method is often used for this purpose. Target encoding uses information from the target variable to encode categorical data, and it carries the risk of overfitting and of inaccurate estimates for infrequent categories. This study uses the smoothing target encoding (STE) method to perform the imputation process by combining C3FA with the standard deviation (STD), and compares the result with several other imputation methods. On the Tic-Tac-Toe dataset, the proposed method (C3FA-STD) achieved AUC, CA, F1-score, precision, and recall values of 0.939, 0.882, 0.881, 0.881, and 0.882, respectively, when evaluated with the kNN classifier.
[Publication date] 2022-12-25 [Publishing institution] 
[Validity level]  [Subject classification] 
[Keywords] Missing data; Encoding; Smoothing; Firefly algorithm; Class center [Timeliness] 
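The abstract describes smoothing target encoding only in words, so the following is a minimal sketch of one common form of the idea, not the formula used in the paper. It assumes the m-estimate variant, in which each category's target mean is blended with the global target mean, so that infrequent categories are pulled toward the global mean. The function name `smoothing_target_encode`, the parameter `m`, and the toy data are all hypothetical, and skipping missing entries (`None`) so that they can be imputed in a later step is likewise an assumption.

```python
from collections import defaultdict


def smoothing_target_encode(categories, targets, m=10.0):
    """Map each observed category to a smoothed mean of the target.

    Assumed m-estimate form (not taken from the paper):
        encoded(c) = (sum of targets in c + m * global_mean) / (count of c + m)
    A larger m shrinks rare categories more strongly toward the global mean.
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")

    # Assumption: missing category values (None) are ignored here and
    # left for a separate imputation step.
    observed = [(c, y) for c, y in zip(categories, targets) if c is not None]
    if not observed:
        raise ValueError("no observed category values")

    global_mean = sum(y for _, y in observed) / len(observed)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in observed:
        sums[c] += y
        counts[c] += 1

    return {c: (sums[c] + m * global_mean) / (counts[c] + m) for c in counts}


# Hypothetical toy data: one board cell and a binary outcome.
cells = ["x", "x", "x", "o", None]
wins = [1, 1, 0, 1, 0]
encoding = smoothing_target_encode(cells, wins, m=2.0)
```

With `m=0` the function reduces to plain target encoding, which is the setting in which a category seen only once receives its single target value as its code; raising `m` trades that variance for bias toward the global mean.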