Analysis of Machine Learning Based Imputation of Missing Data
Cybernetics and Systems (IF 1.7), Pub Date: 2023-09-09, DOI: 10.1080/01969722.2023.2247257
Syed Tahir Hussain Rizvi 1 , Muhammad Yasir Latif 2 , Muhammad Saad Amin 3 , Achraf Jabeur Telmoudi 4 , Nasir Ali Shah 5

Abstract

Data analysis and classification can be degraded by the presence of missing data in datasets. Missing data are typically handled with either deletion-based or imputation-based methods, which respectively reduce the number of data records or risk imputing incorrectly predicted values. The quality of imputed data can be significantly improved if missing values are generated accurately using machine learning (ML) algorithms. In this work, an analysis of ML-based algorithms for missing data imputation is performed. The K-nearest neighbors (KNN) and Sequential KNN (SKNN) algorithms are used to impute missing values in datasets. Datasets processed with a statistical deletion approach (List-wise Deletion (LD)) and with the ML-based imputation methods (KNN and SKNN) are then tested and compared using different ML classifiers (Support Vector Machine (SVM) and Decision Tree (DT)) to evaluate the effectiveness of the imputed data. The methods are compared in terms of classification accuracy, and the results show that the ML-based imputation method SKNN outperforms both the LD-based approach and the KNN method in almost every dataset with both classification algorithms (SVM and DT).
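The three ways of handling missing values named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the `None` marker for missing values, the distance measure, and the toy records are all assumptions made for illustration. The sequential variant follows the commonly described SKNN idea (impute the records with the fewest gaps first and reuse each imputed record as a neighbor candidate), which the abstract itself does not spell out. The classification step with SVM and DT is not shown.

```python
import math

def listwise_delete(rows):
    """List-wise Deletion: drop every record with at least one missing value."""
    return [r for r in rows if None not in r]

def _distance(a, b):
    """Root-mean-square difference over features observed in both records."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def _fill(row, pool, k):
    """Fill the gaps in `row` with the mean of its k nearest records in `pool`.

    Assumes `pool` is non-empty and contains only complete records.
    """
    neighbours = sorted(pool, key=lambda p: _distance(row, p))[:k]
    filled = list(row)
    for j, value in enumerate(row):
        if value is None:
            filled[j] = sum(n[j] for n in neighbours) / len(neighbours)
    return filled

def knn_impute(rows, k=2):
    """KNN imputation: neighbours come only from originally complete records."""
    pool = listwise_delete(rows)
    return [_fill(r, pool, k) if None in r else list(r) for r in rows]

def sknn_impute(rows, k=2):
    """Sequential KNN (assumed variant): impute records with the fewest
    missing values first and add each imputed record to the neighbour pool."""
    pool = listwise_delete(rows)
    result = [list(r) for r in rows]
    incomplete = [i for i, r in enumerate(rows) if None in r]
    for i in sorted(incomplete, key=lambda i: rows[i].count(None)):
        result[i] = _fill(rows[i], pool, k)
        pool.append(result[i])
    return result

# Hypothetical toy records; None marks a missing value.
records = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, None],
    [5.0, 6.0, 7.0],
    [5.0, None, None],
    [1.2, 2.2, 3.4],
]

print(len(listwise_delete(records)))  # deletion shrinks the dataset
print(knn_impute(records))            # imputation keeps every record
print(sknn_impute(records))
```

List-wise deletion discards two of the five toy records, while both imputation variants keep all five. In the study's setup, each resulting dataset would then be passed to a classifier and the methods compared by accuracy.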




Updated: 2023-09-10