Missing Data Imputation using Genetic Algorithm for Supervised Learning

Waseem Shahzad; Qamar Rehman; Ejaz Ahmed

doi:10.14569/IJACSA.2017.080360

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 8, Issue 3, 2017.

Abstract: Data is an important asset for any organization to run its business successfully. Collected data often suffers from quality problems such as noise, incompleteness, and missing values. If the quality of the data is low, the results of any data mining algorithm applied to it will also be low. In this paper, we propose a technique to deal with missing values. A genetic algorithm (GA) is used to estimate missing values in datasets: the GA generates optimal sets of missing values, and information gain (IG) is used as the fitness function to measure the performance of an individual solution. Our goal is to impute missing values in a dataset for better classification results. The technique works even better when the rate of missing values or incomplete information is higher and when the attributes/features with missing values have a greater number of distinct values. We compare the proposed technique with single imputation techniques and with statistically based multiple imputation (MI) approaches, using various benchmark classification techniques and different performance measures. We show that the proposed method outperforms other state-of-the-art missing data imputation techniques.

Keywords: genetic algorithm; information gain; missing data; supervised learning
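
The idea summarized in the abstract, a GA that searches over candidate fill-in values and scores each candidate by information gain, can be illustrated with a minimal sketch. The abstract does not give the chromosome encoding, the selection scheme, the crossover and mutation operators, or any parameter values, so everything below is an assumption made for illustration: one gene per missing cell of a single categorical attribute, truncation selection, one-point crossover, and random-reset mutation. The function names and default parameters are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one categorical attribute w.r.t. the class labels."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def impute_with_ga(values, labels, pop_size=30, generations=40,
                   mutation_rate=0.1, seed=0):
    """Fill None entries of `values` using a simple GA (illustrative only).

    Assumed design: a chromosome holds one candidate value per missing cell,
    and its fitness is the information gain of the completed attribute.
    """
    rng = random.Random(seed)
    missing = [i for i, v in enumerate(values) if v is None]
    domain = sorted({v for v in values if v is not None})
    if not missing:
        return list(values)

    def decode(chrom):
        # Write the chromosome's genes into the missing positions.
        filled = list(values)
        for idx, gene in zip(missing, chrom):
            filled[idx] = gene
        return filled

    def fitness(chrom):
        return information_gain(decode(chrom), labels)

    population = [[rng.choice(domain) for _ in missing] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection (assumed)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(0, len(missing))  # one-point crossover (assumed)
            child = a[:cut] + b[cut:]
            # Random-reset mutation (assumed): occasionally redraw a gene.
            child = [rng.choice(domain) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return decode(max(population, key=fitness))


# Toy usage: one attribute with three missing cells and a binary class label.
attribute = ["a", "a", None, "b", "b", None, "c", None]
classes = [0, 0, 0, 1, 1, 1, 1, 0]
completed = impute_with_ga(attribute, classes)
print(completed)
```

Because the fitness here is computed on one attribute at a time, the sketch only shows the general shape of the search; how the paper handles several attributes with missing values, numeric attributes, or the comparison against single and multiple imputation is not stated in the abstract.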

Waseem Shahzad, Qamar Rehman and Ejaz Ahmed, “Missing Data Imputation using Genetic Algorithm for Supervised Learning,” International Journal of Advanced Computer Science and Applications (IJACSA), 8(3), 2017. http://dx.doi.org/10.14569/IJACSA.2017.080360

@article{Shahzad2017,
title = {Missing Data Imputation using Genetic Algorithm for Supervised Learning},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2017.080360},
url = {http://dx.doi.org/10.14569/IJACSA.2017.080360},
year = {2017},
publisher = {The Science and Information Organization},
volume = {8},
number = {3},
author = {Waseem Shahzad and Qamar Rehman and Ejaz Ahmed}
}

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially as long as the original work is properly cited.
