International Journal of Advanced Computer Science and Applications (IJACSA), Volume 15 Issue 8, 2024.
Abstract: Unbalanced data sets contain an unequal number of examples for different classes. Such data sets pose a problem for machine learning tools: when the imbalance ratio is high, the false negative rate increases because most classifiers are biased toward the majority class. Choosing the most informative evaluation metrics and applying sampling techniques are common ways to handle this problem. In this paper, a comparative analysis of four of the most common under-sampling techniques is conducted over data sets whose imbalance rates (IR) range from low to medium to high. A Decision Tree classifier and twelve imbalanced data sets with various IR are used to evaluate the effect of each technique in terms of recall, F1-measure, G-mean, recall for the minority class, and F1-measure for the minority class. Results demonstrate that Clusters Centroid outperformed Neighborhood Cleaning Rule (NCL) based on recall for all low-IR data sets. For both medium- and high-IR data sets, NCL and Random Under Sampling (RUS) outperformed the remaining techniques, while Tomek Link had the worst effect.
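To make the quantities in the abstract concrete, the following is a minimal, standard-library-only sketch of an imbalance ratio, random under-sampling, and the minority-class metrics. It is not the authors' implementation. The function names are hypothetical, the imbalance ratio is assumed to be the majority count divided by the minority count, and G-mean is assumed to be the common binary definition (square root of sensitivity times specificity); the abstract does not state either definition.

```python
import random
from collections import Counter
from math import sqrt


def imbalance_ratio(labels):
    """Assumed definition: majority-class count / minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_under_sample(rows, labels, seed=0):
    """Randomly drop examples until every class matches the smallest class."""
    rng = random.Random(seed)
    target = min(Counter(labels).values())
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def minority_metrics(y_true, y_pred, minority=1):
    """Recall and F1 for the minority class, plus the binary G-mean."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == minority and p == minority for t, p in pairs)
    fn = sum(t == minority and p != minority for t, p in pairs)
    fp = sum(t != minority and p == minority for t, p in pairs)
    tn = sum(t != minority and p != minority for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Assumed G-mean: geometric mean of sensitivity and specificity.
    return {"recall": recall, "f1": f1, "gmean": sqrt(recall * specificity)}


if __name__ == "__main__":
    # Toy data: 90 majority examples (label 0), 10 minority examples (label 1).
    labels = [0] * 90 + [1] * 10
    rows = list(range(100))
    print(imbalance_ratio(labels))
    _, balanced = random_under_sample(rows, labels)
    print(Counter(balanced))
    print(minority_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

Random under-sampling is the simplest of the four techniques the paper compares; the other three (Clusters Centroid, NCL, and Tomek Link) select which majority examples to remove using cluster or neighbourhood structure and are not sketched here.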
Esraa Abu Elsoud, Mohamad Hassan, Omar Alidmat, Esraa Al Henawi, Nawaf Alshdaifat, Mosab Igtait, Ayman Ghaben, Anwar Katrawi, and Mohmmad Dmour, “Under Sampling Techniques for Handling Unbalanced Data with Various Imbalance Rates: A Comparative Study,” International Journal of Advanced Computer Science and Applications (IJACSA), 15(8), 2024. http://dx.doi.org/10.14569/IJACSA.2024.01508124
@article{Elsoud2024,
title = {Under Sampling Techniques for Handling Unbalanced Data with Various Imbalance Rates: A Comparative Study},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2024.01508124},
url = {http://dx.doi.org/10.14569/IJACSA.2024.01508124},
year = {2024},
publisher = {The Science and Information Organization},
volume = {15},
number = {8},
author = {Esraa Abu Elsoud and Mohamad Hassan and Omar Alidmat and Esraa Al Henawi and Nawaf Alshdaifat and Mosab Igtait and Ayman Ghaben and Anwar Katrawi and Mohmmad Dmour}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.