International Journal of Advanced Computer Science and Applications (IJACSA), Volume 9 Issue 8, 2018.
Abstract: Big data processing requires an extremely powerful and large computing setup. This creates a bottleneck in processing infrastructure and also leaves many researchers without the freedom to analyze large datasets. This paper therefore analyzes the processing of large amounts of data using machine-learned models that are built on smaller sets of data samples. The work analyzes more than 40 GB of data, testing different strategies for reducing the processed data without compromising detection and model learning. Many alternatives are analyzed, and it is observed that a 50% reduction does not drastically harm machine learning model performance. On average, performance is reduced by only 3.6% for SVM and only 1.8% for Random Forest when only 50% of the data is used. The 50% reduction in instances means that, in most cases, the data will fit in RAM and processing times will be considerably reduced, with benefits in execution time and/or resources. The incremental training and testing experiments show that, in special cases, smaller sub-sampled data can be used for model generation in machine learning problems. This is useful where hardware is limited or where one has to select among many available machine learning algorithms.
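The comparison described in the abstract (train on all instances, train on a 50% sub-sample, compare performance on the same test set) can be illustrated with a minimal sketch. This is not the authors' code or experimental setup. It assumes synthetic two-class Gaussian data, class-stratified random sampling, and a simple nearest-centroid classifier as a stand-in for SVM and Random Forest. All function names (`stratified_subsample`, `fit_centroids`, `make_data`) are hypothetical and introduced only for illustration.

```python
import random
from collections import defaultdict


def stratified_subsample(X, y, fraction, seed=0):
    """Keep roughly `fraction` of the instances of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    keep = []
    for idx in by_class.values():
        k = max(1, round(len(idx) * fraction))
        keep.extend(rng.sample(idx, k))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def fit_centroids(X, y):
    """Stand-in model (assumption): one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for j, value in enumerate(row):
            sums[label][j] += value
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
    )


def accuracy(centroids, X, y):
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)


def make_data(n_per_class, seed):
    """Synthetic data (assumption): two 2-D Gaussian classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y


X_train, y_train = make_data(500, seed=1)
X_test, y_test = make_data(200, seed=2)

# Halve the training instances while preserving the class balance.
X_half, y_half = stratified_subsample(X_train, y_train, 0.5, seed=3)

acc_full = accuracy(fit_centroids(X_train, y_train), X_test, y_test)
acc_half = accuracy(fit_centroids(X_half, y_half), X_test, y_test)
print(f"full: {acc_full:.3f}  half: {acc_half:.3f}  drop: {acc_full - acc_half:+.3f}")
```

The printed drop is specific to this toy data and should not be read as a reproduction of the 3.6% and 1.8% figures reported for SVM and Random Forest. The sketch only shows the shape of the experiment: the same test set is scored against a model trained on all instances and a model trained on half of them.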
Waleed Albattah and Rehan Ullah Khan, “Processing Sampled Big Data”, International Journal of Advanced Computer Science and Applications (IJACSA), 9(8), 2018. http://dx.doi.org/10.14569/IJACSA.2018.090846
@article{Albattah2018,
title = {Processing Sampled Big Data},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2018.090846},
url = {http://dx.doi.org/10.14569/IJACSA.2018.090846},
year = {2018},
publisher = {The Science and Information Organization},
volume = {9},
number = {8},
author = {Waleed Albattah and Rehan Ullah Khan}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.