The Science and Information (SAI) Organization
Article Details

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.

Hybrid Machine Learning-Based Approach for Anomaly Detection using Apache Spark

Author 1: Hanane Chliah
Author 2: Amal Battou
Author 3: Maryem Ait el hadj
Author 4: Adil Laoufi


Digital Object Identifier (DOI): 10.14569/IJACSA.2023.0140496

Article published in International Journal of Advanced Computer Science and Applications (IJACSA), Volume 14, Issue 4, 2023.


Abstract: Over the past few decades, the volume of data has increased significantly in both scientific institutions and universities, which enroll large numbers of students and hold a high volume of related data. Network traffic has also increased in the post-pandemic period with the use of online learning. As a result, processing network traffic data has become a complex and challenging task, and the possibility of intrusions and anomalies has grown. Traditional security systems cannot cope with such high-speed, high-volume traffic. A real-time anomaly detection system must process data as quickly as possible to detect abnormal and malicious data. This paper proposes a hybrid approach that combines supervised and unsupervised learning for anomaly detection, built on the big data engine Apache Spark. First, the k-means algorithm was implemented in Spark's MLlib to cluster network traffic; then, for each cluster, the k-nearest neighbors (KNN) algorithm was implemented for classification and anomaly detection. The proposed model was trained and validated against a real dataset from Ibn Zohr University. The results indicate that the proposed model outperformed other well-known algorithms in detecting anomalies on this dataset. The experimental results show that the proposed hybrid approach reaches up to 99.94% accuracy under k-fold cross-validation on the complete dataset with all 48 features.

Keywords: Anomaly detection; big data; Apache Spark; k-means; KNN
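
The two-stage idea in the abstract (cluster first, then classify within the matching cluster) can be illustrated with a small sketch. This is not the authors' implementation and does not use Spark or MLlib: it is a plain-Python stand-in on hypothetical 2-D toy data, with assumed labels "normal" and "anomaly", a deterministic k-means initialization (the first k points), and hypothetical function names (fit, predict).

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Basic k-means; the first k points serve as initial centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def fit(points, labels, k):
    """Stage 1: cluster the points. Then store labeled members per cluster."""
    centroids = kmeans(points, k)
    members = [[] for _ in range(k)]
    for p, y in zip(points, labels):
        members[nearest(p, centroids)].append((p, y))
    return centroids, members


def predict(model, point, n_neighbors=3):
    """Stage 2: majority vote among the nearest neighbors in the point's cluster."""
    centroids, members = model
    candidates = members[nearest(point, centroids)]
    if not candidates:  # fall back to every labeled point if the cluster is empty
        candidates = [m for group in members for m in group]
    closest = sorted(candidates, key=lambda m: dist(point, m[0]))[:n_neighbors]
    return Counter(y for _, y in closest).most_common(1)[0][0]


# Hypothetical toy data: two traffic groups, each with normal and anomalous points.
points = [
    (0.0, 0.0), (8.0, 8.0), (0.2, 0.1), (8.2, 8.1), (0.1, 0.3), (8.1, 7.9),
    (1.0, 1.0), (9.5, 9.5), (1.1, 0.9), (9.6, 9.4), (0.9, 1.2), (9.4, 9.7),
]
labels = ["normal"] * 6 + ["anomaly"] * 6

model = fit(points, labels, k=2)
for query in [(0.1, 0.1), (1.0, 1.1), (8.1, 8.0), (9.5, 9.6)]:
    print(query, predict(model, query))
```

Restricting the neighbor search to one cluster means each query is compared against only a fraction of the training points, which is the main reason to put a clustering step in front of KNN on large traffic datasets. In a Spark setting the clustering stage would typically come from MLlib's k-means rather than the hand-written loop used here.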

Hanane Chliah, Amal Battou, Maryem Ait el hadj and Adil Laoufi, “Hybrid Machine Learning-Based Approach for Anomaly Detection using Apache Spark,” International Journal of Advanced Computer Science and Applications (IJACSA), 14(4), 2023. http://dx.doi.org/10.14569/IJACSA.2023.0140496

@article{Chliah2023,
title = {Hybrid Machine Learning-Based Approach for Anomaly Detection using Apache Spark},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2023.0140496},
url = {http://dx.doi.org/10.14569/IJACSA.2023.0140496},
year = {2023},
publisher = {The Science and Information Organization},
volume = {14},
number = {4},
author = {Hanane Chliah and Amal Battou and Maryem Ait el hadj and Adil Laoufi}
}


© The Science and Information (SAI) Organization Limited. All rights reserved. Registered in England and Wales. Company Number 8933205. thesai.org