The Science and Information (SAI) Organization
IJACSA Volume 11 Issue 11

Copyright Statement: This is an open access publication licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.

View Full Issue

Paper 1: Classification of Imbalanced Datasets using One-Class SVM, k-Nearest Neighbors and CART Algorithm

Abstract: In this paper, a new algorithm, the OKC classifier, is proposed as a hybrid of the One-Class SVM, k-Nearest Neighbors, and CART algorithms. The performance of most classification algorithms is significantly influenced by characteristics of the datasets on which they are modeled, such as imbalance in class distribution, class overlap, and lack of density. The proposed algorithm can perform the classification task on imbalanced datasets without re-sampling. The algorithm is compared against several well-known classification algorithms on datasets having varying degrees of class imbalance and class overlap. The experimental results demonstrate that the proposed algorithm performs better than a number of standard classification algorithms.

Author 1: Maruthi Rohit Ayyagari

Keywords: SVM; k-NN; CART; OKC; classification; machine learning

PDF

Paper 2: Involving American Schools in Enhancing Children’s Digital Literacy and Raising Awareness of Risks Associated with Internet Usage

Abstract: The purpose of this study is to shed light on the importance of educating students on digital literacy and netiquette, as technology has become a common denominator in most of our tasks. The study is chiefly concerned with involving schools in this education, since students spend most of their time in school. The paper expresses the urgency of increasing the amount of digital literacy taught in schools to help raise students’ awareness of the potential risks the internet poses. It breaks down the risks that young users are prone to face as well as ways to safely avoid them. Further, the paper analyzes the state standards practiced in the US to serve as a wake-up call for schools to improve their standards and protect young users from these varied harms. Schools are therefore urged to take on the role of enhancing students’ digital literacy and their understanding of the potential risks present online.

Author 1: Mohammed Tawfik Hussein
Author 2: Reem M. Hussein

Keywords: Digital literacy; e-learning; internet risks; online education and safety

PDF

Paper 3: Detecting Spam in Twitter Microblogging Services: A Novel Machine Learning Approach based on Domain Popularity

Abstract: Detecting malicious Internet activities has been and continues to be a critical issue that needs to be addressed effectively. This is essential to protect our personal information, computing resources, and financial capital from unsolicited actions such as credential theft, downloading and installing malware, and extortion. The introduction of social media platforms such as Twitter has given malicious users a new and promising platform to perform their activities, ranging from a simple spam message to taking full control over the victim’s machine. Twitter has revealed that its algorithms for detecting spam are not very effective; most trending hashtags include unrelated spam and advertising tweets, which indicates a problem with the currently used spam detection framework. This paper proposes a new approach for detecting spam in Twitter microblogging using Machine Learning (ML) techniques and domain popularity services. The proposed approach comprises two main stages: 1) Tweets are collected periodically and filtered by selecting the ones that appear more frequently than a chosen threshold in the specified period (i.e. common tweets). An inspection is then conducted on the common tweets by checking the associated URL domain against Alexa’s top one million globally viewed websites; if a tweet is common on Twitter but its domain does not appear in the top one million, it is flagged as potential spam. 2) The second stage runs ML algorithms on the flagged tweets to extract features that help detect the cluster of spam and prevent it in real time. The performance of the proposed approach has been evaluated using three popular classification models (random forest, J48, and Naïve Bayes). For all classifiers, results showed the effectiveness of the proposed method in terms of different performance metrics (e.g. precision, sensitivity, F1-score, accuracy) and using different test scenarios. (A sketch of the first-stage filtering appears after this entry.)

Author 1: Khalid Binsaeed
Author 2: Gianluca Stringhini
Author 3: Ahmed E. Youssef

Keywords: Spam detection; phishing detection; domain popularity; machine learning; Twitter

PDF
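The snippet below is a minimal sketch of the stage-1 filtering idea described in Paper 3 (frequent tweets whose linked domain is not among a set of popular domains get flagged for the ML stage). The field names, the threshold value, and the data layout are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from urllib.parse import urlparse

def flag_potential_spam(tweets, top_domains, min_count=50):
    """tweets: list of dicts with 'text' and 'url' keys (assumed format).
    top_domains: set of popular domains (e.g. loaded from a top-1M list).
    Returns tweets that repeat often in the window but link to unpopular domains."""
    counts = Counter(t["text"] for t in tweets)           # how often each tweet text repeats
    flagged = []
    for t in tweets:
        if counts[t["text"]] < min_count:                 # keep only "common" tweets
            continue
        domain = urlparse(t["url"]).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        if domain and domain not in top_domains:          # common tweet, unpopular domain
            flagged.append(t)                             # candidate spam for the ML stage
    return flagged
```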

Paper 4: Critical Success Factors on the Implementation of ERP Systems: Building a Theoretical Framework

Abstract: The pressure of confronting a radically changing external environment has led many companies to invest in various information systems, such as Enterprise Resource Planning (ERP), in order to optimize their production processes and strategies. Despite the fact that an ERP system is an important strategic tool, many companies fail to take advantage of its benefits because of shortcomings in many aspects of management and implementation. This study aims to investigate the critical success factors of ERP system implementation and to build a categorization framework, creating a theoretical base that supports further research in various sectors of the economy. Accordingly, 37 ERP critical success factors were identified using the Content Analysis method and classified into categories related to the ERP implementation orientations and the ERP life cycle phases. Finally, these two types of categorization were merged in order to examine the critical success factors’ behavior during ERP implementation. This paper, and the multilateral theoretical framework it creates, sets out how critical success factors must be taken into account by companies and marks a starting point for a sequence of further research in particular economic sectors or in a set of them. By fulfilling this purpose, the study offers a significant contribution to the computer science literature and especially to the ERP field.

Author 1: Asimina Kouriati
Author 2: Thomas Bournaris
Author 3: Basil Manos
Author 4: Stefanos A. Nastis

Keywords: Enterprise resource planning (ERP); ERP implementation; critical success factors (CSFs); content analysis (CA); categorization; theoretical framework

PDF

Paper 5: Autoencoder based Semi-Supervised Anomaly Detection in Turbofan Engines

Abstract: This paper proposes a semi-supervised autoencoder-based approach for the detection of anomalies in turbofan engines. The data used in this research are generated through simulation of turbofan engines using a tool known as the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS). C-MAPSS allows users to simulate various operational settings, environmental conditions, and control settings by varying input parameters. The optimal autoencoder architecture is discovered using a Bayesian hyperparameter tuning approach. The autoencoder with the optimal architecture is trained on data representing the normal behavior of the turbofan engines in the training set, and the performance of the trained model is then tested on the engines in the test set. To study the effect of removing redundant features, two approaches are implemented and tested: with and without redundant feature removal. The performance of the proposed models is evaluated using metrics such as F1-score, precision, and recall. The results show that the best performance is achieved when the autoencoder is used without redundant feature removal. (A minimal sketch of this train-on-normal, flag-by-reconstruction-error scheme follows this entry.)

Author 1: Ali Al Bataineh
Author 2: Aakif Mairaj
Author 3: Devinder Kaur

Keywords: Anomaly detection; autoencoder; bayesian hyperparameter tuning; turbofan engine

PDF
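Below is a minimal sketch of the semi-supervised scheme outlined in Paper 5: train an autoencoder only on normal data, then flag samples with high reconstruction error. The layer sizes, the 99th-percentile threshold rule, and the synthetic data are illustrative assumptions; the paper's Bayesian-tuned architecture is not reproduced.

```python
import numpy as np
from tensorflow.keras import layers, models

def build_autoencoder(n_features):
    inp = layers.Input(shape=(n_features,))
    h = layers.Dense(16, activation="relu")(inp)            # encoder
    z = layers.Dense(4, activation="relu")(h)                # bottleneck
    h = layers.Dense(16, activation="relu")(z)               # decoder
    out = layers.Dense(n_features, activation="linear")(h)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

# X_normal: sensor readings from healthy engine cycles; X_test: unseen cycles (placeholders).
X_normal = np.random.rand(1000, 21).astype("float32")
X_test = np.random.rand(200, 21).astype("float32")

ae = build_autoencoder(X_normal.shape[1])
ae.fit(X_normal, X_normal, epochs=20, batch_size=64, verbose=0)

train_err = np.mean((ae.predict(X_normal) - X_normal) ** 2, axis=1)
test_err = np.mean((ae.predict(X_test) - X_test) ** 2, axis=1)    # reconstruction error
threshold = np.percentile(train_err, 99)                           # assumed threshold rule
is_anomaly = test_err > threshold                                  # flagged cycles
```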

Paper 6: Definition of Unique Objects by Convolutional Neural Networks using Transfer Learning

Abstract: This article addresses the problem of detecting medical masks on a person's face. A medical mask is one of the most effective measures to prevent infection with COVID-19, and its automatic detection is a relevant task. Introducing automatic recognition of medical masks into existing security systems makes it possible to quickly identify violators of the mask requirement, which in turn increases safety during a pandemic. The article provides a detailed analysis of existing solutions for face detection and automatic recognition of medical masks, and a method based on convolutional neural networks is proposed. A distinctive feature of the new method is the use of two neural networks: the RetinaFace architecture at the face detection stage and the ResNet architecture at the mask recognition stage. It is shown that transfer learning with weights pretrained on face data significantly accelerates training and increases recognition accuracy. However, with this approach there are some false positives, for example when a person covers the face with their hands, imitating a medical mask. Based on the study, we conclude that the algorithm is applicable in a security system to determine the presence or absence of a medical mask on a person's face, and that additional research is needed to address the algorithm's false positives. (A rough transfer-learning sketch follows this entry.)

Author 1: Rusakov K. D
Author 2: Seliverstov D.E
Author 3: Osipov V.V
Author 4: Reshetnikov V.N

Keywords: Recognition of medical masks; COVID-19; convolutional neural networks; RetinaFace; ResNet

PDF
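A rough sketch of the mask-classification stage described in Paper 6: a ResNet backbone with pretrained weights, fine-tuned for a two-class (mask / no mask) problem. The choice of resnet18, the frozen backbone, and the random tensors standing in for RetinaFace face crops are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained backbone (older torchvision versions use models.resnet18(pretrained=True)).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                          # freeze transferred weights
model.fc = nn.Linear(model.fc.in_features, 2)        # new head: mask vs. no mask

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# face_batch: cropped, normalized face images assumed to come from a face detector.
face_batch = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
logits = model(face_batch)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```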

Paper 7: Prelaunch Matching Architecture for Distributed Intelligent Image Recognition

Abstract: The paper presents a multi-agent solution for the dynamic combination of several artificial neural networks used for image recognition. In contrast to existing methods, a dispatcher agent is introduced that provides prelaunch matching of candidate identification algorithms through competition. The proposed solution was implemented to solve a problem of stream processing of photo images produced by a number of distributed cameras using an intelligent mobile application. It was piloted and used in practice to capture readings of electricity meters that are manually monitored by patrol inspectors using handheld devices. The prelaunch matching architecture increased the quality of digit recognition by various neural networks depending on the operating conditions.

Author 1: Anton Ivaschenko
Author 2: Arkadiy Krivosheev
Author 3: Pavel Sitnikov

Keywords: Multi-agent technology; artificial neural networks; image recognition; electricity meter data processing

PDF

Paper 8: STEM-Technology Example of the Computational Problem of a Chain on a Cylinder

Abstract: An application of STEM technology to the computational problem of determining the parameters of a closed chain (with and without load) thrown over a horizontal cylinder is considered. The numerical solution is found, and its graphical interpretation is produced, by compiling a system of transcendental equations and carrying out numerical optimization with constraints. An approximating analytical dependence is determined using fitting functions. In the process of solving the problem, a number of concepts from mathematics, physics, and computer science are examined. Some possibilities of using specialized mathematical packages (in particular, Mathcad) and of working on online platforms are shown. Additional problem options for using STEM technology are presented.

Author 1: Valery Ochkov
Author 2: Konstantin Orlov
Author 3: Evgeny Barochkin
Author 4: Inna Vasileva
Author 5: Evgeny Nikulchev

Keywords: STEM technology; math education; closed chain; Mathcad

PDF

Paper 9: Recursive Least Square: RLS Method-Based Time Series Data Prediction for Many Missing Data

Abstract: Prediction methods based on the Recursive Least Squares (RLS) method are proposed for time series data with many missing values. Kalman filter parameter estimation involves two parameter tuning algorithms: the time update and the measurement update. Two learning methods for Kalman filter parameter estimation are proposed based on the RLS method: one without the measurement update algorithm (RLS-1) and one without both the time and measurement update algorithms (RLS-2). The methods are applied to Defense Meteorological Satellite Program (DMSP) / Special Sensor Microwave/Imager (SSM/I) time series data containing a large amount of missing data. The proposed RLS-2 method is found to show smooth and fast convergence in the learning process in comparison to RLS-1. (A generic RLS update is sketched after this entry.)

Author 1: Kohei Arai
Author 2: Kaname Seto

Keywords: Special Sensor Microwave/Imager (SSM/I); Defense Meteorological Satellite Program (DMSP); Kalman filter; Recursive Least Square (RLS) method; missing data; parameter estimation

PDF
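For orientation, this is the generic recursive least squares update with a forgetting factor, the kind of recursion Paper 9 builds on; the paper's specific RLS-1 / RLS-2 variants and the SSM/I data handling are not reproduced here, and the toy data are placeholders.

```python
import numpy as np

def rls_update(theta, P, phi, y, lam=0.99):
    """One recursive least squares step.
    theta: parameter estimate (n,), P: inverse correlation matrix (n, n),
    phi: regressor vector (n,), y: new scalar observation, lam: forgetting factor."""
    phi = phi.reshape(-1, 1)
    denom = float(lam + phi.T @ P @ phi)
    K = (P @ phi) / denom                                # gain vector (n, 1)
    err = y - float(phi.T @ theta.reshape(-1, 1))        # prediction error
    theta = theta + K.ravel() * err                      # parameter update
    P = (P - K @ phi.T @ P) / lam                        # covariance update
    return theta, P

# Toy usage: estimate the weights of a 3-tap linear predictor online.
rng = np.random.default_rng(0)
true_w = np.array([0.5, -0.2, 0.1])
theta, P = np.zeros(3), np.eye(3) * 1000.0
for _ in range(500):
    phi = rng.normal(size=3)
    y = phi @ true_w + 0.01 * rng.normal()
    theta, P = rls_update(theta, P, phi, y)
print(theta)   # should approach true_w
```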

Paper 10: Mapping Linguistic Variations in Colloquial Arabic through Twitter

Abstract: Recent years have witnessed the development of different computational approaches to the study of linguistic variation and regional dialectology in languages including English, German, Spanish, and Chinese. These approaches have proved effective in dealing with large corpora and making reliable generalizations about the data. In Arabic, however, much of the work on regional dialectology is so far based on traditional methods; it is therefore difficult to provide a comprehensive mapping of the dialectal variation of all the colloquial dialects of Arabic. Thus, this study proposes a computational statistical model for mapping linguistic variation and regional dialectology in Colloquial Arabic through Twitter, based on the lexical choices of speakers. The aim is to explore lexical patterns for generating regional dialect maps derived from Twitter users. The study is based on a corpus of 1,597,348 geolocated Twitter posts. Using principal component analysis (PCA), the data were classified into distinct classes and the lexical features of each class were identified. Results indicate that the lexical choices of Twitter users can be used effectively for mapping regional dialect variation in Colloquial Arabic.

Author 1: Abdulfattah Omar
Author 2: Hamza Ethleb
Author 3: Mohamed Elarabawy Hashem

Keywords: Colloquial Arabic; computational statistical model; lexical patterns; linguistic mapping; principal component analysis (PCA)

PDF

Paper 11: Implementation of Text Base Information Retrieval Technique

Abstract: Everyone needs accurate and efficient information retrieval in no time. Search engines are the main source for extracting the required information when a user searches a query and wants results. Different search engines provide different Application Programming Interfaces (APIs) and libraries that let researchers and programmers access the data stored on the search engines' servers. When a researcher or programmer searches a query using an API, it returns a JavaScript Object Notation (JSON) file. Information is encapsulated in this JSON file, and scraping techniques are used to filter out the text. The aim of this paper is to propose a different approach to effectively and efficiently filter queries based on the text searched through the search engines and to return the most appropriate results to users after matching the searched text, because previously used techniques are not efficient enough. We use the Sequence Matcher method as the comparison technique, compare its results with relevance feedback, and find that the proposed technique provides much better results. (A small similarity-ranking sketch follows this entry.)

Author 1: Syed Ali Jafar Zaidi
Author 2: Safdar Hussain
Author 3: Samir Brahim Belhaouari

Keywords: Information retrieval; sequence matcher method; relevance feedback

PDF
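A simple illustration of sequence-matcher-based text similarity of the kind Paper 11 uses to rank results against a query; the ranking pipeline and the toy documents are assumptions, not the authors' system.

```python
from difflib import SequenceMatcher

def rank_results(query, documents):
    """Return documents sorted by similarity ratio to the query (0.0 to 1.0)."""
    scored = [(SequenceMatcher(None, query.lower(), doc.lower()).ratio(), doc)
              for doc in documents]
    return sorted(scored, reverse=True)

docs = ["Information retrieval with search engine APIs",
        "Cooking recipes for beginners",
        "Efficient text-based information retrieval techniques"]
for score, doc in rank_results("text based information retrieval", docs):
    print(f"{score:.2f}  {doc}")
```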

Paper 12: Single Modality-Based Event Detection Framework for Complex Videos

Abstract: Detection of rare and complex events in large video datasets, or in unconstrained user-uploaded videos on the internet, is a challenging task. The presence of irregular camera movement, viewpoint changes, illumination variations, and significant changes in the background makes it extremely difficult to capture the underlying motion in videos. In addition, extracting features using different modalities (separate streams) may add computational complexity and produce confusing and irrelevant spatial and semantic features. To address this problem, we present a single-stream (RGB only) framework based on the fusion of spatial and semantic features extracted by a modified 3D Residual Convolution Network. We combine the spatial and semantic features based on the assumption that the difference between the two types of features can uncover accurate and relevant features. Moreover, the introduction of temporal encoding builds relationships between consecutive video frames to explore discriminative long-term motion patterns. We conduct extensive experiments on prominent publicly available datasets. The obtained results demonstrate the strength of our proposed model and improved accuracy compared with existing state-of-the-art methods.

Author 1: Sheeraz Arif
Author 2: Adnan Ahmed Siddiqui
Author 3: Rajesh Kumar
Author 4: Avinash Maheshwari
Author 5: Komal Maheshwari
Author 6: Muhammad Imran Saeed

Keywords: Event detection; single-stream; feature fusion; temporal encoding

PDF

Paper 13: An Efficient Domain-Adaptation Method using GAN for Fraud Detection

Abstract: In this paper, an efficient domain-adaptation method is proposed for fraud detection. The proposed method employs the discriminative characteristics of feature maps and generative adversarial networks (GANs) to minimize the deviation that occurs when a common feature is shifted between two domains. To solve the class imbalance problem and increase the model's detection accuracy, new data samples are generated by applying a minority-class data augmentation method that uses a GAN. We evaluate the classification performance of the proposed domain-adaptation model by comparing it against support vector machine (SVM) and convolutional neural network (CNN) models using classification performance indicators. The experimental results indicate that the proposed model is applicable to both test datasets and requires less time for learning. Although the SVM offers better detection performance than the CNN and the proposed domain-adaptation model, its learning time exceeds those of the other two models as the dataset grows. Likewise, although the detection performance of the CNN-based model is similar to that of the proposed domain-adaptation model, its learning process is longer. In addition, although the GAN used to solve the class imbalance problem of the two datasets requires slightly more time than SMOTE (synthetic minority oversampling technique), it shows better classification performance and is effective for datasets featuring class imbalance.

Author 1: Jeonghyun Hwang
Author 2: Kangseok Kim

Keywords: Fraud detection; domain adaptation; data augmentation; deep learning; GAN

PDF

Paper 14: Evaluation of Student Core Drives on e-Learning during the Covid-19 with Octalysis Gamification Framework

Abstract: Learning activities during the Covid-19 pandemic were carried out with online systems even though, in reality, many institutions had not prepared their systems and infrastructure properly. Based on survey results, the e-learning media generally used include Google Classroom (53.81%) combined with other applications that are not integrated with the institution's Learning Management System. This condition provides an opportunity to evaluate the effectiveness of online learning, especially how motivated students are to learn with this method, and the results can be used as a reference in developing and refining the method. Since many studies show that gamification can increase individual motivation, this study uses the Octalysis gamification framework to analyze the role of gamification in the learning process and to measure student motivation in online learning activities. The evaluation shows that the Likert-scale results fall in the "High" level, while the maximum level is "Very High"; on the Octalysis test scale, the average score is 6.5 on a scale of 1 to 10. The conclusion is that motivation to learn through e-learning during the Covid-19 period is quite high and has the potential to be developed. Because the results across the eight Octalysis core drives are still average, innovation in e-learning is needed to increase student motivation based on those core drives. The results of this study recommend gamification as a way to increase student learning motivation in order to improve learning outcomes.

Author 1: Fitri Marisa
Author 2: Sharifah Sakinah Syed Ahmad
Author 3: Zeratul Izzah Mohd Yusoh
Author 4: Anastasia L Maukar
Author 5: Ronald David Marcus
Author 6: Anang Aris Widodo

Keywords: Gamification; education; Covid-19 pandemic; Octalysis framework

PDF

Paper 15: Covid-19 Ontology Engineering-Knowledge Modeling of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)

Abstract: The COVID-19 pandemic has spread rapidly across the world since its emergence in Wuhan, China, in December 2019. The pandemic has disrupted the health of citizens in such a way that its impact on the economy and on social aspects is enormous. Education, employment, income, and the well-being of humankind have been affected crucially by this coronavirus, and nations worldwide are struggling to battle the emergency. Intensive studies are being carried out by researchers all over the world to control the pandemic. Medical science has advanced greatly with the application of computer-assisted solutions in healthcare. Ontology-based clinical decision support systems (CDSS) assist medical practitioners in the diagnosis and treatment of diseases and are well known for data sharing, interoperability, knowledge reuse, and decision support. This article presents the development of an ontology for SARS-CoV-2 (COVID-19) to be used in a CDSS proposed for the satellite clinics of the Royal Oman Police (ROP), Sultanate of Oman. The key concepts of COVID-19 and their relationships are represented using the ontology. The Semantic Web Rule Language (SWRL) is used to model the rules related to the initial diagnosis of the patient, and the Semantic Query-Enhanced Web Rule Language (SQWRL) is used to retrieve the data stored in the ontology. The developed ontology successfully classified patients into one of four categories: non-suspected, suspected, probable, and confirmed. The reasoning time and the query execution time are found to be optimal.

Author 1: Vinu Sherimon
Author 2: Sherimon P.C
Author 3: Renchi Mathew
Author 4: Sandeep M. Kumar
Author 5: Rahul V. Nair
Author 6: Khalid Shaikh
Author 7: Hilal Khalid Al Ghafri
Author 8: Huda Salim Al Shuaily

Keywords: COVID-19; ontology; SARS-CoV-2; ontology reasoning; SWRL; SQWRL

PDF

Paper 16: A Conceptual Data Modelling Framework for Context-Aware Text Classification

Abstract: Data analytics has an interesting variant that aims to understand an entity's behavior. It is termed diagnostic analytics and answers "why"-type questions. Such questions find applications in emotion classification, brand analysis, drug review modeling, customer complaint classification, and so on. Labeled data form the core of any analytics problem, let alone diagnostic analytics; however, labeled data are not always available. In some cases, it is required to assign labels to unknown entities and understand their behavior. For such scenarios, the proposed model unites topic modeling and text classification techniques. This combined data model helps solve diagnostic issues and obtain meaningful insights from data by treating the procedure as a classification problem. The proposed model uses Improved Latent Dirichlet Allocation for topic modeling and sentiment analysis to understand an entity's behavior, and represents it as an Improved Multinomial Naïve Bayesian data model to achieve automated classification. The model is tested using the drug review dataset obtained from the UCI repository. The health conditions and their associated drug names were extracted from the reviews and sentiment scores were assigned. The sentiment scores reflected the behavior of various drugs for a particular health condition and classified them according to their quality. The proposed model's performance is compared with existing baseline models, and it performed better than the other models.

Author 1: Nazia Tazeen
Author 2: K. Sandhya Rani

Keywords: Text classification; topic modeling; natural language processing; sentiment analysis; drug dataset; context-aware model; diagnostic analytics; feature extraction

PDF

Paper 17: Smart Start and HER for a Directed and Persistent Reinforcement Learning Exploration in Discrete Environment

Abstract: Reinforcement learning (RL) solves sequential decision-making problems through trial and error, amassing experiences in order to achieve goals and increase cumulative reward. The exploration-exploitation dilemma is a critical challenge in reinforcement learning, particularly in environments with misleading or sparse rewards, for which it is difficult to construct a suitable exploration strategy. In this paper, a framework combining Smart Start (SS) and Hindsight Experience Replay (HER) is developed to improve the performance of SS and make exploration more directed, especially in the early episodes. The SS+HER framework was studied in a discrete maze environment with sparse rewards. The results reveal that the framework doubles the rewards in the early episodes and decreases the time the agent needs to reach the goal.

Author 1: Heba Alrakh
Author 2: Muhammad Fahmi Miskon
Author 3: Rozilawati Mohd Nor

Keywords: Reinforcement learning; hindsight experience replay; smart start; limit search space; exploration-exploitation trade off

PDF

Paper 18: Implementation of Low Cost Remote Primary Healthcare Services through Telemedicine: Bangladesh Perspectives

Abstract: In this paper, we implement a low-cost primary healthcare service for the remote rural people of Bangladesh. The service is delivered through our advanced telemedicine model, and the main aim is to provide basic healthcare through low-cost hardware. We developed Arduino-based low-cost hardware for the telemedicine service so that remote patients in Bangladesh can get expert doctors' opinions without travelling to urban areas. We collect nine vital signs, namely electrocardiogram (ECG), oxygen saturation (SPO2), blood pressure, temperature, body position, glucose level, airflow, height, and weight, for use in our model, and remove unwanted signals from the collected vital signs through several filtering algorithms. The system was successfully tested with patients of the Marie Stopes Bangladesh Hospital. With the developed model, rural patients can receive primary healthcare services from a pharmacy in any remote village of Bangladesh with the assistance of a local doctor using a Raspberry Pi. Finally, deployment of the developed healthcare service will reduce the cost of telemedicine services and advance healthcare facilities for the remote people of Bangladesh.

Author 1: Uzzal Kumar Prodhan
Author 2: Tushar Kanti Saha
Author 3: Rubya Shaharin
Author 4: Toufik Ahmed Emon
Author 5: Mohammad Zahidur Rahman

Keywords: Raspberry Pi; DGHS; Arduino; portable; ECG; SPO2

PDF

Paper 19: Towards a Standardization of Learning Behavior Indicators in Virtual Environments

Abstract: The need to analyze student interactions in virtual learning environments (VLE), and the improvements this generates, is an increasingly pressing reality for making timely predictions and optimizing student learning. This research aims to implement a proposal of standardized learning behavior indicators in virtual learning environments in order to design and implement efficient and timely learning analytics (LA) processes. The methodology consisted of a data management analysis carried out on the Moodle platform of the Faculty of Education Sciences of the National University of San Agustin of Arequipa, with the participation of 20 teachers; qualitative online questionnaires were used to collect the participants' perceptions. The results propose a standard set of behavior indicators for the teaching-learning process in VLE: preparation for learning, progress in the course, resources for learning, interaction in the forums, and evaluation of resources. These were evaluated through learning analytics and show the efficiency of the proposed indicators. The conclusions highlight the importance of implementing standardized behavior indicators that allow learning analytics processes to be developed efficiently in VLE in order to obtain better predictions, make timely decisions, and optimize teaching-learning processes.

Author 1: Benjamin Maraza-Quispe
Author 2: Olga Melina Alejandro-Oviedo
Author 3: Walter Choquehuanca-Quispe
Author 4: Nicolas Caytuiro-Silva
Author 5: Jose Herrera-Quispe

Keywords: Indicators; behavior; learning; analytical; environments; virtual

PDF

Paper 20: Ensemble Learning for Rainfall Prediction

Abstract: Climate change research is a discipline that analyses varying weather patterns over a particular period of time. Rainfall forecasting is the task of predicting a future rainfall amount based on measured information from the past, including wind, humidity, temperature, and so on. Rainfall forecasting has recently been the subject of several machine learning (ML) techniques with differing degrees of both short-term and long-term prediction performance. Although several ML methods have been suggested to improve rainfall forecasting, the appropriate selection of a technique for specific rainfall durations is still not clearly defined. Therefore, this study proposes ensemble learning to improve the effectiveness of rainfall prediction. Ensemble learning is an approach that combines multiple ML rainfall prediction classifiers, in this case Naïve Bayes, Decision Tree, Support Vector Machine, Random Forest, and Neural Network, trained on Malaysian data. More specifically, this study explores three algebraic combiners: average probability, maximum probability, and majority voting. An analysis of our results shows that the fused ML classifiers based on majority voting are particularly effective in boosting the performance of rainfall prediction compared to the individual classifiers. (A minimal majority-voting ensemble is sketched after this entry.)

Author 1: Nor Samsiah Sani
Author 2: Abdul Hadi Abd Rahman
Author 3: Afzan Adam
Author 4: Israa Shlash
Author 5: Mohd Aliff

Keywords: Ensemble learning; classification; rainfall prediction; machine learning

PDF
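A minimal majority-voting ensemble in the spirit of Paper 20, built from the classifier families named in the abstract. The synthetic features, data split, and hyperparameters are placeholders, not the authors' Malaysian rainfall dataset or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for weather features (wind, humidity, temperature, ...) and a rain/no-rain label.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

ensemble = VotingClassifier(
    estimators=[("nb", GaussianNB()),
                ("dt", DecisionTreeClassifier(random_state=0)),
                ("svm", SVC(random_state=0)),
                ("rf", RandomForestClassifier(random_state=0)),
                ("nn", MLPClassifier(max_iter=500, random_state=0))],
    voting="hard",               # majority voting over class predictions
)
ensemble.fit(X_tr, y_tr)
print("Majority-voting accuracy:", ensemble.score(X_te, y_te))
```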

Paper 21: Presenting and Evaluating Scaled Extreme Programming Process Model

Abstract: Extreme Programming (XP) is one of the most widely used software process models in the agile family for the development of small-scale projects. XP is widely accepted by the software industry due to the features it provides, such as handling frequently changing requirements, customer satisfaction, rapid feedback, an iterative structure, team collaboration, and small releases. On the other hand, XP also has drawbacks, including limited documentation, little focus on design, and poor architecture. Because of these limitations, XP is only suitable for small-scale projects and does not work well for medium- and large-scale projects. To resolve this, many researchers have proposed customized versions, particularly for medium and large projects. The real issue arises when XP is selected for a small-scale, low-risk project but, due to requirement changes, the scope gradually grows from small to medium or large. At that stage, the structure and practices that work well for small projects cannot handle the extended scope. To resolve this issue, this paper proposes a scaled version of the XP process model called SXP. The proposed model can handle such situations effectively and can be used for small as well as medium- and large-scale projects with the same efficiency. Furthermore, this paper evaluates the proposed model empirically in order to demonstrate its effectiveness and efficiency. A small-scale client-oriented project was developed using the proposed SXP and empirical results were collected. For an effective evaluation, the collected results are compared with a published case study of the XP process model. The detailed empirical analysis shows that the proposed SXP performs well compared to traditional XP.

Author 1: Muhammad Ibrahim
Author 2: Shabib Aftab
Author 3: Munir Ahmad
Author 4: Ahmed Iqbal
Author 5: Bilal Shoaib Khan
Author 6: Muhammad Iqbal
Author 7: Baha Najim Salman Ihnaini
Author 8: Nouh Sabri Elmitwally

Keywords: Extreme Programming Process Model; XP; modified XP; scaled XP; customized XP; empirical comparison; empirical analysis

PDF

Paper 22: Measuring Impact of Traffic Parameters in Adaptive Signal Control through Microscopic Simulation

Abstract: This paper examines the setting of traffic parameters in adaptive signal control, here the Dynamic Timing Optimiser (DTO). DTO is an online algorithm that uses real-time optimisation to estimate cycle length according to fluctuations in the arrival flow registered by the detector. The DTO cycle time estimation also incorporates preset parameters, including the saturation flow rate (s) and lost time (L). However, these traffic flow parameters are commonly input as a single deterministic value adopted for the whole day. For example, a presumed constant saturation flow rate (s) does not accurately represent an actual oversaturated condition, and employing an inaccurate saturation flow rate (s) leads to underestimation of the cycle length. Therefore, a set of parameter values, encompassing default values and adjusted values representing the heaviest traffic condition, is applied and tested through microscopic simulation. This yields measures of intersection performance in terms of intersection delay, travel time, and throughput. According to the simulation results, the saturation flow rate (s) has a greater influence on cycle length optimisation than the lost time (L) parameter. Employing a realistic saturation flow rate (s) when inputting DTO parameters according to real traffic conditions contributes to a lower intersection delay. In addition, the study revealed that the longer the lost time (L) configured in the signal system, the longer the cycle length generated by the DTO algorithm. As predicted, high delay occurs with long cycle lengths, which nevertheless benefit throughput. (The classical relationship between cycle length, lost time, and saturation flow is illustrated after this entry.)

Author 1: Fatin Ayuni Bt Aminzal
Author 2: Munzilah Binti Md Rohani

Keywords: Adaptive signal control; optimal cycle length; saturation flow rate; lost time; microsimulation

PDF
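As background for Paper 22, the classical Webster relationship shows why lost time (L) and saturation flow rate (s) drive cycle length; it is shown only as a reference point, and DTO's own online optimisation is not reproduced here. The example flows are placeholders.

```python
def webster_optimal_cycle(lost_time_s, critical_flows_vph, saturation_flows_vph):
    """Webster's optimum cycle length C0 = (1.5*L + 5) / (1 - Y),
    where Y is the sum of critical flow ratios q_i / s_i."""
    Y = sum(q / s for q, s in zip(critical_flows_vph, saturation_flows_vph))
    if Y >= 1.0:
        raise ValueError("Intersection is oversaturated (Y >= 1); formula not applicable")
    return (1.5 * lost_time_s + 5.0) / (1.0 - Y)

# Two critical movements: note how a larger lost time (or a smaller saturation flow,
# which raises Y) lengthens the optimal cycle.
print(webster_optimal_cycle(12, [700, 500], [1800, 1700]))   # ~72 s
print(webster_optimal_cycle(16, [700, 500], [1800, 1700]))   # ~92 s
```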

Paper 23: An Extreme Learning Machine Model Approach on Airbnb Base Price Prediction

Abstract: Prediction of the base price of Airbnb properties is still a new area of research, especially with the Extreme Learning Machine (ELM). Previous studies have suggested several advantages of ELM, such as good generalization performance, fast learning speed, and high prediction accuracy. This paper shows how the ELM approach can be used as a prediction model for the Airbnb base price. In general, the steps are setting the number of hidden neurons, randomly assigning input weights and hidden-layer biases, and calculating the output layer; the entire learning process is completed through one numerical transformation without iteration (a bare-bones version of these steps is sketched after this entry). The performance of the model is estimated using mean squared error, mean absolute percentage error, and root mean squared error. Experiments with an Airbnb dataset for London, with twenty-one features as input, show faster learning speed and better accuracy than the existing model.

Author 1: Fikri Nurqahhari Priambodo
Author 2: Agus Sihabuddin

Keywords: Airbnb; base price prediction; extreme learning machine; fast learning

PDF
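A bare-bones extreme learning machine regressor following the steps named in Paper 23: random input weights and biases, a nonlinear hidden layer, and output weights solved in one shot with the pseudoinverse. The layer size, activation, and synthetic data are illustrative assumptions.

```python
import numpy as np

class SimpleELM:
    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        self.W = self.rng.normal(size=(n_features, self.n_hidden))  # random input weights
        self.b = self.rng.normal(size=self.n_hidden)                # random hidden biases
        H = np.tanh(X @ self.W + self.b)                            # hidden-layer output
        self.beta = np.linalg.pinv(H) @ y                           # one-shot output weights
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Toy usage with 21 synthetic features standing in for the listing attributes.
X = np.random.rand(500, 21)
y = X @ np.random.rand(21) + 0.1 * np.random.rand(500)
model = SimpleELM(n_hidden=200).fit(X[:400], y[:400])
mse = np.mean((model.predict(X[400:]) - y[400:]) ** 2)
print("Test MSE:", mse)
```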

Paper 24: ITTP-PG: A Novel Grouping Technique to Enhance VoIP Service Bandwidth Utilization

Abstract: Recently, the field of telecommunications has started to migrate to the Voice over Internet Protocol (VoIP) service. VoIP applications produce packets with short payload sizes to reduce packetization delay; this increases the relative preamble (header) overhead and consumes network link bandwidth. Packet grouping is a technique to improve the utilization of network link bandwidth, and numerous grouping techniques have been suggested for the RTP/UDP protocols. Unlike previous research, this article suggests a packet grouping technique that works over the Internet Telephony Transport Protocol (ITTP), not RTP/UDP. This technique is called ITTP Packet Grouping (ITTP-PG). The ITTP-PG technique groups VoIP packets that travel the same route under a single ITTP/IP preamble instead of attaching an ITTP/IP preamble to each packet. Consequently, the preamble overhead is diminished and network link bandwidth is saved. ITTP-PG also adds a 3-byte runt-preamble to each packet to distinguish the grouped packets. The suggested ITTP-PG technique is simulated and compared with the conventional ITTP protocol (without grouping) using three elements, namely the number of concurrent VoIP calls, preamble overhead, and bandwidth usage. For all of these elements, the ITTP-PG technique outperforms the conventional ITTP protocol; for example, the results show that bandwidth usage improved by up to 45.9% in the tested cases.

Author 1: Mayy Al-Tahrawi
Author 2: Mosleh Abulhaj
Author 3: Yousef Alrabanah
Author 4: Sumaya N. Al-Khatib

Keywords: Voice over Internet Protocol (VoIP); Internet Telephony Transport Protocol (ITTP); packet grouping; network bandwidth

PDF

Paper 25: Improving Intelligent Personality Prediction using Myers-Briggs Type Indicator and Random Forest Classifier

Abstract: The term “personality” can be defined as the mixture of features and qualities that build an individual's distinctive character, including thinking, feeling, and behaviour. Nowadays, it is hard to select the right employees from the vast pool of candidates. Traditionally, a company arranges interview sessions with prospective candidates to learn about their personalities; however, this procedure sometimes demands extra time because the number of interviewers is smaller than the number of job seekers. As technology has evolved rapidly, personality computing has become a popular research field that provides personalisation to users. Researchers have recently utilised social media data for automatic personality prediction, but mining social media data is complex because the data are noisy and come in various formats and lengths. This paper proposes a machine learning technique using a Random Forest classifier to automatically predict people's personality based on the Myers–Briggs Type Indicator® (MBTI). The performance of the proposed method is compared with other popular machine learning algorithms. Experimental evaluation demonstrates that the Random Forest classifier performs better than the three other machine learning algorithms in terms of accuracy and is thus capable of assisting employers in identifying personality types when selecting suitable candidates. (An illustrative text-to-MBTI pipeline is sketched after this entry.)

Author 1: Nur Haziqah Zainal Abidin
Author 2: Muhammad Akmal Remli
Author 3: Noorlin Mohd Ali
Author 4: Danakorn Nincarean Eh Phon
Author 5: Nooraini Yusoff
Author 6: Hasyiya Karimah Adli
Author 7: Abdelsalam H Busalim

Keywords: Machine learning; random forest; Myers–Briggs Type Indicator® (MBTI); personality prediction; random forest classifier; social media; Twitter user

PDF
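An illustrative text-to-MBTI pipeline along the lines of Paper 25: TF-IDF features feeding a Random Forest. The feature choice, tiny toy posts, and labels are assumptions for illustration, not the authors' preprocessing or dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

posts = ["I love quiet evenings planning my next project in detail",
         "Big parties give me energy, I decide things on the spot",
         "I enjoy deep one-on-one conversations and abstract theories",
         "Let's improvise, rules are just suggestions anyway"]
labels = ["INTJ", "ESFP", "INFJ", "ESTP"]          # toy MBTI labels

clf = make_pipeline(TfidfVectorizer(),
                    RandomForestClassifier(n_estimators=200, random_state=0))
clf.fit(posts, labels)
print(clf.predict(["I keep a detailed schedule and prefer working alone"]))
```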

Paper 26: The Development of Parameter Estimation Method for Chinese Hamster Ovary Model using Black Widow Optimization Algorithm

Abstract: Chinese Hamster Ovary (CHO) cells are widely used in biological and medical research, especially in the protein production industry, because their low chromosome number makes them suitable for genetic study. However, experimental data tend to be noisy and poorly fitted, which is why many parameter estimation methods have been developed to determine the best value for a particular parameter. Metaheuristic parameter estimation is an algorithmic framework that uses search techniques to find parameter values that reproduce the observed pattern or curve, helping the researcher obtain a fitted model and estimate values based on the data's behaviour. The process starts by combining mathematical models with the data obtained from the researcher's experiments; in this way, cell culture work in biomedical research can benefit from metaheuristic parameter estimation. A kinetic model can be fitted to the data obtained from CHO cells. Therefore, this paper proposes the Black Widow Optimization (BWO) algorithm, inspired by the distinctive mating behaviour of the black widow spider, as the method to solve the problem. The proposed algorithm is compared with three other well-known algorithms: Particle Swarm Optimization (PSO), Differential Evolution (DE), and the Bees Optimization Algorithm (BOA). The results show that the proposed algorithm achieves a better best-cost value despite taking a longer time to run.

Author 1: Nurul Aimi Munirah
Author 2: Muhammad Akmal Remli
Author 3: Noorlin Mohd Ali
Author 4: Hui Wen Nies
Author 5: Mohd Saberi Mohamad
Author 6: Khairul Nizar Syazwan Wan Salihin Wong

Keywords: Chinese Hamster Ovary; Black Widow optimization; metaheuristic; parameter estimation; genetic study

PDF

Paper 27: Self-Organizing Map based Wallboards to Interpret Sudden Call Hikes in Contact Centers

Abstract: In a contact center, it is necessary to foresee and investigate any disturbance to the daily call pattern. An abnormal call pattern may be the result of a sudden change in the organization's external world, and requiring a methodical analysis before customers' demand can be met may introduce delays for queuing customers. A fast and dependable method is therefore required to predict and reason about any unwanted event. It is not possible to draw conclusions by considering a single dimension such as total call count, since total call count may increase in the same way due to a failure in any service. This research focuses on reasoning about multidimensional events based on historical records. In contrast to traditional wallboards, our approach is capable of clustering and predicting disturbances to normal call patterns based on historical knowledge by considering many dimensions, such as the queue statistics of many service queues. Our approach showed improved results over traditional wallboards equipped with 2D or 3D graphs. (A toy self-organizing map over queue statistics is sketched after this entry.)

Author 1: Samaranayaka J. R. A. C. P
Author 2: Prasad Wimalaratne

Keywords: Multidimensional data; visualization; contact centers; self-organizing map; clustering

PDF
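A toy self-organizing map clustering multidimensional queue statistics, in the spirit of Paper 27. The MiniSom library, the 5x5 grid size, and the synthetic features are illustrative assumptions, not the authors' wallboard implementation.

```python
import numpy as np
from minisom import MiniSom   # pip install minisom

# Each row: per-interval queue statistics, e.g. [calls offered, answered, abandoned, avg wait].
rng = np.random.default_rng(1)
normal = rng.normal(loc=[100, 95, 5, 30], scale=[10, 10, 2, 5], size=(300, 4))
spike = rng.normal(loc=[400, 200, 200, 300], scale=[30, 20, 20, 30], size=(10, 4))
data = np.vstack([normal, spike]).astype(float)
data = (data - data.mean(axis=0)) / data.std(axis=0)      # normalize features

som = MiniSom(5, 5, input_len=4, sigma=1.0, learning_rate=0.5, random_seed=1)
som.train_random(data, num_iteration=2000)

# Map each interval to its best-matching unit; unusual intervals land on distinct cells.
cells = [som.winner(row) for row in data]
print("Cells hit by the last (spiky) intervals:", cells[-5:])
```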

Paper 28: SDCT: Multi-Dialects Corpus Classification for Saudi Tweets

Abstract: There is an increasing demand for analyzing the contents of social media. However, sentiment analysis in the Arabic language, especially in Arabic dialects, can be very complex and challenging. This paper presents the collection and construction of a classified corpus of 4180 multi-dialectal Saudi tweets (SDCT). The tweets were annotated manually by five native speakers in two stages: the first stage annotated the tweets as Hijazi, Najdi, or Eastern based on Saudi regions, and the second stage annotated the sentiment as positive, negative, or neutral. The annotation process was evaluated using the Kappa score. The validation process used a cross-validation technique across eight baseline experiments for training different classifier models. The results show that 10-fold validation provides greater accuracy than 5-fold across the eight experiments, and the classification of the Eastern dialect achieved the best accuracy compared to the other dialects, with an accuracy of 91.48%.

Author 1: Afnan Bayazed
Author 2: Ola Torabah
Author 3: Redha AlSulami
Author 4: Dimah Alahmadi
Author 5: Amal Babour
Author 6: Kawther Saeedi

Keywords: Arabic dialects; dialects classification; language classification; natural language processing; Saudi dialects; sentiment analysis; Twitter

PDF

Paper 29: Using Interdependencies for the Prioritization and Reprioritization of Requirements in Incremental Development

Abstract: There is a growing trend to develop and deliver software in an incremental manner, to achieve greater consistency in the developed software and better customer satisfaction during the requirement engineering process. In the incremental model, developed increments are delivered to consumers and run in their environments: a set of requirements is evaluated, implemented, and delivered as the first increment, further requirements are delivered in the next increment, and so on. The priority of requirements plays an important role in each increment, but it is complicated by the interdependencies between requirements and by resource constraints. Therefore, this paper introduces a model for requirements prioritization and reprioritization based on these two factors. The first is requirement interdependencies, which are described using a hybrid approach of a traceability list and a directed acyclic graph; the second is requirement resource constraints, which are handled using queuing theory for requirements reprioritization. To achieve this, two algorithms, namely Priority Dependency Graph (PDG) and Resources Constraints Reprioritization (RCR), are proposed with linear time complexity and implemented in a case study.

Author 1: Aryaf Al-Adwan
Author 2: Anaam Aladwan

Keywords: Requirement engineering; incremental model; requirement prioritization; requirement interdependencies; dependency graph; queuing theory

PDF

Paper 30: A Novel Geometrical Scale and Rotation Independent Feature Extraction Technique for Multi-lingual Character Recognition

Abstract: This paper presents a novel geometrical scale- and rotation-independent feature extraction (FE) technique for multi-lingual character recognition (CR). The performance of any CR technique mainly depends on the robustness of the underlying FE method. Currently, very few scale- and rotation-independent FE techniques in the literature successfully extract robust features from characters with noise such as distortion and breaks, and many FE methods fail to distinguish characters that look similar in appearance. Therefore, this paper proposes a novel scale- and rotation-independent geometrical shape FE technique that successfully recognizes distorted, broken, and similar-looking characters. Alongside the proposed FE technique, we use crossing count (CC) features, and we combine the proposed features with the CC features to form the feature vector (FV) of the character to be recognized. The proposed CR technique is evaluated using the publicly available media-lab license plate (LP), ISI_Bengali, and Chars74K benchmark data sets and achieves encouraging results. To further assess the performance of the proposed FE method, we use a proprietary data set containing nearly 168,000 multi-lingual characters from the English, Devanagari, and Marathi scripts, again with encouraging results. We observe better classification rates for the proposed FE method on the publicly available benchmark data sets compared to several CR FE methods from the literature.

Author 1: Narasimha Reddy Soora
Author 2: Ehsan Ur Rahman Mohammed
Author 3: Sharfuddin Waseem Mohammed

Keywords: Feature extraction; character recognition; crossing count features; edit distance; scale and rotation independent feature extraction

PDF

Paper 31: Harmonic Mean based Classification of Images using Weighted Nearest Neighbor for Tagging

Abstract: On image sharing websites, images are associated with tags, and these tags play a very important role in image retrieval systems. It is therefore necessary to recommend accurate tags for images, and it is very important to design and develop an effective classifier that classifies images into semantic categories, which is a necessary step towards tag recommendation. The performance of existing tag recommendation methods based on k-nearest neighbors can be affected by the number of neighbors k, the distance measure, majority voting irrespective of class, and outliers present among the k neighbors. To increase classification accuracy and overcome these issues, the Harmonic Mean based Weighted Nearest Neighbor (HM-WNN) classifier is proposed for the classification of images. Given an input image, the HM-WNN determines the k nearest neighbors from each category, for color and texture features separately, over the entire training set. Weights are assigned to the closest neighbors from each category so that reliable neighbors contribute more to the classification. Finally, the categorical harmonic means of the k nearest neighbors are determined, and the input image is assigned to the category with the minimum mean (a simplified version of this rule is sketched after this entry). The experimentation is done on a self-generated dataset, and the results show that the HM-WNN gives 88.01% accuracy in comparison with existing k-nearest neighbor methods.

Author 1: Anupama D. Dondekar
Author 2: Balwant A. Sonkamble

Keywords: Image classification; k-nearest neighbor; weighted nearest neighbor; harmonic mean vector; color and texture features

PDF
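A simplified per-category harmonic-mean nearest-neighbor rule, illustrating the idea behind Paper 31's HM-WNN: find the k nearest neighbors inside each category and assign the image to the category whose neighbor distances have the smallest harmonic mean. The paper's separate color/texture handling and its weighting scheme are not reproduced; the toy features are assumptions.

```python
import numpy as np
from scipy.stats import hmean

def hm_nn_predict(x, X_train, y_train, k=3):
    best_cat, best_score = None, np.inf
    for cat in np.unique(y_train):
        d = np.linalg.norm(X_train[y_train == cat] - x, axis=1)   # distances within category
        d_k = np.sort(d)[:k]                                      # k nearest in this category
        score = hmean(d_k + 1e-12)                                # harmonic mean (avoid zeros)
        if score < best_score:
            best_cat, best_score = cat, score
    return best_cat

# Toy feature vectors (e.g. concatenated color/texture descriptors) for two categories.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (20, 8)), rng.normal(3, 1, (20, 8))])
y_train = np.array(["beach"] * 20 + ["forest"] * 20)
print(hm_nn_predict(rng.normal(3, 1, 8), X_train, y_train))
```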

Paper 32: A Design Study to Improve user Experience of a Procedure Booking Software in Healthcare

Abstract: In the era of technology-driven healthcare delivery and the proliferation of e-health systems, procedure booking software is becoming common. Procedure booking software (PBS) affects healthcare delivery by improving healthcare efficiency and outcomes while cutting costs. Poor software design for PBS, especially when it is designed for important and critical appointments such as cardiac catheterization operations, creates stress for physicians and may result in their rejection of the technology. Moreover, if the system design forces them to spend more time documenting health information, physicians tend to prefer face-to-face interaction with patients. Software with poor usability increases the workload of physicians, thus reducing system efficiency, so designing a useful and effective web user interface for such software is an essential requirement for health websites. The aim of this paper is to design and develop a PBS, as a case study, using the health systems design (HSD) tool, a validated design tool for creating PBS based on physician behavior and personas. The applicability of the PBS design was evaluated by physicians in terms of objective and subjective characteristics and user experience attributes. Test participants were divided into two groups, specialists and fellows, and the results show no significant difference between the groups. All participants were able to complete the tasks successfully with a minimal amount of time, clicks, and errors, indicating that effectiveness, efficiency, and cognitive load were similar for all participants. User satisfaction yielded a score of 86 on the System Usability Scale (SUS), an A grade, and the user experience attributes showed that participants were satisfied with the proposed design.

Author 1: Hanaa Abdulkareem Alzahrani
Author 2: Reem Abdulaziz Alnanih

Keywords: Procedure booking software; health systems design tool; cardiac catheterization; user experience; usability evaluation; system usability scale

PDF

Paper 33: Validation Analysis of Scalable Vector Graphics (SVG) File Upload using Magic Number and Document Object Model (DOM)

Abstract: The use of technology, such as applications and services connected to the Internet, is increasing rapidly, and security is considered necessary because of this growing use of digital systems. Given the number of threats against digital systems and servers, the risk of attacks on a server through the file upload feature must be handled. A website or server usually processes the file upload feature with server-side (back-end) validation or filtering of file types, or with client-side (front-end) validation in the web browser using HTML or JavaScript. Filtering techniques for Scalable Vector Graphics (SVG) files usually only check the file extension or the Multipurpose Internet Mail Extensions (MIME) type of an uploaded file. However, this filtering can still be manipulated, for example in ASCII prefix checking, where two valid prefixes exist, namely "<?xml" and "<svg ". SVG files also do not contain metadata such as that encoded in JPEG or PNG image files. This problem can be overcome by adding filtering techniques that validate the file as eXtensible Markup Language (XML) using magic numbers and the Document Object Model (DOM); a simplified check along these lines is sketched after this entry. This research was developed using the waterfall method and black-box security testing, a software security testing method in which security controls, defenses, and application design are tested. Validation of uploaded SVG files using file extensions and MIME types has a success rate of 75 percent over the eight tested scenarios, while validation using file extensions, magic numbers, and the Document Object Model (DOM) achieves a success rate of 100 percent over the eight test scenarios. The black-box tests therefore show that handling based on the file extension, magic number, and DOM is better than using only file extensions and MIME types.

Author 1: Fahmi Anwar
Author 2: Abdul Fadlil
Author 3: Imam Riadi

Keywords: Magic number; Scalable Vector Graphics (SVG); security; upload; validation

PDF
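An illustrative server-side SVG upload check combining the three ideas from Paper 33: extension check, magic-number (prefix) check, and DOM parsing to confirm the root element really is svg. The exact rules and error handling are simplified assumptions, not the authors' code.

```python
from pathlib import Path
from xml.dom.minidom import parseString

def is_valid_svg(filename, data: bytes) -> bool:
    if Path(filename).suffix.lower() != ".svg":                      # 1) extension check
        return False
    head = data.lstrip()[:5].lower()
    if not (head.startswith(b"<?xml") or head.startswith(b"<svg")):  # 2) magic-number / prefix check
        return False
    try:
        dom = parseString(data)                                      # 3) DOM check: parse as XML
    except Exception:
        return False
    return dom.documentElement.tagName.lower() == "svg"              # root element must be <svg>

print(is_valid_svg("icon.svg", b'<svg xmlns="http://www.w3.org/2000/svg"></svg>'))  # True
print(is_valid_svg("icon.svg", b'<?php echo "not an image"; ?>'))                   # False
```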

Paper 34: A Pilot Study of an Instrument to Assess Undergraduates’ Computational thinking Proficiency

Abstract: The potential of computational thinking (CT) in problem solving has gained much attention in academic communities. This study aimed to develop and validate an instrument, called Hi-ACT, to assess the CT ability of university undergraduates. The Hi-ACT evaluates both technical and soft skills applicable to CT-based problem solving. This paper reports a pilot study conducted to test and refine the initial Hi-ACT. A survey method was employed, through which a questionnaire comprising 155 items was piloted among 548 university undergraduates. Structural equation modeling with partial least squares was applied to examine the Hi-ACT's reliability and validity. Composite reliability was used to assess internal consistency reliability, while convergent validity was evaluated based on items' outer loadings and constructs' average variance extracted (the standard formulas are recalled after this entry). As a result, 41 items were excluded, and an instrument to assess CT ability comprising 114 items and ten constructs (abstraction, algorithmic thinking, decomposition, debugging, generalization, evaluation, problem solving, teamwork, communication, and spiritual intelligence) was developed. The reliability and validity of the Hi-ACT in its pilot form have been verified.

Author 1: Debby Erce Sondakh
Author 2: Kamisah Osman
Author 3: Suhaila Zainudin

Keywords: Computational thinking; assessment; skills; attitudes; undergraduates; self-assessment

PDF

Paper 35: Optimize the Combination of Categorical Variable Encoding and Deep Learning Technique for the Problem of Prediction of Vietnamese Student Academic Performance

Abstract: Deep learning techniques have been successfully applied in many technical fields such as computer vision and natural language processing, and recently researchers have paid much attention to applying this technology to socio-economic problems, including the student academic performance prediction (SAPP) problem. In this context, this study focuses both on designing an appropriate deep learning model and on handling categorical input variables. Categorical variables are quite common in the student academic performance prediction problem, while deep learning techniques in particular, and artificial neural networks in general, only work well with numerical variables. Therefore, this study investigates the performance of combining categorical encoding methods, including label encoding, one-hot encoding and "learned" embedding encoding, with deep learning techniques, including a deep dense neural network and a long short-term memory neural network, for the SAPP problem. In the experiments, this study compared these proposed models with each other and with prediction methods based on other machine learning algorithms. The results showed that the categorical data transformation method using the "learned" embedding encoding improved the performance of the deep learning models, and its combination with a long short-term memory network gave an outstanding result for the researched problem.
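
A minimal tf.keras sketch of the "learned" embedding idea mentioned in the abstract: an integer-coded categorical input is mapped to trainable vectors that are learned jointly with the rest of the network. The category count, layer sizes, and the synthetic data are assumptions for illustration, not the paper's architecture.

# Hedged sketch: learned embedding encoding of one categorical variable feeding a dense net.
import numpy as np
import tensorflow as tf

n_categories = 10                                            # assumed distinct category values
X_cat = np.random.randint(0, n_categories, size=(200, 1))    # integer (label-encoded) input
y = np.random.randint(0, 2, size=(200, 1))                   # toy binary target

inp = tf.keras.Input(shape=(1,))
emb = tf.keras.layers.Embedding(input_dim=n_categories, output_dim=4)(inp)  # learned vectors
flat = tf.keras.layers.Flatten()(emb)
hidden = tf.keras.layers.Dense(16, activation="relu")(flat)
out = tf.keras.layers.Dense(1, activation="sigmoid")(hidden)

model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_cat, y, epochs=3, verbose=0)   # embedding weights are trained with the network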

Author 1: Do Thi Thu Hien
Author 2: Cu Thi Thu Thuy
Author 3: Tran Kim Anh
Author 4: Dao The Son
Author 5: Cu Nguyen Giap

Keywords: Deep learning technique; categorical data type; “learned” embedding encoding; student academic performance prediction

PDF

Paper 36: Conceptual Model for Connected Vehicles Safety and Security using Big Data Analytics

Abstract: The capability of Connected Vehicles (CVs) to connect to nearby vehicles, surrounding infrastructure and cyberspace presents a high risk to the safety and security of the CV and others. The volume of data generated by the sensors and infrastructure in the CV environment is enormous. Thus, CV implementations require real-time big data processing and analytics to detect any anomaly in the CV environment, which spans the physical layer, network layer and application layer. CVs are exposed to various vulnerabilities associated with exploitation or malfunction of the components in each layer, which could result in safety and security events such as congestion and collision. These safety and security risks add an extra layer of required protection for CV implementations that needs to be studied and refined. To address this gap, this research aims to determine the basic components of safety and security for CV implementation and to propose a conceptual model for safety and security in CVs by applying machine learning and deep learning techniques. The proposed model is highly correlated to safety and security and could be applied to congestion and collision prediction.

Author 1: Noor Afiza Mat Razali
Author 2: Nuraini Shamsaimon
Author 3: Muslihah Wook
Author 4: Khairul Khalil Ishak

Keywords: Connected vehicles; safety and security monitoring; collision prediction; congestion prediction; machine learning; deep learning

PDF

Paper 37: A New Hybrid KNN Classification Approach based on Particle Swarm Optimization

Abstract: The K-Nearest Neighbour algorithm is widely used as a classification technique due to its simplicity and applicability to different types of data. The presence of multidimensional data and outliers has a great effect on the accuracy of the K-Nearest Neighbour algorithm. In this paper, a new hybrid approach called Particle Optimized Scored K-Nearest Neighbour is proposed in order to improve the performance of K-Nearest Neighbour. The new approach is implemented in two phases: the first phase addresses multidimensional data by performing feature selection using the Particle Swarm Optimization algorithm, and the second phase addresses the presence of outliers by taking the result of the first phase and applying to it a newly proposed scored K-Nearest Neighbour technique. This approach was applied to the Soybean dataset using 10-fold cross-validation. The experimental results show that the proposed approach achieves better results than the K-Nearest Neighbour algorithm and its modified variants.

Author 1: Reem Kadry
Author 2: Osama Ismael

Keywords: K-nearest neighbour; outlier; multidimensional; particle swarm optimization; scored k-nearest neighbour

PDF

Paper 38: An Effective Heuristic Method to Minimize Makespan and Flow Time in a Flow Shop Problem

Abstract: This paper presents a heuristic method for solving the multi-objective flow shop problem. The work considers the simultaneous optimization of the makespan and the flow time; both objectives are essential in measuring the production system's performance since they aim to reduce the completion time of jobs, increase the efficiency of resources, and reduce waiting time in queue. The proposed method is an adaptation of the multi-objective Newton's method, which is normally applied to problems with functions of continuous variables. In this adaptation, the method seeks to improve a sequence of jobs through recursive local searches. The computational experiments show the potential of the proposed method to solve medium-sized and large instances compared with other methods from the literature.
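
The sketch below shows how the two objectives named in the abstract, makespan and total flow time, are evaluated for a given job sequence in a permutation flow shop; the heuristic itself is not reproduced, and the toy instance is invented.

# Illustrative evaluation of the two flow-shop objectives for one job sequence.
def flow_shop_objectives(proc_times, sequence):
    """proc_times[j][m] = processing time of job j on machine m."""
    n_machines = len(proc_times[0])
    completion = [[0.0] * n_machines for _ in sequence]
    for i, job in enumerate(sequence):
        for m in range(n_machines):
            prev_job = completion[i - 1][m] if i > 0 else 0.0
            prev_machine = completion[i][m - 1] if m > 0 else 0.0
            completion[i][m] = max(prev_job, prev_machine) + proc_times[job][m]
    makespan = completion[-1][-1]
    flow_time = sum(row[-1] for row in completion)   # sum of job completion times
    return makespan, flow_time

# Toy instance: 3 jobs x 2 machines (hypothetical processing times).
p = [[3, 2], [1, 4], [2, 2]]
print(flow_shop_objectives(p, [1, 0, 2]))   # -> (9, 21)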

Author 1: Miguel Fernandez
Author 2: Avid Roman-Gonzalez

Keywords: Flow shop problem; multi-objective optimization; non-dominated solution

PDF

Paper 39: Level of Budget Execution According to the Professional Profile of Regional Governors Applying Machine Learning Models

Abstract: Machine learning is a discipline of artificial intelligence that implements computer systems capable of learning complex patterns automatically and predicting future behaviors. The objective was to implement a machine learning model that identifies, classifies and predicts the influence of governors' professional training on the execution of public spending by the regional governments of Peru. Of the 14 indicators of academic training, professional experience and university studies were selected as the significant indicators that contribute to the execution of public spending by the 25 governors of Peru. A supervised learning algorithm was implemented to predict the execution of public spending by the regional governors. The mean square error of the machine learning regression model was 4.20 and the coefficient of determination was 0.726, which indicates that 72.6% of the variation in the execution of public spending by regional governments is explained by the professional experience and university studies of the governors. The regional governors of Peru with university studies and professional experience achieve better results in the execution of public spending in the regional governments of Peru.

Author 1: Jose Luis Morales Rocha
Author 2: Mario Aurelio Coyla Zela
Author 3: Nakaday Irazema Vargas Torres
Author 4: Jarol Teofilo Ramos Rojas
Author 5: Daniel Quispe Mamani
Author 6: Jose Oscar Huanca Frias

Keywords: Machine learning; multiple regression; professional experience; university studies; public budget; governor; public spending

PDF

Paper 40: Investigating Students’ Computational Thinking Skills on Matter Module

Abstract: The Fourth Industrial Revolution has impacted most aspects of our lives and demands a paradigm shift, including in education. It has come to our attention that there is a need to inculcate complex problem-solving skills among youth to equip them to face the challenges of the era of digital technology. To fulfill this need, computational thinking was introduced into the Malaysian school curriculum in 2017. It is still rather new, and this creates an opportunity to understand how computational thinking can best be integrated into teaching and learning. In this study, we developed a module for a science topic, Matter, and examined its impact on the computational thinking skills of 65 students at secondary level. The computational thinking skills integrated in this study were abstraction, decomposition, algorithm, generalization, and evaluation. A quasi-experimental method was employed, and the ANCOVA result showed that there was no significant difference between the control and treatment groups in computational thinking skills. However, the mean scores for each of the computational thinking skills in both groups showed that three skills were higher in the treatment group than in the control group: decomposition, evaluation, and algorithm. This study suggests that CT involves mental processes and that proper planning is crucial for integrating computational thinking skills, as teaching and learning are very contextual in nature.

Author 1: Noraini Lapawi
Author 2: Hazrati Husnin

Keywords: Computational thinking skills; problem solving skill; teaching and learning; decomposition; evaluation; algorithm; science module; matter; secondary level students

PDF

Paper 41: Analysis of Steganographic on Digital Evidence using General Computer Forensic Investigation Model Framework

Abstract: Steganography is one of the anti-forensic techniques used by criminals to hide information in other messages, which can cause problems in the investigation process and difficulties in obtaining the original evidentiary information in digital crimes. Digital forensic analysts are required to have the ability to find and extract inserted messages using proper tools. The purpose of this research is to analyze hidden digital evidence created with steganography techniques. This research uses the static forensics method by applying the five stages of the Generic Forensics Investigation Model framework, namely pre-process, acquisition and preservation, analysis, presentation, and post-process, as well as by extracting files that have been infiltrated based on case scenarios involving digital crime. The tools used are FTK Imager, Autopsy, WinHex, Hiderman, and StegSpy. The results of the steganographic file insertion experiment on 20 files indicate that StegSpy and Hiderman are effective for the steganographic analysis of digital evidence. StegSpy detected the presence of secret messages with an 85% success rate, and the extraction process using Hiderman was 100% successful for the 18 files containing steganographic messages.

Author 1: Muh. Hajar Akbar
Author 2: Sunardi
Author 3: Imam Riadi

Keywords: Steganography; anti forensics; general computer forensic investigation model; hiderman

PDF

Paper 42: Multi-Verse Algorithm based Approach for Multi-criteria Path Planning of Unmanned Aerial Vehicles

Abstract: In this paper, a method based on a Multiobjective Multi-Verse Optimizer (MOMVO) is proposed and successfully implemented to solve the unmanned aerial vehicles' path planning problem. The generation of each coordinate of the aircraft is reformulated as a multiobjective optimization problem under operational constraints. The shortest and smoothest path that avoids all obstacles and threats is the solution to this hard optimization problem. A set of competitive metaheuristics, such as the Multiobjective Salp Swarm Algorithm (MSSA), the Multiobjective Grey Wolf Optimizer (MOGWO), Multiobjective Particle Swarm Optimization (MOPSO) and the Non-dominated Sorting Genetic Algorithm II (NSGA-II), are retained as comparison tools for the problem's resolution. To assess the performance of the reported algorithms and conclude about their effectiveness, an empirical study is first performed on different multiobjective test functions from the literature. These algorithms are then used to obtain a set of Pareto-optimal solutions for the multi-criteria path planning problem. The Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), a Multi-Criteria Decision-Making (MCDM) model, is investigated to find the best solution among the non-dominated ones. Demonstrative results and statistical analysis are presented and compared in order to show the effectiveness of the proposed MOMVO-based path planning technique.
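
As background to the decision-making step, the short sketch below runs standard TOPSIS on an invented set of non-dominated paths with two criteria (length and a smoothness measure, both minimized); the weights and numbers are assumptions, not the paper's data.

# Minimal TOPSIS sketch for choosing one path from a set of non-dominated solutions.
import numpy as np

# rows = candidate paths, columns = criteria (both to be minimized here)
decision = np.array([[120.0, 0.30],
                     [135.0, 0.18],
                     [150.0, 0.12]])
weights = np.array([0.6, 0.4])                       # assumed criteria weights
benefit = np.array([False, False])                   # True for criteria to maximize

norm = decision / np.sqrt((decision ** 2).sum(axis=0))   # vector normalization
weighted = norm * weights

ideal = np.where(benefit, weighted.max(axis=0), weighted.min(axis=0))
anti_ideal = np.where(benefit, weighted.min(axis=0), weighted.max(axis=0))

d_pos = np.sqrt(((weighted - ideal) ** 2).sum(axis=1))
d_neg = np.sqrt(((weighted - anti_ideal) ** 2).sum(axis=1))
closeness = d_neg / (d_pos + d_neg)                  # higher closeness is better

print("Best path index:", int(closeness.argmax()))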

Author 1: Raja Jarray
Author 2: Soufiene Bouallegue

Keywords: Unmanned aerial vehicles; path planning problem; multiobjective optimization; multiobjective multi-verse algorithm; decision-making model; nonparametric statistical tests

PDF

Paper 43: Process Level Social Media Business Value Configuration of SMEs in Saudi Arabia

Abstract: The key enabler of strategic design based on IT is process-level value; however, few researchers have tackled the mechanisms through which small and medium-sized enterprises (SMEs) can create value at the process level. This study sheds light on the mechanism of creating social media business value at the process level by identifying the interaction effects of social media and IT resources and the mediating role of management's commitment to innovation as an organizational factor. The research model is based on the IT business value approach, a quantitative and descriptive methodology is adopted, and the data are analyzed using structural equation modeling. Among the findings based on 301 SMEs in the Kingdom of Saudi Arabia are that management's commitment to innovation is a necessary condition for social media resources to create dynamic capabilities, and that the interaction effects between social media resources and IT resources on social media capability have no impact on the value-generation process at the process level. The results improve the understanding of the theoretical implications of social media business value at the process level, which can be used to guide theorizing about IT business value. SME managers, IT designers, and national decision-makers can use the findings to gain strategic advantage through social media platforms.

Author 1: Anwar Shams Eldin
Author 2: Awadia Elnour
Author 3: Rugaia Hassan

Keywords: Interaction effects of social media and IT resources; process level; SMEs; social media business value; social media capabilities; management’s commitment to innovation

PDF

Paper 44: Recent Progress of Blockchain Initiatives in Government

Abstract: Blockchain is a decentralized and distributed ledger technology that aims to ensure transparency, data security, and integrity. There is rising interest and investment by governments and industries in Blockchain to deliver significant cost savings and increase efficiency. Identifying Blockchain initiatives that are currently implemented by governments worldwide could improve understanding as well as set benchmarks for specific countries. However, although some review studies on Blockchain initiatives have been carried out, very few studies uncover Blockchain initiatives implemented by governments in Asian countries. Hence, this study reviews Blockchain initiatives in the top five e-government development index (EGDI) countries in Asia: South Korea, Singapore, Japan, the United Arab Emirates, and Cyprus. We strategized our review by utilizing relevant keyword searches across the existing literature, books, academic journals, conferences, and industrial reports. The results of this study will help other researchers and practitioners to recognize the current stage of Blockchain initiatives in the governments of Asian countries.

Author 1: Faizura Haneem
Author 2: Hussin Abu Bakar
Author 3: Nazri Kama
Author 4: Nik Zalbiha Nik Mat
Author 5: Razatulshima Ghazali
Author 6: Yasir Mahmood

Keywords: Blockchain initiatives; governments; review

PDF

Paper 45: Voice-Disorder Identification of Laryngeal Cancer Patients

Abstract: Previous studies have shown that much of the work on laryngeal cancer was carried out with a minimal set of linear features, and was focused on the study of larynx preservation or on quality of life around radiotherapy or surgery; the voice disorder databases used were not solely limited to laryngeal cancer. In this context, the paper proposes non-invasive voice disorder detection for laryngeal cancer patients. The sustained vowel /a/ was recorded for 55 laryngeal cases and 55 healthy cases. Owing to the non-linear behavior of the vocal cords, seven non-linear parameters along with 39 biologically inspired Mel-Frequency Cepstral Coefficients (MFCC) are extracted, forming a laryngeal dataset of size 110 x 46. The wrapper method is used for better feature selection and to enhance the discriminating ability of the present work. The classification is carried out using a support vector machine (SVM) tuned with grid search and a random forest (RF). The present work shows an improved accuracy of 76.56% with the SVM and 80% with the random forest. The forward selection of features, along with the involvement of non-linear features, has played a significant role in the better performance of the present system.
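
A hedged sketch of the MFCC-plus-grid-searched-SVM part of the pipeline described above. The summary statistics, file handling, parameter grid, and the synthetic stand-in data are assumptions for illustration; the seven non-linear parameters and the wrapper selection are not reproduced here.

# Hedged sketch: MFCC-based features from a sustained-vowel recording plus a grid-searched SVM.
import numpy as np
import librosa
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def mfcc_features(wav_path, n_mfcc=13):
    """Return one fixed-length vector summarizing the MFCCs of a recording."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1), mfcc.min(axis=1)])

# In the study each of the 110 recordings yields one 46-dimensional row;
# random placeholders stand in here so the sketch runs without audio files.
rng = np.random.default_rng(0)
X = rng.normal(size=(110, 46))          # placeholder for 39 MFCC-derived + 7 non-linear features
y = rng.integers(0, 2, size=110)        # 1 = laryngeal case, 0 = healthy (placeholder labels)

grid = GridSearchCV(SVC(), {"C": [1, 10, 100], "gamma": ["scale", 0.01]}, cv=5)
grid.fit(X, y)
print("best params:", grid.best_params_)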

Author 1: G. B. Gour
Author 2: V.Udayashankara
Author 3: Dinesh K. Badakh
Author 4: Yogesh A Kulkarni

Keywords: Support Vector Machine (SVM); random forest; Mel Frequency Cepstral Coefficients (MFCC); voice disorder detection; laryngeal cancer; non-linear features

PDF

Paper 46: An Improved Time-Based One Time Password Authentication Framework for Electronic Payments

Abstract: One-time passwords (OTP) are important in present-day scenarios for improving the security of electronic payments. Security-sensitive environments and organizations protect their resources from unauthorized access through access control mechanisms such as user authentication. There are several security issues in OTP-based authentication; in particular, studies show that OTPs sent over SMS cause various problems, leading to lost time and delayed transactions. User authentication can be strengthened with additional levels through a multi-factor authentication scheme, and time-based one-time passwords and biometrics are among the widely accepted mechanisms that support multi-factor authentication. In this paper, we combine the Time-based OTP (TOTP) authentication algorithm with biometric fingerprints to secure electronic payments. The algorithm uses a secret key exchanged between the client and the server and derives a one-time password from it. The usability of the TOTP approach is improved by presenting the key as a QR code, which the majority of mobile applications are able to read. It offers confidentiality at the application level to protect user credentials between the two entities (the user and the server), preventing brute-force and dictionary attacks. The proposed system design is also convenient for users because it removes the concern of holding a hardware token and avoids additional charges from the short message service. Our suggested approach has been found to improve security substantially compared to existing methods with regard to authentication and authorization. This research hopes to boost research efforts on the further advancement of cryptosystems built around multi-factor authentication.
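
For reference, the sketch below shows the usual HMAC-based TOTP construction (RFC 6238 style), in which client and server derive matching codes from a shared secret and the current time step. It illustrates the general idea only, not the paper's exact scheme, key exchange, or biometric step; the secret and parameters are examples.

# Minimal TOTP sketch (RFC 6238 style), illustration only.
import base64, hmac, hashlib, struct, time

def totp(shared_secret_b32: str, period: int = 30, digits: int = 6) -> str:
    key = base64.b32decode(shared_secret_b32, casefold=True)
    counter = int(time.time()) // period                 # time step since the Unix epoch
    msg = struct.pack(">Q", counter)                     # 8-byte big-endian counter
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                           # dynamic truncation
    code = (struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)

# The same secret is typically shown once to the client as a QR code; both sides then
# derive matching codes that change every `period` seconds.
print(totp("JBSWY3DPEHPK3PXP"))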

Author 1: Md Arif Hassan
Author 2: Zarina Shukur
Author 3: Mohammad Kamrul Hasan

Keywords: Electronic payments; One Time Password (OTP); Quick Response (QR) code; Time based One Time Password (TOTP)

PDF

Paper 47: An Efficient Digital Space Vector PWM Module for 3-φ Voltage Source Inverter (VSI) on FPGA

Abstract: The realization of PWM strategies with digital control circuitry provides many advantages, including better prototyping, higher switching frequency, simple hardware, and flexibility, by overcoming the limitations of analog control strategies. In this article, a Digital Space Vector-based Pulse Width Modulation (DSV-PWM) module is designed. The DSV-PWM module mainly includes the Xdq reference frame, sector generation, square root, switching time generation, carry-save adder (CSA), and PWM generation modules. These modules are designed using simple logical operations and combinational circuits to improve the DSV-PWM performance. The DSV-PWM module is synthesized and implemented on a cost-effective Artix-7 FPGA device. The present work utilizes less than 1% of the chip area, operates at a maximum frequency of 597.83 MHz, and consumes 110 mW of total power on the FPGA device. The DSV-PWM module is also compared with an existing SV-PWM approach, showing improvements in hardware constraints such as chip area, operating frequency, and dynamic power (mW).

Author 1: Shalini Vashishtha
Author 2: Rekha K.R

Keywords: Digital space vector PWM; 3-phase voltage source inverter; sector generation module; switching time generation; FPGA; Verilog-HDL; Xilinx

PDF

Paper 48: Lung Cancer Detection using Bio-Inspired Algorithm in CT Scans and Secure Data Transmission through IoT Cloud

Abstract: Early recognition of pulmonary cancer nodules significantly increases the odds of survival, but it is also a harder problem to solve, as it often relies on visual examination of tomography scans. By increasing the possibility of effective treatment, earlier tumor diagnosis decreases lung cancer mortality. Radiologists usually diagnose lung cancer on medical images through a systematic analysis that is time-consuming and often unreliable. Moreover, because of the substantial increase in data transmission in the healthcare sector, the protection and integrity of medical data have become a major problem for healthcare applications. This study utilizes computational intelligence techniques: a novel hybrid model is proposed for detection and data transmission. The proposed method involves two steps, in which diverse image processing procedures are used to detect cancer in the first step using MATLAB, and the data are transferred to authorized persons via the IoT cloud in the second step. The simulated steps include pre-processing, segmentation by Otsu thresholding combined with a swarm intelligence algorithm, feature extraction using the local binary pattern, and classification using the support vector machine (SVM). This work demonstrates the dominance of the swarm-intelligent framework over conventional algorithms in terms of performance metrics such as sensitivity, accuracy and specificity, as well as training time. The tests carried out show that the model built can achieve up to 92.96 percent sensitivity, 93.53 percent accuracy and 98.52 percent specificity.
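
The sketch below strings together generic versions of three of the stages named in the abstract (Otsu thresholding, local binary pattern features, SVM classification) using scikit-image and scikit-learn. The swarm-optimized segmentation and the IoT transmission step are not reproduced, and the image patches and labels are synthetic placeholders.

# Hedged sketch of an Otsu + LBP + SVM pipeline on placeholder image patches.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(gray_image, points=8, radius=1):
    """Segment by Otsu's threshold, then describe the foreground with an LBP histogram."""
    mask = gray_image > threshold_otsu(gray_image)
    lbp = local_binary_pattern(gray_image, points, radius, method="uniform")
    hist, _ = np.histogram(lbp[mask], bins=points + 2, range=(0, points + 2), density=True)
    return hist

rng = np.random.default_rng(1)
images = (rng.random((40, 64, 64)) * 255).astype(np.uint8)   # placeholder CT patches
labels = rng.integers(0, 2, size=40)                          # 1 = nodule, 0 = normal (placeholder)
X = np.array([lbp_histogram(img) for img in images])

clf = SVC(kernel="rbf").fit(X, labels)
print("training accuracy:", clf.score(X, labels))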

Author 1: C. Venkatesh
Author 2: Polaiah Bojja

Keywords: Pulmonary; mortality; carcinogenic; swarm intelligence; IoT

PDF

Paper 49: Liver Tumor Segmentation using Superpixel based Fast Fuzzy C Means Clustering

Abstract: In the computer-aided diagnosis of liver tumors, tumor segmentation from the CT image is an important step. The majority of methods do not provide an integrated framework for fast and effective tumor segmentation; hence, segmentation of the tumor is the most difficult task in diagnosis. In this paper, the CT abdominal image is segmented using a superpixel-based fast Fuzzy C-Means clustering algorithm to decrease the computation time and eliminate manual interaction. In this algorithm, a superpixel image with accurate contours is obtained using a multiscale morphological gradient reconstruction operation. Superpixel generation is a pre-segmentation step employed to improve segmentation accuracy, and FCM with a modified objective function is used to obtain the color segmentation. The method is examined on 20 CT images gathered from the LiverAtlas database, and the results show that this approach is fast and accurate compared to most segmentation algorithms. Statistical parameters including accuracy, precision, sensitivity, specificity, dice, rfn and rfp are calculated for the segmented image. The results show that this algorithm gives a high accuracy of 99.58% and an improved rfn value of 8.34% compared with the methods discussed in the literature.

Author 1: Munipraveena Rela
Author 2: Suryakari Nagaraja Rao
Author 3: Patil Ramana Reddy

Keywords: CT scan image; image segmentation; fuzzy c mean clustering; liver mask; superpixel image

PDF

Paper 50: RHEM: A Robust Hybrid Ensemble Model for Students’ Performance Assessment on Cloud Computing Course

Abstract: Creating tools, such as prediction models, to assist students in a traditional or virtual setting is an essential activity in today's educational climate. Early efforts towards incorporating such predictive models using machine learning techniques focused on predicting student achievement in terms of the grades obtained. The aim of this research is to propose a robust hybrid ensemble model (RHEM) that can warn at-risk students (on a Cloud Computing course) of their likely outcomes at the early semester assessment. We hybridised four renowned single algorithms – Naïve Bayes, Multilayer Perceptron, k-Nearest Neighbours, and Decision Table – with four well-established ensemble algorithms – Bagging, RandomSubSpace, MultiClassClassifier, and Rotation Forest – which produced 16 new hybrid ensemble classifier models. Hence, we thoroughly and rigorously built, trained, and tested 24 models altogether. The experiment concluded that the Rotation Forest + Multilayer Perceptron model was the best performing model based on the evaluation in terms of accuracy (91.70%), precision (86.1%), F-score (87.3%), and area under the receiver operating characteristic curve (98.6%). Our research will help students identify their likely final grades in terms of whether they are excellent, very good, good, pass, or fail, and thus transform their academic conduct to achieve higher grades in the final exam.

Author 1: Sapiah Sakri
Author 2: Ala Saleh Alluhaidan

Keywords: Academic performance; classification algorithms; cloud computing course; ensemble algorithms; hybrid ensemble classifier model; student academic performance tracking

PDF

Paper 51: An Ontology-Based Predictive Maintenance Tool for Power Substation Faults in Distribution Grid

Abstract: Recent advances in Power Grid (PG) technology pose an important problem of measuring the effectiveness of power grid configurations. Current assessment models are not adequate to mitigate setup issues due to the absence of a high-fidelity evaluation framework that can consider diverse scenarios based on market interest. Consequently, we develop a highly flexible ontology-based evaluation system that can accommodate and assess different scenarios. The use of an ontology as middleware is the best approach to produce an efficient, semantically aware, and operationally accurate system environment for managing flexibility in evaluation. The evaluation is made by predicting the failure intensity and subsequently generating a maintenance report for a particular configuration; the best configuration is selected by comparing the maintenance reports of different configurations. The developed evaluation system consists of three main components: the Configuration Generator Tool (GCT), the Failure Prediction Model (FDM), and the Hybrid Simulation Platform (HSP). The GCT is a knowledge-based system that provides a powerful tool for engineers to generate alternative configurations; its data were collected from the literature, validated by experts, and modeled using the Web Ontology Language (OWL). The HSP was developed using several modeling and ontology-based tools such as Blender 3D, Unity 3D, ASP.NET, MySQL, and Apache Jena Fuseki. Finally, the FDM was developed based on the impact and relationship of odd events to power grid components and the impact of a failed component on other components; the prediction is modeled using two methods, the Poisson model and the likelihood estimation method.

Author 1: Moamin A. Mahmoud
Author 2: Alicia Y.C. Tang
Author 3: Kuganesan Kumar
Author 4: Nur Liyana Law Mohd Firdaus Law
Author 5: Mathuri Gurunathan
Author 6: Durkasiny Ramachandran

Keywords: Predictive maintenance; ontology; power substation faults; distribution grid

PDF

Paper 52: Home Security System with Face Recognition based on Convolutional Neural Network

Abstract: The security of house doors is very important: it is the simplest and easiest form of security and is sufficient to provide a sense of security to homeowners. Along with technological developments, especially in the IoT field, door locking technology has developed considerably, including locking house doors with faces and other biometrics. Facial recognition systems have also been developed and implemented for home door locking; they are a simple, easy-to-use option that is quite accurate in recognizing the face of the homeowner. The CNN method has become one of the face recognition approaches that is easy to implement, has good accuracy in recognizing faces, and has been used in object recognition systems and other applications. In this study, a CNN AlexNet facial recognition system is implemented in a door locking system. Data collection was performed by gathering 1048 facial images of the homeowner, which were then used to train the machine learning model. The results are quite accurate, with an accuracy of 97.5%, which compares well with several other studies. The conclusion is that the CNN AlexNet method can perform sufficiently accurate facial recognition and can be implemented on an IoT device, namely the Raspberry Pi.

Author 1: Nourman S. Irjanto
Author 2: Nico Surantha

Keywords: Home door security; CNN Alexnet; facial recognition; Raspberry Pi

PDF

Paper 53: The Relationship of Trustworthiness and Ethical Value in the Healthcare System

Abstract: Females prefer exploring social media or healthcare systems to find information and to present their cases to a physician; however, the behavior of physicians tends to be uncontrollable in a healthcare system. Physicians have the capacity to share all of their patients' information with their colleagues without any permission from, or concern for, the patients. For this reason, it is of utmost importance to design a breast self-examination (BSE) system that can keep monthly track of self-exam data and the communication between patient and physician. To develop such a system, ethical values and trustworthiness are identified as indicators, and a survey provides details on the ethical values and trustworthiness applicable to the system. Therefore, this research focuses on the importance of ethical value and trustworthiness in the healthcare system. A survey of 772 respondents establishes the importance of the ethical values used in the healthcare system. The ethical values of interaction, integrity, confidentiality, protection, caring, and fairness have a significant influence on the healthcare system. The path coefficients answer Hypothesis I, showing a positive relationship and significant effect between ethical value and the BSE system (P<.001). On the other side, trustworthiness has a significant influence on the healthcare system: the path coefficients answer Hypothesis II, showing a positive relationship and significant effect between trustworthiness and the BSE system (P<.001). Finally, the relationship in healthcare between trustworthiness and ethical value rests on integrity, honesty and belief.

Author 1: Rajes Khana
Author 2: Manmeet Mahinderjit Singh
Author 3: Faten Damanhoori
Author 4: Norlia Mustaffa

Keywords: Ethics; ethical value; trustworthiness; breast self-examination; healthcare system; social media

PDF

Paper 54: Examining the Effect of Online Gaming Addiction on Adolescent Behavior

Abstract: Daily rates of Internet use among adolescents exceed those of adults, and the number of adolescents on the Internet is increasing all over the world. Today, as a result of the ease of access to the Internet, adolescents' access to the online world is easier and more common than ever. In this paper, we review studies that explain the behavior of adolescents while gaming online and its effects, along with statistics that gauge the impact of the Internet on teenagers. The study reviews past work on adolescent behavior and privacy, with a potential impact on adolescent behavior, which has become one of the most important problems. We focus on exploring online game addiction concerns and their effects on teens' behavior. The purpose of this type of study is to determine the objective and examine it against the backdrop of social reality. This study employed a quantitative methodology, selected because it has been proven to be reliable and has sound construct validity. The data were analyzed using the SmartPLS tool. The main objective of this study was to investigate adolescents' behavior in terms of their addiction to online games, and to study parents' awareness of the dangers of online games for their children. The study explored various factors that can influence addiction fears, examined their effects on adolescent behavior, and contributed to the literature by identifying correlation factors and addressing this gap through the application of SEM, specifically with the SmartPLS tool.

Author 1: Maha AlDwehy
Author 2: Hedia Zardi

Keywords: Online gaming addiction; adolescent behavior on internet; privacy

PDF

Paper 55: BOTNETs: A Network Security Issue

Abstract: With the technological advancements in the field of networking and information technology in general, organizations are enjoying technological blessings while simultaneously being under perpetual threat from attacks designed specifically to disable organizations and their infrastructure, among the gravest cyber threats in recent times. Compromised computers, or BOTNETs, are unarguably the most severe threat to the security of the internet community. Organizations are doing their best to curb BOTNETs in every possible way, spending huge amounts of their budget every year on available hardware and software solutions. This paper presents a survey of the security issues raised by BOTNETs, their future, how they are evolving, and how they can be circumvented to secure the most valuable resource of organizations, which is data. Compromised systems may be treated like viruses in the network, capable of causing substantial loss to the organization, including theft of confidential information. This paper highlights the parameters that should be considered by organizations or network administrators to find the anomalies that may point to the presence of a BOTNET in the network. Early detection may reduce the impact of damage by enabling timely action against compromised systems.

Author 1: Umar Iftikhar
Author 2: Kashif Asrar
Author 3: Maria Waqas
Author 4: Syed Abbas Ali

Keywords: BOTNET; malware; drones; zombies; threats

PDF

Paper 56: Assessment of Surface Water Quality on the Upper Watershed of Huallaga River, in Peru, using Grey Systems and Shannon Entropy

Abstract: The assessment of the quality of surface water is a complex issue that entails the comprehensive analysis of several parameters that are altered by natural or man-made causes. In this sense, the Grey Clustering method, which is based on Grey Systems theory, and Shannon Entropy, based on the artificial intelligence approach, provide an alternative to evaluate water quality in an integral way, considering the uncertainty within the analysis. In the present study, the water quality of the upper watershed of the Huallaga river was evaluated taking into account the monitoring results from twenty-one points collected by the National Water Authority (ANA), analyzing nine parameters of the Prati index. The results showed that all the monitoring points of the Huallaga river were classified as not contaminated, which means that the discharges generated by economic activities pass through treatment plants and meet the quality parameters. Finally, the results obtained can be of great help to the ANA and the regional and local authorities of Peru in making decisions to improve the management of the Huallaga river watershed.
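
As an illustration of the Shannon entropy component, the sketch below applies the standard entropy weighting step to a tiny invented matrix of monitoring points and parameters: parameters whose measurements vary more across points receive larger weights. The numbers are placeholders, not ANA data, and the Grey Clustering step is not reproduced.

# Illustrative-only Shannon entropy weighting over water-quality parameters.
import numpy as np

# rows = monitoring points, columns = water-quality parameters (e.g. Prati sub-indices)
data = np.array([[2.1, 0.8, 1.5],
                 [1.9, 1.1, 1.4],
                 [2.3, 0.9, 1.7],
                 [2.0, 1.0, 1.6]])

p = data / data.sum(axis=0)                        # normalize each column to proportions
k = 1.0 / np.log(data.shape[0])
entropy = -k * (p * np.log(p)).sum(axis=0)         # Shannon entropy per parameter
weights = (1 - entropy) / (1 - entropy).sum()      # degree of diversification -> weights

print("entropy:", np.round(entropy, 3))
print("weights:", np.round(weights, 3))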

Author 1: Alexi Delgado
Author 2: Jharison Vidal
Author 3: Jhon Castro
Author 4: Jhonel Felix
Author 5: Jorge Saenz

Keywords: Grey clustering; Huallaga river; Prati index; Shannon entropy; water quality

PDF

Paper 57: Supplier Qualification Model (SQM): A Quantitative Model for Supplier Agreements Evaluation

Abstract: Recently, software outsourcing has become increasingly widespread due to the valuable economic and technical benefits it brings to the software development industry, with software development organizations engaging a third party to acquire a software project component (product or service). In the acquisition process, companies rely on the CMMI Supplier Agreement Management (SAM) process area to select the potential supplier. Potential suppliers (vendors) are carefully selected through a dedicated process to ensure the delivery of high-quality and reliable services. Most of the published work on how to evaluate and select the right supplier is based on a plain, step-by-step process; no literature has been reported that evaluates suppliers in a measurable way and selects the potential ones using a quantitative model. The purpose of this paper is to propose a practical quantitative model, called the Supplier Qualification Model, that enables organizations to easily evaluate and select potential suppliers through a measurable approach that depends on monitoring and executing the SLAs of the SAM process area. The proposed model has been verified by implementing it as an extension for one of the worldwide leading Agile management platforms according to Gartner (Microsoft Team Foundation Server). Multiple versions of the extension were implemented to target the major versions of Microsoft Team Foundation Server and validated by their use in 426 companies worldwide, which demonstrates the suitability of the model.

Author 1: Mohammed Omar
Author 2: Yehia Helmy
Author 3: Ahmed Bahaa Farid

Keywords: Agile practices; vendor selection; CMMI; outsourcing; software acquisition; supplier agreement management; supplier selection; supplier evaluation

PDF

Paper 58: Feature-Based Sentiment Analysis for Arabic Language

Abstract: In light of the spread of e-commerce and e-marketing, and the presence of a huge number of reviews and texts written by people to share views on products, it has become necessary to pay attention to extracting these opinions automatically and analyzing the feelings of the reviewers. The goal is to obtain reports evaluating products and to contribute to improving services at a glance. Sentiment analysis is a relatively recent field that deals with the processing of natural-language texts published on websites and social networks. However, processing texts written in Arabic is one of the challenges that specialists face, because people do not rely on standard Arabic: they write in spoken/colloquial language and use various dialects. This paper presents feature-based sentiment analysis for the Arabic language, a text analysis technique that breaks text down into aspects (attributes or components of a product or service) and then assigns each one a sentiment level (positive, negative or neutral).

Author 1: Ghady Alhamad
Author 2: Mohamad-Bassam Kurdy

Keywords: Sentiment analysis; feature-based; colloquial Arabic; opinion mining; natural language processing

PDF

Paper 59: Permission Extraction Framework for Android Malware Detection

Abstract: Nowadays, Android-based devices are used more than devices based on other operating systems. Statistics show that the market share of Android on mobile devices in March 2018 was 84.8 percent, compared with only 15.1 percent for iOS. These numbers indicate that most attacks are directed at Android devices. In addition, most people keep their confidential information on their mobile phones, and hence there is a need to secure this operating system against harmful attacks. Detecting malicious applications in the Android market is becoming a very complex procedure: as the attacks increase, the complexity of feature selection and classification techniques grows. There are many solutions for detecting malicious applications on the Android platform, but these solutions handle feature extraction and classification inefficiently and produce many false alarms. In this work, the researchers propose a multi-level permission extraction framework for malware detection on Android devices. The framework uses a permission extraction approach to label malicious applications by analyzing permissions, and it is capable of handling a large number of applications while keeping the performance metrics optimized. A static analysis method was employed in this work, and the Support Vector Machine (SVM) and Decision Tree algorithms were used for classification. The results show that, as the input data increase, the model keeps detection accuracy at an acceptable level.
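
The sketch below illustrates the general permission-based idea: each app is represented by a binary vector over the permissions it requests and classified with an SVM. The permission sets, labels, and classifier choice are invented placeholders, not the paper's dataset or its multi-level framework.

# Hedged sketch of permission-vector malware classification with an SVM.
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

apps = [
    {"android.permission.INTERNET", "android.permission.READ_SMS",
     "android.permission.SEND_SMS"},
    {"android.permission.INTERNET", "android.permission.ACCESS_NETWORK_STATE"},
    {"android.permission.READ_CONTACTS", "android.permission.SEND_SMS",
     "android.permission.RECEIVE_BOOT_COMPLETED"},
    {"android.permission.INTERNET", "android.permission.CAMERA"},
]
labels = [1, 0, 1, 0]          # 1 = malicious, 0 = benign (placeholder labels)

mlb = MultiLabelBinarizer()
X = mlb.fit_transform(apps)    # one binary column per distinct permission

clf = LinearSVC().fit(X, labels)
print(clf.predict(mlb.transform([{"android.permission.INTERNET",
                                  "android.permission.SEND_SMS"}])))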

Author 1: Ali Ghasempour
Author 2: Nor Fazlida Mohd Sani
Author 3: Ovye John Abari

Keywords: Malware detection; android device; operating system; malicious application; machine learning

PDF

Paper 60: Performance Impact of Genetic Operators in a Hybrid GA-KNN Algorithm

Abstract: Diabetes is a chronic disease caused by a deficiency of insulin that is prevalent around the world. Although doctors diagnose diabetes by testing glucose levels in the blood, they cannot determine whether a person is diabetic on this basis alone. Classification algorithms are an immensely helpful approach to accurately predicting diabetes. Merging two algorithms like the K-Nearest Neighbor (K-NN) Algorithm and the Genetic Algorithm (GA) can enhance prediction even more. Choosing an optimal ratio of crossover and mutation is one of the common obstacles faced by GA researchers. This paper proposes a model that combines K-NN and GA with Adaptive Parameter Control to help medical practitioners confirm their diagnosis of diabetes in patients. The UCI Pima Indian Diabetes Dataset is deployed on the Anaconda python platform. The mean accuracy of the proposed model is 0.84102, which is 1% better than the best result in the literature review.
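
A hedged sketch of the GA + K-NN hybrid idea with adaptive parameter control: a small genetic algorithm evolves a feature subset, K-NN cross-validation accuracy serves as the fitness, and the mutation rate is adapted to the fitness spread of the population. The dataset here is synthetic (the Pima Indian Diabetes data are not bundled with scikit-learn), and the population size, rates, and adaptation rule are assumptions, not the paper's model.

# Hedged GA + K-NN sketch with a simple adaptive mutation rate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)
rng = np.random.default_rng(5)
n_features = X.shape[1]

def fitness(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(n_neighbors=5), X[:, mask], y, cv=3).mean()

pop = rng.integers(0, 2, size=(10, n_features)).astype(bool)
for generation in range(5):
    scores = np.array([fitness(ind) for ind in pop])
    # adaptive parameter control: raise the mutation rate when fitness values converge
    mut_rate = 0.02 if scores.std() > 0.01 else 0.2
    parents = pop[scores.argsort()[::-1][:5]]           # keep the five fittest subsets
    children = []
    while len(children) < len(pop):
        a, b = parents[rng.integers(0, len(parents), size=2)]
        cut = int(rng.integers(1, n_features))
        child = np.concatenate([a[:cut], b[cut:]])       # one-point crossover
        flip = rng.random(n_features) < mut_rate         # adaptive bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop = np.array(children)

best = max(pop, key=fitness)
print("selected features:", np.flatnonzero(best), "CV accuracy:", round(fitness(best), 3))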

Author 1: Raghad Sehly
Author 2: Mohammad Mezher

Keywords: Data mining; classification; K-NN; GA; Pima Indian Diabetes Dataset; UCI

PDF

Paper 61: Enhanced Method to Stream Real Time Data in IoT using Dynamic Voltage and Frequency Scaling with Memory

Abstract: DVFS (Dynamic Voltage and Frequency Scaling) is a popular CPU (Central Processing Unit) level voltage/frequency scaling technology driven by application priority. To motivate frequency/voltage scaling as a feasible tool for energy efficiency, (i) typical workloads should show that memory frequency scaling has an impact with insignificant degradation, and (ii) there should be a large opportunity for power reduction. Efficiency depends on both power and runtime, because energy is the product of power and time, so reducing power alone does not necessarily increase efficiency: operating at lower-power work points extends runtime and can thereby increase energy. There is therefore a break-even point when decreasing the memory frequency/voltage. Statically scaling the memory frequency also has little impact on many lighter workloads, because frequency affects only the data-transfer portion of memory latency, not the idle portion. Motivated by this, this paper examines how memory frequency scaling affects system power (a system-level model is presented to simplify voltage scaling) and therefore energy, and presents DVFS with memory for real-time computing. The DVFS technique is popular for scaling voltage and frequency according to CPU-level application demands; in this work, an enhanced DVFS-with-memory technique is proposed to decrease energy use and improve performance at the memory level.

Author 1: H. A. Hashim

Keywords: Dynamic voltage; frequency scaling; central processing unit

PDF

Paper 62: Energy Efficient Cluster based Routing Protocol with Secure IDS for IoT Assisted Heterogeneous WSN

Abstract: Currently, wireless sensor networks (WSNs) and the Internet of Things (IoT) have become useful in a wide range of applications. The nodes in IoT-assisted WSNs commonly operate on restricted battery units, so energy efficiency is a major design issue. Clustering and route selection are commonly utilized energy-efficient techniques for WSNs. Although several cluster-based routing approaches are available for homogeneous WSNs, only a limited number of studies have focused on energy-efficient heterogeneous WSNs (HWSNs). Moreover, security poses a major design issue in HWSNs. This paper introduces an energy-efficient cluster-based routing protocol with a secure intrusion detection system for HWSNs, called EECRP-SID. The proposed EECRP-SID technique involves three main phases: cluster construction, optimal path selection, and intrusion detection. Initially, the type-II fuzzy logic-based clustering (T2FC) technique with three input parameters is applied for cluster head (CH) selection. These parameters are the residual energy level (REL), the distance to the base station (DTBS), and the node density (NDEN). In addition to CH selection, the salp swarm optimization (SSO) technique is utilized to select optimal paths for inter-cluster data transmission, which results in an energy-efficient HWSN. Finally, to achieve security in the cluster-based WSN, an effective intrusion detection system (IDS) using long short-term memory (LSTM) is executed on the CHs to identify the presence of intruders in the network. The EECRP-SID method was implemented in MATLAB, and the experimental outcomes indicate that it outperformed the compared methods in terms of distinct performance measures.

Author 1: Sultan Alkhliwi

Keywords: Wireless Sensor Networks (WSN); Clustering; Routing; Type II fuzzy logic; salp swarm algorithm; long short-term memory (LSTM)

PDF

Paper 63: Moment Features based Violence Action Detection using Optical Flow

Abstract: Instantaneous detection of violence is still an unsolved research problem, even though artificial intelligence is enjoying prosperous years. The severity of injuries caused by violence can be minimized by detecting violence in real time, which demands effective violence detection. Various methods were previously proposed for violence detection, but they could not provide robust results due to many challenges: noise, motion estimation, lack of appropriate feature selection, lack of an effective classification approach, complex backgrounds and variations in illumination. This research proposes an efficient method for violence detection that uses moment features to capture motion patterns, facilitating detection in each frame and providing a smaller region of interest. This reduces the probability that motion intensity is lost because of same-colored objects in the background, and thus minimizes background complexity. The proposed method then uses optical flow to calculate angles and linear distances in each frame. In this context, if a frame is degraded by noise or illumination variation, the proposed method uses a Kalman filter to process that frame and eliminate the noise. Finally, the decision on violence is made by a random forest classifier from a single feature vector by generating a set of probabilities for each class. Extensive experimentation was performed, and an accuracy of 99.12% was achieved at a frame rate of 35 fps, which is higher than previous research results. The experimental results reveal the effectiveness of the proposed methodology.
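
The sketch below shows a generic version of the motion-feature stage: dense optical flow between consecutive frames yields per-pixel angles and magnitudes, which are summarized into a feature vector that a random forest can classify. The frames are synthetic, the Farneback parameters are OpenCV defaults, and the moment features and Kalman filtering of the paper are not reproduced.

# Hedged sketch: optical-flow features plus a random forest, on placeholder frames.
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def flow_features(prev_gray, next_gray, bins=8):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return np.append(hist / (hist.sum() + 1e-9), mag.mean())   # orientation histogram + mean intensity

rng = np.random.default_rng(2)
frames = (rng.random((10, 120, 160)) * 255).astype(np.uint8)    # placeholder grayscale frames
X = np.array([flow_features(frames[i], frames[i + 1]) for i in range(len(frames) - 1)])
y = rng.integers(0, 2, size=len(X))                              # 1 = violent, 0 = non-violent (placeholder)

clf = RandomForestClassifier(n_estimators=100).fit(X, y)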

Author 1: A F M Saifuddin Saif
Author 2: Zainal Rasyid Mahayuddin

Keywords: Violence detection; feature extraction; classification; optical flow

PDF

Paper 64: Object based Image Splicing Localization using Block Artificial Grids

Abstract: People freely share pictures with their loved ones and others using smartphones or social networking sites. The news industry and the courts use pictures as evidence in their investigations. At the same time, user-friendly photo editing tools can alter the content of pictures and make their validity questionable. For over two decades, research has been ongoing in image forensics to determine a picture's trustworthiness. This paper proposes an efficient statistical method based on Block Artificial Grids in double-compressed images to identify regions attacked by image manipulation. In contrast to existing approaches, the proposed approach extracts the artefacts from individual objects instead of the entire image. A localized algorithm is proposed based on the cosine dissimilarity between objects, which exposes the tampered object as the one with maximum dissimilarity among the objects. The experimental results reveal that the proposed method is superior to other current methods.

Author 1: P N R L Chandra Sekhar
Author 2: T N Shankar

Keywords: Image forensics; splicing localization; block artificial grids; object segmentation; double compression

PDF

Paper 65: Multi-Channel Muscle Armband Implementation: Electronic Circuit Validation and Considerations towards Medical Device Regulation Assessment

Abstract: Multi-channel muscle arrays are commonly used as sensors in bionic prosthetic devices, offering an innovative solution to recover motion in transradial amputees. This study presents preliminary assessments towards validation of a muscle armband for use by transradial users. Analog and digital components were designed based on medical agencies' recommendations to assess future compliance with Latin American medical device regulations. The study follows two approaches, an exploratory and a pre-experimental design. The design was validated against the research literature and medical device regulations. For validation, the pre-experimental design was guided by a quantitative paradigm: the muscle signal was assessed before and after the conditioning circuit for up to four muscle signals in real time. The present study considers both the muscle-signal conditioning circuit and the embedded logic implementation used to record signals from the designed muscle armband. Results show that the proposed device allows noninvasive recording of signals in the 20-500 Hz frequency band.
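
To illustrate the signal-conditioning idea in software terms, the sketch below applies a 20-500 Hz band-pass filter to one raw surface-EMG channel. The sampling rate, filter order, and synthetic signal are assumptions for the example, not the armband's actual specification, which conditions the signal in analog hardware.

# Illustrative 20-500 Hz band-pass filtering of a synthetic sEMG channel.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 2000                                   # assumed sampling rate in Hz
t = np.arange(0, 1.0, 1 / fs)
raw = (np.sin(2 * np.pi * 5 * t)            # motion-artifact-like drift (below the band)
       + 0.5 * np.sin(2 * np.pi * 120 * t)  # in-band muscle-activity component
       + 0.05 * np.random.randn(t.size))    # broadband noise

sos = butter(4, [20, 500], btype="bandpass", fs=fs, output="sos")
emg = sosfiltfilt(sos, raw)                 # zero-phase filtering preserves burst timing

print("raw RMS:", round(float(np.sqrt(np.mean(raw ** 2))), 3),
      "filtered RMS:", round(float(np.sqrt(np.mean(emg ** 2))), 3))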

Author 1: Martha Rocio Gonzales Loli
Author 2: Elsa Regina Vigo Ayasta
Author 3: Leyla Agueda Cavero Soto
Author 4: Jose Albites-Sanabria

Keywords: Component; muscle armband; surface electromyography; medical device regulation; transradial users

PDF

Paper 66: A Novel Machine Learning based Model for COVID-19 Prediction

Abstract: At the end of 2019, the World Health Organization (WHO) assigned the name COVID-19 to the disease caused by the novel coronavirus. Coronaviruses are a family of viruses named for the spiky crown on the outer surface of the virus. The novel coronavirus, also known as SARS-CoV-2, is a contagious respiratory virus that was first reported in Wuhan, China. Owing to the rapid and sudden spread of COVID-19, it has attracted the attention of scientists and researchers all over the world. Researchers in the data science field are trying to analyze the worldwide infection cases day by day to gain a complete statistical view of the current situation. In this paper, a novel approach to predict the daily infection records for COVID-19 is presented. The model is applied to Egypt as well as to the ten highest-ranked countries based on the number of cases and rate of change. The proposed model is implemented using supervised machine-learning regression algorithms, and the dataset used for prediction was issued by the WHO starting from 22 Jan 2020.
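
The sketch below shows one generic form of the supervised regression idea: fit daily counts against a day index with a polynomial model and extrapolate one day ahead. The numbers, degree, and algorithm choice are placeholders for illustration, not the WHO data or the paper's exact models.

# Hedged sketch: polynomial regression of daily counts against a day index.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

days = np.arange(1, 31).reshape(-1, 1)                     # day index since 22 Jan 2020
cases = (0.9 * days.ravel() ** 2 + 5 * days.ravel()
         + np.random.default_rng(4).normal(scale=10, size=30))   # placeholder daily counts

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(days, cases)
print("predicted cases for day 31:", float(model.predict([[31]])[0]))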

Author 1: Tamer Sh. Mazen

Keywords: Coronavirus; COVID-19; coronavirus in Egypt; supervised machine learning; regression models

PDF

Paper 67: COVID-19 Transmission Risks Assessment using Agent-Based Weighted Clustering Approach

Abstract: Coronavirus is a pandemic disease spreading rapidly from human to human all over the world. The coronavirus family ranges from the common cold to severe diseases such as MERS-CoV and SARS-CoV, and the novel virus was initially identified in China in December 2019. The main aim of this research is to assess the COVID-19 transmission risk from human to human within a cluster. An agent-based weighted clustering approach is used to rapidly identify coronavirus-infected people within a cluster. In the weighted clustering approach, normal agents are considered susceptible nodes and coronavirus-infected people are considered malicious nodes. The Cluster Head (CH) is elected based on several weighting factors, and a trust value is evaluated for all the agents within the cluster. The cluster head periodically transfers the malicious node information to all other nodes within the cluster. Finally, the agent-based weighted clustering machine learning model is used to identify the number of coronavirus-infected people within the cluster.

Author 1: P. Vidya Sagar
Author 2: T. Pavan Kumar
Author 3: G. Krishna Chaitanya
Author 4: Moparthi Nageswara Rao

Keywords: COVID-19; machine learning; weighted clustering; malicious node; susceptible node; head; trust

PDF

Paper 68: Genetic Programming-Based Code Generation for Arduino

Abstract: This article describes a methodology for writing the program for the Arduino board using an automatic generator of assembly language routines that works based on a cooperative coevolutionary multi-objective linear genetic programming algorithm. The methodology is described in an illustrative example that consists of the development of the program for a digital thermometer organized on a circuit formed by the Arduino Mega board, a text LCD module, and a temperature sensor. The automatic generation of a routine starts with an input-output table that can be created in a spreadsheet. The following routines have been automatically generated: initialization routine for the text LCD screen, routine for determining the temperature value, routine for converting natural binary code into unpacked two-digit BCD code, routine for displaying a symbol on the LCD screen. The application of this methodology requires basic knowledge of the assembly programming language for writing the main program and some initial configuration routines. With the application of this methodology in the illustrative example, 27% of the program lines were written manually, while the remaining 73% were generated automatically. The program, produced with the application of this methodology, preserves the advantage of assembly language programs of generating machine code much smaller than that generated by using the Arduino programming language.

Author 1: Wildor Ferrel
Author 2: Luis Alfaro

Keywords: Genetic programming; Arduino mega board; multi-objective linear genetic programming; cooperative coevolutionary algorithm; automatic generation of programs; Arduino based thermometer

PDF

Paper 69: Drop-Out Prediction in Higher Education Among B40 Students

Abstract: Malaysian citizens are categorized into three income groups: the Top 20 Percent (T20), Middle 40 Percent (M40), and Bottom 40 Percent (B40). One of the focus areas of the Eleventh Malaysia Plan (11MP) is to elevate the B40 household group towards the middle-income society. In 2018, it was estimated that 4.1 million households belong to this group. The government of Malaysia has widened access to higher education for the B40 group in an effort to reduce socioeconomic gaps and improve living standards. Statistical data show that since 2013, the yearly intake of students into bachelor's degree programs at Malaysia's public universities has exceeded 85,000. Despite this huge number of enrolments, not all are able to graduate, including students from low-income family backgrounds. The data mining approach, with machine learning techniques, has been used widely, effectively and accurately to predict students at risk of dropping out in general education. However, machine learning work on student attrition in Malaysia's higher education is generally lacking. Therefore, in this research, three machine learning models were developed using the Decision Tree, Random Forest and Artificial Neural Network algorithms in order to classify attrition among B40 students in bachelor's degree programs at Malaysia's public universities. A comparative performance analysis of the three models indicates that the Random Forest model is the best model for predicting student attrition in this study, outperforming the other two models in terms of accuracy, precision, recall and F-measure, with values of 95.93%, 97.10%, 81.26% and 88.50%, respectively. There is a statistically significant difference in performance between the Random Forest model and the Decision Tree model, but no statistically significant difference between the Random Forest model and the Artificial Neural Network model.
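
The sketch below mirrors the comparison setup in generic scikit-learn terms: train a Decision Tree, a Random Forest and a neural network on the same features and report accuracy, precision, recall and F-measure. The synthetic data, feature count and hyperparameters are placeholders, since the B40 student records are not public.

# Hedged sketch of the three-model comparison on synthetic stand-in data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 12))                 # placeholder student attributes
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=1000) > 0.8).astype(int)  # 1 = drop-out

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(name, accuracy_score(y_te, pred), precision_score(y_te, pred),
          recall_score(y_te, pred), f1_score(y_te, pred))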

Author 1: Nor Samsiah Sani
Author 2: Ahmad Fikri Mohamed Nafuri
Author 3: Zulaiha Ali Othman
Author 4: Mohd Zakree Ahmad Nazri
Author 5: Khairul Nadiyah Mohamad

Keywords: Machine learning; prediction; student attrition; student drop-out; B40; random forest; decision tree; artificial neural network

PDF

Paper 70: Proficiency Assessment of Machine Learning Classifiers: An Implementation for the Prognosis of Breast Tumor and Heart Disease Classification

Abstract: Breast cancer and heart disease are very dangerous and common diseases in many countries, including Pakistan. In this paper, a comparative study of classifiers has been performed for tumor and heart disease classification. Around one hundred thousand women are diagnosed annually with this life-threatening disease despite having no family history of it. If it is not treated on time, it may grow and spread to other parts of the human body. Mammograms are X-rays of the breast that can be used for screening of cancerous tumors. Early identification of breast cancer may increase the chance of survival by up to 70 percent. Tumors that cause cancer can be categorized into two types: a) benign and b) malignant. A benign tumor does not attach to neighboring tissues or spread to other parts of the body. A malignant tumor can grow and spread to other parts of the body and affect them. Classifying a tumor as malignant or benign is very complex because a cancerous tumor and a tumor caused by skin inflammation appear almost the same. Early identification of malignancy is essential to protect the patient's life. Diverse medical methods based on deep learning and machine learning have been developed to treat patients, as cancer is a very serious and crucial issue in this era. In this research paper, machine learning algorithms such as logistic regression, k-NN, and decision tree have been applied to the breast cancer dataset taken from the UCI Machine Learning Repository. A comparative study of classifiers has been performed to determine the better classifier for the robust prediction of breast tumors. Simulation results showed that logistic regression achieved ninety-one percent accuracy, indicating that logistic regression can be applied for accurate and precise early prediction of breast cancer. Cardiovascular disease is also very common throughout the world, and many factors cause heart disease or heart attack. Factors leading to heart failure include varying blood pressure, high blood sugar, cardiac pain, heart rate, high cholesterol level (LDL), artery blockage, and irregular ECG signals. Many researchers have shown that stress in patients can also be a cause of heart disease. High numbers of cardiac surgeries, such as angioplasty and heart bypass, are performed annually. People often do not care about their lifestyle and diet and ignore the symptoms, although heart disease can be predicted early and treated if proper testing and medication are done. Sometimes a false pain produces the same feeling as angina pain, suggesting cardiovascular disease. To reduce false alarms and robustly classify heart disease, several machine learning approaches have been adopted. In the proposed research, for the accurate classification of heart disease, a comparison has been performed among support vector machine (SVM), k-nearest neighbors (k-NN), and linear discriminant analysis. Simulation results demonstrated that the support vector machine was the better classifier, with an accuracy of 80.4%.
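The benign/malignant logistic-regression experiment described above can be sketched as follows with scikit-learn's bundled copy of the UCI Wisconsin breast cancer data. This is a minimal illustration under the assumption of a standard train/test split; it does not reproduce the authors' exact dataset or preprocessing.

```python
# Minimal sketch of logistic-regression benign/malignant classification on the
# UCI Wisconsin breast cancer data bundled with scikit-learn (illustrative only).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```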

Author 1: Talha Ahmed Khan
Author 2: Kushsairy A. Kadir
Author 3: Shahzad Nasim
Author 4: Muhammad Alam
Author 5: Zeeshan Shahid
Author 6: M.S Mazliham

Keywords: Breast cancer; benign; malignant; logistic regression; cardiovascular disease; heart disease diagnosis; support vector machine; classifiers; k-nearest neighbors

PDF

Paper 71: Educational Data Mining for Monitoring and Improving Academic Performance at University Levels

Abstract: This study applied Educational Data Mining to 712 samples of logs extracted from the Moodle Learning Management System (LMS) at an African university in order to measure students' and staff's patterns of use of LMS resources and hence determine whether the quantity of participation, measured as the amount of time spent using LMS resources, improved students' academic performance. Data collected from the Moodle LMS were preprocessed and analyzed using machine learning algorithms for clustering, classification, and visualization from the WEKA system tools. The dataset consisted of course tools (Quiz, Assignment, Chat, Forum, URL, Folder, and Files) and lecturer and student usage of the tools. Furthermore, SPSS was used to obtain a matrix of correlation coefficients for course tools, tests, and final grade. Correlation analysis was done to verify whether students' use of course tools had an impact on their academic performance. Findings indicated the pattern of usage for Course 1 as Quiz (38358), System (17910), Forum (8663), File (8566), Assignment (1235), Folder (514), File Submission (172), and Chat (37); for Course 2 as System (11920), Quiz (8208), Forum (4476), File (4394), Assignment (257), Chat (247), URL (125), and File Submission (38); and for Course 3 as System (2622), File (1022), Folder (570), Forum (258), and URL (2). Overall, evaluating the correlation between the use of LMS resources and students' performance, findings indicated a significant relationship between the use of LMS resources and students' academic performance at the 0.01 level of significance. The findings are useful for strategic academic planning purposes with LMS data at the university.
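A possible sketch of the correlation step, analogous to the SPSS analysis mentioned above, is shown below: Pearson coefficients between per-student LMS tool usage and the final grade. The CSV path and column names are hypothetical placeholders, not the study's actual export.

```python
# Sketch of a Pearson correlation between LMS tool usage and final grade.
# The file and columns are placeholders standing in for a Moodle log export.
import pandas as pd

usage = pd.read_csv("lms_usage_per_student.csv")   # placeholder per-student usage table
cols = ["quiz_time", "forum_time", "file_time", "assignment_time", "final_grade"]
corr = usage[cols].corr(method="pearson")

# Correlation of each tool's usage with the final grade
print(corr["final_grade"].drop("final_grade"))
```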

Author 1: Ezekiel U Okike
Author 2: Merapelo Mogorosi

Keywords: Educational data mining; learning management systems; Weka system tools; improved academic performance

PDF

Paper 72: Improved PSO Performance using LSTM based Inertia Weight Estimation

Abstract: Particle Swarm Optimization (PSO), first introduced in 1995, is a widely applied population-based meta-heuristic optimization algorithm. PSO is used in diverse areas of science, engineering, technology, medicine, and the humanities. Its performance can be improved by tuning the inertia weight, topology, and velocity clamping. Researchers have proposed different Inertia Weight based PSO (IWPSO) variants, each aiming to outperform existing PSOs. In this paper, a PSO whose inertia weight is predicted by a Long Short-Term Memory (LSTM) network, LSTMIWPSO, is proposed, and its performance is compared with constant, random, and linearly decreasing inertia weight PSO. Tests are conducted on swarm sizes of 50, 75, and 100 with dimensions 10, 15, and 25. The experimental results show that the LSTM-based IWPSO outperforms CIWPSO, RIWPSO, and LDIWPSO.
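To illustrate where the inertia weight enters the algorithm, here is a minimal PSO sketch on a standard benchmark function. The LSTM prediction step from the paper is not reproduced; a simple linearly decreasing schedule stands in for it as a labeled placeholder.

```python
# Minimal PSO sketch showing where the inertia weight w enters the velocity
# update. In the paper w is predicted by an LSTM each iteration; here a
# placeholder linearly decreasing schedule stands in for that prediction.
import numpy as np

def sphere(x):                      # classic benchmark function
    return float(np.sum(x * x))

def pso(dim=10, swarm=50, iters=200, w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5.12, 5.12, (swarm, dim))
    vel = np.zeros((swarm, dim))
    pbest = pos.copy()
    pbest_val = np.array([sphere(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()

    for t in range(iters):
        w = w_max - (w_max - w_min) * t / iters   # placeholder for the LSTM-predicted weight
        r1, r2 = rng.random((swarm, dim)), rng.random((swarm, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([sphere(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())

if __name__ == "__main__":
    best, best_val = pso()
    print("best objective value:", best_val)
```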

Author 1: Y. V.R.Naga Pawan
Author 2: Kolla Bhanu Prakash

Keywords: Particle swarm optimization; inertia weight; long short term memory; benchmark functions; convergence

PDF

Paper 73: Identifying the Impacts of Active and Passive Attacks on Network Layer in a Mobile Ad-hoc Network: A Simulation Perspective

Abstract: In this research, we investigate the features and behaviors of network-layer active and passive attacks on the Ad-hoc On-Demand Distance Vector (AODV) routing protocol in Mobile Ad-hoc Networks (MANET). Through a literature survey, we identify the features of each attack, and we examine the behaviors of these attacks through simulations in Network Simulator 2 (NS2). Blackhole, Grayhole, and Wormhole attacks are used in this simulation study. Each attack is introduced independently into the network to find its impact on network performance, evaluated through Packet Delivery Ratio (PDR), Average End-to-End Delay (AEED), Throughput, Average Data Dropping Rate (ADDR), and Simulation Processing Time at Intermediate Nodes (SPTIN). To obtain more accurate results, the simulation parameters are kept the same in each simulation. A control network is simulated for comparison with each attack simulation. Simulations are repeated while changing the number of connected intermediate nodes (hops) in the network. From the analysis of the collected data, among the three attacks, the lowest SPTIN was observed in the network containing a Blackhole or Grayhole attack. The network affected by a Blackhole attack shows a higher ADDR than the control network. Furthermore, the data forwarding rate is higher in the network affected by a Wormhole attack. Finally, according to the simulation studies, we conclude that Blackhole and Grayhole attacks cause more damage to network performance than Wormhole attacks.
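For clarity on the reported metrics, the sketch below computes PDR, AEED, throughput, and ADDR from already-parsed send/receive records. Parsing the NS2 trace file itself is omitted, and the record format is a hypothetical simplification rather than the authors' scripts.

```python
# Sketch of the performance metrics named above, computed from simplified,
# already-parsed packet records (hypothetical format; NS2 trace parsing omitted).
def metrics(sent, received, sim_time):
    """sent: list of (packet_id, send_time, size_bits)
       received: list of (packet_id, recv_time); assumes at least one delivery."""
    send_time = {pid: t for pid, t, _ in sent}
    size_bits = {pid: s for pid, _, s in sent}

    pdr = len(received) / len(sent) * 100                                   # Packet Delivery Ratio (%)
    addr = (len(sent) - len(received)) / len(sent) * 100                    # Average Data Dropping Rate (%)
    aeed = sum(t - send_time[pid] for pid, t in received) / len(received)   # Average End-to-End Delay (s)
    throughput = sum(size_bits[pid] for pid, _ in received) / sim_time      # bits per second
    return pdr, aeed, throughput, addr
```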

Author 1: Uthumansa Ahamed
Author 2: Shantha Fernando

Keywords: Active attack; network layer; passive attack; performance metrics; simulation study

PDF

Paper 74: Hybrid Solution for Container Placement and Load Balancing based on ACO and Bin Packing

Abstract: Currently, the energy consumption of cloud data centers is attracting a lot of interest. One of the main approaches to optimizing energy and cost in data centers is virtualization. Recently, a new type of container-based virtualization has appeared; containers can be considered very light and modular virtual machines, offering great flexibility and the possibility of migration from one environment to another, which allows applications to be optimized for the cloud. Another approach to saving energy is to consolidate the workload, i.e., the amount of processing that a computer has to perform at any given time. In this article, we study a container placement algorithm that takes into account the QoS requirements of different users in order to minimize energy consumption. We propose a hybrid approach for managing resources and workload based on ant colony optimization (ACO) and the first-fit decreasing (FFD) algorithm to avoid unnecessary power consumption. The experimental results indicate that using the first-fit decreasing (FFD) algorithm for container placement is better than ant colony optimization, especially in homogeneous systems. On the other hand, ant colony optimization shows very satisfying results for workload management.
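As a minimal sketch of the first-fit decreasing placement step named above, the snippet below packs container demands onto hosts along a single resource dimension. This is a single-resource simplification under assumed inputs; the ACO workload-management part is not shown.

```python
# Sketch of first-fit decreasing (FFD) placement: containers are sorted by
# demand and each is placed on the first host with enough remaining capacity,
# opening a new host when none fits. Single-resource simplification.
def first_fit_decreasing(demands, host_capacity):
    """demands: list of container resource demands; returns a list of hosts,
    each host being the list of demands placed on it."""
    hosts, free = [], []
    for d in sorted(demands, reverse=True):
        for i, remaining in enumerate(free):
            if d <= remaining:
                hosts[i].append(d)
                free[i] -= d
                break
        else:                       # no existing host can take this container
            hosts.append([d])
            free.append(host_capacity - d)
    return hosts

if __name__ == "__main__":
    print(first_fit_decreasing([0.5, 0.7, 0.2, 0.4, 0.3], host_capacity=1.0))
```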

Author 1: Oussama SMIMITE
Author 2: Karim AFDEL

Keywords: Cloud; virtualization; container; placement; Green IT; containerization

PDF

Paper 75: Speaker-Independent Speech Recognition using Visual Features

Abstract: Visual speech recognition aims at transcribing lip movements into readable text. There have been many strides in automatic speech recognition systems that can recognize words from audio and visual speech features, even under noisy conditions. This paper focuses only on the visual features; a robust system would use visual features to support acoustic features. We propose the concatenation of visemes (lip movements) for text classification rather than classic individual viseme mapping. The results show that this approach achieves a significant improvement over state-of-the-art models. The system has two modules: the first extracts lip features from the input video, while the second is a neural network trained to process the viseme sequence and classify it as text.

Author 1: Pooventhiran G
Author 2: Sandeep A
Author 3: Manthiravalli K
Author 4: Harish D
Author 5: Karthika Renuka D

Keywords: Visual speech recognition; audio speech recognition; visemes; lip reading system; Convolutional Neural Network (CNN)

PDF

Paper 76: Security Issues in Near Field Communications (NFC)

Abstract: Near Field Communications (NFC) is a rising technology that enables two devices in close proximity to quickly establish wireless contactless communication. It appears intuitively secure, and various applications such as ticketing, mobile payments, and access grants have taken advantage of NFC and flooded the market in recent years. However, is it worth trusting such applications at the risk of leaking the user's private information? This paper surveys NFC vulnerabilities and the different kinds of security attacks that exploit them. Based on the surveyed material, the paper covers possible solutions that could defend against those security threats. Furthermore, the attacks and countermeasures are evaluated in terms of practicality and cost.

Author 1: Arwa Alrawais

Keywords: Near Field Communications (NFC); NFC attacks; NFC countermeasures; NFC vulnerabilities

PDF

Paper 77: Meezaj: An Interactive System for Real-Time Mood Measurement and Reflection based on Internet of Things

Abstract: Subjective well-being has a critical effect on the progress and productivity vital for digital and strategic transformation. The increase in suicide attempts among college and university students is a clear indication of stress and anxiety among students. Offering a fulfilling and healthy life to promote the life-long learning journey is also one of the important objectives of Vision 2030 for the modernization of the Kingdom of Saudi Arabia. Due to the multifaceted nature of subjective well-being, real-time mood measurement and reflection is a challenging task and demands the latest technologies. This paper presents Meezaj, an interactive system for real-time mood measurement and reflection leveraging Internet of Things (IoT) technology. The architecture and workflow of the Meezaj system are discussed in detail. Meezaj not only promotes a sense of significance in students, by indicating that their happiness matters in decision making, but also assists policy makers in identifying factors affecting happiness in an educational institution.

Author 1: Ehsan Ahmad

Keywords: Subjective well-being; happiness; IoT; Arduino; Vision 2030

PDF

Paper 78: A Repeated Median Filtering Method for Denoising Mammogram Images

Abstract: In the medical field, mammogram analysis is one of the most important procedures for breast cancer detection and early diagnosis. During the image acquisition process, mammograms may contain noise due to changes in illumination and sensor error. Hence, it is necessary to remove this noise without affecting the edges and fine details in order to achieve an effective diagnosis from breast images. In this work, a repeated median filtering method is proposed for denoising digital mammogram images. A number of experiments are conducted on a dataset of different mammogram images to evaluate the proposed method using a set of image quality metrics. Experimental results are reported by computing the image quality metrics between the original clean images and the denoised images corrupted by different levels of simulated speckle noise as well as salt and pepper noise. The evaluation quality metrics show that the repeated median filter method achieves better results than the traditional median filter method.
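The idea of repeated median filtering can be sketched as below with SciPy: a small median filter applied several times in succession, scored with PSNR against a clean image. The stand-in image, noise level, and repetition count are illustrative assumptions, not the paper's data or settings.

```python
# Illustrative sketch of repeated median filtering: a 3x3 median filter is
# applied several times and PSNR is measured against the clean image.
import numpy as np
from scipy.ndimage import median_filter

def psnr(clean, test, peak=255.0):
    mse = np.mean((clean.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def repeated_median(image, size=3, repeats=3):
    out = image
    for _ in range(repeats):
        out = median_filter(out, size=size)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.tile(np.arange(128, dtype=np.uint8), (128, 1))   # smooth stand-in image
    noisy = clean.copy()
    mask = rng.random(clean.shape) < 0.05                       # simulated salt-and-pepper noise
    noisy[mask] = rng.choice([0, 255], size=int(mask.sum()))
    print("PSNR noisy   :", psnr(clean, noisy))
    print("PSNR filtered:", psnr(clean, repeated_median(noisy)))
```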

Author 1: Hussain AlSalman

Keywords: Mammogram images; image denoising; median filter; repeated median filtering; speckle noise; salt and pepper noise

PDF

Paper 79: Developing an Information Management Strategy for e-government in Saudi Arabia

Abstract: Given the current coronavirus pandemic, the role of e-government in both developed and developing countries is becoming more important than ever. This study aims to assess the development of e-government in Saudi Arabia and to compare it with that of two world e-government leaders, the USA and the Republic of Korea, during the period 2003-2020. The data analysis consists of: 1) a comparative, cross-country, longitudinal analysis of the e-government development index (EGDI) for Saudi Arabia, the USA, and the Republic of Korea; 2) a trend analysis of the online services, telecommunication infrastructure, and human capital indicators; and 3) a gap analysis to pinpoint the gap between Saudi Arabia and the USA, and between Saudi Arabia and the Republic of Korea. The results reveal a continuous rise in the ranking of Saudi Arabia's EGDI over the years. However, the findings also indicate some areas that require further improvement. An information management strategy for the support of e-government in Saudi Arabia has been developed, describing the current e-government situation and setting high-, medium-, and low-level priorities that the country needs to consider in order to improve its compliance with international e-government practices.

Author 1: Fatmah Almehmadi

Keywords: e-Government; Information technology; Information management; Trend analysis; Saudi Arabia; the USA; the Republic of Korea

PDF

Paper 80: The Automatic Agricultural Crop Maintenance System using Runway Scheduling Algorithm: Fuzzyc-LR for IoT Networks

Abstract: In this framework, crop diseases are identified using three types of methods: fuzzy-c as a clustering algorithm, a runway scheduling algorithm that trains like a classification algorithm, and logistic regression as a prediction algorithm. These techniques offer meaningful solutions to losses in yield and in the quantity of agricultural production. In this work, crop diseases and the corresponding fertilizers are predicted based on pattern scalability by the above algorithms. A Sensor Calibration and Feedback Method (SCFM) with RWSA is proposed for better agricultural crop maintenance with automation, while fuzzy-c and logistic regression help to study the crop datasets for classifying the disease. This research tries to identify leaf color, leaf size, plant disease, and the fertilizer for the illness of crops. In this context, RWSA-Agriculture provides a solution to the current problems and improves the F1-score. The data collected from local sensors and a remote station are evaluated against the dataset; these sensor-based LR and fuzzy-c components control the disease prediction system in SCFM and RWSA. The technique accurately regulates the dispensing of water, chemicals, and fertilizers to monitor crops and prevent crop diseases. This investigation reports performance metric values of PSNR = 44.18 dB, SSIM = 0.9943, BPP = 1.46, Tp = 0.945, and CR = 5.25.

Author 1: G. Balakrishna
Author 2: Nageswara Rao Moparthi

Keywords: Runway Scheduling Algorithm (RWSA); Sensor Calibration and Feedback Method (SCFM); IoT; fuzzy-c; logistic regression (LR)

PDF

Paper 81: Legal Requirements towards Enhancing the Security of Medical Devices

Abstract: Over 25 million Americans are dependent on medical devices. However, patients who need these devices have only two choices: to use an insecure, life-critical device, or to live without the support of a medical device and face the threats presented by the disease. This study therefore conducted a state-of-the-art review of security requirements concerning medical devices in the US and EU. The Food, Drug, and Cosmetic Act, HIPAA, the Medical Device Regulations of the EU, and the GDPR were among the regulations identified for controlling the security of these devices. Statutory laws such as the Computer Fraud and Abuse Act (CFAA), the Anti-Tampering Act, and the Penal Code, as well as battery and trespass to chattel in civil law, were also identified. In analyzing the security requirements, there is little motivation for criminal charges against cyber criminals in addressing these security issues, because it is often challenging to identify the culprits in medical device hacks. It is also difficult to hold device manufacturers liable for negligence of duty, especially after a device has been approved or if the harm to the patient was the result of a cyber attacker. Suggestions are provided to improve the regulations so that both the regulatory bodies and medical device manufacturers (MDM) can improve their security-conscious care.

Author 1: Prosper K. Yeng
Author 2: Stephen D. Wulthusen
Author 3: Bian Yang

Keywords: Information security; medical device; legal require-ment; healthcare; privacy

PDF

Paper 82: Fine-Tuning Pre-Trained Convolutional Neural Networks for Women Common Cancer Classification using RNA-Seq Gene Expression

Abstract: Most recent cancer classification methods use gene expression profiles as features because they can provide very important information regarding tumor characteristics. Motivated by its success in the computer vision area, deep learning has now been successfully applied to medical data because it can read non-linear patterns in complex features and allows leveraging information from unlabeled data of problems that do not belong to the problem being handled. In this paper, we apply transfer learning, which refers to the use of a model trained on one task to perform classification on another task, to classify five cancer types that most commonly affect women. We used VGG16, Xception, DenseNet, and ResNet50 as base models and then added a dense layer to reflect our five-class classification problem. To avoid over-fitting during training, which can result in a very high training accuracy and a low cross-validation accuracy, we used L2 regularization. We retrained (fine-tuned) these models using a five-fold cross-validation approach on RNA-Seq gene expression data after transforming it into 2D image-like data. We used the softmax activation function in the prediction dense layer and Adam as the optimizer in the model fitting for all four architectures. The highest performance is obtained when fine-tuning the Xception architecture, which achieved a classification accuracy of 98.6%, precision of 98.6%, recall of 97.8%, and F1-score of 98% with the five-fold cross-validation training and testing approach.
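A hedged Keras sketch of the described fine-tuning setup follows: an ImageNet-pretrained Xception base, a five-class softmax head with L2 regularization, and the Adam optimizer. The input shape and the RNA-Seq-to-image transformation are assumptions; the authors' exact pipeline is not reproduced.

```python
# Sketch of a transfer-learning head on a pre-trained Xception base, as in the
# setup described above. Input shape and data pipeline are assumed placeholders.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

base = tf.keras.applications.Xception(
    weights="imagenet", include_top=False, input_shape=(96, 96, 3), pooling="avg")

outputs = layers.Dense(
    5, activation="softmax", kernel_regularizer=regularizers.l2(1e-4))(base.output)
model = tf.keras.Model(inputs=base.input, outputs=outputs)

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(...) would then be run inside a five-fold cross-validation loop.
```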

Author 1: Fadi Alharbi
Author 2: Murtada K. Elbashir
Author 3: Mohanad Mohammed
Author 4: Mohamed Elhafiz Mustafa

Keywords: Fine-tuning; RNA-Seq; gene expression

PDF

Paper 83: PlusApps: Towards a Privacy Risk Analysis for Android Plus Applications

Abstract: The Android platform leads the mobile operating system marketplace and has consequently drawn the interest of malware authors and researchers. The significant number of proposed malware detection techniques, classification models, and practical reverse engineering solutions remain insufficient and far from perfect. Also, the number of Android apps has increased significantly in recent years, as has the number of apps revealing confidential data. It is essential to investigate applications and make sure that none of them are leaking private data, and consequently a privacy leak analysis approach is needed. Therefore, this paper investigates the behavior and data leakages of plus apps with machine-learning algorithms to determine the best features for differentiating plus apps from original apps. The results of the analysis show that the SVM classifier achieves the greatest accuracy. Further investigation demonstrates that the classifier combined with a ranking algorithm that uses correlation coefficient (CorEvel) and information gain (InfGain) methods offers better precision than the other ranking algorithms. The results of this experiment show that the ranking algorithm is able to reduce the dimension of the features and produce an accuracy of 96.60%.
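The ranking-plus-SVM idea can be sketched in scikit-learn as below, using mutual information (an information-gain-style score) to rank features before an SVM. The feature matrix is a placeholder; the authors' CorEvel correlation step and their Android permission features are not reproduced.

```python
# Sketch of feature ranking followed by an SVM classifier. The CSV, columns,
# and k are hypothetical; k is assumed smaller than the number of features.
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

df = pd.read_csv("plus_app_features.csv")      # placeholder permission/behavior features
X, y = df.drop(columns=["is_plus_app"]), df["is_plus_app"]

clf = make_pipeline(SelectKBest(mutual_info_classif, k=50), SVC(kernel="rbf"))
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```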

Author 1: Abdullah J. Alzahrani

Keywords: Android security; malware detection; permission analysis; privacy risk; plus application

PDF

Paper 84: Design of a Mobile Application for the Automation of the Census Process in Peru

Abstract: This study shows that the traditional census process in Peru has many shortcomings, including data loss and the long duration of the process. To solve this problem, a mobile application was designed to automate the census process in Peru. For the development, we rely on the agile Scrum methodology and on the Balsamiq Mockup and Adobe XD tools, which help us make prototypes of the application. In the last census, many families were not registered due to lack of time or other factors, so we designed this prototype of a mobile application to help the census taker make the data recording process faster. The result obtained is the proposal of a productive approach that optimizes the census process through a mobile application: each census taker registers the data of the families being censused more quickly, and this information is sent directly to the database of the organization conducting the census, thus avoiding loss of data and saving time and money.

Author 1: Luis Alberto Romero Tuanama
Author 2: Juber Alfonso Quiroz Gutarra
Author 3: Laberiano Andrade-Arenas

Keywords: Automation; Balsamiq Mockup; census; scrum

PDF

Paper 85: Augmented Reality Electronic Glasses Prototype to Improve Vision in Older Adults

Abstract: In this article, we focus on the elderly who suffer from low vision. We seek to design augmented reality electronic glasses to help elderly people who suffer from vision problems, which limit them when performing their daily activities; this can affect their development in society and cause serious physical and emotional harm. To this end, we used a set of scientific articles that analyzed the percentage of visual impairment. Technology has demonstrated on numerous occasions that it can be a great ally for the health and well-being of the elderly. In this work, the objective is to design electronic glasses that help the elderly improve their vision. The methodology used is Design Thinking, whose phases help us to understand the problem, collect information about it, and provide a solution. The result obtained is a prototype of electronic glasses that will benefit adults who suffer from low vision. As a case study, we show the design of the mobile application and the detailed development of the prototype.

Author 1: Lilian Ocares Cunyarachi
Author 2: Alexandra Santisteban Santisteban
Author 3: Laberiano Andrade-Arenas

Keywords: Augmented reality; design thinking; electronic glasses; low vision; seniors

PDF

Paper 86: Clustering-Based Hybrid Approach for Multivariate Missing Data Imputation

Abstract: In the era of big data, a significant amount of data is produced in many application areas. However, due to various reasons, including sensor failures, communication failures, environmental disruptions, and human errors, missing values occur frequently. These missing data pose a challenge for other data mining approaches, requiring the missing data to be handled at the preprocessing stage of data mining. Several approaches for handling missing data have been proposed in the past. These approaches consider the whole dataset when making a prediction, which makes the imputation process cumbersome. This paper proposes a procedure that makes use of the local similarity structure of the dataset to perform imputation. The K-means clustering technique, along with weighted KNN, makes efficient imputation of the missing values. The results are compared against imputation by mean substitution and Fuzzy C-Means (FCM). The proposed imputation technique is shown to perform better than the other imputation procedures.
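One plausible reading of the cluster-then-impute idea is sketched below with scikit-learn: rows are grouped with K-means (after a simple mean pre-fill used only to obtain cluster labels), and a distance-weighted KNN imputer is then applied within each cluster. This is a hedged interpretation, not the authors' exact algorithm.

```python
# Sketch of clustering-based local imputation: K-means groups the rows, then a
# distance-weighted KNN imputer fills missing entries within each cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.impute import KNNImputer, SimpleImputer

def cluster_knn_impute(X, n_clusters=3, n_neighbors=5, seed=0):
    X = np.asarray(X, dtype=float)
    prefill = SimpleImputer(strategy="mean").fit_transform(X)   # only to get cluster labels
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(prefill)

    X_out = X.copy()
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        # assumes no column is entirely missing within a cluster
        imputer = KNNImputer(n_neighbors=min(n_neighbors, len(idx)), weights="distance")
        X_out[idx] = imputer.fit_transform(X[idx])
    return X_out
```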

Author 1: Aditya Dubey
Author 2: Akhtar Rasool

Keywords: Clustering; imputation; KNN; missing at random; multivariate

PDF

Paper 87: Design of a Mobile Application for the Learning of People with Down Syndrome through Interactive Games

Abstract: This research work focuses on people with Down syndrome, the most common genetic disorder worldwide. These people have cognitive and visual-motor disabilities; however, the Peruvian state does not use even 1% of the budget allocated to the educational sector on them, so, by not receiving an education, they cannot develop their skills. Therefore, a prototype of a mobile application was designed for the learning of people with Down syndrome through interactive games, implementing the Scrum methodology for the development of the application prototype and using the Troncoso method for the teaching of reading and writing, to which the teaching of visual-motor coordination was added. For the design of the prototype, the Balsamiq tool was used because it was the most appropriate, and thus the objective of developing the prototype of the application was achieved. As a result, people with Down syndrome can read or write basic words and differentiate the hemispheres of their body through unlimited attempts at the exercises in each level or type of learning. In this way, with the teaching received, these people will have a better quality of life, being able to integrate into society and be more independent when performing daily activities.

Author 1: Richard Arias-Marreros
Author 2: Keyla Nalvarte-Dionisio
Author 3: Laberiano Andrade-Arenas

Keywords: Application of games; cognitive disabilities; Down’s Syndrome; scrum methodology; Troncoso Method

PDF

Paper 88: Anti-Molestation: An IoT based Device for Women’s Self-Security System to Avoid Unlawful Activities

Abstract: Nowadays, the public, mostly women and children, face much harassment in society. Unlawful activities against women and children have been increasing significantly, and we regularly hear about eve-teasing, sexual assault cases, and attempts to molest or even kill after rape in public places or open areas. Also, many cases have gone unpunished due to insufficient evidence. In Bangladesh, the current statistics on sexual assaults and various unlawful activities are proliferating. To address these problems, in this paper we have designed an IoT-based (Internet of Things) embedded device that is able to communicate with the law enforcement agency by dialing “999” (an emergency telephone number in Bangladesh) on demand. The device contains an Arduino Pro Mini microcontroller with a GSM (Global System for Mobile communication) module and can send an SMS (short message service) with the victim's present location to the law enforcement agency and relatives via GPRS (General Packet Radio Service). The proposed device's form factor is tiny enough to be carried easily anywhere and at any time. The device features “plug and play” functionality, meaning a single button operates the entire device. The device is also cost-effective, so that people of every level can afford it at a reasonable price.

Author 1: Md. Imtiaz Hanif
Author 2: Shakil Ahmed
Author 3: Wahiduzzaman Akanda
Author 4: Shohag Barman

Keywords: Anti-rape; IoT device; smart-safety device; women safety; wearable device; GSM/GPRS

PDF

Paper 89: Non-Linear Control Strategies for Attitude Maneuvers in a CubeSat with Three Reaction Wheels

Abstract: The development of nanosatellites based on the CubeSat standard allows students and professionals to get involved in aerospace technology. In nanosatellites, attitude plays an important role, since they can be affected by various disturbances such as the gravity gradient and solar radiation. These disturbances generate a torque in the system that must be corrected in order to maintain the CubeSat's behavior. In this article, the kinematic and dynamic equations applied to a CubeSat with three reaction wheels are presented. In order to provide a solution to the attitude maneuvering problem, three robust control laws developed by Boskovic, Dando, and Chen are presented and evaluated. Furthermore, these laws are compared with a feedback control law developed by Schaub and modified to use quaternions. The simulated system was subjected to disturbances caused by a gravity gradient torque and misalignments in the reaction wheels. The effectiveness of each law is determined using the Average of the Square of the Commanded Control Torque (ASCCT), the Error Euler Angle Integration (EULERINT), the settling time, the estimated computational cost (O), and the steady-state error (ess).

Author 1: Brayan Espinoza Garcia
Author 2: Ayrton Martin Yanyachi
Author 3: Pablo Raul Yanyachi

Keywords: Attitude control; attitude maneuvers; adaptive control; feedback control; CubeSat; Quaternions; reaction wheels; comparison

PDF

Paper 90: Comparison of the CatBoost Classifier with other Machine Learning Methods

Abstract: Machine learning and data-driven techniques have become very prominent and significant in several areas in recent times. In this paper, we discuss the performance of some machine learning methods, focusing on the CatBoost classifier algorithm, on both loan approval and staff promotion tasks. We compared the algorithm's performance with other classifiers. After some feature engineering on both datasets, the CatBoost algorithm outperforms the other classifiers implemented in this paper. In the first analysis, features such as loan amount, loan type, applicant income, and loan purpose are the major factors for predicting mortgage loan approvals. In the second analysis, features such as division, foreign schooling, geopolitical zone, qualification, and working years had a high impact on staff promotion. Hence, based on the performance of CatBoost in both analyses, we recommend this algorithm for better prediction of loan approvals and staff promotion.
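For illustration, a minimal CatBoost sketch on tabular data with categorical columns declared natively is given below. The file path and column names are hypothetical placeholders, not the datasets analyzed in the paper.

```python
# Placeholder sketch of a CatBoost classifier on tabular loan-approval-style
# data, with categorical features passed to the library directly.
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("loan_applications.csv")          # placeholder dataset
X, y = df.drop(columns=["approved"]), df["approved"]
cat_cols = [c for c in X.columns if X[c].dtype == "object"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
model = CatBoostClassifier(iterations=500, learning_rate=0.1, verbose=False)
model.fit(X_tr, y_tr, cat_features=cat_cols)
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
```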

Author 1: Abdullahi A. Ibrahim
Author 2: Raheem L. Ridwan
Author 3: Muhammed M. Muhammed
Author 4: Rabiat O. Abdulaziz
Author 5: Ganiyu A. Saheed

Keywords: Machine learning algorithms; data science; Cat-Boost; loan approvals; staff promotion

PDF

Paper 91: Hindustani or Hindi vs. Urdu: A Computational Approach for the Exploration of Similarities Under Phonetic Aspects

Abstract: Semantic coexistence is a reason to adopt the language spoken by other people. In such human habitats, different languages share words, typically known as loan words, which appear not only as the principal medium for enriching a language's vocabulary but also as a means by which languages influence each other, building stronger relationships and forming multilingualism. In this context, the spoken words are usually common, but their writing scripts vary, or the language may have become a digraphia. In this paper, we present the similarities and relatedness between Hindi and Urdu (which are mutually intelligible and major languages of the Indian subcontinent). In general, the method modifies edit distance: instead of using the alphabets of the words, it uses articulatory features from the International Phonetic Alphabet (IPA) to obtain a phonetic edit distance. The paper also shows results for the consonants of the languages under this method, which quantifies the evidence that Urdu and Hindi are 67.8% similar on average despite the script differences.
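To make the idea of a feature-based edit distance concrete, here is a toy sketch: the usual dynamic-programming edit distance, but with the substitution cost between two phones given by the fraction of mismatched articulatory features rather than a flat 0/1. The feature table contains a few illustrative entries only, not a real IPA inventory or the authors' feature set.

```python
# Toy sketch of a phonetic edit distance with feature-based substitution cost.
FEATURES = {                      # phone -> set of articulatory features (toy values)
    "p": {"bilabial", "plosive", "voiceless"},
    "b": {"bilabial", "plosive", "voiced"},
    "t": {"alveolar", "plosive", "voiceless"},
    "d": {"alveolar", "plosive", "voiced"},
}

def sub_cost(a, b):
    fa, fb = FEATURES[a], FEATURES[b]
    return len(fa ^ fb) / len(fa | fb)      # 0 if identical, 1 if nothing shared

def phonetic_edit_distance(s, t):
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i
    for j in range(1, n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,                       # deletion
                          d[i][j - 1] + 1,                       # insertion
                          d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]))
    return d[m][n]

print(phonetic_edit_distance(["p", "t"], ["b", "d"]))   # 1.0: both pairs differ only in voicing
```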

Author 1: Muhammad Suffian Nizami
Author 2: Tafseer Ahmed
Author 3: Muhammad Yaseen Khan

Keywords: Lexical Similarity; Urdu; Hindi; Edit Distance; Phonetics; Natural Language Processing; Computational Linguistics

PDF

Paper 92: A Novel Band Selection Approach for Hyperspectral Image Classification using the Kolmogorov Variational Distance

Abstract: In this paper, we introduce a novel band selection approach based on the Kolmogorov Variational Distance (KoVD) for hyperspectral image classification. The main reason we take interest in KoVD is its unique relation to the classification error. Our previous works on band selection using Mutual Information (MI), the Divergence Distance (DD), and the Bhattacharyya Distance (BD) inspire this study; thus, we are particularly interested in finding out how KoVD performs against these distances in terms of the number of bands retained and the classification accuracy. All the distances in this study are modeled with the Gaussian Mixture Model (GMM) using the Bayes Information Criterion (BIC) / Robust Expectation-Maximization (REM). The experiments are carried out on four benchmark hyperspectral images: Kennedy Space Center, Salinas, Botswana, and Indian Pines (92AV3C). The results show that band selection based on the Kolmogorov Variational Distance performs better than BD and DD, while against MI the results were very close.

Author 1: Mohammed LAHLIMI
Author 2: Mounir Ait KERROUM
Author 3: Youssef FAKHRI

Keywords: Band selection; Bayes Information Criterion (BIC); Bhattacharyya Distance; divergence distance; hyperspectral imaging; Kolmogorov Variational Distance; Gaussian Mixture Model (GMM); Robust Expectation Maximization (REM); remote sensing

PDF

Paper 93: Data Augmentation using Generative Adversarial Network for Gastrointestinal Parasite Microscopy Image Classification

Abstract: Gastrointestinal parasitic diseases represent a latent problem in developing countries, and it is necessary to create support tools for the medical diagnosis of these diseases. Tasks such as the classification of microscope samples of the causative parasites need to be automated using methods like deep learning. However, these methods require large amounts of data, and currently collecting these images is a complex procedure that requires significant resources and long periods of time. Therefore, it is necessary to propose a computational solution to this problem. In this work, an approach for generating sets of synthetic images of 8 species of parasites is presented, using Deep Convolutional Generative Adversarial Networks (DCGAN). In addition, looking for better results, image enhancement techniques were applied. These synthetic datasets (SD) were evaluated in a series of combinations with the real datasets (RD) on the classification task, where the highest accuracy was obtained with the pre-trained ResNet50 model (99.2%), showing that augmenting the RD with SD obtained from a DCGAN helps to achieve greater accuracy.

Author 1: Mila Yoselyn Pacompia Machaca
Author 2: Milagros Lizet Mayta Rosas
Author 3: Eveling Castro-Gutierrez
Author 4: Henry Abraham Talavera Diaz
Author 5: Victor Luis Vasquez Huerta

Keywords: Generative Adversarial Network (GAN); Deep Convolutional Generative Adversarial Network (DCGAN); gastrointestinal parasites; classification; deep learning

PDF

Paper 94: Comparative Analysis of Threat Modeling Methods for Cloud Computing towards Healthcare Security Practice

Abstract: Healthcare organizations carry out unique activities, including collaborating on patient care and emergency care. The sector also accumulates highly sensitive, multifaceted patient data such as text reports, radiology images, and pathological slides. The large volume of data is often stored as Electronic Health Records (EHR), which must be frequently updated while ensuring a high percentage of up-time for constant availability of patients' records. Healthcare, as a critical infrastructure, also needs highly skilled IT personnel, Information and Communication Technology (ICT) infrastructure, and a regular maintenance culture. Fortunately, cloud computing can provide these necessary services at a lower cost. But with all these enormous benefits, cloud computing is characterized by various information security issues which are not enticing to healthcare. Among the many threat modelling methods, which are suitable for identifying cloud-related threats towards the adoption of cloud computing for healthcare? This paper compares threat modelling methods to determine their suitability for identifying and managing healthcare-related threats in cloud computing. Threat modelling in pervasive computing (TMP) was identified as suitable and can be combined with Attack Trees (AT), Attack Graphs (AG), and Practical Threat Analysis (PTA) or STRIDE (spoofing, tampering, repudiation, information disclosure, denial of service, and elevation of privilege). Also, Attack Trees (AT) could be complemented with TMP, AG, and STRIDE or PTA. Healthcare IT security professionals can hence rely on these methods in their security practices to identify cloud-related threats for healthcare. Essentially, privacy-related threat modelling methods, such as the LINDDUN framework, need to be included in this synergy of cloud-related threat modelling methods to enhance security and privacy for healthcare needs.

Author 1: Prosper K. Yeng
Author 2: Stephen D. Wulthusen
Author 3: Bian Yang

Keywords: Cloud computing; healthcare; threat modelling; security practice; data privacy

PDF

Paper 95: Dense Dilated Inception Network for Medical Image Segmentation

Abstract: In recent years, various encoder-decoder-based U-Net architectures have shown remarkable performance in medical image segmentation. However, these encoder-decoder U-Nets have a drawback in learning multi-scale features in complex segmentation tasks and a weak ability to generalize to other tasks. This paper proposes a generalized encoder-decoder model called the dense dilated inception network (DDI-Net) for medical image segmentation, obtained by modifying the U-Net architecture. We take three steps: first, we propose a dense path to replace the skip connection in the middle of the encoder and decoder to make the model deeper. Second, we replace U-Net's basic convolution blocks with a modified inception module called the multi-scale dilated inception module (MDI) to make the model wider without vanishing gradients and with fewer parameters. Third, data augmentation and normalization are applied to the training data to improve model generalization. We evaluated the proposed model on three subtasks of the Medical Segmentation Decathlon challenge. The experimental results show that DDI-Net achieves superior performance compared to the other methods, with Dice scores of 0.82, 0.68, and 0.79 in brain tumor segmentation for edema, non-enhancing, and enhancing tumor. For hippocampus segmentation, the results reach 0.92 and 0.90 for anterior and posterior, respectively. For heart segmentation, the method achieves 0.95 for the left atrium.
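For reference, the Dice score used to report these results can be computed for a binary segmentation mask as in the short sketch below; the example masks are illustrative only.

```python
# Sketch of the Dice score between a binary prediction mask and a binary
# ground-truth mask, as used to report the segmentation results above.
import numpy as np

def dice_score(pred, target, eps=1e-7):
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Example: two overlapping one-row strips
a = np.zeros((4, 4), dtype=bool); a[1, :] = True
b = np.zeros((4, 4), dtype=bool); b[1, :3] = True
print(dice_score(a, b))   # 2*3 / (4 + 3) ≈ 0.857
```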

Author 1: Surayya Ado Bala
Author 2: Shri Kant

Keywords: Deep learning; Dense-Net; inception network; medical image segmentation; U-Net

PDF

Paper 96: Phishing Image Spam Classification Research Trends: Survey and Open Issues

Abstract: A phishing email is an attack that focuses completely on people in order to circumvent existing traditional security algorithms. Email appears to be a dependable, appropriate, and solid communication medium for internet users. At present, email is flooded with spam content, both in text-based form and as undesired text planted inside images. This study reviews articles on phishing image spam classification published from 2006 to 2020, based on spam classification application domains, datasets, feature sets, spam classification methods, and the measurement metrics adopted in the existing studies. More than 50 articles from both the Web of Science and Scopus databases were selected. To achieve the study's target, we carried out a broad survey and analysis to identify the domains where spam classification has been applied. Furthermore, several public datasets, feature sets, classification methods, and measurement metrics are reviewed and the popular ones are pinpointed. The study revealed that the Personal Collection, Dredze, and Spam Archives datasets are the most commonly used datasets in image spam classification research. Low-level features and image metadata are the most widely used feature sets. The methods of image spam classification identified in this study are supervised machine learning, unsupervised machine learning, semi-supervised machine learning, content-based methods, and statistical learning. Among these methods, the most commonly utilized is the Support Vector Machine (SVM), which falls under supervised machine learning, followed by Naïve Bayes and K-Nearest Neighbor. The metrics commonly adopted for the performance evaluation of existing image spam classifiers are also identified and briefly discussed. We compared the performance of state-of-the-art image spam models. Lastly, we point out promising directions for future research.

Author 1: Ovye John Abari
Author 2: Nor Fazlida Mohd Sani
Author 3: Fatimah Khalid
Author 4: Mohd Yunus Bin Sharum
Author 5: Noor Afiza Mohd Ariffin

Keywords: Phishing; spam; image spam classification; machine learning; deep learning

PDF
