Generalized Epileptic Seizure Prediction using Machine Learning Method

Abstract: In recent years, identifying epileptic seizures from electroencephalography (EEG) signals has become a routine procedure for diagnosing epilepsy. Manual identification of epileptic seizures by expert neurologists is labor-intensive, time-consuming, and error-prone, so efficient, computerized detection of epileptic seizures is required. The disordered brain function that causes epileptic seizures can affect a patient's condition. Epileptic seizures can be prevented by medication with great success if they are predicted before they start. EEG signals are used to predict epileptic seizures with machine learning algorithms and complex computational methods. Two significant challenges that affect both the anticipation time and the true positive prediction rate are feature extraction from EEG signals and noise removal from EEG signals. We therefore propose a model that offers reliable preprocessing and feature extraction techniques. To automatically identify epileptic seizures, frequency-based features were extracted from the EEG signal and a variety of ensemble learning-based classifiers were applied. Our algorithm offers a higher true positive rate and predicts epileptic episodes sufficiently early before they begin. On the scalp EEG CHB-MIT dataset of 24 subjects, the proposed framework detects the beginning of the preictal state, the state that occurs a few minutes before seizure onset, yielding a higher true positive rate (91%) than conventional methods, an optimum prediction time of 33 minutes, and an average prediction time of 23 minutes and 36 seconds. According to the experimental findings, the maximum accuracy, sensitivity, and specificity in this research were 91%, 98%, and 84%, respectively.


I. INTRODUCTION
Epilepsy is a group of neurological disorders that can affect people of any age and is defined by a persistent tendency to produce repeated seizures. The progressive neurobiological process known as "epileptogenesis" causes epilepsy [1]. Abnormal synchronized electrical activity of brain neurons is the primary cause of epilepsy, a chronic, non-communicable condition [2,3]. Epilepsy is among the oldest and most prevalent neurological conditions in the world [4,5]. Based on a World Health Organization (WHO) report from June 2019, it is the third most prevalent neurological condition, affecting 50 million individuals worldwide [6]-[10]. Epilepsy is a disorder of the brain characterized by recurrent seizures. Typically, a seizure is described as a sudden change in behavior caused by an abnormal disturbance in the electrical activity of the human brain [11]. The brain continuously produces minute electrical impulses that form a consistent pattern. Neurotransmitters are the chemical messengers that carry these signals between neurons and across neural networks in the brain and throughout the body [12]. Fig. 1 illustrates how epilepsy unbalances the brain's electrical rhythms and causes recurring seizures. Individuals having seizures experience synchronized bursts of electrical energy that may alter their cognition, movements, or perceptions and disturb the regular electrical pattern of the brain for a period. The main symptoms of epilepsy are varied and complex because the location where aberrant electrical activity begins and the way it propagates through the brain both vary [13]. Recurrent seizures can have a long-lasting, severe impact on a patient's psychological and cognitive abilities and pose a serious risk to their life [14]. Research into the diagnosis and treatment of epilepsy therefore has major therapeutic implications.
Epileptic seizures can be prevented by medication if they are predicted early enough to leave ample time before they occur. An epileptic seizure involves four distinct states. The first is the prodromal (preictal) state, which emerges before the beginning of the seizure. The second, the ictal state, starts at the exact onset of the seizure and ends when the seizure ends. The third, the postictal state, follows the end of the ictal state. The last, the interictal state, begins after the postictal state of one seizure and ends before the start of the preictal state of the subsequent seizure. The various input conditions for three distinct channels are depicted in Fig. 2. The onset of the preictal state can therefore be used to anticipate seizures [15]. The remainder of the paper is structured as follows: the types and symptoms of epileptic seizures are covered in Section II, the background of epileptic seizure prediction in Section III, and the proposed technique in Section IV. The experimental results are reported in Section V. Section VI concludes the paper and discusses future work.

II. TYPES AND SYMPTOMS OF EPILEPTIC SEIZURE
Neurologically, epilepsy is characterized by abnormal brain activity that results in seizures, leading to unusual behavior, emotional sensations, and often a total loss of consciousness [16]. An epilepsy diagnosis is typically made when a person experiences at least 2 seizures that are not related to another established medical problem, such as opiate withdrawal or exceptionally low blood sugar [17]. In the early phases, the part of the brain from which the seizure originates suffers a disturbance in its functions. The right side of the body is governed by the left half of the brain, while the left side of the body is governed by the right half of the brain. Doctors typically classify a seizure as either generalized or focal depending on where and how the abnormal brain activity starts [18]. Focal seizures are caused by aberrant activity in a specific part of the brain, while generalized seizures appear to involve the entire brain [19]. Neurologists have divided seizures into two main groups, partial and generalized, depending on the signs, as depicted in Fig. 3 [20,21]. A partial seizure, which is mostly brought on by damage to a cerebral hemisphere, can be characterized by its symptoms. There are two basic categories of partial seizures: simple-partial and complex-partial. In simple-partial seizures, the person appears conscious and can typically speak, whereas in complex-partial seizures, patients behave erratically, become disoriented, and frequently mumble and chew. Generalized seizures likewise fall into two main categories. Convulsive seizures can be identified by their clear motor symptoms, whereas non-convulsive seizures are challenging to detect because they lack motor signs; the person is unable to move or speak and can only stare [22,23]. Fig. 4 illustrates the wide range of seizure signs.
During a seizure, some patients simply stare blankly for a period, while others repeatedly jerk their arms or legs. A single seizure does not necessarily indicate epilepsy. For a diagnosis of epilepsy, at least 2 unprovoked seizures (seizures with no known cause) must occur at least 24 hours apart [24,25]. Because epilepsy is caused by aberrant brain activity, seizures can disturb any process coordinated by the brain. Specific symptoms determine the epilepsy type. Some of the sensations mentioned below are experienced only occasionally, while others are consistent. Most of the time, an individual with epilepsy experiences the same type of seizure each time. Seizure signs and symptoms may include brief confusion [26].

III. RELATED WORK
Early studies on epilepsy prediction were conducted in the 1970s using linear feature extraction methods [27]. In the 1980s, the advent of non-linear methods allowed researchers to apply them to feature extraction, in keeping with the non-linear character of EEG signals [28,29]. The use of the pre-ictal phase for epilepsy identification was also introduced in this decade, with the characterization of the EEG patterns associated with epilepsy, including preictal, ictal, and interictal patterns. Salant et al. performed early epileptic seizure (ES) prediction almost 6 seconds before seizure onset in 1998 [30], and Drogenlen et al. expanded on this work in 2003 [31]. They employed a feature called Kolmogorov entropy to forecast epilepsy 2 to 40 minutes before onset. The first international workshop on epilepsy forecasting took place in 2002, and several epilepsy centers contributed a database of multi-day EEG recordings. This database later became the subject of other investigations [32]. Mormann et al. discovered in 2003 that the phase synchronization of various EEG channels diminishes before seizure onset [33], building on the theory that hyper-synchronous discharge of the brain's neurons causes ES.

During the first decade of the current century, studies on large EEG datasets cast doubt on the accuracy of the measures computed in the previous century. Some researchers found that the earlier findings were based on a small number of carefully selected recordings and could not be reproduced on large amounts of unseen data. At international workshops on the subject, it was decided to hold seizure prediction competitions. These contests were designed to make it easier to compare the effectiveness of algorithms trained on the same dataset [34,35]. The inaugural seizure prediction competition, held in 2007, was organized jointly by the International Workshop on Seizure Prediction 3 (IWSP3) and the International Workshop on Seizure Prediction 4 (IWSP4). Participants in both events received continuous intracranial EEG (iEEG) recordings from 3 epilepsy patients. The results obtained by the algorithms, however, fell short of expectations.
The 2014 American Epilepsy Society Seizure Forecasting Trial used long-term iEEG recordings from epileptic canines as well as short-term human iEEG, containing 942 seizures acquired over more than 500 days. Each contestant received the same training and testing data, in segments lasting 10 minutes. The evaluation metric was the area under the curve (AUC). Another competition, run by Melbourne University with a similar format, comprised long-term iEEG recordings with 1139 seizures [36]. Any algorithm that estimated fundamental properties of EEG signals for epilepsy prediction, or any machine learning algorithm based on these properties, was eligible for the competition. Even so, the ideal features or techniques remain unknown. Participants entered excessively complex algorithms in the competitions, so it is difficult to determine which feature or ML method performed better. A novel solution presented by Maturana et al. [37] may be effective for a variety of patients. They determined that critical slowing of neural activity serves as an indicator for ES prediction. Fig. 5 shows a timeline of the evolution of EEG data measurements. Readers interested in the background of these advances should consult [38] for additional details. Fig. 6, from Natu et al. [39], summarizes the development of technology for epileptic seizure detection.

Since the turn of the century, scientists have been trying to overcome the difficulties in diagnosing and predicting epilepsy. ES forecasting research initially focused on the evaluation of EEG recordings, because EEG data are an important source for observing brain function before, during, and after a seizure. Eye movements, blinks, cardiac signals, and muscle noise contaminate EEG signals. To lessen the impact of these sources of interference and distortion, a variety of filtering and noise reduction techniques are employed [40].
Once artifacts have been removed, informative features are required to develop machine learning models for classifying and identifying the interictal and pre-ictal phases. Fig. 7 illustrates the traditional machine learning approach for epilepsy forecasting and highlights the key distinction between machine learning and deep learning methods.

A. Signal Processing
One important step in the analysis of raw biological signals is the identification of noise and artifacts. These artifacts must be filtered out to lessen their impact on feature extraction. A variety of filters have been used for this purpose, including wavelet, band-pass, finite impulse response (FIR), and adaptive filters. Such processing also standardizes the data so that they can be compared with the records of other patients.
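To make the filtering step concrete, the following is a minimal sketch of a band-pass FIR filter built with the windowed-sinc method in NumPy. The 0.5 to 40 Hz pass band, the 257-tap length, the synthetic test signal, and the function names `lowpass_kernel` and `bandpass_fir` are illustrative assumptions; the text does not specify the filter settings used in the reviewed studies.

```python
import numpy as np

def lowpass_kernel(cutoff_hz, fs, numtaps):
    """Hamming-windowed sinc low-pass kernel with unit gain at 0 Hz."""
    n = np.arange(numtaps) - (numtaps - 1) / 2.0
    h = np.sinc(2.0 * cutoff_hz / fs * n) * np.hamming(numtaps)
    return h / h.sum()

def bandpass_fir(x, fs, low_hz, high_hz, numtaps=257):
    """Band-pass = low-pass(high_hz) minus low-pass(low_hz)."""
    h = lowpass_kernel(high_hz, fs, numtaps) - lowpass_kernel(low_hz, fs, numtaps)
    # mode="same" with an odd, symmetric kernel keeps the output time-aligned.
    return np.convolve(x, h, mode="same")

# Synthetic example (assumed values): a 10 Hz rhythm, 60 Hz interference,
# and a constant offset, sampled at 256 Hz for 8 seconds.
fs = 256.0
t = np.arange(0, 8.0, 1.0 / fs)
raw = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 60 * t) + 5.0
filtered = bandpass_fir(raw, fs, low_hz=0.5, high_hz=40.0)

# Inspect the central 4 seconds, away from the edge transients.
mid = filtered[512:1536]
amplitude = 2 * np.abs(np.fft.rfft(mid)) / len(mid)
```

In this sketch the 10 Hz component passes almost unchanged while the 60 Hz component and the offset are strongly attenuated; the first and last 128 samples are affected by zero padding and should be discarded or handled separately.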

B. Feature Extraction and Collection
Reliable features are a requirement for all prediction models. According to the number of EEG channels involved, these features can be divided into univariate (measures computed on each EEG channel independently) and multivariate (measures computed on two or more EEG channels) categories. Numerous techniques recommended in the literature have been used for EEG analysis. As shown in Fig. 8, these methods are broadly divided into 4 categories: frequency domain, time domain, nonlinear approaches, and time-frequency domain.
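The distinction between single-channel and multi-channel features can be illustrated with a short NumPy sketch. The synthetic 4-channel array and the choice of mean, standard deviation, and Pearson correlation are assumptions made for illustration only; they are not taken from the studies cited above.

```python
import numpy as np

# Synthetic stand-in for a multi-channel EEG segment: 4 channels x 1024 samples.
rng = np.random.default_rng(0)
n_channels, n_samples = 4, 1024
eeg = rng.normal(size=(n_channels, n_samples))

# Make channels 0 and 1 share a common component so they are correlated.
shared = rng.normal(size=n_samples)
eeg[0] += 2.0 * shared
eeg[1] += 2.0 * shared

# Single-channel features: one value per channel, computed independently.
per_channel = {
    "mean": eeg.mean(axis=1),
    "std": eeg.std(axis=1),
}

# Multi-channel feature: pairwise correlation between channels.
cross_channel = np.corrcoef(eeg)
```

Here `per_channel["std"]` has one entry per channel, while `cross_channel` is a 4 x 4 matrix whose off-diagonal entries describe the relationship between pairs of channels.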

C. Classification
Artificial neural networks (ANN), fuzzy logic, k-means clustering, support vector machines (SVM), and decision trees are used to identify epileptic seizures from the provided EEG data. Most of the time, feature values are compared against thresholds to draw inferences. The primary goal of this research is to use machine learning algorithms to classify EEG signals as epileptic signals (preictal phase) or non-epileptic signals for the diagnosis of epilepsy. In this work, epileptic signals are those centered on ictal discharges, whereas non-epileptic signals consist of both normal and pathological inter-ictal discharges. The technique used to accomplish this is as follows:

1) Normalize the EEG signal and extract the signal segments.
2) Extract statistical features to generate a feature set.
3) Apply wavelet decomposition to the signal to break it down.
4) Use k-means clustering to reduce the number of features in the feature set and thereby the runtime.
5) Train the support vector machine on the reduced feature set.
6) On a test data set, compare how well the SVMs based on the full and the reduced feature sets separate epileptic from non-epileptic signals.

Fig. 9. Machine learning proposed model.

www.ijacsa.thesai.org

Fig. 9 displays the block diagram of the proposed method. The four primary stages of this approach are the collection of EEG datasets, the pre-processing of these signals, feature extraction, and classification. These stages are explained in detail below.
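The text does not say how k-means is used to shrink the feature set in step 4. One plausible reading, sketched below under that assumption, is to cluster the feature columns and keep a single representative feature per cluster. The function name `kmeans_select_features`, the farthest-first initialization, and the synthetic data are all hypothetical choices for this illustration.

```python
import numpy as np

def kmeans_select_features(X, k, n_iter=50):
    """Cluster the columns of X (samples x features) and return the index
    of the feature closest to each cluster center."""
    # Standardize each feature so clustering reflects shape, not scale.
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    P = Z.T  # one row per feature

    # Deterministic farthest-first initialization of the k centers.
    centers = [P[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(P - c, axis=1) for c in centers], axis=0)
        centers.append(P[np.argmax(d)])
    centers = np.array(centers)

    # Standard Lloyd iterations: assign, then recompute centers.
    for _ in range(n_iter):
        dist = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            P[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Keep the feature nearest to each final center.
    dist = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
    return sorted(set(int(i) for i in dist.argmin(axis=0)))

# Synthetic example: 6 features forming 3 pairs of near-duplicates.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X = np.repeat(base, 2, axis=1) + 0.05 * rng.normal(size=(200, 6))
selected = kmeans_select_features(X, k=3)
X_reduced = X[:, selected]
```

With redundant features grouped in this way, the reduced matrix `X_reduced` keeps one column from each pair and could then be passed to the classifier in step 5.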

D. EEG Dataset Collection
The CHB-MIT database of EEG recordings was used in this investigation. All signals were recorded at Boston Children's Hospital and made publicly accessible. Many recordings last an hour, but others run for two or four hours. The recordings are grouped into 24 cases and stored in the EDF file format, with each EDF file representing one EEG recording. The CHB-MIT dataset includes 686 EEG recordings from 23 people ranging in age from 1.5 to 17 years. Each participant is represented by several EEG signals from various channels, and the sampling frequency of the dataset is 256 Hz. Cases chb01 and chb21 in this database come from the same individual, recorded 1.5 years apart. Information from the CHB-MIT dataset is shown in Table I (see also Fig. 11 [41]). Dataset information from EEGLAB running in MATLAB is shown in Fig. 12.

E. Pre-Processing of EEG Signals
Preprocessing is the procedure of transforming raw data into a format that is more suitable for further analysis and interpretable for the user. In the case of EEG data, preprocessing usually refers to removing noise from the data to get closer to the true neural signals.
There are several reasons why preprocessing of EEG data is necessary. First, the signals picked up from the scalp are not necessarily an accurate representation of the signals originating in the brain, as spatial information is lost. Second, EEG data tend to contain a lot of noise, which can obscure weaker EEG signals. Artifacts such as blinking or muscle movement can contaminate the data and distort the picture. Finally, we want to separate the relevant neural signals from the random neural activity that occurs during EEG recordings. Fig. 13 represents the preprocessing pipeline followed in this research.

Fig. 13. Preprocessing pipeline.
After following the preprocessing pipeline, we reduce the 23 channels to 8 channels. The results for these 8 channels are shown in Fig. 14. The dataset with these settings is shown in Fig. 15. The dataset is saved with the '.set' file extension for the next step, feature extraction.

F. Feature Extraction
At this stage, feature extraction was used to remove redundant information from the EEG signals and to collect the abstract information required for the classification procedure. When analyzing signals using the wavelet transform, it is crucial to choose the right wavelet and the right number of decomposition levels. The number of decomposition levels is determined from the signal's dominant frequency components and is selected so that the wavelet coefficients preserve the frequencies necessary for identifying the signal. MATLAB was used to calculate the wavelet coefficients. In this study, we extract several features, including the fast Fourier transform, wavelet transforms, mean, and standard deviation for the alpha, beta, theta, delta, and gamma frequency bands, as shown in Fig. 16. Twelve feature values were extracted and saved as '.mat' files for the normal and epilepsy classes, as shown in Fig. 17.
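As a rough illustration of the band-wise FFT statistics described above, the sketch below computes the mean and standard deviation of the magnitude spectrum inside each classical EEG band. The band edges, the function name `band_features`, and the synthetic 10 Hz test signal are assumptions; the paper's features were computed in MATLAB, and this sketch yields 10 values rather than the 12 reported, since the remaining features are not specified in the text.

```python
import numpy as np

# Conventional band edges in Hz (assumed; not stated in the text).
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}

def band_features(x, fs=256.0):
    """Mean and standard deviation of the FFT magnitude in each band."""
    spectrum = np.abs(np.fft.rfft(x)) / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    feats = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        feats[name + "_mean"] = float(spectrum[mask].mean())
        feats[name + "_std"] = float(spectrum[mask].std())
    return feats

# Synthetic single-channel example: a pure 10 Hz (alpha-band) oscillation.
fs = 256.0
t = np.arange(0, 4.0, 1.0 / fs)
x = np.sin(2 * np.pi * 10.0 * t)
feats = band_features(x, fs)
dominant_band = max(BANDS, key=lambda b: feats[b + "_mean"])
```

For this test signal the alpha band carries essentially all of the spectral magnitude, so `dominant_band` identifies it; on real recordings the same function would be applied per channel and per segment.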

G. Classification
The goal of the model was to determine the most effective dimensionality reduction method that, when combined with SVM, would provide the highest sensitivity and validity when classifying data as either epileptic or not. In a high-dimensional space, the support vector machine (SVM) constructs a hyperplane or a set of hyperplanes that can be used for classification. SVM has been shown to be a useful supervised model, grounded in statistical learning theory, with high generalization ability. The principle underlying SVM is the separation of two data sets, which can be linear or non-linear. In the case of linear separation, SVM uses a discriminant hyperplane to distinguish the classes. In the case of non-linear separation, SVM uses a kernel function to identify the decision boundaries. Compared with other supervised algorithms, such as ANNs [42,43] and KNN, the computational complexity of SVM is low [44]-[46].
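For reference, the linear and kernel decision rules described above can be written in their standard textbook form. The symbols $\mathbf{w}$, $b$, $\alpha_i$, $y_i$, $K$, and $\gamma$ are conventional notation that does not appear in the original text, and the RBF kernel is shown only as a common example; the paper does not state which kernel was used.

```latex
% Linear separation: a single discriminant hyperplane (w, b).
\[
  f(\mathbf{x}) = \mathrm{sign}\left( \mathbf{w}^{\top}\mathbf{x} + b \right)
\]

% Non-linear separation: the inner product is replaced by a kernel K,
% summed over the N training samples (x_i, y_i) with y_i in {-1, +1}.
\[
  f(\mathbf{x}) = \mathrm{sign}\left( \sum_{i=1}^{N} \alpha_i \, y_i \,
      K(\mathbf{x}_i, \mathbf{x}) + b \right)
\]

% Example kernel (assumed for illustration): radial basis function.
\[
  K(\mathbf{x}_i, \mathbf{x}) = \exp\left( -\gamma \,
      \lVert \mathbf{x}_i - \mathbf{x} \rVert^{2} \right), \qquad \gamma > 0
\]
```

Only the training samples with $\alpha_i > 0$ (the support vectors) contribute to the sum, which is one reason the evaluation cost of a trained SVM can stay modest.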
In this study, data from 23 subjects are used for training each time, and data from the remaining subject are used for testing. We explore multiple training, validation, and testing divisions of the dataset to see their effect on the performance achieved on these subsets. As the share of training data increases relative to testing data, accuracy and sensitivity improve. In our experiments, a train-validation-test ratio of 70%-20%-10% is followed. With this ratio, training ran for a total of 50 epochs. Fig. 18 shows the epileptic seizure detection training, validation, and testing for the 24 patients.
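The subject-wise scheme in the first sentence (train on 23 subjects, test on the remaining one) can be sketched as a leave-one-subject-out split. The generator name `leave_one_subject_out`, the case labels, and the three segments per case are hypothetical; the sketch covers only this scheme and not the 70%-20%-10% split also mentioned.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx

# Hypothetical example: 24 cases with three feature segments each.
subject_ids = [f"chb{n:02d}" for n in range(1, 25) for _ in range(3)]
folds = list(leave_one_subject_out(subject_ids))
```

Each fold keeps every segment of the held-out case out of training, which avoids leaking subject-specific patterns into the test set.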

VI. RESULTS
The clinical use of ES prediction methods requires adequate performance and quality checks, and the evaluation metrics used for this purpose are discussed in this section. Our end goal is to classify data into two classes, non-seizure and seizure. To measure the performance of the proposed method, a confusion matrix, shown in Fig. 19, is obtained. In this matrix, TP represents true positive (epileptic region predicted as epileptic), TN represents true negative (non-epileptic region predicted as non-epileptic), FP represents false positive (non-epileptic region predicted as epileptic), and FN represents false negative (epileptic region predicted as non-epileptic).
One main challenge in classifying seizure data is the imbalance of the dataset. Seizures (and therefore preictal data) occur infrequently, so the interictal class is much larger than the preictal class. This may lead to naive classification, in which the classifier labels all the data as interictal, completely ignores the other class, and still reports good accuracy. To avoid this, we propose a few safeguards. First, we do not rely on accuracy alone to choose the best classifier; sensitivity and specificity are more informative. Here, accuracy is the correct classification rate, sensitivity is the proportion of epileptic regions that are correctly classified, and specificity is the proportion of non-epileptic regions that are correctly classified. Sensitivity is defined as the ratio of the number of true positives (TP) to the sum of the numbers of true positives and false negatives (FN). A true positive is the detection of a seizure in a segment that experts also identify as a seizure segment, whereas a false negative is a segment that experts identify as a seizure segment but that the algorithm does not classify as such. Specificity is defined as the ratio of the number of true negatives (TN) to the sum of the numbers of true negatives and false positives (FP). A true negative is the detection of a non-seizure segment that experts also identify as a non-seizure segment, whereas a false positive is a segment that the algorithm classifies as a seizure segment but that experts identify as a non-seizure segment. Accuracy is defined as the ratio of the sum of TP and TN to the sum of TP, TN, FP, and FN. For all three measures, a higher value indicates better performance. Classification results are shown in Table II.

TABLE II. CLASSIFICATION RESULTS

In Fig. 19, out of 100 benign cases, 98.0% are correctly predicted as benign and 2.0% are predicted as malignant. Out of 100 malignant cases, 84.0% are correctly classified as malignant and 16.0% are classified as benign. Overall, 91.0% of the predictions are correct and 9.0% are wrong. The performance of the proposed approach is evaluated through its specificity (SP), sensitivity (SN), and accuracy (AC). As defined above, sensitivity is the ratio of the number of true positives to the sum of the numbers of true positives and false negatives, and a higher value indicates better performance. The classification time of the SVM is shown in Table III.
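The three measures defined above can be checked with a few lines of Python. The counts in `reported` are an assumption: they are chosen to reproduce the reported 98%, 84%, and 91% rates with 100 cases per class, and the second example uses hypothetical counts to show why accuracy alone is misleading on an imbalanced set.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Counts assumed to match the reported rates (100 cases per class).
reported = classification_metrics(tp=98, tn=84, fp=16, fn=2)

# Hypothetical imbalanced set: 50 seizure segments, 950 non-seizure segments,
# and a naive classifier that labels everything as non-seizure.
naive = classification_metrics(tp=0, tn=950, fp=0, fn=50)
```

The naive classifier reaches 95% accuracy while detecting no seizure segment at all, which is why sensitivity and specificity are reported alongside accuracy.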

VII. CONCLUSION
An automatic approach for detecting epileptic seizures has been proposed in this paper. Data from the CHB-MIT dataset were used to detect seizure events. An SVM classifier was used for classification, and a maximum accuracy of 90.7% was attained. The classification algorithm was trained across patients to assess the effectiveness of the proposed method. According to the experimental findings, the maximum accuracy, sensitivity, and specificity in this research were 91.0%, 98%, and 84%, respectively, as shown in Fig. 20.