Earthquake Prediction using Hybrid Machine Learning Techniques

This research proposes two earthquake prediction models using seismic indicators and hybrid machine learning techniques for the region of southern California. Seven seismic indicators were mathematically and statistically calculated from previously recorded seismic events in the earthquake catalog of that region. These indicators are: the time taken by the occurrence of n seismic events (T), the average magnitude of the n events (M_mean), the magnitude deficit, that is, the difference between the observed magnitude and the expected one (ΔM), the curve slope for n events according to the Gutenberg-Richter inverse power law (b), the mean square deviation of the n events from the Gutenberg-Richter inverse power law (η), the square root of the energy released during time T (dE1/2), and the average time between events (μ). Two hybrid machine learning models are proposed to predict the earthquake magnitude over the following fifteen days. The first model is FPA-ELM, a hybrid of the flower pollination algorithm (FPA) and the extreme learning machine (ELM). The second is FPA-LS-SVM, a hybrid of FPA and the least-squares support vector machine (LS-SVM). The performance of these two models is compared and assessed using four criteria: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Symmetric Mean Absolute Percentage Error (SMAPE), and Percent Mean Relative Error (PMRE). The simulation results showed that the FPA-LS-SVM model outperformed the FPA-ELM, LS-SVM, and ELM models in terms of prediction accuracy.

Keywords—Extreme learning machine; least square support vector machine; flower pollination algorithm; earthquake prediction


I. INTRODUCTION
An earthquake is movement and shaking of the ground produced by the release of energy in rocks. Like many natural disasters, earthquakes cause much damage, financial loss, and injury [1]. Earthquakes happen daily in various regions around the world. The regions most susceptible to earth-shaking include Japan [2], Indonesia, southern California, Turkey, Iran, and Taiwan [3]. People can feel an earthquake if its magnitude is more than 2.5; below that, the earthquake will not be felt. Earthquakes causing heavy damage have magnitudes above 4.5 [4]. Sometimes earthquakes are responsible for huge numbers of deaths.
Scientists therefore work hard in this field to prevent these severe effects. The best effort is made to alert people in time, because a wrong alert will cause unnecessary losses. Certainly, people cannot stop the occurrence of earthquakes, but they can adopt protective measures and precautions to minimize the harmful effects by predicting the earthquake magnitude using machine learning techniques. There are many methods that can be applied to predict earthquake magnitude using different sensors and devices, magnetic and electrical waves, or seismic indicators obtained by processing the historical data of earthquakes [1]. There is no perfect model that achieves 100% accuracy, but at least an attempt is made to improve the accuracy as much as possible [3].
Artificial intelligence plays an essential role in prediction and classification problems. A neural network is a very effective tool for solving complicated non-linear problems [1]. Technology and machine learning provide many robust mechanisms for studying seismic data and indicators. Data mining and machine learning are highly successful instruments in the prediction domain, especially where massive data is involved, such as weather forecasting and stock prediction [5]. The dataset plays an essential role in determining the proposed model's quality and performance, so considerable effort was spent choosing the most accurate dataset [4].
There are thousands of machine learning algorithms, but no single one is suitable for all problems, because many factors matter: the number of features (input indicators), the number of records in the dataset, and the type of problem (classification or regression). We therefore apply several machine learning algorithms and compare their results with each other to determine the most suitable algorithm for this problem. A comparison is also made between our work and that of other researchers who applied different machine learning algorithms to the same dataset [6]. In this article, the magnitude is predicted using historical records of southern California, depending on seismic indicators as inputs for the machine learning algorithms; these indicators are obtained by performing statistical and mathematical calculations on the raw data (time and magnitude) [7]. Machine learning techniques can predict well if provided with a suitable dataset, which is divided into 70% for training and 30% for testing [8]. In this paper, ELM, LS-SVM, and FPA are used. ELM (Extreme Learning Machine) is a straightforward algorithm based on a feed-forward neural network: the data moves in one direction (forward) through a single hidden layer. ELM is a well-known algorithm with good results in classification and regression. Unlike other ANN algorithms that face overfitting and long running times, ELM overcomes these problems and can achieve high accuracy at high speed. ELM often uses a sigmoid activation function, which is applied in this work. SVM (Support Vector Machine) is a well-known machine learning tool for regression that depends on kernel methods and gives high prediction accuracy. SVM was developed to avoid the shortcomings of ANNs: it gives optimal global solutions, avoiding local minima problems [9].
LS-SVM (Least Squares Support Vector Machine) is a modified version of the SVM algorithm. LS-SVM simplifies the SVM method, but it requires kernel parameters, which are important for regression problems. An appropriate way to choose the optimal LS-SVM parameters is therefore needed, so LS-SVM is optimized with FPA [9].
FPA (Flower Pollination Algorithm) was developed in 2012 by Yang and is inspired by the pollination of flowers. FPA has been applied to many non-linear problems with good results [9].
ELM and LS-SVM are optimized by FPA, enhancing accuracy and minimizing errors. ELM and LS-SVM are supervised learners: they take the seven seismic parameters (the time taken by the occurrence of n seismic events (T), the average magnitude of the n events (M_mean), the magnitude deficit, which is the difference between the observed magnitude and the expected one (ΔM), the curve slope for n events according to the Gutenberg-Richter inverse power law (b), the mean square deviation of the n events from the Gutenberg-Richter inverse power law (η), the square root of the energy released during time T (dE1/2), and the average (mean) time between events (μ)) [10] as network inputs and produce the future magnitude as the network output, training on part of the data and testing on the rest. No algorithm can predict with 100% accuracy, but some shortcomings of the ELM algorithm are removed by the swarm intelligence optimization algorithm FPA, giving FPA-ELM (the result of optimizing ELM with the flower pollination algorithm, which optimizes the weights of the input nodes and the biases of the hidden layer of ELM). Similarly, LS-SVM is optimized with FPA to enhance its performance and obtain the FPA-LS-SVM model, which yields high accuracy.
This work is divided into four parts. The first part processes the raw data to find the seismic indicators. The second part feeds these indicators into machine learning algorithms to predict the expected future magnitude. The third part optimizes the models using artificial intelligence techniques (swarm intelligence). The fourth part compares the results of the different algorithms.
The rest of this paper is organized as follows: related work is presented in Section II, the applied algorithms in Section III, the data and parameter calculations in Section IV, the methodology in Section V, the results and discussion in Section VI, and the conclusion and future work in Section VII, followed by the references in the machine learning and earthquake fields that were helpful in this work.
II. RELATED WORK
Of course, the earthquake catastrophe occupies a large share of scientists' interest, so much research effort has been spent in this domain. Researchers worldwide do their best to predict where and when an earthquake will occur, depending on seismic indicators and other seismic electric signals, using machine learning techniques and optimization algorithms.
Adeli et al. applied a probabilistic neural network to seismic indicators to predict earthquake magnitude; the model predicts well for magnitudes from 4.5 to 6 in the southern California region [11]. Moustra et al. used an artificial neural network to predict earthquake occurrence from time-series data and seismic electric signals in the Greece region; their two case studies led to the conclusion that when appropriate data is presented to a NN, it can predict accurately [12]. Hegazy et al. optimized ELM with FPA (flower pollination algorithm), which showed better accuracy when applied to prediction tasks [13]. Ma et al. proposed a genetic algorithm-based neural network (GA_NN) applied to six seismic indicators to find the relation between these indicators and the maximum earthquakes in China [14]. Maceda et al. applied SVM to the earthquake problem; their paper concluded that SVM is well suited for solving classification problems using a small training dataset [15]. Li et al. depended on data collected from earthquake stations, used with machine learning techniques, to distinguish between earthquakes and non-earthquakes [16]. Rajguru et al. optimized an ANN (artificial neural network) with a GA (genetic algorithm) to predict the source location and dimensions of earthquakes, giving good results despite the complexity of the problem [17]. Wang et al. introduced a deep learning algorithm (LSTM) to find the relations among earthquakes in different places, which is useful for earthquake prediction [18]. Reyes et al. used an ANN model to predict earthquake magnitude, in a limited interval or bounded by a threshold, during the following five days; statistical tests and experiments showed a higher success rate for that method than for other machine learning classifiers [19]. Martínez-Álvarez et al.
used various seismic indicators as inputs for an ANN in different seismic zones, showing that seismic indicators are among the best features for earthquake prediction [20]. Zhou et al. showed how efficiently a neural network can predict earthquake magnitude using the LM_BP algorithm [21]. Morales-Esteban et al. used statistical methods and applied an ANN to earthquakes in two seismic regions, the Western Azores-Gibraltar fault and the Alborán Sea, with the magnitude either limited to an interval or determined by a threshold [22]. Wang et al. proposed an RBF (radial basis function) neural network for earthquake prediction, and the results showed that RBF is an effective tool for that non-linear problem [23]. Alarifi et al. used an ANN for earthquake prediction in the Red Sea region, and the results showed that the ANN forecasts better than other statistical models [24]. Asencio-Cortés et al. proposed a machine learning model to predict the earthquake magnitude for the next seven days using cloud infrastructure [25]. Tan et al. proposed support vector machines (SVM); the results were good and added a new method for predicting earthquakes [26]. Florido et al. were interested in processing earthquake data in Chile; the data were labeled by applying clustering algorithms, and it was easy to identify earthquakes with magnitude larger than 4.4 [27]. Last et al. used many data mining methods and time series to predict the magnitude of the largest earthquake event in the following year from previous records, using a multi-objective info-fuzzy network algorithm [28]. Rafiei et al. were interested in giving an early alarm weeks before a maximum earthquake event using a classification algorithm (CA) combined with a mathematical optimization algorithm (OA) whose role is to find the location of the earthquake with maximum magnitude [29]. Kerh et al.
used a genetic algorithm combined with a neural network to evaluate the earthquake response in Taiwan, producing better results than a neural network model alone [30]. Mirrashid et al. used an adaptive neuro-fuzzy inference system to predict coming earthquakes with magnitude 5.5 or higher, developed with three algorithms: subtractive clustering (SC), grid partition (GP), and fuzzy C-means (FCM). The results showed that the ANFIS-FCM model gives the highest accuracy in predicting earthquake magnitude [31]. Asim et al. used several computational intelligence models for earthquake prediction in northern Pakistan, and the feed-forward neural network showed the highest performance compared with the other models [32]. Umar et al. combined Logistic Regression (LR) with Frequency Ratio (FR) to overcome the limitations of each one individually; this model achieved high success and good earthquake prediction in Indonesia [33]. Asim et al. focused on computing seismic indicators mathematically and then applying tree classifiers to them to predict earthquake magnitude in the Hindu Kush, showing that rotation forest gave better predictions than random forest [34].
Machine learning (ML) methods such as artificial neural networks (ANNs), support vector machines (SVMs), and the extreme learning machine (ELM) are among the most commonly used ML models for classification and regression. But these methods may suffer from local minima and overfitting problems due to the use of local optimization training algorithms, such as the gradient descent algorithm in ANNs [35]. Swarm intelligence algorithms such as particle swarm optimization (PSO), the flower pollination algorithm (FPA), ant colony optimization (ACO), and the artificial bee colony (ABC) can solve these drawbacks of machine learning models such as ANN, SVM, and ELM [36,37]. Using swarm intelligence or meta-heuristic algorithms to optimize and train classical machine learning models can enhance their accuracy and generalization ability [38][39][40][41][42][43].

III. APPLIED ALGORITHMS
A. ELM (Extreme Learning Machine)
This model is a feed-forward neural network with only one hidden layer. It is a very rapid learning method that solves many problems caused by traditional neural network algorithms, such as overfitting, local minima, and slowness, achieving better performance and higher accuracy. ELM is a simple model performed through three steps.
• First step: ELM chooses the weights of input nodes and hidden biases randomly.
• Second step: ELM performs calculations to generate the output matrix of the hidden layer.
• Third step: calculation of the output weights.
ELM uses a single hidden layer with N hidden nodes and an activation function f(x) to learn M distinct samples; after training, the non-linear problem is turned into a linear one:

Hβ = T (1)

where the hidden-layer output matrix H has entries Hij = f(wj · xi + bj), wj is the weight vector between the input nodes and the jth hidden node, bj is the bias of the jth hidden node, j = 1, 2, ..., N, β = [β1, β2, ..., βN]T is the matrix of output weights, and T = [t1, t2, ..., tM]T is the targets' matrix. Training seeks output weights for which the error between the estimated values and the real values is zero, i.e. ‖Hβ − T‖ = 0. The weights that link the hidden layer to the output layer are therefore estimated by the least-squares solution of this linear problem, β = H+T, where H+ is the inverse of the H matrix obtained by the Moore-Penrose method, which makes ELM perform better and faster [13].
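The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration using `numpy.linalg.pinv` for the Moore-Penrose pseudoinverse and the sigmoid activation mentioned earlier; the hidden-layer size and the synthetic data are assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, rng):
    """Train a single-hidden-layer ELM: random input weights, least-squares output weights."""
    n_features = X.shape[1]
    W = rng.standard_normal((n_features, n_hidden))  # step 1: random input weights
    b = rng.standard_normal(n_hidden)                # step 1: random hidden biases
    H = sigmoid(X @ W + b)                           # step 2: hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                     # step 3: beta = H+ T (Moore-Penrose)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

rng = np.random.default_rng(0)
X = rng.random((100, 7))              # 100 periods x 7 seismic indicators (synthetic)
T = X.sum(axis=1, keepdims=True)      # synthetic target magnitude
W, b, beta = elm_train(X, T, n_hidden=20, rng=rng)
pred = elm_predict(X, W, b, beta)
print(round(float(np.sqrt(np.mean((pred - T) ** 2))), 4))  # training RMSE
```

Because the output weights come from a single least-squares solve rather than iterative backpropagation, training is essentially instantaneous, which is the speed advantage the text describes.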

B. LS-SVM (Least Square Support Vector Machine)
It is a supervised learning method that can classify data and predict values. LS-SVM is a newer version of SVM that addresses SVM's issues: using LS-SVM, the solution can be found by solving a set of linear equations rather than the quadratic programming problem used in SVM. Let x be the matrix of input data. Using the training data, LS-SVM constructs the function that shows the dependence of the output on the input:

y(x) = wT φ(x) + b

where w and φ(x) are column vectors and b is a scalar bias. LS-SVM estimates this function using the same minimization problem as SVM, with the important difference that LS-SVM contains equality constraints instead of the inequality constraints of SVM, and depends on a least-squares cost function [9]. The error is minimized using the following optimization problem:

min J(w, e, b) = (1/2) wT w + (C/2) Σ ei²   subject to   yi = wT φ(xi) + b + ei

where e is an n × 1 column vector of errors and C ∈ R+ is the parameter that balances the training errors against the solution size. From this problem the Lagrangian is formed and differentiated with respect to w, b, e, and a, where a is the vector of Lagrange multipliers.
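As a concrete illustration of the equality-constrained formulation above, the following sketch solves the LS-SVM training problem as a single bordered linear system in the bias b and the multipliers a, using an RBF kernel. The kernel choice, the values of C and σ, and the toy data are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    """RBF kernel matrix K[i, j] = exp(-||x1_i - x2_j||^2 / sigma^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def lssvm_train(X, y, C=10.0, sigma=1.0):
    """Solve the LS-SVM dual system [[0, 1^T], [1, K + I/C]] [b; a] = [0; y]."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / C       # regularized kernel block
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)       # one linear solve, no quadratic programming
    return sol[0], sol[1:]              # bias b, multipliers a

def lssvm_predict(X_train, a, b, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ a + b

rng = np.random.default_rng(1)
X = rng.random((80, 7))                  # 80 periods x 7 indicators (synthetic)
y = np.sin(X.sum(axis=1))                # synthetic non-linear target
b, a = lssvm_train(X, y)
fit = lssvm_predict(X, a, b, X)
print(round(float(np.abs(fit - y).mean()), 4))
```

This single linear solve is exactly the simplification over SVM that the section describes; the price is that the kernel parameters (here σ and C) must still be chosen well, which is the role FPA plays later in the paper.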
Note that with the kernel matrix K = ZZT and the parameter λ = C−1, the conditions for optimality lead to a linear system in b and a, [0, 1T; 1, K + λI][b; a] = [0; y], whose solution gives the trained model. Common types of the kernel function K are: first, the linear kernel; second, the polynomial kernel; third, the RBF kernel (radial basis function); and fourth, the MLP kernel K(x, xi) = tanh(k xT xi + θ).

C. FPA (Flower Pollination Algorithm)
FPA models the pollination of flowers through four rules:
• Cross-pollination (biotic) is treated as global pollination, where a Lévy flight takes place.
• Self-pollination is treated as local pollination.
• Flower constancy can be treated as a reproduction probability proportional to the similarity of two flowers.
• A switch probability p ∈ [0, 1] determines whether the pollination process is local or global.
In global pollination, the pollinators can carry pollen over long distances, which ensures reproduction of the fittest solution (Best); the step obeys a Lévy distribution:

xi(t+1) = xi(t) + L (Best − xi(t)) (13)

In local pollination, the pollen is carried by other factors, which can be represented as:

xi(t+1) = xi(t) + ε (xj(t) − xk(t)) (14)

where xj and xk are two randomly chosen solutions and ε is drawn from a uniform distribution [13], [9].

Algorithm 1: Flower Pollination Algorithm (FPA)
1) Initialization of:
• Population size n and the flower population Xi (i = 1, 2, ..., n) as randomly chosen solutions.
• Switch probability p ∈ [0, 1].
• Maximum number of iterations.
• The best solution in the initial population (Best).
• Fitness function (fn) applied to each flower.
2) Generate the initial population randomly.

3) while (t < Max_iteration)
   for each flower do
      if (rand < p)
         Draw L, a d-dimensional step vector, from a Lévy distribution.
         Perform global pollination for solution i: xi(t+1) = xi(t) + L (Best − xi(t)).
      else
         From all the solutions, choose xj and xk randomly.
         Perform local pollination for solution i: xi(t+1) = xi(t) + ε (xj(t) − xk(t)).
      end if
      Evaluate the new solution and, if it is better, replace the old one.
   end for
   Update the current best solution (Best).
end while

IV. DATA AND PARAMETERS CALCULATION
Seven parameters are mathematically and statistically calculated for each specific period of time [10] from the earthquake catalog. These parameters are the inputs for the network, which predicts the expected future magnitude as its output.

1) Earthquake data:
The earthquake catalog for southern California is available for free download from the website www.data.scec.org. The historical earthquake data of southern California between 1st January 1950 and 31st May 1978 is divided into 693 periods, each consisting of fifteen days.
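The division of the catalog into consecutive fifteen-day periods can be sketched as follows. The `(time, magnitude)` event tuples and the synthetic catalog are assumptions for illustration, since the exact SCEC file format is not described here.

```python
from datetime import datetime, timedelta

def split_into_periods(events, start, days=15):
    """Group (time, magnitude) events into consecutive fixed-length periods."""
    periods = {}
    for t, mag in events:
        idx = (t - start).days // days          # which fixed-length window the event falls in
        periods.setdefault(idx, []).append((t, mag))
    return [periods[k] for k in sorted(periods)]

start = datetime(1950, 1, 1)
# synthetic catalog: one magnitude-3.0 event every 4 days
events = [(start + timedelta(days=4 * i), 3.0) for i in range(30)]
periods = split_into_periods(events, start)
print(len(periods), len(periods[0]))  # → 8 4
```

Each resulting period then feeds the indicator calculations of the next subsection.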
2) Seismic parameters: The seven earthquake indicators are computed for each period of time. This part presents these indicators and their mathematical calculations. The first indicator is the elapsed time over the n events of the period, called T:

T = tn − t1 (15)

where t1 is the time of the first event and tn is the time of the period's nth event.
The second seismic parameter is the average magnitude of the last n events of the period, calculated as:

M_mean = (Σ Mi) / n (16)

where Mi is the magnitude of the ith event and n is the number of events.
The third parameter is dE1/2, the rate of the square root of the seismic energy released during time T. The dE1/2 parameter can be computed as:

dE1/2 = (Σ √Ei) / T (17)

where √Ei is the square root of the seismic energy E of the ith event, and E can be calculated from the standard relation:

E = 10^(11.8 + 1.5M) (18)
The fourth indicator is the b value, that is, the slope of the curve relating the logarithm of the frequency of occurred earthquakes to the earthquake magnitude, given by the Gutenberg-Richter inverse power law:

log10 N = a − bM (19)

where N is the number of events with magnitude greater than or equal to M. Fitting this line to the n events of the period by least squares gives:

b = (Σ Mi Σ log10 Ni − n Σ Mi log10 Ni) / (n Σ Mi² − (Σ Mi)²) (20)

a = (Σ log10 Ni + b Σ Mi) / n (21)

The fifth parameter is the η value, the mean square deviation of the observations from the inverse power law, computed as:
η = Σ (log10 Ni − (a − bMi))² / (n − 1) (22)

The sixth parameter is the ΔM value, defined as the magnitude deficit (the difference between the observed magnitude and the expected one). ΔM is computed as:

ΔM = M_max,observed − M_expected (23)

where M_expected is calculated from the Gutenberg-Richter fit:

M_expected = a / b (24)

The last parameter, μ, is the mean time among the characteristic events, calculated as:

μ = (Σ t_characteristic,i) / n_characteristic (25)

where t_characteristic,i are the times between consecutive characteristic events [10]. A sample of the dataset from 1st January 1950 to 30th May 1950, covering ten periods of fifteen days each, is shown in Table I.
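The formulas above can be collected into a single routine. The sketch below assumes per-period arrays of event times (in days) and magnitudes, uses the common energy relation log10 E = 11.8 + 1.5M and a least-squares Gutenberg-Richter fit, and treats every event as characteristic when computing μ; it illustrates the calculations rather than reproducing the authors' exact code.

```python
import numpy as np

def seismic_indicators(times, mags):
    """Compute the seven indicators (T, M_mean, dE_sqrt, b, eta, dM, mu) for one period."""
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)
    n = len(mags)
    T = times[-1] - times[0]                          # elapsed time tn - t1
    M_mean = mags.mean()                              # average magnitude
    E = 10.0 ** (11.8 + 1.5 * mags)                   # assumed energy relation (ergs)
    dE_sqrt = np.sqrt(E).sum() / T                    # rate of sum of sqrt(Ei)
    # Gutenberg-Richter fit: log10 N(M) = a - b*M, N = count of events with magnitude >= M
    M_sorted = np.sort(mags)
    N = n - np.arange(n)                              # cumulative counts
    logN = np.log10(N)
    slope, intercept = np.polyfit(M_sorted, logN, 1)  # least-squares line
    b_value = -slope
    a_value = intercept
    eta = np.sum((logN - (a_value - b_value * M_sorted)) ** 2) / (n - 1)
    M_expected = a_value / b_value                    # magnitude where log10 N = 0
    dM = mags.max() - M_expected                      # magnitude deficit
    mu = np.diff(times).mean()                        # mean inter-event time
    return T, M_mean, dE_sqrt, b_value, eta, dM, mu

times = np.arange(0, 15, 1.5)      # 10 synthetic events over 15 days
mags = np.array([3.1, 2.8, 3.5, 2.9, 4.0, 3.2, 2.7, 3.3, 3.0, 2.6])
ind = seismic_indicators(times, mags)
print([round(float(v), 3) for v in ind])
```

The seven returned values form one input row of the table fed to the networks, one row per fifteen-day period.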

V. PROPOSED MODELS
The proposed methods depend on the study of historical earthquake data in earthquake catalogs. By processing these data, the seismic indicators that are used as inputs for the network can be obtained. The ELM algorithm is then optimized by FPA to enable an optimal prediction of earthquake occurrence, and LS-SVM is likewise optimized with FPA to enhance the accuracy of earthquake magnitude prediction. The network architecture contains seven inputs, which represent the seismic indicators, and the output gives the predicted magnitude over fifteen days. The descriptions of the proposed FPA-ELM and FPA-LS-SVM algorithms are given in Algorithm 2 and Algorithm 3, respectively. The data is used in three ways: first, divided into 70% for training and 30% for testing; then into 80% for training and 20% for testing; and finally into 90% for training and 10% for testing. The phases of the used models are shown in Fig. 1. The earthquake indicators calculated from the dataset are shown in Table I.

Algorithm 2: FPA-ELM
Output: fbest, the optimal hidden weights and biases, and f(fbest), the sum-squared error of the neural network over the validation set.
1) Initialization of:
• Population size N and maximum number of generations N_gen.
• Switch probability p ∈ [0, 1].
• Fitness function (fn) applied to each flower, defined as the validation sum-squared error of the ELM whose input weights and hidden biases the flower encodes.
2) Generate the initial population randomly.
3) Run the FPA loop of Algorithm 1 with this fitness function and return fbest and f(fbest).
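The FPA-ELM hybrid described above can be sketched as follows: FPA searches over the flattened ELM input weights and hidden biases, and the fitness of a flower is the sum-squared error of the ELM it encodes. This is a minimal illustration, not the authors' implementation; the Mantegna approximation of the Lévy step, the clipping safeguard, the population size, the switch probability, and the synthetic data are all assumptions.

```python
import numpy as np
from math import gamma, pi, sin

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def levy_step(rng, size, beta=1.5):
    """Mantegna approximation of a Levy-distributed step vector."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

def fitness(flat, X, T, n_hidden):
    """Sum-squared error of the ELM whose input weights/biases are encoded in `flat`."""
    n_f = X.shape[1]
    W = flat[: n_f * n_hidden].reshape(n_f, n_hidden)
    b = flat[n_f * n_hidden:]
    H = sigmoid(X @ W + b)
    beta_out = np.linalg.pinv(H) @ T          # output weights by least squares
    return float(np.sum((H @ beta_out - T) ** 2))

def fpa_elm(X, T, n_hidden=10, n_pop=15, iters=40, p=0.8, seed=0):
    rng = np.random.default_rng(seed)
    dim = X.shape[1] * n_hidden + n_hidden
    pop = rng.standard_normal((n_pop, dim))
    fit = np.array([fitness(f, X, T, n_hidden) for f in pop])
    best = pop[fit.argmin()].copy()
    for _ in range(iters):
        for i in range(n_pop):
            if rng.random() < p:                 # global pollination via Levy flight
                cand = pop[i] + levy_step(rng, dim) * (best - pop[i])
            else:                                # local pollination between two flowers
                j, k = rng.integers(0, n_pop, 2)
                cand = pop[i] + rng.random() * (pop[j] - pop[k])
            cand = np.clip(cand, -50.0, 50.0)    # illustrative numerical safeguard
            f_new = fitness(cand, X, T, n_hidden)
            if f_new < fit[i]:
                pop[i], fit[i] = cand, f_new
        best = pop[fit.argmin()].copy()
    return best, float(fit.min())

rng = np.random.default_rng(2)
X = rng.random((60, 7))                 # 60 periods x 7 indicators (synthetic)
T = X @ rng.random((7, 1))              # synthetic target magnitude
best, err = fpa_elm(X, T)
print(round(err, 4))
```

In practice the fitness would be measured on a held-out validation split, as Algorithm 2 specifies, rather than on the training data used here for brevity.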

VI. RESULTS AND DISCUSSION
After processing the data and introducing the indicators to the proposed models, the performance of each model is estimated using four evaluation criteria, RMSE, MAE, SMAPE, and PMRE, which can be calculated through the following formulas [13]:

RMSE = sqrt( (1/n) Σ (yi − ŷi)² )

MAE = (1/n) Σ |yi − ŷi|

SMAPE = (100/n) Σ |yi − ŷi| / ((|yi| + |ŷi|)/2)

PMRE = (100/n) Σ |yi − ŷi| / yi

where yi is the observed magnitude and ŷi is the predicted one. First, the data is divided into 70% for training and 30% for testing; the results showed that the accuracy of FPA-ELM is higher than that of ELM alone, as shown in Fig. 2, and the performance of LS-SVM became higher and better after optimizing it with FPA, as shown in Fig. 3.
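The four evaluation criteria can be implemented directly. The PMRE variant below (mean relative error with respect to the observed magnitude, in percent) is an assumption, since several definitions of PMRE exist; the other three follow their standard definitions.

```python
import numpy as np

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    return float(np.mean(np.abs(y - yhat)))

def smape(y, yhat):
    # symmetric percentage error: denominator averages observed and predicted
    return float(100.0 * np.mean(np.abs(y - yhat) / ((np.abs(y) + np.abs(yhat)) / 2)))

def pmre(y, yhat):
    # assumed variant: mean relative error w.r.t. the observed magnitude, in percent
    return float(100.0 * np.mean(np.abs(y - yhat) / np.abs(y)))

y = np.array([4.0, 5.0, 3.0])       # observed magnitudes (toy values)
yhat = np.array([4.5, 4.5, 3.0])    # predicted magnitudes (toy values)
print(rmse(y, yhat), mae(y, yhat), smape(y, yhat), pmre(y, yhat))
```

Lower values of all four criteria indicate better prediction, which is how the model comparisons in this section should be read.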
After the experiment, the results showed that the performance of LS-SVM is better than that of ELM; accordingly, the accuracy of FPA-LS-SVM is higher than that of FPA-ELM, as is evident from Fig. 4. The values of the evaluation criteria are given in Table II. Column charts of the RMSE, MAE, SMAPE, and PMRE values for all models when the data is divided into 70% for training and 30% for testing are shown in Fig. 5, Fig. 6, Fig. 7, and Fig. 8, respectively. Second, the data is divided into 80% for training and 20% for testing, and the results again showed that the accuracy of FPA-ELM is higher than that of ELM alone, as shown in Fig. 9.

Fig. 9. ELM accuracy vs. FPA-ELM accuracy when the data is divided into 80% for training and 20% for testing.

The performance of LS-SVM also became higher and better after optimizing it with FPA, as shown in Fig. 10, and the accuracy of FPA-LS-SVM is higher than that of FPA-ELM, as is evident from Fig. 11. The values of the evaluation criteria for this case are given in Table III. Column charts of the RMSE and MAE values for all models with this split are shown in Fig. 12 and Fig. 13, respectively.
Column charts of the SMAPE and PMRE values for all models when the data is divided into 80% for training and 20% for testing are shown in Fig. 14 and Fig. 15, respectively. At last, the data is divided into 90% for training and 10% for testing, and the results again showed that the accuracy of FPA-ELM is higher than that of ELM alone, as shown in Fig. 16. The performance of LS-SVM also became higher and better after optimizing it with FPA, as shown in Fig. 17, and the accuracy of FPA-LS-SVM is higher than that of FPA-ELM, as is evident from Fig. 18. The values of the evaluation criteria for this case are given in Table IV. A column chart of the RMSE values for all models with this split is shown in Fig. 19.

Fig. 19. RMSE evaluation criteria for all models when the data is divided into 90% for training and 10% for testing.
Column charts of the MAE, SMAPE, and PMRE values for all models when the data is divided into 90% for training and 10% for testing are shown in Fig. 20, Fig. 21, and Fig. 22, respectively.

VII. CONCLUSION
In this paper, two hybrid models, FPA-ELM and FPA-LS-SVM, were proposed to forecast earthquake magnitude in the southern California region. Seven seismic indicators were generated mathematically and statistically from the dataset to be employed as inputs for the proposed models. The proposed models were evaluated using four criteria: RMSE, MAE, SMAPE, and PMRE. The experimental results showed that the accuracy of both ELM and LS-SVM increased after optimizing them with the FPA algorithm. The proposed FPA-LS-SVM model outperformed the FPA-ELM model according to all compared criteria. FPA-LS-SVM was also the best at reducing the false alarm ratio in earthquake prediction.