A Hybrid Material Generation Algorithm with Probabilistic Neural Networks for Solving Classification Problems

Abstract: Classification is a machine learning task in which each element of a data set is assigned to one of a predetermined set of groups. In data mining, the artificial neural network (ANN) is among the most significant methodologies because of the accurate results it produces, and it has been applied to many classification problems. ANNs comprise several network types, including feed-forward networks, feedback networks, radial basis function (RBF) networks, and probabilistic neural networks (PNNs). The PNN is frequently used for classification. The primary goal of this research is to fine-tune the weights of the neural network to enhance classification accuracy. To accomplish this goal, the Material Generation Algorithm (MGA) is combined with the PNN in a hybrid model. Hybridization of algorithms has recently become widespread and has led to procedures that outperform those based on a single algorithm. Several distinct classification tasks are used to test the efficiency of the proposed MGA-PNN approach. The efficiency of the MGA is evaluated from the resulting PNN training outcomes, which are compared with those of other optimization strategies. Classification accuracy is evaluated on 11 benchmark datasets. The results show that the MGA outperforms biogeography-based optimization (BBO) and the firefly algorithm (FA) in terms of classification accuracy.


I. INTRODUCTION
Recently, our ability to collect data has greatly improved [1]. Millions of databases are used in a variety of applications, including marketing campaigns, company management, scientific endeavors, and many others [2]. The availability of sophisticated and affordable database systems has led to rapid growth in the number of such databases [2,3]. Intelligent approaches are therefore needed to extract knowledge from the stored data, and data mining has become a popular research field as a result [4,5]. Classification is a supervised machine learning problem in which a collection of training data is used to learn a mapping from input data to one of several predetermined categories [5,6]. The purpose of any classification algorithm is to build a model that can reliably predict the category of unseen examples [7]. Classification is used in a range of areas, including document organization, medical diagnosis, and many more. As a result, many classification techniques and models have been devised, including the radial basis function (RBF) network [8], the naive Bayes (NB) classifier [9], support vector machines (SVMs) [10], and the K-nearest neighbours (KNN) algorithm [11].
ANNs have frequently been used to solve classification problems [12]. There are several kinds of ANNs [13], such as modular neural networks, RBF networks, feedforward neural networks, learning vector quantization neural networks, and several others. These ANNs differ not only in how they learn but also in their control method and topology [14].
The PNN is a feed-forward neural network that can be used to solve prediction and classification problems. Gradient steepest descent, a common optimization technique, is used in the PNN to minimize the error between the predicted and actual outputs by allowing the network to modify its weights [15].
The goal of combining metaheuristic algorithms with neural networks to build classification tools such as the PNN is to improve effectiveness and efficiency while also producing more accurate and faster solutions to complex problems.
The problem statement is captured by one basic research question: can the search capability of the Material Generation Algorithm select the weights that yield the best classification accuracy?
Metaheuristics fall into two kinds: population-based and single-solution-based. Population-based metaheuristics include the genetic algorithm (GA) [16], particle swarm optimization (PSO) [17], water evaporation optimization (WEO) [18], differential evolution (DE) [19], the firefly algorithm (FA) [20], the artificial bee colony (ABC) [21], and several others. Local search (LS) [22], tabu search [23], and simulated annealing (SA) [24] are examples of single-solution-based metaheuristics.
www.ijacsa.thesai.org
In this research, the Material Generation Algorithm (MGA) is investigated and used to enhance the efficiency of the PNN in solving the classification problem [25]. The PNN generates preliminary solutions at random, and the MGA tunes the weights of the PNN.
The study is organized as follows. Section II presents the background and a literature review of the MGA. Section III presents the background on big data and its challenges. The PNN approach is described in detail in Section IV, and the MGA in Section V. The proposed methodology is detailed in Section VI. Section VII presents the results, and Section VIII offers the conclusion.

II. BACKGROUND AND LITERATURE REVIEW
The authors of [25] proposed the MGA for the optimal design of engineering problems. The MGA takes some of the fundamental and advanced aspects of materials chemistry as its inspiration, notably the formation of chemical compounds and the chemical reactions that produce new materials. That work shows that the MGA can produce highly competitive, and in some cases outstanding, results compared with other metaheuristics.
The authors of [26] presented the optimal design of truss structures using the MGA. Multiple optimization runs were carried out for statistical purposes. The results showed that the MGA can produce highly acceptable designs, yielding the smallest weight among the results of a number of metaheuristic methods.
The authors of [27] optimized the moulding parameters of resin-reinforced sand mould cores using a hybrid Taguchi-WASPAS-MGA approach.
The authors of [28] used the sunflower optimization algorithm and the MGA for the efficient generation and analysis of the materials and equipment of a mechanical reducer for the material handling industry. The results showed that the technique is precise and provides better output.

III. BIG DATA: OPPORTUNITIES AND CHALLENGES
The concept of big data arises from the existence of trillions of records produced by millions of people and kept on a variety of online sites [28]. Scientists can use big data to address the limitations of small data samples by providing adequate test data to evaluate models, handling noisy training data better, avoiding overfitting models to the training data, and relaxing theoretical model assumptions. Big data also poses challenges, such as capturing, transferring, storing, cleaning, analyzing, filtering, searching, sharing, securing, and visualizing data [29]. Different research communities have been working to produce dynamic, fast, new, and user-friendly big data technology [28] that helps solve many problems related to data and its retrieval.

IV. PROBABILISTIC NEURAL NETWORK (PNN)
The PNN was first proposed in [30]. Training a PNN does not entail a heuristic search for the best smoothing factor, since choosing that factor is treated as an optimization problem [31]. Using a statistical algorithm, a four-layer feed-forward network is formed: (a) input layer, (b) hidden layer, (c) summation layer, and (d) output layer. Fig. 1 illustrates the architecture of a typical PNN. Each input neuron represents one feature of the training and test datasets [32]. The four layers of the PNN are detailed below:
• Input layer: Each predictor variable is represented by a neuron. A categorical variable with N categories is represented by N−1 neurons. The input neurons normalize the range of values by subtracting the median and dividing by the interquartile range.
• Hidden layer: Each case in the training dataset is represented by a single neuron. Each training sample has one unit, which forms the product of the input vector $x$ and the weight vector $w_i$, $z_i = x \cdot w_i^T$, and then applies a nonlinear activation to $z_i$.
• Pattern/summation layer: There is one pattern neuron for each class of the target variable. The target class of each training case is stored with its hidden neuron, and the weighted value leaving a hidden neuron is passed only to the pattern neuron that matches that class. The pattern neuron sums the contributions for each class of inputs and produces the network output as a vector of probabilities.
• Output/decision layer: Produces binary classes that correspond to the decision classes $\Omega_s$ and $\Omega_r$, $s \neq r$, $s, r = 1, 2, \ldots, q$, according to the classification criterion.
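The four layers described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes unit-length input and weight vectors, the exponential activation $\exp((z_i - 1)/\sigma^2)$ of the standard PNN formulation, and equal class priors. The names `pnn_predict`, `normalize`, and the smoothing parameter `sigma` are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumed preprocessing step)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def pnn_predict(x, train_X, train_y, sigma=0.5):
    """Return (predicted_class, class_probabilities) for one sample x."""
    x = normalize(x)
    sums, counts = {}, {}
    for w, label in zip(train_X, train_y):
        w = normalize(w)
        z = sum(a * b for a, b in zip(x, w))      # hidden layer: z_i = x . w_i^T
        g = math.exp((z - 1.0) / sigma ** 2)      # nonlinear activation (assumed form)
        sums[label] = sums.get(label, 0.0) + g    # summation layer: one sum per class
        counts[label] = counts.get(label, 0) + 1
    # Average per class, then normalize into a probability vector.
    avg = {c: sums[c] / counts[c] for c in sums}
    total = sum(avg.values())
    probs = {c: v / total for c, v in avg.items()}
    # Decision layer: pick the class with the highest probability.
    return max(probs, key=probs.get), probs
```

With four training points split between two classes, a query close to the first class receives almost all of the probability mass for that class; a smaller `sigma` sharpens this effect.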
These nodes have a single weight, C, which is determined by the cost parameter, the prior membership probabilities, and the number of training samples in each class.

V. MATERIAL GENERATION ALGORITHM (MGA)
Introduced in 2021, the MGA is a bioinspired algorithm that draws on material chemistry [25]. The basic principles of chemical compounds, reactions, and stability are used to construct a well-defined mathematical model for the method. Like many nature-inspired evolutionary techniques, which create a preset population of solution candidates that evolve through random changes and selection, the MGA works with a number of materials (Mat) made of several periodic table elements (PTEs). Each material is treated as a solution candidate ($Mat_n$), and each of its constituent elements is treated as a decision variable ($PTE_i^j$). These two components are represented mathematically as:

\[
Mat =
\begin{bmatrix} Mat_1 \\ Mat_2 \\ \vdots \\ Mat_i \\ \vdots \\ Mat_n \end{bmatrix}
=
\begin{bmatrix}
PTE_1^1 & PTE_1^2 & \cdots & PTE_1^j & \cdots & PTE_1^d \\
PTE_2^1 & PTE_2^2 & \cdots & PTE_2^j & \cdots & PTE_2^d \\
\vdots & \vdots & & \vdots & & \vdots \\
PTE_i^1 & PTE_i^2 & \cdots & PTE_i^j & \cdots & PTE_i^d \\
\vdots & \vdots & & \vdots & & \vdots \\
PTE_n^1 & PTE_n^2 & \cdots & PTE_n^j & \cdots & PTE_n^d
\end{bmatrix},
\quad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, d
\tag{5}
\]

where $d$ is the number of elements (decision variables) in each material (solution candidate) and $n$ is the number of materials considered as solution candidates. In the first step of the optimization procedure, each $PTE_i^j$ is set at random within decision-variable bounds that depend on the problem under consideration. The initial positions of the PTEs in the search space are:

\[
PTE_i^j(0) = PTE_{i,\min}^j + \mathrm{Unif}(0,1) \cdot \left( PTE_{i,\max}^j - PTE_{i,\min}^j \right),
\quad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, d
\]

where $PTE_i^j(0)$ is the initial value of the jth element in the ith material; $PTE_{i,\min}^j$ and $PTE_{i,\max}^j$ are the minimum and maximum permissible values of the jth decision variable of the ith solution candidate, respectively; and $\mathrm{Unif}(0,1)$ is a random number in the range [0, 1].
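The random initialization described above can be sketched as follows. This is a minimal illustration that assumes the same bounds apply to every material; the function name `init_materials` and its parameters are hypothetical.

```python
import random

def init_materials(n, lower, upper, rng=random):
    """Create n materials; each PTE is drawn uniformly within its bounds.

    lower[j] and upper[j] are the assumed bounds of the j-th decision variable.
    """
    d = len(lower)
    return [
        [lower[j] + rng.random() * (upper[j] - lower[j]) for j in range(d)]
        for _ in range(n)
    ]
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing optimizers on the same starting population.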
To model chemical compounds mathematically, all PTEs are assumed to be in the ground state, from which they can be excited externally by magnetic fields, by the absorption of photons or light, or by interactions with colliding particles such as ions or individual electrons. Because their stabilities differ, elements tend to gain, lose, or share electrons with other PTEs, forming ionic or covalent compounds. To model these compounds, d random PTEs are chosen from the initial Mat in equation (5). Probability theory is used to model electron loss, gain, and sharing for the selected PTEs: for each PTE, a continuous probability distribution is used to form a chemical compound, which is then regarded as a new PTE:

\[
PTE_{new}^k = PTE_{r_1}^{r_2} \pm e^-, \quad k = 1, 2, \ldots, d
\]

where $r_1$ and $r_2$ are random integers uniformly distributed in the intervals [1, n] and [1, d], respectively; $PTE_{r_1}^{r_2}$ is the PTE chosen at random from Mat; $e^-$ is the probabilistic component that simulates electron loss, gain, and sharing, modelled with a normal (Gaussian) distribution; and $PTE_{new}^k$ is the new element. The new PTEs form a new material ($Mat_{new1}$), which is added to the list of initial materials (Mat) as a new solution candidate:

\[
Mat_{new1} = \begin{bmatrix} PTE_{new}^1 & PTE_{new}^2 & \cdots & PTE_{new}^k & \cdots & PTE_{new}^d \end{bmatrix}
\]

The overall solution candidates are then combined as:

\[
Mat = \begin{bmatrix} Mat_1 \\ Mat_2 \\ \vdots \\ Mat_n \\ Mat_{new1} \end{bmatrix}
\]

The probabilistic component follows the normal density

\[
f\left( PTE_{new}^k \mid \mu, \sigma^2 \right) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{\left( PTE_{new}^k - \mu \right)^2}{2\sigma^2}}, \quad k = 1, 2, \ldots, d
\]

where $\sigma$ is the standard deviation; $\sigma^2$ is the variance; $\mu$ is the mean (equivalently the median or expectation) of the distribution, which corresponds to the randomly chosen PTE ($PTE_{r_1}^{r_2}$); and $e$ is the base of the natural logarithm.
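The compound step above can be sketched in Python as follows. This is an illustration under stated assumptions, not the reference implementation of [25]: the electron term is drawn from a zero-mean Gaussian whose sign covers both gain and loss, the spread `sigma` is a hypothetical parameter, and clipping to the bounds is an added safeguard that the text does not describe.

```python
import random

def new_material_by_compound(mat, lower, upper, sigma=1.0, rng=random):
    """Build one new material from d randomly chosen PTEs plus Gaussian noise."""
    n, d = len(mat), len(mat[0])
    new = []
    for k in range(d):
        r1 = rng.randrange(n)            # random material index
        r2 = rng.randrange(d)            # random element index
        e = rng.gauss(0.0, sigma)        # electron gain/loss/sharing term (assumed form)
        value = mat[r1][r2] + e          # mean of the draw is the chosen PTE
        # Clip to the bounds of the k-th decision variable (assumption).
        new.append(min(max(value, lower[k]), upper[k]))
    return new
```

Drawing `mat[r1][r2] + e` with `e` from a zero-mean Gaussian is equivalent to sampling from a normal distribution centred on the chosen PTE.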
A chemical reaction is a production process in which chemical changes yield products whose properties differ from those of the initial reactants. To simulate the production of new materials mathematically through this reaction concept, an integer random number (l) is drawn to set how many materials of the initial Mat take part in the reaction. Then l integer random numbers ($m_j$) are generated to determine the positions of the selected materials in the initial Mat. As a result, new solutions are created that are linear combinations of the existing ones.
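The reaction step described above can be sketched as a normalized weighted average of randomly chosen materials. This is a sketch under assumptions: the participation factors are taken as absolute values of standard Gaussian draws so that the denominator stays positive, which is a simplification introduced here, and the function name is hypothetical.

```python
import random

def new_material_by_reaction(mat, rng=random):
    """Combine l randomly chosen materials into one new material."""
    n, d = len(mat), len(mat[0])
    l = rng.randint(1, n)                              # number of participating materials
    chosen = [rng.randrange(n) for _ in range(l)]      # positions m_j in the initial Mat
    # Participation factors; abs() keeps the weights positive (assumption).
    p = [abs(rng.gauss(0.0, 1.0)) + 1e-12 for _ in range(l)]
    total = sum(p)
    return [
        sum(p[m] * mat[chosen[m]][j] for m in range(l)) / total
        for j in range(d)
    ]
```

Because the weights are positive and normalized, each element of the new material lies between the smallest and largest value of that decision variable among the chosen materials.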
Formally, the new material is the participation-weighted combination

\[
Mat_{new2} = \frac{\sum_{m=1}^{l} \left( p_m \cdot Mat_{m_j} \right)}{\sum_{m=1}^{l} p_m}
\]

where $Mat_{m_j}$ is the mth randomly chosen material from the initial Mat, $Mat_{new2}$ is the new material created by the chemical reaction concept, and $p_m$ is the participation factor of the mth material, drawn from a normal (Gaussian) distribution.

VI. PROPOSED METHODOLOGY
In this research, the MGA is used to determine the best weights for the PNN. To address classification problems, we propose MGA-PNN, a new hybrid method. As shown in Fig. 4, the method begins with the PNN generating the initial weights randomly. The input values are then multiplied by the corresponding weights $w_{ij}$ of the PNN model.
The proposed MGA-PNN structure is shown in Fig. 5. It consists of two parts. In the first, the PNN is trained on the training data, the test data are classified, and the accuracy is calculated using equation (12). In the second, the MGA fine-tunes the PNN weights, and the accuracy is evaluated again with the new weights. This process is repeated until the termination criterion is met.
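The tuning loop can be sketched generically in Python. This is not the authors' code: `fitness` is a placeholder for a function that returns the PNN classification accuracy for a candidate weight vector, replacing the worst material when a new one is better is an assumed selection rule, and the unit Gaussian spread and the absolute-value participation factors are simplifications.

```python
import random

def mga_optimize(fitness, n, lower, upper, iterations=50, seed=0):
    """Maximize `fitness` over a box; returns (best, best_fitness, history)."""
    rng = random.Random(seed)
    d = len(lower)

    def clip(value, j):
        return min(max(value, lower[j]), upper[j])

    # Random initial materials (candidate weight vectors).
    mat = [[lower[j] + rng.random() * (upper[j] - lower[j]) for j in range(d)]
           for _ in range(n)]
    fit = [fitness(m) for m in mat]
    history = []
    for _ in range(iterations):
        # Compound step: random PTEs perturbed by a Gaussian term.
        new1 = [clip(mat[rng.randrange(n)][rng.randrange(d)] + rng.gauss(0.0, 1.0), k)
                for k in range(d)]
        # Reaction step: normalized weighted average of l random materials.
        l = rng.randint(1, n)
        idx = [rng.randrange(n) for _ in range(l)]
        p = [abs(rng.gauss(0.0, 1.0)) + 1e-12 for _ in range(l)]
        new2 = [clip(sum(pm * mat[i][j] for pm, i in zip(p, idx)) / sum(p), j)
                for j in range(d)]
        # Replace the current worst material when a new one is better (assumed rule).
        for cand in (new1, new2):
            f = fitness(cand)
            worst = min(range(n), key=fit.__getitem__)
            if f > fit[worst]:
                mat[worst], fit[worst] = cand, f
        history.append(max(fit))
    best = max(range(n), key=fit.__getitem__)
    return mat[best], fit[best], history
```

In the setting of this paper, `fitness` would wrap a PNN evaluation and return the accuracy obtained with the candidate weights; any other objective to be maximized can be substituted for testing. Because only the worst material is ever replaced, the best fitness recorded in `history` never decreases.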
An object is counted as a true negative (TN) if both its predicted and actual labels are negative, and as a true positive (TP) if both are positive. It is counted as a false positive (FP) when the predicted class is positive but the actual label is negative, and as a false negative (FN) when the predicted class is negative but the actual label is positive. See Table I [33].
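The evaluation measures reported later can be computed from these four counts. The sketch below uses the standard textbook definitions, which are assumed to match the paper's equation (12) and Table III; the function name is hypothetical.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute common binary classification measures from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
    }
```

For example, with 40 true positives, 45 true negatives, 5 false positives, and 10 false negatives, the accuracy is 85/100 = 0.85, the sensitivity is 40/50 = 0.8, and the specificity is 45/50 = 0.9.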

VII. EXPERIMENTS AND RESULTS
The efficiency of the MGA-PNN approach is measured in this research using 11 benchmark UCI datasets. We compare the results of the proposed method (MGA-PNN) with those of the PNN, biogeography-based optimization (BBO), and the firefly algorithm (FA).

A. Description of the Dataset
These experiments are based on a set of datasets available at [7]. That source also gives the sizes of the training and test sets. The data were divided with a basic train/test split, using a training fraction of 0.7 and a test fraction of 0.3.
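A 70/30 split of this kind can be sketched as follows. Shuffling before the split and the fixed seed are assumptions; the text does not say whether the split was randomized or stratified.

```python
import random

def train_test_split(samples, labels, train_size=0.7, seed=0):
    """Shuffle indices, then split into training and test parts."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_size * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return ([samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in test], [labels[i] for i in test])
```

With 10 samples, this yields 7 training and 3 test samples, and every sample appears in exactly one of the two parts.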

B. The Categorization Quality Evaluation Results
Experiments were run in MATLAB R2015b on a PC running Windows 10 Professional with 16 GB of RAM and an Intel® Xeon® E5-1630 v3 CPU @ 3.70 GHz. Table II lists the settings of the input parameters.
The quality of the proposed method is judged by its ability to improve the solution. Table III compares the performance of the proposed MGA technique with the PNN, FA [34], and BBO in terms of G-mean, error rate (%), specificity, accuracy, and sensitivity. The results show that the proposed algorithm is superior to the other algorithms in Table III on 9 of the 11 datasets. On the PIMA Indian diabetes (PID) dataset, the original PNN achieved 65.1% accuracy, while the proposed MGA-PNN attained 82.8%. The best results are shown in bold. The proposed technique has strong exploitation capabilities and can find superior solutions because a large number of candidates are grouped around the best solution. On almost all datasets, the proposed MGA outperforms the original PNN in terms of error rate, sensitivity, specificity, and accuracy.
The performance of the MGA was further validated by examining whether it differs statistically from the FA. For classification accuracy, a t-test at the 95 percent confidence level (α = 0.05) was used. Table IV displays the means and standard deviations of the accuracy of the proposed approach. The performance of the MGA is clearly superior to that of the FA, as all of the P-values are less than 0.01.

VIII. CONCLUSION
The major goal of this study was to propose a new strategy for finding high-quality solutions to classification problems. The MGA is a population-based, bioinspired metaheuristic inspired by material chemistry, and it can be used to optimize the weight values of the PNN. When a large search space is examined, the exploitation and exploration capabilities of the MGA allow it to achieve better results than FA and BBO. In this study, the MGA was used to tune the weights of the PNN: the initial solutions were created randomly by the PNN and then improved by the MGA. To meet the research objectives, the classification accuracy of the MGA-PNN was compared with that of the original PNN, FA-PNN, and BBO-PNN. According to the experimental results, the proposed MGA-PNN outperformed the original PNN, FA-PNN, and BBO-PNN on 9 of the 11 benchmark datasets. This suggests that the MGA could be applied to additional real-world and high-dimensional datasets to investigate its behavior as the number of features varies. We will focus on this topic in future work.

IX. DISCUSSION
This study contributes to data mining research on classification. Combining the PNN with a metaheuristic algorithm, specifically the MGA, and comparing the results with those of three other methods (PNN, FA, and BBO) gives clear evidence of the value of this approach for increasing classification accuracy. Because this study combined only one algorithm with the PNN, we believe that combining the PNN with more than one high-performing metaheuristic algorithm could further improve accuracy, and this is the direction of our future work.