A Hybrid Multiple Indefinite Kernel Learning Framework for Disease Classification from Gene Expression Data

— In recent years, Machine Learning (ML) techniques have been used by several researchers to classify diseases using gene expression data. Disease categorization using heterogeneous gene expression data is often applied to critical problems such as cancer analysis. Gene expression data gathered from DNA microarrays are characterized by a variety of evaluated factors known as genes. Accurate classification of genetic data is essential to provide accurate treatments to sick people. A large number of genes can be viewed simultaneously from the collected data. However, processing these data is limited by noise, redundant data, frequent errors, increased complexity, small samples with high dimensionality, difficult interpretation, etc. A model must be able to distinguish the features in such heterogeneous data with high accuracy to make accurate predictions. This paper therefore presents an innovative model to overcome these issues. The proposed model includes an effective multiple indefinite kernel learning based model to analyze the gene expression microarray data, an optimized kernel principal component analysis (OKPCA) to select the best features, and a hybrid flow-directed arithmetic support vector machine (SVM)-based multiple indefinite kernel learning (FDASVM-MIKL) model for classification. Flow direction and arithmetic optimization algorithms are combined with the SVM to increase classification accuracy. The proposed technique attains accuracies of 99.95%, 99.63%, 99.60%, 99.51%, and 99.79% on the Colon, Isolet, ALLAML, Lung cancer, and Snp2 datasets, respectively.


INTRODUCTION
The integration of data tends to be an emerging topic, since decision making based on metabolomics and genomics yields better prediction or diagnosis than the utilization of clinical data alone [1]. The prediction or classification of diseases from medical data requires appropriate methodologies [2]. Machine learning has played a huge role, especially in biomedical research, over the past decades [3]- [4]. This is partly due to greater advancements in data collection that have enabled the study of the biomedical mechanisms of various diseases, particularly cancer [5]. When the gene expression data are adopted from microarrays comprising high density oligonucleotide arrays (HDOA) or complementary deoxyribonucleic acid (cDNA) arrays, classification methods are utilized for data examination and interpretation.
Disease classification using heterogeneous gene expression data is greatly utilized for determining fundamental issues like disease analysis and drug detection [6]- [7]. The gene expression data collected from DNA microarrays are characterized through diverse evaluated variables known as genes [8]. An exact classification of the gene data is very important in order to be able to treat sick people appropriately [9]- [10]. Molecular-level connections across scales can be analyzed through gene expression data, which can be examined by a significant tool called the microarray [11]. A huge number of genes from the collected data can be observed simultaneously. Still, there are certain drawbacks in processing these data, as they comprise noise, redundant data, frequent errors, increased complexity, small samples with high dimensionality, complex interpretation, etc.
Several studies were conducted previously using machine learning approaches to classify diseases using gene expression data [12]- [13]. The major reason behind searching for effective approaches is to predict survival rates so as to select better treatment. Feature selection approaches are highly efficient in eradicating noisy features and redundant data, and are significant in describing the biological features while minimizing model complexity [14]. The chief focus of the feature selection approach is to reduce the data dimensionality, which improves the overall system performance [15]- [16]. The kernel is generally used to indicate a kernel trick, an approach of utilizing a linear classifier to solve non-linear issues [17]. Kernel learning transforms linearly inseparable heterogeneous data into separable data. Kernel approaches perform well, but they make no parametric assumptions and can be sensitive to outliers, which is a concern for many diseases [18]- [19]. The utilization of transcriptomic data poses diverse challenges in interpretation. To overcome this, multiple kernel learning (MKL) permits the combination of pathway data into prediction models that utilize transcriptomics, thereby enhancing interpretation and accuracy. Several studies have been developed through the application of MKL over genomic data. Every MKL method can render an ordering for data type significance that delivers appropriate information [20]. MKL aims to determine the best convex combination of kernels to generate the best classifier. Diverse feature components of heterogeneous data can be mapped with various kernel functions to expose the data better in the new feature space.

A. Motivation
Predicting and classifying disease from gene expression data is an extremely challenging task due to the intrinsic nature of the data. Heterogeneous data involves large differences between traits, making predictions difficult for any learning model. In addition, the size of the data is extremely large, leading to several complications. A prominent solution identified for this problem is using meta-heuristics that can optimally tune the parameters, resulting in higher accuracy. To achieve better performance with multiple kernel learning, machine learning models are generally adopted, among which the SVM model is highly preferred. Considering all these problems, this proposal focuses on developing a hybrid multiple-kernel learning framework that hybridizes an effective meta-heuristic with an SVM model to achieve a higher percentage of accuracy in disease classification. The main contributions of the proposed work are:

• To analyze the gene expression microarray data and attain higher accuracy in disease classification, an effective multiple indefinite kernel learning based model is proposed.
• A new approach of optimized kernel principal component analysis (OKPCA) is presented to decrease the dimensions of microarray data by eliminating the unimportant features from the feature space.

• To achieve higher accuracy in disease classification using gene expression data, several kernel functions, such as the radial basis function, sigmoid kernel, polynomial kernel and linear kernel, are introduced and integrated into the hybrid flow-directed arithmetic SVM-based multiple indefinite kernel learning (FDASVM-MIKL) classification framework.

• To validate its efficiency against the existing methods, extensive simulations of the proposed method are performed using different metrics.
The proposed research work is organized into various sections. The literature survey of disease classification from gene expression data conducted by various researchers is described in Section II. The proposed methodology for disease classification from gene expression data is discussed in Section III. Section IV discusses the simulations performed to analyze the outcomes of the proposed technique. Finally, the conclusion and the future scope of the proposed technique are provided in Section V, followed by the references.

II. RELATED WORKS
Most researchers have applied various methods to reduce dimensionality across heterogeneous gene expression data. Some of the prominent adopted models are examined as follows.
Liu et al. [21] presented a dimension reduction algorithm to enhance classification performance and minimize dimensionality. The dimensionality reduction was carried out through a Weighted Kernel Principal Component Analysis (WKPCA) that builds the weights of the kernel functions in accordance with the eigenvalues of the kernel matrix. The feature dimensions were reduced through a combination of multiple kernel functions, and t-class kernel functions were built to further enhance the efficacy of dimension reduction. Classifiers such as random forest, naive Bayes and the Support Vector Machine were used to examine six real gene expression datasets. The major limitations of this approach were the inflexibility of kernel function selection and the degraded embedding ability.
Rahimi et al. [22] presented a multi-task multiple KL approach with task clustering and developed a highly time-effective solution. The solution approach in this research was based on Benders decomposition, treating the clustering problem by determining given tree structures in the graph. The method, called forest formulation, has been used to differentiate early- and late-stage cancers through the adoption of gene sets and genomic data. When the number of tasks and clusters grows, the forest formulation approach is highly favorable due to its computational performance. However, the time consumed in solving large-scale instances was too high for the multi-task multiple KL approach with clustering.
Almarzouki et al. [23] established an effective feature selection approach to maximize accuracy and minimize classification time, using microarray data to train deep learning approaches on extracted features. The most significant genes were picked by eliminating superfluous and duplicate information. An Artificial Bee Colony (ABC) method using bone marrow pyruvate carboxylase gene expression data was employed in this work, with the features selected by the ABC algorithm forming a wrapper-based feature selection system. Lung, kidney and brain cancer datasets were used during testing and training. However, the characteristics of the data were not effectively examined, and there was a greater possibility of losses.
Feng et al. [24] collected seven cancer datasets from the Broad Institute GDAC Firehose, comprising isoform expression profiles, survival information, gene expression profiles and DNA methylation expression data, and recommended kernel principal component analysis (KPCA) to extract the relevant features from every expression profile. The features were then fed into three similar kernel matrices through a Gaussian kernel function and combined into a global kernel matrix. Finally, the features were applied to a spectral clustering algorithm to obtain the clustering outcomes. Due to the collection of abundant datasets, the dimensionality issue was not solved effectively.
To overcome the limitation of the increased computational effort of using a huge data set, Wani et al. [25] proposed an efficient method: an MKL-founded gene regulatory network (GRN) inference method in which numerous heterogeneous datasets were combined using the MKL paradigm. The GRN learning issue was formulated as a supervised classification problem in which the genes regulated by a specified transcription factor are differentiated from other non-regulated genes. To learn a large-scale GRN, a parallel execution construction was devised. Better accuracy rates and speedups can be obtained, but the data quality and redundancy issues were not solved effectively. Table I describes the major contribution of each existing method with its corresponding merits and demerits. Based on this related work on disease classification using gene expression data, various limitations negatively impact the performance of the existing methods. Owing to disadvantages such as the use of smaller datasets, inflexible embedding capability, higher time consumption, lower data quality, redundancy problems and lower classification accuracy, the proposed FDASVM-MIKL approach is developed. Additional limitations include noise, redundant data, frequent errors, increasing complexity, small samples with high dimensionality and difficult interpretation. These limitations can be reduced with an appropriate feature extraction method; in this research, OKPCA is proposed for analyzing expression data based on multiple indefinite kernel learning, together with the hybrid flow directed arithmetic SVM, to overcome the current limitations and increase classification accuracy.

III. PROPOSED METHODOLOGY
The heterogeneous nature of the data makes it very difficult to process, predict and classify diseases using gene expression data. The fact that gene expression data are quite heterogeneous has been identified as one of the main challenges: heterogeneous data have a wide range of feature variations, making it difficult for any learning model to make predictions. A model must be able to accurately distinguish between the characteristics of heterogeneous data in order to make reasonable predictions. Utilizing meta-heuristics to optimize the tuning of the parameters and increase accuracy has been proposed as one prominent solution to this problem. The support vector machine (SVM) model is highly preferred among machine learning models when the task of multiple kernel learning is performed. This approach aims to consider all these issues and create a hybrid multiple-kernel learning framework that combines an efficient meta-heuristic with an SVM model to increase the accuracy of disease classification. The proposed framework comprises three major steps: preprocessing, feature selection and classification. The dataset is initially brought through various processes to make it suitable for classification because it is diverse and has a large dimension. To improve the quality of the dataset, the outliers are first eliminated. The missing values are then filled with mean values once the dataset has been examined for any missing values. The dataset is then passed to the feature selection phase, in which the main features are extracted. The proposed study introduces the optimized kernel principal component analysis (OKPCA) to identify the dataset's best features. The features chosen using the OKPCA technique are then given to the proposed hybrid flow directed arithmetic SVM based multiple indefinite kernel learning (FDASVM-MIKL) classification framework.
This proposed framework combines the SVM classifier, the flow direction algorithm (FDA), and the Arithmetic Optimization Algorithm (AOA). To increase overall performance, the SVM framework incorporates several kernels, including the linear kernel, sigmoid kernel, polynomial kernel, and radial basis function.

A. Pre-processing
Pre-processing is an essential step that provides data cleansing useful for further analysis. Here, a standard pre-processing method is used. Outliers are first removed to increase the quality of the dataset. The dataset is then examined for missing values, which are filled with mean values. The dataset is then passed to the feature selection stage, where the main features are extracted.
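As a minimal sketch of this step (assuming a pandas DataFrame of samples × genes; the z-score outlier rule and its threshold are illustrative assumptions, since the paper does not specify the outlier criterion):

```python
import numpy as np
import pandas as pd

def preprocess(df, z_thresh=3.0):
    """Remove outlier samples, then mean-impute missing values."""
    # Flag a sample as an outlier if any gene's z-score exceeds the threshold.
    z = ((df - df.mean()) / df.std(ddof=0)).abs()
    cleaned = df.loc[z.fillna(0.0).le(z_thresh).all(axis=1)]
    # Fill remaining missing expression values with the column (gene) mean.
    return cleaned.fillna(cleaned.mean())
```

The outlier step runs before imputation, matching the order described above.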

B. Feature Selection and Extraction
The proposed study introduces the optimized kernel principal component analysis (OKPCA) to identify the dataset's best features. This technique chooses the most important features from the dataset and ignores the rest, and is designed to decrease the dimensionality of the data.

1) Optimized kernel principal component analysis:
Principal component analysis, which finds recurring patterns in the dataset with little information loss, is frequently used to reduce complex spectral datasets into understandable information. The strength and flexibility of principal component analysis are greatly enhanced by its clarity and conciseness. The important property of PCA is that it is a linear transformation, so it can be written in the simple matrix form

B = AH    (1)

where B is the transformed data matrix, A is the original data matrix, and H is the transformation matrix. H is built from the eigenvectors h_i of the covariance matrix of A, whose necessarily non-negative eigenvalues λ_i are arranged in decreasing order.
The transformation matrix is obtained by stacking the eigenvectors, as shown in equation (2):

H = [h_1  h_2  …  h_N]    (2)
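The PCA transform described above can be sketched as follows; the eigendecomposition of the covariance matrix supplies the stacked eigenvectors of equation (2), and the projection applies equation (1):

```python
import numpy as np

def pca_transform(A, n_components):
    """PCA as the linear map B = A H of equation (1): H stacks the
    leading eigenvectors of the covariance matrix of A (equation (2))."""
    A_centered = A - A.mean(axis=0)
    cov = np.cov(A_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # sort into decreasing order
    H = eigvecs[:, order[:n_components]]            # stacked eigenvectors
    return A_centered @ H, eigvals[order]
```

`numpy.linalg.eigh` returns eigenvalues in ascending order, so they are explicitly reversed to match the decreasing-order convention used here.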
The enhanced version of PCA, known as kernel principal component analysis (KPCA), can handle non-linear correlations between variables. It employs a non-linear (kernel) function to map the observed data into a high-dimensional space. For a data matrix with N columns (variables) and M rows (observations), KPCA uses a non-linear mapping function, which can be written as shown in equation (3):

Φ : x → Φ(x) ∈ f    (3)

where f is the feature space. In the kernel technique, the kernel function and kernel matrix are defined as k(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩ and K_ij = k(x_i, x_j), respectively. This kernel-based strategy best discovers the appropriate kernel parameter by generalizing the problem to an eigenvector one. The scatter error metric is used as the objective function of the problem.
The defined objective function yields the gradient ∇f and Hessian matrices, and the gradient values are used to tune the kernel parameter of the method. The features chosen using the OKPCA technique are then given to the hybrid flow directed arithmetic SVM based multiple indefinite kernel learning (FDASVM-MIKL) classification framework.
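A minimal KPCA sketch along these lines is given below; the gradient/Hessian-based tuning of the kernel parameter that distinguishes OKPCA is omitted, and the RBF kernel with a fixed `gamma` is an assumption for illustration:

```python
import numpy as np

def rbf_kernel_matrix(X, gamma):
    """Pairwise RBF kernel K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kpca(X, n_components, gamma=0.1):
    """Kernel PCA: centre the kernel matrix in feature space and
    project the data onto its leading eigenvectors."""
    K = rbf_kernel_matrix(X, gamma)
    n = K.shape[0]
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one      # double centring
    eigvals, eigvecs = np.linalg.eigh(Kc)           # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    alphas = eigvecs[:, order] / np.sqrt(np.maximum(eigvals[order], 1e-12))
    return Kc @ alphas                              # projected features
```

In the optimized variant, `gamma` would be the parameter adjusted from the gradient of the scatter-error objective rather than fixed in advance.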

C. Classification using FDASVM-MIKL
The proposed hybrid Flow Directed Arithmetic SVM based multiple indefinite kernel learning (FDASVM-MIKL) framework is used for classification. This framework combines an SVM classifier, the FDA and the AOA, which are described in the sections below.

1) SVM based multiple indefinite kernel learning:
Various kernels are incorporated into the SVM architecture to improve overall performance, including the polynomial kernel, linear kernel, sigmoid kernel, and radial basis function. The input layer, hidden layer, SVM kernel layers, SVM output layer, and voting layer form the original SVM based multiple indefinite kernel learning architecture. An additive kernel model enhances the functionality of a standard kernel model; it is obtained as a weighted linear sum of kernels.
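The additive model can be sketched directly on precomputed kernel matrices; the weights here are placeholders (in the proposed framework they would be tuned during learning):

```python
import numpy as np

def combined_kernel(kernel_mats, weights):
    """Additive multiple-kernel model: K = sum_m w_m * K_m. With an
    indefinite base kernel (e.g. sigmoid), K need not be positive
    semi-definite, hence *indefinite* kernel learning."""
    return sum(w * np.asarray(K) for w, K in zip(weights, kernel_mats))
```

The combined matrix can then be handed to any SVM solver that accepts a precomputed kernel.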
a) Radial basis function (RBF): RBF kernels are the most commonly used kind of kernel owing to their similarity to the Gaussian distribution. The RBF kernel function measures the degree of similarity, or proximity, between two points X_1 and X_2. The kernel is given by the following equation:

K(X_1, X_2) = exp(−‖X_1 − X_2‖² / (2σ²))

where σ is the variance hyperparameter and ‖X_1 − X_2‖ is the Euclidean distance between the two points X_1 and X_2.
b) Sigmoid kernel: The sigmoid kernel function acts as an activation function for artificial neurons and is similar to a two-layer perceptron neural network architecture. It is defined in equation (7):

K(X_1, X_2) = tanh(γ X_1ᵀX_2 + c)    (7)

This kernel is defined by the hyperbolic tangent, tanh. It can express intricate non-linear interactions when used with correctly calibrated parameters. However, it is not a true kernel, since the sigmoid function may not be positive definite for some parameter choices.
c) Polynomial kernel: The polynomial kernel is a kernel function used with SVMs and other kernel models that represents the similarity of vectors (training samples) in a feature space via polynomials of the original variables, allowing non-linear methods to be learned:

K(X_1, X_2) = (X_1ᵀX_2 + c)^P

where P is the kernel parameter (degree). It expresses the similarity of vectors in the training dataset in a feature space over polynomials of the original variables used in the kernel.

d) Linear kernel: A linear kernel, K(X_1, X_2) = X_1ᵀX_2, is used in SVM when the data is linearly separable, i.e. can be divided by a single line. It is typically employed when a given data set contains many features.
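The four base kernels can be written compactly as below; the parameter defaults are illustrative, not values from the paper:

```python
import numpy as np

def rbf(x1, x2, sigma=1.0):
    # K(X1, X2) = exp(-||X1 - X2||^2 / (2 * sigma^2))
    return np.exp(-np.sum((x1 - x2) ** 2) / (2.0 * sigma ** 2))

def sigmoid(x1, x2, gamma=0.01, c=0.0):
    # K(X1, X2) = tanh(gamma * <X1, X2> + c); not always positive definite
    return np.tanh(gamma * np.dot(x1, x2) + c)

def polynomial(x1, x2, c=1.0, p=3):
    # K(X1, X2) = (<X1, X2> + c)^P
    return (np.dot(x1, x2) + c) ** p

def linear(x1, x2):
    # K(X1, X2) = <X1, X2>
    return np.dot(x1, x2)
```

Evaluating any of these over all sample pairs yields one of the base kernel matrices entering the weighted sum of the additive model.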

D. Hybrid Flow Directed Arithmetic SVM
The SVM classifier categorizes the diseases in the data, and the hybrid FDA with the AOA method is used to optimize each kernel parameter. Performance evaluations are then conducted to determine the effectiveness of the proposed methods across a variety of datasets. Furthermore, analyses of gene expression data in various forms demonstrate how the method handles heterogeneity. The Support Vector Classification regularization parameter C takes a strictly positive value; this regularization parameter is optimized using the hybrid FDA-AOA.
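A plausible fitness function for this tuning step is cross-validated accuracy at a candidate C; this sketch uses scikit-learn and is an assumption, since the paper does not state the exact objective:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def fitness(C, X, y):
    """Score a candidate regularization parameter C by cross-validated
    accuracy; the FDA-AOA hybrid would search for the C maximizing this."""
    clf = SVC(C=max(float(C), 1e-6), kernel="rbf")  # C must be strictly positive
    return cross_val_score(clf, X, y, cv=3, scoring="accuracy").mean()
```

Clipping C at a small positive floor enforces the strict-positivity constraint when an optimizer proposes a non-positive candidate.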

1) Flow direction algorithm:
The FDA is modeled on the direct runoff flow in drainage basins. It calculates flow velocity based on each individual slope, along which flow falls steeply toward its near neighbors. The FDA introduces new tools for performing optimization: the neighborhood radius decreases from high to low values, and a basin-filling technique helps in escaping local solutions. The FDA applies the relationship below to determine the initial position of the flows:

Flow_i = vc + r (mc − vc)

where Flow_i denotes the location of the i-th flow, vc and mc denote the lower and upper limits of the decision variables, and r denotes a uniformly distributed random number between zero and one. Additionally, it is assumed that each flow is surrounded by one or more neighborhoods, whose positions are determined by the relationship

Neighbor_j = Flow_i + randn · Δ

where Neighbor_j represents the neighbor at the j-th position and randn is drawn from the normal distribution with mean 0 and standard deviation 1.
Large values of the parameter Δ produce searching over a large range, while small values limit searching to a small range. As iteration count increases, Δ is lowered toward zero, closing the gap between global and local search; without this decay, the local search would not work. The new position of each flow is then determined by moving toward the neighbor with the lowest height, in proportion to the slope between them, using a uniformly distributed random vector. The FDA method starts with an initial population spread over the search space, or drainage basin. The flows then shift to lower-height positions, and the best outcome is the output point with the lowest height.
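A simplified sketch of this loop is given below (neighbor generation with a shrinking radius and downhill moves only; the full velocity and slope terms of the FDA are omitted, and the radius schedule is an assumption):

```python
import numpy as np

def fda_minimize(f, lb, ub, n_flows=20, n_neighbors=4, iters=100, seed=0):
    """Simplified Flow Direction Algorithm: each flow generates neighbors
    in a shrinking radius and runs downhill to the lowest one."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = len(lb)
    # Initial positions: Flow_i = vc + r * (mc - vc), r ~ U(0, 1)
    flows = lb + rng.random((n_flows, dim)) * (ub - lb)
    cost = np.array([f(x) for x in flows])
    for t in range(iters):
        delta = (1 - t / iters) * 0.1 * (ub - lb)   # shrinking neighborhood radius
        for i in range(n_flows):
            # Neighbor_j = Flow_i + randn * delta
            nbrs = np.clip(flows[i] + rng.normal(size=(n_neighbors, dim)) * delta,
                           lb, ub)
            ncost = np.array([f(x) for x in nbrs])
            j = ncost.argmin()
            if ncost[j] < cost[i]:                  # flow runs to lower ground
                flows[i], cost[i] = nbrs[j], ncost[j]
    b = cost.argmin()
    return flows[b], cost[b]
```

The decaying `delta` reproduces the high-to-low neighborhood radius described above: wide early exploration, fine late refinement.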

2) Arithmetic Optimization Algorithm (AOA):
AOA is a meta-heuristic algorithm that exploits the distribution behavior of four main arithmetic operators in mathematics: multiplication, subtraction, division and addition. To carry out the optimization processes in various search areas, AOA is mathematically modeled and put into action. This metaheuristic technique uses population data to solve optimization problems without computing derivatives. Initialization, exploration, and exploitation are the three important phases of the optimization process.
a) Initialization phase: In each iteration of the AOA optimization process, the best solution obtained so far is regarded as the best candidate solution. The optimization procedure starts with a randomly generated collection of candidate solutions S, arranged as a matrix. Before the search begins to function, AOA should select its working phase (exploitation or exploration). The Math Optimizer Accelerated (MOA) coefficient, derived from equation (18) and updated in the subsequent search phases, controls this choice:

MOA(t) = Min + t × ((Max − Min) / T)    (18)

where MOA(t) is the function value at the t-th iteration, T is the maximum number of iterations, and Min and Max are the minimum and maximum values of the accelerated function. The Math Optimizer Probability (MOP) is given by equation (19):

MOP(t) = 1 − (t / T)^(1/α)    (19)

b) Exploration phase: The exploration operators of AOA (Division and Multiplication) search the space widely, producing highly dispersed candidate values.

c) Exploitation phase: The exploitation approach of AOA is described in this section. Among the arithmetic operators, mathematical expressions using subtraction or addition produce very dense outcomes, which relate to the exploitation search process. As a result, the exploitation search finds the near-ideal answer after numerous attempts (iterations). The exploitation operators of AOA (Addition and Subtraction) investigate the search area systematically in dense regions and approach the better result via two major search strategies, the Addition and Subtraction strategies, modeled in equation (22). A deep search is used to fully exploit the search space. The second operator, addition, is not considered until the first operator, subtraction, finishes in this phase (first rule in equation (22)), conditioned on a random number; otherwise, subtraction is replaced with the addition operator to complete the current task. The partitions in this phase are analogous to those of the previous phase.
The AOA procedure can be summarized as the following pseudocode:

Initialize the AOA parameters α and μ.
Initialize the positions of the candidate solutions at random.
while the stopping criterion is not met do
    Compute the fitness function for each given solution.
    Find the best solution so far.
    Update the MOA value using equation (18) and the MOP value using equation (19).
    for i = 1 to solutions do
        Create random values a, b and c between [0, 1].
        if a > MOA then                (exploration)
            if b > 0.5 then apply the Division operator
            else apply the Multiplication operator
        else                           (exploitation)
            if c > 0.5 then apply the Subtraction operator
            else apply the Addition operator
        Update the i-th position of the solution using equation (22).
return the best solution.
In AOA, generating a randomized set of populations is the first step in the optimization process. Every solution improves its position in relation to the best solution found. The parameter MOA is increased from 0.2 to 0.9 to balance exploration and exploitation. When a > MOA, the candidate solutions attempt to diverge from the near-optimal result (exploration), and when a < MOA they attempt to converge to the near-optimal result (exploitation).
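The loop above can be sketched as follows; α = 5 and μ = 0.5 are the commonly used AOA defaults and are assumptions here, as is the greedy replacement rule:

```python
import numpy as np

def aoa_minimize(f, lb, ub, n_sol=20, iters=200, alpha=5, mu=0.5, seed=0):
    """Sketch of AOA: MOA grows from 0.2 to 0.9, so early iterations
    favour exploration (division/multiplication) and later ones
    exploitation (subtraction/addition)."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = len(lb)
    S = lb + rng.random((n_sol, dim)) * (ub - lb)   # random initial solutions
    cost = np.array([f(x) for x in S])
    best, best_cost = S[cost.argmin()].copy(), cost.min()
    eps = np.finfo(float).eps
    for t in range(1, iters + 1):
        moa = 0.2 + t * (0.9 - 0.2) / iters         # Math Optimizer Accelerated
        mop = 1 - (t / iters) ** (1 / alpha)        # Math Optimizer Probability
        for i in range(n_sol):
            x = S[i].copy()
            for j in range(dim):
                a, b, c = rng.random(3)
                scale = mu * (ub[j] - lb[j]) + lb[j]
                if a > moa:                         # exploration phase
                    x[j] = best[j] / (mop + eps) * scale if b > 0.5 \
                        else best[j] * mop * scale
                else:                               # exploitation phase
                    x[j] = best[j] - mop * scale if c > 0.5 \
                        else best[j] + mop * scale
            x = np.clip(x, lb, ub)
            fx = f(x)
            if fx < cost[i]:                        # greedy replacement
                S[i], cost[i] = x, fx
        if cost.min() < best_cost:
            best, best_cost = S[cost.argmin()].copy(), cost.min()
    return best, best_cost
```

In the proposed framework, `f` would be the (negated) classification fitness of the SVM at a candidate regularization parameter.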

IV. RESULTS AND DISCUSSION
This section describes the experimental outcomes of the proposed FDASVM-MIKL classification. The performance of the proposed work is evaluated using a simulation tool in Python. Some of the simulation parameters are given in Table II.

TABLE II. SIMULATION PARAMETERS

Parameter                  Value
Regularization parameter   1.0
Kernel functions           Linear kernel, sigmoid kernel, polynomial kernel and radial basis function
Iterations                 1000
Intercept scaling          1.0
Fit intercept              True
Random state               None

Several current approaches are examined to assess the performance of the proposed classification. The next subsections provide descriptions of the dataset, definitions of various performance metrics, analyses, and comparisons.

A. Dataset Description
The data used for assessing the performance of gene expression classification through the FDASVM-MIKL based approach is gathered from the Colon, Prostate_GE, Isolet, Lung_cancer, ALLAML and snp2graph datasets; the download link of each dataset is given below:

1) Colon dataset (http://biogps.org/dataset/tag/colon/): A well-known dataset for cancer expression data analysis. It includes seven criteria and 90 samples, and is based on the human species.

B. Performance Metrics
Various performance metrics, including accuracy, F-measure, sensitivity, specificity, and recall statistics, are taken into consideration while evaluating the effectiveness of the proposed gene expression data classification. These metrics are defined by their standard mathematical expressions.
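The metrics can be computed from confusion-matrix counts, where TP, TN, FP and FN denote true/false positives/negatives:

```python
def metrics(tp, tn, fp, fn):
    """Standard classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # recall is also called sensitivity
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f_measure
```

For multi-class results such as the six-dataset comparison here, these would be computed per class and averaged.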

C. Performance Analysis
The analysis considers the main performance metrics, including accuracy, F-measure, recall, sensitivity and specificity, when comparing the performance of the proposed and current techniques. The explanation of the performance includes a description and a graphical representation. The Colon, Isolet, ALLAML, Lung cancer, Prostate and Snp2 datasets are used to classify gene expression into six categories. The comparison of Accuracy, Specificity, Precision, F1-score and Recall, with their values, is represented in Table III. The accuracy of the proposed approach is 99.60% on the ALLAML dataset, 99.95% on the Colon dataset, 99.63% on the Isolet dataset, 99.51% on the Lung cancer dataset, 99.71% on the Prostate dataset, and 99.79% on the Snp2 dataset. The performance values of Specificity, Precision, F1-score and Recall are also given in the table. The performance examination of the proposed Accuracy, Sensitivity, Precision, F1-score, Specificity and Recall is given in Fig. 3; the Accuracy, Specificity, Precision, Recall and F1-score performance is greater than that of the existing methods. Table IV shows the accuracy performance comparison of the existing and proposed approaches using the Colon dataset.
The accuracy of the proposed FDASVM-MIKL method is 99.95%. The existing methods include DNN, Improved DNN, CNN, and RNN, with accuracy performances of 91.4%, 91.4%, 82.8% and 84%, respectively. Compared with the existing methods, the proposed FDASVM-MIKL approach has a better accuracy outcome. The performance comparison of the proposed and existing methods using the Colon dataset is represented graphically in Fig. 4. Table V presents an accuracy comparison of the proposed and current approaches using the Isolet dataset.
The proposed FDASVM-MIKL approach has an accuracy value of 99.63%. The accuracy performance of the existing methods, including SVM with multiplicative kernel combination (GKML), ElasticNet-SVM, multiple indefinite kernel learning based FS (MIK-FS), and SVM with l1-norm regularizer (l1-SVM), is 96.01%, 81.589%, 88.03%, and 94.86%, respectively. The proposed FDASVM-MIKL approach has improved accuracy performance compared to the current methods. The graphical comparison of the proposed and current approaches on the Isolet dataset is shown in Fig. 5. Table VI represents the accuracy comparison of the proposed and existing methods using the Prostate dataset. The accuracy of the proposed FDASVM-MIKL approach is 99.71%. The accuracy performance of the existing methods, including DNN, Improved DNN, CNN, and RNN, is 89.2%, 93.2%, 89.2%, and 82.4%, respectively. The proposed FDASVM-MIKL approach has improved accuracy performance compared to the current methods. The performance study of the proposed and current approaches utilizing the Prostate dataset is visually depicted in Fig. 6.
The graphical comparison of the proposed method with the known approaches is shown in Fig. 6. Using the ALLAML dataset, Table VII compares the accuracy of the proposed and current approaches.
The accuracy of the proposed FDASVM-MIKL technique is 99.60%. The existing methods, including rMRMR-MGWO, Random Forest, Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Nets, and Decision Tree, have accuracy performances of 98.3%, 48.6%, 87.5%, 98.7%, and 83.3%, respectively. The result shows that the accuracy performance of the proposed FDASVM-MIKL methodology is greater than that of the existing approaches. Fig. 7 illustrates the performance analysis of the proposed and current approaches using the ALLAML dataset.
The graphical comparison of the proposed technique with the known methods is shown in Fig. 7. Using the Lung cancer dataset, Table VIII compares the accuracy of the proposed and current approaches. The accuracy of the proposed FDASVM-MIKL approach is 99.51%. The accuracy performance of the existing methods, including the robust Minimum Redundancy Maximum Relevancy-Grey Wolf Optimizer algorithm (rMRMR-MGWO), Random Forest, Elastic Nets, Least Absolute Shrinkage and Selection Operator (LASSO) and Decision Tree, is 97.5%, 79.3%, 91.6%, 97.5% and 87.6%, respectively. The proposed FDASVM-MIKL approach has improved accuracy performance compared to the current methods. The performance study of the proposed and current approaches utilizing the Lung cancer dataset is visually depicted in Fig. 8. Fig. 9 plots the training, testing and validation losses of the proposed approach on the provided dataset as a function of epoch number; various epoch counts are employed in the comparison. The error rate of the proposed method for each dataset is given in Table IX.

V. CONCLUSION AND FUTURE SCOPE
This paper proposes a unique method for the analysis of expression data based on multiple indefinite kernel learning, in which OKPCA and the hybrid FDA-AOA are also employed. The Python platform is used to carry out the proposed strategy. The evaluation results are taken into account for various types of data sources, and classification accuracy is used to determine the performance. The accuracy of the proposed technique is 99.95% on the Colon dataset, 99.63% on the Isolet dataset, 99.60% on ALLAML, 99.51% on the Lung cancer dataset, 99.71% on the Prostate dataset and 99.79% on the Snp2 dataset. It is clear from these results that the Colon dataset yields the best performance. The comparative results demonstrate that the proposed strategy is more accurate than the other existing methods. In the future, this study will be extended to include improved techniques and advanced classification approaches.