SVM for Diabetes Diagnosis Using Ontological Feature Weights (OFW) and Intensified Tabu Search (ITS) for Optimization

With today's rapidly developing technology, nearly every field is benefiting from machines alongside human effort, and many advances have become possible as a result. This technology is likewise gaining importance in bioinformatics, especially for data analysis. Diabetes is one of the deadliest diseases prevalent today. In this paper we therefore apply LS-SVM classification to blood-test datasets to determine which patients are at risk of diabetes. Given a patient's details, the model predicts whether the patient is likely to develop the disease, so that measures can be taken to treat or prevent it. Within this method, an intensified Tabu search model is proposed to optimize the classifier's parameters.


I. INTRODUCTION
At present, diabetes has no cure. It is caused by a lack of insulin, which works together with glucose in the body and must be supplied externally when the body's own production fails. Indirectly, diabetes is a major cause of fatal heart, kidney, eye, and nerve diseases, which can be mitigated or prevented by good eating habits and physical exercise [1]. Beyond all this, the difficult task is disease diagnosis and the interpretation of diabetes data. For this we use the Support Vector Machine (SVM), developed by Vapnik [2]. Its performance has been validated in many studies [3][12][4]. Its greatest advantage is that it works well even with nonlinear functions, and its Radial Basis Function (RBF) kernel is more precise than the polynomial and linear kernel functions. Comparisons of SVM with other methods such as Combined Neural Networks (CNNs), Mixture of Experts (MEs), Multilayer Perceptrons (MLPs), and Probabilistic Neural Networks (PNNs) have also shown that SVM-based methods perform best.
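As a concrete illustration of the RBF-kernel SVM mentioned above, a minimal sketch using scikit-learn on synthetic two-class data (not the diabetes dataset) could look like the following; the blob centers and parameters are made-up values:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two Gaussian blobs stand in for the two diagnostic classes
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# RBF kernel handles nonlinear decision boundaries
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
acc = clf.score(X, y)
```

On well-separated blobs like these, the training accuracy is close to 1; on real diabetes data the kernel parameters would need tuning, which is exactly what ITS addresses later in the paper.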
Feature selection is applied when estimating the diabetes features. By this analysis, the 8 original features can be reduced to 4, since feature selection removes the factors not relevant to the feature set [5][6][7][8].
Principal Component Analysis (PCA) is a feature selection method that has recently gained importance, with applications in image recognition, signal processing, face recognition, and related areas.
When SVM is applied to disease datasets, it can take in a large range of features, whether or not they are relevant to diagnosing the disease. With such variation among the features, diagnosis will not be accurate, so weighting factors must be introduced. Such weights were contributed by Zhichao Wang [9], who assigns them according to each feature's ontological relevance.
Finally, the LS-SVM technique depends on 2 parameters for accurate results. Given the many datasets and values produced by the SVM, the choice of these 2 parameters is very important: values set too aggressively cause patterns in the data to be missed through overfitting, while overly conservative choices lead to underfitting [6,13]. Optimized values for the 2 parameters can be found with Intensified Tabu Search (ITS) [14].
The method involves 3 phases. PCA, discussed above, removes irrelevant evidence from the SVM's input. OFW then computes a weight for each factor that PCA deemed relevant. Finally, ITS finds the best 2 parameters for the SVM so that the model neither underfits nor overfits.
As a quick overview of the paper: Section II presents the initial diabetes datasets, Section III our first step of PCA reduction, Section IV the weighted preferences, and Section V OFW, followed by the experimental results in the later sections.

II. DATASET OVERVIEW
The initial datasets are gathered from the UCI Machine Learning Repository [16]. The data comprises 8 attributes and 768 instances, which is a fairly large database. The attributes chosen from these data may be either discrete or continuous over an interval [17]. The attributes of this database are the following:

III. FEATURE SELECTION
The first method applied to reduce the database is feature selection. It reduces the complexity of the data, leaving fewer features and allowing greater precision. PCA then supports the subsequent classification with the help of statistical measures. PCA simplifies the data as follows. Let D be an n-dimensional dataset.
The M principal axes a_1, a_2, ..., a_M are orthogonal. The covariance matrix is

S = (1/L) Σ_{k=1}^{L} (x_k − m)(x_k − m)^T    (1)

where m is the average of the samples and L is the number of samples. Therefore

q_k = A^T (x_k − m),  k = 1, ..., L    (2)

where q_k = (q_{k1}, ..., q_{kn}) are the principal components of x_k and A = [a_1, ..., a_M].
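The PCA steps just described (sample mean, covariance matrix, projection onto the top principal axes) can be sketched directly in NumPy; the random matrix below is only a stand-in for the 8-attribute diabetes data, and keeping 4 components mirrors the 8-to-4 reduction mentioned in the introduction:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))        # L = 100 samples, n = 8 features

m = X.mean(axis=0)                   # sample mean m
S = (X - m).T @ (X - m) / len(X)     # covariance matrix S, eq. (1)
eigvals, eigvecs = np.linalg.eigh(S) # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]    # sort eigenvalues descending
A = eigvecs[:, order[:4]]            # keep M = 4 principal axes
Q = (X - m) @ A                      # projected components q_k, eq. (2)
```

`Q` then holds the reduced 4-dimensional representation that the later classification stages would consume.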
LS-SVM: Throughout this paper we have discussed the key idea of the SVM introduced by Vapnik [2], which plays the main role in handling the wide database of our problem. It is also used to solve pattern recognition and classification problems. The kernels available in SVM beyond the polynomial and linear ones are its greatest asset, and they have made it a leading global model built on the structural risk minimization principle [19]. Although SVM sounds simple given its wide-ranging results, finding the solution is difficult, and in general only sparse solutions can be found; the difficulty arises from solving nonlinear equations. As a solution, Suykens and Vandewalle [20] introduced the least-squares SVM, which results in linear equations. For this new type of SVM, the further steps such as PCA and OFW, and its use in quantification and classification, remain applicable and have been reported in several works [23,24].
To obtain the linear model y = w^T x + b from the regressors x and the dependent variable y, the minimized cost function is

min J(w, e) = (1/2) w^T w + (g/2) Σ_{i=1}^{N} e_i^2    (3)

subject to: y_i = w^T φ(x_i) + b + e_i,  i = 1, ..., N.

The formula has two parts: the first is the weight decay term, which regularizes the weights, and the second is the regression error over the training data; the trade-off parameter, indicated by g, is to be set by the user.
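As an illustrative sketch of the point of Suykens and Vandewalle's formulation, that LS-SVM training reduces to solving one linear system rather than a quadratic program, the following assumes an RBF kernel and toy two-class data; all values here are made up:

```python
import numpy as np

def lssvm_train(X, y, gamma=1.0, sigma=1.0):
    """Solve the LS-SVM dual: a single (n+1)x(n+1) linear system."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))        # RBF kernel matrix
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma         # Omega + I/gamma
    rhs = np.concatenate([[0.0], y])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                    # bias b, multipliers alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    return np.sign(K @ alpha + b)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
b, alpha = lssvm_train(X, y, gamma=10.0)
pred = lssvm_predict(X, b, alpha, X)
```

Note that unlike a standard SVM, every training point typically gets a nonzero multiplier here; the gamma and sigma values are exactly the two parameters the paper later tunes with ITS.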
For a model that generalizes well, the most important criterion is the proper selection of the parameters for the RBF and polynomial kernels.

A. Feature Weight Calculation
The process of computing the domain ontology features and ontology feature weights is as follows:
a. Each characteristic of the information is individually treated as a semantic category and is considered an ontology semantic peer. The characteristics are grouped based on their semantic principles.
b. The overall relevancy of a feature is used to calculate the weight of that characteristic in the ontology tree.

B. Domain Ontology-feature Graph
We construct an ontology-feature graph with respect to a particular column of information in order to represent the domain knowledge model. There are three layers in the graph:
i. Concept layer
ii. Attribute layer
iii. Data-type layer
Let us discuss them in detail.

Concept layer:
The first layer contains all the concepts of the ontology, called ontology concepts. Each is explained by the attribute nodes and the remaining elements of the concept layer. It can be represented as: Ontology-Concept = {Cpt.1, Cpt.2, ..., Cpt.n_cpt}.
For each layer, an object is considered as a node.

Attribute layer:
The second layer, the attribute layer, explains the nodes of the concept layer, i.e., the ontology attributes, following the regulations of the characteristic set.

Data Type Layer:
This layer explains the nodes of the attribute layer following the regulations of the metadata layer, i.e., the Ontology-Data type.
Ontology-Data type = {Dt.1, Dt.2, ..., Dt.n_dt}. In Figure 1, the solid lines show the relative characteristics of the concept semantic layer and attribute layer, whereas the dotted lines show the data type layer nodes individually. The characteristics of the information source and the logical model of database storage in the domain ontology feature graph are used to compose the data object ontology node. The latter is used to construct the nodes of the concept layer of the domain ontology characteristic graph based on its design pattern. Generally, the data object ontology node is constructed from the rest of the data source of the ontology graph. The database is composed using the ontology principle as the primary rule, together with the principle layer nodes of the graph, which are constructed from the database.
The ontology attribute layer is constructed from the attribute elements, which are the characteristics of the concept layer. The data type layer is used to convert the data types of the attributes to the semantic extension type. The correlation between the concept layer and attribute layer nodes can be described and computed based on the considered principle layers. The correlation between the principle and attribute layer nodes is computed as follows:

Domain Ontology Feature Tree:
It is mainly used to represent the relationships among the nodes of the attribute layer and the concept layer and their characteristics (as represented in the domain ontology characteristic graph). We also use it to compute the correlation between ontology-concepts and ontology-attributes.

The domain ontology characteristic tree can be expressed with triples as
Ontology-Tree = {<Cpt.root, partOf, Cpt.1>, <Cpt.root, partOf, Cpt.2>, <Cpt.1, propertyOf, Ab.1>, ...}
In this tree, the leaf nodes are the characteristics of the domain ontology, whereas the branch nodes are represented by concepts. The correlation discussed above can then be calculated as:

Correlation(Ab.i, Ab.j) = α · DataType(Ab.i, Ab.j) + β · (1 − Distance(Ab.i, Ab.j) / (2 · Max(Height(Ab.i), Height(Ab.j))))

where Height(Ab.i) and Height(Ab.j) refer to the hierarchy depths of characteristics Ab.i and Ab.j in the tree.
The Boolean function DataType(Ab.i, Ab.j) compares the data types of features Ab.i and Ab.j. Distance(Ab.i, Ab.j) is the shortest path between the elements. Max(Height(Ab.i), Height(Ab.j)) refers to the maximum depth in the tree. α and β are adjustable parameters with 0 < α, β < 1; the values of these parameters do not affect the relative ordering among attributes, but the results can be improved by choosing suitable parameters through experiments. From the above correlations, the weight of a characteristic can then be computed as

Weight(Ab.i) = (1/n) Σ_j Correlation(Ab.i, Ab.j)

In a conventional LS-SVM, the decision function is formed with equal contributions from all the characteristics. In practice, however, the various characteristics play different roles with different weights. Different contributions from different characteristics can therefore be obtained using the theory proposed by Zhichao Wang [9].
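As a hedged illustration of the feature-weight computation (the exact formula in the source may differ), the sketch below assumes a feature's weight is its normalized average ontology correlation with the other features; the feature names and correlation values are invented for the example:

```python
def feature_weights(corr):
    """corr: dict mapping (feature_i, feature_j) -> correlation in [0, 1].
    Returns per-feature weights normalized to sum to 1."""
    feats = sorted({f for pair in corr for f in pair})
    raw = {f: sum(corr.get((f, g), corr.get((g, f), 0.0))
                  for g in feats if g != f) / (len(feats) - 1)
           for f in feats}
    total = sum(raw.values())
    return {f: v / total for f, v in raw.items()}

# Hypothetical pairwise ontology correlations between three attributes
corr = {("Glucose", "Insulin"): 0.9,
        ("Glucose", "Age"): 0.3,
        ("Insulin", "Age"): 0.2}
w = feature_weights(corr)
```

A feature that is strongly correlated with the rest of the ontology (here "Glucose") ends up with the largest weight, which is the behavior the weighted LS-SVM relies on.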

Intensified Tabu Search (ITS) for Characteristic Selection
As noted, the BDF (binary decision function) chooses characteristics based on improvement of the recognition rate. On examination, we found that the number of support vectors in the BDF increases with the size of the problem [26]. This property is attractive for producing fast and accurate decision methods, but only if it is tied to an improvement in the recognition rate.
The support vectors and certain additional characteristics are mainly used to obtain a fast and accurate SVM BDF. For this reason, to resolve the conflict between complexity and performance, the Decision Function Quality (DFQ) criterion is used in association with regularization theory. The SVM is thus trained from scratch on a small dataset St′ that stands in for the primary training set St, which reduces the ambiguity related to the BDF. The primary set is itself further optimized using the LBG algorithm under certain assumptions. The basic assumption is to treat the parameter k as a variable of the model selection problem, since a single k may not be able to handle all kinds of prototypes generated by the LBG algorithm during the process.
Hence, the value of k (i.e., the level of simplification), the characteristic subset β, the regularization constant C, and the kernel attributes (σ for the Gaussian kernel) must be selected for every kernel method K using the model selection method. If we denote a model by θ, then k_θ, β_θ, C_θ, and σ_θ respectively represent the attributes discussed so far. Moreover, q(θ) represents the DFQ criterion for a model θ (cf. Section 3.1).
Section 3.3 deals with the estimation of the DFQ criterion from a learning set S_l, i.e., q(θ) ≡ SVM-DFQ(θ, S_l), which is to be optimized over models θ. Since finding the exact optimum θ* of q(θ) is not tractable, we define a TS procedure for model selection with intensification and diversification strategies.

Decision Function Quality (DFQ):
For smooth computation, we need the DFQ of the model θ at hand. It is obtained from the recognition rate R_R and the complexity C_P of the decision function h_θ. Thus

q(θ) = R_R(h_θ) − C_P(h_θ)

is the DFQ [25].
Here, the criterion balances a fitting term, the recognition rate R_R, against a smoothness term C_P that represents the model complexity of an SVM BDF [25]:

C_P(h_θ) = cp1 · log2(n_SV) + cp2 · log2(cost(β))

The parameters cp1 and cp2 set the trade-off between classification-rate improvement and complexity reduction. β is a Boolean vector whose size n is the number of represented features, and k_i denotes the cost of the i-th feature; the cost associated with the subset of selected features is

cost(β) = Σ_{i=1}^{n} β_i · k_i

When those costs are unknown, k_i = 1 is used for all features.
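The DFQ trade-off can be illustrated with a short sketch, assuming the log2 form of the complexity term from [25]; the cp1/cp2 constants, support-vector counts, and feature masks below are made-up values:

```python
import math

def dfq(recognition_rate, n_support_vectors, beta, costs=None,
        cp1=0.25, cp2=0.25):
    """q(theta) = RR(h) - CP(h), with CP built from the SV count
    and the cost of the selected feature subset."""
    if costs is None:
        costs = [1.0] * len(beta)           # k_i = 1 when costs are unknown
    feat_cost = sum(k for k, b in zip(costs, beta) if b)
    cp = cp1 * math.log2(n_support_vectors) + cp2 * math.log2(feat_cost)
    return recognition_rate - cp

# At equal accuracy, a model with fewer SVs and fewer features scores higher
q_small = dfq(0.95, n_support_vectors=16, beta=[1, 1, 0, 0])
q_large = dfq(0.95, n_support_vectors=256, beta=[1, 1, 1, 1])
```

This is the mechanism that lets the tabu search prefer compact decision functions over equally accurate but heavier ones.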

Simplification Step:
Reducing the training set size is the simplest way to reduce the complexity of an SVM. The LBG algorithm [25] is used to simplify the dataset. The simplification details are in the table below and are used in the following discussion: SYNOPSIS OF THE SIMPLIFICATION STEP
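The simplification step can be approximated by a small LBG/k-means-style prototype extraction; this is a stand-in sketch rather than the paper's exact algorithm, with the prototype count and data invented for illustration:

```python
import numpy as np

def simplify(X, n_proto=4, iters=20, seed=0):
    """Replace a set of samples by a small codebook of prototypes,
    shrinking the set the SVM would be trained on."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_proto, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)                 # nearest prototype per sample
        for k in range(n_proto):
            pts = X[assign == k]
            if len(pts):
                centers[k] = pts.mean(0)     # move prototype to cell mean
    return centers

X = np.random.default_rng(3).normal(size=(200, 2))
protos = simplify(X)
```

In the paper's setting this would be applied per class, and the simplification level k (here `n_proto`) is one of the variables the tabu search optimizes.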

DFQ estimation
The Decision Function Quality (DFQ) [25] criterion of a particular model θ is computed from a given dataset S_l. The evaluation of the values can be observed from the details given in Table 3.

SYNOPSIS OF DFQ CALCULATION FOR A DEFINED MODEL θ
V. FEATURE OPTIMIZATION

Tabu Search specification
The objective function q to be maximized measures the quality of the BDF h_θ. The main issue is to select an optimal model (a good sub-optimal solution, to be exact) θ* for the function q when C_p1 and C_p2 are fixed. A model θ is denoted by a set of n′ integer variables θ = (θ_1, ..., θ_n′), where min(θ_i) and max(θ_i) respectively denote the lower and upper bound values of variable θ_i. At each step, the list of all possible neighborhood solutions is built, and among these the solution with the best DFQ that is not tabu is selected. The set Θ_tabu^it of all solutions θ that are tabu at iteration it of the TS is defined over Θ, the set of all solutions, with t an adjustable parameter for the short-term memory used by TS (for the experimental results, t was set from the variable ranges max(θ_i) − min(θ_i)). The idea is that a variable θ_i may be changed only if its new value is not present in the short-term memory: the TS method does not return to a value of θ_i changed recently, thereby avoiding undesirable oscillation effects.
The tabu status of solutions in Θ_tabu^it may prohibit some attractive moves at iteration it. Therefore, our TS uses an aspiration criterion which allows a move (even if it is tabu) if it results in a solution with an objective value better than that of the current best-known solution.
The initialization of the model θ in our TS model selection is as follows: in the initialization formula, K_θ is the kernel method, and m_{+1} and m_{−1} denote the sizes of the positive and negative classes of the binary sub-problem. The initial value of k_θ permits starting with a small enough dataset to obtain low SVM training times in the first steps.
TS methods are improved by intensification and diversification strategies [30]. The selected model must address two kinds of problems. The first is that testing all moves between two iterations is time-consuming when the number of features is large; in particular, it is a waste of time to investigate moves linked to features when the current solution is not promising. It is then better to focus on moves linked only to the SVM hyperparameters or the simplification level than to explore new feature subsets. The second problem is that it is difficult for a TS method to escape deep valleys or large clusters of poor solutions using only the short-term memory of non-tabu solutions. Producing diversified solutions helps overcome this problem; it is handled by enlarging the step size (δ > 1) of moves and by permitting all types of moves (except feature-selection moves, for the reason stated above). In the present TS method, the intensification and diversification strategies are used alternately, starting with the intensification strategy. The two strategies are described next.
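A schematic tabu search over integer model variables, with the short-term memory and aspiration criterion described above, can be sketched as follows; the toy quadratic objective stands in for the SVM-DFQ, and the bounds and tenure are invented values:

```python
import itertools

def tabu_search(q, lo, hi, theta, t=5, iters=100):
    """Maximize q over integer vectors theta with per-variable bounds."""
    tabu = {}                          # (var index, value) -> expiry iteration
    best, best_q = list(theta), q(theta)
    for it in range(iters):
        candidates = []
        for i, d in itertools.product(range(len(theta)), (-1, 1)):
            v = theta[i] + d           # basic move: add +/-1 to one variable
            if not (lo[i] <= v <= hi[i]):
                continue
            cand = list(theta)
            cand[i] = v
            qc = q(cand)
            # aspiration criterion: a tabu move is allowed if it beats the best
            if tabu.get((i, v), -1) <= it or qc > best_q:
                candidates.append((qc, i, v))
        if not candidates:
            break
        qc, i, v = max(candidates)     # best non-tabu (or aspired) move
        tabu[(i, theta[i])] = it + t   # forbid returning to the value we leave
        theta = list(theta)
        theta[i] = v
        if qc > best_q:
            best, best_q = list(theta), qc
    return best, best_q

# toy objective with its peak at theta = (7, 3)
q = lambda th: -(th[0] - 7) ** 2 - (th[1] - 3) ** 2
best, val = tabu_search(q, lo=[0, 0], hi=[10, 10], theta=[0, 0])
```

The intensification/diversification alternation of the paper would sit on top of a loop like this, restricting which variables may move and enlarging the jump size after repeated failures.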

Intensification strategy
In the intensification algorithm synopsis of Table 4, Extensive Search surveys all possible basic moves, whereas Fast Extensive Search explores only eligible basic moves that are not related to feature selection (i.e., changing the value of β). A promising-solution criterion controls when the current solution is regarded as good enough, and allows switching between the two functions mentioned.
BestNotTabu corresponds to the move selection procedure described above (the best tabu solution is chosen if all moves are tabu). In this synopsis, θ_intensification corresponds to the best solution found within the same intensification phase.

Diversification strategy
In the diversification algorithm synopsis of Table 5, an eligible variable (one which is not linked to features) is selected at random (Select Eligible Variable), and a jump of ±δ is performed by modifying the chosen variable in the current solution.
Only two moves are investigated (Two Moves) in order to force the diversification of the visited solutions. The jump size grows with the number of successive failures (n_failure) of the intensification strategy, so that more distant regions are explored.
During the diversification iterations, the best visited solution is saved (θ_diversification).

This part explains the proposed method (OFW-ITS-LSSVM) for the identification of diabetes (see Figure 3). The system works automatically in three stages: 1) PCA is applied for feature reduction; 2) the best feature weights are estimated using OFW; 3) ITS is employed to find the optimal values for C and σ. First, the PCA method is used to identify four features from the diabetes dataset, so that only the largest principal components are used in the feature selection stage. Then the OFW-LSSVM, with the feature weights obtained by OFW, is used to classify patients, and finally the ITS algorithm is used to find the best values of the C and σ parameters of the OFW-LSSVM. The OFW-ITS-LSSVM model was compared with other popular models: the LS-SVM, PCA-LS-SVM, PCA-MI-LS-SVM, MI-CS-SVM and PCA-PSO-LS-SVM classifiers. We used k-fold cross-validation, a development of the holdout method: the dataset is divided into k subsets and the holdout method is iterated k times.
Each time, one of the k subsets is used as the test set and the remaining subsets are put together to form the training set; the average error across all k trials is then computed (Polat and Günes, 2007). We used 10-fold cross-validation in our experiments. The related parameters of PSO in the PCA-PSO-LS-SVM classifier were set as follows: the swarm size was 50, and the parameters C and σ were taken from the intervals [10^-3, 200] and [10^-3, 2], respectively.
The inertia weight was 0. As per Table 6, it is observed that using the LS-SVM classifier with OFW and ITS yields a higher correct-classification accuracy than the other methods. Hence this method gives a high rate of accuracy in identifying diabetes, and it can also be combined with software to help physicians make their final decision confidently. Mutual Information was further applied to the chosen features to weight them according to their relevance to the classification task, and the outcomes show that this improves the accuracy of the method. In addition, the Intensified Tabu Search allows fast convergence of the algorithm and locates the correct values for the SVM parameters. The results prove that the present model is faster and significantly more reliable than the other models.
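The k-fold cross-validation procedure described above can be sketched as follows; the stub majority-class model and synthetic labels are placeholders standing in for the actual classifiers and the diabetes data:

```python
import numpy as np

def k_fold_accuracy(X, y, train_and_score, k=10, seed=0):
    """Split into k folds; each fold serves once as the test set,
    the rest as training. Returns the mean score across folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_score(X[train], y[train], X[test], y[test]))
    return float(np.mean(scores))

# stub model: predict the majority class of the training fold
def majority_model(Xtr, ytr, Xte, yte):
    pred = np.bincount(ytr).argmax()
    return float((yte == pred).mean())

X = np.zeros((100, 3))
y = np.array([0] * 70 + [1] * 30)
avg_acc = k_fold_accuracy(X, y, majority_model, k=10)
```

With a 70/30 class split, the majority-class stub averages exactly 0.70 accuracy over the folds, which makes it a useful baseline when reading tables like Table 6.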


- Pregnant: number of times pregnant
- Plasma-Glucose: plasma glucose concentration measured using a two-hour oral glucose tolerance test (blood sugar level)
- BMI: body mass index (weight in kg / (height in m)^2)
- DPF: diabetes pedigree function
- TricepsSFT: triceps skin fold thickness (mm)
- Serum-Insulin: 2-hour serum insulin (mu U/ml)
- DiastolicBP: diastolic blood pressure (mmHg)
- Age: age of the patient (years)
- Class: diabetes onset within five years (0 or 1)
model. One basic move in our TS method corresponds to adding δ ∈ {−1, +1} to the value of a variable θ_i, while preserving the constraints of the model that depend on it (i.e., ∀i ∈ [1, ..., n′], min(θ_i) ≤ θ_i ≤ max(θ_i)). θ_best-known is the best solution found across all intensification and diversification steps. N_max is the maximum number of intensification iterations without improvement of the last best intensification solution θ_intensification before the phase is counted as a failure of the intensification strategy. n_failure counts the number of failures of the intensification strategy. If n_failure exceeds a fixed maximum number of failures, the ITS method stops and returns the solution θ_best-known. If a solution in the next neighborhood has a DFQ better than θ_best-known, the aspiration mechanism is used: that solution is selected as the new θ_best-known and n_failure is reset to zero.

In the TS exploration, when aspiration occurs, the strategy automatically switches back to intensification and the number of failures is reset (n_failure = 0).

Figure 3. Flowchart of the OFW-ITS-LSSVM

VI. PROPOSED METHOD
The OFW-ITS-LSSVM uses ITS to detect the best values for the C and σ parameters of the OFW-LSSVM. The description of the training procedure is:
1. Set up the parameters of ITS and initialize the population of n nests (Algorithm 1).
2. Compute the corresponding fitness function, formulated as classified/total (total denotes the number of training samples and classified the number of correctly classified samples), for each particle.
3. Find the best solution using ITS.

VII. EXPERIMENTAL RESULTS

Figure 4. Line chart comparing OFW-ITS-LSSVM with other methods from the literature

Semantic Ontology Correlation: The two ontology topics follow the triple format <Subject, Predicate, Object>. Here, Predicate represents the group of predicates, Predicate = {partOf, propertyOf}, which is mainly used to explain the ontology predicate. The ontology relations Concept-Concept and Concept-Attribute are referred to as CC and CA, following the predicate set definition.

TABLE III. CONFUSION MATRIX

The confusion matrix shows how often patients are misclassified for the disease. Furthermore, Table 5 displays the per-class accuracies of OFW-ITS-LSSVM. The present model achieves a classification accuracy of 95.78%, the best among the classifiers on the test set. The test performance of the classifiers is determined in terms of specificity and sensitivity, defined as:
Specificity: number of true negative decisions / number of actual negative cases
Sensitivity: number of true positive decisions / number of actual positive cases
A true positive decision occurs when the positive prediction of the network coincides with a positive diagnosis by the physician; a true negative decision occurs when both the network and the physician give a negative diagnosis.
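The specificity and sensitivity definitions above amount to two ratios over the 2x2 confusion matrix; the counts in this sketch are illustrative only, not the paper's results:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute the two test-performance ratios from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positives / actual positive cases
    specificity = tn / (tn + fp)   # true negatives / actual negative cases
    return sensitivity, specificity

# made-up counts: 45 true positives, 5 false negatives,
# 40 true negatives, 10 false positives
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=40, fp=10)
```

Reporting both ratios matters for a screening task like this one, since a classifier can reach high overall accuracy while missing many positive (diabetic) cases.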

TABLE IV. THE VALUES OF THE STATISTICAL PARAMETERS OF THE CLASSIFIERS

Overall, this work proposes a new automatic method to diagnose diabetes based on feature-weighted support vector machines and Intensified Tabu Search. Principal Component Analysis was used to discard the remaining features.