Heart Diseases Prediction for Optimization based Feature Selection and Classification using Machine Learning Methods

Globally, heart disease is considered the leading cause of death: according to statistics, 17.9 million people die of it every year worldwide. Chronic Kidney Disease (CKD) and breast cancer follow next in the list. Disease classification is therefore an important problem that needs more attention, and an optimized technique for such classification is a better option. In existing approaches to heart disease classification, feature selection was first performed using Teaching Learning based Optimization (TLO) and Kernel Density (KD). TLO is modelled on classroom teaching and requires many iterations, which increases time complexity. Similarly, a certain level of misclassification has been observed with KD. In the proposed method, K-Nearest Neighbour (KNN) is used to handle NaN values, and Density based Modified Teaching Learning based Optimization (DMTLO) is used for feature selection. Finally, classification is performed with a Support Vector Machine (SVM) and an ensemble (AdaBoost) method. SVM separates data by class label by defining a set of support vectors, a subset of the training inputs, that determine a hyperplane in the attribute space. The ensemble method is used to address statistical, computational and representational problems. Experimental results show that the proposed DMTLO outperforms the existing methods with the required number of attributes.
Keywords—Teaching learning based optimization; kernel density; support vector machine; k-nearest neighbour; ensemble learning


I. INTRODUCTION
Nowadays, datasets accumulate enormous quantities of data from many sources. Such high-dimensional data increases computation cost and degrades the results of an ML model when the dataset contains irrelevant, duplicate and unwanted attributes, which hinder the development of a predictive model. Learning models trained on a very large number of features are prone to overfitting, and choosing a relevant and suitable subset of features is a better way to address this problem. Several feature selection algorithms are available for this purpose. They reduce the number of features used to build an AI model by evaluating different combinations of features in the input dataset.
In general, wrapper-based attribute selection strategies are proposed to improve the performance of classification methods. Finding a good combination of attributes is a challenging task. Numerous researchers have used optimization techniques such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) to choose suitable features and improve classifier performance.
Parham et al. (2016) [9] proposed an attribute selection strategy that hybridizes PSO with a local search strategy. Its results were compared with various filter and wrapper-based strategies, and it achieved notable accuracy. Hafez et al. (2015) [5] proposed an attribute selection procedure based on Chicken Swarm Optimization. It mimics the behaviour of chicken swarms and achieved good results on standard datasets compared with the GA and PSO optimization algorithms. A methodology proposed by Panda (2017) [12] combines elephant search optimization with a deep neural network for analysing microarray data. Venkata [14], [21] presented extensive applications of TLBO to many real-world problems. TLBO was proposed to reduce the burden of tuning parameter values during the attribute selection process.

II. RELATED WORK
Attribute selection is needed in many areas, such as email categorization, disease analysis, fraudulent claims and credit/debit risk assessment. In developing an effective decision-making method, a significant step is to select the features best suited to achieving high accuracy. Various researchers, e.g. Wah et al. (2018) [22], have used filter and wrapper selection strategies to increase the accuracy of prediction strategies. Several prevailing attribute selection strategies have been examined to understand their pros and cons. Bahassine et al. (2018) [3] proposed a novel attribute selection method for Arabic text categorization that uses an improved Chi-square technique to improve classification outcomes, and better results were obtained with an SVM classifier. Mazini et al. (2018) [11] developed a new anomaly-based network intrusion detection model that achieves a high detection rate with a low false positive rate. The model hybridizes the artificial bee colony and AdaBoost algorithms: the former selects efficient attributes, while the latter performs classification.
www.ijacsa.thesai.org
Thawkar et al. (2018) [18] proposed an attribute selection method based on the Biogeography-based optimization procedure for classifying digital mammograms with an ANN. Wen et al. (2016) [23] developed a novel unsupervised attribute selection technique based on L2,1-norm regularization for recognising certain human movements. This procedure performs attribute extraction and selection simultaneously, producing optimal attributes. Xu et al. (2017) [24] proposed an innovative discriminative sparse representation based on L2 regularization. This procedure is designed for classifying input images and achieved notable accuracy on various inputs.
Lai et al. (2017) [7] proposed a dimensionality reduction method termed Robust Discriminant Regression (RDR), which uses the L2,1-norm as the basic measure in its evaluation function for attribute extraction. The main disadvantage of RDR is that it does not make proper predictions for attribute selection.
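For reference, the L2,1-norm used by Wen et al. [23] and by RDR [7] is, in its standard definition, the sum of the Euclidean norms of the rows of a matrix $W \in \mathbb{R}^{d \times c}$. Here we assume, as is usual in these methods, that the $i$-th row $w^{i}$ holds the weights associated with the $i$-th attribute:

$$\|W\|_{2,1} = \sum_{i=1}^{d} \|w^{i}\|_2 = \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{c} W_{ij}^{2}}$$

Each row enters the sum through its unsquared Euclidean norm, so penalising this quantity tends to drive entire rows to zero. This is how L2,1-based methods discard whole attributes. The formula gives the generic norm only and is not the full RDR objective.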

Mafarja et al. (2017) [8] used the Binary Dragonfly Algorithm to select a subset of attributes from UCI repository datasets and obtained improved outcomes compared with the GA and PSO algorithms.

A. Feature Selection using Teaching Learning based Optimization (TLO)
TLO is a well-known technique for choosing the optimal subset of features. It has two phases. The first phase is an optimization technique used to choose the optimal set of attributes, and the second phase applies various classification models. These phases are repeated until a stopping condition is met, which can be taken as a fixed number of iterations. TLO has two drawbacks: it does not achieve improved accuracy across various classification models, and it cannot be hybridized with other feature selection strategies.

B. Feature Selection using Kernel Density (KD)
Kernel Density (KD) is a non-parametric method that makes no assumptions about the data distribution. It chooses attributes that capture the behaviour of normal data by separating out the outliers. A forward search strategy is used to estimate the selection criteria. KD detects outliers better than other well-known strategies. However, incorporating other search techniques into KD-based attribute selection is more challenging, since the method exploits parallelism. Also, no proper studies have yet been done to establish the value of the selected features.
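The paper gives no formula for its KD method. The following hedged sketch shows only the underlying idea of a non-parametric density estimate that separates low-density outliers from normal data. The Gaussian kernel, the fixed bandwidth, the density threshold and the function names are all assumptions, and the forward search step is not implemented.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the probability density of `samples` at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum a Gaussian bump centred on every sample: no assumed distribution family.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def low_density_points(samples, bandwidth, threshold):
    """Hypothetical rule: report samples whose estimated density is below `threshold`."""
    density = gaussian_kde(samples, bandwidth)
    return [s for s in samples if density(s) < threshold]


values = [4.9, 5.0, 5.1, 5.2, 4.8, 12.0]  # toy feature column with one outlier
print(low_density_points(values, bandwidth=0.5, threshold=0.2))  # [12.0]
```

The bandwidth controls how smooth the estimate is. In practice it would be chosen per attribute, for example with a rule of thumb or cross-validation, rather than fixed as it is here.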

IV. PROPOSED SYSTEM
In the proposed system, pre-processing to handle NaN values is done with the KNN method, and feature selection is done with Density based Modified Teaching Learning based Optimization (DMTLO) and a Kernel Density (KD) based method. Classification is done using SVM and an ensemble (AdaBoost) method.

A. Pre-Processing to Remove the NaN Values using KNN Method
K-Nearest Neighbour (K-NN) is a non-parametric approach for common data mining tasks such as classification and regression, as described by Altman (1992) [2]. It is an instance-based learning method, also termed lazy learning: the function is only approximated locally, and all computation is deferred until classification. It is considered one of the simplest of all machine learning techniques.
The output of K-NN depends on whether it is used for classification or regression. Its attractive characteristics include ease of interpreting the output, low calculation time and good predictive power. When K-NN is used for classification, the output is a class membership.
An object is classified by a majority vote of its neighbours and is assigned to the class most common among its k nearest neighbours. In regression, the output is the property value of the object, which is the average of the values of its k nearest neighbours.
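The paper does not specify how the neighbours are used to replace NaN entries. The following minimal sketch assumes the common approach of filling each missing value with the mean of that attribute over the k nearest complete records. Distance is measured only on the attributes the incomplete record actually has. The function name and toy data are hypothetical. Library users could instead reach for scikit-learn's `KNNImputer`.

```python
import math


def knn_impute(rows, k=2):
    """Replace NaN entries with the mean of that feature over the k nearest complete rows."""
    complete = [r for r in rows if not any(math.isnan(v) for v in r)]
    if len(complete) < k:
        raise ValueError("need at least k rows without missing values")
    filled = []
    for row in rows:
        missing = [j for j, v in enumerate(row) if math.isnan(v)]
        if not missing:
            filled.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if not math.isnan(v)]

        # Euclidean distance computed only over the features this row actually has.
        def distance(other):
            return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))

        neighbours = sorted(complete, key=distance)[:k]
        new_row = list(row)
        for j in missing:
            new_row[j] = sum(nb[j] for nb in neighbours) / k
        filled.append(new_row)
    return filled


nan = float("nan")
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.3],
    [8.0, 9.0, 10.0],
    [1.05, nan, 3.1],  # hypothetical record with one missing value
]
result = knn_impute(data, k=2)
print(result[3])  # second feature filled with the mean of 2.0 and 2.1
```

Because only complete rows act as neighbours, this sketch needs at least k fully observed records. More elaborate imputers relax that requirement.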

B. Feature Selection using Density based Modified Teaching Learning based Optimization (DMTLO)
Density based Modified Teaching Learning based Optimization (DMTLO) is adopted in order to simplify how the conventional TLBO computes its evaluation function. The input size and the number of design variables are the input parameters used to discover the best subset of attributes.
The representation of the objective function is given below.

1) Teacher phase:
In this phase, the best learner is chosen as the teacher. The teacher tries to improve the knowledge of the remaining learners by raising their mean result. The iteration in this phase can be represented as follows.
For iteration $y = 1, 2, \dots, m$ and subject $x = 1, 2, \dots, n$, the mean value of each subject is denoted $m_s(x, y)$. In this phase, differences are used to update the values in the solution pool by adding the difference to the present solution, after which the algorithm continues to the learner phase.
The Chebyshev distance metric is used to update the values in the output space. The difference is denoted $D_s$ and the Chebyshev distance $D_c$:
$$D_c(y_i, y_j) = \max_k \lvert y_{i,k} - y_{j,k} \rvert$$
$$X'_{\text{new}} = f(y) + D_c(y_i, y_j)$$
2) Learner phase:
Individual learners improve their understanding by interacting with their peers. $X''_{\text{new}}$ is accepted when its function value is better than its previous value. The attributes that show improved outcomes under the latest evaluation function in each cycle are accumulated in the attribute subset. The algorithm terminates when every attribute has been evaluated.
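The DMTLO update $X'_{\text{new}} = f(y) + D_c(y_i, y_j)$ is not specified in enough detail to reproduce, so the following is only a hedged sketch. It implements the Chebyshev distance as defined above and, separately, the standard TLBO loop on a toy continuous objective. That loop consists of a teacher phase, a learner phase and greedy acceptance in both. This is not the authors' DMTLO. The function names, population size, bounds and the `sphere` objective are hypothetical. Using it for feature selection would also need a binary encoding, for example treating a coordinate above 0.5 as "attribute kept", which is a further assumption.

```python
import random


def chebyshev(a, b):
    """Chebyshev distance: the largest absolute coordinate-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def tlbo(objective, dim, pop_size=10, iterations=50, low=-5.0, high=5.0, seed=0):
    """Standard TLBO for minimisation, with greedy acceptance in both phases."""
    rng = random.Random(seed)

    def clip(v):
        return min(max(v, low), high)

    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(p) for p in pop]
    for _ in range(iterations):
        # Teacher phase: shift each learner by the gap between teacher and scaled class mean.
        teacher = pop[min(range(pop_size), key=fit.__getitem__)]
        mean = [sum(p[d] for p in pop) / pop_size for d in range(dim)]
        for i in range(pop_size):
            tf = rng.choice((1, 2))  # teaching factor
            cand = [clip(pop[i][d] + rng.random() * (teacher[d] - tf * mean[d]))
                    for d in range(dim)]
            f = objective(cand)
            if f < fit[i]:  # keep the new solution only if it is better
                pop[i], fit[i] = cand, f
        # Learner phase: move towards a better random peer, or away from a worse one.
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            step = 1.0 if fit[i] < fit[j] else -1.0
            cand = [clip(pop[i][d] + rng.random() * step * (pop[i][d] - pop[j][d]))
                    for d in range(dim)]
            f = objective(cand)
            if f < fit[i]:
                pop[i], fit[i] = cand, f
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


def sphere(x):
    """Toy objective with its minimum (0) at the origin."""
    return sum(v * v for v in x)


best_x, best_f = tlbo(sphere, dim=2)
print(chebyshev([1.0, 5.0, 2.0], [4.0, 1.0, 2.0]))  # 4.0
```

Greedy acceptance means no learner's fitness ever gets worse. This mirrors the "accept $X''_{\text{new}}$ only if it is better" rule described in the learner phase above.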

C. Classification using SVM and Ensemble (Adaboosting Method)
Classification is performed using SVM and an ensemble (AdaBoost) method.

1) Support Vector Machine (SVM):
SVM is one of the more recent procedures for pattern classification and is extensively used in various fields. It is a supervised learning technique with associated learning procedures that analyse data and recognise patterns. The choice of kernel parameters during SVM training strongly influences classification accuracy. SVMs were originally proposed by Vapnik (1995) [20]. They are widely used in applications such as image recognition Pontil & Verri (1998) [13], bioinformatics Yu et al. (2003) [25] and text classification Joachims (1998) [6].
SVM classifies the input data by class label. It does so by defining a set of support vectors, which are a subset of the training inputs.
In addition to linear classification, SVMs can efficiently perform non-linear classification by implicitly mapping their inputs into high-dimensional attribute spaces.
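The paper does not state which SVM implementation or kernel it uses. As a hedged illustration of the linear case only, the following sketch trains a soft-margin linear SVM by stochastic sub-gradient descent on the regularised hinge loss (a Pegasos-style update). A constant feature is appended so that the last weight acts as the bias, which means the bias is regularised too, a simplification. The function names, `lam`, `epochs` and the toy data are hypothetical. Kernel SVMs, such as scikit-learn's `SVC(kernel="rbf")`, are needed for the non-linear case.

```python
import random


def _augment(x):
    """Append a constant 1 so the last weight acts as the bias term."""
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the L2-regularised hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}. Returns the weight vector.
    """
    rng = random.Random(seed)
    data = [_augment(x) for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # inside the margin: the hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    return 1 if sum(wj * xj for wj, xj in zip(w, _augment(x))) >= 0 else -1


X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

Only the points that violate the margin change the weights. This is the sense in which the support vectors alone determine the separating hyperplane.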
2) Ensemble classification: Ensemble learning improves the outcomes of Machine Learning (ML) by combining several models, and it produces better outcomes than a single model. A group of classifiers is trained, and the classifiers then cast their votes. Predictive accuracy improves, but the combined model is harder to interpret, as noted by Dietterich (2002) [4]. Ensembles are beneficial in solving statistical, computational and representational problems. The aim is not to find a single highly accurate model but to build several models whose errors differ, and individual members of the ensemble may initially misclassify some inputs.
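The paper names AdaBoost but gives no configuration. The following hedged sketch is textbook discrete AdaBoost with one-feature threshold rules (decision stumps) as weak learners and labels in {-1, +1}. Each round re-weights the training points so that the next stump concentrates on the points the ensemble still gets wrong. The final prediction is a weighted vote. The helper names and toy data are hypothetical. scikit-learn's `AdaBoostClassifier` provides a maintained implementation.

```python
import math


def train_stump(X, y, w):
    """Find the single-feature threshold rule with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for polarity in (1, -1):
                preds = [polarity if x[f] >= thr else -polarity for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def stump_predict(stump, x):
    _, f, thr, polarity = stump
    return polarity if x[f] >= thr else -polarity


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; y in {-1, +1}. Returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid log(0) when the stump is perfect
        if err >= 0.5:
            break  # weak learner is no better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified points so the next stump focuses on them.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def ensemble_predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]  # no single threshold separates these labels
model = adaboost(X, y, rounds=3)
print([ensemble_predict(model, x) for x in X])  # [1, 1, -1, -1, 1, 1]
```

No single stump can fit this toy data, but the weighted vote of three stumps can. This illustrates the representational benefit of ensembles mentioned above.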
There are different methods of building ensembles. The algorithm is shown below.
Step 1: Form the test set 'T' using 'n' documents in 'X'.
Step 2: Form the training set 'TR' using the remaining documents in 'X'.
Step 3: For every classifier in 'C':
Use the classified documents to train the classifier in 'T'.
Use the trained classifier to group the documents in 'S'.
Store the resultant labels in the particular class.
Step 5: 'm' is fed into the k-means procedure to form document groups.
Step 6: Apply the SVM-linear algorithm on 'T' for document categorization.
Step 7: Select the classes corresponding to the clusters by finding the class obtained in the preceding step.

D. Datasets
The datasets are taken from the UCI Machine Learning Repository.
The heart disease database contains 76 attributes, but most researchers use a subset of 14 of them. The goal is to determine whether or not a patient has heart disease. The target is an integer ranging from 0 to 4, and experiments with the Cleveland database have concentrated on distinguishing presence (values 1, 2, 3, 4) from absence (value 0) of heart disease.
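The experiments reduce the 0 to 4 target to presence versus absence. A minimal sketch of that mapping follows. It assumes the target values have already been read into a list. In the UCI attribute description the column is called `num`. The function name is hypothetical.

```python
def binarize_target(num_values):
    """Map the Cleveland 'num' target (0 to 4) to 0 = absent, 1 = present."""
    for v in num_values:
        if v not in (0, 1, 2, 3, 4):
            raise ValueError(f"unexpected target value: {v!r}")
    return [0 if v == 0 else 1 for v in num_values]


print(binarize_target([0, 1, 2, 3, 4, 0]))  # [0, 1, 1, 1, 1, 0]
```

Validating the values first catches parsing problems, such as placeholder strings, before they are silently mapped to "present".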
The heart disease dataset has 14 attributes and 304 instances, the Chronic Kidney Disease dataset has 25 attributes and 400 instances, and the breast cancer dataset has 32 attributes and 569 instances. Each dataset includes a class attribute indicating whether the disease is present or not.
The Chronic Kidney Disease dataset contains blood test results and various other measurements collected from patients with or without CKD. The data were collected from 400 patients observed over a period of about 60 days. Of these 400 patients, 250 were diagnosed with Chronic Kidney Disease and 150 were not, and this distinction is recorded as the class attribute in the dataset. Important attributes of this dataset include age, hypertension, diabetes mellitus, blood glucose random, blood urea and haemoglobin.
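Wisconsin Diagnostic Breast Cancer (WDBC) is one of the standard datasets for breast cancer diagnosis. It has 569 instances (357 benign and 212 malignant) and 32 attributes, including an ID and the class attribute. The figures of 699 instances (458 benign and 241 malignant) with 11 attributes including the class belong to the original Wisconsin Breast Cancer dataset, a separate UCI dataset.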

V. RESULTS AND DISCUSSION
Figs. 2-7 show the performance of the benchmark and proposed schemes. Table I shows the number of features selected by TLO, KD and DMTLO.

VI. CONCLUSION
In this paper, the proposed system is evaluated on three datasets: heart disease, chronic kidney disease and breast cancer. The experimental results are compared with the existing Teaching Learning based Optimization and Kernel Density methods in terms of accuracy, precision, recall, F-measure, time and error rate. Based on these results, the proposed DMTLO outperforms the existing methods.