Heart Disease Classification and Recommendation by Optimized Features and Adaptive Boost Learning

In recent decades, cardiovascular diseases have become the leading cause of death in low- and middle-income countries. Early identification and continuous clinical monitoring can reduce the death rate associated with heart disorders. Neither service is yet widely accessible, because detecting cardiac disorders reliably in all circumstances and advising a patient around the clock demand considerable expertise, time, and skill. In this study, we propose a machine learning (ML) based approach to predict cardiac disease. Precise identification of cardiac disease requires an efficient ML technique. The proposed method works on five classes: one normal class and four disease classes. Each class is assigned a primary task, and recommendations are made on that basis. The proposed method optimises feature weighting and selects efficient features. Following feature optimization, adaptive boost learning with tree and KNN base learners is applied. In the experiments, sensitivity improved by 3-4%, specificity by 4-5%, and accuracy by 3-4% compared to the previous approach.


I. INTRODUCTION
The cardiovascular system is powered by the heart, a muscular organ that circulates blood throughout the body, including through the lungs. The system also comprises a network of blood vessels (arteries, veins, and capillaries) that distributes blood all over the body. Cardiac disorders, also known as heart diseases, are defined by deviations from the normal blood circulation of the heart. Heart disorders are the leading cause of death globally. According to the World Health Organization (WHO), strokes and heart attacks are responsible for 17.5 million deaths worldwide. Over 75% of deaths from heart disease occur in low- and middle-income countries. In addition, strokes and heart attacks account for 80% of CVD-related mortality [1], [2]. In light of this, early recognition of cardiac abnormalities and prediction tools can reduce the mortality rate from cardiovascular problems.
Predictive models for cardiovascular disorders can now be developed from the vast amounts of patient data made available by the expansion of modern healthcare infrastructure (i.e., Big Data inside Electronic Health Record systems). Machine learning is a technique for finding new information by analysing large datasets from several angles. Numerous records on patients' health, disease diagnoses, and other topics are created every day in the modern healthcare sector [3], [4], [5]. Machine learning offers many methods for uncovering similarities or hidden patterns in data [6], [7], [8], [9]. It has proved beneficial for making predictions and decisions based on the massive amounts of data collected in the healthcare industry [7], [8], [9], [10], [11]. Machine learning allows computers to learn automatically from datasets and to improve their performance from past experience with little to no human input.
As an ML algorithm is exposed to more data, its predictions can improve. Consequently, in this research, we present an ML algorithm for the development of a cardiovascular disease forecasting tool.

A. GAP in Previous Work
The fundamental challenge in heart disease classification is that the dataset is small and spans five classes, so learning efficiently is critical. The following problems were identified in prior work:
• Previous research ignored feature overlaps, which increased noise during learning [1], [2].
• The emphasis was mostly on accuracy, which ranged from 40-50% in the five-class case [4], [5].
• Features were not improved based on their classification capacity [8], [9].
• Learning relied on a single, highly polynomial classifier, which increases overfitting [11].
• The majority of research focuses on binary classification, which does not reflect the real condition [12], [13].

B. Contribution of Research
• Apply entropy and information gain constraints to optimize features.
• Optimize feature selection and feature weights by using a genetic algorithm to maximize the Pareto surface.
• Analyse several performance metrics feature by feature and with weighted features.
• In brief, optimize both the feature space and the learning stage by optimizing the classifiers.
• Focus on five classes with high accuracy, sensitivity, and specificity.
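
The entropy and information gain criteria named in the contributions can be illustrated with a short sketch. The code below is a minimal example using the standard Shannon definitions, not the authors' implementation; the function names and the toy data are assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting `labels` on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the label subsets created by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: one binary feature and a three-class label.
feature = [0, 0, 1, 1, 1, 0]
labels = [0, 0, 1, 2, 1, 0]
gain = information_gain(feature, labels)
```

A feature with a higher gain separates the classes better, which is one plausible basis for ranking or weighting features before learning.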

II. RELATED WORK
This section analyses how ML can be applied to cardiovascular disease and examines several popular studies on predicting heart disease.
Ali et al. (2021) determined which machine learning classifiers provide the most accurate performance for diagnostic applications. Several supervised ML methods were implemented and compared for heart disease prediction. For all deployed algorithms except KNN and MLP, an importance score was computed for each feature. The features were then ranked by importance score to identify those that give the most reliable predictions of heart disease. Using a heart disease dataset from Kaggle and three classification algorithms (KNN, DT, and RF), the analysis found that the RF method achieved 100% sensitivity, specificity, and accuracy.
Katarya et al. (2021) summarized part of the existing work on automated processes. Prediction and feature selection are key components of every automated process, and selecting features effectively leads to better heart disease prediction outcomes. The researchers demonstrated useful methods for selecting attributes, including a hybrid grid search method and a random search algorithm.
Princy et al. (2020) classified a cardiac database using multiple state-of-the-art supervised ML algorithms for disease prediction. The findings show that the DT classifier diagnosed cardiovascular problems more accurately than the LR, NB, SVM, RF, and KNN approaches; the Decision Tree produced the best outcome, with an accuracy of 73%. This strategy could help physicians predict the onset of heart problems and provide adequate treatment.
Shah et al. (2020) presented several heart disease-related variables and a model based on supervised learning techniques such as DT, NB, KNN, and RF. The study uses the Cleveland dataset from the UCI heart disease repository.
The collection contains 76 attributes and 303 instances. Only 14 of these 76 attributes are used to test the efficacy of the different approaches. The purpose of the study is to estimate the occurrence of heart disease among patients. According to the results, K-nearest neighbour provides the highest accuracy.
Sharma et al. (2020) built an ML model that uses the relevant parameters to predict heart disease. The authors used the standard UCI heart disease dataset, which has 14 key factors related to heart disease. ML techniques such as RF, SVM, DT, and NB were used to create the model. The study also attempted to identify correlations between the attributes in the dataset using standard ML techniques and then used these correlations to forecast the likelihood of heart disease. Compared to the other ML algorithms, the RF technique provided superior prediction accuracy and processing speed. As a decision-support system, this model may be useful to medical professionals in their clinics.
Krishnan et al. (2019) applied two supervised data mining algorithms, DT classification and the NB classifier, to a dataset to determine the likelihood of a patient experiencing heart disease. The two algorithms were compared on the same dataset to evaluate which was more accurate. The Decision Tree model correctly predicted heart disease patients 91% of the time, while the Naive Bayes classifier did so 87% of the time.
Mohan et al. (2019) proposed cardiovascular disease prediction via hybrid ML techniques, with the aim of finding significant features by applying ML and thereby improving the accuracy of cardiovascular disease detection.
The prediction model is built from different combinations of features. The predictive model for cardiovascular disease using a hybrid RF with a linear model achieved an improved performance level, with an accuracy of 88.7 percent (Table I). The authors also discussed various data mining and prediction methods, for example LR, KNN, NN, SVM, and Vote, which have recently been fairly popular for identifying and predicting heart disease.
Santhana et al. (2019) detected cardiovascular disease in male patients using classification approaches. The paper offers detailed information on heart diseases, including risk factors, facts, and common types. WEKA is the data mining tool used, and it is a capable computational tool for bioinformatics. All three WEKA interfaces are used; NB, ANNs, and DT are the main data mining methods employed in this system to forecast heart disease. DT algorithms such as C4.5, CART, CHAID, ID3, and J48, and NB techniques are commonly used for prediction.
Gavhane et al. (2018) trained and evaluated the dataset using the multi-layer perceptron (MLP) neural network algorithm. The network can have any number of inputs, outputs, and hidden layers. The hidden layers connect the input nodes to the output nodes, and each connection is assigned a weight. To balance the perceptron, an additional identity input, the bias, with weight b is fed into each node. Whether the nodes are connected in a feedforward or feedback manner depends on the task being performed.
Li et al. (2018) created an efficient ML-based approach for the diagnosis of cardiac disease. The system uses ML classifiers including ANN, K-NN, NB, SVM, and DT.
Four classic feature selection methods (MRMR, Relief, LLBFS, and LASSO) were used, and in addition the feature selection problem was addressed with a novel feature selection method. The system uses the LOSO cross-validation approach to select the optimal hyperparameters and is evaluated on the Cleveland cardiovascular disease database.
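
As an illustration of the LOSO scheme mentioned above, the sketch below generates the train/test index splits. It assumes that LOSO stands for leave-one-subject-out, which the text does not spell out; the function name and the subject identifiers are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical subject labels for six records from three patients.
subjects = ["p1", "p1", "p2", "p3", "p3", "p3"]
splits = list(leave_one_subject_out(subjects))
```

Each candidate hyperparameter setting would be scored on every held-out subject in turn, and the setting with the best average score kept.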

A. Dataset
In the experiments, we use the UCI heart disease dataset (https://archive.ics.uci.edu/ml/datasets/heart+disease) for classification and recommendation. It contains 303 instances, five classes, and thirteen features (see Figs. 1 and 2).

Steps for Analysis
Step 1: Input the heart disease dataset with features and labels.
Step 2: Optimize the features by multi-objective optimization; this process assigns an efficient weight to each feature. In equation (3), E represents entropy and IG represents information gain.
Cleveland Heart Disease and Statlog Heart Disease datasets: the suggested feature selection approach is workable with SVM classifiers for building an advanced smart system for cardiac illness diagnosis.
Step 3: After crossover finishes, move to the efficient Pareto space using the following activation (fitness) function.
Step 4: After finding the Pareto space, optimize the weights. This yields the optimal feature weights.
Step 6: After optimizing the weights, the weighted features are learned by the classifier (Table II).
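
The crossover and Pareto-selection steps can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the fitness function is not recoverable from the text, so the example assumes two objectives to be maximized (taken here to be sensitivity and specificity), and all names and scores are hypothetical.

```python
def one_point_crossover(parent_a, parent_b, point):
    """Child takes weights before `point` from parent_a and the rest from parent_b."""
    return parent_a[:point] + parent_b[point:]


def dominates(a, b):
    """True if objectives `a` are no worse than `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates whose objectives are not dominated by any other candidate."""
    return [
        c for c in candidates
        if not any(dominates(o["objectives"], c["objectives"]) for o in candidates)
    ]


# Hypothetical feature-weight vectors with made-up (sensitivity, specificity) scores.
population = [
    {"weights": [0.9, 0.1, 0.5], "objectives": (0.70, 0.80)},
    {"weights": [0.4, 0.6, 0.2], "objectives": (0.75, 0.78)},
    {"weights": [0.2, 0.2, 0.9], "objectives": (0.65, 0.75)},
    {"weights": [0.7, 0.3, 0.3], "objectives": (0.72, 0.82)},
]
front = pareto_front(population)
child = one_point_crossover(population[0]["weights"], population[1]["weights"], 1)
```

In a full genetic algorithm, children produced by crossover would be scored and merged into the population, and the front recomputed each generation until the weights stabilise.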
Fig. 3 compares the various features by accuracy. Adaptive Boosting KNN achieves higher accuracy than the other approaches (Table III).
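
The boosting stage itself is not spelled out in the text. As a rough illustration, the sketch below performs one round of multi-class adaptive boosting, assuming the SAMME variant of AdaBoost; that choice, the function name, and the toy labels are assumptions, and the weak learner (tree or KNN) is represented only by its predictions.

```python
import math


def samme_round(sample_weights, y_true, y_pred, n_classes):
    """One multi-class AdaBoost (SAMME) round: return (alpha, updated_weights)."""
    total = sum(sample_weights)
    # Weighted error rate of the weak learner on the training samples.
    err = sum(w for w, t, p in zip(sample_weights, y_true, y_pred) if t != p) / total
    # Clamp to avoid log(0) or division by zero for perfect or useless learners.
    err = min(max(err, 1e-10), 1 - 1e-10)
    alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
    # Increase the weight of misclassified samples, then renormalize.
    updated = [
        w * math.exp(alpha) if t != p else w
        for w, t, p in zip(sample_weights, y_true, y_pred)
    ]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]


# Hypothetical labels from five classes (0 = normal, 1-4 = disease classes).
y_true = [0, 1, 2, 3, 4]
y_pred = [0, 1, 2, 3, 0]  # the weak learner misclassifies the last sample
weights = [0.2] * 5
alpha, new_weights = samme_round(weights, y_true, y_pred, n_classes=5)
```

The next weak learner is trained with `new_weights`, so it concentrates on the samples the previous learner got wrong, and the final prediction is a vote weighted by each learner's `alpha`.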

V. OBSERVATION OF RESULTS
• In the results, we compared existing and proposed adaptive boost approaches. There are three variants of adaptive boost in the results: one is basic adaptive boost, another is hybridized with tree, and the third is hybridized with KNN.
• Using multi-objective genetic optimization, features are given the appropriate weighting in all of the proposed methods. It makes sure that features don't overlap and boosts performance, as shown in the figures above.
• Adaptive boost tree improves all performance measures because entropy and information gain map well onto tree-based approaches.
• Because the optimization prioritises sensitivity, the proposed model's recall is raised.
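
For a five-class problem, sensitivity and specificity have to be computed per class. The sketch below uses the common one-vs-rest convention; whether the reported figures were obtained this way is an assumption, and the function name and toy predictions are hypothetical.

```python
def per_class_sensitivity_specificity(y_true, y_pred, classes):
    """One-vs-rest sensitivity (recall) and specificity for each class."""
    results = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = sum(1 for t, p in pairs if t != c and p != c)
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        results[c] = (sensitivity, specificity)
    return results


# Hypothetical predictions for three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = per_class_sensitivity_specificity(y_true, y_pred, classes=[0, 1, 2])
```

Averaging the per-class values (a macro average) gives a single sensitivity and specificity figure that treats rare disease classes as equally important as the normal class.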

VI. CONCLUSION
Processing primary health records of heart data makes it possible to detect cardiac irregularities early and, in the long term, to save lives. In this study, methods based on machine learning were applied to process the raw data and deliver new insight into heart disease. Prediction of heart disease is both difficult and crucial in the medical field. However, if the disease is discovered in its initial stages and preventive measures are implemented as soon as feasible, the fatality rate can be significantly reduced. The proposed approach employs a five-class classification system to improve the diagnosis of specific heart diseases and the subsequent recommendation. Improving classification sensitivity is therefore a significant task. Sensitivity is improved through feature optimization, and ensemble learning is enhanced through bagging and boosting. In comparison to traditional SVM and KNN methods, a 5% gain in sensitivity is highly significant.
In future work, we will extend this approach with non-linear mapping through deep learning and optimize the latent space to reduce the overlap between classes.