The Role of Data Pre-processing Techniques in Improving Machine Learning Accuracy for Predicting Coronary Heart Disease

These days, in light of rapid developments, people work day and night to maintain a good standard of living. This often causes them to pay little attention to a healthy lifestyle, such as what they eat or what physical activities they do. These people are often the most likely to suffer from coronary heart disease. The heart is a small organ responsible for pumping oxygen-rich blood to the rest of the human body, and it receives its own blood supply through the coronary arteries. Accordingly, any blockage or narrowing in one of these coronary arteries may prevent blood from reaching the heart and, from it, the rest of the body, and thus cause what is known as a heart attack. Hence the importance of early prediction of coronary heart disease, as it can help these people change their lifestyle and eating habits to become healthier and thus prevent coronary heart disease and avoid death. This paper improves the accuracy of machine learning techniques in predicting coronary heart disease using data preprocessing techniques. Data preprocessing is a set of techniques used to improve the efficiency of a machine learning model by improving the quality of the features. The popular Framingham Heart Study dataset was used for validation purposes. The results indicate that data preprocessing techniques improved the predictive accuracy of poorly performing classifiers and yielded satisfactory performance in determining the risk of coronary heart disease.
For example, the Decision Tree classifier achieved a predictive accuracy for coronary heart disease of 91.39%, an increase of 1.39% over previous work; the Random Forest classifier achieved 92.80%, an increase of 2.7%; the K-Nearest Neighbor classifier achieved 92.68%, an increase of 2.58%; the Multilayer Perceptron Neural Network (MLP) classifier achieved 92.64%, an increase of 2.64%; and the Naïve Bayes classifier achieved 90.56%, an increase of 0.66%.
Keywords: Coronary heart disease; heart; machine learning; data preprocessing; classification technique


I. INTRODUCTION
The heart is one of the most important organs in the human body. It is a small, muscular pumping organ responsible for supplying other organs in the body with oxygen and other important nutrients [1]. This means that a person's life depends on the efficiency of heart function. Therefore, if the heart does not function well, other organs also cannot function well [2].
In light of difficult economic conditions, people seek to secure their basic needs by working long hours daily. This lifestyle often does not take into account the diet and health of these people to ensure their safety [3]. It often leads to a risk of conditions such as diabetes, high cholesterol and high blood pressure at an early age, and all of these, if not controlled, can lead to coronary heart disease [3].
Heart disease is a term that refers to any problem that can affect the heart and blood vessels [2], such as coronary heart disease, congenital heart disease, and rheumatic heart disease [4], which, according to the National Heart, Lung, and Blood Institute, ranks among the most dangerous and common diseases in the world.
In coronary heart disease, a complete or partial blockage of the coronary arteries usually occurs due to blood clotting or the accumulation of fatty plaques on the walls, which leads to the inability of the heart to get enough oxygen [5] and thus it is difficult for the heart to function as efficiently as required.
There are two types of risk factors for coronary heart disease. The first type is fixed and cannot be changed, such as age, gender and family history, while the second type depends on lifestyle, such as diabetes, smoking, high cholesterol, high blood pressure, high body mass index, and low exercise [6]. According to experts, the second type of risk factors can usually be controlled by changing lifestyle and diet, and by using certain medications if needed.
In recent years, artificial intelligence techniques have been used extensively in the medical field to improve the efficiency of disease diagnosis and classification in its early stages [7]. Prominent among these are machine learning techniques, which are a set of statistical models that help the machine learn from past data [8]. It is often difficult to use patient data for early-stage diagnosis because of data volume, missing values and noise in the data, but machine learning techniques and their capabilities have helped process such data [9].
It is also noticeable that data features may be incomplete and large in volume. The range of some features is small while that of others is large, and features are a mix of categorical and numerical types. All of this affects the accuracy of machine learning techniques in diagnosing and classifying diseases in their early stages, including coronary heart disease. Different techniques for manipulating the features, known as data preprocessing techniques, can therefore be used to improve the accuracy of machine learning techniques in early prediction of the disease [10].
This research paper is organized as follows: The second section reviews some relevant work. The third section presents the methodology for this research paper. The fourth section presents, evaluates and discusses the results. The fifth section is the conclusion and the sixth section is the future work.

II. RELATED WORK
Recently, there has been an increase in the number of papers dealing with the use of machine learning techniques in predicting serious diseases that may affect people's lives, including coronary heart disease. In [11], the researchers applied a logistic regression technique on the Framingham Heart Study dataset to predict the ten-year risk of coronary heart disease. The researchers used 65% of the dataset for the training set. The accuracy obtained was 84.8%.
The researchers in [12] implemented four machine learning algorithms, namely support vector machine (SVM), neural network, XGBoost, and random forest, to predict the ten-year risk of coronary heart disease. They also used the Framingham Heart Study dataset to validate the results. The accuracy obtained was 84.8% for support vector machine, 85.4% for neural network, 86.99% for XGBoost, and 84.9% for random forest.
The researchers in [4] used an adaptive boosting algorithm on four datasets (UCI Cleveland, UCI Switzerland, UCI Long Beach, and UCI Hungarian) to diagnose coronary heart disease. This approach obtained training and testing accuracies, respectively, of 97.16% and 80.14% for Cleveland, 98.63% and 89.12% for Hungarian, 93.15% and 77.78% for Long Beach, and 100% and 96.72% for Switzerland.
In [13], the researchers applied three machine learning algorithms, namely support vector machine, neural network, and Hybrid-SVM, to the Framingham Heart Study dataset to predict the ten-year risk of heart attack. The accuracy obtained was 86.03% for support vector machine, 84.7% for neural network, and 94% for Hybrid-SVM. For some of the machine learning techniques used, these results were better than those reported in [12].
In [14], the researchers applied six algorithms, namely decision tree, boosted decision tree, random forest, support vector machine, neural network, and logistic regression, to the Framingham Heart Study dataset to predict the ten-year risk of coronary heart disease. The data was divided into 80% training and 20% testing. The researchers used R Studio and RapidMiner in their work, with three techniques for dealing with missing values. The first technique ignores missing values. With RapidMiner it obtained an accuracy of 85% for the decision tree, 63% for the boosted decision tree, and 63% for logistic regression, whereas with R Studio it obtained 84% for the decision tree, 85% for the boosted decision tree, and 84% for logistic regression. The second technique is complete case analysis. With RapidMiner it obtained an accuracy of 54% for the decision tree, 64% for the boosted decision tree, 65% for the random forest, 69% for the support vector machine, 69% for the neural network, and 68% for logistic regression, whereas with R Studio it obtained 67%, 81%, 79%, 69%, 67%, and 68% for the same algorithms respectively. The final technique replaces missing values with the average. With RapidMiner it obtained an accuracy of 62% for the decision tree, 62% for the boosted decision tree, 63% for the random forest, 68% for the support vector machine, 68% for the neural network, and 67% for logistic regression, whereas with R Studio it obtained 84% for the decision tree, 84% for the boosted decision tree, 78% for the random forest, 68% for the support vector machine, 71% for the neural network, and 66% for logistic regression.
Other researchers, such as those in [15], applied only one algorithm, logistic regression, to the Framingham Heart Study dataset to predict the ten-year risk of coronary heart disease. This approach obtained an accuracy of 86.6%, higher than the logistic regression result reported in [11].
In [16], the researchers applied the same previous method of logistic regression to the Framingham Heart Study dataset to predict a heart attack. This approach obtained an accuracy of 87%.
In [19], the researchers applied logistic regression and a neural network to the KNHANES-VI dataset to predict the risk of coronary heart disease. This approach obtained an accuracy of 86.11% for logistic regression and 87.04% for the neural network. The researchers used a distinct correlation analysis to improve the accuracy of the neural network to 87.63%.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 6, 2021
In other research such as [20], the researchers applied Naïve Bayes, KNN, random forest, decision tree, SVM, logistic regression, and the ensemble classification approach to the NHANES and Framingham Heart Study datasets to monitor the risk of chronic diseases. For the NHANES dataset, the decision tree algorithm obtained an accuracy of 97.6%, 96.5% for the ensemble approach, 80.8% for the KNN
In [22], the researchers applied a neural network algorithm to the Framingham Heart Study dataset to predict heart disease. The accuracy obtained was 90%.
Other researchers, such as those in [23], applied k-nearest neighbor (KNN), logistic regression (LR), linear discriminant analysis (LDA), support vector machine (SVM), classification and regression tree (CART), gradient boosting (GB), and random forest (RF) to the Framingham Heart Study dataset to detect heart disease. The accuracy obtained was 81% for KNN, 83% for LR, 83% for LDA, 82% for SVM, 75% for CART, 83% for GB, and 83% for RF. Some ensemble techniques were then applied and the accuracy improved to 86%.
Those in [24] applied k-nearest neighbor, decision tree, random forest, logistic regression, and neural network to the Framingham Heart Study dataset to predict heart disease. The accuracy obtained was 86% for k-nearest neighbor, 77% for decision tree, 86% for random forest, 85% for logistic regression, and 85% for neural network.
Most previous researchers used either the UCI dataset or the Framingham Heart Study dataset. The UCI dataset is a good dataset for diagnosing and predicting heart disease, but it has some limitations. First, the number of instances is rather small. Second, the dataset does not include some important features for predicting and diagnosing heart disease, such as LDL cholesterol, HDL cholesterol, smoking status, diastolic blood pressure, systolic blood pressure, number of cigarettes per day, body mass index, and family history of any type of heart disease. This means the data is not suited to diagnosing or predicting heart disease for patients who smoke, patients with a history of blood pressure problems, obese patients, and patients with a family history of heart disease. Likewise, the Framingham Heart Study dataset is good data for predicting heart disease, but it does not contain a feature for family history of any type of heart disease. This means the data is specific to patients with no family history of any type of heart disease.
Despite this and much other research, the field is still open for researchers to conduct experiments to improve the accuracy of machine learning techniques for predicting diseases that pose a risk to human life, including coronary heart disease.

III. RESEARCH METHOD
Unfortunately, the number of patients diagnosed with coronary heart disease (angina or heart attack) is increasing day after day. High blood pressure, high cholesterol, uncontrolled diabetes, smoking, a diagnosis of cardiovascular impairment, and other risks all increase the chance of being diagnosed with coronary heart disease in the future. Therefore, an accurate system is needed to help patients protect themselves from the risk of coronary heart disease, relying on the patient's demographic information, medical history, medical examination, behavior, and laboratory examination.
Many researchers have developed machine learning models using different classification algorithms such as decision tree, Naïve Bayes, SVM, KNN, and neural network. Most of these models used the Cleveland Heart Disease dataset to predict coronary heart disease, but few used the Framingham Heart Study dataset. This paper uses the Framingham Heart Study dataset to validate the resulting model, since it includes features for most of the potential risk factors for coronary heart disease, some of which are not found in the most common heart disease dataset, namely the Cleveland Heart Disease dataset. In this paper, five machine learning classification algorithms were used, namely decision tree, Naïve Bayes, neural network, random forest, and KNN. These five algorithms were applied to the Framingham Heart Study dataset with two events as target (output) features to predict coronary heart disease, and a number of different data preprocessing techniques were used to improve the accuracy of the machine learning models.

A. Dataset
The Framingham Heart Study is the first long-term epidemiological study concerned with the possible causes of cardiovascular disease; it began in 1948 in Framingham, Massachusetts [20]. The study identified the prospective risk factors of cardiovascular diseases and their effects [20], [25].

B. Data Preprocessing
Data preprocessing is a group of techniques that are applied to the data to improve its quality, such as handling missing values, converting the type of a feature, and many other techniques [10].
1) Impute Missing Values By KNN: KNN imputation works by calculating the distance (or similarity) between cases to find the case in the dataset most similar to the one with the missing value, and replacing the missing value with the corresponding value from that case [26], by applying (1),
where X_i are the known values and Y_i are the values to be predicted.
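The imputation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes Euclidean distance over the features that are observed in the incomplete row (the source does not reproduce its distance formula), and the function name `knn_impute` and the toy rows are hypothetical.

```python
import math


def knn_impute(rows, k=1):
    """Fill None entries with the mean of the k nearest complete rows.

    Assumption: Euclidean distance, computed only over the features that
    are present in the row being imputed.
    """
    complete = [r for r in rows if None not in r]
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        known = [j for j, v in enumerate(row) if v is not None]
        # Rank the complete rows by distance on the known features only.
        neighbours = sorted(
            complete,
            key=lambda c: math.sqrt(sum((row[j] - c[j]) ** 2 for j in known)),
        )[:k]
        filled = [
            v if v is not None else sum(n[j] for n in neighbours) / len(neighbours)
            for j, v in enumerate(row)
        ]
        imputed.append(filled)
    return imputed


# Toy data: the last row is missing its third feature.
data = [[1.0, 2.0, 10.0], [1.1, 2.1, 12.0], [8.0, 9.0, 50.0], [1.05, 2.0, None]]
result = knn_impute(data, k=2)
```

With `k=1` the sketch copies the value from the single most similar case, which matches the description in the text; larger `k` averages over several similar cases.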

2) Min Max Normalization:
This method converts each numerical feature value into a new value depending on the minimum and maximum values of the feature [27], by applying (2):
$$X' = \frac{X - Min}{Max - Min} \tag{2}$$
where Min is the smallest value in the selected feature, Max is the largest value in the selected feature, X' is the new value after applying normalization, and X is a selected value from a numerical feature.
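As a small illustration of min-max normalization, the sketch below rescales a feature to the range [0, 1] using the usual form (X - Min) / (Max - Min). The function name and the sample values are hypothetical, and mapping a constant feature to 0.0 is an assumption made only to avoid division by zero.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] using the feature's minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information, map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [30, 45, 60]  # hypothetical feature values
scaled = min_max_normalize(ages)
```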
3) Z-Score Standardization: This method converts each numerical feature value into a new value depending on the standard deviation and mean of the feature [28], by applying (3):
$$X' = \frac{X - \mu}{\sigma} \tag{3}$$
where μ is the mean of the feature, σ is the standard deviation of the feature, X' is the new value after applying standardization, and X is a selected value from a numerical feature.
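A minimal sketch of z-score standardization follows: each value has the feature mean subtracted and is divided by the feature standard deviation. The use of the population standard deviation is an assumption (the sample version is equally plausible), and the function name and toy values are hypothetical.

```python
import statistics


def z_score_standardize(values):
    """Centre values on the mean and scale by the standard deviation."""
    mean = statistics.mean(values)
    # Assumption: population standard deviation (divides by N, not N - 1).
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


values = [2, 4, 4, 4, 5, 5, 7, 9]  # toy numbers with mean 5 and deviation 2
standardized = z_score_standardize(values)
```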

4) One Hot Encoding:
One Hot Encoding splits a categorical feature into a number of separate features, one for each distinct case in the original categorical feature, and gives 0 for absence and 1 for presence in each new feature [29].
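One hot encoding can be illustrated with a few lines of plain Python. The category labels below are hypothetical and are not taken from the Framingham feature list; the column order (alphabetical) is also an arbitrary choice made for the sketch.

```python
def one_hot_encode(values):
    """Return the category order and one 0/1 indicator row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical categorical feature.
categories, encoded = one_hot_encode(["never", "current", "former", "current"])
```

Each output row contains exactly one 1, in the column of the category that was present.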

5) Ordinal Encoding:
In this technique, each case in the categorical feature is converted into an integer value [29].
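A sketch of ordinal encoding is shown below. It assumes the caller supplies the category order explicitly; the labels and the function name are hypothetical.

```python
def ordinal_encode(values, order):
    """Map each category to its integer position in the given order."""
    mapping = {category: index for index, category in enumerate(order)}
    return [mapping[v] for v in values]


# Hypothetical ordered categories.
codes = ordinal_encode(["high", "low", "medium", "low"], order=["low", "medium", "high"])
```

Unlike one hot encoding, this keeps a single column but implies an ordering between categories, which may or may not be meaningful for a given feature.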

6) Equal Width Discretization:
This is a simple method that sorts the values of a numerical feature and splits the range of the sorted values into a predefined number k of equal-width bins [30] by applying (4) and (5):
$$W = \frac{V_{Max} - V_{Min}}{k} \tag{4}$$
$$B_i = V_{Min} + i \cdot W \tag{5}$$
where W is the width of the bin, V_Max is the maximum value in the selected numerical feature, V_Min is the minimum value in the selected numerical feature, B_i is the i-th bin boundary, and i = 1, ..., k-1.
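Equal width discretization can be sketched as follows, assuming the bin width is the feature range divided by the number of bins k and that the maximum value is placed in the last bin. The function name and sample values are hypothetical.

```python
def equal_width_bins(values, k):
    """Assign each value a bin index 0..k-1 using k bins of equal width."""
    v_min, v_max = min(values), max(values)
    width = (v_max - v_min) / k
    if width == 0:
        return [0 for _ in values]
    # int() truncates the non-negative offset; the maximum is clamped to bin k-1.
    return [min(int((v - v_min) / width), k - 1) for v in values]


bins = equal_width_bins([20, 25, 30, 35, 40, 60], k=4)
```

Note that the bins are equally wide but not equally populated: in this toy example the third and fourth bins hold one value each.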

7) Equal Frequency Discretization:
In this method, the values are first sorted in ascending order. The range of sorted values is then split into a predefined number of equal-frequency bins by applying N/K, where N is the number of values and K is the number of bins, so that each bin has the same number of values [30].
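A sketch of equal frequency discretization is given below. It ranks the values and assigns roughly N/K consecutive ranks to each bin; tied values may land in different bins in this simplified version, and the names and data are hypothetical.

```python
def equal_frequency_bins(values, k):
    """Assign bin indices so that each bin receives about len(values) / k items."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        # Integer arithmetic spreads ranks 0..n-1 evenly over bins 0..k-1.
        bins[i] = min(rank * k // n, k - 1)
    return bins


bins = equal_frequency_bins([55, 20, 41, 33, 70, 62], k=3)
```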

C. Classification Algorithms
Classification is a supervised machine learning task in which a model assigns each input to one of several labels or categories [31]. The classifier is built by training on input data whose labels are known [31]. In the next step, the model is tested on the test set to determine how many targets it predicts correctly and how many it predicts incorrectly [31].
1) ID3 Decision Tree: Each decision tree contains a root node, leaf nodes, internal nodes and branches. In the ID3 decision tree, all features are initially candidates for the root node. The features are then split by computing the entropy, which measures the homogeneity of the data; for a two-class target, the value of entropy lies between 0 and 1 [7]. Information gain is the difference between the entropy of a feature's data and the weighted entropy of the subsets produced by splitting on that feature [7]. Entropy and information gain can be found by applying (6) and (7), and the feature with the highest information gain is selected as the root node of the tree [7].
$$Entropy(F) = -\sum_{i=1}^{C} P_i \log_2 P_i \tag{6}$$
$$Gain(F) = Entropy(F) - \sum_{i=1}^{K} \frac{|F_i|}{|F|} Entropy(F_i) \tag{7}$$
where C is the number of outputs, P_i is the probability of occurrence of each output among all outputs, K is the number of data splits, F is a feature with some data, and F_i is a data split from feature F.
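The entropy and information gain calculations used by ID3 can be sketched as below, assuming base-2 logarithms and a split on each distinct value of a categorical feature. The feature and label values are toy data, not taken from the dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Toy example: the first feature separates the classes perfectly, the second not at all.
target = [1, 1, 0, 0]
gain_informative = information_gain(["yes", "yes", "no", "no"], target)
gain_useless = information_gain(["a", "b", "a", "b"], target)
```

ID3 would pick the first feature as the split here, since its gain equals the full entropy of the target while the second feature's gain is zero.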
2) Random Forest: Random forest is a classification algorithm that works by creating many decision trees from the dataset [32]. The features are selected randomly from the training set to build the trees in the random forest [32]. After each decision tree has been built and has produced its result, majority voting is applied to decide the final result of the random forest [32]. In the process of building each decision tree, randomization is applied to find the value of the split node.
3) K-Nearest Neighbours: KNN is a lazy supervised machine learning algorithm that is used to predict and classify unknown data from known data by measuring the distance between them [33]. A distance metric is used to measure the distance between a point from the testing data and every point in the training data [33], [34]; the distance can be calculated by applying (8),
where X_i are values belonging to a known output class and Y_i are values whose output class should be predicted.
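A minimal KNN classifier is sketched below. It assumes Euclidean distance (the source does not reproduce its distance formula) and a simple majority vote among the k nearest training points; the function name and the toy points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Assumption: Euclidean distance between feature vectors.
    distances = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


train_x = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8]]
train_y = [0, 0, 0, 1, 1]
near_origin = knn_predict(train_x, train_y, [2, 2], k=3)
far_corner = knn_predict(train_x, train_y, [8, 9], k=3)
```

Because distances depend on feature scale, a sketch like this would typically be run on normalized or standardized features.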
4) Multilayer Perceptron Neural Network: The structure of an artificial neural network is inspired by that of the human brain [35]. A multilayer perceptron (MLP) contains more than one layer (input layer, hidden layer(s), output layer) [36].
First, before the neural network starts training on the dataset, the values of the weights (w) are randomly assigned [36]. After that, the neural network begins training [36]. Sigmoid is a non-linear activation function commonly used in feedforward neural networks to compute the output [37]. The sigmoid function can be calculated by applying (9):
$$f(x) = \frac{1}{1 + e^{-x}} \tag{9}$$
The back propagation algorithm is commonly used to train a multilayer perceptron neural network. The first step of this algorithm is to compare the predicted output (Ȳ) with the actual output (Y) to find the error between them. This error is returned to the neural network and the weights are changed depending on it, and the weights continue to change until the value of (Ȳ) becomes close to (Y) [36].
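The error-driven weight update can be illustrated on the smallest possible case, a single sigmoid neuron. This is a sketch under several assumptions: squared-error loss, plain gradient descent, and zero initial weights for reproducibility (the text describes random initialization). It is not a full multilayer back propagation implementation, and the function names and learning rate are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic activation: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def train_neuron(samples, epochs=2000, lr=0.5):
    """Fit one sigmoid neuron by gradient descent on squared error."""
    w = [0.0] * len(samples[0][0])  # assumption: zeros instead of random weights
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Error between prediction and target, scaled by the sigmoid derivative.
            delta = (y_hat - y) * y_hat * (1.0 - y_hat)
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b


# Toy task: logical OR of two binary inputs.
or_samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_neuron(or_samples)
predictions = [
    sigmoid(sum(wi * xi for wi, xi in zip(weights, x)) + bias) for x, _ in or_samples
]
```

In a multilayer network the same idea is applied layer by layer, with the error at the output propagated backwards to update the hidden-layer weights.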

5) Naïve Bayes: Naïve Bayes is a statistical classification algorithm that works on the basis of Bayes' theorem. It assumes that each feature is independent of the others, and that each variable contributes separately to prediction and occurrence [3]. Naïve Bayes uses the prior probability from Bayes' theorem to calculate the likelihood of the relationship between each feature in the test data and each target; the target with the highest probability is selected as the result of the model [38]. The probability can be found using (10):
$$P(C_i \mid F_j) = \frac{P(F_j \mid C_i)\, P(C_i)}{P(F_j)} \tag{10}$$
where P(C_i | F_j) is the probability that a specific class (C_i) appears with a specific feature (F_j) out of all features F and classes C, P(C_i) is the probability of a specific class (C_i) out of all classes (C), P(F_j | C_i) is the probability that a specific feature (F_j) appears with a specific class (C_i) out of all features (F) and classes (C), and P(F_j) is the probability of a specific feature (F_j) out of all features (F).
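A small categorical Naïve Bayes classifier is sketched below to show how the prior and the per-feature likelihoods combine. Two assumptions go beyond the text: add-one (Laplace) smoothing is used so that unseen values do not zero out a class, and the denominator P(F_j) is dropped because it is the same for every class and does not change which class scores highest. All names and data are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    feature_values = defaultdict(set)      # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            feature_counts[(label, j)][value] += 1
            feature_values[j].add(value)
    return class_counts, feature_counts, feature_values


def predict_naive_bayes(model, row):
    """Return the class with the highest prior times product of likelihoods."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(C_i)
        for j, value in enumerate(row):
            # Likelihood P(F_j | C_i) with add-one smoothing (an assumption).
            seen = feature_counts[(label, j)][value]
            score *= (seen + 1) / (count + len(feature_values[j]))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data: two categorical features per row, binary target.
rows = [("yes", "high"), ("yes", "high"), ("no", "normal"),
        ("no", "high"), ("no", "normal"), ("yes", "normal")]
labels = [1, 1, 0, 0, 0, 1]
model = train_naive_bayes(rows, labels)
```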

D. Stratified KFold Cross Validation
Cross validation is a statistical method used to test an algorithm by dividing the dataset into a training set used to train the model and a test set used to evaluate the model's performance [39]. In cross validation, every point has the same chance of being used in the test [39]. In k-fold, the dataset is evenly divided into k folds [39]. Stratified k-fold means that each fold has the same class label distribution as the original dataset [40]. In each iteration, one fold is used for testing and the others are used for training [39].
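Stratified k-fold splitting can be sketched as follows. The sketch deals the indices of each class round-robin into k folds so that every fold keeps roughly the original class ratio; it does not shuffle, which a practical implementation usually would. The function name and the toy label list are hypothetical.

```python
from collections import defaultdict


def stratified_kfold(labels, k):
    """Yield (train_indices, test_indices) pairs with class ratios preserved."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's indices round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for test_fold in range(k):
        test = sorted(folds[test_fold])
        train = sorted(i for f, fold in enumerate(folds) if f != test_fold for i in fold)
        yield train, test


# Toy labels: 3 positive and 9 negative cases, i.e. a 1:3 ratio.
toy_labels = [1] * 3 + [0] * 9
splits = list(stratified_kfold(toy_labels, k=3))
```

Every test fold in this example contains one positive and three negative cases, mirroring the 1:3 ratio of the full list.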

E. Tool
RapidMiner is a data science software platform developed by the company of the same name that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics [41]. In machine learning, RapidMiner can be used for feature processing, dataset segmentation, model training, model testing, network research, and performance evaluation [41].

IV. RESULTS AND DISCUSSION
In this paper, five machine learning classification techniques were used to predict two primary CHD events, namely, angina pectoris (528 yes, 2735 no) and myocardial infarction (308 yes, 2955 no).

A. Performance Evaluation
Performance evaluation is a group of equations used to measure the effectiveness of the classifier or the model [42]. Below is the definition of some essential terms used in the equations of performance evaluation:

1) True Positive (TP): The person is healthy and is also predicted as healthy [42].
2) False Positive (FP): The person is sick but is predicted as healthy [42].
3) True Negative (TN): The person is sick and is predicted as sick [42].
4) False Negative (FN): The person is healthy but is predicted as sick [42].

B. Confusion Matrix
The confusion matrix is used to analyze the ability of the classifier or model to identify the classes of the dataset [42]. TN and TP refer to correct classifications, while FN and FP refer to wrong classifications [42]. For an accurate classifier or model, TP and TN outnumber FN and FP [42], as shown in Table I.

C. Performance Metrics
1) Accuracy: Accuracy is an evaluation metric of the proportion of predictions that the model or classifier gets right [43]. The accuracy can be calculated by applying (11):
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$
2) Precision: Precision is used to identify whether the diagnosis or the predicted result is close to the real result [43]. Precision can be calculated by applying (12):
$$Precision = \frac{TP}{TP + FP} \tag{12}$$
3) F-Measure: F-Measure refers to the harmonic mean of Precision and Recall [43]. F-Measure can be calculated by applying (13):
$$F\text{-}Measure = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{13}$$

4) Sensitivity (Recall):
Sensitivity is the true positive rate. In other words, it is the rate at which healthy persons are diagnosed or predicted as healthy [43]. Sensitivity can be calculated by applying (14):
$$Sensitivity = \frac{TP}{TP + FN} \tag{14}$$
5) Specificity: Specificity is the true negative rate. In other words, it is the rate at which sick persons are diagnosed or predicted as sick [43]. Specificity can be calculated by applying (15):
$$Specificity = \frac{TN}{TN + FP} \tag{15}$$
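The five metrics above can be computed directly from the four confusion matrix counts, as in the sketch below. It uses the standard definitions in terms of TP, FP, TN and FN; the counts passed in the example are made up for illustration and are not taken from the paper's tables.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, specificity and F-measure from counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f_measure": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
```

On imbalanced data such as the two CHD events described in this paper, accuracy alone can look high even when sensitivity or specificity is poor, which is why the metrics are reported together.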

D. Algorithms Confusion Matrix
Table II, Table III, Table IV, Table V, and Table VI below show the number of correct predictions (True Positive and True Negative) and wrong predictions (False Positive and False Negative) for each algorithm.

H. Discussion
In this research paper, a set of machine learning techniques was used to predict two events of coronary heart disease, namely, Angina Pectoris (528 Yes, 2735 No) and Myocardial Infarction (308 Yes, 2955 No). Although previous researchers used many data preprocessing techniques, the results obtained from this work were very encouraging compared to other studies that used the same dataset, in terms of accuracy, as shown in Table IX.
It is noted that the techniques that have been used to improve the accuracy of machine learning models or classifiers in predicting coronary heart disease have proven effective and thus have achieved better results than previous research.
For example, [20] and [21] used the same dataset and, by applying the decision tree algorithm, obtained a predictive accuracy of 90% for coronary heart disease (CHD), while this research paper obtained an accuracy of 91.39%, a positive increase of 1.39%, as shown in Table IX.
Also, by applying the random forest algorithm, this research paper obtained a predictive accuracy for CHD of 92.80%, as shown in Table IX. This is higher than the result obtained with the decision tree algorithm in this paper, and also higher than the 90.10% obtained by [20] and [21], a positive increase of 2.7%.
As for the MLP algorithm, researchers in [21] obtained a prediction accuracy of 90%, while this research paper obtained a better accuracy of 92.64%, a positive increase of 2.64%, as shown in Table IX.
Regarding the KNN algorithm, researchers in [20] and [21] obtained a prediction accuracy of 90.10%, which is lower than the 92.68% obtained in this research paper with KNN imputation of the missing values and equal width discretization, a positive increase of 2.58%, as shown in Table IX.
The application of Naïve Bayes in this research paper obtained a predictive accuracy for coronary heart disease of 90.56%, as shown in Table IX, which is better than the predictive accuracy of 89.90% obtained in [20].
After applying data preprocessing techniques, the proposed work obtained better accuracy than previous research that used the same dataset and the same techniques. In [13], published in 2018, the neural network obtained an accuracy of 84.7%. In [14], published in 2017, the decision tree obtained 85%, the random forest 79%, and the neural network 71%. In [20], published in 2017, KNN obtained 90.1%, random forest 90.1%, Naïve Bayes 89.9%, and decision tree 90%. In [21], published in 2018, KNN obtained 90.1%, random forest 90.1%, Naïve Bayes 89.9%, and decision tree 90%. In [22], published in 2020, the neural network obtained 90%. In [23], published in 2021, KNN obtained 81%, decision tree 75%, and random forest 83%. In [24], published in 2021, the decision tree obtained 77%, random forest 86%, KNN 86%, and the neural network 85%.
Although the results obtained in predicting coronary heart disease in terms of accuracy were not as significant as they should be, they may contribute to an increase in the number of cases with the correct diagnosis of the disease and at the same time reduce the number of cases that are incorrectly diagnosed with coronary heart disease, and thus save lives.

V. CONCLUSION
The heart is among the most important organs of the human body, as any problem with it can damage other important organs in the body, such as the brain. Doctors around the world warn of the sharp increase in the number of heart patients, heart disease being a serious disease that may lead to serious complications such as heart failure and cardiac arrest, both of which often lead to death if not diagnosed early.
In this paper, the researchers contributed to improving the accuracy of machine learning classification models in predicting two primary coronary heart disease events, namely, angina pectoris and myocardial infarction, through the use of a number of feature processing techniques such as normalization, standardization, and discretization. To validate the results obtained, the Framingham Heart Study dataset was used with two main events (angina pectoris and myocardial infarction (heart attack)), because it contains the most common factors causing coronary heart disease, as confirmed by consulting cardiologists.
After using data preprocessing techniques on the dataset, the accuracy of machine learning algorithms for predicting coronary heart disease improved unevenly. For example, the improvement in prediction accuracy for CHD was 4.2% with the ID3 decision tree algorithm, 0.14% with the random forest algorithm, 3.18% with the KNN algorithm, 2.08% with the MLP algorithm, and 1.36% with the Naïve Bayes algorithm, as shown in Table VII and Table VIII. The best prediction accuracy obtained for the ID3 decision tree algorithm was 91.39%, when the equal width discretization method was applied. The random forest algorithm achieved a prediction accuracy of 92.80% when equal width discretization and normalization were applied. The MLP algorithm achieved a prediction accuracy of 92.64% when using the one hot encoding technique. The KNN algorithm achieved a prediction accuracy of 92.68% when the ordinal encoding and standardization techniques were applied. All of these values were obtained for the myocardial infarction event, whereas the value obtained with the Naïve Bayes algorithm, 90.56%, was for the angina pectoris event with equal frequency discretization applied. The results obtained confirm the importance of using data preprocessing techniques in improving the accuracy of machine learning algorithms for predicting coronary heart disease compared to previously published research with the same objectives.
Finally, the correlation between serious conditions such as stroke, high blood pressure, cardiovascular disease and coronary heart disease leads us, in the future, to predict such diseases and to study the effect of each of them on the occurrence of coronary heart disease on the one hand, and the effect of coronary heart disease on these diseases on the other hand, in order to prevent death. This is because the patient in such cases does not have enough time to go to the doctor to be seen and have his or her life saved.

VI. FUTURE WORK
In future work, more data preprocessing techniques and more machine learning classification algorithms can be applied to obtain better results than those obtained in this proposed work.
Machine learning algorithms can also be used to analyze big data to forecast coronary heart disease. A larger amount of data is expected to improve the prediction, because more data generally makes the result more accurate.
Sometimes the patient does not have enough time to go to the doctor, so developing a website or smartphone application with a graphical user interface could solve this problem. Such an application would make the prediction process easier and available from wherever the patient is: the user only enters his or her risk factor information and the result is presented immediately.