Weighted Minkowski Similarity Method with CBR for Diagnosing Cardiovascular Disease

This study implements Case-Based Reasoning (CBR) for the early diagnosis of cardiovascular disease, based on the similarity between the features of a new case and those of old cases. The features used to match old cases with new ones were age, gender, risk factors and symptoms. The diagnostic process was carried out by entering the case features into the system, after which the system searched for cases with features similar to the new case (retrieve). The level of similarity of each retrieved case was calculated using the weighted Minkowski method. The case with the highest level of similarity was adopted as the solution for the new case. If the similarity value was below 0.8, the case was revised by an expert. Tests conducted with the expert showed that the system was able to perform the diagnosis correctly, with a sensitivity of 100% and a specificity of 83.33%. The accuracy was 95.83% and the error rate 4.17%, which suggests that the approach is relevant enough to be implemented in the medical field.

Keywords: CBR; cardiovascular; similarity; weighted Minkowski


I. INTRODUCTION
Cardiovascular Disease (CVD) is the term for a group of heart and blood vessel disorders. World Health Organization data from 2012 show that CVD is the number one cause of death in the world. In 2008, 17.3 million people died from CVD, representing 30% of all deaths worldwide. Of these, 7.3 million died of coronary heart disease and 6.2 million of stroke [1].
To address this issue, this work diagnoses cardiovascular disease using a Case-Based Reasoning (CBR) approach with weighted Minkowski similarity. Many early expert systems attempted to apply pure Rule-Based Reasoning (RBR), that is, reasoning by logic [2]. However, for broad and complex domains where knowledge cannot be represented by rules (i.e., IF-THEN), a pure rule-based system encounters several problems [3]. Because of the difficulty of the knowledge acquisition process, researchers have turned to another problem-solving method known as CBR, using lambda value analysis on the weighted Minkowski distance model [2].
The knowledge representation of CBR is a base of cases that occurred previously. CBR uses the solution of an earlier case similar to the current case to solve the problem. One method that can be used to calculate similarity is the weighted Minkowski method [4]. If a new case resembles an old one, CBR reuses the old case's solution as a recommendation for the new case. If there is no match, CBR adapts by retaining the new case in the case base, so that the CBR knowledge grows [2]. The more cases stored in the case base, the smarter the CBR system becomes.
Based on the above facts, it is necessary to build a system capable of diagnosing cardiovascular disease. The system built is an implementation of CBR, in which the problems in new cases are solved by adapting solutions from old cases. CBR is an important technique in artificial intelligence that has been applied to various kinds of problems in a wide range of domains [3]. We use the weighted Minkowski similarity method because it performed well in our case for diagnosing CVD.

II. RELATED WORK
Several studies in the cardiovascular domain have been conducted. The authors of [5] used a structured polytree concept and a directed acyclic graphical model (DAG) to predict all cases that could cause coronary heart disease. Tests showed that the applied concept was more accurate and efficient in predicting heart disease before the physical examination. In the same year, [6] proposed a new algorithm to predict heart disease using CBR techniques. Most existing algorithms, by contrast, are based on binary data only. The system was implemented in Java and successfully predicted different levels of heart attack risk [6].
CBR has been applied to cardiovascular disease by [7], who built a case-based expert system prototype for heart disease diagnosis, while [8] used CBR to build a multimedia decision support system (MM-DSS) for heart disease diagnosis. In [6], 110 cases covering 4 types of heart disease were used with two retrieval methods, namely induction and nearest neighbour. The accuracy of the nearest-neighbour method was better than that of the induction method, at 100% and 53% respectively. Meanwhile, [8] reported a medical multimedia-based clinical decision support system for operational chronic lung disease diagnosis and training with 97.36% sensitivity, 97.77% specificity, 96.85% positive predictive value (PPV) and 93.90% negative predictive value (NPV).
CBR for diagnosing heart failure in children was studied by [9]. The research in [10] addressed face recognition using 3 (three) different local feature distances, namely Manhattan distance, weighted angle distance and Minkowski distance. The results showed that Minkowski distance provided better results in terms of the time to recognize faces, which was 0.46, 0.45 and 0.43 seconds for Minkowski, weighted angle and Manhattan distance respectively. In [11], a mobile cancer management system (MCSM) prototype was developed to diagnose cancer patients. The system combined CBR and CBIR, with similarity measured using the weighted Minkowski method. Tested on 600 breast cancer radiology images, it achieved 90% accuracy.
Based on the explanation above, a great deal of research on diagnosing cardiovascular disease has been conducted. In practice, the diagnosis of cardiovascular disease needs to involve risk factors, gender and patient age to improve its accuracy. Specifically, no research has been conducted to diagnose Acute Myocardial Infarction (I21). Meanwhile, [12] studied an individual risk prediction model for incident cardiovascular disease using a Bayesian approach.
Research applying CBR to diagnosis has also been conducted with various degrees of accuracy, while the Minkowski method has been applied for certain purposes with a fairly good level of accuracy. The research presented in this paper applied CBR to diagnose type I21 cardiovascular disease. The diagnostic process involved the symptoms, risk factors, age and gender of the patient. Similarity was calculated using the weighted Minkowski method. The research was expected to produce a system capable of diagnosing cardiovascular disease, especially type I21, with a good level of accuracy.
Meanwhile, [13] proposed a Minkowski central partition model as a pointer to a suitable distance exponent and to consensus partitioning, using a clustering algorithm capable of computing feature weights. In [14], the Minkowski metric was used for feature weighting together with anomalous cluster initialization in K-Means clustering.

III. PROPOSED METHOD
The system input consisted of the risk factor and symptom data of the patient's disease, which were then made into a case. There were two types of cases, namely the target case and the source case. A source case was case data entered into the system that served as knowledge for the system, while the target case was a new case whose solution was to be sought.
The diagnostic process began with entering the patient's data, risk factors and symptoms, after which the similarity to each stored case was calculated. Each feature had a weight obtained from the experts. The similarity between features was calculated using the local similarity formula and then aggregated using the global similarity formula.
The similarity values calculated for the cases were sorted from highest to lowest. The case with the highest value was the one most similar to the new case. The similarity value ranged from 0 to 1 (0% to 100% as a percentage). If the value was smaller than the threshold of 0.8, the solution of the case had to be revised by the expert. The system output was the name of the disease of the case most similar to the new case.
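The ranking and threshold step described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function names, the case identifiers (K001, K002) and the dictionary layout are assumptions; only the 0.8 threshold and the highest-similarity rule come from the text.

```python
from typing import Dict, List, Optional, Tuple

THRESHOLD = 0.8  # similarity threshold stated in the text


def rank_cases(similarities: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort (case_id, similarity) pairs from highest to lowest similarity."""
    return sorted(similarities.items(), key=lambda item: item[1], reverse=True)


def propose_solution(
    similarities: Dict[str, float], solutions: Dict[str, str]
) -> Tuple[Optional[str], bool]:
    """Return (proposed diagnosis, needs_expert_revision).

    The diagnosis is taken from the most similar stored case. The flag is
    True when the best similarity is below the threshold, or when the case
    base is empty (hypothetical handling, not described in the text).
    """
    if not similarities:
        return None, True
    best_id, best_similarity = rank_cases(similarities)[0]
    return solutions[best_id], best_similarity < THRESHOLD


# Hypothetical case IDs and similarity values, for illustration only.
example_similarities = {"K001": 0.91, "K002": 0.62}
example_solutions = {"K001": "I21.1", "K002": "I21.0"}
diagnosis, needs_revision = propose_solution(example_similarities, example_solutions)
```

Returning the flag alongside the diagnosis keeps the reuse and revise decisions separate, so that a low-similarity proposal can still be shown to the expert for correction.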

A. Knowledge Acquisition
The case base was formed from a collection of medical records of cardiovascular disease inpatients at Dr. Sardjito Public Hospital, Yogyakarta. The next stage was the knowledge acquisition process, which collected knowledge from the knowledge sources. Knowledge was obtained from an expert (a cardiovascular disease specialist, SpJP). In addition to the expert, knowledge was also derived from literature related to the problem, such as books, journals and articles.

B. Case Representation
The representation is intended to capture the essential properties of problems and make that information accessible to the problem-solving procedure [3]. Case data obtained from medical records were stored in a case base. The collected cases were represented in the form of a frame. The frame contained the relations among the patient data, the illness, the risk factors and the symptoms of the case. A level of confidence (trust) was assigned to these relationships, so that a case for the CBR system consisted of a problem space, namely the risk factors and symptoms of the disease, and a solution space, namely the name of the disease.
Every risk factor and symptom has a weight that indicates its level of importance for the disease. The weight ranges from 1 to 10; the greater the weight, the more important the risk factor or symptom is in determining the patient's disease. The level of confidence expresses how sure the expert is of the diagnosis, based on the risk factors and symptoms experienced by the patient.
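One possible way to hold such a frame in code is sketched below. The field names, the gender encoding and the risk code R001 are assumptions made for illustration; the text only specifies that a case links patient data, risk factors, symptoms, a disease name and a confidence level, and that weights range from 1 to 10.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Case:
    """A frame-like case: problem space (features) plus solution space (disease)."""
    case_id: str
    age: int
    gender: str  # "M" or "F" (encoding is an assumption)
    risk_factors: Dict[str, float] = field(default_factory=dict)  # risk code -> value
    symptoms: Dict[str, int] = field(default_factory=dict)        # symptom code -> 1 if present
    disease: str = ""        # solution space: name or code of the disease
    confidence: float = 1.0  # expert confidence in the diagnosis, from 0 to 1


def check_weight(weight: int) -> int:
    """Validate that an expert-assigned weight lies in the range 1 to 10."""
    if not 1 <= weight <= 10:
        raise ValueError("weight must be between 1 and 10")
    return weight


# Hypothetical source case; the codes and values are illustrative only.
source_case = Case(
    case_id="K001",
    age=58,
    gender="M",
    risk_factors={"R001": 1.0},
    symptoms={"G004": 1, "G014": 1},
    disease="I21.1",
    confidence=1.0,
)
```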

C. Indexing
The index on a record consists of two parts, a search key (value) and a pointer. The search key is the value of a record, while the pointer is the index position of the search key. Searching case data in the retrieval process requires one or more search keys. In the development of CBR for cardiovascular disease, two search keys were developed, namely risk code and symptom code.
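A simple way to picture indexing by risk code and symptom code is an inverted index from each code to the cases that contain it. The sketch below assumes in-memory dictionaries and hypothetical case IDs; the text does not say how the system's index is actually stored.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_index(cases: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each search key (risk or symptom code) to the IDs of cases containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for case_id, codes in cases.items():
        for code in codes:
            index[code].add(case_id)
    return dict(index)


def candidate_cases(index: Dict[str, Set[str]], query_codes: Iterable[str]) -> Set[str]:
    """Return the IDs of cases that share at least one code with the query."""
    found: Set[str] = set()
    for code in query_codes:
        found |= index.get(code, set())
    return found


# Hypothetical case base: case ID -> risk and symptom codes present in the case.
example_cases = {"K001": ["G004", "R001"], "K002": ["G014"]}
example_index = build_index(example_cases)
```

Only the candidate cases returned by such a lookup would then need a full similarity calculation.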

D. Retrieval and Similarity
Retrieval is the core of CBR: the process of finding, in the case base, the cases closest to the current case. The most commonly investigated retrieval techniques so far are k-nearest neighbour, decision trees and their derivatives. These techniques use a similarity metric to determine the degree of similarity between cases [1]. In this study, the similarity method used refers to Equation (1) [10].
In Equation (1), the left-hand side is the similarity value between case C_i (new case) and case C_j (old case), n is the number of attributes in each case, k indexes the individual attributes from 1 to n, w is the weight given to the k-th attribute and r is the Minkowski factor (a positive integer). The value of r is a positive number ≥ 1 (from 1 to infinity). The research presented in this paper used r = 3. Previous research conducted by [11] showed that r = 3 resulted in maximum accuracy.
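Equation (1) itself is not reproduced in this text. A commonly used form of the weighted Minkowski similarity that is consistent with the symbol definitions above is sketched below. The name Sim and the local similarity term d_k are assumptions, and the exact form should be checked against [10].

```latex
\mathrm{Sim}(C_i, C_j) =
\left[
  \frac{\sum_{k=1}^{n} w_k^{\,r}\,\bigl| d_k(C_i, C_j) \bigr|^{r}}
       {\sum_{k=1}^{n} w_k^{\,r}}
\right]^{1/r}
```

Here d_k(C_i, C_j) stands for the local similarity of attribute k between the two cases, and w_k for the weight of attribute k.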
Weighting the features in diagnosing cardiovascular disease was necessary because particular features differ in importance. Weight values were obtained from experts (cardiovascular disease specialists).
Because of the differing weights given to features in each case and the need to handle new symptoms that may arise in a new case, Equation (1) introduced by [10] needs to be modified. Modifications to deal with similar problems were carried out by [3], who added a trust value and the handling of new symptoms, as shown in Equation (2).
The terms of Equation (2) are, in order, the similarity normalized by the level of trust, the expert's confidence level for a case in the source cases, the number of target-case symptoms that appear in the source case and the number of symptoms in the target case [14].
The modification was made to Equation (1) with reference to Equation (2), so that the research conducted used Equation (3).
The similarity of each aspect in two cases is computed by a particular local similarity function. The local similarity values are aggregated by means of a weighted sum of aspects [15]. Local similarities are divided into two types, namely symbolic and numeric. The symbolic features are the symptoms, while the numeric features are age, gender, smoking habit, body weight and so forth. The symbolic features were calculated using Equation (4) [12].
The numeric features were calculated using Equation (5) [13].
If the similarity level is high, the case is reused, that is, the old case's solution is reused as the solution for the new case. If no case meets the threshold value, the expert needs to provide a conclusion for the new case.
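To make the role of the weights and of r concrete, the sketch below aggregates local similarities with a weighted Minkowski form. The normalized form used here (weights raised to the power r, divided by the sum of the weights raised to r) is an assumption, since the paper's equation is not reproduced in this text; the function name and example values are also illustrative.

```python
from typing import Sequence


def weighted_minkowski(
    local_sims: Sequence[float], weights: Sequence[float], r: int = 3
) -> float:
    """Aggregate local similarities (each in [0, 1]) into one global value.

    Assumed form: (sum(w_k^r * |s_k|^r) / sum(w_k^r)) ** (1 / r).
    """
    if len(local_sims) != len(weights) or len(weights) == 0:
        raise ValueError("local_sims and weights must be non-empty and of equal length")
    if r < 1:
        raise ValueError("r must be >= 1")
    denominator = sum(w ** r for w in weights)
    if denominator == 0:
        raise ValueError("at least one weight must be non-zero")
    numerator = sum((w ** r) * (abs(s) ** r) for s, w in zip(local_sims, weights))
    return (numerator / denominator) ** (1.0 / r)


# Hypothetical local similarities and expert weights (1 to 10) for four features.
global_similarity = weighted_minkowski([1.0, 0.0, 0.8, 1.0], [10, 3, 6, 8], r=3)
```

With this form, identical cases give a global similarity of 1 and a feature with a large weight dominates the result as r grows.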
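Equations (4) and (5) are not reproduced in this text. The sketch below shows the forms most often used for symbolic and numeric local similarity (exact match, and one minus the normalized absolute difference); treating these as the paper's equations is an assumption, and the function names are illustrative.

```python
def symbolic_similarity(a: object, b: object) -> float:
    """Return 1.0 when the two symbolic values match, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def numeric_similarity(a: float, b: float, value_range: float) -> float:
    """Return 1 - |a - b| / value_range, clipped at 0.

    value_range is the width of the feature's scale (max - min) and is
    assumed to be known for each numeric feature.
    """
    if value_range <= 0:
        raise ValueError("value_range must be positive")
    return max(0.0, 1.0 - abs(a - b) / value_range)


# Illustrative values: differing gender, and age groups 3 and 5 on a scale of width 5.
gender_sim = symbolic_similarity("M", "F")
age_sim = numeric_similarity(3, 5, value_range=5)
```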

E. Case Revision
Case revision is the part of system adaptation performed by an expert. The expert revises the name of the disease and the level of confidence for any diagnosis whose similarity is lower than 0.8. After being revised, the case is stored in the case base as a new case.

F. System Implementation
The system distinguishes 2 categories of user, namely expert and paramedic. Each category has access to the system with different facilities. Experts have access to add new users to the system, enter knowledge data, enter and revise cases resulting from a diagnosis, and perform diagnoses. Paramedics have access to input patients' data, diagnose new cases and store new cases.

G. System Assessment
System assessment is carried out by performing diagnostic tests to measure the system's ability to detect the disease or to exclude a person without the disease. As explained in [2], sensitivity and specificity are used to determine the accuracy of diagnostic tests. Predictive values can be used to estimate disease probabilities, but positive and negative predictive values vary with the prevalence of the disease.
The analysis used 4 parameters, namely TP, FP, TN and FN, which were subsequently used to calculate sensitivity, specificity, PPV and NPV with Equations (6), (7), (8) and (9) [16]. Here TP (True Positive) is a positive diagnosis result for a positive data sample, TN (True Negative) is a negative diagnosis result for a negative data sample, FP (False Positive) is a positive diagnosis result for a negative data sample, FN (False Negative) is a negative diagnosis result for a positive data sample, P is the total of positive diagnosis results and T is the total of negative diagnosis results. These values appear in the confusion matrix.
According to [16], the confusion matrix is a useful way to analyze how well the system recognizes tuples of different classes. TP and TN indicate when the system is correct, while FP and FN indicate when the system is incorrect. Sensitivity and specificity can be used to measure classification accuracy. Sensitivity is the true positive (recognition) rate, that is, the proportion of positive tuples that are correctly identified, while specificity is the true negative rate, that is, the proportion of negative tuples that are correctly identified. Sensitivity and specificity can be combined to give the accuracy level using Equation (10), and the system error rate can be calculated using Equation (11).
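Equations (6) to (9) are not reproduced in this text. For reference, the standard definitions of the four measures in terms of the confusion-matrix counts are given below; they are assumed to match the forms in [16].

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP}, \\
\text{PPV} &= \frac{TP}{TP + FP}, &
\text{NPV} &= \frac{TN}{TN + FN}.
\end{aligned}
```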
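The measures above can be computed from the four counts as in the following sketch. The counts TP = 18, FP = 1, TN = 5 and FN = 0 are not stated in the text: they are inferred here as the smallest counts that reproduce the reported percentages and should be treated as an assumption. The accuracy is written as a combination of sensitivity and specificity weighted by class proportions, which is assumed to be the form of Equation (10); in the code, positives and negatives denote the numbers of actual positive and negative samples.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute diagnostic test measures from confusion-matrix counts."""
    positives = tp + fn  # samples that actually have the disease
    negatives = tn + fp  # samples that actually do not have the disease
    total = positives + negatives
    if min(positives, negatives, tp + fp, tn + fn) == 0:
        raise ValueError("every class and every predicted class needs at least one sample")
    sensitivity = tp / positives
    specificity = tn / negatives
    # Accuracy as a class-weighted combination of sensitivity and specificity;
    # this equals (tp + tn) / total.
    accuracy = sensitivity * positives / total + specificity * negatives / total
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
    }


# Counts inferred from the reported percentages (assumption, see lead-in).
metrics = diagnostic_metrics(tp=18, fp=1, tn=5, fn=0)
percentages = {name: round(value * 100, 2) for name, value in metrics.items()}
```

With the inferred counts, the rounded percentages agree with the values reported for the system (100, 83.33, 94.74, 100, 95.83 and 4.17).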

A. Case Base Filling Process
The initial stage in using the CBR system was the preparation and filling of the case base. The case data entered into the case base were medical records of inpatients obtained from the medical records installation of Dr. Sardjito Public Hospital, Yogyakarta. There were 126 cases with 74 symptoms and 9 types of risk factors, all of disease class I21 (Acute Myocardial Infarction).
Symptoms and risk factors have weights that indicate their level of importance. The weights were obtained from the expert and range from 1 to 10. Before filling the case base, users must first input patient data, disease data, symptom data and risk factor data into the system.

B. Diagnostic Process
Generally, doctors can diagnose cardiovascular disease in several ways. The first is to consider the risk factors and the symptoms felt by the patient, which is known as anamnesis. Another is to perform laboratory tests to confirm the diagnosis. The CBR system performs the diagnostic process by anamnesis.
The diagnostic process began with selecting the patient to be diagnosed and then entering the patient's symptoms and risk factors into the system using the input facilities provided. Once all data had been entered, the system performed the retrieve process and calculated the similarity level between the new case and each case in the case base using the weighted Minkowski method.
Each case was assessed on 4 components, namely symptoms, risk factors, gender and age. Symptoms were assessed on the basis of their appearance in a case: a symptom was valued 1 if it appeared and 0 otherwise. Gender was categorized as male or female, while patient age was grouped into 6 age ranges. Several risk factors were assessed by their appearance in a case (as with symptoms), but two risk factors, namely disease history and smoking, were calculated based on a certain range.
For example, consider the case in the case base shown in Table 1. The user diagnosed a new patient with the data entered into the system as shown in Figure 1. Based on this example, the system calculated the similarity of the two cases after the user clicked the Diagnose Result button.
The calculation of local similarity in the case was divided into 4 (four) parts, namely age, gender, risk factors and symptoms. Age and risk factors were calculated using Equation (5), the proximity of sex and symptoms used Equation (4), and the global similarity was calculated using Equation (3). Gender proximity = 0, because the old case and the new case differ. Symptom proximity: symptoms G004, G014, G021, G028, G030, G036, G040, G052, G062 are valued 1 because both cases have the symptoms. Symptom G062 is valued 0 because only the old case (case base) has the symptom. Disease: I21.1, with a 100% confidence level.
The above calculation shows that the system's ability to recognize disease I21 correctly was 100% (sensitivity), its ability to correctly recognize disease that is not I21 was 83.33% (specificity), the positive predictive value (PPV) was 94.74%, the negative predictive value (NPV) was 100%, and the accuracy was 95.83% with an error rate of 4.17%.
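The encoding of symptoms and age described above might look like the following sketch. The age boundaries are hypothetical, since the text states only that 6 age ranges were used without giving them, and the function names are illustrative.

```python
from bisect import bisect_right
from typing import Iterable, List, Sequence

# Hypothetical boundaries yielding 6 age groups; the actual ranges are not given in the text.
AGE_BOUNDS = [30, 40, 50, 60, 70]


def age_group(age: int) -> int:
    """Return an age-group index from 0 (youngest) to 5 (oldest)."""
    return bisect_right(AGE_BOUNDS, age)


def encode_symptoms(present: Iterable[str], all_codes: Sequence[str]) -> List[int]:
    """Encode symptoms as 1 if present in the case and 0 otherwise."""
    present_set = set(present)
    return [1 if code in present_set else 0 for code in all_codes]


# Symptom codes taken from the example in the text; the selection is illustrative.
symptom_vector = encode_symptoms(["G004", "G021"], ["G004", "G014", "G021"])
```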

V. CONCLUSION
Based on the research and the tests conducted, it can be concluded that this study produced a case-based reasoning system that uses the weighted Minkowski similarity method for the early diagnosis of cardiovascular disease. The system performs the diagnostic process by taking into account the proximity between the cases in the case base and the target case, based on the patient's condition (symptoms and risk factors), sex and age. Tests of the system using medical records of patients with disease I21 (covered by the case base) and medical records of patients with disease I50 (not covered by the case base) indicated that the system recognized disease I21 correctly with a sensitivity of 100% and recognized non-I21 disease with a specificity of 83.33%, with an accuracy of 95.83% and an error rate of 4.17%.

TABLE I. CASE BASE SAMPLE