Diabetes Prediction Empowered with Multi-level Data Fusion and Machine Learning

Technology improvements have benefited the medical industry, especially in the area of diabetes prediction. To find patterns and risk factors related to diabetes, machine learning and Artificial Intelligence (AI) are vital in the analysis of enormous volumes of data, including medical records, lifestyle variables, and biomarkers. This makes tailored management and early detection possible, which might revolutionize healthcare. This study examines how machine learning algorithms may be used to identify diseases, with an emphasis on diabetes prediction. The proposed Diabetes Prediction Empowered with Multi-level Data Fusion and Machine Learning (DPEMDFML) model combines two distinct types of models, the Artificial Neural Network (ANN) and the Support Vector Machine (SVM), to create a fused machine learning technique. Two separate datasets were used for training and testing the model in order to assess its performance. To ensure a thorough evaluation of the model's prediction ability, the datasets were split in two experiments in proportions of 70:30 and 75:25, respectively. The study's findings were encouraging, with the ANN algorithm obtaining an accuracy of 97.43%. This indicates that the model identified instances of diabetes with a high degree of accuracy. Further assessment and validation of its performance using various measures would give a more thorough understanding of the model's prediction ability.


I. INTRODUCTION
The chronic metabolic condition known as diabetes affects millions of people worldwide. The World Health Organization projects that by 2030, 643 million people worldwide will have diabetes, up from an expected 537 million in 2021 [1]. Diabetes is brought on by abnormalities in insulin synthesis or function, which hinder the body from effectively managing blood sugar levels. People of all ages are affected, and untreated diabetes can have detrimental effects on health. In autoimmune type 1 diabetes, the body's immune system mistakenly attacks and destroys the insulin-producing cells of the pancreas [2]. It usually appears during childhood or adolescence and requires lifelong insulin therapy. Obesity, inactivity, and poor eating habits are commonly linked to the majority of type 2 diabetes cases [3]. Type 2 diabetes is characterized by a decrease in the body's ability to produce enough insulin to maintain normal blood sugar levels or by an increase in insulin resistance [3]. Numerous complications can result from unmanaged diabetes. For diabetics, cardiovascular disease, such as heart attacks and strokes, is a major concern. Kidney problems, nerve damage (neuropathy), retinopathy, and foot problems are further consequences of diabetes [4]. These complications can significantly affect quality of life and require continual medical care.
Traditional diabetes prediction systems face a number of problems. These techniques frequently depend on simplistic statistical models or rudimentary machine learning algorithms, which are incapable of capturing the intricate interplay of many risk variables. Furthermore, these techniques may underutilize accessible data sources such as patient medical records, genetic information, lifestyle variables, and environmental factors. As a result, the accuracy and reliability of diabetes prediction using these traditional methods are inadequate.
Machine learning (ML), a subset of artificial intelligence, has transformed several industries, including healthcare. It involves developing algorithms and models that learn from data and act or predict without being explicitly programmed. Machine learning techniques have shown great promise in the medical sector's decision-making processes for disease prediction, diagnosis, and treatment. Researchers have investigated the merging of different ML methods for diabetes prediction in order to overcome the limitations of existing methodologies (Table I). Fusing several algorithms makes it possible to use each method's distinct strengths while compensating for its particular weaknesses, thereby improving prediction accuracy. A fused machine learning model can give a more thorough and holistic view of the condition by merging diverse data sources such as electronic health records, medical imaging, genetic profiles, and lifestyle data [5]. An ML-based diagnostic system can help detect diabetic patients early, which can improve patient outcomes and lessen the burden of diabetes on individuals and healthcare systems. This paper presents a framework that uses machine learning fusion to achieve early diagnosis of diabetes patients. The system aims to increase the accuracy and efficacy of diabetes diagnosis by combining various machine learning algorithms and diverse datasets. This approach supports proactive healthcare interventions and ultimately improves patient outcomes.
The proposed Diabetes Prediction Empowered with Multi-level Data Fusion and Machine Learning (DPEMDFML) model framework is presented for diabetes disease prediction. It is carried out using the ANN and SVM algorithms on two different datasets.
The Internet of Medical Things (IoMT) is necessary for enhancing the accuracy, reliability, and efficacy of electronic equipment in the medical field. By integrating existing healthcare assets and medical facilities, experts are advancing a digital medical system [6]. Prompt diagnosis and improved ongoing treatment make waves of infectious disease easier to control. The IoMT is a growing area of technology that is now being used to assist point-of-care testing (POCT). Using the IoMT, POCT devices may operate wirelessly and be connected to health professionals and medical facilities [7].
It has recently been found that ANNs may perform well in a variety of circumstances due to their universal prediction capabilities and adaptable network architectures [8]. The building block of the ANN is designed to mimic the function of a human neuron. SVM is also one of the most effective methods for analyzing data. SVMs operate by controlling the generalization capacity of the model [9]. SVM is an artificial intelligence method that assigns labels to objects by learning from examples [10].
The IoMT framework presented in this study represents a step forward in diabetes disease prediction. Drawing upon the capabilities of two machine learning algorithms, ANN and SVM, this framework combines advanced technology and healthcare, offering a transformative approach to diabetes management and patient care. At its core, the IoMT framework capitalizes on the vast amount of data generated by interconnected medical devices, wearable sensors, and health monitoring systems. By harnessing this continuous and diverse stream of patient-specific information, healthcare providers gain insights into the multifaceted aspects of diabetes, allowing for more precise, proactive, and personalized interventions.
The first pillar of the framework, the Artificial Neural Network (ANN), is a computational model inspired by the complex interconnections of neurons in the human brain. The ANN's ability to learn from data and recognize intricate patterns and nonlinear relationships makes it a suitable candidate for diabetes prediction. The network's architecture uses multiple layers of interconnected neurons to extract high-level features from raw input data. The ANN adjusts its internal parameters during the learning process, optimizing the model's performance to achieve highly accurate diabetes predictions.
In tandem with the ANN, the IoMT framework also incorporates the Support Vector Machine (SVM) algorithm, known for its strength in binary classification tasks and its ability to handle complex decision boundaries. SVM's kernel-based approach allows it to efficiently discover non-linear patterns in the feature space, making it valuable for diabetes prediction when the relationship between features and disease occurrence is intricate and not easily separable.
By integrating the capabilities of both ANN and SVM, the IoMT framework achieves an ensemble of predictive models that complement each other's strengths. The diversity of these algorithms enhances the framework's ability to capture subtle nuances and intricate interactions within the data, ultimately leading to more reliable and accurate diabetes predictions. Data privacy and security are of paramount concern within the IoMT framework. Stringent measures are implemented to anonymize and safeguard patient information, and access controls are enforced to protect sensitive data from unauthorized disclosure. The framework's design ensures that data is used solely for model training purposes, mitigating the risk of data breaches and preserving patient confidentiality. The integration of ANN and SVM algorithms within the IoMT framework marks a significant step towards personalized and data-driven diabetes prediction. With the potential to change healthcare practices, this approach gives clinicians actionable insights, fosters early detection, and facilitates effective diabetes management, ultimately enhancing the quality of life for patients worldwide.
The structure of the research paper is as follows: Section II presents the related work. Section III presents the contribution. The proposed model is described in detail in Section IV. The results are analysed and discussed in Section V. The conclusion of this research is presented in Section VI.

II. RELATED WORK
The presented findings encompass various studies that examined different healthcare databases and used diverse approaches and strategies to make predictions. Researchers have developed and employed a range of prediction models, incorporating various data mining techniques, machine learning algorithms, or a combination of these strategies. These studies highlight the wide array of approaches used in healthcare research to enhance prediction accuracy and improve decision-making processes.
Akkarapol and Jongsawas [11] presented a paper that analysed a dataset comprising 50,788 records with 43 parameters. The research identified significant risk variables, including age, BMI, overall income, sex, heart attack history, marital status, dentist check-up frequency, and diagnosis of asthma. Other risk factors such as hypertension and cholesterol were also recognized. The study's overall reliability was reported as 77.11%, indicating a moderate level of consistency in the findings. Furthermore, the true negative rate for the Artificial Neural Network (ANN) model was noted as 79.45%, indicating its ability to accurately identify negative cases.
Kavakiotis et al.'s paper [12] focused on evaluating data mining and machine learning techniques for diabetes mellitus (DM) research. Through the systematic comparison of three algorithms, Logistic Regression, Naive Bayes, and SVM, using 10-fold cross-validation, the study concluded that SVM achieved the highest accuracy rate of 84%. These findings contribute to the understanding of algorithm selection in DM research, highlighting the potential benefits of SVM in achieving accurate predictions and improving decision-making processes.
Xue-Hui Meng et al.'s study [13] focused on comparing the performance of decision tree models, ANNs, and logistic regression in diagnosing diabetes or prediabetes based on general risk variables. The logistic regression model achieved a classification accuracy of 76.13%, indicating its ability to correctly classify individuals as having diabetes or prediabetes based on the general risk variables considered in the study. The decision tree model (C5.0) demonstrated a slightly higher classification accuracy of 77.87%. It also showed a relatively high sensitivity of 80.68%, meaning it successfully identified a large proportion of True Positive (TP) cases, and a specificity of 75.13%, indicating its capability to accurately identify True Negative (TN) cases. In contrast, the ANN model obtained a lower classification accuracy of 73.23%, suggesting that it was less effective in predicting the disease outcomes using the same set of general risk variables.
The research work conducted by Md. Faisal Faruque, Asaduzzaman, and Iqbal [14] explored the relationship between diabetes mellitus and multiple risk factors through the analysis of 16 attributes, including age, diet, hypertension, vision problems, and genetic predisposition. Using four popular machine learning algorithms, the researchers examined data from 200 patients. The findings indicated that the Decision Tree algorithm demonstrated superior predictive performance compared to the Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbour (KNN) algorithms, suggesting its potential efficacy in predicting or classifying the disease based on the identified risk factors.
The study conducted by Dey et al. [15] used four well-known supervised machine learning algorithms to analyse the Pima Indian dataset: SVM, KNN, Naive Bayes, and ANN with MMS. These algorithms were selected for their ability to learn from labelled data and make predictions based on learned patterns and relationships. The study revealed that the ANN model with MMS achieved the highest accuracy rate of 82.35%, indicating its potential effectiveness compared to the other three algorithms examined.
The research of Pradhan et al. [16] employed supervised learning, which involves training models on labelled data to make predictions, to develop models for diabetes diagnosis. Additionally, they used hybrid learning, which combines multiple learning techniques, to further enhance the performance of the diagnostic models. Finally, the researchers explored ensemble learning, an approach that combines the predictions of multiple individual models, to create a more robust and accurate diabetes diagnosis model. The results of the study demonstrated that the ensemble learning approach surpassed both supervised learning and hybrid learning in terms of accuracy.

III. CONTRIBUTION
In contrast to previous research, this Diabetes Prediction Empowered with Multi-level Data Fusion and Machine Learning (DPEMDFML) model represents a more comprehensive study that explores various commonly used techniques for diabetes identification. The primary objective is to compare the performance of these techniques and identify the most effective one. This is accomplished by employing two distinct algorithms and evaluating them on two different datasets, considering all relevant evaluation metrics. Furthermore, this study analyses the significance of each attribute in influencing the classification outcome. This analysis provides valuable insights for future research to adapt and improve the dataset, making it more informative and suitable for diabetes diagnosis tasks.

IV. PROPOSED MODEL
The Diabetes Prediction Empowered with Multi-level Data Fusion and Machine Learning (DPEMDFML) model developed here seeks to predict diabetes in a smart healthcare system using data from the Internet of Medical Things (IoMT). It is divided into two stages, training and testing, as shown in Fig. 1. During the Training Phase, hospitals (Hospitals A, B, C, and N) use IoMT devices to gather patient data, which is subsequently recorded in their respective local databases. This information might include vital indicators, blood glucose levels, lifestyle information, and other information. The 'Prediction Layer,' which houses multiple ML models, with a focus on Support Vector Machines (SVM) and Artificial Neural Networks (ANN), is at the core of this phase. These models excel at classification tasks and are in charge of learning whether a patient has diabetes depending on the input data. Following the Prediction Layer, as Fig. 1 shows, the Performance Layer assesses the efficiency of the ML models using measures such as accuracy, miss rate, and sensitivity. Models that meet the performance criteria are saved in the public cloud as the "DPEMDFML Generalized Model", while those that fall short go through additional training rounds to enhance accuracy.
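The retrain-until-acceptable behaviour of the Performance Layer can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the accuracy threshold of 0.90, and the retraining budget are all hypothetical, and the toy training and evaluation functions stand in for real SVM or ANN training.

```python
from typing import Callable, Dict, Optional, Tuple


def train_until_acceptable(
    train_once: Callable[[int], object],
    evaluate: Callable[[object], Dict[str, float]],
    min_accuracy: float = 0.90,  # hypothetical acceptance criterion
    max_rounds: int = 5,         # hypothetical retraining budget
) -> Optional[Tuple[object, Dict[str, float], int]]:
    """Retrain until the accuracy criterion is met or the budget runs out."""
    for round_no in range(1, max_rounds + 1):
        model = train_once(round_no)
        metrics = evaluate(model)
        if metrics["accuracy"] >= min_accuracy:
            # A model reaching this point would be stored as the generalized model.
            return model, metrics, round_no
    return None  # no training round met the criterion


# Toy stand-ins: the "model" is just its round number, and accuracy
# improves with each round according to a fixed lookup table.
TOY_ACCURACY = {1: 0.82, 2: 0.88, 3: 0.93}


def toy_train(round_no: int) -> int:
    return round_no


def toy_evaluate(model: int) -> Dict[str, float]:
    return {"accuracy": TOY_ACCURACY.get(model, 0.93)}


result = train_until_acceptable(toy_train, toy_evaluate)
```

With the toy functions, the third round is the first to meet the threshold, so it is the one returned. In a real deployment the criterion would likely combine several measures (for example accuracy and miss rate) rather than accuracy alone.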
The trained DPEMDFML Generalized Model is used in the Testing Phase. When new patients from Hospital N seek diabetes diagnosis, the system gathers raw data from IoMT devices, which is then processed. Data cleansing, value normalization, and missing data management are examples of pre-processing operations that ensure the input data is suitable for the ML models' predictions.
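The pre-processing operations named above can be illustrated with a short sketch. The source does not state which imputation or scaling method was used, so mean imputation and min-max scaling are assumptions chosen for illustration; the function names and the sample records are hypothetical.

```python
from typing import List, Optional

Row = List[Optional[float]]


def impute_column_means(rows: List[Row]) -> List[List[float]]:
    """Replace None entries with the mean of the observed values in that column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Rescale every column to [0, 1]; constant columns map to 0.0."""
    n_cols = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    return [
        [
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_cols)
        ]
        for r in rows
    ]


# Hypothetical records: [glucose (mg/dL), BMI]; None marks a missing reading.
raw = [[100.0, 20.0], [None, 30.0], [200.0, None]]
clean = min_max_normalize(impute_column_means(raw))
```

When such steps feed a trained model, the column means and ranges would normally be computed on the training data only and then reused unchanged for new patients, so that test-time inputs are scaled consistently.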
The DPEMDFML Generalized Model is then used to predict whether or not the patient has diabetes. This decision-making procedure has two outcomes: if the model predicts diabetes, the patient is referred to a specialist for prompt medical intervention; if the model predicts a negative outcome, the data is properly deleted, protecting patient confidentiality and data privacy.
Because the system is distributed, various hospitals can contribute data, resulting in a broad and complete dataset for model training. Furthermore, the cloud-based architecture improves accessibility and scalability, allowing the system to meet growing data volumes as well as changing healthcare demands. The system benefits from the capabilities of SVM and ANN, its major ML models, in pattern recognition, feature extraction, and classification, resulting in accurate diabetes predictions. Furthermore, the system's iterative training technique allows for continuous development, keeping the models current with medical advances.
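The text does not specify the rule by which the SVM and ANN outputs are fused. One common option, shown here purely as an assumption, is weighted soft voting: average the two models' predicted probabilities of diabetes and threshold the result. The function name, weights, threshold, and sample probabilities are all hypothetical.

```python
from typing import List, Sequence


def fuse_soft_vote(
    p_ann: Sequence[float],
    p_svm: Sequence[float],
    w_ann: float = 0.5,      # hypothetical weight given to the ANN
    threshold: float = 0.5,  # hypothetical decision threshold
) -> List[int]:
    """Weighted average of two positive-class probabilities, thresholded to 0/1."""
    if len(p_ann) != len(p_svm):
        raise ValueError("probability lists must have the same length")
    if not 0.0 <= w_ann <= 1.0:
        raise ValueError("w_ann must lie in [0, 1]")
    fused = [w_ann * a + (1.0 - w_ann) * s for a, s in zip(p_ann, p_svm)]
    return [1 if p >= threshold else 0 for p in fused]


# Hypothetical per-patient probabilities of diabetes from each model.
ann_probs = [0.9, 0.4, 0.2]
svm_probs = [0.7, 0.8, 0.1]
labels = fuse_soft_vote(ann_probs, svm_probs)
```

The second patient shows why fusion can matter: the ANN alone would predict no diabetes (0.4), while the averaged probability crosses the threshold. Other fusion rules, such as majority voting or a stacked meta-classifier, would fit the same description.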
The relevance of this ML-driven approach resides in its potential to improve diabetes diagnosis and patient treatment. The approach leverages the available information by using data from IoMT devices across many hospitals, resulting in more reliable and exact predictions. The capacity to detect diabetic patients quickly and give early medical treatment supports improved disease control and may improve patient outcomes. As the system evolves, its influence on the healthcare environment is expected to go beyond diabetes diagnosis, with the ability to tackle additional medical difficulties using a similar distributed, ML-based approach.
The distributed, cloud-based machine learning system for diabetes detection using IoMT data is a potential improvement in healthcare technology. Its training and testing phases, which are supported by SVM and ANN models, show that it can handle complicated medical data and make correct predictions. As the system evolves via iterative training and embraces an ever-growing dataset, it is positioned to influence the future of medical diagnosis, eventually improving patient care and contributing to the healthcare industry's continuing transformation.

A. Datasets
The Diabetes Prediction Empowered with Multi-level Data Fusion and Machine Learning (DPEMDFML) model used two different datasets. The primary dataset employed in this research is the PIMA Indian Diabetes Database, available from the University of California, Irvine (UCI) Machine Learning Repository [14]. The dataset contains information on 768 individuals, all of whom are female, with ages ranging from 21 to 81 years. For each individual, the dataset consists of nine attributes. These comprise eight continuous quantitative variables, namely the number of pregnancies, blood glucose level (in mg/dL), diastolic blood pressure (in mmHg), skin fold thickness (in mm), body mass index (BMI), serum insulin level (in μU/mL), age (in years), and a pedigree function associated with diabetes, together with the diabetes outcome label. Using this dataset, the study aims to explore the relationships between these features and diabetes occurrence, enabling the development of predictive models for early detection and assessment of diabetes risk in female patients.
The second dataset used in this paper is the "Diabetes prediction dataset," sourced from Electronic Health Records (EHRs) [15]. The dataset contains information on a substantial sample of 100,000 individuals, collected from diverse healthcare providers and then aggregated into a unified dataset. Notably, this dataset includes both female and male participants. The Diabetes prediction dataset consists of eight features for each individual: age, gender, hypertension, heart disease, smoking history, BMI (body mass index), HbA1c level (glycated haemoglobin level), and glucose level. Using this dataset, the study aims to explore the relationships between these features and diabetes prediction. The inclusion of both genders and the diverse range of features in this dataset facilitates a comprehensive analysis, providing valuable insights into predicting diabetes and its associated risk factors.

V. RESULTS AND DISCUSSION
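This section presents the results of diabetes prediction using two different machine learning models: Support Vector Machine (SVM) and Artificial Neural Network (ANN). The prediction is conducted on two distinct datasets, and each dataset is split into training and testing sets in two different ratios: 70:30 and 75:25. A range of evaluation metrics is then calculated, including accuracy, misclassification rate, sensitivity, specificity, precision, false positive (FP) rate, false discovery rate, false omission rate, positive likelihood ratio, negative likelihood ratio, prevalence threshold, critical success index, F1 score, Matthews correlation coefficient, Fowlkes-Mallows index, informedness, and diagnostic odds ratio. Equations (1) to (17) give the standard definitions of these metrics [17][18][19][20][21][22][23], where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives. This diverse set of metrics ensures a comprehensive assessment of the models' performance, accounting for different aspects of predictive accuracy and error rates. Python is used as the simulation tool for implementing both the SVM model and the ANN model.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\text{Misclassification rate} = \frac{FP + FN}{TP + TN + FP + FN} \tag{2}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{3}$$
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{4}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{5}$$
$$\text{False positive rate} = \frac{FP}{FP + TN} \tag{6}$$
$$\text{False discovery rate} = \frac{FP}{FP + TP} \tag{7}$$
$$\text{False omission rate} = \frac{FN}{FN + TN} \tag{8}$$
$$\text{Positive likelihood ratio} = \frac{\text{Sensitivity}}{1 - \text{Specificity}} \tag{9}$$
$$\text{Negative likelihood ratio} = \frac{1 - \text{Sensitivity}}{\text{Specificity}} \tag{10}$$
$$\text{Prevalence threshold} = \frac{\sqrt{\text{Sensitivity} \times \text{False positive rate}} - \text{False positive rate}}{\text{Sensitivity} - \text{False positive rate}} \tag{11}$$
$$\text{Critical success index} = \frac{TP}{TP + FN + FP} \tag{12}$$
$$\text{F1 score} = \frac{2\,TP}{2\,TP + FP + FN} \tag{13}$$
$$\text{Matthews correlation coefficient} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{14}$$
$$\text{Fowlkes-Mallows index} = \sqrt{\text{Precision} \times \text{Sensitivity}} \tag{15}$$
$$\text{Informedness} = \text{Sensitivity} + \text{Specificity} - 1 \tag{16}$$
$$\text{Diagnostic odds ratio} = \frac{\text{Positive likelihood ratio}}{\text{Negative likelihood ratio}} \tag{17}$$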
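All seventeen metrics listed above depend only on the four confusion-matrix counts, so they can be computed by one small function. The sketch below uses the standard textbook definitions; the function name and dictionary keys are illustrative rather than taken from the authors' code, and the example counts are the SVM testing counts reported for the Pima 70:30 split (48 TP, 33 FN, 124 TN, 26 FP).

```python
import math
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Compute the 17 metrics from confusion-matrix counts.

    Assumes all four counts are positive; a zero count can make some
    ratios undefined (division by zero).
    """
    total = tp + fn + tn + fp
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    ppv = tp / (tp + fp)  # precision
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    lr_pos = tpr / fpr
    lr_neg = fnr / tnr
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "sensitivity": tpr,
        "specificity": tnr,
        "precision": ppv,
        "false_positive_rate": fpr,
        "false_discovery_rate": fp / (fp + tp),
        "false_omission_rate": fn / (fn + tn),
        "positive_likelihood_ratio": lr_pos,
        "negative_likelihood_ratio": lr_neg,
        "prevalence_threshold": (math.sqrt(tpr * fpr) - fpr) / (tpr - fpr),
        "critical_success_index": tp / (tp + fn + fp),
        "f1_score": 2 * tp / (2 * tp + fp + fn),
        "matthews_corrcoef": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "fowlkes_mallows_index": math.sqrt(ppv * tpr),
        "informedness": tpr + tnr - 1,
        "diagnostic_odds_ratio": lr_pos / lr_neg,
    }


# SVM testing counts for the Pima dataset, 70:30 split, as reported in the text.
m = diagnostic_metrics(tp=48, fn=33, tn=124, fp=26)
```

Because accuracy and misclassification rate share a denominator, they always sum to one, which gives a quick sanity check on any reported pair of values.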

A. DPEMDFML - SVM System Model - using Pima Diabetes Dataset - 70:30
Using the SVM model with the Pima Diabetes Dataset, the dataset is divided into 30% for testing (n=231) and 70% for training (n=537) to assess the model's performance accurately. The performance evaluation of the SVM model is depicted in Table II and Table III, which present the confusion matrices. The confusion matrix provides crucial insights into the model's predictive accuracy, enabling a detailed examination of how well the SVM algorithm classifies diabetes and non-diabetes cases in the dataset.
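The reported sample sizes can be reproduced with a few lines of arithmetic. The sketch below assumes the test share is rounded up to a whole sample, which is an assumption that happens to match every size quoted in this paper (30% of 768 is 230.4, reported as 231); the function names and the seed are hypothetical.

```python
import random
from typing import List, Tuple


def split_sizes(n_samples: int, test_percent: int) -> Tuple[int, int]:
    """Return (n_train, n_test), rounding the test share up to a whole sample."""
    n_test = -(-n_samples * test_percent // 100)  # integer ceiling division
    return n_samples - n_test, n_test


def shuffled_split(
    n_samples: int, test_percent: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle row indices reproducibly and cut them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train, _ = split_sizes(n_samples, test_percent)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = shuffled_split(768, 30)
```

Working with integer percentages avoids floating-point rounding surprises when the product is not a whole number.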
During the training phase, the SVM model's predictions for diabetes disease are presented in Table II. The training dataset consists of 537 samples, of which 187 are actual positive samples, indicating the presence of diabetes, and 350 are actual negative samples, indicating the absence of diabetes. Among the actual positive samples, the SVM model correctly identifies 117 samples as positive, correctly signaling the presence of diabetes. However, the model misclassifies 70 records as negative, incorrectly suggesting the absence of diabetes when it is actually present. Among the actual negative samples, the SVM model correctly predicts 309 samples as negative, appropriately identifying the absence of diabetes. However, the model makes errors in 41 samples, wrongly classifying them as positive and thereby indicating the presence of diabetes when there is none.
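The four counts described above (for example 117 TP, 70 FN, 309 TN, and 41 FP on the training set) come from comparing each predicted label with the actual one. A minimal sketch of that tally follows; the function name is illustrative, the label encoding (1 for diabetes, 0 for no diabetes) is an assumption, and the toy labels are hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FN, TN, FP for binary labels where 1 = diabetes, 0 = no diabetes."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fn": 0, "tn": 0, "fp": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1:
            # Actual positive: a hit is a true positive, a miss is a false negative.
            counts["tp" if predicted == 1 else "fn"] += 1
        else:
            # Actual negative: a correct rejection is a true negative,
            # a false alarm is a false positive.
            counts["tn" if predicted == 0 else "fp"] += 1
    return counts


# Hypothetical toy labels, not taken from either dataset.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
cm = confusion_counts(y_true, y_pred)
```

Note that TP + FN always equals the number of actual positives and TN + FP the number of actual negatives, which is how the row totals in each confusion matrix can be checked.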
During the testing phase, the SVM model's predictions for diabetes disease are presented in Table III. The testing dataset consists of 231 samples, of which 81 are actual positive samples, indicating the presence of diabetes, and 150 are actual negative samples, indicating the absence of diabetes.
Among the actual positive samples, the SVM model correctly identifies 48 samples as positive, correctly signaling the presence of diabetes. However, the model misclassifies 33 records as negative, incorrectly suggesting the absence of diabetes when it is actually present. Among the actual negative samples, the SVM model correctly predicts 124 samples as negative, properly recognizing the absence of diabetes. In 26 samples, however, the model misclassifies them as positive, implying the presence of diabetes when there is none.

B. DPEMDFML - SVM System Model - using Pima Diabetes Dataset - 75:25
For the 75:25 split, among the actual positive cases in the training set, the SVM algorithm correctly classifies 124 samples as positive, accurately detecting the presence of diabetes. However, the algorithm misclassifies 79 samples as negative, falsely suggesting the absence of diabetes when it is actually present. Regarding the actual negative cases, the SVM model correctly classifies 330 samples as negative, properly recognizing the absence of diabetes. Nevertheless, the model misclassifies 43 samples as positive, falsely indicating the presence of diabetes when there is, in fact, none.
During the testing phase, Table VI shows the SVM model's predictions for diabetes disease. The testing dataset consists of 192 samples, of which 65 are actual positive samples, indicating the presence of diabetes, and 127 are actual negative samples, indicating the absence of diabetes. Among the actual positive samples, the SVM model correctly identifies 36 samples as positive, correctly signaling the presence of diabetes. However, the model misclassifies 29 records as negative, incorrectly suggesting the absence of diabetes when it is actually present. Among the actual negative samples, the SVM model correctly predicts 105 samples as negative, appropriately identifying the absence of diabetes. However, the model makes errors in 22 samples, wrongly classifying them as positive and thereby indicating the presence of diabetes when there is none.

C. DPEMDFML - SVM System Model - using EHRs Dataset - 70:30
The SVM model was applied in this study to the EHRs Dataset (Electronic Health Records Dataset). To ensure a robust evaluation of the model's performance, the dataset was divided into 30% for testing (n=30,000) and 70% for training (n=70,000). The performance of the SVM model was analysed using two evaluation tables, Table VIII and Table IX, both presenting the confusion matrix.
During the training phase, the SVM model's diabetes predictions are presented in Table VIII. The training dataset consists of 70,000 records, of which 5,972 are actual positive cases, indicating the presence of diabetes, and 64,028 are actual negative cases, indicating the absence of diabetes. Among the actual positive cases, the SVM model correctly identifies 3,621 samples as positive, correctly indicating the presence of diabetes. However, the model misclassifies 2,351 records as negative, falsely signalling the absence of diabetes when it is present. Among the actual negative cases, the SVM model accurately predicts 63,602 samples as negative, correctly identifying the absence of diabetes. However, the model makes errors in 426 samples, incorrectly classifying them as positive and falsely indicating the presence of diabetes.
During the testing phase, the SVM model's predictions for diabetes disease are displayed in Table IX. The testing dataset comprises 30,000 samples, of which 2,528 are actual positive cases, indicating the presence of diabetes, and 27,472 are actual negative cases, indicating the absence of diabetes. Among the actual positive cases, the SVM model correctly classifies 1,515 samples as positive, accurately indicating the presence of diabetes. However, the model misclassifies 1,013 records as negative, falsely indicating the absence of diabetes when it is present. Conversely, among the actual negative cases, the SVM model accurately predicts 27,298 samples as negative, correctly identifying the absence of diabetes. Nevertheless, the model makes errors in 174 samples, incorrectly classifying them as positive and falsely indicating the presence of diabetes.
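As a worked check on the testing counts above, the three headline rates can be derived directly from the reported confusion matrix. The figures below are computed here from those counts and are not quoted from the paper's tables.

```latex
\begin{align*}
\text{Accuracy} &= \frac{1515 + 27298}{30000} = \frac{28813}{30000} \approx 96.04\% \\
\text{Sensitivity} &= \frac{1515}{1515 + 1013} = \frac{1515}{2528} \approx 59.93\% \\
\text{Specificity} &= \frac{27298}{27298 + 174} = \frac{27298}{27472} \approx 99.37\%
\end{align*}
```

Only about 8.4% of the test samples (2,528 of 30,000) are actual positives, so accuracy is dominated by the negative class. This is why sensitivity and the other metrics need to be read alongside accuracy on this dataset.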

D. DPEMDFML - SVM System Model - using EHRs Dataset - 75:25
Here, the SVM model is applied to the EHRs Dataset (Electronic Health Records Dataset). To ensure a robust assessment of the model's performance, the dataset was split into 25% for testing (n=25,000) and 75% for training (n=75,000). To evaluate the SVM model's effectiveness, two evaluation tables were used to analyse its performance: Table XI and Table XII.
During the training phase, the SVM model's predictions are presented in Table XI. The training dataset consists of 75,000 samples, of which 6,409 are actual positive cases, indicating the presence of diabetes, and 68,591 are actual negative cases, indicating the absence of diabetes. Among the actual positive cases, the SVM model correctly identifies 3,876 samples as positive, accurately indicating the presence of diabetes. However, the model misclassifies 2,533 records as negative, falsely signalling the absence of diabetes when it is present. Among the actual negative cases, the SVM model accurately predicts 68,111 samples as negative, correctly identifying the absence of diabetes. However, the model makes errors in 480 samples, incorrectly classifying them as positive and falsely indicating the presence of diabetes.
During the testing stage, Table XII shows the SVM model's diabetes predictions. The test dataset comprises 25,000 samples, split into 2,091 actual positive cases (indicating the presence of diabetes) and 22,909 actual negative cases (indicating the absence of diabetes). Among the actual positive cases, the SVM model correctly identifies 1,266 samples as positive, correctly indicating the presence of diabetes. However, the model misclassifies 825 records as negative, erroneously suggesting the absence of diabetes. Conversely, among the actual negative cases, the SVM model correctly predicts 22,758 samples as negative, correctly recognizing the absence of diabetes. However, the model makes 151 errors, incorrectly classifying these samples as positive and falsely indicating the presence of diabetes.

E. DPEMDFML - ANN System Model - using Pima Diabetes Dataset - 70:30
Shifting our focus to the second algorithm used in this research, the Artificial Neural Network (ANN) model was employed, and the Pima Diabetes Dataset was used for evaluation. To ensure a robust assessment of the model's effectiveness, the dataset was split into two sets: 30% for testing (n=231) and 70% for training (n=537). To gauge the performance of the ANN model, a detailed analysis was conducted using two evaluation tables: Table XIV and Table XV.
During the training phase (Table XIV), the training set of 537 samples contains 349 actual negative cases, indicating the absence of diabetes; the remaining 188 samples are actual positive cases, indicating the presence of diabetes. Among the actual positive cases, the ANN model correctly identifies 157 samples as positive, accurately indicating the presence of diabetes. However, the model misclassifies 31 records as negative, falsely indicating the absence of diabetes. Conversely, among the actual negative cases, the ANN model accurately predicts 327 samples as negative, correctly identifying the absence of diabetes. However, the model makes 22 errors, incorrectly classifying these samples as positive and falsely indicating the presence of diabetes.
During the testing phase, the ANN model's diabetes predictions are shown in Table XV. The testing dataset consists of 231 samples, divided into 80 actual positive cases, indicating the presence of diabetes, and 151 actual negative cases, indicating the absence of diabetes. Among the actual positive cases, the ANN model correctly identifies 47 samples as positive but misclassifies 33 as negative. Among the actual negative cases, the model correctly predicts 116 samples as negative but incorrectly classifies 35 as positive.

F. DPEMDFML -ANN System Model -using Pima Diabetes Dataset -75:25
The ANN model was then evaluated again on the Pima Diabetes Dataset, this time split into 25% for testing (n=192) and 75% for training (n=576), with the confusion matrices given in Table XVII and Table XVIII. During the training phase, Table XVII shows the ANN model's diabetes predictions. Of the 576 training samples, 199 are actual positive cases and 377 are actual negative cases. Among the actual positive cases, 172 are correctly identified as positive, while 27 are incorrectly predicted as negative. Among the 377 actual negative cases, 352 are correctly predicted as negative and 25 are incorrectly predicted as positive.
During the testing phase, Table XVIII displays the ANN model's diabetes predictions. The testing dataset consists of 192 samples, divided into 69 actual positive cases and 123 actual negative cases. The model correctly identifies 45 of the actual positive cases as positive, while 24 are incorrectly predicted as negative. Of the 123 actual negative cases, the model correctly predicts 92 as negative and incorrectly predicts 31 as positive.
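The training and testing sizes quoted for each experiment can be reproduced with simple integer arithmetic. The sketch below assumes that the test share is rounded up to a whole sample and that the remainder forms the training set. This rounding rule is an assumption chosen because it reproduces the reported sizes; it is not stated in the study. The helper name `split_sizes` is hypothetical, and the totals of 768 and 100,000 are the sums of the reported training and testing sizes for the Pima and EHRs datasets.

```python
def split_sizes(n_samples, test_percent):
    """Return (n_train, n_test), rounding the test share up to a whole sample."""
    # Integer ceiling division avoids floating-point rounding surprises.
    n_test = -(-n_samples * test_percent // 100)
    return n_samples - n_test, n_test


# Dataset totals: 768 (Pima) and 100,000 (EHRs); test shares of 30% and 25%.
for n, pct in [(768, 30), (768, 25), (100_000, 30), (100_000, 25)]:
    n_train, n_test = split_sizes(n, pct)
    print(f"n={n}, test={pct}%: train={n_train}, test={n_test}")
```

For the Pima Diabetes Dataset this yields 537/231 samples at 70:30 and 576/192 at 75:25, matching the sizes reported in the text.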

G. DPEMDFML -ANN System Model -using EHRs Dataset -70:30
Utilizing the same algorithm, the ANN model was applied to the second dataset, the EHRs Dataset (Electronic Health Records Dataset). To achieve a comprehensive evaluation of the model's performance, the dataset was split into 30% for testing (n = 30,000) and 70% for training (n = 70,000). The effectiveness of the ANN model was assessed using two evaluation tables: Table XX and Table XXI, which present the confusion matrices for the training and testing phases. During the training phase (Table XX), the model uses 70,000 samples, divided into 5,972 actual positive cases and 64,028 actual negative cases. Among the actual positive cases, 4,265 samples are correctly identified as positive, while 1,707 are incorrectly classified as negative, missing diabetes that is in fact present. Among the actual negative cases, the model correctly predicts 63,938 samples as negative, while 90 samples are falsely predicted as positive when they should have been classified as negative.
During the testing phase, Table XXI shows the ANN model's performance in predicting diabetes. The testing dataset consists of 30,000 samples, divided into 2,528 actual positive cases and 27,472 actual negative cases. The model correctly identifies 1,754 positive cases but mistakenly classifies 774 positive cases as negative. For the actual negative cases, the model correctly predicts 27,368 samples as negative. There are, however, 104 false positive predictions, where the model labels non-diabetic cases as positive when they should have been classified as negative.

H. DPEMDFML -ANN System Model -using EHRs Dataset -75:25
The ANN model was then applied to the EHRs Dataset with a split of 25% for testing (n = 25,000) and 75% for training (n = 75,000), and its performance is reported in Table XXIII and Table XXIV. During the training phase (Table XXIII), the 75,000 samples comprise 6,409 actual positive cases and 68,591 actual negative cases. The model correctly identified 4,582 positive cases but misclassified 1,827 as negative, missing diabetes that was present. Of the 68,591 actual negative cases, the model correctly predicted 68,472 as negative, while 119 were incorrectly predicted as positive, signalling diabetes where there was none.
During the testing phase, Table XXIV presents the ANN model's diabetes predictions. The testing dataset comprises 25,000 samples, divided into 2,091 actual positive cases and 22,909 actual negative cases. The model correctly identified 1,461 samples as positive but misclassified 630 as negative, missing diabetes that was present. Of the 22,909 actual negative cases, the model correctly predicted 22,827 as negative, while 82 samples were incorrectly predicted as positive, signalling diabetes where there was none.
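Because every confusion matrix in this section is described in prose, a quick consistency check is useful: the true positives and false negatives must sum to the actual positives, and the true negatives and false positives must sum to the actual negatives. The sketch below applies that check to the ANN results on the EHRs Dataset as reported in the text; the dictionary layout and the name `is_consistent` are illustrative choices, not part of the study.

```python
# (actual positives, actual negatives, TP, FN, TN, FP) as reported in the text.
reported = {
    "ANN, EHRs, 70:30, training": (5972, 64028, 4265, 1707, 63938, 90),
    "ANN, EHRs, 70:30, testing": (2528, 27472, 1754, 774, 27368, 104),
    "ANN, EHRs, 75:25, training": (6409, 68591, 4582, 1827, 68472, 119),
    "ANN, EHRs, 75:25, testing": (2091, 22909, 1461, 630, 22827, 82),
}


def is_consistent(pos, neg, tp, fn, tn, fp):
    """Check that the four cells add up to the stated class totals."""
    return tp + fn == pos and tn + fp == neg


checks = {name: is_consistent(*cells) for name, cells in reported.items()}
for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```

All four matrices pass this check, so the cell counts quoted in the text agree with the stated class totals.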
Table XXV summarizes the ANN model's performance during the training phase across the evaluation metrics. The model achieved 97.40% accuracy, a 2.59% misclassification rate, 71.49% sensitivity, 99.82% specificity, 97.96% precision, a 0.17% false positive rate, a 2.53% false discovery rate, a 28.50% false omission rate, a positive likelihood ratio of 41208.32%, a negative likelihood ratio of 28.55%, a prevalence threshold of 35.83%, a critical success index of 71.31%, an F1 score of 82.48%, a Matthews correlation coefficient of 82.25%, a Fowlkes-Mallows index of 97.09%, informedness of 71.31%, and a diagnostic odds ratio of 144305.40%.
During the testing phase, the ANN model achieved 97.51% accuracy, a 2.84% misclassification rate, 69.87% sensitivity, 99.64% specificity, 94.68% precision, a 0.35% false positive rate, a 5.51% false discovery rate, a 30.12% false omission rate, a positive likelihood ratio of 19520.38%, a negative likelihood ratio of 30.23%, a prevalence threshold of 35.11%, a critical success index of 69.51%, an F1 score of 80.40%, a Matthews correlation coefficient of 79.96%, a Fowlkes-Mallows index of 96.82%, informedness of 69.51%, and a diagnostic odds ratio of 64557.19%.
The results of the DPEMDFML model on the EHRs diabetes dataset indicate that the ANN model outperformed the SVM model in both the 70:30 and 75:25 splits. With the 70:30 split, the ANN model achieved an accuracy of 97.43%, and in the 75:25 split it maintained a similarly high accuracy of 97.40%. The SVM model also performed well on the same dataset, achieving an accuracy of 96.03% in the 70:30 split and 95.98% in the 75:25 split, which indicates that both models remain stable across the two data proportions.
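Several of the metrics listed above are ratios derived from the same four confusion-matrix cells. As a hedged illustration, the sketch below recomputes a subset of them using their standard textbook definitions, starting from the 75:25 training-phase counts reported for the ANN model on the EHRs Dataset (4,582 true positives, 1,827 false negatives, 68,472 true negatives, 119 false positives). The function name `derived_metrics` is hypothetical, and the results are returned as plain ratios rather than percentages.

```python
import math


def derived_metrics(tp, fn, tn, fp):
    """Compute ratio-type metrics from confusion-matrix counts (standard definitions)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    plr = sensitivity / (1 - specificity)        # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity        # negative likelihood ratio
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "precision": precision,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_den,    # Matthews correlation coefficient
        "plr": plr,
        "nlr": nlr,
        "dor": plr / nlr,                        # diagnostic odds ratio
        "informedness": sensitivity + specificity - 1,
    }


# Training-phase counts reported in the text (ANN, EHRs Dataset, 75:25 split).
ann_train = derived_metrics(tp=4582, fn=1827, tn=68472, fp=119)
for name, value in ann_train.items():
    print(f"{name}: {value:.4f}")
```

Read as ratios, the likelihood ratios and the diagnostic odds ratio line up with the percentages quoted for the training phase: a positive likelihood ratio of about 412 corresponds to 41208.32%, and a diagnostic odds ratio of about 1443 corresponds to 144305.40%. Values obtained this way may differ from the tabulated ones in the last digit because of rounding.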

VI. CONCLUSION
In summary, this research offers a thorough investigation of the application of machine learning approaches to diabetes detection. The proposed DPEMDFML model shows improved accuracy in predicting diabetes compared to earlier efforts by using two separate algorithms and two different datasets. The assessment tables show that the SVM and ANN models performed well during both the training and testing phases. The proposed framework's use of machine learning fusion has the potential to support earlier diagnosis of diabetes, enabling proactive healthcare treatment and better patient outcomes. This work advances diabetes diagnostic research by providing evidence on the efficacy of different algorithms and datasets. The findings open the way for further study and model enhancement, with the goal of more accurate diabetes detection in clinical settings. In future work, we will incorporate more recent datasets to enhance the study's relevance and accuracy.

TABLE IV. SVM MODEL'S (PIMA DIABETES DATASET) EVALUATION METRICS, 70:30
The performance evaluation of the SVM model is depicted in Table V and Table VI, which illustrate the confusion matrix. Table V demonstrates the performance of the SVM model in predicting diabetic illness during the training phase. The training dataset comprises 576 samples, with 203 being true positive cases, indicating the presence of diabetes, and 373 being true negative cases, indicating the absence of diabetes.

Table X
and Table XII, which present the confusion matrix. During the training phase, the SVM model's diabetes predictions are presented in Table

TABLE XII. SVM MODEL'S -EHRS DIABETES DATASET -TESTING PHASE -75:25
and Table XV. These tables present the confusion matrix, providing valuable insights into the model's ability to deliver accurate predictions during both the testing and training phases. During the training stage, Table XIV illustrates the ANN model's predictions for diabetes disease. The training dataset consists of 537 samples, further divided into 188 true positive cases, indicating

TABLE XV
Once more, the ANN model was utilized with the Pima Diabetes Dataset. The dataset here was split into 25% for testing (n=192) and 75% for training (n=576) to ensure a thorough evaluation of the model's performance. The performance metrics of the ANN model are presented in Table XVII and Table XVIII, displaying the confusion matrix results.

Table XIX
and Table XXI. These tables present detailed information from the confusion matrix, offering insights into the model's performance during both the testing and training phases. During the training phase, Table XX displays the outcomes of the ANN model's predictions for diabetes disease.

TABLE XXI. ANN MODEL'S -EHRS DIABETES DATASET -TESTING PHASE -70:30
In this study, the Artificial Neural Network (ANN) model was utilized to analyse the Electronic Health Records Dataset (EHRs Dataset). To ensure a rigorous evaluation of the model's capabilities, the dataset was split into 25% for testing, comprising 25,000 samples, and 75% for training, with 75,000 samples. The effectiveness of the ANN model was thoroughly assessed using two distinct evaluation tables: Table XXIII and Table XXIV, which offer a detailed view of the confusion matrix and facilitate an in-depth analysis of the model's performance. During the training phase, Table XXIII depicts the predictions made by the ANN model for diabetes disease. The dataset used for training consists of 75,000 samples, which are further categorized into 6,409 real positive cases and 68,591 real negative cases. The model accurately identified 4,582 samples as truly positive, indicating the presence of diabetes. However, it misclassified 1,827 records as negatives,

TABLE XXVI. PERFORMANCE OF PROPOSED DPEMDFML MODEL W.R.T PIMA DATASET AND EHRS DATASET
Table XXVII provides an overall comparison of the proposed DPEMDFML model with the previous works mentioned. The results show that the accuracy of the proposed model, with both of the employed algorithms, exceeds the accuracies reported in those works.

TABLE XXVII. COMPARISON OF PROPOSED DPEMDFML MODEL WITH PREVIOUS WORKS MENTIONED