Deep Learning Models for the Detection of Monkeypox Skin Lesion on Digital Skin Images

Abstract: This study tests the accuracy of deep learning models in detecting Monkeypox. The disease is relatively new and difficult for physicians to detect. Skin images were obtained from Google via web scraping with Python’s BeautifulSoup, SERP API, and requests libraries. The images were scrutinized by professional physicians to determine their validity and classification. The researcher extracted the images’ features using two CNN models, GoogLeNet and ResNet50. Feature selection involved principal component analysis. Classification employed Support Vector Machines, ResNet50, VGG-16, SqueezeNet, and InceptionV3 models. The results showed that all the models performed comparably. However, the most effective model was VGG-16 (accuracy = 0.96, F1-score = 0.92). This affirms the usefulness of artificial intelligence in the detection of Monkeypox. Subject to the approval of national health authorities, the technology can be used to help detect the disease faster and more conveniently. If integrated into a mobile application, it can help members of the public to self-diagnose before seeking official diagnoses from approved hospitals. The researcher recommends further research into the models and building bigger image databases that will power more reliable analyses.


I. INTRODUCTION
In the recent past, the world has experienced a pandemic and is still recovering from its adverse effects. Unfortunately, as COVID-19 diminishes in incidence and prevalence, other infectious diseases, such as Monkeypox and Ebola, have sprung up. As of July 2022, 77 countries had reported at least one case of Monkeypox [1]. This raises the question of whether another pandemic is in the offing. Whether or not Monkeypox becomes a pandemic, the disease has made a significant mark on lives and livelihoods across the globe. While the virus is endemic to Central and West Africa [2,3], the United States, which has reported around 27,000 cases, is the most affected country. The Centers for Disease Control and Prevention (CDC) had to raise the alert level by declaring the disease a public health emergency [4,5]. The social and economic connectivity of the US with other parts of the world implies that it may be a matter of time before the disease spreads even further. The most challenging factor in curbing its spread is that the virus is relatively new, and physicians are still grappling with its signs and symptoms [5].
The use of artificial intelligence in the medical field is an ongoing experiment that has been recording milestones of success. A recent accomplishment is the diagnosis of COVID-19 from chest X-ray images, where studies have registered close to 100% accuracy in their predictions [6][7][8]. This raises the question of whether Artificial Intelligence (AI) scientists can extend this methodology to the Monkeypox scourge. According to [9], the most significant signs of the disease are evident on a patient's skin. The study reports that such patients bear a rash on their skin, which is one of the many signs an individual experiences when infected with the disease [10]. Since the rash is a visible mark on a patient's skin, one would argue that physicians should be able to diagnose it by eye. However, the biggest problem with this assumption is that most physicians are encountering these cases for the first time in their careers [11]. Another problem is that no scientifically proven lab tests can accurately diagnose the disease, especially in its early stages [9].
A further problem is that the rashes exhibited by Monkeypox are very similar to those experienced by patients suffering from measles, chickenpox, smallpox, and cowpox. One would have to compare and contrast the patients' skins to tell one from the other. Such a process requires that a physician has access to patients with all the other similar diseases, which is untenable [12]. Additionally, the chances of committing errors of judgment are high. Artificial intelligence can rid the diagnosis process of these bottlenecks because of its high accuracy and proven reliability in the past. Researchers are continuously creating databases of Monkeypox and other pox images to aid in the classification and isolation of the virus to curb its spread [13]. Hence, this paper tests the accuracy of machine learning models in classifying digital skin images to detect Monkeypox. A high F1-score, a high accuracy rate, and convincing confusion matrices should be sufficient to provide evidence that artificial intelligence is applicable in this situation. Therefore, this study investigates and tests the accuracy of deep learning models in the detection of Monkeypox on digital skin images using different models: Support Vector Machines, ResNet50, VGG-16, SqueezeNet, and InceptionV3.
The literature review section examines existing evidence on the topic, where the researcher discusses what other studies have accomplished or failed to accomplish. The methodology section lays out the study's data collection, feature extraction, feature selection, classification, and evaluation plan. In the results section, the paper compares different models in detecting Monkeypox. Afterwards, the researcher discusses these findings alongside what other studies have reported. The conclusion section explains the implications of the research and makes recommendations based on the findings.

II. LITERATURE REVIEW
Several studies have investigated the reliability of artificial intelligence in diagnosing Monkeypox using digital skin images. The study by [14] attributes the knowledge gap that inspired its investigation to the rarity of Monkeypox. It employed deep learning techniques in sourcing, preparing, and testing the image data. The findings indicated a precision of 0.85 and a mean accuracy of 0.83. The confusion matrices developed in the study support the reliability of these results. Another study on this topic is [15], which evaluates a modified VGG-16 model. The researchers also sourced digital images online and took care to select only licensed ones. The results indicate that the modified model can detect Monkeypox with an accuracy of 0.97 in the first study and 0.88 in the second study [15]. These findings present a case for using AI techniques to diagnose potential Monkeypox patients.
Some studies have utilized transfer learning techniques for feature extraction. The study by [16] employed transfer learning and the GoogLeNet deep network to handle its feature extraction procedures. The paper utilized publicly available datasets to evaluate hybrid classification algorithms. Results showed that, on average, the test accuracy was 0.99. The study by [1] examined the differences between warts caused by HPV and Monkeypox. In the investigation, the researchers used DNA mapping to determine whether an individual has Monkeypox, has HPV, or is healthy. The classification algorithm achieved an F1-score of 0.99 and an average accuracy of 0.96. Similarly, the investigation by [13] used MATLAB and TensorFlow to classify skin lesion images in the detection of Monkeypox. That study was unique in that it created a new mobile application to scan new digital images and report the classification results. The goal was to provide a preliminary system that people with skin anomalies can use to determine whether they have a reason to worry. The results show an accuracy of 0.91. Even with this accuracy, the researchers still encourage people to visit hospitals for check-ups regardless of the results from the mobile application.
Some studies first built their image databases before attempting to run the analysis. An excellent example of such a study is [17]. The paper is elaborate in its approach to classifying skin lesion images to detect Monkeypox. The researchers first developed the Monkeypox Skin Lesion Dataset, which includes two other diseases, namely Measles and Chickenpox. They sourced the images from websites, case reports, and news portals, and were careful to include only publicly accessible and non-commercial images. For the experimental set-up, the study adopted 3-fold cross-validation. Similar to the approach by [18], the researchers then augmented this data using various techniques to create a broader database. Augmentation increased the number of images from 228 to 3,192. Classification accuracy was 0.83 for the ResNet50 model and 0.79 for the InceptionV3 model. The VGG-16 model scored 0.81 accuracy.

III. METHODOLOGY
This section explains the methodological steps taken by the researcher in obtaining and analyzing data. The first subsection outlines the researcher's data collection plan, which is a critical part of the project. The other subsections explain the steps taken in feature extraction, feature selection, image classification, and model evaluation. The flowchart for the experiment is shown in Fig. 1.

A. Data Collection
The study tests the accuracy of machine learning models in classifying digital skin images to detect Monkeypox disease. Pictorial data on the disease is still scanty. Many studies, such as [14,15], have constructed databases of Monkeypox images that the researcher could have used. Nevertheless, the researcher was interested in conducting primary research and contributing to the discourse by giving an independent opinion about the viability of machine learning models in detecting the disease. The researcher established that several images on the web were potential candidates for this analysis. Hence, the study used a web-scraping tool to search Google for Monkeypox, Measles, Smallpox, and healthy skin images. Using the requests, SERP API, and BeautifulSoup libraries, the researcher obtained images from the search engine. These three libraries are not the only ones employed but are the most crucial in web scraping for images on Google [19]. While there were many other images, the study confined itself to common license images to avoid copyright infringement.
The study also hired one expert physician to screen the data to confirm its validity. The researcher aimed to have an equal number of images for each class. By the end of this screening exercise, the dataset comprised 200 images: 50 each of Monkeypox, Measles, Smallpox, and healthy skin. While some classes had more candidate images than others, the researcher picked only 50 per class to maintain an equal distribution across the classes. Most images needed further processing, which involved cropping and removing any marks that could be used to identify the person in the image. Furthermore, the data was augmented by adjusting brightness, rotating, modifying sharpness, zooming, and shearing. In the end, the five augmentation techniques produced 1000 images out of the 200 original images [14].
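The five augmentation operations named above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the paper does not name its augmentation tooling or parameters, so the use of NumPy, every function name, and every factor below are hypothetical. Rotation is simplified to a 90-degree turn and zooming to a nearest-neighbour centre crop.

```python
import numpy as np

def adjust_brightness(img, factor=1.2):
    """Scale pixel intensities and clip to the valid 8-bit range."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def rotate_90(img, k=1):
    """Rotate by k * 90 degrees (a stand-in for arbitrary-angle rotation)."""
    return np.rot90(img, k=k, axes=(0, 1)).copy()

def sharpen(img, amount=1.0):
    """Unsharp masking: add back the difference from a 3x3 box blur."""
    f = img.astype(np.float32)
    padded = np.pad(f, ((1, 1), (1, 1), (0, 0)), mode="edge")
    h, w = f.shape[:2]
    blur = sum(padded[dy:dy + h, dx:dx + w]
               for dy in range(3) for dx in range(3)) / 9.0
    return np.clip(f + amount * (f - blur), 0, 255).astype(np.uint8)

def zoom_center(img, factor=1.25):
    """Crop the centre region and resize it back with nearest-neighbour sampling."""
    h, w = img.shape[:2]
    ch, cw = int(h / factor), int(w / factor)
    top, left = (h - ch) // 2, (w - cw) // 2
    rows = top + (np.arange(h) * ch // h)
    cols = left + (np.arange(w) * cw // w)
    return img[rows][:, cols]

def shear_horizontal(img, shear=0.2):
    """Shift each row sideways in proportion to its distance from the middle row."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)  # vacated pixels stay black
    for y in range(h):
        shift = int(round(shear * (y - h / 2)))
        src_lo, src_hi = max(0, -shift), min(w, w - shift)
        if src_lo < src_hi:
            out[y, src_lo + shift:src_hi + shift] = img[y, src_lo:src_hi]
    return out

def augment(img):
    """Return one augmented copy per technique (five copies per input image)."""
    return [adjust_brightness(img), rotate_90(img), sharpen(img),
            zoom_center(img), shear_horizontal(img)]
```

Applied to each of the 200 screened images, one copy per technique gives 200 × 5 = 1000 augmented images, which matches the count reported above.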

B. Feature Extraction
The step that follows pre-processing is feature extraction. It is important in machine learning because it significantly reduces the noise during analysis [20,21]. Only the most critical features end up as potential input items, which may improve the accuracy of the learned models. The process removes redundant information to reduce the dimensionality of the data [22]. As a result, the time a model takes to learn the data reduces substantially. It involves technical procedures in the background, such as combinations and transformations [23]. This study used CNN techniques to extract features from the images. While CNN models are primarily used in classification, they also have powerful feature extraction capabilities. The specific CNN models used to extract features in this study were GoogLeNet and ResNet50. GoogLeNet is a 22-layer CNN with a pretrained version that can readily classify images into 1000 object categories. ResNet50 is also a CNN but is deeper, with 50 layers, and is computationally more demanding than GoogLeNet [24]. The two were used in conjunction with each other to extract features from the image dataset.

C. Feature Selection
The researcher settled on principal component analysis (PCA) as the feature selection method. It involves computing the eigenvectors of the data's covariance matrix and keeping those with the largest eigenvalues [25]. These eigenvectors then form the basis of feature selection. In this study, the researcher squared the standard deviations of the variables to obtain their variances. Variables with the highest variances were then retained, while those with lower variances were discarded. PCA is used as a dimension reducer because it examines components and selects only those that meet a specific criterion [25]. This procedure makes the modeling process more efficient by cutting the time that the machine learning procedures would otherwise spend on many unimportant variables.
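As a rough illustration of the PCA step described above, the sketch below centres a feature matrix, takes the eigendecomposition of its covariance matrix, and projects the data onto the eigenvectors with the largest eigenvalues. The function name `pca_reduce`, the use of NumPy, and the number of retained components are assumptions made for illustration; the study does not report its implementation or how many components it kept.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto the top principal components.

    Returns the projected data, the component matrix (one column per
    component), and the fraction of total variance each component explains.
    """
    X = np.asarray(features, dtype=np.float64)
    X_centered = X - X.mean(axis=0)             # PCA assumes zero-mean columns
    cov = np.cov(X_centered, rowvar=False)      # (n_features, n_features)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    explained = eigvals[order] / eigvals.sum()
    return X_centered @ components, components, explained

# Hypothetical usage: 1000 images with 64 extracted features each (random stand-ins).
rng = np.random.default_rng(42)
cnn_features = rng.normal(size=(1000, 64))
reduced, components, explained = pca_reduce(cnn_features, n_components=10)
```

The variance of each projected column equals the corresponding eigenvalue, so ranking components by eigenvalue is the same as ranking them by the variance they retain.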

D. Classification
The classification phase is the stage at which the detection actually occurs. Several models can be used to classify the image data into different groups depending on their features. The researcher settled on five models, namely Support Vector Machines (SVM), ResNet50, SqueezeNet, VGG-16, and InceptionV3. SVM is a classical machine learning technique (not a deep learning one) that adopts a supervised learning approach with associated learning algorithms to classify or regress items [26]. It creates a hyperplane, which is also the decision boundary and the basis of the classification [27]. ResNet50 is a 50-layer CNN [17]. It is a robust algorithm for image classification; the ResNet family of architectures won the ImageNet challenge in 2015. VGG-16 is also a convolutional neural network and is 16 layers deep [23]. Its strength lies in its simple implementation. InceptionV3 is a convolutional neural network primarily used in image classification and object detection [18]. It is a later iteration of the Inception architecture that was introduced with GoogLeNet. SqueezeNet is an 18-layer CNN mostly used for computer vision. The model was developed by researchers affiliated with, among other institutions, the University of California, Berkeley. A pretrained SqueezeNet model is capable of classifying several categories of items, including most common objects and animals.
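For reference, the hyperplane mentioned above can be written out for the standard linear, binary, soft-margin SVM. This is the textbook formulation, given here as a sketch. The paper does not state which kernel, regularization constant C, or multi-class scheme (for example one-vs-rest) was used for the four classes, so those details remain assumptions.

```latex
% Decision rule for a feature vector x, with class labels y_i in {-1, +1};
% the hyperplane is the set of points where w^T x + b = 0.
f(x) = \operatorname{sign}\left( w^{\top} x + b \right)

% Soft-margin training objective; C > 0 trades margin width against violations.
\min_{w,\, b,\, \xi} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0
```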

E. Evaluation
In evaluating the models, the researcher's interest is in their accuracy, precision, recall, and F1-scores. Accuracy is the ratio of correct predictions to all predictions made. Precision is the proportion of cases predicted as positive that are, in fact, positive. Recall (the true positive rate) is the proportion of the actual positives in a dataset that the model correctly predicts as positive. The F1-score is the harmonic mean of recall and precision. This study considers accuracy and the F1-score the most critical metrics in determining the reliability of the models in detecting Monkeypox from digital skin images. According to [28], the F1-score should be at least 0.90 for a machine learning modeling process to be effective in carrying out predictive analysis. The F1-score is used as the basis of determination because it combines two competing metrics (recall and precision). Nevertheless, all other metrics will be reported and analyzed. The following equations were utilized to compute these metrics [15]:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.
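The four metrics described in this subsection can be computed directly from the confusion counts. The sketch below is illustrative only: the function name `binary_metrics` and the counts in the usage example are hypothetical and are not results from this study.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    Each ratio falls back to 0.0 when its denominator is zero, which avoids
    a division error when a class is never predicted or never present.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for one class treated as "positive" (not study data).
example = binary_metrics(tp=45, tn=140, fp=5, fn=10)
```

With these made-up counts, accuracy is 185/200 = 0.925, precision is 45/50 = 0.90, recall is 45/55 ≈ 0.818, and the F1-score is 90/105 ≈ 0.857, which shows how F1 sits between precision and recall and is pulled toward the lower of the two.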

IV. RESULTS
This section presents and analyzes the findings from training and testing the five models to detect Monkeypox from the digital skin image database described in the data collection section. The models used in this analysis are SVM, ResNet50, VGG-16, SqueezeNet, and InceptionV3. The researcher used 5-fold cross-validation to enhance the models' predictive capability, since the data was not expansive enough for a separate train, validation, and test split. Results in Table I show the mean metrics from the five folds. The model with the highest scores across the five folds is VGG-16, which obtained a mean accuracy of 0.96 and a mean F1-score of 0.92. The overall performance of the five models (SVM, ResNet50, SqueezeNet, VGG-16, and InceptionV3) was reasonably close. The least effective classifier was SqueezeNet, which had an accuracy of 0.86 and an F1-score of 0.74. Other metrics (precision, recall, and individual F1-scores) are also presented in Table I.
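The 5-fold procedure can be sketched as a stratified split, in which every fold keeps the same class balance as the full dataset, followed by averaging a metric over the folds. The paper does not say whether its folds were stratified or which library produced them, so the stratification, the function names, and the seed below are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold.

    Indices of each class are shuffled and dealt round-robin into k folds,
    so every fold receives a near-equal share of every class.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for i in range(k):
        test_indices = sorted(folds[i])
        train_indices = sorted(index for j, fold in enumerate(folds)
                               if j != i for index in fold)
        yield train_indices, test_indices

def mean_over_folds(scores):
    """Average a per-fold metric, as done for the mean values in Table I."""
    return sum(scores) / len(scores)

# Assumed layout: 1000 augmented images, 250 per class.
labels = (["Monkeypox"] * 250 + ["Measles"] * 250 +
          ["Smallpox"] * 250 + ["Healthy"] * 250)
splits = list(stratified_k_fold(labels, k=5))
```

Under these assumptions each fold holds out 200 images (50 per class) for testing and trains on the remaining 800.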
The analysis also involved making predictions using the models examined above. The outcome is presented as confusion matrices for the most effective model (VGG-16), as shown in Fig. 3. Notably, there were no instances in which the model reported a healthy person as being infected with Monkeypox. This is evident in the healthy skin row of the confusion matrices, where no healthy skin was detected as having Monkeypox, which adds to the model's reliability as a detector for the disease. Appendix A shows the matrices for the other models. Table II shows that false positives and false negatives were minimal. These counts feed directly into the computation of recall and precision. For this reason, the model achieved high recall (0.91) and precision (0.92) values.
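To show how per-class false positives and false negatives are read off a four-class confusion matrix, the sketch below treats each class in turn as the positive class (one-vs-rest). The matrix values are invented for illustration and are not the study's results; the function name and the class ordering are likewise assumptions.

```python
def per_class_counts(matrix, class_names):
    """Derive one-vs-rest TP, FP, FN, TN for every class.

    matrix[i][j] is the number of samples whose true class is i and whose
    predicted class is j, so rows are true labels and columns predictions.
    """
    total = sum(sum(row) for row in matrix)
    counts = {}
    for k, name in enumerate(class_names):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # rest of the true-class row
        fp = sum(row[k] for row in matrix) - tp   # rest of the predicted column
        tn = total - tp - fn - fp
        counts[name] = {"TP": tp, "FP": fp, "FN": fn, "TN": tn}
    return counts

class_names = ["Monkeypox", "Measles", "Smallpox", "Healthy"]
# Hypothetical confusion matrix for one 200-image test fold (not study data).
hypothetical_matrix = [
    [46, 2, 2, 0],
    [3, 44, 3, 0],
    [2, 3, 45, 0],
    [0, 1, 0, 49],
]
counts = per_class_counts(hypothetical_matrix, class_names)
healthy_as_monkeypox = hypothetical_matrix[3][0]
```

In this made-up matrix the entry in the healthy row and Monkeypox column is zero, which is the pattern the text describes: no healthy skin is flagged as Monkeypox, even though Monkeypox still picks up false positives from the Measles and Smallpox rows.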

V. DISCUSSION
The study has established that the selected models are reasonably effective in detecting Monkeypox from digital skin images, with VGG-16 achieving the highest metrics. This affirms the findings in [15], where the researchers found the VGG-16 model highly effective in detecting the disease, obtaining an F1-score of 0.97, which is higher than the current study's result. This study adds to their findings by comparing the model with other CNN models and an SVM to ascertain that the selected model gives good results. Nevertheless, there are notable differences between the studies. The cited source used only the VGG-16 model, while the current study applied five models. The approach taken by this investigation is similar to that of [17], whose authors compared VGG-16, ResNet50, InceptionV3, and an ensemble model. The researchers in [17] found that ResNet50 is a better model for detecting Monkeypox because of its high F1-score (0.84) and accuracy (0.83). The same study found VGG-16 to be slightly behind ResNet50, with an F1-score of 0.83 and an accuracy of 0.81. The two studies used different data, which may be the cause of the slight differences in the outcomes.
The researcher in this study opted for the multi-class approach because it is sometimes not enough to distinguish Monkeypox from healthy skin. Most people looking to determine their Monkeypox status suspect that they may have the disease because of changes in their skin. Hence, it is important to differentiate it from other similar diseases that manifest as skin lesions. While this study dealt with four classes, other studies have dealt with even more. The investigation by [14] worked with six classes, namely chickenpox, cowpox, healthy, measles, Monkeypox, and smallpox. The study also conducted 5-fold cross-validation to classify the digital skin images. It engaged several CNN models, which were ResNet50, InceptionV3, DenseNet121, MnasNet-A1, MobileNet-V2, ShuffleNet-V2, and SqueezeNet. The point of overlap between [14] and the current study is that both used ResNet50, InceptionV3, and SqueezeNet. In [14], ShuffleNet-V2 was the most effective model, scoring the highest accuracy (0.79) and F1-score (0.67). However, the cited study did not model using SVM or VGG-16.
The 'many models' approach has also been evident in detecting diseases other than Monkeypox. The benefit of using many models is that it allows the researcher to compare them and establish which is the most effective [6]. The confusion matrix in this paper shows that the VGG-16 network did not misclassify any healthy skin as Monkeypox. The authors of [14] reported similar findings: in all the folds they ran, there was no instance of healthy skin detected as having Monkeypox. The authors in [6] and [14] suggest that healthy skin differs significantly from skin affected by Monkeypox. Hence, so long as one does not spot any lesions, chances are that one is safe from the disease. The lack of enough skin images decried in [14,15] seems to have affected the accuracy scores obtained in this study. Obtaining and processing Monkeypox images may currently be difficult because of their rarity. Some of the images on Google may not be of the disease even though some websites post them as Monkeypox. It is crucial that researchers hire a microbiologist to examine the skin images before using them in model training and prediction. This approach was used in [14] in creating a Monkeypox skin image database. The researcher in this investigation also shared the results with the consulted microbiologist to help spread news about the technology within the profession.
Table III compares, based on F1-scores, the performance of the deep learning models from the articles used in this paper. The table also shows the number of classes each model deals with, which in turn affects the accuracy of the results.

VI. CONCLUSION AND FUTURE RESEARCH
The current research tested the accuracy of machine learning models in detecting Monkeypox from digital skin images. Its findings have established that the models examined performed similarly. However, the outstanding model was VGG-16, whose accuracy and F1-scores were higher than the rest. The detection of healthy skin was remarkably accurate because none of the healthy skin was classified as having Monkeypox. AI can be a reliable tool for physicians to differentiate between healthy skin and skin infected with the disease. Nevertheless, the accuracy in telling apart lesions caused by Monkeypox, measles, and smallpox still needs further analysis. The study's F1-score of 0.92 meets the threshold of 0.90 proposed in [28], which supports the usefulness of AI in predicting Monkeypox from skin images. While the achieved scores are high enough to suggest proper classification, the researcher does not recommend that physicians use the technology until national health regulatory bodies further affirm the results. Other researchers should consider creating bigger databases that, when augmented, can be used to validate the findings established in this investigation. Future studies should focus on comparing Monkeypox skin only with diseases that cause lesions, such as Chickenpox, Measles, Smallpox, and Cowpox. With more research modeling the detection of Monkeypox, the researcher believes that artificial intelligence will add value to the detection process by making it quicker and more convenient. Once individuals have been tested, they can seek medical help and avoid contact with healthy individuals. If integrated into a mobile application, the technology can help in the detection of the disease in remote places where health facilities are fairly distant. The study also provides a basis for additional research into the use of artificial intelligence in the detection of Monkeypox.