A Model for Classification and Diagnosis of Skin Disease using Machine Learning and Image Processing Techniques

Abstract: Skin diseases are a global health problem that can be difficult to diagnose because of their complexity and the time that diagnosis takes. Besides affecting physical health, skin diseases also affect psychosocial life if they are not diagnosed and controlled early. Advances in image processing techniques and machine learning enable fast, effective diagnosis that helps detect skin diseases early. This paper presents a model that takes an image of skin affected by a disease and diagnoses acne, cherry angioma, melanoma, or psoriasis. The proposed model is composed of five steps: image acquisition, preprocessing, segmentation, feature extraction, and classification. The model was evaluated using three machine learning classifiers, Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbor (K-NN), which achieved accuracies of 90.7%, 84.2%, and 67.1%, respectively. The SVM result of the proposed model was also compared with other papers; the proposed model performed better than most of them, although one paper achieved an accuracy of 100%.


I. INTRODUCTION
Imaging is now used extensively in medical science, so that preliminary knowledge can be gathered and a diagnosis made before any surgery or treatment decision. As a result, imaging has become the starting point of most disease treatment cycles, from detection through evaluation to the treatment decision. Skin disease is one of the medical areas where images play a role in detecting, diagnosing, and treating the disease [1]. In recent years, skin diseases have increased and become a global health problem [2]. People who suffer from undiagnosed skin diseases may experience a diminished quality of life and a negative psychosocial impact [3].
Skin diseases are difficult to diagnose because of the complexity of human skin, and a lack of expertise may lead to misdiagnosis or delayed diagnosis. Diagnosis at a health center may take a long time and requires domain expertise, which incurs physical and financial costs. Machine learning and image processing techniques, on the other hand, can help achieve high accuracy in diagnosing skin diseases at an early stage. Image processing plays an effective role in diagnosing skin diseases with the help of libraries such as OpenCV, Scikit-Image, and NumPy. Machine learning algorithms such as SVM, RF, and K-NN are then used for the classification task. Combining these techniques saves time and yields a quicker and more reliable diagnosis than typical procedures such as patch tests and biopsy [4]. Because few existing models diagnose several different skin diseases, the proposed model covers four of them. This research aims to build a model that provides an easy, fast, and efficient solution for diagnosing skin diseases, namely acne, cherry angioma, melanoma, and psoriasis, using image processing and machine learning techniques.
The proposed model is built through several image processing steps: acquiring images; preprocessing them (resizing, color transformation, de-noising, and normalization); segmentation; and feature extraction. The model is then trained with traditional machine learning algorithms, namely SVM, RF, and K-NN. Several papers on this topic have focused on using machine learning and image processing to classify skin cancer. This paper therefore proposes a model that diagnoses other common diseases in addition to skin cancer. Furthermore, the proposed model was tested on cherry angioma, which has very rarely been tested in previous research. This paper is organized as follows: Section II reviews previous work, Section III presents the methodology, Section IV shows the obtained results, and Section V concludes this work.

II. PREVIOUS WORK
Many researchers have proposed models that combine image processing techniques and machine learning algorithms to classify and diagnose several skin diseases.
Hameed, Shabut, and Hossain [5] implemented a system that classifies skin images as healthy, acne, eczema, psoriasis, benign, or melanoma (malignant). The system was built on image processing techniques. To enhance the images, the authors used an algorithm called Dull Razor to remove hair from the skin images and then applied a Gaussian filter. Additionally, Sinthura S. et al. [6] propose a method to detect skin diseases. Their method uses an adaptive filter to remove noise and then converts the image to grayscale. They used Otsu's thresholding technique to segment the lesion and GLCM to extract texture features. Finally, to validate their proposal, they used the SVM classifier and achieved an accuracy of 89%. For color-based image classification, the researchers in [7] train a model for detecting and classifying various skin diseases using the K-NN classifier. They use color models to extract features, including HSV and the lightness, red/green, and blue/yellow (L*a*b) color model. Their results showed that the HSV color model, with 91.80% accuracy, is more effective than the L*a*b color model, with 81.60% accuracy. Moreover, Ahmed, Ema, and Islam [8] propose a new automated system that uses the Transductive SVM (TSVM) to classify 24 types of skin disease. The proposed system uses a hybrid genetic algorithm to segment the image, and ant colony optimization (ACO-GA) and GLCM to extract its features. Their work achieved 95% accuracy.
Another approach applies pre-trained Convolutional Neural Networks (CNNs) to extract features of skin diseases. ALenezi [9] proposes a system that uses the pre-trained CNN AlexNet to extract skin disease features and an SVM to classify the diseases. The system was built with a dataset of 80 images of the following classes: melanoma, psoriasis, eczema, and healthy skin. It was tested on 20 images and achieved an accuracy of 100%. Using the same method, the researchers in [10] present an intelligent expert system for classifying 9144 skin images of acne, eczema, benign or malignant (melanoma) lesions, and healthy skin. To extract the lesion features, they used the pre-trained CNN model AlexNet. With the dataset divided 70:30 into training and test sets, their system achieved an accuracy of 86.21% using the SVM classifier.
In another study, Hajgude et al. [11] propose a solution to detect eczema, impetigo, and melanoma, plus a class named "other", using 408 images. They build the model using a median filter to remove noise, the Otsu method to segment lesions, a 2D wavelet transform to extract features such as entropy and standard deviation, and GLCM to extract texture features such as contrast and correlation. They used SVM and CNN classifiers and obtained accuracies of 90.7% and 99.1%, respectively. The authors in [12] describe the CNN classifier and the major libraries for image processing, and then use a CNN to classify and diagnose skin disease with an accuracy of 70%, noting that a larger dataset could raise accuracy above 90%. The authors in [13] propose a web system to diagnose skin diseases in Ghana (Africa). Their study uses a CNN to classify 254 images of atopic dermatitis, acne vulgaris, and scabies, reaching accuracies of 88%, 85%, and 84.7%, respectively. The proposed system takes only 0.0001 seconds to produce a diagnosis, which may speed up patient diagnosis compared with diagnosis in a clinic.
Most of the previous studies demonstrate the efficacy of SVM and CNN. The studies also show that image processing plays a key role in classifying various skin diseases. Moreover, increasing the number of images may improve classification because the model is trained on more data.

III. METHODOLOGY
This paper demonstrates the classification of several types of skin disease in order to diagnose lesions such as acne, cherry angioma, melanoma, and psoriasis. The processes involved in identifying these skin diseases are preprocessing, segmentation, feature extraction, and classification. The following subsections describe the dataset used and explain the proposed model and the techniques of this work.

A. Dataset
Because medical records are private, collecting images is a challenging task. The images were therefore gathered from publicly available resources: DermNet NZ [14] and Atlas Dermatologico [15]. In this work, the dataset consists of 377 images of four disease classes: acne, cherry angioma, melanoma, and psoriasis. Fig. 1 shows a sample of each class, and Table I lists the number of images in each class.

1) Disease definitions:
The following gives a brief definition of each disease studied in this work, as described on the DermNet NZ website [14].
a) Acne: A common chronic disorder, often confined to the face, although it may also occur on the chest, back, and neck. Acne may occur in children and adults of all ages. It is caused by a combination of several factors such as familial tendency, acne bacteria, and hormones. Acne can be characterized by blackheads and whiteheads.
b) Cherry angioma: The cause of cherry angioma is unknown. It is very prevalent in males and females of any age, and it increases markedly from about the age of 40. Cherry angiomas may be red, purple, or blue, and they can be scattered over any part of the body surface.
c) Melanoma: The most serious form of skin cancer. It is caused by the uncontrolled growth of melanocytes (pigment cells). Melanoma may occur at any age but is very rare in children. A melanoma may show several colors, such as blue, brown, and red.
d) Psoriasis: A chronic inflammatory skin condition. It affects males and females at any age. It is characterized by symmetrically distributed, red, scaly plaques with well-defined edges.
www.ijacsa.thesai.org

B. Proposed Methodology
This section presents the processes of the proposed model and the techniques used. The architecture of the model is shown in Fig. 2. The procedure is as follows:
• Import the train set images and process them by preprocessing the images, segmenting the lesion from the surrounding normal skin, and then extracting its features.
• Import the test set images and process them in the same way as the train set images.
• Store the features extracted from the train set images in a knowledge base.
• Compare the features extracted from a test set image with the features stored in the knowledge base.
• Diagnose the disease.
• When the user uploads an image, it passes through the same processing as a test set image.
1) Image processing: Image processing is a technique that manipulates and analyzes an image received from a camera or sensor. Its main objective is to enhance the image's quality and extract its information so that it is more interpretable by a human or a machine. Many image processing techniques are now in use, as they have proven to be strong computational methodologies with strong potential to be effective in healthcare and other fields [16]. The image processing techniques used in this work are described below.
a) Preprocessing: The first step in processing the skin disease images is preprocessing, which enhances the images. In this work, all the images were first resized to 250 × 250 pixels. Then, because skin images are noisy, a de-noising technique called the median filter was applied. The median filter is the filter most commonly used by researchers because it preserves the edges of the image [17]. Next, the color images were converted to grayscale for the segmentation and feature extraction tasks. Finally, the pixel values were normalized to the range 0 to 1. In Fig. 3, column A shows the original images at a size of 250 × 250, column B shows the images after applying the median filter, and column C shows the images converted to grayscale.
b) Segmentation: Segmentation is the process of partitioning the lesion region from the skin. The partition is based on the similarity or difference of pixel properties such as color, sharpness, brightness, or intensity [18], [19]. In this work it is a challenging task because the proposed model tests several diseases. The key problems are that the color of the entire lesion may be similar and the lesion's boundaries may be fuzzy, in addition to the complexity of the skin itself. To address this, Otsu's thresholding is used to create a mask (binary image), which is then applied to the grayscale image. Some of the results are shown in Fig. 3: column D shows the binary masks, and column E shows the final segmentation results.
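The preprocessing and segmentation steps described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: resizing is omitted, the 3 × 3 median window and the grayscale weights are assumed values, the synthetic image stands in for a real photograph, and pixels above the Otsu threshold are assumed to be the lesion (for lesions darker than the surrounding skin the comparison would be inverted). The threshold search maximizes the between-class variance, which is equivalent to minimizing the weighted within-class variance.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(channel, size=3):
    """Replace each pixel with the median of its size x size window (borders replicated)."""
    pad = size // 2
    padded = np.pad(channel, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1)).astype(channel.dtype)


def to_grayscale(rgb):
    """Convert an H x W x 3 uint8 RGB image to uint8 grayscale (assumed luma weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb.astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


def otsu_threshold(gray):
    """Return the gray level that maximizes the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)                  # weight of the class with level <= t
    w1 = 1.0 - w0                         # weight of the class with level > t
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        mu0 = cum_mean / w0
        mu1 = (total_mean - cum_mean) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
    return int(np.argmax(np.nan_to_num(between)))


def segment_lesion(rgb):
    """Median filter -> grayscale -> Otsu mask -> masked image scaled to [0, 1]."""
    filtered = np.stack([median_filter(rgb[..., c]) for c in range(3)], axis=-1)
    gray = to_grayscale(filtered)
    mask = gray > otsu_threshold(gray)    # assumption: the lesion is the brighter class
    segmented = np.where(mask, gray, 0) / 255.0
    return mask, segmented


# Synthetic stand-in for a skin photo: uniform "skin", a square "lesion", one noisy pixel.
demo = np.full((24, 24), 50, dtype=np.uint8)
demo[8:16, 8:16] = 200
demo[2, 2] = 255
rgb_demo = np.stack([demo, demo, demo], axis=-1)
mask, segmented = segment_lesion(rgb_demo)
print(mask.sum(), segmented.max())
```

In this toy case the median filter removes the isolated noisy pixel before thresholding, so the mask covers only the square region. In practice a library routine such as OpenCV's or scikit-image's median and Otsu functions would normally replace these hand-written helpers.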
• Otsu's: Otsu's method is the most popular threshold segmentation technique, and it is applied to a grayscale image. Unlike a manual threshold, it automatically finds the optimal threshold value by minimizing the weighted within-class variance of the gray levels. Once the threshold value is determined, the lesion can be segmented from the normal skin region [20], [21].
c) Feature Extraction: Feature extraction plays a significant role in image processing. It uses the image or the segmented lesion to extract characteristic features that represent the information in that image for classification tasks. Texture is one type of feature that can be used to recognize an image by describing its visual surface. Textures are complex patterns composed of many characteristics, including size, color, brightness, slope, etc. [22]. In this work, feature extraction is performed using the Gabor and Entropy techniques to extract texture features and the Sobel technique to extract edge features. The parameters of all the feature extraction techniques used in the proposed model are listed in Table II.
• Gabor: The Gabor filter is a linear filter used to extract texture features. It is among the most commonly used filters in image processing and image texture analysis [23]. Essentially, a Gabor filter is a band-pass filter that extracts patterns in a signal at specific frequencies [24].
• Entropy: Entropy measures the expected quantity of information in the grayscale image. It is calculated for each center pixel from all the neighboring pixels in the kernel's window [25].
• Sobel: Sobel is a filter used to detect the edge features of an image. Its advantages are low processing time, little loss of edge information, and strong resistance to noise [26]. In this work, the Sobel filter uses two kernels to obtain the horizontal (Gx) and vertical (Gy) approximations of the derivative of the grayscale skin disease images. These kernels are convolved with the image's pixels.
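The three feature maps described above can be illustrated with a small NumPy sketch. It is not the authors' code: the kernel size, sigma, wavelength, orientation, and entropy window are assumed values (the paper's actual parameters are in Table II), `filter2d` and the other function names are hypothetical helpers, and the text does not say how the maps are assembled into the final feature vector, so the sketch simply returns them.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter2d(image, kernel):
    """Correlate a 2-D image with an odd-sized kernel, replicating border pixels."""
    kh, kw = kernel.shape
    padded = np.pad(image.astype(np.float64),
                    ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    windows = sliding_window_view(padded, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)


def sobel_magnitude(gray):
    """Combine the horizontal (Gx) and vertical (Gy) derivative approximations."""
    return np.hypot(filter2d(gray, SOBEL_X), filter2d(gray, SOBEL_Y))


def gabor_kernel(size=9, sigma=3.0, theta=0.0, wavelength=6.0, gamma=0.5, psi=0.0):
    """Real part of a Gabor kernel: Gaussian envelope times a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * x_rot / wavelength + psi)


def local_entropy(gray, size=5):
    """Shannon entropy (bits) of the gray-level histogram in each size x size window."""
    pad = size // 2
    windows = sliding_window_view(np.pad(gray, pad, mode="edge"), (size, size))
    entropy = np.zeros(gray.shape, dtype=np.float64)
    for level in np.unique(gray):
        p = (windows == level).mean(axis=(-2, -1))  # share of this level per window
        nonzero = p > 0
        entropy[nonzero] -= p[nonzero] * np.log2(p[nonzero])
    return entropy


def extract_features(gray):
    """Return the Gabor, entropy, and Sobel maps for one grayscale image."""
    return {
        "gabor": filter2d(gray, gabor_kernel()),
        "entropy": local_entropy(gray),
        "sobel": sobel_magnitude(gray),
    }


# Synthetic test image: dark left half, bright right half (a single vertical edge).
step = np.zeros((16, 16), dtype=np.uint8)
step[:, 8:] = 255
features = extract_features(step)
print({name: fmap.shape for name, fmap in features.items()})
```

On this step image the Sobel and entropy maps are zero in the flat regions and non-zero only near the edge, which is the behavior the two descriptions imply.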
2) Classification: The classification task assigns data to distinct known classes using machine learning algorithms in order to predict the disease. Once the features are extracted, they are given as input to the classifier model. When the model is accurate, it is used to classify new images that belong to one of the trained disease classes. The proposed model used traditional, well-known classifiers to conduct the experiments, namely SVM, RF, and K-NN.
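As a rough sketch of this classification step, the three classifiers can be trained and compared with scikit-learn as shown below. The feature matrix here is synthetic random data standing in for the extracted Gabor, entropy, and Sobel features, and the classifiers use library default hyperparameters rather than the tuned values of Table IV, so the printed accuracies say nothing about the paper's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

CLASSES = ["acne", "cherry angioma", "melanoma", "psoriasis"]

# Synthetic stand-in features: 40 samples per class, 12 features per sample.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=3.0 * k, scale=1.0, size=(40, 12))
               for k in range(len(CLASSES))])
y = np.repeat(np.arange(len(CLASSES)), 40)

# 80/20 hold-out split, stratified so every class appears in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

classifiers = {
    "SVM": SVC(),                                  # default parameters (assumption)
    "RF": RandomForestClassifier(random_state=0),
    "K-NN": KNeighborsClassifier(),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```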

IV. EXPERIMENTAL RESULTS
This section presents the experiments conducted to measure the performance of the model. The experiments were implemented in Python using the Spyder environment from Anaconda, along with several libraries: Scikit-learn for the machine learning algorithms; OpenCV, glob, and os for image processing; Skimage for filters; and Matplotlib for visualization.

A. Evaluation Measure
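The model's performance was analyzed with several measures. These measures are formulated as in (1)-(4):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$

where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives.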
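A short worked example may make these measures concrete. The counts below are hypothetical and chosen only for easy arithmetic; they are not taken from the paper's experiments.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Share of positive predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that are found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for a single class.
tp, tn, fp, fn = 40, 45, 5, 10

acc = accuracy(tp, tn, fp, fn)   # 85 / 100
p = precision(tp, fp)            # 40 / 45
r = recall(tp, fn)               # 40 / 50
f1 = f1_score(p, r)              # equals 2*TP / (2*TP + FP + FN) = 80 / 95

print(f"accuracy={acc:.3f} precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```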

B. Splitting Dataset and Classifiers Parameters Results
To assess the behavior of machine learning algorithms, the model must be tested on data that was not used in the training process. To this end, the dataset is split into two sets: a training set and a testing set. Table III lists three splitting experiments and the accuracy of each classifier. Each ML classifier has several parameters, and the value chosen for each may affect the accuracy. The value that obtains the highest accuracy is used in the proposed model. The candidate parameters and their values are shown with the accuracy results in Table IV. All these experiments were run on the dataset split into 80% for training and 20% for testing.
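The size of each set under the 80:20 split follows from simple arithmetic. The sketch below assumes that the test share is rounded up to a whole image; the paper does not state its rounding rule, but under this assumption the 377-image dataset yields the 301 training and 76 test images reported later in the paper.

```python
import math


def split_sizes(n_images, test_fraction):
    """Return (n_train, n_test), rounding the test share up to a whole image."""
    n_test = math.ceil(n_images * test_fraction)
    return n_images - n_test, n_test


n_train, n_test = split_sizes(377, 0.20)
print(n_train, n_test)  # 301 76
```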

C. Classifiers Performance Matrix
This section shows the confusion matrix of each classifier. A confusion matrix is a table that describes the classifier's performance by presenting the prediction results on the test set. Its size depends on the number of classes; in this case, it is 4 × 4. The rows indicate the true class, while the columns indicate the predicted class. The diagonal of the matrix gives the number of images that are correctly classified. Fig. 4 displays the confusion matrices of the SVM, RF, and K-NN classifiers in panels A, B, and C, respectively. For example, of a total of 16 acne images, the SVM classified 14 correctly and misclassified 2 as psoriasis.
From these confusion matrices, the accuracy, precision, recall, and F1-score of each disease can be computed for each classifier, as shown in Table V. Among the three classifiers, SVM achieves the best accuracy on cherry angioma, melanoma, and psoriasis, while acne obtains its highest accuracy with the RF classifier. In contrast, K-NN has the worst accuracy on all these diseases.
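The way per-disease measures are read off a confusion matrix can be sketched as follows. Only the acne row (14 correct, 2 predicted as psoriasis) mirrors the example given in the text; every other entry of the matrix is invented for illustration and does not reproduce Fig. 4 or Table V.

```python
import numpy as np

CLASSES = ["acne", "cherry angioma", "melanoma", "psoriasis"]

# Rows = true class, columns = predicted class.
# Only the first row follows the text; the remaining rows are hypothetical.
cm = np.array([
    [14, 0, 0, 2],
    [0, 10, 0, 0],
    [1, 0, 8, 1],
    [0, 0, 2, 8],
])


def per_class_metrics(cm):
    """Derive one-vs-rest TP, FP, FN, TN per class and the measures built on them."""
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp      # predicted as the class but actually another
    fn = cm.sum(axis=1) - tp      # actually the class but predicted as another
    tn = total - tp - fp - fn
    precision = tp / (tp + fp)    # assumes every class is predicted at least once
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


metrics = per_class_metrics(cm)
overall_accuracy = np.trace(cm) / cm.sum()

for i, name in enumerate(CLASSES):
    print(name, {key: round(float(values[i]), 3) for key, values in metrics.items()})
print("overall accuracy:", round(float(overall_accuracy), 3))
```

For the acne row, recall is 14/16 = 0.875 regardless of the invented rows, because recall depends only on the true-class row.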

D. Classification Experiment
The proposed model was validated using the SVM, RF, and K-NN classifiers with the evaluation measures accuracy, precision, recall, and F1-score. Table VI lists all the results for each classifier. The SVM outperforms the others in classifying skin diseases, with an accuracy, precision, recall, and F1-score of 90.7%, 91%, 90.8%, and 90.8%, respectively, while K-NN obtains the worst results.
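As a quick sanity check, the reported accuracies can be related to whole-image counts. The sketch below assumes the 76-image test set that the paper reports for the 80:20 split; the counts of correct predictions are inferred from the percentages, not stated in the paper.

```python
N_TEST = 76  # test-set size reported for the 80:20 split

reported_accuracy = {"SVM": 90.7, "RF": 84.2, "K-NN": 67.1}  # percent, from the text

# Nearest whole number of correctly classified test images for each classifier.
implied_correct = {name: round(pct / 100 * N_TEST)
                   for name, pct in reported_accuracy.items()}

for name, correct in implied_correct.items():
    print(f"{name}: about {correct}/{N_TEST} correct = {100 * correct / N_TEST:.2f}%")
```

Under this assumption the percentages are consistent with roughly 69, 64, and 51 correctly classified test images for SVM, RF, and K-NN, respectively.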

E. Comparison Results
To our knowledge, no study has experimented on the same set of diseases as this work. Also, because the datasets differ and are unavailable, the proposed model was compared with research that tested some of the diseases studied in this work. Table VII details the comparison studies [5], [6], [9], and [10] in terms of the diseases they studied and the image processing techniques they used.
Many techniques are available for image processing and classification tasks, and they have been used in several skin disease classification studies. Among them, the comparison studies in Table VII are probably the closest to this work in the techniques used and the diseases tested.
The comparison papers process images in two ways: manually and automatically. In manual image processing, the researchers in [5] and [6] follow the same processing steps with different preprocessing filters: Dull Razor with Gaussian in [5], and median in [6]. They used the same segmentation technique, Otsu's. Another essential point is that studies have shown that extraction techniques play a key role in extracting appropriate features for the classification task. Accordingly, [5] extracts texture features using GLCM and NGTDM and color features using color spaces. Similarly, the researchers in [6] extract texture features using GLCM. On the other hand, the researchers in [9] and [10] used automatic image processing by a CNN with a pre-trained AlexNet.
Table VIII therefore lists the SVM accuracy of the proposed model alongside the accuracy of the comparison models that used the same classifier, SVM, with different image processing methods. All these papers showed promising accuracy above 83% in diagnosing diseases. The proposed model, with 90.7% accuracy, is higher than [5], [6], and [10].
The proposed model probably performs better because it uses multiple techniques to extract a combination of features: Gabor and Entropy for texture features and Sobel for edge features. In contrast, the paper [9] is superior to the proposed model, with 100% accuracy. However, that model was trained on 80 images, while the proposed model was trained on 301 images. Also, the proposed model tested cherry angioma, whereas the other studies did not.
The main problem encountered in developing the proposed model was the limited availability of images of the diseases tested, along with the small number of papers that study various types of skin diseases. Training on more images may increase the accuracy and make the model better at diagnosing new images, as may selecting appropriate techniques to extract useful features.

V. CONCLUSION
This paper proposed a model that classifies different types of skin diseases: acne, cherry angioma, melanoma, and psoriasis. Previous work in this area shows a scarcity of studies on these diseases, as most research focuses on skin cancer. The model was built using image processing techniques and machine learning algorithms on a total of 377 images, divided into 301 images for the train set and 76 images for the test set. First, the processing techniques were applied to the images in several steps: preprocessing (resizing the images, removing noise with the median filter, and converting the images to grayscale), separating the infected area using Otsu's thresholding, and extracting its features using Gabor, Entropy, and Sobel. Second, the model was evaluated using the SVM, K-NN, and RF classifiers in terms of accuracy, precision, recall, and F1-score. The results show that the SVM achieved the highest accuracy, 90.7%, while RF and K-NN achieved 84.2% and 67.1%, respectively. The proposed model with the SVM classifier achieved better accuracy than most of the comparison studies, although one paper outperformed it. Since it was not possible to collect a skin disease dataset locally, finding a public source with multiple images of the diseases was the biggest challenge during this work. The model could achieve higher accuracy with more dataset images. Moreover, programmers can use the model to develop a smartphone application that diagnoses these skin diseases easily and early.