Automated Pneumonia Diagnosis using a 2D Deep Convolutional Neural Network with Chest X-Ray Images

Abstract: Pneumonia is a lung infection in which the tiny air sacs in one or both lungs become inflamed. To provide the best possible treatment plan, pneumonia must be diagnosed accurately and quickly at its initial stages. Currently, the chest X-ray is regarded as the most effective imaging technique for detecting pneumonia. However, chest X-ray analysis can be difficult and laborious. In this study, we therefore propose a deep convolutional neural network (CNN) with 24 hidden layers to identify pneumonia from chest X-ray images. To achieve high accuracy with the proposed deep CNN, we applied image processing together with rescaling and data augmentation methods such as shear_range, rotation, zooming, CLAHE, and vertical_flip. The proposed approach was evaluated using several evaluation criteria and achieved 97.2%, 97.1%, 97.43%, 96%, and 98.8% in terms of accuracy, precision, recall, F-score, and AUC-ROC, respectively. Thus, the applied deep CNN obtains a high level of performance in pneumonia detection. In general, the proposed approach is intended to aid radiologists in making an accurate pneumonia diagnosis. Additionally, the proposed model could be helpful in the early detection of other chest-related illnesses such as COVID-19.


I. INTRODUCTION
Pneumonia is an infection of the lungs that can be caused by viruses, bacteria, or fungi. In principle, the condition can be brought on by a wide range of agents, including bacteria, fungi, environmental pollutants, or even physical harm to the lungs caused by smoking or other forms of pollution [1]. Although it is a common disease, it has the potential to be fatal [2]. Regarding modern solutions for pneumonia detection, computer vision is an important field of study within artificial intelligence (AI), primarily because it provides answers to a wide range of problems that people encounter today. Biomedical image analysis using AI is one of the computer vision areas that has repeatedly proven to be beneficial.
Recent years have seen a rise in the use of deep models, particularly CNN models, as the dominant method for the classification of clinical images. This is because selecting which features to extract in conventional approaches to computational intelligence is a time-consuming process that also varies depending on the object being analyzed [3][4][5]. These studies, which were conducted using X-ray images, have made use of CNNs with a variety of architectures and methodological approaches. To achieve more favorable outcomes, CNN-based models require a substantial number of training samples [6]. Gathering medical images is problematic, because collecting and labeling healthcare data is complicated by time-consuming privacy rules and by the need for annotations from medical professionals. According to researchers [7], transformation-based data augmentation has proven to be a suitable method for image classification. Image augmentation methods may help prevent overfitting during the training phase, which ultimately results in a more accurate model.
The majority of the strategies discussed here use transfer learning, which means that the deep learning models were initially trained on data unrelated to pneumonia diagnosis. The use of convolutional neural networks built from scratch in a number of image processing studies [8] has shown that a simpler structure might achieve a higher precision than numerous older pre-trained models used in transfer learning. In this work, we propose a deep convolutional neural network as a precise solution to the pneumonia detection problem. The primary innovation in the proposed network is the use of dropout in the convolutional portion of the model, in contrast to a large number of other models that employ dropout only in the fully connected portion of the network, which is where the bulk of the parameters are learned. This study establishes the proposed network's ability to classify correctly even with a small number of trainable parameters, and provides accompanying illustrations of how this characteristic might increase predictive performance.
The remainder of this study is organized as follows: the next section reviews related work in the area of pneumonia detection. Section III describes the data used in this study and its division into training, test, and validation sets. Section IV

II. RELATED WORKS
In the past few years, the branch of artificial intelligence known as "deep learning" has grown enormously, quickly becoming an integral part of computational intelligence in a variety of domains, including text processing, speech analysis, image analysis, and audio and video processing. CNNs have shown themselves to be useful tools for a broad variety of applied problems. The common driving factors of this phenomenon are considered to be the appearance of massive datasets as well as increasingly powerful computing systems. To bridge the gap between high-level representations and low-level attributes, CNNs take source data, such as photos, as input and then execute a series of convolutional operations to extract rich information about the images. This helps ensure that the inputs are accurately mapped to their target values. Impressively, it has recently been found that CNNs can match or even surpass human performance in visual problems such as image classification.
Pneumonia is a leading cause of mortality, accounting for more than four million deaths annually [9]. In terms of timely detection, chest X-rays are a prominent diagnostic tool for pneumonia. Among their advantages, chest X-rays are less expensive than other imaging methods and can be performed anywhere in the world. Since the indications of pneumonia in X-ray data are not always coherent or readable to the human eye, software may assist the medical expert in diagnosing pneumonia owing to its speed and its objective, repeatable judgment. In addition, there are a few different decision-making systems available that may be used for the diagnosis and classification of pneumonia. It should be noted that the COVID-19 pandemic was the impetus for a fresh wave of research towards the creation of such diagnostic techniques. In due course, deep learning techniques will provide the backbone of the majority of the innovative approaches. To diagnose viral pneumonia, the authors of [10] used the VGG-16, ResNet-50, and Inception-V3 architectures together with transfer learning and obtained accuracies from 71% to 88% on around 600 test images.
Brunese et al. [11] developed a multiclass pneumonia identification system that was based on VGG16 structure and visual debugging method. Using this framework, they were able to obtain an accuracy of 96.2% on a collection of 6500 CXR pictures. When Panwar et al. [12] paired the VGG-19 network and GradCAM strategy, they managed to attain an overall accuracy of 95.6% within the context of a three-class pneumonia classification. In a similar manner, Ibrahim et al. [13] provided a method for the diagnosis of three classes of pneumonia by using VGG and Res-Net152V2 networks. Based on their own collection of X-ray pictures, they got recall values ranging from 93-97%.
A surprising 98.62% accuracy was achieved using a three-stage hybrid model that was suggested by Jin et al. [14] and consisted of a feature representation, a feature selection, and an SVM classification stage. Karthik et al. [15] used the Channel-Shuffled Dual-Branched convolutional neural network to differentiate between several forms of pneumonia using a variety of publicly accessible datasets. The researchers achieved scores ranging from 94% to 98%. When Quan et al. [16] applied a combination of the DenseNet and CapsNet architectures, they obtained a recall of 96% on a COVID-19 dataset with a limited sample size, but an accuracy of 90.7% on a bigger collection of pneumonia X-ray data. Alhudhaif et al. [17] applied a convolutional neural network that is 201 layers deep and obtained accuracy and recall values of 90% and 95%, respectively, on 6000 X-ray images, using the dataset provided by Kaggle. Jaiswal et al. [18] used the Mask R-CNN design to identify specific areas of the lung that were affected by pneumonia. Mahmud et al. [19] developed the so-called CovXNet approach, a convolutional neural network that extracts a wide range of attributes from X-ray images by using depthwise convolution with varying dilation rates. Based on 4697 X-ray pictures, Wang et al. [20] trained a visual geometry group architecture using what is likely the most extensive CXR picture dataset that is currently available. This dataset was released by the RSNA Pneumonia Detection Challenge and contains more than 120,000 individual images. For the diagnosis of pneumonia, they reported a success rate of 94.62%. The reader is encouraged to consult more recent review studies, such as [21][22][23], for further information about the current state of the art.

III. DATA
The dataset consists of 5863 chest X-ray images collected in 2018 from the Women's and Children's Medical Center and the Laboratory of Regenerative Medicine and Healthcare in Guangzhou City [24]. The chest X-ray scans were taken as part of standard clinical examinations, and the original study objectives did not include collecting data for this task. The data are divided into Training, Test, and Validation sets, each containing subfolders for the two image categories, normal and pneumonia. The X-ray images measure more than 1000 pixels per dimension and occupy more than 1.2 GB of memory. Table I shows the train/test split of the dataset. Normal is the label for an X-ray of a healthy chest, without signs of infectious or inflammatory lung disease. The pneumonia label characterizes a patient who has been diagnosed with pneumonia (Fig. 1). Before publication, all chest images were screened to remove low-quality, unreadable images and to suppress noise. Diagnoses were determined by two highly qualified doctors. To prevent human error, all images were rechecked by a third expert before the machine learning model was trained. All these checks provide confidence in the authenticity and integrity of the data.

A. Block Diagram
A block diagram was built to show in detail which steps are required and how the model is implemented. A graphical representation of the entire process using blocks is a clear way to demonstrate the workflow. Our project has development stages similar to those of other image classification tasks. Each block shows one development step. Sub-blocks may occur within the blocks; they indicate the particular methods or functions used at that stage. Arrows describe the sequence of the working process. In total, creating an algorithm for classifying chest images takes 7 development stages. The process of classifying X-ray images can be represented as the sequence of steps shown in Fig. 2. The scheme begins with data input, the initial part where images are selected for further processing and submission to the model. The first step of the algorithm is rescaling, in which the data are multiplied by a constant before any other processing. Since our source images use the RGB color model with values in the range 0-255, these values are too large for the model to process efficiently. Therefore, we map them to values between 0 and 1 by scaling with a factor of 1/255, which is followed by data augmentation. Initially, the classes in the data are imbalanced, and data augmentation is a powerful tool that in almost every case helps to increase the efficiency and reliability of the model. Afterwards, the clean data are used to train the CNN architecture. The algorithm takes the input data, assigns importance to various areas or objects in the image, and tries to distinguish one from another. All parameters are optimized by minimizing the error on the training set using backpropagation.

B. Image Preprocessing
As mentioned earlier, our images measure more than 1000 pixels per dimension and have a total size of more than 1.2 GB, which means that we would not have enough memory to process so many images simultaneously. To solve this problem, we needed a more efficient way to load and process the data.
To avoid running out of memory, a data generator was created using the Python Keras library, which can run on top of a GPU installation. It generates batches of our dataset on multiple cores in real time and immediately feeds them to the deep learning model. Separate image generators were built for the training, test, and validation data. For training, we used data augmentation to counter the imbalance between classes. The rescale argument was set to 1.0 / 255.0. The outcome of processing an image is shown in Fig. 3.
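The idea of a generator that loads and rescales images batch by batch can be sketched in plain Python. This is a minimal illustration and not the Keras generator used in this study: the names batch_generator, rescale, and load_image are hypothetical, and a small in-memory dictionary stands in for image files on disk.

```python
from typing import Callable, Iterator, List, Sequence

Image = List[List[float]]  # a grayscale image stored as rows of pixel values


def rescale(image: Sequence[Sequence[int]], factor: float = 1.0 / 255.0) -> Image:
    """Multiply every pixel by `factor`, mapping 0-255 values into [0, 1]."""
    return [[pixel * factor for pixel in row] for row in image]


def batch_generator(
    paths: Sequence[str],
    load_image: Callable[[str], Sequence[Sequence[int]]],
    batch_size: int,
) -> Iterator[List[Image]]:
    """Yield rescaled batches one at a time instead of loading every image."""
    for start in range(0, len(paths), batch_size):
        batch_paths = paths[start:start + batch_size]
        # Images are read only when this batch is requested by the consumer.
        yield [rescale(load_image(path)) for path in batch_paths]


# Hypothetical in-memory "files" standing in for X-ray images on disk.
fake_disk = {f"img_{i}.png": [[0, 128], [255, 64]] for i in range(5)}
batches = list(batch_generator(sorted(fake_disk), fake_disk.__getitem__, batch_size=2))
batch_sizes = [len(batch) for batch in batches]
```

Because only one batch is held in memory at a time, the memory footprint depends on the batch size rather than on the size of the whole dataset.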

C. Data Augmentation
In the case of class imbalance, data augmentation is a central part of image processing: it increases the number of samples of the smaller class, improves the accuracy of the model, and creates variability in the data. The general scheme of data augmentation is illustrated in Fig. 4.
As shown in the figure, the chest images first pass through the transformation functions, which are initially determined by a medical specialist. Although there are numerous image transformation methods, such as rotation, shift, brightness, shift intensity, and horizontal flip, each individual case calls for its own augmentation methods. In our case, the data were split 80% to 20% into training and test sets. The classes in the training data are noticeably imbalanced, so in order to solve this problem we used the following data processing techniques.
Shearing is a method that generates new images by shifting elements in the image. The methodology is based on shifting or slanting the image along the X and Y axes. The magnitude of the shear is set by the user through the shear_range argument of the ImageDataGenerator function. We used a small coefficient so that the change in the shape of the chest is insignificant (shear_range = 0.2). An example is shown in Fig. 5.
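The effect of a shear can be illustrated on a toy image. The sketch below implements a generic horizontal shear in which each row is shifted in proportion to its row index; the function shear_rows and its factor parameter are illustrative assumptions, and the exact parameterization used by ImageDataGenerator for shear_range may differ.

```python
from typing import List


def shear_rows(image: List[List[int]], factor: float, fill: int = 0) -> List[List[int]]:
    """Horizontal shear: row y is shifted right by round(factor * y) pixels."""
    width = len(image[0])
    sheared = []
    for y, row in enumerate(image):
        shift = round(factor * y)
        new_row = [fill] * width
        for x, value in enumerate(row):
            # Pixels pushed outside the frame are dropped.
            if 0 <= x + shift < width:
                new_row[x + shift] = value
        sheared.append(new_row)
    return sheared


# A vertical bar in column 1 of a 4 x 6 image becomes a slanted bar.
bar = [[0, 1, 0, 0, 0, 0] for _ in range(4)]
slanted = shear_rows(bar, factor=1.0)
```

A small factor keeps the slant slight, which matches the stated aim of changing the shape of the chest only insignificantly.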
Rotation changes the angle at which the object is positioned. In recent updates, images can be altered not only along the horizontal axis but also tilted in any direction. Moreover, rotation can improve the model and make it more stable on new data. In general, this method helps against overfitting: if all images are supplied in a fixed position, rotating them increases the variety of the data, which prevents the model from overfitting. We used the rotation_range = 10 argument of the ImageDataGenerator function (see Fig. 6).
Contrast-limited adaptive histogram equalization (CLAHE) is a processing method that changes the contrast of an image. It is a modification of adaptive histogram equalization (AHE) that prevents excessive noise amplification in the image by limiting contrast enhancement: a threshold clips the histogram before the cumulative distribution function is calculated. To use it in our data processing, we wrote a function and passed it as the preprocessing_function argument of ImageDataGenerator.
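The histogram clipping step that distinguishes the contrast-limited method from plain AHE can be sketched as follows. This is a simplified, single-tile illustration under stated assumptions: the function names and the even redistribution of the clipped excess are illustrative choices, and a full implementation also splits the image into tiles and interpolates between them.

```python
from typing import List


def clip_histogram(hist: List[int], clip_limit: int) -> List[int]:
    """Cut every bin at clip_limit and spread the excess evenly over all bins."""
    excess = sum(max(count - clip_limit, 0) for count in hist)
    clipped = [min(count, clip_limit) for count in hist]
    share, remainder = divmod(excess, len(hist))
    # Every bin gets an equal share; the first `remainder` bins get one extra.
    return [c + share + (1 if i < remainder else 0) for i, c in enumerate(clipped)]


def cumulative_distribution(hist: List[int]) -> List[float]:
    """Normalized running sum of the histogram, used as the intensity mapping."""
    total = sum(hist)
    running, cdf = 0, []
    for count in hist:
        running += count
        cdf.append(running / total)
    return cdf


hist = [0, 2, 10, 2, 2]                  # one dominant intensity bin
clipped = clip_histogram(hist, clip_limit=4)
cdf = cumulative_distribution(clipped)   # flatter mapping than from the raw histogram
```

Clipping lowers the dominant peak while keeping the total pixel count unchanged, so the resulting cumulative distribution rises less steeply and amplifies noise less.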

D. Proposed Model
Today, convolutional neural networks are extensively used for classification, image segmentation, and object detection. The principle of the algorithm is as follows: it assigns importance (weights) to each area of the image and, based on these weights, learns to distinguish one from another. The architecture of a convolutional neural network is illustrated in Fig. 7.
As shown in the figure, the network can be divided into several stages: input layer, convolutional layer, pooling layer, fully connected layer, and output layer.
The first step of the algorithm is to prepare the image in the correct format to pass to the model. To achieve this, we converted the images into matrix form. If the input images are grayscale, then in matrix form they have size n × m, which means that the matrix is two-dimensional. However, real-world data are most often in color and therefore need to be stored in a three-dimensional matrix. In addition to width and height, the data then include a third dimension, the color channel. As soon as the images are ready, we move to the convolutional layer.
The convolutional layer is considered the main layer of a CNN. At this stage, a filter (kernel) is applied to the matrix form of the image. The number of filters is chosen freely, and each filter typically has a size of 3 × 3 or 5 × 5. The filter passes over the image and extracts activation maps. During the pass, the filter is multiplied element-wise by the underlying pixels of the image and the products are summed.
A fully connected layer is the last and most basic layer of a convolutional neural network. It uses the processed data to produce the answer. A fully connected layer combines the outputs of all the previous layers, flattens them, and transforms them into a vector.
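The filter pass of the convolutional layer, together with the max pooling used to shrink the resulting activation map, can be sketched in plain Python. This is a minimal illustration, assuming stride 1, no padding, and no kernel flip (as is common in deep learning libraries); the function names and the example image and kernel are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image ("valid" positions only, stride 1)."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            # Multiply the kernel by the pixels under it and sum the products.
            sum(
                image[i + m][j + n] * kernel[m][n]
                for m in range(k_h)
                for n in range(k_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map: Matrix, size: int = 2) -> Matrix:
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + m][j + n] for m in range(size) for n in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


image = [
    [1, 2, 3, 0, 1],
    [0, 1, 2, 3, 0],
    [1, 0, 1, 2, 3],
    [2, 1, 0, 1, 2],
    [3, 2, 1, 0, 1],
]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]  # simple vertical-edge filter
activation_map = convolve2d(image, kernel)      # 3 x 3 output
pooled = max_pool2d(activation_map, size=2)     # 1 x 1 output
```

A 5 × 5 input with a 3 × 3 kernel yields a 3 × 3 activation map, and 2 × 2 pooling then reduces it further, which is how successive layers lower the spatial size of the data.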
In the fully connected layers, various activation functions can be used. As the activation function, we use the rectified linear unit (ReLU).
Our model consists of six convolutional layers. For the first layer, the preprocessed images are submitted in matrix format with a size of 224×224×8. After each layer, max pooling is applied to reduce the size of the activation map. In the last block, the convolutional output is flattened and passed as input to the fully connected layer. The final output of the fully connected layer consists of two nodes that indicate whether a person is sick or not. To determine this, the softmax activation function is used. Adam was chosen as the optimization algorithm for the convolutional neural network. This method adapts the weights and learning rates so that the model loss is reduced.
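How the softmax function turns the two output nodes into class probabilities can be shown with a short sketch. The raw scores below are hypothetical values chosen for illustration, and the ordering of the classes is an assumption.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw output scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical raw scores of the two output nodes: [normal, pneumonia].
probabilities = softmax([0.5, 2.5])
predicted_class = probabilities.index(max(probabilities))
```

The node with the larger probability gives the predicted class, and the probability itself can serve as a confidence score when plotting a ROC curve.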
To characterize the correctness of our solutions, the categorical cross-entropy loss function is used. The loss function has the following general form:
$$L(y, \hat{y}) = -\sum_{i} y_i \log \hat{y}_i$$
Here, $\hat{y}_i$ is the i-th response of the model and $y_i$ is the corresponding target value. This loss suits our model well, since it measures how distinguishable two discrete probability distributions are from each other. The architecture of the entire convolutional neural network is illustrated in Fig. 8.
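A short numerical sketch shows how categorical cross-entropy rewards confident correct predictions. The function name, the eps guard against log(0), and the example probabilities are illustrative assumptions rather than details taken from this study.

```python
import math
from typing import Sequence


def categorical_cross_entropy(
    y_true: Sequence[float], y_pred: Sequence[float], eps: float = 1e-12
) -> float:
    """Loss for one sample: minus the sum of target * log(prediction)."""
    # eps keeps log() finite if a predicted probability is exactly zero.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


target = [0.0, 1.0]  # one-hot target: the second class is the true one
confident_loss = categorical_cross_entropy(target, [0.1, 0.9])
unsure_loss = categorical_cross_entropy(target, [0.6, 0.4])
```

With a one-hot target, only the predicted probability of the true class contributes, so the loss falls towards zero as that probability approaches 1.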

E. Normalization
Normalization is one of the main methods of preprocessing. It is used to standardize data; in short, it converts the data into the same range to reduce the computational time of the algorithm. A lack of normalization can complicate the training of the network and slow down learning. In addition to standardization, it also helps to regularize the CNN model, which is one of its unintended advantages. Specifically for our model, we used batch normalization. Note that batch normalization is not applied to the source data; it is applied between network layers. The data are divided into mini-batches, and each mini-batch is normalized to a mean of 0 and a standard deviation of 1. After processing, the normalized values approximate a standard normal distribution. The batch normalization transform has the standard form:
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2, \qquad \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta$$
Here, $x_i$ are the $m$ activations in a mini-batch, $\epsilon$ is a small constant for numerical stability, and $\gamma$ and $\beta$ are learned scale and shift parameters.
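The normalization of one mini-batch of activations can be sketched as follows. This is a minimal illustration for a single feature, assuming the standard batch normalization transform with a learned scale gamma and shift beta left at their initial values; the function name and the input values are hypothetical.

```python
import math
from typing import List


def batch_normalize(
    values: List[float], gamma: float = 1.0, beta: float = 0.0, eps: float = 1e-5
) -> List[float]:
    """Normalize one mini-batch to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # eps guards against division by zero when the batch variance is tiny.
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta for v in values]


activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_normalize(activations)
norm_mean = sum(normalized) / len(normalized)
norm_var = sum((v - norm_mean) ** 2 for v in normalized) / len(normalized)
```

After the transform, the mini-batch has a mean of approximately 0 and a variance of approximately 1, regardless of the scale of the incoming activations.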

F. Evaluation Metrics
Evaluation criteria include accuracy, precision, recall, and the AUC-ROC curve. In this section, we explain each of the evaluation parameters applied in the current research.
Confusion Matrix: With the help of the confusion matrix, we can see in more detail how our model behaves in various situations (see Fig. 9). It shows not only the model's errors but also their types. Such a breakdown of the responses helps to overcome the limitations of relying on accuracy alone.
AUC-ROC Curve: The receiver operating characteristic (ROC) curve is a probability curve that plots the true positive rate (TPR) against the false positive rate (FPR). These, in turn, separate the "signal" from the "noise". The area under the curve (AUC) is the area under this curve. The larger it is, the better the classification model works.
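The metrics named above follow directly from the four cells of the confusion matrix, and the AUC can be computed from ranked scores. The sketch below uses illustrative counts and scores only, not the results reported in this study; the function names are hypothetical, and the AUC is computed with the pairwise-ranking interpretation rather than by integrating a plotted curve.

```python
from typing import Dict, List


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Derive the usual metrics from the four confusion matrix cells."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity or TPR
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
    }


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Illustrative counts and scores only, not the results reported in this paper.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
auc = roc_auc(labels=[1, 1, 0, 1, 0, 0], scores=[0.9, 0.8, 0.7, 0.6, 0.3, 0.2])
```

In this example the accuracy is 0.85 while the recall is lower, at about 0.82, which illustrates why a single metric can hide the type of error a model makes.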

V. EXPERIMENTAL RESULTS
The accuracy of the model on the training data reached 91%, which indicates that the model classifies chest images adequately. However, it would be impractical to rely on only one metric given the class imbalance. Therefore, we also compared other metrics: precision and recall remained around 90 percent, while performance on the validation data decreased. This happens because the validation set contains only 16 new images that the model has not seen before; the system classified 13 of them correctly and only 3 incorrectly (13/16 ≈ 81%), which is a very good indicator.
More detailed statistics can be seen in Fig. 10, where the model's responses are recorded point-by-point for each case. Our neural network correctly identified 372 patients with pneumonia and 194 healthy patients, while in some images the system misclassified patients. In most of these cases, the images are ambiguous or of low quality.
Fig. 12 demonstrates several examples of the results obtained using the proposed deep convolutional neural network. Fig. 12(a) illustrates the case in which the proposed model found no trace of pneumonia and the ground truth coincides with that decision. Fig. 12(b) shows the case in which the model detected pneumonia and the ground truth is pneumonia. Fig. 12(c) shows the case in which the model detected pneumonia although there is none, which corresponds to a false positive.
Fig. 12. Examples of the obtained results: (a) predicted class is false, ground truth is false; (b) predicted class is true, ground truth is true; (c) predicted class is true, ground truth is false.
Table II compares the proposed deep convolutional neural network with state-of-the-art studies dedicated to deep learning based pneumonia detection. The results show that the proposed deep convolutional neural network achieves high performance in terms of different evaluation parameters, including accuracy (97.2%), precision (97.1%), recall (97.43%), F-score (96%), and AUC (98.8%).
The experimental results show that the proposed model, which is a lightweight CNN, not only performs a smaller number of calculations than the majority of deep learning models, but also performs better in terms of important evaluation parameters such as accuracy and recall. This can be observed when the number of variables is almost equal to an order of magnitude. Therefore, utilizing the depthwise separable convolution is beneficial in this situation.
With the advent of deep learning, the majority of image processing models now include a huge number of parameters and need a significant amount of computation, making them unsuitable for implementation in embedded devices. When it comes to diagnosing pneumonia, which is a fairly frequent condition, it is also necessary to consider how to do so in a timely and accurate manner in locations where medical equipment and medical professionals are unavailable. This particular benefit is one of the reasons why we suggest utilizing the proposed deep CNN for pneumonia diagnosis. Recently, some studies [15,16] tried to develop a model from scratch or to adapt an existing model for the purpose of identifying pneumonia. Other studies [13,14,17] concentrated on applying transfer learning and pretrained models. Mahmud et al. [19] utilized a dataset from Kaggle [34] and Wang et al. [20] used training data from Mendeley, while the majority of studies [29][30][31][32] applied the Chest X-ray dataset [30]. In contrast to this work, a few other scholars [13,15] did not employ any kind of data augmentation technique. Table II demonstrates that the proposed model has a recall of 97.43%. Recall, or sensitivity, is the most essential parameter in medical applications, since it reveals the proportion of actual positive cases that are correctly identified. All of the prior work [25-27, 29, 31-36] has been shown to produce a lower accuracy, sensitivity, precision, and F1 score than the proposed approach. In addition, the size of the dataset that they employed is far smaller than the one used in this work. The structure of the proposed model is straightforward, which indicates that it converges quickly and does not call for a significant amount of processing resources.
On the other hand, the generalization power of the recommended model is not on par with that of pretrained models. Only 5852 CXR images were utilized for the training of the suggested architecture, in contrast to the millions of photos that were used for the training of the pretrained models. The picture dataset that was employed for this research is insufficient to develop a credible CNN model with a high level of accuracy and to include all of the intrinsic image characteristics associated with pneumonia. In conclusion, the researchers who carried out the earlier studies did not provide sufficient information to carry out an exhaustive analysis; furthermore, they did not divulge the methodology that they applied in order to validate their data, and it is unclear whether or not they made use of cross validation like the current investigation did.
The fact that this study produced almost faultless outcomes lends credibility and dependability to the applied methodology [37][38][39][40]. Lastly, the proposed model will have an influence on medicine. With the help of the proposed model, medical professionals working in rural regions will be able to diagnose pediatric pneumonia in a timely manner that is both cost-effective and accurate. For older patients and younger children in particular, prompt and precise diagnosis of pneumonia may lessen the likelihood of deadly consequences from the disease. The proposed model can also reduce the interpretation variability and subjectivity that might arise while reading a chest X-ray radiograph. It may also be used to help inexperienced radiologists in distant places that lack professional radiologists to make a proper choice about a patient's treatment. The last phase is to create a mobile app that can differentiate between pneumonia and normal chest X-ray images, which will be used by airport personnel.

VII. CONCLUSION
Our final task is to deploy this project in medical institutions, since AI helps hospitals, especially those with limited resources, to examine suspected patients quickly for further diagnosis and treatment. Within a few seconds of a chest X-ray being taken, the radiologist receives a notification about whether the patient needs to be assigned a high priority and enter a protocol for the treatment of pneumonia. More traditional methods may take longer to process cases as the number of infected patients increases. Thus, AI has three valuable features during a serious outbreak:
1) Patients with symptoms are admitted to hospitals in large numbers, and AI can help to prioritize patients quickly.
2) AI serves as a supplement to the diagnosis of pneumonia, since the capacity of laboratories, even in well-equipped cities, is insufficient to keep pace with the increase in the number of suspected cases. AI is an important supplement in an outbreak of a highly infectious disease.
3) AI can easily compare changes in the lungs across different examinations of the same patient, which can be tedious and difficult for doctors, especially during an epidemic.
All things considered, we firmly believe that our project helps doctors to diagnose pneumonia, provides immediate help to sick people, and saves time for doctors as well as patients.