Abnormal Pulmonary Sounds Classification Algorithm using Convolutional Networks

Worldwide and in Peru, Acute Respiratory Infections are the main cause of death, especially in the most vulnerable populations: children under 5 years of age and older adults. Pneumonia is the leading cause of death of children in the world, and 60.2% of pneumonia cases affect children under 5 years of age. Thus, prevention and timely treatment of lung diseases are crucial to reduce infant mortality in Peru. Among the main problems associated with this high percentage is the lack of medical professionals and resources, especially in remote areas such as Puno, Huancavelica and Arequipa, which experience temperatures as low as -20°C during the cold season. This study develops an algorithm based on convolutional neural networks to differentiate between normal and abnormal lung sounds. An initial base of 917 sounds was used; through data augmentation, this base was increased to 8253 sounds in total, because convolutional neural networks need a large amount of data. From each signal, features were extracted using three methods: MFCC, Mel spectrogram and STFT. Three models were generated. The first classifies normal and abnormal sounds and obtained a training accuracy of 1 and a testing accuracy of 0.998. The second classifies normal sound, pneumonia and other abnormalities and obtained a training accuracy of 0.9959 and a testing accuracy of 0.9885. The third classifies by specific ailment and obtained a training accuracy of 0.9967 and a testing accuracy of 0.9909. This research provides findings on the automatic classification of lung sounds using convolutional neural networks and is a starting point for the development of a platform to assess the risk of pneumonia at first contact, allowing rapid care and referral in order to reduce the mortality associated mainly with pneumonia.
Keywords—Algorithm; classification; computational neural networks; lung sounds; mortality; pneumonia


I. INTRODUCTION
Pulmonary diseases in the new pandemic context are a public health concern. Pneumonia, for example, is evolving rapidly and its complications threaten the lives of the population, especially in low-income countries such as Peru [1], [2]. The Ministry of Health recognizes Acute Respiratory Infections (ARI) as a constant concern in Peru, caused by viruses, bacteria and fungi. Among ARI, pneumonia is consistently the leading cause of death in children and older adults in the world. In Peru, 60.2% of pneumonia cases occur in the 0-5 age group, and it is the leading cause of death [3].
Prevention of pneumonia complications is a priority to reduce mortality in children, especially in the most remote areas of the country.
Peru is a country with diverse climates and regions. There are more than 30 cities at altitudes ranging from 2000 to 5100 meters above sea level [4]. Populated centers in departments such as Puno, Huancavelica, Arequipa, Junin, Pasco and Cuzco, among others, endure extreme temperatures, even below -20 degrees Celsius [5]. In the cold seasons, vulnerability, remoteness and limited access to medical services have made ARI, especially pneumonia, the main cause of death for many years [6].
Pneumonia, when detected in its early stages and in the absence of medical or viral complications, responds well to antibiotic treatment. Mortality associated with this disease is due to late diagnosis and to complications associated with viruses, bacteria or comorbidities [7], [8], [9], [10]. For this reason, WHO created a program for the control of respiratory infections aimed at community workers, which did not have the desired impact because of imprecision in evaluating and calculating some clinical data, such as respiratory rate [11], [12].
Accurate diagnosis of pneumonia depends on medical expertise and requires evaluation of various clinical features; shortage of expertise and appropriate diagnostic tools hinders timely treatment [13].
A health specialist diagnoses pneumonia from a chest X-ray and clinical data, but in rural areas of Peru these resources are not available, and neither are the health professionals needed to use them. As a result, 2 out of 3 deaths from pneumonia occur out of hospital: moving the patient from one population center to another for a diagnosis brings complications that increase the risk of death [14].
One of the most important clinical data considered by the health specialist is the result of pulmonary auscultation. However, this result is subjective and depends on the training and skill of the health personnel in charge, who must be able to recognize the different sound patterns. Our research quantifies these patterns, removing subjectivity, to obtain a reliable classification of abnormal sounds that supports the health personnel or community agent at the first level of care. The most common pulmonary sounds are crackles in pneumonia and wheezing in bronchitis. According to sound analysis, these sounds lie between 100 and 1000 Hz, which is the frequency range covered in this study [16]. On the other hand, the acoustic stethoscope has a frequency response that attenuates the frequency components of the sound signal that the human ear is not particularly sensitive to; thus the analysis of sound in a wider frequency range can benefit the follow-up and evolution of cases [16].
This study details the creation of an initial algorithm for the classification of lung sounds from the characteristics of the audio signals. It is a first step in the development of an automatic model with continuous learning, and a first approach to algorithms of this kind based on the definitions and characteristics of sounds in general [15]. Using convolutional networks, this research evaluates lung sounds with an automatic algorithm that, together with clinical data, will allow the risk of pneumonia to be assessed at an early stage.
This research aims to contribute to this problem by proposing a mobile tool that uses deep learning techniques to determine whether a patient is going through a pulmonary process or pneumonia, based on clinical data and on captured lung sounds classified with convolutional neural networks. The project is based on convolutional neural network models for the classification of lung sounds and the identification of wheezing, hoarseness or crackles, to support analysis in places where no medical personnel are available for timely diagnosis. In this way, artificial intelligence provides, by means of learning, the capacity for automatic classification based on artificial neural networks, using computational servers for the respective calculations [17]. The system could help non-expert technicians, such as health assistants or community workers, to detect pneumonia in low-resource settings where specialized personnel are not available to interpret ultrasound images.

A. Sample
This investigation used a public sound database compiled by 2 research groups from Portugal and Europe, which contains classifications and clinical data of 126 patients from whom a total of 920 sounds were collected. The equipment used for sound collection consists of electronic stethoscopes: 3M Littmann 3200 Electronic Stethoscope (Litt3200).
It is worth mentioning that, because the audio was recorded with this equipment directly on the patients, the captured sounds include those emitted by the lungs together with some noise from the heart, from other sounds in the body, or from external sources present at the moment of capture.
The sounds have a duration from 10 to 90 seconds, were captured by specialists and are classified into five main ailments.
The use of convolutional neural networks for this research requires a large amount of data. The database had a limited number of samples for some classes; for this reason, data augmentation was used to obtain a base with better characteristics that supports a better classification.
The Data Augmentation process has been carried out using three sequential processes. Fig. 1 presents the spectrogram of a normal sound at the beginning of the process.
Noise Addition: This process adds white noise to the sample. White noise consists of random samples drawn at regular intervals from a distribution with a mean of 0 and a standard deviation of 1. To achieve this, we use numpy's normal method to generate this distribution and add it to the original sample. A set of noise factors was used.
Time Shifting: This is the process of moving the acoustic wave to the right by a given factor along the time axis. To accomplish this, we used numpy's roll function to generate time shifts. A set of shift factors was used.
Pitch Shifting: An implementation of the pitch scale used in musical instruments. It is the process of changing the pitch of the sound without affecting its speed. We use the pitch_shift function of librosa (example in Fig. 4), which takes the waveform, the sampling frequency and the number of steps by which the pitch should be shifted. The following factors were used:
Pitch shifting by a factor of +2.
Pitch shifting by a factor of -2.
With this process, the data increased by a factor of 9. Table I shows the original data and the final data after the complete process.
In Table I, the first column shows the initial classification and the second column the number of samples originally in the database. The third column gives the amount of data obtained after the data augmentation process. The last two columns show the classifications that were performed: the classification between healthy and sick, and the identification of healthy, pneumonia, and some other disease. Each base and classification has its respective model, based on the characteristics of the pulmonary sounds.
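The noise addition and time shifting steps described above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the sampling rate, the noise factor `0.005`, the shift of `0.25` seconds and the synthetic test tone are all hypothetical values chosen for the example, since the factors actually used are not given in the text.

```python
import numpy as np

def add_white_noise(y, noise_factor, rng):
    """Add zero-mean, unit-variance Gaussian noise scaled by noise_factor."""
    noise = rng.normal(loc=0.0, scale=1.0, size=y.shape)
    return y + noise_factor * noise

def time_shift(y, sr, shift_seconds):
    """Move the waveform to the right along the time axis with np.roll.

    np.roll is circular: samples pushed past the end reappear at the start.
    """
    return np.roll(y, int(sr * shift_seconds))

# Hypothetical example signal: 1 second of a 200 Hz tone at 4000 Hz.
sr = 4000
t = np.arange(sr) / sr
y = np.sin(2.0 * np.pi * 200.0 * t)

rng = np.random.default_rng(0)                 # fixed seed for repeatability
noisy = add_white_noise(y, noise_factor=0.005, rng=rng)
shifted = time_shift(y, sr, shift_seconds=0.25)
```

Pitch shifting is not included in the block so that it depends only on NumPy; with librosa installed, a call such as `librosa.effects.pitch_shift(y, sr=sr, n_steps=2)` would correspond to the +2 factor mentioned above.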

III. METHODOLOGY
The objective of this study is to model, by means of convolutional neural networks, the classification of normal and abnormal sounds. To this end, the study design followed a sequential process consisting of three main parts, schematized in Fig. 5. The procedures shown are:
Data Augmentation process: to achieve a more solid base. Our original base had 917 sounds and the final base has 8253 sounds, which we finally worked with, thus allowing an adequate classification and model. This process was explained in the population and sample section [17].
Feature extraction process: 3 methods were used, Mel spectrogram, STFT and MFCC; the sound features were combined and used to generate 3 models [7].
Modeling process: after the features were extracted, they were classified in three ways: the first to classify Normal and Abnormal sound, the second to classify Healthy, Pneumonia or other diseases, and the last to allow classification by disease.
The feature extraction process is performed by 3 methods:
STFT: a sequence of Fourier transforms of a windowed signal that provides time-localized frequency information for situations where the frequency components of a signal vary over time. As a general rule, a narrow window width generates better resolution in the time domain but poor resolution in the frequency domain, and vice versa. The STFT is often visualized through its spectrogram, which is an intensity plot of the STFT magnitude over time. In the present work, a window length of 2048 and an offset (hop) length of 512 were used.
Mel spectrogram: once the STFT is obtained, it is transformed to the Mel scale by applying a bank composed of multiple triangular band-pass filters; for this work 128 filters were used.
MFCC: in sound signal processing, the analysis is usually completed by calculating the logarithm of the energy of each frequency band and then the Discrete Cosine Transform, obtaining the Mel Frequency Cepstral Coefficients (MFCC).
The anomaly classifier based on convolutional neural networks that we used is summarized in Fig. 6. For each classification, different variations of the convolutional neural network (CNN) architecture were tested, but all share certain similarities. For all models, the structure of the convolutional network was as follows: an input layer with dimensions of 256 x 256, followed by 7 convolutional layers with a 3x3 kernel and ReLU activation function, whose filters had dimensions of 32, 64, 128, 256, 512 and 1028 respectively, 7 Max Pooling layers and 7 dropout layers of 20%.
Next follows the final convolutional layer, of type GlobalAveragePooling2D with softmax activation function, followed by a fully connected layer of 4 neurons with softmax activation function. Finally, there is an output layer that varies depending on the classification required: 2 dimensions for dichotomous classification, 3 dimensions for pneumonia classification and others, and 6 dimensions for complete classification. The process is shown in Fig. 7.
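To make the layer arithmetic of this architecture concrete, the sketch below traces the spatial size of the feature maps through the seven convolution and max-pooling blocks and applies a softmax to an example output vector. It is only an illustration under stated assumptions: the text does not give the padding mode or the pooling size, so size-preserving convolutions and 2x2 pooling with stride 2 are assumed, and the example logits are hypothetical.

```python
import math

def trace_feature_map_sizes(input_size=256, n_blocks=7, pool=2):
    """Spatial size after each conv + max-pool block.

    Assumes the 3x3 convolutions preserve the spatial size ('same'
    padding) and each pooling layer divides it by `pool`.
    """
    sizes = [input_size]
    for _ in range(n_blocks):
        sizes.append(sizes[-1] // pool)
    return sizes

def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

sizes = trace_feature_map_sizes()
print(sizes)  # [256, 128, 64, 32, 16, 8, 4, 2]

# Hypothetical raw scores for the 3-class head (normal, pneumonia, other).
probs = softmax([2.0, 0.5, -1.0])
```

Under these assumptions a 256 x 256 input is reduced to a 2 x 2 map before global average pooling, and the softmax output has one probability per class (2, 3 or 6 depending on the model).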
The modeling process starts after the extraction of features by the previous methods. With these data, 3 classifications are modeled:
Model 1: classification of normal and abnormal sound.
Model 2: classification between normal sound, pneumonia and other diseases.
Model 3: classification by disease and wellness, which allows a finer classification.
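The three feature extraction steps named in this section (STFT, Mel filter bank, and logarithm plus Discrete Cosine Transform) can be sketched with NumPy alone as follows. The window length of 2048, hop of 512 and 128 Mel filters come from the text; everything else is an assumption made for illustration, including the Hann window, the HTK-style Mel formula, the sampling rate of 22050 Hz, the 20 coefficients kept and the synthetic 440 Hz test tone. A library such as librosa may use different conventions and would give numerically different values.

```python
import numpy as np

def stft_magnitude(y, n_fft=2048, hop=512):
    """Magnitude STFT with a Hann window (frames taken without padding)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T   # (n_fft // 2 + 1, n_frames)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft=2048, n_mels=128):
    """Triangular band-pass filters spaced evenly on the Mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0),
                                  n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc_from_mel(mel_power, n_mfcc=20):
    """Log of the band energies followed by an unnormalized DCT-II."""
    log_mel = np.log(mel_power + 1e-10)
    n = log_mel.shape[0]
    k = np.arange(n_mfcc)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    return basis @ log_mel

# Hypothetical input: 1 second of a 440 Hz tone sampled at 22050 Hz.
sr = 22050
y = np.sin(2.0 * np.pi * 440.0 * np.arange(sr) / sr)

spec = stft_magnitude(y)                    # STFT magnitude
mel = mel_filterbank(sr) @ (spec ** 2)      # Mel spectrogram (power)
mfcc = mfcc_from_mel(mel)                   # MFCC
peak_hz = np.argmax(spec[:, 0]) * sr / 2048
```

In this sketch the frequency resolution is `sr / 2048`, about 10.8 Hz per bin, which illustrates the trade-off described above: a longer window would separate nearby frequencies better but would localize events in time less precisely.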

Statistical Analysis
To evaluate each of the 3 classification models selected for analysis in this paper, we considered several measures to validate their prediction or effectiveness [18], [19].
Among the measures to evaluate each model are:
Training Accuracy: this measure assesses the accuracy of the designed model on the training set.
Testing Accuracy: this value represents the accuracy of the model on the test set, which is different from the one used for training.
Precision: this measure quantifies the number of positive class predictions that actually belong to the positive class.
Recall: this measure quantifies the number of positive class predictions made out of all positive examples in the data set.
F1-score: combines precision and recall to return a more general measure of model quality.
Support: the amount of data available to evaluate the model.
Confusion matrix: within the field of artificial intelligence, it is important to measure the power of statistical classification. The confusion matrix allows the visualization of the performance of an algorithm used in supervised learning, showing clearly, in absolute values or percentages, how the model has classified the subgroups that we seek to model.
Receiver Operating Characteristic (ROC) curve: a graphical representation of sensitivity against 1 - specificity (the false positive rate) for a binary classifier system as its discrimination threshold is varied.
These measures will help us to assess and estimate the predictive ability of the classification on the set selected by the convolutional neural networks.
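As an illustration of how the measures above relate to one another, the following sketch builds a confusion matrix and derives precision, recall, F1-score, support and accuracy from it. The labels are a small made-up example (0 = normal, 1 = abnormal), not data from this study, and the function names are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_report(cm):
    """Precision, recall, F1-score and support for each class."""
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)        # true samples per class
    predicted = cm.sum(axis=0)      # predictions per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp),
                          where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp),
                       where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp),
                   where=denom > 0)
    return precision, recall, f1, support

# Made-up labels for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=2)
precision, recall, f1, support = per_class_report(cm)
accuracy = np.trace(cm) / cm.sum()
```

Here the confusion matrix is `[[2, 1], [1, 4]]`, so accuracy is 6/8, while precision and recall are 2/3 for class 0 and 4/5 for class 1; this shows why per-class measures are reported alongside overall accuracy when classes are unbalanced.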

IV. RESULTS
This study was formulated to demonstrate the feasibility of automatic software based on convolutional neural networks that categorizes pulmonary sounds from the extracted features described above. We have detailed the measures used to evaluate each model or classification set. Below, we show the results of the best model for each subgroup.
Model 1: classification of normal and abnormal sound.
For this model we worked with a classified database based on the output of the data augmentation process; the details of the database composition are shown in Table II. For the MFCC feature extraction method, a training accuracy of 1 and a testing accuracy of 0.998 were obtained.
The details of the classification report are shown in Table III.
The confusion matrix based on percentages is detailed in the graph (Fig. 8).
For this model, we obtained an area under the ROC curve of 1 for each class.
Model 2: Classification between normal sound, pneumonia and other diseases.
For this model we worked with a classified database based on the output of the data augmentation process; the details of the database composition are shown in Table IV. For the MEL feature extraction method, a training accuracy of 0.9959 and a testing accuracy of 0.9885 were obtained.
TABLE II. REGRESSION MODEL 1 RESULTS
Description | Number of samples | Percentage
The classification report for wellness, pneumonia, and other illnesses is given in Table V. The confusion matrix based on percentages is detailed in Fig. 9.
Model 3: Classification by disease.
For this model we worked with a database classified from the output of the data augmentation process; the details of the database composition are shown in Table VI. For the MFCC feature extraction method, a training accuracy of 0.9967 and a testing accuracy of 0.9909 were obtained.
The details of the classification report are shown in Table VII. The confusion matrix based on percentages is detailed in Fig. 10.
The models presented in the results are based on the features extracted by the methods explained above, which feed the convolutional neural network that we applied. These results were obtained using the database after the data augmentation process, randomly divided into training and testing sets. For each case, we have provided tables showing the distribution of the sample across the categories of each model, together with the values for each subclassification.

V. DISCUSSION
In this research, model 1, which classifies normal and abnormal sounds by means of convolutional neural networks, obtained a precision of 0.998 for normal sounds and of 1.0 for abnormal sounds.
Other studies in the literature have pursued similar objectives, some of which are discussed below.
In 2017, Aykanat [20] aimed to improve audio data analysis through machine learning in order to classify respiratory sounds. The work is based on auscultation, a non-invasive process for the diagnosis of pulmonary anomalies. In terms of materials, the Thinklabs One electronic stethoscope was used. The data were organized into 4 sets: (1) healthy versus pathological classification [17 930 clips], (2) rale, rhonchus and normal sound classification [15 328 clips], (3) singular respiratory sound type classification [14 453 clips] and (4) audio type classification with all sound types [17 930 clips]. The database was acquired in the highest quality format and in a real environment. In addition, the raw data were processed without filtering or amplification, with a single instrument and software adapted to the hospital environment. The spectrograms used are 28x28 grayscale images (to reduce storage and post-processing memory). The healthy versus pathological classification (1) reached an average accuracy of 86% using Convolutional Neural Networks. Our research uses similar techniques and a sufficient amount of data, and our results are higher than those reported in that study.
In 2011, Tracey [21] investigated a low-cost cough monitoring system for patients with multidrug-resistant (MDR) tuberculosis (TB), which could be used in places with poor access to specialized laboratories, and presented the validation of the system's algorithm, based on other research. The cough detection algorithm applies image analysis to the MFCC through machine learning algorithms and compares the performance of manual diagnosis of cough audio with diagnosis by classifiers such as Multilayer Perceptron (MLP), Support Vector Machine (SVM) and Sequential Minimal Optimization (SMO). The samples consist of 30 audio files corresponding to 10 random patients, with a recording time of 30 minutes. In addition, a total of 13 429 cough frames and 43 925 non-cough frames were used to train the neural network. The algorithm designed with SMO classifies events as cough and non-cough. It presents a sensitivity of 81%, which in the validation with 28 patients with MDR TB is a promising result for indicating cough in the patient. The designed algorithm proposes an improvement in cough detection by estimating an empirical filter for environmental noise. Each recording file contains a number of frames used to train the neural network. It consists of two known and manually diagnosed patients. To reduce the CPU load in training, the "divide and conquer" clustering method was used. In comparison, the present research focuses on pulmonary sounds, and the number of sound samples in that work is not comparable with that used in our study; it is worth mentioning that our analysis also reports higher accuracy than the aforementioned research.
In 2019, Tariq [17] used the same database as the one presented in this paper and a similar methodology, but the data augmentation processes and the results we obtained are different. Because research in medicine and advanced technologies is associated with more accurate care and treatment, their work aims to evaluate the accuracy of Deep Learning and 2D Convolutional Neural Networks (CNN) using lung spectrogram data. The methodology is based on three steps: (1) normalization, for information cleaning; (2) data augmentation, for training; and (3) a CNN model, as identifier and classifier, through libraries on MEL spectrograms. The result of the 2D CNN (3) is a model called lung disease classification (LDC) with 7 classes: Asthma (0), bronchiectasis (377), COPD (10 205), Healthy (455), LRTI (26), Pneumonia (481) and URTI (403). They report an accuracy of approximately 83% for original data and 97% for augmented and normalized data. The data augmentation methods expanded the original data by approximately 95%, which serves as a useful baseline for training. They do not specify the database format and do not prioritize the use of resources such as GPU or data storage. Accuracy increases significantly with the normalization and data augmentation techniques, reaching 97% in their research. Our research uses the same data without normalization but with data augmentation at 900%. This benefits model building, achieving a positive accuracy of 0.998 and a negative accuracy of 1.0.

VI. CONCLUSIONS
We can conclude that the best classification performance was obtained for the classification of normal and abnormal sounds, with values of 0.998 for normal and 1.0 for abnormal, which gives a high probability of success in implementing this technique in an automatic classification. The second finding comes from the second model, which reached training and testing accuracy values of 0.9959 and 0.9885; this gives us evidence of an acceptable recognition of pneumonia over other ailments, which is the primary interest of the study.
It is important to mention that in this work every model was provided with a training and a testing set. Also, the analysis by our experts showed that the sounds in the database are normal heterogeneous sounds from children and adults, from the trachea and lungs, captured with a commercial digital stethoscope under normal conditions of care, which introduces environmental noise that could negatively influence the design of each model. Even so, the models are acceptable, and the analysis yields an adequate and significant value that encourages us to continue studying this method, now for sounds of patients with ailments such as COVID during the pandemic, in order to develop a tool for the first level of care in vulnerable areas with scarce resources.
This research is the proof of concept for the development of a complete system for the first level of care that supports diagnosis through the automatic classification of sounds as normal or abnormal, in conjunction with clinical data. It is the first step of a process that we are starting in favor of reducing the gaps in health. Although recent studies work with neural networks for classification, our study aims to achieve a better understanding of the differences between these sounds in order to discriminate them adequately, as part of a study to implement an automatic platform for remote locations that assesses the risk of pneumonia at the point of care, so that we can work on low-cost tools to reduce child mortality during cold seasons in the highland areas of Peru.

ACKNOWLEDGMENT
We want to thank the Image Processing Research Laboratory (INTI-Lab) and the Universidad de Ciencias y Humanidades (UCH) for their support in this research; the National Fund for Scientific and Technological Development and Technological Innovation (FONDECYT) for the financing of this research under the project "SAMAYCOV: Desarrollo de un dispositivo electrónico portátil a bajo costo para evaluar riesgo de neumonía basado en sonido pulmonar anormal en pacientes con sospecha de COVID-19 en zonas vulnerables", CONVENIO 054-2020-FONDECYT; and the Electronics Laboratory of the UCH for providing their facilities to carry out the respective tests.