CovSeg-Unet: End-to-End Method-based Computer-Aided Decision Support System in Lung COVID-19 Detection on CT Images

The COVID-19 epidemic continues to threaten public health with the appearance of new, more severe mutations, and the delay in the vaccination process makes the situation more complex. Thus, the implementation of rapid solutions for the early detection of this virus is an immediate priority. To this end, we provide a deep learning method called CovSeg-Unet to diagnose COVID-19 from chest CT images. The CovSeg-Unet method first preprocesses the CT images to eliminate noise and bring all images to the same standard. Then, CovSeg-Unet uses an end-to-end architecture to train the network. Since CT images are not balanced, we propose a loss function to balance the pixel distribution of infected/uninfected regions. CovSeg-Unet achieved high performance in localizing COVID-19 lung infections compared to other methods. We performed qualitative and quantitative assessments on two public datasets (Dataset-1 and Dataset-2) annotated by expert radiologists. The experimental results show that our method is a practical solution that can help in the COVID-19 diagnosis process.
Keywords: Deep learning; COVID-19; loss function; balanced data


I. INTRODUCTION
In December 2019, a viral pneumonia epidemic of unknown etiology emerged in Wuhan city, Hubei province, China [1]. On January 9, 2020, the World Health Organization (WHO) and Chinese Health Authorities officially announced the discovery of a new coronavirus. This pneumonia is an infectious disease caused by a virus identified under the name SARS-CoV-2 (Severe Acute Respiratory Syndrome CoronaVirus-2) by the ICTV (International Committee on Taxonomy of Viruses) [2], and causing a disease called COVID-19 (COronaVIrus Disease 2019). SARS-CoV-2 belongs to the coronavirus family. The reservoir of this virus is probably animal. Although SARS-CoV-2 closely resembles a virus detected in a bat, the animal that transmits it to humans has yet to be identified with certainty. Several research studies suggest that the pangolin, a small mammal eaten in southern China, could be involved as an intermediate host between bats and humans.
The new coronavirus has been confirmed to be transmitted between humans [3], mainly through the air or by close contact with a contagious subject. Smaller particles can also be emitted in the form of aerosols during speech or coughing, which would explain why the virus can persist suspended in the air in an unventilated room. Finally, the virus can remain infectious for a few hours on inert surfaces, from where it can be carried by the hands. According to data from the World Health Organization, updated as of June 18, 2021, COVID-19 has affected 220 countries and territories, with 178,584,744 people infected and 3,866,607 deaths worldwide. The overall number of people recovered is 163,102,134. Currently, there are 11,616,003 active cases, of which 99.3% are in mild condition and 0.7% in serious or critical condition, which poses a great threat to international human health.
Given the slowness of the vaccination process, the high rate of virus contamination, and the appearance of new dangerous COVID-19 mutations, it is essential to detect and identify the disease at an early stage so that suspected patients do not infect the healthy population. As a result, new requirements for the prevention and control strategy must be put in place. The diagnosis of COVID-19 is confirmed by Reverse Transcription Polymerase Chain Reaction (RT-PCR) or by gene sequencing of respiratory or blood samples. However, the false negatives of RT-PCR [4], the delay in obtaining results, and the tests carried out on people not strongly suspected of being infected imply that numerous COVID-19 patients would not be identified quickly enough to be isolated from others. In addition, given the rapid and contagious spread of the virus, these patients risk infecting a larger population, especially in areas with high epidemic levels. On the other hand, chest imaging examinations quickly established themselves as a useful diagnostic tool, given the characteristic presentation of COVID-19 lesions [5]. These examinations can identify lesions, underlying conditions, and complications associated with acute airway conditions. Consequently, the use of CT, in particular High Resolution Computed Tomography (HRCT), could provide enormous help to radiologists [7] for the diagnosis, follow-up, or investigation of pulmonary complications in patients with suspected or confirmed COVID-19. Thus, the development of an artificial intelligence (AI) method based on deep learning could help them assess the degree of lung damage caused by COVID-19.
497 | P a g e www.ijacsa.thesai.org
The authors of [6] indicated that CT images can be used to detect COVID-19 even before certain clinical symptoms are observed. Typical signs of COVID-19 appear in CT images as unilateral, multifocal, and terminal Ground Glass Opacity (GGO).
This is a hazy cloud over the lungs that indicates a variety of problems; it may mean that the lungs are partially filled with inflamed material, and that there is thickening of the lung tissue or partial breakdown of the alveoli, the tiny air sacs of the lungs. Other signs include pleural effusion, lymphadenopathy, and consolidation [3], that is, air spaces in the lungs filled with a substance, usually pus, blood, or water, surrounded by an opaque ground-glass edge; although this is a common feature of lung disease, it may be more characteristic of COVID-19. To detect COVID-19 at an early stage, it is necessary to detect and locate these pathological changes in a short time. The growing number of patients and the limited number of well-trained expert radiologists in most hospitals slow down the process of early detection. Therefore, the use of deep learning methods for the automatic segmentation of COVID-19 CT images has become paramount, and may offer an effective solution to identify and locate signs of COVID-19 in CT images [8].
In this paper, a new efficient method for COVID-19 diagnosis using a deep learning network is proposed. Section II presents the related work, Section III states the problem, and Section IV explains the proposed method in detail. Section V describes the simulation experiments. Section VI discusses the obtained results. Finally, the summary and future work are given in Section VII.
II. RELATED WORK
In the literature, numerous segmentation methods based on deep learning networks have been used to process and analyze chest X-ray or CT images for COVID-19 diagnosis [9]. These methods mainly consist of delineating the regions of interest in these images, such as lobes, bronchopulmonary segments, lungs, and infected regions or lesions, for further quantification and evaluation.
CT provides detailed, high-definition three-dimensional images to detect COVID-19. Among the segmentation methods used for the diagnosis of COVID-19, we cite U-Net [10], UNet++ [11], and V-Net [12]. The authors of [9] proposed a 3D architecture of U-Net using inter-slice information; this method replaces the conventional layers of U-Net with 3D versions. In [12], the authors proposed the V-Net architecture, in which they used residual blocks as the basic convolutional block and optimized the network with a Dice loss. In [13], the authors proposed the Attention U-Net method, which captures fine structures to locate lesions and pulmonary nodules in medical images. Generally, a large number of well-labeled images is the key to training an efficient and robust segmentation network. In the case of COVID-19 image segmentation, the data available for the training phase is limited and often unavailable, because manual lesion delineation is a difficult and time-consuming operation.
Several other research works obtaining reasonable segmentation results have been proposed in this context. Because the lung segmentation field lacks labeled medical images, semi-supervised and unsupervised methods are well suited to studies on COVID-19; for example, the authors of [10] used an unsupervised method to generate pseudo-segmentation masks for the images. In [14], the authors proposed a new COVID-19 lung infection segmentation network called Inf-Net to detect infected regions from chest CT images. This method uses a parallel decoder to aggregate high-level features and generate a global map. Then, it uses a semi-supervised segmentation framework based on a randomly selected propagation strategy to overcome the lack of labeled data. In [15], the authors proposed a computer-assisted diagnosis (CAD) system based on the YOLO predictor to detect and diagnose COVID-19. This CAD method uses data-balancing regularizations, transfer learning, and augmentation to improve the overall diagnostic performance for COVID-19. The authors of [16] proposed a synergistic approach based on deep meta-learning to accelerate the detection of COVID-19 cases. This approach uses contrastive learning with a pretrained ConvNet encoder for the classification of COVID-19 cases. In [17], the authors proposed a computer-aided detection (CAD) method to help radiologists automatically detect COVID-19 on chest X-ray images. The proposed method uses two deep learning (DL) models: the Discrimination-DL to extract lung features from chest X-ray images, and the Localization-DL to localize and assign the infected lung region. In [18], the authors built prognosis models to predict the patients' severity outcomes. The proposed method is based on deep learning for CT image segmentation of COVID-19 pneumonia, and it uses datasets from multiple institutions worldwide to validate the proposed models.
In [19], the authors propose the CovFrameNet framework to detect COVID-19 cases using CT images, which incorporates an image preprocessing mechanism and a deep learning model for smoothing, denoising, feature extraction, classification, and performance measurement.
III. PROBLEM STATEMENT
Our main goal in this paper is to diagnose COVID-19 lung infection in chest CT images. To this end, we propose the CovSeg-Unet method, whose architecture is similar to that of U-Net [10], the most commonly used algorithm in medical image segmentation. U-Net is a symmetrical encoder/decoder architecture consisting of several stages. Each stage of the encoder performs a set of operations such as convolution, normalization, max pooling, activation, and concatenation. In parallel, each stage of the decoder performs deconvolution operations. U-Net uses skip connections, which allow the exploitation of both local and global information. These connections concatenate the down-sampled features of the contracting path with the up-sampled features of the expanding path.
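To make the role of these connections concrete, the following is a minimal sketch in plain Python, not the implementation used in this work. It assumes feature maps stored as lists of 2D channels, and it stands in nearest-neighbour up-sampling for the learned deconvolution; the function names `upsample_nearest_2x` and `skip_concat` are hypothetical.

```python
def upsample_nearest_2x(channel):
    """Double the height and width of one 2D feature map by repeating values.

    Assumption: this replaces the learned deconvolution of a real decoder.
    """
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def skip_concat(encoder_maps, decoder_maps):
    """Up-sample the decoder maps and stack them with the encoder maps
    of the same stage along the channel axis."""
    upsampled = [upsample_nearest_2x(ch) for ch in decoder_maps]
    height, width = len(encoder_maps[0]), len(encoder_maps[0][0])
    for ch in upsampled:
        if len(ch) != height or len(ch[0]) != width:
            raise ValueError("encoder and up-sampled decoder maps must match in size")
    return encoder_maps + upsampled


# One 4x4 encoder channel and two 2x2 decoder channels.
encoder_maps = [[[1, 2, 3, 4] for _ in range(4)]]
decoder_maps = [[[5, 6], [7, 8]], [[0, 1], [1, 0]]]
merged = skip_concat(encoder_maps, decoder_maps)
```

The merged tensor carries the fine spatial detail of the encoder channel next to the coarser, more semantic decoder channels, which is the information the next decoder stage convolves over.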
The general idea of the CovSeg-Unet approach is as follows. Let S be a learning space that contains a set of n images $X = \{x_i\}_{i=1}^{n}$ and the n corresponding ground-truth masks $Y = \{y_i\}_{i=1}^{n}$. From the ground-truth masks Y, the network learns the distribution of lung infections in the learning images X to establish an image-to-image mapping between X and Y. This mapping is defined as the composition of a decoder function with an encoder function.
The encoder function learns the characteristic vectors of infected lung regions to establish a feature space, while the decoder function learns the spatial localization of features to better locate the infected/uninfected regions. The network then outputs the set of probabilities extracted from an input biomedical image.

IV. PROPOSED METHOD
In this section, we present the CovSeg-UNet approach that we propose to diagnose COVID-19 from CT images. The architecture of the CovSeg-UNet approach is shown in Fig. 1. It is made up of two blocks: the first preprocesses the CT images, while the second performs all encoder/decoder operations to learn high-level features from the training sets and locate the spatial information. The details of the CovSeg-UNet network are given below.

A. Preprocessing
One of the major problems encountered in deep learning is the processing of biomedical images that come from several machines with different acquisition parameters. To deal with this problem, we apply a preprocessing stage to these images to improve the learning of the COVID-19 recognition model despite the data heterogeneity. This stage consists of two steps. The first concerns the normalization of the signal, which processes the intensity of each voxel of the CT scan. In this step, we chose the pulmonary window value from Table I to separate the lungs from the other organs. Since each CT scanner has its own Hounsfield unit (HU) values, the data collected in different hospitals will have different HU values. For this reason, we use a multi-valued window defined by a window center value and a window width: the center is randomly assigned from -600 to -500, while the width is fixed at 1200. In the second step, we normalize the CT images to the range [0, 255]. Through this process we bring images from different scanners to the same standard by separating the lungs from other organs, removing unnecessary information (CT features, etc.) and/or noise, augmenting the images, and improving accuracy.
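The two preprocessing steps can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: the slice is a nested list of HU values, the window center is drawn uniformly from [-600, -500], and the rescaling to [0, 255] is linear. The function names `window_and_rescale` and `random_lung_window` are hypothetical.

```python
import random


def window_and_rescale(hu_slice, center, width=1200):
    """Clip HU values to [center - width/2, center + width/2] and
    rescale them linearly to integers in [0, 255]."""
    low = center - width / 2.0
    high = center + width / 2.0
    out = []
    for row in hu_slice:
        out_row = []
        for hu in row:
            clipped = min(max(hu, low), high)
            out_row.append(int(round(255.0 * (clipped - low) / (high - low))))
        out.append(out_row)
    return out


def random_lung_window(hu_slice, rng=random):
    """Apply a lung window whose center is drawn at random.

    Assumption: a uniform draw; the text only gives the range."""
    center = rng.uniform(-600, -500)
    return window_and_rescale(hu_slice, center)


# A tiny 2x3 "slice": air, lung tissue, soft tissue, and bone-like values.
example = [[-2000, -1200, -600], [0, 400, 3000]]
scaled = window_and_rescale(example, center=-600)
augmented = random_lung_window(example, rng=random.Random(0))
```

With a center of -600 and a width of 1200, every value at or below -1200 HU maps to 0 and every value at or above 0 HU maps to 255, so structures denser than soft tissue saturate and the lung parenchyma occupies most of the intensity range.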

B. CovSeg-UNet Architecture
In order to detect COVID-19 lung infections in CT images, we propose the CovSeg-UNet approach, which is characterized by an end-to-end architecture based on U-Net, one of the most robust approaches in biomedical image segmentation. The CovSeg-UNet network takes as input the pre-processed CT images and their ground-truth masks. Our proposed method relies on an encoder, which extracts contextual feature maps from the pre-processed images while reducing their dimensions, and a decoder, which locates the feature map information in the image. Table II shows the detailed architecture of our encoder/decoder. The encoder is made up of four residual blocks (ResBlock) pre-trained on the ImageNet database. These blocks allow us to avoid vanishing/exploding gradients, to preserve local information, to improve precision by increasing the depth of the network, and to ease the training of the layers. Each residual block is made up of two blocks (conv_bloc and identity_bloc) and is expressed by $F(x) + x$, where $x$ is the input vector and $F(\cdot)$ represents the mapping from the input to the output of the residual unit. The decoder, in turn, has two inputs: one from the parallel layer of the encoder and one from the previous layer of the decoder. Finally, the decoder output is sent to the SoftMax activation function (see equation (1)) to predict the region infected with COVID-19:

$$\sigma(\vec{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \qquad (1)$$

where $\vec{z} = (z_1, \dots, z_K)$ is the input vector of the softmax function, and each $z_i$ can take any real value. The exponential $e^{z_j}$ is applied to obtain a positive value for each element of the input vector. The denominator is the normalization term, which ensures that all output values lie in the range (0, 1) and sum to 1, thus constituting a valid probability distribution. $K$ represents the number of classes in the multiclass classifier. Algorithm 1 describes the training steps of the proposed model.
Algorithm 1: Training steps of the proposed model
for each mini-batch sample do
    Predict the output with the SoftMax function (equation (1))
    Compute ∆, the stochastic gradient, by minimizing the loss function (equation (2))
    Update the weights
end for
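The SoftMax step used for prediction can be illustrated with a short plain-Python sketch. It is not taken from the CovSeg-UNet code; subtracting the maximum score before exponentiating is a standard numerical-stability device added here as an assumption, and it does not change the result.

```python
import math


def softmax(z):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    m = max(z)  # shift by the maximum so exp() cannot overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)  # normalization term
    return [e / total for e in exps]


# Scores of one pixel for the two classes (uninfected, infected).
probs = softmax([2.0, 0.0])
predicted_class = probs.index(max(probs))
```

Applied independently to every pixel of the decoder output, this yields the per-pixel probability map from which the infected region is predicted.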

C. Loss
The class imbalance problem is a major challenge in the detection of lung infections, because the distribution of infected/uninfected regions is highly skewed (the infected regions cover between 0% and 20% of the pixels of the lung image). If the loss function does not take this problem into account, the model will classify the majority of pixels as uninfected and will overfit. For this reason, we use a class-balanced cross-entropy loss function with a penalty factor.
The loss function is defined as a weighted sum of two loss functions, the balanced cross-entropy loss $\mathcal{L}_{bce}$ and the inverse cross-entropy loss $\mathcal{L}_{ice}$:

$$\mathcal{L} = \lambda_1 \mathcal{L}_{bce} + \lambda_2 \mathcal{L}_{ice} \qquad (2)$$

For the cross-entropy term, we use the balanced cross-entropy to overcome noise in biomedical images, together with the balancing parameter $w$ to balance the pixel distribution of infected/uninfected regions:

$$\mathcal{L}_{bce} = -\sum_{k} w(k)\, q(y = k \mid x) \log p(k \mid x) \qquad (3)$$

where $q(y = k \mid x)$ is the ground-truth mask of the sample $x$, $k = 0, 1$, $p(k \mid x)$ is the probability map produced by the softmax function, and $w$ is the balancing parameter, $w(k) = S / S(k)$. $S$ is the number of samples in the training set, and $S(k)$ is the number of samples in the class $k$. However, cross-entropy relies heavily on the accuracy of the annotation. When the data is mislabeled, $q(y = k \mid x)$ will not be able to represent the true class distribution, which will lead the cross-entropy to fit $p(k \mid x)$ to this incorrect distribution. To deal with this problem, we use reverse learning to learn which classes the input does not belong to. The inverse cross-entropy is defined as follows:

$$\mathcal{L}_{ice} = -\sum_{k} p(k \mid x) \log q(y = k \mid x) \qquad (4)$$

The weighted combination of the two entropies ($\mathcal{L}_{bce}$ and $\mathcal{L}_{ice}$) in the loss function allowed a good convergence of the gradient and relevant learning. The efficiency of this loss shows in the balanced treatment of the infected and uninfected classes, while its robustness shows in its resistance to noise caused by scanner settings. As a result, the balanced symmetric entropy function obtained high performance in the localization of COVID-19 lung infections on the test dataset.
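A weighted sum of a class-balanced cross-entropy and an inverse (reverse) cross-entropy can be sketched as below. This is an illustration under several assumptions rather than the exact loss of this work: class weights are computed per batch as the total pixel count divided by the per-class count, the ground truth is one-hot, log(0) in the inverse term is replaced by a constant (`log_zero`, a hypothetical parameter), and both terms are averaged over pixels. The default weights 0.3 and 0.5 are the values reported in the hyper-parameter study.

```python
import math


def class_weights(labels, num_classes=2):
    """Weight of class c: total number of pixels over pixels in class c."""
    total = len(labels)
    counts = [labels.count(c) for c in range(num_classes)]
    return [total / n if n else 0.0 for n in counts]


def balanced_ce(probs, labels, weights, eps=1e-7):
    """Cross-entropy in which each pixel is weighted by its class weight."""
    loss = 0.0
    for p, y in zip(probs, labels):
        loss -= weights[y] * math.log(max(p[y], eps))
    return loss / len(labels)


def inverse_ce(probs, labels, log_zero=-4.0):
    """Cross-entropy with prediction and one-hot label swapped;
    log(0) is replaced by the constant log_zero (an assumption)."""
    loss = 0.0
    for p, y in zip(probs, labels):
        for c, pc in enumerate(p):
            log_q = 0.0 if c == y else log_zero
            loss -= pc * log_q
    return loss / len(labels)


def combined_loss(probs, labels, lam1=0.3, lam2=0.5):
    """Weighted sum of the balanced and inverse cross-entropy terms."""
    w = class_weights(labels, num_classes=len(probs[0]))
    return lam1 * balanced_ce(probs, labels, w) + lam2 * inverse_ce(probs, labels)


# Four pixels, one of them infected (class 1).
labels = [0, 0, 0, 1]
good = [[0.9, 0.1], [0.8, 0.2], [0.9, 0.1], [0.2, 0.8]]
all_uninfected = [[0.9, 0.1]] * 4
loss_good = combined_loss(good, labels)
loss_bad = combined_loss(all_uninfected, labels)
```

In this toy case the rare infected pixel receives a weight three times larger than an uninfected one, so a model that labels everything as uninfected is penalized heavily even though it is right on 75% of the pixels.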

V. EXPERIMENT
In this section, we explain the experiments carried out to evaluate the performance of our proposed approach under different simulation scenarios. We start by describing the datasets and the experimental environment of the simulation. Subsequently, we perform a quantitative study of the hyper-parameters to show their effects on model learning. Finally, we present the performance metrics used to assess the efficiency and robustness of our approach.

A. Datasets
In this work, we have applied our method on two datasets.
Dataset-1: this dataset contains two versions [21]. The first version was published on April 2, 2020, and comprises 100 CT images (of 40 COVID-19 patients) of size 512 × 512 labeled by radiologists, who defined three tags: pleural effusion in frosted glass (= 3), consolidation (= 2), and mask value (= 1). The second version was released on April 14, 2020, and comprises 829 CT images (of 9 COVID-19 patients) of size 630 × 630. Radiologists tagged 373 of these images as showing COVID-19 pulmonary symptoms and the rest as normal cases.
Dataset-2: this dataset is publicly available and contains 20 CT volumes with more than 1800 slices collected from 40 different COVID patients [22]. Each CT slice is of size 512 × 512 and is labeled by expert radiologists to mark the infected regions.
The first column of Fig. 2 shows the original images, while the second column shows their corresponding ground-truth masks. The first and second rows show two COVID-19 images, where the COVID-19 regions are the white and gray regions of the ground-truth masks and the healthy regions are the black pixels (note that the ground-truth mask of a healthy person is completely black). Two examples of images of healthy people are shown in the third and fourth rows. In our simulations, we merged the two versions of Dataset-1 to form a new dataset, keeping 60% of these images for training, 20% for validation, and 20% for testing. We implemented our method in the Keras environment with the TensorFlow back-end using the Python 2.7 programming language. Simulations were run on an infrastructure equipped with a Tesla P-100 GPU card and 16 GB of RAM.
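The 60/20/20 partition of the merged Dataset-1 (100 + 829 = 929 images) can be sketched as follows. The sketch assumes a random split at the image level with a fixed seed; whether the split was made per image or per patient is not stated here, and the function name `split_dataset` is hypothetical.

```python
import random


def split_dataset(items, train=0.6, val=0.2, seed=0):
    """Shuffle the items and cut them into train/validation/test parts.

    The test part receives whatever remains after train and validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train * len(shuffled)))
    n_val = int(round(val * len(shuffled)))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


# Identifiers of the 929 images obtained by merging both versions of Dataset-1.
image_ids = list(range(929))
train_ids, val_ids, test_ids = split_dataset(image_ids)
```

Fixing the seed keeps the three subsets identical across runs, which matters when hyper-parameter settings are compared on the same validation images.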

B. Hyper-parameters Setting
In deep learning, the hyper-parameters of a deep neural network crucially influence its performance. In this part, we carried out several experiments to choose the hyper-parameter values that improve the performance of our method. In this regard, we fixed the number of epochs and the batch size at 100 and 64, respectively. We simulated and compared the accuracy values obtained with different optimizers, different learning rates, and different values of the weights λ1 and λ2 of the loss function. From Table III, we observe that the hyper-parameter values λ1 = 0.3 and λ2 = 0.5, together with the ADAM [23] optimizer with a learning rate equal to 0.000001, showed the best performance in terms of accuracy.

C. Performance Metrics Evaluation
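To assess the efficiency and robustness of our proposed approach, we use the following performance metrics: Accuracy [24], Sensitivity [25], Matthews Correlation Coefficient (MCC) [25], and Dice [26]. By definition, higher values of these metrics imply a better segmentation quality. The mathematical formulas of these metrics are respectively expressed below, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

$$\text{Dice} = \frac{2\,TP}{2\,TP + FP + FN}$$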
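These four metrics can be computed from the pixel-wise confusion counts, as in the following plain-Python sketch. It assumes binary masks flattened to lists of 0/1 labels and returns 0.0 whenever a denominator is zero, which is a convention chosen here rather than one taken from this work; the function names are hypothetical.

```python
import math


def confusion_counts(pred, truth):
    """Count true/false positives and negatives over paired binary labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, tn, fp, fn


def segmentation_metrics(tp, tn, fp, fn):
    """Accuracy, Sensitivity, MCC, and Dice from the confusion counts."""
    total = tp + tn + fp + fn
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
        "dice": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0,
    }


# Eight pixels of a flattened mask: 1 = infected, 0 = uninfected.
truth = [1, 1, 0, 0, 0, 0, 1, 0]
pred = [1, 0, 0, 0, 1, 0, 1, 0]
counts = confusion_counts(pred, truth)
scores = segmentation_metrics(*counts)
```

On a skewed mask, Accuracy is dominated by the many uninfected pixels, whereas Dice and MCC react to errors on the small infected region, which is why they are reported together.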

VI. RESULTS
In this section, we evaluate the effectiveness of the proposed method by performing qualitative and quantitative studies on an open-source benchmark. The obtained results of our method are compared with those of existing methods in the state of the art.

A. Ablation Study
To show the importance of each component of our approach, we carried out an ablation study on Dataset-1 by evaluating the following performance metrics: Accuracy, Dice, Sensitivity, and Precision. The ablation study was subdivided into three cases. In the first case, we added the preprocessing block without using the proposed loss function during the learning phase, whereas in the second case, we introduced the loss function in the learning phase without using the preprocessing block. In the third case, we used both the preprocessing block and the loss function (see Fig. 3). From these simulation cases, we notice that the data preprocessing step has a remarkable effect on the performance of the model. The first case shows the overfitting that occurred beyond epoch 40, which led to a degradation in the performance of the model (see Fig. 4). The results of this study are given in Table IV. On the other hand, the result of the third case shows that using the loss function together with the preprocessing block helps to avoid the overfitting problem and, consequently, to improve the performance of the model. Fig. 5 and Fig. 6 show the qualitative results of our method on different test samples: the first column (1) shows the lung images, the second column (2) shows the ground-truth masks, and the last column (3) shows the predicted COVID-19 lung infection masks. These qualitative results show the robustness and the efficiency of our method for diagnosing the regions infected by COVID-19.

B. Comparison with Baseline Methods
In this part, we compare our CT image segmentation method with reference segmentation methods, namely the basic U-Net [10], DenseUNet [27], Attention U-Net [13], and UNet++ [11].

C. Comparison with other Methods
Many studies have been carried out to diagnose COVID-19. To assess the robustness of our method, we carried out a comparative study with different approaches such as Inf-Net [14] and Automatic [17]. We simulated these approaches using their open-source implementations. The quantitative results obtained by the different methods are shown in Table VI. Dice, Sensitivity, and Accuracy are the performance metrics evaluated in this benchmarking study. From Table VI, we notice that the CovSeg-Unet method reaches an Accuracy of 0.991 on Dataset-1 and an Accuracy of 0.983 on Dataset-2. The values obtained in these simulations show that the CovSeg-Unet method outperforms the other approaches in terms of Dice, Sensitivity, and Accuracy.

VII. CONCLUSION
In this work, we address the difficult task of segmenting limited and unbalanced biomedical images. To cope with this task, we have proposed an end-to-end architecture similar to U-Net; the proposed network learns the discriminating features of lung infections from CT images to establish an image-to-image mapping relationship. We used the ResNet50 architecture to preserve local information and avoid the issue of vanishing gradients. To improve the learning of the discriminating features of the network, we introduced a preprocessing block that removes noise and unnecessary information, which improves the performance of the network. In order to make the model robust when learning from unbalanced data, we have proposed a dedicated loss function. Experimental results on two datasets demonstrated the effectiveness of the CovSeg-Unet method in locating COVID-19 infected regions. The quantitative and qualitative results obtained by comparing the CovSeg-Unet method with the other methods show the efficiency of our method, which can be a practical solution to detect, diagnose, and locate regions infected with COVID-19.