Classification of Malignant and Benign Lung Nodule and Prediction of Image Label Class using Multi-Deep Model

Lung cancer is one of the world's leading causes of death, and early diagnosis of lung nodules is of great significance for its prevention. Despite major improvements in modern diagnosis and treatment, the five-year survival rate is only 18%. Classification of lung nodules is an important step before diagnosis, in particular because automatic classification can give doctors a valuable opinion. Although deep learning has improved image classification over traditional approaches that rely on handcrafted features, classifying lung nodules remains challenging because of the large intra-class variation and inter-class similarity of images across imaging modalities. In this paper, a multi-deep model (MD model) is proposed to classify lung nodules and to predict the image label class. The model comprises three parts: multi-scale dilated convolutional blocks (MsDc), dual deep convolutional neural networks (DCNN A/B), and a multi-task learning component (MTLc). First, multi-scale features are derived by the MsDc process, which uses different dilation rates to enlarge the receptive area. This technique is applied to a pair of images. The images are then fed to the dual DCNNs, which learn mutually from each other to enhance model accuracy. To further improve performance, the output of each DCNN is split into two portions. The multi-task learning part evaluates whether the input image pair belongs to the same group and also helps to classify the images as benign or malignant. Furthermore, it can provide positive guidance if there is an error. Both the intra-class variation and the inter-class similarity within the dataset are exploited to improve on the efficiency of a single DCNN. The effectiveness of the technique is tested empirically on the popular Lung Image Database Consortium (LIDC) dataset.
The results show that the approach is effective, achieving a sensitivity of 90.67%, a specificity of 90.80%, and an accuracy of 90.73%.

Keywords—Lung nodule classification; dilated blocks; dual DCNNs; multi-task learning; multi-deep model


I. INTRODUCTION
Lung cancer is the world's most prevalent and deadly type of cancer. One cause of its high mortality is the failure to diagnose the disease in its early stages, because symptoms usually appear only in the final stages [1]. In 2018, an estimated 154,000 deaths from lung cancer were recorded in the USA, about one quarter of all cancer deaths. In developed countries, the five-year survival rate of lung cancer patients is about 16% [2]. Detecting lung cancer in its initial phase is difficult because early lesions are only dime-sized. Such small lesions can be detected only by computed tomography (CT) scans, and it takes radiologists considerable effort to detect them and label them as benign or malignant. Computer-aided diagnosis (CAD) systems reduce the radiologist's burden by acting as a second opinion. Lung nodule classification remains a challenging task in computer-aided diagnosis.
Deep learning, a rebranding of neural networks, is now considered one of the best approaches to many hard problems in computer vision and pattern recognition, as well as in areas such as medical diagnostics and natural language processing.
Although deep learning techniques achieve state-of-the-art performance in various medical imaging tasks, many challenges remain, for instance small medical datasets and biological variation. In [16,17], pre-trained DCNN architectures, which learn robust representations from large-scale datasets such as ImageNet, were used to address visual recognition problems with generally small amounts of data.
The main difficulty in classifying lung nodules is inter-class ambiguity and intra-class variation [18], which make it hard to differentiate benign lesions from malignant ones across modalities. Fig. 1 gives a clear example of this difficulty: there are large differences among (a) the benign lesions and among (b) the malignant lesions, whereas benign and malignant lesions resemble each other in both color and shape. Although many neural networks are powerful enough to memorize all the training samples [19], the ambiguity created by intra-class variation and inter-class similarity can confuse the networks and lead to misclassification.
*Corresponding Author www.ijacsa.thesai.org
To address these challenges, this paper uses a multi-deep model for the classification of lung nodules in the LIDC dataset. The model consists of multi-scale dilated convolutional layer blocks, dual DCNNs, and a multi-task learning component. The multi-scale method is applied to the LIDC dataset, which helps to view and analyze the data at different scales. It works as a speckle noise filter by eliminating the high-frequency component at every scale where no edges are detected. Moreover, the dual DCNNs are initialized with parameters pre-trained on the ImageNet dataset [20], and these parameters are fine-tuned on our dataset. The two DCNNs simultaneously learn image representations from pairs of images, including pairs of similar images from different categories and pairs of dissimilar images from the same category. The multi-task learning component is applied to the input pair of images processed by the multi-scale dilated convolutional layers (MsDc): it predicts whether the two images belong to the same class and also helps to classify them as malignant or benign.
In addition, the multi-deep model is easily trained end-to-end under classification supervision. During the test stage, the prediction probabilities of the two networks for each sample are combined into the probability for a joint decision. The MD model is tested on the LIDC dataset, and the results show that the proposed model achieves state-of-the-art performance on the lung nodule classification problem.

II. RELATED WORK
Deep learning now helps to improve image classification accuracy in many medical fields. Over the last few years, a number of solutions to image classification problems have been published [3, 4, 5, 6, 7]. These solutions consist of handcrafted feature extraction followed by classifier learning. Designing such handcrafted features for a single classification task is very difficult, so deep learning models are used to achieve superior classification and to minimize the need for manual feature design. They also enhance the performance of medical image detection, classification [8,9], and segmentation [10,11]. For instance, Jung H et al. [12] adopt a 3D-DCNN with dense and shortcut connections to capture 3D features for lung nodule classification on the LUNA16 dataset. Xu Y et al. [13] use a deep convolutional neural network to minimize manual annotation and produce superior feature representations for the classification and segmentation of colorectal cancer in cellular pathology. In machine learning, multi-task learning techniques are used in many applications, such as drug detection [21], speech recognition [22], natural language processing [23], and spatio-temporal event forecasting [28]. Li X et al. [14] proposed a multi-task learning framework that captures the heterogeneity of lung nodules by extracting discriminative features from alternatingly stacked layers of a convolutional neural network (CNN).
To improve the final result, they train the CNN in a multi-task learning setup that shares information among nine different nodule features at the same time. Wu Z et al. [15] developed a multi-scale convolutional neural network (CNN) based on 3D context fusion, named M3DCF, for extracting lesion surfaces from CT scans. Qingyuan [24] used an improved multi-scale algorithm for image enhancement, applying a Canny operator to divide the image into edge and non-edge regions. Wang Y et al. [25] introduced a succinct and powerful multi-scale dilated convolutional method that uses dilated filters to integrate multi-scale contextual information efficiently without reducing the receptive field. The approach builds on the observation that dilated convolution can efficiently extend the receptive field while retaining useful contextual information. They also use residual learning to improve the training process. Another dilated convolutional approach is used by Wei Y [26], who employed multiple dilated convolutional blocks with different dilation rates to create dense localization maps of objects for weakly supervised and semi-supervised semantic segmentation networks. It has also been found that dilated convolution [27], by increasing the receptive field size of the kernels, offers a promising solution: localization maps generated at different dilation rates enhance the discriminative ability.

A. Multi-Deep Model
The proposed multi-deep (MD) model contains three main modules: the input block, the dual-DCNN unit, and the multi-task learning component, as shown in Fig. 2. The MD architecture takes a pair of images that are arbitrarily selected from the training data. The dual DCNN consists of DCNN-A and DCNN-B, the two main sequence learning modules. DCNN-A is initialized from a pre-trained VGG16 network and DCNN-B from a pre-trained ResNet50, and both are fine-tuned under the supervision of the true labels of their inputs. The input block contains a pair of images, and the multi-scale dilated convolutional layers (MsDc) strategy is applied to both images of the pair. The two images then enter DCNN-A and DCNN-B individually. The multi-task learning component not only predicts whether the two images belong to the same class, but also classifies them as benign or malignant. Furthermore, this component helps to provide positive guidance if both DCNNs make a synergic error.

B. Input Block and Multi-scale Dilated Convolutional Layer
Unlike traditional DCNNs, the proposed model takes as input a pair of MsDc-processed images randomly selected from the training data. Each image has its own class label and is passed to one of the DCNNs. To unify the image size, each image is resized to 224 × 224 × 1 using bicubic interpolation. The input block thus receives a pair of grayscale images, and the MsDc strategy is applied to both before they enter DCNN (A/B). Dilated convolutions at three different scales yield three single-channel images with different receptive fields, which are concatenated into one three-channel image that serves as the input of DCNN (A/B), as presented in Fig. 3. This approach uses dilated filters of different scales to integrate multi-scale contextual information systematically by extending the receptive area of the convolutional layers. It builds on the observation that dilated convolution can effectively extend the receptive area while maintaining useful contextual information. The MsDc is composed of different dilation rates, which lead to different receptive fields within the input images. Dilation rates of 3, 5, and 7 are used in MsDc. Fig. 4 shows how dilation allows the transfer of information. For the classifier to recognize the image as "Lung cancer", the region inside the green circle is the most discriminative. To learn the corresponding feature representation at the area marked by the red circle, a 3 × 3 convolutional kernel is used.
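The effect of the dilation rate on the receptive area can be illustrated with a small sketch. The code below is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names (`effective_kernel_size`, `dilated_conv2d`, `msdc_stack`), the zero padding, the averaging kernel, and the toy 16 × 16 image are all hypothetical choices. Only the 3 × 3 kernel and the dilation rates 3, 5, and 7 come from the text.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span in pixels covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv2d(image, kernel, dilation):
    """Same-size 2D dilated convolution (cross-correlation), zero padded."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    y = i + (a - half) * dilation
                    x = j + (b - half) * dilation
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[a][b] * image[y][x]
            out[i][j] = acc
    return out


def msdc_stack(image, kernel, rates=(3, 5, 7)):
    """Return one single-channel map per dilation rate (channel-first)."""
    return [dilated_conv2d(image, kernel, r) for r in rates]


# Toy 16 x 16 grayscale image (the paper uses 224 x 224 inputs).
image = [[float((r * 16 + c) % 7) for c in range(16)] for r in range(16)]
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]  # assumed averaging kernel

channels = msdc_stack(image, kernel)  # three maps, stacked as channels
spans = [effective_kernel_size(3, r) for r in (3, 5, 7)]
```

With a 3 × 3 kernel, the three rates cover spans of 7, 11, and 15 pixels per axis while each still uses only nine weights, which is the sense in which dilation enlarges the receptive area without adding parameters.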

C. The Dual Deep Convolutional Neural Network
The dual DCNN is an essential component of the proposed model and comprises two full training units, DCNN-A and DCNN-B. Any DCNN architecture, such as GoogLeNet or AlexNet, can be used as a DCNN part in the proposed method. Both DCNNs are trained using the image sequence X = {x(1), x(2), …, x(M)} and the corresponding class labels Y = {y(1), y(2), …, y(M)}. For DCNN-A, a pre-trained VGG-16 model is used, which has thirteen convolutional layers and three fully connected (FC) layers. A pre-trained ResNet-50 is adopted to initialize DCNN-B because of the high representation capability of the popular residual network. It is composed of 50 learnable layers and was also trained for classification on the ImageNet dataset. To adapt these models to our dataset, the fully connected (FC) layers are replaced with FC layers of 1024 neurons for DCNN (A/B), and the ResNet-50 and VGG16 parameters are then fine-tuned on our own training data. During optimization, the parameters of DCNN (A/B), denoted by θA and θB, are not shared between the two networks. The weights of the new FC layers are initialized from the uniform distribution U(-0.05, 0.05). Each DCNN is trained with the cross-entropy loss computed over the Q training samples, where Q is the number of training data. The Adam optimizer (mini-batch Adam) is used to optimize θ. Both DCNNs accept input from a pair of images, and each learning unit is supervised during training by the true labels of its input sequence. Although each DCNN can by itself determine the label class of an input image, a multi-task learning component is introduced that breaks the learning independence of the dual DCNNs by integrating the activations from the last two FC layers of both DCNNs.
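The cross-entropy loss used to supervise each DCNN can be sketched as follows. This is the standard softmax cross-entropy averaged over Q samples, written in plain Python for illustration. The averaging, the function names, and the toy logits are assumptions, since the text does not give the exact formula.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_logits, labels):
    """Mean negative log-likelihood of the true class over Q samples."""
    q = len(labels)
    loss = 0.0
    for logits, y in zip(batch_logits, labels):
        loss -= math.log(softmax(logits)[y])
    return loss / q


# Hypothetical logits for Q = 3 nodules; class 0 = benign, 1 = malignant.
logits = [[2.0, 0.5], [0.1, 1.7], [0.0, 0.0]]
labels = [0, 1, 0]
loss = cross_entropy(logits, labels)
```

In the model described above, one such loss would be computed for DCNN-A with parameters θA and another for DCNN-B with parameters θB, each minimized with mini-batch Adam.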

D. Multi-task Learning Component
To further supervise the training of each DCNN on each pair of images, a multi-task learning component (MTLc) is designed, which consists of fully connected layers and a vector concatenation layer, as shown in Fig. 5. Let the DCNN components (A/B) receive an image pair (xc, xd), respectively. The output of the second-to-last FC layer of each DCNN, obtained by forward computation, is taken as the deep image feature learned by that DCNN. Image pairs are selected arbitrarily from the training data, and the label of a pair (xc, xd) is denoted by S, with S = 1 if yc = yd and S = 0 otherwise, where xc and xd are lung nodule images and yc and yd are their true labels (benign or malignant). The proportion of positive pairs (S = 1) drawn from the LIDC dataset is kept at approximately 45%-55% to avoid a data imbalance problem. The outputs of DCNN A/B, two 1024-dimensional FC vectors, are copied and split into two pieces for the multi-task learning part. The aim of multi-task learning is to train the tasks jointly, leveraging the dependencies between them so that one task improves the performance of the other. The softmax function is used as the non-linear activation of the final prediction layer, and the ReLU function is used as the non-linear activation of the other fully connected layers. In each multi-task component, the hidden layers have 1000 neurons to handle the more complex problem. The MTLc is designed for two prediction tasks: the first distinguishes malignant from benign tumors, while the second concatenates the vector with the one from the other DCNN to predict the image label class. The resulting signal is supervised with the binary cross-entropy loss of the MTLc.
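The pairing step described above can be sketched in a few lines. The snippet is a minimal illustration, not the authors' code. It assumes that the pair label S is 1 when both images share a class and 0 otherwise, and that the features of the two DCNNs are joined by plain concatenation of the full 1024-dimensional vectors; the text does not state exactly how the vectors are split. The helper names `pair_label`, `sample_pairs`, and `concat_features` are hypothetical.

```python
import random


def pair_label(y_c, y_d):
    """S = 1 when both images share a class, else 0 (assumed definition)."""
    return 1 if y_c == y_d else 0


def sample_pairs(labels, n_pairs, seed=0):
    """Randomly draw index pairs and attach (y_c, y_d, S) to each."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        c, d = rng.sample(range(len(labels)), 2)
        y_c, y_d = labels[c], labels[d]
        pairs.append((c, d, y_c, y_d, pair_label(y_c, y_d)))
    return pairs


def concat_features(f_a, f_b):
    """Concatenate the FC feature vectors from DCNN-A and DCNN-B."""
    return list(f_a) + list(f_b)


labels = [0, 1, 1, 0, 1, 0]  # toy labels: 0 = benign, 1 = malignant
pairs = sample_pairs(labels, n_pairs=4)
joint = concat_features([0.0] * 1024, [1.0] * 1024)  # 2048 values

# Fraction of positive pairs; the paper keeps this near 45%-55%.
positive_fraction = sum(p[4] for p in pairs) / len(pairs)
```

In practice the sampler would be adjusted (for example by rejection sampling) until `positive_fraction` falls in the reported range; that balancing step is omitted here.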

IV. EXPERIMENT
This section details the implementation and analysis of the proposed model and compares the experimental results. Sections A and B describe the dataset used in the experiments and how the hyperparameters are set. Section C presents the experiments on the LIDC dataset: the impact of λ is tested for different values, the classification accuracy of the proposed model is compared with baseline networks, and the MD model is compared with several recent effective methods.

A. Dataset
The LIDC dataset consists of 1018 CT scans of patients with pulmonary cancer, and these CT images are divided into five categories. Detailed information for each nodule (diameter, coordinates, malignancy, texture, etc.) is annotated by four professional radiologists. The diameter of the nodules is between 3 mm and 30 mm. Because the scans differ in resolution, spline interpolation is used to resample them to a spacing of 1 mm × 1 mm. This study focuses on the malignancy of the nodules. Each nodule has a malignancy rating from 1 to 5 annotated by the radiologists, and a voting strategy is used for the final decision. If the nodule is rated above 3 by more than two radiologists, it is considered malignant; otherwise, it is deemed benign. Nodules with tied votes are rejected. There are approximately 195 malignant nodules and 158 benign nodules. The central slice of each nodule volume is extracted to reduce computational complexity. To alleviate overfitting of the deep learning model, data augmentation (DA) is applied to enlarge the dataset by adding variants. In particular, the images are randomly flipped and zoomed by a factor of 0.2. The translation step is selected from [-6, 6] voxels, and the rotation angle is selected randomly from 90°, 180°, and 270°. After augmentation, there are 1956 malignant nodules and 1862 benign nodules.
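The voting rule for the malignancy label can be made concrete with a short sketch. The text leaves some details open, so the code encodes one possible reading as an assumption: each radiologist casts a malignant vote when the rating is above 3 and a benign vote otherwise, the majority decides, and ties are rejected. With four radiologists this matches the "more than two" condition. The function name `label_nodule` is hypothetical.

```python
def label_nodule(ratings):
    """Majority vote over malignancy ratings (1 to 5); None means rejected.

    Assumed reading of the rule: rating > 3 is a malignant vote,
    any other rating is a benign vote, and tied votes are rejected.
    """
    malignant_votes = sum(1 for r in ratings if r > 3)
    benign_votes = len(ratings) - malignant_votes
    if malignant_votes == benign_votes:
        return None  # same number of votes: nodule is rejected
    return "malignant" if malignant_votes > benign_votes else "benign"


# Toy ratings from four radiologists (not taken from the LIDC dataset).
examples = {
    "nodule_a": [4, 5, 4, 2],  # three malignant votes
    "nodule_b": [1, 2, 3, 4],  # one malignant vote
    "nodule_c": [4, 4, 2, 2],  # two against two
}
decisions = {name: label_nodule(r) for name, r in examples.items()}
```

Other readings are possible, for example treating a rating of exactly 3 as an abstention, and would change which nodules are rejected.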

B. Hyperparameter Setting
The number of iteration steps is set to 12000. The learning rate follows an exponential decay schedule with an initial learning rate r = 0.0002 and a decay rate of 0.95. Mini-batch Adam is adopted as the optimizer with a batch size of 64. To stop training when the model starts to overfit, 20% of the training data are randomly selected to form a validation set, which is used to monitor the performance of the model. The model that distinguishes benign from malignant lung nodules is implemented in Keras with TensorFlow as the backend. Keras supports Python versions 2.7-3.6 and offers strong modularity, simplicity, and scalability.
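The learning-rate schedule described above can be sketched as a one-line formula. The initial rate 0.0002, the decay rate 0.95, and the 12000 steps come from the text. The decay interval `decay_steps=1000` is a hypothetical value, since the text does not state how often the decay is applied, and the continuous (non-staircase) form is likewise an assumption.

```python
def exponential_decay(step, initial_lr=0.0002, decay_rate=0.95, decay_steps=1000):
    """Learning rate after `step` iterations under exponential decay."""
    return initial_lr * decay_rate ** (step / decay_steps)


# decay_steps=1000 is an assumed value; the paper does not specify it.
schedule = {step: exponential_decay(step) for step in (0, 1000, 6000, 12000)}
for step, lr in schedule.items():
    print(f"step {step:>5}: lr = {lr:.6e}")
```

Under these assumptions the rate is multiplied by 0.95 every 1000 steps, so it shrinks by a factor of 0.95 to the power 12 over the full run.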

C. Experimental Result and Analysis
To evaluate the MD model on the LIDC dataset, its performance is first assessed for different values of the hyperparameter λ. The results in Fig. 6 indicate that the model performs best with λ = 15. The classification accuracy of the proposed MD model is then compared, under the same experimental setting, against the VGG-16 and ResNet-50 models, and the MD model is also evaluated without the multi-scale dilated convolutional layer. The classification accuracies of these models in each DCNN (A/B) are displayed in Fig. 7, which shows that the proposed model attains the highest accuracy of 90.73% on the validation set. This indicates that each DCNN component of the designed model, itself a VGG-16 or ResNet-50, improves in accuracy by more than 3% relative to the standalone VGG-16 and ResNet-50 baselines as a result of integrating the multi-task learning component into the dual-DCNN architecture.
To assess the designed model against related work, it is also compared with other effective methods with recent good results, as shown in Table I. Kumar et al. [29,30] introduced a new ensemble system for the classification of medical images. The ensemble uses several advanced CNNs as tailored image feature extractors that capture the different information in medical images of different types. Zhang J et al. [31] proposed a dual deep convolutional neural network equipped with a synergic signal network, which learns the image representation jointly and verifies whether a pair of images belongs to the same category. Yuhai et al. [32] used an ensemble of 5 pre-trained VGG and ResNet-50 models and 5 fully trained DCNN models on an augmented ImageCLEF-2013 dataset, improving considerably on the baseline ResNet-50 model.
A hierarchical learning mechanism called the Multi-scale Convolutional Neural Network (MCNN) [33] is used to capture nodule heterogeneity by extracting discriminative features from alternatingly stacked layers. The architecture uses multi-scale nodule patches to learn a set of class-specific features simultaneously by concatenating the response neuron activations obtained at the last layer from each input scale. In [34], Haralick, Gabor, and local binary pattern (LBP) features are extracted and an SVM classifier is applied to classify lung nodules, achieving 89.5% sensitivity and 86.02% specificity. Dhara et al. [35] relied on edge, shape, and texture features to classify nodules as benign or malignant.
However, a handcrafted feature set may not accurately describe the differences between the various types of pulmonary nodules. Devinder Kumar et al. [36] proposed a CAD method that uses deep features extracted from an autoencoder to classify pulmonary nodules as malignant or benign. To analyze nine semantic features of lung nodules in CT images, Chen et al. [37] used three multi-task learning (MTL) schemes that exploit various computational features derived from deep learning models, namely a convolutional neural network (CNN) and a stacked denoising autoencoder.
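The sensitivity, specificity, and accuracy figures quoted in this comparison follow the usual confusion-matrix definitions, which the sketch below spells out. It treats malignant as the positive class, which is an assumption, and the counts are toy values chosen for illustration; they are not the paper's results.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of malignant nodules detected
    specificity = tn / (tn + fp)  # share of benign nodules recognized
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Toy counts for illustration only; they are not the paper's results.
sens, spec, acc = classification_metrics(tp=90, fn=10, tn=80, fp=20)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

Reporting all three values matters here because the augmented dataset is only roughly balanced, so accuracy alone could hide a bias toward one class.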

V. CONCLUSION
In this paper, a multi-deep model is proposed for lung nodule classification to address the problem posed by intra-class variation and inter-class similarity. First, the MsDc method is applied to a pair of images, which helps to increase the performance of lung nodule classification. The model then uses dual DCNNs with a multi-task learning component that allows the two DCNNs to learn from one another. This improves the model's ability to handle inter-class samples that are easily confused and the marked diversity of intra-class samples. The experimental results on the LIDC-IDRI dataset demonstrate that the proposed model attains state-of-the-art performance on the lung nodule classification problem.