Lung Cancer Detection and Classification with 3D Convolutional Neural Network (3D-CNN)

This paper demonstrates a computer-aided diagnosis (CAD) system for lung cancer classification of CT scans with unmarked nodules, a dataset from the Kaggle Data Science Bowl 2017. Thresholding was used as the initial segmentation approach to segment out lung tissue from the rest of each CT scan; of the approaches evaluated, it produced the best lung segmentation that was computationally feasible at scale. The initial approach of feeding the segmented CT scans directly into 3D CNNs for classification proved inadequate. Instead, a modified U-Net trained on LUNA16 data (CT scans with labeled nodules) was first used to detect nodule candidates in the Kaggle CT scans. Because the U-Net nodule detection produced many false positives, regions of the segmented lungs containing the most likely nodule candidates, as determined by the U-Net output, were fed into 3D Convolutional Neural Networks (CNNs) to ultimately classify each CT scan as positive or negative for lung cancer. The 3D CNNs produced a test-set accuracy of 86.6%. Our CAD system outperforms current CAD systems in the literature, which have several training and testing phases that each require a large amount of labeled data, while our CAD system has only three major phases (segmentation, nodule candidate detection, and malignancy classification), allowing more efficient training and detection and greater generalizability to other cancers.

Keywords—Lung Cancer; Computed Tomography; Deep Learning; Convolutional Neural Networks; Segmentation.


I. INTRODUCTION
Lung cancer is one of the most common cancers, accounting for over 225,000 cases, 150,000 deaths, and $12 billion in health care costs yearly in the U.S. [1]. It is also one of the deadliest cancers; overall, only 17% of people in the U.S. diagnosed with lung cancer survive five years after the diagnosis, and the survival rate is lower in developing countries. The stage of a cancer refers to how extensively it has metastasized. Stages 1 and 2 refer to cancers localized to the lungs, and later stages refer to cancers that have spread to other organs. Current diagnostic methods include biopsies and imaging, such as CT scans. Early detection of lung cancer (detection during the earlier stages) significantly improves the chances of survival, but it is also more difficult to detect early stages of lung cancer because there are fewer symptoms [1].
Our task is a binary classification problem: detecting the presence of lung cancer in patient CT scans of lungs with and without early-stage lung cancer. We aim to use methods from computer vision and deep learning, particularly 2D and 3D convolutional neural networks, to build an accurate classifier. An accurate lung cancer classifier could speed up and reduce the cost of lung cancer screening, allowing for more widespread early detection and improved survival. The goal is to construct a computer-aided diagnosis (CAD) system that takes patient chest CT scans as input and outputs whether or not the patient has lung cancer [2].
Though this task seems straightforward, it is actually a needle-in-a-haystack problem. In order to determine whether or not a patient has early-stage cancer, the CAD system must detect the presence of a tiny nodule (< 10 mm in diameter for early-stage cancers) in a large 3D lung CT scan (typically around 200 mm × 400 mm × 400 mm). An example of an early-stage lung cancer nodule within a 2D slice of a CT scan is given in Fig. 1. Furthermore, a CT scan is filled with noise from surrounding tissue, bone, and air, so for the CAD system's search to be efficient, this noise must first be removed in preprocessing. Hence our classification pipeline consists of image preprocessing, nodule candidate detection, and malignancy classification.
In this paper, we apply extensive preprocessing techniques to accurately isolate the nodules and thereby enhance the accuracy of lung cancer detection. Moreover, we train the CNNs end-to-end from scratch in order to realize the full potential of the neural network, i.e., to learn discriminative features. Extensive experimental evaluations are performed on a dataset comprising lung nodules from more than 1390 low-dose CT scans.

The paper is organized as follows: related work is summarized briefly in Section II. The dataset used in this paper is described in Section III. The methods for segmentation are presented in Section IV. Nodule segmentation based on the U-Net architecture is introduced in Section V. Section VI presents the 3D convolutional neural networks for nodule classification and patient classification. Our discussion and results are described in detail in Section VII. Section VIII concludes the paper.

II. RELATED WORK
Recently, deep artificial neural networks have been applied to many applications in pattern recognition and machine learning; convolutional neural networks (CNNs) are one such class of models [3]. An ensemble of CNNs applied to ImageNet classification in 2012 outperformed the best results then popular in the computer vision community [4]. There has also been recent research applying deep learning to medical imaging, with promising results. Suk et al. [5] suggested a new latent and shared feature representation of neuroimaging data of the brain using a Deep Boltzmann Machine (DBM) for AD/MCI diagnosis. Wu et al. [6] developed deep feature learning for deformable registration of brain MR images, using deep features to improve image registration. Xu et al. [7] demonstrated the effectiveness of using deep neural networks (DNNs) for feature extraction in medical image analysis as a supervised approach. Kumar et al. [8] proposed a CAD system that uses deep features extracted from an autoencoder to classify lung nodules as either malignant or benign on the LIDC database. In [9], Yaniv et al. presented a system for chest pathology detection in x-rays that uses convolutional neural networks learned from a non-medical archive. That work showed that a combination of deep learning (Decaf) and PiCodes features achieves the best performance, demonstrating the feasibility of detecting pathology in chest x-rays using deep learning approaches based on non-medical learning. The database used was composed of 93 images. They obtained an area under the curve (AUC) of 0.93 for right pleural effusion detection, 0.89 for enlarged heart detection, and 0.79 for classification between healthy and abnormal chest x-rays.
In [10], Sun W. et al. implemented three different deep learning algorithms, a Convolutional Neural Network (CNN), Deep Belief Networks (DBNs), and a Stacked Denoising Autoencoder (SDAE), and compared them with a traditional image-feature-based CAD system. The CNN architecture contains eight layers of alternating convolutional and pooling layers. For the traditional comparison algorithm, about 35 texture and morphological features were extracted and fed to a kernel-based support vector machine (SVM) for training and classification. The resulting accuracy for the CNN approach reached 0.7976, slightly higher than the traditional SVM at 0.7940. They used the Lung Image Database Consortium and Image Database Resource Initiative (LIDC/IDRI) public databases, with about 1018 lung cases.
In [11], J. Tan et al. designed a framework that detects lung nodules and then reduces the false positives among the detected nodules using a deep neural network and a convolutional neural network. The CNN has four convolutional layers and four pooling layers, with filters of depth 32 and sizes 3 and 5. The dataset was acquired from LIDC-IDRI and covered about 85 patients. The resulting sensitivity was 0.82, and the false positive reduction obtained by the DNN was 0.329.
In [12], R. Golan proposed a framework that trains the weights of a CNN by back propagation to detect lung nodules in CT image sub-volumes. This system achieved a sensitivity of 78.9% with 20 false positives (FPs) per scan, and 71.2% with 10 FPs per scan, on lung nodules annotated by all four radiologists. Convolutional neural networks have outperformed Deep Belief Networks in current studies on benchmark computer vision datasets, and CNNs have attracted considerable interest in machine learning in recent years owing to their strong ability to learn useful feature representations from input data.

III. DATA
Our primary dataset is the patient lung CT scan dataset from Kaggle's Data Science Bowl (DSB) 2017 [13]. The dataset contains labeled data for 1397 patients, which we divide into a training set of size 978 and a test set of size 419. For each patient, the data consist of CT scan data and a label (0 for no cancer, 1 for cancer). Note that the Kaggle dataset does not have labeled nodules. For each patient, the CT scan data consist of a variable number of images (typically around 100-400; each image is an axial slice) of 512 × 512 pixels. The slices are provided in DICOM format. Around 70% of the provided labels in the Kaggle dataset are 0, so we used a weighted loss function in our malignancy classifier to address this imbalance.
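As a concrete illustration of how such DICOM slices are turned into a single 3D volume, the numpy sketch below converts raw pixel values to Hounsfield units using the standard DICOM rescale tags (`RescaleSlope`/`RescaleIntercept`, which a reader such as pydicom would supply) and stacks the axial slices by their z position. The padding value of −2000 for out-of-scan pixels is a common scanner convention, assumed here for illustration.

```python
import numpy as np

def to_hounsfield(raw_pixels, slope, intercept, padding_value=-2000):
    """Convert raw DICOM pixel values to Hounsfield units (HU).

    DICOM stores raw detector values; HU = slope * raw + intercept,
    with slope/intercept taken from the RescaleSlope/RescaleIntercept
    tags. Pixels outside the scanner's field of view are often stored
    as a padding value and are mapped to air (-1000 HU).
    """
    hu = raw_pixels.astype(np.int16) * int(slope) + int(intercept)
    hu[raw_pixels == padding_value] = -1000  # out-of-scan padding -> air
    return hu

def stack_slices(slices, z_positions):
    """Sort the axial slices by z position and stack them into a 3D volume."""
    order = np.argsort(z_positions)
    return np.stack([slices[i] for i in order], axis=0)
```

With a real scan, each element of `slices` would be one 512 × 512 `pixel_array` and `z_positions` the slice locations read from the DICOM headers.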
Because the Kaggle dataset alone proved inadequate for accurately classifying the validation set, we also used the patient lung CT scan dataset with labeled nodules from the LUng Nodule Analysis 2016 (LUNA16) Challenge [14] to train a U-Net for lung nodule detection. The LUNA16 dataset contains labeled data for 888 patients, which we divided into a training set of size 710 and a validation set of size 178. For each patient, the data consist of CT scan data and a nodule label (a list of nodule center coordinates and diameters). As in the Kaggle dataset, the CT scan data for each patient consist of a variable number of 512 × 512 axial slices (typically around 100-400).
The LUNA16 data was used to train a U-Net for nodule detection, one of the phases in our classification pipeline. The problem is to accurately predict a patient's label ('cancer' or 'no cancer') based on the patient's Kaggle lung CT scan. We use accuracy, sensitivity, specificity, and the AUC of the ROC curve to evaluate our CAD system's performance on the Kaggle test set.
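These four evaluation metrics can all be computed from the predicted scores and true labels; a minimal numpy sketch is below. AUC is computed with the rank-statistic (Mann-Whitney) formulation, and the 0.5 decision threshold is an illustrative assumption rather than anything fixed by the paper.

```python
import numpy as np

def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity and ROC AUC for binary labels."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)

    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))

    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate

    # AUC = P(random positive scores above random negative),
    # computed from the ranks of the positive scores (midranks for ties).
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)
    for s in np.unique(y_score):
        tie = y_score == s
        ranks[tie] = ranks[tie].mean()
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    auc = (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return accuracy, sensitivity, specificity, auc
```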

IV. METHODS
Typical CAD systems for lung cancer have the following pipeline: image preprocessing, detection of cancerous nodule candidates, nodule candidate false positive reduction, malignancy prediction for each nodule candidate, and malignancy prediction for the overall CT scan [15]. These pipelines have many phases, each of which is computationally expensive and requires well-labeled data during training. For example, the false positive reduction phase requires a dataset of labeled true and false nodule candidates, and the nodule malignancy prediction phase requires a dataset with nodules labeled by malignancy. True/false labels for nodule candidates and malignancy labels for nodules are sparse for lung cancer, and may be nonexistent for some other cancers, so CAD systems that rely on such data do not generalize to other cancers. To achieve greater computational efficiency and generalizability to other cancers, the proposed CAD system has a shorter pipeline and requires only the following data during training: a dataset of CT scans with true nodules labeled, and a dataset of CT scans with an overall malignancy label. State-of-the-art CAD systems that predict malignancy from CT scans achieve an AUC of up to 0.83 [16]. However, as mentioned above, these systems take as input various labeled data that is not used in this framework; the main goal of the proposed system is to come close to this performance. The proposed CAD system starts by preprocessing the 3D CT scans using segmentation, normalization, downsampling, and zero-centering. The initial approach was simply to input the preprocessed 3D CT scans into 3D CNNs, but the results were poor, so an additional preprocessing step was introduced to input only regions of interest into the 3D CNNs. To identify regions of interest, a U-Net was trained for nodule candidate detection. Input regions around the nodule candidates detected by the U-Net were then fed into 3D CNNs to ultimately classify the CT scans as positive or negative for lung cancer. The overall architecture is shown in Fig. 2; all details of the layers are described in the following sections.

A. Preprocessing and Segmentation
For each patient, the pixel values in each image were first converted to Hounsfield units (HU), a measurement of radiodensity, and the 2D slices were stacked into a single 3D image. Because tumors form on lung tissue, segmentation is used to mask out the bone, outside air, and other substances that would make the data noisy, leaving only lung tissue information for the classifier. A number of segmentation approaches were tried, including thresholding, clustering (K-means and Mean-shift), and Watershed. K-means and Mean-shift allow very little supervision and did not produce good qualitative results. Watershed produced the best qualitative results but took too long to run to be used by the deadline. Ultimately, thresholding was used.
After segmentation, the 3D image is normalized by applying linear scaling to squeeze all pixels of the original unsegmented image into values between 0 and 1. Spline interpolation is then used to downsample each 3D image by a factor of 0.5 in each of the three dimensions. Finally, the data is zero-centered by subtracting the mean of all the training-set images.
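The three steps above can be sketched in a few numpy functions. The HU clipping bounds for the linear scaling (−1000 to 400) are an assumption of this sketch, since the paper does not state the exact bounds, and plain striding stands in for the paper's spline interpolation (which could be done with `scipy.ndimage.zoom(volume, 0.5)`) to keep the sketch dependency-free.

```python
import numpy as np

# Assumed HU bounds for the linear scaling to [0, 1].
MIN_HU, MAX_HU = -1000.0, 400.0

def normalize(volume_hu):
    """Linearly scale Hounsfield units into [0, 1], clipping outliers."""
    v = (volume_hu - MIN_HU) / (MAX_HU - MIN_HU)
    return np.clip(v, 0.0, 1.0)

def downsample_half(volume):
    """Halve each dimension. The paper uses spline interpolation
    (e.g. scipy.ndimage.zoom); plain striding is used here for brevity."""
    return volume[::2, ::2, ::2]

def zero_center(volume, train_mean):
    """Subtract the mean computed over the training set."""
    return volume - train_mean
```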

Thresholding
Typical radiodensities of various parts of a CT scan are shown in Table 1. Air is typically around −1000 HU, lung tissue is typically around −500 HU, water, blood, and other tissues are around 0 HU, and bone is typically around 700 HU, so we mask out pixels that are close to −1000 or above −320 to leave lung tissue as the only segment. The distribution of pixel Hounsfield units at various axial slices for a sample patient is shown in Figure 2. Pixels thresholded at 400 HU are shown in Figure 3, and the mask is shown in Figure 4. However, to account for the possibility that some cancerous growth could occur within the bronchioles (air pathways) inside the lung, which are shown in Figure 5, we choose to include this air to create the finalized mask, as shown in Figure 6.
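The thresholding rule above amounts to keeping voxels in the lung-tissue radiodensity band. A minimal sketch, in which the tolerance defining "close to −1000" (`air_tol`) is an assumption for illustration:

```python
import numpy as np

def lung_mask(volume_hu, lower=-1000, upper=-320, air_tol=50):
    """Binary lung-tissue mask by radiodensity thresholding.

    Keeps voxels strictly between 'close to -1000 HU' (outside air) and
    -320 HU (soft tissue / blood / bone), i.e. roughly the lung tissue
    band around -500 HU.
    """
    mask = (volume_hu > lower + air_tol) & (volume_hu < upper)
    return mask.astype(np.uint8)
```

The finalized mask described in the text would then be produced by re-adding the air inside the bronchioles to this initial segment.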

Watershed
The segmentation obtained from thresholding has a lot of noise: many voxels that were part of lung tissue, especially voxels at the edge of the lung, tended to fall outside the range of lung tissue radiodensity due to CT scan noise. This means that our classifier would not be able to correctly classify images in which cancerous nodules are located at the edge of the lung. To filter noise and include voxels from the edges, we use marker-driven Watershed segmentation, as described in Al-Tarawneh et al. [17]. An original 2D CT slice of a sample patient is given in Figure 7. The resulting 2D slice of the lung segmentation mask created by thresholding is shown in Figure 8, and the resulting 2D slice of the lung segmentation mask created by Watershed is shown in Figure 10. Qualitatively, Watershed produces a much better segmentation than thresholding, and missing voxels (black dots in Figure 8) are largely re-included. However, Watershed is much less efficient than basic thresholding, so due to time limitations we were unable to preprocess all CT scans using Watershed and used thresholding instead.

U-Net for Nodule Detection
We initially tried directly inputting the entire segmented lungs into malignancy classifiers, but the results were poor; the entire image was likely too large a search space. We therefore needed a way to input smaller regions of interest instead of the entire segmented 3D image. To identify such regions, we use a U-Net trained for nodule candidate detection. U-Net [10] is a 2D CNN architecture that is popular for biomedical image segmentation. We designed a stripped-down version of the U-Net to limit memory expense. A visualization of our U-Net architecture is included in Figure 11 and described in detail in Table 2. During training, our modified U-Net takes as input 256 × 256 2D CT slices together with their labels (a 256 × 256 mask in which nodule pixels are 1 and all other pixels are 0). The model is trained to output images of shape 256 × 256 in which each pixel holds a value between 0 and 1 indicating the probability that it belongs to a nodule; this is obtained by taking the slice corresponding to label one of the softmax of the final U-Net layer.
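The per-pixel training signal for such a probability map can be sketched as a weighted cross-entropy over the nodule-class output, with the nodule class up-weighted because background pixels dominate the mask. The specific weights `w_pos`/`w_neg` here are illustrative, not values from the paper.

```python
import numpy as np

def weighted_pixel_ce(prob_nodule, mask, w_pos, w_neg):
    """Weighted per-pixel cross-entropy for a 2D probability map.

    prob_nodule: nodule-class probability per pixel (the slice of the
                 final softmax corresponding to label 1).
    mask:        binary ground-truth nodule mask (1 = nodule pixel).
    w_pos/w_neg: weights for nodule / background pixels.
    """
    eps = 1e-7
    p = np.clip(prob_nodule, eps, 1 - eps)  # avoid log(0)
    loss = -(w_pos * mask * np.log(p) + w_neg * (1 - mask) * np.log(1 - p))
    return loss.mean()
```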

Corresponding U-Net inputs, labels, and predictions for a patient from the LUNA16 validation set are shown in Fig. 7a, 7b, and 7c, respectively.


Results
The results are shown in Table 5, and ROC curves for the Vanilla CNN and 3D Googlenet are shown in Figure 17.

Figure 12: U-Net sample input from the LUNA16 validation set. Note that this image contains the largest nodule in the LUNA16 validation set, which we chose for clarity; most nodules are significantly smaller than this one.

A weighted softmax cross-entropy loss is calculated for each pixel, as a label of 0 is far more common in the mask than a label of 1. The trained U-Net is then applied to the segmented Kaggle CT scan slices to generate nodule candidates.

VI. MALIGNANCY 3D CNN CLASSIFIERS
Once the U-Net was trained on the LUNA16 data, it was run on 2D slices of Kaggle data and the 2D slices were stacked back together to generate nodule candidates. Ideally, the output of the U-Net would give the exact locations of all the nodules, and images with nodules detected by the U-Net could be declared positive for lung cancer while images without any detected nodules could be declared negative. However, as shown in Fig. 7c, the U-Net produces a strong signal for the actual nodule but also produces many false positives, so an additional classifier is needed to determine malignancy.
Because the U-Net generates more suspicious regions than actual nodules, the top 8 nodule candidates (32 × 32 × 32 volumes) are located by sliding a window over the data and saving the locations of the 8 most activated (largest L2 norm) sectors. To prevent the top sectors from simply clustering in the brightest region of the image, the 8 chosen sectors are not permitted to overlap with each other. These sectors are then combined into a single 64 × 64 × 64 image, which serves as the input to the classifiers, which assign a label (cancer or no cancer) to the image.
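The sector-selection and packing steps can be sketched in numpy. For simplicity this sketch slides the window on a grid with stride equal to the sector size, which enforces the non-overlap constraint by construction; the paper slides more densely and rejects overlapping picks, so this is a simplification.

```python
import numpy as np

def top_sectors(activation, k=8, size=32, stride=32):
    """Return positions of the k most activated (largest L2 norm),
    non-overlapping size^3 sectors of a 3D activation volume."""
    d, h, w = activation.shape
    scored = []
    for i in range(0, d - size + 1, stride):
        for j in range(0, h - size + 1, stride):
            for l in range(0, w - size + 1, stride):
                sector = activation[i:i+size, j:j+size, l:l+size]
                scored.append((np.linalg.norm(sector), (i, j, l)))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [pos for _, pos in scored[:k]]

def combine_sectors(volume, positions, size=32):
    """Pack 8 sectors into a single 64^3 classifier input (2x2x2 grid)."""
    out = np.zeros((2 * size, 2 * size, 2 * size), dtype=volume.dtype)
    for n, (i, j, l) in enumerate(positions[:8]):
        a, b, c = (n >> 2) & 1, (n >> 1) & 1, n & 1
        out[a*size:(a+1)*size, b*size:(b+1)*size, c*size:(c+1)*size] = \
            volume[i:i+size, j:j+size, l:l+size]
    return out
```

In the full pipeline the sectors would be scored on the U-Net activation map but cut from the corresponding segmented CT volume.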
A linear classifier is used as a baseline, along with a vanilla 3D CNN and a 3D GoogLeNet-based CNN modeled on the 2D architecture of Szegedy et al. [12]. Each classifier uses weighted softmax cross-entropy loss (the weight for a label is the inverse of the frequency of that label in the training set) and the Adam optimizer, and the CNNs use ReLU activations and dropout after each convolutional layer during training. The networks were shrunk to prevent parameter overload for the relatively small Kaggle dataset. The 3D CNN architecture is described in detail in Table III.
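The inverse-frequency weighting used in the loss is simple to compute; a short sketch:

```python
import numpy as np

def inverse_freq_weights(labels):
    """Per-class loss weights = inverse of each label's frequency in the
    training set. With ~70% negative labels this up-weights the rarer
    cancer class in the weighted softmax cross-entropy loss."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    freq = counts / counts.sum()
    return dict(zip(classes.tolist(), (1.0 / freq).tolist()))
```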
An example of this architecture is illustrated in Fig. 8: on the left is the input 3D volume, followed by two convolutional layers, a fully connected layer, and an output layer; in the convolutional layers, each filter (or channel) is represented by a volume. In the remainder of this section, we describe the technical details of the neural network architecture we used and how it was trained.

Convolutional Neural Networks
A convolutional neural network consists of a number of convolutional layers, followed by one or more fully connected layers and finally an output layer. Formally, we denote the input to layer $m$ of the network by $I^{(m)}$. The input to a 3D convolutional layer $m$ is a 3D object with $n_c^{(m-1)}$ channels, and its elements are denoted by $I^{(m,\ell)}_{i,j,k}$, where $i$, $j$, and $k$ index the 3D volume and $\ell$ selects the channel. The output of a convolutional layer $m$ is defined by its spatial dimensions as well as by the number of filters or channels $n_c^{(m)}$ it produces. The output of layer $m$ is a convolution of its input with a filter and is computed as
$$I^{(m+1,\ell)}_{i,j,k} = f\left( b^{(m,\ell)} + \sum_{\ell'} \sum_{i',j',k'} W^{(m,\ell)}_{i',j',k',\ell'} \, I^{(m,\ell')}_{i+i',\,j+j',\,k+k'} \right),$$
where $W^{(m,\ell)}$ and $b^{(m,\ell)}$ are the parameters which define the $\ell$-th filter in layer $m$. The locations where the filters are evaluated (i.e., the values of $i$, $j$, and $k$ for which $I^{(m,\ell)}_{i,j,k}$ is computed) and the size of the filters (i.e., the values of $W^{(m,\ell)}$ which are non-zero) are parameters of the network architecture. In the convolutional layers we use a hyperbolic tangent activation function, $f_{\tanh}(a) = \tanh(a)$.
Convolutional layers preserve the spatial structure of the inputs and, as more layers are used, build up more and more complex representations of the input. The output of the convolutional layers is then used as input to a fully connected network layer. To do this, the spatial and channel structure is ignored and the output of the convolutional layer is treated as a single vector. The output of a fully connected layer is a 1D vector I^{(m)} whose dimension is a parameter of the network architecture. The output of neuron i in layer m is given by

I^{(m+1)}_i = f_{\mathrm{ReLU}}\Big( b^{(m,i)} + \sum_j W^{(m,i)}_j I^{(m)}_j \Big)

where W^{(m,i)} and b^{(m,i)} are the parameters of neuron i in layer m and the sum over j is a sum over all dimensions of the input. The activation function f_{ReLU}(·) here is chosen to be a Rectified Linear Unit (ReLU) with f_{ReLU}(a) = max(0, a). This activation function has been widely used in a number of domains [19], [20] and is believed to be particularly helpful in classification tasks, as the sparsity it induces in the outputs helps create separation between classes during learning.
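As an illustration of the computation above, the following is a minimal NumPy sketch of a single-channel valid-mode 3D convolutional layer with a tanh activation. The naive triple loop, single input channel, and the function name `conv3d_layer` are simplifying assumptions for clarity, not the paper's implementation (which uses, e.g., 5×5×5 filters and multiple channels).

```python
import numpy as np

def conv3d_layer(volume, filters, biases):
    """Naive valid-mode 3D convolution over a single-channel input volume,
    followed by the hyperbolic tangent activation f_tanh(a) = tanh(a).

    volume:  (D, H, W) single-channel input
    filters: (C, fd, fh, fw) bank of C filters
    biases:  (C,) one bias per filter/channel
    Returns a (C, D-fd+1, H-fh+1, W-fw+1) output volume.
    """
    C, fd, fh, fw = filters.shape
    D, H, W = volume.shape
    out = np.zeros((C, D - fd + 1, H - fh + 1, W - fw + 1))
    for c in range(C):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                for k in range(out.shape[3]):
                    # Inner product of the filter with the local 3D patch, plus bias
                    patch = volume[i:i+fd, j:j+fh, k:k+fw]
                    out[c, i, j, k] = np.sum(patch * filters[c]) + biases[c]
    return np.tanh(out)
```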
The last fully connected layer is used as input to the output layer. The structure and form of the output layer depend on the particular task. Here we consider two different types of output functions. In classification problems with K classes, a common output function is the softmax function:

f_i = \frac{\exp\big( W^{(o,i)} \cdot I^{(N)} + b^{(o,i)} \big)}{\sum_{k=1}^{K} \exp\big( W^{(o,k)} \cdot I^{(N)} + b^{(o,k)} \big)}

where N is the index of the last fully connected layer, b^{(o,i)} and W^{(o,i)} are the parameters of the i-th output unit, and f_i ∈ [0, 1] is the output for class i, which can be interpreted as the probability of that class given the inputs. We also consider a variation on the logistic output function:

f = a + \frac{b - a}{1 + \exp\big( -( W^{(o)} \cdot I^{(N)} + b^{(o)} ) \big)}

which provides a continuous output f restricted to lie in the range (a, b) with parameters b^{(o)} and W^{(o)}. We call this the scaled logistic output function. We note that when considering a ranking-type multi-class classification problem like predicting the malignancy level, this output function might be expected to perform better.
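The two output functions can be sketched in NumPy as follows; the function names and the vector/matrix shapes here are illustrative assumptions, not the paper's code.

```python
import numpy as np

def softmax_output(h, W, b):
    """Softmax over K output units: f_i proportional to exp(W_i . h + b_i)."""
    z = W @ h + b
    z = z - z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def scaled_logistic_output(h, w, b, lo, hi):
    """Logistic output rescaled to lie in the open interval (lo, hi),
    e.g. a malignancy level between 1 and 5."""
    s = 1.0 / (1.0 + np.exp(-(w @ h + b)))
    return lo + (hi - lo) * s
```

With all-zero parameters the softmax returns a uniform distribution and the scaled logistic returns the midpoint of its range, which is a convenient sanity check.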

A. Training
Given a collection of data and a network architecture, the main goal is to fit the parameters of the network to that data. To do this we define an objective function and use gradient-based optimization to search for the network parameters which minimize it. Let D = {(n_i, y_i)}_{i=1}^{D} be the set of D (potentially augmented) training examples, where n is an input (a portion of a CT scan) and y is the output (the malignancy level or a binary class indicating benign or malignant), and let Θ denote the collection of all weights W and biases b for all layers of the network. The objective function has the form

E(\Theta) = \sum_{i=1}^{D} L\big( y_i, f(n_i, \Theta) \big) + \lambda E_{\mathrm{prior}}(\Theta)

where f(n_i, Θ) is the output of the network evaluated on input n_i with parameters Θ, and L(y_i, f(n_i, Θ)) is a loss function which penalizes differences between the desired output of the network y and the prediction of the network ŷ. The function E_prior(Θ) = ||W||² is a weight decay prior which helps prevent over-fitting by penalizing the norm of the weights, and λ controls the strength of the prior.
We consider two different objective functions in this paper depending on the choice of output function. For the softmax output function we use the standard cross-entropy loss function

L(y, \hat{y}) = -\sum_{k=1}^{K} y_k \log(\hat{y}_k)

where y is assumed to be a binary indicator vector and ŷ is assumed to be a vector of probabilities for each of the K classes. A limitation of the cross-entropy loss is that all class errors are considered equal; hence mislabeling a malignancy level 1 as a level 2 is considered just as bad as mislabeling it a 5. This is clearly problematic, hence for the scaled logistic function we use the squared error loss function L(y, ŷ) = (y − ŷ)² to capture this, where we assume y and ŷ to be real-valued.
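The two losses can be sketched directly; the function names are illustrative assumptions. Note how the squared error grows with the distance between the true and predicted malignancy levels, which is exactly what the cross-entropy loss ignores.

```python
import numpy as np

def cross_entropy_loss(y, y_hat):
    """y: one-hot indicator vector, y_hat: predicted class probabilities.
    All misclassifications are penalized equally."""
    return -np.sum(y * np.log(y_hat))

def squared_error_loss(y, y_hat):
    """Distance-aware loss for the scaled logistic output: the penalty grows
    with how far the predicted malignancy level is from the true one."""
    return (y - y_hat) ** 2
```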
Given the objective function E(Θ), the parameters Θ are learned using stochastic gradient descent (SGD) [21]. SGD operates by randomly selecting a subset of training examples and updating the values of the parameters using the gradient of the objective function evaluated on the selected examples. To accelerate progress and reduce noise due to the random sampling of training examples we use a variant of SGD with momentum [22]. Specifically, at iteration t, the parameters are updated as

V_{t+1} = \rho V_t - \epsilon_t \nabla E_t(\Theta_t), \qquad \Theta_{t+1} = \Theta_t + V_{t+1}

where ρ = 0.9 is the momentum parameter, V_{t+1} is the momentum vector, ε_t is the learning rate and ∇E_t(Θ_t) is the gradient of the objective function evaluated using only the training examples selected at iteration t. At iteration 0, all biases are set to 0 and the values of the filters and weights are initialized by uniformly sampling from the interval [−√(6/(fan_in + fan_out)), √(6/(fan_in + fan_out))] as suggested by [23], where fan_in and fan_out respectively denote the number of nodes in the previous hidden layer and in the current layer. Given this initialization and setting ε_0 = 0.01, SGD is run for 2000 epochs, during which ε_t is decreased by 10% every 25 epochs to ensure convergence.
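The initialization, momentum update, and learning-rate schedule described above can be sketched as follows; this is a minimal NumPy sketch with hypothetical function names, not the paper's training code.

```python
import numpy as np

def glorot_init(fan_in, fan_out, shape, rng):
    """Uniform samples from [-sqrt(6/(fan_in+fan_out)), +sqrt(6/(fan_in+fan_out))]."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)

def sgd_momentum_step(theta, velocity, grad, lr, rho=0.9):
    """One SGD-with-momentum update: V <- rho*V - lr*grad; Theta <- Theta + V."""
    velocity = rho * velocity - lr * grad
    return theta + velocity, velocity

def learning_rate(epoch, lr0=0.01):
    """Initial rate 0.01, decreased by 10% every 25 epochs."""
    return lr0 * 0.9 ** (epoch // 25)
```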

VII. SIMULATION RESULTS
The experiments are conducted using the DSB dataset, in which low-dose CT images from high-risk patients are given in DICOM format. The DSB database consists of 1397 CT scans and 248580 slices. Each scan contains a series with multiple axial slices of the chest cavity. Each scan has a variable number of 2D slices (Fig. 9), which can vary based on the machine taking the scan and the patient. The DICOM files have a header that contains the necessary information about the patient id, as well as scan parameters such as the slice thickness. The dataset is publicly available on Kaggle [13]. DICOM is the de-facto file standard in medical imaging. The pixel size/coarseness of the scan differs from scan to scan (e.g., the distance between slices may differ), which can hurt the performance of our model. The experiments are implemented on a computer with a CPU i7, 2.6 GHz, 16 GB RAM, Matlab 2013b, R-Studio, and Python. First, the nodules in the DSB dataset are detected and segmented using thresholding and a U-Net convolutional neural network. The diameters of the nodules range from 3 to 30 mm. Each slice has 512 × 512 pixels and 4096 gray level values in Hounsfield Units (HU), a measure of radiodensity.
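The conversion from stored DICOM pixel values to Hounsfield Units can be sketched with the standard rescale relation HU = raw × slope + intercept, where the slope and intercept come from the DICOM header fields RescaleSlope and RescaleIntercept (readable with a library such as pydicom). This is a generic sketch, not the paper's preprocessing code; the function name and the example values are illustrative.

```python
import numpy as np

def to_hounsfield(raw, slope, intercept):
    """Convert stored CT pixel values to Hounsfield Units:
    HU = raw * RescaleSlope + RescaleIntercept (from the DICOM header)."""
    return raw.astype(np.float64) * slope + intercept
```

With a typical intercept of -1024 and slope of 1, a stored value of 1024 maps to 0 HU (water), and 0 maps to -1024 HU (near air).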
In the screening setting, one of the most difficult decisions is whether a follow-up CT or another investigation is needed before the next annual low-dose CT study. Current clinical guidelines are complex and vary according to the size and appearance of the nodule. The majority of nodules were solid in appearance. For pulmonary nodule detection using CT imaging, CNNs have recently been used as feature extractors within larger CAD systems.
For simplicity in training and testing we selected the ratings of a single radiologist. All experiments were done using a 50% training set, 20% validation set and 30% testing set. To evaluate the results we considered a variety of testing metrics; accuracy is the primary metric used in our evaluations. In our first set of experiments we considered a range of CNN architectures for the binary classification task. Early experimentation suggested that the number of filters and neurons per layer were less significant than the number of layers. Thus, to simplify analysis, the first convolutional layer used seven filters of size 5×5×5, the second convolutional layer used 17 filters of size 5×5×3, and all fully connected layers used 256 neurons. These were found to generally perform well, and we considered the impact of one or two convolutional layers followed by one or two fully connected layers. The networks were trained as described above and the results of these experiments can be found in Table I. Our results suggest that two convolutional layers followed by a single hidden layer is one of the optimal network architectures for this dataset. The average error for training is described in Fig. 10. Another important parameter in the training of neural networks is the number of observations that are sampled at each iteration, the size of the so-called minibatch. The use of minibatches is often driven in part by computational considerations but can impact the ability of SGD to find a good solution. Indeed, we found that choosing the proper minibatch size was critical for learning to be effective. We tried minibatches of size 1, 10, 50 and 100. While the nature of SGD suggests that larger batch sizes should produce better gradient estimates and therefore work better, our results here show that the opposite is true. Smaller batch sizes, even as small as 1, produce the best results. We suspect that the added noise of smaller batch sizes allows SGD to better escape poor local optima and thus perform better overall.
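The minibatch selection underlying each SGD iteration can be sketched as follows; the function name and array shapes are illustrative assumptions. Varying `batch_size` over 1, 10, 50 and 100 reproduces the experimental setup described above.

```python
import numpy as np

def sample_minibatch(X, y, batch_size, rng):
    """Randomly select batch_size training examples (without replacement)
    for one SGD iteration, keeping inputs and labels paired."""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    return X[idx], y[idx]
```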
The recognition results achieved on the DSB dataset with the 3D CNN are shown by the confusion matrix in Table IV.

VIII. CONCLUSION
In this paper we developed a deep convolutional neural network (CNN) architecture to detect nodules in lung cancer patients, using a U-Net architecture to detect the points of interest. This step is a preprocessing step for the 3D CNN. The deep 3D CNN models performed the best on the test set. We achieve an AUC of 0.83, which is strong performance considering that we use less labeled data than most state-of-the-art CAD systems. As an interesting observation, the first layer is a preprocessing layer for segmentation using different techniques: thresholding, Watershed, and U-Net are used to identify the nodules of patients.
The network can be trained end-to-end from raw image patches. Its main requirement is the availability of a training database; otherwise, no assumptions are made about the objects of interest or the underlying image modality.
In the future, it could be possible to extend our current model to not only determine whether or not the patient has cancer, but also determine the exact location of the cancerous nodules. The most immediate future work is to use Watershed segmentation as the initial lung segmentation. Other opportunities for improvement include making the network deeper and more extensive hyperparameter tuning. Also, we saved our model parameters at best accuracy, but perhaps we could have saved at other metrics, such as F1. Other future work includes extending our models to 3D images for other cancers. The advantage of not requiring much labeled data specific to one cancer is that it could make the approach generalizable to other cancers.

Figure 2 :
Figure 2: (a) Histograms of pixel values in HU for a sample patient's CT scan at various slices; (b) corresponding 2D axial slices.

Figure 3 :
Figure 3: (3a) Histogram of HU values and (3b) corresponding axial slices for a sample patient 3D image at various axial slices.

Figure 4 :
Figure 4: (4a) Sample patient 3D image with pixels values greater than 400 HU reveals the bone segment, (4b) Sample patient bronchioles within lung, (4c) Sample patient initial mask with no air, and (4d) Sample patient final mask in which bronchioles are included.


Figure 6 :
Figure 6: Sample patient final mask in which bronchioles are included

Figure 8 :
Figure 8: Lung segmentation mask by thresholding of a sample patient

Figure 5 :
Figure 5: U-Net training used a per-pixel weighted softmax, as a label of 0 is far more common in the mask than a label of 1. The trained U-Net is then applied to the segmented Kaggle CT scan slices to generate nodule candidates.

Figure 12 :
Figure 12: U-Net sample input from LUNA16 validation set. Note that the above image has the largest nodule from the LUNA16 validation set, which we chose for clarity; most nodules are significantly smaller than the largest one in this image.

Figure 13 :
Figure 13: U-Net sample labels mask from LUNA16 validation set showing ground truth nodule location. (Preprocessing and reading of LUNA16 data code based on https://www.kaggle.com/arnavkj95/candidate-generation-and-luna16-preprocessing.) Ideally the output of U-Net would give us the exact locations of all the nodules, and we would be able to say images with nodules as detected by U-Net are positive for lung cancer, and images without any nodules detected by U-Net are negative for lung cancer. However, as shown in Figure 14, U-Net produces a strong signal for the actual nodule but also produces many false positives, so we need an additional classifier that determines the malignancy.

Figure 14 :
Figure 14: U-Net predicted output from LUNA16 validation set


Figure 7 :
Figure 7: (7a) U-Net sample from LUNA16 validation set. Note that the above image has the largest nodule from the LUNA16 validation set, which we chose for clarity; most nodules are significantly smaller than the largest one in this image. (7b) U-Net predicted output from LUNA16 validation set. (7c) U-Net sample labels mask from LUNA16 validation set showing ground truth nodule location.

Figure 8 :
Figure 8: An example architecture of a 3D convolutional neural network used here. On the left is the input 3D volume, followed by two convolutional layers, a fully connected layer and an output layer. In the convolutional layers, each filter (or channel) is represented by a volume.

Figure 9 :
Figure 9: Number of slices per patient in data science bowl dataset.


Table I :
Typical Radiodensities in HU of Various Substances in a CT Scan


Table II :
U-Net Architecture (Dropout with 0.2 Probability after each 'a' Conv.Layer during Training, 'Up' Indicates Resizing of Image via Bilinear Interpolation, Adam Optimizer, Learning Rate = 0.0001)

Table IV :
Confusion Matrix of 3D CNN using 30% Testing. As shown in Table IV, the accuracy of the model is 86.6%, the misclassification rate is 13.4%, the false positive rate is 11.9%, and the false negative rate is 14.7%. Almost all patients are classified correctly. Additionally, there is an enhancement in accuracy due to the efficient U-Net architecture and segmentation.
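The rates reported above can be computed from a binary confusion matrix as sketched below. The matrix layout (rows indexed by the true class) and the example counts in the test are hypothetical illustrations, not the actual counts from Table IV.

```python
import numpy as np

def binary_metrics(cm):
    """Compute classification rates from cm = [[TN, FP], [FN, TP]],
    where rows are indexed by the true class and columns by the prediction."""
    tn, fp = cm[0]
    fn, tp = cm[1]
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    misclassification = (fp + fn) / total
    fpr = fp / (fp + tn)          # false positive rate
    fnr = fn / (fn + tp)          # false negative rate
    return accuracy, misclassification, fpr, fnr
```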