Efficient Lung Nodule Classification Method using Convolutional Neural Network and Discrete Cosine Transform

In today’s medicine, Computer-Aided Diagnosis (CAD) systems are widely used to improve the accuracy of screening tests for pulmonary nodules. Processing, classification, and detection techniques form the basis of a CAD architecture. In this work, we focus on the classification step of a CAD system, where we use the Discrete Cosine Transform (DCT) along with a Convolutional Neural Network (CNN) to build an efficient classification method for pulmonary nodules. By combining DCT and CNN, the proposed method achieves an accuracy that exceeds that of the conventional CNN model.
Keywords—Convolutional neural network; discrete cosine transform; pulmonary nodule classification; computer aided diagnosis systems


I. INTRODUCTION
Cancer incidence and mortality are increasing rapidly around the world [1]. Cancer is the second leading cause of death globally and was responsible for an estimated 9.6 million deaths in 2018. Lung cancer is the most frequently diagnosed cancer in both genders, with over 2.09 million cases, and it is the leading cause of cancer-related death worldwide. According to the World Health Organization, lung cancer kills over 1.76 million people yearly, which represents about 20% of all cancer-related deaths [2].
According to the American Cancer Society, most cases of lung cancer are diagnosed at a late stage, when the cancer has already metastasized, because symptoms usually do not appear until then. Early detection of suspected pulmonary nodules is therefore very important and could potentially increase survival rates. There are several medical imaging modalities for lung cancer screening, but the one most frequently used for nodule detection and analysis is Computed Tomography (CT) [1].
In many cases, it is difficult to obtain an accurate diagnosis due to the complicated morphological structure of nodules. A pulmonary nodule is an oval-shaped growth in the lung. Its form can be confused with other shapes in a CT scan, such as end-on vessels. A lesion is called a pulmonary mass when its diameter is larger than 3 centimeters; otherwise it is called a pulmonary nodule. It is called a micronodule when its diameter is smaller than 4 millimeters. A large number of nodules can be discovered during a screening test, and each of them can be either malignant (cancerous) or benign (noncancerous). Fig. 1 shows an example of nodules on two different CT scans from the LIDC database.
To deal with this diagnosis accuracy problem, Computer-Aided Diagnosis systems are often used as a second reader to improve the accuracy of the diagnosis made by radiologists during screening practice.
Computer-aided diagnosis systems are efficient tools that are widely used in Medical Image Analysis to improve diagnosis accuracy [2], [3], [4]. Medical image analysis lies at the basis of these systems. CAD systems used for medical image analysis consist of a stepwise process, which is usually designed according to the problem at hand. Generally, it involves preprocessing, segmentation, detection, and classification techniques.
Automated analysis of medical images is an important field in today's world of research. Researchers started working on medical image analysis as soon as they had access to medical image acquisitions on computers. Up to the late 1990s, most automated analysis systems were based on conventional image processing methods, such as morphological processing [5], edge detection [6], region growing [7], and many more.
The goal of these works was to achieve a rule-based system that solves a particular problem. GOFAI (Good Old-Fashioned Artificial Intelligence), or symbolic AI, is the name attributed to these systems. The concept of a GOFAI system relies on the idea that cognition can be represented as a sequence of computational terms. So, to solve a particular problem related to medical image analysis, all we have to do is find the right stepwise computational system [8].
Since the late 1990s, supervised techniques have gained popularity in the medical image analysis field [9], and most of today's systems built on supervised techniques, particularly those used for commercial purposes, are now very successful. In supervised techniques, such as the Active Shape Model (ASM) and the Active Appearance Model (AAM) [10], we use data to build the system. This approach is still widely investigated in current research for two main reasons: the abundance of public datasets and the availability of machines and services with the CPU/GPU performance needed to build, train, and test the model. Owing to these improvements, there has been a major transition from systems based on hand-crafted features to systems that are trained automatically using available datasets.
In the beginning, systems used hand-crafted methods designed by humans to extract features from the data. The next level of this approach is to let the system itself automatically extract the features that best represent the data for the problem at hand. This can be done by transforming the given inputs into labeled outputs while progressively learning to extract higher-level features. A substantial amount of work related to Medical Image Analysis with deep learning approaches has been proposed in the literature [11]. One of the most successful deep learning models widely used in Medical Image Analysis is the Convolutional Neural Network (CNN) [11]. CNNs saw their first successful real-world application in LeNet [12], a model designed by LeCun in 1998 for handwritten digit recognition. CNNs became popular in 2012, when a model called AlexNet [13] was proposed in the ImageNet competition. The model won the challenge by a large margin, outperforming all competitors. In the years that followed, a substantial amount of work was proposed with further enhancements using related architectures.
In medical image analysis, CNNs are one of the best choices made by researchers to design and build efficient CAD systems. Many methods have been used for feature extraction and were very popular before the breakthrough of CNNs in 2012. Examples include Principal Component Analysis (PCA), Sparse Coding approaches, and other techniques that have been well detailed in [14].
With regard to lung nodule classification, CNNs outperform most classical feature learning methods [15]. The proposed work is inspired by these pivotal CNN-related developments in medical image analysis research.
In this paper, we introduce an efficient approach for lung nodule classification based on both CNNs and the DCT for representation learning. Only the relevant information obtained with the Discrete Cosine Transform is fed to our Convolutional Neural Network, instead of the raw patches extracted from CT-images from which features are usually learned. The CNN is then used for feature extraction from the DCT output with Convolution, Max Pooling, and Dropout layers, as presented in Fig. 2.
The rest of this paper is organized as follows. In Material and Methods, we give a brief overview of machine learning concepts, Convolutional Neural Networks, and the Discrete Cosine Transform. Then, we describe our contribution to lung nodule classification, which combines a Convolutional Neural Network and the Discrete Cosine Transform. Finally, we provide the details of the experiments performed to evaluate the proposed method, discuss the obtained results, and outline open challenges and future work.

II. MATERIAL AND METHODS
A. Machine Learning
Machine learning approaches are divided into two major categories: supervised and unsupervised learning algorithms. In supervised methods, the model is described using a dataset of n entries, defined as $D = \{(x_i, y_i)\}_{i \in \{1,\dots,n\}}$, where $x_i$ is the input, $y_i$ is the output or label associated with $x_i$, and n is the total number of dataset entries. The output $y_i$ can take several forms according to the problem at hand. For example, in our classification problem, the output can be defined as a Boolean scalar: true for "it is" or false for "it is not" a nodule, while in other problems y can be a multi-dimensional vector. In supervised learning, the model analyzes the data pairs (x, y) that are fed to it and produces an inferred function $f(x, \theta)$, where x is the input and $\theta$ denotes the model parameters. This function can be used to map new, unseen entries other than the pairs (x, y) used to train the model. The parameters $\theta$ are computed by minimizing a loss function $loss(y, \hat{y})$, where $\hat{y}$ is the label predicted by $f(x, \theta)$.
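To make the roles of the inferred function, its parameters, and the loss concrete, the following is a minimal sketch in plain Python. The logistic model, the binary cross-entropy loss, the toy data, and the two parameter settings are illustrative assumptions, not the model or loss used in this paper.

```python
import math

def f(x, theta):
    """Hypothetical model f(x, theta): logistic function of a weighted sum."""
    weights, bias = theta
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # predicted probability of "nodule"

def loss(y, y_hat, eps=1e-12):
    """Binary cross-entropy between the true label y (0 or 1) and the prediction y_hat."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # keep log() away from 0
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

def mean_loss(dataset, theta):
    """Average loss over all (x, y) pairs; training searches for theta that lowers it."""
    return sum(loss(y, f(x, theta)) for x, y in dataset) / len(dataset)

# Made-up toy data: two features per entry, label 1 = "nodule", 0 = "not a nodule".
dataset = [([0.9, 0.8], 1), ([0.1, 0.2], 0), ([0.8, 0.7], 1), ([0.2, 0.1], 0)]

theta_uninformative = ([0.0, 0.0], 0.0)  # predicts 0.5 for every input
theta_separating = ([4.0, 4.0], -4.0)    # hand-picked to separate the toy data

print(mean_loss(dataset, theta_uninformative))
print(mean_loss(dataset, theta_separating))
```

The second parameter setting gives a lower average loss than the first, which is the sense in which the loss guides the choice of the parameters.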
Unlike supervised learning, an unsupervised model processes the input data without any pre-defined labels, which helps to discover previously unknown patterns. Examples include Principal Component Analysis and clustering methods. The latter are often used to partition the dataset elements into one or multiple groups in such a way that the elements of a given group share more similar properties with each other than with elements of a different group.
Because of these differences, unsupervised models cannot be applied directly to a classification or regression problem, since there is no pre-defined outcome that gives us an idea of what the output should be. Supervised learning approaches are often used in the pulmonary nodule classification problem, since we want the model to learn the nodule structure so that it can tell whether a given patch is a lung nodule or not.

B. Convolutional Neural Networks
An Artificial Neural Network (ANN) is an information processing model that lies at the basis of most deep learning methods. A neural network consists of many interconnected neurons, just like the biological nervous system but less complicated. A neuron is a node that has many inputs and one output. It can be defined as a function that receives input data x and provides an output y.
The output is the result of an activation function that takes as its argument a linear combination of the input x fed to the neuron plus the bias of the neuron. It is defined as: $y = \sigma(w^{T} x + b)$, where $\sigma$ is the activation function, w is the weight vector of the neuron, and b is its bias. Generally, activation functions can be divided into two types: linear and nonlinear functions. The latter type is used more often because it makes it easier for the model to adapt to a variety of data. Examples include Sigmoid, Hyperbolic Tangent, Rectified Linear Unit (ReLU), and Exponential Linear Unit (ELU).
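The neuron described above can be sketched in a few lines of plain Python. The input, weights, and bias below are made-up values chosen only to show how the four activation functions named in the text treat the same pre-activation value.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def elu(z, alpha=1.0):
    return z if z > 0 else alpha * (math.exp(z) - 1.0)

def neuron(x, w, b, activation):
    """Apply the activation to the linear combination of the inputs plus the bias."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)

# Hypothetical input, weights, and bias; here the pre-activation is z = -1.0.
x = [1.0, 2.0]
w = [0.5, -1.0]
b = 0.5

outputs = {
    name: neuron(x, w, b, fn)
    for name, fn in [("sigmoid", sigmoid), ("tanh", math.tanh), ("relu", relu), ("elu", elu)]
}
for name, value in outputs.items():
    print(name, value)
```

For a negative pre-activation, ReLU returns exactly zero, while ELU, Sigmoid, and Hyperbolic Tangent return small non-zero values.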
The Convolutional Neural Network (CNN) is a conventional Deep Learning model, an improvement on Artificial Neural Networks, that is widely used in the field of computer vision. Unlike the MLP (one of the best-known conventional ANN models), the CNN involves the convolution operation. It contains multiple convolution layers that lie at the basis of its architecture. Because the same kernels are applied at every position, the model learns a single set of weights for the same object occurring at different positions in different images. We repeat the following process for each layer: a convolution operation is performed on the input image using a set of K kernels $W = \{W_1, \dots, W_K\}$ and biases $B = \{b_1, \dots, b_K\}$, each of which produces a feature map $X_k$. Next, a nonlinear transformation is applied to the obtained feature map. This transformation is defined as follows: $X_k^{l} = \sigma(W_k^{l-1} * X^{l-1} + b_k^{l-1})$, where l is the layer index, $*$ denotes the convolution operation, and $\sigma$ is the nonlinear activation function. Another advantage of the CNN is the reduction of parameter dimensionality. A pooling operation is applied to each feature map to progressively reduce the number of parameters and thus the computational complexity of the model. After these two processes, fully connected layers are usually added to the network to complete the model.
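The convolution, nonlinearity, and pooling steps described above can be illustrated with a small pure-Python sketch. The 4 × 4 image and the 3 × 3 vertical-edge kernel are made-up values. As in most deep learning libraries, the "convolution" here is implemented as a sliding dot product without flipping the kernel, which is an assumption about the convention rather than something stated in this paper.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide the kernel over the image without padding and add the bias."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw)) + bias
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu_map(feature_map):
    """Apply ReLU element-wise to a feature map."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    out_h, out_w = len(feature_map) // size, len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b] for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Made-up 4 x 4 image and a 3 x 3 vertical-edge kernel.
image = [[1, 2, 3, 0], [0, 1, 2, 3], [3, 0, 1, 2], [2, 3, 0, 1]]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

feature_map = conv2d_valid(image, kernel)  # 2 x 2 feature map
activated = relu_map(feature_map)          # negative responses set to zero
pooled = max_pool(activated)               # 1 x 1 summary of the map
print(feature_map, activated, pooled)
```

The same nine kernel weights are reused at every position of the image, which is why the number of parameters does not grow with the image size.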

C. Discrete Cosine Transform
The Discrete Cosine Transform is a transform that was introduced by Ahmed et al. [16] in 1974. It is often used in image processing for dimensionality reduction and image compression [17], [18]. It is also used to extract the most relevant information in an image, and it is very efficient in a stepwise image processing system, particularly when only the DCT coefficients are used for image representation instead of the whole image.
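When the DCT is performed on a raw image, it transforms the image representation from the spatial domain to the frequency domain. Additionally, the DCT is data independent due to its fixed basis, and it can be implemented as a simple matrix operation. In its commonly used orthonormal two-dimensional form, the DCT of an N × N patch is defined as: $DCT(u, v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} p(x, y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]$, with $\alpha(0) = \sqrt{1/N}$ and $\alpha(k) = \sqrt{2/N}$ for $k > 0$, where DCT(u, v) represents the DCT coefficient at frequency indices (u, v) and p(x, y) represents the pixel at position (x, y) of the image patch to which the DCT is applied. Most of the relevant data that represents the image is concentrated in a few coefficients of the DCT, which makes it very efficient for data-dimensionality reduction.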
Thus, it can be used as a first step in the feature extraction process: instead of directly using the patches extracted from CT-images, we integrate the DCT as a first step to boost the performance of our classification model.
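The energy-compaction property mentioned above can be checked with a direct, unoptimized implementation of the two-dimensional DCT. This sketch assumes the orthonormal DCT-II normalization; the paper does not state which normalization it uses. The 8 × 8 ramp patch is made-up data standing in for a smooth image region.

```python
import math

def dct2(patch):
    """Orthonormal 2D DCT-II of a square patch given as a list of rows."""
    n = len(patch)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (patch[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * s
    return coeffs

def energy_share(coeffs, k):
    """Fraction of the total squared magnitude held by the top-left k x k coefficients."""
    total = sum(c * c for row in coeffs for c in row)
    top_left = sum(coeffs[u][v] ** 2 for u in range(k) for v in range(k))
    return top_left / total

# Made-up smooth 8 x 8 patch: intensity grows linearly along both axes.
ramp = [[float(x + y) for y in range(8)] for x in range(8)]
coeffs = dct2(ramp)
share = energy_share(coeffs, 2)
print(coeffs[0][0], share)
```

For this smooth patch, the four lowest-frequency coefficients carry almost all of the energy, which illustrates why a small subset of DCT coefficients can stand in for the full patch. Real CT patches are less regular, so the concentration will generally be weaker.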

III. PROPOSED METHOD
In this work, we build a stepwise classification system for pulmonary nodules. First, CT-images are transformed from the spatial domain to the frequency domain using the DCT. Then, the DCT coefficients (which represent the most relevant information in the images) are fed to the CNN, whose architecture is defined as follows. The input has a shape of 32 × 32. It is a grayscale 2D patch extracted from a full CT-image contained in our dataset. The patch is extracted based on the pair (x, y), which gives the nodule location coordinates in the 2D CT slice. All the pairs are provided in one file included in the LIDC database.
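The patch extraction step can be sketched as follows. The function name `extract_patch`, the convention that x indexes columns and y indexes rows, and the choice to shift the window inward when a nodule lies near the image border are all assumptions made for illustration; the paper does not specify how border cases are handled. The 512 × 512 slice is synthetic.

```python
def extract_patch(ct_slice, cx, cy, size=32):
    """Return a size x size patch roughly centered on column cx and row cy.

    Near the border, the window is shifted inward so the patch keeps its full size.
    """
    height, width = len(ct_slice), len(ct_slice[0])
    half = size // 2
    top = min(max(cy - half, 0), height - size)
    left = min(max(cx - half, 0), width - size)
    return [row[left:left + size] for row in ct_slice[top:top + size]]

# Synthetic 512 x 512 slice whose pixel value encodes its own position (row * 1000 + column).
ct_slice = [[r * 1000 + c for c in range(512)] for r in range(512)]

patch = extract_patch(ct_slice, cx=100, cy=200)
print(len(patch), len(patch[0]), patch[16][16])
```

With this convention, the pixel at the nodule coordinates ends up at index (16, 16) of the patch whenever the nodule is far enough from the border.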
Before feeding the input to the CNN, there are two pre-steps: data augmentation and the Discrete Cosine Transform. We use data augmentation to improve the diversity of our available dataset.
In this work, the data augmentation techniques we use include translation, rotation, and cropping. The output associated with each new input obtained through data augmentation is manually validated by the practitioner. In total, we have 8000 patches, which we divide into three subsets: training, validation, and testing. The DCT is applied to each patch $p_i$ of the dataset, $input_i = DCT(p_i)$, which gives us a new input that is fed to our network.
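The three augmentation operations named above can be sketched in plain Python on a patch stored as a list of rows. The specific choices here (zero filling after translation, rotation restricted to multiples of 90 degrees, and a square crop window) are assumptions made for illustration; the paper does not give the offsets, angles, or crop sizes that were actually used.

```python
def translate(patch, dx, dy, fill=0):
    """Shift the patch dx pixels right and dy pixels down, filling uncovered pixels."""
    h, w = len(patch), len(patch[0])
    return [
        [patch[r - dy][c - dx] if 0 <= r - dy < h and 0 <= c - dx < w else fill
         for c in range(w)]
        for r in range(h)
    ]

def rotate90(patch):
    """Rotate the patch by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]

def crop(patch, top, left, size):
    """Cut a size x size window whose top-left corner is at (top, left)."""
    return [row[left:left + size] for row in patch[top:top + size]]

# Tiny made-up patch so the effect of each operation is easy to read.
small = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(translate(small, dx=1, dy=0))
print(rotate90(small))
print(crop(small, top=1, left=1, size=2))
```

Each augmented patch would then go through the same DCT step as the original patches before being fed to the network.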
As mentioned before, the DCT is used to improve the effectiveness of the classification process by feeding only the most relevant information of the input to the network. Feature extraction comes after the two pre-steps. The model architecture is described in Table 1.
First, a 2D convolution is applied to the input using 32 different filters of size 3 × 3. The convolution is applied twice: the first one involves padding while the next one doesn't. For each convolution, we use ReLU as an activation function to increase the output non-linearity.
In the next step of the process, we use Max-Pooling after convolution to down-sample the convolution output representation. In this layer, we use a pool size of 2 × 2, which reduces the dimensionality of the convolution output from 30 × 30 × 32 to 15 × 15 × 32.
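The shapes quoted above can be reproduced with simple arithmetic. The sketch below assumes a stride of 1, that the padded convolution preserves the spatial size, and that the unpadded 3 × 3 convolution removes one pixel on each side; these assumptions are consistent with the 30 × 30 × 32 and 15 × 15 × 32 shapes reported in the text.

```python
def conv_size(n, kernel=3, padding="valid"):
    """Spatial size after a stride-1 convolution."""
    if padding == "same":
        return n               # zero padding keeps the size unchanged
    if padding == "valid":
        return n - kernel + 1  # no padding: the border is lost
    raise ValueError("padding must be 'same' or 'valid'")

def pool_size(n, pool=2):
    """Spatial size after non-overlapping max-pooling."""
    return n // pool

def first_block_shapes(n=32, filters=32):
    """Trace the shapes through conv (padded) -> conv (unpadded) -> max-pooling."""
    n = conv_size(n, padding="same")   # 32 -> 32
    n = conv_size(n, padding="valid")  # 32 -> 30
    conv_shape = (n, n, filters)
    n = pool_size(n)                   # 30 -> 15
    return conv_shape, (n, n, filters)

conv_shape, pooled_shape = first_block_shapes()
print(conv_shape, pooled_shape)
```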
After the max-pooling comes Dropout. The goal of this layer is to prevent the model from overfitting. It consists of randomly selecting neurons and turning them off during each iteration of the training process. More precisely, the dropout layer turns off a fraction P of the neurons in each iteration, where P is the percentage of neurons to turn off randomly during training.
Since convolution layers have few parameters, they require less regularization as a starting point; hence, we set P = 25% for each Dropout layer.
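The behavior of a Dropout layer with P = 25% can be sketched as follows. This sketch uses the "inverted dropout" convention, in which the surviving activations are rescaled by 1 / (1 - P) during training so that nothing needs to change at test time. That convention is common in deep learning libraries, but it is an assumption here, since the paper does not state which variant is used.

```python
import random

def dropout(values, p=0.25, training=True, rng=random):
    """Randomly zero a fraction p of the values during training (inverted dropout)."""
    if not training or p == 0.0:
        return list(values)  # dropout is disabled at evaluation time
    keep = 1.0 - p
    # Each value survives with probability `keep` and is rescaled to preserve the expected sum.
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)       # fixed seed so the sketch is repeatable
activations = [1.0] * 10000  # made-up activations
dropped = dropout(activations, p=0.25, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
print(zero_fraction)
```

Over many activations, the fraction that is turned off stays close to P, and a different random subset is selected at every training iteration.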
In this work, we perform the sequence convolution → max-pooling → dropout 4 times. For the convolution layers, we use different shapes of size 32 × 32, 64 × 64, 64 × 64, and 64 × 64, respectively. We also use the same activation function, ReLU, for all convolution layers to increase the non-linearity of their output.
We use Flattening at the end of the convolution process to convert the last output into a one-dimensional array, which is used as the feature vector. The next part of our model is the Fully Connected Neural Network, which consists of 4 Dense layers: the input (512 nodes), 2 hidden layers (512 nodes each), and the output (2 nodes). Again, after each dense layer, we add a dropout layer to prevent the model from overfitting. We use ReLU as the activation function for all layers except the output, for which we use SoftMax.
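The two operations that bracket the fully connected part, flattening and SoftMax, can be sketched in plain Python. The small 2 × 2 × 3 volume and the two output scores are made-up values; the SoftMax subtracts the maximum score before exponentiating, a standard trick for numerical stability.

```python
import math

def flatten(volume):
    """Convert a height x width x channels nested list into a one-dimensional list."""
    return [v for plane in volume for row in plane for v in row]

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up 2 x 2 x 3 feature volume standing in for the last pooled output.
volume = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
feature_vector = flatten(volume)

# Made-up scores of the 2 output nodes (for example: "nodule" and "not a nodule").
probabilities = softmax([2.0, -1.0])
print(len(feature_vector), probabilities)
```

With 2 output nodes, the predicted class is simply the node with the larger probability.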
In the next section, we describe the experiments we built to evaluate the proposed model. We describe in detail the database we used and the behavior of our model. Finally, we report our experimental results and give a brief overview of the perspectives of this work.

IV. EXPERIMENT AND RESULTS
Computer-Aided Diagnosis systems are generally based on the following stepwise processing system: 1) data acquisition, 2) medical image preprocessing, 3) medical image segmentation, 4) detection, and 5) classification or false positive reduction. In this work, we focus on classification, which is the main subject of our research. Other efficient approaches that focus mainly on medical image preprocessing and segmentation are detailed in [19], [20], [21], [22]. The main goal of this work is to evaluate the impact of combining the Discrete Cosine Transform and a Convolutional Neural Network on the classification accuracy for pulmonary nodules, in order to determine whether the proposed method outperforms the standard CNN classifier. It is also our goal to improve the classification accuracy of the standard CNN model. We do not include in this experiment a comparison between CNNs and other methods, since the proposed work aims at improving the classification accuracy of the standard CNN. A detailed comparison of CNNs with state-of-the-art deep learning approaches for medical image analysis is presented in [15].
In this experiment, we use lung CT-images from the well-known LIDC database (Lung Image Database Consortium) [23]. The LIDC is an international web-accessible database that is widely used for the development, training, and evaluation of Computer-Aided Diagnosis (CAD) systems that target lung cancer detection and classification.
Each lesion is marked up by multiple experts. The coordinates of the lesion center (x, y) on the CT-image, as well as its radius, are provided in the database to help Medical Image Analysis researchers easily evaluate the systems they build. In this work, we use the center coordinates to extract patches from CT-images. Fig. 5 shows an example of different patches used to train and test the model.

TABLE II. TEST ACCURACY FOR BOTH METHODS, LABELED CNN AND DCT-CNN (OUR PROPOSED METHOD). THE TABLE ALSO SHOWS THE AVERAGE ACCURACY OVER THE WHOLE TRAINING PROCESS AND THE AVERAGE ACCURACY FROM THE MAXIMUM ACCURACY UNTIL THE END OF TRAINING (EOT), ALONG WITH THE AVERAGE LOSS AND THE AVERAGE LOSS FROM THE MINIMUM LOSS VALUE UNTIL THE END OF TRAINING.

In total, we have 8000 patches with a size of 32 × 32. We divide the obtained patches into 3 subsets: training, validation, and testing. The entries of the first subset are used by our model as labeled examples to learn from. The second subset is used to check the model performance while tuning its hyperparameters during the training process. Finally, the third subset is used to evaluate the final model fit.
In machine learning, an epoch is one complete pass in which all the training vectors are used once to update the model parameters; the number of epochs is the number of such passes.
In this experiment, we set the number of epochs to 15. The batch size is the number of samples processed together before the weights are updated, and this is repeated throughout each epoch. In this experiment, we set the batch size to 32.
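The relation between epochs, batch size, and weight updates can be made concrete with a short sketch. The training-set size of 5600 used below is purely hypothetical, since the paper does not report how the 8000 patches are divided among the three subsets; only the number of epochs (15) and the batch size (32) come from the text.

```python
import math

def updates_per_epoch(n_train, batch_size=32):
    """Number of weight updates needed to see every training sample once."""
    return math.ceil(n_train / batch_size)

def batches(indices, batch_size=32):
    """Yield consecutive groups of sample indices; the last group may be smaller."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]

N_TRAIN = 5600  # hypothetical training-set size, not reported in the paper
EPOCHS = 15     # value used in this experiment
BATCH = 32      # value used in this experiment

per_epoch = updates_per_epoch(N_TRAIN, BATCH)
total_updates = EPOCHS * per_epoch
print(per_epoch, total_updates)
```

Under this hypothetical split, each epoch performs 175 weight updates, for 2625 updates over the whole training run.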
For all the layers, we use ReLU as the activation function, except for the output layer, where we use SoftMax. The number of filters per convolution layer is 32, 64, and 64, respectively, all of the same size: 3 × 3.
In Max-Pooling, we use a 2 × 2 window, and in all Dropout operations we turn off 25% of the neurons, chosen at random. In Fig. 3 and 4, each graph consists of two curves: the blue one represents the evolution of the classification accuracy/loss of our proposed model after each epoch, while the orange one represents the evolution of the classification accuracy/loss of the conventional CNN. Fig. 3 represents the evolution of the classification accuracy of the two models: the conventional CNN and our proposed model. After each epoch, we evaluate the classification accuracy of both models using entries from the third subset of the LIDC database. These entries are used only for testing and not in the training process, to ensure the validity of the testing process. From Fig. 3 and 4, we can see that the proposed method outperforms the conventional CNN in terms of accuracy by over 4.73%. Table II provides more details about the experimental results. It shows the test accuracy of both the CNN and the proposed method. It also shows the average accuracy from the moment the model reached its maximum accuracy until the end of training.
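The summary statistics reported in Table II (the average over the whole run, and the average from the best value until the end of training) can be computed from per-epoch curves as sketched below. The function names and the accuracy and loss values are made up for illustration and are not the results of this experiment.

```python
def mean(values):
    return sum(values) / len(values)

def mean_from_peak(accuracies):
    """Average accuracy from the epoch of maximum accuracy until the end of training."""
    peak = accuracies.index(max(accuracies))
    return mean(accuracies[peak:])

def mean_from_minimum(losses):
    """Average loss from the epoch of minimum loss until the end of training."""
    lowest = losses.index(min(losses))
    return mean(losses[lowest:])

# Made-up per-epoch curves, NOT the values measured in this paper.
accuracy_curve = [0.70, 0.80, 0.90, 0.85, 0.88]
loss_curve = [0.90, 0.50, 0.30, 0.40, 0.35]

print(mean(accuracy_curve), mean_from_peak(accuracy_curve))
print(mean(loss_curve), mean_from_minimum(loss_curve))
```

The average taken from the peak onward indicates how stable a model remains after reaching its best accuracy, which the overall average alone does not show.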
We also included values of the classification accuracy from the 61st epoch to the 85th epoch of the experiment. From both the graphs and the results table, we can see a clear improvement in classification accuracy when using the Discrete Cosine Transform along with the Convolutional Neural Network, as it refines the information in the entries used for training. The final results show that the proposed method outperforms the conventional CNN by a good margin.

V. CONCLUSION
The main goal of this work was to evaluate the impact of the Discrete Cosine Transform (DCT) on classification accuracy when it is applied along with a Convolutional Neural Network (CNN) for lung nodule classification.
The proposed method uses the DCT to extract only the most relevant information in the patches before feeding them to the model as training data. The model architecture, which is also a keystone of the model accuracy, is described in detail along with all its parameters.
The proposed model is tested on the LIDC database, which is one of the most widely used datasets for lung nodule classification and detection. The proposed method outperforms the standard CNN in terms of accuracy by a good margin.
In this work, we demonstrated that the Discrete Cosine Transform can improve the accuracy of the conventional CNN by a good margin (in our experiment: over 4.73%) when it is applied to lung nodule classification in CT-images. In future work, the proposed method can be used as the last step that completes a CAD system: a real-world application that analyzes each lesion in an input CT-image and tells whether it is a lung nodule or not.