Lung Cancer Detection on CT Scan Images: A Review on the Analysis Techniques

Abstract: Lung nodules are potential manifestations of lung cancer, and their early detection facilitates early treatment and improves patients' chances of survival. For this reason, computer-aided diagnosis (CAD) systems for lung cancer have been proposed in several studies. Most of these works involve three main steps to detect pulmonary nodules: pre-processing, segmentation of the lung, and classification of the nodule candidates. This paper reviews the current state of the art of the approaches and techniques that have been investigated in the literature. It also provides a comparison of the performance of the existing approaches.


I. INTRODUCTION
Lung cancer (LC) is the second most common cancer in both men and women in Europe and the United States, and it represents a major economic issue for health care systems. Worldwide, it accounts for about 12.7% of all new cancer cases per year and 18.2% of cancer deaths: each year there are approximately 1,095,000 new cases and 951,000 deaths in men, and 514,000 new cases and 427,000 deaths in women [3].
Lung cancer is caused by the uncontrolled, abnormal growth of cells in lung tissue. These tissue abnormalities often appear as lung nodules: small, roughly spherical masses of tissue, usually about 5 to 30 millimeters in size. In general, nodules can be categorized into four groups [78][94][59]: juxta-vascular, well-circumscribed, pleural tail, and juxta-pleural. Figure 1 shows some examples of these categories. Pulmonary nodules can be an early-stage manifestation of lung cancer.
Studies have shown that the cure rate of this deadly cancer approaches 75% when it is detected early enough, because early-stage disease is easier to treat and carries fewer risks. Therefore, early diagnosis of malignant nodules is crucial for reducing morbidity and mortality. Computer-aided diagnosis (CAD) systems are efficient schemes that have been developed for the detection and characterization of various lesions in lung cancer diagnosis. Their main objective is to assist the radiologist in the different analysis steps and to offer a second opinion before the final decision. Researchers are therefore increasingly interested in developing automated CAD systems for lung cancer, and many publications have proposed automated nodule recognition systems based on image processing, using different techniques for segmentation, feature extraction and classification.

II. REVIEW OF EXISTING NODULE DETECTION METHODS
Several methods for the automated and semi-automated detection of pulmonary nodules have been proposed in the literature [59]. These works generally involve four steps: pre-processing, extraction of nodule candidates, reduction of false positives, and classification. Figure 2 shows these steps in detail.
The following subsections review the studies related to each of these steps.

A. Pre-processing
Computed tomography (CT) is considered one of the best methods for diagnosing pulmonary nodules [76]. It uses x-rays to obtain structural and functional information about the human body. However, CT image quality depends strongly on the radiation dose. Image quality improves as the radiation dose increases [15], but at the same time this increases the amount of x-rays absorbed by the patient. To limit radiation risks, radiologists must reduce the dose, which degrades image quality and introduces noise into lung CT images.
The pre-processing step aims to reduce this noise. Different filtering techniques have been proposed in the literature to remove it, such as median filtering [11][47], bilateral filtering [84] and a specific high-pass filter [32]. Many other works combine median and Laplacian filters through a differential technique, which subtracts a nodule-suppressed image (obtained with a median filter) from a signal-enhanced image (obtained with a Laplacian matched filter with a spherical profile) [34][35][16]. The resulting difference image, which contains the enhanced nodule signal, is then used in the subsequent stages.
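The differential technique can be sketched as follows. This is a minimal 2D sketch in Python with NumPy, under stated assumptions: a normalized disk-shaped averaging kernel stands in for the Laplacian matched filter with a spherical profile used in [34][35][16], and the helper names (`median_filter`, `disk_kernel`, `difference_image`) and default sizes are hypothetical choices for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative sketch: the disk template is an ASSUMED stand-in for the
# Laplacian matched filter; names and default sizes are hypothetical.


def median_filter(img, size):
    """Median over a size x size neighbourhood, with edge padding."""
    pad = size // 2
    windows = sliding_window_view(np.pad(img, pad, mode="edge"), (size, size))
    return np.median(windows, axis=(-2, -1))


def disk_kernel(radius):
    """Normalized disk template (2D stand-in for a spherical profile)."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    kernel = (x**2 + y**2 <= radius**2).astype(float)
    return kernel / kernel.sum()


def correlate(img, kernel):
    """Same-size 2D correlation with edge padding."""
    ry, rx = kernel.shape[0] // 2, kernel.shape[1] // 2
    padded = np.pad(img, ((ry, ry), (rx, rx)), mode="edge")
    windows = sliding_window_view(padded, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def difference_image(img, nodule_radius=3, median_size=11):
    """Nodule-enhanced image minus nodule-suppressed image."""
    img = np.asarray(img, dtype=float)
    enhanced = correlate(img, disk_kernel(nodule_radius))
    # A median window much larger than the nodule removes small round blobs
    # while keeping large homogeneous structures.
    suppressed = median_filter(img, median_size)
    return enhanced - suppressed
```

Small bright blobs keep a high value in the difference image, whereas large uniform regions cancel out, which is why such an image is a convenient input for candidate detection.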
In [84], the authors compare pre-processing methods based on various filters and suggest that the bilateral filter provides the best performance for pre-processing medical images. In addition, Bae et al. used a morphological filter to enhance the image region [8], whereas Ochs et al. [68] and Paik et al. [71] applied a spherical enhancement filter to enhance nodule-like structures in CT images.
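A bilateral filter replaces each pixel by a weighted mean of its neighbours, where the weight combines spatial closeness and intensity similarity, so edges between tissues of very different density are preserved. The sketch below is a minimal NumPy version, not the implementation evaluated in [84]; the function name and the default `radius`, `sigma_spatial` and `sigma_range` values are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative sketch; the function name and default parameters are assumptions.


def bilateral_filter(img, radius=2, sigma_spatial=2.0, sigma_range=10.0):
    """Edge-preserving smoothing over a (2*radius+1)^2 neighbourhood."""
    img = np.asarray(img, dtype=float)
    size = 2 * radius + 1
    windows = sliding_window_view(np.pad(img, radius, mode="edge"), (size, size))
    # Spatial (closeness) weights, identical for every pixel.
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(x**2 + y**2) / (2.0 * sigma_spatial**2))
    # Range (similarity) weights depend on the difference to the centre pixel.
    diff = windows - img[..., None, None]
    weights = spatial * np.exp(-diff**2 / (2.0 * sigma_range**2))
    return (weights * windows).sum(axis=(-2, -1)) / weights.sum(axis=(-2, -1))
```

With `sigma_range` small compared to the contrast between two tissues, pixels across the boundary receive almost no weight, so the edge stays sharp while noise inside homogeneous regions is averaged out.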
In [7], the authors state that adaptive median filtering is required to correct the poor contrast caused by poor lighting conditions during image acquisition. They generated a low-frequency image by replacing each pixel value with the median value computed over a 5x5 pixel neighbourhood. Then, contrast limited adaptive histogram equalization (CLAHE) is used to improve the contrast of the pre-processed CT image.
On the other hand, Farag et al. [27][28] insist that the chosen filtering approach must preserve object boundaries and detailed structures, sharpen discontinuities to enhance morphological structures, and efficiently remove noise in homogeneous physical regions. In their work, the authors used both Wiener and anisotropic diffusion filters.
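Anisotropic diffusion smooths homogeneous regions while slowing diffusion across strong gradients. As an illustration only, the sketch below implements the classic Perona-Malik scheme with an exponential conduction function on a 2D slice; it is not necessarily the variant used by Farag et al., and the parameter names and defaults (`n_iter`, `kappa`, `gamma`) are assumptions.

```python
import numpy as np

# Illustrative Perona-Malik sketch; parameter names and defaults are assumptions.


def conduction(grad, kappa):
    """Edge-stopping function: close to 0 across strong edges."""
    return np.exp(-(grad / kappa) ** 2)


def anisotropic_diffusion(img, n_iter=20, kappa=15.0, gamma=0.2):
    """Explicit 4-neighbour diffusion; gamma <= 0.25 keeps the scheme stable."""
    u = np.asarray(img, dtype=float).copy()
    for _ in range(n_iter):
        p = np.pad(u, 1, mode="edge")  # zero flux across the image border
        north = p[:-2, 1:-1] - u
        south = p[2:, 1:-1] - u
        east = p[1:-1, 2:] - u
        west = p[1:-1, :-2] - u
        u = u + gamma * sum(conduction(d, kappa) * d for d in (north, south, east, west))
    return u
```

`kappa` acts as the gradient scale separating noise from edges: differences well below it diffuse almost freely, while differences well above it are nearly blocked.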
More recently, other filters have been developed to enhance lung structures in 3D images. Many researchers have employed filters based on the eigenvalues of the Hessian matrix [42][51][75][60]. Frangi et al. [31] further developed this approach by defining a 3D multi-scale structure enhancement filter based on the eigenvalues of the Hessian matrix and applying it to vessel enhancement. Later, Rikxoort et al. were the first to propose a supervised enhancement approach based on single-phase and multi-phase methods [74]. In [92], the authors applied a set of 3D morphological filters to separate the nodule from surrounding structures such as vessels and bronchi.
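Hessian-based filters rely on the fact that, at an appropriate scale, a bright blob has two strongly negative Hessian eigenvalues in 2D (three in 3D), whereas a bright tubular vessel has only one. The 2D NumPy sketch below computes the eigenvalues of a Gaussian-smoothed image and derives a simple "dot" and "line" response; these response formulas are an assumption in the spirit of Hessian-based dot/line enhancement filters, not a reproduction of Frangi's vesselness measure [31], and all names are hypothetical.

```python
import numpy as np

# Illustrative 2D sketch; the dot/line formulas and all names are assumptions.


def gaussian_smooth(img, sigma):
    """Separable Gaussian smoothing with edge padding."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    p = np.pad(np.asarray(img, dtype=float), radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, p, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def hessian_eigenvalues(img, sigma):
    """Eigenvalues l1 <= l2 of the 2x2 Hessian at every pixel."""
    s = gaussian_smooth(img, sigma)
    dy, dx = np.gradient(s)
    dyy, dyx = np.gradient(dy)
    dxy, dxx = np.gradient(dx)
    off = 0.5 * (dxy + dyx)  # symmetrized mixed derivative
    mean = 0.5 * (dxx + dyy)
    root = np.sqrt((0.5 * (dxx - dyy)) ** 2 + off**2)
    return mean - root, mean + root


def dot_and_line(img, sigma=2.0):
    """Blob ('dot') and vessel ('line') responses for bright structures."""
    l1, l2 = hessian_eigenvalues(img, sigma)
    # Dot: both eigenvalues negative, strongest when they are similar.
    dot = np.where(l2 < 0, l2**2 / np.maximum(np.abs(l1), 1e-12), 0.0)
    # Line: one strongly negative eigenvalue, the other close to zero.
    line = np.where(l1 < 0, np.maximum(np.abs(l1) - np.abs(l2), 0.0), 0.0)
    return dot, line
```

In 3D the same idea applies to the three eigenvalues of the 3x3 Hessian, and multi-scale filters repeat the computation over several values of `sigma`.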

B. Segmentation
Segmentation of the lung regions is the second stage of the processing scheme. It refers to partitioning the pre-processed CT image into multiple regions in order to separate the pixels or voxels corresponding to lung tissue from the surrounding anatomy. The various approaches used for lung segmentation can be categorized into two main groups: 2D approaches and 3D approaches.

1) 2D-based approaches
In this section, we systematically review the state of the art of segmentation methods for lung CT images. Because of the large number of such methods, we have categorized them into five intuitive groups for easier comprehension: thresholding-based, stochastic, region-based, contour-based, and learning-based methods, as shown in Figure 3.
Thresholding is a simple segmentation technique that converts a gray-level image into a binary image by labeling all pixels greater than some value as foreground and all other pixels as background [7][11][10][17][26][8][76][99]. In [17], the authors separate nodule candidates from CT images using mathematical morphology and gray-level thresholding. In [7], the image histogram is used to find two threshold values; multilevel thresholding and connected-component labeling are then applied to the image to segment candidate nodule regions. Furthermore, simple thresholding that exploits the intensity characteristics of lung CT scans was presented by Farag et al. [27], El-Baz et al. [22], and Giger et al. [34] to separate the nodule candidates from the background image. Bae et al. [8] perform thresholding and seeded segmentation to isolate juxta-pleural nodules from other structures.
In the same context, Shao et al. [76] apply an adaptive iterative threshold method twice to obtain an initial segmentation of the pulmonary parenchyma. Zhou et al. [99], Wang et al. [85] and Retico et al. [73] implemented histogram-based thresholding to separate the lung region from the adjacent structures.
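Iterative threshold selection can be illustrated with the classic intermeans (Ridler-Calvard) iteration, used here as an assumed stand-in for the adaptive iterative threshold of [76]. The Python sketch below thresholds a synthetic slice, labels the dark connected components, and discards those touching the image border (the air outside the body). The function names, the 4-connectivity and the synthetic Hounsfield-like values are assumptions.

```python
from collections import deque

import numpy as np

# Illustrative sketch; the intermeans rule, 4-connectivity and all names
# are assumptions, not the exact procedure of the cited works.


def iterative_threshold(img, tol=0.5):
    """Intermeans iteration t <- mean of the two class means (non-constant image assumed)."""
    t = float(img.mean())
    while True:
        new_t = 0.5 * (img[img <= t].mean() + img[img > t].mean())
        if abs(new_t - t) < tol:
            return new_t
        t = new_t


def label_components(mask):
    """4-connected component labelling by breadth-first search."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        count += 1
        labels[start] = count
        queue = deque([start])
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = count
                    queue.append((ny, nx))
    return labels, count


def segment_lungs(img):
    """Dark components that do not touch the border are kept as lung fields."""
    t = iterative_threshold(img)
    labels, count = label_components(img <= t)
    edge_labels = np.concatenate([labels[0], labels[-1], labels[:, 0], labels[:, -1]])
    border = set(edge_labels.tolist())
    keep = [k for k in range(1, count + 1) if k not in border]
    return np.isin(labels, keep), t
```

Removing border-touching components is a common way to separate the lungs from the surrounding air, since both fall below the same threshold.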
Region-based segmentation methods are another distinct type of lung CT segmentation technique. These methods generally rely on image homogeneity to determine object boundaries. Region growing is the most widely used of them: starting from initial seed points, it examines the neighbouring pixels, decides whether each one should be added to the region, and iterates until no further pixel qualifies. The object of interest must therefore have nearly constant or slowly varying intensity values to satisfy the homogeneity requirement, a condition that generally holds for lung parenchyma in CT images.
Region growing was explored by Aggarwal et al. [1], Lee et al. [59], and Taher et al. [80] for lung tissue segmentation. By combining region growing with morphological closing, Lin and Yan [62] and Lin et al. [63] succeeded in filling the large indentations caused by blood vessels, which could not be extracted by thresholding.
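Region growing with a homogeneity test, followed by morphological closing, can be sketched as follows. The Python sketch assumes a running-mean criterion with an absolute tolerance `tol`, 4-connectivity and a 3x3 square structuring element; these choices and the function names are illustrative assumptions rather than the exact settings of [62][63]. In the synthetic example, a thin bright vessel cuts a notch into the lung boundary; the grown region excludes it and closing fills it again.

```python
from collections import deque

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative sketch; the homogeneity rule, connectivity and structuring
# element are assumptions.


def region_grow(img, seed, tol):
    """Add 4-neighbours whose value is within tol of the current region mean."""
    mask = np.zeros(img.shape, dtype=bool)
    mask[seed] = True
    total, count = float(img[seed]), 1
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < img.shape[0] and 0 <= nx < img.shape[1] and not mask[ny, nx]:
                if abs(img[ny, nx] - total / count) <= tol:
                    mask[ny, nx] = True
                    total += float(img[ny, nx])
                    count += 1
                    queue.append((ny, nx))
    return mask


def dilate(mask):
    """Binary dilation with a 3x3 square."""
    p = np.pad(mask, 1, constant_values=False)
    return sliding_window_view(p, (3, 3)).any(axis=(-2, -1))


def erode(mask):
    """Binary erosion with a 3x3 square (outside the image counts as foreground)."""
    p = np.pad(mask, 1, constant_values=True)
    return sliding_window_view(p, (3, 3)).all(axis=(-2, -1))


def close(mask):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask))
```

Wider indentations need a larger structuring element (or repeated dilations followed by the same number of erosions).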
Apart from region growing, many other methods involving textural features have been implemented in recent years [18][20][28]. In [28] and [79], local binary patterns (LBP) were used as textural features, and regions of interest (ROIs) were characterized by combining them with intensity histograms. Devaki et al. used the SURF and LBP descriptors to generate features that describe the texture of common lung nodules [20].
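The basic 8-neighbour local binary pattern compares each pixel with its neighbours and packs the results into an 8-bit code; the histogram of codes over a region of interest is then a texture descriptor. The Python sketch below concatenates that histogram with an intensity histogram, loosely mirroring the combination described for [28] and [79]. The bin count, the intensity range and the function names are assumptions, and rotation-invariant or uniform LBP variants are not shown.

```python
import numpy as np

# Illustrative sketch; bin count, value range and names are assumptions.
# Neighbour offsets in clockwise order, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img):
    """8-bit LBP code for every interior pixel (neighbour >= centre sets the bit)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.int64) << bit
    return codes


def roi_descriptor(roi, n_intensity_bins=16, value_range=(-1000.0, 400.0)):
    """Normalized LBP histogram (256 bins) followed by a normalized intensity histogram."""
    codes = lbp_codes(roi)
    lbp_hist = np.bincount(codes.ravel(), minlength=256) / codes.size
    int_hist, _ = np.histogram(roi, bins=n_intensity_bins, range=value_range)
    return np.concatenate([lbp_hist, int_hist / max(int_hist.sum(), 1)])
```

Such a vector can be fed directly to any of the classifiers discussed in the classification section.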
Stochastic methods exploit the statistical differences between the structures present in lung images. They attempt to fit the distribution of intensity values in an image with a set of statistical functions: each function defines a class, and its output gives the probability that an intensity value belongs to that class. This approach was used by Guo et al., who developed a lung segmentation method combining expectation-maximization (EM) analysis with morphological operations [40]. After computing the image histogram, the authors apply the EM algorithm to estimate the appropriate threshold value for lung segmentation.
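EM-based threshold selection can be sketched by fitting a two-component 1D Gaussian mixture to the intensities and placing the threshold where the two weighted component densities cross. The Python sketch below does this on synthetic intensities; the initialization, the fixed iteration count and the crossing-point rule are assumptions, not necessarily the choices of Guo et al. [40], and the morphological post-processing is omitted.

```python
import numpy as np

# Illustrative sketch; initialization, iteration count and threshold rule
# are assumptions.


def weighted_densities(x, w, mu, var):
    """w_k * N(x | mu_k, var_k) for every sample (rows) and component (columns)."""
    x = np.asarray(x, dtype=float)[:, None]
    return w / np.sqrt(2.0 * np.pi * var) * np.exp(-((x - mu) ** 2) / (2.0 * var))


def em_two_gaussians(x, n_iter=100):
    """Fit weights, means and variances of a two-component 1D mixture."""
    x = np.asarray(x, dtype=float).ravel()
    mu = np.percentile(x, [25, 75])  # crude initialization
    var = np.full(2, x.var())
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        dens = weighted_densities(x, w, mu, var)
        resp = dens / dens.sum(axis=1, keepdims=True)  # E-step: responsibilities
        nk = resp.sum(axis=0)                          # M-step: update parameters
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var


def em_threshold(x):
    """Intensity between the two means where the weighted densities cross."""
    w, mu, var = em_two_gaussians(x)
    order = np.argsort(mu)
    grid = np.linspace(mu[order[0]], mu[order[1]], 2001)
    dens = weighted_densities(grid, w, mu, var)
    diff = dens[:, order[0]] - dens[:, order[1]]
    return float(grid[np.argmax(diff <= 0)]), mu
```

The crossing point is the intensity at which the posterior probabilities of the two classes are equal, which is the Bayes decision boundary for the fitted mixture.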
Another segmentation technique was proposed by El-Baz et al. [22]. It isolates the lungs from the surrounding structures using a Gibbs-Markov random field (GMRF) model. In the next step, abnormalities in the lungs are detected using adaptive template matching and a genetic algorithm.
Contour-based methods are used to identify object boundaries in CT images. They can be divided into two groups: deformable models and gradient-based methods. Deformable models were implemented in [47] and [49] to segment nodules. In particular, Kim et al. [49] use a set of segmentation methods, such as thresholding, mathematical morphology and a deformable model, to detect the lung region. Bellotti et al. [9] employed region growing with contour following to isolate juxta-pleural nodules. Zhao et al. [95] improved shape-based segmentation using nodule gradient and sphere occupancy measurements.
In [77], the segmentation algorithm is based on Sobel edge detection and is used to detect cancer nodules in the extracted lung image, whereas in [44] the snake algorithm is used to extract the nodule boundaries. Later, Tariq et al. used a method based on the gradient mean and variance to extract the lung background, since the gradient operator has high values for pixels on the boundary between foreground and background [82].
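The gradient-based idea can be sketched as follows: compute the Sobel gradient magnitude and flag as boundary the pixels whose gradient exceeds the image mean by a multiple of the standard deviation. The Python sketch below is an assumed simplification of such a mean-and-variance criterion (the factor `k` and the function names are hypothetical), not the exact procedure of Tariq et al. [82].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative sketch; the mean + k * std rule and all names are assumptions.
SOBEL_X = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def sobel_magnitude(img):
    """Gradient magnitude from 3x3 Sobel kernels (edge padding)."""
    padded = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))
    gx = np.einsum("ijkl,kl->ij", windows, SOBEL_X)
    gy = np.einsum("ijkl,kl->ij", windows, SOBEL_Y)
    return np.hypot(gx, gy)


def boundary_mask(img, k=1.0):
    """Pixels whose gradient magnitude exceeds mean + k * std."""
    g = sobel_magnitude(img)
    return g > g.mean() + k * g.std()
```

Because the statistics are computed from the image itself, the boundary threshold adapts to the overall contrast of each slice.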
Learning-based methods, also known as knowledge-based methods, use pattern recognition techniques to statistically estimate dependencies in the image. They aim to represent knowledge about lung cancer in a form that the computer can process [58][90][4][45]. Leader et al. [58] developed a heuristic threshold-based scheme for initial lung segmentation and then applied a rule-based process to correct the result. In [4], the authors propose an anatomical model in the form of a semantic network whose nodes are the anatomical structures of the lungs. Each node contains information about a specific anatomical part: its position relative to other structures and its gray level. The authors then describe these features with fuzzy sets.
In the same context, a rule-based technique is applied in [77], where a set of diagnosis rules is generated from the extracted features. In [19], Dehmeshki et al. proposed using a fuzzy map to improve the contrast between nodules and surrounding structures, such as blood vessels.
In [45], Jaafar et al. implemented a genetic algorithm procedure to segment the lung from the original image, then used morphology and the SUSAN thinning algorithm to detect the lung edges. In [90], the authors present an intelligent medical system for lung cancer cell identification based on a two-layer, rule-based fuzzy knowledge model.

2) 3D-based approaches
Several approaches to volumetric lung nodule segmentation exist in the literature. They can be classified into five categories: thresholding [96], mathematical morphology, region growing, deformable models, and dynamic programming, as shown in Figure 4.
The thresholding approach was adopted by Zhao et al. [96] and Yankelevitz et al. [91][92], who derive the appropriate threshold values either by K-means clustering [91][92] or from average gradient magnitudes [96].
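Threshold selection by average gradient magnitude can be sketched as scanning candidate thresholds and keeping the one whose segmented surface lies on the strongest edges. In the Python sketch below, the candidate grid, the 6-connected definition of the surface, the function names and the synthetic sphere with a sigmoid edge are assumptions made for illustration; the sketch is not claimed to reproduce the exact criterion of [96].

```python
import numpy as np

# Illustrative 3D sketch; candidate grid, surface definition and names are
# assumptions.


def inner_surface(mask):
    """Foreground voxels with at least one 6-neighbour in the background."""
    p = np.pad(mask, 1, constant_values=False)
    interior = mask.copy()
    core = (slice(1, -1),) * 3
    for axis in range(3):
        for shift in (-1, 1):
            interior &= np.roll(p, shift, axis=axis)[core]
    return mask & ~interior


def best_threshold(vol, thresholds):
    """Threshold maximizing the mean gradient magnitude on the segmented surface."""
    vol = np.asarray(vol, dtype=float)
    grad = np.sqrt(sum(g**2 for g in np.gradient(vol)))
    scores = []
    for t in thresholds:
        surface = inner_surface(vol > t)
        scores.append(grad[surface].mean() if surface.any() else 0.0)
    return thresholds[int(np.argmax(scores))], scores
```

On a blurred nodule, thresholds that are too low or too high place the surface in the flat tails of the edge profile, so the score peaks near the steepest part of the transition.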
Mathematical morphology has also been used to detect lung nodules in 3D CT images. Kostis et al. [52,53] and Kuhnigk et al. [56,57] proposed effective iterative approaches to binary morphological filtering with various combinations of the basic operators. Okada et al. [69] presented a data-driven method that determines an ellipsoidal structuring element from anisotropic Gaussian fitting. Fetita et al. [30] proposed a new gray-level mathematical morphology operator to discriminate volumetric lung nodules from other dense structures. In [38], Goodman et al. segmented lung nodules using the watershed algorithm followed by a model-based analysis.
On the other hand, more recent studies [19][21][54,55] used region growing as the main component of their overall segmentation algorithms. Dehmeshki et al. [19] proposed an adaptive region growing scheme on a fuzzy connectivity map computed from a previously segmented image. Diciotti et al. [21] also proposed a modified region growing algorithm based on geodesic distances. Kubota et al. [54,55] used the same concept but with a Euclidean distance map. Later, Gong et al. segmented the lung lobes with a 3D region growing algorithm and then extracted a number of regions of interest using Otsu's thresholding algorithm [37]. Graph cuts is another well-known region-based segmentation technique: Zheng et al. [97,98] applied graph cuts to derive the initial 2D nodule segmentation in their coupled segmentation-registration method with B-spline registration.
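The binary morphological filtering used by Kostis et al. and Kuhnigk et al. can be illustrated with a single opening by a ball-shaped structuring element: any structure narrower than the ball, such as a thin vessel attached to a nodule, disappears, while the roughly spherical nodule survives. The NumPy sketch below uses a ball of radius 2 voxels on a synthetic volume; the radius, the helper names and the geometry are assumptions, and the iterative operator combinations of the cited works are not reproduced.

```python
import numpy as np

# Illustrative sketch; structuring-element radius and names are assumptions.


def ball_offsets(radius):
    """Integer offsets (dz, dy, dx) inside a ball of the given radius."""
    span = range(-radius, radius + 1)
    return [(z, y, x) for z in span for y in span for x in span
            if z * z + y * y + x * x <= radius * radius]


def morph(mask, offsets, erode):
    """Erosion (AND) or dilation (OR) over shifted copies; symmetric element assumed."""
    r = max(max(abs(c) for c in o) for o in offsets)
    p = np.pad(mask, r, constant_values=False)
    nz, ny, nx = mask.shape
    out = np.ones(mask.shape, dtype=bool) if erode else np.zeros(mask.shape, dtype=bool)
    for dz, dy, dx in offsets:
        shifted = p[r + dz:r + dz + nz, r + dy:r + dy + ny, r + dx:r + dx + nx]
        out = (out & shifted) if erode else (out | shifted)
    return out


def opening(mask, radius=2):
    """Erosion followed by dilation with a ball structuring element."""
    offsets = ball_offsets(radius)
    return morph(morph(mask, offsets, erode=True), offsets, erode=False)
```

The radius trades off vessel removal against loss of small nodules: a ball larger than the nodule would remove the nodule as well.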
Deformable models are widely applied for 3D segmentation. They were first implemented by Kawata et al. [47,48], who adopted the geodesic active contours approach introduced in [14]. El-Baz et al. [23,24] adopted an energy minimization approach when designing an appearance model to segment 3D lung nodules. Farag et al. [29] proposed a level set solution with an adaptive prior probability term for nodule segmentation. Yoo et al. [93] adopted the multiphase level set framework introduced in [83] to present an asymmetric segmentation method for partially solid nodules. Active contours are also widely used in the image segmentation research community. In this context, Way et al. [88] proposed an explicit active contour method that minimizes an energy taking into account the 3D gradient and curvature, and penalizes contours that grow against the chest wall.
Dynamic programming is another well-known technique for detecting optimal contours in images, and several methods extend it to 3D surface detection. In Wang et al. [87], a set of 2D dynamic programming iterations is applied to successive slices along the third dimension. In [86], the authors transform the 3D spherical volume into a 2D polar coordinate system before applying the standard 2D dynamic programming algorithm, in order to detect the 3D lesion boundary.
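The polar-coordinate dynamic programming idea can be sketched in 2D: resample the image along rays from an assumed centre, use the radial intensity derivative as a cost, and find the minimum-cost path over angles under a smoothness constraint of at most one radial step between consecutive rays. The Python sketch below only illustrates this principle and is not the 3D method of [86] or [87]; nearest-neighbour sampling, the cost definition and the absence of a closure constraint (the first and last rays are not forced to match) are simplifying assumptions, and the function names are hypothetical.

```python
import numpy as np

# Illustrative 2D sketch; sampling, cost and names are assumptions.


def to_polar(img, center, n_angles=64, n_radii=20):
    """Nearest-neighbour samples along rays: result[angle, radius]."""
    cy, cx = center
    theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    r = np.arange(n_radii)
    ys = np.rint(cy + np.sin(theta)[:, None] * r).astype(int)
    xs = np.rint(cx + np.cos(theta)[:, None] * r).astype(int)
    ys = np.clip(ys, 0, img.shape[0] - 1)
    xs = np.clip(xs, 0, img.shape[1] - 1)
    return img[ys, xs]


def dp_boundary(cost, max_step=1):
    """Minimum-cost path r(angle) with |r[i] - r[i-1]| <= max_step."""
    n_angles, n_radii = cost.shape
    acc = np.asarray(cost, dtype=float).copy()
    back = np.zeros((n_angles, n_radii), dtype=int)
    for i in range(1, n_angles):
        for r in range(n_radii):
            lo, hi = max(0, r - max_step), min(n_radii, r + max_step + 1)
            j = lo + int(np.argmin(acc[i - 1, lo:hi]))
            back[i, r] = j
            acc[i, r] += acc[i - 1, j]
    path = np.zeros(n_angles, dtype=int)
    path[-1] = int(np.argmin(acc[-1]))
    for i in range(n_angles - 1, 0, -1):  # backtrack from the last ray
        path[i - 1] = back[i, path[i]]
    return path
```

For a bright nodule on a darker background, `np.diff(polar, axis=1)` is most negative where intensity drops outward, so minimizing it places the path on the boundary; `path[i]` is the index between radii `path[i]` and `path[i] + 1` on ray `i`.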
According to Diciotti et al. [21], segmentation algorithms should be evaluated on large public databases with a well-defined ground truth. Several existing studies used private databases, so a performance comparison between the various methods is limited [59]. A nodule usually appears in several slices of a CT scan. In 2D methods, the slice in which the nodule appears largest is selected for the analysis that differentiates benign from malignant nodules. Compared with 2D methods, the extra dimension considerably increases the operational complexity and computational cost of processing the entire 3D nodule volume. Thus, to reduce both the computational cost and the radiation dose, the study in [64] distinguishes between benign and malignant nodules using a 2D approach applied to a single post-contrast CT scan.
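Selecting the slice in which the nodule appears largest is straightforward once a 3D nodule mask is available: count the foreground pixels in each axial slice and keep the maximum. The sketch assumes a boolean mask indexed as [slice, row, column]; the function name is hypothetical.

```python
import numpy as np

# Illustrative sketch; the [slice, row, column] layout and the name are assumptions.


def largest_cross_section(mask3d):
    """Index of the axial slice with the largest nodule area, plus all areas."""
    mask3d = np.asarray(mask3d, dtype=bool)
    areas = mask3d.reshape(mask3d.shape[0], -1).sum(axis=1)
    return int(np.argmax(areas)), areas
```

Only that slice is then passed to the 2D feature extraction and classification stages.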

C. Nodule extraction and classification
Lung nodule detection aims to locate nodules, if any are present. The most widely proposed approach is detection by classification and clustering, which comprises four categories: fuzzy and neural network methods, k-nearest neighbours, support vector machines, and linear discriminant analysis, as shown in Figure 5.
Fuzzy rules were first designed by Brown et al. [13], who developed a knowledge-based, fully automated method for segmenting volumetric chest CT images. The method uses a modular architecture consisting of an anatomical model, image processing routines, and an inference engine. Later, Li et al. [61] and Dehmeshki et al. [19] implemented automated rule-based classifiers to distinguish nodules from non-nodules. The same approach was also adopted by Kostis et al. [52], Bong et al. [12] and Hosseini et al. [43]. In [12], Bong et al. propose a fuzzy hybrid scatter search for the segmentation of lung CT images in order to detect lung nodules; the method combines fuzzy clustering with evolutionary optimization over a population of solutions. Later, in [43], the authors employed two fuzzy methods for lung nodule CAD: the Mamdani and Sugeno fuzzy logic models. Their classification results were compared through ROC curve analysis and root mean squared error.
Artificial neural networks were employed by Arimura et al. [6] for lung nodule detection. Retico et al. identify the pleural region using directional-gradient concentration (DGC) and morphological opening; features are then extracted and the candidate nodules are classified with a feed-forward neural network [73]. A two-level convolutional neural network was proposed in Lin et al. [86]. Lin and Yan [62] and Lin et al. [63] combined fuzzy logic and neural networks for lung nodule detection and reported that the combination was superior to rule-based, convolutional neural network, and genetic algorithm template matching approaches. Also, Antonelli et al. [4] adopted a decision fusion technique to develop a computer-aided detection (CAD) system for the automatic detection of pulmonary nodules in low-dose CT images. In the classification stage, they built multi-classifier systems that aggregate the decisions of a four-layer feed-forward neural network and a decision tree.
Recently, Akram et al. implemented a novel automated pulmonary nodule detection system using artificial neural networks based on hybrid features consisting of 2D and 3D geometric and intensity-based statistical features [2].
A nearest-cluster method was used by Ezoe et al. [25] and Tanino et al. [81] to classify the detected nodule candidates. Zhao et al. [96] applied boosting of the k-nearest-neighbour (KNN) classifier to estimate the probability density function of the intensity values of the training ground-glass opacity nodules. In [50], Kockelkorn et al. designed a user-interactive framework for lung segmentation with a KNN classifier. Later, Mabrouk et al. [66] extracted a total of 22 image features from the enhanced CT image, used Fisher score ranking to select the best ten features, and performed classification with a KNN classifier.
Support vector machines (SVM) were used by Ginneken [36] to classify nodule feature vectors, and by Lu et al. to classify volumetric lung cancer data using machine learning [65]. In 2013, Orozco et al. [70] presented a computational alternative for classifying lung nodules in the frequency domain using SVMs. In the same year, Javed et al. proposed a new weighted SVM classifier to increase the accuracy of a lung tumour classification system [46].
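The Fisher score ranking and KNN classification scheme attributed to [66] can be sketched as follows. The Python sketch uses the common Fisher score definition (between-class scatter of a feature divided by its within-class scatter) and a plain Euclidean k-nearest-neighbour majority vote; the synthetic 22-dimensional data, `k=3` and the helper names are assumptions, and no claim is made that they match the original settings.

```python
import numpy as np

# Illustrative sketch; score definition, k and names are assumptions.


def fisher_scores(X, y):
    """Per-feature ratio of between-class to within-class scatter."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    mu = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / np.maximum(within, 1e-12)


def select_features(X, y, n):
    """Indices of the n features with the highest Fisher score."""
    return np.argsort(fisher_scores(X, y))[::-1][:n]


def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training samples (integer labels 0..C-1)."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])
```

Ranking features before KNN matters because Euclidean distances are easily dominated by uninformative dimensions.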
The linear discriminant analysis (LDA) classifier was employed by Gurcan et al. [41] and Armato et al. [5] to reduce the false positives produced by a rule-based classifier. Ge et al. [33] added a new feature based on the 3D gradient field to the LDA classifier to further reduce the false positives of Gurcan et al. [41]. In addition, Matsumoto et al. [67] implemented the same classifier with eight features to identify the candidate nodules. Kim et al. [49] classified ground-glass opacity nodules using LDA based on the Mahalanobis distance distribution.
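For two classes (nodule versus non-nodule), Fisher's linear discriminant projects each feature vector onto w = S_W^-1 (mu_1 - mu_0), where S_W is the pooled within-class covariance; with equal priors, the decision threshold lies midway between the projected class means. The NumPy sketch below illustrates this rule on synthetic data; the function names and the data are assumptions, and the feature sets of the cited works are not reproduced.

```python
import numpy as np

# Illustrative two-class sketch; equal priors and names are assumptions.


def fit_lda(X, y):
    """Two-class Fisher LDA with pooled covariance and equal priors."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = ((len(X0) - 1) * np.cov(X0, rowvar=False)
              + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X) - 2)
    w = np.linalg.solve(pooled, mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1)  # threshold midway between projected means
    return w, b


def predict_lda(X, w, b):
    """Label 1 (e.g. nodule) when the discriminant score is positive."""
    return (np.asarray(X, dtype=float) @ w + b > 0).astype(int)
```

With a shared covariance and equal priors, a positive score means the sample is closer to the class-1 mean in Mahalanobis distance, which relates this rule to Mahalanobis-distance criteria such as the one mentioned for [49].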

III. CONCLUSION
This review gives an overview of current detection techniques for lung CT images, which may help researchers choose a suitable method. Lung analysis techniques have clearly improved over the last decade. However, issues remain to be solved: new and better contrast enhancement techniques still need to be developed, and better criteria for performance evaluation are also needed.

Fig. 2. The general scheme of a lung nodule detection system

Fig. 3. 2D-based segmentation methods for lung CT images

Fig. 5. An overview of the nodule classification methods