Systematic Review of Deep Learning Techniques for Lung Cancer Detection

Abstract: Cancer is a leading cause of death across the globe. According to the WHO, 10 million people died of cancer in 2020, and lung cancer alone accounted for 2.21 million new cases and 1.80 million deaths. Lung cancer arises from the uncontrolled multiplication and growth of lung cells. In this context, exploiting technological innovations for the automatic early detection of lung cancer is of paramount importance. Significant progress has been made towards this end, and deep learning models such as the Convolutional Neural Network (CNN) have been found to be superior in processing lung CT or MRI images for disease diagnosis. Detecting lung cancer in its early stages allows better treatment and improves the chance of a cure. In this paper, we present a systematic review of deep learning methods for the detection of lung cancer. It covers peer-reviewed journal and conference papers from 2012 to 2021. The literature review synthesizes existing methods covering machine learning (ML), deep learning and artificial intelligence (AI). It provides insights into different deep learning methods in terms of their pros and cons and arrives at possible research gaps. This paper informs the reader about different aspects of lung cancer detection, which can trigger further research towards models that can be used in the Clinical Decision Support Systems (CDSSs) required by healthcare units.


I. INTRODUCTION
Cancer is an abnormal growth of cells in the human body. According to the WHO, there are different kinds of cancer, including breast cancer, lung cancer, colon and rectum cancer, skin cancer, prostate cancer, and stomach cancer, and lung cancer is the leading cause of cancer deaths across the world. Cancer incidence in 2020 was high, with 2.26 million breast cancer cases, 2.21 million lung cancer cases, 1.93 million colon and rectum cancer cases, 1.41 million prostate cancer cases, 1.20 million skin cancer cases and 1.09 million stomach cancer cases. These statistics show the alarming nature of the disease across the globe. There are many risk factors for cancer. They include tobacco, alcohol, unhealthy diet, air pollution and physical inactivity. Some chronic diseases also play a role in causing cancer. The major challenge is to detect lung cancer at an early stage and so increase the survival rate of lung cancer patients. In this context, technological innovations for automatic cancer diagnosis need to be understood. With advancements in artificial intelligence (AI) based approaches such as machine learning and deep learning, it is important to consider them for the automatic diagnosis of lung cancer. Since a number of training samples are available and more will become available in the future, supervised learning is suitable for automatic lung cancer detection. Different methods have come into existence based on image processing, machine learning, deep learning, and artificial intelligence (AI). The existing methods are investigated in this paper through a systematic review. ML-based approaches such as [2], [3], [4], [5] and [9], to mention a few, are representative approaches in which supervised learning is used for lung cancer detection. Many deep learning methods are explored in [6], [7], [8], [10] and [11], to mention a few, with different advanced learning models and optimizations for improving prediction performance. Our contributions in this paper are as follows.
We present a systematic review of methods used for lung cancer detection. It is backed by a methodology and by the results for a set of research questions. It offers various insights and identifies research gaps. The remainder of the paper is structured as follows. Section II reviews the literature on existing lung cancer prediction methods. Section III presents the research methodology used for the systematic review. Section IV provides the results for the different research questions. Section V concludes the paper and gives directions for future research.

II. LITERATURE REVIEW
This section reviews literature on deep learning methods used for lung cancer detection. It also covers machine learning methods, imaging techniques and performance metrics besides important research gaps.

A. Machine Learning Methods
Machine learning techniques with supervised learning are used for lung cancer detection. Bhatia et al. [2] proposed a methodology to diagnose lung cancer using CT images. Their methodology includes Random Forest (RF) and XGBoost methods along with an ensemble learning approach. Radhika and Nair [4] explored different ML models such as Logistic Regression (LR), Decision Tree (DT), Support Vector Machine (SVM) and Naïve Bayes. Pradhan and Chawla [5] used ML models along with a novel architecture for realizing the Medical Internet of Things (MIoT). Rehman et al. [9] used ML models like SVM and ANN to detect the disease from lung CT scans. Jenipher and Radhika [14] explored feature extraction and selection with ML models for early prediction of lung cancer. Banerjee et al. [16] studied classification models like SVM and RF for prediction of early-stage lung cancer. Joshua et al. [17] investigated different ML techniques suitable for image processing leading to lung cancer prognosis. Chaturvedi et al. [19] studied ML models for detection and classification of lung cancer using MRI, X-Ray, and CT-scan imagery. Raoof et al. [26] proposed an ML-based framework made up of SVM, ANN and other models for lung cancer research. Shanthi and Rajkumar [27] proposed a feature selection method based on Stochastic Diffusion Search (SDS) to improve ML performance in lung cancer classification.
*Corresponding Author www.ijacsa.thesai.org
Singh and Gupta [31] used X-Ray, CT and MRI imagery for lung cancer detection with the help of ML models. DICOM CT images are used by Dev et al. [34] with ML models to detect lung cancer. Saba [38] investigated recent advancements in ML models for lung cancer research, using different imaging techniques as proof of concept. Thallam et al. [40] explored ML models like ANN, KNN, RF and SVM for early prediction of lung cancer. ML and isomiR expression methods are combined by Liao et al. [46] for cancer diagnosis. Pawar et al. [49] used ML models and image processing techniques to detect lung cancer. Chabon et al. [50] investigated genomic features for early lung cancer detection with a non-invasive approach. Selvathi and AarthyPoornila [55] studied different ML models such as SVM and KNN for cancer research. Lalitha [60] used ML models for automatic detection of lung cancer. Gupta et al. [62] investigated supervised ML techniques to detect cancers. Kumar and Rao [63] carried out research similar to that of [62].
Katiyar and Singh [66] compared ML models and their efficiency in the cancer detection process. Wang et al. [67] focused on pathology analysis of lung cancer images using AI approaches. Houby [68] discussed disease management techniques associated with ML. Gang et al. [69] proposed a dimensionality reduction method along with deep learning to analyse chest X-Rays for detection of lung cancer. Mukherjee and Bohra [70] used ML approaches for disease prediction. Hussain et al. [71] presented different feature extraction methods to improve the capability of ML models in disease prediction. Bankar et al. [76] performed symptom analysis with a data-driven approach using ML techniques for early detection of lung cancer.

B. Deep Learning Methods
Deep learning methods are based on neural networks. They are widely used for processing images such as lung CT and MRI. Tekade and Rajeswari [6] used deep learning models such as VGG and the U-net architecture for lung cancer classification. Shakeel et al. [7] proposed the "Improved Profuse Clustering Technique (IPCT)" using CT images for lung cancer detection. Shakeel et al. [8] used an improved deep neural network and ensemble learning for automatic detection of the disease. Das and Majumder [10] explored different methods linked to deep learning practices for lung cancer detection. Kalaivani et al. [12] used a CNN-based model known as DenseNet with CT imagery for disease diagnosis. Shin et al. [13] used a deep learning approach for spectroscopic analysis towards lung cancer diagnosis. Ibrahim et al. [18] used CT scan images to investigate chest diseases such as Covid-19, lung cancer and pneumonia. Their methodology is based on pre-trained CNN-based deep learning models such as ResNet152V2 and VGG19. Cherukuri et al. [21] based their study on clinical tomography exercises. Lakshmana Prabu et al. [22] proposed a deep learning approach based on techniques known as "Optimal Deep Neural Network (ODNN) and Linear Discriminant Analysis (LDA)." Selvathi and Poornima [23] proposed deep learning methods for medical data analysis using CT and MRI imagery with the notion of a Region of Interest (ROI).
Elnakib and Amer [24] proposed a deep learning method based on VGG16 and AlexNet along with an optimization technique based on a Genetic Algorithm (GA) to classify lung nodules from CT images. Wang et al. [25] presented a weakly supervised deep learning approach for classifying lung images. They exploited a Fully Convolutional Network (FCN) to realize an automated detection system. Liu et al. [28] explored deep reinforcement learning approaches to lung cancer detection in the presence of Medical IoT. Sajja et al. [29] investigated deep transfer learning using CT images for diagnosis of lung cancer. Schwyzer et al. [30] used deep neural networks with CT images for automatic lung cancer detection. Ardila et al. [32] used 3D deep learning with end-to-end screening of CT images for lung cancer prognosis. Avanzo et al. [33] focused on combining deep learning with radiomics for efficient detection of lung cancer. Hashemzadeh et al. [35] investigated automatic cancer screening applications using deep learning models. Kancherla and Mukkamala [36] proposed a novel methodology for lung cancer diagnosis that exploits features associated with nucleus segmentation. Xu et al. [37] explored serial medical imaging with deep learning for prediction of lung cancer.
Doppalapudi et al. [39] used deep learning approaches to predict the survival period of lung cancer patients. Coudray et al. [41] proposed a deep learning-based approach for classification and mutation prediction linked to lung cancer. Munir et al. [43] explored various ML and deep learning methods for prognosis of different kinds of cancers. Nasrullah et al. [44] used deep learning techniques along with multiple strategies to detect lung nodules and classify them. Mhaske et al. [45] proposed a deep learning algorithm that analyses lung CT images to find the presence of cancer. Hua et al. [47] focused on lung nodule classification using deep learning approaches. Subramanian et al. [48] proposed a deep learning framework using CNN-based methods such as AlexNet, LeNet and VGG16. Kumar and Bakariya [51] used deep learning methods to find the presence of malignant cancers in CT images. Kriegsmann et al. [52] used deep learning models to classify and differentiate non-small cell lung cancer from small cell lung cancer. Yang et al. [53] carried out a retrospective study of whole slide images with deep learning for multi-class classification of lung cancer. Similarly, Hosny et al. [54] carried out a cohort radiomics study with deep learning methods for lung cancer prognosis.
Jena et al. [56] proposed a hybrid model based on deep learning, named DGMM-RBCNN, for detection and classification of lung cancer. Pham et al. [57] proposed a two-step deep learning approach for lung cancer detection from histopathological images. Gordienko et al. [58] used chest X-Ray images along with deep learning for lung segmentation to diagnose lung cancer. Sun et al. [59] exploited ROI-based approaches and automatic feature selection using deep learning for lung cancer diagnosis. Fang [61] developed a hybrid approach using deep learning, transfer learning, GoogLeNet and features linked to median intensity projections. Ponnada and Srinivasu [64] proposed an efficient CNN model based on deep learning for lung cancer prognosis. Cha et al. [65] used deep learning and chest radiographs to detect lung cancer. Hatuwal and Thapa [72] used histopathological imagery with deep learning models such as CNN for lung cancer prediction. Sungheetha et al. [73] carried out a comparative study of deep learning models, while lung CT image segmentation is explored by Brahim et al. [74]. Salaken et al. [75] proposed a methodology for extracting features from low-population datasets using deep learning for lung cancer diagnosis. Lustberg et al. [77] proposed an automatic contouring technique with deep learning for clinical evaluation. A lung abnormality detection method based on deep learning is proposed in [78] for automatic lung cancer detection. Masood et al. [79] proposed a methodology for pulmonary cancer detection with CT imagery. Bharati et al. [80] used X-Ray images to evaluate their hybrid deep learning method that combines VGG, CNN and data augmentation.

C. Imaging Techniques
Lung MRI and CT scan imagery are widely used for lung cancer diagnosis. Rehman et al. [9] used ML models like SVM and ANN to detect the disease from lung CT scans. Shakeel et al. [7] proposed the "Improved Profuse Clustering Technique (IPCT)" using CT images for lung cancer detection. Kalaivani et al. [12] used a CNN-based model known as DenseNet with CT imagery for disease diagnosis. CT imagery is used by Rahane et al. [15]. Joshua et al. [17] used both MRI and CT scan imagery in their research. Ibrahim et al. [18] used CT scan images to investigate chest diseases such as Covid-19, lung cancer and pneumonia. Chaturvedi et al. [19] studied ML models for detection and classification of lung cancer using MRI, X-Ray and CT-scan imagery. Riquelme et al. [20] used CT scans for classification of lung cancer nodules. Lakshmana Prabu et al. [22] used lung CT images for detection of cancer.
In [31], X-Ray, CT and MRI imagery are used for lung cancer detection. CT images are used in [32] and DICOM CT images in [34] for lung cancer research. Kadir and Gleeson [42] used advanced imaging techniques for lung cancer diagnosis.

D. Performance Metrics
Widely used performance metrics in ML and deep learning-based approaches for lung cancer detection are summarized in Table I. Precision is the positive predictive value, while recall is the true positive rate. The F1-score is the harmonic mean of precision and recall; it gives a single measure that stays informative on imbalanced data, where accuracy alone can be misleading. These metrics are derived from the confusion matrix shown in Fig. 1.
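As a minimal sketch of how these metrics follow from a binary confusion matrix, the Python snippet below computes them from the four cell counts. The class name, function name and example counts are illustrative assumptions and are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Cell counts of a binary confusion matrix (hypothetical container)."""
    tp: int  # cancer cases correctly flagged
    fp: int  # healthy cases wrongly flagged
    fn: int  # cancer cases missed
    tn: int  # healthy cases correctly cleared


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def classification_metrics(cm: ConfusionMatrix) -> dict:
    total = cm.tp + cm.fp + cm.fn + cm.tn
    precision = _ratio(cm.tp, cm.tp + cm.fp)    # positive predictive value
    recall = _ratio(cm.tp, cm.tp + cm.fn)       # true positive rate (sensitivity)
    specificity = _ratio(cm.tn, cm.tn + cm.fp)  # true negative rate
    accuracy = _ratio(cm.tp + cm.tn, total)
    # Harmonic mean of precision and recall.
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up, heavily imbalanced example: 20 cancer cases among 1000 scans.
example = classification_metrics(ConfusionMatrix(tp=5, fp=5, fn=15, tn=975))
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts the accuracy is 0.98 although the recall is only 0.25, which illustrates why accuracy alone can mislead on imbalanced data.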

E. Research Gaps
With recent advances in deep learning, research has made a significant leap in identifying, classifying, and quantifying patterns in medical images. In particular, improvements in computer vision inspired its use in medical image analysis tasks such as image segmentation, image registration, image fusion, image annotation, computer-aided diagnosis and prognosis, lesion/landmark detection, and microscopic imaging analysis, to name a few. Lung cancer is one of the problems that has attracted significant research, and many deep learning-based solutions have come into existence. However, much remains to be done to achieve more accurate early detection of lung cancer. For instance, the research carried out in [22] has significant limitations. First, it has issues in detecting lung cancer early. Second, there is a need to improve CNN architectures and cascade them, and to build a pipeline with patient-level descriptive statistics for better prediction. Third, an ensemble of classifiers could lead to further improvement in the prediction of lung cancer.
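To make the third point concrete, the following is a minimal sketch of one common ensembling scheme, soft voting, in which the malignancy probabilities predicted by several classifiers are averaged before thresholding. It is not the method of [22] or of any reviewed paper; the function name, the 0.5 threshold and the example probabilities are assumptions chosen purely for illustration.

```python
from statistics import fmean
from typing import Sequence, Tuple


def soft_vote(probabilities: Sequence[float], threshold: float = 0.5) -> Tuple[float, int]:
    """Average per-model malignancy probabilities and threshold the mean.

    `probabilities` holds one predicted probability per classifier for the
    same scan. Returns (mean probability, label), where label 1 means
    "flag as malignant". The default threshold is an assumption.
    """
    if not probabilities:
        raise ValueError("at least one classifier output is required")
    if any(p < 0.0 or p > 1.0 for p in probabilities):
        raise ValueError("probabilities must lie in [0, 1]")
    mean_probability = fmean(probabilities)
    return mean_probability, int(mean_probability >= threshold)


# Hypothetical outputs of three different models for one CT scan.
mean_p, label = soft_vote([0.9, 0.4, 0.6])
print(f"mean probability = {mean_p:.3f}, label = {label}")
```

Averaging lets a confident model outweigh an uncertain one, which a simple majority vote over hard labels cannot do; whether this helps on real lung CT data would have to be verified experimentally.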

III. RESEARCH METHODOLOGY
This section presents the research methodology for the systematic review of literature on deep learning techniques for lung cancer detection. It covers the main aspects of the review process: the databases used to find articles, the research process or conceptual framework, the criteria for including or excluding research articles, and the distribution of the articles used in the research.

A. Research Questions
This systematic review synthesizes literature from 2012 to 2021 on deep learning methods used for lung cancer detection. It sets out to answer a number of research questions. These research questions concern the deep learning methods, the datasets used for the research, the development platforms used for building models, the performance metrics widely used, the imaging techniques and the results.

B. Data Sources
Research articles used in this paper are collected from different sources. The article selection process includes criteria for inclusion and exclusion of articles. The databases or digital libraries from which articles are taken include IEEEXplore (https://ieeexplore.ieee.org/Xplore/home.jsp).

C. Search Process
The search for articles is carried out with different search phrases. The search applications associated with the digital libraries are used to find relevant peer-reviewed articles that satisfy the inclusion and exclusion criteria. The collected items include journal articles and conference papers. Table II shows the search process in terms of the phrases used for finding suitable articles on lung cancer detection, which are used to collect articles from the different sources. After the research articles are collected, the inclusion and exclusion criteria are applied to filter them further.

D. Criteria for Article Selection Process
Inclusion and exclusion criteria for this research are defined as presented in Table III. The criteria are meant to ensure the quality of the articles used for the systematic review. Articles in the English language that were published between 2012 and 2021 are used for the systematic review, so that it reflects the state-of-the-art methods used for lung cancer detection.
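The screening step can be sketched as a simple filter. The snippet below is only an illustration under stated assumptions: it encodes the criteria named in the text (English language, publication year within the review window, peer-reviewed), the record fields and placeholder titles are hypothetical, and the default year window assumes the 2012 to 2021 range stated for the review as a whole. The full criteria of Table III are not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Article:
    """Hypothetical bibliographic record used only for this sketch."""
    title: str
    year: int
    language: str
    peer_reviewed: bool


def meets_criteria(article: Article, first_year: int = 2012, last_year: int = 2021) -> bool:
    # Inclusion: English, inside the year window (inclusive), peer-reviewed.
    return (
        article.language.strip().lower() == "english"
        and first_year <= article.year <= last_year
        and article.peer_reviewed
    )


def select_articles(candidates: Iterable[Article], first_year: int = 2012,
                    last_year: int = 2021) -> List[Article]:
    return [a for a in candidates if meets_criteria(a, first_year, last_year)]


# Placeholder records, not real papers.
candidates = [
    Article("Placeholder study A", 2019, "English", True),
    Article("Placeholder study B", 2010, "English", True),   # outside the window
    Article("Placeholder study C", 2020, "German", True),    # not in English
    Article("Placeholder study D", 2021, "English", False),  # not peer-reviewed
]
selected = select_articles(candidates)
print([a.title for a in selected])
```

Keeping the year window as a parameter makes it easy to rerun the screening with a narrower or wider range.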

E. Research Process
We followed a specific research process of searching for articles and applying the exclusion and inclusion criteria, as illustrated in Fig. 2. As many as 180 articles are collected using the search phrases. The article selection process then removes irrelevant articles that do not meet the selection criteria.
As presented in Fig. 3, the articles that satisfied the inclusion and exclusion criteria are subjected to a data analysis process. Most of the articles were published between 2017 and 2021. Only five articles from 2012 to 2016 satisfy the selection criteria.
As presented in Fig. 4, the article distribution is given as a percentage for each year. The years 2019 and 2020 account for the highest share of articles, with 26%, while 25% of the articles are from 2018.
As presented in Fig. 5, IEEE is the publisher from which 11 journal papers and 14 conference papers are selected. From Elsevier publications 12 articles are selected, while from Springer publications 10 articles are selected. There are 21 articles taken from Google Scholar and 5 from MDPI. Table IV lists the articles used for the survey in this paper; they belong to different journals and conferences from diverse and reputed publishers.

IV. RESULTS OF THE RESEARCH
This section provides the results, with answers to the research questions introduced in Section III-A.

A. Results of Research Question 1
Research Question 1: What are different deep learning methods used from 2012 to 2021 to diagnose lung cancer?
Most authors used Convolutional Neural Networks (CNNs) to detect lung cancer from 2012 to 2021.
Table V presents the different deep learning methods used from 2012 to 2021 to diagnose lung cancer.
Table VIII presents the different performance metrics used for evaluation of lung cancer detection methods. The authors evaluate their developed models using metrics such as accuracy, sensitivity, specificity, F-measure and ROC curves.

E. Results of Research Question 5
Research Question 5: What are the imaging techniques used for lung cancer detection from 2012 to 2021?
Table IX presents the different imaging techniques used for lung cancer detection from 2012 to 2021. Imaging techniques such as Computed Tomography (CT), X-Ray, and magnetic resonance imaging (MRI) provide images of both cancerous and non-cancerous lungs. The authors used those images as datasets for deep learning models.

F. Results of Research Question 6
Research Question 6: What are the results obtained for lung cancer detection from 2012 to 2021?
As presented in Table X, different performance metric

V. CONCLUSION AND FUTURE WORK
In this paper, we presented a systematic review of deep learning methods for the detection of lung cancer. It covers peer-reviewed journal and conference papers from 2012 to 2021. The literature review synthesizes existing methods covering machine learning (ML), deep learning and artificial intelligence (AI). It provides insights into different deep learning methods in terms of their pros and cons and arrives at possible research gaps. This paper informs the reader about different aspects of lung cancer detection, which can trigger further research towards models that can be used in the Clinical Decision Support Systems (CDSSs) required by healthcare units. In addition, this systematic review answers different research questions that may be of help to other researchers and readers. In future, we will work on filling some of the research gaps identified in this paper.