TADOC : Tool for Automated Detection of Oral Cancer

Cancer is a group of related diseases, and it is necessary to classify its type and impact. In this paper, an automated learning-based system for the detection of oral cancer from Whole Slide Images (WSI) is designed. The main challenges were handling the huge dataset and training the machine learning model, since each training iteration was time-consuming. This increased the time needed to obtain a proper model and reduced the freedom for experimentation. Other key features of the system are a modern deep learning architecture that classifies small patches from the large whole slide images, and carefully designed post-processing methods for slide-based classification.

Keywords: Cancer; CT Scan; MRI Scan; Machine Learning; Deep learning; Convolutional Neural Network (CNN); Whole Slide Image (WSI); Residual Networks (ResNets)


I. INTRODUCTION
The signs and symptoms of cancer are not visible initially; they appear only as the mass grows. The growth of abnormal cells in the human body results in different types of cancer that affect the surrounding tissues. Tumors result from the growth of extra cells that divide without stopping. When these tumors spread through the blood or lymph system, new tumors form away from the place of origin [2]. Characteristics of cancer include an unlimited number of cell divisions, promotion of blood vessel construction and avoidance of programmed cell death. Surveys based on factors such as lifestyle, environment and inherited genetics show that patients with cancer are more prone to suicide than the general population.

A. Background Study
Cancers are usually named after the organs or tissues in which they originate. Doctors use a combination of tests to diagnose the presence of cancer cells in the body. Cancers grouped under head and neck cancer include lip or oral cavity cancer, mouth cancer, oral cancer, etc. [2].
Cancers are categorized into five major types based on the type of cell they originate from, namely carcinoma, sarcoma, leukemia, lymphoma and myeloma, and brain tumors [1].
Some of the commonly used tests for identifying cancerous cells include the following:
1) Laboratory testing: A primary method of detecting cancer, which can help to rule out other diagnostic procedures.
2) Biopsy: A sample of tissue is taken from a cancerous lesion and subjected to further laboratory procedures.
3) CT scan: A more advanced technique than the regular X-ray method, which lets the doctor see more detail.
4) MRI scan: A technique that uses magnets, radio waves and a computer to provide detailed analysis [3].

B. Types of Treatment
A lot of research has been done on treatment procedures for the types of cancer detected. The different treatment methods are listed below with their scope of treatment [2].
• Surgery: a procedure-based method to treat the cancer.
• Radiation Therapy: high doses of radiation are used to kill the cancer cells and to shrink the tumors.
• Chemotherapy: the use of drugs to kill cancer cells.
• Targeted Therapy: this type of treatment targets the changes in cancer cells that help them grow, divide and spread.
• Stem Cell transplant: high doses of chemotherapy or radiation therapy destroy the patient's blood-forming stem cells, and this supplementary technique restores them.
• Precision medicine: the doctors diagnose and treat the patients based on their genetic history [2].

C. Problems in Manual Diagnosis
a) Delay in diagnosis is the main issue with manual diagnosis of cancer. It requires highly skilled labor, and the number of diagnostic tests being requested is growing exponentially.
b) The time needed for a proper diagnosis hinders early recognition of the tumor grade.
c) An instant diagnosis report cannot be provided, as conclusions must be drawn carefully to avoid fatal errors.
d) The enormous workload of pathologists is a real concern, and it also lowers the accuracy of the pathologist's prediction.
The designed system aims to achieve the following:
1) To handle Whole Slide Images (WSI), i.e. to find an effective way to open them, rather than opening them in a document viewer that shows multiple image levels irrespective of their relevance.
2) To create patches from the WSI for training the deep learning model.
3) To train the system on the patches generated in the previous step, so that the trained model has appropriate weights attached to each parameter after analyzing thousands of patches.
4) To predict tumorous regions and generate a heat map from the prediction model that highlights the regions with a high probability of cancer.
The rest of this paper is organized as follows. Section II presents the research background and methodology. Section III gives a brief description of deep learning and the details of each part of the implementation. The results are presented in Section IV. Finally, the conclusion and future work follow in Sections V and VI.

II. RESEARCH BACKGROUND AND METHODOLOGY
Machine learning addresses problems by fitting designed models to appropriate data using different learning algorithms, as discussed in [6][7][26]. M. Praveen Kiruba bai [54] explains the consequences of oral cancer and the techniques for detecting it, and observes that oral cancer is curable when detected at an early stage.
Komura, D. and Ishikawa, S. [25] explain techniques for histopathological image analysis using machine learning; the authors also discuss the importance of collating WSI data based on common criteria.
Since images comprise several overlapping objects and clusters, an automated system for detecting and classifying microscopic biopsy images has been proposed in [55].
The importance and applications of medical image analysis using deep learning methods have been discussed in [28][50][56].

A. Conventional Techniques
A digital image is composed of a finite number of elements, each of which has a specific value; these elements are referred to as pixels or pels [7].
Image processing is a three-stage process: importing the image using image acquisition tools, analyzing and manipulating the image, and generating a report based on the features of interest.
Digital image processing covers several important areas, such as medicine, pattern recognition, video processing, and image sharpening and restoration [5].
Preprocessing is one of the elementary steps in image analysis. It prepares the image representation so that algorithms for subsequent operations, such as segmentation and feature extraction, are easier to apply [11][12][19]. Separating the foreground from the background is a vital part of image processing and computer vision, as it reduces the computational resources required [13].
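As an illustration of foreground/background separation, the sketch below applies Otsu's global threshold to a list of 8-bit gray levels. Otsu's method is an assumption chosen for illustration, since the text does not name a specific technique; `otsu_threshold` and the sample values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    total_sum = sum(level * n for level, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of gray levels in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue        # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break           # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


# Assumed example: dark tissue (gray level 90) on a bright background (230).
pixels = [90] * 50 + [230] * 50
t = otsu_threshold(pixels)
tissue_mask = [p <= t for p in pixels]  # True where the pixel is treated as tissue
```

Only pixels flagged in `tissue_mask` would then need to be passed to later stages, which is where the saving in computation comes from.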

B. Image Segmentation
Image segmentation extracts the area of interest from an image, such as a cell, nuclei or a tumor, using different methods [18].
The various image segmentation methods that aid diagnosis are illustrated in Fig. 2 [19][20].

C. Feature Extraction
The prime focus of this method is to detect and isolate distinct portions and features of images. The extracted features are then fed into a machine learning algorithm for classification [23][24].
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 3, 2020, www.ijacsa.thesai.org

D. Classification
On completion of the feature extraction stage, the output, which is in mathematical form, is fed into the machine learning algorithm. The taxonomy of classification algorithms is shown in Fig. 3.

E. Disadvantages of Conventional methods
• Processing an image is time consuming.
• An appropriate method must be chosen for each image processing step.
• The region of interest and an appropriate segmentation method must be chosen.
• Proper feature extraction algorithms must be developed. Without proper feature extraction, the trained model and its prediction accuracy will be poor.
• An appropriate classification algorithm must be chosen to classify an image based on the extracted features.
Conventional glass slides are scanned to create digital slides, which are referred to as Whole Slide Images (WSI). These images have proved beneficial in education, diagnosis and research. WSI avoids variance in slide quality by reproducing the same image with the exact orientation. Owing to its high image resolution, WSI has made diagnosis feasible for research [29]. A digital WSI is represented as a pyramid with different magnification levels.
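The pyramid representation can be sketched as follows, assuming each level halves the width and height of the previous one. The downsample factor of 2, the base dimensions and both function names are illustrative assumptions; real scanners and file formats vary.

```python
def pyramid_dimensions(base_width, base_height, num_levels, factor=2):
    """Return (width, height) for each pyramid level; level 0 is full resolution."""
    dims = []
    for level in range(num_levels):
        scale = factor ** level
        dims.append((base_width // scale, base_height // scale))
    return dims


def uncompressed_megabytes(width, height, channels=3):
    """Size of an 8-bit image with the given dimensions, in units of 10**6 bytes."""
    return width * height * channels / 1e6


# Hypothetical slide of 100,000 x 80,000 pixels with four levels.
levels = pyramid_dimensions(100_000, 80_000, num_levels=4)
for level, (w, h) in enumerate(levels):
    size = uncompressed_megabytes(w, h)
    print(f"level {level}: {w} x {h} px, about {size:,.0f} MB uncompressed")
```

The sizes printed for the lower levels indicate why a full-resolution slide is rarely loaded into memory as a single image.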
Because computing resources such as processing power and advanced software are now easily available, digital images have gained a wide variety of applications in pathology [30]. Many challenges must be addressed when using WSI, because each WSI occupies a large amount of storage owing to its high resolution. Hence storage, transmission and interoperability of WSI are challenging tasks. WSI acquired from different microscopic instruments may have different resolutions and scales of magnification, as shown in Fig. 4. The WSI format specification is not universal, which leads to conflicts when viewing, analyzing and accessing the images with software [31]. Even though WSI makes pathological images easy to process, these are some of the complexities in handling them [32].

F. Patch Generation
Patches are sub-images derived from the original image, as shown in Fig. 5. A patch is uniquely identified by its horizontal and vertical location inside the image (the coordinates of its center) and its size. Once the location and the size are specified, a patch is extracted by calculating the pixel locations of the corresponding square. Global features contribute to the extraction of texture information, color distribution or whole-image information. Information obtained from global features often turns out to be inadequate, whereas local features such as patches are suited to representing restricted regions of complex images.
These patches can be extracted through various methods.
• Grid point specification: A regular grid of the desired patch size is projected onto the image, which provides the points to extract. Depending on the grid spacing, the patches may overlap or have gaps between them.
• Random point specification: This is like grid point specification, except that the points are chosen at random and are therefore distributed over the image. A region of interest is selected, and the points inside it are considered for generating the patches.
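The two point-specification methods above can be sketched as follows. This is a minimal illustration under stated assumptions: points are taken as top-left patch corners, the region of interest is a rectangle, and all function names are hypothetical.

```python
import random


def grid_points(image_w, image_h, patch_size, stride):
    """Top-left corners of patches on a regular grid.

    stride == patch_size gives adjacent patches, stride < patch_size gives
    overlapping patches, and stride > patch_size leaves gaps between patches.
    """
    return [
        (x, y)
        for y in range(0, image_h - patch_size + 1, stride)
        for x in range(0, image_w - patch_size + 1, stride)
    ]


def random_points(roi, patch_size, count, seed=0):
    """Top-left corners of `count` patches lying fully inside a rectangular ROI.

    roi is (left, top, width, height) in pixels.
    """
    left, top, width, height = roi
    rng = random.Random(seed)
    return [
        (rng.randint(left, left + width - patch_size),
         rng.randint(top, top + height - patch_size))
        for _ in range(count)
    ]


def patch_box_from_center(cx, cy, patch_size):
    """Pixel bounds (left, top, right, bottom) of a square patch given its center."""
    half = patch_size // 2
    left, top = cx - half, cy - half
    return (left, top, left + patch_size, top + patch_size)


adjacent = grid_points(512, 512, patch_size=256, stride=256)
overlapping = grid_points(512, 512, patch_size=256, stride=128)
sampled = random_points((100, 100, 1000, 1000), patch_size=256, count=5)
```

A fixed seed is used so that the random sample is reproducible between runs.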

G. Advantages of patch-based Approach
• Recognition of the object is location independent. The object to be recognized may be present at different locations in different images [49]. With a patch-based approach, the object can be identified irrespective of its location.
• Partial objects can be identified. A patch-based approach helps to identify objects in the image even if they are partially occluded.
• Irrespective of size scaling in different images, the object can be identified depending on the patch size [33].
A patch-based CNN was used specifically on music score images in [34]. The CNN in that system consists of three convolutional layers that take the input. The outputs of these layers are fed into max pooling and local response normalization (LRN) layers. There are three fully connected layers of 512 neurons each.
The model also contains two dropout layers, termed dense1 and dense2, each with a drop probability of 50%. Glorot initialization and ReLU activation are used for the convolutional and fully connected layers.
According to the paper, the patch-based CNN approach provides promising results for writer classification problems.
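The three building blocks named above (Glorot initialization, ReLU activation and 50% dropout) can be sketched in plain Python. This is not the implementation of [34]: the uniform variant of Glorot initialization and the "inverted" form of dropout are assumptions, and the function names are hypothetical.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    """Weight matrix (fan_out rows, fan_in columns) drawn from U(-limit, limit),
    where limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


def relu(values):
    """Rectified linear unit: negative activations are set to zero."""
    return [max(0.0, v) for v in values]


def dropout(values, drop_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability drop_prob and rescale
    the surviving units so the expected activation is unchanged."""
    if not training or drop_prob == 0.0:
        return list(values)
    keep = 1.0 - drop_prob
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
weights = glorot_uniform(512, 512, rng)          # one 512-neuron dense layer
activations = relu([-1.0, 2.0])
dropped = dropout([1.0] * 1000, drop_prob=0.5, rng=rng)
```

At inference time `training=False` would be passed, so dropout leaves the activations untouched.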
A Patch Strategy for Deep Face Recognition [35] proposes a system that takes online cropped images as input for face recognition. It uses a multi-branch CNN that learns from each patch, and the representation of the entire face is formed by considering all the patches. Pre-trained AlexNet and ResNet CNN models are used to analyze the efficiency of the method. As an end-to-end trained model, it makes effective use of both global and local features. Six patches of size 136x136 pixels, located at facial key points in aligned face images, are considered.
These patches are passed on to pooling and convolutional layers. Feature fusion is accomplished by fully connected layers. This method boosts face recognition performance, as it enhances the representation of local features.
In [36], the researchers propose a system that takes a multiscale version of the patches as input. Downsampling is carried out by decimating a smoothed version of the patch, and upsampling is carried out by nearest neighbor interpolation. The proposed system yields smooth and compact segmentation results.
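The two resampling operations can be illustrated on a small 2D list of gray values. This sketch assumes a factor of 2 and a 2x2 box filter as the smoothing step; [36] may use a different filter, and the function names are hypothetical.

```python
def downsample2(image):
    """Halve each dimension by averaging every 2x2 block.

    Averaging a 2x2 block is equivalent to box smoothing followed by
    keeping every second sample (decimation).
    """
    h, w = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


def upsample2_nearest(image):
    """Double each dimension by repeating every pixel in a 2x2 block."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


small = downsample2([[1, 3], [5, 7]])
large = upsample2_nearest([[1, 2]])
```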
Deep learning patch-based frameworks such as ConvNet, AlexNet and VGG models were compared by training and testing them on publicly available, high-resolution datasets. Patch dimensions of 11x11, 21x21, 29x29, 33x33 and 45x45 were considered in order to compare accuracy rates and choose the appropriate patch size for the model [37]. Small patch sizes turned out to affect the quality and robustness of features in the deep layers.
The authors of [27][38] propose a patch-based deep learning approach to explore subtypes of cancer. Even though CNNs have gained prominence in image classification, handling high-resolution images implies a high computational cost. Training a CNN directly on a Whole Slide Image (WSI) of gigabyte size would require downsampling and lead to data inefficiency. Hence, a patch-based CNN model for lung cancer subtype classification was proposed by Le Hou.

H. CNN Architectures
A CNN takes its input in the form of arrays; the data is readily available in the form of images, and the network follows a deep feed-forward mechanism. Images are multidimensional arrays in which each unit holds a pixel value or intensity.
CNNs are multi-layered neural networks whose layers can be further divided into convolutional and pooling layers. Fig. 6 illustrates the representation of a CNN.
The design of such a system is based on how neurons work, and therefore on the human brain itself. Numerous applications, such as document processing and semantic analysis of documents, sounds and images, have already been created using CNNs. A document processing system that uses a CNN can also be trained to implement constraints on languages [39][40][41][42].
Another variant of the CNN, the Fully Convolutional Network (FCN), is widely used for some of the cases stated above, as shown in Fig. 7. As discussed above, a CNN has multiple layers, and each layer is a 3D array in which two of the three dimensions are spatial and the third is the feature dimension. If a layer is represented as x * y * z, then for the first layer x * y is also the image dimension in pixels.
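To make the x * y * z layer shapes concrete, the sketch below traces how the two spatial dimensions and the feature dimension change through a stack of convolution and pooling layers, using the standard output-size formula floor((n + 2p - k) / s) + 1. The layer stack and the function names are hypothetical and do not describe the network used in this paper.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window slides over n pixels."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_shapes(input_shape, layers):
    """Return the (height, width, features) shape before and after each layer.

    Each layer is (kernel, stride, padding, out_features); pooling layers use
    out_features=None because they keep the feature dimension unchanged.
    """
    h, w, c = input_shape
    shapes = [input_shape]
    for kernel, stride, padding, out_features in layers:
        h = conv_output_size(h, kernel, stride, padding)
        w = conv_output_size(w, kernel, stride, padding)
        if out_features is not None:
            c = out_features
        shapes.append((h, w, c))
    return shapes


# Assumed example: a 256 x 256 RGB patch through two conv + max-pool stages.
shapes = trace_shapes(
    (256, 256, 3),
    [
        (3, 1, 1, 32),    # 3x3 convolution, 32 feature maps
        (2, 2, 0, None),  # 2x2 max pooling
        (3, 1, 1, 64),    # 3x3 convolution, 64 feature maps
        (2, 2, 0, None),  # 2x2 max pooling
    ],
)
```

The first entry of `shapes` is the image itself, which matches the remark that for the first layer x * y is the image dimension in pixels.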
The performance of a deep neural network can be improved by increasing its depth and its width (the size of the network). This is the easiest way of obtaining models with higher accuracy for very large amounts of data. However, the approach has a major drawback: the amount of features the enlarged network has to deal with makes it prone to overfitting [43].

I. Residual Networks (ResNets)
In Residual Networks (ResNets), the neural network is broken into small pieces, and the pieces are linked through skip (shortcut) connections to form a big network.
Based on the input and output dimensions, residual networks use two types of blocks: the identity block, where the input and output activations have the same dimensions, and the convolution block, where the dimensions differ. Fig. 8 and Fig. 9 depict these two types of ResNet blocks.
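The two block types can be written compactly in the usual ResNet notation. The symbols are introduced here for illustration and do not appear in the original text: x is the block input, F is the stack of layers inside the block with weights W_i, W_s is a projection applied on the shortcut when the dimensions differ, and sigma is the activation (typically ReLU).

```latex
\begin{aligned}
\text{identity block:}\quad
  & \mathbf{y} = \sigma\bigl(\mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x}\bigr) \\
\text{convolution block:}\quad
  & \mathbf{y} = \sigma\bigl(\mathcal{F}(\mathbf{x}, \{W_i\}) + W_s\,\mathbf{x}\bigr)
\end{aligned}
```

In the identity block the shortcut adds x unchanged, which is only possible when F(x) and x have the same shape; in the convolution block W_s reshapes x so that the addition is defined.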

A. System Design
The data set contained images from a disparate patient population with oral cancer [46]. Image quality also plays a great role, as a better prediction model can be obtained from images with a higher resolution [53]. Sometimes the image acquisition process introduces unnecessary variation unrelated to the classification levels [44][45].
When the image is fed to the system in a correct format, it undergoes processing through different modules as shown in the data flow diagram in Fig. 10.

B. Model Training in Deep Learning
A machine learning algorithm has been devised for the model, with the following steps during the training process:
[Step 1]: Define the problem appropriately (objective, desired outputs).
[Step 2]: Gather and collect the data.
[Step 3]: Set up an evaluation protocol.
[Step 4]: Prepare the data (e.g. missing values, categorical values).
[Step 5]: Split the data appropriately.
[Step 6]: Strike a balance between overfitting and underfitting so that the model generalizes.
[Step 7]: Summarize the learning process of the model.
[Step 9]: Develop a better model and tune its hyperparameters to get the best possible performance.
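Steps 3 and 5 can be illustrated with a reproducible train / validation / test split. This is a sketch under stated assumptions: the fractions, the seed and the slide identifiers are hypothetical, and splitting by slide rather than by patch is a suggested practice to keep patches from one slide out of more than one subset, not something the text specifies.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle items reproducibly and split them into train, validation and test lists."""
    if not 0.0 < train_frac + val_frac < 1.0:
        raise ValueError("train_frac + val_frac must lie strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


# Hypothetical slide identifiers; patches would be generated per subset afterwards.
slide_ids = [f"slide_{i:03d}" for i in range(100)]
train_ids, val_ids, test_ids = split_dataset(slide_ids, train_frac=0.5, val_frac=0.25)
```

The validation subset serves hyperparameter tuning, while the test subset is held back for the final evaluation.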

A. Trained Model Results
Following the survey, the work started with a Fully Convolutional Network (FCN) implemented in Keras over TensorFlow, but training had to be stopped because models under training were being lost to power outages. It was found that with PyTorch, checkpoints of the models under training could be kept up to the point of failure, so the focus shifted to PyTorch. The trained model was stored in the .ckpt format [9][10][52].
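The idea of checkpointing so that training can resume after an interruption can be sketched with the Python standard library. This is a generic illustration that uses `pickle` as a stand-in; it is not PyTorch's checkpoint API, and the state layout and function names are hypothetical.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file and rename it into place, so a crash
    during writing cannot corrupt the previous checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def train(path, total_epochs):
    """Toy training loop that resumes from the last completed epoch."""
    state = load_checkpoint(path, {"epoch": 0, "weights": [0.0]})
    for epoch in range(state["epoch"], total_epochs):
        # Stand-in for one epoch of real training.
        state["weights"] = [w + 1.0 for w in state["weights"]]
        state["epoch"] = epoch + 1
        save_checkpoint(path, state)
    return state
```

Calling `train` again with the same path after an interruption continues from the stored epoch instead of starting over.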
After reviewing similar implementations for WSI on datasets of other cancer types, and referring to a recent study that used ResNet to train models for lung cancer, it was found that the models used are similar when the image resolutions are the same. The checkpoints obtained when testing the model with that dataset were used further in the process for heatmap generation and other evaluations.

B. Prediction and Heatmap Generation
To understand and interpret the trained model in medical image analysis, visualization of the results is an important factor. Most of the time, prediction involves a mathematical calculation of the probability that the trained model assigns to the dataset, based on the knowledge it gained during training [47][48]. Such a number alone does not provide much clarity or evidence for trusting the trained model. Hence heatmap generation comes into the picture.
There are various methods and readily available Python modules to carry out this task. The activation functions and optimizers chosen during the training process play important roles in generating the heat map. The approach adopted in this project is similar to a sliding-window probability calculation: the probability that each patch generated from the test WSI is tumorous is calculated and stored in a NumPy array. The regions likely to be tumorous are thus highlighted in the heatmap.
The output of the prediction algorithm, which is in NumPy format, is then converted to an image.
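A sliding-window heatmap of this kind can be sketched as follows. It is a minimal illustration, not the project's code: `predict` stands in for the trained model, overlapping patches are averaged per pixel (an assumption), and the function names are hypothetical.

```python
import numpy as np


def build_heatmap(slide_shape, patch_size, stride, predict):
    """Slide a window over the slide and average tumor probabilities per pixel.

    predict(top, left) is a caller-supplied function returning the probability
    that the patch with that top-left corner is tumorous.
    """
    height, width = slide_shape
    prob_sum = np.zeros((height, width), dtype=np.float64)
    count = np.zeros((height, width), dtype=np.float64)
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            p = predict(top, left)
            prob_sum[top:top + patch_size, left:left + patch_size] += p
            count[top:top + patch_size, left:left + patch_size] += 1
    # Pixels never covered by a patch keep probability 0.
    return np.divide(prob_sum, count, out=np.zeros_like(prob_sum), where=count > 0)


def to_grayscale_image(heatmap):
    """Map probabilities in [0, 1] to 8-bit intensities in [0, 255]."""
    return np.rint(np.clip(heatmap, 0.0, 1.0) * 255).astype(np.uint8)


def fake_predict(top, left):
    """Toy stand-in for the model: the right half of the slide is 'tumorous'."""
    return 1.0 if left >= 4 else 0.0


heat = build_heatmap((8, 8), patch_size=4, stride=4, predict=fake_predict)
image = to_grayscale_image(heat)
```

The resulting 8-bit array can be saved or displayed with any image library, optionally after applying a color map.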
Fig. 11 shows a set of sample images that have undergone heatmap-based prediction. The first image, Fig. 11(a), gives a glimpse of the actual WSI under consideration. The next image, Fig. 11(b), is the label or mask, which carries the information about the image under prediction. In this case, the image is cancerous, and the white area is the region marked as cancerous by pathologists. The next three figures illustrate the prediction heat maps obtained for this image using various prediction models.
The heat map so obtained is a clear indicator of the presence or absence of cancer in each slide, provided the model under evaluation is accurate. These heatmaps are particularly useful for pathologists, as they mark the area under suspicion, and that part of the slide can be easily selected and observed by any pathologist.

C. User Interface
The user interface, shown in Fig. 12(a) to 12(e), is a native application developed for the Ubuntu operating system using the open-source PyQt5 toolkit. Even though a web application would have been easily accessible to everyone, it was not considered because of the data size and bandwidth requirements [4][8][51].

V. CONCLUSION AND FINDINGS
A learning-based system for automated detection of oral cancer from whole slide images (WSI) has been presented. The main challenges were handling the huge dataset and training the machine learning model, since each iteration of the model took a large amount of time [6][51]. This increased the time needed to obtain a proper model and reduced the freedom for experimentation.