Breast Cancer Classification in Histopathological Images using Convolutional Neural Network

Computer-based analysis is one of the suggested means that can assist oncologists in the detection and diagnosis of breast cancer. Meanwhile, deep learning has recently emerged as one of the most active research directions in the general imaging literature, thanks to its high capability in detection and recognition tasks. Yet, it has not been adequately exploited for the problem of breast cancer so far. In this context, I propose in this paper an approach for breast cancer detection and classification in histopathological images. This approach relies on a deep convolutional neural network (CNN), which is pretrained on an auxiliary domain with a very large set of labelled images, and coupled with an additional network composed of fully connected layers. The network is trained separately for each image magnification (40x, 100x, 200x and 400x). The results obtained at the patient level achieved promising scores compared to state-of-the-art methods.
Keywords—Convolutional neural network (CNN); histopathological images; ImageNet; classification


I. INTRODUCTION
According to reports of the World Health Organization (WHO), breast cancer (BC) is the most prevalent type of cancer in women. Incidence rates vary widely, for instance from 19.3 per 100,000 women in Eastern Africa to 89.7 per 100,000 women in Western Europe [1]. Current scientific findings indicate that such high variability might be traced back to differences in lifestyle and urbanization. Although early diagnosis is more affordable in developed countries, it is less likely in underdeveloped nations, which implies that preventive measures alone do not offer a sufficient solution.
Mammography is a common screening protocol that can help identify suspicious regions of the breast; it is followed by a biopsy of potentially cancerous areas in order to determine whether a suspicious area is benign or malignant [2], [3]. During the biopsy, tissue samples are taken from the breast in order to produce stained histology slides. Despite the considerable improvements brought by such imaging technologies, pathologists still visually inspect the histological samples under the microscope for a final diagnosis, including staging and grading [4].
In this context, automatic image analysis is likely to play a pivotal role in facilitating the diagnosis, and relevant image processing and machine learning techniques have been proposed to this end. For instance, the authors in [5] present a comparison of different nuclei segmentation algorithms, where the cases are categorized as benign or malignant.
Deep CNNs automatically learn mid-level and high-level representations from raw data (e.g., images). Recent results on natural images indicate that CNN representations are highly effective in object recognition and localization applications. This has prompted the adoption of CNNs in the biomedical field, for example in breast cancer diagnosis and mass classification [6]-[11], abdominal adipose tissue extraction [12], detection and classification of brain tumours in MR images [13]-[16], skeletal bone age assessment in X-ray images [17], EEG classification of motor imagery [18], and arrhythmia detection and analysis of ECG signals [19]-[21]. In particular, in [9], the authors propose a framework for mass classification, which mainly encompasses a CNN and a decision mechanism for diagnosing breast cancer as either benign or malignant in the DDSM mammographic dataset. In [4], the authors propose an improved hybrid active-contour-model-based method for nuclei segmentation. They adopt pixel-level and object-level features in addition to semantic-level features. The semantic-level features are computed using a CNN architecture that can learn additional feature representations which cannot be captured by either pixel-level or object-level features.
It is worth stressing that, relative to other biomedical applications, breast cancer diagnosis has not benefited enough from deep learning, which inspired me to investigate it thoroughly. In particular, I explore several deep architectures in the context of breast cancer histological image classification, and demonstrate that the common belief that high-level deep features are better at capturing the contextual as well as spectral attributes of optical images also holds for histological breast cancer images. This is confirmed by the very satisfactory results reported here, which often improve on recent works by large margins.
The rest of this paper is organized as follows. Materials and methods are described in Section II. Results are presented in Section III. Finally, conclusions are drawn in Section IV.

II. MATERIALS AND METHODS

A. Dataset Description
In order to realistically assess any BC diagnosis system, the experiments should be performed on a large-scale dataset containing 1) numerous patients and 2) abundant images. The latter is essential for any deep learning model, since a large amount of data is required for the training phase.
www.ijacsa.thesai.org
The Breast Cancer Histopathological Image Classification (BreakHis) dataset, introduced recently in [22], is well suited to this purpose as it meets all the above requirements. Precisely, it is composed of 7,909 microscopic images of breast tumour tissue collected from 82 patients using different magnification factors (40X, 100X, 200X, and 400X). For illustration, Figs. 1 and 2 display a benign and a malignant breast tumour slide, respectively, each from a single patient and seen at the different magnification factors.
To date, the dataset contains 2,480 benign samples (taken from 24 patients) and 5,429 malignant samples (taken from 58 patients) of 700×460 pixels, 3-channel RGB, 8-bit depth per channel, in PNG format. In its current version, the samples in the dataset were collected by the surgical open biopsy (SOB) method, also known as partial mastectomy or excisional biopsy. Compared with needle biopsy methods, this procedure removes a larger tissue sample and is performed in a hospital under general anaesthesia. The structure of the dataset is shown in Table I.
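Because training is done per magnification and the evaluation split is patient-wise, it helps to recover the class, patient, and magnification of each image from its file name. The sketch below assumes the naming convention of the public BreakHis release (e.g. `SOB_B_A-14-22549AB-40-001.png`) and treats the year plus slide identifier as the patient identifier; both are assumptions to check against the actual files, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed BreakHis naming convention (verify against your copy of the dataset):
# <procedure>_<class>_<tumour type>-<year>-<slide id>-<magnification>-<sequence>.png
FILENAME_RE = re.compile(
    r"^(?P<procedure>[A-Z]+)_(?P<label>[BM])_(?P<tumour_type>[A-Z]+)"
    r"-(?P<year>\d+)-(?P<slide>[0-9A-Z]+)-(?P<magnification>\d+)-(?P<seq>\d+)\.png$"
)

def parse_breakhis_name(filename):
    """Extract label, patient id and magnification from one file name."""
    m = FILENAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    return {
        "label": "benign" if m["label"] == "B" else "malignant",
        # Assumption: one slide id corresponds to one patient.
        "patient": f'{m["year"]}-{m["slide"]}',
        "magnification": int(m["magnification"]),
    }

def summarize(filenames):
    """Count images and distinct patients per (magnification, label)."""
    images = defaultdict(int)
    patients = defaultdict(set)
    for name in filenames:
        info = parse_breakhis_name(name)
        key = (info["magnification"], info["label"])
        images[key] += 1
        patients[key].add(info["patient"])
    return {key: (images[key], len(patients[key])) for key in images}
```

Grouping by `(magnification, label)` gives, per magnification factor, the image and patient counts that a table such as Table I summarizes.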

B. Proposed Methodology
Let us consider $\{\mathbf{X}_i^{s}, y_i^{s}\}_{i=1}^{n_s}$ as the labeled source data, where $y_i^{s}$ is the class label of image $\mathbf{X}_i^{s}$, either benign or malignant. Similarly, let $\{\mathbf{X}_j^{t}\}_{j=1}^{n_t}$ denote the unseen target data. The proposed method consists of two steps, as shown in Fig. 3.

1) Feature Extraction
Deep CNNs are composed of multiple processing layers that are learnt jointly, in an end-to-end manner, to address specific tasks [23]-[25]. They are commonly composed of four types of layers, namely convolutional, normalization, pooling, and fully connected layers. The convolutional layer is considered the main building block of a CNN, and its parameters consist of a set of filters (sometimes also referred to as neurons or kernels). Every filter is small spatially (along width and height) but extends through the full depth of its input. The output of this layer, called activation maps or feature maps, is produced by sliding the filters across the input. The feature maps are then fed to a nonlinear gating function such as the Rectified Linear Unit (ReLU). The output of this activation function can further be passed through a normalization layer to help generalization. Pooling layers are usually placed immediately after convolutional layers to reduce the spatial size of the feature maps, which reduces the number of parameters and computations in the network and helps control overfitting.

In this work, I follow recent approaches for exploiting pretrained CNN models by taking the output of the last fully connected layer (the layer preceding the final classification layer) to represent the images. That is, I feed each image $\mathbf{X}_i$ as input to the network and generate its corresponding CNN feature representation vector $\mathbf{h}_i$ of dimension $d$:

$$\mathbf{h}_i = f_L\big(f_{L-1}(\cdots f_1(\mathbf{X}_i))\big), \quad \mathbf{h}_i \in \mathbb{R}^{d}, \quad i = 1, \dots, n_s + n_t \tag{1}$$

where $f_1, \dots, f_L$ represent the functions defining the different layers of the CNN, $L$ is the total number of layers, and $n_s$ and $n_t$ represent the number of labeled source images and unlabeled target images, respectively. Fig. 4 shows feature maps with the corresponding extracted features.
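To make the convolution, ReLU, pooling, and layer-composition steps concrete, here is a toy single-channel sketch in plain Python. It is not VGGm: the 2×2 edge filter, the input image, and the helper names are hypothetical, and normalization layers are left out. `extract_features` applies the layer functions in order, mirroring the nested composition $f_L(\cdots f_1(\mathbf{X}))$.

```python
from functools import reduce

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation: slide one filter over a single-channel image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fmap):
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; leftover rows/columns are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

def flatten(fmap):
    """Turn a 2-D feature map into a feature vector."""
    return [v for row in fmap for v in row]

def extract_features(x, layers):
    """Apply the layer functions in order: f_L(... f_2(f_1(x)))."""
    return reduce(lambda acc, f: f(acc), layers, x)

# Hypothetical toy filter responding to a bright-to-dark vertical edge.
EDGE_KERNEL = [[1.0, -1.0],
               [1.0, -1.0]]
LAYERS = [lambda x: conv2d(x, EDGE_KERNEL), relu, max_pool, flatten]

# 5x5 image: bright left two columns, dark elsewhere.
IMAGE = [[5, 5, 0, 0, 0] for _ in range(5)]
features = extract_features(IMAGE, LAYERS)
```

Here the 4×4 feature map has a strong response only in the column where the edge lies; pooling halves each spatial dimension, and flattening yields a 4-dimensional feature vector.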

2) Classification
I feed the CNN feature vectors from the previous stage to an extra network placed on top of the pretrained CNN, as shown in Fig. 3. Specifically, this network is composed of two fully connected layers: a hidden layer followed by a binary classification layer (a sigmoid layer). The hidden layer maps the input feature vector $\mathbf{h}_i \in \mathbb{R}^{d}$ to another representation $\mathbf{z}_i \in \mathbb{R}^{m}$ through the nonlinear activation function $g$ as follows:

$$\mathbf{z}_i = g(\mathbf{W}\mathbf{h}_i) \tag{2}$$

where $\mathbf{W} \in \mathbb{R}^{m \times d}$ is the mapping weight matrix. I adopt the sigmoid function, i.e., $g(x) = 1/(1+e^{-x})$, as the nonlinear activation function. For simplicity, the bias vector is omitted from (2), as it can be incorporated as an additional column of $\mathbf{W}$ by appending the value 1 to the feature vector $\mathbf{h}_i$. In addition, I use dropout [26] to increase the generalization ability of the network and prevent it from overfitting.
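A minimal forward pass of the extra network, written with the standard library only. The sketch assumes that dropout (p = 0.5, the value given in the experimental setup) is applied to the hidden activations, uses the common "inverted dropout" variant, and treats the sigmoid output as the score of the positive class; these choices and all function names are illustrative, not the author's implementation. The bias trick from the text appears in `dense`, which appends 1 to its input.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + e^-x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def dense(W, h):
    """Return W @ [h; 1]: the last entry of each row of W acts as the bias."""
    h_aug = list(h) + [1.0]  # append the constant 1 (bias trick)
    return [sum(w * x for w, x in zip(row, h_aug)) for row in W]

def dropout(z, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p and rescale the
    survivors by 1/(1-p), so nothing changes at test time."""
    if not training:
        return list(z)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in z]

def predict(h, W_hidden, w_out, training=False, rng=random):
    """Sigmoid hidden layer z = g(W h), dropout, then a sigmoid output unit."""
    z = [sigmoid(a) for a in dense(W_hidden, h)]
    z = dropout(z, p=0.5, training=training, rng=rng)
    return sigmoid(dense([w_out], z)[0])  # score in (0, 1) for the positive class
```

With `W_hidden` of shape m × (d + 1) and `w_out` of length m + 1, the extra column in each weight row is exactly the absorbed bias described above.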

III. RESULTS

A. Experimental Results
For the sake of comparison, I followed the protocol proposed in [22]: the dataset was divided into training (70%) and testing (30%) sets, with the patients in the two sets being mutually exclusive. For training, I used 58 patients (17 benign and 41 malignant), and for testing, 24 patients (7 benign and 17 malignant). The adopted protocol was applied independently to each of the four available magnification factors (40X, 100X, 200X, and 400X). At the patient level, the recognition rate is computed as follows:

$$\text{Patient Recognition Rate} = \frac{\sum \text{Patient Score}}{\text{Total number of patients}} \tag{3}$$

where

$$\text{Patient Score} = \frac{N_{rec}}{N_P} \tag{4}$$

Here, $N_P$ is the number of cancer images of patient $P$, and $N_{rec}$ is the number of those images that are correctly classified.
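The patient-wise protocol and the patient-level recognition rate can be sketched as follows. The class-stratified random split, the seed, and the function names are assumptions for illustration; only the patient counts (17 benign and 41 malignant for training, 7 benign and 17 malignant for testing) come from the text.

```python
import random
from collections import defaultdict

def patient_disjoint_split(patients_by_class, n_train_by_class, seed=0):
    """Split patient ids class by class so that no patient is in both sets."""
    rng = random.Random(seed)
    train, test = set(), set()
    for label, pids in patients_by_class.items():
        pids = sorted(pids)  # deterministic order before shuffling
        rng.shuffle(pids)
        k = n_train_by_class[label]
        train.update(pids[:k])
        test.update(pids[k:])
    return train, test

def patient_recognition_rate(patient_ids, y_true, y_pred):
    """Average over patients of the patient score N_rec / N_P."""
    n_p = defaultdict(int)    # N_P: number of images of patient P
    n_rec = defaultdict(int)  # N_rec: correctly classified images of patient P
    for pid, t, p in zip(patient_ids, y_true, y_pred):
        n_p[pid] += 1
        n_rec[pid] += int(t == p)
    scores = [n_rec[pid] / n_p[pid] for pid in n_p]
    return sum(scores) / len(scores)
```

For instance, with one patient whose 4 images are 3/4 correct and another whose 2 images are 1/2 correct, the patient-level rate is (0.75 + 0.5) / 2 = 0.625, whereas plain image-level accuracy would be 4/6 ≈ 0.667: every patient weighs the same regardless of how many images they contribute.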
For the pretrained CNN, I explore the VGGm model [27], which is composed of 8 layers: five convolutional layers with filter banks of dimensions (number of filters × filter height × filter width: 96×7×7, 256×5×5, 512×3×3, 512×3×3, and 512×3×3) and three fully connected layers with the following numbers of nodes (fc1: 4096, fc2: 4096, and softmax: 1000). This network was pretrained on the ILSVRC-12 challenge dataset. I recall that the ImageNet dataset used in this challenge is composed of 1.2 million RGB images belonging to 1,000 classes, which describe general scenes and objects such as beaches, dogs, cats, cars, shopping carts, and minivans. As can be clearly seen, this auxiliary dataset is completely different from the histopathological images used in the experiments.
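As a check on the filter dimensions listed above, the sketch below counts the convolutional weights of VGGm, using the fact that each filter spans the full depth of its input (3 RGB channels for the first layer, then one channel per filter of the previous layer). It assumes no grouped convolutions and ignores biases; reading the features from fc2 would give d = 4096, which is an inference from the layer sizes rather than a figure stated in the text.

```python
# Conv filter banks of VGGm as listed in the text: (num_filters, height, width).
VGGM_CONV = [(96, 7, 7), (256, 5, 5), (512, 3, 3), (512, 3, 3), (512, 3, 3)]
VGGM_FC = {"fc1": 4096, "fc2": 4096, "softmax": 1000}

def conv_weight_counts(conv_layers, input_channels=3):
    """Weights per conv layer: n filters of size h x w over c input channels
    contribute n * h * w * c weights (biases not counted)."""
    counts = []
    depth = input_channels
    for n_filters, h, w in conv_layers:
        counts.append(n_filters * h * w * depth)
        depth = n_filters  # the next layer sees one feature map per filter
    return counts

counts = conv_weight_counts(VGGM_CONV)
feature_dim = VGGM_FC["fc2"]  # d if features are taken from fc2 (assumption)
```

The first layer is cheap (96 × 7 × 7 × 3 = 14,112 weights) because its input has only three channels, while the deeper 3×3 layers dominate because their inputs are 256 or 512 channels deep.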
For training the extra network placed on top of the pretrained CNN, I follow the recommendations of [28] for training neural networks. I set the dropout probability to 0.5 and use a sigmoid activation function for the hidden layer. For the backpropagation algorithm, I use mini-batch gradient descent with momentum, with the following parameters: learning rate 0.01, momentum 0.5, and mini-batch size 50. The weights of the network are initialized in the range [-0.005, 0.005].
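The optimizer settings above can be sketched as classical momentum SGD on shuffled mini-batches of 50, with initial weights in [-0.005, 0.005]. Drawing the initial weights from a uniform distribution and the exact momentum formulation (v ← μv − ηg, then w ← w + v) are assumptions, since the text gives only the range and the parameter values; the helper names are hypothetical.

```python
import random

# Hyperparameters quoted in the text.
LEARNING_RATE = 0.01
MOMENTUM = 0.5
BATCH_SIZE = 50
INIT_RANGE = 0.005

def init_weights(n_rows, n_cols, rng):
    """Draw every weight uniformly from [-INIT_RANGE, INIT_RANGE] (uniform is an assumption)."""
    return [[rng.uniform(-INIT_RANGE, INIT_RANGE) for _ in range(n_cols)]
            for _ in range(n_rows)]

def minibatches(n_samples, batch_size, rng):
    """Yield shuffled index batches; the last batch may be smaller."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    for start in range(0, n_samples, batch_size):
        yield idx[start:start + batch_size]

def momentum_step(weights, velocity, grads, lr=LEARNING_RATE, mu=MOMENTUM):
    """Classical momentum, in place: v <- mu * v - lr * g, then w <- w + v."""
    for w_row, v_row, g_row in zip(weights, velocity, grads):
        for j in range(len(w_row)):
            v_row[j] = mu * v_row[j] - lr * g_row[j]
            w_row[j] += v_row[j]
```

With momentum 0.5, a constant gradient g makes the step size grow from 0.01g towards 0.02g, so past updates keep contributing while old directions fade quickly.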

To present the results, I train a separate network for each magnification factor (40x, 100x, 200x, and 400x). Each experiment was repeated five times, as shown in Tables II-V, and the average accuracy of the proposed CNN method over the five runs is reported in Table VI, which shows that the proposed method achieves superior patient-level accuracy compared with state-of-the-art methods at the different magnification factors.
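Averaging the five repeated runs separately for each magnification, as done for Table VI, amounts to the small helper below. The standard deviation is an extra that the text does not report, and the numbers in the usage line are dummy placeholders, not results from Tables II-V.

```python
from statistics import mean, stdev

def summarize_runs(accuracy_by_magnification):
    """Mean and sample standard deviation of repeated-run accuracies,
    kept separate for each magnification factor."""
    summary = {}
    for mag, accs in accuracy_by_magnification.items():
        spread = stdev(accs) if len(accs) > 1 else 0.0  # stdev needs >= 2 values
        summary[mag] = (mean(accs), spread)
    return summary

# Dummy placeholder values, for illustration only.
example = summarize_runs({"40X": [0.80, 0.82, 0.84, 0.86, 0.88]})
```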

IV. CONCLUSION
This paper proposed a deep learning framework for breast cancer detection and classification. The results confirm that deep learning can yield large improvements over handcrafted features. Although the presented method achieves promising scores, it could benefit from further improvements, potentially by 1) customizing more deep models and 2) fusing several deep architectures in order to boost performance. Another direction is to adopt active learning in order to raise the classification scores. Finally, domain adaptation is another research line that could bring tangible improvements.

TABLE I. BREAST CANCER DATABASE USED IN THE EXPERIMENT

TABLE VI. THE PATIENT-LEVEL CLASSIFICATION ACCURACIES (%), COMPARING OUR METHOD WITH EXISTING RESULTS ON THE BREAKHIS DATASET