Intelligent System for Detection of Abnormalities in Human Cancerous Cells and Tissues

Recent advances in MML (Medical Machine Learning) have brought a significant change: traditional diagnostic procedures are being converted into DSS (Decision Support Systems). In particular, the classification problem of cancer discovery using DICOM (Digital Imaging and Communications in Medicine) images is one of the most important problems. For example, differentiating between the cancerous behaviours of chromatin deviations and nucleus-related changes in a finite set of nuclei may support the cytologist during the cancer diagnostic process. In order to assist doctors during cancer diagnosis, this paper proposes a novel algorithm, BCC (Bag_of_cancerous_cells), to select the most significant histopathological features from well-differentiated thyroid cancers. The methodology of the proposed system comprises three layers. In the first layer, data preparation is done by using BMF (Bag of Malignant Features), where each nucleus is separated with its related micro-architectural components and behaviours. In the second layer, a decision model is constructed by using a CNN (Convolutional Neural Network) classifier trained on histopathological behaviours such as BCP (Bags of Chromatin Patches) and BNP (Bags of Nuclei Patches). In the final layer, performance evaluation is done. A total of 4520 nuclei observations were used to train the decision models, of which BCP accounts for 2650 and BNP for 1870 instances. The best measured accuracy was 97.93% for BCP and 97.86% for BNP.
Keywords: Medical image mining; Decision support system; Pre-process; DICOM; FNAB


I. INTRODUCTION
The classification of histopathological images has recently become one of the active research areas of machine learning. Prediction of human cancerous cells and tissues is one of the most significant problems, because micro-architectural components are likely to show heterogeneous malignant behaviors and doctors face considerable confusion during the diagnosis phase. For example, the set of micro-architectural components for each cancer type (such as well-differentiated, poorly differentiated, benign and other cancers) shows deviated chromatin distribution, heterogeneous nuclei behaviors, varying evidence of an acentric nucleus within the set of nuclei, and so on. To resolve these problems, several medical CAD (Computer Aided Diagnosis) systems have been proposed in the recent past, e.g. [1], [2] and [3]. These approaches address the classification problems of medullary, papillary and follicular carcinomas, but cancerous behaviors such as chromatin-level distortion, diffuse nuclei deviations and other cancerous behaviors have not yet been reported in the literature. Efficient classification and feature selection of malignant behaviors would provide more dynamic assistance to doctors during the diagnostic phases, because DICOM (Digital Imaging and Communications in Medicine) images are heterogeneous in nature and are always found with different shapes, sizes and structures at the micro-architectural level, depending on the stage of the tumor, as shown in [Figure 1]. In order to assist cytologists in the early diagnosis of cancer and to classify the malignant behaviors, this paper proposes a system called "Intelligent system for detection of abnormalities in human cancerous cells and tissues", which provides in-depth hidden knowledge of nuclei behaviors by proposing a novel algorithm, BCC (Bag_of_cancerous_cells), where each nucleus is separated with its micro-architectural components, called BCP (Bags of Chromatin Patches) and BNP (Bags of Nuclei Patches). A total of 4520 nuclei were used to train the decision models, of which BCP accounts for 2650 and BNP for 1870 instances. The best accuracies for BCP and BNP were measured as 97.93% and 97.86%, respectively.
This paper is organized into five sections: section one gives the introduction and section two reports the related works. The methodology is described in detail in section three and the results are presented in section four. The conclusion and discussion are given in section five.

II. RELATED WORKS
This paper offers a predictive modelling approach that deals with the classification problem of thyroid malignant diseases. Histopathological DICOM (Digital Imaging and Communications in Medicine) images need special consideration in order to construct a decision model and to predict the behaviors of micro-architectural components, such as abnormal chromatin distribution and heterogeneity within the set of nuclei. Many research approaches have been proposed to solve the classification problem of cancerous cells, and a few of them are presented below.
A comparison [1] of different nuclei segmentation algorithms was proposed for thyroid disease. In the first stage, image clustering algorithms, i.e. K-means and watershed, were used as unsupervised learning; in the second stage, a template matching strategy was used for the classification model as supervised instance learning. The best accuracies recorded for the two techniques in the final layer were 72% and 87%, respectively. Micro-architectural components are very difficult to classify because of their heterogeneity in terms of shapes, sizes and behaviors; the approach proposed in this paper deals with histopathological images at a deeper level and provides more effective assistance to doctors during experimental setups in predicting the interrelated properties of tumors at an early stage.
A comparison [2] of three ML (Machine Learning) neural-network-based algorithms was conducted to analyze thyroid disease datasets in depth. The classification model was built by using the Scaled Conjugate Gradient, quasi-Newton, Gradient Descent with Momentum and Bayesian regularization algorithms, where gradient-based layered features were used, and the best accuracies were approximately 90.5%, 86.30% and 83.50%, respectively. Predicting the abnormal behaviors of cells at an early stage is an essential problem, but the diffuse shape of pixels does not allow the selection of the appropriate set of pixels belonging to the features of the regions of interest where cancerous material persists in the DICOM image. The algorithm proposed in this paper automatically detects and segments the cancerous regions by selecting features based on chromatin and nuclei behavior. Dynamic threshold segmentation allows doctors to detect various regions of medical images at a more granular level, whereas fixed threshold settings and intensities may fail to detect a proper set of related attributes and risk losing valuable image information; the approach proposed in this article deals effectively with in-depth medical image segmentation and provides more precise assistance to doctors.
A system [4] was proposed for thyroid disease diagnosis, in which MIL (Multiple Instance Learning) was used as the machine learning algorithm to predict the disease. A fully connected neuron model was constructed to classify cancerous thyroid disease tissues, and the best accuracy of the system was measured as 95.40%. Appropriate feature selection reduces the computational complexity of CNNs (Convolutional Neural Networks), because effective algorithms enhance the performance of a classifier. This paper makes the following contributions. It uses a state-of-the-art classification algorithm, the CNN (Convolutional Neural Network), and its preprocessing algorithm reduces the complexity of the pixel layers: every behavior is represented as a 28 × 28 pixel patch, whereas a full DICOM image is very large and requires significant time and memory.
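To make the 28 × 28 representation concrete, the following is a minimal sketch of reducing a cropped nucleus region to a fixed-size patch. The paper does not state which interpolation it uses; nearest-neighbour sampling, the function name `resize_nearest` and the synthetic crop are assumptions made for illustration.

```python
def resize_nearest(patch, out_h=28, out_w=28):
    """Resize a 2-D list of grey levels to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(patch), len(patch[0])
    resized = []
    for r in range(out_h):
        # Map each output row/column back to a source row/column.
        src_r = min(in_h - 1, r * in_h // out_h)
        row = [patch[src_r][min(in_w - 1, c * in_w // out_w)] for c in range(out_w)]
        resized.append(row)
    return resized


# Hypothetical 56 x 84 grey-scale nucleus crop (a synthetic gradient, not real data).
crop = [[(r + c) % 256 for c in range(84)] for r in range(56)]
patch_28 = resize_nearest(crop)
```

Whatever the size of the original crop, the classifier then receives 28 × 28 = 784 values per nucleus, which is what keeps the input layer small.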

III. METHODOLOGY
The methodology of this paper falls into the category of machine learning and offers a predictive modelling approach for classifying the biological behaviors of thyroid cancerous cells and tissues. The paper uses a real-world dataset of DICOM (Digital Imaging and Communications in Medicine) images from FNAB (Fine Needle Aspiration Biopsy), received from the cytology department of an affiliated hospital in Pakistan. The methodology of the proposed system comprises three major layers, as shown in [Figure 2]. In the first layer, datasets are prepared by using BCC (Bag_of_cancerous_cells). In the second layer, a decision model is constructed with a CNN (Convolutional Neural Network) classifier by training on the features selected from the malignant bags, BCP (Bags of Chromatin Patches) and BNP (Bags of Nuclei Patches), to classify the abnormal behaviors of nuclei. In the final layer, performance evaluation is performed.
Consider an image consisting of a set of heterogeneous attributes known as nuclei. Every nucleus in an image has its own behaviors and features. First, noise reduction is performed and unnecessary information is eliminated; the nuclei are then detected by using Otsu's global threshold method [Figure 3, B], where the grey-level intensities are the L levels (0, 1, …, L-1) and the threshold splits them into two classes, labelled 0 and 1. Pixels with the value one represent objects such as nuclei, while pixels with the value zero are treated as noise and subtracted from the image. Every pixel of the image is counted by its intensity, and a two-level threshold divides the pixels into the two classes.
Each pixel p in class 1 has the value 1, and the global variance may be calculated by taking the standard deviation of all pixels, eq. (3). Class 0 pixels may be obtained by using the standard deviation of the white pixels, whereas the pixels qualifying for class 1 are separated to obtain the white matter of the objects by the subtraction process of Otsu's method; the adaptive threshold is given in eq. (4). Pixels have to satisfy the threshold approximated by eq. (5). A function, eq. (6), is created, and the edges are detected by considering the HLT (High Level Threshold) at each connected corner of the nuclei, ranging from zero to infinity, eq. (7).
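The thresholding step can be illustrated with the textbook form of Otsu's method, which picks the grey level that maximises the between-class variance of the two classes. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 256-level default and the convention that pixels above the threshold form class 1 are all assumptions.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level k that maximises the between-class variance."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_k, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0 (levels 0..k)
    sum0 = 0  # sum of grey levels in class 0
    for k in range(levels):
        w0 += hist[k]
        sum0 += k * hist[k]
        w1 = total - w0
        if w0 == 0:
            continue  # class 0 still empty
        if w1 == 0:
            break     # class 1 empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_k = between, k
    return best_k


def binarize(pixels, k):
    """Label pixels above the threshold as class 1 (assumed foreground), others as class 0."""
    return [1 if v > k else 0 for v in pixels]


# Synthetic bimodal intensities for illustration only (not taken from the dataset).
pixels = [10] * 50 + [12] * 50 + [200] * 40 + [210] * 60
k = otsu_threshold(pixels)
mask = binarize(pixels, k)
```

Because the threshold is derived from the histogram of each image, it adapts to differences in staining and illumination, which is the property the text contrasts with fixed threshold settings.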
Consider the required intensity, where s represents the number of empty nuclei and t is the threshold of every nucleus, which is to be connected by a colour scheme according to homogeneity and heterogeneity, eq. (7) and eq. (8). Watershed segmentation is used to find these regions, where n is the number of connected components with minimum and maximum limits; the points lying beneath n form T(n), which is randomly filled and counted as the number of regions in the union.
A regional maxima library is used to extract the spatial attributes, based on the Euclidean distances between the instances. Nuclei separation is done by radii cuts, taking the centroids in the image as spatial locations. As shown in [Figure 3], the nuclei are separated and converted into grey-scale intensities: doctors use different staining materials, and removing the effect of these biomarkers is worthwhile because grey-scale images yield comparatively higher accuracies than colour-based nuclei in the classification layer. Consider a central location with corner information for each pixel, where h is the height of the nucleus and w is the width of the particular object, as shown in eq. (9). Every separated nucleus has a unique location in the dataset D, eq. (10).
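The idea of giving every separated nucleus a centroid, a height h and a width w can be sketched as follows. Plain 4-connected component labelling is used here as a simplified stand-in for the watershed and regional-maxima steps described above; the function name `label_nuclei`, the record fields and the toy mask are assumptions for illustration.

```python
from collections import deque


def label_nuclei(mask):
    """Return one record per 4-connected foreground region of a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if mask[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Breadth-first flood fill collects all pixels of this region.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            rs = [p[0] for p in pixels]
            cs = [p[1] for p in pixels]
            regions.append({
                "centroid": (sum(rs) / len(rs), sum(cs) / len(cs)),
                "top_left": (min(rs), min(cs)),
                "h": max(rs) - min(rs) + 1,
                "w": max(cs) - min(cs) + 1,
            })
    return regions


# Toy binary mask with two separate objects (illustrative only).
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
nuclei = label_nuclei(mask)
```

Each record can then be used to cut the corresponding patch out of the grey-scale image. Touching nuclei would be merged by this simple labelling, which is the case the watershed step is meant to handle.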
Cuts were made on the basis of the nuclei radii. The chromatin bags were prepared from the foreground of the nuclei images, and a second dataset, built from the background, was used to classify the nuclei bags.

B. Layer 2: Feature Selection & Classification Using Deep Learning
This paper presents the classification of cancerous bags, consisting of the chromatin bags and the abnormal nuclei bags. The output layer of the CNN (Convolutional Neural Network) acts as a linear classifier, since it uses a weight matrix W and a bias vector b. The input layer is responsible for collecting the input values of the image features and forwarding them to the neurons (the number of neurons or hidden layers depends on the complexity of the problem). The data supplied to the neurons is combined with hidden 'bias' weights for further processing. Owing to the neuron-like structure, each input is connected to the other input vectors and to the bias, and each class of nuclei behaviors is represented by a hyperplane in the vector space. Consider an input vector x and a stochastic variable Y.
The predicted class is the one with the maximum probability under the model obtained during the training layer.
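For a linear output layer with weight matrix W and bias vector b, the class probability is commonly written with the softmax function, and the prediction is the class of maximum probability. The form below is that standard formulation, assumed here to match the description; the row index notation W_i and b_i is introduced for illustration.

```latex
P(Y = i \mid x, W, b)
  = \mathrm{softmax}_i(Wx + b)
  = \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}},
\qquad
y_{\mathrm{pred}} = \arg\max_i \, P(Y = i \mid x, W, b)
```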
The parameters supplied to train the classifier maintain its state and are held in the shared variables W and b of P(Y | x, W, b), where x is a vector of cancer-bag features. Learning the optimal model amounts to minimizing a loss, and for the multi-class case the loss is evaluated over the dataset D with the parameters as defined.
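A common choice for such a multi-class loss is the negative log-likelihood; it is assumed here, since the text does not name the loss explicitly. With θ = {W, b} and training pairs (x^(i), y^(i)) in D, notation introduced for illustration, it reads:

```latex
\mathcal{L}(\theta = \{W, b\}, D)
  = \sum_{i=1}^{|D|} \log P\big(Y = y^{(i)} \mid x^{(i)}, W, b\big),
\qquad
\ell(\theta, D) = -\,\mathcal{L}(\theta, D)
```

Minimizing ℓ is equivalent to maximizing the likelihood of the correct class labels over the training set.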
The disease-related input variables are given as x, with classes Y. In the CNN deep learning model, all the training instances are fully connected to the output layer. Gradient-based minimization of the loss with respect to the parameters W and b can handle a large number of classes, but the computational complexity leads to very long training times because of the low processing capabilities of ordinary desktop computers.
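The gradient-based update of W and b can be sketched for the output layer alone. This is a minimal illustration assuming a softmax output and a mean negative log-likelihood loss, trained by plain gradient descent; it omits the convolutional layers, and the function names, the learning rate and the four toy feature vectors are assumptions rather than the paper's setup.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, b, x):
    scores = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    return softmax(scores)


def nll(W, b, data):
    """Mean negative log-likelihood of the true labels."""
    return -sum(math.log(predict_proba(W, b, x)[y]) for x, y in data) / len(data)


def gradient_step(W, b, data, lr=0.5):
    """One full-batch gradient descent step on the mean negative log-likelihood."""
    n_cls, n_feat, n = len(W), len(W[0]), len(data)
    gW = [[0.0] * n_feat for _ in range(n_cls)]
    gb = [0.0] * n_cls
    for x, y in data:
        p = predict_proba(W, b, x)
        for i in range(n_cls):
            err = p[i] - (1.0 if i == y else 0.0)  # d(loss)/d(score_i)
            gb[i] += err
            for j in range(n_feat):
                gW[i][j] += err * x[j]
    W = [[W[i][j] - lr * gW[i][j] / n for j in range(n_feat)] for i in range(n_cls)]
    b = [b[i] - lr * gb[i] / n for i in range(n_cls)]
    return W, b


# Toy two-feature, two-class data (hypothetical; not the paper's patches).
data = [([1.0, 0.0], 0), ([0.9, 0.2], 0), ([0.1, 1.0], 1), ([0.0, 0.8], 1)]
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
loss_before = nll(W, b, data)
for _ in range(100):
    W, b = gradient_step(W, b, data)
loss_after = nll(W, b, data)
```

Each step costs time proportional to the number of instances times classes times features, which gives a sense of why training on thousands of 784-pixel patches is slow on ordinary desktop hardware.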

C. Layer 3: Performance Evaluation
The confusion matrix and the precision and recall measures are used to evaluate the performance of the proposed system, as shown in [… 4] and [Figure 5], where the cumulative representation of instances obtained with the CNN classifier is shown. Pixel precision can be recalled by considering the probabilities retrieved for the variables associated with unknown inputs of unknown behaviors. The details of the performance evaluation are given in the following results section.

… 4] measures were recorded as about 97.15% and 98.90%. For the Non-Cancerous class label, 839 instances were classified and 11 were misclassified. The overall classification accuracy for Bags of Nuclei Patches was recorded as 97.86%. The cumulative classification accuracy of the system is 97.91%, and the precision and recall measures were estimated at about 98.70% and 96.56%, respectively. The estimated AUC (area under the curve) is presented in [Figure 6], where the AUC for the BCP behavior is measured as 0.9385 and the AUC for the BNP class is approximately 0.9915. The comparison of the approach proposed in this paper with the literature is presented in [Table 5], which shows that CNN classification produces higher accuracies.

… problem of histopathological images is one of the major problems. In this article, the thyroid classification problem is presented as a use case to describe the proposed system. The differentiation between abnormal human cancerous tissues requires special preprocessing techniques, because improper segmentation causes the loss of valuable information related to the chromatin and the set of nuclei. This paper proposes a system called "Intelligent system for detection of abnormalities in human cancerous cells and tissues", which provides in-depth classification of nuclei behaviors by proposing a novel algorithm, BCC (Bag_of_cancerous_cells), where each nucleus is separated with its micro-architectural components, i.e. BCP (Bags of Chromatin Patches) and BNP (Bags of Nuclei Patches). The proposed algorithm not only reduces the complexity of the classifier by detecting the objects as BMF (Bag of Malignant Features) but also assists doctors in classifying cancerous and non-cancerous quantities such as Bags of Chromatin Patches and Bags of Nuclei Patches. Convolutional Neural Networks were used to construct the classification models for both DICOM behaviors. A total of 4520 nuclei were used for training, of which BCP (Bags of Chromatin Patches) [Table 1] consists of 2650 nuclei observations, and the measured classification accuracy for BCP was 97.93%. The accuracy for BNP (Bags of Nuclei Patches), from the confusion matrix [Table 2], was measured as 97.86%. Additionally, various kinds of cancers could be quantified by using the proposed preprocessing algorithm to reduce the noise and to extract the appropriate features of interest. As future work, this article recommends addressing the classification problems of anaplastic cancers, which are the most aggressive cancers occurring in human organs such as the thyroid and ovary.

Fig. 1. Training set of nuclei with different behaviors: normal chromatin and abnormal chromatin, normal nuclei and abnormal nuclei, with respect to appearance.
The proposed system comprises three layers. In the first layer, image pre-processing techniques are used, in which noise reduction and feature selection are done by the proposed algorithm [algorithm 1], called BCC (Bag_of_cancerous_cells). In the second layer, bags of chromatin patches and bags of nuclei patches are used to train the classification model, which is based on a deep learning algorithm, the Convolutional Neural Network.

Fig. 2. Intelligent system for detection of abnormalities in human cancerous cells and tissues: workflow.
A. Layer 1: Data Pre-Processing & Noise Reduction
Dataset: Owing to the unavailability of histopathological FNAB (Fine Needle Aspiration Biopsy) datasets in the literature, real-world datasets were prepared for training and testing purposes. Classification of the abnormal behaviors of nuclei was performed at the deepest level by selecting chromatin distribution and other nuclei-related features, which not only provides meticulous assistance to doctors in the quantification of micro-architectural components but also helps to reduce the chances of misdiagnosis.
Noise Reduction: Noise reduction of DICOM (Digital Imaging and Communications in Medicine) images is done by using adaptive threshold segmentation. The proposed pre-processing algorithm is presented in detail as [algorithm 1] and [Figure 3].

Fig. 3. Noise reduction and object detection.
ALGORITHM 1: (Bag_of_cancerous_cells)
The edges [Figure 3] of each nucleus detected after the segmentation process are considered, and the edge position of every set of nuclei has to be found.

Table 5
The best measured accuracies for both cancerous behaviours are BCP (Bags of Chromatin Patches) 97.93% and BNP (Bags of Nuclei Patches) 97.86%.

TABLE I. CONFUSION MATRIX FOR BCP (BAGS OF CHROMATIN PATCHES)

TABLE II. OVERALL PERFORMANCE OF BCP (BAGS OF CHROMATIN PATCHES)

TABLE III. CONFUSION MATRIX FOR BNP (BAGS OF NUCLEI PATCHES)

TABLE IV. OVERALL PERFORMANCE OF BNP (BAGS OF NUCLEI PATCHES)
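Overall-performance figures of this kind are derived from the cells of a confusion matrix. The sketch below shows the usual definitions of accuracy, precision and recall, together with an instance-weighted combination of the two reported model accuracies. The function names and the confusion-matrix counts are hypothetical; only the instance counts (2650, 1870) and accuracies (97.93%, 97.86%) come from the text.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from the four cells of a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


def weighted_accuracy(parts):
    """Instance-weighted mean of accuracies given (n_instances, accuracy_percent) pairs."""
    n_total = sum(n for n, _ in parts)
    return sum(n * acc for n, acc in parts) / n_total


# Hypothetical counts for illustration only; they are not the paper's table values.
m = binary_metrics(tp=95, fp=5, fn=3, tn=97)

# Instance counts and accuracies as reported in the text for BCP and BNP.
combined = weighted_accuracy([(2650, 97.93), (1870, 97.86)])
```

Under this weighting the combined accuracy comes to about 97.90%, close to the 97.91% cumulative figure reported in the text.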