Automatic Diagnosing of Suspicious Lesions in Digital Mammograms

Breast cancer is the most common cancer and the leading cause of morbidity and mortality among women aged 50 to 74 worldwide. In this paper we propose a method that detects suspicious lesions in mammograms, extracts their features, and classifies them as normal or abnormal and as benign or malignant for the diagnosis of breast cancer. The method consists of two major parts: detection of regions of interest (ROIs), and diagnosis of the detected ROIs. It was tested on the Mini Mammography Image Analysis Society (Mini-MIAS) database. To assess performance, we used the FROC (Free-Response Receiver Operating Characteristic) curve for the detection part and the ROC (Receiver Operating Characteristic) curve for the diagnosis part. The detection part achieves a sensitivity of 94.27% at 0.67 false positives per image. The diagnosis part achieves 94.29% accuracy, with 94.11% sensitivity and 94.44% specificity, when classifying mammograms as normal or abnormal, and 94.4% accuracy, with 96.15% sensitivity and 94.54% specificity, when classifying them as benign or malignant.


I. INTRODUCTION
Breast cancer is the most common cancer and the leading cause of morbidity and mortality among women aged 50 to 74 worldwide. Recent statistics show that one in 8 women in the United States and one in 10 women in Europe develop breast cancer during their lifetime [1], [2]. Breast cancer is therefore a major public health problem, and the best strategy against it is early detection. Mammography remains the best and most accurate tool for early detection of breast cancer [2], [3]. Reading and interpreting mammograms is a crucial step; for this reason, the Breast Imaging-Reporting and Data System (BI-RADS) of the American College of Radiology (ACR) [4] provides a standardized classification system for reporting mammographic breast densities.
Faced with the growing number of mammograms in recent decades and the difficulty of reading and interpreting them, researchers have worked either on detecting breast lesions automatically through Computer-Aided Detection systems (commonly referred to as CADe) or on interpreting mammograms automatically through Computer-Aided Diagnosis systems (commonly referred to as CADx). These systems are used as a supplement to the radiologist's assessment.
Generally, a Computer-Aided Diagnosis (CAD) system for diagnosing suspicious regions in mammograms is developed in four steps: 1) Preprocessing: this step prepares the mammograms for the subsequent operations (segmentation, classification); 2) Detection of regions of interest: this step analyzes the mammogram and extracts the necessary information, for example by segmentation, which divides the mammogram into multiple segments, or by edge detection, which finds the edges of objects and helps locate regions of interest; 3) Feature extraction and selection for the detected ROIs: in this step, specific patterns, shapes, density, and texture are identified; 4) Classification of ROIs: this step classifies the mammograms as normal or abnormal and as malignant or benign [5], [6].
In this paper, we propose an automatic method to detect and diagnose suspicious lesions in mammograms. The results reported in Section VI show the efficiency of the proposed method and support its use in improving breast cancer detection and diagnosis.
The rest of the paper is organized as follows: Section II reviews related work; Section III presents the materials and method; Section IV describes feature generation and extraction; Section V details the proposed approach; Section VI reports the results and performance of the proposed method; Section VII concludes; and references are given at the end.

II. RELATED WORK
A number of methods have been proposed for detecting and diagnosing abnormalities in mammograms. They are generally grouped as statistical methods [7], wavelet-based methods [8], [9], methods based on Markov models [10], methods using machine learning [11], and others. Several studies have been published on computer-aided breast cancer detection and diagnosis. For example, K. Ganesan et al. [12] presented an overview of recent developments and advances in the field of computer-aided breast cancer diagnosis using mammograms. M. Veta et al. [13] presented an overview of methods that have been proposed for the analysis of breast cancer histopathology images.
Detection of ROIs is a key step in developing a computer-aided breast cancer diagnosis system. Many researchers have published work on segmenting breast tissue regions according to differences in density and texture in order to detect ROIs. For example, Adel et al. [14] segmented mammograms into three distinct regions (pectoral muscle, fatty regions, and fibroglandular regions) using Bayesian techniques with a Markov random field. Elmoufidi et al. [15], [16] developed a method to detect ROIs in mammograms using the LBP, K-Means, and GLCM algorithms. K. Hu et al. [2] published an approach to detect suspicious lesions in mammograms by adaptive thresholding based on multiresolution analysis.
In addition, many methods have been used for feature extraction and classification. For example, Veena et al. [17] proposed a CAD system for automatic detection and classification of suspicious lesions in mammograms. Nasseer et al. [18] developed an algorithm for classification of breast masses in mammograms using SVM. L. Jelen et al. [19] developed a method for classification of breast cancer malignancy using cytological images of fine needle aspiration biopsies. J. Malek et al. [20] proposed a system for automatic breast cancer diagnosis based on GVF-snake segmentation, wavelet feature extraction, and fuzzy classification. N. Székely et al. [21] used a hybrid system for detecting masses in mammographic images. The authors of [22] used an approach for mammogram segmentation by contour searching and massive lesion classification with a neural network. S. Timp et al. [23] developed computer-aided diagnosis with temporal analysis to improve radiologists' interpretation.
CAD systems are powerful tools that can help radiologists reach better results when diagnosing a patient.

III. MATERIALS AND METHOD
The proposed method was evaluated on the mini Mammography Image Analysis Society (mini-MIAS) database [24] and implemented using the Seed Region Growing (SRG) algorithm, the Local Binary Pattern (LBP) algorithm, and a support vector machine (SVM) classifier. SRG removes the pectoral muscle, LBP detects the regions of interest, and the SVM classifies the mammograms as normal or abnormal and as benign or malignant. SRG and LBP are two simple segmentation algorithms that are easy to implement. We use an SVM as the classifier because it provides an effective and flexible framework on which to base CAD techniques for breast mammograms [25].

A. Mammogram Database
To evaluate the proposed method we used the mini-Mammography Image Analysis Society (mini-MIAS) database [24]. The mammograms are stored in the gray-scale Portable Gray Map (PGM) file format, and every image is 1024 × 1024 pixels at a resolution of 200 microns.

B. Seed Region Growing (SRG)
The SRG segmentation algorithm, introduced by R. Adams et al. [25], is a simple and fast segmentation method that is free of tuning parameters. This makes it a good choice for easy implementation and for application to large datasets. Seed region growing segments an image into regions with respect to a set of N seeds, as presented in [12], [14].
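As an illustration of region growing from a seed, the following minimal Python sketch grows a single 4-connected region. It is a simplified variant and not the parameter-free formulation of [25]: it assumes one seed and a fixed intensity tolerance against the running region mean. The function name `seeded_region_grow` and the `tolerance` parameter are hypothetical and introduced only for this example.

```python
from collections import deque


def seeded_region_grow(image, seed, tolerance):
    """Grow a 4-connected region from `seed` (row, col).

    A neighbouring pixel joins the region when its grey level differs from
    the current region mean by at most `tolerance` (an assumed criterion).
    `image` is a list of rows of grey levels; returns a boolean mask.
    """
    rows, cols = len(image), len(image[0])
    sr, sc = seed
    in_region = [[False] * cols for _ in range(rows)]
    in_region[sr][sc] = True
    total, count = float(image[sr][sc]), 1  # running sum and size of the region
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not in_region[nr][nc]:
                if abs(image[nr][nc] - total / count) <= tolerance:
                    in_region[nr][nc] = True
                    total += image[nr][nc]
                    count += 1
                    queue.append((nr, nc))
    return in_region
```

In the setting of this paper, the grown region would correspond to the pectoral muscle, which is then removed from the mammogram.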

C. Local Binary Pattern (LBP)
The Local Binary Pattern (LBP) operator combines the characteristics of statistical and structural texture analysis. It is used to perform gray-scale invariant two-dimensional texture analysis. The LBP operator labels each pixel of an image by thresholding its neighborhood (e.g. 3 × 3) against the center value and reading the result of this thresholding as a binary number [7], [26]. Once all pixels have been labeled with their LBP codes, a histogram of the labels is computed and used as a texture descriptor. Formally, given a pixel at (x_c, y_c), the resulting LBP can be expressed in decimal form as follows:

$$\mathrm{LBP}_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(i_p - i_c)\, 2^p$$

where i_c and i_p are, respectively, the gray-level values of the central pixel and of the P surrounding pixels in the circular neighborhood of radius R, and the function s(x) is defined as:

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

D. Support Vector Machine (SVM)
The SVM classifier, which originated in the machine learning community, is a discriminative classifier formally defined by a separating hyperplane. The hyperplane is chosen so that its distance to the nearest data points on each side, called support vectors, is maximal [27]. SVM classifiers can be extended to data that are not linearly separable by applying a kernel function that makes the data linearly separable [28]. An approach using a wavelet SVM is discussed in [29]. Details about SVMs and their application to the diagnosis of breast cancer are discussed in [26], [30].
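Returning to the LBP operator of Section III-C, the labeling and histogram steps can be sketched in a few lines of Python for the 3 × 3 case (P = 8, R = 1). This is a minimal sketch under stated assumptions: the usual convention s(x) = 1 for x ≥ 0 and 0 otherwise, a clockwise neighbor ordering starting at the top-left pixel, and border pixels skipped. The function names are hypothetical.

```python
def lbp_code(image, r, c):
    """8-neighbour LBP code of the interior pixel (r, c).

    Neighbour p contributes 2**p when its grey level is >= the centre value,
    i.e. s(i_p - i_c) = 1 for i_p - i_c >= 0 (assumed convention).
    """
    center = image[r][c]
    # Neighbour order p = 0..7, clockwise from top-left (an assumption).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for p, (dr, dc) in enumerate(offsets):
        if image[r + dr][c + dc] >= center:
            code += 1 << p
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels (256 bins)."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

The 256-bin histogram returned by `lbp_histogram` plays the role of the texture descriptor mentioned in the text.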

IV. FEATURE GENERATION AND EXTRACTION
The eighteen features listed below are used as input parameters of the SVM classifier for training and testing the proposed method.
a) Mean Value: µ is the average of the pixel values in the segmented ROI:

$$\mu = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} I(i,j)$$

where I(i,j) is the pixel value at point (i,j) in an ROI of size M × N.
b) Standard Deviation: σ describes the dispersion within a local region.
c) Entropy: H describes the variation of the distribution within the ROI. It is computed from P_k, the probability of the k-th grey level, over the L grey levels.
d) Skewness: S is a number that characterizes the shape of the distribution. It is computed from the pixel values I(i,j), the mean µ, and the standard deviation σ.
e) Kurtosis: K measures the flatness of a distribution relative to a normal distribution.
f) Uniformity: U is a texture measure based on the histogram, computed from P_k, the probability of the k-th grey level.
g) Sum Entropy: SE is a logarithmic function of the ROI under consideration.
h) Sum Average: SA is computed from the ROI under consideration and the size of the gray scale.
i) Difference Variance: DV is a variance measure between the ROI intensities, calculated as a function of the SE computed previously.
j) Difference Entropy: DE is an entropy measure of non-uniformity that takes into account a difference measure obtained from the original image.
k) Inverse Difference Moment: IDM is a measure of the local homogeneity.
l) Area: A is the number of pixels (x) within the segmented ROI.
m) Perimeter: P is the length of a polygonal approximation of the boundary (B) of the ROI.
n) Convexity: C(S) is the ratio of the ROI's area to the area of its convex hull, where the convex hull is the smallest convex polygon that contains the ROI:

$$C(S) = \frac{A}{\mathrm{Area}(CH(S))}$$

where S is an ROI, CH(S) is its convex hull, and A is the ROI's area.
o) Compactness: C is a measure of the ROI's shape that indicates how compact the ROI is. It is computed from the ROI's perimeter P and area A.
p) Aspect Ratio: AR is the aspect ratio of the smallest window fully enclosing the ROI in both directions (see Fig. 2), computed from the height Dy and the width Dx of that window.
Fig. 2: Example of ROI window from which some features will be extracted.
q) Area Ratio: R_Area is the area of the segmented ROI in pixels divided by the area of the window shown in Fig. 2:

$$R_{Area} = \frac{Area_{ROI}}{Area_{window}}$$

where Area_window = Dx × Dy, Dx is the width and Dy is the height of the window. R_Area ranges from 0 to 1: it takes small values for ROIs with appendices and branches, and larger values for more compact and rounded objects.
r) Perimeter Ratio: R_Perim is the ratio of the perimeter of the segmented ROI to the perimeter of the same rectangular window of Fig. 2, both in pixels:

$$R_{Perim} = \frac{Perimeter_{ROI}}{Perimeter_{window}}$$
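Several of the features above can be illustrated with a short Python sketch. It covers the histogram-based statistics (mean, standard deviation, entropy, skewness, kurtosis, uniformity) and the window-based area ratio. The formulas are the common textbook forms and are assumptions wherever the text does not spell them out: population standard deviation, base-2 logarithm for the entropy, and kurtosis without subtracting 3. Function names are hypothetical.

```python
import math


def first_order_features(pixels, levels=256):
    """Histogram statistics of a flat list of integer grey levels in [0, levels)."""
    n = len(pixels)
    mean = sum(pixels) / n
    # Population standard deviation (normalisation by n is an assumption).
    std = math.sqrt(sum((v - mean) ** 2 for v in pixels) / n)
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    probs = [h / n for h in hist if h > 0]  # P_k for grey levels that occur
    entropy = -sum(p * math.log2(p) for p in probs)  # base 2 is an assumption
    uniformity = sum(p * p for p in probs)
    if std > 0:
        skewness = sum(((v - mean) / std) ** 3 for v in pixels) / n
        kurtosis = sum(((v - mean) / std) ** 4 for v in pixels) / n  # not excess
    else:
        skewness = kurtosis = 0.0  # flat ROI: report 0 by convention
    return {"mean": mean, "std": std, "entropy": entropy,
            "skewness": skewness, "kurtosis": kurtosis,
            "uniformity": uniformity}


def window_features(mask):
    """Area, enclosing-window size and area ratio of a non-empty boolean ROI mask."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, inside in enumerate(row) if inside]
    area = len(coords)
    dy = max(r for r, _ in coords) - min(r for r, _ in coords) + 1  # window height
    dx = max(c for _, c in coords) - min(c for _, c in coords) + 1  # window width
    return {"area": area, "dx": dx, "dy": dy, "area_ratio": area / (dx * dy)}
```

The co-occurrence features (SE, SA, DV, DE, IDM) and the perimeter-based shape features are left out of this sketch because their exact definitions are not recoverable from the text.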
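
V. OUR PROPOSED RESEARCH
In this paper, we propose a method for automatic detection and diagnosis of suspicious lesions in mammograms. The proposed method consists of four major blocks, namely preprocessing, detection of regions of interest, feature extraction, and diagnosis of the regions of interest.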

A. Preprocessing
Mammography can introduce additional objects into the resulting mammograms, such as artifacts, noise, and labels. According to [31], mammograms contain several sorts of noise and imaging artifacts. A preprocessing step is therefore applied to remove the extra objects and enhance the quality of the mammograms. In general, preprocessing prepares the mammograms for the following steps: segmentation of ROIs, selection and extraction of ROI features, and classification of ROIs. The aim of this step is to extract only the breast profile region, without additional objects and without background. First, a threshold value is used to remove the labels and other extraneous objects from the mammogram. Second, we use an automatic technique to remove the remaining background and to detect the orientation of the mammogram. Because the pectoral muscle lies in the top-left or top-right corner, the seed point of SRG is J[5,5] or J[5, y-5] (where J is the mammogram after background removal and [x,y] = size(J)), and we use a minimal threshold value that gives good results for all types of mammogram (fatty, fatty-glandular, dense-glandular). Third, we apply a 2D median filter over a 3-by-3 neighborhood to remove the remaining artifacts and noise. In addition, because mammograms are inherently low in contrast [1], we apply a contrast enhancement step (see Fig. 4).
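Two of the preprocessing operations described above, background suppression by thresholding and 3-by-3 median filtering, can be sketched in plain Python as follows. This is only an illustrative sketch: the threshold value is left to the caller because the text does not state it, border pixels are left unchanged by the filter (an assumption), and the function names are hypothetical.

```python
from statistics import median


def suppress_below(image, threshold):
    """Set every pixel darker than `threshold` to 0 (background suppression)."""
    return [[v if v >= threshold else 0 for v in row] for row in image]


def median_filter_3x3(image):
    """Replace each interior pixel by the median of its 3-by-3 neighbourhood.

    Border pixels are copied unchanged, which is an assumption of this sketch.
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = median(window)
    return out
```

A median filter is a natural choice here because it removes isolated bright or dark pixels while preserving edges better than a mean filter of the same size.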

C. Diagnosis of Regions of Interest (ROIs)
After the regions of interest have been detected and their features extracted, the next step is to classify them, first as normal or abnormal and then as benign or malignant. One novelty of the proposed method is that it detects all suspected areas in the mammogram (not only lesions) and treats them as regions of interest (ROIs). If no region of interest is detected, the mammogram is normal. When multiple ROIs are detected, we separate them, extract the features of each one individually, and then diagnose each one. First, if all ROIs belonging to the same mammogram are normal, the mammogram is normal; otherwise, the mammogram is abnormal. Second, if all ROIs belonging to the same mammogram are benign, the mammogram is benign; otherwise, the mammogram is malignant. In addition, the algorithm is able to detect both masses and calcifications.

1) Experimental results of diagnosis part:
The next three figures show details of the method described above. The buttons of the interface work as follows:
1) The "download" button loads a new mammogram.
2) The "Pre-processing" button applies the preprocessing step to the original mammogram (removing labels, noise, the pectoral muscle, and additional background).
3) The "Apply LBP" button applies the local binary pattern algorithm to the mammogram obtained after preprocessing.
4) The "Extract ROIs" button extracts all detected objects as ROIs. If only one ROI is found, only the "ROI1" button is enabled; if two ROIs are found, the "ROI1" and "ROI2" buttons are enabled, and so on.
5) The "ROI1" button selects the first ROI, the "ROI2" button selects the second ROI, and so on. The "Clac-features" button extracts the features of the ROI selected in the previous step.
6) The "add-feature" button adds the features to our database.
7) The "Classify" button classifies the selected ROI as normal, benign, or malignant. If the selected ROI is normal, a white button containing the text "normal" appears on the screen; if it is benign, a green button containing the text "benign" appears; if it is malignant, a red button containing the text "malignant" appears.
When several ROIs are found, they are classified one by one: if all the ROIs are normal, the mammogram is normal; if at least one ROI is benign and none is malignant, the mammogram is benign; if at least one ROI is malignant, the mammogram is malignant.
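The rule that combines per-ROI labels into a mammogram-level diagnosis can be written down directly. The short Python sketch below follows the rule as stated in the text; the function name and the string labels are hypothetical choices made for the example.

```python
def diagnose_mammogram(roi_labels):
    """Combine per-ROI labels into one mammogram-level label.

    `roi_labels` is a list of "normal", "benign" or "malignant" strings,
    one per detected ROI. An empty list (no ROI detected) means normal.
    """
    allowed = {"normal", "benign", "malignant"}
    unknown = set(roi_labels) - allowed
    if unknown:
        raise ValueError(f"unexpected ROI labels: {sorted(unknown)}")
    if "malignant" in roi_labels:
        return "malignant"  # at least one malignant ROI
    if "benign" in roi_labels:
        return "benign"     # at least one benign ROI and no malignant ROI
    return "normal"         # all ROIs normal, or no ROI at all
```

Because the most severe label wins, a single malignant ROI determines the outcome regardless of how many normal or benign ROIs accompany it.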

VI. RESULTS AND PERFORMANCE
The complete method was evaluated on the 322 mammograms of the mini-MIAS database. Details about the mini-MIAS database are given above.

A. Performance of detection part
Every segmentation and classification result requires a performance evaluation. In general, there are three types of performance evaluation for algorithms and approaches proposed for medical image processing (detection of regions of interest, segmentation, and classification): qualitative assessment, quantitative assessment against the ground truth, and statistical evaluation [31].
Detecting and selecting the suspicious regions in a mammogram is a crucial step in developing a CAD system, and detecting many regions of interest that turn out to be false positives results in a weak system. We therefore consider an ROI correctly detected if its area overlaps the ground truth by at least 75%. We obtained good detection results: 100% for MISC and 95.45% for CIRC. The detection rate for SPIC (89.47%) is lower because some SPIC lesions overlap the ground truth by less than 75% and are therefore counted as false negatives. Overall, we obtained a sensitivity of 94.27% at 0.67 false positives per image in the detection stage. The evaluation procedure is as follows. The database is divided into two parts: the first, used for training, contains half of the database (161 of the 322 mammograms), selected at random; the second, used for testing, contains the remaining 161 mammograms. The distribution of the database between training and testing is given in Table II.

B. Performance of diagnosing part
We evaluated the performance of the proposed method by calculating accuracy, sensitivity, and specificity for the normal-or-abnormal case and the benign-or-malignant case. The diagnosis part of our method achieved 94.29% accuracy, with 94.11% sensitivity and 94.44% specificity. Fig. 12 shows the ROC curve of the proposed method.

Authors | Title | Accuracy
... [2] | ... in mammograms | ...
Veena et al. [17] | Detection & Classification of Suspicious Lesions in Mammograms | 92.13%
Nasseer et al. [18] | Classification of Breast Masses in Digital Mammograms | 93.069%
K. Ganesan et al. [27] | Classification of Mammograms Using Trace Transform Functionals | 92.48%
Our method | Automatic Diagnosing of Suspicious Lesions in Digital Mammograms | 94.29%
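The accuracy, sensitivity, and specificity reported in this section follow the usual confusion-matrix definitions, which the Python sketch below makes explicit. The counts in the usage line are hypothetical and are not taken from the paper, which does not list per-class counts in the text; they were chosen only because they yield percentages close to the reported ones.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    tp/fn: positive (e.g. abnormal) cases classified correctly/incorrectly.
    tn/fp: negative (e.g. normal) cases classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only.
sens, spec, acc = classification_metrics(tp=16, fn=1, tn=17, fp=1)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} accuracy={acc:.4f}")
```

With these illustrative counts the three values are 16/17, 17/18, and 33/35 respectively.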

VII. CONCLUSION
In this paper, an automatic algorithm for breast cancer detection and diagnosis was implemented in the MATLAB environment. Its performance was evaluated using the FROC curve for the detection part and the ROC curve for the diagnosis part. The results show that the method is efficient and comparable to other solutions. The proposed algorithm helps address a main difficulty of the mammographic diagnostic procedure, namely the detection and diagnosis of masses and calcifications. The efficiency of the proposed method supports its use in improving Computer-Aided Diagnosis systems.
The mini-MIAS images are 1024 × 1024 pixels at a resolution of 200 microns. The database is composed of 322 mammograms of right and left breasts from 161 patients, of which 207 mammograms are diagnosed as normal and 115 as abnormal (22 images of CIRC: well-defined/circumscribed masses; 19 images of SPIC: spiculated masses; 19 images of ARCH: architectural distortion; 15 images of ASYM: asymmetry; 26 images of CALC: calcification; and 14 images of MISC: other, ill-defined masses). Of the abnormal mammograms, 52 are malignant and 63 are benign. Fig. 1 shows the different objects in the mammograms.

Fig. 7: Detected ROIs containing the lesion and false positives: (a) original mammograms, (b) mammograms after preprocessing, (c) after applying the LBP algorithm, (d) detected regions of interest.
2) Example of ROIs automatically detected: The figure below shows some automatically detected ROIs.

a) Example 1:
A normal mammogram correctly detected and diagnosed.
The FROC curve plots the True Positive Fraction (TPF) against the number of False Positives per Image (FP/I), which is defined as:

$$\text{False Positives per Image} = \frac{\text{Number of false positives}}{\text{Number of images}}$$
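One operating point of the FROC curve can be computed as in the following Python sketch, which also applies the 75% overlap criterion used in the detection evaluation. It is a sketch under stated assumptions: regions are represented as sets of (row, column) pixels, the overlap is measured as the fraction of the ground-truth area covered by the detection (the text does not say which area is the denominator), and a second detection of an already matched lesion is counted neither as a true positive nor as a false positive. Function names are hypothetical.

```python
def overlap_fraction(detected, truth):
    """Fraction of the ground-truth pixels covered by the detected region."""
    return len(detected & truth) / len(truth)


def froc_point(detections_per_image, truths_per_image, min_overlap=0.75):
    """Return (TPF, FP/I) for one operating point.

    Each argument is a list with one entry per image; every entry is a list
    of regions, and every region is a set of (row, col) pixel coordinates.
    """
    true_positives = false_positives = n_lesions = 0
    for detections, truths in zip(detections_per_image, truths_per_image):
        n_lesions += len(truths)
        matched = set()  # indices of lesions hit in this image
        for det in detections:
            hit = next((i for i, t in enumerate(truths)
                        if overlap_fraction(det, t) >= min_overlap), None)
            if hit is None:
                false_positives += 1
            else:
                matched.add(hit)
        true_positives += len(matched)
    tpf = true_positives / n_lesions if n_lesions else 0.0
    fp_per_image = false_positives / len(detections_per_image)
    return tpf, fp_per_image
```

Sweeping a detection threshold and recomputing this pair at each setting traces out the full FROC curve.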

TABLE I: Results obtained, grouped by anomaly class.

TABLE II: Number of images used to train the SVM classifier.

TABLE III: Diagnostic accuracy for normal or abnormal cases.

TABLE IV: Diagnostic accuracy for benign or malignant cases.

TABLE V: Comparison of our method's performance with recently published articles.