An Efficient Method for Breast Mass Segmentation and Classification in Mammographic Images

According to the World Health Organization, breast cancer is the main cause of cancer death among women in the world. There is currently no effective way of preventing this disease. Thus, early screening and detection remain the most effective means of raising treatment success rates and reducing death rates due to breast cancer. Mammography is still the most widely used diagnostic and screening tool for early breast cancer detection. In this work, we propose a method to segment and classify masses using the regions of interest of mammographic images. Mass segmentation is performed using a fuzzy active contour model obtained by combining Fuzzy C-Means and the Chan-Vese model. Shape and margin features are then extracted from the segmented masses and used to classify them as benign or malignant. The generated features are usually imprecise and reflect an uncertain representation. Thus, we propose to analyze them using possibility theory to deal with their imprecise and uncertain aspects. The experimental results on Regions Of Interest (ROIs) extracted from the MIAS database indicate that the proposed method yields good mass segmentation and classification results.

Keywords: Mammography; breast mass; mass segmentation; fuzzy active contour; mass classification; possibility theory


I. INTRODUCTION
Breast cancer is the most common form of cancer in the world. It has become the second leading cause of cancer death in women after lung cancer. According to the World Health Organization (WHO), there were 1.7 million newly diagnosed breast cancer cases in the world in 2012 [1]. Moreover, between 2008 and 2012, breast cancer incidence increased by 20%, while mortality rose by 14%. Such statistics motivate researchers to design new tools for the early detection and diagnosis of breast cancer.
Computer-Aided Diagnosis (CADx) systems have been developed to reduce the experts' workload and to help them in the early detection of breast cancer [2]. Such systems generally involve four phases: preprocessing, segmentation, feature extraction and selection, and classification [3]. Each of these phases should be performed appropriately, since the performance of each stage can affect that of the subsequent stages [4].
Breast masses are the most important indicators of malignancy that can be present in mammography. It is often difficult to distinguish this type of abnormality from the surrounding parenchyma. Thus, its automated segmentation and classification is a challenging task. There is extensive literature on mass segmentation methods. These methods can be grouped into several families: thresholding-based, region growing-based, clustering-based and active contour-based techniques.
Thresholding-based techniques can be classified into global and local thresholding. Global thresholding focuses on global information such as the histogram of the mammogram [5]. When masses are sufficiently brighter than the surrounding tissue, it is possible to use a global threshold. In contrast, local thresholding determines a threshold value for each pixel based on the intensity values of its neighboring pixels. Thresholding techniques have been widely used for mammographic mass segmentation [6]-[8].
Region growing-based techniques start from an initial seed point and group pixels with similar characteristics to divide the mammographic image into homogeneous regions. Cao et al. [9] proposed an adaptive region growing method with a hybrid assessment function that combines maximum likelihood analysis and maximum gradient analysis. This method is used to segment mammographic masses. Berber et al. [10] proposed an extension of the classical seeded region growing for mass segmentation in mammographic regions of interest. In that work, the threshold value is adjusted adaptively based on an estimate of the mass size to prevent over- and under-segmentation.
Clustering-based techniques classify mammogram pixels by grouping those with similar properties into a set of clusters. Sampaio et al. [11] use cellular neural networks to develop a computational methodology for mass segmentation. In [12], the authors propose an extension of the K-means method. The disadvantage of clustering-based techniques is that the number of clusters must be set manually.
Active contour-based techniques can be classified into snakes and level sets. The difference between these two types lies in their mathematical implementation: the boundary evolves explicitly in a snake, whereas it evolves implicitly in a level set. There are numerous studies on mass segmentation using level set methods [13]-[15].
Mass classification is a key technology in CADx systems. It is very useful in early breast cancer detection and it can prevent unnecessary biopsies [2]. Several studies have investigated mass classification. Gorgel et al. [16] use the support vector machine (SVM) method to classify the segmented masses as benign or malignant. The segmentation was performed using a local seed region growing (LSRG) algorithm. In [17], the authors combine both texture and shape features to classify masses by using SVM and ELM networks with modified kernels. Liu et al. [2] performed mass classification using selected geometry and texture features, and a new SVM-based feature selection method.
In this paper, we propose a novel method for the automatic segmentation and classification of mammographic masses. A general flowchart of the proposed method is outlined in Fig. 1. Mass segmentation is based on the Chan-Vese model. Considering the fact that masses have fuzzy boundaries, we propose to deal with this problem using fuzzy logic by integrating fuzzy membership values in the Chan-Vese model. This leads to a fuzzy Chan-Vese model. The estimation of the fuzzy membership values is performed using the Fuzzy C-Means method.
The classification of the segmented masses depends essentially on their shape. Benign masses are usually round or oval with smooth contours. In contrast, malignant masses generally have an irregular shape with lobulated or spiculated margins. This knowledge suffers from imprecision and ambiguity. Thus, we propose to deal with the problem of mass classification using geometry features while taking into account the uncertainty linked to the degree of truth of the available information and the imprecision related to its content.
This paper is organized as follows: In Section II, we describe the proposed method for mass segmentation. In Section III, we present the features extracted from the segmented masses. In Section IV, we describe the steps followed to build a possibility knowledge base and we present the method used for mass classification. In Section V, we provide the results obtained using the proposed method. Finally, Section VI presents a conclusion of this work.

II. MASS SEGMENTATION
The proposed mass segmentation method is based on the Chan-Vese active contour, a region-based active contour capable of segmenting objects whose boundaries are not defined by gradient. Nevertheless, this model has two disadvantages: leaking, which arises when the mass margins are fuzzy and ambiguous, and an increase in false positives in the presence of tissue homogeneity. To overcome these problems, we propose a fuzzy version of the Chan-Vese model which is able to reject "weak" local minima and to handle objects with discontinuous boundaries. The Fuzzy C-Means method is used to build the fuzzy Chan-Vese model. In this section, we start by presenting the conventional Chan-Vese model and the Fuzzy C-Means method.

A. Previous Methods and Background
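a) The Chan-Vese Model: Chan and Vese [18] proposed an active contour model without edges, which allows segmenting objects whose boundaries are not defined by gradient. This model assumes that the image is formed by two regions of approximately piecewise-constant intensities [19]. The energy function of the model is defined as follows:

$$E(c_1, c_2, C) = \mu \cdot \mathrm{Length}(C) + \lambda_1 \int_{\mathrm{inside}(C)} |I(x, y) - c_1|^2 \, dx \, dy + \lambda_2 \int_{\mathrm{outside}(C)} |I(x, y) - c_2|^2 \, dx \, dy \quad (1)$$

where $I$ is the image; $\mu$, $\lambda_1$ and $\lambda_2$ are fixed positive parameters; $c_1$ and $c_2$ are the mean intensity values inside and outside the curve $C$, respectively.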
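As an illustration of how this energy behaves, the following minimal sketch evaluates the standard Chan-Vese energy for a given binary partition of a small image. It is not the authors' implementation: the function name `chan_vese_energy`, the use of plain Python lists, and the approximation of the contour length by counting label changes between 4-connected neighbours are all assumptions made for this example.

```python
def chan_vese_energy(image, inside, mu=1.0, lam1=1.0, lam2=1.0):
    """Evaluate the piecewise-constant Chan-Vese energy of a binary partition.

    image  : 2-D list of pixel intensities
    inside : 2-D list of booleans, True for pixels inside the curve C
    """
    in_vals = [p for row, mrow in zip(image, inside)
               for p, m in zip(row, mrow) if m]
    out_vals = [p for row, mrow in zip(image, inside)
                for p, m in zip(row, mrow) if not m]

    # c1 and c2 are the mean intensities inside and outside the curve.
    c1 = sum(in_vals) / len(in_vals) if in_vals else 0.0
    c2 = sum(out_vals) / len(out_vals) if out_vals else 0.0

    # Data-fidelity terms: squared deviation from the regional means.
    fit_in = sum((p - c1) ** 2 for p in in_vals)
    fit_out = sum((p - c2) ** 2 for p in out_vals)

    # Assumption: Length(C) is approximated by the number of label
    # changes between horizontally and vertically adjacent pixels.
    rows, cols = len(inside), len(inside[0])
    length = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and inside[r][c] != inside[r][c + 1]:
                length += 1
            if r + 1 < rows and inside[r][c] != inside[r + 1][c]:
                length += 1

    return mu * length + lam1 * fit_in + lam2 * fit_out
```

A partition that matches a bright homogeneous object has a much lower energy than one that cuts through it, which is what drives the contour evolution.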
b) The Fuzzy C-Means method: The Fuzzy C-Means [20] is an iterative unsupervised fuzzy clustering algorithm which uses the concepts of fuzzy set theory and fuzzy logic to provide a fuzzy partition of the image. It is based on minimizing the following objective function:

$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} \mu_{ij}^{m} D_{ij}^{2} \quad (2)$$

where:
• $C$ is the number of classes and $N$ is the number of pixels;
• $\mu_{ij}$ is the degree of membership of the pixel $x_i$ to the class $j$;
• $m \in [1, \infty[$ is a fuzziness factor which is used to control the fuzziness of the obtained partitions;
• $D_{ij}$ is the Euclidean distance between the pixel $x_i$ and the class center $\nu_j$.

Membership functions $\mu_{ij}$ and class centers $\nu_j$ are updated iteratively using the following formulas:

$$\mu_{ij} = \frac{1}{\sum_{k=1}^{C} \left( D_{ij} / D_{ik} \right)^{2/(m-1)}} \quad (3)$$

$$\nu_j = \frac{\sum_{i=1}^{N} \mu_{ij}^{m} x_i}{\sum_{i=1}^{N} \mu_{ij}^{m}} \quad (4)$$

The iteration stops when the change in the membership values between two successive iterations falls below a predefined threshold.

The proposed mass segmentation method consists of three main steps. First, the ROI is preprocessed to enhance the contrast. Next, two fuzzy membership values are estimated using the Fuzzy C-Means algorithm. Finally, these fuzzy membership values are used to modify the energy of the Chan-Vese model and to perform the final segmentation.

B. Preprocessing
Since mammographic images are poor in contrast, a preprocessing step is necessary to enhance the contrast of the ROIs. Thus, gamma correction, which is a nonlinear transformation, is applied to change the luminance of these ROIs. Its transformation function is as follows [21]:

$$I_{out} = c \cdot I^{\gamma} \quad (6)$$

where $I$ is the input image, $I_{out}$ is the output image, and $c$ and $\gamma$ are parameters controlling the shape of the transformation curve. Fig. 2 shows how intensity values are mapped with different values of $\gamma$. From this figure, it can be seen that a $\gamma$ value greater than 1 yields an image with a higher contrast. In our experiments, we set the value of $\gamma$ to 4.
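The power-law mapping can be sketched in a few lines. This is only an illustration: the function name `gamma_correct` is hypothetical, and the normalisation of intensities to [0, 1] before applying the exponent (with rescaling to `max_value` afterwards) is an assumption, since the intensity range used in the experiments is not stated.

```python
def gamma_correct(image, gamma=4.0, c=1.0, max_value=255.0):
    """Apply I_out = c * I**gamma to a 2-D list of intensities.

    Assumption: intensities are first normalised to [0, 1] by dividing
    by max_value, then rescaled back to [0, max_value].
    """
    return [[max_value * c * (p / max_value) ** gamma for p in row]
            for row in image]


roi = [[0.0, 127.5, 255.0]]
print(gamma_correct(roi))  # mid-grey values are pushed strongly towards black
```

With a value of gamma greater than 1, dark and mid-range intensities are compressed while bright regions such as masses keep most of their intensity, which increases their contrast with the surrounding tissue.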

C. Fuzzy Membership Estimation
The aim of this step is to determine the membership degrees $\mu_M(x, y)$ and $\mu_B(x, y)$ of each pixel of coordinates $(x, y)$ to the classes "Mass" and "Background". In this paper, we propose an estimation process based on the Fuzzy C-Means algorithm [20]. The steps followed to achieve this objective are:
1) Initialize the fuzzy membership matrix $\mu_{ij}$;
2) Calculate the cluster centers using (4);
3) Update the fuzzy membership matrix $\mu_{ij}$ using (3);
4) Return to Step 2 until convergence or until the maximum number of iterations is reached.
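The four steps above can be sketched as follows for scalar pixel intensities. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `fuzzy_c_means`, the random initialisation, the default fuzziness factor m = 2, the tolerance `tol` and the convergence test on the largest membership change are all choices made for this example.

```python
import random


def fuzzy_c_means(pixels, n_clusters=2, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Fuzzy C-Means on a flat list of intensities (requires m > 1).

    Returns (u, centers), where u[i][j] is the membership of pixel i
    to cluster j and centers[j] is the centre of cluster j.
    """
    rng = random.Random(seed)

    # Step 1: random memberships, each row normalised to sum to 1.
    u = []
    for _ in pixels:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = [0.0] * n_clusters
    for _ in range(max_iter):
        # Step 2: cluster centres as membership-weighted means.
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(len(pixels))]
            centers[j] = sum(w * x for w, x in zip(weights, pixels)) / sum(weights)

        # Step 3: membership update from the distances to the centres.
        new_u = []
        for x in pixels:
            dist = [abs(x - v) for v in centers]
            if min(dist) == 0.0:
                # Pixel lies exactly on a centre: share full membership
                # among the centres at zero distance.
                zeros = [d == 0.0 for d in dist]
                row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
            else:
                exponent = 2.0 / (m - 1.0)
                row = [1.0 / sum((dist[j] / dist[k]) ** exponent
                                 for k in range(n_clusters))
                       for j in range(n_clusters)]
            new_u.append(row)

        # Step 4: stop when the memberships no longer change noticeably.
        change = max(abs(a - b) for ra, rb in zip(new_u, u)
                     for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break

    return u, centers
```

For a ROI, the flattened pixel list would be clustered with `n_clusters=2`. Taking the cluster with the brighter centre as "Mass" is an additional assumption, motivated by masses usually appearing brighter than the background.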

D. Segmentation using a Fuzzy Active Contour Model
After estimating the fuzzy memberships, the proposed method performs mass segmentation using a fuzzy active contour model. This model is based on the Chan-Vese model. The energy of each pixel, which is formulated as $|I(x, y) - c_i|^2$ in the Chan-Vese model, is weighted by the corresponding membership value $\mu_M(x, y)$ or $\mu_B(x, y)$, so that the energy becomes:

$$E(c_1, c_2, C) = \mu \cdot \mathrm{Length}(C) + \lambda_1 \int_{\mathrm{inside}(C)} \mu_M(x, y) \, |I(x, y) - c_1|^2 \, dx \, dy + \lambda_2 \int_{\mathrm{outside}(C)} \mu_B(x, y) \, |I(x, y) - c_2|^2 \, dx \, dy \quad (7)$$

where $I$ is the image, $C$ is the evolving curve, and $c_1$ and $c_2$ are the mean intensity values inside and outside $C$, as in the Chan-Vese model.

Shape and margin features are the most important features for differentiating between benign and malignant masses. Malignant masses have spiculated or microlobulated boundaries and an irregular shape, whereas benign masses have smooth boundaries and a round shape [22] (Fig. 3). In this paper, nine shape features are extracted: circularity, compactness, rectangularity, and six normalized radial length (NRL)-based features (mean, standard deviation, entropy, area ratio, zero crossing, roughness). These features are listed in Table I.

Fig. 3. Classification of breast masses according to their contours.
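Several of the shape features named above can be computed directly from a binary mask of the segmented mass. The sketch below is illustrative only and rests on assumptions that are not stated in the text: the function name `shape_features` is hypothetical, contour pixels are taken to be mass pixels with at least one 4-connected background (or out-of-image) neighbour, and the minimum bounding rectangle is approximated by the axis-aligned bounding box.

```python
import math


def shape_features(mask):
    """Compute a few shape features from a non-empty 2-D boolean mask."""
    rows, cols = len(mask), len(mask[0])
    pixels = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]

    def is_contour(r, c):
        # A mass pixel is on the contour if any 4-neighbour is background
        # or falls outside the image.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or not mask[nr][nc]:
                return True
        return False

    contour = [(r, c) for r, c in pixels if is_contour(r, c)]
    area = len(pixels)            # A: number of pixels of the mass
    perimeter = len(contour)      # P: number of contour pixels

    # Area of the circle having the same perimeter as the mass.
    circle_area = perimeter ** 2 / (4 * math.pi)

    # Assumption: axis-aligned bounding box as the bounding rectangle.
    rs = [r for r, _ in pixels]
    cs = [c for _, c in pixels]
    box_area = (max(rs) - min(rs) + 1) * (max(cs) - min(cs) + 1)

    # Normalized radial length: centroid-to-contour distances divided
    # by the largest of these distances.
    cy, cx = sum(rs) / area, sum(cs) / area
    radial = [math.hypot(r - cy, c - cx) for r, c in contour]
    max_radial = max(radial)
    nrl = [d / max_radial for d in radial] if max_radial > 0 else [0.0] * len(radial)
    nrl_mean = sum(nrl) / len(nrl)
    nrl_std = math.sqrt(sum((d - nrl_mean) ** 2 for d in nrl) / len(nrl))

    return {
        "area": area,
        "perimeter": perimeter,
        "circularity": area / circle_area,
        "compactness": perimeter ** 2 / area,
        "rectangularity": area / box_area,
        "nrl_mean": nrl_mean,
        "nrl_std": nrl_std,
    }
```

Because the perimeter is a pixel count, it underestimates the true contour length of small or diagonal shapes, so the circularity of such masks can slightly exceed 1. A sub-pixel contour tracer would reduce this effect.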

TABLE I

| Feature | Description | Definition |
|---|---|---|
| Circularity [23] | Represents how similar a mass is to a circle. | $A / A_C$, where $A$ is the area of the mass, given by the number of pixels inside its contour, and $A_C$ is the area of the circle having the same perimeter as the mass. |
| Compactness [24] | A measure of contour complexity versus the enclosed area. | $Com = P^2 / A$, where $P$ is the perimeter of the mass, measured by counting the pixels on its contour, and $A$ is its area. |
| Rectangularity [23] | Represents the degree of resemblance between the mass and a rectangle. | $A / A_R$, where $A$ is the area of the mass and $A_R$ is the area of the minimum bounding rectangle. |
| NRL mean [25] | Mean of the normalized radial length. | Computed from $d(i)$, the Euclidean distance from the mass center to each boundary point, normalized by dividing by the maximum radial distance. |
| NRL standard deviation [25] | Represents the variance of the NRL around a circle whose radius is the NRL mean. | |
| NRL entropy [25] | A probabilistic measure determining how well the radial length of the mass could be estimated. It includes both the idea of roundness and tumor roughness. | Computed from $p_k$, the probability of an NRL value relative to the total number of radials. |
| NRL area ratio [25] | Evaluates the percentage of the mass located outside the circle defined by the NRL mean. Thus, it allows discriminating between masses with smooth and spiculated contours. | |
| NRL zero crossing [25] | Represents the number of times the line plot crosses the mean NRL. | |
| NRL roughness [25] | Allows isolating the macroscopic mass shape from the structure of the fine contours. | |

IV. BUILDING THE POSSIBILITY DISTRIBUTIONS OF THE FEATURES AND MASS CLASSIFICATION
In order to build a possibility knowledge base, the mass description, which is formulated by the extraction of shape and margin features, should be transformed into a possibility description. Thus, a possibility distribution should be estimated for each feature and each class. In this work, the generation of the initial possibility distributions and the choice of their shape and parameters are based on the knowledge expressed by the expert. Fig. 4 shows the possibility distributions of the circularity feature.
The obtained distribution functions are then modified by considering a score that evaluates the pertinence of each feature. Many feature selection methods that use a ranking criterion to score the features are available in the literature [26]. In our proposed method, the Fisher score is used for feature selection due to its good performance. It evaluates each feature individually by measuring the degree of class separability with the following formula:

$$F_i = \frac{\sum_{c} \eta_c \left( \mu_c^i - \mu^i \right)^2}{\sum_{c} \eta_c \left( \sigma_c^i \right)^2}$$

where $\eta_c$ is the number of elements belonging to the class $c$; $\mu_c^i$ and $\sigma_c^i$ are the mean and the standard deviation of the $i$th feature in the class $c$, respectively; $\mu^i$ is the global mean of the $i$th feature.
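As a worked illustration of this class-separability measure, the sketch below computes the Fisher score of a single feature in its standard form. The function name `fisher_score` and the use of the population variance (rather than the sample variance) are assumptions made for this example, and the numbers in the usage lines are invented toy data, not measurements from the MIAS database.

```python
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """Fisher score of one feature.

    values : feature value of each sample
    labels : class label of each sample (e.g. "benign" / "malignant")
    """
    overall_mean = fmean(values)
    between = 0.0  # weighted spread of the class means around the global mean
    within = 0.0   # weighted variance inside each class
    for c in set(labels):
        class_vals = [v for v, l in zip(values, labels) if l == c]
        n_c = len(class_vals)
        between += n_c * (fmean(class_vals) - overall_mean) ** 2
        within += n_c * pvariance(class_vals)  # assumption: population variance
    return between / within if within > 0 else float("inf")


labels = ["benign"] * 3 + ["malignant"] * 3
separable = fisher_score([1, 2, 3, 7, 8, 9], labels)
mixed = fisher_score([1, 8, 3, 7, 2, 9], labels)
print(separable > mixed)  # a well-separated feature receives the higher score
```

A high score means that the class means are far apart relative to the spread inside each class, so the feature is more useful for discriminating benign from malignant masses.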
The obtained Fisher scores are then used to adjust the possibility distributions.
After adjusting the possibility distributions, a fusion step is performed in order to combine all the information relative to the extracted features and to obtain information of better quality. Thus, a conjunctive operator (the minimum operator) is applied to the possibility distributions associated with the different features, which yields a single final possibility distribution for each class.
The possibilistic decision can be made based on the maximum possibility measure, the maximum necessity measure or the maximum confidence index. In this work, decision-making is based on the maximum possibility value because it is the most intuitive and the most widely used decision rule in possibilistic classification.
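The fusion and decision steps can be sketched as follows. This is an illustration under explicit assumptions, not the authors' implementation: trapezoidal distributions are assumed as the expert-defined shape, the adjustment by feature weight uses a common possibilistic discounting rule, max(π, 1 − w) with w in [0, 1] (the adjustment formula actually used is not recoverable from the text), and the function names, feature ranges and weights are all hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal possibility distribution: 1 on [b, c], 0 outside ]a, d[."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def classify(sample, knowledge, weights):
    """Fuse per-feature possibilities with the minimum operator and
    decide by maximum possibility.

    sample    : {feature name: value}
    knowledge : {class: {feature name: (a, b, c, d)}}
    weights   : {feature name: w in [0, 1]}, e.g. normalised Fisher scores
    """
    fused = {}
    for label, distributions in knowledge.items():
        fused[label] = min(
            # Assumed discounting: a low-weight feature cannot push the
            # possibility of a class below 1 - w.
            max(trapezoid(sample[name], *params), 1.0 - weights[name])
            for name, params in distributions.items()
        )
    decision = max(fused, key=fused.get)
    return decision, fused


# Hypothetical knowledge base with two features per class.
knowledge = {
    "benign": {"circularity": (0.6, 0.8, 1.0, 1.0),
               "compactness": (10, 12, 16, 20)},
    "malignant": {"circularity": (0.0, 0.0, 0.5, 0.8),
                  "compactness": (14, 20, 60, 60)},
}
weights = {"circularity": 1.0, "compactness": 0.5}
print(classify({"circularity": 0.9, "compactness": 13}, knowledge, weights))
```

Using the minimum as the conjunctive operator means that a class is only as possible as its least compatible feature allows, which is why discounting weakly discriminative features matters before fusion.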

A. Dataset
All the images used in this paper belong to a publicly available digital mammography dataset, the Mini Mammographic Image Analysis Society (MIAS) dataset [27]. It consists of 322 medio-lateral oblique (MLO) views of 161 patients. The images are digitized to a 200 micron pixel edge and clipped/padded so that every image is 1024×1024 pixels.
A set of 57 ROIs was extracted from this dataset. These ROIs contain masses with different margin types: CIRCumscribed (CIRC), SPICulated (SPIC), and MISCellaneous, i.e. other ill-defined, masses (MISC). Fig. 5 shows the distribution of the different mass margin types based on their severity. The contours of these masses were manually annotated by an expert radiologist to serve as ground truth (GTR).

B. Evaluation Metrics
We have used the accuracy, precision and sensitivity measures to evaluate the performance of the proposed segmentation and classification methods. These measures are defined as follows:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \quad (12)$$

$$Precision = \frac{TP}{TP + FP} \quad (13)$$

$$Sensitivity = \frac{TP}{TP + FN} \quad (14)$$

where SR denotes the segmentation result and:
• True Positives (TP): pixels that are correctly segmented as Mass for the segmentation method ($SR \cap GTR$), and the number of malignant masses correctly classified as malignant for the mass classification method.
• False Positives (FP): pixels that are segmented as Mass but are not labeled as such in the GTR for segmentation ($SR \cap \overline{GTR}$), and the number of benign masses falsely classified as malignant for classification.
• True Negatives (TN): pixels that are correctly segmented as background for segmentation ($\overline{SR} \cap \overline{GTR}$), and the number of benign masses correctly classified as benign for classification.
• False Negatives (FN): pixels classified as normal tissue in the SR but labeled as Mass in the GTR for segmentation ($\overline{SR} \cap GTR$), and the number of malignant masses falsely classified as benign for classification.
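For the segmentation case, these pixel-level counts and the three measures can be computed from two binary masks as in the sketch below. The function name `segmentation_metrics` is hypothetical, and returning 0 when a denominator is zero is a convention chosen for this example.

```python
def segmentation_metrics(segmented, ground_truth):
    """Pixel-wise accuracy, precision and sensitivity.

    segmented, ground_truth : 2-D lists of truthy/falsy values with the
    same shape, where a truthy value marks a Mass pixel.
    """
    tp = fp = tn = fn = 0
    for seg_row, gt_row in zip(segmented, ground_truth):
        for s, g in zip(seg_row, gt_row):
            if s and g:
                tp += 1   # Mass in both the result and the ground truth
            elif s and not g:
                fp += 1   # segmented as Mass, background in the ground truth
            elif not s and g:
                fn += 1   # missed Mass pixel
            else:
                tn += 1   # background in both

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, sensitivity


seg = [[1, 1, 0], [0, 0, 0]]
gtr = [[1, 0, 0], [1, 0, 0]]
print(segmentation_metrics(seg, gtr))
```

The same three formulas apply to the classification case once TP, FP, TN and FN are counted over masses instead of pixels.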

C. Segmentation Results
Fig. 6 shows the results of preprocessing, the results of fuzzy membership estimation and the final segmentation results for four ROIs extracted from the MIAS database. From this figure, we can note that the proposed method reduces the false positives. Even though the benign tissue exhibits inhomogeneity, no noisy regions have been falsely detected outside of the mass regions. The values of the quantitative evaluation measures for the ROIs are also given in this figure. We can notice that the proposed method gives satisfactory results both for benign masses (Sample 2 and Sample 4) and for malignant masses (Sample 1 and Sample 3).
To show that the combination of the Chan-Vese model and the FCM method improves the segmentation results, we give in Table II the performance results on the whole database of the Chan-Vese model, the FCM method and the proposed method. We can note from this table that our proposed method has the highest accuracy and precision. However, the highest sensitivity is obtained by the Chan-Vese model. This can be explained by the overestimation of mass boundaries caused by this model.

D. Classification Results
Table III compares our proposed possibilistic classification method with other state-of-the-art classification methods. We can notice that our method outperforms the other methods in terms of accuracy when applied to the MIAS database. These promising results are likely due to the possibilistic reasoning, which provides a means of simulating human reasoning.

VI. CONCLUSION
In this paper, we have investigated and presented the results of the segmentation and classification of breast masses on a data set of 57 ROIs extracted from the MIAS database. The proposed segmentation method estimates fuzzy membership values for the class Mass and the class Background. These values are then used to modify the Chan-Vese model. Thus, the motion of the evolving contour is guided by the pixel fuzzy memberships. This allows an accurate segmentation to be obtained even for masses whose boundaries are not defined by gradient. The proposed mass classification method is based on possibility theory, which can handle the uncertainty inherent in the available knowledge.
The obtained results show that the proposed method represents an efficient tool that can automatically segment and classify masses in an accurate way.
A limitation of our method is that ROI detection is not performed automatically. Thus, as future work, we propose to address other stages of CAD systems such as automatic ROI detection. Furthermore, we will investigate the possibility of introducing other features such as intensity and textural features to improve the mass classification results.

Fig. 4. Possibility distributions of the circularity feature: (a) possibility distribution of the class benign; (b) possibility distribution of the class malignant.

Fig. 5. Distribution of different mass margin types in the MIAS database.

Fig. 6. Results obtained by applying the proposed method to four samples.

TABLE III. COMPARISON BETWEEN THE PROPOSED MASS CLASSIFICATION METHOD AND THE STATE-OF-THE-ART METHODS

| Method | Accuracy |
|---|---|
| Classification using neural networks and shape and density features | 82.30% |