A New CAD System for Breast Microcalcifications Diagnosis

Breast cancer is one of the deadliest cancers in the world, especially among women. With no identified causes and in the absence of effective treatment, early detection remains necessary to limit the damage and make a cure possible. Periodic mammography for women with a family history of the disease can provide an early diagnosis of breast tumors. Computer-Aided Diagnosis (CAD) is a powerful tool that can help radiologists improve their diagnostic accuracy at earlier stages. Several works have been developed to analyze digital mammograms, detect possible lesions (especially masses and microcalcifications) and evaluate their malignancy. In this paper, a new approach to the diagnosis of breast microcalcifications on digital mammograms is introduced. The proposed approach begins with a preprocessing procedure that removes artifacts and the pectoral muscle using morphological operators and enhances contrast through galactophorous tree interpolation. The second step of the proposed CAD system consists of segmenting microcalcification clusters using Generalized Gaussian Density (GGD) estimation and a Bayesian backpropagation neural network. The last step is the characterization of microcalcifications using morphological features, which feed a neuro-fuzzy system that classifies the detected microcalcifications into benign and malignant classes.

Keywords: Artifacts and pectoral muscle removal; Bayesian back-propagation neural network; Breast microcalcifications; CAD system; Digital mammograms; Galactophorous tree interpolation; GGD estimation; Morphologic features; Neuro-fuzzy system


I. INTRODUCTION
Breast cancer is the leading cause of cancer death among women worldwide. Studies have shown that detecting breast lesions at an early stage increases the chances of survival and reduces the risk of sequelae and, consequently, mortality. This detection can be achieved by having menopausal women and those with a family history of breast cancer undergo a mammogram every two years. However, analyzing mammograms is not a trivial task for radiologists: breast density is an important factor that can increase the risk of misinterpretation.
Computer-assisted diagnosis (CAD) offers radiologists a reliable aid for breast cancer screening. In this context, a new mammographic image enhancement approach is proposed, beginning with the application of the top-hat transform to extract the breast area and eliminate artifacts. A wavelet contrast enhancement step is then carried out, followed by the detection and suppression of the pectoral muscle. Finally, an oriented version of the top-hat transform is exploited for the detection and interpolation of the galactophorous tree.
Then, a new technique is proposed for microcalcifications (Mcc) segmentation, based on the measurement of the generalized Gaussian density (GGD) and the use of a supervised classifier (a neural network with Bayesian backpropagation).
A classification approach for the detected lesions is finally introduced. It uses three morphological descriptors and a supervised classifier (a neuro-fuzzy system) to distinguish benign abnormalities from malignant ones.

II. RELATED WORKS
Microcalcifications are tiny flecks of calcium, like grains of salt, in the soft tissue of the breast that can sometimes be an early indicator of breast cancer.
Microcalcifications come in a variety of shapes, as shown in Fig. 1: annular, round, linear, vascular... [38]. Several works have been carried out to enhance microcalcification clusters on digital mammograms, segment them, characterize them and classify them into benign and malignant classes [4,7].

A. Enhancement techniques
The major obstacle to reliable mammogram analysis is low contrast. Therefore, contrast enhancement is necessary to improve the quality of mammographic images and facilitate their interpretation.
There are three different families of contrast enhancement techniques [14]: conventional techniques, regions-based techniques and features-based techniques.
2) Regions-based techniques: mainly consist of performing a region growing algorithm from a well-chosen pixel called "a seed" [56].
3) Features-based techniques: consider the processed mammographic image characteristics and include morphological operations, wavelet transform and fractal approach [9,47].
Although the performance of all these enhancement techniques depends on the mammogram resolution [20], hybrid enhancement techniques have been reported to perform best in most cases, since they combine the advantages of several techniques while compensating for their respective drawbacks.

B. Segmentation techniques
The main target of these techniques is to identify possible regions of interest (ROI). In [62], the authors presented a detailed study of mammogram segmentation techniques, which can rely either on a single view of the breast or on multiple views.
1) Single view lesions detection: it consists of using a single mammogram to detect possible lesions. This category includes region-based approaches, such as the region growing algorithm [33,53,71], the watershed algorithm [32] and the Split and Merge algorithm [13]. Some other approaches are based on the edge detection of mammogram components [5,10,21,25,28,34,37]. This category also includes clustering-based approaches, which consist of detecting clusters that may represent a possible tumor [12,48]. These techniques are well suited to the detection of microcalcification clusters.
Another type of single-view technique is the model-based approach, which compares the patient's mammograms to known images of healthy and pathological cases [17,30].
2) Multiple views lesions detection: the lesions detection is done by comparing two mammographic images, which can come from the right breast and the left breast. In this case, radiologists compare the right and left mammograms to look for abnormalities in both images [15,55,65].
Two different views of the same breast could be used as well; mostly one mediolateral oblique (MLO) view and one cranio-caudal (CC) view of the same breast [74].
The two views can also come from two mammograms of the same breast taken at different moments: the main purpose is detecting a possible lesion evolution [75].
The efficiency of these segmentation techniques has been widely demonstrated in the literature. However, each technique still has some disadvantages [62]. For example, region-based approaches depend on the seed selection and on the stopping conditions of the algorithm. Some techniques (mainly the fractal model technique) are known to be time-consuming [28].

C. Characterization techniques
The main goal of these techniques is to extract several primitives that characterize the ROI selected during the segmentation step, in order to classify the lesions into benign and malignant classes. Several primitives have been exploited in the literature. In [14], Cheng et al. summarized the different primitives used for lesion characterization.
There are characteristics related to microcalcifications clusters: description of the weight distribution, the area and the number of microcalcifications [18,19,20,66].
Primitives extracted from co-occurrence matrix such as energy, entropy and contrast were used in [25].
Few works have used the surround region dependence matrix (SRDM), the gray level run length matrix (GLRL) and the gray level difference matrix (GLD) [42].
Wavelet decomposition provides many primitives characterizing gray-levels frequencies from different orientations and has been widely used in breast cancer context [16,23,27,41].
There are other techniques that have been exploited in breast lesion characterization such as Gabor filter bank [23,57], Gaussian Laplacian filter [59] and fractal dimension [60].
The use of all the primitives described above can offer almost perfect classification results [20]. However, characterization techniques cannot be evaluated separately, but only in association with the classification approach.
Cheng et al. evaluated the accuracy of the different classifiers in malignancy analysis as follows: from 87% to 90% with neural network classifiers, from 71.08% to 83.13% with the k-nearest neighbors technique and from 94% to 97.3% with decision trees [14]. However, this accuracy is highly sensitive to the selection of primitives during the characterization step.

III. PROPOSED MICROCALCIFICATIONS CAD SYSTEM
The proposed CAD system chain contains four essential steps: mammogram enhancement by interpolating the galactophorous tree, microcalcification detection using GGD estimation, morphologic characterization of the detected clusters and neuro-fuzzy classification. This approach is described in Fig. 2.

A. Proposed enhancement approach
Low contrast is the major problem encountered during mammogram analysis. Therefore, several contrast enhancement techniques have been developed in order to solve this issue and facilitate lesion detection [9,14,22,31,43,47,54,56], as described in the previous section.
The proposed enhancement method, published last year in [6], operates in four steps that aim to delimit the breast area in a digital mammogram and to increase the contrast between normal tissue and possible microcalcification clusters.
The first step consists of removing all unnecessary details (radiologists' labels, scanning artifacts and film boundaries). The second step increases the image contrast and denoises it using the wavelet transform. The third step consists of detecting and removing the pectoral muscle, and the fourth step aims to detect, then interpolate, the galactophorous tree in the mammogram. These four steps prepare the breast image for further processing by delimiting the breast region and enhancing the suspicious regions.

1) Artifacts and film boundaries removal:
In order to extract the breast region, black and/or white vertical bands, corresponding to the film boundaries, are first removed. These columns are eliminated using a simple algorithm that detects the first four non-black corners in the image.
Then, a morphological erosion with a square structuring element of size 13 pixels is applied, followed by a thresholding operation. A well-chosen threshold reveals two connected regions of very different sizes: the larger one corresponds to the mammary gland, and the other one to artifacts. A simple opening with a larger structuring element deletes the artifacts and keeps only the breast area (Fig. 3).
For contrast enhancement, the low-frequency component L(x, y) is first extracted from the preprocessed image I(x, y) using a Gaussian low-pass filter, in order to separate the useful information carried by the low frequencies from the noise carried by the high frequencies. Then, a white top-hat (ToHb) and a black top-hat (ToHn) transform are separately applied to L(x, y). These transformations are given by (1) and (2).
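$$\mathrm{ToHb}(L) = L - (L \circ S) \qquad (1)$$
$$\mathrm{ToHn}(L) = (L \bullet S) - L \qquad (2)$$
where $L \circ S$ and $L \bullet S$ denote the opening and the closing of L by the structuring element S. The white top-hat is thus the difference between the image and its opening by S, and the black top-hat is the difference between the closing of the image by S and the image itself. The resulting image E(x, y) is given by (3).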
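As an illustration of (1) and (2), the sketch below computes both top-hat transforms on a small image stored as a list of lists, using a flat k × k square structuring element. The square shape, the border handling and the helper names are assumptions made for the example. The combination used in `enhance` (image plus white top-hat minus black top-hat) is a common choice assumed here, not necessarily the expression of (3).

```python
def _morph(img, k, pick):
    """Apply `pick` (min or max) over a flat k x k window clipped at the borders."""
    h, w, r = len(img), len(img[0]), k // 2
    return [[pick(img[j][i]
                  for j in range(max(0, y - r), min(h, y + r + 1))
                  for i in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)]
            for y in range(h)]

def opening(img, k):
    # Erosion (min) followed by dilation (max).
    return _morph(_morph(img, k, min), k, max)

def closing(img, k):
    # Dilation (max) followed by erosion (min).
    return _morph(_morph(img, k, max), k, min)

def white_top_hat(img, k):
    # Equation (1): image minus its opening (keeps small bright details).
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(img, opening(img, k))]

def black_top_hat(img, k):
    # Equation (2): closing minus image (keeps small dark details).
    return [[b - a for a, b in zip(ra, rb)] for ra, rb in zip(img, closing(img, k))]

def enhance(img, k):
    # ASSUMPTION: a common combination, not necessarily the paper's equation (3).
    wth, bth = white_top_hat(img, k), black_top_hat(img, k)
    return [[p + w - b for p, w, b in zip(rp, rw, rb)]
            for rp, rw, rb in zip(img, wth, bth)]
```

On a uniform background containing one isolated bright pixel, the white top-hat keeps only that pixel and the black top-hat is zero everywhere, so bright details smaller than the structuring element are the ones that get reinforced.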
A soft denoising algorithm is finally applied in three steps: a 2-level Daubechies wavelet decomposition is performed, the detail coefficients are then thresholded (with a dynamic threshold calculated at each level of decomposition), and a wavelet reconstruction is finally applied. The contrast of the resulting images is visibly enhanced (Fig. 4).
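The three denoising steps can be sketched as follows, on a 1-D signal for brevity (the 2-D case applies the same operations along rows and columns). Two choices below are assumptions, since the text does not specify them: the Haar wavelet (the simplest member of the Daubechies family) stands in for the wavelet actually used, and the per-level threshold is the universal threshold computed from a median-based noise estimate.

```python
import math
from statistics import median

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """One decomposition level: approximation and detail coefficients."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Inverse of haar_step."""
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) / SQRT2, (a - d) / SQRT2]
    return out

def soft(v, t):
    """Soft thresholding: shrink v towards zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def level_threshold(detail):
    # ASSUMPTION: universal threshold with a median-based noise estimate,
    # recomputed at each level (one possible "dynamic" threshold).
    sigma = median(abs(d) for d in detail) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(len(detail)))

def denoise(signal, levels=2):
    """Decompose, soft-threshold the details of each level, reconstruct.
    The signal length must be a multiple of 2**levels."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        t = level_threshold(d)
        details.append([soft(v, t) for v in d])
    for d in reversed(details):
        approx = haar_inverse_step(approx, d)
    return approx
```

Only the detail coefficients are shrunk; the approximation coefficients, which carry the low-frequency content, are left untouched.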

4) Detection and interpolation of the galactophorous tree:
In order to increase mammogram contrast, especially for dense breasts, it is important to remove from the breast region all the details that may interfere with the detection of microcalcifications, including the galactophorous tree.
The galactophorous tree has the structure of overlapped vessels with high gray-level intensities that connect the lobules of the mammary gland to the tip of the nipple. Its presence could lead to wrongly suspected regions. Indeed, the galactophorous tree takes the form of a network of lines whose thickness varies from one region to another. In this last preprocessing step, an oriented version of the top-hat transform is used to detect all pixels of the galactophorous tree.
The width of the extracted elements can be controlled through the choice of the structuring element. Since galactophorous vessels have different thicknesses and gray levels, the structuring element must be straight and oriented in different directions. The different tests led to the use of three straight segments of respective lengths 10, 20 and 30 pixels, oriented in 13 different directions ranging from 0° to 360° in steps of 30°. The galactophorous tree is obtained by summing all 39 resulting images (13 orientations for each of the 3 segment lengths).
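The oriented top-hat described above can be sketched as follows. The segment lengths, the 30° step and the summation of the 39 responses come from the text; the rasterization of each segment by rounding, the border handling and all function names are assumptions of this example.

```python
import math

LENGTHS = (10, 20, 30)          # segment lengths in pixels (from the text)
ANGLES = range(0, 361, 30)      # 13 directions: 0, 30, ..., 360 degrees

def line_offsets(length, angle_deg):
    """(dy, dx) offsets of a straight segment through the origin.
    ASSUMPTION: pixels are obtained by rounding points sampled along the line."""
    t = math.radians(angle_deg)
    pts = {(round(-r * math.sin(t)), round(r * math.cos(t)))
           for r in range(-(length // 2), length - length // 2)}
    return sorted(pts)

def _erode(img, se):
    h, w = len(img), len(img[0])
    return [[min(img[y + dy][x + dx] for dy, dx in se
                 if 0 <= y + dy < h and 0 <= x + dx < w)
             for x in range(w)] for y in range(h)]

def _dilate(img, se):
    # Uses the reflected structuring element, so that opening <= image.
    h, w = len(img), len(img[0])
    return [[max(img[y - dy][x - dx] for dy, dx in se
                 if 0 <= y - dy < h and 0 <= x - dx < w)
             for x in range(w)] for y in range(h)]

def white_top_hat(img, se):
    opened = _dilate(_erode(img, se), se)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(img, opened)]

def oriented_top_hat_sum(img, lengths=LENGTHS, angles=ANGLES):
    """Sum of the top-hat responses over every (length, direction) pair."""
    h, w = len(img), len(img[0])
    total = [[0] * w for _ in range(h)]
    for length in lengths:
        for angle in angles:
            th = white_top_hat(img, line_offsets(length, angle))
            for y in range(h):
                for x in range(w):
                    total[y][x] += th[y][x]
    return total
```

A bright structure survives the opening only in the directions where the whole segment fits inside it, so thin elongated structures respond to most of the 39 top-hats and accumulate a high value in the sum.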

B. Proposed segmentation approach
In [8], an unsupervised masses detection approach based on Generalized Gaussian Density (GGD) was proposed. In this work, the GGD estimation is used with a supervised classifier to detect microcalcification clusters.
Generalized Gaussian Density modeling operates on the coefficients of a wavelet decomposition.
Texture analysis using Generalized Gaussian Density was introduced by Do and Vetterli [24]. It consists of building the histogram of the coefficients extracted from the wavelet transform at a given sub-band (level). For each sub-band, a continuous law that describes the histogram behavior as faithfully as possible is determined.
Experimentally, the histogram resembles a Gaussian distribution centered on 0, but for some textures the peak at 0 is not very rounded and rather resembles a Laplace distribution [49]. Do and Vetterli [24] proposed to model the behavior of the wavelet coefficients, at each scale, by a generalized Gaussian parameterized by three factors μ, α and β (4) [73]:


$$p(x; \mu, \alpha, \beta) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)} \exp\left(-\left(\frac{|x-\mu|}{\alpha}\right)^{\beta}\right) \qquad (4)$$

where $\Gamma(z) = \int_0^{\infty} e^{-t}\, t^{z-1}\, dt$, z > 0, is the gamma function, and μ, α and β are the mean, scale and shape parameters respectively. The shape factor β governs how sharp the peak is. The scale factor α governs the spread of the curve; in the classic Gaussian case (β = 2) it is proportional to the standard deviation σ (α = σ√2). Fig. 7 gives examples of GGD distributions for different values of α and β.

In order to decide correctly whether a microcalcification cluster exists in a given mammogram, two processes are carried out: a training process and a testing process [50].
• Training: in this step, known mammograms are considered. They are enhanced as described previously and then decomposed using a three-level redundant Haar wavelet transform. The multi-scale analysis is carried out by sweeping a 64 × 64 pixel window over the entire image. Then, several primitives are extracted (such as GGD, energy, mean and standard deviation) and stored in a feature matrix.
• Testing: in this stage, an unknown mammogram is preprocessed, divided into blocks of 64 × 64 pixels and then decomposed using the same wavelet transform as above. Then, a set of features is extracted for each block and compared to the feature values stored in the feature matrix. To achieve this comparison, a multi-layer perceptron with Bayesian regularization is used.

Nowadays, neural networks are one of the most powerful tools in textural tissue recognition [40,72], since they are able to learn from known examples. The elementary component of a neural network is the neuron. Each neuron is linked to some of its neighbors with varying coefficients of connectivity representing the strengths of these connections [26].
During the learning procedure, the connectivity coefficients are adjusted; neurons are generally grouped into layers. The majority of the published works use a standard 3-layer architecture.
In this work, several tests have been carried out using neural networks with one and two hidden layers, containing 3 to 15 neurons each. In addition, three different activation functions were tested: the logistic sigmoid function (logsig), the hyperbolic tangent function (tanh) and a simple linear function (pur). Fig. 8 shows the results of these different tests. Besides, back-propagation learning is used: it consists of minimizing an error function using an optimization method such as gradient descent, quasi-Newton or the Levenberg-Marquardt method [50]. In addition, a 10-fold cross-validation process is used to avoid overfitting and improve the generalization ability of the back-propagation trained network. To this end, the input data are randomly partitioned into 3 sets: a training set, a testing set and a validation set (Fig. 9). Each time, the neural network is trained with the training set and then verified with the validation set until the validation error starts increasing. The training procedure is then stopped, and the network with the minimum validation error is selected as the best model and used to classify the test set.
The process was performed 10 times, and the 10 recognition rates were then averaged to obtain the final performance of the proposed system. In order to create distinct data sets for cross-validation, none of the samples in the training set appears in any of the other sets. In this way, every network was trained to give the maximum value of 1 for the extracted microcalcification cluster region and 0 for the other regions.
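The evaluation protocol described above (random three-way partition, early stopping on the validation error, averaging over 10 runs) can be sketched as follows. The split fractions, the `patience` parameter and the function names are assumptions of this example, since the text does not state them; the network training itself is left out.

```python
import random

def three_way_split(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Random partition of sample indices into disjoint training,
    validation and testing sets. ASSUMPTION: the fractions are illustrative."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def early_stopping_epoch(val_errors, patience=1):
    """Epoch with the minimum validation error, where training stops once the
    error has failed to improve for `patience` consecutive epochs."""
    best, best_epoch, bad = float("inf"), -1, 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch, bad = err, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_epoch

def average_rate(rates):
    """Final performance: mean of the recognition rates of the repeated runs."""
    return sum(rates) / len(rates)
```

Changing `seed` from one run to the next gives the ten random partitions whose recognition rates are then averaged.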

C. Proposed classification approach
In this work, three morphologic features are first extracted to describe the distribution and size of the detected microcalcification cluster: area, compactness and eccentricity. Then a neuro-fuzzy network is exploited to classify the characterized microcalcifications into malignant and benign classes.
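The three morphologic features can be computed from the binary mask of a segmented cluster, as in the sketch below. The exact definitions are not given in the text, so the ones used here are assumptions: area is the pixel count, compactness is 4πA/P² with the perimeter P counted as the number of exposed pixel edges, and eccentricity is derived from the second-order central moments of the region.

```python
import math

def shape_features(mask):
    """Area, compactness and eccentricity of the non-zero region of a binary mask
    (list of lists). ASSUMPTION: definitions chosen for illustration."""
    pixels = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("empty mask")
    area = len(pixels)
    on = set(pixels)
    # Perimeter: pixel edges facing the background (4-neighbourhood).
    perimeter = sum((y + dy, x + dx) not in on
                    for y, x in pixels
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)))
    compactness = 4.0 * math.pi * area / perimeter ** 2
    # Second-order central moments of the pixel coordinates.
    cy = sum(y for y, _ in pixels) / area
    cx = sum(x for _, x in pixels) / area
    myy = sum((y - cy) ** 2 for y, _ in pixels) / area
    mxx = sum((x - cx) ** 2 for _, x in pixels) / area
    mxy = sum((y - cy) * (x - cx) for y, x in pixels) / area
    spread = math.sqrt((mxx - myy) ** 2 + 4.0 * mxy ** 2)
    lam_max = (mxx + myy + spread) / 2.0
    lam_min = (mxx + myy - spread) / 2.0
    ecc = math.sqrt(max(0.0, 1.0 - lam_min / lam_max)) if lam_max > 0 else 0.0
    return {"area": area, "compactness": compactness, "eccentricity": ecc}
```

With these definitions, a square region has an eccentricity of 0 and a one-pixel-wide line has an eccentricity of 1, so the feature separates round clusters from elongated ones.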
Fuzzy logic has become a significant area of interest for researchers in artificial intelligence. Prof. Mamdani was the first to investigate the use of fuzzy logic to simulate human decision principles. Fuzzy models have the advantage of integrating the knowledge representation and reasoning mechanism with prior expert experience and knowledge.
A fuzzy system is composed of a knowledge base (KB), and an inference engine module that includes a fuzzification interface, an inference system and a defuzzification interface.
The KB contains a Data Base (DB) and a Rule Base (RB): the Data Base contains all the sets considered in the linguistic rules and the membership functions defining the semantics of the linguistic labels and the Rule Base contains a collection of linguistic rules that are joined by some operators.
The structure of a fuzzy system is illustrated in Fig. 11. Combining neural networks with fuzzy systems, which yields so-called neuro-fuzzy systems, is a powerful alternative approach to developing fuzzy systems [29]. In fuzzy systems, relationships are represented explicitly in the form of if-then rules whereas, in neural networks, the same relationships are not given explicitly but are encoded in the network parameters. Neuro-fuzzy systems combine the semantic transparency of rule-based fuzzy systems with the learning capability of neural networks [3]. In this work, an improved neuro-fuzzy system, known as the adaptive network-based fuzzy inference system (ANFIS), is used [36]. It is a neuro-fuzzy network with five layers. It includes a knowledge representation and a reasoning mechanism resembling those of a human expert.
The inference engine simulates the reasoning of a human expert based on fuzzy rules. In our case, we used 9 fuzzy rules intended to assist the ANFIS system in the classification decision for microcalcifications, based on the 5 classical membership functions of the deduction system: Very Negative (VNE), Negative (NE), Zero (Z), Positive (PO) and Very Positive (VP).
For each mammogram, the fuzzy system is run 10 times, and the average of the ten classification rates is considered. The fuzzy rules used for microcalcifications classification are the following:
• Rule 1: if (e1 is PO) and (e2 is PO) and (e3 is PO) then (Malignant is VPO).
• Rule 9: if (e1 is NE) and (e2 is NE) and (e3 is PO) then (Malignant is NE).

Fig. 12 and Fig. 13 show the membership functions of the three inputs and of the single output, respectively, of the ANFIS system used for the malignant/benign classification of microcalcifications.
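To make the role of the rules concrete, the sketch below evaluates the firing strength of Rule 1 and Rule 9 for a triple of inputs (e1, e2, e3). Only these two rules are encoded because they are the ones quoted above. The triangular membership functions, their breakpoints on a normalized [-1, 1] axis and the use of the minimum as the "and" operator are assumptions of this example; the actual membership functions are those of Fig. 12 and Fig. 13, and this is not an ANFIS implementation.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# ASSUMPTION: evenly spaced triangles on a normalized [-1, 1] axis.
LABELS = {
    "VNE": (-1.5, -1.0, -0.5),
    "NE": (-1.0, -0.5, 0.0),
    "Z": (-0.5, 0.0, 0.5),
    "PO": (0.0, 0.5, 1.0),
    "VPO": (0.5, 1.0, 1.5),
}

# Rule number -> (labels required for e1, e2, e3; output label for "Malignant").
RULES = {
    1: (("PO", "PO", "PO"), "VPO"),
    9: (("NE", "NE", "PO"), "NE"),
}

def firing_strength(inputs, antecedent):
    """Degree to which the inputs satisfy a rule antecedent (min as fuzzy 'and')."""
    return min(tri(x, *LABELS[label]) for x, label in zip(inputs, antecedent))

def fire_rules(inputs):
    """Firing strength and output label of every encoded rule."""
    return {n: (firing_strength(inputs, ante), out)
            for n, (ante, out) in RULES.items()}
```

For instance, inputs that are all clearly positive fire Rule 1 fully and leave Rule 9 inactive, which pushes the decision towards the malignant class.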

IV. EXPERIMENTATIONS AND RESULTS
The proposed microcalcifications CAD system was tested on mammograms from the MIAS database.
The MIAS database is one of the most widely used mammographic databases, since it can easily be downloaded and exploited. It contains 322 medio-lateral oblique (MLO) mammograms: those with an even number are left MLO views and those with an odd number are right MLO views (Fig. 14).
The 322 images cover all the possible diagnoses: normal (208 images), masses (56 images), microcalcifications (25 images), architectural distortions (18 images) and asymmetries (15 images). The efficiency of the proposed CAD system was tested on the 25 mammograms containing microcalcification clusters.
These images are preprocessed as described previously: artifacts and film boundaries are removed, and then the contrast is enhanced using a soft thresholding of the wavelet coefficients. After that, the pectoral muscle is detected and removed from the mammograms. The final preprocessing step is the detection and interpolation of the galactophorous tree using an oriented top-hat.
Once enhanced, the mammogram is analyzed using a 64 by 64 pixel sliding window. For each block, several GGD features are extracted and then used by a Bayesian backpropagation neural network to detect pixels belonging to a microcalcification cluster.
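The sliding-window analysis can be sketched as the enumeration of the top-left corners of the 64 × 64 blocks. The stride is an assumption, since the text does not give it: a stride equal to the window size yields non-overlapping blocks, and a smaller stride yields overlapping ones. The 1024 × 1024 image size used in the usage note is also an assumption.

```python
def block_origins(height, width, size=64, stride=64):
    """Top-left (row, column) corner of every size x size block that fits
    entirely inside a height x width image."""
    return [(y, x)
            for y in range(0, height - size + 1, stride)
            for x in range(0, width - size + 1, stride)]

def extract_block(img, origin, size=64):
    """Pixels of one block, for an image stored as a list of rows."""
    y0, x0 = origin
    return [row[x0:x0 + size] for row in img[y0:y0 + size]]
```

For a 1024 × 1024 image, `block_origins(1024, 1024)` enumerates 256 non-overlapping blocks, while `block_origins(1024, 1024, stride=32)` enumerates 961 half-overlapping ones; each block then yields one feature vector for the classifier.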
For each mammographic image, a cross-validation is used, so that the blocks classification is executed 10 times.Each time, training set, validation set and testing set are randomly selected.
The average recognition rate of the 10 tests of the proposed microcalcifications segmentation technique has reached 94.44%, which is promising compared to segmentation rates given in several works.
In fact, using a neural network provided a recognition rate of 70.8% with textural primitives in [2] and 84% with morphologic features in [63].
In [1], authors used wavelet transform to characterize breast tissue and an SVM classifier and achieved only 79.58% of good detection.
Using a neural network with textural primitives, S. Krishnaveni et al. obtained a detection rate of 96.25% [45]. The segmented clusters are next characterized and classified into benign and malignant classes as described in the previous section. The proposed classification approach was tested 10 times on segmented microcalcification clusters from the MIAS database. The average classification rate reached 99%, which is a very high rate for a CAD system. M.J. Bottema et al. used density analysis for the classification of microcalcifications and obtained a rate of 69% [8].
C. Anuradha and P. Preeti proposed in [1] malignant/ benign microcalcifications classification approach based on wavelet analysis.They compared two supervised classifiers: SVM whose classification rate reached 69% and an artificial neural network which has provided 96% of good classification.
Reference            Technique                   Classification rate
[8]                  Density analysis            69%
[51]                 Extreme learning machine    94%
[1]                  SVM                         96%
Proposed approach    ANFIS system                99%

V. CONCLUSION
In this paper, a new CAD system for microcalcification clusters is proposed. The process begins with a preprocessing step that removes unnecessary components (artifacts, film boundaries and pectoral muscle) and enhances the contrast of the mammograms, mainly by detecting and interpolating the galactophorous tree structure.
Microcalcifications segmentation is performed by dividing the enhanced mammograms into overlapped blocks of 64 by 64 pixels. Each block is characterized using GGD analysis, and the calculated features are used to separate microcalcification clusters from normal breast tissue via a Bayesian backpropagation neural network. The detection rate reached 94.44%.
Finally, three morphologic features are calculated to characterize the segmented microcalcifications, and an ANFIS system is used to classify these detected lesions into benign and malignant classes. 99% of the microcalcification clusters were correctly classified.
All tests were carried out with MATLAB R2014 on an Intel Core i5 CPU at 2.53 GHz with 4 GB of RAM.
The proposed approach has proven its efficiency, not only for microcalcifications segmentation and classification, but for the diagnosis of breast masses as well [50]. The proposed CAD system could be improved by combining 2D mammograms with 3D mammograms to boost segmentation and classification accuracy.

Fig. 3. Artifacts and film boundaries removal: (a) original images and (b) resulting images.

2) Contrast enhancement and wavelet denoising: The contrast enhancement step is very important to ensure better segmentation results. Classic techniques have shown their limits in medical image processing. Several enhancement approaches have been proposed in the literature [64,68,69,70] to improve the contrast of mammographic images. Among them, the wavelet transform has given better results with medical images.

Fig. 4. Contrast enhancement and wavelet denoising: (a) images resulting from the first preprocessing step and (b) enhanced images.

3) Pectoral muscle removal: The third step of the proposed preprocessing technique is the removal of the pectoral muscle. This muscle usually appears in MLO mammograms with intensities very similar to those of the microcalcifications. An adaptive thresholding algorithm is used to detect pectoral

Fig. 8. Recognition rates for different activation functions and different numbers of neurons.

The best microcalcifications segmentation rates are obtained with the hyperbolic tangent activation function, for a neural network containing two hidden layers with 15 neurons each.

Fig. 10 gives an example of microcalcifications segmentation result on a digital mammogram.

TABLE I. COMPARISON OF MICROCALCIFICATIONS DETECTION RESULTS

In [51], Malar et al. developed a classification algorithm that reached a rate of 94%. They used descriptors from wavelet analysis and a supervised classifier, the Extreme Learning Machine (ELM).

TABLE II. COMPARISON OF CLASSIFICATION RESULTS WITH OTHER WORKS