A Comparative Study of Thresholding Algorithms on Breast Area and Fibroglandular Tissue

Mammographic density, which reflects the composition of fibroglandular tissue in the breast area, is one of the independent risk factors of breast cancer. A tumor in a mammogram is difficult to detect when it is covered by dense tissue (the masking effect). Mammographic density may be determined quantitatively and objectively by calculating the percentage of mammographic density. A proper thresholding algorithm is therefore required to obtain the fibroglandular tissue area and the breast area. The mammograms used in this research were obtained from the Oncology Clinic, Yogyakarta, and had been verified by Radiologists using semi-automatic thresholding. This research aimed to compare the performance of thresholding algorithms using three parameters: PME, RAE and MHD. Zack algorithm had the best performance in obtaining the breast area, with PME, RAE and MHD of about 0.33%, 0.71% and 0.01, respectively. Meanwhile, two algorithms performed well in obtaining the fibroglandular tissue area, i.e. multilevel thresholding and maximum entropy, with values for PME (13.34%; 11.27%), RAE (53.34%; 51.26%) and MHD (1.47; 33.92), respectively. The results suggest that Zack algorithm is well suited to obtaining the breast area, while multilevel thresholding and maximum entropy are the better choices for obtaining the fibroglandular tissue. Both areas are components in determining the risk factors of breast cancer based on the percentage of breast density.

Keywords—Thresholding Algorithm; Breast Area; Fibroglandular Area


I. INTRODUCTION
One of the preventive measures to decrease the number of breast cancer patients is routine screening. Mammographic density is one part of the BI-RADS assessment proposed by the American College of Radiology (ACR) in 2004, modified from the Wolfe standards and widely used by Radiologists. One approach to assessing mammographic density is quantitative and objective: the percentage of mammographic density is calculated by comparing the relative amount of fibroglandular tissue with the breast area [1]-[5]. Women have a greater risk if their fibroglandular tissue area is large compared with the fat tissue in the breast area [3]. To obtain the two areas (the fibroglandular tissue area and the breast area), the segmentation process needs to be conducted automatically by employing a thresholding method. A proper thresholding method is able to separate the breast area from its background and to separate the fibroglandular tissue from the fat tissue, based on the threshold value obtained. Once these two areas are obtained, the ratio between the fibroglandular tissue area and the breast area can be calculated, indicating the risk factor of breast cancer. The threshold value can be determined either automatically or semi-automatically.
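As a minimal sketch of the ratio described above, the following Python function counts breast pixels and fibroglandular pixels from two thresholds and returns the percentage density. The function name, the toy image, and the convention that a pixel belongs to a region when its gray value exceeds the threshold are assumptions made for illustration. The thresholds 13 and 122 are the semi-automated values this paper reports for mammogram 1.

```python
def percent_density(gray, breast_threshold, dense_threshold):
    """Return the fibroglandular area as a percentage of the breast area.

    gray is a list of rows of integer gray levels. A pixel is counted as
    breast when its value exceeds breast_threshold, and as fibroglandular
    when it also exceeds dense_threshold (assumed convention).
    """
    breast_area = 0
    dense_area = 0
    for row in gray:
        for value in row:
            if value > breast_threshold:
                breast_area += 1
                if value > dense_threshold:
                    dense_area += 1
    if breast_area == 0:
        return 0.0  # no breast pixels found, so no density can be reported
    return 100.0 * dense_area / breast_area


# Toy 3x4 "mammogram" (hypothetical values, not taken from the data set).
toy_image = [
    [0, 0, 20, 40],
    [0, 30, 150, 200],
    [0, 25, 130, 60],
]
density = percent_density(toy_image, breast_threshold=13, dense_threshold=122)
print(f"percentage density: {density:.1f}%")
```

Plain lists are used only to keep the sketch free of dependencies; an array library would be the usual choice for full-size images.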
Several previous studies have used semi-automated thresholding on mammogram images, including [2], [5], [6] and [7]. Meanwhile, several other studies focused only on the use of automated thresholding to obtain either the fibroglandular tissue area or the breast area. The automated thresholding methods that have been used to obtain fibroglandular tissue include Gaussian mixture modeling by [8] and minimum cross entropy by [9], while the automated thresholding methods that have been used to obtain the breast area include row-by-row thresholding (RRT) and average row thresholding (ART) by [10], and a fixed threshold value of 18 by [11]. [4] proposed a calculation model of breast cancer risk based on the percentage of mammographic density. This model could be applied as a reference to help decrease breast cancer risk. The research in [5] used not only mammographic density as a risk factor but also other risk factors, such as estradiol level and ESR1 polymorphism, as predictors of estrogenic factors related to breast cancer in the Javanese population in Indonesia. Its calculation model of the percentage of mammographic density used a semi-automatic thresholding method and was named GAMA DEJAVU.
In [6], semi-automated thresholding was also employed to classify breast cancer risk into four categories (BI-RADS standard), involving three Radiologists and statistically extracted rules (mean, kurtosis and skewness). Other studies which employed semi-automated thresholding were [2] and [7]. There, the objective of semi-automated thresholding was to calculate the mammographic density based on BI-RADS on craniocaudal-view mammograms that had previously been classified by Radiologists on the basis of Tabar parenchymal patterns.
The RRT and ART methods of [10] were applied to 50 mammogram images, covering normal and breast cancer cases, from the public DDSM database. The extraction results of both methods look similar. However, the ART method performs better than the RRT method because of its capability to extract the breast area while eliminating the background completely. In addition, the boundary of the breast area produced by the ART method looks smoother, so the output is more appropriate. The RRT method, on the other hand, extracts a region larger than the actual breast area. Meanwhile, [11] used a threshold value of 18 to separate the breast area from its background. The result obtained with the threshold of 18 is better than that of the previous two methods in that the periphery of the breast is very smooth. However, the threshold value of 18 has the weakness of being static: whatever the histogram of the mammogram looks like, the threshold value used is still 18. Thus, when it is applied to mammogram images whose histograms differ slightly or greatly, 18 is not necessarily the best threshold value.
On the other hand, automated thresholding methods have also been used to obtain the fibroglandular tissue area. Gaussian mixture modeling by [8] is aimed at segmenting mediolateral oblique view mammograms into several anatomical regions. The mammogram is segmented into five components, namely: background, uncompressed fat, fat, dense tissue and muscle. Meanwhile, the minimum cross entropy by [9] is used to obtain the fibroglandular tissue area by separating it from the fat tissue in the breast area. [4] developed a computational model for determining the breast cancer risk factor based on the percentage of mammographic density. In the proposed model, the use of Zack algorithm to obtain the breast area and multilevel thresholding to obtain the fibroglandular tissue area gave better accuracy, sensitivity and specificity than the use of Zack algorithm and maximum entropy. In that work, the performance of the thresholding algorithms was assessed only jointly, through the resulting breast cancer risk factors. This research therefore focuses on comparing the performance of several automated thresholding methods when employed to obtain each of the two objects.

II. MATERIAL AND METHOD
This research used craniocaudal-view mammograms taken from patients who had a mammography check-up at the Oncology Clinic, Kotabaru, Yogyakarta. The images were digitized from analogue films into digital images in BMP format in various sizes. They had been classified by Radiologists into four risk factor categories in accordance with the BI-RADS standard.

A. Pre-processing
In the pre-processing stage, only one process was conducted to simplify the segmentation process: the conversion of the RGB images into gray images. Subsequently, the resulting gray images underwent the segmentation process using several automated thresholding algorithms with two different objectives, i.e. to obtain the breast area and to obtain the fibroglandular tissue area.
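The conversion step can be sketched as follows. The text does not state which conversion formula was used, so the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are an assumption here, as are the function name and the representation of an image as rows of (R, G, B) tuples. For scanned films whose three channels are equal, any set of weights summing to one gives the same gray level.

```python
def rgb_to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples (0-255) to rows of gray levels (0-255).

    Uses the BT.601 luma weights, which is an assumed choice; the paper does
    not specify the conversion formula.
    """
    gray_image = []
    for row in rgb_image:
        gray_row = [int(round(0.299 * r + 0.587 * g + 0.114 * b))
                    for r, g, b in row]
        gray_image.append(gray_row)
    return gray_image


# Hypothetical 1x3 image: white, black and a mid-gray pixel.
sample = [[(255, 255, 255), (0, 0, 0), (128, 128, 128)]]
print(rgb_to_gray(sample))
```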

B. Segmentation
The segmentation process was performed using five automated thresholding methods, namely: Zack algorithm, Otsu, multilevel thresholding, maximum entropy and minimum entropy. Each of the five algorithms generated a threshold value which was automatically applied to the mammogram images with the aim of separating the breast area from its background and separating the fibroglandular tissue from the breast area. Firstly, Zack algorithm, or triangle thresholding, determines the threshold value from the gray-intensity histogram (h[x]) by means of a line constructed on the histogram. In a broad sense, the algorithm consists of several steps, namely: finding the minimum and maximum gray levels, drawing the line that connects them on the histogram, and finding the histogram point farthest from that line [12]. Secondly, Otsu thresholding searches for an optimal threshold value by using a discriminant criterion that maximizes the separation of the two classes of gray levels. Equivalently, the method minimizes the weighted sum of the within-class variances of the background and foreground pixels to obtain the optimal threshold [13]. Thirdly, multilevel thresholding is a recursive algorithm based on the Otsu method introduced by [11]. It is considered computationally effective for finding multiple threshold levels in an image by using a look-up table. The method works with a modified class variance that is calculated in advance and stored in the look-up table, which reduces the computational cost of the cumulative probability and the mean of each class [14]. Fourthly and fifthly, maximum and minimum entropy thresholding are algorithms based on the entropy of the gray-level distribution of the image. In maximum entropy, the threshold is obtained by maximizing the entropy of the two classes, foreground and background [15]. Meanwhile, in minimum entropy the search for the threshold value is based on minimizing the entropy between the two classes.
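To make the first two methods concrete, the sketch below computes an Otsu threshold and a triangle (Zack) threshold from a gray-level histogram. It is an illustrative simplification, not the implementation used in this research. The function names are hypothetical, pixels at or below the returned level are treated as background, and the triangle variant draws its line from the histogram peak to the last occupied bin on the longer tail, using raw counts without normalization.

```python
def otsu_threshold(hist):
    """Return the level t that maximizes the between-class variance.

    Pixels with level <= t form the background class. Maximizing the
    between-class variance is equivalent to minimizing the weighted sum
    of within-class variances.
    """
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    weight_bg = 0
    sum_bg = 0
    best_t, best_between = 0, -1.0
    for t, count in enumerate(hist):
        weight_bg += count
        sum_bg += t * count
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue  # background class still empty
        if weight_fg == 0:
            break     # foreground class empty from here on
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def triangle_threshold(hist):
    """Return the bin farthest from the line joining the peak to the tail end."""
    occupied = [i for i, count in enumerate(hist) if count > 0]
    if not occupied:
        raise ValueError("histogram is empty")
    low, high = occupied[0], occupied[-1]
    peak = hist.index(max(hist))
    # Use the tail that extends farther from the peak.
    end = high if (high - peak) >= (peak - low) else low
    if end == peak:
        return peak
    dx = end - peak
    dy = hist[end] - hist[peak]
    step = 1 if end > peak else -1
    best_t, best_dist = peak, -1
    for i in range(peak, end + step, step):
        # Perpendicular distance to the line, up to a constant factor.
        dist = abs(dy * (i - peak) - dx * (hist[i] - hist[peak]))
        if dist > best_dist:
            best_dist, best_t = dist, i
    return best_t


# Hypothetical 8-level histogram dominated by dark background pixels.
toy_hist = [100, 6, 5, 4, 3, 2, 1, 0]
print("Otsu:", otsu_threshold(toy_hist), "triangle:", triangle_threshold(toy_hist))
```

On a histogram dominated by a dark background peak, the triangle method places the threshold near the foot of that peak, which is in line with the low Zack thresholds reported for separating the breast area from the background.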

C. The Analysis of Segmentation Method Performance
The performance of the thresholding algorithms in the segmentation process was compared on the basis of three parameters, namely: PME, RAE and MHD [16], [17]. The three parameters were used to compare the quality of the segmented mammogram images generated from the threshold values of each thresholding algorithm. The segmentation results were compared with reference images which had been verified by Radiologists using semi-automated thresholding.

1) Percentage Misclassification Error (PME)
PME describes the agreement between the segmentation result image and the Radiology observation result. It reflects the percentage of background pixels mistakenly classified as object pixels, or vice versa. The formula for PME is shown in Equation (1).

$$\mathrm{PME} = \left(1 - \frac{|B_O \cap B_T| + |F_O \cap F_T|}{|B_O| + |F_O|}\right) \times 100\% \qquad (1)$$

where $|B_O|$ is the number of pixels in the background of the Radiology observation result, $|F_O|$ is the number of pixels in the object of the Radiology observation result, $|B_T|$ is the number of pixels in the background of the segmentation result image generated by the thresholding method, and $|F_T|$ is the number of pixels in the object of the segmentation result image produced by the thresholding method.

2) Relative Foreground Area Error (RAE)
RAE is a parameter that measures the difference in object area between the thresholding result image and the reference image, i.e. the Radiology observation result. The formula for RAE is defined in Equations (2) and (3).

$$\mathrm{RAE} = \frac{A_O - A_T}{A_O} \times 100\%, \quad A_T < A_O \qquad (2)$$

$$\mathrm{RAE} = \frac{A_T - A_O}{A_T} \times 100\%, \quad A_T \geq A_O \qquad (3)$$

in which $A_O$ is the object area of the reference image, and $A_T$ is the object area of the binary image resulting from the thresholding method.
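The two measures can be sketched in Python as follows, assuming they take the standard forms used in thresholding evaluation: PME as the percentage of pixels whose label differs from the reference, and RAE as the area difference divided by the larger of the two object areas. The function names and the mask representation (rows of 0 for background and 1 for object, both masks of the same size) are assumptions for illustration.

```python
def pme(reference, result):
    """Percentage of pixels labeled differently in the two binary masks."""
    total = 0
    agree = 0
    for ref_row, res_row in zip(reference, result):
        for ref_pixel, res_pixel in zip(ref_row, res_row):
            total += 1
            if ref_pixel == res_pixel:
                agree += 1  # background/background or object/object match
    return 100.0 * (1.0 - agree / total)


def rae(reference, result):
    """Relative foreground area error, in percent."""
    area_ref = sum(sum(row) for row in reference)
    area_res = sum(sum(row) for row in result)
    if area_ref == 0 and area_res == 0:
        return 0.0  # both masks empty: no area difference
    if area_res < area_ref:
        return 100.0 * (area_ref - area_res) / area_ref
    return 100.0 * (area_res - area_ref) / area_res


# Hypothetical 2x3 masks: the result misses one object pixel of the reference.
reference_mask = [[0, 1, 1],
                  [0, 1, 1]]
result_mask = [[0, 0, 1],
               [0, 1, 1]]
print(f"PME = {pme(reference_mask, result_mask):.2f}%")
print(f"RAE = {rae(reference_mask, result_mask):.2f}%")
```

Both values are zero when the thresholded mask is identical to the reference, which matches the reading that smaller values indicate better performance.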

3) Modified Hausdorff Distance (MHD)
MHD is a measure of the shape distortion of the object resulting from the thresholding process relative to the object in the reference image. The MHD formula is shown in Equations (4) and (5).

$$\mathrm{MHD}(F_O, F_T) = \max\bigl(d(F_O, F_T),\; d(F_T, F_O)\bigr) \qquad (4)$$

$$d(F_O, F_T) = \frac{1}{|F_O|} \sum_{f_O \in F_O} \min_{f_T \in F_T} \lVert f_O - f_T \rVert \qquad (5)$$

where $F_O$ and $F_T$ are the object pixels of the reference image and of the image resulting from the thresholding process, and $|F_O|$ and $|F_T|$ represent the number of pixels in the respective object areas.
Subsequently, the threshold value was sought with each of the five algorithms described in the previous section. The threshold value obtained for each mammogram image by each thresholding algorithm is shown in Table 1. For example, mammogram 1 has a threshold value of 13 for Zack algorithm, 70 for Otsu, 141 for multilevel thresholding, 128 for maximum entropy and 50 for minimum entropy. The reference images used as the comparator are the binary images resulting from segmentation by semi-automated thresholding conducted by Radiologists. Two threshold values are used, one to obtain the breast area and one to obtain the fibroglandular tissue area. The complete results for the five mammograms are shown in Table 2. For example, to obtain the breast area and the fibroglandular area on mammogram 1, the threshold values used are 13 and 122, respectively.
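For the shape measure, a brute-force sketch of the modified Hausdorff distance between two binary masks is given below, assuming the standard definition: the larger of the two directed average nearest-neighbor distances between the object pixel sets. The function names and the mask representation (rows of 0 and 1) are assumptions for illustration. The pairwise search is quadratic in the number of object pixels, so it is only practical for small masks.

```python
import math


def object_pixels(mask):
    """Return the (row, column) coordinates of all object pixels in a mask."""
    return [(y, x) for y, row in enumerate(mask)
            for x, value in enumerate(row) if value]


def directed_distance(points_a, points_b):
    """Average distance from each point in points_a to its nearest point in points_b."""
    total = 0.0
    for ay, ax in points_a:
        total += min(math.hypot(ay - by, ax - bx) for by, bx in points_b)
    return total / len(points_a)


def mhd(reference, result):
    """Modified Hausdorff distance between the objects of two binary masks."""
    ref_points = object_pixels(reference)
    res_points = object_pixels(result)
    if not ref_points or not res_points:
        raise ValueError("both masks need at least one object pixel")
    return max(directed_distance(ref_points, res_points),
               directed_distance(res_points, ref_points))


# Hypothetical 2x3 masks: the result misses one object pixel of the reference.
ref_mask = [[0, 1, 1],
            [0, 1, 1]]
thr_mask = [[0, 0, 1],
            [0, 1, 1]]
print(f"MHD = {mhd(ref_mask, thr_mask):.2f}")
```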
The performance evaluation results for the breast area images using the five thresholding algorithms are shown in Table 3. The first parameter, PME, based on formula (1), reflects the percentage of background pixels mistakenly considered as object pixels, or vice versa. The second parameter, RAE, based on formulas (2) and (3), measures the difference between the image resulting from each thresholding algorithm and its reference image. The third parameter, MHD, measures the distortion of the object shape resulting from each of the five thresholding algorithms relative to the object in the reference image. For all three parameters, a smaller value indicates better performance. It means that the threshold values resulting from automated thresholding are similar to the threshold values resulting from Radiology observation using semi-automated thresholding. Likewise, the three parameters were computed for the fibroglandular tissue area. The complete results of the threshold values obtained for each mammogram with the five thresholding algorithms are shown in Table 4. The computed performance of the five thresholding algorithms is shown in Tables 3 and 4. The mean values were then computed, and the results are shown in Table 5.
A smaller value indicates a smaller difference, meaning that the image resulting from segmentation by the thresholding algorithm is close to the image resulting from segmentation by semi-automated thresholding by Radiologists. The mean values of the performance of the thresholding algorithms are shown in Table 5. For example, Zack algorithm has the smallest value for all three parameters compared with the other four algorithms, with PME, RAE and MHD of 0.33%, 0.71% and 0.01, respectively. This indicates that Zack algorithm has the best performance in obtaining the breast area. Meanwhile, for obtaining the fibroglandular tissue area, there are two algorithms with nearly identical performance, i.e. multilevel thresholding and maximum entropy. The values of PME, RAE and MHD for multilevel thresholding are 13.34%, 53.34% and 1.47, respectively, while those for maximum entropy are 11.27%, 51.26% and 33.92.

TABLE V. AVERAGE PERFORMANCE OF THE THRESHOLDING ALGORITHM

IV. CONCLUSION
The performance of the thresholding algorithms was compared for two different purposes, i.e. to obtain the breast area and to obtain the fibroglandular tissue area. The comparison using the three parameters PME, RAE and MHD shows that Zack algorithm has the best performance in obtaining the breast area. Meanwhile, for obtaining the fibroglandular tissue area, there are two thresholding algorithms with the best performance, i.e. multilevel thresholding and maximum entropy. The results suggest that Zack algorithm is well suited to obtaining the breast area, while multilevel thresholding and maximum entropy are the better choices for obtaining the fibroglandular tissue, although with considerably larger errors. Further research needs to be conducted to improve the performance of thresholding algorithms in obtaining the fibroglandular tissue area, for example by using fuzzy c-partition entropy or intelligent system methods.
in this research is the analysis to determine the performance of each automated thresholding algorithm that is used to obtain the breast area and the fibroglandular tissue area. The five thresholding algorithms were tested on five sample mammogram images, mammogram 1 to mammogram 5, shown in Figure 1(a) to 1(e). The histograms of the five mammogram images are shown in Figure 2(a) to 2(e); that is, the mammogram image in Figure 1(a) has the histogram shown in Figure 2(a), and likewise for the other four mammogram images. Observation of the histograms of the five mammogram images shows that they have various forms. The histograms are not consistently bimodal or nearly bimodal; several are unimodal or multimodal.