Texture Based Segmentation using Statistical Properties for Mammographic Images

Segmentation is a basic and important step in computer vision and image processing. For medical images in particular, accuracy matters more than computational complexity and the processing time it implies. However, as the volume of patient data keeps increasing, processing time must be considered along with accuracy. In this paper, a new algorithm is proposed for texture based segmentation using statistical properties. The probability of each intensity value in the image is calculated directly, and a new image is formed by replacing each intensity by its probability. Variance is then calculated in three different ways to extract the texture features of mammographic images. The results of the proposed algorithm are compared with the well-known GLCM and Watershed algorithms.


I. INTRODUCTION
Segmenting mammographic images into homogeneous texture regions representing disparate tissue types is often a useful preprocessing step in the computer-assisted detection of breast cancer. With the increasing size and number of medical images, the use of computers to facilitate their processing and analysis has become necessary. Estimating the volume of a whole organ, of parts of the organ and/or of objects within an organ, such as tumors, is clinically important in the analysis of medical images. The relative change in size, shape and spatial relationships between anatomical structures, obtained from intensity distributions, provides important information in clinical diagnosis for monitoring disease progression. Radiologists are therefore particularly interested in observing the size, shape and texture of organs and/or parts of organs. For this reason, organ and tissue morphometry is performed in every radiological imaging centre.
Texture based image segmentation has been an area of intense research activity in the past few years, and many algorithms have been published as a result, ranging from simple thresholding methods to the most sophisticated random field methods. Texture is the repeating occurrence of homogeneous regions in an image. Texture image segmentation identifies image regions that are homogeneous with respect to a selected texture measure. Recent approaches to texture based segmentation are based on linear transforms and multiresolution feature extraction [1], Markov random field models [2,3], wavelets [4-6] and fractal dimension [7]. Although unsupervised texture-based image segmentation is not a novel approach, such methods have seen limited adoption because of their high computational complexity.
Segmentation methods are based on some pixel or region similarity measure in relation to the local neighborhood. In texture segmentation these similarity measures use textural spatial-spectral-temporal features such as Markov random field (MRF) statistics [8][9][10], co-occurrence matrix based features [11], Gabor features [12], local binary patterns (LBP) [13], autocorrelation features and many others. A number of image processing methods have been proposed to perform this task. S. M. Lai et al. [14] and W. Qian et al. [15] proposed modified and weighted median filtering, respectively, to enhance the digitized image prior to object identification. D. Brzakovic et al. [16] used thresholding and fuzzy pyramid linking for mass localization and classification. Other investigators have proposed using the asymmetry between the right and left breast images to determine possible mass locations. Yin et al. [17] use both linear and nonlinear bilateral subtraction, while the method by Lau et al. [18] relies on "structural asymmetry" between the two breast images. Recently, Kegelmeyer reported promising results for detecting spiculated lesions based on local edge characteristics and Laws texture features [19][20][21]. The above methods produced a true positive detection rate of approximately 90%. Various segmentation techniques have been proposed based on statistically measurable features in the image [22][23][24][25][26][27]. Clustering algorithms, such as k-means and ISODATA, operate in an unsupervised mode and have been applied to a wide range of classification problems.
In this paper, gray level co-occurrence matrix based features and the watershed algorithm are compared with the proposed algorithm, which uses statistical properties to segment mammographic images. Section II explains the different algorithms for texture based segmentation in detail. Section III shows the results for those methods and Section IV concludes the work.

II. ALGORITHMS FOR TEXTURE BASED SEGMENTATION
Texture is one of the most important defining characteristics of an image. It is characterized by the spatial distribution of gray levels in a neighborhood. In order to capture the spatial dependence of the gray-level values that contribute to the perception of texture, a two-dimensional dependence matrix for texture analysis is discussed. Because texture is characterized both by the position of each pixel and by the pixel values, many approaches have been used for texture classification.

A. Gray Level Co-occurrence Matrix (GLCM)
The gray-level co-occurrence matrix is a well-known statistical technique for feature extraction. Other statistical techniques exist as well, such as those using the absolute differences between pairs of gray levels in an image segment, or classification measures derived from the Fourier spectrum of image segments. Haralick suggested the use of gray level co-occurrence matrices (GLCM) for the definition of textural features. The elements of the co-occurrence matrix represent the relative frequencies with which two neighboring pixels separated by distance d appear in the image, where one of them has gray level i and the other gray level j. Such a matrix is symmetric and is also a function of the angular relationship between the two neighboring pixels. The co-occurrence matrix can be calculated on the whole image, but by calculating it in a small window that scans the image, a co-occurrence matrix can be associated with each pixel.
For an image with 256 gray levels, a 256×256 co-occurrence matrix must be computed at every position of the image. Such matrices are large and their computation is memory intensive. It is therefore justified to use a smaller number of gray levels, typically 64 or 32. There is no unique way to choose the values of distance, angle and window size, because they depend on the size of the pattern.
Using the co-occurrence matrix, textural features are defined as follows. Maximum Probability: $\max_{i,j}(P_{ij})$ (1.1). In the feature definitions, $\mu_x$ and $\mu_y$ are the means and $\sigma_x$, $\sigma_y$ are the standard deviations. Amongst all these features, variance and probability gave the best results. Hence the results for these features extracted using the gray level co-occurrence matrix are displayed in section III.
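The construction described above can be illustrated with a minimal sketch. It assumes a single pixel offset (dx, dy), a symmetric matrix and plain Python lists; the function names `quantize`, `glcm` and `max_probability` are hypothetical and are not taken from the paper's implementation.

```python
def quantize(image, levels, max_value=255):
    """Map gray values in [0, max_value] to the range [0, levels - 1]."""
    return [[min(p * levels // (max_value + 1), levels - 1) for p in row]
            for row in image]


def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Return the normalised co-occurrence matrix P[i][j] for offset (dx, dy).

    Assumes the image contains at least one valid pixel pair for the offset.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def max_probability(p):
    """Maximum Probability feature (Eq. 1.1): the largest entry of P."""
    return max(max(row) for row in p)


# Tiny example image whose values are already quantized to 4 levels (0..3).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
P = glcm(image, levels=4)
print(max_probability(P))
```

To obtain a per-pixel texture feature as described in the text, the same `glcm` call would be repeated on a small window centred on each pixel, which is where the high computational cost comes from.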

B. Watershed Algorithm
The watershed transformation is a powerful tool for image segmentation. Beucher and Lantuejoul were the first to apply the concept of watersheds and divide lines to segmentation problems [28]. They used it to segment images of bubbles and metallographic pictures. The watershed transformation considers the gradient magnitude of an image as a topographic surface. Watershed segmentation [29] classifies pixels into regions using gradient descent on image features and analysis of weak points along region boundaries. The image feature space is treated, using a suitable mapping, as a topological surface where higher values indicate the presence of boundaries in the original image data. It uses an analogy with water gradually filling low-lying landscape basins. The basins grow with increasing amounts of water until they spill into one another. Small basins (regions) gradually merge into larger basins. Regions are formed by using local geometric structure to associate the image domain features with local extrema.
Watershed techniques produce a hierarchy of segmentations, so the resulting segmentation has to be selected either using prior knowledge or manually by trial and error. Hence, with this method image segmentation cannot be performed accurately and adequately unless we first construct the objects we want to detect. These methods are well suited to fusing different measurements and are less sensitive to user-defined thresholds. In this approach, picture segmentation is not the primary step of image understanding. On the contrary, a fair segmentation can be obtained only if we know exactly what we are looking for in the image. For this paper, the watershed algorithm for mammographic images is implemented as described in [30], and its results are displayed in Figures 1(b) and 2(b).
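The flooding analogy can be sketched as follows. This is a simplified illustration and not the implementation of [30]: it assumes the input is already a topographic surface (for example a gradient-magnitude image), uses 4-connectivity, and does not treat plateaus specially. The function name `watershed_flood` is hypothetical.

```python
def watershed_flood(image):
    """Label catchment basins by visiting pixels in order of increasing height.

    Returns a label image: basins are numbered 1, 2, ...; 0 marks watershed
    (divide) pixels where two different basins meet.
    """
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]  # None = not flooded yet
    order = sorted((image[r][c], r, c)
                   for r in range(rows) for c in range(cols))
    next_label = 1
    for _, r, c in order:
        basins, touches_divide = set(), False
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-connectivity
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] is not None:
                if labels[rr][cc] == 0:
                    touches_divide = True
                else:
                    basins.add(labels[rr][cc])
        if len(basins) == 1:
            labels[r][c] = basins.pop()   # reached by water from one basin
        elif len(basins) > 1 or touches_divide:
            labels[r][c] = 0              # basins meet here: divide line
        else:
            labels[r][c] = next_label     # new local minimum, new basin
            next_label += 1
    return labels


smooth = [[0, 1, 2, 1, 0]]   # two minima separated by one ridge
noisy = [[0, 2, 1, 2, 0]]    # a small dip on the ridge adds a third minimum
print(watershed_flood(smooth))
print(watershed_flood(noisy))
```

In the second profile the small dip on the ridge becomes a basin of its own, which illustrates why every local minimum caused by noise produces an extra region and why plain watershed tends to over-segment.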

C. Proposed Algorithm
From the previous sections it can be inferred that, even though variance using GLCM gives proper tumor demarcation for mammographic images, it requires a large computation time to calculate the statistical properties of the image. The watershed algorithm is comparatively less complex and therefore requires less computation time, but it produces over-segmentation. Hence, to achieve proper segmentation with less complexity, a new algorithm is proposed. The proposed algorithm uses statistical properties such as variance and probability to group pixels into regions, and an image is then formed for each statistical property.

1) Probability
Images are modeled as a random variable. A full understanding of the properties of images, and of the conclusions drawn from them, thus demands accurate statistical models of images. In this paper, the probability of the image is considered for extracting the features of an image.
For the complete image, the probability of the i-th gray level is calculated as $P(i) = \frac{X_i}{M \times N}$, where $X_i$ is the number of pixels with the i-th gray level and M and N are the numbers of rows and columns of the image.
After calculating this, an image is formed which contains, in place of each gray level, the probability value of that gray level. Since the probability values are very small, they are not visible. To make this image perceptible, histogram equalization is applied, and the result is displayed as the equalized probability image, shown in Figures 1(e) and 2(e) for the mammographic images.
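A minimal sketch of this step is given below, assuming integer gray levels in 0..255 stored as plain Python lists. The helper names `probability_image` and `equalize` are hypothetical, and the rank-based equalization shown is one common way to spread values over the display range, not necessarily the one used by the authors.

```python
def probability_image(image, levels=256):
    """Replace every gray level i by P(i) = X_i / (M * N)."""
    m, n = len(image), len(image[0])
    counts = [0] * levels            # X_i: number of pixels with gray level i
    for row in image:
        for p in row:
            counts[p] += 1
    prob = [x / (m * n) for x in counts]
    return [[prob[p] for p in row] for row in image]


def equalize(values, out_max=255):
    """Histogram-equalise a 2-D list of numbers to integers in [0, out_max]."""
    flat = sorted(v for row in values for v in row)
    total = len(flat)
    cdf = {}
    for rank, v in enumerate(flat, start=1):
        cdf[v] = rank / total        # fraction of pixels with value <= v
    return [[round(cdf[v] * out_max) for v in row] for row in values]


image = [[0, 0, 1],
         [0, 2, 1]]
prob_img = probability_image(image)   # values such as 0.5, 1/3, 1/6
eq_img = equalize(prob_img)           # stretched to 0..255 for display
print(prob_img)
print(eq_img)
```

Only one histogram pass over the image is needed, which is the reason this step is much cheaper than computing a co-occurrence matrix per pixel.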

2) Variance
Variance is a measure of the dispersion of a set of data points around their mean value. It is the mathematical expectation of the squared deviations from the mean. The variance of a real-valued random variable is its second central moment, and it is also its second cumulant. The variance of a random variable is the square of its standard deviation.
$\mathrm{Var}(X) = E[(X - \mu)^2]$ (1.7), where $\mu = E(X)$ is the expectation (mean) of the random variable X. That is, it is the expectation of the square of the deviation of X from its own mean. It can be expressed as "the average of the square of the distance of each data point from the mean"; thus it is the mean squared deviation. The same definition is followed here for images. Initially, the probability of the image is calculated. Since the probability values are very small, the equalized probability image is used as the input image, and the variance of the probability image is found using a 3x3 window as given by Equation 1.7. Results are shown in section III.
With this approach any abnormality in the image can be observed very easily, but quite often the radiologist needs other details. In this case the original image is used as the input image instead of the equalized probability image. The variance of the original image is calculated using the same Equation 1.7 with a 3x3 window, and the results are shown as the direct variance image in section III. In the third approach, variance is calculated using the probability of the image as given by Equation 1.8. Results are shown as variance using probability in Figures 1(h) and 2(h). Variance using probability: $E[(X - \mu)^2 \times P(X)]$ (1.8)
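The windowed variance of Equation 1.7 can be sketched as below. The sketch assumes that the window is clipped at the image border, which the paper does not specify; the function name `local_variance` is hypothetical. The same function applies to any 2-D input, such as the equalized probability image.

```python
def local_variance(image, size=3):
    """Variance E[(X - mu)^2] of each pixel's size x size neighbourhood."""
    rows, cols = len(image), len(image[0])
    half = size // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the window, clipped at the image border (an assumption).
            window = [image[rr][cc]
                      for rr in range(max(0, r - half), min(rows, r + half + 1))
                      for cc in range(max(0, c - half), min(cols, c + half + 1))]
            mu = sum(window) / len(window)
            out[r][c] = sum((x - mu) ** 2 for x in window) / len(window)
    return out


# A vertical edge: flat on the left, a brighter column on the right.
edge = [[0, 0, 9],
        [0, 0, 9],
        [0, 0, 9]]
var_img = local_variance(edge)
print(var_img[1][1], var_img[1][0])
```

Flat neighbourhoods give zero variance while windows that straddle an intensity change give large values, which is what makes the variance image useful for highlighting region boundaries.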
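One possible reading of Equation 1.8 is sketched below: inside each 3x3 window, every squared deviation from the window mean is weighted by the global probability of that pixel's gray level before averaging. This interpretation, the border clipping and the function names are assumptions, since the paper gives only the formula.

```python
def gray_level_probabilities(image, levels=256):
    """P(i) = X_i / (M * N) for every gray level i of the whole image."""
    m, n = len(image), len(image[0])
    counts = [0] * levels
    for row in image:
        for p in row:
            counts[p] += 1
    return [x / (m * n) for x in counts]


def variance_using_probability(image, size=3, levels=256):
    """Windowed E[(X - mu)^2 * P(X)], with P taken from the global histogram."""
    rows, cols = len(image), len(image[0])
    prob = gray_level_probabilities(image, levels)
    half = size // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Window clipped at the image border (an assumption).
            window = [image[rr][cc]
                      for rr in range(max(0, r - half), min(rows, r + half + 1))
                      for cc in range(max(0, c - half), min(cols, c + half + 1))]
            mu = sum(window) / len(window)
            out[r][c] = sum((x - mu) ** 2 * prob[x] for x in window) / len(window)
    return out


edge = [[0, 0, 9],
        [0, 0, 9],
        [0, 0, 9]]
vp_img = variance_using_probability(edge)
print(vp_img[1][1], vp_img[1][0])
```

Compared with the direct variance, rare gray levels contribute less to the result under this reading, because their squared deviations are multiplied by a small probability.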

III. RESULTS
Mammography images from the mini-MIAS database were used in this paper to implement GLCM, Watershed and the proposed algorithm for tumor demarcation.

IV. CONCLUSION
From Table 1 it can be inferred that the results of the GLCM method are acceptable, although not very good, and come with high computational complexity. For the watershed algorithm, the results are not acceptable because of over-segmentation. The results of the proposed methods using statistical parameters such as variance and probability are all acceptable; amongst them, the direct variance method gives the best results for mammographic images, as verified by a radiologist.

Fig. 2(a) shows the original image with a tumor. It has fatty tissue as background. The class of abnormality present is CIRC, which means well-defined/circumscribed masses. Image 1 and Image 2 (mdb184 and mdb028 from the database) have malignant abnormalities. Figures 1(a) and 2(a) show the original mammographic images. Figures 1(b) and 2(b) show segmentation using the watershed algorithm.
Figures 1 and 2 (c)-(d) show results for probability and variance using GLCM.
Figures 1 and 2 (e)-(h) show the equalized probability, variance of probability, direct variance and variance using probability images for image 1 and image 2.

Figure 1. Results of Watershed, GLCM and proposed algorithm for image 1

Figure 2. Results of Watershed, GLCM and proposed algorithm for image 2

Table 1: Performance comparison of GLCM, Watershed and Proposed Algorithm