Segmentation of Ultrasound Breast Images Using Vector Neighborhood with Vector Sequencing on KMCG and Augmented KMCG Algorithms

Abstract: B-mode ultrasound (US) imaging is a popular and important modality for examining a range of clinical problems, and it is also used as a complement to mammography to detect and diagnose the nature of breast tumors. To understand the nature (benign or malignant) of a tumor, most radiologists focus on its shape and boundary. The boundary is therefore as important a characteristic of the tumor as its shape. Tracing the contour manually is a time-consuming and tedious task. An automated and efficient segmentation method also helps radiologists observe changes in the volume of a tumor (growth or shrinkage). Inherent artifacts in US images, such as speckle, attenuation and shadows, are major hurdles to achieving proper segmentation. Along with these artifacts, inhomogeneous texture in the region of interest is also a major concern. Most of the algorithms studied in the literature include a noise removal technique as a preprocessing step. In this paper, we eliminate this step and directly handle images with a high degree of noise. A VQ based clustering technique is proposed for US image segmentation with the KMCG and augmented KMCG codebook generation algorithms. Using these algorithms, images are divided into clusters, and these clusters are then merged sequentially. A novel technique of sequential cluster merging with vector sequencing is used. We also propose a technique to find the region of interest in the selected cluster through seed vector acquisition. Results obtained by our method are compared with our earlier method and with the marker-controlled watershed transform. In the opinion of the expert radiologist, our method gives better results.


I. INTRODUCTION
B-mode ultrasound (US) imaging is a widely used diagnostic tool for examining a range of clinical problems because of its real-time image availability, non-invasive nature and low cost per scan. It poses a very low health risk to the patient during examination and image acquisition relative to other imaging modalities [1,2]. In tissue characterization of the breast, US imaging is a complementary method to the mammogram for distinguishing between benign and malignant solid masses [3,4]. A malignant tumor usually has different characteristics from a benign tumor, such as an irregular shape, ill-defined margins and heterogeneous echo texture. A round or oval shape with a well-defined boundary is the most valuable information for identifying a tumor as benign and can be used to reduce the number of biopsies performed [5].
Segmentation of US images provides clinically valuable information for radiologists in terms of shape irregularity, boundary definition and quantitative measurement of lesion size. Accurate measurement of a lesion is used to monitor tumor growth and also helps in treatment and surgical planning [6]. Due to the limitations of the acquisition process (dependence on an expert radiologist) and of the technology, detecting a tumor and measuring its size manually is a difficult and time-consuming process. Moreover, even with improvements in scanning devices (transducers), inherent artifacts such as speckle (which decreases the signal-to-noise ratio), attenuation, shadows and signal dropout degrade the quality of the acquired US images. Many traditional segmentation algorithms are not suitable for such poor-quality images unless preprocessing steps are used to remove these artifacts [7]. Attenuation, which is caused by the gradual loss in intensity of the ultrasound waves, generates inhomogeneous intensities within regions of the same tissue type (i.e. tumor) and significant overlap between tissues of different classes (i.e. at the boundary). Blurred boundaries and tissue intensity variation in the region of interest are major constraints on the accuracy of automated intensity-based segmentation [8]. Some methods in the literature handle this issue with the help of multiple images of the same region (a sequence of images), but processing multiple images together is computationally inefficient [9,10]. Other methods discussed in the literature for segmentation and classification, such as texture features, thresholding [11,12], region growing and region merging [13,14], neural networks, wavelet and watershed transforms [15,16], and clustering [17,18], are strongly influenced by these inherent artifacts and involve steps to remove them in order to enhance image quality. The most common artifact in US images is speckle, and its degree depends on human expertise, the acquisition process and the devices used. This phenomenon strongly affects the accuracy of segmentation, so noise removal has been extensively studied by researchers and several solutions have been proposed [19][20][21][22]. In this paper, we demarcate the boundary of the tumor in highly noisy and attenuated US images without any preprocessing (i.e. image enhancement) step. We propose Vector Quantization (VQ) based clustering with new computationally efficient codebook generation algorithms [23], namely Kekre's Median Codebook Generation (KMCG) and augmented KMCG. An improved method of cluster merging with vector neighborhood is proposed on top of the novel method of sequential merging of clusters [24], and the two are compared with each other.
The remaining sections of this paper are organized as follows. Section II discusses vector quantization, its encoding technique and its usability in segmentation. Section III discusses the original VQ based KMCG codebook generation algorithm and augmented KMCG, along with training set formation for horizontal and vertical division of the images. A novel technique of vector neighborhood with vector sequencing is described in section IV, along with a new technique for closing the holes in the clusters. Section V discusses the results, followed by the conclusion in section VI.

II. VECTOR QUANTIZATION
Vector Quantization (VQ) was initially developed and implemented for image compression, with the help of many codebook generation and quantization algorithms [25,26,27,28], but nowadays it is extensively used in other applications, such as image segmentation [29], speech recognition [30], pattern recognition and face detection [31,32], tumor demarcation in MRI and mammogram images [33,34], and content based image retrieval [35,36]. In this paper, this method is used as a clustering aid in the demarcation of the area of interest (cysts and tumors) in breast ultrasound images.
A two-dimensional image I(X, Y) is converted into a K-dimensional vector space of size M, V = {V_1, V_2, V_3, ..., V_M} (the training set). VQ is used as a mapping function to convert this K-dimensional vector space to a finite set CB = {C_1, C_2, C_3, C_4, ..., C_N}. CB is a codebook of size N, and each codevector from C_1 to C_N represents a specific set of vectors from the training set of dimension K and size M. The codebook is much smaller than the training set, yet it can represent the entire training set. In this paper, the work is done in the spatial domain and the size of the codebook is limited to eight codevectors, which are then used to form eight clusters. As discussed in sections III-A and III-B, KMCG and augmented KMCG are used as VQ based clustering algorithms, and each cluster represents a different region of the image.

A. Kekre's Median Codebook Generation algorithm (KMCG)
This algorithm was proposed for data compression [37,38,39], but this VQ based algorithm has proved its potential and usefulness in various applications, such as segmentation of mammographic images [35], content based image retrieval and face recognition. In this paper, this iterative algorithm is explored for the demarcation of tumors in breast US images.
Initially, image I is divided into M non-overlapping blocks of size 2x2, and these blocks are converted into vectors of dimension 1 x K (K = 4), as shown in Fig. 1(a). Let V be the training set, i.e. V = {V_1, V_2, V_3, ..., V_M}, where V_1 to V_M are the individual vectors of the training set. This entire training set (a matrix of dimension M x 4) is considered the first cluster and becomes the input to the KMCG algorithm. To divide this cluster, during the first iteration the entire training set is sorted with respect to the first value of all vectors (i.e. the first column) and the median of this sorted column is obtained. This median is considered the first codevector, and the training set is divided into two clusters with respect to it, as shown in Fig. 1(b) (i.e. the upper part including the median is the first cluster C1 and the lower part is the second cluster C2). In the second iteration, clusters C1 and C2 are sorted separately with respect to the second column (i.e. the second value of all the vectors) and the medians of C1 and C2 are obtained. Clusters C1 and C2 are then divided into four clusters using these new medians, as shown in Fig. 1(c). The same procedure is repeated until the desired number of clusters is obtained. In Fig. 1, cluster formation is shown up to the second iteration with K = 4. After the desired number of clusters has been acquired, they are merged sequentially as discussed in section IV.

B. Augmented KMCG
The KMCG algorithm has been augmented to decrease the time required by the clustering process. In this method, the vector size is increased to 6 columns, of which the last four columns store the original gray levels obtained from the 2 x 2 blocks of the image. The average of each block is calculated separately and stored in the second column of the respective vector. The sequence number of the respective vector is stored in the first column.
Consequently, the size of the entire training set used for this method becomes M x 6. Unlike the original KMCG, sorting is done only once, on the second column, and the vectors of the entire training set are reordered accordingly while keeping each vector intact (i.e. vectors are arranged in increasing order of the average values stored in the second column). To make clusters, the median value is obtained from the second column and the entire training set is divided into two clusters. These two clusters are then handled separately, and a median value is obtained from the same column of each. These median values further divide the two clusters into four. This process is repeated until the desired number of clusters has been formed. Since sorting is applied only once, on the second column (i.e. the column of average gray values of the blocks), the time required for clustering is drastically reduced compared with the original KMCG algorithm.
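The two splitting strategies of subsections A and B can be compared in a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`split_sorted`, `kmcg`, `augmented_kmcg`) are hypothetical, the split is made by position in the sorted order, and the lower median is used for clusters of even size, a detail the text does not specify.

```python
from typing import List, Sequence

Vector = Sequence[float]


def split_sorted(ordered: List[Vector]) -> List[List[Vector]]:
    """Split an already sorted cluster at its median position.

    The upper part, including the median vector, becomes the first cluster.
    Taking the lower median for even-sized clusters is an assumption.
    """
    mid = (len(ordered) - 1) // 2
    return [ordered[:mid + 1], ordered[mid + 1:]]


def kmcg(training_set: List[Vector], n_clusters: int,
         first_column: int = 0) -> List[List[Vector]]:
    """Original KMCG: every cluster is re-sorted on the next column
    in each iteration and then split at its median.

    Assumes n_clusters is a power of two and that one unused column is
    available per iteration (e.g. three columns for eight clusters).
    """
    clusters = [list(training_set)]
    column = first_column
    while len(clusters) < n_clusters:
        next_clusters = []
        for cluster in clusters:
            ordered = sorted(cluster, key=lambda v: v[column])
            next_clusters.extend(split_sorted(ordered))
        clusters = next_clusters
        column += 1
    return clusters


def augmented_kmcg(training_set: List[Vector], n_clusters: int,
                   avg_column: int = 1) -> List[List[Vector]]:
    """Augmented KMCG: sort once on the average column, then only split."""
    clusters = [sorted(training_set, key=lambda v: v[avg_column])]
    while len(clusters) < n_clusters:
        clusters = [half for c in clusters for half in split_sorted(c)]
    return clusters
```

In this sketch the original variant calls `sorted` once per cluster per iteration, while the augmented variant calls it a single time, which is the source of the time saving described above. For the M x 5 training sets of section III-C, `first_column=1` would skip the sequence number column; that choice is likewise an assumption.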

C. Training set formation with vector sequencing
The original KMCG and augmented KMCG algorithms are used to form clusters from the training set. In this paper, two separate training sets are formed for each codebook generation algorithm; Fig. 2 shows the training set used for the augmented KMCG algorithm. The first training set is created by dividing image I(X, Y) into 2x2 non-overlapping blocks horizontally, and the sequence number of each block is added at the first location of the respective vector as the vector number, as shown in Fig. 2; the first column of the training set therefore contains the vector sequence number. Similarly, the same image is divided into non-overlapping blocks vertically to form the second training set, again with the sequence number of each block in the first column of the vector. The same procedure is followed to create two training sets for the original KMCG algorithm, except that average gray levels are not calculated. The size of these training sets is therefore M x 5, and clusters are created according to the method discussed in section III-A.
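The training set formation described above can be sketched as follows. This is a hedged illustration: the function name `block_vectors`, the 1-based sequence numbers (chosen to match the B1, ..., BM labels of Fig. 2) and the row-major order of the four pixels inside a block are assumptions, and the image is assumed to have even width and height.

```python
from typing import List


def block_vectors(image: List[List[int]], order: str = "horizontal",
                  with_average: bool = True) -> List[List[float]]:
    """Build a training set from non-overlapping 2x2 blocks.

    order="horizontal" numbers the blocks row by row; any other value
    numbers them column by column (vertical division).
    with_average=True gives the M x 6 layout [VSN, AGL, g1, g2, g3, g4];
    with_average=False gives the M x 5 layout [VSN, g1, g2, g3, g4].
    """
    block_rows, block_cols = len(image) // 2, len(image[0]) // 2
    if order == "horizontal":
        positions = [(r, c) for r in range(block_rows) for c in range(block_cols)]
    else:
        positions = [(r, c) for c in range(block_cols) for r in range(block_rows)]

    training_set = []
    for vsn, (r, c) in enumerate(positions, start=1):
        grays = [image[2 * r][2 * c], image[2 * r][2 * c + 1],
                 image[2 * r + 1][2 * c], image[2 * r + 1][2 * c + 1]]
        if with_average:
            training_set.append([vsn, sum(grays) / 4, *grays])
        else:
            training_set.append([vsn, *grays])
    return training_set
```

Both orders contain the same blocks; only the sequence number attached to each block differs, which is what the later neighborhood search relies on.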

A. Sequential cluster merging
As discussed in section III-C, a pair of separate training sets is used for each codebook generation algorithm to make clusters. Initially, the first training set of the pair (i.e. formed by horizontal division of the image) is divided into eight clusters using the codebook generation algorithm; similarly, the second training set (i.e. formed by vertical division of the image) is divided into another set of eight clusters by the same algorithm. A pair of similar cluster sets is thus obtained for each algorithm. Fig. 4 shows cluster images obtained using the KMCG algorithm, and Fig. 5 shows cluster images obtained using augmented KMCG. These two sets of clusters are handled separately in the segmentation process. The clusters are then merged sequentially, one by one, to form new sets of merged clusters. As shown in Fig. 6, the first cluster is added to the second, the resultant cluster is then added to the third, that result is added to the fourth, and so on. Clusters obtained by augmented KMCG are merged similarly and shown in Fig. 7.
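Sequential merging amounts to a running union of the clusters. A minimal sketch, assuming each cluster is a list of vectors and using the hypothetical name `merge_sequentially`:

```python
from typing import List, Sequence


def merge_sequentially(clusters: List[Sequence]) -> List[list]:
    """Return the merged clusters: the i-th entry holds clusters 1..i combined."""
    merged = []
    running: list = []
    for cluster in clusters:
        running = running + list(cluster)  # add the next cluster to the result so far
        merged.append(list(running))       # store a copy as the i-th merged cluster
    return merged
```

With eight input clusters this yields eight merged clusters, the first being the first cluster itself and the last containing the whole training set.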

B. Seed vector and exploration of clusters with horizontal and vertical sequencing
Images are divided horizontally and vertically into blocks, and the sequence numbers of these blocks are used as vector sequence numbers (VSN) and added to the first column of the training set. After eight clusters have been made for these training sets, the first cluster is sorted by its vector sequence number (i.e. the first column) and the vectors are reordered accordingly. The median is then taken from the first column of the cluster, and the vector that contains this median value is considered the seed vector, as shown in Fig. 3. In most ultrasound images the gray level distribution is inhomogeneous, but the pixels with lower gray values are concentrated in the region of interest. After clustering, these gray levels are therefore usually components of the first cluster in the form of vectors. Because of this characteristic of the images, the seed vector obtained from the first cluster mostly falls in the region of interest. The sequence number of the seed vector is therefore used to gather neighboring vectors in the cluster. All the vectors in the cluster are searched with respect to the sequence number of the seed vector. Searching is done on both the right-hand side and the left-hand side of the seed vector. Since the cluster is sorted by sequence number, the search considers only consecutive sequence numbers; if any break occurs in the sequence, the search stops and all vectors searched so far are marked as found. This line of vectors is considered the seed vector line, as shown in Fig. 3. A new seed vector is obtained by adding the number of vectors present horizontally in a row (i.e. X/2) to the sequence number of the first seed vector. To explore other vectors in the region of interest, this new vector is first checked for presence in the cluster using equation 1. If the new vector is present in the cluster, the second line is explored; this process is repeated, line by line, and the region of interest is grown downward. The same procedure is repeated to explore the part above the original seed vector line; the only difference is that subtraction is used instead of addition to obtain the new seed vector (the number of vectors present horizontally in a row is subtracted from the sequence number of the original seed vector). All marked vectors from these clusters are then preserved and the other vectors are removed. As shown in Fig. 3, vectors that represent noise are removed from the cluster, since they are not neighbors of the seed vectors, and the new cluster is converted to an image of the original size, as shown in Fig. 8(a). A similar procedure is followed to explore the cluster with vertical sequencing, except that the value Y/2 is added to or subtracted from the vector sequence number each time to get the new seed vector. With addition of this value the right-hand side of the region is explored, and with subtraction the left-hand side is explored. This exploration likewise stops when the newly generated vector is not present in the cluster. The cluster image for vertical indexing is shown in Fig. 10(a). The same procedure is applied to the second and third merged clusters after closing the openings as discussed in section IV-C, and cluster images are formed.
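The seed vector acquisition and line-by-line exploration can be sketched as below. This is an illustration only: the names `consecutive_run` and `grow_region` are hypothetical, a plain set-membership test stands in for equation 1 (which is not reproduced in this text), the lower median is assumed for clusters of even size, and runs of consecutive sequence numbers are followed literally, without special handling at the ends of image rows.

```python
from typing import Iterable, List, Set


def consecutive_run(members: Set[int], seed: int) -> Set[int]:
    """Collect the unbroken run of sequence numbers to the left and right of seed."""
    line = {seed}
    left = seed - 1
    while left in members:
        line.add(left)
        left -= 1
    right = seed + 1
    while right in members:
        line.add(right)
        right += 1
    return line


def grow_region(cluster_vsns: Iterable[int], step: int) -> List[int]:
    """Grow the region of interest from the median sequence number.

    step is the number of vectors per line: X/2 for horizontal sequencing,
    Y/2 for vertical sequencing. Returns the marked sequence numbers;
    all other vectors of the cluster are treated as noise and dropped.
    """
    members = set(cluster_vsns)
    if not members:
        return []
    ordered = sorted(members)
    seed = ordered[(len(ordered) - 1) // 2]      # seed vector = median VSN
    marked = consecutive_run(members, seed)      # seed vector line
    for direction in (step, -step):              # addition, then subtraction
        current = seed + direction
        while current in members:                # stand-in for equation 1
            marked |= consecutive_run(members, current)
            current += direction
    return sorted(marked)
```

For example, with five vectors per line, a cluster holding sequence numbers 0, 6, 7, 8, 11, 12, 13, 16, 17 and 19 has seed 11; the lines around 11, 16 and 6 are kept, while the isolated vectors 0 and 19 are discarded as noise.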

C. Closing the opening of clusters
As discussed in section IV-A, clusters are merged one by one, and sets of eight merged clusters are obtained for both horizontal and vertical sequencing. From such a set, the desired merged cluster is selected for closing the openings (breaks in the sequence numbers of the vectors, as shown in Fig. 3; these vectors are not present in the cluster but they are part of the region of interest). In this paper we propose a new technique that closes the openings directly on the cluster rather than on the cluster images. We first select a threshold value, which indicates a number of vectors. The selected cluster is then sorted with respect to its sequence numbers. The sorted cluster is traversed sequentially from the first vector; if a break occurs between consecutive sequence numbers and its length is less than or equal to the threshold value, that many vectors are added to the cluster with zero gray level. The sequence number of the first newly added vector is the sequence number of the vector where the break started plus one. After the openings of these clusters have been closed, the clusters are converted into images of the original size, as shown in Fig. 8(c) and Fig. 10(c) for horizontal and vertical sequencing respectively.
[24]. Our results are compared with the results of other methods, as shown in Fig. 17, Fig. 21, Fig. 25 and Fig. 29, and the best results are indicated by a red box drawn around the image. The marker-controlled watershed transform gives over-segmentation, and with the other method discussed in [24] the boundary around the tumor is not clear.
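The closing step of section IV-C, which works on the cluster itself rather than on the cluster image, can be sketched as follows. The function name `close_openings` is hypothetical, and the sketch assumes each vector is a list whose first entry is the sequence number and whose remaining entries are gray values (all of which are set to zero in the added vectors).

```python
from typing import List, Sequence


def close_openings(cluster: List[Sequence[float]], threshold: int) -> List[list]:
    """Fill breaks in the sequence numbers that are no longer than threshold."""
    ordered = sorted(cluster, key=lambda v: v[0])   # sort by sequence number
    closed: List[list] = []
    for vec in ordered:
        if closed:
            prev = closed[-1][0]
            missing = vec[0] - prev - 1             # length of the break
            if 0 < missing <= threshold:
                # New vectors start at the break's first sequence number plus one.
                for vsn in range(prev + 1, vec[0]):
                    closed.append([vsn] + [0] * (len(vec) - 1))
        closed.append(list(vec))
    return closed
```

Breaks longer than the threshold are left open, so distant noise vectors are not joined to the region of interest.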

VI. CONCLUSION
The VQ based KMCG and augmented KMCG algorithms are used for clustering, which in turn is used to segment ultrasound breast images. We used training sets of size M x 5 and M x 6 for KMCG and augmented KMCG respectively. Since the augmented KMCG algorithm requires sorting only once, it is computationally more efficient than KMCG, and on the basis of visual inspection by an expert radiologist it also gives better segmentation results. In this paper we used vector indexing in the cluster formation process, which helps to obtain the seed vector from which the region of interest is grown. Some images contain inhomogeneous texture within the tumor region, because of which openings may exist. These openings affect the accuracy of the segmentation, so we developed a new technique for closing them. This technique is applied directly to the clusters rather than to the cluster images.
Ultrasound images typically have smooth texture in the region of interest but coarse texture at the boundary between the normal tissue region and the defective tissue region. Tracing the boundary around the region of interest is therefore not a trivial task. In this paper we use US images with strong attenuation and a high degree of noise without any preprocessing, because important information in the image could be lost during a preprocessing step. Our method focuses on computational efficiency as well as the accuracy of the segmentation. Results are compared with the recently developed marker-controlled watershed transform and our newly developed method [24]. Based on visual inspection and the opinion of expert radiologists, our results are improved and more accurate.

Fig. 1. Clustering Using KMCG Algorithm. (a) Entire Training Set Of Size M x K (K = 4) Obtained From Image. (b) Clusters C1 And C2 Obtained After First Iteration W.R.T. First Column, Shown By Arrow. (c) Four Clusters Obtained From C1 And C2 W.R.T. Second Column.

Fig. 2. (a) Original image, divided horizontally into M non-overlapping blocks shown by red boxes, i.e. B1, B2, ..., BM, and stored in the last four columns of the respective vectors. (b) Training set generated from the image shown in (a), where the first column indicates the sequence number of the block, used as the Vector Sequence Number (VSN), and the second column indicates the Average Gray Level (AGL) calculated from the Image Gray Levels (IGL) shown in the last four columns.

Fig. 3. Image Of First Cluster, Used To Obtain Seed Vector And Generate Seed Vector Line. This Seed Vector Is Further Used To Acquire Region Of Interest (ROI) By Vector Neighborhood With Vector Sequence Number.
Fig. 4. Eight Cluster Images Obtained Using Original KMCG Algorithm For Dimension K = 4. Cluster Images Obtained For Both Horizontal And Vertical Division Of Image Are Same.
Fig. 5. Eight Cluster Images Obtained Using Augmented KMCG Algorithm For Dimension K = 4. Cluster Images Obtained For Both Horizontal And Vertical Division Of Image Are Same.
Fig. 6. Eight Sequentially Merged Cluster Images Obtained From Clusters Shown In Fig. 4 Using KMCG For Both Horizontal And Vertical Division.