Bins Formation using CG based Partitioning of Histogram Modified Using Proposed Polynomial Transform 'Y = 2X - X^2' for CBIR

This paper proposes a novel polynomial transform that modifies the original histogram of an image by shifting pixel density towards the high intensity levels, so that a more uniform distribution of pixels is obtained and the image is enhanced. We show the efficient use of this modified histogram for Content Based Image Retrieval (CBIR). In the CBIR system described in this paper, each image is separated into R, G and B planes, and a modified histogram is calculated for each plane. Each modified histogram is partitioned into two parts by calculating its Center of Gravity (CG), and the resulting partitions are used to form 8 bins on the basis of the R, G and B values. These 8 bins hold the count of pixels falling into each combination of the two intensity ranges of the three histograms. The count of pixels in the 8 bins is used as a feature vector of dimension 8 for comparison in the image retrieval process. The bin data are further used to form two more feature vector variations: the Total (sum) and the Mean of the intensities of all pixels counted in each of the 8 bins. These feature vector variations have also produced good image retrieval. The paper compares the proposed system using CG based partitioning of the original histogram and of the histogram modified using the polynomial transform, in both cases forming 8 bins that hold the Count of pixels and the Total and Mean of the intensities of these pixels. The CBIR system is tested using 200 query images from 20 different classes over a database of 2000 BMP images. Query and database image feature vectors are compared using three similarity measures, namely Euclidean distance, Cosine Correlation distance and Absolute distance. Performance of the system is evaluated using three parameters: PRCP (Precision Recall Cross-over Point), LSRR (Length of String to Retrieve all Relevant images) and Longest String.


I. INTRODUCTION
Content Based Image Retrieval (CBIR) is a promising approach for searching desired images in large image databases using image contents such as color, shape and texture, represented as feature vectors in various formats [1][2][3]. Digital images are a convenient medium for describing and storing the spatial, temporal, spectral and physical components of information from various domains (e.g. satellite images, biomedical images). Today's advanced technology has made it easy to capture and store a large number of images. These images can be made available to users in many fields, including scientific, educational, medical and industrial applications. Managing these large image databases and using them efficiently is an important issue [4][5]. CBIR is an important area of study today because it overcomes the drawbacks of text based image retrieval techniques [6][7]. It is a vast area in which researchers can seek new approaches for retrieving similar images from a database with high accuracy and low computational complexity [8][9][10][11]. We have taken a step towards the same problem with a new approach to extracting image features in the spatial domain. Various histogram based techniques are used for CBIR [12][13][14][15][16].
A color histogram depicts the color distribution using a set of bins. With the global color histogram, an image is encoded by its color histogram, and the distance between two images is determined by the distance between their color histograms. Using the full histogram increases the size of the feature vector and also the time required to calculate the distance between histograms. The size of the feature vector decides the time required for comparing feature vectors during retrieval, so it is an important factor to consider while designing an efficient CBIR system [17][18][19]. In our system we have addressed this issue by exploring a new technique for forming the bins, so that the color details of the image are separated properly, the feature vector size is reduced and the comparison takes less time. We first modify the image by modifying its histogram using the new polynomial transform introduced in this paper. Then, to get a uniform pixel distribution, we use the CG, i.e. the Center of Gravity, to divide the histogram into two parts. This process is applied separately to the R, G and B planes of the image to extract its features, and the partitioning leads to the formation of 8 bins. After preparing the feature vector databases for the database of 2000 BMP images, we tested the system performance using 200 query images, i.e. 10 images from each of the 20 classes of the database [20]. The comparison between query and database images is performed using the Euclidean, Cosine Correlation and Absolute distance measures. Performance of the system is evaluated using three parameters: PRCP, LSRR and Longest String [21][22]. The paper is organized as follows: Section II describes the polynomial transform introduced to modify the histogram. Section III elaborates on the partitioning process using CG and the formation of 8 bins. Results are discussed in detail in Section IV, followed by conclusions in Section V.
www.ijacsa.thesai.org

II. POLYNOMIAL TRANSFORM TO MODIFY THE HISTOGRAM
The polynomial transform given in equation (1) is designed to modify the histogram such that, if the image has a large number of pixels at low intensity levels, they are shifted towards the high intensity levels. This enhances the image. The polynomial equation is given as follows.
$$y = 2x - x^2 \qquad (1)$$

As shown in Figure 1, the values X are transformed to Y such that Y is always greater than X for all values in the range 0 to 1, i.e. for 0 < x < 1, because y - x = x(1 - x) > 0 on that interval. We have used this transform to modify the original histograms of the R, G and B planes of the images. In Figure 1 the blue curve shows y = x, while the red curve, obtained for the given polynomial function, shows the shifting of pixels from the lower side to the higher side.
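As a minimal sketch of the transform, the snippet below applies y = 2x - x^2 to 8-bit intensity levels. It assumes that intensities are normalized to [0, 1] by dividing by 255 and rescaled afterwards; the paper only states the range 0 to 1, so the normalization and the function names are illustrative assumptions.

```python
def polynomial_transform(x: float) -> float:
    """Apply y = 2x - x^2 to a normalized intensity x in [0, 1]."""
    return 2.0 * x - x * x


def transform_level(level: int, max_level: int = 255) -> int:
    """Normalize an integer intensity, transform it, and rescale (assumed scheme)."""
    x = level / max_level
    return round(polynomial_transform(x) * max_level)


# A few sample 8-bit levels and their transformed values.
samples = [0, 64, 128, 192, 255]
mapped = [transform_level(v) for v in samples]
print(mapped)  # [0, 112, 192, 239, 255]
```

The end points 0 and 255 stay fixed, while every level in between moves upward, which is the shift towards high intensities described above.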

A. Modified Histograms: R, G and B Planes
In the feature extraction process the image is separated into R, G and B planes. The histogram of each plane is obtained and modified using the given polynomial transform. The effect of the polynomial transform is shown in Figure 2, where the image is separated into R, G and B planes and the original histogram of each plane is obtained. The planes are then modified by applying the polynomial transform to their histograms. Comparing the original and modified histograms, we can observe that pixels at low intensity levels are shifted to higher intensities in the modified histogram. The image information can be seen more clearly in the modified image planes.
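The per-plane modification can be sketched as a lookup table applied to every pixel of a plane, followed by a histogram count. This is an illustrative sketch only: the plane is a small nested list rather than a real 128 x 128 image, and the 0 to 255 normalization and the helper names are assumptions.

```python
def histogram(plane, levels=256):
    """Count how many pixels of a plane fall at each intensity level."""
    counts = [0] * levels
    for row in plane:
        for value in row:
            counts[value] += 1
    return counts


def modify_plane(plane, max_level=255):
    """Map every pixel through y = 2x - x^2 using a precomputed lookup table."""
    lut = [
        round((2 * (v / max_level) - (v / max_level) ** 2) * max_level)
        for v in range(max_level + 1)
    ]
    return [[lut[value] for value in row] for row in plane]


# Tiny stand-in for one color plane of an image.
plane = [[0, 64], [64, 128]]
modified = modify_plane(plane)
original_hist = histogram(plane)
modified_hist = histogram(modified)
print(modified)  # [[0, 112], [112, 192]]
```

The pixel count is unchanged by the transform; only the positions of the histogram peaks move towards the higher levels.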

B. Partitioning
The modified histograms of the three planes are each partitioned into two parts by calculating the Center of Gravity (CG) using equation (2). The CG is the point on the scale of intensity levels at which the moments of the image intensities above and below it are equal. It is the balancing point that divides the histogram into two parts.

$$CG = \frac{\sum_{i} L_i W_i}{\sum_{i} W_i} \qquad (2)$$

where $L_i$ is the intensity level and $W_i$ is the number of pixels at $L_i$. Once the modified histogram is divided into two parts, the bins are formed as explained in the following section.
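A minimal sketch of the CG computation follows. It assumes that equation 2 is the usual count-weighted mean of the intensity levels, with L_i the level and W_i the number of pixels at that level; the function name is hypothetical.

```python
def center_of_gravity(hist):
    """Return sum(L_i * W_i) / sum(W_i) for a histogram indexed by intensity level."""
    total_pixels = sum(hist)
    if total_pixels == 0:
        raise ValueError("histogram is empty")
    weighted_sum = sum(level * count for level, count in enumerate(hist))
    return weighted_sum / total_pixels


# Three pixels at level 10 and one pixel at level 50.
hist = [0] * 256
hist[10] = 3
hist[50] = 1
cg = center_of_gravity(hist)
print(cg)  # 20.0
```

In this example the moments about the CG balance: three pixels at distance 10 below it and one pixel at distance 30 above it.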

C. Bins Formation: Feature vector Generation
The partitions of each histogram are given the ids '0' for intensities below the CG and '1' for intensities above the CG. The bins are formed using the following sequence of steps:
1. Separate the image into planes: R, G and B.
2. Calculate the histogram of each plane and modify it using the polynomial transform given in equation 1.
3. Partition the modified histogram using the CG given in equation 2.
4. Assign ids 0 and 1 to the two parts of each plane as explained above.
5. For each pixel of the image whose feature vector is being extracted, check whether its R, G and B values fall in part 0 or part 1 of their respective histograms. For example, if the R value falls in part '0', G in part '1' and B in part '0', the flag of that pixel is set to '0 1 0'. These three values determine the address of the bin into which the pixel is pushed, i.e. here the pixel is counted in 'Bin 2'. As there are three planes and each is divided into two parts, 8 bins can be formed, i.e. Bin 000 to Bin 111.

The same process is applied to each pixel, and the address of the bin in which that pixel resides is calculated. All 16384 pixels of the image under feature extraction (image size 128 x 128) are segregated by counting them into the 8 bins with addresses 000 to 111. Applying the same process to all database images, we extract three features of each image, namely Count of pixels, Total of intensities and Average of intensities, and prepare a separate feature vector database for each type of feature vector. In total, 7 feature vector databases are prepared, each holding feature vectors of dimension 8:
Count of Pixels: one feature vector database for the 2000 database images.
Total of Intensities: R, G and B, three feature vector databases.
Average of Intensities: R, G and B, three feature vector databases.
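The bin formation steps can be sketched as follows. The sketch takes a list of (R, G, B) pixels and the three CG values, and returns the Count vector plus the per-color Total and Average vectors, i.e. seven vectors of dimension 8. Two points are assumptions not stated in the paper: a value exactly equal to the CG is assigned to part 1, and an empty bin gets an average of 0.

```python
def extract_features(pixels, cgs):
    """Form 8 bins from (R, G, B) pixels using one CG threshold per plane.

    Returns (count, total, average); total and average each hold one
    8-element vector per color, in R, G, B order.
    """
    count = [0] * 8
    total = [[0] * 8 for _ in range(3)]
    for rgb in pixels:
        # Part id per plane: 0 below the CG, 1 otherwise (tie rule is an assumption).
        flags = [1 if value >= cg else 0 for value, cg in zip(rgb, cgs)]
        bin_address = flags[0] * 4 + flags[1] * 2 + flags[2]  # e.g. '0 1 0' -> 2
        count[bin_address] += 1
        for color in range(3):
            total[color][bin_address] += rgb[color]
    average = [
        [t / n if n else 0.0 for t, n in zip(total[color], count)]
        for color in range(3)
    ]
    return count, total, average


pixels = [(10, 200, 30), (10, 220, 40), (250, 250, 250)]
count, total, average = extract_features(pixels, cgs=(100, 100, 100))
print(count)  # [0, 0, 2, 0, 0, 0, 0, 1]
```

The first two pixels have the flag '0 1 0' and land in Bin 2, matching the example in step 5; the third has the flag '1 1 1' and lands in Bin 7.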

IV. RESULTS AND DISCUSSION
Once the feature vector databases are prepared, a query can be fired at the system. The working of the system, from accepting the query to generating the retrieval result, is explained through the block diagram shown in Figure 4. Once the query image is fired, it proceeds through all the stages shown in the figure: its features are extracted and compared with the database image feature vectors by means of a similarity measure.

A. Database and Query Image
We have tested this system on 2000 BMP images from 20 different classes, each having 100 images: Flower, Sunset, Mountain, Building, Bus, Dinosaur, Elephant, Barbie, Mickey, Horse, Kingfisher, Dove, Crow, Rainbow rose, Pyramid, Plate, Car, Trees, Ship and Waterfall. We randomly selected 10 sample query images from each class, so the system is tested and verified using 200 query images.

B. Application of Similarity Measure
Once the query image enters the system, its feature vector is extracted and the distance between the query and each database image is calculated to find the similar images. To accomplish this task the system is designed to use three different similarity measures, namely Euclidean, Absolute and Cosine correlation distance, given in equations 3, 4 and 5 [23][24]. Three result sets are obtained for each query, separately for each type of feature vector.
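The three measures can be sketched as below for two feature vectors of equal length. The Euclidean and Absolute distances follow their standard definitions. The exact form of the Cosine correlation distance is not recoverable from the text here, so the sketch assumes one minus the cosine of the angle between the vectors, with a distance of 1 when either vector is all zeros; both choices are assumptions.

```python
import math


def euclidean_distance(d, q):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d, q)))


def absolute_distance(d, q):
    """Sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(d, q))


def cosine_correlation_distance(d, q):
    """Assumed form: 1 - cos(angle between d and q)."""
    dot = sum(a * b for a, b in zip(d, q))
    norm_d = math.sqrt(sum(a * a for a in d))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_d == 0 or norm_q == 0:
        return 1.0  # assumption: treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_d * norm_q)


database_vector = [3, 4]
query_vector = [0, 0]
print(euclidean_distance(database_vector, query_vector))  # 5.0
print(absolute_distance(database_vector, query_vector))   # 7
```

With all three measures a smaller value means a closer match, so the database images can be ranked in ascending order of distance.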

C. Retrieval Results and Performance Evaluation [25][26]
On application of the three similarity measures, three distance sets are obtained for the given query, and each set is sorted in ascending order from minimum to maximum distance. We then consider the first 100 images, corresponding to the 100 smallest sorted distances. Out of these 100, the images relevant to the query are taken as the retrieval result. As there are 100 images of each class in the database, precision and recall are equal at this point, so this result is termed the PRCP value, i.e. the Precision Recall Cross-over Point.
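The PRCP computation for one query can be sketched as follows. The sketch assumes that every database image carries a class label and that the class size (100 in the paper) is passed in; the function and parameter names are hypothetical, and the toy example uses a class size of 2.

```python
def prcp(distances, labels, query_label, class_size=100):
    """Fraction of relevant images among the `class_size` nearest database images.

    Because exactly `class_size` images are retrieved and the class also has
    `class_size` members, precision and recall coincide at this point.
    """
    ranked = sorted(range(len(distances)), key=lambda i: distances[i])
    top = ranked[:class_size]
    relevant = sum(1 for i in top if labels[i] == query_label)
    return relevant / class_size


distances = [0.1, 0.9, 0.2, 0.8]
labels = ["a", "b", "b", "a"]
score = prcp(distances, labels, query_label="a", class_size=2)
print(score)  # 0.5
```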

The values obtained for PRCP for the feature vector type 'Count of pixels' are shown in Table 1. We can observe that the modified histogram based bins produce very good results compared to the original histograms for all three similarity measures.
Euclidean Distance: $ED = \sqrt{\sum_{n} \left( D(n) - Q(n) \right)^2}$ (3), where D(n) and Q(n) are the database and query feature vectors respectively.
Charts 1 and 2 show the results for the other types of feature vectors, i.e. the Total and Average of R, G and B intensities in the 8 bins, separately. These charts show the total PRCP obtained for 200 query images. For all three colors and all distance measures, the modified histogram performs better than the original histogram. The average PRCP over the 200 queries lies between 0.3 and 0.4. To improve these results further, we applied the 'OR' criterion to the results obtained separately for the R, G and B colors [22][23].
Application of the OR criterion combines the results of R, G and B into a single result, which achieves a higher value of the PRCP parameter. It reaches an average of 0.5 for the 200 query images.
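The text does not spell out how the OR criterion combines the three result sets. One plausible reading, shown in the sketch below purely as an assumption, is a set union: an image counts as retrieved if it appears in the relevant results of R or G or B. The function name and the toy image ids are hypothetical.

```python
def or_criterion(relevant_r, relevant_g, relevant_b):
    """Union of the relevant image ids retrieved for the R, G and B feature vectors."""
    return set(relevant_r) | set(relevant_g) | set(relevant_b)


# Toy ids of relevant images found in the top results for each color.
combined = or_criterion({1, 2}, {2, 3}, {4})
class_size = 10  # toy value; the paper's classes hold 100 images
combined_prcp = len(combined) / class_size
print(sorted(combined))  # [1, 2, 3, 4]
```

Under this reading the combined result can never be worse than the best single color, which is consistent with the improvement reported after applying the criterion.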
Results after applying the OR criterion are shown in Chart 3 for the Total and Average of intensities. From Chart 3, obtained by applying the OR criterion over the R, G and B results of the Total and Average of intensities, we can say that the average PRCP for the 200 queries rises to between 0.35 and 0.5. One more observation is that, among the three distances CD, ED and AD, CD and AD perform better than ED in many cases.
Between the two feature vector types, i.e. Total and Average of intensities, the feature vector of 8 bins containing the 'Average of intensities' performs far better than the 'Total of intensities'.

Comparing the results of the original histogram based bins with those of the modified histogram based bins before applying the OR criterion, we found that, for both types of feature vectors (Total and Average), all three distance measures (CD, ED and AD) and all three colors R, G and B, the bins formed using the modified histogram are always better.

D. Performance Evaluation: Longest String and LSRR [21]
The Longest String parameter identifies the longest continuous string of images relevant to the query among all database images, taken in the order of their distances from the query sorted in ascending order (minimum to maximum distance). Longest String results are obtained separately for each of the 200 queries, for all types of feature vectors, for all three distance measures and for the three colors R, G and B.
Here we have taken the 'maximum longest string' out of this whole result set for each type of feature vector and distance measure, irrespective of the three colors R, G and B. The results are shown in Charts 4, 5 and 6 for 'Count' of pixels, 'Total' of intensities and 'Average' of intensities respectively.
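The Longest String parameter for one query can be sketched as a scan over the class labels of the database images in ascending order of distance. The input format (a list of labels already sorted by distance) and the function name are assumptions made for illustration.

```python
def longest_string(ranked_labels, query_label):
    """Length of the longest run of consecutive relevant images in the ranked list."""
    best = 0
    run = 0
    for label in ranked_labels:
        if label == query_label:
            run += 1
            best = max(best, run)
        else:
            run = 0  # a non-relevant image breaks the string
    return best


ranked = ["a", "a", "b", "a", "a", "a", "b"]
longest = longest_string(ranked, "a")
print(longest)  # 3
```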
In Charts 4, 5 and 6 we observe that CD and AD perform better than the ED measure. Between CD and AD, AD produces good results in most cases. Looking at the longest string values, we found 48 as the maximum longest string, for the Dinosaur and Dove classes with Count of pixels as the feature vector. Next, we found 35 as the best value, for the Dinosaur class, with Total of intensities, and for the feature vector Average of intensities the best longest string is 89, for the Barbie class. Overall, the feature vector formed using 8 bins holding the average of pixel intensities gives the best performance for all classes and all distance measures compared to the other two feature vectors.
The next parameter used for evaluating the performance of the system is LSRR, i.e. 'Length of the String to Retrieve all Relevant' images. Retrieving all images of the query class from the database means that recall has reached 1, which is the ideal value of recall for CBIR. Accordingly, we evaluate the performance of our algorithms on the basis of the length of the sorted distance list that the system must traverse to retrieve all relevant images from the database. The LSRR should be as low as possible.
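LSRR for one query can be sketched as the position of the last relevant image in the distance-sorted list, expressed as a percentage of the list length. As before, the list of labels sorted by distance and the function name are illustrative assumptions.

```python
def lsrr(ranked_labels, query_label):
    """Percentage of the ranked list traversed before recall reaches 1."""
    total_relevant = ranked_labels.count(query_label)
    if total_relevant == 0:
        raise ValueError("no relevant images in the database")
    found = 0
    for position, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            found += 1
            if found == total_relevant:
                return 100.0 * position / len(ranked_labels)


ranked = ["a", "b", "a", "b", "b", "b", "b", "b", "b", "b"]
percent = lsrr(ranked, "a")
print(percent)  # 30.0
```

A lower value means the relevant images are concentrated near the top of the ranking, which is why a low LSRR is desirable.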
The LSRR parameter is applied to check the performance of the system for all feature vector databases using the same set of 200 queries. The results are analyzed for Count of pixels, Total of intensities and Average of intensities, as shown in Charts 7, 8 and 9 respectively. Observing these charts, we can say that in all the results shown the system traverses less than 100% of the string of images to retrieve all relevant images. For the Count of pixels feature vector, for 15 of the 20 query classes we obtained an LSRR between 10% and 70%; for the Dove, Crow and Trees classes an LSRR in the range 10% to 40% is enough to make recall 1. Among these results, the best LSRR is obtained for the Barbie and Dinosaur classes, in the range 10% to 20% only. For the feature vector type 'Total of intensities', for 15 classes we obtained an LSRR in the range 10% to 70%; the Flower, Sunset, Dinosaur, Barbie and Horse classes obtained an LSRR in the range 10% to 40%. The best results for this feature vector are again for the Dinosaur and Barbie classes, in the range 10% to 15%.
For the next feature type, i.e. Average of intensities, we obtained an LSRR in the range 10% to 70% for 15 classes. The results for the Flower, Bus and Barbie classes are in the range 10% to 40%. The best result here is for the Barbie class, where traversing only 10% of the sorted distances gives 100% recall of images similar to the query, which is very good and desirable performance for any CBIR system.

V. CONCLUSION
The proposed CBIR system gives a new method of extracting features into 8 bins formed using CG based partitioning of the histogram, which is modified using the new polynomial transform introduced as 'Y = 2X - X^2'.
The system's performance was tested using 200 query images from 20 classes fired over all the feature vector databases formed using the 'Original' and 'Modified' histograms. Comparisons were made using three similarity measures, CD, ED and AD, each of which yielded a different set of results. After the detailed analysis of the results presented and discussed in Section IV, the following conclusions can be made. The new polynomial transform shifts pixels from lower intensities to the higher side and gives a modified histogram that generates an enhanced image. After this the image details can be seen clearly and used effectively for feature extraction. Modified histogram based feature vectors have produced better results than the original histogram based feature vectors. The CG can be used effectively to partition the histograms into two parts, which produces the 8 bins used to form the feature vectors.
Three types of features are extracted into 8 bins by representing the R, G and B intensities in three different forms: Count of pixels, Total of intensities (R, G and B separately) and Average of intensities (R, G and B separately), for both the Original and the Modified histogram. Among these, the feature vector types Count of pixels and Average of intensities produced good retrieval compared to Total of intensities.
Among the three similarity measures, we found that CD and AD produce very good results compared to ED. We observed in the analysis of results that images of the query class that appear at a larger distance using the ED measure appear at a smaller distance using the CD measure.
The results are evaluated using the parameters PRCP, Longest String and LSRR. PRCP delineates the performance point at which the system generates results such that precision and recall are at the same level. For the conventional parameters precision and recall, values of both close to 1 indicate an ideal CBIR system. For the average of 200 queries, after applying the OR criterion, PRCP reaches 0.5 for CD with the Modified histogram, which is high compared to the original histogram result.
The best LSRR value among all results is 10%, for the Barbie class, for both the modified and the original histogram with the feature vector Average of intensities. For 'Longest String' the best result is 89, for the Barbie class. Observing the PRCP, LSRR and Longest String results, we can say that each feature vector with a different similarity measure gives a different set of results, with different image classes giving the best result each time. This indicates that, by varying the representation of the image contents and varying the comparison process through the choice of distance measure, the proposed system generates positive variations in the results, giving good results for different categories with each change of feature vector, similarity measure or contents of the Modified histogram.
The overall performance of the system was compared with respect to all evaluation parameters for the original and modified histograms. We conclude that the histogram modified using the polynomial transform function gives better results in all cases compared to the original histogram.

Figure 2. Kingfisher Image: R, G, B Original and Modified Histograms

7. Feature vector: 'Count of pixels', 'Total of intensities' and 'Average of intensities'. The set of 8 bins holding the count of pixels for one image is considered as a feature vector of dimension 8 representing that image. These bins are further used to calculate the sum (Total) and the Average of the Red, Green and Blue intensities of the pixels counted in each of the 8 bins. The 8 bins of the red, green and blue colors are maintained separately for each image, and these are considered as the feature vectors of the image.

Figure 4. Block Diagram of the proposed system ready to accept Query and Produce the Retrieval Result. Block labels: 'Average': R, G and B; Application of Similarity Measures: Euclidean Distance (ED), Cosine Correlation Distance (CD), Absolute Distance (AD); Retrieval Results: Count, Total and Average R, G and B; Apply OR Criterion to separate results of R, G, B for Total and Average feature vectors.

Chart 1. Total PRCP for 200 queries fired on database of Total of Intensities feature vector for R, G and B colors with CD, ED and AD. Histogram gives better performance in 7 out of 9.
Chart 2. Total PRCP for 200 queries fired on database of Average of Intensities feature vector for R, G and B colors with CD, ED and AD. Histogram gives better performance in all cases.
Chart 3. Total PRCP after application of 'OR' Criterion over PRCP results obtained for R, G, B Colors with CD, ED and AD. Histogram gives better performance in 2 cases.

Chart 4. Longest String for Count of Pixels With CD, ED, AD for MOD and ORG Histograms. Remark: Modified Histogram gives better performance in 16 out of 20.

TABLE I
Remark: Modified Histogram gives better performance in all cases with similarity measures CD (14), ED (12) and AD (13) Out of 20.

Remark: Modified Histogram gives better performance in 12 out of 20.
Chart 5. Longest String Total Of Intensities With CD, ED, AD for MOD and ORG Histograms
Chart 6. Longest String Average of intensities With CD, ED, AD for MOD and ORG Histograms

Remark: Modified Histogram gives better performance in 15 out of 20.
Chart 7. % LSRR for Count of Pixels With CD, ED, AD for MOD and ORG Histograms

Chart 8. % LSRR for Total of intensities With CD, ED, AD for MOD and ORG Histograms

Remark: Modified Histogram gives better performance in 10 out of 20.
Chart 9. % LSRR for Average of intensities With CD, ED, AD for MOD and ORG Histograms