High Performance Color Image Processing in Multicore CPU using MFC Multithreading

Image processing is an engineering field in which stored image data is readily available for parallel processing. Data processing algorithms developed with a sequential approach cannot harness the computing power of the individual cores in a single-chip multicore processor. To utilize the multicore processor efficiently on the Windows platform for color image processing applications, a lock-free multithreading approach was developed using Visual C++ with Microsoft Foundation Class (MFC) support. This approach distributes the image data processing task across a multicore Central Processing Unit (CPU) without using a parallel programming framework such as Open Multi-Processing (OpenMP), and reduces the algorithm execution time. In image processing, each pixel is processed using the same set of high-level instructions, which is time consuming. Therefore, to increase the processing speed of the algorithm on a multicore CPU, the entire image data is partitioned into equal blocks and a copy of the algorithm is applied to each block using a separate worker thread. In this paper, two multithreaded color image processing algorithms, namely contrast enhancement using a fuzzy technique and edge detection, were implemented. Both algorithms were tested on an Intel Core i5 quad-core processor with ten different images of varying pixel size, and their performance results are presented. A maximum computing performance improvement of 71% and a speedup of about 3.4 times over the sequential approach were obtained for large images using the four-thread model.
Keywords: Color image; fuzzy contrast intensification; edge detection; lock-free multithreading; MFC thread; block-data; multicore programming


I. INTRODUCTION
Machine vision systems used in various industrial applications are capable of capturing high resolution images and demand time-efficient parallel data processing algorithms in real-time environments. To reduce the processing time of an algorithm on these images, parallel computing on a multicore architecture is a well-known approach [1]. Parallel programming libraries such as OpenMP and the Message Passing Interface (MPI) are widely applied in the development of parallel image processing algorithms. N.E.A. Khalid et al. [2] implemented a parallel multicore Sobel edge detection algorithm using MPI and observed that parallel processing performs better than sequential processing in terms of execution speed. Chen Lin et al. [3] proposed a parallel method to perform medical image registration using OpenMP and concluded that the multithreading approach saves nearly half of the computing time. Alda Kika and Silvana Greca illustrated the development of multithreaded algorithms for contrast, brightness and steganography applications using a Java package and tested their performance on different single-core and multicore processors [4]. They concluded that the performance of complex image processing algorithms on a multicore CPU can be improved using multithreaded programming.
In our work, we studied the development of multithreaded C++ algorithms for processing low and high resolution color images on a multicore CPU without using a parallel programming library or any additional hardware. To ensure fine-grain (data-level) parallelism [5] and computational load balance of the algorithm on a multicore CPU, a lock-free multithreaded block-data parallel approach is proposed. In this approach, the image data is shared equally among the worker threads and each thread manipulates its own portion of the data.
In VC++ programming, the MFC library provides powerful threading Application Programming Interfaces (APIs) [6] for developing concurrent or multithreaded Windows-based software. Multithreaded color image processing algorithms, namely contrast enhancement using a fuzzy technique and edge detection, were developed on an Intel Pentium dual-core personal computer and tested on an Intel Core i5 CPU. The algorithms were applied to ten selected color image samples of varying size and their execution results are presented. The performance results show that both algorithms in the four-thread model attained a speedup of about 3.4 times compared with the sequential approach and saved nearly 71% of the algorithm execution time.
The paper is arranged as follows. Section II describes MFC multithreading and its application in high performance image processing. Section III explains the materials, methods and color image processing techniques followed in this paper. The performance results of the thread model based parallel algorithms are discussed in Section IV. The conclusion is given in Section V.

II. MULTITHREADED IMAGE PROCESSING USING MFC
MFC is Microsoft's C++ class library for Windows programming. It distinguishes two types of threads, namely user interface threads and worker threads [7]. A worker thread is mainly used to perform background computation, and it is created by defining the task it should perform. This is done by declaring a thread function according to the MFC definition. The function AfxBeginThread() launches the worker thread [8]; it accepts parameters that include the thread function name, the input to the thread, the thread priority and a few other required parameters.
In block-data parallel processing, the image region is treated as several blocks of data. The source image data is partitioned vertically or horizontally into multiple large blocks of equal size [9,10]. In our thread model based parallel approach, each thread exclusively performs the image processing task on an individual image data block, as shown in the concurrency model of Fig. 1. To maintain load balance among threads, the number of image blocks should equal the number of worker threads [11]. Since the image data is stored and accessed through global variables, no message passing or explicit data access control is required between threads. This keeps the thread definition simple, with no data locking mechanism [12]. In this lock-free multithreaded approach, threads are free to read and process their portion of the image data in parallel, which reduces the data access time as well as the overall computation time of the image processing algorithm. Thus the performance of a multithreaded algorithm on a single-chip multicore processor can be fine-tuned using shared image data variables [13]. For the color image processing algorithms, three input and three output global variables were assigned, one per color component (red, green and blue), so that multiple worker threads can read the image and write the processed data concurrently.
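The equal-size horizontal partitioning described above can be sketched as follows. This is a minimal illustration in portable C++, not the authors' code: the names `RowBlock` and `partitionRows` are hypothetical, and the handling of heights that do not divide evenly (spreading the remainder rows over the first blocks) is an assumption, since the paper only states that blocks are of equal size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open row range [begin, end) owned by exactly one worker thread.
// (Hypothetical helper type, not taken from the paper.)
struct RowBlock {
    std::size_t begin;
    std::size_t end;
};

// Split `height` image rows into `numBlocks` horizontal strips whose sizes
// differ by at most one row, so each thread receives nearly the same load.
std::vector<RowBlock> partitionRows(std::size_t height, std::size_t numBlocks) {
    assert(numBlocks > 0);
    std::vector<RowBlock> blocks;
    blocks.reserve(numBlocks);
    const std::size_t base = height / numBlocks;   // rows every block receives
    const std::size_t extra = height % numBlocks;  // first `extra` blocks get one more row
    std::size_t start = 0;
    for (std::size_t i = 0; i < numBlocks; ++i) {
        const std::size_t rows = base + (i < extra ? 1 : 0);
        blocks.push_back({start, start + rows});
        start += rows;
    }
    return blocks;
}
```

For a 1920x1200 image and four threads, `partitionRows(1200, 4)` yields four strips of 300 rows each. Because the strips are disjoint, each thread can write its output rows without any locking.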
According to the worker thread priority, the operating system schedules each thread onto an individual processing unit of the multicore CPU. Because of this scheduling mechanism, the threads do not all finish at the same time. To handle thread completion, an event object is derived from the MFC CEvent class. When a thread completes its processing task, its event object is triggered. Using the WaitForSingleObject API, the event trigger is detected and worker thread completion is indicated to the primary (main) thread [6,14,15]. As soon as all the threads complete their processing tasks, the results are cached and made available in the shared global variables. The synchronization between the main thread and the worker threads was established using event objects, as shown in Fig. 2.
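The completion signaling described above can be illustrated with a portable sketch. It assumes standard C++ threads in place of MFC: the hypothetical `CompletionEvent` class plays the role of a CEvent object, `set()` corresponds to SetEvent, and `wait()` corresponds to WaitForSingleObject with an infinite timeout. Only the completion signal uses a mutex here; the image data itself would still be accessed without locks.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Portable stand-in for a manual-reset event (hypothetical class).
class CompletionEvent {
public:
    void set() {  // worker side: signal that the task is finished
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_all();
    }
    void wait() {  // main-thread side: block until the event is signaled
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return signaled_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Launch `numThreads` workers. Each one writes only its own slot of `done`
// (a placeholder for processing its image block) and then signals its event.
// The main thread waits on every event before reading the results.
std::size_t runAndWait(std::size_t numThreads) {
    assert(numThreads > 0);
    std::vector<CompletionEvent> events(numThreads);
    std::vector<int> done(numThreads, 0);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < numThreads; ++i) {
        workers.emplace_back([&events, &done, i] {
            done[i] = 1;      // stand-in for the per-block image processing
            events[i].set();  // notify the main thread of completion
        });
    }
    for (auto& e : events) e.wait();  // one wait per worker event
    std::size_t finished = 0;
    for (int d : done) finished += static_cast<std::size_t>(d);
    for (auto& w : workers) w.join();
    return finished;
}
```

Calling `runAndWait(4)` returns once all four workers have signaled, mirroring the main thread in Fig. 2 waiting on each worker's event.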
In MFC multithreading, two threads cannot manipulate the same object, because MFC objects are thread-safe only at the class level [16]. Hence each thread requires a separate object of the same data structure to operate in a thread-safe manner. To ensure thread safety in the algorithm, a separate copy of the thread parameter data structure is passed as the input argument to each worker thread function. Each thread function accesses the global variables by reference. When a thread calls an image processing function, the private variables declared within the function store and process the intermediate data, which allows the algorithm to execute in parallel.

III. MATERIALS AND METHODS
A. Sample Images
A total of ten color images with different pixel sizes were used to evaluate the algorithm performance. All the images were randomly chosen from a free online collection of natural scenes and photos. The pixel size of the images varies from 940x474 to 2880x1800. They are labeled Image1, Image2, ..., Image10.

B. Hardware and Software
The multithreaded application software was coded on an Intel Pentium dual-core processor @ 2.8 GHz running the Windows XP operating system with Microsoft Visual Studio version 6. Using the MFC library, multithreaded image processing software for contrast enhancement using a fuzzy technique and for edge detection was developed in VC++. Separate menu buttons are provided in the software to load an image and execute the algorithms. The developed algorithms were tested using the color image samples on an Intel Core i5-760 @ 2.80 GHz quad-core CPU running the 32-bit Windows 7 operating system with 4 GB RAM.

C. Color Image Processing Algorithms
a) Contrast enhancement using fuzzy technique
Image enhancement is a preprocessing technique usually employed to improve the brightness and contrast of images. In color image enhancement, the red, green and blue channels can be processed separately and combined to produce a composite color value, but this approach does not maintain the color balance of the image. To avoid this change in color information, the YIQ color space was chosen, where Y represents the luminance information and I and Q together represent the chrominance information. This color space exploits certain characteristics of the human eye's color response and improves the appearance of the color image in terms of human brightness perception. In this technique, contrast enhancement using the fuzzy intensification operator was applied only to the luminance component; hence the color information of the original image is preserved [17].
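The luminance-only enhancement described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The RGB/YIQ coefficients are the commonly quoted rounded NTSC values, so the round trip is only approximate. The fuzzification, intensification and defuzzification formulas follow the widely used Pal-King form, which is assumed here to be what the paper uses. All function and type names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Rgb { double r, g, b; };
struct Yiq { double y, i, q; };

// RGB -> YIQ with rounded NTSC coefficients (assumption).
Yiq rgbToYiq(const Rgb& c) {
    return {0.299 * c.r + 0.587 * c.g + 0.114 * c.b,
            0.596 * c.r - 0.274 * c.g - 0.322 * c.b,
            0.211 * c.r - 0.523 * c.g + 0.312 * c.b};
}

// YIQ -> RGB, approximate inverse of the matrix above.
Rgb yiqToRgb(const Yiq& c) {
    return {c.y + 0.956 * c.i + 0.621 * c.q,
            c.y - 0.272 * c.i - 0.647 * c.q,
            c.y - 1.106 * c.i + 1.703 * c.q};
}

// Choose the denominational fuzzifier Fd so that the membership value
// equals 0.5 at the chosen cross-over luminance.
double denominationalFuzzifier(double yMax, double crossover, double fe) {
    assert(yMax > crossover && fe > 0.0);
    return (yMax - crossover) / (std::pow(2.0, 1.0 / fe) - 1.0);
}

// Fuzzify, intensify and defuzzify one luminance value in [0, yMax].
double enhanceLuminance(double y, double yMax, double fd, double fe) {
    const double mu = std::pow(1.0 + (yMax - y) / fd, -fe);  // fuzzification
    const double muInt = (mu <= 0.5)                         // intensification
                             ? 2.0 * mu * mu
                             : 1.0 - 2.0 * (1.0 - mu) * (1.0 - mu);
    const double out = yMax - fd * (std::pow(muInt, -1.0 / fe) - 1.0);  // defuzzification
    return std::min(std::max(out, 0.0), yMax);  // clamp to the valid range
}

// Enhance only the luminance of one pixel; I and Q pass through unchanged.
Rgb enhancePixel(const Rgb& in, double yMax, double fd, double fe) {
    Yiq c = rgbToYiq(in);
    c.y = enhanceLuminance(c.y, yMax, fd, fe);
    return yiqToRgb(c);
}
```

With a cross-over of 128 on a 0 to 255 scale, luminance values above 128 are pushed brighter and values below it darker, while a value exactly at the cross-over is left unchanged. That stretching around the cross-over is the contrast intensification effect.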
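Steps involved in image contrast enhancement:
1) Convert the RGB image into the YIQ color space [18].
2) Perform fuzzification [19] on the luminance component $Y$ using the following expression:
$$\mu_{mn} = \left[1 + \frac{Y_{\max} - Y_{mn}}{F_d}\right]^{-F_e} \qquad (1)$$
where $F_e$ and $F_d$ denote the exponential and the denominational fuzzifier, respectively, $Y_{\max}$ is the maximum luminance value, and $\mu_{mn}$ is called the fuzzy property plane of the image. The value of $F_e$ can be set to 1 or 2. The value of $F_d$ is determined from the cross-over value, i.e. the luminance at which the fuzziness value $\mu_{mn}$ equals 0.5.
3) Apply the fuzzy intensification operator to the fuzzy property plane:
$$\mu'_{mn} = \begin{cases} 2\,\mu_{mn}^{2}, & 0 \le \mu_{mn} \le 0.5 \\ 1 - 2\,(1 - \mu_{mn})^{2}, & 0.5 < \mu_{mn} \le 1 \end{cases} \qquad (2)$$
4) The enhanced luminance component $Y'$ is obtained using defuzzification, defined as follows:
$$Y'_{mn} = Y_{\max} - F_d\left[(\mu'_{mn})^{-1/F_e} - 1\right] \qquad (3)$$
5) Convert the YIQ color space back to an RGB image.
6) The contrast-enhanced color image is obtained at the end.
b) Edge detection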
Edges in color images can be obtained by applying a gray scale edge detection method to each of the RGB bands separately and then summing the results to produce a composite value [21]. Thresholding is then performed to obtain fine binary edges. The method uses a set of four Robinson compass masks.
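A per-band compass edge detector of this kind can be sketched as below. Several details are assumptions, since the paper's mask values are not reproduced in this text. The four masks are the 0, 45, 90 and 135 degree members of one commonly cited Robinson family. The per-band response is taken as the maximum absolute mask response. Border pixels are left as non-edges. The names `kMasks`, `compassResponse` and `detectEdges` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

using Channel = std::vector<int>;  // one color band, row-major, width * height values

// Four compass masks (assumed coefficients). The other four directions are
// their negations, which the absolute value below covers.
static const int kMasks[4][3][3] = {
    {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}},    //   0 degrees
    {{0, 1, 2}, {-1, 0, 1}, {-2, -1, 0}},    //  45 degrees
    {{1, 2, 1}, {0, 0, 0}, {-1, -2, -1}},    //  90 degrees
    {{2, 1, 0}, {1, 0, -1}, {0, -1, -2}}};   // 135 degrees

// Maximum absolute mask response at interior pixel (x, y) of one band.
int compassResponse(const Channel& ch, int width, int x, int y) {
    int best = 0;
    for (const auto& mask : kMasks) {
        int sum = 0;
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx)
                sum += mask[dy + 1][dx + 1] *
                       ch[static_cast<std::size_t>((y + dy) * width + (x + dx))];
        best = std::max(best, std::abs(sum));
    }
    return best;
}

// Sum the R, G and B responses and threshold them into a binary edge map
// (255 = edge, 0 = background).
std::vector<unsigned char> detectEdges(const Channel& r, const Channel& g,
                                       const Channel& b, int width, int height,
                                       int threshold) {
    assert(width >= 3 && height >= 3);
    const std::size_t n = static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    assert(r.size() == n && g.size() == n && b.size() == n);
    std::vector<unsigned char> edges(n, 0);
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            const int total = compassResponse(r, width, x, y) +
                              compassResponse(g, width, x, y) +
                              compassResponse(b, width, x, y);
            if (total > threshold)
                edges[static_cast<std::size_t>(y * width + x)] = 255;
        }
    }
    return edges;
}
```

Each output pixel depends only on a 3x3 input neighborhood and is written exactly once, so the row loop can be split into disjoint blocks for the worker threads without locking.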

D. Multithreaded Block-data Parallel Approach Steps
1) Image block-data decomposition: the main thread splits the image data into several blocks of equal size to maintain load balance [22].
2) Multiple worker threads are created in the main thread, and the size parameter of each block is passed as input to the corresponding worker thread.
3) The created worker threads are started with a high priority level to avoid delays due to operating system scheduling.
4) Each worker thread applies its copy of the sequential image processing algorithm to a particular portion of the image data.
5) Each worker thread uses its own private copy of the data structure for execution.
6) The processed image data is stored in the output variables.
7) As shown in Fig. 3, the main thread exits only when all worker threads have completed their assigned tasks.
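The steps above can be sketched end to end in portable C++. This is a minimal illustration under several assumptions, not the MFC implementation: `std::thread` stands in for AfxBeginThread, joining the threads stands in for the event-based wait, and thread priority (step 3) is omitted because standard C++ offers no portable way to set it. For brevity the blocks are ranges of pixel indices in a single channel, and `processPixel` is a hypothetical placeholder (a simple inversion) for the real per-pixel algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Per-thread parameter block; every worker receives its own copy (step 5).
struct ThreadParam {
    const std::vector<unsigned char>* input;  // shared, read-only source data
    std::vector<unsigned char>* output;       // shared destination, disjoint writes
    std::size_t begin;                        // first pixel index of this block
    std::size_t end;                          // one past the last pixel index
};

// Hypothetical stand-in for the sequential per-pixel algorithm.
inline unsigned char processPixel(unsigned char v) {
    return static_cast<unsigned char>(255 - v);
}

// Worker thread function: processes only its own block (step 4) and writes
// the result to the shared output without locks (step 6).
void workerThread(ThreadParam p) {
    for (std::size_t i = p.begin; i < p.end; ++i)
        (*p.output)[i] = processPixel((*p.input)[i]);
}

// Main-thread side: decompose the data (step 1), create the workers (step 2)
// and return only after all of them have finished (step 7).
std::vector<unsigned char> processParallel(const std::vector<unsigned char>& input,
                                           std::size_t numThreads) {
    assert(numThreads > 0);
    std::vector<unsigned char> output(input.size());
    std::vector<std::thread> workers;
    const std::size_t n = input.size();
    for (std::size_t t = 0; t < numThreads; ++t) {
        ThreadParam p{&input, &output, n * t / numThreads, n * (t + 1) / numThreads};
        workers.emplace_back(workerThread, p);  // the parameter block is copied per thread
    }
    for (auto& w : workers) w.join();
    return output;
}
```

Since no two blocks overlap, the four-thread result is identical to the single-thread result. This is the property the paper relies on when it reports that sequential and multithreaded outputs look the same.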

IV. RESULTS AND DISCUSSION
To measure the execution time of the parallel algorithms while processing the given image data in different threads, the VC++ clock function was used. Using the execution time, the speedup and performance improvement (P.I) parameters of the algorithms were calculated. The speedup parameter measures how much faster a parallel algorithm is than the corresponding sequential approach [2]. The P.I gives the relative improvement of the parallel implementation over the sequential approach. The equations for computing the two parameter values are given below.
$$\text{Speedup} = \frac{T_{seq}}{T_{par}} \qquad (4)$$
$$\text{P.I}\ (\%) = \frac{T_{seq} - T_{par}}{T_{seq}} \times 100 \qquad (5)$$
where $T_{seq}$ and $T_{par}$ are the execution times of the sequential and multithreaded implementations, respectively.
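The two metrics can be computed with a few lines of code. This sketch assumes the usual definitions (speedup as the ratio of sequential to parallel time, P.I as the percentage of time saved), which agree with the paired values reported in the results. The function names are hypothetical. It uses `std::chrono::steady_clock` rather than `clock()`: the Microsoft C runtime's `clock()` reports elapsed wall-clock time, whereas on POSIX systems `clock()` reports CPU time summed over all threads, which would hide any parallel speedup.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Wall-clock time, in milliseconds, taken by one call of `work`.
template <typename Fn>
double elapsedMs(Fn&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Mean of repeated timing runs (the paper averages five runs per image).
double meanTime(const std::vector<double>& runs) {
    assert(!runs.empty());
    double sum = 0.0;
    for (double t : runs) sum += t;
    return sum / static_cast<double>(runs.size());
}

// Speedup: how many times faster the parallel version is.
double speedup(double tSeq, double tPar) {
    assert(tPar > 0.0);
    return tSeq / tPar;
}

// Performance improvement: percentage of sequential time saved.
double performanceImprovement(double tSeq, double tPar) {
    assert(tSeq > 0.0);
    return (tSeq - tPar) / tSeq * 100.0;
}
```

As a worked example with illustrative times (not measured data), a sequential time of 343 ms and a parallel time of 100 ms give a speedup of 3.43 and a P.I of about 70.85%.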

A. Results of Multithreaded Contrast Enhancement Algorithm
The developed contrast enhancement algorithm was applied to all ten sample images and the test results were evaluated. The algorithm execution time for each image in the sequential and multithreaded approaches (2, 4 and 8 threads) was recorded in a data file. To determine the average execution time for each approach, the algorithm was executed five times in succession on each image and the mean time was calculated. The speedup and performance improvement of the four-thread approach over the sequential approach were computed using Eq. 4 and Eq. 5 for images of different pixel sizes. The algorithm results, namely average execution time, speedup and performance improvement, are shown in Table I.

To find the maximum number of threads that still speeds up algorithm execution on the Intel Core i5 processor, an eight-thread approach was also attempted, and its execution time results are included in Table I. Table I shows that the execution times for four and eight threads are nearly the same, which implies that on a quad-core processor four MFC threads are sufficient to achieve the optimum execution time.
As seen from the tabulated results, the average execution time of the algorithm decreases as the number of threads increases, whereas the speedup and performance improvement increase with image size. In the four-thread implementation, the speedup varies from 2.53 to 3.43 times and the performance improvement lies between 60.47% and 70.89%.
The input image and the processed color image outputs of the contrast enhancement algorithm are shown in Fig. 4a, Fig. 4b and Fig. 4c. The processed results of the sequential and multithreaded approaches look similar.

B. Results of Multithreaded Edge Detection Algorithm
The approach followed for the contrast enhancement algorithm was also applied to the edge detection algorithm on all ten color images, and the results are presented in Table III. In the four-thread approach, the speedup varies from 2.82 to 3.44 times and the performance improvement achieved is between 64.50% and 70.94%. The input image and the processed color image outputs of the edge detection algorithm are shown in Fig. 5a, Fig. 5b and Fig. 5c. The processed image outputs of the algorithm are found to be similar. Thus the two multithreaded color image processing algorithms, with different complexity levels, were tested on the Intel Core i5 processor, and the four-thread approach was found to utilize the quad-core CPU efficiently on the Windows 7 platform.
V. CONCLUSION
This work was carried out to explore the parallel processing ability of the multicore CPU in processing high resolution images using MFC multithreading. In this paper, color image processing algorithms for fuzzy contrast enhancement and edge detection, based on a lock-free multithreaded block-data parallel approach, were developed using VC++ on the Windows platform without using any parallel programming library. The purpose of this implementation is to improve the performance and reduce the execution time of image processing algorithms on a multicore processor by partitioning the given image into equal blocks and processing each block of data in parallel. In the four-thread approach, the algorithms were found to be about 3.4 times faster than the sequential approach. With regard to performance improvement, the thread model saves nearly 71% of the computation time compared to the sequential implementation. No appreciable change in performance improvement or speedup was noted between images that differ only marginally in pixel size. The performance results indicate that multithreaded image processing algorithms efficiently utilize the computing capability of a multicore CPU such as the Intel Core i5 processor. Hence the developed multicore programming approach using MFC threads can be applied to improve the performance of various color image processing algorithms.

TABLE II. Individual Thread Execution Time for Four Thread Contrast Enhancement Algorithm (Image Size: 1920x1200)

TABLE III. Performance Results of Color Edge Detection Algorithm