Object Detection using Template and HOG Feature Matching

Computer vision applications are growing rapidly. Computer vision concerns the automatic recognition, exploration and extraction of the necessary information from a single image or a set of images. This paper addresses the problem of detecting a desired object in an image. Usually, a template of the desired object is located in the image through a matching technique called Template Matching. However, this technique works well only when the template is cropped from the test image itself, a condition that often fails because of the various transformations present in test images. To cope with this difficulty and to develop a more general approach, we investigate in detail another technique, the HOG (Histogram of Oriented Gradients) approach. In HOG, the image is divided into overlapping blocks of template size, and each block's normalized HOG is compared with the normalized HOG of the template to find the best match for the object. We perform experiments on a large number of images and find satisfactory performance.
Keywords: Computer vision; template matching; HOG; feature extraction


I. INTRODUCTION
To reduce human effort and to speed up tasks with greater precision, we need to train machines to perform specific tasks without human intervention. The human eye can easily identify an object in an image; however, it is difficult for a machine to recognize objects in an image automatically. For a machine to recognize an object on its own, it must be equipped with an efficient object detection algorithm [1]. Therefore, the main objective of this research is to find a better algorithm for machines to recognize objects in a scene. We focus on object detection using template-based as well as HOG (Histogram of Oriented Gradients) feature-based techniques. Computer vision-based technologies are needed to solve the problems of object matching and recognition in the field of image processing and analysis [2]. For any vision-based image processing application, object detection is an integral part [3]. An efficient object detection technique makes it possible to determine the presence of a desired object in an arbitrary scene, regardless of the object's scale and rotation, changes in camera orientation, and changes in illumination. Template matching is one approach that has attracted great interest in computer vision. Another widely used approach to object detection is HOG, in which extracted features are matched.
Over the past few years, researchers have developed new and widely used techniques for identifying and tracking objects, among them general Template Matching and HOG. This study focuses on the correct detection of desired objects using template-based methods. It uses the HOG technique to find the desired object in a test cluster image using a patch of a template image. We apply the HOG-based object detection method to images, observe the results, and identify the advantages and limitations of the technique. We also present a comparison of simple Template Matching and the HOG-based object detection method.
The rest of this paper is organized as follows. Section II reviews related work on object detection techniques. Section III briefly describes the HOG feature-based and Template Matching methods, together with their flow diagrams and corresponding algorithms. Section IV presents the experimental results and an analysis of performance based on several criteria. Section V presents concluding remarks.

II. LITERATURE REVIEW
Al-Mamun and Yousef investigated different methods for image segmentation and useful feature extraction. They also proposed a model for identifying flexible desired objects with asymmetrical shapes in an image [4]. Two feature detection techniques, Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF), were used for image registration [5,6]. SIFT could detect a larger number of features, but its speed was not remarkable, whereas SURF showed better results in both speed and performance [5]. A variant of the conventional HOG feature named Edge-HOG was proposed by Ren and Li to detect pedestrians and cars [7]. Experimental results indicated that Edge-HOG was twice as fast as conventional HOG. Chetan in [8] added two external features, the shape and color of the object, to detect an object swiftly and with a comparatively accurate detection rate. The results indicated that the performance of the proposed method was comparable to that of other methods. In [9], researchers compared four widely used feature detection techniques, namely SURF, Harris, FAST (Features from Accelerated Segment Test) and FREAK (Fast Retina Keypoint), in terms of accuracy and run time. The paper concluded that the FREAK algorithm outperformed the other algorithms in detection accuracy and time complexity. In [10], multiple instances of the same object were detected from a single template image using a new approach based on the SIFT method; the scale- and rotation-invariant SURF method was also used to extract the rotation and scaling of the features. A method was proposed in [11] to detect moving objects in a video. In that paper, background subtraction was used for object detection, and two methods, thresholding and edge detection, were used for segmentation. Based on the Peak Signal to Noise Ratio (PSNR) value, the experiments showed that background subtraction performed better than thresholding. In [12], Nazil Perveen described various types of template matching techniques and emphasized their applications. In [13], a Haar Cascade Classifier was used to identify human heads by treating the head image as an object.
Most of the above papers used a feature-based object detection technique to detect a specific type of object, such as a human, car or weapon. This paper offers generalized techniques to identify the desired object in a source image. Moreover, none of the above works compared prominent approaches such as simple Template Matching and the HOG feature-based method in terms of object detection performance. This paper includes a comparative discussion of HOG and Template Matching to present the drawbacks and advantages of these techniques.

A. HOG Feature-based Method
HOG is a feature descriptor-based approach that focuses on the extraction of features. A feature descriptor keeps only the useful information in an image and discards the rest. In the HOG method, the whole image is partitioned into small blocks and the feature descriptor is computed for each block [14]. After the useful features of each block are extracted, the blocks are grouped together and normalized to obtain contrast-normalized features.
The HOG feature extraction process is as follows. First, color images are converted to grayscale, which discards the color information. Next, the luminance gradient is calculated at each pixel. Then, a gradient orientation histogram is generated for each block, which yields features that are robust to changes in shape. Finally, the features are normalized for each block.
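The four steps above can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the function names, the 9 orientation bins, the 8-pixel cell size, the luminance weights and the per-cell L2 normalization (in place of grouping cells into larger blocks) are all assumptions made for illustration.

```python
import math

def to_gray(rgb):
    """Step 1: convert rows of (r, g, b) tuples to luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def cell_histogram(gray, top, left, cell=8, bins=9):
    """Steps 2-3: gradient per pixel, then an orientation histogram for one cell."""
    h, w = len(gray), len(gray[0])
    hist = [0.0] * bins
    for y in range(top, top + cell):
        for x in range(left, left + cell):
            # Central differences, clamped at the image border.
            gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
            gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle * bins / 180.0) % bins] += magnitude
    return hist

def hog_descriptor(gray, cell=8, bins=9, eps=1e-6):
    """Step 4: normalize each cell histogram and concatenate the results."""
    h, w = len(gray), len(gray[0])
    features = []
    for top in range(0, h - cell + 1, cell):
        for left in range(0, w - cell + 1, cell):
            hist = cell_histogram(gray, top, left, cell, bins)
            norm = math.sqrt(sum(v * v for v in hist)) + eps
            features.extend(v / norm for v in hist)
    return features

# Usage: a 16x16 horizontal intensity ramp gives 2x2 cells of 9 bins each.
ramp = [[float(x) for x in range(16)] for _ in range(16)]
descriptor = hog_descriptor(ramp)
```

Because the ramp varies only along the horizontal axis, all gradient energy in each cell falls into the first orientation bin, which makes the sketch easy to check by hand.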
The stages of HOG feature extraction and overview of HOG method are depicted in Fig. 1 and Fig. 2, respectively.
The gradient image for a HOG descriptor can be computed in several color representations, such as RGB (Red, Green, Blue) and LAB (where L denotes lightness and A, B represent the color-opponent dimensions), optionally with gamma correction. As most cameras are RGB cameras, we take the input image to be an RGB image, which is later converted to grayscale for processing.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 7, 2020, www.ijacsa.thesai.org

B. Simple Template Matching Method
Template matching is a popular computer vision technique used to find a desired object in an input image. The method uses an image patch (the template) that captures a specific feature of the search image which we need to identify. Template matching uses a matching criterion to determine the position of an object and calculates a correlation coefficient. It measures the similarity between the template image and the overlapped portion of the original image [15]. This similarity measure is known as cross-correlation. The cross-correlation value is greatest at locations where the input image matches the template patch or mask image [16].
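The idea can be sketched in Python as a sliding-window search that scores each position with a normalized cross-correlation coefficient. This is an illustrative sketch only: the function names `ncc` and `match_template` are hypothetical, and the mean-subtracted, normalized form of the coefficient is an assumption, since the text does not specify which variant of cross-correlation is used.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation coefficient of two equal-size 2-D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # A constant patch has zero variance; treat it as "no correlation".
    return num / den if den > 0 else 0.0

def match_template(image, template):
    """Slide the template over the image; return the best (top, left) and score."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, (0, 0)
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            patch = [row[left:left + tw] for row in image[top:top + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (top, left)
    return best_pos, best_score

# Usage: hide a 2x2 template inside a 6x6 image of zeros and search for it.
template = [[1, 2], [3, 4]]
image = [[0] * 6 for _ in range(6)]
for dy in range(2):
    for dx in range(2):
        image[2 + dy][3 + dx] = template[dy][dx]
position, score = match_template(image, template)
```

The coefficient reaches its maximum of 1 only where the window is an exact (or brightness-scaled) copy of the template, which is consistent with the observation that pixel-based matching works best when the template is cropped from the test image.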

A. Experimental Tool
The experiments are performed in the MATLAB environment on a number of images.

B. Object Detection using HOG based Method
The experiment is applied to a large number of color and grayscale images; this paper presents results for only some of them. The main goal of the experiment is to identify the correct location of the template object in the test image. To detect an object in an image, we use a template image. For illustration, we use a total of six template images, shown in Fig. 4. A patch (a group of pixels) is taken from a template for detection. The program is run with different template patch sizes (128×128, 64×64, 32×32, etc.) and the results for each patch size are observed. We also tried the full template image, but it did not give the desired results.
Fig. 4(a) to (f) show the template images for the car, key, elephant, medicine box, staple remover and book, respectively. Fig. 5(a) to (f) show the detection results for the corresponding template images of Fig. 4(a) to (f). The car, key, elephant and medicine box are detected correctly, but the staple remover and book are detected incorrectly.
The results of detection using HOG features are:
True Positive = 4 (desired objects located correctly)
False Positive = 2 (template objects detected incorrectly)
From Table I, it is observed that a patch size of 128×128 is suitable for the correct detection of objects for all the template images. The 128×128 template patch is divided into 16×16 blocks of 8×8 pixels each. From this experiment, it can be said that the suitable patch size depends on the original size of the template, and that the patch must be smaller than the template for objects to be detected accurately.
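The patch-based search described in this subsection can be sketched in Python as follows. This is a simplified illustration rather than the MATLAB program used in the experiments: the function names `hog` and `detect`, the 9 orientation bins, the stride of 8 pixels, the whole-descriptor L2 normalization and the dot-product similarity are assumptions. With the 8-pixel cell used here, a 128×128 patch yields 128 / 8 = 16 cells per side, matching the 16×16 division mentioned above.

```python
import math

def hog(gray, cell=8, bins=9):
    """Compact HOG-like descriptor: one orientation histogram per cell,
    concatenated and L2-normalized as a whole (a simplification)."""
    h, w = len(gray), len(gray[0])
    features = []
    for top in range(0, h - cell + 1, cell):
        for left in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for y in range(top, top + cell):
                for x in range(left, left + cell):
                    # Central differences, clamped at the window border.
                    gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
                    gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    hist[int(angle * bins / 180.0) % bins] += math.hypot(gx, gy)
            features.extend(hist)
    norm = math.sqrt(sum(v * v for v in features)) or 1.0
    return [v / norm for v in features]

def detect(image, template_patch, stride=8):
    """Return (top, left, score) of the window whose HOG best matches the patch."""
    ph, pw = len(template_patch), len(template_patch[0])
    target = hog(template_patch)
    best = (0, 0, -1.0)
    for top in range(0, len(image) - ph + 1, stride):
        for left in range(0, len(image[0]) - pw + 1, stride):
            window = [row[left:left + pw] for row in image[top:top + ph]]
            # Dot product of two unit-length descriptors (cosine similarity).
            score = sum(a * b for a, b in zip(hog(window), target))
            if score > best[2]:
                best = (top, left, score)
    return best

# Usage: a 16x16 patch (horizontal ramp on the left half, vertical ramp on
# the right half) is pasted into a 32x32 image of zeros at row 8, column 16.
patch = [[10 + 4 * x if x < 8 else 10 + 4 * y for x in range(16)] for y in range(16)]
scene = [[0] * 32 for _ in range(32)]
for y in range(16):
    for x in range(16):
        scene[8 + y][16 + x] = patch[y][x]
top, left, score = detect(scene, patch)
```

Each window is described independently, so a window that exactly contains the patch produces the same descriptor as the template and a similarity of 1, while partially overlapping windows score lower.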

C. Object Detection using Template Matching Method
In the template matching method, this paper uses the cropped image of an object as the template image. Template matching cannot detect objects when a template of arbitrary size is used. Templates of arbitrary sizes (e.g. 32×32, 64×64 and 128×128) were applied, but the investigated Template Matching system was unable to detect the object. In Fig. 6 we use the original template image and observe that the object is not detected, whereas in Fig. 7 all the objects are detected correctly. Fig. 8 shows graphically that template matching takes more time than the HOG method to detect each object class.
V. CONCLUSION
This paper addresses the detection of desired objects in a test image using two methods, the HOG feature-based method and the template matching method. The performance of these methods has been analyzed using several images. The methods are tested on a sample of six images from a database of 20 images of different types of objects. Template Matching uses simple pixel-based cross-correlation, which is easy to implement. It works well only when the template image is cropped from the original image; otherwise it shows poor results in object detection. The HOG method uses feature descriptors to detect objects. In this method, the cross-correlation between the feature descriptors of the template and of the image is used. It requires less time to run the detection process. We conclude that the HOG-based method is comparatively complex to implement, but it gives better results in both detection accuracy and elapsed time.