Extraction of Point-of-Interest in Multispectral Images for Face Recognition

Security systems in companies, airports and other enterprises face numerous challenges, among them object and face recognition. The robustness problems that usually affect recognition systems based on color images can be addressed by multispectral image acquisition in the near-infrared range, using cameras equipped with new high-performance sensors able to take accurate images in dark or uncontrolled environments. Multispectral CMOS (Complementary Metal Oxide Semiconductor) sensors record several wavelengths in a single shot; these wavelengths are isolated and allow very specific analyses. The current generation of these imaging sensors is of scientific and technical interest because they provide much more information than sensors operating in the visible range, which matters when the precise nature and spatiotemporal evolution of the observed areas must be analyzed. In this study, multispectral images acquired by a camera equipped with a hybrid sensor operating in the near infrared were used. This camera was built in the ImViA laboratory of the University of Bourgogne as part of the European project EXIST (EXtended Image Sensing Technologies). The process involves image acquisition, image mosaicing and image demosaicing using mosaic filters. After the acquisition process, interest points are extracted in the image bands in order to determine how the information is shared out over the bands. The results are satisfactory: the information is spread over all the image bands, and the algorithms used detected many interest points. Based on these results, a large database can be set up for building a face recognition system.

Keywords—Multispectral image; hybrid sensor; image mosaicing; image demosaicing; mosaic filter


I. INTRODUCTION
Nowadays, in almost every sector, security and attack problems have become a crucial challenge. Biometric imaging systems are appearing as a promising solution to increase levels of security. These biometric systems are mostly based on grayscale images, color images and spectral reflectance. But these systems still face tremendous difficulties when recognizing objects or faces.
In fact, conventional digital color cameras, which generally operate in the visible spectrum, are limited in many situations: when more information is needed, when acquisition conditions are difficult (for instance under a cloudy sky), when information beyond the visible range is required (such as plants that emit in the infrared), when greater accuracy or calibration of the acquisition system is needed, or when acquisitions must be made in uncontrolled or dark environments. Several studies have shown that images acquired in the visible spectrum carry less information than those taken in the infrared range [1], [2], [3], [4]. In addition, Samuel Ortega et al. found that multispectral imaging can obtain both spatial and spectral information within and beyond human visual sensitivity, capturing information at different wavelengths [5]. To overcome some of these problems, Mamadou Diarra et al., in their studies on multispectral images, merged information from the visible range and the thermal infrared to increase the information in the image [6], [7], [8], [9], [10]. They also presented multispectral imaging, and especially the fusion of visible and infrared information, as a very promising alternative for image recognition. Moreover, Xingbo Wang et al. found that, to obtain more accuracy, the spectral characteristics of the camera's filters must be chosen carefully [11], [12], [4]; their results show that the filter bandwidth influences the accuracy of the reflectance estimation. Multispectral imaging with cameras equipped with hybrid sensors operating in the near infrared is even more efficient and can capture more information [1].
For example, to verify that a fingerprint comes from a living finger and not a copy of that finger, the near-infrared range is clearly the best fit, since veins are visible through the skin in this range. Laura Rey-Barroso et al. introduced a near-infrared (NIR) multispectral imaging system to evaluate deeper skin layers, thanks to the higher penetration of photons at these wavelengths [13]. The hybrid system used here, integrated into a camera with dedicated hardware and software, allows real-time operation at 30 fps. It also provides finer detail for analysis in recognition systems.
In the context of the problems listed above, a camera equipped with a hybrid sensor has been proposed, in which the spectral bands from 680 nm to 950 nm (NIR) were selected as optimal bands [14]. This camera, which captures images on eight bands, provides good image resolution. These images were used to extract characteristics for recognition, so that the performance of recognition systems could be improved. Based on the results, a large database of images taken in the NIR can be set up.
In the following, the process consists of acquiring an image that is mosaicked before being transmitted by the camera. Once the image is received from the camera, the different spectral bands are separated using binary masks (Fig. 2). After separation, each spectral band contains only one spectral component. To obtain complete images, these bands have to be interpolated; this process is called demosaicking and yields complete image bands (Fig. 3). Fig. 4 shows the entire acquisition process. After this last step, the interest points are extracted from these image bands for the tests.

A. Hybrid Sensor
In this work a camera equipped with a hybrid sensor was used. This sensor was integrated into a camera with a dedicated hardware unit, allowing operation in real-time applications at 30 fps. In order to compensate for the loss of spatial resolution inherent to the MSFA, specific algorithms have been developed for multispectral demosaicking. The CMOS sensor is the physical element whose performance impacts the quality of the final system. This sensor has been chosen with respect to several criteria:

√ The minimum pixel size is 5 µm;

√ The CMOS sensor resolution should be high enough to compensate for the loss related to the MSFA system;

√ The spectral sensitivity of the sensor must extend to the near infrared.
Taking into account the specifications above, our choice fell on the viimagic 9220H sensor, provided by Grass Valley. Some modifications were introduced in order to improve the final sensor.
The advantage of CMOS sensors is that their manufacturing is much cheaper than that of CCD (Charge-Coupled Device) sensors. Furthermore, CMOS sensors consume less energy [15]. The easy access to individual pixels in CMOS sensors allows great flexibility for real-time data processing. All the advantages mentioned above lead to smaller systems, lower power consumption and lower manufacturing cost [15], [16]. As a result, we chose CMOS sensors rather than CCD sensors [17], [18]. The mounted hybrid sensor and its spectral response are presented in Fig. 1.

B. Multispectral Images
A multispectral (MS) image is an image acquired by a sensor that operates in several spectral bands; it can be defined as an image in which each pixel essentially contains information on the reflectance of the scene. It is represented by the set of pixel matrices as follows:

I = (M_1, M_2, ..., M_K)

where M_j is the matrix associated with the j-th band of the image.
Let I be an MS image; a pixel of the image is noted P(x, y), where x and y are the coordinates of the pixel P. Each pixel P is associated with a point I(x, y, k) defined in a K-dimensional space (K being the number of components), and I_k(x, y), k ∈ {1, 2, ..., K}, represents the value of each component. Therefore, a multispectral image consists of K component planes I_k, k ∈ {1, 2, ..., K}. In this study K = 8, giving an 8-band multispectral image.
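As a minimal illustration (a Python/NumPy sketch, not the processing chain used in this work), an 8-band MS image can be stored as a three-dimensional array whose third axis indexes the bands; the array values here are random and purely illustrative:

```python
import numpy as np

# A K-band multispectral image I(x, y, k): here K = 8 bands of a 4x4 image.
K = 8
height, width = 4, 4
I = np.random.rand(height, width, K)  # I[x, y, k] = value of band k at pixel (x, y)

# The k-th component plane I_k is a single 2-D slice of the array.
I_3 = I[:, :, 3]
print(I.shape)    # (4, 4, 8)
print(I_3.shape)  # (4, 4)
```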

A. Mosaic Filters
Mosaic filters are filters arranged as a matrix in which each filter is associated with a specific spectrum. These filters make it possible to divide the spectrum finely and thus to differentiate the bands. In this work, a set of 8 filters based on the Fabry-Perot principle has been used. Table I illustrates the response of each of the eight filters. The resulting distribution of the MSFA (Multispectral Filter Array) moxel is indicated in Table II.

B. Mosaic Images
Image mosaicing is a technique that builds an image by superimposing successive registered images [16], [19]. It can therefore be defined as the process of assembling different images of the same scene to form a single image [20]. The aim of mosaic creation is to visualize a large area in a single image under perspective projection. One of its applications is the construction of large aerial and satellite images from collections of small photographs [21].

C. Strips Extraction
There are many algorithms used for strip extraction [20], [22]. In this work we chose to multiply the mosaic image by different binary masks M_k(x, y), k ∈ {1, 2, ..., K} [20], which divide the mosaic image into K = 8 components. These masks have the value 1 at the positions where the pixel is available and 0 elsewhere. Each component plane is obtained by multiplying the mosaic image term by term by the corresponding mask M_k.
By multiplying the mosaic image with each mask, we obtain 8 uncorrelated image planes (Fig. 2), in each of which only one spectral component is available. Each mask corresponds to one image plane.
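A minimal NumPy sketch of this masking step; the 4 × 4 moxel layout below is hypothetical (the real distribution is the one given in Table II), and the 8 × 8 mosaic values are illustrative:

```python
import numpy as np

K = 8
# Hypothetical 4x4 moxel: each of the 8 bands occupies two positions.
moxel = np.array([[0, 1, 2, 3],
                  [4, 5, 6, 7],
                  [2, 3, 0, 1],
                  [6, 7, 4, 5]])

mosaic = np.arange(8 * 8, dtype=float).reshape(8, 8)  # toy 8x8 mosaic image
pattern = np.tile(moxel, (2, 2))                      # repeat the moxel over the image

# Binary masks M_k: 1 where band k was sampled, 0 elsewhere.
planes = []
for k in range(K):
    M_k = (pattern == k).astype(float)
    planes.append(mosaic * M_k)  # sparse plane holding only band-k samples

# Each plane keeps exactly 1/Kth of the pixels (here 8 of 64).
print(int(np.count_nonzero(planes[1])))  # 8
```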

D. Multispectral Demosaicing by Bilinear Interpolation
After applying the masks, the resulting image planes each contain only one spectral component. For complete image reconstruction, the missing pixels have to be interpolated. This process is called multispectral image demosaicing [23], [24], [18], [25], [3]. Bilinear interpolation [26], [16], [27] can be interpreted as a sequence of two linear interpolations, one in each direction, and the linear interpolations can be made in several directions. With P(i, j) the missing pixel at position (i, j), we have:

• Diagonally: P(i, j) = [P(i-1, j-1) + P(i-1, j+1) + P(i+1, j-1) + P(i+1, j+1)] / 4

• Vertically: P(i, j) = [P(i-1, j) + P(i+1, j)] / 2

• Horizontally: P(i, j) = [P(i, j-1) + P(i, j+1)] / 2

The interpolation or demosaicing of a mosaic image is a method that estimates the missing pixels on the different (chromatic) channels of the mosaic image. Several algorithms have been designed for image demosaicing [28], [29], [30]. The method used here consists of applying a convolution filter H to each band of the image obtained [20]. This filter is fixed so that the contribution of each neighbor to the estimation of the missing level in a pixel depends on the spatial distance separating that neighbor from the central pixel. Given that the pixels have the same structure, the same filter as in Mihoubi's work [20] is used. The interpolated bands are shown in Fig. 3, and the acquisition process is depicted in Fig. 4.
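The three interpolation directions can be sketched as follows (a Python/NumPy sketch; the 3 × 3 patch and the missing-pixel position are illustrative):

```python
import numpy as np

def bilinear_fill(P, i, j, mode):
    """Estimate the missing pixel P(i, j) from its available neighbors."""
    if mode == "horizontal":   # neighbors on the same row
        return (P[i, j - 1] + P[i, j + 1]) / 2
    if mode == "vertical":     # neighbors on the same column
        return (P[i - 1, j] + P[i + 1, j]) / 2
    if mode == "diagonal":     # the four diagonal neighbors
        return (P[i - 1, j - 1] + P[i - 1, j + 1]
                + P[i + 1, j - 1] + P[i + 1, j + 1]) / 4
    raise ValueError(mode)

P = np.array([[1.0, 2.0, 3.0],
              [4.0, 0.0, 6.0],   # center pixel (1, 1) is missing
              [7.0, 8.0, 9.0]])
print(bilinear_fill(P, 1, 1, "horizontal"))  # (4 + 6) / 2 = 5.0
print(bilinear_fill(P, 1, 1, "vertical"))    # (2 + 8) / 2 = 5.0
print(bilinear_fill(P, 1, 1, "diagonal"))    # (1 + 3 + 7 + 9) / 4 = 5.0
```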

E. Point of Interest
A point of interest in an image is an area of pixels with remarkable properties, often expressed by abrupt changes in intensity. Points of interest are regions of the image that are rich in local information content and stable under affine transformations and illumination variations. In an image there should be few points whose local descriptors are similar [31].

IV. EXTRACTION OF INTEREST POINTS
The feature extraction methods are based on the Scale Invariant Feature Transform (SIFT). The SIFT detector [32] is the best known of the detectors. This method combines a detector with a descriptor. SIFT's point-of-interest detection is based on a DoG (Difference of Gaussians), and it has several versions. These algorithms are used in several contexts, such as multispectral imaging and face recognition under different criteria, to test whether such a feature extraction kernel is able to extract the parameters [33]. It should be remembered that the SIFT method is based on the determinant of the Hessian matrix:
det(H) = L_xx(x, σ) L_yy(x, σ) − (L_xy(x, σ))²

where L_xx(x, σ) is the convolution of the image with the second-order derivative of the Gaussian, and similarly for L_xy(x, σ) and L_yy(x, σ). To reduce the computational complexity of the determinant, an approximation based on Haar wavelets is used.
By using the expression of the integral image:

ii(x, y) = Σ_{x'≤x, y'≤y} i(x', y')

it can be deduced that the sum of the pixel values over any rectangular region can be computed from only four values of ii, which makes the Haar wavelet responses cheap to evaluate.

V. IMPACT OF THE WORK

The acquisition with hybrid sensors is designed to measure the accuracy and the response of the resulting optical filters, which ensures the accuracy and quality of the obtained multispectral images. The multispectral images from the hybrid sensor can be of lower quality, because demosaicking computes missing values from neighboring pixels, which sometimes generates approximations. But hybrid sensors are adequate for snapshot acquisition in real-time applications, and they are used in this work to detect faces in real time. The multispectral images from a filter-wheel camera are of very good quality [34], with no approximation in the calculations; however, such a camera cannot perform detection in real time. With this new camera, a multispectral image database will be set up. When the database contains enough images, a deep-learning solution will be proposed in future work, as many research projects are moving towards this solution. In 2019, Shaukat Hayat et al. [35] proposed to use deep CNN-based features for hand-drawn sketch recognition via a transfer learning approach. Xiang Wang et al. introduced a method of privacy-preserving face recognition [36] in which a convolutional neural network is used for face feature extraction. Moreover, Bogdan Belean et al. [37] use a CNN (Convolutional Neural Network) for image segmentation.
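The integral image mentioned above, and the four-reference rectangle sum it enables, can be sketched as follows (Python/NumPy, illustrative values):

```python
import numpy as np

def integral_image(img):
    """ii(x, y) = sum of img over all pixels above and to the left (inclusive)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img over rows r0..r1 and columns c0..c1 using four references."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 9 + 10 = 30
print(img[1:3, 1:3].sum())       # 30, direct check
```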

VI. PRESENTATION OF DETECTORS AND DESCRIPTORS
Face or shape recognition techniques require tools such as detectors and descriptors, which are complementary components of object recognition.

A. Detectors
Point-of-interest detection is a preliminary step in many computer vision processes. Detectors are used to isolate areas of interest in an image. Over the past twenty years, several interest-point detectors have been developed; Schmid and Mohr compared the performance of many of them. According to Schmid et al. [38], the most popular point-of-interest detector is the Harris detector [39], proposed by C. Harris and M. Stephens [40]. It detects points of interest through a small window moved in any direction: the algorithm computes the gradient of each pixel and, if the gradient values in the two directions are both large, the pixel is assumed to be a corner. Our experiments were done using Harris, KAZE and ORB; KAZE and ORB are both detectors and descriptors [41].
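The gradient-based corner test described above can be sketched in NumPy; the 3 × 3 summation window and k = 0.04 are common choices, not parameters taken from this work:

```python
import numpy as np

def box3(a):
    """Sum each pixel's 3x3 neighborhood (the Harris window), zero-padded."""
    p = np.pad(a, 1)
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def harris_response(img, k=0.04):
    """Harris response R = det(M) - k * trace(M)^2, where M accumulates the
    gradient products Ix*Ix, Ix*Iy, Iy*Iy over a small window."""
    Iy, Ix = np.gradient(img.astype(float))
    Sxx, Syy, Sxy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    return (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2

# A bright square on a dark background: R peaks near the square's corners,
# where the gradient is large in both directions.
img = np.zeros((12, 12))
img[3:9, 3:9] = 1.0
R = harris_response(img)
peak = np.unravel_index(R.argmax(), R.shape)
print(peak)  # one of the four corners of the square
```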
ORB (Oriented FAST and Rotated BRIEF) was introduced by Rublee et al. [42]. The algorithm is based on the BRIEF keypoint descriptor and the FAST keypoint detector, since both are computationally fast. It was presented in 2011 to provide a fast and efficient alternative to SIFT [43], and it is a variant of BRIEF that fills BRIEF's lack of rotational invariance. The ORB method computes a local orientation using an intensity centroid, defined as a weighted average of the pixel intensities in the local patch, which is assumed not to coincide with the center of the patch.
The KAZE algorithm was developed in 2012 and is in the public domain. The name comes from the Japanese word kaze, which means wind and refers to the flow of air ruled by nonlinear processes on a large scale [44], [43]. For object recognition, KAZE follows mainly the same steps as SIFT, but with some differences in each step. Instead of a DoG, the KAZE algorithm [44], [45] uses the AOS (Additive Operator Splitting) method to build its scale space, and the determinant of the Hessian matrix (DoH) for blob detection [43], [46]:
det(H) = L_xx(x, σ) L_yy(x, σ) − (L_xy(x, σ))²

where L_xx(x, σ) is the convolution of the image with the second-order derivative of the Gaussian, and similarly for L_xy(x, σ) and L_yy(x, σ).

B. Descriptors
After detecting points of interest, descriptors are used to describe them. They analyze the neighborhood of each point to produce a characteristic vector of the interest-point area. This vector is called the descriptor vector, and in our work it describes 64 features. The description vector associated with a point of interest is a set of values extracted from the image in the local neighborhood of the position of the detected point [47]. This work has utilized the detectors and binary feature descriptors listed in Table III, which provide high performance and a compact data representation [38], [39]. BRISK (Binary Robust Invariant Scalable Keypoints) [48] uses for its detection AGAST (Adaptive and Generic Corner Detection Based on the Accelerated Segment Test) [49], an improved variant of FAST [50]. FREAK (Fast Retina Keypoint) is a binary descriptor proposed by Alahi et al. [51]. Like BRISK, this descriptor uses a sampling pattern and an orientation compensation method; it is an improved variant of BRISK that uses a selection of pairs from the pattern. FREAK organizes its sampling points analogously to the structure of the biological retina. For the description of the point of interest, the tools used are weighted Gaussians, a pattern functioning like the retina, and an orientation assignment.
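Binary descriptors such as ORB, BRISK and FREAK are compared with the Hamming distance: XOR the two bit strings and count the set bits. A minimal sketch (NumPy, with illustrative bit strings):

```python
import numpy as np

def hamming(d1, d2):
    """Hamming distance between two binary descriptors stored as uint8 arrays."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

a = np.array([0b10110100, 0b00001111], dtype=np.uint8)
b = np.array([0b10110110, 0b00001111], dtype=np.uint8)
c = np.array([0b01001011, 0b11110000], dtype=np.uint8)

print(hamming(a, b))  # 1: a single bit differs
print(hamming(a, c))  # 16: all bits differ
```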

VII. RESULTS
In this part, the results of the different stages of this work are presented: the mosaicing of the image, the decomposition into 8 bands, and the interest-point tests.

A. Mosaicing of the Image and Decompositions into 8 Bands
The image obtained after mosaicing is represented in Fig. 5. By applying the different masks, the mosaic image is separated into 8 image bands whose pixels contain a single color component (Fig. 2).
After the separation of the strips, these strips are demosaiced in order to attribute the remaining color components to each pixel (Fig. 3). Fig. 6 represents a sample of the final results on 8 bands after acquisition.

B. Test and Images Used
The image database is set up with images taken by a camera equipped with a hybrid sensor that detects and acquires faces in real time. This camera takes images on 8 bands and can be used in real-time applications. Most of the time, in multispectral images, some bands contain less information than others; the particularity of our camera is that the information is roughly spread over all the image bands. The interest points are detected on all 8 image bands for all the algorithms mentioned above. Since the recognition is done on the face only, we used the Viola-Jones algorithm [52] to crop the face before detecting the points of interest; this algorithm detects only the regions of interest. The different algorithms for the detection and description of the points of interest were then applied to the resulting image. The tests were done using Matlab v2020a with a sample of 30 images, and the results are almost the same on each image; 10 images are used in this paper (Fig. 7). A sample of the interest points detected by ORB/ORB is shown in Fig. 8, and the numbers of detected interest points are reported in Table IV.
Given the results in Table IV, the KAZE algorithm detects more points of interest than the others. The results also show that the points of interest are slightly more concentrated on the first bands for all the algorithms except the Harris algorithm, which detects more interest points on the last bands. But in general, the information spreads over all eight bands when the acquisition process has been successful.

C. Entropy Test
The entropy of an image makes it possible to measure the quantity of information contained in the image. In this work it allowed us to confirm that the information is spread over all 8 bands. The entropy is computed by the formula below:

H = − Σ_i P_i log₂(P_i)

where P_i is the probability of occurrence of each pixel value.
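The entropy formula can be sketched as follows (Python/NumPy, using the equivalent form H = Σ P_i log₂(1/P_i) on the gray-level histogram; the test images are illustrative):

```python
import numpy as np

def entropy(img):
    """Shannon entropy of the gray-level histogram, H = sum_i P_i * log2(1/P_i)."""
    hist = np.bincount(img.ravel(), minlength=256)
    p = hist[hist > 0] / img.size          # P_i: probability of each gray level
    return float((p * np.log2(1.0 / p)).sum())

flat = np.zeros((8, 8), dtype=np.uint8)    # a single gray level: no information
half = np.zeros((8, 8), dtype=np.uint8)
half[:, 4:] = 255                          # two equally likely gray levels

print(entropy(flat))  # 0.0
print(entropy(half))  # 1.0 (one bit per pixel)
```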
Entropy tests have been done; the results are recorded in Table V, and Fig. 9 represents the associated histogram.
The results of Table IV show that the detector/descriptor pairs Harris/FREAK, Harris/BRISK and Harris/KAZE did not detect enough interest points, especially on the first 6 bands, although on the seventh and eighth bands of some images the number of interest points is sufficient. This phenomenon could be due to the lighting of the scene, or to the fact that these algorithms are not robust or suitable for these types of images: they have the worst performance in most cases in terms of features detected. The ORB/ORB pair correctly detected the points of interest on each band and each image; it therefore demonstrates good precision with respect to the features, and its performance makes it better than the Harris-based pairs. The authors of [56] find that the KAZE and AKAZE pairs perform better than other pairs; indeed, KAZE/KAZE detected many points of interest on each image and on each band. This suggests that the KAZE/KAZE pair would be suitable for these multispectral images from the camera equipped with a hybrid sensor operating in the near-infrared range. One can notice that the interest points were spread over almost all the bands. For confirmation, we computed the entropy tests; the entropy results in Table V show how the information in each image is distributed. These results showed that, for each image, the information is roughly distributed over all 8 bands, except the seventh and eighth bands, which carry slightly less information than the others. Overall, the entropy tests confirm the interest-point results.
Based on these results, a large database of images taken on 8 bands with this camera which operates in the NIR can be set up.

IX. CONCLUSION
Security challenges in information systems keep increasing, and researchers have proposed different approaches and techniques; one of them is the biometric imaging system. In recent years, studies have shown the limitations of this approach. This study focuses on multispectral (MS) imaging, primarily the use of a camera equipped with a hybrid sensor. The MS camera used in this work was built with a hybrid sensor, a Multispectral Filter Array (MSFA) mounted on a CMOS sensor, which provided the best resolution for the mosaic image thanks to the small moxel used and the filter pitch (5 × 5 µm²). This new camera system operates in the near infrared in order to improve the process of object or image recognition. This study examined the performance of this multispectral camera, built in the ImViA laboratory at the University of Bourgogne, by extracting the points of interest on the bands of the multispectral images it acquires. It has also been shown how to transform the raw images obtained directly from the camera into a multispectral image through different steps, namely mosaicking and interpolation or demosaicing. Different descriptors have been used to extract interest points, and the results were satisfactory: the KAZE descriptor was the best and should be used to build recognition systems. However, this project did not take place without difficulties: the filter based on the Fabry-Perot principle is penalized by its secondary response, and to compensate for the loss of sensitivity beyond 850 nm, a complex moxel structure (6 × 6 pixels) was simulated; but this solution leads to a non-homogeneous distribution, so ultimately a regular distribution of the pixels (4 × 4) in the moxel was kept. The images are so contrasted that some less robust algorithms are not able to extract interest points.