Comparison of Four Demosaicing Methods for Facial Recognition Algorithms

Multispectral imaging has become increasingly important in several areas over the past decade as a way to overcome the limitations of color imaging. There are several types of multispectral acquisition systems, including single-shot cameras that incorporate Multispectral Filter Arrays (MSFA). An MSFA is an extension of the color filter array. Acquisition systems that incorporate spectral filter arrays are very fast, lightweight, and able to acquire moving scenes. However, these cameras are usually supplied, at best, with software for correcting filter positioning, and without demosaicing software. Hence there is a need to identify a suitable demosaicing algorithm in terms of image quality, computation time, and decorrelation factor. This paper presents a comparative study of four relevant demosaicing methods in the facial recognition process, using images acquired with a single-shot MSFA camera designed in our laboratory. To achieve this goal, four demosaicing methods, namely bilinear interpolation, discrete wavelet transform, binary tree, and median vector, were adapted to multispectral images acquired with an MSFA camera. Evaluations were first performed using the NIQE performance metric and the correlation coefficient. The demosaiced images were then used to train a VGG19 neural network in order to determine which demosaicing method best preserves the features relevant for recognition and which offers the best computation time. The results show that bilinear interpolation produces the least correlated images, whereas the binary tree method gives the best-quality images, with a NIQE of 8.99 and a face recognition accuracy of 100%.


I. INTRODUCTION
Most of today's color cameras incorporate a Color Filter Array (CFA), typically a Bayer filter [1], [2]. The CFA contains three types of filters, green, red, and blue, each responsible for acquiring the image in one spectral band. These cameras are very fast, displaying the acquired image in a matter of seconds. Despite this performance, facial recognition [3] with color cameras is affected by illumination variations, occlusions, and pose variations [4], [5]. Multispectral imaging can mitigate these problems because more information is available across the image bands. There are three types of multispectral acquisition systems: multi-camera systems, multi-sensor systems, and one-shot MSFA systems. The first two types are slow, heavy, and consume more energy, and one-shot MSFA cameras help to overcome these drawbacks. One-shot MSFA cameras are equipped with an MSFA, which is an extension of the CFA [6]-[8] that includes more than three filters, each responsible for acquiring the image at a given wavelength. Acquisition systems that incorporate a multispectral filter array therefore acquire a single image on several spectral bands simultaneously. They are compact, one-shot, very fast, and capable of capturing moving scenes, which solves the heaviness and slowness of conventional multispectral cameras during acquisition. MSFA one-shot cameras are used in several fields such as agriculture, medical imaging, and pattern recognition [9]-[12].
A recognition system is composed of four main modules, namely acquisition, feature extraction, matching, and decision, and its performance depends on each of them. Several multispectral facial recognition systems exist, but most of them use multispectral cameras consisting of multiple single-shot cameras or a single-shot camera in scanning mode. For the acquisition module, a database of face images was collected with a single-shot MSFA camera. This camera is mainly equipped with a Viimagic CMOS sensor, an MSFA with eight filters, micro-lenses, an electronic board to drive the sensor, and a camera board for image acquisition. The MSFA bands were selected theoretically with a genetic algorithm combined with a facial recognition application [13]. This acquisition system covers the spectral range from 650 to 950 nm and produces raw (mosaic) images, which require demosaicing before use. Demosaicing estimates the values of the missing pixels in a given band; at the end of demosaicing, the number of multispectral images obtained equals the number of filters in the MSFA. Some demosaicing methods [14]-[18] are developed during the theoretical design of the MSFA, but the industrial constraints of MSFA manufacturing mean that effective demosaicing methods are developed after the MSFA is manufactured. This is our case: we already have an MSFA camera, and we want to determine suitable demosaicing methods using its images. The demosaicing method affects image quality and thus the performance of a facial recognition system based on a single-shot MSFA camera: such cameras are efficient and fast, but the quality of the demosaiced images depends on the demosaicing method used.

For these purposes, we adapted four currently used demosaicing algorithms, namely Bilinear Interpolation, Discrete Wavelet Transform (DWT), Binary Tree, and Median Vector, to demosaic the images acquired with our camera. The demosaiced images are compared with the NIQE metric, and the intercorrelation between the demosaiced image bands is analyzed with the correlation coefficient. The convolutional neural network VGG19 is used to evaluate the impact of demosaicing on facial recognition in terms of accuracy and computation time. To the best of our knowledge, this is the first study that focuses on demosaicing after the MSFA camera is manufactured. This paper is structured as follows: Section II briefly presents the MSFA one-shot camera, Section III the material and methods, and Section IV the experimental results and discussion. The paper ends with the conclusion and future perspectives in Section V.

II. THE MSFA ONE SHOT CAMERA
Multispectral imaging with a single camera equipped with a spectral or multispectral filter array allows multispectral images to be acquired simultaneously on several spectral bands. The concept of a multispectral filter array is an extension to n bands of the color filter array that revolutionized digital cameras. A spectral filter array is composed of n filters, each responsible for acquiring the image in a given band. The MSFA camera used in this work is composed of a single Viimagic 9220H sensor, an MSFA for the single-tap imaging system, optical lenses, an electronic board to drive the sensor, and a camera board for image acquisition. This acquisition system was designed in the Imaging and Artificial Vision (ImViA) laboratory, formerly known as the Laboratory of Electronics, Informatics, and Image (LE2I), as part of the EU H2020 project EXIST (Extended Image Sensing Technologies) [13]. It is a light and compact camera that covers wavelengths from 650 to 950 nm. The spectral filter array integrated into this camera is made up of eight optimal filters at the wavelengths {685, 720, 770, 810, 835, 870, 895, 930} nm. The design of this custom filter array uses the Fabry-Perot interferometer. With dedicated hardware and software processing, the MSFA system integrated into this camera works in real time at 30 frames per second. The filters were chosen to overcome illumination variation, motion blur, and low signal-to-noise ratio, which severely affect the performance of facial recognition systems using CMOS sensors. The multispectral filter array is characterized by its moxel, defined as the mosaic of elementary filters that is repeated across the MSFA. Fig. 1 illustrates the moxel used.

III. MATERIAL AND METHODS

A. Motivation
The literature distinguishes three categories of demosaicing methods: pixel interpolation, frequency transformation, and probability of appearance (POA). Pixel interpolation uses the values or weights of neighboring pixels to estimate the value of a missing pixel; methods such as bilinear interpolation, weighted bilinear interpolation, and binary tree use pixel interpolation. Frequency transformation uses wavelets to extract essential information from neighboring pixels to estimate the value of the missing pixel; it includes methods such as the Discrete Wavelet Transform. Probability of appearance refers to methods that determine the probability of occurrence of a band, and of a pixel within a selected band, to estimate the value of the missing pixel; these include approaches based on binary trees. In general, demosaicing methods depend on the MSFA moxel, but generic demosaicing methods that can be adapted to any type of MSFA now exist. Discrete Wavelet Transform, Bilinear Interpolation, Binary Tree, and Vector Median filtering were selected for the comparative study because each presents characteristics of interest:
• Bilinear interpolation uses the values or weights of neighboring pixels to estimate a missing pixel in a given band. Most demosaicing algorithms combine their own technique with bilinear interpolation [17], [19]-[21].
• The Binary Tree-based Edge-Sensing method is a generic approach that uses the notion of POA to select the band and the pixel within the selected band. It combines POA and bilinear interpolation based on edge-sensing information [22] to determine the value of the missing pixel.
• Discrete Wavelet Transform based MSFA demosaicing [17] uses frequency information and weighted bilinear interpolation to approximate the value of the missing pixel.
• Vector median filtering [23] is a demosaicing method whose specificity is the use of vector-based operations and the concept of pseudo-pixel. It groups neighboring pixels according to the size of the moxel to estimate the value of the missing pixel.

B. Preprocessing
The images acquired with MSFA one-shot cameras are raw (mosaic) images and must be demosaiced before use. Demosaicing reconstructs each band of the multispectral image, one per filter contained in the MSFA (N = 8 in this study). Before demosaicing, a band-extraction preprocessing step is performed: each mosaic image is multiplied by a different binary mask for each band, as defined by Eq. (1):

$$I_k(x,y) = I_{\mathrm{raw}}(x,y)\, m_k(x,y), \qquad m_k(x,y) = \begin{cases} 1 & \text{if the filter at } (x,y) \text{ belongs to band } k \\ 0 & \text{otherwise} \end{cases}, \qquad k = 1, \dots, N \qquad (1)$$

This transformation yields eight sparse, mutually shifted planes: at each pixel position, only one of the eight planes holds a measured value. Fig. 2 illustrates this transformation. After preprocessing, the missing pixels in each image band are estimated using the four selected demosaicing methods.
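The band-extraction step can be sketched with NumPy as follows. This is a minimal illustration, not the authors' MATLAB code: the 4 × 4 moxel layout `MOXEL` below is hypothetical (the real layout is the one in Fig. 1); it only respects the stated property of eight bands with two pixels per band. The helper names `band_masks` and `extract_bands` are likewise illustrative.

```python
import numpy as np

# Hypothetical 4 x 4 moxel: entry (i, j) is the band index (0..7) of the
# filter at that position. The actual EXIST layout is the one in Fig. 1;
# this placeholder only respects "eight bands, two pixels per band".
MOXEL = np.array([
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [2, 3, 0, 1],
    [6, 7, 4, 5],
])


def band_masks(height, width, moxel=MOXEL):
    """Return a (B, H, W) stack of binary masks m_k, one per band."""
    ph, pw = moxel.shape
    reps = (-(-height // ph), -(-width // pw))  # ceiling division
    # Tile the moxel over the sensor and crop to the image size.
    layout = np.tile(moxel, reps)[:height, :width]
    n_bands = int(moxel.max()) + 1
    return np.stack([layout == k for k in range(n_bands)]).astype(np.float64)


def extract_bands(raw, moxel=MOXEL):
    """Multiply the mosaic image by each mask: one sparse plane per band."""
    masks = band_masks(raw.shape[0], raw.shape[1], moxel)
    planes = raw[np.newaxis, :, :] * masks
    return planes, masks
```

For an EXIST frame stored as a 1104 × 2072 array (assuming 2072 is the width), `extract_bands` returns eight planes of shape (1104, 2072); summing the planes gives back the raw mosaic, since exactly one mask is 1 at each pixel.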

C. Bilinear Interpolation based MSFA Demosaicing
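Bilinear interpolation based demosaicing is the simplest method for calculating the values of the missing pixels during demosaicing, and other demosaicing methods combine their specific techniques with it [24]. Bilinear interpolation approximates each missing pixel value by a distance-weighted average of its neighboring pixels. As its name indicates, bilinear interpolation is a succession of two linear interpolations, performed first along one image direction and then along the other. For a missing pixel at position (x, y) whose four nearest known neighbors are Q_{11} = (x_1, y_1), Q_{21} = (x_2, y_1), Q_{12} = (x_1, y_2), and Q_{22} = (x_2, y_2), the interpolation is defined by Eq. (2), Eq. (3), and Eq. (4):

$$f(x, y_1) = \frac{x_2 - x}{x_2 - x_1}\, f(Q_{11}) + \frac{x - x_1}{x_2 - x_1}\, f(Q_{21}) \qquad (2)$$

$$f(x, y_2) = \frac{x_2 - x}{x_2 - x_1}\, f(Q_{12}) + \frac{x - x_1}{x_2 - x_1}\, f(Q_{22}) \qquad (3)$$

$$f(x, y) = \frac{y_2 - y}{y_2 - y_1}\, f(x, y_1) + \frac{y - y_1}{y_2 - y_1}\, f(x, y_2) \qquad (4)$$

For this study, demosaicing is implemented as a convolution filter whose weights are defined by the spatial distance between the central pixel and its neighbors. The interpolated image band is given by Eq. (5):

$$\hat{I}_k = I_k * H \qquad (5)$$

where H is the distance-weighted convolution kernel, I_k is the sparse image band obtained after band extraction, and * denotes 2-D convolution.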
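A convolution-based bilinear interpolation of one sparse band plane can be sketched as below. The sketch assumes a separable "tent" kernel whose weights decrease linearly with distance and whose radius equals the moxel period (4 here); the exact kernel used by the authors is not given. It also swaps in normalized convolution (dividing by the convolved mask) so that the result stays a proper weighted average even when, as with two pixels per band in a 4 × 4 moxel, the samples of a band do not form a regular square lattice. Function names are hypothetical.

```python
import numpy as np


def tent_kernel(period):
    """Separable bilinear (tent) kernel: weight decreases linearly with distance."""
    t = np.arange(-(period - 1), period)
    k1d = 1.0 - np.abs(t) / period
    return np.outer(k1d, k1d)


def conv2d_same(image, kernel):
    """Zero-padded, same-size 2-D correlation.

    Identical to convolution for the symmetric kernels used here.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(np.float64), ((ph, ph), (pw, pw)))
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def interpolate_band(plane, mask, period=4):
    """Fill the missing pixels of one sparse band plane.

    Normalized convolution: the distance-weighted sum of the known samples is
    divided by the sum of the weights of the samples actually present.
    """
    kernel = tent_kernel(period)
    num = conv2d_same(plane * mask, kernel)
    den = conv2d_same(mask, kernel)
    est = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    # Keep the measured samples unchanged.
    return np.where(mask > 0, plane, est)
```

On a regular lattice (for example one sample every second row and column with `period=2`), this reduces exactly to Eq. (2)-(4): a linear ramp is reproduced without error.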

D. Discrete Wavelet Transform based MSFA Demosaicing
A Discrete Wavelet Transform (DWT) decomposes a signal into a number of sets, each being a series of coefficients that describes the evolution of the signal in the corresponding frequency band. When applied to an image, the DWT decomposes it into a series of sub-bands with different frequency components. Over the decades, several DWT-based methods have been proposed for Color Filter Array (CFA) demosaicing [25]-[29].
Xingbo et al. [16] extended the application of the DWT to MSFA demosaicing. DWT-based MSFA demosaicing combines down-sampled (DS) images, the Haar wavelet (D2), a "replace" rule for estimating the high-frequency sub-bands, and bilinear interpolation for estimating the low-frequency sub-bands. This approach is applicable to any MSFA with a regular mosaic pattern, regardless of the number of channels. The algorithm is performed in three successive steps:
• High-frequency estimation: the image is first divided into K down-sampled images, each of which is decomposed into spatial frequency sub-bands by a DWT with the Haar wavelet. The coefficients of the missing DS images in the high-frequency sub-bands are then estimated with the "replace" rule.
• Low-frequency estimation: bilinear interpolation is applied to the mosaicked image plane by plane, and the low frequencies are extracted by Haar wavelet decomposition. The coefficients of the missing DS images in the low-frequency sub-bands are then replaced with those of the interpolated DS images.
• Reconstruction: the low-frequency and high-frequency components are recombined, and the inverse discrete wavelet transform is computed to reconstruct the demosaiced image.

The authors tested this method with a CFA and with two visible-range MSFAs of four and eight bands. Since the method is compatible with any MSFA with a regular pattern, regardless of the number of channels, we adapted it to the eight-band MSFA described above, which has two pixels per band. For this algorithm, the band wavelengths were set to {685, 720, 770, 810, 835, 870, 895, 930} nm.
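The core of the "replace" rule can be illustrated with a one-level orthonormal 2-D Haar transform written directly in NumPy. This is a simplified sketch, not the implementation of [16]: it assumes a single interpolated DS image and a single measured DS image of the same (even) size, keeps the low-frequency (LL) sub-band of the interpolated estimate, and takes the three high-frequency sub-bands from the measured image. How [16] pairs the K down-sampled images and which measured image supplies the detail coefficients are not reproduced here; the function names are hypothetical.

```python
import numpy as np


def haar_dwt2(x):
    """One-level orthonormal 2-D Haar DWT of an image with even dimensions."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    hl = (a - b + c - d) / 2.0  # detail from differences between adjacent columns
    lh = (a + b - c - d) / 2.0  # detail from differences between adjacent rows
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, (lh, hl, hh)


def haar_idwt2(ll, details):
    """Inverse of haar_dwt2."""
    lh, hl, hh = details
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + hl + lh + hh) / 2.0
    x[0::2, 1::2] = (ll - hl + lh - hh) / 2.0
    x[1::2, 0::2] = (ll + hl - lh - hh) / 2.0
    x[1::2, 1::2] = (ll - hl - lh + hh) / 2.0
    return x


def replace_rule(interpolated_ds, measured_ds):
    """Low frequencies from the interpolated estimate, high frequencies from the measured DS image."""
    ll, _ = haar_dwt2(np.asarray(interpolated_ds, dtype=np.float64))
    _, details = haar_dwt2(np.asarray(measured_ds, dtype=np.float64))
    return haar_idwt2(ll, details)
```

The transform is orthonormal, so `haar_idwt2(*haar_dwt2(x))` returns `x` exactly, and the recombined image keeps the smooth content of the bilinear estimate while inheriting the edges of the measured image, which is the assumption of inter-band high-frequency correlation that the "replace" rule relies on.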

E. Binary Tree based MSFA Demosaicing
Lidan Miao et al. [20]-[22] proposed a generic demosaicing method based on a binary tree, the Binary Tree Edge Sensing (BTES) method. The missing pixels are estimated progressively using the edge correlation information of all spectral bands. Binary tree based MSFA demosaicing can be adapted to any 4 × 4 MSFA. The generic demosaicing algorithm consists of three interconnected modules: band selection, pixel selection, and interpolation [20].
• Band selection: this module defines the interpolation order of the spectral bands using the POA.
The spectral bands have different POAs, and the band with the highest POA contains the most detailed information. Band selection is equivalent to selecting leaf nodes (spectral bands) at different levels of the tree: nodes at the same level have the same POA, and the deepest nodes have the smallest POA. The band selection process is as follows: first, the band with the highest POA, the leaf node at the first level of the binary tree, is selected. Then, at levels with more than one leaf, the bands are selected in random order. This process is repeated down to the last level of the tree.
The band with the most edge information is thus interpolated first, and the edge information of this first interpolated band is used to estimate the other bands.
• Pixel selection: this module determines the order in which pixel locations are interpolated in each spectral band. The estimation is progressive: some of the missing pixel values are estimated first, and the remaining unknown values are then estimated from these estimates and the MSFA samples. The algorithm uses the binary tree for pixel selection. It takes as input the leaf patterns selected during band selection and, for each of them, first interpolates the missing band information at the pixel locations of its sibling pattern; it then goes up one level in the binary tree to find the sibling of its parent pattern. If that sibling is an internal node, the leaf patterns in the subtree below it are examined. This step is repeated up to the root of the tree.
• Interpolation: this module estimates the value of each selected missing pixel in the selected band. The estimate at position (x, y) is the weighted sum of its four neighboring pixels, whose weights are derived from their edge magnitudes. The weights of the two vertical neighbors and of the two horizontal neighbors are given by Eq. (6) and Eq. (7), respectively, and the estimated value \hat{I}(x, y) is given by Eq. (8):

$$\hat{I}(x,y) = \frac{\sum_{(m,n) \in \mathcal{N}(x,y)} w_{m,n}\, I(m,n)}{\sum_{(m,n) \in \mathcal{N}(x,y)} w_{m,n}} \qquad (8)$$

where \mathcal{N}(x, y) denotes the four neighbors of (x, y) and w_{m,n} their edge-based weights.
The original algorithm was developed for seven bands. Since our MSFA has eight bands, the algorithm was modified for eight bands, taking into account the probability of appearance of each band.
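Two ingredients of BTES, the POA-based band ordering and the edge-sensing estimate of Eq. (8), can be sketched as follows. The weight formula is an assumption, since Eq. (6) and Eq. (7) are not reproduced here: each direction receives w = 1 / (1 + edge magnitude), where the edge magnitude is simply the absolute difference between the two neighbors along that direction. Ties in POA are broken by band index, whereas the method described above selects bands of equal POA in random order. With eight bands and two pixels per band in a 4 × 4 moxel, every band has the same POA of 1/8, so the ordering is entirely determined by that tie-breaking. Function names are hypothetical.

```python
import numpy as np


def band_order_by_poa(moxel):
    """Band-selection order: bands sorted by probability of appearance (POA)."""
    bands, counts = np.unique(moxel, return_counts=True)
    poa = counts / moxel.size
    # Highest POA first; equal POAs keep increasing band-index order.
    order = np.argsort(-poa, kind="stable")
    return [int(bands[i]) for i in order], dict(zip(bands.tolist(), poa.tolist()))


def edge_sensing_estimate(img, r, c, d=1):
    """Estimate pixel (r, c) from its four neighbours at distance d (Eq. (8) form).

    Assumed weights: w = 1 / (1 + edge magnitude), the edge magnitude being
    the absolute difference between the two neighbours along that direction.
    """
    up, down = float(img[r - d, c]), float(img[r + d, c])
    left, right = float(img[r, c - d]), float(img[r, c + d])
    w_v = 1.0 / (1.0 + abs(up - down))      # weight of each vertical neighbour
    w_h = 1.0 / (1.0 + abs(left - right))   # weight of each horizontal neighbour
    num = w_v * (up + down) + w_h * (left + right)
    return num / (2.0 * w_v + 2.0 * w_h)
```

Because the weight shrinks as the difference across a direction grows, interpolation follows the direction along which the image is smooth: with equal vertical neighbors (10, 10) and very different horizontal ones (0, 100), the estimate stays close to 10 instead of being pulled toward the horizontal mean.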

F. Vector Median based MSFA Demosaicing
Gupta et al. [30] proposed a vector-based CFA demosaicing algorithm, which estimates the missing colors by selecting the color vector that minimizes the sum of the distances to the neighboring pixels. This approach is based on the notion of pseudo-pixel, defined as a group of neighboring red, green, and blue values (horizontally and vertically). Xingbo et al. [22] extended this technique to MSFA demosaicing. According to the authors, the extended method has two specific features: first, the pseudo-pixels are formed according to the dimensions of a moxel, the mosaic of elementary filters repeated across an MSFA; second, the pseudo-pixels considered are those connected horizontally, vertically, and diagonally.
Given a set of vectors X = {x_1, ..., x_N}, the median vector x_{VM} of X is defined as:

$$x_{VM} = \arg\min_{x_i \in X} \sum_{j=1}^{N} \lVert x_i - x_j \rVert_p \qquad (9)$$

The median vector used to demosaic the multispectral images is computed as follows:
• For each vector x_i, compute the sum of the distances to all other vectors using the L1-norm or L2-norm, as given by Eq. (10):

$$S_i = \sum_{j=1}^{N} \lVert x_i - x_j \rVert_p, \qquad p \in \{1, 2\} \qquad (10)$$

• The median vector is the vector x_k whose sum S_k is the minimum of {S_1, ..., S_N}.

This method was tested by the authors with a CFA and with 4-band and 8-band MSFAs. For our camera, the band wavelengths were set to {685, 720, 770, 810, 835, 870, 895, 930} nm for the eight bands.
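The median-vector selection of Eq. (9) and Eq. (10) can be sketched in a few lines of NumPy. The sketch assumes the pseudo-pixels have already been formed (grouping by moxel size and horizontal, vertical, and diagonal connectivity is not reproduced) and are passed as the rows of an (N, B) array, one B-band vector per row; `vector_median` is a hypothetical name.

```python
import numpy as np


def vector_median(vectors, p=1):
    """Return the vector of the set that minimises the sum of L_p distances to all others.

    vectors: array-like of shape (N, B), one B-band (pseudo-)pixel per row.
    p: 1 for the L1-norm, 2 for the L2-norm.
    """
    v = np.asarray(vectors, dtype=np.float64)
    diffs = v[:, None, :] - v[None, :, :]          # (N, N, B) pairwise differences
    dists = np.linalg.norm(diffs, ord=p, axis=2)   # (N, N) pairwise distances
    sums = dists.sum(axis=1)                       # sum of distances, Eq. (10)
    return v[np.argmin(sums)]                      # minimiser, Eq. (9)
```

Unlike a band-by-band scalar median, the result is always one of the input vectors, so no new spectral combination is created: for the vectors (0, 0), (1, 1), and (10, 10), the outlier (10, 10) is ignored and (1, 1) is returned with both norms.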

IV. EXPERIMENTAL RESULTS AND DISCUSSION

A. EXIST Database and Experimental Setup
EXIST is a multispectral image database collected with the MSFA one-shot camera described above. The EXIST dataset is composed of 2100 raw face images of 105 subjects. Each raw image is 2072 × 1104 pixels. After demosaicing, each image is of size 2072 × 1104 × 8, the eight bands corresponding respectively to the wavelengths {685, 720, 770, 810, 835, 870, 895, 930} nm. Fig. 3 shows some images of the EXIST database.

B. Evaluation Metrics
To identify a suitable demosaicing algorithm, evaluations were performed on three criteria, image quality, decorrelation factor, and recognition rate, using NIQE, the correlation coefficient, and recognition accuracy as metrics, respectively. Based on the literature [31]-[33], there are two types of image quality metrics: full-reference metrics, such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Mean Square Error (MSE), and no-reference metrics, such as NIQE, BRISQUE, and PIQE. Since the EXIST database contains only raw images without reference images, the no-reference NIQE metric is used to assess the quality of the images demosaiced by the different methods.
NIQE was developed by researchers at the University of Texas [34], and C. Kawan et al. [35] used it to compare demosaicing methods for Mastcam images. NIQE is a no-reference image quality metric that uses statistical features of the input image to evaluate its quality. To compute the NIQE value, the image is divided into smaller patches, and the features extracted from them are modeled as a multivariate Gaussian (MVG) distribution. The NIQE score is the distance between this MVG and an MVG model learned from a corpus of pristine natural images, so that a lower value indicates better quality.
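The final step of NIQE, the distance between the pristine-image MVG model (mean ν₁, covariance Σ₁) and the MVG fitted to the test image (ν₂, Σ₂), is D = sqrt((ν₁ − ν₂)ᵀ ((Σ₁ + Σ₂)/2)⁻¹ (ν₁ − ν₂)). The sketch below computes only this distance; feature extraction (normalized local coefficients, patch selection) and the pristine model itself are not shown, and the function name is hypothetical. For complete scores, MATLAB's Image Processing Toolbox provides a ready-made `niqe` function.

```python
import numpy as np


def niqe_distance(mu_model, cov_model, mu_test, cov_test):
    """Distance between the pristine-image MVG model and the test-image MVG.

    D = sqrt((mu1 - mu2)^T ((S1 + S2) / 2)^(-1) (mu1 - mu2))
    A pseudo-inverse is used in case the averaged covariance is singular.
    """
    diff = np.asarray(mu_model, dtype=float) - np.asarray(mu_test, dtype=float)
    pooled = (np.asarray(cov_model, dtype=float) + np.asarray(cov_test, dtype=float)) / 2.0
    return float(np.sqrt(diff @ np.linalg.pinv(pooled) @ diff))
```

With identity covariances the distance reduces to the Euclidean distance between the means, for example 5 between (0, 0) and (3, 4); a test image whose feature statistics match the pristine model scores 0.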
After the image quality evaluation with NIQE, the intercorrelation of the demosaiced images is evaluated by computing correlation coefficients. This coefficient measures the degree of similarity between a pair of images: a value of 1 indicates a perfect positive linear relationship, as between identical images. The method described in [36] was used to calculate it. Then, to identify the demosaicing method that best combines image quality and decorrelation factor with recognition performance, the demosaiced images were used to train four models based on the VGG19 architecture. VGG19 is a VGGNet architecture proposed by K. Simonyan et al. [37] in 2015. It contains 19 weight layers, 16 convolutional and 3 fully connected, together with five max-pooling layers. The VGG19 CNN is used as a pretrained model. Accuracy, the percentage of correct predictions, is used to evaluate recognition on the demosaiced images.
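Inter-band correlation of a demosaiced cube can be sketched with the Pearson correlation coefficient, as below. This is an assumption: the paper relies on the method of [36], which may differ in detail. The cube is assumed to be stored as an (H, W, B) array, and the function names are hypothetical.

```python
import numpy as np


def band_correlation(cube):
    """Pearson correlation matrix between the bands of an (H, W, B) cube."""
    h, w, b = cube.shape
    flat = cube.reshape(h * w, b).T   # one row per band, one column per pixel
    return np.corrcoef(flat)          # (B, B) matrix, 1 on the diagonal


def mean_interband_correlation(cube):
    """Average of the off-diagonal correlation coefficients (one value per image)."""
    r = band_correlation(cube)
    off_diag = ~np.eye(r.shape[0], dtype=bool)
    return float(r[off_diag].mean())
```

A band and any positive affine rescaling of it correlate at exactly 1, and a sign-flipped band at -1, which is why a coefficient near 1 across bands signals redundant spectral information rather than strictly identical pixel values.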

C. Results and Discussion
To carry out the test, the same procedure was followed for each method: demosaicing the raw images, measuring the NIQE values and demosaicing time, computing the correlation coefficients between the demosaiced bands, and recognizing the demosaiced images while measuring the recognition time. This procedure led to the results presented in Fig. 4. Visually, the binary tree produces sharper images of the different parts of the face than the other three demosaicing methods. Visual evaluation alone is not sufficient, however, so the NIQE metric was used to confirm or refute this assessment: the smaller the NIQE value, the better the quality of the demosaiced image. Fig. 5 shows the average NIQE computed over all images for the four demosaicing methods.
Across all demosaicing methods, the NIQE values of the resulting images range from 8.75 to 14.38, depending on the band and the method. The NIQE values of the images demosaiced by bilinear interpolation range from 10.68 to 12.97, those of the binary tree (BT) from 8.75 to 9.29, those of the DWT from 10.51 to 11.42, and those of the vector median (VM) from 11.68 to 13.66.
The analysis of the NIQE values per band and per demosaicing method shows that the binary tree yields the lowest NIQE value in every band among all methods, with an average NIQE of 8.99. Based on these values, the binary tree provides better-quality images than the other methods, which confirms the visual observation made above.
Execution times were then measured to compare the four methods. Fig. 6 shows the execution time of each.
Fig. 6 shows that the shortest running time of a demosaicing algorithm, 12.78 s, is obtained with bilinear interpolation; this is slightly less than the time of the binary tree. The median vector is the method with the longest execution time. Combining execution time and NIQE value, the binary tree is the demosaicing method that yields the best-quality images in a reasonable time.
To study the decorrelation factor of the demosaiced images, we compute the correlation coefficients between their bands.
Fig. 7 shows the average correlation coefficient between bands for each demosaicing method.
The demosaiced images are strongly correlated, with an average correlation coefficient of 0.9 for all methods. Nevertheless, the analysis of these figures shows that the intercorrelation between bands is lower for bilinear interpolation than for the other demosaicing methods, so bilinear interpolation yields the least correlated images.
After demosaicing and comparison, the VGG19 convolutional neural network was used for feature extraction and classification of the images demosaiced by each method. The dataset was identical for all methods. The training, validation, and test sets were formed by random selection, with 80% for training, 10% for validation, and 20% for testing. In all, 2,200 demosaiced images of 300 × 300 pixels, organized into 110 classes, were used.
Tables I to III give, respectively, the VGG19 training parameters, the accuracy, and the recognition time per image for each method. Table II shows that the accuracies obtained range from 80% to 100%; images demosaiced with bilinear interpolation and the binary tree reach the best accuracy of 100%. Table III shows that the time needed to recognize a demosaiced image ranges from 2 s to 6 s, and that images demosaiced with the binary tree are recognized faster than those of the other methods.
In conclusion, these results show that the binary tree based demosaicing method is the best of the four methods in terms of image quality, recognition time, and facial recognition accuracy.

V. CONCLUSION
We proposed an evaluation of demosaicing methods for facial recognition with MSFA one-shot cameras, in order to identify a suitable demosaicing algorithm in terms of image quality, computation time, and decorrelation factor. This study uses and compares four demosaicing algorithms on a database of raw images acquired with an MSFA one-shot camera. The binary tree demosaicing method yielded the best-quality images, the shortest recognition time, and the highest facial recognition accuracy (100%, shared with bilinear interpolation). Demosaicing affects the facial recognition system in terms of both image quality and time: the better the demosaiced images, the better the accuracy of the facial recognition system. The minimum demosaicing time, 12.78 s, is obtained with bilinear interpolation. This is far longer than CFA demosaicing, which takes a few milliseconds.
In future work, this system will be optimized to achieve real-time operation.
This paper presents a comparative study of four demosaicing methods to identify the one that gives:
• the best image quality;
• the best decorrelation factor;
• the best facial recognition score;
• the best computation time.

Fig. 3. Samples of raw images of the EXIST database.

The experiments were carried out on Microsoft Windows, version 2010, with two computers. The first was equipped with an Intel(R) Core(TM) i7-8565U CPU and 8 GB of RAM; the second with an NVIDIA Quadro P400 graphics processing unit (GPU) and 32 GB of Random Access Memory (RAM). All code was written in MATLAB 2020 and Python 3.7.


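Test procedure followed for each demosaicing method:
• Demosaicing of the raw images.
• Computation of the NIQE values and demosaicing time of the demosaiced images.
• Calculation of the correlation coefficients between the demosaiced image bands.
• Calculation of the averages for each demosaicing method.
• Recognition of the demosaiced images.
• Computation of the recognition time.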

TABLE I. PARAMETERS USED IN THE TRAINING PROCEDURE