PlexNet: An Ensemble of Deep Neural Networks for Biometric Template Protection

The security of biometric systems, especially protecting the templates stored in the gallery database, is a primary concern for researchers. This paper presents a novel framework using an ensemble of deep neural networks to protect biometric features stored as a template. The proposed ensemble chooses two state-of-the-art CNN architectures i.e., ResNet and DenseNet as base models for training. While training, the pre-trained weights enable the learning algorithm to converge faster. The weights obtained through the base model is further used to train other compatible models, generating a fine-tuned model. Thus, four fine-tuned models are prepared, and their learning are fused to form an ensemble named as PlexNet. To analyze biometric templates’ security, the rigorous learning of ensemble is collected using a smart box i.e., application programming interface (API). The API is robust and correctly identifies the query image without referring to a template database. Thus, the proposed framework excludes the templates from database and performed predictions based on learning that is irrevocable. Keywords—Biometrics; template protection; deep learning; transfer learning; ensemble


I. INTRODUCTION
Identity theft is one of the major epidemics of current century. In the absence of a reliable identity proofing system, data and information thefts have plagued the applications such as online transactions and social welfare schemes [1]. Traditional security systems such as ID cards and passwords cannot protect from digital impersonation thus, obsolete due to their lost and stolen possibilities [2]. To overcome the problems of lost and deliberate sharing of identity markers, biometric recognition has introduced and gradually became the prime tool for individual authentication. Biometrics uses the physiological or behavioural traits of an individual for identification. Commonly used physiological traits include face, fingerprint, palmprint and hand geometry, while signatures, gait, voice are used as behavioural traits. These biometric traits are proved to be personal, reliable, accessible and universal [3].
Although, the biometric systems are reliable in comparison to traditional identification systems, they are vulnerable to several exploits. The biometric system vulnerabilities can be broadly categorized as faults, failures and attacks [4]. Faults are mistakes made by human or system such as data corruption, software aging and storage space fragmentation [5]. A failure is the consequence of one or more faults. It may be service failures, development failures or security failures. Unexpected service, wrong prediction of complexity or unaccounted situations and imbalance thresholds are some examples of service, development and security failures, respectively.
The faults and failures may lead to fraudulent attacks on machine (i.e., hardware and software) as well as at the administrative level [2]. These attacks can be broadly classified as the sensor level, feature extraction level, matcher level and database level attacks. At sensor level, covertly acquired fake samples such as digital face images, synthetic fingerprints or recorded voice can be presented. The replay of raw biometric trait or injection of false data before pre-processing or feature extraction are common feature extractor level attacks. The attacks on communication channel between feature extraction and matcher module is considered as matcher level attack. For example, infested bug to the algorithm or alteration of match score are some matcher level attacks [6].
The attacks on database consists of templates, are possibly the most prominent fraudulent attacks on a biometric system. The template includes biometric characteristics of an individual that may be compromised, if attacked. Storage of templates in diverse applications create a serious threat to the user's privacy [27]. Thus, a robust mechanism is required to secure the templates stored in the database. An ideal mechanism of template protection should meet the following requirements [8]- [10], [29].
• Cancelability: The compromised template must be revoked by reissuing a unique template from the same biometric features.
• Diversity: It defends user privacy by ensuring that same template is not being used across databases.
• Security: The recovery of original template from the compromised ones must be computationally harder. It avoids the fabrication of a physical spoof from the compromised template.
• Performance: The template protection mechanism should not affect the recognition performance of the biometric system.
Despite several template protection schemes, the protection of biometric templates while preserving it's discriminability is still a challenge [8]- [9], [12]- [28], [30], [16]- [17]. In this paper, a novel framework is designed to collectively meet all the requirements of a robust biometric template method. It achieves high performance through rigorous learning of ensemble that forms the basis for exclusion of template from the database. The absence of a template ensures the cancelability and security requirements, whereas immovable learning avoids it's use in diverse applications.
The proposed ensemble is named 'PlexNet' due to it's resemblance to the plexus such as network of nerves in the nervous system [7]. It is a network of pre-trained and finetuned architectures of the variants of deep networks i.e., ResNet and DensNet thus, creating four network paths as shown in Fig. 1. Initially at each path, one of the architectures from ResNet or DenseNet is used as base model trained on biometric sample e.g., facial images with weights of ImageNet database. The weights obtained from base model are further fine-tuned with a compatible architecture. The combination of fine-tuned weights from all network paths results an ensemble the 'PlexNet'.
The contribution of the paper is for the protection of biometric templates using ensemble learning. The plexnet is prepared using ensemble of transfer learning. As transfer learning is a well-established method for converging to the desired goal faster and more accurate, hence we are implementing transfer learning as a tool. Also, the existing ensemble methods generally use majority voting or averaging mechanism that made predictions based on the individual training of the models. Whereas, the proposed ensemble method combines the learning by fine stacking all the fine-tuned models and then retrains on the target dataset. Therefore, the training of ensemble takes the advantage of the weights gathered by each of the finetuned model and converges faster with better accuracy than the traditional ensemble models.
The learning of the PlexNet is conducted in an application programming interface (API). The API acts as an smart box to predict the class label for the presented biometric template i.e., facial image as shown in Fig. 2. The learning at API is done so vigorous that it correctly identifies the input image without referring to a template. Thus, the prepared API eliminates the requirement of nurturing a template database. Further, it follows the requirements of a robust template protection method. For example, cancelability i.e., there is no chance to compromise the templates. The learning of the model can not be directly transferred to other models, thus, diversity is ensured. It matches security paradigms, as the recovery of original image is almost impossible. The high performance reported by our proposed PlexNet framework forms the basis for exclusion of the template database. Briefly, the substantial contributions of this work are as follows: 1. A novel ensemble of deep neural networks is designed for biometric template security, as called 'PlexNet' due to it's resemblance to the plexus in the nervous system.
2. An innovative learning procedure is adopted using pretrained models that results in faster training and testing, that further solves the problem of class imbalance.
3. Primally, application program interface (API) is constructed that predicts the output based on rigorous learning without referring to a template database.
4. The efficacy of proposed framework is evaluated on challenging databases of face biometrics e.g., VG-GFace2 and MegaFace. PlexNet outperforms other methods and supports presented method of template exclusion thus, making it impregnable.
The rest of the paper is organized as follows. The literature survey of the biometric template protection is presented in Section II. The proposed ensemble framework with a detail description of architectures used and the requirement of transfer learning, ensemble learning and data augmentation are discussed in Section III. The experimentation process that includes database description, experimental setup, results along with security analysis are presented in Section IV. Finally, the conclusions are drawn in Section V.

II. LITERATURE SURVEY AT A GLANCE
The rapid growth in technology and sensing have increased the demand of deployment of the biometric recognition system e.g., at residences, offices, portable devices and other public access checkpoints. Though, this easiness has brought numerous challenges of biometric vulnerabilities that effects the system may be exploited by an adversary. Thus, an efficient and robust mechanism for biometric template protection is essential. The information captured by the sensor may differ for the same biometric trait. So, the conventional method of password protection using cryptographic hash functions can not be directly applied to a biometric system. Normal encryption is not a sustainable solution, as the saved template must be decrypted every time to make the comparison with the query template. Several biometric template protection techniques have been developed [8], [9], [12]- [28], [30], [16]- [17]. Various biometric template protection techniques are classified as follows, 1. Biometric Cryptosystems: Biometric cryptosystem was originally developed to protect keys either using biometric features or generating a cryptographic key [8], [9]. Later, it evolved as a technique for template protection. In a biometric cryptosystem, public information known as helper data is stored in the database instead of original biometric template [10]. This helper data is independent of biometric template and do not reveal the biometric features. It just helps to recover the key used for encryption. Hence, matching is performed between the key extracted from the query template and the stored template key. Biometric cryptosystems can be classified into key binding and key generation approaches [48]. These algorithms differ in the way they generate the helper data. (a) Key binding approach: In the key binding approach, the helper data is obtained by associating a key with the biometric template. The helper data is stored as a template in the database. Since the key is independent of the biometric data, it is really hard to recover the original template or the binding key for the helper data. The stored template is used for key recovery at the time of matching with query template. Thus, the authentication is performed with matching of query and stored keys. This approach is primarily used for key protection. The utilization of these approaches for template protection may not fulfil the criteria of cancelability. Here, the matching is based on error correcting code that may produce incorrect results. Further, if the key is leaked, the original template can easily be acquired.
Fuzzy vault and fuzzy commitment are cryptographic based protection techniques [20], [12]- [13]. A fuzzy vault scheme where a key k is locked using an unordered set A is proposed by Juels and sudan [20]. During the enrollment phase, a polynomial p comprising of the biometric features is used to encode key k resulting in p(k). The unordered set A is projected on p and some false points are added to secure the location of p. During authentication, if the query template overlaps the genuine location of p, key is reconstructed. Further, the idea was extended by Uludag et al., securing fingerprints [21]. Nandakumar et al., proposed the idea of using password for further strengthening fuzzy vault [22]. Though a robust technique in the past, this scheme fails to provide revocability and protection from matching across biometric systems.
Another key binding approach i.e., fuzzy commitment is developed combining two approaches of cryptography and error correcting code. Here, during enrollment phase, a random key is encoded using error-correcting codes. The resultant is a codeword c. A hash value of c is also computed. The XOR operation is performed between generated codeword c and the biometric features, resulting in an encrypted template. This encrypted template known as helper data, is stored along with the hash value of c. During authentication, the features from query template are decoded and XOR operation with helper data is performed to obtain codeword c . If hash c matches the hash c , the user is accepted as genuine. Further, this scheme is extended using helper data with face biometric [15].The limitation of the fuzzy commitment scheme lies in the use of the template as helper data. If the biometric data is non-uniform, helper data may leak information on the secret key as well as about the template itself [14].
(b) Key generation approach: In this approach, the helper data is generated using biometric template.
The key is generated from the helper data and query template. The basic principle of key generation technique is the quantization of helper data that produces the cryptographic keys [24], [26]. The biometric feature vector is divided into several intervals of feature elements. These intervals are selected from multiple biometric instances and encoded to store as helper data. During authentication, a similar approach is applied to the query template. A correct mapping between the intervals results in a key. The key generation algorithm with a quantization approach, can extract same keys from several instances of biometric data provides high stability. However, making it vulnerable to security attacks due to record multiplicity [19]. Moreover, if separate keys are generated for several instances of biometric data, the system suffers from increased entropy and reduced stability resulting in a high rejection rate.
The concept of secure sketch and fuzzy extractor www.ijacsa.thesai.org was introduced by Dodis et al. [25]. A secure sketch is obtained when a biometric template is encoded with a special algorithm. The query template and the secure sketch is required to decode the template, generating the near matching original template. Fuzzy extractors take input biometric identifiers as noisy data and generates a key as well as a helper data. During authentication, it requires both the helper data and the noisy data (query biometric) and generates the matching key.
2. Template Transformation Approaches: Here, a transformation function ( f ) is applied to the original biometric template (t) and only the transformed template [ f (t)] is stored in the database. The transformation function can be derived from any random key or password-based approach. During matching, the same function ( f ) is applied to the query template (t ) and is compared against the original transformed template . The template transformation can be further categorized into invertible i.e., salting and non-invertible transformations [8], [49].
(a) Salting technique: In salting, the biometric features are transformed using an invertible transformation function. Since a key is used for transformation, it should be secured and only be available for authentication. Due to a fixed key used for transformation, system achieves low false acceptance rate (FAR). Connie et al., used pseudo random keys for hashing palmprint templates [30]. A salting technique for face templates is proposed by Teoh and Ngo [31]. They also introduced a biophasor technique based on salting [28]. The transformed template can be inverted into the original if the transformation function is known. Thus, the salting technique suffers from the problem of invertibility making it susceptible to adversary attacks. Random projection is another example of salting technique, where the original biometric template is projected to another domain by a matrix with zero mean and unit variance. The problem with random projection is that the original matrix can be recovered if the projection matrix is known to the adversary. Zhe et al., proposed a ranking based hashing that demonstrated two notions, one sued random projection while the other used random permutation-based hashing schemes. Hence, generating invertible as well as non-invertible transformations [32].
(b) Non-Invertible transformation: On the contrary, a key is used as a transformation function in the noninvertible transformations making it a one-way transform. So, even if the transformation function is known, the inversion is computationally harder to achieve. A non-invertible transformation function for fingerprint is used by Ratha et al. [34]. In another work, they proposed Cartesian, polar and functional transforms that are non-invertible [11]. These functions transform the fingerprint minutiae e.g., the Cartesian transformation divides the minutiae space into a grid and each cell is rearranged to a new position according to the key. The query template is transformed similarly and matched to the original transform for authentication. The use of robust hash function provides better cancelability and security than salting, but the performance may decrease. One of the reasons for low performance of these biometric systems are that they do matching in transformed domain that results in high FRR. Further, if the transformation function is compromised, the system loses non-invertability. Thus, it is difficult to achieve a balance between discriminability and noninvertability.
3. Hybrid Approaches: Hybrid approaches were also introduced to take the advantages of both biometric cryptosystem and transformation. Feng et al., proposed an approach, where the extracted template is randomly projected to a subspace generating a cancelable template [33]. The discriminability is increased during the transformation and finally fuzzy commitment is used to protect the generated cancelable template. A hybid method based on revocable biotokens for face biometrics is proposed by Boult [35]. Liu et al., combined fuzzy vault technique with multispace random projection to improve the security of palmprints [36]. The fingerprint and palmprint features are combined to build a hybrid system [37]. Further, neural network was introduced in combination with secure sketch [38]. This combination provides high tolerance against noisy data as well as revocability and non-invertibility. Sardar et al., proposed a hybrid solution by combining the non-invertibility with a biocode encryption. This helped in enhancing security levels along with better accuracy [39]. Talreja et al.
proposed a multibiometric template protection using binarization and hashing that uses a key for each individual during identification, hence the use had to have they key during the recognition process [40].

III. PROPOSED ENSEMBLE FRAMEWORK
The machine learning techniques simulate the learning process of the human brain and automatically create significant feature vectors. The major issue of these techniques is their limited generalization capability. For example, if the unseen patterns are to be tested, these methods usually results in unconvinced predictions. To improve generalizability, deep neural networks, particularly convolutional neural networks (CNN's) have evolved. The CNN's are proved to be very effective in several image processing and computer vision applications [44]. The capabilities of CNN's are yet to be explored for biometric template protection. The performance and generalizability of CNNs motivated us to utilize it for securing biometric templates. Two state-of-the-art CNN architectures i.e., ResNet and DenseNet are used in order to form an ensemble.

A. CNN Architectures
The CNN is designed stacking three types of layers i.e., convolution, pooling and a dense or fully connected layer. The convolution layer containing filters with numeric weights is responsible for feature extraction. The input image is convoluted with filters generating feature map. It contains the dominant features while preserving the relationship between the neighbouring pixels. More formally, let the input image be denoted as x, the filter be f and the rows and columns of the convoluted matrix are denoted as a and b, respectively. Then, the convolution operation is defined as, Two CNN architectures i.e., ResNet and DenseNet are chosen for making an ensemble [41], [42]. The selection of these architectures is based on their performances on VGGFace2 and MegaFace databases. In addition, these two architectures have several variants and proved to be complimentary to each other.
1) ResNet Architecture: The main idea behind ResNet is the skipping of one or more layers. The convolutional, relu, pooling and batch normalization layers are added in a redundant manner to form a residual network. The initial block consists of two 3 × 3 convolutional layers where each convolutional layers comprise 64 filters of size 3 × 3. The rectified linear activation (ReLU) is the default activation followed by a batch normalization to maintain the mean activation as zero and the standard deviation closer to 1. The size of the second convolutional layer is reduced using max-pooling of size 2 ×2. After each convolutional layer a dropout is applied to avoid over-fitting [50]. The strength of ResNet lies in its identity short-cut connection that flows the gradient by skipping one or more layers. It solves the vanishing gradient problem where the back-propagated gradient becomes infinitely small due to repeated multiplication. Thus, the identity short-cut connection provides the way to feed-forward the output of a ResNet block directly to the other block in the next layer.
More formally, let Θ l (.) be a non-linear composite transformation function of convolution, ReLU, pooling and batch normalization. The output of a ResNet block at layer l taken as ϒ l , is [41], where ϒ l−1 , represents the input to a ResNet block at layer l. It means output of a ResNet block is depends on input from previous layer and it's non-linear transformation with function Θ l (.). Several such blocks are stacked that create a short-cut path for the gradient that optimizes the back-propagation. Due to stacking, ResNet may results in generating redundant layers.
2) DenseNet Architecture: Huang et al., proposed another solution to vanishing gradient problem using DenseNet [42]. DenseNet comprises of several blocks that are densely connected. The features generated by one block is input to the next dense block. In larger CNN models, there may be feature loss before it reaches to the output layer. DenseNet overcomes this issue with the use of repeated blocks. Further, the DenseNet requires fewer parameters in comparison to the other models such as ResNets, [43].
DenseNet architecture consists of dense block and transition layers. The dense block is a collection of different type of layers connected to similar previous layers. The feature maps are generated using 3 × 3 convolutional layer within a block. The generated feature maps are scaled with a batch normalization layer. The feature maps generated by each layer are concatenated together. Thus, the output of a layer l in a DenseNet unit can be represented as: Here, the transition between two dense layers is achieved through a transition layer that controls the number of connections. The ResNet and DenseNet architectures are found to be complementary to each other due to the fact that both provide solution to vanishing gradient. The ResNet handles it with large number of parameters that is compensated by feature reuse in DensNet.

B. Transfer Learning
In order to overcome the issue of under-fitting in smaller databases, the concept of transfer learning is introduced [51]. The weights of a pre-trained model are utilized while training a new model on a different database. Thus, the learning starts from an elevated point that avoids data insufficiency. Moreover, the identical distribution of training and test data is not required while using transfer learning.
Formally, let, ζ 1 be the learning curve of a model pretrained on database χ 1 . The goal is to improve the learning curve, ζ 2 for a new model with a new database χ 2 . The learnt behaviour of the model pre-trained on, χ 1 is transferred to the predictive function, φ . The function φ in turn, improves the learning curve, ζ 2 on database χ 2 . Several state-of-theart image classification methods are based on transfer learning [53]. Different model are pre-trained on ImageNet that contains millions of images of over 1000 categories [18]. The upper layers of these pre-trained models can be fine tuned to match the current model working on the new database.

C. Ensemble Learning
The pre-trained models perform better than a model that starts learning from the scratch. However, sometimes a single pre-trained model may not be enough to establish a robust learning for a given database due to class imbalance or concept drift [52]. To overcome this issue, ensemble of learning algorithms is used for many classification problems [54]. It performs learning by combining different learning models to a single predictive model. An ensemble network can be built using different classifier architectures, initial weights and training databases. Although, the use of different classifier architecture may be a good choice, it requires compatible architectures. For example, ResNet-152 architecture may accept the initial information from a ResNet-101 or ResNet-50, but can not accept the initial weights from other architectures such as DenseNet or VGG [41], [42].
The proposed ensemble framework i.e., PlexNet is prepared using two pre-trained architectures i.e., ResNet and DenseNet. These architectures are chosen over other architectures such as VGG, MobileNet, ResNetV2 and InceptionResNetV2, due to the following reasons, • The ResNet and DensNet architectures have several variants that are compatible to each other. Thus, an ensemble with different classifier architectures is achieved.
• Secondly, these architectures have achieved better accuracies than other pre-trained architectures as shown in Table I.
The initial training starts with pre-trained models on Im-ageNet database [45]. These initial models are presented as the endpoint of the framework as shown in Fig. 1. The weights obtain after training the initial models on the query database are saved. These weights are then fed to the next level of compatible architectures present at the axons of the PlexNet architecture. Thus, the axon models are fine-tuned and achieve higher accuracy with faster convergence than their predecessors. Finally, these fine-tuned models are combined in an integrated stacking model to prepare a meta-learner i.e., Now, feature level fusion is performed that fuses features from each fine-tuned model using concatenation. The result of fusion is a feature vector representing weights from each class. Further, the features are passed to a fully connected neural network of ensemble model that map the features to their corresponding classes based on expected likelihood.
The robust learning of PlexNet that ensures lower false prediction is collected and saved in a smart box i.e., known to be an API as shown in Fig. 2. The query image does not expose features but maps to the appropriate class while processed with API. The model resembles the learning of a human brain to recognise a person using facial images. Learning with faces, again and again, created patterns in the brain that are to be known. If a similar pattern occurs then, there is no need to seek any reference but the brain matches the description with the learning did in the past and identify the person. Similarly, the API predicts the appropriate class of a query image referencing it's learning. Therefore, our model of template protection need not store templates in database for making predictions. Thus, the requirement of nurturing a template database is eliminated, hence making it impregnable against fraudulent attacks.

D. Data Augmentation
To reduce the over-fitting due to class imbalance or rather unavailability of sufficient data, augmentation of images is performed for accommodating the intraclass variations [46]. Various transformation techniques are applied on the available set of images, such as geometric (e.g., reflection, scaling, rotation, shear, translation) and photometric (e.g., brightness, noise reduction, hue adjustments, edge enhancements).
In order to maintain the heterogeneity of generated images and space-time complexities to be minimum, online data augmentation is introduced. We use the image 'Datagenerator' function available in Keras that apply different transformations in each epoch [47]. For example, the images are flipped in one epoch, whereas zooming is applied in another epoch. As the number of epochs increases, so the number of transformation on the random images per class. Thus, each model learns with a different set of images per class at the axon level.

A. Databases
Two publicly available databases i.e., VGGFace2 and MegaFace are used for training and testing all the architectures. The VGGFace2 database comprises of more than 9000 subjects spanning over different accents, ages and ethnicities [55]. There are more than 350 facial images per subject that are further subjected to augmentation. The size of the images varies from 50 pixels to more than 300 pixels, which are averaged to 225 pixels for the experimentation limitations. The facial images of this database are acquired in an unconstrained environment. It means the images have pose and style variations, front or side views and may contain torso part, hence putting more challenges to a face recognition system [56].
The MegaFace database contains millions of facial images belonging to hundred of classes [57]. The images are compressed using JPEG formats. The three channel colored images are available with a dimension range from 250 × 226 to 300 × 312. These are also changed from their original size to 225 × 225. The classes have an unbalanced distribution of images ranging from 500 to 700. In MegaFace database, images are taken in a constrained environment i.e., the faces are cropped and only the front view is included.

B. Validation Metrics
Let, Pred = (Pred 1 , Pred 2 , Pred 3 . .. . Pred n ) be the predicted classes by the classification algorithm, and Act = (Act 1 , Act 2 , Act 3 . . .. . Act n ) be the actual n classes. The proposed method is evaluated using following validation metrics, • True Positives (TP): is the representation of a correct prediction of a positive class by the model i.e., (Pred i = Act i ), where i represents class labels.
• True Negatives (TN): similar to that of true positives, true negatives are the correct identification of the negative class by the model i.e., (Pred i ∈ (1 − Act)).
• False Positives (FP): If the model predicts the negative class as the positive class it is termed as false positives i.e., (Pred i = Act j ), where i, j represents class labels from predictions and actual class sets, respectively and i = j.
• Confusion Matrix: From the above predicted and actual labels, confusion matrix C can be derived for the classes where C i,i , determines the correct predictions for the i th class whereas C i, j determines the i t h classes that were misclassified as j t h class.

Recall =
T P T P + FN (5) • F1-Score: is the harmonic mean that represents a single measure for both precision and recall.

C. Results
The selection of pre-trained models in PlexNet is based on the accuracy achieved on VGGFace2 and MegaFace databases, as shown in Table I. The accuracies of different architectures are evaluated at two different training and testing database ratio i.e., 80:20 and 90:10. The accuracies at the 80:20 split ratio are found better than the accuracy at split ratio of 90:10 on both databases. So, all the experiments are performed on the training and testing database ratio of 80:20. On VGGFace2 database, the accuracy of ResNet base model is found to be the highest i.e., 93.14% at the training and testing database ratio of 80:20. Similarly, the DenseNet achieves the best accuracy of 92.52% at the base model. The accuracies of all other tested architectures are found lower in comparison to these models. Similarly, on MegaFace database these two architectures achieve better accuracies. For example, the ResNet and DenseNet architectures achieve the accuracies of 96.46% and 97.21%, respectively. Beside the accuracies, ResNet and DenseNet complement each other, hence chosen as base models.
The PlexNet is prepared through the integrated stacking model using pre-trained architectures. The information gathered using all the pre-trained weights helps the ensemble to attain the best accuracies as shown in Fig. 3. The base models chosen for both the databases achieve lower accuracies. On VGGFace2 database, the ResNet base model achieves the accuracies of 88.54%, 93.45% and 93.14% at epochs 1, 5 and 10, respectively as shown in Fig. 3a. Similarly, the DenseNet base model at epochs 1, 5 and 10, reports the accuracies 73.24%, 90.12% and 92.52%, respectively. The fine-tuned model perform better using the learning weights of base models. For example, the accuracies of ResNet and DenseNet fine-tuned models are found to be 94.91% and 93.95%, respectively at 10 epochs. Finally, the fusion of finetuned models results in PlexNet that reports the accuracy of 96.48% on the VGGFace2 database.
The proposed method performs better on MegaFace database. The testing accuracies on MegaFace database is shown in Fig. 3b. Here, the testing accuracies of ResNet base model are 82.66%, 94.56% and 96.44% at 1, 5 and 10 epocs, respectively. The DenseNet base model performs better than ResNet base model. It reports the accuracies of 81.65%, 95.12% and 97.21% at 1, 5 and 10 epocs, respectively. Since, the MegaFace database contains facial images acquired in a constrained environment, it may results in lower intraclass variations, hence better accuracies are achieved using all the models. Here, the ResNet and DenseNet fine-tuned models reported the highest accuracies of 97.57% and 98.16% at 10 epochs, respectively. The overall accuracy of proposed PlexNet on MegaFace database outperforms all other existing methods and found to be 99.61%.
The values for precision, recall and F1-score on MegaFace database found to be better than VGGFace2 database and reported to 0.9948, 0.9946 and 0.9947, respectively. The higher values reported for these metrics on both databases further show the robustness of PlexNet. The values for these metrics demonstrate that VGGFace2 is a challenging database, as it contains images in unconstrained environment with large variations in pose, illumination and ethnicity.
The effectiveness of a biometric system is evaluated in terms of true acceptance rate (TAR). In order to achieve better visualization of the accuracy of the PlexNet, a receiver operating characteristic (ROC) curve is drawn as shown in Fig. 4. The ROC curve plots false acceptance rate (FAR) on x-axis and TAR on y-axis. Thus, a relationship is established between FAR and TAR that can be utilized while deciding a

D. Comparative Analysis
The performance of proposed template protection method is compared with existing methods on various biometric databases as shown in Table II. The template protection methods using fingerprint biometrics reported the GAR of 97% and 94% at 0.1% and 0% FAR, respectively [62], [58]. The discriminability of iris features proved to be better as the method of Talreja et al. reported 99.1% GAR at 0% FAR [40]. However, the iris patterns are taken obtrusively and are inconvenient for users. Therefore, the recognition  in constrained environment. The performance of proposed method is evaluated on both constrained and unconstrained databases. The proposed method reports 92% GAR at 0% FAR on VGGFace2 database that is taken in unconstrained environment. The performance of proposed method improves on MegaFace database. In comparison to other constrained environment databases, our method reports highest GAR of 98.5% at 0% FAR. Thus, the proposed method outperforms other existing methods on both constrained and unconstrained databases.

E. Security Analysis
The security paradigm of the proposed ensemble of biometric template security is analyzed to avoid possible fraudulent attacks on the biometric system. We can consider following parameters to evaluate the robustness of the proposed template protection method.
• User Impersonation: The attacker may try to impersonate as a genuine user. The experimental results show that higher TAR is achieved at marginal FAR on both databases. For example, on the VGG2Face database 99% TAR is achieved at lower FAR of 1%. It can be further reduced without much loss in TAR. The false acceptance is further minimized while using the MegaFace database, as 100% TAR is achieved at 0.8% FAR. The proposed biometric system is to be set as to not allow any false acceptance i.e., FAR of 0%, while maintaining the TAR of 93% and 97% on VGGFace2 and MegaFace database, respectively. Hence, there is less chance of template impersonation while using the unconstrained database. Finally, the false biometric attempts made by an intruder to breach the security of the system can be negligible.
• Denial to Attackers: Generally, in a biometric system the features of a query image are matched with the templates stored in the database. An attacker may gain access to the templates of the gallery in such cases. In the proposed framework of PlexNet the gallery database is removed. The learning of PlexNet is enough to make a correct prediction without maintaining the template database. Therefore, there is hardly any possibility of attacks on the template database. The proposed method predicts the class label for a query image using a smart box i.e., API. The API may be compromised by an attacker. It consists of weights and biases from different pre-trained models having millions of parameters per model. The access of image information from these parameters is almost impossible. Hence, the proposed ensemble ensures the security of biometric templates that are completely irrevocable.
To further strengthen our claims, we performed birthday attack to check the vulnerability of the prepared model against cross-referencing of the database. To achieve this, both the databases were cross referenced against each other. For example, the API prepared using VGG face dataset was fed the MegaFaceDataset and vice-versa. The result of this attack is shown in Fig. 5 and Fig. 6. The ROC curve is considered most accurate if it is close to the top left of the corner and then moves horizontally. Likewise, if the plot is closer to the diagonal, as shown in Fig. 5a and 5b, it shows that the probability of false identification is very low. The birthday attack is further analyzed using positive and negative predictions. In both the cases, the positive predictions tend to be near 0 or exact 0 while negative predictions were closer to 1, as shown in Fig.  6a and 6b. This further strengthens the claim that the proposed model has negligible false acceptance during an adversary attack.

V. CONCLUSION AND FUTURE SCOPE
The deep neural networks have achieved tremendous performance in computer vision, pattern recognition and image analysis. These networks learn intrinsic patterns of data automatically without being programmed. The advantages of using deep neural networks such as CNNs are yet to be explored for the application of biometric template protection. The application of CNNs in biometric recognition has motivated us to utilize the potential for biometric template protection. It has shown that a single learning model may not perform better on complex databases due to class imbalance or data insufficiency. The class imbalance and data insufficiency may results in the problem of over-fitting and under-fitting, respectively. To overcome these issues, an ensemble has proved to be efficient in several state-of-the-art methods of biometric authentication. This paper has presented a detailed study on biometric template protection. It has been achieved through exclusion of templates from the database by exploiting the learning of deep neural networks. The exclusion of the template database requires a high performing network architecture, so an ensemble has designed.
The ensemble called 'PlexNet' is designed using deep neural networks. Two state-of-the-art CNN architectures i.e., ResNet and DenseNet are chosen to form the PlexNet. The selection of these architectures is based on experimentation on a database of millions of images and their complementarity. The selected base models are trained separately using their pre-trained weights from ImageNet. The performance of PlexNet for biometric template protection has evaluated on two publicly available databases i.e., VGGFace2 and MegaFace. It has achieved outstanding recognition results that outperform other state-of-the-art template protection mechanism using face biometrics [56]. The proposed PlexNet model laid the foundation for biometric template protection. An API has prepared that learnt base and fine-tuned models together. The API is acted as a smart box for the query image that has predicted the correct class without exposing the biometric features during training. The proposed template protection framework did not store templates in the gallery for any possible predictions by intruders. Therefore, proposed framework has made possible the exclusion of templates from gallery and performed predictions based on learning that is irrevocable.