An Accelerated Architecture Based on GPU and Multi-Processor Design for Fingerprint Recognition

Fingerprint recognition is widely used in security systems to identify people. Many fingerprint identification systems, built on different techniques and approaches, have been developed in both industry and the scientific literature. Despite the large number of research works in this field, the developed systems still suffer from limitations, particularly those related to real-time computation and fingerprint recognition. Accordingly, this paper proposes a reliable algorithm for fingerprint recognition based on the extraction and matching of Minutiae. It also presents an accelerated architecture based on GPU and multi-processor design in which the suggested fingerprint recognition algorithm is implemented.

Keywords: Minutia; Fingerprint; Architecture design; recognition; Gabor filter; MPSOC


I. INTRODUCTION
Identifying individuals is a challenge for modern society. In this context, biometric recognition is the most popular method of identification. In particular, the fingerprint technique is the most widely used in industry because it is cheap, secure and easy to deploy. Fingerprints are rich in detail and contain different patterns formed by ridges. These patterns define many characteristic points called "Minutiae". Each individual has a unique distribution of Minutiae. Consequently, fingerprints are used in many systems as an identifier of people. A Minutia is defined as a local ridge characteristic. A fingerprint contains various types of Minutiae, but two types are usually used: Termination and Bifurcation. A Termination is the end point of a ridge. A Bifurcation is the point where a ridge merges or splits into branch ridges [1].

A. Fingerprint recognition process
In the scientific literature, the fingerprint recognition process is usually divided into three phases: pre-processing, Minutiae extraction and matching (see Fig. 1).
The first phase, pre-processing, is applied to gray-scale images, essentially to enhance the fingerprint ridges and separate them from the background texture. The first step is to apply a filtering algorithm. This step is important to improve the quality of the image and to reduce noise [2]. Then the binarization step transforms the image into a binary image, which makes the separation between ridges and background easier. The last pre-processing step is skeletonization, which is based on thinning algorithms. The aim of this step is to thin the ridges to a width of only 1 pixel. This preserves the essential information (Minutiae) with a low storage size. Thinning algorithms also reduce the data that represents the Minutiae and make the processing faster and more effective. In the scientific literature, there are many iterative thinning methods, either sequential [49] or parallel [3]. In this step, the most commonly used window size is 3 x 3 pixels, in which case the central black pixel has 8 neighbours to consider [3].
The second phase is the extraction of the Minutiae. Many extraction methods have been developed, including those based on the nearest neighbour pixels around the central pixel [4]. Another extraction method, presented in [5], searches for Minutiae along the thinned ridges. Other algorithms are based on classifier techniques [20]. The works in [6,7] detect Minutiae by following the ridge line without applying thinning algorithms. In these works, different rules and ad-hoc methods are used to handle the problems encountered during extraction. The extraction method proposed in [8] is based on the Data-driven Error Correcting Output Coding (DECOC) classifier. This method has many advantages over the other methods. However, the output depends not only on the selected extraction algorithm but also on the pre-processing phase, essentially binarization and skeletonization.
The matching phase aims to carry out fingerprint verification and its generalization, finding the similarity factor between the input image and the corresponding database. This phase still suffers from increasing time consumption, especially when the number of stored fingerprints is huge [42]. Despite the number of research works on accelerating Minutiae matching [48], many of the proposed architectures are not suitable for large databases when both precision and real-time decisions must be maintained.

B. Problem statement
As mentioned above, pre-processing and Minutiae extraction are important phases that improve not only the quality of the image but also the recognition result. Accordingly, many research works have been conducted to develop techniques and algorithms. Unfortunately, in spite of the number of research works in this field, there is still a lack of systems allowing real-time fingerprint recognition. In [50] [51], the authors propose matching algorithms based on a fingerprint database. However, these algorithms essentially enable data sharing with the database and are not suitable for real-time identification on large databases.
In this context, the objective of this work is to develop a new accelerated architecture, based on GPU and multi-processor design, that implements a new fingerprint recognition algorithm based on the extraction and matching of Minutiae.
The remainder of this paper is organized as follows: Section 2 introduces the developed pre-processing algorithm. Section 3 presents the proposed Minutiae extraction based on the DECOC classifier. Section 4 describes the Minutiae matching phase and presents the adopted method. Section 5 presents the implementation of the suggested system and the obtained results. The paper ends with a conclusion and suggestions for future work.

II. PRE-PROCESSING PHASE
During fingerprint recognition, image pre-processing is an essential phase. The quality of the images collected from the fingerprint sensor is affected by many factors, such as skin condition, collection conditions and the environment. This phase can be divided into four steps: normalization, filtering, binarization and skeletonization.

A. Normalization step:
The normalization step can be defined as a method to reduce the variation in the gray-level values of the image captured by the sensor. A fingerprint is composed of ridges and valleys that form the structure of the texture information. The aim of the normalization step is to facilitate the capture of the pattern and of the fingerprint frequency. However, this step can still confuse ridges and valleys [24].
The normalization step is based on equation (1).
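Equation (1) is not reproduced in this text. As an illustration only, the following sketch assumes the mean and variance normalization commonly used in fingerprint enhancement; the target mean `m0` and target variance `v0` are hypothetical parameters, not values taken from the paper.

```python
import numpy as np

def normalize(image, m0=100.0, v0=100.0):
    """Map a grayscale image to a target mean m0 and variance v0 (assumed form)."""
    img = np.asarray(image, dtype=np.float64)
    mean, var = img.mean(), img.var()
    if var == 0:
        # Flat image: every pixel maps to the target mean.
        return np.full_like(img, m0)
    # Scaled deviation of each pixel from the image mean.
    dev = np.sqrt(v0 * (img - mean) ** 2 / var)
    # Pixels above the mean end up above m0, the others at or below m0.
    return np.where(img > mean, m0 + dev, m0 - dev)
```

With this form the output has mean `m0` and variance `v0` regardless of the sensor's original gray-level range, which is one way to reduce the variation described above.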

B. Filtering step: Gabor
The filtering step is based on the Gabor filter [13], [14], [21]. This filter can represent the local frequencies, as given in equation (2). It can be expressed in both the spatial domain and the frequency domain. Fingerprints can be separated from the background using a Gabor filter algorithm based only on the spatial domain. Indeed, related works use only the spatial domain because the Gabor filter is time-consuming.
In this paper, both the spatial and frequency domains are used to ensure the best filtering results [9]. This choice is made possible by the proposed architecture, which is based on a GPU accelerator.
The Gabor filter is defined by equation (2), where f is the frequency of the sinusoidal plane wave, φ is the orientation, and δx and δy are the space constants of the Gaussian envelope along the x and y axes, respectively.
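The parameters f, φ, δx and δy match the even-symmetric Gabor kernel that is commonly used for fingerprint enhancement: a Gaussian envelope multiplied by a cosine wave along the rotated x axis. Whether equation (2) has exactly this form is an assumption; the sketch below builds such a kernel with NumPy, and the function and argument names are illustrative.

```python
import numpy as np

def gabor_kernel(size, f, phi, dx, dy):
    """Even-symmetric Gabor kernel (assumed form) on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    # Rotate the coordinates by the filter orientation phi.
    x_r = x * np.cos(phi) + y * np.sin(phi)
    y_r = -x * np.sin(phi) + y * np.cos(phi)
    # Gaussian envelope with space constants dx and dy.
    envelope = np.exp(-0.5 * (x_r ** 2 / dx ** 2 + y_r ** 2 / dy ** 2))
    # Cosine plane wave of frequency f along the rotated x axis.
    return envelope * np.cos(2.0 * np.pi * f * x_r)
```

Calling `gabor_kernel(3, 0.1, 0.0, 4.0, 4.0)` yields nine coefficients, which is the kernel size implied by the nine multipliers of the hardware filter described later; the numeric arguments here are placeholders.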

C. Binarization
The binarization step converts the grayscale image into a binary image. The most commonly used binarization method [15] is based on a global threshold, that is, a single threshold calculated for the whole image. The disadvantage of this method appears when the quality varies across the fingerprint. In this case, a binarization algorithm based on a global threshold eliminates many parts of the finger image (Figure 2.a). This paper proposes a binarization method based on a local threshold (Figure 2.b). This method is described as follows:
1) Divide the fingerprint image into masks of 10 x 10 pixels.
2) Calculate the threshold of every mask. The threshold is determined using equation (3).
This method gives good results, especially when separating fingerprints from the background. It is well suited to fingerprints because the distribution of pixel intensity is not homogeneous.
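The two steps above can be sketched as follows. Equation (3) is not reproduced in this text, so the sketch assumes that the threshold of each 10 x 10 mask is the mean intensity of that mask; this choice, and the convention that dark pixels are ridges, are assumptions made for illustration.

```python
import numpy as np

def binarize_local(image, block=10):
    """Binarize each block x block mask with its own threshold."""
    img = np.asarray(image, dtype=np.float64)
    out = np.zeros(img.shape, dtype=np.uint8)
    for r in range(0, img.shape[0], block):
        for c in range(0, img.shape[1], block):
            tile = img[r:r + block, c:c + block]
            threshold = tile.mean()  # assumption: stands in for equation (3)
            # Pixels darker than the local threshold are marked as ridge (1).
            out[r:r + block, c:c + block] = tile < threshold
    return out
```

Because each mask is compared only against its own threshold, a bright region and a dark region of the same print both keep their ridges, which a single global threshold would not guarantee.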

D. Skeletonization
Skeletonization is defined as a thinning algorithm that reduces the thickness of the ridges to a single pixel [24].
The method used for skeletonization [16] is based on scanning the image with a 3 x 3 block neighbourhood [17]. When this method is applied to the whole image, the process takes too much time.
This paper first proposes a modified thinning algorithm, similar to [17], to reduce the execution time of the skeletonization step. To this end, thinning is applied only when the 3 x 3 block contains more than two black pixels. Experimental results show that the number of calls to the thinning function is reduced by one third.
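The gating condition can be sketched as a pre-filter that selects which pixels are handed to the thinning function. The sketch below only shows the selection; the thinning pass itself (for example the method of [17]) is not implemented, and the convention 1 = black ridge pixel is an assumption.

```python
import numpy as np

def thinning_candidates(binary):
    """Return (row, col) of ridge pixels whose 3x3 block holds more than two black pixels."""
    img = np.asarray(binary, dtype=np.uint8)   # 1 = black (ridge), 0 = background
    padded = np.pad(img, 1)                    # zero border so edge pixels have 8 neighbours
    candidates = []
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            if img[r, c] == 0:
                continue
            block = padded[r:r + 3, c:c + 3]   # 3x3 block centred on (r, c)
            if block.sum() > 2:                # gate: more than two black pixels
                candidates.append((r, c))
    return candidates
```

Isolated pixels and ridge end points are skipped by this gate, so the thinning test runs on fewer blocks than a full scan of the image.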

III. MINUTIAE EXTRACTION PHASE
Minutia extraction is the second phase. It aims to find particular points or features in the fingerprint image, called Minutiae. A Minutia can be defined as an intersection of ridges or the ending of a ridge. Minutiae can be classified into two types: (1) Bifurcation Minutia (intersection of two ridges) and (2) Termination Minutia (ridge ending).

A. Discussion
Minutia extraction is based essentially on a classification stage that aims to detect features [25]. Many related works propose algorithms to identify Minutiae in grayscale images. In [26], Minutiae are determined using a 3 x 3 mask: a Minutia is identified if the pixel has one, or more than two, neighbors. However, this approach is extremely unreliable when the quality of the fingerprint image is poor. Many works perform Minutia extraction using different classifiers such as SVM [36], Bayes [37], neural networks, fuzzy logic, etc. Table 1 summarizes the most important contributions.
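The neighbour-counting rule attributed to [26] can be sketched on a thinned (one pixel wide) image. Mapping one neighbour to a Termination and more than two neighbours to a Bifurcation follows the definitions given earlier, but it is an assumption about [26]; the label strings are illustrative.

```python
import numpy as np

def classify_pixel(skeleton, r, c):
    """Label a skeleton pixel by counting its 8-neighbours in a 3x3 mask."""
    padded = np.pad(np.asarray(skeleton, dtype=np.uint8), 1)
    if padded[r + 1, c + 1] == 0:
        return "background"
    block = padded[r:r + 3, c:c + 3]
    neighbours = int(block.sum()) - 1      # exclude the central pixel
    if neighbours == 1:
        return "termination"               # ridge end point
    if neighbours > 2:
        return "bifurcation"               # ridge splits into branches
    return "ridge" if neighbours == 2 else "isolated"
```

A single noisy pixel next to a ridge changes the neighbour count, which is consistent with the remark that the rule becomes unreliable on poor-quality images.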

TABLE I. COMPARISON BETWEEN DIFFERENT METHODS FOR MINUTIAE EXTRACTION
Method                            FAR      FRR
HAO GUO method [32]               4.18%    9.93%
OMER SAEED method [33]            1.12%    Not indicated
Ying HAO method [34]              1%       2.5%
Jiong Zang method [35]            0.04%    1.31%
Ching-Tang et al. method [36]     0.5%     0%
Mossaad et al. method [27]        0%       0.02%

In this paper, the DECOC classifier is used for Minutia extraction, based on the comparison in [27]. This classifier has been adopted in many fields, such as manuscript recognition, and for this reason it is applied here to fingerprint recognition. The next section gives a brief definition of the adopted method.

B. Error Correcting Output Coding (ECOC)
In this section, the Error Correcting Output Coding (ECOC) algorithm is described. The DECOC classifier is based on the ECOC classifier, which has been used in different fields such as network communication and information theory. Its main purpose is the reliable transmission of binary signals and the integrity of information. The idea is to add redundant parity bits to an information word. The resulting word is called a code word, and it is a binary code string. The ECOC algorithm then calculates the distance between two code words using the Hamming distance, as shown in equation (4), which is the number of bits that differ between the two patterns.
In equation (4), w_j is the ideal code word assigned to group j, and H(w_j, w(x)) is the function that computes the Hamming distance between w_j and w(x).
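As an illustration of how the Hamming distance is used, the sketch below assumes the usual ECOC decoding rule, in which the observed word w(x) is assigned to the group j whose ideal code word w_j is nearest. The exact form of equation (4) is not reproduced in this text, and the code book in the usage note is hypothetical.

```python
def hamming(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("code words must have the same length")
    return sum(x != y for x, y in zip(a, b))

def decode(codebook, word):
    """Return the group j whose ideal code word w_j is closest to the observed word."""
    return min(codebook, key=lambda j: hamming(codebook[j], word))
```

For example, with the hypothetical code book `{"A": "00000", "B": "11111"}`, the observed word `"11011"` differs from `"B"` in one bit and from `"A"` in four, so it is assigned to `"B"` even though one bit is wrong; this error tolerance is the reason for adding the redundant bits.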
Data-driven ECOC (DECOC) is an extended version of ECOC. Its aim is to provide new solutions to the multi-class problem. To this end, DECOC, in comparison with ECOC, is applied to two multi-class pattern recognition problems [8].

C. Methodology: Data-driven ECOC
Data-driven ECOC (DECOC) chooses the code words using specific information from the training data. For a K-class problem, as in pairwise coupling, a decomposition algorithm is proposed. K*(K-1)/2 base learners are always needed.
A measure called the confidence score determines which learners are included in the ensemble. The DECOC classifier is based essentially on two parameters: the separability criterion and the confidence score [8].
As mentioned before, the DECOC classifier is used to decide the type of each block of pixels. The classifier can be defined in two parts as follows:
• The mask size used for the image is 5 × 5 pixels; it carries more information than the 3 × 3 mask and sufficient information compared with the 7 × 7 mask. The mask size must be an odd number to ensure the presence of a central pixel.

IV. THE MINUTIAE MATCHING PHASE
The matching phase is the mechanism responsible for making a decision based on a similarity parameter between two fingerprints. The literature offers different Minutiae matching methods, based either on types of features [29][30][31] or on image correlation [28].
The Minutiae matching phase aims (1) to make a reliable decision and (2) to ensure a real-time response.
The first aim faces problems of variation, such as the displacement of the finger, its rotation, nonlinear distortion, the pressure of the finger on the sensor, the skin condition and feature extraction errors [1].
Minutiae matching methods are classified into two families [28]:
• Global Minutiae matching: it aims to align the Minutiae of the two compared fingerprints based on two directions and the angle. Works based on global Minutiae matching still suffer from imprecision and false identification [28].
• Local Minutiae matching: it aims to compare two fingerprints according to the relationships between neighbouring Minutiae. Local Minutiae matching is more widely adopted than global Minutiae matching [28].
In the literature, there are mainly three proposed methods for the matching phase. The first is the classical matching algorithm [53]. In this case each Minutia is identified by its neighbouring Minutiae and the comparison is made by pairs. When two pairs are detected as similar, the rest of the comparison is made using coordinates and angles.
The second Minutiae matching algorithm [54] considers the topology of the Minutiae within a fixed radius, and the comparison relies on the similarity of the local topology.
The third Minutiae matching algorithm [55] combines the methods of [53] and [54]. Each Minutia is described using its position and the relative position of neighbouring Minutiae.
To sum up, the last method is the most precise, but its time consumption is high. This paper proposes a modified algorithm based on [55]. The proposed matching algorithm uses the location coordinates to extract new information and relationships between Minutiae without being tied to a fixed reference, as shown in figure 5. In our case, a Minutia is represented by equation (7). The matching method is based on three parameters:
• Direction between two successive Minutiae (see figure 6), using equations (9) and (10).
In conclusion, every fingerprint is associated with a unique signature composed of the distance, type and direction between two Minutiae (see equation (12)).
The matching algorithm proceeds as follows:
1) Compare the input distance with all the distances stored in the database. If the difference is lower than ε=0.01, proceed to the next step.
2) Compare the input type with the corresponding stored type in the database. If the types are equal, proceed to the next step.
3) Compare the input direction with the corresponding stored direction in the database. If the difference is lower than ε=0.05, the Minutia is accepted and considered a true Minutia.
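The three steps above can be sketched as a cascade of tests on (distance, type, direction) triples. The sketch assumes that each ε bounds the absolute difference between the input value and the stored value, and it takes the triples as given rather than computing them, since equations (7) to (12) are not reproduced in this text. The names `Signature`, `matches` and `count_matches` are illustrative.

```python
from typing import NamedTuple

EPS_DISTANCE = 0.01   # tolerance from step 1
EPS_DIRECTION = 0.05  # tolerance from step 3

class Signature(NamedTuple):
    distance: float   # distance between two successive Minutiae
    kind: str         # Minutia type, e.g. "termination" or "bifurcation"
    direction: float  # direction between two successive Minutiae

def matches(query, stored):
    """Apply the three tests in order, stopping at the first failure."""
    if abs(query.distance - stored.distance) >= EPS_DISTANCE:
        return False
    if query.kind != stored.kind:
        return False
    return abs(query.direction - stored.direction) < EPS_DIRECTION

def count_matches(query_sigs, stored_sigs):
    """Count query entries accepted against at least one stored entry."""
    return sum(any(matches(q, s) for s in stored_sigs) for q in query_sigs)
```

Ordering the tests this way lets most non-matching entries be rejected after a single floating-point comparison, which matters when the stored database is large.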
The second aim of the Minutia matching phase in this paper, as mentioned above, is to ensure real-time execution. Many identification systems are challenged by real-time decisions, and many related works in the literature deal with this issue [42], [43], [44]. Real-time execution is addressed either through identification over distributed databases or through the identification time itself.
In [50], the authors propose a Minutiae matching phase based on a client-server system in which the fingerprint database is spread over many servers. However, this kind of system does not support parallel execution, which is why the client-server system is not suitable for very large databases.
In [51] and [52], the authors propose a Minutiae matching phase based on agent-based systems. This kind of system is adopted for heterogeneous hardware architectures, with the objective of ensuring load balancing between shared machines. Unfortunately, as mentioned in [51], the execution time is still too high to guarantee real-time decisions for large databases.
The cited research works aim to provide identification for distributed databases, whereas this paper focuses on decreasing the identification time for large databases. The next section introduces the state of the art and proposes an architecture based on GPU and multi-processor design for fingerprint recognition to accelerate identification.

V. EXPERIMENTAL RESULTS
Implementing fingerprint recognition is a challenge, especially for systems whose database contains a huge number of fingerprints. Nowadays, technological progress is an advantage for recognition systems because it helps to reduce the processing time. Fingerprint recognition involves many multiplications, evolution, rotation, and floating-point operations, which greatly slow the processing speed [24]. The implementation of fingerprint recognition systems using hardware accelerators is widely reported in the literature [22] [23]. These implementations are closely tied to the speed of the processor used, and the execution time of the whole system is limited. The following is a brief summary of different hardware architectures.
• Hardware board: many works implement the whole recognition algorithm on a microcontroller-based board, as in [38]. However, this solution suffers from an increase in execution time, above all when the database exceeds about one thousand fingerprints.
• Embedded systems: this solution is adopted for intelligent sensors [39]. In this case the sensor encloses the whole architecture needed to make the recognition decision. Embedded systems are used for a limited number of users.
In this paper, we propose to combine a hardware accelerator with a multi-processor architecture. The next section presents the multi-processor architecture.

A. Multi-Processor architecture
The fingerprint image is received from the sensor with a size of 200x300 pixels and is divided into masks of 10x10 pixels. The proposed design is composed of an SRAM memory, a control unit, 4 processor elements (cores) and 4 hardware accelerators based on the Gabor filter (see figure 7). The proposed architecture belongs to the SIMD/MPSOC field [44]. The control unit is an essential component of the design. It handles all processes:
• Arbitrating the access of the processor units to/from memory;
• Handling and commanding all the processor units.
First, the image is captured by the fingerprint sensor and saved into the main memory. The proposed fingerprint recognition algorithm is divided into software modules (skeletonization and Minutiae extraction) and hardware components (filtering, binarization and matching), as described in the next section. The skeletonization and Minutiae extraction code is also saved in the main memory. The control unit assigns the right process to each processor element. The proposed MPSOC architecture is composed of MicroBlaze soft-core processors interfaced with shared memory through the AXI4 bus [45]. All processor elements, on-chip memories, and the AXI4 bus are clocked at 100 MHz. The off-chip memory is clocked at 200 MHz and the AXI-lite bus is clocked at 50 MHz.
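To make the sizes concrete, the sketch below tiles a 200x300 image into 10x10 masks and deals the masks to 4 cores in turn. The text does not specify how the control unit schedules the work, so the round-robin policy is an assumption used only to illustrate the workload per core.

```python
def split_into_masks(height=300, width=200, mask=10):
    """List the top-left corner of every mask x mask tile in the image."""
    return [(r, c) for r in range(0, height, mask) for c in range(0, width, mask)]

def assign_round_robin(tiles, cores=4):
    """Deal tiles to cores in turn (stand-in for the control unit's scheduling)."""
    return [tiles[i::cores] for i in range(cores)]

tiles = split_into_masks()
per_core = assign_round_robin(tiles)
```

Under these assumptions the image yields 600 masks, that is 150 masks for each of the 4 processor elements.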

B. HW/SW implementation
This section presents the implementation of the recognition algorithm using a co-design approach. Its aim is to describe a hardware accelerator based on the Gabor filter.

Step 1: Partitioning
The partitioning phase is based on the execution time of the different modules of fingerprint recognition, as shown in figure 8. The partitioning rule assumes that the modules that take the most time are implemented as hardware components and the modules that take the least time are implemented as software [18]. The result shown in figure 8 was obtained from the native execution of the fingerprint recognition on an Intel Core i3 processor at 2.2 GHz with 6 gigabytes of memory and Windows 7 as the operating system. We notice that the execution time of the Minutia extraction module is the lowest, so the architecture can be divided into:
• Hardware components: Binarization, Filter and Matching.
• Software applications: Minutia extraction and Skeletonization.

Step 2: Gabor filter design
The Gabor filter can be divided into three essential parts: Control Unit (CU), Arithmetic Unit (AU) and Memory [11]. The proposed architecture is based on a 'CONV' signal that precedes the convolution operation of the filter. When the 'CONV' signal is selected, the convolution process starts. When it is not selected, the filter is in the reception phase, in which it receives the data corresponding to the input image from DATAIN and stores it in the memory (see figure 9).
Simulation shows that the filtering step takes more execution time than the other steps. The idea is therefore to implement a digital Gabor filter as a hardware accelerator. The contribution is to describe the Gabor phase, which had not previously been exploited in the literature [9]. Three essential parameters must be considered in the design: (1) increasing the speed of execution, (2) minimizing the silicon area and (3) decreasing the power consumption. When the convolution signal is not selected, the input receives pixels and stores them in the memory [11]. When the convolution signal is selected, the convolution process starts and follows the steps shown in figure 10. The filtered output image is produced after nine convolution operations.
Figure 11 presents an overview of the top-level filter. The Gabor filter has six inputs and one output. DATAIN receives the initial data before filtering. During the memory write phase, P-X and P-Y are stored in specific locations. The CLK input generates a signal with a 40 ns period. The RST input resets the execution of the filter. The Arithmetic Unit is the core of the digital Gabor filter (see figure 11). This unit stores the Gabor coefficients [4]. The AU is composed of two parts: the ROM and the MAC.
• The ROM stores the coefficients.
• The MAC combines nine multipliers and eight adders. The multipliers operate in parallel to accelerate the convolution process. The adders are arranged in sequence to sum the outputs of the nine multipliers.
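The arithmetic performed by the MAC for one output sample can be sketched in software as nine products followed by eight additions. This is a behavioural model only, not the hardware description; the function name and the flat nine-element layout of the window and coefficients are assumptions.

```python
def mac_3x3(window, coefficients):
    """One output sample: nine products, then eight sequential additions."""
    if len(window) != 9 or len(coefficients) != 9:
        raise ValueError("a 3x3 MAC needs nine pixels and nine coefficients")
    # Nine multiplications; in hardware these run in parallel.
    products = [p * k for p, k in zip(window, coefficients)]
    # Eight sequential additions accumulate the nine products.
    total = products[0]
    for value in products[1:]:
        total += value
    return total
```

For instance, `mac_3x3([1, 2, 3, 4, 5, 6, 7, 8, 9], [1] * 9)` returns 45, the plain sum of the window, while a coefficient list that is 1 only at the centre returns the centre pixel.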
The 'CONV' signal ensures that the convolution process is carried out without any mismatch and with the correct data. The 'CLK' signal and the 'CONV' signal are connected to the memory and the MAC. When the CU receives 'READY' and 'SET', the convolution process finishes. The CU sends an activation signal to 'CONV' to activate the convolution process. The convolution function takes 9 complete processes. Before the multiplication, the data gated by the 'CONV' signal is stored in a buffer, to make sure that the data sent from the memory to the convolution process is correct. The design is based on a pipeline architecture. Execution takes 222 clock cycles with a clock period of 40 ns.
The control unit is the main controller of the filter. Its role is to assign processes to the other blocks. It also acts as the intermediary between the memory and the AU.
The CU generates the address location if the 'START' signal is selected. The CU is composed of two blocks: a counter for the coefficients and a counter decoder.
The design of the counter gives the relationship between the coefficient and the memory address.
The main role of the memory is to store the image data in specific locations. The address for the X-direction comes from one of two sources: the CU or DATAIN.

C. Results and discussion
This section presents the results obtained from the experiments. The discussion is divided into two parts: GPU (Gabor filter, binarization and matching) experiments and multi-processor design experiments.

GPU (Gabor filter, binarization and matching) experiments;
For the top-level filter, the real convolved data are compared with the expected result. The difference between the two represents the error of the convolution function, which is only about 0.001%. The proposed Gabor filter design, based on both amplitude and phase, shows good performance. In addition, execution is fast: it takes only 222 cycles to obtain the filtered image. In terms of silicon area, experiments also show that the number of slices decreased from 5759 to 1625. This result is achieved because the proposed design reduces the number of multipliers and adders compared with the traditional Gabor filter architecture. The proposed design also includes several optimizations in the memory functionality and the controller unit.
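The reported figures can be put into absolute terms with a short calculation. It assumes that each of the 222 cycles lasts one full 40 ns clock period, as given for the CLK input of the filter; the symbol names are illustrative.

```latex
\begin{align*}
T_{\text{filter}} &= 222 \times 40\,\text{ns} = 8880\,\text{ns} = 8.88\,\mu\text{s},\\
\frac{5759 - 1625}{5759} &= \frac{4134}{5759} \approx 0.718 .
\end{align*}
```

Under this assumption the filter needs about 8.88 microseconds per run, and the slice count is reduced by roughly 72% relative to the traditional architecture.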

Multi-processor design experiments;
DB14 is a database that contains 10,000 fingerprints as templates. For a single test, the recognition process performs 100 identifications using 100 different, randomly selected fingerprints from each database as input [43]. Table 2 presents the results of the proposed algorithm and design compared with other works. All compared works use the same database and the same conditions [43].
As shown in table 2, the proposed design achieves the best execution time for fingerprint recognition. It reaches a speed-up factor of about 44.63 compared with the CPU execution in [43], about 2.84 compared with the GPU execution in [43], 1.84 compared with the Jiang design [46], and 1.43 compared with the Chen design [47] (see figure 12).

VI. CONCLUSION
In this paper, an algorithm for fingerprint recognition is proposed, based on a comparison of different works in the literature. As a first contribution, a Gabor filter design has been proposed as a hardware accelerator to decrease the execution time. The expected results are reached: the outputs of the Gabor filter algorithm and of the hardware design are very close, which shows that the proposed blocks are well chosen.
As a second contribution, a complete hardware design based on a multi-processor architecture is described. The proposed design combines 4 cores and a GPU (Gabor filter, binarization and matching) for the hardware implementation with software modules for skeletonization and Minutiae extraction.
Compared with the other works cited in the previous section, our design achieves the best execution time. This advantage matters mainly for ATM systems, which store a huge number of fingerprints.

[Figure: speed-up in comparison with CPU execution [43] for the Jiang design [46], the Chen design [47] and our proposition]

Figure 3.a shows the results of the skeletonization process. As can be seen in figure 3.a, zigzag ridges are the major problem of skeletonization. They can influence the detection of Minutiae because changing one pixel can modify the type of Minutia. As a second contribution to skeletonization, a smoothing filter is applied to decrease the zigzag effect (see figure 3.b).

TABLE II. COMPARISON BETWEEN THE PROPOSED DESIGN AND RELATED WORKS BASED ON EXECUTION TIME