A Method for Segmentation of Vietnamese Identification Card Text Fields

Advances in deep learning for computer vision have motivated research in related fields, including Optical Character Recognition (OCR). Many models and pre-trained models proposed in the literature have proven effective for optical text recognition. In this context, image processing techniques play an essential role in improving recognition accuracy, because, depending on the practical application, text images often suffer from degradations such as blur, uneven illumination, complex backgrounds and perspective distortion. In this paper, we propose a method for pre-processing, text area extraction and segmentation of Vietnamese Identification Cards, in order to improve the accuracy of Region of Interest detection. The proposed method was evaluated on a large data set of images of varying practical quality. Experimental results demonstrate the efficiency of our method.
Keywords: Optical Character Recognition (OCR); text identification; identification card detection and recognition


I. INTRODUCTION
An Identification (ID) Card is a personal card that provides basic information about a citizen, such as full name, date of birth, place of origin, place of permanent residence, nationality, religion, and date and place of issue. In almost all daily business, this information is required and is usually extracted manually. This process is inefficient because the data must be typed in field by field, which takes a lot of time. Therefore, an automatic method is needed, which is known as Optical Character Recognition (OCR).
Typically, an optical character recognition method contains three main phases: pre-processing, layout analysis and recognition [1], [2]. Pre-processing improves the input image in order to reduce noise and to speed up processing in the subsequent phases. Various basic image processing operations can be applied at this step, such as automatic contrast adjustment and noise reduction.
The output of the pre-processing phase is passed to the next phase, where page layout analysis is performed to detect Regions Of Interest (ROI), such as text and image regions [1]. In the final phase, which is what is usually called Optical Character Recognition, the potential text areas are recognized by different methods/models. The recognition task is therefore performed at the last step, after a series of image processing techniques. With the recent development of deep learning in computer vision, many efficient models/methods have been proposed that recognize optical characters with high accuracy, for example Tesseract OCR [3], the CHAR model [4] and CTPN [5]. In this context, for a given method or model, the key factor in improving accuracy is now the way the input data is processed and the accuracy of ROI (text field) detection, because, depending on the practical application, text images, and ID Card images in particular, often suffer from degradations such as blur, uneven illumination, complex backgrounds and perspective distortion.
A Vietnamese ID card usually contains text fields with different font styles and sizes. In many cases, the characters and other parts such as the rows, the seal and the signature were not printed well, which leads to inaccurate information, for example overlapping characters or rows. In addition, the card normally becomes faded and blurred over time.
In the literature, several works improve the accuracy of ID card reading by applying different techniques before optical character recognition [1], [2], [6], [7], [8], [9], [10], [11]. However, for the Vietnamese ID Card, especially the old form, an efficient method to improve the quality of the input data and to reduce noise or recognition time is still lacking.
In this paper, we propose a method to detect and separate text fields in a Vietnamese ID Card by analyzing the image structure and applying basic image pre-processing techniques such as tilt adjustment, noise filtering, background removal, color channel analysis, connected component analysis, mask line creation, table structure analysis and binarization.
The paper is organized as follows: Section II presents work related to our problem; Section III presents our method for detection and separation of ROIs; Section IV provides the experimental evaluation; Section V gives our conclusion and future work.

II. RELATED WORK
The process of identifying text from a text image is composed of six steps, as shown in Fig. 1, of which page layout analysis and text zone recognition are the two most important. These two steps determine the success or failure of an identification system. Page layout analysis determines the locations and structures of the different information areas in the input image, such as text, images and tables. At this step, once a text zone has been detected, it is separated from the background and sent to the next step.
In top-down approaches, starting from the input page image, a loop repeatedly divides the image into smaller regions until given conditions are satisfied or the resulting areas are homogeneous. Execution speed is the main advantage of this approach, but it requires prior knowledge about the structure of the page layout. Typical top-down algorithms include projection-based algorithms [14], the X-Y cut algorithm, the white stream algorithm [13], and analytical algorithms based on region-specific transformations [15].
Bottom-up approaches start from the pixels of the input page and merge them into progressively larger homogeneous regions (connected components, characters, rows, text blocks, ...). Their advantages are flexibility and tolerance to page tilt, even when the tilt is large. However, they are slower than top-down approaches. Works based on this approach include Docstrum [18], algorithms based on the Voronoi diagram, run-length smearing [19], segmentation based on region growing [11], and a differential algorithm using morphological operators [20].
Hybrid approaches combine the advantages of the two approaches above to analyze the page layout. Many works of this kind can be found in the literature, such as split-and-merge [21], a bottom-up analysis method combined with tab-stop detection [12], and a method combining bottom-up analysis with machine learning techniques [14].
For Vietnamese ID card images, traditional text structure analysis faces the following difficulties in detecting and separating text fields:
• The background of the ID card contains complicated, non-uniform patterns; text and background colors are sometimes similar, which makes them difficult to separate.
• Personal information fields may deviate from the standard lines or be printed over the preprinted sections.
• Text darkness is uneven across ID cards; even within the same card, some words are too dark and others too light (translucent).
• ID cards can be stained, moldy or creased, and in some cases the text is more blurred than the background pattern.
• Image quality is unstable and depends heavily on the light source at the time of image acquisition.
• On the back side of the ID card, it is very common for the stamp or signature to overlap information such as the date and place of issue.
Due to this complexity of structure and quality, a single algorithm is not sufficient to detect and separate the information fields in a Vietnamese ID Card. Each concrete information field needs a method suited to its specific situation. In the next section, we present the four main steps of our analysis of a Vietnamese ID Card.

III. DETECTION AND SEPARATION OF INFORMATION FIELDS IN ID CARDS
The Vietnamese ID Card is rectangular, 85.6 mm long and 53.98 mm wide; both sides carry a flower pattern in light white-blue. The front side contains two main parts: (i) on the left, from top to bottom: the national emblem of the Socialist Republic of Vietnam, 1.9 cm in diameter; the 3 x 4 cm photo of the card holder; and the validity period; (ii) on the right, from top to bottom: the first row is "Cong Hoa Xa Hoi Chu Nghia Viet Nam" (The Socialist Republic of Vietnam) and the second is "Doc Lap - Tu Do - Hanh Phuc" (Independence - Freedom - Happiness); the words "Chung minh nhan dan" (People's identity card) in red; the number; birth name and family name; sex; commonly used name; birth date; birth place; and place of residence.
Similarly, the back side is composed of two parts: (i) on the left, two blocks, the upper one for the left forefinger print and the lower one for the right forefinger print; and (ii) on the right, from top to bottom: identification particulars; the day, month and year of issue of the identity card; and the title of the issuer with his/her signature and seal. The detailed information is described in Table I. Due to this variety, the front side and the back side should be analyzed separately. Even within one side, the fields differ: on the front side, the ID card Number has a different style from the other fields, so it is processed differently. The back side contains a table, which requires a specific structure analysis when detecting and separating the information fields. Therefore, we propose an adaptive method, illustrated in Fig. 2, for detection and separation of the information fields of the Vietnamese ID card. The method processes the two sides separately, but with the same procedure. The details of each step are presented below.
1) Image pre-processing, enhancing the quality of input data:
As mentioned above, ID cards can be stained, moldy, crumpled and worn out over time. Therefore, improving and enhancing the quality of the input image is necessary and important.
Pre-processing is applied to both the front and the back side of the card. It consists of the following basic steps: converting the color image to gray-scale, correcting the tilt, smoothing, and creating the binary image.
2) Detecting and separating the ID card number:
On the front side, the most important information is the ID card Number, so for this side we first detect and separate the ID Card Number field.
However, the ID card Number has the same color as the wavy lines, the national emblem and sometimes the clothes of the ID card holder. Therefore, we first highlight the ID card Number using color channel analysis.
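To make the idea of color channel analysis concrete, the following is a minimal sketch, not the authors' implementation. It assumes the number is printed in red and marks pixels whose red channel clearly dominates the other two; the function name `red_mask` and the `margin` value are hypothetical choices for illustration only.

```python
def red_mask(image, margin=60):
    """Return a binary mask (1 = strongly red pixel) for an RGB image.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    `margin` is a hypothetical threshold: how much the red channel
    must exceed both other channels for a pixel to be kept.
    """
    return [
        [1 if r - max(g, b) > margin else 0 for (r, g, b) in row]
        for row in image
    ]


# Tiny synthetic example: pale blue background, a red stroke, a dark pixel.
background = (200, 220, 240)
red_ink = (190, 40, 50)
dark_ink = (40, 40, 60)
image = [
    [background, red_ink, red_ink, background],
    [background, dark_ink, background, background],
]
mask = red_mask(image)
print(mask)  # [[0, 1, 1, 0], [0, 0, 0, 0]]
```

Such a mask keeps every red element, including other red parts of the card, so it only narrows the search; location and structure are still needed to isolate the number.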
Then, based on location and structure, combined with vertical and horizontal projections, we can detect the number. The algorithm proceeds as follows. The first step converts the input image to grayscale. A morphological closing is then applied to "connect" adjacent characters (components) into rows (blocks). After that, Otsu's method [22] is used to separate the pixels into two classes, foreground and background. Then, we invert the gray levels to convert the gray-scale image into a binary image and separate the connected components. The connected components collected at this step can include the heading "ID card", the ID card No., and part of the national emblem or the portrait, as shown in Fig. 3.
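For readers unfamiliar with Otsu's method, the sketch below shows the thresholding step in isolation: it picks the gray level that maximises the between-class variance of the histogram. It is a plain-Python illustration on a flat list of gray values; the function name and the toy data are assumptions made here, and a production system would normally rely on an image processing library.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximises between-class variance.

    Pixels with value <= t form one class, all others the second class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of gray values in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty, nothing left to split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


# Toy data: dark "ink" values around 20-30, bright background around 200-220.
pixels = [20] * 6 + [30] * 4 + [200] * 5 + [220] * 5
t = otsu_threshold(pixels)
print(t)  # 30

# Dark text becomes foreground (1), bright background becomes 0.
binary = [1 if p <= t else 0 for p in pixels]
print(sum(binary))  # 10
```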
Finally, based on the characteristics of their location, size, and horizontal and vertical projections, we eliminate the connected components belonging to image areas and the heading. The result is an image area containing the ID card No., separated from the others.
Moreover, during stamping/printing and fingerprinting, the characters or the fingerprints may overlap with the lines, which makes it difficult to detect the table structure. Therefore, to determine the table structure, the horizontal and vertical lines must be clearly identified. Since they share the same characteristics, we apply the same algorithm to both kinds of lines. The algorithm is described in Algorithm 2. A morphological transformation is first applied to highlight the horizontal lines. Then the connected components are analysed (function CC_Analysis(I_Bin)) to identify and separate interconnected components. From this, we obtain a set of horizontal lines in the image. This set may contain the actual horizontal lines of the table as well as horizontal lines formed by adjacent dots of a baseline or by the underline of a signature, as shown in Fig. 4. In the next step, based on distance, relative location and length, we eliminate the horizontal lines that do not belong to the table. In addition, missing parts are filled in, and the rows of the table are adjusted and smoothed.
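The line-highlighting step can be illustrated with a small sketch. The text does not give the structuring element or thresholds of Algorithm 2, so the code below uses run-length filtering instead: it keeps only horizontal runs of foreground pixels of at least `min_len` pixels, which has the same effect as a morphological opening with a 1 x `min_len` horizontal structuring element. Vertical lines are handled by transposing the image and reusing the same function. All names and values are hypothetical.

```python
def horizontal_runs(binary, min_len):
    """Return (row, x_start, x_end) for every horizontal run of 1s
    whose length is at least min_len."""
    lines = []
    for y, row in enumerate(binary):
        x, width = 0, len(row)
        while x < width:
            if row[x] == 1:
                start = x
                while x < width and row[x] == 1:
                    x += 1
                if x - start >= min_len:
                    lines.append((y, start, x - 1))
            else:
                x += 1
    return lines


def vertical_runs(binary, min_len):
    """Return (column, y_start, y_end) for long vertical runs,
    obtained by transposing the image and reusing horizontal_runs."""
    transposed = [list(col) for col in zip(*binary)]
    return horizontal_runs(transposed, min_len)


# A tiny "table": a rectangular frame with a few text-like pixels inside.
binary = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
print(horizontal_runs(binary, 6))  # [(0, 0, 7), (3, 0, 7)]
print(vertical_runs(binary, 4))    # [(0, 0, 3), (7, 0, 3)]
```

The short runs produced by characters are discarded by the length threshold; filtering by distance and relative location, as described above, would be applied to the returned candidates.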

3) Analysis of
The algorithm for detecting the vertical lines is the same as that for the horizontal ones. Once the horizontal and vertical lines have been determined, the areas of the table containing the information fields to be verified can be separated.

4) Detecting text rows:
Text row detection is applied to the binary image block that remains after separating the national emblem, portrait, headings and ID card Number on the front side, or to the text image extracted from the table on the back side.
Traditionally, the text rows in a paragraph can be identified with a horizontal histogram (projection profile) of the text block, as shown in Fig. 5. However, as analysed above, the information rows may be tilted or deviate from the standard row, or may overlap the header or other information rows, so the traditional approach is sometimes ineffective.
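As a point of comparison, the traditional histogram approach can be sketched in a few lines: count the foreground pixels in each image row and cut the block wherever the count drops below a threshold. The function names and the `min_pixels` parameter are illustrative assumptions, not taken from the paper.

```python
def row_profile(binary):
    """Horizontal projection: number of foreground pixels in each image row."""
    return [sum(row) for row in binary]


def split_rows(binary, min_pixels=1):
    """Return (top, bottom) pairs of consecutive image rows whose
    projection value is at least min_pixels."""
    bands = []
    start = None
    for y, count in enumerate(row_profile(binary)):
        if count >= min_pixels and start is None:
            start = y                      # a text band begins
        elif count < min_pixels and start is not None:
            bands.append((start, y - 1))   # the band ended on the previous row
            start = None
    if start is not None:                  # band reaching the bottom edge
        bands.append((start, len(binary) - 1))
    return bands


block = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(row_profile(block))  # [0, 3, 3, 0, 2, 0]
print(split_rows(block))   # [(1, 2), (4, 4)]
```

When two text rows are tilted or overlap vertically, the profile has no empty gap between them and the two rows are returned as a single band, which is the failure case described above.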

Fig. 5. Histogram of an information block in an ID card
To resolve this issue, we apply a bottom-up algorithm to determine the text rows, as presented in Algorithm 3.

Algorithm 3: Defining text rows
Input: Binary block image: Text_Block_Image
Output: List of information rows to define: Row_List
1: Analyze and define connected components: CC_List ← CC_Analysis(Text_Block_Image)
2: Temporarily filter noise in rows:
• Noise_List ← {cc_noise | cc_noise ∈ CC_List AND size_of(cc_noise) ≤ noise_size}
• Small_List ← {cc_small | cc_small ∈ CC_List AND size_of(cc_small) ∈ (noise_size, small_size]}
• {CC_List \ Small_List}
3: Arrange CC_List according to the x abscissa
4: Create the format of the first row from the arranged CC_List
5: Edit the baseline of the newly created lines
6: Calculate the overall deviation of the text block
7: Return the temporarily filtered items to their respective lines

First, the list of connected components (CC) is determined, where a connected component is a set of consecutive connected pixels (black dots). This list contains characters and diacritics/tone marks (or noise); the former are the typical components of a row, while the latter are untypical. In the next step, we temporarily remove the untypical components so that the rows can be formed and adjusted precisely.
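Steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: components are labelled with 8-connectivity, the size of a component is taken to be its pixel count, and the thresholds `noise_size` and `small_size` are hypothetical values chosen for the toy example.

```python
from collections import deque


def connected_components(binary):
    """Label 8-connected foreground regions with a breadth-first search.

    Returns a list of dicts holding the pixel count ("size") and the
    bounding box ("box") as (left, top, right, bottom).
    """
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y0 in range(h):
        for x0 in range(w):
            if binary[y0][x0] != 1 or seen[y0][x0]:
                continue
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            size = 0
            top, bottom, left, right = y0, y0, x0, x0
            while queue:
                y, x = queue.popleft()
                size += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append({"size": size, "box": (left, top, right, bottom)})
    return comps


def partition_by_size(comps, noise_size, small_size):
    """Split components into noise, small (e.g. tone marks) and typical ones."""
    noise = [c for c in comps if c["size"] <= noise_size]
    small = [c for c in comps if noise_size < c["size"] <= small_size]
    typical = [c for c in comps if c["size"] > small_size]
    return noise, small, typical


block = [
    [1, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 0],
]
comps = connected_components(block)
print([c["size"] for c in comps])  # [8, 1, 1, 3]

noise, small, typical = partition_by_size(comps, noise_size=1, small_size=4)
print(len(noise), len(small), len(typical))  # 2 1 1
```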

At Step 3, we sort the typical components by their abscissa (x-coordinate). This not only supports forming the rows, but also enables the algorithm to detect the tilt of a row.
To form the rows, for each connected component in CC_List we check whether an existing row intersects the component. If so, the component is added to that row and the row is reformatted; if not, a new row is created. If several rows intersect the component, the one with the largest intersection is selected.
In the next steps, we edit the baseline of the newly created rows and put the untypical (temporarily filtered) components back into their corresponding rows. This avoids omitting or losing information during character segmentation at the identification step.
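The row-forming rule described above can be sketched on bounding boxes. The code is a simplified illustration, not the authors' implementation: boxes are (left, top, right, bottom) tuples, intersection is measured as vertical overlap in pixels, and "reformatting" a row is assumed here to mean updating its vertical band to that of the most recently added component, so that the band can follow a tilted row. Baseline editing and re-insertion of the filtered components are not covered.

```python
def vertical_overlap(a, b):
    """Number of shared pixel rows between two (top, bottom) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def group_into_rows(boxes):
    """Assign each box to the existing row it overlaps most, scanning
    the boxes from left to right; open a new row when nothing overlaps."""
    rows = []  # each row: {"band": (top, bottom), "boxes": [...]}
    for box in sorted(boxes, key=lambda b: b[0]):   # sort by x abscissa
        left, top, right, bottom = box
        best, best_overlap = None, 0
        for row in rows:
            overlap = vertical_overlap(row["band"], (top, bottom))
            if overlap > best_overlap:
                best, best_overlap = row, overlap
        if best is None:
            rows.append({"band": (top, bottom), "boxes": [box]})
        else:
            best["boxes"].append(box)
            best["band"] = (top, bottom)   # assumed update rule: follow the tilt
    rows.sort(key=lambda r: r["boxes"][0][1])       # order rows top to bottom
    return [r["boxes"] for r in rows]


# Two rows of character boxes; the first row drifts downward (tilt).
boxes = [
    (12, 4, 16, 13), (1, 20, 5, 29), (0, 0, 4, 9),
    (7, 21, 11, 30), (6, 2, 10, 11),
]
for row in group_into_rows(boxes):
    print(row)
# [(0, 0, 4, 9), (6, 2, 10, 11), (12, 4, 16, 13)]
# [(1, 20, 5, 29), (7, 21, 11, 30)]
```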

A. Experiment Configuration
The experiments were carried out on 1856 ID Card images, comprising 928 front-side images and 928 back-side images. The ID cards were collected from many provinces, vary in quality, font size and printing style, and were scanned at resolutions of 200 dpi, 300 dpi and 400 dpi. For assessment, each ID card was given ground-truth data describing the information areas to be detected in the image, such as the total number of areas and their coordinates. The experiments were performed on an HP computer with a 2.4 GHz processor and 6.0 GB of RAM, running Windows 10.
To evaluate the results, we use Precision, Recall and F-measure [23], calculated as follows:
• F-measure = (2 × Precision × Recall) / (Precision + Recall) [23]
Let V_t be the area to be detected (defined in the ground-truth file) and V be the area detected by the program. An area is called undetected if its coordinates exist in the ground-truth file but the program cannot detect it, which means V = ∅. An area is considered correctly detected if the area of the intersection of V with the area to be detected V_t meets requirements expressed in terms of S(.), the area of a region, and two optional constants C_1 and C_2. An area is considered an incorrect detection if it falls into neither of the two cases above.

B. Results Analysis
Let N_T be the number of correctly detected areas, N_M the number of undetected areas and N_F the number of incorrectly detected areas. The results are shown in Table II.
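The three measures can be computed from these counts as sketched below. The F-measure follows the formula given in the text; the expressions for Precision and Recall are the standard definitions, Precision = N_T / (N_T + N_F) and Recall = N_T / (N_T + N_M), and it is an assumption that the paper uses them in this form. The counts in the example are made up for illustration and are not results from Table II.

```python
def detection_scores(n_t, n_m, n_f):
    """Return (precision, recall, f_measure) from the counts of
    correctly detected (n_t), undetected (n_m) and incorrectly
    detected (n_f) areas.

    Precision and recall use the standard definitions (an assumption);
    the F-measure is their harmonic mean.
    """
    precision = n_t / (n_t + n_f) if (n_t + n_f) else 0.0
    recall = n_t / (n_t + n_m) if (n_t + n_m) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative counts only: 8 correct, 2 missed, 0 wrong detections.
scores = detection_scores(n_t=8, n_m=2, n_f=0)
print(tuple(round(v, 4) for v in scores))  # (1.0, 0.8, 0.8889)
```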
The experimental results show that detection based on color channel separation achieves high precision for the ID card No. field (100%), because this area contrasts clearly with neighboring areas, as shown in Fig. 6. Out of a total of 925 ID Card Numbers, there were only 3 undetected cases. These occurred in ID card images that were too old and unclear, as illustrated in Fig. 7. For the remaining information fields, omissions or wrong detections were often caused by lines of information that were oblique or overlapping, as shown in Figs. 8 and 9.
Table II shows that performance decreases from the top information fields to the bottom ones. This is explained by the increasing amount of information in these fields. For example, in Fig. 6 the ID card Number (the red number in the fourth line) has only 11 characters, while the place of origin and the place of residence (the last two fields) have more; similarly, in Fig. 10 the date and place of issue (the last fields, with the red stamp) contain the most information, while the ethnic group and religion (the first line) have only a few characters.
In general, detection and segmentation perform better on the front side than on the back side: the F-measure of the former is 1.64 percentage points higher (98.98% vs. 97.34%). This is because the fields on the front side are better separated than those on the back side. Moreover, the back-side fields often overlap with each other, especially the date and place of issue, which are overlapped by the stamp and signature (as shown in Fig. 10); this is why the F-measure of these fields is the lowest.

V. CONCLUSION
The article proposes a solution for detecting and segmenting information fields, suitable for identification (automatic data input) of personal information on Vietnamese ID cards. Based on the specific features of the card, detection and segmentation are divided into two separate procedures for the back side and the front side. After a series of basic image processing operations, for the front side, we detect the ID Card Number, and the other ... We performed an experiment with 928 Vietnamese ID Cards. Experimental results show the effectiveness of the proposed method, which obtains more than 97% accuracy, more than 95% recall and an F-measure of more than 96% over all information fields.
Future work concerns extending the proposed method to the whole recognition process for Vietnamese ID cards. Moreover, we need to collect more ID card models in order to test and improve the accuracy of the method.