A High-performance Approach for Irregular License Plate Recognition in Unconstrained Scenarios



I. INTRODUCTION
Recognizing license plates is a crucial area of research due to its numerous practical applications, including monitoring road traffic, collecting tolls automatically, and enforcing traffic laws. A pipeline for recognizing irregular license plates typically includes four stages: license plate detection, perspective correction, character segmentation, and character recognition. License plate detection aims to extract license plate regions from input images. The accuracy of the entire system relies heavily on the accuracy of license plate detection, as the extracted regions are used in all subsequent stages. As real-world images containing license plates are captured from different viewpoints, license plates may appear in arbitrary orientations. As a result, perspective correction is performed to align the detected license plates. For recognizing characters, a segmentation approach is first used to decompose the aligned license plate image, which contains a sequence of characters, into sub-images of individual characters. Then, a character recognition approach is employed to classify each character. Classical computer vision approaches for license plate recognition [1], [2] primarily focus on extracting features of license plates based on background color, contours, and edges, and use these hand-crafted features for locating license plates and segmenting characters. Recently, numerous CNN-based approaches for license plate recognition have been proposed, leading to significant advancements. These methods first adopt CNN architectures to extract discriminative feature representations from input images. A network is then used to locate the license plates. With the detected license plates, a classifier is used to search for license plate characters and classify them.
Since character segmentation is a challenging problem due to the effects of lighting conditions, shadows, and noise, various approaches [3], [4] have been proposed to recognize license plate characters directly, without segmentation. With the success of CNNs and text recognition, CNN-based license plate recognition methods have achieved strong results in both accuracy and efficiency. However, previous methods still depend on high-end GPUs or on controlled environments such as specific viewing angles or simple backgrounds. Furthermore, with the growing number of license plate designs, license plate recognition systems that concentrate on single-line plates or on frontal plates face increasing challenges.
In view of these issues, a novel framework for detecting and recognizing irregular license plates in real-world complex scene images is designed in this paper. The proposed model can locate and recognize various types of license plates with arbitrary shooting angles in difficult conditions. There are three stages in the proposed model: license plate detection, perspective correction, and recognizing characters. For license plate detection, this paper employs a state-of-the-art object detector and extends it for predicting four corner points of license plates, which are then used to rectify distorted license plates. For license plate recognition, this paper designs a segmentation-free model based on a fast and efficient object detection architecture for predicting license plate characters. The results of the experiments conducted on two extensive datasets show that the proposed model boasts both a high recognition accuracy and rapid inference speed. This paper is structured as follows: Section II presents a literature review of license plate recognition. Section III offers an overview of the proposed approach. The details of the proposed pipeline are described in Section IV. The discussion of experimental results can be found in Section V. Lastly, the conclusion is outlined in Section VI.

II. LITERATURE REVIEW
This section provides a brief literature review on the topic of license plate recognition. This paper focuses on recent methods that are based on deep learning for end-to-end license plate recognition. For studies involving traditional image processing strategies or focused on license plate detection, please refer to [1], [2], [5], [30].
Since license plates usually occupy small portions of input images, various methods first detect vehicles and then locate license plate regions within them to improve license plate detection performance. For this purpose, Sergio and Claudio [6] proposed a CNN model that includes a YOLO-based network for vehicle detection and license plate detection and an optical character recognition module for character recognition. The model can detect and rectify multiple distorted license plates before feeding the rectified license plates to the optical character recognition module to obtain results. In [7], the authors presented an end-to-end license plate recognition system utilizing the YOLO detector [8], [9]. This approach first locates the vehicles in the input image with one YOLO detector and then detects their respective license plates in the vehicle patches with another YOLO detector. Afterward, the model detects and recognizes all license plate characters simultaneously by forwarding the license plate region to the CR-NET model [10]. The results showed that this approach obtains high accuracy and fast inference speed. However, the model only recognizes single-line license plates captured from a frontal angle.
Due to the effects of environmental conditions, character segmentation is a challenging problem. Moreover, any incorrect character location produced by character segmentation leads to misrecognition of the license plate characters. To solve this problem, various methods avoid character segmentation altogether. Wang et al. [11] introduced a cascade approach (i.e., VSNet) for irregular license plate recognition. VSNet consists of a license plate detection network, which makes predictions using multiple feature levels produced by a fusion network, and a license plate recognition network, which features an encoding layer for left-to-right feature extraction and a weight-sharing classifier for character recognition. In addition, a vertex-estimation branch is proposed to rectify distorted license plate images. In [12], the authors presented an end-to-end convolutional neural network for license plate recognition that eliminates the need for character segmentation. The network is implemented on an FPGA and achieves very fast processing speed. To enhance the accuracy of license plate recognition in unrestricted conditions, Zou et al. [13] proposed a robust license plate recognition framework that combines a Bi-LSTM with contextual position information of license plate characters to locate the characters in the license plate. The authors utilized depthwise separable convolutions and a spatial attention mechanism for license plate feature extraction to activate the character feature regions and thoroughly extract the features of license plates.
In summary, although the above methods have achieved some significant accomplishments, they have not fully addressed the issue of irregular license plate recognition in unconstrained scenarios. Furthermore, these methods mostly require hardware with high-end GPUs, which is difficult to implement in practical applications.

III. OVERVIEW OF THE PROPOSED FRAMEWORK
The proposed method includes three stages, as outlined in Fig. 1. Specifically, the proposed method takes images as inputs and sequentially performs license plate detection, perspective correction, and character recognition to produce the final license plate characters. Both stage 1 and stage 3 are based on simple and efficient deep CNN architectures for fast inference speed. An overview of each stage is given below.
Stage 1: License plate detection. As shown in Fig. 1, license plate detection aims to locate the four corner points of each license plate. For this purpose, this paper employs a lightweight deep CNN structure used for human pose estimation [14] and modifies it to predict the four corner points of license plates.
Stage 2: Perspective correction. Perspective deformation images are corrected by applying perspective transformation. First, four corner points are predicted by the license plate detection network. Then, the homography between the camera and the license plate is recovered. Finally, the homography is used to warp the detected license plate into a rectified image as shown in Fig

A. License Plate Detection
This paper uses CenterNet [14] for extracting license plate regions and the corresponding corner points from input images. CenterNet represents each object by the center point of its bounding box and uses this keypoint to predict the dimensions and offsets of the box. CenterNet strikes a desirable balance between precision and speed and is highly customizable and extensible. It can be easily extended to multiple computer vision tasks, including 3D object detection, object tracking, and human pose estimation. In this paper, CenterNet is used to predict the four corner points of license plates (i.e., top-left, bottom-left, top-right, and bottom-right corners). Based on the predicted corner points, perspective correction is performed to get rectified license plate images. For this purpose, this paper employs the CenterNet structure used for human pose estimation [14] and modifies it for corner point estimation. The detailed pipeline of the license plate detection model based on CenterNet is shown in Fig. 3. Given an input image $I \in \mathbb{R}^{W \times H \times 3}$, where $W$ and $H$ represent the width and height of the input image, respectively, a fully convolutional encoder-decoder architecture is first used to produce feature representations and generate output results. Three heads are produced after one forward pass of the feature extraction network, as shown in Fig. 3 (i.e., keypoint heatmap head, corner location head, and corner offset head). All the heads are predicted with the same spatial dimensions $\left(\frac{W}{R}, \frac{H}{R}\right)$, where $R$ represents the output stride (CenterNet [14] uses $R = 4$ by default).
1) Feature extractor. This paper adopts ResNet-50 [15] for feature extraction. The ResNet blocks are augmented with three up-convolutional layers to produce higher-resolution output feature maps. In addition, a 3×3 deformable convolutional layer is used before each up-sampling layer.
2) Keypoint heatmap head. The keypoint heatmap head is used for predicting the center point of license plates. In this paper, the keypoint heatmap $\hat{Y}$ has one channel since only one class is predicted by the license plate detection network. After one forward pass, a sigmoid layer is applied to the keypoint heatmap, and the resulting value at each location is viewed as the confidence score of that location being the center of a license plate.
3) Corner location head. The corner location head predicts the four corner locations of each license plate (i.e., top-left, bottom-left, top-right, and bottom-right corners). Each corner is considered a 2-dimensional property of the center keypoint and is parameterized by an offset to the center keypoint. The dimensions of this head are $\frac{W}{R} \times \frac{H}{R} \times 8$ (two values for each of the four corners).
4) Corner offset head. The corner offset head is employed to rectify the quantization error resulting from the downsampling of the input. After one forward pass, the coordinates of the predicted center keypoints are mapped back to the higher-resolution input image. This mapping introduces a deviation because coordinates on the output map are integers, whereas the true center points divided by the output stride are generally fractional. As a result, local offsets $\hat{O}$ are predicted for each center point to recover the discretization error.
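As an illustration of how the three heads fit together, the following is a minimal NumPy sketch of a decoding step, not the authors' implementation. The function name `decode_plates`, the channel layout of the heads, the corner order (top-left, top-right, bottom-left, bottom-right), the 3×3 peak test, and the default values of `stride` and `score_thresh` are all assumptions made for this example.

```python
import numpy as np


def decode_plates(heatmap, corners, offsets, stride=4, score_thresh=0.3):
    """Decode plate centers and corner points from CenterNet-style head outputs.

    heatmap: (H, W) float array of center scores after the sigmoid.
    corners: (8, H, W) array, (dx, dy) per corner relative to the center cell
             (assumed order: top-left, top-right, bottom-left, bottom-right).
    offsets: (2, H, W) array, sub-cell (dx, dy) correction for the center.
    Returns a list of (score, center_xy, corners_xy) in input-image pixels.
    """
    h, w = heatmap.shape
    # A cell is kept as a peak if it is the maximum of its 3x3 neighbourhood.
    padded = np.pad(heatmap, 1, constant_values=-np.inf)
    neighbourhood_max = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    is_peak = (heatmap >= neighbourhood_max) & (heatmap > score_thresh)

    plates = []
    for y, x in zip(*np.nonzero(is_peak)):
        cell = np.array([x, y], dtype=float)
        # The offset head refines the center; the corner head stores
        # displacements from the integer center cell.
        center = (cell + offsets[:, y, x]) * stride
        corner_xy = (cell + corners[:, y, x].reshape(4, 2)) * stride
        plates.append((float(heatmap[y, x]), center, corner_xy))
    return plates
```

Multiplying by the output stride maps coordinates on the output map back to input-image pixels, which is where the offset head's sub-cell correction matters.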

B. Perspective Correction
As license plates can sometimes be difficult to read due to the viewpoint, perspective correction is performed to align the detected license plates. Based on the four corner points generated by the license plate detection network, the homography between the camera and the license plate is first recovered. Then, the homography is used to warp the detected license plate into a rectified image as shown in Fig. 1. To be more specific, based on the detected corner points from the input image, this paper first identifies $w$ and $h$, which represent the maximum horizontal and vertical distances between the corner points. Then the four corresponding vertices of the rectified image are calculated as follows:

$$p'_{tl} = (0, 0), \quad p'_{tr} = (w, 0), \quad p'_{bl} = (0, h), \quad p'_{br} = (w, h) \tag{1}$$

where $p'_{tl}$, $p'_{tr}$, $p'_{bl}$, and $p'_{br}$ represent the top-left, top-right, bottom-left, and bottom-right corners of the rectified license plate.
Following [16], the perspective transformation matrix is calculated from the detected corner points and the corresponding vertices of the rectified image. Finally, the rectified license plate region is formed by applying this transformation to the detected license plate region.
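The perspective correction step can be sketched with the standard four-point solution for a 3×3 perspective matrix. This is a minimal NumPy example under stated assumptions, not the formulation of [16]: the function names are hypothetical, the corners are assumed to be ordered top-left, top-right, bottom-left, bottom-right, the rectified plate is assumed to start at the origin, and the matrix is normalized so that its bottom-right entry is 1.

```python
import numpy as np


def rectified_vertices(corners):
    """Target vertices from the maximum horizontal and vertical corner distances.

    corners: four (x, y) points ordered top-left, top-right, bottom-left,
    bottom-right (an assumed ordering for this sketch).
    """
    corners = np.asarray(corners, dtype=float)
    w = corners[:, 0].max() - corners[:, 0].min()
    h = corners[:, 1].max() - corners[:, 1].min()
    return np.array([[0, 0], [w, 0], [0, h], [w, h]], dtype=float)


def perspective_matrix(src, dst):
    """Solve for the 3x3 matrix M (with M[2, 2] = 1) mapping src onto dst."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(np.asarray(src, float), np.asarray(dst, float)):
        # Each point pair contributes two linear equations in the 8 unknowns.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    m = np.linalg.solve(np.array(rows), np.array(rhs))
    return np.append(m, 1.0).reshape(3, 3)


def apply_perspective(M, pts):
    """Map 2-D points through M using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ M.T
    return homog[:, :2] / homog[:, 2:3]
```

In practice, a library routine such as OpenCV's `cv2.getPerspectiveTransform` followed by `cv2.warpPerspective` would typically be used to compute the matrix and resample the image. The explicit linear system is shown here only to make the calculation visible.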

C. Character Recognition
Character recognition aims to identify each character on the rectified license plates. For this purpose, this paper treats character recognition as a character detection problem and designs a lightweight character detection network that predicts each license plate character without depending on the license plate layout (i.e., single-line or double-line text). Specifically, the lightweight character detection network is trained to detect 35 classes (i.e., "A-Z" and "0-9", where the digit "0" is recognized jointly with the letter "O"), using the rectified license plates together with the bounding box and class of each character as training inputs. In the case of Chinese license plates, the initial symbol is a Chinese character that signifies the province. As in [17] and [7], the character detection network proposed in this work has not been trained to identify Chinese characters because assigning a category to such characters is not a straightforward task. Table I shows the design of the proposed lightweight character detection network. The design of the network is influenced by the Fast-YOLOv4 model [18], which is a tiny deep neural architecture that obtains very fast detection speed without sacrificing much accuracy. As shown in Table I, 3×3 convolution layers are used to extract features from previous layers, followed by 1×1 convolution layers for reducing the feature channels. In addition, max pooling layers are used to decrease the feature dimensions. The number of channels is doubled after each max pooling layer. Following [9], [19], this paper applies detection heads at different scales to predict license plate characters. Specifically, character prediction is performed at layer 13 and layer 20, where the output size is 24×8 and 48×16, respectively.
This multi-scale detection approach is crucial for character recognition because the characters on the license plate may take up either a small or a large area of the license plate region, as depicted in Fig. 4. It is worth mentioning that the proposed license plate character recognition system handles license plates with either single-line or double-line text, as it predicts all characters on the rectified license plate.
All experiments were carried out on a computer equipped with an Intel Core i7-10700 CPU, a single NVIDIA GeForce GTX 1080Ti GPU, and 64GB of RAM. All models are designed and evaluated under the framework of PyTorch [20] and mmdetection [21].

A. Dataset and Evaluation Metrics
To assess the proposed method, this paper conducts experiments on two extensive public datasets: CCPD [22] and AOLP [23].
CCPD [22] consists of 290k images captured under diverse illuminations, environments, and weather conditions. This dataset is more challenging than other datasets for license plate recognition since each image is captured from a different position and angle, so license plates appear in arbitrary orientations. The dataset provides sufficient annotations for training the proposed model, including bounding boxes, vertices of each license plate, and license plate characters. Images in the dataset have a resolution of 720 (width) × 1160 (height) × 3 (channels). Following [22], this paper employs 100k images of the CCPD-Base subset for training both the detection network and the character recognition network. The remaining 100k images from the CCPD-Base subset and the 80k images from the CCPD-DB, CCPD-FN, CCPD-Rotate, CCPD-Tilt, CCPD-Weather, and CCPD-Challenge subsets are utilized for testing. Additionally, the CCPD dataset includes the CCPD-Characters subset, consisting of over 1000 individual images for every possible license plate character. This paper utilizes the CCPD-Characters subset for further training the license plate recognition network.
AOLP (Application-Oriented License Plate database) [23] contains 2049 images. The images in this dataset are classified into three categories based on the capturing conditions: access control (AC), traffic law enforcement (LE), and road patrol (RP). The AC subset consists of 681 images of license plates captured in scenarios where vehicles move through a fixed passage at a slower pace or come to a complete stop. The LE subset consists of 757 images of license plates captured by a roadside camera during instances of traffic law violations. The RP subset includes 611 images of license plates captured from vehicles with random viewpoints and distances, making it more challenging for license plate recognition due to the heavily distorted license plates. Since the AOLP dataset only provides annotations for the coordinates of license plate bounding boxes and numbers, this paper manually annotates the four corners of each license plate. In line with [11], this paper trains on the LE and RP subsets and uses the RP subset for testing.
For the evaluation metric, this paper calculates the accuracy of license plate recognition by dividing the number of correctly recognized license plates by the total number of license plates in the test set. A recognition is considered correct only if the IoU is greater than 0.5 and all characters have been correctly recognized. Here, the IoU is calculated as follows:

$$\mathrm{IoU} = \frac{\mathrm{area}(P_d \cap P_g)}{\mathrm{area}(P_d \cup P_g)}$$

where $P_d$ is the detected polygon of the license plate and $P_g$ is the ground truth polygon of the license plate. $P_d$ and $P_g$ are calculated based on the detected corner points and the ground truth corner points, respectively.

B. Results on CCPD
Table II provides recognition results of the proposed approach and recent approaches on the CCPD dataset. The results demonstrate that the proposed approach obtains the best recognition accuracy on most of the subsets. To be more specific, the proposed model obtains recognition accuracies of 99.7%, 99.2%, 99.1%, 99.6%, and 99.6% on CCPD-Base, CCPD-DB, CCPD-FN, CCPD-Rotate, and CCPD-Tilt, respectively, which outperform all previous methods, including the method proposed by Zhang et al. [3]. For the CCPD-Challenge subset, the method proposed by Zhang et al. [3] obtains the best recognition accuracy. Since the CCPD-Challenge subset comprises the most difficult images for license plate recognition, the simple and efficient license plate detection network cannot locate some license plates (Fig. 6), which leads to wrong recognition results from the character recognition network. Future work will investigate more effective fusion strategies to enhance the feature representation of the license plate detection network, which would improve detection results. It is noteworthy that the majority of the comparison methods in Table II determine the recognition results by setting the IoU threshold to 0.6. Additionally, the proposed method attains its most significant improvements on the CCPD-Rotate and CCPD-Tilt subsets.
Specifically, the proposed network improves recognition accuracy by 3.2% and 2% on the CCPD-Rotate and CCPD-Tilt subsets, respectively, compared with the model proposed by Zhang et al. [3]. Given that the CCPD-Rotate and CCPD-Tilt subsets contain images with significant perspective distortion, these results show that the proposed model excels at detecting and recognizing license plates that have undergone distortion or rotation. For recognition speed, since the proposed model is designed based on fast and efficient architectures, it obtains the fastest recognition speed among the compared methods. To be more specific, the proposed model needs 8.3ms to process an image on a single NVIDIA GeForce GTX 1080Ti GPU. The results indicate that the proposed model is efficient and well-suited for real-time applications. Fig. 5 shows recognition results of the proposed approach on the CCPD dataset, including the detected four corners of the license plates, the rectified license plates after perspective correction, and the recognized characters on the license plates. The proposed model performs effectively under various conditions. Fig. 6 shows some failure cases where the proposed model cannot locate license plates in challenging environments or fails to distinguish some similar-looking license plate characters.
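The recognition criterion behind these results (a polygon IoU above the threshold together with an exact match of all characters) can be sketched in plain Python. This is an illustrative sketch rather than the evaluation code used in the paper: all function names are hypothetical, the polygons are assumed to be convex and given in traversal order (for example top-left, top-right, bottom-right, bottom-left), and the intersection is computed with Sutherland-Hodgman clipping, a technique chosen here for simplicity.

```python
def _signed_area(poly):
    """Shoelace formula; positive when vertices run counter-clockwise."""
    total = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def _side(a, b, p):
    """Cross product of (b - a) and (p - a); >= 0 means p is left of a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _clip_convex(subject, clip):
    """Sutherland-Hodgman clipping of `subject` against convex `clip`."""
    if _signed_area(clip) < 0:
        clip = clip[::-1]  # make the clip polygon counter-clockwise
    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        current, output = output, []
        for j in range(len(current)):
            p, q = current[j - 1], current[j]
            dp, dq = _side(a, b, p), _side(a, b, q)
            if (dp >= 0) != (dq >= 0):
                # The edge p->q crosses the clip line: keep the crossing point.
                t = dp / (dp - dq)
                output.append((p[0] + t * (q[0] - p[0]),
                               p[1] + t * (q[1] - p[1])))
            if dq >= 0:
                output.append(q)
    return output


def polygon_iou(detected, ground_truth):
    """IoU of two convex polygons given as lists of (x, y) vertices."""
    inter_poly = _clip_convex(detected, ground_truth)
    inter = abs(_signed_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = abs(_signed_area(detected)) + abs(_signed_area(ground_truth)) - inter
    return inter / union if union > 0 else 0.0


def recognition_accuracy(predictions, ground_truths, iou_thresh=0.5):
    """Fraction of plates with IoU > iou_thresh and an exact text match."""
    correct = 0
    for (pred_poly, pred_text), (gt_poly, gt_text) in zip(predictions, ground_truths):
        if polygon_iou(pred_poly, gt_poly) > iou_thresh and pred_text == gt_text:
            correct += 1
    return correct / len(ground_truths)
```

The `iou_thresh` parameter makes it easy to see how the stricter 0.6 threshold used by several comparison methods would change which plates count as correct.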

C. Results on AOLP
For the AOLP dataset, the AOLP-RP subset is employed to evaluate the proposed model since this subset is more challenging (most of its images contain license plates with severe perspective deformation). Table III presents the comparison of recognition accuracy on the AOLP dataset. The proposed model achieves the highest recognition accuracy among the compared methods. Specifically, the recognition accuracy of the proposed model surpasses that of the method proposed by Sergio et al. [6] by 0.4%. This result further supports the claim that the proposed model is capable of recognizing license plates with irregular shapes.

VI. CONCLUSION
This study presents a CNN-based approach for detecting and recognizing license plates with irregular shapes in complex real-world images. The proposed model employs a CenterNet-based CNN structure to predict the four corners of the license plates, followed by perspective correction to align the detected license plates. For character recognition, a YOLO-based segmentation-free model is designed to predict the characters on the license plate. The effectiveness of the proposed method is verified through experiments on the CCPD and AOLP datasets. Specifically, experimental results on the two datasets show that the proposed method obtains the best recognition accuracy on most subsets together with the fastest recognition speed. This result demonstrates that the proposed method is well suited for intelligent traffic management applications that require real-time processing. Future work will investigate additional fusion techniques for extracting more discriminative features from the input images, which can enhance the accuracy of the license plate detection network.