A Novel and Efficient Point Cloud Registration by using Coarse-to-Fine Strategy Integrating PointNet

Abstract: Point cloud registration plays a critical and fundamental role in computer vision. Although global, local, and learning-based registration strategies have achieved good results, many problems remain. For example, local methods based on geometric features are very sensitive to attitude deviation, global shape-based methods tend to produce inconsistent results when the distribution differences are obvious, and learning-based methods rely heavily on large amounts of labeled data. To overcome these drawbacks and improve registration accuracy, a novel and effective point cloud registration method that integrates a coarse-to-fine strategy with an improved PointNet network is proposed. An improved Random Sample Consensus (RANSAC) algorithm is developed to deal with the initial attitude deviation in the coarse registration stage, and an improved Lucas and Kanade (LK) algorithm is built on the classical PointNet framework to reduce the errors of the fine registration. The whole registration procedure is implemented under a trainable recurrent deep learning architecture. Compared with state-of-the-art point cloud registration methods, the experimental results show that the proposed method effectively handles significant attitude deviation and partial overlap, and achieves stronger robustness and higher accuracy.


I. INTRODUCTION
As one of the most faithful and convenient data formats, point clouds have been widely applied in 3D reconstruction [1,2], virtual reality [3], augmented reality [4], etc. Because of environmental and other factors, point cloud registration is an essential step before many computer vision and robotics tasks. For example, in autonomous driving, the system unifies the point clouds collected by the lidar from different positions into the same coordinate system to build a three-dimensional high-precision map, and then matches the data collected in real time to this map through point cloud registration [5]. Other typical examples include three-dimensional localization for robotics [6] and pose estimation from different point cloud data [7].
From a mathematical perspective, point cloud registration is usually treated as an optimization problem that searches for the spatial correspondence parameters by minimizing the transformation estimation error under some objective metric [8]. Once the best correspondences are found, the search stops. Much representative work has been reported in this domain. As a typical example, the well-known Iterative Closest Point (ICP) algorithm iteratively assigns each point to its nearest point in the other point cloud and then uses the least-squares distance of the point pairs as the objective function. Because only spatial coordinates are used to guide the search, ICP is very easy to initialize. However, the traditional ICP algorithm usually requires a large overlap between the two frames of point clouds [9]. In addition, its robustness is limited because only point coordinates are used and other important feature information is lost. Therefore, registration methods based on extracted structural features have been proposed to improve the accuracy, such as the histograms and adjacent points [10], the Euclidean distance [11], the normal vector difference [12], and the surface curvature [13]. Another classical scheme, based on Random Sample Consensus (RANSAC), has also been widely applied in coarse registration. Among these methods, the 4-Point Congruent Sets (4PCS) algorithm is the representative one; it determines the correspondence by comparing the intersection diagonal ratios of four-point sets [14]. The 4PCS algorithm can handle registration tasks in complex scenes, and a series of improved versions has been developed. For example, for large-scale scenes, Super 4PCS [15] reduces the computational complexity from $O(n^2)$ to $O(n)$ by using a smart indexing strategy; the k-4PCS algorithm [16] improves the registration precision by replacing the randomly sampled points with sparse key points; the Generalized 4PCS [14] reduces the time cost by no longer strictly requiring the four points of 4PCS to be coplanar; the V4PCS algorithm [17] incorporates the concept of volume consistency to reduce the time cost; and the 2PNS [18] is proposed for scenes with small overlap (as little as 5%). All of these methods produce good registration results, but deep features have not been carefully considered.
Very recently, with the breakthroughs in deep learning theory, learning-based registration methods have become research hotspots in this domain. Qi et al. constructed the well-known PointNet network [19], in which features are extracted from each point without conversion and aggregated through max pooling, which solves the permutation invariance problem caused by the unordered nature of point clouds. Inspired by this work, many deep learning-based registration models have been constructed [20][21][22][23]. For example, Wang et al. proposed the DCP algorithm based on the dynamic graph convolution network [24,25]. It combines local context information and communication through the attention mechanism [26] to obtain a soft mapping between the point clouds. The registration matrices (rotation and translation) are computed from this soft mapping and a differentiable singular value decomposition. Its time efficiency and accuracy are good, but it is very sensitive to rigid transformations because it relies heavily on local geometric features. Therefore, its performance is not satisfactory when handling significant initial attitude differences [27]. Yew and Lee proposed the feature extraction network RPM-Net [28] to reduce the initialization sensitivity. It computes hybrid features from the spatial location and the geometric characteristics, and then obtains the soft assignment with a Sinkhorn layer [29]. Gojcic et al. constructed a deep learning architecture named "3DSmoothNet" for 3D point cloud registration, whose convolutional layers operate on a smoothed density value representation [30]. Huang et al. proposed a fast registration framework based on the feature-metric strategy, which treats registration as minimizing the feature-metric projection error without correspondences. As reported, it is a semi-supervised model and is very robust to density differences in point cloud data [31]. All of these methods make good use of the global shape information to maintain robustness; however, the registration results are still not good enough when partial overlap causes distribution differences between the point clouds.
To overcome the above-mentioned drawbacks and improve registration accuracy, an effective and novel point cloud registration framework based on the improved PointNet network is constructed. To deal with the high sensitivity to initial attitude differences and partial overlap, a coarse-to-fine registration strategy is developed. The improved RANSAC algorithm serves as the coarse registration to reduce the attitude difference and roughly align the input point clouds. The LK alignment method is further improved to correct the inaccurate alignment caused by distribution differences and partial overlap. Unlike the classical ICP method and its improved versions, the proposed method does not require expensive point-to-point calculations. In addition, owing to its ability to learn and extract deep features, it generalizes better to unseen objects and shape changes. In summary, the major contributions of this work are as follows:

• Firstly, a novel registration framework based on the coarse-to-fine strategy is developed. The coarse registration provides a good initial transformation, and the fine registration further optimizes it to improve the accuracy. For the coarse registration, the improved RANSAC algorithm is proposed to overcome the defect caused by the initial attitude difference.

• Secondly, the Lucas and Kanade (LK) algorithm is improved to avoid the inherent defect that the feature representations extracted directly from PointNet do not permit gradient estimation through convolutions, so that it can be used to correct the small registration errors caused by partial overlap.

• Finally, based on the two improved algorithms, the registration procedure and the coarse-to-fine strategy are implemented under the deep learning architecture of PointNet, and four state-of-the-art registration methods are used for comparison to verify the accuracy and superiority of the proposed method.
This work is divided into four parts. Section I introduces the background of point cloud registration. Section II presents the details of the proposed method. Section III tests the method and discusses the results. Section IV summarizes the conclusions and outlines future work.

II. THE WHOLE METHOD
PointNet provides a learnable structured representation and is usually applied to point cloud classification and segmentation. To make it applicable to point cloud registration, the RANSAC algorithm and the LK algorithm are improved to adapt to the "imaging function" of the traditional PointNet, and the PointNet model, the RANSAC algorithm, and the LK algorithm are combined into a unified deep learning framework, whose structure is shown in Fig. 1.
The proposed registration framework starts by constructing feature representations with PointNet. The global feature representations are fed into the improved RANSAC algorithm to compute a rough transformation between the two point clouds, and the improved LK algorithm then refines the roughly transformed result according to the local feature representations. The registration procedure stops when the optimal transformation is found by the recurrent learning.

A. The MLP Symmetric Pooling Feature Extractor
Let Q and P be the source and target datasets, respectively, and let $\phi$ denote the PointNet function, which maps a point cloud to a K-dimensional descriptor. The function $\phi$ applies a multi-layer perceptron (MLP) to each 3D point in P and Q, so that the output for each point also has dimension K. Then, a symmetric pooling function is applied to make the result invariant to the ordering of the points, and a K-dimensional global feature descriptor is obtained. The whole structure is shown in Fig. 2.
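As a minimal sketch of the extractor described above, the following NumPy code applies a shared MLP to every point and then max-pools over the points. The layer sizes, the random (untrained) weights, the choice of max pooling as the symmetric function, and the names `init_mlp` and `phi` are illustrative assumptions, not the trained network used in this work.

```python
import numpy as np


def init_mlp(sizes, seed=0):
    """Create random (untrained) weights for a shared per-point MLP.

    `sizes` is e.g. (3, 64, 128): 3D input, one hidden layer, K = 128 output.
    """
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=0.5, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def phi(points, params):
    """Map an (N, 3) point cloud to a K-dimensional global descriptor."""
    h = points
    for W, b in params:
        # The same weights are applied to every point (shared MLP + ReLU).
        h = np.maximum(h @ W + b, 0.0)
    # Max pooling over the point axis is symmetric: point order is irrelevant.
    return h.max(axis=0)


# Usage: the descriptor does not change when the points are shuffled.
rng = np.random.default_rng(1)
cloud = rng.normal(size=(200, 3))
params = init_mlp((3, 64, 128))
descriptor = phi(cloud, params)
shuffled = phi(cloud[rng.permutation(len(cloud))], params)
```

Because the pooling step is symmetric, `descriptor` and `shuffled` agree up to floating-point rounding, which is the order invariance the text refers to.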

B. The Improved RANSAC Algorithm
The advantage of the traditional RANSAC algorithm is that it can automatically fit a model to the data, but it is not effective at computing the corresponding locations between point clouds. To apply it effectively to 3D point cloud registration, an improved RANSAC algorithm is proposed to calculate the initial registration matrix by minimizing the objective function between the corresponding point clouds. The whole procedure is shown in Fig. 2. As shown in Fig. 3, the input of the improved algorithm is the pair of K-dimensional feature descriptors of the two input point clouds. The main steps are as follows:

1) Search for the corresponding features. Randomly select n features from the descriptor of one point cloud and find the corresponding features in the other.

2) Calculate the difference vector of the corresponding features. Firstly, the Euclidean distance between the features is calculated; then, the difference ratio is calculated to form the difference vector, as given by Eq. (1).

[Eq. (1): difference-ratio vector, composed of max(·, ·) terms]

3) Estimate the feature transformation. A temporary transformation matrix $T_i$ is estimated from the corresponding feature pairs, and $\phi(P)$ is converted accordingly.
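The steps above follow the usual hypothesize-and-verify pattern of RANSAC. The sketch below shows that pattern in its standard form on putative 3D point correspondences, with a Kabsch (SVD) solver for each temporary transformation. It is not the improved feature-based variant proposed here: the function names, the inlier threshold, the sample size and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~ src @ R.T + t."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (det = +1) instead of a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cd - R @ cs


def ransac_rigid(src, dst, n_iters=200, sample_size=3,
                 inlier_thresh=0.05, seed=0):
    """Standard RANSAC over correspondences src[i] <-> dst[i]."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # Hypothesize: fit a temporary transform to a random minimal sample.
        idx = rng.choice(len(src), size=sample_size, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        # Verify: count the correspondences that agree with the hypothesis.
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < sample_size:
        raise ValueError("RANSAC found no consistent set of correspondences")
    # Refit on all inliers of the best hypothesis.
    R, t = kabsch(src[best_inliers], dst[best_inliers])
    return R, t, best_inliers
```

The returned rotation and translation play the role of the coarse initial transformation that the fine registration stage then refines.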

C. The Improved LK Algorithm
In the fine registration, the goal is to find the rigid transformation G that best aligns the source Q to the target P. It can be expressed with the exponential map in Eq. (3):

$$G = \exp\Big(\sum_{i} \xi_i T_i\Big), \qquad \xi = (\xi_1, \xi_2, \ldots, \xi_6)^T \tag{3}$$

where $\xi$ is the twist parameter and $T_i$ is the transformation matrix generated by the coarse registration. The three-dimensional point cloud alignment problem can then be described as finding the optimal G such that $\phi(P) = \phi(G \cdot Q)$, where the shorthand $(\cdot)$ denotes applying the rigid transformation G to Q.
In the traditional LK algorithm, the Jacobian matrix J is defined by Eq. (4). Usually, computing J is not easy, because it requires the gradient of the PointNet function with respect to the twist parameters of G. Therefore, as shown in Fig. 4, a stochastic gradient method similar to that in [23] is employed to calculate J. Specifically, each column $J_i$ of the Jacobian matrix is approximated by the finite-difference gradient in Eq. (5):

$$J_i = \frac{\phi\big(\exp(-t_i T_i) \cdot P\big) - \phi(P)}{t_i} \tag{5}$$

where $t_i$ is the infinitesimal perturbation of the twist parameter $\xi$. For each column $J_i$, only the i-th twist parameter takes the non-zero value $t_i$, and J equals the analytic derivative only in the limit of infinitesimal $t_i$. According to the experiments, a small, fixed value of $t_i$ produces better results. The twist parameter $\xi$ is then given by Eq. (6), where $J^{+}$ denotes the Moore-Penrose inverse of J.
Eq. (6) is used to calculate the optimal twist parameters, and the point cloud Q is updated according to Eq. (7):

$$\Delta G = \exp\Big(\sum_{i} \xi_i T_i\Big), \qquad Q \leftarrow \Delta G \cdot Q \tag{7}$$

As shown in Eq. (8), the final estimate is the composition of all the incremental estimates calculated in the iteration loop.
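The refinement loop can be sketched in NumPy as follows. The sketch assumes the inverse-compositional form used by PointNetLK [35]: the twist update $\xi = J^{+}[\phi(Q) - \phi(P)]$ for Eq. (6) and the composition $G_{est} = \Delta G_n \cdots \Delta G_1 \Delta G_0$ for Eq. (8). Two further simplifications are assumptions of this sketch only: the $T_i$ are taken to be the six standard se(3) generators, and the feature function is a small untrained toy with average pooling (smoother than max pooling), not the trained PointNet. All function names are hypothetical.

```python
import numpy as np


def hat(xi):
    """Map a twist (wx, wy, wz, vx, vy, vz) to its 4x4 se(3) matrix."""
    wx, wy, wz, vx, vy, vz = xi
    return np.array([[0.0, -wz, wy, vx],
                     [wz, 0.0, -wx, vy],
                     [-wy, wx, 0.0, vz],
                     [0.0, 0.0, 0.0, 0.0]])


def exp_se3(xi, terms=20):
    """Matrix exponential by truncated power series (adequate for small twists)."""
    A = hat(xi)
    G, term = np.eye(4), np.eye(4)
    for k in range(1, terms):
        term = term @ A / k
        G = G + term
    return G


def transform(G, pts):
    """Apply a 4x4 rigid transform to an (N, 3) point array."""
    return pts @ G[:3, :3].T + G[:3, 3]


def make_phi(k=64, seed=0):
    """Toy feature function: shared tanh layer followed by average pooling."""
    rng = np.random.default_rng(seed)
    W, b = rng.normal(size=(3, k)), rng.normal(size=k)
    return lambda pts: np.tanh(pts @ W + b).mean(axis=0)


def jacobian(phi, P, t=1e-3):
    """Finite-difference Jacobian: one column per twist parameter."""
    f0 = phi(P)
    cols = []
    for i in range(6):
        xi = np.zeros(6)
        xi[i] = -t                       # perturb only the i-th parameter
        cols.append((phi(transform(exp_se3(xi), P)) - f0) / t)
    return np.stack(cols, axis=1)        # shape (K, 6)


def lk_align(phi, P, Q, iters=10, t=1e-3):
    """Iteratively warp Q towards P; return the accumulated transform and Q."""
    J_pinv = np.linalg.pinv(jacobian(phi, P, t))   # computed once from P
    f_P = phi(P)
    G_est = np.eye(4)
    for _ in range(iters):
        xi = J_pinv @ (phi(Q) - f_P)     # assumed form of the twist update
        dG = exp_se3(xi)
        Q = transform(dG, Q)             # warp the source
        G_est = dG @ G_est               # compose the incremental estimates
    return G_est, Q
```

Because the Jacobian depends only on the target P, it is computed once before the loop, which is the main computational saving of this formulation; only $\phi(Q)$ has to be re-evaluated in each iteration.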

D. The Loss Function
The aim is to find the best transformation by minimizing the difference between the estimated transformation $G_{est}$ and the ground-truth transformation $G_{gt}$. To avoid a possible logarithm operation during training and improve the computational efficiency, the objective function in Eq. (9) is used.
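A loss of this kind can be sketched as below. The exact form of Eq. (9) is assumed here to be the Frobenius-norm loss $\| G_{est}^{-1} G_{gt} - I_4 \|_F$ used by PointNetLK [35], which compares the two transformations without taking a matrix logarithm; the function name `transform_loss` is hypothetical.

```python
import numpy as np


def transform_loss(G_est, G_gt):
    """Frobenius norm of (G_est^-1 @ G_gt - I) for 4x4 rigid transforms.

    The value is zero exactly when the estimate equals the ground truth.
    """
    return np.linalg.norm(np.linalg.inv(G_est) @ G_gt - np.eye(4), ord="fro")


# Usage: an estimate that misses a unit translation along x.
G_gt = np.eye(4)
G_gt[:3, 3] = [1.0, 0.0, 0.0]
loss_value = transform_loss(np.eye(4), G_gt)
```

In a training setting the same expression would be written with the automatic-differentiation framework in use, so that its gradient can flow back through the iterations.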

III. EXPERIMENTS

A. The Experimental Details
The experiments use the point cloud data of Stanford University and the Geometry Center for training [32]. The maximum number of iterations is 80. The other parameters are set to the best values according to [19].
Two classical global methods and two advanced deep learning-based methods are used for comparison: the ICP method [33], the histogram-based registration method 3DHoPD [34], 3DSmoothNet [30], and PointNetLK [35].
As shown in Eq. (10), the Root Mean Square Error (RMSE) is selected as the error measure because of its popularity in point cloud registration. It is the square root of the mean squared distance between the corresponding points. In Eq. (10), $P_i$ and $Q_j$ are the pairwise nearest neighbors in the two datasets, and N and M are the scale parameters (sizes) of the two datasets, respectively. A smaller RMSE means a better result.
In practice, the RMSE alone does not reveal how many points are aligned. The Effective Root Mean Square Error (ERMSE) describes the registration accuracy better. Its calculation is shown in Eq. (11), which uses the ratio of the aligned points to all the points, where k is the number of non-aligned points and N is the number of all the points.
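The two quantities that these metrics are built from can be sketched as follows: a nearest-neighbor RMSE and the ratio of aligned points, $(N - k)/N$. The distance tolerance `tol` that decides whether a point counts as aligned is an assumption of this sketch (the text does not state the criterion), the function names are hypothetical, and the ERMSE itself is not reproduced.

```python
import numpy as np


def nearest_neighbor_dists(P, Q):
    """Distance from every point of P to its nearest neighbor in Q (brute force)."""
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(d2.min(axis=1))


def rmse(P, Q):
    """Root of the mean squared nearest-neighbor distance from P to Q."""
    d = nearest_neighbor_dists(P, Q)
    return float(np.sqrt(np.mean(d ** 2)))


def aligned_ratio(P, Q, tol):
    """Fraction (N - k) / N of points in P whose nearest neighbor is within tol."""
    d = nearest_neighbor_dists(P, Q)
    k = int((d > tol).sum())             # number of non-aligned points
    return (len(P) - k) / len(P)
```

The brute-force distance matrix needs memory proportional to the product of the two cloud sizes; for large clouds a spatial index such as a k-d tree would be the usual replacement.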

B. The Results and Discussion
Firstly, the learned model is tested on the open ModelNet40 dataset [36][37][38][39][40]. The intermediate results over the iterations are shown in Fig. 5. After the model is trained, when new registration data are input, the source points become well aligned with the target points as the iterations increase, even though the data were not used to train the model. This demonstrates that the proposed method is robust and has good feasibility and generalization.
Then, ICP, 3DHoPD, 3DSmoothNet, PointNetLK and the proposed method are tested on the four models shown in Fig. 6. The blue points are the source points and the yellow points are the target points. The visual registration results are shown in Fig. 7, and the quantitative comparison results are shown in Table I to Table IV. The deviation of the initial attitude is significant for the data in Model 1 and Model 2. Fig. 7 shows that ICP and 3DHoPD perform very poorly on these two datasets, and the registration even fails. The aligned-point ratios in Table I and Table II indicate that only a few corresponding points are obtained. In addition, their RMSE and ERMSE values are clearly larger than those of the other three methods, showing that the traditional global registration methods cannot deal well with significant attitude deviation. On the other hand, the three methods based on deep learning perform well on the two datasets; in particular, the proposed method achieves the best ratio of 89.2%, which means that most of the corresponding points are obtained. This is because the important structural features at the deep levels are effectively captured. The superiority in dealing with the attitude deviation problem is obvious.
The data in Model 3 and Model 4 mainly involve translation under partial overlap. ICP and 3DHoPD perform better here than on Model 1 and Model 2; the aligned-point ratios in Table III and Table IV show that more corresponding points are obtained. Of the five registration methods, the proposed method achieves the best quantitative results, with the highest accuracy of 93.52%, an improvement of five percent over the ICP method. Overall, the proposed method achieves good results for both translation and rotation, demonstrating stronger robustness, better generalization and higher accuracy.
The time cost of the five methods on the four models is shown in Table V (in seconds). When dealing with the significant deviation of the initial attitude in Model 1 and Model 2, the proposed method is almost 8 times faster than the ICP and 3DHoPD methods, 7 times faster than the 3DSmoothNet method, and 1.5 times faster than the PointNetLK method. When dealing with the partial overlap problem in Model 3 and Model 4, the time costs of ICP and 3DHoPD are almost the same, 3DSmoothNet takes the most time, and the proposed method is at least 1.6 times faster than the PointNetLK method. Therefore, both the qualitative and the quantitative analyses of the experimental results show the superiority and feasibility of the proposed method.

IV. CONCLUSION AND FUTURE WORK
A novel point cloud registration method based on the coarse-to-fine strategy is developed. This method integrates the improved RANSAC algorithm and the improved LK algorithm into the PointNet network, effectively avoiding the inherent defect that the PointNet network cannot adapt to gradient estimation through convolutions. In addition, the proposed method reduces the effect of attitude difference and partial overlap between the point clouds by making use of the global and local features simultaneously. Comparisons with four state-of-the-art methods on four datasets verify its effectiveness, accuracy and superiority.
Although good results have been obtained, some limitations remain: the extracted features are not rich enough, the registration accuracy is not yet satisfactory, and the time cost is still too high for practical application. In future work, the proposed method will be further optimized by introducing advanced theory, and it will be applied to other registration tasks. For example, the well-known transformer model and attention mechanism will be employed to extract more deep features and improve the registration performance.
Note: $\phi(\cdot)$ denotes the PointNet function, whose output is a descriptor of dimension K.

Fig. 1. The structure of the registration framework.

. 4 )
is converted to () i P  Compute the transformed matrix shown by Eq. (2).

Fig. 3. The structure of the feature extractor.

Fig. 7. The registration results of different methods.

TABLE I. THE REGISTRATION RESULTS OF MODEL 1

Fig. 6. The four models of the point cloud.

TABLE II. THE REGISTRATION RESULTS OF MODEL 2

TABLE IV. THE REGISTRATION RESULTS OF MODEL 4

TABLE V. THE RUNNING TIME OF DIFFERENT METHODS