WSDF: Weighting of Signed Distance Function for Camera Motion Estimation in RGB-D Data

The recent advent of the cost-effective Kinect, which captures high-resolution RGB and depth information in real time, has opened an opportunity to significantly increase the capabilities of many automated vision-based recognition tasks, including object and action classification and 3D reconstruction. In this work, we address camera motion estimation, an important phase of a 3D object reconstruction system based on RGB-D data. We segment objects with a thresholding algorithm on the depth data and propose a weighting function for the signed distance function (SDF), called WSDF. The resulting minimization problem is solved by the Gauss-Newton method. We systematically evaluate our method on the TUM dataset. The experimental results are measured by the absolute trajectory error (ATE) and the relative pose error (RPE), which evaluate the global and local consistency of the camera motion estimation algorithm, respectively. We demonstrate large improvements over the state-of-the-art methods on both the plant and teddy3 objects, achieving a best ATE of 0.00564 and 0.0182 and a best RPE of 0.00719 and 0.00104, respectively.
Keywords: RGB-D data; 3D Reconstruction; SDF; Camera Motion Estimation


I. INTRODUCTION
Reconstructing 3D objects is an interesting and challenging problem in computer vision. Owing to its wide range of potential applications, such as games, SLAM, medical technology, virtual reality, and robotics, it has attracted considerable research effort from the computer vision community in recent decades [2]. Generally speaking, a 3D object reconstruction framework contains three main steps: object segmentation, camera motion estimation, and surface reconstruction (see Fig. 2). Object segmentation identifies the object region in the images and can be achieved with algorithms such as k-means, mean shift, or Otsu thresholding. Camera motion estimation aims to represent the movement of the object over frames; the result of this phase is a point cloud that describes the object in 3D space. Surface reconstruction focuses on reconstructing the surface mesh. In this work, we focus only on the camera motion estimation phase. We use Otsu's method and a thresholding algorithm for object segmentation.
The advent of affordable RGB-D sensors has opened up a whole new range of applications based on the 3D perception of the environment by computers, including the creation of virtual 3D representations of real objects. Compared with conventional color data, depth maps provide several advantages, such as reflecting pure geometry and shape cues and being insensitive to changes in lighting conditions. Moreover, the range sensor provides 3D structural information about the scene and objects. These characteristics are helpful for object segmentation and camera motion estimation. In this manuscript, we propose weighting parameters for the SDF proposed in [4,5] to improve the performance of camera motion estimation in a 3D reconstruction system based on RGB-D data. The main contributions of this paper are summarized as follows. Firstly, we apply a weighting approach to the SDF for camera motion estimation based on RGB-D data. Secondly, we systematically evaluate our WSDF on four challenging datasets.
The rest of this paper is organized as follows. Section II gives a concise review of existing work on camera motion estimation for 3D reconstruction. Section III presents the signed distance function for camera motion estimation. Section IV introduces our improvement for camera motion estimation. Section V shows the experimental results on relevant benchmarks. Finally, Section VI draws conclusions and indicates future studies.

www.ijarai.thesai.org

Fig. 2. Flowchart of 3D object reconstruction system in RGB-D data

II. LITERATURE REVIEW
Comprehensive reviews of previous studies can be found in [2]. Our discussion in this section is restricted to a few influential and relevant parts of the literature, with a focus on camera motion estimation based on RGB-D data.
Camera motion estimation aims to find the affine transformations that convert the point clouds of local frames into global coordinates and integrate them into a final point cloud representing the object. These transformations represent the movement of the camera from the first frame to the last frame. The earliest approaches focus on finding the affine transformation between two consecutive frames. In [13], the ICP algorithm is used to find the affine transformation between two consecutive frames based on features extracted from them. Another well-known method is Kinect Fusion [10,11], which builds a signed distance function (SDF) and uses it to initialize the point cloud for each frame; the ICP algorithm is then used to find the affine transformation to the next frame. However, chaining the affine transformations between consecutive frames accumulates errors, so the drift grows in the following frames. In contrast to Kinect Fusion, the methods in [7,10] estimate the affine transformation directly by minimizing the RMSE of the SDF and then update the SDF based on the computed transformation. In [8], the authors build the SDF on an octree to reduce memory and computational cost. Whereas the ICP-based methods minimize distances between point clouds, some methods [3, 4, 5, 6, 10] minimize an error defined on the RGB-D data or the SDF between two consecutive frames. In [9], the method finds corresponding points between two consecutive frames and minimizes the total distance between these corresponding points.
In this paper, we propose a camera motion estimation method based on the SDF in [5,6]. We improve the SDF by adding the weighting function from [3]; the result is called WSDF. The minimization problem for this function is solved by the Gauss-Newton method.

III. BACKGROUND OF CAMERA MOTION ESTIMATION
In this section, we present camera motion estimation over frames from RGB-D sequences. The inputs of this phase are local point clouds extracted from the RGB and depth data of each frame. The problem is to find the affine transformation that transfers the local point cloud of the i-th frame from local coordinates to global coordinates. The affine transformation also describes the motion of the camera over frames, so this phase is called camera motion estimation. In [4,5], Bylow et al. introduced a method of camera motion estimation based on the signed distance function (SDF).

A. Signed Distance Function
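The SDF of a given surface is a function $\psi: \mathbb{R}^3 \to \mathbb{R}$. This function returns, for any point $x \in \mathbb{R}^3$, the signed distance from $x$ to the surface. The SDF has four properties:
• If $x$ is outside the surface then $\psi(x) > 0$.
• If $x$ is inside the surface then $\psi(x) < 0$.
• If $x$ is on the surface then $\psi(x) = 0$.
• If $x$ is nearer the surface then $|\psi(x)|$ is smaller.

B. Affine Transformation
An affine transformation consists of two components: a $3 \times 3$ rotation matrix $R_i$ and a three-dimensional translation vector $t_i$. A point $x$ of the local point cloud at the $i$-th frame is transferred to global coordinates as $R_i x + t_i$. We must find $R_i$ and $t_i$ such that the error function

$$E(R_i, t_i) = \sum_{j} \psi(R_i x_{ij} + t_i)^2$$

is as small as possible, where $x_{ij}$ is the $j$-th point of the local point cloud at the $i$-th frame. As written, the function consists of 12 parameters. However, the problem only requires a rotation and a translation, which can be described by 6 parameters: three for the rotation $(\omega_1, \omega_2, \omega_3)$ and three for the translation $(v_1, v_2, v_3)$. Therefore, the transformation can be written as a 6-dimensional vector $\xi_i = (\omega_1, \omega_2, \omega_3, v_1, v_2, v_3)$ and the error as $E(\xi_i)$. To minimize this function, Bylow et al. [4,5] used the Gauss-Newton algorithm.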
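To make the six-parameter pose concrete, the following minimal sketch converts a vector of three rotation and three translation parameters into a rotation matrix and a translation, and evaluates a sum of squared SDF values for a set of points. Several things here are assumptions for illustration and not taken from the paper: the rotation parameters are treated as an axis-angle vector (Rodrigues' formula), the translation parameters are used directly as $t$ (a simplification of the full twist exponential), and `sphere_sdf` is a toy analytic SDF standing in for the voxel-based one.

```python
import math

def rotation_from_axis_angle(w):
    """Rodrigues' formula: rotation matrix for the rotation vector w = (w1, w2, w3)."""
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in w)
    s, c = math.sin(theta), math.cos(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]

def transform(xi, p):
    """Apply the pose xi = (w1, w2, w3, v1, v2, v3) to point p: R p + t."""
    R = rotation_from_axis_angle(xi[:3])
    t = xi[3:]  # assumption: translation parameters are used directly as t
    return [sum(R[r][k] * p[k] for k in range(3)) + t[r] for r in range(3)]

def sphere_sdf(x, radius=1.0):
    """Toy SDF: positive outside the sphere, negative inside, zero on it."""
    return math.sqrt(sum(c * c for c in x)) - radius

def sdf_error(xi, points, sdf=sphere_sdf):
    """Sum of squared SDF values of the transformed points."""
    return sum(sdf(transform(xi, p)) ** 2 for p in points)

points = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
print(sdf_error([0.0] * 6, points))                        # points lie on the sphere
print(sdf_error([0.0, 0.0, 0.3, 0.0, 0.0, 0.0], points))   # pure rotation about z
print(sdf_error([0.0, 0.0, 0.0, 0.2, 0.0, 0.0], points))   # translation along x
```

Because a sphere is rotationally symmetric, only the translated pose produces a noticeable error in this toy setup; a real object surface would constrain all six parameters.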

C. Update the SDF and the colors
The SDF is not given by a closed-form formula; instead, it is formed by dividing 3D space into a grid. Each node of the 3D grid is called a voxel. If a point $x$ does not coincide with a voxel, the SDF value of $x$ is obtained from the SDF values of the nearest neighboring voxels. So, the objective in this step is to compute the SDF for each voxel.
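The lookup of an SDF value from neighboring voxels can be sketched as follows. This is a minimal illustration, assuming trilinear interpolation over the eight surrounding voxels (the text only says that the nearest neighboring voxels are used); the `VoxelGrid` class, its parameters, and the toy `plane_sdf` are hypothetical names introduced here.

```python
import math

class VoxelGrid:
    """Dense grid of SDF samples; voxel (i, j, k) sits at origin + size * (i, j, k)."""

    def __init__(self, dims, voxel_size, origin, sdf):
        self.dims, self.size, self.origin = dims, voxel_size, origin
        nx, ny, nz = dims
        self.values = [[[sdf(self.position(i, j, k)) for k in range(nz)]
                        for j in range(ny)] for i in range(nx)]

    def position(self, i, j, k):
        return [self.origin[a] + self.size * idx for a, idx in enumerate((i, j, k))]

    def lookup(self, x):
        """Trilinear interpolation of the eight voxels surrounding point x."""
        g = [(x[a] - self.origin[a]) / self.size for a in range(3)]
        # Clamp the base cell so that points outside the grid use the border cell.
        base = [min(max(int(math.floor(g[a])), 0), self.dims[a] - 2) for a in range(3)]
        f = [g[a] - base[a] for a in range(3)]
        total = 0.0
        for di in (0, 1):
            for dj in (0, 1):
                for dk in (0, 1):
                    weight = ((f[0] if di else 1 - f[0]) *
                              (f[1] if dj else 1 - f[1]) *
                              (f[2] if dk else 1 - f[2]))
                    total += weight * self.values[base[0] + di][base[1] + dj][base[2] + dk]
        return total

def plane_sdf(p):
    """Toy surface: the plane z = 0.5, positive above it."""
    return p[2] - 0.5

grid = VoxelGrid(dims=(4, 4, 4), voxel_size=0.25, origin=(0.0, 0.0, 0.0), sdf=plane_sdf)
print(grid.lookup([0.3, 0.1, 0.6]))   # interpolated distance to the plane
```

Trilinear interpolation reproduces a linear field such as `plane_sdf` exactly (up to rounding), which makes the toy plane a convenient sanity check.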

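Assume that $v^G$ is the global coordinate of a voxel. Based on the estimated pose $T_i = (R_i, t_i)$, we can transfer $v^G$ to the local coordinate $v^L = (x, y, z)$ of the camera at frame $i$:

$$v^L = R_i^{\top}(v^G - t_i).$$

According to the camera model, with focal lengths $f_x$ and $f_y$ and principal point $(c_x, c_y)$, we can project the 3D point $(x, y, z)$ onto the image plane:

$$(i, j) = \left(\frac{f_x x}{z} + c_x,\; \frac{f_y y}{z} + c_y\right).$$

Let $(i, j)$ be the pixel coordinate of the projected point $v^L$ in the image and $I_d(i, j)$ be the corresponding depth value at $(i, j)$. We can then compute the distance $d(v^L)$ as the difference between the depth $z$ of the voxel and the depth value $I_d(i, j)$.

Since the distance $d(v^L)$ is a rough approximation that can be arbitrarily wrong, we follow the standard approach of reducing the impact of bad measurements by truncating the measured distance when $|d| > \delta$ for some threshold $\delta$:

$$d^{\text{trunc}} = \begin{cases} -\delta & \text{if } d < -\delta, \\ d & \text{if } |d| \le \delta, \\ \delta & \text{if } d > \delta. \end{cases}$$

For each frame $k$, we can compute the distance $d_k$ of each voxel. The SDF value of a voxel is obtained as the weighted average of these truncated distances:

$$\psi = \frac{\sum_k w_k\, d_k^{\text{trunc}}}{\sum_k w_k}.$$

However, truncation alone is not enough to decrease the impact of bad measurements. We also have higher uncertainty when the voxel lies behind the surface. To handle this, each measurement is weighted by a weight function $w(d)$ that assigns lower weights to voxels behind the surface. With the accumulated SDF value $D$ and the accumulated weight $W$ of a voxel, we can update the SDF of each voxel as follows:

$$D \leftarrow \frac{W D + w(d)\, d^{\text{trunc}}}{W + w(d)}, \qquad W \leftarrow W + w(d).$$

From the RGB image, the color $(R, G, B)$ of each voxel is estimated in the same way from the color $(r, g, b)$ observed at pixel $(i, j)$:

$$(R, G, B) \leftarrow \frac{W_c\,(R, G, B) + w_c\,(r, g, b)}{W_c + w_c}, \qquad W_c \leftarrow W_c + w_c,$$

where the color weight of a new measurement is $w_c = w(d)\cos\theta$ and $\theta$ is the angle between the ray and the principal axis, which gives more weight to pixels whose normal points towards the key frame.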
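The truncation and weighted-average update described above can be sketched for a single voxel as follows. This is an illustration under stated assumptions, not the authors' implementation: the sign convention (positive $d$ in front of the observed surface, negative behind it), the exponential fall-off of `weight`, and the parameter values `delta`, `eps`, and `sigma` are all hypothetical choices; flip the comparisons if $d$ is defined with the opposite sign.

```python
import math

def truncate(d, delta):
    """Clamp the measured distance to the interval [-delta, delta]."""
    return max(-delta, min(delta, d))

def weight(d, delta, eps, sigma):
    """Hypothetical weight: 1 in front of or near the surface, decaying behind it."""
    if d > -eps:
        return 1.0
    if d >= -delta:
        return math.exp(-sigma * (d + eps) ** 2)
    return 0.0   # too far behind the surface: measurement is ignored

class Voxel:
    def __init__(self):
        self.D = 0.0   # running weighted average of truncated distances
        self.W = 0.0   # accumulated weight

    def integrate(self, d, delta=0.3, eps=0.025, sigma=100.0):
        """Fuse one distance measurement into the running average."""
        w = weight(d, delta, eps, sigma)
        if w == 0.0:
            return
        self.D = (self.W * self.D + w * truncate(d, delta)) / (self.W + w)
        self.W += w

voxel = Voxel()
for measurement in (0.1, 0.5, -1.0):   # the last one lies far behind the surface
    voxel.integrate(measurement)
print(voxel.D, voxel.W)
```

Storing only the running average `D` and the accumulated weight `W` means each new frame can be fused without keeping earlier measurements in memory.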

IV. WEIGHTING OF SIGNED DISTANCE FUNCTION
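To increase the accuracy of minimizing $E(\xi_i)$, we propose a weighting function $w(r_j)$ for the SDF, called WSDF, where $r_j = \psi(R_i x_{ij} + t_i)$ is the SDF residual of the $j$-th point $x_{ij}$ of the local point cloud at frame $i$ under the pose $\xi_i$. The weighting function $w(r_j)$ is defined as follows:

$$w(r_j) = \frac{\nu + 1}{\nu + (r_j / \sigma)^2}.$$

Points near the surface describe the shape of the object more accurately than points far from the surface. So $w(r_j)$ increases when $|r_j|$ decreases, which means that points near the surface receive higher weights than points far from the surface. We then have to find $\xi_i$ by solving the non-linear optimization problem

$$\xi_i = \arg\min_{\xi} \sum_j w(r_j)\,\big(\psi(R\, x_{ij} + t)\big)^2 .$$

We apply the Gauss-Newton method to solve this problem. The iteration starts from an initial estimate $\xi^{(0)}$, and the update $\Delta\xi$ at each loop is computed by the following formula:

$$\Delta\xi = -\left(J^{\top} W J\right)^{-1} J^{\top} W r, \qquad \xi^{(k+1)} = \xi^{(k)} + \Delta\xi,$$

where $r$ is the vector of residuals $r_j$, $J$ is the Jacobian matrix of $r$ with respect to $\xi$, and $W$ is the diagonal matrix whose main diagonal is formed by $w(r_j)$. We adopt $\nu = 5$ based on the experiment, and $\sigma^2$ at each loop is computed as follows:

$$\sigma^2 = \frac{1}{n} \sum_j r_j^2\, \frac{\nu + 1}{\nu + (r_j/\sigma)^2}.$$

The loop ends when $\lVert \xi^{(k+1)} - \xi^{(k)} \rVert$ falls below a small threshold or the number of loops reaches the limit. At the end of the process, the transformation $T_i$ is computed from the 6-dimensional vector $\xi_i$. Then, we update the SDF to prepare for the next frame.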
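A weighted Gauss-Newton loop of this kind can be sketched on a deliberately small problem. The example below is a toy under several assumptions that are not from the paper: the pose is reduced to a 2-D translation (instead of six parameters), the SDF is an analytic unit circle, the weight is taken to be a t-distribution weight with 5 degrees of freedom, and its scale is re-estimated by a short fixed-point iteration in each loop. All function names are hypothetical.

```python
import math

NU = 5.0  # assumed degrees of freedom of the t-distribution weight

def circle_sdf(p, radius=1.0):
    return math.hypot(p[0], p[1]) - radius

def circle_sdf_gradient(p):
    n = math.hypot(p[0], p[1])
    return (p[0] / n, p[1] / n)

def estimate_sigma2(residuals, iterations=10):
    """Fixed-point iteration for the scale sigma^2 of the weights."""
    sigma2 = sum(r * r for r in residuals) / len(residuals) + 1e-12
    for _ in range(iterations):
        sigma2 = sum(r * r * (NU + 1.0) / (NU + r * r / sigma2)
                     for r in residuals) / len(residuals) + 1e-12
    return sigma2

def gauss_newton_step(t, points):
    """One weighted step: delta = -(J^T W J)^-1 J^T W r for a 2-D translation t."""
    moved = [(p[0] + t[0], p[1] + t[1]) for p in points]
    r = [circle_sdf(q) for q in moved]
    J = [circle_sdf_gradient(q) for q in moved]   # d(residual)/d(translation)
    sigma2 = estimate_sigma2(r)
    w = [(NU + 1.0) / (NU + rj * rj / sigma2) for rj in r]
    # Entries of the symmetric 2x2 matrix J^T W J and of the vector J^T W r.
    a = sum(wj * g[0] * g[0] for wj, g in zip(w, J))
    b = sum(wj * g[0] * g[1] for wj, g in zip(w, J))
    c = sum(wj * g[1] * g[1] for wj, g in zip(w, J))
    gx = sum(wj * g[0] * rj for wj, g, rj in zip(w, J, r))
    gy = sum(wj * g[1] * rj for wj, g, rj in zip(w, J, r))
    det = a * c - b * b
    return (-(c * gx - b * gy) / det, -(a * gy - b * gx) / det)

def estimate_translation(points, max_loops=50, tol=1e-10):
    t = (0.0, 0.0)
    for _ in range(max_loops):
        dx, dy = gauss_newton_step(t, points)
        t = (t[0] + dx, t[1] + dy)
        if math.hypot(dx, dy) < tol:   # stop when the update becomes negligible
            break
    return t

# Points on the unit circle, shifted by (-0.2, 0.1); the estimate should undo the shift.
angles = [2.0 * math.pi * k / 12 for k in range(12)]
points = [(math.cos(a) - 0.2, math.sin(a) + 0.1) for a in angles]
print(estimate_translation(points))
```

The loop mirrors the two stopping rules in the text: a small update or a maximum number of iterations. Extending the sketch to six parameters would replace the 2x2 solve with a 6x6 linear system.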

V. EXPERIMENTS

A. Dataset
We evaluated our approach on the TUM 3D object reconstruction RGB-D benchmark dataset [12]. In this work, we use the plant and teddy3 objects to measure the errors of our approach. Fig. 5 shows some examples from the TUM dataset.

B. Measurement Evaluation

1) Relative pose error (RPE)
The relative pose error [8] measures the local accuracy of the trajectory over a fixed time interval $\Delta$. Therefore, the relative pose error corresponds to the drift of the trajectory, which is particularly useful for the evaluation of visual odometry systems. With estimated poses $P_1, \dots, P_n$ and ground-truth poses $Q_1, \dots, Q_n$, we define the relative pose error at time step $i$ as follows:

$$E_i = \left(Q_i^{-1} Q_{i+\Delta}\right)^{-1} \left(P_i^{-1} P_{i+\Delta}\right).$$

From a sequence of $n$ camera poses, we obtain in this way $m = n - \Delta$ individual relative pose errors along the sequence. From these errors, we compute the root mean squared error (RMSE) over all time indices of the translational component as follows:

$$\mathrm{RMSE}(E_{1:n}, \Delta) = \left(\frac{1}{m} \sum_{i=1}^{m} \lVert \mathrm{trans}(E_i) \rVert^2\right)^{1/2},$$

where $\mathrm{trans}(E_i)$ refers to the translational components of the relative pose error $E_i$.
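As a concrete illustration of the RPE computation, the following sketch represents each pose as a pair of a 3x3 rotation matrix and a translation vector and evaluates the translational RMSE over a chosen interval. The pose representation, the function names, and the toy trajectories are assumptions made for this example only.

```python
import math

def compose(a, b):
    """Product of two rigid poses given as (R, t): (Ra Rb, Ra tb + ta)."""
    Ra, ta = a
    Rb, tb = b
    R = [[sum(Ra[i][k] * Rb[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    t = [sum(Ra[i][k] * tb[k] for k in range(3)) + ta[i] for i in range(3)]
    return R, t

def inverse(a):
    """Inverse of a rigid pose: (R^T, -R^T t)."""
    R, t = a
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    return Rt, [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]

def relative_pose_errors(P, Q, delta=1):
    """Relative pose errors of estimated poses P against ground-truth poses Q."""
    errors = []
    for i in range(len(P) - delta):
        gt_motion = compose(inverse(Q[i]), Q[i + delta])
        est_motion = compose(inverse(P[i]), P[i + delta])
        errors.append(compose(inverse(gt_motion), est_motion))
    return errors

def rpe_rmse(P, Q, delta=1):
    """RMSE of the translational part of the relative pose errors."""
    errors = relative_pose_errors(P, Q, delta)
    return math.sqrt(sum(sum(c * c for c in t) for _, t in errors) / len(errors))

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Ground truth moves 1.0 per frame along x; the estimate moves 1.1 per frame.
Q = [(IDENTITY, [1.0 * k, 0.0, 0.0]) for k in range(5)]
P = [(IDENTITY, [1.1 * k, 0.0, 0.0]) for k in range(5)]
print(rpe_rmse(P, Q, delta=1))
print(rpe_rmse(P, Q, delta=2))
```

In this toy trajectory the per-frame drift is constant, so the RPE grows with the interval: a larger `delta` accumulates more drift between the two compared poses.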

2) Absolute trajectory error (ATE)
The absolute trajectory error [8] measures the global consistency of the trajectory, which can be evaluated by comparing the absolute distances between the estimated and the ground-truth trajectories. As both trajectories can be specified in arbitrary coordinate frames, they first need to be aligned. This can be achieved in closed form using the method of Horn [1], which finds the rigid-body transformation $S$ corresponding to the least-squares solution that maps the estimated trajectory $P_{1:n}$ onto the ground-truth trajectory $Q_{1:n}$. Given this transformation, the absolute trajectory error at time step $i$ can be computed as follows:

$$F_i = Q_i^{-1} S P_i.$$

Similar to the relative pose error, we evaluate the root mean squared error over all time indices of the translational components as follows:

$$\mathrm{RMSE}(F_{1:n}) = \left(\frac{1}{n} \sum_{i=1}^{n} \lVert \mathrm{trans}(F_i) \rVert^2\right)^{1/2},$$

where $\mathrm{trans}(F_i)$ refers to the translational components of the absolute trajectory error $F_i$.
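The align-then-compare structure of the ATE can be illustrated with a planar toy example. Note the substitution: instead of Horn's 3-D quaternion method, this sketch uses the closed-form least-squares rigid alignment for 2-D points, which is the planar analogue, and it works on camera positions only. This relies on the fact that the norm of the translational part of the absolute trajectory error equals the distance between the aligned estimated position and the ground-truth position. Function names and trajectories are hypothetical.

```python
import math

def align_2d(est, gt):
    """Least-squares rigid alignment (rotation + translation) of planar points."""
    n = len(est)
    ce = (sum(p[0] for p in est) / n, sum(p[1] for p in est) / n)
    cg = (sum(q[0] for q in gt) / n, sum(q[1] for q in gt) / n)
    dot = cross = 0.0
    for p, q in zip(est, gt):
        px, py = p[0] - ce[0], p[1] - ce[1]
        qx, qy = q[0] - cg[0], q[1] - cg[1]
        dot += px * qx + py * qy
        cross += px * qy - py * qx
    theta = math.atan2(cross, dot)           # optimal rotation angle
    c, s = math.cos(theta), math.sin(theta)
    tx = cg[0] - (c * ce[0] - s * ce[1])     # translation maps centroid to centroid
    ty = cg[1] - (s * ce[0] + c * ce[1])
    return theta, (tx, ty)

def ate_rmse(est, gt):
    """RMSE of position differences after aligning est onto gt."""
    theta, (tx, ty) = align_2d(est, gt)
    c, s = math.cos(theta), math.sin(theta)
    total = 0.0
    for p, q in zip(est, gt):
        ax, ay = c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty
        total += (ax - q[0]) ** 2 + (ay - q[1]) ** 2
    return math.sqrt(total / len(est))

def rotate_shift(p, angle, shift):
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1] + shift[0], s * p[0] + c * p[1] + shift[1])

gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
# Same trajectory expressed in a different coordinate frame: the ATE should vanish.
est_same = [rotate_shift(q, -0.5, (3.0, -1.0)) for q in gt]
# One position perturbed: the ATE becomes positive.
est_noisy = list(est_same)
est_noisy[2] = (est_noisy[2][0] + 0.3, est_noisy[2][1])
print(ate_rmse(est_same, gt))
print(ate_rmse(est_noisy, gt))
```

The first trajectory differs from the ground truth only by a change of coordinate frame, so the alignment step removes the difference entirely; only genuine trajectory errors, as in the second case, remain after alignment.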

C. Experimental Results
We first evaluate our proposed approach on the benchmark objects in the TUM dataset. Then we compare our experimental results with the state-of-the-art methods to demonstrate the effectiveness and robustness of the proposed method.
In this research, we focus on camera motion estimation for 3D object reconstruction. Our approach is based on object segmentation and the SDF in RGB-D data. More specifically, we use the depth data to segment the object, propose a weighting function for the SDF, and solve the resulting minimization problem with the Gauss-Newton method. We evaluate our method by ATE and RPE, which evaluate global and local consistency, respectively. Moreover, we also evaluate many different time intervals to gain a deeper understanding of the camera motion estimation problem.
Tables I and II give our experimental results on the plant and teddy3 objects. The same approach yields different results on different objects because of the different characteristics of these sequences: the plant sequence has slower movement than the teddy3 sequence, and the teddy3 object has a more complex surface structure than the plant object.
Tables III, IV, V and VI compare our experimental results with state-of-the-art results on the TUM dataset. We achieve better results than Bylow's approach on both the plant and teddy3 objects. Our method is more effective in terms of both global and local consistency (see Fig. 6). These results show that our approach is robust for camera motion estimation. We attribute these promising results to updating the SDF with the weighting function, which yields higher accuracy when estimating the motion between two consecutive frames.

VI. CONCLUSION
In this work, we present a novel approach to camera motion estimation based on the SDF for 3D object reconstruction using RGB-D data. To segment the object, we apply a threshold method to the depth data. To estimate camera motion, we propose a weighting function that is added to the SDF, called WSDF, to improve the performance of the camera motion estimation phase. The WSDF is minimized by the Gauss-Newton method. We systematically evaluate our approach on a benchmark dataset. The experiments are measured by both ATE and RPE, which assess the global and local consistency of the camera motion estimation. The experimental results show that our proposed approach achieves superior performance to the state-of-the-art algorithm on the TUM dataset. In the future, we will consider SIFT or SIFT flow for camera motion estimation based on RGB data to further improve the performance of the system.

Fig. 1. Illustration of 3D camera and RGB-D data: a) Microsoft Kinect device; b) an example object in RGB-D data captured by Kinect

Fig. 3. An example of camera motion estimation

Fig. 4. Illustration of SDF for an object's surface

Fig. 5. Some RGB and depth frames from the TUM dataset

TABLE VI. COMPARISON WITH THE STATE-OF-THE-ART METHOD ON TEDDY3 OBJECT ON RPE