Vision-based Indoor Localization Algorithm using Improved ResNet

The output of a residual network fluctuates greatly with changes in the weight parameters, which significantly degrades its performance. To deal with this problem, an improved residual network is proposed. Based on the classical residual network, batch normalization, an adaptive dropout function and a new loss function are added to the proposed model. Batch normalization is applied to avoid vanishing/exploding gradients. Adaptive dropout is applied to increase the stability of the model; different dropout distributions are selected adaptively by adjusting a parameter. The new loss function combines the cross-entropy loss and the center loss to enhance inter-class dispersion and intra-class aggregation. The proposed model is applied to the indoor positioning of a mobile robot in a factory environment. The experimental results show that the algorithm achieves high indoor positioning accuracy even with a small training dataset. In the real-time positioning experiment, the accuracy reaches 95.37%.

Keywords—Deep learning; residual network; loss function; dropout; indoor localization


I. INTRODUCTION
With the development of artificial intelligence technology, various types of robots have come into wide use. In mobile robot applications, detecting and monitoring the robot's location in real time is a prerequisite for serving human beings well. For indoor localization tasks, Wi-Fi based methods [1], Bluetooth based methods [2] and Radio Frequency Identification (RFID) technology [3] have been proposed and widely used. However, each has bottlenecks: Wi-Fi-based methods are vulnerable to multi-path effects, Bluetooth-based methods suffer from mutual interference, and RFID-based methods require expensive equipment. Vision-based methods [4][5], which can achieve real-time positioning with only an ordinary RGB camera, avoid all of these bottlenecks and provide a new way to perform indoor positioning.
In recent years, deep learning technology has developed rapidly and is widely used in image processing, especially in image classification tasks. Compared with many traditional algorithms, deep learning, which uses massive training datasets to learn prior knowledge, has stronger generalization ability and richer parametric expression.
In 2012, Hinton et al. [6] demonstrated the outstanding performance of AlexNet, with five convolutional layers and three fully connected layers, in the ImageNet image classification competition, and more and more scholars began to study convolutional neural networks to solve practical problems. It was found that accuracy can be improved by increasing the depth of a CNN (Convolutional Neural Network): the deeper the network, the more features it can obtain, the stronger its expressive ability, and the more abstract the semantic features it can extract [7][8][9][10]. However, simply increasing the number of layers leads to gradient vanishing, gradient explosion and model degradation. In 2016, He et al. proposed the 152-layer ResNet [11], in which a residual structure is used inside the deep network. ResNet solves the degradation problem; the residual structure makes the model easier to optimize and yields good training results even with a smaller training dataset. However, the learning results of the network are very sensitive to fluctuations of the network weights: a slight change of the weights causes a large change of the output, and this shortcoming harms the model during both training and testing. In [12][13][14][15][16], a series of improvements was made to ResNet, but none of them solves this problem well.
In order to solve the stability problem of ResNet, an improved residual network is proposed. Based on the classical residual network, batch normalization, an adaptive dropout function and a new loss function are added to the proposed model. Batch normalization is applied to avoid vanishing/exploding gradients. Adaptive dropout is applied to increase the stability of the model; different dropout distributions are selected adaptively by adjusting a parameter. The new loss function combines the cross-entropy loss and the center loss to enhance inter-class dispersion and intra-class aggregation. The proposed model is applied to the indoor positioning of a mobile robot in a factory environment. The experimental results show that the algorithm achieves high indoor positioning accuracy even with a small training dataset; in the real-time positioning experiment, the accuracy reaches 95.37%.

II. THE IMPROVED RESNET

We use a 50-layer residual network in our task. To enhance the performance of the model, a batch normalization layer, an adaptive dropout layer and an improved loss function are added. The structure of the improved ResNet is shown in Table I. It is composed of an image-preprocessing convolutional layer conv1, convolutional stages conv2_x, conv3_x, conv4_x and conv5_x, and a fully connected layer conv6. Each block in a stage consists of three convolutional layers, and the blocks are repeated 3, 4, 6 and 3 times, respectively. Batch normalization layers are placed before and after each block, and the residual structure is applied in each block. Between conv5_x and conv6, average pooling is applied to extract deep image features and adaptive dropout is applied to simplify the network. After conv6, the loss function is applied, and the weight parameters are adjusted by stochastic gradient descent (SGD) on the loss with back-propagation, using a mini-batch size of 256.
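As a rough sketch (illustrative only, not the authors' code), the stage layout described above can be written as a simple configuration, which also confirms the 50-layer count: one preprocessing convolution, three convolutions per bottleneck block, and one fully connected layer.

```python
# Stage layout of the improved ResNet-50 as described in the text.
# Counts follow the paper; this is an illustrative sketch.
stages = {
    "conv2_x": 3,  # each block contains three convolutional layers
    "conv3_x": 4,
    "conv4_x": 6,
    "conv5_x": 3,
}

def total_layers(stages):
    # conv1 (preprocessing) + 3 convs per block + conv6 (fully connected)
    return 1 + 3 * sum(stages.values()) + 1

print(total_layers(stages))  # 50
```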
The learning rate starts at 0.1 and is divided by 10 whenever the error rate stops falling. There are 18 localization centers, so the network outputs 18 class scores, and the predicted location is the one with the largest score.
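The final prediction step can be sketched as follows, assuming (as is standard for cross-entropy classifiers, though not spelled out in the text) that the 18 scores are passed through a softmax and the largest one is taken:

```python
import math

def softmax(logits):
    # Numerically stable softmax over the 18 location scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predict_region(logits):
    # The predicted location is the index of the largest score;
    # softmax is monotone, so this matches the largest logit.
    probs = softmax(logits)
    return max(range(len(probs)), key=lambda i: probs[i])

scores = [0.1] * 18
scores[7] = 2.5  # hypothetical network output favouring region 7
print(predict_region(scores))  # 7
```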
The residual structure is shown in Fig. 1, where x is the input of the convolutional block and the output of the block is F(x) + x. The residual mapping F(x) is easier to optimize than an unreferenced mapping, and the identity branch x is passed to the next block directly, so the block is easier to learn and back-propagation proceeds more easily.
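The skip connection can be illustrated with a minimal sketch (the real F is a stack of three convolutional layers; here a toy function stands in for it):

```python
def residual_block(x, f):
    # y = F(x) + x : the block learns only the residual F(x), while the
    # identity branch passes x straight through to the next block.
    return [fi + xi for fi, xi in zip(f(x), x)]

# Toy residual mapping standing in for the three conv layers.
double = lambda v: [2.0 * vi for vi in v]
print(residual_block([1.0, -2.0, 3.0], double))  # [3.0, -6.0, 9.0]
```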

A. Batch Normalization
Batch normalization regulates the input of each layer into a reasonable range, which avoids the vanishing/exploding gradients caused by increasing the depth of a deep neural network.
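A minimal sketch of the normalization step for a single activation across a mini-batch (the learnable scale gamma and shift beta are standard batch-normalization parameters, not specific to this paper):

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize one activation across the mini-batch to zero mean and
    # unit variance, then scale and shift with learnable gamma, beta.
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

out = batch_norm([1.0, 2.0, 3.0, 4.0])
```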

B. Dropout
As depth increases, a deep neural network easily over-fits; dropout [17][18] is a commonly used technique to alleviate this problem. The method discards neural network nodes with a certain probability during training, i.e., sets their activation values to zero. To enhance the stability of the model, adaptive dropout is applied: by adjusting its parameter, different dropout distributions can be generated.
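For reference, standard (inverted) Bernoulli dropout can be sketched as below; the paper's adaptive variant additionally chooses among dropout distributions via a tunable parameter, which this sketch does not reproduce.

```python
import random

def dropout(activations, p, training=True, rng=random):
    # Inverted dropout: during training, zero each activation with
    # probability p and rescale survivors by 1/(1-p) so the expected
    # value is unchanged; at test time the layer is the identity.
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]
```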

C. Loss Function
In an indoor localization task, the image features of adjacent location points are similar; that is, the spacing between different classes is very small. In order to increase the spacing between classes and reduce the spacing within a class, a loss function combining the center loss and the cross-entropy loss is applied: L = Ls + λLc, where Lc is the center loss, Ls is the cross-entropy loss, and λ is a weight balancing the two. The structure of our loss function is shown in Fig. 2.
The cross-entropy loss function can be found in [17]. For the center loss, a class center is established in feature space for each class; the loss is the sum of the distances between each sample's features and the features of its class center.
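The combined loss can be sketched numerically as follows. The value of λ and the 0.5 factor in the center term are conventional choices from the center-loss literature, assumed here rather than taken from the paper:

```python
import math

def cross_entropy(probs, label):
    # L_s: negative log-probability of the true class.
    return -math.log(probs[label])

def center_loss(feature, centers, label):
    # L_c: squared distance between the sample's feature vector and the
    # (learned) center of its class in feature space.
    c = centers[label]
    return 0.5 * sum((f - ci) ** 2 for f, ci in zip(feature, c))

def total_loss(probs, feature, centers, label, lam=0.1):
    # L = L_s + lambda * L_c; lam balances the two terms (assumed value).
    return cross_entropy(probs, label) + lam * center_loss(feature, centers, label)

# Hypothetical sample: two classes, 2-D feature space.
loss = total_loss([0.7, 0.3], [1.0, 0.0], {0: [0.0, 0.0], 1: [5.0, 5.0]}, 0)
```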

III. EXPERIMENTS

A. ImageNet Classification
To verify the superiority of our algorithm, the ImageNet 2012 classification dataset [19], which includes 1.3 million images in 1000 classes, is used. Our improved ResNet is trained on 1.15 million images, validated on 50k images and finally tested on 100k images.
The training and test error rates of the classical ResNet and of our improved ResNet are shown in Fig. 3. The curves of the classical ResNet fluctuate badly on both the training and test sets, while the curves of the improved ResNet change smoothly as the number of iterations increases.
Top-1 and top-5 error rates of ResNet50 and of our improved ResNet50 are shown in Table II; the error rates of the improved ResNet are slightly lower than those of ResNet.

B. Indoor Localization
Indoor localization experiments were conducted in a factory environment containing 18 regions with their location centers. We constructed a dataset [24] in this environment to test our algorithm; it contains 1800 samples labeled with location information, with 100 samples per location point taken from different shooting angles.
A confusion matrix is employed to describe the localization results; 30 images from each location region were used for testing, and the results can be seen in Fig. 6.
As Fig. 6(a) shows, method 1 (the classical ResNet) gives 504 correct classifications and 36 errors, an accuracy of 93.33%. Most errors occur in the middle regions of the scene, because the location features of nearby regions are similar and hard to distinguish: when the input features fluctuate, the output drifts to a wrong location. With our improved ResNet the output is more stable under changes of the input features, so the accuracy increases. In Fig. 6(b), the improved ResNet gives 515 correct classifications and 25 errors, an accuracy of 95.37%, an increase of 2.04%.

IV. CONCLUSION

An improved residual network is proposed in this paper to enhance the stability of the classical ResNet. Based on the classical residual network, batch normalization, an adaptive dropout function and a new loss function are added to the proposed model. Batch normalization is applied to avoid vanishing/exploding gradients. Adaptive dropout is applied to increase the stability of the model; different dropout distributions are selected adaptively by adjusting a parameter. The new loss function combines the cross-entropy loss and the center loss to enhance inter-class dispersion and intra-class aggregation. The improved ResNet50 is then applied to the indoor positioning of a mobile robot in a factory environment. The experimental results show that the algorithm achieves high indoor positioning accuracy even with a small training dataset. Future work will focus on the adaptation and improvement of other neural networks to raise the accuracy of the indoor localization system.