Support Vector Regression based Localization Approach using LoRaWAN

The Internet of Things (IoT) domain has experienced significant growth in recent times. There has been extensive research conducted in various areas of IoT, including localization. Localization of Long Range (LoRa) nodes in outdoor environments is an important task for various applications, including asset tracking and precision agriculture. In this research article, a localization approach using Support Vector Regression (SVR) has been implemented to predict the location of the end node using LoRaWAN. The experiments are conducted in the outdoor campus environment. The SVR used the Received Signal Strength Indicator (RSSI) fingerprints to locate the end nodes. The results show that the proposed method can locate the end node with a minimum error of 36.26 meters and a mean error of 171.59 meters.


I. INTRODUCTION
In 2016, everything appeared to spin around the development of the Internet of Things (IoT), where anything from vehicles to washroom scales is connected to the internet to offer additional services to customers [1]. However, it is most likely that the industry applications evolving from machine-to-machine (M2M) technologies are the main driving force for IoT. IoT is the evolution of M2M communications, where a larger number of nodes are connected using ethernet in the backend to reroute the data as needed. This is a crucial step towards creating smart city applications and the fourth industrial revolution, where experts say that the physical, digital, and biological boundaries will blur in industries [1]. IoT companies are trying to launch their solution for networks because of the increasing demand for Machine-to-Machine communication. While Machine-to-Machine largely depends upon 2G networks for deployment, the IoT emerged with entirely different requirements, such as low costs for the IoT chips and the dense deployment of nodes on a single cell [2] [3].
The deployment of IoT has increased the demand for finding the locations of the end devices. It is crucial in the field of IoT to have the localization done with low power and long range [4][5] [6] [7]. This can be accomplished by implementing low-power wide area network (LPWAN) technologies. Long Range Wide Area Network (LoRaWAN), an LPWAN technology, can provide location-based services with low power consumption and long range.
Localization using LoRaWAN can be performed using multiple techniques. The simplest is the trilateration technique [8] [9]. This technique uses at least three gateways to find the location of the end node. It uses the received signal strength indicator (RSSI) to determine the distance between the gateways and the end node and then applies the trilateration algorithm to find the end node's location. The second approach is to find the angle of the received signal at the receiving antenna; the angle of arrival (AoA) technique uses that angle to find the location of the end node [10][11] [12]. The third approach is time-based. Time of Arrival (ToA) [13] [14][15] [16] and Time Difference of Arrival (TDoA) [17] are the two types of time-based techniques. They use the time the signal takes to reach the receiver, convert this time to a distance, and then perform the localization. The final and most accurate approach is fingerprinting. This approach has two phases, the offline phase and the online phase. In the offline phase, the measurements are taken and uploaded to a database. In the online phase, the location of the end node is predicted using machine learning algorithms trained on the data collected in the offline phase. Multiple machine learning algorithms can be used to find the location of the end nodes. Depending upon the application, classifiers [18] [19][20] or regression-based algorithms [21][22] [23] are used. If it is enough to determine the region or area where the end node is located, classification algorithms are the simpler choice. If the exact location of the end node is needed, regression-based algorithms can be used to estimate its coordinates.
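As an illustration of the RSSI-based trilateration described above, the following Python sketch converts RSSI to distance and solves for a planar position from three gateways. It is only a sketch under stated assumptions: the log-distance path-loss model and its parameters (`rssi_at_d0`, `path_loss_exp`, `d0`) are assumed for illustration and are not taken from this study, and the function names are hypothetical.

```python
import math

def rssi_to_distance(rssi_dbm, rssi_at_d0=-40.0, path_loss_exp=2.7, d0=1.0):
    """Invert an assumed log-distance model: RSSI = RSSI(d0) - 10*n*log10(d/d0)."""
    return d0 * 10 ** ((rssi_at_d0 - rssi_dbm) / (10.0 * path_loss_exp))

def trilaterate(gateways, distances):
    """Estimate (x, y) in metres from three gateway positions and three ranges.

    Subtracting the first circle equation from the other two yields a
    2x2 linear system, which is solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = gateways
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("gateways are collinear; the position is not unique")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

With noisy RSSI the three circles rarely intersect in a single point, so practical systems usually use more gateways and a least-squares fit; the closed form above only shows the geometric idea.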
In this research article, we have implemented the fingerprinting approach to find the location of the end node. Firstly, the measurements are taken in the campus outdoor environment. Using those measurements, the location of the end node is predicted using support vector regression (SVR) to find the overall distance error. The performance of SVR is studied in the areas where the shadowing effect has its maximum presence.
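The two phases of fingerprinting can be sketched in a few lines of Python. The sketch below deliberately swaps in a plain k-nearest-neighbour estimator instead of SVR, because it makes the offline database and the online lookup explicit; the function name `knn_locate` and the data layout are hypothetical and are not the implementation used in this work.

```python
import math

def knn_locate(database, query_rssi, k=3):
    """Average the coordinates of the k stored fingerprints closest to query_rssi.

    database: list of (rssi_vector, (lat, lon)) pairs collected in the offline phase.
    query_rssi: RSSI vector observed in the online phase (same length as stored ones).
    """
    # Rank stored fingerprints by Euclidean distance in RSSI space.
    ranked = sorted(database, key=lambda entry: math.dist(entry[0], query_rssi))
    nearest = ranked[:k]
    lat = sum(pos[0] for _, pos in nearest) / len(nearest)
    lon = sum(pos[1] for _, pos in nearest) / len(nearest)
    return lat, lon
```

Averaging latitudes and longitudes directly is a reasonable approximation only over a small area such as a campus.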
From the technical problem evaluation perspective, the work is subdivided into the following sections. The literature review is discussed in Section II. In Section III, the research methodology is presented. Section IV presents the results. The conclusion is derived in the last section, along with the references.

II. LITERATURE REVIEW
Long Range Wide Area Network (LoRaWAN) technology is a key enabler of IoT technologies. It helps form a large-scale network connected to the internet at a very low cost because the range of LoRaWAN is high, requiring a minimal number of end devices and gateways to cover a large area. The technology offers long-range communication with very little power consumption, thereby increasing the battery life of the end devices. Study [24] used LoRaWAN in a real environment in Thailand to present an experimental performance evaluation. The authors found experimentally that the LoRaWAN range reaches up to 2 km in an outdoor rural environment and 55-100 m in an indoor environment. They pointed out that the range depends upon the properties of the antennas, such as height, gain, and directivity. Research [25] used the central business district of Melbourne, a high-density urban area, to present specific measurements to evaluate the performance of LoRaWAN. Their results show that communication is loss-free only within a radius of 200 m, while at around 600 m the communication is a total loss. It is not easy to obtain precise measurements. The authors in [26] showed that environmental temperature strongly affects communication: increasing the environmental temperature could turn perfect communication into an almost useless one. Therefore, it is important to consider the environment and its effect on LoRa signals to get good localization accuracy. The LoRaWAN technology offers an excellent option for IoT uses such as advanced agricultural irrigation systems and intelligent urban development initiatives. A single gateway can support thousands of end devices. Localization is significant for these LoRaWAN applications because a LoRaWAN network can contain several thousand devices.
Therefore, it is imperative to estimate each end device's location. An example is a set of temperature sensors placed in various urban areas to measure temperature fluctuations. Because this application can involve thousands of sensors, it is very tedious to program each sensor with its location.
A natural solution to this problem is to equip every sensor with GPS. While GPS can provide an accuracy of up to 10 m, adding a GPS tracker to every sensor or device increases the overall cost and power consumption [27]. Another problem with GPS is the lack of indoor coverage, as GPS signals suffer heavy losses when penetrating buildings. Therefore, it is important to find a solution for localization using LoRa. In-depth studies on LoRaWAN can be found in [28] [29] [30] [31].
The study [32] calculated the positioning errors by constructing RSSI fingerprint data for LoRaWAN and SigFox using k-NN. The accuracy obtained by the authors was 398.4 m. Their study used several gateways for measuring LoRa RSSI data.
Research [33] compared the fingerprinting and the range-based approaches. The authors concluded that the fingerprinting approach has a lower mean localization error than the range-based approach. The mean error was 340 m using the fingerprinting approach and 700 m using the range-based techniques. Similarly, [34] used k-NN, Extra Trees, and neural networks (NN) to find the location of the end node and obtained mean errors of 394 m, 379 m, and 357 m, respectively. The authors in [35] used an artificial neural network to find the end node's location and obtained a mean error of 381.8 m.
The study [36] compared linear regression methods, SVR, k-NN, weighted k-NN, and random forest, concluding that the random forest could perform with the minimum localization error of 340 m.
The research [37] used two layers to perform the localization. In the first layer, the authors used k-means clustering; in the second layer, the final position is estimated using the weighted kernel regression model. The authors were able to achieve a mean localization error of 346.03 m.
The literature provides valuable insights into LoRa localization. However, to the best of the author's knowledge, a significant gap still exists in the literature regarding the performance of SVR in areas where the shadowing effect is strongest.

III. METHODOLOGY
This section describes the hardware setup, the dataset, test point locations, and the methodology used. Fig. 1 shows the longitudes and the latitudes of the points where the RSSIs were measured. The gateway was placed on the rooftop of a lab at an elevation of 74 m above sea level, and the end node was moved to 14 random locations on the campus to measure the RSSI values. The height of the end device varied because it is challenging to keep its elevation constant across different distances and areas. All the measurements were taken outdoors; no indoor measurements were taken. The minimum and maximum distances between the gateway and the end node were 17 m and 1330 m, respectively. The distances between the end node and the gateway are calculated using Eq. (2). The Dragino LoRa shield served as the end device for the experiments, and the RisingHF (RHF2S008) acted as the gateway. The end node was powered by a portable battery bank, while the gateway was powered through Power over Ethernet (PoE) and included an integrated GPS module, making it convenient for the experiments to determine Differential TDoA. The notable features of the gateway are as follows.

A. LoRaWAN Setup
• The gateway supports 8 multi-spreading-factor uplink channels.
• The maximum output power is 27 dBm.
• The antenna gain is 3 dBi.
The experiments used the online public network server The Things Network as the network server. The LoRa shield transmits the data to the gateway, which then passes the data, along with metadata such as SNR, RSSI, and timestamps, to The Things Network. The collected data is then transferred to the computer, where Support Vector Regression (SVR) is applied to predict the location of the end node.
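To make the data path concrete, the sketch below pulls the RSSI, SNR, and timestamp out of one uplink message. It assumes the JSON layout used by current versions of The Things Network (an `uplink_message` object holding an `rx_metadata` list); this layout is an assumption and the exact export used in the experiments is not stated here. The function name and the sample message are hypothetical.

```python
import json

def extract_fingerprint(uplink_json):
    """Return (received_at, [(gateway_id, rssi, snr), ...]) for one uplink message."""
    msg = json.loads(uplink_json)["uplink_message"]
    rows = []
    for rx in msg.get("rx_metadata", []):
        gateway_id = rx.get("gateway_ids", {}).get("gateway_id")
        rows.append((gateway_id, rx.get("rssi"), rx.get("snr")))
    return msg.get("received_at"), rows

# Hypothetical message, trimmed to the fields read above (not measured data).
sample = json.dumps({
    "uplink_message": {
        "received_at": "2023-01-01T10:00:00Z",
        "rx_metadata": [
            {"gateway_ids": {"gateway_id": "campus-gw"}, "rssi": -97, "snr": 6.5}
        ],
    }
})
```

Calling `extract_fingerprint(sample)` yields the timestamp and one row per receiving gateway, which can then be appended to the fingerprint dataset.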

B. Support Vector Regression
SVR is a supervised machine learning model that works similarly to the support vector machine. It finds the best-fit line for the predictions. SVR differs from other regression models in that, instead of minimizing the difference between the actual and predicted values, it aims to find the best line that falls within a specified range, known as the threshold value. This threshold value is the space between the hyperplane and the boundary line. However, the time SVR needs for fitting increases rapidly with the number of samples, making it challenging to handle datasets with more than 10,000 data points [33].
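The role of the threshold can be illustrated with the epsilon-insensitive loss commonly associated with SVR, in which prediction errors smaller than the threshold cost nothing. The sketch below is a minimal illustration only; the name `epsilon` for the threshold and the function name are assumptions, and the threshold value used in this work is not stated.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Mean epsilon-insensitive loss: errors inside the tube of half-width epsilon are ignored."""
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)
```

Only the points that fall outside the tube contribute to the loss, which is why the fitted function depends on a subset of the training samples (the support vectors).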
There are a total of 21 features (RSSIs) for a single ground truth location to predict the location of the end node. In the preprocessing step, we standardized the data using the z-score and then applied the SVR. We used the Gaussian kernel function with a kernel scale of 1.1. The formula for the Gaussian kernel function is given in Eq. (1) [38].
$$G(x_j, x_k) = \exp\left(-\lVert x_j - x_k \rVert^2\right) \tag{1}$$
where $x_j$ and $x_k$ are the feature (RSSI) vectors of two observations.
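The preprocessing and kernel evaluation described above can be sketched as follows. This is an illustration under assumptions rather than the implementation used in the experiments: the sample standard deviation is used for the z-score, and the kernel scale is applied by dividing the feature vectors before the squared distance is taken, which is one common convention. The function names are hypothetical.

```python
import math
import statistics

def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit (sample) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.stdev(c) or 1.0 for c in columns]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def gaussian_kernel(x_j, x_k, kernel_scale=1.1):
    """Gaussian kernel exp(-||x_j - x_k||^2) after dividing both vectors by kernel_scale."""
    sq_dist = sum(((a - b) / kernel_scale) ** 2 for a, b in zip(x_j, x_k))
    return math.exp(-sq_dist)
```

The kernel equals 1 for identical vectors and decays towards 0 as the RSSI vectors move apart; a larger kernel scale makes the decay slower.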

C. Distance Error
The distance error between the ground truth location and the predicted location is calculated using Eq. (2), applied to the predicted longitudes and latitudes [39].
Here, n = R*m, where R is the Earth's radius, n is the distance between two points on Earth, and l and q are the latitudes and longitudes, respectively. Fig. 2 shows the RSSI values at different distances. The results were obtained using a single gateway and a single node. The RSSIs were measured at 14 locations, some of which were chosen to make the shadowing effect more significant. At each location, 21 readings were taken.
Fig. 2. RSSI values at different distances.
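A distance of the form n = R*m between two latitude/longitude pairs is commonly computed with the haversine formula, where m is the central angle between the two points. The sketch below assumes that form; the constant `EARTH_RADIUS_M` and the function name are assumptions for illustration and may differ from the exact expression in Eq. (2).

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    a = min(1.0, a)  # guard against rounding slightly above 1
    central_angle = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))  # "m" in n = R*m
    return radius * central_angle
```

The same function serves both purposes mentioned in the text: the gateway-to-node distance and the error between a predicted and a ground truth location.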

IV. RESULTS AND DISCUSSIONS
As shown in Fig. 2, the RSSI decreased as the distance increased, with some exceptions at the locations where the shadowing effect was more significant. This can be seen in Fig. 3, which plots the average RSSI at the different locations. At distances of 43 m, 330 m, 413 m, and 600 m, the average RSSI decreased more because of shadowing, especially at 600 m, which was measured behind the building. The RSSI decreased to as low as -118.4 dBm because of the shadowing effect. It is clear from the graphs that shadowing can be a bottleneck for localization using LoRa because it increases the distance error. The RSSI decreases with the increase in the distance, but after a certain distance, the decrease in RSSI becomes very minimal.
Fig. 6 shows the average SNR at every testbed location. The measured SNR is negative at 600 m (behind the building) and at a distance of 1330 m. A combined graph of average SNR, average RSSI, and distance can be seen in Fig. 7. At a distance of 600 m, the RSSI and SNR are at their lowest values due to the shadowing effect. This data point was taken directly behind the building, which shows that shadowing directly affects the localization accuracy of SVR.
Table I shows the measured average RSSIs, average SNRs, the longitudes and latitudes of the actual locations, the longitudes and latitudes of the predicted locations, and the distance errors obtained using the SVR. The table shows that the highest distance errors produced by SVR occurred at the locations where the shadowing effect was most significant. The lowest distance error of the proposed method is 36.26 m, the mean error is 171.59 m, and the highest distance error is 755.54 m.
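The reasoning above, that shadowed locations break the otherwise decreasing trend of RSSI with distance, can be expressed as a simple check. The sketch below flags any location whose average RSSI is weaker than that of at least one farther location. The function name is hypothetical and the values in `illustrative` are made up for demonstration; they are not the measurements of this study.

```python
def shadowing_candidates(avg_rssi_by_distance):
    """Return distances whose average RSSI is weaker than at some farther location."""
    points = sorted(avg_rssi_by_distance.items())  # sort by distance
    flagged = []
    for i, (dist, rssi) in enumerate(points):
        farther = [r for _, r in points[i + 1:]]
        if farther and rssi < max(farther):
            flagged.append(dist)
    return flagged

# Made-up values in metres -> dBm, for demonstration only.
illustrative = {50: -80.0, 100: -95.0, 200: -90.0, 400: -104.0}
```

In this made-up example only the 100 m location is flagged, because its average RSSI is lower than the value observed at 200 m.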
The limitations of the study stem from the effect of environmental factors on signal strength, which can reduce the accuracy of LoRa localization. Signal strength varies with obstacles, interference, and atmospheric conditions, and terrain and buildings affect the propagation of LoRa signals. Accurately modeling these environmental factors is complex and requires detailed knowledge of the local environment.
One use case scenario where the localization method can still be useful even with a high error is in wildlife tracking. For example, researchers tracking the movement of large animals such as elephants or giraffes in a wildlife reserve can benefit from using localization methods to get a general idea of the animal's location, even with a high error margin. Even if the location error is high, it can still provide valuable information about the animal's general movements, such as where they are likely to feed, rest, or migrate. Additionally, the data collected over time can help researchers identify patterns, make predictions about the animal's behavior, and inform conservation efforts. By tracking the movements of wildlife, researchers can gain insights into their behaviors and habitats and use that knowledge to protect them better.

V. CONCLUSION AND FUTURE WORK
In this research article, the experiments were conducted on a university campus to find the location of the end node. A localization algorithm using Support Vector Regression (SVR) has been implemented on a LoRaWAN architecture. The results show that using the RSSI as features and the SVR as a regression algorithm, the end node can be located with an average distance error of 171.59 meters and a minimum error of 36.26 meters. This shows that the low-powered LoRaWAN can be used for localization in applications where high localization accuracy is not needed. This work can be extended to study the effect on localization accuracy of increasing the number of gateways, enlarging the dataset, and including other fingerprints such as SNR and time fingerprints. Additionally, different environments, such as indoor areas, can be included in the experiments to find the localization distance errors in the combined indoor and outdoor space.

ACKNOWLEDGMENT
The author of this research article acknowledges the financial support from the Universiti Teknologi PETRONAS under full-time PhD sponsorship for Graduate Assistantship (GA) program.