Autonomous Vehicle-to-Vehicle (V2V) Decision Making in Roundabout using Game Theory

Roundabout intersections promote a continuous flow of traffic. Roundabout entries move traffic through an intersection more quickly and with less congestion on the approaching roads. With the introduction of smart vehicles and cooperative decision making, roundabout management shortens waiting times and leads to more efficient traffic without breaking traffic laws or incurring penalties. This paper proposes a novel cooperative behavior strategy for conflict situations between autonomous vehicles in a roundabout using game theory. Game theory provides a strategic decision-making technique for independent agents (players). Each player tries to achieve the best payoff by analyzing the possible actions of the other players and their influence on the game outcome. The Prisoner's Dilemma game strategy is selected as the approach to autonomous vehicle-to-vehicle (V2V) decision making on a roundabout test-bed, because the commonly known traffic laws dictate certain rules of vehicle behavior at a roundabout. It is shown that, by integrating non-zero-sum game theory into autonomous V2V decision making, the roundabout entry problem can be solved efficiently with shortened waiting times for individual autonomous vehicles.

Keywords: autonomous vehicles; decision making; non-zero-sum game theory; mobile robots; roundabout; vehicle-to-vehicle cooperation (V2V); wireless communication


I. INTRODUCTION
Roundabout intersections have recently become very popular, since they reduce the number of conflict points that are characteristic of classic intersections, reduce driving speeds and increase driver attention [1]. When traffic is heavy, waiting time is a significant problem [2]. Possible solutions include the installation of traffic lights at a roundabout entry, which can decrease waiting times during periods of increased traffic flow [3], especially if it is optimized [4]. Modern approaches, such as flower and turbo roundabouts, are recent solutions that improve road safety and reduce the number of collisions [5]. Increased capacity also increases pollutant emissions [6].
With the introduction of smart vehicles, an alternative method for roundabout management has emerged. In [7], a new concept for lateral control on roundabouts is introduced, taking into account entrances, exits and lane changes inside the roundabout. The experiments were carried out in a 3D simulator that emulates the behaviour of real-world driverless vehicles (Cybercars). Using vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication, vehicle gaps can be reduced, thus increasing roundabout traffic flow [8]. In addition, non-communicating vehicles should be identified and reported by the road-side infrastructure [9]. In [10], a microscopic traffic simulator was developed to study intelligent traffic management techniques and evaluate their performance at roundabouts and crossroads. In [11], a fuzzy-behavior-based algorithm for roundabout intersection management is presented. Various vehicle communication types, combinations of cooperative and non-cooperative vehicles, and the possibility of a faulty or missing infrastructure controller were examined.
In [12], it is found that applying game theory in VANETs, with fuzzy logic control for simulation, reduces traffic congestion and waiting time. The approach regulates traffic not only in mountainous areas after landslides, but also in urban and rural areas when road obstacles occur. One example of using game theory (GT) in intelligent transport systems is the vehicle platoon [13].
For deciding the action of a robot in coordinated target tracking, [14] presents a method using the "Nash equilibrium", based on non-cooperative game theory. On the other hand, [15] proposes the "Stackelberg equilibrium", based on a type of cooperative game. In [16], a switching method is proposed to coordinate the Nash equilibrium, which needs no communication, with the Stackelberg equilibrium, which needs communication, for situations in which the task is difficult to achieve using the Nash equilibrium alone.
The key contribution of this work is twofold. First, we propose a novel cooperative behavior strategy for conflict situations between two robot vehicles in a roundabout model, based on game theory. Second, this autonomous vehicle-to-vehicle (V2V) decision making framework was implemented as a cyber-physical system, through wirelessly connected mobile robot platforms, in order to demonstrate real-life situations in a roundabout.

Grant from Ministry of Civil Affairs of Bosnia and Herzegovina. www.ijacsa.thesai.org

The rest of this paper is organized as follows. Section II describes the basics of game theory. Section III proposes the non-zero-sum game structure in a roundabout. In Section IV, the results of game theory in autonomous V2V decision making are presented to demonstrate the effectiveness of the proposed approach in a cyber-physical system framework. In the last section, conclusions and directions for future work are presented.

II. GAME THEORY
Game theory is a formal study of decision making where several players must make choices that potentially affect the interests of other players [17]. The sequence of optimal decisions chosen by the players is closely related to the optimal control problem. Game theory applies to many studies of competitive scenarios; therefore, the problems are called games and the participants are called players or agents of the game [18]. A player is defined as an individual or group of individuals making a decision [19]. Each player has an associated benefit or gain received at the end of the game, called the payoff or utility, which measures the degree of satisfaction an individual player derives from the conflicting situation [20]. For each player of the game, the available choices are called strategies [21]. The game is a description of strategic interactions that includes the constraints on the actions that a player can take and the player's interests, but does not specify the actions that the players do take [19].
Game theory is generally divided into two branches: non-cooperative and cooperative game theory [19]. Whether a game is cooperative or non-cooperative depends on whether the players can communicate with one another. Non-cooperative game theory is concerned with the analysis of strategic choices [17]. While non-cooperative game theory focuses on competitive scenarios, cooperative game theory provides analytical tools to study the behavior of rational players when they cooperate [19].
Formally, an n-player normal form game is defined as the (2n + 1)-tuple $(Q; S_1, \dots, S_n; u_1, \dots, u_n)$, where $n$ is a natural number; $Q = \{1, 2, \dots, n\}$ is a given finite set, the so-called set of players, whose elements are called players; for every $i \in Q$, $S_i$ is an arbitrary set, the so-called set of strategies of player $i$; and $u_i : S_1 \times \dots \times S_n \to \mathbb{R}$ is a real function called the payoff function (utility function) of player $i$. The strategy space of all players is represented by a matrix [21]. In such a case, we denote the game by $\langle Q; (S_i); (u_i) \rangle$.
The Nash equilibrium, also called the strategic equilibrium, is a list of strategies, one for each player, with the property that no player can unilaterally change his strategy and get a better payoff. In other words, no player in the game would take a different action as long as every other player keeps the same strategy [22].
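An n-tuple of strategies $s^* = (s_1^*, \dots, s_n^*)$ is called an equilibrium point or Nash equilibrium of the game if and only if, for every player $i \in \{1, \dots, n\}$ with payoff function $u_i$ and every strategy $s_i$ in its strategy set $S_i$, the following condition holds:

$$u_i(s_1^*, \dots, s_{i-1}^*, s_i, s_{i+1}^*, \dots, s_n^*) \le u_i(s_1^*, \dots, s_n^*).$$

Depending upon the number of players, a game can be classified as a 2-player game or an N-player game, where N > 2 [21]. A bimatrix game is a two-player finite normal form game where
• player 1 has a finite strategy set $S = \{s_1, \dots, s_m\}$;
• player 2 has a finite strategy set $T = \{t_1, \dots, t_n\}$;
• when the pair of strategies $(s_i, t_j)$ is chosen, the payoff to the first player is $a_{ij} = u_1(s_i, t_j)$ and the payoff to the second player is $b_{ij} = u_2(s_i, t_j)$, where $u_1, u_2$ are the payoff functions.

The values of the payoff functions can be given separately for particular players:

$$A = (a_{ij})_{m \times n}, \qquad B = (b_{ij})_{m \times n}.$$

Matrix A is called the payoff matrix for player 1, and matrix B is called the payoff matrix for player 2.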
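As a minimal sketch of the equilibrium condition for a bimatrix game, the following Python function enumerates the pure-strategy Nash equilibria from the two payoff matrices. The function name `pure_nash_equilibria` and the example payoffs (a Battle of the Sexes style game) are illustrative assumptions and do not come from this paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def pure_nash_equilibria(A: Matrix, B: Matrix) -> List[Tuple[int, int]]:
    """Return all (i, j) where neither player gains by deviating alone.

    A[i][j] is the payoff to player 1 and B[i][j] the payoff to player 2
    when player 1 plays row i and player 2 plays column j.
    """
    m, n = len(A), len(A[0])
    equilibria = []
    for i in range(m):
        for j in range(n):
            # Player 1 cannot do better by switching rows in column j.
            best_row = all(A[k][j] <= A[i][j] for k in range(m))
            # Player 2 cannot do better by switching columns in row i.
            best_col = all(B[i][l] <= B[i][j] for l in range(n))
            if best_row and best_col:
                equilibria.append((i, j))
    return equilibria


# Hypothetical 2x2 payoffs, chosen only for illustration.
A = [[2, 0],
     [0, 1]]
B = [[1, 0],
     [0, 2]]
print(pure_nash_equilibria(A, B))
```

With these assumed payoffs the game has two pure equilibria, which illustrates that a non-zero-sum game need not have a single preferred solution.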
There are a number of possible strategies and game types that a player can follow: dominating strategy, extensive game, mixed strategy, zero-sum and non-zero-sum games, evolutionary interpretation, etc. [17].
The dominant strategy is the best choice for a player for every possible choice by the other player. A dominant strategy has such payoffs that, regardless of the choices of other players, no other strategy would result in a higher payoff. An extensive game (or extensive form game) describes with a tree how a game is played. It depicts the order in which players make moves and the information each player has at each decision point. A mixed strategy is an active randomization, with given probabilities, which determines the player's decision. As a special case, a mixed strategy can be the deterministic choice of one of the given pure strategies. A game has perfect information when at any point in time only one player makes a move and knows all the actions that have been made until then. In the evolutionary interpretation, there is a large population of individuals, each of whom can adopt one of the strategies. The game describes the payoffs that result when two of these individuals meet. The dynamics of this game are based on the assumption that each strategy is played by a certain fraction of individuals. Then, given this distribution of strategies, individuals with a better average payoff will be more successful than others, so that their proportion in the population increases over time.
A game is said to be zero-sum if, for any outcome, the sum of the payoffs to all players is zero. In a two-player zero-sum game, one player's gain is the other player's loss, so their interests are diametrically opposed. The theory of zero-sum games is vastly different from that of non-zero-sum games because an optimal solution can always be found. Non-zero-sum games differ from zero-sum games in that there is no universally accepted solution. That is, there is no single optimal strategy that is preferable to all others, nor is there a predictable outcome. Non-zero-sum games are also non-strictly competitive, as opposed to the completely competitive zero-sum games, because such games generally have both competitive and cooperative elements. Players engaged in a non-zero-sum conflict have some complementary interests and some interests that are completely opposed. Examples of non-zero-sum games are the Prisoner's Dilemma game, the Battle of the Sexes, the symmetric games, etc. The Prisoner's Dilemma game can be generalized to any situation in which two players are in a non-cooperative situation where the best all-around outcome is for both to cooperate, but the worst individual outcome is to be the cooperating player while the other player defects. The Prisoner's Dilemma game strategy is selected as the approach to autonomous vehicle-to-vehicle (V2V) decision making on the roundabout test-bed, because the commonly known traffic laws dictate certain rules of vehicle behavior at a roundabout.
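The structure of the Prisoner's Dilemma can be illustrated with a short Python sketch. The payoffs below are the usual textbook numbers (negative prison years), assumed here for illustration only; the helper `is_dominant` is a hypothetical name.

```python
# Payoffs are negative prison years (higher is better); the numbers are
# a textbook illustration, assumed here and not taken from this paper.
COOPERATE, DEFECT = 0, 1

# payoff[own_action][other_action] for either player (the game is symmetric).
payoff = [
    [-1, -3],  # I cooperate: other cooperates / other defects
    [0, -2],   # I defect:    other cooperates / other defects
]


def is_dominant(action: int) -> bool:
    """True if `action` is at least as good as every alternative,
    whatever the other player does."""
    return all(
        payoff[action][other] >= payoff[alt][other]
        for other in (COOPERATE, DEFECT)
        for alt in (COOPERATE, DEFECT)
    )


print(is_dominant(DEFECT))     # defecting is checked for dominance
print(is_dominant(COOPERATE))  # cooperating is checked for dominance
# Compare mutual cooperation with mutual defection.
print(payoff[COOPERATE][COOPERATE] > payoff[DEFECT][DEFECT])
```

Under these assumed numbers, defecting is dominant for each player even though mutual cooperation gives both a better payoff than mutual defection, and the lowest entry belongs to the player who cooperates while the other defects.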

A. Roundabout Model
The basic idea for the application of game theory and appropriate structures in a roundabout is illustrated through the Roundabout model shown in "Fig. 1". The mobile robots are considered as autonomous vehicles, i.e. players. Depending on the position of the autonomous vehicles relative to the roundabout, players can be in the following states: "normal (NS)", "including (RI)" and "circulating (RC)".
Each autonomous vehicle (player) can have one of two statuses: Entering Vehicle (EV) or Circulating Vehicle (CV, inside the roundabout). The Entering vehicle, while entering the roundabout, can detect the Circulating vehicle. Both autonomous vehicles must have certain information about one another. The Entering vehicle can calculate and send the angle at which it sees the Circulating vehicle, its own travelled distance and its current speed. The Circulating vehicle informs the Entering vehicle whether it will continue circulating or not, so that the Entering vehicle can decide to include smoothly or to slow down.
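The information exchanged between the two vehicles can be sketched as two small message types. This is only an illustration: the class names, field names and the JSON encoding are assumptions, since the paper does not specify a message format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class EntryRequest:
    """Sent by the Entering vehicle (EV); all field names are hypothetical."""
    angle_deg: float           # angle at which the CV was seen
    travelled_distance: float  # distance travelled since entering "including"
    speed: float               # current speed of the EV


@dataclass
class CirculationReply:
    """Sent by the Circulating vehicle (CV)."""
    continues_circulating: bool  # False means the CV leaves the roundabout


def encode(message) -> bytes:
    """Serialize a message as JSON bytes (an assumed wire format)."""
    return json.dumps(asdict(message)).encode("utf-8")


def decode_reply(raw: bytes) -> CirculationReply:
    """Rebuild a CirculationReply from its JSON bytes."""
    return CirculationReply(**json.loads(raw.decode("utf-8")))


request = EntryRequest(angle_deg=35.0, travelled_distance=0.20, speed=0.10)
reply = decode_reply(encode(CirculationReply(continues_circulating=True)))
# If the CV keeps circulating, the EV may have to slow down;
# otherwise it can include smoothly.
ev_action = "slow down" if reply.continues_circulating else "include smoothly"
print(encode(request), ev_action)
```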
Commonly known traffic laws in a roundabout dictate that the Circulating vehicle always has priority over the Entering vehicle, i.e. the Entering vehicle must slow down and ultimately stop if the Circulating vehicle has not yet passed the specific roundabout intersection.

B. Vehicle-to-Vehicle Cooperation
Let the autonomous vehicle R1 be in the status of Entering vehicle, and the autonomous vehicle R2 in the status of Circulating vehicle. The cooperation of autonomous vehicles in the roundabout is done through the following steps:

a) When the autonomous vehicle R1 goes from the "normal" state to "including", it starts to move at constant speed towards the point of inclusion while scanning in parallel for the autonomous vehicle R2 on its left side. If R1 notices R2 on the left side, it calculates the angle at which R2 was noticed and its distance $D_2$. The value $D_{10} - D_1$ is the distance travelled by R1 from the point where it went from the "normal" state to "including" until the moment when it noticed R2 on the left side. Once it notices R2, R1 stops scanning.

Two further angles of the roundabout geometry are then determined using the law of cosines. The length $L$ is the distance that the autonomous vehicle R2 has to travel to reach the point of inclusion/exclusion, and therefore determines the time R2 needs to reach that point.

In minimizing roundabout congestion, travel time is the most important factor to consider. The times $t_1$ and $t_2$ are the times that the autonomous vehicles R1 and R2, respectively, need to reach the point of inclusion in the roundabout, counted from the moment when R1 notices R2.
The time required for the autonomous vehicle R1 to travel the remaining distance $D_1$ up to the point of inclusion at speed $v_1$ is $t_1 = D_1 / v_1$.

Fig. 2. Roundabout Model geometry
The time required for the autonomous vehicle R2 to cross the section $L$ at speed $v_2$ is $t_2 = L / v_2$. The time $\Delta t$ is the time the autonomous vehicle R2 needs to pass through the point of inclusion/exclusion inside the roundabout; it is taken into account in order to avoid a collision between the autonomous vehicles R1 and R2. The time $\Delta t$ depends only on the speed of the autonomous vehicle R2 and its dimension, and equals the ratio of that dimension to the speed $v_2$.
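The three times described above are all of the form distance over speed. A minimal Python sketch, assuming constant speeds and using hypothetical variable names and values (none of the numbers come from the experiments), is:

```python
def travel_time(distance: float, speed: float) -> float:
    """Time to cover `distance` at constant `speed` (consistent units)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return distance / speed


# Hypothetical values in metres and metres per second.
remaining_distance_r1 = 0.30  # distance from R1 to the point of inclusion
arc_length_r2 = 0.50          # section L that R2 still has to travel
vehicle_length_r2 = 0.12      # dimension of R2 along its direction of travel
speed_r1, speed_r2 = 0.10, 0.10

t1 = travel_time(remaining_distance_r1, speed_r1)  # R1 reaches the point
t2 = travel_time(arc_length_r2, speed_r2)          # R2 reaches the point
dt = travel_time(vehicle_length_r2, speed_r2)      # R2 clears the point
print(round(t1, 2), round(t2, 2), round(dt, 2))
```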
During the V2V cooperation, the following specific cases are possible, to which we apply game theory:

• If , the autonomous vehicle R1 includes freely and moves into the "circulating" state. If the autonomous vehicle R2 decides to continue to circulate, it remains in the "circulating" state. If the autonomous vehicle R2 decides to exclude itself from the roundabout, it moves into the "normal" state and freely excludes itself from the roundabout. The waiting time for both autonomous vehicles R1 and R2 is zero.

• If and if the autonomous vehicle R2 decides to exclude itself from the roundabout, then the autonomous vehicles move freely. The autonomous vehicle R1 moves into the "circulating" state, and the autonomous vehicle R2 moves into the "normal" state. The waiting time for both autonomous vehicles R1 and R2 is zero.

• If and if the autonomous vehicle R2 decides to remain in the "circulating" state, conflicts are possible. The autonomous vehicle R2 has priority and continues to circulate freely, while the autonomous vehicle R1 adjusts its speed to avoid conflicts. The autonomous vehicle R1 must reach the point of inclusion for . Then the waiting time of the autonomous vehicle R2 is zero, and that of the autonomous vehicle R1 is ( ). The autonomous vehicle R1 slows down linearly on its way to the point of inclusion in order to reach that point for the time . Once the autonomous vehicle R1 is included, it goes into the "circulating" state.

• If , the autonomous vehicle R1 includes freely and moves into the "circulating" state. If the autonomous vehicle R2 decides to continue to circulate, it remains in the "circulating" state. If the autonomous vehicle R2 decides to exclude itself from the roundabout, it moves into the "normal" state and freely excludes itself from the roundabout. The waiting time for both autonomous vehicles R1 and R2 is zero.
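The case analysis above can be sketched as a single decision function. The inequalities used here are an assumption made for illustration: a conflict is taken to be possible only when R1 would arrive while R2 occupies the point of inclusion, and R1 is then assumed to arrive just after R2 has passed. The function name `resolve_entry` and its parameters `t1`, `t2`, `dt` (arrival times of R1 and R2 and the passing time of R2) are hypothetical.

```python
from typing import Tuple


def resolve_entry(t1: float, t2: float, dt: float,
                  r2_continues: bool) -> Tuple[str, float]:
    """Return R1's action and R1's waiting time; R2 never waits.

    Assumption: a conflict is possible only when R1 would arrive while R2
    occupies the point of inclusion, i.e. t2 <= t1 <= t2 + dt.
    """
    in_conflict_window = t2 <= t1 <= t2 + dt
    if in_conflict_window and r2_continues:
        # R1 slows down so that it arrives just after R2 has passed.
        return "adjust speed", (t2 + dt) - t1
    # R1 arrives outside the window, or R2 leaves the roundabout.
    return "include freely", 0.0


print(resolve_entry(t1=3.0, t2=5.0, dt=1.0, r2_continues=True))
print(resolve_entry(t1=5.5, t2=5.0, dt=1.0, r2_continues=True))
print(resolve_entry(t1=5.5, t2=5.0, dt=1.0, r2_continues=False))
```

Only the second call falls into the conflict case, so only there does R1 receive a non-zero waiting time.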

C. Localisation of Autonomous Vehicle in Roundabout
When a turn is detected, the localisation algorithm for each autonomous vehicle in the roundabout is based on the combination of the previous state and the current state of the vehicle, together with the random movement taken as an action through the space. The following sequence describes the condition, the action and the localisation state after the moving action: (new previous state, new current state) = f(previous state, current state, action).
For the different situations in a roundabout, the following sequences exist:

normal, Turn right or left)
For example, if the current state is "normal", the previous state is "normal" and the vehicle randomly turns right, then the new values are current state = "including" and previous state = "normal".
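The mapping f can be sketched as a lookup table keyed by (previous state, current state, action). Only the first entry below is taken from the example in the text; the other entries, the action labels and the name `localise` are hypothetical and serve only to make the sketch complete.

```python
from typing import Dict, Tuple

State = str
# Key: (previous state, current state, action)
# Value: (new previous state, new current state)
TRANSITIONS: Dict[Tuple[State, State, str], Tuple[State, State]] = {
    # Taken from the example in the text.
    ("normal", "normal", "turn right"): ("normal", "including"),
    # Hypothetical entries, added only for illustration.
    ("normal", "including", "turn left"): ("including", "circulating"),
    ("including", "circulating", "turn right"): ("circulating", "normal"),
}


def localise(previous: State, current: State, action: str) -> Tuple[State, State]:
    """Apply f(previous, current, action); unknown inputs keep the state."""
    return TRANSITIONS.get((previous, current, action), (previous, current))


print(localise("normal", "normal", "turn right"))
```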

D. Game Strategy in autonomous V2V Decision making
Based on the "Prisoner's dilemma" and the predefined algorithm, we can create a table that shows the waiting time for the autonomous vehicles R1 and R2 according to the situation in which they find themselves within the roundabout.
Each player has two strategies. The autonomous vehicle R1 is the Entering vehicle and the autonomous vehicle R2 is the Circulating vehicle. For the Entering vehicle, those strategies are "smooth inclusion (SI)" and "adjusting speed (AS)". For the Circulating vehicle, those strategies are "smooth exclusion (SE)" and "smooth circulation (SC)". Game strategies for the Entering and Circulating vehicles and their payoffs are shown in Table II. The autonomous vehicle R1 tries out all possible actions, starting with the one that is best. For the autonomous vehicle R2, all actions lead to zero waiting time. For the autonomous vehicle R1, the greatest waiting time, which is also the only non-zero waiting time, occurs when it chooses the strategy (AS) and the autonomous vehicle R2 chooses (SC). All other actions of the autonomous vehicle R1 lead to zero waiting time.
Based on the Nash equilibrium in the Prisoner's dilemma, if the prisoners are not "selfish", we can conclude that the Nash equilibrium actions are (SI, SE), (SI, SC) and (AS, SE), each with payoffs (0, 0). In the case of two interacting vehicles, the Entering vehicle loses a minimal amount of time, but an overall time loss is avoided.
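The equilibrium claim can be checked with a few lines of Python. The sketch treats the waiting times as costs, so a pair of strategies is an equilibrium when no vehicle can shorten its own waiting time by deviating alone. The value `WAIT` for the (AS, SC) cell is a hypothetical positive number; any positive value gives the same result.

```python
from itertools import product

R1_STRATEGIES = ("SI", "AS")  # Entering vehicle
R2_STRATEGIES = ("SE", "SC")  # Circulating vehicle
WAIT = 3.0  # hypothetical waiting time of R1 in seconds for (AS, SC)

# waiting[(s1, s2)] = (waiting time of R1, waiting time of R2)
waiting = {
    ("SI", "SE"): (0.0, 0.0),
    ("SI", "SC"): (0.0, 0.0),
    ("AS", "SE"): (0.0, 0.0),
    ("AS", "SC"): (WAIT, 0.0),
}


def is_equilibrium(s1: str, s2: str) -> bool:
    """No vehicle can shorten its own waiting time by deviating alone."""
    w1, w2 = waiting[(s1, s2)]
    r1_ok = all(waiting[(alt, s2)][0] >= w1 for alt in R1_STRATEGIES)
    r2_ok = all(waiting[(s1, alt)][1] >= w2 for alt in R2_STRATEGIES)
    return r1_ok and r2_ok


equilibria = [p for p in product(R1_STRATEGIES, R2_STRATEGIES)
              if is_equilibrium(*p)]
print(equilibria)
```

The pair (AS, SC) is excluded because R1 could reduce its waiting time to zero by switching to SI, which agrees with the three equilibria listed in the text.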

A. Autonomous Mobile Robot Structure
Modified Parallax Boe-Bot mobile robots are used as autonomous vehicles for the demonstration scenarios in the roundabout. Each mobile robot consists of two geared motors mounted on an aluminum chassis, batteries and control electronics. In order to achieve advanced performance and utilize Arduino libraries, the BasicStamp was replaced by an Arduino Uno microcontroller board, which is based on the ATmega328 microcontroller. Each autonomous vehicle has QTI sensors for line following (as road detection). In order to detect obstacles and other vehicles, the robot was also equipped with Parallax Ping))) ultrasonic distance sensors.
The communication between the robots is established wirelessly using XBee modules and the ZigBee protocol.

B. Example of Conflict Scenario
Consider a scenario in which a conflict situation between the autonomous vehicles R1 and R2 is possible. In "Fig. 3", the autonomous vehicle R2 is in the "circulating" state and the autonomous vehicle R1 is in the "including" state. The autonomous vehicle R1 scans whether the autonomous vehicle R2 is approaching from the left. If so, R1 sends a message that it wants to include itself in the roundabout and asks whether R2 will continue to circulate or exclude itself from the roundabout. In this case, R2 decided to continue to circulate in the roundabout, so R1 adjusts its speed and waits until R2 passes the point of inclusion in the roundabout ("Fig. 4"). "Fig. 5" presents the time responses of the left and right servo motors of the autonomous vehicle R1, and "Fig. 6" presents the states of the autonomous vehicle R1. If R1 notices another vehicle, that vehicle is in the "circulating (RC)" state, and a crash can eventually occur if it decides to continue circulating. Therefore, when R1 notices R2, it asks whether R2 will continue to circulate (possible conflict) or exclude itself from the roundabout. In this experiment, R2 decided to continue to circulate, and R1 stopped and waited until R2 passed the point of inclusion in the roundabout. This waiting time lasts from t = 12 s to t = 15 s, when R2 leaves the point of inclusion in the roundabout and R1 can continue its movement.
At time t = 17 s, the autonomous vehicle R1 is included in the roundabout; the new state of R1 is "circulating (RC)", and that of the autonomous vehicle R2 is "normal". The autonomous vehicle R1 circulates until t = 21 s, when it excludes itself from the roundabout and passes to the "normal (NS)" state.

It is found that applying game theory in autonomous vehicle-to-vehicle (V2V) decision making yields good results in the management of the critical section and reduces the waiting time for individual autonomous vehicles. Our approach is verified in a cyber-physical framework of wirelessly connected mobile robots.
In real-life traffic scenarios, additional factors may shape the cooperation model among multiple vehicles. In future work, we plan to investigate a more advanced decision making model with multiple vehicles inside the roundabout, so that different congestion scenarios can be analyzed.

Fig. 1. Roundabout Model with states regarding the position

b) The autonomous vehicle R1 sends a request for communication with the autonomous vehicle R2, noticed earlier in the first stage. The autonomous vehicle R1 sends the angle and the distance $D_2$ at which it noticed the autonomous vehicle R2, its own moving speed, and the distance $D_{10} - D_1$ travelled from the point where it turned from the "normal" state to "including" until the moment when it noticed the autonomous vehicle R2 on the left side.

c) After the autonomous vehicle R2 has received the necessary information from the autonomous vehicle R1, it performs the action of coordination. The autonomous vehicle R2 reads its own uniform speed. "Fig. 2" shows the important parameters of the positioning of autonomous vehicles in the roundabout.

TABLE I. THE BIMATRIX FOR TWO PLAYERS ( , )

TABLE II. POSSIBLE STRATEGIES AND PAYOFFS FOR VEHICLES