Time-Dependence in Multi-Agent MDP Applied to Gate Assignment Problem

Many disturbances can impact gate assignments in the daily operations of an airport. The Gate Assignment Problem (GAP) is a central airport task: ensuring a smooth flight-to-gate assignment while managing all disturbances. However, flight schedules often undergo unplanned disruptions, such as weather conditions, gate availability, or simple delays. A good plan for the GAP should handle as many stochastic events as possible and include them in the assignment planning. To build a robust model that takes possible planning disorder into account, a dynamic stochastic view based on Markov Decision Process theory is designed. In this approach, gates are perceived as collaborative agents seeking to accomplish a specific set of flight assignment tasks provided by a centralized controller. Multi-agent reasoning is then coupled with time dependence, with both time-dependent action durations and stochastic state transitions. This yields a new model for the GAP based on Time-dependent Multi-Agent Markov Decision Processes (TMMDP). The model can provide airport controllers with a robust prior solution for every time sequence, rather than incurring the risk of online schedule adjustments to handle uncertainty. The solution of this model is a set of optimal, time-valued decisions to be made in each case of traffic disruption and at every moment.
Keywords: Time-dependent Multi-Agent Markov Decision Processes; stochastic programming; flight delays; Gate Assignment Problem


I. INTRODUCTION
In recent years, growing interest has been devoted to advanced techniques for air traffic management, as a result of the increase in air transport traffic [1]. The main objective is to allocate and manage airport and airline resources effectively and efficiently. Because of the dynamic and stochastic operational environment of air transport, the scheduling problems currently faced by airport and airline managers lead to challenging and complex planning problems that call for innovative models and solutions. This is driven by the significant diversity of resources that have to be considered, including terminals, flights, crews, and baggage, most of which are interdependent. Stochastic disruptions in air traffic increase the complexity of the resolution models, and they are progressively being taken into consideration in recent studies.
The main goal of an airport is to guarantee fluid flight traffic. An optimal assignment of aircraft ensures that the proper gates are available over time. If an aircraft is not assigned a gate, it is forced to wait on the ramp or in the air; such scenarios are highly undesirable because they waste time and lead to flight delays. Moreover, ramps and airspace are also resources with limited capacity.
Flight-to-gate assignment is an essential task of an airport; it is the primary activity in airline traffic management [2]. Moreover, several airports today face severe capacity constraints resulting from the increase in air traffic volume. The GAP can be regarded as a constrained resource assignment problem, in which gates are the resources and aircraft are the resource consumers.
Furthermore, the GAP is considered a challenging problem [3] since it involves highly interdependent resources, including aircraft, crews, and gates. An inadequate assignment therefore causes severe disruptions at the airport in the form of flight delays, which degrade customer service and lead to inefficient use of gates and to conflicting flights.
Various circumstances can cause stochastic disruptions in gate assignment: a gate malfunction, a flight delay or early arrival, extreme weather conditions, or other causes. Such daily disturbances may reduce the overall performance of the planned gate assignment once it is confronted with actual operations. Even a single variation in one flight plan can trigger a series of disturbances for the other aircraft assigned to the same gate. This phenomenon is highly undesirable in airline operations because of its noticeable cost.
Various GAP models and techniques can be identified in the literature, and both static and stochastic models have been developed. Methods that yield an exact solution would obviously be preferable. However, [4] states that such exact methods are ineffective for solving real problems. This is because flights in static models are allocated to gates according to the expected flight schedule, using fixed parameters. In real operations, however, stochastic disruptions occur frequently, leading to real-time adjustments of gate assignments and to flight delays. Consequently, stochastic methods have received wide attention in recent research.
Consequently, to build a significantly better flight-to-gate assignment approach, the model has to include the possibility of stochastic flight delays that may arise in real operations. In stochastic environments, Markov Decision Processes (MDPs) [5] have proven effective for optimal decision making. A derived version of MDPs, the multi-agent Markov decision process [6], was developed to address some challenges of the standard MDP-based GAP first introduced. In this work, a collaborative multi-agent MDP is built, composed of multiple agents attempting to produce the best allocation of aircraft to gates. A new methodology for the GAP is provided, in which the GAP is regarded as a multi-agent problem that includes robustness to stochastic disturbances. The GAP is therefore designed as a Multi-agent MDP intended to solve the problem under the assumption of environment uncertainty. Incorporating time dependence into the developed multi-agent model then enables further stochastic planning ability. In this method, the stochastic feature is considered to be caused by flight delays, with the flexibility to add further constraints to the constructed model. Gates are modeled as agents under a centralized controller. Consequently, each agent (gate) has full visibility of airport operations and is aware of the flights allocated to every gate over the planning horizon. The resulting policies take the time dimension into consideration. Time-dependent Multi-agent Markov Decision Processes allow a more realistic representation of the gate problem, with rewards and transitions varying with time. The TMMDP thus combines the multi-agent aspect with a real-valued time component.
This paper is structured as follows: the next section provides a literature review. Section 3 shows the MMDP model for the GAP. Section 4 presents the model of the GAP in stochastic circumstances with time dependence. Section 5 provides experiments, and finally, conclusions and perspectives are drawn.
The airport Gate Assignment Problem consists of assigning an appropriate gate to each aircraft arriving at the airport, until the time of its departure. It is one of the primary components of airport resource management. Gates, being a resource, are subject to two groups of constraints as categorized in the literature: strict and soft constraints (see [3]).
The first category is mandatory to represent the gate assignment problem. It comprises the following constraints:
• Single: each aircraft must be assigned to exactly one gate.
• Feasible: a gate can be assigned to only one aircraft at a time.
Soft constraints are diverse and may relate to either airlines or airports. The most common one in the literature is to minimize the total walking distance of transferring passengers (e.g. [7]). Others include assigning aircraft to specified gates, or taking the aircraft size into consideration when allocating the gate [8]. A soft constraint can also be to minimize the number of aircraft that have to wait for a gate.
GAP models use several objective functions; some notable ones from the literature are cited here. They include minimizing the total walking distance [7], minimizing the total waiting time for passengers as in [9], and minimizing the number of un-gated aircraft as in [2]. Others include minimizing the modification of the current schedule with respect to an initial schedule, maximizing the preferences for assigning particular aircraft to individual gates (e.g. [10]), and minimizing gate conflicts as in [2]. In this paper, the stochastic model implements the last one in particular, to minimize conflicting assignments due to flight disruptions.
GAP formulations are classified into two main types: deterministic and stochastic models. In the first type, only static parameters are considered (including passengers, gates, number of flights, etc.); because of stochastic perturbations in real-world operations, solutions of deterministic models become infeasible. Stochastic GAP models have been investigated to take such air traffic disruptions into account, for example flight delays or severe weather conditions. Deterministic models are more widely discussed in the literature, such as [7], and most have as an objective the minimization of the total passenger walking distance. More recently, stochastic and robust models have been studied to help operators respond to possible uncertain events.
To illustrate stochastic and robust GAP approaches in the literature, [11] shows that including a planned buffer time in the flight schedule can increase schedule punctuality. In [12] and [9], a fixed buffer time between two consecutive flights assigned to the same gate is used to absorb possible stochastic flight delays. The author of [12] proposes a multi-commodity network flow approach, as does [13]. In [14], the author builds a heuristic approach sensitive to stochastic flight delays, within a framework that consists of three components: a stochastic gate assignment model, a real-time assignment rule, and two penalty correction methods.
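The buffer-time idea mentioned above can be illustrated with a small check. This is a minimal sketch, not the method of [9] or [12]: the `Flight` tuple layout, the function names, and the toy schedule are all assumptions made for illustration. It flags consecutive flights on the same gate that are separated by less than a chosen buffer, and shows how a delay can turn a valid plan into a conflicting one.

```python
from typing import Dict, List, Tuple

# Hypothetical data model: (flight_id, arrival, departure), times in minutes.
Flight = Tuple[str, int, int]


def find_gate_conflicts(
    assignment: Dict[str, List[Flight]], buffer_min: int = 0
) -> List[Tuple[str, str, str]]:
    """Return (gate, earlier_flight, later_flight) for every pair of
    consecutive flights on a gate separated by less than `buffer_min`."""
    conflicts = []
    for gate, flights in assignment.items():
        ordered = sorted(flights, key=lambda f: f[1])  # sort by arrival time
        for prev, nxt in zip(ordered, ordered[1:]):
            # The next aircraft must arrive at least `buffer_min` minutes
            # after the previous one has left the gate.
            if nxt[1] < prev[2] + buffer_min:
                conflicts.append((gate, prev[0], nxt[0]))
    return conflicts


def apply_delay(flight: Flight, delay_min: int) -> Flight:
    """Shift both arrival and departure of a flight by `delay_min`."""
    fid, arr, dep = flight
    return (fid, arr + delay_min, dep + delay_min)


# Toy schedule with made-up numbers: two flights share gate G1.
schedule = {
    "G1": [("F1", 600, 660), ("F2", 680, 740)],
    "G2": [("F3", 610, 700)],
}
# Same schedule after flight F1 is delayed by 30 minutes.
delayed = {
    "G1": [apply_delay(schedule["G1"][0], 30), schedule["G1"][1]],
    "G2": schedule["G2"],
}

print(find_gate_conflicts(schedule, buffer_min=15))
print(find_gate_conflicts(delayed, buffer_min=15))
```

A fixed buffer absorbs only delays shorter than the buffer itself, which is one motivation for the stochastic models discussed next.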
In [2], the GAP is modelled as a stochastic programming model and transformed into a binary programming model; the resolution combines a hybrid meta-heuristic, a tabu search, and a local search. In addition, an ant system combined with a local search has been applied in [15] to an over-constrained airport Gate Assignment Problem, with the aim of selecting and allocating aircraft to gates while minimizing the total passenger interconnection.

Recently, a model-based heuristic using Mixed Integer Programming has been presented in [16]; it has been shown to be more efficient and more robust than the linearized models. Likewise, a multi-objective optimization model of the GAP has been proposed in [17], solved with a particle swarm algorithm, which gives an improved overall gate assignment service with regard to robustness. Also applying a metaheuristic, the authors of [18] designed a three-objective formulation of the GAP and solved it with a non-dominated sorting genetic algorithm.
Markov process theory in general has proven applicable to airline transport, as in [19]. Notably, an MDP model for the GAP has been applied in [20] to deal with gate disturbances while considering aircraft size in the assignment, where neighboring gates can only accept aircraft of a specified size or may be blocked. A more recent robust GAP with a multi-agent MDP model has been provided in [21].
Following a similar idea of incorporating stochastic disturbances when establishing the gate assignment, a multi-agent system with time dependence is used for modeling in this paper. Multi-agent systems (MAS) are a part of Distributed Artificial Intelligence. Their applications are broad: game theory, humanities, economics, and other real-world applications including air traffic control, robotics, and networking. MAS methods are concerned with the connections between independent entities. In MAS, this is mainly studied as cooperation, which requires complex components.
In planning with multi-agent systems, it is commonly assumed that there are a number of agents, each with its own set of actions, and a given set of tasks to be solved, recognizing that interaction with the other agents is essential. Reinforcement learning has been a practical methodology for constructing coexisting agents (e.g. [22]), as have Markov games (see, e.g. [23]). In general, each agent may have its own goals. This paper is concerned with the case of fully cooperative agents, where all agents share the same goal of maximizing the total expected reward. In particular, where agents are autonomous and distributed, a local Markov Decision Process (MDP [5]) is used to express each agent's state and action space. The utility of any given system state is therefore the same for all agents. With models of uncertainty and general utility, the Multi-agent Markov decision process (MMDP) was developed by [24] to incorporate such adaptive agents that interact to reach given goals. The MMDP has been applied in various domains, including air transportation (see [25]).
The MMDP is based on full observability of the global state by every agent; it is designed as a set of interacting, autonomous learning agents. These agents have to learn to cooperate in order to reach their assigned goal. Decision-making in an MMDP can be either centralized or decentralized [26]. Hence, this paper adopts Markov decision processes as a formalism within the multi-agent structure (e.g. [24]). It assumes a centralized controller that knows all information regarding the system (Fig. 1), including actions, the global state of the system, and rewards; the controller thus holds the decision authority and keeps information distributed among the agents.
The multi-agent notion can also be combined with a real-valued time variable to include time evolution in the dynamics of the multi-agent system. The Time-dependent Markov Decision Process (TMDP) is provided by [27] to give this extension. This model is composed of stochastic state transitions as well as stochastic, time-dependent action durations. The actions in the TMDP model are stochastic and time-varying: the resulting policies are actions to be performed by agents in every time sequence. The planning window can then be extended to problems under uncertainty that change with time.
In this formulation, as in [28], MMDPs first consider an assignment-centered decomposition approach, which is intermediate between the joint MDP method and the method of independent agents. A centralized controller is adopted that has the complete relevant information regarding the states of all agents; it allocates jobs and resources to agents according to the task-level value functions associated with the agents. After the jobs are allocated, the lower-level actions of the agents are driven by the task-level value functions until the primary controller reassigns jobs. Adding time-dependent behavior then gives a more realistic representation of the Gate Assignment Problem: inspired by the TMDP and coupled with the MMDP approach, it provides a new formalism, the time-dependent multi-agent MDP. This method yields real-time policies to apply in every case of disturbance for the GAP.

II. THEORETICAL BACKGROUND
To provide the theoretical background, Markov Decision Processes (MDPs) are first defined (see [5]) and then generalized to multi-agent settings. Next, the basic model of the Time-Dependent Markov Decision Process (TMDP), given by [27], is presented. Finally, a new time-dependent extension of the MMDP is derived and formalized as the Time-Dependent Multi-Agent Markov Decision Process (TMMDP).

A. Standard Markov Decision Process
A considerable amount of research is concerned with planning problems that involve uncertainty and possibly conflicting objectives. As a tool of artificial intelligence (AI) planning, decision-theoretic planning addresses these challenges, in particular through the theory of Markov Decision Processes (MDPs), which has attracted significant attention in recent research both as a computational and as a conceptual model. An MDP is defined by a tuple <S, A, P, R>, where S is a finite set of states describing the system of interest, A is a finite set of actions available to the agent, and R is a reward function. When an action takes the agent from one state to another, the uncertainty in the result of the action is described by the probability P, which acts as the transition model. A mapping π: S → A defines a policy. The objective is to identify the optimal policy, which maximizes the expected discounted future reward in each state. In this paper, the MDP is considered to have an infinite horizon, with future rewards exponentially discounted by a discount factor γ ∈ [0, 1).
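The infinite-horizon discounted objective described above can be solved by value iteration. The following is a minimal sketch under assumed data structures: the dictionary layout of `P` and `R`, the two toy states, and all numeric values are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable
# P[s][a] is a list of (next_state, probability) pairs.
Transitions = Dict[State, Dict[Action, List[Tuple[State, float]]]]
# R[s][a][next_state] is the reward for that transition.
Rewards = Dict[State, Dict[Action, Dict[State, float]]]


def q_value(P, R, V, s, a, gamma):
    """Expected discounted return of taking action a in state s."""
    return sum(p * (R[s][a][s2] + gamma * V[s2]) for s2, p in P[s][a])


def value_iteration(P: Transitions, R: Rewards, gamma: float = 0.9,
                    tol: float = 1e-8, max_iter: int = 10_000):
    """Return (V, policy) for a finite MDP with discount gamma in [0, 1)."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iter):
        delta = 0.0
        for s in P:
            best = max(q_value(P, R, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # values have stopped changing
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(P[s], key=lambda a: q_value(P, R, V, s, a, gamma))
              for s in P}
    return V, policy


# Toy single-gate example (made-up numbers): keep or reassign a flight.
P = {
    "ok": {"keep": [("ok", 0.7), ("conflict", 0.3)],
           "reassign": [("ok", 1.0)]},
    "conflict": {"keep": [("conflict", 1.0)],
                 "reassign": [("ok", 0.9), ("conflict", 0.1)]},
}
R = {
    "ok": {"keep": {"ok": 1.0, "conflict": -5.0},
           "reassign": {"ok": 0.5}},
    "conflict": {"keep": {"conflict": -5.0},
                 "reassign": {"ok": 0.0, "conflict": -5.0}},
}
V, policy = value_iteration(P, R)
print(policy)
```

Updating `V` in place (Gauss-Seidel style) is a design choice made here for brevity; a synchronous update over a copy of `V` converges to the same fixed point.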

B. Multi-Agent Markov Decision Processes
The MDP model can be extended to multi-agent systems to define the Multi-agent Markov Decision Process (MMDP), as in [6]. In this formalism, all agents share the same goal of maximizing the total expected reward and have the same joint utility function. The MMDP can be viewed as a generalization of the single-agent MDP, but also as a special case of Markov games [29] in which the payoff function is identical for all agents. The MMDP formalism is first defined below, before it is used as a framework to build a new GAP model.
An MMDP is defined by a tuple <n, S, A, P, R>, in which each action is a joint action composed of the actions of all individual agents. The elements are defined as follows:
• n: the total number of agents in the system.
• S: the set of states.
• A = A_1 × ... × A_n: the set of joint actions of all agents, where A_i is the set of local actions of agent i.
• P: the transition function; P(s' | s, a) is the probability that the system moves from state s to state s' when the agents execute the joint action a.
• R: the reward function; R(s, a, s') is the reward received after moving from state s to state s' by performing action a.
Solving an MMDP consists of determining a joint policy π = (π_1, ..., π_n), where π_i is the policy of local agent i, that is, a function mapping any system state to an action of agent i. The joint policy is computed with the standard Value Iteration algorithm (which continues to operate in the general situation of decentralized agents, see [18]).
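The joint action set and the projection of a joint policy onto each agent can be sketched as follows. This is only an illustration: the local action sets, the "idle" action, and the filter that forbids two gates from taking the same flight are assumptions, not part of the formal definition above.

```python
from itertools import product
from typing import Dict, Hashable, List, Tuple

# Hypothetical local action sets A_i: each gate agent either stays idle or
# takes one of the flights it is allowed to serve.
local_actions: List[List[str]] = [
    ["idle", "F1", "F2"],  # agent 0 (gate 1)
    ["idle", "F2", "F3"],  # agent 1 (gate 2)
]


def joint_actions(local_sets: List[List[str]]) -> List[Tuple[str, ...]]:
    """Enumerate A = A_1 x ... x A_n, dropping joint actions in which two
    agents pick the same flight (assumed 'Single' constraint)."""
    result = []
    for joint in product(*local_sets):
        flights = [a for a in joint if a != "idle"]
        if len(flights) == len(set(flights)):
            result.append(joint)
    return result


def local_policy(joint_policy: Dict[Hashable, Tuple[str, ...]],
                 agent: int) -> Dict[Hashable, str]:
    """Project a joint policy pi: S -> A onto agent i, giving pi_i: S -> A_i."""
    return {state: action[agent] for state, action in joint_policy.items()}


A = joint_actions(local_actions)
print(len(A), "joint actions")

# A made-up joint policy over two abstract states.
pi = {"s0": ("F1", "F3"), "s1": ("idle", "F2")}
print(local_policy(pi, 0))
print(local_policy(pi, 1))
```

Because the joint action set grows as the product of the local sets, enumerating it explicitly is only practical for small instances such as the two-gate example used later in the experiments.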

C. Time-Dependent Markov Decision Processes
In the standard MDPs defined previously, transitions and rewards are assumed to be stationary functions; they do not change across decision epochs. In the literature, some approaches such as [30] define Stochastic Time-Dependent Networks, in which stochastic transition durations are included but transition outcomes are deterministic. The model given by [27] is one of the first to treat time as an independent, observable state variable; it is called the Time-dependent Markov Decision Process (TMDP). In the TMDP model, each transition resulting from the application of an action is decomposed into a set of possible outcomes {µ}. Each outcome describes both a resulting state and a transition duration.
Formally, the TMDP is defined as in [27] by:
• S: discrete state space.
In a TMDP, at time t, if the agent in state s executes an action a, outcome µ1 is generated with probability L(µ1 | s, t, a) and another outcome µ2 with probability L(µ2 | s, t, a). Outcome µ1 represents the transition to a new state s' and gives the absolute arrival time of the transition, while µ2 represents the return to s (failure to leave s) with a duration δ. Implicitly, a waiting time is inserted before each action in the model. The likelihood function L governs the possible outcomes in the model. Time distributions in a TMDP can be either "relative" (REL) or "absolute" (ABS), as illustrated in Fig. 3.
The TMDP model is represented by Bellman equations, and its resolution is performed using these equations (2), which describe an undiscounted continuous-time MDP. In each state, the optimal time-value function is a piecewise linear function of time, which can be computed exactly by value iteration [27]. The TMDP model is more general than semi-Markov decision processes [31], which have no notion of absolute time. With absolute time included in the state space, a comprehensive set of domain objectives can be modeled beyond minimizing the expected time, for example maximizing the probability of meeting a deadline. The time dimension may also represent other quantities; it can be used for planning with non-linear utilities or with continuous resources.
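For reference, the Bellman equations of the TMDP can be sketched as below. This follows the standard formulation of [27], rewritten with the symbols used in this paper (s for states, δ = t' - t for durations); the waiting reward rate K(s, θ) comes from the general formulation in [27] and is an assumption here, and it can be set to zero when waiting carries no reward.

```latex
\begin{align}
V(s,t) &= \sup_{t' \ge t}\left( \int_{t}^{t'} K(s,\theta)\,d\theta
          + \bar{V}(s,t') \right) \\
\bar{V}(s,t) &= \max_{a \in A} Q(s,t,a) \\
Q(s,t,a) &= \sum_{\mu \in M} L(\mu \mid s,t,a)\, U(\mu,t) \\
U(\mu,t) &=
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} P_\mu(t')\,
  \bigl[ R(\mu,t,t'-t) + V(s'_\mu,t') \bigr]\,dt'
  & \text{if } T_\mu = \mathrm{ABS} \\[2ex]
\displaystyle\int_{-\infty}^{\infty} P_\mu(t'-t)\,
  \bigl[ R(\mu,t,t'-t) + V(s'_\mu,t') \bigr]\,dt'
  & \text{if } T_\mu = \mathrm{REL}
\end{cases}
\end{align}
```

Here V is the time-value function allowing for waiting, V̄ is the time-value function of the immediate action, Q is the expected time-value over outcomes, and U is the utility of outcome µ at time t.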

D. Time-Dependent Multi-Agent Markov Decision Processes
Based on the two previous definitions of the MMDP and the TMDP, a new formalism is defined that combines both approaches. It is called the Time-Dependent Multi-Agent Markov Decision Process (TMMDP). It is an MMDP, seen as a cooperative multi-agent system as in [6], associated with time-dependence capabilities as defined by [27]. The MMDP is thus extended with a continuous, observable time dimension contained in the state space. Assuming the time variable is common to all agents, a global time is associated with all of them.
A TMMDP is defined by:
• n: number of agents.
• R: R(µ, t, δ) is the reward attached to outcome µ at time t, for all agents, with duration δ.
The aim of defining the TMMDP formalism is to model and solve large real problems of planning under uncertainty, taking into account both the cooperative-agent property and the time evolution. The resulting policies are actions to be performed by the agents in every time sequence (see Fig. 4).
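To make the role of outcomes, likelihoods, and time-dependent rewards concrete, the sketch below solves a tiny time-dependent multi-agent problem by backward induction. It rests on several assumptions that differ from the formalism above: time is discretized on an integer grid instead of being continuous, waiting carries no reward, arrivals after the horizon have zero future value, and the `Outcome` class, the toy states, and the reward function are hypothetical. The exact continuous-time value iteration of [27] is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable


@dataclass
class Outcome:
    next_state: Hashable                 # s'_mu, the resulting joint state
    kind: str                            # T_mu: "ABS" or "REL"
    pmf: Dict[int, float]                # P_mu over arrival times or durations
    reward: Callable[[int, int], float]  # R(mu, t, delta)


def utility(mu: Outcome, t: int, V: dict, horizon: int) -> float:
    """U(mu, t): expected reward plus value of the arrival state."""
    total = 0.0
    for x, p in mu.pmf.items():
        arrival = x if mu.kind == "ABS" else t + x
        if arrival <= t:
            continue  # assumption: arrivals in the past are unreachable
        future = V[(mu.next_state, arrival)] if arrival <= horizon else 0.0
        total += p * (mu.reward(t, arrival - t) + future)
    return total


def solve(states, joint_actions, outcomes, likelihood, horizon):
    """Backward induction on the time grid 0..horizon.

    outcomes[(s, a)] lists the outcomes of joint action a in joint state s;
    likelihood(mu, s, t, a) plays the role of L(mu | s, t, a).
    """
    V = {(s, horizon): 0.0 for s in states}
    policy = {}
    for t in range(horizon - 1, -1, -1):
        for s in states:
            # Default choice: wait one time step without reward.
            best_a, best_q = "wait", V[(s, t + 1)]
            for a in joint_actions:
                if (s, a) not in outcomes:
                    continue  # action not feasible in this state
                q = sum(likelihood(mu, s, t, a) * utility(mu, t, V, horizon)
                        for mu in outcomes[(s, a)])
                if q > best_q:
                    best_a, best_q = a, q
            V[(s, t)] = best_q
            policy[(s, t)] = best_a
    return V, policy


# Toy problem (made-up numbers): gate 1 releases a flight, gate 2 takes it.
states = ["conflict", "clear"]
joint_actions = [("release", "take")]
outcomes = {
    ("conflict", ("release", "take")): [
        # Reassignment lasts 1 or 2 steps; acting earlier is rewarded more.
        Outcome("clear", "REL", {1: 0.5, 2: 0.5},
                lambda t, d: 10.0 - 2.0 * t - d)
    ]
}
V, policy = solve(states, joint_actions, outcomes,
                  lambda mu, s, t, a: 1.0, horizon=5)
print(policy[("conflict", 0)], V[("conflict", 0)])
```

The policy is indexed by both joint state and time, which is the discrete analogue of the time-valued decisions that the TMMDP is meant to produce.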

A. Multi-Agent Reasoning
Various efforts have been made in the literature to manage uncertainty (see Section 2). With the objective of building a robust model, a multi-agent based method is selected to develop a solution that is as resistant as possible to flight disturbances. This specialized background is chosen to model the problem because MAS methods are gaining wide acceptance as an effective instrument for solving complex problems and offer a promising alternative. Multi-agent reasoning also has many advantages, such as the distribution of processing, which makes some types of problems simpler to design. Additionally, it provides an intelligent alternative for complex problems and a logical approach of decomposition into individual cooperating agents. In this paper, the MAS is considered to be managed by a centralized controller, and the solution is composed of all possible decisions that could be taken within the planning horizon of the flight-to-gate assignment. This approach therefore assumes that there is no need for real-time optimization, since the solutions are predetermined for all likely cases of disturbance. Hence, for any given gate assignment combination, the solution offers the best gate allocation decision to make.

B. Time-Dependence Behavior
The main interest here lies in sequential decision problems. The theory of MDPs provides a well-known tool to model and solve them with optimal results. However, real-world problems have an additional, specific behavior: time dependence. An MDP only reflects fixed time steps between decision epochs, which can easily be modeled as iteration steps. This property does not reflect the real evolution of problems such as gate assignment. To bypass this limitation, the Time-dependent MDP (TMDP) has been proposed (see the previous section); in this model, the transition between states is not instantaneous but takes a specific time t. In addition, time is always observable in a TMDP, so optimal policies give the agent the best moment to make a decision or execute an action according to the state of the system.
Inspired by other applications such as truck dispatching systems, where decisions about truck assignments and destinations are made in real time [32], the temporal aspect is exploited and projected onto the Gate Assignment Problem. The rewards associated with action outcomes in the time-dependent framework are therefore represented as time-dependent functions that carry more realistic information about the evolution of the problem. Before extending the GAP model to be time-dependent, an earlier MMDP formulation of the Gate Assignment Problem, as in [21], is presented. The model is given by a tuple <K, S, A, P, R> as follows (see Fig. 5):

C. Multi-Agent Model for the GAP:
The state is a vector giving the feasible combinations of flights, indexed by their assignment position, s = (v_1, ..., v_k), where k is the number of gates and v_i ∈ V. V represents the set of flights to be allocated to gates during the planning horizon (one day in general).

The set of actions A = A_1 × ... × A_k describes the joint actions of the agents, where A_i is the set of local actions of agent i. For each agent i, performing an action a_i ∈ A_i corresponds to allocating a flight to gate i.
Each agent is therefore in charge of handling a particular gate, and agent i considers only the set of feasible flights that are appropriate to be allocated to gate i. This assumption acts as a feasibility constraint that describes the possible assignments.

The set of feasible flights for gate i is defined at each discrete time t. Then P(s' | s, a) gives the transition probability: it represents the probability of going from state s to another state s' when the agents perform a joint action a. This probability is viewed as the possibility of modifying the assignment combination from s to s' as a result of executing a reassignment action.
The probability P integrates the complete stochastic information about the gate assignment, including stochastic delays as well as additional disturbances that impact the gate assignment, computed as a probability of occurrence. Estimation techniques are used to build this probabilistic model of the GAP under possible disruptions.
The way transition probabilities are defined is essential for the robustness of the MMDP-based GAP model. The stochastic state transition matrix P defines all possible state transition probabilities P(s' | s, a). Various statistical estimation methods could be applied to calculate these probabilities. The method of [33] is applied here, using statistical data on state transitions. The actions corresponding to flight combinations are identified, and the resulting states are collected from data. Among the values collected from observed data, one corresponds to the case without disruption in state s when performing action a, and the other to the case of disruption observed between state s and state s' when performing action a. The transition probability between s and s' when performing action a is then estimated from these observed values.

The reward includes, as in the previous model, two components:
- A benefit from the gate assignment outcome µ.
- A penalty for assignment outcomes µ that cause a possible disturbance at time t and with duration δ.
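The estimation of transition probabilities from observed data described above can be illustrated with a relative-frequency estimate. This is a generic sketch and not necessarily the estimator of [33], whose exact formula is not shown in the text; the function name, the observation format, and the toy log are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Observation = Tuple[Hashable, Hashable, Hashable]  # (state, action, next_state)


def estimate_transitions(
    observations: Iterable[Observation],
) -> Dict[Tuple[Hashable, Hashable], Dict[Hashable, float]]:
    """Estimate P(s' | s, a) as the observed frequency of s' after (s, a)."""
    counts = defaultdict(Counter)
    for s, a, s_next in observations:
        counts[(s, a)][s_next] += 1
    P = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        P[key] = {s_next: n / total for s_next, n in counter.items()}
    return P


# Made-up log: action "assign" in state "s0" went undisturbed 8 times
# and led to a disrupted state 2 times.
log = [("s0", "assign", "s0_ok")] * 8 + [("s0", "assign", "s0_disrupted")] * 2
P_hat = estimate_transitions(log)
print(P_hat[("s0", "assign")])
```

With few observations per (state, action) pair, such frequency estimates are noisy, so in practice some smoothing or a minimum sample size may be needed.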

IV. EXPERIMENT
A. Multi-Agent Model Experiment:
A computational analysis is carried out to test the efficiency of the Multi-Agent MDP approach, using a simple data example to conduct the experiments.
For simplicity, the data include two gates and three aircraft to allocate in a discrete time window.
V is the set of flights, with an additional value representing a vacant gate.
The solution first provided by a deterministic approach from the literature is used as the initial policy. Simple values are used as input parameters, for simulation purposes only. The preliminary policy is designed according to observations; transition probabilities and rewards are shown in Fig. 7. A simple experiment is conducted to demonstrate the feasibility of the suggested resolution method. The initial policy is not feasible because of the delay of a flight (Table II), which causes a conflict in the gate allocation. This solution is therefore used as the initial policy in the policy iteration algorithm, and the algorithm is then run.
After executing the value iteration algorithm in MATLAB, the solution obtained gives a different order in the gate assignment; the optimal policy is given in Table III. Table III shows that the proposed approach can give a solution that is more robust to delays. Compared with the single-agent MDP in [21], this approach is more representative of the problem structure because of the multi-agent distribution of processing, which simplifies its design. In addition, the MMDP gives gate assignment configurations as multidimensional policies, instead of the single gate-to-flight assignment of the MDP. However, the MMDP model only provides fixed time steps between decision epochs (iteration steps), which does not reveal the real evolution of the gate assignment, whose time differs from the iteration step and is always observable. The next paragraph gives an experiment with time dependence.

B. Time-Dependent Multi-Agent Model Experiment:
In this paragraph, an experiment is conducted on the Time-Dependent Multi-Agent MDP modeled earlier.
For simplicity, every action has a single outcome. Actions and outcomes can therefore be directly identified with each other, and actions are deterministic with regard to the discrete component of the state: for any action that is feasible in a given state, its single outcome occurs with likelihood one. Real data from six flights at Hong Kong International Airport are used, as in Table IV; three gates are dedicated to these flights.
A gate conflict is detected between flights LH738/739 and SQ862/861 due to a disturbance.
The experiment starts from a specific state of the system corresponding to the airport gate assignment. Other possible actions are then explored to adapt the assignment to arriving flights, representing a change in the gate configuration. For simplicity, all outcomes are of absolute type (T_µ = ABS), so outcomes with durations are not considered. The probability density functions are then defined for every outcome (see Fig. 9 for an example). This probability includes the stochastic information related to action execution. Rewards are set so as to score every assignment action in the airport. The value iteration algorithm, which gives an exact resolution [27], is then applied. The resulting solution consists of a time-dependent policy that chooses outcomes avoiding the disturbance situation. The solution given by this approach is therefore robust and handles flight delays. Including information about possible disturbances further improves the quality of the GAP solution.

V. CONCLUSION AND PERSPECTIVE
In this work, a new approach has been formulated for the Gate Assignment Problem (GAP), based on Time-dependent Multi-Agent Markov Decision Processes (TMMDP). This method aims to provide a robust mechanism that gives a time-valued approach for dealing with disturbances in every time sequence. The solution provided is the set of all decisions that could be taken at every time within the planning horizon of the flight assignment. This kind of model accounts for the needs of real-time optimization because it provides, at every time, a solution that manages disturbances.
Experiments on this approach, using a real data sample and a simulation of the associated value iteration algorithm, provide a better feasible solution than the deterministic model.
The aim behind this work is to offer airport controllers a robust, time-valued solution that takes into consideration the possibilities of gate conflict; even if it may take more time to solve, it manages the risks in gate assignment well.
As a perspective, this type of model can be further extended to take into account as many other real constraints of gate assignment as possible.
The Time-dependent Markov Decision Process extends the Markov decision process model by including a continuous, observable time dimension in the state space. The added time variable allows a more realistic representation of large problems with time-varying transitions or rewards. The TMDP therefore covers problems with the following properties:
• State transitions are stochastic.
• Time-dependent action durations are stochastic.
• Rewards are time-dependent.

• A: discrete action space.
• M: discrete set of outcomes, each of the form µ = <s'_µ, T_µ, P_µ>:
  - s'_µ ∈ S: the resulting state;
  - T_µ ∈ {ABS, REL}: the type of the resulting time distribution (absolute or relative);
  - P_µ(t') (if T_µ = ABS): probability density function (pdf) over absolute arrival times of µ;
  - P_µ(δ) (if T_µ = REL): probability density function over durations of µ.
• L: L(µ | s, t, a) is the likelihood of outcome µ given action a, state s, and time t.
• R: R(µ, t, δ) is the reward associated with outcome µ at time t with a duration δ.
Fig. 2 shows a simple graphical representation of the TMDP evolution.

U(µ, t): utility associated with outcome µ at time t.
V̄(s, t): time-value function of the immediate action.
Q(s, t, a): expected Q time-value over outcomes.

• S: the set of states.
• A = A_1 × ... × A_n: the set of joint actions of the agents, where A_i is the set of local actions of agent i.
• M: discrete set of outcomes, each of the form µ = <s'_µ, T_µ, P_µ>:
  - s'_µ ∈ S: the resulting state;
  - T_µ ∈ {ABS, REL}: the type of the resulting time distribution (absolute or relative);
  - P_µ(t') (if T_µ = ABS): pdf (probability density function) over absolute arrival times of µ;
  - P_µ(δ) (if T_µ = REL): pdf over durations of µ.
• L: L(µ | s, t, a) is the likelihood of outcome µ given joint state s, time t, and joint action a = (a_1, ..., a_n).

Fig. 7. Transition and reward matrices. As in Table I, the transition entry expresses the probability of disruption when performing the action on the given state.

Fig. 8 shows the corresponding state transition diagram.

TABLE I. INITIAL POLICY WITHOUT DISRUPTION

TABLE II. CONFLICTING ASSIGNMENT IN INITIAL POLICY DUE TO DELAY

TABLE III. OPTIMAL POLICY