Characterization of Dynamic Bayesian Network: The Dynamic Bayesian Network as Temporal Network

Abstract: In this report, we are interested in Dynamic Bayesian Networks (DBNs) as a model that incorporates the temporal dimension together with uncertainty. We start with the basics of DBNs, focusing in particular on inference and learning concepts and algorithms. We then present the different levels and methods of creating DBNs, as well as approaches for incorporating the temporal dimension into static Bayesian networks.


I. INTRODUCTION
Most events encountered in everyday life are not well described by their occurrence at a particular point in time; rather, they are described by a set of observations that together produce a comprehensive final event. Thus, time is an important dimension to take into account in reasoning, and in the field of artificial intelligence in general. To add the time dimension to Bayesian networks, different approaches have been proposed. The common names used to describe this new dimension are "temporal" and "dynamic".

A. Definition
Bayesian networks represent a set of variables as the nodes of a directed acyclic graph, which encodes the conditional independencies between these variables. They offer four advantages as a data modeling tool [16,17,18]. A dynamic Bayesian network can be defined as a repetition of conventional networks in which we add a causal link from one time step to the next. Each network contains a number of random variables representing the observations and the hidden states of the process.
We consider a dynamic Bayesian network composed of a sequence of T hidden state variables (a hidden state of a DBN is represented by a set of hidden state variables) and a sequence of T observable variables, where T is the time horizon of the studied process.
For the specification of this network to be complete, we need to define the following parameters:
- The transition probability between states
- The conditional probability of the observations given the hidden states
- The probability of the initial state
The first two parameters must be determined for each time t. These parameters may or may not be invariant over time.

B. Inference
The general problem of inference for DBNs is to calculate $P(X_t^i \mid o_{1:\tau})$, where $X_t^i$ is the i-th hidden variable at time $t$ and $o_{1:\tau}$ represents all observations between times $1$ and $\tau$.
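There are several interesting cases of inference; they are illustrated below. The arrow indicates the time t of the state that we try to estimate. Shaded regions correspond to the observations between times 1 and T.
We consider a dynamic Bayesian network B with N hidden states, where $\pi_i$ is the probability of the initial state $i$, $a_{ij}$ is the transition probability from state $i$ to state $j$, and $b_j(o_t)$ is the probability of observing $o_t$ in state $j$. We wish to calculate the probability $P(O \mid B)$ of occurrence of the sequence of observations $O = o_1, \dots, o_T$. This probability is:

$$P(O \mid B) = \sum_{x_1, \dots, x_T} \pi_{x_1} b_{x_1}(o_1) \prod_{t=2}^{T} a_{x_{t-1} x_t} \, b_{x_t}(o_t) \tag{1}$$

Applying this formula directly requires a computation time of $O(T N^T)$. To reduce this cost, we consider the forward variable defined by:

$$\alpha_t(i) = P(o_1, \dots, o_t, X_t = i \mid B) \tag{2}$$

which expresses the probability of observing the sequence $o_1, \dots, o_t$ while being in state $i$ at time $t$. This variable can be computed inductively:

Initialisation: $\alpha_1(i) = \pi_i \, b_i(o_1)$

Induction: $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \right] b_j(o_{t+1})$

Thus, we can calculate $\alpha_T(i)$, which naturally leads us to:

$$P(O \mid B) = \sum_{i=1}^{N} \alpha_T(i) \tag{4}$$

It is also possible to perform the calculation in reverse, using the backward algorithm.
For this, we define the backward variable as follows:

$$\beta_t(i) = P(o_{t+1}, \dots, o_T \mid X_t = i, B) \tag{5}$$

This variable expresses the conditional probability of the observations from time t+1 until the last observation time T, given the value of the hidden state at time t. Its calculation follows this procedure:

Initialisation: $\beta_T(i) = 1$

Induction: $\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)$

Thus, we can calculate the expected probability:

$$P(O \mid B) = \sum_{i=1}^{N} \pi_i \, b_i(o_1) \, \beta_1(i) \tag{6}$$

The complexity of this algorithm is, as for the forward algorithm, $O(T N^2)$.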
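The two recursions can be sketched in a few lines of plain Python. This is only a minimal illustration, assuming a discrete model with a single hidden variable per slice; the names `PI`, `A`, `B`, `OBS` and all numerical values are hypothetical and do not come from the original text.

```python
# Illustrative discrete model; every number below is a made-up example value.
PI = [0.6, 0.4]                   # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]      # A[i][j]: transition probability from state i to j
B = [[0.9, 0.1], [0.2, 0.8]]      # B[j][o]: probability of observing o in state j
OBS = [0, 1, 0]                   # observed sequence o_1..o_T


def forward(pi, a, b, obs):
    """Return alpha, where alpha[t][i] = P(o_1..o_{t+1}, state i) (0-based t)."""
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]          # initialisation
    for o in obs[1:]:                                            # induction
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * a[i][j] for i in range(n)) * b[j][o]
            for j in range(n)
        ])
    return alpha


def backward(a, b, obs):
    """Return beta, where beta[t][i] = P(remaining observations | state i at t)."""
    n = len(a)
    beta = [[1.0] * n]                                           # initialisation at T
    for o in reversed(obs[1:]):                                  # induction, T-1 down to 1
        nxt = beta[0]
        beta.insert(0, [
            sum(a[i][j] * b[j][o] * nxt[j] for j in range(n))
            for i in range(n)
        ])
    return beta


alpha = forward(PI, A, B, OBS)
beta = backward(A, B, OBS)

# Likelihood of the observations computed in both directions.
p_forward = sum(alpha[-1])
p_backward = sum(PI[i] * B[i][OBS[0]] * beta[0][i] for i in range(len(PI)))
print(p_forward, p_backward)
```

Both directions should give the same likelihood, which is a convenient sanity check when implementing either recursion. Each step of the induction costs N² multiplications, which is where the O(TN²) complexity comes from.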
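From these two propagations of probabilities (forward and backward), we can derive other terms that are useful for inference and learning in dynamic Bayesian networks. In what follows, $\alpha_t(i)$ and $\beta_t(i)$ are the forward and backward variables, and $a_{ij}$ and $b_j(o_t)$ denote the transition and observation probabilities.
- Smoothing: this is to calculate $P(X_t = i \mid O, B)$ where t < T. From equations (4) and (6), we can determine the following equation:

$$\gamma_t(i) = P(X_t = i \mid O, B) = \frac{\alpha_t(i) \, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j) \, \beta_t(j)}$$

$\gamma_t(i)$ is called the smoothing operator. We can also derive higher-order smoothing equations. For example, a smoothing of the first order is defined as follows:

$$\xi_t(i, j) = P(X_t = i, X_{t+1} = j \mid O, B) = \frac{\alpha_t(i) \, a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)}{\sum_{k=1}^{N} \sum_{l=1}^{N} \alpha_t(k) \, a_{kl} \, b_l(o_{t+1}) \, \beta_{t+1}(l)}$$

These terms may be used to easily calculate the probabilities of hidden states from the neighboring nodes.
- Decoding: this is to find the most likely sequence of hidden states $\hat{x}_1, \dots, \hat{x}_T$ given the observations. This task can be solved using the dynamic programming algorithm of Viterbi. We can start with the following equation:

$$\delta_t(i) = \max_{x_1, \dots, x_{t-1}} P(x_1, \dots, x_{t-1}, X_t = i, o_1, \dots, o_t \mid B) \tag{13}$$

Considering the topology of the DBN, we can deduce:

$$\delta_{t+1}(j) = \left[ \max_{i} \delta_t(i) \, a_{ij} \right] b_j(o_{t+1})$$

We can now easily deduce that:

$$\max_{x_1, \dots, x_T} P(x_1, \dots, x_T, O \mid B) = \max_{i} \delta_T(i)$$

To find $\hat{x}_t$, we must introduce the argument that maximizes $\delta_t(i) \, a_{ij}$ as follows:

$$\psi_{t+1}(j) = \arg\max_{i} \delta_t(i) \, a_{ij}$$

And we have:

$$\hat{x}_T = \arg\max_{i} \delta_T(i), \qquad \hat{x}_t = \psi_{t+1}(\hat{x}_{t+1}) \quad \text{for } t = T-1, \dots, 1$$

Note that if we want to use the Viterbi algorithm to decode the sequence of hidden states, we must have a complete observation sequence $O$. If the number of observations is not sufficient, a less optimal solution known as the truncated Viterbi algorithm can be used.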
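The Viterbi recursion can be illustrated with the following sketch. It assumes a discrete model with a single hidden variable per time slice; the function name `viterbi`, the parameter names and the numerical values are hypothetical choices made for this example only.

```python
# Illustrative discrete model; every number below is a made-up example value.
PI = [0.6, 0.4]                   # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]      # A[i][j]: transition probability from state i to j
B = [[0.9, 0.1], [0.2, 0.8]]      # B[j][o]: probability of observing o in state j
OBS = [0, 1, 0]                   # complete observation sequence


def viterbi(pi, a, b, obs):
    """Return (most likely state sequence, joint probability of that sequence)."""
    n = len(pi)
    delta = [pi[i] * b[i][obs[0]] for i in range(n)]    # best score ending in state i
    back_pointers = []                                  # best predecessor of each state
    for o in obs[1:]:
        best_prev = [
            max(range(n), key=lambda i: delta[i] * a[i][j])
            for j in range(n)
        ]
        delta = [
            delta[best_prev[j]] * a[best_prev[j]][j] * b[j][o]
            for j in range(n)
        ]
        back_pointers.append(best_prev)
    # Termination: pick the best final state, then follow the pointers backwards.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for best_prev in reversed(back_pointers):
        path.insert(0, best_prev[path[0]])
    return path, delta[last]


path, prob = viterbi(PI, A, B, OBS)
print(path, prob)
```

The back-pointers play the role of the maximizing argument introduced above: they are stored during the forward pass and only read once the last observation has been processed, which is why a complete observation sequence is needed.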

2) Junction Tree Algorithm:
The Junction Tree Algorithm [1] is an algorithm similar to the Baum-Welch algorithm used in HMMs. It involves transforming the original network into a new structure called a junction tree and then applying an inference algorithm of the type used for static Bayesian networks. This tree is obtained by following these steps:
- Moralization: connecting the parents of each node and dropping the directions of the arcs.
- Triangulation: selectively adding arcs to the moral graph so that it contains no chordless cycle of length 4 or more.
- Junction tree: obtained from the triangulated graph by connecting the cliques such that all cliques on the path between two cliques X and Y contain X ∩ Y.
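As an illustration of the first step only, the sketch below moralizes a small directed graph. The representation (a dictionary mapping each node to the list of its parents), the function name `moralize` and the example graph are assumptions made for this example; triangulation and clique-tree construction are not shown.

```python
def moralize(parents):
    """Build the moral graph of a DAG.

    parents: dict mapping each node to the list of its parents.
    Returns a dict mapping each node to the set of its undirected neighbours.
    """
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    neighbours = {v: set() for v in nodes}
    for child, ps in parents.items():
        # Drop the directions: every parent-child arc becomes an undirected edge.
        for p in ps:
            neighbours[child].add(p)
            neighbours[p].add(child)
        # "Marry" the parents: connect every pair of parents of the same child.
        for index, p in enumerate(ps):
            for q in ps[index + 1:]:
                neighbours[p].add(q)
                neighbours[q].add(p)
    return neighbours


# Hypothetical example: A and B are both parents of C, and C is the parent of D.
example_parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
moral_graph = moralize(example_parents)
print(sorted(moral_graph["A"]))
```

In this example the only edge added by moralization is the one between A and B, because they share the child C.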

B. Approximate inference:
As the dimension of a Bayesian network increases, the computing time grows accordingly. Moreover, when the conditional probability tables are derived from data (learning), these tables are not exact. In this case it is not worth spending time on exact inference over imprecise probabilities, hence the use of approximate inference methods. Among the approximate inference methods that often work well in practice, we cite:

1) Variational methods
The simplest example is the mean-field approximation [2], which exploits the law of large numbers to approximate large sums of random variables by their average. The mean-field approximation produces a lower bound on the likelihood. There are other, tighter approximations that yield both a lower and an upper bound.

2) Monte Carlo
The simplest Monte Carlo method [3] is Importance Sampling (IS), which draws a large number of samples x from the unconditional distribution of the hidden variables, P(x), and then weights the samples according to their likelihood P(y|x) (where y is the observation). This forms the basis of the particle filter, which is simply Importance Sampling adapted to a dynamic Bayesian network.
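The weighting scheme described above can be sketched for a single discrete hidden variable as follows. This is a minimal illustration rather than a particle filter: the function name, the `PRIOR` and `LIKELIHOOD` tables and the sample size are hypothetical values chosen for the example.

```python
import random


def importance_sampling_posterior(prior, likelihood, y, num_samples, seed=0):
    """Estimate P(x | y) by sampling x from the prior and weighting by P(y | x)."""
    rng = random.Random(seed)                      # fixed seed for reproducibility
    n = len(prior)
    samples = rng.choices(range(n), weights=prior, k=num_samples)
    weights = [likelihood[x][y] for x in samples]  # weight of each sample
    total = sum(weights)
    posterior = [0.0] * n
    for x, w in zip(samples, weights):
        posterior[x] += w / total                  # normalized weighted frequency
    return posterior


# Made-up example values: two hidden states, two possible observations.
PRIOR = [0.6, 0.4]                         # P(x)
LIKELIHOOD = [[0.9, 0.1], [0.2, 0.8]]      # LIKELIHOOD[x][y] = P(y | x)

estimate = importance_sampling_posterior(PRIOR, LIKELIHOOD, y=0, num_samples=20000)
exact = [PRIOR[x] * LIKELIHOOD[x][0] for x in range(2)]
exact = [p / sum(exact) for p in exact]
print(estimate, exact)
```

With enough samples the weighted estimate approaches the exact posterior obtained by Bayes' rule. A particle filter repeats this sampling and weighting at each time slice, using the transition model to propagate the samples.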

3) Loopy Belief propagation
We apply Pearl's algorithm [4] to the original graph even if it contains loops. In theory, this runs the risk of double counting some evidence, but it has been shown that in some cases (for example, a single loop), all evidence is counted twice equally and thus cancels out to give the correct answer.

4) Learning:
Learning consists in estimating the conditional probability tables (CPTs) and the conditional probability distributions (CPDs). This task is based on the EM (Expectation Maximization) algorithm or the GEM (Generalized Expectation Maximization) algorithm for DBNs.
Let M be a dynamic Bayesian network with parameter $\theta$. Learning aims to determine $\hat{\theta}$ such that the probability of the observations $O$ is maximal, that is:

$$\hat{\theta} = \arg\max_{\theta} P(O \mid M, \theta)$$

EM Algorithm: this algorithm includes:
- an expectation step (E), which calculates the expectation of the likelihood taking into account the latest observed variables,
- a maximization step (M), which computes a maximum likelihood estimate of the parameters by maximizing the expected likelihood found in step E.

C. Pruning
This task is based on the possibility that the structure of the DBN changes over time. It is usually omitted because of its complexity. Pruning the network consists in performing one of the following operations:

a) Delete one or more states of a given node
b) Remove a connection between two nodes
c) Remove one or more network nodes
This can be exact (lossless) or approximate.

III. DIFFERENT LEVELS OF CREATING DBN
To describe a dynamic Bayesian network, we must specify its topology (the graph structure) as well as all the tables of conditional probability distributions. Both (the graph and the distributions) can be learned from experimental data. However, learning the structure is more difficult than learning the parameters.
www.ijacsa.thesai.org
It is possible that some nodes are hidden during the experiments (values that we cannot observe), or that data are missing. In this case, learning the parameters becomes more complicated. From these considerations, there are 4 possible cases of learning [5]:

B. Known structure /Partial observability
When certain variables are not observable, the likelihood surface becomes multimodal and we must use iterative methods such as EM or gradient ascent to find local maxima of the ML / MAP function. The principle of the EM algorithm is to associate with the incomplete-data problem a complete-data problem for which a simple maximum likelihood solution exists. This procedure needs an inference algorithm to compute the parameters for each node. These algorithms are explained in section II.3.

C. Unknown structure / Full observability
There are several techniques for learning the structure of a DBN from observed data. These techniques build the network structure by adding or deleting edges between any two nodes, or by reversing the direction of an existing arc. These changes must be made so that the graph remains a directed acyclic graph.
To accomplish the task of structural learning, we need [6]:
- an algorithm to find the different possible structures
- a metric for comparing the possible structures to each other
The structure learning algorithms can be classified into two broad categories:
- The first class of algorithms uses heuristic search methods to construct the graph and evaluates it using scores (scoring methods). This procedure is repeated until the improvement between two consecutive models is no longer significant.
- The second class of algorithms creates the network structure by analyzing the independence relations between nodes. These independence relations are measured using several types of conditional independence tests (e.g. the mutual information between two nodes can be considered as a criterion for conditional independence).
According to Cheng et al. [7], when comparing the two types of algorithms, we can conclude that the first class of algorithms is faster than the second if the network is densely connected, but cannot find the best solution for most models corresponding to real processes because of the heuristic nature of these algorithms. The second class of algorithms can produce, under some assumptions, an optimal or near-optimal solution, especially when the data are not numerous.
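The mutual information criterion mentioned for the second class of algorithms can be estimated from data as in the sketch below. It computes the empirical (unconditional) mutual information between two discrete variables; the function name and the toy samples are hypothetical, and a real independence test would also need a decision threshold and conditioning sets, which are not shown.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete samples."""
    n = len(xs)
    joint_counts = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    # Sum of p(x, y) * log( p(x, y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (count / n) * log(count * n / (x_counts[x] * y_counts[y]))
        for (x, y), count in joint_counts.items()
    )


# Toy data: the first pair of variables is independent, the second is identical.
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
mi_identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
print(mi_independent, mi_identical)
```

A value close to zero suggests that no arc is needed between the two nodes, whereas a large value suggests a dependency that the structure should represent.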

D. Unknown Structure /Partial Observability
The EM algorithm was developed for learning network parameters, so it must be adapted to perform structural learning from incomplete data. Structural EM (SEM) is one of the most popular techniques developed for this purpose. SEM has the same E-step as the EM algorithm, completing the data using the observations and the current structure of the network. The M-step involves two parts: in the first, it recomputes, as already explained, the maximum likelihood estimate of the parameters; in the second, it uses these parameters to evaluate any other candidate structure similar to the current structure.

IV. DIFFERENT APPROACHES FOR INCORPORATING TIME IN BAYESIAN NETWORK
Dynamic Bayesian Networks (DBNs) are an extension of Bayesian networks that represent the temporal or spatial evolution of random variables. There are several models for incorporating time into the network representation. These models can be classified into three broad categories:
- Models that use static BNs and formal grammars to represent the temporal dimension (probabilistic temporal networks, PTNs)
- Models that use a mixture of several probabilistic frameworks
- Models that use temporal nodes in static BNs to represent temporal dependencies
The first two models are developed for specific objectives and have a very limited use. We will therefore focus on the third model.

A. Probabilistic Temporal Networks (PTN)
1) Definition
A probabilistic temporal network (PTN) is defined as a model that represents temporal information while fully embracing probabilistic semantics. In a PTN, the nodes of the graph are temporal aggregates and the arcs are causal and/or temporal relations. This type of network uses grammatical rules to express temporal dependencies in the structure of Bayesian networks. Preserving the structure of static Bayesian networks allows the powerful inference techniques of BNs to be reused for this specific type of network. The grammar introduces temporal relations between events.

2) Temporal Reasoning
In a PTN, temporal reasoning is based on the interval algebra [8] introduced by James F. Allen in 1983. It is a calculus that defines the possible relationships between time intervals and provides a composition table that can be used as a basis for reasoning about temporal descriptions of events.
The 13 basic relations that capture the possible relationships between two intervals are listed in Table II.
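To make the thirteen relations concrete, the sketch below classifies the relation between two intervals given as (start, end) pairs. The function name `allen_relation` and the English relation labels are choices made for this illustration; they follow the usual naming of Allen's relations and are not taken from the table of the original paper.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b.

    Each interval is a (start, end) pair with start < end.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Remaining case: the intervals partially overlap.
    return "overlaps" if a_start < b_start else "overlapped-by"


# One hypothetical example per relation, all relative to the interval (3, 6).
examples = [(0, 1), (8, 9), (1, 3), (6, 8), (3, 6), (3, 4), (3, 9),
            (5, 6), (1, 6), (4, 5), (1, 9), (2, 4), (5, 8)]
relations = [allen_relation(interval, (3, 6)) for interval in examples]
print(relations)
```

Seven of the relations are primitive (before, meets, overlaps, starts, during, finishes, equals); the other six are the inverses of the first six, which is how the total of 13 is reached.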

C. Pure Probabilistic DBN
In this section we consider a DBN as a graph whose nodes represent states and whose arcs represent (causal) conditional dependencies between states within a time slice, as well as temporal dependencies between states belonging to two consecutive time slices.

1) Extension of BNs toward DBNs
A static Bayesian network can be extended in many ways to represent temporal processes. These extensions can be classified into five categories:

1- Adding the history of a node to explicitly express the temporal aspect in the Bayesian network.
2- Selecting, from a library of pre-developed Bayesian networks, the BN appropriate to the current state.
- Changing the network settings (values of the conditional probability table, CPT) from one time slice to another
- Adding or deleting nodes and/or arcs in the structure of the BN.
The structural change of a DBN (addition or deletion of edges or nodes) is a complex problem that cannot be generalized easily. In the following, we are interested in changing the parameters (CPT) of the system.
In [10], Zweig and Russell presented a model that uses decomposition techniques to represent real dynamic situations. These dynamic processes can be decomposed into several sequences. Such a decomposition can be used in speech recognition or handwriting recognition. They found it more suitable to represent dynamic (temporal) processes by creating a BN (a subnetwork) at each stage of the evolution of the process than to model the whole process with a single BN. Each subnetwork must be learned from the observations at the appropriate time.
3) DBNs for events representation
In such networks, we use the information obtained from states belonging to two consecutive time slices in order to deduce the events that took place between the two points in time. The structure of these networks is presented in Figure 1; its nodes represent:
- The random variables (corresponding to states of the real process)
- Observations
- Events
Dynamic Bayesian networks are a repetition of the traditional network in which we add a causal link (representing the time dependencies) from one time step to the next. The network topology is the same for the different time slices. The arcs and probabilities that form these models have the same interpretations as for a statistical system based on a classic SNL. Thus, a DBN is completely defined by giving the pair $(B_1, B_{\rightarrow})$, where:
- $B_1$ is a BN which defines the a priori probability $P(Z_1)$ (initial state)
- $B_{\rightarrow}$ is the temporal Bayesian network with two time slices (2TBN: two-slice Temporal Bayes Net), which defines $P(Z_t \mid Z_{t-1})$ using a directed acyclic graph (DAG) as follows:

$$P(Z_t \mid Z_{t-1}) = \prod_{i=1}^{N} P\left(Z_t^i \mid \mathrm{Pa}(Z_t^i)\right)$$

where $Z_t^i$ represents the i-th node at time t and $\mathrm{Pa}(Z_t^i)$ is the set of parents of $Z_t^i$ in the graph.

V. FROM HMMS TO DBNS
The main difference between HMMs and dynamic Bayesian networks is that in a DBN the hidden states are represented in a distributed way by a set of random variables $X_t^1, \dots, X_t^N$, whereas in an HMM the state space consists of a single random variable $X_t$.
Figure 2 shows an HMM represented in its graphical form as a dynamic Bayesian network. The gray nodes are the observed nodes and the white nodes are the hidden nodes. In Figure 2, following the notation used in the HMM literature, the node $X_1$ represents the initial state, with $\pi_i = P(X_1 = i)$. The transition matrix is represented by the tables of transition probabilities between nodes $X_{t-1}$ and $X_t$, with $a_{ij} = P(X_t = j \mid X_{t-1} = i)$. Finally, the observation matrix is found in the probability tables between nodes $X_t$ and $Y_t$, with $b_i(k) = P(Y_t = k \mid X_t = i)$. Thus, the specification of an HMM as a dynamic Bayesian network is simply given by the probability tables for $X_1$, $X_t$ and $Y_t$. Assuming that the model is invariant over time (the transition and observation matrices are fixed over time), giving $\pi = (\pi_i)$, $A = (a_{ij})$ and $B = (b_i(k))$ is sufficient.
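The specification just described can be sketched as three probability tables that are reused at every time slice. In the example below, the table names `PRIOR`, `TRANSITION` and `EMISSION`, the function name and all numbers are hypothetical; the code simply unrolls the network over the length of the sequence and multiplies the corresponding table entries.

```python
# Time-invariant HMM written as DBN tables; all values are made up.
PRIOR = [0.6, 0.4]                          # table of the initial hidden node
TRANSITION = [[0.7, 0.3], [0.4, 0.6]]       # table between consecutive hidden nodes
EMISSION = [[0.9, 0.1], [0.2, 0.8]]         # table between hidden and observed nodes


def joint_probability(prior, transition, emission, states, observations):
    """Probability of a full assignment of hidden states and observations.

    The network is unrolled over len(states) time slices, and the same
    transition and emission tables are used in every slice.
    """
    p = prior[states[0]] * emission[states[0]][observations[0]]
    for t in range(1, len(states)):
        p *= transition[states[t - 1]][states[t]]
        p *= emission[states[t]][observations[t]]
    return p


# Unrolled over three time steps, as in the figure.
p_example = joint_probability(PRIOR, TRANSITION, EMISSION,
                              states=[0, 1, 0], observations=[0, 1, 0])
print(p_example)
```

Because the tables do not depend on t, the same three objects describe a sequence of any length, which is the practical meaning of the time-invariance assumption.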
The major advantage of dynamic Bayesian networks over HMMs is that it is very easy to create alternatives to HMMs simply by giving the DBN another, more or less complex, structure. The formalism and algorithms remain the same [11]. If the tables of probability distributions (discrete tables) are replaced by continuous distributions (e.g. Gaussian), it also becomes possible to represent models based on Kalman filters [12]. It is also possible to combine these different models simply by connecting them within a DBN, and thus to obtain more complex models.

VI. REPRESENTATION OF HMMS AS DBNS
There are several variants of the HMM, which were proposed in response to specific classes of problems and to overcome limitations of traditional HMMs.
In this section, we present the most widely used variants of the HMM (shown in Figure 3). The coupled HMM (Figure 3 (a)) is probably the most natural structure; it can process, simultaneously and with good efficiency, multiple data streams from the same observations. For this reason, we briefly introduce the other representations here and present coupled HMMs in more detail in the next section.
Figure 3 (b) is a specific coupling of HMMs described in [13] as an event-coupled HMM. The motivation for this representation is to model a class of loosely coupled time series where only the occurrences of events are coupled in time.
The event-coupled HMM representation is obviously limited by its narrow structure, which suits only a very specific class of applications.
The Input/Output HMM (Figure 3 (d)) [15] represents a promising alternative to the use of a hidden Markov model. This variant makes it possible to map an input sequence to an output sequence. The main difference from traditional HMMs is that the latter model the distribution of the output sequence, whereas the Input/Output HMM models the conditional distribution of the output sequence given an input sequence. This allows for online monitoring or recognition of sequences. The inputs and outputs can be discrete or continuous, scalar or vector. The factorial HMM (Figure 3 (c)) [14] is a model used to represent systems in which the hidden states are made up of a set of decoupled dynamical systems, with only one observation available.

A. Definition
In a coupled HMM, each hidden variable (state) is connected to its own observation. It is also connected to its two nearest neighbors in the following time slice, with the exception of the variables belonging to the border chains, each of which has a single nearest neighbor (see Figure 4). In the same way as for traditional HMMs, we use the forward-backward algorithm to calculate the likelihood of the observations in the case of L coupled HMMs. In this case, each observation is a vector $O_t = (o_t^1, \dots, o_t^L)$. Since the L HMMs are coupled, the forward and backward variables should be defined jointly for all HMMs. In other words, we define the forward variable as follows:

$$\alpha_t(i_1, \dots, i_L) = P(O_1, \dots, O_t, X_t^1 = i_1, \dots, X_t^L = i_L)$$

where $X_t^l$ denotes the hidden state of the l-th HMM at time t. The two variables can then be calculated inductively, and the likelihood function is obtained by summing the forward variable at time T over all joint states:

$$P(O_1, \dots, O_T) = \sum_{i_1, \dots, i_L} \alpha_T(i_1, \dots, i_L)$$
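The joint forward recursion can be sketched for the simplest case of L = 2 coupled chains. The layout of the tables below is an assumption made for this example: `PI[c][i]` is the initial probability of state i in chain c, `A[c][i][j][k]` is the probability that chain c moves to state k given that chains 1 and 2 were in states i and j at the previous time step, and `B[c][k][o]` is the probability that chain c emits o in state k. All numbers are made up.

```python
# Hypothetical parameters of two coupled chains with two states each.
PI = [[0.6, 0.4], [0.5, 0.5]]
A = [
    [[[0.8, 0.2], [0.6, 0.4]], [[0.3, 0.7], [0.1, 0.9]]],   # chain 1
    [[[0.7, 0.3], [0.5, 0.5]], [[0.4, 0.6], [0.2, 0.8]]],   # chain 2
]
B = [
    [[0.9, 0.1], [0.2, 0.8]],   # chain 1
    [[0.7, 0.3], [0.1, 0.9]],   # chain 2
]


def chmm_likelihood(pi, a, b, obs):
    """Likelihood of a sequence of observation pairs under two coupled chains."""
    n1, n2 = len(pi[0]), len(pi[1])
    o1, o2 = obs[0]
    # Joint forward variable, indexed by the pair of hidden states (i, j).
    alpha = {
        (i, j): pi[0][i] * b[0][i][o1] * pi[1][j] * b[1][j][o2]
        for i in range(n1) for j in range(n2)
    }
    for o1, o2 in obs[1:]:
        alpha = {
            (k, l): sum(
                value * a[0][i][j][k] * a[1][i][j][l]
                for (i, j), value in alpha.items()
            ) * b[0][k][o1] * b[1][l][o2]
            for k in range(n1) for l in range(n2)
        }
    # Sum the final forward variable over all joint states.
    return sum(alpha.values())


likelihood = chmm_likelihood(PI, A, B, obs=[(0, 1), (1, 1)])
print(likelihood)
```

The joint state space grows as the product of the sizes of the individual chains, so this exact recursion is only practical for a small number of coupled HMMs.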

D. EM algorithm for learning parameters of CHMM
As in the case of traditional HMMs, the two basic steps of the EM algorithm as described in [3] are:
- Estimation step: given the observations $O$, the parameters to estimate $\lambda$ and the objective function $\mathcal{L}(\lambda)$, we construct an auxiliary function:

$$Q(\lambda, \lambda') = \sum_{X} P(X \mid O, \lambda') \log P(O, X \mid \lambda)$$

that represents the expectation of the objective function over all sequences $X$ of possible states, given the observations $O$ and the current estimated parameters $\lambda'$.
- Maximization step: in the exact EM algorithm, the role of this step is to estimate the new parameters $\hat{\lambda}$ as follows:

$$\hat{\lambda} = \arg\max_{\lambda} Q(\lambda, \lambda')$$

VIII. MOTIVATION OF USING CHMM
According to its definition, a coupled HMM can be viewed as a collection of HMMs, one for each data stream, where the discrete nodes at time t of each HMM are conditioned on the discrete nodes at time t-1 of all the linked HMMs. The characteristics of handwritten characters make it possible to perform a joint analysis of the image of a character along two preferred directions: vertical ("columns") and horizontal ("rows"). We will therefore use the coupled HMM (CHMM) to couple two HMMs: one handles the observations on the columns and the other handles the observations on the rows.

- Filtering: this is to estimate the belief state at time t knowing all the observations up to this moment: $P(X_t \mid o_{1:t})$
- Decoding (Viterbi): the decoding problem is to determine the most likely sequence of hidden states knowing the observations up to time t: $\hat{x}_{1:t} = \arg\max_{x_{1:t}} P(x_{1:t} \mid o_{1:t})$
- Prediction: this is to estimate a future observation or state knowing the observations up to the current time t: $P(X_{t+h} \mid o_{1:t})$ or $P(o_{t+h} \mid o_{1:t})$ with h > 0
- Smoothing (offline): this is to estimate a past state knowing the observations up to the current time T: $P(X_t \mid o_{1:T})$ with t < T
There are several algorithms for inference in dynamic Bayesian networks. We can classify these algorithms, according to their accuracy, into two broad classes:
A. Exact Inference:
1) Forward-Backward Algorithm:
The algorithm proceeds in two steps:
a) Forward step: forward propagation of probabilities
b) Backward step: backward propagation of probabilities
FORWARD ALGORITHM:


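Prediction: this is to calculate $P(X_{t+1} = j \mid o_{1:t})$ and $P(o_{t+1} \mid o_{1:t})$. Writing $\alpha_t(i)$ for the forward variable, $a_{ij}$ for the transition probabilities and $b_j(o)$ for the observation probabilities, we can easily determine:

$$P(X_{t+1} = j \mid o_{1:t}) = \frac{\sum_{i} \alpha_t(i) \, a_{ij}}{\sum_{i} \alpha_t(i)}, \qquad P(o_{t+1} \mid o_{1:t}) = \sum_{j} P(X_{t+1} = j \mid o_{1:t}) \, b_j(o_{t+1})$$

B. DBN as a mixture of several probabilistic structures
Dynamic Bayesian networks generalize hidden Markov models (HMMs) and linear dynamical systems (LDSs) by representing the hidden (and observed) states as state variables with complex interdependencies. HMMs are used to represent discrete states and LDSs are used to represent continuous states (variables). Combining these two structures creates a mixed-state DBN. This type of model was introduced and applied to the recognition of human gestures [9].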

3- Changing dynamically the structure of the network.
4- Repeating the traditional network for each time step, with Bayesian networks introduced to represent events.
5- Repeating the classical Bayesian network by adding arcs representing the time dependencies from one time slice to the next.
The networks of the first category may be regarded as mere static BNs to which an additional node is added to represent past information. The second class of Bayesian networks stems from an idea that was used in early work on DBNs by Singhal et al. They use a collection of BNs (COBRA) developed locally, and each time the system selects the Bayesian network corresponding to its beliefs about the current state of the real objects studied, hence the dynamic (temporal) aspect of this class. We describe in more detail the three other types of extension of BNs to DBNs:
2) Dynamic change in the structure of DBNs
Changes in the structure of a DBN can be:

Figure 1. General structure of a dynamic belief network. In such networks, there are three types of nodes, W, O and E, which represent respectively the random variables, the observations and the events.

Figure 2. HMM represented as an instance of a DBN unrolled over three time steps

Figure 3. Different variants of HMM. The empty circles represent the hidden states and the gray circles represent the observations (the light gray circles in Figure 3 (d) represent the input nodes). (a) coupled HMM, (b) event-coupled HMM, (c) factorial HMM, (d) input-output HMM.

Figure 4. Coupling of two HMMs

B. Parameters of coupled HMMs
Let a CHMM model be formed of L coupled HMMs. This model is fully described by giving the following parameters:
- initial probabilities: $\pi^l(i) = P(X_1^l = i)$ for each HMM $l = 1, \dots, L$

TABLE II. ALLEN'S INTERVAL ALGEBRA