Verifying Weak Probabilistic Noninterference

Weak probabilistic noninterference is a security property for enforcing confidentiality in multi-threaded programs. It aims to guarantee secure flow of information in the program and to ensure that sensitive information does not leak to attackers. This paper addresses the problem of verifying weak probabilistic noninterference using formal methods, in particular algorithmic verification. The behavior of multi-threaded programs is modeled using probabilistic Kripke structures, and weak probabilistic noninterference is formalized in terms of these structures. A verification algorithm is then proposed to check weak probabilistic noninterference. The algorithm uses an abstraction technique to compute the quotient space of the program with respect to an equivalence relation called weak probabilistic bisimulation, and performs a simple check to decide whether the security property is satisfied. The progress made is demonstrated by a real-world case study. It is expected that the proposed approach constitutes a significant step towards more widely applicable secure information flow analysis.

Keywords: Confidentiality; secure information flow; noninterference; algorithmic verification; bisimulation


A. Motivation
In information security, a confidentiality policy prevents the unauthorized disclosure of information. Confidentiality policies are defined in terms of confidentiality mechanisms, which are approaches to enforce the policies [1]. Cryptography and access control are examples of confidentiality mechanisms, but they do not restrict the flow of information inside a program. For example, when an Android application is granted permission to access contacts, no cryptography or access control mechanism verifies that the application uses the contacts legitimately. This is where secure information flow comes to the rescue.
Secure information flow controls the way information flows throughout a program. Information flow properties are designed to prevent information from flowing to an unauthorized user, i.e., an attacker or low observer [2]. Typically, it is supposed that there are two security levels, high (H) and low (L), corresponding to higher and lower confidentiality for program variables, respectively. An information flow property is defined in such a way that it prevents data in H from flowing to L. More complex hierarchies of security levels can be defined via a security structure [3]. Information flow properties are of paramount significance for guaranteeing confidentiality of data. Because of this, it is desirable to establish an automatic and efficient verification approach for secure information flow.

B. Background
In most research on secure information flow, a security property specifying the confidentiality policy is formally defined and then a verification method is proposed to check the property. Noninterference [4] is a long-established information flow property, stipulating that high data may not interfere with low data. The absence of interference requires indistinguishability of program behavior as secret inputs are varied.
Probabilistic noninterference is a widely used security property for multi-threaded programs, proposed by Volpano and Smith [5] and extended by Sabelfeld and Sands [6]. It is a timing- and probability-sensitive property, defined over a simple imperative language with dynamic thread creation. Sabelfeld and Sands define a timing-sensitive partial probabilistic bisimulation to characterize indistinguishability of the executions of the program. The intuition is that low-equivalent states must produce executions that run in lock-step, affect the shared memory in the same way, and step to the states of the same equivalence class with the same probability [6].
Smith [7] shows that probabilistic bisimulation is too strict regarding time. To address this problem, Smith defines probabilistic noninterference in terms of weak probabilistic bisimulation, allowing probabilistic systems to be regarded as equivalent when they do not run at the same speed. The resulting property is called weak probabilistic noninterference. It requires low-equivalent states to produce executions that visit the same sequence of equivalence classes, although some executions may remain in a class longer than others.
Verifying secure information flow is mostly done via information flow type systems. A type system is a formal system of type inference rules for reasoning about properties of programming languages [8]. In information flow type systems, the property of interest is a property of secure information flow, e.g., probabilistic noninterference. Many information flow type systems have been proposed to enforce probabilistic noninterference. Sabelfeld and Sands [6] define a type system to verify probabilistic noninterference. Smith [9] proposes a new type system to enforce probabilistic noninterference for multi-threaded programs running under a uniform probabilistic scheduler. In [7], Smith applies weak probabilistic bisimulation to prove that the type system he proposed in [9] guarantees probabilistic noninterference.
Type systems are automated and compositional, but they are not extensible, as each new feature added to the programming language or each variation of the information flow property requires a redefinition of the type system and its soundness proof [10]. Consequently, algorithmic verification has been favored recently, which is the application of rigorous, mathematically sound, and fully automatic techniques to the analysis of systems. These techniques are more flexible than type systems, and give a precise and efficient mechanism to verify a variety of security properties, without the need to prove soundness repeatedly [11].
Algorithmic verification techniques have mostly been developed for trace properties, which describe single executions of programs. However, most security properties, including weak probabilistic noninterference, are 2-safety properties. 2-safety properties predicate over two executions of a program and, consequently, verification requires establishing relationships between two different executions [12]. For example, weak probabilistic noninterference is not a property of individual executions and hence not a trace property, because whether an execution is allowed by the property depends on whether another execution is also allowed. 2-safety properties are an important subset of relational properties, which describe multiple executions of one or more programs [13]. As most classical verification techniques are not adequate to reason about relational properties, many new techniques have recently been developed for secure information flow [12], [14]-[19], but none for weak probabilistic noninterference.

C. Foreground
In this paper, an algorithm is developed to verify weak probabilistic noninterference for multi-threaded programs running under an arbitrary scheduler. The program to be verified is modeled by a probabilistic state transition system, called a probabilistic Kripke structure. Weak probabilistic noninterference is formally defined in terms of the semantics of the probabilistic Kripke structure. In the proposed analysis, a program satisfies weak probabilistic noninterference if and only if all executions with low-equivalent initial states visit the same sequence of equivalence classes with respect to weak probabilistic bisimulation. The verification algorithm computes the quotient space, i.e., the set of all equivalence classes of the probabilistic Kripke structure, and does a simple check to decide the satisfaction of the security property. The quotient space is an abstraction of the concrete model of a program and allows enormous state-space reductions, possibly avoiding the state explosion problem. It is shown that the proposed verification algorithm runs in polynomial time. A case study is provided to show the feasibility of the verification algorithm. Fig. 1 gives an overview of the proposed approach.

D. Structure of the Paper
The paper starts with an informal overview of the approach in Section II. The program model assumed throughout the paper is presented in Section III. Weak probabilistic noninterference is defined in Section IV, using weak probabilistic bisimulation. The verification algorithm, its time complexity, and its application to a case study are addressed in Section V. Related work is discussed and compared with the proposed approach in Section VI. Finally, Section VII concludes the paper and discusses some future work.

II. OVERVIEW OF APPROACH
In this section, a tour of the proposed work is given. To build intuition for the proposed approach, the key idea is illustrated using an example.
For clarity, some informal definitions are discussed. Suppose an attacker has full knowledge of the source code of a multi-threaded program, can choose a scheduler for its execution, and can observe the program behavior under the chosen scheduler. By observing behavior, we mean that the attacker can see the values of public variables during the program execution. For example, she can print public values. If the attacker can infer information about secret (high) values of the program by observing public (low) values, the program is said to have a leak (or channel). Depending on the ability of the attacker, programs may have different leaks, e.g., explicit, implicit, or probabilistic leaks. Explicit leaks occur when a high value is assigned to a low variable, e.g., l:=h, assuming l is a low variable and h is a high variable. Implicit leaks occur because of the control structure of a program, e.g., if h=1 then l:=1 else l:=0. Probabilistic leaks occur as a result of the probabilistic behavior of the program. An example of this kind of leak is given in the following.
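To make the difference between explicit and implicit leaks concrete, the following minimal Python sketch models each example as a deterministic function from a high input `h` and a low input `l` to the final low value, and checks by brute force whether varying `h` changes what the attacker sees. The function names and the `leaks` helper are hypothetical illustrations; they are not part of the paper's formalism and do not cover probabilistic leaks.

```python
from itertools import product


def explicit_leak(h: int, l: int) -> int:
    """l := h  (the high value is copied into the low variable)."""
    return h


def implicit_leak(h: int, l: int) -> int:
    """if h = 1 then l := 1 else l := 0  (h decides which branch writes l)."""
    return 1 if h == 1 else 0


def no_leak(h: int, l: int) -> int:
    """l := l + 1  (the final low value never depends on h)."""
    return l + 1


def leaks(program, high_values, low_values) -> bool:
    """Return True if two runs with the same low input but different high
    inputs end with different low values, i.e. the attacker can tell the
    two runs apart by observing only the low variable."""
    for l in low_values:
        for h1, h2 in product(high_values, repeat=2):
            if program(h1, l) != program(h2, l):
                return True
    return False


if __name__ == "__main__":
    for prog in (explicit_leak, implicit_leak, no_leak):
        print(prog.__name__, leaks(prog, high_values=[0, 1], low_values=[0, 1]))
```

Both leaking programs are flagged even though only the first one assigns `h` to `l` directly, which is why information flow analysis has to consider control flow as well as assignments.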
Secure information flow to the rescue. Secure information flow analysis aims to detect and consequently avoid information leaks in a program. Usually, it involves three main steps: 1) the program behavior is defined using a program model; 2) the absence of leaks is defined using a security property; 3) a verification technique is developed to check whether the given program satisfies the property. In this paper, probabilistic Kripke structures (Definition 1) are used to model the program behavior. Weak probabilistic noninterference (Definition 8) of Smith [7] [...] should belong to the same equivalence class. If not, then the program does not satisfy weak probabilistic noninterference and is not secure.
Back to the example, the set of initial states is partitioned. As there is only one initial state, just one block is obtained. The execution $s_0 s_1 s_2 s_3 s_4^{\omega}$ is chosen as the witness and its states are renamed, so that they are not confused with the states of $K_{\mathcal{S}}$. Thus, the witness execution is $w_0 = s'_0 s'_1 s'_2 s'_3 {s'_4}^{\omega}$ (Fig. 3). The quotient space of the combination of [...]

A probability distribution $\mu$ over a set $X$ is a function $\mu : X \to [0, 1]$ such that $\sum_{x \in X} \mu(x) = 1$. The set of all probability distributions over $X$ is denoted by $\mathcal{D}(X)$. The support of a probability distribution $\mu \in \mathcal{D}(X)$ is the set of all elements with a positive probability, i.e., $\{x \in X \mid \mu(x) > 0\}$.

A partition of a finite set $S$ of states is a set [...]

[...] Here, atomic propositions are possible values of the low variables. $K$ is called finite if $S$ and $AP$ are finite. The set $I$ containing the states $s$ with positive initial probability is considered as the set of initial states. The set of successor distributions of a state $s$ is defined as [...]. The set of successor states of a state $s$ is defined as [...]

Executions in a PKS $K$ are alternating sequences of states that may arise by resolving both nondeterministic and probabilistic choices in $K$. [...]

A finite-memory scheduler denotes a scheduler that can be described by a deterministic finite automaton (DFA). Formally,

Definition 4 (Finite-memory scheduler): Let $K$ be a PKS with state space $S$. A finite-memory scheduler $\mathcal{S}$ for $K$ is a tuple [...] where [...] $de$ is a decision function that selects the next transition $de(q, s)$ for any mode $q \in Q$ and state $s$ of $K$, and $st$ is a function that selects a starting mode for state $s$ of $K$.
The behavior of a PKS $K$ under a finite-memory scheduler $\mathcal{S}$ is as follows. At the beginning, an initial state $s_0$ with positive initial probability is randomly chosen and the DFA $\mathcal{S}$ is initialized to the mode $q_0 = st(s_0) \in Q$. Assuming that $K$ is in state $s$ and the current mode of $\mathcal{S}$ is $q$, the next transition is given by the decision function, i.e., $de(q, s)$ [...]

Smith [7] states that if a secure program is run starting from two low-equivalent states, then the two executions must pass through the same sequence of equivalence classes. This is captured formally by the definition of weak probabilistic noninterference.

Definition 8 (Weak probabilistic noninterference): Given a finite-memory scheduler $\mathcal{S}$, a multi-threaded program MT satisfies weak probabilistic noninterference iff, for all executions $\pi, \pi'$ of $K_{\mathcal{S}}$, whenever the initial states $\pi[0]$ and $\pi'[0]$ are low-equivalent, then $\pi \approx_p \pi'$.

[...] The main steps of the verification algorithm are sketched in Algorithm 1. The algorithm takes a finite FPKS as input and returns secure if the FPKS satisfies weak probabilistic noninterference, and insecure if it does not. In the sequel, some steps of the algorithm are explained in more detail.
Taking a witness execution: As pointed out earlier, all executions of the input FPKS are infinite and hence form a cycle. To take a witness execution, a cycle detection algorithm based on depth-first search, called colored DFS, is used. The algorithm initially marks all states white. It proceeds by moving to successor states and coloring them, and terminates when a colored state (i.e., a state that was encountered before) is visited. The sequence of states remaining on the stack of the depth-first search forms the witness execution.

The equivalence classes w.r.t. $\approx_p$ are computed using an approach similar to that of Baier and Hermanns [21]. The general idea of the computation algorithm is to use an iterative partition refinement technique. It starts from a trivial initial partition, where each block of the partition contains all low-equivalent states (condition (1) of Definition 6). It then successively refines the given partition by splitting any block of the partition into sub-blocks, eventually resulting in the set of weak probabilistic bisimulation equivalence classes. A general schema of the iterative refinement is depicted in Fig. 4.
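The witness extraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model is given as a hypothetical dictionary `succ` mapping every state to a non-empty list of successor states, and the walk always follows the first listed successor. The name `witness_execution` and this input format are not taken from SCT.

```python
from typing import Dict, Hashable, List, Tuple

WHITE, COLORED = 0, 1


def witness_execution(
    succ: Dict[Hashable, List[Hashable]], start: Hashable
) -> Tuple[List[Hashable], int]:
    """Walk depth-first from `start`, coloring states as they are visited,
    and stop as soon as an already colored state is reached.

    Returns the stack of visited states and the index at which the cycle
    begins, so the witness is stack[:k] followed by stack[k:] repeated."""
    color = {state: WHITE for state in succ}   # all states start white
    stack: List[Hashable] = []
    state = start
    while color[state] == WHITE:
        color[state] = COLORED
        stack.append(state)
        if not succ[state]:
            raise ValueError(f"state {state!r} has no successor")
        state = succ[state][0]                 # assumption: take the first successor
    # The search never backtracks, so the colored state is on the stack.
    return stack, stack.index(state)


if __name__ == "__main__":
    model = {"s0": ["s1"], "s1": ["s2", "s3"], "s2": ["s2"], "s3": ["s3"]}
    print(witness_execution(model, "s0"))
```

Because every state is assumed to have a successor, the walk must eventually revisit a state, so the loop terminates after at most as many steps as there are states.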
The main idea for splitting each block $B$ of the partition is to isolate non-silent states $s, s' \in B$ with equal conditional probability of moving to some other block $C$, i.e., $P(s, C) = P(s', C)$, in order to ensure condition (2) of Definition 6. By condition (3) of the definition, each such isolated non-silent sub-block $A \subseteq B$ has to be enriched with those silent states of $B$ that produce execution fragments which remain inside $B$ and end up in $A$. Fig. 5 shows how $B$ is refined into two sub-blocks.
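The iterative refinement schema can be illustrated with a deliberately simplified Python sketch. It computes strong (not weak) probabilistic bisimulation classes of a fully probabilistic model: blocks are split until all states in a block have the same label and the same probability of moving into every block. It does not treat silent states or conditional probabilities, so it is not the algorithm of this paper or of Baier and Hermanns [21]. The names `bisimulation_classes`, `labels`, and `prob` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def bisimulation_classes(labels, prob):
    """labels: state -> hashable label; prob: state -> {successor: Fraction}.
    Returns the blocks of the coarsest stable partition as frozensets."""
    # Initial partition: states with the same label share a block id.
    ids = {}
    block_of = {s: ids.setdefault(labels[s], len(ids)) for s in labels}
    while True:
        # Signature of a state: its current block plus the probability
        # mass it sends into each block of the current partition.
        signature = {}
        for s in labels:
            mass = defaultdict(Fraction)
            for t, p in prob[s].items():
                mass[block_of[t]] += p
            signature[s] = (block_of[s], frozenset(mass.items()))
        ids = {}
        refined = {s: ids.setdefault(signature[s], len(ids)) for s in labels}
        # Refinement only splits blocks, so an unchanged block count
        # means the partition is stable.
        if len(set(refined.values())) == len(set(block_of.values())):
            break
        block_of = refined
    blocks = defaultdict(set)
    for s, b in block_of.items():
        blocks[b].add(s)
    return [frozenset(b) for b in blocks.values()]


if __name__ == "__main__":
    half = Fraction(1, 2)
    labels = {"a": "l=0", "b": "l=0", "c": "l=1", "d": "l=2"}
    prob = {
        "a": {"c": half, "d": half},
        "b": {"c": half, "d": half},
        "c": {"c": Fraction(1)},
        "d": {"d": Fraction(1)},
    }
    print(bisimulation_classes(labels, prob))
```

Exact fractions are used instead of floating-point numbers so that equal probabilities compare as equal when signatures are matched.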

B. Correctness of the Algorithm
Before proving the correctness of the proposed algorithm, a lemma is presented, which will be used in the correctness proof. This lemma asserts that $\approx_p$ can be lifted from states to executions and vice versa.

Lemma 1. Weak probabilistic bisimilar states have weak probabilistic bisimilar executions and vice versa: [...]

[...] yields an execution fragment that fulfills the desired conditions. [...] assume that $j$ is minimal, i.e., [...]

C. Complexity of the Algorithm
For computing the initial state blocks, the HashMap class of Java was used. The worst-case complexity of inserting a key-value pair into the hash map is $O(|AP|)$. Hence, the time complexity of computing the initial state blocks is [...]. Let $t$ be the number of transitions of $K_{\mathcal{S}}$. A witness execution can be extracted in time $O(t + |S|)$. Thus, the time complexity of extracting all witness executions is [...]. The quotient space w.r.t. $\approx_p$ can be constructed in time [...]
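The grouping of initial states into blocks can be sketched as follows. The paper reports using Java's HashMap; the Python dictionary below is a stand-in, and the function name `initial_state_blocks` and the representation of labels as sets of atomic propositions are assumptions made for illustration.

```python
from collections import defaultdict


def initial_state_blocks(initial_states, label):
    """Group initial states that carry the same set of atomic propositions.

    label: state -> iterable of atomic propositions (the low values).
    Hashing a label touches each of its propositions once, which is where
    a per-insertion cost proportional to |AP| comes from."""
    blocks = defaultdict(list)
    for state in initial_states:
        key = frozenset(label[state])   # hashable key for the hash map
        blocks[key].append(state)
    return list(blocks.values())


if __name__ == "__main__":
    label = {"s0": {"l=0"}, "s1": {"l=0"}, "s2": {"l=1"}}
    print(initial_state_blocks(["s0", "s1", "s2"], label))
```

Each block produced this way contains low-equivalent initial states, so one witness execution per block is enough for the subsequent check.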

D. Case Study
The algorithm proposed in this paper has been implemented as part of SCT (Security Certifying Tool), which has been developed in Java to verify secure information flow for multi-threaded programs. SCT takes a probabilistic Kripke structure as the model of the program and checks whether the program satisfies weak probabilistic noninterference. To our knowledge, no other algorithmic verification technique for weak probabilistic noninterference has been published, so it is not possible to compare the implementation to other algorithms.
As a case study, consider the dining cryptographers problem. The problem is borrowed from [11] to show how an attacker can deduce secret information through probabilistic leaks. David Chaum first proposed this problem in 1988 as an example of anonymity and identity hiding [23]. In this problem, three cryptographers are sitting at a round table to have dinner at their favorite restaurant. The waiter informs them that the meal has been arranged to be paid for by one of the cryptographers or by their master. The cryptographers respect each other's right to stay anonymous, but would like to know whether the master is paying or not. So, they decide to take part in the following two-stage protocol:
• Stage 1: Each cryptographer tosses an unbiased coin and only informs the cryptographer on the right of the outcome. The situation is illustrated in Fig. 8. In this figure, c1, c2, and c3 are the identities of cryptographer 1, cryptographer 2, and cryptographer 3, respectively.
• Stage 2: Each cryptographer publicly announces whether the two coins that she can see are the same ("agree") or different ("disagree"). However, if she actually paid for the dinner, then she lies, i.e., she announces "disagree" when the coins are the same, and "agree" when they are different.
With three cryptographers, an odd number of "agree"s implies that none of the cryptographers paid (the master paid), while an even number implies that one of the cryptographers paid. David Chaum names this protocol the Dining Cryptographers network, or DC-net. DC-net is secure, since it does not leak the identity of the paying cryptographer (in case one of the cryptographers made the arrangement to pay for the meal). Following Ngo [11], a slight change is made so that the protocol leaks information: the coins are biased, i.e., each coin comes up heads with probability 0.6 and tails with probability 0.4.

To model the case study, PRISM has been used. PRISM is a tool for formal modeling and analysis of probabilistic systems [24]. PRISM describes models using the PRISM language, a simple, state-based language with a guarded command notation. The program is implemented in PRISM and its model is built. Then, the explicit-state model is exported, containing the set of reachable states and their labels, along with the transition matrix. Finally, the model is given to SCT to compute the quotient space and check the security property. SCT was run on a PC with a Core i3 2.53 GHz CPU and 6 GB RAM.
Without loss of generality, suppose one of the cryptographers has made the arrangement for the meal, and another one is the attacker, i.e., the one who tries to find out the payer's identity. The FPKS $K_{\mathcal{S}}$ of the model built by PRISM has 285 states and 582 transitions. $K_{\mathcal{S}}$ has just 3 initial states.
All initial states have the same label value of [...]

To see how an attacker can infer the identity of the payer, consider an example scenario where cryptographer 2 is the attacker and aims to find out which one of the cryptographers 1 or 3 is the payer. Suppose cryptographer 2 and cryptographer 3 both toss tails. Cryptographer 2 can observe the coin of cryptographer 3, and thus announces "agree". Assume cryptographer 2 observes that cryptographer 1 announces "agree" and cryptographer 3 announces "disagree" for the values of the coins. Two situations corresponding to this case are shown in Fig. 9 and the executions of these situations are outlined in Fig. 10.

In Fig. 10, each state is represented as a 10-tuple listing the current values of the variables (pay, agree1, agree2, agree3, coin1, s1, coin2, s2, coin3, s3) and is labeled with the current value of parity: 0 for an even number of "agree"s, and 1 for an odd number of "agree"s. The variable pay contains the number of the cryptographer who is actually the payer. Variables agree1, agree2, and agree3 contain the announcements of cryptographers 1, 2, and 3, respectively: 0 for "disagree", and 1 for "agree". Variables coin1, coin2, and coin3 contain the coin values for cryptographers 1, 2, and 3, respectively: 1 for heads, and 2 for tails. Finally, variables s1, s2, and s3 contain the status values for the three cryptographers: 0 for "not done", and 1 for "done".

Execution 1 occurs when cryptographer 1 is the payer and tosses heads. Therefore, cryptographer 1 announces "agree" and cryptographer 3 announces "disagree". Execution 2 occurs when cryptographer 3 is the payer and cryptographer 1 tosses tails. Thus, cryptographer 3 announces "disagree" and cryptographer 1 announces "agree". As seen in Fig. 10, the probability of execution 1 (i.e., cryptographer 1 tossing heads) is higher than the probability of execution 2 (i.e., cryptographer 1 tossing tails) and hence the attacker can deduce that cryptographer 1 is more likely to be the payer. This is a probabilistic leak.
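The attacker's reasoning in this scenario can be reproduced with a short Python sketch that enumerates the hidden coin of cryptographer 1. It is a hand-written illustration, not the PRISM model or SCT. It assumes that cryptographer i compares coin i with the coin of the next cryptographer (as cryptographer 2 does with coin 3 in the scenario), that the attacker's prior over the two suspects is uniform, and it uses the hypothetical names `announcements` and `payer_posterior`.

```python
from fractions import Fraction

HEAD, TAIL = 1, 2   # coin encoding used in the case study


def announcements(coins, payer):
    """coins: (coin1, coin2, coin3); payer: 1, 2, 3, or None (master pays).
    Returns (agree1, agree2, agree3) with 1 = "agree", 0 = "disagree"."""
    result = []
    for i in range(3):
        same = coins[i] == coins[(i + 1) % 3]   # the two coins cryptographer i+1 sees
        lies = payer == i + 1                   # only the payer lies
        result.append(int(same != lies))
    return tuple(result)


def payer_posterior(observed, attacker_coins, p_head, suspects=(1, 3)):
    """Posterior over `suspects` for attacker cryptographer 2, who knows
    coin2, coin3, and all three announcements (uniform prior assumed).
    The observation must be possible for at least one suspect."""
    coin2, coin3 = attacker_coins
    weight = {}
    for payer in suspects:
        total = Fraction(0)
        for coin1 in (HEAD, TAIL):              # the coin the attacker cannot see
            if announcements((coin1, coin2, coin3), payer) == observed:
                total += p_head if coin1 == HEAD else 1 - p_head
        weight[payer] = total
    norm = sum(weight.values())
    return {payer: w / norm for payer, w in weight.items()}


if __name__ == "__main__":
    seen = (1, 1, 0)   # agree1 = 1, agree2 = 1, agree3 = 0, as in the scenario
    print(payer_posterior(seen, (TAIL, TAIL), Fraction(3, 5)))   # biased coins
    print(payer_posterior(seen, (TAIL, TAIL), Fraction(1, 2)))   # fair coins
```

With the biased coins the two suspects receive different posterior probabilities, whereas with fair coins they are equally likely, which matches the claim that the unmodified DC-net does not leak the payer's identity.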

VI. RELATED WORK
In the following, some related approaches from the literature are discussed and the proposed approach is compared with them. Barthe et al. [10] propose the idea of self-composition for logical characterization of information flow properties. Self-composition reduces the problem of verifying an information flow property for a program P to a safety property for a program derived from P, by composing P with a renaming of itself. Then, standard model checking and algorithmic verification techniques can be used to verify secure information flow. Terauchi and Aiken [14] introduce 2-safety properties, which can be refuted by observing two executions. They show that the termination-insensitive secure information flow problem is a 2-safety problem. They further generalize the idea of self-composition and show that it can be used to verify 2-safety properties. Huisman et al. [15] use the idea of self-composition to characterize secure information flow in the temporal logics CTL* and modal µ-calculus. They specify secure information flow using observational determinism, an information flow property proposed by Zdancevic and Myers [25] for concurrent programs. Van der Meyden and Zhang [16] employ a self-composition-like method to reason about noninterference properties and develop algorithmic verification techniques for these properties. They characterize the computational complexity of the developed verification techniques and discuss some possible heuristics for optimizing the verification.

Verification methods that use the idea of self-composition suffer from the state-space explosion problem, i.e., the space needed to store the states and transitions of the program exceeds the available memory. This occurs because in self-composition a program model is composed with a copy of itself. In the proposed algorithm, the program model is composed with only a small part of the model (the witness execution). Furthermore, security analysis is done on the abstract model (quotient space), not on the concrete model.

Ngo et al. [26] propose scheduler-specific probabilistic observational determinism as a property to specify secure information flow for probabilistic multi-threaded programs. They define the property based on two conditions. The first condition requires that all traces of each public variable starting in the same initial state are stuttering equivalent. A trace of an execution is a mapping of the states of the execution to the corresponding state labels. Two traces are stuttering equivalent if they become the same after removing repeating adjacent labels. The second condition requires that for all traces of an initial state $s_i$, there exists a trace of an initial state $s'_i$ low-equivalent to $s_i$ that is stuttering equivalent to each one of the traces of $s_i$, and that the probabilities of the traces are the same. Condition 2 of this property is closest in semantics to our definition of weak probabilistic noninterference. Of course, weak probabilistic noninterference requires weak probabilistic bisimulation between executions, which is different from stuttering equivalence. To verify condition 2 of their property, Ngo et al. build two FPKSs for each pair of initial states $s_i$ and $s'_i$. Then, they transform the FPKSs to stuttering-free ones and check equivalence of the probabilistic languages arising from executions of the two FPKSs using an off-the-shelf algorithm.
The time complexity of the algorithm is $O(n^3)$ for each pair of initial states $s_i$ and $s'_i$, where $n$ is the number of states of each FPKS. The deficiency of this verification algorithm is that it builds two copies of the program for each pair of initial states. If the input program has an enormous state space, the algorithm suffers from the state explosion problem.
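Stuttering equivalence of traces, as described above, has a compact illustration in Python. The sketch below handles finite traces only and uses the hypothetical helper names `destutter` and `stuttering_equivalent`; it illustrates the definition, not the verification procedure of Ngo et al.

```python
from itertools import groupby


def destutter(trace):
    """Collapse every run of identical adjacent labels into a single label."""
    return [label for label, _run in groupby(trace)]


def stuttering_equivalent(trace1, trace2):
    """Two finite traces are stuttering equivalent if they coincide after
    removing repeated adjacent labels."""
    return destutter(trace1) == destutter(trace2)


if __name__ == "__main__":
    print(stuttering_equivalent([0, 0, 1, 1, 1, 2], [0, 1, 2, 2]))
    print(stuttering_equivalent([0, 1, 0], [0, 1]))
```

Only adjacent repetitions are removed, so a label that reappears later in the trace, as in the second call, still distinguishes the two traces.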
A trending field in security verification is proof-based verification, in which mathematical logic is used to describe the program, specify the property of interest, and prove that the program satisfies the property. Hoare logic [27] is one of the most widely used logics for proof-based verification of software. Variants of Hoare logic have been proposed for verifying relational, and in particular, k-safety properties [28][29][30]. An advantage of these techniques is that they avoid the state-space explosion problem, because they do not check the whole state space of the program. Consequently, they are suitable for verifying programs with huge, and even infinite, state spaces. A disadvantage of these techniques is that they are semi-automatic. Although many of the proof steps are done mechanically, some steps need expert user intervention. This contrasts with algorithmic verification, which is fully automatic.

VII. CONCLUSIONS AND FUTURE WORK
In this paper, the problem of verifying weak probabilistic noninterference was discussed.Weak probabilistic noninterference is a notion of confidentiality for multi-threaded programs.The behavior of multi-threaded programs running under the control of a scheduler was modeled by probabilistic Kripke structures.Weak probabilistic noninterference was formalized in terms of executions of the probabilistic Kripke structure.Then, a verification algorithm was proposed to check the property.
As future work, we plan to use the proposed algorithm to verify other information flow properties.We believe the applicability of the algorithm can be extended and it can be used to verify many security properties, such as strong security [6] and probabilistic noninterference [6].In an earlier paper [31], we used a similar algorithm to verify observational determinism.
A disadvantage of the proposed verification algorithm is that it works on an explicit model of the program, which may be too large for real-world programs. This harms the scalability of the approach. To solve this problem, one can change the algorithm in such a way that it works on symbolic representations of the program model, such as binary decision diagrams.
We also aim to modify the algorithm to support compositional verification, thereby reducing conceptual complexity and making the analysis scale.

Fig. 5. Refinement of the block B into A and B\A.

Fig. 7. Relation between the witness execution $w_i$ and other executions of [...]

[...] $w_i$ do not belong to the same equivalence class and hence SCT correctly recognizes the model as insecure.
[...] same equivalence class, the verification algorithm returns insecure. This is what was expected.

III. PROGRAM MODEL
In this section, the program model assumed throughout the paper is introduced. Furthermore, some basic concepts concerning probability distributions, partitions, and equivalences are recalled.
Probabilistic Kripke structures are used to model operational semantics of probabilistic programs.Probabilistic Kripke structures are state transition systems that permit both probabilistic and nondeterministic choices.A state of a probabilistic Kripke structure indicates the current value of all low variables (shared memory of the multi-threaded program) together with the current value of the program counter that indicates the next program statement to be executed.
[...] $B_i$ are called blocks. An equivalence relation $R$ on $S$ partitions $S$ into the set of equivalence classes. The equivalence class of $s \in S$ w.r.t. $R$, denoted $[s]_R$, is defined as $[s]_R = \{s' \in S \mid (s, s') \in R\}$. The set of equivalence classes of $S$ w.r.t. $R$ is called the quotient space, denoted $S/R$.
[...] More precisely, a finite execution fragment of $K$ is a finite state sequence [...] infinite. It is assumed that the state space of the model of the multi-threaded program and the shared memory used by the threads are finite.

A multi-threaded program is secure when a variation of the values of the high variables does not influence the low-observable behavior of the program [6]. Thus, the low-observable behavior of the program should be indistinguishable as high variables are varied. Variation of the values of high variables is represented by the low-equivalence relation. Two states [...]

[...] uses the notion of weak probabilistic bisimulation to represent the indistinguishability of the low-observable behavior of the program. Weak probabilistic bisimulation abstracts from steps that remain inside the equivalence classes, i.e., it does not care which state within the equivalence class the system is in [21].

[...] denotes the conditional probability for a non-silent state $s$ to move to block $C$, under the condition that, being in $s$, the system does not make a move inside $[s]_R$.
[...] an FPKS, modeling the executions of the program MT under the scheduler $\mathcal{S}$. Now, $K_{\mathcal{S}}$ satisfies weak probabilistic noninterference if and only if [...] $[0]$ [...] $w_i$ w.r.t. $\approx_p$.

Algorithm 1. Verification of weak probabilistic noninterference

Computing the quotient space w.r.t. $\approx_p$: Equivalence classes w.r.t. $\approx_p$ [...]

[...] Algorithm 1 returns secure if and only if the input FPKS $K_{\mathcal{S}}$ satisfies weak probabilistic noninterference.
$w_i$ and [...] are weak probabilistic bisimilar if and only if their initial states, i.e., [...] $[0]$ [...]