A Game Theoretic Framework for E-Mail Detection and Forgery Analysis

In email forensics, email detection and forgery form an interdependent strategy-selection process with complex dynamics between the detector and the forger, who have conflicting objectives and influence each other's performance and decisions. This paper studies these dynamics from the perspective of game theory. We first analyze the basic structure of email and its header information, and then discuss email detection and forgery technologies. We propose a Detection-Forgery Game (DFG) model and classify the players' strategies by Operation Complexity (OC). In the DFG model, we regard the interactions between the detector and the forger as a two-player, non-cooperative, nonzero-sum, finite strategic game and formulate its Nash equilibrium. Using the model, we find the optimal detection and forgery strategies, which minimize cost and maximize reward. Finally, we perform empirical experiments to verify the effectiveness and feasibility of the model.
Keywords: email detection; email forgery; game theoretic model; Nash Equilibrium; the optimal strategy


I. INTRODUCTION
E-mail is ubiquitous in the contemporary commercial environment. Because of its convenience, low cost and rich content, it has become one of the most widely used applications for transmitting information over the Internet. However, this widespread use has also made email a common tool and carrier for criminal activities. Meanwhile, forensic investigators increasingly take email as evidence in criminal cases. Therefore, the technical appraisal of email plays an increasingly important role in solving cases and providing evidence in court.
Verifying the authenticity of email evidence requires a scientific appraisal technology. However, because email is complicated, involving different protocols for sending and receiving and a variety of server software and clients, research on the technical appraisal of email is still scarce. Genwei Liao proposed identifying the authenticity of an email from the following aspects: the email header, server log files, the email sending environment, abnormal email content, and the logical relationships between emails [1]. M.T. Banday and Hong Guo et al. studied the working principle of email, discussed how the keywords commonly used in header fields are constructed, and applied this analysis to email forensics [2,3]. Based on email header information, Preeti and Surekha et al. provided algorithms to identify date, time and address spoofing [4,5]. Email authentication is challenging not only because emails can be flexibly composed, edited and deleted with offline or online applications, but also because many of their fields can be forged by hackers or malicious users. However, current research studies email authenticity identification only from the perspective of detection, not of forgery.
To identify authentic emails more reliably, detectors can consider which methods falsifiers might have used to fabricate them. Game-theoretic analysis is useful for analyzing, modeling and making decisions in interdependent and antagonistic relationships, because in a game-theoretic model a player can predict the other players' actions and strategies. Recently, many game-theoretic frameworks for forensics and anti-forensics have been proposed. Mauro Barni et al. used a game model to find the optimum forensic and counter-forensic strategies in source identification with training data [6]; the model is used to derive the Nash equilibrium and the conditions under which the false negative error probability tends to zero. Matthew C. Stamm et al. developed a theoretical understanding of the interactions between a falsifier who uses anti-forensics and a forensic investigator, and determined the optimum decision rule by predicting the falsifier's best anti-forensic strength [7]. Xiangui Kang et al. defined a VIF (Video Inter-frame Forgery) game to analyze the interplay between the forensic investigator and the falsifier, and used the Nash equilibrium strategy to determine the false alarm rate at which the detection rate can reach 100% [8]. These studies demonstrate the effectiveness of game models in finding optimal strategies. However, previous studies focus almost exclusively on multimedia forensics, and similar research on email forensics is still lacking. In particular, no existing work discusses how a forensic investigator can predict the forger's actions by introducing game theory into email forensics.
www.ijacsa.thesai.org
Our work differs from the state-of-the-art studies in several aspects. Firstly, we make a new classification and cost-benefit quantification of the existing email forgery methods and authenticity appraisal technologies, since predicting someone's actions first requires a basic understanding of the options available to them.
The strategy classification and cost-benefit quantification help detectors understand the factors that affect a forger's decisions and actions. Secondly, we consider email forensics and game theory simultaneously and, for the first time, propose a DFG model to analyze the dynamics between email detection and forgery. The DFG model aims to help the detector and the forger find the trade-offs that depend on each other's actions. Thirdly, we propose an algorithm to solve for the Nash equilibrium of the DFG model, from which each player's optimal strategy, with the maximum benefits and minimum risks, can be found.
The rest of the paper is organized as follows. In Section II, we study email header fields and information, and discuss email forensic and forgery technologies. In Section III, we give a formal definition of the DFG model, classify the strategies and quantify the costs and benefits of the detector and the forger, and introduce an optimal strategy selection algorithm. The experimental work is discussed in Section IV. The conclusion and future work are discussed in Section V.

II. E-MAIL DETECTION AND FORGERY ANALYSIS
Electronic mail, often called email, is a method of exchanging digital messages from an author to one or more recipients. The rules for email are defined in RFCs (Requests for Comments), a numbered series of memoranda issued by the IETF (Internet Engineering Task Force) [9].

A. E-mail Header Analysis
An Internet email message consists of two major sections: the header fields and the body. The header is divided into several fields, each of which has a name and a value. The header contains sender and recipient information, date and time information, mail server information, transfer information and other relevant information, and therefore plays an essential role in establishing the authenticity of an email. The basic header fields defined in the RFCs are shown in Table I [3].
In addition to the basic fields, there are some non-standard, custom fields generated by different mail clients, whose names begin with X-, such as 'X-Sender', 'X-Mailer', 'X-SMAIL-MID', 'X-Received', 'X-Originating-IP' and so on. Besides these custom fields, some fields are generated by the security technologies used by the mail server, such as 'DKIM-Signature', 'Received-SPF' and 'Sender-ID'. All these header fields are of great importance for appraising email authenticity.

TABLE I. BASIC HEADER FIELDS
Field | Description
From | The email address, and optionally the name, of the sender.
To | The email address, and optionally the name, of the message recipient.
Date | The local time and date when the message was written.
Subject | A brief summary of the topic of the message.
Message-ID | An automatically generated field that uniquely identifies the message.
Reply-To | The address that should be used to reply to the message.
Received | Tracking information generated by mail servers that have previously handled the message, in reverse order.
Content-Type | The type of the message content.
Content-Transfer-Encoding | The transfer encoding of the message content.
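The fields in Table I can also be read programmatically. The sketch below is a minimal illustration using Python's standard `email` package; the raw message `RAW` (with reserved example.net/example.org addresses) and the helper `header_summary` are hypothetical and not part of the authors' method. `Received` is collected with `get_all`, because a message normally carries one such field per relay.

```python
from email import message_from_string

# Hypothetical message; addresses use reserved example domains.
RAW = """\
Received: from mx2.example.net by mx1.example.org; Tue, 02 Jan 2024 10:15:03 +0800
Received: from client.example.net by mx2.example.net; Tue, 02 Jan 2024 10:15:01 +0800
From: Alice <alice@example.net>
To: Bob <bob@example.org>
Date: Tue, 02 Jan 2024 10:14:58 +0800
Subject: Order confirmation
Message-ID: <20240102101458.1234@example.net>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

Please ship the goods.
"""

# Single-valued basic fields from Table I.
BASIC_FIELDS = ["From", "To", "Date", "Subject", "Message-ID", "Reply-To",
                "Content-Type", "Content-Transfer-Encoding"]


def header_summary(raw: str) -> dict:
    """Map each basic field to its value (None if the field is absent)."""
    msg = message_from_string(raw)
    summary = {name: msg.get(name) for name in BASIC_FIELDS}
    # Received appears once per relay, newest hop first, so keep every copy.
    summary["Received"] = msg.get_all("Received", [])
    return summary


if __name__ == "__main__":
    for name, value in header_summary(RAW).items():
        print(f"{name}: {value}")
```

In this example 'Reply-To' is absent, so its value is None, while both relay hops are kept in order.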

B. E-mail Detection and Forgery Analysis
Email spoofing is one of the biggest threats to email security, and its main forms are date and time spoofing, address spoofing and content spoofing. Generally, an email may need to be appraised under three conditions: firstly, the sender or recipient denies having sent or received the email; secondly, the sender and recipient dispute the email's date and time; thirdly, the litigants cannot agree on the email's content. Fig. 1 shows the header of an email whose sender address is disputed.
Fig. 1. An email's header: two different senders, 543954686@qq.com and cqydyt2009@163.com, appear in the header, while the sender on the email envelope is 543954686@qq.com.
To find the real sender, we can analyze the header information. Many fields contain sender information, such as 'X-Sender', 'Authentication-Results', 'From', 'Message-id' and 'Sender'. Among these fields, only the 'From' field refers to 543954686@qq.com, and the 'From' field is written by the author. Meanwhile, four fields refer to cqydyt2009@163.com, notably 'Message-id', which is generated automatically and is not easy to change. After this multi-field correlation analysis of the sender, we can conclude that the sender 543954686@qq.com is forged and that the real sender is cqydyt2009@163.com.
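The multi-field correlation analysis above can be approximated by a simple vote over the sender-related fields. This is a minimal sketch under stated assumptions: the message uses hypothetical example domains, the helpers `sender_domain_votes` and `from_disagrees_with_majority` are illustrative names, and the address regular expression is deliberately crude. Majority voting over domains is a simplification chosen here, not the authors' exact procedure; a real appraisal would also weigh how hard each field is to change (an automatically generated 'Message-ID' versus an author-written 'From').

```python
import re
from collections import Counter
from email import message_from_string

# Sender-related fields named in the text.
SENDER_FIELDS = ["From", "Sender", "X-Sender", "Authentication-Results", "Message-ID"]

# Crude address pattern (assumption); group 1 captures the domain.
ADDR_RE = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")

# Hypothetical header mirroring the situation in Fig. 1.
RAW = """\
From: Alice <alice@example.com>
Sender: bob@example.net
X-Sender: bob@example.net
Authentication-Results: mx.example.org; spf=pass smtp.mailfrom=bob@example.net
Message-ID: <20240102101458.1234@example.net>
Subject: Order

Body.
"""


def sender_domain_votes(raw):
    """Return (domain -> number of fields naming it, field -> domain)."""
    msg = message_from_string(raw)
    votes, evidence = Counter(), {}
    for field in SENDER_FIELDS:
        value = msg.get(field)
        match = ADDR_RE.search(value) if value else None
        if match:
            domain = match.group(1).lower()
            votes[domain] += 1
            evidence[field] = domain
    return votes, evidence


def from_disagrees_with_majority(raw):
    """True if the author-written From domain differs from the majority domain."""
    votes, evidence = sender_domain_votes(raw)
    if not votes or "From" not in evidence:
        return False
    majority_domain, _ = votes.most_common(1)[0]
    return evidence["From"] != majority_domain


if __name__ == "__main__":
    print(sender_domain_votes(RAW))
    print(from_disagrees_with_majority(RAW))
```

As in the Fig. 1 case, one field points to one domain and four fields point to another, so the 'From' value is flagged.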
In fact, there are many email forgery and detection methods, and the example above represents only one situation. For example, modifying the system properties is the most convenient way to falsify an email: the email date can be falsified by changing the system time. The Simple Mail Transfer Protocol (SMTP) is the email transfer protocol, and a forger can log in to an SMTP server with Telnet and issue commands that tamper with the email address. Off-the-shelf software and websites can also be used to forge emails. The most complex method is to steal someone's email password and impersonate that person when sending emails, but obtaining others' passwords is not easy because of encryption software and algorithms. Most people fabricate emails with a certain purpose; some do it as a joke, but more often the purpose is profit.
To protect people's interests against email forgery, various detection strategies are needed. Viewing the email header information is the simplest way to examine an email. Multi-field correlation analysis compares several fields that carry the same piece of information. For example, the 'Received', 'Date', 'Message-ID' and 'Boundary' fields can all include the email's date and time, and we can appraise the email by comparing the times recorded in these fields.
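One such time comparison can be sketched as follows: the 'Date' header, which the sender's client sets, is compared with the timestamp of the earliest 'Received' hop, which a relay server adds. This is a heuristic under stated assumptions: the five-minute tolerance, the helper names and the sample message are all hypothetical, and a legitimately wrong client clock would also be flagged.

```python
from datetime import timedelta
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical message with consistent times.
RAW = """\
Received: from mx2.example.net by mx1.example.org; Tue, 02 Jan 2024 10:15:03 +0800
Received: from client.example.net by mx2.example.net; Tue, 02 Jan 2024 10:15:01 +0800
From: alice@example.net
Date: Tue, 02 Jan 2024 10:14:58 +0800
Subject: Order

Body.
"""


def received_times(msg):
    """Timestamps of all Received hops (by convention the date follows the last ';')."""
    times = []
    for hop in msg.get_all("Received", []):
        if ";" in hop:
            times.append(parsedate_to_datetime(hop.rsplit(";", 1)[1].strip()))
    return times


def date_skew(raw):
    """Date header minus the earliest Received timestamp, or None if not comparable."""
    msg = message_from_string(raw)
    hops = received_times(msg)
    if msg["Date"] is None or not hops:
        return None
    return parsedate_to_datetime(msg["Date"]) - min(hops)


def date_is_consistent(raw, max_skew=timedelta(minutes=5)):
    """Heuristic: the Date header should lie close to the first relay's timestamp."""
    skew = date_skew(raw)
    return skew is None or abs(skew) <= max_skew


if __name__ == "__main__":
    print(date_skew(RAW), date_is_consistent(RAW))
```

A Date backdated by a day, as produced by changing the system clock, falls far outside the tolerance and is reported as inconsistent.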
We can likewise use the sender-related fields to identify the true address. Making use of external resources means using off-the-shelf tools such as 'nslookup' to analyze IP addresses and DNS records, or other information such as login records and server files, to appraise the email. Multi-email correlation analysis means judging whether emails are authentic by analyzing the logical relationships among them and comparing their clients, writing habits, IP addresses, sender addresses and so on.
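Part of the multi-email correlation analysis, comparing the client and originating IP across emails attributed to the same sender, can be sketched as an outlier check. The helper `field_outliers`, the message builder `make` and the mailer names and IP addresses are hypothetical; treating the most common value as the sender's normal one is an assumption of this sketch, not a rule from the text.

```python
from collections import Counter
from email import message_from_string


def field_outliers(raw_messages, field):
    """Indices of messages whose `field` differs from the most common value."""
    values = [message_from_string(raw).get(field) for raw in raw_messages]
    if not values:
        return []
    majority, _ = Counter(values).most_common(1)[0]
    return [i for i, value in enumerate(values) if value != majority]


def make(mailer, ip):
    # Hypothetical minimal message used only for this illustration.
    return f"From: alice@example.net\nX-Mailer: {mailer}\nX-Originating-IP: {ip}\n\nBody.\n"


# Three emails from the sender's usual client, one from a different tool.
history = [make("ClientA 1.0", "198.51.100.23") for _ in range(3)]
history.append(make("ToolB 2.3", "203.0.113.99"))

if __name__ == "__main__":
    print(field_outliers(history, "X-Mailer"))
    print(field_outliers(history, "X-Originating-IP"))
```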
Since the methods are so varied, how can the detector know which detection strategy is most effective, and how can the forger know which forgery strategy brings him the maximum benefit at the minimum risk?

III. DETECTION-FORGERY GAME MODEL
Game theory is a study of strategic decision making.Specifically, it is "the study of mathematical models of conflict and cooperation between intelligent rational decision-makers".It attempts to determine mathematically and logically the actions that "players" should take to secure the best outcomes for themselves in a wide array of "games" [10].

A. Detection-Forgery Game Model Definition
A game-theoretic model includes three basic elements: players, strategy sets and payoff functions. The strategic form of a detection-forgery game is a 3-tuple DFG = (N, S, U) [11]:
$N = \{D, F\}$ is the set of players. Players are the decision-makers who choose actions and strategies to maximize their own interests; in this game model, the players are the detector $D$ and the forger $F$.
$S = (S_D, S_F)$ is the set of the players' strategies, where $S_i$ is the strategy set of player $i \in N$. Here we define the strategy sets as $S_D = \{d_1, d_2, \dots, d_m\}$ and $S_F = \{f_1, f_2, \dots, f_n\}$.
$U = (U_D, U_F)$ is the set of the players' payoff functions, which reflect the gain or utility each player obtains from the game. We denote the detector's payoff by $U_D$ and the forger's payoff by $U_F$.
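Definition 1: The Nash equilibrium (NE) is a solution concept for non-cooperative games: at an NE, no player can increase its payoff by unilaterally changing its own strategy. In DFG = (N, S, U), with detector strategies $d_i \in S_D$ and forger strategies $f_j \in S_F$, the strategy pair $(d^*, f^*)$ is a Nash equilibrium if and only if
$$U_D(d^*, f^*) \ge U_D(d_i, f^*) \ \ \forall d_i \in S_D \quad \text{and} \quad U_F(d^*, f^*) \ge U_F(d^*, f_j) \ \ \forall f_j \in S_F.$$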
In a complete-information game model, Definition 1 can be used to find all possible pure-strategy Nash equilibria. In the DFG model, $U_D(d_i, f_j)$ and $U_F(d_i, f_j)$ denote the payoffs of the detector and the forger when the detector uses strategy $d_i$ to examine an email forged with strategy $f_j$. Fig. 2 shows the corresponding strategic game, where each row represents a detector strategy, each column represents a forger strategy, and the entries of the matrix are the associated payoffs of the two players.
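Applying Definition 1 to a finite payoff matrix amounts to checking every cell. The sketch below is a direct transcription of that check; the function name and the small payoff matrices in the usage are hypothetical placeholders, not values from this paper.

```python
def pure_nash_equilibria(U_D, U_F):
    """Return all (i, j) that satisfy Definition 1.

    U_D[i][j] and U_F[i][j] are the detector's and forger's payoffs when the
    detector plays strategy i (row) and the forger plays strategy j (column).
    """
    m, n = len(U_D), len(U_D[0])
    equilibria = []
    for i in range(m):
        for j in range(n):
            # The detector cannot gain by switching rows against column j.
            detector_best = all(U_D[i][j] >= U_D[k][j] for k in range(m))
            # The forger cannot gain by switching columns against row i.
            forger_best = all(U_F[i][j] >= U_F[i][l] for l in range(n))
            if detector_best and forger_best:
                equilibria.append((i, j))
    return equilibria


if __name__ == "__main__":
    # Placeholder payoffs for illustration only.
    print(pure_nash_equilibria([[3, 0], [5, 1]], [[3, 5], [0, 1]]))
```

Some games, such as one where the forger always prefers the column the detector is not guarding, have no pure-strategy equilibrium at all; in that case the players mix over strategies, which is what the equilibrium reported in Section IV does.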

Fig. 2. The DFG payoff matrix
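For reference, the layout described for Fig. 2 can be written out explicitly. Here $d_1, \dots, d_m$ label the detector's strategies (rows) and $f_1, \dots, f_n$ the forger's strategies (columns); these labels are introduced for illustration, and each cell holds the pair of payoffs $U_D$ (detector) and $U_F$ (forger).

```latex
\begin{array}{c|cccc}
       & f_1 & f_2 & \cdots & f_n \\ \hline
d_1    & (U_D(d_1,f_1),\, U_F(d_1,f_1)) & (U_D(d_1,f_2),\, U_F(d_1,f_2)) & \cdots & (U_D(d_1,f_n),\, U_F(d_1,f_n)) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
d_m    & (U_D(d_m,f_1),\, U_F(d_m,f_1)) & (U_D(d_m,f_2),\, U_F(d_m,f_2)) & \cdots & (U_D(d_m,f_n),\, U_F(d_m,f_n))
\end{array}
```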

B. The Classification of Strategies
In the DFG model, the players' strategy sets are a necessary component. In this paper, we mainly discuss the forger's and the detector's strategies based on the email header, and classify them according to Operation Complexity (OC). Richard E. Overill [12] used Operation Complexity (OC) to measure the complexity of both the cognitive and the computational components of a process: the more complex a process is, the less likely it is to occur accidentally, unintentionally or spontaneously. Similarly, we use operation complexity to measure the complexity, or difficulty, of a detection or forgery strategy. Generally, the more complex a strategy is, the higher its cost, which can be evaluated by the amount of extra resources or the number of steps the players need. For any detection or forgery strategy i, the operational complexity is given by
$$OC_i = KLM_i \times 10^{6} + R_i \qquad (1)$$
where $KLM_i$ is specified by the GOMS-KLM model for measuring human involvement in the operational process [13], and $R_i$ is the size of the files needed to send an email. The basic unit of the GOMS-KLM characterization of cognitive information processing is taken to be the mouse button press or release; similarly, the basic unit of information processing used to characterize the resource is the byte. In addition, the cognitive component is scaled by the ratio between the processing rates of the computer and the human, typically $\approx 10^6$. Table II shows the KLM operators and their normal values [12], and Table III shows an example of the frequent KLM actions and values for modifying the system time to send a false email; the total value is 62.6. Since this strategy needs no extra resources except an email client (such as Foxmail 7.2) or a login to an email website, R is 15,624,827, so OC = 62.6 × 10^6 + 15,624,827 = 78,224,827. To obtain a clearer strategy classification, we divide the operation complexity into three relative levels:
L1: The cost is very small and $OC < 10^9$. For example, modifying the system time needs only an email client, and the operation takes only simple steps.
L2: The operation needs some time and resources, and $10^9 \le OC < 10^{10}$. For example, using Telnet commands to falsify the email address needs few resources, but the operational steps are complex and time-consuming.
L3: The operation needs many more resources, and $10^{10} \le OC < 10^{11}$. For example, stealing another person's password not only needs many resources but also involves more complex and time-consuming operational steps.
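The OC computation and the three-level classification can be sketched in a few lines. The formula used here is inferred from the worked example in the text (KLM total 62.6, R = 15,624,827 bytes, OC = 78,224,827), and the function names are hypothetical; raising an error above $10^{11}$ is a choice of this sketch, since the text defines no fourth level.

```python
HUMAN_TO_MACHINE_RATE = 10**6  # scaling factor from the text (typically about 10^6)


def operation_complexity(klm_total, resource_bytes, scale=HUMAN_TO_MACHINE_RATE):
    """OC = KLM * scale + R, rounded to a whole number of basic units."""
    return round(klm_total * scale) + resource_bytes


def oc_level(oc):
    """Map an OC value to the relative levels L1-L3 defined in the text."""
    if oc < 10**9:
        return "L1"
    if oc < 10**10:
        return "L2"
    if oc < 10**11:
        return "L3"
    raise ValueError("OC lies outside the three levels defined in the text")


if __name__ == "__main__":
    # Worked example from the text: modifying the system time.
    oc = operation_complexity(62.6, 15_624_827)
    print(oc, oc_level(oc))
```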
Based on the email detection and forgery analysis and the three operation complexity levels defined above, the taxonomies of email detection and forgery strategies by operation complexity are shown in Table IV and Table V.

C. Cost-Benefit Quantification
To make the payoff functions precise and realistic, we need to quantify the costs, risks and benefits. The relevant cost factors are defined as follows [14]:
Definition 2: Detect Cost (DC) characterizes the amount of resources needed to implement a detection action, such as hardware and software resources, expertise, time and so on.
Definition 5: Forge Cost (FC) characterizes the amount of resources needed to implement a forgery action, such as hardware and software resources, expertise, time and so on.
Definition 6: Forge Damage (FD) characterizes the damage or legal penalty inflicted on the forger by the detector if the detector successfully identifies the forged email (expressed as a negative value).
Definition 7: Forge Benefit (FB) characterizes the benefit the forger obtains if the forgery escapes detection.
Definition 8: Detection Rate P indicates the probability that a detection method successfully identifies a forged email as a forgery. Suppose an email E may have been manipulated using an editing operation $m(\cdot)$. The null hypothesis $H_1$ is that E is unaltered and authentic, and the alternative hypothesis $H_2$ is that E is a manipulated version of another email $E_1$, i.e. E is forged:
$$H_1: E \text{ is authentic}, \qquad H_2: E = m(E_1). \qquad (2)$$
We then run experiments on a large number of emails, both forged and authentic, and define the detection rate P as
$$P = \Pr\left(\hat{H} = H_2 \mid E = m(E_1)\right), \qquad (3)$$
where $\hat{H}$ is the detector's decision, so that P is estimated as the fraction of actually forged emails that the detector appraises as forged. From (4) and (5), we can see that to maximize their rewards, the detector needs to maximize the detection rate P, while the forger needs to minimize it. However, the detection rate depends not only on the cost of detection but also on the cost of forgery: the higher the detection cost, the higher P, and the higher the forgery cost, the lower P. We therefore need to find an optimal P that balances the rewards of both the forger and the detector.
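The empirical estimate of P and one possible shape of the rewards can be sketched as follows. Because the exact form of (4) and (5) is not given here, the reward function below is an ASSUMED expected-value form built from Definitions 2 to 8: each player's benefit is weighted by its probability of success, its damage by its probability of failure, and its own cost is subtracted. The helper names and sample numbers are hypothetical.

```python
from dataclasses import dataclass


def detection_rate(judged_forged, actually_forged):
    """Empirical P = Pr(judged forged | actually forged) over a labelled test set."""
    hits = [judged for judged, forged in zip(judged_forged, actually_forged) if forged]
    if not hits:
        raise ValueError("the test set must contain at least one forged email")
    return sum(hits) / len(hits)


@dataclass
class CostFactors:
    DC: float  # detect cost
    DD: float  # detect damage (negative value)
    DB: float  # detect benefit
    FC: float  # forge cost
    FD: float  # forge damage (negative value)
    FB: float  # forge benefit


def rewards(P, c):
    """ASSUMED payoff form for illustration; not the paper's equations (4) and (5)."""
    detector = P * c.DB + (1 - P) * c.DD - c.DC
    forger = (1 - P) * c.FB + P * c.FD - c.FC
    return detector, forger


if __name__ == "__main__":
    P = detection_rate([True, False, True, True], [True, True, True, False])
    print(P, rewards(P, CostFactors(DC=10, DD=-20, DB=40, FC=5, FD=-30, FB=50)))
```

Under this assumed form, the detector's reward rises and the forger's falls as P grows, which matches the qualitative statement in Definition 8's discussion.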

D. Optimal Strategy Selection
Through the DFG model, the detection strategy with the highest detection probability and reward, and the forgery strategy with the highest probability of success and reward, can be selected from the candidate sets. The optimal strategy selection process is as follows:
Input: the detection and forgery strategy sets
Output: the optimal strategies
Algorithm:
1) Construct the detector's strategy set $S_D$;
2) Construct the forger's strategy set $S_F$;
3) Initialize DFG = (N, S, U);
4) For all strategy pairs $(d_i, f_j)$, compute the detection rate P according to (2) and (3), and compute the rewards of the detector and the forger according to (4) and (5) …
… iv) Get the Nash equilibrium $(x^*, y^*)$;
7) Decide the optimal strategy.
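The equilibrium step of the algorithm (test the utility matrix for a saddle point; if there is none, solve by linear programming) can be sketched for the simplest case. This sketch assumes the utility matrix is treated as a matrix game in which the detector (row player) maximizes and the forger minimizes; the linear-programming step is replaced here by the closed-form solution for 2×2 games without a saddle point, so larger matrices would still need an LP solver. All function names are hypothetical.

```python
def saddle_points(A):
    """Entries that are the minimum of their row and the maximum of their column.

    A[i][j] is the utility to the row player (detector), who maximizes.
    """
    col_max = [max(row[j] for row in A) for j in range(len(A[0]))]
    return [(i, j) for i, row in enumerate(A) for j, a in enumerate(row)
            if a == min(row) and a == col_max[j]]


def solve_2x2(A):
    """Mixed equilibrium of a 2x2 zero-sum game that has no saddle point."""
    (a, b), (c, d) = A
    denom = a - b - c + d  # nonzero whenever there is no saddle point
    p = (d - c) / denom    # probability that the row player uses row 0
    q = (d - b) / denom    # probability that the column player uses column 0
    value = (a * d - b * c) / denom
    return (p, 1 - p), (q, 1 - q), value


def solve_matrix_game(A):
    """Return (row mix, column mix, game value)."""
    points = saddle_points(A)
    if points:  # a saddle point is a pure-strategy equilibrium
        i, j = points[0]
        x = tuple(1.0 if k == i else 0.0 for k in range(len(A)))
        y = tuple(1.0 if k == j else 0.0 for k in range(len(A[0])))
        return x, y, A[i][j]
    if len(A) == 2 and len(A[0]) == 2:
        return solve_2x2(A)
    raise NotImplementedError("use a linear-programming solver for larger games")


if __name__ == "__main__":
    print(solve_matrix_game([[4, 2], [3, 1]]))  # has a saddle point
    print(solve_matrix_game([[3, 1], [2, 4]]))  # needs mixed strategies
```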

IV. NUMERICAL ANALYSIS
To verify the effectiveness and feasibility of the DFG model, we apply it to an actual email authenticity identification case. Because email address spoofing is the most common case in civil disputes, we consider the following confrontation between the detector and the forger. A and B are business partners, and A used to order goods from B by email. One day, B received an email whose sender on the envelope was A, but A denied having sent the email and having received the goods. The envelopes of all the emails are shown in Fig. 3.
Fig. 3. The envelopes of the emails. All the emails show the same sender and recipient on the envelope, but not all of them are authentic: some are forged, and their sender is not the real one.
In this case, a forger has several strategies for fabricating an email with a false sender, and the detector also has various methods for identifying the real sender. In practice, the amount of money involved in email appraisal cases ranges from about ten thousand to ten million, and the influence of the appraisal result on the case ranges from 0% to 100%. In this case, the influence of the appraisal result is 100%, and the basic unit of cost and benefit is one thousand dollars. Since a more complex strategy has a higher cost, we set the cost of L1 strategies to 0-10, that of L2 strategies to 10-40, and that of L3 strategies to 40-100. Tables VI and VII show the strategies, benefits and costs of the email forger and detector. Using these tables, we fabricated a number of emails with the forgery strategy set and appraised them with the given detection strategies. We then applied the optimal strategy selection algorithm. According to (2), (3), (4) and (5), we obtained the detection-rate matrix P and the payoff matrices of the detector and the forger. The optimal strategy selection algorithm then yields the equilibrium $x^* = (0, 0.0083, 0, 0.9917)$, Z = 90.4545; $y^* = (0, 0, 1, 0)$, M = 90. Therefore, the detector plays the second detection strategy with probability 0.0083 and the fourth with probability 0.9917, and the third forgery strategy is the optimal strategy for the forger. The strategy pair $(x^*, y^*)$ is thus optimal for this example case. The result indicates that the forger is most likely to use off-the-shelf software to fabricate an email with a false sender, and that the forensic investigator obtains the maximum reward by performing multi-email correlation analysis in such cases.
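Given a mixed strategy such as the reported $x^*$, one can check which forger strategies are best responses by computing expected payoffs. The paper's payoff matrices are not reproduced in this text, so the sketch below uses a hypothetical 2×2 matrix; the helper names are also hypothetical.

```python
def expected_payoff(x, M, y):
    """x^T M y for mixed strategies x (rows) and y (columns)."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def column_best_responses(x, M, tol=1e-9):
    """Column indices that maximize the column player's expected payoff against x.

    M[i][j] is the column player's (forger's) payoff.
    """
    values = [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]
    best = max(values)
    return [j for j, v in enumerate(values) if v >= best - tol]


if __name__ == "__main__":
    x = (0.25, 0.75)             # hypothetical detector mix
    U_F_toy = [[1, 0], [0, 2]]   # hypothetical forger payoffs
    print(column_best_responses(x, U_F_toy))
    print(expected_payoff(x, [[4, 0], [0, 8]], (0, 1)))
```

At an equilibrium, every strategy the forger plays with positive probability must appear among these best responses, which is a quick sanity check on any reported $(x^*, y^*)$.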

V. CONCLUSION
In this paper, we have proposed a DFG game model for analyzing optimal detection and forgery strategy decisions in email authenticity identification. We regard the interactions between a forensic investigator and a forger as a two-player, non-cooperative, nonzero-sum game and formulate the DFG model. Based on the cost-benefit quantification of the strategies and the DFG model, we selected the optimal strategies from the given sets. Finally, we used a practical case study to verify the effectiveness of the DFG model. Nevertheless, the DFG model still has limitations; for example, the cost-benefit quantification and payoff functions adopted in this paper are not yet comprehensive. We will focus on improving them in future work.

Definition 3: Detect Damage (DD) characterizes the damage or risk incurred by the detector if he fails to identify the forged email or treats it as authentic (expressed as a negative value).
Definition 4: Detect Benefit (DB) characterizes the benefit, or extra reward, obtained by the detector if he successfully detects the forged email.

Here, $\hat{H} = H_2$ means that the email is appraised as forged, and $E = m(E_1)$ means that the email is actually forged. Based on the definitions above, the detailed rewards of the detector and the forger are defined in (4) and (5).
…and get the payoff matrices;
5) Set the utility matrix;
6) Compute the Nash equilibrium of the DFG:
a) Test the utility matrix for a saddle point;
b) If there is a saddle point, it is the Nash equilibrium;
c) If there is no saddle point, solve for the equilibrium by linear programming.

TABLE II

TABLE IV. DETECTION STRATEGY TAXONOMY

TABLE VI. SUMMARY OF FORGERY STRATEGY