Dynamic SEIZ in Online Social Networks: Epidemiological Modeling of Untrue Information

The epidemic-like propagation of untrue information in online social networks can cause considerable damage to society, and the speed at which false information spreads has attracted the attention of researchers. Epidemic models such as SI, SIS, and SIR have been developed to study the spread of infection on social media. This paper uses SEIZ, an enhanced epidemic model that classifies the overall population into four classes (i.e. Susceptible, Exposed, Infected, Skeptic). It uses the probabilities of transition from one state to another to distinguish misinformation from actual information. The model suffers from two limitations: the rate of change of population and the state transition probabilities are considered constant for the entire period of observation. In this paper, a dynamic SEIZ computes the rate of change of population at fixed intervals and periodically updates its predictions based on the new rates. Findings on Twitter data indicate that this model improves accuracy by giving early indications that information is untrue.
Keywords: Information diffusion; epidemic models; SEIZ; rumor detection


I. INTRODUCTION
The term "Social Networks" (SNs) was first coined by Barnes in 1954 [1]. Online social networking emerged in the form of e-mail and is now used in many applications. In the last few years, SNs, also referred to as social media, have spread at a phenomenal pace, reaching people in every walk of life. They allow people to share ideas and opinions and to seek information quickly and effectively. Social networking websites such as Facebook, Twitter and Reddit have proved useful for disaster management in situations like floods, storms, and earthquakes. These platforms have been equally effective in mitigating situations caused by terrorist attacks, shootouts, and similar events. All of them are open platforms on which people can share whatever they wish, with almost no censorship, filtering, or verification of the contents by the operators.
The enormous use of online social networks (OSNs) by millions of daily active users has led to radical changes in information sharing and communication among users [2]. Statistics show that over the years 2016-2018, about 20% of U.S. adults received news from social media websites [3]. According to a 2017 study covering selected countries [4], 41% of respondents named television as their source of news, while 44% identified the internet (including social media), as shown in Fig. 1. Websites that intentionally publish hoaxes, rumors and misleading information are popping up all over the web, and their content is shared on social media to extend its reach [5]. Terms such as "false news", "post-truth" and "alternative facts" are strongly associated with news exposure in the media. For example, stories such as Hillary Clinton selling weapons to ISIS and Pope Francis endorsing Trump for President were liked and commented on thousands of times on Facebook. Also, 14% of people admitted that they had deliberately shared a fake political news story [6]. It is clear from this that false news will continue to gain attention as long as users are willing to share it online. Current online fact-checking services, such as Factcheck.org and Politifact.com, are standard detection approaches that rely on professionals who specialize in verifying news items. Besides, a large amount of real-time information is posted, answered, replied to and shared via online social networks every day, which makes the detection of false news even harder.
In recent years, extensive research has been carried out on establishing an automatic and effective system or framework for detecting online fake news and thus retrieving useful and valuable information. Identifying legitimate information in the large amount of data available on different platforms is a challenging task because of the heterogeneous and dynamic nature of online social communication.
Misinformation can be defined as any malicious information that is spread deliberately or unintentionally (i.e. without realizing it is untrue) [7]. Dissemination of false information has become synonymous with fake news. Moreover, the longer misleading information spreads within the network of a trusted source, the greater its impact. It is also desirable to avoid unnecessary propagation of messages, which places an excessive burden on infrastructure that is relied upon during emergencies [8].
The problem of false information or rumors in online social networks is usually treated as misinformation or rumor blocking [9]. One line of work seeks the smallest set of users that can minimize the spread achieved by the originators of a rumor. Other work identifies a subset of users who act as defenders against the dissemination of misleading information [7]; the goal there was to find the smallest subset of influential users that restricts the spread of misinformation to a specified ratio of a given population. Work on competing campaigns likewise aims to select an appropriate subset of nodes so that the influence of the misinformation campaign is reduced [10].
An effective and efficient solution to the problem requires addressing a series of research questions:
• How can misinformation be effectively eradicated, and how does news propagate?
• How does contradictory information affect an individual's decision to accept and share content?
• How can sources of misinformation be identified early?
• How can misinformation in high-dimensional real-time data be visualized and classified?

A. Research Challenges and Issues
The research challenges motivated by various papers on online fake news detection, together with some missing research directions, are as follows:
• Determining an effective and reliable detection system that analyzes rumor content in OSNs as early as possible.
• Studies suggest that online detection of fake content is diverse in terms of methodologies, objectives, and research areas. A summary of the varied techniques and methods, with representative features, and an evaluation of existing detection systems are required.
• Binary classification is not sufficient to capture the characteristics of deceptive information because of the complexity of detecting untrue information. Data visualization is a powerful tool for illustrating the heterogeneity of social structures and the modes of dissemination of social information online.
• In addition to the detection system, two promising research aspects aim at combating false news: (a) early prediction of false news (i.e. identifying trends or potential false news as early as possible); and (b) false news intervention (i.e. using historical data to predict that an item is false news or a rumor before intervening). This work tries to assess whether information circulating in online communication is news or rumor, with emphasis on identifying it as early as possible.

B. Contributions
The contributions of this work are summarized as follows:
• The SEIZ epidemic model is used to simulate the Twitter dataset in real time. Non-linear least squares optimization is used to fit the underlying network ODEs to the tweet data, demonstrating how false information can be modeled with SEIZ.
• A comparison between classical and dynamic SEIZ on Twitter data shows that the dynamic model provides an early indication and distinguishes the rate of change of population with time.
• The parameters Exposed Ratio (RSI) and Reproductive Number (R0IZ) show the capability of the SEIZ model to quantify, through compartment transition dynamics, whether information is a rumor. In addition, these measures give suggestive results on the rate at which infected users affect the exposed and susceptible compartments.
The paper is structured as follows. Section 2 presents a comprehensive description of existing work on epidemic models and rumor models. Section 3 gives a detailed analysis of SEIZ with an experimental evaluation obtained by fitting it to the dataset. Section 4 provides a comparative study of the classical and the modified dynamic SEIZ model with results, and determines early rumor detection ratios for identifying rumors in the network. Finally, Section 5 concludes the study.

II. RELATED WORK
A significant amount of research has studied and modeled information diffusion in social networks, including epidemic models and rumor modeling techniques. These techniques are summarized below.

A. Epidemic Models
Diffusion of information is similar to the spread of an epidemic, but there are differences. Information diffusion depends on time, the strength of relationships, information content, social factors, network structure, etc. Researchers have made ongoing improvements to the classical model, developing new models such as the SI model (i.e. Susceptible, Infected) [11], the SIS model (i.e. Susceptible, Infected, Susceptible) [12], the SIR model (i.e. Susceptible, Infected, Recovered) [13], and the SIRS model (i.e. Susceptible, Infected, Recovered, Susceptible) [14]. The SEIZ model (i.e. Susceptible, Exposed, Infected, and Skeptic) proposed in [15] takes a distinct approach by introducing an Exposed (E) state. It maps disease models onto rumor models quite effectively [16]. Several papers study the properties of information spreading in online social networks, with emphasis on Twitter [17][18].
Studies of information diffusion cover the following aspects:
• The diffusion minimization problem, which identifies a small set of individuals through whom the dissemination of information is highest [19][14].
• The role of users in the spread, classifying them according to their activity (i.e. re-tweeting, generating new tweets, commenting on tweets, mentioning, replying, number of followers, and so on) [13][15][17].
• The re-tweet chain and the range of information diffusion [17], hashtags [20], and temporal and topical patterns [21].
• Modeling the dynamics of information diffusion on Twitter through the temporal dynamics of epidemics [22].
The SEIR model was developed by adding an Exposed (E) compartment to the SIR model [23]. It is a dynamic information propagation method used to analyze the impact of user login frequency and number of friends. The results showed that the login frequency of users is directly proportional to the speed and range of information transmission. The S-SEIR model, based on SEIR, was designed for single-layer social networks; its transmission of information depends on user activities [24]. The SCIR model is a topic spreading model for micro-blogs that adds a contacted state (C): when a user posts a message, all followers are assigned to the contacted state [23][24]. The state of a follower may then change with a certain probability, becoming a transmitting or an immune user after a particular duration. The IR-SIR model proposed in [25], based on SIR, simulates the adoption and abandonment of user views by adding a recovery process that follows the mechanics of infection, and was verified on Facebook and Google. The FSIR model proposed in [26] considers the effect of neighbors on an individual during information diffusion; the threshold for information spreading is zero, which affects information diffusion. The ESIS model [27], based on SIS, treats the information transmitted between people as carrying emotional content.
Compartmental models are a mathematical approach applied to measure and predict the spread of infectious diseases. The diffusion of misinformation is usually treated as a process similar to the spread of a virus: each user can be infected by the virus or be susceptible to it. The SEIZ model proposed in [15] introduces skeptics, individuals who become immune to infection. Although skeptics resemble removed (R) individuals, they transition directly from the susceptible state, and their interactions still affect the other compartments.

B. Rumor Modeling
In 1964, an analogy was drawn between the spread of infectious disease and the propagation of information in a network [28]. In particular, that work emphasizes a mathematical model in which infection spreads in several ways, depending on a mechanism that describes the growth and decay process. Later work showed that minimizing the spread of misinformation in social networks (SNs) is an NP-hard problem and evaluated a greedy solution with an approximation guarantee on Facebook [10]. Similarly, work on rumor detection in microblogs explores the effectiveness of three categories of features (i.e. content-based, network-based, and micro-blog memes) for correctly identifying rumors [14]. The credibility of news propagated through Twitter was analyzed in [29]; the results show measurable differences in the way messages propagate, with precision and accuracy rates in the range of 70% to 80%. Rumor propagation was modeled in [30] with the Susceptible-Infected (SI) model, and a maximum likelihood (ML) estimator was established for a class of graphs to find rumor sources in the network. A stochastic study of the spread of infection and rumors on networks, focusing on the SIR epidemic and ignoring density correlations between neighboring nodes, was carried out in [31]. The method was worked out mathematically and compared with the deterministic mean-field model over the set of equations.

C. SEIZ Model
Compartmental models are a mathematical approach used to evaluate and predict the spread of infectious diseases [35]. SEIZ is a compartmental model: it breaks the population into distinct compartments and establishes parameters for the rates at which the population transitions between them. These parameters are obtained by looking at the relationships between the classes of the population and making assumptions about the disease. From this, a set of differential equations is generated to make predictions about the spread of the disease. The SIR model is a basic approach that is more practical than the SI model. Here the population is divided into three compartments: Susceptible (S), the individuals at risk of the disease; Infected (I), those who have the disease and are capable of transmitting it; and Removed (R), individuals who can no longer be infected or infect anyone else because they have either died from the disease or recovered and are now immune [32]. A good application of this model is a disease like measles, where a recovered individual has immunity for life. One of the assumptions made in the Susceptible-Infected-Removed (SIR) model is that the overall population, N, is constant.
Suppose that individuals only transition from S to I at a contact rate of r > 0 and from I to R at a transition rate of a > 0, as shown in Fig. 2. These relationships are then used to generate a system of ODEs (equations 3, 4 and 5). The rate at which individuals transition from S to I depends on r as well as on the number of infected individuals, because a susceptible individual becomes infected through contact with an infected individual. The transition rate from I to R, however, depends only on a. In this model, the transition of each individual out of the infected compartment is independent of everyone else. Another simple model is the Susceptible-Infected-Susceptible (SIS) model, shown in Fig. 3.
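For reference, a standard mass-action form of the SIR system consistent with this description can be written as follows. This is a sketch that assumes bilinear incidence; the text does not state whether the contact term is normalized by N, so that detail is an assumption.

```latex
\begin{aligned}
\frac{dS}{dt} &= -r\,S\,I, \\
\frac{dI}{dt} &= r\,S\,I - a\,I, \\
\frac{dR}{dt} &= a\,I,
\end{aligned}
\qquad S + I + R = N .
```

The three right-hand sides sum to zero, which matches the assumption that the total population N is constant.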
In this case, it is assumed that instead of dying or becoming immune, infected individuals transition back into the susceptible class. This model is therefore used for diseases such as gonorrhea, because an individual does not gain immunity after recovering from an infection. Though the model is not perfect, its simplicity can give a good idea of what is going on. Assume that individuals transition from S to I with a contact rate of r > 0 and back from I to S with a transition rate of λ > 0.
With these relationships, a system of ODEs is generated (equations 6 and 7). In the SIS model, the rate of transition out of I is independent of everything except the parameter λ, while the rate of transition from S to I depends on the parameter r and on the number of people in the infected compartment.
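Under the same bilinear-incidence assumption, the SIS relationships described here can be sketched as the two-equation system below. The symbols r and λ are those of the text; the exact normalization is again an assumption.

```latex
\begin{aligned}
\frac{dS}{dt} &= -r\,S\,I + \lambda\,I, \\
\frac{dI}{dt} &= r\,S\,I - \lambda\,I,
\end{aligned}
\qquad S + I = N .
```

Setting dI/dt = 0 with I > 0 gives S = λ/r, so in this sketch an endemic state exists only when N > λ/r.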
One major drawback of the SIS and SIR models is that a susceptible individual moves to the infected class immediately after exposure to an infected individual. This is usually not the case, since many pathogens take time to develop during an incubation phase. For these instances, an Exposed (E) compartment is incorporated to represent those who have met an infected individual but take some time to become infected themselves. Building on this extension, the SEIZ model adds one more compartment, Skeptics (Z). Skeptics recruit from the susceptible population with rate b; a contact either turns the individual into another skeptic with probability l or has the unintended effect of sending that individual to the incubator (exposed) class with probability (1-l).
In this model, a susceptible individual who meets an infected one is immediately infected with probability p and, with probability (1-p), transitions to the incubator class instead, from which adoption may follow later. N(t) denotes the total population, and the network has a disease-free state with S*=N, E*=I*=Z*=0. Fig. 4 illustrates the relationships between the compartments, and Table I and Table II describe each parameter of the model. With the relationships between the compartments described by these parameters, the model is given by the set of ODEs in equations (8)-(11).
The study in [16] compared the SEIZ and SIS models for the spread of news on Twitter and found that the SEIZ model produced much better results. When the models were optimized against Twitter data from 4 news stories and 4 rumors, the SEIZ model consistently had a much lower relative error than the SIS model. SEIZ suffers from two limitations: the rate of change of population in a state and the state transition probabilities are considered constant for the entire period of observation. This requires the population to reach a certain level before any trustworthy characterization of information as untrue can be made, resulting in a substantial delay. Assuming a static rate of population change therefore delays accurate results. In this paper, a dynamic SEIZ computes the rate of change of population at fixed intervals and periodically updates its predictions based on the new rates. This approach provides early indications that information is untrue.
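The compartment transitions described above can be sketched in code. The block below is a minimal illustration, assuming the SEIZ formulation commonly attributed to [15] and [16]; the names `rho` (E-to-I contact rate) and `eps` (incubation rate) are assumptions, since the text defines them only in its tables, and the parameter values are illustrative rather than fitted. A plain fourth-order Runge-Kutta step is used so that the sketch needs no external libraries.

```python
def seiz_rhs(state, params):
    """Right-hand side of the SEIZ system (assumed standard form)."""
    S, E, I, Z = state
    beta, b, rho, eps, p, l = params
    N = S + E + I + Z
    si = beta * S * I / N   # contacts between susceptible and infected
    sz = b * S * Z / N      # contacts between susceptible and skeptics
    ei = rho * E * I / N    # contacts between exposed and infected
    dS = -si - sz
    dE = (1 - p) * si + (1 - l) * sz - ei - eps * E
    dI = p * si + ei + eps * E
    dZ = l * sz
    return (dS, dE, dI, dZ)


def rk4_step(state, params, h):
    """Advance the state by one Runge-Kutta step of size h (in days)."""
    def shifted(base, slope, scale):
        return tuple(x + scale * dx for x, dx in zip(base, slope))

    k1 = seiz_rhs(state, params)
    k2 = seiz_rhs(shifted(state, k1, h / 2), params)
    k3 = seiz_rhs(shifted(state, k2, h / 2), params)
    k4 = seiz_rhs(shifted(state, k3, h), params)
    return tuple(
        x + h / 6 * (a + 2 * b2 + 2 * c + d)
        for x, a, b2, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state0, params, days, steps_per_day=100):
    """Return the state at day 0, 1, ..., days."""
    h = 1.0 / steps_per_day
    state = tuple(float(x) for x in state0)
    daily = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, params, h)
        daily.append(state)
    return daily


# Illustrative values only: (S, E, I, Z) and (beta, b, rho, eps, p, l).
trajectory = simulate((1000.0, 0.0, 10.0, 10.0),
                      (0.8, 0.3, 0.5, 0.2, 0.4, 0.6), days=7)
```

Because every outflow from one compartment appears as an inflow to another, the total S+E+I+Z stays constant along the trajectory, which is a quick sanity check on any implementation of the model.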

III. MODEL FITTING
Scripts were used to fit the model and to perform multilinear regression on the Twitter data. The entire experimentation was performed in Python 3, solving the ODE system over a 7-day period of information spread about the Higgs boson. For each day, the response ratio RSI and the relative error (Err) between the infected population over time, I(t), and the actual tweets were computed. The inputs were taken from the Higgs networks and fitted to the compartments S, E, I, and Z, where the skeptic population Z is assumed to be 1 as the initial condition for the days considered.

A. Dataset used
The dataset comes from the Stanford Network Analysis Project (SNAP) and draws on 14 million tweets [33]. The Higgs dataset was built to analyze the process of information spread on Twitter around the discovery of a new particle. The messages posted on Twitter about this discovery between 1st and 7th July 2012 are taken into consideration. Four directional networks derived from Twitter user behavior are available, together with activity information:
1) Re-tweeting (retweet network)
2) Replying (reply network) to existing tweets
3) Mentioning (mention network) other users
4) Friends/followers social relationships among the users involved in the above activities
5) Information about activity on Twitter during the discovery of the Higgs boson
It is worth remarking that the user IDs have been anonymized and that the same user ID is used in all networks. This choice allows the Higgs dataset to be used in studies of large-scale interdependent/interconnected multiplex/multilayer networks, where one layer accounts for the social structure and three layers encode different types of user dynamics.
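As an illustration of how such activity records might be turned into daily counts for model fitting, the sketch below assumes a whitespace-separated layout of `userA userB timestamp interaction`, with the interaction coded as RT (retweet), MT (mention) or RE (reply). That layout and the sample rows are assumptions made for the example; the rows are invented and are not taken from the dataset.

```python
from collections import Counter
from datetime import datetime, timezone

# Invented rows in the assumed "userA userB timestamp interaction" layout.
SAMPLE_ROWS = [
    "101 202 1341100972 MT",
    "303 404 1341101181 RT",
    "505 606 1341187581 RE",
    "malformed line",
]


def daily_activity(lines):
    """Count interactions per (UTC date, interaction type)."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip rows that do not match the assumed layout
        _src, _dst, timestamp, kind = parts
        day = datetime.fromtimestamp(int(timestamp), tz=timezone.utc).date()
        counts[(day.isoformat(), kind)] += 1
    return counts


counts = daily_activity(SAMPLE_ROWS)
```

Cumulative sums of such daily counts could then serve as the observed series against which I(t) is compared.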

B. Experimental Evaluation
The classical and modified SEIZ models were applied to the Twitter data on the particle discovery. The social network contains 456,631 nodes and 14,855,875 edges, which are fitted and modeled with dynamic SEIZ. Its accuracy for early detection of rumor content is then evaluated using RSI and R0IZ (i.e. standard parameters for measuring rumor in a network).
In the context of Twitter, the SEIZ compartments are interpreted as follows: susceptible (S) is a person who has not heard about the tweet; infected (I) is an individual who has tweeted about it; skeptic (Z) is an individual who has learned of the tweet but chooses not to tweet about it; and exposed (E) is an individual who has received the tweet but takes some time before posting. The inputs to the fitting function are the initial conditions of each compartment (S, E, I, Z) and the total population N = S+E+I+Z. The system of ODEs is then solved and the result computed. The initial conditions are known from the dataset, assuming that the skeptics Z (i.e. dormant accounts) are 1% of the total population. The goal of optimizing the parameters is to minimize the relative error between I(t) and the observed tweets. The minimization is constrained: all of the parameters in Table III must be non-negative, so β, b, ρ, s ≥ 0 and 0 ≤ p, l ≤ 1. Similarly, the compartments must be non-negative. Table III is calculated by solving the SEIZ ODEs given in equations (8)-(11). To demonstrate the SEIZ mathematical model, quantities from the Higgs boson dataset are fitted to it. The implementation is written in Python and extracts the data for each network from this large social network dataset. The classical SEIZ model is computed and compared with the modified SEIZ model, which provides early detection and indication of rumor in the network from information supplied in time slots over 7 days. Experiments were performed for one day, three days, five days, and seven days, with data obtained for each day.
Fig. 6 shows how the susceptible (S), exposed (E), infected (I), and skeptic (Z) populations change with time (i.e. with retweets, replies, and mentions in the network during the 7 days). For 1 day, E(t) and S(t) decrease at a rate opposite to the increase of I(t). In fact, I(t) meets S(t) as the days progress, leaving a larger exposed population in the E(t) compartment. For the other days, S(t), I(t) and E(t) become constant over a certain period. Thus, no significant change is seen for 3 days, 5 days and 7 days with the static classical SEIZ approach. The remaining SEIZ plots are shown in Fig. 5, with S, E, I, Z changing dynamically over time.
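The fitting objective and parameter constraints described in this subsection can be sketched as follows. The relative error is written here as a ratio of Euclidean norms, which is a common choice and an assumption on our part, since the text does not spell out the formula. The helper names `relative_error` and `within_bounds` are hypothetical.

```python
import math


def relative_error(predicted, observed):
    """Relative 2-norm error between model output I(t) and observed counts."""
    if len(predicted) != len(observed):
        raise ValueError("series must have the same length")
    denom = math.sqrt(sum(o * o for o in observed))
    if denom == 0:
        raise ValueError("observed series must not be all zeros")
    numer = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)))
    return numer / denom


def within_bounds(beta, b, rho, s, p, l):
    """Check the constraints: rates non-negative, probabilities in [0, 1]."""
    rates_ok = min(beta, b, rho, s) >= 0
    probs_ok = 0 <= p <= 1 and 0 <= l <= 1
    return rates_ok and probs_ok
```

An optimizer would search over parameter sets that pass `within_bounds` and keep the one with the smallest `relative_error`; a bounded non-linear least squares routine is one way to do this.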

The SEIZ time course plots for all four compartments of the time-varying Higgs Twitter network in Fig. 6 suggest that, for 1 day, the effective rate at which the susceptible population becomes skeptic is much higher than the rate at which it becomes infected. As Z(t) increases, S(t) decreases, and thereafter S(t) is stable together with Z(t). I(t) increases as S(t) decreases, but its rate of change is slower. There is also a strong correlation between the increases of E(t) and I(t): as I(t) peaks, E(t) peaks as well. However, the increase of E(t) is closely correlated with the decrease of S(t). Most of the increase in I(t) occurs after S(t) has reached its minimum values, which shows that the infected compartment is constantly changing.
The results show an early indication of the spread of untrue information, suggesting that dynamic SEIZ works well when rates are recomputed at successive time intervals over a large network. Compared with the classical model, it also shows a faster rate of change over time in all four networks, independent of the early separation of true and untrue information.
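One possible reading of the dynamic scheme is a piecewise simulation in which the parameters are refreshed at fixed intervals and the state is carried from one window to the next. The sketch below illustrates only that mechanism: the SEIZ right-hand side follows an assumed standard form, the names `rho` and `eps` are assumptions, and the per-window parameter values are invented rather than fitted.

```python
def seiz_rhs(state, params):
    """SEIZ right-hand side (assumed standard form)."""
    S, E, I, Z = state
    beta, b, rho, eps, p, l = params
    N = S + E + I + Z
    si = beta * S * I / N
    sz = b * S * Z / N
    ei = rho * E * I / N
    return (
        -si - sz,
        (1 - p) * si + (1 - l) * sz - ei - eps * E,
        p * si + ei + eps * E,
        l * sz,
    )


def euler_run(state, params, days, steps_per_day=1000):
    """Integrate with small explicit Euler steps for the given number of days."""
    h = 1.0 / steps_per_day
    for _ in range(int(round(days * steps_per_day))):
        deriv = seiz_rhs(state, params)
        state = tuple(x + h * dx for x, dx in zip(state, deriv))
    return state


def dynamic_run(state0, windows):
    """windows is a list of (length_in_days, params); the state carries over."""
    state = tuple(float(x) for x in state0)
    checkpoints = [state]
    for length_days, params in windows:
        state = euler_run(state, params, length_days)
        checkpoints.append(state)
    return checkpoints


START = (1000.0, 0.0, 10.0, 10.0)         # illustrative S, E, I, Z
P_DAY1 = (0.8, 0.3, 0.5, 0.2, 0.4, 0.6)   # invented parameters, window 1
P_DAY2 = (1.6, 0.3, 0.5, 0.2, 0.4, 0.6)   # invented parameters, window 2

static_path = dynamic_run(START, [(2, P_DAY1)])
dynamic_path = dynamic_run(START, [(1, P_DAY1), (1, P_DAY2)])
```

With identical parameters in every window the piecewise run reproduces the static run, so any difference between the two paths comes entirely from the refreshed rates.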

C. Comparison with Classical and Dynamic SEIZ
The modified SEIZ provides an early indication of untrue information as the number of days in the network increases. From 5 days to 7 days, a significant peak is observed with the modified technique, which suggests that the increase in infected users is due to the transition of a large number of users from the exposed to the infected compartment over time (t). Fig. 7 compares both techniques.
The infected compartment grows rapidly from the exposed population and not from the direct recruitment of susceptible individuals. These findings therefore give an early indication of the propagation of misinformation in the Higgs boson network.
The next step is to determine the Exposed Ratio (RSI) and the Reproductive Number (RIZ) in order to quantify untrue information in the Twitter data. Analysis of the local stability of the disease-free fixed point shows that the specific reproductive numbers are given by equation 10, where R0I corresponds to an eigenvector with the adopter component and R0IZ corresponds to the exclusive rise of the skeptic population without recognizing the infection. R0IZ is the criterion for adopters to invade a susceptible population: if R0IZ > 1, the infection spreads, and the higher R0IZ is, the more successful the acceptance of the infection. Relating the primary parameters of the SEIZ model, RSI is the ratio of the sum of the effective transition rates entering the exposed compartment (from S) to the sum of the transition rates leaving it (to I). RSI contains all of the rate constants and probability values of the SEIZ model and relates them to the flux ratio of the exposed compartment, i.e. the ratio of effects entering E to those leaving E. This ratio, called the Exposed ratio, is given by equation 11.
The Exposed ratio may be able to indicate a difference between news stories and rumors. A value of RSI greater than 1 means that the transition rates from S to E are greater than those from E to I. If the ratio is less than 1, then the exit rates are greater than the entry rates. For the cases studied in [16], the news stories typically had RSI > 1 and the rumors had RSI < 1. A similar computation was carried out on the dataset, with the values indicating early signs of rumor/infection in the network. Table IV lists the RSI and RIZ values for the modified SEIZ model.
The values of RSI in Fig. 8 show that on the very first day on which SEIZ is applied, the ratio of influx to efflux for the exposed compartment, between the susceptible and infected compartments, is -15.95. This shows that the modified dynamic SEIZ gives an early prediction of rumor, detected already on day 1. For the remaining days the ratio is positive, ranging from 5 to 6.39, which potentially indicates a non-rumor rate of diffusion.
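As a worked illustration of the Exposed ratio, the sketch below uses the form commonly given for the SEIZ model in [16], namely the inflow terms (1-p)β + (1-l)b divided by the outflow terms ρ + ε. Treating this as the expression intended here is an assumption, as are the names `rho` and `eps`; the numbers in the usage lines are illustrative only.

```python
def exposed_ratio(beta, b, rho, eps, p, l):
    """Ratio of rates entering E (from S) to rates leaving E (to I)."""
    outflow = rho + eps
    if outflow <= 0:
        raise ValueError("rho + eps must be positive")
    inflow = (1 - p) * beta + (1 - l) * b
    return inflow / outflow


# Illustrative parameter sets, not fitted values.
r_balanced = exposed_ratio(beta=2.0, b=0.0, rho=1.0, eps=1.0, p=0.0, l=1.0)
r_example = exposed_ratio(beta=0.8, b=0.3, rho=0.5, eps=0.2, p=0.4, l=0.6)
```

In the first call inflow and outflow are equal, so the ratio sits exactly on the threshold of 1 that the text uses to separate the two regimes.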

Thus, the greater the value of RSI, the more susceptible individuals enter the exposed group. The ratio decreases with time, which suggests that the entire exposed population may eventually become prone to untrue information as it diffuses through the network. The values for day 1 through day 7 give a clearer indication of the infected population I(t) creating the rumor.
The RIZ value estimates the average number of adopters induced by a standard spreader in a susceptible population, as shown in Fig. 9. It tests the effectiveness of the infection: when RIZ > 1, the infection spreads in the network, and the greater RIZ is, the greater the potency of the infection. Fig. 10 shows a similar result: the infection rate is highest on the first day, at 122.57, and gradually decreases as I(t) becomes skeptic Z(t). The large values of RIZ estimated in the SEIZ model are mainly due to the very long lifetime of the infection I(t). In practice, RIZ can be estimated in simple models by linearizing I(t) around the disease-free equilibrium [36]. In Fig. 11, the RIZ values provide a statistical distribution matched against the spread data for Feynman diagrams [34].

V. CONCLUSIONS
Early prediction can warn users of the potential risks of false information before it spreads widely, and false news intervention can help users counter its negative impact. In this paper, a model inspired by the SEIZ epidemiological model is applied to a population to detect the dissemination of false information. The SEIZ model is capable of modeling false content in large networks at the earliest opportunity in a dynamic setting. The Exposed ratio (RSI) and the Reproductive Number (RIZ) are each a single value that is sensitive to changes in the parameter values, and they are used to extract useful information from the model. Specifically, this paper demonstrates how these criteria can be incorporated into a strategy that supports the early detection of infection on Twitter. The findings suggest that Twitter data, combined with other data analysis strategies, provides valuable dissemination information and supports accurate and reliable results.

VI. FUTURE WORK
Examining more datasets in the future will further strengthen the claim that the SEIZ model is well suited to the early prediction of the propagation of untrue information on Twitter. In addition, combining these epidemiological models with reinforcement learning could mitigate the effect of false news on social media. Removing suspicious accounts, protecting users from false information, and applying automated methods all help to reduce the spread of false news online. This is therefore a potential research direction for big data systems.