Identifying Top-k Most Influential Nodes by using the Topological Diffusion Models in the Complex Networks

Social networks are a subset of complex networks in which users are defined as nodes and the connections between users as edges. One of the important issues in social network analysis is identifying influential and penetrable nodes. Centrality is one of the main methods used to identify influential nodes. Centrality criteria include degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality, all of which are used to identify influential nodes in weighted and unweighted networks. TOPSIS is a multi-criteria method that employs these four centrality criteria simultaneously to identify influential nodes, which makes it more accurate than any single criterion. Another method used to identify the top-k influential nodes in complex social networks is the heat diffusion kernel: as one of the topological diffusion models, it identifies nodes based on heat diffusion. In the present paper, to use the topological diffusion model, the social network graph is built from interactive and non-interactive activities; then, based on the diffusion, the dynamic equations of the graph are modeled. Improved heat diffusion kernels are then used to increase the accuracy of influential node identification. After several runs of the topological diffusion models, the users who diffused more heat were chosen as the most influential nodes in the social network. Finally, to evaluate the model, the method was compared with the Technique for Order Preferences by Similarity to Ideal Solution (TOPSIS).
Keywords: Topological Diffusion; TOPSIS; Social Network; Complex Network; Interactive and Non-interactive Activities; Heat Diffusion Kernel


I. INTRODUCTION
Most networks around us are complex networks. These include neural networks, social networks, organizational networks, computer networks, etc. [1]. Today, social networks have drawn more attention than the others. Every social network is composed of two elements, users and relationships: users are any entities participating in a relationship and are called nodes; relationships are the connections between entities and are called edges. Different types of relationships (work, family, friends, etc.) can exist between nodes [2].
The development of social networks accelerates the spread of different types of information, including rumors, news, ideas, advertisements, etc. People's decisions to refuse or accept information depend on how the information is diffused [3]. Therefore, choosing the individuals who will spread the information is very important. So far, several models have been proposed for identifying the individuals in a social network who have social influence. Many existing social influence models define influence diffusion based solely on the topological relationships between the nodes of the social network. The ideas of topological diffusion models can be used in the process of influence diffusion, which can then be evaluated through the topological relationships among the nodes of a social network [4], [5].
In this paper, social influence is examined through the heat diffusion kernel, i.e., the dynamical equations are modeled based on heat diffusion. The heat diffusion process finds the influential nodes in complex networks using heat transfer laws based on interactive and non-interactive activities. Users who receive more heat than others, and whose temperature curve therefore rises more, are identified as the most influential nodes in a social network. Then, by defining modified heat diffusion kernels, the accuracy of influential node identification can be increased.

II. RELATED WORK
Doo & Liu in [3] proposed a model of social influence based on activities. Activity-based social influence is very effective in finding nodes in social networks. Their paper fully describes three types of topological diffusion: the Linear Threshold model (LT), the Independent Cascade model (IC) and the heat diffusion model.
The authors in [6] formulated in 2015, for the first time, the problem of influence maximization as a combinatorial optimization problem. They considered two influence spread models, i.e., the independent cascade and the linear threshold models, and evaluated them extensively. The authors in [7] focused on the linear threshold model and presented a standard greedy algorithm in which the node with the maximum edge is selected repeatedly. The authors in [8] mostly discussed time-critical influence maximization, where the goal is to maximize the influence spread within a given deadline. The authors in [9] presented three diffusion models together with three algorithms for selecting the best people. Their paper presented a new approach for analyzing social networks; its complexity analysis shows that the proposed model also scales to large social networks.
www.ijacsa.thesai.org
The authors in [10] designed a two-stage greedy algorithm (GAUP¹) to find the most effective nodes in a network. In the first stage, GAUP computes the preferences of the users for a pattern using either a latent feature model based on SVD² or a vector space model; in the second stage, it uses a greedy algorithm to find the top-k nodes.
A new technique known as weighted Technique for Order Preferences by Similarity to Ideal Solution (weighted TOPSIS) was presented in [11] to improve the ranking of node spreading ability. With this method, the authors in [12] not only considered different centrality measures as the multiple attributes of the network but also proposed a new algorithm to calculate the weight of each attribute; to evaluate its performance on four real networks, they used the Susceptible-Infected-Recovered (SIR) model for the simulation. Hu, et al.'s experiments on four real networks showed that the proposed method could rank the spreading ability of nodes more accurately than the original method.

III. METHOD
Social network models can be described by mathematical tools such as graphs and matrices. The most important property of a graph is its topological structure, in which a vertex is created for each user, and if two users are friends with each other, the two vertices are connected. If a direction is attributed to each edge, the network is called "directed", and the order of the vertices in an edge is important. In the matrix representation, each row corresponds to a vertex, and each column also corresponds to a vertex. If there is a relationship (edge) between two vertices, the corresponding entry is 1; otherwise, it is 0 [13].
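The matrix representation described above can be sketched as follows. This is a minimal illustration only: the function name `adjacency_matrix`, the 1-based vertex numbering and the four-user example are assumptions made here and do not come from the paper.

```python
def adjacency_matrix(n, edges, directed=False):
    """Return an n x n 0/1 adjacency matrix as a list of lists.

    Vertices are numbered 1..n; `edges` is an iterable of (u, v) pairs.
    Entry [u-1][v-1] is 1 when there is an edge from u to v.
    """
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u - 1][v - 1] = 1
        if not directed:
            # A friendship is mutual, so mirror the entry.
            A[v - 1][u - 1] = 1
    return A


# Hypothetical friendships among four users.
A = adjacency_matrix(4, [(1, 2), (1, 3), (3, 4)])
for row in A:
    print(row)
```

With `directed=True` only the entry for the stated direction is set, which corresponds to the directed case in which the order of the vertices in an edge matters.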
Users can send text, photos or videos, like their friends' posts, or write comments for them. Hence, users' activities can be divided into two categories: interactive activities, i.e., activities that involve nodes other than the user, and non-interactive activities, which involve only the user. As an example, commenting on the profile picture of another person is an interactive activity because it takes place between two nodes, whereas the picture on a user's own profile page is a non-interactive activity because it involves only the node itself. If $i$ is taken as a row and $j$ as a column of a matrix, assume that $IA_{ij}$ represents the number of interactive activities from node $v_i$ to the neighboring node $v_j$, and $NIA_i$ the number of non-interactive activities at node $v_i$. By combining interactive and non-interactive activities, several heat diffusion kernels can be defined. Interactive and non-interactive activities are denoted $IA_{ij}$ and $NIA_i$, respectively.
MAX(NA) is defined as the largest number of non-interactive activities in $V$ [3]. Fig. 1 is an example of the topological structure of the Twitter social network displaying the top 10 users. The positive integers on the edges represent the number of interactive activities, and the underlined positive integers at each node represent the number of non-interactive activities performed by that node.
¹ Greedy Algorithm based on Users' Preferences (GAUP)
² Singular Value Decomposition (SVD)
Topological diffusion, as one of the processes of influence diffusion in social networks, describes the spread of influence through the topological relationships between nodes. The heat diffusion model is one such topological diffusion model. In this model, it is assumed that at the initial time $t_0$, in a social network of $n$ nodes, all nodes except node $v_i$ have an initial heat of zero. Node $v_i$, which has some heat, is selected as the heat source, and at time $t_1$, $v_i$ diffuses its heat equally among all its neighbors. At time $t_2$, the nodes with non-zero heat diffuse their heat to all their neighbors. By repeating this process over a period $t$ for all $n$ nodes, the influence of each node is found.
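The step-by-step process above can be illustrated with a small discrete simulation. The sketch below is not the paper's kernel: it assumes an undirected, unweighted version of the ten-node example (the edge list is read off the neighbor description of Fig. 2(a)), ignores activity counts, and lets every node pass a fraction `alpha` of its heat equally to its neighbors at each step. The names `EDGES`, `neighbors` and `diffuse_step` are hypothetical.

```python
# Undirected edges of the ten-node example, recovered from the text's
# description of each node's neighbors (an assumption of this sketch).
EDGES = [(1, 2), (1, 3), (1, 4), (1, 6), (1, 8), (1, 9), (2, 4),
         (4, 5), (4, 6), (5, 6), (6, 7), (7, 8), (9, 10)]


def neighbors(n, edges):
    """Map each vertex 1..n to the list of its neighbors."""
    nbrs = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    return nbrs


def diffuse_step(heat, nbrs, alpha=0.5):
    """One time step: each node sends alpha * heat, split equally."""
    new_heat = dict(heat)
    for v, h in heat.items():
        if h == 0 or not nbrs[v]:
            continue
        out = alpha * h
        new_heat[v] -= out
        share = out / len(nbrs[v])
        for u in nbrs[v]:
            new_heat[u] += share
    return new_heat


nbrs = neighbors(10, EDGES)
heat = {v: 0.0 for v in range(1, 11)}
heat[7] = 1.0  # v7 is the only node with heat at the initial time
history = [heat]
for _ in range(2):
    history.append(diffuse_step(history[-1], nbrs))

for step, h in enumerate(history):
    print(step, {f"v{v}": round(x, 4) for v, x in h.items() if x > 0})
```

After the first step only the two neighbors of the source hold heat besides the source itself; after the second step the heat has reached nodes two hops away, which mirrors the description of times $t_1$ and $t_2$. Total heat is conserved in this simplified variant, which need not hold for the kernels defined in the paper.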
Suppose that $G = (V, E)$ represents a directed graph of the social network, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of vertices representing the users, and $E = \{(u, v) \mid u, v \in V\}$ is the set of edges representing the friendship relationships between users [14]. The heat at vertex $v_i$ at time $t$ is defined by the function $H_i(t)$; heat flows from a high-temperature node to a low-temperature node along the edges between vertices. In the directed graph, during the time interval $\Delta t$, vertex $v_i$ diffuses the amount of heat $DH_i(t)$ through its outgoing edges to the next nodes. At the same time, vertex $v_i$ receives the heat $RH_i(t)$ through its incoming edges. The heat variation at vertex $v_i \in V$ between times $t$ and $t + \Delta t$ is defined in (1) as the difference between the heat it receives from, and the heat it diffuses to, all its neighbors.
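Written out, the heat balance described in the last sentence takes the following form. This is a reconstruction from the prose, using only the symbols $H_i$, $RH_i$ and $DH_i$ defined above; the typeset version of (1) in the original may differ in notation.

```latex
H_i(t + \Delta t) - H_i(t) = RH_i(t) - DH_i(t)
```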
DH i (t) and RH i (t) are defined as ( 2) and (3): Parameter α, called heat diffusion coefficient, controls the rate of heat transfer.Value of α is a real number between 0 and 1.If α tends to 0, heat transfer would be difficult, and the heat will not spread everywhere.If α tends to 1, without heat loss, all heat will be distributed among all neighbors, i.e., if α is big, the heat is diffused rapidly from one node to another.In this paper, the value of α is taken as 0.5, which means that the heat is transferred with a little loss between nodes.β in ( 3) is used as a weight for non-interacting activities and takes a real number between 0 and 1.If β is considered 0, it means that some of the non-interacting activities are ignored and have no effect on the heat diffusion process.But if β is set to 1, in heat diffusion process, node v i loses its heat with a lower rate; in this paper, in all curves, the value of β is taken to be 0.5.Equation ( 2) and ( 3) can be combined with equation of heat variation (1) and obtain (4).

In general, for $n$ nodes, the heat variation can be written as (5). In (5), $K$ is an $n \times n$ matrix representing the heat diffusion kernel of graph $G$, and $H(t)$ is a column vector representing the heat distribution at time $t$ in graph $G$, which is defined for the initial heat source $H(0)$. Taking the limit of (5) as $\Delta t$ approaches zero gives (6).
Then we integrate both sides of (6) to obtain (7). Now, taking the ln of both sides of (7), the heat equation is defined as (8). Then, using the heat change equations $RH$ and $DH$, two types of heat diffusion kernel are defined. The curves are drawn with MATLAB software, and with the help of these curves the most influential node can easily be found.
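As a rough numerical illustration of this derivation, the sketch below assumes that (8) has the usual closed form for a linear system, $H(t) = e^{\alpha K t} H(0)$, with $\alpha$ factored out of $K$. The kernel used here is a generic degree-normalized stand-in (entry $1/d_j$ from node $v_j$ to each neighbor, $-1$ on the diagonal), not the paper's kernels (9), (12) or (13), and the edge list is the undirected ten-node example read off the text. All function names are hypothetical, and the matrix exponential is computed in plain Python rather than MATLAB.

```python
# Undirected edges of the ten-node example (assumption of this sketch).
EDGES = [(1, 2), (1, 3), (1, 4), (1, 6), (1, 8), (1, 9), (2, 4),
         (4, 5), (4, 6), (5, 6), (6, 7), (7, 8), (9, 10)]
N = 10


def degree_kernel(n, edges):
    """Stand-in kernel: K[i][j] = 1/d_j for neighbors, K[j][j] = -1."""
    nbrs = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    K = [[0.0] * n for _ in range(n)]
    for j, js in nbrs.items():
        if not js:
            continue  # isolated node: no heat leaves or arrives
        K[j - 1][j - 1] = -1.0
        for i in js:
            K[i - 1][j - 1] = 1.0 / len(js)
    return K


def mat_mul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in A]


def mat_exp(M, terms=20):
    """exp(M) by scaling and squaring with a truncated Taylor series."""
    n = len(M)
    norm = max(sum(abs(x) for x in row) for row in M)
    squarings = 0
    while norm / (2 ** squarings) > 0.5:
        squarings += 1
    S = [[x / (2 ** squarings) for x in row] for row in M]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes S^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, S)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def heat_at(K, h0, t, alpha=0.5):
    """Evaluate H(t) = exp(alpha * t * K) H(0)."""
    E = mat_exp([[alpha * t * x for x in row] for row in K])
    return [sum(e * h for e, h in zip(row, h0)) for row in E]


K = degree_kernel(N, EDGES)
h0 = [0.0] * N
h0[7 - 1] = 1.0  # node v7 as the heat source, as in the paper's example
for t in (0, 1, 5, 10):
    heat = heat_at(K, h0, t)
    top = max(range(N), key=lambda i: heat[i]) + 1
    print(f"t={t:2d}  hottest node: v{top}  heat at v1: {heat[0]:.4f}")
```

Under this stand-in kernel the total heat is conserved and, for large $t$, each node's share approaches its degree divided by the sum of all degrees, so the six-neighbor node $v_1$ ends up with the largest share. The curves in the paper's figures come from its own kernels and need not match these numbers.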
The first kernel is called the heat diffusion kernel based on interactive and non-interactive activities. Matrix $K$ is defined piecewise in (9); its entries are 0 in all cases not covered by the nonzero conditions.
Fig. 2(a) illustrates the topological structure of heat diffusion based on interactive and non-interactive activities. This graph has been calculated and plotted using (9) and Fig. 1. As shown in the figure, node $v_1$ has six neighbors: $v_2$, $v_3$, $v_4$, $v_6$, $v_8$ and $v_9$. Nodes $v_4$ and $v_6$ each have four neighbors: $v_1$, $v_2$, $v_5$ and $v_6$ are the neighbors of $v_4$, and $v_1$, $v_4$, $v_5$ and $v_7$ are the neighbors of $v_6$. Each of the nodes $v_2$, $v_5$, $v_7$, $v_8$ and $v_9$ is directly connected to two nodes. Finally, nodes $v_3$ and $v_{10}$ have one neighbor each: $v_1$ is the only neighbor of $v_3$, and $v_9$ is the only neighbor of $v_{10}$.
Fig. 2(b) is the heat diffusion diagram, plotted using (8), in which, as an example, node $v_7$ has been chosen as the heat source. The x-axis is time and the y-axis is the amount of heat at each node. Initially (at time zero), the heat source has a lot of heat and the rest of the nodes have none. At time 1, $v_7$ diffuses heat to its neighbors ($v_6$ and $v_8$), and these two nodes gain heat, with $v_6$ receiving more and rising higher. In this manner, the heat of the source decreases with time as it diffuses to other nodes, while the other nodes receive heat from their neighbors and rise. Node $v_1$ receives the most heat because it is directly linked to six nodes, so it rises more than all other nodes and its curve is ascending. Nodes $v_3$ and $v_{10}$, with just one neighbor each, form the lower limit of the diagram. A point worth noting is that at time 10, nodes $v_2$ and $v_4$ are higher than node $v_6$, because these two nodes receive more heat than $v_6$.
The degree of each node is one of the influential factors in information diffusion. If $d_j$ represents the degree of node $v_j$, the weight on the edges connected to node $v_j$ is calculated as (10): $1/d_j$. Therefore, $RH_i(t)$ can be changed into (11). A new kernel, called the heat diffusion kernel based on non-interactive activity, can now be defined through (11) and (3) as (12).
Fig. 3(a) is drawn using (12) and Fig. 1. As mentioned above, $d_j$ represents the degree of each node, and the weight of the edges between nodes is obtained from the degrees by (10). As an example, vertices $v_6$ and $v_8$ are the two neighbors of node $v_7$; therefore, a weight of 0.5 is assigned to each of $E(v_7, v_6)$ and $E(v_7, v_8)$, which means that half of the heat of $v_7$ is transferred to $v_6$ and the other half to $v_8$. Node $v_{10}$ has just one friend, vertex $v_9$. Thus, the weight of $E(v_{10}, v_9)$ is equal to 1, which means that all the heat of $v_{10}$ is diffused to $v_9$. The remaining edge weights are calculated similarly.
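The degree-based weighting can be checked with a few lines of code. The sketch below assumes the undirected ten-node topology read off the neighbor description of Fig. 2(a) and assigns weight $1/d_j$ to every edge leaving $v_j$; the names `EDGES` and `degree_weights` are hypothetical, and exact fractions are used so that the values can be compared directly with the worked example.

```python
from fractions import Fraction

# Undirected edges of the ten-node example (assumption of this sketch).
EDGES = [(1, 2), (1, 3), (1, 4), (1, 6), (1, 8), (1, 9), (2, 4),
         (4, 5), (4, 6), (5, 6), (6, 7), (7, 8), (9, 10)]


def degree_weights(n, edges):
    """Return {(j, i): 1/d_j} for every edge leaving node j."""
    degree = {v: 0 for v in range(1, n + 1)}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    weights = {}
    for u, v in edges:
        weights[(u, v)] = Fraction(1, degree[u])
        weights[(v, u)] = Fraction(1, degree[v])
    return weights


w = degree_weights(10, EDGES)
print("E(v7, v6) =", w[(7, 6)])    # half of v7's heat goes to v6
print("E(v7, v8) =", w[(7, 8)])    # the other half goes to v8
print("E(v10, v9) =", w[(10, 9)])  # v10 sends all its heat to v9
```

Because each node's outgoing weights sum to 1, a node distributes exactly the heat it gives up, whatever its degree.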
For example, node $v_7$ is taken as the heat source and the heat diffusion diagram is plotted in Fig. 3(b) using (8). The heat diffusion process in this diagram is as follows: at time zero, the heat source has high heat, and the rest of the nodes have none. With the passage of time, the heat of the source decreases as it diffuses to other nodes; accordingly, heat increases in nodes $v_6$ and $v_8$, which are directly connected to the heat source ($v_7$). As shown, it is node $v_6$, with more links to other nodes, that eventually receives more heat and rises higher than node $v_8$. From time 4 onwards, node $v_1$ rises and surpasses all other nodes, even the heat source. Node $v_1$, with links to more nodes, receives more heat and therefore, as expected, follows an ascending trend.
A comparison of Fig. 2(b) and Fig. 3(b) indicates that in both diagrams node $v_1$ is higher than all others; therefore, it is selected as the most influential node. As previously mentioned, the y-axis represents the amount of heat received, so a node that receives more heat rises higher. A comparison of node $v_1$ in the two diagrams reveals that $v_1$ is higher in Fig. 2(b) than in Fig. 3(b); that is, $v_1$ receives more heat in Fig. 2(b). This implies that the heat diffusion kernel based on interactive and non-interactive activities (9) performs better than the heat diffusion kernel based on non-interactive activity (12), because the nodes under the former receive more heat. A modified kernel is now obtained by combining the two kernels in such a way that the most influential node receives more heat (i.e., it is higher on the y-axis). This kernel is expressed as (13).
Fig. 4(a) is the topological structure of the proposed kernel, drawn using Fig. 1 and (13). As can be seen, the weights on the edges change when (13) is applied. The heat diffusion diagram with heat source $v_7$, plotted using (8), is shown in Fig. 4(b). In this diagram, too, the heat source follows a descending trend over time as it diffuses heat to other nodes, and node $v_1$, with more links, again has more heat and rises more than the other nodes. Accordingly, node $v_1$ is selected as the most influential node under the proposed kernel, as under the previously presented kernels.
In Fig. 5, the heat diffusion kernel based on interactive and non-interactive activities is compared with the proposed kernel, i.e., the diagrams of Fig. 2(b) and Fig. 4(b) are overlaid. In this diagram, too, node $v_7$ is the heat source; a continuous line represents the heat source under the proposed method, and a dotted line represents the heat source under the method based on interactive and non-interactive activities. Node $v_1$ (the most influential node) is marked with triangles for the proposed method and with circles for the method based on interactive and non-interactive activities. As seen in Fig. 5, the triangle curve is higher than the circle curve, i.e., node $v_1$ receives more heat under the proposed method. So the proposed kernel performs better than the heat diffusion kernel based on interactive and non-interactive activities.

IV. EVALUATION
In this section, two real data sets, Revije.paj and EIES.paj, are selected; the data are analyzed using Pajek software, and the method is implemented in MATLAB. After that, the TOPSIS method, a multiple-criteria decision-making method defined in terms of a positive ideal solution and a negative ideal solution, is used for evaluation. The TOPSIS evaluation and prioritization procedure is as follows [4]: The first step is the development of a decision matrix; that is, using m criteria, n indexes will be evaluated.
The second step is normalization of the decision matrix. To that end, (14) is used, which is a vector method.
The third step is constructing the weighted normalized decision matrix, where the weight of the criteria is calculated by (15).
In the fourth step, the positive and negative ideals are calculated. The highest performance of each index (positive ideal) and the lowest performance of each index (negative ideal) are represented by $A^+$ and $A^-$, respectively. These two ideals are calculated in (16) and (17).
In the fifth step, the Euclidean distances of each criterion from the positive ideal and the negative ideal are calculated by (18) and (19), respectively.
The sixth step is the calculation of the relative closeness of each criterion to the ideal solution, which is obtained by (20).
Prioritization is based on the value of the relative closeness $C_i$, where $0 \le C_i \le 1$. A value close to one indicates the highest rank, and a value close to zero indicates the lowest rank.
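The six steps above can be sketched compactly in code. The sketch uses the textbook TOPSIS formulas (vector normalization, weighted matrix, column-wise best and worst values as ideals, Euclidean distances, closeness $d^- / (d^+ + d^-)$) and assumes they correspond to (14) to (20). The function name `topsis`, the equal weights and the centrality values for three nodes are hypothetical; the paper computes its weights by (15), which is not reproduced here.

```python
import math


def topsis(matrix, weights):
    """Return the relative closeness of each alternative (row).

    All criteria (columns) are treated as benefit criteria,
    i.e., larger values are better.
    """
    n_cols = len(weights)
    # Step 2: vector normalization of each column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_cols)]
    # Step 3: weighted normalized decision matrix.
    V = [[weights[j] * row[j] / norms[j] for j in range(n_cols)]
         for row in matrix]
    # Step 4: positive and negative ideal solutions.
    a_pos = [max(row[j] for row in V) for j in range(n_cols)]
    a_neg = [min(row[j] for row in V) for j in range(n_cols)]
    # Steps 5 and 6: distances to the ideals and relative closeness.
    scores = []
    for row in V:
        d_pos = math.dist(row, a_pos)
        d_neg = math.dist(row, a_neg)
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.5)
    return scores


# Step 1: decision matrix of hypothetical centrality values
# (degree, betweenness, closeness, eigenvector) for three nodes.
centralities = {
    "v1": [6, 0.50, 0.70, 0.60],
    "v2": [2, 0.10, 0.40, 0.20],
    "v3": [1, 0.00, 0.30, 0.10],
}
scores = topsis(list(centralities.values()), [0.25] * 4)
ranking = sorted(zip(centralities, scores), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(name, round(score, 4))
```

In this toy input one node is best on every criterion and one is worst on every criterion, so they coincide with the positive and negative ideals and receive closeness 1 and 0; the remaining node falls strictly in between.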

A. Revije Data Set
Revije is a data set of Slovenian magazines and journals published in 1999 and 2000³ [15]. 124 different magazines and journals were listed, and over 100,000 people were asked which magazines and journals they read. A network is created from this data set in which the magazines are the vertices. In this network, edges are directed and weighted; the loop at each vertex corresponds to the readers of that magazine. If a reader has read two or more magazines, then those magazines are linked together, and the weight on the link indicates the number of readers of both magazines. The topological structure of this data set is shown in Fig. 6. The data set of magazines and journals has interactive and non-interactive activities and includes 124 vertices and 12068 edges. The highest node degree is 123, which is attained by nodes $v_1$, $v_2$, $v_4$, $v_6$, $v_{27}$, $v_{42}$, $v_{113}$ and $v_{119}$, and the lowest degree is 18, which belongs to node $v_{50}$. The ten highest nodes prioritized by the TOPSIS model are shown in Table 1. Node $v_4$, which has the highest priority in the TOPSIS model, is taken as the heat source. The top 10 nodes with the heat source $v_4$ for the Revije data set are shown in Table 2, in order of priority.
By comparing Tables 1 and 2, it can be seen that in all methods node $v_4$ has the first priority, but the other priorities differ. For example, the second priority is node $v_1$ in the TOPSIS method and node $v_{119}$ in the heat diffusion method. Hence, the heat diagrams of the Revije data set for nodes $v_1$ and $v_{119}$ are drawn in Fig. 7, and the more influential node is determined from them. For readability of the diagrams, the heat source (node $v_4$) is drawn only for the proposed heat diffusion kernel; the heat diffusion kernel based on interactive and non-interactive activities is denoted by $\varphi_1$, the heat diffusion kernel based on non-interactive activity by $\varphi_2$, and the proposed heat diffusion kernel by $\varphi_3$.
As shown in Fig. 7, nodes $v_1$ and $v_{119}$ are compared under the above-mentioned heat diffusion models in diagrams (a) and (b). Node $v_{119}$, shown in Fig. 7(b), receives more heat than node $v_1$, shown in Fig. 7(a), and thus has a steeper ascending trend. So node $v_{119}$ is more influential than node $v_1$ and has a higher priority. Hence, for the diffusion of advertisements and news, node $v_{119}$ is selected first and node $v_1$ second.

B. EIES Data Set
The EIES data set⁴ [15], from the Wasserman and Faust data collection, is the second real data set considered in this paper. This data set also has interactive and non-interactive activities; its communication network is directed and weighted and has 32 nodes and 460 edges. Node $v_1$, with degree 29, and node $v_{25}$, with degree 6, have the highest and lowest degrees in this data set, respectively. Fig. 8 shows the topological structure of the data set. In Table 3, the top 10 nodes based on the TOPSIS model are shown in order of priority.
According to Table 3, node $v_1$ has the highest priority in the TOPSIS model; accordingly, Table 4 gives the order of priority of the top 10 nodes for the EIES data set using $v_1$ as the heat source.
Comparing Tables 3 and 4, it can be seen that the first priority in all methods belongs to node $v_1$, but the other priorities vary. For example, the second priority is $v_{29}$ in some methods and $v_{32}$ in others. Thus, the heat diffusion diagrams of the EIES data set for nodes $v_{29}$ and $v_{32}$ are drawn in Fig. 9, and the most influential node is found from the diagrams. For readability of the diagrams, the heat source (node $v_1$) is drawn only for the proposed heat diffusion kernel; the heat diffusion kernel based on interactive and non-interactive activities is denoted by $\varphi_1$, the heat diffusion kernel based on non-interactive activity by $\varphi_2$, and the proposed heat diffusion kernel by $\varphi_3$.
By comparing diagrams (a) and (b) in Fig. 9, it is clear that the two diagrams differ little. If they are compared with diagram (c), which relates to node $v_8$ (the third priority of the TOPSIS model), it can be stated that, for the EIES data set, node $v_8$ has a lower priority than node $v_{32}$. Therefore, nodes $v_{29}$, $v_1$ and $v_{32}$ have the best conditions for diffusion in social networks and are the most appropriate for spreading ideas, news or advertisements.

V. CONCLUSIONS
A key problem in the field of social networks is finding k influential nodes in a network of individuals, so as to benefit from the influence of these individuals across the entire network and diffuse news to as many neighbors as possible in the best and fastest way. In the Method section, three different heat diffusion kernels were defined: the heat diffusion kernel based on interactive and non-interactive activities, the heat diffusion kernel based on non-interactive activity, and the proposed heat diffusion kernel. In the Evaluation section, influential nodes were then found for the two data sets Revije and EIES (more specifically, node $v_4$ for the Revije data set and node $v_1$ for the EIES data set). In this section, the best heat diffusion kernel is identified using Fig. 10. As stated previously, for readability of the diagrams, the heat diffusion kernel based on interactive and non-interactive activities is denoted by $\varphi_1$, the heat diffusion kernel based on non-interactive activity by $\varphi_2$, and the proposed heat diffusion kernel by $\varphi_3$.
Fig. 10(a) shows the diagram for the heat source node $v_4$ in the Revije data set, where the curve of $\varphi_3$, the proposed diffusion kernel, is the highest; in other words, the proposed kernel yields more heat than the other two kernels. In this diagram, the heat diffusion kernel based on non-interactive activity is lower than the other kernels.
Fig. 10(b) shows the diagram for the heat source node $v_1$ in the EIES data set, in which the proposed diffusion kernel ($\varphi_3$) again rises more than the other two kernels, i.e., in the EIES data set the proposed kernel also yields more heat. In this diagram, too, the heat diffusion kernel based on non-interactive activity is lower than the other kernels. Therefore, the proposed heat diffusion kernel rises higher and yields more heat in both the Revije and EIES data sets. Thus, the order of the kernels from the lowest to the highest priority is as follows:
1) Heat diffusion kernel based on non-interactive activity.
2) Heat diffusion kernel based on interactive and non-interactive activities.
3) Proposed heat diffusion kernel.

Fig. 2. Heat diffusion based on interactive and non-interactive activities.

Fig. 7. Comparing the second priority for models presented in Revije.
Fig. 9. Comparing top nodes with heat source $v_1$ for models presented in EIES.
Fig. 10. Comparison of kernels to determine the highest priority of the kernel.
³ http://vlado.fmf.uni-lj.si/pub/networks/data.
In the Revije data set, the highest degree (123) is attained by nodes $v_1$, $v_2$, $v_4$, $v_6$, $v_{27}$, $v_{42}$, $v_{113}$ and $v_{119}$, and the lowest degree is 18, which belongs to node $v_{50}$. The ten highest nodes prioritized by the TOPSIS model are shown in Table 1. Node $v_4$, which has the highest priority in the TOPSIS model, is taken as the heat source. The top 10 nodes with the heat source $v_4$ for the Revije data set are shown in Table 2, in order of priority.

TABLE III. THE ARRANGEMENT OF 10 SUPERIOR NODES BY THE TOPSIS MODEL FOR EIES DATA SET

TABLE IV. THE ARRANGEMENT OF 10 SUPERIOR NODES WITH THE HEAT SOURCE V1 FOR EIES DATA SET