Identifying Central Nodes in Directed and Weighted Networks

An issue of critical interest in complex network analysis is the identification of key players or important nodes. Centrality measures quantify the notion of importance and hence provide a mechanism to rank nodes within a network. Several centrality measures have been proposed for un-weighted, undirected networks, but applying or modifying them for networks in which edges are weighted and directed is challenging. Existing centrality measures for weighted, directed networks are by and large domain-specific. Depending upon the application, these measures prefer either the incoming or the outgoing links of a node to measure its importance. In this paper, we introduce a new centrality measure, Affinity Centrality, that leverages both the weighted in-degrees and the weighted out-degrees of a node’s local neighborhood. A tuning parameter permits the user to give preference to a node’s neighbors in either the incoming or the outgoing direction. To evaluate the effectiveness of the proposed measure, we use three types of real-world networks: migration, trade, and animal social networks. Experimental results on these weighted, directed networks demonstrate that our centrality measure ranks nodes in closer agreement with the ground truth than the other established measures.
Keywords—Centrality; weighted network; directed network; migration network; world input output trade network; community structure


I. INTRODUCTION
Data analysts from diverse domains represent relationships or ties between entities using graph-based network models. The semantic meaning of nodes and ties is, however, domain-specific; in social networks where nodes represent individuals, ties might represent friendship or face-to-face communication [17], [2], whereas in web networks, ties signify the existence of hyperlinks between web pages [16]. In most real-world networks, ties are characterized by their strength as well as their direction. For instance, in world trade networks, where links between nations represent the exchange of commodities, tie strength is the cash flow and its direction indicates either import or export [6]. When both the strength and direction of ties are available, modeling the data as a weighted, directed network can be more informative.
Network models are generally deployed to explain or predict the behavior of entities [11]. One key requirement in these applications is to determine the 'most important' or 'central' node in a network. A centrality measure quantifies this notion of node importance and provides a means to rank nodes based on their importance. Central nodes are useful in varied applications such as predicting most cited authors [22], determining influential spreaders for product advertisement in online social networks [12], [25], detecting influential criminals [9], performing resilience analysis of power grid networks [13], locating key areas of activity in the urban infrastructure of a city [1], and traffic sampling for intrusion detection [28].
Several centrality measures have been formulated to quantify the notion of central nodes in un-weighted/ weighted, un-directed networks and are surveyed in [7], [3], [4], [5]. However, quantification of node centrality is more challenging in complex weighted and directed networks due to the dynamic effect of weighted reciprocal links on its computation. Very few measures exist for such networks, and the area remains under-explored.

A. The Problem and Motivation
PageRank (PR) proposed by Brin and Page to rank web pages is a popular and effective centrality measure [20], and there exist variations and extensions of PR for weighted, directed networks [27], [30]. These measures quantify the importance of a web page by iterative counting of the number and quality of its incoming links. The underlying assumption is that more important web pages have more incoming links from other central web pages. The problem is that this assumption, though correct for web pages, may not be valid for other domains. For example, in the migration networks, a state's importance in the network is affected not only by the incoming migrant population but also by the outgoing migrants from that state.
A pair of centrality measures that consider both incoming and outgoing links are computed through the Hyperlink-Induced Topic Search (HITS) algorithm for web pages. However, this method delivers two metrics: hub score and authority score [14]. A good hub page has outgoing links to many good authorities; a good authority page has incoming links from many good hub pages. Similarly, the recently proposed Bi-directional h-index also presents two measures, the $h_{in}$-index and the $h_{out}$-index, that give preference to incoming and outgoing links, respectively [29].
This raises a critical question regarding the importance of incoming versus outgoing links when computing the relative importance of a node. We conjecture that, in some domains, incoming links have more impact than outgoing links, whereas, in others, it is vice versa. This trade-off offers the opportunity to define a novel measure that can tune the relative importance between incoming and outgoing ties. Consider the example network shown in Fig. 1, modeled as a citation network where nodes are authors and a weighted link from author A to author B indicates the number of times A has cited B. In citation networks, the importance of an author is commensurate with the number of citations; therefore, incoming links should be given preference for computing centrality. Highly cited authors are more important, and if the citations are from other highly cited authors, then the importance should increase proportionately. In the example network, author A is the most central by virtue of receiving the most incoming links (citations). Authors C and D receive citations from two authors each. Although author C gets more citations than D, the centrality of D should be higher than that of C because D is cited by more highly cited authors.
On the other hand, if the network in Fig. 1 is an organizational network of employees and weighted outgoing link represents the number of tasks assigned by employee A to employee B, then outgoing links should be preferred for computing the importance. An employee at higher position supervises a large number of employees and has the privilege to assign more tasks to them. Such an employee has higher importance in the organization compared to others. Following this hypothesis, employee E is the most central because this node has maximum outgoing links. Between employees F and C with equal number of outgoing links, employee F should be considered more important than C because F receives tasks from other important employees.
In the same vein, for applications such as analysis of trade or migration networks, both incoming and outgoing links could be given user specified weightage.
Recognizing these requirements, we propose a new centrality measure called Affinity Centrality that determines the importance of a node based on preference and influence proportions of its local network. We propose an intuitive extension of the simple yet powerful weighted degree centrality by incorporating the neighbors' attachment to the node. The quantum of centrality contributed by a node's neighbor is decided by the relative proportion of its incoming/outgoing interactions. A tuning parameter permits the user to flexibly assign more weightage to either the in-neighbors or the out-neighbors of a node. Our centrality measure leverages only local node topology, which distinguishes it from the well-established PageRank and HITS methods. Despite its simplicity, the measure is able to rank nodes in better consonance with the ground truth than these established measures.

B. Our Contributions
We introduce Affinity Centrality (AC), a centrality measure for weighted and directed networks. The summary of contributions follows.
• We propose a tunable centrality measure for quantifying the importance of a node by combining the advantages offered by its neighbors' topology via incoming and outgoing links (Section III).
• We perform an extensive evaluation of AC on real-world migration and trade networks and compare its effectiveness with well-established centrality measures (Section IV-B).
• We demonstrate empirically the effect of the tuning parameter in capturing the relative importance of the incoming versus outgoing ties (Section IV-C).
• We evaluate the role of central nodes delivered by the proposed centrality measure on the community structure of real-world networks (Section IV-D).

C. Organization of the Paper
The paper is organized as follows: after a survey of centrality measures for weighted and directed networks in Section II, we present the proposed centrality measure (AC) in Section III. Section IV presents empirical investigations followed by conclusions in Section V.

II. RELATED WORK
Vast literature exists for centrality measures designed for un-weighted and un-directed networks [5], [19], [12], [10], [18]. However, computing centrality for weighted and directed networks still faces some gaps in terms of incorporation of the direction of interactions in the computation. We briefly describe the existing work for directed and weighted networks by dividing them into two categories viz. i) Local-neighborhood based and ii) Global network structure based measures.
In the local-neighborhood based class, a node's importance is computed based on its interaction with l-hop neighbors where l indicates the number of hops. Opsahl et al. proposed a generalized centrality method to incorporate impact of degree along with the strength of interactions using a tuning parameter which can be tuned to give importance to either of the two aspects [19]. However, the proposed mechanism considers either incoming or outgoing direction in computation. Neighborhood centrality computes importance based on the centrality of a node and its 2-hop neighbors' centrality for un-directed and weighted networks [18]. The absence of direction in computation reduces its applicability to directed networks.
Global network structure based methods consider the influence of all nodes on the importance of a pivot node. Two established algorithms in this category are HITS [14] and PageRank [20], which measure the probability of a random walker visiting a node on the web to assign a rank. HITS gives two scores, Hub and Authority, based on the direction considered, whereas PageRank ranks nodes using incoming interactions only. Various extensions to these two algorithms have been proposed to extend them to directed weighted networks [29], [30], [24]. Zhang et al. proposed a weighted PageRank algorithm for directed networks that incorporates the role of a node's degree, its strength, and the node information using a tuning parameter to compute its rank [30]. Wang et al. modified the efficiency centrality for un-directed and weighted networks and incorporated both the degree and distance of all the nodes in a network [26]. Singh et al. proposed hybrid node-weighted centrality measures based on closeness and decay measures and made use of node information along with edge weight to identify important nodes [23]. However, the high computational complexities of global network based algorithms make them unsuitable for large networks.
Designing an effective ranking measurement to capture the importance of nodes in a directed and weighted network is still an open challenge. Our proposed measure Affinity Centrality fills this gap by encapsulating both types of interactions along with their strength in the computation of the topological significance of a node in the network.

III. AFFINITY CENTRALITY FOR WEIGHTED AND DIRECTED NETWORK
This section describes the proposed centrality measure called Affinity Centrality that leverages auxiliary information in a node's 1-hop neighborhood to determine its importance.

A. Notations used
Let $G(V, E)$ be a weighted and directed network of order $N = |V|$ and size $M = |E|$, where $V$ denotes the vertex set and $E$ denotes the edge set. The network $G$ can be represented by an asymmetric weighted adjacency matrix $W := (w_{ij})$ of size $N \times N$. Each element $w_{ij} \in \mathbb{Z}^{+}$ represents the strength of the interaction from node $i$ to node $j$, and $w_{ij} = 0$ represents no interaction. We use $w_{i \to j}$ and $w_{i \leftarrow j}$ to refer to the strength of the outgoing and incoming ties of node $i$, respectively.
Let $O_i$ denote the total strength of all the outgoing ties and $I_i$ denote the total strength of all the incoming ties of a node $i$, i.e., $I_i = \sum_j w_{i \leftarrow j}$ and $O_i = \sum_j w_{i \to j}$. Hence, the total edge weight is $T = \sum_i I_i = \sum_i O_i$. When weights are unknown, $W = A$, where $A := (a_{ij})$ is the standard adjacency matrix having $a_{ij} = 1$ if nodes $i$ and $j$ are adjacent, and 0 otherwise. Notations used in the paper are detailed in Table I for ready reference.
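As an illustration of the notation, the following minimal Python sketch computes $I_i$, $O_i$, and $T$ from an edge list. The function name `strengths` and the toy edges are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def strengths(edges):
    """Return (I, O, T) for an iterable of (source, target, weight) edges.

    I[v] is the total in-strength of v, O[v] the total out-strength,
    and T the total edge weight of the network.
    """
    in_strength = defaultdict(int)
    out_strength = defaultdict(int)
    for i, j, w in edges:
        out_strength[i] += w  # edge i -> j adds w to O_i
        in_strength[j] += w   # the same edge adds w to I_j
    total = sum(out_strength.values())
    return dict(in_strength), dict(out_strength), total


# Hypothetical toy network (illustration only, not from the paper).
edges = [("a", "b", 3), ("b", "a", 1), ("a", "c", 2), ("c", "b", 4)]
I, O, T = strengths(edges)
```

Because every edge contributes its weight once to an out-strength and once to an in-strength, the two sums agree, which is the identity $T = \sum_i I_i = \sum_i O_i$ stated above.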

B. Constituents of Node Importance
A directed and weighted network, in general, consists of asymmetric interactions, and the direction of an interaction along with its weight emulates the preferential attachment of individuals in their neighborhood [8], [27], [19], [30]. The importance of a node depends upon its bonding with its local neighbors, which in turn depends upon the strength and direction of the interactions. We refer to the weight on an incoming edge as in-strength and the weight on an outgoing edge as out-strength.
i. Preference: The strength of an incoming tie of a node $i$ determines the endorsement from its neighbor $j$. The higher the value, the more preferentially attached the node $i$ is with node $j$. Also, the influence gained through preferential attachment increases if the endorsement for the node $j$ within its local neighborhood is high too. In other words, the resources gained by an individual show its power, which is captured by its total in-strength ($I_j$). Formally, the preference ($\beta_i$) of a node $i$ with neighborhood set $L_i$ is defined in Eq. 1.
ii. Influence: The strength of the outgoing ties of a node $i$ demonstrates its influence on its neighbors and captures its endorsement (preferences) for others. A higher value of $w_{i \to j} / I_j$ indicates a high influence of node $i$ on node $j$. Also, the influence of a node $i$ propagates in the network through its neighbors, which is captured through their out-strength $O_j$. Collective endorsement of the neighbors along with an individual's support impacts its influence on others. Formally, the influence $\gamma_i$ of a node $i$ is computed as given in Eq. 2.

C. Affinity Centrality
The importance of a node depends upon its structural position in the network which depends upon its interactions with neighbors. We compute the proposed affinity centrality (AC) by incorporating effect of preference and influence of neighbors on the node i, using a tuning parameter θ ∈ [0, 1] (Eq. 3). Note that θ gives flexibility to the end-user to include either of the in-strength and out-strength or both based on the application need.
Using θ = 1 reveals the influence of the in-degree neighborhood on a node's affinity, in contrast to θ = 0, which captures influence using the node's endorsement for its neighbors. Using θ = 0.5 incorporates the role of both influence and preference on the node's position in the network structure. The higher the position is, the more powerful/important that node is. For example, in trading, the importance of a supplier is dependent on imports as well as exports. Importing from the established suppliers increases its endorsement, whereas exporting to powerful vendors improves its position in the trade. Hence, θ = 0.5 is recommended in such scenarios. If influence is to be captured purely on the basis of imports or exports, then θ = 0 or θ = 1 is recommended.
To substantiate the argument, we rank nodes of the example network ( Fig. 1) by computing AC with varying values of θ. The ranks are shown in Table II. With θ = 0, only outgoing links are considered for capturing centrality; hence node E is assigned the highest rank, and node F is ranked above C. With θ = 1, outgoing links are ignored resulting in node A being ranked highest, and node D ranked above C. The results validate the motivation (see subsection I-A) and establish the theoretical formulation of the proposed centrality measure.
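The computation described in this section can be sketched in a few lines of Python. The text as available here does not reproduce Eqs. 1 to 3, so the sketch rests on an assumed reading of the prose: $\beta_i = \sum_{j \in L_i} (w_{i \leftarrow j} / O_j)\, I_j$, $\gamma_i = \sum_{j \in L_i} (w_{i \to j} / I_j)\, O_j$, and $AC_i = \theta \beta_i + (1 - \theta) \gamma_i$. These forms, the function names, and the toy edges are assumptions made for illustration and should be checked against the published equations.

```python
from collections import defaultdict


def affinity_centrality(edges, theta=0.5):
    """Sketch of AC under ASSUMED forms of Eqs. 1-3 (see lead-in).

    edges: iterable of (source, target, weight) with positive weights.
    theta = 1 keeps only the preference (incoming) component,
    theta = 0 keeps only the influence (outgoing) component.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    edges = list(edges)
    I = defaultdict(float)  # total in-strength per node
    O = defaultdict(float)  # total out-strength per node
    for i, j, w in edges:
        O[i] += w
        I[j] += w
    beta = defaultdict(float)   # assumed preference component
    gamma = defaultdict(float)  # assumed influence component
    for i, j, w in edges:
        # Edge i -> j is an outgoing tie of i and an incoming tie of j.
        gamma[i] += (w / I[j]) * O[j]  # I[j] >= w > 0, so no division by zero
        beta[j] += (w / O[i]) * I[i]   # O[i] >= w > 0, so no division by zero
    nodes = set(I) | set(O)
    return {v: theta * beta[v] + (1.0 - theta) * gamma[v] for v in nodes}


def rank(scores):
    """Rank 1 goes to the largest score; ties are broken by node label."""
    ordered = sorted(scores, key=lambda v: (-scores[v], str(v)))
    return {v: r for r, v in enumerate(ordered, start=1)}


# Hypothetical toy network (illustration only, not the network of Fig. 1).
edges = [("a", "b", 3), ("b", "a", 1), ("a", "c", 2), ("c", "b", 4)]
ranks_by_theta = {t: rank(affinity_centrality(edges, t)) for t in (0.0, 0.5, 1.0)}
```

Under these assumptions, each edge is visited a constant number of times, so the sketch runs in time linear in the number of edges and needs only per-node accumulators in addition to the edge list.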

D. Algorithmic Complexity
As the method exploits information of the nodes' neighborhood to quantify centrality, the computational complexity is O(M ). The proposed method is effective for handling large networks due to its O(M + N ) storage space requirements.

IV. EXPERIMENTAL EVALUATION
The goal of this section is to assess the performance of the proposed Affinity Centrality (AC) on the basis of the following questions.
i. How effective is the ranking delivered by the AC measure? We inspect this question in Section IV-B using six weighted and directed real networks for which ground truth can be crafted.
ii. How do in-strength and out-strength impact the ranking computed by the AC measure? This investigation demonstrates the role of the preference and influence components on the importance of a node using a small weighted and directed network.
iii. What is the role of topologically central nodes on the community structure? We examine this question by extracting the communities of the six networks and studying their evolution in terms of the important nodes delivered by the proposed centrality measure.
We evaluated the performance of the AC measure by comparing its results with two simple and widely used local centralities, viz. Weighted in-degree (WI) and Weighted out-degree (WO). We also compare the results with two global structure based algorithms, viz. PageRank [20] and HITS [14], for weighted, directed networks.
We implemented the proposed measure AC and the two variations of degree centrality, Weighted in-degree (WI) and Weighted out-degree (WO), in Python (64-bit, v3.6.9) and executed them on an Intel Core i3-4005U CPU @ 1.70GHz with 4GB RAM. We used the PageRank and HITS modules of the Python graph library networkx for comparison. Results of the experimentation are discussed in the following sub-sections after the description of the networks used.

A. Real-world Networks
We consider three types of directed and weighted networks (migration, trade, and animal networks) to investigate the effectiveness and stability of the proposed AC measure. These publicly available networks are described below.
iii. The Moreno Rhesus monkey grooming network represents a network of 16 monkeys (see footnote 4). The network consists of 16 nodes representing the monkeys, and a weighted edge from a monkey (say A) to another monkey (say B) represents the number of times monkey A groomed monkey B. We use this small network to demonstrate the role of the introduced tuning parameter (θ) on the node centrality.
Topological and structural properties of these networks are given in the Table III.

B. Effectiveness of the Affinity Centrality
We study the effectiveness of the proposed centrality measure AC by using six real networks in two categories: migration and trade. The topological characteristics of the networks are detailed in Table III. To evaluate the performance of the centrality measures, we consider the Gross Domestic Product (GDP) of a state/country as ground truth because it is the most commonly used measure of economic activity and stability during a period of time (typically 1 year). A higher GDP of a state/country indicates richness in terms of resources and services and raises the living standards of its residents by offering more jobs, business opportunities, etc.
We compare the performance of the AC measure with four popular centrality measures, viz. i) In-strength (WI), ii) Out-strength (WO), iii) Weighted PageRank (PR), and iv) Weighted HITS (both hub and authority scores). For each centrality measure, we rank nodes such that rank 1 is assigned to the largest value and so on. Spearman's rank correlation coefficient [21] is used to find the correlation of the computed node ranks with the ground-truth ranks, where the correlation value indicates the ability of a centrality measure to deliver correct ranks. Table IV shows the ranking assigned by the different measures to the top-10 ranked nodes as per GDP (first column in the table) for the six networks. Ranking by the AC measure is computed using θ = 0.5 to include an equal proportion of the two components, viz. preference and influence, in a node's importance. The last row of the table shows the correlation value. For all networks excluding Business2001, AC identifies the top-ranked nodes most accurately. Also, the rankings assigned by AC are in better agreement with the ground truth compared to the other measures, as indicated by the largest correlation (shown in bold). This is attributed to the inclusion of the relative importance of the two components in capturing importance. Hence, AC stands out as the most effective performer for capturing importance in weighted directed networks.
4. http://konect.cc/networks
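The evaluation step above (rank nodes by a score, then correlate the ranks with the ground-truth ranks) can be sketched as follows. This is a minimal pure-Python illustration of Spearman's rank correlation computed as the Pearson correlation of the two rank vectors; the function names and the numeric values are hypothetical and do not come from the experiments reported here.

```python
def ranks_desc(values):
    """Rank 1 goes to the largest value; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: -values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = average_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = ranks_desc(x), ranks_desc(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all ranks are equal")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical ground-truth values and centrality scores for four nodes.
ground_truth = [21.4, 14.3, 5.1, 3.9]
scores = [0.9, 0.7, 0.2, 0.4]
rho = spearman(ground_truth, scores)
```

Using the Pearson form on average ranks keeps the sketch valid when ties occur, which the shortcut formula based on squared rank differences does not handle.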

C. Effect of Tuning Parameter
This section demonstrates empirically the effect of the tuning parameter (θ) in capturing the relative importance of incoming versus outgoing ties using the animal network shown in Fig. 2, where edge size reflects the proportional edge weight. We use θ ∈ {0, 0.5, 1} to show the role of the two components, preference and influence, on the node ranks (shown in Table V). We examine the ego-networks of three nodes M1, M4, and M5 (Fig. 3) to study the role of θ on a node's importance. In the figures, I indicates the total in-strength and O indicates the total out-strength of the neighbor.
i. Table V shows that node M1 is assigned extreme ranks for θ = 0 and θ = 1, where θ = 0 includes only the node's influence component and θ = 1 only its preference component in the centrality computation. Consider Fig. 3a for the analysis of ranks for different θ values. When θ = 0 is used, the AC measure assigns a low rank to M1 because of its low influence on its outgoing neighbors M2 and M3, which themselves are less preferred nodes in their neighborhood (low value of I). On the other hand, θ = 1 results in a high rank because of the high preferences of its three neighbors (M2, M3, M6) for node M1, where preferences are in proportion to the in-strength of the neighbors (Fig. 3a). Assigning equal weightage to both components of importance (θ = 0.5) results in a middle rank, as both the preference from the neighbors and the influence on the neighbors together impact its importance.
ii. In contrast to node M1, the ranking of node M4 remains the same (although low) for all cases, as shown in Table V. It is always ranked lowest because of its minimal influence on its outgoing neighbor (M2) and the low proportion of preferences from its incoming neighbors M2, M14, and M15, which have high values of I and O (Fig. 3b).
iii. Node M5 has a larger in-degree and out-degree than node M1, but with low interaction strength. When θ = 0 is used, the high influence of node M5 on its influential neighbors with high out-strength raises its rank (Fig. 3c). For θ = 1, the node is ranked 12th, which is lower than the rank assigned to node M1. The lower rank is attributed to lower preferences from its neighbors, although the value of the in-degree is high (low value of in-strength and I). When θ = 0.5, node M5 is ranked in the middle due to the cumulative effect of both components on its structural position in its egonet.

[Fragment of a ranking table, likely Table IV: ranks assigned by six measures to the top-10 countries by GDP; column headings and the last entry of the final row are not recoverable.]
USA   1   1   1   1   3   1
CHN   3   3   3   3   2   2
JPN   6   6   7   9   6   7
DEU   2   2   2   2   4   3
GBR   5   5   6   5   8   8
FRA   4   4   5   4  10   5
BRA  19  18  19  21  16  15
ITA  10   8  10   6  12  11
RUS  13  19  11  16  14  20
IND  23  20  23  23  18

D. Central Nodes and Community Structure
The objective of this section is to detail the role of the central nodes delivered by the proposed measure AC on the evolution of communities. Communities provide a good insight into the connection patterns and binding among nodes. We use the community detection module Rbpots [15] of the Python library CDLIB to extract communities from the directed and weighted networks. Extracted communities are plotted using Paintmaps, where a color scheme is used to differentiate communities based on the interaction strength of the underlying nodes. We executed the Rbpots module to identify communities in four networks of the migration class and two networks of the trade class. The plots are shown in Fig. 4. We compared the extracted communities for two different years under the same class to understand their evolution in terms of change in node ranking. We describe below our observations using the top-10 nodes delivered by the measure AC (Table VI) for the two categories of networks.
i. Business-based migration network: Fig. 4a and 4b show the communities for networks Business2001