Analysis of Coauthorship Network in Political Science using Centrality Measures

In recent years, networked data have grown massively and formed complex structures. Data scientists analyze such complex networks to understand their structure in a meaningful way, and there is a need to identify how these networks enable communication through that structure. Social network analysis provides methods to explore and analyze complex networks using graph theory, network properties and community detection algorithms. In this paper, an analysis of the coauthorship networks of the Public Relations and Public Administration subjects of the Microsoft Academic Graph (MAG) is presented, using common centrality measures. The authors belong to research and academic institutes all over the world. Cohesive groups of authors are identified and ranked on the basis of centrality measures, namely betweenness, degree, PageRank and closeness. Experimental results reveal authors who are strong in a specific domain, have strong field knowledge and maintain collaboration with their peers in the fields of Public Relations and Public Administration.

www.ijacsa.thesai.org

II. RELATED WORK
Modularity-based methods divide a complex network into small groups called modules. A high modularity value means that the nodes within each module are densely connected to one another, with comparatively few connections between modules. Shang et al. [13] proposed MIGA (Modularity and Improved Genetic Algorithm) to overcome the difficulty that hill climbing has in finding an optimal solution on large-scale network problems. MIGA has low computational time and can detect more than half of the partition when prior information is supplied, using a simulated annealing method.
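As a rough illustration of the modularity value referred to above, the following minimal sketch computes Newman's modularity Q for a given partition of an undirected, unweighted graph. The function name `modularity`, the toy graph and its partition are illustrative assumptions and are not taken from the cited works.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both endpoints inside community c
    degree_sum = defaultdict(int)  # sum of the degrees of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (internal edge fraction - expected fraction)
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
toy_partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q = modularity(toy_edges, toy_partition)
```

For the toy partition, each triangle holds 3 of the 7 edges and half of the total degree, so Q = 2(3/7 - 1/4) = 5/14.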
Sutaria et al. proposed a community detection algorithm that finds communities on the basis of modularity class [14].
Newman proposed the CNM algorithm for discovering nonoverlapping and overlapping communities [15]. Palla et al. described the cumulative distribution functions P(scom), P(dcom), P(sov) and P(m) of four basic quantities. Each node i of the network is characterized by a membership number ni, the number of communities to which it belongs. Two communities α and β can share nodes, and this overlap is characterized by its size. Palla et al. used the k-clique method for finding communities in a network. The benefit of this method over divisive and agglomerative methods is that it allows the construction of an unconstrained network of communities [16].
Karsten et al. used a simple approach that combines common-neighbor similarity, topological clustering-coefficient similarity and node-attribute similarity on a directed weighted graph. Clustering-coefficient similarity is based on the clustering coefficient of each node and measures the contribution of the connectedness among neighboring nodes. Common-neighbor similarity captures the overall connectedness between the immediate neighbors of nodes by substituting the neighbors. Finally, node-attribute similarity is used to compute edge weights from node attributes [17]. Yang et al. proposed an approach that used the spectral clustering algorithm to compare network communities quantitatively. Thirteen different communities were examined and divided into four classes. The methodology was used for comparing networks based on real data and examining their robustness [18]. The authors found that this method reliably detected ground-truth communities.
Qiu et al. proposed a ranking algorithm called ocdRank for finding overlapping communities in social networks. The algorithm combines overlapping community detection with community member ranking in heterogeneous social networks. Results show that ocdRank has low time complexity and detects better community structure than other community detection methods [19].

III. PROPOSED METHODOLOGY
The proposed analysis methodology consists of three steps. First, the data are collected from the Microsoft Academic Graph (MAG). Second, the data are preprocessed and transformed into the required form. Third, centrality measures are applied and the authors in each field are ranked. We chose two fields of Political Science, Public Relations and Public Administration, and analyzed them using the most common centrality measures. The goal of the study is to find the most prominent group of authors in each field and to rank these authors according to their work in that field. The methodology is applied to each field in turn, as discussed in the subsequent sections.
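The preprocessing step can be sketched as follows. This is a minimal illustration that assumes each paper is available as a list of author identifiers; the function names and the toy records are hypothetical and do not reflect the actual MAG schema or the scripts used in this study.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthorship_edges(papers):
    """Map each unordered author pair to the number of papers they share.

    papers: iterable of author-ID lists, one list per paper (assumed format).
    """
    weights = defaultdict(int)
    for authors in papers:
        unique = sorted(set(authors))         # drop repeated IDs within one paper
        for a, b in combinations(unique, 2):  # every coauthor pair exactly once
            weights[(a, b)] += 1
    return dict(weights)


def to_adjacency(edge_weights):
    """Undirected adjacency sets built from the weighted edge dictionary."""
    adj = defaultdict(set)
    for a, b in edge_weights:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


# Hypothetical toy records; real MAG author IDs look different.
toy_papers = [["A1", "A2", "A3"], ["A1", "A2"], ["A4"]]
toy_edges = build_coauthorship_edges(toy_papers)
toy_adjacency = to_adjacency(toy_edges)
```

In this sketch a single-author paper contributes no edge, so its author does not appear in the graph; whether such authors should be kept as isolated nodes is a design choice that the sketch leaves open.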

A. Ranking Authors based on Degree Centrality
Degree centrality is used to find the nodes with the highest degree, and it highlights the scientists who collaborate most. The average degree of the public relations network is 2.683. Most researchers have a low degree and a few have a high degree, as shown in Table II. The author "14674B35-DanckerDLDaamen", affiliated with Leiden University, has the highest influence and collaborates frequently with 47 other researchers, as shown in Figure 2. "14674B35-DanckerDLDaamen" has worked exclusively in public opinion, which is a subfield of public relations. The second most influential author is "7FF2291D-DarrelMontero", who is affiliated with Arizona State University.
We extracted the graph of the top 10 researchers by degree and their connected researchers, as shown in Figure 3. This graph contains 426 researchers and 1146 collaborations. Its average degree is 5.38, its network diameter is 4, its modularity is 0.7 and it has 11 connected components. The modularity value shows that this graph has good community structure. In Figure 5, the most productive institute is the University of Missouri. "7F4328BD-GlenTCameron" has degree 38 and is ranked 4th among the top ten researchers by degree, collaborating with 41 other researchers. The author has collaborated with the University of Missouri, the Missouri School of Journalism and the University of Georgia, with 19, 6 and 1 publications, respectively, so his most productive research is with the University of Missouri. The second most productive institute is the University of Minnesota. "7E654E5D-DavidPFan" has degree 27 and is ranked 10th among the top ten researchers by degree, collaborating with 28 other researchers. The author is affiliated with the University of Minnesota and has eleven publications.
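The ranking and subgraph extraction described above can be sketched as follows. The helper names and the toy graph are illustrative assumptions, and taking the induced subgraph on the top-k researchers and their neighbors is one plausible reading of "top 10 researchers and their connected researchers". The reported size of the extracted graph (426 researchers, 1146 collaborations) is consistent with the stated average degree under the usual definition 2E/N.

```python
def degree_ranking(adj):
    """Nodes sorted by decreasing degree; ties broken by node ID."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))


def top_k_neighbourhood(adj, k):
    """Induced subgraph on the k highest-degree nodes and their neighbours."""
    top = degree_ranking(adj)[:k]
    keep = set(top)
    for n in top:
        keep |= adj[n]
    return {n: adj[n] & keep for n in keep}


def average_degree(num_nodes, num_edges):
    """Average degree of an undirected graph: 2E / N."""
    return 2 * num_edges / num_nodes


# Hypothetical toy graph: hub "h" with three neighbours plus a short tail.
toy_adj = {
    "h": {"a", "b", "c"},
    "a": {"h"},
    "b": {"h"},
    "c": {"h", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
toy_sub = top_k_neighbourhood(toy_adj, 1)

# Check against the figures reported in the text for the top-10 degree graph.
reported_average = average_degree(426, 1146)
```

Here `reported_average` evaluates to about 5.38, matching the value given for the graph in Figure 3.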

B. Ranking Authors based on Betweenness Centrality
Betweenness centrality assigns the highest values to the nodes that lie on the largest number of shortest paths. The network diameter of the public relations network is 41 and the average path length is 12.833. The majority of researchers have zero or near-zero betweenness, while some researchers have high betweenness, which shows that they are responsible for the flow of knowledge from one community to another. Table III shows the top 10 researchers with the highest betweenness in the field of public relations. Figure 4 shows the corresponding graph, which contains 119 researchers and 129 collaborations. Its network diameter is 8, its average path length is 4.32, its modularity is 0.754 and it has 2 connected components. The most influential author is "771B6FCA-DietramAScheufele", who is affiliated with "University Of Wisconsin Madison", "Nanyang Technological University", "Ohio State University", "Cornell University", "University of Washington" and "University of Wisconsin Madison School of Journalism Mass Communication". This author is the most central researcher: he lies on shortest paths between other researchers and also collaborates frequently, being ranked 9th in degree centrality.
Node "80EE5C66-JeongnamKim" is the second most central researcher and collaborates frequently. He is ranked 25th in degree centrality and is affiliated with "Purdue University", "University Of Houston", "University Of Maryland College Park", "Hankuk University of Foreign Studies", "University Of Siena", "Hong Kong Baptist University", "Indiana University", "Kansas State University" and "San Diego State University". He has worked in multiple subfields of public relations, such as "Reputation", "Soft Power" and "News Media", and has collaborated with 13 other researchers. In Figure 6, the most productive institute is the University of Georgia, as it has the highest number of publications. "76015751-BryanHReber" has degree 14 and is ranked 6th among the top 10 betweenness researchers, collaborating with 15 other researchers. He has collaborated with the University of Georgia, the University of Alabama, the Missouri School of Journalism, the University of Florida and the University of Maryland College Park, with 11, 8, 4, 2 and 1 publications, respectively. The second most productive institute is Purdue University. "80EE5C66-JeongnamKim" has degree 12 and is ranked second among the top 10 betweenness researchers. He collaborates with 13 other researchers across Purdue University, the University of Maryland College Park, the University of Houston, Hankuk University of Foreign Studies, the University of Siena, Hong Kong Baptist University, Kansas State University, San Diego State University and Indiana University, with 7, 3, 2, 1, 1, 1, 1, 1 and 1 publications, respectively.
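For reference, betweenness centrality on an unweighted, undirected graph can be computed with Brandes' algorithm, sketched below. This is an illustrative implementation under the assumption that the graph is given as adjacency sets; the normalization by (n-1)(n-2)/2 is a common convention and is an assumption here, since the exact settings used in this study are not stated.

```python
from collections import deque


def betweenness_centrality(adj, normalized=False):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj: dict mapping every node to the set of its neighbours.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}    # dependency of s on each node
        while stack:                     # accumulate from the farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    scale = 0.5                          # each undirected pair was counted twice
    if normalized and n > 2:
        scale /= (n - 1) * (n - 2) / 2   # divide by the number of other pairs
    return {v: b * scale for v, b in bc.items()}


# Hypothetical toy graph: the path a - b - c - d - e.
toy_path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
            "d": {"c", "e"}, "e": {"d"}}
toy_bc = betweenness_centrality(toy_path)
```

On the toy path, the middle node lies on the shortest paths of four node pairs and the end nodes on none, which mirrors the pattern reported above of many researchers with zero betweenness and a few bridging researchers with high values.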

C. Ranking Authors based on Closeness Centrality
The author "771B6FCA-DietramAScheufele" is the most central researcher and is ranked first in both betweenness and closeness centrality, as shown in Table IV. He has worked exclusively in public opinion, which is a subfield of public relations. "7CF2B524-DoohunChoi" is the second most central researcher. The graph of the top 10 researchers by closeness centrality is shown in Figure 8. This graph contains 105 researchers and 121 collaborations. Its network diameter is 7, its average path length is 4.073, its modularity is 0.683 and it has a single connected component. Figure 7 shows that the most productive institute is "03FD8454-University Of Maryland College Park". The researchers "0916F08B-James E Grunig", "7CF3C0D4-Elizabeth L Toth" and "7E3071EE-Beyling Sha" are affiliated with "03FD8454-University Of Maryland College Park" and have 4, 3 and 1 publications, respectively. The second most productive institute is "0D109F83-Purdue University". The researcher "80EE5C66-Jeongnam Kim" is affiliated with "0D109F83-Purdue University" and has 7 publications.
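Closeness centrality, network diameter and average path length are all derived from shortest-path distances, which for an unweighted graph can be obtained by breadth-first search. The sketch below is illustrative only. It assumes adjacency sets as input and, as a further assumption, computes closeness within each node's own connected component, since the handling of disconnected networks in this study is not stated.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """Reachable nodes divided by the sum of distances to them."""
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        reachable = len(dist) - 1        # exclude the node itself
        result[v] = reachable / total if total > 0 else 0.0
    return result


def diameter_and_average_path_length(adj):
    """Longest and mean shortest-path length over all connected node pairs."""
    longest, total, pairs = 0, 0, 0
    for v in adj:
        for w, d in bfs_distances(adj, v).items():
            if w != v:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, (total / pairs if pairs else 0.0)


# Hypothetical toy graph: the path a - b - c - d - e.
toy_path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
            "d": {"c", "e"}, "e": {"d"}}
toy_closeness = closeness_centrality(toy_path)
toy_diameter, toy_avg_path = diameter_and_average_path_length(toy_path)
```

On the toy path, the middle node has the highest closeness because its total distance to all other nodes is smallest.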

D. Ranking Authors based on PageRank
We discuss the top ten researchers with the highest PageRank centrality in "Public Relations-025B78CE", as shown in Table V. Figure 9 shows the researcher "7F4328BD-GlenTCameron", who has the highest PageRank and has worked in the "03FEE94E-Media Relations", "09820AAE-Communication Management", "09BDF000-Corporate Communication" and "071FA02B-Journalism" fields. "7F4328BD-GlenTCameron" has three affiliations: "04946B1E-University of Missouri", "061FEB1F-Missouri School of Journalism" and "09E0E324-University of Georgia". Figure 9 contains 281 nodes

A. Ranking Authors based on Degree Centrality
The average degree of the public administration network is 3.924. In this field, most researchers have a low degree and some have a high degree.
The public administration author "12F4FDCC-Eds", affiliated with "Centro Agronomico Tropical De Investigacion Y Ensenanza", has the highest influence and collaborates frequently with 110 researchers, as shown in Table VI and Figure 10. "12F4FDCC-Eds" has worked prominently in "0B2F54F0-Kenya", "0A51FEF5-Refugee" and "034E1111-International Law", which are subfields of public administration.
The second most influential and most frequently collaborating author is "7EBE0990-RobertEBlack", affiliated with "0A183231-Johns School of Public Health". He has also worked with other affiliations, namely "05B090CE-University of California Berkeley", "08A948CC-Johns Hopkins University" and "4FBCBEC0-United Nations High Commissioner For Refugees". "7EBE0990-RobertEBlack" has worked in the "0A51FEF5-Refugee", "0AAE1030-Containment", "0B2F54F0-Kenya" and "063ABE50-Displaced Person" fields, which are subfields of public administration, and has collaborated with 70 other researchers. The community of the top ten researchers by degree and their connected researchers is shown in Figure 11. This graph contains 1263 researchers and 1389 collaborations. Its average degree is 2.2, its network diameter is 6, its modularity is 0.829 and it has four connected components. The most productive institutes in the community of the top 10 highest-degree researchers of public administration are "339CD1B3-Vu University Amsterdam", "0A183231-Johns Hopkins Bloomberg School Of Public Health", "070B5E86-Aga Khan University" and so on, as shown in Figure 12. "0CAEADF8-Vu" has degree 146 and is ranked 3rd, "7EBE0990-RobertEBlack" has degree 159 and is ranked 2nd, and "781D4EE0-ZulfiqarABhutta" has degree 93 and is ranked 9th; they are affiliated with "339CD1B3-Vu University Amsterdam", "0A183231-Johns Hopkins Bloomberg School Of Public Health" and "070B5E86-Aga Khan University", respectively.

B. Ranking Authors based on Betweenness Centrality
The network diameter of the public administration network is 34 and the average path length is 24.833. The highest normalized betweenness is 4.44E-03 and the lowest is zero. The author "7EBE0990-RobertEBlack" has collaborated with "0C45A054-Diarrhoeal Disease Research Bangladesh", "0A183231-Johns School Of Public Health", "08A948CC-Johns Hopkins University", "070B5E86-Aga Khan University", "4CED0A71-World Health Organization", "05628CAA-Medical Research Council", "043B0D41-London School Of Hygiene Tropical Medicine", "4CEF40CE-Save The Children" and 21 other affiliations, and has the highest influence and frequent collaboration with 148 other researchers, as shown in Table VII. "0CAEADF8-Vu" is the second most central researcher and collaborates frequently: he is ranked 3rd in degree centrality with degree 146 and is affiliated with "339CD1B3-Vu University Amsterdam", "3653C029-Vu University Medical Center", "34DF872C-University Of Amsterdam", "00C86936-University Of Cantabria" and 17 other affiliations. The graph of the top ten researchers by betweenness centrality is shown in Figure 13. This graph contains 900 researchers and 944 collaborations.
The network diameter of the top 10 betweenness researchers graph is 10, its average path length is 4.83, its modularity is 0.832 and it has 2 connected components. The most productive institutes in the community of the top 10 betweenness researchers of public administration are "339CD1B3-Vu University Amsterdam", "0A183231-Johns Hopkins Bloomberg School Of Public Health", "070B5E86-Aga Khan University", "096500C2-University Of Cape Town" and 26 other affiliations, since they have a large number of publications, as shown in Figure 14. The researcher "0CAEADF8-Vu" belongs to "339CD1B3-Vu University Amsterdam" and has 24 publications. The researchers "7EBE0990-Robert E Black", "7B95835A-David H Peters", "80A44097-Jennifer Bryce" and "7DDF7540-Ronald H Gray" belong to "0A183231-Johns Hopkins Bloomberg School of Public Health" and have 12, 10, 5 and 3 publications, respectively. The researchers "7B222E50-Cesar G Victora" and "80A44097-Jennifer Bryce" belong to "0A1685A1-Universidade Federal De Pelotas" and have 12 and 1 publications, respectively. The researcher "781D4EE0-Zulfiqar A Bhutta" belongs to "070B5E86-Aga Khan University" and has 12 publications.
"7B95835A-David H Peters" and "7EBE0990-Robert E Black" belong to "0A183231-Johns Hopkins Bloomberg School Of Public Health" and they have 10 and 20 publications, respectively.

C. Ranking Authors based on Closeness Centrality
The public administration author "7EBE0990-RobertEBlack" is ranked first in closeness, as in betweenness, as shown in Table VIII. "7B95835A-DavidHPeters" is the second-ranked researcher and is responsible for spreading information frequently to other researchers in the network: he is ranked 8th in degree centrality with degree 99 and is prominently affiliated with "08A948CC-Johns Hopkins University", "0A183231-Johns School Of Public Health", "0992A59E-Makerere University School Of Public Health", "0AE9B3CC-Indian Institute Of Health Management Research" and 12 other affiliations, as shown in Figure 16.

D. Ranking Authors based on PageRank
The top-ranked researchers by PageRank are shown in Table IX. In the "Public Administration-002F8D8F" field, "7F404D7B-PeterDreier" has the highest PageRank and has more than 300 publications, written in collaboration with 63 researchers from different fields.
In the graph of the top 10 PageRank researchers, the most productive affiliation is "339CD1B3-Vu University Amsterdam" with 36 publications, as shown in Figure 18. We extracted the graph of the top 10 PageRank researchers and their connected researchers, as shown in Figure 17. This graph contains 645 nodes and 649 edges. Its network diameter is 4, its average path length is 2.407, its modularity is 0.847 and it has seven connected components.
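PageRank on an undirected coauthorship graph can be sketched with a simple power iteration, treating each collaboration as a pair of directed links. The sketch below is illustrative: the damping factor of 0.85, the tolerance and the iteration cap are conventional defaults assumed here, not values reported in this study.

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on an undirected graph given as adjacency sets.

    damping, tol and max_iter are assumed defaults, not values from the paper.
    """
    nodes = list(adj)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by isolated nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not adj[v])
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            if adj[v]:
                share = damping * rank[v] / len(adj[v])
                for w in adj[v]:
                    new_rank[w] += share   # v passes an equal share to each neighbour
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy graph: a star with hub "h" and three leaves.
toy_star = {"h": {"a", "b", "c"}, "a": {"h"}, "b": {"h"}, "c": {"h"}}
toy_rank = pagerank(toy_star)
```

In the toy star the hub receives the full share of every leaf and therefore the highest score, while the scores still sum to one.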

VII. DISCUSSION
Social network analysis has been widely explored to discover relationship patterns among individuals, teams, groups, societies, communication devices and even organizations. Such studies disclose patterns of association that support better decision making and a better understanding of the patterns in a graph. Analysis of coauthorship networks helps to identify the dynamic collaboration patterns that exist in a specific field. We applied centrality measures to two subfields of Political Science, namely Public Administration and Public Relations. We analyzed only two fields because of hardware limitations: the number of nodes is very large, and our computer is unable to process more than ten billion nodes. The data were collected from the Microsoft Academic Graph. We took 102975 papers related to the field of Public Relations and 143831 papers related to Public Administration. For the coauthorship network analysis, we selected data covering a time span of 16 years, from 2000 to 2016. We represented the graph as an adjacency matrix created using Python and R. We considered four common centrality measures for coauthorship network analysis and visualized the centralities and author communities using Gephi and R. The different centrality values of different authors reflect the collaborative patterns and trends occurring over the 16-year time span. Analysis of this large database of public administration and public relations authors revealed the top group of authors who collaborated frequently and diversely in both domains. Some authors hold a strong position in the network, which shows their strong influence on research collaboration and knowledge sharing. Our analysis is carried out for undirected, non-overlapping communities.
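The adjacency-matrix representation mentioned above can be illustrated with the following minimal sketch for an undirected, unweighted graph. The function name and the toy edge list are hypothetical and do not reproduce the scripts used in this study.

```python
def adjacency_matrix(edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph.

    edges: iterable of (u, v) pairs; returns (ordered node list, matrix).
    """
    edges = list(edges)
    nodes = sorted({n for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        i, j = index[u], index[v]
        matrix[i][j] = 1
        matrix[j][i] = 1   # undirected: mirror every entry
    return nodes, matrix


# Hypothetical toy edge list with made-up author IDs.
toy_edges = [("A1", "A2"), ("A1", "A3"), ("A2", "A3"), ("A3", "A4")]
toy_nodes, toy_matrix = adjacency_matrix(toy_edges)
toy_degrees = {n: sum(row) for n, row in zip(toy_nodes, toy_matrix)}
```

Each row sum gives the degree of the corresponding author. A dense matrix needs memory proportional to the square of the number of nodes, so for graphs of the size considered here a sparse representation would normally be preferable.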
In future work, we will analyze a directed graph of the coauthorship network, which will show not only frequent collaboration among coauthors but also the number of publications shared with each coauthor. There is also a gap in identifying overlapping collaboration among authors, because authors have research contributions in various fields. Other parameters, such as impact factor, number of publications and citation count, can also be used in overlapping community detection to identify and extract the dynamic collaborative patterns in a coauthorship network.