Measuring Homophily in Social Network: Identification of Flow of Inspiring Influence under New Vistas of Evolutionary Dynamics

Interaction with different people leads to different kinds of ideas and sharing, or to nourishing effects that may influence others to believe, trust, or even join an association and subsequently become members of that community. Membership in turn gives access to the social privileges of the community. This grouping of similar objects can be observed, and implemented, on any social network. The concept of homophily can help to design the affiliation graph (of similar and closely similar entities) of every member of a social network and thus to identify the most popular community. In this paper we propose and discuss three-tier data mining algorithms for a social network and its evolutionary dynamics from the perspective of graph properties (embeddedness, betweenness and graph occupancy). A novel contribution of the proposal is the incorporation of the principle of evolutionary dynamics to investigate the graph properties. The work is also extended towards specific introspection about the distribution of the impact and incentives of the evolutionary algorithm for social-network-based events. The experiments demonstrate the interplay between on-line strategies and social network occupancy through which participants maximize their individual profit levels.

Keywords: Homophily; Affiliation; Embeddedness; Betweenness; Graph occupancy; Evolutionary dynamics


I. INTRODUCTION
Different properties of a social network demonstrate a potential interplay between events, participants and the social network itself.

There are a number of instances where an attribute of a social network drives the application area of the network itself [21]. Hence, certain properties of social networks have an emerging impact [27], and homophily-like behavior is definitely one of them.

Prior research demonstrates the impressive role of such behavior in the application and analysis of social networks [28]. Homophily [20], the tendency of individuals to form associations with individuals of similar socio-cultural background, is the basic governing structural component of any social network and has been the focus of many social network studies [2] [7]. Social network studies reveal that "social networks are homogeneous with regard to many socio-demographic, behavioral and interpersonal characteristics" [3]. Any existing social network, such as Facebook or Twitter, offers common functional attributes such as 'posting photos', 'sending messages', 'likes', 'dislikes', etc. These kinds of activities lead to the concept of affiliation towards a community [20]. From the standpoint of social propagation, Facebook and Twitter are dedicated to disseminating information, and thus the concept of the Twitter follower graph and the cascading of influence also reinforce the hypothesis of different influence measurement models [4]. Considering the broader definition of the problem, this paper finds a close similarity between graph theory and the homophilic structure of a social network, and explores the empirical significance of influence propagation, or popular community ranking and detection. The paper validates the existing graph postulates with a proposed mining algorithm and a simulation applied to a Facebook data set. The investigation yields significant results with regard to popular community structure and ranking based on classical graph theory properties such as path traversal, betweenness and embeddedness of social network nodes. After initial validation through graph simulation, the work is extended with a self-organizing and evolutionary principle that can dynamically trace the variants of a social network. The role of evolutionary dynamics [20] is also significant, as it is defined as the study of the mathematical principles according to which life has evolved and continues to evolve. This evolution is also visible in the formation of the social graph. Reinforcement and validation of graph properties are demonstrated through algorithmic strategies derived from evolutionary dynamics [13] [14].

The remaining part of the paper is organized as follows: Section 2 elaborates the statement of the problem with the parameters of graph theory, followed by existing methodologies and examples of social and affiliation graphs in Section 2A; the motivation of the analysis is discussed in Section 2B. Section 3 describes the mathematical treatments behind the proposed algorithm, which is presented in Section 4. Section 5 discusses the data set used for the experiments and its implications for the graph properties of the social network. Section 5.1 introduces the role of evolutionary dynamics in validating simple graph properties for social network instances, which may bear on the mining of graph-related inferences. Finally, Section 6 concludes and mentions further scope for relevant research on the paradigm.

II. STATEMENT OF THE PROBLEM
www.ijacsa.thesai.org
A community is formed in order to propagate, transfer or share knowledge across the network. To disperse the utility of one's community, it is necessary to apply techniques, such as voting or polling, through which knowledge about the particular community is shared. When forming a community, certain basic parameters, along with the links among the members of the community, are to be considered. The parameters and links are as follows:

A. Friend List
Friendship develops on a social site based on some common factor, such as members belonging to the same school, working area or community. People with similar characteristics can be grouped into a common structure, and a few members of the group can also belong to some other structure based on their choice [8]. The homophily test of friendship on a social site can be interpreted with the help of the following example. Let there be a network where a fraction "m" of all individuals are male and a fraction "f" of all individuals are female. Considering a given edge in this network, if we independently assign each node the gender male with probability "m" and the gender female with probability "f", then both ends of the edge will be male with probability "m²" and, similarly, both ends will be female with probability "f²". If the first end of the edge is male and the second end is female, or vice versa, then the edge is a 'cross-gender edge'. This occurs with probability "2mf". Thus the test for homophily according to gender can be summarized as follows: if the fraction of cross-gender edges is less than "2mf", then homophily is present [8].
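
The gender test above can be checked numerically. The following is a minimal Python sketch, assuming the network is given as a node-to-gender mapping and an undirected edge list; the function name `cross_gender_homophily` and the six-node example are illustrative only and are not taken from the data set used in this paper.

```python
from typing import Dict, Hashable, Iterable, Tuple


def cross_gender_homophily(
    gender: Dict[Hashable, str],
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> Tuple[float, float, bool]:
    """Return (observed cross-gender fraction, expected 2mf, homophily flag)."""
    nodes = list(gender)
    # m and f are the fractions of male and female individuals.
    m = sum(1 for n in nodes if gender[n] == "M") / len(nodes)
    f = 1.0 - m
    edge_list = list(edges)
    # An edge is cross-gender when its two end points differ in gender.
    cross = sum(1 for u, v in edge_list if gender[u] != gender[v])
    observed = cross / len(edge_list)
    expected = 2 * m * f
    # Homophily is indicated when fewer cross-gender edges occur than expected.
    return observed, expected, observed < expected


# Hypothetical example: two tightly knit groups joined by a single edge.
gender = {"a": "M", "b": "M", "c": "M", "d": "F", "e": "F", "f": "F"}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("c", "d")]
observed, expected, homophily = cross_gender_homophily(gender, edges)
```

In this example only one of the six edges is cross-gender, which is below the value of 2mf for m = f = 0.5, so the test reports homophily.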

B. Community Links
A community can be created by a group of members through selection and social influence. The tendency of people to form friendships with others who are like them is termed selection [8]. The selection criteria are mainly race, ethnicity or similar characteristics. People may also modify their behaviors to bring them more closely into alignment with the behavior of their friends. This process is described as socialization and social influence [8].
Similar individual characteristics drive the formation of links, whereas social influence is a mechanism by which the existing links in the network serve to shape people's characteristics.

C. Graph occupancy period: Initial time and final time
It is the amount of time spent visiting a node. The duration of remaining in a focus is calculated as the difference between the final time and the initial time. This indicates the graph occupancy period.

D. Paths and connectivity
The social scientist John Barnes described graph theory as a "terminological jungle, in which any newcomer may plant a tree" [8]. A path is defined as a sequence of nodes with the property that each consecutive pair in the sequence is connected by an edge. A path can also be regarded as containing not just the nodes but also the sequence of edges linking these nodes. A graph is connected if, for every pair of nodes, there is a path between them. If a graph is not connected, then it separates into a set of connected pieces. A connected component of a graph is a subset of the nodes such that the following two properties hold [8]:
1) Every node in the subset has a path to every other [8].
2) The subset is not part of some larger set with the property that every node can reach every other [8].
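
The two properties above can be illustrated with a short Python sketch that collects connected components. It assumes an undirected graph stored as an adjacency dictionary in which every node appears as a key; the function name and the example graph are hypothetical.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph as a list of sets."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Grow one component outward from an unvisited node.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        # No further node is reachable, so the set is maximal (property 2).
        components.append(component)
    return components


# Hypothetical graph with three connected pieces.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": ["E"], "E": ["D"], "F": []}
components = connected_components(adj)
```

A graph is connected exactly when this function returns a single component.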

E. Path Traversed
Path traversal is important in the spread of information. It is necessary to examine whether something flowing through a network has to travel just a few hops or many. The "length of a path" is the number of steps it contains from beginning to end, that is, the number of edges in the sequence that comprises it. The path traversal technique used here is breadth-first search. The method discovers nodes layer by layer, building each new layer from the nodes that are connected to at least one node in the previous layer. Since it searches the graph outward from a starting node, reaching the closest nodes first, it is named breadth-first search [8].
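
The layer-by-layer discovery described above can be sketched in Python as follows. The adjacency-dictionary representation, the function name `bfs_layers` and the five-node example are assumptions made for illustration.

```python
def bfs_layers(adj, source):
    """Return the breadth-first layers of the graph, starting from source.

    Layer k holds the nodes whose shortest path from source has length k.
    """
    layers = [[source]]
    seen = {source}
    while True:
        next_layer = []
        # Build the new layer from neighbours of the previous layer.
        for node in layers[-1]:
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_layer.append(neighbour)
        if not next_layer:
            return layers
        layers.append(next_layer)


# Hypothetical undirected graph.
adj = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
layers = bfs_layers(adj, "A")
```

The index of the layer in which a node appears is the number of hops that information starting at the source needs to reach it.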

F. Embeddedness
The number of common neighbors that the two end points of an edge have is referred to as the embeddedness of that edge. This is illustrated with the help of a schematic diagram in Schema 1.

Schema 1. An affiliation network case
Here, the edge between node A and node B has an embeddedness of two, since A and B have two common neighbors, node G and node H. The concept of embeddedness thus indicates that if two individuals are connected by an embedded edge, it is easier for them to develop trust and confidence for transferring vital information or interacting with each other [8].
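
As a small illustration, embeddedness can be computed as the size of the intersection of two neighbor sets. In the Python sketch below only the nodes A, B, G and H come from the text; the remaining node and all edges are assumed, since Schema 1 itself is not reproduced here.

```python
def common_neighbours(adj, u, v):
    """Return the sorted list of nodes adjacent to both u and v."""
    return sorted((set(adj[u]) & set(adj[v])) - {u, v})


def embeddedness(adj, u, v):
    """Embeddedness of the edge (u, v): the number of common neighbours."""
    return len(common_neighbours(adj, u, v))


# Assumed adjacency: A and B share the neighbours G and H.
adj = {
    "A": {"B", "C", "G", "H"},
    "B": {"A", "G", "H"},
    "C": {"A"},
    "G": {"A", "B"},
    "H": {"A", "B"},
}
shared = common_neighbours(adj, "A", "B")
score = embeddedness(adj, "A", "B")
```

An edge with an embeddedness of zero, such as A-C in this example, has no mutual contact to support trust between its end points.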

G. Betweenness
The betweenness of a node is the total amount of flow that it carries when a unit of flow between each pair of nodes is divided up evenly over the shortest paths. Nodes with high betweenness occupy critical roles in the network structure. To compute betweenness efficiently we use breadth-first search. For a given graph structure, betweenness is calculated from the perspective of one node at a time. For each node, the total flow from that node to all others is distributed over the edges. This technique is applied to every node, and the flows from all of them are then added up to get the betweenness of every edge [8] (shown in Schema 2).
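
The procedure above can be sketched in Python. This is one possible reading of the breadth-first method, not necessarily the implementation used in this paper: for each source, shortest paths are counted downward, one unit of flow per reachable node is pushed back upward in proportion to those counts, and the totals are halved because every pair of nodes is visited from both ends. The function name and the example graphs are hypothetical.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Edge betweenness of an undirected, unweighted graph (node -> neighbours)."""
    total = defaultdict(float)
    for source in adj:
        # Breadth-first search: distances, shortest-path counts, predecessors.
        dist = {source: 0}
        sigma = defaultdict(float)
        sigma[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    # Paths to w = sum of paths to the nodes directly above it.
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Work upward from the deepest layer, splitting each node's flow
        # over its incoming edges in proportion to the path counts.
        flow = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + flow[w])
                total[tuple(sorted((v, w)))] += share
                flow[v] += share
    # Each pair of nodes was counted once from each end point.
    return {edge: value / 2.0 for edge, value in total.items()}


# Hypothetical examples: a three-node path and a four-node cycle.
path_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
square = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
path_scores = edge_betweenness(path_graph)
square_scores = edge_betweenness(square)
```

In the cycle, the two opposite pairs each have two shortest paths, so their unit of flow is split evenly and every edge ends with the same score.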

H. Affiliation
Affiliation, a concept associated with the homophily graph [8] [9], can be used to represent the participation of a set of people in a set of foci (each representing some kind of community). For example, node A, representing a person, could participate in focus X through an edge. Such a graph is called an affiliation network, since it represents the affiliation of people (on the left) with foci (on the right). An affiliation network is an example of a bipartite graph. Bipartite graph: a graph is said to be bipartite if its nodes can be divided into two sets in such a way that every edge connects a node in one set to a node in the other set [8].

Schema 3 is an example showing nodes A and B, representing people, participating in foci such as Literature Club and Soft Computing.
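
The bipartite property of an affiliation network can be verified by attempting a two-colouring of the graph. The Python sketch below is illustrative: the people A and B and the foci Literature Club and Soft Computing are named in the text, but which person participates in which focus is an assumption, as is the function name.

```python
from collections import deque


def two_colouring(adj):
    """Return a node -> {0, 1} colouring if the graph is bipartite, else None."""
    colour = {}
    for start in adj:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in colour:
                    # Neighbours must land in the opposite set.
                    colour[neighbour] = 1 - colour[node]
                    queue.append(neighbour)
                elif colour[neighbour] == colour[node]:
                    # An edge inside one set: the graph is not bipartite.
                    return None
    return colour


# Assumed participation edges between people and foci.
affiliation = {
    "A": ["Literature Club", "Soft Computing"],
    "B": ["Soft Computing"],
    "Literature Club": ["A"],
    "Soft Computing": ["A", "B"],
}
colouring = two_colouring(affiliation)

# A friendship triangle among three people cannot be two-coloured.
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
triangle_colouring = two_colouring(triangle)
```

In the affiliation example the two colours separate the people from the foci, whereas a friendship triangle among three people has no valid colouring.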

I. Co-evolution of social and affiliation networks
New friendship links are formed, and people become associated with new foci, over time. This can lead to a kind of co-evolution that reflects both the selection choices of each individual and their social influence. For example, if two people belong to the same focus, then there is a probability that they become friends and influence each other through the communities they belong to. In the graph-theoretic representation, nodes are used for both people and foci, and the difference is expressed by distinct types of edges. The first is an edge in a social network, which connects two people and indicates friendship. The second is an edge in an affiliation network; the combined structure is usually known as a 'social-affiliation network'. This edge connects a person to a focus and designates the participation of the person in the focus. These two types of edges are illustrated in Schema 3.

III. RECENT TRENDS AND MOTIVATION
While exploring incentive generation, the basic economic model of incentive distribution has become a crucial trend to study. Research has already revealed the impact of incentives on worker self-selection in a controlled laboratory experiment. Subjects face a choice between a fixed and a variable payment scheme. Depending on the treatment, the variable payment is a piece rate, a tournament, or a revenue-sharing scheme [22] [23].
The extension of applying the graph properties of social networks is broadly inspired by Julia Poncela Casasnovas's research [24]. She addressed the study of the evolution of cooperation on complex networks, using different social dilemmas. Emphasizing mainly the Prisoner's Dilemma game as a metaphor of the problem, her research analysed the possible outcomes of the dynamics depending on the underlying topology. Very recently, in 2013, it was pointed out that topology not only highlights homophily and other associated properties but also leads towards the role discovery problem. The role discovery problem [25] is to find groups of nodes that share similar topological structure in the graph. But is it only topology, or also the economics of incentives, that quantifies the distribution of payoff in a network? We investigated the significant work on evolutionary games of Nowak et al. [26] and found that payoff determines reproductive rate: successful individuals have a higher payoff and produce more offspring. Still, it cannot be assured that payoff is also signified by the carrying capacity of individual participants in networked games. Finally, this work adopts the emerging strategies of evolutionary-game-based incentive distribution for any instance of a social network under test.

IV. EXPLORING MATHEMATICAL TREATMENTS
The most influential community (focus) can be determined by the frequency of the clicks made by the individual nodes. However, visiting a focus, and being a follower and then a member of that focus, are two different things. Visiting a node could mean only collecting information, while being a member means making the community well known. As such, to find the most influential focus we need to find all possible paths from the source to the destination and find the shortest path by computing the betweenness values. Considering the graph one node at a time, the distribution of the total flow over the edges from that node to the other nodes can be computed. Hence, the betweenness of every node can be calculated as follows:

Betweenness of every node = ∑ flows from different nodes

The shortest path traversed from the source to the destination can be found by the breadth-first search algorithm. The number of shortest paths to each node is the sum of the number of shortest paths to all nodes directly above it in the breadth-first search [8]. The valuation of the members of a community or focus can be denoted by the time spent on it by a particular node. Here lies the concept of graph occupancy, which is calculated as the time difference between the final time of leaving the focus and the initial time of entering the focus. Mathematically, this can be expressed as follows:

$$O_t = T_t^{\text{final}} - T_t^{\text{initial}} \qquad (1)$$

where t = {1, 2, …, m}. Since we concentrate on each node when calculating the betweenness, and thereby find the frequency of participants in making a focus famous, we have a set of betweenness (B_i) and frequency (F_i) values, where F_i describes the frequency of acceptance of a focus by different nodes. Based on the number of choices made by each individual, we can select the most beneficial or famous (leader) communities in terms of the population focus. This entire logic is expressed mathematically in equation (3), where p is the popularity of a node and C is a constant for each delay unit process. In equation (3), the number of paths to be followed to reach the destination is obtained by the betweenness calculation and denoted B_i, and the frequency of the particular node is denoted F_i. The summation of these two values yields the minimum possible path to be traversed to indicate the popularity of the community. The maximum popularity of a community can be found by the product of this result; in other words, by applying the concept of the Max-Min function, the maximum popularity of a focus is obtained for the minimum path travelled from the source to the destination.
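
The graph occupancy and frequency quantities described above can be tabulated from a visit log. The following Python sketch is illustrative only: the record format (node, focus, initial time, final time), the function name and the sample values are assumptions, and the popularity expression itself is not reproduced.

```python
from collections import defaultdict


def occupancy_and_frequency(visits):
    """Aggregate visit records into per-focus occupancy time and visit frequency.

    Each record is (node, focus, initial_time, final_time); the occupancy of one
    visit is the final time of leaving minus the initial time of entering.
    """
    occupancy = defaultdict(float)
    frequency = defaultdict(int)
    for node, focus, t_initial, t_final in visits:
        if t_final < t_initial:
            raise ValueError("final time precedes initial time")
        occupancy[focus] += t_final - t_initial
        frequency[focus] += 1
    return dict(occupancy), dict(frequency)


# Hypothetical visit log for two foci, X and Y (times in arbitrary units).
visits = [
    ("n1", "X", 0.0, 4.0),
    ("n2", "X", 1.0, 2.5),
    ("n1", "Y", 5.0, 6.0),
]
occupancy, frequency = occupancy_and_frequency(visits)
```

Here focus X accumulates more occupancy time and more visits than focus Y, which is the kind of evidence the text uses to rank foci.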

V. PROPOSED ALGORITHM I
Finding the most influential community can be achieved through the following algorithm, which has distinct blocks to evaluate the popularity and influence of node(s). The algorithm uses certain unconventional graph properties such as betweenness and embeddedness.

VI. IMPLEMENTATION AND ANALYSIS
The data set for validating the proposed model was collected from the Facebook community network. Facebook was chosen for its close resemblance, in terms of nodes, betweenness and edges, to classical graph theory models. A set of nine homophilic nodes, at a particular time, was selected for initial validation. In this set, each node is connected to its influential nodes, as denoted by the links and their weights. Communities belong to the nodes, and each node tries to promote its own communities.
To identify the most influential focus or community, we need to find the shortest path (which also motivates the other nodes to join the community and increases its popularity). The embeddedness of the nodes is also considered, to identify the common neighbors among them. Table 1 shows the parameters considered for the proposed algorithm. The output values in the table were computed after implementing the algorithm (using MATLAB version 7.10.0.499 (R2010a)). The detailed explanations of the parameters are as follows:

Friend list: 9 different nodes represent 9 different friends with their communities.

Link present: edges among the nodes, with weights.

Embeddedness: A→B edge having common neighbors. Common neighbors may or may not be present. If not present, this is denoted by NIL. If present, the node number is given.

Betweenness: involves reasoning about the set of all shortest paths between pairs of nodes.

Graph occupancy: amount of time spent on a node, leading to the further popularity of a community and increasing the node linkage.

The corresponding outputs are given in Table 1. To crawl Facebook, we implemented a distributed, multi-threaded crawler using Python with support for remote method invocation (RMI) [11]. Facebook provides a feature to show 10 randomly selected users from a given regional network; we performed repeated queries to this service to gather 50 user IDs to "seed" our breadth-first searches of social links on each network [11].

A. Role of Evolutionary Dynamics
Based on the initial simulation, it has been demonstrated that there exists a strong cohesion between graph theory and social networks in the context of homophilic community detection. Further investigation is needed into specific attributes for further homophily identification from the perspective of either directed or undirected graphs. The extension of the algorithm will be significant for quantifying the application-specific investigation of homophily.
The extension could also revalidate the correlation of graph properties in a social network.
Inspired by the contribution to evolutionary game theory by Nowak and his colleagues, the study of several nonlinear characteristics has started adopting the concept of evolutionary dynamics [12] [13]. Evolutionary dynamics are defined by nonlinear differential equations and can therefore be imported for revalidating complex graphs and networks along with the growth of a social graph, as mentioned in the first part of the algorithm. An evolutionary dynamic assigns to each population game F an ordinary differential equation [14] on the simplex x. One simple and general way to define an evolutionary dynamic is via a growth rate function g, where g represents the (absolute) growth rate of strategy i as a function of the current payoff that rewards the strategy. The previous algorithm only considered the shortest path, breadth-first search and graph frequency. Since evolutionary dynamics can also capture the growth, propagation and mutation of messages under any state of the graph, the conventional graph properties have been revalidated using the extended algorithm. The proposed algorithm tries to incorporate the potential strength of evolutionary dynamics for simulating the behavior and growth of the social graph structure. A series of more precise plots was obtained after simulation of the extended algorithm (Figures 4-8).

Persistence across the network also indicates the out-degree, and the average number of distinct tags, of groups and of tag assignments of users having k_out neighbours can be evaluated from the social network as shown. Users who have more contacts in the social network also tend to be more active in terms of tags and groups. The average number of distributive tags is denoted n_t, the group tags on a specific message n_g, and n_w represents the list of predefined choices. Correlations between the activity of participants and their number of declared friends and neighbours can be identified; here also the k_out neighbours have been taken into consideration. The data has been log-binned (Figure 3).

(5) /* The probability Y_i(t) that at least one mutation has occurred while the system was at state i before time t */
3. Initialize the system with N classes of social network instances at time t = 0, with its initial start of connection, k_out (out-degree of the graph), occupancy on the graph, and termination instance; evaluate: /* where ρ_i(t) is the probability that the social network is at state i, P(t) is the sample occupancy, and μ represents the probable mutation rate of messages across participants according to the proportion of occupation */
4. Sample the next mutation time according to the cumulative probability P(t). This can be done via the inversion method, such that the next time is t_m = P^{-1}(r), where r is a uniform random variable between 0 and 1.
5. Add t_m to the current time of occupancy of participants and betweenness.
6. Choose the specific score and plot according to their respective node transition of the network, and update the state of the system as per Step 4.
7. Remove extinct and redundant classes from the list and reduce the number N of classes accordingly.
8. Return to Step 1 until finished.

2 It is simply a function of two variables, i and j, which are integers; when, for each social network instance, the cardinality of the variables is large, it can assist approximate inference with a constrained, lower-complexity, adaptively sized sum for the target cardinal value [10].

In the case of a large-scale network like Facebook, the most conventionally investigated mixing pattern involves the degree (number of neighbours) of nodes. This type of mixing refers to the likelihood that users with a given number of neighbours connect with users of similar degree. This property is emphasized by computing multi-point degree correlation functions. Complementary cumulative conditional distributions for group tagging, specific message tagging and predefined choice tagging are compared with the global cumulative distributions, denoted by black lines (Figure 4). Even among the subset of users with a given k_out, a strong disparity is still observed in the amount of activity, and also around a specific community. Subsequently, the occupancy on a specific graph instance results in the following plot: a log-log plot of the distribution of the contact durations and of the cumulated duration of all the contacts that two individuals m and n have over a day (w_mn). An interesting inference can be drawn: 88% of the total contacts lasted less than 1 minute on a specific tag or comment, but more than 0.2% persisted for more than 5 minutes on a specific topic of interest. For the cumulated durations, 64% of the total durations of contacts between two individuals during one day last less than 2 minutes, but 9% last more than 10 minutes and 0.38% more than 1 hour. The small symbols correspond to the actual distributions, and the large symbols to the log-binned distributions [18] (Figure 5). The lower figures compare the developer and project degree distributions in log-log coordinates. The R² of the developer degree distribution from the simulated data in the lower figure is 0.959, and the R² of the project degree distribution from the simulated data in the lower figure is 0.7657. Also, the largest project size in the simulated data is just 1500. We can further lower this value by tuning the mutation parameter μ [17], P(t), and the graph function of the extended algorithm. H_log represents the homophily log evaluated from messages exchanged on any specific and common interest [17].
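
Step 4 of Algorithm II samples the next mutation time by the inversion method, t_m = P^{-1}(r). The form of P(t) is not given here, so the Python sketch below assumes, purely for illustration, an exponential waiting time P(t) = 1 - exp(-μt) with a constant mutation rate μ; the function names are hypothetical.

```python
import math
import random


def inverse_exponential_cdf(r, mu):
    """Invert the assumed cumulative probability P(t) = 1 - exp(-mu * t)."""
    if not 0.0 <= r < 1.0:
        raise ValueError("r must lie in [0, 1)")
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    return -math.log(1.0 - r) / mu


def sample_next_mutation_time(mu, rng):
    """Draw t_m = P^{-1}(r), with r uniform on [0, 1)."""
    return inverse_exponential_cdf(rng.random(), mu)


# Assumed mutation rate; the seed only makes the draw repeatable.
rng = random.Random(0)
t_m = sample_next_mutation_time(2.0, rng)
current_time = 0.0 + t_m  # Step 5: add t_m to the current time.
```

For a different P(t) only `inverse_exponential_cdf` would change; when no closed-form inverse exists, a numerical root search on P(t) - r = 0 plays the same role.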
The distribution of the trend on the population graph of the new attributes concerning people refers to a category of homophily structures (shown by the red line) in which the majority of people in the population have lower levels of the attribute. Gradually, as time elapses, the most influential message or broadcast configuration propagates across the population (indicated by the green line), and subsequently the evolutionary dynamics push towards higher levels of the attribute with a certain incentive. Those who participate and interact are the natural recipients of incentives in the social network. In addition, there are drifts of incentive distribution under different settings. Hence, after a considerable period of evolutionary dynamics, drift increases at the high end of the attribute for the homophilic structure and the distribution reverts to normal. At this stage of the simulation, it was not evident how appropriate interaction could entail a higher range of incentives among the participants. Therefore, a specific data set was chosen to exhibit the different interplay among the social participants. The observations show the frequencies of cooperators (indicated by blue), deviators (indicated by red) and loners (represented as yellow) under smooth participation. These are measured against the multiplication factor r, which reflects the trust in discussion and interaction. Individuals are arranged on random regular graphs where each node has eight neighbors, and they interact in randomly formed groups of size N = 5. For small multiplication factors, lone individuals (loners) dominate. The reason is simple: in this case, even in a group of cooperators, the payoffs do not exceed the incentive of a lone individual. All of the proposed components of the three algorithms move towards a dynamic equilibrium over the population of the social network. As the multiplication factor grows with trust, the payoff increases over a very small range of r, and above the threshold deviators exist. Only for a much larger value of r ≈ 4.056 do cooperators reappear and co-exist with deviators. Since lone individuals are absent, the dynamics again reduce from voluntary participation to compulsory interaction. Finally, for r = 4.6, cooperators take over and manage to displace deviators (see Figure 9 and Table 1).
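
The argument for small multiplication factors can be made concrete with a one-group payoff calculation. The payoff rule is not stated above, so the Python sketch below assumes the standard public goods rule: each cooperator contributes a fixed cost, the pool is multiplied by r and shared equally among all participants, and a loner receives a fixed payoff. The contribution cost and the loner payoff are assumed values, and the function name is hypothetical.

```python
def public_goods_payoffs(n_coop, n_defect, r, cost=1.0, loner_payoff=1.0):
    """Payoffs in one group under an assumed standard public goods rule.

    n_coop cooperators each pay `cost`; the pool is multiplied by r and shared
    equally among the n_coop + n_defect participants. Loners do not take part
    and receive the fixed `loner_payoff`.
    """
    participants = n_coop + n_defect
    if participants == 0:
        raise ValueError("at least one participant is required")
    share = r * cost * n_coop / participants
    return {
        "cooperator": share - cost,  # receives the share but paid the cost
        "deviator": share,           # receives the share without contributing
        "loner": loner_payoff,
    }


# A group of N = 5 cooperators under a small and a larger multiplication factor.
low_r = public_goods_payoffs(n_coop=5, n_defect=0, r=1.5)
high_r = public_goods_payoffs(n_coop=5, n_defect=0, r=3.0)
```

Under these assumptions a full group of cooperators earns r - 1 each, so for small r the fixed loner payoff is higher, which is consistent with the dominance of loners described above; a deviator in the same group always earns more than a cooperator.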

Table 1. Configuring design of Incentives on SN
The proposed design involves 4 stages and 4 groups, described below and summarized in the following table:
• Group 1: This group comprises more than 65 subjects. The default initial value of the participating reward is 10 units, and the final value is measured as Ω.
• Group 2: This group comprises almost 50 subjects with an initial incentive, but in subsequent sessions the experience differs, with different fractions of incentives depending on the frequency of interaction, as shown in Figure 8.
• Group n: This group has more than 100 subjects; in addition to the normal incentive distribution, there are different treatments for the incentive: either a donation or an additional amount that the participants put in.

B. Comment on Final Homophily Structure
The homophily evaluation from the social graph is based entirely on the increase in posts, tags and other social network action artifacts. Keeping in mind the nonlinear aspects and growth strategy of the social network, evolutionary dynamics has made a comparatively better visualization of the density of attributes possible. In particular, betweenness, graph occupancy and mutation could be among the key factors. Recent research by Tom A.B. Snijders and his colleagues [15] demonstrated the relation between evolutionary dynamics and homophily of social networks in common aspects such as friendship, recommendation, group study and selection. Figure 7 exhibits the trend of precise visualization of the homophily structure as guided by the extended algorithm with mutation as the prime parameter. Table 2 should be consulted to understand the relationship shown in Figure 8: the red circle signifies higher cluster density and therefore provides the remaining count of homophily over the population.

Affiliation, being an important factor of the homophily graph, has a relevant role in promoting the focus through the nodes. The idea is how a friend on Facebook or any social network can influence his or her friends to join the community he or she belongs to. This paper investigates such possibilities by exploring the homophily community in a social network using a graph-property-based algorithm and simulation. Further, the incorporation of evolutionary dynamics also contributed to investigating the homophily property with better approximation. Subsequently, the growth of the social network, its temporal behavior and its trend can also be investigated by augmenting the existing evolutionary dynamics algorithm. On the other hand, the ranking algorithm or a conventional classifier model can also be extended using the initial attributes of embeddedness and graph traversal with graph occupancy time. Soft-computing-based (fuzzy and rough set) homophily identification from graph properties could be an emerging research topic in computational social networks. Finally, in the extended analysis, certain significant observations are made in terms of maximizing benefit or profit, based on participants' roles at specific instances in the social network. The experiments incorporated certain public data sources to demonstrate the mutual interplay of social network participants, their specific contribution to the network and their share of incentive, if any.
• Friend list
• Community links
• Graph occupancy period: initial time and final time
• Paths and connectivity
• Path traversed
• Embeddedness
• Betweenness of nodes
• Affiliation
• Co-evolution of social and affiliation networks

Schema 2: Local betweenness (the local betweenness of actor 1 is 2)

Schema 3. Affiliation through bipartite graph.

Schema 3 (a): A, B and C are three different persons.
Schema 3 (b): A and B represent people, but C denotes a focus.
is non-linear relation with betweenness.

Fig. 1. Nodes with their links and weights

Figure 1 represents nine different influential nodes with their links, shown as a directed graph. Each edge has its weight marked on it. Here, the shortest path has been considered from a source node, Node 1, to the destination node, Node 6. Time duration, that is, the amount of time spent by the nodes in their communities, can be an invoking factor for joining their community. Using equation (3), Figure 2 demonstrates that the most influencing or inspiring node increases the number of functionally active nodes. The figure shows that the inspired nodes have added more connections with other nodes.

Fig. 2. Enhancements in new connections with the existing invoking nodes

By definition, a bin of constant logarithmic width means that the logarithm of the upper edge of a bin (x_{i+1}) is equal to the logarithm of the lower edge of that bin (x_i) plus the bin width (b) [19]. Here, the symbols indicate the average, and the error bars the near-optimal 25th and 75th percentiles for each bin. The algorithm was implemented in Python 2.7.3, which was released in April 2012. The implementation also uses the automatic numbering of fields in the str.format() method, which is reflected in the post-implementation stage of the algorithm.

Algorithm II: Graph Pattern (G, P(t), μ)
1. Define the initial state of the graph, i.e. define G_i = δ_ij /* δ_ij is the Kronecker symbol (footnote 2) */
2. Solve for P_i(t), which provides the functions P(t) and ρ_i(t) /* P(t): probability of sample occupancy time for t
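
The log-binning rule stated above, log(x_{i+1}) = log(x_i) + b, can be sketched in Python as follows. The helper names, the base-10 logarithm and the sample points are assumptions made for illustration; the bins here only report the average per bin, not the percentiles.

```python
import bisect
import math


def log_bin_edges(x_min, x_max, b):
    """Bin edges with constant logarithmic width b (base 10) covering [x_min, x_max]."""
    if x_min <= 0 or x_max <= x_min or b <= 0:
        raise ValueError("need 0 < x_min < x_max and b > 0")
    n_bins = math.ceil((math.log10(x_max) - math.log10(x_min)) / b)
    # log10(edge[k + 1]) = log10(edge[k]) + b by construction.
    return [x_min * 10 ** (b * k) for k in range(n_bins + 1)]


def log_binned_means(points, edges):
    """Average y per bin for (x, y) points; empty bins yield None."""
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for x, y in points:
        i = bisect.bisect_right(edges, x) - 1
        if i == len(edges) - 1 and x == edges[-1]:
            i -= 1  # place the right end point in the last bin
        if 0 <= i < len(counts):
            sums[i] += y
            counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical (x, y) observations, e.g. out-degree versus activity.
edges = log_bin_edges(1.0, 100.0, 1.0)
means = log_binned_means([(2, 1.0), (5, 3.0), (20, 10.0)], edges)
```

Because the bins widen geometrically, sparse observations at large x are pooled into fewer, better populated bins, which is why log-binned curves look smoother on log-log axes.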

Fig. 5. Occupancy Graph for Social Network with Mutation Probability of Messages

Degree distributions of evolutionary dynamics and the empirical data: 'o' represents the data points from the simulated data and 'x' represents the data points from the empirical data.

The results of the degree distributions of the extended algorithm with dynamic values of mutation are shown in Figure 6. The upper figures compare the developer and project degree distributions in linear coordinates.

Fig. 9. Simulation of Pay off distribution

Values from the betweenness lead to finding the shortest path
7. Calculating the shortest path based on breadth-first search
8. Select a random node which has visited a particular community node at least once