Analysis of Purchasing Tendency Using ID-POS Data of Social Login Users: Case Study of a Golf Portal Site

This study targets social login registrants on an EC site and aims to clarify the difference in purchasing tendency between social login registrants and general members by analyzing product purchasing history. The subject of this research is a golf portal site. The authors analyzed its purchasing data, comparing social login registrants with general members. The analysis showed that social login registrants and general members have different distributions regarding the number of purchases and purchase types. Moreover, social login registrants have a larger range of purchase types per purchase and purchase from a variety of genres. In addition, the authors analyzed the relationship between purchased products. Network analysis revealed the existence of specific product combinations (concentrated sets on the network) that are more readily purchased simultaneously by Facebook users than by general members. Moreover, the authors compared the tendency of each network using network indices (degree, closeness, and betweenness centrality). The results showed that social login registrants have less resistance to purchasing expensive products on an EC site than general members, and that golf gears act as a bridge for purchasing.
Keywords: Social Networking Service; Consumer Behavior; Network Analysis; ID-POS Data


I. INTRODUCTION
There has been a general upward trend in the scale of consumer-oriented electronic trading (Internet shopping) around the world in recent years. The Internet no longer serves simply as a medium for the transmission of information, but continues to evolve as a platform (EC sites) for trading. As such, the importance of Internet marketing is on the rise [1].
There is increasing interest in Consumer Generated Media (CGM), as represented by Social Networking Services (SNS), as a usable source of information for Internet marketing [2] [3]. CGM refers to media created by consumers using the Internet. The content of CGM ranges from information exchange regarding various products and services to everyday events. The onset of CGM has given companies operating EC sites easy access to consumer feedback that was previously unavailable. Combined analysis of information accumulated by EC sites (purchasing history and access log data) and information broadcast by consumers is the subject of continuing research and development aimed at more accurate behavioral analysis of consumers [4].
It is in this context that more and more companies are adding social login features to their EC sites. A social login resembles a single sign-on service offered by companies. Social login registrants can log in to the company's EC site using their Facebook, Twitter, or other such SNS account. Well-known companies offering a social login service include the American technology company GIGYA. For users, social login streamlines the registration process, removes the need for a separate account on each EC site, and reduces the risk of losing a password. For companies, social login can significantly lower registration barriers and increase their ability to acquire new customers. Moreover, social login links a registered consumer's unique ID with a social media account. In other words, companies can simplify the acquisition and analysis of information about their customers.
Social login is being introduced on the EC sites of various companies. Many studies looking at the effects of social login services on EC sites focus on access history, such as page views and pages viewed per visit. Page views and pages viewed per visit are important indices when evaluating the effect of social login. However, the authors think that a more detailed analysis of purchase behavior focused on actual products is necessary. Close observation of customer tendency and purchase behavior is expected to boost the effect of marketing. Consequently, this paper reports the analytical findings of research conducted to clarify the difference in purchasing tendency between social login registrants and general members based on product purchasing history.
This paper is organized as follows. In chapter II, the authors describe prior research focusing on behavior on social media. In chapter III, the authors describe the research objective. In chapter IV, the authors describe the results of an analysis of purchasing data comparing social login registrants and general members. Based on these results, in chapter V, the authors describe the results of a network analysis that focuses on the relationship between purchased products. In chapter VI, the authors summarize all aforementioned analyses, review the results, and discuss them. Chapter VII concludes the paper.

II. PRIOR RESEARCH
This chapter briefly summarizes research focusing on behavior on social media (actions on social media, including access behavior and posts) and purchase behavior. Many representative studies addressing behavior on social media and purchase behavior concern movie box-office records. Examples include a study [5] that divided the comments posted to YAHOO! Movies about a certain movie into positive and negative and analyzed the connection with that movie's box-office record, a study [6] analyzing the effect of negative posts on social blogs on a movie's box-office record, and a study [7] that modeled the combined effect of social blog posts and the volume of television advertisement on movie box-office records. As Tsurumi et al. [8] have pointed out, these studies likely focus on movie box-office records because of the ease of access to such records and to the related text data of reviews and comments posted by consumers.
Each of these relatively successful studies is an important work on behavior on social media and purchase behavior. On the other hand, many of them focus on movie box-office records. As Tsurumi et al. [8] have pointed out, movies exhibit a special characteristic in that many consumers comment after having watched the movie. A detailed analysis of more general products is thought to be required when applying social media to marketing activities [9]. This study differs from conventional research by selecting a golf EC site as the subject of research.

III. RESEARCH OBJECTIVES
This study targets social login registrants on an EC site and aims to clarify the difference in purchasing tendency between social login registrants and general members by analyzing product purchasing history. The data for this study were provided by Golf Digest Online Inc. (herein referred to as GDO), the operator of the golf portal site that is the subject of this research. GDO is one of the largest golf portal sites in Japan, with some 2 million members. Users can make golf course reservations, shop, manage and analyze scores online, and gain access to the latest golf news and product information. GDO has introduced social plus, a social login service operated by Feedforce Inc. that maintains a connection between the social login registrants' Facebook and Twitter accounts and the unique ID used on GDO's EC site. This study utilized the purchasing data from GDO's EC site.

A. Data Set
The authors begin with an explanation of the data set utilized in this study. Purchasing data from the 24-month interval between January 2012 and December 2013 was used. The purchasing data listed the product name, date, unique ID, product purchase price, and other relevant data for each product purchase. The authors extracted the purchase data of social login registrants from this purchasing data. Specifically, users who had registered in June 2013 were extracted and their purchasing information for the relevant period was collected (Facebook: data set 1; Twitter: data set 2). This study utilizes these users as the social login registrant data set.
Next, a sample was drawn from users who are not included in data sets 1 and 2 and who are not social login registrants. The same number of users as Facebook social login registrants (1653 users) was selected at random and their purchasing information for the relevant period was collected (general members: data set 3). An overview of each data set is shown in TABLE I.

B. Classification Based on Purchase Price
This section presents the results of classification based on the purchasing tendency and purchase price of each data set extracted in the previous section. Firstly, the purchase price for each user during the period was derived to confirm the purchasing tendency of each data set. User results were plotted in Fig. 1, 2, and 3 with the primary axis (left) as the purchase price per user (blue lines) and the secondary axis (right) as the cumulative purchase price ratio (red lines).
Secondly, a decile analysis of each data set was performed. Decile analysis is a method of customer analysis that ranks all customers from highest to lowest purchase price and groups them into ten equal categories (deciles) of 10% each [10]. The total purchase price of each decile is analyzed to derive its percentage of overall sales and determine which customer segment contributes to sales. The results of the decile analysis clearly showed that purchases by the top 30% of users accounted for some 80% of sales. Moreover, each table indicates that very few users make extremely expensive purchases while the purchase price of many users is low, confirming that purchasing tendency commonly follows a power law.
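The decile analysis described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `decile_shares` and the toy purchase totals are hypothetical, and users are assumed to be split into ten groups of (nearly) equal size after sorting by total purchase price.

```python
def decile_shares(totals):
    """Return the share of overall sales contributed by each decile.

    totals: total purchase price per user over the period (hypothetical input).
    Decile 0 holds the highest-spending 10% of users, decile 9 the lowest.
    """
    ranked = sorted(totals, reverse=True)
    n = len(ranked)
    grand_total = sum(ranked)
    shares = []
    for d in range(10):
        lo = d * n // 10          # first user index of this decile
        hi = (d + 1) * n // 10    # one past the last user index
        shares.append(sum(ranked[lo:hi]) / grand_total)
    return shares


# Toy data (illustrative only, not GDO data): few heavy spenders, many light ones.
toy_totals = [1000, 500, 250, 120, 60, 30, 15, 10, 10, 5]
shares = decile_shares(toy_totals)
top3_share = sum(shares[:3])  # share of sales from the top 30% of users
```

With real data, `top3_share` is the quantity that the text reports as roughly 80% of sales.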
Thirdly, users in each data set were classified based on the cumulative purchase price ratio. More specifically, users with a cumulative purchase price ratio up to 70% were placed in the high purchase price group (High Group), users between 70% and 95% in the medium purchase price group (Middle Group), and users at 95% and above in the low purchase price group (Low Group). The number and ratio of users in each group are shown in TABLE II.
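A minimal sketch of this grouping is given below. The 70% and 95% cut-offs come from the text; everything else is an assumption made for illustration, including the function name `classify_by_cumulative_ratio`, the toy data, and the boundary rule (a user is assigned according to the cumulative ratio reached after adding that user's own purchases).

```python
def classify_by_cumulative_ratio(totals, high_cut=0.70, middle_cut=0.95):
    """Map each user to 'High', 'Middle' or 'Low'.

    totals: dict of user_id -> total purchase price (hypothetical input).
    Users are ranked from highest to lowest purchase price and the running
    share of overall sales decides the group.
    """
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    grand_total = sum(totals.values())
    groups = {}
    running = 0.0
    for user, amount in ranked:
        running += amount
        ratio = running / grand_total  # cumulative purchase price ratio
        if ratio <= high_cut:
            groups[user] = "High"
        elif ratio <= middle_cut:
            groups[user] = "Middle"
        else:
            groups[user] = "Low"
    return groups


# Toy data (illustrative only): cumulative ratios 0.60, 0.80, 0.92, 0.97, 1.00.
toy = {"u1": 600, "u2": 200, "u3": 120, "u4": 50, "u5": 30}
groups = classify_by_cumulative_ratio(toy)
```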

C. Comparison of Purchasing Tendency
This section compares the purchasing tendency of each data set, both overall and for each group, using the classification based on purchase price from the previous section. For this comparison, this study focused on the number of purchases and the purchase types per user during the relevant period. It is known that users who purchase multiple types of products have a higher purchase ratio when presented with product recommendations than users who purchase only specific products. Therefore, the authors think that product type is a necessary element when considering potential demand. The authors determined the number of purchases and purchase types per user during the relevant period for each data set. The authors derived the product types from the 49 separate classifications used by GDO: irons, outers (blouson, wind breaker, jacket), underwear, discount wear sets, wedges, calendars, caddy bags, socks, golf gear cases, golf gear sets, grips, gloves, competition gifts, sunglasses, shafts, shoes, drink cases, skirts, spikes, tees, drivers, travel covers, shorts, putters, videos/DVDs/tickets, fairway woods, vests, head covers, belts, balls, carry bags, utilities, small golf goods, repair goods, rainwear, long pants, one-piece dress, distance-measuring equipment, socks, health goods, umbrellas, books, mid-layer wear (sweater, trainer), long-sleeve shirts and polo shirts, electronics, short-sleeve shirts, polo shirts, hats, and practice goods.
Firstly, the difference between averages was tested for the number of purchases and purchase types in each data set. The authors began with a Kolmogorov-Smirnov test of normality. The results showed that none of the data sets followed a normal distribution at a significance level of 0.01. Next, a Levene test was conducted to examine the equality of variances. The results showed that the variances of the data sets are unequal at a significance level of 0.01. Consequently, this study used the Kruskal-Wallis test, a nonparametric method, to compare the distributions. The null hypothesis is that the distribution of the number of purchases (purchase types) is equal for each data set. The null hypothesis was rejected, showing that the distributions are not equal at a significance level of 0.01. Next, multiple comparisons using the Mann-Whitney U test were conducted to determine which data sets differ in distribution, with Bonferroni correction applied to the multiple comparisons. The results showed that the distributions of the number of purchases and purchase types differed for all data set combinations at a significance level of 0.01.
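The final step of this procedure (pairwise Mann-Whitney U tests with Bonferroni correction) can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' code: the p-value uses the normal approximation with a tie correction and no continuity correction, the function names and toy samples are hypothetical, and the Kolmogorov-Smirnov, Levene, and Kruskal-Wallis steps are not reproduced.

```python
import math
from itertools import combinations


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of sample x, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                       # size of this group of tied values
        midrank = (i + 1 + j) / 2.0     # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2.0))


def pairwise_bonferroni(samples, alpha=0.01):
    """Run all pairwise tests; multiply each p-value by the number of pairs."""
    pairs = list(combinations(sorted(samples), 2))
    results = {}
    for a, b in pairs:
        _, p = mann_whitney_u(samples[a], samples[b])
        adjusted = min(1.0, p * len(pairs))
        results[(a, b)] = (adjusted, adjusted < alpha)
    return results


# Toy samples (illustrative only, not GDO data).
toy_samples = {
    "general": list(range(1, 31)),
    "facebook": list(range(31, 61)),
    "twitter": list(range(31, 61)),
}
results = pairwise_bonferroni(toy_samples)
```

Each entry of `results` holds the Bonferroni-adjusted p-value and whether the pair differs at the chosen significance level.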
Secondly, a linear fit was performed using the least-squares method to clarify the relationship between the number of purchases and purchase types for each data set. A fitted line was added to the scatter plot shown in Fig. 4, with the number of purchases on the x-axis and the number of product types on the y-axis. Users with the same number of purchases and purchase types are plotted on top of each other, resulting in the high density seen in the bottom left of Fig. 4. The coefficient of determination was 0.7214 for general members, 0.7791 for Facebook, and 0.8693 for Twitter. Closer inspection of the fitted lines clearly shows that the gradient is larger for both Twitter and Facebook than for general members. This shows that, compared with general members, users registered through social login have a stronger tendency to increase the number of product types purchased in proportion to the number of purchases.
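As a rough illustration of the least-squares fit and the coefficient of determination discussed above, the sketch below fits a line to (number of purchases, number of product types) pairs. The function name `fit_line` and the toy values are hypothetical and are not taken from the GDO data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Toy data (illustrative only): purchases per user vs. product types per user.
purchases = [1, 2, 3, 4, 5]
types_general = [1, 1, 2, 2, 3]
types_social = [1, 2, 3, 3, 5]

slope_general, _, r2_general = fit_line(purchases, types_general)
slope_social, _, r2_social = fit_line(purchases, types_social)
```

A larger slope for one data set corresponds to the steeper gradient that the text reports for social login registrants.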
The following looks at the difference between each group using the classification based on purchase price. Fig. 5, 6, and 7 are scatter plots of the users in each group, with a fitted line and coefficient of determination added, where the number of purchases is the x-axis and the number of product types is the y-axis. High Group (high purchase price group) had the greatest difference between data sets, while Low Group (low purchase price group) had the least.
These results show that social login registrants and general members have different distributions regarding the number of purchases and purchase types, that social login registrants have a larger range of purchase types per purchase and purchase from a variety of genres, and that this tendency is especially evident for users purchasing high-priced products.

V. ANALYSIS OF PURCHASED PRODUCTS USING NETWORK ANALYSIS
In chapter IV, the authors focused on the number of purchases and purchase types for the data sets of both social login registrants and general members during the relevant period, and analyzed their relationship and the difference between averages. The results showed that social login registrants have a larger range of purchase types per purchase, that they purchase from a variety of genres, and that this tendency is especially evident for users purchasing high-priced products in High Group.
This chapter analyzes the relationship between purchased products. Network analysis was used for this purpose. Creating a network graph facilitates an intuitive understanding of which products are closely related (i.e., which products are more readily purchased simultaneously). This chapter aims to clarify whether there is a difference in purchasing tendency between users registered via social login and general members.

A. Visualization Using a Network Graph
The authors set the 49 product types (chapter IV) as nodes, and an undirected edge was added between two products if both were purchased by the same user during the relevant period. The Kamada-Kawai layout algorithm was used to create the network graph [11]. This algorithm is a visualization method based on the spring model, in which the ideal distance between all nodes is determined and this ideal distance is depicted 2-dimensionally as closely as possible. In this study, products simultaneously purchased by many users are shown close together, allowing for an intuitive understanding of the relationship between products. Firstly, network graphs created using the purchasing history of every user in each data set are shown in Fig. 8, 9, and 10, and the network index of each network graph is shown in TABLE III.
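The graph construction described above might be sketched as follows. This is an assumption-laden illustration: the function name `build_copurchase_graph`, the `(user_id, product_type)` record format, and the edge weight (the number of users who bought both product types) are all hypothetical, and the Kamada-Kawai layout step itself is not implemented here.

```python
from collections import defaultdict
from itertools import combinations


def build_copurchase_graph(purchases):
    """Build an undirected co-purchase graph over product types.

    purchases: iterable of (user_id, product_type) records (hypothetical format).
    Returns (adjacency, weights): adjacency maps a product type to the set of
    product types bought by at least one common user; weights counts, for each
    sorted pair, how many users bought both.
    """
    types_by_user = defaultdict(set)
    for user, product_type in purchases:
        types_by_user[user].add(product_type)

    adjacency = defaultdict(set)
    weights = defaultdict(int)
    for types in types_by_user.values():
        for a, b in combinations(sorted(types), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
            weights[(a, b)] += 1
    return adjacency, weights


# Toy records (illustrative only, not GDO data).
toy_purchases = [
    ("u1", "drivers"), ("u1", "irons"), ("u1", "balls"),
    ("u2", "drivers"), ("u2", "irons"),
    ("u3", "balls"),
]
adjacency, weights = build_copurchase_graph(toy_purchases)
```

Note that product types bought only by single-type users (no co-purchase) do not appear as keys in `adjacency` in this sketch; a full analysis would add them as isolated nodes.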
Closer inspection of each network graph reveals that nodes are more evenly spaced for general members than for Facebook and Twitter users. This suggests that, on the whole, general members have few products that they purchase simultaneously. On the other hand, in the Facebook and Twitter graphs, nodes can be seen in close proximity here and there. This suggests that these users have more products that they purchase simultaneously than general members do.

B. Analysis of Product Relationships with a Focus on the High Price Purchasing Group
Next, a more detailed analysis is conducted using the groups derived from the cumulative purchase price ratio. The authors compared every possible combination of groups A, B, and C. The results showed a significant difference between the Facebook users and general members of High Group (high price purchasing group). The analysis herein focuses on the Facebook users and general members of High Group.
Firstly, the network graphs created using the purchasing history of High Group users in the Facebook and general member data sets are shown in Fig. 11 and 12. Close inspection of Facebook's High Group reveals a significantly greater deviation compared with the network graph created using all data (Fig. 8). The authors considered this to indicate a strong relationship between specific products (products more readily purchased together). Further examination of those combinations confirmed that winter apparel in the blue circle, including sweaters, trainers, and long pants (center bottom right of Fig. 11), and summer apparel in the red circle, including short (long)-sleeve shirts, polo shirts, hats, and shorts (center top of Fig. 11), tend to be purchased simultaneously. Moreover, golf gears in the green circle, including drivers, irons, wedges, and grips (center bottom left of Fig. 11), tend to be purchased simultaneously. Next, the authors focus on the general members of High Group. As with the network graph created using all data (Fig. 10), the graph is uniformly distributed with no deviation between products. From these results, it is thought that there are no specific products with a strong relationship (more readily purchased together) for general members.
Fig. 13 and 14 are contour graphs of Fig. 11 and 12, respectively. They were created through 2-dimensional kernel density estimation using the coordinate data of the nodes (see footnote 4). Comparing these graphs, the authors found that Fig. 13, which represents the Facebook members, has two peak regions in which nodes are concentrated. In this study, the authors set $\frac{1}{(80-1)^{2}} \times 1.2$ as the threshold for a peak. With this threshold, the authors found three peak regions in the graph. In Fig. 11, these are depicted as three colored circles (red, green, and blue). The red region contains summer seasonal apparel and the green region contains golf gears.
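To illustrate the idea of a 2-dimensional kernel density estimate over node coordinates, the sketch below evaluates a Gaussian kernel density at a given point. The kernel type, the meaning of the `bandwidth` parameter, the function name `kde2d`, and the toy coordinates are all assumptions made for illustration; the text does not specify the estimator the authors used.

```python
import math


def kde2d(points, x, y, bandwidth):
    """Gaussian kernel density estimate at (x, y).

    points: list of (x, y) node coordinates from a graph layout (hypothetical).
    bandwidth: standard deviation of the isotropic Gaussian kernel (assumed).
    """
    n = len(points)
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * n)
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return norm * total


# Toy layout (illustrative only): three nodes clustered together, one far away.
toy_nodes = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.4), (10.0, 10.0)]
density_in_cluster = kde2d(toy_nodes, 0.0, 0.2, bandwidth=1.0)
density_between = kde2d(toy_nodes, 5.0, 5.0, bandwidth=1.0)
```

Evaluating such a function on a grid and keeping the cells above a chosen threshold yields the kind of peak regions drawn in a contour graph.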
However, no obvious peak can be found in Fig. 14. This result indicates that purchasing behavior differs between Facebook users and general members. Moreover, Facebook members tend to purchase products from certain specific categories simultaneously.
Secondly, the authors compared the tendency of each network using a network index. This study compared network tendency using centrality. Centrality is an index used to identify the central nodes of a network (nodes that therefore take on a vital role). This study deals with three separate centrality indices: (1) degree centrality, (2) closeness centrality, and (3) betweenness centrality. The three centralities were calculated for High Group of Facebook users and general members, and the five nodes with the highest centrality were extracted and compared.
The authors first look at degree centrality. Degree centrality is the most basic method of calculating centrality, in which nodes where edges converge are deemed as having higher centrality. Therefore, the degree of a node is used as its centrality. The calculation results are shown in TABLE IV, where the calculation equation is equation (1) and degree centrality is written as dgc(v). The degree of node v is written as deg(v) = |Γ(v)|, where Γ(v) is the set of nodes adjacent to node v.
Footnote 4: The authors set the bandwidth to 80.
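Using the notation above, a minimal sketch of degree centrality is simply dgc(v) = deg(v) = |Γ(v)|. Whether the authors additionally normalize by the number of other nodes is not stated in the text, so the unnormalized form below is an assumption, as are the function name and the toy graph.

```python
def degree_centrality(adjacency):
    """Return deg(v) = |Γ(v)| for every node v.

    adjacency: dict mapping each node to the set of its adjacent nodes.
    """
    return {v: len(neighbors) for v, neighbors in adjacency.items()}


# Toy co-purchase graph (illustrative only).
toy_graph = {
    "drivers": {"irons", "balls"},
    "irons": {"drivers"},
    "balls": {"drivers"},
}
dgc = degree_centrality(toy_graph)
top_node = max(dgc, key=dgc.get)  # node with the highest degree centrality
```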
Closer inspection of products with high degree centrality shows that the same products appear in the same order in the top 5 for both Facebook users and general members. This clarified that, quantitatively, the same products were being purchased together with other products.
Next is closeness centrality. Closeness centrality is a method of calculating centrality in which a node's centrality is higher the shorter its distance to the other nodes (that is, the other nodes can be reached in a small number of steps). The calculation results are shown in TABLE V, where the calculation equation is equation (2) and closeness centrality is written as clc(v). Here, d(v, u) is the number of steps between node v and node u.
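A common form of closeness centrality, consistent with the definition of d(v, u) above, is clc(v) = (n − 1) / Σ_u d(v, u), where n is the number of nodes. Since the exact form of equation (2) is not reproduced in the text, this normalization is an assumption; the sketch also assumes a connected graph, and its names and toy data are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Number of steps d(source, u) to every reachable node u."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adjacency[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def closeness_centrality(adjacency):
    """clc(v) = (n - 1) / sum of distances from v (assumes a connected graph)."""
    n = len(adjacency)
    result = {}
    for v in adjacency:
        total = sum(bfs_distances(adjacency, v).values())
        result[v] = (n - 1) / total if total > 0 else 0.0
    return result


# Toy co-purchase graph (illustrative only).
toy_graph = {
    "drivers": {"irons", "balls"},
    "irons": {"drivers"},
    "balls": {"drivers"},
}
clc = closeness_centrality(toy_graph)
```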
Looking at products, it became clear that golf gears have high centrality for Facebook users. In other words, golf equipment was at the center of the network and tended to be close to other products. On the other hand, the distance between products such as clothes and bags tended to be small for general members.
Lastly is betweenness centrality. Betweenness centrality is a method of calculating centrality in which a node's centrality is higher the more shortest paths pass through it. It therefore expresses the importance of nodes that act as bridges between other nodes in the network. The calculation results are shown in TABLE VI, where the calculation equation is equation (3) and betweenness centrality is written as bwc(v). Here, σ s,t is the number of shortest paths between nodes s and t, and σ s,t (v) is the number of those shortest paths that pass through node v.
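With the symbols defined above, the standard form of betweenness centrality is bwc(v) = Σ σ s,t (v) / σ s,t, summed over all pairs s, t with s ≠ v ≠ t. The sketch below computes this unnormalized quantity with Brandes' algorithm for an unweighted, undirected graph; the choice of algorithm, the absence of normalization, and the toy graph are assumptions for illustration, not details taken from the text.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    bwc = {v: 0.0 for v in adjacency}
    for s in adjacency:
        order = []                              # nodes in order of discovery
        preds = {v: [] for v in adjacency}      # predecessors on shortest paths
        sigma = {v: 0 for v in adjacency}       # number of shortest paths from s
        dist = {v: -1 for v in adjacency}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        while order:
            w = order.pop()                     # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bwc[w] += delta[w]
    # Each unordered pair (s, t) was counted twice in an undirected graph.
    return {v: value / 2.0 for v, value in bwc.items()}


# Toy co-purchase graph (illustrative only): drivers bridge the other products.
toy_graph = {
    "drivers": {"irons", "balls", "shirts"},
    "irons": {"drivers"},
    "balls": {"drivers"},
    "shirts": {"drivers"},
}
bwc = betweenness_centrality(toy_graph)
```

In this toy graph every shortest path between two leaf products passes through "drivers", which is the sense in which a node acts as a bridge.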
Looking at products, it became clear that consumable goods such as balls, gloves, and small golf goods have high centrality for general members, while golf gears such as drivers, irons, and fairway woods have high centrality for Facebook users.

VI. RESULTS REVIEW AND DISCUSSIONS
This chapter summarizes all aforementioned analyses, reviews the results, and discusses them. Firstly, looking at the number of purchases (4-A, TABLE I) for data sets of the same period, social login registrants have a lower total number of purchases and a lower average number of purchases per user than general members. Moreover, the average purchasing price is 125,601 (JPY) for general members but 65,430 (JPY) for Facebook users. This suggests that social login registrants are not always highly profitable consumers for companies.
On the other hand, detailed analysis (4-C) of the number of purchases and purchase types clarified that social login registrants purchase a greater variety of products per purchase. It is speculated that this is because these users view pages across multiple product genres on GDO's EC site and purchase multiple products of their choice rather than purchasing several specific (predetermined) products. Moreover, social login registrants have better access to information regarding new products, sale items, and campaigns on GDO's Facebook page (the authors have confirmed that there were no social login specific sales or campaigns during the relevant period). It is speculated that, while consumers have different values, in general they are likely to be enticed by sale items and bundle items with a lowered price, and that social login registrants are especially conscious of prices and price drops.
The interest social login registrants have in price drops can also be explained by the tendency of the network graphs. Network analysis (5-B) clarified the existence of specific product combinations (concentrated sets on the network) that are more readily purchased simultaneously by Facebook users than by general members.
It became clear that the concentrated sets in the network are characterized by purchase behavior with strong relationships among summer apparel and among winter apparel, despite seasonal differences in sales numbers. Japan has 4 distinct seasons, and many EC sites hold sales in conjunction with seasonal changes. In particular, it is not uncommon for apparel to be heavily discounted to reduce the risk of dead stock due to constantly changing trends. These results suggest that social login registrants are accustomed to shopping on EC sites. They are sensitive to GDO's sales, such as seasonal price drops and campaigns, and are shopping wisely.
On the other hand, network analysis also clarified a strong relationship among golf gears purchased by Facebook users. Additionally, another characteristic was confirmed from the centrality derived through the network index analysis (5-B). Firstly, degree centrality produced exactly the same top 5 products for both Facebook users and general members. On the other hand, golf gears came out on top for Facebook users when focusing on closeness centrality. Furthermore, expensive golf gears also came out on top when focusing on betweenness centrality. In contrast, the exact opposite is true for general members: when focusing on betweenness centrality, inexpensive consumable goods like balls, gloves, and small golf goods came out on top.
There are many companies in Japan selling golf products in retail stores (for example, Victoria Golf http://www.victoria.co.jp/victoriagolf, Niki Golf http://www.nikigolf.jp/top/index.aspx). Retail stores offer the chance to test swing golf gears, check form, and even consult sales staff in person. In fact, many golfers test swing gears at retail stores and consult sales staff when purchasing golf gear. Golf gears are also some of the most expensive golfing items, and many people have reservations about purchasing them on an EC site. As such, companies operating EC sites like GDO are faced with the challenge of increasing golf gear sales on their sites.
The results of network analysis confirmed that, although it is generally desirable to test products before purchasing (of course, some users are expected to test swing at retail outlets and then purchase online), social login registrants have less resistance to purchasing expensive products on an EC site than general members, and that golf gears act as a bridge for purchasing.
The results of network analysis and of the analysis focusing on the number of purchases and purchase types showed that social login registrants and general members exhibit different purchasing tendencies.

VII. CONCLUSION AND FUTURE WORKS
In this study, the authors targeted social login registrants on a golf EC site and aimed to clarify the difference in purchasing tendency between social login registrants and general members by analyzing product purchasing history.
The authors analyzed the purchasing data, comparing social login registrants with general members. It became clear that (1) social login registrants and general members have different distributions regarding the number of purchases and purchase types, and (2) social login registrants have a larger range of purchase types per purchase and purchase from a variety of genres. Based on the results of this analysis, the authors conducted a network analysis focusing on the relationship between purchased products. This revealed the existence of specific product combinations (concentrated sets on the network) that are more readily purchased simultaneously by Facebook users than by general members. Additionally, the authors compared the tendency of each network using network indices (degree, closeness, and betweenness centrality). The results showed that social login registrants have less resistance to purchasing expensive products on an EC site than general members, and that golf gears act as a bridge for purchasing. From these results, the authors consider that social login registrants and general members exhibit different purchasing tendencies. Future research will include a follow-up study of the members analyzed in this study and a survey of changes in purchase behavior after registering as a social login user. Moreover, an analysis focusing on the attribute information of social login registrants will be conducted, and their purchase behavior will be examined in comparison to general members with regard to social login limited sales.

Fig. 8. A network graph created using the purchasing history (Facebook)

Fig. 11. A network graph created using the Facebook of High Group data

TABLE I. OVERVIEW OF DATA SET

TABLE II. RATIO AND THE NUMBER OF USERS OF EACH GROUP

TABLE III. NETWORK INDEX OF EACH NETWORK GRAPHS

TABLE IV. CALCULATION RESULTS OF DEGREE CENTRALITY

TABLE VI. CALCULATION RESULTS OF CLOSENESS CENTRALITY

TABLE VII. CALCULATION RESULTS OF BETWEENNESS CENTRALITY