Personalized Recommender by Exploiting Domain based Expert for Enhancing Collaborative Filtering Algorithm : PReC

The large amount of information available on the internet has given rise to various recommender algorithms that act as an intermediary between the number of choices and internet users. Collaborative filtering is one of the most traditional and intensively used recommendation approaches for many commercial services. Despite providing satisfying outcomes, it has issues that include source diversity, reliability, sparsity of data, scalability and cold start. Thus, there is a need for further improvement in the current generation of recommender systems to achieve more effective human decision support in a wide variety of applications and scenarios. A Personalized Expert based collaborative filtering (PReC) approach is proposed to identify domain specific experts and to use the experts' preferences to enhance the performance of collaborative filtering recommender systems. A unified framework is proposed that integrates similar users' rating data, experts' ratings and demographic data to reduce the number of pairwise computations in the search space, ensure scalability and enable fine grained recommendations. The proposed method is evaluated using the accuracy metrics MAE and RMSE on data collected from the MovieLens datasets.

Keywords: Recommender system; collaborative filtering; domain based experts; demographic data


I. INTRODUCTION
With the overwhelming growth of information available over the internet in recent years, Recommender systems [1], [2] have proven to be a powerful tool whose aim is to guide users with personalized recommendations and alleviate the potential problem of information overload. However, the performance of these systems in dynamic environments remains unsatisfactory, making Recommender systems inefficient, inaccurate and less robust to changes in user preferences, which motivates further research in the field.
Despite the success of several recommendation approaches such as Collaborative filtering [3], [4], Content based [5], [6] and Hybrid filtering [7], several limitations remain, increasing the need to provide effective and accurate recommendations. Collaborative filtering is one of the most traditional and intensively used recommendation approaches for many commercial services such as movie, music, news and book recommendation, as it is content independent and easy to implement. In this approach, recommendations are generated based on user ratings and the similarity measures between users (User-based CF) and/or items (Item-based CF). Despite providing satisfying outcomes, existing collaborative filtering methods have issues that include source diversity, reliability, sparsity of data, scalability and cold start. Thus, there is a need for further improvement in the current generation of recommender systems to achieve more effective human decision support in a wide variety of applications and scenarios.
In order to address the aforementioned problems, existing solutions in the literature have been explored and a recommendation approach is proposed that aims to outperform traditional algorithms. A Personalized Expert based collaborative filtering (PReC) approach is proposed for domain specific expert identification by analyzing each user and item subgroup to predict relevant unseen preferences and improve the performance of the system in terms of source diversity and recommender reliability. A novel use of experts' preference elicitation enables fine grained recommendations, thereby addressing the sparsity and cold start issues. An intelligent demographic filtering mechanism is introduced to reduce the number of pairwise computations in the search space and ensure scalability. Finally, a unified framework is proposed in which all the aforementioned approaches are integrated to handle the challenges of current generation Recommender systems.

II. RELATED WORK
Recommender systems came into existence to overcome the problem of too many choices by providing recommendations that are tailored to individual user preferences. Research shows that Recommender systems [1], [2] have increased sales and customer satisfaction. They guide users by providing personalized information in a large space of options. Collection of input data is one of the important components of a Recommender system. It captures user preferences either as explicit ratings, e.g., 5-star ratings, or as implicit data from clicking, browsing, etc. Since the inception of Recommender systems, many recommendation strategies and algorithms have emerged based on user ratings, the content of the item, location, etc.
www.ijacsa.thesai.org

A. Collaborative Filtering
Collaborative filtering [3], [4] is one of the most widely used techniques to predict a user's preference or generate a list of user preferences. However, several issues need to be addressed, including the complex computation required for a large dataset of users, understanding the diversity of different information sources and understanding user preferences when data is sparse. Prior research in Collaborative Filtering Recommender systems focuses only on the accuracy of the recommendations at the cost of scalable and reliable recommendations. Early systems that used collaborative filtering approaches include the Grundy System, a book recommender; Ringo [8], which recommended personalized music; Tapestry [9], a document recommender; GroupLens [10], an article recommender; and the Amazon.com [11] recommender, which recommends relevant items. In [12], an improved collaborative filtering approach extends the user-item rating matrix to alleviate the data sparsity issue, but it could not solve the cold start problem or account for user interests shifting over time. In [13], Regularized Matrix Factorization is used to develop an Incremental Collaborative Filtering Recommender system.

B. Content-Based Recommender System
Content based recommender systems [5], [6] can be used in a wide variety of domains, ranging from recommending news articles to hotels, movies and items for sale. Also referred to as cognitive filtering, the key idea of a content based recommender is to recommend items to the user according to the relevant attributes of the item and the user's preferences for those attributes. A content based recommender system [6] compares each item to the others in the collection. Items with a high degree of similarity are presented as recommendations. This is called "item-to-item correlation". The information source used by content-based filtering systems is text documents. These items are typically described with keywords and weights. By analyzing the keywords and the document, the system recommends a suitable item using methods such as Naïve Bayes, decision trees [14], clustering and nearest neighbor. A user profile can be built by (i) inferring the profile from user actions (implicit), such as read, buy and click, and (ii) inferring the profile from explicit user ratings, which include feedback techniques such as selecting a value from a range or filling out forms. Both implicit actions and explicit ratings are processed against the database of content attributes to build the content profile. Content based recommender systems have been applied in various applications. Examples include the Fab system [5], which recommends Web pages to users; the system in [15], which represents documents; and the DailyLearner system [16], which filters out news items too similar to those already seen by the user.
A content based recommender is effective in recommending accurate items when the items are well described. The profiles of other users do not influence the recommendations of the target user, as the recommendations are based on individual preferences. Content-based recommender systems always need to consider the following issues:
1) Partial content analysis: It is difficult to generate recommendations for a user's profile if only partial content is available.
2) High specialization: Users are restricted to items similar to those already defined in their profiles, so new items and other options are never surfaced for rating.

C. Hybrid Recommender System
A Hybrid Recommender system [17] combines the collaborative filtering and content based filtering approaches in different ways: i) Implementing the algorithms separately and combining the results [18]; ii) Incorporating some capabilities of the content based approach into the collaborative approach [19]; iii) Incorporating some capabilities of the collaborative approach into the content based approach [20]; and iv) Constructing a unifying model using different approaches [21].
A Hybrid Recommender system integrates various recommender systems, which helps to overcome their individual disadvantages and therefore improves the quality of the recommendations provided to the end user or customer. Empirically, several studies compared the performance and quality of the hybrid approach [22] and reported that hybrid approaches provide more accurate recommendations than the original recommender approaches. The very common cold start and sparsity issues are also reported to be alleviated.

D. Expert based Recommender Systems
A series of algorithms has been described in the literature to identify experts and exploit their opinions. In earlier research, an expert was judged by comparison with a set of predefined keywords [22], which is time-consuming and yields profiles that become outdated. This increased the demand for automation and shifted the focus towards the development of personalized expert based Recommender systems. To capture the expert, a number of approaches exist in the literature: probabilistic models [23] estimate the association between domain and expert users; discriminative models [24] directly estimate the conditional probability to find the relevance; voting models [25] use a voting mechanism to vote for users; graph based models [26] determine the associations by inference on a graph consisting of items, users, etc.; and other models make use of latent variables that correspond to a theme. In [27], expert ranking is treated as a voting problem using data fusion techniques. The author in [28] used context dependent expertise information to reduce the number of users. In [29], Ranking SVM is applied to rank the experts by using a pairwise approach to rank and predict the candidates. In [30], an evaluation of Learning to Rank algorithms is proposed for expert search on the DBLP database. In [31], a supervised learning approach is proposed to aggregate rankings and apply them to search for experts and their blogs. An expert user is identified through blog entries in [32] to generate recommendations. Dynamically identifying experts from the user community proved to be time consuming in [33].

E. Demographic based Recommender System
A Demographic Recommender system [34] considers demographic data in collaborative filtering when providing recommendations. In [35], recommendations are generated based on product demographic data learned from online reviews and blog entries. In [36], the demographic features of the tourist are used without any rating data to generate the predictions, but this achieves only limited accuracy. In [37], a framework is developed that evaluates demographic attributes to solve the cold start problem.

III. PROPOSED METHODOLOGY
In this paper, we propose personalized expert based collaborative filtering (PReC) to identify domain specific experts and to use demographic data together with experts' preferences in order to improve the performance of traditional collaborative filtering recommender systems. First, EUCF is proposed, which integrates collaborative filtering features and clusters similar users and similar items, promoting experts based on the user profile and exploiting their opinions, thus addressing the reliability issue and enabling fine grained recommendations. Second, an intelligent demographic filtering mechanism is introduced that integrates demographic features and user based collaborative filtering to reduce the number of expensive pairwise similarity computations in the search space, addressing the scalability and cold start issues. Finally, a unified framework is proposed that integrates similar users' rating data, experts' ratings and demographic data to deal with source diversity and sparsity and to enable fine grained, accurate recommendations.

A. Expert user based Collaborative Filtering (EUCF)
Based on the study, we developed the framework given in Fig. 1 to identify experts from the responses of users having similar preferences, in order to improve recommendation quality. While traditional systems compare a user's profile with other users to recommend unseen movies, our proposed system compares expert users' profiles with the target user and uses the experts' opinions, which eventually results in better recommendations for the target user.

1) Neighborhood formation:
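Once the rating profile of the user is constructed, similar users are identified by analyzing the user profile (User-Item Rating Matrix) and applying the Pearson correlation coefficient as the similarity measure, equation (1).

$$sim(x,y) = \frac{\sum_{i=1}^{n} (r_{x,i} - \bar{r}_x)(r_{y,i} - \bar{r}_y)}{\sqrt{\sum_{i=1}^{n} (r_{x,i} - \bar{r}_x)^2}\,\sqrt{\sum_{i=1}^{n} (r_{y,i} - \bar{r}_y)^2}} \qquad (1)$$

where $r_{x,i}$ and $r_{y,i}$ are the ratings of users x and y on item i, $\bar{r}_x$ is the mean of all of x's ratings, $\bar{r}_y$ is the mean of all of y's ratings and n is the number of items.

Likewise, the similarity between items is computed from the user-item ratings in order to discover similar items. For the computation of similarity between items, the Pearson correlation coefficient is used as given in equation (2).

$$sim(t,r) = \frac{\sum_{i=1}^{n} (r_{i,t} - \bar{r}_t)(r_{i,r} - \bar{r}_r)}{\sqrt{\sum_{i=1}^{n} (r_{i,t} - \bar{r}_t)^2}\,\sqrt{\sum_{i=1}^{n} (r_{i,r} - \bar{r}_r)^2}} \qquad (2)$$

where $r_{i,t}$ and $r_{i,r}$ are the ratings of user i on item t and item r respectively, $\bar{r}_t$ is the average rating of item t, $\bar{r}_r$ is the average rating of item r and n is the number of users.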
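As an illustration of the neighborhood formation step, the following minimal sketch computes the Pearson correlation of equation (1) for two rating vectors. It assumes that missing ratings are encoded as NaN, that the sums run over co-rated items only, and that the means are taken over all of each user's ratings; the function name `pearson_similarity` is hypothetical. Passing two item columns instead of two user rows gives the item-item similarity of equation (2).

```python
import numpy as np

def pearson_similarity(ratings_x, ratings_y):
    """Pearson correlation between two rating vectors (NaN = not rated).

    Assumption: sums run over co-rated entries, while each mean is taken
    over all ratings present in the respective vector.
    """
    x = np.asarray(ratings_x, dtype=float)
    y = np.asarray(ratings_y, dtype=float)
    co_rated = ~np.isnan(x) & ~np.isnan(y)
    if not co_rated.any():
        return 0.0  # no overlap: treat as uncorrelated
    dx = x[co_rated] - np.nanmean(x)
    dy = y[co_rated] - np.nanmean(y)
    denom = np.sqrt((dx ** 2).sum()) * np.sqrt((dy ** 2).sum())
    if denom == 0:
        return 0.0  # constant ratings carry no correlation signal
    return float((dx * dy).sum() / denom)

# Two users who rate the same three movies in opposite order.
print(pearson_similarity([1, 2, 3], [3, 2, 1]))
```

Returning 0.0 when there is no overlap or no variance is a design choice of this sketch; a production system might instead exclude such pairs from the neighborhood.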

2) Expert identification and generating recommendations:
Designing a framework for expert identification involves collecting user rating data from the user-item profile, which helps identify the users with the highest access factor whose ratings are most similar to the average rating opinion, in exchange for high quality personalization. A pseudo rating profile is generated by taking the average ratings of the users in that cluster.
Once the identification of similar users and similar items is completed, the selection of the right expert during the identification phase is critical. An expert user is a user who is most similar to the average ratings and whose access factor is greater than that of the other users. An expert user may vary from one domain to another, and there may exist more than one expert user for the same domain. The task of expert finding usually involves taking the user rating data as input, determining the heavy access factor, finding the similarity with the average ratings and returning the list of users ordered by their expertise level. A user's expertise is characterized by the heavy access factor, i.e., a larger number of ratings for a given domain, and is given in equation (3),
where the set of items accessed by user u is considered and HA = 1 if u has heavy access. The user profile features considered are the number of rated items associated with each domain and the similarity of the profile to the average rating, assuming that users with such profiles are likely to be experts. The average rating similarity is calculated as shown in equation (4),
where U and I are the user set and item set respectively, n is the number of items and $\bar{r}_i$ is the average rating of the users who rated item i. Thus, the expertise can be measured by equation (5). In selecting the experts, a threshold value of fixed size can be defined. A small threshold value often results in high precision but low recall in expert finding, whereas a large threshold value results in low precision but high recall. We assume that there may exist more than one candidate expert for every domain. Formally, given the user ratings, candidate experts are identified and the similarity of the expert user and the target user is computed as shown in equation (6).
Finally, predictions are generated based on the recommendations of similar experts. The expert scores are averaged to generate predictions by equation (7), whose terms are the average rating of the target user, the rating of the expert user on the target item, the average rating of the target user over the items and s(u,e), the similarity of target user u and expert user e. An expert candidate is assessed by considering how many items (movies) have been rated, how many are related to the particular domain and how similar the user is to the average opinions.
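The expert identification and prediction steps described above can be sketched as follows. This is an illustration under stated assumptions, not the exact form of equations (3) to (7): the heavy-access threshold `min_ratings`, the closeness measure 1 / (1 + mean absolute deviation from the item averages), the product used to combine the two factors, and the mean-centered weighted average used for prediction are all hypothetical choices, as are the function names.

```python
import numpy as np

def rank_experts(R, domain_items, min_ratings=3, top_k=2):
    """Rank candidate experts for one domain (e.g. a movie genre).

    R            : users x items array, NaN = not rated
    domain_items : column indices that belong to the domain
    min_ratings  : assumed heavy-access threshold (hypothetical)
    top_k        : assumed number of experts to keep (hypothetical)
    """
    D = np.asarray(R, dtype=float)[:, domain_items]
    rated = ~np.isnan(D)
    counts = rated.sum(axis=1)
    # Heavy-access indicator HA: 1 if the user rated enough domain items.
    heavy_access = (counts >= min_ratings).astype(float)
    # Pseudo rating profile: average rating of each item over its raters.
    item_counts = rated.sum(axis=0)
    item_means = np.where(rated, D, 0.0).sum(axis=0) / np.maximum(item_counts, 1)
    # Assumed closeness of each user to the average opinion.
    abs_dev = np.where(rated, np.abs(D - item_means), 0.0)
    mean_dev = abs_dev.sum(axis=1) / np.maximum(counts, 1)
    avg_similarity = 1.0 / (1.0 + mean_dev)
    # Assumed combination: only heavy-access users can be experts.
    expertise = heavy_access * avg_similarity
    order = np.argsort(-expertise, kind="stable")
    experts = [int(u) for u in order if expertise[u] > 0][:top_k]
    return experts, expertise

def predict_from_experts(target_mean, expert_ratings, expert_means, sims):
    """Assumed mean-centered weighted average over the selected experts."""
    sims = np.asarray(sims, dtype=float)
    deviations = np.asarray(expert_ratings, dtype=float) - np.asarray(expert_means, dtype=float)
    norm = np.abs(sims).sum()
    if norm == 0:
        return float(target_mean)
    return float(target_mean + (sims * deviations).sum() / norm)

nan = np.nan
R = np.array([[5, 4, 5, nan],
              [4, 4, 4, nan],
              [1, nan, nan, 2],
              [4, 5, 4, nan]])
experts, expertise = rank_experts(R, domain_items=[0, 1, 2])
print(experts, predict_from_experts(3.0, [5, 4], [4, 4], [1.0, 1.0]))
```

Raising `top_k` (or lowering `min_ratings`) admits more candidate experts, which mirrors the precision and recall trade-off of the threshold discussed in the text.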

B. Demographic User based Collaborative Filtering (UCFD)
We studied the difficulty of computing the similarity among different users in a large dataset. A novel approach is proposed for the efficient computation of similar users based on the demographic data of the user. We use demographic features, i.e., age and gender, to partition the users into subsets having similar demographic features; within each subset, similar users are then identified by analyzing their rating preferences. We assume that users with similar demographic features have similar preferences, which reduces the time and improves the performance relative to a traditional Collaborative filtering system. Thus a unified framework is proposed that utilizes both rating data and demographic data to compute the similarity between users and thereby addresses the scalability issue.

Formally, the profile of a user is represented as a vector whose elements correspond to the demographic features of the user, as shown in Table I. The input contains the demographic feature vector of each single user and the output is the relevant similarity score of the users. By selecting a set of well-designed features, i.e., age and gender, our proposed approach outperforms the traditional models in terms of scalability on the dataset extracted from the MovieLens dataset. After data selection, pairwise computations are performed to partition the users into clusters. The demographic filtering of users mainly reduces the pairwise similarity computations that are performed during user similarity computation. The approach also helps when rating data is sparse, because demographic features remain available. The framework combines demographic based and user based collaborative filtering recommendations in order to benefit from both.

The similarity of users based on demographic data, $dem_{u,a}$, is computed using equation (8), and the similarity based on the rating data of demographically similar users is computed using the Euclidean distance measure as shown in equation (9):

$$d(u,a) = \sqrt{\sum_{i} (r_{u,i} - r_{a,i})^2} \qquad (9)$$

where $r_{u,i}$ and $r_{a,i}$ are the ratings of users u and a on item i. The similarity between the users, s(u,a), is given by equation (10). Finally, our proposed enhanced correlation is based on the demographic correlation and the rating based similarity, as given by equation (11):

$$dem\_rat\_sim(u,a) = \alpha \, dem_{u,a} + \beta \, s_{u,a} + \gamma \, (dem_{u,a} \cdot s_{u,a}) \qquad (11)$$

UCFD generates the prediction Pred_dem_rat_sim(a,i) for the target user based on the enhanced correlation, where Pred_dem_rat_sim(a,i) is the predicted rating of target user a for item i, dem_rat_sim(u,a) is the similarity between users u and a, $\bar{r}_a$ is the average rating of target user a, $r_{u,i}$ is the rating of user u on item i, $\bar{r}_u$ is the average rating of user u and n is the number of similar neighbors.
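The UCFD similarity computation can be illustrated with the following minimal sketch. The Euclidean distance and the weighted combination follow equations (9) and (11); the demographic similarity (fraction of matching features) and the distance-to-similarity mapping 1 / (1 + d) are assumptions standing in for equations (8) and (10), and all function names and example weights are hypothetical.

```python
import math
from collections import defaultdict

def partition_by_demographics(profiles):
    """Group users that share the same demographic vector (e.g. age group, gender).

    Pairwise rating similarities then only need to be computed inside a group.
    """
    groups = defaultdict(list)
    for user, profile in profiles.items():
        groups[tuple(profile)].append(user)
    return dict(groups)

def demographic_similarity(profile_u, profile_a):
    """Assumed stand-in for equation (8): fraction of matching features."""
    matches = sum(1 for f, g in zip(profile_u, profile_a) if f == g)
    return matches / len(profile_u)

def euclidean_distance(ratings_u, ratings_a):
    """Equation (9), summed over the items rated by both users."""
    common = ratings_u.keys() & ratings_a.keys()
    return math.sqrt(sum((ratings_u[i] - ratings_a[i]) ** 2 for i in common))

def rating_similarity(ratings_u, ratings_a):
    """Assumed stand-in for equation (10): map distance into (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(ratings_u, ratings_a))

def dem_rat_sim(dem, s, alpha, beta, gamma):
    """Equation (11): weighted mix of both similarities and their product."""
    return alpha * dem + beta * s + gamma * (dem * s)

u = {"m1": 4, "m2": 3}
a = {"m1": 1, "m2": 7, "m3": 5}
dem = demographic_similarity(("18-24", "F"), ("18-24", "M"))
s = rating_similarity(u, a)
print(dem_rat_sim(dem, s, alpha=0.4, beta=0.4, gamma=0.2))
```

Partitioning first means that with k equally sized demographic groups, roughly 1/k of the original pairwise comparisons remain, which is the source of the scalability gain claimed in the text.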

C. Expert user based CF with Demographic Filtering (EUCFD)
A novel recommendation algorithm is proposed that fuses the opinions of similar users based on rating data, expert users (EUCF) and similar users based on demographic features (UCFD). A typical recommender system considers only one source of judgment, i.e., similar users, whereas our proposed system considers two more sources, i.e., expert users and demographically similar users, in addition to the predictions of similar users, to generate efficient output as shown in Fig. 2. The EUCF method extracts expert users from each item category and exploits their ratings rather than those of similar neighbors. In addition, the similarity between the target user and the other users based on demographic features is considered in the prediction of ratings when rating data is sparse or a cold start issue arises. Finally, we fuse the variants EUCF and UCFD to form EUCFD, which utilizes domain based experts' opinions by considering both demographic and rating data.
Initially, we identify expert users from an independent user data set based on the number of ratings given to each item category. The demographic features of the active user are considered if the rating data is sparse. Recommendations are generated by considering the prediction terms of the expert users and the accumulated values of similar users based on demographic data and of similar users based on rating data.
1) Enhanced prediction term for target user:
Finally, given a user's choice of preference, personalized recommendations are generated from the movie space based on their preferences. The algorithmic approaches are combined into an ensemble that generates recommendations by considering three factors: the similar users prediction term, the expert based prediction term and the demographic based prediction term, as given in equations (13), (14) and (15). From equations (13), (14) and (15), we compute the enhanced prediction term for the target user, where α + β + γ = 1, $\bar{r}_a$ is the average of the target user's ratings, SP(a,i) is the user similarity prediction term, EP(a,i) is the expert users prediction term and DP(a,i) is the demographic prediction term. The parameters α, β and γ are used to tune between the three values. We aggregate the recommendations generated by each module according to these weights. When explicit user ratings are absent, resulting in sparse data and hence inaccurate recommendations, the proposed approach considers the demographic features retrieved from the user profile and uses these features to find the similarity between users and to generate recommendations for the target user.
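The fusion of the three prediction terms can be sketched as below. The constraint α + β + γ = 1 and the roles of SP(a,i), EP(a,i) and DP(a,i) are taken from the text; the additive form (target user's mean plus the weighted terms) is an assumption of this sketch, and the function name `enhanced_prediction` and the example weights are hypothetical.

```python
import math

def enhanced_prediction(target_mean, sp, ep, dp, alpha, beta, gamma):
    """Combine the three prediction terms for one (user, item) pair.

    target_mean : average rating of the target user
    sp, ep, dp  : similar-user, expert and demographic prediction terms
    alpha, beta, gamma : tuning weights that must sum to 1

    Assumption: the terms are offsets that are added to the user's mean.
    """
    if not math.isclose(alpha + beta + gamma, 1.0, abs_tol=1e-9):
        raise ValueError("alpha + beta + gamma must equal 1")
    return target_mean + alpha * sp + beta * ep + gamma * dp

# Hypothetical weights that favour the similar-user term.
print(enhanced_prediction(3.5, sp=0.4, ep=0.8, dp=-0.2, alpha=0.5, beta=0.3, gamma=0.2))
```

Setting α = 0 and splitting the weight between β and γ corresponds to the sparse-rating case discussed above, where the similar-user term is unavailable and the expert and demographic terms carry the prediction.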

D. Experimental Evaluation
1) Datasets:
In this study, we considered the MovieLens datasets. The MovieLens dataset consists of approximately 100,000 ratings from 943 users on 1682 movies, where each user has rated at least 20 movies and the ratings are on a 1-5 scale, with "1" representing the lowest and "5" the highest rating. MovieLens is a web based Recommender system. The data was collected by the GroupLens Research Project at the University of Minnesota (http://grouplens.org/datasets/movielens/100k/). The statistics of the datasets are detailed in Table II.
2) Evaluation metrics: In order to evaluate our system, the data is divided into a training set and a testing set with an 80%-20% split ratio. Let x be a variable that gives the fraction of data used for training. If x = 0.8, then 80% of the data is used as the training set and 20% as the test set. The performance of the proposed system is evaluated by employing recommendation accuracy metrics. In this paper, we use Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), the most frequently used metrics for measuring accuracy.
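$$MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \qquad (17)$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \qquad (18)$$

where $p_i$ is the predicted rating for item i, $r_i$ is the actual rating for item i and n is the number of rated items. The lower the MAE value, the more accurate the recommendations. The other metrics used in this study are precision and recall.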
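As a worked illustration of the two accuracy metrics, the following sketch computes MAE and RMSE from paired lists of predicted and actual ratings using their standard definitions. The function names and the example ratings are hypothetical and are not taken from the reported experiments.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error over paired predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error over paired predicted and actual ratings."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

# Three hypothetical test ratings: errors are -1, 0 and +2.
predicted = [4, 3, 5]
actual = [5, 3, 3]
print(mae(predicted, actual), rmse(predicted, actual))
```

Because RMSE squares the errors before averaging, the single error of 2 weighs more heavily in RMSE than in MAE, which is why the two metrics are commonly reported together.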
A certain ambiguity exists when using these measures. For instance, increasing the total number of recommended items N increases recall but decreases precision. To overcome this ambiguity, we use another metric, the F1-Measure, which gives equal weight to both precision and recall. All these measures are defined in equations (19), (20) and (21).

Table III gives the comparison of MAE values for all the proposed algorithms for different sizes of neighborhood. In order to make the improvement more prominent, the results are also compared in terms of root mean square error, given in Table IV. The outcomes are also analyzed graphically in Fig. 3, Fig. 4 and Fig. 5. We compared all three algorithms with the traditional algorithm; the comparison is summarized in Table V. It has been observed from Fig. 6 that EUCFD is more accurate than traditional user based collaborative filtering (UCF), item based collaborative filtering (ICF) and the proposed UCFD and EUCF algorithms, and that the proposed method shows a significant increase in prediction performance when compared to a traditional single source model.
The results in Fig. 7 demonstrate the higher relevancy of the final EUCFD algorithm's recommendations and the significant improvements made over the UCF, EUCF and UCFD algorithms on the MovieLens datasets. The F-Measure results of EUCF, UCFD and EUCFD are also analyzed graphically.

This paper introduced the benefits to the users of information systems of retrieving personalized preferences by considering rating data, experts' opinions and the demographic data of the user. Furthermore, we introduced novel methods for identifying domain based experts and integrated similar users' opinions with experts' opinions to improve the recommendation quality. This paper focuses on the issues of Collaborative filtering, which include data sparsity, scalability and reliability. We contributed three approaches that significantly improved the recommendation quality. The results showed that our approach can increase scalability and generate accurate predictions, making it more suitable for large data sets. Thus, our proposed approach works towards the development of an efficient recommender system.

TABLE I
Finally, we evaluated our approach experimentally on the MovieLens datasets. The experiments showed that our proposed algorithms generate efficient and accurate predictions when compared to the existing traditional User based Collaborative Filtering. The Mean Absolute Error of our proposed algorithms is lower than that of UCF. The outcomes from all the proposed algorithms are analyzed and compared based on MAE.

3) Experimental results:

TABLE III. MAE VALUES OF PROPOSED ALGORITHMS BASED ON NEIGHBORHOOD SIZE

TABLE IV. RMSE VALUES OF PROPOSED ALGORITHMS BASED ON NEIGHBORHOOD SIZE

TABLE V. SUMMARY OF MAE AND RMSE (MOVIELENS)