An Item-based Multi-Criteria Collaborative Filtering Algorithm for Personalized Recommender Systems

Recommender systems are used to mitigate the information overload problem in different domains by providing personalized recommendations for particular users based on their implicit and explicit preferences. However, item-based Collaborative Filtering (CF) techniques, the most popular techniques in recommender systems, suffer from the sparsity and new item limitations, which result in inaccurate recommendations. Using items' semantic information together with multi-criteria ratings can alleviate such problems and generate more accurate recommendations. This paper proposes an Item-based Multi-Criteria Collaborative Filtering algorithm that integrates the items' semantic information and the multi-criteria ratings of items to lessen known limitations of item-based CF techniques. According to the experimental results, the proposed algorithm is effective in dealing with both the sparsity and new item problems and therefore produces more accurate recommendations than standard item-based CF techniques.
Keywords: Collaborative Filtering; Recommender Systems; Multi-Criteria; Sparsity; New Item


I. INTRODUCTION
The information overload problem arises from the continuing growth of web information, which makes it difficult for web users to locate relevant information, products or services according to their needs and preferences. Recommender systems have been broadly utilized to address the information overload problem by helping web users find the most relevant information, products or services in diverse application domains such as e-commerce, e-learning, e-government and e-tourism [1][2][3][4][5][6]. Recommender systems are personalized decision support tools that exploit users' explicit and implicit preferences to recommend the most relevant information, products or services to them. Collaborative filtering (CF) is one of the best-known techniques in recommender systems for generating personalized recommendations. The CF technique can be further classified into user-based and item-based CF techniques. In user-based CF, recommendations for users are generated based on items that are liked by other similar users. In item-based CF, recommendations for users are generated based on items that are similar to those they have liked in the past [1,7].
The item-based CF technique has been shown to be more successful in terms of prediction accuracy than the user-based CF technique [8,9]. Despite this, item-based CF may produce inaccurate recommendations when users' ratings are scarce, owing to two key obstacles: the sparsity and the new item problems [8,9]. To solve such problems, recent recommender systems have focused on the integration of additional information, which allows them to use the added information as a supplement to the insufficient users' ratings and so generate more accurate recommendations. Examples of such additional information are the semantic relationships that exist among users or items [10][11][12][13][14], and the multi-criteria ratings, which can capture more complex user preferences [7][15][16][17].
Semantic information associated with users or items can be represented by taxonomies or ontologies. By including concepts and their relationships, it plays an important role in accurately representing the item information and the user model [10][11][12][13][14]. In addition, recent studies acknowledge that multi-criteria ratings of users can be utilized to find the actual correlations between users, as they are based on more than one criterion [18][19][20][21][22][23][24][25]. To sum up, the additional information about users and items helps to model users' preferences and items' relations precisely, and accordingly can result in more accurate recommendations.
This paper proposes an Item-Based Multi-Criteria CF (IMCCF) algorithm for personalized recommender systems. The proposed algorithm is a hybrid of the MC item-based CF and the item-based semantic filtering techniques. It exploits the additional information provided by both the semantic relationships among items and the multi-criteria ratings of users to address the sparsity and new item problems. In our experiments, the proposed algorithm was more effective in dealing with these limitations and therefore produced more accurate recommendations than standard item-based CF techniques.
The rest of this paper is organized as follows. Section 2 describes the related work. Section 3 presents the proposed Item-Based Multi-Criteria CF algorithm. Section 4 shows the experimental setup and results. The conclusion and future work are given in Section 5.

II. RELATED WORK
In general, most current recommender systems use single-criterion CF recommendation approaches, which have been deployed highly successfully for many years. Recently, a number of research studies [18][19][20][21][22][23][24][25] have employed multi-criteria ratings in their recommender systems, on the grounds that multi-criteria ratings facilitate the accurate modeling of users' preferences and thus provide more precise recommendations. Examples of recent studies on multi-criteria recommender systems are described below.
www.ijacsa.thesai.org
Ebadi & Krzyzak [18] develop an intelligent hybrid multi-criteria hotel recommender system that suggests a number of hotels tailored to the preferences of a given user. To enhance the recommendation accuracy, the proposed system utilizes a multi-criteria rating technique to better capture and learn the preferences of users. TripAdvisor data is used to train the proposed system. Experimental results based on different settings and scenarios confirm the high recommendation accuracy of the system.
Jhalani et al. [19] propose a multiple linear regression approach for determining the weight of each criterion and calculating the overall rating prediction for each item. Experimental results on the Yahoo! Movies dataset show the effectiveness of the proposed method in generating quality recommendations compared with single-criterion and multi-criteria CF benchmark algorithms.
Nilashi et al. [20] propose a novel recommendation algorithm using expectation maximization (EM) and classification and regression trees (CART) in order to improve the recommendation accuracy of multi-criteria recommender systems. The authors also employ principal component analysis as a dimensionality reduction technique to alleviate the multi-collinearity limitation caused by the interdependencies between different criteria in multi-criteria CF datasets. TripAdvisor and Yahoo! Movies datasets are used to validate the performance of the proposed algorithm. Experimental results show that the proposed algorithm substantially enhances the accuracy of recommendations in multi-criteria CF.
Farokhi et al. [21] propose a tourism recommender system that employs a recommendation method integrating both multi-criteria user-based and multi-criteria item-based CF approaches. Fuzzy C-means and k-means algorithms are used to improve the recommendation accuracy of the user-based and item-based CF approaches. The authors acknowledge that the use of multi-criteria ratings can improve the recommendation accuracy by providing more realistic recommendations that are very close to users' interests. Experimental results on the TripAdvisor dataset confirm the high accuracy of the proposed method.
Nilashi et al. [22] incorporate multi-criteria ratings in a new hybrid method for hotel recommendation that uses prediction and dimensionality reduction techniques to improve the predictive accuracy. The proposed method is a hybrid of expectation maximization (EM) clustering, the adaptive neuro-fuzzy inference system (ANFIS) and principal component analysis (PCA). These techniques are combined to boost the predictive accuracy of multi-criteria CF in the tourism domain by exploiting the extra knowledge hidden in the multi-criteria ratings and by reducing the dimensionality of the dataset to deal with the multi-collinearity problem present in the multi-criteria ratings. Experimental results on the TripAdvisor dataset show that the proposed method achieves high recommendation accuracy in the tourism sector.
Bokde et al. [23] propose a university recommendation system that provides engineering college students with recommendations derived from their past preferences. The proposed system employs a hybrid method of multi-criteria item-based CF and dimensionality reduction approaches to produce high quality recommendations. The hybrid method decreases the computational cost and increases the prediction accuracy, thus addressing the scalability and sparsity limitations.
Bilge & Kaleli [24] propose a multi-criteria item-based CF framework that extends the conventional item-based CF algorithm to make use of the benefits of multi-criteria rating systems. The authors determine the most suitable neighborhood selection approach and examine the accuracy of statistical regression-based predictions. Experimental results on the Yahoo! Movies dataset support the assumption that multi-criteria item-based CF algorithms can generate more reliable recommendations than single-criterion item-based CF algorithms.
Shambour & Lu [25] propose a hybrid multi-criteria trust-enhanced CF (MC-TeCF) method that addresses the limitations of single-criterion user-based CF techniques by integrating the MC user-based CF and the MC user-based trust filtering techniques. Empirical results show that the proposed MC-TeCF method improves on single-criterion user-based CF techniques in the accuracy and coverage of recommendations when faced with extremely sparse data sets or new users.
However, compared to the large amount of research carried out in recent years on single-criterion recommender systems, the adoption of multi-criteria ratings in recommender systems has received limited attention [15,22]. Thus, the need for more research in the area of multi-criteria recommender systems has motivated the development of an Item-Based Multi-Criteria CF algorithm in this study.

III. THE ITEM-BASED MULTI-CRITERIA CF (IMCCF) ALGORITHM
The proposed IMCCF algorithm takes as input a raw matrix of user-item MC ratings, which consists of the multi-criteria ratings of M users on N items, and a hierarchical tree-structured item taxonomy. The item taxonomy, given by domain experts, has a set of main item categories to which items belong as leaf nodes. It should be noted that each item can be a member of one or more item categories. The recommendation process of the proposed IMCCF algorithm consists of the following three main tasks:

A. The Computation of MC Item-based CF Similarity
The MC item-based CF similarity between a given target item i and an item neighbor j is computed in this step through: 1) the calculation of the partial similarities for each rating criterion c, then 2) the use of an aggregation function to obtain the overall similarity value. According to [24], the Euclidean distance proved to be an excellent choice of similarity measure for item-item similarity computation in comparison with the traditional item-based CF similarity techniques. Thus, the Euclidean distance similarity measure [16,24] is used here to calculate the MC item-based CF similarity values between the target item i and the item neighbor j based on each individual criterion, as shown below:
$$d_c(i,j) = \sqrt{\sum_{u=1}^{n} \left(r_{u,i}^{c} - r_{u,j}^{c}\right)^2} \quad (1)$$
where $r_{u,i}^{c}$ and $r_{u,j}^{c}$ denote the ratings of user u on items i and j with regard to criterion c, respectively, and n is the number of users who have rated both items i and j. The smaller the distance between two items, the larger the similarity between them. Therefore, the following metric is needed to convert the resulting distance into a similarity value for each individual criterion:
$$Sim_c(i,j) = \frac{1}{1 + d_c(i,j)} \quad (2)$$
Then, we use the worst-case (i.e., smallest) similarity [16,24] as an aggregation approach on the partial similarities to find the overall similarity value between a given target item i and an item neighbor j as follows:
$$Sim(i,j) = \min_{c=1,\dots,x} Sim_c(i,j) \quad (3)$$
where $Sim_c(i,j)$ is the partial similarity based on criterion c, and x is the number of individual criteria.
Nevertheless, the Euclidean distance similarity measure used to calculate the similarity values between items based on each individual criterion considers only the rating values given by users who have rated both items i and j. This could produce unreasonable similarity values, since two items can have a high similarity value even though they share an extremely limited number of ratings. This issue can be mitigated by taking into account the number of users who have rated both items when computing the similarity between them. To this end, we employ the Dice coefficient [26], as shown in (4), as a weighting factor that reflects the proportion of users who have co-rated both items i and j relative to the total number of users who have rated items i and j separately. Thus, the final MC item-based CF similarity is given by (5).
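The similarity computation described in this subsection can be illustrated with a minimal Python sketch. It assumes that the distance is converted to a similarity as 1/(1 + distance), that the Dice coefficient takes its standard form 2|U_i ∩ U_j| / (|U_i| + |U_j|), and that the weighting in (5) is a simple product. The data layout (a dictionary mapping each user to a dictionary of criterion ratings) and all function names are hypothetical and chosen only for illustration.

```python
import math

def criterion_similarity(ratings_i, ratings_j, criterion):
    """Euclidean-distance-based similarity on one criterion, over co-rating users."""
    common = ratings_i.keys() & ratings_j.keys()
    if not common:
        return 0.0  # assumption: no co-rating users means no evidence of similarity
    dist = math.sqrt(sum(
        (ratings_i[u][criterion] - ratings_j[u][criterion]) ** 2 for u in common
    ))
    return 1.0 / (1.0 + dist)  # assumed distance-to-similarity conversion

def dice_coefficient(ratings_i, ratings_j):
    """Standard Dice coefficient on the sets of users who rated each item."""
    total = len(ratings_i) + len(ratings_j)
    if total == 0:
        return 0.0
    return 2.0 * len(ratings_i.keys() & ratings_j.keys()) / total

def mc_item_similarity(ratings_i, ratings_j, criteria):
    """Worst-case (minimum) partial similarity, weighted by the Dice coefficient."""
    worst = min(criterion_similarity(ratings_i, ratings_j, c) for c in criteria)
    return worst * dice_coefficient(ratings_i, ratings_j)  # assumed product form

# Two items rated identically by their two common users; item_a has one extra rater.
item_a = {"u1": {"story": 5, "acting": 4},
          "u2": {"story": 3, "acting": 3},
          "u3": {"story": 2, "acting": 1}}
item_b = {"u1": {"story": 5, "acting": 4},
          "u2": {"story": 3, "acting": 3}}
similarity = mc_item_similarity(item_a, item_b, ["story", "acting"])
```

In this toy case the partial similarities are both 1, so the final value is determined entirely by the Dice weight 2·2/(3+2), which shows how the weighting discounts item pairs with few co-rating users.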

B. The Computation of Item-based Semantic Similarity
The item taxonomy is used to exploit the semantic relationships among items. To form such a taxonomy in a particular domain: 1) the total number of main item categories should be identified; 2) the main item categories should be created; and 3) each item should be assigned to one or more appropriate main categories. Formally, every item is modeled as a vector of binary values in {0,1}, as depicted by (6):
$$\vec{i} = (c_{i,1}, c_{i,2}, \dots, c_{i,t}) \quad (6)$$
where $\vec{i}$ is the binary vector representation of item i, $c_{i,k}$ equals 1 if item i belongs to main category k and 0 otherwise, and t is the overall number of main item categories. The item-based semantic similarity between two items i and j is computed using the standard vector-based cosine similarity [8], as shown in (7):
$$SemSim(i,j) = \frac{\vec{i} \cdot \vec{j}}{\lVert \vec{i} \rVert \, \lVert \vec{j} \rVert} \quad (7)$$
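As an illustration of the semantic similarity step, the following minimal Python sketch builds binary category vectors and compares them with the standard cosine similarity. The category names, the helper names and the choice to return 0 for an item with no category are assumptions made for this example only.

```python
import math

def category_vector(item_categories, all_categories):
    """Binary vector: 1 if the item belongs to the category, 0 otherwise."""
    return [1 if c in item_categories else 0 for c in all_categories]

def semantic_similarity(vec_i, vec_j):
    """Standard cosine similarity between two binary category vectors."""
    dot = sum(a * b for a, b in zip(vec_i, vec_j))
    norm_i = math.sqrt(sum(a * a for a in vec_i))
    norm_j = math.sqrt(sum(b * b for b in vec_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # assumption: an item without categories is similar to nothing
    return dot / (norm_i * norm_j)

# Hypothetical three-category taxonomy.
categories = ["Action", "Drama", "Fantasy"]
movie_a = category_vector({"Action", "Drama"}, categories)  # [1, 1, 0]
movie_b = category_vector({"Drama"}, categories)            # [0, 1, 0]
sem_sim = semantic_similarity(movie_a, movie_b)
```

Because this similarity depends only on the taxonomy and not on ratings, it remains available for an item that has received few or no ratings, which is the property the algorithm relies on for the new item problem.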

C. The Computation of Rating Predictions
The prediction of an unrated item x for an active user a consists of two major steps. First, we use the weighted sum of deviations from the mean approach [27] to compute the rating prediction for each unrated item twice: 1) using the MC item-based CF similarity as specified by (8); and 2) using the item-based semantic similarity as specified by (9). Finally, the above rating predictions are merged using the weighted harmonic mean aggregation method as given by (10).
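The two prediction steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the first function uses the common form of the weighted sum of deviations from the mean, and the second uses a generic weighted harmonic mean with equal weights by default. The actual weights in (10), and the fallback to a single prediction when the other is unavailable, are not given in the text and are assumptions of this sketch, as are all function and parameter names.

```python
def predict_from_neighbors(target_item_mean, neighbors):
    """Weighted sum of deviations from the mean.

    neighbors: list of (similarity, active user's rating of the neighbor item,
    mean rating of the neighbor item). Returns None when no prediction is possible.
    """
    numerator = sum(sim * (rating - mean) for sim, rating, mean in neighbors)
    denominator = sum(abs(sim) for sim, _, _ in neighbors)
    if denominator == 0:
        return None
    return target_item_mean + numerator / denominator

def weighted_harmonic_mean(p_mc, p_sem, w_mc=0.5, w_sem=0.5):
    """Merge the two predictions; weights are hypothetical (equal by default)."""
    if p_mc is None:
        return p_sem  # assumed fallback: use whichever prediction exists
    if p_sem is None:
        return p_mc
    if p_mc <= 0 or p_sem <= 0:
        raise ValueError("the harmonic mean requires positive predictions")
    return (w_mc + w_sem) / (w_mc / p_mc + w_sem / p_sem)

# Toy example: the target item's mean rating is 3.0 and it has two neighbors.
p_mc = predict_from_neighbors(3.0, [(0.8, 4.0, 3.5), (0.2, 2.0, 3.0)])
merged = weighted_harmonic_mean(4.0, 2.0)
```

The harmonic mean is dominated by the smaller of its inputs, so the merged value is high only when both individual predictions are high.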

IV. EXPERIMENTAL SETUP AND RESULTS

A. Dataset and Evaluation Metrics
To validate the performance of the proposed IMCCF recommendation algorithm, we use the Yahoo! Movies MC dataset [28], which was collected from the Yahoo! Movies website (http://movies.yahoo.com). Each record of the rating data includes ratings for four criteria (story, acting, direction and visuals), in addition to an overall rating, a user ID, and a movie ID. The Yahoo! Movies MC dataset consists of 34,800 ratings from 1,716 users on 965 movies. The ratings are on a scale from 1 to 5. We built a hierarchical movie taxonomy tree with two levels. The first level contains the main categories of items, referred to as movie genres, to which every item must be attached. The second level contains the items, referred to as movies, as leaf nodes. The movie genres comprise 32 attributes, such as Action, Drama, and Fantasy.
To evaluate the quality of the proposed algorithm, the recommendations produced were evaluated using: 1) the Mean Absolute Error (MAE) metric to measure the prediction accuracy (note that the lower the MAE, the higher the prediction accuracy), and 2) the Coverage metric to evaluate the capability of a given recommendation algorithm to produce recommendations (refer to [29] for more details on the metrics).
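The two metrics can be illustrated with a short Python sketch. It assumes the common definitions: MAE is the average absolute difference between predicted and actual ratings over the test cases for which a prediction was produced, and coverage is the fraction of test cases for which a prediction could be produced. The use of None for a missing prediction and the function names are conventions of this example.

```python
def mean_absolute_error(results):
    """results: list of (predicted, actual); entries with predicted=None are skipped."""
    errors = [abs(pred - actual) for pred, actual in results if pred is not None]
    if not errors:
        return None  # no prediction was produced, so MAE is undefined
    return sum(errors) / len(errors)

def coverage(results):
    """Fraction of test ratings for which the algorithm produced a prediction."""
    if not results:
        return 0.0
    produced = sum(1 for pred, _ in results if pred is not None)
    return produced / len(results)

# Four held-out ratings; the algorithm could not predict the third one.
results = [(4.5, 5), (3.0, 2), (None, 4), (2.0, 2)]
mae = mean_absolute_error(results)
cov = coverage(results)
```

Reporting the two values together matters because an algorithm can lower its MAE simply by declining to predict difficult cases, which would show up as lower coverage.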

B. Benchmark algorithms
For benchmark purposes, we compare the results of the proposed IMCCF algorithm with the results of two widely used item-based CF algorithms: 1) the item-based CF based on cosine similarity proposed by [8] (denoted as VC-ICF); and 2) the item-based CF based on adjusted cosine similarity proposed by [30] (denoted as AVC-ICF).
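For reference, the two benchmark similarity measures can be sketched in Python using their usual textbook definitions on single (overall) ratings. This is not taken from [8] or [30] directly: restricting the computation to co-rating users, the data layout (a dictionary from user to rating) and the function names are assumptions of this example.

```python
import math

def cosine_item_similarity(ratings_i, ratings_j):
    """Cosine similarity between two items over the users who rated both."""
    common = ratings_i.keys() & ratings_j.keys()
    dot = sum(ratings_i[u] * ratings_j[u] for u in common)
    norm_i = math.sqrt(sum(ratings_i[u] ** 2 for u in common))
    norm_j = math.sqrt(sum(ratings_j[u] ** 2 for u in common))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)

def adjusted_cosine_item_similarity(ratings_i, ratings_j, user_means):
    """Cosine similarity after subtracting each user's mean rating."""
    common = ratings_i.keys() & ratings_j.keys()
    dev_i = {u: ratings_i[u] - user_means[u] for u in common}
    dev_j = {u: ratings_j[u] - user_means[u] for u in common}
    dot = sum(dev_i[u] * dev_j[u] for u in common)
    norm_i = math.sqrt(sum(d ** 2 for d in dev_i.values()))
    norm_j = math.sqrt(sum(d ** 2 for d in dev_j.values()))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)

# Two users with different rating habits (hypothetical values).
item_i = {"u1": 4, "u2": 2}
item_j = {"u1": 2, "u2": 1}
user_means = {"u1": 3, "u2": 2}
vc = cosine_item_similarity(item_i, item_j)
avc = adjusted_cosine_item_similarity(item_i, item_j, user_means)
```

In this toy case the plain cosine treats the two items as perfectly similar because their rating vectors are proportional, whereas the adjusted cosine, which removes each user's rating bias, gives a negative value.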

C. Experimental results
Two main experiments were performed to demonstrate the improvement achieved by the proposed IMCCF recommendation algorithm in prediction accuracy and recommendation coverage when faced with the sparsity and new item challenges.

1) Evaluating the Prediction Accuracy and Recommendation Coverage of the IMCCF on the Sparsity Problem. In this experiment, we verify the efficiency of the proposed IMCCF algorithm compared with the benchmark algorithms in reducing the impact of the sparsity problem. As shown in Fig. 1 and Fig. 2, the proposed IMCCF algorithm outperforms the benchmark algorithms by obtaining the highest prediction accuracy (i.e., lowest MAE) and the maximum recommendation coverage at all sparsity levels. To conclude, the results indicate that the proposed IMCCF algorithm considerably lessens the effect of the sparsity and new item problems in comparison to the benchmark algorithms.

V. CONCLUSION AND FUTURE WORK
This paper proposes an Item-based Multi-Criteria Collaborative Filtering algorithm that integrates the items' semantic information and the multi-criteria ratings of items to lessen known obstacles of item-based CF techniques. The experimental results, in comparison to the benchmark item-based CF algorithms, show that the proposed IMCCF algorithm is effective in dealing with both the sparsity and new item problems with respect to prediction accuracy and recommendation coverage. The proposed IMCCF algorithm enhances the quality of the produced recommendations by exploiting the added information obtained from both the multi-criteria ratings of users and the semantic relationships among items to address the sparsity and new item limitations. In the future, we will focus on further validating the performance of the proposed algorithm against more benchmark CF-based algorithms on larger data sets.
In (8) and (9), $\bar{r}_x$ and $\bar{r}_n$ denote the mean values of the ratings of items x and n, respectively. The similarity weights are the MC item-based and semantic similarities between the items x and n, respectively. The nearest neighbors of the target item x are identified according to the MC item-based CF and item-based semantic similarity weights, and $r_{a,n}$ denotes the mean rating value, based on all rating criteria, of item n by the active user a.

Fig. 1. Comparing the prediction accuracy of each algorithm on different levels of sparsity
Fig. 2. Comparing the recommendation coverage of each algorithm on different levels of sparsity

Fig. 3. Comparing the prediction accuracy of each algorithm on specific numbers of ratings of new items

Fig. 4. Comparing the recommendation coverage of each algorithm on different numbers of ratings for new items
... to guarantee that a high rating value of the ...