Improving the Prediction Accuracy of Multicriteria Collaborative Filtering by Combination Algorithms

This study focuses on developing a multicriteria collaborative filtering algorithm to improve prediction accuracy. The approaches applied were decomposition of the user-item multirating matrix, measurement of user similarity using the cosine formula and multidimensional distance, calculation of individual criteria weights, and rating prediction for the overall criterion by a combination approach. The study produced variations of the multicriteria collaborative filtering algorithm, used to improve a document recommender system, with the following two characteristics: first, rating prediction for four individual criteria using a collaborative filtering algorithm with cosine-based user similarity and with multidimensional distance-based user similarity; second, rating prediction for the overall criterion using combination algorithms. Based on the test results, the models developed for multicriteria collaborative filtering had much better prediction accuracy than classic collaborative filtering, as indicated by smaller Mean Absolute Error values. The best accuracy was achieved by the multicriteria collaborative filtering system with multidimensional distance-based similarity.

Keywords: Algorithm; multicriteria collaborative filtering; document; recommendation; system; similarity; multidimensional distance; decomposition; combination; prediction; accuracy


I. INTRODUCTION
In computer science, recommender systems are a relatively new domain of study. Initially, recommender systems were only a topic within several other fields, such as cognitive science, approximation theory, information retrieval, forecasting theory and management science. In the mid-1990s, recommender systems became an independent domain of study, when researchers began to focus on recommendation problems using collaborative filtering [1] [2].
The working principle of a collaborative filtering algorithm is to generate recommendations for an active user based on the opinion history of a group of users who are similar to the active user. The users' opinions are given explicitly in the form of rating values [2] [3]. To select new items to recommend to the active user, the system first predicts a rating for every new item that the active user has not yet rated. Only the items with the highest predicted values are included in the list of recommendations.
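The selection step described above can be sketched as follows. This is a minimal illustration, assuming the predicted ratings for the active user are already available in a dictionary keyed by document id; the function name `recommend_top_n` and the parameter `n` are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set


def recommend_top_n(predicted: Dict[str, float], already_rated: Set[str], n: int) -> List[str]:
    """Return up to n items, not yet rated by the active user, with the highest predicted rating.

    `predicted` maps item id -> predicted rating (hypothetical input format).
    """
    # Keep only the new items, i.e. those the active user has not rated yet.
    candidates = [(item, score) for item, score in predicted.items() if item not in already_rated]
    # Highest predicted value first; sorted() is stable, so ties keep their input order.
    candidates = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in candidates[:n]]


# Usage with made-up predicted values for three unrated documents.
print(recommend_top_n({"d3": 3.8, "d4": 4.6, "d5": 2.9}, already_rated=set(), n=2))
```

The length of the list (`n`) is a free choice here; the text only states that the items with the highest predicted values are recommended.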
The main problem faced by collaborative filtering-based recommender systems is prediction accuracy [4]. Many researchers have worked on improving accuracy, both by developing prediction techniques and by handling the cold-start problem. In this paper, we describe how the prediction algorithm of a recommender system was engineered using multicriteria collaborative filtering to improve prediction accuracy, including a new approach to user similarity measurement. The metric used to measure prediction accuracy is the Mean Absolute Error, defined as [2] [5]:

$$\mathrm{MAE} = \frac{1}{c}\sum_{i=1}^{c}\left|p_{ui} - r_{ui}\right| \tag{1}$$

where $p_{ui}$ is the predicted value of user u on item i, $r_{ui}$ is the rating value given by user u on item i, and c is the number of items.
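As a small illustration of the MAE definition, the sketch below averages the absolute differences between predicted and actual ratings over the c test items. The function name and the list-based inputs are assumptions made for this example.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """MAE = (1/c) * sum over the c test items of |p_ui - r_ui|."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    c = len(predicted)
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / c


# Usage: three test items with errors 1.0, 0.5 and 0.0 give an MAE of 0.5.
print(mean_absolute_error([4.0, 3.5, 2.0], [5, 3, 2]))
```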
The rest of the paper is organized as follows. Section 2 explains why multicriteria collaborative filtering is needed in recommender systems. Section 3 describes the proposed modification of the multirating prediction algorithm. The prediction accuracy tests are presented in Section 4, and the test results are discussed in Section 5. Section 6 concludes the paper.

II. THE URGENCY OF MULTICRITERIA COLLABORATIVE FILTERING (MCF) DEVELOPMENT
So far, the collaborative filtering approach has mostly been applied in recommender systems that use only one criterion to represent a user's opinion of an item [6] [7]. For example, when a user gives a document a rating of 5, the value does not show which criteria the rating was based on; therefore, several users may give the same value while using different criteria. This is called the without-distinction-of-interest problem [8]-[10].
www.ijacsa.thesai.org
To solve this problem, the use of different criteria in giving ratings has been proposed; this is called multicriteria collaborative filtering [7]. The approach is a variation of collaborative filtering that uses many criteria to represent users' ratings of interest. The idea has been applied by Zagat's Guide, which rates restaurants on three criteria (food, decor and service), while Buy.com used a multicriteria rating system for electronic devices, including display size, performance, battery life and cost. Yahoo! Movies used four criteria: story, action, direction and visuals [1].
The use of many criteria in collaborative filtering has been shown to generate recommendations of better quality that more closely match users' needs. The improvement in quality is indicated by higher prediction accuracy when the multiple criteria match users' tendencies [10] [11]. However, this concept introduces a new problem because it does not weight the criteria to reflect users' preferences; this is often called the without-weight-feature problem [8]. To address it, weights are assigned to several criteria regarded as high priority, and the weighting is static. The remaining criteria, regarded as unimportant, are ignored and not involved in determining the rating.
The static weighting of several criteria and the disregard of the others can harm the system by lowering prediction accuracy, because users' preferences evolve collectively and dynamically. Therefore, a multicriteria collaborative filtering method is needed that can update the criteria weights adaptively as users' collective preferences develop. The weight-updating mechanism should accommodate all of the defined criteria, however small their effect on the collaborative process. For this purpose, a variation of the multirating prediction algorithm is developed that combines classical collaborative filtering with criteria weight calculation. The use of many criteria in collaborative filtering also motivated a modified user similarity measure based on the concept of multidimensional distance.

III. PROPOSED MCF PREDICTION ALGORITHM
In classical collaborative filtering, the user profile is represented by a user-item matrix in which each cell R(u,i) holds the rating given by user u to item i, with the value 0 indicating that the user has never rated the item [12] [13]. Multicriteria collaborative filtering also uses this matrix to represent user profiles, but each user gives several ratings to each item: one per defined criterion plus one overall rating. Thus, if k criteria are defined, each user gives k+1 ratings per item. In this study, the selected objects were scientific documents with four criteria: topic (k1), novelty (k2), recency (k3) and author (k4). The user profile was therefore represented by a user-item multirating matrix in which each cell consists of five rating values, four for the individual criteria and one for the overall criterion (ku).
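A minimal sketch of the multirating matrix, assuming a nested Python list with shape users × items × (k+1) and the value 0 for "not rated", mirroring the 0 convention of the single-criterion matrix. The storage layout and the helper names (`empty_multirating_matrix`, `set_multirating`, `is_rated`) are hypothetical; the paper does not say how the matrix is stored.

```python
from typing import List, Sequence

# Four individual criteria (k1..k4) plus the overall criterion (ku), in cell order.
CRITERIA = ("topic", "novelty", "recency", "author", "overall")

MultiMatrix = List[List[List[float]]]  # users x items x (k + 1); 0 means "not rated"


def empty_multirating_matrix(n_users: int, n_items: int, k: int = 4) -> MultiMatrix:
    """Create a users x items matrix whose cells each hold k + 1 zero ratings."""
    return [[[0.0] * (k + 1) for _ in range(n_items)] for _ in range(n_users)]


def set_multirating(R: MultiMatrix, u: int, i: int, ratings: Sequence[float]) -> None:
    """Store the k individual ratings followed by the overall rating for user u on item i."""
    if len(ratings) != len(R[u][i]):
        raise ValueError(f"expected {len(R[u][i])} ratings, got {len(ratings)}")
    R[u][i] = [float(r) for r in ratings]


def is_rated(R: MultiMatrix, u: int, i: int) -> bool:
    """A cell counts as rated if any of its k + 1 values is non-zero."""
    return any(r != 0 for r in R[u][i])


# Usage: 2 users, 3 documents; user 0 rates document 1 on topic, novelty, recency, author, overall.
R = empty_multirating_matrix(2, 3)
set_multirating(R, 0, 1, [4, 3, 5, 2, 4])
print(is_rated(R, 0, 1), is_rated(R, 1, 1))  # True False
```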

A. User Neighborhood Formation
User neighborhoods are formed based on user similarity values. Similarity in this context refers to how alike users' track records are in rating a group of documents. Multicriteria collaborative filtering opens room for new ideas in calculating user similarity: besides the cosine formula, user similarity can also be measured using the concept of multidimensional distance.
To explain how similarity is measured with both models, Fig. 1 gives an example user-document multirating matrix containing eight users (u1, u2, u3, u4, u5, u6, u7, u8) and five documents (d1, d2, d3, d4, d5). Each document is rated on four individual criteria and one overall criterion, written k1, k2, k3, k4 and ku. Suppose the active user is u4, and the documents of interest are d3, d4 and d5.
The task of the recommender system is to predict the ratings u4 would give to these three documents and then recommend the documents with the highest predicted values to u4. To measure criterion-based user similarity, the first step is to decompose the multicriteria problem into single-criterion problems. Decomposing the multiratings in Fig. 1 yields five single-criterion matrices, shown in Fig. 2, for the criteria k1, k2, k3, k4 and ku respectively. Once the five matrices are obtained, the next step is to measure user similarity for each criterion using the cosine formula:

$$\mathrm{sim}_k(u,v) = \frac{\sum_{i \in I} r^{k}_{u,i}\, r^{k}_{v,i}}{\sqrt{\sum_{i \in I} \left(r^{k}_{u,i}\right)^2}\,\sqrt{\sum_{i \in I} \left(r^{k}_{v,i}\right)^2}} \tag{2}$$

where $r^{k}_{u,i}$ is the rating given by user u to document i on criterion k (0 if unrated) and I is the set of documents.

The algorithm of user similarity measurement using the cosine formula can be written as follows:
Input: ratings matrix R(u,i)
Output: similarity(u1,u2,criterion)
1 Set First User and Second User (u1,u2)

Meanwhile, user similarity based on the concept of multidimensional distance is measured in three steps, as follows.

The first step is to calculate the distance between two users for each co-rated document; the more documents the two users have co-rated, the more distance values are obtained. For example, if the multiratings of user u are (r1, r2, r3, r4, ru) and those of user v are (r'1, r'2, r'3, r'4, r'u), the multidimensional distance between users u and v for one document i, written $d_i(u,v)$, is calculated using the Manhattan formula [14]:

$$d_i(u,v) = |r_1 - r'_1| + |r_2 - r'_2| + |r_3 - r'_3| + |r_4 - r'_4| + |r_u - r'_u| \tag{3}$$

The second step is to calculate the multidimensional distance between two users over the members of D(u,v), the set of documents co-rated by users u and v. This distance, written $d_{total}(u,v)$, is the average of all $d_i(u,v)$:

$$d_{total}(u,v) = \frac{1}{|D(u,v)|}\sum_{i \in D(u,v)} d_i(u,v) \tag{4}$$

The third step is to convert the multidimensional distance obtained in the second step into a similarity value. The relation between multidimensional distance and similarity was stated in [14] as:

$$\mathrm{sim}(u,v) = \frac{1}{1 + d_{total}(u,v)} \tag{5}$$

The algorithm of user similarity measurement using the concept of multidimensional distance can be written as follows:
Input: ratings matrix R(u,i)
Output: similarity(u1,u2)
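The decomposition and the two similarity measures described above can be sketched as follows, assuming a zero-encoded users × documents × (k+1) nested list as the multirating matrix (a hypothetical layout). Two details are assumptions where the text leaves room: the cosine is taken over whole rows, so unrated cells contribute 0, and a document counts as co-rated when both users have at least one non-zero value in its cell. Function names are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]             # users x documents for one criterion; 0 = not rated
MultiMatrix = List[List[List[float]]]  # users x documents x (k + 1)


def decompose(R: MultiMatrix) -> List[Matrix]:
    """Split the multirating matrix into k + 1 single-criterion matrices."""
    n_criteria = len(R[0][0])
    return [[[cell[c] for cell in row] for row in R] for c in range(n_criteria)]


def cosine_similarity(M: Matrix, u: int, v: int) -> float:
    """Vector cosine between the rows of users u and v in one single-criterion matrix."""
    dot = sum(a * b for a, b in zip(M[u], M[v]))
    norm_u = math.sqrt(sum(a * a for a in M[u]))
    norm_v = math.sqrt(sum(b * b for b in M[v]))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def multidimensional_similarity(R: MultiMatrix, u: int, v: int) -> float:
    """Manhattan distance per co-rated document, averaged, then mapped to 1 / (1 + d_total)."""
    distances = []
    for cell_u, cell_v in zip(R[u], R[v]):
        if any(cell_u) and any(cell_v):  # document co-rated by u and v
            distances.append(sum(abs(a - b) for a, b in zip(cell_u, cell_v)))
    if not distances:
        return 0.0  # no co-rated documents: treated as dissimilar (an assumption)
    d_total = sum(distances) / len(distances)
    return 1.0 / (1.0 + d_total)


# Usage: two users, three documents, five ratings per cell (k1..k4, ku).
R = [
    [[5, 4, 3, 4, 4], [3, 3, 3, 3, 3], [0, 0, 0, 0, 0]],  # user 0
    [[4, 4, 3, 5, 4], [0, 0, 0, 0, 0], [2, 2, 2, 2, 2]],  # user 1
]
per_criterion = decompose(R)
print(per_criterion[0])                         # topic matrix: [[5, 3, 0], [4, 0, 2]]
print(cosine_similarity(per_criterion[0], 0, 1))
print(multidimensional_similarity(R, 0, 1))     # only document 0 is co-rated: d = 2, sim = 1/3
```

Because the multidimensional measure looks only at co-rated documents and at all five ratings at once, it can rank neighbours differently from the per-criterion cosine even on the same data.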

B. Prediction Algorithm
The overall criterion rating is predicted in three steps, described below.

1) Four individual criteria rating prediction
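After the multirating database is formed, the user neighborhood can be formed and ratings predicted for the four document criteria k1, k2, k3 and k4, using the similarity-based prediction formula:

$$r'_{u,i} = \bar{r}_u + \frac{\sum_{v \in N(u)} \mathrm{sim}(u,v)\,\left(r_{v,i} - \bar{r}_v\right)}{\sum_{v \in N(u)} \left|\mathrm{sim}(u,v)\right|} \tag{6}$$

where:
$r_{v,i}$: rating value given by user v on item i.
$\bar{r}_u$, $\bar{r}_v$: average rating values of users u and v.
$\mathrm{sim}(u,v)$: similarity between users u and v.
$N(u)$: neighborhood of user u.

The formula is applied separately to each single-criterion matrix. The output of this step is four predicted rating values (r'1, r'2, r'3, r'4) for each document.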
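The neighbourhood-based prediction for one criterion can be sketched as below. It assumes the common mean-centred (Resnick-style) form of similarity-based prediction, a zero-encoded users × documents matrix for a single criterion, and a similarity supplied as a function (cosine or multidimensional distance-based). Neighbourhood selection, such as how many neighbours are kept, is not specified in the text and is left to the caller. All names are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence

Matrix = List[List[float]]  # users x documents for ONE criterion; 0 means "not rated"


def mean_rating(row: Sequence[float]) -> float:
    """Average of the non-zero (rated) entries of a user's row."""
    rated = [r for r in row if r != 0]
    return sum(rated) / len(rated) if rated else 0.0


def predict_rating(
    M: Matrix,
    sim: Callable[[int, int], float],
    u: int,
    i: int,
    neighbours: Iterable[int],
) -> float:
    """Mean-centred, similarity-weighted prediction of user u's rating on item i."""
    r_u = mean_rating(M[u])
    numerator = 0.0
    denominator = 0.0
    for v in neighbours:
        # Only neighbours who actually rated item i contribute.
        if v == u or M[v][i] == 0:
            continue
        s = sim(u, v)
        numerator += s * (M[v][i] - mean_rating(M[v]))
        denominator += abs(s)
    # With no usable neighbour, fall back to the user's own mean rating.
    return r_u if denominator == 0 else r_u + numerator / denominator


# Usage: predict user 0's topic rating on document 2 from users 1 and 2, with made-up similarities.
topic = [[4, 2, 0], [5, 3, 5], [3, 1, 1]]
similarities = {1: 0.9, 2: 0.1}
print(predict_rating(topic, lambda u, v: similarities[v], 0, 2, [1, 2]))
```

Running the same function once per single-criterion matrix yields the four predictions r'1 to r'4 for a document.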

2) The calculation of criteria weight
Alongside the prediction of the four individual criterion ratings, the relation between the four individual criterion ratings (r1, r2, r3, r4) and the overall criterion rating (ru) was computed in parallel from the available multirating database using an artificial neural network. The output of this step was four criteria weights and one constant e, i.e. b1, b2, b3, b4 and e.

3) The prediction of overall criteria rating
The last step is to predict the overall criterion rating (r'u), no longer using similarity values but using the four individual criterion predictions (r'1, r'2, r'3, r'4) from the first step, the four criteria weights (b1, b2, b3, b4) from the second step and the constant e, so that the overall criterion value is:

$$r'_u = b_1 r'_1 + b_2 r'_2 + b_3 r'_3 + b_4 r'_4 + e \tag{7}$$
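The weight-learning and combination steps can be sketched as follows. The paper uses an artificial neural network but does not describe its architecture in this section, so this sketch assumes the simplest network consistent with four weights and one constant: a single linear neuron (b1..b4 as weights, e as bias) trained by batch gradient descent on the squared error. The learning rate, the epoch count and the function names are hypothetical choices.

```python
from typing import List, Sequence, Tuple


def learn_criteria_weights(
    X: Sequence[Sequence[float]],  # rows of individual ratings (r1, r2, r3, r4)
    y: Sequence[float],            # matching overall ratings ru
    lr: float = 0.01,
    epochs: int = 5000,
) -> Tuple[List[float], float]:
    """Fit ru ~ b1*r1 + b2*r2 + b3*r3 + b4*r4 + e with one linear neuron (batch gradient descent)."""
    m, n = len(X), len(X[0])
    b = [0.0] * n
    e = 0.0
    for _ in range(epochs):
        grad_b = [0.0] * n
        grad_e = 0.0
        for xs, target in zip(X, y):
            error = sum(w * x for w, x in zip(b, xs)) + e - target
            for j, x in enumerate(xs):
                grad_b[j] += error * x
            grad_e += error
        # Step against the mean-squared-error gradient (the constant factor 2 is folded into lr).
        b = [w - lr * g / m for w, g in zip(b, grad_b)]
        e -= lr * grad_e / m
    return b, e


def predict_overall(r_pred: Sequence[float], b: Sequence[float], e: float) -> float:
    """Overall rating r'u = b1*r'1 + b2*r'2 + b3*r'3 + b4*r'4 + e."""
    return sum(w * r for w, r in zip(b, r_pred)) + e


# Usage with made-up multiratings whose overall value is a known weighted sum.
X = [[5, 4, 3, 4], [3, 3, 4, 2], [4, 5, 5, 3], [2, 1, 2, 1],
     [1, 2, 1, 3], [5, 5, 4, 5], [3, 2, 1, 4], [4, 3, 5, 2]]
y = [0.4 * r1 + 0.3 * r2 + 0.2 * r3 + 0.1 * r4 for r1, r2, r3, r4 in X]
weights, constant = learn_criteria_weights(X, y)
print(predict_overall([4.2, 3.9, 3.5, 4.0], weights, constant))
```

Once the weights are learned, predicting the overall rating is a handful of multiplications and additions, while retraining can be run periodically as new ratings arrive.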

IV. EXPERIMENTS
To test prediction accuracy, several conditions representing the recommender system were selected, namely points at which the numbers of users and documents reached certain amounts with certain sparsity levels. The experimental scenario was as follows:
1) The Mean Absolute Error (MAE) was measured for each criterion.
3) The first condition selected for measurement was the point at which the cold-start problem was first overcome, with 50 registered users and 100 documents. The second was a middle condition, with 100 users and 200 documents; in this condition there were many interactions between users and the system, with significant additions of new users and documents. The last condition was 200 registered users and 400 documents.
4) Ratings for the four individual criteria were predicted using both cosine-based similarity and multidimensional distance-based similarity.
5) The experiment was repeated 10 times for each sparsity level.
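The text does not say how the different sparsity levels were produced. As one possible way to reproduce the scenario, the sketch below assumes sparsity is the share of empty cells in a user-document matrix and thins a matrix to a target level by blanking randomly chosen rated cells, using one random seed per repetition of the 10 runs. The helper names and the random-masking procedure are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]  # users x documents for one criterion; 0 means "not rated"


def sparsity(M: Matrix) -> float:
    """Share of empty (zero) cells in the matrix."""
    total = sum(len(row) for row in M)
    empty = sum(1 for row in M for r in row if r == 0)
    return empty / total


def mask_to_sparsity(M: Matrix, level: float, seed: int = 0) -> Matrix:
    """Return a copy of M with random rated cells blanked until `level` of all cells are empty.

    If M is already at least that sparse, the copy is returned unchanged.
    """
    rng = random.Random(seed)
    out = [list(row) for row in M]
    rated = [(u, i) for u, row in enumerate(out) for i, r in enumerate(row) if r != 0]
    total = sum(len(row) for row in out)
    to_blank = max(0, round(level * total) - (total - len(rated)))
    for u, i in rng.sample(rated, min(to_blank, len(rated))):
        out[u][i] = 0
    return out


# Usage: a fully rated 50 x 100 matrix thinned to 60% sparsity, one repetition per seed.
full = [[3] * 100 for _ in range(50)]
runs = [mask_to_sparsity(full, 0.6, seed=s) for s in range(10)]
print([sparsity(m) for m in runs][:3])
```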

C. The Prediction Accuracy of Classic Collaborative Filtering
Testing the prediction accuracy of classic collaborative filtering was necessary as a baseline; the results are shown in Fig. 3. The graph shows that the lower the sparsity level of a matrix, the lower the Mean Absolute Error (MAE). The same trend occurred as the numbers of users and documents grew, provided that the growth was accompanied by rating activity. Adding new users and documents to a system without corresponding rating activity increases the matrix sparsity and thus reduces the prediction quality of the system.

D. The Prediction Accuracy of Multicriteria Collaborative Filtering (MCF) Model
The Mean Absolute Error of multicriteria collaborative filtering was measured for two model variations, according to the approach used to predict the four individual criterion ratings. The first variation predicts the four individual criterion ratings based on similarity measured with the cosine formula, as is usual in classic collaborative filtering, while the second variation uses multidimensional distance-based similarity in its prediction process. For the overall rating prediction, both models use the same combination technique. The MAE results for the first multicriteria collaborative filtering model are shown in Fig. 4.
The MAE results confirm the conclusion of previous researchers that the more users actively give ratings, the more accurate the recommendations produced by a collaborative filtering algorithm. Conversely, even if many users are registered in a system, if most of them do not actively give ratings, the collaborative principle that is the core strength of recommender systems is weakened.
In general, the best prediction was obtained in the 200x400 condition with a sparsity level of 10%, while the worst results occurred in the 50x100 condition with a sparsity level of 60%. There was no significant difference in MAE among the four document criteria (topic, novelty, recency and author). However, similar MAE trends do not necessarily mean that the rating values given to the four document criteria were uniform; the similarity may rather have been caused by the homogeneity of the users involved in the testing.
The MAE results also show that the overall criterion prediction is more accurate than the predictions for the four individual criteria, as indicated by lower MAE values at every sparsity level. Therefore, rating prediction using a combination approach gives more accurate results than a pure collaborative filtering approach. In testing this model, the lowest MAE value was 0.6537, recorded with 200x400 users and documents and a sparsity level of 10%.
The MAE results for the second model are shown in Fig. 5. Across the five criteria, the four individual document criteria showed similar trends in predicted values, while the overall criterion had better prediction accuracy. As in the first model, the second MCF model produced the best prediction for all criteria in the 200x400 condition with a sparsity level of 10%, and the worst results in the 50x100 condition with the higher sparsity level of 60%.
For the individual criteria, the lowest MAE was 0.6500, obtained for the topic criterion with 200x400 users and documents and a sparsity level of 10%. For the other three individual criteria (novelty, recency and author), the lowest MAE values were 0.6550, 0.6566 and 0.6540 respectively. These four values do not differ significantly. An unstable condition was observed with 50x100 users and documents at a sparsity level of 20%. The MAE results of multidimensional distance-based multicriteria collaborative filtering also show that rating prediction for the overall criterion is more accurate than prediction for the four individual criteria. The best MAE for the second model was 0.6229, measured in the 200x400 condition with a sparsity level of 10%. This is lower than the best MAE of the first MCF model, 0.6537, recorded with the same numbers of users and documents and the same sparsity level. Considering all MAE results, the second MCF model produced more accurate predictions than the first, for both the four individual criteria and the overall criterion.
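For a sense of scale, the relative reduction in the best overall-criterion MAE from the first MCF model to the second can be computed directly from the two values reported above (a derived figure, not one stated in the experiments):

$$\frac{0.6537 - 0.6229}{0.6537} = \frac{0.0308}{0.6537} \approx 0.047$$

That is, the second model's best MAE is roughly 4.7% lower under the same 200x400 condition at 10% sparsity.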

V. DISCUSSIONS
Theoretically, prediction in collaborative filtering is based on similarity values. However, when more than one criterion is used, the overall rating prediction can be modified by combining collaborative filtering with a model that searches for criteria weights. This approach requires a large user-item rating database. The MAE results shown in Fig. 3, Fig. 4 and Fig. 5 together indicate that multicriteria collaborative filtering produced more accurate predictions than pure collaborative filtering.
The second model produced more accurate predictions than the first model for all individual criteria and for the overall criterion. This shows that although the cosine formula yielded higher similarity values between users than the multidimensional distance formula, its prediction accuracy was lower. From a computational perspective, the overall criterion prediction is efficient because it consists only of a few simple arithmetic operations. However, there is an additional computational load when searching for the criteria weights with the artificial neural network. The criteria weights can be updated periodically as new rating data arrive.
Besides giving more accurate predictions, MCF also offers an advantage when generating recommendations: documents with high predicted values can be recommended based on combinations of criteria. This is very useful for users who want diverse recommendations.

VI. CONCLUSIONS
In general, the development of a combination prediction algorithm for multicriteria collaborative filtering gives a significant increase in prediction accuracy. The experiments show that the average similarity value measured with the cosine formula was higher than that measured with the concept of multidimensional distance. However, the prediction algorithm using multidimensional distance-based similarity gave more accurate predictions than the model using similarity measured by the cosine formula.

Fig. 3. Graphic of the MAE of Classic Collaborative Filtering

Fig. 4. Graphic of the MAE of First MCF Model. User Similarity for Each Individual Criterion Was Calculated by Cosine Formula

Fig. 5. Graphic of the MAE of Second MCF Model.User Similarity Was Calculated Using the Multidimensional Distance Approach.