Multi-Objective Evolutionary Programming for Developing Recommender Systems based on Collaborative Filtering

In the internet era, online platforms offer users a very large number of items. Users can spend a lot of time searching for items of interest and may still fail to find them. An effective strategy to overcome this problem is a recommender system, one of the most popular applications of machine learning. Recommender systems select the most appropriate items for a specific user based on previous interactions between items and users, and they are developed using different approaches. One of the most successful approaches for developing recommender systems is collaborative filtering, which filters out items that a user might like based on the reactions of users with similar profiles. Traditional recommender systems often consider only precision as the performance metric; however, other metrics (such as recall, diversity and novelty) are also important. Unfortunately, some metrics conflict, e.g., improving precision can negatively affect other metrics. This paper presents a multi-objective evolutionary programming method for developing a recommender system. The method is based on a new collaborative filtering technique and maximizes the recall for a given precision. The new collaborative filtering technique uses three components for recommending an item to a user: 1) clustering of users; 2) a previous memory-based prediction; and 3) five decimal parameters (threshold average clustering, threshold penalty, threshold incentive, weight attached to average clustering and weight attached to Pearson correlation). The multi-objective evolutionary programming optimizes the clustering of users and the five decimal parameters while seeking to maximize both the precision and recall objectives. A comparison between the proposed method and a previous non-evolutionary method shows that the proposed method improves the precision and recall metrics on a benchmark database.
Keywords—Collaborative filtering; clustering; evolutionary programming; multi-objective; recommender systems


I. INTRODUCTION
Huge amounts of user data are generated and collected every day on the web. Given this explosive growth of information, users are often faced with an overwhelming number of choices [1]. A recommender system is an effective tool for reducing the time a user needs to find personalised movies, products, documents, friends, places and services, among others [2]. Recommender systems are also one of the most important and recent research areas in machine learning [3].
The recommendation approaches most commonly used to produce a list of items for a user [4] are content-based filtering, collaborative filtering and hybrid approaches. Content-based filtering relies on the item to define the prediction, i.e., it uses features of the item to recommend similar items. Collaborative filtering is one of the most prominent and popular approaches. It recommends similar items to similar users, where user similarity is based on past behavior, previous purchases, preferences, ratings of other products, average purchase amount, etc. Hybrid filtering combines the two approaches.
Evolutionary computing (or evolutionary algorithms) is a research area within computer science. As the name suggests, it is a special flavour of computing that draws inspiration from the process of natural evolution. The fundamental metaphor of evolutionary computing relates this powerful natural evolution to a particular style of problem solving, that of trial and error [5]. The main branches of evolutionary computing [6] are Genetic Algorithms, Evolutionary Strategies, Differential Evolution, Genetic Programming and Evolutionary Programming. Evolutionary Programming (EP), the branch used in this paper, is a computational optimization method that searches for a globally optimal solution to a given problem. All of these branches have been applied to different trends in recommender systems research [7]. These trends include the use of evolutionary computing for optimizing the weights of recommendation techniques or components and for learning recommendation systems.
The performance of classical recommendation algorithms is usually evaluated by accuracy-related metrics [8], such as precision and recall. By definition, precision and recall are conflicting objectives: recall tries to increase the number of tagged entries as much as possible, i.e., the fraction of relevant information that is retrieved, whereas precision tries to increase the number of correctly tagged entries, i.e., the fraction of retrieved information that is relevant [9]. A common strategy for dealing with this problem is evolutionary multi-objective optimization, which refers to the use of evolutionary algorithms to solve problems with two or more (often conflicting) objective functions [10].
The proposed method, named MOEP-CF, is based on Multi-Objective Evolutionary Programming for developing recommender systems. It uses a new Collaborative Filtering technique and improves the precision and recall objectives. The new collaborative filtering technique is based on three components to recommend an item to a user: 1) clustering of the users; 2) a previous memory-based prediction; and 3) five decimal parameters (threshold average clustering, threshold penalty, threshold incentive, weight attached to average clustering and weight attached to Pearson correlation). The optimization process is guided by a new mutation operator with ten types of mutation, and it improves six components or sub-components of the new collaborative filtering technique: 1) users assigned to each cluster; 2) threshold for penalty; 3) threshold for incentive; 4) threshold for average cluster ranking; 5) weight attached to average cluster ranking; and 6) weight attached to memory-based prediction. The proposed MOEP-CF method is based on a previous method proposed in [11], which is neither evolutionary nor multi-objective.
The main contributions of this work are summarized as follows:
• The new collaborative filtering technique based on three components.
• The use of multi-objective evolutionary programming for developing recommender systems based on the new collaborative filtering technique and improving the precision and recall objectives.
• The new mutation operator with ten types of mutation to improve components or sub-components of the new collaborative filtering technique.
The rest of the paper is structured as follows. Related work is presented in Section II. Section III defines collaborative filtering and gives an example of the clustering-based collaborative filtering algorithm used in this paper for the recommendation process. Section IV reviews evolutionary programming and presents the multi-objective evolutionary programming algorithm used in the proposed method. A detailed description of the proposed method is given in Section V. Section VI shows the experimental results, compares them with the previous method and presents the advantages of the proposed method. Section VII concludes the work and outlines future work.

II. RELATED WORK
The method proposed in this paper is related to four broader research areas, namely CF approaches in recommender systems, clustering-based recommender systems, evolutionary computing in recommender systems, and the analysis of recommender systems in terms of performance metrics such as precision and recall.

A. CF-Based Recommendation Approaches
Collaborative filtering, as one of the most successful recommendation techniques, has been widely studied and applied by various research institutions and industries. However, CF-based approaches often suffer from several shortcomings [16] [17], such as data sparsity, cold start and scalability issues, which seriously affect the efficiency of a recommender system (RS). To overcome these problems, many data mining and machine learning techniques, such as clustering [19] [22] [20] [21] and matrix factorization (surveyed in [18]), have been proposed to improve the performance of RS. Matrix factorization, one of the unsupervised learning methods, can help reduce dimensionality and eventually alleviate data sparsity [16].
Yang et al. [16] and Chen et al. [17] review and summarize the traditional CF-based approaches and techniques used in RS, classify and compare several typical CF algorithms as memory-based and model-based approaches, and study some recent hybrid CF-based recommendation approaches and techniques, including the latest hybrid memory-based and model-based CF recommendation algorithms.

B. Clustering-based Recommender Systems
Diverse research has sought to enhance recommendation accuracy by means of clustering methods, such as [19] [22] [20] [21]. In [19], an approach (D2P) is presented to address the trade-off between privacy and quality of recommendation; it can be applied to any collaborative recommender system. The main intuition behind D2P is to rely on a distance metric between items so that groups of similar items can be identified. As a result of the grouping, the K most similar users according to the similarity measure are selected for recommendation. In [22], a k-means clustering-based recommendation algorithm is proposed, which addresses the scalability issues associated with traditional recommender systems. An issue with traditional k-means clustering algorithms is that they choose the initial k centroids randomly, which leads to inaccurate recommendations and increased cost for offline training of clusters. In [20], a fuzzy C-Means approach is proposed for user-based collaborative filtering, and its performance is assessed against different clustering approaches (including K-Means, self-organizing maps and fuzzy C-Means). A collaborative filtering algorithm based on singular value decomposition (SVD) and fuzzy clustering is presented in [21]; it also reduces the dimensionality (scalability problem) and the search range of the neighbors.

C. Evolutionary Computing in Recommender Systems
Evolutionary Computing (EC) can optimize and improve RS in various applications. Horváth & de Carvalho [7] and Sadeghi & Asghari [23] provide comprehensive reviews of the most relevant publications, focusing on three aspects: approaches in which EC is used to optimize the weights of recommendation techniques or of different components, approaches that use EC for clustering items or users, and hybrid and other approaches. Evolutionary clustering algorithms are presented in [24] [25] [27] [26] [28].
In [24], a new genetic algorithm encoding is proposed as an alternative to k-means clustering. The initialization issue of classical k-means is targeted by proposing a new formulation of the problem, which reduces the effect of search space complexity and improves clustering quality.
In [25] [27] [26] [28], novel heterogeneous evolutionary clustering algorithms are presented. The goal of these algorithms is to gather users with similar interests into the same cluster and to help users find the items that best fit their personal tastes. First, items and users are regarded as heterogeneous individuals in a network. According to the constructed network clustering model, the states of users evolve over time and become stable after some period of iteration. Users are then clustered into several groups in light of their stable states. Liji et al. [27] compute the user attribute distances. In Chen et al. [26], a dynamic evolutionary clustering algorithm based on time weight and latent attributes is proposed. Collaborative filtering is then applied in each cluster to predict the ratings.
In [28], a novel collaborative filtering recommendation algorithm based on user correlation and evolutionary clustering is presented. First, the score matrix is pre-processed with normalization and dimension reduction. Based on the processed data, a clustering principle is generated and dynamic evolutionary clustering is implemented. Second, the nearest neighbors with the most similar interests are searched for. A measure of the relationship between users, called user correlation, is proposed; it exploits the satisfaction of users and the potential information.
In [29], a hybrid approach to increase the accuracy of recommendation of user-based collaborative filtering video recommender system is proposed. The proposed approach combines k-means clustering algorithm and two different evolutionary algorithms which are Accelerated Particle Swarm Optimization Algorithm (APSO) and Forest Optimization Algorithm (FOA).

D. Performance Analysis in terms of Accuracy (Precision and Recall)
Many evaluation metrics are available for recommendation systems and each has its own pros and cons, but few works provide guidance on how to choose among them and evaluate their mutual impact. For example, Schröder et al. [30] describe accuracy-related evaluation metrics and discuss their applicability to different types of recommender systems.
Classical recommender systems mainly focus on the accuracy related metrics. However, with the increase of the diversified demands of users, multiple metrics which may conflict with each other have to be considered in modern recommender systems, especially for the personalized recommender system. Lin et al. [31] present a multi-objective personalized recommendation algorithm using extreme point guided evolutionary computation (called MOEA-EPG). In MOEA-EPG, the accuracy, diversity, and novelty of recommendations are chosen as the three conflicting objectives, and the aim of that algorithm is to optimize the modeled MOP for personalized recommendation.
In the context of trade-offs among accuracy-related metrics, specifically precision and recall, Karabadji et al. [32] and Tran et al. [11] present methods to improve recommendation performance. Karabadji et al. [32] propose an evolutionary multi-objective optimization-based recommendation system that extracts a group of profiles maximizing both the similarity with the active user and the diversity among its members. That recommendation system is reported to provide high performance in terms of both accuracy (precision, recall, F-measure) and diversity. Tran et al. [11] propose a new clustering-based CF (CBCF) method using an incentivized/penalized user (IPU) model that relies only on the ratings given by users. The purpose of CBCF with the IPU model is to improve recommendation performance, such as precision, recall and F1 score, by carefully exploiting the different preferences among users, i.e., to maximize the recall (or equivalently the F1 score) for a given precision. In addition, the precision and recall of various other recommender systems are analyzed in [33] [34] [35].

III. COLLABORATIVE FILTERING
As one of the most successful approaches to building recommender systems, Collaborative Filtering (CF) uses the known preferences of a group of users to make recommendations or predictions of the unknown preferences of other users [12]. The three main categories of CF techniques are memory-based, model-based and hybrid (which combines the previous two).
Memory-based techniques use the entire user-item database, or a sample of it, to generate a prediction. Model-based techniques design and develop models that analyze the training data to recognize complex patterns and then make intelligent predictions for the test data. The proposed MOEP-CF method uses both a memory-based and a model-based technique.
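The memory-based technique used by the proposed method is the Weighted Sum of Others' Ratings [13]. The predicted rating of a particular item i for a user a is calculated by the following equation:

$$\hat{r}_{a,i} = \bar{r}_a + \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_u) \times w_{a,u}}{\sum_{u \in U} |w_{a,u}|}$$

where:
• $\hat{r}_{a,i}$: Predicted rating for the user a on the item i.
• $\bar{r}_a$, $\bar{r}_u$: Mean ratings of the user a and the user u.
• $r_{u,i}$: Rating of the user u on the item i.
• $w_{a,u}$: Weight or Pearson correlation between the user a and the user u.

The Pearson correlation between the user a and the user u is calculated by the following equation:

$$w_{a,u} = \frac{\sum_{i \in I} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{\sqrt{\sum_{i \in I} (r_{a,i} - \bar{r}_a)^2} \, \sqrt{\sum_{i \in I} (r_{u,i} - \bar{r}_u)^2}}$$

where:
• $w_{a,u}$: Weight or Pearson correlation between the user a and the user u.
• $\sum_{i \in I}$: The summations over all the items that both the user u and the user a have rated.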
For illustration, Table I shows a rating matrix for 8 users and 5 items. Based on the information available in Table I, it is possible to calculate the Pearson correlation between users and to predict the ratings of users for items. For instance, $w_{1,5} = 0.28$ and $w_{3,7} = -0.82$; and $\hat{r}_{1,4} = 3.18$ and $\hat{r}_{8,3} = 2.72$. Tables II and III show the Pearson correlation between users and the predicted ratings for the missing ratings, respectively.
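The two computations described in this section can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the rating dictionary is a hypothetical toy example (it does not reproduce Table I), the function names `pearson` and `predict` are illustrative, the Pearson means are taken over the co-rated items, and the prediction means are taken over all items rated by each user.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); NOT the values of Table I.
ratings = {
    "a": {1: 5, 2: 3, 3: 4},
    "u": {1: 4, 2: 2, 3: 5, 4: 4},
    "v": {1: 1, 2: 5, 3: 2, 4: 2},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(ra, ru):
    """Pearson correlation over the items that both users have rated."""
    common = sorted(set(ra) & set(ru))
    if len(common) < 2:
        return 0.0
    ma = mean(ra[i] for i in common)
    mu = mean(ru[i] for i in common)
    num = sum((ra[i] - ma) * (ru[i] - mu) for i in common)
    den = sqrt(sum((ra[i] - ma) ** 2 for i in common)) * \
          sqrt(sum((ru[i] - mu) ** 2 for i in common))
    return num / den if den else 0.0

def predict(ratings, a, item):
    """Weighted sum of others' ratings for user a on the given item."""
    ra = ratings[a]
    base = mean(ra.values())
    num = den = 0.0
    for u, ru in ratings.items():
        if u == a or item not in ru:
            continue  # only other users who rated the item contribute
        w = pearson(ra, ru)
        num += (ru[item] - mean(ru.values())) * w
        den += abs(w)
    return base + num / den if den else base

w_au = pearson(ratings["a"], ratings["u"])
r_hat = predict(ratings, "a", 4)
```

Negatively correlated users still contribute: a below-average rating from a user with a negative weight pushes the prediction upwards, which is why the denominator uses the absolute value of the weights.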

IV. MULTI-OBJECTIVE EVOLUTIONARY PROGRAMMING
Evolutionary Programming (EP) was proposed by Lawrence Fogel [14]. It is based on natural evolution and has been applied successfully to many numerical and combinatorial optimization problems.
EP does not require the use of a specific form of representation (for example, real-valued or integer strings), allowing the user to select the most suitable representation for the problem at hand. That is an important feature used in this paper.
Whatever the choice of representation, EP uses an iterative improvement process whereby the structures of a parent population are perturbed using a suitably defined mutation operator, with a selection process taking place to see which structures survive into the next iteration of the algorithm [6]. An overview of the canonical EP algorithm is provided in Algorithm 1.

Algorithm 1: Evolutionary Programming Algorithm
    RepresentationIndividuals();
    t = 0;
    Initialize(P_p(t));
    Evaluate(P_p(t));
    while isNotTerminated() do
        P_o(t) = Mutation(P_p(t));
        Evaluate(P_o(t));
        P_p(t + 1) = Select(P_p(t) ∪ P_o(t));
        t = t + 1;
    end
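As a rough illustration of the loop in Algorithm 1, the following Python sketch runs a single-objective EP on a toy problem. Everything specific here is an assumption made for the example and not part of the proposed method: the real-valued representation, the Gaussian mutation with step size `sigma`, the sphere function used as fitness (to be minimized), and the simple truncation selection over the merged population.

```python
import random

def evolutionary_programming(fitness, dim, size_p=20, max_iter=100,
                             sigma=0.3, seed=0):
    rng = random.Random(seed)
    # Initialize(P_p(t)): random real-valued individuals.
    parents = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(size_p)]
    for _ in range(max_iter):  # isNotTerminated(): fixed iteration budget
        # Mutation(P_p(t)): each parent produces exactly one offspring.
        offspring = [[x + rng.gauss(0, sigma) for x in p] for p in parents]
        # Select(P_p(t) U P_o(t)): keep the size_p best of the merged population.
        merged = sorted(parents + offspring, key=fitness)
        parents = merged[:size_p]
    return min(parents, key=fitness)

def sphere(x):
    """Toy objective: sum of squares, minimal at the origin."""
    return sum(v * v for v in x)

best = evolutionary_programming(sphere, dim=3)
```

Because the parents compete with their offspring, the best fitness in the population can never get worse from one iteration to the next.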
Multi-Objective Evolutionary Programming (MOEP) is an extended version of single-objective EP. The MOEP algorithm with non-domination sorting, proposed in [15], is described in Algorithm 2. The MOEP-CF method is based on MOEP.

V. PROPOSED METHOD
The proposed MOEP-CF method is based on the method proposed in [11] and hybridizes Collaborative Filtering (see Section III) with Multi-Objective Evolutionary Programming (see Section IV). MOEP-CF recommends or does not recommend a certain item i to a certain user a using a model-based technique based on three features: 1) the previous predicted ranking, 2) the clustering of the users, and 3) the five parameters γ, α, β, ρ and σ, where:
• γ: threshold average clustering of item i in cluster c
To decide whether or not to recommend a certain item i to a certain user a, the proposed method follows Algorithm 3.
Following Algorithm 1 and the information provided in Table IV, it is possible to recommend item 4 to user 1 and not to recommend item 3 to user 8. The details of these recommendations are explained in Algorithms 4 and 5, respectively. To define the features shown in Table IV so that the precision and recall metrics are maximized, the proposed MOEP-CF method follows Algorithm 5, which is detailed step by step in the next subsections.

A. Representation of Individuals
Each individual, or solution, is encoded in a vector with two parts: the clusters part and the parameters part, as depicted in Fig. 1.
The clusters part is an integer vector with |U| dimensions (U is the set of users); dimension 1 contains the index of the cluster to which user 1 belongs (iC_1), dimension 2 contains the index of the cluster to which user 2 belongs (iC_2), and so on.
The parameters part is a decimal vector with five dimensions, which represent the parameters γ, α, β, ρ and σ, respectively (see Table IV).
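The two-part encoding can be made concrete with a short Python sketch. The values below are hypothetical (they are not taken from Fig. 1 or Table IV), and the helper name `decode` is illustrative only.

```python
from collections import defaultdict

# Hypothetical individual for |U| = 8 users.
clusters_part = [1, 2, 1, 3, 2, 1, 3, 2]   # iC_1 ... iC_8
params_part = [3.5, 4.2, 3.1, 0.6, 0.4]    # gamma, alpha, beta, rho, sigma
individual = clusters_part + params_part

def decode(individual, n_users):
    """Split an individual into cluster membership and named parameters."""
    members = defaultdict(list)
    for user, cluster in enumerate(individual[:n_users], start=1):
        members[cluster].append(user)
    names = ("gamma", "alpha", "beta", "rho", "sigma")
    params = dict(zip(names, individual[n_users:]))
    return dict(members), params

members, params = decode(individual, n_users=8)
```

Storing the cluster index per user (rather than a list of users per cluster) keeps the vector length fixed at |U| + 5, which makes position-wise mutation straightforward.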

B. Initialize Parents Population
The initial population, or parent population (P_p(t)), of size sizeP is randomly generated. For the clusters part, each dimension contains a random integer between 1 and iC_max (iC_max represents the maximum number of clusters). For the parameters part, γ, α and β each contain a random decimal between minRank and maxRank; ρ and σ each contain a random decimal between 0 and 1.
The values sizeP, iC_max, minRank and maxRank are parameters of the proposed method.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 10, 2020, www.ijacsa.thesai.org
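The random initialization described in this subsection can be sketched as follows. The function name `init_population`, the flat-list layout (clusters part followed by γ, α, β, ρ, σ) and the use of uniform sampling are assumptions made for illustration; the argument values in the example call are hypothetical and are not those of Table VII.

```python
import random

def init_population(n_users, size_p, ic_max, min_rank, max_rank, seed=None):
    """Generate size_p random individuals: clusters part + parameters part."""
    rng = random.Random(seed)
    population = []
    for _ in range(size_p):
        # Clusters part: one cluster index in [1, ic_max] per user.
        clusters = [rng.randint(1, ic_max) for _ in range(n_users)]
        # gamma, alpha, beta: thresholds on the rating scale.
        thresholds = [rng.uniform(min_rank, max_rank) for _ in range(3)]
        # rho, sigma: weights in [0, 1).
        weights = [rng.random() for _ in range(2)]
        population.append(clusters + thresholds + weights)
    return population

# Hypothetical settings for a five-point rating scale.
population = init_population(n_users=8, size_p=10, ic_max=3,
                             min_rank=1.0, max_rank=5.0, seed=42)
```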

C. Evaluate Parent Population
The proposed MOEP-CF method uses training and test datasets. The test dataset contains registers with ratings from users to items that are not in the training dataset. The precision and recall are calculated for each individual in P_p(t).
Based on the training dataset, the Weighted Sum of Others' Ratings (see Section III) is used to define predicted ratings for the ratings in the test dataset. Then, for each individual in P_p(t), Algorithm 1 is used to recommend or not recommend an item to a user on the test dataset. Finally, precision and recall are calculated based on the following equations:

$$precision = \frac{tp}{tp + fp}, \qquad recall = \frac{tp}{tp + fn}$$

$$tp = \sum_{v=1}^{T} \left[ R^{test}_{v,u,i} \wedge r^{test}_{v,u,i} \geq \delta_{ref} \right]$$

$$fp = \sum_{v=1}^{T} \left[ R^{test}_{v,u,i} \wedge r^{test}_{v,u,i} < \delta_{ref} \right]$$

$$fn = \sum_{v=1}^{T} \left[ NR^{test}_{v,u,i} \wedge r^{test}_{v,u,i} \geq \delta_{ref} \right]$$

where:
• $tp$: true positive.
• $fp$: false positive.
• $fn$: false negative.
• $T$: number of registers in the test dataset.
• $R^{test}_{v,u,i}$: Recommend the item i to the user u (in register v) in the test dataset following Algorithm 1.
• $NR^{test}_{v,u,i}$: Do not recommend the item i to the user u (in register v) in the test dataset following Algorithm 1.
• $r^{test}_{v,u,i}$: Rating of the user u on the item i (in register v) in the test dataset.
• $\delta_{ref}$: Threshold value for determining whether a user is really satisfied with the recommended item. It is a parameter of the proposed method and is generally set to 4.0 (or 8.0) in the case of a five-point scale (or a ten-point scale).
• $[pred]$: Evaluates to 1 if the predicate pred is true and to 0 otherwise.
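The counting behind tp, fp and fn can be illustrated with a small Python sketch. It assumes that a user counts as satisfied when the test rating is greater than or equal to δ_ref, and it also reports the F1 score as the usual harmonic mean of precision and recall. The function name and the toy registers are hypothetical.

```python
def precision_recall_f1(records, delta_ref=4.0):
    """records: (recommended, true_rating) pairs, one per test register."""
    records = list(records)
    # Iverson-bracket style counts over the test registers.
    tp = sum(1 for rec, r in records if rec and r >= delta_ref)
    fp = sum(1 for rec, r in records if rec and r < delta_ref)
    fn = sum(1 for rec, r in records if not rec and r >= delta_ref)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical registers: (was the item recommended?, actual test rating).
toy = [(True, 5), (True, 3), (False, 4), (False, 2), (True, 4)]
precision, recall, f1 = precision_recall_f1(toy)
```

In the toy data, two of the three recommended items are liked (tp = 2, fp = 1) and one liked item is not recommended (fn = 1), so precision and recall are both 2/3. Registers that are neither recommended nor liked, such as `(False, 2)`, do not affect either metric.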

D. Terminating Condition
The terminating condition of MOEP is the maximum number of iterations maxIter.

E. Mutation
The mutation process is applied to each individual in P_p(t) to generate the offspring population (P_o(t)), and it is one of the main contributions of the proposed MOEP-CF method. Each parent solution undergoes one type of mutation (chosen with a given probability) to generate an offspring individual. Table V shows the ten probabilities and types of mutation used in the proposed method.

TABLE V
Probability | Type of mutation
 | Randomly changing the cluster of 50% of the users, i.e., randomly changing 50% of the dimensions in the clusters part.
10% | Changing the γ parameter to a random value between minRank and maxRank.
10% | Changing the α parameter to a random value between minRank and maxRank.
10% | Changing the β parameter to a random value between minRank and maxRank.
10% | Changing the ρ parameter to a random value between 0 and 1.
10% | Changing the σ parameter to a random value between 0 and 1.
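A possible shape for such a mutation operator is sketched below in Python. It is a simplified illustration, not the operator of the proposed method: it covers only six mutation types (one clusters-part mutation and the five parameter mutations), and the weight of 0.5 given to the clusters-part mutation is a placeholder assumption rather than a value from Table V. The names `MUTATION_TYPES`, `MUTATION_WEIGHTS` and `mutate` are hypothetical.

```python
import random

# Placeholder probability table: NOT the complete Table V.
MUTATION_TYPES = ["clusters", "gamma", "alpha", "beta", "rho", "sigma"]
MUTATION_WEIGHTS = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]

def mutate(parent, n_users, ic_max, min_rank, max_rank, rng, fraction=0.5):
    """Apply exactly one randomly chosen mutation type to a copy of parent."""
    child = list(parent)  # the parent itself is left untouched
    kind = rng.choices(MUTATION_TYPES, weights=MUTATION_WEIGHTS, k=1)[0]
    if kind == "clusters":
        # Re-draw the cluster index of a fraction of the users.
        n_changed = max(1, round(fraction * n_users))
        for pos in rng.sample(range(n_users), n_changed):
            child[pos] = rng.randint(1, ic_max)
    elif kind in ("gamma", "alpha", "beta"):
        offset = ("gamma", "alpha", "beta").index(kind)
        child[n_users + offset] = rng.uniform(min_rank, max_rank)
    else:  # rho or sigma
        offset = 3 + ("rho", "sigma").index(kind)
        child[n_users + offset] = rng.random()
    return child, kind

rng = random.Random(1)
parent = [1, 2, 1, 3, 2, 1, 3, 2, 3.5, 4.2, 3.1, 0.6, 0.4]
child, kind = mutate(parent, n_users=8, ic_max=3,
                     min_rank=1.0, max_rank=5.0, rng=rng)
```

Each call touches either the clusters part or a single parameter, never both, so the effect of every mutation type on precision and recall can be attributed separately.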

F. Evaluate Offspring Population
The precision and recall of each individual in P_o(t) are calculated with the same evaluation process explained in subsection V-C for P_p(t). After that, P_o(t) is merged with P_p(t) to generate a merged population (P_m(t)).

G. Identify Non-Dominated Solutions
A non-dominated solution is a solution that is not dominated by any other solution in P_m(t). In the proposed MOEP-CF method, this means that no other solution is better in both the precision and recall objectives, or equal in one objective and better in the other.
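The dominance test can be written compactly. The sketch below assumes that each solution is summarized by a (precision, recall) tuple and that both objectives are maximized; the function names and the sample points are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated(points):
    """Solutions that no other solution in the population dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (precision, recall) pairs.
points = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.5, 0.3)]
front = non_dominated(points)  # [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9)]
```

Here `(0.4, 0.4)` is worse than `(0.5, 0.5)` in both objectives, and `(0.5, 0.3)` ties with `(0.5, 0.5)` on precision but loses on recall, so both are dominated.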

H. Assign and Sort by Front Number
The non-dominated solutions in P_m(t), identified in the previous step, are inserted in Front 0. A new search for non-dominated solutions is then carried out in the population P_m(t) without considering the solutions in Front 0; the new non-dominated solutions found are inserted in Front 1, and so on.
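The repeated peeling of non-dominated solutions can be sketched as follows. This is a straightforward, non-optimized illustration that assumes (precision, recall) tuples to be maximized; the function names and the sample points are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def sort_into_fronts(points):
    """Return lists of indices: Front 0, Front 1, ... of the given points."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        # Non-dominated solutions among those not yet assigned to a front.
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# Hypothetical (precision, recall) pairs.
points = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.1, 0.1)]
fronts = sort_into_fronts(points)  # [[0, 1, 2], [3], [4]]
```

The loop always terminates because every non-empty set of solutions contains at least one solution that no other member dominates.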

I. Select the New Parent Population
The next parent population (P_p(t + 1)) is filled front by front, following the front ranking, until its size is equal to sizeP. If a front can only be taken partially, the sum of the Euclidean distances between the objectives of each solution and those of the other solutions is computed, and the solutions with the greatest sums are preferred in order to diversify the solutions for the next iterations.
Finally, the evolution process is repeated until the termination condition has been reached. The source code of the proposed MOEP-CF method is available on GitHub (username: Edward-Hinojosa-Cardenas, project: MOEP-CF).
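The front-based selection step of this subsection can be sketched in Python as follows. The sketch assumes that the fronts are already available as lists of indices and that, for a partially taken front, the distances are summed over the other solutions of that same front; the text leaves open whether the whole merged population is meant instead. The function name `select_parents` and the sample data are hypothetical.

```python
from math import dist

def select_parents(objectives, fronts, size_p):
    """Pick size_p indices front by front, breaking ties by spread."""
    selected = []
    for front in fronts:
        room = size_p - len(selected)
        if room <= 0:
            break
        if len(front) <= room:
            selected.extend(front)  # the whole front fits
        else:
            # Sum of Euclidean distances to the other solutions of the front.
            spread = {i: sum(dist(objectives[i], objectives[j])
                             for j in front if j != i)
                      for i in front}
            # Greater sums are preferred to keep the population diverse.
            ranked = sorted(front, key=spread.get, reverse=True)
            selected.extend(ranked[:room])
    return selected

# Hypothetical (precision, recall) pairs and their fronts.
objectives = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.1, 0.1)]
fronts = [[0, 1, 2], [3], [4]]
chosen = select_parents(objectives, fronts, size_p=2)
```

With `size_p=2`, only part of Front 0 fits, and the two extreme solutions (indices 0 and 2) are kept because they are farthest from the rest of the front.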

VI. EXPERIMENTAL RESULTS AND PERFORMANCE COMPARISON
In this section, the performance of the proposed MOEP-CF method is evaluated and compared with a similar previous method (the non-evolutionary method named CBCF [11]). To show that the MOEP-CF method performs better, it is applied to the MovieLens 100K dataset (collected by the GroupLens Research Project at the University of Minnesota). Table VI describes the dataset used in the evaluation. The proposed method is evaluated by 5-fold cross-validation. The parameters used in the proposed MOEP-CF method are outlined in Table VII.
To assess the performance of the proposed method against the CBCF method [11], the same dataset is used. Before the comparison, for each fold, the MOEP-CF method selects a random solution from the non-dominated solutions of the last iteration. The average values for all selected solutions are shown in Table VIII. Table IX shows the clustering and the γ, α and β values, together with the precision, recall and F1 values obtained using the CBCF method. It also shows the number of clusters; the averages of the optimized α, β, γ, ρ and σ parameters; and the average values of precision, recall and F1 obtained by the proposed MOEP-CF method.
The results show that the MOEP-CF method achieves better results than the previous CBCF method on the three metrics mentioned above. With a flexible clustering and two additional parameters (in comparison with the previous method), the MOEP-CF method improves the precision, recall and F1 metrics by 33.89%, 6.21% and 20.52%, respectively.
However, some limitations should be noted. First, the proposed method uses static probability values for each mutation, which can influence the final result. Second, the proposed method does not consider other important metrics for recommendation systems, such as novelty, diversity, stability and reliability. Third, the parameters of the proposed method, shown in Table VII, are defined empirically or experimentally.

VII. CONCLUSION
Optimizing different objectives simultaneously is a well-recognized problem in a recommender system setting. In this work, a multi-objective evolutionary programming method for developing recommender systems is proposed. It is based on a new collaborative filtering technique that achieves high precision and recall.
The proposed method makes three main contributions: a new collaborative filtering technique; a multi-objective evolutionary programming method for developing recommender systems that improves the precision and recall objectives; and a new mutation operator with ten types of mutation.
Future work includes adding non-accuracy metrics such as novelty, diversity, stability and reliability to the multi-objective evolutionary optimization process; using other predicted ranking methods before the multi-objective evolutionary optimization process; and evaluating the proposed method on other popular datasets.