Temporal-based Optimization to Solve Data Sparsity in Collaborative Filtering

Abstract: Collaborative Filtering (CF) is a widely used technique in recommendation systems. It provides personalized recommendations for users based on their preferences. However, this technique suffers from the sparsity issue, which occurs due to a high proportion of missing rating scores in the rating matrix. Several factorization approaches have been used to address the sparsity issue. Such techniques have also been considered to tackle other challenges such as overfitted predicted scores. Nevertheless, they suffer from setbacks such as drift in user preferences and items' popularity decay. These challenges can be solved by prediction approaches that accurately learn the long-term and short-term preferences integrated with factorization features. Nonetheless, the current temporal-based factorization approaches do not accurately learn the convergence of the assigned k clusters due to a lower number of short-term periods. Additionally, the use of optimization algorithms in the learning process to reduce prediction errors is time-consuming, which necessitates a faster optimization algorithm. To address these issues, a new temporal-based approach named TWOCF is proposed in this paper. TWOCF utilizes the elbow clustering method to define the optimal number of clusters for the temporal activities of both users and items. This approach deploys the whale optimization algorithm to accurately learn short-term preferences within other factorization and temporal features. Experimental results indicate that TWOCF exhibits superior CF prediction accuracy, achieved within a shorter execution time, when compared to the benchmark approaches.


I. INTRODUCTION
Nowadays, recommendation systems have become popular as they efficiently suggest items to customers according to their feedback and interests [1]. The major resources (data) used to create recommendations are customer profiles, item profiles, and user-item connections (i.e., customer scores for the suggested items) [2]. Collaborative filtering (CF), content-based filtering, demographic filtering, and hybrid filtering are four forms of filters employed in recommendation systems [3]. CF is one of the most prevalent recommendation techniques; it provides users with personalized predictions based on their preferences. It relies solely on users' past rating scores on products and does not require the creation of explicit profiles. For example, CF utilizes the rating scores of neighbours to predict a list of items for the active user. However, CF suffers from three main issues: data sparsity [3], [4], cold start [5], [6], and scalability [7], [8]. This paper briefly discusses the approaches utilized to solve the sparsity issue.
Normally, CF-based recommendation systems arrange customers' product rating scores in the form of a rating matrix. Because customers rate only a small number of products, the rating matrix contains very few scores while the remaining entries are unknown, i.e., the matrix is sparse. This reduces the prediction accuracy of the CF technique. CF provides a list of recommendations to an active user based on his/her interests and according to the feedback of common users, i.e., users who have rated some of the items rated by the active user. The feedback is calculated using the similarity and prediction assessments. The similarity assessment between common users and the active user becomes infeasible or inconsistent if there is a high proportion of missing rating scores in the rating matrix [9].
The optimization algorithms have proven successful in several areas such as healthcare [10], document processing [11], and recommendation systems [12]. Various factorization approaches have been used to solve the sparsity issue. These include an imputation-based matrix factorization [2], ensemble divide and conquer [13], and neighbourhood matrix factorization [14]. Although these approaches can learn factorization and latent features that influence the prediction accuracy of the CF technique [12], they cannot effectively learn temporal behaviors and temporal issues such as the drift in users' preferences and the popularity decay of items [15] [16].
The temporal collaboration model [17] merged factorization vectors, long-term preferences, and short-term preferences to enhance the efficiency of the CF technique. The short-term feedback is defined using the shrunk neighbors approach [18] [19] [17]. Furthermore, the short-term preferences are defined by a timestamp factor that determines time periods such as the number of years, seasons, months, and so on. This duration is used to assign k clusters to learn the short-term features using the k-means clustering algorithm. The temporal-based approach [1] achieves higher prediction performance compared to previous temporal approaches. However, this approach cannot be implemented on a small rating matrix since the k-clustering cannot be achieved (k problem). Also, the bacterial foraging optimization algorithm has been used to learn the accurate temporal and factorization features [16], [15]. This algorithm provides different error values (fitness) in the search space. The error value increases in some iterations while it decreases in others, which consumes time. To address all these challenges, a new temporal-based whale optimization approach named TWOCF is proposed in this paper.
*Corresponding Author www.ijacsa.thesai.org
The TWOCF approach uses an elbow clustering method [20] to accurately learn the precise number of clusters in the time matrix of users' activities. Additionally, the TWOCF approach learns accurate weights for the short-term temporal features that are integrated with other factorization and long-term temporal features.
The unique aspects of this research can be summarized as follows:
• Solving the sparsity problem.
• Learning accurate latent features throughout the learning iterations.
• Solving the drift and decay issues.
• Reducing the running time of the learning process.
The factorization-based optimization approaches [15] [16] [1] have focused on solving the temporal and sparsity issues based on personalization. These approaches have improved the accuracy of predicted rating scores but ignored the running time. Although running time is a significant factor in online recommendation systems, most of the recommendation-based optimization approaches ignore it. Therefore, the TWOCF approach is proposed to solve temporal and sparsity issues with the purpose of improving the prediction accuracy and reducing running time.
The rest of the paper is organized as follows: Section 2 discusses the related works (including the factorization and temporal methods) and describes the whale optimization algorithm (WOA), whose methodology is explained in Section 3. In Section 4, the experimental findings are explained, while Section 5 summarizes the main findings of this study.

A. Matrix Factorization
Matrix factorization has recently been used for solving the sparsity and cold start issues associated with the CF technique in recommendation systems [21]. Matrix factorization is characterized by two kinds of features: baseline and latent features. The latent features are defined using the singular value decomposition algorithm [22]. Many factorization methods integrate latent and baseline features of users and items utilizing several formulae [7]. For example, the baseline formula can be used to predict the missing rating scores in the rating matrix as shown in Equation (1). Additionally, the norm latent factor is used in several methods [23]. Equation (2) is another example where various latent features are integrated with the weight $\lambda$ to minimize the overfitting in the predicted rating scores.
where $p_u$ and $q_v^{T}$ are the norm latent features of users and the norm of the transposed latent features of items, respectively.
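To make the idea of a baseline predictor concrete, the following minimal sketch assumes the commonly used form "global mean plus user deviation plus item deviation". The paper's Equation (1) is not reproduced in this text, so this form, the simple averaging used to estimate the deviations, and the names `fit_baseline` and `baseline_predict` are illustrative assumptions rather than the authors' exact formula.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """Estimate a global mean and per-user / per-item deviations.

    ratings: list of (user, item, score) tuples.
    Assumed form (not taken from the paper): r_hat = mu + b_user + b_item.
    """
    mu = sum(score for _, _, score in ratings) / len(ratings)

    # User deviation: average distance of the user's scores from the global mean.
    user_dev = defaultdict(list)
    for user, _, score in ratings:
        user_dev[user].append(score - mu)
    b_user = {u: sum(d) / len(d) for u, d in user_dev.items()}

    # Item deviation: average residual after removing mu and the user deviation.
    item_dev = defaultdict(list)
    for user, item, score in ratings:
        item_dev[item].append(score - mu - b_user[user])
    b_item = {v: sum(d) / len(d) for v, d in item_dev.items()}
    return mu, b_user, b_item


def baseline_predict(mu, b_user, b_item, user, item):
    """Predict a missing score; unseen users or items fall back to zero deviation."""
    return mu + b_user.get(user, 0.0) + b_item.get(item, 0.0)


# Tiny hypothetical rating list: user u2 has not rated item b.
toy = [("u1", "a", 5), ("u1", "b", 3), ("u2", "a", 4)]
mu, b_user, b_item = fit_baseline(toy)
print(baseline_predict(mu, b_user, b_item, "u2", "b"))
```

A latent-factor term such as the product of $p_u$ and $q_v^{T}$ would be added on top of this baseline in a full factorization model.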
The ensemble divide and conquer method is used to solve the deviation in rating scores that occurs when the rating scores are arranged in memory. This method precisely learns latent features by rearranging the ratings in the rating matrix [13]. Nevertheless, the accuracy of this method is still low due to the drift in users' preferences and items' popularity decay. Thus, there is a need to study the positive effects of such temporal features for improving the prediction performance of recommendation systems [24].

B. Temporal Preferences
In recent years, temporal preferences and factorization factors have been integrated within collaborative-based approaches to solve sparsity problems [25]. The temporal dynamics method defines the time features by splitting the timeline into a constant number of bins [25], while the user preferences change over time. This approach minimizes the overfitting of the predicted rating scores in the optimization latent space by a global weight (which is weak in terms of personalization). Generally, temporal preferences are either long-term or short-term [17].

• Long-term preferences
The long-term approach [17] is defined in Equation (3), where s and e represent the first and last time preferences that are sequentially recorded in the rating matrix, and $t_{uv}$ is the current time at which item v is rated by user u. The long-term preferences of users are defined in Equation (4) [15]. The long-term preferences of items are defined in Equation (5) [15], where t is the last time that item v was rated by users.
• Short-term preferences
The short-term based latent model learns the drift of users' preferences by incorporating the factorization features with the neighbors' latent feedback throughout a session, e.g., one month, one season, one year, etc. The temporal interaction model [17] combines the short-term and long-term preferences to address the drift in users' preferences. Nevertheless, both the short-term model and the temporal interaction model [17] have limitations in terms of discovering the drift and time decay.
The short-term based factorization model [16] utilizes the k-means algorithm to divide the time matrix into a number of clusters based on the number of short-term periods, such as the number of months. This model also deploys a bacterial foraging optimization algorithm to learn the best short-term weight to be integrated within the factorization features during the iterative learning procedure [16]. However, the sparse timestamp matrix of some active users cannot be divided into a certain number of clusters if the number of common users who rated the active user's item of interest is not sufficient (k problem).
In addition, a longer execution time is required to learn the accurate temporal and factorization features, especially if the number of common users in the rating matrix is large. Hence, to address these issues while improving on our earlier works [1] [16], a new temporal-based approach named TWOCF is proposed. The TWOCF approach assigns optimum temporal preferences and accurately learns the factorization and latent features using WOA.

C. Whale Optimization Algorithm
Whales are considered the largest animals in the world. They are always awake, quite smart, and sensitive to feelings. An interesting optimization idea is taken from the humpback whales, a group of whales with a unique hunting technique named the bubble-net feeding method. The humpback whales only use the bubble-net for feeding. They prefer to chase a school of krill or small fishes close to the surface. This hunting is completed by producing distinctive bubbles along a circle [26]. The feeding behaviour of humpback whales is mathematically represented by WOA to solve optimization problems.

A. MovieLens Dataset
Several prediction approaches have utilized the MovieLens dataset to evaluate the performance of recommendation systems [15], [27]. All records in this dataset were collected through the MovieLens website (movielens.umn.edu) over seven months, from September 1997 to April 1998. The dataset contains 100,000 rating scores provided by 943 customers for 1682 movies. Each rating score is saved with its timestamp. Each user has rated at least 20 movies on an integer scale from 1 to 5. The higher the user's rating of a movie, the more interested the user is in that movie. The sparseness of the dataset can be calculated as: 1 - (100000 / (943 * 1682)) = 0.936953. Additional user features such as age, gender, and occupation are also recorded. Further features for items are movie id, movie title, release date, video release date, and genre. Similar to the benchmark methods, three significant features are considered in this paper: rating score, timestamp, and item genre.
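The sparseness figure quoted above can be checked with a few lines of Python. The function name `sparsity` is a hypothetical helper introduced here for illustration only.

```python
def sparsity(n_ratings, n_users, n_items):
    """Fraction of user-item cells in the rating matrix that hold no score."""
    return 1.0 - n_ratings / (n_users * n_items)


# MovieLens figures quoted in the text: 100,000 ratings, 943 users, 1682 movies.
print(round(sparsity(100_000, 943, 1682), 6))  # 0.936953
```

In other words, roughly 93.7% of the cells in the full rating matrix are empty.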

B. Evaluation and Benchmark Methods
This work focuses on resolving the sparsity, overfitting, drift, and decay issues to improve on the prediction performance of the CF technique. The root mean square error (RMSE) function is employed for performance evaluation. RMSE has been utilized in many prediction approaches [15], [28] to evaluate the performance of the CF technique. A lower RMSE value indicates a higher prediction accuracy.
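As a reference for the evaluation metric, the sketch below computes RMSE in its standard form, the square root of the mean squared difference between actual and predicted scores. The function name `rmse` and the sample scores are illustrative and not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length lists of rating scores."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical held-out scores and their predictions.
print(rmse([3, 5], [4, 4]))  # 1.0
```

Because errors are squared before averaging, a few large mispredictions raise RMSE more than many small ones.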

C. Experimental TWOCF Approach
The TWOCF method incorporates the long-term and short-term preferences with the factorization features to fill the sparse rating matrix in order to yield accurate predictions. Besides the sparsity issue, three other challenges are addressed by TWOCF: overfitting, drift, and decay, as illustrated in the subsequent subsections.

1) Assigning the temporal preferences:
There are two kinds of temporal preferences: short-term and long-term preferences. The short temporal-based factorization model [16] analyzes the time matrix using the k-means algorithm. In that model, the number of clusters is assigned according to the total number of sessions (e.g., the MovieLens dataset contains 7 sessions and each session spans one month). However, this technique is suitable for smaller k periods (such as the number of years or seasons) and not for larger values (such as the number of weeks). For most users, the number of clusters formed after several convergences is less than the target k.
In addition, there are other user-centric differences. For example, some users are active only within a very short time while others are active for a longer time. These differences make a fixed, period-based number of clusters inaccurate. Thus, the best way to solve these challenges is by using clustering algorithms that can define the number of clusters accurately. The elbow clustering method is one of the most common methods used to determine the optimal number of clusters [29], [30]. In this work, the elbow clustering method is used to tackle the challenge of determining the number of clusters in the sparse timestamp matrix. Fig. 1 shows a simple formation of the whale members using the elbow clustering method. Sometimes, the learning process stops because the clusters created by the k-means method are fewer than the required number of clusters. Equations (4) and (5) [15] are used to assign long-term preferences for users and items, respectively.
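The elbow selection described above can be sketched as follows. This is a minimal, self-contained illustration and not the authors' implementation: it runs a one-dimensional k-means over a user's timestamps for k = 1..k_max, records the within-cluster sum of squares (WCSS), and picks the k whose point lies furthest below the straight line joining the first and last points of the normalized WCSS curve. That particular elbow heuristic, the deterministic centroid initialization, and the names `kmeans_1d` and `elbow_k` are all assumptions made for this example.

```python
def kmeans_1d(values, k, iters=50):
    """Lloyd's k-means on 1-D data; returns (centroids, wcss)."""
    data = sorted(values)
    # Deterministic initialization: evenly spaced picks from the sorted data.
    centroids = [data[(2 * j + 1) * len(data) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    wcss = sum(min((x - c) ** 2 for c in centroids) for x in data)
    return centroids, wcss


def elbow_k(values, k_max):
    """Choose the number of clusters at the elbow of the WCSS curve."""
    # Never ask for more clusters than there are distinct timestamps,
    # which avoids the "k problem" on very sparse timestamp rows.
    k_max = min(k_max, len(set(values)))
    wcss = [kmeans_1d(values, k)[1] for k in range(1, k_max + 1)]
    span = wcss[0] - wcss[-1]
    if span <= 0:
        return 1
    if k_max < 3:
        return k_max
    # Gap between the chord from (k=1) to (k=k_max) and the normalized curve.
    gaps = [(1 - i / (k_max - 1)) - (w - wcss[-1]) / span
            for i, w in enumerate(wcss)]
    return max(range(k_max), key=gaps.__getitem__) + 1


# Hypothetical activity days for one user: three separate bursts of ratings.
days = [1, 2, 3, 50, 51, 52, 100, 101, 102]
print(elbow_k(days, k_max=6))  # 3
```

Because k is derived from the data rather than from a fixed number of calendar periods, a user with only a handful of distinct timestamps still receives a valid clustering.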
2) Integrating the temporal preferences with factorization features:
This experimental work is intended to (i) predict the missing rating scores in the rating matrix following the CF technique and (ii) address other limitations such as overfitted predicted scores, drift in the users' preferences, and decay in the popularity of items. The factorization and latent features that are integrated with temporal preferences are learned using Equation (6), where $\alpha_u$ and $\beta_v$ are the long-term preferences of user u and item v, respectively. The WOA is used to update the short-term weights of users and items. These weights are integrated with the factorization features and with the long-term preferences using Equation (6) to reduce the overfitted predicted rating scores. Subsequently, the WOA learns the drift in the preferences of users and the decay in the popularity of items. This helps to reduce the effects of these negative factors throughout the iterative learning process using Equations (7) and (8), respectively.
where M is the number of common users who provide rating scores for items and N is the number of all items rated by the active user. To obtain the best performance, this work integrates the latent feedback of Equation (6) with Equations (7) and (8) using Equation (9).
where $\mu_{uv}$ in Equation (9) is the predicted value for the missing rating score of user u for item v.
3) Integrating WOA within TWOCF approach:
WOA is integrated with the TWOCF approach to optimize the prediction of the CF technique. This is aimed at addressing the sparsity, overfitting, drift, and decay issues. TWOCF updates the weights of short temporal preferences throughout the iterative learning process managed by WOA. Three feeding behaviors of the humpback whales are briefly discussed as follows:
• Encircling prey
Humpback whales identify the locations of small fishes, then engulf them [31]. The WOA algorithm assumes that the current best candidate solution is the target prey or is near to the optimum. After the best search agent is identified, the other search agents attempt to update their positions towards the best search agent. This behavior is characterized by Equations (10) and (11).
where i is the number of the current iteration, and A and C are coefficient vectors. In this work, T represents the short duration weights, which is the position vector of the prey.
where $\alpha$ is linearly reduced from 2 to 0 throughout the iterative process and r is a random vector in [0, 1]. Equation (11) allows each search agent to update the temporal weight that is close to the current best position and replicate circling the prey.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 12, 2020
• Bubble-net attacking method
WOA mathematical modeling involves two phases modeled as follows:
1) Shrinking encircling mechanism: In Equation (12), the value of $\alpha$ is reduced, and A is therefore also reduced by $\alpha$.
Therefore, the new position of a search agent can be defined between the current temporal positions and the best temporal positions by setting random values for A from 0 to 1.
2) Spiral updating position: The distance between the whale and its prey can be calculated using the weights of temporal features and the best weights of temporal features. A spiral equation is then formed between the position of the whale and the prey as shown in Equation (14).
$$T(i+1) = D' \cdot \exp(b\lambda) \cdot \cos(2\pi\lambda) + T^{*}(i), \qquad (14)$$
where $D'$ denotes the distance between the whale and its prey, b is a constant that defines the shape of the logarithmic spiral, and $\lambda$ is a random number in [-1, 1]. The humpback whales swim around the prey within a shrinking circle and along a spiral-shaped path concurrently. To model this simultaneous behavior, there is a 50% probability of choosing either the shrinking encircling mechanism (Equation (6)) or the spiral model for updating the temporal weights (Equation (9)) during the optimization, as shown below.
$$T(i+1) = \begin{cases} T^{*}(i) - A \cdot D, & p < 0.5 \\ D' \cdot \exp(b\lambda) \cdot \cos(2\pi\lambda) + T^{*}(i), & p \geq 0.5 \end{cases}$$
where p is a random number in [0,1].

• Search for prey (Exploration phase)
In this situation, A takes random values greater than 1 or less than −1 to force the search agent to change the current whale's location [31]. This allows the WOA algorithm to perform a global search that is modeled using Equations (15) and (16).
where $T_{rand}$ is a random temporal vector. The phases of the TWOCF approach are further detailed in the TWOCF algorithm.
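The three WOA behaviors described above (encircling, spiral bubble-net update, and random search) can be sketched in a few lines. This is a generic illustration of the standard whale optimization algorithm and not the TWOCF implementation: the coefficient formulas A = 2ar - a and C = 2r follow the usual WOA definition, scalar A and C are drawn once per whale for simplicity, and the names `woa_minimize`, `fitness`, `lo`, and `hi` are hypothetical. In TWOCF, each position would hold the short-term weights and the fitness would be the prediction error; here a toy quadratic objective stands in for it.

```python
import math
import random


def woa_minimize(fitness, dim, n_whales=10, max_iter=50,
                 lo=0.0, hi=1.0, b=1.0, seed=0):
    """Minimize `fitness` over [lo, hi]^dim with a basic whale optimization loop."""
    rng = random.Random(seed)
    whales = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_whales)]
    best = list(min(whales, key=fitness))
    best_fit = fitness(best)
    history = [best_fit]

    for i in range(max_iter):
        a = 2.0 - 2.0 * i / max_iter          # decreases linearly from 2 towards 0
        for w in range(n_whales):
            pos = whales[w]
            A = 2.0 * a * rng.random() - a    # coefficient in [-a, a]
            C = 2.0 * rng.random()            # coefficient in [0, 2]
            if rng.random() < 0.5:
                if abs(A) < 1.0:
                    ref = best                # exploitation: encircle the best agent
                else:
                    ref = whales[rng.randrange(n_whales)]  # exploration: random agent
                new = [ref[d] - A * abs(C * ref[d] - pos[d]) for d in range(dim)]
            else:
                ell = rng.uniform(-1.0, 1.0)  # spiral parameter in [-1, 1]
                spiral = math.exp(b * ell) * math.cos(2.0 * math.pi * ell)
                new = [abs(best[d] - pos[d]) * spiral + best[d] for d in range(dim)]
            # Keep every weight inside the allowed range.
            whales[w] = [min(hi, max(lo, x)) for x in new]

        # Track the best position found so far, so the error never increases.
        for pos in whales:
            f = fitness(pos)
            if f < best_fit:
                best, best_fit = list(pos), f
        history.append(best_fit)
    return best, best_fit, history


# Toy stand-in for the prediction error: squared distance to a hypothetical optimum.
target = [0.3, 0.7]
error = lambda weights: sum((x - t) ** 2 for x, t in zip(weights, target))
best, best_fit, history = woa_minimize(error, dim=2, max_iter=30, seed=1)
print(best, best_fit)
```

Recording the best-so-far value in `history` gives a non-increasing error curve, in contrast to the fluctuating fitness values described for the bacterial foraging optimization algorithm in the introduction.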

Procedure of Short-Term:
Input: Timestamp Matrix
Output:

Until complete Max iteration
Output:
The accurate Rating Matrix with predicted missing scores
The accurate list of recommendations for the active user

A. The Effect of TWOCF in Solving the Overfitting, Drift, and Decay
The TWOCF approach is proposed to address the weaknesses of factorization and temporal-based factorization approaches. Experimental results show that the TWOCF approach exhibits significantly superior performance with respect to learning the accurate temporal and factorization features by reducing overfitted predicted scores, tracing decay in the popularity of items, and tracing drift in the users' preferences. Table I presents the experimental results obtained after implementing the TWOCF approach on the MovieLens dataset under the scoring scale [1-5].
The first column in Table I contains 31 active users. The second and third columns indicate the dimensions of the rating matrix for assigning the learning search space. Columns 4 and 5 show the number of clusters based on the users' and items' dimensions, respectively. The different numbers of clusters in each matrix reflect the personal behaviors of users. It is worth noting that these behaviors cannot be learned accurately using a fixed number of clusters. The sixth column shows the number of whale members with which the TWOCF approach accurately learns the features of users and items.
In column 7, the execution time of learning procedures varies according to the dimension space of each matrix. The shortest execution time is 11 seconds while the longest execution time is 487 seconds. The last column indicates the prediction accuracy of the CF technique according to RMSE values. Here, a lower value indicates a higher prediction accuracy. Using the TWOCF approach, results ranging from 0.523 to 0.997 with an average of 0.764 are obtained.
The learning processes by the TWOCF approach are visualized in Fig. 2 to show its ability to reduce the RMSE values throughout the iteration loops. Fig. 2 shows the effectiveness of the TWOCF approach in accurately learning the behaviours of users and items throughout the learning iteration. This improves the CF technique's speed and learning accuracy.

B. Comparison of the Performances of CF, Factorization, and Factorization-based Temporal Approaches
Here, the TWOCF approach is evaluated by comparing its effectiveness in reducing the RMSE value with the other benchmark approaches described in Section 2. The TWOCF and benchmark approaches are implemented using one test set (containing 31 rating matrices) to predict the missing scores in each rating matrix. In addition, the contributions of the tested approaches to solving the issues reviewed in this article are summarized in Table II. The CF technique provides the lowest prediction accuracy because of the negative effects of the sparsity, drift, and decay issues; the highest RMSE value represents the lowest prediction accuracy. Ensemble Divide and Conquer [13] is used to solve the sparsity issue and the loss of the accurate location of data when the data are arranged in memory. Its results show better performance compared to CF. However, this method has a weakness in terms of drift and decay. Long Temporal-based Factorization [15] is used to solve the sparsity, overfitting, and decay issues by learning the long-term features through the convergence of the genre features of the items. The short temporal features (column 7) are defined by different duration factors to achieve accurate solutions. For example, Temporal Dynamics [25] defines the short term by time slices. However, it has weaknesses in terms of personalization.
The Short Temporal-based Factorization [16] and Temporal-based Factorization [1] approaches define the short-term periods using the k-means algorithm, where the number of clusters is assigned based on the number of periods (e.g., in MovieLens the whole time span of users' activities can be divided into 7, 15, or 30 clusters when assigning a one-month, two-week, or one-week period, respectively).
The Short Temporal-based Factorization [16] approach is used to solve the drift of users but ignores the issue of decay over long durations, which reduces the prediction accuracy of CF. The Temporal-based Factorization [1] approach is used to solve all issues, and its result is the best among the benchmark approaches. However, the benchmark approaches ignore running time, because the iterations of the optimization procedure are very slow, as shown in Table II (e.g., the minimum average running time is 1316 seconds). The bacterial foraging optimization algorithm is used with the last three temporal benchmark approaches, and its experimental running times are slow as shown in Table II, which represents a significant weakness. Recommendation systems need high accuracy as well as fast running time. As observed in Table II, the studied approaches have different execution times and accuracy. Approaches with high accuracy tend to have long running times; e.g., Temporal-based Factorization [1] provides a lower RMSE (high accuracy) but with a long execution time.
In contrast, the TWOCF approach learns the accurate features of each user within the shortest execution time. Additionally, the TWOCF approach provides the highest prediction accuracy compared to the other benchmark approaches. This means that the TWOCF approach has the best performance and can deal with all kinds of matrices, as the number of clusters is assigned automatically. Moreover, the TWOCF approach performs best in reducing the overfitted predicted scores and accurately learning the temporal features throughout the learning iterations, which reduces the negative effect of drift and decay on the prediction performance of the CF technique.

V. CONCLUSION
Recommendation systems are becoming popular because they can efficiently recommend products to customers based on their interests. CF-based recommendation systems perform well since they consider the rating matrix in their execution. Nevertheless, CF suffers from the sparsity issue, which is usually tackled using factorization approaches. Similarly, overfitting is another challenge, mainly addressed using optimization approaches. Additionally, the drift in users' preferences and the decay of items' popularity, which are addressed by temporal-based factorization approaches, are also major setbacks. Although the current solutions achieve some level of accuracy, there is still room for improvement, for example in dividing the temporal activities throughout the duration search space, reducing the runtime of the execution process, and lowering the error values of the predicted rating scores.
The TWOCF approach is proposed to render timely and accurate predictions within the rating matrix by accurately learning users' preferences and items' popularity pattern. TWOCF adopts the elbow clustering method to obtain the optimal number of temporal clusters. Also, the short-term weights of generated clusters are integrated with the factorization features for predicting the missing scores in the rating matrix. Results show that the TWOCF approach outperforms the benchmark schemes, improves the accuracy of the CF technique, and reduces its execution time.