Genres and Actors/Actresses as Interpolated Tags for Improving Movie Recommender Systems

A movie recommender system has proven to be a convincing tool for carrying out comprehensive and complicated recommendation, helping users find appropriate movies conveniently. Such a system recommends movies to a user based on the interests of similar users, e.g. collaborative filtering, or on the movies themselves, e.g. content-based filtering. Therefore, the systems require predetermined information about either users or movies. An interesting research question arises: "what if this information is missing or not manually provided?" The problem has not been addressed in the literature, especially for the 100K and 1M variations of the MovieLens datasets. This paper explores a movie recommender system that uses movies' genres and actors/actresses as input tags, which we call tag interpolation. We apply tag-based filtering and collaborative filtering to predict a list of movies similar to the movie that a user has watched. Because it does not depend on users' profiles, our approach eliminates the effect of the cold-start problem. The experimental results obtained on the MovieLens datasets indicate that the proposed model may deliver adequate performance in terms of efficiency and reliability, and thus provide better-personalized movie recommendations. A movie recommender system has been deployed to demonstrate our work. The collected datasets have been published on our GitHub repository to encourage reproducibility and further improvement.
Keywords: MovieLens; movie recommender systems; tag interpolation; collaborative filtering


I. INTRODUCTION
Recommender systems (RSs) have been developed to generate meaningful recommendations of products or items that might attract the attention of a group of users. RSs [1], [2] are now widely used in research [3], industry [4], and education communities [5], [6], where many approaches have been developed for improving recommendations. Real-world examples include recommendations of books on Amazon [7], music on Spotify [8], activities on social media [9], [10], services on Twitter [11], [12], and movies on Netflix [13]. The design of these systems depends on the particular characteristics of the datasets, e.g. ratings ranging from 1 (most disliked) to 5 (most liked). Additionally, the systems might incorporate other information such as descriptions, multimedia content, and demographic knowledge. Such data sources capture item-item, user-user, and user-item interactions. Recommender systems then analyze and learn the underlying patterns in these data sources to correlate users and/or items, and these correlations can be used to predict similar pairs. The architecture and evaluation of RSs remain an active research area. The many approaches to RSs can be grouped into several categories. Content-based recommendation models recommend items that are similar to the items a user has interacted with in the past. The second approach, collaborative filtering, recommends items based on all users' past ratings collectively. Tag-aware recommendation [14], [15] models the interactions among items independently of the existence of users. Our contribution extends current research on tag-based recommendation [16], [17].
Tag-based recommendation has been applied in various domains, including personalized social media services [18], e-learning environments [19], personalized location recommendation [20], image search [21], personalized news recommendation [22], personalized music recommendation [23], [24], and many others. A fourth category is the hybrid approach, which combines two or more of the previously mentioned categories [25], [26].
Content-based recommendation [27], [28] proposes items based on a comparison between the content of items and/or user profiles. The content of each item is represented as a set of descriptions, lists of terms, or tags, often words that appear in textual form. The recommended items are primarily those most similar to the items the user has rated highly. Content-based recommenders use different types of models to measure similarity between items in order to produce the best proposal. The term collaborative filtering (CF) was introduced with a commercial recommender system that recommends newsgroup documents to users [29]. CF analyzes interaction data across users to find matching patterns, from which other items are recommended [30]. The cold-start problem arises in CF systems when users exist but have not yet rated enough items. The motivation of CF is to leverage social collaboration to recommend the most similar items/products/services from a large amount of data. Applications of CF have been developed in a wide range of domains, recommending books [31], music [32], movies [33], advertisements [34], and other consumer products [35].
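To make the collaborative filtering idea concrete, the following minimal Python sketch predicts a missing rating with item-based CF: a movie's rating for a user is estimated as the similarity-weighted average of that user's ratings on the most similar movies. The toy rating data, the function names, and the choice of cosine similarity over co-rating users are illustrative assumptions, not the paper's implementation.

```python
from math import sqrt
from typing import Dict, Optional

# Hypothetical toy data: user -> {movie: rating on the 1-5 scale}.
Ratings = Dict[str, Dict[str, float]]


def item_cosine(ratings: Ratings, a: str, b: str) -> float:
    """Cosine similarity of two movies over the users who rated both."""
    common = [u for u, r in ratings.items() if a in r and b in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][a] * ratings[u][b] for u in common)
    norm_a = sqrt(sum(ratings[u][a] ** 2 for u in common))
    norm_b = sqrt(sum(ratings[u][b] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def predict(ratings: Ratings, user: str, movie: str, k: int = 2) -> Optional[float]:
    """Similarity-weighted average of the user's ratings on the k most similar movies."""
    rated = ratings[user]
    neighbours = sorted(
        ((item_cosine(ratings, movie, m), m) for m in rated if m != movie),
        reverse=True,
    )[:k]
    weight = sum(abs(s) for s, _ in neighbours)
    if weight == 0:
        return None  # no informative neighbours: a cold-start case
    return sum(s * rated[m] for s, m in neighbours) / weight


ratings: Ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"B": 4, "C": 1},  # u3 has not rated movie A yet
}
print(predict(ratings, "u3", "A"))
```

A user with no other rated movies gets `None`, which is exactly the cold-start situation described above.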
Over twenty years, the MovieLens datasets have fostered a blossoming body of research that has gained remarkable significance with the advent of e-commerce and the industry as a whole. Variations of the dataset have been downloaded hundreds of thousands of times, reflecting their popularity and distinctive contribution to the field of recommender systems and related subjects. The samples take the form of <user, item, rating, timestamp> tuples, where each tuple represents a personal preference for a movie at a particular time. A report by the datasets' creators shows that more than 7,500 references to the keyword "movielens" appear in Google Scholar [36]. A live research system built on the MovieLens datasets has been developed and maintained by experts to nurture personalization and recommendation research. The GroupLens research group developed MovieLens as an online movie recommendation system that allows users to rate movies and integrates ratings from different sources to collaboratively recommend movies to other people. On average, 20-30 new users signed up every day over a long period. The system allows people to create profiles, rate movies, establish their tastes, and receive recommendations.
Our approach interpolates genres and actors/actresses as tags that predict similarity among movies and provide appropriate suggestions. We investigate the two-way interactions between $\{item_i, [tags]_i\}$ and $\{item_j, [tags]_j\}$, where item $i$ is considered similar to item $j$ according to the similarity score between their tag sets [25], [37]. In practice, tags are collected from users' annotations while they use a recommender system. However, what if this information is missing or does not exist in the first place? Table I presents a quantitative summary of the MovieLens datasets, in which the first two variations contain no tag information. In this paper, the authors consider another design principle for a movie recommender system: watching movies that share genres and actors/actresses with other movies leads to watching more movies of the same categories. This principle leads to an approach called tag interpolation-based recommendation. We have evaluated the proposed approach on the MovieLens variations that contain no manual or user-collected tags; instead, the tags come from the movies' genres and actors/actresses. To the best of our knowledge, no tag-based movie recommender system has previously been built on the MovieLens 100K and MovieLens 1M variations.
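As a minimal sketch of the $\{item_i, [tags]_i\}$ representation, assume each movie's interpolated tags (genres plus cast names) are held in a Python set. If every tag is treated as one dimension of a 0/1 vector, the cosine similarity of two movies reduces to the number of shared tags divided by the square root of the product of their tag counts. The movie titles and cast names below are hypothetical placeholders.

```python
from math import sqrt
from typing import Set


def tag_cosine(tags_i: Set[str], tags_j: Set[str]) -> float:
    """Cosine similarity of two binary tag vectors, computed from the tag sets.

    For 0/1 vectors the dot product is the number of shared tags and each
    norm is the square root of the number of tags, so
    c = |T_i & T_j| / sqrt(|T_i| * |T_j|).
    """
    if not tags_i or not tags_j:
        return 0.0  # a movie without tags is similar to nothing
    return len(tags_i & tags_j) / sqrt(len(tags_i) * len(tags_j))


# Hypothetical {item, [tags]} pairs: genres and cast merged into one tag set.
movies = {
    "Movie A": {"Action", "Sci-Fi", "Actor X", "Actor Y"},
    "Movie B": {"Action", "Thriller", "Actor X"},
    "Movie C": {"Comedy", "Actor Z"},
}

print(tag_cosine(movies["Movie A"], movies["Movie B"]))  # 2 / sqrt(12)
print(tag_cosine(movies["Movie A"], movies["Movie C"]))  # no shared tags
```

Working on sets avoids materializing a vector with one dimension per tag, which matters when the vocabulary contains tens of thousands of actor/actress names.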

II. RELATED WORK
One of the early attempts to develop a model and build a movie recommender system was proposed by Azaria et al. [38]. In that paper, the authors introduce the profit and utility maximizer algorithm (PUMA), which is mounted on top of a black-box movie recommender system and predicts the movies that will maximize the system's revenue. Another research direction focuses on human emotions as the input for movie recommendation [39]; that approach accepts the user profile as part of the system. Deldjoo et al. introduce a multimodal content-based movie recommender system [40] that is evaluated on the MovieLens 20M dataset. They exploit the effects of genres as a metadata feature; however, tags are already provided in MovieLens 20M. Genre features have been further addressed by the same research team of Deldjoo [41]. Another interesting paper that focuses on tag-aware recommendation and the effects of tags on a recommender system is presented in [42]. In that paper, the authors investigated tags from genres and textual reviews on a tag-available dataset, e.g. MovieLens 10M. These models have one thing in common: they perform the recommendation task on tag-available datasets with extra aggregation from other sources, e.g. genres, users' information, and textual reviews. Our approach differs from these previous ones in that the tags are automatically interpolated from genres and actors/actresses, without any additional manual effort from users and without predetermined tags. Consequently, this work is the first to exploit tag interpolation on the MovieLens 100K and MovieLens 1M datasets and can serve as a reference for further tag-based recommendation research.

A. Datasets
As mentioned in the previous section, the authors employ the MovieLens 100K and MovieLens 1M variations in the experiments because they contain no pre-defined tags; instead, tags are interpolated from the movies' genres and actors/actresses. The datasets [36] can be downloaded from the MovieLens 100K and MovieLens 1M websites. The MovieLens 100K dataset consists of 100,000 ratings (from 1 to 5) from 943 users on 1,682 movies, and each user has rated at least 20 movies. The MovieLens 1M dataset comprises 1,000,209 ratings (from 1 to 5) from 6,040 users on 3,706 movies. For each dataset, the training and test sets have already been split for five-fold cross-validation. The authors run the proposed model on all folds and average the results.
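The dataset statistics above can be checked directly from the raw rating files. The sketch below assumes the MovieLens 100K layout of one tab-separated `user item rating timestamp` record per line (as in `u.data`); for MovieLens 1M's `ratings.dat`, which uses `::` as the separator, pass `sep="::"`. The function name and the returned keys are our own choices.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_ratings(lines: Iterable[str], sep: str = "\t") -> Dict[str, object]:
    """Count ratings, users and movies in MovieLens-style rating records."""
    per_user: Counter = Counter()
    movies = set()
    values = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating, _timestamp = line.split(sep)
        per_user[user] += 1
        movies.add(item)
        values.append(int(rating))
    return {
        "ratings": len(values),
        "users": len(per_user),
        "movies": len(movies),
        "min_ratings_per_user": min(per_user.values(), default=0),
        "rating_range": (min(values), max(values)) if values else None,
    }


# Usage with the real file (the path is an assumption):
# with open("ml-100k/u.data") as f:
#     print(summarize_ratings(f))
```

Run on the full MovieLens 100K `u.data`, the counts should agree with the figures quoted above (100,000 ratings, 943 users, 1,682 movies, at least 20 ratings per user).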
Interpolated tags. We identify 19 different genres in both MovieLens variations. These tags can be easily extracted from the u.genre file of each MovieLens dataset. Actors/actresses, in contrast, are not included explicitly. Instead, the authors link the movies of each MovieLens dataset with their corresponding web pages at the Internet Movie Database (IMDb) using the u.item file, and extract the actors/actresses from IMDb. A movie in MovieLens 100K has at least 1 and at most 45 actors/actresses, while a movie in MovieLens 1M has at least 2 and at most 235. In total, 14,291 and 46,198 actors and actresses are extracted from MovieLens 100K and 1M, respectively. The summary of interpolated tags is presented in Table II, and the top 10 most used interpolated tags are summarized in Tables III and IV. The collected datasets are available at our GitHub repository. We encourage reproducibility, further comparison, and improvement.
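The genre tags can be read without any external data. In MovieLens 100K, `u.genre` lists each genre with its index (e.g. `Action|1`), and each `u.item` line is pipe-separated with the movie id first, the IMDb URL in the fifth field, and one 0/1 flag per genre at the end. The sketch below turns those flags into a tag set per movie and keeps the IMDb URL from which the actor/actress extraction starts; the IMDb extraction itself is not shown, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def load_genres(genre_lines: Iterable[str]) -> Dict[int, str]:
    """Map genre index -> genre name from u.genre lines such as 'Action|1'."""
    index_to_name = {}
    for line in genre_lines:
        line = line.strip()
        if line:
            name, idx = line.split("|")
            index_to_name[int(idx)] = name
    return index_to_name


def genre_tags(
    item_lines: Iterable[str], index_to_name: Dict[int, str]
) -> Tuple[Dict[int, Set[str]], Dict[int, str]]:
    """Return (movie id -> genre tag set, movie id -> IMDb URL) from u.item lines.

    Assumes genre indices run 0..n-1, matching the order of the trailing flags.
    """
    n_genres = len(index_to_name)
    tags, urls = {}, {}
    for line in item_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("|")
        movie_id = int(fields[0])
        flags = fields[-n_genres:]  # the genre flags are the last n_genres fields
        tags[movie_id] = {index_to_name[k] for k, flag in enumerate(flags) if flag == "1"}
        urls[movie_id] = fields[4]
    return tags, urls
```

When reading the real files, we assume `encoding="latin-1"` is needed, since some MovieLens 100K titles are not valid UTF-8.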

B. Evaluation Metric
Root mean squared error (RMSE) and mean absolute error (MAE) are widely used to evaluate the performance of a recommender system on a rating prediction task. These errors quantify the difference between the true rating values and the rating values predicted by the recommender system. In this work, the authors evaluate the performance of the system by RMSE. We denote by $r_{ij}$ and $\hat{r}_{ij}$ the true and the predicted rating values, respectively, and by $N$ the number of $(i, j)$ pairs in the test set. The RMSE $e$ is then calculated as follows:

$$e = \sqrt{\frac{1}{N} \sum_{(i,j)} \left( r_{ij} - \hat{r}_{ij} \right)^2} \qquad (1)$$

The smaller $e$ is, the better the result.
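Equation (1) and the averaging over the five folds can be written in a few lines. This is a minimal sketch; MAE is included for comparison even though only RMSE is reported, and using the population standard deviation for the ± term is our assumption, since the text does not state which estimator is used.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import Sequence, Tuple


def rmse(true: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error between true and predicted ratings, Eq. (1)."""
    if len(true) != len(pred) or not true:
        raise ValueError("need two non-empty sequences of equal length")
    return sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def mae(true: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error between true and predicted ratings."""
    if len(true) != len(pred) or not true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


def fold_summary(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of per-fold scores."""
    return mean(scores), pstdev(scores)


print(rmse([4, 3, 5], [3, 3, 4]))  # sqrt(2/3), about 0.816
```

Because RMSE squares each error, a single badly predicted rating moves it more than it moves MAE, which is one reason the two metrics are often reported together.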
Furthermore, based on the computed rating scores of movies, the similarity between any two movies $u$ and $v$ is calculated using the cosine similarity $c_{u,v}$:

$$c_{u,v} = \frac{\sum_{k=1}^{m} u_k v_k}{\sqrt{\sum_{k=1}^{m} u_k^2}\,\sqrt{\sum_{k=1}^{m} v_k^2}} \qquad (2)$$

where $m$ is the dimension of the vectors $u$ and $v$. The list of recommended movies is sorted in descending order of $c_{u,v}$.
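Equation (2) and the ranking step can be sketched as follows. The vectors are hypothetical per-movie score vectors of equal dimension $m$; ties are broken by movie name so that the order is deterministic, which is a choice of this sketch rather than something the paper specifies, and a zero vector is given similarity 0 to avoid division by zero.

```python
from math import sqrt
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity c_{u,v} of two m-dimensional vectors, Eq. (2)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension m")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(target: str, vectors: Dict[str, Sequence[float]], k: int = 10) -> List[str]:
    """Other movies sorted by descending c_{u,v} with the target movie."""
    scores = [
        (cosine(vectors[target], vec), movie)
        for movie, vec in vectors.items()
        if movie != target
    ]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [movie for _, movie in scores[:k]]


vectors = {"A": [5, 0, 3], "B": [4, 0, 3], "C": [0, 5, 0]}
print(most_similar("A", vectors, k=2))
```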
Equation (1) is used to evaluate the performance of the recommender system: for the list of recommended movies ranked by Equation (2), the system compares the ratings of the most similar movie with its predicted ratings using Equation (1). In the deployed system presented in Section IV, however, the list of recommended movies matters more to users than the RMSE score, so Equation (1) is not computed at run time.

C. Experimental Results
The MovieLens 100K and MovieLens 1M datasets provide several kinds of metadata, such as genres and, through their IMDb links, the names of actors/actresses. This information is treated as the features of each movie. We describe the experimental results in the following three scenarios.
1) Scenario 1: Tags are interpolated from the movies' genres. In this scenario, the authors investigate the performance of the recommender system using genres as the tags. The experimental results are presented in Tables V and VIII for MovieLens 100K and MovieLens 1M, respectively.
2) Scenario 2: Tags are interpolated from the movies' actors/actresses. Scenario 2 examines the effect of actor/actress tags on the system's performance. The experimental results are presented in Tables VI and IX for MovieLens 100K and MovieLens 1M, respectively.
3) Scenario 3: Tags are interpolated from the combination of the movies' genres and actors/actresses. In the last experimental scenario, the authors combine the genre and actor/actress tags. Tables VII and X present the system's performance in this scenario.
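The three scenarios differ only in which metadata are turned into tags. The sketch below builds the tag sets for each scenario and ranks the other movies by binary-tag cosine similarity. Prefixing tags with `genre:` and `actor:` is an assumption we add so that a genre and a person with the same name cannot collide in Scenario 3; the movie and cast names are placeholders.

```python
from math import sqrt
from typing import Dict, Iterable, List, Set


def interpolate_tags(
    genres: Dict[str, Iterable[str]],
    actors: Dict[str, Iterable[str]],
    scenario: str,
) -> Dict[str, Set[str]]:
    """Build per-movie tag sets for scenario 'genres', 'actors' or 'both'."""
    if scenario not in ("genres", "actors", "both"):
        raise ValueError(f"unknown scenario: {scenario}")
    tags: Dict[str, Set[str]] = {}
    for m in set(genres) | set(actors):
        t: Set[str] = set()
        if scenario in ("genres", "both"):
            t |= {f"genre:{g}" for g in genres.get(m, ())}
        if scenario in ("actors", "both"):
            t |= {f"actor:{a}" for a in actors.get(m, ())}
        tags[m] = t
    return tags


def rank_similar(tags: Dict[str, Set[str]], target: str, k: int = 5) -> List[str]:
    """Top-k movies sharing at least one tag with the target, by tag cosine."""
    t = tags[target]
    scored = []
    for m, s in tags.items():
        if m == target or not t or not s:
            continue
        score = len(t & s) / sqrt(len(t) * len(s))
        if score > 0:
            scored.append((score, m))
    scored.sort(key=lambda x: (-x[0], x[1]))
    return [m for _, m in scored[:k]]


genres = {"A": ["Action", "Sci-Fi"], "B": ["Action", "Sci-Fi"], "C": ["Comedy"]}
actors = {"A": ["Actor X"], "B": ["Actor Y"], "C": ["Actor X"]}
for scenario in ("genres", "actors", "both"):
    print(scenario, rank_similar(interpolate_tags(genres, actors, scenario), "A"))
```

In this toy example, genre tags alone favour movie B, actor tags alone favour movie C, and the combined tags rank B first while still surfacing C.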

A. System Design
The movie recommender system follows Django's Model-View-Template (MVT) pattern for building an application with user interaction. Templates contain the HTML code rendered by Django. A request is dispatched to a View by URL mapping; if the URL is matched, the View interacts with the Model and returns the rendered Template to the user as the response. The website stores its data in Django's default SQLite database, and Django's built-in lightweight server is used for application development and testing.
The system provides functionality for two groups of users, i.e. administrators and regular users. An administrator manages users, users' information, movies, movies' information, databases, and suggestions. Administrators have the highest rights in the system and can add, edit, delete, and search for movies and users. Users are allowed to register, log in, and search for movies and actors/actresses. The overview of our proposed movie recommender system is illustrated in Fig. 1.

B. System Implementation
To implement our recommendation system, the authors have deployed a website for our movie recommender system. The application has 15 preliminary features for both users and administrators. The functionality of the website is shown in Fig. 2, and the database design is presented in Fig. 3. The website has been developed using the Django framework [44] and the relational database management system SQLite [45]. A screenshot of the website can be seen in Fig. 4. The recommendation function is demonstrated in Fig. 5, where the watched movie is shown in the main position on the left and its list of similar movies on the right. All model training and prediction and the website deployment are done on an ordinary laptop with the following configuration: Intel Core i5, 12GB of RAM, 240GB high-speed SSD, and Windows 10. Before using the system, users need to register an account; they are not asked to specify a preferred movie genre. Users interact with the recommender system through the web interface. User profiles are created as users watch and rate movies. A list of recommended movies is generated every time a movie is watched, so the system produces recommendations automatically in real time.
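The watch-then-recommend loop described above can be sketched with Python's built-in `sqlite3` module. The two-table schema is a hypothetical simplification, not the database design of Fig. 3, and the deployed site goes through Django rather than raw SQL. On each watch event the sketch stores the rating and returns the movies that share the most tags with the watched one, scored by binary-tag cosine; the square root is computed in Python because SQLite's math functions are not available in every build.

```python
import sqlite3
from math import sqrt
from typing import List

# Hypothetical minimal schema: one row per (movie, interpolated tag) and per rating.
SCHEMA = """
CREATE TABLE movie_tag (movie_id INTEGER, tag TEXT, PRIMARY KEY (movie_id, tag));
CREATE TABLE rating (
    user_id INTEGER, movie_id INTEGER, rating INTEGER,
    PRIMARY KEY (user_id, movie_id)
);
"""


def on_watch(conn: sqlite3.Connection, user_id: int, movie_id: int,
             rating: int, k: int = 5) -> List[int]:
    """Record a rating, then return the k movies most similar to the watched one."""
    conn.execute("INSERT OR REPLACE INTO rating VALUES (?, ?, ?)",
                 (user_id, movie_id, rating))
    conn.commit()
    # Number of tags each other movie shares with the watched movie.
    shared = conn.execute(
        """SELECT other.movie_id, COUNT(*)
           FROM movie_tag AS watched
           JOIN movie_tag AS other ON other.tag = watched.tag
           WHERE watched.movie_id = ? AND other.movie_id != ?
           GROUP BY other.movie_id""",
        (movie_id, movie_id),
    ).fetchall()
    # Total number of tags per movie, for the cosine denominator.
    sizes = dict(conn.execute(
        "SELECT movie_id, COUNT(*) FROM movie_tag GROUP BY movie_id"))
    n_watched = sizes.get(movie_id, 0)
    scored = [(count / sqrt(n_watched * sizes[m]), m) for m, count in shared]
    scored.sort(key=lambda x: (-x[0], x[1]))
    return [m for _, m in scored[:k]]
```

For example, after `conn = sqlite3.connect(":memory:")` and `conn.executescript(SCHEMA)`, inserting tag rows and calling `on_watch(conn, user_id, movie_id, rating)` both records the rating and returns the recommendation list, mirroring how a list of similar movies appears each time a movie is watched.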

V. REMARKS AND DISCUSSION
The experiments have been conducted on MovieLens 100K and MovieLens 1M, which originally lack tags. Note that tag information is only available in later variations of the datasets, e.g. MovieLens 10M and MovieLens 20M. The authors interpolate the movie genres and actors/actresses as tags. The experimental results suggest that the proposed tag interpolation works properly and can improve the development of movie recommender systems whose tags are missing. We have achieved better RMSE scores than other approaches running on the tag-available MovieLens datasets [42]. From the experimental results reported in [42] on a similar movie recommender system, we infer that our proposed tag interpolation approach is more effective than probabilistic matrix factorization [46], collaborative topic regression [47], factorization machines [48], and the regression latent factor model [49].
The RMSE scores are quite similar across all experimental scenarios. For MovieLens 100K, the average RMSE scores are 1.1457 ± 0.0126, 1.0556 ± 0.0250, and 1.0556 ± 0.0250 for genre tags, actor/actress tags, and the combination of genres and actors/actresses, respectively. The running time is very short for genre tags, i.e. less than 1 second. For MovieLens 1M, the RMSE scores are slightly better than those of the 100K variation: the average scores are 1.0364 ± 0.0190, 1.0335 ± 0.0025, and 1.0334 ± 0.0025 for genre tags, actor/actress tags, and the combination of genres and actors/actresses, respectively. Running times increase with the number of interpolated tags. This is expected: the more data processed, the more time required.

VI. CONCLUSION
Movie recommender systems have become an indispensable component of a wide range of websites and e-commerce applications. Tags are increasingly used in many recommender systems, and appropriate algorithms are available to exploit them. This work addresses a simple research question: what if the tags are missing or do not exist in the first place? Our answer is that tags can be interpolated from other characteristics of the movies themselves. The proposed approach makes it highly convenient for users to get meaningful movie recommendations, and several experimental scenarios have validated its effectiveness. The main contribution of the paper is to MovieLens-based research, where no previous work has applied tag-based recommendation to the 100K and 1M variations. As illustrated by our experimental results, genres and actors/actresses used as interpolated tags have proven effective and applicable. We have implemented a complete movie recommender system with 15 preliminary functions for both users and system administrators. The interpolated-tag datasets are also available on our GitHub repository. Future work will focus on applying the approach to other datasets with similar characteristics.