Incorporating News Tags into Neural News Recommendation in Indonesian Language

News recommendation systems can help users discover articles that align with their interests, which is critical for alleviating information overload. To generate effective news recommendations, one key capability is to accurately capture the contextual meaning of the text in news articles, since this is pivotal to acquiring useful representations of both news content and users. In this work, we examine the effectiveness of the neural news recommendation with attentive multi-view learning (NAML) method on a news recommendation task in the Indonesian language. We further propose to incorporate news tags, which to some extent capture the important contextual meaning contained in news articles, to improve the effectiveness of the NAML method in an Indonesian news recommendation system. Our results show that the NAML method is significantly better than, or at least comparable to, the other neural models evaluated for Indonesian news recommendation. Further incorporating news tags significantly increases the performance of the NAML method, by 5.86% in terms of NDCG@5.


I. INTRODUCTION
A recommendation system plays a crucial role in helping users discover items that match their needs and preferences [1]. In business, it is a vital part of marketing strategies, particularly for boosting online sales by offering customers a curated selection of items tailored to their preferences [1,2]. This process of recommendation can be automated using a wide range of techniques, such as content-based methods and collaborative filtering [3], and more recently involves computing user and item embeddings, which serve as a basis for predicting how much a user is likely to prefer a particular item [4,5]. Recommendation systems have found application in various domains, including movie recommendation [6,7,8], news recommendation [9,10,11,12,13], and music recommendation [14,15,16,17].
News recommendation is one of the most frequently employed use cases of recommendation systems within the digital landscape, notably embraced by online platforms and news publishers [2,18,10]. An enormous volume of news articles is generated and published online every day, so it is impractical for users to manually sift through all of them to find content that aligns with their interests [2,19,20]. Therefore, personalized news recommendation becomes vital, as it enables online news platforms to target user preferences effectively and alleviate the issue of information overload [10].
In the past few years, news recommendation models have benefited from the use of deep learning methods [9,11,10,13,12]. These models are often referred to as neural recommendation models. Various neural recommendation models have been introduced to tackle news recommendation challenges, including the NAML (Neural news recommendation with Attentive Multi-view Learning) method [11]. The NAML method uses an attentive multi-view learning model that learns unified news representations from different kinds of news information, such as news titles, bodies, and categories. This method takes advantage of multiple useful sources of information in news articles that can enrich the semantics captured in the news representation. As a result, this method has been shown to outperform a range of deep learning methods for news recommendation, such as the Convolutional Neural Network (CNN) [21], the Deep fusion model (DFM) [22], and Deep news recommendation based on knowledge-aware CNN (DKN) [23]. The NAML method also offers the flexibility to incorporate other useful information from news articles to produce a more accurate news representation, which may result in more effective news recommendations. Given this reported effectiveness, as well as the potential to utilize extra knowledge for improving news representation, we propose to use the NAML method to perform the Indonesian neural news recommendation task in this work.
As mentioned above, the NAML architecture enables us to easily incorporate extra information to produce the news representation. Therefore, in this work we also investigate the effectiveness of integrating news tags as supplementary information into the news recommendation model, with the aim of improving the news representation. We argue that news tags to some extent capture the important points of an article, and may therefore provide extra useful information for generating a better news representation. Previous research has primarily relied on a limited set of news components, such as title, category, and subcategory [24]. To the best of our knowledge, the use of news tags to enhance recommendation performance has not been explored in previous work.
Finally, this research addresses two primary research questions:
1) RQ1. How well does the NAML neural recommendation method perform on the news recommendation task in the Indonesian language?
2) RQ2. To what extent can the news encoder in the NAML model benefit from the use of news tags in the Indonesian news recommendation task?
By answering these research questions, this study contributes insights into the effectiveness of a neural method, i.e., NAML, for Indonesian news recommendation systems, as well as the impact of incorporating news tags into the NAML architecture.
The rest of this paper is organized as follows. Section II reviews the relevant literature on news recommendation models, news information, and text embedding. Section III outlines the research methodology adopted in this study. Section IV presents and discusses the results of the data analysis. Section V serves as the conclusion, encapsulating the key findings. Finally, Section VI provides recommendations for future work based on the findings of this study.

A. Text Embedding
Text embedding is a technique that represents text as a vector of real numbers. This allows computers to process text in a more meaningful way, as the vector representations can capture the semantic meaning of the text. There are several text embedding methods that try to capture the semantic and syntactic meaning of the text input.
The Word2Vec model serves as a fundamental baseline for word embedding [25]. The original work introduced two model architectures specifically crafted for generating continuous vector representations of words from vast datasets. These representations were evaluated for their quality on a word similarity task. The Word2Vec model has demonstrated its capability to address various tasks, including text summarization [26,27,28], ranking for academic expert finding [29,30], and text classification [31,32,33]. Several models similar to Word2Vec were subsequently introduced, including GloVe (Global Vectors for Word Representation) [34] and FastText [35]. These models build upon Word2Vec's foundations, enhancing the learning of word representations to acquire deeper semantic insights from the corpus.
BERT, introduced in 2018, is a bidirectional model that can learn context-aware and informative representations of words and phrases [36]. As a bidirectional model, BERT learns word representations from both forward and backward contexts, which distinguishes it from previous models limited to forward context. BERT is pre-trained on a massive text corpus and can be fine-tuned for a variety of natural language processing tasks. Such models remain challenging to build for underrepresented languages, which face obstacles due to limited data availability and the substantial corpus required for effective training [37].
There have been notable works on BERT-based models created specifically for the Indonesian language. These models, known as IndoLEM [38] and IndoNLU [39], were developed independently, each trained on a different corpus of Indonesian text. IndoLEM and IndoNLU represent valuable contributions to Indonesian NLP research. They serve as pre-trained language models that have learned to encode the linguistic patterns, semantics, and contextual information present in Indonesian text data. In addition, BERT-based models have been demonstrated to enhance performance in classification tasks [31]. In our research, we evaluate the capabilities of these two distinct BERT-based models for obtaining text embeddings. Furthermore, part of our research is to conduct a comprehensive comparison of various text embedding methods and evaluate their performance within the context of a news recommendation system.

B. News Information
In news recommendation systems, the use of news information as the news representation plays a pivotal role in constructing the entire system [2,19]. The recommendation model aims to construct a user profile customized to individual preferences through an analysis of the articles that a user reads. To achieve this, news representation involves transforming some or all of the news information into vector form [24].
Typically, the components input into the model encompass essential elements such as the article's title, abstract, body, category, and subcategory. These components collectively serve as the basis for generating a comprehensive vector representation of news articles. This representation is instrumental in modeling user preferences and enabling the recommendation system to provide personalized news suggestions that align with each user's interests [4,5].
Another valuable component of news information is the news tags. Although news tags carry topic information that is closely related to the news content, existing personalized news recommendation methods usually ignore their value. Users of a news portal typically use tags to track certain topics, since a tag leads to other articles that share it. This behaviour motivates our investigation of whether adding news tags as additional information benefits news recommendation models.
One method for processing tag information leverages social bookmarking [40]. This method utilizes tags that are assigned by a community of users with shared preferences. It is important to note that tags in this method signify a collective understanding within the user group, aligned with their common interests and preferences. An alternative approach creates a probability relation graph among tags, exploring potential correlations among different tags [41]. Building on this foundation, the authors employ the term frequency-inverse document frequency (TF-IDF) method to determine tag weights. Additionally, they introduce an approach for calculating the degree of correlation between tags, using conditional probability as the key metric.
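The conditional-probability idea mentioned above can be illustrated with a minimal sketch. This is not the procedure of [41], whose details are not given here; it is a generic co-occurrence estimate, and the function name `tag_conditional_probabilities` and the toy tag lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tag_conditional_probabilities(articles_tags):
    """Estimate P(b | a) for tag pairs from per-article tag lists.

    Hypothetical helper: P(b | a) is approximated as
    (#articles tagged with both a and b) / (#articles tagged with a).
    Returns a dict mapping (a, b) -> P(b | a).
    """
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in articles_tags:
        unique = set(tags)  # count each tag at most once per article
        tag_counts.update(unique)
        for a, b in combinations(sorted(unique), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {(a, b): count / tag_counts[a] for (a, b), count in pair_counts.items()}


# Toy example with made-up tags.
probs = tag_conditional_probabilities([["x", "y"], ["x"], ["y", "z"]])
print(probs[("x", "y")])  # estimate of P(y | x)
```

Pairs of tags that never co-occur are simply absent from the returned dictionary, which keeps the structure sparse for large tag vocabularies.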

C. News Recommendation Methods
In recent years, the field of news recommendation systems has received significant attention due to the exponential growth of online news consumption and the need for personalized content delivery. Several studies have explored various approaches for improving the accuracy and relevance of news recommendations. News recommendation systems can be broadly categorized based on how they model user behavior. Two primary categories exist: Candidate-Agnostic (C-AG) models and Candidate-Aware (C-AW) models [42]. Both categories are illustrated in Fig. 1. In C-AG models, the User Encoder (UE) forms user embeddings exclusively from the embeddings of previously clicked news articles, without taking into account the candidate news articles [10,9,11,12]. Essentially, the user embedding remains the same regardless of the specific candidate news being presented. One of the C-AG models is NAML, which uses a multi-view learning framework to learn unified news representations by incorporating several kinds of news information, such as title, body, and category. The NAML architecture is easily modified to add other news information. For the user encoder, NAML utilizes a news attention network to identify crucial news articles, aiding in the acquisition of informative user representations [11]. The reported experimental results showed significant improvements in recommendation accuracy for this deep learning approach compared to traditional methods. Due to this performance, we have selected NAML as the primary model for our research.
Conversely, in C-AW models, the User Encoders (UEs) produce user embeddings that are influenced by the content of the candidate news articles. This means that in C-AW models, the user embeddings can vary depending on the specific candidate news article under consideration [23,13]. While C-AG models maintain a consistent user embedding, C-AW models adapt the user representation to the characteristics of the candidate news, thereby potentially improving recommendation accuracy.
Considering the importance of contextual information in news recommendations, researchers have explored the incorporation of additional factors such as temporal relevance and user context. With the advent of deep learning techniques, several studies have investigated the application of neural networks for news recommendation [10,9,11,13].
As news recommendation systems have evolved, several metrics have been employed to compare the performance of models. The most common ones include AUC (Area Under Curve), MRR (Mean Reciprocal Rank), and NDCG (Normalized Discounted Cumulative Gain) [4,43]. For models treating the news recommendation task as a classification problem, the AUC score is a frequently used metric for measuring the accuracy of the model. MRR is calculated as the reciprocal of the position of the first relevant element in the ranking. NDCG considers graded relevance (rating values) along with positional information of the recommended items. Both MRR and NDCG assess the relevancy of recommendations from the news recommendation system (NRS) model.
In summary, news recommendation systems have been extensively studied, with approaches that exploit contextual information and deep learning. These studies have contributed to the advancement of news recommendation systems, enhancing their accuracy, relevance, and personalization capabilities.
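To make the distinction concrete, the following toy sketch contrasts the two kinds of user encoder. It is a deliberate simplification under our own assumptions: the candidate-agnostic encoder is a plain average of clicked-news embeddings, and the candidate-aware encoder weights clicked news by a softmax over dot products with the candidate. The actual models discussed here use learned attention, recurrent, or convolutional components; the function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def candidate_agnostic_user(clicked):
    """C-AG style: the user vector depends only on the clicked news."""
    n = len(clicked)
    dim = len(clicked[0])
    return [sum(vec[d] for vec in clicked) / n for d in range(dim)]


def candidate_aware_user(clicked, candidate):
    """C-AW style: clicked news are re-weighted by similarity to the candidate."""
    scores = [dot(vec, candidate) for vec in clicked]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(candidate)
    return [sum(w * vec[d] for w, vec in zip(weights, clicked)) for d in range(dim)]


clicked = [[1.0, 0.0], [0.0, 1.0]]
print(candidate_agnostic_user(clicked))           # same for every candidate
print(candidate_aware_user(clicked, [5.0, 0.0]))  # leans toward the first click
print(candidate_aware_user(clicked, [0.0, 5.0]))  # leans toward the second click
```

A practical consequence of the candidate-agnostic design is that the user vector can be computed once per user and reused for every candidate, whereas the candidate-aware variant must be recomputed per candidate.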
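The three metrics can be sketched for a single impression as follows. This is a minimal illustration rather than the evaluation code used in the experiments: it assumes binary click labels, uses the label itself as the gain in DCG, and follows the definition of MRR given above (reciprocal rank of the first relevant item). All function names are hypothetical.

```python
import math


def rank_by_score(labels, scores):
    """Return the labels reordered from highest to lowest model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def auc(labels, scores):
    """Fraction of (clicked, non-clicked) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l > 0]
    neg = [s for l, s in zip(labels, scores) if l <= 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mrr(ranked_labels):
    """Reciprocal of the position of the first relevant item (0 if none)."""
    for position, label in enumerate(ranked_labels, start=1):
        if label > 0:
            return 1.0 / position
    return 0.0


def dcg_at_k(ranked_labels, k):
    return sum(label / math.log2(position + 1)
               for position, label in enumerate(ranked_labels[:k], start=1))


def ndcg_at_k(ranked_labels, k):
    """DCG@k divided by the DCG@k of the ideal ordering (0 if no relevant item)."""
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0


labels = [0, 1, 0, 1]          # 1 = clicked
scores = [0.9, 0.8, 0.1, 0.7]  # hypothetical model scores
ranked = rank_by_score(labels, scores)
print(auc(labels, scores), mrr(ranked), ndcg_at_k(ranked, 5))
```

In practice these per-impression values are averaged over all impressions in the test set.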

A. Dataset
We conducted experiments on a real-world news recommendation dataset collected from one of the largest news portals in Indonesia. We randomly sampled users who had at least 5 news click records during 4 weeks from Feb 10 to March 16, 2023. The last week was used as the test data. We collected the behaviour logs of these users in this period and formatted them into impression logs. An impression log is a record of the news articles that are presented to a user when they visit a news page, including the time of the visit and the user's interactions, such as clicks on these news articles. Fig. 2 shows the layout of the news article page and an example of the news information presented within an article. Regarding the news dataset, we focus on several news components to determine their influence on the performance of news recommendation systems. These components encompass the title, abstract (short description), news body, category, and subcategory. Furthermore, our primary research objective is to scrutinize the impact of news tags on the model. By examining how each of these elements shapes the model's performance, we aim to gain insights into the factors that drive effective news recommendations. The statistical information about the dataset is summarized in Table I.
Fig. 3a, 3b, and 3c provide insights into the length distributions of news titles, abstracts, and bodies. Notably, news titles are quite short, averaging only 10.4 words. In contrast, both news abstracts and bodies are significantly longer, offering the potential for more in-depth coverage of news content. This discrepancy in text length underscores the value of incorporating diverse types of news information, including titles, abstracts, and bodies. Such integration provides a more comprehensive and nuanced perspective on the content of news articles. In contrast to the approach outlined in [41], our methodology concatenates the news tags into a sentence, i.e., a text format. We then treat this combined text in the same manner as other news information, such as the title and abstract. As a result, the tags are transformed into a textual format that can be seamlessly integrated into the news content.
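The tag concatenation step described above can be sketched as follows. The exact preprocessing is not specified in the text, so the separator, the whitespace cleanup, and the function name `tags_to_text` are assumptions, and the example tags are made up.

```python
def tags_to_text(tags, separator=" "):
    """Join an article's tags into one text string.

    Hypothetical helper: empty tags are dropped and surrounding whitespace
    is stripped, so the result can be tokenized like a title or abstract.
    """
    cleaned = [tag.strip() for tag in tags if tag and tag.strip()]
    return separator.join(cleaned)


# Made-up tags for illustration only.
tag_text = tags_to_text(["Yamaha", " Grand Filano ", ""])
print(tag_text)          # Yamaha Grand Filano
print(tag_text.split())  # whitespace tokens fed to the text encoder
```

Treating tags as ordinary text means the same tokenizer, embedding table, and text encoder used for the title and abstract can be reused without a tag-specific vocabulary.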
Our initial observations, as illustrated in Fig. 4, reveal a connection between the amount of text data and the presence of tag words within it. As the amount of text grows, there is a higher probability of encountering words shared between the tags and the other textual content. Nevertheless, the proportion of tag words that also appear in other news information remains relatively low. This suggests that, even though some tag words are found within other news information, tags can still contribute to a more comprehensive understanding of the article, and highlights their potential significance in enriching the news representation.
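One plausible way to compute such an overlap is sketched below. How Fig. 4 was actually computed is not stated, so this is only an assumed definition: the share of distinct tag words that also occur in another text field, using lowercase whitespace tokens. The function name and the example strings are hypothetical.

```python
def tag_overlap_ratio(tags, text):
    """Share of distinct tag words that also appear in `text` (0.0 if no tags)."""
    tag_words = {word for tag in tags for word in tag.lower().split()}
    if not tag_words:
        return 0.0
    text_words = set(text.lower().split())
    return len(tag_words & text_words) / len(tag_words)


tags = ["Yamaha", "Grand Filano"]
title = "Yamaha meluncur di Jakarta"  # made-up title for illustration
print(tag_overlap_ratio(tags, title))
```

Under this definition a longer field such as the body naturally yields a higher ratio than the title, which matches the trend described above.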

B. Experiment Settings
In our study, we employed two pre-trained word embedding models, FastText and GloVe. Additionally, we utilized the feature-based approach with an Indonesian BERT architecture to generate more contextualized word embeddings. We report average results in terms of AUC, MRR, NDCG@5, and NDCG@10. For the training objective, we primarily focus on parameter tuning by minimizing the cross-entropy loss (with negative sampling) as a straightforward classification objective [44].
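A common form of this objective scores one clicked article against K sampled non-clicked articles from the same impression and applies a softmax cross-entropy over the K+1 scores. The sketch below assumes that form; the value of K, the sampling strategy, and the function names are our assumptions, not settings reported in this paper.

```python
import math
import random


def negative_sampling_loss(pos_score, neg_scores):
    """Cross-entropy of the clicked item against sampled negatives.

    Equals -log(exp(pos) / (exp(pos) + sum(exp(neg)))), computed with the
    log-sum-exp trick for numerical stability.
    """
    scores = [pos_score] + list(neg_scores)
    top = max(scores)
    log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum_exp - pos_score


def sample_negatives(non_clicked_ids, k, rng):
    """Pick k non-clicked news ids; sample with replacement if too few exist."""
    if len(non_clicked_ids) >= k:
        return rng.sample(non_clicked_ids, k)
    return rng.choices(non_clicked_ids, k=k)


rng = random.Random(0)
negatives = sample_negatives(["n1", "n2", "n3", "n4", "n5"], 4, rng)
print(negatives)
print(negative_sampling_loss(2.0, [0.5, -1.0, 0.0, 0.3]))
```

The loss approaches zero when the clicked article's score is far above all negatives, and equals log(K+1) when all scores are identical.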

C. Text Embedding Models
As part of our benchmarking process, we trained three word embedding models (Word2Vec, GloVe, and FastText) using the Indonesian Wikipedia corpus extracted from the Indo4B corpus [39]. Furthermore, we also trained a FastText model using the entire Indo4B corpus. For our BERT-based models, we leveraged models from two distinct sources, namely indolem [38] and indobert [39]. These pretrained BERT models serve as integral components in our investigation of text embeddings and their impact on news recommendation systems.

D. News Recommendation Models
In this paper, we assess several C-AG models, each distinguished by its News Embedding (NE) component, which essentially determines how it embeds the clicked news articles:
(1) LSTUR [9] focuses on learning user representations using recurrent networks. In this model, a short-term user embedding is generated from the clicked news articles using a Gated Recurrent Unit (GRU) [45]. This short-term embedding is then combined with a long-term embedding, which is initialized with random values and fine-tuned during training. In this research, we use two variants of the LSTUR model, LSTUR-con and LSTUR-ini.
(2) NAML [11] employs additive attention [46] to encode users' preferences. This means that NAML utilizes an attention mechanism to capture and emphasize important aspects of a user's preferences when making recommendations. Our primary focus in this experiment is on NAML, given its strong performance and adaptability, especially in accommodating changes to the news information included in the model.
(3) NRMS [10] adopts a more intricate approach to learning user representations. It employs a two-layer encoder, consisting of multi-head self-attention [36] and additive attention. This design allows NRMS to capture and process user information from different perspectives, potentially leading to more comprehensive user representations for improved recommendations.
(4) TANR [12] proposes a new encoder that learns topic-aware news representations by being trained simultaneously on an auxiliary topic classification task.
In addition, we examine one C-AW model to gain insights into the performance and effectiveness of this category. We evaluate CAUM [13], which fuses two key components: a candidate-aware self-attention network, which models long-range connections among clicked news items while taking into account the specific candidate news; and a candidate-aware convolutional network (CNN), which captures immediate user interests from nearby clicks and is also influenced by the content of the candidate news. Ultimately, the user's candidate-aware embedding is derived by considering both the long-range and short-term representations.

IV. RESULTS & ANALYSIS
Table II shows that the evaluated NRS models produce diverse performance results across the testing scenarios. These assessments highlight the competitive performance of two prominent models, NAML and CAUM, in terms of accuracy and relevancy. However, a notable distinction emerges: NAML demonstrates a significant advantage in runtime efficiency, notably surpassing the CAUM model in processing speed. This combination of high performance and reduced computational overhead presents a compelling argument for adopting NAML as our main model.
Using the NAML model as our main framework, we conduct a more in-depth assessment of how news information influences the news representation and, consequently, the performance of the news recommendation model. The results are shown in Table III. Our approach processes the various news text fields and categories independently. We then merge these elements using an attention network, constructing a holistic representation that encompasses the collective impact of the news components. This method enables us to systematically investigate the synergistic effects of diverse news content on model performance.
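The merging step can be sketched as view-level additive attention: each view vector (for example title, abstract, tags, category) receives a scalar score, the scores are normalized with a softmax, and the news vector is the weighted sum of the views. The sketch below follows that general scheme in plain Python; the parameter names `weight`, `bias`, and `query` and the toy values are hypothetical stand-ins for learned parameters, not the trained model.

```python
import math


def softmax(values):
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_score(view, weight, bias, query):
    """Scalar score query . tanh(weight @ view + bias) for one view vector."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, view)) + b)
              for row, b in zip(weight, bias)]
    return sum(q * h for q, h in zip(query, hidden))


def merge_views(views, weight, bias, query):
    """Return the attention-weighted sum of the views and the attention weights."""
    alphas = softmax([attention_score(v, weight, bias, query) for v in views])
    dim = len(views[0])
    merged = [sum(a * v[d] for a, v in zip(alphas, views)) for d in range(dim)]
    return merged, alphas


# Toy 2-dimensional view vectors standing in for encoded title, abstract, and tags.
views = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weight = [[0.2, -0.1], [0.4, 0.3]]  # hypothetical learned projection
bias = [0.0, 0.1]
query = [1.0, -1.0]                 # hypothetical learned query vector
news_vector, alphas = merge_views(views, weight, bias, query)
print(alphas, news_vector)
```

Adding a new view such as tags only requires encoding it to the same dimensionality and appending it to `views`, which is what makes this kind of architecture easy to extend.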

Regarding news representation, the results in Table III reveal a key insight: relying on a single source of news information may not suffice for an optimal representation. The notable improvement comes from not depending solely on a single type of news information but integrating various types, such as title, abstract, and tags. This suggests the need to leverage the complementary aspects of diverse news components. By incorporating a variety of information sources, the model can achieve a more comprehensive and nuanced understanding of news articles, ultimately improving the quality of the news representation. Table III also indicates a significant improvement in model performance when news tags are incorporated alongside the abstract as part of the news information. This observation underscores the positive impact of integrating news tags into the model, particularly when combined with other textual elements like the abstract. Such integration significantly increases the performance of the NAML method by 0.88%, 3.11%, 5.86%, and 2.59% for AUC, MRR, NDCG@5, and NDCG@10, respectively.
An example of the NAML model with and without tags is illustrated in Fig. 5. Based on this example, it is evident
The analysis of user interactions with news articles has yielded consistent findings across varying levels of user engagement with news content. These findings are summarized in Table IV. Surprisingly, users who interact with only a few news articles demonstrate outcomes comparable to those who engage with a more extensive collection of articles. This observation suggests that the specific selection of the last N news items may not exert a substantial influence on the observed results. This insight prompts further exploration into the dynamics of user interactions and their impact on the recommendation process, potentially leading to more refined approaches to personalizing news recommendations.
In addition, we examined different text embedding methods and their performance when integrated into the NAML model. The results of these evaluations can be found in Table V. Our objective was to ascertain whether utilizing BERT-based models, which have demonstrated remarkable capabilities in various natural language processing tasks, would yield a significant performance boost within the NAML method.
However, our findings suggest that the differences in performance between word embeddings and BERT-based models are not statistically significant when applied to the NAML model. In other words, the NAML model does not exhibit a substantial improvement in recommendation performance when integrated with BERT-based text embeddings compared to more conventional word embeddings. This underscores the importance of carefully choosing the right models and text embedding methods when optimizing recommendation systems.

V. CONCLUSION
This research investigates the effectiveness of the neural news recommendation with attentive multi-view learning (NAML) method for news recommendation in the Indonesian language. We further propose to incorporate news tag information into the NAML architecture in order to improve the semantics of the news representation, with the aim of enhancing the Indonesian news recommendation system. According to our experimental results, the NAML method can significantly outperform some state-of-the-art neural models, such as LSTUR, TANR, and NRMS, on the Indonesian news recommendation task. Although NAML demonstrates effectiveness similar to the CAUM method, it maintains superiority in terms of efficiency. Furthermore, our investigation into news representation underscores the significance of diversifying news information sources. Combining multiple types of news data, rather than relying solely on a single source, has shown promising potential for enhancing model performance. In particular, the incorporation of news tags into the NAML architecture has demonstrated notable effectiveness, resulting in improvements of 3.11% for MRR and 5.86% for NDCG@5.
In addition, we conducted performance assessments of the NAML model using various text embeddings. We observed no significant improvement when utilizing different types of embeddings as news representations, i.e., Word2Vec, FastText, GloVe, and IndoBERT. Lastly, our analysis of user interactions with news articles has revealed findings regarding user engagement levels and their influence on recommendation outcomes. We found that users who interact with only a few news articles demonstrate outcomes comparable to those who engage with a more extensive collection of articles. This insight encourages further exploration into personalized news recommendation strategies, with the potential to refine and tailor recommendations based on a deeper understanding of user behaviors.

VI. FUTURE WORK
This study's findings suggest several promising directions for future research aimed at enhancing news recommendation systems. First, the model parameters could be fine-tuned using a contrastive learning objective, specifically a supervised contrastive loss. This approach can be employed to improve the separation between clicked and non-clicked news articles within the representation space. Second, it would be worthwhile to explore the automatic extraction of entities from news articles using a Named Entity Recognition (NER) method as an integral part of the news recommendation pipeline. This approach can help uncover the relatedness of entities between relevant and non-relevant news articles, providing valuable context for improving recommendation robustness. Finally, research in news recommendation systems should diversify beyond contextual user history, adopting different methodologies for broader insights and improved system performance.

Fig. 2. Illustration of page layout and main article example.

Fig. 4. Percentage of tags information included in other news information.

Fig. 5. Recommendation results with and without news tags for the same impression. The news clicked by the user in this impression is in blue and bold.
The Yamaha Grand Filano comes equipped with various interesting features, from its hybrid engine to its super spacious trunk. The Yamaha Grand Filano Hybrid-Connected was officially launched in Indonesia on Tuesday, January 17, 2023, in Jakarta. In Indonesia, the Grand Filano will join the Classy Yamaha segment alongside the Fazzio. In terms of design, the Grand Filano can be considered more fashionable compared to the Fazzio, which has ...

TABLE I. STATISTICAL INFORMATION ABOUT THE DATASET

TABLE II. THE PERFORMANCE OF VARIOUS NEWS RECOMMENDATION MODELS. SIGNIFICANT DIFFERENCES WITH RESPECT TO BASELINES LSTUR/TANR/NRMS/CAUM ARE INDICATED USING †/‡/*/♢ FOR p < 0.05. RUN TIME IS REPRESENTED IN THE FORMAT HH:MM:SS

TABLE III. PERFORMANCE METRICS BEFORE AND AFTER ADDING THE NEWS TAGS. SIGNIFICANT DIFFERENCES ARE INDICATED USING † FOR p < 0.05 OR ‡ FOR p < 0.001.

TABLE IV. RESULTS ON DIFFERENT NUMBERS OF NEWS CLICKED BY USERS

TABLE V. RESULTS ON DIFFERENT TEXT EMBEDDINGS