Fairness Embedded Adaptive Recommender System: A Conceptual Framework

In the current fast-paced and constantly changing environment, companies should ensure that the way they interact with users is both relevant and highly adaptive. To stay competitive, companies should invest in state-of-the-art technologies that optimize the relationship with the user using increasingly available data. The most popular applications used to develop user relationships are Recommender Systems. The vast majority of traditional recommender systems treat recommendation as a static procedure and focus on a specific type of recommendation, which makes them slow to adapt to new situations. In addition, when implementing a Recommender System there is a need to ensure fairness in the way decisions are made upon customer data. This paper proposes a novel Reinforcement Learning-based recommender system, the Fairness Embedded Adaptive Recommender System (FEARS), that is highly adaptive to changes in customer behavior and focuses on ensuring both producer and consumer fairness. The approach overcomes Reinforcement Learning's main drawback in the recommendation area by using a small but meaningful action space. Two fairness metrics are also presented, together with their calculation and adaptation for use with Reinforcement Learning, so that the system reaches the optimal trade-off between personalization and fairness.

Keywords: Algorithmic fairness; reinforcement learning; recommender systems; system adaptability


I. INTRODUCTION
In the current, constantly changing business environment, companies are required to respond appropriately to emerging challenges and to adapt quickly to customers' new needs and expectations in order to stay "top of mind" with prospects and clients. In retailing, a family of applications called Recommender Systems (RecSys) can help businesses stay relevant to their customers by leveraging existing data about users and/or items in order to help users find the right item for them [1]. One disadvantage of current RecSys approaches such as Collaborative filtering or Content-based recommendation is that they consider only two elements, users and items, when delivering recommendations. This makes it impossible to detect important patterns that involve other elements and to adapt recommendations to the context or to a changing environment. Each recommendation approach also has its own limitations. Items recommended through Content-based filtering are always similar to the items previously bought or consumed by the user [2], while Collaborative filtering provides a good solution only in static scenarios where many users have bought or consumed the same product [3]. Hybrid recommender systems combine two or more recommendation strategies in different ways to benefit from their complementary advantages [4] and overcome the limitations of the individual components. Another limitation of RecSys, regardless of strategy, is the assumption that the user's underlying preferences remain unchanged, so that the recommendation procedure is treated as a static process [5,6].
One of the best-known approaches for building adaptability into a system is Reinforcement Learning (RL) [7][8][9]. A series of publications explore the use of RL in the area of RecSys. Some of them focus on the user-item interaction sequence or the user's browsing history and use it to create a state that is later fed to the RL model [10][11][12][13][14][15]. A different approach is to use user and item sets obtained from bi-clustering as environmental states [6]. An earlier paper uses both user information and item information vectors and refers to them as context [16]. Important work on integrating the negative influence of irrelevant recommendations has been done using negative rewards [12,13,15,17].
Based on the literature review, and to the best of the author's knowledge, no existing work explores simultaneously the following elements of an efficient and fair recommender system: 1) focus on customer relationship development, 2) adaptability to new situations, 3) optimization for long-term customer engagement using negative rewards where appropriate, and 4) awareness of both consumer and provider fairness.
Thus, this paper presents the design of a Fairness Embedded Adaptive Recommender System (FEARS) whose main aim is the development of the customer relationship. The conceptual framework combines multiple recommendation strategies by leveraging extensive information about users, items and context. The recommendation strategies are combined and used as the action space by an RL agent. This way, the system is able to automatically learn the optimal policy through trial and error, by recommending and receiving reinforcements from the user's feedback. This allows the system to adapt quickly to the changing needs of customers and to pursue a more long-term recommendation strategy that builds a fruitful relationship with the client. Last but not least, the rewards of the RL engine are defined in such a way that both consumer and provider fairness are ensured.
488 | Page www.ijacsa.thesai.org

II. RELATED WORK
This section first describes the basic problem of Recommender Systems. Next, the specifics, latest advances and limitations of each recommendation strategy are presented. Then, Reinforcement Learning is introduced, together with an analysis of its usage to date and its limitations in the recommendation area.
In addition, the different fairness formalizations for ML and recommender systems are explored in order to give an overview of how these definitions are translated into implementation.

A. Recommender Systems (RecSys)
In the human decision-making process, obtaining recommendations from trusted sources is a critical component. Usually, this role is played by family, friends or subject-matter experts. The goal of a recommender system is to create and give relevant recommendations of items or products to users. Depending on the structure of the learning system, the following types of systems are traditionally distinguished [4]:
• Collaborative Filtering: In this type of system, a user is recommended items based on the previous ratings of the users that bought/used the product.
• Content-based Filtering: These systems recommend items that are similar to items the user has liked in the past.
• Hybrid approaches: These methods try to combine both collaborative and content-based approaches into one in order to overcome the individual limitations of each of the approaches.
Currently, the architecture of RecSys and their evaluation on real-world problems is an active area of research.

B. Collaborative Filtering
Collaborative filtering (CF) systems collect user feedback in the form of ratings or ranks and make recommendations to the active user based on items that other users with similar preferences liked in the past [38].
The aim of any recommendation system is to suggest elements that are relevant to users by extracting latent variables [39][40][41][42].
The latest advances in the field include graph encoding, Stochastic Shared Embeddings, large-scale Pairwise Collaborative Ranking and Sequential Recommendation via Personalized Transformer [43]. These mainly address the problem of scaling to massive datasets, learn user and item embeddings, and treat the problem as a sequence of actions rather than one-shot recommendations. User and item embeddings are essentially a different way of referring to latent variables, and a series of works leverage the power of Neural Networks to learn them [44][45][46].

C. Content-Based Filtering
The intuition behind content-based recommendation is to suggest to a customer a product similar to those the customer has previously purchased. The method therefore tries to retrieve similar objects. There are two main types of measures used to estimate this relationship: measures of distance and measures of similarity between objects [47].
Most of the advances in the content-based recommendation area are based on finding the best ways to represent an item as a vector or, in other words, to obtain its embedding [48][49][50][51][52][53].
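The two families of measures mentioned above can be illustrated with a minimal sketch. It assumes that each item is already represented by a small numeric vector; the function names, the toy catalog and the choice of cosine similarity and Euclidean distance are illustrative assumptions, not measures prescribed by this paper.

```python
import math

def cosine_similarity(a, b):
    """Similarity measure: 1.0 for vectors pointing in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Distance measure: 0.0 for identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(target, catalog):
    """Return item ids ordered from most to least similar to `target`."""
    return sorted(catalog,
                  key=lambda item: cosine_similarity(target, catalog[item]),
                  reverse=True)

# Hypothetical two-dimensional item embeddings.
catalog = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranking = most_similar([1.0, 0.1], catalog)
```

In a content-based recommender, the target vector would typically be built from the items the user has previously purchased.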

D. Recommendation of Complementary Products
When recommending complementary products, the system tries to leverage the transaction history of customers [54,55] through association methods such as APRIORI.
The APRIORI algorithm has remained essentially unchanged since its introduction to the research community, although there are sporadic efforts to extend it [56,57].
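Association methods evaluate rules of the form "antecedent → consequent" through their support and confidence. The sketch below computes these two quantities on a toy set of transactions; it is not the full APRIORI candidate-generation procedure, and the basket contents and function names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated probability of `consequent` given `antecedent`."""
    antecedent_support = support(antecedent, transactions)
    if antecedent_support == 0:
        return 0.0
    joint = set(antecedent) | set(consequent)
    return support(joint, transactions) / antecedent_support

# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"pasta", "sauce"},
    {"pasta", "sauce", "cheese"},
    {"pasta"},
    {"bread"},
]
rule_support = support({"pasta", "sauce"}, transactions)
rule_confidence = confidence({"pasta"}, {"sauce"}, transactions)
```

A rule with high confidence, such as pasta → sauce here, is a candidate for a complementary recommendation.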

E. Reinforcement Learning (RL)
Reinforcement Learning is an area of machine learning that has been inspired by behavioural psychology. The field focuses on how a software agent (hereinafter agent) should take actions and how to interact with an environment so as to maximize a total reward function.
An agent can interact with the environment and learn through trial and error, just like humans and animals. Every action that the agent performs in an environment influences the future state of the agent. Each action also yields a reward, and this is the only response the learner receives [58]. The mechanism that generates the reward and the transition from one state of the agent to another is referred to as the dynamics of the environment [47].
The agent's goal is to maximize its total long-term reward through the way it responds to its environment. This can happen if the agent explores the environment and tries to learn its dynamics.
Formally, the environment is a mathematical model known as the Markov Decision Process (MDP), encountered primarily in dynamic programming. The difference between the classical methods of dynamic control and RL is that the latter does not need to know the MDP model and can be used when these processes are very complex and other methods are infeasible [59]. The basic MDP model contains the following components:
• A set of environmental states S1, ..., Sn ∈ S: These can refer to the inherent characteristics of the agent or objects that surround and interact with it.
• A set of actions that the agent can take, A1, ..., Am ∈ A: These refer to all possible actions that the agent controls.
• Transition function from one state to another: Being a Markov process, the next state of the system depends only on its previous state and the action taken, not on the whole history.
• The reward function represents the value of the reward obtained after taking action At in state St.
An agent is a computer program that is able to observe and interact with the environment defined by the MDP. The agent perceives the environment as a set of observations that define a state. The agent interacts with the environment in a feedback loop by following the steps below:
1) The agent observes the characteristics of the environment that define the current state, St.
2) The agent chooses an action, At, from the set of possible actions, with which it responds to the environment in the current state St.
3) The agent waits until the characteristics of the environment change to the state St+1 and the agent receives the reward Rt+1.
The agent's behaviour, or the way it interacts with the environment, is described by a function called the action policy or simply the policy [60]. It specifies the actions to be taken when the agent is in a certain state. The agent's learning goal is to find a policy that maximizes the total reward.
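The observe, act and receive-reward loop described above can be sketched as follows. The text does not prescribe a learning algorithm, so tabular Q-learning with an epsilon-greedy policy is used here purely as an assumed example; the two-state `ToyEnvironment`, its reward rule and all parameter values are hypothetical.

```python
import random

class ToyEnvironment:
    """Hypothetical environment: reward 1 when the action equals the state."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0
        self.state = self.rng.choice([0, 1])  # random transition
        return self.state, reward

def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """One Q-learning update of the action-value table `q` (in place)."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])

def run(steps=1000, epsilon=0.1, seed=0):
    """Interaction loop: observe St, choose At, receive Rt+1 and St+1."""
    actions = [0, 1]
    env = ToyEnvironment(seed)
    rng = random.Random(seed + 1)
    q = {(s, a): 0.0 for s in (0, 1) for a in actions}
    state = env.state                                   # 1) observe the state
    for _ in range(steps):
        if rng.random() < epsilon:                      # 2) choose an action
            action = rng.choice(actions)                #    (explore)
        else:
            action = max(actions, key=lambda a: q[(state, a)])  # (exploit)
        next_state, reward = env.step(action)           # 3) reward, new state
        q_update(q, state, action, reward, next_state, actions)
        state = next_state
    return q
```

The greedy policy derived from the returned table picks, for each state, the action with the highest value.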

F. Recommender Systems using Reinforcement Learning
As previously mentioned, the literature already contains RecSys that include an RL engine. It is useful to formalize the RL problem in the RecSys area and to examine how the approaches differ across studies.
As mentioned above, the RL problem can be formally defined as an MDP. This requires specifying the States, Actions and Rewards.
States are defined in different ways in the existing literature. They can reflect a mapping of previous user-item interactions into a hidden state [15], the user's recommendation and ad browsing history [13], previous items that a user clicked [12], the sequence of visited and recommended items [10], or a more detailed interaction sequence that contains clicking, purchasing, skipping or leaving [14]. An interesting approach is to define states as the clusters resulting from the co-clustering or biclustering of users and items [6], or to extend the state to include user demographics [5]. Effort has also been invested in how best to represent the state in an RL RecSys [61]. To the best of the author's knowledge, the current literature contains no approach in which the recommendation context, user demographics, behavioral patterns and recent browsing/interaction history are all taken into account in the state definition.
Actions are mostly defined as selecting an item to be recommended from the whole discrete action space that contains the candidate items [12,14,15], or as deciding whether to give a recommendation at all and, if so, which item to recommend [13]. Some authors consider recommending a list of items [5,11,61]. One of the most distinctive approaches is to recommend items from clusters neighboring the user-item cluster [6]. As mentioned in multiple articles [62,63], RL in RecSys has a common efficiency issue: the action space, consisting of all candidate items, is too large, and thus a huge amount of interaction data is required to learn an optimal policy.
The reward function depends heavily on user feedback and on the actions the user takes. For example, the user can click or purchase a recommended item, yielding a positive reward, or skip it, yielding a different reward value [10][11][12][13][15]. The reward can consist of immediate user feedback as well as a longer-term objective [14]. Most rewards are not deterministic and depend very much on how the user reacts, but there are also formulations in which the reward is deterministic, computed as the Jaccard distance between the user vectors of the states at time t and t+1 [6].
It is also important to note the research direction towards using negative rewards. These can help the learning agent search for a policy that is appropriate for overcoming information fatigue [12,13,15,17].

G. Fairness in Machine Learning
In the same way as people, algorithms are vulnerable to biases that exist in data and can lead to an unfair decision or outcome. More than 20 types of biases in ML were extracted, categorized and explained by researchers [24,26] in order to motivate and accelerate the process of mitigating them.
Put simply, in the context of high-stakes decision-making, fairness is the absence of any prejudice or favoritism towards an individual or a group based on their inherent or acquired characteristics that are considered sensitive variables. Thus, a fair algorithm is one whose decisions are not skewed towards a particular group of people. GDPR, the UK Equality Act, the Fair Housing Act and the Equal Credit Opportunity Act define protected classes such as race, gender, age or disability and state the fairness and equality principles [64].
The simplest and most straightforward definition of fairness is "fairness through unawareness": "A ML model is said to achieve fairness through unawareness if protected attributes are not explicitly used in the prediction process".
Although these variables are not used in developing the ML model, this does not mean that the information cannot be retrieved from other variables. Chiappa & Isaac [65] emphasize that fairness should be expressed not only in terms of sensitive variables but also considering correlated or proxy variables. Not considering these proxy variables has been shown to increase the risk of discrimination [27].
Mehrabi et al. [66] distinguish three different types of fairness definitions:
1) Individual Fairness, where the system should give similar outputs to similar individuals,
2) Group Fairness, where the ML system treats different groups equally, and
3) Subgroup Fairness, which intends to obtain the best properties of the group and individual notions of fairness.
Equal treatment under group fairness can in turn be defined through [66,37]:
1) Equal Opportunity, where the probability of a person from the positive outcome class being assigned a positive outcome should be equal for both protected and unprotected group members,
2) Demographic Parity, where the likelihood of a positive outcome should be the same regardless of whether the person is in the protected group or not,
3) Disparate Impact, which considers the ratio between the unprivileged and privileged groups' likelihood of a positive outcome. Disparate impact uses the "not less than 80%" rule to define whether a process has disparate impact or not.
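The three group fairness notions above can be made concrete with a minimal sketch. It assumes binary predictions (1 = positive outcome) and binary true labels for two groups; the function names and the toy data are illustrative assumptions, and expressing the first two notions as absolute gaps is one common convention rather than a definition taken from [66,37].

```python
def positive_rate(predictions):
    """Share of individuals that received a positive outcome."""
    return sum(predictions) / len(predictions)

def true_positive_rate(labels, predictions):
    """Share of truly positive individuals that received a positive outcome."""
    hits = [p for y, p in zip(labels, predictions) if y == 1]
    return sum(hits) / len(hits)

def equal_opportunity_gap(labels_a, preds_a, labels_b, preds_b):
    """0.0 means both groups have the same true positive rate."""
    return abs(true_positive_rate(labels_a, preds_a)
               - true_positive_rate(labels_b, preds_b))

def demographic_parity_gap(preds_a, preds_b):
    """0.0 means both groups have the same positive outcome rate."""
    return abs(positive_rate(preds_a) - positive_rate(preds_b))

def disparate_impact(unprivileged_preds, privileged_preds):
    """Ratio of positive outcome rates, unprivileged over privileged."""
    return positive_rate(unprivileged_preds) / positive_rate(privileged_preds)

def passes_80_percent_rule(ratio):
    """The "not less than 80%" rule applied to a disparate impact ratio."""
    return ratio >= 0.8

# Hypothetical data: four individuals per group.
unpriv_labels, unpriv_preds = [1, 1, 0, 0], [1, 0, 0, 0]
priv_labels, priv_preds = [1, 1, 0, 0], [1, 1, 0, 0]
ratio = disparate_impact(unpriv_preds, priv_preds)
```

On this toy data the unprivileged group receives positive outcomes at half the rate of the privileged group, so the 80% rule flags disparate impact.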
The main approaches for tackling unfairness fall into three groups: 1) pre-processing, 2) in-processing and 3) post-processing [37]. Pre-processing methods extract representations from the data in order to remove undesired biases [27]. This unbiased data is then used for model development. Some of the methods in this family are adversarial learning, causal methods, relabeling, perturbations, resampling, reweighing, transformation and variable blinding [27]. In-processing methods constrain a model to produce fair outputs by including fairness in the learning mechanism, for example through adversarial learning, bandits, constraint optimization, regularization or reweighing [27]. Post-processing methods work with model outputs to make them fair using calibration, thresholding and transformation approaches [27].

H. Fairness in Recommender Systems
As presented in a previous section, RecSys make recommendations to support decision making by studying user behavior and historical patterns. Because they are widely applied in various fields, such as recommending music, books, people to hire or jobs, a partial view of the system towards any of the involved sides can be detrimental [67,68].
Since these systems use past data, they also inherit 1) Historical Bias, which is the bias that already exists in the world [26], 2) Representation Bias, when the sample used from a population is not representative of the whole population [26], and 3) Social Bias, when other people's actions or generated content affect a person's opinions [69]. Alongside these biases, the system itself displays 1) Popularity Bias, when items that are more popular tend to be exposed more [70], 2) Algorithmic Bias, when the bias is not present in the input data and is added later by the algorithm through the way it works [69], 3) Presentation Bias, when the way items are presented impacts the attention those items get (e.g., users can click only what they see; thus, items presented more often will get more clicks) [69], and 4) Ranking Bias, when top-ranked items are perceived as more interesting and thus receive more traffic [71]. Fig. 1 shows how the different types of biases feed each other in a RecSys.
Considerations of fairness have been actively studied in the context of recommender systems. Burke [72] introduced the multisided view of fairness in recommender systems. In the case of recommendations, the system facilitates a transaction between parties [73]. Fairness towards all the involved parties is important, and a point of balance should be found. Burke et al. [74] divide the stakeholders of any given recommender system into three categories: 1) Consumers are the individuals that receive recommendations, 2) Providers are those that stand behind the recommended items or products and gain from consumers' choices, and 3) the System is the platform itself, which tries to match providers with consumers and benefits from doing so successfully. The recommender system's objective for a consumer is to give the best items for the consumer's needs, through personalization, in such a way that these items do not prevent the consumer from obtaining an overall utility as high as that of people from other groups, thus in a fair way. For a provider, the recommender system needs to ensure that the provider's items get sufficient exposure and that items are shown to the consumers that have the highest probability of buying or consuming them, thus in a relevant way. The platform's utility is also important, because it is the initial motivation for having the recommender system in place. A key issue that arises in recommender systems is the tension between a personalized view of recommendation delivery and fairness objectives [74,75].
Provider fairness in recommender systems is typically defined for the objects or subjects to be ranked. It has been explored and formalized as:
1) a bound on the number of items related to each of the protected attributes that are allowed to appear in the top k positions of the ranking [76],
2) a sufficient presence of items belonging to different groups [77],
3) a consistent treatment of similar items [77],
4) a proper representation of items from both protected and unprotected groups [77],
5) exposure, disparate exposure and group fairness disparity, all three proportional to the merit of the item, defined as relevance to the query [78],
6) pairwise fairness, which requires that the likelihood of a clicked item being ranked above another relevant unclicked item is the same across both groups [79],
7) pairwise statistical parity, which states that if two candidates from different groups are compared, then on average each group has an equal chance of being top ranked [79],
8) set-based fairness at discrete points in the ranking with a logarithmic discount, which emphasizes that fairness at top ranks is more important than at lower ranks [80],
9) difference in acceptance rates, which measures whether relevant items from the advantaged and disadvantaged classes are accepted at the same rates [81].
In the area of consumer fairness, the following metrics can be used:
1) value unfairness, which occurs when one group of users is consistently given higher or lower predictions than their true preferences [82]. Value unfairness becomes large when predictions for one group are consistently overestimated and predictions for the other group are underestimated.
2) absolute unfairness, which measures inconsistency in absolute estimation error across different user groups [82]. This means that the advantaged group has the unfair advantage of good recommendations, while the other groups have poor recommendations.
3) non-parity unfairness is computed as the absolute difference between the overall average ratings of disadvantaged users and those of advantaged users for recommended items [82].
4) balanced neighborhood that expresses the fact that recommendations for all users are generated from neighborhoods that are balanced with respect to the protected and unprotected classes [74].
Overall, consumer fairness is less represented in the literature than provider fairness.
There are efforts in the area of mitigating bias and ensuring fairness in recommender systems by using regularization terms [82,83,84], reinforcement learning [78,85] and neighborhood balancing [74].
Although the literature is rich in methods to mitigate unfairness, not all of them are applicable to the dynamic nature of recommender systems. Ge et al. [85] show that enforcing fair decisions through static fairness criteria leads to unexpected unfairness in the long run, and that fairness cannot be defined in a static setting without considering the long-term impact and evolution. As with the need to bring adaptability into the recommender system's results, the need for a dynamic view of fairness can be addressed by using reinforcement learning.
The practice of using Reinforcement Learning to ensure fairness is an emerging research area [27]. In terms of implementation, the fairness dimension can be given to the RL agent as a reward that is positive in the case of fair outputs and negative otherwise [86,87]. Other approaches formulate the problem as a Constrained Markov Decision Process [85].

III. PROPOSED APPROACH
This section proposes a conceptual framework for the Fairness Embedded Adaptive Recommender System, which aims to balance personalization and fairness for long-term customer engagement. First, the objectives of the recommendation system are introduced together with possible solutions. Then, based on these, a novel architecture for this type of system is proposed. Next, it is described how the fairness dimension can be introduced into the RL engine and how the personalization-fairness trade-off can be resolved.

A. Fairness Embedded Adaptive Recommender System
The objective of the present paper is to create a conceptual design of a recommender system that satisfies a series of requirements:
• System is focusing on customer relationship development.
• System is incorporating an adaptivity functionality.
• System is optimizing for long term customer engagement.
• System is ensuring consumer and provider fairness.
• System is using a small action and state space in the RL engine.
• System is solving the personalization and fairness trade-off.

B. System Overview
Once converted, the relationship with a new customer must be developed for it to become profitable. In simple terms, this means understanding and covering the client's needs.
The objective of the application is to extract consumer preferences and use this knowledge to find the most appropriate products and/or content to recommend through communication and interaction with the customer. At the same time, the recommended content should bring the user maximum utility and give the user equal opportunities compared to people from other social groups.

1) Database creation:
The application starts by setting up data sources (Fig. 2, 1). The information considered mandatory in a recommender system application is a) consumer data, b) their past interactions, c) data on provider's items characteristics, d) items' reviews and e) metadata about the current browsing session.
2) Preprocessing: The next step is to prepare the tables in the form in which they will be used in the different recommendation components. This means that a series of tables with different structures will be created: a) the Customer-Item Matrix Table contains information about the items purchased or consumed by a consumer during the analysis period, b) the Transaction-Item matrix is typically stored in transactional format, where a transaction contains several rows, c) the Items characteristics table contains all the tangible and intangible characteristics of an item, as well as statistics about how many times it was recommended, clicked, bought from recommendation lists, etc.
3) Recommendation Components: Once all the main tables are prepared (Fig. 2, 2), three recommendation components are developed, and their results are merged and combined to create the action space for the RL Engine (Fig. 2, 3-4).
4) User-oriented collaborative filtering recommendation (Fig. 2, 3.1): The method starts from the assumption that similar users have similar preferences [88] and reflects the real situation in which recommendations from friends are more effective. The model tries to explain the Customer-Item Matrix Table using a set of latent factors. Latent structures are automatically deduced from the matrix, as long as the number of factors is specified [88]. Once the factors are discovered, the model associates the membership of an item in a factor with the user's inclination towards the same factor. For each customer, the model recommends products that have values close to "1" in the reconstructed matrix and have not been purchased in the past.
5) Content-based recommendation (Fig. 2, 3.2): In the case of new users or items, because there is no prior history, the recommendation strategy uses the Items characteristics table. For example, one can use all available information about the tangible and intangible properties of the item, including embeddings extracted from unstructured data.
6) Complementary item recommendation (Fig. 2, 3.3): In this step, rule sets are extracted from transactions using association algorithms. This is extremely useful as it emphasizes the context of using/consuming the initial item. It also brings completeness to the customer's need by saying "if you want to use this, do not forget about that".
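For the collaborative filtering component (Fig. 2, 3.1), the final selection step can be sketched as follows. The sketch assumes that some factorization model has already produced a reconstructed row of the Customer-Item Matrix for one customer; the factorization itself is not shown, and the function name, the parameter `k` and the toy values are hypothetical.

```python
def recommend_unpurchased(purchased_row, reconstructed_row, k=2):
    """Return indices of the k unpurchased items with the highest scores.

    purchased_row:     1 if the customer bought the item, else 0
    reconstructed_row: scores from the (assumed) latent factor model,
                       where values close to 1 indicate a likely match
    """
    candidates = [
        (score, index)
        for index, (bought, score)
        in enumerate(zip(purchased_row, reconstructed_row))
        if not bought
    ]
    # Highest score first; ties are broken by the lower item index.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in candidates[:k]]

# Hypothetical customer: items 0 and 3 were already purchased.
purchased = [1, 0, 0, 1, 0]
reconstructed = [0.9, 0.8, 0.1, 0.95, 0.6]
top_items = recommend_unpurchased(purchased, reconstructed, k=2)
```

Items already purchased are excluded even when their reconstructed score is the highest, which matches the rule stated for this component.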
Following these recommendation strategies, their outcomes are combined in order to create the action space for the RL model (Fig. 2, 4). As can be seen, there are also the "Random Recommendation" (Fig. 2, 3.4) and "No Recommendation" (Fig. 2, 3.5) components, which bring exploration and novelty into the recommendation landscape and keep the system from harming stakeholders through unfair decisions.
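A minimal sketch of such a small action space is given below. It assumes that each recommendation component has already produced an ordered candidate list for the current user; the `Action` members, the `resolve_action` helper and the item ids are hypothetical, and combinations of strategies are omitted for brevity.

```python
import random
from enum import Enum

class Action(Enum):
    """One action per recommendation strategy (Fig. 2, 3.1-3.5)."""
    COLLABORATIVE = "collaborative"
    CONTENT_BASED = "content_based"
    COMPLEMENTARY = "complementary"
    RANDOM = "random"
    NO_RECOMMENDATION = "no_recommendation"

def resolve_action(action, candidates, catalog, rng):
    """Translate an abstract action into a concrete item id, or None.

    candidates: dict mapping an Action to that component's ordered items
    catalog:    list of all item ids, used only by the random action
    rng:        random.Random instance, passed in for reproducibility
    """
    if action is Action.NO_RECOMMENDATION:
        return None
    if action is Action.RANDOM:
        return rng.choice(catalog)
    items = candidates.get(action, [])
    return items[0] if items else None  # component had nothing to offer

# Hypothetical candidate lists for one user.
catalog = ["i1", "i2", "i3"]
candidates = {Action.COLLABORATIVE: ["i2"], Action.CONTENT_BASED: []}
chosen = resolve_action(Action.COLLABORATIVE, candidates, catalog,
                        random.Random(0))
```

The agent therefore chooses among five actions regardless of catalog size, instead of choosing among all candidate items.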

7) Reinforcement Learning Engine:
The next task of the system is to choose the most appropriate action for a particular user. In other words, the question that needs to be answered is: "For this user, what is the best action to take? Recommendation of a product that corresponds to the latent structures of the user? An item similar to what the user consumed/liked before, or a complementary item?". The answer can also be that the best action is no action.
The solution that can combine user information, past behavior and interaction in order to choose the best action is an RL engine (Fig. 2, 5). The RL problem is defined as an MDP.
The set of environmental states is represented by a finite number of clusters over the vector space extracted from the characteristics of the environment:
• The socio-demographic characteristics of the user.
• User past behavior and interaction with focus on indicators like diversification, appetite for novelty, previously liked items.
• The details of the period in which the browsing and recommendation take place. This can include time of the day or year, browsing device, browsing session time, etc.
The set of actions represents all possible actions that the recommendation system controls. As defined before, the set of actions is represented by the individual recommendations or combinations of them (Fig. 2, 4).
The advantages of formulating the actions in this way are: 1) the low complexity of the action space, which contains a handful of recommendation strategies instead of all the candidate items, 2) personalization is kept as a key focus, and 3) novelty for the consumer and fairness for the provider are included (through the random product action and the no recommendation action).
The reward function is conditioned by the user's response to the recommendation received (Fig. 2, 6-7), but also by fairness rewards of the system.
The reward value at time t+1, Rt+1, after the agent takes At in St is composed of three terms:
• Reward coming from user response
• Reward coming from fairness value towards consumer
• Reward coming from fairness value towards provider
Next, the individual terms are defined, because they are the key to embedding fairness and the consumer relationship in the system.
User response rewards are linked to the action that the user takes after seeing the recommended item. The following user responses can be distinguished, each with a different associated reward. The associated reward decreases along the list below, reaching a negative value if the user ends the communication with the company:
• User clicks on the recommended product, and buys/consumes it.
• User clicks on the recommended product, but does not buy/consume it.
• User adds the item to a wish list/buy later/favorites list.
• User marks the recommendation as inappropriate.
• User closes the recommendation/searching session without taking any other action.
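The decreasing reward scale over the user responses listed above can be sketched as a simple lookup. The numeric values below are hypothetical placeholders chosen only to respect the stated ordering and the negative value for ending the session; the text does not specify actual magnitudes.

```python
from enum import Enum

class UserResponse(Enum):
    PURCHASED = "clicked_and_bought"
    CLICKED = "clicked_not_bought"
    SAVED = "added_to_wish_list"
    FLAGGED = "marked_inappropriate"
    LEFT = "closed_session"

# Hypothetical reward values, decreasing in the order of the list above.
USER_RESPONSE_REWARD = {
    UserResponse.PURCHASED: 1.0,
    UserResponse.CLICKED: 0.5,
    UserResponse.SAVED: 0.25,
    UserResponse.FLAGGED: -0.5,
    UserResponse.LEFT: -1.0,
}

def user_response_reward(response):
    """Look up the reward associated with the observed user response."""
    return USER_RESPONSE_REWARD[response]

# Rewards in the order in which the responses are listed.
ordered_rewards = [user_response_reward(r) for r in UserResponse]
```

In practice these magnitudes would need to be tuned, since they determine how strongly the agent is discouraged from recommendations that drive users away.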
The consumer fairness reward is linked to a particular recommended item and is known in advance; thus, it is deterministic. It is calculated for each item based on the Disparate Impact formula as defined in [37]. This formula was used because it is easy to adapt and integrate as a reward into the system. The metric also encodes the demographic parity idea in a manner that is workable for RL. The main reason why this metric was chosen out of all those presented previously, however, is its unique advantage of being deterministic and thus known before any action is taken. If the act of recommending a particular item is considered a treatment, the ratio between the likelihoods of a positive outcome (presentation of the item) for different groups should ideally be close to 1.
The Disparate Impact for an item is updated every time the item is recommended, using equation 1, and although it is linked to the item, it actually expresses consumer fairness. Integrating this metric ensures that individuals with different backgrounds are treated the same, and thus receive the same type of item and content recommendations. A good example of a situation that was not fair towards consumers is the study showing that female users of Google had a lower chance of being shown ads for high-paying executive jobs [67].
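The per-item update of the Disparate Impact can be sketched with a small tracker. Equation 1 is not reproduced in this text, so the sketch uses the standard ratio of positive outcome likelihoods and assumes that the likelihood of presentation for a group is the number of times the item was shown to that group divided by the number of recommendation events served to that group. The class name, method names and group labels are hypothetical.

```python
class ItemExposure:
    """Tracks how often one item is shown to each consumer group."""

    def __init__(self):
        self.served = {}  # recommendation events per group
        self.shown = {}   # events in which this item was the one shown

    def record(self, group, item_was_shown):
        """Update the counts after one recommendation event."""
        self.served[group] = self.served.get(group, 0) + 1
        if item_was_shown:
            self.shown[group] = self.shown.get(group, 0) + 1

    def presentation_rate(self, group):
        served = self.served.get(group, 0)
        return self.shown.get(group, 0) / served if served else 0.0

    def disparate_impact(self, unprivileged, privileged):
        """Ratio of presentation rates; None while it is undefined."""
        privileged_rate = self.presentation_rate(privileged)
        if privileged_rate == 0:
            return None
        return self.presentation_rate(unprivileged) / privileged_rate

# Hypothetical history: the item is shown once in four events to group
# "u" and twice in four events to group "p".
exposure = ItemExposure()
for shown in (True, False, False, False):
    exposure.record("u", shown)
for shown in (True, True, False, False):
    exposure.record("p", shown)
item_di = exposure.disparate_impact("u", "p")
```

Because the counts are available before the next recommendation is made, the resulting value is known in advance, which is the deterministic property highlighted above.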
In order to integrate the Disparate Impact into the RL engine, it first needs to be adapted so that it penalizes both positive and negative bias. Most metrics focus on supporting the unprivileged group, but this can reverse the situation and disadvantage the other group. The focus here is therefore on avoiding any type of bias and treating individuals from different groups equally. The consumer reward is calculated as given in equation 2.
For provider fairness, it is desired to ensure that exposure is a function of relevance. For this purpose, the difference in acceptance rates (DAR) [81] is used. It is calculated as the pairwise difference of the ratio of true positives to predicted positives for each class. This ratio is also called Precision in binary classification evaluation [47] and reflects the fraction of relevant cases. This metric was chosen because it is the only one of the presented metrics that can be used in recommendation settings other than ranking.
Incorporating DAR into the RL engine ensures that items are presented proportionally to how relevant they are.
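The DAR computation can be sketched as follows. The acceptance rate of an item is taken to be the number of accepted (for example, clicked) recommendations divided by the number of times the item was recommended. Aggregating the pairwise differences as a sum of absolute values is an assumption made for this sketch, and the function names are illustrative.

```python
from itertools import combinations


def acceptance_rate(accepted: int, recommended: int) -> float:
    """Precision of an item: accepted recommendations over all recommendations."""
    return accepted / recommended if recommended else 0.0


def dar(rates: dict[str, float]) -> float:
    """Sum of absolute pairwise differences in acceptance rates (assumed aggregation)."""
    return sum(abs(rates[a] - rates[b]) for a, b in combinations(rates, 2))


# Toy usage with made-up counts.
rates = {
    "item_a": acceptance_rate(5, 10),
    "item_b": acceptance_rate(1, 4),
    "item_c": acceptance_rate(2, 8),
}
total_dar = dar(rates)
```

A value of 0 means all items are accepted at the same rate relative to their exposure, while larger values indicate that some items are over- or under-exposed compared with their relevance.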
In order to use DAR as a reward, the relevance defined in equation 3 is first calculated for each item. The updated relevance simulates the situation in which an item is recommended but not clicked on. Second, DAR (equation 4) is calculated using the newly updated relevance for the item, while keeping the relevance of all other items unchanged (ceteris paribus). Each item that can potentially be recommended thus has an associated DAR value. Finally, this value is taken with a negative sign (equation 5) in order to account for provider fairness. If there are large differences in relevance between items, they add up to a large negative reward, so the agent will try to choose an item that leads to a smaller DAR. The overall reward function is given by equation 6.
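The three steps above can be sketched in code under explicit assumptions. Here relevance is taken to be clicks divided by impressions, the simulated update adds one impression without a click, and the overall reward is a weighted sum with hypothetical weights. None of these forms is confirmed by the equations referenced in the text, and all function names are illustrative.

```python
from itertools import combinations


def pairwise_dar(relevance: dict[str, float]) -> float:
    """Sum of absolute pairwise differences in relevance (assumed DAR aggregation)."""
    return sum(abs(relevance[a] - relevance[b]) for a, b in combinations(relevance, 2))


def provider_fairness_rewards(clicks: dict[str, int], shown: dict[str, int]) -> dict[str, float]:
    """Negative DAR that would result from recommending each item without a click."""
    current = {i: (clicks[i] / shown[i] if shown[i] else 0.0) for i in shown}
    rewards = {}
    for item in shown:
        simulated = dict(current)  # other items stay unchanged (ceteris paribus)
        # Assumed relevance update: one more impression, no additional click.
        simulated[item] = clicks[item] / (shown[item] + 1)
        rewards[item] = -pairwise_dar(simulated)
    return rewards


def total_reward(user: float, consumer: float, provider: float,
                 weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Hypothetical weighted sum of the three reward components."""
    w_user, w_consumer, w_provider = weights
    return w_user * user + w_consumer * consumer + w_provider * provider


# Toy usage: item "a" has been clicked far more often than item "b".
rewards = provider_fairness_rewards({"a": 8, "b": 1}, {"a": 9, "b": 9})
```

In this toy example the more relevant item receives the less negative provider reward, which matches the goal of making exposure a function of relevance. The weights would control the trade-off between personalization and fairness.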
To solve this problem formulation and extract the optimal policy, one can use Temporal Difference methods [89], as they are appropriate for continuing tasks with discrete state and action spaces, or Deep Q-Learning Networks [90] if one wants to include the reward values in the state.
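As one concrete member of the Temporal Difference family, tabular Q-learning could be used with discrete state ids and a small set of actions. The sketch below is illustrative only. The class name, the hyperparameter values, and the epsilon-greedy exploration scheme are assumptions rather than choices made in the paper.

```python
import random
from collections import defaultdict


class QLearningAgent:
    """Tabular Q-learning over discrete states and a small discrete action set."""

    def __init__(self, n_actions: int, alpha: float = 0.1, gamma: float = 0.9,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor for long-term reward
        self.epsilon = epsilon  # exploration probability
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.rng = random.Random(seed)

    def act(self, state) -> int:
        """Pick a random action with probability epsilon, otherwise the greedy one."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action: int, reward: float, next_state) -> None:
        """One-step TD update toward reward + gamma * max_a Q(next_state, a)."""
        td_target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])


# Toy usage: one observed transition with made-up state ids and reward.
agent = QLearningAgent(n_actions=11, alpha=0.5, epsilon=0.0)
agent.update(state=0, action=3, reward=1.0, next_state=1)
```

Here `reward` would be the combined reward of user response and fairness components, and each action index would stand for one recommendation strategy or combination of strategies.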

IV. RESULTS AND DISCUSSION
In this paper, the design of a Fairness Embedded Adaptive Recommender System (FEARS) was presented. The system makes it possible to build a fruitful relationship with the client by optimizing for long-term goals while maintaining both consumer and provider fairness.
The gaps in current practice were presented in the related work section. The desired functionalities of the system were stated in the proposed approach section, and Table I shows how the requirements were translated into implementation solutions, along with potential issues and limitations that should be addressed in further work.
The system was designed to use holistic user information, including socio-demographics and past buying behavior patterns. This information is integrated into the system by encoding it into the state of the RL MDP model. To the best of the author's knowledge, this is a novel addition to RL-based RecSys. Another important detail is the inclusion of the recommendation context, expressed through browsing-time metadata (e.g., time of the year, hour of the day). Clustering the initial state vectors before they are used in the RL engine helps overcome the most common RL limitations of inefficiency and non-convergence.
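The state discretization described above can be illustrated by assigning each raw state vector to its nearest cluster centroid. In this sketch the centroids are assumed to have been fitted beforehand (for example, with k-means), and the feature layout is made up for illustration.

```python
import math
from typing import Sequence


def nearest_cluster(vector: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to the vector (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(vector, centroids[k]))


# Hypothetical scaled features: [age, past purchases, hour of the day].
centroids = [
    (0.2, 0.1, 0.3),
    (0.7, 0.8, 0.9),
]
state_id = nearest_cluster((0.25, 0.15, 0.35), centroids)
```

The resulting cluster index serves as the discrete state seen by the RL engine, so the number of states equals the number of clusters rather than the number of distinct user vectors.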
Another way to ensure convergence to the optimal policy is to control the size of the action space. This was stated as a clear problem in the reviewed literature. By defining actions as recommendation strategies and combinations of them, the action space is reduced from the number of all items to a maximum of 11 actions.
Although the individual recommendation components are traditional and straightforward, together they cover the whole range of consumer interest: similar, complementary, high-interest, or random products. Combining these strategies through an RL engine makes the system adaptive and allows it to pursue long-term objectives. A detail that emphasizes the health and importance of the customer relationship is the use of negative rewards in the RL component. This means that the system will try to recommend products that users are likely to buy, while also playing safe and avoiding the information fatigue that can lead to termination of the relationship. Another addition is the fair treatment of both consumers and item providers. The conflict between personalization, consumer fairness, and exploration, stated as a clear problem in the reviewed literature, is addressed by introducing not only consumer satisfaction rewards but also fairness-specific rewards.
The conceptual framework presented in this paper can be adapted to a wide range of use cases, from e-commerce to the recommendation of news, articles, and media items. Another set of application areas consists of those in which decisions and recommendations are linked to life-changing, high-stakes situations such as hiring, job recommendation, or financial lending.
The approach tries to overcome the limitations of both individual traditional recommendation systems and RL usage in RecSys through an integrated view of the consumer, a focus on long-term engagement, and strong enforcement of a sustainable and fair recommendation practice.

System requirement / functionality | Implementation | Potential issues / limitations with the solution
System is focusing on customer relationship development | Usage of an RL engine that has the reward function linked to customer relationship goals. | The reward function does not accurately reflect the desired objective.
System is incorporating an adaptivity functionality | Usage of an RL engine that takes complex states into account when recommending an item. | High complexity and a large search space that come with all the additional information.
System is optimizing for long-term customer engagement | RL engine uses negative rewards where appropriate in order to decrease information fatigue and optimize for long-term objectives. | Negative rewards are too small in comparison with positive rewards, which reduces their effectiveness.
System is ensuring consumer and provider fairness | RL engine reward function contains fairness metrics and outputs a higher reward in case of fair recommendations. | N/A
System is using a small action and state space in the RL engine | Action space is represented by individual recommendation strategies or combinations of them. State space is discretized by using clustering techniques. | Oversimplification of the action and state spaces that could lead to pattern loss.
System is solving the personalization and fairness trade-off | Reward function contains both recommendation relevance metrics and fairness metrics. | Inappropriate balance between the two objectives.

Future research should involve implementing the approach and using it in real-world situations to evaluate the degree to which it reaches its multisided objectives. In addition, the different streams of work linked to the potential issues presented in Table I should be carried out.

V. CONCLUSIONS
The major contributions of this paper are as follows:
• A reinforcement learning-based framework, FEARS, was introduced for better recommendations that focus on both revenue and the relationship with the customer.
• The framework has a holistic view of the customer and the recommendation landscape, ensuring a highly personalized, relevant, and positive user interaction.
• Two relevant, adapted fairness metrics are defined and a way to compute them is presented.
• The relevant fairness metrics are embedded into the system as corresponding rewards.
• An RL problem definition was given that overcomes the common inefficiency issue of RL in RecSys by using a limited but relevant action space and a discretized, clustered state space.
Overall, the system has all the necessary levers to overcome the limitations of individual components, resolve the personalization-fairness conflict, ensure long-term customer engagement, and avoid the typical RL issue of inefficiency.
At the same time, the framework should be tested in real-world situations or on simulated data, and appropriate design changes should be made. This is a conceptual starting point for developing FEARS.