The Performance of Personality-based Recommender System for Fashion with Demographic Data-based Personality Prediction

Currently, the most common method for predicting personality implicitly (Implicit Personality Elicitation) is Personality Elicitation from Text (PET), which predicts personality from statuses written on social media. The weakness of this method when applied to a recommender system is that the user must have at least one social media account; a user without one cannot use the system. To overcome this shortcoming, a new method that predicts personality implicitly from demographic data is proposed. The proposal builds on findings by previous researchers that demographic data and personality traits are correlated. Predicting personality from demographic data requires a personality model (a set of rules) that links demographic data to personality. Applying this model to a recommender system requires a second model, a preference model, that links personality to preference. These two models are then applied to a personality-based recommender system for fashion. In the performance evaluation, the precision of the recommendations and user satisfaction with them are 60.19% and 87.50%, respectively. Compared with the precision and user satisfaction of a PET-based recommender system (82% and 79%, respectively), the demographic data-based recommender system has lower precision but higher satisfaction.

Keywords—Implicit personality elicitation; demographic data; personality-based recommender system; personality trait


I. INTRODUCTION
The first method used in recommender systems was content-based filtering, which recommends items based on the similarity between keywords in the item description and in the user's profile [1] [2]. However, content-based filtering has several weaknesses, one of which is its inability to distinguish item quality: a good-quality item can have the same keywords as a bad-quality item [1].
Because of this weakness, a new method, collaborative filtering, came into use. Collaborative filtering addresses the inability of the content-based method to differentiate item quality by asking users to rate the items they have consumed. The rating data are then used to estimate ratings for new items [3]; the method selects the top N items with the highest estimated ratings and recommends them along with those ratings [4]. In practice, rating-based collaborative filtering also has several weaknesses, one of which is the cold start problem, or new user problem [5]. It occurs when a recommender system cannot give a new user accurate recommendations because the user has no record of consumed items and their ratings (the user profile is still empty).
To deal with the cold start problem, a user profile must be available as soon as a new user joins a recommender system. This can be achieved by having new users supply certain data at registration. One kind of data that can be used here is personality traits: users are then given recommendations that match their personality traits. Using personality traits offers three advantages [6]. Personality trait data can be obtained in two ways: explicitly (Explicit Personality Elicitation) and implicitly (Implicit Personality Elicitation). The explicit method requires the user to answer a personality trait questionnaire. The commonly used questionnaires are based on the Big Five, which, as the name implies, consists of five factors/traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism. Many Big Five-based questionnaires are available free of charge, ranging from the longest with 504 questions to the shortest with only 10 questions [7]. If a recommender system uses the explicit method, the user must answer a personality trait questionnaire before becoming a member of the system. Although this method can predict a user's personality traits accurately, it is burdensome and time-consuming for the user; the explicit method is therefore suitable only for laboratory studies [8].
To overcome the weaknesses of explicit personality elicitation, a researcher may use implicit personality elicitation, in which a user's personality traits are predicted indirectly. The current technique is Personality Elicitation from Text (PET). As the name implies, users' personality traits are predicted from the posts they write, in this case on social media [9][10][11][12][13][14]. However, this method has one obvious weakness when applied to a recommender system: the user must have at least one active social media account.
360 | P a g e www.ijacsa.thesai.org
To cope with this shortcoming of PET, a novel method of implicit personality elicitation is proposed in which the user's personality is predicted from demographic data.
To date, demographic data have been applied directly in recommender systems, hence the name Demographic Recommender System. There, classifiers use demographic information about the users to learn correlations between certain demographic data and ratings or buying tendencies [15]. However, there has been no research on using demographic data to predict personality traits. Such research is useful for overcoming the weakness of the text-based implicit personality elicitation method. This is the gap in personality-based recommender system research, specifically in implicit personality elicitation. This paper presents the results of creating a personality model that connects demographic data and personality traits, and of applying that model in a personality-based recommender system.
The idea of using demographic data to predict personality traits comes from previous studies that found relationships between personality traits and demographic data. According to [16], personality can change or remain stable during certain periods of a person's life. With the exception of [17], other researchers such as [18] and [19] found that gender also affects personality traits.
Other demographic data such as race/ethnicity/country, hobbies, sport, occupation, zodiac, blood type, and color are also known to affect personality traits. Reference [20] found that people from different countries have different personality traits. Furthermore, people with different personality traits tend to have different hobbies. For example, a person with a high score in openness is more likely to enjoy abstract things and is therefore likely to be a connoisseur of the arts and other forms of culture. Regarding sport, a person's psychological preference for a certain type of sport is accompanied by a physical preference for certain movements, and researchers believe that different personalities favor different movements.
The potential benefit of this new method of implicit personality elicitation is that it can be applied to any personality-based recommender system, for example in an online shop, a library, or a travel company. With this method, the system can give accurate and satisfying recommendations based on the users' demographic data instead of their rating history.
A summary of research trends in the recommender system and the proposed method is presented in Fig. 1.
The goal of this research is to create personality and preference models that, when applied to a recommender system, yield a system that:
1) Can be used by any user, without requiring a social media account or statuses of a certain length.
2) Achieves fairly high user satisfaction.
To achieve these goals, the following research questions must be answered:
1) Which demographic data, or combination of demographic data, make up the best model?
2) When the model is applied to a recommender system, what are the precision of the recommendations and the satisfaction with the recommended items?
This paper is structured as follows: Section II describes the methods used in processing the data, Section III presents the results and discussion, and Section IV concludes the paper.

II. METHODS
The detailed survey methods have been presented in [25]. A summary follows.
A total of 1014 respondents from several cities in Indonesia were involved in the current study. The questionnaire used in this study consists of three parts: demographic data, personality traits, and preferences.
In this survey, a personality questionnaire based on the Big Five was used. Of the Big Five personality trait questionnaires that are available, the Indonesian IPIP 50 questionnaire [26] was chosen. IPIP 50 differs from the other Big Five-based questionnaires in that it does not use the term neuroticism; in its place, it uses the term emotional stability, which is the opposite of neuroticism. Moreover, IPIP 50 does not use the term openness; it uses the term intellect instead. To assess the personality traits, respondents were asked to score each question from 1 to 5, where 1: strongly disagree, 2: disagree, 3: neutral, 4: agree, and 5: strongly agree.
In this survey, the following demographic data were collected: year of birth, marital status, city of residence, sport, occupation, hobbies, ethnicity, favorite color, zodiac, and blood type.
To learn about preferences, data on respondents' preferences with regard to clothing styles were collected. There are seven main clothing styles: Rebellious, Natural, Feminine, Elegant Chic, Dramatic, Creative, and Classic. This classification of clothing styles was initially intended for female users; hence the style names Feminine and Elegant Chic. For male users, the Feminine style is replaced with its gender-matching counterpart, Masculine, and the Elegant Chic style is simply referred to as Fashionable.
A total of 105 clothing samples matching the seven clothing styles, 15 samples for each style, were provided. The respondents were asked to choose the samples they liked. They could choose every sample if they liked them all, or none if they did not like any.
After the data were collected, initial data processing was performed in three steps: (1) the year of birth was converted into age, (2) the total score for each personality trait was calculated, and (3) the clothing samples selected by each respondent were grouped by clothing style and counted. The number of items selected determines the respondent's level of preference for a particular clothing style: 1 to 5 selected items indicates a weak preference, 6 to 10 a moderate preference, and 11 to 15 a strong preference. After that, the three most preferred clothing styles were determined, i.e. the three clothing styles with the highest number of selected items.
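The counting and thresholding in step (3) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the sample respondent are hypothetical, the "none" label for zero selections is an assumption (the paper names no level for that case), and ties between styles are broken by first appearance, which the paper does not specify.

```python
from collections import Counter


def preference_level(n_selected):
    """Map the number of selected items for one style (0-15) to a preference level."""
    if n_selected == 0:
        return "none"  # assumption: the paper does not name a level for zero selections
    if n_selected <= 5:
        return "weak"
    if n_selected <= 10:
        return "moderate"
    return "strong"


def top_three_styles(selected_styles):
    """Return (style, count, level) for the three styles with the most selected items."""
    counts = Counter(selected_styles)
    return [(style, n, preference_level(n)) for style, n in counts.most_common(3)]


# Hypothetical respondent: the style label of every clothing sample they picked.
picks = ["Natural"] * 12 + ["Classic"] * 7 + ["Creative"] * 3 + ["Dramatic"] * 1
print(top_three_styles(picks))
```

For this made-up respondent, the three retained styles are Natural (strong), Classic (moderate), and Creative (weak).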
During the initial processing, Cronbach's alpha was also calculated to determine the internal consistency of the data, using the following formula:

$$\alpha = \frac{N \bar{c}}{\bar{v} + (N - 1)\,\bar{c}}$$

where α = Cronbach's alpha, N = the number of items, c̄ = the average covariance between item pairs, and v̄ = the average variance.
From the calculation, the following values were obtained: 0.801 for extraversion (good internal consistency), 0.773 for agreeableness (acceptable internal consistency), 0.844 for conscientiousness (good internal consistency), 0.908 for emotional stability (excellent internal consistency), and 0.749 for intellect (acceptable internal consistency).
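As an illustration of how such values can be obtained, the sketch below computes Cronbach's alpha for one trait from the average item variance and the average covariance between item pairs. It is a minimal example under stated assumptions: the function name and the toy scores are hypothetical, sample (n - 1) covariances are used, and any reverse-keyed items are assumed to have been rescored beforehand.

```python
from statistics import mean


def cronbach_alpha(scores):
    """scores: one row per respondent, each row holding the N item scores of one trait."""
    n_items = len(scores[0])
    n_resp = len(scores)
    cols = [[row[j] for row in scores] for j in range(n_items)]
    means = [mean(col) for col in cols]

    def cov(a, b):
        # Sample covariance (divides by n - 1) between items a and b.
        return sum((cols[a][i] - means[a]) * (cols[b][i] - means[b])
                   for i in range(n_resp)) / (n_resp - 1)

    v_bar = mean(cov(j, j) for j in range(n_items))  # average item variance
    c_bar = mean(cov(a, b)                            # average inter-item covariance
                 for a in range(n_items) for b in range(a + 1, n_items))
    return n_items * c_bar / (v_bar + (n_items - 1) * c_bar)


# Toy data: three respondents answering three perfectly consistent items.
consistent = [[1, 1, 1], [3, 3, 3], [5, 5, 5]]
print(cronbach_alpha(consistent))
```

Perfectly consistent items give an alpha of 1, and items with zero covariance give an alpha of 0; real questionnaire data fall in between.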
It should be noted that the questionnaire was built with Google Forms, which does not limit how many times a respondent can fill in the questionnaire. A check was therefore needed to find respondents who had filled in the questionnaire more than once. This check found 25 such respondents, and for each of them one of the duplicate entries was deleted.
In addition to checking for duplicate data, the data were checked for the presence of respondents known as self-enhancers. The presence of self-enhancers is characterized by a high interscale correlation, i.e. the correlation coefficient between attributes. To calculate the interscale correlation, Pearson's correlation coefficient (r) was used:

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where r_xy = the correlation between variables x and y, n = the sample size, x_i and y_i = the i-th sample points, and x̄ and ȳ = the sample means of x and y.
Since there are five traits, 10 correlation coefficients must be calculated, e.g. the correlation between extraversion and agreeableness, between extraversion and conscientiousness, and so on. These coefficients are then averaged to obtain the average interscale correlation.
A high interscale correlation can arise because self-enhancers tend to rate themselves higher than they should when filling in the questionnaire. A self-enhancer therefore shows a high level on all personality traits. To search for self-enhancers, the total personality trait scores of each respondent must be checked: respondents with the maximum score, or close to the maximum score, on all traits are considered self-enhancers. In their study, [19] obtained an interscale correlation of 0.19 and stated that such a value indicated that there were not many self-enhancers in the data. In the present data, the value obtained was 0.38, so it was assumed that the data contained a considerable number of self-enhancers. Checking the data revealed 94 self-enhancers. After their data were deleted, the interscale correlation dropped to 0.24.
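The average interscale correlation and the self-enhancer check can be sketched as below. This is an illustration only: the function names and toy totals are hypothetical, the maximum of 50 assumes ten items per trait scored 1 to 5, and the `margin` that defines "close to the maximum" is an assumed parameter, since the paper does not state the cutoff it used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y))
    return num / den


def average_interscale_correlation(trait_totals):
    """trait_totals: dict mapping trait name -> list of total scores, one per respondent."""
    pairs = list(combinations(trait_totals, 2))  # 10 pairs for five traits
    return sum(pearson(trait_totals[a], trait_totals[b]) for a, b in pairs) / len(pairs)


def is_self_enhancer(totals, max_score=50, margin=5):
    """Flag a respondent whose total is at or near the maximum on every trait.

    margin is a hypothetical cutoff chosen for illustration.
    """
    return all(t >= max_score - margin for t in totals)


# Toy totals for four respondents; every trait rises together, so each pair has r = 1.
totals = {
    "extraversion": [30, 35, 40, 45],
    "agreeableness": [31, 36, 41, 46],
    "conscientiousness": [28, 33, 38, 43],
    "emotional stability": [20, 25, 30, 35],
    "intellect": [25, 30, 35, 40],
}
print(average_interscale_correlation(totals))
print(is_self_enhancer([48, 47, 50, 46, 49]))
```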
While checking the data, entries that did not make sense were also found: they came from respondents who rated themselves with a score of 3 on all questions. These entries were deleted. After the unwanted data were deleted, 894 records remained.
In building the model, the dependent attribute is the level of each personality trait. There are two levels: high and low. The level was obtained as follows: (a) the average score for each trait was calculated, and (b) scores below the average were labeled low and scores above the average were labeled high.
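A minimal sketch of this labeling step is shown below. The function name and sample scores are hypothetical, and treating a score exactly equal to the average as high is an assumption, because the paper does not say how that case was handled.

```python
def trait_levels(scores):
    """Label each respondent's total score on one trait as 'high' or 'low'."""
    avg = sum(scores) / len(scores)
    # Assumption: a score exactly equal to the average is labeled "high".
    return ["high" if s >= avg else "low" for s in scores]


# Hypothetical extraversion totals for four respondents (average is 32.5).
print(trait_levels([20, 30, 35, 45]))
```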
Additionally, the respondents' age group (categorical) was used in the modeling instead of age (numerical). The age data were therefore grouped according to the classification laid down by the Indonesian Ministry of Health [25]. Based on this classification, the respondents fell into three groups: middle age, adulthood, and adolescents.
The last stage in data processing was to remove the attributes that would not be used in the modeling stage. The following attributes were used in the modeling: blood type, occupation, favorite color, gender, hobby, sport, zodiac group, zodiac component, age group, marital status, ethnicity, intellect level, emotional stability level, conscientiousness level, agreeableness level, extraversion level, preferred clothing style 1, preference level 1, preferred clothing style 2, preference level 2, preferred clothing style 3, and preference level 3.

A. Modeling
Building a personality-based recommender system for fashion takes two models: a personality model that links demographic data to personality traits, and a preference model that links personality traits to a person's fashion preferences. Both models were built in this stage.

1) Personality model:
The process of creating the personality model has been presented in detail in [25]. A summary follows.
The attributes used in the personality trait modeling were blood type, occupation, favorite color, gender, hobby, sport, zodiac group, zodiac component, age group, marital status, ethnicity, intellect level, emotional stability level, conscientiousness level, agreeableness level, and extraversion level. The first eleven attributes are demographic data and serve as the independent attributes in the decision tree modeling. In addition to the individual demographic attributes, combinations of two demographic attributes (e.g. blood type-occupation, blood type-age group) were used. Combining two attributes yields 54 combinations, so the total number of demographic inputs used in the modeling is 65. Hence, 65 models were created for each trait. The levels of the personality traits (intellect level, emotional stability level, conscientiousness level, agreeableness level, and extraversion level) were used as the dependent attributes. The models were evaluated using 10-fold cross validation.
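The evaluation procedure can be illustrated with a small, standard-library sketch of k-fold cross validation. Note the substitution: a majority-vote lookup table keyed on the demographic attributes stands in for the decision tree used in the paper, so this shows the evaluation loop rather than the authors' model. All names and the toy records are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_lookup(rows):
    """Learn the majority trait level for each demographic key, plus a global default."""
    votes = defaultdict(Counter)
    for key, level in rows:
        votes[key][level] += 1
    default = Counter(level for _, level in rows).most_common(1)[0][0]
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    return table, default


def k_fold_accuracy(rows, k=10, seed=0):
    """Average accuracy over k folds: train on k - 1 folds, test on the held-out fold."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    correct = 0
    for i, test_fold in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        table, default = fit_lookup(train)
        correct += sum(table.get(key, default) == level for key, level in test_fold)
    return correct / len(rows)


# Toy records: ((age group, gender), intellect level). Here the key determines the level.
records = [(("adulthood", "female"), "high")] * 20 + [(("adulthood", "male"), "low")] * 20
print(k_fold_accuracy(records))
```

On real survey data the accuracy would be well below the perfect score this toy set produces, and comparing it across the 65 candidate inputs is what allows one model to be preferred over another.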
Only one model is used in the later stages. It was selected using two criteria: (1) so that the model can be used by anyone repeatedly at a later date, the demographic data in the model must never change, and (2) the model must be fairly accurate. Based on these criteria, the model based on age group and gender was chosen [25]. Another reason for choosing this model is that previous research found that age and gender are very closely related to personality traits [19]. Table I presents the model.

2) Preference model:
As with the personality model, the detailed process of building the preference model has been presented in [27]. A summary follows.
In the data processing stage, three preferences and their levels (preferred clothing style 1, preference level 1, preferred clothing style 2, preference level 2, preferred clothing style 3, and preference level 3) were selected for each respondent. Before building the model, however, preference data for each clothing style had to be created by combining each respondent's three preferences into one. Table II shows an example of the preference data for the Natural clothing style used in the modeling. Preference data for the other clothing styles, i.e. Dramatic, Classic, Elegant Chic, Creative, Rebellious, and Feminine, were created in the same way.
The data used in building the preference model comprise the levels of the five personality traits and the preferences. The class association rule method was used, with the personality trait levels as antecedents and the preference as the consequent, or class.
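Class association rules of this kind are usually ranked by support and confidence. The sketch below computes both for a rule whose antecedent is a set of personality trait levels and whose consequent is a clothing style. The item encoding (strings such as "intellect=high") and the four toy transactions are hypothetical and are not taken from the survey data.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | {consequent}
    return support(transactions, both) / support(transactions, antecedent)


# Toy transactions: each respondent is a set of trait levels plus one preferred style.
transactions = [
    {"intellect=high", "stability=low", "style=Natural"},
    {"intellect=high", "stability=low", "style=Natural"},
    {"intellect=high", "stability=high", "style=Classic"},
    {"intellect=low", "stability=low", "style=Dramatic"},
]
rule_if = {"intellect=high"}
print(support(transactions, rule_if | {"style=Natural"}))
print(confidence(transactions, rule_if, "style=Natural"))
```

In this toy set the rule "intellect=high → style=Natural" holds in two of four transactions (support 0.5) and in two of the three transactions that satisfy the antecedent (confidence 2/3).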
An association rule is a rule that connects items in a transaction. At least two measures are used to identify good rules: support and confidence. If N is the number of transactions, the support of items X and Y is defined as:

$$\mathrm{support}(X, Y) = \frac{\text{number of transactions containing both } X \text{ and } Y}{N}$$

Meanwhile, the confidence of the rule (X → Y) is defined as:

$$\mathrm{confidence}(X \rightarrow Y) = \frac{\mathrm{support}(X, Y)}{\mathrm{support}(X)}$$

where support(X) is the fraction of the N transactions that contain X.
The personality trait model shown in Table I reveals that the levels of extraversion, agreeableness, and conscientiousness are the same for all groups: low for extraversion, and high for agreeableness and conscientiousness. Therefore, only data with low extraversion, high agreeableness, and high conscientiousness were used in the preference modeling. The modeling also used only the data with moderate and strong preference levels. Since it is not possible to recommend men's clothing to women and vice versa, the data for men and women were separated.
For each personality group, the two clothing styles with the highest confidence values were chosen as the preferred clothing styles (Table III). With the two preferred clothing styles selected, the preference model can be created; it is presented in Table IV.

B. System Performance

1) System architecture:
The author in [7] developed a personality-based recommender system in which personality traits were predicted using a questionnaire (explicit method). The system consists of two parts: a part for predicting the personality traits and a part for finding the nearest neighbors. Meanwhile, [28] used personality elicitation from text (implicit method) to predict personality traits in their system. The system they propose likewise consists of two parts: a part for predicting the personality traits and a part for finding the nearest neighbors.
In line with the two studies above, the recommender system built here also consists of two parts: a part for predicting the personality traits and a part for finding the nearest neighbors (Fig. 2).
Before discussing the model, it is necessary to explain what is meant by neighbors: neighbors are the respondents' data collected during the data collection stage and modeled at the modeling stage. When filling in the questionnaire, the respondents were asked to pick the items they liked; these data were therefore treated as data about users who had consumed certain items. They were stored in the user database as the basis for providing recommendations.
The system includes the personality model (Model 1) and the preference model (Model 2) obtained in the previous stage. As mentioned before, the personality model contains rules that link demographic data to personality traits, while the preference model contains rules that link personality traits to preferences.
The system starts working when a new user enters the year of birth and gender into the system. Using these data, the system will classify users into certain personality traits (predicting the user's personality traits) based on Model 1 (personality model). These personality traits are passed along to Model 2 (preference model) that will classify user's preferences based on the user's personality traits (predicting user's preferences).
After generating the user's preferences, the system searches the user database for all neighbors whose year of birth, gender, and preferences are the same as the user's. The result is a list of neighbors that match the user on these three attributes.
By using the filtered list of neighbors, the system will search for the nearest neighbor. It should be explained that in this study, all users are using the model for the first time. Therefore, the user has never consumed any items. Because of that, at this stage, the system looks for the nearest neighbor based only on the same year of birth. If there are several neighbors with the same year of birth, then the system will pick one neighbor.
After the nearest neighbor is obtained, the system will collect all the items that have been consumed by the nearest neighbor. These are the items that will be recommended to the users. After consuming some or all of the recommended items, the new user data will be saved to the user database.
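The flow described above, from year of birth and gender to a list of recommended items, can be sketched as follows. Everything concrete in this example is a placeholder: the rule tables do not reproduce Tables I and IV, the age cutoffs are not the Indonesian Ministry of Health classification, the neighbor records and item codes are invented, and taking the first matching neighbor is an assumption, since the paper says only that one neighbor is picked.

```python
# Model 1 (hypothetical rules): (age group, gender) -> (intellect, emotional stability).
PERSONALITY_MODEL = {
    ("adulthood", "female"): ("high", "low"),
    ("adulthood", "male"): ("low", "high"),
}
# Model 2 (hypothetical rules): (gender, intellect, stability) -> two preferred styles.
PREFERENCE_MODEL = {
    ("female", "high", "low"): ("Natural", "Classic"),
    ("male", "low", "high"): ("Classic", "Creative"),
}
# Hypothetical user database of neighbors and the items they liked.
NEIGHBORS = [
    {"year": 1990, "gender": "female", "styles": ("Natural", "Classic"), "items": ["N03", "C11"]},
    {"year": 1990, "gender": "male", "styles": ("Classic", "Creative"), "items": ["C02"]},
    {"year": 1985, "gender": "female", "styles": ("Natural", "Classic"), "items": ["N07"]},
]


def age_group(year_of_birth, current_year=2020):
    """Placeholder cutoffs, used only to make the example runnable."""
    age = current_year - year_of_birth
    if age < 26:
        return "adolescents"
    if age < 46:
        return "adulthood"
    return "middle age"


def recommend(year_of_birth, gender):
    """Predict traits, then preferences, then return the nearest neighbor's items."""
    traits = PERSONALITY_MODEL.get((age_group(year_of_birth), gender))
    if traits is None:
        return []  # no rule for this group in the placeholder model
    styles = PREFERENCE_MODEL.get((gender, *traits))
    matches = [n for n in NEIGHBORS
               if n["year"] == year_of_birth and n["gender"] == gender and n["styles"] == styles]
    # Assumption: when several neighbors match, the first one is used.
    return matches[0]["items"] if matches else []


print(recommend(1990, "female"))
```

Because the new user has no consumption history, the only personal inputs are year of birth and gender; everything else is derived from the two models and the stored neighbors.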
2) System evaluation:
A number of researchers have used a direct method to evaluate the systems they built. A survey conducted by [29] involved 21 students as respondents. In the survey, the respondents were asked about the novelty of the recommended items, the accuracy, and their satisfaction with the recommendations given. In another study, [30] also carried out a direct user survey to find out the respondents' level of satisfaction with the model built.
To evaluate the performance of the proposed recommender system, a direct survey involving 74 respondents was conducted. In the survey, the respondents were asked to interact with the system.
• Relevance of the recommended items. In this experiment, the system recommends a number of items to the user and provides a Like button on every recommended item. The user checks the recommended items one by one; pressing the button on an item marks it as relevant. The precision can then be calculated as the percentage of relevant items among all the recommended items:

$$\text{Precision} = \frac{\text{number of relevant recommended items}}{\text{number of recommended items}} \times 100\%$$

Each user has one precision value, and the system's precision is the average of the precision values of all users.
• Satisfaction with the recommended items. To measure user satisfaction, the CSAT method is used. With this method, satisfaction is gauged by asking users how satisfied they are with the goods or services they used. Satisfaction is assessed on a five-point scale: very dissatisfied, dissatisfied, neutral, satisfied, and very satisfied. The percentage of satisfaction is calculated using the following formula:

$$\text{CSAT} = \frac{\text{number of Satisfied and Very Satisfied responses}}{\text{total number of responses}} \times 100\%$$

Note that only the Satisfied and Very Satisfied responses are counted in the numerator.
In the study, the system will ask the question: "How do you rate the recommended items?" immediately after the respondent finished checking the recommended items. The respondents were supplied with five answers as follows: very dissatisfied, dissatisfied, neutral, satisfied, and very satisfied. The percentage of user satisfaction is only calculated from the responses that give the value of Satisfied and Very Satisfied.
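Both evaluation measures reduce to simple ratios, as the following sketch shows. The function names and the sample inputs are hypothetical; the data are made up for illustration and are not the survey results.

```python
def user_precision(liked_flags):
    """Percentage of recommended items one user marked with the Like button."""
    return 100.0 * sum(liked_flags) / len(liked_flags)


def system_precision(all_users_flags):
    """Average of the per-user precision values."""
    return sum(user_precision(flags) for flags in all_users_flags) / len(all_users_flags)


def csat(responses):
    """Percentage of responses that are 'satisfied' or 'very satisfied'."""
    satisfied = sum(r in ("satisfied", "very satisfied") for r in responses)
    return 100.0 * satisfied / len(responses)


# Made-up data: two users with four recommended items each, and four survey answers.
likes = [[True, False, True, True], [False, False, True, False]]
answers = ["very satisfied", "satisfied", "neutral", "dissatisfied"]
print(system_precision(likes))
print(csat(answers))
```

In this toy data the two users have precisions of 75% and 25%, giving a system precision of 50%, and two of the four answers count toward a CSAT of 50%.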

C. Evaluation Result
From the evaluation, the following facts were obtained:
1) The precision of the recommendations was 60.19%.
2) Satisfaction with the recommended items was 87.50%.
3) There were 17 respondents whose precision was below 50% but who were nevertheless satisfied or even very satisfied with the recommendations.
4) There were 5 respondents whose precision was above 50% but who were not satisfied (neutral) with the recommendations.
These facts show that a respondent with low precision can still be satisfied with the recommendations and, conversely, that a respondent with high precision can be dissatisfied. It is therefore hypothesized that satisfaction correlates with the level of preference rather than with precision. That satisfaction does not always correlate with accuracy has been confirmed by [31] and [32]. Note that precision is one of the accuracy metrics used in recommender systems, alongside recall, MAE, and MSE.
User satisfaction is a psychological condition that can be measured against the user's expectations. A user is satisfied if the products or services offered exceed or at least meet expectations, and dissatisfied if the experience of using them falls below expectations [33]. On this basis, the 17 respondents with low precision were still satisfied or very satisfied because they liked the recommended items very much (high preference level), even though they liked only a few of the many items recommended; in other words, their expectations were fulfilled. The 5 respondents with high precision were dissatisfied because they did not particularly like the recommended items (medium preference level); although they liked many items, their expectations were not met.
The author in [29] reported that the precision and satisfaction of a PET-based system were 82% and 79%, respectively. Compared with the demographic data-based system, the PET-based system is better in precision, whereas the demographic data-based system is better in satisfaction. In a recommender system, accuracy is important, but accuracy alone is not enough, because user satisfaction is more important, as explained below.
According to [31], accuracy metrics in recommender systems serve two main tasks:
1) To measure the accuracy of a single prediction. These are called predictive accuracy metrics. They calculate how close the predicted rating is to the actual rating; mean absolute error (MAE) and mean squared error (MSE) are used for this purpose.
2) To evaluate the effectiveness of the system in selecting high-quality items from the set of available items. These are called decision-support metrics and use precision and recall to express accuracy.
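The two families of metrics listed above can be illustrated with a short sketch using their standard definitions. The function names and sample values are hypothetical and serve only to show the arithmetic; here precision and recall are returned as fractions rather than percentages.

```python
def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mse(predicted, actual):
    """Mean squared error between predicted and actual ratings."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def precision_recall(recommended, relevant):
    """Decision-support metrics for one recommendation list."""
    hits = len(set(recommended) & set(relevant))
    return hits / len(recommended), hits / len(relevant)


# Made-up ratings and item identifiers.
print(mae([4, 3, 5], [5, 3, 3]))
print(mse([4, 3, 5], [5, 3, 3]))
print(precision_recall(["a", "b", "c", "d"], ["a", "c", "e"]))
```

Predictive accuracy metrics need numeric ratings, whereas decision-support metrics need only to know which items are relevant, which is why the evaluation in this study, based on a Like button, reports precision.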
Furthermore, [31] stated that building a recommender system with high accuracy is not enough, because the most accurate recommendation according to the metrics above is sometimes not useful to, or liked by, the users, which causes dissatisfaction. In other words, user satisfaction does not always correlate with high accuracy [32] [31]. Recognizing the importance of user satisfaction, [31] and [34] stated that it is not fair to judge a recommender solely by its accuracy. The users must be taken into account, since they do not care about the algorithms used to increase recommendation accuracy; they only want useful recommendations.
Returning to the performance comparison, the demographic data-based system has lower accuracy but higher satisfaction than the PET-based one. Based on the explanation above, the recommender system with the higher satisfaction is the better one. In other words, the demographic data-based system is better than the PET-based system.

IV. CONCLUSION
Research has been carried out to find a model connecting demographic data and personality traits. For each trait, 65 models were created. From these models, the one based on age group and gender was selected as the working model because it satisfied the two selection criteria. In addition, previous research also found that age and gender are very closely related to personality traits.
From the performance evaluation, the precision and satisfaction of the demographic data-based recommender system were 60.19% and 87.50%, respectively. Compared with the PET-based system, the demographic data-based system is lower in precision but higher in satisfaction. Another advantage of the demographic data-based system over the PET-based system is that it does not require a social media account.
Despite the strengths of the demographic data-based system compared with the PET-based system, this research has some limitations:
1) The fashion classification used may differ from other classifications, so an item of clothing assigned to one clothing style here may be assigned to a different style by others.
2) Some of the respondents may come from low-income communities and may not care about fashion, so their preferred clothing differs from that of other respondents in the same category (e.g. age group-gender).
To test the hypothesis about the relationship between satisfaction and level of preference, another experiment is needed in which the respondents are asked not only whether they like an item but also how much they like it. One way to obtain the level of preference is to provide a five-star rating on each item: the more stars a respondent gives to an item, the higher the respondent's level of preference for that item.