Using Social Signal of Hesitation in Multimedia Content Retrieval: Graphical Analysis of Selection Traces in the Matrix-factorization Space of Multimedia Items

This paper presents a graphical analysis of selection traces in the matrix-factorization space of multimedia items. A trace consists of links (lines) between points that represent the items selected during the interaction between a user and a video-on-demand (VoD) system. Users used gestures to select from among videos on a screen (VoD service), while additional user-produced social signal (SS) information was used to recommend more suitable new videos during the selection process. We used a sample of 42 users, split equally (14 per group) into a test group (SS considered) and two groups for which the SS was not considered (control and random). We assumed that, for each user, there are areas in the matrix-factorization space that contain the user's preferred items, called preferred areas. The results showed that user selection traces in the space of multimedia items (matrix-factorization space) covered the user's preferred areas better when the SS of hesitation was considered.
Keywords: Human-computer Interaction; Social Signals; Hesitation; Matrix Factorization; Video-on-Demand; Graphical Analysis


I. INTRODUCTION
State-of-the-art research in human-computer interaction (HCI) ignores the user's social behaviour; therefore, user interaction with the system is still not a completely user-friendly experience. Social signal processing [1,2,3,4] is a research domain that aims to understand social interactions through machine analysis of nonverbal behaviour [4]. Social signals (SSs) are initiated by the human body and represent reactions to current social situations. They are expressed through nonverbal behavioural cues (e.g., gestures, postures and facial expressions).
One example of how SSs can be used in HCI is a manual VoD system with a conversational recommender system (RS), in which the user selected one video clip from among several presented on the screen [5]. The system adjusted the list of video items to be recommended according to the extracted SS class {hesitation, no hesitation}. The SS of hesitation was used because it is commonly manifested when a user is faced with a variety of choices. The results of this study showed a significant difference in user satisfaction with the system between the group for which the SS was considered and the group for which it was not [5].
In this paper, we present the results of a graphical analysis of selection traces in the matrix-factorization (MF) space of multimedia items. At each step, the user selected one video on the screen (one point in the MF space), and a line links two consecutively selected videos (points). In this way, we obtained a selection trace for every interaction. The graphical analysis was based on two assumptions: (i) the MF space of multimedia items is the best possible layout of multimedia items for all users, and (ii) for each user, there are areas in the MF space that contain the user's preferred items, called preferred areas. We compared traces between the group for which the SS was considered (test group, 14 users) and the groups for which the SS was not considered (control group, 14 users; random group, 14 users). The results indicate that using the SS of hesitation in our VoD system provides better coverage of the user's preferred areas of multimedia items in the MF space, resulting in better user satisfaction with the system. The remainder of this paper is organized as follows.
Section II describes the experimental design, the experimental user scenario and selected aspects of the experimental design in more detail. Section III describes the evaluation methods, and the evaluation results are presented in Section IV. A discussion of the results is provided in Section V, and Section VI concludes the study.

II. EXPERIMENTAL DESIGN
We modelled an independent-measures experimental design and an associated experimental user scenario for evaluating SSs in HCI, using an example in which users make gestures to select from among videos on a screen (VoD service); the experimental design and user scenario are briefly described in [5]. Our experimental design allows us to control both the effect of the SS expressed by the user during the interaction and other possible causes of differences in quality of experience (QoE) among tested users, so that the contribution of the SS to the QoE can be estimated reliably. The design also allows a fair comparison among the test, control and random groups. A human operator provided a baseline for real-time action recognition and SS extraction. We used a human operator mainly to avoid introducing a new uncontrolled parameter into our design, since even state-of-the-art automatic gesture-recognition algorithms still make errors. The human operator observed the user via a camera and reported his/her decisions through a human-operator interface.
www.ijacsa.thesai.org
The experimental user scenario was a manual VoD system with a conversational RS, in which the user selected one video clip from among several presented on a screen (television) through a VoD user interface. The system adjusted the list of video items to be recommended (RS) according to the extracted SS class {hesitation, no hesitation} and the selected item. All scenario descriptions below refer to the test user group. If the user is not hesitating, the system displays three similar items in addition to the selected one. If the user is hesitating, the system displays four items that are diverse with respect to the items on the current screen. The new items are projected on screen with sound feedback, which indicates how the system recognized the user's SS. The user repeats the selection process until he/she finds the item he/she wants to watch. When the user indicates with a gesture that the final decision has been made (i.e., the user selects the item he/she wants to watch), the system expands the selected item (video) to the whole screen and turns on its sound. The user watches the selected item for about 20 seconds. To detect whether the user was hesitating, we used hand movements, eye behaviour and the time between two selections. Before and after interacting with the system, the user fills in pre- and post-interaction questionnaires.
The scenario for the control and random user groups is almost the same as for the test group; the only difference is how the system presents new items to the user. In the control group, the system provides three similar items related to the initially selected item (the decision of the system is based only on the gestures used for video selection). In the random group, the system randomly provides either similar or diverse items related to the initially selected item. In this way, we ensure that any difference between the test and control groups, or between the test and random groups, is not merely a consequence of using different selection functions.
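The group-specific rules above can be summarised as a small decision function. This is a hypothetical sketch: the names `Group` and `choose_selection_function` are ours and are not part of the authors' system; it only shows which selection function each group would apply after a selection.

```python
import random
from enum import Enum


class Group(Enum):
    TEST = "test"        # SS of hesitation considered
    CONTROL = "control"  # SS ignored: similar items only
    RANDOM = "random"    # SS ignored: similar or diverse items at random


def choose_selection_function(group: Group, hesitating: bool,
                              rng: random.Random) -> str:
    """Return "similar" or "diverse" for the next screen (hypothetical helper)."""
    if group is Group.TEST:
        # Test group: the extracted SS class decides the function.
        return "diverse" if hesitating else "similar"
    if group is Group.CONTROL:
        # Control group: the SS is ignored; the search is always narrowed.
        return "similar"
    # Random group: the SS is ignored; either function is picked at random.
    return rng.choice(["similar", "diverse"])


rng = random.Random(42)
print(choose_selection_function(Group.TEST, hesitating=True, rng=rng))     # diverse
print(choose_selection_function(Group.CONTROL, hesitating=True, rng=rng))  # similar
```

Passing an explicit `random.Random` instance keeps the random group's behaviour reproducible in a simulation, which is a design choice of this sketch rather than something described in the paper.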
The selection of the most significant behavioural cues that describe the SS class {hesitation, no hesitation} was based on the methodology presented in [5]. We obtained the best results by combining four features (three behavioural cues and one automatically measured feature, time) with logistic regression as the classification algorithm. These four features are: (a) the user watching one video clip for a longer time and then selecting it, (b) the user making a quick gesture when selecting a video clip, (c) the user watching all video clips, but none of them for a longer time, and (d) the time between two selections. The proposed model was then used to design a human-operator interface through which the human operator reported his/her decisions about the extracted SS class and the recognized selection gesture.
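A minimal sketch of a logistic-regression classifier over the four features (a) to (d), assuming cues (a) to (c) are encoded as binary indicators and (d) as seconds between selections. The encoding, the gradient-descent training loop, the hyperparameters and the toy observations are all illustrative assumptions; they are not the model or data from [5].

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                            lr: float = 0.1, epochs: int = 5000) -> List[float]:
    """Batch gradient descent on the mean log-loss; returns [bias, w_a, w_b, w_c, w_d]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_hesitation(w: Sequence[float], x: Sequence[float]) -> bool:
    """True if the estimated P(hesitation | x) is at least 0.5."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) >= 0.5


# Hypothetical encoding: [a long look then select, b quick gesture,
#                         c scanned all clips briefly, d seconds between selections]
# Toy, made-up observations (NOT data from the study); label 1 = hesitation.
X = [[1, 1, 0, 1.0], [0, 1, 0, 1.5], [1, 0, 0, 2.0],
     [0, 0, 1, 6.0], [0, 0, 1, 8.0], [0, 0, 1, 5.0]]
y = [0, 0, 0, 1, 1, 1]
w = fit_logistic_regression(X, y)
```

In the experiment itself the human operator, not a trained model, reported the SS class in real time; a classifier like this only illustrates how the four cues can be combined into one decision.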

A. Selected Aspects in Experimental Design
This paper focuses on the graphical analysis of selection traces in the MF space of multimedia items; therefore, we briefly describe the conversational RS, the MF space of videos and the video selection functions in the subsections below.

1) Conversational Recommender System and Video Database
A conversational RS with no prior knowledge about the user was used. The functions getInitialItems(), getSimilarItems() and getDiverseItems() (see the subsection below) were based on selected videos from the LDOS-CoMoDa research dataset [7] and MF-based recommender algorithms [8]. We did not use all of the videos from the LDOS-CoMoDa dataset; our subset contained over 300 videos (movie trailers). All videos had the same display resolution (632 × 274 pixels) and the same multimedia format, and the minimum video length was 60 s. The distance between movies was computed in the two-dimensional space generated by the first two factors of the MF algorithm presented in our previous work [9] and briefly described below.

2) Matrix-factorization Space of Videos
Input data for the RS were presented as a sparse two-dimensional matrix $r_{ui}$, where the first dimension represents users and the second represents items. The entries of the matrix are item ratings, specifically explicit user feedback taken from the LDOS-CoMoDa research dataset, which contains more than 3600 ratings given by 150 users. The goal of the MF method is to explain the ratings in the $r_{ui}$ matrix by characterizing both items and users with factors inferred from the rating patterns. MF models map both users and items to a joint latent factor space of dimensionality $f$, such that user-item interactions are modelled as inner products in that space. Each item $i$ is associated with a vector $q_i \in \mathbb{R}^f$, and each user $u$ is associated with a vector $p_u \in \mathbb{R}^f$, so that the predicted rating is [8]

$$\hat{r}_{ui} = q_i^{\top} p_u \qquad (1)$$

The main challenge is computing the mapping of each item and user to the factor vectors $q_i, p_u \in \mathbb{R}^f$. In our case, the stochastic gradient descent approach [8,10,11] was used. We computed the factor space in two dimensions ($f = 2$), so each multimedia item is represented as a point in the two-dimensional MF space (Fig. 1).
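The stochastic gradient descent step can be sketched as follows, assuming the plain inner-product model (no user or item bias terms) and illustrative values for the learning rate, regularisation and number of epochs; these are not the settings used in [9]. Each observed rating nudges the item vector and the user vector towards reducing the prediction error, and with `f = 2` every item ends up as a point in a two-dimensional MF space.

```python
import math
import random
from typing import Dict, List, Tuple

Rating = Tuple[int, int, float]  # (user_id, item_id, rating)


def factorize(ratings: List[Rating], f: int = 2, lr: float = 0.01,
              reg: float = 0.02, epochs: int = 200, seed: int = 0):
    """Plain SGD for r_ui ~ q_i . p_u (no bias terms); returns (P, Q) dicts of f-vectors."""
    rng = random.Random(seed)
    P: Dict[int, List[float]] = {u: [rng.gauss(0.0, 0.1) for _ in range(f)]
                                 for u in sorted({u for u, _, _ in ratings})}
    Q: Dict[int, List[float]] = {i: [rng.gauss(0.0, 0.1) for _ in range(f)]
                                 for i in sorted({i for _, i, _ in ratings})}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(qi, pu))  # e_ui = r_ui - q_i . p_u
            for k in range(f):
                qk, pk = qi[k], pu[k]
                qi[k] += lr * (err * pk - reg * qk)
                pu[k] += lr * (err * qk - reg * pk)
    return P, Q


def rmse(ratings: List[Rating], P, Q) -> float:
    se = sum((r - sum(a * b for a, b in zip(Q[i], P[u]))) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


# Tiny made-up rating triples on a 1-5 scale (not LDOS-CoMoDa data).
ratings = [(0, 0, 5), (0, 1, 4), (0, 2, 1), (1, 0, 4), (1, 2, 2), (1, 3, 1),
           (2, 1, 5), (2, 2, 1), (2, 3, 2), (3, 0, 1), (3, 3, 5)]
P, Q = factorize(ratings)
item_points = {i: tuple(q) for i, q in Q.items()}  # each item is a point in the 2-D MF space
```

The update rule follows the standard SGD formulation for MF recommenders; a production model would normally add bias terms and tune the hyperparameters on held-out ratings.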

3) Video Selection Functions
In our testing scenario (see Section II), videos were provided to the user according to the SS produced by the user. The VoD system simulates an event in a video rental store or at home: the user wishes to get a video but is not sure which one. The support person provides the user with four videos (items), and the user expresses an opinion. If the user hesitates when selecting one item, four completely new items are provided. If the user does not hesitate when selecting one item, the selected item remains and three similar items are added. The selection procedure is repeated until a final selection is made. Therefore, we need three video selection functions provided by the conversational RS:

$$\{h_A, h_B, h_C, h_D\} = \mathrm{getInitialItems}() \qquad (2)$$

$$\{h_S, h_A, h_B, h_C\} = \mathrm{getSimilarItems}(h_S, h_1, h_2, h_3) \qquad (3)$$

$$\{h_A, h_B, h_C, h_D\} = \mathrm{getDiverseItems}(h_1, h_2, h_3, h_4) \qquad (4)$$
Function getInitialItems ((2), Algorithm 1) provides four videos for the first screen, where the videos cover the whole MF space.
Function getSimilarItems ((3), Algorithm 2) provides four videos that are similar to hS (the selected video); one of them is hS. This narrows the search area.
Function getDiverseItems ((4), Algorithm 3) provides four videos that are not similar to h1, h2, h3 and h4, which expands the search area. The function should diversely cover all of the factorized video space except the areas covered by h1, h2, h3 and h4. The distance metric measuring similarity among movies is based on the MF space.
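Algorithms 1 to 3 are not reproduced in full here, so the following is only one plausible sketch of the three functions in a two-dimensional MF space. It assumes Euclidean distance, nearest-neighbour search for getSimilarItems, and greedy farthest-first selection for getInitialItems and getDiverseItems. The snake_case names, the `excluded` set (standing in for already shown or played videos) and the seeded random start are hypothetical.

```python
import math
import random
from typing import Dict, Iterable, List, Set, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance in the 2-D MF space (assumed metric)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def _farthest_first(space: Dict[int, Point], anchors: List[int],
                    candidates: Iterable[int], n: int) -> List[int]:
    """Greedily add the candidate whose nearest anchor/already-chosen item is farthest away."""
    pool = set(candidates)
    chosen: List[int] = []
    while len(chosen) < n and pool:
        ref = anchors + chosen
        # max() keeps the first maximum, so ties go to the smallest item ID.
        best = max(sorted(pool),
                   key=lambda i: min(dist(space[i], space[r]) for r in ref))
        chosen.append(best)
        pool.remove(best)
    return chosen


def get_initial_items(space: Dict[int, Point], rng: random.Random,
                      n: int = 4) -> List[int]:
    """Spread n items over the whole MF space, starting from a random item."""
    first = rng.choice(sorted(space))
    return [first] + _farthest_first(space, [first],
                                     (i for i in space if i != first), n - 1)


def get_similar_items(space: Dict[int, Point], selected: int,
                      excluded: Set[int], n: int = 4) -> List[int]:
    """Keep the selected item and add its n-1 nearest neighbours (narrows the search)."""
    candidates = [i for i in space if i != selected and i not in excluded]
    nearest = sorted(candidates, key=lambda i: (dist(space[i], space[selected]), i))
    return [selected] + nearest[: n - 1]


def get_diverse_items(space: Dict[int, Point], on_screen: List[int],
                      excluded: Set[int], n: int = 4) -> List[int]:
    """Pick n items far from the current screen and from each other (expands the search)."""
    candidates = (i for i in space if i not in on_screen and i not in excluded)
    return _farthest_first(space, list(on_screen), candidates, n)
```

Farthest-first selection is a common heuristic for covering a space with few points; whether the authors' Algorithm 3 behaves identically is not confirmed by the text.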

III. METHODOLOGY
The graphical analysis was based on a comparison of the selection traces in the MF space (see Sec. II.A) obtained by users in all three groups. Each interaction can be represented in the two-dimensional MF space as a trace. A trace consists of links (lines) between points that represent the items selected during the interaction (Fig. 2), with a line linking two consecutively selected items. If a line is coloured red, the selection function recommends items in the next step that are similar to the item selected in the current step (see Sec. II). If a line is coloured blue, the selection function recommends diverse items (see Sec. II). The green circle represents the starting point (the first selected video), while the red circle represents the last selected video (the final selection). From these selection traces, we determined the effect of the coverage of the MF space on the user's QoE.
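The trace construction described above can be expressed as a small data transformation. This is a hypothetical sketch: the `Trace` type, `build_trace` and the string encoding of the applied functions ("S" for similar, "D" for diverse) are our own. It produces the coloured segments and the start and end points that any plotting library could then draw.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]
COLOURS = {"S": "red", "D": "blue"}  # S = similar function, D = diverse function


@dataclass
class Trace:
    start: Point                              # green circle: first selected video
    end: Point                                # red circle: final selection
    segments: List[Tuple[Point, Point, str]]  # (from, to, colour)


def build_trace(space: Dict[int, Point], selections: List[int],
                functions: str) -> Trace:
    """functions[k] is the selection function applied after selections[k] ("S" or "D")."""
    if len(functions) != len(selections) - 1:
        raise ValueError("need exactly one function label per transition")
    segments = [(space[a], space[b], COLOURS[f])
                for a, b, f in zip(selections, selections[1:], functions)]
    return Trace(space[selections[0]], space[selections[-1]], segments)


space = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (3.0, 2.0)}
trace = build_trace(space, [1, 2, 3], "SD")
print([colour for _, _, colour in trace.segments])  # ['red', 'blue']
```

The colour of each segment is taken from the function applied after the segment's first point, matching the rule that a red line means similar items were recommended in the next step.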
To explain the selection traces in the MF space, we first need to present the methodology used to evaluate the effect of an SS on the QoE. This methodology is briefly described in [5]; here we highlight only its most important parts.
The evaluation was based on pre- and post-interaction questionnaires. The pre-interaction questionnaire comprised 16 statements rated on a seven-point Likert scale [12] (from completely disagree to completely agree) and one question with only five possible replies. The aspects considered were user knowledge about video contents, user trust propensity, persistence of user choice, user affection towards new technologies and possible user pattern preferences. Psychometric characteristics such as reliability (Cronbach's alpha [13,14]) and validity (average variance extracted [15]) were measured for most aspects. The post-interaction questionnaire comprised 25 statements and questions rated on a seven-point Likert scale [12], except for demographics, for which data were collected in various ways. The questionnaire covered user satisfaction with the system, the system usability scale, past experiences with similar systems, the user selection time, user confidence in the accuracy of communication performance, user satisfaction with interpreted SSs, user satisfaction with recommended videos, user opinion about task complexity, and personal and demographic information. Psychometric characteristics were measured for most aspects.
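For reference, Cronbach's alpha for an aspect measured by k questionnaire items is α = k/(k − 1) · (1 − Σ σ²ᵢ / σ²ₓ), where σ²ᵢ is the variance of item i and σ²ₓ is the variance of the respondents' summed scores. A minimal sketch follows; the function name and the sample-variance convention are our choices, and the responses in the usage line are made up.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """responses[r][j] = answer of respondent r to item j of one aspect (e.g. 1-7 Likert)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])  # variance of summed scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # perfectly consistent items -> 1.0
```

Using the sample variance (n − 1 denominator) for both the item and total variances leaves the ratio unchanged compared with population variances, so either convention gives the same alpha.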
To evaluate the questionnaire data, we used Fisher's exact test, the Mann-Whitney U test and the independent-samples t-test. A significance level of α = 0.05 was used.
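As an illustration of the Mann-Whitney U test applied to Likert answers from two independent groups, the sketch below computes U with average ranks for ties and a two-sided p-value from the tie-corrected normal approximation. It is a simplified sketch, not the exact procedure used in [5]: it omits the continuity correction, and library routines such as SciPy's `mannwhitneyu` (continuity correction on by default, exact distribution for small untied samples) can therefore return slightly different p-values.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided p) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # 1-based average rank of the tie block
        for m in range(i, j + 1):
            ranks[m] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of sample x
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:                           # all observations identical
        return u, 1.0
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
```

Ties are frequent with seven-point Likert answers, which is why the variance term includes the Σ(t³ − t) correction.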
Since a human operator was used for real-time action recognition and SS extraction, we estimated the possible effect of the human operator in terms of his/her responsiveness and the consistency of his/her recognitions. Based on the results, we concluded that a gesture-based user interface in which a human operator performs gesture recognition does not have a negative effect on the interaction (the operator's response time is fast enough). To check the consistency of the human operator's real-time recognitions, we introduced two additional human operators for gesture and SS class recognition. Both results indicate that human-operator decisions made in real time do not critically affect the results of our experiment. A brief explanation of these results is given in [5] and [6].

IV. RESULTS
The graphical analysis is based on the following assumptions. The MF space of multimedia items (Sec. II) is built on more than 3600 ratings, so we can reasonably assume that it is the best possible layout of multimedia items for all users. We thus introduce the notion of an MF spatial area: an area of multimedia items with similar characteristics (short distances among items within the same area). Our layout of items therefore contains several areas that group similar items.
A user's past experiences with multimedia items are reflected in the way the user prefers some items over others. Therefore, our second assumption is that, for each user, there are areas in our MF space that contain the user's preferred items, called preferred areas. We graphically estimated the coverage of the preferred areas for the users in all three groups. All of the following analyses are based on the procedures described in Section III.
We used a sample of 42 users (N = 42), with 14 users in each of the control, random and test groups. Since the evaluation in this paper builds on our previous research results, we first present the results of the hypothesis testing published in [5] and [6].

A. Hypothesis Testing
To test the hypothesis "The use of the SS of hesitation in the RS improves the QoE when the user interacts with a VoD system", we used statements from the post-interaction questionnaire that represent user satisfaction with the system. The first tested statement was "The system is useful." (St1), and the second was "Overall, I am satisfied with the system." (St2). The Mann-Whitney U test was used to compute the p-values. Results are shown separately for the two pairs of groups (control vs. test and random vs. test) (Table I).
Before measuring the QoE, we detected and eliminated other possible causes of differences between the groups. We compared users according to (i) their basic demographics, (ii) their answers to the pre-interaction questionnaire and (iii) the video content provided. We identified two possible causes of differences in QoE between user groups: there were significant differences between the groups in age and in the average ratings of all videos recommended to the user. We concluded that neither difference gives the users in the test group (SS considered) an advantage when measuring the effect of the SS.
Table I shows that, in the comparison between the control and test groups, there is a significant difference for both statements (St1, St2), while in the comparison between the random and test groups there is no significant difference. We can thus reject the null hypothesis of no difference, and thereby support the tested hypothesis, only for the comparison between the control and test user groups.

B. Coverage for the Control Group of Users
Users in the control group are limited to one area of the MF space, which does not always suit them. The MF space is poorly covered in terms of the items that the user sees, so the user does not always get an item that he/she wants. Fig. 3 shows two typical traces made by users in the control group. The traces between selected items are short because every selection results in the recommendation of similar items (red lines). The items thus cover only a small area of the MF space. The users see only the items from one area of the MF space, which may not correspond to their preferred areas, possibly resulting in lower QoE (Table I).

C. Coverage for the Random Group of Users
Users in the random group are not limited to one area of the MF space, so the MF space is better covered in terms of the items seen by the user. However, because the recommendation of similar (red lines) or diverse (blue lines) items is chosen randomly, the items may cover areas that do not suit the user, and the user therefore does not always get an item that he/she wants. Fig. 4 shows two typical traces for users in the random group. The traces are interlaced because the selection function is chosen randomly. Consequently, users see more areas of the MF space that could suit them, but they cannot manage these recommendations. A mismatch between the areas seen and the user's preferred areas can be reflected in lower QoE (Table I).

D. Coverage for the Test Group of Users
Users in the test group are not limited to one area of the MF space, and the MF space is better covered in terms of the items seen by the user. The items cover areas that suit the user because the system allows the user to manage the item recommendations through his/her SS. In this way, the user has a better chance of finding an item that he/she wants to watch. Fig. 5 shows two typical traces made by users in the test group, in which the items cover different areas of the MF space better. The SS manages the recommendations and thus guides the user's trace: if the user hesitates, the diverse-selection function is used; if the user does not hesitate, the similar-selection function is used. The users see more preferred areas and select the most appropriate item from one of the suitable areas. This can be reflected in better QoE (Table I).

E. Analysis of Test-group Scenarios
As assumed, there are several preferred areas in the MF space for each user. The diverse-items function (D) is used when the user hesitates, while the similar-items function (S) is used when the user does not. Function D allows the user to move from one area of the MF space to another, while S allows "walking" only within one area. Below, we present the most common scenarios for the test group of users.
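If each test-group interaction is encoded as a string of the functions applied after each selection ("S" or "D"), the three scenarios described in this subsection correspond to simple regular-expression patterns. The encoding and the helper `classify_scenario` are hypothetical illustrations, not part of the authors' analysis, which was done graphically; we assume every scenario ends with at least one S, since the user explores the last area before the final selection.

```python
import re
from typing import Optional

# "S" = similar-items function (no hesitation), "D" = diverse-items function (hesitation);
# one label per transition between two consecutive selections.
SCENARIOS = [
    (1, re.compile(r"S+(?:D+S+)+")),  # a few S, then D, then again a few S (cycle may repeat)
    (2, re.compile(r"D+S+")),         # one or several D at the beginning, then exploration
    (3, re.compile(r"S+")),           # only S
]


def classify_scenario(labels: str) -> Optional[int]:
    """Return the scenario number (1-3) or None if the trace fits none of them."""
    for number, pattern in SCENARIOS:
        if pattern.fullmatch(labels):
            return number
    return None


print(classify_scenario("SSDSS"))  # 1
print(classify_scenario("DDS"))    # 2
print(classify_scenario("SSS"))    # 3
```

`re.fullmatch` is used so that a pattern must account for the whole interaction rather than only a prefix of it.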

1) A few S then D and then again a few S
The user finds one (preferred) area that he/she is interested in and wants to explore. The user does not hesitate and therefore receives similar items. Even if this is one of the preferred areas, the user does not find a suitable item after a few steps. The user then hesitates and gets four diverse items, which represent four diverse areas in the MF space. The user selects another area that he/she finds suitable and explores it until finding the item he/she wants to watch. The cycle can be repeated several times in one interaction, and after each D, one or more S may follow. In the case of only one S, the user probably thinks that the current area is of interest but changes his/her mind after getting more items from that area. In the case of several S, the user explores the selected area. The described scenario can be seen in Fig. 6.
Fig. 3. Typical traces among selected items during the interaction of users in the control group with the VoD system. Lines between selected items are coloured red to indicate a similar selection. The user is limited to only one area of the MF space, which may not be his/her preferred area; this may be reflected by poor QoE
Fig. 4. Typical traces of selected items in the interaction of users in the random group with the VoD system. Lines between selected items are coloured red for a similar selection and blue for a diverse selection. The user sees more areas of the MF space, but he/she cannot manage the recommendations and therefore cannot always get an item from a preferred area
Fig. 5. Typical traces of selected items in the interaction of users in the test group with the VoD system. Lines between selected items are coloured red for a similar selection and blue for a diverse selection. The user sees more preferred areas in the MF space because he/she can manage the recommendations. Therefore, the user can always get an item from one of his/her preferred areas
Fig. 6. The user finds an item from one of his/her preferred areas on the first screen and therefore explores this area. Since the user is not hesitating, the system provides similar items (red line). After a few steps, the user still does not find an appropriate item and thus hesitates, and the system provides diverse items (blue line). The user is interested in one of the four new items (in one of the preferred areas) and therefore does not hesitate to select it. After a few steps within this area, the user finds an appropriate item

2) One or several D at the beginning of the interaction
The user is not satisfied with the first screen (i.e., there is not a single item from the preferred areas) and therefore hesitates. The user then gets items from four new areas. This repeats until the user finds an appropriate (preferred) area, which the user then explores until he/she finds an appropriate item. The described scenario can be seen in Fig. 7.

3) Only S in interaction
The user finds a (preferred) area that he/she is interested in on the first screen. The user does not hesitate and gets only similar items. After a few steps in this area, the user finds an appropriate item. This scenario is common for users who love to watch movies but have never used a similar system. The described scenario can be seen in Fig. 8.
Fig. 7. The user does not find an item from a preferred area and therefore hesitates (blue line). After finding an appropriate area, the user explores it without hesitation (red line) and selects an appropriate item
Fig. 8. The user finds an item he/she is interested in on the first screen. Since the user does not hesitate, the system provides similar items (red line). In a few steps, the user explores the selected area until he/she finds an appropriate item

V. DISCUSSION
Comparison of the coverage of preferred areas in the MF space of multimedia items among the groups yielded the expected results. Since the users in the test group can manage the system recommendations (similar or diverse items) with their expressed SSs, their preferred areas of multimedia items in the MF space are better covered, which is reflected in better QoE. The users in the random group cannot manage the system recommendations because the recommendations are generated randomly (random choice between the similar and diverse selection functions). Nevertheless, the system can cover a user's preferred areas in the MF space, since the diverse function allows transitions among areas. Users in this group were less satisfied with the VoD system than users in the test group but more satisfied than users in the control group. Users in the control group can select an area of the MF space only on the first screen and can then only explore within this area, so there is a high probability that the user does not see any item from his/her preferred areas. These findings indicate that the use of the SS of hesitation in our VoD system provides better coverage of the user's preferred areas of multimedia items in the MF space, resulting in better user satisfaction with the system.

VI. CONCLUSIONS
We presented, to the best of our knowledge, the first attempt to use the user's SS expressed during the interaction as part of feedback information. We modelled an experimental design and an associated experimental user scenario in which users make gestures to select among videos on a screen (i.e., VoD). Additional user-produced SS information was used to recommend more suitable new videos during the selection process. Our previous work [5,6] compared a group for which the SS is considered (test group) with groups for which the SS is not considered (control and random groups), based on pre- and post-interaction questionnaires. Our findings were that (i) there was a significant difference in user satisfaction with the system between the test group (a user group for which the SS was considered) and the control group (a user group for which the SS was not considered), and (ii) there was no significant difference in user satisfaction with the system between the test group and the random user group (another user group for which the SS was not considered).
In this paper, we presented the results of a graphical analysis of users' selection traces in the MF space of multimedia items to estimate the effect of MF space coverage on the user's QoE. We concluded that using the SS in our VoD system provided better coverage of a user's preferred areas of multimedia items in the MF space, which is reflected in better satisfaction with the system.
Our future work should focus on (i) increasing the size of the sample of tested users and (ii) realizing and testing a repeated-measures experimental design, in which each user would test the scenarios of all three groups and decide only which of the scenarios offers him/her the best user experience.

Fig. 1. MF space of videos. Each video is represented by a point in the two-dimensional MF space

Fig. 2. Two-dimensional MF space of multimedia items, where each item is denoted by a point. Each line represents a link between two consecutive selections. If the line is coloured red, the system recommended similar items between the two selections; otherwise, it recommended diverse items (blue lines). The starting point is coloured green and the end point red
Algorithm fragment: Find the maximum distance and the ID of the item that this distance belongs to. Add this ID to vector vecC and remove it from SimSub.
Algorithm variables: vecC: vector of IDs of currently playing videos; selID: ID of selected video; vecE: vector of IDs of already played videos; n: number of items that are being looked for

TABLE I. THE RESULTS OF USER SATISFACTION WITH THE SYSTEM (QOE MEASURE) COMPARED BETWEEN CONTROL AND TEST GROUPS AND BETWEEN RANDOM AND TEST GROUPS. THE NULL HYPOTHESIS WAS TESTED USING STATEMENTS ST1 AND ST2 WITH A MANN-WHITNEY U TEST. THE RESULTS ARE PRESENTED AS MEAN VALUES FOR ALL THREE GROUPS (MEAN C: CONTROL GROUP, MEAN T: TEST GROUP, MEAN R: RANDOM GROUP) AND P-VALUES (P C-T: CONTROL VS. TEST; P R-T: RANDOM VS. TEST). ROWS WHERE A SIGNIFICANT DIFFERENCE WAS FOUND BETWEEN GROUPS ARE SHADED RED