Detection and Feature Extraction of Collective Activity in Human-Computer Interaction

Time-based online media, such as video, has been growing in importance. Still, there is limited research on information retrieval of time-coded media content. This work elaborates on the idea of extracting feature characteristics from time-based online content by analyzing users' interactions rather than the content itself. Accordingly, a time series of users' activity in online media is constructed and shown to exhibit rich temporal dynamics. Additionally, it is demonstrated that it is also possible to detect characteristic patterns in collective activity while accessing time-based media. Pattern detection of collective activity, as well as feature extraction of the corresponding pattern, is achieved by means of a time series clustering approach. This is demonstrated with the proposed approach featuring information-rich videos. It is shown that the proposed probabilistic algorithm effectively detects distinct shapes of the users' time series, correctly predicting popularity dynamics, as well as their scale characteristics.

Keywords—Users' activity; aggregation modelling; collective intelligence; time-based media; pattern detection


I. INTRODUCTION
Over the last decade the World Wide Web has become the most popular medium for watching video content [1]. Online multimedia content is gaining wide acceptance and popularity, due to recent advances in related technology and significant reductions in related hardware cost. User-generated content, ranging from personal and how-to videos captured by individuals to free online video lectures and documentaries produced by organizations and academic institutions, is growing at a fast pace online. Broadband Internet connections and increasing interest in online applications and games aided the unprecedented growth of rich multimedia content. Now more than ever, more effective methods for indexing, searching, categorizing and organizing this information are required [2].
Consequently, research on video retrieval must adapt to new techniques and methodologies that leave behind traditional video analysis, which is usually based on automatic spatiotemporal analysis of sequences, and open up ways to new interpretations. One of the new trends in this direction is the consideration and exploitation of the interactive behaviour of users as an integral part of the actual video retrieval process. In this research, this approach is taken a step further and social video consumption activity is considered as a user activity signal within the temporally linear video playback. This approach relies on the capture and analysis of implicit user interactions in order to extract useful information about the video. Previous research [3] has suggested that implicit interactions between users viewing a video and the video player benefit video summarization. To further explore this, the aggregated users' interaction with the video player is analyzed using a stochastic pulse modelling process.
In principle, the concept of analyzing implicit user interaction in computing activities in order to develop user models and to provide intelligent interactions is far from new and covers both content-based and user-based approaches. In the past, Budzik & Hammond [4] developed a system that provided dynamic responses to users based on their previous interactions with desktop computer applications. Most notably, Liu, Dolan and Pedersen [5] have improved personalization of news items by analyzing previous users' interactions with news items. In the context of multimedia, previous research has considered both content- and user-based methods for video retrieval. Kim et al. [6] report a large-scale analysis of in-video dropout and peaks in viewership and student activity, using second-by-second user interaction data from several Massive Open Online Courses (MOOCs) videos. In a different approach, Carlier, Ravindra, Charvillat and Ooi [7] propose a hybrid method where content analysis is complemented by the implicit feedback of a community of users in order to recommend viewports. Last but not least, the authors of [8] proposed a framework for analyzing human actions in video streams by introducing an implicit user-in-the-loop concept for dynamically mining semantics and annotating video streams. Based on these observations and in the current context of online multimedia sharing communities and networks, effective analysis of collective intelligence and activity remains an active research topic in the area of human-computer interaction and provides motivation towards identifying new research ideas.
www.ijacsa.thesai.org
Thus, as an example, the methodology presented herein could utilise users' interactions on content, by interpreting these interactions as time series. Such time series could be of clicks, plays or progression-slider position changes of a video on YouTube. Modelling the collective intelligence of users' interaction via the detection of characteristic patterns within the interaction time series could lead to judgment about the importance of the content, in part or whole, from which users' activity originated.
The rest of the paper is organized as follows: Section II presents recent research works on implicit user interactions with multimedia content, mainly focusing on online collections. Section III provides the methodology used herein, as well as the basic steps of the proposed algorithm. Section IV has a two-fold role: it first presents the dataset used in the experimentation and then discusses an evaluation procedure, built around a case study, that shows how the proposed mechanism works. Finally, Section V concludes this work and outlines future work.

II. RELATED WORK
Research concerning the temporal characteristics of human activity has recently attracted increased attention due to the increased interactivity provided by Web 2.0, as well as the increasing amounts of rich online multimedia content (such as videos) available to users through the Web.
Previous user-based research on web video has focused on the meaning of textual comments, tags, re-mixes, and microblogs, but has not examined simple user interactions with a web-based video player. In the seminal user-based approach to web video, Shaw and Davis [9] proposed that video representation might be better modelled after the actual use made by users. In this way, they analyzed the annotations to understand media semantics. Peng et al. [10] examined the physiological behaviour (eye and head movement) of video users, in order to identify interesting key-frames. Nevertheless, the practical application of this approach is limited, as it assumes that a video camera is available and turned on in the home environment. Shamma, Shaw, Shafton, and Liu [11] created summaries of broadcasts (sports and political debate) by analyzing the Twitter stream of the respective real-time event. Although a Twitter stream is considered to be semantically rich, it lacks the real-time accuracy required in the generation of video thumbnails, since typing and sending a text message takes a minimal amount of time (i.e., a few seconds). In contrast, the proposed method is entirely based on persistent data of real-time user interactions, such as replay button presses.
As far as research on the temporal characteristics of human social activity is concerned, several interesting areas have been identified by previous research. Issues like social behaviour analysis, social web evolution and trend detection, as in the works of Yang and Leskovec [12], Papantoniou, Loumos, Poulizos and Rigas [13], Patrikakis, Argyriou and Papantoniou [14] and Vafopoulos [15], aim mostly at predicting social user behaviour, in order to offer an insight into the potential customization of content. Wu and Huberman [16] and Yardi, Golder and Brzozowski [17] examined how collective human attention to items propagates and eventually fades among large populations. Given the vast amount of available online content and the ease of producing more, the authors of [18] and [19] studied the problem of predicting how much attention an item will ultimately receive. Moreover, [20] and [21] offer research on response dynamics of social systems with focus on the effects of bursts of activity in the social system. Backstrom, Kleinberg and Kumar [22] presented research on customizing feeds of news articles based on users' traffic patterns. Spatiotemporal patterns of user interaction with blog posts have been examined by [23], [24], [25] and [26]. In addition, Aperjis, Huberman and Wu [27] presented research on online discussion forums focusing on how users behave when trying to maximize the amount of acquired information while minimizing the collection time.
Yang and Leskovec [12] examined temporal patterns associated with online content, by focusing on the popularity of content on social media (i.e., Twitter hashtags). Identification of the common temporal patterns is done by means of a time series clustering algorithm. Their work concludes that the temporal variation of content popularity in online social media can be accurately described by a small set of time series shapes, and that reliable predictions of the overall dynamics of content popularity over time can be attained by observing a small number of adopters of online content.
As far as users' actions on video content and, more generally, important scene selection in videos are concerned, research has mostly been based on content-based methodologies. Nevertheless, such content-based methods often fail to capture high-level semantics that pertain to non-specialist users' navigation of videos, as depicted in [28]. The authors of [29] proposed that users unintentionally show their understanding of the video content through their interaction with the viewing system. Syeda-Mahmood and Ponceleon [30] presented a client-server-based media playing and data-mining system aiming at tracking video browsing behaviour of users in order to generate fast video previews. The authors of [28] presented a user-centric approach, wherein semantic information about the events within a video is inferred by analyzing implicit users' interactions with a web video player. Finally, Karydis, Avlonitis and Sioutas [31] proposed the aggregation of users' activity in order to infer important video frames.

III. METHODOLOGY
In this section, the problem tackled in this work is formally defined and then a probabilistic algorithmic solution is proposed. To begin with, let us assume a time series of users' interactions for a specific piece of content. This could be a time series of clicks or plays of a video on YouTube, the number of times an article on a newspaper website was read, or the number of times that a hashtag on Twitter was used. The aim, therefore, is to detect patterns emerging in the temporal variation of the corresponding time series, indicating the importance of a segment of content at a specific time interval of its duration.
The aforementioned scenario is formally defined as a problem of time series correlation, based on the correlation between the shape of the (experimentally collected) time series and the shape of a reference time series indicating local maximization of users' activity.
In the first stage, a simple procedure is used in order to average out user activity noise in the corresponding experimental signal. In the context of probability theory the noise removal can be treated, for example, with the notion of the moving average (e.g., [33]): from a curve $s_{exp}(t)$ a new smoother curve $\bar{s}_T(t)$ may be obtained as

$$\bar{s}_T(t) = \frac{1}{T} \int_{t - T/2}^{t + T/2} s_{exp}(t')\, dt',$$

where T denotes the averaging "window" in time. The larger the averaging window T, the smoother the signal will be.
It must be noted that the optimum size of the averaging window T is completely defined by the variability of the initial signal. Indeed, T should be large enough in order to average out random fluctuations of the users' activities and small enough in order to reveal, and not disturb, the bell-like localized shape of the signal which in turn will demonstrate the area of high users' activity.
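The trade-off in choosing T can be illustrated with a minimal sketch of a discrete moving average. The centred window, the one-sample-per-second discretization, the function name `moving_average` and the synthetic test signal are all assumptions made for illustration; they are not taken from the original experiments.

```python
import numpy as np

def moving_average(signal, window):
    """Centred moving average over `window` samples (here: seconds)."""
    signal = np.asarray(signal, dtype=float)
    if not 1 <= window <= len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    kernel = np.ones(window) / window
    # mode="same" keeps the output aligned with the input time axis;
    # samples near both ends are averaged against implicit zeros.
    return np.convolve(signal, kernel, mode="same")

# Hypothetical noisy activity signal: one bell-like peak plus random noise.
rng = np.random.default_rng(0)
t = np.arange(300)
raw = 10 * np.exp(-0.5 * ((t - 150) / 20) ** 2) + rng.normal(0, 1, t.size)

smooth_small = moving_average(raw, 5)    # small T: noise largely survives
smooth_large = moving_average(raw, 61)   # large T: smoother, but the bell is flattened
```

Comparing `smooth_small` and `smooth_large` shows both failure modes described above: a short window leaves random fluctuations in place, while a window wider than the bell lowers and broadens the peak it is supposed to reveal.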
In the second stage, an estimation of the aggregates of users' activity via probabilistic arguments is attempted. This can be done by means of an arbitrary bell-like reference pattern: it is thus proposed that there is an aggregate of users' actions if within a specific time domain a bell-like shape of the experimental signal emerges in the sense that there is high probability that the actions are concentrated at a specific time instance (the centre of the bell) while this probability tends to zero quite symmetrically while moving away from the centre.
As a milestone of this work it is claimed that it is possible to build a scale-free similarity metric, by means of the well-known correlation coefficient $r(x, y)$ between two time series x(t) and y(t), through the introduction of the aforementioned reference bell-like time signal. Indeed, it can be shown that for all the cases where a local maximum of the users' activity was detected, a bell-like shape is encountered. Noting that Gaussian functions are the best approximations for a bell-like shape, it is proposed that every local maximum of the popularity may be expressed as an approximately Gaussian-like time signal. This assertion is depicted in Figure 1, where an arbitrary smoothed signal of density P(t) of users' activity per time instance t (in seconds) is depicted with the black continuous curve (the same interpretation is used for the rest of the Figures herein). It can be seen that within a neighbourhood of each local maximum of the time series, the shape almost perfectly matches the upper part of a Gaussian curve (depicted by a dotted line). However, while the Gaussian signals show high similarity in every distinct case of high popularity, the respective widths and heights are not the same in each case. As a result, a robust characterization of the local behaviour of high users' activity, besides the estimation of the exact location of the bell, should estimate its height and width. Moreover, a direct consequence of the stochastic nature of users' activity signals is that standard signal processing methods, which locate signal maxima and/or minima based on the estimation of first and second derivatives, break down. Indeed, in real-life applications the proposed smoothing procedure is not enough to eliminate signal discontinuities, thus resulting in infinite derivatives.
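The scale-free property claimed for the correlation coefficient can be checked numerically. The sketch below assumes the standard Pearson coefficient and a full (not only upper-part) Gaussian bell; the helper names `gaussian_bell` and `correlation` and all numeric values are hypothetical.

```python
import numpy as np

def gaussian_bell(t, centre, width):
    """Unnormalized Gaussian reference signal with the given centre and width."""
    return np.exp(-0.5 * ((t - centre) / width) ** 2)

def correlation(x, y):
    """Pearson correlation coefficient r(x, y) of two equally long series."""
    return float(np.corrcoef(x, y)[0, 1])

t = np.arange(200)
reference = gaussian_bell(t, centre=100, width=15)

# Same shape at very different heights and baselines (hypothetical values).
small_peak = 2.0 * gaussian_bell(t, 100, 15) + 1.0
large_peak = 50.0 * gaussian_bell(t, 100, 15) + 7.0
# Same height, but the bell sits at a different time.
shifted_peak = gaussian_bell(t, 140, 15)

r_small = correlation(reference, small_peak)
r_large = correlation(reference, large_peak)
r_shifted = correlation(reference, shifted_peak)
```

Because the coefficient is invariant under positive rescaling and offsets, `r_small` and `r_large` agree even though the peak heights differ by more than an order of magnitude, whereas `r_shifted` drops sharply. This is why the height of the bell does not need to be fitted during detection, while its location and width do.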

A. Scalable stochastic similarity algorithm
In the following, a two-valued correlation coefficient $r(c_t, w)$ is built, where $c_t$ is the time centre of the Gaussian bell and w is its width. In other words, a Gaussian time signal is constructed by shifting its centre over the time domain of the experimental signal, and for each position a number of different Gaussian time signals are created by gradually increasing the width w (see Figure 2, from the blue to the red and then to the green solid Gaussian bell). For each Gaussian reference signal of different width the corresponding correlation coefficient with the experimental signal is estimated. The proposed formulation is motivated by the well-known notion of Gaussian kernel density estimation, a non-parametric way to estimate the probability density function of a random variable [34]. More specifically, this work introduces a series of Gaussian kernels with variable widths and finds the optimum matching, or correlation coefficient, for each point of the state space, instead of using many Gaussian kernels of constant width over the entire state space of the random variable under study. In this way, a two-dimensional correlation coefficient for each time location and for different bell widths is obtained. As previously stated, whenever a high correlation coefficient is identified for a specific time centre of the Gaussian bell during the time-shifting process, a local maximum of the experimentally constructed time series is assumed. Indeed, this can be seen in Figure 3 for an arbitrary time series (top line). The lower line, normalized to 10, depicts the corresponding correlation coefficient between the arbitrary time series and the shifted Gaussian bell. Initially the width of the bell is kept constant (as will be discussed in the corresponding section, a robust alternative measure for the initial width could be the variance of the smoothed experimental signal).
It is evident that there is a very clear maximum of the correlation coefficient exactly when the centre of the Gaussian bell coincides with the maximum of the experimental series. As a result, the exact location of the maximum of the experimental series is detected as the point of the local maximum of the corresponding correlation coefficient. Then, the assumption of the constant bell width is relaxed: the centre of the bell is kept constant and Gaussian bells of different widths are built (as depicted in Figure 2). For each bell of variable width a new correlation coefficient is computed. The maximum value of this second set of correlation coefficients is estimated, thus completing the process. The final result is the estimation of the maximum correlation coefficient in terms of the optimum time moment and the optimum bell width. It is thus argued that the optimum time moment coincides with the local maximum of the online media popularity, while the optimum Gaussian bell width coincides with the corresponding time interval over which popularity is important enough.
In summary, the proposed algorithm, the r-algorithm, performs the following steps: it begins with an initial Gaussian bell, the centre of which is located at the time origin of the content and the width of which coincides with the variance of the smoothed experimental signal. Then, a two-step procedure follows, namely the detection step and the refinement or characterization step. Within the detection step the bell is shifted along the time domain, computing the corresponding correlation coefficient between the Gaussian bell and the experimental signal. The local maxima of users' activity are identified as the time instances where the computed correlation coefficient reaches a local maximum that lies above a user- and domain-specific threshold.
In the characterization step, for each local maximum of the correlation coefficient, a series of variable-width Gaussian bells is generated (beginning from a value of a few seconds up to a fraction of the overall duration of the content) and the corresponding correlation coefficients are computed again. The calculated optimal bell width is an estimation of the interval over which the content was important enough for the users.

Algorithm 1 The r-algorithm
Require: Experimental time series, upper part of Gaussian time series g(ct,w) of centre ct and width w.
for ct = 1 to L do                  (detection step)
    r_ct                            (the correlation coefficient for different centers)
    if r_ct > threshold then        (critical threshold of correlation)
        for w = 1 to L/10 do        (characterization step)
            r_ct_w                  (correlation coefficient for variable widths)
        end for
    end if
end for
return r_ct_w   (returns seconds of maximum users' activity and the corresponding time interval of popularity)
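One possible reading of Algorithm 1 is sketched below in Python. Several choices are assumptions that the pseudocode leaves open: each correlation is computed over a local window of `span` bell widths on either side of the centre, a full Gaussian is used instead of only its upper part, and the names `gaussian_bell`, `local_correlation`, `r_algorithm` and `span` are hypothetical. The test signal is synthetic.

```python
import numpy as np

def gaussian_bell(t, centre, width):
    return np.exp(-0.5 * ((t - centre) / width) ** 2)

def local_correlation(signal, centre, width, span=2.0):
    """Pearson correlation between the signal and a Gaussian bell, restricted
    to the window [centre - span*width, centre + span*width] (an assumption)."""
    lo = max(0, int(round(centre - span * width)))
    hi = min(len(signal), int(round(centre + span * width)) + 1)
    t = np.arange(lo, hi)
    x = signal[lo:hi] - signal[lo:hi].mean()
    g = gaussian_bell(t, centre, width)
    g = g - g.mean()
    denom = np.sqrt((x @ x) * (g @ g))
    return float(x @ g / denom) if denom > 0 else 0.0

def r_algorithm(signal, initial_width, threshold=0.8, span=2.0):
    """Return (centre, width, correlation) triples for detected activity peaks."""
    signal = np.asarray(signal, dtype=float)
    L = len(signal)
    # Detection step: shift a bell of fixed width along the time axis.
    r_ct = np.array([local_correlation(signal, ct, initial_width, span)
                     for ct in range(L)])
    widths = np.arange(1, max(1, L // 10) + 1)  # w = 1 .. L/10, as in Algorithm 1
    results = []
    for ct in range(1, L - 1):
        is_peak = r_ct[ct] >= r_ct[ct - 1] and r_ct[ct] > r_ct[ct + 1]
        if not (is_peak and r_ct[ct] > threshold):
            continue
        # Characterization step: keep the centre, vary the width.
        r_ct_w = [local_correlation(signal, ct, w, span) for w in widths]
        best = int(np.argmax(r_ct_w))
        results.append((ct, int(widths[best]), r_ct_w[best]))
    return results

# Synthetic signal with a narrow and a wide bell (hypothetical data).
t = np.arange(240)
signal = 5 * gaussian_bell(t, 60, 8) + 3 * gaussian_bell(t, 150, 20)
found = r_algorithm(signal, initial_width=10)
```

Each entry of `found` pairs a detected second of maximum activity with the bell width that correlates best there. On real, noisy signals the local-window choice and the threshold would need tuning, so this should be read as an illustration of the two-step structure rather than a reference implementation.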

IV. ALGORITHM EVALUATION
In order to evaluate the proposed work, an open-access dataset is employed, as proposed by Gkonela & Chorianopoulos [28], which has been created in the context of a controlled user experiment, in order to ensure well-defined user-based semantics (ground truth). In the initialization phase, every video is considered to be associated with four distinct series in the time domain of length k, where k is the time duration of the video in seconds. Each series corresponds to the frequency with which one of the four distinct buttons (Play/Pause, GoForward, and GoBackward) is used by users at specific times. Initially, the users' activity series is created as follows: each time a user presses the GoBackward (GoForward) button, the intervals matching the last 30 seconds (the next 30 seconds) of the video are incremented by one, meaning that during all these 30 seconds the corresponding button is assumed to have been pushed. The main experimental assumption is that a user typically rewinds a video because there is something interesting to be seen again, while a user forwards a video because there is nothing of interest to see so far. In this way, a series is constructed for each button and for each video that depicts users' activity patterns over time.
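The construction of a button activity series described above can be sketched as follows. The function name `build_activity_series`, the integer-second press times, and the convention that the press second itself belongs to the "next" interval are assumptions made for illustration; the dataset's actual log format is not reproduced here.

```python
import numpy as np

def build_activity_series(duration, press_times, direction, span=30):
    """Count, for every second of the video, how many presses cover it.

    direction="backward": each press increments the `span` seconds before it.
    direction="forward":  each press increments the `span` seconds after it.
    """
    series = np.zeros(duration, dtype=int)
    for t in press_times:
        if direction == "backward":
            start, stop = t - span, t
        elif direction == "forward":
            start, stop = t, t + span
        else:
            raise ValueError("direction must be 'backward' or 'forward'")
        # Clip the interval to the video duration.
        start, stop = max(start, 0), min(stop, duration)
        if start < stop:
            series[start:stop] += 1
    return series

# Hypothetical log: two users rewind at seconds 50 and 60 of a 100-second video.
go_backward = build_activity_series(100, [50, 60], "backward")
# One user skips forward at second 90, close to the end of the video.
go_forward = build_activity_series(100, [90], "forward")
```

Overlapping intervals accumulate, so seconds 30 to 49 in `go_backward` are counted twice, which is what produces the bell-like regions of high activity once many users are aggregated.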
Following the above described smoothing procedure, the proposed approach focuses on the analysis of video seeking user behaviour incorporating the GoBackward and GoForward buttons. Of particular interest is the GoBackward button signal, since it may contain a quite regular pattern with a small number of regions with high users' activity. In the following, preliminary results are presented that demonstrate the proposed methodology for detecting patterns of such users' activity.
The analysis of the users' activity signals follows the implementation of the proposed r-algorithm. The results of the proposed methodology are depicted in Figure 4. The smoothed signal is plotted with the upper (black) curve. Moreover, the correlation coefficient of the smoothed signal with each Gaussian bell (as its centre is shifted over the time domain) is also depicted with the lower (red) curve, as extracted from the detection step of the proposed r-algorithm. It is evident that the exact seconds of users' activity maxima are surprisingly well estimated from the corresponding maxima of the correlation coefficient. It must be noted that while a series of small maxima is observed in Figure 4, the local maxima of the correlation coefficient that are of interest must lie above a very clear threshold value. This is done to avoid perturbations of the signal, which by no means express signal trends. For the present case a threshold value of 0.8 is assumed for the proposed algorithm. The exact threshold value depends on the problem under consideration and could be established through a training process.
For each estimated local maximum, the optimum width, as computed from the characterization step of the r-algorithm, is also given. It can be seen in Figure 4 that the computed widths fit the corresponding widths, i.e., the time intervals over which important scenes are popular enough.
In order to compare the outcomes of the proposed r-algorithm, a surface plot is also provided in Figure 5, depicting the evolution of the two-valued correlation coefficient $r(c_t, w)$ in relation to its variables. It can be seen that the local maxima of the correlation coefficient depend on the width of the Gaussian bells used. As a result, the optimum time instance (coinciding with the local maximum of the online media popularity) and the optimum Gaussian bell width (coinciding with the corresponding time interval over which popularity is important enough) may be defined from the coordinates of the corresponding local maxima of the plotted surface.
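A surface of this kind can be tabulated by evaluating the correlation coefficient on a grid of centres and widths. The sketch below assumes, as one possible convention, that each correlation is taken over a local window of `span` widths around the centre; the function name `correlation_surface`, the `span` parameter and the synthetic signal are hypothetical.

```python
import numpy as np

def correlation_surface(signal, widths, span=2.0):
    """Correlation for every (width, centre) pair; rows follow `widths`."""
    signal = np.asarray(signal, dtype=float)
    L = len(signal)
    surface = np.zeros((len(widths), L))
    for i, w in enumerate(widths):
        for c in range(L):
            # Local window around the centre, clipped to the signal (assumption).
            lo = max(0, int(round(c - span * w)))
            hi = min(L, int(round(c + span * w)) + 1)
            x = signal[lo:hi] - signal[lo:hi].mean()
            g = np.exp(-0.5 * ((np.arange(lo, hi) - c) / w) ** 2)
            g = g - g.mean()
            denom = np.sqrt((x @ x) * (g @ g))
            surface[i, c] = x @ g / denom if denom > 0 else 0.0
    return surface

# Synthetic signal: a single bell centred at second 50 with width 6.
t = np.arange(120)
signal = np.exp(-0.5 * ((t - 50) / 6) ** 2)
widths = np.arange(2, 13)

surface = correlation_surface(signal, widths)
i_best, c_best = np.unravel_index(np.argmax(surface), surface.shape)
w_best = int(widths[i_best])
```

The coordinates of the surface maximum, `c_best` and `w_best`, play the role of the optimum time instance and optimum bell width read off the plotted surface. With several activity peaks, each local maximum of `surface` above the chosen threshold would be read in the same way.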

V. CONCLUSIONS
This research work describes a method that detects collective activity and identifies collective intelligence patterns via the detection of characteristic patterns within the corresponding signal monitoring aggregate activity. This framework introduces an algorithmic approach for the detection of aggregates of users' activity. The latter relies heavily on the notion of a two-parameter arbitrary Gaussian bell, acting as a reference pattern for aggregation. Accordingly, the aggregation of users' actions coincides with the upper part of a bell-like shape of the corresponding distribution. The complete pattern of users' interactions is defined by means of two parameters: the exact location of the centre of the Gaussian bell $c_t$, as well as the corresponding width w. In this way, different users' behaviour is successfully mapped to different observed patterns. As depicted herein, initial experimental research results are obtained from the application of the proposed methodology on web videos utilizing an open-access dataset. These results may be used to understand and explore social collective intelligence in online media, i.e., the way to detect users' collective behaviour, as well as how the detected collective behaviour leads to judgment about the importance of the multimedia content from which users' activity originated.
Although the total improvement is not considered to be impressive, it is the belief of the authors that the approach successfully incorporates the underlying knowledge and further exploits collective activity in the video analysis value chain. Moreover, further research intends to use the proposed r-algorithm as a tool of user-based multimedia content analysis towards efficient content adaptation and personalization according to evolving users' preferences. Finally, minor enhancements to the implemented algorithmic model, e.g., in terms of exploiting additional semantic relationships, would further boost its performance and impact.

Fig. 1. The user activity signal is approximated with Gaussian bells in the neighborhood of user activity local maxima

Fig. 2. The Gaussian bell is shifted over the time domain. When a local maximum of the correlation coefficient is detected, a series of variable widths is created in order to estimate the optimum width

Fig. 3. Local maxima of the correlation coefficient (lower curve) coincide with local maxima of users' activity signal (upper curve)

Fig. 4. The smoothed signal is plotted with the upper (black) curve. The optimum correlation coefficient, as extracted from the detection step of the proposed r-algorithm, is plotted with the lower (red) curve

Fig. 5. A surface plot depicting the evolution of the two-valued correlation coefficient $r(c_t, w)$ in relation to its variables. Local maxima of the correlation coefficient emerge and submerge depending on the width of the Gaussian bells used