Optimizing the Behaviour of Web Users Through Expectation Maximization Algorithm and Mixture of Normal Distributions

The proposed work analyses users' behaviour in web access. Worldwide, web users browse different websites every second. The aim of this paper is to identify the behaviour of users within a bounded time period using the Expectation Maximization (EM) algorithm and the maximum likelihood estimates of the model parameters. A novel approach based on a mixture of normal distributions is used to discuss the percentage of users along with web page frequency.
Keywords: EM algorithm; maximum likelihood; mixture normal distribution; web page frequency


I. INTRODUCTION
As the number of accessible web pages grows, it is becoming increasingly difficult for users to find documents that are relevant to their particular needs. Users must either browse through a large hierarchy of concepts to find information or submit a query to a widely available search engine [1]. Therefore, understanding users' navigation behaviour is challenging but fundamental to improving web query answering and link structure, and to simplifying navigation through a large number of individual webpages. Website operators make great efforts to understand users' behaviour and to make their sites easy to access. To achieve this goal, researchers have proposed many approaches that use web usage data.
Researchers have studied this topic from different points of view. A broadly applicable algorithm for computing maximum likelihood estimates from incomplete or hidden data is presented at various levels of generality in [2], [3], [4] and [5]. An EM algorithm to retrieve the complete scatterer trajectory matrix is discussed in [6].
Mixture distributions are extensively used to model a wide variety of empirical phenomena in diverse fields such as biology, anthropology, psychology, economics, and marketing. Overviews of mixture distributions and many examples of their applications are given in [7]. Mixtures of t-distributions and their numerous variants are discussed in [8], [9], [10] and [11].
The EM algorithm and the finite mixture model are discussed in [12]. An EM-GMM algorithm targeting reconfigurable platforms, with five main contributions, is presented in [13].
In this paper we study web users' behaviour using the EM algorithm. Web page access is predicted using a mixture of normal variates. The remainder of the paper is organized as follows. Section II presents the concept of the EM algorithm. Section III gives the application of the EM algorithm to the selected database. Section IV deals with the mixture of normal variates and its application in predicting web page frequency, and Section V concludes the paper.

A. Database
The data were collected from Sri Sivasubramaniya Nadar College of Engineering (SSNCE), an educational institute in Chennai, Tamil Nadu, India.

II. CONCEPT OF EM ALGORITHM
The EM algorithm is a general method for estimating parameters by maximum likelihood.
It is used when the data are incomplete, for example because of limitations of the observation process. The algorithm consists of two steps, shown diagrammatically in Fig. 1.
Given a set of parameter estimates, the E-step calculates the conditional expectation of the complete-data log likelihood given the observed data and the current parameter estimates. In effect, this step estimates the missing data by their conditional expectation given the observed data and the current estimates. The M-step then finds the parameter estimates that maximize the expected complete-data log likelihood obtained in the E-step. These two steps are iterated until convergence. Each iteration is guaranteed not to decrease the observed-data likelihood, so the sequence of likelihood values converges whenever the likelihood is bounded above.
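The two steps can be illustrated with a minimal Python sketch for a mixture of two univariate normal distributions. This is an illustration under stated assumptions, not the computation used in the paper: the function names, the starting values, the stopping tolerance, and the small floor on the standard deviations are hypothetical choices, and the data are synthetic draws rather than the SSNCE logs.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def em_two_normals(data, p, mu1, sd1, mu2, sd2, max_iter=200, tol=1e-8):
    """EM for the mixture p*N(mu1, sd1^2) + (1 - p)*N(mu2, sd2^2).

    Returns the fitted parameters and the log-likelihood recorded at the
    start of every iteration.
    """
    n = len(data)
    trace = []
    for _ in range(max_iter):
        # E-step: posterior probability that each point belongs to component 1.
        resp = []
        loglik = 0.0
        for x in data:
            a = p * normal_pdf(x, mu1, sd1)
            b = (1.0 - p) * normal_pdf(x, mu2, sd2)
            resp.append(a / (a + b))
            loglik += math.log(a + b)
        trace.append(loglik)
        # Stop once the log-likelihood no longer improves noticeably.
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break
        # M-step: responsibility-weighted maximum-likelihood updates.
        n1 = sum(resp)
        n2 = n - n1
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n2
        var1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1
        var2 = sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2
        # Hypothetical guard: keep a component from collapsing to zero width.
        sd1 = max(math.sqrt(var1), 1e-6)
        sd2 = max(math.sqrt(var2), 1e-6)
        p = n1 / n
    return (p, mu1, sd1, mu2, sd2), trace


# Synthetic data (assumption): 300 draws from N(11, 1) and 300 from N(18, 1).
rng = random.Random(0)
data = [rng.gauss(11, 1) for _ in range(300)] + [rng.gauss(18, 1) for _ in range(300)]
params, trace = em_two_normals(data, p=0.5, mu1=10.0, sd1=2.0, mu2=20.0, sd2=2.0)
```

The recorded log-likelihood values in `trace` should not decrease from one iteration to the next, which is the monotonicity property the convergence argument relies on.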
Web users access the SSN educational institute website at random, in sequences of page views across the various departments. Users access 6 engineering departments, each with 4 independent attributes. For the application of the EM algorithm, the dataset corresponds to the page views of a user. The EM algorithm is applied to predict the accessibility of the various departments among users. From the Internet browsing logs, we gather the access frequency of each web user. The initial values, i.e., the expectation values, are given in Table I as the values for the E-step. After applying the maximization step, the updated values are shown in Table II. Based on these calculations, the accessibility of each department can be determined; the results are depicted in Fig. 3.

IV. MIXTURE OF NORMAL VARIATES TO PREDICT PERCENTAGE OF USER AND WEB PAGE FREQUENCY
It is essential to predict the percentage of usage and the web page frequency in order to understand the accessibility and popularity of the website among users. In this paper, we use a mixture of normal variates for this purpose.
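As an illustration of such a mixture, the Python sketch below evaluates the density, mean, and variance of a two-component mixture of normal variates, using the equally weighted combination 0.5N(11,1)+0.5N(18,1) quoted later in this section. The function names are hypothetical, and the code is written in Python rather than the MATLAB used for Fig. 4, so it is only a sketch and does not reproduce the paper's computation.

```python
import math


def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, p, mu1, sd1, mu2, sd2):
    """Density of p*N(mu1, sd1^2) + (1 - p)*N(mu2, sd2^2) at x."""
    return p * normal_pdf(x, mu1, sd1) + (1.0 - p) * normal_pdf(x, mu2, sd2)


def mixture_mean(p, mu1, mu2):
    """Mean of the mixture: the weighted average of the component means."""
    return p * mu1 + (1.0 - p) * mu2


def mixture_variance(p, mu1, sd1, mu2, sd2):
    """Variance of the mixture: within-component plus between-component spread."""
    m = mixture_mean(p, mu1, mu2)
    return p * (sd1 ** 2 + (mu1 - m) ** 2) + (1.0 - p) * (sd2 ** 2 + (mu2 - m) ** 2)


# Equally weighted mixture 0.5*N(11, 1) + 0.5*N(18, 1).
p, mu1, sd1, mu2, sd2 = 0.5, 11.0, 1.0, 18.0, 1.0
mean = mixture_mean(p, mu1, mu2)                     # 14.5
variance = mixture_variance(p, mu1, sd1, mu2, sd2)   # 13.25

# The density is bimodal: it is higher at each component mean than midway.
at_modes = (mixture_pdf(mu1, p, mu1, sd1, mu2, sd2),
            mixture_pdf(mu2, p, mu1, sd1, mu2, sd2))
at_middle = mixture_pdf(mean, p, mu1, sd1, mu2, sd2)

# Trapezoidal check that the density integrates to about 1 over [0, 30].
step = 0.01
grid = [i * step for i in range(3001)]
values = [mixture_pdf(x, p, mu1, sd1, mu2, sd2) for x in grid]
area = step * (sum(values) - 0.5 * (values[0] + values[-1]))
```

With unequal weights the same functions apply unchanged; only the relative heights of the two peaks differ.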
Mixtures of normal variates are widely used in statistical methods. A random vector x is normally distributed when every linear combination of its components is a normal variate; here the components are taken to be independently distributed with zero covariance. The density function of a mixture of two univariate normal distributions is
$$f(x) = \frac{p}{\sigma_1}\,\phi\!\left(\frac{x-\mu_1}{\sigma_1}\right) + \frac{1-p}{\sigma_2}\,\phi\!\left(\frac{x-\mu_2}{\sigma_2}\right),$$
where $\phi$ is the standard normal density [14], [15], [16] and [17]. This is interpreted as a mixture of two populations, where the mixing proportion $p$ lies between zero and one. For a mixture component with covariance $\Sigma_j = \sigma_j^2 I$, where $I$ is the unit matrix, whose mean coincides with a data point $x_n$, the contribution of that point to the likelihood is $N(x_n \mid x_n, \sigma_j^2 I) = \frac{1}{(2\pi)^{1/2}\sigma_j}$. If $\sigma_j \to 0$, this term goes to infinity. A component with finite variance assigns finite probability to all points, while another component can shrink onto a single data point and thereby contribute an ever-increasing additive value to the log likelihood. Consider two normal distributions with means $\mu_1$, $\mu_2$ and standard deviations $\sigma_1$, $\sigma_2$, mixed with weights $p$ and $1-p$, where $0 < p < 1$. The mean of the mixture is therefore $\mu = p\mu_1 + (1-p)\mu_2$ [18]. The resulting mixture of normal curves is estimated using MATLAB and the results are shown in Fig. 4. From the graph in Fig. 4 we observe that the variances and means are different. The curve is an equally weighted average of the bell-shaped probability density functions of the two normal distributions. If the weights were not equal, the resulting distribution could still be bimodal but with peaks of different heights. The split-up is a linear combination of two normal variates with means 11 and 18 and variances 0, 1 and 4, given by 0.5N(11,1)+0.5N(18,1) and 0.75N(11,0)+0.25N(18,4).
In this paper we proposed a method that uses the EM algorithm to predict the accessibility of webpages among users. We used a mixture distribution to identify web page frequency and the percentage of users. Based on these, the popularity of the web pages among users can be studied, and the frequently accessed web pages can be updated. The study reveals that the EEE department is popular among users and is accessed much more frequently than the other departments. The study can be extended to the centrality of networks.

Fig. 1. EM Algorithm

III. APPLICATION OF EM ALGORITHM
The sessions are grouped based on the user's profile. The sessions are grouped by department, namely EEE, ECE, MECH (MEC), CIVIL (CIV), IT and BME. The webpages considered are Events, Faculty (Fac), Research (Res) and News, as shown in Fig. 2. To determine the proportion of usage of the various departments, we determine the likelihood of the webpage access and the sessions using the EM algorithm.

Fig. 2. Grouped webpages

Fig. 4. Percentage of users vs. web page frequency

V. CONCLUSION

TABLE II. FINAL AND M-STEP OF THE WEB USER