An Approach to Improve the Representation of the User Model in the Web-based Systems

A major shortcoming of content-based approaches lies in the representation of the user model. Content-based approaches often employ term vectors to represent each user's interests. In doing so, they ignore the semantic relations between terms: in the vector space model, indexed terms are in fact not orthogonal and are often semantically related to one another. In this paper, we improve the representation of the user model while it is being built in content-based approaches, in two steps. The first is domain concept filtering, in which concepts and items of interest are compared to the domain ontology, using ontology-based semantic similarity, to select the items relevant to our domain. The second is incorporating semantic content into the term vectors. We use word definitions and relations provided by WordNet to perform word sense disambiguation and employ domain-specific concepts as category labels for the semantically enhanced user models. The implicit information pertaining to user behavior is extracted from click stream data or web usage sessions captured within the web server logs. Our approach also updates the user model by analyzing the user's history of query keywords. For a given keyword, we extract the words that are semantically related to it and add them to the user interest model as nodes, according to the semantic relationships in WordNet.


I. INTRODUCTION
A user model [1] is a collection of personal information. The information is stored without further description or interpretation. It is comparable to the getter/setter mechanism of classes in object-oriented programming, where different parameters are set or retrieved. A user model represents cognitive skills, intellectual abilities, intentions, learning styles, preferences and interactions with the system. These properties are stored after values are assigned to them. These values may be final or may change over time.
The Semantic Web [2] "transforms the Web by providing machine understandable and meaningful descriptions of Web resources". It makes Web content machine-understandable, allowing agents and applications to access a variety of heterogeneous resources, process and integrate their content, and produce added value for the user. Data on the Web must be defined and linked in a way that can be used for more effective discovery, automation, integration, and reuse across various applications.
The personalization aspects [3] of user interests or profiles can form a good representation of the learning context, which promises to enhance the usage of learning content. The key element of any personalization strategy for e-learning is an accurate user model. User modeling is an active research area in e-learning and personalization, especially when abstracting the user away from the problem, an abstraction that has, over the years, contributed to the design of more effective e-learning systems. Despite this improvement, the main focus in most systems over the past decade has been on models that are "good for all users" rather than for a specific user.
Our approach improves the representation of the user model while it is being built in content-based approaches, in the following steps. The first step is domain concept filtering, in which concepts and items of interest are compared to the domain ontology, using ontology-based semantic similarity, to select the items relevant to our learning domain. The second step is incorporating semantic content into the term vectors. We use word definitions and relations provided by WordNet to perform word sense disambiguation and employ domain-specific concepts as category labels for the semantically enhanced user models. The implicit information pertaining to user behavior is extracted from click stream data or web usage sessions captured within the web server logs. The method of representing the semantic user model was proposed in [4]. Our approach also updates the user model by analyzing the learner's history of query keywords. For a given keyword, we extract the words that are semantically related to it and add them to the user interest model as nodes, according to the semantic relationships in WordNet. The method of updating the user model was proposed in [5,6].

II. RELATED WORKS
In [7], the authors proposed an approach to adaptation using semantic web techniques with a reduced cost of user profile acquisition. Cost-effectiveness is achieved by the use of distributed hash tables (DHTs), which allow effective store and lookup operations. DHT operations have to be based on unique IDs, which can easily be transformed into keys by means of the hash function employed in the particular DHT implementation. Such an approach is acceptable for rule-based adaptation systems, which do not require information about the similarity among user profiles to make decisions.
A method was proposed in [8] for creating hierarchical user profiles using Wikipedia concepts as the vocabulary for describing user interests. The authors proposed a method for distinguishing informational and recreational interests in the profile from commercial interests. They also developed ways of mapping documents to Wikipedia concepts for the purpose of profile generation.
A framework for content-based retrieval was presented in [9], integrating a relevance feedback method with a word sense disambiguation (WSD) strategy based on WordNet for inducing semantic user profiles. The authors' hypothesis is that substituting words with synsets produces a more accurate document representation, which learning algorithms could then use to infer more accurate user profiles. These semantic profiles contain references to concepts defined in lexicons or ontologies.
In [10], the authors combine the ontology and the concept space and represent the feature items of the user profile with semantic concepts. They calculate the learner's level of interest in a topic from word frequencies using suitable calculation methods, and mine the concepts within the user's feedback files together with the relationships between those concepts. They combine the user's short-term and long-term interests to create a user profile model with a semantic concept hierarchy tree that captures the drift of the user profile, and they continually improve and complete the model through the related feedback mechanism.
The authors of [11,12] proposed an approach to personalized query expansion based on a semantic user model. They discussed the representation and construction of a user model that represents an individual user's interests through semantic mining of the user's resource searching process. The aim is to capture the semantic relationships between the user's interests, which are barely considered in traditional user models, and to satisfy the requirement of providing personalized service to users in e-learning systems.
A personalized search approach was described in [13] that represents the user profile as a weighted graph of semantically related concepts of a predefined ontology, namely the ODP (http://www.dmoz.org). The user profile is built by accumulating graph-based query profiles in the same search session. The authors also define a session boundary recognition mechanism that allows the appropriate user profile to be used to rerank the search results of queries allocated to the same search session.

III. THE PROPOSED APPROACH
A user model is an internal representation of the user's properties. Before a user model can be used, it has to be constructed. This process requires considerable effort to gather the required information and finally generate a model of the user. The effectiveness of a user profile depends on the information the system delivers to the user. If a large proportion of that information is irrelevant, the system becomes more of an annoyance than a help. The problem can also be seen from another point of view: if the system requires a large degree of customization, the user will no longer be willing to use it.
A user can be modeled depending on the content and the amount of information about the user that is stored in the user profile. Thus, the user profile is used to retrieve the information needed to build a model of the user. The behavior of an adaptive system varies according to the data from the user model and the user profile. Without knowing anything about the user, a system would perform in exactly the same way for all users [1]. The representation of the user model [14,15] is a necessary factor in building effective and accurate adaptive systems. Adaptive systems compare user profiles to some reference profiles or item characteristics in order to predict the user's model when considering items. The outcome of that process depends on the ability to accurately identify and represent the user's model.
The presented approach constructs a semantically enhanced user model that represents the user's interests from web-log data [16] (web usage logs). The goal of incorporating the semantic content of the web pages into the semantically enhanced user models is to address the high dimensionality problem and the semantic inadequacy of the Vector Space Model [17,18,19], on which the initial user model is based, and to map conceptually related terms. To enrich the user model while the user browses pages and navigates the web-based system, the user model must be updated. To update the user model, our approach analyzes the user's history of query keywords using WordNet.
To acquire user interests, we must extract the user's behavior and the addresses of visited pages from the web-log data. We then analyze the visited pages to acquire the terms in the pages that can be considered concepts in the user model. The extracted terms are represented with the Vector Space Model [17,18,19], which is adapted to our proposed system to achieve effective representations of documents: each document is identified by an n-dimensional feature vector in which each dimension corresponds to a distinct term. Each term in a given document vector has an associated weight.
The term vector serves as the initial term-based user model (IUM), which we intend to improve. To build a semantically enhanced user model (SUM), we use refined domain-specific concepts. First, we obtain a list of domain-specific concepts from the domain ontology. Then we perform term-to-concept mapping between the terms in the initial user model (IUM) and domain-related concepts, based on the concept hierarchies in WordNet. The final product is a semantically enhanced user model (SUM) in which terms are mapped to related high-level concepts.
The semantic user model (SUM) can be updated using user queries and WordNet. For a given keyword, we extract the words that are semantically related to it and add them to the learner interest model as nodes, according to the semantic relationships in WordNet.
The architecture of the proposed approach is shown in figure 1.

A. User's Web Log Analysis
Web usage mining [20], the process of discovering patterns from web data using data mining methods, strives to find learner preferences based on the web logs that reside on servers. A web log [16] records each transaction executed by the browser at each web access. Each line in the log represents a record with the IP address, the time and date of the visit, the accessed object and the referenced object. In such data, we follow the sequences in which individual pages are visited by the learner, who is, under certain conditions, identified by the IP address. In these sequences, we can look for patterns in the learner's behavior.
The data from web logs, in its raw form, is not suitable for the application of usage mining algorithms; it needs to be cleaned and preprocessed before log analysis can be performed. Data pre-processing is the process of cleaning and transforming raw data sets into a form suitable for web mining. The task of the data pre-processing module is therefore to obtain usable datasets from raw web log files, which in most cases contain a considerable amount of incomplete and irrelevant information.
The overall data preparation process [21,22] is briefly described in figure 2.
Data Cleaning: Remove accesses to irrelevant items (such as button images), accesses by Web crawlers (i.e. non-human accesses), and failed requests.
Learner Identification: Because web logs are recorded sequentially as requests arrive, the records for a specific learner are not necessarily consecutive; they may be separated by records from other learners. This step groups the records that belong to the same learner.
Session Identification: Divide the pages accessed by each learner into individual sessions. A session is a sequence of pages visited by a learner; we also call it a usage sequence.
Path Completion: Determine whether there are important accesses that are not recorded in the access log due to caching on several levels.
Formatting: Format the data so that it is readable by data mining systems.
Once the web logs are preprocessed, useful web usage patterns can be generated by applying data mining techniques. Table 1 shows a sample of web log data after preprocessing. The outputs of this step are the web-based learning materials that the learner explored and preferred, and the behavior pattern of the learner. The learner's behavior is used to acquire the knowledge requirements of learners based on the course ontology.
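The cleaning, learner identification and session identification steps above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the server writes the Apache Combined Log Format, identifies a learner by IP address only, treats any user agent containing "bot" as a crawler, and splits sessions at a 30-minute inactivity gap. The names `parse`, `clean`, `sessionize`, `IRRELEVANT` and `SESSION_GAP` are hypothetical, and path completion and formatting are left out.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed input: Apache Combined Log Format (referrer and agent are optional).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)
IRRELEVANT = (".gif", ".png", ".jpg", ".css", ".js", ".ico")  # assumed list
SESSION_GAP = timedelta(minutes=30)  # assumed inactivity threshold


def parse(line):
    """Turn one raw log line into a record dict, or None if it is malformed."""
    m = LOG_RE.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    return rec


def clean(records):
    """Data cleaning: drop malformed lines, failed requests, images, crawlers."""
    for r in records:
        if r is None or not r["status"].startswith("2"):
            continue
        if r["url"].lower().endswith(IRRELEVANT):
            continue
        if "bot" in (r["agent"] or "").lower():  # crude crawler heuristic
            continue
        yield r


def sessionize(records):
    """Learner and session identification: group by IP, split on long gaps."""
    by_learner = defaultdict(list)
    for r in records:
        by_learner[r["ip"]].append(r)
    sessions = []
    for ip, recs in by_learner.items():
        recs.sort(key=lambda r: r["time"])
        current = [recs[0]]
        for prev, cur in zip(recs, recs[1:]):
            if cur["time"] - prev["time"] > SESSION_GAP:
                sessions.append((ip, [r["url"] for r in current]))
                current = []
            current.append(cur)
        sessions.append((ip, [r["url"] for r in current]))
    return sessions
```

Calling `sessionize(clean(parse(line) for line in log_lines))` yields a list of `(ip, [url, ...])` usage sequences that later steps can analyze.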

B. Domain Ontology Development Based on a Knowledge Engineering Approach
Ontology engineering is a subfield of knowledge engineering that studies the methods and methodologies for building ontologies. It covers the ontology development process, the ontology life cycle, and the tool suites and languages that support them. The knowledge engineering field usually uses the IEEE 1074-2006 standard [23] as a reference. IEEE 1074-2006 is a standard for developing software project life cycle processes. It describes the software development process, the activities to be carried out, and the techniques that can be used for developing software.
A knowledge engineering approach to building a domain ontology was proposed in [24]. Figure 3 shows the main steps of the ontology development process.
Identifying the purpose and requirement specification means clearly identifying the ontology's purpose, scope and intended use, that is, the competence of the ontology. Ontology acquisition captures the domain concepts based on the ontology competence. The relevant domain entities (e.g. concepts, relations, slots, and roles) should be identified and organized into a hierarchical structure. This phase involves three steps: first, enumerate the important concepts and terms in the domain; second, define concepts, properties and [...] Ontology implementation aims to explicitly represent the captured conceptualization in a formal language. Evaluation/Check means that the ontology must be evaluated to check whether it satisfies the specification requirements. Documentation means that all of the ontology development must be documented, including purposes, requirements, textual descriptions of the conceptualization, and the formal ontology.
Our domain is a "programming languages" course. We use Hozo [25] as our ontology editor. Since Hozo is based on an ontological theory of role-concepts, it can distinguish concepts that depend on particular contexts from so-called basic concepts, which contributes to building reusable ontologies. A role-concept [24] represents a role that an object plays in a specific context and is defined in terms of other concepts. A basic concept, on the other hand, does not need other concepts in order to be defined. An entity of a basic concept that plays a role-concept is called a role-holder. Figure 4 shows part of our domain ontology, and the extracted OWL [26] is shown in figure 5.

C. User Model Acquiring
In the proposed system [6,22], the knowledge representation of the user interest model is based on the vector space model and the domain ontology. The method of acquiring the user's interests was presented in [6,22]. Figure 5 shows the steps to acquire user interest.

D. Document Representation
The Vector Space Model [27,28] is adapted in our proposed system to achieve effective representations of documents. Each document is identified by an n-dimensional feature vector in which each dimension corresponds to a distinct term. Each term in a given document vector has an associated weight. The document keywords are extracted using a term frequency-inverse document frequency (tf-idf) calculation [18,19], which is a well-established technique in information retrieval. The weight of term k in document j takes the standard tf-idf form:

$$w_{kj} = tf_{kj} \times \log\left(\frac{n}{df_k}\right)$$

where $tf_{kj}$ is the frequency of term k in document j, $df_k$ is the number of documents in which term k occurs, and n is the total number of documents in the collection. Table 2 shows the term frequency in different documents.
The main purpose of this step is to extract the items of interest in the web page and then obtain the term frequency, which reflects the importance of each term. Finally, the weights of the terms in the selected page are computed. The output of this step is the set of term weights for the selected page, which can be used to build the learner interest profile. Table 3 shows a sample of the weighted terms for the documents in table 2.
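As an illustration of this weighting step, the following sketch computes tf-idf weights for a small collection. It assumes the common form w_kj = tf_kj · log(n / df_k) with a natural logarithm; the exact variant used in the system may differ, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: dict mapping a document id to its list of (stemmed) terms.

    Returns {doc_id: {term: weight}} with weight = tf * log(n / df).
    """
    n = len(documents)
    # tf[j][k]: how often term k occurs in document j
    tf = {doc_id: Counter(terms) for doc_id, terms in documents.items()}
    # df[k]: number of documents in which term k occurs
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())
    return {
        doc_id: {k: counts[k] * math.log(n / df[k]) for k in counts}
        for doc_id, counts in tf.items()
    }
```

With this form, a term that occurs in every document receives weight zero, so only terms that discriminate between pages contribute to the learner interest profile.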

E. Domain Concept Filtering
This process discovers the concepts that represent the learner's interests. These concepts and items are compared to the domain ontology to identify the items relevant to the learner profile, and the most relevant ones update the learner profile. Item relevance is based on ontology-based semantic similarity: the items browsed by a learner on the web are compared to the items from the domain ontology and the learner profile. The importance of an item is combined with its semantic similarity to obtain a level of relevance. The page items are processed to identify domain-related words to be added to the learner profile. A bag of browsed items is obtained via simple word indexing of the pages visited by the learner. We filter out irrelevant words using the list of items extracted from the domain ontology. Once the domain-related items are identified, we evaluate their relevance to the learner's interests.
The method used in [29,30] was selected to compute the semantic similarity function (S) based on a domain ontology. The similarity is estimated for each pair of items in which one item is taken from the learner profile and the other from the set of browsed items:

$$S(a^p, b^q) = w_w \cdot S_w(a^p, b^q) + w_u \cdot S_u(a^p, b^q) + w_n \cdot S_n(a^p, b^q)$$

Here $S_w$ is the similarity between synonym sets, $S_u$ is the similarity between features, and $S_n$ is the similarity between the semantic neighborhoods of entity class a of ontology p and entity class b of ontology q; $w_w$, $w_u$ and $w_n$ are the respective weights of the similarity of each specification component. The weights assigned to $S_w$, $S_u$ and $S_n$ depend on the characteristics of the ontologies.

The similarity measures are defined in terms of a matching process [29,30]:

$$S(a, b) = \frac{|A \cap B|}{|A \cap B| + \alpha(a, b)\,|A / B| + (1 - \alpha(a, b))\,|B / A|}$$

where A and B are the description sets of classes a and b, i.e. synonym sets, sets of distinguishing features, and sets of classes in the semantic neighborhood; $A \cap B$ and $A / B$ denote intersection and difference respectively; $|\cdot|$ is the cardinality of a set; and α is a function that defines the relative importance of the non-common characteristics. The browsed items that are similar to items from the learner profile form the set of items that can be added to this profile.
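The matching process described above can be sketched as a weighted sum of three set-overlap ratios. This is an illustrative reading only: it assumes the ratio |A∩B| / (|A∩B| + α|A/B| + (1−α)|B/A|), treats α as a constant rather than a function of the two classes, and uses hypothetical weights and a hypothetical dictionary layout (`synonyms`, `features`, `neighbors`) for each item.

```python
def match(A, B, alpha=0.5):
    """Set-overlap ratio; alpha weights the items found only in A."""
    A, B = set(A), set(B)
    common = len(A & B)
    denom = common + alpha * len(A - B) + (1 - alpha) * len(B - A)
    return common / denom if denom else 0.0


def similarity(a, b, weights=(0.4, 0.3, 0.3), alpha=0.5):
    """Weighted sum of synonym-set, feature and neighborhood similarity.

    a, b: dicts with the keys "synonyms", "features" and "neighbors".
    weights: (w_w, w_u, w_n); the values here are placeholders.
    """
    w_w, w_u, w_n = weights
    return (
        w_w * match(a["synonyms"], b["synonyms"], alpha)
        + w_u * match(a["features"], b["features"], alpha)
        + w_n * match(a["neighbors"], b["neighbors"], alpha)
    )
```

A browsed item would then be added to the learner profile when its similarity to some profile item exceeds a chosen threshold; the threshold itself is not specified in the text.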

IV. BUILDING SEMANTIC USER MODEL USING CONCEPT MAPPING
To overcome the weaknesses of term-based representations, an ontology-based representation [33,34] using WordNet is used. Moreover, by defining an ontology base, that is, a set of independent concepts that covers the whole ontology, an ontology-based representation allows the system to use fixed-size document vectors consisting of one component per base concept.
We present a method based on WordNet [35] that improves the traditional vector space model. WordNet is an ontology of cross-lexical references whose design was inspired by current theories of human linguistic memory. English nouns, verbs, adjectives, and adverbs are organized in sets of synonyms (synsets), each representing an underlying lexical concept. The basic semantic relation between words in WordNet is synonymy [36]: a synset (synonym set) represents a set of words that are interchangeable in a specific context. Synsets are linked by relations such as hypernym/hyponym (generic/specific, is-a) and meronym/holonym (part-whole). WordNet [36] consists of over 115,000 concepts (synsets in WordNet) and about 150,000 lexical entries (words in WordNet). This representation requires two more stages: a) the "mapping" of terms into concepts and the choice of the "merging" strategy, and b) the application of a disambiguation strategy.
The purpose of this step is to identify the WordNet concepts that correspond to document words [31]. Concept identification is based on the overlap between the local context of the analyzed word and every corresponding WordNet entry. The entry that maximizes the overlap is selected as a possible sense of the analyzed word. The concept identification architecture for the terms in the initial user model is given in figure 6. We use WordNet categories [32] to map all the stemmed words in all documents into their lexical categories. For example, the words "dog" and "cat" both belong to the same category, "noun.animal". Some words have multiple categories: the word "Washington" has three categories (noun.location, noun.group, noun.person), because it can be the name of the American president, the name of the city, or a group in the sense of a capital. Word disambiguation techniques, namely disambiguation by context and by concept map, are used to remove the noise added by mapping to multiple categories.
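The overlap-based selection of a sense can be sketched as follows. This is a simplified, Lesk-style illustration rather than the system's actual procedure: the candidate senses are passed in as a plain dictionary of glosses, and the sense identifiers and gloss texts in the usage example are invented for the example, not taken from WordNet.

```python
def identify_concept(word, context, senses):
    """Pick the sense whose gloss shares the most words with the context.

    word:    the analyzed word
    context: list of words surrounding it (its local context)
    senses:  dict mapping a sense id to its gloss text (hypothetical layout)
    """
    context_words = {w.lower() for w in context if w.lower() != word.lower()}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in senses.items():
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:  # ties keep the earlier-listed sense
            best_sense, best_overlap = sense_id, overlap
    return best_sense


# Usage with invented glosses for two senses of "list":
senses = {
    "list#data": "a database containing an ordered array of items",
    "list#tilt": "lean or tilt to one side of a ship",
}
context = ["the", "program", "stores", "items", "in", "an", "ordered", "list"]
chosen = identify_concept("list", context, senses)
```

Because ties keep the earlier-listed sense, passing the senses in WordNet's frequency order would make the most common sense the default when the context gives no evidence.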

A. The Weight of Concept Computation
The concepts in documents are identified as sets of terms that have identity or synonym relationships, i.e., synsets in the WordNet ontology. The concept frequencies are then calculated from the term frequencies as follows:

$$cf_c = \sum_{t \in r(c)} tf_t$$

where r(c) is the set of different terms that belong to concept c. Note that WordNet returns an ordered list of synsets for a term. The ordering is supposed to reflect how commonly the term is related to the concept in standard English: more common term meanings are listed before less common ones. The authors in [33,34] have shown that using the first synset as the identified concept for a term improves clustering performance more than using all the synsets to calculate concept frequencies.
Hypernyms of concepts can represent such concepts up to a certain level of generality. The concept frequencies are updated as follows:

$$hf_c = \sum_{b \in H(c, r)} cf_b$$

where $H(c, r)$ is the set of all concepts within r levels of hypernym concepts of c.
In WordNet, $H(c, r)$ is obtained by gathering all the synsets that are hypernym concepts of synset c within r levels. In particular, $H(c, \infty)$ returns all the hypernym concepts of c and $H(c, 0)$ returns just c.
The weight of each concept c in document d is computed as follows:

$$w_{dc} = hf_c \times idf_c$$

where $idf_c$ is the inverse document frequency of concept c, obtained by counting the documents in which concept c appears, in the same way as for the weight of each term t in document d.
Table 4 shows the concept weights obtained after mapping the items in table 3 to concepts.
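The concept weighting of this subsection can be sketched end to end. The sketch makes several assumptions: term-to-concept and hypernym lookups come from small hand-made tables (`TERM_TO_CONCEPT`, `HYPERNYM`, both hypothetical) rather than from WordNet; H(c, r) is read literally as c together with its hypernyms up to r levels above it; and the inverse document frequency is taken as log(n / df_c).

```python
import math
from collections import Counter

# Hypothetical stand-ins for WordNet lookups (first-synset mapping only).
TERM_TO_CONCEPT = {
    "loop": "loop",
    "iteration": "loop",
    "recursion": "recursion",
    "control": "control_structure",
}
HYPERNYM = {"loop": "control_structure", "recursion": "control_structure"}


def hypernyms_within(c, r):
    """H(c, r): concept c plus its hypernyms up to r levels above it."""
    out, cur, level = [c], c, 0
    while level < r and cur in HYPERNYM:
        cur = HYPERNYM[cur]
        out.append(cur)
        level += 1
    return out


def concept_weights(documents, r=1):
    """documents: dict doc_id -> list of terms. Returns {doc_id: {concept: w}}."""
    n = len(documents)
    hf = {}
    for doc_id, terms in documents.items():
        # cf_c: sum of the frequencies of the terms that belong to concept c
        cf = Counter(TERM_TO_CONCEPT[t] for t in terms if t in TERM_TO_CONCEPT)
        # hf_c: cf summed over H(c, r); a Counter returns 0 for absent concepts
        hf[doc_id] = {c: sum(cf[b] for b in hypernyms_within(c, r)) for c in cf}
    # number of documents in which each concept appears
    df = Counter(c for freqs in hf.values() for c in freqs)
    return {
        doc_id: {c: f * math.log(n / df[c]) for c, f in freqs.items()}
        for doc_id, freqs in hf.items()
    }
```

Setting `r=0` reduces the computation to plain concept frequencies, which makes it easy to compare the effect of including hypernyms.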

V. UPDATE USER MODEL USING WORDNET
While the user is working through the web-based learning system, the user's interests change quite often, and users are reluctant to specify all adjustments and modifications of their intents and interests. Therefore, techniques that use implicit approaches to gather information about users are highly desirable for updating user interests, which are often not fixed.
In order to update the user interest model [6,37], we first analyze the user's history of query keywords. For a given keyword, we extract the words that are semantically related to it and add them to the user interest model as nodes, according to the semantic relationships in WordNet.
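A minimal sketch of such a score-based update is given here, using the relation scores stated in the steps of this section (five for the keyword itself, four for synonyms, two for hyponyms or hypernyms, one for meronyms or holonyms). Several details are assumptions rather than the authors' specification: the attenuation factor β is applied by multiplying a node's existing score before the new score is added, the related words are supplied by the caller instead of being looked up in WordNet, every edge is stored as a `(keyword, word, relation)` tuple, and pruning keeps the `max_nodes` highest-scoring nodes.

```python
KEYWORD_SCORE = 5
RELATION_SCORES = {
    "synonym": 4,
    "hyponym": 2,
    "hypernym": 2,
    "meronym": 1,
    "holonym": 1,
}


def bump(model, word, score, beta):
    """Add `score` to a node; an existing score is first attenuated by beta."""
    nodes = model["nodes"]
    if word in nodes:
        nodes[word] = beta * nodes[word] + score  # assumed reading of beta
    else:
        nodes[word] = score


def update_model(model, keyword, related, beta=0.9, max_nodes=50):
    """model: {"nodes": {word: score}, "edges": set of (keyword, word, relation)}.

    related: dict mapping a relation name to the words that WordNet would
    return for `keyword` (passed in by the caller in this sketch).
    """
    bump(model, keyword, KEYWORD_SCORE, beta)
    for relation, words in related.items():
        for word in words:
            bump(model, word, RELATION_SCORES[relation], beta)
            model["edges"].add((keyword, word, relation))
    # Keep the model small: retain only the highest-scoring nodes.
    if len(model["nodes"]) > max_nodes:
        nodes = model["nodes"]
        keep = sorted(nodes, key=nodes.get, reverse=True)[:max_nodes]
        model["nodes"] = {w: nodes[w] for w in keep}
        model["edges"] = {
            e for e in model["edges"]
            if e[0] in model["nodes"] and e[1] in model["nodes"]
        }
    return model
```

Starting from `{"nodes": {}, "edges": set()}`, repeated calls with the same keyword raise its score and that of its related words, while words that are never queried again are eventually pruned.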
As new words are constantly added, the user is assumed to be most interested in the kinds of words with a higher score, which stand for some type of knowledge. We must therefore update the user model whenever the user enters new keywords. An incremental updating strategy is used: related words are given different scores according to their relation to the keyword, reflecting the importance of the different words and hence how interesting they are to the user. As a result, words that occur more frequently have a higher score. Because the history keywords are ordered, keywords queried later carry more meaning than keywords queried earlier, so a factor of attenuation β is applied when increasing the score. Because keywords are added constantly and the user model keeps growing, some old nodes must be removed in order to reduce the size of the user interest model. The main steps of this method can be described as follows:
1) If a new keyword is found in the original user model, we increase the score of the related node directly; that is, the node is given a score of five after multiplying by the factor of attenuation β. If it is not found, we create a new word node and give it a score of five.
2) Find the following three relations between the new keyword and the words already in the user model, based on WordNet:
a) Synonym relations: obtain the synonym set and insert every synonym into the original user model in turn. If the synonym is found in the original user model, we increase the score of the related node directly; that is, the node is given a score of four after multiplying by the factor of attenuation β. Otherwise, we create a new word node with a score of four and add a new undirected edge labeled with the synonym relation.
b) Hyponym or hypernym relations: obtain the hyponym or hypernym set and insert every word into the original user model in turn. If the word is found in the original user model, we increase the score of the related node directly; that is, the node is given a score of two after multiplying by the factor of attenuation β. Otherwise, we create a new word node with a score of two and add a new directed edge labeled with the hyponym or hypernym relation.
c) Meronym or holonym relations: obtain the meronym or holonym set and insert every word into the original user model in turn. If the word is found in the original user model, we increase the score of the related node directly; that is, the node is given a score of one after multiplying by the factor of attenuation β. Otherwise, we create a new word node with a score of one and add a new directed edge labeled with the meronym or holonym relation.
d) In order to reduce the size of the user interest model, the nodes with the lowest scores must be removed after some time.

VI. CONCLUSION
In this paper we have presented a novel approach to conceptual document indexing. Our contribution concerns two main aspects. The first is a concept-based representation of the items of the initial user model, built using WordNet. The approach itself is not new, but we propose new techniques to identify concepts and to weight them. The second, in addition to the semantic representation used to build the semantic user model, is an approach to updating the user model using WordNet.

Figure 5: Steps to acquire learner interest
The weight is a function of the term frequency, collection frequency and normalization factors. Different weighting approaches may be applied by varying this function. Hence, a document j is represented by the document vector $d_j = (w_{1j}, w_{2j}, \ldots, w_{nj})$, where $w_{kj}$ is the weight of the k-th term in document j. The term frequency reflects the importance of term k within a particular document j. The weighting factor may be global or local. A global weighting factor reflects the importance of a term k within the entire collection of documents, whereas a local weighting factor considers the given document only.

Figure 6: Semantic User Model using Concept Mapping

TABLE III: THE TERM WEIGHTS IN DIFFERENT DOCUMENTS

Programming Language Programming of Lists Functions and Foundations Programming of Recursion Procedures Types Memory Management and Control Object Oriented Programming Structured Programming Concurrency and Logic Programming Distributed Programming Logic Programming ML Programming Language
Concept Weights