A Knowledge-based Topic Modeling Approach for Automatic Topic Labeling

Probabilistic topic models, which aim to discover latent topics in text corpora, define each document as a multinomial distribution over topics and each topic as a multinomial distribution over words. Although humans can infer a proper label for each topic by looking at its top representative words, machines cannot. Automatic topic labeling techniques try to address this problem; their ultimate goal is to assign interpretable labels to the learned topics. In this paper, we take ontology concepts into consideration, instead of words alone, to improve the quality of the labels generated for each topic. Our work differs from previous efforts in this area, where topics are usually represented by a batch of words selected from the topics. The main aspects of our approach are: (1) we incorporate ontology concepts with statistical topic modeling in a unified framework, where each topic is a multinomial probability distribution over concepts and each concept is represented as a distribution over words; and (2) we propose a topic labeling model based on the meaning of the ontology concepts included in the learned topics. The best topic labels are selected with respect to the semantic similarity of the concepts and their ontological categorizations. We demonstrate the effectiveness of considering ontological concepts as a richer layer between topics and words through comprehensive experiments on two different data sets. In other words, representing topics via ontological concepts is an effective way to generate descriptive and representative labels for the discovered topics.
Keywords: Topic modeling; Topic labeling; Statistical learning; Ontologies; Linked Open Data


I. INTRODUCTION
Recently, probabilistic topic models such as Latent Dirichlet Allocation (LDA) [7] have been getting considerable attention. A wide variety of text mining approaches, such as sentiment analysis [26], [3], word sense disambiguation [21], [9], information retrieval [50], [46], summarization [4], and others, have successfully utilized LDA to uncover latent topics from text documents. In general, topic models assume that documents are made up of topics, whereas topics are multinomial distributions over words. This means that the topic proportions of a document can be used as descriptive themes that give a high-level representation of the document's semantics. Additionally, the top words in a topic-word distribution illustrate the sense of the topic. Therefore, topic models can be applied as a powerful technique for discovering the latent semantics of unstructured text collections. Table I, for example, illustrates the role of topic labeling in generating a representative label based on the words with the highest probabilities in a topic discovered from a corpus of news articles; a human assessor has labeled the topic "United States Politics".
Although the top words of every topic are usually related and descriptive in themselves, interpreting a topic based on the distribution of words derived from the text collection is a challenging task for users, and it becomes harder when they do not have good knowledge of the domain of the documents. Usually, it is not easy to answer questions such as "What is a topic describing?" and "What is a representative label for a topic?" Topic labeling, in general, aims to find one or a few descriptive phrases that can represent the meaning of the topic. It becomes more critical when hundreds of topics each need a proper label.
The aim of this research is to automatically generate good labels for the topics. But what makes a label good for a topic? We assume that a good label: (1) should be semantically relevant to the topic; (2) should be understandable to the user; and (3) should cover the meaning of the topic to a high degree. For instance, "relational databases", "databases" and "database systems" are a few good labels for the example topic illustrated in Table I. With the advent of the Semantic Web, a tremendous amount of data has been published in the form of ontologies and inter-linked data sets such as Linked Open Data (LOD). Linked Open Data provides rich knowledge in multiple domains, which is a valuable asset when used in combination with various analyses based on unsupervised topic models, in particular for topic labeling. For instance, DBpedia [6] (part of LOD) is one of the most prominent knowledge bases; it is extracted from Wikipedia in the form of an ontology consisting of a set of concepts and their relationships. DBpedia, which is freely available, makes this extensive quantity of information programmatically obtainable on the Web for human and machine consumption.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 8, No. 9, 2017
The principal objective of the research presented here is to leverage the semantic knowledge graph of concepts in an ontology (DBpedia in this paper), together with their diverse relationships, and integrate it into probabilistic topic models (i.e., LDA). In the proposed model, we define another latent (i.e., hidden) variable, called concept (i.e., ontological concept), between topics and words. Thus, each document is a mixture of topics, each topic is made up of concepts, and each concept is a probability distribution over the vocabulary.
Defining concepts as an extra latent variable (i.e., representing topics over concepts instead of words) is advantageous in several ways: (1) it describes topics in a more extensive way; (2) it allows more specific topics to be defined according to ontological concepts, which can eventually be used to generate labels for topics; and (3) it automatically links topics learned from the corpus with knowledge bases. We first presented our Knowledge-based topic model, KB-LDA, in [1], where we showed that incorporating ontological concepts into topic models improves the quality of topic labeling. In this paper, we elaborate on and extend these results. We also extensively explore the theoretical foundation of our Knowledge-based framework and demonstrate the effectiveness of the proposed model on two datasets.
Our contributions in this work are as follows:
1) At a very high level, we propose a Knowledge-based topic model, namely KB-LDA, which integrates an ontology as a knowledge base into statistical topic models in a principled way. Our model links the topics to external knowledge bases, which can benefit other research areas such as classification, information retrieval, semantic search and visualization.
2) We define a labeling approach for topics that considers the semantics of the concepts included in the learned topics, in addition to the existing ontological relationships between the concepts of the ontology. The proposed model enhances the accuracy of the labels by applying the topic-concept associations. Additionally, it automatically generates labels that are descriptive for explaining and understanding the topics.
3) We demonstrate the usefulness of our approach in two ways. Firstly, we demonstrate how our model connects text documents to concepts of the ontology and their categories. Secondly, we show automatic topic labeling by performing multiple experiments.
The organization of the paper is as follows. In section 2, we formally define our model for labeling the topics by integrating the ontological concepts with probabilistic topic models. We present our method for concept-based topic labeling in section 3. In section 4, we demonstrate the effectiveness of our method on two different datasets. Finally, we present our conclusions and future work in section 5.

II. BACKGROUND
In this section, we formally describe some of the related concepts and notations that will be used throughout this paper.

A. Ontologies
Ontologies are fundamental elements of the Semantic Web and can be thought of as knowledge representation methods, which are used to specify the knowledge shared among different systems. An ontology is defined as an "explicit specification of a conceptualization" [16]. In other words, an ontology is a structure consisting of a set of concepts and a set of relationships existing among them.
Recently, the topic modeling approach has become a popular method for uncovering the hidden themes from data such as text corpora, images, etc. This model has been widely used for various text mining tasks, such as machine translation, word embedding, automatic topic labeling, and many others. In the topic modeling approach, each document is considered as a mixture of topics, where a topic is a probability distribution over words. When the topic distributions of documents are estimated, they can be considered as the high-level semantic themes of the documents.

B. Probabilistic Topic Models
Probabilistic topic models are a set of algorithms that have become a popular method for uncovering the hidden themes in data such as text corpora, images, etc. These models have been extensively used for various text mining tasks, such as machine translation, word embedding, automatic topic labeling, and many others. The key idea behind topic modeling is to create a probabilistic model for a collection of text documents. In topic models, documents are probability distributions over topics, where a topic is represented as a multinomial distribution over words. The two primary topic models are Probabilistic Latent Semantic Analysis (pLSA), proposed by Hofmann in 1999 [18], and Latent Dirichlet Allocation (LDA) [7]. Since the pLSA model does not provide a probabilistic model at the document level, generalizing it to new, unseen documents is difficult. Blei et al. [7] extended the pLSA model by placing a Dirichlet prior on the topic mixture weights of each document, and named the resulting model Latent Dirichlet Allocation (LDA). In the following section, we illustrate the LDA model.
Latent Dirichlet Allocation (LDA) [7] is a probabilistic generative model for uncovering the themes, called topics, of a collection of documents. The basic assumption in the LDA model is that each document is a mixture of different topics and each topic is a multinomial probability distribution over all words in the corpus.
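For reference, this assumption corresponds to the standard LDA generative process and joint distribution given in [7], which can be written as below. The symbols $\theta_d$ (topic proportions of document $d$), $\phi_k$ (word distribution of topic $k$), $N_d$ (length of document $d$) and the hyperparameters $\alpha$, $\beta$ are notation assumed here for illustration and may differ from the notation used in Figure 1.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), \quad k = 1, \dots, K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), \quad d = 1, \dots, D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d), \quad n = 1, \dots, N_d \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Multinomial}(\phi_{z_{d,n}})
\end{align*}

\[
P(\mathbf{w}, \mathbf{z}, \theta, \phi \mid \alpha, \beta)
  = \prod_{k=1}^{K} P(\phi_k \mid \beta)
    \prod_{d=1}^{D} P(\theta_d \mid \alpha)
    \prod_{n=1}^{N_d} P(z_{d,n} \mid \theta_d)\, P(w_{d,n} \mid \phi_{z_{d,n}})
\]
```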
Let $D = \{d_1, d_2, \dots, d_{|D|}\}$ be the corpus and $V = \{w_1, w_2, \dots, w_{|V|}\}$ be the vocabulary of the collection. A topic $z_j$, $1 \le j \le K$, is described as a multinomial probability distribution over the $|V|$ words, $p(w_i \mid z_j)$, with $\sum_{i=1}^{|V|} p(w_i \mid z_j) = 1$. LDA produces the words in a two-step procedure in which (1) topics generate words and (2) documents generate topics. In other words, we can calculate the probability of a word given the document as:

$$p(w_i \mid d) = \sum_{j=1}^{K} p(w_i \mid z_j)\, p(z_j \mid d) \qquad (1)$$

Figure 1 shows the graphical model of LDA. The generative process for the document collection $D$ and the joint distribution of hidden and observed variables follow the standard LDA formulation [7].

In the LDA model, the word-topic distribution $p(w \mid z)$ and the topic-document distribution $p(z \mid d)$ are learned entirely in an unsupervised manner, without any prior knowledge about which words are related to the topics and which topics are related to individual documents. One of the most widely used approximate inference techniques is Gibbs sampling [15]. Gibbs sampling begins with a random assignment of words to topics; the algorithm then iterates over all the words in the training documents for a number of iterations (usually on the order of 100). In each iteration, it samples a new topic assignment for each word using the conditional distribution of that word given all other current word-topic assignments. After the iterations are finished, the algorithm reaches a steady state, and the word-topic probability distributions can be estimated from the word-topic assignments.
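The Gibbs sampling loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: it assumes symmetric priors `alpha` and `beta`, documents given as lists of integer word ids, and the function name `lda_gibbs` is hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA (sketch; symmetric priors assumed)."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # total tokens per topic

    # Start from a random assignment of every word token to a topic.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Conditional distribution of this token's topic given all others.
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Estimate the distributions from the final assignments.
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta) for w in range(vocab_size)]
           for k in range(n_topics)]
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
             for d, doc in enumerate(docs)]
    return theta, phi
```

Calling `lda_gibbs([[0, 1, 0, 1], [2, 3, 2, 3]], n_topics=2, vocab_size=4)` returns one row of topic proportions per document and one word distribution per topic; each row sums to one.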

III. MOTIVATING EXAMPLE
Suppose that we are given a collection of news articles and asked to extract the common themes present in this corpus. Manual inspection of the articles is the simplest approach, but it is not practical for large collections of documents. We can use topic models to solve this problem by assuming that a collection of text documents comprises a set of hidden themes, called topics. Each topic z is a multinomial distribution p(w|z) over the words w of the vocabulary. Similarly, each document is made up of these topics, which allows multiple topics to be present in the same document. We estimate both the topics and the document-topic mixtures from the data simultaneously. After we estimate the distribution of each document over topics, we can use these distributions as the semantic themes of the documents. The top words in each topic-word distribution describe that topic.
For example, Table II shows a sample of four topics with their top-10 words learned from a corpus of news articles. Although the topic-word distributions are usually meaningful, it is quite difficult for users to infer the exact meanings of the topics just from the top words, particularly when they do not have enough knowledge about the domain of the corpus. The standard LDA model does not automatically provide labels for the topics. Essentially, for each topic it gives a distribution over all the words of the vocabulary. A label is one or a few phrases that adequately describe the meaning of the topic. As shown in Table II, the topics do not have any labels, so labels must be assigned manually. Topic labeling can be laborious, specifically when the number of topics is substantial. Automatic topic labeling, which aims to automatically generate interpretable labels for the topics, has attracted increasing attention in recent years [49], [35], [32], [24], [22]. Unlike previous works, which have essentially concentrated on the topics discovered by the LDA topic model and represented the topics by words, we propose a Knowledge-based topic model, KB-LDA, where topics are labeled by ontological concepts.
We believe that the knowledge in the ontology can be integrated with topic models to automatically generate topic labels that are semantically relevant, understandable for humans and highly cover the discovered topics. In other words, our aim is to use the semantic knowledge graph of concepts in an ontology (e.g., DBpedia) and their diverse relationships with unsupervised probabilistic topic models (i.e., LDA) in a principled manner, and to exploit this information to automatically generate meaningful topic labels.
www.ijacsa.thesai.org

IV. RELATED WORK
Probabilistic topic modeling has been widely applied to various text mining tasks, such as text classification [17], [29], [44], word sense disambiguation [21], [9], sentiment analysis [26], [30], and others. A main challenge in such topic models is to interpret the semantics of each topic accurately.
Early research on topic labeling usually considers the top-n words, ranked by their marginal probability $p(w_i \mid z_j)$ in the topic, as the primitive labels [7], [15]. This option is not satisfactory, because it demands significant effort to interpret the topic, particularly if the user is not knowledgeable about the topic domain. For example, it would be very hard to infer the meaning of the topic shown in Table I based only on the top terms if someone is not knowledgeable about the "database" domain. The other conventional approach to topic labeling is to generate topic labels manually [34], [48]. This approach has two disadvantages: (a) the labels are prone to subjectivity; and (b) the method cannot be scaled up, especially when coping with a massive number of topics.
Recently, automatic topic labeling has been getting more attention as an area of active research. Wang et al. [49] utilized n-grams to represent topics, so the label of a topic was its top n-grams. Mei et al. [35] introduced a method to automatically label topics by transforming the labeling problem into an optimization problem. First, they generate candidate labels by extracting either bigrams or noun chunks from the collection of documents. Then, they rank the candidate labels based on Kullback-Leibler (KL) divergence with a given topic, and choose the candidate label that has the highest mutual information and the lowest KL divergence with the topic to label it. The work in [32] introduced an algorithm for topic labeling based on a given topic hierarchy. Given a topic, the authors generate a set of label candidates using the Google Directory hierarchy and come up with the best matched label according to a set of similarity measures.
Lau et al. [25] introduced a method for topic labeling that selects the best topic word as its label based on a number of features. They assume that the topic terms are representative enough and appropriate to be considered as labels, which is not always the case. Lau et al. [24] reused the features proposed in [25] and also extended the set of candidate labels by exploiting Wikipedia. For each topic, they first select the top terms and query Wikipedia to find the top article titles containing these terms according to the features, and consider them as extra candidate labels. Then they rank the candidates to find the best label for the topic.
Mao et al. [33] used the sibling and parent-child relations between topics to enhance topic labeling. They first generate a set of candidate labels for a topic by extracting meaningful phrases using Ngram Testing [13] and adding the top topic terms to the set based on marginal term probabilities. They then rank the candidate labels by exploiting the hierarchical structure between topics and pick the best candidate as the label of the topic.
In a more recent work, Hulpus et al. [22] proposed an automatic topic labeling approach that exploits structured data from DBpedia. Given a topic, they first find the terms with the highest marginal probabilities, and then determine a set of DBpedia concepts where each concept represents the identified sense of one of the top terms of the topic. After that, they create a graph out of the concepts and use graph centrality algorithms to identify the most representative concepts for the topic.
The proposed model differs from all prior works in that we introduce a topic model that integrates knowledge with data-driven topics within a single general framework. Prior works primarily focus on the topics discovered by the LDA topic model, whereas in our model we introduce another random variable, namely concept, between topics and words. In this case, each document is made up of topics, where each topic is defined as a probability distribution over concepts and each concept has a multinomial distribution over the vocabulary.
Hierarchical topic models, which consider the correlations among topics, are conceptually similar to our KB-LDA model. Mimno et al. [36] proposed the hPAM approach and defined the terms super-topic and sub-topic. In their model, a document is considered as a mixture of distributions over super-topics and sub-topics, using a directed acyclic graph to represent a topic hierarchy. Our KB-LDA model is different, because in hPAM the distribution of each super-topic over sub-topics depends on the document, whereas in KB-LDA the distributions of topics over concepts are independent of the corpus and are based on an ontology. The other difference is that sub-topics in the hPAM model are still unigram words, whereas in KB-LDA ontological concepts are n-grams, which makes them more specific and more representative, a key point in KB-LDA. The works in [11], [12] proposed topic models that integrate concepts with topics. The key idea in their frameworks is that topics of the topic models and ontological concepts are both represented by a set of "focused" words, i.e., distributions over words, and this similarity is utilized in their models. However, our KB-LDA model differs from these models in that they treat concepts and topics in the same way, whereas in KB-LDA topics and concepts form two separate levels in the model.

V. PROBLEM FORMULATION
In this section, we formally describe our model and its learning process. We then explain how to leverage the topic-concept distribution to generate meaningful semantic labels for each topic, in section 4. The notation used in this paper is summarized in Table V.

The intuitive idea behind our model is that using words from the vocabulary of the document corpus to represent topics is not a good way to understand the topics. Words usually describe topics more broadly than ontological concepts, which can describe the topics in a more specific manner. In addition, the concepts representing a topic are closely related and have higher semantic relatedness to each other. For instance, the first column of Table IV shows the top words of a topic learned by traditional LDA, whereas the second column represents the same topic through its top ontological concepts learned by the KB-LDA model. We can determine that the topic is about "sports" from the word representation of the topic, but the concept representation reveals that the topic is not only about "sports", but more precisely about "American sports".

Let $\mathcal{C} = \{c_1, c_2, \dots, c_{|\mathcal{C}|}\}$ be the set of concepts from DBpedia, and $D = \{d_i\}_{i=1}^{|D|}$ be a text corpus. We describe a document $d$ in the collection $D$ as a bag of words, i.e., $d = \{w_1, w_2, \dots, w_V\}$, where $V$ is the size of the vocabulary.

Definition 1. (Concept): A concept in a text collection $D$ is denoted by $c$ and defined as a multinomial probability distribution over the vocabulary $V$, i.e., $\{p(w \mid c)\}_{w \in V}$. Clearly, we have $\sum_{w \in V} p(w \mid c) = 1$. We assume that there are $|C|$ concepts in $D$, where $C \subset \mathcal{C}$.

A. The KB-LDA Topic Model
The KB-LDA topic model is based on combining topic models with ontological concepts in a single framework. In this case, topics and concepts are distributions over concepts and words in the corpus, respectively.
The KB-LDA topic model is shown in Figure 2, and the generative process of the approach is defined in Algorithm 1. The joint distribution of the model is $P(\mathbf{w}, \mathbf{c}, \mathbf{z} \mid \alpha, \beta, \gamma)$.
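Algorithm 1 is not reproduced in this section, so the following is only a sketch of what a generative process with a concept layer between topics and words could look like. It assumes that $\alpha$, $\beta$ and $\gamma$ are symmetric Dirichlet priors on the document-topic, topic-concept and concept-word distributions respectively; the function names `sample_dirichlet` and `generate_corpus` are hypothetical.

```python
import random


def sample_dirichlet(rng, dim, concentration):
    """Draw from a symmetric Dirichlet by normalizing Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(n_docs, doc_len, n_topics, n_concepts, vocab_size,
                    alpha=0.5, beta=0.5, gamma=0.5, seed=0):
    """Generate (topic, concept, word) triples: document -> topic -> concept -> word."""
    rng = random.Random(seed)
    # Each concept is a distribution over words (assumed prior: gamma).
    zeta = [sample_dirichlet(rng, vocab_size, gamma) for _ in range(n_concepts)]
    # Each topic is a distribution over concepts (assumed prior: beta).
    phi = [sample_dirichlet(rng, n_concepts, beta) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        # Each document is a mixture of topics (assumed prior: alpha).
        theta = sample_dirichlet(rng, n_topics, alpha)
        doc = []
        for _ in range(doc_len):
            z = rng.choices(range(n_topics), weights=theta)[0]
            c = rng.choices(range(n_concepts), weights=phi[z])[0]
            w = rng.choices(range(vocab_size), weights=zeta[c])[0]
            doc.append((z, c, w))
        corpus.append(doc)
    return corpus
```

In a real corpus only the words are observed; the topic and concept of each token are the latent variables that inference has to recover.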

B. Inference using Gibbs Sampling
Since exact posterior inference for KB-LDA is intractable, we require an algorithm to approximate the posterior of the model. Different algorithms have been applied to estimate the parameters of topic models, such as variational EM [7] and Gibbs sampling [15]. In the current study, we use a collapsed Gibbs sampling procedure for the KB-LDA topic model. Collapsed Gibbs sampling [15] is a Markov Chain Monte Carlo (MCMC) [42] algorithm, which builds a Markov chain over the latent variables in the model and converges to the posterior distribution after a number of iterations. In this paper, our goal is to construct a Markov chain that converges to the posterior distribution over z and c conditioned on the observed words w and the hyperparameters α, β and γ. We use blocked Gibbs sampling to jointly sample z and c, although we could alternatively perform hierarchical sampling, i.e., first sample z and then sample c. Nonetheless, as Rosen-Zvi [43] argues, when latent variables are strongly related, blocked sampling speeds up convergence of the Markov chain and also reduces autocorrelation.
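The posterior inference is derived from Eq. 3 as follows:

$$P(\mathbf{z}, \mathbf{c} \mid \mathbf{w}, \alpha, \beta, \gamma) = \frac{P(\mathbf{z}, \mathbf{c}, \mathbf{w} \mid \alpha, \beta, \gamma)}{P(\mathbf{w} \mid \alpha, \beta, \gamma)} \propto P(\mathbf{z}, \mathbf{c}, \mathbf{w} \mid \alpha, \beta, \gamma) = P(\mathbf{z})\, P(\mathbf{c} \mid \mathbf{z})\, P(\mathbf{w} \mid \mathbf{c})$$

where $P(\mathbf{z})$ is the probability of the joint topic assignments $\mathbf{z}$ to all the words $\mathbf{w}$ in corpus $D$, $P(\mathbf{c} \mid \mathbf{z})$ is the conditional probability of the joint concept assignments $\mathbf{c}$ to all the words $\mathbf{w}$ in corpus $D$ given all topic assignments $\mathbf{z}$, and $P(\mathbf{w} \mid \mathbf{c})$ is the conditional probability of all the words $\mathbf{w}$ in corpus $D$ given all concept assignments $\mathbf{c}$.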
For a word token $w$ at position $i$, the full conditional distribution is expressed in terms of assignment counts, where $n^{(c)}_w$ is the number of times word $w$ is assigned to concept $c$.

In most probabilistic topic models, the Dirichlet parameters $\alpha$ are assumed to be given and fixed, which still produces reasonable results. However, as described in [47], an asymmetric Dirichlet prior $\alpha$ has substantial advantages over a symmetric prior, so we learn these parameters in our proposed model. We could use maximum likelihood or maximum a posteriori estimation to learn $\alpha$. However, there is no closed-form solution for these methods, so for the sake of simplicity and speed we use moment matching [38] to approximate the parameters of $\alpha$. In each iteration of Gibbs sampling, we update $\alpha$: for each document $d$ and topic $k$, we first compute the sample mean $\mathrm{mean}_{dk}$ and sample variance $\mathrm{var}_{dk}$. Here $N$ is the number of documents and $n^{(d)}$ is the number of words in document $d$.
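The update formulas are not reproduced in this section. As a rough illustration of what a moment-matching update of $\alpha$ can look like, the sketch below applies a widely used moment-matching estimator for Dirichlet parameters to the per-document topic proportions. It is an assumption that this resembles the update used here; the function name `moment_match_alpha` is hypothetical, every document is assumed to be non-empty, and the precision is averaged over the components where the estimate is defined.

```python
import math


def moment_match_alpha(doc_topic_counts):
    """Estimate asymmetric Dirichlet parameters from per-document topic counts."""
    n_docs = len(doc_topic_counts)
    n_topics = len(doc_topic_counts[0])
    # Topic proportions of each document: n_k^(d) / n^(d).
    props = [[count / sum(row) for count in row] for row in doc_topic_counts]
    mean = [sum(p[k] for p in props) / n_docs for k in range(n_topics)]
    var = [sum((p[k] - mean[k]) ** 2 for p in props) / n_docs for k in range(n_topics)]

    # Per-component precision estimates, combined through a geometric mean
    # over the first K-1 components where the estimate is well defined.
    logs = []
    for k in range(n_topics - 1):
        if var[k] > 0:
            ratio = mean[k] * (1 - mean[k]) / var[k] - 1
            if ratio > 0:
                logs.append(math.log(ratio))
    if not logs:
        raise ValueError("topic proportions do not vary enough to estimate alpha")
    precision = math.exp(sum(logs) / len(logs))
    return [m * precision for m in mean]
```

For example, `moment_match_alpha([[8, 2], [6, 4], [7, 3]])` returns two parameters whose ratio follows the average topic proportions (0.7 and 0.3) and whose sum reflects how tightly the documents cluster around that average.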
Algorithm 2 shows the Gibbs sampling process for our KB-LDA model.
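Since Algorithm 2 is not reproduced in this section, the following sketch shows one way a blocked Gibbs sampler that jointly samples a (topic, concept) pair for every token could be written. It is a simplification under stated assumptions: the priors `alpha`, `beta` and `gamma` are symmetric and fixed, every concept may emit every word (no ontology constraint), and the function name `kb_lda_gibbs` is hypothetical.

```python
import random


def kb_lda_gibbs(docs, n_topics, n_concepts, vocab_size,
                 alpha=0.1, beta=0.1, gamma=0.01, n_iters=50, seed=0):
    """Blocked Gibbs sampling over joint (topic, concept) assignments (sketch)."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]                 # topics per document
    n_kc = [[0] * n_concepts for _ in range(n_topics)]    # concepts per topic
    n_k = [0] * n_topics
    n_cw = [[0] * vocab_size for _ in range(n_concepts)]  # words per concept
    n_c = [0] * n_concepts
    pairs = [(k, c) for k in range(n_topics) for c in range(n_concepts)]

    def update(d, w, k, c, delta):
        n_dk[d][k] += delta
        n_kc[k][c] += delta
        n_k[k] += delta
        n_cw[c][w] += delta
        n_c[c] += delta

    # Random initial (topic, concept) pair for every token.
    assign = []
    for d, doc in enumerate(docs):
        assign.append([rng.choice(pairs) for _ in doc])
        for w, (k, c) in zip(doc, assign[d]):
            update(d, w, k, c, +1)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, c = assign[d][i]
                update(d, w, k, c, -1)
                # Joint conditional over all (topic, concept) pairs for this token.
                weights = [
                    (n_dk[d][k2] + alpha)
                    * (n_kc[k2][c2] + beta) / (n_k[k2] + n_concepts * beta)
                    * (n_cw[c2][w] + gamma) / (n_c[c2] + vocab_size * gamma)
                    for k2, c2 in pairs
                ]
                k, c = rng.choices(pairs, weights=weights)[0]
                assign[d][i] = (k, c)
                update(d, w, k, c, +1)
    return assign, n_dk, n_kc, n_cw
```

Sampling the pair jointly costs one pass over all topic-concept combinations per token, which is the price paid for the faster mixing that blocked sampling is expected to give.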
After Gibbs sampling, we can use the sampled topics and concepts to estimate the probability of a topic given a document, $\theta_{dk}$, the probability of a concept given a topic, $\phi_{kc}$, and the probability of a word given a concept, $\zeta_{cw}$.
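The estimators themselves are not reproduced in this section. A common choice for collapsed Gibbs samplers, shown here only as a sketch, is the smoothed-count form below. It assumes symmetric priors $\beta$ and $\gamma$; the count symbols $n^{(d)}_k$ (tokens of document $d$ assigned to topic $k$) and $n^{(k)}_c$ (tokens of topic $k$ assigned to concept $c$) are notation assumed by analogy with $n^{(c)}_w$ and $n^{(d)}$.

```latex
\[
\theta_{dk} = \frac{n^{(d)}_k + \alpha_k}{n^{(d)} + \sum_{k'} \alpha_{k'}}, \qquad
\phi_{kc} = \frac{n^{(k)}_c + \beta}{\sum_{c'} n^{(k)}_{c'} + |C|\,\beta}, \qquad
\zeta_{cw} = \frac{n^{(c)}_w + \gamma}{\sum_{w'} n^{(c)}_{w'} + V\,\gamma}
\]
```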

VI. CONCEPT-BASED TOPIC LABELING
The key idea behind our model is that the entities included in a text document, together with their inter-connections, can specify the topic(s) of the document. Additionally, the entities of the ontology that are categorized into the same or similar classes have higher semantic relatedness to each other. Therefore, in order to recognize good topic labels, we rely on the semantic similarity between the entities included in the text document and a suitable portion of the ontology. The research presented in [2] uses a similar approach to perform Knowledge-based text categorization.

Definition 5. (Topic Label):
A topic label for topic φ is a sequence of words which is semantically meaningful and sufficiently explains the meaning of φ.
KB-LDA highlights the concepts of the ontology and their classification hierarchy as labels for topics. To find representative labels that are semantically relevant for a discovered topic φ, KB-LDA involves four major steps: (1) constructing the semantic graph from the top concepts in the topic-concept distribution for the given topic; (2) selecting and analyzing the thematic graph, a subgraph of the semantic graph; (3) extracting the topic label graph from the thematic graph concepts; and (4) computing the semantic similarity between topic φ and the candidate labels of the topic label graph.

A. Semantic Graph Construction
In the proposed model, we compute the marginal probabilities $p(c_i \mid \phi_j)$ of each concept $c_i$ in a given topic $\phi_j$. We then select the K concepts having the highest marginal probability in order to create the topic's semantic graph. Figure 3 illustrates the top-10 concepts of a topic learned by KB-LDA.

Definition 6. (Semantic Graph):
A semantic graph of a topic $\phi$ is a labeled graph $G^{\phi} = \langle V^{\phi}, E^{\phi} \rangle$, where $V^{\phi}$ is a set of labeled vertices, which are the top concepts of $\phi$ (their labels are the concept labels from the ontology), and $E^{\phi}$ is a set of edges $\{\langle v_i, v_j \rangle$ with label $r$, such that $v_i, v_j \in V^{\phi}$ and $v_i$ and $v_j$ are connected by a relationship $r$ in the ontology$\}$.
For instance, Figure 4 shows the semantic graph of the example topic φ in Fig. 3, which consists of three sub-graphs (connected components).
Even though the ontology relationships are directed in G φ , in this paper we will consider G φ as an undirected graph.
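The construction described in this subsection can be sketched as follows. The sketch assumes that the topic-concept probabilities are available as a dictionary and that the ontology relationships are given as (subject, relation, object) triples; the function name `build_semantic_graph` and the relation name used in the usage note are hypothetical.

```python
def build_semantic_graph(concept_probs, ontology_triples, top_k=10):
    """Build an undirected, edge-labeled graph over the top-K concepts of a topic."""
    # Keep the K concepts with the highest marginal probability under the topic.
    top = sorted(concept_probs, key=concept_probs.get, reverse=True)[:top_k]
    vertices = {concept: concept_probs[concept] for concept in top}
    adjacency = {concept: {} for concept in top}
    for subj, relation, obj in ontology_triples:
        # Keep only relationships between two selected concepts,
        # and store them in both directions (undirected graph).
        if subj in vertices and obj in vertices and subj != obj:
            adjacency[subj][obj] = relation
            adjacency[obj][subj] = relation
    return vertices, adjacency
```

For instance, with probabilities for "Oakland Raiders", "Boston Red Sox" and "Red" and a single made-up triple `("Oakland Raiders", "hypothetical:relatedTo", "Boston Red Sox")`, the result has three vertices and one undirected edge, so "Red" ends up in its own connected component.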

B. Thematic Graph Selection
In our model, we select the thematic graph assuming that concepts under a given topic are semantically closely related in the ontology, whereas concepts from varying topics are located far away, or even not connected at all. We need to consider that there is a chance of generating incoherent topics. In other words, for a given topic that is represented as a list of K concepts with highest probabilities, there may be a few concepts, which are not semantically close to other concepts and to the topic. It consequently can result in generating the topic's semantic graph that may comprise multiple connected components.

Definition 7. (Thematic graph):
A thematic graph is a connected component of G φ . Particularly, if the entire G φ is a connected graph, it is also a thematic graph.
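Because a thematic graph is simply a connected component of the semantic graph, the candidates can be enumerated with a breadth-first search. The sketch below assumes the graph is given as an undirected adjacency mapping from each concept to its neighbours; the function name `thematic_graphs` is hypothetical.

```python
from collections import deque


def thematic_graphs(adjacency):
    """Return the connected components of an undirected graph as sets of vertices."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components
```

How the dominant thematic graph is chosen among the components is not specified in this section, so that choice is left to the caller.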

C. Topic Label Graph Extraction
The idea behind a topic label graph extraction is to find ontology concepts as candidate labels for the topic.
The importance of concepts in a thematic graph is based on their initial weights, which are the marginal probabilities of concepts under the topic, and their relative positions in the graph. Here, we apply the Hyperlink-Induced Topic Search (HITS) algorithm [23] with the assigned initial weights for concepts to find the authoritative concepts in the dominant thematic graph. Ultimately, we determine the central concepts in the graph based on the geographical centrality measure, since these nodes can be recognized as the thematic landmarks of the graph. The top-4 core concept nodes of the dominant thematic graph of example topic φ are highlighted in Figure 6. It should be noted that "Boston Red Sox" has not been selected as a core concept, because its score is lower than that of the concept "Red" based on the HITS and centrality computations ("Red" has far more relationships to other concepts in DBpedia).
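How the initial weights enter the HITS iteration is not detailed here. The sketch below makes the simplest assumption: the concept weights seed both the hub and the authority scores, and the thematic graph is treated as undirected. The function name `weighted_hits` is hypothetical, and the centrality step is not covered.

```python
def weighted_hits(adjacency, initial_weights, n_iters=20):
    """HITS-style power iteration seeded with per-node weights (sketch)."""
    hub = dict(initial_weights)
    auth = dict(initial_weights)
    for _ in range(n_iters):
        # Authority of a node: sum of the hub scores of its neighbours.
        auth = {v: sum(hub[u] for u in adjacency[v]) for v in adjacency}
        # Hub score of a node: sum of the authority scores of its neighbours.
        hub = {v: sum(auth[u] for u in adjacency[v]) for v in adjacency}
        # Normalize both score vectors so that each sums to one.
        for scores in (auth, hub):
            total = sum(scores.values()) or 1.0
            for v in scores:
                scores[v] /= total
    return auth, hub
```

A long run of a power iteration tends to forget its starting point, so the number of iterations is kept as a parameter for cases where the seed weights should retain some influence.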
From now on, we refer to the dominant thematic graph of a topic as the thematic graph.
To extract the topic label graph for the core concepts $CC^{\phi}$, we primarily focus on the ontology class hierarchy (structure), since topic labeling can be viewed as assigning class labels to topics. We present definitions similar to those in [22] for representing the label graph and the topic label graph. The label graph of a core concept $C_i$, $G_i = \langle V_i, E_i \rangle$, contains $C_i$ together with the ontology classes lying at most three hops away from $C_i$. The union of these graphs, $G^{cc^{\phi}} = \langle V, E \rangle$, where $V = \bigcup V_i$ and $E = \bigcup E_i$, is called the topic label graph.
It should be noted that we empirically restrict the ancestors to three levels, because expanding the distance causes undesirable general classes to be included in the graph.
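The three-hop restriction can be sketched as a bounded traversal of the class hierarchy. The sketch assumes the hierarchy is available as a mapping from each node to its direct parent classes or categories; the function names `label_graph` and `topic_label_graph` are hypothetical.

```python
def label_graph(core_concept, parents, max_hops=3):
    """Collect the ancestors of a core concept up to max_hops levels away."""
    vertices = {core_concept}
    edges = set()
    frontier = [core_concept]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for parent in parents.get(node, ()):
                edges.add((node, parent))
                if parent not in vertices:
                    vertices.add(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    return vertices, edges


def topic_label_graph(core_concepts, parents, max_hops=3):
    """Union of the label graphs of all core concepts."""
    all_vertices, all_edges = set(), set()
    for concept in core_concepts:
        vertices, edges = label_graph(concept, parents, max_hops)
        all_vertices |= vertices
        all_edges |= edges
    return all_vertices, all_edges
```

Raising `max_hops` pulls in increasingly general classes, which is the effect the three-level limit is meant to avoid.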

D. Semantic Relevance Scoring Function
In this section, we introduce a semantic relevance scoring function to rank the candidate labels by measuring their semantic similarity to a topic.
Mei et al. [35] consider two parameters to interpret the semantics of a topic, including: (1) distribution of the topic; and (2) the context of the topic. Proposed topic label graph for a topic φ is exploited, utilizing the distribution of the topic over the set of concepts plus the context of the topic in the form of semantic relatedness between the concepts in the ontology.
To determine the semantic similarity of a label in G cc φ to a topic φ, the semantic similarity between and all of the concepts in the core concept set CC φ is computed and then ranked the labels and finally, the best representative labels for the topic is selected.
Scoring a candidate label is based on three primary goals: (1) the label should have enough coverage important concepts of the topic ( concepts with higher marginal probabilities); (2) the generated label should be more specific to the core concepts (lower in the class hierarchy); and ultimately, (3) the label should cover the highest number of core concepts in G cc φ .
In order to calculate the semantic similarity of a label to a concept, the fist step is calculating the membership score and the coverage score. The modified Vector-based Vector Generation method (VVG) described in [45] is selected to compute the membership score of a concept to a label.
In the experiments, we used DBpedia, an ontology derived from the Wikipedia knowledge base. All concepts in DBpedia are classified into DBpedia categories, and categories are inter-related via subcategory relationships, including skos:broader, skos:broaderOf, rdfs:subClassOf, rdf:type and dcterms:subject. We rely on these relationships for the construction of the label graph. Given the topic label graph $G^{cc\phi}$, we compute the similarity of the label $\ell$ to the core concepts of topic $\phi$ as follows.
If a concept $c_i$ has been classified to $N$ DBpedia categories, or similarly, if a category $C_j$ has $N$ parent categories, we set the weight of each of the membership (classification) relationships $e$ to:

$$m(e) = \frac{1}{N} \qquad (13)$$

The membership score, $mScore(c_i, C_j)$, of a concept $c_i$ to a category $C_j$ is defined as follows:

$$mScore(c_i, C_j) = \prod_{e_k \in E_l} m(e_k) \qquad (14)$$

where $E_l = \{e_1, e_2, \ldots, e_m\}$ represents the set of all membership relationships forming the shortest path $p$ from concept $c_i$ to category $C_j$. Figure 7 illustrates a fragment of the label graph for the concept "Oakland Raiders" and shows how its membership score to the category "American Football League teams" is computed.
The coverage score, $cScore(c_i, C_j)$, of a concept $c_i$ to a category $C_j$ is defined as follows:

$$cScore(c_i, C_j) = \begin{cases} \dfrac{1}{d(c_i, C_j)} & \text{if there is a path from } c_i \text{ to } C_j \\ 0 & \text{otherwise} \end{cases} \qquad (15)$$

where $d(c_i, C_j)$ is the length of the shortest path from $c_i$ to $C_j$. The semantic similarity between a concept $c_i$ and a label $\ell$ in the topic label graph $G^{cc\phi}$ is defined as follows:

$$SSim(c_i, \ell) = w(c_i) \cdot \big(\lambda \cdot mScore(c_i, \ell) + (1 - \lambda) \cdot cScore(c_i, \ell)\big) \qquad (16)$$

where $w(c_i)$ is the weight of $c_i$ in $G^{cc\phi}$, which is the marginal probability of concept $c_i$ under topic $\phi$, $w(c_i) = p(c_i \mid \phi)$. Similarly, the semantic similarity between a set of core concepts $CC^{\phi}$ and a label $\ell$ in the topic label graph $G^{cc\phi}$ is defined as:

$$SSim(CC^{\phi}, \ell) = \frac{\lambda}{|CC^{\phi}|} \sum_{i=1}^{|CC^{\phi}|} w(c_i) \cdot mScore(c_i, \ell) + (1 - \lambda) \sum_{i=1}^{|CC^{\phi}|} w(c_i) \cdot cScore(c_i, \ell) \qquad (17)$$

where $\lambda$ is the smoothing factor that controls the influence of the two scores. We used $\lambda = 0.8$ in our experiments. It should be noted that the $SSim(CC^{\phi}, \ell)$ score is not normalized and needs to be normalized. The scoring function aims to satisfy the three criteria by using the concept weight, mScore and cScore for the first, second and third objectives, respectively. It is based on the coverage of topical concepts: a label node is ranked higher if it covers more important topical concepts. In other words, being close to the core concepts or covering more of them is what matters. Top-ranked labels are selected as the labels for the given topic. Table VI shows a topic with the top-10 labels generated by our knowledge-based framework.
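The scoring procedure of this section can be sketched in Python. The sketch rests on several assumptions that should be checked against the formal definitions. Each membership edge leaving a node with N parents is weighted 1/N, and the membership score is the product of these weights along the shortest path. The coverage score is the reciprocal of the shortest-path length, and the set-level similarity combines the two weighted sums with the smoothing factor λ. The `parents` dictionary and all function names are hypothetical.

```python
from collections import deque

LAMBDA = 0.8  # smoothing factor reported in the text


def shortest_path(start, goal, parents):
    """Breadth-first search over child -> parent edges; node list or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for parent in parents.get(path[-1], ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


def m_score(concept, label, parents):
    """Assumed form: product of edge weights 1/N along the shortest path."""
    path = shortest_path(concept, label, parents)
    if path is None:
        return 0.0
    score = 1.0
    for node in path[:-1]:
        score *= 1.0 / len(parents[node])  # node has N parents, each edge weighs 1/N
    return score


def c_score(concept, label, parents):
    """Assumed form: 1 / shortest-path length, or 0 when no path exists."""
    path = shortest_path(concept, label, parents)
    if path is None or len(path) == 1:
        return 0.0
    return 1.0 / (len(path) - 1)


def ssim(core_weights, label, parents, lam=LAMBDA):
    """Unnormalized similarity between a core concept set and a candidate label.

    `core_weights` maps each core concept to its weight p(c | topic).
    """
    if not core_weights:
        return 0.0
    member = sum(w * m_score(c, label, parents) for c, w in core_weights.items())
    cover = sum(w * c_score(c, label, parents) for c, w in core_weights.items())
    return lam / len(core_weights) * member + (1.0 - lam) * cover


def rank_labels(core_weights, candidates, parents):
    """Sort candidate labels by decreasing similarity to the core concepts."""
    return sorted(candidates, key=lambda l: ssim(core_weights, l, parents), reverse=True)
```

In a toy hierarchy where two core concepts share a direct parent, that parent outranks both a more general ancestor and a category covering only one concept, which matches the three scoring goals listed earlier in this section.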

VII. EXPERIMENTS
In order to evaluate the proposed model, KB-LDA, we compared its effectiveness against one of the state-of-the-art text-based techniques, described in [35]. In this paper, we call that model Mei07.
In our experiments, we chose the DBpedia ontology and two text corpora: a subset of the Reuters news articles and the BAWE corpus [39]. More details about the datasets are available in [1]. For the Mei07 baseline, we first extracted the top-2000 bigrams by applying the N-gram Statistics Package [5]. Then, we tested the significance of the bigrams using Student's t-test and retained the top 1000 ranked bigrams as the candidate label set $L$. In the next step, we calculated the score $s$ for each candidate label $\ell \in L$ and topic $\phi$. The score $s$ is defined as follows:

$$s(\ell, \phi) = \sum_{w} p(w \mid \phi)\, PMI(w, \ell \mid D)$$

where PMI is the point-wise mutual information between the topic words $w$ and the label $\ell$, given the document corpus $D$. The top-6 labels were then chosen as the representative labels of topic $\phi$ produced by the Mei07 technique.
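The scoring step of the baseline can be illustrated with a small Python sketch. It assumes that the score of a label sums, over the topic words, the word probability multiplied by the PMI between the word and the label. PMI is estimated here from document-level co-occurrence counts. The function names `pmi` and `label_score` are hypothetical. Two further simplifications are made for illustration only: a bigram label counts as present in a document when both of its tokens occur there, and PMI is set to 0 for pairs that never co-occur.

```python
import math


def pmi(word, label, docs):
    """Document-level PMI between a word and a bigram label.

    `docs` is a list of token sets; `label` is a tuple of two tokens.
    Pairs that never co-occur get 0.0 (a simplification, not true PMI).
    """
    n = len(docs)
    has_word = sum(1 for d in docs if word in d)
    has_label = sum(1 for d in docs if all(t in d for t in label))
    both = sum(1 for d in docs if word in d and all(t in d for t in label))
    if both == 0:
        return 0.0
    return math.log((both / n) / ((has_word / n) * (has_label / n)))


def label_score(label, topic_word_probs, docs):
    """Assumed form: s(label, topic) = sum_w p(w | topic) * PMI(w, label | D)."""
    return sum(p * pmi(w, label, docs) for w, p in topic_word_probs.items())


def top_labels(candidates, topic_word_probs, docs, k=6):
    """Return the k highest-scoring candidate labels for one topic."""
    ranked = sorted(
        candidates,
        key=lambda l: label_score(l, topic_word_probs, docs),
        reverse=True,
    )
    return ranked[:k]
```

A label scores highly when it co-occurs with the high-probability words of the topic more often than chance would predict.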

A. Experimental Setup
The experimental setup, including the pre-processing steps and the processing parameters, is presented in detail in [1].

B. Results
Tables VII and VIII show sample results of our method, KB-LDA, along with the labels generated by the Mei07 approach and the top-10 words for each topic. We compared the top words and the top-6 labels for each topic and list them in the respective tables. The tables confirm our belief that the labels produced by KB-LDA are more representative than the corresponding labels generated by the Mei07 method. For the quantitative evaluation of the two methods, three human experts were asked to compare the generated labels and to choose between "Good" and "Unrelated" for each one.
We compared the two methods using Precision@k, considering the top-1 to top-6 generated labels. The precision for a topic at top-k is defined as follows:

$$\mathrm{Precision@}k = \frac{\#\ \text{of ``Good'' labels with rank} \le k}{k}$$

Topic 20    Topic 1     Topic 18    Topic 19    Topic 3
league      company     space       bank        china
team        stock       station     financial   chinese
game        buzz        nasa        reuters     beijing
season      research    earth       stock       japan
football    profile     launch      fund        states
national    chief       florida     capital     south
york        executive   mission     research    asia
games       quote       flight      exchange    united
los         million     solar       banks       korea
angeles     corp        cape        group

Considering the results in Figure 8, two observations stand out: (1) in Figure 8a, for up to the top-3 labels, the precision difference between the two methods demonstrates the effectiveness of our method, KB-LDA; and (2) the BAWE corpus shows a higher average precision than the Reuters corpus. More explanations are available in [1].
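The evaluation metric can be made concrete with a short Python sketch. It assumes the usual definition of Precision@k: the number of top-k labels judged "Good" divided by k. How the verdicts of the three assessors are combined is not specified here, so the sketch takes one ranked list of verdicts per topic. The function names are hypothetical.

```python
def precision_at_k(judgments, k):
    """Fraction of the top-k labels judged "Good" for one topic.

    `judgments` is the list of verdicts ("Good" or "Unrelated"),
    ordered by the rank of the corresponding label.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for verdict in judgments[:k] if verdict == "Good") / k


def mean_precision_at_k(judgments_per_topic, k):
    """Average Precision@k over all topics."""
    scores = [precision_at_k(j, k) for j in judgments_per_topic]
    return sum(scores) / len(scores)
```

Evaluating the function for k = 1, ..., 6 on each method's verdicts gives one precision curve per method and corpus.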
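Topic Coherence. In our model, KB-LDA, the topics are defined over concepts. Therefore, to calculate the word distribution for each topic $t$ under KB-LDA, we can apply the following equation:

$$p(w \mid t) = \sum_{c} p(w \mid c)\, p(c \mid t)$$

where $p(c \mid t)$ is the probability of concept $c$ under topic $t$ and $p(w \mid c)$ is the probability of word $w$ under concept $c$. Table IX illustrates the top words from the LDA and KB-LDA approaches, respectively, for three topics generated from the BAWE corpus.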
As Table IX demonstrates, the topic coherence under KB-LDA is qualitatively better than under LDA. The wrong topical words for each topic in Table IX are marked in red and italicized.
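Deriving the top words of a KB-LDA topic from its concept distribution can be sketched as follows. The sketch assumes that the word distribution of a topic is obtained by marginalizing over concepts, i.e., by summing the concept-word probabilities weighted by the topic-concept probabilities. The dictionary-based data layout and the function names are hypothetical, and the probabilities in any usage are toy values rather than results from the experiments.

```python
def topic_word_distribution(topic_concept, concept_word):
    """Compute p(w | t) = sum_c p(c | t) * p(w | c) for one topic.

    topic_concept: dict concept -> p(c | t)
    concept_word:  dict concept -> dict word -> p(w | c)
    """
    word_probs = {}
    for concept, p_c in topic_concept.items():
        for word, p_w in concept_word[concept].items():
            word_probs[word] = word_probs.get(word, 0.0) + p_c * p_w
    return word_probs


def top_words(word_probs, n=10):
    """Return the n most probable words of a topic."""
    return sorted(word_probs, key=word_probs.get, reverse=True)[:n]
```

A word shared by several concepts accumulates probability from each of them, so it can outrank a word that is dominant in only one concept.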
We also calculate the coherence score, based on the equation defined in [37], to compare quantitatively the coherence of the topics generated by KB-LDA and LDA. Given a topic $\phi$ and its top $T$ words $V^{(\phi)} = (v_1^{(\phi)}, \cdots, v_T^{(\phi)})$ ordered by $P(w \mid \phi)$, the coherence score is defined as:

$$C(\phi; V^{(\phi)}) = \sum_{t=2}^{T} \sum_{l=1}^{t-1} \log \frac{D(v_t^{(\phi)}, v_l^{(\phi)}) + 1}{D(v_l^{(\phi)})}$$

where $D(v)$ is the document frequency of word $v$ and $D(v, v')$ is the number of documents in which words $v$ and $v'$ co-occur. A higher coherence score indicates higher topic quality. The coherence scores of the two methods on the different datasets are reported in Table XI. As mentioned before, KB-LDA defines each topic as a distribution over concepts. Table X lists the top-10 concepts with the highest probabilities in the topic distribution under KB-LDA for the same three topics of Table IX, i.e., "topic 1", "topic 2", and "topic 3".

VIII. CONCLUSIONS
In this paper, we presented a topic labeling approach, KB-LDA, based on a knowledge-based topic model and a graph-based topic labeling method. The results confirm the robustness and effectiveness of the KB-LDA technique on different text collections. Integrating ontological concepts into our model is the key factor that improves topic coherence in comparison to the standard LDA model.
Regarding future work, defining a global optimization scoring function for the labels instead of Eq. 17 is a potential extension. Moreover, integrating lateral relationships between the ontology concepts into the topic model, in addition to the hierarchical relations, is another interesting direction for extending the proposed model.