Predicting Quality of Answer in Collaborative Q/A Community

Community Question Answering (CQA) services have emerged that allow information seekers to pose their information needs as questions, receive answers from fellow users, and take part in evaluating questions and answers on a variety of topics. Within such a community, information seekers can interact with and obtain information from a wide range of users, forming a heterogeneous social network. A question may receive multiple answers from multiple users, and the asker or fellow users can choose the best answer. Freedom and convenience of participation lead to diversity in the information. In this paper we present a general model to predict the quality of information in a CQA service using non-textual features. We demonstrate and test our quality measurement on a collection of question and answer pairs. In the future, our models and predictions could serve as a quality predictor in a recommender system to support collaborative learning.


INTRODUCTION
Community Question Answering (CQA) has recently become available to information seekers. Besides web search engines, information seekers today have the option to post their questions on CQA sites and have them answered by other users. In contrast to search engines such as Google [1] [6], whose results do not always correspond to user requirements, in CQA services such as Yahoo! Answers, Naver or Answer Bag the information needed is provided by other users.
These communities have become quite popular in the last several years for a number of reasons. First, because responses are targeted and come from users with knowledge or experience, the information is more useful and easier to understand. Second, the service provides a consolidated communication environment in which all information related to a question can be seen. This environment facilitates multiple answers (likely from different perspectives) and discussion (in the form of comments), which can benefit the questioner (and others as well).
Through clarification and suggestions (using email or other means), the questioner can interact with the answerer. Although this paradigm is quite different from instantaneous search over stored information, it is likely to provide the questioner with a useful answer. Finally, the forum provides an incentive for users to show their skills and, in the process, be acknowledged by the community. As in collaborative learning, users can exploit and share their resources and skills by asking for information and by evaluating and monitoring one another's information and ideas.
Many CQA services provide non-textual information related to their document collections. Usually, textual features are used to measure the relevance of a document to the query, while non-textual features can be used to estimate the quality of the document. Information from non-textual features such as points, best answers and contributors has the potential to improve search quality [2]. On the other hand, the quality of information in traditional content can generally be trusted. In the social media setting of CQA, the quality of information is diverse, ranging from high quality to low quality or spam. The quality of an answer, or of any document content for that matter, can be subjective.
Jeon et al. [2] [3] used non-textual features to predict the quality of answers. They collected Q&A pairs and 13 features from the Naver Q&A service, which is written in Korean.
To handle various types of non-textual features and build a stochastic process that can predict the quality of documents, they used kernel density estimation [12] and the maximum entropy approach.
[13] introduced the problem of predicting information seeker satisfaction in collaborative question answering communities. [16] also tried to predict the selected information by using 13 quality criteria to evaluate the answers (on a 5-point Likert scale) and 9 features. In some cases, an answerer's temporal characteristics can contribute significantly to the quality of an answer, in addition to activity features [17]. This paper presents a method for systematically processing non-textual features to predict the quality of information collected from a specific Indonesian web service (id.Y!A) using classifiers.

A. System Architecture
The proposed method in this paper consists of four parts: data collection, feature extraction, computation of the correlation coefficient between each feature and answer quality, and classification. Figure 1 shows the architecture of the proposed system. www.ijarai.thesai.org

B. Data Collection
Our data is based on a snapshot of Yahoo! Answers for Indonesian users (http://id.answers.yahoo.com/), a popular CQA site. Our first step was to identify the categories with the highest activity (resolved questions) among the 26 categories. Table 1 shows the categories with high activity: music and entertainment, society and culture, computers and internet, family and relationships, and consumer electronics. In order to focus on realistic questions and answers, we chose the computers and internet category. The selection is based on the observation that several subcategories of music and entertainment and of society and culture, such as religion and spirituality, yield highly subjective answers.
We collected 258870 Q&A pairs from the id.Y!A service (computers and internet category); all questions and answers are written in Indonesian. We randomly selected resolved questions from 7 subcategories, obtaining 1500 Q&A pairs in total. The quality of a Q&A pair depends on both the question part and the answer part. For the question part, we use the most popular resolved questions. Users cannot get any useful information from bad questions, and in practice bad questions always lead to bad-quality answers. Therefore, we decided to estimate only the quality of the answers and treat it as the quality of the Q&A pair. In Y!A, multiple answers are possible for a single question, and the questioner selects the best answer. We extract features only from the best answer. To evaluate answers we use the following criterion [13]: the asker has personally closed the question, selected the best answer, and given the best answer a rating of at least 3 stars.
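The selection criterion for usable Q&A pairs can be illustrated with a short filter. This is a minimal sketch only: the record type `QAPair` and its field names are hypothetical and do not reflect the actual schema of the crawled data; only the three conditions (question closed by the asker, best answer chosen by the asker, rating of at least 3 stars) come from the text.

```python
from dataclasses import dataclass
from typing import List

MIN_STARS = 3  # minimum asker rating for the best answer, as stated in the text


@dataclass
class QAPair:
    # Hypothetical record; field names are illustrative assumptions.
    question_id: str
    closed_by_asker: bool
    best_answer_chosen_by_asker: bool
    best_answer_stars: int  # 0 when the asker gave no rating


def is_usable(pair: QAPair) -> bool:
    """Return True when the pair satisfies all three conditions."""
    return (
        pair.closed_by_asker
        and pair.best_answer_chosen_by_asker
        and pair.best_answer_stars >= MIN_STARS
    )


def select_pairs(pairs: List[QAPair]) -> List[QAPair]:
    """Keep only the Q&A pairs that pass the criterion."""
    return [p for p in pairs if is_usable(p)]
```

For example, a pair closed by the asker with a 2-star best answer would be dropped, while the same pair with 3 stars would be kept.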
Information in CQA is typically complex and subjective. We use annotators for manual judgment of answer quality and relevance. In general, good answers tend to be relevant, informative, objective, sincere and readable. We could measure these individual factors separately and combine the scores to calculate the overall quality of an answer. Instead, we propose to use a holistic view to decide the quality of an answer. Our annotators read the answers, consider all of the above factors and specify the quality of each answer at one of three levels: Bad, Medium and Good (hereafter referred to as good, medium and bad).

C. Feature Extraction
First, we extract feature vectors from each Q&A pair (Yahoo! Answers). We extract 18 non-textual features, divided into answer features/AF (features 1 to 8) and answerer user history/AUH (features 9 to 16). Because multiple answers to a single question are possible in community question answering, we extract features only from the answer selected by the questioner (the best answer). The features are:
(1) Star: Number of stars (from one to five) given to the answer by the questioner.
(2) Reference: When answering a question, answerers sometimes give a reference for the answer.
(12) Total number of answers: Total number of answers that the answerer has given previously.
(13) Number of best answers: Total number of best answers.
(14) Best answer acceptance ratio: Ratio of best answers to all answers that the answerer has given previously.
(15) Number of other answers: Total number of other answers (not best answers) that the answerer has given previously.
(16) Other answer acceptance ratio: Ratio of other answers (not best answers) to all answers that the answerer has given previously.
(17) Best and other answer ratio: Ratio of best answers to other answers given previously.
(18) Answer question ratio: Ratio of all answers to all questions previously.
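The answerer history features (12) to (18) are simple counts and ratios, and could be derived as in the sketch below. The function and key names are hypothetical. Two points are assumptions rather than details given in the paper: a ratio with a zero denominator is mapped to 0.0, and feature (18) is read as the answerer's number of answers divided by the answerer's number of questions.

```python
def safe_ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 for a zero denominator (an assumption)."""
    return numerator / denominator if denominator else 0.0


def history_features(total_answers: int, best_answers: int,
                     total_questions: int) -> dict:
    """Compute features (12)-(18) from three raw counts for one answerer."""
    other_answers = total_answers - best_answers  # answers not chosen as best
    return {
        "total_answers": total_answers,                                          # (12)
        "best_answers": best_answers,                                            # (13)
        "best_acceptance_ratio": safe_ratio(best_answers, total_answers),        # (14)
        "other_answers": other_answers,                                          # (15)
        "other_acceptance_ratio": safe_ratio(other_answers, total_answers),      # (16)
        "best_other_ratio": safe_ratio(best_answers, other_answers),             # (17)
        "answer_question_ratio": safe_ratio(total_answers, total_questions),     # (18)
    }
```

An answerer with 10 previous answers, 4 of them best answers, and 5 questions would have a best answer acceptance ratio of 0.4 and an answer question ratio of 2.0.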

D. Correlation Coefficient
The correlation coefficient measures how closely one variable is related to another [4]; here, it measures the correlation between individual features and the annotators' scores (good answers have higher scores: Bad = 0, Medium = 1, Good = 2). Table 2 shows the 13 features that have the strongest correlation with answer quality. Surprisingly, the number of characters and the number of words have the strongest correlation with the quality of the answer. On the other hand, the number of stars is not the feature with the strongest correlation with answer quality. This suggests that the stars given by questioners are a subjective evaluation, and that some users' opinions do not agree with the quality of the answer. Most users appreciate getting answers regardless of their quality. This user behavior may be related to the culture of Indonesian users, as was observed for Korean users [2].
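The formula for Pearson's correlation coefficient is

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (1) $$

where $x_i$ is the value of a feature for the $i$-th answer, $y_i$ is the annotator score of that answer, $\bar{x}$ and $\bar{y}$ are the respective means, and $n$ is the number of answers.

Figure 2 shows the distributions of good, medium and bad quality answers by word length. Good answers are usually longer than bad and medium answers.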
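Pearson's correlation coefficient between one feature and the annotator scores can be computed directly from its definition, as in the following sketch. The word counts and labels are made-up illustrative values, not data from the study, and returning 0.0 for a constant input is an assumption made here to avoid division by zero.

```python
import math
from typing import Sequence

# Mapping of annotator labels to scores, as given in the text.
SCORE = {"Bad": 0, "Medium": 1, "Good": 2}


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant variable is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


# Made-up example: word counts of five answers and their annotator labels.
word_counts = [5, 12, 40, 8, 55]
labels = ["Bad", "Medium", "Good", "Bad", "Good"]
r = pearson(word_counts, [SCORE[label] for label in labels])
```

Running this for every feature column gives a ranking of features by the strength of their correlation with answer quality.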

E. Classification Algorithms
We explored decision trees, boosting and Naïve Bayes, using the Weka framework [15]. With a decision tree classifier, we expect to get high precision on the target class. Support vector machines are considered the classifier of choice for many tasks, and we use AdaBoost to handle noisy features. We use Naïve Bayes because it is a simple, fast and effective method for investigating the success of our experiment.
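To make the simplest of these classifiers concrete, the sketch below implements a Gaussian Naïve Bayes classifier for numeric features. It is an illustration of the general technique under stated assumptions, not Weka's implementation and not the configuration used in the experiments; the class name `GaussianNaiveBayes`, the variance smoothing constant and the toy data are all assumptions.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, feature) pair."""

    def __init__(self, smoothing: float = 1e-6):
        self.smoothing = smoothing  # added to variances to avoid division by zero
        self.priors = {}
        self.stats = {}

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        total = len(X)
        for label, rows in rows_by_class.items():
            self.priors[label] = len(rows) / total
            per_feature = []
            for column in zip(*rows):  # iterate over feature columns
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                per_feature.append((mean, var + self.smoothing))
            self.stats[label] = per_feature
        return self

    def predict_one(self, row):
        best_label, best_score = None, -math.inf
        for label, per_feature in self.stats.items():
            # Log prior plus the sum of per-feature Gaussian log likelihoods.
            score = math.log(self.priors[label])
            for value, (mean, var) in zip(row, per_feature):
                score += -0.5 * math.log(2 * math.pi * var)
                score -= (value - mean) ** 2 / (2 * var)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, X):
        return [self.predict_one(row) for row in X]


# Toy data: [word count, has reference]; labels use Bad = 0, Good = 2.
X_train = [[5, 0], [8, 0], [40, 1], [55, 1]]
y_train = [0, 0, 2, 2]
model = GaussianNaiveBayes().fit(X_train, y_train)
```

The independence assumption between features is what keeps training to a single pass over the data, which is why the method is fast.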

III. IMPLEMENTATION AND RESULTS
We implemented the proposed methods on the Q&A pair data. There are four kinds of data for classification: data with all features, data with high-correlation features (correlation > 0.1 or < -0.1), data with answer features, and data with answerer user history features. We build the predictor using 815 training instances and 302 test instances (from the annotators we obtained 1117 relevant Q&A pairs). Table 3 reports prediction accuracy for the different implementations of answer quality prediction, in particular comparing the choice of classifier algorithm, feature set (all features, correlation features, answer features, answerer history features) and test option. Surprisingly, C4.5 gives the best performance of all the classification variants, with an accuracy of 91.9 on the satisfied class using all features. The same table shows that using only answer features (AF) or only answerer user history (AUH) gives lower accuracy, especially for answerer user history. The accuracy of the answer feature set is within 3.07 of the all-feature set and within 2.59 of the correlation feature set.
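The construction of the high-correlation feature set and the accuracy measure can be sketched as follows. This assumes the threshold is read as an absolute correlation above 0.1; the feature names and correlation values in the usage note are made up for illustration.

```python
from typing import Dict, List, Sequence


def select_by_correlation(correlations: Dict[str, float],
                          threshold: float = 0.1) -> List[str]:
    """Keep features whose correlation with quality exceeds the threshold
    in absolute value (assumed reading of the selection rule)."""
    return [name for name, r in correlations.items() if abs(r) > threshold]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

With hypothetical correlations `{"words": 0.45, "stars": 0.05, "other_ratio": -0.2}`, the selection keeps `words` and `other_ratio` and drops `stars`.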
The harmonic mean of precision and recall (F1) is reported in Table 4. For both the all-feature set and the correlation feature set, C4.5 has the highest F1 across test options: 91.9 on the training set, 89.1 on the testing set and 81 using 5-fold cross-validation. Another interesting result from Tables 4 and 5 is that the difference in accuracy between all features and correlation features is small, at about 0.52. This indicates that features without high correlation do not have a significant impact on the classification results.
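F1 combines precision P and recall R as 2PR / (P + R), and k-fold cross-validation partitions the data into k test folds. Both can be sketched in a few lines. The fold construction below uses contiguous, unshuffled index ranges, which is a simplification assumed here; the exact fold assignment used in the experiments is not described in the paper.

```python
from typing import List, Tuple


def f1_score(precision: float, recall: float) -> float:
    """F1 = 2PR / (P + R), defined as 0.0 when both inputs are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by
    at most one; returns (train_indices, test_indices) per fold."""
    if k < 2 or k > n:
        raise ValueError("k must satisfy 2 <= k <= n")
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds
```

When precision and recall are equal, F1 equals that common value, which is consistent with a classifier reporting the same figure for all three measures.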

IV. CONCLUSION
In this paper we presented our approach to quantifying and predicting the quality of answers in question answering communities, specifically for an Indonesian CQA service. Beyond developing models to select the best answer and evaluate the quality of answers, there are several important lessons to learn here for measuring content quality in CQA. We find a huge variety of questions and answers on CQA services, and a given question may receive several answers from the community.
With appropriate features, we can build models that have a significantly higher probability of identifying the best answer class than of classifying a non-best answer. The entire system uses Q&A pairs from id.answers.yahoo.com, 18 features and 3 types of classifier. We conclude the following:
(1) Of the four feature sets, the highest accuracy is obtained with the all-feature set (compared with the correlation coefficient set, the AF set and the AUH set).
(2) The best performance of all classification variants is obtained with C4.5, with an average accuracy of 91.90, precision of 91.9 and recall of 91.9.
In the future, our models and predictions could serve as a quality predictor in a recommender system to support collaborative learning.