An Improved Multi-label Classifier Chain Method for Automated Text Classification

Automated text classification is the task of grouping documents (text) automatically into categories from a predefined set. The conventional approach to classification involves mapping a single class label to each data point (instance). In multi-label classification (MLC), the task is to develop models that can assign multiple class labels to a data instance. There exist several MLC methods, such as the classifier chain (CC) and binary relevance (BR). However, these methods have drawbacks, such as the random label sequence ordering issue. This study attempts to address this issue, which is peculiar to the classifier chain method. In this paper, a hybrid heuristic evolutionary-based technique is proposed. The proposed PSOGCC is a combination of particle swarm optimization (PSO) and genetic algorithm (GA). The genetic operators of GA are integrated with the basic PSO algorithm to find the global best solution representing an optimized label sequence order in the chain classifier. In the experiment, three MLC methods: BR, CC, and PSOGCC, are implemented using five benchmark multi-label datasets and five standard evaluation metrics. The proposed PSOGCC method improved the predictive performance of the chain classifier, obtaining the best results of 98.66% accuracy, 99.5% precision, 99.16% recall, 99.33% F1 score, and a Hamming loss of 0.0011.

Keywords—Text classification; multi-label classification; classifier chain; particle swarm optimization; genetic algorithm


I. INTRODUCTION
Automated text classification (ATC) is the task of developing predictive models capable of categorizing text documents into distinct class labels from a predefined set. In other words, ATC is a technique that involves the process of managing and processing a vast number of documents in a continually increasing form. Conventionally, classification techniques [1]-[3] focus on the development of a predictive model, a function f that learns to map an input x to an output y, i.e., f : X → Y. This traditional approach to classification is otherwise termed single-label classification (SLC). Unlike the classical SLC technique, where an instance of a data sample is associated with a single class label, multi-label classification (MLC) [4]-[6] involves the problem of assigning multiple class labels simultaneously to a data point (instance).
Given an input vector X = [x1, x2, ⋯, xn] and a vector of labels Y = [y1, y2, ⋯, ym], the goal of MLC is to build a model applicable in predicting one or more class labels simultaneously, provided the labels are not mutually exclusive. The multi-label classification concept primarily originated from text categorization [5]. In a real-world scenario, a document (such as a news article) could have multiple themes (topics) like entertainment, business, security, health, science, etc. To automate the categorization of such related textual data, MLC methods and techniques have been proposed. The existing MLC techniques can be broadly categorized into two approaches [6]: problem transformation and algorithm adaptation.
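The notion of an indicator label vector Y = [y1, ⋯, ym] can be made concrete with a small sketch. The label names and data below are illustrative, not taken from the paper's datasets:

```python
# Toy multi-label dataset: each document maps to a subset of a fixed label set.
# All names and values here are hypothetical examples.

LABELS = ["business", "entertainment", "health", "science"]

# Each row: (feature vector X, set of relevant labels)
dataset = [
    ([0.2, 0.7], {"business", "health"}),
    ([0.9, 0.1], {"entertainment"}),
    ([0.4, 0.4], {"business", "science", "health"}),
]

def to_indicator(label_set, labels=LABELS):
    """Encode a label subset as a binary indicator vector Y = [y1, ..., ym]."""
    return [1 if lab in label_set else 0 for lab in labels]

Y = [to_indicator(s) for _, s in dataset]
```

Because the labels are not mutually exclusive, a row may carry any number of 1s, which is exactly what distinguishes MLC from single-label classification.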
In the problem transformation (PT) approach, the strategy involves transforming a multi-label problem into multiple single-label problems and learning one of the SLC algorithms (or classifiers), such as decision trees, to model the membership class (label). Subsequently, a new observation (test instance) is predicted by combining the positive predictions output by the baseline classifiers. The PT strategy [7] is a very straightforward, easy, and flexible multi-label classification approach. Most of the conventional MLC algorithms, such as binary relevance (BR), label powerset (LP), calibrated label ranking (CLR), and classifier chain (CC), adopt the PT strategy for MLC tasks.
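The transformation step of binary relevance can be sketched as follows. The function name and toy data are hypothetical, and the base single-label learner is deliberately left out:

```python
def binary_relevance_split(X, Y):
    """Transform a multi-label problem (X, Y) with m labels into m
    single-label binary problems, one per label (the BR strategy).
    X: list of feature vectors; Y: list of binary indicator vectors."""
    m = len(Y[0])
    problems = []
    for j in range(m):
        yj = [row[j] for row in Y]   # binary target for label j
        problems.append((X, yj))     # any SLC classifier can be trained on each
    return problems

# Hypothetical data: 3 instances, 2 labels
X = [[0.2, 0.7], [0.9, 0.1], [0.4, 0.4]]
Y = [[1, 0], [0, 1], [1, 1]]
probs = binary_relevance_split(X, Y)
```

At prediction time, the m binary classifiers are queried independently and their positive outputs combined, which is precisely why BR cannot exploit label correlations.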
The algorithm adaptation (AA) approach is based on inducing a conventional machine learning classification algorithm (a single-label classifier) for a multi-label problem. In other words, in the AA strategy, a learning algorithm (classifier) such as a support vector machine (SVM) is modeled and directly applied to MLC problems. This approach to MLC has been less applied by researchers due to its limitations, such as lack of flexibility and complexity [8]. Notable algorithms that have adopted the AA approach include ML-kNN, BP-MLL, and BR-kNN.
Classifier chain (CC) [9], [10] is one of the conventional MLC methods based on the problem transformation approach. The method is a direct extension of binary relevance (BR), developed to address the issue of label correlations. In BR, labels are treated as independent classifiers; hence the algorithm ignores label inter-correlations. In contrast, CC models arrange the labels in a chain-like structure, allowing communication (i.e., sharing of predictions) among the underlying classifiers. The multi-label classification method has been shown to be very competitive, achieving better classification results compared to other classical MLC methods such as BR [9].
Although the CC algorithm has been widely applied to several applications [11], [12]-[17], the method suffers from a major setback, namely the label ordering issue [11], [12]. The conventional CC method adopts a random approach to label sequence ordering, but studies have shown that the random label sequence ordering may affect the performance of the classification method [11]. Attempts have been made to improve the original CC method, particularly to address the random label sequence ordering issue, with several CC extensions proposed. This work attempts to further improve the standard CC method using a new alternative approach. In this paper, a hybrid heuristic evolutionary-based technique is proposed. The proposed PSOGCC optimization technique is a combination of particle swarm optimization (PSO) and genetic algorithm (GA).
The contributions of this work are threefold. First, we propose an improved multi-label classifier chain method based on hybrid heuristic evolutionary techniques. Second, the proposed PSOGCC method is successfully demonstrated on standard benchmark multi-label datasets. Third, several conventional metrics are exhaustively employed to validate the performance of the proposed method against the standard BR and CC methods in terms of Accuracy, Precision, Recall, F-Measure, and Hamming loss.
The rest of this paper is organized as follows. Section 2 reviews related works, with a focus on the multi-label classifier chain method. Section 3 documents the experiment and method, and Section 4 presents the classification results. Section 5 concludes with directions for future work.

II. RELATED WORKS
MLC is an emerging, growing field in the area of machine learning and data mining. MLC methods and techniques have been applied to various application domains [4], [6], [18]-[20].
Specifically, there has been a growing number of works [11], [17], [21], [22] on implementing and improving the multi-label classifier chain method. As aforementioned, CC is an extension of the classical BR method. The classifier chain method improves on BR by taking label correlations into consideration. The method works by modeling a set of binary classifiers (the learning phase) based on the random label sequence order defined in the chain. Each learning algorithm is then used to predict a target label, taking into consideration the predictions of the preceding labels in the chain. Given a new observation (the prediction phase), the classifier makes its prediction (following the same procedure as in the learning phase) by combining all positive predictions (outputs) of the classifiers. The performance of CC is sensitive to the label sequence order, which may make it prone to "error propagation" along the chain. Several attempts have been made to overcome the limitations of CC.
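The learning-phase construction described above, where the classifier at each chain position sees the original features plus the labels earlier in the chain, can be sketched as follows (the function name and toy data are illustrative; the base learner is omitted):

```python
def chain_training_sets(X, Y, order):
    """Build the per-position training sets of a classifier chain.
    At position k in the chain `order`, the features are X augmented with
    the true values of the labels earlier in the chain (during training);
    at prediction time those values would be the preceding predictions."""
    sets = []
    for k, label_idx in enumerate(order):
        prev = order[:k]
        Xk = [x + [y[j] for j in prev] for x, y in zip(X, Y)]
        yk = [y[label_idx] for y in Y]
        sets.append((Xk, yk))
    return sets

# Hypothetical data: 2 instances, 1 feature, 3 labels
X = [[0.2], [0.9]]
Y = [[1, 0, 1], [0, 1, 1]]
# Chain order (2, 0, 1): predict label 2 first, then 0 (seeing 2), then 1.
sets = chain_training_sets(X, Y, order=[2, 0, 1])
```

Because each position's feature vector depends on which labels precede it, changing `order` changes every training set in the chain, which is why the label sequence order matters.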
In [23], an efficient label ordering approach was proposed for improving multi-label classifier chain accuracy. The proposed approach is based on exploiting semantic relationships among labels, and it achieved better accuracy compared to the original CC method. Also, a decision function based on a Bayesian network was proposed in [24] for the multi-label classifier chain. Similarly, [22] employed a Bayesian network based on conditional entropy for discovering label correlations and the order of labels in the chain classifier. The author in [25] proposed an improved classifier chain method based on conditional likelihood maximization; a k-dependence classifier chain with a label-specific function was developed, and the method was shown to be effective. A cost-sensitive CC method was proposed in [12] for selecting low-cost features in multi-label classification; the method combined the classifier chain with a logistic regression dimensionality reduction technique.
In this paper, a hybrid heuristic evolutionary-based technique is proposed for improving the performance of the multi-label classifier chain method. Heuristic techniques [26]-[30] are a set of intelligent self-learning algorithms developed to search for the optimum (best) solution to an optimization problem. Evolutionary-based heuristic methods are optimization algorithms that mimic natural biological processes in finding solutions to optimization problems. The most common and widely applied evolutionary-based optimization algorithms include: genetic algorithm, PSO, differential evolution, ant colony optimization, bee optimization, artificial immune system, cuckoo search, firefly algorithm, and tabu search.
The proposed technique applied in this work combines PSO and GA to find the global solution that best represents an optimized label sequence order in the chain. The genetic operators: selection, crossover, and mutation, were integrated into the basic PSO algorithm to improve the search process and to update and maintain the diversity of the population (of solutions). Details of the research methodology are presented in the next section.

A. PSO Algorithm
Particle swarm optimization (PSO) is a population-based heuristic algorithm developed by Eberhart and Kennedy for solving optimization problems. The heuristic algorithm was influenced by the social behavior of animal species, such as bird flocking and fish schooling. In the PSO algorithm (shown in Algorithm 1), a population entity called a particle is assigned a position and a velocity. A particle is a potential solution to a given problem. Each particle, represented as an n-dimensional vector, moves around in the solution space, adjusting its position and velocity at every iteration using (1) and (2), respectively. Each particle has memory and remembers its previous best position (pbest) based on its experience. The global best, represented as gbest, is the collective best position in the swarm. Each particle knows the global best and moves towards it. The performance of each particle (at every successive iteration) is measured using a fitness function.
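The updates referenced as (1) and (2) are, in the standard formulation, v = w·v + c1·r1·(pbest - x) + c2·r2·(gbest - x) and x = x + v. A minimal continuous-space PSO sketch follows, with all parameter values assumed for illustration rather than taken from the paper:

```python
import random

random.seed(0)  # for a reproducible illustration

def pso(fitness, dim=2, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
    """Minimal PSO minimizing `fitness`. Standard updates:
    v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)   # velocity, eq. (1)
    x = x + v                                          # position, eq. (2)"""
    X = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    gbest = min(pbest, key=fitness)[:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                X[i][d] += V[i][d]
            if fitness(X[i]) < fitness(pbest[i]):
                pbest[i] = X[i][:]
                if fitness(pbest[i]) < fitness(gbest):
                    gbest = pbest[i][:]
    return gbest

best = pso(lambda x: sum(v * v for v in x))  # minimize the sphere function
```

Note that this sketch optimizes a continuous toy function; PSOGCC instead encodes particles as label permutations, which requires the discrete operators described later.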

B. GA Algorithm
Genetic Algorithm (GA) is a global search optimization algorithm developed by Holland, based on the concept of natural selection adopted from the principle of Charles Darwin's theory of evolution. GA is one of the most important and successful evolutionary-based heuristic methods. The algorithm has been widely applied to several application problems [31]-[33]. The algorithm uses the genetic operators: selection (or reproduction), crossover (or recombination), and mutation, to find (or produce) the global best solution to a given problem.
The evolutionary-based algorithm works (refer to Fig. 1) by first generating a random initial population. At each generation, the quality of the individuals (candidate solutions) is validated using a defined fitness function. The selection operator is applied to identify (select) individuals from the current generation based on the best fitness values. The population is improved through the crossover and mutation operators until a new (better) population is created. The search ends when a termination criterion is met: the maximum iteration limit is reached or the best solution is found.
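A minimal GA sketch on the classic one-max problem illustrates the selection, crossover, and mutation cycle of Fig. 1; every parameter value here is assumed for illustration:

```python
import random

random.seed(1)  # for a reproducible illustration

def ga_onemax(n_bits=20, pop_size=30, gens=50, p_mut=0.05, tour=3):
    """Minimal GA maximizing the number of 1-bits (one-max):
    tournament selection -> one-point crossover -> bit-flip mutation."""
    fitness = sum
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(gens):
        def select():
            # tournament selection: best of `tour` random individuals
            return max(random.sample(pop, tour), key=fitness)
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(), select()
            cut = random.randrange(1, n_bits)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if random.random() < p_mut else b
                     for b in child]                   # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = ga_onemax()
```

The bit-string encoding is only a toy; PSOGCC applies the same operator cycle to permutations of label indexes, which calls for permutation-preserving crossover and mutation.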

III. METHODOLOGIES
In this paper, the experimental work comprises four phases: input (data), preprocessing, classification, and output (results).
The experimental work is carried out using five benchmark multi-label datasets from Mulan (an open-source library for multi-label classification problems). The standard datasets (in Table I) are among the most commonly experimented MLC datasets. The input data is preprocessed using the StringToWordVector filtering tool and term frequency-inverse document frequency (TF-IDF). These are among the standard preprocessing techniques often applied in machine learning problems. The preprocessed data is stored in ARFF (Attribute-Relation File Format), the standard file format for machine learning using Mulan and Weka.

The proposed PSOGCC method (shown in Fig. 2) is a hybrid of PSO and GA. The combined heuristic techniques are used to find the global best solution that best represents an optimized label sequence order in the chain classifier. PSO is an efficient, simple optimization algorithm, and GA is a powerful, robust global search algorithm. The genetic operators: selection, crossover, and mutation, are applied for population updates and the reproduction of new generations (individuals). In PSOGCC (as shown in Algorithm 2), the optimization algorithm takes as input a training set and produces as output an optimized label sequence, representing the global optimum solution found in the chain. The entire algorithmic process can be broadly categorized into two parts: the PSO loop (Steps 1-7) and the GA loop (Steps 9-20).
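The TF-IDF weighting used in preprocessing can be sketched as below; this uses the common idf = log(N/df) variant, which may differ in detail from Weka's StringToWordVector implementation, and the toy corpus is hypothetical:

```python
import math
from collections import Counter

def tfidf(corpus):
    """Compute tf-idf weights for a tokenized corpus.
    tf = raw term count within a document; idf = log(N / df),
    where df is the number of documents containing the term."""
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical tokenized documents
docs = [["label", "chain", "order"],
        ["label", "swarm"],
        ["label", "chain"]]
w = tfidf(docs)
```

A term that appears in every document (here "label") receives weight zero, while rarer terms are up-weighted, which is the point of the scheme for text features.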
In the first phase (PSO loop), a population of particles (also called individuals) is initialized randomly with positions P, velocities V, and swarm size N. Individual particles are represented as L-dimensional vectors (where L is equivalent to the number of predefined labels). The particles are encoded as integers representing label sequence indexes in the range [1, L]. Each particle's previous best position pbest is initialized with a copy of its current position P. The quality of the particles is assessed using the fitness function f(x) defined in (3). Subsequently, the global best gbest is initialized with the index of the best-fitted particle.

Algorithm 2: PSOGCC
Input: a training set D
Output: gbest (an optimized label sequence)
Step 1: Initialize the population of particles (potential candidate solutions representing the label sequences) with random positions P, velocities V, and swarm size N. Set each particle's previous best position to its current position (pbest = P)
Step 2: Given a training set D
Step 3: for all particles (label sequences) in the population do
Step 4: Build the classifier chain (CC) model (using standard k-fold cross validation)
Step 5: Compute the particle's fitness using the fitness function f(x) defined in (3)
Step 6: Update the population and set the best particle gbest to the best in the current population
Step 7: end for
Step 8: repeat
Step 9: Partition the training set D into a training part and a validation part
Step 10: for all particles (candidate label sequences) in the current population do
Step 11: Construct the CC model using the training part and the candidate label sequence
Step 12: Evaluate the fitness (quality) of the CC model using the validation part and the fitness function f(x) defined in (4)
Step 13: Apply the genetic operators:
Step 14: Select parents (particles with the best fitness values) using the tournament selection approach
Step 15: Generate new particles (children) from the old ones (parents) with the crossover operator
Step 16: Apply the mutation procedure (to the offspring)
Step 17: Update the particles and the population using the age-based elitism replacement approach
Step 18: end for
Step 19: until the termination criterion (maximum iteration limit) is reached
Step 20: return gbest (the optimum solution representing the ordered label sequence)

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 3, 2021

In the second phase (GA loop), the standard genetic operators: selection, crossover, and mutation, are applied. Classifier chain (CC) models are built and further evaluated using the fitness function f(x) defined in (4). The genetic tournament selection strategy [34] is applied to select the best individuals to be recombined for producing a new generation (offspring). The order crossover operation [35] is performed on the selected individuals, resulting in the generation of new individuals. Thereafter, the mutation operator is applied to the new individuals in order to avoid being trapped in local minima. The age-based elitism replacement approach [36] is employed to replace the old generation with the new one while preserving a small group (elite individuals) in the population. This helps to improve and maintain diversity in the population. The PSOGCC implementation ends when the termination criteria are met, and the global best solution, representing an optimized label sequence order in the chain classifier, is returned.
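The order crossover [35] and mutation steps must keep each individual a valid permutation of label indexes. A sketch under that constraint follows (implementation details are assumed for illustration, not taken from [35]):

```python
import random

random.seed(0)  # for a reproducible illustration

def order_crossover(p1, p2):
    """Order crossover (OX): copy a random slice from parent 1, then fill
    the remaining positions with parent 2's genes in their original order,
    so the child is always a valid permutation (label sequence)."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    fill = [g for g in p2 if g not in child]
    for i in range(n):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def swap_mutation(perm):
    """Swap two random positions -- keeps the sequence a permutation."""
    p = perm[:]
    i, j = random.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    return p

child = order_crossover([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
```

Plain one-point crossover would duplicate and drop label indexes, so permutation-preserving operators like these are the natural choice when individuals encode label sequence orders.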
c1 and c2 are control parameters (for balancing the trade-offs among a particle's pbest, gbest, position, and velocity); Acc represents the accuracy of the baseline classifier; N and k represent the population size and neighborhood size, respectively. Accuracy, Precision, Recall, Hamming loss, and F1 score are standard performance metrics.

IV. EXPERIMENTS AND RESULTS
This section details the experiments performed and the simulation results obtained. Five benchmark multi-label datasets with five conventional performance metrics were employed to validate the performance of the proposed PSOGCC method against the standard binary relevance (BR) and classifier chain (CC) multi-label classification algorithms. The classification results were compared using the following metrics. Accuracy (5) is a standard performance metric used to measure the correctly classified instances across data points; the higher the accuracy value, the better the classification algorithm. Precision (6), Recall (7), and F-Measure (8) are performance metrics often applied in classification problems to measure the degree of correctness of the positively classified instances. An effective classifier should have high precision, recall, and F1. Lastly, the Hamming loss (9) evaluation metric measures the degree of incorrectness, i.e., the fraction of labels wrongly predicted by the classification algorithm. In general, a good classifier is one with high accuracy, precision, recall, and F1, and a low Hamming loss value.
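The multi-label metrics just described can be sketched as below; these follow the common example-based definitions and Hamming loss formula, which may differ in detail from the paper's equations (5)-(9), and the toy matrices are hypothetical:

```python
def hamming_loss(Y_true, Y_pred):
    """Fraction of label positions predicted incorrectly."""
    n, m = len(Y_true), len(Y_true[0])
    wrong = sum(t != p for yt, yp in zip(Y_true, Y_pred)
                for t, p in zip(yt, yp))
    return wrong / (n * m)

def example_based(Y_true, Y_pred):
    """Example-based accuracy, precision, recall, and F1 for binary
    indicator matrices, averaged over instances."""
    acc = prec = rec = f1 = 0.0
    n = len(Y_true)
    for yt, yp in zip(Y_true, Y_pred):
        inter = sum(t & p for t, p in zip(yt, yp))   # correctly predicted 1s
        union = sum(t | p for t, p in zip(yt, yp))
        t_sum, p_sum = sum(yt), sum(yp)
        acc += inter / union if union else 1.0
        prec += inter / p_sum if p_sum else 1.0
        rec += inter / t_sum if t_sum else 1.0
        f1 += 2 * inter / (t_sum + p_sum) if t_sum + p_sum else 1.0
    return acc / n, prec / n, rec / n, f1 / n

# Hypothetical ground truth and predictions: 2 instances, 3 labels
Yt = [[1, 0, 1], [0, 1, 1]]
Yp = [[1, 0, 1], [0, 1, 0]]
```

On this toy pair, one of six label positions is wrong, so the Hamming loss is 1/6 while precision stays perfect, illustrating why the two metrics are reported together.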
The three multi-label classification methods: PSOGCC, BR, and CC, produced competitive results. In Table II, the proposed PSOGCC achieved the highest accuracy of 98.66% on the genbase multi-label dataset. It was closely followed by the CC method with 98.15% accuracy, while BR obtained 98.06%. From the accuracy results, it can be observed that the classifier chain (CC) outperformed the traditional BR algorithm; this is due to BR's limitation of ignoring label correlations. Also, the proposed PSOGCC heuristic method outperformed the other two methods due to its combined advantages of considering label correlations and finding an optimized label sequence order, thereby addressing the limitation of the original CC method (i.e., the random label sequence order in the chain).
Tables III to V present the experimental results in terms of precision, recall, and F-Measure, respectively. Consistently, the proposed PSOGCC optimization algorithm outperformed both the BR and CC multi-label methods. PSOGCC obtained the highest scores of 99.5% precision, 99.16% recall, and 99.33% F1. These results further demonstrate the effectiveness and superiority of the proposed method compared to the two classical methods, binary relevance and classifier chain.
Finally, Table VI shows the Hamming loss values of the three classification methods across the benchmark multi-label datasets. As aforementioned, the Hamming loss metric helps to check the frequency of misclassification by the classifier; a good classifier should have fewer misclassified labels (i.e., a low Hamming loss value). From the results, it can be seen that the proposed PSOGCC performed best compared to BR and CC. The method obtained the lowest Hamming loss value of 0.0011 on the genbase dataset. The original CC method came second with a Hamming loss of 0.0102, while BR performed worst (0.0121).
To present the classification results more clearly, the performance of the three MLC methods is plotted in Fig. 3 to 7. The comparisons show that the proposed PSOGCC performed better across the multi-label datasets, reflecting the significant influence of finding an optimized label sequence order in the chain classifier.

V. CONCLUSION
Single-label classification (SLC) involves predicting a single class (output) for a particular data instance (input), whereas in multi-label classification (MLC), the task is to develop predictive models capable of assigning multiple class labels simultaneously (to a single instance). In MLC, there are standard methods such as binary relevance (BR), classifier chain (CC), and label powerset (LP). There exist limitations with these methods, such as ignoring label correlations (associated with BR), complexity (associated with LP), and random label ordering (associated with CC). This study attempted to improve the predictive performance of the multi-label CC method. In this work, the randomized label sequence order issue of CC is addressed. To achieve this, the study proposed a hybrid heuristic evolutionary-based technique.
Heuristic techniques involve developing a set of intelligent self-learning algorithms designed to find the optimal solution to an optimization problem. In this paper, the PSOGCC multi-label classification method is proposed to extend the original CC method. The evolutionary-based algorithm is a combination of particle swarm optimization (PSO) and genetic algorithm (GA). The proposed PSOGCC method is used to find the global best solution representing an optimized label sequence order in the chain classifier. The genetic operators: selection, crossover, and mutation, were integrated with the basic PSO to optimize the search process.
The experiment was conducted using five benchmark multi-label datasets. Furthermore, five evaluation metrics were applied to validate the performance (predictions) of the proposed PSOGCC against the standard BR and CC methods. Results were presented in Tables II to VI in terms of accuracy, precision, recall, F-measure, and Hamming loss, respectively. The proposed PSOGCC achieved the overall best classification results: 98.66% accuracy, 99.5% precision, 99.16% recall, 99.33% F-measure, and a Hamming loss of 0.0011.
In future work, the proposed technique will be further validated using more multi-label datasets. It is also recommended to compare the performance of PSOGCC against other standard MLC algorithms. Finally, the research will be extended to employ other recent heuristic evolutionary-based techniques such as the bat algorithm, the whale optimization algorithm, and the firefly algorithm.