Neuro-fuzzy System with Particle Swarm Optimization for Classification of Physical Fitness in School Children

Physical fitness is widely recognized as a critical element of a healthy life. Sedentary behavior in school children is associated with health problems caused by physical inactivity. This article aims to classify physical fitness in school children using a database of 1813 children of both sexes, aged six to twelve years. Three physical tests (flexibility, horizontal jump, and agility) were used to classify physical fitness with neural networks and fuzzy logic. For this purpose, the ANFIS (adaptive network-based fuzzy inference system) model was used and optimized with the Particle Swarm Optimization (PSO) algorithm. The experimental tests showed an RMSE of 3.41 after 500 iterations of the PSO algorithm. This result is considered acceptable within the conditions of this investigation.
Keywords: Classification; ANFIS; particle swarm optimization; physical fitness; RMSE


I. INTRODUCTION
The World Health Organization (WHO) has highlighted key facts about sedentary behavior and its relationship to health problems caused by physical inactivity. Among them, a lack of physical activity is one of the top 10 risk factors for death and is also significantly related to other diseases [1]. In addition, a low level of physical activity is related to low levels of physical fitness, and Physical Fitness (PF) is a reliable indicator of health in childhood as well as in adulthood [2].
From this perspective, the evaluation of physical fitness is important. In general, physical fitness tests within the school education system are an important tool for measuring achievement of the learning standards associated with physical education. A test is understood as an instrument or procedure that measures an observable response, in this case physical fitness. Physical fitness can be measured through flexibility, jumping, and agility tests, taking into account age, Body Mass Index (BMI), maximal oxygen consumption, and Maximum Expiratory Flow (MEF), among others [3], using measuring instruments suited to the different types of physical fitness [4].
Some studies have classified physical activity and physical fitness with supervised machine learning (SML) algorithms [5] and neural networks [6]. Others have applied neuro-fuzzy systems, which combine fuzzy logic and neural networks [7], to monitor human physical activity [8], and computational intelligence techniques to evaluate anthropometric indices [9]. In these works, classification is carried out with the ANFIS model, and the authors propose fuzzy sets and fuzzy rules based on anthropometric indicators and physical fitness tests.
However, to date, no studies have been found that optimize these neuro-fuzzy models with an evolutionary algorithm in order to reduce the error when classifying physical fitness in school children. Evolutionary algorithms such as genetic algorithms, ant colony optimization, and particle swarm optimization could improve the results obtained by a physical fitness classification system.
The present work therefore asks whether implementing a hybrid neuro-fuzzy system (ANFIS) with Particle Swarm Optimization (PSO) reduces the classification error of physical fitness in school children.
To answer this question, a system was designed to classify the physical fitness of male and female school children between six and twelve years of age in the Arequipa region of Peru. The system takes as input attributes age, weight, height, BMI, maximal oxygen consumption, and MEF, and produces as output the classification low PF, standard PF, or high PF. Training and classification tests are carried out with the ANFIS neuro-fuzzy model in Matlab R2018a, using the genfis3 function with a Sugeno-type fuzzy model and the unsupervised fuzzy c-means (FCM) clustering technique. The FIS created from ANFIS is then evaluated, and the PSO algorithm is used to optimize the mean square error obtained in the classification.
After this introduction, the article is organized as follows: related work is discussed in Section II, the methodology in Section III, the experimental results in Section IV, and the conclusions and future work in Section V.

II. RELATED WORK
In recent years, classification has been widely studied using machine learning and neural network techniques. The first part of this review covers works based on these techniques.
In 2020, Cai et al. [10] proposed a machine learning prediction model for successful aging (SA) based on physical fitness tests. Four machine learning models (logistic regression, deep learning, random forest, and gradient boosting decision tree) were applied to develop the prediction models, and the analyzed sample size was 890. The accuracy and area under the curve of all four models were above 85%.
In [6], massive artificial neural networks were used to detect complicated patterns in vast amounts of input data and learn classification models. That paper compares several cutting-edge classification techniques for automatic recognition of activities across people in settings that vary widely in the amount of information available for analysis. Neural networks performed best, achieving 60% overall prediction accuracy.
505 | P a g e www.ijacsa.thesai.org
Shihabudheen and G. N. Pillai [11] noted that neuro-fuzzy systems are part of soft computing, a set of techniques that share robustness in handling the imprecise and uncertain information found in real-world problems. Soft computing techniques can be combined to exploit their respective strengths. ANFIS is a method for building the rule base of a fuzzy system by applying the backpropagation training algorithm to data collected from a process. Its architecture is functionally equivalent to a Sugeno-type rule base.
Particle Swarm Optimization (PSO) is a population-based metaheuristic that has been successfully applied to optimization problems. It is inspired by the social behavior of flocks of birds in flight and of schools of fish. The PSO algorithm was developed by Kennedy and Eberhart from a social metaphor [12], and it is based on the factors that influence the decisions of a particle that belongs to a set of similar particles. Each particle decides according to a social component and an individual component, which together determine how the particle moves to a new position in the solution space. The metaheuristic simulates this behavior to solve optimization problems.
Clustering can be defined as the process of grouping a set of abstract or physical objects into classes of similar objects. Clustering is an unsupervised learning technique, and a suitable clustering method should identify clusters that are both compact and well separated from each other, that is, with high intra-cluster similarity and low inter-cluster similarity [13]. The clustering methods used in ANFIS are subtractive clustering and fuzzy c-means (FCM) clustering. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers in a data set. It is implemented by the subclust function, and the genfis2 function uses it to generate a fuzzy inference system (FIS). FCM is a clustering method developed by Dunn in 1973 and improved by Bezdek in 1981. It determines the degree of membership of each data point in each of the predefined clusters, based on the distance from the data point to each cluster center, by minimizing an objective function. The genfis3 function uses this clustering method to generate a fuzzy inference system [14].
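To make the FCM idea concrete, the following is a minimal Python sketch of the textbook fuzzy c-means iteration. It is not the Matlab fcm routine that genfis3 calls. The function name fcm, the fuzzifier m = 2, the stopping tolerance, and the toy data are all assumptions chosen for illustration.

```python
import random


def fcm(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length tuples.

    Returns (centers, memberships). The fuzzifier m and the tolerance
    are illustrative assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([value / total for value in row])

    centers = []
    for _ in range(n_iter):
        # Centers: mean of the points weighted by membership**m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(len(points))]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))
        # Memberships: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        shift = 0.0
        for i, p in enumerate(points):
            dist = [
                max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
                for c in centers
            ]
            for k in range(n_clusters):
                new = 1.0 / sum((dist[k] / dj) ** (2.0 / (m - 1.0)) for dj in dist)
                shift = max(shift, abs(new - u[i][k]))
                u[i][k] = new
        if shift < tol:
            break
    return centers, u


# Toy one-dimensional data with two obvious groups (illustrative values only).
data = [(1.0,), (1.2,), (0.8,), (9.0,), (9.3,), (8.7,)]
centers, memberships = fcm(data, n_clusters=2)
```

Each row of memberships holds the degrees to which one data point belongs to every cluster, which is what distinguishes FCM from a hard assignment such as k-means.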
In [15], a multi-class support vector machine optimized by Particle Swarm Optimization was applied to identify five types of conventional human postures. The experimental results show an overall classification accuracy of 92.3% and an F-measure of up to 92.63%, indicating that the human activity recognition system is accurate and effective.
In 2019, Sivaram [16] proposed an advanced expert system that uses a Particle Swarm Optimization based Adaptive Network-Based Fuzzy Inference System to diagnose the physical constitution of the human body. The comparative results show that the proposed system (referred to as BSO-ANFIS) matches the physician's report better than the plain ANFIS system does.
In [17], ANFIS was trained with a modified PSO algorithm, and the proposed model was applied to identify a nonlinear dynamic system. ANFIS uses the least-squares method to calculate the error. The modified PSO algorithm removes the worst particle from the swarm and replaces it with two particles generated by a crossover operator from two parent particles: one selected at random and the other selected because it is the worst local best of its generation. The modified PSO algorithm improved the error of the original FIS. The idea of improving the error of a FIS is taken from that article: the present work uses the PSO algorithm to reduce the classification error for both training and test data.
In [18], the authors focused on the problem of recognizing physical activity, that is, developing a system that learns patterns from data in order to detect which physical activity a given user is carrying out. They propose a hybrid system that combines particle swarm optimization for feature clustering with genetic programming and evolutionary strategies for evolving a population of classifiers in the form of decision trees. This approach worked significantly well in the user-specific case.
In [9], to develop a valid prediction model, a hybrid approach was built that combines an adaptive network-based fuzzy inference system with particle swarm optimization (ANFIS-PSO) to predict changes in anthropometric indices, including waist circumference, waist-hip ratio, thigh circumference, and upper-middle arm circumference, in female athletes. The results of the ANFIS-PSO analysis were more accurate than those obtained with SPSS. The idea of using PSO to optimize the FIS error for the classification of physical fitness is taken from that article; the cost function to be optimized is the mean square error.
As described, the works reviewed differ in the classifiers they use for physical fitness and in how they use ANFIS and the PSO metaheuristic. The vast majority obtain acceptable precision when classifying physical activity. However, most of them were developed for other types of activities. For this reason, this work uses the ANFIS-PSO combination to classify physical fitness in school children, a topic little discussed in the literature. The objective is to validate whether these techniques also obtain good results in terms of RMSE.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 6, 2020

III. METHODOLOGY
The study was descriptive and cross-sectional. A sample of 1813 children was selected by stratified probabilistic sampling, comprising 988 boys and 825 girls of average socioeconomic status from public schools in the urban area of the city of Arequipa, Peru (2320 meters above sea level).
Internationally standardized protocols were used because they offer a higher degree of reliability for anthropometric and physical activity variables.
The study seeks to classify physical fitness from anthropometric data and physical tests in school children between 6 and 12 years of age in the Arequipa region, Peru, using a neuro-fuzzy system (ANFIS) whose mean square error (MSE) is optimized with the Particle Swarm Optimization (PSO) algorithm. The system is implemented in Matlab R2018a.
The methodology followed in this work consists of the stages shown in Fig. 1.

A. Understanding the Problem
The classification of physical fitness is performed on a database of 1813 records, each with four input attributes. Data preprocessing was performed first, cleaning and arranging the anthropometric data and the data from the following tests:
-Flexibility (cm): The flexibility of the dorsal-lumbar region was measured with the modified sit-and-reach test.
-Horizontal jump (cm): The horizontal jump was measured from the "kangaroo" position.
-Agility 10 x 5 m (s): Agility was evaluated with a 5-meter run repeated ten times, timed in seconds with a chronometer.
Tables I, II, and III show the results of the tests evaluated with their corresponding P5, P10, P25, P50, P75, P90, and P95 percentiles. The cut-off points were based on percentiles, similar to the NASPE batteries [19], in which values approaching the 90th percentile (P90) are rated as excellent and values at the 10th percentile (P10) as deficient, depending on the physical test or on the objective of the battery (health or physical performance). The following cut-off points are suggested for the diagnosis of physical fitness: <P10 = Deficient, P10 to P25 = Poor, P25 to P50 = Regular, P50 to P75 = Good, P75 to P90 = Very good, and >P90 = Excellent. For practical purposes, this work uses a three-level Physical Fitness (PF) classification: low PF (P < 25), standard PF (25 <= P < 75), and high PF (P >= 75).
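The three-level rule above can be written as a short Python sketch. The function names classify_pf and percentile_rank and the reference scores are hypothetical. The sketch also assumes that a higher score is better; for a timed test such as the agility run, the rank would presumably need to be inverted first.

```python
def classify_pf(percentile):
    """Map a percentile rank (0-100) to the three PF classes used in the text."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    if percentile < 25:
        return "low PF"
    if percentile < 75:
        return "standard PF"
    return "high PF"


def percentile_rank(value, reference):
    """Percentage of reference scores that are less than or equal to value."""
    return 100.0 * sum(1 for r in reference if r <= value) / len(reference)


# Hypothetical horizontal-jump scores in cm (not data from the study).
reference_scores = [100, 110, 120, 130, 140, 150, 160, 170]
label = classify_pf(percentile_rank(135, reference_scores))
```

The boundaries follow the text exactly: a percentile of exactly 25 falls in standard PF and a percentile of exactly 75 falls in high PF.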
The research work was implemented in Matlab R2018a using the genfis3 function and the PSO algorithm of [20], which was modified to suit the nature of the problem addressed in this article.

B. Selection and Data Processing
At this stage, the 1813 records were stored in a .mat file by means of a conversion script (.m). A .mat file is a Matlab structure for storing data, which in this case internally consists of Tables I, II, and III. For balancing, the records were first shuffled at random; 70% of them were then chosen for training and 30% for testing.
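The shuffle-then-split step can be sketched in Python as follows. This is an illustration only, not the authors' Matlab script. The function name shuffle_split, the fixed seed, and the integer placeholder records are assumptions.

```python
import random


def shuffle_split(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible (seed is an assumption).
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Placeholder records: integer ids standing in for the 1813 database rows.
train, test = shuffle_split(range(1813))
```

Shuffling before the cut prevents any ordering in the stored file, for example by age or school, from leaking into one of the two subsets.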

C. ANFIS Optimization by PSO
The Matlab Neuro-fuzzy Designer application can be used to implement the ANFIS network [21]. It is shown in Fig. 2.
The application consists of four distinct parts: load data, generate FIS, train FIS, and test FIS. Based on these stages, an app optimized by PSO was built that tries to improve this process; it is shown in Fig. 3 and detailed in the following subsections.

1) Load Data:
Unlike the Neuro-fuzzy Designer tool, this step incorporates data balancing. Fig. 4 shows the balancing process described in Section B.
The inputs are separated from the outputs because this facilitates the generation of the Fuzzy Inference System (FIS) in the next step.
After loading and balancing the data, there are 1069 records for training and 744 records for testing, each split into inputs and outputs.
2) Generate FIS:
For the generation of the FIS, fuzzy c-means (FCM) clustering was chosen because it yields a model that adapts better to the problem of classifying physical fitness in school children, where the ranges differ for each age. A 6-year-old child cannot be treated in the same way as a 12-year-old, because the percentiles of the anthropometric indicators change with age.
One alternative was to create seven Sugeno-type FIS models, one per age, with trapezoidal membership functions and linear output functions.
To avoid creating seven FIS models, an unsupervised fuzzy classification was chosen in which a single FIS model covers all the ages considered, that is, a fuzzy clustering that automatically infers the membership functions needed for school children.
The solution was to use fuzzy c-means clustering through Matlab's genfis3 function.
genfis3 generates a fuzzy inference system (FIS) from the supplied data using fuzzy c-means (FCM) clustering, extracting a set of rules that models the behavior of the data. The function requires separate sets of input and output data.
When there is only one output, genfis3 can be used to generate an initial FIS for ANFIS training. The rule extraction method first uses the FCM function to determine the number of rules and the membership functions for the antecedents and consequents.
Equation (1) shows the parameters taken by genfis3: Xin is the matrix of input data, Xout is the matrix of output data, type specifies the FIS type (Sugeno or Mamdani), cluster_n specifies the number of clusters, and fcmoptions holds the clustering options for the FCM algorithm.
fismat = genfis3(Xin, Xout, type, cluster_n, fcmoptions)    (1)
The input membership function type is 'gaussmf'. By default, the output membership function type is 'linear'. However, if the type is specified as 'Mamdani', the output membership function type is 'gaussmf'.
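To illustrate what a Sugeno-type FIS with Gaussian inputs and linear outputs computes, the following Python sketch evaluates one input vector against a small rule base. It is a simplified stand-in, not the structure returned by genfis3. The function sugeno_eval, the rule tuple layout, and the rule parameters are hypothetical, and the product AND with weighted-average defuzzification is an assumption about the inference settings.

```python
import math


def gaussmf(x, sigma, center):
    """Gaussian membership value of x for a given spread and center."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def sugeno_eval(inputs, rules):
    """Evaluate a first-order Sugeno system with one rule per cluster.

    Each rule is (mfs, coeffs, bias): mfs holds one (sigma, center) pair per
    input, and the rule output is the linear function coeffs . inputs + bias.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for mfs, coeffs, bias in rules:
        # Firing strength: product of the input memberships (assumed AND method).
        strength = 1.0
        for x, (sigma, center) in zip(inputs, mfs):
            strength *= gaussmf(x, sigma, center)
        output = sum(c * x for c, x in zip(coeffs, inputs)) + bias
        weighted_sum += strength * output
        weight_total += strength
    # Weighted average of the rule outputs (assumed defuzzification).
    return weighted_sum / weight_total


# Two hypothetical rules on a single input, with constant outputs 1 and 3.
rules = [
    ([(1.0, 0.0)], [0.0], 1.0),
    ([(1.0, 10.0)], [0.0], 3.0),
]
midpoint_output = sugeno_eval([5.0], rules)
```

In this toy rule base, an input halfway between the two centers fires both rules equally, so the output is the average of the two rule outputs. The antecedent (sigma, center) pairs and the linear coefficients are the kind of parameters that a training step then adjusts.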
The parameters for genfis3 are shown in Table IV.
3) Train FIS:
In PSO, each particle moves according to three sources of knowledge:
• Knowledge about the environment (its fitness value).
• Historical knowledge or previous experiences (memory).
• Historical knowledge or experiences of the individuals located in its neighborhood.
A PSO algorithm maintains a swarm of particles, where each particle represents a candidate solution to the problem. Particles fly through a multidimensional search space, and the position of each particle is adjusted according to its own experience and the experience of its neighbors.
PSO is initialized with a group of random particles (solutions) and then searches for optima by updating them over successive iterations. In each iteration, each particle is updated using two "best" values. The first is the best solution (fitness) that the particle itself has achieved so far, denoted Pbest. The other is the best solution obtained so far by any particle in the population, denoted Gbest. Each particle knows its own best value to date (Pbest) and the best value in the group (Gbest). The particles modify their positions using their current velocity and their distances to Pbest and Gbest. The adjusted velocity and the adjusted position of each particle are calculated with equations (2) and (3), given here in the standard inertia-weight form:
$$v_i^{t+1} = w\,v_i^{t} + c_1 r_1 \left(\mathrm{Pbest}_i - x_i^{t}\right) + c_2 r_2 \left(\mathrm{Gbest} - x_i^{t}\right) \qquad (2)$$
$$x_i^{t+1} = x_i^{t} + v_i^{t+1} \qquad (3)$$
Where:
$v_i^{t+1}$: Adjusted speed of particle $i$.
$w\,v_i^{t}$: Inertia of the movement itself, with inertia weight $w$.
$x_i^{t}$, $x_i^{t+1}$: Current and adjusted position of particle $i$.
$c_1$, $c_2$: Acceleration coefficients of the individual and social components.
$r_1$, $r_2$: Random numbers drawn uniformly from [0, 1].
In more detail, the PSO method proceeds as follows:
a) Initialize the population. The position of each particle is determined randomly.
b) Set the best previous position of each particle to its current position.
c) Evaluate each position with the fitness function to determine the quality of the solution.
d) Compare the fitness of the current position with that of the best previous position.
e) Assign informants (a neighborhood) of size k to the particle.
f) Determine the best particle in the neighborhood.
g) Adjust the speed.
h) Adjust the position.
i) Check whether the stopping criterion is met.
j) If it is not met, return to step e.
The values that PSO takes for an initial training optimization test are detailed in Table V. More tests are reported in the "Results" section to support the conclusions.
4) Test FIS:
For the evaluation of performance, the mean squared error (MSE) was used. The MSE of an estimator measures the average of the squared errors, that is, of the squared differences between the estimator and what is estimated. The MSE is a risk function, corresponding to the expected value of the squared-error (quadratic) loss. The difference arises from randomness or because the estimator does not take into account information that could produce a more accurate estimate.
The MSE is the second moment (about the origin) of the error and therefore incorporates both the variance of the estimator and its bias. For an unbiased estimator, the MSE is the variance of the estimator. Like the variance, the MSE has the units of the square of the quantity being estimated. By analogy with the standard deviation, taking the square root of the MSE yields the root mean square error or root mean square deviation (RMSE or RMSD), which has the same units as the estimated quantity. For an unbiased estimator, the RMSE is the square root of the variance, known as the standard deviation.
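The interplay between the PSO update and the MSE cost described in this section can be sketched in Python as follows. This is a simplified global-best variant in which the whole swarm acts as the neighborhood; it is not the modified algorithm of [20]. The function names and the values of w, c1, c2, the bounds, and the seed are assumptions. The toy problem fits a straight line to synthetic data rather than tuning FIS parameters.

```python
import math
import random


def mse(predicted, expected):
    """Mean of the squared differences between predictions and targets."""
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(expected)


def pso(cost, dim, n_particles=25, n_iter=200, w=0.7, c1=1.5, c2=1.5,
        bounds=(-10.0, 10.0), seed=0):
    """Minimise cost(position) with a global-best particle swarm.

    w, c1, c2, bounds and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + individual pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Position update.
                pos[i][d] += vel[i][d]
            current = cost(pos[i])
            if current < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], current
                if current < gbest_cost:
                    gbest, gbest_cost = pos[i][:], current
    return gbest, gbest_cost


# Toy problem: fit y = a*x + b to synthetic data generated with a=2, b=1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
best, best_mse = pso(lambda p: mse([p[0] * x + p[1] for x in xs], ys), dim=2)
rmse = math.sqrt(best_mse)
```

The swarm minimizes the MSE directly, and the RMSE is obtained afterwards as its square root so that the reported error has the same units as the target.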

IV. RESULTS
Table VI shows the training results for a population of 25 particles at different numbers of iterations. Fig. 7 compares the output (black) with the expected results (red) for the training run with the best result, obtained at 500 iterations.
Table VII shows that the cost and the RMSE decrease as the number of iterations increases up to 500, after which they grow again.

V. CONCLUSION
Based on the table of percentiles generated for the physical fitness tests, an application was created using ANFIS-PSO that processed the data for classification. The results in Tables VI and VII for the classification of the physical fitness of school children using ANFIS-PSO show that both the error and the cost decrease as the number of iterations increases, reaching an RMSE of 3.10 at 500 iterations. If the number of iterations continues to increase, the results deteriorate. This is attributed to overtraining: the model adjusts to very specific characteristics of the training data that are not causally related to the objective function.
In future work, this study can be extended to improve PSO training by increasing the swarm size and parallelizing the algorithm. It is also suggested that the age range be widened and that more physical tests be included. The use of fuzzy c-means made the problem of physical fitness classification more manageable.
This approach can save physical education teachers time when trying to evaluate large populations of children.
These results can be considered as a baseline to make future comparisons and observe changes over time.
Through particle swarm optimization, the neuro-fuzzy system can classify physical fitness validly and reliably, since the RMSE value of 3.10 is acceptable.