Is Deep Learning Better than Machine Learning to Predict Benign Laryngeal Disorders?

In otolaryngology, it is important to accurately understand the etiology of a laryngeal disorder, diagnose it early, and provide appropriate treatment. The objectives of this study were (1) to develop models for predicting benign laryngeal mucosal disorders based on deep learning, a naive Bayes model, a generalized linear model, a Classification and Regression Tree (CART), and random forest, using laryngeal mucosal disorder data obtained from a national survey, and (2) to identify the best classifier for predicting benign laryngeal mucosal disorders by comparing the prediction performance and runtime of the developed models. This study analyzed 626 subjects (313 people with a laryngeal disorder and 313 people without a laryngeal disorder). Deep learning was the best model, with the highest accuracy (0.84). However, the runtime of deep learning was 39 min 41 sec, more than 10 times that of CART (3 min 7 sec). The deep learning model confirmed that subjective voice problem recognition, pain and discomfort in the last two weeks, education level, occupation, mean monthly household income, high-risk drinking, and current smoking were major variables with high weight for the benign laryngeal mucosal disorders of Korean adults. Among them, subjective voice problem recognition was the most important factor, with the highest weight. The results of this study implied that the prediction performance of deep learning could be better than that of machine learning for structured data, such as health behavior and demographic factors, as well as for video and image data.
Keywords: Benign laryngeal mucosal disorder; voice disorder; deep learning; Naive Bayes model; generalized linear model


I. INTRODUCTION
Laryngeal disorders include organic dysphonia, caused by the structural changes (anatomical changes) of the larynx including the vocal cords, and functional dysphonia, which changes voice due to health risk behaviors (e.g., smoking or drinking) and improper habits (e.g., abuse or misuse of voice) [1]. In particular, benign laryngeal disorders refer to laryngeal disorders except for laryngeal cancer, a malignant tumor [2]. They are caused by abnormalities in the nervous system, mucous membranes, and cartilage [3], and they are frequently found in the adult population [4]. Benign laryngeal disorders include vocal polyp, vocal nodule, vocal cyst, Reinke's edema, vocal sulcus, vocal scar, contact granuloma, and laryngeal papilloma [5] [6].
The prevalence of laryngeal disorders was 6.6% in the American population [7]. Roy et al. (2005) reported that at least 1 in 10 Americans had experienced voice problems at least once in their lifetime [7]. There is not enough data regarding the prevalence of laryngeal disorders in South Korea. The Otolaryngology Examination Survey of the 2012 Korean National Health and Nutrition Survey reported that the prevalence of benign laryngeal disorders was approximately 2.5% in South Korea [8]. The prevalence of laryngeal disorders has been reported to be higher among men than women and among smokers than nonsmokers [9] [10]. It was also reported that the risk of laryngeal disorders was 1.4 to 1.6 times higher in managers, professionals, and service and sales workers than in economically inactive people [11] [12].
Voice is a critical function for maintaining daily life. In particular, it is directly tied to the livelihood of people in certain occupations, such as teachers, announcers, and singers. Consequently, discovering a laryngeal disorder early enough to maintain a healthy voice can greatly improve patients' quality of life [13,14,15]. Therefore, it is important in otolaryngology to accurately understand the etiology of a laryngeal disorder, diagnose it early, and provide appropriate treatment accordingly.
To date, the most commonly reported risk factors for benign laryngeal mucosal disorders are voice abuse and improper vocalization habits [16,17,18,19,20,21,22]. A wide range of other factors (e.g., smoking, drinking, viral infection, upper respiratory tract infection, and laryngopharyngeal reflux) have also been reported as risk factors [16,17,18,19,20,21,22]. However, since a disease results from complex interactions between multiple risk factors rather than from a single risk factor, predicting a disease by exploring only individual risk factors has limits [23]. To complicate matters, different treatments need to be given according to individual characteristics (habits) and etiology, even when the lesions of a laryngeal disorder on the vocal cord mucosa have a similar shape [24]. Consequently, it is important to fully understand the etiology of a benign laryngeal mucosal disorder and identify multiple risk factors of the disease in order to perform accurate diagnosis and treatment. Nevertheless, most studies that have evaluated the risk factors of laryngeal disorders have only tried to find individual risk factors using regression analysis [25,26,27,28,29], and only a few studies have explored the multiple risk factors of benign laryngeal mucosal disorders using machine learning [30].
Supervised machine learning has been used in recent years to detect diseases and identify multiple risk factors [31,32,33]. Many recent studies [34,35] have reported that neural network-based deep learning is more accurate in classifying and predicting diseases than machine learning. However, previous studies [36,37] mainly focused on developing classifiers for discriminating the presence of laryngeal diseases, mostly using video and image data, and there are not enough studies on developing models that predict benign laryngeal mucosal disorders while reflecting the various features collected in health surveys (e.g., health behavior, disease, and demographic characteristics). The objectives of this study were to develop models for predicting benign laryngeal mucosal disorders based on deep learning, a naive Bayes model, a generalized linear model, a Classification and Regression Tree (CART), and random forest, using laryngeal mucosal disorder data obtained from a national survey, and to identify the best classifier for predicting benign laryngeal mucosal disorders by comparing the prediction performance and runtime of the developed models.
112 | Page www.ijacsa.thesai.org
The remainder of this paper is organized as follows: Section II explains the data source, the measurements, and the development and validation of the prediction models. Section III compares the results of the developed machine learning models. Section IV discusses the findings. Lastly, Section V presents the conclusion and directions for future studies.

II. METHODS
A. Data Source
This study targeted adults (≥19 years old) who completed the 2012 Korea National Health and Nutrition Examination Survey (KNHANES) and participated in its otolaryngology examination. The KNHANES selects survey plots using proportional allocation systematic sampling: it stratifies administrative districts and types of residences across the country and draws samples in proportion to the population of the survey plots in each stratum. The primary subjects of this study were 4,528 adults (313 subjects with a laryngeal disorder and 4,215 subjects without a laryngeal disorder) who completed the health questionnaire, the otolaryngology questionnaire, and laryngeal endoscopy. Since the prevalence of laryngeal disorders was only 6.9% among these subjects, the data were imbalanced. This study resolved the imbalance with 1:1 propensity score matching on sex and age. The final analysis therefore included 626 subjects (313 people with a laryngeal disorder and 313 people without a laryngeal disorder).
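The 1:1 matching step can be illustrated with a minimal Python sketch. The study does not report how the propensity score was estimated, so the sketch substitutes a simpler, plainly different technique: greedy nearest-neighbor matching on age within the same sex, without replacement. The function name `match_one_to_one` and the toy records are hypothetical and are not KNHANES data. The last line recomputes the pre-matching prevalence from the counts given in the text.

```python
import random


def match_one_to_one(cases, controls, seed=0):
    """Greedy 1:1 matching: same sex, nearest age, without replacement.

    This stands in for propensity score matching, whose exact
    specification is not given in the paper (assumption).
    """
    rng = random.Random(seed)
    pool = list(controls)
    rng.shuffle(pool)  # avoid order effects among equally close controls
    pairs = []
    for case in cases:
        candidates = [c for c in pool if c["sex"] == case["sex"]]
        if not candidates:
            continue  # no same-sex control left; the case stays unmatched
        best = min(candidates, key=lambda c: abs(c["age"] - case["age"]))
        pool.remove(best)  # each control is used at most once
        pairs.append((case, best))
    return pairs


# Toy records (hypothetical, for illustration only).
cases = [{"id": 1, "sex": "M", "age": 45}, {"id": 2, "sex": "F", "age": 60}]
controls = [
    {"id": 10, "sex": "M", "age": 30},
    {"id": 11, "sex": "M", "age": 47},
    {"id": 12, "sex": "F", "age": 58},
    {"id": 13, "sex": "F", "age": 25},
]
pairs = match_one_to_one(cases, controls)
matched_ids = [(case["id"], ctrl["id"]) for case, ctrl in pairs]

# Prevalence before matching, from the counts reported in the text.
prevalence = 313 / 4528
```

Matching without replacement keeps the two groups the same size, which is what yields the balanced 313 versus 313 design described above.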
This study conducted a power analysis on the final analysis data using G*Power 3.1.9 (Universität Mannheim, Mannheim, Germany). With a power (1-β) of 0.95, a significance level (α) of 0.05, an effect size (f²) of 0.35, and 201 predictor variables, the appropriate sample size was 361. Therefore, the sample size of this study (626) satisfied the appropriate sample size for testing statistical significance (Fig. 1).

B. Variables
Benign laryngeal disorders [20] were defined in this study as vocal nodules, laryngeal polyps, intracordal cysts, Reinke's edema, laryngeal granuloma, glottic sulcus, and laryngeal keratosis (Fig. 2). The explanatory variables were occupation (economically inactive, non-manual, or manual), educational level (elementary school graduate or lower, junior high school graduate, high school graduate, or college graduate or higher), high-risk drinking (yes or no), income (quartile), smoking (current smoker, previous smoker, or non-smoker), skipped yesterday's breakfast (yes or no), skipped yesterday's lunch (yes or no), skipped yesterday's dinner (yes or no), dietary supplement consumption in the past year (yes or no), usual fluid intake (g), protein intake (g), fat intake (g), carbohydrate intake (g), calcium intake (g), sodium intake (g), sinusitis prevalence (yes or no), otitis media prevalence (yes or no), tinnitus prevalence (yes or no), depression for two consecutive weeks (yes or no), pain and discomfort in the last two weeks (yes or no), and subjective voice problem recognition (yes or no).

C. Development and Validation of Prediction Models
This study developed models for predicting benign laryngeal disorders using deep learning, a naive Bayes model, a generalized linear model, CART, and random forest, and compared their accuracy and runtime to assess prediction performance. Because the sample size was small (n=626), evaluating prediction performance with hold-out validation could reduce the reliability of the estimates. Therefore, this study carried out 5-fold cross-validation to evaluate the prediction performance (Fig. 3). The R code of the 5-fold cross-validation is shown in Fig. 4. When developing a model using a method with a random component (e.g., random forest), the seed was fixed to #0123456. This study defined the model with the highest accuracy as the model with the best prediction performance. When the accuracy was identical, the model with the shorter runtime was selected as the model with the best prediction performance. All analyses were performed using R version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria).
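The 5-fold procedure described above can be sketched as follows. This is not the R code of Fig. 4, which is not reproduced in the text. It is a minimal, standard-library Python illustration in which the helper names (`k_fold_indices`, `cross_validated_accuracy`), the seed value 123456 (a stand-in for the reported "#0123456"), and the majority-class baseline are all assumptions made for the example.

```python
import random


def k_fold_indices(n, k=5, seed=123456):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed makes the split repeatable
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def cross_validated_accuracy(X, y, fit, predict, k=5, seed=123456):
    """Average test-fold accuracy of a model given fit/predict callables."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


# Fold sizes for the study's sample size (n = 626).
fold_sizes = [len(test) for _, test in k_fold_indices(626)]

# Hypothetical majority-class baseline on balanced labels (313 vs 313);
# it ignores the features and only shows how the helper is called.
labels = [1] * 313 + [0] * 313
features = [None] * 626
fit = lambda X, y: max(set(y), key=y.count)
predict = lambda model, x: model
baseline = cross_validated_accuracy(features, labels, fit, predict)
```

Every subject is used for testing exactly once, which is why cross-validation gives a steadier accuracy estimate than a single hold-out split when n is small.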

D. Machine Learning Models
The decision tree is an algorithm that builds a tree-shaped learning model from the features of the data and reaches a final decision by splitting the data repeatedly. Because the decision tree expresses the analysis process as a tree-shaped graph, it helps a researcher understand and explain that process easily (Fig. 5). This study used CART as the decision tree algorithm, with the maximum tree depth set to 10, the minimum parent node size set to 50, and the minimum child node size set to 30.
The naive Bayes model classifies observations by using Bayes' theorem (Fig. 6). Bayes' theorem derives the posterior probability of a class for a given observation from a known prior probability.
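As an illustration of how a naive Bayes classifier turns a prior into a posterior, the following Python sketch handles categorical predictors with add-one (Laplace) smoothing. The smoothing, the function names, and the six toy records (two yes/no predictors loosely named after subjective voice problem recognition and current smoking) are assumptions for the example. The study's own model was fitted in R and its settings are not reported.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Count class and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)              # feature index -> observed values
    for row, label in zip(rows, labels):
        for j, v in enumerate(row):
            feature_counts[(j, label)][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values, alpha


def posterior(model, row):
    """Return P(class | row), assuming features are independent given the class."""
    class_counts, feature_counts, values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / total  # prior P(c)
        for j, v in enumerate(row):
            # smoothed likelihood P(x_j = v | c)
            p *= (feature_counts[(j, c)][v] + alpha) / (n_c + alpha * len(values[j]))
        scores[c] = p
    evidence = sum(scores.values())  # normalizing constant P(row)
    return {c: s / evidence for c, s in scores.items()}


# Toy data (hypothetical): (voice_problem, current_smoker) -> disorder (1) or not (0).
rows = [("yes", "yes"), ("yes", "no"), ("no", "yes"),
        ("no", "no"), ("no", "yes"), ("no", "no")]
labels = [1, 1, 1, 0, 0, 0]
model = fit_naive_bayes(rows, labels)
post = posterior(model, ("yes", "yes"))
```

The "naive" part is the product over features: each predictor contributes its own likelihood term, as if the predictors were independent within a class.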
Random forest is a decision tree-based ensemble method. It generates many random samples from the training data using the bootstrap (randomly drawing samples of the same size from the given data with replacement), trains an independent decision tree on each sample, and combines the results to create a final model (Fig. 7).
The generalized linear model is an extension of the linear model that can handle cases where the dependent variable does not satisfy the normal distribution assumption. In R, it is fitted with the glm() function. In other words, the generalized linear model expresses f(x), a transformation of the dependent variable, as a linear combination of the independent variables and the regression coefficients.
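For reference, the generalized linear model can be written out as below. The first line is the general form, where g is the transformation (link function) that the text denotes f. The second line assumes a binary outcome with the logit link, which is the usual choice for a presence/absence outcome. The text does not state which link the study used, so this is an illustrative assumption.

```latex
\[
  g\bigl(\mathrm{E}[y \mid x]\bigr) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j
\]
\[
  \mathrm{logit}(\pi) = \ln\frac{\pi}{1-\pi} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j,
  \qquad
  \pi = \Pr(y = 1 \mid x) = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_j\right)}}
\]
```

Here y is the dependent variable (disorder present or absent), x_1, ..., x_p are the independent variables, and the β terms are the regression coefficients.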
Deep learning is an algorithm composed of an input layer made up of the independent variables, an output layer made up of the dependent variable, and two or more hidden layers between the input and output layers. Nodes are arranged in each layer, and the nodes of adjacent layers are connected by weighted connections (connecting lines) (Fig. 8).
Among the various deep learning implementations, this study used H2O Deep Learning. H2O Deep Learning is a multilayer feedforward artificial neural network trained with gradient descent optimization and back-propagation.
In this study, the number of hidden layers was set to 2 (200 hidden nodes each), the default value, and the number of epochs (passes over the entire training dataset) was set to 10. H2O Deep Learning provides Tanh, Tanh with dropout, Rectifier, Rectifier with dropout, Maxout, and Maxout with dropout as activation functions. This study used Rectifier, the default, as the activation function. The code of H2O Deep Learning is presented in Fig. 9.
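The architecture described above (an input layer, two hidden layers of 200 rectifier units, and an output layer) can be sketched as a plain forward pass. This is not the H2O code of Fig. 9. It is a standard-library Python illustration with untrained random weights. The input width of 21 (one value per listed explanatory variable, ignoring any categorical encoding) and the softmax output are assumptions made for the example.

```python
import math
import random


def relu(v):
    """Rectifier activation: max(0, x) applied element-wise."""
    return [max(0.0, x) for x in v]


def softmax(v):
    """Convert output scores to class probabilities."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]  # subtract max for numerical stability
    s = sum(exps)
    return [e / s for e in exps]


def dense(inputs, weights, biases):
    """Weighted sum of inputs plus bias for each node in the layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Propagate one observation through hidden layers, then the output layer."""
    *hidden, output = layers
    a = x
    for w, b in hidden:
        a = relu(dense(a, w, b))
    w, b = output
    return softmax(dense(a, w, b))


rng = random.Random(0)
n_features = 21  # assumed input width, for illustration
layers = [init_layer(n_features, 200, rng),  # hidden layer 1
          init_layer(200, 200, rng),         # hidden layer 2
          init_layer(200, 2, rng)]           # output: disorder vs no disorder
x = [rng.random() for _ in range(n_features)]
probs = forward(x, layers)
```

Training would adjust the weights by back-propagating the prediction error over repeated epochs. The sketch stops at the forward pass to keep the layer structure visible.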

III. RESULTS
A. Comparing the Accuracy and Runtime of Benign Laryngeal Mucosal Disorder Prediction Models
The accuracies of the five models (deep learning, naive Bayes model, generalized linear model, CART, and random forest) for predicting benign laryngeal mucosal disorders are presented in Fig. 10. Deep learning was the best model, with the highest accuracy (0.84). The runtimes of the five models are presented in Fig. 11. CART showed the shortest runtime (3 min 7 sec).
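The runtime comparison can be checked with a few lines of arithmetic. The sketch below uses the CART runtime reported here and the deep learning runtime reported in the abstract and discussion (39 min 41 sec). The helper name `to_seconds` is hypothetical.

```python
def to_seconds(minutes, seconds):
    """Convert a runtime given as minutes and seconds to seconds."""
    return 60 * minutes + seconds


deep_learning = to_seconds(39, 41)  # runtime of deep learning
cart = to_seconds(3, 7)             # runtime of CART

# How many times longer deep learning took than CART.
ratio = deep_learning / cart
```

The ratio comes out between 12 and 13, so deep learning took more than ten times as long as CART to develop.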

B. Predictors of Benign Laryngeal Mucosal Disorders in Korean Adults
The normalized importance of the variables in the deep learning model, the final model, is presented in Fig. 12. This model confirmed that subjective voice problem recognition, pain and discomfort in the last two weeks, education level, occupation, mean monthly household income, high-risk drinking, and current smoking were major variables with high weight for the benign laryngeal mucosal disorders of Korean adults. Among them, subjective voice problem recognition was the most important factor, with the highest weight.

IV. DISCUSSION
This study compared models for predicting benign laryngeal mucosal disorders in Korean adults. Deep learning showed the best prediction performance among deep learning, the naive Bayes model, the generalized linear model, CART, and random forest. The runtime of deep learning was 39 min 41 sec, more than 10 times that of CART (3 min 7 sec). However, the accuracy of deep learning was higher (≥6%) than that of the other machine learning models. These results agree with previous studies [36,43], which reported that deep learning outperformed ensemble-based machine learning methods (e.g., light gradient boosted machine and extreme gradient boosting) in predicting laryngeal disorders from video, image, and speech data. The results of this study imply that the prediction performance of deep learning could be better than that of machine learning for structured data, such as health behavior and demographic factors, as well as for video and image data. However, since far fewer machine learning studies use epidemiological data than use video, image, and speech data, additional studies are needed to confirm the superior prediction performance of deep learning on epidemiological data such as health surveys.

V. CONCLUSION
The results of this study suggested that deep learning could outperform other machine learning methods when a multi-modal model for predicting benign laryngeal mucosal disorders is developed in the future from various data, such as image data, demographic factors, and health behavior. To establish the prediction performance of deep learning models built on epidemiological data, it will be necessary to compare the accuracy and runtime of models using data on various diseases.