Mining Educational Data to Analyze the Student’s Performance in TOEFL iBT Reading, Listening and Writing Scores

Abstract: Student scores in TOEFL iBT reading, listening, and writing may reveal weaknesses and deficiencies in educational institutions. Traditional approaches and evaluations are unable to disclose the significant information hidden inside students' TOEFL scores. As a result, data mining approaches are widely used in a range of fields, particularly education, where the practice is known as Educational Data Mining (EDM). EDM is an approach for handling research issues in student data, and it can be used to investigate previously undetected relationships in a large database of students. This study used EDM to identify the factors that influence students' achievement and to draw observations using advanced algorithms. It explored the relationship among university students' previous academic experience, gender, test place, and current course attendance within a sample of 473 students (225 male and 248 female). Educational specialists must find out the causes of drops in students' TOEFL scores. The results showed that the model could be suitable for investigating important aspects of student outcomes. The analysis used the Statistical Package for the Social Sciences (SPSS V26) for both descriptive and inferential statistics, together with multiple linear regression, with the aim of helping students improve their scores.


I. INTRODUCTION
Over the last decade, test developers and experts have devoted much of their time and attention to developing a theoretical view of language ability in order to better understand the nature of language proficiency, and to developing and applying more sophisticated statistical tools for analyzing language tests and test takers' performance [1]. However, language testing research shows that language aptitude is not the only factor influencing test takers' performance. Almost all screening processes in academic environments, from seeking college admission to applying for an exchange student program, require the applicant to present TOEFL iBT or other standardized English language test scores.
Language testing, including the TOEFL iBT (Test of English as a Foreign Language), is largely concerned with whether the results clearly and effectively reflect test takers' underlying ability in a certain area in a given testing setting [2]. After graduation, English proficiency is necessary for developing career options and attaining aspirational goals in the workplace [3]. A recent survey study commissioned by the Educational Testing Service (ETS) found a strong link between high English proficiency and the income of young professionals (full-time workers in their 20s or 30s) across all major industries. This higher income allows them to put more money into improving their English abilities, which are "a vital instrument for success in today's world". Characteristics that test takers bring to the testing situation, such as education level, gender, and place, can all affect their performance [4]. These construct-irrelevant elements are regarded as potential causes of test bias, which might cause the results to be unrepresentative of the underlying skill that a language test is attempting to assess. As a result, a thorough assessment of the likely effects of such factors is worthwhile.
Taking these factors into account, along with the worldwide popularity of the TOEFL iBT as a proficiency exam, this study aims to determine the effects of test takers' education level, gender, and place on TOEFL iBT listening, reading, and writing results.

II. LITERATURE SURVEY
Test fairness is a challenging topic in the language testing literature. Debates about test fairness aim to create tests free of discrimination and to contribute to testing equity [5,6]. When students with the same language ability perform differently on a test, the test may be considered discriminatory. When the content of a test disadvantages test takers from certain groups, criteria such as education level, gender, and test place play a role. A test's requirements may affect test takers from different groups differently, so test taker factors such as education level, gender, and place can all contribute to test bias.
These factors can undermine a test's validity and lead to measurement error. Consequently, reducing the impact of factors that are not part of language competence is a top objective in the design and development of language exams [7].
The association between TOEFL score and GPA was shown to be positive and statistically significant; however, it was weaker for engineering students than for students in other fields, and weaker for engineering courses than for non-engineering courses. In logistic regressions of CAE pass rate and graduation rate, the TOEFL score was also statistically significant, with a higher TOEFL score associated with an increased probability of success. However, model goodness-of-fit values were low, showing that many students defied the overall trends in their performance [8].
According to a previous study, a mixed ANOVA was used to answer the following questions: Is there a significant difference between pre- and post-TOEFL test scores for male and female students? Is there an interaction between male and female students' pre- and post-TOEFL test scores? The findings showed a substantial change between pre- and post-TOEFL scores, but no significant variation between genders. Furthermore, no interaction was found between male and female students' pre- and post-TOEFL test scores [9].
Consistent with past research, one study found a relationship between international students' academic performance and their language skills, academic self-concept, and other factors that influence academic achievement. The research looked at first-year international students enrolled in undergraduate business programs at a Canadian English-medium institution. The following data were gathered on the students: grades in degree program courses, annual GPA, and EPT scores (including subscores).
Students also filled out an academic self-concept measure. In addition, instructors in two obligatory first-year business courses were interviewed about the academic and linguistic requirements of their courses and the profile of successful students, to gather further information about success in first-year business courses [10].
Another study sought to determine whether there was a significant difference in the ability of male and female students to answer factual and vocabulary-in-context questions on a TOEFL-like reading comprehension test. Reading comprehension test results from twenty-one male and twenty-one female students in the English Education Program were used for secondary data analysis. The samples were chosen by random sampling, and the data were analyzed with an independent-samples t-test [11].
A further study examined university students' self-efficacy in responding to TOEFL questions in relation to gender and participation in TOEFL courses. It used a descriptive design with a total sample of 200 university students from two large institutions, majoring in both English and non-English subjects [12].

III. PROPOSED METHODOLOGY
After reviewing the data and determining the research aim and objectives, this paper uses data mining approaches to examine the effects of characteristics such as education level, attendance, and gender on students' scores in TOEFL iBT reading, listening, and writing. The dataset and data preparation procedures are discussed below.

A. Dataset
The data for this study came from 473 students, all of whom took the TOEFL: 225 male and 248 female (Table I). Arabic is one of their first languages.

B. Data Preparation
Data preparation covered all activities needed to construct the final dataset (the data entered into the design tool) from the raw data. The dataset's variables were prepared to generate the models needed in the next phase.
The students were taught a variety of English language skills, including a TOEFL preparation session, during a rigorous English language program. Their TOEFL scores were used as the research instrument. At the end of the course, students took the TOEFL (paper-based test). Students were in class for five hours a day and were given TOEFL-related assignments. Listening, grammar/structure, and reading are the three skills that make up the TOEFL score, which ranges between 310 and 677. This study aims to determine the effects of test takers' education level, gender, place, and attendance on TOEFL iBT listening, reading, and writing results. Fig. 1 depicts a framework for predicting student success. First, the data on student performance are fed into the system. The dataset is then preprocessed to eliminate noise and make it more consistent. Next, the input dataset is subjected to various SPSS statistical analyses and the data analysis is carried out. Finally, the classification results of the different algorithms are compared.

IV. MODEL AND ALGORITHM
Gender is another factor that is frequently studied, but there is a lack of good research identifying whether male and female language learners have significantly different TOEFL results. From a psychological standpoint, there are numerous variables related to gender [13]. In general, females are believed to be more successful in language learning than males, so many scholars in language acquisition have studied how gender disparities can affect students' language learning proficiency. For instance, ten studies found that female students were superior to male students in reading comprehension, whereas five studies found that male students were superior. Other work [14,15] examined quantitatively whether there are gender differences in TOEFL scores and found no significant differences. The Educational Testing Service (ETS), on the other hand, came to a different result.
According to its survey, female students are more advanced than male students [16]. Females, for example, outperformed males in writing and reading, though the difference was minor. Male students, on the other hand, performed better in listening and comprehension, as well as in vocabulary proficiency [17].
Additionally, a standardized English language assessment examination, such as the Test of English as a Foreign Language (TOEFL), is required at most English-language colleges and universities. However, because there are few standardized evaluation measures for all candidates, English proficiency scores are occasionally used for purposes other than evaluating the "abilities of non-native English speakers to use and understand English." In the absence of standard ranking techniques for all candidates, the TOEFL score may serve as a stand-in for such criteria, and it is occasionally used as a predictor of how well a prospective student will perform at a university. Even when the TOEFL is not used as the main measure of academic success, minimum TOEFL score requirements are frequently enforced.
Although the underlying English-language communication abilities that TOEFL scores represent may be significantly more important to academic performance in some areas than in others, TOEFL score minimums for admission frequently do not vary among academic majors or fields of study. Requiring the same minimum TOEFL score regardless of a student's selected major may lead to the exclusion of otherwise talented students from academic programs where academic achievement is not contingent on language competence [18]. For example, a higher TOEFL score is less strongly correlated with academic success for engineering students than for other college students (possibly because English communication skills largely determine academic success in those other areas). It may therefore be reasonable to make TOEFL score entry requirements more lenient for engineering applicants, especially those who can show enough preparation through means other than a TOEFL score.
Although course enrollment has tripled in the past 10 years, little is known about the impact of test environment and attendance on learning. According to a recent study of college students, course attendance and test place have an impact on examination scores. Therefore, differences in student achievement between groups should be viewed with caution. This study adds to the body of knowledge by addressing a recurring problem of earlier research: determining the impact of various classroom test conditions on exam scores. The features of test environments are rarely described in previous research. This study compares test scores from students who took examinations off-campus with test scores from students who were called back to school for probationary exams within a semester [8].

V. EXPERIMENTS AND RESULTS
The analysis in this paper was done using the Statistical Package for the Social Sciences (SPSS V26) for both descriptive and inferential statistics. ANOVA was used as the statistical analysis method: because this study examines the significance of group differences, it uses an ANOVA model with a continuous dependent variable (TOEFL scores) and categorical independent factors.
Because this study examines the interaction between gender and pre- and post-test scores, a mixed ANOVA is the most appropriate of the several varieties of ANOVA [19]. Pre- and post-TOEFL scores are the within-subject factor, while gender (male or female) is the between-subject factor. To address the first study question, the mean difference between pre- and post-TOEFL scores is tested for statistical significance. The mean difference between male and female TOEFL scores is then tested, and finally the effects are compared between males and females. Table II provides descriptive statistics.
Furthermore, the results of the multiple regression are reported in Table IV. All variables have a significant positive effect on the total score (p < 0.001); as a result, the null hypothesis is rejected and the alternative hypothesis is accepted. The assumptions of this study were examined using multiple regression analysis, and the impact of demographic variables on the students' overall scores is studied in this section.
Finally, normality was assessed with skewness and kurtosis tests to choose between parametric and nonparametric testing (Table VI) [20]. In Table VI, the skewness and kurtosis values for the score were within the range of ±2, indicating that the total score was approximately normally distributed.
First hypothesis: there is a significant difference in total scores regarding the gender of the students. The independent-samples t-test is the appropriate parametric test because gender is a categorical variable with two independent categories.
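The ±2 skewness and kurtosis screen used for the normality decision can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function names are hypothetical, the simple moment estimators are used (statistical packages commonly apply small-sample corrections, so their values can differ slightly), and the sample scores are made up rather than taken from the study.

```python
from statistics import fmean


def skewness_and_excess_kurtosis(scores):
    """Return (skewness, excess kurtosis) from simple central moments."""
    n = len(scores)
    mean = fmean(scores)
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurtosis


def within_normal_range(scores, limit=2.0):
    """Rule of thumb: both statistics inside +/-limit suggests rough normality."""
    skew, kurt = skewness_and_excess_kurtosis(scores)
    return abs(skew) <= limit and abs(kurt) <= limit


# Hypothetical total scores, for illustration only.
example_scores = [430, 455, 462, 470, 481, 487, 495, 510, 523]
print(within_normal_range(example_scores))
```

If the screen fails, a nonparametric alternative would normally be considered instead of the t-test or ANOVA.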
Table VII gives descriptive statistics of the total score for each category. Fig. 3 shows that the average score of females (487.49) was greater than that of males (462.04).
In addition, Levene's test for equality of variances indicated that the variances were equal (F = .449, p > 0.05). In Table VIII, the results of the independent-samples t-test show a significant difference in total scores between males and females, since the p-value is less than 0.05 (t = −3.961, p < 0.001).
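The two-step procedure described here (Levene's test for equal variances, followed by a pooled-variance t-test) can be sketched in standard-library Python. This is a minimal illustration, not the SPSS computation the study ran: the function names are hypothetical, the Levene statistic is the mean-centered variant, and only the test statistics and degrees of freedom are returned. The p-values would be read from the F and t distributions in a statistics package.

```python
from math import sqrt
from statistics import fmean, variance


def levene_statistic(*groups):
    """Levene's W from absolute deviations around each group mean.

    Returns (W, (df_between, df_within)); W is referred to an F distribution.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean.
    z = [[abs(x - fmean(g)) for x in g] for g in groups]
    z_means = [fmean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


def pooled_t_statistic(a, b):
    """Student's t for two independent samples, assuming equal variances."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (fmean(a) - fmean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical scores, for illustration only (not the study's data).
male = [440, 455, 462, 470, 483]
female = [465, 480, 487, 495, 508]
print(levene_statistic(male, female))
print(pooled_t_statistic(male, female))
```

With the first group entered as male and the second as female, a negative t corresponds to a higher female mean, which matches the sign convention of the reported statistic.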
Second hypothesis: there is a significant difference in total scores regarding the students' attendance. Since attendance is a categorical variable with more than two independent categories, the suitable parametric test is the analysis of variance (ANOVA) test.
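For a factor with more than two categories, the one-way ANOVA F statistic compares between-group and within-group variability. The sketch below uses only the standard library; the function name and the sample groups are hypothetical, and the p-value would still need to be read from an F distribution with the returned degrees of freedom.

```python
from statistics import fmean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical total scores for three attendance categories (illustration only).
low = [450, 470, 480]
medium = [455, 472, 490]
high = [460, 475, 485]
print(one_way_anova(low, medium, high))
```

A small F relative to its degrees of freedom indicates that the group means differ little compared with the spread inside each group.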
Table IX gives descriptive statistics of the total score for each category. In Table X, the results of the ANOVA test show no significant difference in total scores across attendance categories, since the p-value is greater than 0.05 (F = .151, p > 0.05).
Third hypothesis: there is a significant difference in total scores regarding the place of the test. Since the place of the test is a categorical variable with two independent categories, the suitable parametric test is the independent-samples t-test. Table XI gives descriptive statistics of the total score for each category. In Table XII, Levene's test for equality of variances reveals that the variances were equal (F = .475, p > 0.05). The results of the independent-samples t-test show a significant difference in total scores between Cairo and Sheikh Zayed, since the p-value is less than 0.05 (t = −2.848, p < 0.01).
Fourth hypothesis: there is a significant difference in total scores regarding the level of education. Since the level of education is a categorical variable with more than two independent categories, the suitable parametric test is the analysis of variance (ANOVA) test. Table XIII gives descriptive statistics of the total score for each category. Fig. 6 shows that students' average scores differed across levels of education. In Table XIV, the results of the ANOVA test show a significant difference in total scores across levels of education, since the p-value is less than 0.05 (F = 8.407, p < 0.001).
Fifth hypothesis: there is a significant difference in the TOEFL parts (Listening, Grammar, and Reading) regarding gender. Table XV gives descriptive statistics of the TOEFL parts for each category. Levene's test for equality of variances was then conducted. For listening, the variances were unequal (F = 7.566, p < 0.01), whereas for grammar the variances were equal (F = .007, p > 0.05), and the same holds for reading (F = 1.870, p > 0.05). The results of the independent-samples t-test show a significant difference in listening scores between males and females, since the p-value is less than 0.05 (t = −3.082, p < 0.01). There is also a significant difference in grammar scores between males and females (t = −3.900, p < 0.001). Finally, there is a significant difference in reading scores between males and females (t = −3.716, p < 0.001); see Tables XVI and XVII.
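Where Levene's test indicates unequal variances, as reported above for listening, the usual alternative to the pooled t-test is Welch's t-test, which does not assume equal variances. The source does not state which variant was reported for listening, so the following is only a sketch of that alternative; the function name and the sample data are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def welch_t_statistic(a, b):
    """Welch's t and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    # Squared standard error contributed by each group.
    se1, se2 = variance(a) / n1, variance(b) / n2
    t = (fmean(a) - fmean(b)) / sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical listening scores with visibly different spreads.
male_listening = [44, 45, 46, 47, 48]
female_listening = [40, 46, 50, 54, 58, 62]
print(welch_t_statistic(male_listening, female_listening))
```

The degrees of freedom are generally fractional and never exceed those of the pooled test, which makes the test more conservative when the group variances differ.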
In Tables XVI and XVII, the independent-samples t-test is the suitable parametric test because the gender of the students is a categorical variable with two independent categories.
This study looked at the TOEFL results of 473 students in relation to how much time they spend studying, their educational level, gender, course attendance, and place. As expected, TOEFL scores improved from pre-test to post-test, and the change was statistically significant. There were significant differences by educational level, gender, attendance, and place. Furthermore, there was a relationship between male and female students' pre- and post-TOEFL scores. The study's findings therefore offer students useful information. TOEFL educators can also suggest that the more time a student devotes to learning, the higher their TOEFL score will be. The findings also help program designers with class design by giving them a sense of what students who are being prepared for the TOEFL could expect. Because many students apply to universities each year, generalizing these TOEFL scores to the general population is insufficient.