JEPPY: An Interactive Pedagogical Agent to Aid Novice Programmers in Correcting Syntax Errors

Programming is a complicated task and correcting syntax error is just one among the many tasks that makes it difficult. Error messages produced by the compiler allow novice learners to know their errors. However, these messages are puzzling, and most of the times misleading due to cascading of errors, which can be detrimental to running a syntax-error free program. In most laboratory setting, it is the role of the teachers to assist their students while doing activities. However, in our experienced, considering the large number of students in a class, it may seem difficult for teachers to assist their students one-byone given the time constraints. In this paper, the design and implementation of an interactive pedagogical agent named JEPPY is presented. It is intended to assist novice learners learning to program using C++ as a programming language. In order to see on how students struggle or progress in dealing with errors, the proponents implemented the Error Quotient (EQ) developed by Jadud. The principles of the cognitive requirements of an agentbased learning environment were followed. The agent was put into test by novice learners in a laboratory setting. Logs of interaction between the embodied agent and the participants were recorded, aside from the compile errors and edit actions. These mechanisms show us some insight on the interaction behavior of learner to the agent. Keywords—Pedagogical agent; error quotient; syntax-error correction; compile errors; human computer interaction


I. INTRODUCTION
Computer programming is a complicated task. According to Jenkins [1], Programming "is a complicated business" which requires the mastery of several skills such as problem solving, abstraction, mathematical logic and testing, debugging and so forth. In addition to this, in case of novice programmers, knowledge was found to be limited and shallow hence they lack the ability to write syntactically-correct programs. Over the years, several studies have been conducted to look at how compiler errors have affected the learning curve of students learning to program, particularly, novice programmers. Becker [2] showed that compiler errors can be frustrating and students in his study described them as "barriers to progress". In addition, Denny, Luxton-Reilly and Tempero [3] showed that students have difficulties locating and correcting syntax errors using average compiler. Moreover, Kummerfeld and Kay [4] concluded that even the more experienced students took significant time to correct some syntax errors. Studies have been conducted to understand interaction of learners to the compilers. Separate studies of Jadud [5] and Becker [6] showed a metric in quantifying these repeated errors. Jadud [5] called the interaction of the learners to the compiler as "compilation behaviour" and called the metric as the Error Quotient. To support learners in dealing cryptic messages produced by compiler, Ahmed, Kumar, Karkare, Kar, and Gulwani [7] developed a system called TRACER (Targeted RepAir of Compilation Errors) that perform repairs on compilation errors. In a study of Becker, Goslin, and Glanville [8], an enhancement to JAVA compile error messages was made and employed for intervention. Comparison between control and intervention groups showed that enhancing compiler messages is of advantage.
For environment of practice by novice programmers such as that in a laboratory setting, the current methods and tools employed focused on identifying behaviors using the online protocols and browser to inform teachers who among their students struggles and then provide manual intervention if necessary. However, considering the number of students in a classroom it is not realistic that the teacher can always assist the entire class for every laboratory session given some time constraints. This issue motivated the researchers in this study to come up another approach to augment the problem.
This study made an attempt in employing an embodied agent and see its potential use to aid novice programmers in their battle over syntax errors. This can help mentors attend to several other skills to consider in teaching programming, rather than focusing on assisting compile errors correction. However, skills like problem solving and logical reasoning were not yet addressed in this study and learning on that aspects requires different measures to help novice learners.
In this paper, the proponents presented the design and implementation of an interactive pedagogical agent which will be used as a tool to assist novice programmers in the daunting task of correcting syntax error produced by the compiler. Moreover, the proponents look into the interaction of the learner to the agent along with their interaction to the compiler. This can give us insights on the improvement of the agent and to the target benefit at which the agent was employed.

II. REVIEW OF RELATED WORK
There were studies that looked at how novice learners interact with the compiler while practicing programming. Jadud [9] define novice compilation behavior as the study of students" interaction with their compiler while learning to program. In his study, Jadud [3] developed a quantification of the student's compilation behavior based on grounded theory. He called it the error quotient or EQ. Every record in the data logs represents one compilation event. Stored in each record is the error message if there was an error at the time of compilation, the location of the error in the file which is reported by the compiler as a line number, and the source code. An EQ score is of the range 0 to 1.0, where 0 is a perfect score. An EQ score of 0 does not mean that the student made no syntax errors in their programming process. What it means is that at no point did the student encounter the same syntax error consecutively. Whereas a session scoring 1.0 means that every compilation resulted to the same syntax error all the time.
Agapito and Rodrigo [9] looked into students" compilation behaviors as they wrote their programs in C++ by analyzing automatically collected online protocols. Students" data were analyzed by computing for their Error Quotient. Results confirmed that freshmen programmers do experience difficulties and that the Error Quotient is a practical tool that can be used to characterize their compilation behaviors.
Many of the programming environment or IDEs used today have embedded capabilities or features added to help programmers do their job easily instead of just writing it using plain text editors. This same IDE is also used by novice programmers in their first programming experience using specific language. Many works reported development of automated syntax error correction. However, the approach does not care whether learners have assisted their own mistake. Some works produced feedback through an interface where support is provided. Carter [10] developed an intelligent tutor for debugging called ITS-Debug. This is achieved by developing a system with four standard modules (Domain, Student, Pedagogical, Communication) of Intelligent Tutoring System. A web-based system was developed wherein students learn debugging through different phases. Students were able to edit the code, compile and run the code, and receive assistance on a host of syntax, runtime, and logical defects that might be present in the exercise or that they may inadvertently create themselves. In the study of Kummerfeld and Kay [4], a web-based reference guide was developed which catalogues some common C++ compiler generated errors.
So far, the work of Edwards, Rajagopal, Kandru [11] reported the use of embodied characters that assist learners in dealing syntax. The proponents developed an emotionallyintelligent pedagogical agents to deliver effective and efficient feedback to students about their programming assignments and also act as a teaching assistant for any general programming related queries. The main objective of their study is to communicate clearly the feedback about student programs while motivating them to perform better. This is so far, the work that was closely related in this study.
Veletsianos and Russell [12] defined pedagogical agents as anthropomorphous virtual characters employed in online learning environments to serve various instructional goals. Pedagogical agents were employed by Carlotto and Jacques [13], Kim [14], Liew, Zin, and Sahari [15], and Kim, Thayne, and Wei [16] in a form of an animated characters, virtual or digital characters. It was used as a chatbot as reported by Savin-Baden, Tombs, and Bhakta [17], an influencer such as of Kim and Baylor [18], or a tutor Kim [14]. They can also simulate conversations and nonverbal behavior according to Liew and Tan [19]. In the work of Schroeder, Romine, and Craig [20], pedagogical agent was employed to enhance student learning. Johnson and Lester [21] cited a nonverbal feedback capability of pedagogical agents. The nonverbal cues can take various forms including nodding or shaking the head, facial expressions such as smiling or surprise. This paper employs the use of nonverbal cues for the embodied agent and used the agent as an assistant.

A. Defining Agent Design Requirements
According to Baylor [22], the prime cognitive consideration in the design of agent-based learning environment is the management of control. The first dimension of control involves instantiating the instructional purpose of the environment on a constructivist (high learner control) to instructivist"s (high program/agent control) continuum. A critical issue from a constructivist approach to agent-based learning environments is in moderating between the agents taking over thinking for the student with the agent training the student to think more effectively. In the constructivist approach, the agent is a medium that does not teach the student directly. In this study, the presentation of knowledge about errors comes in a form of recall and example. Note that in an error message, the compiler may refer to some token in the code. Meaning, different token may appear even for the same error type. For example, the error message "expected ";" before "int"" contains the token ";" and "int" enclosed within single quote. In recall, the content presented by the agent will not specifically tell the student the specific solution but instead present the similar or general case. For instance, for the error mentioned above, the agent would say "Remember that in C++ every statement must end with a semi-colon. In an example, the agent would present an example statement with a semicolon at the end. This is how the proponents push the student to do the thinking. The second dimension of control entails managing feedback, and several issues need to be considered: type, timing, amount, explicitness, and learner control of agent feedback. An important consideration in terms of feedback is that the pedagogical agent should not provide too many insights and thereby annoy the student. In the current design of intervention, the agent will depend on the current computed value of the error quotient. This means that whenever the student is stuck in a specific error, the agent will intercept every compilation. Although by default help should be minimal, part of our intention is to give us insight on the interaction of the learner to the agent in the environment. So, the proponents allow the agent to be proactively intervening as long as the EQ limit of 0.3 or greater was reached. Third consideration is when agent versus learner control is further defined through the desired relationship of the learner to agent. Some examples of instantiating the learner-agent relationship include the agent as learning companion, agent as mentor, multiple pedagogical agents, agent as personal assistant, or agent as resource. In this paper, the proponents define the role of the agent to be an assistant that informs the learner on their mistake. The feedback flows from agent looking at the error quotient and appears when EQ is greater than the value 0.3. The agent looks only on the first error message per compilation since the first error most of the time is the cause of cascading error and if not eliminated will cause the student to get stuck on that error. This is also www.ijacsa.thesai.org consistent to the existing computation of error quotient in which only one error was considered for computation in every compilation. Fourth, to be instructionally effective, the agent must assert enough control so that the learner develops confidence in the agent in terms of believability, competence, and trust. The critical issue that concerns believability is the message that the agent will provide. Incorrect message to provide will decrease trust and competence. The persona and behavior of the agent was also considered to make the agent believable. Fig. 1 shows the Agents" gestures. Sequencing these gestures make up some form of behavior. The choice of the interface is a cartoon character and was named JEPPY. The proponents choose not a very serious character to capture the attention of the serious learners. The behavior space includes deictic and affective gestures as shown in the Fig. 1. The following gestures were combined to form actions that make the embodied agent more life-like.

Thumbs-up Waving Default
Reading Nodding Clapping

C. Testing
To test and validate the functionality of the components, the proponent put JEPPY in to a test with participants in an actual laboratory session. Participants were students taking up introductory programming course in a State University. Before the participants continue in the task, they were given questionnaire to verify whether they are really novice programmers. This is because the agent is intended for novice programmers only. There are 18 participants which where identified to be novice programmers. They were given a source code which contains cascading errors. Meaning, one error may come after another after correcting the first one. They were all given the freedom and time to finish the problem without asking help from other participants or instructor around.

D. The Architectural Design
The implementation follows the typical architecture of a pedagogical agent but was contextualized according to purpose of used. Fig. 2 shows the architecture of the agent in this study. The pedagogical module was implemented as plugin in Code::Blocks. The errors produced by the compiler were preprocessed to include only necessary information. The event logger was responsible on logging the preprocessed compiler errors, the edits done in the code, the interaction of the learner with the agent and the calculated value of error quotient. These data logged by the logger were inserted in the SQLite database.
The communication module implemented using Java comprises the interface and inference controller. The interface is where the learner interacts with the agent. The embodiments are gif files which are retrieved depending on the interaction and current state of the learner.

A. The Implemented Agent
In the first compilation, the agent would appear and introduce itself to the student. Starting from the first compilation also, the logger is activated. So, every time the learner edits some lines in the code it will be recorded line per line. For every compilation starting from the third compilation, the two pairs of events can be created. At this point the Error Quotient can be calculated. When EQ is more than the threshold value, the agent will capture the first error, preprocessed it and retrieve message from the rules in the domain module that matches the error, and then display the help message through the embodied agent. Table I shows an example EQ computation extracted from the compile-edit log. As per algorithm, the task is to compare www.ijacsa.thesai.org two successive compilations. For instance, from Table I,  looking at compilation number 2 and 3, both compilations ended with error, so a penalty of 2 was added. Since both compilations have same error type (expected token before token), a penalty of 3 was added. However, both compilations do not have same error location and line edit made, so no penalty was added. The total score for this pair (compilation 1 and 2) is 5. The total score was divided to 11, which is the highest possible score, and is now the normalized value 0.5556. The final error quotient for this pair is the average of the sum of all the normalized score in each pair, in the given example, it is 0.3889.
The implemented agent was shown in Fig. 3 to Fig 9. Fig. 3 shows the appearance of the agent when it offers help from the learner. As one can see, the agent does not provide directly the help on the error identified. Instead, an option was given to see whether help is needed, or the learner already knows the error. When help is used, the agent will then provide the help as shown in Fig. 4. Fig. 5 shows the case wherein the error occurred again, and the agent will offer another help. Fig. 6 is the screenshot of the agent portraying like reading some notes when telling student to use example.
When help is used again, help will be provided in a form example as shown in Fig. 7. Fig. 8 and Fig. 9 are the affective gestures of JEPPY when it is sad and glad, respectively.

B. Result of Interaction based on the Logs
One critical part among the components is the correct message or support that the agent will provide. The interaction log provides a way for us to see whether correct help is given to an error message. Recorded in every row was the error message which is a result in preprocessing stage during compilation. Also, in the same row, was the help coming from the domain knowledge which is a result of the inference engine.
Aside from validating the functionality of the components through the logs, it also gives us some observations on the interaction of the learner to the agent. Out of 538 times that the agent appears, only 159 or 29.55% of the time the agent was used. It can also be observed that there are 119 or 22.12% of the time the agent was closed when help is asked. The proponents can also see instances wherein there is no interaction in an intervention, meaning the agent was ignored and after 20 seconds without any interaction it pops out. There are 260 or 48.33% of the time that the agent was ignored. The large number of time that the agent was ignored by students is maybe because they were so engaged in attempting to correct error by themselves. As mentioned by Jadud [3], students took significant time editing and compiling their code, and after several attempts without success, they may fall into frustration. But here, with the presence of JEPPY, we can be able to prevent such case. We can see that in the sequence of usage. From 159 interventions, 106 or 66.7% were hint usage and 53 or 33.33% were example usage. Even the students are proactively debugging these errors by themselves and do not use help even when they need it, based on the logs, out of the 106 hints usage, 70 or 66.04% of the time wherein errors were encountered are corrected after using hint. When error was not eradicated, the agent can reinforce this by offering an example. We see that there are 11 instances in the total usage wherein hint is immediately followed by example and the error was corrected after it. There is a total of 81 or 76.42% of errors corrected after using the support provided. In case of example usage only, meaning not preceded with hint, there is 56.60% of the total usage wherein the error was corrected right after.
Although the figures presented are not at large, the potential of JEPPY can be seen in helping the novice learners in dealing syntax error, of course, with further improvement.       To see whether content in the support is helpful, part of the interaction by the agent is to ask the learner whether the message is clear or helpful. There are 66 or 41.51% instances wherein student responded on the question whether hint is clear or understandable. From the total responses, all 66 of it responded that the message is clear. For the example usage, however, there is only one response which said that the message is clear.
The summary of our logs had given us insight in terms of interaction. In the design, the agent was intended to be proactive by having smaller threshold value of Error Quotient. But our logs tell us that the agent must be designed to carefully select www.ijacsa.thesai.org timing in intervention, otherwise, the learner might get annoyed. Probably models on interaction along with EQ should be developed for timing in intervention. Nevertheless, when help is being used, the agent can be of help before the learner falls into frustration. However, it should be noted that our logging mechanism was not intended to deeply look on the efficacy of learning. The logs enable us to verify and validate the functionality of every component and give us opportunity to gain insight for further improvement of the agent.

V. CONCLUSION AND FUTURE WORK
In this paper, the researchers presented the design and implementation of an interactive pedagogical agent. It was successfully embedded as a plugin in an Integrated Development Environment named Code::Blocks. The said environment for developing real-world applications was also used by the students in our institution. However, it was not developed to care on the problem encountered by Novice programmers such as syntax error correction. Hence, through this work, the researchers were able to address one of the many problems a Novice programmer may encounter.
Although our domain is specific to C++ as programming language, the modular fashion of the architectural design on the components can be easily expanded. For instance, rules containing errors and their corresponding help or corrections can be added without any changes in the rule engine as long as it conforms to the pattern. Currently, the study does not include yet the evaluation on the learning gain. It can be seen, however, that by using the computed EQ, one can determine how well a student is progressing with or without JEPPY. This can be done with a large number of participants and an ample time. The current work done focuses on the design and implementation of the agent and the EQ.