Online Programming Semantic Error Feedback using Dynamic Template Matching

Many of automated computer programming feedback is generated based on static template matching that need to be provided by the experts. This research is focusing on developing an automated online programming semantic error feedback by using dynamic template matching models based on students’ correct answers submission. Currently, there is a lack of research using dynamic template matching model due to their complexity and varies in terms of programming structure. To solve the formulation of the dynamic templates, a new automated feedback model using front and rear n-gram sequence as the matching technique was developed to provide feedback to students based on the missing structure of the best-matched template. We have tested 60 student’s Java programming answers on 3 different types of programming questions using all the dynamic templates randomly chosen for each student. An expert was assigned to manually match the student’s answer with the 3 randomly chosen templates. The result shows that 80% of the best-matched templates for each student using the technique were similarly chosen by the expert. Based on the matched template, the student will be given feedback notifying the possible next programming instruction that can be included in the answer to get it correct as was achieved by the template. This model can contribute to automatically assist students in answering computational programming exercises. Keywords—Dynamic; feedback; online programming; semantic error; template matching


I. INTRODUCTION
Computer science is a discipline that involves the understanding and design of computers and computational processes, including their theory, analysis, software and hardware design, efficiency, implementation, and application and effect on society [1]. In other words, computer science is an emergent, scientific and practical method, which deals with the theoretical basis of information and computation, and combines its realization and application technology [2]. Computer programming is one of the core subjects that every computer science student must be competent to become a good programmer. Therefore, to obtain the programming skill, lots of programming exercises need to be completed [3].
Students need to develop programming logic and thinking skills to understand and solve the tasks especially on code writing. Students also need to solve any encountered programming errors in their coding regarding the syntax, semantic, and also question requirements. From there, students will get the knowledge and experiences on how to encounter any common programming errors or mistakes. Learning through practice is the best way to learn computer programming and attract novice students [4].
Unfortunately, most computer science students face difficulties in learning computer programming especially in writing the programming scripts [5]. Despite the importance of computer science, there is a high percentage of failures and dropout rates in introductory programming courses recorded by most educational institutions around the world [2]. Lecturers must also be responsible for assisting and providing some feedback to their students to resolve students' misunderstandings or mistakes. Helping a large number of students in providing personalized feedback during programming exercises will be a difficult role for teachers [6].
Furthermore, there are a lot of automated programming assessment tools with automated feedback that have been continuously developed to help students practice programming and build up logic skills and also programming syntax [7]. By using any automated programming tools, a student can submit a computer program on a problem-solving exercise while the tool will promptly produce automated feedback to highlight any encountered errors or mistakes during the compilation or implementation of the program [8]. The error is produced by the compiler known as Syntax Error. The compiler will highlight which lines that contain errors. However, for a beginner student, the syntax error j does not explain on how to fix the code in solving the question problem. This research is focusing on developing an online programming semantic error feedback by using a dynamic template matching model.

II. RELATED WORK
The teacher-student ratio can reach thousands to one by implementing the advancement of Massive Open Online Courses (MOOCs) [14]. This makes the feedback design more specific and personalized. Unfortunately, providing manual teacher feedback for programming assignments is determined as a traditional method and it is no longer suitable for MOOCs. Current automatic feedback methods have some weaknesses, such as the inability to extend to larger programs, manual teacher involvement, and lack of accuracy in determining errors.
There are two techniques to design the programming feedback; static and dynamic approaches. Static approaches identify and study the source code without running the computer program [9]. It is used to evaluate the syntax and semantic error and programming style. The dynamic approach www.ijacsa.thesai.org is based on the execution of the computer program [10]. It is used to evaluate run-time errors, programming design, and software metrics such as timing and resources utilization.
For beginners, static feedback is crucially needed in helping them to visualize the logic of the computer program in solving a question. As to master the programming skills, lots of exercises need to be completed by a student. With the advance of an e-learning platform, many platforms offer programming exercises to be done online. There are a lot of programming tools where users can learn and train their programming skills by solving the given problem with some programming code to find the best solution for that problem [11]. Most of these tools are developed as web applications. Some of these tools are CodingBat [12], betterprogrammer, Practice-It, and CodeWorkout [13]. By using these systems, users can get feedback about their submitted answers because these systems already provide a set of practical programming problems to be solved in the web browser and the results are evaluated by checking them against unit tests or test cases. Unfortunately, this dynamic feedback is difficult to be understood by the beginner who wants to start learning the logic or flow of the programming. The feedback is general in highlighting how the output should be generated. Writing hints and preparing the feedback in this way needs meta cognition and involves critical thinking which is not yet developed among the beginners.
The novice programmer tries to imitate the steps prepared by the teacher, and some errors that the novice programmer could not solve appeared during compilation [15]. One of the challenges in writing coding for novice programmers is insufficient feedback error messages. The only feedback that is available is the compiler-based error on the syntax errors [16]. Therefore, the best compiler errors are those that can deliver important messages that are desperately needed by programmers in response to all the errors they make. Decaf is a Java editor that serves as a medium for improving javac compiler error messages. An error message will be produced by the compiler if there are some errors contains in the student's source code. Then, the error codes and error messages are analyzed to produce enhanced error messages that provide more valuable information to students, in the hope that the error can be corrected more effectively as compared to the ordinary error messages alone.
With the existence of the standard error and enhanced errors, students can avoid making the same error in the future by referring to the both types of error Decaf is an enhanced compiler error message as shown in Fig. 1 that elaborate the common syntax error produced by Java.
However Decaf only provide feedback in clarifying the error related to the programming syntax. A logic error which is part of the semantic error is not presented in most of the compilers as it depends on the individual question requirements.
A semantic error feedback is meant to provide feedback based on specific question requirements using a solution template [17]. A template consists of a correct program instructions sequence (keywords, symbols, numbers). This computer programs need to be converted to certain features numbers before it can be processed as a matching template. [8] was using the instruction ratio (IGR) and the instruction count ratio (ICR) as the features to represent the computer program. IGR is the ratio of sequential instructions or symbol sequences in a program to instructions or symbol sequences of templates with some skippable instructions. Meanwhile, ICR is the average ratio of the amount of all unique instructions within the program that matches the amount of all unique instructions laid out in the template. Based on these features, the K-Means algorithm was used to assign similar computer programs to certain clusters. Based on each cluster, programs that have a similar Euclidean distance to the centroid in the cluster are grouped. These groups represent unique rules that will be associated with semantic feedback. Under this rule, an expert will add an assisted feedback to add comments of what further actions need to be done in solving the question. However, the technique needs to design the enhanced feedback manually to make sure their student understands the way on how to fix the error in their code. It needs to be done on pre-defined templates. This method requires more resources from the teacher not only to prepare the template but also need to manually assign students' program clusters with feedback from time to time. This research further enhanced this technique by making the matching template more accurate by comparing forward and reverse sequences of the n-gram algorithm. It also provides automated feedback based on the missing instruction sequence based on the selected dynamic templates mining from the correct submission answers from other students.

III. METHODOLOGY
The N-gram model is improved by calculating the sequence N-gram in two different ways. The first approach is using front N-grams where the value is gained by calculating the matched sequences based on the forward parsing of the codes. The second approach is using rear N-grams where the value is gained by calculating matched sequences based on the reverse parsing of the codes.
The combination of front and rear N-gram values which are referred to as the FR-Grams model are then used to enhance the similarity finding technique between two different programs. Fig. 2 shows the framework of generating the semantic error feedback using FR-Grams. www.ijacsa.thesai.org FR-Grams Model is then used to match a student's program with the dynamic program templates. The templates were autogenerated based on the list of correct answers of the submitted student's program. The answers were graded automatically by executing the answers and matched with the expected test cases. Based on the best-matched program templates between the student's program, the student will get feedback notifying any missing program instruction sequence that needs to be added to the program in order to get the correct answers according to the selected template.

A. Step 1: Pre-processing
The library of computer instructions keywords needs to be prepared first before the computer program instruction's extraction can be generated. For example, a list of Java instructions or keywords are "int", "if", "string", "nextInt()", "public" and others. These keywords are inserted into a file to be used as a key point to filter the submitted answers. In the pre-processing process, any essential code before the program body needs to be removed. In java, these codes are something like "import java.util.Scanner", "public static void main". This framework only considers the main program body.

B. Step 2: Instruction Extractions from Student's Answer
Student programs need to be converted as a sequence of instructions. The sequence of Java instructions from a student example answer shown in Fig. 3 will be produced as the following: -"Scanner", "System", "String", "next()", "int" Then, the total instruction (N) is calculated based on the sequence of the program instructions.

C. Step 3: Instruction Extractions from Template
Templates are the collection of successful and accepted computer program submissions made by the previous student attempts to a question. These templates need to be converted into a sequence of instructions similar to the Step 1 to 2.

D. Step 4: Calculate FR-Grams
The number of FR-grams from student answers and correct answers is compared to calculate the sequence of programming. The algorithm to calculate N-grams is given in Fig. 4.

E. Step 5: Find the Best Template
All the correct students' answer stored in the database will be selected as the dynamic template matching. A student's attempt answer will be matched with these templates. The highest total FR-grams among them will be the considered as the most accurate template for further feedback generation.

F. Step 6: Feedback Generation
After the comparison, the total FR-grams value is produced based on the template selected in step 4. After finishing comparing the answers, the FR-Grams are calculated to get the total FR-grams for each template. The highest total will be processed as the feedback template.    5 shows that the student's attempt consists of instructions sequence of "Scanner", "System.in", "String", ".next()", "int" and "System.out". While comparing to the template, a feedback will be generated by the system notifying that "You need to add .nextInt() at line 10, position 154". This is the missing instruction sequence that the student needs to add for the program to be tailored to the template. This feedback can provide some clues for the student on how to proceed and make the correction to the program.

IV. RESULT AND ANALYSIS
There were 60 student's Java programming answers were tested using all the dynamic templates. The answers were based on 3 different set of programming questions. The templates were randomly chosen for each student's attempt by the system. Table I is the sample feedback that was generated by the system along with the template chosen by the system.
An expert was assigned to manually match the student's answer with the three randomly chosen templates. The experts are chosen based on their experience in validating the source code and marking the student's programming answer. For each student's answer, the expert will be presented with the three template that have highest total FR-grams to be compared with the student's answer. The expert was provided with a rubric in order to choose which template should the student's attempt be referred most.

1)
Check for similar variables: In a student's answer, many variables contain in the source code such as "string", "int", "char", "for" and others. If the student's answer contains a "string" variable, the experts will search this variable in the correct answers to get the most accurate template.
2) Check for quantity and type of variable: If the student's answer contains a "string" variable but in the template use "string[]" which is a string array type, the template will not be chosen.
3) Check for the simpler with the student's answer: If each template has passed the first and second rules which means that there is almost similarity between the templates, the experts will choose the simplest template according to the student's answer. The result shows that there were 48 out of 60 or 80% similar decision made by the model and the expert. 9 from the 12 answers that were not matched with the expert's decision was due to the small difference of the total FR-Grams (only one missing sequence different) among the templates. On the other hand, these cases will also contribute difficulty for the expert to decide which template should be considered as the most matched. Based on the matched template, the student will be given feedback notifying the possible next programming instruction that can be included in the answer to get it correct as was achieved by the template. This will be like personal coaching to help students recover from any cluelessness on the programming command sequences to answer computational programming exercises.
However, there was a weakness for this sequence-based model as it does not recognize the template based on the data type usage. For example, answer that was using array data type should have only seek template that using the same data type. This will be the future research works need to be conducted in identifying template context in order to get the best template matching for a more accurate feedback.

V. CONCLUSION
In conclusion, the best semantic error feedback for the student should meet these criteria: 1) Can guide the student on what is missing.
2) Can highlight to the student what student needs to include in the source code to fix the error.
With this semantic error feedback, students can get a valuable idea or hint to solve the error in their source code. The critical thinking skills of students will increase based on computational logic skills practice tools. The system continuously collects feedback as a repository which eventually fully automated interactive assisted learning system can be achieved. Lastly, the student will keep interested and motivated to self-practice programming exercises towards problem-solving skill development.