A Serious Game for Improving Inferencing in the Presence of Foreign Language Unknown Words

This study presents the design of a serious game for improving inferencing for foreign language students. The design of the game is grounded in research on reading theory, motivation and game design. The game contains trial-and-error activities in which students create conversations and then watch these conversations play out. Making mistakes results in students receiving feedback and being requested to try again. An evaluation of the system was also conducted, in which participants used both simple text and the game. Post-test scores for using the game were significantly higher than scores when reading the text. User reception of the system was also positive. These results suggest that serious games can be effective for enhancing inferencing when foreign language students face unknown words. Implications for reading comprehension and for incidental vocabulary learning are also discussed.

Keywords: Serious game; foreign language; contextual inference; unknown words


I. INTRODUCTION
Inferencing is the process of making connections when trying to interpret a text [1]. Inferring information from the text is a necessary component of reading. To understand a text, readers must use their previous knowledge as a base for inferences about the new information. Inferencing also plays a role when disambiguating the meaning of words and clauses [2]. The inferencing necessary is not extensive, unless the information is too unfamiliar or the language proficiency of the reader is not sufficient for that specific text [3]. One case of lack of proficiency is lack of vocabulary. Unknown words will often increase the amount of inferencing necessary for understanding the passage. In such cases, readers have to rely on the words they do know, the information they have collected from the text so far and their own background knowledge. In some cases, the reader may also infer the meaning of the unknown word. Children learn thousands of words per year and those words are mostly acquired from context [4], [5]. This also happens for second language (L2) learners [6]-[9].
As such, inference from context in the presence of unknown words plays two roles in reading comprehension:
• Understanding the passage by using the remaining information;
• Inferring the meaning of the unknown word to aid in understanding the passage.
Learning words from context is a form of incidental learning. Incidental learning is the accidental learning of information without the intention of remembering that information [10], [11]. The number of words that students learn incidentally might make one believe that inferring the meaning of a word from context is an easy task. The problem is that students often fail to pick up this contextual information in the presence of unfamiliar vocabulary. Learners may ignore the word and give up on understanding the given passage. There may not be enough information in the context to infer the meaning of the word. It can also be the case that students infer the wrong meaning of a word [10], [12], [13]. Even when using dictionaries, students do not look up the meaning of all the words [14], especially when there are too many new items [15]. Also, new words usually need to be encountered multiple times [4] in order to be learned.
One factor that has been shown to affect incidental learning is task involvement load. Vocabulary enhancing techniques have been used to increase the effectiveness of incidental vocabulary learning [12], [16]-[19]. Past studies also agree that the higher the involvement load, the higher the effectiveness of incidental vocabulary learning. A meta-review on this matter can be seen in [20]. These tasks, however, have limitations. First, time spent in the classroom is limited. This limits the amount of time users will spend interacting with these activities. Users cannot be expected to engage in these tasks for long periods in their free time. Yet, as mentioned before, students are able to learn a large number of words in their school years which cannot be attributed to explicit learning. Voluntary reading, often done outside the classroom, has been associated with better language acquisition [21] and better vocabulary test scores [22]. Voluntary reading works because of the large volume of reading done. Students read in such large volumes that they have multiple encounters with the unknown words. However, the amount of reading necessary and the time it takes for measurable progress to be made can make extensive reading hard to implement [23].
Current research agrees that reading comprehension is positively affected by motivation [24]-[26]. Students with poor reading skills show more correlation between their motivation and their reading performance [24]. Motivation also affects the total amount of reading done [27]. Interventions to increase reading comprehension have been shown to increase reading performance [28]. Motivation is also a predictor of success in language learning in general. It was shown that motivation had the strongest correlation with language grades and self-evaluation [29]. In that study, motivation outperformed attitude towards the learning situation, integrativeness and orientations.
One approach to handling motivation is Digital Game Based Learning (DBGL). One definition for DBGL is "the innovative learning approach derived from the use of computer games that possess educational value or different kinds of applications that use games for learning and education purposes such as learning support, teaching enhancement, assessment and evaluation of learners" [30]. DBGL includes the use of both commercial games and serious games. Serious games are digital games made for more than entertainment [31]. One core element of games that affects motivation is challenge. Appropriate challenge that matches the skill of the user will greatly affect the experience [32]. If the game is too easy, the player will be bored. If it is too hard, the player will be discouraged. This fits with the conditions to achieve flow state, a popular construct in entertainment research [33]. It also fits with the need for competence from the Self Determination Theory explained in [34], [35]. As such, proper challenge is one of the factors that influences engagement and motivation in game design. However, game design is not about arbitrarily creating challenge. A game must be both accessible and easy to use while still providing a challenging experience for the player [36]. This means that a game's challenge should not be born from usability issues, so it is necessary to focus on usability in game design. Another element cited as important is the freedom for players to fail and try again as often as they need [32].
On the limitations of using commercial games, the study in [37] compared vocabulary recall between players and watchers of a music game. Players interacted with the game, while watchers were asked to simply watch it. Watchers had a much higher score in vocabulary recall. The study reported that players were divided between listening to the words and doing well in the game. This shows that extraneous cognitive load can get in the way of reading, depending on the game genre. It also suggests that, while using commercial games is cost-effective, there might be a loss in learning gains compared to a well-designed educational game.
One popular game genre is the visual novel. It involves reading a narrative over long periods of time [38]. Visual novels, unlike books, only show one snippet of text at a time. After the player performs some sort of interaction with the game (pressing a button, for instance), the game advances to the next snippet of text. This means that players are presented with a limited amount of text at a time and do not have to keep track of their position on a book's page, for example. Visual novels also have graphical elements like backgrounds and character artwork. These give the player a view of what is going on inside the story, which facilitates reading comprehension [39]. Many visual novels also have some game-play elements between story-line sections, such as [40]. Alternation between story and game-play is a recurring element in game design, and is said to have beneficial effects, such as rewarding the player and improving pacing [41].
The serious game used in this study contains an activity designed to induce students to infer information from context. It locks them into a trial-and-error feedback loop while they attempt to construct a conversation. It combines this activity with a story, similar to a visual novel. In the sections below, the design of the game will be further explained, with a focus on how it improves inferencing and on its motivational elements. Then, the experiment will be described and the results analyzed. The game used in this study has been explored before in the context of extensive reading [42], [43].
This study aims to answer whether inferencing improves when using the game compared to simply reading text. It also aims to present how the design of the game interacts with inferencing.
Section II presents related work in the field, exploring other works that used games for language teaching with a focus on reading. Section III presents the design of the game and details of the experiment. The game design portion starts by presenting the challenges of designing a game for improving inferencing, based on the discussion presented in Section I. It then presents the various aspects of the game and ends by summarizing how these aspects answer the design challenges. Then it introduces the various aspects of the experiment. Section IV presents the results obtained through the experiment and discusses their meaning. Section V concludes the study. It presents implications of the results for the field, the shortcomings of the study and possible future research.

II. RELATED WORK
DBGL has been used successfully for language teaching in various areas, including situated vocabulary learning [44], conversational visual novels [45], commercial games in the classroom [46], [47] and relating language gains to gaming habits [48]. The synthesis in [49] of research on video games and second language learning concluded that games have a positive impact on learning, especially for vocabulary, with the experimental group surpassing the traditional study control group in some cases [50]-[53]. This shows that it is possible to have gains when reading content in games. The measured vocabulary gains are due to incidental learning while playing. This shows that inferencing while reading also happens when playing games. Despite this, there has not been much research that focused specifically on designing software that supports reading as the main activity in the context of DBGL and foreign languages. The work of [54] attempted to use an augmented reality game to enhance reading comprehension, but it failed to show gains in reading comprehension. It did, however, show motivational gains. Some works [55]-[57] focus on first language primary reading skills (among other fields) for young children and showed positive results, but they do not address reading long texts, focusing instead on more basic skills such as individual word reading. The study of [58] focuses on first language reading comprehension, but does not evaluate the actual learning gains and does not explain how the design of the game relates to attaining those skills. Other works, like [59], address L2 reading but do not go in depth in designing the application to integrate with the reading process.
As far as inferencing during foreign language reading goes, none of the work reviewed addressed the process directly.

III. METHODOLOGY
A. Design of the Game

1) Design Challenges:
Summarizing the material presented in Section I, the challenges in designing an activity for improving inference from text in the presence of unknown words are:
• Students may ignore passages which contain unknown words;
• Students may infer the wrong meaning of a word;
• Activities would benefit from being intrinsically motivating, which implies:
   • Better performance in reading comprehension and language learning in general;
   • Possibility for the activity to be used in students' free time, thus avoiding time limitations of the classroom;
   • Compatibility with extensive reading.
If the activity is a game, it would also present the following challenges:
• Game elements should not detract from reading or from inferencing;
• The game needs to be challenging but not too challenging;
• The activity should allow for students to fail as much as they need in order to progress.

2) Game Introduction:
The game features a combination of story segments and activity segments. Screen-shots of the segments can be seen in Figs. 1 and 2, respectively.
In the story segment, we can see the characters present in the scene, a box displaying a piece of dialog and a background depicting the current location. Those elements can all be seen in Fig. 1. Upon user input, the story advances, which makes the next line of dialog or narration appear. This dynamic continues until the scene ends. While reading is prevalent, no elements are present to induce or improve contextual inferencing in this segment.
The inducement of contextual inference happens in the activity segments. During activity segments, users attempt to construct a conversation that solves a certain in-story goal. The conversation is constructed by inserting the pieces of the conversation into the empty slots, as shown in Fig. 2. The design of activity segments will be further discussed below.
3) The Conversation Construction Activity's Design: This activity consists of constructing a conversation and watching it play out. If the constructed conversation is inappropriate, a new conversation will be formed that gives the user insight into why that conversation is wrong and into how to create the appropriate conversation. From now on, we will refer to the phase of constructing a conversation as the assembling phase and the phase of watching the conversation play out as the result phase. Those two phases will be further developed in the subsections below.
The ideal behavior of the user for this activity can be seen in Fig. 3.

4) Conversation Construction's Assembling Phase:
This phase consists of forming a sequential dialog by inserting conversation pieces into a grid, as in Fig. 2. However, the user can only insert the pieces related to what one person says. What the other person says is already fixed on the grid and cannot be moved. This was a deliberate decision to reduce ludo-narrative dissonance, which is a mismatch between game-play and story [60]. If players asked themselves "if I am the main character in the narrative, how come I can control what the other person will say?", immersion would break. Whenever a student fills all vacant spaces with conversation pieces, a button appears in the interface. Pressing that button takes the student back to the screen of the story segments, where the resulting conversation plays out.
Regarding Fig. 3, this refers to the "Construct a conversation" node.
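The assembling phase described above can be pictured as a grid in which some lines are fixed and others are empty slots for the player. The following is a minimal Python sketch of that idea only; it is not the game's actual implementation (which was written in C#), and the names `ConversationGrid`, `place` and `is_complete` are hypothetical.

```python
from typing import List, Optional


class ConversationGrid:
    """Hypothetical model of the assembling grid.

    `layout` lists the conversation from top to bottom: a string is a
    fixed line spoken by the other character, None is an empty slot
    that the player must fill with a conversation piece.
    """

    def __init__(self, layout: List[Optional[str]]) -> None:
        self._lines: List[Optional[str]] = list(layout)
        # Remember which positions were fixed from the start.
        self._fixed = [line is not None for line in layout]

    def place(self, index: int, piece: str) -> None:
        """Put a player piece into a slot; fixed lines cannot be replaced."""
        if self._fixed[index]:
            raise ValueError("fixed lines cannot be moved or replaced")
        self._lines[index] = piece

    def is_complete(self) -> bool:
        """True once every slot is filled (when the button would appear)."""
        return all(line is not None for line in self._lines)

    def lines(self) -> List[Optional[str]]:
        """Return the conversation in its current state, top to bottom."""
        return list(self._lines)
```

In this sketch, the check in `is_complete` corresponds to the moment at which the interface would show the button that starts the result phase.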

5) Conversation Construction's Result Phase:
First, the system must check whether the conversation is appropriate by comparing it to the answer. If the conversation is appropriate, it will be shown to the player as it is, and the story will go on. This refers to the "appropriate conversation case" in Fig. 3.
However, if it is incorrect, the system must logically assemble a new conversation based on the player's constructed conversation. This is done by using the following steps:
• Find the player's first mistaken conversation piece in the conversation by comparing the correct conversation with the assembled conversation from top to bottom;
• Discard all conversation pieces below the player's first mistaken conversation piece;
• Insert the text that has been previously prepared as a reaction to the mistaken conversation piece. This text will show up after the mistaken conversation piece;
• Insert the text that has been previously prepared as a clue for the correct conversation piece that would fit in the position where the player made the first mistake. This text will appear after the text of the previous step.
In Fig. 3, this would be the inappropriate conversation case. This newly generated conversation is then shown to the player. After the generated conversation ends, the player goes back to the conversation construction screen. This process can be better understood in Fig. 4. In the "First mistake reaction", when the conversation goes into an unexpected flow, the user feedback begins. Here users can acquire information on why the card related to the "first mistake" is inappropriate and insight into what conversation piece would be appropriate for that particular slot. Users who are reading attentively will also be able to clearly point out which conversation piece has been considered inappropriate, since the feedback (the change in the conversation flow) begins at that moment. This feedback is effective because it uses the player's inappropriate input to generate a conversation, similarly to an error-based simulation (such as the one used in the work of [61]). Instead of simply stating "this conversation is wrong, the correct one is this one", it allows players to reflect on their input in a more effective way. This refers to the "Read feedback" and "Extract new information" nodes in Fig. 3.
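The steps for generating the feedback conversation can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the game's actual C# code: conversations are represented as lists of strings from top to bottom (fixed lines included), and the `reactions` and `clues` dictionaries are hypothetical stand-ins for the texts prepared in advance by the content author.

```python
from typing import Dict, List


def build_result_conversation(
    correct: List[str],
    assembled: List[str],
    reactions: Dict[str, str],
    clues: Dict[str, str],
) -> List[str]:
    """Return the conversation shown to the player in the result phase.

    reactions: prepared reaction text, keyed by the mistaken piece.
    clues: prepared clue text, keyed by the piece that was expected.
    Both lists are assumed to have the same length.
    """
    # Compare the two conversations from top to bottom.
    for position, (expected, given) in enumerate(zip(correct, assembled)):
        if expected != given:
            # Keep everything up to and including the first mistake,
            # discarding all pieces below it.
            shown = assembled[: position + 1]
            # Reaction to the mistaken piece, then the clue for the
            # piece that would have fit in that position.
            shown.append(reactions[given])
            shown.append(clues[expected])
            return shown
    # No mistake found: the conversation plays out as assembled.
    return list(assembled)


if __name__ == "__main__":
    # Illustrative content only; it does not come from the game.
    correct = ["B: Hello!", "A: Hi, Brian.", "B: Are you busy?", "A: Not at all."]
    assembled = ["B: Hello!", "A: Not at all.", "B: Are you busy?", "A: Hi, Brian."]
    reactions = {"A: Not at all.": "B: Huh? I have not asked you anything yet."}
    clues = {"A: Hi, Brian.": "B: You could at least greet me first."}
    for line in build_result_conversation(correct, assembled, reactions, clues):
        print(line)
```

Because the fixed lines are identical in both lists, the first difference found is always one of the player's own pieces.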

6) "Look for Clues" Functionality:
In Section III-A5, it was stated that feedback starts at the "First mistake reaction". To make this clearer to the player, and to help them relate this feedback to the "First mistake", a screen called the "Look for Clues" message was added, which appears right before the "First mistake". The message says the following: "This conversation will not go as expected! Read it to find clues!!!". This happens in the middle of the conversation and it tells the user that:
• The created conversation has a problem.
• Until that point the conversation did not have a problem.
• There is something wrong with the conversation piece that appears right before that message.
• Looking for clues in whatever is coming up next is what the game expects them to do.
This was designed to further induce the ideal behavior in Fig. 3.

7) Game Design Elements:
This section describes the game design elements that have been incorporated into the design of the game. Their effects and importance are also discussed. References for these elements can be seen in Section I.
Challenge and Freedom to Fail: Our approach to challenge has been through natural, emergent difficulty. As we have previously shown, reading comprehension for L2 learners can be a fairly difficult task. In extensive reading, there is a focus on choosing texts with appropriate difficulty to mitigate this. The conversation construction task involves extracting information from the text and using that information. As such, it should have a difficulty similar to the reading comprehension process. The difference is that feedback is provided. In our feedback loop, each failed attempt yields feedback that makes the activity simpler to solve. This way, every time users try to solve the task, they should have more information and the task should become easier. As for freedom to fail, the user is free to fail in our design. Furthermore, failure is rewarded with feedback.
Visual Novel: The game is very similar to a visual novel and could be classified as such. This was not an arbitrary design decision. As discussed before, visual novels have a number of elements that make them an effective reading application.
User Interaction: In our conversation construction activities, drag-and-drop is the main form of interaction used. In [62], drag-and-drop is encouraged and described as an intuitive way to move content through the system. Our conversation construction activity has been designed with this in mind, for its intuitiveness and for providing a fast way to construct the conversation. This approach has been used in applications like Monsakun to achieve similar effects and it has been well received [63].
8) Addressing the Design Challenges:
Students may ignore passages which contain unknown words: A user who displays this behavior is not performing according to the ideal behavior displayed in Fig. 3. If the user ignores a passage, they would have trouble building the conversation. Because of this, the chances of the user making a mistake would rise. Upon making a mistake, the user would then be presented with feedback. At that moment, the "look for clues" functionality described above further points the user to reading the feedback. Furthermore, the chance of the user solving the activity by luck is 5%, given the default setup of five conversation pieces and two empty slots. This is low enough to make reading the feedback a more suitable strategy than trying to make the correct conversation by luck.
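The 5% figure can be reproduced with a short calculation. The sketch below assumes that each conversation piece can be used at most once, that the order of the slots matters, and that exactly one arrangement is correct; these assumptions are not stated explicitly in the text, and the function name is hypothetical.

```python
from itertools import permutations
from math import perm


def chance_of_lucky_solution(num_pieces: int, num_slots: int) -> float:
    """Probability of filling every slot correctly by pure guessing.

    Assumes one correct arrangement among all ordered selections of
    `num_slots` distinct pieces out of `num_pieces`.
    """
    return 1 / perm(num_pieces, num_slots)


# Default setup described in the text: five pieces, two empty slots.
# There are 5 * 4 = 20 ordered ways to fill the two slots.
arrangements = list(permutations(range(5), 2))
default_chance = chance_of_lucky_solution(5, 2)
```

With five pieces and two slots there are 20 possible arrangements, so a single random attempt succeeds with probability 1/20, which matches the stated 5%.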
Students may infer the wrong meaning of a word: This would also lead students to make a mistake in the conversation construction activity. The expectation is that users will be able to correct their misunderstanding by reading the feedback.
Activities would benefit from being intrinsically motivating: The design focuses on intrinsic motivation by balancing challenge, offering freedom to fail and using drag-and-drop for ease of use. As discussed before, these are the elements related to intrinsic motivation in games.
Game elements should not detract from reading or from inferencing: As seen before, the game has two types of segments, story segments and conversation construction segments. Both segments include reading. There are no in-game actions that do not involve reading in some way. Conversation construction segments result in multiple readings of the conversation pieces. Students are also expected to read the feedback carefully. As such, instead of detracting from it, the design focuses on improving inferencing.
The game needs to be challenging but not too challenging: One challenge of L2 reading is matching the difficulty of the text to the skills of the user. As such, difficulties in reading are highly content based. The trial-and-error with feedback design mitigates this issue by making the activity progressively easier as the user keeps reading the feedback. This has been further explored in the subsection Game Design Elements.
The activity should allow for students to fail as much as they need in order to progress: This is a natural part of the trial-and-error design. Further details are given above in the subsection Game Design Elements.

B. Game Development
The game was developed using the C# language and the Unity game engine [64]. A scripting language was created to describe the story sequences and the conversation construction activities. Since the story sequences are similar to visual novels, the commands used are similar in format to the ones found in Ren'Py, a visual novel engine [65]. A script interpreter was written to render the scene for the players, while also handling input.

C. Design of the Experiment
This study used a within-subject design with two conditions for counterbalancing: text-game and game-text. The text-game group read a text and then played the game. The game-text group played the game and then read the text. Afterwards, participants took a post-test and some of the users took a user perception survey. All measurements were done at the end so that measuring would not affect the behavior of the users. This flow can be seen in Fig. 5. The post-test had three sections: a remembering section, a textual interpretation section and a word comprehension section. Both game and text included dummy words to create a situation where users are reading material with unknown words.
Thirteen Japanese university students participated in the study and were randomly assigned to one of the two conditions.
Two textual contents were used in this study, A and B. Both contents have a game form and a textual form. Thus, we have game A, game B, text A and text B. The text-game group used text A and game B. The game-text group used game A and text B. Contents A and B were found to be at or below the difficulty of Grade 2 in the Common Core State Standards [66]. As such, both contents are considered to be accessible and equivalent in difficulty. This was measured using the TextEvaluator tool [67]. Scores given by the tool were found to have high correlation with judgments presented by human experts [68]. Content A has 164 words while content B has 226 words.
The post-test was divided into three sections:
• Remembering section: users were asked to write as much as they could remember with as much detail as possible.
• Textual interpretation section: users were asked questions such as "Did Brian ever get angry in the story? If yes, why did he get angry?"
• Word comprehension section: users were asked to explain the meaning of the dummy words and to translate phrases that used the dummy words.
The textual interpretation questions were designed around the passages that contained dummy words. This means that inferring information in the presence of unknown words is necessary to correctly answer the questions. Only the last two sections are used to calculate the scores. The first section was included for the possibility of an exploratory analysis, but it is not addressed in this study.
The user perception survey had four questions. Three of them compared text and game on ease of content understanding, motivation to read and suitability for studying English. The last question was about the usability of the game.
The experiment was performed with each participant individually. Participants interacted with the game on a computer. The text was read as a PDF file. While interacting with the game, participants were taught that clicking would advance the story. They were also taught how to drag-and-drop to build the conversations. The post-test and the user perception survey were both administered through an online form.

IV. RESULTS AND DISCUSSION
Table I shows scores obtained by the two conditions and for all participants. Scores have a minimum value of 0 and a maximum value of 1. Game scores had a lower standard deviation than text scores.
A two-way analysis of variance was conducted on the influence of two independent variables (medium, order of use) on the post-test scores. Medium included two levels (game, text) and order of use consisted of two levels (first, second). The only significant effect at the .05 significance level was for the medium factor. The main effect for medium yielded an F ratio of F(1, 22) = 11.16, p < .01, indicating a significant difference between using the game (M = 0.77, SD = 0.13) and reading the text (M = 0.52, SD = 0.24). The main effect for order yielded an F ratio of F(1, 22) = 1.06, p > .05, indicating that the effect for order was not significant, first (M = 0.61, SD = 0.24) and second (M = 0.67, SD = 0.23). The interaction effect was not significant, F(1, 22) = 0.86, p > .05.
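The reported degrees of freedom can be checked with a short calculation. This assumes that each of the 13 participants contributed one game score and one text score, and that the resulting 26 scores were entered as separate observations in a 2 x 2 (medium x order) design; this treatment is an assumption, since the text does not describe it explicitly.

```latex
\begin{align*}
N &= 13 \times 2 = 26 \\
df_{\text{medium}} &= 2 - 1 = 1 \\
df_{\text{order}} &= 2 - 1 = 1 \\
df_{\text{medium} \times \text{order}} &= (2 - 1)(2 - 1) = 1 \\
df_{\text{error}} &= N - (2 \times 2) = 26 - 4 = 22
\end{align*}
```

Under these assumptions, every effect is tested as F(1, 22), which agrees with the values reported above.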
The above results suggest that using the game results in more information absorbed than using the text. They also suggest that order of use (which one is used first, and which one is used second) does not affect the amount of information absorbed.
As for the user perception survey results, found in Table II, the following trends were found:
• In the area of interest, all users except for one had a positive opinion towards the game, with over half of the users completely favoring the game.
• On perceived comprehensibility and perceived learning, half of the users had a positive opinion while the other half had a neutral opinion.
• On usability, one user found the game a little bit hard to use, while the clear majority thought the game was easy to use.
• The user who felt the game was a little bit hard to use is the only user who was unfavorable towards the game in any of the areas. He also favored printed text in the area of interest.
Those trends show that the hypothesis was true. Regarding the one user who was unfavorable towards the game, his scores were checked in order to see whether his unfavorable opinion affected them. Surprisingly, he was the only user to get a perfect grade related to the content in the game version he used, suggesting that the comprehension scores are not affected by dislike of the game. These results fit well with past findings suggesting good affective reception from learners in relation to DBGL, as reported in [69] and in other works ([55], [70], etc.). Users' higher comprehension when using the game can be attributed to being able to read the feedback information to solve the conversation construction problems. This suggests that users were performing according to the ideal behavior previously defined, indicating that our efforts to create an activity that can only be practically solved by displaying the needed behavior have been successful. When using the text, users may have been more likely to ignore passages or to make mistakes during inferencing. This gain in performance is reflected not only in reading comprehension but also in incidental vocabulary learning, since the experiment included dummy words. Thus, results suggest that users are able to infer partial meaning of the words better when using the game.

V. CONCLUSION
Results suggest that users are able to infer information from context better by using the game. This implies that activity designs based on creating a trial-and-error task with automatic feedback can be useful for improving reading comprehension and incidental vocabulary learning. Qualitative results have also been positive.
As for the perception of the game as an English studying tool in comparison to the paper version, around half of the users pointed to them being equally effective. And yet the comprehension scores for the game version were much higher. This contradiction between users' perceived learning effectiveness and the actual effectiveness has also been reported by [59]. Low perceived learning is also one of the challenges of extensive reading, so raising the learning that students perceive in DBGL tools should positively impact their performance. Studies in that direction are necessary, such as measuring differences in flow and motivation between incidental and explicit learning.
One remaining issue is the need for additional experiments to provide more compelling evidence of the increases in reading comprehension, since the number of participants was small. The low perceived learning and the fact that the design relies on the presence of conversations are also limitations. Expansions to this research could focus on making learning more explicit by mixing the narratives with explicit vocabulary teaching, thus making the learning process more obvious to the student. Another problem is that, currently, producing content for the game is a complex task. Creating a tool to assist this process would allow content to be created by teachers and other content creators.
Another possible next step is adapting the design so that content is generated with natural language processing techniques, without human input. This would allow a large amount of game content to be created, which would have implications for improving the performance of extensive reading programs.

Fig. 1. Screen-shot of a story sequence in the game.

Fig. 3. Ideal user behavior flow for the conversation construction activity.

Fig. 4. Flow chart for mistakes during the conversation construction activity.

TABLE I. AVERAGE SCORES AND STANDARD DEVIATION FOR THE TWO CONDITIONS AND FOR ALL PARTICIPANTS