Text2Simulate: A Scientific Knowledge Visualization Technique for Generating Visual Simulations from Textual Knowledge

Abstract: Recent research has developed knowledge visualization techniques for generating interactive visualizations from textual knowledge. However, when applied, these techniques do not generate precise semantic visual representations, which are imperative for domains that require an accurate visual representation of spatial attributes and relationships between objects of discourse in explicit knowledge. Therefore, this work presents a Text-to-Simulation Knowledge Visualization (TSKV) technique for generating visual simulations from domain knowledge by developing a rule-based classifier to improve natural language processing, and a Spatial Ordering (SO) algorithm to solve the identified challenge. A system architecture was developed to structurally model the components of the TSKV technique and was implemented as a knowledge visualization application called 'Text2Simulate'. A quantitative evaluation of the application was carried out to test for accuracy using modified existing information visualization evaluation criteria. The Object Inclusion (OI), Object-Attribute Visibility (OAV), Relative Positioning (RP), and Exact Visual Representation (EVR) criteria were extended with an Object's Motion (OM) criterion for quantitative evaluation of the generated visual simulations. The accuracy of the generated simulations was 90.1, 84.0, 90.1, 90.0, and 96.0% for the OI, OAV, OM, RP, and EVR criteria respectively. A user evaluation was conducted to measure system effectiveness and user satisfaction, and it showed that all the participants were satisfied well above average. These results showed an improved semantic quality of visualized knowledge due to the improved classification of spatial attributes and relationships from textual knowledge. This technique could be adopted during the development of electronic learning applications for improved understanding and desirable actions.


I. INTRODUCTION
Knowledge visualization is the application of visualization techniques to disseminate knowledge among individuals [1], [2]. The main purpose of knowledge visualization is to improve the communication of knowledge through visual means. Although most available knowledge sources are in either numeric or textual formats, it is highly necessary to provide a visual representation of such knowledge for easier assimilation and retention in human minds and for fostering required action [3]. Visual knowledge could be represented with charts, maps, images, scenes, simulations, and videos. In this article, a textual knowledge visualization technique that visually simulates textual knowledge was developed. A visual simulation is a visual representation containing visual objects that dynamically move based on predefined spatial attributes or collisions. Although visual metaphors, scenes, and animations could be used for the visualization of textual knowledge, these techniques do not emphasize precision in the visual representation of spatial attributes. Such techniques may not be appropriate for related domains that require accuracy in visual object positioning and movement. Furthermore, expressing textual knowledge in visualizations is of utmost importance for effective assimilation to be achieved [4]. This paper, therefore, presents a Text-to-Simulation Knowledge Visualization (TSKV) technique for generating visual simulations from domain knowledge. A review of related works is described in the next section of this article which is followed by an implementation of the TSKV technique. A quantitative evaluation is finally presented to test for accuracy using modified existing information visualization evaluation criteria.

II. REVIEW OF RELATED WORK
Knowledge visualization, according to [5] and [6], can be summarized into four major formats: sketches, diagrams, images, and interactive visualizations. Fig. 1 shows a word cloud of knowledge visualization techniques. A sketch is a visual representation that shows an abstract drawing or prototype of an idea or concept. Diagrams consist of boxes and circles that represent concepts and entities, and lines and edges that depict the relationships between the entities. Diagrams are used to illustrate the classification and clustering of related concepts in a domain. Examples are knowledge graphs used for visualizing search histories [7] and concept maps used to illustrate concepts and the relationships between them in specific domains such as medicine [8] and teaching and learning, among others [9]. An image (visual metaphor) is a pictorial representation of humans and events. It is usually generated through rendering, photography, or painting. Images are used to express emotions and give an idea about a concept, such as comic images, icons, and emojis for storytelling [10]. Interactive visualization involves visually representing knowledge with animated objects and shapes. It allows users to interact and make decisions while viewing the visualization. Visualizations are usually shown in an ordered sequence of images. Examples are models such as archetypes [11].
Existing literature has shown that explicit knowledge could be visually represented with knowledge graphs, knowledge maps, and concept maps. Hao [15] used knowledge graphs to develop a surveying and remote-sensing application. Li [16] employed a concept map to develop adaptive learning systems. Visual metaphors replace key terms and concepts found in textual documents with visual characters. Huron [17] used sediments to represent data feeds. Chau [18] also used flowers to represent search results from the web.
Hiniker [19] applied visual metaphors for clustering and viewing citations of literature from a large database. Keith [20] used visual metaphors to develop narrative maps for visualizing online narratives.
More recent KV techniques such as text-to-scene, text-to-video, and text-to-simulation generate more realistic visual presentations in the form of images [21], animations [22], scenes [23], [24], videos [25], and visual simulations [14]. A brief discussion of the major techniques related to our work follows.

A. Text-to-Scene
Text-to-scene conversion generates static scenes from natural language text. Several authors have worked in this research area by applying different AI approaches. The first conversion system was WordsEye [26]. Other authors applied machine learning techniques for text-to-scene conversion [27], [28]. Deep learning techniques such as Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN) have recently been applied to text-to-scene conversion [29], [30]. Our technique does not produce static scenes but dynamic simulations showing object-to-object collision behavior and movement.

B. Text-to-Simulation
Visual simulations are generated from textual inputs. Research in this area is very limited [14], [31], [32], [33]. These works presented the VoxSim architecture and applied a rule-based approach, specifically predicate logic, to set rules for each motion verb, which were applied during conversion. However, the precise geometric information of objects and their positions relative to other objects were not specified, nor were behavioral attributes of objects such as motion. These issues were reported to have led to ambiguities in some simulation results. In our work, precise spatial grounding information is included for visual objects, and exact geometric information and behavioral attributes are defined for all visual objects.

C. Text-to-Video
This involves converting natural language text to videos that semantically depict the textual input. Recent research on text-to-video conversion exists [25], [34], [35]. AI techniques have been applied, specifically machine learning [36] and deep learning with neural networks such as convolutional neural networks [34], recurrent neural networks [37], long short-term memory networks [38], and generative adversarial networks [25], [35]. Although a rule-based approach was adopted in our work, there is a slight resemblance between our work and most text-to-video conversion systems in that our prototype tool, coupled with TTS and an existing screen-capturing tool, can generate a video of all simulations performed.

D. Natural Language Processing (NLP)
NLP involves several tasks such as sentence segmentation, tokenization, Part-of-Speech (POS) tagging, Named Entity Recognition, and dependency parsing, among others [39]. SpaCy [40], the Natural Language Toolkit (NLTK) [41], and the Stanford CoreNLP tool [42] are some of the tools that perform these tasks. Although these tools handle the general tasks, classification and extraction of domain-related keywords and attributes remain a research challenge due to the semantic interpretation of such words. In this work, a rule-based classifier was developed to classify and extract domain-related keywords, keyword attributes, and relations between keywords.

E. Spatial Relations and Arrangement
This involves defining the specific mapping of spatial keywords to predefined spatial information. Chang [28] and Ma [24] developed spatial relations for prepositions. Fisher [43] presented an arrangement model for determining the order of object placement and position. Our work also defines specific relation information for spatial keywords and prepositions by developing an algorithm for spatial positioning and rendering of objects.

F. CAD Models and Scenes
Computer-Aided Design is employed for creating models which are visual replicas of real-world objects. Most KV applications make use of existing CAD model datasets such as ShapeNet [23]; ScanNet [44] and Scene datasets such as SceneNet [45] for developing visual representations of textual knowledge. This work did not make use of existing datasets but developed new models. This is due to the scarce availability of models in the selected domain used for application validation. www.ijacsa.thesai.org

III. METHODOLOGY
We present a mathematical model for generating visual simulations from textual knowledge and a system architecture in the subsequent subsections.

A. Mathematical Model Formulation
We mathematically model the task of generating visual simulations from textual knowledge. This is carried out using linear functions, given a set of textual inputs. The following definitions show each milestone required from textual input to visual simulation output.

Definition 1:
Given a textual input, the semantic classifier yields a set of entities, a set of attributes, and the relationships between pairs of entities (Eq. (1)).

Definition 2:
Suppose there exist a model repository and an entity set. Every entity in the entity set maps to a model in the repository through a one-to-one function (Eq. (2)).

Definition 3:
Suppose there is a finite set of shapes, each with a scaling factor, a centroid coordinate, spatial attributes, and a relation. Let there be some numeric value and the length of a shape. Let there exist a scaling transform and a relative positioning transform, and let there exist a rendering engine. The visual simulation is given by applying the rendering engine to the scaled and positioned shapes.
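As a reading aid, the definitions above and the placement cases that follow can be written in one possible notation. Every symbol below ($T$, $\phi$, $E$, $A$, $R$, $M$, $f$, $S$, $s_i$, $c_i = (x_i, y_i)$, $a$, $k$, $L_j$) is an assumption introduced here for illustration rather than the notation of the original formulation, and the equations are left unnumbered on purpose.

```latex
% Illustrative notation only; all symbols are assumptions.
\begin{align*}
  % Classification: text T yields entities E, attributes A, relations R
  \phi(T) &= (E, A, R), \qquad R \subseteq E \times E \\
  % Repository lookup: each entity maps to exactly one model in M
  f &: E \to M, \qquad f \text{ one-to-one} \\
  % Single object i: scale model m_i by s_i, render at centroid c_i = (x_i, y_i)
  V_i &= \mathrm{Render}\bigl(S(m_i, s_i),\, c_i\bigr) \\
  % Object i placed relative to object j; k is the height or thickness of j
  (x_i', y_i') &= (x_j,\; y_j + k) \\
  % Same, with spatial attribute a measured along j, whose length is L_j
  (x_i', y_i') &= \bigl(x_j - (L_j/2 - a),\; y_j + k\bigr)
\end{align*}
```

Under these assumptions, a hypothetical mass placed at the 30 cm mark of a 100 cm ruler centred at $x_j$ would be rendered at $x_j - (50 - 30) = x_j - 20$.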

1) For single object placement (Eq. (3) and (4)), a model is rendered using its own centroid coordinate and its scaling vector.
2) For two relatively positioned objects (Eq. (6)), the centroid coordinate of an object is recalculated from the centroid coordinate of its relative object plus a numeric value, which could be the height or thickness of the relative object.
3) For two relatively positioned objects with a spatial attribute (Eq. (7)), the position of an object along its relative object is recalculated by first subtracting the value of the spatial attribute from the mid-length value of the relative object. This result is then subtracted from the corresponding centroid coordinate of the relative object. A numeric value, which could be the height or thickness of the relative object, is added to the other centroid coordinate to give the final position.

Given a textual input, the semantic classifier extracts all entities, attributes, and relationships between entities, as described in Eq. (1). Next, the models for the extracted entities are retrieved from the model repository (Eq. (2)). Each model has a unique centroid coordinate and a scaling vector, which the placement cases above use.

B. System Architecture
Fig. 2 shows the Text-to-Simulation Knowledge Visualization architecture. It comprises four major modules: the Natural Language Processing module, the 2D Graphic Models Knowledge Base, the Spatial Ordering module, and the user interface module. The Natural Language Processing module performs tokenization (breaking down the user's input sentence into words, numbers, punctuation marks, and other discrete items), Part-of-Speech tagging (allocating a POS tag per word), dependency parsing (assigning dependency labels to show relationship patterns between object and subject tokens), and classification and extraction of domain-specific words, attributes, and relationships. The 2D models are stored in the image repository. The Spatial Ordering module determines the order of model rendering. The graphics and physics engines are built into the application to handle rendering as well as collision and interaction among models.
1) Natural language processing and dependency parsing: We make use of the SpaCy toolkit for natural language processing and dependency parsing. It is an existing industrial natural language processing library written in the Python programming language [40]. SpaCy is known for a higher level of speed and accuracy in major NLP tasks such as POS tagging [46] and dependency parsing [47] when compared to other NLP applications. Its tokenization, POS tagging, and parsing components are used in our work. The NLP pipeline selected for this research supports the English language, since it is the most common medium of communication where this architecture was implemented and evaluated.
Each word in the user's input is checked against a text corpus for relevance. If the sentence is not domain-related, the user is prompted to input domain-related sentences. Tokenization converts each word in the sentence to a token. Dependency parsing assigns a POS tag to each word and assigns dependency labels that show relationship patterns between object and subject tokens. Fig. 3 shows an example of a dependency graph for the sentence 'place a ruler'.
2) 2D models: A repository was created to store 2D models. The major apparatus used for performing high school experiments in Mechanics (a subtopic in Physics) is modeled in 2D for this research. The intended and purposely selected users for evaluation of the implemented architecture informed the choice of two-dimensional modeling. 2D graphic models provide the necessary visual knowledge without much distraction and complexity.
The apparatus image library consists of 14 classes. Each 2D apparatus model was created by modifying existing objects from the Pymunk library, and an apparatus model comprises one or more objects. Fig. 4 is a class diagram showing each apparatus class with its attributes and methods, as well as the relationships among the apparatus classes.
3) Rule-based classification and extraction algorithm: We present a novel rule-based classification algorithm that accepts a predefined objects-of-interest list and tokens (Fig. 5). Noun tokens extracted by POS tagging must exist in the predefined list of objects of interest. The algorithm returns an object list, an object-attribute list, a relations list, and an object-rel-object list that shows the relationships between objects. In this work, only relationships between two objects are considered.
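The inputs and outputs described above can be illustrated with a minimal sketch. It is not the algorithm of Fig. 5: the three rules, the `OBJECTS_OF_INTEREST` and `RELATIONS` sets, and the `(text, pos)` token tuples (standing in for POS-tagger output) are assumptions made only for illustration.

```python
# Hypothetical vocabulary; the real objects-of-interest list is not reproduced here.
OBJECTS_OF_INTEREST = {"ruler", "mass", "spring", "table"}
RELATIONS = {"on", "under", "inside"}


def classify(tokens, objects_of_interest=OBJECTS_OF_INTEREST):
    """Return (objects, object_attributes, relations, object_rel_object).

    tokens: list of (text, pos) pairs with coarse POS tags such as
    "NOUN", "NUM", and "ADP" (assumed to come from a POS tagger).
    """
    objects, attributes, relations, triples = [], [], [], []
    last_object = None   # most recent object of interest seen so far
    pending = None       # (first_object, relation) waiting for its second object

    for text, pos in tokens:
        word = text.lower()
        if pos == "NOUN" and word in objects_of_interest:
            # Rule 1: a noun counts as an object only if it is in the predefined list.
            objects.append(word)
            if pending is not None:
                triples.append((pending[0], pending[1], word))
                pending = None
            last_object = word
        elif pos == "NUM" and last_object is not None:
            # Rule 2: a numeric token is an attribute of the preceding object.
            attributes.append((last_object, word))
        elif pos == "ADP" and word in RELATIONS:
            # Rule 3: a known preposition links the preceding and following objects.
            relations.append(word)
            if last_object is not None:
                pending = (last_object, word)
    return objects, attributes, relations, triples


if __name__ == "__main__":
    sentence = [("place", "VERB"), ("a", "DET"), ("mass", "NOUN"), ("of", "ADP"),
                ("5kg", "NUM"), ("on", "ADP"), ("the", "DET"), ("ruler", "NOUN")]
    print(classify(sentence))
```

Because rule 2 only fires on numeric tokens, this sketch also misses attributes that lack a numeric value, which is one of the error types reported in the evaluation.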

4) Attributes and relations grounding: We consider spatial attributes such as the weight, size, and unique position of objects, which are extracted during NLP. An object's color and other characteristics are not emphasized, as much work has been done in this area in the literature [24], [28]. The spatial attributes are essential for precision in the visual simulation output. A predefined set of three relations is considered: on, under, and inside. The bounding box approach was employed to determine close distances and positions appropriately. The (x, y) coordinates, the height, and the width of the bounding box of a model were used to determine relative positions.
5) Spatial Ordering: Spatial ordering is the task of sequentially rendering models and determining relative positioning for related models. Given an object list O, an object-attribute list P, a relations list Q, and an object-rel-object list R, a visual simulation is sequentially rendered from these lists. The next section presents an implementation of the methodology.
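As an illustration of the grounding and ordering steps in this section, the sketch below orders objects so that each reference object is placed first and then positions bounding boxes for the relations on, under, and inside. It is a sketch under stated assumptions, not the authors' Spatial Ordering algorithm: the `Box` type, the function names, the y-up coordinate system, the default location, and the use of half-heights as the vertical offset are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float        # centroid x
    y: float        # centroid y (y axis assumed to point up)
    width: float
    height: float


def spatial_order(objects, triples):
    """Order objects so every reference object precedes what is placed on it."""
    depends_on = {subj: ref for subj, _rel, ref in triples}
    ordered, remaining = [], list(objects)
    while remaining:
        progressed = False
        for name in list(remaining):
            ref = depends_on.get(name)
            if ref is None or ref in ordered:
                ordered.append(name)
                remaining.remove(name)
                progressed = True
        if not progressed:
            raise ValueError("placement relations cannot be resolved")
    return ordered


def place_relative(width, height, ref, relation, position=None):
    """Bounding box of an object placed relative to the box `ref`."""
    # With a position attribute, offset from the mid-length of the reference.
    x = ref.x if position is None else ref.x - (ref.width / 2 - position)
    if relation == "on":
        y = ref.y + ref.height / 2 + height / 2
    elif relation == "under":
        y = ref.y - ref.height / 2 - height / 2
    elif relation == "inside":
        y = ref.y
    else:
        raise ValueError(f"unsupported relation: {relation}")
    return Box(x, y, width, height)


def layout(objects, triples, sizes, positions=None):
    """Return {name: Box}; objects without a relation get a default location."""
    positions = positions or {}
    relation_of = {subj: (rel, ref) for subj, rel, ref in triples}
    placed = {}
    for name in spatial_order(objects, triples):
        width, height = sizes[name]
        if name in relation_of:
            rel, ref = relation_of[name]
            placed[name] = place_relative(width, height, placed[ref], rel,
                                          positions.get(name))
        else:
            placed[name] = Box(0.0, height / 2, width, height)
    return placed


if __name__ == "__main__":
    boxes = layout(["mass", "ruler", "table"],
                   [("mass", "on", "ruler"), ("ruler", "on", "table")],
                   {"table": (200, 80), "ruler": (100, 4), "mass": (10, 10)},
                   positions={"mass": 30})
    print(boxes["mass"])
```

Note that every object without a relation receives the same default location in this sketch, so two unrelated objects would overlap unless an additional offset rule were introduced.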

IV. "TEXT2SIMULATE" KNOWLEDGE VISUALIZATION SYSTEM
We develop a knowledge visualization system called Text2Simulate based on the architecture shown in Fig. 2.

A. Software Requirement Gathering and Elicitation
Domain experts (three teachers) who taught physics in High schools located in a remote area were interviewed to retrieve system requirements since the targeted users of the application are high school physics subject tutors and senior high school students from remote areas. It is expected that the users have an elementary level of proficiency in English language. The users should also be familiar with basic concepts and terms used in high school physics. The Text2Simulate Knowledge visualization system can be used by students with little or no supervision of teachers.

B. Dataset
The dataset used for this application consists of the models in the apparatus image library and physics-related sentences in natural language text. The apparatus image library contains apparatus object images. Existing objects in the Pymunk library were modified and used to create apparatus models for the ruler, knife-edge, and other apparatus shown in Fig. 2.

C. Graphic User Interface
The user interface is divided into three major sections as shown in Fig. 6. The visual simulation is viewed in the right section. On the upper left, the textbox is used for accepting textual input from users. Apparatus models can also be viewed by clicking on objects from the Toolbar. Selected models are then viewed on the right viewing pane. When the button 'analyze' is clicked, the extracted objects, object-attribute list, and the object-rel-object list are shown on the lower-left pane. The user then clicks the 'simulate' button to generate visual simulations which are shown on the right pane.

D. Implementation Results
The models can be viewed by clicking each model, as seen in Fig. 7, where a table model is shown when clicked. Fig. 8 to 10 are screenshots of sentence inputs, classification results, and visual simulations. The classification results show the objects, attributes, and relations extracted from the sentence input. Fig. 8 shows a visual simulation result for 'Hang a spring to a retort stand and place a mass of 6kg on the spring'. During visual simulation, the motion can be viewed when the mass is attached to the spring. The spring continues to oscillate until it reaches its equilibrium position. The screenshot was taken when the spring had reached its equilibrium position. Fig. 9 and 10 show visual simulations for sentence inputs describing the principle of moments using a balanced ruler experiment [48].

V. QUANTITATIVE EVALUATION
We first perform a quantitative evaluation of the rule-based semantic classifier. Then, the accuracy of converting textual knowledge to a visual simulation representation is evaluated. Finally, a user evaluation of the knowledge visualization tool is conducted.

A. Rule-Based Classification Evaluation
A total of 110 sentences were purposively selected and used for evaluation. A total of 60 sentences were extracted from the general domain; while the remaining 50 were domain-specific sentences.
The Objects_of_Interest, Object_Attribute, and Object_Relation_Object classification results are compared with human-generated classification. Standard performance evaluation metrics (recall, accuracy, precision, and F1 score) based on the confusion matrix are employed. Each outcome is assigned True Positive (TP) if it agrees with the human-generated classification; False Positive (FP) if it is extracted as a member of a list but the human-generated classification does not place it in that list; and False Negative (FN) if the human-generated classification assigns it to a list but it is not included in the extracted list. Eftimov [49] and Popovski [50] reported that the True Negative metric is not required for the evaluation of rule-based entity classification methods. Hence, True Negative values are not reported. Table I presents the three classification categories: the Objects of Interest list, the Object-Attribute list, and the Object-Relation-Object list. For Objects_of_Interest classification, the accuracy is 0.9565, as presented in Fig. 11; the precision is 1; the recall is 0.9565; and the F1 score is 0.9777. For Object_Attribute classification and extraction, the accuracy is 0.8888, the precision is 1, the recall is 0.8888, and the F1 score is 0.9411. For Object_Relation_Object classification, the accuracy is 0.9286, the precision is 1, the recall is 0.9286, and the F1 score is 0.9630. The precision value of 1 for all the classifications is achieved because every item the classifier placed in a list also belongs to that list according to human judgment, that is, there are no false positives. It can also be attributed to the advantage of employing a rule-based approach for the classifier modeling, as reported in Al-Moslmi [51]. Accuracy and recall have the same value for all the categories because True Negative outcomes are not computed for this evaluation and no False Positives occurred, so both metrics reduce to TP / (TP + FN). The results show that the classifier can correctly classify and extract all three categories into separate lists.
In summary, the classifier achieved F1 scores above 0.94 in all three categories. During error analysis of the semantic classifier results, the dependency parser labeled a few relations between objects as modifiers rather than prepositions, whereas human judgment categorized them as prepositions. Some attributes were also not identified because they did not have numeric values. The dependency parser did not identify a few compound words as nouns (objects); the words were split into a modifier and a noun (for example, 'inclined plane'), whereas human classification categorized them as nouns.
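The metrics used in this subsection can be computed from TP, FP, and FN counts alone, as the short sketch below shows. The function name is hypothetical, and the counts in the usage example are illustrative values, not the counts of Table I; accuracy is computed without True Negatives, following the evaluation described above.

```python
def classification_metrics(tp, fp, fn):
    """Precision, recall, F1, and accuracy when true negatives are not counted."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp / (tp + fp + fn)  # no TN term in numerator or denominator
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Illustrative counts only: 22 correct extractions, none spurious, 1 missed.
    scores = classification_metrics(tp=22, fp=0, fn=1)
    for name, value in scores.items():
        print(f"{name}: {value:.4f}")
```

With `fp=0`, the accuracy and recall expressions become identical, which is why the two values coincide in every category.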

B. Evaluation of Knowledge Visualization System based on Visual Simulation Results
Existing evaluation criteria used for evaluating the visual representation of information visualization techniques in [52] and [53] were adopted and modified to include an Object's Motion criterion. Two domain experts were selected to evaluate the Text2Simulate Knowledge Visualization application based on its visual representation results. The simulations generated by the Text2Simulate application were evaluated using 50 purposively selected domain-specific sentences and compared with human judgment based on the following modified criteria: Object Inclusion (OI), Object-Attribute Visibility (OAV), Object's Motion (OM), Relative Positioning (RP), and Exact Visual Representation of Text (EVR). Each criterion was evaluated using performance metrics based on a confusion matrix. Each result is assigned True Positive (TP) if the visual simulation matches human judgment; False Positive (FP) if the visual simulation is not semantically correct based on human judgment; and False Negative (FN) if there is no visual simulation (the object is static) but there should be one based on human judgment. The confusion matrix for the visual simulation tasks is presented in Table II. The visual simulation results showed high accuracy for the knowledge presented, ranging from 84.0% for OAV to 96.0% for EVR, as shown in Fig. 12. Spatial attributes were clearly visualized, as was the motion of objects, which depended on their unique attributes, sizes, and intersection with other relative objects. During error analysis, a statement such as 'place two masses of 5kg and 20kg on a table' produced a visual representation showing only one mass and a table. Although the second mass was also rendered, it was hidden because both masses were placed at the same default location. It was also observed that some objects were not rendered even though the classifier identified them as objects, because their models were not found in the image library.

C. User Study Evaluation
A total of 10 physics teachers and 20 students who have little or no prior graphics knowledge and reside in developing areas were purposively selected for the user study. Both sets of participants were trained on how to simulate physics experiments with the knowledge visualization tool. They were then asked to perform two experiments, namely the Principle of Moments Using a Balanced Ruler and Hooke's Law, as shown in Fig. 8, 9, and 10. The participants were finally asked to fill in a user feedback questionnaire based on a two-point Likert scale (Yes/No) after performing both experiments. The questions in the survey were drafted to indicate the level of system effectiveness and the participants' overall satisfaction with the application, based on the following 10 purposively selected metrics: Graphic Design (T1), User-Friendliness (T2), Meaningful Arrangement (T3), Meaningful Size (T4), Object-Attribute Visible (T5), Semantic Correlation of Text and Simulation (T6), Ability to Understand (T7), Ease of Use (T8), Reading Robustness (T9), and Reusability (T10). The Text2Simulate Knowledge Visualization application was chosen as the independent variable, while T1 to T10 were the dependent variables. The results of the analysis of means are shown in Fig. 13.
Fig. 13. Graph of mean scores for teachers' and students' evaluation of Text2Simulate based on the T1-T10 metrics.
Fig. 13 shows that the teachers were fully satisfied, as the T2, T3, T4, T5, T6, T8, T9, T10, and T11 metrics have an average value of 1. These metrics are User-Friendliness, Meaningful Arrangement, Meaningful Size, Object-Attribute Visible, Semantic Correlation of Text and Simulation, Ease of Use, Reading Robustness, Reusability, and Knowledge Sharing. This can be attributed to the essential need for the Text2Simulate application as a teaching aid and for electronic learning due to the recent pandemic, as stated by one of the teachers. The high performance on these metrics is also due to the current non-availability of laboratory apparatus in the participants' schools and their preference for the Text2Simulate application over conducting the experiments for students in their laboratories, if available. The mean scores for the students' overall satisfaction are well above 0.5 on all metrics, ranging from 0.7 to 0.95. This reflects that most of the students were satisfied with the application.

VI. CONCLUSIONS AND FUTURE WORK
A Text-to-Simulation Knowledge Visualization (TSKV) technique for generating visual simulations from domain knowledge has been presented and implemented using a newly developed knowledge visualization application called 'Text2Simulate'. The generated results have shown that precise semantic visual representations of spatial attributes and relationships between objects of discourse can be generated from natural language text using this technique. The developed rule-based semantic classifier can be used for domain-related classification of text that requires classification and extraction of objects, object properties, and the relationships between objects. The text-to-simulation technique for knowledge visualization produced a better visual representation of textual knowledge than existing knowledge visualization techniques due to its ability to visualize spatial object attributes retrieved from the text. This technique could be employed when developing electronic learning applications.