Classifying Natural Language Text as Controlled and Uncontrolled for UML Diagrams

Natural language text falls into two categories: controlled and uncontrolled natural language. In this paper, an algorithm is presented to determine whether a given natural language text is controlled or uncontrolled. The parameters and framework for a repository of UML diagrams are provided, together with the parameters that distinguish controlled from uncontrolled languages.
Keywords—Natural Language Processing; UML Diagrams; Software Engineering


I. INTRODUCTION
Natural language processing (NLP) uses computers to process text that humans can read and understand. Processing in NLP requires natural language text as input [1]. This input can be classified as controlled or uncontrolled. NLP techniques are more effective on controlled languages than on uncontrolled languages. A controlled language can take the form of plain text or of a formal specification such as BPNF. The authors have previously developed code for removing noise from a given text [2].
In this work, the Classifying Controlled Language (CCL) methodology is proposed. The CCL methodology takes natural language text as input and uses WordNet, FrameNet and the Stanford NLP parser to check whether the language is controlled. Controlled languages have a host of advantages that uncontrolled languages lack. This work continues the TextToUML (TTU) methodology presented earlier [3].
Our contributions are as follows:
1) Providing a sample textual description for the class and activity diagrams.
2) Classifying the textual description as controlled or uncontrolled for UML diagrams.

II. PROBLEM DEFINITION
The problem relates to providing appropriate input for generating UML diagrams. There have been considerable efforts at converting text to UML diagrams [6][7][8]. However, no researcher has presented a textual description for class and activity diagrams.
Textual descriptions are used extensively for use case diagrams [4]. Simple text in a natural language is understood by a large population of people, particularly its native speakers. To achieve universal programmability, text [5] can be used because humans can understand it. This implies that everyone gets a chance to code, while everything else is done by the computer. Using textual descriptions for UML diagrams has the following benefits:
1) Natural language text is known to humans.
2) Natural language text is understood by humans.
3) It is subject to less interpretation and speculation if the text is controlled.
4) It is understood by a large audience.
5) It makes many automation applications possible.
6) It is more comfortable to understand.
7) It is the most convenient form of expressing information.
Figure-1 lists the advantages of textual description. The level of comfort is highest for human-readable text and decreases as the specification moves away from it; machine-readable code is the most difficult to interpret and understand. Previous literature did not focus on classification according to the domain level [13,14].
www.ijacsa.thesai.org

A. Textual Description of UML Diagrams
Cockburn et al. provide a use case template, which is an exhaustive example of a textual description [4]. While this template is specific to use case diagrams, no other diagram has such a description. UML diagrams are developed when the analysis phase is about to finish and the design phase begins. In practice, the phases of the SDLC overlap, so it is not feasible to tell exactly when the analysis phase starts and when it finishes. Textual description makes automation possible in the early phases of the SDLC. This can be achieved by generating textual descriptions of all the different artefacts, processes, methods and tools in the SDLC.

B. UML Diagram Basic Usage
A textual specification exists in the form of the textual use case template, which helps in generating the use case UML diagram [4]. Use case diagram generation is therefore easy to automate. However, no comparable description is available for the other UML diagrams. This explanation answers research question RQ-1. To answer the second question (RQ-2), we give a sample textual description for the class and activity diagrams. The description was evaluated using the StanfordNLP parser.

Textual Description - Class Diagram
A class diagram consists of two main features [15]:
1) Components of the class
a) These are names, variables and operations.
2) Relationships with other classes
a) These include relationships such as dependency, realization, etc.
The class description should describe all the situations in which the class participates. The class diagram can be drawn from both the problem domain and the solution domain. Hence, classes from both domains must be properly described so that the process can be automated. A sample description of the class diagram containing the above-mentioned variables is available in Annexure A.
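As a rough illustration of the two features listed above, a class description could be held as a small record and rendered as short, uniform sentences. This is a minimal sketch under stated assumptions: the `ClassDescription` record, its field names and the sentence templates in `to_text` are hypothetical and are not the format used in Annexure A.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ClassDescription:
    """Hypothetical record for one class: its components and its relationships."""
    name: str
    domain: str  # "problem" or "solution" domain
    variables: List[str] = field(default_factory=list)
    operations: List[str] = field(default_factory=list)
    # Each relationship is (kind, other class), e.g. ("dependency", "Passenger").
    relationships: List[Tuple[str, str]] = field(default_factory=list)

    def to_text(self) -> str:
        """Render the record as one short sentence per fact (assumed templates)."""
        lines = [f"{self.name} is a class in the {self.domain} domain."]
        if self.variables:
            lines.append(f"{self.name} has variables {', '.join(self.variables)}.")
        if self.operations:
            lines.append(f"{self.name} has operations {', '.join(self.operations)}.")
        for kind, other in self.relationships:
            lines.append(f"{self.name} has a {kind} relationship with {other}.")
        return "\n".join(lines)


ticket = ClassDescription(
    "Ticket", "problem", ["ticketNumber", "seat"], ["reserve", "cancel"],
    [("dependency", "Passenger")],
)
print(ticket.to_text())
```

Keeping one fact per sentence is a deliberate choice here: each generated sentence has a fixed shape, which is the kind of text the controlled/uncontrolled check is meant to favour.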

Textual Description - Activity Diagrams
1) An activity diagram mainly focuses on the behavior of a set of objects [15]. A textual description for an activity diagram should hence contain the following parts:
a) Objects: the names of the objects.
b) Associated workflows.
2) A sample description of the activity diagram containing the above-mentioned variables is available in Annexure A.
3) The textual description is first parsed using the Stanford NLP parser. The result is then classified as controlled or uncontrolled language for the generation of UML diagrams.
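The classification step in item 3 can be pictured as a function over parser output. The sketch below is only an illustration: it assumes the Stanford parser has already produced Penn Treebank part-of-speech tags as `(word, tag)` pairs, and the criterion it applies (each sentence must show a Subject-Verb-Object order and contain no personal pronouns) is a hypothetical stand-in, not the exact CCL test.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, Penn Treebank POS tag) for one sentence

# Assumed criterion (not taken from the paper): pronouns make references ambiguous.
AMBIGUOUS_TAGS = {"PRP", "PRP$"}


def is_controlled_sentence(tagged: Tagged) -> bool:
    """Assumed test: a noun before the first verb, a noun after it, no pronouns."""
    tags = [tag for _, tag in tagged]
    if any(t in AMBIGUOUS_TAGS for t in tags):
        return False
    verbs = [i for i, t in enumerate(tags) if t.startswith("VB")]
    if not verbs:
        return False
    v = verbs[0]
    has_subject = any(t.startswith("NN") for t in tags[:v])
    has_object = any(t.startswith("NN") for t in tags[v + 1:])
    return has_subject and has_object


def classify_text(sentences: List[Tagged]) -> str:
    """Label the whole text controlled only if every sentence passes the test."""
    if sentences and all(is_controlled_sentence(s) for s in sentences):
        return "controlled"
    return "uncontrolled"


s1 = [("The", "DT"), ("passenger", "NN"), ("reserves", "VBZ"),
      ("a", "DT"), ("ticket", "NN"), (".", ".")]
s2 = [("He", "PRP"), ("boards", "VBZ"), ("the", "DT"), ("train", "NN"), (".", ".")]
print(classify_text([s1]))      # controlled
print(classify_text([s1, s2]))  # uncontrolled
```

Treating a single failing sentence as enough to make the whole text uncontrolled is a conservative design choice; a real classifier might instead report which sentences need rewriting.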

Rules for Use Case Diagrams
For the input scanned using the Stanford Parser:
1) Check the occurrence (index) of NN and NNP words; these may constitute a use case.
2) If the sentence follows the Subject-Verb-Object pattern, the subject becomes an actor.
3) If a sentence contains an NNS word, check its hierarchy in WordNet.
4) If a VB word occurs in a sentence, record its index, as it is potentially a use case.
The above rules are applied to generate text that is ready to be fed into the system.
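The four use case rules above can be sketched over one POS-tagged sentence. Assumptions: the input is a list of `(word, tag)` pairs already produced by the Stanford parser; rule 2 uses a simple positional heuristic for Subject-Verb-Object; rule 4 is broadened from the bare `VB` tag to every `VB*` tag; and the WordNet hierarchy lookup of rule 3 is left as a list of words to check rather than performed here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, Penn Treebank POS tag) for one sentence


@dataclass
class UseCaseCandidates:
    noun_indices: List[int] = field(default_factory=list)    # rule 1: NN/NNP positions
    actors: List[str] = field(default_factory=list)          # rule 2: SVO subjects
    wordnet_checks: List[str] = field(default_factory=list)  # rule 3: NNS words to look up
    verb_indices: List[int] = field(default_factory=list)    # rule 4: verb positions


def apply_use_case_rules(tagged: Tagged) -> UseCaseCandidates:
    out = UseCaseCandidates()
    for i, (word, tag) in enumerate(tagged):
        if tag in ("NN", "NNP"):
            out.noun_indices.append(i)
        elif tag == "NNS":
            out.wordnet_checks.append(word)  # hierarchy lookup is left to WordNet
        elif tag.startswith("VB"):
            out.verb_indices.append(i)
    # Rule 2 (heuristic): a noun before the first verb plus a noun after it is
    # treated as Subject-Verb-Object; the last noun before the verb is the subject.
    if out.verb_indices:
        v = out.verb_indices[0]
        subjects = [i for i in range(v) if tagged[i][1].startswith("NN")]
        has_object = any(t.startswith("NN") for _, t in tagged[v + 1:])
        if subjects and has_object:
            out.actors.append(tagged[subjects[-1]][0])
    return out


sentence = [("The", "DT"), ("passenger", "NN"), ("books", "VBZ"),
            ("two", "CD"), ("tickets", "NNS")]
print(apply_use_case_rules(sentence))
```

For this sentence the sketch reports `passenger` as an actor, index 2 (`books`) as a potential use case and `tickets` as a word whose WordNet hierarchy should be checked.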

Rules for Class Diagrams
For the input scanned using the Stanford Parser:
1) Check the occurrence (index) of NN and NNP words; these may constitute a class.
2) If the sentence follows the Subject-Verb-Object pattern, the subject becomes a class.
The CCL methodology encompasses two important phases: pre-processing and post-processing of text. Table-II shows the software used in the pre-processing stage of the CCL methodology, along with an explanation of the work done at each level.
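Class rules 1 and 2 above can be sketched across several tagged sentences. This is an illustrative sketch, not the CCL implementation: it assumes `(word, tag)` input from the Stanford parser, uses the same kind of positional Subject-Verb-Object heuristic as a stand-in for a full dependency analysis, and the helper names `svo_subject` and `class_candidates` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Tagged = List[Tuple[str, str]]  # (word, Penn Treebank POS tag)


def svo_subject(tagged: Tagged) -> Optional[str]:
    """Return the subject word if the sentence looks like Subject-Verb-Object."""
    tags = [t for _, t in tagged]
    verbs = [i for i, t in enumerate(tags) if t.startswith("VB")]
    if not verbs:
        return None
    v = verbs[0]
    subjects = [i for i in range(v) if tags[i] in ("NN", "NNP")]
    has_object = any(t.startswith("NN") for t in tags[v + 1:])
    return tagged[subjects[-1]][0] if subjects and has_object else None


def class_candidates(sentences: List[Tagged]) -> Dict[str, List[str]]:
    """Rule 1: NN/NNP words are candidate classes; rule 2: SVO subjects become classes."""
    candidates: List[str] = []
    classes: List[str] = []
    for tagged in sentences:
        for word, tag in tagged:
            if tag in ("NN", "NNP") and word not in candidates:
                candidates.append(word)
        subject = svo_subject(tagged)
        if subject is not None and subject not in classes:
            classes.append(subject)
    return {"candidates": candidates, "classes": classes}


s1 = [("Passenger", "NNP"), ("reserves", "VBZ"), ("a", "DT"), ("ticket", "NN")]
s2 = [("Clerk", "NNP"), ("issues", "VBZ"), ("the", "DT"), ("ticket", "NN")]
print(class_candidates([s1, s2]))
```

Order-preserving de-duplication keeps the first mention of each candidate, so a noun that recurs across sentences (here `ticket`) is listed once.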

IV. APPLICATION OF WORK
This work can be applied to several approaches that use textual information as input:
1) Human-Usable Textual Notation (HUTN) is a recent specification issued by the Object Management Group (OMG). This specification is also in textual format [16], so the CCL methodology can be applied to it.
2) Software Requirement Specification, Requirement Document, Use Case Description and Acceptance Test Cases are artifacts to which CCL can be applied in the analysis phase.
3) Software Design Specification, UML Diagrams and Test Cases are artifacts to which CCL can be applied.
4) Test cases and test manuals are further areas in which the CCL algorithm can be applied.
5) Maintenance logs, user complaints and customer executive logs are also areas in which CCL can be applied.
6) The UML Lookup Repository can be expanded to include other domains in software engineering for generating useful information or for automation.

UML Repository Table
The repository was developed using basic information about the UML diagrams.
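One possible shape for such a repository is a lookup from diagram type to the parts its textual description must contain, which lets a tool report what a given description still lacks. This is a hypothetical sketch: the `UML_REPOSITORY` layout and the `missing_parts` helper are assumptions, the class and activity parts are taken from the descriptions earlier in this paper, and the use case entry is omitted because the fields of the use case template [4] are not listed here.

```python
from typing import Dict, List, Set

# Hypothetical repository layout: diagram type -> parts its textual description needs.
UML_REPOSITORY: Dict[str, List[str]] = {
    "class": ["names", "variables", "operations", "relationships"],
    "activity": ["objects", "workflows"],
}


def missing_parts(diagram: str, provided: Set[str]) -> List[str]:
    """Return the required parts that a textual description has not supplied yet."""
    if diagram not in UML_REPOSITORY:
        raise KeyError(f"no repository entry for diagram type {diagram!r}")
    return [part for part in UML_REPOSITORY[diagram] if part not in provided]


print(missing_parts("class", {"names", "operations"}))       # ['variables', 'relationships']
print(missing_parts("activity", {"objects", "workflows"}))  # []
```

Extending the repository to other software engineering domains, as item 6 of the application list suggests, would then amount to adding new keys.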

Fig. 1. Advantages of Textual Description (comfort levels shown include coding in high-level languages and machine-readable binary code)
The following research questions were not extensively addressed in the literature review:
RQ-1. What are the textual descriptions that describe the UML diagrams?
RQ-2. Is it possible to provide a textual specification for all UML diagrams?
RQ-3. How can natural language text be classified as controlled or uncontrolled for UML diagrams?
The remaining rules for class diagrams are:
3) If a sentence contains an NNS word, check its hierarchy in WordNet.
4) If a VB word occurs in a sentence, record its index, as it is potentially a use case.
Multiple sentences containing the same subject are not considered for developing classes. The whole process is divided into pre-processing and post-processing of the text, as shown in Figure-2.

Table-I provides a list of papers that use textual descriptions in UML diagrams or in related fields.

TABLE II. SOFTWARE USED FOR PRE-PROCESSING STAGE
Sr. No. | Operation Carried Out | Software/Methodology Used | Explanation

TABLE III. SOFTWARE USED FOR POST-PROCESSING STAGE