Generation of Attributes for Bangla Words for Universal Networking Language (UNL)

Abstract: The use of native languages on the Internet is in high demand because of the rapid growth of Internet-based applications for daily needs. It is important that information on the Internet can be read in Bangla. Universal Networking Language (UNL) addresses this issue for many languages. It helps to overcome the language barrier among people of different nations and so to solve problems emerging from current globalization trends and geopolitical interdependence. In this paper we propose work that contributes a morphological analysis of those Bangla words from which we obtain roots and primary suffixes, and that develops grammatical attributes for these roots and primary suffixes. The attributes can be used to prepare a Bangla word dictionary and Enconversion/Deconversion rules for Natural Language Processing (NLP).


I. INTRODUCTION
In the last few years, machine translation techniques have been applied to web environments. The growing amount of multilingual information available on the Internet, together with the growing number of Internet users, has led to a justifiable interest in this area. Hundreds of millions of people all over the world, of almost all levels of education and attitudes and in different jobs, use the Internet for different purposes [1]. English is the main language of the Internet. Understandably, not all people know English. An urgent need has therefore arisen to develop interlingua translation programs. The main goal of the UNL system is to provide Internet users with access to multilingual websites using a common representation. This will allow users to view websites in their native languages. The UNL system has growing relevance, since the usage of the WWW is generalized across cultural and linguistic barriers. Many languages [10], such as Arabic, French, Russian, Spanish, Italian, English, Chinese and Brazilian Portuguese, have already been included in the UNL platform. Lexical knowledge representation is a critical issue in natural language processing systems. Recently, special focus has been given to the development of large-scale lexica with specific formats that can be used by several different kinds of applications, in particular multilingual systems. Our aim is to introduce Bangla into this system. In order to implement this project with the lowest cost in time and human effort, we will reuse linguistic resources that are already available as much as possible.
In this paper we present the UNL system for Bangla. The major components of our research are i) development of grammatical attributes for Bangla roots and Krit Prottoy to construct a Bangla Word Dictionary, and use of morphological analysis, ii) UNL expression of the Bangla attributes and iii) a scheme for selecting attributes. In Section II we describe the UNL system. In Sections III, IV and V we present our main work, which covers all three components.

II. UNIVERSAL NETWORKING LANGUAGE
UNL is an artificial language that allows the processing of information across linguistic barriers [10]. This artificial language has been developed to convey linguistic expressions of natural languages for machine translation purposes. Such information is expressed in an unambiguous way through a semantic network with hyper-nodes. The network is composed of nodes (which represent concepts) and arcs (which represent relations between concepts). UNL contains three main elements:
• Universal Words: Nodes that represent word meaning.
• Relation Labels: Tags that represent the relationship between Universal Words. Tags are the arcs of the UNL hypergraph.
• Attribute Labels: Additional information about the Universal Words.
These elements are combined in order to establish a hierarchical Knowledge Base (UNLKB) [10] that defines the semantics of UWs unambiguously. The UNL Development Set provides tools that enable the semi-automatic conversion of natural language into UNL and vice versa. Two such tools are the EnConverter and the DeConverter. The main role of the EnConverter [11] is to translate natural language sentences into UNL expressions. This tool implements a language-independent parser that provides a framework for synchronous morphological, syntactic and semantic analysis. This allows morphological and syntactic ambiguities to be resolved. The DeConverter [3,12], on the other hand, is a language-independent generator that converts UNL expressions to natural language sentences.
www.ijacsa.thesai.org

A. Universal Words
Universal Words are the words that constitute the vocabulary of UNL. A UW is not only a syntactic and semantic unit of the UNL for expressing a concept, but also a basic element for constructing a UNL expression of a sentence or a compound concept. Such a UW is represented as a node in a hypergraph. From the viewpoint of composition, there are two classes of UWs:
• labels defined to express unit concepts, called "UWs" (Universal Words)
• a compound structure of a set of binary relations grouped together, called "Compound UWs".

B. Relational Labels
The relation [1] between UWs is binary, and relations have different labels according to the different roles they play. A relation label is represented as a string of three characters or fewer. There are many factors to be considered in choosing an inventory of relations. The following is an example of a relation defined according to the above principles.
Relation: There are 46 types of relations in UNL. For example, agt (agent) defines a thing that initiates an action: agt(do, thing), agt(action, thing), obj(thing with attributes), etc.
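As an illustration, a binary relation between two UWs might be held in memory as sketched below. The class name `Relation` and its field names are assumptions made for this sketch and are not part of the UNL specification; the only facts taken from the text are that a relation is binary and that its label has at most three characters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One arc of a UNL graph (hypothetical helper, not a UNL tool)."""
    label: str   # relation label such as "agt" or "obj"
    source: str  # first UW of the binary relation
    target: str  # second UW of the binary relation

    def __post_init__(self):
        # The text states that a label is a string of three characters or fewer.
        if not 1 <= len(self.label) <= 3:
            raise ValueError(f"invalid relation label: {self.label!r}")

    def __str__(self):
        return f"{self.label}({self.source}, {self.target})"


print(Relation("agt", "do", "thing"))  # agt(do, thing)
```

A frozen dataclass is used so that relations can be stored in sets and compared by value.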

C. Attributes
The attributes represent the grammatical properties of the words. Attributes of UWs are used to describe the subjectivity of sentences. They show what is said from the speaker's point of view: how the speaker views what is said. This includes phenomena technically [4,5] called speech acts, propositional attitudes, truth values, etc. Conceptual relations and UWs are used to describe the objectivity of sentences. Attributes of UWs enrich this description with more information about how the speaker views these states of affairs and his attitudes toward them.

III. MORPHOLOGY OF BANGLA WORDS
Morphology is the field of linguistics that studies the structure of words. It focuses on patterns of word formation within and across languages, and attempts to formulate rules that model the knowledge of the speakers of those languages. Morphological analysis is thus centered on the analysis and generation of word forms. It deals with the internal structure of words and how words can be formed. Morphology plays an important [2,8] role in applications such as spell checking, electronic dictionary interfacing and information retrieval systems, where it is important that words that are only morphological variants of each other are identified and treated similarly. In natural language processing (NLP) and machine translation (MT) systems we need to identify words in texts in order to determine their syntactic and semantic properties [7]. Morphological study helps here with rules for analyzing the structure and formation of words. A Bangla morpheme, besides the root word, is supposed to be represented in the Bangla-UNL dictionary using the following UNL format [10].
[HW] "UW" (ATTRIBUTE 1, ATTRIBUTE 2 …) <FLG, FRE, PRI>
HW ← Head Word (Bangla Word)
UW ← Universal Word
ATTRIBUTE ← Attribute of the HW
FLG ← Language Flag
FRE ← Frequency of Head Word
PRI ← Priority of Head Word
The attributes describe the nature of the head word, classifying it as a grammatical, semantic or morphological feature. So, we will be especially concerned with the representation of morphemes using various attributes.
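A dictionary line in this format can be read mechanically. The following is a minimal sketch, assuming that each entry sits on one line, that the UW is enclosed in double quotes, and that an optional `{}` may follow the head word (as in the entries shown later in this paper). The names `Entry` and `parse_entry` and the sample line are hypothetical and are not taken from any UNL tool.

```python
import re
from typing import List, NamedTuple


class Entry(NamedTuple):
    head_word: str
    uw: str
    attributes: List[str]
    flag: str
    frequency: int
    priority: int


# [HW] optional {} "UW" (ATTR, ...) <FLG, FRE, PRI>
_ENTRY = re.compile(
    r'\[(?P<hw>[^\]]*)\]\s*'
    r'(?:\{[^}]*\}\s*)?'
    r'"(?P<uw>[^"]*)"\s*'
    r'\((?P<attrs>[^)]*)\)\s*'
    r'<(?P<flg>[^,>]*),(?P<fre>[^,>]*),(?P<pri>[^,>]*)>'
)


def parse_entry(line: str) -> Entry:
    """Split one dictionary line into its fields, or raise ValueError."""
    m = _ENTRY.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    attrs = [a.strip() for a in m["attrs"].split(",") if a.strip()]
    return Entry(m["hw"].strip(), m["uw"].strip(), attrs,
                 m["flg"].strip(), int(m["fre"]), int(m["pri"]))


# Sample line (the <B,0,0> part is assumed here for illustration).
sample = '[পাখি]{} "bird(icl>animal>animate thing)" (N, ANI, SG, CONCRETE) <B,0,0>'
print(parse_entry(sample))
```

The UW is matched as everything between the double quotes, so the parentheses and the `>` inside a UW such as `bird(icl>animal>animate thing)` do not confuse the attribute list or the `<FLG, FRE, PRI>` part.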
In our work, we will make separate Word Dictionary entries for all of these prefixes and words, so that they can be combined into meaningful words by applying rules. For example, if we consider the prefix "প্রতত্" [9] (meaning like/similar/every/opposite/against, etc.), we can make "প্রতত্তদন", "প্রতত্ব্দ", etc. We can now make a dictionary entry for the word "প্রতত্". But the word "প্রতত্" has two or more meanings, so we have to represent two or more dictionary entries for the word as follows.
We have to represent only the words "তদন", "ব্দ" in the dictionary entry as per the following format.

E. Verb Morphology
Diversity of verb morphology in Bangla is very significant. We can select the head word as the Longest Common Lexical Unit (LCLU) of all the possible transformations of the word [8]. Consider, for example, the Bangla word "ড়" (meaning read). The corresponding UW in basic form is "read". The dictionary entry is [ড়া] { } "read (icl>do)", where "ড়v &" is the head word and (icl>do) is from the knowledge base. Some possible transformations of "ড়" in the Bangla to UNL dictionary are given as follows [9,10]. If we consider "ড়" (meaning read) as a root, we can represent this root in the dictionary as [ড়]{} "read (icl>do)" (V, @present) <B,0,0>. Some transformations are based on persons and tenses.
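The selection of a head word as the LCLU can be approximated in code. The sketch below assumes that the LCLU is simply the longest common prefix, counted in Unicode code points, of the inflected forms; this is a simplification introduced here, not the procedure of [8]. The function name `longest_common_unit` and the sample forms of the verb "do" are illustrative.

```python
from typing import Iterable


def longest_common_unit(forms: Iterable[str]) -> str:
    """Return the longest common prefix of the given word forms."""
    forms = list(forms)
    if not forms:
        return ""
    prefix = forms[0]
    for form in forms[1:]:
        # Shorten the candidate until the current form starts with it.
        while not form.startswith(prefix):
            prefix = prefix[:-1]
    return prefix


# Four inflected forms of the verb "do" (illustrative input).
forms = ["করি", "করে", "করছি", "করব"]
print(longest_common_unit(forms))  # কর
```

Because the comparison works on code points, a common prefix could in principle end in the middle of a conjunct; a fuller implementation would compare whole grapheme clusters.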

IV. MORPHOLOGY OF BANGLA ROOT WORDS
The Bangla language contains a lot of verbs. The core part of a verb is called its root. In other words, if we split a verb we get two parts: a root and a suffix. If we remove the suffix from a verb, we get the root. For example, "করে" (do) is a verb. Its two parts are কর + এ; here "কর" is the root and "এ" is the suffix. Some other Bangla verb roots are চল্, পড়্, ধর্, গড়্, ঘষ্, নাচ্, কাঁদ্, etc.

A. Bangla Primary Suffixes (কৃৎ প্রত্যয়)
We know that the core of a verb is called its root, and that verbs are formed when suffixes are added to roots. When a sound or sounds [8] are added to roots to form nouns or adjectives, the roots are called root verbs and the added sounds are called primary suffixes. For example, চল্ (root verb) + অন্ (primary suffix) = চলন (noun) and চল্ (root verb) + অন্ত (primary suffix) = চলন্ত (adjective). Some other primary suffixes are অন, অনা, অনি, অক, আ, etc.
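The way a primary suffix joins a consonant-ended root verb can be sketched as a small string operation. The rule used here is an assumption made for illustration: the final hasanta of the root is dropped, and a suffix-initial independent vowel is replaced by its dependent vowel sign (or by nothing for the inherent vowel). It covers only the four vowels listed in the table and does not model sandhi or vowel changes in the root. The function name `attach_suffix` is hypothetical.

```python
HASANTA = "\u09cd"  # Bangla virama, marks a consonant without a vowel

# Independent vowel -> dependent vowel sign (partial table, assumption).
VOWEL_SIGNS = {
    "\u0985": "",        # a: inherent vowel, no sign is written
    "\u0986": "\u09be",  # aa
    "\u0987": "\u09bf",  # i
    "\u098f": "\u09c7",  # e
}


def attach_suffix(root: str, suffix: str) -> str:
    """Join a root verb and a suffix into one surface word."""
    if suffix.endswith(HASANTA):
        suffix = suffix[:-1]      # word-final hasanta is not kept
    if not root.endswith(HASANTA):
        return root + suffix      # vowel-ended root: plain concatenation
    stem = root[:-1]
    first, rest = suffix[:1], suffix[1:]
    if first in VOWEL_SIGNS:
        return stem + VOWEL_SIGNS[first] + rest
    return root + suffix          # other cases are left untouched


print(attach_suffix("চল্", "অন্"))   # চলন
print(attach_suffix("চল্", "অন্ত"))  # চলন্ত
```

The same function reproduces কর্ + আ = করা and কর্ + ই = করি, since in Unicode the sign for ই is stored after the consonant even though it is drawn before it.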

C. Morphological Analysis of Bangla verbs
Morphological analysis is applied to identify the actual meaning of a word by identifying its suffix or morpheme. Every word is derived from a root word. A root word may have different transformations. This happens because different morphemes are added to it as suffixes. So, the meaning of the word varies across its different transformations. For example, if we consider "কর্" (do) as a root word, then after adding "ই" we get the word "করি" [6,8], which means work done by someone (first person, present tense). Similarly, after adding "আ" we get the word "করা". Here, this word represents the noun of the root word "কর্". Therefore, by morphological analysis we get the grammatical attributes of the main word. Derivational morphology is simple, and a word rarely uses a derivational rule in more than two or three steps. The first step forms nouns or adjectives from verb roots. The next steps form new nouns and adjectives [5]. We have examined derivational morphology for the UNL Bangla dictionary too.
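The analysis described above can be illustrated with a toy suffix-stripping routine. The sketch assumes a small table that maps a written vowel sign to a suffix and a set of attributes, and a small set of known roots. The attribute names `1P` and `PRESENT` and the function name `analyze` are hypothetical and are not taken from the proposed attribute list.

```python
HASANTA = "\u09cd"

# written vowel sign -> (suffix, attributes); tiny illustrative table
SUFFIXES = {
    "\u09bf": ("\u0987", ["V", "1P", "PRESENT"]),  # sign i  <- suffix i
    "\u09be": ("\u0986", ["N"]),                   # sign aa <- suffix aa
}

# Roots are stored with a final hasanta, as in the paper's notation.
KNOWN_ROOTS = {"কর" + HASANTA, "চল" + HASANTA}


def analyze(word: str):
    """Return (root, suffix, attributes), or None if no rule applies."""
    for sign, (suffix, attrs) in SUFFIXES.items():
        if word.endswith(sign):
            root = word[:-len(sign)] + HASANTA
            if root in KNOWN_ROOTS:
                return root, suffix, attrs
    return None


print(analyze("করি"))
print(analyze("করা"))
```

Checking the candidate root against a known-root set keeps the routine from splitting words that merely happen to end in one of the vowel signs.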

V. METHODS OF FINDING GRAMMATICAL ATTRIBUTES
To represent Universal Words (UWs) for each Bangla head word, we need to develop grammatical attributes that describe how the words behave in a sentence. Grammatical Attributes (GA) have to be developed by the developers of the rules (Enconversion and Deconversion) and of the dictionary. They play a very important role in writing Enconversion and Deconversion rules, because a rule uses GAs in morphological and syntactic analysis to connect or analyze one morpheme with another to build a meaningful (complete) word, and to examine or define the position of a word in a sentence.
When we analyze the HWs in order to represent them in the word dictionary as UWs, we record all the possible specifications of the HWs as attributes, named grammatical attributes, so that they can be used in the dictionary for making rules (EnCo and DeCo). For example, if we consider "পাখি", meaning bird, as a head word, then we can use the attributes N (as it is a noun), ANI (as a bird is an animal), SG for singular number and CONCRETE (as it is a concrete thing that is touchable). So, this word can be represented in the dictionary as follows: [পাখি]{}"bird(icl>animal>animate thing)"(N,ANI,SG,CONCRETE)
As we are the initiators of developing rules and a word dictionary for Bangla, we propose some grammatical attributes and their descriptions in Table 1.
Rather, বদার in place of দু ল্ and বখার in place of খু র are combined with the Krit Prottoy "আ" to form meaningful words. For example, বদার + আ = বদারা and বখার + আ = বখারা. We make two entries in the Bangla word dictionary for URoots, for example দু ল্ and বদার, খু র and বখার. In Bangla there are some primary suffixes that are added to roots to form new, different words, for example ফচ + w³ = ভু w³ and ভু চ+ w³= ভু w³. These words are added to the word dictionary in a special category.
The dictionary entries would be:
[উw³]{} "speech (icl>do)" (ROOT, BANJANT, SP) <B,0,0>
[ভু w³] {} "free (icl>do)" (ROOT, BANJANT, SP) <B,0,0>
The suffixes আন্ত, তত্, আ, etc. will be in the dictionary only with grammatical attributes. They will be added to the roots to form verbs, nouns or adjectives using rules. In the above examples we have classified suffixes on the basis of whether they are added to SORANTO (vowel-ended) or BANJANTO (consonant-ended) roots, to give them proper attributes so that they can be used to write appropriate enconversion and deconversion rules.
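The vowel-ended versus consonant-ended distinction can be checked automatically when roots are written with an explicit hasanta, as in the root lists of this paper. The sketch below relies on that assumption: a root that ends in a hasanta is treated as consonant-ended, and every other root as vowel-ended. The function names, the fixed `<B,0,0>` part and the sample UW are illustrative only.

```python
HASANTA = "\u09cd"


def classify_root(root: str) -> str:
    """Label a root by its ending, using the written hasanta as the cue."""
    return "BANJANTO" if root.endswith(HASANTA) else "SORANTO"


def root_entry(root: str, uw: str) -> str:
    """Format a dictionary line for a root with its ending attribute."""
    attrs = ", ".join(["ROOT", classify_root(root)])
    return f'[{root}]{{}} "{uw}" ({attrs}) <B,0,0>'


print(classify_root("কর্"))  # BANJANTO
print(classify_root("খা"))   # SORANTO
print(root_entry("কর্", "do(icl>do)"))
```

A root written without a hasanta but ending in a bare consonant would be labelled vowel-ended by this check, so the input convention matters.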

VI. CONCLUSION AND FUTURE WORK
A system capable of understanding natural language sentences has potentially unlimited uses in the field of natural language processing. In this paper we have generated grammatical attributes of Bangla roots and Krit Prottoy for developing a Bangla Word Dictionary for Universal Networking Language (UNL).
We have presented a method for selecting grammatical attributes using morphological analysis of Bangla words. These attributes can be used to build a dictionary for converting Bangla sentences to UNL documents and vice versa. So far we have done only limited work on Bangla words.
Our future plan is to build a Bangla language server that will contain a complete Bangla Word Dictionary.

Table 1. Some proposed grammatical attributes

D. To solve this problem, we divide BANJANTA roots into two categories. One is General BANJANTA, which is given the attribute BANJNT, and the other is given the attribute URoots.