Translation of Pronominal Anaphora from English to Telugu Language

Discourses are linguistic structures above the sentence level; a discourse is a coherent sequence of sentences. Discourse analysis is concerned with the coherent processing of text segments larger than the sentence, which requires more than the interpretation of the individual sentences. Cohesion is one phenomenon that operates at the discourse level. A text is cohesive if its elements link together, and this linking can be either forward or backward. Pronominal referencing is one method for linking sentences. This paper presents the issues in translating pronominal references from English to Telugu. The work handles resolution and generation of personal pronouns whose antecedents appear before the anaphor. An algorithm is developed for the translation of pronominal references.

Keywords: GNP: gender, number, person; SL: source language (English); TL: target language (Telugu); S: singular; P: plural; M: masculine; F: feminine; N: neuter; VBD: past tense verb form; VBZ: 3rd person singular present verb form; VBP: non-3rd person singular present verb form; MD: modal


I. INTRODUCTION
Bloomfield and Chomsky (1957) defined the sentence as the largest grammatical unit for language analysis. Halliday and Hasan (1976) shed light on concepts such as coherence and cohesion. Discourse analysis is concerned with the coherent processing of text segments larger than the sentence, which requires more than the interpretation of the individual sentences. Machine translation refers to the task of translating text from one natural language to another with minimal human intervention.
The present machine translation system is rule based: a parallel grammar was developed for both the source and target languages, with the grammar rules written in a phrase structure grammar framework. The system is able to translate text above the sentence level, also called discourse text. Translation of discourse includes resolving the references used in the sentences. This paper presents the resolution and evaluation of these anaphora problems in translating from English to Telugu.

II. PRONOMINAL REFERENCE AND ANAPHORA
Anaphora is the grammatical term for a pronoun that refers back to another word or phrase. Halliday and Hasan defined anaphora as cohesion that points back to some previous item [1]. The item that refers is called the anaphor, and the item referred to is called the antecedent.
Ex: 1 Ram went to fruit market. He likes apples very much.
In the above example, 'He' is a pronoun that refers to 'Ram' in the previous sentence. Here 'He' is the anaphor and 'Ram' is the antecedent. This most common type of anaphora is called pronominal anaphora. Anaphora involves two processes, resolution and generation. 'Resolution' is the process of determining the antecedent of an anaphor; 'generation' is the process of creating references to a discourse entity. This work handles resolution and generation of personal pronouns whose antecedents appear before the anaphor. Cataphoric relations are not taken into account in this study. The translation of third person personal pronouns from English to Telugu has been evaluated on unrestricted corpora. The precision achieved in translating personal pronouns is above 75%. Personal pronouns can also be used as objects to refer to antecedents that are objects of the previous sentence.
1) Intra-sentential Anaphora: Intra-sentential anaphora has the two co-referring expressions in the same sentence [2]. The first expression in the co-reference is called the antecedent and the second the anaphor. Intra-sentential anaphora resolution relies on syntactic rather than discourse cues [3].
Ex: 2 a) When Jack arrived at the party, he was drunk.
b) When Jack arrived at the party, she was drunk.
In the above example, 2a) is ill formed because of an obvious constraint: for two noun phrases to co-refer, they must agree in gender, number and person. 2a) violates this constraint, as Jack is female here and the masculine pronoun 'he' is used to co-refer with Jack. The correct usage is the pronoun 'she', as in 2b).
2) Inter-sentential Anaphora: Co-reference can occur between two different sentences. If a pronoun is used to refer to a noun in the previous sentence, it is called an inter-sentential anaphoric reference [4]. Pronouns are used to replace nouns and carry the same features that a noun has, namely gender, number and person. A pronoun is chosen based on the GNP features of the noun it refers to. Telugu personal pronouns corresponding to the English ones [5] [9] are shown in Table 1.
www.ijacsa.thesai.org

[Table 1 column headings: Person | Singular | Plural]

Anaphora resolution is of crucial importance for translating anaphoric expressions correctly into the target language. Resolution refers to the process of identifying the antecedent of an anaphor [3]. If a sentence contains more than one noun to which an anaphor can refer, ambiguity arises in resolving the antecedent. Understanding such sentences and translating them correctly requires world knowledge and contextual understanding. Humans use refined and flexible inference and problem-solving capabilities to interpret these texts. If these processing capabilities could be given to a machine, the accuracy of the system would approach that of a human translator.
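The GNP agreement constraint and the ambiguity it can leave behind may be illustrated with a minimal Python sketch. This is not the system's implementation: the `Features` and `Noun` classes, the `PRONOUNS` table, and the function names are hypothetical, and the feature values assigned to each noun are assumed to come from a lexicon.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Features:
    gender: Optional[str]  # "M", "F", "N", or None when the pronoun leaves it open
    number: str            # "S" (singular) or "P" (plural)
    person: int            # 1, 2 or 3


@dataclass(frozen=True)
class Noun:
    text: str
    features: Features


# Toy GNP table for English third person pronouns (illustrative assumption).
PRONOUNS = {
    "he": Features("M", "S", 3),
    "she": Features("F", "S", 3),
    "it": Features("N", "S", 3),
    "they": Features(None, "P", 3),  # 'they' does not constrain gender
}


def agrees(anaphor: Features, noun: Features) -> bool:
    """True when the noun matches the anaphor in gender, number and person."""
    gender_ok = anaphor.gender is None or anaphor.gender == noun.gender
    return (gender_ok
            and anaphor.number == noun.number
            and anaphor.person == noun.person)


def candidate_antecedents(pronoun: str, previous_nouns: List[Noun]) -> List[Noun]:
    """Return every noun of the previous sentence that agrees with the pronoun."""
    wanted = PRONOUNS[pronoun.lower()]
    return [n for n in previous_nouns if agrees(wanted, n.features)]
```

A single returned candidate corresponds to an unambiguous antecedent; more than one candidate is the ambiguous case, which GNP features alone cannot settle.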
Ex: 3 SL: Radha and Ravi are good friends of Raju. She is a very naughty girl.
SL: Raju is playing guitar. It is an electronic device.
TL: Raju guitar vayinchutU unnadu. adi oka electronic parikaram
In the above two examples the antecedents of the anaphors are easily identified. In the first example 'she' refers to Radha; in the second, 'it' refers to the guitar. Translation is done without ambiguity because these anaphors have only one interpretation in the target language. In some cases, however, there is more than one antecedent to which an anaphor can refer.

Ex: 4 SL: Radha bought bananas and bangles. They were very sweet.
TL: rAdha araTi paLLu gAjulu techchinadi. avi chAla tiyyaga u.mdinavi
In the above example 'they' can refer to either bananas or bangles, since the GNP features of both nouns match those of 'they'. By applying world and contextual knowledge we can understand that 'they' refers to the bananas, as bangles have no taste. The translation does not incur ambiguity: both bangles and bananas have neuter gender, so whichever is the antecedent, the anaphor 'they' is translated as 'avi'.
Ex: 5 SL: Radha bought bananas and bangles. They were yellow in color.
TL: rAdha araTi paLLu gAjulu techchinadi. avi pachcha ra.mgu lO u.mdinavi
In the above example 'they' can again refer to either bananas or bangles, and the GNP features of both nouns match those of 'they'. Here world and contextual knowledge cannot tell us whether 'they' refers to bananas or bangles, as both can be yellow in color.
Either the author intends that both the bananas and the bangles are yellow, or the author should use the noun explicitly instead of the anaphor to avoid the ambiguity.
In the above example 'they' can refer to either friends or fruits. The number and person features of friends and fruits are the same, but the gender feature differs: the gender of friends can be masculine or feminine, whereas fruits is neuter. If 'they' refers to a neuter noun, it is translated as 'avi'; otherwise it is translated as 'vAru/vALLu', and the verb suffix changes accordingly. By applying world and contextual knowledge it is understood that human beings tire but fruits do not. Accordingly 'they' is translated as 'vAru' to refer to friends.
Ex: 7 SL: Radha came home with her friends and some fruits to eat. They were very fresh.
TL: rAdha tana snehitulu mariyu tinuTaku konni paLLu thO vaccinadhi. vAru chala tajaga vu.mdinaru
TL: rAdha tana snehitulu mariyu tinuTaku konni paLLu thO vaccinadhi. avi chala tajaga vu.mdinavi
Taking the same example with a slight modification introduces ambiguity. In this example 'they' can refer to either friends or fruits, as the adjective 'fresh' can apply to either. Even with world and contextual knowledge it is difficult to tell whether 'they' refers to the fruits or the friends.
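The examples in this section suggest that an unresolved antecedent only harms the translation when the competing candidates require different Telugu forms. A minimal Python sketch of that observation follows; the function names are hypothetical, and 'vAru' is used here as the single stand-in for the 'vAru/vALLu' pair.

```python
from typing import Iterable, Set


def telugu_they(antecedent_gender: str) -> str:
    """Telugu form of 'they': 'avi' for a neuter antecedent, 'vAru' otherwise."""
    return "avi" if antecedent_gender == "N" else "vAru"


def target_forms(candidate_genders: Iterable[str]) -> Set[str]:
    """Collect the distinct Telugu forms required by the candidate antecedents."""
    return {telugu_they(g) for g in candidate_genders}


def translation_is_ambiguous(candidate_genders: Iterable[str]) -> bool:
    """The translation is ambiguous only if the candidates need different forms."""
    return len(target_forms(candidate_genders)) > 1
```

With bananas and bangles as candidates (both neuter) only 'avi' is possible, so the translation is safe even though the antecedent is not resolved. With friends and fruits the candidate forms differ, so the antecedent must be resolved before translating.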

IV. TRANSLATION OF ANAPHORS
Translation of anaphors involves three major steps. The first step is identification of the antecedent of the anaphor, by matching the features of the anaphor with the nouns of the previous sentence in the nominative form. The second step is identification of the anaphor in the target language: while translating an anaphor from SL to TL, the features of the SL anaphor are mapped to the anaphors of the target language, and if more than one entry is available for the anaphor in the bilingual lexicon, the GNP features of the antecedent are matched against those of the TL anaphor. The third step is changing the verb suffix according to the subject-verb agreement rules of the target language.
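The three steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the algorithm developed in this work: the `Noun` and `LexEntry` types, the `SL_PRONOUNS` and `LEXICON` tables, and both function names are hypothetical. The lexicon fragment holds only the Telugu forms that appear in the examples of this paper, case (nominative) is not modeled, and ambiguous antecedents are simply left unresolved.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Tuple


class Noun(NamedTuple):
    text: str
    gender: str  # "M", "F" or "N"
    number: str  # "S" or "P"


class LexEntry(NamedTuple):
    tl_form: str
    genders: FrozenSet[str]  # antecedent genders this TL form can refer to


# SL pronoun -> (genders it can refer to, number); toy table, assumption.
SL_PRONOUNS = {
    "it": (frozenset({"N"}), "S"),
    "they": (frozenset({"M", "F", "N"}), "P"),
}

# Hypothetical bilingual lexicon fragment; 'they' has two TL entries.
LEXICON = {
    "it": [LexEntry("adi", frozenset({"N"}))],
    "they": [LexEntry("avi", frozenset({"N"})),
             LexEntry("vAru", frozenset({"M", "F"}))],
}


def resolve(pronoun: str, previous_nouns: List[Noun]) -> Optional[Noun]:
    """Step 1: match the anaphor's features against the previous sentence's nouns."""
    genders, number = SL_PRONOUNS[pronoun]
    matches = [n for n in previous_nouns
               if n.gender in genders and n.number == number]
    # No match or several matches: leave the anaphor unresolved.
    return matches[0] if len(matches) == 1 else None


def translate_anaphor(pronoun: str,
                      previous_nouns: List[Noun]) -> Optional[Tuple[str, Tuple[str, str]]]:
    key = pronoun.lower()
    antecedent = resolve(key, previous_nouns)
    if antecedent is None:
        return None
    # Step 2: pick the lexicon entry whose gender matches the antecedent.
    tl_form = next(e.tl_form for e in LEXICON[key]
                   if antecedent.gender in e.genders)
    # Step 3 input: the gender and number the TL verb suffix must agree with.
    return tl_form, (antecedent.gender, antecedent.number)
```

For 'Raju is playing guitar. It is an electronic device.', calling `translate_anaphor("It", [Noun("Raju", "M", "S"), Noun("guitar", "N", "S")])` resolves 'it' to the guitar and selects 'adi', matching the TL sentence given in Ex: 3.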
• Verb dependency on Anaphors
English verbs are not strongly inflected. The only inflected forms are the third person singular simple present in -s, a simple past form, a past participle form, and a present participle and gerund form in -ing. Most verbs inflect in a simple regular fashion, although some irregular verbs have irregular past and past participle forms [1]. If a pronoun is the subject, the auxiliary verb should agree with the number and person features of the subject.
Telugu verbs are formed by combining roots with other grammatical information. Simple verbs in their finite forms are inflected for tense, followed by GNP endings or states. Various auxiliaries are employed to indicate the aspect and modality of verbs. The structure of the verb is Verb stem + Tense suffix + GNP suffix. When a pronoun is the subject of a sentence, the verb agrees with it in person and number and, in the third person, in gender as well [7] [8].
The verb inflections should agree with the gender and number features of the subject noun. Although Telugu nouns have three genders and two numbers, the verb suffixes do not distinguish all six combinations.
In the singular, feminine and neuter nouns take the same verb suffixes, while masculine nouns take different ones. In the plural, masculine and feminine nouns take the same GNP endings, while neuter nouns differ. The suffixes for the verb 'go' are shown in Table 2.
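The grouping of third person agreement suffixes described above, together with the Verb stem + Tense suffix + GNP suffix structure, can be sketched in Python. The class labels (`SG_M`, `SG_FN`, `PL_MF`, `PL_N`) and function names are hypothetical, and the actual suffix strings are deliberately left to a caller-supplied table rather than assumed here.

```python
from typing import Dict


def agreement_class(gender: str, number: str) -> str:
    """Map third person gender and number onto the four suffix classes."""
    if gender not in ("M", "F", "N"):
        raise ValueError(f"unknown gender: {gender!r}")
    if number == "S":
        # Singular: feminine and neuter share suffixes; masculine differs.
        return "SG_M" if gender == "M" else "SG_FN"
    if number == "P":
        # Plural: masculine and feminine share suffixes; neuter differs.
        return "PL_N" if gender == "N" else "PL_MF"
    raise ValueError(f"unknown number: {number!r}")


def build_verb(stem: str, tense_suffix: str, gnp_suffixes: Dict[str, str],
               gender: str, number: str) -> str:
    """Compose stem + tense suffix + GNP suffix for a third person subject."""
    return stem + tense_suffix + gnp_suffixes[agreement_class(gender, number)]
```

Collapsing the six gender and number combinations into four classes means a suffix table needs only four third person entries per tense, and the class can be computed directly from the features of the resolved antecedent.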

TABLE I. PERSONAL PRONOUNS OF TELUGU AND ENGLISH

TABLE II. SUFFIXES OF VERB 'GO' FOR DIFFERENT GNP FEATURES