An Evaluation of the Accuracy of the Machine Translation Systems of Social Media Language: Google's Arabic-English Translation as an Example

In this age of information technology, people all over the world can communicate in different languages through social media platforms with the help of machine translation (MT) systems. For the Arabic-English language pair, most studies have evaluated MT output for the standard varieties of Arabic, and fewer have focused on the vernacular or colloquial varieties. This study addresses this gap by evaluating MT output for colloquial Arabic in the social media domain. Google Translate (GT), currently the most widely used MT system, was chosen to assess the reliability of its output when translating the colloquial Arabic used on social media (specifically, the Egyptian/Cairene variety) into English. To this end, a corpus of Egyptian dialectal Arabic sentences was collected from social media networks (Facebook and Twitter) and fed into GT. Three human translators then evaluated the accuracy of the GT output in terms of adequacy and fluency. The results show several recurrent translation problems in the GT output. These are mainly wrong equivalents, inappropriate additions and deletions, and transliteration of out-of-vocabulary (OOV) words, and they mostly stem from literal translation of the vernacular Arabic sentences into English. A likely reason is that the Arabic vernaculars differ from the standard language for which MT systems were primarily developed. Such MT systems therefore need to be upgraded to handle the vernacular varieties.


I. INTRODUCTION
Arabic is a diglossic language. For many years, writing was confined to Modern Standard Arabic (MSA), which was, and by some still is, considered more prestigious [1]. With the development of social media networks and platforms, however, millions of users in the Arab world have been using their vernacular and colloquial dialects to express themselves. In other words, social media networks have given legitimacy to the colloquial forms of Arabic [2]. Millions of Arab users use these forms to comment on different issues, and institutions and individuals increasingly demand translation of much of this content. Political agencies, manufacturers, brand owners, and researchers are often concerned with understanding what people say and think. Human translators alone cannot meet these translation needs.
In response to these needs, various machine translation (MT) systems have been developed, including Google Translate, Microsoft Translator, Amazon Translate, and Bing Translator. These systems are effective in providing translation services. Nevertheless, many questions are still raised about the accuracy, quality, and thus reliability of MT systems in translating Arabic social media content [3]. This can be attributed in part to the lack of studies evaluating the performance of MT systems on Arabic social media [4]. This study therefore seeks to address this gap in the literature by evaluating Google Translate's Arabic-to-English translation of Arabic social media language. Google Translate was chosen because it is the most widely used MT system for Arabic-English translation.
Data were collected from Facebook and Twitter for the purposes of the study, with attention to length and representativeness. Manual evaluation methods were used to assess the performance of Google Translate. The rest of the paper is organized as follows. Section II briefly surveys evaluation studies of Google Translate. Section III describes the methods and procedures of the study. Section IV reports the results of Google Translate's translation of the selected data. Section V discusses and interprets the results. Section VI concludes the paper with suggestions for further research.

II. LITERATURE REVIEW
Omar, et al. [5] argue that social media platforms accommodate a vast variety of dialectal Arabic communication. Such variety has led to wide use of automatic translation on social media networks. The motivation for this type of translation may be economic, political, or otherwise. Whatever it is, this study is concerned with the fact that social media users rely heavily on machine translation, including Google Translate. Zantout and Guessoum [6] show that Arabs who cannot use English have little chance of becoming familiar with about 50% of the content of the Web. This explains why people in the Arab world resort to machine translation. The same authors further stress that machine translation is important because it gives access to the technological advances taking place in the world.
Zughoul and Abu-Alshaar [7] state that machine translation, also known as automatic translation, is a process that involves statistical approaches of using "rules and assumptions to transfer the grammatical structure" from a source language into a target language. Machine translation can be defined as "the application of computers to the task of translating texts from one natural language to another" [8]. Machine translation was developed in the field of computer science and has been deemed of great value to a number of technology-driven fields [9]. The future of machine translation looks promising, especially in light of the unprecedented development of social media tools. Technology Review, cited in [7], stressed this point: "Universal translation is one of 10 emerging technologies that will affect our lives and work in revolutionary ways within a decade". Indeed, the Arab world is now open to all social media tools, and Arabic is one of the languages available on Google Translate and other machine translation systems. Zantout and Guessoum [6] state that the Arab world lacks enough human translators to keep pace with the technological advances the world is witnessing. This situation has increased reliance on machine translation as a way for Arab people to keep up to date with new knowledge in all disciplines.
Translating knowledge or information is challenging and tedious, and it can be time-consuming when done by human translators. Penrose [10], cited in [11], explains that the idea behind machine translation is to imitate the human mind. Since the human mind can perform complicated and sometimes enigmatic tasks, it should be possible to construct a machine that performs the same task more quickly. This requires uploading all relevant linguistic knowledge into software and feeding it a constant input of information. Indeed, technology has made it possible for people to communicate in different languages through social media platforms with the help of machine translation systems. The need for real-time human translators in immediate business or socio-political exchanges has lessened, even though the machine's faults may be insurmountable. This study deals with that issue by evaluating machine translation output for Arabic social media texts, with special focus on Google Translate. The study assumes that Arab users can interact and exchange information across cultures and around the globe without knowing the target language, and that they nonetheless manage to get their messages across. In this respect, and of relevance to our argument, Hadla, et al. [12] argue that "most of the people with Arabic as their mother tongue use dialects in their communications at home". We argue that such use is even more widespread on social media. Many studies have evaluated the effectiveness, accuracy, and reliability of machine translation. Before reviewing them, it is useful to describe the Google Translate system, since it is the one most used on such platforms.
Google Translate was introduced in 2006 and is one of the most popular machine translation systems, widely used all over the world [13]. Sherman [14] states that Google Translate was a statistical, phrase-based machine translation (PBMT) model. In 2016 it was updated to a Neural Machine Translation (NMT) model. NMT is a more sophisticated method that outperforms statistical methods [15]. Cheng [16] explains that NMT employs a neural network that processes the input through several layers before producing the output. It uses deep learning techniques that result in faster translation [17]. This enhancement improved both the quality and the speed of Google Translate's output. NMT is said to use algorithms that capture linguistic rules better than the older statistical approach [16]. Mahmood and Al-Bagoa [13] report that Google Translate translates more than 100 billion words per day, supports 107 languages, and is used by more than 500 million people. They further explain that Google Translate can translate full web pages, speech, and text in images. It can also recognize handwriting, provide pronunciations, and read translations aloud.
Despite these technical advances, Google Translate has been under constant evaluation by translation studies scholars. In this research, we try to describe the reliability of the system using social media texts. A number of studies, including some on Arabic, have evaluated the quality of Google Translate translations. Hadla, et al. [12] show that there are three main approaches to evaluating machine translation systems: human evaluation, automatic evaluation, and embedded application evaluation. In this section, we explore a number of studies related to human and automatic evaluation. Alkhawaja, et al. [17] note that the output of machine translation can be evaluated by human translators who know both the source and target languages. Such evaluators judge a text in terms of adequacy, fluency, accuracy, and cognition. They can also compare two machine translation outputs to scrutinize the differences and similarities between them. The literature shows that quite a number of evaluation studies have been carried out on Google Translate; however, research on Arabic in this area remains limited.
www.ijacsa.thesai.org
Al-khresheh and Almaaytah [18] evaluated Google Translate's translation of English proverbs into Arabic in a small-scale study. They compared the output of the machine with translations done by students. The study concluded that Google Translate's output was inaccurate in rendering syntactic and lexical patterns, which suggests that the structure of proverbs is still a challenge for the machine. In a similar vein, Hadla, et al. [12] used non-proverbial Arabic sentences to evaluate and compare the outputs of Google Translate and Babylon. They ran a corpus of 1033 sentences through the two systems and compared the outputs with model translations.
The results indicated that Google Translate outperformed Babylon in precision and accuracy. Their findings also concur with those of Al-khresheh and Almaaytah [18] in that the system was incapable of handling Arabic sayings and proverbs. Google Translate produces ungrammatical and often unintelligible structures. These usually stem from wrong word order, misplaced verbs, lexicosemantic slips, and incorrect tense concordance [17]. The inaccuracy of Google Translate output was also evident in a study by Hijazi [19], who evaluated the translation of English legal texts into Arabic. It could be argued that the NMT update would produce more accurate results for legal texts, since Hijazi's research was conducted in 2013, three years before the 2016 update. A new study comparing the statistical and neural versions of the system on legal texts is therefore recommended to shed more light on the accuracy and adequacy of meaning.
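Automatic evaluation, the second category above, scores system output against model (reference) translations. The exact metric behind the precision results of Hadla, et al. [12] is not described here, so the following Python sketch is only a generic illustration. It computes clipped n-gram precision, the building block of BLEU, against a hypothetical reference. The sample sentences reuse the GT output "fear for your blood" and the functional equivalent "shame on you" discussed in Section IV.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hypothesis: str, reference: str, n: int = 1) -> float:
    """BLEU-style modified precision: each hypothesis n-gram is credited at most
    as many times as it occurs in the reference."""
    hyp_counts = Counter(ngrams(hypothesis.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Counter returns 0 for n-grams absent from the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


reference = "shame on you"  # hypothetical reference (the equivalent proposed in Section IV)
print(clipped_precision("fear for your blood", reference))  # 0.0
print(clipped_precision("shame on you", reference))         # 1.0
```

The clipping step stops a system from gaining credit by repeating a correct word. A literal rendering that shares no words with the idiomatic reference therefore scores zero, which is consistent with the literal-translation problems described in this paper.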
Al-Dabbagh [20] evaluated the quality of Google Translate for Arabic texts of several types: journalistic, economic, and technical. The study showed that Google Translate's output was marked by grammatical and textual blunders and sometimes lexical inaccuracies. As Al-Dabbagh [20] points out, Google Translate's output sometimes appears incomprehensible to readers. Moreover, measurements of Google Translate's Arabic-English output in both directions indicate that the frequency of flaws is not consistent [12]. Discrepancies also appear in Google Translate's handling of Arabic verb constructions. In a detailed study of the translation of Arabic verbs by Google Translate, Carpuat, et al. [21] found that the system alters the position and order of the Arabic verb. This is in line with Alqudsi, et al. [22], who found Google Translate's output to be literal and erroneous. Further, Jabak [23] concludes that in Arabic/English automatic translation, the output of Google Translate is marked by "inadequacy, ineffectiveness and defectiveness".
Evaluation studies of Google Translate in other languages are abundant; nonetheless, the system is relatively new and under constant development and improvement. Aiken [24] indicates that recent assessment results for Google Translate are highly encouraging. The quality of the system improved from a score of "3.694 (out of 6) to 4.263, nearing human-level quality at 4.636" (p. 253). Aiken [24] offers further improvement statistics for these language combinations of Google Translate: "English to Spanish (87%), English to French (64%), English to Chinese (58%), Spanish to English (63%), French to English (83%), and Chinese to English (60%), for an average improvement of 69% for all pairs". The figures in Aiken's study are based on a study by Wu, et al. [25] titled "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation", which investigates Google's neural machine translation system.
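Aiken's percentages can be read as the share of the gap between the old phrase-based system and human translation that the neural system closed, that is, (NMT - PBMT) / (Human - PBMT). This reading is our assumption, but it reproduces the quoted figures. Applied to the scores 3.694, 4.263, and 4.636, it gives 60%, the value listed for Chinese to English. The mean of the six pair-wise values is 69%. A minimal Python check:

```python
def gap_closed(old: float, new: float, human: float) -> float:
    """Fraction of the old-system-to-human gap closed by the new system."""
    return (new - old) / (human - old)


# Scores quoted from Aiken [24] on a 0-6 rating scale.
pbmt, nmt, human = 3.694, 4.263, 4.636
print(round(100 * gap_closed(pbmt, nmt, human)))  # 60

# Average of the six pair-wise improvements quoted by Aiken [24].
pairs = [87, 64, 58, 63, 83, 60]
print(round(sum(pairs) / len(pairs)))  # 69
```

Expressing progress relative to human quality, rather than as a raw score difference, makes gains comparable across language pairs whose baseline scores differ.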
As indicated, the literature reveals that the enhancement of Google Translate has yielded more accurate results. This study therefore examines Arabic social media texts to evaluate translation adequacy and fluency and to reach a verdict on the system's reliability. Admittedly, a text can be translated well in more than one stylistic form. What we evaluate is how close Google Translate's output comes to a reasonable human translation that reads fluently and conveys adequate and accurate meaning. Social media language is marked by the heavy use of dialects for which Arabic is known. It will therefore be interesting to see how flexibly the system handles dialects for which it has fewer stored counterpart phrases. The study explores whether Google Translate's algorithms can construct a meaningful written utterance with probably less data to draw on. Puchała-Ladzińska [11] reports that the accuracy of Google Translate can differ among languages, noting that "English and Spanish works very well, but translating between English and Japanese is not nearly as effective". This study examines whether the dialectal language of social media pairs as well with English on Google Translate.

III. METHODS AND PROCEDURES
Thirty passages of varying length were randomly selected: short, middle, and long. Short passages (1-10) were about 1000-1200 characters, middle passages (11-20) about 2000-2400 characters, and long passages (21-30) about 3600-3900 characters. This corpus-based study aims to evaluate the dialectal Arabic-English translation of social media content. Because dialectal Arabic is primarily spoken, written material in the dialect is hard to find outside social media platforms. Hence, the corpus consisted of dialectal Arabic sentences, mainly Egyptian (Cairene) Arabic. These were collected from monolingual Arabic Facebook and Twitter comments on posts covering a wide spectrum of genres, including sports, religion, and politics. The data were filtered: comments made up completely or mostly of MSA words were eliminated, and only comments composed mostly of dialectal words were retained. The sentences were fed into Google Translate, and three evaluators specialized in translation examined the outputs for potential translation errors. The errors were then analyzed and classified by type. A graded rubric designed by the researchers guided the work of the three annotators, as seen in Table I. Evaluators were asked to highlight the errors they came across during their evaluation.

Table II

A summary of the results can be seen in Table III.

The evaluation of MT quality draws on more general criteria. Adequacy and fluency are the general parameters against which machine translation is assessed. Adequacy concerns how well the ideas in the source text are conveyed, that is, how accurate the translated text is compared with the source text. Fluency concerns the grammaticality of the target text [26,27].
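The selection and filtering steps above can be sketched in Python as follows. The character bands are the ones reported for the three passage groups. The marker word lists and the rule "keep a comment when dialect markers outnumber MSA markers" are hypothetical stand-ins for the researchers' manual judgment, not the procedure actually used.

```python
from typing import Optional

# Character-length bands reported for the short, middle and long passages.
LENGTH_BANDS = {
    "short": (1000, 1200),
    "middle": (2000, 2400),
    "long": (3600, 3900),
}

# Hypothetical marker words; a real filter would need a full lexicon or classifier.
DIALECT_MARKERS = {"مش", "ليه", "فين", "منين", "عم", "دي", "ده", "حاجة"}
MSA_MARKERS = {"لماذا", "أين", "ليس", "هذه", "هذا", "شيء"}


def length_band(passage: str) -> Optional[str]:
    """Return the band whose character range contains the passage, or None."""
    n = len(passage)
    for name, (low, high) in LENGTH_BANDS.items():
        if low <= n <= high:
            return name
    return None


def keep_comment(comment: str) -> bool:
    """Keep a comment only if its dialect markers outnumber its MSA markers."""
    tokens = comment.split()
    dialect = sum(token in DIALECT_MARKERS for token in tokens)
    msa = sum(token in MSA_MARKERS for token in tokens)
    return dialect > msa


comments = [
    "يا عم بتتعبو نفسكم ليه مش بتقولو",  # dialectal: kept
    "لماذا لا تقولون إن الدوري أفضل",     # MSA: dropped
]
print([keep_comment(c) for c in comments])  # [True, False]
```

Counting marker words keeps the rule transparent. Most words, such as الدوري, are shared by MSA and the dialect, so only the distinctive markers decide.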
Likewise, Popović [28] argues that the quality of MT output is always assessed against three quality norms: adequacy, comprehensibility, and fluency. The bilingual criterion of adequacy concerns how accurately the meaning of the source text is conveyed in the target text. The monolingual criterion of comprehensibility concerns how far a reader can understand the translation without going back to the source text. Fluency reflects how far the translated text adheres to the structural system of the target language. Koehn and Monz [29] describe fluency and adequacy as the most widely employed manual evaluation metrics. Evaluating MT output in terms of semantic, syntactic, and pragmatic features thus amounts to breaking down the more holistic notions of adequacy, comprehensibility, and fluency. Accordingly, evaluators were asked to assess each sentence in two phases. The first phase was monolingual: the target text was assessed for comprehensibility. The second phase compared the source and target texts to examine how accurately the source text was rendered.
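Once the three annotators have scored a sentence in both phases, their ratings have to be combined. The exact rubric in Table I is not reproduced here, so the Python sketch below simply assumes integer scores from 1 (worst) to 5 (best). It averages them per criterion and reports the spread as a rough sign of disagreement. The ratings shown are invented for illustration.

```python
from statistics import mean

# Invented 1-5 ratings from three annotators for one GT output sentence.
ratings = {
    "comprehensibility": [2, 1, 2],  # phase 1: target text read on its own
    "adequacy": [1, 1, 3],           # phase 2: target compared with source
}


def summarize(ratings: dict[str, list[int]]) -> dict[str, dict[str, float]]:
    """Mean score and max-min spread per criterion."""
    summary = {}
    for criterion, scores in ratings.items():
        if any(not 1 <= s <= 5 for s in scores):
            raise ValueError(f"{criterion}: scores must be between 1 and 5")
        summary[criterion] = {
            "mean": round(mean(scores), 2),
            "spread": max(scores) - min(scores),
        }
    return summary


print(summarize(ratings))
```

Keeping the spread next to the mean makes it easy to spot sentences where annotators disagreed strongly and that may deserve discussion before the errors are classified.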

IV. ANALYSIS
Three annotators were given the selected passages to judge for adequacy and fluency, guided by a five-point rubric prepared by the researchers. The annotators were asked to comment on the salient problematic elements they came across in the outputs that made the translation defective. They pinpointed different types of errors affecting the quality of Google Translate's performance, covering the lexicosemantic, syntactic, and pragmatic levels. Consider the following output:

GT Output
God willing, the response will be on the day of the endowment with the end of the Ramadan session

This output translates "الوقفة" alwaqfah as "endowment", which denotes "to furnish with an income". This is a glaring example of a semantic error, as the proper translation that conveys the propositional content of the phrase is "the eve of the feast". Similarly, the choice of "session" to translate "الدورة" aldawrah is a lexical and semantic error. The word used in the output means "a meeting", whereas the intended meaning is "tournament". Annotators highlighted one more instance of semantic errors in the following output:
In this example, Google Translate rendered the formulaic expression "انت كلام على الفاضي" 'ant kalam ealaa alfadi as "talking about the empty space". This is a major semantic problem that severely affects the meaning and distorts the connotations of the source text. The literal translation of "بتتعبو نفسكم" bitataeibu nafsikum as "how tired you are" is another example of the lexico-semantic problems in Google Translate's output.
Syntactic problems, that is, errors related to sentence structure and word order, appeared from time to time in the machine translation outputs. Sometimes they follow from semantic errors; however, they can also occur independently of them. In the following output, syntactic errors appear together with a semantic problem.

GT Output
No one, Modena, Dahia, other than you, you are watercressers

In this example, the system failed to recognize the meaning of "مودينا ف داهية" Mwdina f Dahyh. It mistakenly treated these words as named entities to be written with capital letters, which creates both semantic and syntactic problems. The semantic content of the source utterance is not conveyed, and the result is a meaningless sentence. This semantic error is intertwined with a syntactic one, as the sentence lacks a necessary verb and predicate. The second half of the sentence contains another simultaneous semantic-syntactic error: the system uses the plural form "watercressers" to refer to the second person singular pronoun "you". Moreover, using "watercressers" as the English equivalent of the Arabic word "جرجيري" jrjyri is semantically wrong. Here the word refers to the name of the post's author (with some phonological modification) in order to mock him. Another example of semantic errors combined with syntactic ones is the following output:

ST
والله الواحد م غيرر الزمالك مش عارف كان هيلاقي الضحك فين
wallah alwahid m ghyrr alzamalik msha earif kan hilaqi aldahk fyn

GT Output
By God, the One M changed Zamalek, I don't know where I would have laughter

The output uses the indefinite pronoun "the one" as the subject of the sentence and, at the same time, the first person singular pronoun "I" to refer to it. This is a syntactic malfunction, not to mention the structurally incorrect use of the definite article "the" before "one". The subject "الواحد" alwahid, meaning "one", refers to the author of the comment, who is the speaker, and should therefore be rendered with the first person pronoun "I". The sentence should be restructured as "By God, I don't know without Zamalek,…"
This syntactic issue seems to stem from a semantic problem. The system failed to convey the propositional content of "م غيرر الزمالك" m ghyrr alzamalik, a prepositional phrase meaning "without Zamalek".
Pragmatic errors, which reflect insensitivity to the contextual and cultural aspects of the text, appeared persistently in the translation outputs. Take, for example, the following output:

ST
يا اخي اختشي على دمك اذا بليتم فاستتروا
ya 'akhi akhtashi ealiun damak 'iidha balaytum fastatiruu

GT Output
My brother, fear for your blood If you bleach, then take cover.

In this output, the system rendered the Arabic formulaic expression "اختشي على دمك" akhtashi ala damak as "fear for your blood". This word-for-word translation misses important cultural dimensions. The expression is an idiom that carries the illocutionary force of blame or rebuke, and it is better rendered by the functional equivalent "shame on you" or "you should be ashamed of yourself". Another poor word choice results from the lack of diacritics. The system identifies the word "بليتم" as meaning "wear away". The intended meaning, however, derives from the Arabic verb "بُلِي" bulia, "to be afflicted": if you are afflicted by, or find it indispensable to commit, shameful acts, you should do so privately, not in public. The best equivalent is an English proverb: "Don't wash your dirty linen in public". Here is another instance of pragmatic ambiguity in GT's output:

ST
بتجيبو مراره منين تتفرجو ع الحجات دي
bitajibu mararih mnyn tatafaraju e alhajat di

GT Output
Btjebo bitterness from where you look at these pilgrimages

In this example, the system could not translate "بتجيبو" bitajibu and identified it as a named entity, causing a syntactic error that distorted the meaning. The literal translation of "بتجيبو مراره منين" bitajibu mararih mnyn signals a pragmatic issue. The phrase should be rendered so that it reflects the illocution of surprise or disbelief. As the expression is an Arabic idiom, it is better translated by an equivalent English idiom, where possible, to give a similar connotation and convey the intended speech act. The best functional equivalent in English is "how on earth do you have the patience to". The last part of the ST, "الحجات دي", is translated by GT as "these pilgrimages" rather than "these things". In colloquial Arabic, the spelling "الحجات" is used to mean "things", whereas in Modern Standard Arabic it is the plural of the word "pilgrimage". The standard Arabic form for "things" is "الحاجات" alhaajat.
The addition or deletion of words in the translation output can also affect translation quality, especially deletion, which may jeopardize the transfer of the full connotation of the source utterances. Izwaini [30] came across similar cases of word deletion and suggested that such deletion can result in awkward translation. Numerous cases of addition and deletion occurred in the translation of the comments, with a negative influence on meaning. Consider first this example of addition:

ST
والله العظيم انت كلام على الفاضي ولا هتعملوا حاجة بتضحكو على جمهوركم بأي كلام
wallah aleazim 'ant kalam ealaa alfadi wala taemaluu hajah bitadhuk ealaa jumhurikum ba'aa kalam

GT Output
By God Almighty, you are talking about the empty space, and you will not do anything by laughing at your audience with any words.

This example contains two additions that have no equivalent in the source text. The first is the word "space", which is semantically wrong: it conveys no propositional content and causes severe semantic ambiguity. The second is the preposition "by", which denotes the means by which something is done. That meaning is neither intended in this context nor present in the source text as a lexeme. For deletion, consider this example:

ST
يا عم بتتعبو نفسكم ليه مش بتقولو أن الدوري أفضل من افريقيا
ya em bttebw nafsikum lih msh btqwlw 'ana aldawria 'afdal min 'afriqia

GT Output
Oh, how tired you are, why don't you say that the league is better than Africa?

In this example, the word "عم" em, meaning "buddy", is deleted. The translation therefore loses an important pragmatic connotation: the word shows that the communication is informal and that the addresser and the addressee are most likely strangers. In addition, the source text does not support the question word "how". It introduces semantic ambiguity and conveys a meaning the writer neither said nor meant.
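When a reference translation is available, candidate additions and deletions can be flagged automatically before a human confirms them. The Python sketch below applies the standard library's difflib.SequenceMatcher to word tokens. The reference "you are talking nonsense" is our own hypothetical rendering of the idiom discussed above, not one taken from the study. A substituted word shows up in both lists, so the annotator still has to separate true additions and deletions from wrong equivalents.

```python
import difflib


def additions_and_deletions(reference: str, output: str) -> tuple[list[str], list[str]]:
    """Word-level diff: words only in the output (added), only in the reference (deleted)."""
    ref, out = reference.lower().split(), output.lower().split()
    added, deleted = [], []
    matcher = difflib.SequenceMatcher(a=ref, b=out, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A 'replace' block counts as both a deletion and an addition.
        if tag in ("delete", "replace"):
            deleted.extend(ref[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(out[j1:j2])
    return added, deleted


# Hypothetical reference translation vs. the GT output fragment discussed above.
print(additions_and_deletions("you are talking nonsense",
                              "you are talking about the empty space"))
# (['about', 'the', 'empty', 'space'], ['nonsense'])
```

Punctuation is not stripped here, so in practice the tokens would first be normalized in the same way on both sides.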

V. DISCUSSION
This small-scale exploration of the MT system's translation quality for social media content revealed a wide range of errors at the lexical, syntactic, and pragmatic levels. These errors affected overall translation performance and produced unintelligible output in most cases. This is in line with Jabak [23], who, although focusing on MSA, challenged the validity of the GT system and advocated human intervention. Al-khresheh and Almaaytah [18] noted that polysemous vocabulary and grammatical differences confronted GT with a multitude of difficulties that affected translation quality. Likewise, Al-Dabbagh [31] showed that English-Arabic translation of varied text types via Google Translate produced the full range of errors, from the lexical level up to the contextual level. Omar and Gomaa [32] questioned the reliability of GT because of the many lexical, structural, and pragmatic errors they surveyed. Hadla, et al. [12] highlighted the tendency of MT systems, including Google Translate, to translate literally and overlook pragmatic aspects. Hijazi [19] concluded that GT was unable to produce accurate translations of legal texts, a genre that presented high lexical and syntactic difficulty. The system could not give readers a general idea of the translated texts.
In the same vein, Jabak [23] showed that GT is not a valid tool for translating from Arabic into English. Its outputs lack accuracy because of the many semantic and syntactic errors committed, which necessitates human intervention. Abdelaal and Alazzawie [33] used Google Translate to render informative news texts from Arabic into English in order to pinpoint the most common errors committed by the system. The results pointed to two types: omission and bad word choice. Ali [34] showed that three MT systems, namely Google Translate, Microsoft Bing, and Ginger, performed insufficiently, with Google Translate coming last in terms of accuracy. That study again highlighted the need for human post-editing. Given this emphasis on the necessity of a human role in translation, however, the question arises whether human intervention is feasible in near-real-time communication. In such settings, the luxury of post-editing and human interference is not available.
This poor quality of Google Translate output is mostly due to the distance between Arabic and English in their linguistic systems and underlying cultures, which makes MT highly challenging for GT [22,23]. The challenges posed by this linguistic distance span a wide range of linguistic levels, including the orthographic, semantic, syntactic, and pragmatic levels [35][36][37][38]. Differences in the lexical systems of the two languages mainly give rise to the lexico-semantic problems. These differences include phenomena such as homophony [33], polysemy, and multiword expressions [39].
Differences between the grammatical rules governing sentence structure in the two languages cause syntactic issues that pose major challenges for Google Translate and result in defective translation [18].
Orthography is another challenge that affects GT performance. One manifestation of this challenge is one-letter words in Arabic. Such words are rare in Arabic and are mostly prepositions. In standard writing, they are always joined to the following word and thus never appear as separate words in the sentence. Examples are ك "like" and ب "with". In addition, some imperative verbs are reduced to one letter, such as ق "protect" and ع "beware". In dialectal Arabic, the lack of a standardized orthographic system leads to variation in orthographic representation and thus to an absence of uniformity in the writing system [40]. Dialectal writers therefore often detach one-letter forms, as with e, m, and f in the comments discussed above. These one-letter words were barriers to the GT system, which failed to correctly process many instances of them (peculiar to dialectal Arabic) in the comments.
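One possible pre-processing step is to expand detached one-letter dialect tokens into their full forms before translation. The Python sketch below illustrates this step as an assumption, not as anything Google Translate is known to do. The mapping ع to على, م to من, and ف to في is our reading of the comments discussed in Section IV (e alhajat, m ghyrr, f Dahyh) and is hypothetical. Since ع can also be an imperative verb, a real system would need context to decide.

```python
# Hypothetical mapping for detached one-letter dialect tokens.
ONE_LETTER_EXPANSIONS = {
    "ع": "على",  # 'on', as in "ع الحجات دي"
    "م": "من",   # as in "م غيرر" ('without')
    "ف": "في",   # 'in', as in "ف داهية"
}


def expand_one_letter_tokens(sentence: str) -> str:
    """Replace standalone one-letter tokens; prefixes attached to words are left untouched."""
    return " ".join(ONE_LETTER_EXPANSIONS.get(tok, tok) for tok in sentence.split())


print(expand_one_letter_tokens("تتفرجو ع الحجات دي"))  # تتفرجو على الحجات دي
```

Only whole tokens are replaced, so attached prefixes such as the ب of بتجيبو are not affected. Handling those would require a morphological segmenter rather than a lookup table.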
The lack of a unified writing system for vernacular Arabic means that, within a single dialect, the same word can be written in a variety of ways and spelled ad hoc [41]. In other words, people's writing of dialectal Arabic follows dialectal pronunciation and thus lacks standard spelling conventions [42]. The system's failure to cope with this improvised spelling of the dialect was evident in many of the comments translated in this study.
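Spelling variation of this kind is commonly reduced before translation by orthographic normalization, which maps several written variants of a word onto one canonical form. The Python sketch below is a minimal illustration under stated assumptions: the rules (unifying alef variants, mapping alef maqsura to ya and ta marbuta to ha, deleting the tatweel elongation character, and collapsing emphatic letter repetitions) are a hypothetical selection, and the function name `normalize` is ours; it is not the preprocessing used by GT, whose internals are not public.

```python
import re

# Hypothetical character-level rules, for illustration only.
_CHAR_MAP = str.maketrans({
    "\u0623": "\u0627",  # أ -> ا (alef with hamza above)
    "\u0625": "\u0627",  # إ -> ا (alef with hamza below)
    "\u0622": "\u0627",  # آ -> ا (alef with madda)
    "\u0649": "\u064A",  # ى -> ي (alef maqsura -> ya)
    "\u0629": "\u0647",  # ة -> ه (ta marbuta -> ha)
})
_TATWEEL = "\u0640"  # ـ, the kashida/tatweel elongation character
# Any character repeated three or more times (e.g., emphatic "اووووي").
_ELONGATION = re.compile(r"(.)\1{2,}")


def normalize(text: str) -> str:
    """Map common spelling variants of a word onto one canonical form."""
    text = text.replace(_TATWEEL, "")   # drop decorative elongation
    text = text.translate(_CHAR_MAP)    # unify letter variants
    return _ELONGATION.sub(r"\1", text)  # collapse emphatic repetitions


if __name__ == "__main__":
    for word in ["إزيك", "ازيك", "مدرسة", "اووووي", "حلـــو"]:
        print(word, "->", normalize(word))
```

Such rules trade precision for consistency: mapping ى to ي, for instance, also merges على "on" with the proper name علي "Ali", so normalization can introduce new ambiguities even as it removes spelling variation.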
Orthography also affects the MT system's processing of lexemes. For example, when diacritical marks are added, many of the semantic and lexical problems are resolved. This is clear in the translation of "بل٘تن" and "حاجات" in the example shown earlier.
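The role of diacritics can be illustrated with a short Python sketch, assuming a toy, hypothetical lexicon of three fully diacritized MSA words: عِلْم "knowledge", عَلَم "flag", and عَلِمَ "he knew". Once the marks in the Unicode range U+064B to U+0652 (tanwin, short vowels, shadda, and sukun) are stripped, as they usually are in social media writing, all three collapse onto the same letter string علم, which an MT system must then disambiguate from context alone. The names `strip_diacritics`, `LEXICON`, and `candidate_glosses` are illustrative.

```python
import re

# Tanwin, short vowels, shadda and sukun: U+064B through U+0652.
_DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_diacritics(word: str) -> str:
    """Remove Arabic diacritical marks, keeping only the base letters."""
    return _DIACRITICS.sub("", word)


# Hypothetical toy lexicon: fully diacritized form -> English gloss.
LEXICON = {
    "\u0639\u0650\u0644\u0652\u0645": "knowledge",            # عِلْم
    "\u0639\u064E\u0644\u064E\u0645": "flag",                 # عَلَم
    "\u0639\u064E\u0644\u0650\u0645\u064E": "he knew",        # عَلِمَ
}


def candidate_glosses(undiacritized: str) -> list:
    """Return every gloss whose diacritized form reduces to the input."""
    return [gloss for form, gloss in LEXICON.items()
            if strip_diacritics(form) == undiacritized]


if __name__ == "__main__":
    print(candidate_glosses("علم"))
```

With diacritics present, each form maps to a single gloss; without them, the lookup returns all three candidates.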
Similarly, poor named entity recognition is more evident in Arabic, posing a serious challenge to MT systems. Examples of the system's inability to meet this challenge are the incorrect translations of "بتج٘بْ" and "داُ٘ة" detailed in the examples. This is largely because the Arabic writing system lacks the uppercase/lowercase distinction that helps mark proper nouns in English [39].
The degree of dialectness of the source text is an important factor that determines the effectiveness of MT systems. Habash et al. [43] break dialectness down into four levels, each of which has a different impact on the performance of the GT system. They arrange dialectness along a continuum with pure MSA and pure dialectal Arabic at the two extremes; the other two levels cover mixed cases between these extremes. Purely dialectal text and code switching between MSA and the dialect cause the system to perform relatively poorly and produce defective translations. In this regard, Omar and Gomaa [32] argue that the degree of colloquialism determines how accurate and reliable machine translation is, owing to the elastic nature of the dialectal coding system, both lexically and syntactically.
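A crude way to estimate where a sentence falls on this continuum is to measure the share of its tokens that are dialect markers. The Python sketch below assumes a tiny, hypothetical list of Egyptian Arabic marker words (for example مش "not", عايز "want", ازاي "how"); it is not the annotation scheme of Habash et al. [43], which relies on human judgment, and a realistic system would use a trained dialect identifier instead of a fixed word list.

```python
# Hypothetical, deliberately tiny list of Egyptian Arabic marker words.
EGYPTIAN_MARKERS = {"مش", "ده", "دي", "ازاي", "عايز", "كده", "اوي", "فين", "ليه"}


def dialect_ratio(sentence: str) -> float:
    """Share of whitespace-separated tokens that are dialect markers (0.0 to 1.0)."""
    tokens = sentence.split()
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in EGYPTIAN_MARKERS)
    return hits / len(tokens)


if __name__ == "__main__":
    print(dialect_ratio("انا مش عايز اروح"))   # Egyptian: two of four tokens are markers
    print(dialect_ratio("ذهبت إلى المدرسة"))   # MSA: no markers
```

A score near zero suggests MSA, while intermediate scores often indicate code-switched text; since a marker list can never be exhaustive, a low score is weak evidence rather than proof that a sentence is standard Arabic.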
Contextual and cultural dimensions play a vital role in determining MT quality. Drawing on information previously mentioned in the text is a context-sensitive skill that relates to pragmatics. Pragmatics is concerned with how language users can mean more than what they say. It presupposes that the referential meaning of an utterance should be interpreted in light of its context in order to capture the speaker's or writer's intention, which often goes beyond what is said. Linguistic elements by themselves are not enough to carry the full intended meaning; pragmatics is about implied meaning that is not carried by the words themselves, or indirect meaning that can be derived from text and context. "When it comes to translation, the key issue is how to capture indirectness in human communication and how to invest the resources available in both languages when rendering it" [44]. What is explicitly expressed in the source text (ST) can be faithfully transferred into the target text (TT); however, such a transfer may create pragmatic ambiguity when elements of the context are left unclarified. Pragmatic ambiguity constitutes the most common error type in the current study, compared with the lexico-semantic and syntactic error types. Notably, in most cases of semantic and syntactic errors, pragmatic ambiguity ensues as a result.
It is worth noting that studies exploring MT performance on literary texts [32,44,45] and proverbs [18,45,46] share a common denominator with social media texts. Although the text types differ, they all contain a high degree of cultural content that determines translation quality, which makes human intervention through post-editing necessary. Malika [45] reaffirms this argument, noting that the cultural context of proverbs and poetry is of major importance and, since this context is out of the machine's reach, the outcome is stylistically ambiguous and culturally inappropriate translations. When the source and target languages belong to two different families, like Arabic and English, the outcome is what Gellerstam called "translationese", i.e., awkwardness and ungrammaticality [45].
Admittedly, MT systems have their own tools for dealing with the aforementioned challenges and producing translations that are as accurate as possible. The issue, however, is less the availability of such tools than their quality and compatibility with Arabic. The following is a brief exploration of the specific processing techniques needed to improve the translation quality of MT systems.
Lexico-semantic problems persist in the translated output despite the existence of word sense disambiguation (WSD) techniques. This is perhaps to be expected given the special nature of the Arabic morphological system, which is highly inflectional [47], and the unique orthographic system of Arabic, with its loose and unconventionalized writing scheme [48]. The morphological and orthographic systems of the Arabic vernaculars seem to be even more complicated and pose a greater challenge. Existing WSD techniques need to be refined, and more Arabic-compatible techniques that address the unique features of Arabic should be developed. Out-of-vocabulary (OOV) words stood out clearly in the translated output in two instances, revealing the system's failure to produce an accurate translation. Such OOV words always appear in the translated output as transliterations. Meereboer [49] confirms that OOV words, i.e., words not included in the training data, are an inherent weakness of any machine translation system. Translating from a morphologically rich language into a less rich one often leads to OOV words [50]. Aqlan et al. [51] argue that these missing or unknown words are caused by the highly inflected word forms peculiar to Arabic. Moreover, OOV words are more likely to occur in low-resource languages with limited parallel data [52]. Dealing with this challenge should depend on the type of missing word. According to Gujral et al. [52], unknown words fall into several categories: "named entities, borrowed words, compound words, spelling or morphological variants of seen words or content words unrelated to any seen word" [52]. This challenge has been addressed using various techniques [43,50].
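The transliteration behaviour described above can be illustrated with a toy word-by-word translator. The Python sketch below assumes a hypothetical three-entry dictionary standing in for the training vocabulary and a simplified, hypothetical letter-by-letter romanization table; any token missing from the dictionary is treated as OOV and transliterated rather than translated, mirroring the kind of output observed in this study. Real neural MT systems work with subword vocabularies, so their failure modes are more varied than this sketch suggests.

```python
# Hypothetical vocabulary standing in for words seen in training.
VOCAB = {"انا": "I", "عايز": "want", "اروح": "to go"}

# Hypothetical, simplified letter-by-letter romanization table.
ROMAN = {"ا": "a", "ب": "b", "ت": "t", "ح": "h", "د": "d", "ر": "r",
         "س": "s", "ش": "sh", "ع": "'", "ق": "q", "ك": "k", "ل": "l",
         "م": "m", "ن": "n", "ه": "h", "و": "w", "ي": "y", "ز": "z"}


def translate(sentence: str):
    """Translate token by token; return (output, list of OOV tokens)."""
    output, oov = [], []
    for token in sentence.split():
        if token in VOCAB:
            output.append(VOCAB[token])
        else:
            # OOV fallback: transliterate letter by letter ("?" if unmapped).
            oov.append(token)
            output.append("".join(ROMAN.get(ch, "?") for ch in token))
    return " ".join(output), oov


if __name__ == "__main__":
    print(translate("انا عايز اروح دلوقتي"))
```

Here the Egyptian word دلوقتي "now" is absent from the toy vocabulary, so it surfaces as the transliteration "dlwqty" instead of an English equivalent.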
In addition, to obtain optimal output from the system, morphological segmentation schemes need to be developed to cover the wide and complex variation in the morphological behaviour of dialectal Arabic. Some of the errors committed by the system, especially those involving OOV words, are likely due to the system's failure to recognize the different morphological variants that can be associated with a given root. Some words that the system failed to recognize and output as OOV were fed to the system again with different morphemes, and the system was then able to recognize and interpret them. In their study of Tigrinya, a Semitic language, Tedla and Yamamoto [53] call for developing new morphological segmentation models that fit the highly inflected nature of the language, a recommendation that also applies to Arabic, which belongs to the same language family.
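The retry experiment described above suggests a simple strategy: when a token is not found in the vocabulary, strip likely clitics and look up the remaining stem. The Python sketch below assumes a hypothetical stem vocabulary and short, hypothetical clitic lists: proclitics such as و "and", ال "the", and ب (a preposition, and in Egyptian Arabic also a present-tense prefix on verbs), and pronoun enclitics such as ها and نا. A practical segmenter would be trained or would encode far richer morphology.

```python
# Hypothetical stem vocabulary (stems the system "knows").
VOCAB = {"كتاب", "يكتب", "بيت"}

# Hypothetical clitic lists, longest first so "وال" is tried before "و".
PREFIXES = ("وال", "بال", "ال", "و", "ف", "ب", "ح", "ل")
SUFFIXES = ("ها", "هم", "كم", "نا", "ه", "ك")


def segment(token: str):
    """Return (prefix, stem, suffix) for the first split whose stem is known, else None."""
    # Try the unsegmented reading first, then each candidate prefix.
    for prefix in ("",) + PREFIXES:
        if not token.startswith(prefix):
            continue
        rest = token[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            if stem in VOCAB:
                return prefix, stem, suffix
    return None


if __name__ == "__main__":
    for word in ["بيكتبها", "والكتاب", "بيتنا", "مدرسة"]:
        print(word, segment(word))
```

Trying the empty prefix first keeps بيتنا "our house" from being misread as ب plus an unknown stem; even so, a greedy search like this can pick the wrong split when several are possible, which is one reason statistical segmenters are preferred.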
The examples show that some of the resulting problems are caused by the system's poor recognition of sentence boundaries. Sentence boundary detection is a foundational first step in natural language processing, so more work is needed to refine sentence boundary detection tools to handle the complicated sentence structure typical of social media communication. Social media content consists largely of unpunctuated streams of words, which presents a major challenge that often prevents the system from performing properly. Systems need more training to handle the inherently blurry sentence boundaries of social media communication. In their investigation of sentence boundary detection for social media texts, Rudrapal et al. [54] concluded that current systems are limited in this respect and highlighted the need for further advances to capture the peculiarities of the noisy nature of social media content. They explained that social media texts "tend to be full of misspelled words, show extensive use of home-made acronyms and abbreviations, and contain plenty of punctuation applied in creative and non-standard ways."
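As a point of reference, the Python sketch below shows a naive rule-based splitter of the kind that breaks down on social media text: it splits only after sentence-final punctuation (including the Arabic question mark ؟, U+061F) followed by whitespace, and at line breaks. The rules and the name `split_sentences` are assumptions for illustration; an unpunctuated post passes through as a single "sentence", which is exactly the failure case discussed above.

```python
import re

# Split after ".", "!", "?" or "؟" followed by whitespace, or at newlines.
_BOUNDARY = re.compile(r"(?<=[.!?\u061F])\s+|\n+")


def split_sentences(text: str):
    """Naive rule-based sentence splitter; returns non-empty, stripped segments."""
    return [seg.strip() for seg in _BOUNDARY.split(text) if seg.strip()]


if __name__ == "__main__":
    print(split_sentences("ازيك؟ عامل ايه"))                      # two segments
    print(split_sentences("انا كويس الحمد لله وانت عامل ايه"))    # one long segment
```

Because the rules depend entirely on punctuation and line breaks, the second post, which a human reader would divide after الحمد لله, comes out as one unit.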

VI. CONCLUSION
Several studies have been conducted to evaluate the performance of a number of MT systems, including Google Translate (GT), for the Arabic-English language pair. However, while most studies have focused mainly on Modern Standard Arabic, very few have addressed the translation of Arabic dialects, that is, colloquial or vernacular Arabic. The current study has attempted to address this gap by evaluating the automatic translation of colloquial Arabic on social media networks and platforms, including Facebook and Twitter. In the current age of information technology, social media is extensively used by people across the globe to communicate in different languages with the help of MT systems. GT, the most widely used MT system for Arabic-English translation, was chosen to evaluate the reliability of its output in translating Arabic social media language into English. The evaluation was carried out manually by human translators, who assessed the accuracy of the translation in terms of adequacy and fluency. The evaluators identified a number of errors at the lexico-semantic, syntactic, and pragmatic levels, which rendered the translation unintelligible in most cases. Most of the texts investigated were translated literally by GT, which resulted in inappropriate target language (TL) output: wrong equivalents, inappropriate additions and deletions, and transliteration of out-of-vocabulary (OOV) words. This poor quality of GT output can be attributed to a number of reasons, the most important of which is the distant relation between Arabic and English in terms of their linguistic systems. The differences between the linguistic systems of the two languages give rise to a number of challenges, including polysemy, homophony, and multiword expressions.
Another similarly important reason is the peculiar nature of Arabic, which is generally described as morphologically rich and syntactically flexible. As a highly inflected language, Arabic poses great challenges for computational processing in NLP applications, including machine translation. A third reason, particularly related to the vernacular varieties used in social media, is the lack of a standardized orthographic system, which leads to variation in orthographic representations. This means that in colloquial Arabic the same word can be written in a variety of ways, since spelling follows dialectal pronunciation rather than standard conventions.
To overcome the problems encountered in GT output for Arabic social media texts, NLP tools such as morphological analyzers, part-of-speech (POS) taggers, syntactic parsers, and word sense disambiguation (WSD) systems need to be enhanced, and more Arabic-compatible techniques that address the unique features of Arabic should be developed, particularly for processing the Arabic vernacular varieties.
The current study is an attempt to shed light on the problems facing MT for the Arabic vernacular varieties used in social media. It focused on evaluating a single MT system, GT. Further studies are still needed on the machine translation of dialectal Arabic. One such study could examine the translation of vernacular Arabic by several MT systems and compare their outputs to uncover further problems and suggest possible solutions.