Towards GP Sentence Parsing of V+P+CP/NP Structure: A Perspective of Computational Linguistics

Abstract: Computational linguistics can provide an effective perspective to explain partial ambiguity during machine translation. The V+Pron+CP/NP structure is potentially ambiguous and can bring about the Garden Path effect. If the Tell+Pron+NP structure has considerably higher observed frequencies than the Tell+Pron+CP structure, the former is regarded as the preferred structure and has a much lower confusion quotient. It is possible for the grammatical but unpreferred Tell+Pron+CP structure to replace the ungrammatical but preferred Tell+Pron+NP structure, which results in processing breakdown. The syntactic details of GP processing can be presented by computational technologies. Computational linguistics proves to be effective for exploring the Garden Path phenomenon.


I. INTRODUCTION
A Garden Path (GP) sentence can produce an effect of processing breakdown. It is a special locally ambiguous structure that misleads the reader down the garden path. A GP sentence is grammatically correct, but the ultimate result of processing differs from the initial, incorrect interpretation that is at first considered the most likely one. The ups and downs of processing may lure the reader into a dead end. When the garden path effect appears, the initially built structure is replaced with a revised structure, whose processing involves breakdown and backtracking. Connectionist psycholinguistics focuses on the psychology of language, and connectionist models are considered to play an important role in GP processing.[1]

The parsing of GP sentences is a complex part of natural language processing. Information-based decision-making provides a helpful method to analyze complex structures such as GP sentences, by describing the phenomenon of information use and by explaining why an information use occurs as it does. Relative effectiveness can be obtained by using multi-term phrases and POS-tagged terms.[2]

A knowledge base system is effective for parsing GP sentences. Large-scale discriminative two-phase learning algorithms, which ensure reliable estimation and prevent overfitting, can be used to learn parameters in models with extremely sparse features.[3] Lexical information and knowledge representation can improve the efficiency of parsing. The effective device for imposing structure on lexical information is inheritance, both at the object (lexical items) and meta (lexical concepts) levels of the lexicon. Lexical semantics theory can utilize a knowledge representation framework to offer a richer, more expressive vocabulary for lexical information, which advances knowledge base systems.[4] Writing strategy training can be meaningfully provided by an artificial intelligence tutoring system, which is effective in assessing essay quality and guiding feedback to students with the help of processing algorithms, the development of which must consider a broad array of linguistic, rhetorical, and contextual features.[5]

Machine translation systems need effective parsing of GP sentences. MT systems tuned against a semantic frame based MT evaluation metric as the objective function can benefit from semantic knowledge and produce more adequate output. Machine translation evaluation systems in the metrics task can be used to measure the similarity between the system output translations and the reference translations on word sequences.
Statistical machine translation (SMT) performance can be affected by small amounts of parallel data, which result in both low accuracy (incorrect translations and feature scores) and low coverage (high out-of-vocabulary rates). Bilingual lexicon induction techniques are helpful for learning new translations from comparable corpora, thus improving coverage. The model's feature space with translation scores can be estimated over comparable corpora, which improves accuracy.[6] Extant statistical machine translation systems embed multiple layers of heuristics and encompass very large numbers of numerical parameters. The phrase-based translation model can be used to decrease the difficulty of analyzing output translations and the various causes of errors.[7]

Based on natural language processing systems, computational linguistics skills and cognitive analysis, a related system can be established to parse GP sentences efficiently. This discussion comprises two parts. The first part discusses Shon and Moon's system for machine learning, which is helpful for GP sentence parsing. The second part parses the GP sentence from multiple perspectives. Context-free grammar, recursive transition network, the CYK algorithm, and statistical analysis are introduced to analyze the processing breakdown of the GP sentence.

www.ijacsa.thesai.org

II. THE SYSTEM FOR MACHINE LEARNING
The peculiarities of linguistic cognition need to be analyzed in a machine learning system. ML should have the ability to process linguistic data originating from natural languages. For example, the evidence shows that it is hard for children to recover from misinterpretations of temporarily ambiguous phrases. They are reluctant to use late-arriving syntactic evidence to override earlier verb-based cues to structure, and late-developing cognitive control abilities mediate the recovery from GP sentences.[8] World knowledge about the actions designated by verbs and syntactic proficiency are reflected in the on-line processing of sentence structure.[9]

Automated knowledge acquisition is an essential aspect of machine learning. For example, the ML domain is concerned with the automated induction of models and the extraction of interesting patterns from empirical data. Fuzzy set theory has proved helpful for machine learning, data mining, and related fields. In Fig. 1, created by Shon and Moon [11], we can see that the system comprises several steps: on-line processing, off-line processing, feedback of validation, the Enhanced SVM machine learning approach, and cross validation.
On-line processing is the first step, which focuses on real-time traffic filtering using PTF after the parsing of the raw packet capture. PTF is used to drop malformed packets, which makes packet preprocessing efficient and reduces the number of potential attacks in the raw traffic. Off-line processing involves data clustering using SOFM and packet field selection using GA. Both procedures are carried out before the overall framework runs. GA chooses the appropriate fields through a natural evolutionary process, and the related fields are then selected from the packets filtered in real time. SOFM-based packet clustering can produce many packet profiles for SVM learning, which yields more appropriate training data.

In the feedback of validation, the filtered packets are preprocessed for high detection performance. The relationships between the packets are taken into account so that the SVM inputs carry temporal characteristics. The SVM comprises the soft margin SVM (a supervised method) and the one-class SVM (an unsupervised method). The Enhanced SVM machine learning approach, including machine training and testing, inherits the high performance of the soft margin SVM and the novelty detection capability of the one-class SVM. Cross validation and a real test with Snort and Bro form the final step, which entails an m-fold cross validation test and a real-world test.
Shon and Moon's model provides a general idea of machine learning. An effective system has to be flexible and adaptable. That means any effective system should be the product of both theory and practical application. Improved theoretical analysis can effectively improve the system. Therefore, good algorithms and parsing skills should be involved in improving a machine learning system.

From the perspective of theoretical analysis, machine learning depends on the integration of computational technologies and linguistic background. The harmonious development of linguistics and computer science can advance machine learning. The parsing of complex sentences, i.e. garden path sentences, is shown below to present the procedures of machine learning and linguistic cognition. The integration of context-free grammar, recursive transition network, well-formed substring table and the CYK algorithm can be used as an effective method of computational linguistics for application.

III. MULTI-PARSING OF GP SENTENCE
The processing of GP sentences needs the effective involvement of formal language, e.g. context-free grammar, and of algorithms, e.g. the recursive transition network and the CYK algorithm.

A. CFG-Based Processing of GP Sentence
A context-free grammar generally comprises nonterminal symbols (V) and terminal symbols (w), and all the production rules have the form "V→w". Sometimes "w" can refer to strings of nonterminals or to the empty string. The grammar is called context free because the production rules are applicable regardless of the context of a nonterminal. A single "V" can be replaced by "w" according to the production rules until nearly all the rules are used for the processing. If the start symbol "S" can be rewritten into the terminal symbols on the right and all the strings have been parsed successfully, the processing is acceptable. If part of the strings is left unparsed by the system, processing has to return to the crossroad to choose the alternative that can lead to the full parsing of all the strings. The backtracking obviously brings the processing breakdown, which is the particular effect of a GP sentence.
Example 1: She told me a little white lie will come back to haunt me.

According to the CFG above, the grammar is defined by the four parameters "Vn, Vt, S, P" and the sub-language is defined by "G". "Vn" means a finite set of non-terminal symbols; "Vt", a finite set of terminal symbols; "S", the start symbol; "P", the productions of the grammar, which show the relationships between non-terminal and terminal symbols. All the rules are available during the processing. Please see the parsing by means of the grammar.

In the BNCweb database (http://bncweb.lancs.ac.uk/), we can obtain statistical information about the "tell+me+any article" type, in which both "[(that) any article+NP]IP" and "[any article+NP]NP" are acceptable. The corpus shows that "Your query 'tell me' returned 5284 hits in 1213 different texts (98,313,429 words [4,048 texts]; frequency: 53.75 instances per million words), sorted on position +1 with tag-restriction any article (272 hits)". According to the corpus, the number of instances of the "[(that) any article+NP]IP" type is 24, and that of the NP type is 148.
According to the nonparametric statistics, if df = 1 and $\chi^2 = 93.30$, then p < .05, which means that the difference between the two types is significant. Since the NP type has a much higher observed frequency than the IP type, the NP type is considered to be the prototype, by means of which the system automatically parses the strings involved. If the prototypical NP type is rejected by the system during the processing, the optional IP type is started, resulting in the GP effect. Please see the successful processing in which the IP type is accepted.
In the diagram, we can see that "a little white lie" is considered an NP used as the subject of the IP, by which the GP sentence is successfully parsed. Besides the CFG and the tree diagram, the Recursive Transition Network (RTN) is helpful for the parsing and processing of GP sentences.
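The backtracking described in this subsection can be sketched with a small top-down parser in Python. This is only an illustration under stated assumptions, not the grammar used in the paper: the rules "NP→Adj NP", "VP→Aux V Adv SC" and "SC→To V NP", the lexicon, and all function names are hypothetical additions made so that the whole of Example 1 can be covered. Alternatives are tried in the order listed, so the preferred "VP→V NP NP" reading is attempted before "VP→V NP IP".

```python
# Toy grammar G = (Vn, Vt, S, P). Rules marked "assumed" are illustrative
# additions and are not taken from the paper.
LEXICON = {
    "she": {"Pron"}, "told": {"V"}, "me": {"Pron"}, "a": {"Det"},
    "little": {"Adj"}, "white": {"Adj"}, "lie": {"N"}, "will": {"Aux"},
    "come": {"V"}, "back": {"Adv"}, "to": {"To"}, "haunt": {"V"},
}
PRODUCTIONS = {
    "S": [["NP", "VP"]],
    "IP": [["NP", "VP"]],
    "NP": [["Pron"], ["Det", "NP"], ["Adj", "NP"], ["Adj", "N"]],  # Adj NP: assumed
    "VP": [
        ["V", "NP", "NP"],          # preferred NP reading, tried first
        ["V", "NP", "IP"],          # unpreferred IP reading
        ["Aux", "V", "Adv", "SC"],  # assumed: covers "will come back ..."
    ],
    "SC": [["To", "V", "NP"]],      # assumed: covers "to haunt me"
}


def expand(symbol, pos, words):
    """Yield (tree, end) for every way `symbol` can cover words[pos:end]."""
    if symbol not in PRODUCTIONS:  # pre-terminal: match a single word
        if pos < len(words) and symbol in LEXICON.get(words[pos], set()):
            yield (symbol, words[pos]), pos + 1
        return
    for rhs in PRODUCTIONS[symbol]:
        for children, end in expand_sequence(rhs, pos, words):
            yield (symbol, children), end


def expand_sequence(rhs, pos, words):
    """Yield (children, end) for every way the symbols in `rhs` match in order."""
    if not rhs:
        yield [], pos
        return
    for tree, mid in expand(rhs[0], pos, words):
        for rest, end in expand_sequence(rhs[1:], mid, words):
            yield [tree] + rest, end


def parse_with_backtracking(sentence):
    """Return (tree, dead_ends); dead_ends lists the prefixes of rejected analyses."""
    words = sentence.lower().split()
    dead_ends = []
    for tree, end in expand("S", 0, words):
        if end == len(words):
            return tree, dead_ends
        dead_ends.append(" ".join(words[:end]))  # strings left over: backtrack
    return None, dead_ends


def bracket(tree):
    """Render a tree in the labelled-bracket style used in the text."""
    label, body = tree
    if isinstance(body, str):
        return f"[{body}]{label}"
    return "[" + " ".join(bracket(child) for child in body) + f"]{label}"


if __name__ == "__main__":
    tree, dead_ends = parse_with_backtracking(
        "She told me a little white lie will come back to haunt me")
    print("rejected first:", dead_ends)
    print(bracket(tree))
```

With this toy grammar, the NP reading completes an S over "she told me a little white lie" but leaves the remaining words unparsed, so the parser returns to the VP choice point and succeeds with the IP reading.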

B. RTN-Based Processing of GP Sentence
An RTN usually comprises one S net and some subnets created for the recursive transition of different sub-strings. In Fig. 3, the NP subnet and the VP subnet are more complex than the other nets. According to Table 1, the structure of "She told me a little white lie" is parsed as follows. In the processing, we can see that not all sub-strings are parsed successfully, which means the failure of processing.

Since the first structure, with its high statistical frequency, is proved to be illegal above, the alternative structure is started subsequently. The shift obviously brings cognitive overburden and the garden path effect. The effective processing of the unpreferred structure is shown as follows.

In the processing above, all the sub-strings are included and parsed comprehensively. The procedures reflect the fact that the GP sentence is grammatically correct despite the processing breakdown and backtracking. More details of the decoding algorithm can be analyzed in CYK.
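The shift from the preferred to the unpreferred structure can be illustrated with a minimal RTN traversal in Python. The networks below are a hypothetical reconstruction and are not the networks of Fig. 3: the state names, the arc order, and the flat Aux-V-Adv-To-V-NP path through the VP subnet are assumptions chosen only so that Example 1 can be traversed. An arc labelled with a network name pushes into that subnet; any other label must match the category of the next word.

```python
# Hypothetical RTN for Example 1 (not the networks of Fig. 3).
LEXICON = {
    "she": {"Pron"}, "told": {"V"}, "me": {"Pron"}, "a": {"Det"},
    "little": {"Adj"}, "white": {"Adj"}, "lie": {"N"}, "will": {"Aux"},
    "come": {"V"}, "back": {"Adv"}, "to": {"To"}, "haunt": {"V"},
}
RTN = {
    "S": {"start": "s0", "final": {"s2"},
          "arcs": {"s0": [("NP", "s1")], "s1": [("VP", "s2")]}},
    "IP": {"start": "i0", "final": {"i2"},
           "arcs": {"i0": [("NP", "i1")], "i1": [("VP", "i2")]}},
    "NP": {"start": "n0", "final": {"n2"},
           "arcs": {"n0": [("Pron", "n2"), ("Det", "n1"), ("Adj", "n1")],
                    "n1": [("Adj", "n1"), ("N", "n2")]}},
    "VP": {"start": "v0", "final": {"v3"},
           "arcs": {"v0": [("V", "v1"), ("Aux", "v4")],
                    "v1": [("NP", "v2")],
                    "v2": [("NP", "v3"),   # preferred: V NP NP
                           ("IP", "v3")],  # alternative: V NP IP
                    # assumed flat path for "will come back to haunt me"
                    "v4": [("V", "v5")], "v5": [("Adv", "v6")],
                    "v6": [("To", "v7")], "v7": [("V", "v8")],
                    "v8": [("NP", "v3")]}},
}


def traverse(net_name, state, pos, words):
    """Yield every input position at which `net_name` can be left from `state`."""
    net = RTN[net_name]
    if state in net["final"]:
        yield pos
    for label, next_state in net["arcs"].get(state, []):
        if label in RTN:  # push into a subnet, then continue in this net
            for mid in traverse(label, RTN[label]["start"], pos, words):
                yield from traverse(net_name, next_state, mid, words)
        elif pos < len(words) and label in LEXICON.get(words[pos], set()):
            yield from traverse(net_name, next_state, pos + 1, words)


def rtn_attempts(sentence):
    """Return the end positions of successive S analyses, up to the first full one."""
    words = sentence.lower().split()
    attempts = []
    for end in traverse("S", RTN["S"]["start"], 0, words):
        attempts.append(end)
        if end == len(words):
            break
    return attempts


if __name__ == "__main__":
    print(rtn_attempts(
        "She told me a little white lie will come back to haunt me"))
```

Under these assumptions the first complete pass through the S net stops after word 7 ("lie"), which leaves six words unconsumed; only the second pass, which takes the IP arc out of state v2, reaches position 13.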

C. CYK-Based Processing of GP Sentence
The CYK algorithm is effective for the processing of GP sentences. It can clearly show the procedures of backtracking and present the possibility that computational linguistic knowledge functions as an efficient method that provides helpful hints to alleviate the garden path effect.
In Table 2, we can see the whole procedure of processing breakdown. At the point (7, 6), the CFG rule "NP→Adj N" is started and the sub-string "white lie" is parsed. At the point (7, 3), another CFG rule, "NP→Det NP", is accepted and the sub-string "a little white lie" is rewritten into another NP. At the point (7, 1), the rules "VP→V NP NP" and "NP→Pron" are available, and the sub-string "told me a little white lie" is parsed as a VP. At the point (7, 0), the rules "NP→Pron" and "S→NP VP" are provided. The sub-string "she told me a little white lie" is processed clearly, and the original processing result is obtained. With the appearance of the further sub-string "will come back to haunt me", which needs an NP as its SUBJECT, the system has to re-analyze the original result. Thus, the GP effect comes into being. Please see the non-well-formed sub-string table, which is created on the basis of Table 2.

In Fig. 4, the turning point is 7, at which the processing is cut into S+VP. The result is the same as the situation in Table 2, in which S is obtained at the point (7, 0) and VP at the point (13, 7). Since there is no effective CFG rule to rewrite S+VP, the sub-string table is not well-formed, meaning the failure of processing. If the system can backtrack to the turning point and apply the rule "VP→V NP IP" rather than "VP→V NP NP", the whole processing procedure can be presented as in Table 3.

In Table 3, the CFG rules, e.g. "S→P(NP) VP", "VP→V P(N) IP", "IP→NP VP", "VP→VP SC", etc., are available and the sentence can be effectively parsed. The well-formed substring table created from Table 3 is shown in Fig. 5. The procedures of parsing reflect the fact that all the strings are successfully processed, and correspondingly, the algorithm used for the system to improve the efficiency can be presented as follows. In the algorithm, two parameters (i and j) are provided to present the procedures. The parameter j is used to construct the chart structure and the parameter i is used to build the grammatical structure.
The first step is to construct the chart structure. The sentence has a string length of 13, i.e. 13 words are involved in the processing, so j ranges from 1 to 13. The first word "she" is covered by the chart (0, 1); the second word "told", by the chart (1, 2); the third word "me", by the chart (2, 3); ... the last word, by the chart (12, 13).

The second step is the construction of syntactic grammar. Since at least two basic charts constitute one basic syntactic sector, i ranges from j-2 down to 0 (j-2 ≥ 0). Thus, the created syntactic rules of symbols cover the string from i to j, and the algorithm with i and j can be obtained. The basic construction is "syntactic_chart_fill(i, j)". Please see the algorithm presentation.

In the algorithm, the parameter k, which lies between i and j, is introduced to show the dynamic forms of analysis (k := i+1 to j-1). With the help of i, j, and k, the processing algorithm runs smoothly, and all the processing procedures based on the well-formed sub-string table can be expressed clearly.
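The chart-filling loop over i, j, and k can be sketched in Python as follows. This is a minimal CYK recognizer under several assumptions that are not part of the paper: the ternary rules are binarized with the hypothetical helper symbols X_NP_NP and X_NP_IP, pronouns are tagged directly as NP in the lexicon (absorbing "NP→Pron"), and the rules "NP→Adj NP", "VP→Aux VP", "VP→V Adv", "VP→V NP" and "SC→To VP" are added so that the second clause can be covered. The name syntactic_chart_fill is borrowed from the text, but its body here is only an interpretation. Cells are indexed as (i, j), whereas the points in the text are written as (j, i).

```python
from collections import defaultdict

# Lexicon and rules are illustrative assumptions, see the lead-in above.
LEXICON = {
    "she": {"NP"}, "told": {"V"}, "me": {"NP"}, "a": {"Det"},
    "little": {"Adj"}, "white": {"Adj"}, "lie": {"N"}, "will": {"Aux"},
    "come": {"V"}, "back": {"Adv"}, "to": {"To"}, "haunt": {"V"},
}
# Each rule is (parent, left child, right child).
BASE_RULES = [
    ("S", "NP", "VP"), ("IP", "NP", "VP"),
    ("NP", "Det", "NP"), ("NP", "Adj", "NP"), ("NP", "Adj", "N"),
    ("VP", "VP", "SC"), ("VP", "Aux", "VP"), ("VP", "V", "Adv"),
    ("VP", "V", "NP"), ("SC", "To", "VP"),
]
NP_READING = [("VP", "V", "X_NP_NP"), ("X_NP_NP", "NP", "NP")]  # VP -> V NP NP
IP_READING = [("VP", "V", "X_NP_IP"), ("X_NP_IP", "NP", "IP")]  # VP -> V NP IP


def syntactic_chart_fill(chart, i, j, rules):
    """Fill cell (i, j) by trying every split point k with i < k < j."""
    for k in range(i + 1, j):
        for parent, left, right in rules:
            if left in chart[(i, k)] and right in chart[(k, j)]:
                chart[(i, j)].add(parent)


def cyk_chart(words, rules, lexicon):
    """Build the sub-string table: chart[(i, j)] holds the symbols over words[i:j]."""
    chart = defaultdict(set)
    for j in range(1, len(words) + 1):
        chart[(j - 1, j)] = set(lexicon[words[j - 1]])  # basic chart for word j
        for i in range(j - 2, -1, -1):                  # i runs from j-2 down to 0
            syntactic_chart_fill(chart, i, j, rules)
    return chart


if __name__ == "__main__":
    words = "she told me a little white lie will come back to haunt me".split()
    n = len(words)

    breakdown = cyk_chart(words, BASE_RULES + NP_READING, LEXICON)
    print("NP reading only: S over (0, 7):", "S" in breakdown[(0, 7)])
    print("NP reading only: VP over (7, 13):", "VP" in breakdown[(7, n)])
    print("NP reading only: S over (0, 13):", "S" in breakdown[(0, n)])

    repaired = cyk_chart(words, BASE_RULES + NP_READING + IP_READING, LEXICON)
    print("with IP reading: S over (0, 13):", "S" in repaired[(0, n)])
```

With only the NP reading, the chart contains an S over the first seven words and a VP over the remaining six, but no symbol spans the whole string; once the IP reading is added, an S covers all 13 words.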

D. The Statistical Analysis of GP Effect
The garden path effect can be calculated on the basis of statistical analysis. The formula created to analyze the confusion quotient defines Vcq in terms of the following parameters. Vcq shows the degree of confusion, where cq means "confusion quotient". The letter O abbreviates "observed frequencies" during the processing. The letter E abbreviates "expected frequencies". The letter n abbreviates "number", meaning the number of peculiarities. The letter i means the unit of peculiarity. The more complex the structure of a garden path sentence is, the higher the confusion quotient is. The value of 1 is the axis of the confusion quotient. The GP effect appears when 1 ≤ Vcq ≤ 2. If the value of Vcq is less than 1, whole ambiguity appears rather than the partial ambiguity that results from the GP effect.[12]

TABLE V. THE CONFUSION QUOTIENT OF CP/NP AMBIGUITY

In Table 5, we can see that the CP model has the value 1.82 and the NP model has the value 0.91, which means that the NP structure has a lower confusion quotient and is therefore considered the prototype. If the preferred NP model is replaced by the unpreferred, higher-confusion model, processing breakdown appears and the GP phenomenon takes effect. The statistical analysis of the GP effect reflects the fact that the ungrammatical but preferred structure is always parsed first, which brings backtracking during the processing. The grammatical but unpreferred structure proves to be the replacement of the initial processing. The chronological order in which the partially ambiguous structures are started is the potential reason for the processing breakdown. The parsing can be shown by the online Stanford Parser, which focuses more on statistics.

Fig. 2. Tree diagram of Example 1.

In Example 1, the local ambiguity potential of the structure requires the network to parse the sentence effectively. That means both the NP subnet and the VP subnet should give a convincing description of what function "a little white lie" should, or can, have. Please see the RTN of Example 1.

Fig. 3. The RTN of Example 1.

PRP me)) (NP (DT a) (JJ little) (JJ white) (NN lie))))))

The processing above correspondingly brings the backtracking. Please see the details of the breakdown processed by means of the RTN algorithm.

According to Figure 3, both the structures of "[[she]NP [[told]VP [me]NP [a little white lie]NP]VP]S..." and "[[she]NP [[told]V [me]NP [a little white lie ...]IP]VP]S" are partially acceptable during the processing, thus resulting in local ambiguity at the point of "a little white lie".

TABLE I. THE STATISTICAL ANALYSIS OF "TELL+ME" TYPE

TABLE II. THE BREAKDOWN MATRIX OF EXAMPLE 1

Fig. 4. Non-well-formed sub-string table of breakdown

TABLE III. THE PROCESSING MATRIX OF EXAMPLE 1