In the construction grammar tradition, co-occurrence statistics from corpora have often been used as evidence for hypothesized constructions. However, such statistics are typically gathered on a case-by-case basis, and no reliable procedure exists to automatically identify constructions. In contrast, in computational linguistics, many automatic procedures have been studied for identifying MWEs (Sag et al., 2002) – with varying success – but here they are treated as exceptions: identifying multi-word expressions is a pre-processing step in which typically adjacent words are grouped together, after which the usual procedures for syntactic or semantic analysis can be applied. In this paper I explore an alternative formal and computational approach, in which multi-word constructions have no special status but emerge in a general procedure to find the best statistical grammar to describe a training corpus. Crucially, I use a formalism known as “Stochastic Tree Substitution Grammars” (henceforth, STSGs), which can represent single words, contiguous and noncontiguous MWEs, context-free rules, and complete parse trees in a unified representation.
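The unified representation can be illustrated with a toy encoding of STSG elementary trees as nested tuples, where a child label ending in "@" marks an open substitution site. All fragment shapes below are invented illustrations of the idea, not fragments taken from the paper:

```python
# Toy encoding of STSG elementary trees: a node is (label, child, ...),
# a leaf string is a word, and a child label ending in "@" marks an open
# substitution site. All fragments here are invented illustrations.
word = ("V", "kick")                                  # a single word
rule = ("S", "NP@", "VP@")                            # a context-free rule
mwe  = ("VP", ("V", "kick"),                          # a contiguous MWE
        ("NP", ("D", "the"), ("N", "bucket")))
gap  = ("VP", ("V", "take"), "NP@",                   # a noncontiguous MWE
        ("PP", ("P", "into"), ("NP", ("N", "account"))))

def sites(fragment):
    """Collect the labels of a fragment's open substitution sites."""
    if isinstance(fragment, str):
        return [fragment[:-1]] if fragment.endswith("@") else []
    return [s for child in fragment[1:] for s in sites(child)]
```

In this encoding, a complete parse tree is simply a fragment with no open sites, so all four object types live in one data structure.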
To situate the results of our system, figure 4(a) gives the values of several parsing strategies. CCM is our constituent-context model. DEP-PCFG is a dependency PCFG model trained using the inside-outside algorithm. Figure 3 shows sample parses to give a feel for the output the systems produce. We also tested several baselines. RANDOM parses randomly; this is an appropriate baseline for an unsupervised system. RBRANCH always chooses the right-branching chain, while LBRANCH always chooses the left-branching chain. RBRANCH is often used as a baseline for supervised systems, but it exploits a systematic right-branching tendency of English. An unsupervised system has no a priori reason to prefer right chains to left chains, and LBRANCH performs well below RANDOM. A system need not beat RBRANCH to claim partial success at grammar induction.
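The branching baselines are deterministic and easy to reproduce. The sketch below builds the right- and left-branching chains over a token sequence, plus one simple randomized bracketing (split points chosen uniformly at each node, which is not necessarily the exact sampling scheme used for RANDOM):

```python
import random

def rbranch(tokens):
    """RBRANCH: the right-branching chain (w1 (w2 (w3 ...)))."""
    if len(tokens) == 1:
        return tokens[0]
    return (tokens[0], rbranch(tokens[1:]))

def lbranch(tokens):
    """LBRANCH: the left-branching chain (((... w1) w2) w3)."""
    if len(tokens) == 1:
        return tokens[0]
    return (lbranch(tokens[:-1]), tokens[-1])

def random_parse(tokens, rng=random):
    """A simple random binary bracketing: pick a split point uniformly."""
    if len(tokens) == 1:
        return tokens[0]
    k = rng.randrange(1, len(tokens))
    return (random_parse(tokens[:k], rng), random_parse(tokens[k:], rng))
```

For example, `rbranch(["a", "b", "c"])` yields `("a", ("b", "c"))` and `lbranch(["a", "b", "c"])` yields `(("a", "b"), "c")`.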
Researchers have investigated the rate and extent of attrition in different areas or sub-skills of a language, with varying results: some studies report substantial loss of a language, others only slight loss. Seliger (1991) stated that attrition is selective and does not affect all aspects of language in the same way. It has generally been assumed that the lexicon shows the highest degree of attrition, and some studies have indicated that this area is the first to be affected in the attrition process; in other words, the lexical level is more vulnerable to attrition than the grammatical system. L2 attrition may occur in different aspects of language, such as grammar or vocabulary, and from this point of view some researchers have investigated individual components of language and asked which are more vulnerable to attrition. In contrast to the study of lexical attrition, research on attrition at the grammatical level has been confined to the analysis of errors (Schmid, 2004). Laleko (2007) investigated the attrition phenomena responsible for the fact that AR (American Russian) is characterized as a reduced variety of SR (standard Russian); the findings showed a reduction of vocabulary and changes in the aspectual system and gender agreement in AR. Al-Hazemi (2000) suggested that vocabulary is more vulnerable to attrition than grammar in advanced L2 learners who had acquired the language in a natural setting. Consistent with this result, in several studies speakers who had lived in an L2 environment, when asked whether they had so far experienced attrition, immediately reported problems of lexical access; lexical access is also often identified in attrition research as the aspect of linguistic knowledge most vulnerable to attrition (Hulsen, 2000; Köpke & Nespoulous, 2001; Köpke & Schmid, 2004; Montrul, 2008; Opitz, 2004).
CoGrOO is a Brazilian Portuguese grammar checker, a project initially sponsored by FINEP (a research and projects funding agency). Research on CoGrOO began in 2004, and since its first release in 2006 it has been adopted by important companies such as Petrobras – the biggest company in Brazil and the 8th biggest in the world by market value – and Celepar – the Paraná State information technology company, responsible for deploying software for government offices and public schools. CoGrOO has accumulated over a hundred thousand downloads from its official website.
In this classification, there are six different types of languages: SVO — Subject Verb Object; SOV — Subject Object Verb; VSO — Verb Subject Object; etc. These schemes reflect the typical structure of sentences. Turkic languages belong to the SOV type. A list of 13 links that naturally reflect the most important syntactic links between words in Kazakh sentences is described in . It is important that the same links can be used in developing parsers for other Turkic languages, owing to the high degree of similarity not only of their syntax but also of their morphology and vocabulary.
The aim or principle of developing the local culture (always implicitly marked as ‘lesser’ or ‘inferior’) has in its turn determined the canon of translated works, since not all texts are suitable or purposeful from the point of view of such a mission. Translations should fill the blank spots – be the so-called gap fillers – in the cultural scene. In the following, we give some examples. Ott Ojamaa (2010: 62–63), one of the Estonian translation missionaries, has said that when translating, Estonians should give priority to cultural gain: “…we try to translate literature that is as different as possible, not such literature as already exists here. What is very important is the principle that we must translate only for ourselves, taking into account the state our own literature is in. Only then can we produce translational literature that is worth something. [...] It has to exercise only those possibilities of language that no Estonian work of literature would address, and satisfy those needs that would otherwise remain unsatisfied.”
This paper describes a possible extension to LanguageTool, a well-known open-source grammar checking tool. The proposed extension allows developers to write grammatical rules that rely on dependency trees supplied by a natural language parser. Such rules are indispensable for analyzing word-to-word links in order to handle a variety of grammatical errors, including improper use of articles, incorrect verb government, and wrong word-form agreement.
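To make the idea concrete, here is a hypothetical sketch of one such dependency-based rule, an agreement check over parser-supplied `(head, relation, dependent)` edges. The function name, feature dictionary, and `nsubj` relation are illustrative assumptions, not LanguageTool's actual rule API:

```python
def check_subject_verb_agreement(edges, feats):
    """Flag subject-verb number disagreement over dependency edges.

    edges: list of (head_index, relation, dep_index) triples from a parser
    feats: dict mapping token index -> morphological features,
           e.g. {"number": "sg"} or {"number": "pl"}
    Returns a list of (subject_index, verb_index, message) errors.
    """
    errors = []
    for head, rel, dep in edges:
        if rel == "nsubj":  # subject attached to its governing verb
            h, d = feats.get(head, {}), feats.get(dep, {})
            if "number" in h and "number" in d and h["number"] != d["number"]:
                errors.append((dep, head, "subject-verb number disagreement"))
    return errors
```

Because the rule pattern is stated over tree edges rather than over adjacent tokens, it fires even when arbitrary material intervenes between subject and verb, which is exactly what surface pattern rules struggle with.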
This paper gives an Abstract Categorial Grammar (ACG) account of Kallmeyer and Kuhlmann's (2012) process for transforming the derivation trees of Tree Adjoining Grammar (TAG) into dependency trees. We make explicit how the requirement of keeping a direct interpretation of dependency trees into strings results in lexical ambiguity. Since the ACG framework has already been used to provide a logical semantics from TAG derivation trees, we obtain a unified picture in which derivation trees and dependency trees are related but independent, equivalent ways to account for the same surface–meaning relation.
A method for deriving an approximately labeled dependency treebank from the Thai Categorial Grammar Treebank has been implemented. The method involves a lexical dictionary for assigning dependency directions to the CG types associated with the grammatical entities in the CG bank, falling back on a generic mapping of CG types in the case of unknown words. Currently, all but a handful of the trees in the Thai CG bank can be unambiguously transformed into directed dependency trees. Dependency labels can optionally be assigned with a learned classifier, which in a preliminary evaluation with a very small training set achieves 76.5% label accuracy. In the process, a number of annotation errors in the CG bank were identified and corrected. Although rather limited in its coverage, excluding e.g. long-distance dependencies, topicalisations, and longer sentences, the resulting treebank is believed to be sound in terms of structural annotational consistency and a valuable complement to the scarce Thai language resources in existence.
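The lookup-with-fallback scheme can be sketched as follows. All table entries here are invented for illustration (the paper's actual lexical dictionary and generic mapping are far richer); the sketch only shows the control flow of "word-specific entry, else per-type default, else global default":

```python
# Hypothetical direction tables: for a given CG type, does the functor or
# the argument head the resulting dependency arc? Entries are illustrative.
LEXICAL = {
    ("kin", "s\\np/np"): "functor-head",   # hypothetical transitive verb entry
}
GENERIC = {
    "s\\np/np": "functor-head",            # transitive verb type
    "np/n": "functor-head",                # determiner type
    "n/n": "argument-head",                # modifier type: the noun heads the arc
}

def direction(word, cg_type, default="functor-head"):
    """Dependency direction for a (word, CG type) pair.

    Prefer the word-specific lexical entry, fall back on the generic
    per-type mapping for unknown words, then on a global default.
    """
    return LEXICAL.get((word, cg_type), GENERIC.get(cg_type, default))
```

This mirrors the two-stage design described above: known words get curated directions, while out-of-vocabulary items still receive a deterministic assignment from their CG type alone.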
Grammar-based natural language generation (NLG) has received considerable attention over the past decade. Prior work has mainly focused on hand-crafted generation grammars (Reiter et al., 2005; Belz, 2008), which are extensive but also expensive. Recent work automatically learns a probabilistic regular grammar describing Markov dependencies among fields and word strings (Konstas and Lapata, 2012a; Konstas and Lapata, 2013), or extracts a tree adjoining grammar, provided an alignment lexicon is available that projects the input semantic variables up the syntactic tree of their natural language expression (Gyawali and Gardent, 2014). Although it is a consensus that, at a rather abstract level, natural language generation can benefit a lot from its counterpart, natural language understanding (NLU), the problem of leveraging NLU resources for NLG still leaves much room for investigation.
Of fundamental importance is the demarcation of two types of transposition of linguistic units in part-of-speech and inter-part-of-speech classes – the functional-semantic (or semantic, in Sh. Bally's terminology) and the functional types. Functional-semantic transposition, being a fact of both grammar and vocabulary, includes change in both the grammatical features and the lexical-semantic features of the words and word forms undergoing a categorical (part-of-speech) transformation. As a result, the linguistic units become categorically "bifurcated", and the word forms ("twins") that "spin off" from them pass into a new class; compare: Podkhodiashchiy k stantsii poezd zamedlil khod 'The train approaching the station slowed down' (present participle of the verb 'to approach') -> On iskal podkhodyashchiy moment, chtoby soobshchit' etu novost' 'He was looking for a suitable moment to break the news' (adjectivized participle) (see [Shigurov, 1993, 2003]). Functional transposition, in contrast, bears no relation to word formation, being a purely grammatical process associated with the part-of-speech (categorical) attributes of word forms. The nature of the functional transposition of linguistic signs is exceptionally complex. Without going into the history of the issue, we note that within its boundaries two types can be distinguished: 1) the transition of language units from different parts of speech into particular inter-part-of-speech semantic-syntactic categories of words – predicatives and modal-parenthetical words (Brief Russian Grammar, 1989, pp. 22-23, 302, etc.); and 2) a purely grammatical transition of words and phrases from one part of speech into another (substantivization, adjectivization, adverbialization, etc.), which does not violate the identity of the original lexeme.
Unsupervised grammar induction has attracted general interest for several decades, offering the possibility of building practical syntactic parsers while reducing the labor of constructing a treebank from scratch. One approach is to exploit the Inside/Outside algorithm (Baker, 1979; Carroll and Charniak, 1992), a variant of the EM algorithm for PCFGs, to estimate the parameters of the parser's language models. More recent advances in this approach are the constituent-context model (CCM) (Klein and Manning, 2001; Klein and Manning, 2002), the dependency model with valence (DMV) based on Collins' head dependency model (1999), and the CCM+DMV mixture (Klein and Manning, 2004; Klein, 2005).
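The E-step of that EM procedure rests on the inside probabilities beta(i, j, A) = P(A derives words[i:j]). A minimal sketch of the inside pass for a PCFG in Chomsky normal form (the outside pass and the parameter re-estimation step are omitted):

```python
from collections import defaultdict

def inside(words, lexical, binary, start="S"):
    """Inside probabilities for a CNF PCFG; returns P(words | start).

    lexical: dict (A, word) -> prob, for rules A -> word
    binary:  dict A -> list of (B, C, prob), for rules A -> B C
    """
    n = len(words)
    beta = defaultdict(float)          # beta[(i, j, A)] = P(A =>* words[i:j])
    for i, w in enumerate(words):      # base case: preterminal spans of length 1
        for (A, word), p in lexical.items():
            if word == w:
                beta[(i, i + 1, A)] += p
    for span in range(2, n + 1):       # recursion: sum over rules and split points
        for i in range(n - span + 1):
            j = i + span
            for A, rules in binary.items():
                for B, C, p in rules:
                    for k in range(i + 1, j):
                        beta[(i, j, A)] += p * beta[(i, k, B)] * beta[(k, j, C)]
    return beta[(0, n, start)]
```

For instance, with lexical rules NP -> "dogs" (1.0) and VP -> "bark" (0.5) and the binary rule S -> NP VP (1.0), `inside(["dogs", "bark"], ...)` returns 0.5.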
Machine translation has previously been used for grammar correction in other work. Brockett (2006) used phrase-based MT for noun correction in the writing of ESL students. Désilets and Hermet (2009) translate from the native language L1 to L2 and back to L1 to correct grammar errors. Mizumoto (2012) also used phrase-based SMT for error correction, training the system on a large-scale learner corpus. These translation techniques suffered from a lack both of good-quality parallel corpora and of good translation systems.
We first experiment on the PTB dataset in the supervised learning setting. We separately train our parser and joint generative model on the PTB training set, and then evaluate our language model on the PTB test set. Table 2 lists the performance of our framework and competitor models. GRU-256 LM is our implemented language model using a 2-layer GRU with hidden size 256, which is also used in the other experiments. Parsing annotations are used by RNNG, SO-RNNG, GA-RNNG, and NVLM. These grammar-aware models achieve significantly better performance than state-of-the-art sequential RNN-based language models, showing that grammar indeed helps language modeling. NVLM substantially improves over the current state of the art, with a 10% reduction in test perplexity.
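As a reminder of the metric behind these comparisons, perplexity is the exponentiated average negative log-likelihood per token, so a drop from, say, 100 to 90 is a 10% reduction. A minimal sketch:

```python
import math

def perplexity(token_log_probs):
    """Corpus perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

A model that assigns every token probability 0.25 has perplexity 4; lower perplexity means the model spreads less probability mass over wrong continuations.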
There is now a considerable body of evidence showing that prospective teachers' cognition and education are strongly influenced by their beliefs about teaching and learning, which in turn are shaped by their personal history and learning experiences in classrooms. However, little has been said about the possible effect of such beliefs and experiences on practicing teachers. To address this gap, the present paper set out to investigate Iranian EFL teachers' beliefs about teaching grammar and to determine to what extent their current beliefs are influenced by their prior language learning experiences, their teacher education courses, and their teaching experiences. Analyses of data drawn from a questionnaire given to 40 experienced English teachers at an English language teaching institute, and from semi-structured interviews with 14 teachers from the same sample, revealed a significant contribution of prior language learning experiences to the formation of the teachers' (pre)conceptions about teaching grammar. These experiences were found to be as influential in the construction of teachers' beliefs as the teachers' own teaching experiences, and proved to be much more significant than their teacher education courses.