Probabilistic tree transducers for grammatical error correction
[Abstract] We investigate the application of weighted tree transducers to correcting grammatical errors in natural language. Weighted finite-state transducers (FSTs) have been used successfully in a wide range of natural language processing (NLP) tasks, even though the expressiveness of the linguistic transformations they perform is limited. Recently, there has been an increase in the use of weighted tree transducers and related formalisms that can express syntax-based natural language transformations in a probabilistic setting.

The NLP task that we investigate is the automatic correction of grammar errors made by English language learners. In contrast to spelling correction, which can be performed with very high accuracy, the performance of grammar correction systems is still low for most error types. Commercial grammar correction systems mostly use rule-based methods. The most common approach in recent grammatical error correction research is to use statistical classifiers that make local decisions about the occurrence of specific error types. The approach that we investigate is related to a number of other approaches inspired by statistical machine translation (SMT) or based on language modelling. Corpora of language learner writing annotated with error corrections are used as training data.

Our baseline model is a noisy-channel FST model consisting of an n-gram language model and an FST error model, which performs word insertion, deletion and replacement operations. The tree transducer model we use to perform error correction is a weighted top-down tree-to-string transducer, formulated to perform transformations between parse trees of correct sentences and incorrect sentences. Using an algorithm developed for syntax-based SMT, transducer rules are extracted from training data in which the correct versions of the sentences have been parsed. Rule weights are also estimated from the training data. Hypothesis sentences generated by the tree transducer are reranked using an n-gram language model.

We perform experiments to evaluate the performance of different configurations of the proposed models. Our implementation uses an existing tree transducer toolkit. To make decoding time feasible, sentences are split into clauses and heuristic pruning is performed during decoding. We consider different modelling choices in the construction of transducer rules. The evaluation of our models is based on precision and recall. Experiments are performed to correct various error types on two learner corpora. The results show that our system is competitive with existing approaches on several error types.
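
The precision and recall evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how edits are represented or matched, so the `Edit` tuple, the `precision_recall` function and the sample data below are hypothetical and only show the general idea of comparing proposed corrections against gold annotations.

```python
from typing import Set, Tuple

# Assumed edit representation (not taken from the thesis):
# (token position, original token, corrected token).
# An empty original token stands for an insertion.
Edit = Tuple[int, str, str]


def precision_recall(proposed: Set[Edit], gold: Set[Edit]) -> Tuple[float, float]:
    """Return (precision, recall) of proposed edits against gold edits."""
    true_positives = len(proposed & gold)
    # Precision: fraction of proposed edits that match a gold edit.
    precision = true_positives / len(proposed) if proposed else 0.0
    # Recall: fraction of gold edits that the system proposed.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up example: one of two proposed edits matches one of two gold edits.
gold = {(1, "go", "goes"), (4, "", "the")}
proposed = {(1, "go", "goes"), (6, "in", "on")}
precision, recall = precision_recall(proposed, gold)
```

Under this exact-match assumption, a system that proposes few but reliable corrections scores high precision and low recall, which is why both figures are reported per error type.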
[Publication date] [Publishing institution] Stellenbosch University