Title

Evaluating and Automating the Annotation of a Learner Corpus

Document Type

Article

Publication Date

3-2014

Journal / Book Title

Language Resources and Evaluation

Abstract

The paper describes a corpus of texts produced by non-native speakers of Czech. We discuss its annotation scheme, consisting of three interlinked tiers, designed to handle a wide range of error types present in the input. Each tier corrects different types of errors; links between the tiers allow capturing errors in word order and complex discontinuous expressions. Errors are not only corrected, but also classified. The annotation scheme is tested on a data set including approx. 175,000 words with fair inter-annotator agreement results. We also explore the possibility of applying automated linguistic annotation tools (taggers, spell checkers and grammar checkers) to the learner text to support or even substitute manual annotation.

DOI

10.1007/s10579-013-9226-3

Journal ISSN / Book ISBN

1574-020X

Published Citation

Rosen, A., Hana, J., Stindlová, B., & Feldman, A. (2014). Evaluating and automating the annotation of a learner corpus. Language Resources and Evaluation, 48(1), 65-92. doi:http://dx.doi.org/10.1007/s10579-013-9226-3
