Evaluating and Automating the Annotation of a Learner Corpus
Journal / Book Title
Language Resources and Evaluation
The paper describes a corpus of texts produced by non-native speakers of Czech. We discuss its annotation scheme, consisting of three interlinked tiers, designed to handle a wide range of error types present in the input. Each tier corrects different types of errors; links between the tiers allow capturing errors in word order and complex discontinuous expressions. Errors are not only corrected, but also classified. The annotation scheme is tested on a data set including approx. 175,000 words with fair inter-annotator agreement results. We also explore the possibility of applying automated linguistic annotation tools (taggers, spell checkers and grammar checkers) to the learner text to support or even substitute manual annotation.
MSU Digital Commons Citation
Rosen, Alexandr; Hana, Jirka; Stindlova, Barbora; and Feldman, Anna, "Evaluating and Automating the Annotation of a Learner Corpus" (2014). Department of Linguistics Faculty Scholarship and Creative Works. 7.
Rosen, A., Hana, J., Stindlová, B., & Feldman, A. (2014). Evaluating and automating the annotation of a learner corpus. Language Resources and Evaluation, 48(1), 65-92. https://doi.org/10.1007/s10579-013-9226-3