Designing and evaluating a Russian tagset
This paper reports on the principles behind the design of a tagset covering Russian morphosyntactic phenomena, on modifications to the core tagset, and on its evaluation. The tagset and the associated morphosyntactic specifications are based on the MULTEXT-East framework. The design decisions aimed to balance parameters that are important to linguists against the feasibility of detecting and disambiguating them automatically. The final tagset contains about 600 tags, and tagging with it achieves about 95% accuracy on the disambiguated portion of the Russian National Corpus. We have also produced a test set of tagging models and corpora that can be shared with other researchers.
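MULTEXT-East morphosyntactic descriptions (MSDs) are positional strings. The first character gives the part of speech, each following character encodes the value of one attribute, and `-` marks an attribute that does not apply. The Python sketch below decodes such a tag and computes tagging accuracy, assuming accuracy is measured per token. The attribute table (`NOUN_ATTRIBUTES`), the helper names `decode_msd` and `tagging_accuracy`, and the example tags are hypothetical illustrations that cover only a simplified subset of noun attributes. They are not the official MULTEXT-East Russian specification or the authors' evaluation code.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified attribute table for nouns only. The real
# MULTEXT-East Russian specifications define more categories and values.
NOUN_ATTRIBUTES: List[Tuple[str, Dict[str, str]]] = [
    ("Type", {"c": "common", "p": "proper"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Number", {"s": "singular", "p": "plural"}),
    ("Case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "i": "instrumental", "l": "locative"}),
    ("Animate", {"y": "yes", "n": "no"}),
]

# Maps the first character of an MSD to its category name and attributes.
CATEGORIES = {"N": ("Noun", NOUN_ATTRIBUTES)}


def decode_msd(tag: str) -> Dict[str, str]:
    """Decode a positional MSD string into attribute-value pairs."""
    category, attributes = CATEGORIES[tag[0]]
    features = {"Category": category}
    # zip stops at the shorter sequence, so tags with omitted
    # trailing positions decode without error.
    for (name, values), code in zip(attributes, tag[1:]):
        if code == "-":  # attribute not applicable to this word form
            continue
        features[name] = values[code]
    return features


def tagging_accuracy(gold: List[str], predicted: List[str]) -> float:
    """Fraction of tokens whose predicted tag exactly matches the gold tag."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty tag sequences of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)
```

For instance, `decode_msd("Ncmsny")` yields a common, masculine, singular, nominative, animate noun, and comparing the gold tags `["Ncmsny", "Ncfpgn", "Ncnsan"]` with the predictions `["Ncmsny", "Ncfpgn", "Ncnsnn"]` gives an accuracy of 2/3. Because MULTEXT-East tags may omit trailing `-` positions, the decoder simply stops at the end of a shorter string rather than requiring full-length tags.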
MSU Digital Commons Citation
Sharoff, Serge; Kopotev, Mikhail; Erjavec, Tomaž; Feldman, Anna; and Divjak, Dagmar, "Designing and evaluating a Russian tagset" (2008). Department of Linguistics Faculty Scholarship and Creative Works. 28.