Document Type

Conference Proceeding

Publication Date

May 2008

This paper reports the principles behind designing a tagset to cover Russian morphosyntactic phenomena, the modifications made to the core tagset, and its evaluation. The tagset and the associated morphosyntactic specifications are based on the MULTEXT-East framework, and the design decisions aimed to balance parameters that are important for linguists against the feasibility of detecting and disambiguating them automatically. The final tagset contains about 600 tags, and automatic tagging with it achieves about 95% accuracy on the disambiguated portion of the Russian National Corpus. We have also produced a test set of tagging models and corpora that can be shared with other researchers.

Published Citation

Sharoff, S., Kopotev, M., Erjavec, T., Feldman, A., & Divjak, D. (2008, May). Designing and Evaluating a Russian Tagset. In LREC (Vol. 26, pp. 279-285).

Included in

Linguistics Commons