Document Type
Preprint
Publication Date
1-1-2011
Journal / Book Title
Proceedings of the Annual Meeting of the Association for Computational Linguistics
Abstract
The paper describes a tagger for Old Czech (1200-1500 AD), a fusional language with rich morphology. The practical restrictions (no native speakers, limited corpora and lexicons, limited funding) make Old Czech an ideal candidate for a resource-light cross-lingual method that we have been developing (e.g., Hana et al., 2004; Feldman and Hana, 2010). We use a traditional supervised tagger. However, instead of spending years of effort creating a large annotated corpus of Old Czech, we approximate it with a corpus of Modern Czech. We perform a series of simple transformations to make a modern text look more like a text in Old Czech and vice versa. We also use a resource-light morphological analyzer to provide candidate tags. The results are worse than those of traditional taggers, but the amount of language-specific work needed is minimal.
Montclair State University Digital Commons Citation
Hana, Jirka; Feldman, Anna; and Aharodnik, Katsiaryna, "A Low-budget Tagger for Old Czech" (2011). Department of Computer Science Faculty Scholarship and Creative Works. 649.
https://digitalcommons.montclair.edu/compusci-facpubs/649