Department of Computer Science Faculty Scholarship and Creative Works

A Low-budget Tagger for Old Czech

Document Type

Preprint

Publication Date

1-1-2011

Journal / Book Title

Proceedings of the Annual Meeting of the Association for Computational Linguistics

Abstract

The paper describes a tagger for Old Czech (1200-1500 AD), a fusional language with rich morphology. The practical restrictions (no native speakers, limited corpora and lexicons, limited funding) make Old Czech an ideal candidate for a resource-light crosslingual method that we have been developing (e.g. Hana et al., 2004; Feldman and Hana, 2010). We use a traditional supervised tagger. However, instead of spending years of effort to create a large annotated corpus of Old Czech, we approximate it by a corpus of Modern Czech. We perform a series of simple transformations to make a modern text look more like a text in Old Czech and vice versa. We also use a resource-light morphological analyzer to provide candidate tags. The results are worse than the results of traditional taggers, but the amount of language-specific work needed is minimal.

Montclair State University Digital Commons Citation

Hana, Jirka; Feldman, Anna; and Aharodnik, Katsiaryna, "A Low-budget Tagger for Old Czech" (2011). Department of Computer Science Faculty Scholarship and Creative Works. 649.
https://digitalcommons.montclair.edu/compusci-facpubs/649

Included in

Computer Sciences Commons
