Experiments in Cross-Language Morphological Annotation Transfer
Document Type
Conference Proceeding
Publication Date
7-7-2006
Journal / Book Title
Computational Linguistics and Intelligent Text Processing
Abstract
Annotated corpora are valuable resources for NLP, but they are often costly to create. We introduce a method for transferring annotation from a morphologically annotated corpus of a source language to a target language. Our approach assumes only that an unannotated text corpus exists for the target language and that a simple textbook describing the basic morphological properties of that language is available. Our paper describes experiments with Polish, Czech, and Russian; however, the method is not tied in any way to these languages. In all the experiments we use the TnT tagger ([3]), a second-order Markov model. Our approach assumes that information acquired about one language can be used for processing a related language. We have found that even breathtakingly naive approximations (such as approximating the Russian transitions by Czech and/or Polish ones, and approximating the Russian emissions by manually or automatically derived Czech cognates) can lead to a significant improvement in the tagger's performance.
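The transfer idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and not TnT itself: it assumes a toy second-order (trigram) tag model whose transition table is taken unchanged from the source language, and whose emission table is obtained by mapping source-language words to target-language words through a cognate lexicon. All names (`transfer_emissions`, `sequence_score`, `best_tags`), the two-tag tagset, and every probability and lexicon entry are hypothetical, chosen only for illustration. Smoothing, unknown-word handling, and efficient Viterbi decoding are deliberately omitted.

```python
import itertools
from collections import defaultdict

START = "<s>"  # hypothetical padding symbol for the first two tag positions


def transfer_emissions(source_emissions, cognates):
    """Approximate target emissions P(word | tag) from source emissions.

    source_emissions: {tag: {source_word: prob}}
    cognates:         {target_word: source_word}  (hypothetical lexicon)
    Returns {tag: {target_word: prob}}, renormalized per tag.
    """
    target = {}
    for tag, word_probs in source_emissions.items():
        mapped = defaultdict(float)
        for tgt_word, src_word in cognates.items():
            if src_word in word_probs:
                mapped[tgt_word] += word_probs[src_word]
        total = sum(mapped.values())
        if total > 0:
            target[tag] = {w: p / total for w, p in mapped.items()}
    return target


def sequence_score(words, tags, transitions, emissions):
    """Joint score of a tag sequence under a trigram tag model."""
    score = 1.0
    prev2, prev1 = START, START
    for word, tag in zip(words, tags):
        score *= transitions.get((prev2, prev1, tag), 0.0)
        score *= emissions.get(tag, {}).get(word, 0.0)
        prev2, prev1 = prev1, tag
    return score


def best_tags(words, tagset, transitions, emissions):
    """Brute-force decoder; adequate only for toy inputs."""
    candidates = itertools.product(tagset, repeat=len(words))
    return max(
        candidates,
        key=lambda tags: sequence_score(words, tags, transitions, emissions),
    )


# Toy "source language" (Czech-like) model; all numbers are made up.
source_transitions = {
    (START, START, "N"): 0.7, (START, START, "V"): 0.3,
    (START, "N", "V"): 0.8, (START, "N", "N"): 0.2,
    (START, "V", "N"): 0.6, (START, "V", "V"): 0.4,
}
source_emissions = {"N": {"voda": 0.6, "žena": 0.4}, "V": {"nese": 1.0}}

# Toy cognate lexicon: target (Russian-like) word -> source word.
cognates = {"вода": "voda", "несёт": "nese"}

# Transitions are reused as-is; emissions are mapped through cognates.
target_emissions = transfer_emissions(source_emissions, cognates)
tags = best_tags(["вода", "несёт"], ["N", "V"], source_transitions, target_emissions)
print(tags)
```

In this sketch the target language never contributes annotated data: the only target-side resource is the cognate lexicon, which mirrors the abstract's point that transitions and emissions can both be approximated from a related language.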
DOI
10.1007/11671299_4
Journal ISSN / Book ISBN
Online ISBN 978-3-540-32206-1
MSU Digital Commons Citation
Feldman, Anna; Hana, Jirka; and Brew, Chris, "Experiments in Cross-Language Morphological Annotation Transfer" (2006). Department of Linguistics Faculty Scholarship and Creative Works. 48.
https://digitalcommons.montclair.edu/linguistics-facpubs/48
Published Citation
Feldman, A., Hana, J., Brew, C. (2006). Experiments in Cross-Language Morphological Annotation Transfer. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg.