Date of Award
5-2022
Document Type
Master's Project
Degree Name
Master of Science (MS)
College/School
College of Science and Mathematics
Department/Program
Computer Science
Thesis Sponsor/Dissertation Chair/Project Chair
Eileen Fitzpatrick
Abstract
Interest in Arabic Natural Language Processing (NLP) is growing across a range of topics in the field. One topic that has drawn attention is the automatic processing of Arabic figurative language. Previous projects have focused on detecting and interpreting metaphors in social media comments and in phrases and headlines from news articles. The current project focuses on metaphor detection in poems written in the Misurata Arabic sub-dialect, spoken in Misurata in North Africa. The dataset is first annotated by a group of linguists, and their annotation is treated as the seed data for the project. The verses in the dataset are also annotated by lay native speakers of the sub-dialect who are not acquainted with the rhetorical principles of this kind of poetry. The model applied in the project is built on the Long Short-Term Memory (LSTM) architecture. The aim is to compare the performance of the model with that of human annotators who are not experts in the Arabic figurative language used in poetry. The results show that the model outperforms the human annotators, achieving a higher score of 79%. In addition, the model reaches an accuracy of 80.7% in predicting metaphors in unseen blind data. Because Arabic sub-dialects are acquired as native languages, it is important to develop NLP models that can be trained on these informal varieties of Arabic to support tasks such as auto-correction, machine translation, dialogue systems, and sentiment analysis.
Recommended Citation
Abugharsa, Azza, "Metaphor Detection in Poems in Misurata Arabic Sub-Dialect : An LSTM Model" (2022). Theses, Dissertations and Culminating Projects. 1080.
https://digitalcommons.montclair.edu/etd/1080
Comments
Submitted in partial fulfillment of the requirements for the degree of Master of Science in Computational Linguistics.