Date of Award

5-2022

Document Type

Master's Project

Degree Name

Master of Science (MS)

College/School

College of Science and Mathematics

Department/Program

Computer Science

Thesis Sponsor/Dissertation Chair/Project Chair

Eileen Fitzpatrick

Abstract

Natural Language Processing (NLP) for Arabic is attracting growing research interest across a range of topics. One topic that has drawn attention is the automatic processing of Arabic figurative language. Previous projects have focused on detecting and interpreting metaphors in social media comments and in phrases or headlines from news articles. The current project focuses on metaphor detection in poems written in the Misurata sub-dialect of Arabic, spoken in Misurata, in North Africa. The dataset was first annotated by a group of linguists, and their annotation is treated as the seed data for the project. The verses were also annotated by lay native speakers of the sub-dialect who are not acquainted with the rhetorical principles of this kind of poetry. The model applied in the project is built on the Long Short-Term Memory (LSTM) architecture. The aim is to compare the model's performance with that of human annotators who are not experts in the figurative language of Arabic poetry. The results show that the model outperforms the human annotators, achieving a score of 79%. In addition, the model reaches 80.7% accuracy in predicting metaphors in unseen blind data. Since Arabic sub-dialects are acquired as native languages, it is important to develop NLP models that can be trained on these informal varieties of Arabic to support tasks such as auto-correction, machine translation, dialogue systems, and sentiment analysis, among others.

Comments

Submitted in partial fulfillment of the requirements for the degree of Master of Science in the Computational Linguistics program.

File Format

PDF
