PET Scan: Uncovering cross-lingual abilities of LLMs to detect potentially euphemistic terms

Presentation Type

Poster

Faculty Advisor

Anna Feldman

Event

2025 Student Research Symposium

Start Date

25-4-2025 10:30 AM

End Date

25-4-2025 11:29 AM

Description

Euphemisms pose challenges for Large Language Models (LLMs) because their meanings are ambiguous and context-dependent. This study examines LLMs’ ability to classify Potentially Euphemistic Terms (PETs) cross-linguistically in five languages: English, Spanish, Chinese, Turkish, and Yorùbá. We compare sequential fine-tuning, in which a model learns euphemisms in one language before another, with simultaneous fine-tuning, in which multiple languages are learned at once. Results show that sequential fine-tuning generally improves PET detection in the second language, suggesting that prior exposure to euphemisms aids cross-lingual transfer. Performance varies across language pairs, however, with transfer success depending on pretraining coverage, dataset complexity, and linguistic similarity. This work extends euphemism detection to low-resource languages and provides insight into how LLMs transfer figurative-language knowledge across languages.
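
To make the two regimes concrete, below is a minimal sketch (not the authors' code) of sequential versus simultaneous fine-tuning of a multilingual encoder on binary PET classification, assuming Hugging Face Transformers and Datasets; the model choice, hyperparameters, and per-language data files with "text" and "label" columns are hypothetical placeholders.

# Minimal sketch; assumes Hugging Face Transformers/Datasets, placeholder data files.
from datasets import Dataset, concatenate_datasets
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "xlm-roberta-base"  # placeholder multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    # Each example is a sentence containing a PET, labeled 1 (euphemistic) or 0 (literal).
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

def finetune(model, dataset, output_dir):
    # One fine-tuning pass over a single (possibly mixed-language) dataset.
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=3,
                             per_device_train_batch_size=16, logging_steps=50)
    Trainer(model=model, args=args, train_dataset=dataset).train()
    return model

# Hypothetical per-language PET datasets with "text" and "label" columns.
english = Dataset.from_json("pets_en.json").map(tokenize, batched=True)
spanish = Dataset.from_json("pets_es.json").map(tokenize, batched=True)

# Sequential fine-tuning: the same weights learn euphemisms in one language, then the other.
seq_model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
seq_model = finetune(seq_model, english, "out/seq_en")
seq_model = finetune(seq_model, spanish, "out/seq_en_then_es")

# Simultaneous fine-tuning: both languages are mixed and learned in a single run.
sim_model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
mixed = concatenate_datasets([english, spanish]).shuffle(seed=42)
sim_model = finetune(sim_model, mixed, "out/simultaneous")

The only difference between the two settings is whether the second language is introduced after or alongside the first; evaluating each language's held-out PET data would then expose the cross-lingual transfer effects described above.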

Comments

Poster presentation at the 2025 Student Research Symposium.
