PET Scan: Uncovering cross-lingual abilities of LLMs to detect potentially euphemistic terms
Presentation Type
Abstract
Faculty Advisor
Anna Feldman
Access Type
Event
Start Date
25-4-2025 10:30 AM
End Date
25-4-2025 11:29 AM
Description
Euphemisms pose challenges for Large Language Models (LLMs) due to their ambiguous and context-dependent meanings. This study examines LLMs’ ability to classify Potentially Euphemistic Terms (PETs) cross-linguistically in five languages: English, Spanish, Chinese, Turkish, and Yorùbá. We compare sequential fine-tuning, where a model learns euphemisms in one language before another, to simultaneous fine-tuning, where multiple languages are learned at once. Results show that sequential fine-tuning generally improves PET detection in the second language, suggesting that prior exposure to euphemisms aids cross-lingual transfer. However, performance varies across language pairs due to linguistic differences and dataset complexity. Our findings indicate that transfer success depends on pretraining coverage, dataset complexity, and linguistic similarity. This work extends euphemism detection to low-resource languages and provides insights into how LLMs transfer figurative language knowledge across languages.
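To make the distinction between the two regimes concrete, the sketch below shows one plausible way to set them up: sequential fine-tuning continues training the same classifier on a second language after the first, while simultaneous fine-tuning trains once on mixed-language data. This is a minimal illustration, not the authors' actual pipeline; the multilingual encoder choice, dataset file names, and hyperparameters are hypothetical placeholders, assuming binary PET/non-PET labels and the Hugging Face Transformers Trainer API.

```python
# Illustrative sketch of sequential vs. simultaneous fine-tuning for PET
# classification. Model choice, file names, and hyperparameters are
# hypothetical; datasets are assumed to have "text" and "label" columns.
from datasets import load_dataset, concatenate_datasets
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "xlm-roberta-base"  # any multilingual encoder would do here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

def finetune(model, dataset, output_dir):
    """Fine-tune the classifier on one tokenized PET dataset."""
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=3,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, train_dataset=dataset).train()
    return model

# Hypothetical per-language PET datasets (e.g., English and Spanish).
english = load_dataset("csv", data_files="pets_en.csv")["train"].map(tokenize, batched=True)
spanish = load_dataset("csv", data_files="pets_es.csv")["train"].map(tokenize, batched=True)

# Sequential fine-tuning: learn euphemisms in one language, then the other.
model_seq = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model_seq = finetune(model_seq, english, "out/seq_en")
model_seq = finetune(model_seq, spanish, "out/seq_en_then_es")

# Simultaneous fine-tuning: learn both languages at once from shuffled, mixed data.
model_sim = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
mixed = concatenate_datasets([english, spanish]).shuffle(seed=0)
model_sim = finetune(model_sim, mixed, "out/simultaneous")
```

Under this setup, cross-lingual transfer in the sequential regime would be measured by evaluating the second-language test set after both stages and comparing against a model fine-tuned on that language alone.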
Comments
Poster presentation at the 2025 Student Research Symposium.