Decoding psychology through text: Insights into text-based symptom classification

Presentation Type

Abstract

Faculty Advisor

Katsiaryna Aharodnik

Access Type

Event

Start Date

25-4-2025 12:00 PM

End Date

25-4-2025 1:00 PM

Description

The diagnosis of mental health disorders is challenging due to the absence of an all-encompassing diagnostic framework, symptom overlap, the high prevalence of comorbidity, and the subjective nature of diagnoses (APA, 2013; Cameron, 2007). These challenges can lead to suboptimal treatment outcomes and a lack of trust in healthcare systems. This study leveraged a dataset of anonymous user statements labeled with mental health conditions (available on Kaggle; Sarkar, 2025). We constructed a generalized mental health machine learning classifier to identify the best-performing diagnostic metrics for seven mental health categories. As our classifier, we used BERT (Devlin et al., 2019; Wolf et al., 2020), an encoder-only pre-trained large language model (LLM), because BERT-family models are well suited to sentiment analysis, where subtle linguistic and contextual cues are critical. We first analyzed part-of-speech usage and word n-grams, then fine-tuned the models and evaluated their performance. Our highest-performing model, restricted to diagnosable mental health conditions, achieved an F1 score of 87.7%. This restriction excluded the ‘Stress’ and ‘Suicidal’ classes, which earlier models struggled to distinguish from diagnosable disorders. These results highlight the challenges of using ambiguous self-reflections about mental health to classify conditions that are deeply psychological, behavioral, and context-dependent. Our work offers insights into the nuanced ways people describe their symptoms, as well as the potential limitations of using LLMs for mental health diagnosis. These findings illustrate the complexity of building reliable models in this domain while contributing to a better understanding of mental health dynamics and to improved access to resources.
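The word n-gram analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes statements are available as hypothetical `(text, label)` pairs, uses a simple regular-expression tokenizer, and reports the most frequent bigrams per class using only the Python standard library. Part-of-speech analysis would additionally require a tagger (for example, NLTK or spaCy) and is omitted here.

```python
import re
from collections import Counter, defaultdict


def word_ngrams(text, n):
    """Lowercase the text, split it into word tokens, and return its n-grams as tuples."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams_by_class(rows, n=2, k=5):
    """Count n-grams per label over (text, label) rows; return the k most frequent per label."""
    counts = defaultdict(Counter)
    for text, label in rows:
        counts[label].update(word_ngrams(text, n))
    return {label: counter.most_common(k) for label, counter in counts.items()}


# Invented toy statements for illustration only; not taken from the Kaggle dataset.
rows = [
    ("I can't stop worrying about everything", "Anxiety"),
    ("I keep worrying about tomorrow", "Anxiety"),
    ("I feel empty and I can't get out of bed", "Depression"),
    ("I feel empty every single day", "Depression"),
]

top = top_ngrams_by_class(rows, n=2, k=2)
for label, ngrams in top.items():
    print(label, ngrams)
```

Comparing the per-class lists is one simple way to surface class-specific phrasing, such as "worrying about" for the toy Anxiety statements versus "feel empty" for the toy Depression statements.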
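The abstract does not state which F1 variant was reported. Assuming macro-averaged F1 (the unweighted mean of per-class F1 scores, which scikit-learn computes with `f1_score(..., average="macro")`), the sketch below shows how such a score is calculated and how the ‘Stress’ and ‘Suicidal’ classes could be dropped before training and evaluation. The helper names `drop_classes`, `per_class_f1`, and `macro_f1` are hypothetical, and the labels are toy data rather than results from the study.

```python
def drop_classes(rows, excluded):
    """Remove (text, label) pairs whose label is in `excluded` (hypothetical helper)."""
    return [(text, label) for text, label in rows if label not in excluded]


def per_class_f1(y_true, y_pred, labels):
    """Compute F1 = 2PR / (P + R) for each label, treating 0/0 as 0."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(scores)


# Toy rows: the two classes the abstract reports excluding are filtered out.
rows = [("statement 1", "Anxiety"), ("statement 2", "Stress"),
        ("statement 3", "Suicidal"), ("statement 4", "Depression")]
kept = drop_classes(rows, {"Stress", "Suicidal"})

# Toy gold labels and predictions for the remaining classes.
y_true = ["Anxiety", "Anxiety", "Depression", "Depression"]
y_pred = ["Anxiety", "Depression", "Depression", "Depression"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
```

Here Anxiety scores F1 = 2/3 (precision 1, recall 0.5) and Depression scores 0.8 (precision 2/3, recall 1), so the macro average is about 0.733. Because macro F1 weights every class equally, a class that is hard to separate from its neighbors pulls the average down directly, which is one reason removing such classes can raise the headline score.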

Comments

Poster presentation at the 2025 Student Research Symposium.

