Date of Award
5-2025
Document Type
Thesis
Degree Name
Master of Arts (MA)
College/School
College of Humanities and Social Sciences
Department/Program
Psychology
Thesis Sponsor/Dissertation Chair/Project Chair
Michael Bixter
Committee Member
Keven Askew
Committee Member
Manuel Gonzalez
Abstract
This thesis examines whether traditional regression models and more complex machine learning models can predict rare, infrequent events in the social and psychological sciences. It also compares the performance of regression models with that of more complex models to determine whether the latter, which are harder to interpret and configure, are necessary at all. The question was explored through two studies. The first was a study of workplace misconduct in which 363 participants in the United States were surveyed about their workplace experiences and behaviors, including acts of misconduct they had personally committed (a frequency of approximately 4%). The second used found data from a major news outlet's database of civilian fatalities in police use-of-force incidents involving firearms from 2015 to 2023, paired with publicly available survey data, collected by the federal government, on the organizational practices of local and state police agencies. In this second study, models were built to identify agencies at high risk of shooting unarmed civilians on the basis of their organizational practices and attributes (an occurrence of approximately 1.5%). In both studies, several models, including logistic regression, random forest, XGBoost, and TabNet, were run in different configurations on the binary prediction problem (workplace misconduct in Study 1; high-risk police agencies in Study 2) in order to identify, and then compare, the models that detected these rare events with sufficient accuracy. Both studies ultimately revealed that less sophisticated models tended to outperform more complex ones.
However, no single model performed well in both training and final validation. This raises the question of whether a model can be relied upon when it performs consistently during training or well on unseen data, but not both. The study highlights the inherent difficulty of predicting rare events in the social sciences, where few rare phenomena carry a strong, distinctive signal (in terms of correlational strength) that is shared by every instance of the event. The dynamic nature of these events, together with the challenges of applying machine learning to extremely imbalanced data, makes complete success in this area difficult to achieve.
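As a rough illustration of why extreme imbalance complicates evaluation (a sketch, not a method taken from the thesis), a trivial classifier that labels every case as "no event" already scores about 96% accuracy at Study 1's prevalence of roughly 4% (about 15 of 363 respondents) while detecting no misconduct at all. The helper `majority_baseline` below is hypothetical, and the 1,000-agency sample size used for Study 2 is an assumption, not a figure reported in the abstract.

```python
def majority_baseline(n_total: int, prevalence: float) -> dict:
    """Score a hypothetical classifier that predicts 'no event' for every case."""
    n_pos = round(n_total * prevalence)  # approximate count of rare events
    n_neg = n_total - n_pos
    # Every negative case is classified correctly; every positive case is missed.
    accuracy = n_neg / n_total
    # Recall = true positives / actual positives; no positives are ever flagged.
    recall = 0 / n_pos if n_pos else 0.0
    return {"positives": n_pos, "accuracy": accuracy, "recall": recall}


if __name__ == "__main__":
    # Study 1: 363 respondents, ~4% reported personally committing misconduct.
    print(majority_baseline(363, 0.04))
    # Study 2: assumed (hypothetical) 1,000 agencies, ~1.5% high-risk.
    print(majority_baseline(1000, 0.015))
```

Accuracy alone therefore says little in this setting; recall on the rare class exposes the failure immediately.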
Recommended Citation
Lieberman, Brian, "Comparing Regression Methods to Interpretable Machine Learning in the Prediction of Infrequent Events in the Social and Psychological Sciences" (2025). Theses, Dissertations and Culminating Projects. 1525.
https://digitalcommons.montclair.edu/etd/1525