Date of Award
Master of Science (MS)
College of Science and Mathematics
Thesis Sponsor/Dissertation Chair/Project Chair
In this study, we analyze data from a supply retailing company to model how customers' past purchase behavior predicts their future online purchase behavior. The data, from 2016, were divided into two time periods: P1-P6 (January 31st to July 30th) and P7 (July 31st to August 27th). Using each customer's purchase information from the P1-P6 period, namely money spent, number of cart additions, transaction type, number of unique purchase dates, number of unique SKUs purchased, number of page views, number of browse dates, company size, and number of products purchased, we aim to determine whether this information can predict the customer's behavior in the P7 period, measured as the number of times the customer responded to emails sent to them during P7. Because the response variable is a count, we model the data in R using Poisson regression with an offset variable. We also model the number of responses out of the number of emails sent using a logistic regression model. Because the response shows zero inflation or over-dispersion, which the standard Poisson model does not accommodate, we also fit a hurdle model, a zero-inflated Poisson (ZIP) model, and a zero-inflated negative binomial (ZINB) model. The Poisson model with an offset, the logistic regression model, the hurdle model, the ZIP model, and the ZINB model are then compared using the AIC and cross-validation to select the model that best fits the data.
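The comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis's R analysis: it uses plain Python, an intercept-only Poisson model with the number of emails as the offset, an intercept-only ZIP model fitted with a simple EM loop, and a small made-up data set (`responses`, `emails`) chosen only to show how excess zeros affect the AIC. All names and values here are hypothetical.

```python
import math

# Hypothetical data: responses per customer in P7 and emails sent (the exposure).
responses = [0] * 14 + [3, 4, 2, 5, 3, 4]
emails = [10] * 20


def poisson_logpmf(y, mu):
    """Log of the Poisson probability mass at y for mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def zip_logpmf(y, mu, pi):
    """Log pmf of a zero-inflated Poisson: extra zeros occur with probability pi."""
    if y == 0:
        return math.log(pi + (1 - pi) * math.exp(-mu))
    return math.log(1 - pi) + poisson_logpmf(y, mu)


def fit_poisson_rate(y, n):
    """MLE of the per-email rate for an intercept-only Poisson model
    with offset log(n): total responses divided by total emails."""
    return sum(y) / sum(n)


def fit_zip(y, n, iters=200):
    """EM fit of an intercept-only ZIP model with offset log(n)."""
    rate, pi = fit_poisson_rate(y, n), 0.5
    for _ in range(iters):
        # E-step: probability that each observed zero is a structural zero.
        z = [pi / (pi + (1 - pi) * math.exp(-rate * ni)) if yi == 0 else 0.0
             for yi, ni in zip(y, n)]
        # M-step: update the mixing weight and the Poisson rate.
        pi = sum(z) / len(z)
        rate = sum(y) / sum((1 - zi) * ni for zi, ni in zip(z, n))
    return rate, pi


def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * loglik


poisson_rate = fit_poisson_rate(responses, emails)
poisson_ll = sum(poisson_logpmf(y, poisson_rate * n)
                 for y, n in zip(responses, emails))
poisson_aic = aic(poisson_ll, k=1)

zip_rate, zip_pi = fit_zip(responses, emails)
zip_ll = sum(zip_logpmf(y, zip_rate * n, zip_pi)
             for y, n in zip(responses, emails))
zip_aic = aic(zip_ll, k=2)

print(f"Poisson: rate={poisson_rate:.3f}  AIC={poisson_aic:.2f}")
print(f"ZIP:     rate={zip_rate:.3f}  pi={zip_pi:.3f}  AIC={zip_aic:.2f}")
```

On zero-heavy data like this, the ZIP model should obtain the lower AIC, because the plain Poisson model has to lower its rate to account for the zeros and then fits the non-zero counts poorly. A full analysis would add the P1-P6 covariates and would typically rely on an established implementation rather than a hand-written EM loop.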
Zhang, Chengxin, "Statistical Modeling of Count Data with Over-Dispersion or Zero-Inflation Problems" (2019). Theses, Dissertations and Culminating Projects. 275.