Date of Award
5-2019
Document Type
Thesis
Degree Name
Master of Science (MS)
College/School
College of Science and Mathematics
Department/Program
Mathematical Sciences
Thesis Sponsor/Dissertation Chair/Project Chair
Haiyan Su
Committee Member
Andrew McDougall
Committee Member
Andrada Ivanescu
Abstract
In this study, we analyze data from a supply retailing company to model the relationship between customers' past purchase behavior and their future online purchase behavior. The 2016 data were divided into two time periods: P1-P6 (January 31st to July 30th) and P7 (July 31st to August 27th). Using each customer's purchase information from the P1-P6 period, such as money spent, number of cart additions, transaction type, number of unique purchase dates, number of unique purchase SKUs, number of page views, number of browse dates, company size, and number of products purchased, we aim to determine whether this information can predict the customer's behavior in the P7 period, measured as the number of times the customer responded to emails sent to them during P7. Because the response variable is count data, we model it in R using Poisson regression with an offset variable. We also model the number of responses out of the number of emails sent using a logistic regression model. Because the response exhibits zero inflation or over-dispersion, a hurdle model, a zero-inflated Poisson (ZIP) model, and a zero-inflated negative binomial (ZINB) model are used to handle these issues. The Poisson model with an offset, the logistic regression model, the hurdle model, the ZIP model, and the ZINB model are then compared using the AIC criterion and the cross-validation criterion to select the model that best fits the data.
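As a rough illustration of the kinds of models named in the abstract, the sketch below writes out the log-likelihood of a Poisson model with an offset, the log-likelihood of a zero-inflated Poisson (ZIP) model, and the AIC used to compare them. It is written in Python with only the standard library, whereas the thesis used R. All function names, the toy counts, and the parameter values (`eta`, `pi`) are hypothetical and are not taken from the thesis data or its fitted models.

```python
import math

def poisson_logpmf(y, mu):
    """Log of the Poisson probability mass at count y with mean mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def poisson_offset_mean(emails_sent, linear_predictor):
    """Mean under a log link with offset log(emails_sent):
    log(mu) = log(emails_sent) + linear_predictor."""
    return emails_sent * math.exp(linear_predictor)

def zip_logpmf(y, mu, pi):
    """Log pmf of a zero-inflated Poisson: with probability pi the count is a
    structural zero, otherwise it is drawn from Poisson(mu). Requires 0 <= pi < 1."""
    if y == 0:
        return math.log(pi + (1.0 - pi) * math.exp(-mu))
    return math.log(1.0 - pi) + poisson_logpmf(y, mu)

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood

# Toy data, invented for illustration only (not from the thesis).
responses = [0, 0, 0, 2, 0, 1, 0, 3]   # responses to emails per customer
emails = [4, 4, 2, 5, 3, 4, 2, 5]      # emails sent per customer (the exposure)

eta = -1.5   # hypothetical linear predictor (intercept-only model)
pi = 0.4     # hypothetical probability of a structural zero

means = [poisson_offset_mean(n, eta) for n in emails]
ll_poisson = sum(poisson_logpmf(y, mu) for y, mu in zip(responses, means))
ll_zip = sum(zip_logpmf(y, mu, pi) for y, mu in zip(responses, means))

# One parameter for the Poisson model (eta), two for ZIP (eta and pi).
print("Poisson AIC:", aic(ll_poisson, 1))
print("ZIP AIC:    ", aic(ll_zip, 2))
```

In practice the parameters would be estimated by maximum likelihood rather than fixed by hand, and the linear predictor would depend on the customer covariates; the sketch only shows how the offset enters the mean and how zero inflation changes the probability of a zero count.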
Recommended Citation
Zhang, Chengxin, "Statistical Modeling of Count Data with Over-Dispersion or Zero-Inflation Problems" (2019). Theses, Dissertations and Culminating Projects. 275.
https://digitalcommons.montclair.edu/etd/275