Date of Award


Document Type


Degree Name

Master of Science (MS)


College of Science and Mathematics


Computer Science

Thesis Sponsor/Dissertation Chair/Project Chair

Vaibhav Anu

Committee Member

Aparna Varde

Committee Member

Jiacheng Shang


This thesis proposes and evaluates machine learning (ML) models that identify and isolate software requirements in datasets of user app review statements. The models classify each review statement as a Functional Requirement (FR), a Non-Functional Requirement (NFR), or a Non-Requirement (NR). The approach begins with the creation of a novel hybrid dataset that combines software requirements drawn from Software Requirements Specification (SRS) documents with user app reviews. Three ML algorithms, Support Vector Machine (SVM), Stochastic Gradient Descent (SGD), and Random Forest (RF), were each combined with the term frequency-inverse document frequency (TF-IDF) natural language processing (NLP) technique and applied to the hybrid dataset. Each model was evaluated on accuracy, precision, recall, and F1 score, and was validated using 10-fold cross-validation. The results show that the proposed approach can identify and isolate software requirements, with SGD performing best at an accuracy of 83%. Overall, this thesis presents a comprehensive methodology for combining ML algorithms with NLP techniques to identify requirements from user app reviews with up to 83% accuracy.
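The building blocks named in the abstract (TF-IDF weighting, per-class precision, recall, and F1, and 10-fold splitting) can be illustrated with a minimal standard-library sketch. This is not the thesis's implementation: the function names, the whitespace tokenizer, the unsmoothed `ln(N / df)` form of IDF, and the unshuffled contiguous folds are all assumptions made for illustration, and the sample labels are invented toy data rather than results from the hybrid dataset.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed textbook variant: tf = count / doc length, idf = ln(N / df).
    Library implementations usually add smoothing and normalization,
    so their values will differ from these.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Toy labels (hypothetical, not thesis data) for the three classes.
y_true = ["FR", "FR", "NFR", "NR"]
y_pred = ["FR", "NFR", "NFR", "NR"]
scores = per_class_metrics(y_true, y_pred, ["FR", "NFR", "NR"])
```

In a cross-validated evaluation, each `(train, test)` pair from `k_fold_indices` would be used to fit a classifier on the training rows and score it on the held-out rows, with the metrics averaged over the 10 folds.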
