Using Machine Learning to Classify Discipline and Text Type in University Writing

Presentation Type

Poster

Faculty Advisor

Larissa Goulart da Silva

Access Type

Event

Start Date

26-4-2024 2:15 PM

End Date

26-4-2024 3:15 PM

Description

This project explores the extent to which a machine learning model can accurately identify the discipline and communicative purpose of undergraduate student writing. To this end, we built a logistic regression model whose input features come from linguistic annotation of grammatical features. Our data consist of a corpus of 180 undergraduate written assignments divided into four disciplinary groups (arts and humanities, social sciences, life sciences, and physical sciences) and three communicative purposes (to argue, to explain, and to give a procedural recount). Each text was tagged for grammatical features with the Biber Tagger. Performance was evaluated using classification reports, which provide metrics such as precision, recall, and F1 score for each class. Preliminary results show that precision and recall are higher for physical sciences than for the other disciplines in the corpus. In addition, both argumentative and procedural recount assignments appear to be classified with a certain degree of accuracy. Explanation texts, on the other hand, yield suboptimal results. These results suggest that there is more linguistic variation within explanation assignments than within the other assignment types. By the same reasoning, writing in the physical sciences seems to be more stable in its linguistic features than writing in the other disciplines investigated. The classification reports enable assessment of the model's effectiveness in predicting the discipline and communicative purpose of university student writing based on linguistic features.
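The per-class metrics mentioned in the description can be illustrated with a minimal sketch. It is not the project's code: the function name `per_class_report` and the six toy labels are hypothetical, invented only to show how precision, recall, and F1 are derived for each class from true and predicted labels.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gets a false positive
            fn[truth] += 1      # true class gets a false negative
    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = (precision, recall, f1)
    return report


# Hypothetical labels for illustration only; these are NOT the study's data.
y_true = ["argue", "argue", "explain", "explain", "recount", "recount"]
y_pred = ["argue", "argue", "argue", "explain", "recount", "explain"]

report = per_class_report(y_true, y_pred)
for label, (p, r, f1) in report.items():
    print(f"{label:8s} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy example the "explain" class scores lowest on F1 because it is confused with both other classes, which is the kind of pattern a per-class report makes visible.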

This document is currently not available here.
