Master of Science (MS)
College of Science and Mathematics
RSS feeds, Electronic newspapers--Language, Context (Linguistics)
The paper presents a statistical analysis that explores methods for measuring controversy in online news articles collected from 23 RSS feeds. Several baseline datasets are used to re-evaluate previous work and to determine how well unigrams predict whether a document is controversial. This is done by comparing controversy with sentiment, exploring sentiment variance, and considering entropy and standard deviation as potential features. The paper tests whether negative sentiment contains more controversial words than positive sentiment, and whether positive sentiment contains more non-controversial words than negative sentiment. Unlike previous studies, we find that words alone are not useful for detecting controversy because they do not provide enough context. Further analysis therefore points to a more fruitful approach: using features such as standard deviation and entropy to detect controversy. Results demonstrate that entropy and standard deviation discriminate controversial documents better than positive and negative sentiment do. Since words alone are not enough, we run a crowdsourcing experiment on article titles, which provide more context than individual words. Although titles prove more useful than words, we go one step further and use article summaries, which provide even more context. These features, together with the added context from titles and summaries, give a cleaner separation of the data for classifying controversial documents and may offer useful insight for the design of future classification models.
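The abstract does not say exactly which distribution entropy and standard deviation are computed over. The sketch below assumes one plausible setup: each document receives several numeric sentiment ratings, for example crowd workers labeling a title or summary as -1 (negative), 0 (neutral), or 1 (positive). Greater disagreement among raters, which shows up as higher entropy and higher standard deviation, is read as a sign of controversy. The function names `rating_entropy`, `rating_std`, and `disagreement_features`, the rating scale, and the example documents are all hypothetical illustrations, not the thesis's actual method or data.

```python
import math
import statistics
from collections import Counter


def rating_entropy(ratings):
    """Shannon entropy (in bits) of the rating labels for one document.

    Hypothetical feature: 0.0 when all raters agree, and higher when the
    ratings are spread across several labels.
    """
    counts = Counter(ratings)
    total = len(ratings)
    # sum of p * log2(1/p) over the observed labels
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def rating_std(ratings):
    """Population standard deviation of the numeric ratings for one document."""
    return statistics.pstdev(ratings)


def disagreement_features(ratings):
    """Bundle both hypothetical disagreement features for one document."""
    return {"entropy": rating_entropy(ratings), "std": rating_std(ratings)}


if __name__ == "__main__":
    # Hypothetical ratings on an assumed -1/0/1 sentiment scale.
    docs = {
        "polarized": [1, -1, 1, -1],  # raters split: candidate controversial
        "uniform": [1, 1, 1, 1],      # raters agree: candidate non-controversial
    }
    for name, ratings in docs.items():
        print(name, disagreement_features(ratings))
```

Under these assumptions the polarized document scores 1.0 on both features and the uniform one scores 0.0. A classifier could then use such features to separate the two groups. Entropy captures only how spread out the labels are. Standard deviation also reflects how far apart they are on the scale, which is one reason both features might be kept together.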
Kaplun, Kateryna, "Classifying Controversiality in Article Data" (2018). Theses, Dissertations and Culminating Projects. 133.