Date of Award

5-2018

Document Type

Thesis

Degree Name

Master of Science (MS)

College/School

College of Science and Mathematics

Department/Program

Mathematical Sciences

Thesis Sponsor/Dissertation Chair/Project Chair

Aihua Li

Committee Member

Christopher Leberknight

Committee Member

Haiyan Su

Subject(s)

RSS feeds, Electronic newspapers--Language, Context (Linguistics)

Abstract

The paper presents a statistical analysis that explores methods for measuring controversy in online news articles collected from 23 RSS feeds. Several baseline datasets are used to re-evaluate previous work and determine the predictive quality of unigrams for classifying controversial documents. This is achieved by comparing controversy and sentiment, exploring sentiment variance, and considering entropy and standard deviation as potential features. The paper tests whether there are more controversial words in negative sentiment than in positive sentiment, as well as whether there are more non-controversial words in positive sentiment than in negative sentiment. Unlike previous studies, we find that words alone are not useful for detecting controversy because they do not provide enough context. Further analysis yields a more fruitful approach that uses features such as standard deviation and entropy to detect controversy. Results demonstrate that entropy and standard deviation provide greater discrimination quality than positive and negative sentiment for classifying controversial documents. Since words alone are not enough, we perform a crowdsourcing experiment on titles, which provide more context. Although the titles are more beneficial than the words, we go one step further and use the summaries of the articles, which provide even more context. These features, along with the improvements, provide a cleaner separation of data for classifying controversial documents and may provide useful insight for the design of future classification models.
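
As a rough illustration of the two features named in the abstract, the sketch below computes the entropy and standard deviation of a document's per-word sentiment scores. The abstract does not say how the thesis computes either quantity, so everything here is an assumption made for illustration: the function name `sentiment_features`, the score range of [-1, 1], the five equal-width bins, and the use of the population standard deviation are all hypothetical choices, not the author's method.

```python
import math
from collections import Counter
from statistics import pstdev


def sentiment_features(scores, bins=5):
    """Return (entropy, standard deviation) for a list of sentiment scores.

    Assumption: each score is a per-word sentiment value in [-1, 1].
    The binning scheme is illustrative, not taken from the thesis.
    """
    if not scores:
        raise ValueError("scores must not be empty")

    # Map each score to one of `bins` equal-width bins over [-1, 1],
    # clamping so that out-of-range values fall into the end bins.
    def bin_index(s):
        raw = int((s + 1.0) / 2.0 * bins)
        return max(0, min(raw, bins - 1))

    counts = Counter(bin_index(s) for s in scores)
    n = len(scores)

    # Shannon entropy (in bits) of the binned sentiment distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Population standard deviation of the raw scores.
    return entropy, pstdev(scores)


# Toy inputs: one document with uniform sentiment, one with polarized sentiment.
uniform_doc = [0.4, 0.4, 0.4, 0.4]
polarized_doc = [-0.9, 0.9, -0.8, 0.8]

print(sentiment_features(uniform_doc))
print(sentiment_features(polarized_doc))
```

In these toy inputs the polarized list scores higher on both features than the uniform one, which is the kind of separation the abstract attributes to entropy and standard deviation. Real inputs would need a sentiment lexicon or model to produce the scores, which this sketch leaves out.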
