This study provides preliminary insights into the linguistic features that contribute to Internet censorship in mainland China. We collected a corpus of 344 censored and uncensored microblog posts published on Sina Weibo and built a Naive Bayes classifier based on linguistic, topic-independent features. The classifier achieves 79.34% accuracy in predicting whether a microblog post would be censored on Sina Weibo.
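As a rough illustration of the kind of classifier described above, the following is a minimal sketch of a Gaussian Naive Bayes model over numeric features. It is not the authors' implementation: the abstract does not specify the Naive Bayes variant or the feature set, so the Gaussian likelihood, the two feature names (average sentence length, pronoun ratio), and the toy data are all hypothetical choices made only for this example.

```python
import math
from collections import defaultdict


def train_gaussian_nb(X, y):
    """Estimate a prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small variance floor avoids division by zero on constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one feature row."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # Log of the Gaussian density, summed under the independence assumption.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: [average sentence length, pronoun ratio] per post.
X = [
    [12.0, 0.10], [14.0, 0.12], [13.0, 0.09],
    [25.0, 0.02], [27.0, 0.03], [26.0, 0.01],
]
y = ["censored"] * 3 + ["uncensored"] * 3

model = train_gaussian_nb(X, y)
print(predict(model, [13.0, 0.11]))
print(predict(model, [26.0, 0.02]))
```

The sketch scores each class by adding the log prior to the per-feature log likelihoods, which is the independence assumption that gives Naive Bayes its name. A study on real data would also hold out test posts or use cross-validation to estimate accuracy.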
MSU Digital Commons Citation
Ng, Kei Yin; Feldman, Anna; and Leberknight, Chris, "Detecting Censorable Content on Sina Weibo: A Pilot Study" (2018). Department of Linguistics Faculty Scholarship and Creative Works. 26.
Ng, K. Y., Feldman, A., & Leberknight, C. (2018, July). Detecting censorable content on Sina Weibo: A pilot study. In Proceedings of the 10th Hellenic Conference on Artificial Intelligence (pp. 1-5).