Document Type

Conference Proceeding

Publication Date

1-1-2023

Journal / Book Title

Lecture Notes in Networks and Systems

Abstract

Social media websites such as Twitter have become so indispensable that people use them almost daily to share their emotions, opinions, suggestions and thoughts. Motivated by this behavior, the purpose of this study is to define an approach that automatically classifies tweets into two main classes, namely, hate speech and non-hate speech. Such classification provides a valuable source of information for analyzing and understanding target audiences and spotting marketing trends. We thus propose HiSAT, a Hierarchical framework for Sentiment Analysis on Twitter data. Sentiments and opinions in tweets are highly unstructured and do not follow a well-defined sequence. They constitute heterogeneous data from many sources in different formats, and express positive, negative, or neutral sentiment. Hence, in HiSAT we conduct Natural Language Processing encompassing tokenization, stemming and lemmatization techniques that convert text to tokens, as well as Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) techniques that convert text sentences into numeric vectors. These vectors are then fed as inputs to machine learning algorithms within the HiSAT framework; more specifically, Random Forest, Logistic Regression and Naïve Bayes are used as binary text classifiers to separate hate speech from non-hate speech in the tweets. Results of experiments performed with the HiSAT framework show that Random Forest outperforms the other two classifiers in predicting the correct labels (with accuracy above 95%). We present the HiSAT approach, its implementation and experiments, along with related work and ongoing research.
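
As a rough illustration of the vectorization step mentioned in the abstract, the sketch below builds Bag-of-Words counts and textbook TF-IDF weights (term frequency times log(N / document frequency)) using only the Python standard library. It is not the HiSAT implementation: the function names `tokenize`, `bag_of_words` and `tf_idf`, the simple regular-expression tokenizer, and the two sample strings are all assumptions made for this example, and stemming and lemmatization are omitted. Library implementations often use smoothed TF-IDF variants, so their numbers may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return the vocabulary and one TF-IDF vector per document (unsmoothed textbook form)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each vocabulary term.
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    weighted = []
    for row in counts:
        total = sum(row)  # number of tokens in this document
        weighted.append([(c / total) * w if total else 0.0 for c, w in zip(row, idf)])
    return vocab, weighted


if __name__ == "__main__":
    # Made-up sample texts, used only to show the shape of the output.
    samples = ["good day", "bad day"]
    vocab, vectors = tf_idf(samples)
    print(vocab)
    for vec in vectors:
        print([round(x, 3) for x in vec])
```

In this sketch a term that occurs in every document receives weight zero, which is the intended effect of the inverse document frequency factor. The resulting numeric vectors are the kind of input that a binary text classifier, such as the three named in the abstract, would consume.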

DOI

10.1007/978-3-031-16072-1_28

Journal ISSN / Book ISBN

85138002632 (Scopus)
