
both GloVe and fastText embeddings and the BERT language model, and found that the improvements from using BERT on these datasets were insufficient and did not justify the increased computational cost.

3.4. Features in hate speech detection

Feature extraction aims to transform the input data into a new representation by creating new features. Schmidt and Wiegand (2017) summarised the most important features used within hate speech detection, covering a wide range of feature types. The rest of this section presents the state of the art within feature extraction.

Simple surface features are features that can be derived without advanced methods. Bag of words (BoW) and word and character n-grams are popular methods used to capture the presence and frequency of words in a document. URL mentions, hashtags, punctuation, word and document lengths and capitalisation are also widely used in hate speech classification, by authors such as Burnap and Williams (2015), Waseem and Hovy (2016) and Nobata et al. (2016). Waseem and Hovy (2016) explored which features are most prominent when detecting hate speech and found that character n-grams contribute the most to the result. Furthermore, Mehdad and Tetreault (2016) concluded that character n-grams outperform word n-grams and other methods. Character n-grams are particularly well suited to hate speech detection due to the ever-evolving language on social media: users learn which words are blacklisted and avoid them by using slurs and disguising the language in other ways. Character n-grams are also better at capturing spelling mistakes.
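As a rough illustration of such surface features, the sketch below extracts BoW counts, character n-grams and a few hand-crafted surface statistics. It assumes scikit-learn; the example texts, n-gram range and chosen statistics are illustrative assumptions, not taken from the works cited above.

```python
# Minimal sketch of simple surface features, assuming scikit-learn is installed.
import re
from sklearn.feature_extraction.text import CountVectorizer

texts = [
    "Send them all home!!! #enough",
    "Thanks for sharing, great read :)",
]

# Bag of words: presence/frequency of word unigrams
bow = CountVectorizer(analyzer="word")
X_bow = bow.fit_transform(texts)

# Character n-grams (here 2-4 characters), which are more robust to
# obfuscated slurs and spelling variation ("char_wb" stays within words)
char_ngrams = CountVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X_char = char_ngrams.fit_transform(texts)

# A few hand-crafted surface statistics per document
surface = [
    {
        "length": len(t),
        "num_hashtags": t.count("#"),
        "num_exclamations": t.count("!"),
        "capital_ratio": sum(c.isupper() for c in t) / max(len(t), 1),
        "num_urls": len(re.findall(r"https?://\S+", t)),
    }
    for t in texts
]

print(X_bow.shape, X_char.shape)
print(surface[0])
```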

Several lexical resources can be found on the web. Burnap and Williams (2015) created a word list containing specific negative words such as insults and slurs, Dennis Gitari et al. (2015) built a list of hate verbs, and more recently hatebase.org (https://hatebase.org/) has been widely used. Davidson et al. (2017) showed that one cannot rely completely on such word lists alone when detecting hate speech. Most approaches today use them in addition to other features, such as simple surface features and word embeddings.
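A minimal sketch of how a lexical resource can be turned into features follows; the tiny word list here is a placeholder standing in for a real resource such as hatebase.org.

```python
# Placeholder lexicon; a real system would load a curated resource instead.
HATE_LEXICON = {"idiot", "scum", "vermin"}

def lexicon_features(text: str) -> dict:
    """Count how many tokens in the text appear in the lexicon."""
    tokens = text.lower().split()
    hits = [t for t in tokens if t.strip(".,!?") in HATE_LEXICON]
    return {
        "lexicon_hits": len(hits),
        "lexicon_ratio": len(hits) / max(len(tokens), 1),
    }

print(lexicon_features("They are all vermin, total scum!"))
```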

Linguistic features, or syntactic features, utilise syntactic information in the language, such as dependency relationships and part-of-speech (POS) tags, and are employed in the feature sets of Dennis Gitari et al. (2015), Burnap and Williams (2015), Van Hee et al. (2015) and Z. Zhang et al. (2018). Nobata et al. (2016) looked into several different features, including surface features and linguistic features, and found that character n-grams perform very well on their own, but that the method becomes even more powerful in combination with other features. Adding linguistic features makes it possible to capture long-range dependencies between words, which n-grams may struggle to do. Burnap and Williams (2015) found that using typed dependencies, a representation of a syntactic grammatical relationship in a sentence, reduced false negatives by 7% over the BoW baseline. This is useful for differentiating between utterances such as "send de hjem" ("send them home") and "la de være" ("leave them alone"). Here, the POS pattern is the same, but the first utterance is more frequent in hateful comments.
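The sketch below illustrates how POS tags and typed dependencies can be extracted as features. It uses spaCy with its small English pipeline as an assumed tool; the cited works use their own parsers and languages.

```python
# Sketch of linguistic features, assuming spaCy and the English pipeline
# "en_core_web_sm" are installed (python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Send them home now")

# Part-of-speech tag and typed dependency for each token
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)

# Typed-dependency triples (relation, head, dependent) can then be used as
# string features, e.g. fed into a vectorizer alongside n-grams
dep_features = [f"{t.dep_}({t.head.text},{t.text})" for t in doc]
print(dep_features)
```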

Sentiment analysis measures the degree of polarity expressed in a message, and a popular feature is the presence of positive or negative words. Hate speech and sentiment are thus often related: negative sentiment frequently accompanies hateful utterances, and several approaches, such as Dennis Gitari et al. (2015) and Van Hee et al. (2015), make use of this.
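As a small illustration, the following sketch computes polarity scores with the VADER sentiment analyser; this is an assumed choice, and any sentiment lexicon or classifier could serve the same role.

```python
# Sketch of a sentiment feature, assuming the vaderSentiment package
# is installed (pip install vaderSentiment).
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores("They are awful and should go away")
# 'scores' contains 'neg', 'neu', 'pos' and a 'compound' polarity in [-1, 1],
# which can be appended to a document's feature vector
print(scores)
```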

In hate speech detection, one might encounter the problems of data sparsity and high dimensionality. Word generalisation, which generally consists of word clustering and word embeddings, has been used to address these problems. With word clustering, induced cluster IDs representing sets of words are used as additional generalised features.

Algorithms such as Brown clustering (Brown et al., 1992), which assigns each word to one particular cluster, and Latent Dirichlet Allocation (LDA) (Blei et al., 2003), which gives a topic distribution for each word, were used as features by Warner and Hirschberg (2012), Malmasi and Zampieri (2017) and Zhong et al. (2016). More recently, however, word embeddings, distributed word representations based on neural networks, have been proposed for similar purposes. Word embeddings can be useful in hate speech detection since semantically similar words such as "dog" and "cat" may end up with more similar vectors than "dog" and "boat". Several popular word embedding models exist, including Word2vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014) and fastText (Mikolov et al., 2017). Word embeddings have also been quite popular in recent hate speech research: Nobata et al. (2016), Park and Fung (2017) and Gambäck and Sikdar (2017) used Word2vec, Badjatiya et al. (2017) used GloVe and fastText, and Pavlopoulos et al. (2017) used both GloVe and Word2vec. Djuric et al. (2015) showed that Word2vec outperformed the state-of-the-art model based on BoW with a logistic regression classifier, addressing the aforementioned issues of data sparsity and high dimensionality. Furthermore, Devlin et al. (2018) recently presented a new language representation model called BERT (Bidirectional Encoder Representations from Transformers). BERT is a bidirectional, unsupervised language representation and is contextual, meaning that the same word can be represented differently depending on its meaning in context. For example, the word "bank" would have different representations in "bank deposit" and "riverbank". Their model outperformed the previous state of the art and can further improve many different types of NLP tasks. In addition to BERT, ELMo (Peters et al., 2018) and ULMFiT (Howard and Ruder, 2018) are popular transfer learning models used in hate speech detection.

Meta information is information about the context, such as user characteristics or whether the user has a high frequency of certain negative words in their user history. Waseem and Hovy (2016), Pitsilis et al. (2018) and Unsvåg (2018) all look into this. Today's social media consist of images, videos and audio content as well as text. This content can also be hateful, and some studies use such multi-modal information as a predictive feature: Hosseinmardi et al. (2015) and Zhong et al. (2016) employ features based on images. Knowledge-based features use world knowledge to better capture the context of a sentence. This requires a lot of manual coding, and therefore, to the best of our knowledge, only Dinakar et al. (2012) use a knowledge base in their work.

In many papers, several different features are combined and tested to achieve the best result, including Nobata et al. (2016), Z. Zhang et al. (2018), Badjatiya et al. (2017), Davidson et al. (2017) and Waseem (2016). However, it is often difficult to select useful features, since existing supervised models rely heavily on carefully engineered features. Robinson et al. (2018) conducted an extensive feature selection analysis, looking at surface features, linguistic features and sentiment features. They concluded that automatic feature selection could reduce the carefully engineered features by over 90%, and they were able to select a small set of the most predictive features that achieves much better results than models using carefully engineered features. In addition, to the best of our knowledge, the general trend favours n-grams and different types of word embeddings, with transfer learning models being the newest addition to the current state of the art.
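As an illustration of automatic feature selection in this spirit (not the exact method of Robinson et al. (2018)), the sketch below uses scikit-learn's SelectKBest with a chi-squared test on a toy corpus; the texts, labels and value of k are illustrative assumptions.

```python
# Sketch of automatic feature selection, assuming scikit-learn is installed.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

texts = ["send them home", "great article, thanks", "they are vermin", "lovely day"]
labels = [1, 0, 1, 0]  # toy labels: 1 = hateful, 0 = not hateful

# Start from a large sparse feature space (character n-grams)
vectorizer = CountVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vectorizer.fit_transform(texts)

# Keep only the k features most associated with the label (chi-squared test)
selector = SelectKBest(chi2, k=20)
X_reduced = selector.fit_transform(X, labels)

print(X.shape, "->", X_reduced.shape)
```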

The features commonly used in hate speech detection, as presented in this section, are summarised in Table 3.1.

Table 3.1.: Features in hate speech detection

Simple surface features: Features that can be derived without advanced methods.
Bag of Words (BoW): A representation of text that describes the occurrence of words within a document.
N-grams: Sequences of n words or characters; n-gram models assign probabilities to word and character sequences.
Linguistic features: Utilise syntactic information in the language, such as dependency relationships and part-of-speech (POS) tagging.
Typed dependencies: A representation of a syntactic grammatical relationship in a sentence.
Sentiment analysis: The degree of polarity expressed in a message.
Word generalisation: Generally consists of word clustering and word embeddings.
Word clustering: Induced cluster IDs representing sets of words, used as additional generalised features.
Word embeddings: Distributed word representations based on neural networks, such as Word2vec, GloVe and fastText.
Transfer learners: Language models that can represent a word as having different meanings depending on context, such as BERT and ELMo.
Meta information: Information about the context, such as user characteristics.
Knowledge-based features: World knowledge used to better understand the context of a sentence.
