

3.2. Existing data collections

Despite the proliferation of hate speech detection as a research field, no commonly accepted corpus exists yet. Because of this, authors usually have to collect and label their own data, and the resulting datasets are often constructed specifically for a particular domain. Since the datasets have been built for different purposes, they may cover different sub-types of hate speech and have unique characteristics. As an example, data collected from a white supremacy forum will differ from data collected on more general sites such as Twitter, due, among other things, to the difference in demographics. Because of the lack of a benchmark dataset, many of the studies conducted use a variety of different annotations and data, making it harder to compare methods and results. In addition, creating datasets is very time-consuming because the number of hateful statements is much lower than the number of neutral statements. Thus, to obtain a sufficient number of hate speech instances, a larger number of comments has to be annotated. Moreover, many of the datasets have not been made publicly available. One reason may be that, due to the offensive and profane language in the data, the authors do not want the content to be publicly available.

Even though there is no benchmark dataset, some datasets are widely used in recent papers. One of these is from Waseem and Hovy (2016). It was made publicly available on GitHub and contains approximately 16k messages from Twitter, each labelled as racism, sexism or neither. The tweets were collected through a manual search of common hateful terms and hashtags related to religion, sex, gender and ethnic minorities.

The dataset is quite small and also contains a significant proportion of neutral tweets, which makes it unbalanced. The authors state in their paper that this is intentional, to give a better real-world representation. At the time of writing, many of the tweets are no longer available due to users being deleted or blocked from Twitter.

Davidson et al. (2017) proposed a dataset with data collected from Twitter using the hate speech lexicon from Hatebase.org. The labelling was done by CrowdFlower workers, who manually assigned each tweet to one of three categories: hate speech, offensive but not hate speech, or neither. The workers were given definitions of hate speech and told to consider the context of the tweet. The authors concluded that specific lexical methods are effective at identifying offensive language, but not as accurate when identifying hate speech; only a small percentage of the tweets flagged by the Hatebase lexicon was considered hate speech by the human annotators. Their analysis showed that a hateful term can both help and hinder accurate classification, and their study pointed out that some terms are especially useful for distinguishing between offensive language and hate speech. Nonetheless, if a text does not contain any offensive terms or curse words, such methods are likely to misclassify hate speech.

Another publicly available dataset1 from Kaggle contains around 150k comments, of which 16k are toxic. The dataset distinguishes between six different types of hateful content: toxic, severe toxic, obscene, threat, insult and identity hate. Sharma et al. (2018) created a new dataset of tweets based on ontological classes and degrees of harmful speech, with a granularity they claim other publicly available datasets are missing. They also take into consideration the degree of harmful content, the intent of the speaker and how this affects people on social media when labelling the data. Z. Zhang et al. (2018) also created a dataset which extended the datasets available at the time, namely Waseem and Hovy (2016) and Davidson et al. (2017). More recently, Founta et al. (2018) published a dataset containing 100k English tweets with cross-validated labels. This is more useful for deep learning models since they require larger amounts of data. The authors proposed a methodology for annotating large-scale datasets and used CrowdFlower workers for the labelling process. The dataset is more balanced compared to earlier datasets, with roughly half of the samples labelled as "Normal" and the rest as either "Offensive", "Spam" or "Hateful".

1https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data

Gröndahl et al. (2018) tried to reproduce five state-of-the-art hate speech models using datasets from Waseem and Hovy (2016), Wulczyn et al. (2017) and Z. Zhang et al. (2018).

Their results show that the models only perform well when tested on the same type of data they were trained on. This underpins what Davidson et al. (2017) and Waseem (2016) emphasised: the lack of a common definition of hate speech results in differences in the annotation of the datasets, which leads to models predicting offensive speech as being hateful. However, if the models are retrained with the training set from another dataset and then tested using the test set from that same dataset, all models perform equally well. Thus, the results are largely independent of the model architecture.

Lastly, several national and international hate speech workshops and competitions have recently released datasets in their respective languages. Zampieri et al. (2019) presented OLID, the Offensive Language Identification Dataset, which contains over 14k English tweets. This indicates that there is still no single commonly accepted corpus.

3.3. Anomaly detection

As previously mentioned, Gröndahl et al. (2018) compared five state-of-the-art hate speech models with three well-known datasets and found that all of the models had poor performance when they were trained on one dataset and tested against another.

Therefore, they suggest re-phrasing the problem: hate speech detection has previously been treated purely as a classification problem, but should instead be addressed as a detection problem. Hence, they suggest reconceptualising hate speech detection as anomaly detection, where hate speech constitutes an anomalous variant of ordinary speech. To the best of our knowledge, no existing methods have experimented with the suggestion made by Gröndahl et al. (2018) to utilise anomaly detection for detecting hateful utterances.

Even though anomaly detection has not been used in the field of hate speech detection, it has been well-studied within diverse research areas and application domains. This includes areas such as fraud detection, industry damage detection, image processing, video network surveillance, and intrusion detection (Han et al., 2012). Most of the related work treats anomaly detection as an unsupervised learning problem. Typical anomaly detection methods assume that most of the data samples are normal and attempt to learn a "compressed" representation of this data. Moya et al. (1993) implemented a neural network one-class classifier for target recognition, while Schölkopf et al. (2001) and Tax and Duin (2004) implemented one-class SVMs for detecting novel data or outliers.

These methods aim at finding a subset of the data which contains the normal instances. Data samples that do not fall into this set are deemed anomalous. Furthermore, Kim and Scott (2012) and Vandermeulen and Scott (2013) use Kernel Density Estimation, and F. T. Liu et al. (2008) use Isolation Forest to deal with the anomaly detection problem. A drawback of these shallow unsupervised anomaly detection methods is that they often require manual feature engineering to be effective on high-dimensional data. This process is time-consuming, which limits their scalability to large datasets.
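As a concrete illustration of such shallow detectors, the following is a minimal sketch assuming scikit-learn; the synthetic data and hyperparameter values are purely illustrative and not taken from the cited works.

    # Sketch: shallow unsupervised anomaly detection with scikit-learn.
    # Synthetic data and hyperparameters are illustrative only.
    import numpy as np
    from sklearn.svm import OneClassSVM
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    X_normal = rng.normal(0, 1, size=(500, 10))          # manually engineered feature vectors
    X_test = np.vstack([rng.normal(0, 1, size=(5, 10)),  # normal-looking samples
                        rng.normal(6, 1, size=(5, 10))]) # obvious outliers

    # One-class SVM (Schölkopf et al., 2001): learns a boundary enclosing the normal data.
    ocsvm = OneClassSVM(kernel="rbf", nu=0.05).fit(X_normal)

    # Isolation Forest (F. T. Liu et al., 2008): isolates anomalies via random splits.
    iforest = IsolationForest(contamination=0.05, random_state=0).fit(X_normal)

    # Both predict +1 for inliers (normal) and -1 for anomalies.
    print(ocsvm.predict(X_test))
    print(iforest.predict(X_test))

Both detectors operate directly on the feature vectors they are given, which is why their performance on high-dimensional data such as raw text hinges on the quality of the manual feature engineering.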

In recent years, there has been increasing interest in deep anomaly detection algorithms (Chalapathy and Chawla, 2019), a line of research which has already shown promising results. These approaches are motivated by the limited scalability of shallow AD techniques and the need for methods that can handle large and complex datasets. Furthermore, deep anomaly detection methods aim to overcome the need for manual feature engineering by learning the relevant features from the data automatically.

The methods have been applied to a diverse set of tasks like video surveillance, image analysis, health care and cyber-intrusion detection. Several novel deep approaches to unsupervised anomaly detection have been proposed, including the work by Abati et al. (2019), who designed an unsupervised deep autoencoder that learns the underlying probability distribution through an autoregressive procedure. Furthermore, Erfani et al. (2016) presented a hybrid model where an unsupervised Deep Belief Network is trained to extract features, and a one-class SVM is trained from these features. Other approaches include the works by Hendrycks et al. (2018), Ruff et al. (2019) and Pang et al. (2019).
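As an illustration of the general idea underlying many of these deep methods, the sketch below trains a plain autoencoder on normal data only and uses the reconstruction error as an anomaly score; the architecture, dimensions and training loop are assumptions for illustration and do not reproduce any of the specific cited approaches.

    # Sketch: reconstruction-error anomaly scoring with a plain autoencoder (PyTorch).
    # Architecture and dimensions are illustrative only.
    import torch
    import torch.nn as nn

    class AutoEncoder(nn.Module):
        def __init__(self, input_dim, latent_dim=32):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                         nn.Linear(128, latent_dim))
            self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                         nn.Linear(128, input_dim))

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = AutoEncoder(input_dim=300)
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    x_normal = torch.randn(64, 300)   # stand-in for feature vectors of normal samples

    for _ in range(100):              # learn to reconstruct normal data only
        optimiser.zero_grad()
        loss = nn.functional.mse_loss(model(x_normal), x_normal)
        loss.backward()
        optimiser.step()

    # At test time, a high reconstruction error suggests an anomalous sample.
    x_new = torch.randn(1, 300)
    score = nn.functional.mse_loss(model(x_new), x_new).item()

Here the features are learned jointly with the anomaly score, which is precisely what removes the manual feature engineering step required by the shallow methods.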

All the methods previously mentioned rely on unsupervised learning. Semi-supervised learning, on the other hand, utilises some labelled data samples in addition to unlabelled data. Many real-world applications have access to a small portion of data that might, for example, be labelled by a domain expert, and this knowledge is not exploited in the unsupervised setting. According to Ruff et al. (2020), the term semi-supervised anomaly detection has been used to describe two different settings: adding only labelled normal data, and adding both labelled normal data and labelled anomalies. Most of the existing work adopts the first setting, i.e. it only incorporates labelled normal data. Shallow approaches that adopt this setting include the work by Blanchard et al. (2010), who created a semi-supervised method for novelty detection, assuming that only labelled examples of the normal class were available. They argue that the problem can be solved by reducing it to Neyman-Pearson (NP) classification, which is binary classification subject to a constraint on the false positive rate. One deep approach was developed by Akcay et al. (2018), who use a conditional generative adversarial network built from encoder-decoder-encoder sub-networks. A few authors have investigated the second setting, where labelled anomalies are utilised in addition to the labelled normal data. This includes the work conducted by Görnitz et al. (2013) and Ergen et al. (2017), among others.


Ruff et al. (2020) introduce a deep end-to-end method for general semi-supervised anomaly detection based on an information-theoretic perspective. They derive a loss motivated by the idea that the entropy of the latent distribution of normal data should be lower than the entropy of the anomalous distribution. Generally, semi-supervised approaches to anomaly detection aim at utilising labelled samples, but most proposed methods are limited to including only labelled normal samples; this method also takes advantage of labelled anomalies. The authors conducted extensive experiments on three widely used image datasets, along with other anomaly detection benchmark datasets (none of which contains text data). They argue that their method outperforms shallow, hybrid, and deep competitors, yielding increased performance even when provided with only a little labelled data.
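A minimal sketch of the kind of loss this entails is shown below. It follows the spirit of the published objective (distances to a fixed centre in latent space, with labelled anomalies pushed away), but the variable names, hyperparameter value and omission of the regularisation and centre-initialisation details are simplifications and should not be read as the exact formulation.

    # Sketch of a Deep SAD-style loss term (after Ruff et al., 2020), simplified:
    # network regularisation and centre initialisation are omitted.
    import torch

    def semi_supervised_ad_loss(z, c, semi_targets, eta=1.0, eps=1e-6):
        # z: latent representations, shape (batch, d)
        # c: fixed hypersphere centre, shape (d,)
        # semi_targets: 0 = unlabelled, +1 = labelled normal, -1 = labelled anomaly
        dist = torch.sum((z - c) ** 2, dim=1) + eps
        labelled = eta * dist ** semi_targets.float()  # dist pulls normals in, 1/dist pushes anomalies out
        losses = torch.where(semi_targets == 0, dist, labelled)
        return losses.mean()

    # Example: four latent vectors, one of them a labelled anomaly.
    z = torch.randn(4, 8)
    c = torch.zeros(8)
    semi_targets = torch.tensor([0, 0, 1, -1])
    print(semi_supervised_ad_loss(z, c, semi_targets))

Minimising this term concentrates unlabelled and labelled normal samples around the centre while keeping labelled anomalies at a distance, matching the intuition that the normal latent distribution should have lower entropy than the anomalous one.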

There is a limited amount of work that addresses anomaly detection on text data. L. M. Manevitz et al. (2001) study one-class classification of documents using OC-SVM, where their model is based on identifying “outlier” data as representative of the second class. L. Manevitz and Yousef (2007) later experimented with a simple autoencoder (feed-forward network) on text, where they developed a filter to examine a corpus of documents and choose those of interest. They did this using only positive information, i.e. normal data, training on the Reuters-21578 data collection (Lewis et al., 2004).2 Steyn and De Waal (2016) constructed a Multinomial Naïve Bayes classifier and enhanced it with an augmented Expectation-Maximization (EM) algorithm in an attempt to simplify the problem of textual anomaly detection. Kannan et al. (2017) use block coordinate descent optimisation to create a matrix factorisation method for anomaly detection on text, which they claim has significant advantages over traditional methods. Gorokhov et al. (2017) implemented a convolutional neural network for unsupervised learning with an RBF activation function and logarithmic loss, tested on the Enron Email dataset.3
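A minimal sketch in the spirit of these bag-of-words approaches is shown below, pairing a TF-IDF document representation with a one-class SVM trained on normal documents only; the toy corpus, vectoriser settings and nu value are assumptions for illustration, not the configuration of any of the cited methods.

    # Sketch: one-class anomaly detection on text with a bag-of-words representation.
    # Toy corpus and settings are illustrative, not those of the cited works.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import OneClassSVM
    from sklearn.pipeline import make_pipeline

    normal_documents = [
        "the meeting has been moved to thursday afternoon",
        "please find the quarterly report attached",
        "thanks for the update, see you at the workshop",
    ]
    unseen_documents = ["completely unrelated and unusual content here"]

    # Bag-of-words (TF-IDF) features followed by a one-class SVM fit on normal data only.
    detector = make_pipeline(TfidfVectorizer(), OneClassSVM(kernel="linear", nu=0.1))
    detector.fit(normal_documents)

    # +1 = resembles the normal corpus, -1 = flagged as anomalous.
    print(detector.predict(unseen_documents))

Because the representation is a sparse word-count-based vector, such pipelines ignore word order and semantic similarity, a limitation the embedding-based approaches discussed next try to address.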

Recent work has found that proper text representation is crucial for designing well-performing machine learning algorithms. Several existing methods within the field of hate speech detection, and text classification in general, utilise word embeddings. This will be further discussed in Section 3.4. However, existing methods for anomaly detection often rely on bag-of-words (BoW) to represent text, such as the works by L. M. Manevitz et al. (2001), L. Manevitz and Yousef (2007), Kannan et al. (2017) and Mahapatra et al. (2012). None of these methods makes use of unsupervised pre-trained word models, such as word embeddings. Ruff et al. (2019) is currently the only text-specific method for anomaly detection that utilises pre-trained models for distributed vector representations of words. They introduce a one-class classification method which uses unsupervised learning, builds upon word embedding models and learns multiple sentence representations via self-attention. These sentence representations capture multiple semantic contexts, which enables contextual anomaly detection with respect to the multiple themes and concepts present in the unlabelled text corpus. The datasets they experimented

2Available at: http://www.daviddlewis.com/resources/testcollections/reuters21578/

3https://www.kaggle.com/wcukierski/enron-email-dataset

3.4. Features in hate speech detection