6. Experiments and Results
6.2. Experimental setup
6.2.1. Datasets
To perform the experiments with semi-supervised detection of hate speech and to check the feasibility of the approach, two large datasets were used: one in English and one in Norwegian. For both datasets, the text preprocessing steps described in Section 5.1 were applied.
English (Jigsaw) dataset
The English dataset used in the experiments is the Jigsaw dataset from the Toxic Comment Classification Challenge, which is available on Kaggle.1 The dataset consists of a number of comments and their respective sets of labels. There are six categories in the dataset: toxic, severe toxic, obscene, threat, insult and identity hate. Two random samples from the dataset are shown in Table 6.1.
Table 6.1.: Two random samples from the Jigsaw dataset showing how the data is represented.
Each row also contains an id which is not included here.
comment text                                 toxic   severe toxic   obscene   threat   insult   identity hate
FUCK YOUR FILTHY MOTHER IN THE ASS, DRY!       1          0            1         0        1           0
You, sir, are my hero. Any chance you
remember what page that’s on?                  0          0            0         0        0           0
As can be seen from Table 6.1, a comment may have several labels, exactly one label or no labels. A comment is neutral if it does not have any labels, i.e. none of its labels are set to one. The dataset consists of a training set and a test set, where some of the labels in the test set are set to -1. After all comments with such labels were removed, the dataset consisted of 223 549 comments in total, where 159 571 are in the training set and 63 978 are in the test set. Hence, the test set contains 28.6% of all comments. Table 6.2 shows the number of comments labelled in each category.
1 https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data
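The cleaning step described above can be sketched as follows. The dict-based row layout is an assumption for illustration; the label names follow the six categories of the dataset.

```python
# Sketch of the test-set cleaning step: rows whose labels are set to -1 were
# never scored in the challenge and are removed before evaluation.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def drop_unscored(rows):
    """Keep only rows where no label is -1."""
    return [row for row in rows if all(row[label] != -1 for label in LABELS)]

test_rows = [
    {"id": "a1", "toxic": 1, "severe_toxic": 0, "obscene": 1,
     "threat": 0, "insult": 1, "identity_hate": 0},
    {"id": "a2", "toxic": -1, "severe_toxic": -1, "obscene": -1,
     "threat": -1, "insult": -1, "identity_hate": -1},  # unscored: removed
]
kept = drop_unscored(test_rows)
```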
Table 6.2.: The number of comments in each category in the English dataset.
Label            Train set   Test set
Toxic               15 294      6 090
Severe toxic         1 595        367
Obscene              8 449      3 691
Threat                 478        211
Insult               7 877      3 427
Identity hate        1 405        712
Since many comments are labelled with more than one label, the total number of hateful comments is less than the sum of the counts presented above. Table 6.3 shows the distribution between hateful and neutral comments in both the training and test sets. The number of hateful comments is calculated by counting all comments that contain at least one hateful label; the number of neutral comments is the count of all comments without any hateful label.
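This counting rule can be sketched as follows; the label names mirror the six categories, and the sample rows are illustrative only.

```python
# Sketch of the hateful/neutral counting: a comment is hateful if at least one
# of its six labels is 1, neutral otherwise, so a multi-labelled comment is
# still counted only once.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def is_hateful(row):
    return any(row[label] == 1 for label in LABELS)

rows = [
    {"toxic": 1, "severe_toxic": 0, "obscene": 1, "threat": 0, "insult": 1, "identity_hate": 0},
    {"toxic": 0, "severe_toxic": 0, "obscene": 0, "threat": 0, "insult": 0, "identity_hate": 0},
    {"toxic": 1, "severe_toxic": 0, "obscene": 0, "threat": 0, "insult": 1, "identity_hate": 0},
]
n_hateful = sum(is_hateful(row) for row in rows)
n_neutral = len(rows) - n_hateful
```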
Table 6.3.: The number of comments categorised as neutral and hateful in the English dataset.
Dataset    Neutral   Hateful     Total
Train      143 346    16 225   159 571
Test        57 735     6 243    63 978
Based on the numbers of neutral and hateful comments presented in Table 6.3, both the training and test sets contain approximately 90% neutral comments.
Since the test set contains approximately 30% of all the data, one third of it is used for validation instead of testing. This results in approximately 70% of the data being used for training, 10% for validation and 20% for testing.
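The resulting split fractions can be verified directly from the counts in Table 6.3:

```python
# Verifying the approximate 70/10/20 split from the counts in Table 6.3.
total = 223_549
train = 159_571
test_full = 63_978              # the original test set (28.6% of all data)

validation = test_full // 3     # one third of the test set
test = test_full - validation

train_frac = train / total      # ~0.71
val_frac = validation / total   # ~0.10
test_frac = test / total        # ~0.19
```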
Norwegian dataset
After the entire dataset was annotated and all comments labelled with X were removed, the dataset was complete. Every row in the dataset is of the form ‘id, label, text’. The dataset contains the number of comments presented in Table 6.4.
Table 6.4.: The number of comments and percentage of total in the annotated Norwegian dataset.
Category                  Number of comments   Percentage of total
1 - Neutral                           34 083                 82.8%
2 - Provocative                        4 734                 11.5%
3 - Offensive                          1 563                  3.8%
4 - Moderately hateful                   509                  1.2%
5 - Hateful                              250                  0.6%
Total                                 41 139                  100%
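The percentage column can be re-derived from the raw category counts:

```python
# Re-deriving the percentages in Table 6.4 from the raw category counts.
counts = {
    "1 - Neutral": 34_083,
    "2 - Provocative": 4_734,
    "3 - Offensive": 1_563,
    "4 - Moderately hateful": 509,
    "5 - Hateful": 250,
}
total = sum(counts.values())  # 41 139
percent = {name: round(100 * n / total, 1) for name, n in counts.items()}
```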
The distribution of comments in the five distinct categories is also displayed in Figure 6.1.
Figure 6.1.: The distribution of comments in each category in the Norwegian dataset
To fit the problem statement of this thesis, it was necessary to separate only between hateful/anomalous comments and everything else. The inter-annotator agreement calculations in Section 4.2.3 indicated that even the expert annotators struggled to agree on the annotation of comments in categories 4 and 5. Based on this and the definition in Section 2.1, both the moderately hateful and hateful comments were included in the anomaly class, whereas categories 1 to 3 formed the normal class. However, since many hate speech detection methods struggle to separate offensive from hateful utterances,
it was decided to compare the performance of the model with and without the inclusion of the offensive class as anomalies. This involves investigating two cases: (1) classes 1, 2 and 3 are normal samples and classes 4 and 5 are anomalous; (2) classes 1 and 2 are normal samples and classes 3, 4 and 5 are anomalous. Both experimental tests from Section 6.1 were conducted using these two cases. Table 6.5 displays the number of comments and percentage of the total when only separating between normal and anomalous samples in the two cases explained above.
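The two binary labellings can be sketched as follows, using the category counts from Table 6.4; the helper name is hypothetical.

```python
# Sketch of the two binary labellings: case 1 treats categories 4 and 5 as
# anomalous, case 2 treats categories 3, 4 and 5 as anomalous.
def is_anomaly(category, case):
    return category >= (4 if case == 1 else 3)

counts = {1: 34_083, 2: 4_734, 3: 1_563, 4: 509, 5: 250}  # from Table 6.4
case1_anomalies = sum(n for c, n in counts.items() if is_anomaly(c, case=1))
case2_anomalies = sum(n for c, n in counts.items() if is_anomaly(c, case=2))
```

These sums reproduce the anomaly counts shown in Table 6.5.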
Table 6.5.: Preprocessed combined annotated Norwegian dataset with and without the inclusion of the offensive class as anomalies.
             Case 1: 4+5 are anomalies     Case 2: 3+4+5 are anomalies
Category     #comments      % of total     #comments      % of total
Normal          40 880           98.2%        38 817           94.4%
Anomalies          759            1.8%         2 322            5.6%
Total           41 139            100%        41 139            100%
At the beginning of every experiment, the dataset is separated into training, validation and test sets using stratified splits. In other words, the data are divided so that each set has approximately the same distribution over the different classes.
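A stratified split can be sketched as follows; this is a minimal illustrative helper under assumed 70/10/20 fractions, not the thesis implementation.

```python
import random

# Minimal stratified-split sketch: the samples of each class are shuffled and
# divided with the same fractions, so every split keeps approximately the
# original class distribution.
def stratified_split(samples, labels, fractions=(0.7, 0.1, 0.2), seed=0):
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    splits = [[] for _ in fractions]
    for items in by_class.values():
        rng.shuffle(items)
        start = 0
        for i, frac in enumerate(fractions):
            # the last split takes the remainder so nothing is lost to rounding
            end = len(items) if i == len(fractions) - 1 else start + round(frac * len(items))
            splits[i].extend(items[start:end])
            start = end
    return splits  # train, validation, test

data = list(range(100))
labels = [0] * 90 + [1] * 10  # 90% normal, 10% anomalous
train, validation, test = stratified_split(data, labels)
```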