
4.1.1 Data Acquisition and Analysis

A good training dataset should ideally be large and contain a diverse set of images: the better the dataset, the better the model is able to generalise. In this experiment, diversity means images taken from different heights and areas, with different backgrounds and weather conditions. A diverse set of wool colours is also ideal.

One frustration in dataset generation is that white sheep are more prevalent than black and brown ones, while black and brown wool is harder to distinguish from the background.

Combined with the results of a previous master's thesis (Muribø, 2019), this indicates that detecting sheep independently of their wool colour should yield better results.

We therefore trained our models to detect sheep in general, rather than white, brown, and black sheep as distinct classes. In addition, distinguishing sheep by wool colour is close to impossible in the infrared images.

All images in the dataset were taken on 21–22 August, 20–22 September, and 25 October 2019 at Storlidalen in Trøndelag. Data from a test flight near Dragvoll, containing no images of sheep, were added as negative samples. Rough locations can be seen in Figure 4.1. The data was collected by the supervisor, Svein-Olaf Hvasshovd, and two other master's students at NTNU, Kari Meling Johannessen and Magnus Guttormesen.

The dataset contains images from heights ranging from 14 to 120 metres. As explained in Section 3.3, the maximum FoV of any RGB image is 176 m by 132 m. Since the UAV is always at the centre of an image, a sheep is at most 176 m/2 = 88 m from the UAV's GPS position. This in turn means that the maximum area the shepherd has to search manually is 176 m by 132 m, assuming the sheep has not relocated.

Figure 4.1: Approximate locations of the dataset images. Left: Storlidalen in Oppdal (Google-Maps, 2020a); right: near Dragvoll in Trondheim (Google-Maps, 2020b).

Different backgrounds are also prevalent in the dataset, including grassy fields, forest environments, and rocky highland. Weather conditions vary somewhat, with mostly cloudy or sunny conditions as well as some colder weather with a snowy background (see Figure 4.2).
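The relationship between altitude and search area can be sketched as follows. This is an illustration only, assuming the 176 m by 132 m footprint quoted for the maximum altitude of 120 m scales linearly with height (i.e. fixed camera FoV angles); the function names are hypothetical.

```python
# Sketch: ground footprint of the RGB camera at a given altitude,
# assuming the 176 m x 132 m footprint at the maximum altitude
# (120 m) scales linearly with height (fixed FoV angles).

MAX_ALT_M = 120.0                  # maximum flight altitude in the dataset
MAX_FOOTPRINT_M = (176.0, 132.0)   # footprint (width, height) at MAX_ALT_M

def footprint(altitude_m):
    """Approximate ground coverage (width, height) in metres."""
    scale = altitude_m / MAX_ALT_M
    w, h = MAX_FOOTPRINT_M
    return (w * scale, h * scale)

def max_search_offset(altitude_m):
    """Maximum horizontal distance from the UAV's GPS position to a
    visible sheep, taking the UAV as the image centre (half the
    footprint width, as in the text)."""
    return footprint(altitude_m)[0] / 2.0
```

At the maximum altitude this gives the 176 m by 132 m search area above, with a sheep at most 88 m from the logged GPS position; at lower altitudes the area shrinks proportionally.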

Figure 4.2: Some of the different backgrounds and altitudes of the RGB images.

Datasets for the experiment were curated manually in order to create the best possible training data for the model. Most images were suitable; only a few were unusable, meaning blurry images, images with unclear infrared information, or images that did not follow the same settings as the rest of the dataset. Many images were also very similar to one another, almost looking like duplicates, because images were taken in bursts in order to gather a lot of data. This required manual filtering of the raw dataset to create a diverse dataset with as many different backgrounds, environments, and altitudes as possible. The dataset should not contain too many near-identical images, as this could cause the networks to overfit. Images of the same herd at the same place but at different altitudes were still counted as different images; only among similar images at the same altitude were some filtered out.
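The filtering was done manually, but the rule applied can be expressed as a small sketch: keep one representative per (herd location, altitude) group, so that burst duplicates at the same altitude are dropped while images of the same herd at different altitudes survive. The record fields and grouping key here are assumptions for illustration, not the thesis's actual metadata.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ImageRecord:
    filename: str
    herd_location: str   # hypothetical label for where the herd was
    altitude_m: int      # flight altitude the burst was taken at

def filter_near_duplicates(records):
    """Keep one image per (location, altitude) pair, mirroring the rule
    that images of the same herd at different altitudes count as
    different images, while burst duplicates at one altitude do not."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec.herd_location, rec.altitude_m)].append(rec)
    # Keep the first image of every group as its representative.
    return [group[0] for group in groups.values()]
```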

Figure 4.3: Examples of unusable infrared images. The left image was stored in the location of an infrared image but has a resolution of 640x480, making it unusable for both the infrared and RGB networks. The image on the right used a different colour palette than greyscale.

Labeling

After cropping the 4K images, the number of images had effectively multiplied by 20, and with it the amount of labelling work. This project's supervisor therefore recommended cooperating on sharing data. Together with another group (Magnus Falkenberg Nordvik, 2020), the RGB images were uploaded to Labelbox (Labelbox, 2020) and labelled using the website's tools. The infrared images, on the other hand, were not labelled cooperatively; they were instead labelled using labelimg (Tzutalin), as their significantly lower number made labelling less work.
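Whichever tool is used, darknet ultimately expects labels in the standard YOLO text format: one line per object, with a class index and a bounding box normalised to the image size. A minimal conversion from pixel coordinates (the function name is illustrative, assuming (x_min, y_min, x_max, y_max) input boxes):

```python
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space bounding box to a darknet YOLO label line:
    '<class> <x_center> <y_center> <width> <height>', with all box
    values normalised to [0, 1] by the image dimensions."""
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```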

4.1.2 Experiment Structure

The experiment used the darknet implementation by Alexey (AlexeyAB, a), a popular fork of the original darknet repository by the creator of YOLO (pjreddie). The repository by AlexeyAB contains many improvements over the original code, including general performance, better GPU usage, Windows support, runtime warnings, and improved metric calculations. In addition to code improvements,


AlexeyAB provides a detailed plan for improving detection on custom datasets. AlexeyAB's implementation is better optimised for training on GPUs, which mattered for this experiment, as training was done on the NTNU IDUN computing cluster (Själander et al., 2019). The cluster has more than 70 nodes and 90 GPGPUs. Half of the nodes are equipped with two or more Nvidia Tesla P100 or V100 GPGPUs, which this experiment took advantage of.
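Jobs on a cluster like IDUN are typically submitted through SLURM. The batch script below is a hypothetical sketch, not the script used in the thesis: the job name, time limit, and file paths are placeholders, while the `detector train` invocation follows AlexeyAB's documented usage (the `-map` flag computes mAP during training).

```shell
#!/bin/bash
#SBATCH --job-name=yolo-sheep   # placeholder job name
#SBATCH --gres=gpu:1            # request one GPGPU (P100/V100)
#SBATCH --time=24:00:00         # placeholder wall-time limit

# Paths and config names below are illustrative placeholders.
cd ~/darknet
./darknet detector train data/sheep.data cfg/yolov3-sheep.cfg darknet53.conv.74 -map
```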

In order to test YOLO’s performance on both infrared and RGB images, two different models were needed. Training and testing were carried out separately for each model.

The difference in size between RGB and infrared images, as well as the difference in image content, warrants different settings, and thus separate testing was preferred. For instance, the RGB images contain sheep in three different colours, whereas in the infrared images all sheep appear as white dots indicating heat.

With this in mind, these research questions were formed:

RQ1:How well does YOLOv3 perform in detecting sheep in highland terrain?

RQ2:Do infrared images improve the detection of sheep as opposed to RGB images?

The performance of a network was determined by comparing the performance data generated by the different network configurations. Many different metrics were generated when the networks were tested, but the most important ones were:

• Precision: The percentage of predictions that were correct.

• Recall: The percentage of objects in the dataset that were correctly detected.

• mAP@50: Mean average precision at an intersection-over-union threshold of 0.50.
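As a concrete illustration of these metrics (a sketch, not the evaluation code used in the experiment): a prediction counts as a true positive when its IoU with a ground-truth box reaches the threshold, and precision and recall then follow directly from the counts of correct, spurious, and missed detections.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: share of predictions that were correct.
    Recall: share of ground-truth objects that were detected."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall
```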