

4.2.2 Proposed Method

A different solution is therefore proposed in this thesis: first train the neural network with heavy augmentations, then fine-tune with little or no augmentation. The hypothesis is that such a two-stage data augmentation scheme will reduce overfitting and increase performance. Similar methods may have been applied before, but no references were found in the literature.

The heavy augmentations, although perhaps breaking some of the images, should produce a useful starting point for the weights of neural networks. In fact, training on artificially augmented images, followed by fine-tuning without augmentations, can be viewed as a method of transfer learning. This comparison was very briefly drawn by Mikołajczyk and Grochowski [41].

One objective of the heavy/light two-stage augmentation scheme is that it be general and applicable to any highly augmentable dataset. This would remove the need to individually tune the augmentation configuration for different networks and datasets. In this thesis, only different networks are tested; investigating how the scheme generalizes to other datasets is suggested as further work.

Heavy data augmentation will not replace standard transfer learning. Rather, it is an intermediate step between general images and corrosion images. The majority of training epochs (e.g. 30–100) should be performed with heavy augmentations, as the heavily augmented data is more difficult. Training is then continued with little or no augmentation for relatively few epochs (e.g. 1–30).
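As a rough illustration of the scheme, a minimal training-loop sketch is given below. It assumes a Keras-style model object and two batch generators (one per augmentation scheme); the function name, epoch counts and generator interface are illustrative and not taken from the thesis code.

    def train_two_stage(model, heavy_batches, light_batches,
                        heavy_epochs=60, light_epochs=10, steps_per_epoch=100):
        """Two-stage scheme: heavy augmentation first, then light fine-tuning.

        `model` is assumed to be a Keras-style model; `heavy_batches` and
        `light_batches` are generators yielding (images, masks) batches produced
        under the heavy and light augmentation schemes, respectively.
        """
        # Stage 1: the majority of epochs on heavily augmented (more difficult) data.
        model.fit(heavy_batches, steps_per_epoch=steps_per_epoch, epochs=heavy_epochs)
        # Stage 2: continue from the same weights with little or no augmentation,
        # analogous to fine-tuning in transfer learning.
        model.fit(light_batches, steps_per_epoch=steps_per_epoch, epochs=light_epochs)
        return model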

There are a number of ways to construct a generic, heavy augmentation scheme. The transformations used in this project were based on examples and default values in the imgaug documentation [28]. The resulting full, detailed augmentation scheme used in this thesis can be found in Appendix C. An overview of the applied transformations is given below.

Geometric Transformations

Geometric transformations are applied identically to both the image and the corresponding segmentation mask. The following geometric transformations were used in the heavy augmentation scheme; a code sketch follows the list.

• Horizontal/Vertical Flipping: As corrosion damages have no directional orientation, we can randomly flip images both horizontally and vertically. This was applied to each image 50 % of the time.

• Rotation: Correspondingly, damages can be rotated. With random flipping both horizontally and vertically, we need only randomly rotate images ±45° to effectively obtain full rotational freedom. Rotation was applied 50 % of the time.

• Scale: Damages can be of any size. To increase scale invariance, images were randomly scaled to 80 %–120 % of their original size 50 % of the time. Scaling was applied individually per axis (horizontally and vertically).

• Translation: Translating a damage, either horizontally, vertically or both, can increase positional invariance. Images were therefore randomly translated ±20 % relative to the height/width of the image, individually per axis. This, too, was performed 50 % of the time for each image.

• Shear Mapping: Stretching rectangular images to a parallelogram alters the shape of the damage and can increase the robustness of the model. Shearing images randomly in the range ±16° was applied 50 % of the time.

Some of the above transformations generate new pixels, e.g. to fill the empty space in corners when rotating an image. New pixels were assigned a solid random color.
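A minimal imgaug sketch of such a geometric pipeline is shown below, applying identical transformations to the image and its segmentation mask. The parameter values mirror the list above, but the exact configuration used in this thesis is the one in Appendix C; the dummy image and mask are placeholders.

    import numpy as np
    import imgaug.augmenters as iaa
    from imgaug.augmentables.segmaps import SegmentationMapsOnImage

    sometimes = lambda aug: iaa.Sometimes(0.5, aug)  # each transformation fires 50 % of the time
    geometric = iaa.Sequential([
        iaa.Fliplr(0.5),  # horizontal flip
        iaa.Flipud(0.5),  # vertical flip
        sometimes(iaa.Affine(scale={"x": (0.8, 1.2), "y": (0.8, 1.2)}, cval=(0, 255))),
        sometimes(iaa.Affine(translate_percent={"x": (-0.2, 0.2), "y": (-0.2, 0.2)}, cval=(0, 255))),
        sometimes(iaa.Affine(rotate=(-45, 45), cval=(0, 255))),  # degrees
        sometimes(iaa.Affine(shear=(-16, 16), cval=(0, 255))),   # degrees
    ])
    # cval=(0, 255): new pixels (e.g. corners after rotation) get a solid random value.

    image = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)  # dummy image
    mask = np.zeros((128, 128, 1), dtype=np.int32)                    # dummy mask
    segmap = SegmentationMapsOnImage(mask, shape=image.shape)
    image_aug, segmap_aug = geometric(image=image, segmentation_maps=segmap)
    mask_aug = segmap_aug.get_arr()  # mask transformed identically to the image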

Non-Geometric Transformations

Non-geometric transformations alter the pixel values and can increase robustness to different light conditions, contrast levels, color variations, etc. Such augmentations are only applied to the images and not the segmentation masks. The following non-geometric transformations were used in the heavy augmentation scheme. In the following, "sometimes" means that a random selection of up to 5 of the listed transformations was applied to each image; a code sketch is given after the list.

• Brightness: As images during inspection can be taken under vastly different lighting conditions, it is beneficial to alter the brightness of images. This was obtained by sometimes multiplying either all pixel values or random subareas by a value in the range 0.5–1.5.

• Add: Somewhat similar to brightness, sometimes a random value in the range −10 to 10 was added to all pixels. Half the time, the value was sampled individually per channel (RGB).

• Contrast: For similar reasons, adjusting the contrast in images is beneficial. Linear contrast adjustment was therefore sometimes used, half the time individually per channel.

• Hue and Saturation: Transforming images from the RGB color space to HSV (hue, saturation, value) allows for easy change of color (hue) and its intensity (saturation). Adjusting these values and transforming back to RGB can increase robustness to color variations. Changes of ±20 % were sometimes applied.

• Grayscale: Similar to hue/saturation, further robustness to colors can be achieved by sometimes randomly converting images partly to grayscale.

• Blur: To simulate areas out of camera focus, blur can be added. Sometimes either Gaussian, average or median blur was applied with moderate, random intensity.

• Invert: Sometimes image channels were inverted with a probability of 5 % (sampled individually per channel), i.e. pixel value v was changed to 255 − v.

• Gaussian Noise: Robustness to noise can be achieved by sometimes randomly adding noise. Half the time, the noise was sampled individually per channel and pixel, the other half of the time it was only sampled once per pixel.

• Edge Detection: Edges are highly relevant features for neural networks. Sometimes a method of edge detection was applied, highlighting the edges in images.

• Sharpen: Sometimes images were sharpened, i.e. enhancing the definition of edges. This is somewhat similar to edge detection above, but is a more natural approach as the edges are not artificially highlighted.

• Emboss: Emboss is yet another method of enhancing the appearance of edges. This was sometimes applied.

• Superpixel: Sometimes images were converted to their superpixel representation (i.e. grouping pixels based on similarity, effectively reducing the resolution of the color space), with each superpixel replaced by a common color 50 % of the time. This effectively creates a cartoon-like effect, increasing robustness to image resolution.

• Dropout: Sometimes small black boxes were artificially added to images. This may remove common features of a damage, forcing neural networks to also pay attention to less important features.
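The sketch below expresses the "sometimes" mechanic with imgaug's SomeOf, which applies a random subset of up to five of the listed augmenters in random order. The parameter values are illustrative, largely following imgaug documentation defaults; the configuration actually used is given in Appendix C.

    import imgaug.augmenters as iaa

    non_geometric = iaa.SomeOf((0, 5), [
        iaa.Multiply((0.5, 1.5), per_channel=0.5),         # brightness
        iaa.Add((-10, 10), per_channel=0.5),               # add
        iaa.LinearContrast((0.75, 1.5), per_channel=0.5),  # contrast
        iaa.AddToHueAndSaturation((-20, 20)),              # hue and saturation
        iaa.Grayscale(alpha=(0.0, 1.0)),                   # partial grayscale
        iaa.OneOf([                                        # blur: one of three kinds
            iaa.GaussianBlur((0.0, 3.0)),
            iaa.AverageBlur(k=(2, 7)),
            iaa.MedianBlur(k=(3, 11)),
        ]),
        iaa.Invert(0.05, per_channel=True),                # invert channels w.p. 5 %
        iaa.AdditiveGaussianNoise(scale=(0.0, 0.05 * 255), per_channel=0.5),
        iaa.EdgeDetect(alpha=(0.0, 0.7)),                  # edge detection
        iaa.Sharpen(alpha=(0.0, 1.0), lightness=(0.75, 1.5)),
        iaa.Emboss(alpha=(0.0, 1.0), strength=(0.0, 2.0)),
        iaa.Superpixels(p_replace=0.5, n_segments=(20, 200)),
        iaa.CoarseDropout((0.03, 0.15), size_percent=(0.02, 0.05)),  # small boxes
    ], random_order=True)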

Figure 4.6 shows a few results of the heavy data augmentation scheme on three example images.

Tested Augmentation Schemes

The strength of the heavy data augmentation is easily adjusted by two simple parameters: the number of non-geometric transformations performed, and the probability of performing a geometric transformation. Adjusting individual transformation parameters/intensities is of course also possible, but contradicts the purpose of this being a general scheme. A sketch of such a two-parameter interface is given below.
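The hypothetical factory function below illustrates this two-parameter interface; the augmenter lists are truncated for brevity, and the semi-heavy values are illustrative rather than those of Appendix C.

    import imgaug.augmenters as iaa

    def make_scheme(p_geometric: float, max_non_geometric: int) -> iaa.Sequential:
        """Build an augmentation scheme whose strength is set by two knobs only."""
        maybe = lambda aug: iaa.Sometimes(p_geometric, aug)
        return iaa.Sequential([
            maybe(iaa.Fliplr(1.0)),                              # flips
            maybe(iaa.Flipud(1.0)),
            maybe(iaa.Affine(rotate=(-45, 45), cval=(0, 255))),  # rotation (list truncated)
            iaa.SomeOf((0, max_non_geometric), [                 # non-geometric subset
                iaa.Multiply((0.5, 1.5)),
                iaa.LinearContrast((0.75, 1.5)),
                iaa.GaussianBlur((0.0, 3.0)),
            ], random_order=True),
        ])

    heavy = make_scheme(p_geometric=0.5, max_non_geometric=5)
    semi_heavy = make_scheme(p_geometric=0.25, max_non_geometric=3)  # illustrative values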

Figure 4.6: Heavy data augmentation scheme on three example images ((a)–(c)). Many of these results are too heavily augmented for regular use in neural networks. However, training on such images lays a foundation for further fine-tuning with little or no data augmentation.

Chapter 5 shows the results of using the above-described heavy data augmentation followed by light augmentation in the form of horizontal and vertical flipping. A semi-heavy version, in which the strength of the scheme was reduced, was also tested. Finally, two single-stage baselines were tested, each applied for the entire training process: flipping only, and various augmentations considered realistic, at low intensities. See Appendix C for details.