
During the initial exploration of creating a custom data set, Vale v1.0, many challenges were uncovered. However, even with the limited testing on Vale v1.0, further challenges unfolded during the implementation of the first batch of Vale v2.0. These concerned the sub-environments, the annotation process, the partitioning of the data set, and the data set size. These doubts, together with the final insights on the data set, are covered in this section.

4.2.1 Vale v2.0 Sub-environments

The findings in Vale v1.0 called for structural diversity in the data set.

Hence, nine videos were captured at four different locations within the environment highlighted in Subsection 4.1.4. Each location presents structural aspects unique to it; see Figure 4.10.

Two of the four locations, locations 2 and 3 in Figure 4.10, were chosen for the first batch of 200 data samples, with an initial goal of 600 samples in total, preferably 150 samples from each location.

4.2.2 Category Annotation

During the annotation process of Vale v1.0, a subset of 50 frames was annotated. The annotation strategy was fairly simple: outlining every part of the structure that belonged to one of the two categories, and extracting

(a) Location 1 (b) Location 2

(c) Location 3 (d) Location 4

Fig. 4.10: Vale v2.0 locations

the other category as the negative of the first. This guaranteed unique pixel annotation for both categories; however, this strategy obviously would not work with four categories.

To guarantee unique pixel annotation for all four categories in the initial 200 samples of Vale v2.0, a strict outlining of structures based on the category descriptions was utilized. However, doubts arose concerning annotation time and, more importantly, whether the model would learn specific environment structures rather than the elevation of the structures. This introduced the ranked category annotation approach, which was adopted for the remaining 150 of the initial 200 samples.

Approach: Strict Outlining

The straightforward annotation strategy, as used in other data sets [9], would be to trace the boundaries of each structure with regard to its category constraints. For Vale v2.0, this means labeling each structure based on the general guidelines and category descriptions; Figure 4.11b presents an illustration of labeling structures based on their structural boundaries.

Approach: Ranked Outlining

The category descriptions presented in Subsection 4.1.2 aim at a metric division of each category. However, the purpose of this data set is to differentiate between elevation differences, derived from mobile robot clearances. This suggests prioritizing the categories from highest to lowest elevation and training the CNN model on the differences between these elevations. Training on elevation differences requires two steps: 1) ranking the categories according to their elevation, i.e. non-traversable ranked highest and wheeled lowest; 2) annotating the segments in that order, including neighboring pixels from the lower-ranked segments. The first step guarantees the constraint of the category descriptions. The second step includes visual segments of lower elevations connected with higher elevations, allowing the CNN to learn not only structures of a certain elevation but the transition between a higher and a lower elevation.
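The two steps above can be sketched in Python. The numeric rank order, the `margin` parameter, and the use of morphological dilation to include neighbouring pixels are illustrative assumptions, not the exact tooling used for Vale v2.0:

```python
import numpy as np
from scipy import ndimage

# Hypothetical rank order, lowest elevation first:
# 1 = wheeled, 2 = belted, 3 = legged, 4 = non-traversable.
RANKS = [1, 2, 3, 4]

def ranked_annotation(masks, margin=2):
    """Paint per-category masks from the lowest rank to the highest.
    Each mask is dilated by `margin` pixels before painting, so a
    higher-ranked segment absorbs neighbouring pixels from the
    lower-ranked segments it is painted over."""
    shape = next(iter(masks.values())).shape
    out = np.zeros(shape, dtype=int)
    for rank in RANKS:                       # ascending elevation
        mask = masks.get(rank)
        if mask is None:
            continue
        grown = ndimage.binary_dilation(mask, iterations=margin)
        out[grown] = rank                    # higher rank overwrites lower
    return out

# Toy scene: a wheeled ground plane meeting an elevated structure.
h, w = 8, 8
masks = {1: np.zeros((h, w), bool), 4: np.zeros((h, w), bool)}
masks[1][:4] = True   # wheeled ground, rows 0-3
masks[4][4:] = True   # non-traversable structure, rows 4-7
label_map = ranked_annotation(masks, margin=1)
# The structure's annotation now extends one row into the wheeled area.
```

Painting in ascending rank order is what enforces step 2: the dilated higher-ranked mask is painted last, so it claims the boundary pixels of its lower-ranked neighbours.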

Figure 4.11a illustrates how pixels of lower-ranked elevation segments are annotated into their higher-ranked neighbors. The concept of ranked categories was used by Malberg and Rolfsen [30] during the labeling of map terrains.

(a) Ranked Category Annotation Strategy

(b) Strict Category Annotation Strategy

Fig. 4.11: The ranked category annotation strategy includes neighbouring pixels from lower-ranked categories in the annotation of their higher-ranked neighbours, while the strict category annotation strategy annotates each category based on its constraints

This approach has the advantage of avoiding elevation categories that are exclusively represented by structures, forcing the model to train on the visual features (relations) between structures of different elevation categories. Additionally, this strategy allows for more flexibility by eye-measuring the number of neighboring pixels included during annotation, outlining slightly beyond structure frontiers.

4.2.3 Category Annotation - Final Approach

As stated during the presentation of Vale v2.0 in Section 3.1, the ranked annotation strategy is utilized. This strategy was chosen to steer the model towards distinguishing elevation rather than structural differences. Figure 4.11 highlights the difference between the two strategies.

Though this approach introduces classifying lower elevation terrain as higher, it still satisfies the constraints of the mobile robots and the goal of the data set. It will slightly restrict the mobility of robots with lower-ranked traversability; however, this can be accepted as an additional safety measure, helping the mobile robots avoid collisions.

4.2.4 Border Annotation

Semantic image annotation classifies every pixel in the frame, which means the ground truth has to be accurate to the last pixel. Natural structures might have hundreds if not thousands of points in their outline polygons, and each frame in the data set contains an arbitrary number of such segments. During the annotation of Vale v1.0, where a very simple eye-measured outlining of a single category was performed, the average time for labeling one image was 5 minutes. For comparison, Cityscapes' fine pixel-level annotation reports an average time cost of 1.5 hours per image [9] with 30 categories. The same level of annotation quality is not feasible for this thesis; the time constraint does not allow it.

The initial annotation strategy, performed on the 200-sample batch, used the ignore-label concept. However, doubts concerning category annotation led to the concept of ranked category annotation, and with it the border annotation strategy of dilating segments. Based on its advantages, this approach was later adopted for the initial 200-sample batch of Vale v2.0.

Approach: Ignore Label

Utilizing an ignore label on areas where a category is difficult to determine, and then ignoring these segments during training, is common practice when trading off segment boundary accuracy against annotation speed [9], [13].

This approach offers flexibility when outlining segment boundaries, without the trade-off of incorrectly annotating categories.

Fig. 4.12: Ignoring segment boundaries in order to increase annotation speed. Source: PASCAL VOC2012 data set [13]

Approach: Dilation of segments

Dilating labeled segments offers 100% pixel annotation coverage. However, this approach trades off accuracy and accepts slight misclassification at the segment boundaries. Accepting this during dilation introduces two pixel-annotation cases at the frontiers: correctly annotated pixels, and segments with overextended frontiers, which introduce misclassification.
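The dilation idea can be sketched as follows; the pass-by-pass growth loop and the tie-breaking rule (first segment to reach a pixel claims it) are illustrative assumptions, not the thesis' actual annotation tooling:

```python
import numpy as np
from scipy import ndimage

def dilate_to_full_coverage(labels, unlabeled=0):
    """Grow every labelled segment outward, one pixel per pass, until
    no unlabeled pixels remain. Frontier pixels are claimed by whichever
    segment reaches them first, trading slight misclassification at the
    boundaries for 100% pixel annotation coverage."""
    out = labels.copy()
    while (out == unlabeled).any():
        for cat in np.unique(out):
            if cat == unlabeled:
                continue
            grown = ndimage.binary_dilation(out == cat)
            out[grown & (out == unlabeled)] = cat
    return out

# A strictly outlined row with an unlabeled gap between two segments:
row = np.array([[1, 1, 0, 0, 2, 2]])
full = dilate_to_full_coverage(row)
# → [[1, 1, 1, 2, 2, 2]]
```

Note that the function assumes at least one labeled pixel exists; the two "overextended frontier" pixels in the middle are exactly where the accepted misclassification can occur.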

4.2.5 Border Annotation - Final Approach

The concept of ranked category annotation makes the ignore-label approach redundant, since its advantage of avoiding misclassification is gone: ranked annotation already introduces "misclassification" of lower-ranked categories within higher-ranked ones. Hence, the dilation of segment frontiers presents a favorable advantage, increasing the number of annotated pixels.

4.2.6 Elevation Reference

Annotating based on elevation requires a reference base. An initial approach of global elevation referencing was used for the initial 200-sample batch of Vale v2.0. However, considering the circumstances in the data samples, a local elevation strategy was later adopted.

(a) Global elevation reference annotation strategy

(b) Local elevation reference annotation strategy

Fig. 4.13: Elevation referencing strategies. Red indicates non-traversable segments, green indicates wheeled segments

Global elevation referencing

Elevation categories based on a single global reference base maintain the global traversability constraint of the terrain, with a clear distinction of elevated segments; see Figure 4.13a. The main challenge with this approach concerns the choice of reference base: how to decide where the base should be located.

The initial reference point for elevation in the initial 200 samples was an adapted version of the global referencing concept. The reference base was set to the lowest elevation segment of the lowest elevation category. I.e., the lowest segment of the wheeled elevation category was labeled as wheeled, and any flat surface elevated above it was labeled relative to its elevation from that initial wheeled segment.
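This referencing scheme can be sketched with per-segment elevations; the clearance thresholds below are hypothetical numbers for illustration, not the thesis' actual category limits:

```python
# Hypothetical clearance thresholds in cm above the reference base:
THRESHOLDS = [(5, "wheeled"), (15, "belted"), (30, "legged")]

def categorize_globally(segment_elevations):
    """Label flat segments relative to a single global reference base,
    taken as the lowest segment (assumed wheeled)."""
    base = min(segment_elevations)          # lowest wheeled segment
    categories = []
    for elevation in segment_elevations:
        relative = elevation - base         # height above the base
        for limit, category in THRESHOLDS:
            if relative <= limit:
                categories.append(category)
                break
        else:
            categories.append("non-traversable")
    return categories

print(categorize_globally([12, 20, 60]))
# → ['wheeled', 'belted', 'non-traversable']
```

The choice of `base` is precisely the "where should the base be located" problem: every other label shifts with it.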

Local elevation referencing

The local elevation referencing strategy accounts for large segments of other elevation categories within the boundaries of a segment. This approach introduces island segments of lower elevation categories within segments of higher elevation; see Figure 4.13b. Hence, local elevation referencing avoids the noise that would arise from classifying large segments resembling a lower category as part of a higher elevation segment.

4.2.7 Elevation Reference - Final Approach

For Vale v2.0, local elevation referencing has been utilized. The argument is that a local elevation context, i.e. visual features between different elevation segments, occurs more frequently than a global elevation context: within every image, local elevation differences appear between every segment, whereas global referencing depends on the entire image, constraining the training samples to the number of images in the data set. Thus, the training percentage on local elevation differences is higher than the training percentage on global elevation differences.

Similar to the ranked category annotation strategy, the lower elevation segment boundaries within segments of higher elevation have been annotated based on eye measuring. A guideline gap of 5 cm ± 3 cm is assumed for the boundaries of the elevated segments.

4.2.8 Vale v2.0 Initial 200 Sample Batch

With the described annotation approaches, a total of 3042 unique segments of varying sizes have been annotated. Figure 4.14 presents an overview of the distribution of categories among these 3042 segments: 94 (3.09%) wheeled, 581 (19.10%) belted, 986 (32.41%) non-traversable, and 1381 (45.40%) legged segments. This is a reasonable distribution relative to the sizes of each category's segments. Wheeled areas usually cover large flat surfaces with few separations between them, as opposed to legged areas, where spread-out structures are numerous.
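As a quick sanity check, the category shares follow directly from the reported segment counts (note that the non-traversable share rounds to 32.41%):

```python
# Segment counts reported for the initial 200 sample batch of Vale v2.0.
counts = {"wheeled": 94, "belted": 581, "non-traversable": 986, "legged": 1381}
total = sum(counts.values())
shares = {cat: round(100 * n / total, 2) for cat, n in counts.items()}
print(total)   # → 3042
print(shares)
# → {'wheeled': 3.09, 'belted': 19.1, 'non-traversable': 32.41, 'legged': 45.4}
```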

Fig. 4.14: Segment count on the initial 200 sample batch, Vale v2.0

Figure 4.15 presents the distribution of segmented pixels per elevation category. Here the findings show that the belted elevation category covers more than half of the total sample coverage. This is expected, as the content of the sub-environments for the initial 200 sample batch is mainly grass and gravel, which falls under the belted category constraints. Also, the distributions show that less than 2% of the pixels have been left for dilation annotation.