
With the advancement of autonomous perception and the recognized potential of lidar in this field, several methods for object detection based on point cloud data have emerged. The field of autonomy is constantly evolving, which means that new methods arise regularly. The following sections introduce related work on object detection, including methods based on neural networks, clustering, and the use of intensity and reflection data to enhance classification. Inspired by the findings from the related work, and with the goal of this project in mind, a method to proceed with will be chosen.

3.1 Lidar Detection

There are several methods that use CNNs to do object detection in point clouds. As mentioned in section 2.3.1, CNNs specialize in data with grid-like topology. The 3D scenes captured by the lidars used in this project are in the form of sparse and irregular point clouds. This gives rise to the problem of structuring and representing the 3D data in an efficient manner.

To cope with this problem, grid-based methods have been developed. These methods generally represent the data as 3D voxels, as for instance in [64, 59, 54], or as 2D maps, such as in [11, 60, 33, 62]. The 2D maps can include both bird's eye view and front view to perceive the environment. These representations can be processed by 3D or 2D CNNs to extract features for 3D detection. There also exist point-based methods, such as [36, 53], which are based on PointNet [37]. These methods extract features directly from the point cloud for detection. Grid-based methods are generally more computationally efficient, but are more prone to information loss than point-based methods. PV-RCNN [52] is a method that utilizes features from both grid-based and point-based methods.
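
To illustrate the grid-based representation, a minimal voxelization of a point cloud can be sketched as follows. The voxel size and the point cloud are hypothetical; real grid-based detectors add feature encoding on top of this binning step.

```python
import numpy as np

def voxelize(points, voxel_size=0.2):
    """Map each 3D point to a voxel index by flooring its coordinates.

    points: (N, 3) array of x, y, z coordinates.
    Returns a dict mapping voxel index (i, j, k) -> list of point indices.
    """
    indices = np.floor(points / voxel_size).astype(int)
    voxels = {}
    for n, idx in enumerate(map(tuple, indices)):
        voxels.setdefault(idx, []).append(n)
    return voxels

# A small synthetic point cloud: two nearby points and one far away.
pts = np.array([[0.05, 0.05, 0.0],
                [0.10, 0.15, 0.1],
                [5.05, 5.05, 0.0]])
voxels = voxelize(pts, voxel_size=0.2)
print(len(voxels))  # the two nearby points share one voxel: 2 voxels total
```

A CNN would then operate on features aggregated per voxel, such as the point count or mean intensity.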

Clustering is a technique for structuring a finite data set into a set of clusters. The data points in a cluster share similar structure, as defined by the method used. By evaluating the geometric properties and the number of points in a cluster, it is possible to extract valuable information from it.

This information can be used for object localization in the point cloud, for instance if the geometric constraints of an object are known. There exist several techniques for clustering, of which DBSCAN [40] and OPTICS [4] are notable methods. In recent years, a cone detection technique has been developed at Revolve NTNU. It uses Euclidean clustering with certain conditions, together with a series of filtering techniques, to extract cone candidates. It utilizes the geometric properties of a cone to determine whether a given cluster is a cone candidate. It has a fast processing time, but is prone to false positives.
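
The clustering-and-filtering idea can be sketched with DBSCAN from scikit-learn standing in for Euclidean clustering. The cone dimensions and thresholds below are hypothetical and not those of the Revolve NTNU pipeline:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cone_candidates(points, eps=0.3, min_points=5,
                    max_width=0.4, max_height=0.5):
    """Cluster a point cloud and keep clusters whose bounding box
    fits the (hypothetical) geometric constraints of a cone."""
    labels = DBSCAN(eps=eps, min_samples=min_points).fit_predict(points)
    candidates = []
    for label in set(labels) - {-1}:            # -1 marks noise points
        cluster = points[labels == label]
        extent = cluster.max(axis=0) - cluster.min(axis=0)
        if extent[0] <= max_width and extent[1] <= max_width \
                and extent[2] <= max_height:
            candidates.append(cluster)
    return candidates

# Synthetic scene: a tight cone-sized cluster and a large wall-like cluster.
rng = np.random.default_rng(0)
cone = rng.uniform([0, 0, 0], [0.2, 0.2, 0.3], size=(20, 3))
wall = rng.uniform([5, 0, 0], [5.2, 3.0, 1.5], size=(500, 3))
print(len(cone_candidates(np.vstack([cone, wall]))))  # only the cone passes
```

The false positives mentioned above arise when background clusters happen to satisfy such geometric constraints.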

3.2 Lidar Intensity

One of the key attributes of a lidar is that it can provide distance measurements with an accuracy down to a few centimeters. Most of the detection methods mentioned in the previous sections rely only on the positional data of the points, which may leave potential unused.

Many lidars provide intensity and reflectivity information that could be used to enhance classification. It is not widely used, but the potential is recognized, as seen in the paper by Scaioni et al. [49]. The paper concludes that using intensity data has the potential to improve classification in certain areas, mostly with examples for airborne use. Examples of such methods are given in [38, 26].

A case in which reflection data is used to improve classification for autonomous perception is presented in the paper by L. Zhou and Z. Deng [63]. They use a linear support vector machine to classify traffic signs based on camera and lidar data, where the lidar enhances the image classification by providing 3D position together with reflection data. Another example is given by Hernández et al. [13], in which the clustering technique DBSCAN and reflectivity data from a laser rangefinder are used to find line surfaces for autonomous navigation. Methods for improving segmentation [55] and detection [5] using intensity and reflectivity data from the lidar have also been proposed.

The lidar manufacturer Ouster has updated their drivers to output images based on lidar intensity data. This works by mapping the intensity data from the points in the point cloud to a grey-scale image. The result is similar to a conventional image, except that accurate depth information can also be extracted from it. Since it resembles conventional images, methods for object detection in images can be applied [1].
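
The mapping from intensity to a grey-scale image can be sketched as follows, assuming a hypothetical lidar whose points arrive in scan order with a fixed number of beams and azimuth steps. This is a simplification of the idea, not the Ouster driver's actual implementation:

```python
import numpy as np

def intensity_image(intensities, rows=64, cols=1024):
    """Arrange per-point intensity values into a rows x cols grey-scale
    image, normalized to 8-bit. Assumes the points arrive in scan order
    (beam-major), which is a simplification of a real lidar driver."""
    img = intensities.reshape(rows, cols).astype(float)
    img -= img.min()                      # shift minimum to zero
    if img.max() > 0:
        img = img / img.max()             # scale to [0, 1]
    return (img * 255).astype(np.uint8)

# Synthetic intensity returns for a 64-beam, 1024-step scan.
rng = np.random.default_rng(1)
scan = rng.uniform(0, 1000, size=64 * 1024)
img = intensity_image(scan)
print(img.shape, img.dtype)  # (64, 1024) uint8
```

Because each pixel corresponds to a known beam and azimuth, the range of that point can be looked up directly, which is what distinguishes this image from a conventional camera image.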

In the paper by Gosala et al. [19], approaches related to perception and state estimation for an autonomous race car are presented. The race car is designed for the same framework as that of Revolve NTNU, the Formula Student competition. As part of their perception system, they propose a method for recognizing cone colors based on the intensity data from a lidar. Their method maps a cone to a 32x32 grey-scale image, where each pixel represents an intensity value from the point cloud. The image is then classified using a CNN.
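
A sketch of such a cone-to-patch mapping is shown below. The binning scheme is hypothetical, since Gosala et al. do not specify their exact procedure; here each point's lateral and vertical position selects a pixel that stores the point's intensity:

```python
import numpy as np

def cone_to_patch(cluster, size=32):
    """Map a cone cluster's points onto a size x size grey-scale patch.
    cluster: (N, 4) array of x, y, z, intensity.
    Each point's lateral (y) and vertical (z) position selects a pixel,
    which stores the point's intensity. Binning details are hypothetical."""
    xyz, intensity = cluster[:, :3], cluster[:, 3]
    lo = xyz[:, 1:].min(axis=0)
    span = xyz[:, 1:].max(axis=0) - lo
    span[span == 0] = 1.0                      # avoid division by zero
    uv = ((xyz[:, 1:] - lo) / span * (size - 1)).astype(int)
    patch = np.zeros((size, size))
    patch[size - 1 - uv[:, 1], uv[:, 0]] = intensity   # z grows upwards
    return patch

# Synthetic cone cluster: 40 points with random positions and intensities.
rng = np.random.default_rng(2)
cluster = rng.uniform([0, -0.1, 0, 0], [0.2, 0.1, 0.3, 255], size=(40, 4))
patch = cone_to_patch(cluster)
print(patch.shape)  # (32, 32)
```

The resulting patch can be fed to a small CNN classifier in the same way as any grey-scale image.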

3.3 Choice of Method

The method of choice should be able to detect and classify the cones, preferably with cone color. It should have a fast processing time, be accurate, and have high precision and recall. Due to the need for labeled training data in the neural network approaches, and since an implementation of Euclidean clustering already exists, Euclidean clustering is chosen for localizing cone candidates. As mentioned, it has a fast processing time, but is prone to false positives, giving it low precision.

To increase precision, a method for classifying the cone candidates is suggested. Inspired by the front view approach from the detection methods, the use of the Ouster as a camera, and the cone color classification by Gosala et al. [19], a method of projecting the cone candidates to images is proposed. This serves both to improve precision and to classify color. The overall method is similar to many traditional object detection methods on images, such as R-CNN [17], since the detection system finds regions of interest which are then classified by a CNN.
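
Projecting a cone candidate into an image can be sketched with a standard pinhole camera model. The intrinsic parameters below are hypothetical, not calibrated values from this project:

```python
import numpy as np

def project_to_image(points, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Project 3D points (camera frame: x right, y down, z forward)
    to pixel coordinates with a pinhole model; intrinsics are hypothetical."""
    pts = points[points[:, 2] > 0]        # keep points in front of the camera
    u = fx * pts[:, 0] / pts[:, 2] + cx
    v = fy * pts[:, 1] / pts[:, 2] + cy
    return np.stack([u, v], axis=1)

# A cone candidate centroid 10 m straight ahead projects to the image center.
centroid = np.array([[0.0, 0.0, 10.0]])
print(project_to_image(centroid))  # [[320. 240.]]
```

The projected region of interest around each candidate can then be cropped and passed to the CNN for classification.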
