
[Figure: Classification scores with different amounts of simulated data]

6.1 Lidar-Lidar Fusion

6.1.1 Calibration

Calibration is an important prerequisite for a lidar-lidar fusion framework, as it finds the transformation between the lidars locally on the vehicle. The calibration was initially believed to be successful, as the translation and the rotation in yaw looked well aligned in both point clouds after applying the transformation to the Pandar20B cloud. The result from scenario 1, however, showed otherwise, as illustrated by the deviation in pitch (figure 5.15). Even a slight deviation of, for instance, 1° results in an error of 0.463 m at a range of 25 m, which is more than the height of a cone. This probably affected the results from lidar-lidar localization substantially, and illustrates how important a thorough verification of the calibration is.
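For reference, the geometry behind this estimate is simple: a pitch deviation Δθ displaces a point at range r vertically by approximately

e ≈ r · tan(Δθ) ≈ r · Δθ (for small Δθ, in radians)

so the offset grows linearly with range, and a calibration error that is invisible on nearby structures can exceed the height of a cone at the ranges relevant for racing.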

Due to test conditions, it was not possible to re-calibrate based on new scenery. Two different attempts at re-calibration were therefore made with the existing data: one by changing the NDT parameters introduced in section 4.3.2, the other by a point-to-point based ICP.

Neither method resulted in a better transformation, so the ICP method will not be explained further. Possible causes are the selected scenery and the differences between the lidars. The scenery chosen for calibration contained many buildings in close proximity to the lidars. This may not be a concern for translation and rotation in yaw, but it can hide small errors in pitch, which only become visible at longer distances. The differences between the lidars are the number of channels and the angular resolutions. These can make the clouds harder to match, as a point cloud from one lidar is not a rigid transformation of a point cloud from the other.

There are also other methods that can be used for calibration. Another registration algorithm that can be tested is ICP [7] with other variations, for example the point-to-plane variation. The point-to-plane method matches a point from one point cloud to a plane constructed in the other point cloud, rather than matching each point to another point, as in the point-to-point method. Variations of ICP are included in the PCL [21].
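As a minimal sketch of what such an attempt could look like with the PCL, assuming plain XYZ clouds from the two lidars (variable names and parameter values are illustrative, not tuned):

```cpp
#include <pcl/common/io.h>
#include <pcl/features/normal_estimation.h>
#include <pcl/point_types.h>
#include <pcl/registration/icp.h>

using CloudXYZ = pcl::PointCloud<pcl::PointXYZ>;
using CloudN   = pcl::PointCloud<pcl::PointNormal>;

// Point-to-plane ICP needs surface normals in the clouds.
CloudN::Ptr withNormals(const CloudXYZ::Ptr& cloud)
{
  pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
  pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
  ne.setInputCloud(cloud);
  ne.setRadiusSearch(0.5);  // neighborhood radius in metres (illustrative)
  ne.compute(*normals);
  CloudN::Ptr out(new CloudN);
  pcl::concatenateFields(*cloud, *normals, *out);
  return out;
}

// Estimate the Pandar20B -> Pandar40 transformation with a point-to-plane
// error metric instead of the point-to-point one tested in this thesis.
Eigen::Matrix4f calibratePointToPlane(const CloudXYZ::Ptr& pandar20b,
                                      const CloudXYZ::Ptr& pandar40)
{
  pcl::IterativeClosestPointWithNormals<pcl::PointNormal, pcl::PointNormal> icp;
  icp.setInputSource(withNormals(pandar20b));
  icp.setInputTarget(withNormals(pandar40));
  icp.setMaxCorrespondenceDistance(1.0);  // metres (illustrative)
  icp.setMaximumIterations(100);
  CloudN aligned;
  icp.align(aligned);
  return icp.getFinalTransformation();
}
```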

NDT and ICP are general registration algorithms that can be used for many purposes; they are not developed or optimized specifically for lidar-lidar calibration. It is possible to look into more procedural methods, for instance the one proposed by Pusztai et al. [35]. They propose methods for calibrating a multi-lidar, multi-camera system, including lidar-lidar calibration, using a setup of boxes combined with point registration of the corners of the boxes to obtain the transformation.

6.1.2 Ego-Motion Compensation

For high velocities, ego-motion compensation is essential for both lidar-lidar fusion and single-lidar methods. The results obtained from ROS-bags with Ouster OS1-64 data highlight how much the potential distortion can affect cone placement (figure 5.5). The main concepts of the proposed framework are also visible: the level of correction depends on the velocity of the vehicle, and the point cloud is corrected more at the start of a sweep than at the end, where the ground truth is set.

The goal for the vehicle is to drive at an average of 17 m/s, which means that speeds above 20 m/s will likely be recorded. The velocities recorded with the Ouster OS1-64 are not high enough to prove the concept, but they were useful for development purposes. The framework should therefore be tested and verified in situations with higher velocities.

The method proposed in section 4.4.1 assumes constant velocities during a sweep, along with negligible deviations in roll and pitch. These assumptions can be optimistic given the conditions a race car is exposed to. Angular velocities in roll and pitch are available from the state estimation system on the vehicle and can therefore easily be added to the compensation if necessary. If the assumption of constant velocities during a sweep is too optimistic, it is possible to divide a sweep into smaller segments, which typically can be done in the lidar driver.

For instance, if a point cloud from a 20 Hz sweep is divided into 20 segments, each segment corresponds to (1/20)/20 s = 1/400 s, i.e. 400 Hz, which is the rate at which the state estimation is published.
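A minimal sketch of the per-point correction under these assumptions (constant body-frame velocity and yaw rate over the sweep; the per-point time offsets are assumed available from the lidar driver):

```cpp
#include <cmath>
#include <vector>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

// Move every point into the vehicle frame at the END of the sweep, where the
// ground truth pose is set. t_to_end[i] is the time (s) from when point i was
// captured until the sweep ends; vx, vy (m/s) and yawRate (rad/s) come from
// state estimation and are assumed constant during the sweep.
void compensateSweep(pcl::PointCloud<pcl::PointXYZI>& cloud,
                     const std::vector<float>& t_to_end,
                     float vx, float vy, float yawRate)
{
  for (std::size_t i = 0; i < cloud.size(); ++i) {
    auto& p = cloud[i];
    const float dt  = t_to_end[i];
    const float psi = yawRate * dt;  // yaw accumulated after the point was captured
    const float c = std::cos(psi), s = std::sin(psi);
    // Subtract the translation travelled after capture, then rotate the
    // point into the end-of-sweep frame: p_end = R(psi)^T * (p - v * dt).
    const float dx = p.x - vx * dt, dy = p.y - vy * dt;
    p.x =  c * dx + s * dy;
    p.y = -s * dx + c * dy;
  }
}
```

Splitting the sweep into segments, as suggested above, amounts to running this correction per segment with the velocity sample valid for that segment.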

6.1.3 Synchronization

The synchronization method counteracts the potential time difference between incoming data from the lidars, which at high velocities can result in point clouds originating from different global positions. The method was implemented and used for merging the data during the experiments, which were conducted stationary. It was also tested while moving the trolley at walking speed. Although the data looked good, this was not considered valid verification due to the low speed.

The contribution for synchronization is therefore a framework that relates incoming data to state estimation based on data time-stamped in ROS. This yields a global transformation for each point cloud, which is used to relate the clouds to each other globally. The method should, however, also be tested and verified at high velocities.
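A minimal sketch of such a framework on top of ROS tf2, assuming the state estimation is published as a transform into a global frame here called "odom" (frame name and timeout are placeholders, not the thesis' actual configuration):

```cpp
#include <geometry_msgs/TransformStamped.h>
#include <ros/ros.h>
#include <sensor_msgs/PointCloud2.h>
#include <tf2_ros/buffer.h>
#include <tf2_ros/transform_listener.h>
#include <tf2_sensor_msgs/tf2_sensor_msgs.h>

// Looks up the global pose at each cloud's own time stamp, so that clouds
// from the two lidars can be expressed in one common frame before merging.
class CloudSynchronizer
{
public:
  CloudSynchronizer() : listener_(buffer_) {}

  bool toGlobal(const sensor_msgs::PointCloud2& in, sensor_msgs::PointCloud2& out)
  {
    try {
      // Transform valid at the time the cloud was stamped, not "latest".
      const geometry_msgs::TransformStamped tf = buffer_.lookupTransform(
          "odom", in.header.frame_id, in.header.stamp, ros::Duration(0.05));
      tf2::doTransform(in, out, tf);
      return true;
    } catch (const tf2::TransformException& ex) {
      ROS_WARN("Synchronization failed: %s", ex.what());
      return false;
    }
  }

private:
  tf2_ros::Buffer buffer_;
  tf2_ros::TransformListener listener_;
};
```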

6.2 Object Detection

6.2.1 Localization

The goal of the localization framework is to find cone candidates in 3D-space by using clustering and filtering. It is adaptable and was tested on Pandar20B, Pandar40 and fusion data. As mentioned in section 4.6, tuning is an important aspect of success. A larger voxel size gives a smaller resulting point cloud, which positively affects the processing time. On the other hand, it can over-generalize the cloud and cause cones not to be detected. The choice of clustering threshold is directly affected by the voxel size, as larger voxels give greater distances between points. From experimentation with tuning, it is believed that one of the biggest contributors to filtering out false positives is the check on cluster size, which enforces a maximum width and height for a candidate. The outlier rejection also contributes substantially by removing candidates that are too close to each other (figure 5.7).
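A condensed sketch of these stages with the PCL is given below; the outlier rejection step is omitted, and all thresholds are illustrative placeholders, not the tuned values from the thesis.

```cpp
#include <vector>
#include <Eigen/Core>
#include <pcl/common/centroid.h>
#include <pcl/common/common.h>
#include <pcl/filters/voxel_grid.h>
#include <pcl/search/kdtree.h>
#include <pcl/segmentation/extract_clusters.h>

std::vector<Eigen::Vector4f> findConeCandidates(
    const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud)
{
  // 1. Voxel downsampling: larger leaves mean faster processing, but risk
  //    generalizing small cones away.
  pcl::PointCloud<pcl::PointXYZ>::Ptr down(new pcl::PointCloud<pcl::PointXYZ>);
  pcl::VoxelGrid<pcl::PointXYZ> voxel;
  voxel.setInputCloud(cloud);
  voxel.setLeafSize(0.05f, 0.05f, 0.05f);
  voxel.filter(*down);

  // 2. Euclidean clustering; the tolerance must grow with the voxel size,
  //    since larger voxels give greater distances between points.
  pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(
      new pcl::search::KdTree<pcl::PointXYZ>);
  tree->setInputCloud(down);
  std::vector<pcl::PointIndices> clusters;
  pcl::EuclideanClusterExtraction<pcl::PointXYZ> ec;
  ec.setClusterTolerance(0.15);
  ec.setMinClusterSize(3);
  ec.setSearchMethod(tree);
  ec.setInputCloud(down);
  ec.extract(clusters);

  // 3. Size check: reject clusters wider or taller than a cone can be.
  std::vector<Eigen::Vector4f> candidates;
  for (const auto& c : clusters) {
    Eigen::Vector4f minP, maxP;
    pcl::getMinMax3D(*down, c, minP, maxP);
    const Eigen::Vector4f dim = maxP - minP;
    if (dim.x() < 0.3f && dim.y() < 0.3f && dim.z() < 0.4f) {
      Eigen::Vector4f centroid;
      pcl::compute3DCentroid(*down, c, centroid);
      candidates.push_back(centroid);
    }
  }
  return candidates;
}
```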

The scenarios highlight the positive aspects of using lidar for autonomous detection, especially with regard to localization. In a stationary situation, the candidates generally had low deviation between callbacks, which means that the position of a candidate in 3D-space can be trusted. The standard deviations were not affected by the distance to a candidate (figures 5.12b, 5.13b, 5.14b).

The results also show that clustering and filtering have low processing time (table 5.6): candidates were processed and found well within the 20 Hz (0.05 s) at which the lidar publishes data. This means that every point cloud from a lidar can be processed without storing or neglecting any of the data, and it leaves room for the processing time of classification.

Based on scenario 1 (section 5.3.6) and scenario 3 (section 5.3.7), combining the localizations from the Pandar20B and the Pandar40 could be beneficial. The two lidars complement each other and are, combined, capable of seeing all of the cones in each scenario up to a certain distance. The combination can be done with asynchronous streams of localizations, or as fusion based on the localized candidates, as sketched below. The lidar-lidar fusion, on the other hand, is unable to see all the cones, which is likely due to the calibration discussed in section 6.1.1. This probably affected the results to such an extent that lidar-lidar fusion cannot be compared to the single-lidar configurations in terms of localization. With a successful calibration, it can still contribute positively. Figure 5.14a shows that the fusion approach was able to locate all the nearby cones, where the error in pitch probably did not affect the localization. Each cone localized and reconstructed by the fusion approach contains more data than with the single-lidar approaches, which is favorable for classification.
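A minimal sketch of the candidate-level variant (function name and threshold are hypothetical): candidates from the two lidars that lie within a small radius of each other are treated as the same cone and averaged, while the rest are kept from whichever lidar saw them.

```cpp
#include <vector>
#include <Eigen/Core>

std::vector<Eigen::Vector3f> mergeCandidates(
    const std::vector<Eigen::Vector3f>& fromP20b,
    const std::vector<Eigen::Vector3f>& fromP40,
    float mergeRadius = 0.5f)  // metres, illustrative
{
  std::vector<Eigen::Vector3f> merged = fromP20b;
  for (const auto& cb : fromP40) {
    bool matched = false;
    for (auto& ca : merged) {
      if ((ca - cb).norm() < mergeRadius) {
        ca = 0.5f * (ca + cb);  // same cone seen by both lidars: average
        matched = true;
        break;
      }
    }
    if (!matched) merged.push_back(cb);  // cone seen by only one lidar
  }
  return merged;
}
```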

There are also some interesting aspects when it comes to recall. The experiments were carried out stationary, which means that the scenery is expected to be constant. Although the lidars and cones stood still, the recall varied between the different configurations (tables in appendix C).

A possible explanation is the noise/accuracy of the lidar itself, as points from one sweep might not appear in the same location as the points in the next sweep. This can cause points to break the rules given in the localization method, i.e. the clustering threshold and the height/width criteria. It might explain the small variations in recall, especially at the lower and upper ends of the recall scores: with low recall, a candidate normally not located gets localized a few times; with high recall, a candidate normally located is missed a couple of times. The candidates with recall in the middle of the spectrum are more difficult to explain (ID:0 P40 scenario 3 figure 5.18a; ID:13, ID:14 and ID:2 P20 scenario 1 figure 5.12a). Possible explanations are noise, small movements, interference between the lidars or software related problems.

Another aspect is weight. For a race car, added weight is a concern, and a large part of building a race car is optimizing for lower weight. Two lidars are heavy and increase the need for processing power and wiring on the vehicle. If only one lidar is to be used, the experiments indicate that the Pandar40 is the better choice. It has the best recall of all methods and provides more information per cone due to its larger number of channels. The data from the Pandar20B was notably noisier, which can contribute to lower recall and slightly higher standard deviation compared to the Pandar40.

The number of false positives is not evaluated in this thesis, because it depends directly on the scenery the framework is subjected to. In a city environment with many buildings, cars, people, etc., it would find many false positives, but the race car is meant to drive on reasonably flat tracks with mostly cones in the scenery. Sources of false positives may be people, objects such as lamp posts, and perhaps work areas with garages. To say something realistic about the actual number of false positives, data from an actual track scenario should be used; this was not obtained during the period of this thesis. The uncertainty in false positives indicates why classification can be useful.

6.2.2 2D Projection

In general, the 2D projection of cone candidates seems successful. It manages to capture both the shape and the intensity pattern across the height of a cone. This is clearly visualised in figure 5.3, where the average intensity of a subset of images is given. Some of the 2D projected cones have uneven horizontal lines, which makes sense because of a cone's roundness. This results in variations in the images that may contribute to the need for more training data for the CNN to learn relevant features. It also looks less like the simulated data, which has straight lines. To counter this, it is possible to use the ring number instead of the z-direction vertically. The ring number indicates which lidar channel a point belongs to, and is therefore likely to give straight lines in an image, as sketched below. This can increase the effect of simulated data and reduce the need for training data.
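A minimal sketch of such a ring-based projection (the point struct mirrors the ring field most lidar drivers expose; image size and azimuth window are illustrative):

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical driver point type: position, intensity and the channel
// ("ring") the point came from.
struct RingPoint { float x, y, z, intensity; std::uint16_t ring; };

// Project one cone candidate to a rows x cols intensity image, using the
// ring number as the row so every lidar channel becomes one straight line.
std::vector<float> projectCandidate(const std::vector<RingPoint>& cone,
                                    int rows, int cols,
                                    float minAz, float maxAz)
{
  std::vector<float> image(rows * cols, 0.0f);
  for (const auto& p : cone) {
    const float az = std::atan2(p.y, p.x);  // horizontal angle of the point
    if (az < minAz || az >= maxAz || p.ring >= rows) continue;
    const int col =
        static_cast<int>((az - minAz) / (maxAz - minAz) * cols);
    image[p.ring * cols + col] = p.intensity;  // row = channel, col = azimuth
  }
  return image;
}
```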

Placement of the lidar can also affect the image quality. The data from 2019 was captured with the Pandar40 mounted low at the front of the trolley, giving a different perspective in the images. This can be seen in figures 5.8c and 5.8f, which are from 2019, compared to the others, which are from 2020. The difference can also be seen in appendix B. With the lidar mounted lower at the front, more channels hit the cones at close distances, which gives more information per cone. On the other hand, a lower placement is more prone to cone occlusion.

The 2D projected images are based on data from one lidar, for ease of development, as the method is in a conceptual phase. If 2D projection is to be done on lidar-lidar fusion data, the intensity between the two lidars must be calibrated, to ensure that each lidar provides the same intensity information from the same surface.

One final aspect is the quality and quantity of data. To provide a variety of cones of different quality, cones with impurities were used (figure 5.2). These darker impurities are likely to affect the intensity and are valuable for displaying diversity in the dataset. A total of 5000 samples were used, which is a small amount compared to MNIST's 60,000. The problem with the specific and specialized methods proposed in this thesis is that there are no open-source datasets that can be used. The process of generating and labeling data is tedious, and a lot of time can be wasted if the method proves not to be prosperous. Other methods for generating simulated data are possible, for instance by using platforms such as Carla [15] or AirSim [50]. These provide simulated environments where modifications could give relevant cone data.

6.2.3 Classification

The purpose of the classification framework is to evaluate the class of a cone candidate retrieved from localization. The results from the development of the model are generally good (tables 5.3 and 5.5). They give no specific indication of the effect of simulated data, neither positive nor negative (figures 5.10 and 5.11). Both the shape classifier and the combined shape and color classifier have overall high scores on the test data. The scores from the shape classifier are slightly better, at 0.989 overall accuracy. The overall accuracy of the combined shape and color classifier is 0.973, and the confusion matrix (table 5.4) indicates that when a class is predicted wrong, it is most often blue and yellow cones that are mistaken for each other, or a non-cone that gets classified as a blue cone.

Section 4.7.1 proposed a classification range of 15 m with color and 24 m based on shape with the Pandar40. Based on scenarios 1 and 2, this might be too optimistic; more realistic distances are 10 m with color and 15 m based on shape. Nevertheless, the results from the scenarios are positive, especially from scenario 3, where all but one candidate is correctly predicted (figure 5.20). This indicates that the shape classifier can contribute to fewer false positives, and thus higher precision in the overall cone detection framework. The color and shape classifier manages to decide color up to 10 m, which can be useful for predicting corners in the path planning algorithm, and it further increases the information that can be put into a global map. The combined color and shape classifier does not perform very well at rejecting false positives in scenario 3, so a possible solution is to integrate both classifiers so that the color classifier only predicts the color of cones accepted by the shape classifier, as sketched below.
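A minimal sketch of that cascade (the classifier types are stubs standing in for the two trained networks, not the thesis' actual interfaces):

```cpp
#include <vector>

enum class ConeClass { None, Blue, Yellow, Orange };

// Stubs standing in for the trained CNNs; the real networks would run
// inference on the 2D projected intensity image here.
struct ShapeClassifier {
  bool isCone(const std::vector<float>& img) const { return !img.empty(); }
};
struct ColorClassifier {
  ConeClass predict(const std::vector<float>&) const { return ConeClass::Blue; }
};

// The shape classifier gates false positives; the color classifier is only
// consulted for candidates the shape classifier accepted.
ConeClass classifyCandidate(const std::vector<float>& image,
                            const ShapeClassifier& shape,
                            const ColorClassifier& color)
{
  if (!shape.isCone(image))
    return ConeClass::None;     // rejected as a false positive
  return color.predict(image);  // decide color only for confirmed cones
}
```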

The proposed classification framework can be regarded as an optimistic proof of concept. More training data can be beneficial, and the framework should be tested and evaluated in several scenarios before it can be regarded as a valid method. Although the modified LeNet-5 produced good results, other architectures can also be tested; adding depth to the network with new convolutional layers and more filters can enable it to learn new features.
