
Conclusion and Future Work

7.2 Future Work

7.2.3 Camera-Lidar Fusion

Revolve NTNU uses a stereo camera setup that runs a detection algorithm independent of the lidar detection module on the vehicle. This means that both the camera module and the lidar module look for cone candidates, but they pass their data to the localization and mapping algorithm asynchronously. This may be unused potential, since a detection method that combines camera and lidar could perform better. The two sensors have complementary strengths: the lidar provides accurate depth information, while the camera provides rich, information-dense images. This section explores some of the possibilities of camera-lidar fusion, which can serve as a potential springboard for future work.

Related Work

There are methods that combine camera and lidar in an integrated deep neural network architecture, for instance Multi-View 3D [11] and AVOD-FPN [25]. Both combine different views from the lidar with an image input for classification. There are also several camera-only classification methods, for instance Fast R-CNN [43], Faster R-CNN [44] and YOLO [42]. The latter was used to classify cones during the 2020 season at Revolve NTNU.

Calibration and Synchronization

To combine data from a lidar and a camera, the two sensors must be calibrated. As with lidar-lidar calibration, the goal is to find the transformation between the lidar and camera coordinate systems. A relevant method is given by Dhall et al. [14], which provides a ROS package for practical implementation.
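To illustrate how a calibrated camera-lidar pair can be used, the following is a minimal sketch that projects lidar points into the image plane. It assumes the extrinsic rotation R and translation t from a calibration such as Dhall et al. [14] and the camera intrinsic matrix K are available; the function name, the variable names and the undistorted pinhole model are simplifying assumptions for illustration, not part of the existing pipeline.

import numpy as np

def project_lidar_to_image(points_lidar, R, t, K):
    """Project Nx3 lidar points into pixel coordinates.

    R (3x3) and t (3,) describe the lidar-to-camera transform found
    during extrinsic calibration; K (3x3) is the camera intrinsic matrix.
    Returns an Nx2 array of pixel coordinates and a mask marking the
    points that lie in front of the camera.
    """
    # Transform the points from the lidar frame into the camera frame.
    points_cam = points_lidar @ R.T + t

    # Points behind the image plane (non-positive depth) are not visible.
    in_front = points_cam[:, 2] > 0.0

    # Apply the pinhole model: pixel = K * (X/Z, Y/Z, 1).
    uv = points_cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]
    return uv, in_front

Only the points flagged by the returned mask should be used; the division is undefined for points behind the camera.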

Synchronization is also important for a camera-lidar fusion framework, to ensure that the fused data relates to the same instant in time. Unlike the lidar, a camera can be triggered externally, which makes it possible to trigger the camera at the same time as the lidar publishes data. There are a few methods related to this, such as TriggerSync [16].
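If hardware triggering is not available, a software-side approximation is to pair lidar and camera messages by timestamp. The following is a minimal ROS (rospy) sketch using the message_filters package; the topic names and the 50 ms tolerance are assumptions for illustration, and a triggered setup such as TriggerSync would give tighter alignment.

import rospy
import message_filters
from sensor_msgs.msg import Image, PointCloud2

def fused_callback(image_msg, cloud_msg):
    # Both messages are within the configured slop of each other,
    # so they can be treated as belonging to the same instant in time.
    rospy.loginfo("Paired image %.3f with cloud %.3f",
                  image_msg.header.stamp.to_sec(),
                  cloud_msg.header.stamp.to_sec())

if __name__ == "__main__":
    rospy.init_node("camera_lidar_sync")
    # Topic names are placeholders for the actual pipeline topics.
    image_sub = message_filters.Subscriber("/camera/image_raw", Image)
    cloud_sub = message_filters.Subscriber("/lidar/points", PointCloud2)
    # Allow up to 50 ms of timestamp difference between the two sensors.
    sync = message_filters.ApproximateTimeSynchronizer(
        [image_sub, cloud_sub], queue_size=10, slop=0.05)
    sync.registerCallback(fused_callback)
    rospy.spin()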

In this thesis, it is suggested to use the lidar for localization of cone candidates, which are then classified from 2D projections of the candidates. This concept is highly adaptable to a camera-lidar fusion based approach. Instead of 2D projecting the candidates, snippets of the image that represent each candidate can be extracted. The snippets can then be classified by a CNN, and they should contain much more information than the 2D projected lidar data. This can also increase the classification range, since the clustering is able to find candidates out to around 30 m.
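A hedged sketch of such snippet extraction is shown below. It assumes the calibration of the previous sketch and that the clustered candidate points are available in the lidar frame; the function name and the fixed pixel margin are illustrative only.

import numpy as np

def extract_candidate_snippet(image, cluster_points, R, t, K, margin=5):
    """Crop the image patch that covers one lidar cone candidate.

    cluster_points: Nx3 points of the candidate in the lidar frame.
    R, t, K: extrinsics and intrinsics from the calibration above.
    Returns the cropped patch, or None if the candidate is not visible.
    """
    # Project the cluster into the image plane (same pinhole model as above).
    pts_cam = cluster_points @ R.T + t
    pts_cam = pts_cam[pts_cam[:, 2] > 0.0]
    if len(pts_cam) == 0:
        return None
    uv = pts_cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]

    # Bounding box of the projected points, padded by a small margin.
    h, w = image.shape[:2]
    u_min = int(max(uv[:, 0].min() - margin, 0))
    u_max = int(min(uv[:, 0].max() + margin, w))
    v_min = int(max(uv[:, 1].min() - margin, 0))
    v_max = int(min(uv[:, 1].max() + margin, h))
    if u_min >= u_max or v_min >= v_max:
        return None

    # The snippet can be resized to the CNN input size before classification.
    return image[v_min:v_max, u_min:u_max]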

One possible drawback is that the field of view of a typical camera is narrower than that of the lidar, which means that multiple cameras might be needed.

Another solution is to use image-based detection to localize and classify cones in images; the lidar can then be used to extract the distance to the cones found by the camera. A positive aspect of camera-based classification is that there already exists a database of around 75,000 labeled cone images, which allows for better training and verification. The weight budget can also be improved, since a camera can be smaller and lighter and can potentially compensate for the benefits of, and need for, two lidars. Combined camera-lidar deep network architectures can also be used, such as AVOD-FPN or Multi-View 3D. The disadvantage of these is that they require training data from both lidar and camera.
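A possible sketch of this second approach, associating an image bounding box from a detector such as YOLO with the lidar points that project into it, is shown below. The bounding-box format and the use of the median as a robust position estimate are assumptions for illustration, not the method used in this thesis.

import numpy as np

def estimate_cone_position(box, points_lidar, R, t, K):
    """Estimate the 3D position of a cone detected in the image.

    box: (u_min, v_min, u_max, v_max) bounding box from an image detector.
    points_lidar: Nx3 scan in the lidar frame; R, t, K as above.
    Returns the median lidar point whose projection falls inside the box,
    or None if no lidar return hits the box.
    """
    # Project the full scan into the image (pinhole model, as before).
    pts_cam = points_lidar @ R.T + t
    valid = pts_cam[:, 2] > 0.0
    uv = pts_cam[valid] @ K.T
    uv = uv[:, :2] / uv[:, 2:3]

    # Select the returns whose projection lies inside the detection box.
    u_min, v_min, u_max, v_max = box
    inside = ((uv[:, 0] >= u_min) & (uv[:, 0] <= u_max) &
              (uv[:, 1] >= v_min) & (uv[:, 1] <= v_max))
    if not np.any(inside):
        return None

    # The median is robust against background points caught in the box.
    return np.median(points_lidar[valid][inside], axis=0)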

Bibliography

[1] The lidar is the camera. https://ouster.com/blog/the-camera-is-in-the-lidar/. Accessed: 2019-12-3.

[2] Racecar vehicle dyanmics roll pitch yaw. https://www.racecar-engineering.com/tech-explained/racecar-vehicle-dynamics-explained/attachment/racecar-vehicle-dyanmics-roll-pitch-yaw/. Accessed: 2020-4-9.

[3] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[4] M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. Optics: Ordering points to identify the clustering structure. volume 28, pages 49–60, 06 1999.

[5] A. Asvadi, L. Garrote, C. Premebida, P. Peixoto, and U. Nunes. Real-time deep convnet-based vehicle detection using 3d-lidar reflection intensity data. 09 2017.

[6] J. L. Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509–517, Sept. 1975.

[7] P. J. Besl and N. D. McKay. A method for registration of 3-d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239–256, Feb 1992.

[8] P. Biber and W. Strasser. The normal distributions transform: a new approach to laser scan matching. In Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453), volume 3, pages 2743–2748 vol.3, Oct 2003.

[9] C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag New York, 2006.

[10] T. H. Bryne and T. I. Fossen. Lecture notes on aided inertial navigation systems. March 2019.

[12] P. Chu, S. Cho, S. Sim, K. Kwak, and K. Cho. A fast ground segmentation method for 3d point cloud. Journal of Information Processing Systems, 13:491–499, 01 2017.

[13] D. Cáceres Hernández, V. Hoang, and K. Jo. Lane surface identification based on reflectance using laser range finder. In 2014 IEEE/SICE International Symposium on System Integration, pages 621–625, 2014.

[14] A. Dhall, K. Chelani, V. Radhakrishnan, and K. M. Krishna. LiDAR-Camera Calibration using 3D-3D Point correspondences. ArXiv e-prints, May 2017.

[15] A. Dosovitskiy, G. Ros, F. Codevilla, A. López, and V. Koltun. CARLA: an open urban driving simulator. CoRR, abs/1711.03938, 2017.

[16] A. English, P. Ross, D. Ball, B. Upcroft, and P. Corke. Triggersync: A time synchronisation tool. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 6220–6226, 2015.

[17] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. CoRR, abs/1311.2524, 2013.

[18] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[19] N. B. Gosala, A. Bühler, M. Prajapat, C. Ehmke, M. Gupta, R. Sivanesan, A. Gawel, M. Pfeiffer, M. Bürki, I. Sa, R. Dubé, and R. Siegwart. Redundant perception and state estimation for reliable autonomous racing. CoRR, abs/1809.10099, 2018.

[20] M. Himmelsbach, F. v. Hundelshausen, and H.-J. Wuensche. Fast segmentation of 3d point clouds for ground vehicles. In 2010 IEEE Intelligent Vehicles Symposium, pages 560–565, 2010.

[21] D. Holz, A. Ichim, F. Tombari, R. Rusu, and S. Behnke. Registration with the point cloud library - a modular framework for aligning in 3-d. IEEE Robotics Automation Magazine, 22:110–124, 12 2015.

[22] J. Kannisto, T. Vanhatupa, M. Hannikainen, and T. D. Hamalainen. Software and hardware prototypes of the ieee 1588 precision time protocol on wireless lan. In 2005 14th IEEE Workshop on Local Metropolitan Area Networks, pages 6 pp.–6, 2005.

[23] D. Kingma and J. Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations, 12 2014.

[24] A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. Neural Information Processing Systems, 25, 01 2012.

[25] J. Ku, M. Mozifian, J. Lee, A. Harakeh, and S. Waslander. Joint 3d proposal generation and object detection from view aggregation. IROS, 2018.

[26] M. Lang and G. McCarty. Lidar intensity for improved detection of inundation below the forest canopy. Wetlands, 29:1166–1178, 12 2009.

[27] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, Dec 1989.

[28] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, Nov 1998.

[29] Y. LeCun, C. Cortes, and C. J. C. Burges. The mnist database of handwritten digits. http://yann.lecun.com/exdb/mnist/. Accessed: 2020-3-19.

[30] M. Magnusson. The three-dimensional normal-distributions transform – an efficient representation for registration, surface analysis, and loop detection, 2013.

[31] P. F. McManamon. Review of ladar: a historic, yet emerging, sensor technology with rich phenomenology. Optical Engineering, 51(6), 2012.

[32] P. Merriaux, Y. Dupuis, R. Boutteau, P. Vasseur, and X. Savatier. Lidar point clouds correction acquired from a moving car based on can-bus data. CoRR, abs/1706.05886, 2017.

[33] K. Minemura, H. Liau, A. Monrroy, and S. Kato. Lmnet: Real-time multiclass object detection on CPU using 3d lidar. CoRR, abs/1805.04902, 2018.

[34] B. Palerud. Pattern recognition for cone color classification using lidar intensity. Dec. 2019.

[35] Z. Pusztai, I. Eichhardt, and L. Hajder. Accurate calibration of multi-lidar-multi-camera systems. Sensors, 18:2139, 07 2018.

[36] C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas. Frustum pointnets for 3d object detection from RGB-D data. CoRR, abs/1711.08488, 2017.

[37] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. CoRR, abs/1612.00593, 2016.

[38] N. Quang Minh and H. La. Land cover classification using lidar intensity data and neural network. Korean Journal of Geomatics, 29:429–438, 08 2011.

[39] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Ng. Ros: an open-source robot operating system. volume 3, 01 2009.

[40] A. Ram, J. Sunita, A. Jalal, and K. Manoj. A density based algorithm for discovering density varied clusters in large spatial databases. International Journal of Computer Applications, 3, 06 2010.

[41] Z. Zhang. Iterative point matching for registration of free-form curves. Int. J. Comput. Vision, 13, 07 1992.

[42] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. CoRR, abs/1506.02640, 2015.

[44] S. Ren, K. He, R. B. Girshick, and J. Sun. Faster R-CNN: towards real-time object detection with region proposal networks. CoRR, abs/1506.01497, 2015.

[45] J. Rieken and M. Maurer. Sensor scan timing compensation in environment models for automated road vehicles. In 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pages 635–642, Nov 2016.

[46] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg, and F. Li. Imagenet large scale visual recognition challenge. CoRR, abs/1409.0575, 2014.

[47] R. B. Rusu. Semantic 3d object maps for everyday manipulation in human living environments. KI - Künstliche Intelligenz, 24(4):345–348, Nov 2010.

[48] R. B. Rusu and S. Cousins. 3d is here: Point cloud library (pcl). In 2011 IEEE International Conference on Robotics and Automation, pages 1–4, 2011.

[49] M. Scaioni, B. Höfle, A. Kersting, L. Barazzetti, M. Previtali, and D. Wujanz. Methods from information extraction from lidar intensity data and multispectral lidar technology. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLII-3:1503–1510, 04 2018.

[50] S. Shah, D. Dey, C. Lovett, and A. Kapoor. Airsim: High-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics, 2017.

[51] K. Sherman and L. B. Stotts. Fundamentals of Electro-Optic Systems Design: Communications, Lidar, and Imaging. Cambridge University Press, 2012.

[52] S. Shi, C. Guo, L. Jiang, Z. Wang, J. Shi, X. Wang, and H. Li. Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. Dec 2019.

[53] S. Shi, X. Wang, and H. Li. Pointrcnn: 3d object proposal generation and detection from point cloud. CoRR, abs/1812.04244, 2018.

[54] S. Shi, Z. Wang, J. Shi, X. Wang, and H. Li. From points to parts: 3d object detection from point cloud with part-aware and part-aggregation network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.

[55] A. Tatoglu and K. Pochiraju. Point cloud segmentation with lidar reflection intensity behavior. Proceedings - IEEE International Conference on Robotics and Automation, pages 786–790, 05 2012.

[56] Hesai Technology. Pandar20A/B 20-Channel Mechanical LiDAR User’s Manual. Hesai Photonics Technology Co, Ltd.

[57] Hesai Technology. Pandar40 40-Channel Mechanical LiDAR User’s Manual. Hesai Photonics Technology Co, Ltd.

[58] J. Wojtanowski, M. Zygmunt, M. Kaszcuk, and Z. Mierczyk. Comparison of 905nm and 1550nm semiconductor laser rangefinders’ performance deterioration due to adverse environmental conditions. Optical Engineering, 22(3), 2014.

[59] Y. Yan, Y. Mao, and B. Li. Second: Sparsely embedded convolutional detection. Sensors, 18:3337, 10 2018.

[60] B. Yang, W. Luo, and R. Urtasun. PIXOR: real-time 3d object detection from point clouds. CoRR, abs/1902.06326, 2019.

[61] K. Zhang, S. Z. Ahmed, V. B. Saputra, S. Verma, and A. H. Adiwahono. Multi-lidar calibration and synchronization for autonomous vehicles. 10 2019.

[62] J. Zhou, X. Tan, Z. Shao, and L. Ma. Fvnet: 3d front-view proposal generation for real-time object detection from point clouds. pages 1–8, 10 2019.

[63] L. Zhou and Z. Deng. Lidar and vision-based real-time traffic sign detection and recognition algorithm for intelligent vehicle. pages 578–583, 10 2014.

[64] Y. Zhou and O. Tuzel. Voxelnet: End-to-end learning for point cloud based 3d object detection. CoRR, abs/1711.06396, 2017.

Appendix A