Evaluation and Conclusion

4.5. SUMMARY AND CONCLUSION

The standard implementation of Dalal and Triggs [2005]’s algorithm in OpenCV was tried out in the lab using the Bumblebee2 camera. The multi-scale sliding window search implemented in OpenCV was run on four videos containing an adult standing upright, an adult pushing a rollator while slightly leaning forward, an adult partially occluded by an umbrella, and lastly an adult sitting in a wheelchair. The standard implementation was able to accurately detect the person in the first three categories, showing that the method is better at this task than the skeleton algorithm implemented in NITE.

The method did, however, take around 640-812 ms to perform detection in a 384x512 pixel window, which is too slow for real-world application. The thesis discusses how Zhu et al. [2006]’s algorithm, based on AdaBoost and a cascade of classifiers, succeeds in making Dalal and Triggs’s method work in real time.

This algorithm is unfortunately patented, so further work must be done to find an alternative improvement. One solution could be to learn some cues about where the pedestrian might be in the image and narrow the search down to a region of interest.
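To illustrate why narrowing the search to a region of interest could help, the following minimal Python sketch counts the windows a multi-scale sliding window search has to evaluate. The 64x128 detection window, the 8 pixel stride, the scale step of 1.05, the 512 wide by 384 high reading of the frame size, and the 200 pixel wide region of interest are all assumptions chosen for illustration. They are not settings or measurements from the experiments.

```python
def count_windows(width, height, win_w=64, win_h=128, stride=8, scale=1.05):
    """Count the windows visited by a multi-scale sliding window search.

    The image is shrunk by `scale` at each pyramid level until the
    detection window no longer fits. All default values are assumptions.
    """
    total = 0
    w, h = float(width), float(height)
    while w >= win_w and h >= win_h:
        # Number of window positions along each axis at this pyramid level.
        nx = (int(w) - win_w) // stride + 1
        ny = (int(h) - win_h) // stride + 1
        total += nx * ny
        w /= scale
        h /= scale
    return total


# Hypothetical comparison: the full frame against a narrower region of interest.
full_frame = count_windows(512, 384)
region_of_interest = count_windows(200, 384)
print(full_frame, region_of_interest)
```

Under these assumptions the region of interest contains considerably fewer windows than the full frame, and since each window costs one classifier evaluation, the detection time should shrink roughly in proportion.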

The standard implementation was not able to detect the person sitting in a wheelchair. The implementation was therefore customized so that it could be retrained using the SVM implemented in OpenCV. Because OpenCV is open source, it is possible to alter the source code to suit our needs. An attempt at training a custom classifier based on Dalal and Triggs’s HOG/SVM algorithm was performed using 90 positive images of people in wheelchairs and the negative images from the INRIA Person training set. The custom classifier gave positive predictions in 72 of 100 frames. Of these, 31 were very good predictions where most of the wheelchair was inside the prediction window and no false positives were present. The results can be seen in Figure 3.10. The results show that this classification algorithm can be retrained to detect other road user categories, such as wheelchairs and bicycles, and possibly other objects like baby strollers.

Further, the OpenCV implementations of the Hough transform and face recognition based on Haar-like features were tested on the task of detecting wheels and faces in images. The results can be seen in Figure 3.11. The results were not very good, but with further work these detectors might be used as weak classifiers in the ensemble.

In order to infer the age of a detected pedestrian, it is discussed in Section 3.4.6 how information provided by growth curves can be used to calculate the probability of being a certain age given observed height. A program was written to perform simulations of a population to estimate the probability P(Age|Height). The results of the simulation are given in Table A.2 and Table A.1.
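The idea behind such a simulation can be sketched in a few lines of Python. This is not the program written for the thesis. The mean and standard deviation per age in GROWTH are hypothetical placeholders rather than values from the growth curves, and the uniform prior over ages, the Gaussian height model and the 10 cm height bins are likewise assumptions.

```python
import random
from collections import Counter, defaultdict

# Hypothetical (mean, standard deviation) of height in cm for each age.
# Placeholder numbers for illustration only, NOT taken from the growth curves.
GROWTH = {4: (103.0, 4.5), 6: (117.0, 5.0), 8: (129.0, 5.5), 10: (140.0, 6.0)}


def simulate_age_given_height(n=100_000, bin_cm=10, seed=0):
    """Estimate P(Age | Height bin) by simulating a population."""
    rng = random.Random(seed)
    ages = list(GROWTH)
    counts = defaultdict(Counter)
    for _ in range(n):
        age = rng.choice(ages)            # assumed uniform prior over ages
        mean, std = GROWTH[age]
        height = rng.gauss(mean, std)     # assumed Gaussian height per age
        height_bin = int(height // bin_cm) * bin_cm
        counts[height_bin][age] += 1
    # Normalise the counts in each height bin into a distribution over ages.
    return {
        height_bin: {age: c / sum(per_age.values()) for age, c in per_age.items()}
        for height_bin, per_age in counts.items()
    }


table = simulate_age_given_height()
print(table[120])  # estimated P(Age | 120 cm <= Height < 130 cm)
```

With real growth curve data and a prior that reflects the actual age distribution of pedestrians, the same counting scheme would yield a table of the kind referred to above.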

It is also discussed how groups of children from a kindergarten or an elementary school could be detected with Memarzadeh et al. [2013]’s HOG+C features in combination with information about height. They add Hue-Saturation-Value color information to the HOG descriptors in order to detect the colors of the reflective vests of workers at a construction site. This is relevant because children wear such vests when they are out walking with the kindergarten or elementary school.
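As a rough illustration of adding color information to a shape descriptor, the sketch below appends normalised hue, saturation and value histograms to an existing HOG vector. The function names, the 8 bins per channel and the plain concatenation are assumptions made for this example. The exact construction of the HOG+C features in Memarzadeh et al. [2013] may differ.

```python
import colorsys


def hsv_histogram(pixels, bins=8):
    """Normalised hue, saturation and value histograms of (r, g, b) pixels in 0..255."""
    hist = [0.0] * (3 * bins)
    for r, g, b in pixels:
        hsv = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        for channel, x in enumerate(hsv):
            # Clamp so that a channel value of exactly 1.0 lands in the last bin.
            idx = min(int(x * bins), bins - 1)
            hist[channel * bins + idx] += 1.0
    n = max(len(pixels), 1)
    return [v / n for v in hist]


def hog_plus_color(hog_vector, pixels, bins=8):
    """Concatenate a precomputed HOG descriptor with the color histograms."""
    return list(hog_vector) + hsv_histogram(pixels, bins)


# Hypothetical window: a two-element stand-in for a HOG descriptor and
# four pure yellow pixels, roughly the color of a reflective vest.
feature = hog_plus_color([0.1, 0.2], [(255, 255, 0)] * 4)
print(len(feature))
```

A classifier trained on such a combined vector can respond to both the outline of a person and the characteristic vest color within the same window.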

The conclusion of goal 2 is that, even though more work is needed on robust methods that can perform real-time multi-label or multi-class classification in traffic scenes, all of the identified methods can be extended to work as weak classifiers in an ensemble of binary classifiers. The identified methods for classification also answer research questions three and four. Some engineering is needed to make the classifiers work in real time, but this may be a matter of narrowing down the search space and running the classifiers in parallel. By performing a sliding window search on the input stream from the camera, the classifiers can each produce a true/false prediction for a window, giving the image area a set of labels indicating which pedestrian categories are present. In the real world the algorithm could have a few frames to decide on a correct classification, as the pedestrians will be waiting for the light to change anyway.
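The labelling scheme described here can be sketched as follows. Each binary classifier votes true or false on a window, the labels that fire form the label set for that window, and a label is kept only if it appears in a sufficient share of the last few frames. The toy classifiers, the feature names and the 0.5 threshold are hypothetical stand-ins for illustration, not the classifiers evaluated in the thesis.

```python
from collections import Counter


def label_window(window, classifiers):
    """Return the set of labels whose binary classifier fires on this window."""
    return {name for name, predict in classifiers.items() if predict(window)}


def vote_over_frames(per_frame_labels, min_fraction=0.5):
    """Keep the labels predicted in at least `min_fraction` of the frames."""
    counts = Counter(label for labels in per_frame_labels for label in labels)
    needed = min_fraction * len(per_frame_labels)
    return {label for label, count in counts.items() if count >= needed}


# Hypothetical binary classifiers operating on precomputed window features.
classifiers = {
    "wheelchair": lambda w: w.get("wheels", 0) >= 2,
    "child": lambda w: w.get("height_cm", 999) < 130,
}

# The same image area observed in three consecutive frames.
frames = [
    {"wheels": 2, "height_cm": 120},
    {"wheels": 0, "height_cm": 125},
    {"wheels": 2, "height_cm": 135},
]
labels = vote_over_frames([label_window(f, classifiers) for f in frames])
print(sorted(labels))
```

Voting over several frames lets a single missed or spurious detection be outvoted, which fits the observation that the pedestrians are waiting at the crossing anyway.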

In order to retain the system’s capability to infer a pedestrian’s intention to cross the road or not, the system needs some other parts in addition to the classification algorithms. Detected pedestrians must first be separated from the background and, if they are standing close together, from each other. Then they must be tracked over a sequence of frames in order to infer the features used in Solem [2011]’s intention algorithm. These can be seen in Table 3.3. All of these features were trivial to infer using the skeleton algorithm in NITE, but they are not that simple to infer without using the Kinect. Further work must be done in this area in order to identify methods for solving this problem.

The combination of the architecture outlined in Figure 3.2 and the ensemble of classifiers illustrated in Figure 3.12 answers goal 3. This architecture can be used as a basis for further implementations and experiments in order to construct a full system that can complement Kheradmandi and Strom [2012]’s intelligent traffic lights. Hopefully these systems can become a reality in the future and help solve some of the issues with today’s traffic lights.

Bibliography

Aamodt, A. and Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(1):39–59.

Aggarwal, J. and Cai, Q. (1997). Human motion analysis: a review. In Nonrigid and Articulated Motion Workshop, 1997, pages 90–102.

Belbachir, A., Schraml, S., and Brandle, N. (2010). Real-time classification of pedestrians and cyclists for intelligent counting of non-motorized traffic. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on, pages 45–50.

Bjørnskau, T. (2008). Risiko i trafikken 2005-2007. Technical Report 986, Transportøkonomisk institutt. Stiftelsen Norsk senter for samferdselsforskning.

Bluesky International Limited. (2012). How does lidar work? http://www.lidar-uk.com/how-lidar-works.

Bo, L. and Heqin, Z. (2003). Using object classification to improve urban traffic monitoring system. In Neural Networks and Signal Processing, 2003. Proceedings of the 2003 International Conference on, volume 2, pages 1155–1159.

Boser, B. E., Guyon, I. M., and Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on Computational learning theory, COLT ’92, pages 144–152, New York, NY, USA. ACM.

Chen, D.-Y., Cannons, K., Tyan, H.-R., Shih, S.-W., and Liaoa, H.-Y. (2008). A framework of spatio-temporal analysis for video surveillance. In Circuits and Systems, 2008. ISCAS 2008. IEEE International Symposium on, pages 2745–2748.

Cho, H., Rybski, P., and Zhang, W. (2010). Vision-based bicyclist detection and tracking for intelligent vehicles. In Intelligent Vehicles Symposium (IV), 2010 IEEE, pages 454–461.

Clark-Carter, D. D., Heyes, A. D., and Howarth, C. I. (1986). The efficiency and walking speed of visually impaired people. Ergonomics, 29(6):779–789. PMID: 3743536.

Crow, F. C. (1984). Summed-area tables for texture mapping. SIGGRAPH Comput. Graph., 18(3):207–212.

Daamen, W. and Hoogendoorn, S. P. (2003). Experimental research of pedestrian walking behavior. Transportation Research Record: Journal of the Transportation Research Board, 1828(1):20–30.

Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893.

de Chaumont, F., Marhic, B., Delahoche, L., and Cauchois, C. (2004). Generic method for recognition of a wheelchair, even with a low resolution-effective sensor. In Industrial Technology, 2004. IEEE ICIT ’04. 2004 IEEE International Conference on, volume 1, pages 56–60.

Diógenes, M. C., Greene-Roesel, R., Ragland, D. R., and Lindau, L. A. (2008). Effectiveness of a Commercially Available Automated Pedestrian Counting Device in Urban Environments: Comparison with Manual Count. In Transportation Research Board Proceedings. Transportation Research Board.

Dollar, P., Wojek, C., Schiele, B., and Perona, P. (2009). Pedestrian detection: A benchmark. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 304–311.

Duda, R. O. and Hart, P. E. (1972). Use of the hough transformation to detect lines and curves in pictures. Commun. ACM, 15(1):11–15.

Enzweiler, M. and Gavrila, D. (2009). Monocular pedestrian detection: Survey and experiments. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(12):2179–2195.

Forsyth, D. A. and Ponce, J. (2003). Computer Vision: A Modern Approach. Pearson.

Freund, Y. and Schapire, R. E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting.

Glad, A. and Midtland, K. (2000). Seksåringer og kryssing av veg.

Hancock, J., Hoffman, E., Sullivan, R., Ingimarson, D., Langer, D., and Hebert, M. (1997). High-performance laser range scanner. In SPIE Proceedings on Intelligent Transportation Systems.

Hosotani, D., Yoda, I., and Sakaue, K. (2009). Wheelchair recognition by using stereo vision and histogram of oriented gradients (hog) in real environments. In Applications of Computer Vision (WACV), 2009 Workshop on, pages 1–6.

Huang, C.-R., Chung, P.-C., Lin, K.-W., and Tseng, S.-C. (2010). Wheelchair detection using cascaded decision tree. Information Technology in Biomedicine, IEEE Transactions on, 14(2):292–300.

Jung, H., Ehara, Y., Tan, J. K., Kim, H., and Ishikawa, S. (2012). Applying msc-hog feature to the detection of a human on a bicycle. In Control, Automation and Systems (ICCAS), 2012 12th International Conference on, pages 514–517.

Fitzpatrick, K., Brewer, M. A., and Turner, S. (2006). Another look at pedestrian walking speed. Transportation Research Record: Journal of the Transportation Research Board, 1982(1):21–29.

Kheradmandi, . and Strom, F. (2012). Controlling a signal-regulated pedestrian crossing using case-based reasoning. In Controlling a Signal-regulated Pedestrian Crossing using Case-based Reasoning.

Kofod-Petersen, A., Wegener, R., and Cassens, J. (2009). Closed doors – modelling intention in behavioural interfaces.

Leonard, R. (2002). Statistics on vision impairment a resource manual.

Li, B., Yao, Q., and Wang, K. (2012). A review on vision-based pedestrian detection in intelligent transportation systems. In Networking, Sensing and Control (ICNSC), 2012 9th IEEE International Conference on, pages 393–398.

Lienhart, R. and Maydt, J. (2002). An extended set of haar-like features for rapid object detection. In IEEE ICIP 2002, pages 900–903.

Memarzadeh, M., Golparvar-Fard, M., and Niebles, J. C. (2013). Automated 2d detection of construction equipment and workers from site video streams using histograms of oriented gradients and colors. Automation in Construction, 32(0):24–37.

Mitra, S. and Acharya, T. (2007). Gesture recognition: A survey. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 37(3):311–324.

Moeslund, T. B., Hilton, A., and Krüger, V. (2006). A survey of advances in vision-based human motion capture and analysis. Comput. Vis. Image Underst., 104(2):90–126.

Myles, A., Lobo, N. D. V., and Shah, M. (2002). Wheelchair detection in a calibrated environment.

NPRA (2012). Handbook 048 trafikksignalanlegg. Technical report, Norwegian Public Roads Administration.

OECD, P., editor (2003). New transport technology for older people; Summary and Conclusions of the Symposium on Human Factors of Transport Technology for Older Persons. Organisation for Economic Co-operation and Development OECD.

World Health Organization, Department of Violence and Injury Prevention and Disability (2009). Global status report on road safety: time for action. World Health Organization, Geneva.

World Health Organization (2012). Cataract.

Oxley, J. (2002). Elderly pedestrian issues. Monash University Accident Research Centre.

Papadourakis, V. and Argyros, A. (2010). Multiple objects tracking in the presence of long-term occlusions. Computer Vision and Image Understanding, 114(7):835–846.

Juliusson, P. B., Roelants, M., Eide, G. E., Moster, D., Juul, A., Hauspie, R., Waaler, P. E., and Bjerknes, R. (2009). Vekstkurver for norske barn. Tidsskrift for Den norske legeforening, 4.

Philips, J. (1997). An algorithm for determining the position of a circle in 3d from its perspective 2d projection. Technical Report TRITA-MAT-1997-MA-1, Department of Mathematics, KTH (Royal Institute of Technology).

Point Grey Research Inc. (2012). Bumblebee2 stereo camera. http://www.ptgrey.com/products/bumblebee2/bumblebee2_stereo_camera.asp.

Poppe, R. (2007). Vision-based human motion analysis: An overview. Computer Vision and Image Understanding, 108:4–18. Special Issue on Vision for Human-Computer Interaction.

Prince, F., Corriveau, H., Hebert, R., and Winter, D. A. (1997). Gait in the elderly. Gait and Posture, 5(2):128–135.

Qui, Z., Yao, D., Zhang, Y., Ma, D., and Liu, X. (2003). The study of the detection of pedestrian and bicycle using image processing. In Intelligent Transportation Systems, 2003. Proceedings. 2003 IEEE, volume 1, pages 340–345.

Raptis, M., Kirovski, D., and Hoppe, H. (2011). Real-time classification of dance gestures from skeleton animation. In Proceedings of the 2011 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA ’11, pages 147–156, New York, NY, USA. ACM.

Rämä, P. (1993). Väsentliga beteendevariabler hos barn i trafiken.

Shan, C. (2012). Learning local binary patterns for gender classification on real-world face images. Pattern Recognition Letters, 33(4):431–437. Intelligent Multimedia Interactivity.

Shan, C., Gong, S., and McOwan, P. W. (2007). Learning gender from human gaits and faces. In Proceedings of the 2007 IEEE Conference on Advanced Video and Signal Based Surveillance, AVSS ’07, pages 505–510, Washington, DC, USA. IEEE Computer Society.

Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., and Blake, A. (2011). Real-time human pose recognition in parts from single depth images.

Solem, J. S. (2011). Intention-aware sliding door. In Intention-aware Sliding Doors.

Somasundaram, G., Morellas, V., Papanikolopoulos, N., and Bedros, S. (2012). Object classification in traffic scenes using multiple spatio-temporal features. In Control Automation (MED), 2012 20th Mediterranean Conference on, pages 1536–1541.

Sonka, M., Hlavac, V., and Boyle, R. (2008). Image Processing, Analysis, and Machine Vision. Thomson.

Statens vegvesen (2012). Handbook 048. http://www.vegvesen.no/Fag/Publikasjoner/Handboker. Accessed: 25.05.2013.

Sun, Z., Bebis, G., and Miller, R. (2006). On-road vehicle detection: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28:694–711.

Takahashi, K., Kuriya, Y., and Morie, T. (2010). Bicycle detection using pedaling movement by spatiotemporal gabor filtering. In TENCON 2010 - 2010 IEEE Region 10 Conference, pages 918–922.

Tarcin, S., Özütemiz, K. B., Koku, A. B., and Konukseven, E. I. (2011). Comparison of kinect and bumblebee2 in indoor environments. Middle East Technical University Department of Mechanical Engineering.

Thang, N., Kim, T.-S., Lee, Y.-K., and Lee, S. (2011). Estimation of 3-d human body posture via co-registration of 3-d human model and sequential stereo information. Applied Intelligence, 35:163–177.

Tsoumakas, G. and Katakis, I. (2007). Multi-label classification: An overview. Int J Data Warehousing and Mining, 2007:1–13.

Uddin, Z., Thang, N. D., Kim, J. T., and Kim, T.-S. (2011). Human activity recognition using body joint-angle features and hidden markov model. ETRI Journal, 33(4):569–579.

Viola, P. and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I–511–I–518.

Wang, C.-C., Thorpe, C., and Suppe, A. (2003). Ladar-based detection and tracking of moving objects from a ground vehicle at high speeds. In IEEE Intelligent Vehicles Symposium (IV2003).

Wang, J.-G., Li, J., Yau, W.-Y., and Sung, E. (2010). Boosting dense sift descriptors and shape contexts of face images for gender recognition. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on, pages 96–102.

Wikimedia Foundation Inc. (2013). F test. http://en.wikipedia.org/wiki/F-test. Last updated March 14, 2013.

Wikimedia Foundation Inc. (2013). Svm. http://en.wikipedia.org/wiki/Support_vector_machine. Last updated May 17, 2013.

Wu, H., Chen, X., Gao, Y., Zhou, H., and Zhang, X. (2010). An effective algorithm of tracking multiple objects in occlusion scenes. In Industrial Mechatronics and Automation (ICIMA), 2010 2nd International Conference on, volume 2, pages 409–413.

Yamato, J., Ohya, J., and Ishii, K. (1992). Recognizing human action in time-sequential images using hidden markov model. In Computer Vision and Pattern Recognition, 1992. Proceedings CVPR ’92., 1992 IEEE Computer Society Conference on, pages 379–385.

Yang, C.-A. and Chung, P.-C. (2007). Recovery of 3-d location and orientation of a wheelchair in a calibrated environment by using single perspective geometry. In TENCON 2007 - 2007 IEEE Region 10 Conference, pages 1–4.

Yang, Y., Wang, H., Zeng, K., Lv, H., and Li, S. (2009). A tree-structure classifier ensemble for tracked target categorization. In Image and Signal Processing, 2009. CISP ’09. 2nd International Congress on, pages 1–5.

Yogameena, B., Mansoor Roomi, S., Jyothi Priya, R., Raju, S., and Abhaikumar, V. (2012). People/vehicle classification by recurrent motion of skeleton features. Computer Vision, IET, 6(5):442–450.

Yun, W., Qing-Jie, K., Zhonghua, L., and Yuncai, L. (2010). Pedestrian and bicycle detection and tracking in range images. In Optoelectronics and Image Processing (ICOIP), 2010 International Conference on, volume 2, pages 109–112.

Zhu, Q., Yeh, M.-C., Cheng, K.-T., and Avidan, S. (2006). Fast human detection using a cascade of histograms of oriented gradients. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 2, pages 1491–1498.

Appendices

Appendix A

Calculations of Probability