
4.5.5 Experiment 5. A Comparative Study between Human and Machine using FE-Test


Facial Expression Recognition by Humans

This experiment was carried out with a set of 253 participants between 18 and 66 years old, 27.27% female and 72.72% male. For it, we created a web page where 10 images were shown to each participant, who had to classify each image into one of seven emotions: AN (Angry), DI (Disgust), FE (Fear), HA (Happy), SA (Sad), SU (Surprise) and NE (Neutral). Figure 4.8 shows a screenshot of this experiment: the web page where the participants classified several faces according to their own criteria in facial expression recognition.

Figure 4.8. The web page created for the experiment on Facial Expression Recognition by humans.
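For illustration, the responses collected through the web page can be aggregated into a confusion matrix such as Table 4.17 below. This is a minimal sketch, where the (true label, chosen label) pairs and the label ordering are assumptions about the data format, not the actual implementation used in the experiment:

```python
# Minimal sketch: aggregating the participants' answers into a confusion matrix.
# Each response is assumed to be a (true_label, chosen_label) pair.
LABELS = ["AN", "DI", "FE", "HA", "NE", "SA", "SU"]

def build_confusion_matrix(responses):
    """Count how often each true expression was classified as each label."""
    index = {label: i for i, label in enumerate(LABELS)}
    matrix = [[0] * len(LABELS) for _ in LABELS]
    for true_label, chosen_label in responses:
        matrix[index[true_label]][index[chosen_label]] += 1
    return matrix

# Example with two dummy responses: a correct Happy and a Sad mistaken for Neutral.
print(build_confusion_matrix([("HA", "HA"), ("SA", "NE")]))
```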

The confusion matrix is shown in Table 4.17, where we obtain a total mean accuracy of 83.53% on this testing set as evaluated by humans. We can also observe that some expressions are confused with others: Sad and Fear are often confused with Neutral and Surprise, respectively. Happy, however, is the clearest to distinguish, and most of the participants recognized it easily.

     AN    DI    FE    HA    NE    SA    SU   Accuracy
AN  329    21     5     2     3     2     3    90.14 %
DI   23   303    14     3     1    14     7    83.01 %
FE    7    22   243     0     1     5    88    66.39 %
HA    1     2     2   331    12     2     2    94.03 %
NE    6     4     5    13   331     5     0    90.93 %
SA    7    12    11     2    45   276     7    76.67 %
SU    5     4    13    29     7     1   299    83.52 %

Table 4.17. Confusion matrix from human assessment (7 expressions). Results on the FE-Test dataset (described in Section 4.2.2) using the cross-datasets protocol. Rows are the true expressions, columns the expressions chosen by the participants, and the last column the per-class accuracy.
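The per-class accuracies and the 83.53% mean of Table 4.17 follow directly from the row counts; a minimal sketch of the computation:

```python
# Rows of Table 4.17 (human assessment), in the order AN, DI, FE, HA, NE, SA, SU.
ROWS = [
    [329, 21, 5, 2, 3, 2, 3],    # AN
    [23, 303, 14, 3, 1, 14, 7],  # DI
    [7, 22, 243, 0, 1, 5, 88],   # FE
    [1, 2, 2, 331, 12, 2, 2],    # HA
    [6, 4, 5, 13, 331, 5, 0],    # NE
    [7, 12, 11, 2, 45, 276, 7],  # SA
    [5, 4, 13, 29, 7, 1, 299],   # SU
]

# Per-class accuracy: the diagonal count divided by the row total.
per_class = [row[i] / sum(row) for i, row in enumerate(ROWS)]
print(["%.2f%%" % (100 * acc) for acc in per_class])  # 90.14%, 83.01%, ...

# Macro average over the seven expressions: 83.53%.
print("%.2f%%" % (100 * sum(per_class) / len(per_class)))
```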


4.5.5.1 Facial Expression Recognition by our system

In this experiment, we employ our CNN and image pre-processing steps to classify the expressions of the FE-Test dataset. We first study the capability of the system to discriminate between 6 expressions. The system is trained in two ways: with each of the five datasets from the previous Sections separately, and with the five datasets combined. Table 4.18 shows that better results are obtained with the combination of the five datasets than with any single one. The improvement reaches 37.22 percentage points (the worst single-dataset result, with JAFFE as training set, was 32.78%, whereas training with the five databases yields 70% accuracy), and even the best single-dataset result is improved by 6.11 points.

Table 4.18. Results on the FE-Test set (6 expressions), per training set (columns: Training set, FE-Test accuracy).

Therefore, for the case of 7 expressions we use the combination of the five datasets as training set, together with our CNN and pre-processing step. We use FE-Test as testing set, so this is also a cross-datasets evaluation.
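The combination of the five training sets can be sketched as follows. The preprocess function is a hypothetical placeholder for the pre-processing pipeline described earlier, and the dataset variables are assumed to be loaded elsewhere; this is not the actual training code of our system:

```python
import numpy as np

def preprocess(image):
    # Placeholder for the image pre-processing step described earlier
    # (face detection, cropping and normalization); here it is the identity.
    return image

def combine_datasets(datasets):
    """Merge several (images, labels) datasets into a single training set.

    Each dataset was captured under different conditions, so every image
    goes through the same pre-processing step to unify them.
    """
    images, labels = [], []
    for data_x, data_y in datasets:
        images.extend(preprocess(x) for x in data_x)
        labels.extend(data_y)
    return np.asarray(images), np.asarray(labels)

# Hypothetical usage, with the five training datasets loaded elsewhere and
# FE-Test kept strictly apart as the cross-dataset test set:
# train_x, train_y = combine_datasets([db1, db2, db3, db4, db5])
```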

Table 4.19. Confusion Matrix from our system (7 expressions). Results of the FE-Test dataset using the cross-datasets protocol.

The confusion matrix is shown in Table 4.19, where we obtain a total average accuracy of 68.86%. The best accuracy is obtained for Happy and Surprise, where the machine (CNN) performs better than humans. In both experiments the worst results correspond to Sad and Fear, although for these expressions humans are better than the machine. Humans also recognized the Neutral, Angry and Disgust emotions better.

Finally, we can also see a correlation between the two experiments, especially in the recognition of Angry, Disgust and Fear, which are usually confused with Disgust, Angry and Surprise, respectively. Interestingly, these mistakes are made both by humans and by the machine; that is, both perform similar misclassifications.
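The misclassification pattern itself can be read off a confusion matrix by taking, for each expression, the largest off-diagonal entry of its row. Reusing the LABELS and ROWS definitions from the sketches above, the following reproduces the Angry→Disgust, Disgust→Angry and Fear→Surprise confusions on Table 4.17:

```python
def most_confused_with(matrix, labels):
    """For each expression, return the label it is most often mistaken for."""
    result = {}
    for i, row in enumerate(matrix):
        count, j = max((c, k) for k, c in enumerate(row) if k != i)
        result[labels[i]] = labels[j]
    return result

# On Table 4.17 this yields {'AN': 'DI', 'DI': 'AN', 'FE': 'SU', ...}.
print(most_confused_with(ROWS, LABELS))
```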

4.6 Conclusions

In this Section we showed that: (1) the pre-processing step is relevant to improving performance, despite the intrinsic complexity of a CNN; (2) merging information captured with different cameras significantly helps the network's training; and (3) the facial expression classification of non-expert humans is correlated with that of the CNN (especially in the recognition of Angry, Disgust and Fear); that is, the same types of facial expressions are misclassified by both the humans and the neural network.

Several experiments have been performed to build our proposed CNN and to find the adequate steps for image preprocessing. We have evaluated the system using six datasets, including two new datasets (FEGA and FE-Test) that are presented in this Section. One of the captured datasets (FEGA) is the first in the literature to simultaneously label the facial expression, gender and age of the individuals. Another contribution is the combination of different datasets to train our system. To the best of our knowledge, this is the most extensive experimental study to date in cross-dataset facial expression recognition using CNNs, since most previous studies in the literature employ only one dataset for testing. Our study shows that each dataset adds important value to the training, because each of them was captured under different conditions and contains people of different ethnicities and ages.

Therefore, not only the quantity of training data is important for a CNN, but also its variety. Thus, combining these datasets into one single training set, using our image preprocessing steps to unify them, significantly improves the results with respect to using only one dataset for training. Furthermore, we have achieved about 70% accuracy under the cross-datasets protocol, where the test set comes from a never-seen-before dataset. Finally, we have performed a comparative study of facial expression classification by our system vs. human opinion. The experiments show that our system outperforms other solutions proposed in the literature (Table 4.16), in addition to achieving good accuracy in real-world situations. We have also observed that humans and the machine are prone to similar misclassification errors.

As future work, we intend to refine our system with more datasets, in addition to studying the pre-processing step for color images. We also plan to extend this study using age and ethnicity to develop a new, more robust multimodal system for facial expression recognition.


Chapter 5

Evaluation on Social Robots

In Chapter 4, we designed a system for facial expression recognition based on a convolutional neural network and a specific image preprocessing. Using a combination of five datasets, we obtained about 70% accuracy under the cross-datasets protocol, where the network is tested on a dataset unseen during training. In this Chapter, we describe an application with social robots to evaluate our system in a real environment.

Section 5.1 introduces the context of this work and the most relevant related literature. In the next Section, we present the performed experiment. In Section 5.3, we explain the design and procedure in detail. Section 5.4 is devoted to analyzing the obtained results. The last Section draws the conclusions, reviews the main contributions and proposes future lines of work.

5.1 Introduction

Facial expression recognition plays an important role in the recognition and understanding of human emotion by robots [11]. Studies such as [41] have demonstrated that a robot can affect its social environment beyond the person who is interacting with it. For example, studies of robots in autism therapy [83] show that robots can influence how children interact with others. For that reason, facial expression recognition is important to shape a good human-robot interaction and achieve a better user experience, since social robots can simulate empathy and decide the best way to interact according to the facial expression of the user. Robots equipped with expression recognition capabilities can also be a useful tool to get feedback in videogames, for example, since they can assess the degree of satisfaction of the users. They can act as mediators, motivate the user and adapt the game according to the user's emotions. Moreover, many papers have demonstrated that the use of robots in the field of rehabilitation has a considerable effect on the improvement of the patients [91, 73, 24].

There are several types of social robots on the current market [92]. Among them we can highlight the robot NAO [69], a humanoid robot with a friendly appearance and a pleasant voice, which contributes to a better user experience. Many papers have used the social robot NAO [69] in their experiments, as in [32, 14, 94, 9], where the social component of natural interaction is common to all the proposed applications, in addition to being a tool for motivation in rehabilitation sessions.

Given the growing interest in Human-Robot Interaction [87], we have created an advanced interaction system using the social robot NAO. The system consists of a serious game that evaluates the facial expression made by the user in front of the social robot. The robot acts as if it were an evaluator of actors and actresses, and interacts with the person according to his or her facial expression: with each recognized expression, the robot responds with a positive phrase to encourage the user to keep playing. In the design of the facial expression recognition system we have used the trained network described in Chapter 4. The main goal of the experiment is to evaluate our trained network in real environments with non-expert users.
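A heavily simplified version of the interaction loop, using the NAOqi Python SDK (Python 2), could look as follows. The robot address, the phrases and the classify_expression helper are placeholders, and the real game logic of our system is more elaborate:

```python
from naoqi import ALProxy

NAO_IP, NAO_PORT = "192.168.1.10", 9559  # placeholder robot address

def classify_expression(frame):
    # Placeholder for the trained CNN of Chapter 4
    # (pre-processing followed by a forward pass).
    return "Happy"

tts = ALProxy("ALTextToSpeech", NAO_IP, NAO_PORT)
video = ALProxy("ALVideoDevice", NAO_IP, NAO_PORT)

# Top camera (0), QVGA resolution (1), BGR color space (13), 10 fps.
handle = video.subscribeCamera("fer_game", 0, 1, 13, 10)
try:
    tts.say("Show me a happy face!")      # the robot asks for an expression
    frame = video.getImageRemote(handle)  # width, height, ..., raw image bytes
    expression = classify_expression(frame)
    tts.say("Great job, that really looked %s!" % expression)  # positive feedback
finally:
    video.unsubscribe(handle)
```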

However, the system can also be used in other ways:

• As a user-experience evaluation tool, where the robot adapts according to the user's emotions (positive feedback).

• As a tool for training the emotions, where the social robot acts as a supervisor of the user's level of success in performing an emotion. This system allows replicating and learning, in a playful way, seven facial expressions (Happy, Sad, Disgust, Angry, Surprise, Fear and Neutral). This kind of experiment also seeks to encourage the attention and motivation of users, especially people with special needs, for example children with autism. In [24], the authors affirm that the social robot can be a good imitation and interaction tool for children with attention deficit disorder.

In this experiment we also evaluated the interaction (movements) with the participant, the attention (level of the user's concentration) and the difficulty of expressing an emotion, through a final interview with each participant. Since the participants were non-experts in this field, some of them did not know how to express some emotions.

Therefore, the results obtained by the CNN were also compared with the ground truth provided by 10 experts in facial expression recognition, in order to validate the system. We considered as experts the 10 persons who ranked best in an initial test with 30 participants and who obtained a hit rate of 100% in Experiment 5 of Chapter 4 (Section 4.5.5).
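Selecting the experts reduces to ranking the 30 initial participants by their hit rate and keeping the top 10; a trivial sketch with hypothetical values:

```python
# hit_rates: participant id -> fraction of correctly classified images.
# Values are hypothetical; the 10 selected experts all reached 1.0 (100%).
hit_rates = {"p01": 1.0, "p02": 0.9, "p03": 1.0}  # ... up to 30 participants

experts = sorted(hit_rates, key=hit_rates.get, reverse=True)[:10]
print(experts)
```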

5.2 Experiment

The goal of this study is to measure both the interaction and the attention of users with a social robot, in addition to evaluating our trained neural network in real time with a completely new set of users.

A total of 29 people participated in the experiment. Each participant was evaluated individually and signed an informed consent form at the beginning of the experiment, since our robot captured his or her images. The participant sat in front of the robot (see Figure 5.1) and followed the instructions of NAO, without any help from the interlocutor. The robot began with an explanatory presentation of the game and involved the user by addressing him or her by name, to give a sense of a personalized application. In this presentation, the robot acts as if it were an