
Pacific Graphics 2021

E. Eisemann, K. Singh, and F.-L. Zhang (Guest Editors)

Volume 40 (2021), Number 7

Deep Learning-Based Unsupervised Human Facial Retargeting

Seonghyeon Kim1, Sunjin Jung1, Kwanggyoon Seo1, Roger Blanco i Ribera2, Junyong Noh1

1 KAIST, Visual Media Lab    2 C-JeS Gulliver Studios

1. Architecture

Table 1 and Table 2 show the architectures of ReenactNet and BPNet, respectively.

Table 1: Overview of the architecture of ReenactNet. Convolutional filters are specified in the format “k(kernel size)s(stride)”. PS2 indicates a pixel shuffle layer [SCH16] with an upscale factor of 2. The two decoders Ds and Dt of the autoencoder share the same structure.

Encoder E    Filter    Activation function    Output
Conv         k3s1      ReLU                   16×128×128
Conv         k3s2      ReLU                   32×64×64
Conv         k3s2      ReLU                   64×32×32
Conv         k3s2      ReLU                   128×16×16
Conv         k3s2      ReLU                   256×8×8
Conv         k3s2      ReLU                   512×4×4
FC           -         -                      512
FC           -         -                      8192
Conv         k3s1      -                      512×4×4
PS2          -         LReLU (α=0.2)          512×8×8

Decoder D    Filter    Activation function    Output
Conv         k3s1      LReLU (α=0.2)          512×8×8
PS2          -         -                      512×16×16
Conv         k3s1      LReLU (α=0.2)          256×16×16
PS2          -         -                      256×32×32
Conv         k3s1      LReLU (α=0.2)          128×32×32
PS2          -         -                      128×64×64
Conv         k3s1      LReLU (α=0.2)          64×64×64
PS2          -         -                      64×128×128
Conv         k3s1      LReLU (α=0.2)          32×128×128
PS2          -         -                      32×256×256
Conv         k7s1      tanh                   3×128×128
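For concreteness, the following is a minimal PyTorch sketch of the ReenactNet autoencoder as listed in Table 1. It is an illustrative reconstruction rather than the authors' implementation: the 3×128×128 input size is inferred from the first output shape, and because a pixel shuffle with upscale factor 2 divides the channel count by 4, each convolution that feeds a PS2 layer is assumed to emit 4× the listed channels so that the post-shuffle shapes match the table's output column.

```python
# Illustrative PyTorch sketch of the ReenactNet autoencoder (Table 1).
import torch
import torch.nn as nn


class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        chans = [3, 16, 32, 64, 128, 256, 512]
        convs = []
        for i, (c_in, c_out) in enumerate(zip(chans[:-1], chans[1:])):
            stride = 1 if i == 0 else 2  # k3s1 first, k3s2 afterwards
            convs += [nn.Conv2d(c_in, c_out, 3, stride, 1), nn.ReLU()]
        self.convs = nn.Sequential(*convs)       # -> 512 x 4 x 4
        self.fc1 = nn.Linear(512 * 4 * 4, 512)   # 512-D latent code
        self.fc2 = nn.Linear(512, 8192)          # 8192 = 512 * 4 * 4
        self.up = nn.Sequential(
            nn.Conv2d(512, 4 * 512, 3, 1, 1),    # 4x channels for pixel shuffle
            nn.PixelShuffle(2),                  # PS2 [SCH16] -> 512 x 8 x 8
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        h = self.convs(x).flatten(1)
        h = self.fc2(self.fc1(h)).view(-1, 512, 4, 4)
        return self.up(h)


class Decoder(nn.Module):
    """One of the two structurally identical decoders Ds / Dt."""

    def __init__(self):
        super().__init__()
        blocks = []
        for c_in, c_out in [(512, 512), (512, 256), (256, 128),
                            (128, 64), (64, 32)]:
            # Conv k3s1 + LReLU, then a PS2 doubling the resolution.
            blocks += [nn.Conv2d(c_in, 4 * c_out, 3, 1, 1),
                       nn.LeakyReLU(0.2),
                       nn.PixelShuffle(2)]
        # Final k7s1 conv + tanh. NOTE: Table 1 lists the output as
        # 3x128x128, but a stride-1 conv keeps the 256x256 map produced
        # by the last PS2 row; we simply follow the layer sequence.
        blocks += [nn.Conv2d(32, 3, 7, 1, 3), nn.Tanh()]
        self.net = nn.Sequential(*blocks)

    def forward(self, z):
        return self.net(z)


if __name__ == "__main__":
    x = torch.zeros(1, 3, 128, 128)
    print(Decoder()(Encoder()(x)).shape)  # torch.Size([1, 3, 256, 256])
```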


Table 2: Overview of the architecture of BPNet. Convolutional filters are specified in the format “k(kernel size)s(stride)”.

Encoder E    Filter    Activation function    Output
Conv         k3s1      ReLU                   16×128×128
Conv         k3s2      ReLU                   32×64×64
Conv         k3s2      ReLU                   64×32×32
Conv         k3s2      ReLU                   128×16×16
Conv         k3s2      ReLU                   256×8×8
Conv         k3s2      ReLU                   512×4×4
FC           -         ReLU                   2048
FC           -         ReLU                   512
FC           -         ReLU                   256
FC           -         ReLU                   128
FC           -         -                      52
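Similarly, here is a minimal PyTorch sketch of BPNet as listed in Table 2, under the same caveats as above; reading the 52 outputs as per-frame blendshape weights is our assumption, not something stated in the table itself.

```python
# Illustrative PyTorch sketch of BPNet (Table 2).
import torch
import torch.nn as nn


class BPNet(nn.Module):
    def __init__(self, out_dim: int = 52):  # 52 outputs, assumed blendshape weights
        super().__init__()
        chans = [3, 16, 32, 64, 128, 256, 512]
        convs = []
        for i, (c_in, c_out) in enumerate(zip(chans[:-1], chans[1:])):
            stride = 1 if i == 0 else 2  # k3s1 first, k3s2 afterwards
            convs += [nn.Conv2d(c_in, c_out, 3, stride, 1), nn.ReLU()]
        self.convs = nn.Sequential(*convs)  # -> 512 x 4 x 4
        dims = [512 * 4 * 4, 2048, 512, 256, 128]
        fcs = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            fcs += [nn.Linear(d_in, d_out), nn.ReLU()]
        fcs.append(nn.Linear(128, out_dim))  # last FC has no activation
        self.fcs = nn.Sequential(*fcs)

    def forward(self, x):
        return self.fcs(self.convs(x).flatten(1))


if __name__ == "__main__":
    print(BPNet()(torch.zeros(1, 3, 128, 128)).shape)  # torch.Size([1, 52])
```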

2. Additional Results

The following Figures 1, 2, 3, and 4 show additional results.

Figure 1: Cyclic retargeting of expressions to verify the robustness of our method. Each expression of the source model is retargeted to different models (Target) and then retargeted back to the source model (Recovered).


Figure 2: Results of our method on extreme expressions: anger, happiness, surprise, and sadness. We added 1000 more frames of animation to the training datasets of the source and target models because the original dataset does not cover an extreme range of expressions. The expressions of the source model are reproduced well on Man A and Man C. In the case of Mery, the angry and sad expressions are not transferred as convincingly as the others due to the large difference in facial proportions between the source and target models; the other two expressions are retargeted well.


Figure 4: Comparison of retargeting results produced by our method (Ours), cross-mapping (CM), and manifold alignment (MA). In all cases, our method generates results that are superior or comparable to those of the other methods.

References

[SCH16] SHI W., CABALLERO J., HUSZÁR F., TOTZ J., AITKEN A. P., BISHOP R., RUECKERT D., WANG Z.: Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 1874–1883.