Title Page
Contents
Abstract 11
Chapter 1. Introduction 13
Chapter 2. Related Work 15
2.1. Facial Re-targeting 15
2.1.1. Blendshapes-based Re-targeting 16
2.2. Facial Expression Recognition Method 17
2.2.1. Depth-wise Separable Convolution 17
2.2.2. Facial Expression Recognition Vision Transformer 18
2.3. Multi-task Learning for Facial Expression and Intensity Estimation 18
Chapter 3. Dataset 20
3.1. Facial Emotion Recognition Dataset 20
3.1.1. Data Collection 20
3.1.2. Annotation Process 20
Chapter 4. Methods 23
4.1. Deep Learning Model 23
4.1.1. Problem Statement 23
4.1.2. Overall Architecture 24
4.2. Model Transformation 27
4.3. Facial Re-targeting Module 28
4.3.1. Face Crop in Unity3D Engine 29
4.3.2. Facial Re-targeting 32
Chapter 5. Experiments 34
5.1. Experimental Settings 34
5.2. Baseline Models 35
5.2.1. Machine Learning Baselines 35
5.2.2. Deep Learning Baselines 36
5.3. Experimental Results 37
5.3.1. Model Performance 37
5.3.2. Evaluation of Facial Re-targeting 38
5.3.3. Facial Re-targeting Results 38
5.3.4. Facial Re-targeting Performance 41
5.3.5. Facial Re-targeting MOS 44
5.3.6. Summary 45
Chapter 6. Conclusion 48
References 49
초록 (Abstract in Korean) 60
List of Tables
Table 3.1. Descriptive statistics of the number of facial expression and intensity samples in the dataset. (*) denotes facial expression intensity. 22
Table 5.1. Number of samples in train and test folds of EDFR dataset. 34
Table 5.2. Classification performance comparisons between nine baseline models and the proposed model. 35
Table 5.3. Intensity performance comparisons between nine baseline models and the proposed model. 36
Table 5.4. Performance comparisons of facial re-targeting models and algorithms embedded in inference workers. 47
List of Figures
Figure 3.1. An illustration of the proposed EDFR Encoder Framework. 21
Figure 3.2. An illustration of the pivot image of facial expression intensity. (*) denotes facial expression intensity. 22
Figure 4.1. An illustration of the proposed model, EDFR. 24
Figure 4.2. An illustration of the Emotion Encoder. 25
Figure 4.3. An illustration of the Intensity Encoder. 26
Figure 4.4. An illustration of the Model Transformation. 27
Figure 4.5. An illustration of the Facial Re-targeting Module. 28
Figure 5.1. Representation of landmarks drawn from the 468 landmarks (Face Mesh) and 68 landmarks (Dlib). It is a difficult task to accurately ascertain the angle of displacement... 39
Figure 5.2. Comparative overall results obtained in real time with a webcam. 40
Figure 5.3. Comparative results for Angry facial expression intensity. 41
Figure 5.4. Comparative results for Embarrassed facial expression intensity. 42
Figure 5.5. Comparative results for Happy facial expression intensity. 43
Figure 5.6. The average facial image for the Happy expression. (1) is the least intense Happy face and (5) the most intense. (6) is a face landmark representation... 44