Table of Contents

Title Page

Contents

List of Abbreviations 17

ABSTRACT 19

CHAPTER Ⅰ. Introduction 22

1.1. Background 22

1.2. Motivation 25

1.3. Contributions 35

1.4. Outline 36

CHAPTER Ⅱ. Preliminaries of VSR 39

2.1. Propagation 39

2.1.1. Local Propagation 40

2.1.2. Recurrent Propagation 43

2.2. Alignment 44

2.2.1. Explicit Alignment 45

2.2.2. Implicit Alignment 47

2.3. Aggregation and Up-sampling 48

CHAPTER Ⅲ. Related Works 49

3.1. Single Image Super-Resolution 49

3.1.1. Regression-based Method 49

3.1.2. Diffusion Model-based Method 51

3.2. Video Super-Resolution 53

3.2.1. Local Propagation-based Method 53

3.2.2. Recurrent Propagation-based Method 55

CHAPTER Ⅳ. Proposed Video Super-Resolution Models 58

4.1. Group-based Bi-Directional Recurrent Wavelet Neural Network 61

4.1.1. Overview 61

4.1.2. Group-based Bi-Directional Recurrent Propagation 62

4.1.3. Temporal Wavelet Attention 67

4.2. Hierarchical Recurrent Spatio-Temporal Transformer Network 71

4.2.1. Overview 71

4.2.2. Hierarchical Bi-directional Recurrent Propagation 76

4.2.3. Hierarchical Recurrent Transformer 81

CHAPTER Ⅴ. Application for VVC 87

5.1. Versatile Video Coding (VVC) 87

5.2. Reference Picture Resampling (RPR) for VVC 91

5.3. Hierarchical Recurrent Neural Network for VVC 92

CHAPTER Ⅵ. Experimental Results 95

6.1. Datasets 95

6.1.1. Vimeo-90K 96

6.1.2. REDS 98

6.1.3. Vid4 99

6.2. Implementation Details 100

6.2.1. GBR-WNN 100

6.2.2. HiRN and HiRT 100

6.3. Comparison with the State of the Art 101

6.3.1. Quantitative Comparisons 102

6.3.2. Qualitative Comparisons 110

6.4. Ablation Studies for GBR-WNN 111

6.4.1. Impact of GBR Framework and TWA Module 111

6.4.2. Impact of Attention Mechanism in TWA Module and Number of TWA Module 112

6.5. Ablation Studies for HiRN and HiRT 114

6.5.1. Impact of TWA Module on HiRN 114

6.5.2. The effectiveness of bi-directional access branch in HiRN 115

6.5.3. The effectiveness of feature evolution in HiRN and HiRT 121

6.5.4. The analysis of the discrete wavelet transform 123

6.5.5. The summary of the effectiveness for the proposed methods 128

6.6. Results on Application for VVC 130

CHAPTER Ⅶ. Conclusion 134

References 137

ABSTRACT IN KOREAN 156

TABLE 6.1. Quantitative comparison (average PSNR(dB) and SSIM) on REDS4 (RGB channel) and Vid4 (Y channel) for 4x video SR. 103

TABLE 6.2. Perceptual quantitative comparison (average LPIPS and NIQE) on REDS4 (RGB channel) for 4x video SR. 105

TABLE 6.3. Comparison of the computational complexity on REDS4 (RGB channel) for 4x video SR. 107

TABLE 6.4. Analysis of adopted GBR framework and TWA module (Experiments here adopt a model with 20 RBs) on Vid4 for 4x video SR on... 110

TABLE 6.5. Analysis of attention mechanism, wavelet transform, number of TWA module (Experiments here adopt a model with 10 RBs) on Vid4... 113

TABLE 6.6. Analysis of adopted TWA module on the proposed HiRN on Vid4 (Y channel) for 4x video SR. 114

TABLE 6.7. Analysis of PSNR (dB) for bi-directional access branch (stage 2) on the proposed HiRN. 115

TABLE 6.8. Comparison of latency (ms) between existing bi-directional propagation-based method and the proposed HiRN for 10 frames in a clip 000 on REDS4 (RGB channel). 120

TABLE 6.9. Difference in PSNR for each Swin Transformer layer of HiRT on REDS4 (RGB channel) for 4x video SR. 125

TABLE 6.10. Comparison of computation time (ms) of five types of DWT for processing overall frames in a clip on REDS4 (RGB channel). 127

TABLE 6.11. The summary of the effectiveness of the proposed schemes on REDS4 (RGB channel) and Vid4 (Y channel) for 4x video SR. 129

TABLE 6.12. BD-rate reduction and encoding/decoding computational complexity of proposed HiRN-based RPR compared to VTM-11.0 nnvc-... 132

TABLE 6.13. BD-rate reduction and encoding/decoding computational complexity of proposed HiRN-based RPR compared to VTM-11.0 nnvc-... 132

Fig. 1.1. An example of consecutive frames on clip 011 in REDS4 [1]. The red boxes indicate 64 x 64 patches. 27

Fig. 1.2. Schematic illustration of three propagation frameworks. 32

Fig. 1.3. The results of the BasicVSR++ [2] and the proposed HiRT at frame 012 on clip 015 of the REDS4. 34

Fig. 1.4. The results of the BasicVSR++ [2] and the proposed HiRT at frame 009 on clip 011 of the REDS4. 35

Fig. 2.1. Different merging schemes of local propagation for VSR. 41

Fig. 2.2. The architecture of the FRVSR. 42

Fig. 2.3. The architecture of the BasicVSR. 42

Fig. 2.4. The architecture of the SOF-VSR. 45

Fig. 2.5. The architecture of the PCD alignment module in EDVR. 46

Fig. 2.6. The architecture of the ESPCN. 48

Fig. 3.1. The architecture of the SwinIR. 51

Fig. 3.2. The architecture of the SRDiff. 52

Fig. 3.3. The architecture of the DUF. 54

Fig. 3.4. The architecture of the BasicVSR++. 56

Fig. 4.1. An overview of the proposed GBR-WNN. 60

Fig. 4.2. The structure of the proposed GBR framework. 62

Fig. 4.3. The network architecture of the proposed TWA module. 66

Fig. 4.4. An example of the 2D Haar discrete wavelet transform (DWT). 69

Fig. 4.5. An overview of the proposed hierarchical recurrent network. 72

Fig. 4.6. The illustration of residual block-based feature propagation block (FPB). 74

Fig. 4.7. The illustration of transformer block-based feature propagation block (FPB) of stage 2. 82

Fig. 4.8. The illustration of pre-processing for transformer-based proposed network. 83

Fig. 4.9. The illustration of the Swin Transformer layer. 84

Fig. 5.1. Schematic illustration of the encoder and decoder frameworks of VVC standard for RPR. 90

Fig. 5.2. An overview of the VVC framework utilizing the proposed HiRN-based RPR. 93

Fig. 6.1. The example frames of the Vimeo-90K. 96

Fig. 6.2. The example frames of the REDS. 97

Fig. 6.3. The example frames of the REDS4. 98

Fig. 6.4. The example frames of the Vid4. 99

Fig. 6.5. Qualitative comparison of clip 000 on REDS4 for 4x video SR. Zoom in to see better visualization. 108

Fig. 6.6. Qualitative comparison of clip 015 on REDS4 for 4x video SR. Zoom in to see better visualization. 108

Fig. 6.7. Qualitative comparison of clip city on Vid4 for 4x video SR. Zoom in to see better visualization. 109

Fig. 6.8. Qualitative comparison of clip foliage on Vid4 for 4x video SR. Zoom in to see better visualization. 109

Fig. 6.9. The maximum optical flow of each frame on clip 015 and 020 in REDS4. 117

Fig. 6.10. The difference of PSNR (dB) of each frame on clip 020 in REDS4. 117

Fig. 6.11. The comparison of tendency between PSNR of bicubic and the difference of PSNR. 118

Fig. 6.12. Visualization of output feature maps of three branches in the proposed HiRN on Calendar and City clips of Vid4. 122

Fig. 6.13. The results of 1-level discrete wavelet transform (DWT) on clip 000 in REDS4. 124

Fig. 6.14. The results of each level of Haar discrete wavelet transform (DWT) on clip 000 in REDS4. 126

Fig. 6.15. Visualization of output feature maps of feature extractor (five residual blocks) for low-frequency sub-band image of each level of Haar... 127