Title Page
Contents
Abstract 10
Ⅰ. Introduction 12
Ⅱ. Image Outpainting 14
1. Introduction 14
2. Related Work 18
1) Image Completion 18
2) Image Captioning 18
3) Text-guided Image Manipulation 19
3. Proposed Method 20
1) Captioning-based Extensive Painting Module 20
2) Image Outpainting 24
3) Wide-range Image Blending 24
4. Experiments 24
1) Baseline Methods 25
2) Datasets 25
3) System Set-up 26
4) Quantitative Results 27
5) Qualitative Results 35
6) Ablation Studies 35
5. Limitation 40
Ⅲ. Video Editing 41
1. Introduction 41
2. Related Work 44
1) Text-Guided Editing 44
2) Optical Flow Estimation 45
3. Proposed Method 46
1) Preliminary 49
2) Motion Map Injection Module 49
4. Experiment 52
1) Experimental Setup 52
2) User Study 52
3) Quantitative Results 53
4) Qualitative Results 56
5) Ablation Study 57
6) Application 59
7) Limitation 60
Ⅳ. Conclusion 61
Ⅴ. Reference 63
Table 1. Quantitative results for the image outpainting task on Landscape (Yang et al., 2019), Landmarks (Weyand et al., 2020), and AmsterTime (Yildiz... 26
Table 2. Quantitative results for the wide-range image blending task on Landmarks, Scenery, and AmsterTime datasets using metrics on FID (the lower,... 27
Table 3. Effect of SCST optimization on the image captioning model; the image captioning model OFA was optimized via the SCST method using CIDEr with... 36
Table 4. Effect of mask sizes; Image inpainting was conducted at the last stage of wide-range image blending on Landmarks dataset, using OFA and... 37
Table 5. Comparison of hint-based image outpainting models on the beach dataset. 37
Table 6. Comparison of captioning models on Landmarks dataset. 38
Table 7. A user study evaluating the performance of Video-P2P and the output of our model conducted on three items: Structure Preserving, Text Ali... 53
Table 8. Quantitative comparison results. Higher performance was recorded for CLIP Score and Masked PSNR compared to Video-P2P. 53
Figure 1. Illustration of image outpainting and wide-range image blending. Image outpainting is a task which aims to extend a given image beyond its... 14
Figure 2. Overall process of extensive painting with the proposed CEP module. Left: Image outpainting is performed through our CEP module... 19
Figure 3. Qualitative results of image outpainting task on AmsterTime dataset. 31
Figure 4. Qualitative results of wide-range image blending task on Image 4K dataset. 34
Figure 5. Effect of optimizing an image captioning model on wide-range image blending; without the SCST optimization process, the text-guided image... 35
Figure 6. Effect of captions for extensive painting; image captions for the missing region provide diverse and natural images. Red boxes indicate... 36
Figure 7. Comparison of OFA and ClipCap captioning with GLIDE. 39
Figure 8. Failure cases; The word "blurry" in captions generated blurry images. Red boxes indicate generated regions. 40
Figure 9. Comparison of video editing outputs and attention maps with existing methods. Both Image-P2P and Video-P2P failed to accurately est... 41
Figure 10. Overall framework of this study. First, the T2V-Model generates an attention map by receiving video and prompts as input.... 47
Figure 11. Examples where the existing video editing model could not edit objects with motion. However, editing was performed by directly injecting... 48
Figure 12. Experimental results of our study, showing the results of four examples corresponding to each target prompt. The top row of each... 55
Figure 13. Output of various methods to measure the correlation between the motion map and the attention map. 57
Figure 14. Editing method for objects moving in the direction specified by the user. Before editing, the user first selects one of the 8 directions. 58
Figure 15. Video-P2P and our video editing results with estimated optical flow under inaccurate motion estimation. 60