Table of Contents

Title Page

ABSTRACT

Contents

1. Introduction 13

1.1. Pedestrian Guidance System 13

1.2. Research Content 15

1.3. Research Target 16

2. Literature Review 17

2.1. Attention Mechanisms in Computer Vision 17

2.2. Squeeze-and-Excitation Networks 19

2.3. Convolutional Block Attention Modules 20

2.3.1. Spatial Attention Module (SAM) 21

2.3.2. Channel Attention Module (CAM) 22

2.4. Perspective Transformation in Image Processing 23

3. Real-Time Semantic Segmentation 25

3.1. Speed-Accuracy Trade-offs in Existing Models 25

3.2. Bilateral Segmentation Network (BiSeNet) 26

3.2.1. Attention Refinement Module 26

3.2.2. Feature Fusion Module 26

3.3. STDC2 27

3.3.1. STDC Module 27

4. Datasets 28

4.1. CityScapes 28

4.2. Korean Pedestrian Dataset 30

4.3. Training with Korean Pedestrian Dataset 31

4.4. Warp Perspective Transform 34

5. Experiment 37

5.1. Proposed Method 37

5.2. Experiment 38

5.2.1. Environment and setting 38

5.2.2. Result 38

6. Conclusion 40

References 41

Table 1. Categories of CityScapes dataset 29

Table 2. Categories of Korean Pedestrian dataset 31

Table 3. Categories of Korean Pedestrian dataset after label merging 32

Table 4. Perspective comparison between datasets 33

Table 5. Comparison of Warp Perspective Transform 35

Table 6. Training result comparison with stacked / non-stacked images 35

Table 7. IoU of each class by stacked / non-stacked images 35

Table 7. Result comparison of Perspective transform training 36

Table 8. Experiment model structure 37

Table 9. Environment and setting 38

Table 10. Experiment Result 38

Table 11. Class accuracy for each experiment case 39

Fig 1. Example screen of the Pedestrian Guidance System 14

Fig 2. Illustration of human vision focus 17

Fig 3. Illustration of attention changes in human vision 18

Fig 4. Transformer Architecture, Scaled Dot Product Attention, and Multi-Head Attention 18

Fig 5. Convolutional Block Attention Module layout 20

Fig 6. Feature Maps representation as a Tensor 21

Fig 7. Spatial Attention Module 21

Fig 8. Channel Attention Module 22

Fig 9. Example of CityScapes Dataset 28

Fig 10. Inference image from STDC2 trained on the CityScapes dataset, applied to a Korean pedestrian environment 30

Fig 11. Example of feature redundancy in Korean Pedestrian dataset 1 31

Fig 12. Example of feature redundancy in Korean Pedestrian dataset 2 32

Fig 13. Result of STDC2 trained on Korean Pedestrian dataset after label merging 34