Title Page
Abstract
Contents
Nomenclature 17
Chapter 1. Introduction 19
1.1. Background 19
1.1.1. Convolutional Neural Network 19
1.1.2. Traffic Sign Classification 22
1.1.3. Traffic Sign Detection 23
1.2. Contribution and Key Concepts 24
1.3. Organization of the Dissertation 27
Chapter 2. Attention-based Convolutional-Pooling Neural Network for Traffic Sign Classification 28
2.1. Motivation 28
2.2. Attention Mechanism 29
2.3. Convolutional-Pooling Neural Network 30
2.3.1. Convolution Layer 30
2.3.2. Convolutional-Pooling Layer 31
2.3.3. Classification Layer 32
2.4. Attention-based Convolutional-Pooling Neural Network 33
2.5. Harsh Traffic Sign Classification Experiments 35
2.5.1. Data Augmentation 36
2.5.2. ACPNN Architecture Selection 37
2.5.3. Evaluation and Comparison Results 38
2.6. Discussion 54
Chapter 3. An Advanced Classifier for Traffic Sign Classification Considering External Environments 55
3.1. Motivation 55
3.2. Network Architecture 57
3.2.1. Convolution Layer 57
3.2.2. Deconvolution Layer 57
3.2.3. Attention Layer 58
3.2.4. Convolutional-Pooling 60
3.2.5. Attentional-Deconvolution Module-based Net 61
3.3. Noise-Coupled Traffic Sign Classification and Comparison Results 67
3.3.1. Data Augmentation 67
3.3.2. ADM-Net Architecture Selection 68
3.3.3. Evaluation and Comparison Results 71
3.4. Discussion 85
Chapter 4. Feature-Selection-based Attentional-Deconvolution Detector for German Traffic Sign Detection Benchmark 87
4.1. Motivation 87
4.2. Network Architecture 89
4.2.1. Structure of YOLOv5 89
4.2.2. L1-Norm Feature Selection 90
4.2.3. Feature-Selection-based Attentional-Deconvolution Detector 94
4.3. Traffic Sign Detection and Comparison Results 98
4.3.1. Preprocessing 98
4.3.2. Evaluation and Comparison Results 101
4.4. Discussion 113
Chapter 5. Conclusions 114
5.1. Contributions 115
5.2. Future Work 116
Bibliography 117
List of Tables
Table 2.1. Architecture selection for CNN and ACPNN. 38
Table 2.2. Hyper-parameters selected for training. 39
Table 2.3. Dataset information corresponding to the classification performance in the tables. 41
Table 2.4. Classification performance using original GTSRB training dataset. 43
Table 2.5. Classification performance for the original GTSRB training dataset with blur. 43
Table 2.6. Classification performance for the original GTSRB training dataset images with missing information. 43
Table 2.7. Classification performance for the original GTSRB training dataset with illumination effects. 44
Table 2.8. Classification performance for original GTSRB training dataset with contrast normalization. 44
Table 2.9. Architecture selection for hierarchical CNN. 45
Table 2.10. Classification performance for CNN, hierarchical CNN, and ACPNN (3-attention) for the original GTSRB dataset with blur. 47
Table 2.11. Classification performance for CNN, hierarchical CNN, and ACPNN (3-attention) for the original GTSRB dataset with missing information. 47
Table 2.12. Classification performance for CNN, Multi-scale CNN, Committee, hierarchical CNN, MCDNN, and ACPNN (3-attention) for the original GTSRB... 52
Table 2.13. Classification performance for CNN, Multi-scale CNN, Committee, hierarchical CNN, MCDNN, and ACPNN (3-attention) for the original GTSRB... 52
Table 2.14. Memory usages and training/testing time comparisons of each approach. 53
Table 3.1. Architecture selection for ADM-Net. 69
Table 3.2. Selection of hyper-parameters for training. 71
Table 3.3. Classification evaluations using the original GTSRB datasets with blur as training. 72
Table 3.4. Classification evaluations using the original GTSRB datasets with missing information as training. 72
Table 3.5. Classification evaluations using the original GTSRB datasets with illumination as training. 75
Table 3.6. Classification evaluations using the original GTSRB datasets with missing information as training. 75
Table 3.7. Classification evaluations using the original GTSRB datasets with missing information as training. 78
Table 3.8. Classification evaluations using the original GTSRB datasets with illumination as training. 78
Table 3.9. Classification evaluations using the original GTSRB datasets with blur as training. 80
Table 3.10. Classification evaluations using the original GTSRB datasets with missing information, blur, and illumination as training. 80
Table 3.11. Ablation studies of classification evaluations using the original GTSRB datasets with missing information, blur, and illumination as training. 81
Table 3.12. Training and test information for each network from Tables 3.3 to 3.9. 84
Table 3.13. Training and test information for each network for Table 3.10. 84
Table 4.1. L1-norm selection for YOLOv5. 101
Table 4.2. Selection of receptive field sizes based on the L1-norm. 101
Table 4.3. Hyper-parameter selections for YOLOv5, YOLOv6, YOLOv7, and FSADD. 101
Table 4.4. Traffic sign recognition results of the FSADD for the GTSDB dataset. 103
Table 4.5. Traffic sign recognition results of YOLOv5 using GTSDB. 106
Table 4.6. Traffic sign recognition results of YOLOv6 using GTSDB. 107
Table 4.7. Traffic sign recognition results of YOLOv7 using GTSDB. 108
Table 4.8. Traffic sign recognition comparisons using GTSDB. 111
List of Figures
Figure 1.1. Examples of several traffic signs affected by external noise in real applications. 20
Figure 2.1. Three branches of the proposed ACPNN: (a) input images, (b) feature extraction and training (the convolution layer applies the attention mecha-... 34
Figure 2.2. Example images for (a) original GTSRB test dataset and (b) new test dataset with simulated harsh conditions (blur, illumination, and missing information). 36
Figure 2.3. Examples of generated GTSRB training datasets for harsh conditions: (a) original GTSRB images, and images with (b) blur, (c) missing infor-... 40
Figure 2.4. Examples of feature maps when the attention mechanism is applied: (a) Generated traffic signs with missing information or blur, (b) The bright... 42
Figure 2.5. Example images for hierarchical CNN and the proposed ACPNN: (a) original GTSRB test dataset, (b) blur and missing information cases. 46
Figure 2.6. Training accuracy for CNN, hierarchical CNN, and proposed ACPNN for the original GTSRB dataset with blur. 49
Figure 2.7. Training loss for CNN, hierarchical CNN, and proposed ACPNN for the original GTSRB dataset with blur. 49
Figure 2.8. Training accuracy for CNN, hierarchical CNN, and proposed ACPNN for the original GTSRB dataset with missing information. 50
Figure 2.9. Training loss for CNN, hierarchical CNN, and proposed ACPNN for the original GTSRB dataset with missing information. 50
Figure 3.1. Architecture of ADM-Net including convolutional-pooling, FCN, and two ADMs with attention and deconvolution layers. 61
Figure 3.2. Design of ADMs in ADM-Net: (a) first ADM applied to feature-reweighted map, (b) second ADM applied to another feature-reweighted map. 62
Figure 3.3. Example images for (a) original GTSRB test datasets and (b) new test datasets affected by harsh conditions (blur and missing information). 67
Figure 3.4. Generated training images for (a) original GTSRB images, and (b) images with blur, (c) missing information, and (d) illumination. 71
Figure 3.5. Generated test images for (a) original GTSRB images and (b) missing information with illumination. 74
Figure 3.6. Generated test images for (a) original GTSRB images, (b) missing information with illumination, and (c) missing information with illumination and blur. 77
Figure 3.7. Training accuracy for GTSRB datasets (0 to 20 epochs). 82
Figure 3.8. Training accuracy for GTSRB datasets (10 to 20 epochs). 82
Figure 3.9. Training loss for GTSRB datasets (0 to 20 epochs). 83
Figure 3.10. Training loss for GTSRB datasets (10 to 20 epochs). 83
Figure 4.1. Examples of feature maps sorted by L1-norm values: (a) original images from GTSDB; (b) obtained feature maps from CSPDark-net53. 91
Figure 4.2. Examples of grouping images by similarity: (a) original input image from the GTSDB dataset; (b) extracted feature maps from the convolution layer;... 92
Figure 4.3. Process of the CSP-block in the CSPDarknet53 using L1-norm feature selection. 93
Figure 4.4. Architecture of the proposed feature-selection-based attentional-deconvolution detector (FSADD). 95
Figure 4.5. Forty-three kinds of German traffic signs from the GTSDB dataset. 98
Figure 4.6. Examples from the GTSDB dataset. 99
Figure 4.7. Traffic sign detection results using the FSADD - part 1: (a) ground truth - 17 and 38, (b) ground truth - 8 and 10, (c) ground truth - 14 and 36, (d)... 104
Figure 4.8. Traffic sign detection results using the FSADD - part 2: (a) ground truth - 2 and 9, (b) ground truth - 1, (c) ground truth - 15, (d) detection results... 105
Figure 4.9. Traffic sign detection performance comparisons of YOLOv5, YOLOv6, YOLOv7, and FSADD. 110