Title Page
Abstract
Abstract (in Korean)
Contents
Chapter 1. Introduction 15
1.1. Motivation 15
1.2. Contribution 16
1.3. Organization of the Dissertation 16
Chapter 2. Related Work 17
Chapter 3. Method 19
3.1. Conventional Knowledge Distillation Methods 19
3.1.1. Logit-Based 20
3.1.2. Feature-Based 21
3.2. Penalized Logit and Feature Distillation (PLAF) 22
3.2.1. Penalized Logit 22
3.2.2. Feature Map 24
3.2.3. PLAF Loss 25
3.2.4. PLAF Architecture 25
3.2.5. Pseudo-code 26
Chapter 4. Experiments 27
4.1. CIFAR-100 Image Dataset 27
4.2. Network Architecture Details 27
4.3. Experimental Details 28
4.4. Experimental Results 29
Chapter 5. Conclusion 32
Summary 33
References 34
Appendices 37
Appendix A. Appendix 37
Appendix B. Abbreviations 38
List of Tables
Table 4.1. Ablation study on ResNet110 and ResNet20. 29
Table 4.2. Ablation study on ResNet56 and ResNet20. 29
Table 4.3. Ablation study on ResNet110 and ResNet20 with different λ. 29
Table 4.4. Accuracies (%) using the penalized logit (PL) alone 30
Table 4.5. Accuracies (%) for the same network architecture using the PLAF loss 30
Table 4.6. Accuracies (%) for different network architectures using the PLAF loss 31
Table A.1. List of balancing factors b 37
List of Figures
Figure 3.1. Logit-based KD structure. 20
Figure 3.2. Feature-based KD structure. 21
Figure 3.3. PLAF KD structure. 25
List of Algorithms
Algorithm 1. PLAF Loss 26