Title Page
Abstract
Abstract (in Korean)
Contents
Chapter 1. Introduction 15
1.1. Motivation 15
1.2. Contribution 16
1.3. Organization of the Dissertation 16
Chapter 2. Related Work 17
Chapter 3. Method 19
3.1. Conventional Knowledge Distillation Methods 19
3.1.1. Logit-Based 20
3.1.2. Feature-Based 21
3.2. Penalized Logit and Feature Distillation (PLAF) 22
3.2.1. Penalized Logit 22
3.2.2. Feature Map 24
3.2.3. PLAF Loss 25
3.2.4. PLAF Architecture 25
3.2.5. Pseudo-code 26
Chapter 4. Experiments 27
4.1. CIFAR-100 Image Dataset 27
4.2. Network Architecture Details 27
4.3. Experimental Details 28
4.4. Experimental Results 29
Chapter 5. Conclusion 32
Summary 33
References 34
Appendices 37
Appendix A. Appendix 37
Appendix B. Abbreviations 38
List of Tables
Table 4.1. Ablation study on ResNet110 and ResNet20. 29
Table 4.2. Ablation study on ResNet56 and ResNet20. 29
Table 4.3. Ablation study on ResNet110 and ResNet20 with different λ. 29
Table 4.4. Accuracies (%) using the penalized logit (PL) alone 30
Table 4.5. Accuracies (%) for the same network architecture using the PLAF loss 30
Table 4.6. Accuracies (%) for different network architectures using the PLAF loss 31
Table A.1. List of balancing factors b 37
List of Figures
Figure 3.1. Logit-based KD structure. 20
Figure 3.2. Feature-based KD structure. 21
Figure 3.3. PLAF KD structure. 25
List of Algorithms
Algorithm 1. PLAF Loss 26