Title Page
Contents
Abstract 8
Korean Abstract 9
1. INTRODUCTION 10
2. MOTIVATION 13
2.1. Measurement Setup 13
2.2. Characteristics of Each Layer Group 14
2.3. Impact of DNN Model Partitioning 14
2.4. Impact of Pipeline of Processing and Communication 16
3. SYSTEM MODEL 17
3.1. Service Model 17
3.2. Resource and Energy Model 19
3.3. E2E Latency Model 20
4. PROBLEM DEFINITION AND ALGORITHM 21
4.1. Problem Definition 21
4.2. Algorithm Derivation 23
4.3. Algorithm Description 24
4.4. Performance Analysis 28
5. SIMULATIONS 29
5.1. Simulation Setup 29
5.2. Simulation Results 30
6. EXPERIMENTS 32
6.1. Experiment Setup 32
6.2. Experiment Results 32
7. RELATED WORK 35
7.1. Binary Decision-based Mobile Edge Computing 35
7.2. DNN Model Partitioning 35
8. CONCLUSION 37
REFERENCES 38
9. APPENDIX 41
9.1. Proof of Lemma 1 41
9.2. Proof of Lemma 2 41
9.3. Proof of Theorem 1 42
List of Figures
Fig. 1. Processing and communication workloads for each layer group of the YOLOv4-tiny model. 13
Fig. 2. An illustration of the impact of pipelining in DNN model partitioning. 14
Fig. 3. QoE measurements for different partition points in a non-pipelined system. 15
Fig. 4. FPS measurements of non-pipelined and pipelined systems. 16
Fig. 5. Architecture of the mobile-edge-based DNN model partitioning system. 17
Fig. 6. QoE results of RT-DMP for different target E2E latencies. 29
Fig. 7. Ratio of selected control parameters of RT-DMP for a target E2E latency of 0.04 seconds. 30
Fig. 8. Processed FPS and energy consumption of RT-DMP for different weight parameters, compared with other algorithms. 31
Fig. 9. Processed FPS, energy consumption per frame, and E2E latency of RT-DMP, RT-MEC, Neurosurgeon, and DADS on Jetson TX2. 32
Fig. 10. Processed FPS, network conditions, partition point selection, GPU and network energy consumption per frame, and E2E latency of RT-DMP running on Jetson TX2. 33