Table of Contents

Title Page

Abstract

Contents

Chapter 1. Introduction 12

Chapter 2. Motivation 16

Chapter 3. Performance Per Temperature (PPT) 21

Chapter 4. SPLASH 25

4.1. System Overview 25

4.2. Considerations for Practicality 27

4.3. Lightweight Thermal Model 28

4.4. Adaptive Latency Model 31

4.5. Thermal-aware Scheduling and Policies 33

4.6. Summary 36

Chapter 5. Implementation 37

Chapter 6. Evaluation 39

6.1. Evaluation Setup 39

6.2. Performance for Long-term Workload 42

6.3. Case study: Band+Cloud vs Splash 44

6.4. Thermal Model Accuracy 46

6.5. Scheduling Overhead 47

Chapter 7. Related Work 49

Chapter 8. Conclusion 51

Bibliography 52

Abstract (in Korean) 62

Table 2.1. Specification of the mobile and cloud platform. 17

Table 2.2. Thermal efficiency among processors and cloud offloading varies in the same model, RetinaFace. The measured network bandwidth on cloud... 19

Table 4.1. Features for the estimation of power consumption for the thermal model. 30

Table 6.1. Two continuous vision AI application workloads. 40

Figure 1.1. An illustration of continuous vision AI application. It executes both a text detection model and an object detection model concurrently... 13

Figure 2.1. Performance degradation from thermal throttling. The NPU shut-down and latency increase of processors deteriorate the user experience. 18

Figure 2.2. The timeline of latency changes during the execution of the person finder workload on the existing DNN inference system. The Mobile-Cloud... 19

Figure 3.1. The processors' performance per temperature across workloads. MobileNetV2, EAST, EfficientDet-Lite, and DeepLabV3 are used... 22

Figure 3.2. The performance per temperature with the variation of network bandwidth. The measured bandwidth of Strong, Regular, and Weak is 249... 23

Figure 4.1. System architecture of SPLASH. 26

Figure 6.1. The evaluation setup for Splash with two mobile devices, one cloud server with a network router, and the thermal incubator. 41

Figure 6.2. Experiment results over camera frame rate. Splash outperforms the baselines on both Google Pixel 4 and Samsung Galaxy S20 with two... 42

Figure 6.3. Experiment results of the person finder workload with 30 FPS. Splash serves up to 2.20x longer to severe throttling status, resulting in a... 43

Figure 6.4. The difference of scheduling between Band+Cloud and our system within one frame on the person finder workload with 30 FPS. Splash assigns... 45

Figure 6.5. The timeline of latency changes over time of Band+Cloud and Splash in the person finder workload with FPS 30 on Google Pixel 4.... 46

Figure 6.6. The performance of the thermal models of Splash. The predicted values follow the same trend as the measured values. 47

Algorithm 1. Splash scheduler 33

Algorithm 2. Minimum Heat within SLO policy 34

Algorithm 3. Max Weighted PPT policy 35