Title Page
ABSTRACT
Contents
Chapter 1. Introduction 18
Chapter 2. Field Background 21
2.1. Deep Reinforcement Learning 22
2.2. Multiagent Systems 26
2.3. Multiagent Reinforcement Learning 29
2.4. Training Scheme 30
2.5. Related Work and Background 33
Chapter 3. Grey Wolf Optimization (GWO) 38
3.1. Encircling Prey 39
3.2. Hunting 40
3.3. Attacking Prey 41
Chapter 4. Q-value selection using optimization and DRL (QSOD) 44
4.1. Flowchart of the Proposed Method 55
Chapter 5. Performance Evaluation 59
5.1. StarCraft II 59
5.2. Experimental Results 60
5.3. Win Rate 61
5.4. Training Loss 64
5.5. Convergence 65
5.6. Training Time 66
Chapter 6. Conclusions and Future Work 68
References 69
요약 (Summary in Korean) 76
Table 1. Symbols used in Algorithm 1 and Equation (1) 25
Table 2. Symbols and Abbreviations used in Algorithm 2 and GWO 43
Table 3. Symbols used in Algorithm 3 and Equation (10) 49
Table 4. Agent details 61
Table 5. Final win-rate values 62
Table 6. Time required by QMIX, QVMix, and the Proposed QSOD for different scenarios 66
Figure 1. Reinforcement Learning 21
Figure 2. Deep Reinforcement Learning 23
Figure 3. Competitive MAS 28
Figure 4. Cooperative MAS 28
Figure 5. Centralized Training 31
Figure 6. Decentralized Training 31
Figure 7. Hybrid Training 32
Figure 8. Social Hierarchy of GWO 38
Figure 9. Updating the Positions of Agents in GWO 39
Figure 10. QSOD 46
Figure 11. StarCraft II Environment 59
Figure 12. Win rates during training for the proposed algorithm, QVMix, and QMIX across five different scenarios 63
Figure 13. Training loss for the proposed algorithm, QVMix, and QMIX across five different scenarios 64
Figure 14. Convergence of the win-rate rolling average in 3m 65
Algorithm 1. Deep Reinforcement Learning 24
Algorithm 2. Grey Wolf Optimizer (GWO) 42
Algorithm 3. Attention-based hybrid policy for deep MARL (QSOD) 51
Algorithm 3-A. Attention-based hybrid policy for deep MARL 54