Title Page
ABSTRACT
Contents
Chapter 1. Introduction 18
Chapter 2. Field Background 21
2.1. Deep Reinforcement Learning 22
2.2. Multiagent Systems 26
2.3. Multiagent Reinforcement Learning 29
2.4. Training Scheme 30
2.5. Related Work and Background 33
Chapter 3. Grey Wolf Optimization (GWO) 38
3.1. Encircling Prey 39
3.2. Hunting 40
3.3. Attacking Prey 41
Chapter 4. Q-value selection using optimization and DRL (QSOD) 44
4.1. Flowchart of the Proposed Method 55
Chapter 5. Performance Evaluation 59
5.1. StarCraft II 59
5.2. Experimental Results 60
5.3. Win Rate 61
5.4. Training Loss 64
5.5. Convergence 65
5.6. Training Time 66
Chapter 6. Conclusions and Future Work 68
References 69
요약 (Summary in Korean) 76
Table 1. Symbols used in Algorithm 1 and Equation (1) 25
Table 2. Symbols and Abbreviations used in Algorithm 2 and GWO 43
Table 3. Symbols used in Algorithm 3 and Equation (10) 49
Table 4. Agent details 61
Table 5. Final win-rate values 62
Table 6. Time required by QMIX, QVMix, and the Proposed QSOD for different scenarios 66
Figure 1. Reinforcement Learning 21
Figure 2. Deep Reinforcement Learning 23
Figure 3. Competitive MAS 28
Figure 4. Cooperative MAS 28
Figure 5. Centralized Training 31
Figure 6. Decentralized Training 31
Figure 7. Hybrid Training 32
Figure 8. Social Hierarchy of GWO 38
Figure 9. Updating the Positions of Agents in GWO 39
Figure 10. QSOD 46
Figure 11. StarCraft II Environment 59
Figure 12. Win rates during training for the proposed algorithm, QVMix, and QMIX across five different scenarios 63
Figure 13. Training loss for the proposed algorithm, QVMix, and QMIX across five different scenarios 64
Figure 14. Convergence of the win-rate rolling average in 3m 65
Algorithm 1. Deep Reinforcement Learning 24
Algorithm 2. Grey Wolf Optimizer (GWO) 42
Algorithm 3. Attention-based hybrid policy for deep MARL (QSOD) 51
Algorithm 3-A. Attention-based hybrid policy for deep MARL 54