Title Page
Abstract
Table of Contents
1. Introduction 12
1.1. Research Background 12
1.2. Related Work 15
1.3. Research Objectives 17
2. The A2C Method 19
2.1. Reinforcement Learning 19
2.1.1. Concepts and Goals of Reinforcement Learning 19
2.1.2. Markov Decision Process 21
2.1.3. Reinforcement Learning Algorithms 27
2.2. Actor-Critic Methods 30
2.2.1. Deep Reinforcement Learning 30
2.2.2. Q Actor-Critic Method 30
2.2.3. A2C (Advantage Actor-Critic) Method 32
3. Design of a Reinforcement Learning Environment for an Autonomous Unmanned Surface Vessel 34
3.1. Agent Vessel 34
3.2. Agent Actions 36
3.3. States of the Reinforcement Learning Environment 38
3.4. Rewards of the Reinforcement Learning Environment 40
4. Reinforcement Learning for Autonomous Navigation Using the A2C Method 43
4.1. A2C Neural Network Architecture and Environment Implementation 43
4.2. Scenario Selection 46
4.3. Reinforcement Learning Results 48
5. Conclusion 61
6. References 63
List of Tables
Table 1. Statistics on Korean Marine Safety Accidents over the Last Four Years 13
Table 2. Discrete Action Space of the Agent 37
Table 3. Hyperparameters of the A2C Neural Network 44
List of Figures
Figure 1. Different Types of Autonomous Ships 13
Figure 2. Overview of Reinforcement Learning: Agent and Environment 20
Figure 3. Q Actor-Critic Network Pseudocode 31
Figure 4. Advantage Actor-Critic Network Pseudocode 33
Figure 5. Ship-Fixed Coordinate System and Ship Information 35
Figure 6. Actions of the Agent with the Thrust Control System 36
Figure 7. States of the Reinforcement Learning Environment 39
Figure 8. Reward Flow Chart for Collision Avoidance of the Target Ship 40
Figure 9. Reward Flow Chart for Collision Avoidance of a Static Obstacle 41
Figure 10. Composition of the A2C Neural Network 43
Figure 11. Composition of the Reinforcement Learning Environment 45
Figure 12. Scenario Cases of Target Ship Encounters 46
Figure 13. Scenario Cases of Target Ship Encounters with a Static Obstacle 47
Figure 14. Reinforcement Learning Result of Case 1 49
Figure 15. Reinforcement Learning Result of Case 2 51
Figure 16. Reinforcement Learning Result of Case 3 53
Figure 17. Reinforcement Learning Result of Case 4 55
Figure 18. Reinforcement Learning Result of Case 5 57
Figure 19. Reinforcement Learning Result of Case 6 59