Title Page
Abstract
Table of Contents
1. Introduction 12
1.1. Research Background 12
1.2. Related Work 15
1.3. Research Objectives 17
2. The A2C Method 19
2.1. Reinforcement Learning 19
2.1.1. Concepts and Goals of Reinforcement Learning 19
2.1.2. Markov Decision Process 21
2.1.3. Reinforcement Learning Algorithms 27
2.2. Actor-Critic Methods 30
2.2.1. Deep Reinforcement Learning 30
2.2.2. Q Actor-Critic Method 30
2.2.3. A2C (Advantage Actor-Critic) Method 32
3. Design of a Reinforcement Learning Environment for an Autonomous Unmanned Surface Vessel 34
3.1. Agent Vessel 34
3.2. Agent Actions 36
3.3. States of the Reinforcement Learning Environment 38
3.4. Rewards of the Reinforcement Learning Environment 40
4. Reinforcement Learning for Autonomous Navigation Using the A2C Method 43
4.1. A2C Neural Network Architecture and Environment Implementation 43
4.2. Scenario Selection 46
4.3. Reinforcement Learning Results 48
5. Conclusion 61
6. References 63
List of Tables
Table 1. Statistics on Korean Marine Safety Accidents over the Last Four Years 13
Table 2. Discrete Action Space of the Agent 37
Table 3. Hyperparameters of the A2C Neural Network 44
List of Figures
Figure 1. Different Types of Autonomous Ships 13
Figure 2. Overview of Reinforcement Learning: Agent and Environment 20
Figure 3. Q Actor-Critic Network Pseudocode 31
Figure 4. Advantage Actor-Critic Network Pseudocode 33
Figure 5. Ship-Fixed Coordinate System and Ship Information 35
Figure 6. Actions of the Agent with the Thrust Control System 36
Figure 7. States of the Reinforcement Learning Environment 39
Figure 8. Reward Flow Chart for Collision Avoidance of the Target Ship 40
Figure 9. Reward Flow Chart for Collision Avoidance of a Static Obstacle 41
Figure 10. Composition of the A2C Neural Network 43
Figure 11. Composition of the Reinforcement Learning Environment 45
Figure 12. Scenario Cases of Target Ship Encounters 46
Figure 13. Scenario Cases of Target Ship Encounters with a Static Obstacle 47
Figure 14. Reinforcement Learning Result of Case 1 49
Figure 15. Reinforcement Learning Result of Case 2 51
Figure 16. Reinforcement Learning Result of Case 3 53
Figure 17. Reinforcement Learning Result of Case 4 55
Figure 18. Reinforcement Learning Result of Case 5 57
Figure 19. Reinforcement Learning Result of Case 6 59