Title Page
Abstract
Contents
Chapter 1. Introduction 15
1.1. Problem Statement 15
1.2. Background, Motivation, and Necessities 17
1.3. Literature Review 20
1.3.1. Iterative Methods for Solving AREs 20
1.3.2. Model-Free Policy Iteration Methods 21
1.3.3. ADP Methods Without Initial Admissible Policies 22
1.3.4. The Koopman Operator for Control 22
1.3.5. Learning-Based Koopman Operator Applications 24
1.4. Objectives and Contributions 25
1.4.1. Objectives 25
1.4.2. Contributions 25
1.5. Dissertation Outline 29
Chapter 2. Theoretical Background 31
2.1. Notation 31
2.2. Mathematical Preliminaries 32
2.2.1. The Matrix Inertia Theorem 32
2.2.2. Fréchet Derivatives 32
2.2.3. The Koopman Operator 33
2.3. Linear System Theory 36
2.3.1. Controllability and Observability 36
2.3.2. Algebraic Riccati Equations 37
2.3.3. Lyapunov Equations 39
2.4. The Kleinman Iteration 41
2.5. Meta-Learning 44
2.5.1. Optimization Problem Formulations 44
2.5.2. Closed-Form Base Learners 45
Chapter 3. Data-Driven Optimal Control for Unknown Linear Systems 47
3.1. Implicit Value Functions 47
3.2. The Surrogate Q-Learning 52
3.2.1. Surrogate Q-Functions for Continuous-Time Systems 52
3.2.2. The Surrogate Q-Learning Algorithm 56
3.2.3. The Data-Driven Surrogate Q-Learning 60
3.3. The Extended Kleinman Iteration 63
3.3.1. Existence of Solutions 64
3.3.2. Selection of Design Parameters 66
3.4. Convergence Analysis 68
3.4.1. Monotonic Stabilization 68
3.4.2. Local Convergence 70
3.4.3. Global Convergence 73
3.5. Illustrative Numerical Examples 79
3.5.1. Validation of the Extended Kleinman Iteration 79
3.5.2. Validation of the Data-Driven Surrogate Q-Learning 80
Chapter 4. Application to Nonlinear Optimal Control Problems 87
4.1. Nonlinear Optimal Control Problems 88
4.2. Koopman Operators for Optimal Control Problems 90
4.2.1. Koopman Lifting Linearization 90
4.2.2. Equilibrium Points 92
4.2.3. Lifted Optimal Control Problems 93
4.3. The Meta-Learning Framework 99
4.3.1. Koopman Groups and Common Liftings 99
4.3.2. Diffeomorphic Lifting Approximation 100
4.3.3. Base Learner Formulation 103
4.3.4. Meta-Learner Formulation 105
4.3.5. Offline and Online Learning Synthesis 107
Chapter 5. Numerical Simulation 109
5.1. Koopman Group of Nonlinear Systems 109
5.2. The Meta-Learning Stage 112
5.2.1. Meta-Learning Setups 112
5.2.2. Meta-Learning Results 113
5.3. The Surrogate Q-Learning Stage 119
5.3.1. Surrogate Q-Learning Setups 119
5.3.2. Surrogate Q-Learning Results 120
Chapter 6. Conclusion 127
6.1. Concluding Remarks 127
6.2. Directions for Further Research 129
Bibliography 131
Appendix A. The Glow Implementation 145
A.1. Flows 145
A.1.1. Activation Layers 145
A.1.2. 1×1 Convolution Layers 147
A.1.3. Affine Coupling Layers 148
Abstract (in Korean) 149
Table 5.1. Meta-learning parameters. 114
Table 5.2. Mean-square linearization errors. 118
Figure 3.1. Convergence history of Pₖ and Kₖ to their optimal values. 81
Figure 3.2. Convergence history of the number of eigenvalues of Aₖ with positive real parts. 82
Figure 3.3. Convergence history of Pₖ and Kₖ to their optimal values. 85
Figure 3.4. Convergence history of the number of eigenvalues of Aₖ with positive real parts. 86
Figure 4.1. A diagram of a Koopman group. 101
Figure 4.2. The proposed meta-learning and reinforcement learning scheme. 108
Figure 5.1. The approximated common lifting. 115
Figure 5.2. The functions f₁(x) and f₂(x). 116
Figure 5.3. The functions G₁(x) and G₂(x). 116
Figure 5.4. The contour plots of ∇φ̂₁(x; w_φ)ᵀf(x) and ∇φ̂₂(x; w_φ)ᵀf(x). 117
Figure 5.5. The functions ∇φ̂₁(x; w_φ)ᵀG(x) and ∇φ̂₂(x; w_φ)ᵀG(x). 117
Figure 5.6. The performance output of the nonlinear system and the Koopman lifting linearization. 119
Figure 5.7. The learning history of the surrogate Q-learning for 20 different randomly sampled systems. The upper plot presents the median number of... 122
Figure 5.8. The feedback gain convergence histories of the surrogate Q-learning for 20 different randomly sampled systems. The median errors for each element... 123
Figure 5.9. The optimal control inputs (left), the learned control inputs (middle), and the errors between the two (right) for the random systems Sp₁ (top) to Sp₃... 124
Figure 5.10. The phase portrait of the analytic optimal control (left) and the controller trained using the surrogate Q-learning (right) for each system. 125
Figure A.1. The architecture of Glow. 146
Figure A.2. The architecture of a Flow. 146