Table of Contents

Title Page

ABSTRACT

Contents

1. INTRODUCTION 9

2. Prediction of Water Positions on Protein Structure using wKGB Statistical Potential 13

2.1. Methods for GalaxyWater-wKGB 13

2.1.1. Derivation of wKGB potential 13

2.1.2. Prediction of bound water positions with wKGB 19

2.2. Performance of GalaxyWater-wKGB 21

2.2.1. Characteristics of wKGB potential 21

2.2.2. Results of water site prediction 26

3. Prediction of Water Positions on Protein Structure using 3D-CNN 31

3.1. Methods for GalaxyWater-CNN 32

3.1.1. Overview of the overall method 32

3.1.2. The CNN architecture 33

3.1.3. Training of the neural network 36

3.1.4. Placement of water molecules from the water map 37

3.1.5. Methods for performance comparison 38

3.2. Performance of GalaxyWater-CNN 42

3.2.1. Results of network training 42

3.2.2. Results on the single-protein test set 43

3.2.3. Results on the protein-protein complex test set 47

3.2.4. Result on the protein-compound complex set 49

4. CONCLUSION 52

SUPPLEMENTARY INFORMATION 54

BIBLIOGRAPHY 64

Abstract in Korean 68

Figure 2.1. Definitions of hydrogen bond distance r and orientation θ. Hydrogen bond distance r is distance between water oxygen atom w whose coordinate vector... 15

Figure 2.2. Radial part of smoothed wKGB potential for (a) ALA main chain O, (b) ASN side chain OD1, (c) ASP side chain OD, (d) LYS side chain NZ, and (e) ALA... 22

Figure 2.3. Angular part of smoothed wKGB potential for (a) ALA main chain N, (b) ASP side chain OD, and (c) SER side chain OG. Potential is drawn for sp of... 24

Figure 2.4. Effect of considering solvation state and water occupation in vacant space on resulting wKGB potential, illustrated for ASN side chain ND2. (a) current... 25

Figure 2.5. Prediction results of water sites on crystal structure sets. X-axis is number of predicted water sites (Npred) divided by number of well-resolved... 28

Figure 2.6. Example case of water site prediction (PDB ID: 3QL9) when twice the number of crystallographic water molecules are predicted. Red spheres represent... 29

Figure 2.7. Coverage of crystallographic water positions and RMSD of predicted positions for crystal structure set is shown in (a) and (b). X-axis is number of... 30

Figure 3.1. GalaxyWater-CNN places water molecules around a given structure of a protein, a protein-protein complex, or a protein-compound complex by locating... 33

Figure 3.2. Network structure of GalaxyWater-CNN. a) The overall network structure, where n denotes the number of channels representing atom types and N is... 35

Figure 3.3. Water placement method on the water distribution map generated by CNN. 38

Figure 3.4. Evolution of the loss values for the single-protein training and validation sets with respect to the number of epochs. 42

Figure 3.5. Performance comparison of GalaxyWater-CNN, GalaxyWater-wKGB, 3D-RISM, and FoldX on the single-protein test set. a) Average coverage and b)... 45

Figure 3.6. An example case (PDB ID: 2FWH) in which GalaxyWater-CNN predicts water sites precisely, as in (a), (b), and (c), whereas the... 46

Figure 3.7. Performance comparison of GalaxyWater-CNN, GalaxyWater-wKGB, 3D-RISM, and FoldX on the protein-protein complex test set. a) Average coverage;... 47

Figure 3.8. Example case of predicting protein interface bridging water molecules (PDB ID: 2FHZ) at Npred/Ncryst=3. Gray contour in (a) and that magnified in (b)... 48

Figure 3.9. Performance comparison of GalaxyWater-CNN and 3D-RISM for predicting water molecules in the compound binding sites of proteins: a) Average... 50

Figure 3.10. Example case of predicting binding-site water molecules on a protein-compound complex (PDB ID: 2ihj) at Npred/Ncryst=3. Green and pink spheres... 51