Title Page
Abstract
Contents
Introduction 12
A. The assessment of bowel preparation using the CNN model 14
1. Background and Objective 14
2. Materials and Methods 15
2.1. Data 15
2.2. Training method 17
3. Experiments and Results 18
3.1. Image set validation 19
3.2. Video set validation 23
4. Discussion 26
5. Summary 27
B. Endoscopic diagnosis of eosinophilic esophagitis using CNN 27
1. Background and Objective 27
2. Materials and Methods 28
2.1. Dataset 28
2.2. Deep learning models 28
2.3. Training strategy and Evaluation 31
3. Experiments and Results 32
3.1. Baseline characteristics of study patients 32
3.2. Comparison of diagnostic ability between networks 33
3.3. Comparison of performance between endoscopists and sca4 U-Net 37
4. Discussion 38
5. Summary 41
C. The domain adaptation of virtual endoscopy images for monocular depth estimation from endoscopic images using CNNs 41
1. Background and Objective 41
2. Materials and Methods 42
2.1. Dataset 42
2.2. VE to TE 43
2.3. TE to Depth 45
3. Experiments and Results 45
3.1. Image-to-image translation result 45
3.2. Depth estimation result from TE 47
3.3. Depth estimation result from RE 48
4. Discussion 50
5. Summary 52
Conclusion 52
References 53
Abstract (in Korean) 58
List of Tables
Table 1. The number of originally collected endoscopic images for BBPS training. 16
Table 2. The number of endoscopic images after hard augmentation. 17
Table 3. The hyperparameter details of ResNet50 and image preprocessing. 19
Table 4. Quantitative results on the test set. Note that none of the BBPS classes was excluded from this evaluation. 20
Table 5. Quantitative binary classification results on the test video set. Note that none of the BBPS classes was excluded from this evaluation. 24
Table 6. Hyperparameter settings for experimented networks. 31
Table 7. Baseline characteristics and endoscopic findings of the study patients. 32
Table 8. Accuracy results of each network. 33
Table 9. Area Under the Curve (AUC) values of each network. 34
Table 10. Comparison of diagnostic performance between the 3 groups and sca4 U-Net. 37
Table 11. Quantitative results of image-to-image translation. As shown in the visualization, the SSIM score of DCLGAN was higher than that of CUT, but CUT exceeded DCLGAN on the other metrics. 45
Table 12. Quantitative results of the predicted depth maps. The depth maps predicted from TE images by the CUT model showed better results in MSE and PSNR. But as seen in the visual results, the TE... 48
List of Figures
Figure 1. The test set confusion matrix from ResNet50 trained with the parameters of Table 3. 20
Figure 2. Confusion matrix and ROC curve of the test set when composed as a binary class. Note that none of the BBPS classes was excluded from this evaluation. 21
Figure 3. Grad-CAM results of BBPS classification. The top 2 rows show the Grad-CAM results for misclassified cases and the bottom 2 rows show the Grad-CAM results for correctly classified cases. 21
Figure 4. The t-SNE clustering result of the test image sets. The color of each point denotes the label of the corresponding image, and the location of each point is determined by the t-SNE algorithm. From this, the... 22
Figure 5. Confusion matrix of the test video set. 113 videos were acquired for this evaluation. Note that none of the BBPS classes was excluded from this evaluation. 23
Figure 6. Confusion matrix and ROC curve of the test video set when classified as a binary class. 113 videos were acquired for this evaluation. Note that none of the BBPS classes was excluded from this evaluation. 24
Figure 7. t-SNE clustering result of feature vectors from two video clips produced by the trained ResNet50. One video was sampled from the videos with a clean label and the other was sampled from the videos that... 25
Figure 8. The elapsed times for the trained model to predict the BBPS score on 10-second-long videos were recorded and rounded to integers to draw this histogram. Finally, the average time to... 26
Figure 9. ROC curve comparisons of each class from the models are listed below. The blue curve shows VGG 19's ROC curve, the yellow curve denotes ResNet 50, and the green denotes the... 35
Figure 10. Gradient-weighted Class Activation Map (Grad-CAM) results of EoE-positive/negative patients from each model. 37
Figure 11. The comparison of classification ability between physicians and sca4 U-Net. We sampled 70 images from the test set (eosinophilic esophagitis: 30, normal controls: 40) and asked the physicians to... 38
Figure 12. The comparison of feature maps from a vanilla U-Net trained for image reconstruction. (a) shows the feature maps from the 3rd skip connection of the U-Net and (b) shows the feature maps from the... 40
Figure 13. The translated TE images from VE images. The left image shows the result from CUT and the right image shows the result from DCLGAN. 46
Figure 14. The predicted depth maps from TE images translated by CUT and DCLGAN, respectively. 48
Figure 15. The predicted depth maps from RE images and the 3D reconstruction results of those depth maps. 50