Title Page
Contents
Abstract 11
Chapter 1. Introduction 13
1. Motivation 13
2. Contribution 20
3. Organization of the Dissertation 24
Chapter 2. Background 25
1. Electronic health record 25
2. Patient generated health data 27
3. Deep learning 29
Chapter 3. Influenza Screening via Deep Learning Using a Combination of Epidemiological and Patient-Generated Health Data: Development and Validation Study 32
1. Overview 32
2. Methods 33
2.1. Data collection 33
2.2. Data Preprocessing 35
2.3. Deep Learning Model and Training Hyperparameters 37
3. Results 38
4. Discussion 46
Chapter 4. Enhancing Machine Learning-based COVID-19 Screening Models with Epidemiological and Mobility Features: A Retrospective Model Development 49
1. Overview 49
2. Methods 50
2.1. Study design 50
2.2. Data source 51
2.3. Models and training 56
3. Results 58
3.1. SHINE dataset characteristics 58
4. Discussion 65
Chapter 5. Deep-learning-based personalized prediction of absolute neutrophil count recovery and comparison with clinicians for validation 69
1. Overview 69
2. Materials and Methods 70
2.1. Definition 70
2.2. Study design 70
2.3. Data collection 73
2.4. Data preprocessing 75
2.5. Model selection and description 75
2.6. Evaluation metric for model performance 76
2.7. Validation of the model by comparison with the clinicians' performance 77
2.8. Development and survey of the questionnaire 78
2.9. Statistical analysis 78
3. Results 79
3.1. Data statistics 79
3.2. Model performance 80
3.3. Validation of the model through comparison with clinicians' prediction result 82
3.4. Clinicians' change in prediction after looking at the model's prediction 84
3.5. Questionnaire survey result analysis 84
4. Discussion 87
Chapter 6. Conclusion and Future work 91
1. Overview 91
2. Summary and results 91
3. Future work 93
4. Concluding remark 96
References 97
Abstract (in Korean) 116
Table 1. General characteristics of the data set. 39
Table 2. The effects of the removal of each variable from the analysis. "-〈Variable〉" means that the variable was singularly removed from the list of variables for the... 42
Table 3. Effect of each variable on the analysis. The baseline included body temperature, antipyretic drug, and antibiotic drug data. "+〈variable〉" means that the variable was... 44
Table 4. Comparison of COVID-19 testing results between the Israeli and the SHINE datasets 59
Table 5. Comparison of Symptom Prevalence between the COVID-19 positive and negative groups. 60
Table 6. Model performance in predicting COVID-19 with the integration of secondary features. Each row represents the individual performance achieved by adding specific... 64
Table 7. Demographic information 79
Table 8. Prediction performance of our model and clinicians with inter-group comparison 81
Figure 1. Pipeline for data preprocessing. KCDC: Korea Center for Disease Control. 37
Figure 2. Receiver operating characteristic (ROC) curve illustrating the screening ability of the model. The red line shows a random guess, the blue line is the result of... 41
Figure 3. Screening performance versus the number of body temperature records. The y-axis shows the percentage of accuracy, and the x-axis refers to the number of body... 45
Figure 4. Symptom correlation in the SHINE dataset. The Spearman correlation coefficients are displayed in the grid. 61
Figure 5. Comparison of 7-day moving average values for confirmed cases and asymptomatic positive ratio in the SHINE dataset. 62
Figure 6. Feature importance by SHAP value in the Israeli dataset 63
Figure 7. Top ten important features, ranked by SHAP values, in the SHINE dataset. National confirmed cases, global confirmed cases and national new deaths are... 65
Figure 8. Study overview. (a) Data collected from the Samsung Medical Center and filtered according to the inclusion and exclusion criteria. (b) According to the patient's... 73
Figure 9. Inclusion and exclusion criteria (A) Training dataset (B) Test dataset 74
Figure 10. Effect of the proposed chemotherapeutic agent data handling method. 82
Figure 11. Comparison of the predicted values and answers for the model and human expert. Statistical comparison of the percentage of correct answers according to error... 83
Figure 12. The 5-point Likert scale responses of groups of specialists and residents for each factor in the questionnaire (1=not at all agree, 5=totally agree). The number... 87