Title Page
ABSTRACT
Korean Abstract
PREFACE
Contents
CHAPTER 1. INTRODUCTION 16
1.1. Background and Purpose 16
1.2. Organization of This Paper 23
CHAPTER 2. Related work 24
2.1. Deep Learning with CNN 24
2.2. Lightweight models 27
2.3. Attention 28
2.4. Interpretation with Grad-CAM 29
2.5. Overview of lightweight CNN-GRU skip connection 30
CHAPTER 3. Materials and Methods 36
3.1. Data - clinical dataset 37
3.2. Pre-processing and Feature extraction (stage 1) 40
3.2.1. Segmentation of respiratory sounds 40
3.2.2. Bandpass filter 41
3.2.3. Log-Mel spectrogram MFCC 42
3.3. Attention networks 44
3.3.1. Squeeze-and-Excitation Networks (SENet) 44
3.3.2. Efficient Channel Attention (ECA-Net) 46
3.4. Proposed models (stage 2) 48
3.4.1. Modified VGGish 48
3.4.2. Light attention connected network 51
3.4.3. Grad-CAM (stage 3) 53
CHAPTER 4. Experiments 55
4.1. Hyperparameter settings 55
4.2. Model structure 56
4.3. Evaluations 57
CHAPTER 5. Results 58
5.1. Results of the baseline and proposed models 58
5.2. Confusion matrix of baseline and proposed models 60
5.3. Interpretation of lung disease with XAI (Grad-CAM) 62
5.4. Comparison of attention models 67
5.5. Comparison of pass filter and augmentation 70
5.5.1. Result of pass filter 70
5.5.2. Result of augmentations 72
5.6. Littmann stethoscope data utilization with data in brief 74
CHAPTER 6. Discussion 76
CHAPTER 7. Conclusion 82
REFERENCES 84
Table 1.1. Characteristics of respiratory sounds: Normal, Wheeze, and Crackle. 18
Table 2.1. Results for the lightweight model: precision, sensitivity, specificity, F1-score, Cohen's kappa, and Matthews correlation coefficient. 34
Table 3.1. Disease information in the clinical dataset. 37
Table 3.2. Information of respiratory sound and measurement conditions. 39
Table 3.3. Segmentation of the clinical dataset. 41
Table 3.4. Parameters of the log-Mel spectrogram MFCC. 43
Table 3.5. Pseudo code for light attention connected network. 51
Table 4.1. Hyperparameter settings. 55
Table 4.2. Details of the model structure. 56
Table 5.1. Comparison of the baseline and proposed models. 58
Table 5.2. Result of the baseline and proposed models for the confusion matrix score. 60
Table 5.3. Comparison of attention with No-attention, CBAM, SENet, and proposed model. 67
Table 5.4. Performance of pass filter: bandpass, lowpass, and highpass filter. 70
Table 5.5. Performance of augmentation: time stretch and pitch. 72
Table 5.6. Lung sound of public data. 74
Table 5.7. Comparison with other models using the public dataset. 75
Table 6.1. State-of-the-art models vis-a-vis the proposed model. 78
Figure 2.1. Patient measurement site and disease. 30
Figure 2.2. Preprocessing of respiratory sound: (a) normal sound, (b) low pass filter, (c) high pass filter, and (d) band pass filter. 31
Figure 2.3. MFCC with feature stacking. 32
Figure 2.4. Lightweight CNN-GRU skip connection. 35
Figure 3.1. Architecture of the proposed model. 36
Figure 3.2. Signal to log-Mel spectrogram MFCC. 40
Figure 3.3. Attention networks: (a) SENet and (b) ECA-Net. 45
Figure 3.4. (a) Standard convolution and (b) depthwise separable convolution, the kernel size Dk, the number of output channels N, and the number of input channels. 49
Figure 3.5. Models with (a) VGGish, (b) baseline, (c) proposed model, and (d) light attention connected network. 50
Figure 5.1. Confusion matrix with (a) baseline and (b) proposed model. 61
Figure 5.2. Respiration measurement location: Both left and right of (a) Anterior upper and (b) Posterior lower. 63
Figure 5.3. Interpretation of disease with Grad-CAM (XAI): Normal and crackle: pneumonia. 64
Figure 5.4. Interpretation of disease with Grad-CAM (XAI): Crackle: bronchiectasis and interstitial lung disease. 65
Figure 5.5. Interpretation of disease with Grad-CAM (XAI): Wheeze: COPD and asthma. 66
Figure 5.6. Comparison of attention models: baseline, No-attention, CBAM, SENet, proposed model, and [42]. 69
Figure 5.7. Normal sound with pass filters: (a) non-pass filter, (b) bandpass filter, (c) lowpass filter, and (d) highpass filter. 71
Figure 5.8. Data augmentation: (a) non-augmentation, (b) time stretch (0.7), (c) time stretch (1.3), and (d) pitch. 73