Title page
Abstract
Table of contents
Chapter 1. INTRODUCTION 10
1.1. Background 10
1.2. Contribution 11
1.3. Overview 12
Chapter 2. ACOUSTIC ECHO CANCELLATION 14
2.1. Problem formulation 14
2.2. Previous studies on acoustic echo cancellation 15
2.3. Performance metrics for acoustic echo cancellation 16
2.3.1. PESQ 16
2.3.2. ERLE 17
2.3.3. AEC-MOS 17
Chapter 3. DEEP NEURAL NETWORK 19
3.1. Convolutional neural network 20
3.2. Self-attention mechanism 21
3.3. Gated convolution 23
Chapter 4. DNN-BASED ACOUSTIC ECHO CANCELLATION 24
4.1. U-net 24
4.1.1. Autoencoder 25
4.1.2. Skip-connection 25
4.1.3. U-net 26
4.2. Objective function 27
4.2.1. Mean square error 27
4.2.2. Scale-invariant signal-to-noise ratio 28
Chapter 5. Proposed model 30
5.1. Overall architecture 30
5.2. Encoder-decoder architecture 32
5.2.1. Complex convolution layer 33
5.3. Gated attention block 34
5.4. Interaction block 36
Chapter 6. Experiment 38
6.1. Dataset 38
6.2. Training details 39
6.3. Experiment results 39
6.3.1. Double-talk scenario 39
6.3.2. Single-talk scenario 42
6.4. Ablation study 45
Chapter 7. Conclusion and future study 47
Bibliography 49
Korean abstract 55
Table 5.1. Details of the encoder/decoder block ((frequency, time) order) 32
Table 6.1. Result of the double-talk scenario test 40
Table 6.2. Result of the double-talk scenario test 40
Table 6.3. Result of the double-talk scenario test with blind test set 42
Table 6.4. Result of the far-end single-talk scenario test with blind test set 43
Table 6.5. Result of the near-end single-talk scenario test with blind test set 45
Table 6.6. Result of the double-talk scenario test 46
Table 6.7. Result of the double-talk scenario test with blind test set 46
Table 6.8. Result of the far-end single-talk scenario test with blind test set 46
Figure 2.1. Visualization of the acoustic echo scenario. 15
Figure 3.1. Visualization of the convolution layer calculation. 20
Figure 3.2. Visualization of the attention mechanism. 22
Figure 4.1. Architecture of the U-net 27
Figure 4.2. Illustration of the definitions of SNR and SI-SNR 29
Figure 5.1. Overall architecture of the proposed model. 31
Figure 5.2. Architecture of the encoder-decoder block. 33
Figure 5.3. Visualization of complex convolution. 34
Figure 5.4. Architecture of the gated attention block. 35
Figure 5.5. Example of the attention mask. 36
Figure 5.6. Architecture of the interaction block. 37
Figure 6.1. Spectrogram of the enhanced sample. Noise was added on the far-end side, SER = 6 dB. 41
Figure 6.2. Spectrogram of the sample. There were movements in the echo path. 44