Table of Contents

Title Page

Abstract

Table of Contents

Chapter 1. INTRODUCTION 10

1.1. Background 10

1.2. Contribution 11

1.3. Overview 12

Chapter 2. ACOUSTIC ECHO CANCELLATION 14

2.1. Problem formulation 14

2.2. Previous studies for acoustic echo cancellation 15

2.3. Performance metrics for acoustic echo cancellation 16

2.3.1. PESQ 16

2.3.2. ERLE 17

2.3.3. AEC-MOS 17

Chapter 3. DEEP NEURAL NETWORK 19

3.1. Convolutional neural network 20

3.2. Self-attention mechanism 21

3.3. Gated convolution 23

Chapter 4. DNN BASED ACOUSTIC ECHO CANCELLATION 24

4.1. U-net 24

4.1.1. Autoencoder 25

4.1.2. Skip-connection 25

4.1.3. U-net 26

4.2. Objective function 27

4.2.1. Mean square error 27

4.2.2. Scale-invariant signal-to-noise ratio 28

Chapter 5. PROPOSED MODEL 30

5.1. Overall architecture 30

5.2. Encoder-decoder architecture 32

5.2.1. Complex convolution layer 33

5.3. Gated attention block 34

5.4. Interaction block 36

Chapter 6. EXPERIMENT 38

6.1. Dataset 38

6.2. Training details 39

6.3. Experiment results 39

6.3.1. Double-talk scenario 39

6.3.2. Single-talk scenario 42

6.4. Ablation study 45

Chapter 7. CONCLUSION AND FUTURE STUDY 47

Bibliography 49

Abstract (in Korean) 55

Table 5.1. Details of the encoder/decoder block ((frequency, time) order) 32

Table 6.1. Result of the double-talk scenario test 40

Table 6.2. Result of the double-talk scenario test 40

Table 6.3. Result of the double-talk scenario test with blind test set 42

Table 6.4. Result of the far-end single talk scenario test with blind test set 43

Table 6.5. Result of the near-end single talk scenario test with blind test set 45

Table 6.6. Result of the double-talk scenario test 46

Table 6.7. Result of the double-talk scenario test with blind test set 46

Table 6.8. Result of the far-end single talk scenario test with blind test set 46

Figure 2.1. Visualization of the acoustic echo scenario. 15

Figure 3.1. Visualization of the convolution layer calculation. 20

Figure 3.2. Visualization of the attention mechanism. 22

Figure 4.1. Architecture of the U-net 27

Figure 4.2. Illustration of the definitions of SNR and SI-SNR 29

Figure 5.1. Overall architecture of the proposed model. 31

Figure 5.2. Architecture of the encoder-decoder block. 33

Figure 5.3. Visualization of complex convolution. 34

Figure 5.4. Architecture of the gated attention block. 35

Figure 5.5. Example of the attention mask. 36

Figure 5.6. Architecture of the interaction block. 37

Figure 6.1. Spectrogram of the enhanced sample. The noise was added on the far-end side, SER = 6 dB. 41

Figure 6.2. Spectrogram of the sample. There were movements in the echo path. 44