摘要

The full convolutional time-domain audio separation network (Conv-TasNet) is a state-of-the-art end-to-end speech separation model which was proposed recently. The Conv-TasNet used dilated convolution to expand the receptive field and fuse more speech features in space, which greatly improved the speech separation performance of the network, but at the same time ignored the importance of information across different convolution channels. An end-to-end speech enhancement method based on ultra-lightweight channel attention was proposed, which effectively combined Conv-TasNet and channel attention. At the same time, a group of filters was added to the Conv-TasNet codec to improve the speech feature extraction ability of the network. This method can make convolutional neural network combine spatial information and channel information more effectively to improve the speech enhancement effect. Experiment shows that the proposed model can effectively improve the performance of speech enhancement when the model capacity is only increased by about 0.02%.