A new speech enhancement method based on Swin-UNet model

U-shaped Network (UNet) has shown excellent performance in a variety of speech enhancement tasks. However, owing to the intrinsic locality of the convolution operation, a traditional UNet built from convolutional neural network (CNN) blocks cannot learn global and long-range information well. In this work, we propose a new Swin-UNet-based speech enhancement method. Unlike the traditional UNet model, all CNN blocks are replaced with Swin-Transformer blocks to capture richer multi-scale contextual information. The Swin-UNet model employs a shifted-window mechanism that not only overcomes the high computational complexity of the standard Transformer but also strengthens global information interaction by exploiting the Transformer's global modeling capability. Through hierarchical Swin-Transformer blocks, global and local speech features can be fully leveraged to improve speech reconstruction. Experimental results confirm that the proposed method removes more background noise while maintaining good objective speech quality.
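To make the shifted-window idea concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of windowed self-attention over a 1-D sequence of frame features, as a Swin-Transformer block applies it: the sequence is split into fixed-size windows, attention is computed only within each window, and alternate blocks cyclically shift the sequence by half a window so that information flows across window boundaries. All function names here are hypothetical; a real model would add learned projections, multiple heads, relative position bias, and residual/MLP sublayers.

```python
import math

def window_partition(seq, window_size):
    """Split a sequence of feature vectors into non-overlapping windows."""
    return [seq[i:i + window_size] for i in range(0, len(seq), window_size)]

def cyclic_shift(seq, shift):
    """Roll the sequence so successive blocks see different window boundaries."""
    return seq[shift:] + seq[:shift]

def window_attention(window):
    """Plain (single-head, unprojected) scaled dot-product attention in one window."""
    out = []
    for q in window:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
                  for k in window]
        m = max(scores)                      # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the window's vectors, per feature dimension.
        out.append([sum(w * k[d] for w, k in zip(weights, window))
                    for d in range(len(q))])
    return out

def swin_block(seq, window_size, shifted):
    """One (shifted-)window attention pass; shifted blocks offset by half a window."""
    shift = window_size // 2 if shifted else 0
    x = cyclic_shift(seq, shift)
    windows = window_partition(x, window_size)
    x = [v for w in windows for v in window_attention(w)]
    return cyclic_shift(x, -shift)           # undo the shift
```

Because each token attends only to the `window_size` tokens in its window, the attention cost is linear in sequence length rather than quadratic, which is the complexity advantage the abstract refers to; the alternating shift is what restores cross-window (global) interaction over stacked blocks.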

Keywords: 74.3; 74.8

Document Type: Research Article

Affiliations: 1: School of Information, Nanchang Hangkong University 2: College of Physics and Electronics, Shandong Normal University

Publication date: 01 July 2023
