Analysis of representation and generalization capabilities of pre-trained audio models in urban environments

Over the last decade, urban noise pollution has become a significant environmental concern. Audio detection algorithms can help mitigate it by classifying the different sources of noise and supporting more informative noise maps. In this context, machine learning, and transfer learning in particular, is an essential technology for accurate analysis of urban noise sources. However, the choice of the pre-trained model used to compute audio embeddings can significantly influence the performance of downstream classification tasks. This paper compares the embeddings of various pre-trained models across different data collection campaigns in the context of the Sons al balcó project and quantifies the robustness of the resulting audio representations. To this end, we develop metrics and statistically test for distribution shifts in the learned latent features. To evaluate the quality of the embeddings, we perform both qualitative and quantitative analyses using dimensionality reduction methods, and we assess performance on downstream tasks using data from different collection campaigns. The results highlight major differences between general-purpose and specific models. Our findings suggest that the pre-trained model should be chosen carefully in audio event detection applications.
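The abstract does not say which statistical test is used to detect distribution shifts between campaigns. As an illustration only, the sketch below applies one common choice, a permutation test on the distance between mean embeddings. It is an assumption, not the authors' method. The function names, the synthetic Gaussian vectors standing in for real audio embeddings, and the parameters (`n_perm`, embedding dimension, sample sizes) are all hypothetical.

```python
import math
import random


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(rows[0])
    return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]


def mean_gap(a, b):
    """Euclidean distance between the mean embeddings of two sets."""
    ma, mb = mean_vector(a), mean_vector(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ma, mb)))


def permutation_shift_test(a, b, n_perm=500, seed=0):
    """Permutation test: is the observed mean gap larger than expected
    if both sets were drawn from the same distribution?"""
    rng = random.Random(seed)
    observed = mean_gap(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of campaign labels
        if mean_gap(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


def synthetic_embeddings(n, dim, offset, rng):
    """Hypothetical stand-in for embeddings from a pre-trained model."""
    return [[rng.gauss(offset, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(42)
campaign_a = synthetic_embeddings(60, 8, 0.0, rng)
campaign_b = synthetic_embeddings(60, 8, 1.0, rng)  # shifted on purpose
campaign_c = synthetic_embeddings(60, 8, 0.0, rng)  # same distribution as A

gap_ab, p_ab = permutation_shift_test(campaign_a, campaign_b)
gap_ac, p_ac = permutation_shift_test(campaign_a, campaign_c)
print(f"A vs B: gap={gap_ab:.3f}, p={p_ab:.4f}")
print(f"A vs C: gap={gap_ac:.3f}, p={p_ac:.4f}")
```

A small p-value for a pair of campaigns would indicate that their embeddings are unlikely to share the same mean, which is one simple symptom of distribution shift. A test on means alone misses shifts in variance or shape, so a kernel-based or classifier-based two-sample test may be more appropriate in practice.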

Document Type: Research Article

Affiliations: 1: University of Pisa, Computer Science Department; 2: HER - Human Environment Research, La Salle Campus Barcelona, Ramon Llull University

Publication date: 04 October 2024

More about this publication?
  • The Noise-Con conference proceedings are sponsored by INCE/USA and the Inter-Noise proceedings by I-INCE. NOVEM (Noise and Vibration Emerging Methods) conference proceedings are included. All NoiseCon Proceedings one year or older are free to download. InterNoise proceedings from outside the USA older than 10 years are free to download. Others are free to INCE/USA members and member societies of I-INCE.
