
DOL3: Distilled OpenL3 audio embeddings for lightweight audio classification
Deep audio representations, also known as embeddings, have recently become a popular alternative to conventional features such as spectrograms for a wide range of audio classification tasks because they are domain-agnostic and reduce training costs. Still, their use is often limited to systems with substantial computational resources, because the embeddings are extracted from large networks. This paper aims to minimize the computational cost of embedding extraction by distilling the knowledge of the OpenL3 audio network into a smaller student network. Results show that the student network performs comparably to the teacher network on various music and ambient noise classification tasks, while reducing the network size by over 90% and the computational load by a factor of five.
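The distillation idea described in the abstract can be illustrated with a toy sketch. The abstract does not state the loss function, the architectures, or the training setup, so everything below is an assumption made for illustration: a frozen random `teacher_embed` stands in for a large pretrained network such as OpenL3, the student is a plain linear map, and the student is trained with stochastic gradient descent to minimize the mean squared error between its output and the teacher's embedding. All names and dimensions are hypothetical.

```python
import math
import random

random.seed(0)

IN_DIM, EMB_DIM = 8, 4  # toy sizes (assumed), not the real OpenL3 dimensions

# Frozen "teacher": a hypothetical stand-in for a large pretrained embedding network.
TEACHER_W = [[random.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(EMB_DIM)]


def teacher_embed(x):
    """Return the teacher embedding of x (fixed weights, never trained here)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in TEACHER_W]


def student_embed(weights, x):
    """Return the student embedding: a cheaper, purely linear map."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(a, b):
    """Mean squared error between two embedding vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def mean_loss(weights, data):
    """Average distillation loss of the student over a dataset."""
    return sum(mse(student_embed(weights, x), teacher_embed(x)) for x in data) / len(data)


def distill(data, epochs=50, lr=0.05):
    """Train the student to mimic the teacher's embeddings with plain SGD."""
    weights = [[0.0] * IN_DIM for _ in range(EMB_DIM)]
    for _ in range(epochs):
        for x in data:
            target = teacher_embed(x)  # teacher output serves as the training label
            pred = student_embed(weights, x)
            for j in range(EMB_DIM):
                # Gradient of the MSE with respect to the j-th student output.
                grad_out = 2.0 * (pred[j] - target[j]) / EMB_DIM
                for i in range(IN_DIM):
                    weights[j][i] -= lr * grad_out * x[i]
    return weights


# Random vectors stand in for audio input features (assumed, for illustration only).
data = [[random.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(64)]
initial = mean_loss([[0.0] * IN_DIM for _ in range(EMB_DIM)], data)
student = distill(data)
final = mean_loss(student, data)
print(f"distillation loss before: {initial:.4f}, after: {final:.4f}")
```

In a realistic setting the student would be a compact neural network rather than a linear map, and the inputs would be audio frames or spectrogram patches; the training signal, matching the teacher's embedding instead of a class label, is the part this sketch is meant to show.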
Document Type: Research Article
Affiliations: 1: Technische Universität Ilmenau 2: Fraunhofer Institute for Digital Media Technology IDMT
Publication date: 04 October 2024
The Noise-Con conference proceedings are sponsored by INCE/USA and the Inter-Noise proceedings by I-INCE. NOVEM (Noise and Vibration Emerging Methods) conference proceedings are included. All NoiseCon Proceedings one year or older are free to download. InterNoise proceedings from outside the USA older than 10 years are free to download. Others are free to INCE/USA members and member societies of I-INCE.