
dc.contributor.author: Nogales Moyano, Alberto
dc.contributor.author: Caracuel-Cayuela, Javier
dc.contributor.author: García Tejedor, Álvaro José
dc.date.accessioned: 2024-04-11T11:23:37Z
dc.date.available: 2024-04-11T11:23:37Z
dc.date.issued: 2024
dc.identifier.issn: 2076-3417 [spa]
dc.identifier.uri: https://hdl.handle.net/10641/4297
dc.description.abstract: This paper presents an approach to enhancing the clarity and intelligibility of speech in digital communications compromised by various background noises. Utilizing deep learning techniques, specifically a Variational Autoencoder (VAE) with 2D convolutional filters, we aim to suppress background noise in audio signals. Our method focuses on four simulated environmental noise scenarios: storms, wind, traffic, and aircraft. The training dataset has been obtained from public sources (TED-LIUM 3 dataset, which includes audio recordings from the popular TED-TALK series) combined with these background noises. The audio signals were transformed into 2D power spectrograms, upon which our VAE model was trained to filter out the noise and reconstruct clean audio. Our results demonstrate that the model outperforms existing state-of-the-art solutions in noise suppression. Although differences in noise types were observed, it was challenging to definitively conclude which background noise most adversely affects speech quality. The results have been assessed with objective (mathematical metrics) and subjective (listening to a set of audios by humans) methods. Notably, wind noise showed the smallest deviation between the noisy and cleaned audio, perceived subjectively as the most improved scenario. Future work should involve refining the phase calculation of the cleaned audio and creating a more balanced dataset to minimize differences in audio quality across scenarios. Additionally, practical applications of the model in real-time streaming audio are envisaged. This research contributes significantly to the field of audio signal processing by offering a deep learning solution tailored to various noise conditions, enhancing digital communication quality. [spa]
dc.language.iso: eng [spa]
dc.publisher: Applied Sciences [spa]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/ [*]
dc.subject: Speech enhancement [spa]
dc.subject: Noise suppression [spa]
dc.subject: Deep learning [spa]
dc.subject: Variational autoencoders [spa]
dc.title: Analyzing the Influence of Diverse Background Noises on Voice Transmission: A Deep Learning Approach to Noise Suppression. [spa]
dc.type: journal article [spa]
dc.type.hasVersion: VoR [spa]
dc.rights.accessRights: open access [spa]
dc.description.extent: 1082 KB [spa]
dc.identifier.doi: 10.3390/app14020740 [spa]
dc.relation.publisherversion: https://www.mdpi.com/2076-3417/14/2/740 [spa]


Files in this item

Files                      Size      Format
applsci-14-00740-v2.pdf    1.056Mb   PDF

This item appears in the following collection(s)


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Spain