Speech Processing, Deep Learning, Digital Signal Processing, Human-Machine Interaction
Interaction between humans and machines is a holy grail of science and technology, and speech, one of the most natural communication media, has been used for this purpose for more than 40 years. In the last decade, the advent of modern Deep Learning techniques has brought a tremendous improvement in the reliability of Automatic Speech Recognition (ASR) for Human-Machine Interaction (HMI). However, many challenges still face the scientific community, mostly because such systems are required to work under harsh acoustic conditions, characterized by multiple overlapping speakers, different types of noise, reverberation and unknown microphone positions. A related task is the transmission of speech over communication networks, which may suffer from audio packet loss and therefore requires audio inpainting where packets are lost, as well as latency control. The present research focuses on developing innovative data-driven solutions for enhancing the quality of acquired speech signals for speech processing tasks such as ASR.
Digital Signal Processing and Computational Intelligence Lab
Fondazione Bruno Kessler, PerVoice SpA, INRIA, Carnegie Mellon University, Jabra