Blind Source Separation based on Multiple Decorrelations (*)
Lucas Parra, Clay Spence
Abstract
Acoustic signals recorded simultaneously in a reverberant environment can
be described as sums of differently convolved sources. The task of source
separation is to identify the multiple channels and possibly to invert
them in order to obtain estimates of the underlying sources. We tackle
the problem by explicitly exploiting the non-stationarity of the acoustic
sources. Changing cross-correlations at multiple times give a sufficient
set of constraints for the unknown channels. A least-squares optimization
allows us to estimate a forward model, thus identifying the multipath channel.
In the same manner we can find an FIR backward model, which generates
well-separated model sources.
The applications of this technique reach from advanced digital
hearing aids to improved front ends for speech recognition engines.
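The idea in the abstract can be illustrated with a minimal sketch. For readability this toy uses an instantaneous (non-convolutive) two-channel mixture; the paper applies the same least-squares multiple-decorrelation idea per frequency bin to handle convolution. The mixing matrix, block sizes, and learning rate below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two non-stationary sources: white noise whose power changes from block
# to block. This non-stationarity is what makes the channel identifiable
# from second-order statistics alone.
n_blocks, block_len = 8, 2000
s = np.vstack([
    np.concatenate([g * rng.standard_normal(block_len)
                    for g in rng.uniform(0.2, 2.0, n_blocks)])
    for _ in range(2)
])

A = np.array([[1.0, 0.6],            # hypothetical unknown mixing matrix
              [0.5, 1.0]])           # (instantaneous stand-in for the
x = A @ s                            # convolutive multipath channel)

# Cross-correlations of the mixtures at multiple times: one covariance
# matrix per block, trace-normalized for conditioning.
Rs = []
for k in range(n_blocks):
    R = np.cov(x[:, k * block_len:(k + 1) * block_len])
    Rs.append(R / np.trace(R))

def offdiag_cost(W):
    """Total off-diagonal energy of W R_k W^T over all blocks."""
    return sum(np.sum((C - np.diag(np.diag(C))) ** 2)
               for C in (W @ R @ W.T for R in Rs))

# Least-squares estimate of the backward (unmixing) model W: force the
# model sources to be decorrelated in every block simultaneously.
W = np.eye(2)
lr = 0.05
for _ in range(5000):
    grad = np.zeros((2, 2))
    for R in Rs:
        C = W @ R @ W.T
        E = C - np.diag(np.diag(C))          # off-diagonal residual
        grad += 4.0 * E @ W @ R              # d/dW of ||E||_F^2
    W -= lr * grad / len(Rs)
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # fix scale ambiguity

# If separation succeeded, W @ A is close to a scaled permutation matrix.
P = W @ A
```

Decorrelating at a single time would leave a rotational ambiguity; it is the joint decorrelation across several blocks with different source powers that pins down the unmixing matrix up to scaling and permutation.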
Results
Two speakers were recorded in a low-noise environment (12 dB): 15 seconds
of alternating speech followed by 15 seconds of continuous speech. The
cross-talk, measured as the Signal-to-Signal Ratio (SSR) over the first
15 seconds, improves from 0 dB before separation (channel 1, channel 2)
to 14 dB after separation (channel 1, channel 2).
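The SSR numbers above can be sketched as follows. This assumes (my reading, not stated explicitly on this page) that the alternating-speech segments are used to compare a channel's power while the target speaker talks alone against its power while only the interferer talks. The signals below are synthetic stand-ins, not the recordings on this page.

```python
import numpy as np

def ssr_db(channel, target_active, interferer_active):
    """Signal-to-Signal Ratio of one output channel in dB: power during
    target-only segments over power during interferer-only segments."""
    p_sig = np.mean(channel[target_active] ** 2)
    p_int = np.mean(channel[interferer_active] ** 2)
    return 10.0 * np.log10(p_sig / p_int)

# Toy alternating-speech scenario: speaker A talks in even seconds,
# speaker B in odd seconds (hypothetical 8 kHz sampling rate).
fs = 8000
t = np.arange(5 * fs)
mask_a = (t // fs) % 2 == 0
mask_b = ~mask_a
rng = np.random.default_rng(1)
a = rng.standard_normal(t.size) * mask_a
b = rng.standard_normal(t.size) * mask_b

mixed = a + b            # equal-power cross-talk: SSR ~ 0 dB
separated = a + 0.2 * b  # residual interference: SSR ~ 14 dB
```

With a residual interference amplitude of 0.2, the power ratio is 1/0.04, i.e. 10·log10(25) ≈ 14 dB, matching the kind of improvement reported above.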
To compare with the current state of the art, listen to the separation
obtained from two speakers recorded in a real room (channel 1, channel 2).
This is our result (channel 1, channel 2), and this is the result provided
by Te-Won Lee (channel 1, channel 2).
In the case of a main source embedded in a background of multiple
sources (channel 1, channel 2), the assumption of more microphones than
sources is violated. In that case the algorithm separates the speaker
from the background, providing a good background estimate
(background - channel 1, speaker - channel 2).
This is an example of a strongly reverberating environment (channel 1,
channel 2). The interfering source (a TV set) has little direct signal
to the two microphones and instead reflects off a wall of the room. This
result was obtained with a filter of size 512 (background - channel 2,
speaker - channel 1).
(*) US patent 6,167,417; IEEE Trans. on Speech
and Audio Processing pp. 320-327, May 2000.