The aim of this work is to provide robust, low-complexity demixing of sound sources
from a set of microphone signals for a typical meeting scenario where the source
mixture is relatively sparse in time. We define a similarity matrix that
characterizes the similarity of the spatial signature of the observations at
different time instants within a frequency band. Each entry of the similarity
matrix is the sum of a set of kernelized similarity measures, each operating on
a single frequency bin. The kernelization makes the similarity robust because it
reduces the influence of outliers. Clustering by means of affinity propagation
separates the talkers without requiring the number of talkers to be specified in advance. The
clusters can be used directly for separation, or they can be used as a global
pre-processing method that identifies sources for an adaptive demixing procedure.
Our experimental results confirm that the approach performs significantly
better than two reference methods.
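The two core steps can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation, and it rests on assumptions the abstract does not state. The spatial signature at frame t and bin k is taken to be the unit-normalized microphone vector. The per-bin similarity is the squared magnitude of the inner product of two signatures, which ignores source phase and level. The kernel is a Gaussian on one minus that similarity, with a hypothetical width `sigma`. The clustering step is a plain implementation of the standard affinity propagation message passing (responsibilities and availabilities), using the median similarity as a default preference. The function names `band_similarity` and `affinity_propagation` and the toy mixture are hypothetical.

```python
import numpy as np


def band_similarity(X, sigma=0.5):
    """Similarity matrix for one frequency band (illustrative sketch).

    X: complex STFT observations, shape (T, K, M) = (frames, bins in band, mics).
    Returns a real (T, T) matrix whose (i, j) entry sums a kernelized per-bin
    similarity between frames i and j over the K bins.
    Assumption: the Gaussian kernel and its width `sigma` are hypothetical choices.
    """
    # Spatial signature: unit-norm microphone vector, so the source level drops out.
    norms = np.linalg.norm(X, axis=2, keepdims=True)
    U = X / np.maximum(norms, 1e-12)
    T, K, _ = U.shape
    S = np.zeros((T, T))
    for k in range(K):
        Uk = U[:, k, :]                       # (T, M) signatures in bin k
        G = np.abs(Uk.conj() @ Uk.T) ** 2     # |u_i^H u_j|^2 in [0, 1], phase-invariant
        # Each bin contributes a value in (0, 1], so no single outlier bin dominates.
        S += np.exp(-(1.0 - G) / sigma**2)
    return S


def affinity_propagation(S, preference=None, damping=0.9, iters=500):
    """Standard affinity propagation on a similarity matrix S (larger = more similar).

    Returns (exemplar indices, cluster label per row). The number of clusters is
    not given; it follows from the preference placed on the diagonal of S.
    """
    S = np.array(S, dtype=float)              # work on a copy
    n = S.shape[0]
    if preference is None:
        preference = np.median(S)             # common default preference
    np.fill_diagonal(S, preference)
    R = np.zeros((n, n))                      # responsibilities r(i, k)
    A = np.zeros((n, n))                      # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))  for i != k
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        col = Rp.sum(axis=0)
        A_new = np.minimum(0, col[None, :] - Rp)
        np.fill_diagonal(A_new, col - np.diag(R))
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = np.argmax(S[:, exemplars], axis=1)    # assign each frame to its nearest exemplar
    labels[exemplars] = np.arange(exemplars.size)  # exemplars label themselves
    return exemplars, labels


if __name__ == "__main__":
    # Hypothetical toy mixture: two talkers with fixed random steering vectors,
    # each active alone for half of the frames (time-sparse), plus weak noise.
    rng = np.random.default_rng(0)
    T, K, M = 40, 16, 3
    steer = rng.standard_normal((2, K, M)) + 1j * rng.standard_normal((2, K, M))
    active = np.repeat([0, 1], T // 2)
    s = rng.standard_normal((T, K)) + 1j * rng.standard_normal((T, K))
    X = steer[active] * s[:, :, None]
    X += 0.05 * (rng.standard_normal(X.shape) + 1j * rng.standard_normal(X.shape))
    exemplars, labels = affinity_propagation(band_similarity(X))
    print(len(exemplars), labels)
```

The talker count is never passed in. It emerges from the preference: the median is a common default, and a smaller preference tends to yield fewer clusters. Because the kernel saturates, a bin in which two signatures disagree completely adds a value near zero rather than pulling the sum far down. This is one plausible reading of the robustness argument above.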