ViSQOL: an objective speech quality model
Venue
EURASIP Journal on Audio, Speech, and Music Processing, vol. 2015 (13) (2015), pp. 1-18
Publication Year
2015
Authors
Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte
Abstract
This paper presents an objective speech quality model, ViSQOL, the Virtual Speech
Quality Objective Listener. It is a signal-based, full-reference, intrusive metric
that models human speech quality perception using a spectro-temporal measure of
similarity between a reference and a test speech signal. The metric has been
specifically designed to be robust to quality issues associated with Voice over IP
(VoIP) transmission. This paper describes the algorithm and compares the quality
predictions with the ITU-T standard metrics PESQ and POLQA for common problems in
VoIP: clock drift, associated time warping, and playout delays. The results
indicate that ViSQOL and POLQA significantly outperform PESQ, with ViSQOL competing
well with POLQA. An extensive benchmarking against PESQ, POLQA, and simpler
distance metrics using three speech corpora (NOIZEUS, E4, and the ITU-T P.Sup. 23
database) is also presented. These experiments benchmark performance across a wide
range of quality impairments, including VoIP degradations, a variety of background
noise types, speech enhancement methods, and SNR levels. The results and subsequent
analysis show that both ViSQOL and POLQA have some performance weaknesses and
under-predict perceived quality in certain VoIP conditions. Both are more widely
applicable and more robust across conditions than PESQ or simpler distance
metrics. ViSQOL is shown to offer a useful alternative to POLQA in predicting
speech quality in VoIP scenarios.
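The abstract describes ViSQOL as a full-reference metric that scores a test signal by the spectro-temporal similarity between its spectrogram and that of a reference. ViSQOL builds on the neurogram similarity index measure (NSIM), an SSIM-derived measure that keeps an intensity term and a structure term. The sketch below illustrates that kind of comparison on two small spectrogram patches. It is not ViSQOL's implementation: the single whole-patch window, the SSIM-default constants, and the helper name `spectro_temporal_similarity` are all assumptions for illustration, and the real model also aligns patches and maps similarity to a quality score.

```python
import math
from typing import List

# A spectrogram patch: rows are frequency bands, columns are time frames.
Patch = List[List[float]]


def spectro_temporal_similarity(ref: Patch, deg: Patch, dynamic_range: float = 1.0) -> float:
    """Hypothetical SSIM-style (intensity x structure) similarity of two patches.

    Assumption: one global window over the whole patch and SSIM default
    constants; the published metric uses local windows and further processing.
    """
    if not ref or len(ref) != len(deg) or any(len(a) != len(b) for a, b in zip(ref, deg)):
        raise ValueError("patches must be non-empty and have the same shape")
    r = [v for row in ref for v in row]
    d = [v for row in deg for v in row]
    n = len(r)
    if n == 0:
        raise ValueError("patches must contain at least one value")

    mu_r, mu_d = sum(r) / n, sum(d) / n
    var_r = sum((x - mu_r) ** 2 for x in r) / n
    var_d = sum((y - mu_d) ** 2 for y in d) / n
    cov = sum((x - mu_r) * (y - mu_d) for x, y in zip(r, d)) / n

    # Stabilising constants (assumed SSIM defaults scaled by the dynamic range).
    c1 = (0.01 * dynamic_range) ** 2
    c3 = (0.03 * dynamic_range) ** 2 / 2

    # Intensity term: compares mean energy of the two patches.
    intensity = (2 * mu_r * mu_d + c1) / (mu_r ** 2 + mu_d ** 2 + c1)
    # Structure term: correlation of the spectro-temporal patterns.
    structure = (cov + c3) / (math.sqrt(var_r) * math.sqrt(var_d) + c3)
    return intensity * structure


if __name__ == "__main__":
    ref = [[0.2, 0.4, 0.6], [0.3, 0.5, 0.7]]
    deg = [[0.2, 0.1, 0.6], [0.3, 0.5, 0.0]]  # e.g. energy lost in some frames
    print(spectro_temporal_similarity(ref, ref))  # close to 1.0 for identical patches
    print(spectro_temporal_similarity(ref, deg))  # lower for a degraded patch
```

Identical patches score close to 1, while a patch whose time-frequency pattern has been disturbed scores lower even when its overall energy is similar. This sensitivity to structure, rather than to a simple sample-wise distance, is what distinguishes this family of measures from the simpler distance metrics mentioned in the abstract.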
