A Simple and Efficient Method to Handle Sparse Preference Data Using Domination Graphs: An Application to YouTube
Abstract
The phenomenal growth of the number of videos on YouTube provides enormous
potential for users to find content of interest to them. Unfortunately, as the size
of the repository grows, the task of discovering high-quality content becomes more
daunting. To address this, YouTube occasionally asks users for feedback on videos.
In one such event (the YouTube Comedy Slam), users were asked to rate which of two
videos was funnier. This yielded sparse pairwise data indicating a participant’s
relative assessment of two videos. Given this data, several questions immediately
arise: how do we make inferences for uncompared pairs, overcome noisy, and usually
contradictory, data, and how do we handle severely skewed, real-world, sampling? To
address these questions, we introduce the concept of a domination-graph, and
demonstrate a simple and scalable method, based on the Adsorption algorithm, to
efficiently propagate preferences through the graph. Before tackling the publicly
available YouTube data, we extensively test our approach on synthetic data by
attempting to recover an underlying, known, rank-order of videos using similarly
created sparse preference data.
