Jump to Content
Amr Ahmed

Amr Ahmed

Amr Ahmed is a Senior Staff Research Scientist at Google. He received his M.Sc and PhD degrees from the School of Computer Science, Carnegie Mellon University in 2009 and 2011, respectively. He received the best paper award at KDD 2014 , the best Paper Award at WSDM 2014, the 2012 ACM SIGKDD Doctoral Dissertation Award, and a best paper award (runner-up) at WSDM 2012. He co-chaired the WWW'18 track on Web Content Analysis and served as an Area Chair for IJCAI 2019, SIGIR 2019, SIGIR 2018, ICML 2018, ICML 2017, KDD 2016, WSDM 2015, ICML 2014, and ICDM 2014. His research interests include large-scale machine learning, data/web mining, user modeling, personalization, social networks and content analysis.
Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    DAG-structured Clustering by Nearest-Neighbors
    Nicholas Monath
    Avinava Dubey
    Andrew McCallum
    International Conference on Artificial Intelligence and Statistics (2021)
    Preview abstract Hierarchical clusterings compactly encode multiple granularities of clusters within a tree structure. Hierarchies, by definition, fail to capture different flat partitions that are not subsumed in one another. In this paper, we advocate for an alternative structure for representing multiple alternative clusterings, a directed acyclic graph (DAG). By allowing nodes to have multiple parents, DAG structure is not only more flexible than a tree but also allows for points to be members of multiple clusters. We describe a large-scale, map-reduce-based algorithm to infer these DAGs. Our algorithm works by simply merging nearest neighbor substructures to form a DAG structure. Our algorithm is supported with theoretical guarantees showing its representational capacity over tree-based algorithms. Further, we provide comprehensive empirical experiments on large-scale clustering benchmarks and entity resolution datasets. Our results show that our method is as scalable as and more accurate than state-of-the-art tree-based techniques. View details
    Exact and Approximate Hierarchical Clustering Using A*
    Craig Greenberg
    Sebastian Macaluso
    Nicholas Monath
    Avinava Dubey
    Patrick Flaherty
    Kyle Cranmer
    Andrew McCallum
    Uncertainty in Artificial Intelligence (2021)
    Preview abstract Hierarchical clustering is known to be broadly applicable in myriad domains. Despite its extensive use, existing approximate inference methods are insufficient for applications that require either exact or high-quality approximate solutions.For example, in high energy physics, we are interested in discovering high-quality jet structures, which are hierarchical clusterings of particles.In this paper we view inference as a search problem and focus on inferring high-quality hierarchies for a given cost function, rather than using ad hoc (e.g., greedy or beam) methods. This leads naturally to the use of A*, which has seldom been used for clustering (with the notable exception of \citep{daume2007fast}). Unlike ad hoc search methods, A* carries with it optimality guarantees. However, applying A* search naively leads to a large space and time complexity. To address this challenge, we develop a novel augmented trellis data structure and dynamic programming algorithm for A* that result in substantially improved time and space complexity bounds while still computing the globally optimal hierarchical clustering. We demonstrate that our proposed method increases the number of points for which an exact solution can be found by 25$\%$ compared with previous work \cite{greenberg2020compact}. Furthermore, our approach yields a natural approximation that scales to larger datasets and achieves substantially higher quality results than ad hoc search baselines, motivating its use in applications demanding exact or high-quality approximations. View details
    Non-Stationary Off-policy Optimization
    Joey Hong
    Branislav Kveton
    International Conference on Artificial Intelligence and Statistics (AISTATS) (2021)
    Preview abstract Off-policy learning is a framework for estimating the value of and optimizing policies offline from logged data without deploying them. Real-world environments are nonstationary, and the optimized policies should be able to adapt to these changes. To address this challenge, we study the novel problem of off-policy optimization in piecewise-stationary environments. Our key idea is to use a change-point detector to partition the logged data into categorical latent states, then find a near-optimal policy conditioned on latent state. We derive high-probability bounds on our off-policy estimates and optimization. Furthermore, we also propose a practical approach to deploy our policy online and evaluate our approach comprehensively on a real-world clickstream dataset. View details
    Scalable Hierarchical Agglomerative Clustering
    Nick Monath
    Avinava Dubey
    Guru Prashanth Guruganesh
    Andrew McCallum
    Gokhan Mergen
    Mert Terzihan
    Bryon Tjanaka
    Yuchen Wu
    Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2021), 1245–1255
    Preview abstract The applicability of agglomerative clustering, for inferring both hierarchical and flat clustering, is limited by its scalability. Existing scalable hierarchical clustering methods sacrifice quality for speed and often lead to over-merging of clusters. In this paper, we present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points. We perform a detailed theoretical analysis, showing that under mild separability conditions our algorithm can not only recover the optimal flat partition but also provide a two-approximation to non-parametric DP-Means objective [32]. This introduces a novel application of hierarchical clustering as an approximation algorithm for the non-parametric clustering objective. We additionally relate our algorithm to the classic hierarchical agglomerative clustering method. We perform extensive empirical experiments in both hierarchical and flat clustering settings and show that our proposed approach achieves state-of-the-art results on publicly available clustering benchmarks. Finally, we demonstrate our method’s scalability by applying it to a dataset of 30 billion queries. Human evaluation of the discovered clusters show that our method finds better quality of clusters than the current state-of-the-art. View details
    Preview abstract Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (in terms of memory mainly) on the sequence length due to their full attention mechanism. To remedy this, we propose, \emph{BigBird}, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that \emph{BigBird} is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our theoretical analysis demonstrates the need for having an O(1) global tokens, such as CLS, that attend to the entire sequence as part of the sparse attentions. We show that the proposed sparse attention can handle sequences of length up to 8x of what was previously possible using similar hardware. As a consequence of the capability to handle longer context, \emph{BigBird} drastically improves performance on various NLP tasks such as question answering. View details
    Latent Bandits Revisited
    Joey Hong
    Branislav Kveton
    Advances in Neural Information Processing Systems 33 (NeurIPS 2020), pp. 13423-13433
    Preview abstract A latent bandit is a bandit problem where the learning agent knows reward distributions of arms conditioned on an unknown discrete latent state. The goal of the agent is to identify the latent state, after which it can act optimally. This setting is a natural midpoint between online and offline learning, where complex models can be learned offline and the agent identifies the latent state online. This is of high practical relevance, for instance in recommender systems. In this work, we propose general algorithms for latent bandits, based on both upper confidence bounds and Thompson sampling. The algorithms are contextual, and aware of model uncertainty and misspecification. We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than actions. A comprehensive empirical study showcases the advantages of our approach. View details
    Uncovering Hidden Structure in Sequence Data via Threading Recurrent Models
    Daniel Silva
    Yuchen Wu
    Shibani Sanan
    Surojit Chatterjee
    Proceedings of the 12 ACM International Conference on Web Search and Data Mining (2019), pp. 186-194
    Preview abstract Long Short-Term Memory (LSTM) is one of the most powerful sequence models for user browsing history [17, 22] or natural language text [19]. Despite the strong performance, it has not gained popularity for user-facing applications, mainly owing to a large number of parameters and lack of interpretability. Recently Zaheer et al. [25] introduced latent LSTM Allocation (LLA) to address these problems by incorporating topic models with LSTM, where the topic model maps observed words in each sequence to topics that evolve using an LSTM model. In our experiments, we found the resulting model, although powerful and interpretable, to show shortcomings when applied to sequence data that exhibit multi-modes of behaviors with abrupt dynamic changes. To address this problem we introduce thLLA: a threading LLA model. thLLA has the ability to break each sequence into a set of segments and then model the dynamic in each segment using an LSTM mixture. In that way, thLLA can model abrupt changes in sequence dynamics and provides a better fit for sequence data while still being interpretable and requiring fewer parameters. In addition, thLLA uncovers hidden themes in the data via its dynamic mixture components. However, such generalization and interpretability come at a cost of complex dependence structure, for which inference would be extremely non-trivial. To remedy this, we present an efficient sampler based on particle MCMC method for inference that can draw from the joint posterior directly. Experimental results confirm the superiority of thLLA and the stability of the new inference algorithm on a variety of domains. View details
    Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space
    Nick Monath
    Daniel Silva
    Andrew McCallum
    The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’19) (2019)
    Preview abstract Hierarchical clustering is typically performed using algorithmic-based optimization searching over the discrete space of trees. While these optimization methods are often effective, their discreteness restricts them from many of the benefits of their continuous counterparts, such as scalable stochastic optimization and the joint optimization of multiple objectives or components of a model (e.g.end-to-end training). In this paper, we present an approach for hierarchical clustering that searches over continuous representations of trees in hyperbolic space by running gradient descent. We compactly represent uncertainty over tree structures with vectors in the Poincaré ball. We show how the vectors can be optimized using an objective related to recently proposed cost functions for hierarchical clustering. Using our method with a mini-batch stochastic gradient descent inference procedure, we are able to outperform prior work on clustering millions of ImageNet images by 15 points of dendrogram purity. Further, our continuous tree representation can be jointly optimized in multi-task learning applications offering a 9 point improvement over baseline methods View details
    Preview abstract Learning continuous representations of discrete objects such as text, sentences, users, and movies lies at the heart of many applications including involving text and user modeling. Unfortunately, traditional methods that embed all objects do not scale to large vocabulary sizes and embedding dimensions. In this paper, we propose a general method, Anchor & Transform (ANT) that learns sparse representations of discrete objects by jointly learning a small set of \textit{anchor embeddings} and a \textit{sparse transformation} from anchor objects to all objects. ANT is scalable, flexible, end-to-end trainable, and allows the user to easily incorporate domain knowledge about object relationships. ANT also recovers several task-specific baselines under certain structural assumptions on the anchor embeddings and transformation matrices. On several benchmarks involving text and user modeling, ANT demonstrates strong performance with respect to accuracy and sparsity. View details
    Preview abstract Recurrent neural network, such as Long-short term memory (LSTM), are powerful tools for modeling sequential data, however, they lack interpretability and requires large num- ber of parameters. On the other hand, topic models, such as Latent Dirichlet Allocation (LDA), are powerful tools for uncovering the hidden structure in a document collection, however, they lack the same strong predictive power as deep models. In this paper we bridge the gap between such mod- els and propose Latent LSTM Allocation (LLA). In LLA each document is modeled as a sequence of words, and the model jointly groups words into topics and learns the tempo- ral dynamics over the sequence. Our model is interpretable, concise and can capture intricate dynamics. We give an ef- ficient MCMC-EM inference algorithm for our model that scales to millions of documents. Our experimental evalu- ations shows that the proposed model compares favorably with several state-of-the-art baselines. View details
    Predicting Latent Structured Intents from Shopping Queries
    Chao-yuan Wu
    Gowtham Ramani Kumar
    Ritendra Datta
    WWW (2017)
    Preview abstract In online shopping, users usually express their intent through search queries. However, these queries are usually in a rather ambiguous form instead of being accurate. For example, it is more likely (and easier) for users to write a query like “high end bike” than “21 speed carbon frames jamis road bike”. It is challenging to interpret these ambiguous queries and thus search result accuracy suffers. A user oftentimes needs to go through the frustrating process of refining search queries or self-teaching from possibly unstructured information. However, shopping is indeed a structured domain, that is composed of category hierarchy, brands, product lines, features, etc. It would have been much better if a shopping site could understand users’ intent through this structure, present organized/structured information, or even find items with the right categories, brands or features for them. In this paper we study the problem of inferring the latent intent from unstructured queries and mapping them to structured attributes. We present a novel framework that jointly learns this knowledge from user consumption behaviors and product metadata. We present a hybrid Long Short term Memory (LSTM) joint model that is accurate and robust, even though user queries are noisy and product catalog is rapidly growing. Our study is conducted on a large-scale dataset from Google Shopping, that is composed of millions of items and user queries along with their click responses. Extensive qualitative and quantitative evaluation shows that the proposed model is more accurate, concise, and robust than multiple possible alternatives. View details
    Preview abstract Topic models are often applied in industrial settings to discover user profiles from activity logs where documents correspond to users and words to complex objects such as web sites and installed apps. Standard topic models ignore the content-based similarity structure between these objects largely because of the inability of the Dirichlet prior to capture such side information of word-word correlation. Several approaches were proposed to replace the Dirichlet prior with more expressive alternatives. However, this added expressivity comes with a heavy premium: inference becomes intractable and sparsity is lost which renders these alternatives not suitable for industrial scale applications. In this paper we take a radically different approach to incorporating word-word correlation in topic models by applying this side information at the posterior level rather than at the prior level. We show that this choice preserves sparsity and results in a graph-based sampler for LDA whose computational complexity is asymptotically on bar with the state of the art Alias base sampler for LDA. We illustrate the efficacy of our approach over real industrial datasets that span up to billion of users, tens of millions of words and thousands of topics. To the best of our knowledge, our approach provides the first practical and scalable solution to this important problem View details
    State Space LSTM Models with Particle MCMC Inference
    Xun Zheng
    Alex J. Smola
    Eric Xing
    The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS) workshop on Bayesian Deep Learning. (2017)
    Preview abstract Long Short-Term Memory (LSTM) is one of the most powerful sequence models. Despite the strong performance, however, it lacks the nice interpretability as in state space models. In this paper, we present a way to combine the best of both worlds by introducing State Space LSTM (SSL) models that generalizes the earlier work Zaheer et al. (2017) of combining topic models with LSTM. However, unlike Zaheer et al. (2017), we do not make any factorization assumptions in our inference algorithm. We present an efficient sampler based on sequential Monte Carlo (SMC) method that draws from the joint posterior directly. Experimental results confirms the superiority and stability of this SMC inference algorithm on a variety of domains. View details
    Recurrent Recommender Networks
    Chao-Yuan Wu
    Alex Beutel
    Alexander J. Smola
    How Jing
    Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (2017), pp. 495-503
    Preview abstract Recommender systems traditionally assume that user profiles and movie attributes are static. Temporal dynamics are purely reactive, that is, they are inferred after they are observed, e.g. after a user's taste has changed or based on hand-engineered temporal bias corrections for movies. We propose Recurrent Recommender Networks (RRN) that are able to predict future behavioral trajectories. This is achieved by endowing both users and movies with a Long Short-Term Memory (LSTM) autoregressive model that captures dynamics, in addition to a more traditional low-rank factorization. On multiple real-world datasets, our model offers excellent prediction accuracy and it is very compact, since we need not learn latent state but rather just the state transition function. View details
    Canopy --- Fast Sampling with Cover Trees
    Satwik Kottur
    Jose Moura
    Alex J. Smola
    ICML 2017 (2017)
    Preview abstract Hierarchical Bayesian models often capture distributions over a very large number of distinct atoms. The need for this arises when organizing huge amount of unsupervised data, for instance, features extracted using deep convnets can be exploited to organize abundant unlabeled images. Inference for hierarchical Bayesian models in such cases can be rather nontrivial, leading to approximate approaches. In this work, we propose a sampler based on Cover Trees that is exact and that has guaranteed runtime logarithmic in the number of atoms and is polynomial in the inherent dimensionality of the underlying parameter space. In other words, the algorithm is as fast as search over a hierarchical data structure and we demonstrate the effectiveness on both synthetic and real datasets, consisting of over 100 million images. View details
    Dirichlet-Hawkes Processes with Applications to Clustering Continuous-Time Document Streams
    Nan Du
    Mehrdad Farajtabar
    Alexander J. Smola
    Le Song
    Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2015), pp. 219-228
    Preview abstract Clusters in document streams, such as online news articles, can be induced by their textual contents, as well as by the temporal dynamics of their arriving patterns. Can we leverage both sources of information to obtain a better clustering of the documents, and distill information that is not possible to extract using contents only? In this paper, we propose a novel random process, referred to as the Dirichlet-Hawkes process, to take into account both information in a unified framework. A distinctive feature of the proposed model is that the preferential attachment of items to clusters according to cluster sizes, present in Dirichlet processes, is now driven according to the intensities of cluster-wise self-exciting temporal point processes, the Hawkes processes. This new model establishes a previously unexplored connection between Bayesian Nonparametrics and temporal Point Processes, which makes the number of clusters grow to accommodate the increasing complexity of online streaming contents, while at the same time adapts to the ever changing dynamics of the respective continuous arrival time. We conducted large-scale experiments on both synthetic and real world news articles, and show that Dirichlet-Hawkes processes can recover both meaningful topics and temporal dynamics, which leads to better predictive performance in terms of content perplexity and arrival time of future documents. View details
    Preview abstract Business-to-consumer (B2C) emails are usually generated by filling structured user data (e.g. purchase, event) into tem- plates. Extracting structured data from B2C emails allows users to track important information on various devices. However, it also poses several challenges, due to the re- quirement of short response time for massive data volume, the diversity and complexity of templates, and the privacy and legal constraints. Most notably, email data is legally protected content, which means no one except the receiver can review the messages or derived information. In this paper we first introduce a system which can extract structured information automatically without requiring hu- man review of any personal content. Then we focus on how to annotate product names from the extracted texts, which is one of the most difficult problems in the system. Nei- ther general learning methods, such as binary classifiers, nor more specific structure learning methods, such as Condition- al Random Field (CRF), can solve this problem well. To accomplish this task, we propose a hybrid approach, which basically trains a CRF model using the labels pre- dicted by binary classifiers (weak learners). However, the performance of weak learners can be low, therefore we use Expectation Maximization (EM) algorithm on CRF to re- move the noise and improve the accuracy, without the need to label and inspect specific emails. In our experiments, the EM-CRF model can significantly improve the product name annotations over the weak learners and plain CRFs. View details
    Scaling Distributed Machine Learning with the Parameter Server
    Mu Li
    David G. Anderson
    Jun Woo Park
    Alexander J. Smola
    Vanja Josifovski
    James Long
    Eugene J. Shekita
    Bor-Yiing Su
    Operating Systems Design and Implementation (OSDI), USENIX (2014), pp. 583-598
    Preview abstract We propose a parameter server framework for distributed machine learning problems. Both data and workloads are distributed over worker nodes, while the server nodes maintain globally shared parameters, represented as dense or sparse vectors and matrices. The framework manages asynchronous data communication between nodes, and supports flexible consistency models, elastic scalability, and continuous fault tolerance. To demonstrate the scalability of the proposed framework, we show experimental results on petabytes of real data with billions of examples and parameters on problems ranging from Sparse Logistic Regression to Latent Dirichlet Allocation and Distributed Sketching. View details
    Scalable Hierarchical Multitask Learning Algorithms for Conversion Optimization in Display Advertising
    Abhimanyu Das
    Alexander J. Smola
    ACM International Conference on Web Search And Data Mining (WSDM) (2014)
    Preview abstract Many estimation tasks come in groups and hierarchies of related problems. In this paper we propose a hierarchical model and a scalable algorithm to perform inference for multitask learning. It infers task correlation and subtask structure in a joint sparse setting. Implementation is achieved by a distributed subgradient oracle and the successive application of prox-operators pertaining to groups and sub-groups of variables. We apply this algorithm to conversion optimization in display advertising. Experimental results on over 1TB data for up to 1 billion observations and 1 million attributes show that the algorithm provides significantly better prediction accuracy while simultaneously being efficiently scalable by distributed parameter synchronization. View details
    Reducing the Sampling Complexity of Topic Models
    Aaron Li
    Sujith Ravi
    Alexander J Smola
    ACM Conference on Knowledge Discovery and Data Mining (KDD) (2014)
    Preview abstract Inference in topic models typically involves a sampling step to associate latent variables with observations. Unfortunately, the generative model loses sparsity with the increase in data, requiring O(k) operations per word for k latent states, such as topics. In this paper we propose an algorithm which requires only O(kd) operations per word, where kd is the number of actually instantiated topics in the document. For large document collections and structured hierarchical models kd ≪ k, thus yielding an order of magnitude speedup. Our method is general and it applies to a wide variety of statistical models. At its core is the idea that dense, rapidly changing distributions can be approximated efficiently by the combination of a Metropolis-Hastings step, judicious use of sparsity, and amortized preprocessing via the alias method. View details
    Taxonomy Discovery for Personalized Recommendation
    Yuchen Zhang
    Vanja Josifovski
    Alexander J Smola
    ACM International Conference on Web Search And Data Mining (WSDM) (2014)
    Preview abstract Personalized recommender systems based on latent factor models are widely used to increase sales in e-commerce. Such systems use the past behavior of users to recommend new items that are likely to be of interest to them. However, latent factor model suffer from sparse user-item interaction in online shopping data: for a large portion of items that do not have sufficient purchase records, their latent factors cannot be estimated accurately. In this paper, we propose a novel approach that automatically discovers the taxonomies from online shopping data and jointly learns a taxonomy-based recommendation system. Out model is non-parametric and can learn the taxonomy structure automatically from the data. Since the taxonomy allows purchase data to be shared between item- s, it effectively improves the accuracy of recommending tail items by sharing strength with the more frequent items. Ex- periments on a large-scale online shopping dataset confirm that our proposed model improves significantly over state-of- the-art latent factor models. Moreover, our model generates high-quality and human readable taxonomies. Finally, us- ing the algorithm-generated taxonomy, our model even out- performs latent factor models based on the human-induced taxonomy, thus alleviating the need for costly manual taxonomy generation. View details
    KDD tutorial: The Dataminer Guide to Scalable Mixed-Membership and Nonparametric Bayesian Models
    Alexander J Smola
    ACM conference on Knowledge Discovery and Data Mining (KDD) (2013)
    Preview
    Hierarchical Geographical Modeling of User locations from Social Media Posts
    Liangjie Hong
    Alexander J Smola
    Proceedings of the 22nd International World Wide Web Conference (WWW 2013)
    Preview
    Focused Marix Factorization for Audience Selection in Display Advertising
    Sandeep Pandey
    Vanja Josifovski
    Lluis Garcia-Pueyo
    Jeff Yuan
    Proceedings of the 29th International Conference on Data Engineering (ICDE) (2013)
    Preview abstract Audience selection is a key problem in display advertising systems in which we need to select a list of users who are interested (i.e., most likely to buy) in an advertising campaign. The users’ past feedback on this campaign can be leveraged to construct such a list using collaborative filtering techniques such as matrix factorization. However, the user-campaign interaction is typically extremely sparse, hence the conventional matrix factorization does not perform well. Moreover, simply combining the users feedback from all campaigns does not address this since it dilutes the focus on target campaign in consideration. To resolve these issues, we propose a novel focused matrix factorization model (FMF) which learns users’ preferences towards the specific campaign products, while also exploiting the information about related products. We exploit the product taxonomy to discover related campaigns, and design models to discriminate between the users’ interest towards campaign products and non-campaign products. We develop a parallel multi-core implementation of the FMF model and evaluate its performance over a real-world advertising dataset spanning more than a million products. Our experiments demonstrate the benefits of using our models over existing approaches. View details
    Measurement and modeling of eye-mouse behavior
    LaDawn Jentzsch
    Rory Sayres
    Sujith Ravi
    Alex J. Smola
    Proceedings of the 22nd International World Wide Web Conference (2013)
    Preview
    Scalable Dynamic Nonparametric Bayesian Models of Content and Users
    Eric P. Xing
    International Joint Conference on Artificial Intelligence (IJCAI -- Best paper track) (2013)
    Preview
    Latent Factor Models with Additive Hierarchically-smoothed User Preferences
    Sandeep Pandey
    Vanja Josifovski
    Lluis Garcia-Pueyo
    Proceedings of The 6th ACM International Conference on Web Search and Data Mining (WSDM) (2013)
    Preview abstract Items in recommender systems are usually associated with annotated attributes such as brand and price for products; agency for news articles, etc. These attributes are highly informative and must be exploited for accurate recommendation. While learning a user preference model over these attributes can result in an interpretable recommender system and can hands the cold start problem, it suffers from two major drawbacks: data sparsity and the inability to model random effects. On the other hand, latent-factor collaborative filtering models have shown great promise in recommender systems; however, its performance on rare items is poor. In this paper we propose a novel model LFUM, which provides the advantages of both of the above models. We learn user preferences (over the attributes) using a personalized Bayesian hierarchical model that uses a combination (additive model) of a globally learned preference model along with user-specific preferences. To combat we smooth these preferences over the item-taxonomy an efficient forward-filtering and backward-smoothing algorithm. Our inference algorithms can handle both attributes (e.g., item brands) and continuous attributes (e.g., prices). We combine the user preferences with the latent- models and train the resulting collaborative filtering system end- using the successful BPR ranking algorithm. In our experimental analysis, we show that our proposed model several commonly used baselines and we carry out an ablation study showing the benefits of each component of our model. View details
    Distributed Large-scale Natural Graph Factorization
    Nino Shervashidze
    Shravan Narayanamurthy,
    Vanja Josifovski
    Alexander J Smola
    Proceedings of the 22nd International World Wide Web Conference (WWW 2013)
    Preview
    The Nested Chinese Restaurant Franchise Process: User Tracking and Document Modeling
    Liangjie Hong
    Alexander J Smola
    International Conference on Machine Learning (ICML) (2013)
    Preview
    MedLDA: Maximum Margin Supervised Topic Models
    Jun Zhu
    Eric P. Xing
    Journal of Machine Learning Research (2012)
    Preview abstract A supervised topic model can utilize side information such as ratings or labels associated with documents or images to discover more predictive low dimensional topical representations of the data. However, existing supervised topic models predominantly employ likelihood-driven objective functions for learning and inference, leaving the popular and potentially powerful max-margin principle unexploited for seeking predictive representations of data and more discriminative topic bases for the corpus. In this paper, we propose the maximum entropy discrimination latent Dirichlet allocation (MedLDA) model, which integrates the mechanism behind the max-margin prediction models (e.g., SVMs) with the mechanism behind the hierarchical Bayesian topic models (e.g., LDA) under a uni- fied constrained optimization framework, and yields latent topical representations that are more discriminative and more suitable for prediction tasks such as document classification or regression. The principle underlying the MedLDA formalism is quite general and can be applied for jointly max-margin and maximum likelihood learning of directed or undirected topic models when supervising side information is available. Efficient variational methods for posterior inference and parameter estimation are derived and extensive empirical studies on several real data sets are also provided. Our experimental results demonstrate qualitatively and quantitatively that MedLDA could: 1) discover sparse and highly discriminative topical representations; 2) achieve state of the art prediction performance; View details
    Hokusai | Sketching Streams in Real Time
    Sergiy Matusevych
    Alex Smola
    Proceedings of the 28th International Conference on Conference on Uncertainty in Artificial Intelligence (UAI) (2012)
    Preview abstract We describe 北斎 Hokusai, a real time system which is able to capture frequency information for streams of arbitrary sequences of symbols. The algorithm uses the Count-Min sketch as its basis and exploits the fact that sketching is linear. It provides real time statistics of arbitrary events, e.g. streams of queries as a function of time. We use a factorizing approximation to provide point estimates at arbitrary (time, item) combinations. View details
    FastEx: Hash Clustering with Exponential Families
    Sujith Ravi
    Shravan Narayanamurthy
    Alex Smola
    Proceedings of the 26th Conference on Neural Information Processing Systems. (NIPS) (2012)
    Preview
    Web-Scale Multi-Task Feature Selection for Behavioral Targeting
    Mohamed Aly
    Abhimanyu Das
    Alex Smola
    Tasos Anastasakos
    Proceedings of The 21st ACM International Conference on Information and Knowledge Management (CIKM), ACM (2012)
    Preview abstract A typical behavioral targeting system optimizing purchase activities, called conversions, faces two main challenges: the web-scale amounts of user histories to process on a daily basis, and the relative sparsity of conversions. In this paper, we try to address these challenges through feature selection. We formulate a multi-task (or group) feature-selection problem among a set of related tasks (sharing a common set of features), namely advertising campaigns. We apply a group-sparse penalty consisting of a combination of an `1 and `2 penalty and an associated fast optimization algorithm for distributed parameter estimation. Our algorithm relies on a variant of the well known Fast Iterative Thresholding Algorithm (FISTA), a closed-form solution for mixed norm programming and a distributed subgradient oracle. To eciently handle web-scale user histories, we present a distributed inference algorithm for the problem that scales to billions of instances and millions of attributes. We show the superiority of our algorithm in terms of both sparsity and ROC performance over baseline feature selection methods (both single-task L1-regularization and multi-task mutual-information gain). View details
    Scalable Inference in Latent Variable Models
    Mohamed Aly
    Joseph Gonzalez
    Shravan Narayanamurthy
    Alex Smola
    Proceedings of The 5th ACM International Conference on Web Search and Data Mining (WSDM) (2012)
    Hokusai - Sketching Streams in Real Time
    Sergiy Matusevych
    Alex J. Smola
    CoRR, vol. abs/1210.4891 (2012)
    Tutorial: New Templates for Scalable Data Analysis
    Alex Smola
    Markus Weimer
    The 21st International World Wide Web conference (WWW) (2012)
    Supercharging Recommender Systems using Taxonomies for Learning User Purchase Behavior
    Sandeep Pandey
    Vanja Josifovski
    Jeff Yuan
    Lluis Garcia Pueyo
    Proceedings of the 38th International Conference on Very Large Databases (VLDB) (2012)
    Discovering Geographical Topics In The Twitter Stream
    Liangjie Hong
    Siva Gurumurthy
    Kostas Tsioutsiouliklis
    Alex Smola
    Proceedings of The 21st International World Wide Web conference (WWW) (2012)
    Fair and Balanced: Learning to Present News Stories
    Choon-hui Teo
    S.V.N Vishwanathan
    Alex Smola
    Proceedings of The 5th ACM International Conference on Web Search and Data Mining (WSDM) (2012)
    Sparse Additive Generative Models of Text
    Jacob Eisenstein
    Eric Xing
    Proceedings of the 28th International Conference on Machine Learning (ICML) (2011)
    Online Inference for the Infinite Cluster-topic Model: Storylines from Streaming Text
    Qirong Ho
    Choon-hui Teo
    Jacobe Eisenstein
    Alex Smola
    Eric Xing
    Proceedings of the 14th Conference on Artificial Intelligence and Statistics (AISTATS) (2011)
    Tutorial: Graphical Models for the Internet
    Alex Smola
    The 25th Conference on Neural Information Processing Systems. (NIPS), (2011)
    Scalable Distributed Inference of Dynamic User Interests for Behavioral Targeting
    Yucheng Low
    Mohamed Aly
    Vanja Josifovski
    Alex Smola
    Proceedings of The 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2011., ACM
    Unified Analysis of Streaming News
    Qirong Ho
    Jacob Eisenstein
    Eric P. Xing
    Alex Smola
    Choon-hui Teo
    Proceedings of the 20th International World Wide Web conference (WWW) (2011)
    Online Inference for the Infinite Topic-Cluster Model: Storylines from Streaming Text
    Qirong Ho
    Choon Hui Teo
    Jacob Eisenstein
    Alexander J. Smola
    Eric P. Xing
    Journal of Machine Learning Research - Proceedings Track, vol. 15 (2011), pp. 101-109
    Modeling Users and Content: Structured Probabilistic Representation, and Scalable Online Inference Algorithms
    Ph.D. Thesis (2011)
    Tutorial: Latent Variable Models for the Internet
    Alex Smola
    the 20th International World Wide Web conference (WWW) (2011)
    WWW 2011 invited tutorial overview: latent variable models on the internet
    Alexander J. Smola
    WWW (Companion Volume) (2011), pp. 281-282
    Unified analysis of streaming news
    Qirong Ho
    Jacob Eisenstein
    Eric P. Xing
    Alexander J. Smola
    Choon Hui Teo
    WWW (2011), pp. 267-276
    Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream
    Eric P. Xing
    Proceedings of the 26th International Conference on Conference on Uncertainty in Artificial Intelligence (UAI) (2010)
    Recovering Time-Varying Networks of Dependencies in Social and Biological Studies
    Eric P. Xing
    Proceeding of the National Academy of Science (PNAS) (2010)
    Structured Literature Image Finder: Extracting Information from Text and Images in Biomedical Literature.
    Luis Pedro Coelho
    Andrew Arnold
    Joshua Kangas
    Saboor Sheikh
    Eric P. Xing
    Robert Murphey
    Lecture Notes in Bioinformatics, Springer (2010)
    Estimating Time-Varying Networks
    Mladen Kolar
    Le Song
    Eric P. Xing
    Annals of Applied Statistics (AOAS) (2010)
    Structured Literature Image Finder: Parsing Text and Figures in Biomedical Literature
    Andrew Arnold
    Luis Pedro Coelho
    Joshua Kangas
    Saboor Sheikh
    Eric P. Xing
    Robert Murphy
    Journal of Web Semantics (2010)
    Staying Informed: Supervised and Semi-Supervised Multi-view Topical Analysis of Ideological Perspective
    Eric P. Xing
    Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (2010)
    MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification
    Jun Zhu
    Eric P. Xing
    The 26th International Conference on Machine Learning (ICML) (2009)
    SLIF: Structure Literature Image Finer
    Andrew Arnold
    Luis Pedro Coelho
    Joshua Kangas
    Saboor Sheikh
    Eric P Xing
    Robert Murphy
    The Annual Meeting of The ISMB BioLINK Special Interest Group : Text Mining, Image Analysis and the future of Scientific Publishing In Association with ISMB/ECCB (2009)
    Structured Correspondence Topic Models for Mining Captioned Figures in Biological Literature
    Eric P. Xing
    William W. Cohen
    Robert Murphy
    Proceedings of The Fifteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) (2009)
    Collapsed Variational Inference for Time-Varying Dirichlet Process Mixture Models
    Eric P. Xing
    Non-Parametric Bayes Workshop, NIPS (2009)
    Time-Varying Networks: Recovering Temporally Rewiring Genetic Networks During the Life Cycle of Drosophila melanogaster
    Le Song
    Eric P Xing
    arXiv:0901.0138v2 (2008)
    Training Hierarchical Feed-forward Visual Recognition Models Using Transfer Learning from Pseudo-Tasks
    Kai Yu
    Wei Xu
    Yihong Gong
    Eric P. Xing
    Proceeding of the 10th European Conference of Computer Vision (ECCV) (2008)
    Joint Latent Topic Models for Text and Citations
    Ramesh Nallapati
    Eric P. Xing
    William W. Cohen
    Proceedings of The Fourteen ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) (2008)
    Dynamic Non-Parametric Mixture Models and The Recurrent Chinese Restaurant Process : with Applications to Evolutionary Clustering
    Eric P Xing
    Proceedings of The Eighth SIAM International Conference on Data Mining (SDM) (2008)
    Sparse Word Graphs: A scalable algorithm for capturing word correlations in topic models
    Ramesh Nallapati
    William W Cohen
    Eric P. Xing
    In International Conference of Data Mining, Workshop on High performance data mining (ICDM WS) (2007)
    Modeling Topic Evolution in Text Stream
    Eric P Xing
    CMU-MLD Technical Report CMU-ML-07-117 (2007)
    On Tight Approximate Inference of the Logistic Normal Topic Admixture Model
    Eric P Xing
    The Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS) (2007)
    Task Assignment for a Physical Agent Team via a Dynamic Forward/Reverse Auction Mechanism.
    Abhilash Patel
    Tom Brown
    MyungJoo Ham
    Myeong-Wuk Jang
    Gul Agha
    The International Conference of Integration of Knowledge Intensive Multi-Agent Systems KIMAS (2005)
    Efficient Communication in Multi-Agent Systems
    Myeong-Wuk Jang
    Gul Agha
    Software Engineering for Multi-Agent Systems III: Research Issues and Practical Applications, LNCS 3390, Springer-Verlag (2005)