Mehryar Mohri

Mehryar Mohri leads the Learning Theory Team in Google Research. The team has extensive expertise in a variety of areas, including learning theory, statistical learning theory, optimization, decision making under uncertainty, and reinforcement learning, as well as the theory and design of algorithms more broadly.
Authored Publications
Preview abstract Blackwell's celebrated theory measures approachability using the $\ell_2$ (Euclidean) distance. In many applications such as regret minimization, it is often more useful to study approachability under other distance metrics, most commonly the $\ell_\infty$ metric. However, the time and space complexity of the algorithms designed for $\ell_\infty$ approachability depend on the dimension of the space of the vectorial payoffs, which is often prohibitively large. We present a framework for converting high-dimensional $\ell_\infty$ approachability problems to low-dimensional \emph{pseudonorm} approachability problems, thereby resolving such issues. We first show that the $\ell_\infty$ distance between the average payoff and the approachability set can be equivalently defined as a \emph{pseudodistance} between a lower-dimensional average vector payoff and a new convex set we define. Next, we develop an algorithmic theory of pseudonorm approachability analogous to previous work on norm approachability, showing that it can be achieved via online linear optimization (OLO) over a convex set given by the Fenchel dual of the unit pseudonorm ball. We then use that to show, modulo mild normalization assumptions, that there exists an $\ell_\infty$ approachability algorithm whose convergence is independent of the dimension of the original vector payoff. We further show that this algorithm admits polynomial-time complexity, assuming that the original $\ell_\infty$-distance can be computed efficiently. We also give an $\ell_\infty$ approachability algorithm whose convergence is logarithmic in that dimension, using an FTRL algorithm with a maximum-entropy regularizer. Finally, we illustrate the benefits of our framework by applying it to several problems in regret minimization. View details
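To make the last algorithmic ingredient concrete: FTRL with a maximum-entropy (negative-entropy) regularizer over the probability simplex reduces to an exponentiated-weights update with a closed-form solution. The Python sketch below is only a generic illustration of that OLO primitive (function name and step size are ours), not the paper's full pseudonorm-approachability reduction.

import numpy as np

def ftrl_entropy_plays(loss_vectors, eta=0.1):
    """Run FTRL with a negative-entropy (maximum-entropy) regularizer over the
    probability simplex on a sequence of linear losses. With this regularizer
    the per-round solution has the closed-form softmax expression, and the
    regret grows only logarithmically with the dimension."""
    cum_loss = np.zeros(len(loss_vectors[0]))
    plays = []
    for loss in loss_vectors:
        logits = -eta * cum_loss
        logits -= logits.max()                 # numerical stability
        p = np.exp(logits)
        p /= p.sum()
        plays.append(p)                        # play before observing this round's loss
        cum_loss += np.asarray(loss, dtype=float)
    return plays

# Toy usage: two coordinates with alternating losses.
plays = ftrl_entropy_plays([[1.0, 0.0], [0.0, 1.0]] * 5)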
    Preview abstract Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks and yet, they perform well in many others. In fact, in practice, they are often selected as the top choices, due to their simplicity. But, for what tasks do such policies succeed? Can we give theoretical guarantees for their favorable performance? These crucial questions have been scarcely investigated, despite the prominent practical importance of these policies. This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration. Our results apply to value-function-based algorithms in episodic MDPs with bounded Bellman-Eluder dimension. We define an exploration-gap quantity, alpha, that captures a structural property of the MDP, the exploration policy and the given value function class. We show that the sample-complexity of myopic exploration scales quadratically with the inverse of this quantity, 1 / alpha^2. We further demonstrate through concrete examples that the exploration gap is indeed favorable in several tasks where myopic exploration succeeds, due to the corresponding dynamics and reward structure. View details
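For reference, the myopic exploration policies analyzed here are simple action-selection rules of the kind sketched below; this is a minimal Python illustration of epsilon-greedy and softmax selection (names and defaults are ours), not code from the paper.

import numpy as np

def epsilon_greedy_action(q_values, epsilon, rng=np.random.default_rng()):
    """Epsilon-greedy: with probability epsilon explore uniformly at random,
    otherwise act greedily with respect to the current value estimates."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax_action(q_values, temperature=1.0, rng=np.random.default_rng()):
    """Softmax (Boltzmann) exploration: sample an action with probability
    proportional to exp(Q / temperature)."""
    logits = np.asarray(q_values, dtype=float) / temperature
    logits -= logits.max()                     # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(q_values), p=probs))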
    Preview abstract We study repeated two-player games where one of the players, the learner, employs a no-regret learning strategy, while the other, the optimizer, is a rational utility maximizer. We consider general Bayesian games, where the payoffs of both the optimizer and the learner could depend on the type, which is drawn from a publicly known distribution, but revealed privately to the learner. We address the following questions: (a) what is the bare minimum that the optimizer is guaranteed to obtain regardless of the no-regret learning algorithm employed by the learner? (b) are there learning algorithms that cap the optimizer payoff at this minimum? (c) can these generalizations be implemented efficiently? While building this theory of optimizer-learner interactions, we define a new combinatorial notion of regret called polytope swap regret, that could be of independent interest in other settings. View details
Preview abstract In distributed learning settings such as federated learning, the training algorithm can be potentially biased towards different clients. Mohri et al. (2019) proposed a domain-agnostic learning algorithm, where the model is optimized for any target distribution formed by a mixture of the client distributions in order to overcome this bias. They further proposed an algorithm for the cross-silo federated learning setting, where the number of clients is small. We consider this problem in the cross-device setting, where the number of clients is much larger. We propose a communication-efficient distributed algorithm called Agnostic Federated Averaging (or AgnosticFedAvg) to minimize the domain-agnostic objective proposed in (Mohri et al., 2019), which is amenable to private mechanisms such as secure aggregation. We highlight two types of naturally occurring domains in federated learning and argue that AgnosticFedAvg performs well on both. To demonstrate the practical effectiveness of AgnosticFedAvg, we report positive results for large-scale language modeling tasks in both simulation and live experiments, where the latter involves training language models for a Spanish virtual keyboard for millions of user devices. View details
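A rough sketch of the domain-agnostic (minimax over client mixtures) idea follows. It is our simplified Python illustration, with hypothetical placeholder callables local_update and eval_loss standing in for client computation; it is not the actual AgnosticFedAvg implementation.

import numpy as np

def agnostic_federated_round(global_model, domains, local_update, eval_loss,
                             lam, eta=0.1):
    """One simplified round in the spirit of domain-agnostic federated averaging:
    optimize the model for the worst-case mixture of domains. `lam` is the
    current mixture weight vector over domains; it is pushed toward the domains
    with the largest loss (a multiplicative-weights step), while the model
    update averages local updates weighted by `lam`. The callables
    `local_update(model, domain)` and `eval_loss(model, domain)` are
    placeholders returning a numpy model vector and a scalar loss."""
    updates = [local_update(global_model, d) for d in domains]
    losses = np.array([eval_loss(global_model, d) for d in domains])
    new_model = sum(w * u for w, u in zip(lam, updates))   # mixture-weighted average
    lam = lam * np.exp(eta * losses)                       # reweight toward worst domains
    lam = lam / lam.sum()
    return new_model, lam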
    A Field Guide to Federated Optimization
    Jianyu Wang
    Gauri Joshi
    Maruan Al-Shedivat
    Galen Andrew
    A. Salman Avestimehr
    Katharine Daly
    Deepesh Data
    Suhas Diggavi
    Hubert Eichner
    Advait Gadhikar
    Antonious M. Girgis
    Filip Hanzely
    Chaoyang He
    Samuel Horvath
    Martin Jaggi
    Tara Javidi
    Sai Praneeth Karimireddy
    Jakub Konečný
    Sanmi Koyejo
    Tian Li
    Peter Richtarik
    Virginia Smith
    Mahdi Soltanolkotabi
    Weikang Song
    Sebastian Stich
    Ameet Talwalkar
    Hongyi Wang
    Blake Woodworth
    Honglin Yuan
    Mi Zhang
    Tong Zhang
    Chunxiang (Jake) Zheng
    Chen Zhu
    arXiv (2021)
    Preview abstract Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and other constraints that are not primary considerations in other problem settings. This paper provides recommendations and guidelines on formulating, designing, evaluating and analyzing federated optimization algorithms through concrete examples and practical implementation, with a focus on conducting effective simulations to infer real-world performance. The goal of this work is not to survey the current literature, but to inspire researchers and practitioners to design federated learning algorithms that can be used in various practical applications. View details
    Preview abstract There have been many recent advances on provably efficient reinforcement learning in problems with rich observation spaces and general function classes. Unfortunately, common to all such approaches is a realizability assumption, which requires the function class to contain the optimal value function of the true MDP model and which hardly holds in any real-world setting. In this work, we consider the more realistic setting of agnostic reinforcement learning with a policy class (that may not contain any near-optimal policy). We provide an algorithm for this setting and prove instance-dependent regret bounds when the MDP has small rank $d$. Our bounds scale exponentially with the rank $d$ in the worst case but, importantly, are polynomial in the horizon, the number of actions, and the log number of policies. We further show through a nearly matching lower bound that this dependency on the horizon is unavoidable. View details
    Preview abstract We present a theoretical and algorithmic study of the multiple-source domain adaptation problem in the common scenario where the learner has access only to a limited amount of labeled target data, but has at their disposal a large amount of labeled data from multiple source domains. We show that a new family of algorithms based on model selection ideas benefits from very favorable guarantees in this scenario and discuss some theoretical obstacles affecting some alternative techniques. We also report the results of several experiments with our algorithms that demonstrate their practical effectiveness in several tasks. View details
    Preview abstract We study multiple-source domain adaptation, when the learner has access to abundant labeled data from multiple source domains and limited labeled data from the target domain. We analyze existing algorithms and propose an instance-optimal approach based on model selection. We provide efficient algorithms and empirically demonstrate the benefits of our approach. View details
    Preview abstract We study reinforcement learning in tabular MDPs where the agent receives additional side observations per step in the form of several transition samples -- e.g. from data augmentation. We formalize this setting using a feedback graph over state-action pairs and show that model-based algorithms can leverage side observations for more sample-efficient learning. We give a regret bound that predominantly depends on the size of the maximum acyclic subgraph of the feedback graph, in contrast with a polynomial dependency on the number of states and actions in the absence of side observations. Finally, we highlight challenges when leveraging a small dominating set of the feedback graph as compared to the well-studied bandit setting and propose a new algorithm that can use such a dominating set to learn a near-optimal policy faster. View details
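As an illustration of how side observations enter a model-based method, here is a minimal Python sketch (our own, with a hypothetical counts structure): every side-observed transition updates the empirical model exactly like a real transition, which is what drives the improved sample efficiency.

from collections import defaultdict

def make_counts():
    """Empirical model statistics per (state, action) pair."""
    return defaultdict(lambda: {"n": 0, "reward_sum": 0.0, "next": defaultdict(int)})

def update_model(counts, real_transition, side_observations):
    """Treat each side observation (revealed by the feedback graph, e.g. from
    data augmentation) exactly like the transition actually experienced: all
    of them update the empirical transition/reward counts used for planning."""
    for (s, a, r, s_next) in [real_transition] + list(side_observations):
        stats = counts[(s, a)]
        stats["n"] += 1
        stats["reward_sum"] += r
        stats["next"][s_next] += 1
    return counts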
    Preview abstract We present a new active learning algorithm that adaptively partitions the input space into a finite number of regions, and subsequently seeks a distinct predictor for each region, both phases actively requesting labels. We prove theoretical guarantees for both the generalization error and the label complexity of our algorithm, and analyze the number of regions defined by the algorithm under some mild assumptions. We also report the results of an extensive suite of experiments on several real-world datasets demonstrating substantial empirical benefits over existing single-region and nonadaptive region-based active learning baselines. View details
    Preview abstract A general framework for online learning with partial information is one where feedback graphs specify which losses can be observed by the learner. We study a challenging scenario where feedback graphs vary with time stochastically and, more importantly, where graphs and losses are stochastically dependent. This scenario appears in several applications that we describe. We give a new algorithm for this scenario that exploits the stochastic properties of the graphs and that benefits from favorable regret guarantees in this challenging setting. We present a detailed theoretical analysis of this algorithm, and also report the results of a series of experiments on real-world datasets, which show that our algorithm outperforms standard baselines for online learning with feedback graphs. View details
    Preview abstract We present a series of new PAC-Bayes learning guarantees for randomized algorithms with sample-dependent priors. Our most general bounds make no assumption on the priors and are given in terms of certain covering numbers under the infinite-Rényi divergence and the L1 distance. We show how to use these general bounds to derive learning bounds in the setting where the sample-dependent priors obey an infinite-Rényi divergence or L1-distance sensitivity condition. We also provide a flexible framework for computing PAC-Bayes bounds, under certain stability assumptions on the sample-dependent priors, and show how to use this framework to give more refined bounds when the priors satisfy an infinite-Rényi divergence sensitivity condition. View details
    Preview abstract Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients, which gives rise to the client drift phenomenon. In this work, we propose a general algorithmic framework, Mime, which i) mitigates client drift and ii) adapts arbitrary centralized optimization algorithms such as SGD and Adam to the federated learning setting. Mime uses a combination of control-variates and server-level statistics (e.g. momentum) at every client-update step to ensure that each local update mimics that of the centralized method run on i.i.d. data. We prove a reduction result showing that Mime can translate the convergence of a generic algorithm in the centralized setting into convergence in the federated setting. Further, we show for the first time that multiple local steps can lead to faster convergence in the cross-device FL setting. Our thorough theoretical and empirical analyses establish Mime's superiority over other baselines. View details
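The following Python fragment is a rough sketch, based only on the description above, of what a Mime-style client step could look like: server momentum is held fixed during the local steps and each local gradient is corrected by a control variate. The names and the exact form of the correction are our assumptions, not the paper's code.

import numpy as np

def mime_style_local_steps(x, server_momentum, full_grad_at_x, local_grad,
                           num_steps=5, lr=0.1, beta=0.9):
    """Rough sketch of a Mime-style client update: run several local
    momentum-SGD steps, but (i) keep the server-computed momentum statistic
    fixed during the local steps, and (ii) correct each local stochastic
    gradient with a control variate anchored at the round's starting point,
    so the local trajectory mimics the centralized update on i.i.d. data.
    `local_grad(y)` stands for a stochastic gradient on this client's data and
    `full_grad_at_x` for the aggregated gradient at x; both are assumptions of
    this sketch, not the paper's exact estimator."""
    y = np.array(x, dtype=float)
    correction = np.asarray(full_grad_at_x) - np.asarray(local_grad(x))
    m = np.asarray(server_momentum, dtype=float)   # held fixed locally
    for _ in range(num_steps):
        g = np.asarray(local_grad(y)) + correction
        d = (1.0 - beta) * g + beta * m            # centralized momentum rule
        y = y - lr * d
    return y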
    Preview abstract Communication cost is often a bottleneck in federated learning and other client-based distributed learning scenarios. To overcome this, several gradient compression and model compression algorithms have been proposed. In this work, we propose an alternative approach whereby an ensemble of pre-trained base predictors is trained via federated learning. This method allows a model that might otherwise exceed the communication bandwidth and storage capacity of the clients to be learned with on-device data through federated learning. Motivated by language modeling, we prove the optimality of ensemble methods for density estimation for standard empirical risk minimization and agnostic risk minimization. We provide communication-efficient ensemble algorithms for federated learning, where the per-round communication cost is independent of the size of the ensemble. Furthermore, unlike works on gradient compression, our proposed approach reduces the communication cost of both server-to-client and client-to-server communication. View details
    Preview abstract We present a new algorithm for domain adaptation improving upon a discrepancy minimization (DM) algorithm previously shown to outperform a number of algorithms for this problem. Unlike many previously proposed solutions for domain adaptation, our algorithm does not consist of a fixed reweighting of the losses over the training sample. Instead, the reweighting depends on the hypothesis sought. The algorithm is derived from a less conservative notion of discrepancy than the DM algorithm, called generalized discrepancy. We present a detailed description of our algorithm and show that it can be formulated as a convex optimization problem. We also give a detailed theoretical analysis of its learning guarantees which helps us select its parameters. Finally, we report the results of experiments demonstrating that it improves upon discrepancy minimization. View details
    AdaNet: A Scalable and Flexible Framework for Automatically Learning Ensembles
    Charles Weill
    Vitaly Kuznetsov
    Scott Yang
    Scott Yak
    Hanna Mazzawi
    Eugen Hotaj
    Ghassen Jerfel
    Vladimir Macko
    (2019)
    Preview abstract AdaNet is a lightweight TensorFlow-based (Abadi et al., 2015) framework for automatically learning high-quality ensembles with minimal expert intervention. Our framework is inspired by the AdaNet algorithm (Cortes et al., 2017) which learns the structure of a neural network as an ensemble of subnetworks. We designed it to: (1) integrate with the existing TensorFlow ecosystem, (2) offer sensible default search spaces to perform well on novel datasets, (3) present a flexible API to utilize expert information when available, and (4) efficiently accelerate training with distributed CPU, GPU, and TPU hardware. The code is open-source and available at https://github.com/tensorflow/adanet. View details
    Advances and Open Problems in Federated Learning
    Brendan Avent
    Aurélien Bellet
    Mehdi Bennis
    Arjun Nitin Bhagoji
    Graham Cormode
    Rachel Cummings
    Rafael G.L. D'Oliveira
    Salim El Rouayheb
    David Evans
    Josh Gardner
    Adrià Gascón
    Phillip B. Gibbons
    Marco Gruteser
    Zaid Harchaoui
    Chaoyang He
    Lie He
    Zhouyuan Huo
    Justin Hsu
    Martin Jaggi
    Tara Javidi
    Gauri Joshi
    Mikhail Khodak
    Jakub Konečný
    Aleksandra Korolova
    Farinaz Koushanfar
    Sanmi Koyejo
    Tancrède Lepoint
    Yang Liu
    Prateek Mittal
    Richard Nock
    Ayfer Özgür
    Rasmus Pagh
    Ramesh Raskar
    Dawn Song
    Weikang Song
    Sebastian U. Stich
    Ziteng Sun
    Florian Tramèr
    Praneeth Vepakomma
    Jianyu Wang
    Li Xiong
    Qiang Yang
    Felix X. Yu
    Han Yu
    arXiv (2019)
    Preview abstract Federated learning (FL) is a machine learning setting where many clients (e.g., mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g., service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and mitigates many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents a comprehensive list of open problems and challenges. View details
    Preview abstract We present an extensive analysis of relative deviation bounds, including detailed proofs of two-sided inequalities and their implications. We also give detailed proofs of two-sided generalization bounds that hold in the general case of unbounded loss functions, under the assumption that a moment of the loss is bounded. We then illustrate how to apply these results in a sample application: the analysis of importance weighting. View details
    Preview abstract We study a scenario of active learning where the input space is partitioned into different regions and where a distinct hypothesis is learned for each region. We first introduce a new active learning algorithm (EIWAL), which is an enhanced version of the IWAL algorithm, based on a finer analysis that results in more favorable learning guarantees. Then, we present a new learning algorithm for region-based active learning, ORIWAL, in which either IWAL or EIWAL serve as a subroutine. ORIWAL optimally allocates points to the subroutine algorithm for each region. We give a detailed theoretical analysis of ORIWAL, including generalization error guarantees and bounds on the number of points labeled, in terms of both the hypothesis set used in each region and the probability mass of that region. We also report the results of several experiments for our algorithm which demonstrate substantial benefits over existing non-region-based active learning algorithms, such as IWAL, and over passive learning. View details
    Learning GANs and Ensembles Using Discrepancy
    Ningshan Zhang
    Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019
    Preview abstract Generative adversarial networks (GANs) generate data based on minimizing a divergence between two distributions. The choice of that divergence is therefore critical. We argue that the divergence must take into account the hypothesis set and the loss function used in a subsequent learning task, where the data generated by a GAN serves for training. Taking that structural information into account is also important to derive generalization guarantees. Thus, we propose to use the discrepancy measure, which was originally introduced for the closely related problem of domain adaptation and which precisely takes into account the hypothesis set and the loss function. We show that discrepancy admits favorable properties for training GANs and prove explicit generalization guarantees. We present efficient algorithms using discrepancy for two tasks: training a GAN directly, namely DGAN, and mixing previously trained generative models, namely EDGAN. Our experiments on toy examples and several benchmark datasets show that DGAN is competitive with other GANs and that EDGAN outperforms existing GAN ensembles, such as AdaGAN. View details
    Preview abstract We present two novel extensions of an on-line importance weighted active learning algorithm IWAL, using the properties of disagreement values among hypotheses. The first extension, DIWAL, prunes the hypothesis set with a more aggressive strategy based on concentration bounds with disagreement values. We show that DIWAL improves the generalization performance and the label complexity of the original IWAL, and further quantify the improvement in terms of the disagreement graph coefficient. The second extension, ZOOM, adaptively zooms into the function space near the best-in-class hypothesis, which effectively reduces the best-in-class error and thus simultaneously improves the generalization performance and the label complexity. We report experimental results on multiple datasets and demonstrate that the proposed algorithms achieve better test performances than IWAL given the same amount of labeling budget. View details
    Preview abstract We consider the scenario of online learning with sleeping experts, where not all experts are available at each round, and analyze the general framework of learning with stochastic feedback graphs, where loss observations associated to each expert are characterized by a graph, thereby including both the bandit and full information settings as special cases. A critical assumption in this framework is that the loss observations and the set of sleeping experts at each round are independent. We first extend the classical algorithm of Kleinberg et al. (2008) to use the loss information encoded by any sequence of feedback graphs and prove matching upper and lower bounds for the sleeping regret of this algorithm. Our main contribution is then to relax this independence assumption, present a finer notion of sleeping regret, and derive a general algorithm with strong theoretical guarantees. We instantiate our framework to the important scenario of online learning with abstention, where a learner can elect to abstain from making a prediction at the price of a certain cost. We empirically validate the improvement of our algorithm against multiple abstention algorithms on several real-world datasets. View details
    Logistic Regression: The Importance of Being Improper
    Dylan Foster
    Haipeng Luo
    Karthik Sridharan
    Conference on Learning Theory (2018)
    Preview abstract Learning linear predictors with the logistic loss—both in stochastic and online settings—is a fundamental task in machine learning and statistics, with direct connections to classification and boosting. Existing “fast rates” for this setting exhibit exponential dependence on the predictor norm, and Hazan et al. (2014) showed that this is unfortunately unimprovable. Starting with the simple observation that the logistic loss is 1-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm. This provides a positive resolution to a variant of the COLT 2012 open problem of McMahan and Streeter (2012) when improper learning is allowed. This improvement is obtained both in the online setting and, with some extra work, in the batch statistical setting with high probability. We also show that the improved dependence on predictor norm is near-optimal. Leveraging this improved dependency on the predictor norm yields the following applications: (a) we give algorithms for online bandit multiclass learning with the logistic loss with an O*(√n) relative mistake bound across essentially all parameter ranges, thus providing a solution to the COLT 2009 open problem of Abernethy and Rakhlin (2009), and (b) we give an adaptive algorithm for online multiclass boosting with optimal sample complexity, thus partially resolving an open problem of Beygelzimer et al. (2015) and Jung et al. (2017). Finally, we give information-theoretic bounds on the optimal rates for improper logistic regression with general function classes, thereby characterizing the extent to which our improvement for linear classes extends to other parametric and even nonparametric settings. View details
    Online learning with abstention
    Scott Yang
    Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden (2018)
    Preview abstract We present an extensive study of a key problem in online learning where the learner can opt to abstain from making a prediction, at a certain cost. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this problem. In the stochastic setting, we first point out a bias problem that limits the straightforward extension of algorithms such as UCB-N to this context. Next, we give a new algorithm, UCB-GT, that exploits historical data and time-varying feedback graphs. We show that this algorithm benefits from more favorable regret guarantees than a natural extension of UCB-N. We further report the results of a series of experiments demonstrating that UCB-GT largely outperforms that extension of UCB-N, as well as other standard baselines. View details
    Preview abstract In this paper, we introduce and analyze Discriminative State Space Models for forecasting non-stationary time series. We provide data-dependent generalization guarantees for learning these models based on the recently introduced notion of discrepancy. We provide an in-depth analysis of the complexity of such models. Finally, we also study the generalization ability of several structural risk minimization approaches to this problem and provide an efficient implementation for one of them, which is based on a convex objective. View details
    AdaNet: Adaptive structural learning of artificial neural networks
    Vitaly Kuznetsov
    Scott Yang
    Proceedings of the 34th International Conference on Machine Learning (ICML 2017). Sydney, Australia, August 2017. (2017)
    Preview abstract We present new algorithms for adaptively learning artificial neural networks. Our algorithms (ADANET) adaptively learn both the structure of the network and its weights. They are based on a solid theoretical analysis, including data-dependent generalization guarantees that we prove and discuss in detail. We report the results of large-scale experiments with one of our algorithms on several binary classification tasks extracted from the CIFAR-10 dataset and on the Criteo dataset. The results demonstrate that our algorithm can automatically learn network structures with very competitive performance accuracies when compared with those achieved by neural networks found by standard approaches. View details
    Parameter-Free Online Learning via Model Selection
    Dylan Foster
    Karthik Sridharan
    Neural Information Processing Systems (2017)
    Preview abstract We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning. Departing from previous work, which has focused on highly structured function classes such as nested balls in Hilbert space, we propose a generic meta-algorithm framework that achieves online model selection oracle inequalities under minimal structural assumptions. We give the first computationally efficient parameter-free algorithms that work in arbitrary Banach spaces under mild smoothness assumptions; previous results applied only to Hilbert spaces. We further derive new oracle inequalities for matrix classes, non-nested convex sets, and R^d with generic regularizers. Finally, we generalize these results by providing oracle inequalities for arbitrary non-linear classes in the online supervised learning model. These results are all derived through a unified meta-algorithm scheme using a novel “multi-scale” algorithm for prediction with expert advice based on random playout, which may be of independent interest. View details
    Learning with rejection
    Proceedings of The 27th International Conference on Algorithmic Learning Theory (ALT 2016). pages 67-82, Bari, Italy, 2016. Springer, Heidelberg, Germany. (2016)
    Preview abstract We introduce a novel framework for classification with a rejection option that consists of simultaneously learning two functions: a classifier along with a rejection function. We present a full theoretical analysis of this framework including new data-dependent learning bounds in terms of the Rademacher complexities of the classifier and rejection families as well as consistency and calibration results. These theoretical guarantees guide us in designing new algorithms that can exploit different kernel-based hypothesis sets for the classifier and rejection functions. We compare and contrast our general framework with the special case of confidence-based rejection for which we devise alternative loss functions and algorithms as well. We report the results of several experiments showing that our kernel-based algorithms can yield a notable improvement over the best existing confidence-based rejection algorithm. View details
    Boosting with abstention
    Advances in Neural Information Processing Systems (NIPS 2016). Barcelona, Spain, 2016. MIT Press. (2016)
    Preview abstract We present a new boosting algorithm for the key scenario of binary classification with abstention where the algorithm can abstain from predicting the label of a point, at the price of a fixed cost. At each round, our algorithm selects a pair of functions, a base predictor and a base abstention function. We define convex upper bounds for the natural loss function associated to this problem, which we prove to be calibrated with respect to the Bayes solution. Our algorithm benefits from general margin-based learning guarantees which we derive for ensembles of pairs of base predictor and abstention functions, in terms of the Rademacher complexities of the corresponding function classes. We give convergence guarantees for our algorithm along with a linear-time weak-learning algorithm for abstention stumps. We also report the results of several experiments suggesting that our algorithm provides a significant improvement in practice over two confidence-based algorithms. View details
    Preview abstract We present a new algorithm for efficiently training n-gram language models on uncertain data, and illustrate its use for semi-supervised language model adaptation. We compute the probability that an n-gram occurs k times in the sample of uncertain data, and use the resulting histograms to derive a generalized Katz backoff model. We compare semi-supervised adaptation of language models for YouTube video speech recognition in two conditions: when using full lattices with our new algorithm versus just the 1-best output from the baseline speech recognizer. Unlike 1-best methods, the new algorithm provides models that yield solid improvements over the baseline on the full test set, and, further, achieves these gains without hurting performance on any of the set of channels. We show that channels with the most data yielded the largest gains. The algorithm was implemented via a new semiring in the OpenFst library and will be released as part of the OpenGrm ngram library. View details
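For intuition, the count histograms mentioned above can be illustrated as follows: if each potential occurrence of an n-gram comes with an (assumed independent) posterior probability, the distribution of the total count k is a Poisson-binomial distribution computable by a simple convolution. The Python sketch below is only illustrative; the paper computes these quantities over full lattices via a dedicated semiring in the OpenFst library.

import numpy as np

def count_histogram(occurrence_probs):
    """Given independent probabilities that each potential occurrence of an
    n-gram is real (e.g. posterior probabilities read off a lattice), return
    the distribution over the total count k, computed by convolution."""
    hist = np.array([1.0])                 # P(count = 0) = 1 before any occurrence
    for p in occurrence_probs:
        new_hist = np.zeros(len(hist) + 1)
        new_hist[:-1] += (1.0 - p) * hist  # this occurrence absent
        new_hist[1:] += p * hist           # this occurrence present
        hist = new_hist
    return hist

# Example: three uncertain occurrences with posteriors 0.9, 0.5, 0.2.
print(count_histogram([0.9, 0.5, 0.2]))    # probabilities of k = 0, 1, 2, 3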
    Random Composite Forests
    Association for the Advancement of Artificial Intelligence (2016) (to appear)
    Preview abstract We introduce a broad family of decision trees, Composite Trees, whose leaf classifiers are selected out of a hypothesis set composed of p sub-families with different complexities. We prove new data-dependent learning guarantees for this family in the multi-class setting. These learning bounds provide a quantitative guidance for the choice of the hypotheses at each leaf. Remarkably, they depend on the Rademacher complexities of the sub-families of the predictors and the fraction of sample points correctly classified at each leaf. We further introduce random composite trees and derive learning guarantees for random composite trees which also apply to Random Forests. Using our theoretical analysis, we devise a new algorithm, RCF, that is based on forming an ensemble of random composite trees. We report the results of experiments demonstrating that RCF yields significant performance improvements over both Random Forests and a variant of RCF in several tasks. View details
    Structured prediction theory based on factor graph complexity
    Vitaly Kuznetsov
    Scott Yang
    Advances in Neural Information Processing Systems (NIPS 2016). Barcelona, Spain, 2016. MIT Press. (2016)
    Preview abstract We present a general theoretical analysis of structured prediction with a series of new results. We give new data-dependent margin guarantees for structured prediction for a very wide family of loss functions and a general family of hypotheses, with an arbitrary factor graph decomposition. These are the tightest margin bounds known for both standard multi-class and general structured prediction problems. Our guarantees are expressed in terms of a data-dependent complexity measure, factor graph complexity, which we show can be estimated from data and bounded in terms of familiar quantities for several commonly used hypothesis sets along with a sparsity measure for features and graphs. Our proof techniques include generalizations of Talagrand’s contraction lemma that can be of independent interest. We further extend our theory by leveraging the principle of Voted Risk Minimization (VRM) and show that learning is possible even with complex factor graphs. We present new learning bounds for this advanced setting, which we use to design two new algorithms, Voted Conditional Random Field (VCRF) and Voted Structured Boosting (StructBoost). These algorithms can make use of complex features and factor graphs and yet benefit from favorable learning guarantees. We also report the results of experiments with VCRF on several datasets to validate our theory. View details
    Structural maxent models
    Vitaly Kuznetsov
    Proceedings of the Thirty-Second International Conference on Machine Learning (ICML 2015)
    Preview abstract We present a new class of density estimation models, Structural Maxent models, with feature functions selected from a union of possibly very complex sub-families and yet benefiting from strong learning guarantees. The design of our models is based on a new principle supported by uniform convergence bounds and taking into consideration the complexity of the different sub-families composing the full set of features. We prove new data-dependent learning bounds for our models, expressed in terms of the Rademacher complexities of these sub-families. We also prove a duality theorem, which we use to derive our Structural Maxent algorithm. We give a full description of our algorithm, including the details of its derivation, and report the results of several experiments demonstrating that its performance improves on that of existing L1-norm regularized Maxent algorithms. We further similarly define conditional Structural Maxent models for multi-class classification problems. These are conditional probability models also making use of a union of possibly complex feature subfamilies. We prove a duality theorem for these models as well, which reveals their connection with existing binary and multi-class deep boosting algorithms. View details
    Learning with Deep Cascades
    Proceedings of the Twenty-Sixth International Conference on Algorithmic Learning Theory (ALT 2015)
    Preview abstract We introduce a broad learning model formed by cascades of predictors, Deep Cascades, that is structured as general decision trees in which leaf predictors or node questions may be members of rich function families. We present new detailed data-dependent theoretical guarantees for learning with Deep Cascades with complex leaf predictors or node questions, in terms of the Rademacher complexities of the sub-families composing these sets of predictors and the fraction of sample points correctly classified at each leaf. These general guarantees can guide the design of a variety of different algorithms for deep cascade models, and we give a detailed description of two such algorithms. Our second algorithm uses SVM predictors as node and leaf classifiers, and we report the results of experiments comparing its performance with that of SVM combined with polynomial kernels. View details
    Preview abstract We present a new algorithm for domain adaptation improving upon a discrepancy minimization algorithm (DM), previously shown to outperform a number of algorithms for this problem. Unlike many previously proposed solutions for domain adaptation, our algorithm does not consist of a fixed reweighting of the losses over the training sample. Instead, the reweighting depends on the hypothesis sought. The algorithm is derived from a less conservative notion of discrepancy than the DM algorithm. We call this quantity generalized discrepancy. We present a detailed description of our algorithm and show that it can be formulated as a convex optimization problem. We also give a detailed theoretical analysis of its learning guarantees which helps us select its parameters. Finally, we report the results of experiments demonstrating that it improves upon discrepancy minimization in several tasks. View details
    On-line learning algorithms for path experts with non-additive losses
    Vitaly Kuznetsov
    Manfred K. Warmuth
    Proceedings of The 28th Annual Conference on Learning Theory (COLT 2015)
    Preview
    Preview abstract Google Research recently tested a massive online class model for an internal engineering education program, with machine learning as the topic, that blended theoretical concepts and Google-specific software tool tutorials. The goal of this training was to foster engineering capacity to leverage machine learning tools in future products. The course was delivered both synchronously and asynchronously, and students had the choice between studying independently or participating with a group. Since all students are company employees, unlike most publicly offered MOOCs we can continue to measure the students’ behavioral change long after the course is complete. This paper describes the course, outlines the available data set and presents directions for analysis. View details
    Ensemble methods for structured prediction
    Vitaly Kuznetsov
    Proceedings of the 31st International Conference on Machine Learning (ICML 2014)
    Preview
    Theoretical Foundations for Learning Kernels in Supervised Kernel PCA
    Modern Nonparametrics 3: Automating the Learning Pipeline, Neural Information Processing Systems, Workshop (2014)
    Preview abstract This paper presents a novel learning scenario which combines dimensionality reduction, supervised learning, and kernel selection. We carefully define the hypothesis class that addresses this setting and provide an analysis of its Rademacher complexity, thereby providing generalization guarantees. The proposed algorithm uses KPCA to reduce the dimensionality of the feature space, i.e. by projecting data onto the top eigenvectors of the covariance operator in a reproducing kernel space. Moreover, it simultaneously learns a linear combination of base kernel functions, which defines a reproducing space, as well as the parameters of a supervised learning algorithm, in order to minimize a regularized empirical loss. The bound on the Rademacher complexity of our hypothesis class is shown to be logarithmic in the number of base kernels, which encourages practitioners to combine as many base kernels as possible. View details
    Deep boosting
    Proceedings of the Thirty-First International Conference on Machine Learning (ICML 2014)
    Preview abstract We present a new ensemble learning algorithm, DeepBoost, which can use as base classifiers a hypothesis set containing deep decision trees, or members of other rich or complex families, and succeed in achieving high accuracy without overfitting the data. The key to the success of the algorithm is a capacity-conscious criterion for the selection of the hypotheses. We give new data-dependent learning bounds for convex ensembles expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set, and the mixture weight assigned to each sub-family. Our algorithm directly benefits from these guarantees since it seeks to minimize the corresponding learning bound. We give a full description of our algorithm, including the details of its derivation, and report the results of several experiments showing that its performance compares favorably to that of AdaBoost and Logistic Regression and their L1-regularized variants. View details
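A minimal Python sketch of a capacity-conscious selection step, as we read it from the description above (the penalty form and the name complexities are our simplification, not the exact bound-derived criterion used by DeepBoost):

import numpy as np

def capacity_conscious_selection(hypotheses, complexities, X, y, sample_weights, lam=0.1):
    """Among candidate base hypotheses drawn from sub-families of different
    richness, pick the one minimizing weighted training error plus a penalty
    proportional to the complexity of its sub-family. `complexities[i]` is a
    stand-in for the Rademacher-complexity term of hypothesis i's sub-family."""
    best_index, best_score = None, np.inf
    for i, h in enumerate(hypotheses):
        errors = (h(X) != y).astype(float)          # 0/1 losses on the sample
        score = float(np.dot(sample_weights, errors)) + lam * complexities[i]
        if score < best_score:
            best_index, best_score = i, score
    return best_index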
    Multi-Class Deep Boosting
    Vitaly Kuznetsov
    Advances in Neural Information Processing Systems (2014)
    Preview abstract We present new ensemble learning algorithms for multi-class classification. Our algorithms can use as a base classifier set a family of deep decision trees or other rich or complex families and yet benefit from strong generalization guarantees. We give new data-dependent learning bounds for convex ensembles in the multiclass classification setting expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set, and the mixture weight assigned to each sub-family. These bounds are finer than existing ones both thanks to an improved dependency on the number of classes and, more crucially, by virtue of a more favorable complexity term expressed as an average of the Rademacher complexities based on the ensemble’s mixture weights. We introduce and discuss several new multi-class ensemble algorithms benefiting from these guarantees, prove positive results for the H-consistency of several of them, and report the results of experiments showing that their performance compares favorably with that of multi-class versions of AdaBoost and Logistic Regression and their L1-regularized counterparts. View details
    Learning ensembles of structured prediction rules
    Vitaly Kuznetsov
    Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014)
    Preview
    Learning kernels using local rademacher complexity
    Marius Kloft
    Advances in Neural Information Processing Systems (NIPS 2013), MIT Press.
    Preview abstract We use the notion of local Rademacher complexity to design new algorithms for learning kernels. Our algorithms thereby benefit from the sharper learning bounds based on that notion which, under certain general conditions, guarantee a faster convergence rate. We devise two new learning kernel algorithms: one based on a convex optimization problem for which we give an efficient solution using existing learning kernel techniques, and another one that can be formulated as a DC-programming problem for which we describe a solution in detail. We also report the results of experiments with both algorithms in both binary and multi-class classification tasks. View details
    Large-scale SVD and manifold learning
    Ameet Talwalkar
    Journal of Machine Learning Research, vol. 14 (2013), pp. 3129-3152
    Preview
    Accuracy at the Top
    Stephen Boyd
    NIPS: Neural Information Processing Systems Foundation (2012)
    Preview abstract We introduce a new notion of classification accuracy based on the top τ-quantile values of a scoring function, a relevant criterion in a number of problems arising for search engines. We define an algorithm optimizing a convex surrogate of the corresponding loss, and show how its solution can be obtained by solving a set of convex optimization problems. We also present margin-based guarantees for this algorithm based on the top τ-quantile of the scores of the functions in the hypothesis set. Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our algorithm and comparing the results to several other algorithms seeking high precision at the top. In most examples, our algorithm achieves a better performance in precision at the top. View details
    Sampling Methods for the Nystrom Method
    Ameet Talwalkar
    Journal of Machine Learning Research (JMLR) (2012)
    Preview
    Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081)
    Maxime Crochemore
    Lila Kari
    Dirk Nowotka
    Dagstuhl Reports, vol. 1 (2011), pp. 47-66
    Preview
    Ensemble Nystrom
    Ameet Talwalkar
    A book chapter in Ensemble Machine Learning: Theory and Applications, Springer (2011)
    Preview
    Domain adaptation in regression
    Proceedings of The 22nd International Conference on Algorithmic Learning Theory, ALT 2011, Springer, Heidelberg, Germany
    Preview
    Can matrix coherence be efficiently and accurately estimated?
    Ameet Talwalkar
    Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2011)
    Preview
    Ensembles of Kernel Predictors
    Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011)
    Preview
    Preview abstract Most generalization bounds in learning theory are based on some measure of the complexity of the hypothesis class used, independently of any algorithm. In contrast, the notion of algorithmic stability can be used to derive tight generalization bounds that are tailored to specific learning algorithms by exploiting their particular properties. However, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed. In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence. View details
    Generalization Bounds for Learning Kernels
    Proceedings of the 27th Annual International Conference on Machine Learning (ICML 2010)
    Preview abstract This paper presents several novel generalization bounds for the problem of learning kernels based on a combinatorial analysis of the Rademacher complexity of the corresponding hypothesis sets. Our bound for learning kernels with a convex combination of p base kernels using L1 regularization admits only a √log p dependency on the number of kernels, which is tight and considerably more favorable than the previous best bound given for the same problem. We also give a novel bound for learning with a non-negative combination of p base kernels with an L2 regularization whose dependency on p is also tight and only in p^(1/4). We present similar results for Lq regularization with other values of q, and outline the relevance of our proof techniques to the analysis of the complexity of the class of linear functions. Experiments with a large number of kernels further validate the behavior of the generalization error as a function of p predicted by our bounds. View details
    Two-Stage Learning Kernel Algorithms
    Proceedings of the 27th Annual International Conference on Machine Learning (ICML 2010)
    Preview abstract This paper examines two-stage techniques for learning kernels based on a notion of alignment. It presents a number of novel theoretical, algorithmic, and empirical results for alignment-based techniques. Our results build on previous work by Cristianini et al. (2001), but we adopt a different definition of kernel alignment and significantly extend that work in several directions: we give a novel and simple concentration bound for alignment between kernel matrices; show the existence of good predictors for kernels with high alignment, both for classification and for regression; give algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP; and report the results of extensive experiments with this alignment-based method in classification and regression tasks, which show an improvement both over the uniform combination of kernels and over other state-of-the-art learning kernel methods. View details
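For concreteness, the modified definition of alignment used in this line of work centers the kernel matrices before comparing them. The short Python sketch below (ours, not the paper's code) computes such a centered alignment score between two kernel matrices; the QP for learning a maximum-alignment kernel combination is omitted.

import numpy as np

def center_kernel(K):
    """Center a kernel matrix in feature space: K_c = H K H with H = I - 11^T / m."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ K @ H

def centered_alignment(K1, K2):
    """Centered alignment between two kernel matrices: the normalized Frobenius
    inner product of the centered matrices."""
    K1c, K2c = center_kernel(K1), center_kernel(K2)
    return float(np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c)))

# Example: alignment of a kernel K with the target kernel y y^T built from labels in {-1, +1}.
# score = centered_alignment(K, np.outer(y, y))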
    Discriminative Topic Segmentation of Text and Speech
    International Conference on Artificial Intelligence and Statistics (AISTATS) (2010)
    Preview
    On the Impact of Kernel Approximation on Learning Accuracy
    Ameet Talwalkar
    Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010)
    Preview abstract Kernel approximation is commonly used to scale kernel-based algorithms to applications containing as many as several million instances. This paper analyzes the effect of such approximations in the kernel matrix on the hypothesis generated by several widely used learning algorithms. We give stability bounds based on the norm of the kernel approximation for these algorithms, including SVMs, KRR, and graph Laplacian-based regularization algorithms. These bounds help determine the degree of approximation that can be tolerated in the estimation of the kernel matrix. Our analysis is general and applies to arbitrary approximations of the kernel matrix. However, we also give a specific analysis of the Nyström low-rank approximation in this context and report the results of experiments evaluating the quality of the Nyström low-rank kernel approximation when used with ridge regression. View details
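As background, the Nyström low-rank approximation analyzed here replaces the full kernel matrix by an approximation built from a subset of its columns. A minimal Python sketch follows (our own; the rbf helper and the random landmark choice are illustrative, not the paper's setup).

import numpy as np

def nystrom_approximation(kernel_fn, X, landmark_idx):
    """Nystrom low-rank approximation: with C the kernel values between all
    points and the landmarks, and W the landmark block, approximate
    K by C W^+ C^T."""
    landmarks = X[landmark_idx]
    C = kernel_fn(X, landmarks)              # n x l
    W = kernel_fn(landmarks, landmarks)      # l x l
    return C @ np.linalg.pinv(W) @ C.T

def rbf(A, B, gamma=0.5):
    """Gaussian (RBF) kernel between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

# Example with 200 points in 5 dimensions and 20 random landmarks.
X = np.random.randn(200, 5)
K_approx = nystrom_approximation(rbf, X, np.random.choice(200, 20, replace=False))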
    Preference-Based Learning to Rank
    Nir Ailon
    Machine Learning Journal, vol. 8 (2010), pp. 189-211
    Preview abstract This paper presents an efficient preference-based ranking algorithm running in two stages. In the first stage, the algorithm learns a preference function defined over pairs, as in a standard binary classification problem. In the second stage, it makes use of that preference function to produce an accurate ranking, thereby reducing the learning problem of ranking to binary classification. This reduction is based on the familiar QuickSort and guarantees an expected pairwise misranking loss of at most twice that of the binary classifier derived in the first stage. Furthermore, in the important special case of bipartite ranking, the factor of two in loss is reduced to one. This improved bound also applies to the regret achieved by our ranking and that of the binary classifier obtained. Our algorithm is randomized, but we prove a lower bound for any deterministic reduction of ranking to binary classification showing that randomization is necessary to achieve our guarantees. This, and a recent result by Balcan et al., who show a regret bound of two for a deterministic algorithm in the bipartite case, suggest a trade-off between achieving low regret and determinism in this context. Our reduction also admits an improved running time guarantee with respect to that deterministic algorithm. In particular, the number of calls to the preference function in the reduction is improved from Ω(n^2) to O(n log n). In addition, when the top k ranked elements only are required (k≪n), as in many applications in information extraction or search engine design, the time complexity of our algorithm can be further reduced to O(k log k+n). Our algorithm is thus practical for realistic applications where the number of points to rank exceeds several thousand. View details
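The second stage described above can be illustrated in a few lines of Python: a randomized QuickSort driven by the learned preference function, with an expected O(n log n) preference calls. This is our own sketch (prefers is a placeholder for the learned pairwise preference function), not the paper's implementation.

import random

def quicksort_rank(items, prefers, rng=random):
    """Randomized QuickSort driven by a learned preference function:
    `prefers(a, b)` returns True when a should be ranked above b. The expected
    number of calls to the preference function is O(n log n)."""
    if len(items) <= 1:
        return list(items)
    rest = list(items)
    pivot = rest.pop(rng.randrange(len(rest)))
    above = [x for x in rest if prefers(x, pivot)]
    below = [x for x in rest if not prefers(x, pivot)]
    return quicksort_rank(above, prefers, rng) + [pivot] + quicksort_rank(below, prefers, rng)

# Example: rank integers with a (noise-free) preference oracle that prefers larger values.
ranked = quicksort_rank(list(range(10)), lambda a, b: a > b)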
    Learning Bounds for Importance Weighting
    Yishay Mansour
    Advances in Neural Information Processing Systems (NIPS 2010), MIT Press, Vancouver, Canada
    Preview
    On the Estimation of Coherence
    Ameet Talwalkar
    CoRR, vol. abs/1009.0861 (2010)
    Preview
    Half Transductive Ranking
    Bing Bai
    Jason Weston
    David Grangier
    Ronan Collobert
    Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010)
    Preview abstract We study the standard retrieval task of ranking a fixed set of items given a previously unseen query and pose it as the half transductive ranking problem. The task is transductive as the set of items is fixed. Transductive representations (where the vector representation of each example is learned) allow the generation of highly nonlinear embeddings that capture object relationships without relying on a specific choice of features, and require only relatively simple optimization. Unfortunately, they have no direct out-of-sample extension. Inductive approaches on the other hand allow for the representation of unknown queries. We describe algorithms for this setting which have the advantages of both transductive and inductive approaches, and can be applied in unsupervised (either reconstruction-based or graph-based) and supervised ranking setups. We show empirically that our methods give strong performance on all three tasks. View details
    Domain Adaptation with Multiple Sources
    Yishay Mansour
    Advances in Neural Information Processing Systems (NIPS 2008), MIT Press, Vancouver, Canada (2009)
    Preview
    Multiple Source Adaptation and the Renyi Divergence
    Yishay Mansour
    Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), Montréal, Canada
    Preview
    Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models
    Gideon Mann
    Ryan McDonald
    Nathan Silberman
    Daniel Walker IV
    Neural Information Processing Systems (NIPS) (2009)
    Preview
    On Sampling-Based Approximate Spectral Decomposition
    Ameet Talwalkar
    International Conference on Machine Learning (ICML) (2009)
    Preview
    Ensemble Nystrom Method
    Ameet Talwalkar
    Neural Information Processing Systems (NIPS) (2009)
    Preview
    N-Way Composition of Weighted Finite-State Transducers
    International Journal of Foundations of Computer Science, vol. 20 (2009), pp. 613-627
    Preview
    Gaussian Margin Machines
    Koby Crammer
    Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS 2009), Clearwater Beach, Florida, pp. 105-112
    Preview
    Polynomial semantic indexing
    Bing Bai
    Jason Weston
    David Grangier
    Ronan Collobert
    Kunihiko Sadamasa
    Yanjun Qi
    Advances in Neural Information Processing Systems (NIPS 2009), MIT Press
    Preview
    Sampling Techniques for the Nystrom Method
    Ameet Talwalkar
    Artificial Intelligence and Statistics (AISTATS) (2009)
    Preview
    L2 Regularization for Learning Kernels
    Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), Montréal, Canada
    Preview
    Rademacher Complexity Bounds for Non-I.I.D. Processes
    Advances in Neural Information Processing Systems (NIPS 2008), MIT Press, Vancouver, Canada (2009)
    Preview
    Weighted Automata Algorithms
    Handbook of weighted automata, Springer (to appear) (2009)
    Preview
    Domain Adaptation: Learning Bounds and Algorithms
    Yishay Mansour
    Proceedings of The 22nd Annual Conference on Learning Theory (COLT 2009), Omnipress, Montréal, Canada
    Preview
    A Machine Learning Framework for Spoken-Dialog Classification
    Patrick Haffner
    Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2008)
    Preview
    General Algorithms for Testing the Ambiguity of Finite Automata
    Ashish Rastogi
    Proceedings of the Twelfth International Conference on Developments in Language Theory (DLT 2008), Springer, Heidelberg, Germany, Kyoto, Japan
    Preview
    On the Computation of the Relative Entropy of Probabilistic Automata
    Ashish Rastogi
    International Journal of Foundations of Computer Science, vol. 19 (2008), pp. 219-242
    Preview
    Stability of Transductive Regression Algorithms
    Dmitry Pechyony
    Ashish Rastogi
    Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML 2008), Helsinki, Finland
    Preview
    Preview abstract The problem of identifying the minimal gene set required to sustain life is of crucial importance in understanding cellular mechanisms and designing therapeutic drugs. This work describes several kernel-based solutions for predicting essential genes that outperform existing models while using less training data. Our first solution is based on a semi-manually designed kernel derived from the Pfam database, which includes several Pfam domains. We then present novel and general domain-based sequence kernels that capture sequence similarity with respect to several domains made of large sets of protein sequences. We show how to deal with the large size of the problem -- several thousands of domains with individual domains sometimes containing thousands of sequences -- by representing and efficiently computing these kernels using automata. We report results of extensive experiments demonstrating that they compare favorably with the Pfam kernel in predicting protein essentiality, while requiring no manual tuning. View details
    Speech Recognition with Weighted Finite-State Transducers
    Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2008)
    Preview
    Stability Bounds for Non-i.i.d. Processes
    Advances in Neural Information Processing Systems (NIPS 2007), MIT Press, Vancouver, Canada (2008)
    Preview
    Sample Selection Bias Correction Theory
    Proceedings of The 19th International Conference on Algorithmic Learning Theory (ALT 2008), Springer, Heidelberg, Germany, Budapest, Hungary
    Preview
    Kernel Methods for Learning Languages
    Leonid Kontorovich
    Theoretical Computer Science, vol. 405 (2008), pp. 223-236
    Preview
    An Efficient Reduction of Ranking to Classification
    Nir Ailon
    Proceedings of The 21st Annual Conference on Learning Theory (COLT 2008), Springer, Heidelberg, Germany, Helsinki, Finland
    Preview
    Learning with weighted transducers
    Proceedings of the Seventh International Workshop on Finite-State Methods and Natural Language Processing (2008)
    Preview
    3-Way Composition of Weighted Finite-State Transducers
    Proceedings of the 13th International Conference on Implementation and Application of Automata (CIAA 2008), Springer-Verlag, Heidelberg, Germany, San Francisco, California, pp. 262-273
    Preview
    Learning sequence kernels
    Proceedings of IEEE International Workshop on Machine Learning for Signal Processing (2008)
    Preview
    Robust music identification, detection, and analysis
    Proceedings of the International Conference on Music Information Retrieval (ISMIR) (2007)
    Preview
    Learning Languages with Rational Kernels
    Leonid Kontorovich
    Proceedings of The 20th Annual Conference on Computational Learning Theory (COLT 2007), Springer, Heidelberg, Germany, San Diego, California
    Preview
    Lp Distance and Equivalence of Probabilistic Automata
    Ashish Rastogi
    International Journal of Foundations of Computer Science, vol. 18 (2007), pp. 761-780
    Preview
    An Alternative Ranking Problem for Search Engines
    Ashish Rastogi
    Proceedings of the 6th Workshop on Experimental Algorithms (WEA 2007), Springer-Verlag, Heidelberg, Germany, Rome, Italy, pp. 1-21
    Preview
    Magnitude-Preserving Ranking Algorithms
    Ashish Rastogi
    Proceedings of the Twenty-fourth International Conference on Machine Learning (ICML 2007), Oregon State University, Corvallis, OR
    Preview
    Factor Automata of Automata and Applications
    Proceedings of the 12th International Conference on Implementation and Application of Automata (CIAA 2007), Prague, Czech Republic, July 2007
    Preview
    On Transductive Regression
    Advances in Neural Information Processing Systems (NIPS 2006), MIT Press, Vancouver, Canada (2007)
    Preview
    OpenFst: a General and Efficient Weighted Finite-State Transducer Library
    Johan Schalkwyk
    Wojciech Skut
    Proceedings of the 12th International Conference on Implementation and Application of Automata (CIAA 2007), Springer-Verlag, Heidelberg, Germany, Prague, Czech Republic
    Preview
    On the Computation of Some Standard Distances between Probabilistic Automata
    Ashish Rastogi
    Proceedings of the 11th International Conference on Implementation and Application of Automata (CIAA 2006), Springer-Verlag, Heidelberg, Germany, Taipei, Taiwan
    Preview
    Probabilistic Context-Free Grammar Induction Based on Structural Zeros
    Proceedings of the Seventh Meeting of the Human Language Technology conference - North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2006), New York, NY
    Preview
    A Unified Construction of the Glushkov, Follow, and Antimirov Automata
    Proceedings of the 31st International Symposium on Mathematical Foundations of Computer Science (MFCS 2006), Springer-Verlag, Heidelberg, Germany, Stará Lesná, Slovakia, pp. 110-121
    Preview
    On a Common Fallacy in Computational Linguistics
    A Man of Measure: Festschrift in Honour of Fred Karlsson on his 60th Birthday, SKY Journal of Linguistics, Volume 19 (2006), pp. 432-439
    Preview
    Learning Linearly Separable Languages
    Leonid Kontorovich
    Proceedings of The 17th International Conference on Algorithmic Learning Theory (ALT 2006), Springer, Heidelberg, Germany
    Preview
    Efficient Computation of the Relative Entropy of Probabilistic Automata
    Ashish Rastogi
    Proceedings of the 7th Latin American Symposium (LATIN 2006), Springer-Verlag, Heidelberg, Germany, Valdivia, Chile
    Preview
    Multi-Armed Bandit Algorithms and Empirical Evaluation
    Joannès Vermorel
    Proceedings of the 16th European Conference on Machine Learning (ECML 2005), Springer, Heidelberg, Germany, Porto, Portugal
    Preview
    Local Grammar Algorithms
    Inquiries into Words, Constraints, and Contexts. Festschrift in Honour of Kimmo Koskenniemi on his 60th Birthday, CSLI Publications, Stanford University (2005), pp. 84-93
    Preview
    A General Regression Technique for Learning Transductions
    Jason Weston
    Proceedings of the Twenty-Second International Conference on Machine Learning (ICML 2005), Bonn, Germany
    Preview
    A Comparison of Classifiers for Detecting Emotion from Speech
    Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005), Philadelphia, Pennsylvania
    Preview
    Margin-Based Ranking Meets Boosting in the Middle
    Cynthia Rudin
    Robert E. Schapire
    Proceedings of The 18th Annual Conference on Computational Learning Theory (COLT 2005), Springer, Heidelberg, Germany, Bertinoro, Italy, pp. 63-78
    Preview
    Confidence Intervals for the Area under the ROC Curve
    Advances in Neural Information Processing Systems (NIPS 2004), MIT Press, Vancouver, Canada (2005)
    Preview
    Finite-State Transducers in Computational Biology
    Tutorial presented at the 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2005), Detroit, MI
    Preview
    Statistical Natural Language Processing
    Applied Combinatorics on Words, Cambridge University Press (2005)
    Preview
    Distribution Kernels Based on Moments of Counts
    Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), Banff, Alberta, Canada
    Preview
    Rational Kernels: Theory and Algorithms
    Patrick Haffner
    Journal of Machine Learning Research (JMLR), vol. 5 (2004), pp. 1035-1062
    Preview
    Automata and Graph Compression
    Ananda Theertha Suresh
    CoRR, vol. abs/1502.07288 (2015)
    On the Disambiguation of Weighted Automata
    Michael D. Riley
    CIAA (2015), pp. 263-278
    Accelerating Optimization via Adaptive Prediction
    Scott Yang
    CoRR, vol. abs/1509.05760 (2015)
    On the Rademacher Complexity of Weighted Automata
    Borja Balle
    ALT (2015), pp. 179-193
    Voted Kernel Regularization
    Prasoon Goyal
    Vitaly Kuznetsov
    CoRR, vol. abs/1509.04340 (2015)
    Learning Weighted Automata
    Borja Balle
    CAI (2015), pp. 1-21
    Non-parametric Revenue Optimization for Generalized Second Price Auctions
    Andres Muñoz Medina
    CoRR, vol. abs/1506.02719 (2015)
    Automata and graph compression
    Ananda Theertha Suresh
    ISIT (2015), pp. 2989-2993
    Foundations of Coupled Nonlinear Dimensionality Reduction
    Learning Theory and Algorithms for revenue optimization in second price auctions with reserve
    Andres Muñoz Medina
    ICML (2014), pp. 262-270
    On the Disambiguation of Weighted Automata
    Michael D. Riley
    CoRR, vol. abs/1405.0500 (2014)
    Revenue Optimization in Posted-Price Auctions with Strategic Buyers
    Andres Muñoz Medina
    CoRR, vol. abs/1411.6305 (2014)
    Conditional Swap Regret and Conditional Correlated Equilibrium
    Scott Yang
    NIPS (2014), pp. 1314-1322
    Generalization Bounds for Time Series Prediction with Non-stationary Processes
    Vitaly Kuznetsov
    ALT (2014), pp. 260-274
    Optimal Regret Minimization in Posted-Price Auctions with Strategic Buyers
    Andres Muñoz Medina
    NIPS (2014), pp. 1871-1879
    Relative Deviation Learning Bounds and Generalization with Unbounded Loss Functions
    Spencer Greenberg
    CoRR, vol. abs/1310.5796 (2013)
    Perceptron Mistake Bounds
    CoRR, vol. abs/1305.0208 (2013)
    Learning Theory and Algorithms for Revenue Optimization in Second-Price Auctions with Reserve
    Andres Muñoz Medina
    CoRR, vol. abs/1310.5665 (2013)
    On the Disambiguation of Finite Automata and Functional Transducers
    Int. J. Found. Comput. Sci., vol. 24 (2013), pp. 847-862
    Tight Lower Bound on the Probability of a Binomial Exceeding its Expectation
    Spencer Greenberg
    CoRR, vol. abs/1306.1433 (2013)
    Foundations of Machine Learning
    Ameet Talwalkar
    MIT Press (2012), I-XII, 1-412
    Sampling Methods for the Nyström Method
    Ameet Talwalkar
    Journal of Machine Learning Research, vol. 13 (2012), pp. 981-1006
    Stability Bounds for Stationary phi-mixing and beta-mixing Processes
    Journal of Machine Learning Research, vol. 11 (2010), pp. 789-814
    New Generalization Bounds for Learning Kernels
    Multiple Source Adaptation and the Rényi Divergence
    Yishay Mansour
    UAI (2009), pp. 367-374
    Stability Analysis and Learning Bounds for Transductive Regression Algorithms
    Dmitry Pechyony
    Ashish Rastogi
    CoRR, vol. abs/0904.0814 (2009)
    Stability Bound for Stationary Phi-mixing and Beta-mixing Processes
    CoRR, vol. abs/0811.1629 (2008)
    Rademacher Complexity Bounds for Non-I.I.D. Processes
    NIPS (2008), pp. 1097-1104
    Learning Languages with Rational Kernels
    Leonid Kontorovich
    COLT (2007), pp. 349-364
    Magnitude-preserving ranking algorithms
    Ashish Rastogi
    ICML (2007), pp. 169-176
    An Alternative Ranking Problem for Search Engines
    Ashish Rastogi
    WEA (2007), pp. 1-22
    Efficient Computation of the Relative Entropy of Probabilistic Automata
    Ashish Rastogi
    LATIN (2006), pp. 323-336
    On the Computation of Some Standard Distances Between Probabilistic Automata
    Ashish Rastogi
    CIAA (2006), pp. 137-149
    On Transductive Regression
    NIPS (2006), pp. 305-312
    A Unified Construction of the Glushkov, Follow, and Antimirov Automata
    MFCS (2006), pp. 110-121
    Learning Linearly Separable Languages
    Leonid Kontorovich
    ALT (2006), pp. 288-303
    A Comparison of Classifiers for Detecting Emotion from Speech
    ICASSP (1) (2005), pp. 341-344
    A General Weighted Grammar Library
    Ninth International Conference on Automata (CIAA 2004), Kingston, Canada, July 22-24, 2004, Springer-Verlag, Berlin-NY (2005)
    Weighted Automata in Text and Speech Processing
    arXiv, vol. abs/cs/0503077 (2005)
    Multi-armed Bandit Algorithms and Empirical Evaluation
    Joannès Vermorel
    ECML (2005), pp. 437-448
    Moment Kernels for Regular Distributions
    Machine Learning, vol. 60 (2005), pp. 117-134
    Distribution kernels based on moments of counts
    Statistical Modeling for Unit Selection in Speech Synthesis
    42nd Meeting of the Association for Computational Linguistics (ACL 2004), Proceedings of the Conference, Barcelona, Spain
    An optimal pre-determinization algorithm for weighted transducers
    Theor. Comput. Sci., vol. 328 (2004), pp. 3-18
    A General Weighted Grammar Library
    Proceedings of the Ninth International Conference on Automata (CIAA 2004), Kingston, Ontario, Canada
    General Indexation of Weighted Automata -- Application to Spoken Utterance Retrieval
    Murat Saraclar
    Proceedings of the annual meeting of the Human Language Technology conference and North American Chapter of the Association for Computational Linguistics (HLT/NAACL 2004), Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval, Boston, Massachusetts
    AUC Optimization vs. Error Rate Minimization
    Advances in Neural Information Processing Systems (NIPS 2003), MIT Press, Vancouver, Canada (2004)
    A Generalized Construction of Integrated Speech Recognition Transducers
    Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), Montreal, Canada
    Rational Kernels: Theory and Algorithms
    Patrick Haffner
    Journal of Machine Learning Research, vol. 5 (2004), pp. 1035-1062
    Confidence Intervals for the Area Under the ROC Curve
    Weighted Finite-State Transducer Algorithms: An Overview
    Formal Languages and Applications, Springer, Berlin (2004)
    Weighted automata kernels - general framework and algorithms
    Patrick Haffner
    INTERSPEECH (2003)
    Learning from Uncertain Data
    COLT (2003), pp. 656-670
    Positive Definite Rational Kernels
    Patrick Haffner
    Proceedings of The 16th Annual Conference on Computational Learning Theory (COLT 2003), Springer, Heidelberg, Germany, Washington D.C.
    Lattice Kernels for Spoken-Dialog Classification
    Patrick Haffner
    Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), Hong Kong
    Finitely Subsequential Transducers
    International Journal of Foundations of Computer Science, vol. 14 (2003), pp. 983-994
    Generalized Algorithms for Constructing Statistical Language Models
    41st Meeting of the Association for Computational Linguistics (ACL 2003), Proceedings of the Conference, Sapporo, Japan
    Edit-Distance of Weighted Automata: General Definitions and Algorithms
    International Journal of Foundations of Computer Science, vol. 14 (2003)
    Efficient Algorithms for Testing the Twins Property
    Journal of Automata, Languages and Combinatorics, vol. 8 (2003)
    An Efficient Pre-Determinization Algorithm
    Eighth International Conference on Automata (CIAA 2003), Santa Barbara, CA, Springer, Berlin-NY, pp. 83-95
    Learning from Uncertain Data
    Proceedings of The 16th Annual Conference on Computational Learning Theory (COLT 2003), Springer, Heidelberg, Germany, Washington D.C.
    Generalized Optimization Algorithm for Speech Recognition Transducers
    Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), Hong Kong
    Positive Definite Rational Kernels
    Patrick Haffner
    COLT (2003), pp. 41-56
    Generalized optimization algorithm for speech recognition transducers
    ICASSP (1) (2003), pp. 352-355
    Weighted Automata Kernels -- General Framework and Algorithms
    Patrick Haffner
    Proceedings of the 9th European Conference on Speech Communication and Technology (Eurospeech '03), Special Session Advanced Machine Learning Algorithms for Speech and Language Processing, Geneva, Switzerland (2003)
    p-Subsequentiable Transducers
    Seventh International Conference on Automata (CIAA 2002), Tours, France, Springer, Berlin-NY (2003), pp. 24-34
    Voice Signatures
    Proceedings of The 8th IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2003), St. Thomas, U.S. Virgin Islands
    Lattice kernels for spoken-dialog classification
    Patrick Haffner
    ICASSP (1) (2003), pp. 628-631
    Rational Kernels
    Patrick Haffner
    Advances in Neural Information Processing Systems (NIPS 2002), MIT Press, Vancouver, Canada (2003)
    Edit-Distance of Weighted Automata
    Seventh International Conference on Automata (CIAA 2002), Tours, France, Springer, Berlin-NY (2003), pp. 1-23
    Weighted Automata Algorithms (Tutorial)
    Proceedings of the workshop Weighted Automata: Theory and Applications (WATA), Dresden, Germany (2002)
    A comparison of two LVR search optimization techniques
    Stephan Kanthak
    Hermann Ney
    INTERSPEECH (2002)
    An Efficient Algorithm for the N-Best-Strings Problem
    Proceedings of the International Conference on Spoken Language Processing 2002 (ICSLP '02), Denver, Colorado
    Weighted Finite-State Transducers in Speech Recognition (Tutorial)
    Proceedings of the International Conference on Spoken Language Processing 2002 (ICSLP '02), Denver, Colorado
    Semiring Frameworks and Algorithms for Shortest-Distance Problems
    Journal of Automata, Languages and Combinatorics, vol. 7 (2002)
    Generic Epsilon-Removal and Input Epsilon-Normalization Algorithms for Weighted Transducers
    International Journal of Foundations of Computer Science, vol. 13 (2002), pp. 129-143
    A Comparison of Two LVR Search Optimization Techniques
    Stephan Kanthak
    Hermann Ney
    Proceedings of the International Conference on Spoken Language Processing 2002 (ICSLP '02), Denver, Colorado
    Weighted Finite-State Transducers in Speech Recognition
    Computer Speech and Language, vol. 16 (2002), pp. 69-88
    On the Determinizability of Weighted Automata and Transducers
    Proceedings of the workshop Weighted Automata: Theory and Applications (WATA), Dresden, Germany (2002)
    Rational Kernels
    Patrick Haffner
    NIPS (2002), pp. 601-608
    A weight pushing algorithm for large vocabulary speech recognition
    INTERSPEECH (2001), pp. 1603-1606
    Generic Epsilon-Removal Algorithm for Weighted Automata
    5th International Conference on Automata (CIAA 2000), London Ontario, Canada, Springer-Verlag, Berlin-NY (2001), pp. 230-242
    A Weight Pushing Algorithm for Large Vocabulary Speech Recognition
    Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech '01), Aalborg, Denmark (2001)
    Language Processing with Weighted Transducers
    Proceedings of the 8th annual conference Traitement Automatique des Langues Naturelles (TALN 2001), Tours, France
    Regular Approximation of Context-Free Grammars through Transformation
    Mark-Jan Nederhof
    Robustness in Language and Speech Technology, Kluwer Academic Publishers, The Netherlands (2001), pp. 153-163
    Weighted Grammar Tools: the GRM Library
    Robustness in Language and Speech Technology, Kluwer Academic Publishers, The Netherlands (2001), pp. 165-186
    Context-Free Recognition with Weighted Automata
    Grammars, vol. 3 (2000), pp. 133-150
    Generic Epsilon-Removal Algorithm for Weighted Automata
    Proceedings of the Fifth International Conference on Automata (CIAA 2000), London, Ontario, Canada
    The Design Principles of a Weighted Finite-State Transducer Library
    Fernando C. N. Pereira
    Theor. Comput. Sci., vol. 231 (2000), pp. 17-32
    Minimization Algorithms for Sequential Transducers
    Theoretical Computer Science, vol. 234 (2000), pp. 177-201
    Weighted Finite-State Transducers in Speech Recognition
    Proceedings of the ISCA Tutorial and Research Workshop, Automatic Speech Recognition: Challenges for the new Millennium (ASR2000), Paris, France
    Integrated Context-Dependent Networks in Very Large Vocabulary Speech Recognition
    Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech '99), Budapest, Hungary (1999)
    Comments on "Language Modeling for Speech Recognition" by Frederick Jelinek
    Extended Finite State Models of Language, Cambridge University Press, Cambridge (1999)
    Rapid Unit Selection from a Large Speech Corpus for Concatenative Speech Synthesis
    Mark Beutnagel
    Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech '99), Budapest, Hungary (1999)
    Context-Free Recognition with Weighted Automata
    Proceedings of the Sixth Meeting on Mathematics of Language (MOL6), Orlando, Florida (1999)
    Network Optimizations for Large Vocabulary Speech Recognition
    Speech Communication, vol. 28 (1999), pp. 1-12
    VPQ: a spoken language interface to large scale directory information
    Bruce Buntschuh
    Candace A. Kamm
    Giuseppe Di Fabbrizio
    Alicia Abella
    Shrikanth Narayanan
    Ilija Zeljkovic
    R. D. Sharp
    Jeremy H. Wright
    S. Marcus
    J. Shaffer
    R. Duncan
    Jay G. Wilpon
    ICSLP (1998)
    Dynamic Compilation of Weighted Context-Free Grammars
    Proceedings of COLING-ACL '98, Montreal, Canada (1998), pp. 891-897
    Speech Processing
    Graduate course, Columbia University, Department of Computer Science, New York, NY, 515 pages (1998)
    General Algebraic Frameworks and Algorithms for Shortest-Distance Problems
    AT&T Labs - Research, 62 pages (1998)
    A Rational Design for a Weighted Finite-State Transducer Library
    Proceedings of the Second International Workshop on Implementing Automata (WIA '97), Springer-Verlag, Berlin-NY (1998), pp. 144-158
    Full Expansion of Context-Dependent Networks in Large Vocabulary Speech Recognition
    Don Hindle
    Andrej Ljolje
    Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), Seattle, Washington (1998)
    Full expansion of context-dependent networks in large vocabulary speech recognition
    Donald Hindle
    Andrej Ljolje
    Fernando C. N. Pereira
    ICASSP (1998), pp. 665-668
    Dynamic Compilation of Weighted Context-Free Grammars
    36th Meeting of the Association for Computational Linguistics (ACL '98), Proceedings of the Conference, Montréal, Québec, Canada (1998), pp. 891-897
    String-Matching with Automata
    Nordic Journal of Computing, vol. 4 (1997), pp. 217-231
    Transducer Composition for Context-Dependent Network Expansion
    Proceedings of the 5th European Conference on Speech Communication and Technology (Eurospeech '97), Rhodes, Greece (1997)
    Finite-State Transducers in Language and Speech Processing
    Computational Linguistics, vol. 23 (1997), pp. 269-311
    A Rational Design for a Weighted Finite-State Transducer Library
    Proceedings of the Workshop on Implementing Automata (WIA '97), London, Ontario, Canada, University of Western Ontario, London, Ontario, Canada (1997)
    On The Use of Sequential Transducers in Natural Language Processing
    Finite-State Language Processing, The MIT Press, Cambridge, Massachusetts (1997)
    A Rational Design for a Weighted Finite-State Transducer Library
    WIA'97: Proceedings of the Workshop on Implementing Automata, Springer-Verlag (1997)
    Weighted Determinization and Minimization for Large Vocabulary Speech Recognition
    Proceedings of the 5th European Conference on Speech Communication and Technology (Eurospeech '97), Rhodes, Greece (1997)
    Transducer Composition for Context-Dependent Network Expansion
    EuroSpeech'97, European Speech Communication Association, Genova, Italy (1997), pp. 1427-1430
    Algorithms for Speech Recognition and Language Processing
    CoRR, vol. cmp-lg/9608018 (1996)
    On some applications of finite-state automata theory to natural language processing
    Natural Language Engineering, vol. 2 (1996), pp. 61-80
    An Efficient Compiler for Weighted Rewrite Rules
    ACL (1996), pp. 231-238
    An Efficient Compiler for Weighted Rewrite Rules
    CoRR, vol. cmp-lg/9606026 (1996)
    An Efficient Compiler for Weighted Rewrite Rules
    34th Meeting of the Association for Computational Linguistics (ACL '96), Proceedings of the Conference, Santa Cruz, California (1996)
    Weighted Automata in Text and Speech Processing
    Proceedings of the 12th biennial European Conference on Artificial Intelligence (ECAI-96), Workshop on Extended finite state models of language, John Wiley and Sons, Chichester, Budapest, Hungary (1996)
    Rational Power Series in Text and Speech Processing
    Graduate course, University of Pennsylvania, Department of Computer Science, Philadelphia, PA (1996)
    On some Applications of Finite-State Automata Theory to Natural Language Processing
    Journal of Natural Language Engineering, vol. 2 (1996), pp. 1-20
    Finite-State Transducers in Language and Speech Processing
    Tutorial at the 16th International Conference on Computational Linguistics (COLING-96), COLING, Copenhagen, Denmark (1996)
    Computation of French Temporal Expressions to Query Databases
    Denis Maurel
    The First Workshop on the Applications of Natural Language Processing to Databases, FWANLPD, Versailles, France (1995)
    Matching Patterns of an Automaton
    Proceedings of the 6th Annual Symposium on Combinatorial Pattern Matching (CPM '95), Springer-Verlag, Berlin-NY, Espoo, Finland (1995), pp. 286-297
    Syntactic Analysis by Local Grammars Automata: an Efficient Algorithm
    Proceedings of the International Conference on Computational Lexicography (COMPLEX 94), Linguistic Institute, Hungarian Academy of Science: Budapest, Hungary (1994)
    Minimization of Sequential Transducers
    Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching (CPM '94), Springer-Verlag, Berlin-NY, Asilomar, California (1994), pp. 151-163
    French Temporal Expressions: Recognition, Parsing and Real Computation
    Denis Maurel
    Proceedings of the 10th Annual Conference of the UW Centre for the New Oxford English Dictionary and Text Research, Waterloo, Ontario, Canada, University of Waterloo (1994)
    Compact Representations by Finite-State Transducers
    32nd Meeting of the Association for Computational Linguistics (ACL '94), Proceedings of the Conference, Las Cruces, New Mexico (1994), pp. 204-209
    Reprise par une relative
    Proceedings of the International Conference Dépendance et intégration syntaxique, Bordeaux, France, Niemeyer (1994)
    Combinaisons appropriées dans les constructions complétives
    Langages, Larousse: Paris, vol. 115 (1994)
    Syntactic Analysis by Local Grammars Automata: an Efficient Algorithm
    CoRR, vol. abs/cmp-lg/9407002 (1994)
    Review of Les Nouvelles Syntaxes, Grammaires d'unification et analyse du français by Anne Abeillé, 1993, Armand Colin, Paris, France
    Lingvisticae Investigationes, vol. 18 (1994), pp. 415-418
    On some Applications of Finite-State Automata Theory to Natural Language Processing: Representation of Morphological Dictionaries, Compaction, and Indexation
    Université Marne-la-Vallée, Institut Gaspard Monge, Noisy-le-Grand (1994)
    Analyse et représentation par automates de structures syntaxiques composées: Application aux complétives (Thesis Abstract)
    Lingvisticae Investigationes, vol. 17 (1993), pp. 431-432
    Réduction De Complétive À Un Nom Et Article Défini Générique
    Lingvisticae Investigationes, vol. 17 (1993), pp. 83-97
    La coréférence et l'aspect
    Lingvisticae Investigationes, vol. 14 (1990), pp. 403-412