Corinna Cortes

Corinna Cortes is a VP in Google Research, where she works on a broad range of theoretical and applied large-scale machine learning problems. Prior to Google, Corinna spent more than ten years at AT&T Labs - Research, formerly AT&T Bell Labs, where she held a distinguished research position. Corinna is known in particular for her contributions to the theoretical foundations of support vector machines (SVMs), for which she and Vladimir Vapnik jointly received the 2008 Paris Kanellakis Theory and Practice Award, and for her work on data mining in very large data sets, for which she was awarded the AT&T Science and Technology Medal in 2000. Corinna received her MS degree in Physics from the University of Copenhagen and joined AT&T Bell Labs as a researcher in 1989. She received her Ph.D. in computer science from the University of Rochester in 1993.

Corinna is also a competitive runner and a mother of two.

Authored Publications
Google Publications
Abstract: We present a new active learning algorithm that adaptively partitions the input space into a finite number of regions, and subsequently seeks a distinct predictor for each region, both phases actively requesting labels. We prove theoretical guarantees for both the generalization error and the label complexity of our algorithm, and analyze the number of regions defined by the algorithm under some mild assumptions. We also report the results of an extensive suite of experiments on several real-world datasets demonstrating substantial empirical benefits over existing single-region and nonadaptive region-based active learning baselines.
    Understanding the Effects of Batching in Online Active Learning
    Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (2020)
Abstract: Online active learning (AL) algorithms often assume immediate access to a label once a query has been made. However, due to practical constraints, the labels of these queried examples are generally only available in "batches". In this work, we present a novel analysis for a generic class of batch online AL algorithms and reveal that the effects of batching are in fact mild and only result in an additional term in the label complexity that is linear in the batch size. To our knowledge, this provides the first theoretical justification for such algorithms and we show how they can be applied to batch variants of three canonical online AL algorithms: IWAL, ORIWAL, and DHM. We also conduct an empirical study that corroborates the novel theoretical insights.
Abstract: A general framework for online learning with partial information is one where feedback graphs specify which losses can be observed by the learner. We study a challenging scenario where feedback graphs vary with time stochastically and, more importantly, where graphs and losses are stochastically dependent. This scenario appears in several applications that we describe. We give a new algorithm for this scenario that exploits the stochastic properties of the graphs and that benefits from favorable regret guarantees in this challenging setting. We present a detailed theoretical analysis of this algorithm, and also report the results of a series of experiments on real-world datasets, which show that our algorithm outperforms standard baselines for online learning with feedback graphs.
Abstract: We present a new algorithm for domain adaptation improving upon the discrepancy minimization (DM) algorithm, previously shown to outperform a number of algorithms for this problem. Unlike many previously proposed solutions for domain adaptation, our algorithm does not consist of a fixed reweighting of the losses over the training sample. Instead, the reweighting depends on the hypothesis sought. The algorithm is derived from a less conservative notion of discrepancy than that used by the DM algorithm, called generalized discrepancy. We present a detailed description of our algorithm and show that it can be formulated as a convex optimization problem. We also give a detailed theoretical analysis of its learning guarantees which helps us select its parameters. Finally, we report the results of experiments demonstrating that it improves upon discrepancy minimization.
    AdaNet: A Scalable and Flexible Framework for Automatically Learning Ensembles
    Charles Weill
    Vitaly Kuznetsov
    Scott Yang
    Scott Yak
    Hanna Mazzawi
    Eugen Hotaj
    Ghassen Jerfel
    Vladimir Macko
    (2019)
Abstract: AdaNet is a lightweight TensorFlow-based (Abadi et al., 2015) framework for automatically learning high-quality ensembles with minimal expert intervention. Our framework is inspired by the AdaNet algorithm (Cortes et al., 2017) which learns the structure of a neural network as an ensemble of subnetworks. We designed it to: (1) integrate with the existing TensorFlow ecosystem, (2) offer sensible default search spaces to perform well on novel datasets, (3) present a flexible API to utilize expert information when available, and (4) efficiently accelerate training with distributed CPU, GPU, and TPU hardware. The code is open-source and available at https://github.com/tensorflow/adanet.
Abstract: We consider the scenario of online learning with sleeping experts, where not all experts are available at each round, and analyze the general framework of learning with stochastic feedback graphs, where loss observations associated to each expert are characterized by a graph, thereby including both the bandit and full information settings as special cases. A critical assumption in this framework is that the loss observations and the set of sleeping experts at each round are independent. We first extend the classical algorithm of Kleinberg et al. (2008) to use the loss information encoded by any sequence of feedback graphs and prove matching upper and lower bounds for the sleeping regret of this algorithm. Our main contribution is then to relax this independence assumption, present a finer notion of sleeping regret, and derive a general algorithm with strong theoretical guarantees. We instantiate our framework to the important scenario of online learning with abstention, where a learner can elect to abstain from making a prediction at the price of a certain cost. We empirically validate the improvement of our algorithm against multiple abstention algorithms on several real-world datasets.
Abstract: We present two novel extensions of the online importance-weighted active learning algorithm IWAL, using the properties of disagreement values among hypotheses. The first extension, DIWAL, prunes the hypothesis set with a more aggressive strategy based on concentration bounds with disagreement values. We show that DIWAL improves the generalization performance and the label complexity of the original IWAL, and further quantify the improvement in terms of the disagreement graph coefficient. The second extension, ZOOM, adaptively zooms into the function space near the best-in-class hypothesis, which effectively reduces the best-in-class error and thus simultaneously improves the generalization performance and the label complexity. We report experimental results on multiple datasets and demonstrate that the proposed algorithms achieve better test performances than IWAL given the same amount of labeling budget.
Abstract: We study a scenario of active learning where the input space is partitioned into different regions and where a distinct hypothesis is learned for each region. We first introduce a new active learning algorithm (EIWAL), which is an enhanced version of the IWAL algorithm, based on a finer analysis that results in more favorable learning guarantees. Then, we present a new learning algorithm for region-based active learning, ORIWAL, in which either IWAL or EIWAL serves as a subroutine. ORIWAL optimally allocates points to the subroutine algorithm for each region. We give a detailed theoretical analysis of ORIWAL, including generalization error guarantees and bounds on the number of points labeled, in terms of both the hypothesis set used in each region and the probability mass of that region. We also report the results of several experiments for our algorithm which demonstrate substantial benefits over existing non-region-based active learning algorithms, such as IWAL, and over passive learning.
    Learning GANs and Ensembles Using Discrepancy
    Ningshan Zhang
    Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019
Abstract: Generative adversarial networks (GANs) generate data based on minimizing a divergence between two distributions. The choice of that divergence is therefore critical. We argue that the divergence must take into account the hypothesis set and the loss function used in a subsequent learning task, where the data generated by a GAN serves for training. Taking that structural information into account is also important to derive generalization guarantees. Thus, we propose to use the discrepancy measure, which was originally introduced for the closely related problem of domain adaptation and which precisely takes into account the hypothesis set and the loss function. We show that discrepancy admits favorable properties for training GANs and prove explicit generalization guarantees. We present efficient algorithms using discrepancy for two tasks: training a GAN directly, namely DGAN, and mixing previously trained generative models, namely EDGAN. Our experiments on toy examples and several benchmark datasets show that DGAN is competitive with other GANs and that EDGAN outperforms existing GAN ensembles, such as AdaGAN.
Abstract: We present an extensive analysis of relative deviation bounds, including detailed proofs of two-sided inequalities and their implications. We also give detailed proofs of two-sided generalization bounds that hold in the general case of unbounded loss functions, under the assumption that a moment of the loss is bounded. We then illustrate how to apply these results in a sample application: the analysis of importance weighting.
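As a concrete illustration of the importance-weighting application mentioned in this abstract, here is a minimal NumPy sketch (the predictor, distributions, and variable names are illustrative assumptions, not the paper's construction): the empirical risk is reweighted by density ratios that may be unbounded, which is exactly why moment conditions on the weighted loss matter in the analysis.

```python
import numpy as np

def importance_weighted_risk(losses, weights):
    """Importance-weighted empirical risk: mean of w(x_i) * loss_i.

    `weights` are density ratios p_target(x)/p_source(x); they can be
    unbounded, which is why bounds on a moment of the weighted loss
    (rather than a uniform bound) are needed in the analysis.
    """
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.mean(weights * losses))

# Toy usage: source samples reweighted toward a shifted target distribution.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)          # drawn from the source N(0, 1)
losses = (x > 1.0).astype(float)             # 0/1 loss of some fixed predictor
w = np.exp(0.5 * (x**2 - (x - 0.5)**2))      # density ratio N(0.5, 1) / N(0, 1)
print(importance_weighted_risk(losses, w))
```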
    Online learning with abstention
    Scott Yang
    Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden (2018)
Abstract: We present an extensive study of a key problem in online learning where the learner can opt to abstain from making a prediction, at a certain cost. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this problem. In the stochastic setting, we first point out a bias problem that limits the straightforward extension of algorithms such as UCB-N to this context. Next, we give a new algorithm, UCB-GT, that exploits historical data and time-varying feedback graphs. We show that this algorithm benefits from more favorable regret guarantees than a natural extension of UCB-N. We further report the results of a series of experiments demonstrating that UCB-GT largely outperforms that extension of UCB-N, as well as other standard baselines.
    AdaNet: Adaptive structural learning of artificial neural networks
    Vitaly Kuznetsov
    Scott Yang
    Proceedings of the 34th International Conference on Machine Learning (ICML 2017). Sydney, Australia, August 2017. (2017)
Abstract: We present new algorithms for adaptively learning artificial neural networks. Our algorithms (ADANET) adaptively learn both the structure of the network and its weights. They are based on a solid theoretical analysis, including data-dependent generalization guarantees that we prove and discuss in detail. We report the results of large-scale experiments with one of our algorithms on several binary classification tasks extracted from the CIFAR-10 dataset and on the Criteo dataset. The results demonstrate that our algorithm can automatically learn network structures with very competitive performance accuracies when compared with those achieved by neural networks found by standard approaches.
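As a rough, hypothetical sketch of the kind of capacity-aware structure search this abstract describes (not the authors' exact ADANET procedure; the candidate set and the penalty are simplified stand-ins for the paper's data-dependent guarantees):

```python
# Illustrative sketch only: candidate generation and the complexity penalty
# are simplified stand-ins for ADANET's data-dependent analysis.
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    name: str
    train_error: float   # empirical error after training this candidate subnetwork
    complexity: float    # proxy for the candidate's capacity (e.g., depth/width)

def select_candidate(candidates: List[Candidate], lam: float = 0.1) -> Candidate:
    """Pick the candidate minimizing empirical error plus a capacity penalty."""
    return min(candidates, key=lambda c: c.train_error + lam * c.complexity)

# One round of the search: compare extending the current depth vs widening.
round_candidates = [
    Candidate("same_depth_wider", train_error=0.12, complexity=1.5),
    Candidate("one_layer_deeper", train_error=0.10, complexity=2.5),
]
print(select_candidate(round_candidates).name)
```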
    Learning with rejection
    Proceedings of The 27th International Conference on Algorithmic Learning Theory (ALT 2016). pages 67-82, Bari, Italy, 2016. Springer, Heidelberg, Germany. (2016)
Abstract: We introduce a novel framework for classification with a rejection option that consists of simultaneously learning two functions: a classifier along with a rejection function. We present a full theoretical analysis of this framework including new data-dependent learning bounds in terms of the Rademacher complexities of the classifier and rejection families as well as consistency and calibration results. These theoretical guarantees guide us in designing new algorithms that can exploit different kernel-based hypothesis sets for the classifier and rejection functions. We compare and contrast our general framework with the special case of confidence-based rejection for which we devise alternative loss functions and algorithms as well. We report the results of several experiments showing that our kernel-based algorithms can yield a notable improvement over the best existing confidence-based rejection algorithm.
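For reference, the confidence-based special case discussed above can be sketched in a few lines of NumPy (the threshold and abstention cost are illustrative assumptions, not values from the paper):

```python
import numpy as np

def predict_with_rejection(scores, threshold=0.3, cost=0.2, labels=None):
    """Confidence-based rejection baseline: abstain when |score| < threshold.

    Returns predictions in {-1, +1, 0} (0 = abstain) and, if labels are
    given, the rejection loss: 1 for a wrong non-abstained prediction,
    `cost` for each abstention.
    """
    scores = np.asarray(scores, dtype=float)
    preds = np.where(np.abs(scores) < threshold, 0, np.sign(scores))
    loss = None
    if labels is not None:
        labels = np.asarray(labels)
        wrong = (preds != 0) & (preds != labels)
        loss = float(np.mean(wrong + cost * (preds == 0)))
    return preds, loss

preds, loss = predict_with_rejection([0.9, -0.1, -0.7, 0.2], labels=[1, 1, -1, -1])
print(preds, loss)
```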
    Boosting with abstention
    Advances in Neural Information Processing Systems (NIPS 2016). Barcelona, Spain, 2016. MIT Press. (2016)
Abstract: We present a new boosting algorithm for the key scenario of binary classification with abstention where the algorithm can abstain from predicting the label of a point, at the price of a fixed cost. At each round, our algorithm selects a pair of functions, a base predictor and a base abstention function. We define convex upper bounds for the natural loss function associated to this problem, which we prove to be calibrated with respect to the Bayes solution. Our algorithm benefits from general margin-based learning guarantees which we derive for ensembles of pairs of base predictor and abstention functions, in terms of the Rademacher complexities of the corresponding function classes. We give convergence guarantees for our algorithm along with a linear-time weak-learning algorithm for abstention stumps. We also report the results of several experiments suggesting that our algorithm provides a significant improvement in practice over two confidence-based algorithms.
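In the notation commonly used for this abstention setting, with a predictor h, an abstention function r (the pair abstains when r(x) ≤ 0), and a fixed cost c, the natural loss referred to above can be written, up to the paper's exact conventions, as:

```latex
% Abstention loss for a predictor h and abstention function r, with cost c:
% misclassification counts only when the pair does not abstain (r(x) > 0);
% abstaining (r(x) <= 0) always incurs the fixed cost c.
L(h, r, x, y) \;=\; \mathbf{1}_{y\,h(x) \le 0}\,\mathbf{1}_{r(x) > 0}
 \;+\; c\,\mathbf{1}_{r(x) \le 0}.
```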
    Structured prediction theory based on factor graph complexity
    Vitaly Kuznetsov
    Scott Yang
    Advances in Neural Information Processing Systems (NIPS 2016). Barcelona, Spain, 2016. MIT Press. (2016)
Abstract: We present a general theoretical analysis of structured prediction with a series of new results. We give new data-dependent margin guarantees for structured prediction for a very wide family of loss functions and a general family of hypotheses, with an arbitrary factor graph decomposition. These are the tightest margin bounds known for both standard multi-class and general structured prediction problems. Our guarantees are expressed in terms of a data-dependent complexity measure, factor graph complexity, which we show can be estimated from data and bounded in terms of familiar quantities for several commonly used hypothesis sets along with a sparsity measure for features and graphs. Our proof techniques include generalizations of Talagrand's contraction lemma that can be of independent interest. We further extend our theory by leveraging the principle of Voted Risk Minimization (VRM) and show that learning is possible even with complex factor graphs. We present new learning bounds for this advanced setting, which we use to design two new algorithms, Voted Conditional Random Field (VCRF) and Voted Structured Boosting (StructBoost). These algorithms can make use of complex features and factor graphs and yet benefit from favorable learning guarantees. We also report the results of experiments with VCRF on several datasets to validate our theory.
Abstract: We present a new algorithm for domain adaptation improving upon the discrepancy minimization (DM) algorithm, previously shown to outperform a number of algorithms for this problem. Unlike many previously proposed solutions for domain adaptation, our algorithm does not consist of a fixed reweighting of the losses over the training sample. Instead, the reweighting depends on the hypothesis sought. The algorithm is derived from a less conservative notion of discrepancy than that used by the DM algorithm. We call this quantity generalized discrepancy. We present a detailed description of our algorithm and show that it can be formulated as a convex optimization problem. We also give a detailed theoretical analysis of its learning guarantees which helps us select its parameters. Finally, we report the results of experiments demonstrating that it improves upon discrepancy minimization in several tasks.
    On-line learning algorithms for path experts with non-additive losses
    Vitaly Kuznetsov
    Manfred K. Warmuth
    Proceedings of The 28th Annual Conference on Learning Theory (COLT 2015)
    Structural maxent models
    Vitaly Kuznetsov
    Proceedings of the Thirty-Second International Conference on Machine Learning (ICML 2015)
Abstract: We present a new class of density estimation models, Structural Maxent models, with feature functions selected from a union of possibly very complex sub-families and yet benefiting from strong learning guarantees. The design of our models is based on a new principle supported by uniform convergence bounds and taking into consideration the complexity of the different sub-families composing the full set of features. We prove new data-dependent learning bounds for our models, expressed in terms of the Rademacher complexities of these sub-families. We also prove a duality theorem, which we use to derive our Structural Maxent algorithm. We give a full description of our algorithm, including the details of its derivation, and report the results of several experiments demonstrating that its performance improves on that of existing L1-norm regularized Maxent algorithms. We further similarly define conditional Structural Maxent models for multi-class classification problems. These are conditional probability models also making use of a union of possibly complex feature subfamilies. We prove a duality theorem for these models as well, which reveals their connection with existing binary and multi-class deep boosting algorithms.
    Learning ensembles of structured prediction rules
    Vitaly Kuznetsov
    Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014)
    Deep boosting
    Proceedings of the Thirty-First International Conference on Machine Learning (ICML 2014)
Abstract: We present a new ensemble learning algorithm, DeepBoost, which can use as base classifiers a hypothesis set containing deep decision trees, or members of other rich or complex families, and succeed in achieving high accuracy without overfitting the data. The key to the success of the algorithm is a capacity-conscious criterion for the selection of the hypotheses. We give new data-dependent learning bounds for convex ensembles expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set, and the mixture weight assigned to each sub-family. Our algorithm directly benefits from these guarantees since it seeks to minimize the corresponding learning bound. We give a full description of our algorithm, including the details of its derivation, and report the results of several experiments showing that its performance compares favorably to that of AdaBoost and Logistic Regression and their L1-regularized variants.
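Schematically, and up to the exact surrogate loss and constants used in the paper, the capacity-conscious objective described here penalizes each base hypothesis h_t by the Rademacher complexity of the sub-family it belongs to:

```latex
% Schematic form of a capacity-conscious convex ensemble objective:
% \Phi is a convex surrogate loss, r(h_t) the Rademacher complexity of the
% sub-family containing h_t, and \lambda, \beta \ge 0 trade-off parameters.
F(\boldsymbol{\alpha}) \;=\; \frac{1}{m}\sum_{i=1}^{m}
  \Phi\!\Big(1 - y_i \sum_{t=1}^{T}\alpha_t h_t(x_i)\Big)
  \;+\; \sum_{t=1}^{T}\big(\lambda\, r(h_t) + \beta\big)\,|\alpha_t| .
```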
    Ensemble methods for structured prediction
    Vitaly Kuznetsov
    Proceedings of the 31st International Conference on Machine Learning (ICML 2014)
    Learning kernels using local rademacher complexity
    Marius Kloft
    Advances in Neural Information Processing Systems (NIPS 2013), MIT Press.
Abstract: We use the notion of local Rademacher complexity to design new algorithms for learning kernels. Our algorithms thereby benefit from the sharper learning bounds based on that notion which, under certain general conditions, guarantee a faster convergence rate. We devise two new learning kernel algorithms: one based on a convex optimization problem for which we give an efficient solution using existing learning kernel techniques, and another one that can be formulated as a DC-programming problem for which we describe a solution in detail. We also report the results of experiments with both algorithms in both binary and multi-class classification tasks.
    Accuracy at the Top
    Stephen Boyd
    NIPS: Neural Information Processing Systems Foundation (2012)
Abstract: We introduce a new notion of classification accuracy based on the top τ-quantile values of a scoring function, a relevant criterion in a number of problems arising for search engines. We define an algorithm optimizing a convex surrogate of the corresponding loss, and show how its solution can be obtained by solving a set of convex optimization problems. We also present margin-based guarantees for this algorithm based on the top τ-quantile of the scores of the functions in the hypothesis set. Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our algorithm and comparing the results to several other algorithms seeking high precision at the top. In most examples, our algorithm achieves a better performance in precision at the top.
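The criterion itself is easy to state in code; here is a minimal NumPy sketch of precision among the top τ-quantile of scores (the toy scores and labels are assumptions for illustration):

```python
import numpy as np

def precision_at_top_quantile(scores, labels, tau=0.05):
    """Fraction of positives among examples scored above the top-tau quantile.

    `scores` are real-valued outputs of a scoring function and `labels`
    are in {0, 1}; `tau` is the fraction of the sample treated as "the top".
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    threshold = np.quantile(scores, 1.0 - tau)
    top = scores >= threshold
    return float(labels[top].mean()) if top.any() else 0.0

# Toy usage with synthetic scores and noisy labels.
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)
labels = (scores + rng.normal(scale=1.0, size=1000) > 1.0).astype(int)
print(precision_at_top_quantile(scores, labels, tau=0.05))
```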
    Domain adaptation in regression
    Proceedings of The 22nd International Conference on Algorithmic Learning Theory, ALT 2011, Springer, Heidelberg, Germany
    Ensembles of Kernel Predictors
    Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011)
    On the Impact of Kernel Approximation on Learning Accuracy
    Ameet Talwalkar
    Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010)
Abstract: Kernel approximation is commonly used to scale kernel-based algorithms to applications containing as many as several million instances. This paper analyzes the effect of such approximations in the kernel matrix on the hypothesis generated by several widely used learning algorithms. We give stability bounds based on the norm of the kernel approximation for these algorithms, including SVMs, KRR, and graph Laplacian-based regularization algorithms. These bounds help determine the degree of approximation that can be tolerated in the estimation of the kernel matrix. Our analysis is general and applies to arbitrary approximations of the kernel matrix. However, we also give a specific analysis of the Nyström low-rank approximation in this context and report the results of experiments evaluating the quality of the Nyström low-rank kernel approximation when used with ridge regression.
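To make the experimental setting concrete, here is a minimal NumPy sketch of a Nyström low-rank kernel approximation used with ridge regression (the landmark count, kernel width, and regularization are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian RBF kernel matrix between the rows of X and the rows of Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_krr(X, y, n_landmarks=50, lam=1e-2, gamma=0.5, seed=0):
    """Kernel ridge regression on a Nystrom low-rank approximation of K.

    K is approximated by C W^+ C^T with C = K(X, landmarks) and
    W = K(landmarks, landmarks); ridge regression is then solved in the
    induced low-rank feature space.
    """
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(len(X), size=min(n_landmarks, len(X)), replace=False)]
    C = rbf_kernel(X, landmarks, gamma)             # n x m
    W = rbf_kernel(landmarks, landmarks, gamma)     # m x m
    U, s, _ = np.linalg.svd(W)
    Z = C @ U / np.sqrt(np.maximum(s, 1e-12))       # Z Z^T approximates C W^+ C^T
    alpha = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)

    def predict(X_new):
        Z_new = rbf_kernel(X_new, landmarks, gamma) @ U / np.sqrt(np.maximum(s, 1e-12))
        return Z_new @ alpha

    return predict

# Toy 1-D regression problem.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
predict = nystrom_krr(X, y)
print(predict(np.array([[0.0], [1.5]])))
```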
    Two-Stage Learning Kernel Algorithms
    Proceedings of the 27th Annual International Conference on Machine Learning (ICML 2010)
Abstract: This paper examines two-stage techniques for learning kernels based on a notion of alignment. It presents a number of novel theoretical, algorithmic, and empirical results for alignment-based techniques. Our results build on previous work by Cristianini et al. (2001), but we adopt a different definition of kernel alignment and significantly extend that work in several directions: we give a novel and simple concentration bound for alignment between kernel matrices; show the existence of good predictors for kernels with high alignment, both for classification and for regression; give algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP; and report the results of extensive experiments with this alignment-based method in classification and regression tasks, which show an improvement both over the uniform combination of kernels and over other state-of-the-art learning kernel methods.
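For reference, a minimal NumPy sketch of centered kernel alignment between two kernel matrices, the kind of quantity this line of work builds on (the toy data and kernel parameters are assumptions for illustration):

```python
import numpy as np

def center_kernel(K):
    """Center a kernel matrix in feature space: K_c = H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def centered_alignment(K1, K2):
    """Centered alignment <K1c, K2c>_F / (||K1c||_F ||K2c||_F), in [-1, 1]."""
    K1c, K2c = center_kernel(K1), center_kernel(K2)
    num = np.sum(K1c * K2c)
    denom = np.linalg.norm(K1c) * np.linalg.norm(K2c)
    return float(num / denom) if denom > 0 else 0.0

# Toy usage: alignment of an RBF kernel with the ideal target kernel y y^T.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)
print(centered_alignment(K, np.outer(y, y)))
```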
    Learning Bounds for Importance Weighting
    Yishay Mansour
    Advances in Neural Information Processing Systems (NIPS 2010), MIT Press, Vancouver, Canada
    Half Transductive Ranking
    Bing Bai
    Jason Weston
    David Grangier
    Ronan Collobert
    Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010)
Abstract: We study the standard retrieval task of ranking a fixed set of items given a previously unseen query and pose it as the half transductive ranking problem. The task is transductive as the set of items is fixed. Transductive representations (where the vector representation of each example is learned) allow the generation of highly nonlinear embeddings that capture object relationships without relying on a specific choice of features, and require only relatively simple optimization. Unfortunately, they have no direct out-of-sample extension. Inductive approaches on the other hand allow for the representation of unknown queries. We describe algorithms for this setting which have the advantages of both transductive and inductive approaches, and can be applied in unsupervised (either reconstruction-based or graph-based) and supervised ranking setups. We show empirically that our methods give strong performance on all three tasks.
    Generalization Bounds for Learning Kernels
    Proceedings of the 27th Annual International Conference on Machine Learning (ICML 2010)
Abstract: This paper presents several novel generalization bounds for the problem of learning kernels based on a combinatorial analysis of the Rademacher complexity of the corresponding hypothesis sets. Our bound for learning kernels with a convex combination of p base kernels using L1 regularization admits only a √(log p) dependency on the number of kernels, which is tight and considerably more favorable than the previous best bound given for the same problem. We also give a novel bound for learning with a non-negative combination of p base kernels with an L2 regularization whose dependency on p is also tight and only in p^(1/4). We present similar results for Lq regularization with other values of q, and outline the relevance of our proof techniques to the analysis of the complexity of the class of linear functions. Experiments with a large number of kernels further validate the behavior of the generalization error as a function of p predicted by our bounds.
    Invited talk: Can learning kernels help performance?
    ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, ACM, New York, NY, USA (2009), pp. 1-1
    L2 Regularization for Learning Kernels
Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), Montréal, Canada
    Polynomial semantic indexing
    Bing Bai
    Jason Weston
    David Grangier
    Ronan Collobert
    Kunihiko Sadamasa
    Yanjun Qi
    Advances in Neural Information Processing Systems (NIPS 2009), MIT Press
    Stability of Transductive Regression Algorithms
    Dmitry Pechyony
    Ashish Rastogi
    Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML 2008), Helsinki, Finland
    Kernel Methods for Learning Languages
    Leonid Kontorovich
    Theoretical Computer Science, vol. 405 (2008), pp. 223-236
    Learning with weighted transducers
    Proceedings of the Seventh International Workshop Finite-State Methods and Natural Language Processing (2008)
    Learning sequence kernels
    Proceedings of IEEE International Workshop on Machine Learning for Signal Processing (2008)
    On the Computation of the Relative Entropy of Probabilistic Automata
    Ashish Rastogi
    International Journal of Foundations of Computer Science, vol. 19 (2008), pp. 219-242
    A Machine Learning Framework for Spoken-Dialog Classification
    Patrick Haffner
    Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2008)
    Sample Selection Bias Correction Theory
    Proceedings of The 19th International Conference on Algorithmic Learning Theory (ALT 2008), Springer, Heidelberg, Germany, Budapest, Hungary
    An Alternative Ranking Problem for Search Engines
    Ashish Rastogi
    Proceedings of the 6th Workshop on Experimental Algorithms (WEA 2007), Springer-Verlag, Heidelberg, Germany, Rome, Italy, pp. 1-21
    Learning Languages with Rational Kernels
    Leonid Kontorovich
    Proceedings of The 20th Annual Conference on Computational Learning Theory (COLT 2007), Springer, Heidelberg, Germany, San Diego, California
    On Transductive Regression
    Advances in Neural Information Processing Systems (NIPS 2006), MIT Press, Vancouver, Canada (2007)
    Magnitude-Preserving Ranking Algorithms
    Ashish Rastogi
    Proceedings of the Twenty-fourth International Conference on Machine Learning (ICML 2007), Oregon State University, Corvallis, OR
    Lp Distance and Equivalence of Probabilistic Automata
    Ashish Rastogi
    International Journal of Foundations of Computer Science, vol. 18 (2007), pp. 761-780
    On the Computation of Some Standard Distances between Probabilistic Automata
    Ashish Rastogi
    Proceedings of the 11th International Conference on Implementation and Application of Automata (CIAA 2006), Springer-Verlag, Heidelberg, Germany, Taipei, Taiwan
    Efficient Computation of the Relative Entropy of Probabilistic Automata
    Ashish Rastogi
    Proceedings of the 7th Latin American Symposium (LATIN 2006), Springer-Verlag, Heidelberg, Germany, Valdivia, Chile
    Learning Linearly Separable Languages
    Leonid Kontorovich
    Proceedings of The 17th International Conference on Algorithmic Learning Theory (ALT 2006), Springer, Heidelberg, Germany
    Finite-State Transducers in Computational Biology
    Tutorial presented at the 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2005), Detroit, MI
    Confidence Intervals for the Area under the ROC Curve
    Advances in Neural Information Processing Systems (NIPS 2004), MIT Press, Vancouver, Canada (2005)
    Margin-Based Ranking Meets Boosting in the Middle
    Cynthia Rudin
    Robert E. Schapire
    Proceedings of The 18th Annual Conference on Computational Learning Theory (COLT 2005), Springer, Heidelberg, Germany, Bertinoro, Italy, pp. 63-78
    A General Regression Technique for Learning Transductions
    Jason Weston
    Proceedings of the Twenty-Second International Conference on Machine Learning (ICML 2005), Bonn, Germany
    Distribution Kernels Based on Moments of Counts
    Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), Banff, Alberta, Canada
    Rational Kernels: Theory and Algorithms
    Patrick Haffner
    Journal of Machine Learning Research (JMLR), vol. 5 (2004), pp. 1035-1062
    Hancock: A language for analyzing transactional data streams
    Kathleen Fisher
    Daryl Pregibon
    Anne Rogers
    Frederick Smith
    ACM Trans. Program. Lang. Syst., vol. 26 (2004), pp. 301-338
Other Publications
    Voted Kernel Regularization
    Prasoon Goyal
    Vitaly Kuznetsov
    CoRR, vol. abs/1509.04340 (2015)
    Relative Deviation Learning Bounds and Generalization with Unbounded Loss Functions
    Spencer Greenberg
    CoRR, vol. abs/1310.5796 (2013)
    Stability Analysis and Learning Bounds for Transductive Regression Algorithms
    Dmitry Pechyony
    Ashish Rastogi
    CoRR, vol. abs/0904.0814 (2009)
    New Generalization Bounds for Learning Kernels
    Magnitude-preserving ranking algorithms
    Ashish Rastogi
    ICML (2007), pp. 169-176
    An Alternative Ranking Problem for Search Engines
    Ashish Rastogi
    WEA (2007), pp. 1-22
    Learning Languages with Rational Kernels
    Leonid Kontorovich
    COLT (2007), pp. 349-364
    Efficient Computation of the Relative Entropy of Probabilistic Automata
    Ashish Rastogi
    LATIN (2006), pp. 323-336
    On the Computation of Some Standard Distances Between Probabilistic Automata
    Ashish Rastogi
    CIAA (2006), pp. 137-149
    On Transductive Regression
    NIPS (2006), pp. 305-312
    Learning Linearly Separable Languages
    Leonid Kontorovich
    ALT (2006), pp. 288-303
    Moment Kernels for Regular Distributions
    Machine Learning, vol. 60 (2005), pp. 117-134
    Rational Kernels: Theory and Algorithms
    Patrick Haffner
    Journal of Machine Learning Research, vol. 5 (2004), pp. 1035-1062
    AUC Optimization vs. Error Rate Minimization
    Advances in Neural Information Processing Systems (NIPS 2003), MIT Press, Vancouver, Canada (2004)
    Positive Definite Rational Kernels
    Patrick Haffner
    Proceedings of The 16th Annual Conference on Computational Learning Theory (COLT 2003), Springer, Heidelberg, Germany, Washington D.C.
    Lattice Kernels for Spoken-Dialog Classification
    Patrick Haffner
    Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), Hong Kong
    Rational Kernels
    Patrick Haffner
    Advances in Neural Information Processing Systems (NIPS 2002), MIT Press, Vancouver, Canada (2003)
Weighted Automata Kernels - General Framework and Algorithms
    Patrick Haffner
    Proceedings of the 9th European Conference on Speech Communication and Technology (Eurospeech '03), Special Session Advanced Machine Learning Algorithms for Speech and Language Processing, Geneva, Switzerland (2003)
    Communities of interest
    Daryl Pregibon
    Chris Volinsky
    Intell. Data Anal., vol. 6 (2002), pp. 211-219
    Signature-Based Methods for Data Streams
    Daryl Pregibon
    Data Min. Knowl. Discov., vol. 5 (2001), pp. 167-182
    Communities of Interest
    Daryl Pregibon
    Chris Volinsky
    IDA (2001), pp. 105-114
    Context-Free Recognition with Weighted Automata
    Grammars, vol. 3 (2000), pp. 133-150
    Hancock: a language for extracting signatures from data streams
    Kathleen Fisher
    Daryl Pregibon
    Anne Rogers
    KDD (2000), pp. 9-17
    Context-Free Recognition with Weighted Automata
    Proceedings of the Sixth Meeting on Mathematics of Language (MOL6), Orlando, Florida (1999)
    Squashing Flat Files Flatter
    William DuMouchel
    Chris Volinsky
    Theodore Johnson
    Daryl Pregibon
    KDD (1999), pp. 6-15
    Information Mining Platforms: An Infrastructure for KDD Rapid Deployment
    Daryl Pregibon
    KDD (1999), pp. 327-331
    Giga-Mining
    Daryl Pregibon
    KDD (1998), pp. 174-178
    Capacity and Complexity Control in Predicting the Spread Between Borrowing and Lending Interest Rates
    Harris Drucker
    Dennis Hoover
    Vladimir Vapnik
    KDD (1995), pp. 51-56
    Support-Vector Networks
    Vladimir Vapnik
    Machine Learning, vol. 20 (1995), pp. 273-297
    Limits on Learning Machine Accuracy Imposed by Data Quality
    Lawrence D. Jackel
    Wan-Ping Chiang
    KDD (1995), pp. 57-62
    Boosting Decision Trees
    Harris Drucker
    NIPS (1995), pp. 479-485
    Limits in Learning Machine Accuracy Imposed by Data Quality
    Lawrence D. Jackel
    Wan-Ping Chiang
    NIPS (1994), pp. 239-246
    Boosting and Other Machine Learning Algorithms
    Harris Drucker
    Lawrence D. Jackel
    Yann LeCun
    Vladimir Vapnik
    ICML (1994), pp. 53-61
    Learning Curves: Asymptotic Values and Rate of Convergence
    Lawrence D. Jackel
    Sara A. Solla
    Vladimir Vapnik
    John S. Denker
    NIPS (1993), pp. 327-334