Apostol (Paul) Natsev

Apostol (Paul) Natsev is a software engineer and manager in the video content analysis group at Google Research. Previously, he was a research staff member and manager of the multimedia research group at IBM Research from 2001 to 2011. He received a master's degree and a Ph.D. in computer science from Duke University, Durham, NC, in 1997 and 2001, respectively. Dr. Natsev's research interests span image and video analysis and retrieval, machine perception, large-scale machine learning, and recommendation systems. He has authored more than 80 publications, and his research has been recognized with several awards.
Authored Publications
    Large Scale Video Representation Learning via Relational Graph Clustering
    Hyodong Lee
    Joe Yue-Hei Ng
    Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
    Abstract: Representation learning is widely applied to various tasks on multimedia data, e.g., retrieval and search. One approach for learning useful representations is to exploit the relationships or similarities between examples. In this work, we explore two promising, scalable representation learning approaches in the video domain. With hierarchical graph clusters built upon video-to-video similarities, we propose: 1) a smart negative sampling strategy that significantly boosts training efficiency with triplet loss, and 2) a pseudo-classification approach using the clusters as pseudo-labels. The embeddings trained with the proposed methods are competitive on multiple video understanding tasks, including related-video retrieval and video annotation. Both methods are highly scalable, as verified by experiments on large-scale datasets.
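    The following minimal Python sketch (not the authors' code) illustrates the triplet-loss setup with cluster-based negative sampling described above; the embedding dimension, margin, and cluster assignments are illustrative assumptions.

        import random
        import numpy as np

        def triplet_loss(anchor, positive, negative, margin=0.2):
            # Margin-based triplet loss on squared L2 distances.
            d_pos = float(np.sum((anchor - positive) ** 2))
            d_neg = float(np.sum((anchor - negative) ** 2))
            return max(0.0, d_pos - d_neg + margin)

        def sample_negative(anchor_id, cluster_of, members, rng=random):
            # Draw the negative from a different cluster than the anchor's,
            # so the triplet is informative without being a false negative.
            other_clusters = [c for c in members if c != cluster_of[anchor_id]]
            return rng.choice(members[rng.choice(other_clusters)])

        # Toy usage: random 8-d embeddings for four videos in two clusters.
        emb = {v: np.random.randn(8) for v in "abcd"}
        cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1}
        members = {0: ["a", "b"], 1: ["c", "d"]}
        neg = sample_negative("a", cluster_of, members)
        print(triplet_loss(emb["a"], emb["b"], emb[neg]))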
    Large-Scale Training Framework for Video Annotation
    Seong Jae Hwang
    Balakrishnan Varadarajan
    Ariel Gordon
    Proc. of the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), ACM (2019)
    Abstract: Video is one of the richest sources of information available online, but extracting deep insights from video content at internet scale is still an open problem, both in terms of depth and breadth of understanding, as well as scale. Over the last few years, the field of video understanding has made great strides due to the availability of large-scale video datasets and core advances in image, audio, and video modeling architectures. However, the state-of-the-art architectures on small-scale datasets are frequently impractical to deploy at internet scale, both in terms of the ability to train such deep networks on hundreds of millions of videos, and to deploy them for inference on billions of videos. In this paper, we present a MapReduce-based training framework, which exploits both data parallelism and model parallelism to scale training of complex video models. The proposed framework uses alternating optimization and full-batch fine-tuning, and supports large Mixture-of-Experts classifiers with hundreds of thousands of mixtures, which enables a trade-off between model depth and breadth, and the ability to shift model capacity between shared (generalization) layers and per-class (specialization) layers. We demonstrate that the proposed framework is able to reach state-of-the-art performance on the largest public video datasets, YouTube-8M and Sports-1M, and can scale to 100 times larger datasets.
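    As a rough illustration of the Mixture-of-Experts classifier structure mentioned above, here is a small numpy sketch of how one class score could be formed as a softmax-gated mixture of per-expert logistic predictions; the number of experts, feature dimension, and parameter names are assumptions, not the paper's actual architecture.

        import numpy as np

        def softmax(x):
            e = np.exp(x - np.max(x))
            return e / e.sum()

        def moe_class_probability(features, gate_w, expert_w):
            # features: (D,) shared video representation
            # gate_w:   (E, D) gating weights for one class, E experts
            # expert_w: (E, D) per-expert logistic-regression weights
            gate = softmax(gate_w @ features)                             # trust in each expert
            expert_probs = 1.0 / (1.0 + np.exp(-(expert_w @ features)))  # each expert's estimate
            return float(gate @ expert_probs)                             # gate-weighted mixture

        # Toy usage: 128-d features, 3 experts for a single class.
        rng = np.random.default_rng(0)
        x = rng.standard_normal(128)
        print(moe_class_probability(x,
                                    rng.standard_normal((3, 128)),
                                    rng.standard_normal((3, 128))))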
    Collaborative Deep Metric Learning for Video Understanding
    Balakrishnan Varadarajan
    Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM (2018)
    Abstract: The goal of video understanding is to develop algorithms that enable machines to understand videos at the level of human experts. Researchers have tackled various domains including video classification, search, personalized recommendation, and more. However, there is a research gap in combining these domains in one unified learning framework. To that end, we propose a deep network that embeds videos, using their audio-visual content, into a metric space that preserves video-to-video relationships. We then use the trained embedding network to tackle various domains including video classification and recommendation, showing significant improvements over state-of-the-art baselines. The proposed approach is highly scalable and can be deployed on large-scale video sharing platforms like YouTube.
    The Kinetics Human Action Video Dataset
    Andrew Zisserman
    Joao Carreira
    Karen Simonyan
    Will Kay
    Brian Zhang
    Chloe Hillier
    Fabio Viola
    Tim Green
    Trevor Back
    Mustafa Suleyman
    arXiv (2017)
    Abstract: We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human focussed and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands. We describe the statistics of the dataset, how it was collected, and give some baseline performance figures for neural network architectures trained and tested for human action classification on this dataset.
    Content-based Related Video Recommendations
    Nisarg Kothari
    Advances in Neural Information Processing Systems (NIPS) Demonstration Track (2016)
    Abstract: This is a demo of related video recommendations, seeded from random YouTube videos, and based purely on video content signals. Traditional recommendation systems using collaborative filtering (CF) approaches suggest related videos for a given seed based on how many users have watched a particular candidate video right after watching the seed video. This does not take the video content into account but relies on aggregate user behavior. Traditional CF approaches work very well when the seed and the candidate videos are relatively popular: they must be watched in sequence by many users in order to be identified as related by the CF system. In this demo, we focus on the cold-start problem, where the seed and/or the candidate video is freshly uploaded (or undiscovered), so the CF system cannot identify any related videos for them. Being able to recommend freshly uploaded videos, as well as good related videos for fresh seeds, is important for improving freshness and user engagement. We model this as a video content-based similarity learning problem, and learn deep video embeddings trained to predict ground-truth video relationships (identified by a CF co-watch-based system) but using only visual content. The system does not depend on the availability of video metadata or any click information, and can generalize to both popular and tail content, as well as new video uploads. It embeds any new video into a 1024-dimensional representation based on its content, and pairwise video similarity is computed simply as a dot product in the embedding space. We show that the learned video embeddings generalize beyond simple visual similarity and are able to capture complex semantic relationships.
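    Since relatedness in this demo reduces to a dot product between content-based video embeddings, the following hypothetical sketch shows that retrieval step; the video ids and random 1024-d embeddings are placeholders, not the production system.

        import numpy as np

        def related_videos(seed_id, ids, embeddings, top_k=5):
            # embeddings: (N, 1024) content-based video embeddings, one row per video.
            # Relatedness of the seed to every candidate is a single dot product.
            seed = embeddings[ids.index(seed_id)]
            scores = embeddings @ seed
            ranked = np.argsort(-scores)
            return [(ids[i], float(scores[i])) for i in ranked if ids[i] != seed_id][:top_k]

        # Toy usage with random embeddings for 100 hypothetical videos.
        rng = np.random.default_rng(1)
        ids = [f"video_{i}" for i in range(100)]
        emb = rng.standard_normal((100, 1024)).astype(np.float32)
        print(related_videos("video_0", ids, emb, top_k=3))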
    YouTube-8M: A Large-Scale Video Classification Benchmark
    arXiv (2016)
    Abstract: Many recent advancements in computer vision are attributed to large datasets. Open-source software packages for machine learning and inexpensive commodity hardware have reduced the barrier of entry for exploring novel approaches at scale: it is possible to train models over millions of examples within a few days. Although large-scale datasets exist for image understanding, such as ImageNet, there are no comparably sized video classification datasets. In this paper, we introduce YouTube-8M, the largest multi-label video classification dataset, composed of ~8 million videos (500K hours of video) annotated with a vocabulary of 4803 visual entities. To get the videos and their (multiple) labels, we used the YouTube Data APIs. We filtered the video labels (Freebase topics) using both automated and manual curation strategies, including asking Mechanical Turk workers whether the labels are visually recognizable. Then, we decoded each video at one frame per second, and used a deep CNN pre-trained on ImageNet to extract the hidden representation immediately prior to the classification layer. Finally, we compressed the frame features and made both the features and video-level labels available for download. The dataset contains frame-level features for over 1.9 billion video frames and 8 million videos, making it the largest public multi-label video dataset. We trained various (modest) classification models on the dataset, evaluated them using popular evaluation metrics, and report them as baselines. Despite the size of the dataset, some of our models train to convergence in less than a day on a single machine using the publicly available TensorFlow framework. We plan to release code for training a basic TensorFlow model and for computing metrics. We show that pre-training on large data generalizes to other datasets like Sports-1M and ActivityNet. We achieve state-of-the-art results on ActivityNet, improving mAP from 53.8% to 77.8%. We hope that the unprecedented scale and diversity of YouTube-8M will lead to advances in video understanding and representation learning.
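    A common video-level baseline on features like these simply mean-pools the 1-fps frame features and applies independent logistic models per label. The sketch below illustrates that aggregation under assumed shapes; the feature dimension, label count, and parameter names are illustrative, not the released starter code.

        import numpy as np

        def video_level_feature(frame_features):
            # frame_features: (num_frames, D) CNN features sampled at one frame per second.
            # Mean pooling is the simplest frame-aggregation baseline.
            return frame_features.mean(axis=0)

        def predict_labels(video_feature, weights, bias):
            # Independent (one-vs-all) logistic models over K labels.
            logits = weights @ video_feature + bias   # (K,)
            return 1.0 / (1.0 + np.exp(-logits))

        # Toy usage: a 300-frame video with 1024-d frame features and 10 labels.
        rng = np.random.default_rng(2)
        frames = rng.standard_normal((300, 1024))
        w, b = 0.01 * rng.standard_normal((10, 1024)), np.zeros(10)
        print(predict_labels(video_level_feature(frames), w, b))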
    Efficient Large Scale Video Classification
    Balakrishnan Varadarajan
    arXiv (2015)
    Abstract: Video classification has advanced tremendously over recent years. A large part of the improvement has come from the image classification community and the use of deep convolutional networks (CNNs), which produce results competitive with hand-crafted motion features. These networks have been adapted to use video frames in various ways and have yielded state-of-the-art classification results. We present two methods that build on this work and scale it up to millions of videos and hundreds of thousands of classes while maintaining a low computational cost. In the context of large-scale video processing, training CNNs on video frames is extremely time consuming due to the large number of frames involved. We avoid this problem by training CNNs on either YouTube thumbnails or Flickr images, and then using these networks' outputs as features for higher-level classifiers. We discuss the challenges of achieving this and propose two models, one for frame-level and one for video-level classification: the first is a highly efficient mixture of experts, while the second is based on long short-term memory (LSTM) networks. We present results on the Sports-1M video dataset (1 million videos, 487 classes) and on a new dataset with 12 million videos and 150,000 labels.
    Tracking Large-Scale Video Remix in Real-World Events
    Lexing Xie
    Xuming He
    John R. Kender
    Matthew L. Hill
    John R. Smith
    IEEE Transactions on Multimedia, vol. 15, no. 6 (2013), pp. 1244-1254
    Abstract: Content sharing networks, such as YouTube, contain traces of both explicit online interactions (such as likes, comments, or subscriptions) and latent interactions (such as quoting, or remixing, parts of a video). We propose visual memes, or frequently re-posted short video segments, for detecting and monitoring such latent video interactions at scale. We develop scalable detection algorithms that extract visual memes with high accuracy. We further augment visual memes with text, via a statistical model of latent topics. We model content interactions on YouTube with visual memes, defining several measures of influence and building predictive models for meme popularity. Experiments are carried out with over 2 million video shots from more than 40,000 videos on two prominent news events in 2009: the election in Iran and the swine flu epidemic. In these two events, a high percentage of videos contain remixed content, and it is apparent that traditional news media and citizen journalists have different roles in disseminating remixed content. We perform two quantitative evaluations for annotating visual memes and predicting their popularity. The proposed joint statistical model of visual memes and words outperforms an alternative concurrence model, with an average error of 2% for predicting meme volume and 17% for predicting meme lifespan.
    Multimedia Semantics: Interactions Between Content and Community
    Hari Sundaram
    Lexing Xie
    Munmun De Choudhury
    Yu-Ru Lin
    Proceedings of the IEEE, vol. 100, no. 9 (2012)
    Abstract: This paper reviews the state of the art and some emerging issues in research areas related to pattern analysis and monitoring of web-based social communities. This research area is important for several reasons. First, the presence of near-ubiquitous low-cost computing and communication technologies has enabled people to access and share information at an unprecedented scale. The scale of the data necessitates new research for making sense of such content. Furthermore, popular websites with sophisticated media sharing and notification features allow users to stay in touch with friends and loved ones; these sites also help to form explicit and implicit social groups. These social groups are an important source of information to organize and to manage multimedia data. In this article, we study how media-rich social networks provide additional insight into familiar multimedia research problems, including tagging and video ranking. In particular, we advance the idea that the contextual and social aspects of media are as important for successful multimedia applications as is the media content. We examine the interrelationship between content and social context through the prism of three key questions. First, how do we extract the context in which social interactions occur? Second, does social interaction provide value to the media object? Finally, how do social media facilitate the repurposing of shared content and engender cultural memes? We present three case studies to examine these questions in detail. In the first case study, we show how to discover structure latent in the social media data, and use the discovered structure to organize Flickr photo streams. In the second case study, we discuss how to determine the interestingness of conversations, and of participants, around videos uploaded to YouTube. Finally, we show how the analysis of visual content, in particular tracing of content remixes, can help us understand the relationship among YouTube participants. For each case, we present an overview of recent work and review the state of the art. We also discuss two emerging issues related to the analysis of social networks: robust data sampling and scalable data analysis.
    Scene Aligned Pooling for Complex Video Recognition
    Liangliang Cao
    Yadong Mu
    Shih-Fu Chang
    Gang Hua
    John R. Smith
    ECCV (2012), pp. 688-701
    Abstract: Real-world videos often contain dynamic backgrounds and evolving human activities, especially web videos generated by users in unconstrained scenarios. This paper proposes a new visual representation, namely scene aligned pooling, for the task of event recognition in complex videos. Based on the observation that a video clip is often composed of shots of different scenes, the key idea of scene aligned pooling is to decompose any video features into concurrent scene components, and to construct classification models adaptive to different scenes. Experiments on two large-scale real-world datasets, the TRECVID Multimedia Event Detection 2011 dataset and the Human Motion Recognition Database (HMDB), show that our new visual representation can consistently improve, by a significant margin, various kinds of visual features: low-level color and texture features, mid-level histograms of local descriptors such as SIFT or space-time interest points, and high-level semantic model features. For example, we improve the state-of-the-art accuracy on the HMDB dataset by 20%.
    Video Event Detection Using Temporal Pyramids of Visual Semantics with Kernel Optimization and Model Subspace Boosting
    Noel C. F. Codella
    Gang Hua
    Matthew L. Hill
    Liangliang Cao
    Leiguang Gong
    John R. Smith
    ICME (2012), pp. 747-752
    Social Media Use by Government: From the Routine to the Critical
    Andrea Kavanaugh
    Edward A. Fox
    Stephen Sheetz
    Seungwon Yang
    Lin Tzy Li
    Donald Shoemaker
    Lexing Xie
    Government Information Quarterly, vol. 29, no. 4 (2012), pp. 480-491
    Semantic Model Vectors for Complex Video Event Recognition
    Michele Merler
    Bert Huang
    Lexing Xie
    Gang Hua
    IEEE Transactions on Multimedia, vol. 14 (2012), pp. 88-101
    Visual memes in social media: tracking real-world news in YouTube videos
    Lexing Xie
    John R. Kender
    Matthew L. Hill
    John R. Smith
    ACM Multimedia (2011), pp. 53-62
    Towards large scale land-cover recognition of satellite images
    Noel C.F. Codella
    Gang Hua
    John R. Smith
    Intl. Conference on Information, Communications and Signal Processing (ICICS) (2011), pp. 1-5
    Image modality classification: a late fusion method based on confidence indicator and closeness matrix
    Xingzhi Sun
    Leiguang Gong
    Xiaofei Teng
    Li Tian
    Tao Wang
    Yue Pan
    ICMR (2011), pp. 55
    Tracking Visual Memes in Rich-Media Social Communities
    Lexing Xie
    John R. Kender
    Matthew L. Hill
    John R. Smith
    ICWSM (2011)
    Social media use by government: from the routine to the critical
    Andrea Kavanaugh
    Edward A. Fox
    Stephen Sheetz
    Seungwon Yang
    Lin Tzy Li
    Travis Whalen
    Donald Shoemaker
    Lexing Xie
    ACM Digital Government Conference, College Park, MD, USA (2011)
    IBM Research and Columbia University TRECVID-2011 Multimedia Event Detection (MED) System
    Liangliang Cao
    Shih-Fu Chang
    Noel C. F. Codella
    Courtenay Cotton
    Dan Ellis
    Leiguang Gong
    Matthew Hill
    Gang Hua
    John R. Kender
    Michele Merler
    Yadong Mu
    John R. Smith
    NIST TRECVID Workshop (2011)
    Probabilistic visual concept trees
    Lexing Xie
    Rong Yan
    Jelena Tesic
    John R. Smith
    ACM Multimedia (2010), pp. 867-870
    Multimedia semantics: opportunities and challenges
    Multimedia Information Retrieval (2010), pp. 9-10
    Design and evaluation of an effective and efficient video copy detection system
    Matthew L. Hill
    John R. Smith
    ICME (2010), pp. 1353-1358
    IBM Research TRECVID-2010 Video Copy Detection and Multimedia Event Detection System
    Matthew L. Hill
    Gang Hua
    John R. Smith
    Lexing Xie
    Bert Huang
    Michele Merler
    Hua Ouyang
    Mingyuan Zhou
    TRECVID (2010)
    The accuracy and value of machine-generated image tags: design and user evaluation of an end-to-end image tagging system
    Lexing Xie
    Matthew L. Hill
    John R. Smith
    Alex Phillips
    CIVR (2010), pp. 58-65
    Video genetics: a case study from YouTube
    John R. Kender
    Matthew L. Hill
    John R. Smith
    Lexing Xie
    ACM Multimedia (2010), pp. 1253-1258
    Large-scale multimedia semantic concept modeling using robust subspace bagging and MapReduce
    Rong Yan
    Marc-Olivier Fleury
    Michele Merler
    John R. Smith
    ACM workshop on Large-scale multimedia retrieval and mining (LS-MMRM) (2009), pp. 35-42
    Evaluating application mapping scenarios on the Cell/B.E.
    Ana Lucia Varbanescu
    Henk J. Sips
    Kenneth A. Ross
    Qiang Liu
    John R. Smith
    Lurng-Kuo Liu
    Concurrency and Computation: Practice and Experience, vol. 21 (2009), pp. 85-100
    Formal Models and Hybrid Approaches for Efficient Manual Image Annotation and Retrieval
    Rong Yan
    Murray Campbell
    Semantic Mining Technologies for Multimedia Databases (2009), pp. 272-297
    Hybrid Tagging and Browsing Approaches for Efficient Manual Image Annotation
    Rong Yan
    Murray Campbell
    IEEE MultiMedia, vol. 16 (2009), pp. 26-41
    IBM Research TRECVID-2009 Video Retrieval System
    Shenghua Bao
    Jane Chang
    Matthew Hill
    Michele Merler
    John R. Smith
    Dong Wang
    Lexing Xie
    Rong Yan
    Yi Zhang
    NIST TRECVID Workshop (2009)
    A learning-based hybrid tagging and browsing approach for efficient manual image annotation
    Rong Yan
    Murray Campbell
    CVPR (2008)
    Multi-query interactive image and video retrieval: theory and practice
    Rong Yan
    Murray Campbell
    CIVR (2008), pp. 475-484
    Query-Adaptive Fusion for Multimodal Search
    Lyndon S. Kennedy
    Shih-Fu Chang
    Proceedings of the IEEE, vol. 96, no. 4 (2008)
    IBM Research TRECVID-2008 Video Retrieval System
    John R. Smith
    Jelena Tesic
    Lexing Xie
    Rong Yan
    Wei Jiang
    Michele Merler
    TRECVID (2008)
    IBM multimedia analysis and retrieval system
    John R. Smith
    Jelena Tesic
    Lexing Xie
    Rong Yan
    CIVR (2008), pp. 553-554
    Web-based information content and its application to concept-based video retrieval
    Alexander Haubold
    CIVR (2008), pp. 437-446
    Data Modeling Strategies for Imbalanced Learning in Visual Search
    Jelena Tesic
    Lexing Xie
    John R. Smith
    ICME (2007), pp. 1990-1993
    Dynamic Multimodal Fusion in Video Search
    Lexing Xie
    Jelena Tesic
    ICME (2007), pp. 1499-1502
    Semantics reinforcement and fusion learning for multimedia streams
    Dhiraj Joshi
    Milind R. Naphade
    CIVR (2007), pp. 309-316
    Digital Media Indexing on the Cell Processor
    Lurng-Kuo Liu
    Qiang Liu
    Kenneth A. Ross
    John R. Smith
    Ana Lucia Varbanescu
    ICME (2007), pp. 1866-1869
    An Effective Strategy for Porting C++ Applications on Cell
    Ana Lucia Varbanescu
    Henk J. Sips
    Kenneth A. Ross
    Qiang Liu
    Lurng-Kuo Liu
    John R. Smith
    ICPP (2007), pp. 59
    An efficient manual image annotation approach based on tagging and browsing
    Rong Yan
    Murray Campbell
    ACM Multimedia Workshop on the many faces of multimedia semantics (2007), pp. 13-20
    IBM multimodal interactive video threading (demo)
    Jelena Tesic
    Joachim Seidl
    John R. Smith
    CIVR (2007), pp. 124-126
    IBM Research TRECVID-2007 Video Retrieval System
    Murray Campbell
    Alexander Haubold
    Ming Liu
    John R. Smith
    Jelena Tesic
    Lexing Xie
    Rong Yan
    Jun Yang
    TRECVID (2007)
    IBM multimedia search and retrieval system (demo)
    Jelena Tesic
    Lexing Xie
    Rong Yan
    John R. Smith
    CIVR (2007), pp. 645
    Cluster-based data modeling for semantic video search
    Jelena Tesic
    John R. Smith
    CIVR (2007), pp. 595-602
    A Greedy Performance Driven Algorithm for Decision Fusion Learning
    Dhiraj Joshi
    Milind R. Naphade
    ICIP (6) (2007), pp. 25-28
    Semantic concept-based query expansion and re-ranking for multimedia retrieval
    Alexander Haubold
    Jelena Tesic
    Lexing Xie
    Rong Yan
    ACM Multimedia (2007), pp. 991-1000
    IBM research TRECVID-2006 video retrieval system
    Murray Campbell
    Alexander Haubold
    Shahram Ebadollahi
    Milind R. Naphade
    John R. Smith
    Jelena Tesic
    Lexing Xie
    NIST TRECVID Workshop (2006)
    Assessing the Filtering and Browsing Utility of Automatic Semantic Concepts for Multimedia Retrieval
    Michael G. Christel
    Milind R. Naphade
    Jelena Tesic
    CVPR'06 Workshop on Semantic Learning Applications in Multimedia (SLAM) (2006), pp. 117
    Exploring Automatic Query Refinement for Text-Based Video Retrieval
    Timo Volkmer
    ICME (2006), pp. 765-768
    Multimodal Search for Effective Video Retrieval (demo)
    CIVR (2006), pp. 525-528
    Semantic Multimedia Retrieval using Lexical Query Expansion and Model-Based Reranking
    Alexander Haubold
    Milind R. Naphade
    ICME (2006), pp. 1761-1764
    IBM research TRECVID-2005 video retrieval system
    Arnon Amir
    J. Argillander
    Murray Campbell
    Alexander Haubold
    Giri Iyengar
    Shahram Ebadollahi
    F. Kang
    Milind R. Naphade
    John R. Smith
    Jelena Tesic
    Timo Volkmer
    NIST TRECVID Workshop (2005)
    Learning and classification of semantic concepts in broadcast video
    John R. Smith
    Murray Campbell
    Milind R. Naphade
    Jelena Tesic
    International Conference of Intelligence Analysis (2005)
    Automatic discovery of query-class-dependent models for multimodal search
    Lyndon S. Kennedy
    Shih-Fu Chang
    ACM Multimedia (2005), pp. 882-891
    A web-based system for collaborative annotation of large image and video collections: an evaluation and user study
    Timo Volkmer
    John R. Smith
    ACM Multimedia (2005), pp. 892-901
    Multimedia Research Challenges for Industry
    John R. Smith
    Milind R. Naphade
    Jelena Tesic
    CIVR (2005), pp. 28-37
    Learning the semantics of multimedia queries and concepts from a small number of examples
    Milind R. Naphade
    Jelena Tesic
    ACM Multimedia (2005), pp. 598-607
    Multi-granular detection of regional semantic concepts
    Milind R. Naphade
    Ching-Yung Lin
    John R. Smith
    ICME (2004), pp. 109-112
    Over-complete representation and fusion for semantic concept detection
    Milind R. Naphade
    John R. Smith
    ICIP (2004), pp. 2375-2378
    Semantic representation: search and mining of multimedia content
    Milind R. Naphade
    John R. Smith
    KDD (2004), pp. 641-646
    Content transcoding middleware for pervasive geospatial intelligence access
    Ching-Yung Lin
    Belle L. Tseng
    Matthew Hill
    John R. Smith
    Chung-Sheng Li
    ICME (2004), pp. 2139-2142
    WALRUS: A Similarity Retrieval Algorithm for Image Databases
    Rajeev Rastogi
    Kyuseok Shim
    IEEE Trans. Knowl. Data Eng., vol. 16 (2004), pp. 301-316
    Validity-weighted model vector-based retrieval of video
    John R. Smith
    Ching-Yung Lin
    Milind R. Naphade
    Belle L. Tseng
    Storage and Retrieval Methods and Applications for Multimedia (2004), pp. 271-279
    Multimodal video search techniques: late fusion of speech-based retrieval and visual content-based retrieval
    Arnon Amir
    Giri Iyengar
    Ching-Yung Lin
    Milind R. Naphade
    Chalapathy Neti
    Harriet J. Nock
    John R. Smith
    Belle Tseng
    Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP) (2004), pp. 1048-1051
    Multisource Video Clustering Using Semantic Model Vectors
    John R. Smith
    Ching-Yung Lin
    Milind R. Naphade
    Multimedia Information Retrieval, AIDA Informazioni (2004)
    IBM research TRECVID-2004 video retrieval system
    Arnon Amir
    J. Argillander
    M. Berg
    Shih-Fu Chang
    M. Franz
    Winston Hsu
    Giri Iyengar
    John R. Kender
    Lyndon S. Kennedy
    Ching-Yung Lin
    Milind R. Naphade
    John R. Smith
    Jelena Tesic
    Gang Wu
    Rong Yan
    Dongqing Zhang
    NIST TRECVID Workshop (2004)
    Lexicon design for semantic indexing in media databases
    Milind R. Naphade
    John R. Smith
    International Conference on Communication Technologies and Programming (2003)
    Interactive search fusion methods for video database retrieval
    John R. Smith
    Alejandro Jaimes
    Ching-Yung Lin
    Milind R. Naphade
    Belle L. Tseng
    ICIP (1) (2003), pp. 741-744
    MPEG-7 video automatic labeling system (demo)
    Ching-Yung Lin
    Belle L. Tseng
    Milind R. Naphade
    John R. Smith
    ACM Multimedia (2003), pp. 98-99
    IBM Research TRECVID-2003 Video Retrieval System
    Arnon Amir
    Marco Berg
    Shih-Fu Chang
    Winston Hsu
    Giridharan Iyengar
    Ching-Yung Lin
    Milind R. Naphade
    Chalapathy Neti
    Harriet Nock
    John R. Smith
    Belle L. Tseng
    Yi Wu
    Dongqing Zhang
    NIST TRECVID Workshop (2003)
    Multimedia semantic indexing using model vectors
    John R. Smith
    Milind R. Naphade
    ICME (2003), pp. 445-448
    Normalized classifier fusion for semantic visual concept detection
    Belle L. Tseng
    Ching-Yung Lin
    Milind R. Naphade
    John R. Smith
    ICIP (2) (2003), pp. 535-538
    A framework for moderate vocabulary semantic visual concept detection
    Milind R. Naphade
    Ching-Yung Lin
    Belle L. Tseng
    John R. Smith
    ICME (2003), pp. 437-440
    Statistical Techniques for Video Analysis and Searching
    John R. Smith
    Ching-Yung Lin
    Milind R. Naphade
    Belle L. Tseng
    Video Mining, Kluwer Academic Publishers (2003)
    New anchor selection methods for image retrieval
    John R. Smith
    Storage and Retrieval for Media Databases (2003), pp. 474-481
    Exploring semantic dependencies for scalable concept detection
    Milind R. Naphade
    John R. Smith
    ICIP (3) (2003), pp. 625-628
    Active selection for multi-example querying by content
    John R. Smith
    ICME (2003), pp. 445-448
    VideoAL: a novel end-to-end MPEG-7 video automatic labeling system
    Ching-Yung Lin
    Belle L. Tseng
    Milind R. Naphade
    John R. Smith
    ICIP (3) (2003), pp. 53-56
    User-trainable video annotation using multimodal cues
    Ching-Yung Lin
    Milind R. Naphade
    Chalapathy Neti
    John R. Smith
    Belle L. Tseng
    Harriet J. Nock
    W. Adams
    SIGIR (2003), pp. 403-404
    Aggregate Predicate Support in DBMS
    Gene Y. C. Fuh
    Weidong Chen
    Chi-Huang Chiu
    Jeffrey Scott Vitter
    Australasian Database Conference (2002)
    A study of image retrieval by anchoring
    John R. Smith
    ICME (2002)
    IBM Research TREC 2002 Video Retrieval System
    Bill Adams
    Giridharan Iyengar
    Chalapathy Neti
    Harriet J. Nock
    Arnon Amir
    Haim H. Permuter
    Savitha Srinivasan
    Chitra Dorai
    Alejandro Jaimes
    Christian A. Lang
    Ching-Yung Lin
    Milind R. Naphade
    John R. Smith
    Belle L. Tseng
    Sugata Ghosal
    Raghavendra Singh
    T. V. Ashwin
    DongQing Zhang
    TREC (2002)
    Spatial and feature normalization for content-based retrieval
    John R. Smith
    ICME (2002), pp. 193-196
    CAMEL: concept annotated image libraries
    Atul Chadha
    Basuki Soetarman
    Jeffrey Scott Vitter
    Storage and Retrieval for Media Databases (2001), pp. 62-73
    Supporting Incremental Join Queries on Ranked Inputs
    Yuan-Chi Chang
    John R. Smith
    Chung-Sheng Li
    Jeffrey Scott Vitter
    VLDB (2001), pp. 281-290
    Constrained querying of multimedia databases: issues and approaches
    John R. Smith
    Yuan-Chi Chang
    Chung-Sheng Li
    Jeffrey Scott Vitter
    Storage and Retrieval for Media Databases (2001), pp. 74-85
    Text compression via alphabet re-representation
    Philip M. Long
    Jeffrey Scott Vitter
    Neural Networks, vol. 12 (1999), pp. 755-765
    WALRUS: A Similarity Retrieval Algorithm for Image Databases
    Rajeev Rastogi
    Kyuseok Shim
    SIGMOD Conference (1999), pp. 395-406
    Text Compression Via Alphabet Re-Representation
    Philip M. Long
    Jeffrey Scott Vitter
    Data Compression Conference (1997), pp. 161-170