Nick Johnston
Authored Publications
The rate-distortion performance of neural image compression models has exceeded the state-of-the-art of non-learned codecs, but neural codecs are still far from widespread deployment and adoption. The largest obstacle is having efficient models that are feasible on a wide variety of consumer hardware. Comparative research and evaluation is difficult because of the lack of standard benchmarking platforms and because of variations in hardware architectures and test environments. Through our rate-distortion-computation (RDC) study, we demonstrate that neither floating-point operations (FLOPs) nor runtime are sufficient on their own to accurately rank neural compression methods. We also explore the RDC frontier, which leads to a family of model architectures with the best empirical trade-off between computational requirements and RD performance. Finally, we identify a novel neural compression architecture that yields state-of-the-art RD performance with rate savings of 23.1% over BPG (7.0% over VTM and 3.0% over ELIC) without requiring significantly more FLOPs than other learning-based codecs.
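Rate savings such as "23.1% over BPG" are conventionally reported as Bjøntegaard-delta (BD) rate: fit each codec's rate-distortion curve, then average the log-rate gap over the shared quality range. The sketch below is a minimal illustration of that computation, not the paper's evaluation code; the cubic log-rate fit and the example RD points are assumptions.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Approximate Bjontegaard-delta rate: average percent bitrate change of
    the test codec vs. the anchor over their overlapping quality range.
    Negative values mean the test codec saves bits."""
    lr_a = np.log(np.asarray(rate_anchor, dtype=float))
    lr_t = np.log(np.asarray(rate_test, dtype=float))
    # Fit cubic polynomials mapping quality -> log-rate for each codec.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Integrate both fits over the shared quality interval.
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0
```

A codec that hits the same PSNR at half the bitrate everywhere comes out at -50% by this measure.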
Compression is essential to storing and transmitting medical videos, but the effect of compression artifacts on downstream medical tasks is often ignored. Furthermore, systems in practice rely on standard video codecs, which naively allocate bits evenly between medically interesting and uninteresting frames and parts of frames. In this work, we present an empirical study of some deficiencies of classical codecs on gastroenterology videos, and motivate our ongoing work to train a learned compression model for colonoscopy videos, which we call "GastroEnterology Aware Compression" (GEAC). We show that H.264 and HEVC, two of the most common classical codecs, perform worse on the most medically-relevant frames. We also show that polyp detector performance degrades rapidly as compression increases, and explain why a learned compressor would degrade more gracefully. Many of our proposed techniques generalize to medical video domains beyond gastroenterology.
Neural Video Compression using GANs for Detail Synthesis and Propagation
European Conference on Computer Vision (2022)
We present the first neural video compression method based on generative adversarial networks (GANs). Our approach significantly outperforms previous neural and non-neural video compression methods in a user study, setting a new state-of-the-art in visual quality for neural methods. We show that the GAN loss is crucial to obtain this high visual quality. Two components make the GAN loss effective: we i) synthesize detail by conditioning the generator on a latent extracted from the warped previous reconstruction to then ii) propagate this detail with high-quality flow. We find that user studies are required to compare methods, i.e., none of our quantitative metrics were able to predict all studies. We present the network design choices in detail, and ablate them with user studies.
Nonlinear Transform Coding
Philip A. Chou
Sung Jin Hwang
IEEE Journal of Selected Topics in Signal Processing, vol. 15 (2021)
We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate–distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate–distortion performance of NTC with the help of simple example sources, for which the optimal performance of a vector quantizer is easier to estimate than with natural data sources. To this end, we introduce a novel variant of entropy-constrained vector quantization. We provide an analysis of various forms of stochastic optimization techniques for NTC models; review architectures of transforms based on artificial neural networks, as well as learned entropy models; and provide a direct comparison of a number of methods to parameterize the rate–distortion trade-off of nonlinear transforms, introducing a simplified one.
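NTC models are trained by minimizing a rate-distortion Lagrangian L = R + λD through a learned nonlinear transform. As a minimal sketch (not the paper's models), the snippet below scores latents under rounding quantization with a discretized-logistic entropy model; the scale s and weight λ are arbitrary assumptions.

```python
import numpy as np

def logistic_pmf(q, s=1.0):
    """Probability of integer symbol q under a discretized logistic:
    CDF differences over the bin [q - 0.5, q + 0.5], all in closed form."""
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    return sig((q + 0.5) / s) - sig((q - 0.5) / s)

def rd_loss(y, lam=0.1, s=1.0):
    """Rate-distortion Lagrangian L = R + lambda * D for latents y under
    rounding quantization; D is the mean squared quantization error."""
    q = np.round(y)
    rate = -np.log2(np.maximum(logistic_pmf(q, s), 1e-12)).mean()  # bits/symbol
    dist = ((y - q) ** 2).mean()
    return rate + lam * dist
```

Sweeping λ traces out a rate-distortion curve; in a real NTC model the transform and the entropy model's parameters are learned jointly against this objective.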
End-to-end Learning of Compressible Features
Abhinav Shrivastava
2020 IEEE Int. Conf. on Image Processing (ICIP)
Pre-trained convolutional neural networks (CNNs) are very powerful as off-the-shelf feature generators and have been shown to perform very well on a variety of tasks. Unfortunately, the generated features are high-dimensional and expensive to store: potentially hundreds of thousands of floats per example when processing videos. Traditional entropy-based lossless compression methods are of little help, as they do not yield the desired level of compression, while general-purpose lossy alternatives (e.g. dimensionality reduction techniques) are sub-optimal, as they end up losing important information. We propose a learned method that jointly optimizes for compressibility along with the original objective for learning the features. The plug-in nature of our method makes it straightforward to integrate with any target objective and trade off against compressibility. We present results on multiple benchmarks and demonstrate that features learned by our method maintain their informativeness while being an order of magnitude more compressible.
Scale-Space Flow for End-to-End Optimized Video Compression
Sung Jin Hwang
2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR)
Despite considerable progress on end-to-end optimized deep networks for image compression, video coding remains a challenging task. Recently proposed methods for learned video compression use optical flow and bilinear warping for motion compensation and show competitive rate-distortion performance relative to hand-engineered codecs like H.264 and HEVC. However, these learning-based methods rely on complex architectures and training schemes including the use of pre-trained optical flow networks, sequential training of sub-networks, adaptive rate control, and buffering intermediate reconstructions to disk during training. In this paper, we show that a generalized warping operator that better handles common failure cases, e.g. disocclusions and fast motion, can provide competitive compression results with a greatly simplified model and training procedure. Specifically, we propose scale-space flow, an intuitive generalization of optical flow that adds a scale parameter to allow the network to better model uncertainty. Our experiments show that a low-latency video compression model (no B-frames) using scale-space flow for motion compensation can outperform analogous state-of-the-art learned video compression models while being trained using a much simpler procedure and without any pre-trained optical flow networks.
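The scale-space flow idea can be sketched in a few lines: blur the reference frame at several scales, then let each output pixel sample that stack at a displaced location and a chosen blur level, so the model can hedge with blur where motion is uncertain. This toy version uses nearest-neighbor lookups and a fixed, made-up scale set; the paper's operator is a learned, trilinearly interpolated version.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding; identity at sigma=0."""
    if sigma == 0:
        return img.copy()
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()
    pad = np.pad(img, r, mode="reflect")
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, "valid"), 0, pad)
    return np.apply_along_axis(lambda v: np.convolve(v, k, "valid"), 1, tmp)

def scale_space_warp(frame, flow):
    """Warp `frame` with a per-pixel scale-space flow (vx, vy, sigma).
    sigma selects (nearest-neighbor here) a blur level from a small
    scale-space volume, modelling motion uncertainty as local blur."""
    sigmas = [0.0, 1.0, 2.0, 4.0]  # illustrative scale levels
    volume = np.stack([gaussian_blur(frame, s) for s in sigmas])
    h, w = frame.shape
    out = np.empty_like(frame)
    for y in range(h):
        for x in range(w):
            vx, vy, sg = flow[y, x]
            sx = int(np.clip(round(x + vx), 0, w - 1))
            sy = int(np.clip(round(y + vy), 0, h - 1))
            k = int(np.argmin([abs(sg - s) for s in sigmas]))
            out[y, x] = volume[k, sy, sx]
    return out
```

With zero flow and zero scale the operator reduces to the identity; with a large scale it returns a blurred prediction, which is exactly the graceful fallback for disocclusions and fast motion described above.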
We consider the problem of using variational latent-variable models for data compression. For such models to produce a compressed binary sequence, which is the universal data representation in a digital world, the latent representation needs to be subjected to entropy coding. Range coding as an entropy coding technique is optimal, but it can fail catastrophically if the computation of the prior differs even slightly between the sending and the receiving side. Unfortunately, this is a common scenario when floating point math is used and the sender and receiver operate on different hardware or software platforms, as numerical round-off is often platform dependent. We propose using integer networks as a universal solution to this problem, and demonstrate that they enable reliable cross-platform encoding and decoding of images using variational models.
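The failure mode is easy to picture: a range coder needs the encoder and decoder to derive exactly the same probability table, and a one-ulp floating-point difference can shift a cumulative count. The sketch below is a hypothetical illustration of deriving a CDF with integer-only arithmetic (it is not the paper's integer-network construction): given the same small integer "logits", any platform produces a bit-identical table.

```python
import numpy as np

def integer_cdf(logits_q, total=1 << 16):
    """Integer-only CDF table for a range coder. logits_q are small
    non-negative ints (e.g. quantized network outputs); every operation is
    exact integer math, so sender and receiver agree bit-for-bit."""
    weights = [1 << int(l) for l in logits_q]  # exact powers of two
    s = sum(weights)
    n = len(weights)
    # Reserve one count per symbol so no probability underflows to zero.
    pmf = [1 + (w * (total - n)) // s for w in weights]
    pmf[-1] += total - sum(pmf)                # force the counts to sum exactly
    return np.cumsum([0] + pmf)
```

A float softmax in the same place could round differently on sender and receiver hardware, silently corrupting every symbol decoded after the first mismatch.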
Image compression using neural networks has reached or exceeded non-neural methods (such as JPEG, WebP, and BPG). While these networks are state of the art in rate-distortion performance, the computational feasibility of these models remains a challenge. Our work provides three novel contributions. We propose a run-time improvement to the Generalized Divisive Normalization formulation, a regularization technique targeted at optimizing neural image decoders, and an analysis of the trade-offs in 207 architecture variations across multiple distortion loss functions to recommend an architecture that is twice as fast while maintaining state-of-the-art image compression performance.
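For reference, the Generalized Divisive Normalization transform that the run-time improvement targets divides each channel by a learned norm pooled across channels at the same spatial position. A plain NumPy version (shapes and parameter values here are illustrative assumptions, not the paper's reformulation):

```python
import numpy as np

def gdn(x, beta, gamma):
    """Generalized Divisive Normalization.
    x: (C, H, W) activations; beta: (C,) bias; gamma: (C, C) weights.
    y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j^2), pooled per pixel."""
    norm = np.sqrt(beta[:, None, None] + np.einsum("ij,jhw->ihw", gamma, x ** 2))
    return x / norm

def igdn(x, beta, gamma):
    """Inverse GDN, used on the decoder side: multiply instead of divide."""
    norm = np.sqrt(beta[:, None, None] + np.einsum("ij,jhw->ihw", gamma, x ** 2))
    return x * norm
```

The per-pixel matrix product over channels is the expensive part, which is why decoder-side simplifications of this operation pay off directly in runtime.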
Neural Image Decompression: Learning to Render Better Image Previews
Michele Covell
2019 IEEE International Conference on Image Processing, IEEE
A rapidly increasing portion of Internet traffic is dominated by requests from mobile devices with limited- and metered-bandwidth constraints. To satisfy these requests, it has become standard practice for websites to transmit small and extremely compressed image previews as part of the initial page-load process. Recent work, based on an adaptive triangulation of the target image, has shown the ability to generate thumbnails of full images at extreme compression rates: 200 bytes or less with impressive gains (in terms of PSNR and SSIM) over both JPEG and WebP standards. However, qualitative assessments and preservation of semantic content can be less favorable. We present a novel method to significantly improve the reconstruction quality of the original image with no changes to the encoded information. Our neural-based decoding not only achieves higher PSNR and SSIM scores than the original methods, but also yields a substantial increase in semantic-level content preservation. In addition, by keeping the same encoding stream, our solution is completely inter-operable with the original decoder. The end result is suitable for a range of small-device deployments, as it involves only a single forward-pass through a small, scalable network.
Variational Image Compression with a Scale Hyperprior
Sung Jin Hwang
6th Int. Conf. on Learning Representations (ICLR) (2018)
We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate–distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.
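One way to see what the scale hyperprior buys: the side information predicts a per-element Gaussian scale, and each rounded latent is coded under a discretized Gaussian of that scale, so well-predicted latents become nearly free. A minimal rate estimate under that kind of model (a sketch under stated assumptions, not the paper's implementation):

```python
import math
import numpy as np

def gaussian_cdf(x, sigma):
    """CDF of a zero-mean Gaussian with scale sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def bits_for_symbols(y_hat, sigma):
    """Estimated bits to code rounded latents y_hat, each under a zero-mean
    discretized Gaussian whose scale comes from the hyperprior."""
    bits = 0.0
    for q, s in zip(y_hat.ravel(), sigma.ravel()):
        p = gaussian_cdf(q + 0.5, s) - gaussian_cdf(q - 0.5, s)
        bits += -math.log2(max(p, 1e-12))
    return bits
```

The better the hyperprior's scale predictions match the latents, the lower this sum, which is the mechanism behind the rate savings the abstract reports.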
Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates in Recurrent Convolutional Neural Networks
Damien Vincent
Michele Covell
Sung Jin Hwang
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
We propose a method for lossy image compression based on recurrent, convolutional neural networks that outperforms BPG (4:2:0), WebP, JPEG2000, and JPEG as measured by MS-SSIM. We introduce three improvements over previous research that lead to this state-of-the-art result using a single model. First, we show that training with a pixel-wise loss weighted by SSIM increases reconstruction quality according to several metrics. Second, we modify the recurrent architecture to improve spatial diffusion, which allows the network to more effectively capture and propagate image information through the network's hidden state. Finally, in addition to lossless entropy coding, we use a spatially adaptive bit allocation algorithm to more efficiently use the limited number of bits to encode visually complex image regions. We evaluate our method on the Kodak and Tecnick image sets and compare against standard codecs as well as recently published methods based on deep neural networks.
Towards a Semantic Perceptual Image Metric
Sung Jin Hwang
Sergey Ioffe
Sean O'Malley
Charles Rosenberg
2018 25th IEEE Int. Conf. on Image Processing (ICIP)
We present a full reference, perceptual image metric based on VGG-16, an artificial neural network trained on object classification. We fit the metric to a new database based on 140k unique images annotated with ground truth by human raters who received minimal instruction. The resulting metric shows competitive performance on TID2013, a database widely used to assess image quality assessment methods. More interestingly, it shows strong responses to objects potentially carrying semantic relevance such as faces and text, which we demonstrate using a visualization technique and ablation experiments. In effect, the metric appears to model a higher influence of semantic context on judgements, which we observe particularly in untrained raters. As the vast majority of users of image processing systems are unfamiliar with Image Quality Assessment (IQA) tasks, these findings may have significant impact on real-world applications of perceptual metrics.
Spatially adaptive image compression using a tiled deep network
Michele Covell
Sung Jin Hwang
Damien Vincent
Proceedings of the International Conference on Image Processing (2017), pp. 2796-2800
Deep neural networks represent a powerful class of function approximators that can learn to compress and reconstruct images. Existing image compression algorithms based on neural networks learn quantized representations with a constant spatial bit rate across each image. While entropy coding introduces some spatial variation, traditional codecs have benefited significantly by explicitly adapting the bit rate based on local image complexity and visual saliency. This paper introduces an algorithm that combines deep neural networks with quality-sensitive bit rate adaptation using a tiled network. We demonstrate the importance of spatial context prediction and show improved quantitative (PSNR) and qualitative (subjective rater assessment) results compared to a non-adaptive baseline and a recently published image compression model based on fully-convolutional neural networks.
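The core of quality-sensitive bit rate adaptation can be illustrated with a toy allocator that splits a bit budget across tiles in proportion to local activity. Variance is a crude stand-in for the learned quality signal, and the tile size and budget are made-up parameters; this is a sketch of the idea, not the paper's tiled network.

```python
import numpy as np

def allocate_bits(image, tile=8, total_bits=4096):
    """Split `image` into tiles and allocate the bit budget in proportion
    to each tile's variance, so busy regions get more bits than flat ones."""
    h, w = image.shape
    var = np.array([[image[y:y + tile, x:x + tile].var()
                     for x in range(0, w, tile)]
                    for y in range(0, h, tile)])
    weights = var + 1e-6                       # keep flat tiles strictly > 0
    alloc = np.floor(total_bits * weights / weights.sum()).astype(int)
    alloc.flat[0] += total_bits - alloc.sum()  # hand rounding slack to one tile
    return alloc
```

A real codec replaces the variance heuristic with a learned saliency or reconstruction-error signal, but the budget-splitting structure is the same.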
Full Resolution Image Compression with Recurrent Neural Networks
Damien Vincent
Sung Jin Hwang
Michele Covell
arXiv (2016)
This paper presents a set of full-resolution lossy image compression methods based on neural networks. Each of the architectures we describe can provide variable compression rates during deployment without requiring retraining of the network: each network need only be trained once. All of our architectures consist of a recurrent neural network (RNN)-based encoder and decoder, a binarizer, and a neural network for entropy coding. We compare RNN types (LSTM, associative LSTM) and introduce a new hybrid of GRU and ResNet. We also study "one-shot" versus additive reconstruction architectures and introduce a new scaled-additive framework. We compare to previous work, showing improvements of 4.3%-8.8% AUC (area under the rate-distortion curve), depending on the perceptual metric used. As far as we know, this is the first neural network architecture that is able to outperform JPEG at image compression across most bitrates on the rate-distortion curve on the Kodak dataset images, with and without the aid of entropy coding.
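The 4.3%-8.8% improvements above are differences in area under the rate-distortion curve; a minimal sketch of that measurement (trapezoidal integration over sorted bitrates, with made-up example points):

```python
import numpy as np

def rd_auc(bpp, quality):
    """Area under a rate-distortion curve by the trapezoidal rule.
    A codec with higher quality at every bitrate has a larger area."""
    x = np.asarray(bpp, dtype=float)
    y = np.asarray(quality, dtype=float)
    order = np.argsort(x)
    x, y = x[order], y[order]
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))
```

Relative improvement between two codecs is then just the percentage change between their areas over the same bitrate range.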
What’s Cookin’? Interpreting Cooking Videos using Text, Speech and Vision
Jonathan Malmaud
Vivek Rathod
Andrew Rabinovich
North American Chapter of the Association for Computational Linguistics – Human Language Technologies (NAACL HLT 2015) (to appear)
We present a novel method for aligning a sequence of instructions to a video of someone carrying out a task. In particular, we focus on the cooking domain, where the instructions correspond to the recipe. Our technique relies on an HMM to align the recipe steps to the (automatically generated) speech transcript. We then refine this alignment using a state-of-the-art visual food detector, based on a deep convolutional neural network. We show that our technique outperforms simpler techniques based on keyword spotting. It also enables interesting applications, such as automatically illustrating recipes with keyframes, and searching within a video for events of interest.
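The HMM alignment step can be sketched with a tiny Viterbi pass: states are recipe steps, transcript segments are observations scored by word overlap, and transitions either stay on the current step or advance to the next one, enforcing a monotone alignment. Emission scores, example recipe, and transcript below are invented for illustration; the paper's model is richer.

```python
import numpy as np

def align(steps, transcript):
    """Monotonically align recipe steps to transcript segments via Viterbi.
    Returns, for each transcript segment, the index of the matched step."""
    S, T = len(steps), len(transcript)
    step_words = [set(s.lower().split()) for s in steps]

    def emit(i, t):
        # Emission score: word overlap between step i and segment t.
        return len(step_words[i] & set(transcript[t].lower().split()))

    NEG = -1e9
    dp = np.full((T, S), NEG)
    back = np.zeros((T, S), dtype=int)
    dp[0, 0] = emit(0, 0)                 # alignments start at the first step
    for t in range(1, T):
        for i in range(S):
            stay = dp[t - 1, i]
            adv = dp[t - 1, i - 1] if i > 0 else NEG
            prev, best = (i - 1, adv) if adv > stay else (i, stay)
            dp[t, i] = best + emit(i, t)
            back[t, i] = prev
    # Backtrack from the last step at the last segment.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return [int(i) for i in path[::-1]]
```

Keyword spotting alone matches each segment independently; the transition structure is what lets the HMM recover from segments with weak or misleading word overlap.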
Im2Calories: towards an automated mobile vision food diary
Austin Myers
Vivek Rathod
Anoop Korattikara
Alex Gorban
Nathan Silberman
George Papandreou
ICCV (2015)
We present a system which can recognize the contents of your meal from a single image, and then predict its nutritional contents, such as calories. The simplest version assumes that the user is eating at a restaurant for which we know the menu. In this case, we can collect images offline to train a multi-label classifier. At run time, we apply the classifier (running on your phone) to predict which foods are present in your meal, and we look up the corresponding nutritional facts. We apply this method to a new dataset of images from 23 different restaurants, using a CNN-based classifier, significantly outperforming previous work. The more challenging setting works outside of restaurants. In this case, we need to estimate the size of the foods, as well as their labels. This requires solving segmentation and depth/volume estimation from a single image. We present CNN-based approaches to these problems, with promising preliminary results.
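The restaurant setting reduces to multi-label prediction plus a menu lookup. A toy sketch of that last stage, with invented dish names, probabilities, calorie counts, and threshold (the classifier itself is out of scope here):

```python
def estimate_calories(probs, menu, threshold=0.5):
    """Threshold multi-label classifier outputs, then look up and sum the
    calories of the detected dishes from the known restaurant menu.
    probs: {dish: probability}; menu: {dish: {"calories": int}}."""
    present = [dish for dish, p in probs.items() if p >= threshold]
    total = sum(menu[dish]["calories"] for dish in present)
    return present, total
```

The out-of-restaurant setting is harder precisely because this lookup disappears: portion size must be estimated from segmentation and depth/volume, not read off a menu.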