Neural Architecture Search with Reinforcement Learning
Abstract
Neural networks are powerful and flexible models that work well for many difficult
learning tasks in image, speech and natural language understanding. Despite their
success, neural networks are still hard to design. In this paper, we use a
recurrent network to generate the model descriptions of neural networks and train
this RNN with reinforcement learning to maximize the expected accuracy of the
generated architectures on a validation set. On the CIFAR-10 dataset, our method,
starting from scratch, can design a novel network architecture that rivals the best
human-invented architecture in terms of test set accuracy. Our CIFAR-10 model
achieves a test error rate of 3.84 percent, which is only 0.1 percent worse and 1.2x
faster than the current state-of-the-art model. On the Penn Treebank dataset, our model
can compose a novel recurrent cell that outperforms the widely-used LSTM cell and
other state-of-the-art baselines. Our cell achieves a test set perplexity of 62.4
on the Penn Treebank, which is 3.6 perplexity better than the previous
state-of-the-art.
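
To make the search procedure described above concrete, the sketch below illustrates the core loop implied by the abstract: a controller samples architecture descriptions, each sample is scored on held-out data, and the controller is updated with a policy-gradient (REINFORCE) step to maximize the expected score. This is not the authors' implementation: the controller here is a simple per-position softmax rather than an RNN, and `evaluate_architecture` is a hypothetical stand-in for training a child network and measuring its validation accuracy.

```python
# Minimal sketch of RL-based architecture search (assumptions noted above).
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["conv3x3", "conv5x5", "maxpool", "relu", "tanh"]  # toy search space
SEQ_LEN = 4           # tokens per sampled architecture description
LR = 0.1              # controller learning rate
BASELINE_DECAY = 0.9  # moving-average baseline to reduce gradient variance

# Controller parameters: independent logits per position. (In the paper the
# controller is an RNN that conditions each token on the previous ones.)
logits = np.zeros((SEQ_LEN, len(VOCAB)))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def evaluate_architecture(tokens):
    """Hypothetical reward: in the paper this is the validation accuracy of a
    trained child network; here we simply reward an arbitrary token pattern."""
    return sum(t.startswith("conv") for t in tokens) / len(tokens)

baseline = 0.0
for step in range(200):
    # Sample an architecture description from the controller's current policy.
    probs = [softmax(logits[i]) for i in range(SEQ_LEN)]
    choices = [rng.choice(len(VOCAB), p=p) for p in probs]
    tokens = [VOCAB[c] for c in choices]

    # Reward = (proxy for) validation accuracy of the sampled architecture.
    reward = evaluate_architecture(tokens)
    baseline = BASELINE_DECAY * baseline + (1 - BASELINE_DECAY) * reward
    advantage = reward - baseline

    # REINFORCE update: advantage-weighted gradient of log pi(token) per position.
    for i, (p, c) in enumerate(zip(probs, choices)):
        grad_logp = -p
        grad_logp[c] += 1.0
        logits[i] += LR * advantage * grad_logp

print("most likely architecture:",
      [VOCAB[int(np.argmax(logits[i]))] for i in range(SEQ_LEN)])
```

The moving-average baseline mirrors the standard variance-reduction trick for REINFORCE; swapping the stub reward for real child-network training and the softmax for a recurrent controller recovers the setup the abstract describes.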