(ICML 2018) Efficient Neural Architecture Search via Parameter Sharing

Paper:

Code:

# Chinese

## Methods

### Designing Recurrent Cells

1. which edges are activated

2. which computations are performed at each node in the DAG

### Designing Convolutional Networks

1. which previous nodes to connect to;

2. what computation operation to use.

### Designing Convolutional Cells

1. sampling a computational graph from the search space
2. applying a stride of 2 to all operations

## Conclusion

# English

## Introduction

The main contribution of this work is to improve the efficiency of NAS by forcing all child models to share weights, thereby eschewing the need to train each child model from scratch to convergence.

Importantly, in all of our experiments, for which we use a single Nvidia GTX 1080Ti GPU, the search for architectures takes less than 16 hours. Compared to NAS, this is a reduction of GPU-hours by more than 1000x.

## Methods

Central to the idea of ENAS is the observation that all of the graphs which NAS ends up iterating over can be viewed as sub-graphs of a larger graph. In other words, we can represent NAS’s search space using a single directed acyclic graph (DAG).
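A minimal sketch of this observation, using hypothetical names rather than the paper's implementation: each candidate architecture is just a choice of edges in one shared DAG, so the parameters attached to an edge are reused by every child model that activates that edge.

```python
import numpy as np

class SharedDAG:
    """Every candidate architecture is a subgraph of this DAG; the parameters
    attached to each edge are shared across all child models."""

    def __init__(self, num_nodes, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        # One shared weight matrix per possible edge (j -> i with j < i).
        self.edge_weights = {
            (j, i): rng.normal(scale=0.1, size=(hidden_size, hidden_size))
            for i in range(num_nodes)
            for j in range(i)
        }

    def forward(self, x, architecture):
        """architecture: one (previous_node, activation) choice per node."""
        outputs = [x]  # node 0 is the input
        for i, (j, act) in enumerate(architecture, start=1):
            h = outputs[j] @ self.edge_weights[(j, i)]  # reuse the shared weights
            outputs.append(act(h))
        return outputs[-1]

# Two different child models reuse the same edge parameters.
dag = SharedDAG(num_nodes=4, hidden_size=8)
x = np.ones(8)
child_a = [(0, np.tanh), (1, np.tanh), (2, np.tanh)]
child_b = [(0, np.tanh), (0, np.tanh), (1, np.tanh)]
print(dag.forward(x, child_a).shape, dag.forward(x, child_b).shape)
```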

### Designing Recurrent Cells

ENAS’s controller is an RNN that decides:

1. which edges are activated

2. which computations are performed at each node in the DAG (see the sketch after this list).
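A minimal sketch of these two decisions, assuming the four activations listed in the paper (tanh, ReLU, identity, sigmoid) and replacing the controller's learned policy with uniform sampling for brevity:

```python
import random

# Activation functions assumed from the paper's recurrent-cell search space.
ACTIVATIONS = ["tanh", "relu", "identity", "sigmoid"]

def sample_recurrent_cell(num_nodes, rng=random.Random(0)):
    """For each node, sample which previous node feeds it (this activates one
    edge of the DAG) and which activation function the node applies."""
    cell = []
    for node in range(1, num_nodes):
        previous = rng.randrange(node)        # decision 1: which edge is activated
        activation = rng.choice(ACTIVATIONS)  # decision 2: which computation to perform
        cell.append((previous, activation))
    return cell

print(sample_recurrent_cell(num_nodes=4))
```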

### Training ENAS and Deriving Architectures

Our controller network is an LSTM with 100 hidden units.

The training procedure consists of two interleaving phases. The first phase trains $\omega$, the shared parameters of the child models, on a whole pass through the training data set; the second phase trains the controller parameters $\theta$ for a fixed number of steps.

Training the shared parameters $\omega$ of the child models.

In this step, we fix the controller’s policy $\pi(m; \theta)$ and perform stochastic gradient descent (SGD) on $\omega$ to minimize the expected loss function (the standard cross-entropy loss). The gradient is computed using the Monte Carlo estimate.
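Concretely, with architectures $m_i$ sampled from the fixed policy $\pi(m; \theta)$, the Monte Carlo estimate takes the form

$$
\nabla_{\omega} \, \mathbb{E}_{m \sim \pi(m;\theta)} \big[ \mathcal{L}(m; \omega) \big] \approx \frac{1}{M} \sum_{i=1}^{M} \nabla_{\omega} \, \mathcal{L}(m_i; \omega),
$$

where the paper notes that even $M = 1$, i.e. updating $\omega$ with the gradient from a single sampled model per step, works well in practice.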

Training the controller parameters $\theta$.

In this step, we fix $\omega$ and update the policy parameters $\theta$, aiming to maximize the expected reward. The gradient is computed using REINFORCE.
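With $\omega$ fixed and the reward $\mathcal{R}(m, \omega)$ computed on the validation set, the REINFORCE estimate takes the standard policy-gradient form (the baseline $b$, typically a moving average of previous rewards, reduces variance):

$$
\nabla_{\theta} \, \mathbb{E}_{m \sim \pi(m;\theta)} \big[ \mathcal{R}(m, \omega) \big] \approx \frac{1}{M} \sum_{i=1}^{M} \big( \mathcal{R}(m_i, \omega) - b \big) \, \nabla_{\theta} \log \pi(m_i; \theta).
$$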

Deriving Architectures.

After the search, several candidate models are sampled from the trained policy $\pi(m; \theta)$; each is scored with the reward computed on a single minibatch from the validation set, and only the highest-scoring model is re-trained from scratch.
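A minimal sketch of this derivation step; `sample_architecture` and `validation_reward` are hypothetical placeholders for the controller's sampler and for the reward computed with the shared parameters $\omega$:

```python
def derive_architecture(sample_architecture, validation_reward, num_candidates=10):
    """Sample candidate architectures from the trained policy, score each on a
    single validation minibatch, and keep only the best one."""
    candidates = [sample_architecture() for _ in range(num_candidates)]
    rewards = [validation_reward(m) for m in candidates]
    best_index = max(range(num_candidates), key=lambda i: rewards[i])
    return candidates[best_index]  # this model is then re-trained from scratch
```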

### Designing Convolutional Networks

In the search space for convolutional models, the controller RNN also samples two sets of decisions at each decision block:

1. what previous nodes to connect to

2. what computation operation to use (a minimal sampling sketch follows this list).
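A minimal sketch of the two decisions for one layer in the macro search space; the operation list follows the six operations described in the paper, but the uniform sampling here stands in for the controller's learned policy:

```python
import random

# The six candidate operations in the macro search space: 3x3 and 5x5 convolutions,
# 3x3 and 5x5 depthwise-separable convolutions, 3x3 max pooling, 3x3 average pooling.
OPERATIONS = [
    "conv_3x3", "conv_5x5",
    "sep_conv_3x3", "sep_conv_5x5",
    "max_pool_3x3", "avg_pool_3x3",
]

def sample_layer_decisions(layer_index, rng=random.Random(0)):
    """Decision 1: which earlier layers to connect to (skip connections).
    Decision 2: which operation the layer applies."""
    skip_connections = [j for j in range(layer_index) if rng.random() < 0.5]
    operation = rng.choice(OPERATIONS)
    return skip_connections, operation

print(sample_layer_decisions(layer_index=3))
```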

Figure 3. An example run of a recurrent cell in our search space with 4 computational nodes, which represent 4 layers in a convolutional network. Top: The output of the controller RNN. Bottom Left: The computational DAG corresponding to the network’s architecture. Red arrows denote the active computational paths. Bottom Right: The complete network. Dotted arrows denote skip connections.

### Designing Convolutional Cells

Rather than designing the entire convolutional network, one can design smaller modules and then connect them together to form a network. Figure 4 illustrates this design, where the convolutional cell and reduction cell architectures are to be designed.

Figure 4. Connecting 3 blocks, each with N convolution cells and 1 reduction cell, to make the final network.
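A minimal sketch of this composition, mirroring the block structure in Figure 4 (the function and its parameter names are illustrative, and N is left as a free parameter):

```python
def build_network_layout(num_blocks=3, cells_per_block=2):
    """Stack blocks, each made of N convolution cells followed by one reduction cell."""
    layout = []
    for _ in range(num_blocks):
        layout += ["conv_cell"] * cells_per_block + ["reduction_cell"]
    return layout

print(build_network_layout(num_blocks=3, cells_per_block=2))
```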

The 5 available operations are: identity, separable convolutions with kernel sizes 3×3 and 5×5, and average pooling and max pooling with kernel size 3×3.

A reduction cell can be realized from the same search space by:

1. sampling a computational graph from the search space

2. applying all operations with a stride of 2.

## Experiments

### Language Model with Penn Treebank

Figure 6. The RNN cell ENAS discovered for Penn Treebank.

Table 1. Test perplexity on Penn Treebank of ENAS and other baselines. Abbreviations: RHN is Recurrent Highway Network; VD is Variational Dropout; WT is Weight Tying; ℓ2 is Weight Penalty; AWD is Averaged Weight Drop; MoC is Mixture of Contexts; MoS is Mixture of Softmaxes.

### Image Classification on CIFAR-10

Table 2. Classification errors of ENAS and baselines on CIFAR-10. In this table, the first block presents DenseNet, one of the state-of-the-art architectures designed by human experts. The second block presents approaches that design the entire network. The last block presents techniques that design modular cells which are combined to build the final network.

ENAS takes 11.5 hours to discover the convolution cell and the reduction cell, which are visualized in Figure 8.

Figure 8. ENAS cells discovered in the micro search space.

## Conclusion

In this paper, we presented ENAS, a novel method that speeds up NAS by more than 1000x, in terms of GPU hours. ENAS’s key contribution is the sharing of parameters across child models during the search for architectures.