Self-Normalizing Neural Networks Note

Intuition

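Only two design choices are available for the function $g$, which maps the mean and variance of the activations in one layer to the mean and variance in the next: (1) the activation function and (2) the initialization of the weights.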

Requirements for an SNN activation function

negative and positive values for controlling the mean
saturation regions (derivatives approaching zero) to dampen the variance if it is too large in the lower layer
a slope larger than one to increase the variance if it is too small in the lower layer
a continuous curve

Introducing the “Scaled Exponential Linear Units”, or SELUs

\[\mathrm{selu}(x) = \lambda \begin{cases} x & \text{if } x > 0 \\ \alpha e^{x} - \alpha & \text{if } x \leq 0 \end{cases} \,.\]
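
The definition above translates almost directly into code. The following is a minimal sketch for scalar inputs, not a reference implementation; the function name `selu` and the values `lam = 1.05` and `alpha = 1.67` are illustrative placeholders chosen for this example, not the constants the note goes on to look for.

```python
import math

def selu(x, lam, alpha):
    """Scaled exponential linear unit for a scalar x.

    lam and alpha are passed in explicitly; this sketch does not fix them.
    """
    if x > 0:
        return lam * x
    # math.expm1(x) computes e^x - 1, i.e. alpha*e^x - alpha = alpha*expm1(x)
    return lam * alpha * math.expm1(x)

# Illustrative placeholder constants (assumed for this example only)
lam, alpha = 1.05, 1.67

for x in (-50.0, -1.0, 0.0, 1.0, 2.0):
    print(x, selu(x, lam, alpha))
```

With these placeholders the sketch shows each listed requirement: outputs are negative for $x < 0$ and positive for $x > 0$, the negative branch saturates toward $-\lambda\alpha$, the positive branch has slope $\lambda$ (larger than one only if $\lambda > 1$), and both branches meet at zero, so the curve is continuous.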

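Now find the magic numbers $\lambda$ and $\alpha$ that make self-normalization happen, that is, values for which the mean and variance of the activations are pulled toward zero mean and unit variance as they propagate through the layers.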
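
One way to pin these numbers down can be sketched as follows, under a simplifying assumption that this note has not stated: the pre-activation $z$ is taken to be standard normal, and we require $\mathbb{E}[\mathrm{selu}(z)] = 0$ and $\mathbb{E}[\mathrm{selu}(z)^2] = 1$. Both expectations split into Gaussian integrals over $z > 0$ and $z \leq 0$ with closed forms. The helper names `std_normal_cdf` and `solve_selu_constants` are made up for this example.

```python
import math

def std_normal_cdf(x):
    """Phi(x) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def solve_selu_constants():
    """Solve E[selu(z)] = 0 and E[selu(z)^2] = 1, assuming z ~ N(0, 1)."""
    # Partial expectations over the two branches of selu
    pos_mean = 1.0 / math.sqrt(2.0 * math.pi)        # E[z ; z > 0]
    exp1_neg = math.exp(0.5) * std_normal_cdf(-1.0)  # E[e^z ; z <= 0]
    exp2_neg = math.exp(2.0) * std_normal_cdf(-2.0)  # E[e^(2z) ; z <= 0]

    # Zero mean: pos_mean + alpha * (exp1_neg - 1/2) = 0
    alpha = -pos_mean / (exp1_neg - 0.5)

    # Unit second moment:
    # lam^2 * (1/2 + alpha^2 * E[(e^z - 1)^2 ; z <= 0]) = 1
    neg_sq = exp2_neg - 2.0 * exp1_neg + 0.5
    lam = 1.0 / math.sqrt(0.5 + alpha ** 2 * neg_sq)
    return lam, alpha

lam, alpha = solve_selu_constants()
print("lambda =", lam, "alpha =", alpha)
```

Under this assumption the two conditions give roughly $\lambda \approx 1.0507$ and $\alpha \approx 1.6733$. The sketch only locates a candidate fixed point; it does not show that the mean and variance are actually attracted to it.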

To Be Continued