Notes on Self-Normalizing Neural Networks.
Intuition
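Only two design choices are available for the function $g$, which maps the mean and variance of the activations in one layer to the mean and variance in the next: (1) the activation function and (2) the initialization of the weights.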
Requirements for an SNN activation function
- negative and positive values for controlling the mean
- saturation regions (derivatives approaching zero) to dampen the variance if it is too large in the lower layer
- a slope larger than one to increase the variance if it is too small in the lower layer
- a continuous curve
Introducing the “Scaled Exponential Linear Units”, or SELUs
\[\mathrm{selu}(x) = \lambda \begin{cases} x & \text{if } x \gt 0 \\ \alpha e^{x} - \alpha & \text{if } x \leq 0 \end{cases} \,.\]
Now find the magic numbers $\lambda$ and $\alpha$ that make auto-convergence happen.
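The piecewise definition and the four requirements listed earlier can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation. It assumes the constants published in the SNN paper ($\alpha \approx 1.6733$, $\lambda \approx 1.0507$) as stand-ins for the magic numbers, and it assumes the inputs to the activation are standard normal. The names `ALPHA`, `LAMBDA`, and `mean_and_variance` are illustrative.

```python
import math
import random

# Assumption: constants as published in the SNN paper (Klambauer et al., 2017),
# taken as given here rather than derived.
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805


def selu(x, lam=LAMBDA, alpha=ALPHA):
    """Scaled exponential linear unit, following the piecewise definition."""
    if x > 0:
        return lam * x
    return lam * (alpha * math.exp(x) - alpha)


def mean_and_variance(values):
    """Population mean and variance of a list of floats."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


# Requirement 1: negative and positive outputs.
takes_both_signs = selu(-1.0) < 0 < selu(1.0)

# Requirement 2: saturation, the output flattens out at -lambda * alpha.
saturation_limit = -LAMBDA * ALPHA
saturates = abs(selu(-30.0) - saturation_limit) < 1e-9

# Requirement 3: slope on the positive side equals lambda, which exceeds one.
positive_slope = selu(2.0) - selu(1.0)

# Requirement 4: no jump at x = 0.
continuous_at_zero = abs(selu(1e-9) - selu(-1e-9)) < 1e-6

# Assumption: inputs are drawn from N(0, 1). With the constants above, the
# outputs should again have mean near 0 and variance near 1.
rng = random.Random(0)
outputs = [selu(rng.gauss(0.0, 1.0)) for _ in range(200_000)]
mean, var = mean_and_variance(outputs)

print(takes_both_signs, saturates, positive_slope > 1, continuous_at_zero)
print(f"mean={mean:.3f} var={var:.3f}")
```

Passing other values for `lam` and `alpha` to `selu` and repeating the last step is a quick way to see the output mean and variance drift away from 0 and 1.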
To Be Continued