Activation Functions: The Bend in Deep Learning
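A neuron computes w·x + b: a weighted sum of its inputs plus a bias. On its own this is a linear function, and its graph is a straight line (a flat plane in higher dimensions).
If you stack two linear layers, the math collapses: Layer2(Layer1(x)) = W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2).
Two layers become a single linear layer with weights W2W1 and bias W2b1 + b2. By the same argument, a 100-layer network without activation functions is still one straight line, and a straight line cannot capture the patterns in images or language.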
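The collapse is easy to check numerically. Below is a minimal plain-Python sketch; the weights, biases, and input are made-up values chosen only for illustration, and the helper names (matvec, matmul, add) are hypothetical.

```python
# Sketch: two stacked linear layers give the same output as one linear layer.
# All numbers below are illustrative assumptions, not values from the post.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    """Add two vectors element by element."""
    return [a + b for a, b in zip(u, v)]

W1, b1 = [[1.0, 2.0], [3.0, 4.0]], [0.5, -0.5]
W2, b2 = [[0.0, 1.0], [-1.0, 2.0]], [1.0, 1.0]
x = [2.0, -1.0]

# Two layers applied one after the other, with no activation in between.
stacked = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# One equivalent layer: W = W2 W1 and b = W2 b1 + b2.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
collapsed = add(matvec(W, x), b)

print(stacked)    # [2.5, 3.5]
print(collapsed)  # [2.5, 3.5]
```

Both paths give the same numbers, so the second layer adds no expressive power.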
Activation functions add a bend to the math. Each layer warps space a little, and stacking these bends lets a network with enough neurons approximate almost any continuous shape.
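To see a single bend, here is a small Python sketch of a one-input network with two hidden neurons. The weights (+1 and -1) are assumptions picked for illustration. With ReLU between the layers the output is the V-shaped curve |x|; with the same weights and no ReLU, the two neurons cancel and the output is a flat line.

```python
def relu(z):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(0.0, z)

def tiny_net(x):
    """Two hidden neurons (weights +1 and -1), ReLU, then a sum: gives |x|."""
    h = [relu(1.0 * x), relu(-1.0 * x)]
    return h[0] + h[1]

def linear_net(x):
    """Same weights without ReLU: the hidden values cancel to zero."""
    h = [1.0 * x, -1.0 * x]
    return h[0] + h[1]

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
bent = [tiny_net(x) for x in xs]
flat = [linear_net(x) for x in xs]

print(bent)  # [2.0, 1.0, 0.0, 1.0, 2.0]
print(flat)  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

No single straight line passes through the points in the first output, which is the extra capability the activation buys.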
Common activation functions:
- ReLU: The modern default. It computes max(0, z). It is cheap to compute and helps deep networks train.
- Sigmoid: Squashes values into the range 0 to 1, so it is used for output probabilities.
- Tanh: Maps values into the range -1 to 1.
- Leaky ReLU: Fixes the "dying ReLU" problem, where neurons get stuck outputting zero. It uses a small slope for negative values instead of a flat zero.
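The four functions above can be sketched in a few lines of standard-library Python. The Leaky ReLU slope of 0.01 is an assumed default (a commonly used value, not one stated here), and the sigmoid is written in a two-branch form so that large negative inputs do not overflow.

```python
import math

def relu(z):
    """max(0, z): zero for negative inputs, identity for positive ones."""
    return max(0.0, z)

def leaky_relu(z, slope=0.01):
    """Like ReLU, but negative inputs keep a small slope (0.01 is an assumption)."""
    return z if z > 0 else slope * z

def sigmoid(z):
    """Squash z into (0, 1). Two branches keep math.exp from overflowing."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tanh(z):
    """Squash z into (-1, 1)."""
    return math.tanh(z)

for z in (-3.0, 0.0, 3.0):
    print(z, relu(z), leaky_relu(z), round(sigmoid(z), 4), round(tanh(z), 4))
```

Note how ReLU returns exactly zero at z = -3.0 while Leaky ReLU returns a small negative value, which keeps a gradient flowing for that neuron.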
How to choose your function:
- Hidden layers: Use ReLU or Leaky ReLU.
- Binary output: Use Sigmoid.
- Multi-class output: Use Softmax.
- Regression output: Use a linear output (no activation), so the prediction can take any real value.
These choices cover most networks you will build.
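Softmax is the one function in the list above that has not been spelled out yet. A minimal sketch, assuming three made-up class scores: it exponentiates each score and divides by the total, so the outputs are positive and sum to 1. Subtracting the largest score first is a standard trick to avoid overflow and does not change the result.

```python
import math

def softmax(scores):
    """Turn a list of raw class scores into probabilities that sum to 1."""
    m = max(scores)                       # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]                  # illustrative scores for 3 classes
probs = softmax(scores)

print([round(p, 3) for p in probs])
print(sum(probs))
```

The largest score gets the largest probability, and the ordering of the classes is preserved.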
Interactive tool to see how curves work: https://dev48v.infy.uk/dl/day2-activations.html
Day 2 of DeepLearningFromZero.