𝗔𝗰𝘁𝗶𝘃𝗮𝘁𝗶𝗼𝗻 𝗙𝘂𝗻𝗰𝘁𝗶𝗼𝗻𝘀: 𝗧𝗵𝗲 𝗕𝗲𝗻𝗱 𝗜𝗻 𝗗𝗲𝗲𝗽 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴

📅23 hours ago⏱1 min read

A neuron computes w·x + b. On its own, that is a linear function: a straight line in one dimension, a flat plane in more.

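If you stack two linear layers, the math collapses. Leaving out the biases: Layer2(Layer1(x)) = W2(W1x) = (W2W1)x. With biases the result has the same form: W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2), which is still one weight matrix and one bias.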

Two layers become a single linear layer. By the same argument, a 100-layer network without activation functions is still one linear map. A linear map cannot capture the patterns in images or language.
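The collapse can be checked numerically. Below is a minimal sketch in plain Python. The weights W1, b1, W2, b2 and the helper names are made-up values for illustration, not taken from any trained network.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    """Add two vectors element by element."""
    return [a + b for a, b in zip(u, v)]

# Illustrative weights and biases (assumed values).
W1, b1 = [[1.0, 2.0], [0.0, -1.0]], [1.0, 0.0]
W2, b2 = [[3.0, 1.0], [2.0, 0.0]], [0.0, -2.0]

def two_layers(x):
    """Layer 2 applied to Layer 1, with no activation in between."""
    hidden = add(matvec(W1, x), b1)
    return add(matvec(W2, hidden), b2)

# Collapse both layers into one weight matrix and one bias.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)

def one_layer(x):
    """The single linear layer equivalent to two_layers."""
    return add(matvec(W, x), b)

x = [2.0, -1.0]
print(two_layers(x))  # [4.0, 0.0]
print(one_layer(x))   # [4.0, 0.0]
```

Both functions return the same output for every input, so the second layer adds parameters but no extra expressive power.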

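Activation functions add a bend to the math. Each layer applies a linear map and then the bend, so each layer warps space. With enough units, stacking these bends lets a neural network approximate any continuous function on a bounded range of inputs.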
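A small sketch of what one bend buys you, assuming ReLU as the activation. The network tiny_net is a hypothetical example with hand-picked weights (+1 and -1 into two hidden units, then a sum). It computes the absolute value of x, a V shape that no single linear layer can produce.

```python
def relu(z):
    """ReLU activation: pass positive values through, clamp the rest to zero."""
    return max(0.0, z)

def tiny_net(x):
    """Two hidden ReLU units with weights +1 and -1, summed at the output."""
    return relu(x) + relu(-x)

# A linear function f must satisfy f(a + b) == f(a) + f(b).
a, b = 3.0, -3.0
print(tiny_net(a + b))            # 0.0
print(tiny_net(a) + tiny_net(b))  # 6.0
```

The two results differ, so the network is no longer linear. The bend at zero is what makes the V shape possible.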

Common activation functions:

• ReLU: The modern standard for hidden layers. It uses Math.max(0, z). It is cheap to compute and helps deep networks train.
• Sigmoid: Squashes values into the range 0 to 1. Used for output probabilities.
• Tanh: Maps values into the range -1 to 1.
• Leaky ReLU: Addresses the problem where ReLU neurons get stuck at zero output. It uses a small slope for negative values instead of a flat zero.
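The four functions above can be sketched in a few lines of plain Python, where the built-in max plays the role of Math.max. The leaky slope of 0.01 is an assumed value (a common default), not something this post specifies.

```python
import math

def relu(z):
    """max(0, z): zero for negative inputs, identity for positive ones."""
    return max(0.0, z)

def leaky_relu(z, slope=0.01):
    """Like ReLU, but negative inputs keep a small slope (assumed 0.01)."""
    return z if z > 0 else slope * z

def sigmoid(z):
    """Squash z into (0, 1). Branching on the sign avoids overflow in exp()."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tanh(z):
    """Squash z into (-1, 1)."""
    return math.tanh(z)

# Compare the four functions at a negative, zero, and positive input.
for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), leaky_relu(z), round(sigmoid(z), 4), round(tanh(z), 4))
```

At z = -2.0, ReLU outputs exactly zero while Leaky ReLU outputs a small negative value, which is why a neuron using Leaky ReLU still receives a gradient for negative inputs.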

How to choose your function:

Hidden layers: Use ReLU or Leaky ReLU.
Binary output: Use Sigmoid.
Multi-class output: Use Softmax.
Regression output: Use a linear output (no activation).

These choices cover most networks you will build.
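The output-layer rules can be sketched as a small lookup. This is a minimal illustration in plain Python. The helper apply_output_activation and its task names are hypothetical, chosen for this example only.

```python
import math

def sigmoid(z):
    """Squash a single score into (0, 1), avoiding overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    top = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def apply_output_activation(task, scores):
    """Hypothetical helper: pick the output activation from the task type."""
    if task == "binary":
        return [sigmoid(scores[0])]
    if task == "multiclass":
        return softmax(scores)
    if task == "regression":
        return list(scores)  # linear output: raw scores pass through
    raise ValueError(f"unknown task: {task}")

print(apply_output_activation("binary", [0.0]))      # [0.5]
print(apply_output_activation("multiclass", [2.0, 1.0, 0.1]))
print(apply_output_activation("regression", [3.7]))  # [3.7]
```

Softmax keeps the ranking of the scores and makes them sum to 1, which is why it suits a choice among several classes, while Sigmoid handles a single yes/no probability.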

Interactive tool to see how curves work: https://dev48v.infy.uk/dl/day2-activations.html

Day 2 of DeepLearningFromZero.

Full post: https://dev.to/dev48v/activation-functions-why-a-100-layer-network-without-them-is-still-one-line-ef6
