Mechanistic Interpretability: Inside Transformers
Deep learning was a black box. You saw inputs. You saw outputs. You did not know what happened inside.
Mechanistic interpretability aims to change this. It is reverse engineering for neural networks. You try to recover the steps the network takes from its weights and activations. You find the parts doing the work.
Researchers find clear structures inside these networks:
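- Induction heads. These are attention heads that find an earlier copy of the current token and predict the token that followed it. If the text had "A B" before and now shows "A", they predict "B".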
- Curve detectors. These are neurons in vision models that respond to curved edges at particular orientations.
- Superposition. Networks represent more features than they have neurons. They pack features into overlapping directions. One neuron ends up responding to many unrelated features.
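The induction-head item above can be sketched as a plain lookup rule. This is only an illustration of the behavior, not of how a transformer computes it with attention. The function name `induction_predict` and the example tokens are made up for this sketch.

```python
def induction_predict(tokens):
    """Apply the induction rule: find the most recent earlier copy of the
    last token and return the token that followed it. Returns None when the
    last token has not appeared before (or the sequence is empty)."""
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from most recent to oldest.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]  # copy what came next last time
    return None


# Hypothetical example: "Mr" was followed by "Dursley" earlier in the text.
prediction = induction_predict(["Mr", "Dursley", "said", "Mr"])
print(prediction)
```

A real induction head reaches a similar result with attention weights over earlier positions, so treat this as a description of the input-output pattern only.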
The circuit hypothesis says networks implement behaviors with circuits. These are small groups of connected parts, such as neurons and attention heads. Remove (ablate) a circuit and see if the behavior stops. If it does, that is evidence the circuit did the work.
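The ablation test can be sketched on a toy model. This is a minimal sketch, assuming a "network" whose output is just the sum of a few named components. The component names and numbers are hypothetical. Real networks need hooks into their activations, which this sketch does not show.

```python
# Toy "network": the output is the sum of a few named components.
# All names and numbers here are hypothetical, chosen only for illustration.
COMPONENTS = {
    "doubler": lambda x: 2.0 * x,  # the part we suspect does the main work
    "offset": lambda x: 1.0,       # constant contribution
    "leak": lambda x: 0.01 * x,    # small unrelated contribution
}


def run_model(x, ablated=()):
    """Sum the outputs of every component that is not ablated (zeroed out)."""
    return sum(fn(x) for name, fn in COMPONENTS.items() if name not in ablated)


def ablation_effects(x):
    """Return how much the output changes when each component is removed alone."""
    baseline = run_model(x)
    return {
        name: abs(baseline - run_model(x, ablated=(name,)))
        for name in COMPONENTS
    }


effects = ablation_effects(10.0)
most_important = max(effects, key=effects.get)
print(most_important)  # doubler
```

The component whose removal changes the output most is the best candidate for the circuit. A large effect is evidence, not proof, since other parts may depend on the one you removed.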
Some researchers study one network in detail. This is the specimen approach. Map every circuit you can. Use these lessons for other networks. It is like studying a fruit fly to learn biology that also applies to humans.
Optional learning community: https://t.me/GyaanSetuAi