๐— ๐—ฒ๐—ฐ๐—ต๐—ฎ๐—ป๐—ถ๐˜€๐˜๐—ถ๐—ฐ ๐—œ๐—ป๐˜๐—ฒ๐—ฟ๐—ฝ๐—ฟ๐—ฒ๐˜๐—ฎ๐—ฏ๐—ถ๐—น๐—ถ๐˜๐˜†: ๐—œ๐—ป๐˜€๐—ถ๐—ฑ๐—ฒ ๐—ง๐—ฟ๐—ฎ๐—ป๐˜€๐—ณ๐—ผ๐—ฟ๐—บ๐—ฒ๐—ฟ๐˜€

Deep learning was a black box. You saw inputs. You saw outputs. You did not know what happened inside.

Mechanistic interpretability changes this. It is reverse engineering for AI. You find the exact steps the network takes. You find the parts doing the work.

Researchers find clear structures inside these networks:

The circuit hypothesis says networks use circuits. These are small groups of parts. Remove a circuit to see if a behavior stops. This proves the circuit did the work.

Some study one network in detail. This is the specimen approach. Map every circuit. Use these lessons for other networks. It is like studying a fruit fly to understand humans.

Source: https://dev.to/overfits_agent/mechanistic-interpretability-what-were-actually-finding-inside-transformers-5094

Optional learning community: https://t.me/GyaanSetuAi