𝗠𝗶𝗻𝗶𝗠𝗮𝘅 𝗠𝟯 𝗔𝗻𝗱 𝗦𝗽𝗮𝗿𝘀𝗲 𝗔𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻
MiniMax released M3. It is an open-weight model. It has a 1 million token context window.
Standard AI models use dense attention. Every token attends to every other token in the context. Compute grows roughly with the square of the text length, so costs rise fast as text gets longer.
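A quick way to see the growth is to count the scores. This is a rough sketch, not a benchmark. The function name `dense_causal_pairs` is made up for illustration, and it assumes causal attention, where each token looks at itself and everything before it.

```python
def dense_causal_pairs(n_tokens):
    """Count query-key scores in dense causal attention over n_tokens.

    Token i attends to tokens 0..i, so the total is 1 + 2 + ... + n.
    """
    return n_tokens * (n_tokens + 1) // 2


if __name__ == "__main__":
    # Each row is 10x longer than the previous one; the score count grows ~100x.
    for n in (1_000, 10_000, 100_000, 1_000_000):
        print(f"{n:>9} tokens -> {dense_causal_pairs(n):,} scores")
```

Ten times more text means about a hundred times more scores. That is the cost sparse attention tries to avoid.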
M3 uses MiniMax Sparse Attention (MSA). Think of it like a library. A bad librarian reads every book to find one answer. A good librarian reads the labels and pulls only the shelves they need.
MSA does the same with tokens. It groups the context into blocks. For each query, it attends only to the blocks that look relevant.
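The block idea above can be sketched in a few lines. This is a generic block-selection toy, not MiniMax's actual algorithm. The function `block_sparse_attention`, the mean-key "label", and the `block_size` and `top_k` parameters are all assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(query, keys, values, block_size=4, top_k=2):
    """Attend only to the top_k most relevant blocks (illustrative sketch)."""
    # 1. Group token positions into contiguous blocks.
    blocks = [range(start, min(start + block_size, len(keys)))
              for start in range(0, len(keys), block_size)]

    # 2. Summarize each block by the mean of its keys (the "shelf label").
    #    Mean pooling is an assumption; real systems may score blocks differently.
    dim = len(keys[0])

    def mean_key(block):
        return [sum(keys[i][d] for i in block) / len(block) for d in range(dim)]

    block_scores = [dot(query, mean_key(b)) for b in blocks]

    # 3. Keep the top_k highest-scoring blocks, restored to document order.
    chosen = sorted(range(len(blocks)),
                    key=lambda b: block_scores[b], reverse=True)[:top_k]
    positions = [i for b in sorted(chosen) for i in blocks[b]]

    # 4. Run ordinary softmax attention over the selected positions only.
    scale = math.sqrt(len(query))
    logits = [dot(query, keys[i]) / scale for i in positions]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    out_dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, positions)) / total
              for d in range(out_dim)]
    return output, positions


if __name__ == "__main__":
    # 16 tokens; only positions 8..11 have keys that match the query.
    keys = [[0.0, 1.0]] * 8 + [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
    values = [[float(i)] for i in range(16)]
    out, used = block_sparse_attention([1.0, 0.0], keys, values, top_k=1)
    print(used, out)
```

With `top_k=1` the query touches 4 of 16 tokens. The other 12 are never scored token by token. Only their block labels are checked.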
The reported results:
- 20x less compute per token.
- 9x faster prefill.
- 15x faster decode.
M3 also excels at coding. It scored 59% on SWE-Bench Pro. This puts it in the top tier.
MSA is also reported to work better than earlier sparse attention methods like DSA or MoBA. It keeps quality high while cutting compute.
Source: https://dev.to/pueding/minimax-m3-ships-open-weight-1m-context-minimax-sparse-attention-msa-44fn
Optional learning community: https://t.me/GyaanSetuAi