𝗠𝗶𝗻𝗶𝗠𝗮𝘅 𝗠𝟯: 𝗔 𝗡𝗲𝘄 𝗪𝗮𝘆 𝗧𝗼 𝗛𝗮𝗻𝗱𝗹𝗲 𝗟𝗼𝗻𝗴 𝗖𝗼𝗻𝘁𝗲𝘅𝘁

MiniMax M3 is a new multimodal model from Shanghai. It features a 1-million-token context window. Most models become too slow and expensive at this length. MiniMax M3 uses a new method called MiniMax Sparse Attention (MSA) to solve this.

How MSA works:

• Index branch: A fast pass finds the most relevant parts of the data. • Sparse branch: The model only looks at those specific parts. • Memory efficiency: It groups queries together to stop GPU memory bottlenecks.

This makes the model 9x faster at processing data and 15x faster at generating text compared to previous versions.

Performance scores:

• SWE-Bench Pro: 59.0% • Terminal-Bench 2.1: 66.0% • BrowseComp: 83.5% • KernelBench Hard: 28.8% • MCP Atlas: 74.2%

The SWE-Bench Pro score is higher than GPT-5.5 and Gemini 3.1 Pro. However, Claude Opus 4.8 still leads with 69.2%. You should note that MiniMax ran these tests on their own hardware.

Technical details:

M3 is trained on text, images, and video together. It can operate desktop computers. In tests, it optimized a CUDA kernel on NVIDIA hardware.

You can use M3 in three ways:

  • MiniMax Platform API: It works with existing OpenAI code.
  • OpenRouter: Good if you do not want a direct MiniMax account.
  • Self-hosting: You need vLLM or SGLang support for the MSA architecture.

Pricing:

The cost is $0.60 per million input tokens and $2.40 per million output tokens. A launch discount brings these prices down to $0.30 and $1.20. This is much cheaper than Claude Opus.

Three things to remember:

  • Context is not memory. You still need external memory for long-term agent tasks.
  • Verify benchmarks. Wait for third-party tests before you switch your entire system.
  • Data privacy. MiniMax is based in Shanghai. Consider this if you handle sensitive data.

Source: https://dev.to/prabhakar_chaudhary_7afe4/minimax-m3-what-a-1m-token-open-weight-model-with-sparse-attention-actually-means-for-developers-i1i

Optional learning community: https://t.me/GyaanSetuAi