MiniMax M3

Coding & Agentic Frontier. 1M-context MSA. Native Multimodality

The first open-weight model with three frontier capabilities.

Performance Benchmark

M3 achieves top-tier performance on coding and agentic benchmarks, with autonomous task decomposition, tool invocation, and multi-step reasoning capabilities — providing a reliable foundation for AI coding assistants and automated workflows.

Powered by the proprietary MiniMax Sparse Attention (MSA) architecture, M3 API supports up to 1M tokens context window with a guaranteed minimum of 512K tokens. The 1M context is the infrastructure for long-range Agent tasks, long-range Coding, and long-video understanding.

A natively multimodal model. The entire data pipeline was rebuilt to scale pretraining data to 100T+, with multimodal training from step zero achieving deep alignment between textual and visual semantic spaces. Multimodal is a native core capability, not a superficial add-on.

On BrowseComp, M3 scores 83.5, surpassing Opus 4.7 (79.3), demonstrating strong autonomous browsing and information retrieval capabilities.

Until now, only a handful of closed-source models could simultaneously achieve frontier coding capabilities, million-token context, and Multimodal. M3 is the first to bring complete frontier capability to the open world.

Benchmark

Coding and Agent capabilities are key improvements in M3. In authoritative international benchmarks spanning software engineering, terminal execution, and more, M3 achieves world-leading performance.

Paper Reproduction: 12-Hour Autonomous ICLR Paper Replication

Paper Reproduction: 12-Hour Autonomous ICLR Paper Replication

We tasked M3 with independently reproducing an ICLR 2025 Outstanding Paper — Learning Dynamics of LLM Finetuning. M3 ran continuously for nearly 12 hours, independently producing 18 commits and 23 experimental figures, successfully replicating the core experiments. Multimodal capabilities parsed charts and formulas from the paper, long context fit paper + code + experiment logs in a single window, and coding + agentic capabilities drove long-horizon execution.

CUDA Kernel Optimization: 147 Iterations, 9.4× Speedup

FP8 GEMM is one of the most compute-intensive and difficult-to-optimize operations in LLM inference. We asked M3 to optimize this kernel on NVIDIA Hopper GPUs, starting with only a task description and a non-runnable Triton skeleton. Over ~24 hours, M3 completed 147 benchmark submissions and 1,959 tool calls, pushing hardware peak utilization from 7.6% to 71.3% — a 9.4× speedup with zero human intervention.

CUDA Kernel Optimization: 147 Iterations, 9.4× Speedup
PostTrainBench: M3 Training Models on Its Own

PostTrainBench: M3 Training Models on Its Own

We gave M3 four pretrain-only base models and asked it to autonomously complete the full pipeline — data synthesis, training, evaluation, and iteration — within 12 hours, enabling the models to perform math reasoning, code generation, and knowledge QA. The entire process ran without human intervention. M3 scored 37.1, ranking #3 overall, behind only Opus 4.7 (42.4) and GPT-5.5 (39.3), significantly ahead of all other models.

Rebate of 10% for the inviterDiscount of 10% for invitees

Invite friends, earn benefits

Subscribe to a Token Plan got a 10% discount, while the inviter got a 10% rebate!

DEVELOPER TOOLS

Empowering Developer Choice

Outstanding Tool Scaffolding Generalization

Claude CodeRoo CodeKilo CodeClineCodex CLIOpenCodeDroidTRAEGrok CLICursor
Claude CodeRoo CodeKilo CodeClineCodex CLIOpenCodeDroidTRAEGrok CLICursor

01 / Access Method

Quick API Integration

API versions: M3, with identical results but faster speed. Full automatic Cache support, no configuration needed.

PYTHONCURL
import requests
url = "https://api.minimax.io/v1/text/chatcompletion_v2"
payload = {
"model": "MiniMax-M3",
"messages": [
{"role": "user", "content": "Hello"}
]
}
headers = {"Authorization": "Bearer <token>"}
response = requests.post(url, json=payload, headers=headers)
print(response.text)
v2.1 API Connected
main.py

02 / Access Method

For AI Coding Tools

01 /

Subscribe to the Token Plan

The price remains unchanged, while performance has significantly improved. Token Plan users now automatically benefit from M3's enhanced coding and reasoning capabilities.

Read More
02 /

Open Platform Integration

Supports standard M3, with up to 1M tokens context window.

Read More
03 /

MiniMax Code Integration

The general Agent platform based on M3 is now fully open. Experience coding agentic, multimodal understanding, and other flagship capabilities without any development required.

Read More
04 /

Open Source and Local Deployment

We are committed to giving back to the community. M3 will soon be fully open-sourced on HuggingFace and GitHub, supporting private cluster deployment and fine-tuning.

Read More