MiniMax API Guide: Access M2.5 and M2.7 via ofox (2026)

TL;DR: MiniMax’s M2.5 and M2.7 are among the most cost-efficient production-grade models available in 2026 — M2.5 claims SOTA on Multi-SWE-Bench and costs roughly 1/10th of comparable closed models; M2.7 hits 56.22% on SWE-Pro with an ELO of 1495. You can access both through ofox with a single OpenAI-compatible key, no MiniMax account required.

MiniMax M2.5 costs 1/10th to 1/20th of comparable models and outperforms closed-source alternatives on coding benchmarks — it’s the model most developers are ignoring.


What Is MiniMax, and Why Does It Matter Now?

MiniMax is a Shanghai-based AI lab that has quietly shipped three generation-over-generation model upgrades since 2025. Unlike labs that chase headlines, MiniMax publishes benchmark results and then open-sources the weights — M2.5 is fully available on HuggingFace. For production teams watching token costs, that combination is genuinely unusual.

The two models worth knowing right now:

  • MiniMax M2.5 — designed for high-throughput, low-latency production environments. Available at 100 TPS and 50 TPS tiers. Open-source (Apache 2.0). Claims best-in-industry on Multi-SWE-Bench (multilingual software engineering tasks). Output pricing is 1/10th to 1/20th of comparable models by MiniMax’s own measurement.
  • MiniMax M2.7 — a step up for agentic and complex software engineering workflows. Scores 56.22% on SWE-Pro, 55.6% on VIBE-Pro, and 57.0% on Terminal Bench 2. Achieves an ELO score of 1495 on GDPval-AA, which MiniMax claims is the highest among open-source models in that class. Approaches Claude Sonnet 4.6 on coding-focused leaderboards.

Pricing at a Glance

ofox prices as of April 2026 (verified from the ofox models page):

Model                              Input (per M tokens)   Output (per M tokens)
minimax/minimax-m2.5               $0.30                  $1.20
minimax/minimax-m2.5-lightning     $0.30                  $2.40
minimax/minimax-m2.7               $0.30                  $1.20
minimax/minimax-m2.7-highspeed     $0.60                  $2.40
minimax/minimax-m2.1               $0.30                  $1.20

For context: Claude Sonnet 4.6 on ofox is $3.00/M input and $15.00/M output. MiniMax M2.7 at $0.30/$1.20 is 10× cheaper on input and 12.5× cheaper on output, on benchmarks that put it in the same league for coding tasks.


How to Access MiniMax via ofox

ofox provides an OpenAI-compatible endpoint, meaning you change two lines of code — not your entire SDK stack.

Python (openai SDK):

from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR_OFOX_KEY",
    base_url="https://api.ofox.ai/v1"
)

response = client.chat.completions.create(
    model="minimax/minimax-m2.7",
    messages=[{"role": "user", "content": "Write a Python function to parse JSON with error handling."}]
)
print(response.choices[0].message.content)

curl:

curl -X POST "https://api.ofox.ai/v1/chat/completions" \
  -H "Authorization: Bearer sk-YOUR_OFOX_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"minimax/minimax-m2.5","messages":[{"role":"user","content":"Hello!"}]}'

Get your key at ofox.ai. No separate MiniMax account, no per-model API setup, no regional billing headaches.
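In production you will also want calls to survive transient failures such as rate limits and timeouts. Here is a minimal retry sketch with exponential backoff; the backoff parameters are illustrative choices, not ofox-documented defaults, and a stand-in function replaces the real API call so the example is self-contained:

```python
import random
import time

def with_retries(call, max_attempts=4, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff and jitter.

    `call` would typically wrap an SDK call, e.g.
    lambda: client.chat.completions.create(...). Delays are illustrative.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Sleep base, 2*base, 4*base, ... plus a little jitter.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo with a flaky stand-in that fails twice, then succeeds.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
```

In real code you would catch only retryable errors (HTTP 429/5xx), not bare `Exception`, and wrap the SDK call as `with_retries(lambda: client.chat.completions.create(...))`.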


M2.5 vs M2.7: Which Should You Use?

If you’re running a production pipeline with volume, M2.5 is the starting point. It handles 100 TPS, the weights are on HuggingFace (self-host with vLLM or SGLang if data residency matters), and it’s the right choice for standard code generation, RAG, and multilingual text tasks where raw throughput beats peak accuracy.

M2.7 is for when the task gets hairy. Bug hunting from logs, multi-step project delivery, automated Office document pipelines (Excel/PPT/Word), and any agentic loop where multi-turn coherence matters. It costs the same per token but the quality gap shows up on harder problems.

M2.7 Highspeed ($0.60/$2.40) exists for teams that need M2.7 quality but are hitting throughput limits on the standard tier. Most teams won’t need it until they’re at significant scale.
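The split above can be encoded as a small routing helper. This is a sketch only: the task categories and the escalation rule are our own assumptions, not an ofox feature.

```python
def pick_model(task: str, high_throughput: bool = False) -> str:
    """Map a task category to a MiniMax model id on ofox.

    Categories are illustrative; adjust to your own workload taxonomy.
    """
    agentic = {"bug-hunting", "multi-step-agent", "office-automation"}
    if task in agentic:
        # M2.7 for hairy multi-turn work; Highspeed only if throughput-bound.
        if high_throughput:
            return "minimax/minimax-m2.7-highspeed"
        return "minimax/minimax-m2.7"
    # Default to M2.5 for volume workloads: codegen, RAG, multilingual text.
    return "minimax/minimax-m2.5"

print(pick_model("bug-hunting"))  # minimax/minimax-m2.7
print(pick_model("rag"))          # minimax/minimax-m2.5
```

The returned string drops straight into the `model=` parameter of the SDK calls shown earlier.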


Benchmarks: Where Does MiniMax Actually Stand?

MiniMax publishes these numbers for M2.7 (sourced from minimax.io, April 2026):

  • SWE-Pro: 56.22% — measures real-world software engineering tasks, not toy problems
  • VIBE-Pro: 55.6% — agent viability benchmark
  • Terminal Bench 2: 57.0% — terminal/CLI-focused agent tasks
  • GDPval-AA ELO: 1495 — MiniMax claims highest among open-source models

For M2.5: best performance on Multi-SWE-Bench (multilingual software engineering), per MiniMax’s published results.

One caveat: MiniMax publishes its own benchmarks, which any lab can optimize for. Independent third-party evaluations of these specific models are still sparse. The numbers are directionally useful, but treat them as a starting point, not a definitive verdict.

For a broader picture of where models rank against each other, see our LLM Leaderboard.


Practical Use Cases

Code generation and review: M2.7’s SWE-Pro score suggests it handles real repositories better than its price implies. For a team running 10M output tokens/month, switching from Claude Sonnet 4.6 (~$150/month in output tokens) to MiniMax M2.7 (~$12/month) is a 12.5× cost reduction worth stress-testing.
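The arithmetic behind that comparison, using the ofox prices listed above (the input-token volume is assumed for illustration):

```python
PRICES = {  # dollars per million tokens, from the ofox pricing table above
    "anthropic/claude-sonnet-4.6": {"input": 3.00, "output": 15.00},
    "minimax/minimax-m2.7": {"input": 0.30, "output": 1.20},
}

def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Dollar cost for a month of traffic; volumes in millions of tokens."""
    p = PRICES[model]
    return input_mtok * p["input"] + output_mtok * p["output"]

# 10M output tokens/month; 30M input tokens assumed for illustration.
sonnet = monthly_cost("anthropic/claude-sonnet-4.6", 30, 10)  # 90 + 150 = $240
m27 = monthly_cost("minimax/minimax-m2.7", 30, 10)            # 9 + 12 = $21
```

On output tokens alone the gap is 12.5×; once input tokens are included the blended saving depends on your input/output ratio, which is why running the numbers on your own traffic matters.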

Agentic pipelines: M2.7 was specifically designed for “complex environment interaction” and “building complex agents.” If you’re hitting rate limits or cost ceilings with Claude on agent loops, M2.5 or M2.7 are worth benchmarking on your specific task.

Multilingual apps: M2.5 claims top scores on multilingual software engineering. If your codebase or user queries mix languages, that’s a real differentiator.

Self-hosted / on-prem: M2.5 weights are on HuggingFace. For teams with data residency requirements, this is one of the few production-grade models where you have that option.


Migrating from Another Provider

If you’re already using ofox with Claude or Gemini, switching to MiniMax is a one-line model swap:

# Before
model="anthropic/claude-sonnet-4.6"

# After — same code, same endpoint
model="minimax/minimax-m2.7"

The unified ofox API means no credential rotation, no new billing setup, no SDK changes. If you’re still routing through a separate provider per model, see our AI API aggregation guide for why a single gateway simplifies this at scale.
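One way to make the swap pure configuration is to read the model id from an environment variable, so switching models is a redeploy rather than a code change. A minimal sketch; the variable name OFOX_MODEL is our own convention, not an ofox requirement:

```python
import os

# Default to M2.7; override per environment, e.g. OFOX_MODEL=minimax/minimax-m2.5
MODEL = os.environ.get("OFOX_MODEL", "minimax/minimax-m2.7")

def chat_params(prompt: str) -> dict:
    """Build kwargs for client.chat.completions.create(**chat_params(prompt))."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
```

With this in place, A/B-testing MiniMax against your current model is a matter of setting one variable in staging.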


Bottom Line

If you’re paying $15+/M for output tokens on tasks MiniMax M2.7 handles at $1.20/M, you owe yourself a one-day benchmark run — the savings at production scale are hard to ignore.

MiniMax M2.5 and M2.7 are not household names yet in the English-speaking developer community, which means the window to lock in cheap, high-quality inference is open. The models have real benchmark numbers behind them, M2.5 is open-source, and both are available on ofox with zero account setup overhead.

Start with M2.7 for agent and coding tasks. Use M2.5 for high-volume throughput. Run your own evals on your actual workload — and check the cost delta against what you’re paying now.

Get access at ofox.ai.