MiniMax API Guide: Access M2.5 and M2.7 via ofox (2026)

TL;DR: MiniMax’s M2.5 and M2.7 are among the most cost-efficient production-grade models available in 2026 — M2.5 claims SOTA on Multi-SWE-Bench and costs roughly 1/10th of comparable closed models; M2.7 hits 56.22% on SWE-Pro with an ELO of 1495. You can access both through ofox with a single OpenAI-compatible key, no MiniMax account required.

MiniMax M2.5 costs 1/10th to 1/20th of comparable models and outperforms closed-source alternatives on coding benchmarks — it’s the model most developers are ignoring.


What Is MiniMax, and Why Does It Matter Now?

MiniMax is a Shanghai-based AI lab that has quietly shipped three generation-over-generation model upgrades since 2025. Unlike labs that chase headlines, MiniMax publishes benchmark results and then open-sources the weights — M2.5 is fully available on HuggingFace. For production teams watching token costs, that combination is genuinely unusual.

The two models worth knowing right now:

  • MiniMax M2.5 — designed for high-throughput, low-latency production environments. Available at 100 TPS and 50 TPS tiers. Open-source (Apache 2.0). Claims best-in-industry on Multi-SWE-Bench (multilingual software engineering tasks). Output pricing is 1/10th to 1/20th of comparable models by MiniMax’s own measurement.
  • MiniMax M2.7 — a step up for agentic and complex software engineering workflows. Scores 56.22% on SWE-Pro, 55.6% on VIBE-Pro, and 57.0% on Terminal Bench 2. Achieves an ELO score of 1495 on GDPval-AA, which MiniMax claims is the highest among open-source models in that class. Approaches Claude Sonnet 4.6 on coding-focused leaderboards.

Pricing at a Glance

ofox prices as of April 2026 (verified from the ofox models page):

Model                              Input (per M tokens)   Output (per M tokens)
minimax/minimax-m2.5               $0.30                  $1.20
minimax/minimax-m2.5-lightning     $0.30                  $2.40
minimax/minimax-m2.7               $0.30                  $1.20
minimax/minimax-m2.7-highspeed     $0.60                  $2.40
minimax/minimax-m2.1               $0.30                  $1.20

For context: Claude Sonnet 4.6 on ofox is $3.00/M input and $15.00/M output. MiniMax M2.7 at $0.30/$1.20 is 10× cheaper on input and 12.5× cheaper on output, on benchmarks that put it in the same league for coding tasks.


How to Access MiniMax via ofox

ofox provides an OpenAI-compatible endpoint, meaning you change two lines of code — not your entire SDK stack.

Python (openai SDK):

from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR_OFOX_KEY",
    base_url="https://api.ofox.ai/v1"
)

response = client.chat.completions.create(
    model="minimax/minimax-m2.7",
    messages=[{"role": "user", "content": "Write a Python function to parse JSON with error handling."}]
)
print(response.choices[0].message.content)

curl:

curl -X POST "https://api.ofox.ai/v1/chat/completions" \
  -H "Authorization: Bearer sk-YOUR_OFOX_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"minimax/minimax-m2.5","messages":[{"role":"user","content":"Hello!"}]}'

Get your key at ofox.ai. No separate MiniMax account, no per-model API setup, no regional billing headaches.
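In production you will also want calls to survive transient failures such as rate limits and timeouts. Here is a minimal retry sketch with exponential backoff; the backoff parameters are illustrative choices, not ofox-documented defaults, and a stand-in function replaces the real API call so the example is self-contained:

```python
import random
import time

def with_retries(call, max_attempts=4, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff and jitter.

    `call` would typically wrap an SDK call, e.g.
    lambda: client.chat.completions.create(...). Delays are illustrative.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Sleep base, 2*base, 4*base, ... plus a little jitter.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo with a flaky stand-in that fails twice, then succeeds.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
```

In real code you would catch only retryable errors (HTTP 429/5xx), not bare `Exception`, and wrap the SDK call as `with_retries(lambda: client.chat.completions.create(...))`.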


M2.5 vs M2.7: Which Should You Use?

If you’re running a production pipeline with volume, M2.5 is the starting point. It handles 100 TPS, the weights are on HuggingFace (self-host with vLLM or SGLang if data residency matters), and it’s the right choice for standard code generation, RAG, and multilingual text tasks where raw throughput beats peak accuracy.

M2.7 is for when the task gets hairy. Bug hunting from logs, multi-step project delivery, automated Office document pipelines (Excel/PPT/Word), and any agentic loop where multi-turn coherence matters. It costs the same per token but the quality gap shows up on harder problems.

M2.7 Highspeed ($0.60/$2.40) exists for teams that need M2.7 quality but are hitting throughput limits on the standard tier. Most teams won’t need it until they’re at significant scale.
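The split above can be encoded as a small routing helper. This is a sketch only: the task categories and the escalation rule are our own assumptions, not an ofox feature.

```python
def pick_model(task: str, high_throughput: bool = False) -> str:
    """Map a task category to a MiniMax model id on ofox.

    Categories are illustrative; adjust to your own workload taxonomy.
    """
    agentic = {"bug-hunting", "multi-step-agent", "office-automation"}
    if task in agentic:
        # M2.7 for hairy multi-turn work; Highspeed only if throughput-bound.
        if high_throughput:
            return "minimax/minimax-m2.7-highspeed"
        return "minimax/minimax-m2.7"
    # Default to M2.5 for volume workloads: codegen, RAG, multilingual text.
    return "minimax/minimax-m2.5"

print(pick_model("bug-hunting"))  # minimax/minimax-m2.7
print(pick_model("rag"))          # minimax/minimax-m2.5
```

The returned string drops straight into the `model=` parameter of the SDK calls shown earlier.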


Benchmarks: Where Does MiniMax Actually Stand?

MiniMax publishes these numbers for M2.7 (sourced from minimax.io, April 2026):

  • SWE-Pro: 56.22% — measures real-world software engineering tasks, not toy problems
  • VIBE-Pro: 55.6% — agent viability benchmark
  • Terminal Bench 2: 57.0% — terminal/CLI-focused agent tasks
  • GDPval-AA ELO: 1495 — MiniMax claims highest among open-source models

For M2.5: best performance on Multi-SWE-Bench (multilingual software engineering), per MiniMax’s published results.

One caveat: MiniMax publishes its own benchmarks, which any lab can optimize for. Independent third-party evaluations of these specific models are still sparse. The numbers are directionally useful, but treat them as a starting point, not a definitive verdict.

For a broader picture of where models rank against each other, see our LLM Leaderboard.


Practical Use Cases

Code generation and review: M2.7’s SWE-Pro score suggests it handles real repositories better than its price implies. For a team running 10M output tokens/month, switching from Claude Sonnet 4.6 (~$150/month in output tokens) to MiniMax M2.7 (~$12/month) is a 12.5× cost reduction worth stress-testing.
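The arithmetic behind that comparison, using the ofox prices listed above (the input-token volume is assumed for illustration):

```python
PRICES = {  # dollars per million tokens, from the ofox pricing table above
    "anthropic/claude-sonnet-4.6": {"input": 3.00, "output": 15.00},
    "minimax/minimax-m2.7": {"input": 0.30, "output": 1.20},
}

def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Dollar cost for a month of traffic; volumes in millions of tokens."""
    p = PRICES[model]
    return input_mtok * p["input"] + output_mtok * p["output"]

# 10M output tokens/month; 30M input tokens assumed for illustration.
sonnet = monthly_cost("anthropic/claude-sonnet-4.6", 30, 10)  # 90 + 150 = $240
m27 = monthly_cost("minimax/minimax-m2.7", 30, 10)            # 9 + 12 = $21
```

On output tokens alone the gap is 12.5×; once input tokens are included the blended saving depends on your input/output ratio, which is why running the numbers on your own traffic matters.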

Agentic pipelines: M2.7 was specifically designed for “complex environment interaction” and “building complex agents.” If you’re hitting rate limits or cost ceilings with Claude on agent loops, M2.5 or M2.7 are worth benchmarking on your specific task.

Multilingual apps: M2.5 claims top scores on multilingual software engineering. If your codebase or user queries mix languages, that’s a real differentiator.

Self-hosted / on-prem: M2.5 weights are on HuggingFace. For teams with data residency requirements, this is one of the few production-grade models where you have that option.


Migrating from Another Provider

If you’re already using ofox with Claude or Gemini, switching to MiniMax is a one-line model swap:

# Before
model="anthropic/claude-sonnet-4.6"

# After — same code, same endpoint
model="minimax/minimax-m2.7"

The unified ofox API means no credential rotation, no new billing setup, no SDK changes. If you’re still routing through a separate provider per model, see our AI API aggregation guide for why a single gateway simplifies this at scale.
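One way to make the swap pure configuration is to read the model id from an environment variable, so switching models is a redeploy rather than a code change. A minimal sketch; the variable name OFOX_MODEL is our own convention, not an ofox requirement:

```python
import os

# Default to M2.7; override per environment, e.g. OFOX_MODEL=minimax/minimax-m2.5
MODEL = os.environ.get("OFOX_MODEL", "minimax/minimax-m2.7")

def chat_params(prompt: str) -> dict:
    """Build kwargs for client.chat.completions.create(**chat_params(prompt))."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
```

With this in place, A/B-testing MiniMax against your current model is a matter of setting one variable in staging.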


Bottom Line

If you’re paying $15+/M for output tokens on tasks MiniMax M2.7 handles at $1.20/M, you owe yourself a one-day benchmark run — the savings at production scale are hard to ignore.

MiniMax M2.5 and M2.7 are not household names yet in the English-speaking developer community, which means the window to lock in cheap, high-quality inference is open. The models have real benchmark numbers behind them, M2.5 is open-source, and both are available on ofox with zero account setup overhead.

Start with M2.7 for agent and coding tasks. Use M2.5 for high-volume throughput. Run your own evals on your actual workload — and check the cost delta against what you’re paying now.

Get access at ofox.ai.