Cursor Composer 2.5: What's New, Best Models, and How to Set It Up
TL;DR: Cursor shipped Composer 2.5 on May 18, 2026 — post-trained from Moonshot’s Kimi K2.5, scoring 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1 (just past Opus 4.7’s 61.6%). It is genuinely good at sustained agent work, but the “Fast” tier defaults to $3/$15 per million tokens — six times the Standard price for the same model. The cheapest sane Cursor setup right now: Composer 2.5 Standard for routine edits, plus a BYO route to Claude Sonnet 4.6 or GPT-5.4 Codex for the hard ones.
What actually changed in Composer 2.5
Composer 2.5 is the same Kimi K2.5 open-source backbone as Composer 2, with a different post-training stack on top. Cursor’s release post lists three concrete shifts:
- 25× more synthetic training tasks than Composer 2, including a new family of “feature deletion” puzzles where the model is given a working repo with a feature ripped out and has to rebuild it.
- Textual-feedback RL — localized hints at each failed tool call, instead of only an end-of-run reward signal. That is the change behind the “follows complex instructions more reliably” line in the announcement.
- MoE-scale infrastructure — Cursor confirmed they invested heavily in distributed training plumbing so they can keep iterating on the base. They also confirmed (in the same post) that they are jointly training a much larger model from scratch with SpaceXAI — “10× more total compute” on Colossus 2 — but that one is not Composer 2.5.
The result on Cursor’s own benchmarks:
| Benchmark | Composer 2.5 | Claude Opus 4.7 | GPT-5.5 |
|---|---|---|---|
| SWE-Bench Multilingual | 79.8% | 80.5% | 77.8% |
| CursorBench v3.1 (default settings) | 63.2% | 61.6% | 59.2% |
| Terminal-Bench 2.0 | 69.3% | 69.4% | 82.7% |
A caveat worth sitting with: CursorBench is Cursor’s eval, and Composer 2.5 is Cursor’s model. Top developers on Hacker News pointed out that Composer 2’s CursorBench score quietly dropped from 60–65% to 50–55% between v3.0 and v3.1 — the kind of bench-version drift that should make you cautious about any single-vendor leaderboard. And Composer 2.5 loses Terminal-Bench 2.0 to GPT-5.5 by 13 percentage points. If your day is mostly shell-and-CLI work, that gap matters.
The HN thread is also where the cost story is: one engineer reported a 4-person team’s Cursor bill jumping from “$20–100 per person” to roughly $1,000 total per month after the Fast tier became default. The complaint is fair — Fast pricing is roughly 3× Composer 2.
Pricing — and the trap most people walk into
Composer 2.5 has two tiers that serve the same model weights:
| Tier | Input | Output | Default? |
|---|---|---|---|
| Standard | $0.50 / M tokens | $2.50 / M tokens | No |
| Fast | $3.00 / M tokens | $15.00 / M tokens | Yes |
Yes, same model. Fast is just inference on hotter, more expensive hardware so the first token arrives sooner. There is no quality difference.
This matters because Fast is the default, and most people never change it. If you are running an agent loop that fires off 30 tool calls before producing 200 lines of code, Fast will burn through your monthly credits in days. Cursor doubled the included usage in the first week after launch (through ~May 25, 2026) to soften the rollout, but that promotion is over.
The pragmatic rule: use Standard everywhere unless you can feel the latency. Standard matches Opus 4.7 on output cost ($2.50/M tokens versus $15/M for Opus), which is the comparison actually worth running.
How to set it up in Cursor
If you already have Cursor installed and up to date, this takes under five minutes.
1. Update Cursor. Composer 2.5 ships in Cursor 3.4+ (3.5 is the current release as of May 20, 2026). Cursor → Check for Updates. Quit and relaunch — the model picker does not refresh until you do.
2. Open the model picker. In the chat panel: click the model name at the bottom of the prompt input. In an inline edit (Cmd+K / Ctrl+K): same dropdown, top-left of the floating editor.
3. Select Composer 2.5. Open the model picker and choose Composer 2.5. Cursor loads the Fast variant by default — if you want Standard, switch to it explicitly before you start. See Cursor’s model docs for the exact picker labels in your version, since they have shifted between point releases.
4. Default to Standard where you can. For Background and Cloud Agent runs, Settings → Models → Composer 2.5 is where you set the Standard variant as the default — that one change is usually most of the bill. For interactive chats, Cursor still falls back to Fast at session start, so the practical habit is to flip to Standard at the top of any chat you expect to run long. The “Auto + Composer” usage pool counts both tiers, so the choice only affects per-token cost, not your plan bucket.
5. Optional — write a Cursor Rule for the repo. Cursor rules live in .cursor/rules/*.mdc with frontmatter (description, globs, alwaysApply). They cannot pin a model, but they can nudge the agent’s behavior. Example .cursor/rules/composer.mdc:
---
description: Conventions for Composer 2.5 in this repo
alwaysApply: true
---
Prefer Composer 2.5 Standard for refactors and long agent loops.
Reserve Fast for tight inline edits where latency dominates cost.
That is the whole setup. There is no API key to paste, no endpoint to configure — Composer 2.5 runs only through Cursor’s backend. If you want to use Composer 2.5 from a script, you go through the Cursor CLI agent, and that still routes through Cursor’s auth.
When to pick Composer 2.5 — and when not to
Composer 2.5 is strong at one specific shape of work: medium-length agent loops inside Cursor’s UI, where the model is calling Cursor’s tools (file edits, terminal, search) and reading back results. That is what the 25× synthetic task expansion was tuned for.
It is weak, or at least not the cheapest option, in three cases:
- One-shot architectural questions. You want a 500-word design opinion on whether to extract a service, not a code change. Send it to Claude Opus 4.7 instead — it is better at this and you will spend a few cents, not a few dollars.
- Long, terminal-heavy work. GPT-5.5 leads Terminal-Bench 2.0 by 13 points. If you are wiring up a deploy pipeline, GPT-5.4 Codex via Codex CLI is a real alternative.
- Code review and PR triage. You are reading more than writing. Composer 2.5’s Fast tier becomes a tax on reading. Use a cheaper model — Gemini 3.1 Flash or DeepSeek V4 Pro through a gateway — for the read pass, and reserve Composer 2.5 for the write pass.
A workflow several teams have settled into: Composer 2.5 Standard inside Cursor for inline edits and quick refactors, Claude Sonnet 4.6 (via Cursor’s BYO path) for long agent runs that need stronger judgment, and Opus 4.7 (also BYO) for the genuinely hard architectural calls. We covered the BYO route in Cursor / Claude Code / Cline Custom API Setup — Composer 2.5 slots in next to those without conflict.
The Kimi K2.5 connection no one talks about
The base weights under Composer 2.5 are public. Moonshot’s Kimi K2.5 is open-source, and you can hit it directly via the Kimi API — usually at roughly 1/5 the price of Composer 2.5 Standard. We have a full breakdown in Kimi K2.5 API: Pricing, Access, and Honest Benchmarks, including the gap between vanilla K2.5 and Cursor’s post-trained version.
The gap matters. Cursor’s 25× synthetic task RL adds something real — about 4–8 percentage points across our internal coding evals versus stock K2.5 — but it is not the magic the marketing suggests. If your use case is “long-horizon agent loops inside Cursor specifically,” Composer 2.5 wins. If your use case is “give me a coding model I can hit from any client,” stock K2.5 plus a thin agent harness gets you 90% of the way for a fraction of the cost.
This is the case-by-case decision. There is no universal winner.
The Cursor-without-Cursor escape hatch
For teams who want Cursor-style productivity but cannot stomach the Fast-tier pricing or the vendor lock-in, the practical answer is: keep Cursor for the editor, route the model traffic through a gateway.
Cursor supports an “Override OpenAI Base URL” field in Settings → Models. Point it at an aggregator that exposes Sonnet 4.6, GPT-5.4 Codex, Gemini 3.1 Pro, Kimi K2.5, and DeepSeek V4 Pro behind one OpenAI-format endpoint, and you can switch between them per-conversation without leaving Cursor. One caveat worth flagging up front: as of Cursor 3.5, the custom base URL is honored in the chat/planning panel (Cmd/Ctrl + L) but not in the agent loop — Composer-style runs still go through Cursor’s own backend. We document this pattern in AI API Aggregation: Access Every Model from One Endpoint — the same pattern works for Claude Code and Codex CLI.
The split that has been working for most ofox.ai users on Cursor: Composer 2.5 Standard for the in-IDE agent flow, plus a BYO route for the heavy stuff. Total monthly bill stays well under $50 per developer, which is what Cursor cost before the Fast tier landed.
For the broader question of which model to pick for which task across the whole 2026 landscape, our Best LLM for Coding (Ranked by Real Use) and the Claude vs GPT vs Gemini comparison pillar carry the full picture. Composer 2.5 belongs in the conversation now — but it is one option, not the option.
Bottom line: Composer 2.5 is the best in-Cursor coding experience available today, and it is also the easiest model in 2026 to massively overpay for. Switch the default from Fast to Standard, pair it with a BYO route for the hard problems, and you get the upgrade without the bill shock.
Sources: Cursor — Introducing Composer 2.5, Cursor changelog: Composer 2.5, Hacker News discussion, BuildFastWithAI benchmark breakdown.


