Why GPT-Image-2 Generations Fail: 5 Real Causes and the Fixes That Work

TL;DR — GPT-Image-2 is slow by design. Real generation times sit at 145–280 seconds for high-quality 1024px output, and most “failures” people report are not failures at all — they’re a gateway in the middle of the chain timing out while OpenAI is still working. Four other causes account for the rest: two-stage moderation, parameter mismatches in third-party clients, image rate limits that bite at concurrency 3+, and the OpenAI org-verification wall. This post tells you how to identify which one you hit, and the smallest code change that fixes it.

30-second triage

Symptom → likely cause → section to read:

  • 504, connection reset, hang at ~60s / 180s → timeout chain (section 1)
  • moderation_blocked, “Your request was rejected by the safety system” → two-stage content filter (section 2)
  • Unknown parameter or Invalid request from a wrapper client → SDK / parameter mismatch (section 3)
  • 429 Too Many Requests, fine at 1–2 concurrent but breaking at 3+ → rate limit (section 4)
  • “Your organization must be verified” → OpenAI org wall (section 5)
  • You’re not a developer and just want a picture → use a no-code wrapper (see the no-code section at the end)

1. Timeouts (the big one)

This accounts for roughly two-thirds of the “GPT-Image-2 is broken” threads on the OpenAI developer forum. It’s not a bug. The model is slow, and your gateway gives up before OpenAI is done.

Real generation times measured against gpt-image-2:

  • Minimal request, no references, 1024×1024 medium: ~80s
  • 1024×1024 high quality, no references: median 195s, p95 ~280s
  • 1536×1024 high quality + input_fidelity="high" + small reference image: ~130s (faster, because the model commits earlier)
  • 1024×1024 medium with two JPG references: ~44s

Now look at the typical timeout stack between your client and OpenAI:

your code              ── OpenAI Python SDK default: 600s
your proxy / gateway   ── NGINX default 60s, Azure SDK 180s, Express 120s
your CDN edge          ── Cloudflare Free 100s, Vercel Hobby 60s
OpenAI upstream        ── responds in 195s

Whichever hop has the lowest cap wins. If you’re on Vercel Hobby, you have 60 seconds; setting client.timeout=600 in your code changes nothing, because the edge has already cut the response. By second 61 the user sees a 504 and assumes the model failed, while the upstream is still happily generating an image that nobody will ever receive.

Fixes in order of impact

a) Switch to streaming with partial images. This is the single biggest unlock, and most integrations never turn it on:

from openai import OpenAI

client = OpenAI()  # add base_url=... here if you route through an aggregator

stream = client.images.generate(
    model="openai/gpt-image-2",
    prompt="ramen shop at 2am, wet pavement, neon reflections",
    size="1024x1024",
    stream=True,
    partial_images=2,  # intermediate frames to emit before the final image
)

for event in stream:
    if event.type == "image_generation.partial_image":
        # push_to_client is your app's transport (SSE, WebSocket, ...)
        push_to_client(event.b64_json, index=event.partial_image_index)
    elif event.type == "image_generation.completed":
        final = event.b64_json  # base64 of the finished image

First-byte latency drops from ~195s to 5–15s. The user sees something happening, and your gateway sees a live response, so the 60-second cutoff never fires. Cost is ~100 extra image-output tokens per partial; two partials is usually enough.

b) Go async if you can’t stream. Return a task ID, do the call in a worker, push the result over a webhook or poll endpoint. This is the only sane pattern when your front-end sits behind a CDN with a hard 100-second cap. It also means a client losing connection doesn’t waste the generation — the worker still finishes and stores the result.
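
A minimal sketch of the task-ID pattern, assuming FastAPI for the endpoints and an in-memory dict as the job store; every name here is illustrative, and a real deployment would swap in a queue plus a database:

import asyncio
import uuid

from fastapi import FastAPI
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI()
jobs: dict[str, dict] = {}  # job_id -> {"status": ..., "b64": ...}

async def run_generation(job_id: str, prompt: str):
    # Runs outside the request/response cycle, so no gateway or CDN cap applies.
    try:
        result = await client.images.generate(
            model="openai/gpt-image-2",
            prompt=prompt,
            size="1024x1024",
        )
        jobs[job_id] = {"status": "done", "b64": result.data[0].b64_json}
    except Exception as exc:
        jobs[job_id] = {"status": "failed", "error": str(exc)}

@app.post("/images")
async def create_image(prompt: str):
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "pending"}
    asyncio.create_task(run_generation(job_id, prompt))
    return {"job_id": job_id}  # responds in milliseconds; no 504 possible

@app.get("/images/{job_id}")
async def poll_image(job_id: str):
    return jobs.get(job_id, {"status": "unknown"})

The POST returns immediately, and the worker finishes even if the caller disconnects, which is exactly the property described above.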

c) Drop the quality knob. Most prompts don’t need quality="high". Switching to medium saves 60–120 seconds per call. JPEG output is faster to encode than PNG when latency matters.
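
A sketch of the lower-latency settings, assuming gpt-image-2 accepts the same quality and output_format values as gpt-image-1 ("low" / "medium" / "high" and "png" / "jpeg" / "webp"):

result = client.images.generate(
    model="openai/gpt-image-2",
    prompt="ramen shop at 2am, wet pavement, neon reflections",
    size="1024x1024",
    quality="medium",       # vs. "high": saves 60–120 seconds per call
    output_format="jpeg",   # faster to encode than PNG
)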

What not to do: retry the same prompt the moment your client gives up. The upstream is still running. You just doubled your bill and primed yourself for a 429.

2. moderation_blocked

The second largest bucket. OpenAI runs safety checks twice — once on your prompt and any reference images, and once on the generated output. So:

  • Your request was rejected by the safety system → input filter. Change the prompt.
  • Generated image was filtered → output filter. A clean-looking prompt produced something the filter didn’t like. Change the scene, not just the wording.

Retrying the same prompt does nothing. The filter is deterministic enough that the same input will block the same way every time, and you’ll burn requests for hours waiting for it to “warm up”.

High-risk categories the output filter catches even when the prompt is mild:

  • Anyone who could read as a minor, including stylized
  • Recognizable public figures (politicians, celebrities, athletes)
  • Trademarked characters and IP (Disney, Nintendo, Marvel)
  • Realistic medical, surgical, or injury imagery
  • Anything resembling an ID document, real currency, or a registered logo

The pragmatic fix is upstream: pre-validate the prompt in your own moderation layer so you don’t waste 200 seconds on a request that was always going to be rejected. A simple regex pass that flags celebrity names, brand names, and a few danger words covers most of it.
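
A minimal sketch of that pre-check; the patterns are illustrative placeholders, not a complete blocklist:

import re

# Illustrative patterns only; extend with the names and brands your users actually send,
# including a maintained list of public-figure names.
BLOCK_PATTERNS = [
    r"\b(disney|nintendo|marvel|pikachu)\b",          # trademarked IP and characters
    r"\b(passport|driver'?s licen[cs]e|banknote)\b",  # ID documents and currency
    r"\b(surgery|gunshot wound)\b",                   # realistic medical / injury imagery
]
BLOCK_RE = re.compile("|".join(BLOCK_PATTERNS), re.IGNORECASE)

def prompt_looks_risky(prompt: str) -> bool:
    # Reject locally in microseconds instead of waiting ~200s
    # for the upstream filter to do it for you.
    return BLOCK_RE.search(prompt) is not None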

3. Parameter mismatches in third-party clients

If you’re calling GPT-Image-2 through Cherry Studio or a similar desktop client, this is probably your problem. The chat-completions parameter shape and the image-generation parameter shape are different, and clients that assume one API end up sending fields the other rejects.

At the time of writing:

  • response_format — not accepted by gpt-image-2. Strip it.
  • Chat-style messages: [...] arrays sent to /v1/images/generations — wrong endpoint shape entirely.
  • n>1 — works, but each unit counts separately for rate-limit budgeting.

The minimal, correct call:

curl https://api.ofox.ai/v1/images/generations \
  -H "Authorization: Bearer $OFOX_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-image-2",
    "prompt": "quiet ramen shop at 2am, neon signs reflected on wet pavement",
    "size": "1024x1024"
  }'

If a wrapper client is failing and the curl above succeeds with the same key, the client is sending extras. Most third-party clients have an Advanced Parameters panel — disable response_format and similar fields.
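
If you control a proxy in front of the wrapper, a sketch of that strip, using the parameters covered in this post as the allow-list (check your provider’s docs for the authoritative set):

# Parameters this post uses with /v1/images/generations; everything else is dropped.
ALLOWED_IMAGE_FIELDS = {
    "model", "prompt", "size", "quality", "n",
    "output_format", "stream", "partial_images", "input_fidelity",
}

def sanitize_image_payload(payload: dict) -> dict:
    # Removes chat-style extras (messages, response_format, ...)
    # that wrapper clients tend to attach.
    return {k: v for k, v in payload.items() if k in ALLOWED_IMAGE_FIELDS}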

4. Rate limiting

Image rate limits are not the same as text rate limits. The same key that does 50 RPM on gpt-5-4 will start eating 429s at 10 concurrent image calls.

Practical concurrency budget:

  • 3 concurrent + 2s gap between batches: stable
  • 5 concurrent: occasional 429 under load
  • 10 concurrent: nearly guaranteed throttle

A semaphore at the call site is the cheapest fix:

import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()
sem = asyncio.Semaphore(3)  # at most 3 image calls in flight

async def generate(prompt: str):
    async with sem:
        result = await client.images.generate(
            model="openai/gpt-image-2",
            prompt=prompt,
            size="1024x1024",
        )
        await asyncio.sleep(2)  # 2s gap before releasing the slot
        return result

Pair it with exponential backoff that honors Retry-After on 429. And — important — don’t retry on timeout. See section 1.
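
A sketch of that loop, assuming the official Python SDK, whose openai.RateLimitError carries the raw HTTP response (so Retry-After is readable):

import asyncio
import random

import openai

async def generate_with_backoff(prompt: str, max_retries: int = 5):
    for attempt in range(max_retries):
        try:
            return await generate(prompt)  # the semaphore-wrapped call above
        except openai.RateLimitError as exc:
            # Honor the server's hint when present; otherwise exponential + jitter.
            retry_after = exc.response.headers.get("retry-after")
            delay = float(retry_after) if retry_after else min(2 ** attempt, 60) + random.random()
            await asyncio.sleep(delay)
    raise RuntimeError(f"still rate-limited after {max_retries} attempts")

Note it only catches 429s: a timed-out call falls through un-retried, per section 1.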

5. Organization verification stuck

OpenAI gates GPT-Image-2 behind an org-verification step that uses Persona for ID checks. The common ways it stalls:

  • Country not on the allow-list (and the list isn’t published)
  • 90-day re-verification lock after a previous successful verification
  • Persona session expires before you submit — restart from the OpenAI dashboard, never from the original Persona email
  • Verification succeeded but model access takes 6–24 hours to propagate

If you’re staring at “Your organization must be verified” and don’t want to wait, calling GPT-Image-2 through an already-verified aggregator skips the handshake entirely. The next section has the one-line SDK swap.

Pick the right tool for what you’re building

GPT-Image-2’s failure modes are easier to dodge if you stop sending requests in the most adversarial shape possible — straight from your laptop, on default timeouts, across an unstable network path. Two cleaner routes; pick the one that matches what you’re doing.

Shipping code? Route through Ofox’s OpenAI-compatible endpoint

from openai import OpenAI

client = OpenAI(
    api_key="ofox-...",
    base_url="https://api.ofox.ai/v1",
)

stream = client.images.generate(
    model="openai/gpt-image-2",
    prompt="...",
    size="1024x1024",
    stream=True,
    partial_images=2,
)

Most of the per-call time is OpenAI’s server-side generation, and that part doesn’t change no matter how you route. What changes is the network path between your client and OpenAI’s edge. Direct Asia-Pacific → US-East routes typically eat a few hundred milliseconds of latency plus periodic packet loss and TLS-handshake retries — fine for fast text calls, painful on a 100–200 second streaming connection where any one of those hiccups kills the whole generation. Ofox holds a stable optimized path for Asia-Pacific traffic, so the streaming bytes actually arrive over the full window. As a bonus, the org-verification wall from section 5 is already cleared on Ofox’s side — same model ID works without Persona.

Same OpenAI SDK. Same model ID openai/gpt-image-2. Same streaming, partial_images, reference inputs, input_fidelity. The only change is one line in your client setup.

Just need pictures, no integration to debug? Use gptimage2.plus

gptimage2.plus is GPT-Image-2 in a regular browser UI. It absorbs the timeouts and retries on its side — type a prompt, get back a 2K image. You get five free credits on signup, one free generation per day without login, and the code LAUNCH50 currently takes 50% off the first month.

Right tool when:

  • You don’t write code and you want a few images
  • ChatGPT’s web UI keeps stalling on image generation today
  • You want preset workflows (product shots, profile photos, photo restoration) instead of writing prompts from scratch

Wrong tool if you’re embedding image generation in a product or running batch jobs in a pipeline — use the API for that.

Shortest path to “stops failing” on the API

  1. Turn on streaming with partial_images=2. Biggest single win.
  2. Cap concurrency at 3 with a semaphore. Honor Retry-After.
  3. Pre-validate prompts for high-risk moderation patterns.
  4. Don’t retry on timeout — let the original request finish, or move to async.
  5. Route through an Asia-stable aggregator base URL if you’re in CN / JP / KR / SEA, or blocked on org verification.

The model itself is the best text-to-image generator on the Arena leaderboard right now. The failure modes are real but narrow, and every one of them has a one-line fix.