# StableStudio API

> AI image/video generation via micropayments. USDC on Base, Solana, or Tempo. No API keys.

Base URL: `https://stablestudio.dev`

## Recommended Defaults

- **Image generation:** `gpt-image-2` — Best default quality. Use it for most image tasks unless the user prioritizes speed or explicitly requests another model.
- **Fast image generation:** `nano-banana-pro` — Use when speed matters or the user wants a faster draft; supports up to 4K resolution.
- **Video generation:** `veo-3.1` — Best quality/cost ratio; supports up to 1080p resolution.

## Payment Flow

1. `POST /api/generate/{model}/{operation}` without a payment header
   - Returns `402` with a `PAYMENT-REQUIRED` header (base64 JSON)
2. Decode the requirements, sign a USDC authorization, and POST again with a `PAYMENT-SIGNATURE` header
   - Returns `200` with `{jobId, status:"pending"}` and a `PAYMENT-RESPONSE` header
3. Poll `GET /api/jobs/{jobId}` with a `SIGN-IN-WITH-X` header until the job completes

## 402 Response Format

The response body is an empty object `{}`. The payment requirements are in a header:

```
PAYMENT-REQUIRED: 
```

Decoded:

```json
{
  "x402Version": 2,
  "accepts": [
    {
      "scheme": "exact",
      "network": "eip155:8453",
      "amount": "134000",
      "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
      "payTo": "0xfbd7b7Ed48146aD9bEfF956212c77cE056815ad0"
    }
  ],
  "resource": {
    "url": "https://stablestudio.dev/api/generate/nano-banana-pro/generate",
    "description": "Nano Banana Pro - generate"
  }
}
```

`amount` is in USDC micro-units (6 decimals): 134000 = $0.134.
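Decoding the `PAYMENT-REQUIRED` header needs nothing beyond base64 and JSON. A minimal Python sketch of the decode and the micro-unit conversion described above (the signing step itself needs a wallet library and is omitted; the helper names are illustrative, not part of any SDK):

```python
import base64
import json

def decode_payment_required(header_value: str) -> dict:
    """Decode the base64 JSON payment requirements from a 402 response."""
    return json.loads(base64.b64decode(header_value))

def usdc_to_dollars(amount: str) -> float:
    """USDC micro-units use 6 decimals: '134000' -> 0.134."""
    return int(amount) / 1_000_000

# Demo: the documented requirements, re-encoded the way the server would send them.
raw = base64.b64encode(json.dumps({
    "x402Version": 2,
    "accepts": [{
        "scheme": "exact",
        "network": "eip155:8453",
        "amount": "134000",
        "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
        "payTo": "0xfbd7b7Ed48146aD9bEfF956212c77cE056815ad0"
    }]
}).encode()).decode()

req = decode_payment_required(raw)
offer = req["accepts"][0]
print(usdc_to_dollars(offer["amount"]))  # 0.134
```

From here, the `amount`, `asset`, and `payTo` fields feed into the USDC authorization that goes back in the `PAYMENT-SIGNATURE` header.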
## Routes

| Endpoint                                 | Cost          | Time             |
| ---------------------------------------- | ------------- | ---------------- |
| `/api/generate/nano-banana/generate`     | $0.045–$0.151 | ~5s              |
| `/api/generate/nano-banana/edit`         | $0.045–$0.151 | ~5s              |
| `/api/generate/nano-banana-pro/generate` | $0.13–$0.24   | ~10s             |
| `/api/generate/nano-banana-pro/edit`     | $0.13–$0.24   | ~10s             |
| `/api/generate/gpt-image-2/generate`     | $0.005–$0.21  | can take minutes |
| `/api/generate/gpt-image-2/edit`         | $0.005–$0.21  | can take minutes |
| `/api/generate/gpt-image-1.5/generate`   | $0.009–$0.20  | ~3s              |
| `/api/generate/gpt-image-1.5/edit`       | $0.009–$0.20  | ~3s              |
| `/api/generate/flux-2-pro/generate`      | $0.02–$0.04   | ~5s              |
| `/api/generate/flux-2-pro/edit`          | $0.03–$0.06   | ~5s              |
| `/api/generate/flux-2-max/generate`      | $0.04–$0.17   | ~8s              |
| `/api/generate/flux-2-max/edit`          | $0.04–$0.17   | ~8s              |
| `/api/generate/grok/generate`            | $0.07         | ~3s              |
| `/api/generate/grok/edit`                | $0.022        | ~3s              |
| `/api/generate/grok-video/generate`      | $0.15–$0.75   | ~17s             |
| `/api/generate/seedance/t2v`             | $0.09–$0.54/s | 1-3min           |
| `/api/generate/seedance/i2v`             | $0.09–$0.54/s | 1-3min           |
| `/api/generate/seedance-fast/t2v`        | $0.08–$0.17/s | 1-3min           |
| `/api/generate/seedance-fast/i2v`        | $0.08–$0.17/s | 1-3min           |
| `/api/generate/wan-2.6/t2v`              | $0.50–$2.25   | 2-5min           |
| `/api/generate/wan-2.6/i2v`              | $0.50–$2.25   | 2-5min           |
| `/api/generate/sora-2/generate`          | $0.40–$1.20   | 1-3min           |
| `/api/generate/sora-2-pro/generate`      | $1.20–$6.00   | 2-5min           |
| `/api/generate/veo-3.1/generate`         | $1.60–$3.20   | 1-2min           |
| `/api/generate/veo-3.1-fast/generate`    | $1.00–$2.00   | ~30s             |
| `/api/upload`                            | $0.01         | instant          |

Canonical OpenAPI spec: `GET /api/openapi.json`.

## Input Schemas

All edit/i2v endpoints that take media require the [File Upload](#file-upload) flow first — use the returned `blobUrl` in the appropriate `images`, `image`, or `urls` field.
**nano-banana generate** (Gemini 3.1 Flash):

```json
{
  "prompt": "string",
  "aspectRatio": "1:1|1:4|1:8|2:3|3:2|3:4|4:1|4:3|4:5|5:4|8:1|9:16|16:9|21:9",
  "imageSize": "512|1K|2K|4K",
  "thinkingLevel": "minimal|high"
}
```

**nano-banana edit** (1–14 reference images):

```json
{
  "prompt": "string",
  "aspectRatio": "...same as generate",
  "imageSize": "512|1K|2K|4K",
  "thinkingLevel": "minimal|high",
  "images": ["https://blob-url..."]
}
```

**nano-banana-pro generate:**

```json
{
  "prompt": "string",
  "aspectRatio": "1:1|2:3|3:2|3:4|4:3|4:5|5:4|9:16|16:9|21:9",
  "imageSize": "1K|2K|4K"
}
```

**nano-banana-pro edit** (1–14 reference images):

```json
{
  "prompt": "string",
  "aspectRatio": "...same as generate",
  "imageSize": "1K|2K|4K",
  "images": ["https://blob-url..."]
}
```

**gpt-image-2 generate:**

```json
{
  "prompt": "string",
  "quality": "low|medium|high",
  "size": "1024x1024|1536x1024|1024x1536|auto",
  "background": "opaque|auto",
  "output_format": "png|jpeg|webp",
  "moderation": "low|auto"
}
```

**gpt-image-2 edit** (adds `images`):

```json
{
  "prompt": "string",
  "quality": "low|medium|high",
  "size": "1024x1024|1536x1024|1024x1536|auto",
  "background": "opaque|auto",
  "output_format": "png|jpeg|webp",
  "moderation": "low|auto",
  "images": ["https://blob-url..."]
}
```

**gpt-image-1.5 generate:**

```json
{
  "prompt": "string",
  "quality": "low|medium|high",
  "size": "1024x1024|1536x1024|1024x1536|auto",
  "background": "transparent|opaque|auto",
  "output_format": "png|jpeg|webp",
  "moderation": "low|auto"
}
```

**gpt-image-1.5 edit** (adds `input_fidelity`, `images`):

```json
{
  "prompt": "string",
  "quality": "low|medium|high",
  "size": "1024x1024|1536x1024|1024x1536|auto",
  "background": "transparent|opaque|auto",
  "output_format": "png|jpeg|webp",
  "moderation": "low|auto",
  "input_fidelity": "high|low",
  "images": ["https://blob-url..."]
}
```

**flux-2-pro generate:**

```json
{
  "prompt": "string",
  "aspect_ratio": "1:1|16:9|9:16|3:2|2:3|4:5|5:4|4:3|3:4",
  "resolution": "0.5 MP|1 MP|2 MP",
  "output_format": "webp|jpg|png",
  "output_quality": 80,
  "safety_tolerance": 2,
  "prompt_upsampling": false
}
```

**flux-2-pro edit** (1–8 reference images):

```json
{
  "prompt": "string",
  "aspect_ratio": "...same as generate",
  "resolution": "0.5 MP|1 MP|2 MP",
  "images": ["https://blob-url..."]
}
```

**flux-2-max generate** (up to 4 MP):

```json
{
  "prompt": "string",
  "aspect_ratio": "1:1|16:9|9:16|3:2|2:3|4:5|5:4|4:3|3:4",
  "resolution": "0.5 MP|1 MP|2 MP|4 MP",
  "output_format": "webp|jpg|png",
  "output_quality": 80,
  "safety_tolerance": 2,
  "prompt_upsampling": false
}
```

**flux-2-max edit** (1–10 reference images):

```json
{
  "prompt": "string",
  "aspect_ratio": "...same as generate",
  "resolution": "0.5 MP|1 MP|2 MP|4 MP",
  "images": ["https://blob-url..."]
}
```

**grok generate** (13 aspect ratios including ultra-wide):

```json
{
  "prompt": "string",
  "aspect_ratio": "1:1|16:9|9:16|4:3|3:4|3:2|2:3|2:1|1:2|19.5:9|9:19.5|20:9|9:20"
}
```

**grok edit:**

```json
{
  "prompt": "string",
  "aspect_ratio": "...same as generate",
  "images": ["https://blob-url..."]
}
```

**grok-video generate** (single endpoint — pass `image` for image-to-video):

```json
{
  "prompt": "string",
  "duration": "3|6|9|12|15",
  "resolution": "480p|720p",
  "aspect_ratio": "1:1|16:9|9:16|4:3|3:4|3:2|2:3",
  "image": "https://blob-url..."
}
```

**seedance / seedance-fast t2v** (Seedance 2 Pro/Fast happy path):

```json
{
  "prompt": "string",
  "duration": "5",
  "aspectRatio": "16:9",
  "outputResolution": "720p"
}
```

Optional advanced fields: `resolution` (`720x720|720x960|960x720|1280x720|720x1280|1280x540`), `upscaleResolution: "4k"`, `callBackUrl`.
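The pipe-delimited enums in the schemas above can be checked client-side before paying for a job. A minimal sketch, using the nano-banana generate schema as an example (the validator itself is illustrative, not part of the API):

```python
# Allowed values copied from the nano-banana generate schema above.
NANO_BANANA_GENERATE = {
    "aspectRatio": {"1:1", "1:4", "1:8", "2:3", "3:2", "3:4", "4:1", "4:3",
                    "4:5", "5:4", "8:1", "9:16", "16:9", "21:9"},
    "imageSize": {"512", "1K", "2K", "4K"},
    "thinkingLevel": {"minimal", "high"},
}

def validate(payload: dict, enums: dict) -> list:
    """Return the fields whose values fall outside the documented enums."""
    return [field for field, allowed in enums.items()
            if field in payload and payload[field] not in allowed]

print(validate({"prompt": "a red fox", "aspectRatio": "16:9", "imageSize": "2K"},
               NANO_BANANA_GENERATE))  # []
print(validate({"prompt": "a red fox", "imageSize": "8K"},
               NANO_BANANA_GENERATE))  # ['imageSize']
```

The same pattern applies to any of the schemas here; only the enum table changes per model.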
**seedance / seedance-fast i2v** (first/last-frame keyframe):

```json
{
  "prompt": "optional guidance",
  "duration": "5",
  "aspectRatio": "16:9",
  "outputResolution": "720p",
  "mode": "keyframe",
  "urls": ["https://first-image...", "https://optional-last-image..."],
  "urlMediaTypes": ["image", "image"]
}
```

**seedance / seedance-fast i2v** (reference mode with images/video/audio):

```json
{
  "prompt": "@image1 keeps the character identity while @video1 supplies the camera move, synced to @audio1",
  "duration": "5",
  "aspectRatio": "16:9",
  "outputResolution": "720p",
  "mode": "reference",
  "urls": ["https://image-reference...", "https://video-reference..."],
  "urlMediaTypes": ["image", "video"],
  "audioUrls": ["https://audio-reference..."]
}
```

Seedance `urlMediaTypes` must align 1:1 with `urls`: use `"image"` for images and `"video"` for videos. StableStudio verifies media types from each URL before charging and persists the verified `urlMediaTypes`, but callers should still include it so `@image1` and `@video1` references are unambiguous.

In reference mode, `@imageN` counts image URLs only, `@videoN` counts video URLs only, and `@audioN` counts `audioUrls` only. Use one strong reference first, then add more control if needed; overloaded prompts with many references can conflict.

Seedance reference/prompt sources:

- [WaveSpeed Seedance 2.0 Guide](https://wavespeed.ai/blog/posts/seedance-2-0-complete-guide-multimodal-video-creation/) — multimodal reference limits, @ mention syntax, and motion/audio use cases.
- [Magic Hour Reference Guide](https://magichour.ai/blog/seedance-20-reference-guide) — identity, motion, audio sync, and common reference failure modes.
- [SeaArt Best Prompts](https://www.seaart.ai/blog/seedance-2-0-prompt) — five-segment, CRAFT, and timeline prompt structures.

Use `seedance-fast` for the default happy path. Use `seedance` for Pro quality, 1080p, or higher-fidelity output. `seedance-fast` supports 480p/720p.
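The alignment and @mention rules above can be checked locally before submitting a reference-mode job. A sketch with a hypothetical helper (not part of the API) that verifies `urls`/`urlMediaTypes` line up 1:1 and that every `@imageN`/`@videoN`/`@audioN` mention resolves to a real reference:

```python
import re

def check_references(prompt, urls, url_media_types, audio_urls=()):
    """Check the seedance reference-mode rules documented above.

    @imageN counts image URLs only, @videoN counts video URLs only,
    and @audioN counts audioUrls only.
    """
    errors = []
    if len(urls) != len(url_media_types):
        errors.append("urlMediaTypes must align 1:1 with urls")
    counts = {
        "image": url_media_types.count("image"),
        "video": url_media_types.count("video"),
        "audio": len(audio_urls),
    }
    for kind, n in re.findall(r"@(image|video|audio)(\d+)", prompt):
        if not 1 <= int(n) <= counts[kind]:
            errors.append(f"@{kind}{n} has no matching reference")
    return errors

print(check_references(
    "@image1 keeps identity while @video1 supplies the camera move",
    ["https://img...", "https://vid..."], ["image", "video"]))  # []
print(check_references("@audio1 drives the beat",
                       ["https://img..."], ["image"]))
# ['@audio1 has no matching reference']
```

An empty list means the payload is at least internally consistent; StableStudio still verifies the actual media types server-side before charging.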
4K is available through `upscaleResolution: "4k"` and is significantly more expensive.

**wan-2.6 t2v:**

```json
{
  "prompt": "string",
  "duration": "5|10|15",
  "size": "1280*720|720*1280|1920*1080|1080*1920",
  "negativePrompt": "string",
  "enablePromptExpansion": true,
  "multiShots": false,
  "audioUrl": "https://...",
  "seed": 0
}
```

**wan-2.6 i2v:**

```json
{
  "prompt": "string",
  "image": "https://blob-url...",
  "duration": "5|10|15",
  "resolution": "720p|1080p",
  "negativePrompt": "string",
  "enablePromptExpansion": true,
  "multiShots": false,
  "audioUrl": "https://...",
  "seed": 0
}
```

**sora-2 generate** (pass `input_reference` for image-to-video):

```json
{
  "prompt": "string",
  "seconds": "4|8|12",
  "size": "1280x720|720x1280",
  "input_reference": "https://blob-url...",
  "autoCrop": true
}
```

**sora-2-pro generate** (same as sora-2, extra sizes):

```json
{
  "prompt": "string",
  "seconds": "4|8|12",
  "size": "1280x720|720x1280|1792x1024|1024x1792",
  "input_reference": "https://blob-url...",
  "autoCrop": true
}
```

**veo-3.1 / veo-3.1-fast generate:**

```json
{
  "prompt": "string",
  "durationSeconds": "4|6|8",
  "resolution": "720p|1080p",
  "aspectRatio": "16:9|9:16",
  "negativePrompt": "string",
  "imageMode": "none|first-frame|reference|interpolation",
  "image": "https://first-frame-blob-url...",
  "lastFrame": "https://last-frame-blob-url...",
  "referenceImages": ["https://blob-url..."]
}
```

Veo modes:

- Omit `image`/`lastFrame`/`referenceImages` for pure text-to-video.
- Pass `image` alone for image-to-video (first frame).
- Pass `image` + `lastFrame` with `imageMode: "interpolation"` to animate between frames.
- Pass up to 3 `referenceImages` with `imageMode: "reference"` for style guidance.

## File Upload

Upload images, video, or audio for editing, image-to-video, or reference video generation.
Three-step flow:

**Step 1: Get upload token** (payment, $0.01)

```
POST /api/upload
PAYMENT-SIGNATURE: 
Content-Type: application/json

{"filename": "reference.mp4", "contentType": "video/mp4"}
```

Returns:

```json
{
  "uploadId": "uuid",
  "clientToken": "vercel_blob_...",
  "pathname": "uploads/uuid/image.png",
  "expiresAt": "..."
}
```

**Step 2: Upload file directly to Vercel Blob**

```bash
curl -X PUT "https://vercel.com/api/blob/?pathname=uploads/uuid/image.png" \
  -H "authorization: Bearer $clientToken" \
  -H "x-content-type: image/png" \
  -H "x-api-version: 11" \
  --data-binary @image.png
```

Returns `{"url": "https://....blob.vercel-storage.com/..."}`.

**Step 3: Confirm upload** (SIGN-IN-WITH-X auth, no payment)

```
POST /api/upload/confirm
SIGN-IN-WITH-X: 
Content-Type: application/json

{"uploadId": "uuid", "blobUrl": "https://....blob.vercel-storage.com/..."}
```

Returns `{"success": true, "upload": {"id": "...", "blobUrl": "..."}}`. Use the `blobUrl` in edit/i2v requests.

## Job Polling (SIGN-IN-WITH-X)

Job routes require wallet signature authentication (no payment):

```
GET /api/jobs/{jobId}
SIGN-IN-WITH-X: 
```

The header contains a base64-encoded CAIP-122 message:

```json
{
  "domain": "stablestudio.dev",
  "address": "0x...",
  "uri": "https://stablestudio.dev/api/jobs/{jobId}",
  "version": "1",
  "chainId": "eip155:8453",
  "nonce": "",
  "issuedAt": "",
  "expirationTime": "",
  "signature": "0x..."
}
```

If auth is missing or invalid, the server returns 402 with a SIWX extension:

```json
{
  "x402Version": 2,
  "accepts": [],
  "extensions": {
    "sign-in-with-x": {
      "info": {
        "domain": "stablestudio.dev",
        "uri": "https://stablestudio.dev/api/jobs/{jobId}",
        "version": "1",
        "nonce": "",
        "issuedAt": "",
        "expirationTime": ""
      },
      "supportedChains": [{ "chainId": "eip155:8453", "type": "eip191" }],
      "schema": {
        "...": "..."
      }
    }
  }
}
```

**Routes:**

- `GET /api/jobs/{jobId}` — Get job status
- `GET /api/jobs` — List jobs (`?limit=20&status=complete`)
- `DELETE /api/jobs/{jobId}` — Delete a failed job

**Response:**

```json
{
  "status": "complete",
  "result": { "imageUrl": "https://..." }
}
```

Videos return `{videoUrl, thumbnailUrl}`. Returned URLs expire after roughly 20 minutes — download the asset immediately once the job completes.

**Do not resubmit a generation while its job is `pending` or `loading`.** Normal generation can take several minutes, especially for GPT Image and the video models. Keep polling the original `jobId`; duplicate submissions create duplicate paid jobs.

**Polling intervals:**

| Model family                   | Poll every |
| ------------------------------ | ---------- |
| Most image models              | 5s         |
| `gpt-image-2`, `gpt-image-1.5` | 10s        |
| Video models                   | 15s        |
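The polling guidance above fits in one loop. A sketch, assuming a caller-supplied `fetch_job(job_id)` that performs the authenticated `GET /api/jobs/{jobId}` and returns the parsed JSON (the function names here are illustrative, not an official client):

```python
import time

# Intervals (seconds) from the table above.
SLOW_IMAGE_MODELS = {"gpt-image-2", "gpt-image-1.5"}
VIDEO_MODELS = {"grok-video", "seedance", "seedance-fast", "wan-2.6",
                "sora-2", "sora-2-pro", "veo-3.1", "veo-3.1-fast"}

def poll_interval(model: str) -> int:
    if model in VIDEO_MODELS:
        return 15
    if model in SLOW_IMAGE_MODELS:
        return 10
    return 5  # most image models

def wait_for_job(fetch_job, job_id, model, sleep=time.sleep):
    """Poll the same jobId until it leaves pending/loading; never resubmit."""
    while True:
        job = fetch_job(job_id)
        if job["status"] not in ("pending", "loading"):
            return job
        sleep(poll_interval(model))

# Usage with a stubbed fetch_job standing in for the real HTTP call:
responses = iter([
    {"status": "pending"},
    {"status": "complete", "result": {"imageUrl": "https://..."}},
])
done = wait_for_job(lambda _id: next(responses), "job-123", "nano-banana",
                    sleep=lambda s: None)
print(done["status"])  # complete
```

Because the loop keys on the original `jobId` and only ever sleeps between polls, it cannot create the duplicate paid jobs that resubmission would.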