CreativeAI
Live Production Benchmarks

API Performance & Latency

Sub-200ms TTFB, 1.2-second image generation, 45-second video generation. Real numbers from live production traffic, not cherry-picked benchmarks.

API TTFB: <180ms (p50, global)
Image Gen: 1.2s (Seedream, 1024×1024)
Video Gen: ~45s (Kling, 5s clip)
Uptime: 99.97% (rolling 90 days)

Per-Model Latency Benchmarks

Real production p50 numbers across every model we serve. Updated daily from aggregated telemetry.

Seedream V1.5 (Image)

TTFB (p50): 142ms
Generation (1024×1024): 1.2s
Generation (2048×2048): 2.1s
Throughput (burst): 48 img/min

Seedream V2.0 (Image)

TTFB (p50): 156ms
Generation (1024×1024): 1.8s
Generation (2048×2048): 3.4s
Throughput (burst): 32 img/min

Kling V2.0 (Video)

TTFB (p50): 189ms
Generation (5s @ 720p): 45s
Generation (10s @ 720p): 78s
Throughput (sustained): 80 vid/hr

Kling V3.0 (Video)

TTFB (p50): 195ms
Generation (5s @ 1080p): 38s
Generation (10s @ 1080p): 65s
Throughput (sustained): 95 vid/hr

Vidu Q1 (Video)

TTFB (p50): 172ms
Generation (5s @ 720p): 52s
Generation (10s @ 720p): 91s
Throughput (sustained): 68 vid/hr

Percentile Breakdown

Tail latency matters. Here's what p50 through p99 look like across our core metrics.

Percentile   TTFB    Image Gen (1024px)   Video Gen (5s clip)
p50          142ms   1.18s                43s
p75          168ms   1.34s                48s
p90          201ms   1.62s                56s
p99          312ms   2.41s                72s

How We Compare

Side-by-side with other AI generation API providers on the metrics that matter most.

Provider     TTFB        Image Gen   Video Gen    Uptime
CreativeAI   142–195ms   1.2s        38–45s       99.97%
Provider A   350–600ms   3.5s        90–120s      99.5%
Provider B   500–900ms   4.2s        120–180s     99.2%

Built for Speed at Scale

The infrastructure behind the numbers, from edge routing to GPU auto-scaling.

Global Edge Routing

Requests hit the nearest PoP before reaching GPU clusters. Median network hop < 40ms worldwide.

Dedicated GPU Pools

No cold starts. Warm model instances on A100 / H100 clusters with auto-scaling for burst traffic.

99.97% Uptime SLA

Redundant inference backends with automatic failover. Status page transparency with 90-day rolling metrics.
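A 99.97% target over a rolling 90-day window translates to a concrete downtime budget; the arithmetic is straightforward:

```python
# Downtime budget implied by a 99.97% uptime target over 90 days
window_minutes = 90 * 24 * 60          # 129,600 minutes in the window
downtime_budget = window_minutes * (1 - 0.9997)

print(f"{downtime_budget:.1f} minutes of allowed downtime")  # ≈ 38.9 minutes
```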

Real-Time Monitoring

Per-request latency traces exposed via response headers. Integrate with your own Datadog / Grafana dashboards.

Auto-Scaling

Burst from 10 to 10,000 concurrent requests without pre-warming. Queue-based scheduling with priority tiers.
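Queue-based scheduling with priority tiers can be modeled with a heap. A minimal sketch (the tier names and scheduling policy here are assumptions for illustration, not the actual scheduler):

```python
import heapq
import itertools

# Lower number = higher priority; tier names are illustrative
TIERS = {"enterprise": 0, "pro": 1, "free": 2}
counter = itertools.count()  # FIFO tie-breaker within a tier
queue = []

def submit(job_id: str, tier: str) -> None:
    """Enqueue a job; higher tiers are always dequeued first."""
    heapq.heappush(queue, (TIERS[tier], next(counter), job_id))

def next_job() -> str:
    """Pop the highest-priority job (FIFO within the same tier)."""
    return heapq.heappop(queue)[2]

submit("job-a", "free")
submit("job-b", "enterprise")
submit("job-c", "pro")
print(next_job())  # "job-b" dequeues first despite arriving later
```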

Async + Webhooks

Fire-and-forget generation with webhook callbacks. No polling overhead: get notified the instant results are ready.
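A webhook consumer should verify each callback before trusting it. A sketch of HMAC signature verification (the signing scheme and secret format here are assumptions for illustration, not a documented part of the API):

```python
import hashlib
import hmac

def verify_webhook(payload: bytes, signature_hex: str, secret: bytes) -> bool:
    """Recompute the HMAC-SHA256 of the raw body and compare in constant time.

    The hex-digest scheme is an assumption; check the provider's webhook
    docs for the actual signature header and algorithm.
    """
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

secret = b"whsec_example"  # hypothetical signing secret
body = b'{"id": "gen_123", "status": "succeeded"}'
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_webhook(body, sig, secret))  # True
```

Always verify against the raw request body, not a re-serialized copy, since any whitespace or key-ordering change breaks the digest.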

Per-Request Observability

Every response ships latency headers you can pipe straight into your monitoring stack.

# Response headers on every API call
X-Request-Duration: 1247ms
X-Queue-Wait: 12ms
X-Inference-Time: 1183ms
X-Model: seedream-v1.5
X-Region: us-east-1

# Pipe into Datadog, Grafana, or your own dashboards
curl -s -o /dev/null \
  -w "TTFB: %{time_starttransfer}s\nTotal: %{time_total}s\n" \
  https://api.creativeai.run/v1/generate
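The timing headers split total request time into queue wait and inference time, so each component can be emitted as its own metric. A sketch that parses a header dict into numbers (the parsing and metric names are illustrative):

```python
def parse_latency_headers(headers: dict) -> dict:
    """Convert the X-* timing headers (e.g. '1247ms') into numeric metrics."""
    ms = lambda v: float(v.rstrip("ms"))
    total = ms(headers["X-Request-Duration"])
    queue = ms(headers["X-Queue-Wait"])
    inference = ms(headers["X-Inference-Time"])
    return {
        "total_ms": total,
        "queue_ms": queue,
        "inference_ms": inference,
        # Time not accounted for by queueing or inference (network, overhead)
        "overhead_ms": total - queue - inference,
    }

# Header values from the example response above
metrics = parse_latency_headers({
    "X-Request-Duration": "1247ms",
    "X-Queue-Wait": "12ms",
    "X-Inference-Time": "1183ms",
})
print(metrics["overhead_ms"])  # 52.0
```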

Need a Latency SLA?

Enterprise plans include contractual p99 latency commitments, dedicated GPU pools, and priority queue access. Talk to our solutions team.

Frequently Asked Questions