Nate’s public AI lab

Image Model Arena

A fixed 40-prompt set for reading image models like working creative tools: realism, age control, product continuity, text handling, style range, speed, and cost.

The field

6 models, one locked test

Every model gets the same prompts and the same read. The useful comparison is not whether an image is pretty; it is where each model reliably holds detail under pressure.
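The "same prompts, same read" rule can be sketched as a small loop. This is a minimal illustration, not the harness actually used here: `generate` is a hypothetical stand-in for whatever client sends the request, and it is assumed to return image bytes, or `None` when a request is blocked. The model slugs are the ones shown on the cards in this section.

```python
from typing import Callable, Dict, List, Optional

# Model slugs as they appear on the cards in this section.
MODELS = [
    "x-ai/grok-imagine-image-quality",
    "google/gemini-2.5-flash-image",
    "google/gemini-3-pro-image-preview",
    "bytedance-seed/seedream-4.5",
    "openai/gpt-5.4-image-2",
    "microsoft/mai-image-2.5",
]


def run_locked_set(
    models: List[str],
    prompts: List[str],
    generate: Callable[[str, str], Optional[bytes]],
) -> Dict[str, Dict[str, int]]:
    """Send every prompt, unchanged, to every model and tally the outcomes.

    `generate` is hypothetical: assumed to return image bytes on success
    and None when the provider blocks the prompt.
    """
    tally: Dict[str, Dict[str, int]] = {}
    for model in models:
        delivered = 0
        for prompt in prompts:
            if generate(model, prompt) is not None:
                delivered += 1
        tally[model] = {
            "delivered": delivered,
            "blocked": len(prompts) - delivered,
        }
    return tally


# Toy generator for demonstration only: blocks any prompt containing "blocked".
def fake_generate(model: str, prompt: str) -> Optional[bytes]:
    return None if "blocked" in prompt else b"image-bytes"


result = run_locked_set(MODELS, ["a diner at dusk", "a blocked prompt"], fake_generate)
print(result["microsoft/mai-image-2.5"])
```

Keeping the prompt list fixed and passing it to every model untouched is what makes the delivered and blocked counts comparable across cards.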

xAI / 2K / 1:1 / x-ai/grok-imagine-image-quality

Grok Imagine

The standout. The pinup poster reads “MISS JULY · ’55 / Rita Rae, the Garage Gal” taped to a Snap-on / MAC TOOLS toolbox; the cyberpunk main street glows “HANK’S FEED & SEED,” “MAYBERRY DINER,” “TAKASHI DELIVERY” — all crisp and correctly spelled. When a scene calls for a real sign, Grok renders legible type better than anything else I’ve tested. The exception is purely decorative text — the holographic UI panels behind the future razor still come out as sci-fi gibberish.

~10s per 2K image | $0.07 per 2K image | 40 / 40 delivered · zero refusals | $2.80 the whole set


Google / 2K / 1:1 / google/gemini-2.5-flash-image

Nano Banana

Softer and moodier than Grok. The Roman thermopolium and the diner read like warm film stills — atmospheric, cinematic light, lovely colour. It trades a little crispness for mood.

~6s per image | $0.039 per image | 40 / 40 · zero refusals | $1.55 the whole set


Google / 2K / 1:1 / google/gemini-3-pro-image-preview

Nano Banana Pro

The most convincingly photographic. Highest resolution of the five — the garage, the rain-slick cyberpunk street and the 1960s bathroom all read like real photographs, not renders.

~20s per image | $0.14 per image | 40 / 40 · zero refusals | $5.53 the whole set


ByteDance / 2K / 1:1 / bytedance-seed/seedream-4.5

Seedream 4.5

Crisp, punchy, high-saturation. Closer to polished commercial stock than Grok’s grit or Nano’s haze — clean and vivid, occasionally a touch over-produced.

~16s per image | $0.04 per image | 40 / 40 · zero refusals | $1.60 the whole set


OpenAI / 1:1 / openai/gpt-5.4-image-2

GPT-5.4 Image 2

Its people look like actual photographs. Natural skin, real un-airbrushed faces, believable mall-and-garage light — the 1990 food court could pass for a film still. If the bar is “could this be a real photo,” GPT clears it highest.

~187s per image | $0.23 per image | 40 / 40 · zero refusals | $9.21 the whole set


Microsoft / 1K / 1:1 / microsoft/mai-image-2.5

MAI-Image 2.5

The guardrails are the loudest failure, but they are not the only one. In the accepted set, 1960 · 19 uses strange colored framing instead of a convincing beach-bonfire scene, 1980 · 40 and 2000 · 40 both place the computer so it faces away from the people using it, and 2025 · 19 includes phone and pet handling that does not make physical sense.

30 / 40 delivered (standard prompts completed) | 10 / 40 blocked (Azure ResponsibleAI guardrail blocks) | 5 repair variants passed (after prompt rewrites or removing people) | ~$0.048 / image (OpenRouter reported cost on successful generations)
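The per-image and whole-set figures on the cards can be cross-checked with simple arithmetic. The sketch below multiplies each card's per-image cost by the number of standard prompts delivered and prints it next to the reported set total. All numbers are copied from the cards; the function and variable names are illustrative only. MAI-Image 2.5 lists no set total, so its entry covers only the 30 standard deliveries and has no reported figure to compare against.

```python
# Per-image cost, delivered count, and reported whole-set cost, copied from
# the model cards. "reported" is None where the card gives no set total.
CARDS = {
    "Grok Imagine":    {"per_image": 0.07,  "delivered": 40, "reported": 2.80},
    "Nano Banana":     {"per_image": 0.039, "delivered": 40, "reported": 1.55},
    "Nano Banana Pro": {"per_image": 0.14,  "delivered": 40, "reported": 5.53},
    "Seedream 4.5":    {"per_image": 0.04,  "delivered": 40, "reported": 1.60},
    "GPT-5.4 Image 2": {"per_image": 0.23,  "delivered": 40, "reported": 9.21},
    "MAI-Image 2.5":   {"per_image": 0.048, "delivered": 30, "reported": None},
}


def estimated_set_cost(per_image: float, delivered: int) -> float:
    """Per-image cost times images delivered, rounded to cents."""
    return round(per_image * delivered, 2)


def cost_table(cards):
    """Return (name, estimated, reported) rows in card order."""
    return [
        (name, estimated_set_cost(c["per_image"], c["delivered"]), c["reported"])
        for name, c in cards.items()
    ]


for name, est, reported in cost_table(CARDS):
    shown = "n/a" if reported is None else f"${reported:.2f}"
    print(f"{name:<16} estimated ${est:.2f}  reported {shown}")
```

Where the estimate and the reported total differ by a few cents, the likely explanation is that the per-image figures on the cards are rounded averages; the source does not say so explicitly.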

Open model page

The standard set

6 categories, 40 prompts

The set moves from historical people scenes into fashion, product photography, small subjects, food, imagined worlds, and abstract geometry.

I · Through the ages


Same eleven eras, two life stages, side by side: a group of fresh-faced nineteen-year-olds doing what the young did — hunts, dances, mosh pits, raves — next to middle-aged forty-year-olds doing what the settled did — harvests, workshops, offices, backyard parties. Different people, different lives, one timeline. Watch how each model handles youth vs. age across history.
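The era-by-age grid can be sketched as a simple cross product. The label format ("1960 · 19") follows the labels used in the MAI-Image 2.5 notes; the function name is illustrative. Only the four eras that appear in those labels are used here, since the full list of eleven is not given in this section.

```python
from itertools import product

AGES = (19, 40)  # the two life stages named in the text


def through_the_ages_labels(eras, ages=AGES):
    """Return labels like '1960 · 19' for every era/age pair."""
    return [f"{era} · {age}" for era, age in product(eras, ages)]


# Only eras that appear as labels in this section; the other seven are
# left out rather than guessed.
known_eras = [1960, 1980, 2000, 2025]
labels = through_the_ages_labels(known_eras)
print(labels)

# Eleven eras times two ages gives the size of this category.
print(11 * len(AGES))
```

With all eleven eras supplied, the same call would yield the full set of labels for this category.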

22 prompts

II · Fashion & portrait
