A practical, opinionated comparison across 12 AI vendors: the six big-model labs (Anthropic, OpenAI, Google, xAI, DeepSeek, Alibaba) plus six specialized contenders surfaced via attap.ai (Moonshot Kimi, Z.AI / GLM, ByteDance, Black Forest Labs, Kling + LTX, Z-Image / Pruna) — feature-by-feature, with a verdict on which to pick for which job. Topic-block panels remain six-way (the major vendors); specialized contenders are highlighted in the TL;DR and the "Specialized contenders" section below.
⚖ Pick up to 3 models to focus on
When you select one to three vendors, the topic-by-topic panels and specialized-contender cards filter to just your picks. The TL;DR table stays full as a global reference. The cap is three — uncheck one vendor to add another.
TL;DR — pick this for that
| If your goal is… | Reach for | Why |
|---|---|---|
| One-off writing, brainstorm, learning | Coin flip — Claude, ChatGPT, and Gemini are all excellent | Quality is essentially tied. Pick the subscription you already have. |
| Hands-on coding in your terminal | Claude Code + Sonnet 4.6 / Opus 4.7 | Most mature local-first agent stack. Skills + MCP feel cohesive. |
| Parallel cloud coding ("do this in 6 repos") | Codex + GPT-5.3-Codex / GPT-5.4 | Cloud agent opens PRs, runs tests, scales horizontally. |
| IDE inline completions / code chat | Gemini Code Assist | Free-tier IDE assistant; deeply integrated with Google Cloud + GitHub. |
| Whole-codebase analysis (very large input) | Gemini 3.1 Pro (2M context) | Largest native context window among the big-three labs; beats GPT-4.1's 1M and Claude's 200k. |
| Hardest reasoning / strategy / ambiguous synthesis | Opus 4.7 with xhigh effort | Best-in-class on open-ended judgment calls; long-horizon rigor. |
| Formal math, multi-step proofs, code debugging | o3 (or GPT-5.4 Pro at xhigh) | Reasoning-trained models with explicit chain-of-thought. |
| Hardest abstract reasoning (ARC-AGI / GPQA territory) | Gemini 3.1 Pro + Deep Think | 77.1% on ARC-AGI-2 (more than 2× Gemini 3 Pro); 94.3% GPQA Diamond. |
| Browser/desktop automation | GPT-5.4 (native Computer Use) | 75.0% on OSWorld-Verified, beats human baseline 72.4%. Currently the SOTA. |
| Voice agent / phone bot | OpenAI Realtime ≈ Gemini Live | OpenAI more mature in production. Gemini 3.1 Flash Live + 3.1 Flash TTS now competitive — and natural-language voice control is unique. |
| Image generation | gpt-image-2 ≈ Imagen 4 Ultra | Both strong. Imagen 4 has stronger text rendering; gpt-image-2 has native reasoning. |
| Video generation (with audio) | Veo 3 | Generates video with synchronized audio — dialogue, SFX, ambient sound — natively. OpenAI's Sora 2 is shutting down. |
| Native video understanding (read a video) | Gemini 3.1 Pro | Gemini was multimodal-native from day one. Drop a video into the prompt; it just reads. |
| Reading dense / high-res images | GPT-5.4 | Up to 10.24 MP in "original" detail; superior raw-pixel ingestion. |
| Visual reasoning quality at standard sizes | Opus 4.7 | 98.5% on Anthropic's visual-acuity benchmark; nuanced visual judgment. |
| Music generation | Lyria 2 | The only first-party music model in this comparison. |
| Cost-sensitive frontier text work | GPT-5.4 ($2.50/$15) ≈ Gemini 3.1 Pro ($2/$12) | Both substantially cheaper than Opus 4.7 ($5/$25). Pick by free tier and ecosystem fit. |
| Cost-sensitive volume work | Haiku 4.5 ≈ GPT-5.4 Nano ≈ Gemini 3.1 Flash-Lite | All three sit at the cheap end. Run a head-to-head on your own data. |
| Free-tier developer playground / prototyping | AI Studio + Flash / Flash-Lite | Most generous free tier still available on the API. Best place to start with no payment method. |
| Open-weight models (run yourself, fine-tune) | Gemma 4 / Gemma 3 | The only big-three lab with first-party open weights; Anthropic and OpenAI ship closed only. For open frontier weights, see the DeepSeek and Qwen rows. |
| Engineering-heavy team workspace | Claude CoWork | MCP parity with Claude Code; skills migrate cleanly between Chat and IDE. |
| Mixed-org enterprise (sales, ops, product, eng) | ChatGPT Business / Enterprise | Broader connector ecosystem (Salesforce, Outlook, Box, Zendesk…), most mature admin tooling. |
| Companies already on Google Workspace | Gemini in Workspace + Vertex AI | AI lives where you already work — Docs, Sheets, Gmail, Meet. No tab-switching tax. |
| One vendor for text + image + audio + video | Google | Widest first-party media surface: Gemini + Imagen + Veo + Lyria + Live + TTS — all under one API. |
| One vendor for the cleanest agent + tooling story | Anthropic | Skills + MCP + Claude Code feel like one designed system. |
| Real-time X / social-discourse intelligence | Grok 4.3 + Live Search | Only provider with first-party access to X data. Nobody else can see what's trending on X right now. |
| Cheapest frontier-class text per token | DeepSeek V4 Pro discounted ($0.435/$0.87) → Grok 4.3 at list ($1.25/$2.50) | V4 Pro at the 75%-off rate (through 2026-05-31) undercuts everything; at list ($1.74/$3.48) Grok 4.3 reclaims it. |
| Open-weights frontier model | DeepSeek V4 Pro | MIT-licensed, 1.6T/49B MoE, agentic-coding open SOTA. Only frontier-class model in the world that's downloadable. |
| Agentic-coding benchmarks (open-source) | DeepSeek V4 Pro | Per DeepSeek's release notes: "Open-source SOTA in Agentic Coding benchmarks." Closed-source SOTA still tracks Claude / GPT. |
| Self-host / on-prem / data residency | DeepSeek V4 Pro / V4 Flash | Both open-weights, MIT license. Anthropic and OpenAI ship closed only; Gemma 4 not at full Gemini 3.1 Pro parity. |
| Largest output cap (long-form generation) | DeepSeek V4 (384K out) | 384,000 output tokens per request — largest in the field. Useful for codebase generation, long-form research reports. |
| Best prefix-cache economics | DeepSeek V4 Pro (~120× cache discount) | Cache-hit input is ~1/100 of cache-miss. If you reuse a long stable prompt, V4 Pro is hard to beat on $/M. |
| Agentic app-dev workflows (visual browsing, multi-step plan) | Qwen 3.6 Max | Explicitly tuned for autonomous agent work — app dev and visual browsing are the named flagship use cases. 1M+ context. |
| Image-to-video (top-fidelity) | HappyHorse 1.0 | Top-ranked image-to-video model — high-fidelity, realistic dynamic rendering. The animate-an-existing-image tier. |
| Broadest open-weights family across modalities | Alibaba (Qwen + Qwen Image + Qwen Omni) | Text, multimodal, image gen, audio — all open. DeepSeek wins on a single model; Alibaba wins on the family. |
| Cost-efficient hosted text (China-region access) | Qwen 3.6 Plus | "60% cheaper, 8× faster" generation positioning; explicit speed-and-cost tier for high-volume work. |
| One vendor for text + image + video + audio | Google or Alibaba | Google: Gemini + Imagen + Veo + Lyria + Live + TTS. Alibaba: Qwen + Qwen Image + Wan + HappyHorse + Qwen Omni. The two true full-stack multimodal vendors. |
| Massively-parallel agent swarms (300 sub-agents, 4000 steps) | Moonshot Kimi K2.6 | Open-weight 1T-MoE explicitly tuned for long-horizon agent orchestration. No big-vendor product currently markets at this fan-out scale. |
| Open-source frontier from a publicly-traded AI lab | Z.AI GLM-5.1 | 745B-MoE / 44B-active, 200K context, MIT, DeepSeek Sparse Attention. Choose when license clarity + corporate accountability matter. |
| Unified video + audio in one pass with rich reference inputs | ByteDance Seedance 2.0 | Up to 9 image / 3 video / 3 audio refs per prompt; 4-15s multi-shot output with dual-channel audio. Most flexible reference inputs in video gen. |
| Photorealism + multi-image references for image gen | Black Forest FLUX 2 Pro | 32B Rectified Flow Transformer + Mistral-3 24B VLM, up to 10 reference images per call, 4MP output. Lineage from Stable Diffusion team. |
| Cinematic motion + character consistency across shots | Kuaishou Kling 3 | Strongest multi-shot character consistency in the field; physics-grounded motion. From Kuaishou's short-video distribution priorities. |
| Open-source 4K video at $0.04/sec, fits 24GB VRAM | Lightricks LTX 2.3 | Only open-source 4K video model in this comparison; FP8 quantized runs on RTX 4090/5090. |
| Sub-second image gen on consumer hardware | Z-Image Turbo (Pruna / Tongyi-MAI) | 6B params, 8 inference steps, 16GB VRAM. Strong on Chinese + English typography. Ideal for high-volume / interactive UX. |
| You want a direct, opinionated AI (less hedging) | Grok | Default personality is more direct and opinionated than the others. For "just tell me what you'd do" tasks. |
| Cheap video gen with synchronized audio | Veo 3 ≈ Grok Imagine Video | Both generate video with audio natively. Veo 3 more polished; Grok Imagine cheaper ($0.05/sec for 720p). |
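If the table above becomes a default policy in your stack, it can be encoded as a plain task-to-model routing map. A minimal sketch; every task key and model ID below is an illustrative placeholder, not a confirmed API identifier:

```python
# Task -> model routing map mirroring the TL;DR table above.
# Model IDs are illustrative placeholders; check each vendor's
# model list for the exact identifier before use.
ROUTES = {
    "terminal_coding":   "claude-opus-4.7",
    "parallel_cloud_pr": "gpt-5.3-codex",
    "whole_codebase":    "gemini-3.1-pro",   # 2M-token context
    "hard_reasoning":    "claude-opus-4.7",  # run at xhigh effort
    "computer_use":      "gpt-5.4",
    "video_generation":  "veo-3",
    "cheap_volume":      "deepseek-v4-flash",
}

def pick_model(task: str, default: str = "gpt-5.4") -> str:
    """Return the routed model ID for a task, falling back to a default."""
    return ROUTES.get(task, default)
```

A production router would also weigh price, context size, and data-residency constraints, not the task label alone.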
Specialized contenders (via attap.ai & partner platforms)
Six vendors that don't fit the "one shop, every modality" big-vendor frame, but win specific categories or accept different tradeoffs (open-source, on-prem, cost-floor, niche modality):
Moonshot AI — Kimi K2.6 (open-weight agent flagship)
1T-parameter MoE, 32B active, 262K context, Modified MIT. Released 2026-04-20. Defining feature: Agent Swarm coordinates up to 300 sub-agents across 4,000 steps per run — no big-vendor product currently advertises this fan-out scale. $0.60/$2.50 per 1M on Moonshot's own API.
Z.AI / Zhipu — GLM-5.1 (open-source frontier from a public AI lab)
745B / 44B-active MoE, 200K context, MIT. First open-source flagship from a publicly-traded Chinese AI company. DeepSeek Sparse Attention integrated for the first time. "From vibe coding to agentic engineering" is the explicit positioning. Cerebras-hosted variant runs faster on wafer-scale silicon.
ByteDance — Seedream 4.5 (image) + Seedance 2.0 (video+audio)
Seedream 4.5 generates and edits up to 4K with strong multi-image consistency and typography. Seedance 2.0 (Feb 2026) is uniquely multimodal-on-input — accepts up to 9 image / 3 video / 3 audio references in one prompt and outputs 4-15s multi-shot video with dual-channel audio. Distribution via Higgsfield, fal.ai, Runware, attap.ai (Seedance at 300 credits).
Black Forest Labs — FLUX 2 Pro (image gen frontier)
32B Rectified Flow Transformer + Mistral-3 24B VLM, 4MP output, up to 10 reference images per call. Founded by ex-Stable Diffusion team; respected for prompt fidelity and natural-language editing. ~60% first-attempt accuracy on complex typography. $0.014/image on the BFL official API.
Kuaishou Kling 3 + Lightricks LTX 2.3 (specialized video)
Kling 3 wins on multi-shot character consistency and cinematic motion physics — strongest character continuity in this set. LTX 2.3 is the only open-source 4K video model in the comparison: 22B DiT, native audio, ~$0.04/sec hosted, FP8 quantized fits a 24GB consumer GPU.
Z-Image Turbo (Pruna / Tongyi-MAI) — fast image gen
6B-parameter S3-DiT, 8 inference steps, sub-second wall clock on a 16GB GPU. Originated at Alibaba's Tongyi-MAI; Pruna AI's optimization engine compresses and accelerates it for production. Distinct strength: strong text rendering in both English and Chinese. Pruna's broader business is the optimization platform itself — the same engine speeds up other open-source diffusion models.
Topic-by-topic comparison
The grid below has panels for the six major vendors (Claude / OpenAI / Google / xAI / DeepSeek / Alibaba). Specialized contenders are called out in verdicts where they materially change the picture. Use the picker above to focus on up to three of the major six.
1 · Frontier flagship model
Claude Opus 4.7
- Released 2026-04-16
- $5 / $25 per M tokens
- 200k context
- New xhigh effort level
- Major vision lift (98.5% visual-acuity)
- Task budgets (public beta), file-system memory
- Tokenizer change: 1.0–1.35× more tokens than 4.6
OpenAI GPT-5.5 / 5.5 Pro
- Released 2026-04-23
- API pricing TBA at time of writing
- In ChatGPT (Plus/Pro/Business/Enterprise) + Codex now
- Built around long-running goal completion
- 5.5 Pro for the very hardest tasks
Google Gemini 3.1 Pro
- Released 2026-02-19
- $2 / $12 per M tokens (≤200k input)
- $4 / $18 above 200k input — 2M-token context
- 94.3% on GPQA Diamond (highest reported at release)
- 77.1% on ARC-AGI-2 (vs 31.1% for Gemini 3 Pro)
- Deep Think mode for hardest problems
xAI Grok 4.3
- Released 2026-04-30
- $1.25 / $2.50 per M tokens — cheapest hosted frontier (list)
- 1M-token context, native video input
- Always-on reasoning (no effort dial)
- SuperGrok Heavy = multi-agent reasoning
- Live Search for real-time X grounding
DeepSeek V4 Pro
- Released 2026-04-24
- $0.435 / $0.87 (75% off thru 2026-05-31), $1.74 / $3.48 list
- 1M context, 384K max output (largest output cap)
- 1.6T total / 49B active MoE — open weights, MIT
- Open-source SOTA on agentic-coding benchmarks
- OpenAI- AND Anthropic-compatible API
Alibaba Qwen 3.6 Max
- Released 2026-04/05
- 1M+ token context
- Tuned for agentic workflows — app dev, visual browsing
- High-level coding + visual reasoning
- Proprietary (closed); pair with open Qwen 3.6 / 3.5 for self-host
- OpenAI-compatible Model Studio API
2 · Hardest reasoning & strategy
Opus 4.7 with xhigh
- Best on ambiguous, open-ended judgment
- Long-horizon rigor across multi-hour agentic work
- Internal file-system memory persists context
- Higher latency & output token usage at xhigh
o3 (and GPT-5.4 Pro)
- o3: reasoning-trained, explicit chain-of-thought
- $2 / $8 per M tokens — much cheaper than competing flagships
- GPT-5.4 Pro: $30 / $180 — reserve for the hardest jobs
- Five-level effort dial in 5.4 family
Gemini 3.1 Pro + Deep Think
- Reasoning lift via test-time compute (Deep Think mode)
- 77.1% on ARC-AGI-2 — best score reported on novel pattern recognition
- 94.3% on GPQA Diamond — highest at release
- Cheaper than Opus xhigh; ties or beats it on benchmarks
Grok 4.3 + SuperGrok Heavy
- Always-on reasoning — built in, can't be disabled
- SuperGrok Heavy: multi-agent reasoning ($300/mo plan)
- Cheap reasoning-class API ($1.25/$2.50)
- Less benchmarked publicly than the other three
- Strong agentic tool calling
V4 Pro (thinking enabled)
- "Beats all current open models in Math/STEM/Coding" per DeepSeek
- Toggle `thinking.type=enabled` per request
- World knowledge "trails only Gemini 3.1 Pro"
- Cheapest reasoning-class API at the discount ($0.435/$0.87)
- Public benchmark transparency thinner than US labs
Qwen 3.6 Max (thinking)
- Reasoning + visual reasoning bundled at the flagship tier
- Long-context reasoning across 1M+ tokens
- Agentic frame supports plan-execute-verify on hard problems
- Public benchmark publication thinner than US labs
- Pair with open Qwen 3.5 397B for self-host research
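DeepSeek's per-request thinking toggle rides on an otherwise standard chat body. A minimal sketch of the payload shape, assuming the `thinking.type` field quoted in the panel above; the model ID is a placeholder, and the exact field names should be verified against the current API reference:

```python
import json

def build_chat_request(prompt: str, thinking: bool = True) -> str:
    """Build an OpenAI-style chat body with a per-request thinking
    toggle, following the thinking.type field quoted above."""
    body = {
        "model": "deepseek-v4-pro",  # placeholder model ID
        "messages": [{"role": "user", "content": prompt}],
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }
    return json.dumps(body)

# Reasoning on for the hard pass, off for a cheap drafting pass
payload = json.loads(build_chat_request("Prove the claim step by step."))
```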
3 · Coding agents & pair programming
Claude Code (terminal & IDE)
- Local-first agent — reads, edits, runs commands
- Skills, hooks, sub-agents, MCP servers built-in
- CLAUDE.md project memory
- Plan mode, worktrees, `/ultrareview`
- Fast mode (Opus 4.6) for low-latency Opus depth
Codex (cloud + CLI)
- Cloud agent: connect a GitHub repo, ask, it opens PRs
- Local CLI option also available
- GPT-5.3-Codex: ~25% faster, coding-specialised
- GPT-5.4 folds 5.3-Codex stack into mainline
- ~80% on SWE-bench Verified (5.4)
Gemini Code Assist + Jules
- Code Assist: inline IDE completions + chat (VS Code, JetBrains)
- Jules: autonomous coding agent (cloud-based, GitHub-connected)
- Free individual tier; Standard / Enterprise for teams
- Tight Cloud Console integration on GCP projects
- Less mature agent ecosystem than Claude Code or Codex
xAI — no dedicated coding agent
- Grok 4.3 codes well in chat / API
- No first-party coding-agent product (no Claude-Code / Codex / Code-Assist equivalent)
- OpenAI-compatible API works with third-party tools (Cursor, Cline, etc.)
- Cheap token rate for raw code generation
DeepSeek — no dedicated coding agent
- V4 Pro = open-source SOTA on agentic-coding benchmarks (the model, not a product)
- No first-party coding-agent product
- OpenAI- and Anthropic-compatible — drops into Cursor, Cline, Claude Code (via gateway)
- Cheapest token rate for raw code generation, especially with cache hits
- Open weights — can fine-tune on internal codebases
Alibaba — Qwen 3.6 Max as agent (no first-party CLI)
- Qwen 3.6 Max explicitly tuned for autonomous app-dev workflows
- 1M+ context — fit a whole codebase in one prompt
- Visual browsing capability for UI work / screenshot reasoning
- No first-party Claude-Code / Codex-style CLI product
- Drops into Cursor / Cline via OpenAI-compatible API
4 · Long-context reading
Claude
- 200k tokens across 4.x lineup
- Excellent recall & reasoning within that window
- For larger inputs: chunk + summarize or use file-system memory
OpenAI
- GPT-4.1: 1,000,000-token context
- GPT-5.4: 272k standard, expandable to 1,050,000
- Above 272k input: $5/MTok (input price doubles)
Google
- Gemini 3.1 Pro: 2,000,000-token context
- Gemini was the first to ship a 1M-token model (1.5 Pro, Feb 2024)
- Above 200k input: $4/MTok (input price doubles)
- Strong recall across the full window
xAI
- Grok 4.20: 2,000,000-token context
- Grok 4.1 Fast: 2,000,000-token context (at $0.20/$0.50!)
- Grok 4.3: 1M context (depth-per-token tradeoff)
- Cheapest 2M-context option in the market via 4.1 Fast
DeepSeek V4
- V4 Pro & V4 Flash: 1,000,000-token input context
- 384,000-token max output — largest output cap in the field
- Cache-hit input ~1/100 of cache-miss — best long-doc economics if you re-query
- Smaller window than Gemini/Grok 2M but the largest output
Alibaba Qwen 3.6 Max
- 1M+ token context — explicitly positioned for codebase / multi-doc work
- Long-context paired with agentic frame — process and act on large inputs
- Visual reasoning over the same long context (screenshots + text in one window)
- Smaller than Gemini / Grok 2M, but on par with GPT-5.4 / V4 Pro / Grok 4.3
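The "chunk + summarize" fallback mentioned in the Claude panel reduces to a token-budget splitter. A sketch using a rough four-characters-per-token estimate rather than a real tokenizer:

```python
def chunk_by_budget(text: str, max_tokens: int = 180_000,
                    chars_per_token: float = 4.0) -> list[str]:
    """Split text into pieces that fit a model's context budget.
    Uses a crude chars-per-token estimate (swap in a real tokenizer
    for production) and leaves headroom below a hard 200k limit."""
    max_chars = int(max_tokens * chars_per_token)
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# Each chunk is summarized separately, then the summaries are merged
# in a final pass that fits comfortably in one window.
```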
5 · Browser & desktop automation (computer use)
Claude Computer Use
- Available since Sonnet 3.5 v2 (2024-10-22)
- Battle-tested in production agents
- More mature ecosystem & tooling
- Anthropic-published reference implementation
GPT-5.4 native Computer Use
- Released 2026-03-05 — newest entrant
- 75.0% on OSWorld-Verified (vs 47.3% for GPT-5.2)
- Beats human baseline of 72.4%
- 95% first-try / 100% within 3 tries on real portals
- ~3× faster, ~70% fewer tokens vs prior CUA models
Project Mariner / Gemini agentic
- Project Mariner: experimental browser agent (research preview)
- Gemini 3.1 Pro: stronger agentic performance vs 3 Pro
- No SOTA OSWorld score reported publicly
- Most production browser-automation today builds on Claude or OpenAI
- Watch I/O 2026 (May 19–20) for likely advances
xAI — limited public computer-use product
- No first-party Computer Use API
- Strong agentic tool calling in Grok 4.20 / 4.3
- Live Search covers "browse and read" but not "click and type"
- Currently not a player in the desktop-automation category
DeepSeek — not a player
- No Computer Use / desktop-automation product
- No public CUA-style API; vision in V4 lineup is limited
- Open weights mean third parties could build one — none commercially shipping yet
- Skip DeepSeek for this category today
Alibaba — visual browsing, no CUA API
- No first-party Computer Use API for click/type/navigate
- Qwen 3.6 Max has "visual browsing" capability — read and reason over UIs visually
- Not the same as a click-and-type agent product, but adjacent
- Pair with a third-party CUA harness for full automation
6 · Vision & image understanding
Claude Opus 4.7
- Images up to ~3.75 MP (2,576 px long edge)
- 98.5% on Anthropic's visual-acuity benchmark (vs 54.5% for 4.6)
- Strong nuanced visual reasoning
GPT-5.4
- Up to 10.24 MP at "original" detail (or 6,000px max edge)
- Up to 2.56 MP at "high" detail
- New detail-level controls per request
Gemini 3.1 Pro
- Multimodal-native since Gemini 1.0 (Dec 2023)
- 81% on MMMU-Pro, 87.6% on Video-MMMU (Gemini 3 Pro baseline)
- Native video understanding — read whole videos, not just frames
- Strong at OCR, charts, dashboards, diagrams
Grok 4.3
- Native video input — new in 4.3 (2026-04-30)
- Image input across 4.x lineup
- Less benchmarked publicly than the other three
- Cheapest video-capable frontier model on tokens
DeepSeek V4 — vision limited
- Image input supported in V4 lineup
- No native video understanding at frontier-quality
- Vision benchmarks publicly thinner than peers
- Not the pick for vision-heavy work
Alibaba Qwen 3.6 Max + Omni
- Qwen 3.6 Max: visual reasoning as a flagship capability — screenshots, diagrams, dashboards, document layouts
- Qwen 3.5 Omni (open): native text + audio + image + video in one model
- Open Qwen 3.6 35B-A3B / 27B variants: image-text-to-text capable
- Distinctive: visual browsing tasks — reasoning over a sequence of UI screenshots
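GPT-5.4's detail caps quoted above reduce to a megapixel check before upload. A small helper assuming exactly the thresholds from the panel; the returned labels follow the panel's wording, not a confirmed API enum:

```python
def detail_for(width: int, height: int) -> str:
    """Pick an image-detail level from pixel dimensions, using the
    megapixel caps quoted above (2.56 MP 'high', 10.24 MP 'original')."""
    mp = (width * height) / 1_000_000
    if mp <= 2.56:
        return "high"
    if mp <= 10.24:
        return "original"
    return "downscale_first"

# 3200 x 3200 = 10.24 MP, which just fits the "original" cap
```

Anything over the top cap needs client-side downscaling before upload.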
7 · Voice & realtime
Claude voice mode (in chat)
- Voice in claude.ai mobile/desktop chat
- No public realtime / speech-to-speech API
- For voice agent products: build via STT → text → TTS
OpenAI Realtime API
- GA since 2025-08-28
- Native speech-to-speech (no separate STT/TTS pipeline)
- `gpt-4o-transcribe` + `gpt-4o-tts` as cheaper one-shot alternatives
- Whisper available open-source for self-host
Gemini Live API + 3.1 Flash TTS
- Gemini 3.1 Flash Live: realtime voice + video + screen-share
- Gemini 3.1 Flash TTS (2026-04-15): natural-language voice control — no SSML
- Single-speaker and multi-speaker output
- Live mode in Gemini app reads camera/screen in real time
Grok voice mode + Companions
- Voice mode in Grok app (consumer-facing)
- Animated AI Companions with distinct voices
- No public realtime API for voice agents
- For product builds: not currently an option
DeepSeek — not a player
- No realtime voice API, no TTS, no STT as first-party products
- chat.deepseek.com has no native voice mode
- Pair with OpenAI Whisper / Realtime if voice is required
- Skip DeepSeek for voice-agent builds
Alibaba Qwen 3.5 Omni
- Native audio + multimodal in one open-weights model
- Plus dedicated ASR / TTS demos in the Qwen family
- Less productionized than OpenAI Realtime — heavier integration lift
- Edge case: only open model with first-party audio + vision in one
- Self-host path makes voice agents possible without per-token cost
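For providers without a speech-to-speech API, the panels suggest building voice agents as STT → text → TTS. That is just three composed stages; a structural sketch with stub callables standing in for real provider clients:

```python
from typing import Callable

def make_voice_agent(stt: Callable[[bytes], str],
                     llm: Callable[[str], str],
                     tts: Callable[[str], bytes]) -> Callable[[bytes], bytes]:
    """Compose speech-to-text, a text model, and text-to-speech into
    one audio-in / audio-out turn. Each stage is any callable, e.g.
    Whisper for stt and any chat model for llm."""
    def turn(audio_in: bytes) -> bytes:
        transcript = stt(audio_in)
        reply = llm(transcript)
        return tts(reply)
    return turn

# Stubs illustrating the contract; real builds swap in SDK calls.
agent = make_voice_agent(
    stt=lambda audio: "hello",
    llm=lambda text: text.upper(),
    tts=lambda text: text.encode(),
)
```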
8 · Image generation
Claude
- No native raster image generation
- Excellent at SVG & Mermaid in Artifacts
- Can produce HTML/CSS mockups in chat
OpenAI gpt-image-2
- Released 2026-04-21
- First OpenAI image model with native reasoning
- Strong text rendering, layout control
- DALL-E 2 & 3 retiring 2026-05-12
Imagen 4 (Ultra / Standard / Fast)
- GA on 2025-05-20 at I/O 2025
- Three tiers — Ultra for highest fidelity, Fast for cheap/fast
- Substantially improved text rendering over Imagen 3
- Available in Gemini API, AI Studio, Vertex AI
Grok Imagine — Image
- Multiple styles (anime, cyberpunk, futuristic, kawaii, minimal art…)
- Fast generation; image-edit instructions work well
- Weaker on in-image text rendering than Imagen 4 / gpt-image-2
- Available via Grok app, X.com, and Imagine API
DeepSeek — not a player
- No first-party image generation
- V4 lineup is text/code-focused
- Pair with Imagen 4 / gpt-image-2 / Grok Imagine if image gen is required
- Skip DeepSeek for image-gen builds
Alibaba Qwen Image
- Qwen Image 2512 — text-to-image, available in Model Studio and on Hugging Face (open weights)
- Strong on Chinese-language prompts
- Open-weights image gen — fine-tunable, self-hostable
- Less polished than Imagen 4 / gpt-image-2 on benchmarks
- Pair with HappyHorse 1.0 for image-to-video output
9 · Pricing — frontier & volume
Claude pricing
- Opus 4.7: $5 / $25
- Sonnet 4.6: mid-tier (verify on pricing page)
- Haiku 4.5: cheap, fast volume
- Tokenizer change in 4.7: 1.0–1.35× more tokens than 4.6
- Prompt caching available (very high savings on repeat context)
OpenAI pricing
- GPT-5.4: $2.50 / $15 (above 272k: $5 input)
- GPT-5.4 Pro: $30 / $180
- GPT-5: $1.25 / $10
- GPT-5 Mini: $0.25 / $2.00
- GPT-4.1 Nano: $0.10 / $0.40
- Batch API: 50% off, 24h turnaround
Google pricing
- Gemini 3.1 Pro: $2 / $12 (≤200k); $4 / $18 above
- Gemini 3 Flash: $0.50 / $3.00
- Gemini 3.1 Flash-Lite: $0.25 / $1.50
- Free tier retained on Flash & Flash-Lite (Pro paid-only since 2026-04-01)
- Context caching available; very long inputs price-tier above 200k
xAI pricing
- Grok 4.3: $1.25 / $2.50 — cheapest frontier on list
- Grok 4.20: $2.00 / $6.00 (2M context)
- Grok 4.1 Fast: $0.20 / $0.50 (2M context!)
- Aggressive 40% input price cut at 4.3 launch
- No batch-API discount equivalent
DeepSeek pricing
- V4 Pro discounted: $0.435 / $0.87 (75% off thru 2026-05-31)
- V4 Pro list: $1.74 / $3.48
- V4 Flash: $0.14 / $0.28 (both modes)
- Cache hit input ~1/100 of cache miss — best in industry
- Self-host path eliminates per-token cost entirely (MIT weights)
Alibaba pricing
- Qwen 3.5 series shipped at "60% cheaper, 8× faster" than the prior generation
- Qwen 3.6 Plus positioned as the speed-and-cost-efficient tier
- Region-dependent — China / Singapore / international rates differ
- Open-weights variants (Qwen 3.6 / 3.5) eliminate per-token cost when self-hosted
- Verify per-region rates in Model Studio console
10 · Team workspaces
Claude CoWork
- Skills + MCP-native connectors
- Role-based plugins (engineering, sales, design…)
- Background agents (scheduled remote agents)
- Engineering-feel; tight integration with Claude Code
ChatGPT Business / Enterprise / Edu
- Custom GPTs, sharable across the workspace
- Broader connector ecosystem (Salesforce, Box, Zendesk, Outlook…)
- SSO/SCIM, audit logs, group permissions
- More mature admin tooling overall
Gemini in Workspace + Vertex AI
- AI inside Docs / Sheets / Gmail / Meet / Slides / Drive — no tab switch
- Gems (custom Geminis) shareable across workspace
- Vertex AI Agent Builder for production agents
- Grounding to BigQuery / Cloud Storage / Search
- NotebookLM for source-grounded research
xAI — no enterprise workspace product
- No Workspace / CoWork / Business equivalent
- Subscriptions are per-user (X Premium / SuperGrok / SuperGrok Heavy)
- Enterprise API access via direct contract
- Not currently a player in the team-workspace category
DeepSeek — not a player
- No team-workspace / business product
- chat.deepseek.com is consumer-only with no admin tooling
- API only at platform.deepseek.com — bring your own gateway
- Not currently a player in the team-workspace category
Alibaba — not a global workspace player
- No CoWork / Workspace / Business product targeting Western markets
- DingTalk (within China) integrates Qwen for enterprise use cases, but is not a global product
- Model Studio targets developers, not end-user collaboration
- Not a workspace player for org-wide global deployment today
11 · Developer experience & SDK
Anthropic API + Agent SDK
- Clean, focused SDK
- MCP is first-class — server registry, hooks, skills
- Tool use is integrated and ergonomic
- Prompt caching with very large discounts
- Smaller surface area = easier to learn whole platform
OpenAI Responses API + Codex + Realtime + …
- Largest product surface in the industry
- Responses API + function calling + structured outputs
- Realtime, Whisper, image, embeddings, batch
- More fragmented; multiple SDKs & surfaces
- Wider community / ecosystem (langchain, etc.)
Google Gemini API + AI Studio + Vertex
- AI Studio: best-in-class free playground for prototyping
- Gemini API: clean Python/Node SDK, generous free tier on Flash
- Native multimodal (text + image + video + audio) in one API
- Vertex AI for enterprise: grounding, tuning, agent builder
- Hosts third-party models too (Claude, Llama) on Vertex
xAI API (docs.x.ai)
- OpenAI-compatible — change one base URL, drop in Grok
- Live Search for real-time X / web grounding
- Imagine API for image + video gen
- Smaller surface area; no batch API, no enterprise IDE assistant
- Mostly direct API — limited cloud-marketplace presence
DeepSeek API (api-docs.deepseek.com)
- Both OpenAI- AND Anthropic-compatible endpoints — uniquely flexible
- Function calls, JSON mode, prefix caching, thinking-mode toggle
- Open weights on Hugging Face — self-host as a deployment option
- Available via OpenRouter, DeepInfra, Together, etc.
- Smaller first-party surface; no IDE assistant, no batch API
Alibaba Model Studio + DashScope
- OpenAI-compatible chat completions endpoint
- Native DashScope SDK with first-party features
- Multiple regions — China, Singapore, international
- Multimodal (Wan, HappyHorse, Image, Omni) as first-class API endpoints
- Open weights on Hugging Face — broadest open-weights family
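Several panels above describe their APIs as OpenAI-compatible, which in practice means the request shape is identical and only the host, key, and model ID change. A stdlib sketch that builds, but does not send, such a request; the `/chat/completions` path and the model ID are conventional placeholders to verify per vendor:

```python
import json
import urllib.request

def chat_request(base_url: str, api_key: str,
                 model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-shaped chat-completions request for any
    compatible vendor; only base_url, key, and model ID differ."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Same call shape across vendors: swap base URL and model ID only.
req = chat_request("https://api.deepseek.com/v1", "sk-...",
                   "deepseek-v4-pro", "hello")
```

Sending it is one `urllib.request.urlopen(req)` away once a real key is in place.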
12 · Safety, privacy & data handling
Anthropic
- Constitutional AI heritage; safety is a core brand pillar
- Enterprise tier: data not used for training
- Available on AWS Bedrock, GCP Vertex, MS Foundry
- Detailed model cards & deployment safety docs
OpenAI
- Business/Enterprise/Edu: data not used for training
- SSO, audit logs, retention controls
- Available on Azure OpenAI Service
- Public Deployment Safety Hub for newer models
Google
- Vertex AI: enterprise data residency, IAM, audit logs
- Workspace data not used for model training
- SynthID watermarking on Imagen / Veo outputs
- SAIF (Secure AI Framework) for enterprise deployments
- Standard Google Cloud governance tooling
xAI
- Standard enterprise data terms via direct contract
- Less detailed model cards than Anthropic / Google
- Brand has had more public controversy on safety positioning
- No SOC 2 / GDPR enterprise tooling as widely documented
- Cloud-marketplace availability narrower than the others
DeepSeek
- China-based provenance — procurement / data-flow review needed for some regulated buyers
- Hosted API runs in PRC infrastructure — review terms before sending sensitive data
- MIT-licensed open weights are the privacy answer: self-host on your own GPUs
- No first-party SOC 2 / HIPAA / FedRAMP documentation as of mid-2026
- Available via Western providers (OpenRouter, DeepInfra) for those preferring non-PRC hosting
Alibaba
- China-based provenance; Alibaba Cloud regional hosting (China / Singapore / international) gives more options than DeepSeek's API-only setup
- Singapore-international region helps Western buyers seeking non-PRC data flow
- Open-weights Qwen family on Hugging Face for self-host privacy
- Standard Alibaba Cloud enterprise terms + audit / IAM / KMS tooling on the cloud side
- Less Western enterprise certification adoption than US peers
13 · Ecosystem & availability
Claude
- Anthropic API, Amazon Bedrock, GCP Vertex AI, MS Foundry
- Tight integration with the Anthropic-built tooling stack
- MCP is an open protocol with a growing third-party ecosystem
OpenAI
- OpenAI API, Azure OpenAI Service
- Largest third-party tooling ecosystem (langchain, llamaindex, etc.)
- Most existing AI app code targets the OpenAI API shape
Google
- Gemini API, AI Studio, Vertex AI on GCP
- Open-weight Gemma on Hugging Face, Ollama, Kaggle
- Distribution across Android, Chrome, Workspace, Search
- Vertex Model Garden hosts third-party models (Claude, Llama)
xAI
- Direct API at api.x.ai (OpenAI-compatible)
- Distribution via X — uniquely embedded in the social network
- Grok 1 was open-weighted (March 2024); newer Groks are closed
- Narrower cloud-marketplace presence than the big three
- Smaller third-party tooling ecosystem
DeepSeek
- Direct API at api.deepseek.com (OpenAI- and Anthropic-compatible)
- Open weights on Hugging Face — MIT license, full V4 lineup downloadable
- Hosted via OpenRouter, DeepInfra, Together, Fireworks, etc.
- Not on AWS Bedrock / Vertex / Azure as a first-party offering
- Strongest cost-aware-router presence — "the cheap-frontier choice" by default in OpenRouter
Alibaba
- Alibaba Cloud Model Studio — multi-region (CN / SG / intl)
- Broadest open-weights family on Hugging Face — text + multimodal + image gen + audio
- Hosted via OpenRouter, DeepInfra, Together for Qwen text
- Distribution within Alibaba ecosystem — DingTalk, Taobao, Alipay, Alibaba Cloud customer base
- Not on AWS Bedrock / Vertex / Azure as first-party
14 · Video generation
Claude
- No native video generation
- Can describe storyboards, write video scripts
OpenAI Sora 2
- Released 2025-09-30
- App shut down 2026-04-26
- API discontinuing 2026-09-24
- Effectively exiting the category
Google Veo 3 + Veo 3 Fast + Veo 3.1 Lite
- GA on Vertex since 2025-05
- Generates synchronized audio — dialogue, SFX, ambient sound
- Fast tier and Lite preview for high-volume / iteration
- Veo 4 likely at I/O 2026 (May 19–20)
Grok Imagine — Video
- API launched 2026-01-28; v1.0 (10-sec, 720p) on 2026-02-03
- Native synchronized audio — same headline feature as Veo 3
- $0.05/sec for 720p w/ audio — cheaper than Veo 3
- Extend from Frame chains clips into longer sequences
- Less polished than Veo 3 on cinematic shots
DeepSeek — not a player
- No first-party video generation
- V4 lineup is text/code-focused
- Pair with Veo 3 / Grok Imagine / Wan / HappyHorse
- Skip DeepSeek for video-gen builds
Alibaba Wan 2.7 + HappyHorse 1.0
- Wan 2.7 — text-to-video; new in Model Studio (April 2026)
- HappyHorse 1.0 — top-ranked image-to-video; high-fidelity realistic dynamic rendering
- Image-to-video as a distinct first-class product is unique to Alibaba
- Single-vendor pipeline: Qwen Image → HappyHorse → Wan continuation
- Less publicly available benchmark data than Veo 3 — verify on your use cases
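Video-generation APIs across these vendors are typically asynchronous: you submit a job, then poll for completion. A vendor-agnostic polling helper, with the status call injected so the same loop works against any of them; the function name and the `{"state": ...}` response shape are my own assumptions, not any vendor's SDK:

```python
import time

def poll_until_done(get_status, interval_s=2.0, timeout_s=600.0, sleep=time.sleep):
    """Poll get_status() until it reports a terminal state.

    get_status: callable returning a dict like {"state": ...}; adapt the
    key and terminal values to the vendor's actual response schema.
    Raises TimeoutError if no terminal state arrives within timeout_s.
    """
    waited = 0.0
    while waited <= timeout_s:
        status = get_status()
        if status.get("state") in ("succeeded", "failed"):
            return status
        sleep(interval_s)
        waited += interval_s
    raise TimeoutError("video job did not finish in time")

# Example with a fake status source (no network needed):
states = iter([{"state": "queued"}, {"state": "running"}, {"state": "succeeded"}])
result = poll_until_done(lambda: next(states), interval_s=0.0, sleep=lambda s: None)
```

Injecting `sleep` keeps the helper unit-testable; in production you would replace the fake `get_status` with an HTTP call to the vendor's job-status endpoint.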
15 · Open-weight models
Claude
- Closed-weight only — no Anthropic open releases
OpenAI
- Whisper is open-weight (audio recognition)
- No flagship LLM open-weight
Google Gemma family (open)
- Gemma 4 released 2026-04 — newest open generation
- Gemma 3: 1B–27B params, 128k context, multimodal, 140+ languages
- Same lineage as Gemini; weights on Hugging Face / Kaggle / Ollama
- Permissive license suitable for most commercial use
xAI — Grok 1 (one-off)
- Grok 1 open-weighted on 2024-03-17 — 314B-parameter MoE
- No newer Grok versions are open-weight
- One-shot release rather than ongoing open family
- Grok 1 is now significantly behind frontier; mainly historical interest
DeepSeek V4 family (open)
- V4 Pro: 1.6T total / 49B active MoE — MIT-licensed
- V4 Flash: 284B / 13B MoE — also MIT
- Weights on Hugging Face; runs on Ollama, vLLM, llama.cpp, etc.
- The most capable single model available with open weights
- Open-source SOTA on agentic-coding benchmarks per release notes
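The MoE figures above capture the deployment trade-off: total parameters drive the memory you must provision (every expert stays resident), while active parameters drive per-token compute. Rough arithmetic, assuming 8-bit weights and ignoring KV cache, activations, and quantization overhead:

```python
def moe_footprint(total_b: float, active_b: float, bytes_per_param: float = 1.0):
    """Rough serving footprint for a mixture-of-experts model.

    - Weight memory scales with TOTAL parameters.
    - Per-token compute scales with ACTIVE parameters (~2 FLOPs/param).
    Ignores KV cache, activations, and runtime overhead.
    """
    weights_gb = total_b * bytes_per_param      # billions of params * bytes/param ~ GB
    tflops_per_token = 2 * active_b / 1000      # 2 FLOPs per active param, in TFLOPs
    return {"weights_gb": weights_gb, "tflops_per_token": tflops_per_token}

# V4 Pro (1.6T total / 49B active) at 8-bit: ~1600 GB of weights,
# so self-hosting means a multi-node cluster despite cheap per-token compute.
print(moe_footprint(1600, 49))
# V4 Flash (284B / 13B): ~284 GB — feasible on a single multi-GPU node.
print(moe_footprint(284, 13))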
Alibaba Qwen family (open)
- Broadest open-weights family across modalities — text, multimodal, audio, image gen
- Qwen 3.6 (35B-A3B / 27B), Qwen 3.5 (397B-A17B), Qwen 3.5 Omni
- Qwen Image 2512 for text-to-image (open)
- Active maintainer — frequent releases, deep model count on Hugging Face
- Note: Qwen 3.6 Max (the proprietary flagship) is not open
16 · Real-time data access (the Grok-only category)
Claude web search
- Web-search tool when enabled in chat / API
- No first-party social-network access
- Cited results, but with normal indexing latency
ChatGPT Search / SearchGPT
- Mature web-search grounding
- No first-party access to X, Reddit, or other social networks
- Indexes via standard search providers
Gemini grounding (Search)
- Native grounding to Google Search results
- Strongest web-grounding signal due to underlying Google index
- No native X access
Grok Live Search + X integration
- First-party access to X posts in real time
- Date-filtered queries, e.g. "what was said about X in the last 48 hours"
- Live Search API parameter — no plugin needed
- Web search also available; X is the differentiator
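In the API, Live Search is switched on via a `search_parameters` object on the chat-completions request rather than a plugin. A sketch of building that request body; the field names (`mode`, `from_date`, `sources`) follow xAI's published Live Search docs as I understand them, but verify against the current API reference, and the model id is illustrative:

```python
from datetime import date, timedelta

def grok_live_search_payload(query: str, days_back: int = 2) -> dict:
    """Build a chat-completions body with Live Search enabled,
    restricted to X posts from the last `days_back` days.

    Field names are an assumption based on xAI's Live Search docs —
    check the current API reference before relying on them.
    """
    since = date.today() - timedelta(days=days_back)
    return {
        "model": "grok-4-1-fast",          # illustrative model id
        "messages": [{"role": "user", "content": query}],
        "search_parameters": {
            "mode": "on",                  # force search rather than "auto"
            "from_date": since.isoformat(),
            "sources": [{"type": "x"}],    # X only; add {"type": "web"} for web too
        },
    }

payload = grok_live_search_payload("sentiment on the new release", days_back=2)
```

POST this to `api.x.ai`'s chat-completions endpoint with your API key; the date filter is what makes the "last 48 hours" style of query work.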
DeepSeek — not a player
- No first-party real-time data access
- chat.deepseek.com has a Search toggle (via standard providers) but no social-network grounding
- API has no built-in web search — bring your own retrieval pipeline
- Skip DeepSeek for "what's happening now on X" queries
Alibaba — not a player
- No first-party social-network access for global discourse
- chat.qwen.ai has web search; not a real-time-X equivalent
- For Chinese-internet content (Weibo, Taobao reviews, Alipay merchant data), Alibaba's ecosystem reach is unique, but it is not packaged as a Live-Search-style API
- Skip Alibaba for global real-time discourse intelligence
How I'd combine them in practice
You don't have to pick one. The strongest 2026 setups I'm seeing in the wild:
- Default chat / writing for individuals: Whichever subscription you already have. All four are excellent — Claude Pro, ChatGPT Plus, Google AI Pro, X Premium / SuperGrok. Stop optimizing — start using.
- Production text apps: Route by task. Cheap triage (Haiku 4.5 / GPT-5 Mini / Gemini 3.1 Flash-Lite / Grok 4.1 Fast) → escalate to a frontier model only when triage flags "hard." Cuts cost 50–80%. Route to Gemini 3.1 Pro or Grok 4.20 when input exceeds 200k tokens; route to Grok 4.3 when budget is the dominant constraint at the frontier tier.
- Engineering workflow: Claude Code for hands-on local work + Codex for cloud parallel jobs + GPT-5.4 for browser/desktop automation + Gemini Code Assist for IDE inline. Grok via Cursor/Cline for cheap raw code generation. Use whichever model is best for the subtask, not "the company we picked."
- Agentic products: OpenAI Realtime or Gemini Live for voice + GPT-5.4 for Computer Use + Claude / Gemini 3.1 Pro / Grok 4.3 for the orchestration brain. Veo 3 or Grok Imagine for video generation. Test multiple with your evals.
- Real-time / social-discourse intelligence: Grok with Live Search. No alternative. Pair with Claude or GPT for downstream synthesis once Grok has pulled the X data.
- Customer-facing chat: All four are fine. Test latency and quality on your domain. Often Sonnet 4.6, GPT-5, Gemini 3 Flash, or Grok 4.1 Fast wins on cost-quality balance — leave the flagships for the hardest 5% of queries.
- Open-weight / on-prem / fine-tuning: DeepSeek V4 Pro (MIT, 1.6T MoE, agentic-coding open SOTA) for single-model frontier; Alibaba Qwen family (Qwen 3.6 / 3.5 / Omni / Image) for the broadest open-weights stack across modalities; Gemma 4 for US-provenance preference. Anthropic, OpenAI, and xAI don't ship maintained frontier open weights.
- Multimodal media product (text + image + audio + video): Google leads on breadth from a single US vendor (Imagen 4 + Veo 3 + Lyria 2 + Gemini Live + 3.1 Flash TTS). Alibaba matches on breadth from the open-weights side (Qwen 3.6 Max + Qwen Image + Wan 2.7 + HappyHorse 1.0 + Qwen Omni). Grok Imagine is a cheaper alternative for the image+video subset. Pick by procurement and license preferences.
- Agentic coding / app dev: Claude Code / Codex / Code Assist for product workflows. For raw model capability inside your own harness: V4 Pro (open SOTA) or Qwen 3.6 Max (explicit agentic-app-dev tuning + 1M context + visual reasoning).
- Image-to-video specifically: HappyHorse 1.0 from Alibaba is the dedicated image-to-video product; pair with any image gen for a two-step pipeline.
- Cost-floor production routing: DeepSeek V4 Flash ($0.14/$0.28), Grok 4.1 Fast ($0.20/$0.50, 2M context), and Qwen 3.6 Plus ("60% cheaper, 8× faster" tier) compete for the cheapest competent tier. V4 Pro at the discount ($0.435/$0.87) opens up frontier-quality work at sub-Grok pricing — but watch the May 31 expiry.
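The triage-then-escalate routing described above can be sketched as one small pure function. Thresholds and model ids here are illustrative placeholders drawn from this comparison, not verified API identifiers; tune both against your own traffic and evals:

```python
def route(prompt_tokens: int, triage_flagged_hard: bool, budget_constrained: bool) -> str:
    """Pick a model lane per the routing rules above.

    Order matters: context size is checked first (a long input must go
    to a large-context model regardless of difficulty), then the cheap
    triage tier absorbs everything not flagged "hard".
    """
    if prompt_tokens > 200_000:
        return "gemini-3.1-pro"    # large-context lane (>200k tokens)
    if not triage_flagged_hard:
        return "haiku-4.5"         # cheap triage tier handles it
    if budget_constrained:
        return "grok-4.3"          # budget-frontier lane
    return "opus-4.7"              # hardest queries get the flagship

# The claimed 50-80% cost cut comes from most traffic stopping at the
# cheap tier; only triage-flagged-hard prompts ever reach a flagship.
```

A real router would also track per-lane latency and error rates and fall back across vendors, but the decision core stays this small.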