❱ Models
The Majestix AI Inference Hub provides access to 18 models across 4 providers: Anthropic, OpenAI, Vertex AI (Google), and OpenRouter. All models are accessed through a single unified API endpoint (POST /chat) using a consistent request format and SSE streaming protocol.
Available Models
| Model ID | Provider | Name | Context | Best for |
|---|---|---|---|---|
| `claude-sonnet` | Anthropic | Claude Sonnet 4.6 | 200K | Balanced performance across all tasks |
| `claude-haiku` | Anthropic | Claude Haiku 4.5 | 200K | Fast responses, cost-sensitive workloads |
| `claude-opus` | Anthropic | Claude Opus 4.6 | 200K | Complex reasoning, difficult problems |
| `gpt-5-mini` | OpenAI | GPT-5 Mini | 400K | Fast general-purpose tasks |
| `gpt-5.2` | OpenAI | GPT-5.2 | 400K | Flagship-quality responses |
| `gpt-5.2-codex` | OpenAI | GPT-5.2 Codex | 400K | Code generation, agentic workflows |
| `cmo-agent` | OpenAI (fine-tuned) | CMO Agent | 128K | Marketing content and strategy |
| `gemini-3-flash` | Vertex AI | Gemini 3 Flash | 1M | Fast responses with massive context |
| `gemini-3.1-pro` | Vertex AI | Gemini 3.1 Pro | 1M | Reasoning with massive context |
| `gemini-3-image` | Vertex AI | Gemini 3 Image | -- | Image generation |
| `gpt-5-image` | OpenAI | GPT-5 Image | -- | Image generation |
| `seedream-4.5` | OpenRouter (Google) | Seedream 4.5 | -- | High-quality image generation |
| `llama-4-maverick` | OpenRouter (Meta) | Llama 4 Maverick | 1M | Large-context open-source tasks |
| `deepseek-v3.2` | OpenRouter | DeepSeek V3.2 | 164K | Budget-friendly general tasks |
| `deepseek-r1` | OpenRouter | DeepSeek R1 | 64K | Reasoning and chain-of-thought |
| `qwen3-coder` | OpenRouter | Qwen3 Coder | 262K | Code generation, agentic workflows |
| `kimi-k2.5` | OpenRouter (Moonshot) | Kimi K2.5 | 262K | Versatile general-purpose tasks |
| `grok-4.1-fast` | OpenRouter (xAI) | Grok 4.1 Fast | 2M | Tasks requiring the largest context window |
Custom Models
The platform includes proprietary fine-tuned models exclusive to Majestix AI, identified by the is_custom flag in the model catalog:
**cmo-agent** (Marketing & Strategy) -- Fine-tuned OpenAI model trained as a Chief Marketing Officer. Excels at campaigns, copywriting, SEO, brand strategy, and go-to-market planning. See OpenAI Models for full details.
Custom models are tagged with a Custom badge in the web app's model selector. More custom models are planned for future releases.
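As an illustration, a client that has already fetched the model catalog can pick out custom models by checking the `is_custom` flag. The catalog entries below are a sketch; only the `is_custom` field is documented here, and the other fields are assumptions:

```python
# Hypothetical catalog entries -- only `is_custom` is documented;
# the surrounding field names are assumptions for illustration.
catalog = [
    {"id": "claude-sonnet", "provider": "anthropic", "is_custom": False},
    {"id": "gpt-5.2", "provider": "openai", "is_custom": False},
    {"id": "cmo-agent", "provider": "openai", "is_custom": True},
]

# Collect the IDs of proprietary fine-tuned models.
custom_models = [m["id"] for m in catalog if m["is_custom"]]
print(custom_models)  # ['cmo-agent']
```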
Free Tier Model Access
The Free plan ($0/month, 500 credits) restricts model access to fast-tier models only:
| Provider | Available on Free | Requires upgrade |
|---|---|---|
| Anthropic | `claude-haiku` | `claude-sonnet`, `claude-opus` |
| OpenAI | `gpt-5-mini` | `gpt-5.2`, `gpt-5.2-codex`, `cmo-agent` |
| Vertex AI | `gemini-3-flash` | `gemini-3.1-pro` |
| OpenRouter | `deepseek-v3.2` | `deepseek-r1`, `qwen3-coder`, `kimi-k2.5`, `grok-4.1-fast`, `llama-4-maverick` |
| Image models | — | All image generation models |
Upgrade to Guru ($10/month) or Pro ($50/month) for full access to all 18 models. See Credits & Billing for plan details.
Provider Pages
- Anthropic Models -- Claude Sonnet, Haiku, and Opus
- OpenAI Models -- GPT-5 family and CMO Agent (custom)
- Vertex AI Models -- Gemini family and Seedream
- OpenRouter Models -- Llama, DeepSeek, Qwen, Kimi, and Grok
Auto-Routing
If you omit the model parameter from your POST /chat request (or select Auto Router in the web app), the platform automatically selects the best model for your task. The router considers:
- **Message content** -- marketing queries route to `cmo-agent`, coding queries to coding-optimized models
- **Task complexity** -- simple Q&A routes to fast models, complex analysis to reasoning models
- **Message length** -- very long inputs route to large-context models (Gemini, Grok)
- **Model strengths** -- each model is tagged with specializations that inform routing decisions
Auto-routing respects your plan's model access. Free-tier users are routed within the fast-tier models only.
Example -- explicit model selection:
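A minimal Python sketch of an explicit request. The `model` and `messages` fields follow the unified format described above; the base URL, bearer-token auth header, and exact SSE frame contents are assumptions for illustration:

```python
import json
import urllib.request

# Request body: "model" selects a specific model from the catalog.
payload = {
    "model": "claude-sonnet",  # explicit model selection
    "messages": [{"role": "user", "content": "Summarize SSE in one line."}],
}

def stream_chat(payload: dict, api_key: str) -> None:
    """POST the payload to /chat and print SSE data frames as they arrive."""
    req = urllib.request.Request(
        "https://api.majestix.ai/chat",  # hypothetical base URL
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Accept": "text/event-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # SSE frames arrive line by line
            line = raw.decode().rstrip("\n")
            if line.startswith("data: "):
                print(line[len("data: "):])
```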
Example -- auto-routed (no model specified):
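To use auto-routing, simply omit the `model` key from the same request body (a sketch under the same assumed field names as above):

```python
# Omitting "model" hands the choice to the Auto Router.
payload = {
    "messages": [
        {"role": "user", "content": "Draft a launch email for our new app."}
    ]
    # no "model" key -> auto-routed; a marketing prompt like this would
    # likely be routed to cmo-agent, per the routing signals above
}
```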
Provider Fallback
When a primary provider experiences an outage or returns a transient error, the platform automatically retries the request on a comparable model from a different provider. This happens transparently -- the client receives a normal streamed response and does not need to handle retry logic.
The fallback map pairs models with similar capability profiles across providers. For example, a request targeting claude-sonnet may fall back to gpt-5.2 or gemini-3.1-pro if Anthropic is unreachable. Fallback is attempted once; if the fallback provider also fails, the error is returned to the client.
Fallback routing is designed to preserve response quality. A flagship model will never fall back to a budget model or vice versa.
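The documented behavior can be sketched as follows. The `claude-sonnet` pairings come from the example above; the error type and the rule that the router picks the first candidate are assumptions, since only the single-retry, same-tier behavior is specified:

```python
# Illustrative sketch of provider fallback: one retry on a comparable
# model from another provider, then surface the error to the client.
FALLBACKS = {
    "claude-sonnet": ["gpt-5.2", "gemini-3.1-pro"],  # pairing from the docs
}

class ProviderError(Exception):
    """Transient provider failure (outage, 5xx, timeout)."""

def complete_with_fallback(model: str, send):
    """Call `send(model)`; on a transient error, retry once on a fallback."""
    try:
        return send(model)
    except ProviderError:
        alts = FALLBACKS.get(model)
        if not alts:
            raise  # no comparable model mapped: return the error
        return send(alts[0])  # single fallback attempt; errors propagate
```

For example, if Anthropic is down, `complete_with_fallback("claude-sonnet", send)` transparently retries on `gpt-5.2`; if that also fails, the client sees the error.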
Token Limits and max_tokens Clamping
Each model defines a maximum output token limit. If your request specifies a max_tokens value that exceeds the model's supported maximum, the platform automatically clamps it down to the model's limit rather than rejecting the request. This ensures that requests are never refused solely due to an overly large max_tokens value.
If you do not specify max_tokens, the platform uses a sensible default for the selected model.
Choosing a Model
Use the following guidelines to select the right model for your use case:
| Use case | Recommended models |
|---|---|
| General chat and Q&A | `claude-sonnet`, `gpt-5.2`, `kimi-k2.5` |
| Fast, cost-effective responses | `claude-haiku`, `gpt-5-mini`, `deepseek-v3.2`, `gemini-3-flash` |
| Complex reasoning and analysis | `claude-opus`, `deepseek-r1`, `gemini-3.1-pro` |
| Code generation and agentic tasks | `gpt-5.2-codex`, `qwen3-coder` |
| Very large documents (100K+ tokens) | `gemini-3-flash`, `gemini-3.1-pro`, `llama-4-maverick`, `grok-4.1-fast` |
| Marketing content | `cmo-agent` |
| Image generation | `gemini-3-image`, `gpt-5-image`, `seedream-4.5` |
