❱ Models
The Majestix AI Inference Hub provides access to 18 models across 4 providers: Anthropic, OpenAI, Vertex AI (Google), and OpenRouter. All models are accessed through a single unified API endpoint (POST /chat) using a consistent request format and SSE streaming protocol.
Available Models
| Model ID | Provider | Name | Context | Best for |
|---|---|---|---|---|
| `claude-sonnet` | Anthropic | Claude Sonnet 4.6 | 200K | Balanced performance across all tasks |
| `claude-haiku` | Anthropic | Claude Haiku 4.5 | 200K | Fast responses, cost-sensitive workloads |
| `claude-opus` | Anthropic | Claude Opus 4.6 | 200K | Complex reasoning, difficult problems |
| `gpt-5-mini` | OpenAI | GPT-5 Mini | 400K | Fast general-purpose tasks |
| `gpt-5.2` | OpenAI | GPT-5.2 | 400K | Flagship-quality responses |
| `gpt-5.2-codex` | OpenAI | GPT-5.2 Codex | 400K | Code generation, agentic workflows |
| `cmo-agent` | OpenAI (fine-tuned) | CMO Agent | 128K | Marketing content and strategy |
| `gemini-3-flash` | Vertex AI | Gemini 3 Flash | 1M | Fast responses with massive context |
| `gemini-3.1-pro` | Vertex AI | Gemini 3.1 Pro | 1M | Reasoning with massive context |
| `gemini-3-image` | Vertex AI | Gemini 3 Image | -- | Image generation |
| `gpt-5-image` | OpenAI | GPT-5 Image | -- | Image generation |
| `seedream-4.5` | OpenRouter (Google) | Seedream 4.5 | -- | High-quality image generation |
| `llama-4-maverick` | OpenRouter (Meta) | Llama 4 Maverick | 1M | Large-context open-source tasks |
| `deepseek-v3.2` | OpenRouter | DeepSeek V3.2 | 164K | Budget-friendly general tasks |
| `deepseek-r1` | OpenRouter | DeepSeek R1 | 64K | Reasoning and chain-of-thought |
| `qwen3-coder` | OpenRouter | Qwen3 Coder | 262K | Code generation, agentic workflows |
| `kimi-k2.5` | OpenRouter (Moonshot) | Kimi K2.5 | 262K | Versatile general-purpose tasks |
| `grok-4.1-fast` | OpenRouter (xAI) | Grok 4.1 Fast | 2M | Tasks requiring the largest context window |
Custom Models
The platform includes proprietary fine-tuned models exclusive to Majestix AI, identified by the is_custom flag in the model catalog:
**cmo-agent** (Marketing & Strategy) -- Fine-tuned OpenAI model trained as a Chief Marketing Officer. Excels at campaigns, copywriting, SEO, brand strategy, and go-to-market planning. See OpenAI Models for full details.
Custom models are tagged with a Custom badge in the web app's model selector. More custom models are planned for future releases.
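As an illustration, a client that has already fetched the model catalog can pick out custom models by checking the `is_custom` flag. The catalog entries below are a sketch; only the `is_custom` field is documented here, and the other fields are assumptions:

```python
# Hypothetical catalog entries -- only `is_custom` is documented;
# the surrounding field names are assumptions for illustration.
catalog = [
    {"id": "claude-sonnet", "provider": "anthropic", "is_custom": False},
    {"id": "gpt-5.2", "provider": "openai", "is_custom": False},
    {"id": "cmo-agent", "provider": "openai", "is_custom": True},
]

# Collect the IDs of proprietary fine-tuned models.
custom_models = [m["id"] for m in catalog if m["is_custom"]]
print(custom_models)  # ['cmo-agent']
```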
Free Tier Model Access
The Free plan ($0/month, 500 credits) restricts model access to fast-tier models only:
| Provider | Available on Free | Requires upgrade |
|---|---|---|
| Anthropic | `claude-haiku` | `claude-sonnet`, `claude-opus` |
| OpenAI | `gpt-5-mini` | `gpt-5.2`, `gpt-5.2-codex`, `cmo-agent` |
| Vertex AI | `gemini-3-flash` | `gemini-3.1-pro` |
| OpenRouter | `deepseek-v3.2` | `deepseek-r1`, `qwen3-coder`, `kimi-k2.5`, `grok-4.1-fast`, `llama-4-maverick` |
| Image models | — | All image generation models |
Upgrade to Guru ($10/month) or Pro ($50/month) for full access to all 18 models. See Credits & Billing for plan details.
Provider Pages
- Anthropic Models -- Claude Sonnet, Haiku, and Opus
- OpenAI Models -- GPT-5 family and CMO Agent (custom)
- Vertex AI Models -- Gemini family and Seedream
- OpenRouter Models -- Llama, DeepSeek, Qwen, Kimi, and Grok
Auto-Routing
If you omit the model parameter from your POST /chat request (or select Auto Router in the web app), the platform automatically selects the best model for your task. The router considers:
- **Message content** -- marketing queries route to `cmo-agent`, coding queries to coding-optimized models
- **Task complexity** -- simple Q&A routes to fast models, complex analysis to reasoning models
- **Message length** -- very long inputs route to large-context models (Gemini, Grok)
- **Model strengths** -- each model is tagged with specializations that inform routing decisions
Auto-routing respects your plan's model access. Free-tier users are routed within the fast-tier models only.
Example -- explicit model selection:
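A minimal Python sketch of an explicit request. The `model` and `messages` fields follow the unified format described above; the base URL, bearer-token auth header, and exact SSE frame contents are assumptions for illustration:

```python
import json
import urllib.request

# Request body: "model" selects a specific model from the catalog.
payload = {
    "model": "claude-sonnet",  # explicit model selection
    "messages": [{"role": "user", "content": "Summarize SSE in one line."}],
}

def stream_chat(payload: dict, api_key: str) -> None:
    """POST the payload to /chat and print SSE data frames as they arrive."""
    req = urllib.request.Request(
        "https://api.majestix.ai/chat",  # hypothetical base URL
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Accept": "text/event-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # SSE frames arrive line by line
            line = raw.decode().rstrip("\n")
            if line.startswith("data: "):
                print(line[len("data: "):])
```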
Example -- auto-routed (no model specified):
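To use auto-routing, simply omit the `model` key from the same request body (a sketch under the same assumed field names as above):

```python
# Omitting "model" hands the choice to the Auto Router.
payload = {
    "messages": [
        {"role": "user", "content": "Draft a launch email for our new app."}
    ]
    # no "model" key -> auto-routed; a marketing prompt like this would
    # likely be routed to cmo-agent, per the routing signals above
}
```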
Provider Fallback
When a primary provider experiences an outage or returns a transient error, the platform automatically retries the request on a comparable model from a different provider. This happens transparently -- the client receives a normal streamed response and does not need to handle retry logic.
The fallback map pairs models with similar capability profiles across providers. For example, a request targeting claude-sonnet may fall back to gpt-5.2 or gemini-3.1-pro if Anthropic is unreachable. Fallback is attempted once; if the fallback provider also fails, the error is returned to the client.
Fallback routing is designed to preserve response quality. A flagship model will never fall back to a budget model or vice versa.
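The documented behavior can be sketched as follows. The `claude-sonnet` pairings come from the example above; the error type and the rule that the router picks the first candidate are assumptions, since only the single-retry, same-tier behavior is specified:

```python
# Illustrative sketch of provider fallback: one retry on a comparable
# model from another provider, then surface the error to the client.
FALLBACKS = {
    "claude-sonnet": ["gpt-5.2", "gemini-3.1-pro"],  # pairing from the docs
}

class ProviderError(Exception):
    """Transient provider failure (outage, 5xx, timeout)."""

def complete_with_fallback(model: str, send):
    """Call `send(model)`; on a transient error, retry once on a fallback."""
    try:
        return send(model)
    except ProviderError:
        alts = FALLBACKS.get(model)
        if not alts:
            raise  # no comparable model mapped: return the error
        return send(alts[0])  # single fallback attempt; errors propagate
```

For example, if Anthropic is down, `complete_with_fallback("claude-sonnet", send)` transparently retries on `gpt-5.2`; if that also fails, the client sees the error.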
Token Limits and max_tokens Clamping
Each model defines a maximum output token limit. If your request specifies a max_tokens value that exceeds the model's supported maximum, the platform automatically clamps it down to the model's limit rather than rejecting the request. This ensures that requests are never refused solely due to an overly large max_tokens value.
If you do not specify max_tokens, the platform uses a sensible default for the selected model.
Choosing a Model
Use the following guidelines to select the right model for your use case:
| Use case | Recommended models |
|---|---|
| General chat and Q&A | `claude-sonnet`, `gpt-5.2`, `kimi-k2.5` |
| Fast, cost-effective responses | `claude-haiku`, `gpt-5-mini`, `deepseek-v3.2`, `gemini-3-flash` |
| Complex reasoning and analysis | `claude-opus`, `deepseek-r1`, `gemini-3.1-pro` |
| Code generation and agentic tasks | `gpt-5.2-codex`, `qwen3-coder` |
| Very large documents (100K+ tokens) | `gemini-3-flash`, `gemini-3.1-pro`, `llama-4-maverick`, `grok-4.1-fast` |
| Marketing content | `cmo-agent` |
| Image generation | `gemini-3-image`, `gpt-5-image`, `seedream-4.5` |
