Google (Gemini)

The Majestix AI Inference Hub provides access to two Gemini chat models and two image generation models from Google, served through Google Cloud's Vertex AI (with Seedream 4.5 relayed via OpenRouter). The Gemini chat models stand out for their 1M token context windows -- the largest among the platform's major-provider chat models -- making them the best choice for tasks involving very large inputs.

Available Models

| Key | Underlying Model | Context Window | Category |
|---|---|---|---|
| gemini-3-flash | Gemini 3 Flash | 1M | Fast / Large Context |
| gemini-3.1-pro | Gemini 3.1 Pro | 1M | Reasoning / Large Context |
| gemini-3-image | Gemini 3 Image | -- | Image Generation |
| seedream-4.5 | Seedream 4.5 | -- | Image Generation |

gemini-3-flash

Gemini 3 Flash is Google's fast, cost-effective model with a 1M token context window. It delivers rapid responses while ingesting far more input than any Anthropic or OpenAI chat model on the platform.

When to use: Processing entire codebases, analyzing very long documents (books, legal filings, transcripts), high-throughput workloads where speed matters, and any task where you need to fit large amounts of context into a single request without truncation.

When to consider alternatives: For tasks requiring deep multi-step reasoning on large inputs, step up to gemini-3.1-pro. If context window size is not a concern and you prefer Anthropic or OpenAI model characteristics, claude-haiku or gpt-5-mini are comparable in speed and cost.
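As a sketch of how a large-context request might be assembled -- the payload shape (OpenAI-style chat messages) and field names here are assumptions for illustration, not documented platform API details -- an entire document can ride in a single message, since the 1M window removes the need to chunk or truncate first:

```python
# Hypothetical request builder for the Majestix AI Inference Hub.
# The OpenAI-style chat payload shape is an assumption, not a
# documented API; only the model key "gemini-3-flash" comes from
# the table above.

def build_chat_request(model: str, system: str, document: str, question: str) -> dict:
    """Pack a full document plus a question into one chat request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            # The whole document goes in one user turn; with a 1M
            # token window there is no need to chunk or truncate.
            {"role": "user", "content": f"{document}\n\nQuestion: {question}"},
        ],
    }

req = build_chat_request(
    model="gemini-3-flash",
    system="Answer using only the provided document.",
    document="...full book or codebase text...",
    question="Summarize the key points.",
)
```

The same builder works unchanged for gemini-3.1-pro; only the model key differs.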

gemini-3.1-pro

Gemini 3.1 Pro is Google's reasoning-focused model. It pairs the same 1M token context window with stronger analytical and reasoning capabilities compared to Flash.

When to use: Complex analysis over large documents, research synthesis across many sources, multi-step reasoning tasks that also require large context, and scenarios where you need both depth of understanding and breadth of input.

When to consider alternatives: If the task is straightforward and does not require deep reasoning, gemini-3-flash provides faster responses at lower cost. For reasoning tasks on shorter inputs (under 200K tokens), claude-opus or deepseek-r1 may produce higher-quality results.
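The Flash-versus-Pro routing guidance above can be condensed into a small helper. This is purely illustrative -- the helper and its routing rule mirror the prose, not any platform API; only the 1M ceiling and the model keys come from this page:

```python
def pick_gemini_model(input_tokens: int, needs_deep_reasoning: bool) -> str:
    """Choose between the two Gemini chat models per the guidance above.

    Illustrative only: the 1M ceiling matches the documented context
    window; the routing rule paraphrases the prose, not a real API.
    """
    if input_tokens > 1_000_000:
        raise ValueError("Input exceeds the 1M token context window")
    # Deep multi-step reasoning -> Pro; otherwise Flash is faster and cheaper.
    return "gemini-3.1-pro" if needs_deep_reasoning else "gemini-3-flash"
```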

gemini-3-image

Gemini 3 Image generates images using Google's Gemini image generation capabilities through Vertex AI.

When to use: Creating images when you want Google's generative image quality, or as an alternative to OpenAI's image generation.

When to consider alternatives: For different artistic styles, try gpt-5-image (OpenAI) or seedream-4.5 (Google's dedicated image model).

Note: Image generation models use flat per-image credit pricing rather than per-token billing.
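Because image models bill a flat credit amount per image, job cost is a simple multiplication rather than a token count. The per-image rate below is a placeholder -- actual credit prices are not listed on this page:

```python
def image_job_cost(num_images: int, credits_per_image: float) -> float:
    """Flat per-image pricing: no token counting involved.

    credits_per_image is a hypothetical placeholder rate; check the
    platform's pricing page for real values.
    """
    if num_images < 1:
        raise ValueError("num_images must be >= 1")
    return num_images * credits_per_image

# 4 images at a hypothetical 2.5 credits each -> 10.0 credits
cost = image_job_cost(4, 2.5)
```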

seedream-4.5

Seedream 4.5 is Google's dedicated high-quality image generation model, accessed through the OpenRouter relay. It is purpose-built for image synthesis and produces detailed, high-fidelity outputs.

When to use: When image quality is the top priority, marketing and design assets, detailed illustrations, and scenarios where you want the best available image generation quality from Google.

When to consider alternatives: If you want faster image generation or a different visual style, try gemini-3-image or gpt-5-image.

Note: Image generation models use flat per-image credit pricing rather than per-token billing.

The 1M Context Window

The Gemini chat models offer a 1M token context window, which translates to roughly 750,000 words of input. To put this in perspective:

| Input Size | Tokens (approx.) | Fits in Gemini? | Fits in GPT-5? | Fits in Claude? |
|---|---|---|---|---|
| Short conversation | 1K--10K | Yes | Yes | Yes |
| Long document | 50K--100K | Yes | Yes | Yes |
| Full codebase | 200K--500K | Yes | Yes | No |
| Book-length text | 500K--800K | Yes | No | No |
| Multiple books / datasets | 800K--1M | Yes | No | No |

If your workload regularly exceeds 400K tokens, Gemini models are the practical choice among major-provider offerings. The only model with an even larger context window is grok-4.1-fast at 2M tokens, available through OpenRouter.
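A quick way to check whether an input fits is the common rough heuristic of about 4 characters per token (consistent with the 1M tokens ≈ 750,000 words figure above). In this sketch, the GPT-5 and Claude limits are inferred from the table's Yes/No boundaries and are approximations, not documented values:

```python
# Approximate context windows, in tokens. The Gemini values are
# documented above; the GPT-5 and Claude bounds are inferred from
# the Yes/No boundaries in the table and are approximations.
CONTEXT_WINDOWS = {
    "gemini-3-flash": 1_000_000,
    "gemini-3.1-pro": 1_000_000,
    "gpt-5": 500_000,   # inferred upper bound
    "claude": 200_000,  # inferred upper bound
}

def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token (heuristic only)."""
    return len(text) // 4

def fits(model: str, text: str) -> bool:
    """True if the text's estimated token count fits the model's window."""
    return estimate_tokens(text) <= CONTEXT_WINDOWS[model]
```

For exact counts, use the provider's tokenizer; the heuristic is only for quick capacity planning.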

Pricing Tier

Gemini 3 Flash is among the most cost-effective models on the platform, offering an exceptional ratio of context capacity to credit cost. Gemini 3.1 Pro is priced at a moderate tier, comparable to other reasoning-focused models. Both image generation models use flat per-image pricing.
