Chat

The chat completions endpoint sends a conversation to an AI model and returns the response either as a Server-Sent Events (SSE) stream or as a single JSON object.

POST /chat

Authentication

Requires one of:

- X-Api-Key header with a valid API key
- Authorization and X-Firebase-AppCheck headers for browser-based auth
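
As a minimal sketch of the two options, the helper below builds request headers for either mode. The function name `build_auth_headers` and its parameter names are hypothetical. The format of the Authorization value is not specified here, so the caller passes it through unchanged.

```python
from typing import Dict, Optional


def build_auth_headers(
    api_key: Optional[str] = None,
    authorization: Optional[str] = None,
    app_check_token: Optional[str] = None,
) -> Dict[str, str]:
    """Return headers for one of the two documented auth modes."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Mode 1: API key auth.
        headers["X-Api-Key"] = api_key
        return headers
    if authorization and app_check_token:
        # Mode 2: browser-based auth needs both headers together.
        headers["Authorization"] = authorization
        headers["X-Firebase-AppCheck"] = app_check_token
        return headers
    raise ValueError(
        "Provide api_key, or both authorization and app_check_token"
    )
```

Raising early on an incomplete pair avoids a round trip that would presumably end in a 401.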

Request Body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | No | Auto-routed | Model ID to use. If omitted, the platform auto-selects based on query complexity. |
| messages | array | Yes | - | Array of message objects forming the conversation. |
| stream | boolean | No | true | Whether to stream the response via SSE. |
| temperature | float | No | 0.7 | Sampling temperature. Range: 0.0 to 2.0. |
| max_tokens | integer | No | Model default | Maximum tokens to generate. Clamped to the model's maximum output limit. |
| session_id | string | No | - | Resume an existing conversation session. |

Message Object

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| role | string | Yes | One of system, user, or assistant. |
| content | string | Yes | The message content. |
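
The request parameters and message fields listed above can be checked on the client before sending. The sketch below is one way to do that. `build_chat_payload` is a hypothetical helper, and rejecting an empty messages array is an assumption, since server-side validation rules are not described here.

```python
from typing import Any, Dict, List, Optional

ALLOWED_ROLES = {"system", "user", "assistant"}


def build_chat_payload(
    messages: List[Dict[str, str]],
    model: Optional[str] = None,
    stream: bool = True,
    temperature: float = 0.7,
    max_tokens: Optional[int] = None,
    session_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate inputs client-side and return a JSON-serializable body."""
    if not messages:
        # Assumption: an empty conversation is treated as invalid.
        raise ValueError("messages must contain at least one message")
    for i, msg in enumerate(messages):
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(
                f"messages[{i}].role must be one of {sorted(ALLOWED_ROLES)}"
            )
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"messages[{i}].content must be a string")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")

    payload: Dict[str, Any] = {
        "messages": messages,
        "stream": stream,
        "temperature": temperature,
    }
    # Optional fields are left out so the server applies its own defaults
    # (auto-routing for model, the model default for max_tokens).
    if model is not None:
        payload["model"] = model
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if session_id is not None:
        payload["session_id"] = session_id
    return payload
```

Calling `build_chat_payload([{"role": "user", "content": "Hi"}])` yields a body with no model key, which leaves model selection to auto-routing.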

Request Example

curl -X POST https://inference-api-611798501438.us-central1.run.app/chat \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: inf_your_api_key_here" \
  -d '{
    "model": "claude-sonnet",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    "stream": true,
    "temperature": 0.7,
    "max_tokens": 1024
  }'
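
A rough Python equivalent of the curl call, using only the standard library, might look like the following. The helper name `make_chat_request` is hypothetical. stream is set to false here so the whole reply arrives as one JSON object, and the network call itself is left commented out because it needs a valid key.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
BASE_URL = "https://inference-api-611798501438.us-central1.run.app"


def make_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /chat request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Api-Key": api_key,
        },
        method="POST",
    )


request = make_chat_request(
    "inf_your_api_key_here",
    {
        "model": "claude-sonnet",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum entanglement in simple terms."},
        ],
        "stream": False,
        "temperature": 0.7,
        "max_tokens": 1024,
    },
)

# Sending requires network access and a valid API key:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
#     print(body["content"])
```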

Response (Streaming)

When stream is true (the default), the response is an SSE stream with Content-Type: text/event-stream.

Event Types

chunk -- Incremental text content:

data: {"type": "chunk", "content": "Quantum entanglement is"}

data: {"type": "chunk", "content": " a phenomenon where two"}

data: {"type": "chunk", "content": " particles become linked."}

done -- Stream complete with usage metadata:

data: {"type": "done", "model": "claude-sonnet", "session_id": "abc123", "credits_used": 18, "input_tokens": 42, "output_tokens": 156}

error -- Error during generation:

data: {"type": "error", "message": "Model returned an error"}
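
A client can reassemble the reply by concatenating chunk events until a done event arrives. The sketch below assumes each event is a single `data:` line carrying one JSON object, as in the samples above. The function names are hypothetical, and treating a stream that ends without a done event as an error is an assumption.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def iter_events(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one decoded JSON event per `data:` line, skipping other lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        yield json.loads(line[len("data:"):].strip())


def collect_stream(lines: Iterable[str]) -> Tuple[str, Dict]:
    """Concatenate chunk content and return it with the final done event."""
    parts = []
    for event in iter_events(lines):
        kind = event.get("type")
        if kind == "chunk":
            parts.append(event["content"])
        elif kind == "done":
            return "".join(parts), event
        elif kind == "error":
            raise RuntimeError(event.get("message", "unknown stream error"))
    # Assumption: a stream that stops without "done" is incomplete.
    raise RuntimeError("stream ended without a done event")
```

`collect_stream` accepts any iterable of text lines, so it works equally on a decoded HTTP response body or on a list of sample lines in a test.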

Response (Non-Streaming)

When stream is false, the response is a single JSON object:

{
  "content": "Quantum entanglement is a phenomenon where two particles become linked in such a way that measuring one instantly affects the other, regardless of distance.",
  "model": "claude-sonnet",
  "session_id": "abc123",
  "credits_used": 18,
  "input_tokens": 42,
  "output_tokens": 156
}
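
For typed access to these fields, the body can be loaded into a small record. This is a sketch only: `ChatResult` is a hypothetical name, and the field types are inferred from the sample response rather than from a published schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatResult:
    content: str
    model: str
    session_id: str
    credits_used: int
    input_tokens: int
    output_tokens: int

    @classmethod
    def from_json(cls, body: str) -> "ChatResult":
        data = json.loads(body)
        # Pick only the documented fields so extra keys do not break parsing.
        return cls(**{name: data[name] for name in cls.__dataclass_fields__})
```

The returned session_id is presumably the value to pass as session_id in a later request to resume the same conversation.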

Auto-Routing

When the model parameter is omitted, the platform automatically selects a model based on the content and complexity of the query. This is useful for general-purpose applications that do not need to target a specific model.

Credit Billing

Credits are handled using a reservation-and-reconciliation model:

1. Reservation: Before generation begins, the platform reserves credits based on a worst-case estimate (maximum possible output tokens for the selected model).
2. Generation: The model produces the response.
3. Reconciliation: After generation completes, the reserved amount is adjusted to reflect the actual token usage. Unused reserved credits are returned to the user's balance.

If the user's credit balance is insufficient to cover the worst-case reservation, the request is rejected with a 403 status.
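
The reserve-then-reconcile flow can be illustrated with a toy ledger. This is not the platform's implementation: the class names are hypothetical, the balance and worst-case figures are made up, and the way credits are derived from tokens is not documented here, so costs are passed in directly.

```python
class InsufficientCredits(Exception):
    """Raised when the balance cannot cover the worst-case reservation."""


class CreditAccount:
    def __init__(self, balance: int) -> None:
        self.balance = balance

    def reserve(self, worst_case: int) -> int:
        """Hold the worst-case cost up front, or reject the request."""
        if worst_case > self.balance:
            raise InsufficientCredits(
                f"need {worst_case} credits, have {self.balance}"
            )
        self.balance -= worst_case
        return worst_case

    def reconcile(self, reserved: int, actual: int) -> int:
        """Return the unused part of a reservation and report the refund."""
        if actual > reserved:
            # Assumption: the worst-case estimate always bounds actual usage.
            raise ValueError("actual usage exceeded the reservation")
        refund = reserved - actual
        self.balance += refund
        return refund


account = CreditAccount(balance=100)             # illustrative starting balance
reserved = account.reserve(worst_case=60)        # balance is now 40
refund = account.reconcile(reserved, actual=18)  # 42 credits come back
print(account.balance)  # 82
```

One consequence of this model is that a request can be rejected even when the balance would have covered its actual cost, because the check is made against the worst case.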

Error Responses

| Status | Description |
| --- | --- |
| 400 | Invalid request body or parameters |
| 401 | Missing or invalid authentication |
| 403 | Insufficient credits for the request |
| 404 | Specified model not found |
| 429 | Rate limit exceeded |
| 500 | Provider or internal server error |
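
On the client, these statuses can be mapped to a single exception type. The sketch below uses a hypothetical `ChatAPIError` class. Marking only 429 and 500 as retryable is an assumption, not documented behavior.

```python
from typing import Dict

STATUS_DESCRIPTIONS: Dict[int, str] = {
    400: "Invalid request body or parameters",
    401: "Missing or invalid authentication",
    403: "Insufficient credits for the request",
    404: "Specified model not found",
    429: "Rate limit exceeded",
    500: "Provider or internal server error",
}

# Assumption: only rate limits and server-side failures are worth retrying.
RETRYABLE = {429, 500}


class ChatAPIError(Exception):
    """Error carrying the HTTP status and a retry hint."""

    def __init__(self, status: int) -> None:
        self.status = status
        self.retryable = status in RETRYABLE
        description = STATUS_DESCRIPTIONS.get(status, "Unexpected status")
        super().__init__(f"{status}: {description}")
```

A caller might catch `ChatAPIError`, back off and retry when `retryable` is true, and surface the message to the user otherwise.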
