Ensemble

Ensemble consensus is a multi-model iterative refinement pattern. Three specialized models -- a Drafter, a Critic, and a Synthesizer -- collaborate in a loop to produce high-quality output that meets a configurable quality threshold.

Endpoint

POST /internal/agent/ensemble

This is an internal OIDC-authenticated endpoint on the agent executor service. It is not directly accessible to end users.

Concept

The ensemble pattern assigns three distinct roles to three different models, each operating at a different temperature to balance creativity and rigor:

                     +---------------------------+
                     |    Round N (N = 1..max)    |
                     +---------------------------+
                                |
                                v
                     +---------------------+
                     |      DRAFTER        |
                     |  (creative, high T) |
                     |  Generates output   |
                     |  + self-assessment  |
                     +---------------------+
                                |
                                v
                     +---------------------+
                     |       CRITIC        |
                     |  (rigorous, low T)  |
                     |  Scores dimensions  |
                     |  + verdict          |
                     +---------------------+
                                |
                                v
                     +---------------------+
                     |    SYNTHESIZER      |
                     |  (balanced, mid T)  |
                     |  Integrates feedback|
                     |  + decision         |
                     +---------------------+
                                |
                        +-------+-------+
                        |               |
                   APPROVED       ANOTHER_ROUND
                        |               |
                        v               v
                   Return output   Loop back to
                                   Drafter (round N+1)

Role Definitions

| Role | Purpose | Temperature | Behavior |
|---|---|---|---|
| Drafter | Generate creative, comprehensive output | High (0.4--0.8) | Produces initial content. In rounds > 1, explicitly addresses previous feedback and critical issues. Includes a self-assessment section. |
| Critic | Evaluate output rigorously against quality dimensions | Low (0.2--0.3) | Scores each dimension 1--10. Provides composite score, verdict, critical issues, suggestions, and specific edits. Never generates content. |
| Synthesizer | Integrate feedback and decide whether to approve or loop | Medium (0.3--0.5) | Reviews drafter output and critic feedback. Applies non-destructive edits. Issues final APPROVED or ANOTHER_ROUND decision. |

Request Body

{
  "user_id": "string",
  "task": "string",
  "drafter_model": "string",
  "critic_model": "string",
  "synthesizer_model": "string",
  "drafter_temperature": 0.8,
  "critic_temperature": 0.2,
  "synthesizer_temperature": 0.5,
  "max_rounds": 3,
  "approval_threshold": 8.0,
  "scoring_dimensions": [
    "accuracy",
    "completeness",
    "clarity",
    "actionability"
  ],
  "context": "Optional additional context injected into all prompts."
}

Parameter Reference

| Parameter | Type | Required | Constraints | Description |
|---|---|---|---|---|
| user_id | string | Yes | | Authenticated user ID for credit billing. |
| task | string | Yes | | The task description provided to the Drafter. |
| drafter_model | string | Yes | Must be a valid model ID | Model used for the Drafter role. |
| critic_model | string | Yes | Must be a valid model ID | Model used for the Critic role. |
| synthesizer_model | string | Yes | Must be a valid model ID | Model used for the Synthesizer role. |
| drafter_temperature | float | No | 0.0--1.0 | Temperature for the Drafter. Default: 0.7. |
| critic_temperature | float | No | 0.0--1.0 | Temperature for the Critic. Default: 0.2. |
| synthesizer_temperature | float | No | 0.0--1.0 | Temperature for the Synthesizer. Default: 0.4. |
| max_rounds | integer | Yes | 1--5 | Maximum number of Drafter-Critic-Synthesizer loops. |
| approval_threshold | float | Yes | 1.0--10.0 | Minimum composite score required for approval. |
| scoring_dimensions | array | No | 1--10 items | Dimensions the Critic evaluates. Default: ["accuracy", "completeness", "clarity", "actionability"]. |
| context | string | No | | Additional context injected into all model prompts. |
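
The constraints above can be checked on the caller's side before a request is sent. The following is a minimal sketch, not the service's own validation: the `EnsembleRequest` class and `validate` function are hypothetical names, and the model ID check is left out because the model registry is not described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DEFAULT_DIMENSIONS = ["accuracy", "completeness", "clarity", "actionability"]


@dataclass
class EnsembleRequest:
    """Hypothetical client-side mirror of the request body."""
    user_id: str
    task: str
    drafter_model: str
    critic_model: str
    synthesizer_model: str
    max_rounds: int
    approval_threshold: float
    drafter_temperature: float = 0.7          # documented default
    critic_temperature: float = 0.2           # documented default
    synthesizer_temperature: float = 0.4      # documented default
    scoring_dimensions: List[str] = field(
        default_factory=lambda: list(DEFAULT_DIMENSIONS))
    context: Optional[str] = None


def validate(req: EnsembleRequest) -> List[str]:
    """Return a list of constraint violations (empty when the request is valid)."""
    errors = []
    for name in ("drafter_temperature", "critic_temperature",
                 "synthesizer_temperature"):
        if not 0.0 <= getattr(req, name) <= 1.0:
            errors.append(f"{name} must be within 0.0-1.0")
    if not 1 <= req.max_rounds <= 5:
        errors.append("max_rounds must be within 1-5")
    if not 1.0 <= req.approval_threshold <= 10.0:
        errors.append("approval_threshold must be within 1.0-10.0")
    if not 1 <= len(req.scoring_dimensions) <= 10:
        errors.append("scoring_dimensions must contain 1-10 items")
    return errors
```

Returning a list of violations rather than raising on the first one lets a caller report every problem in a single pass.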

Loop Mechanics

Round Execution

Each round proceeds through three phases:

Phase 1 -- Drafter generates output:

- Receives the original task and any context.
- If round > 1, also receives the Critic's feedback from the previous round, including critical issues and specific edit suggestions.
- Produces the output content plus a self-assessment section.

Phase 2 -- Critic evaluates output:

- Receives the Drafter's output and the original task.
- Scores each dimension in scoring_dimensions on a 1--10 scale.
- Produces a structured evaluation:

COMPOSITE SCORE: 7.2 / 10.0
VERDICT: NEEDS_REVISION

DIMENSION SCORES:
- accuracy: 8
- completeness: 6
- clarity: 8
- actionability: 6

CRITICAL ISSUES:
1. Missing competitive analysis section
2. Revenue projections lack supporting data

SUGGESTIONS:
- Add market size estimates with cited sources
- Include risk mitigation strategies

SPECIFIC EDITS:
- Section 3, paragraph 2: Replace "significant growth" with quantified projections
- Add new subsection: "Competitive Landscape" between sections 2 and 3
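
An evaluation in this layout can be turned into structured data with a small parser. The sketch below assumes the Critic's text follows the sample above exactly (upper-case section headers ending in a colon, `- name: score` dimension lines, numbered critical issues). `parse_evaluation` is a hypothetical helper, not part of the service.

```python
import re
from typing import Dict, List, Tuple

SAMPLE = """COMPOSITE SCORE: 7.2 / 10.0
VERDICT: NEEDS_REVISION

DIMENSION SCORES:
- accuracy: 8
- completeness: 6
- clarity: 8
- actionability: 6

CRITICAL ISSUES:
1. Missing competitive analysis section
2. Revenue projections lack supporting data

SUGGESTIONS:
- Add market size estimates with cited sources
"""


def parse_evaluation(text: str) -> Tuple[float, str, Dict[str, int], List[str]]:
    """Extract composite score, verdict, dimension scores and critical issues."""
    score = float(re.search(r"COMPOSITE SCORE:\s*([\d.]+)", text).group(1))
    verdict = re.search(r"VERDICT:\s*(\w+)", text).group(1)

    dimensions: Dict[str, int] = {}
    issues: List[str] = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(":") and line.isupper():
            section = line[:-1]                      # e.g. "DIMENSION SCORES"
        elif section == "DIMENSION SCORES" and line.startswith("- "):
            name, value = line[2:].rsplit(":", 1)
            dimensions[name.strip()] = int(value)
        elif section == "CRITICAL ISSUES" and re.match(r"\d+\.\s", line):
            issues.append(line.split(".", 1)[1].strip())
    return score, verdict, dimensions, issues


score, verdict, dimensions, issues = parse_evaluation(SAMPLE)
```

A production parser would need to tolerate missing sections; this one raises if the score or verdict line is absent.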

Phase 3 -- Synthesizer decides:

- Receives the Drafter's output and the Critic's evaluation.
- Integrates non-destructive edits where possible.
- Issues a decision:
  - DECISION: APPROVED -- output meets quality threshold, return to caller.
  - DECISION: ANOTHER_ROUND -- output needs revision, loop back to Drafter.

Approval Logic

The Synthesizer approves the output when both conditions are met:

- The Critic's composite score is >= approval_threshold.
- The Critic's evaluation lists zero critical issues.

If either condition fails and rounds remain, the Synthesizer issues ANOTHER_ROUND.
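
Expressed as code, the rule for a non-final round is a conjunction of the two conditions. This is an illustrative sketch; `synthesizer_decision` is a hypothetical name, and the real Synthesizer is a model call rather than a pure function.

```python
from typing import List


def synthesizer_decision(composite_score: float,
                         critical_issues: List[str],
                         approval_threshold: float) -> str:
    """Apply the two approval conditions for a round that is not the last."""
    meets_threshold = composite_score >= approval_threshold
    no_critical_issues = len(critical_issues) == 0
    if meets_threshold and no_critical_issues:
        return "APPROVED"
    return "ANOTHER_ROUND"
```

Note that a score above the threshold is not sufficient on its own: a single remaining critical issue still forces another round.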

Final Round Behavior

On the final round (round == max_rounds), the Synthesizer approves the output regardless of score. If unresolved critical issues remain, it appends [ESCALATE] notes to the output that identify the issues that could not be resolved within the allotted rounds.
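
Putting the round loop and the final-round rule together gives roughly the following control flow. This is a simplified sketch under stated assumptions: `draft` and `critique` are hypothetical callables standing in for the Drafter and Critic model calls, the Synthesizer's edit integration is omitted, and the exact format of the [ESCALATE] notes is assumed.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins for the model calls.
Drafter = Callable[[str, List[str]], str]           # (task, prior issues) -> draft
Critic = Callable[[str], Tuple[float, List[str]]]   # draft -> (score, critical issues)


def run_ensemble(task: str, draft: Drafter, critique: Critic,
                 max_rounds: int,
                 approval_threshold: float) -> Tuple[str, int, List[str]]:
    """Return (output, rounds_used, escalations)."""
    issues: List[str] = []
    for round_no in range(1, max_rounds + 1):
        output = draft(task, issues)          # later rounds see prior issues
        score, issues = critique(output)
        if score >= approval_threshold and not issues:
            return output, round_no, []       # normal approval
        if round_no == max_rounds:
            # Final round: approve anyway and flag what is still unresolved.
            notes = "".join(f"\n[ESCALATE] {issue}" for issue in issues)
            return output + notes, round_no, issues
    raise ValueError("max_rounds must be at least 1")
```

With stub callables that return a low score first and a passing score second, the function returns after round 2 with no escalations. With a Critic that never passes, it returns on the last round with the outstanding issues appended.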

Active Ensemble Configurations

The platform ships with five pre-configured ensemble profiles optimized for common use cases:

| Ensemble | Drafter | Critic | Synthesizer | Max Rounds | Threshold |
|---|---|---|---|---|---|
| Content | claude-sonnet (T=0.8) | gpt-5.2 (T=0.2) | gemini-3.1-pro (T=0.5) | 3 | 8.0 |
| Research | gpt-5.2 (T=0.4) | claude-opus (T=0.2) | gemini-3.1-pro (T=0.3) | | 8.5 |
| Strategy | claude-opus (T=0.6) | gpt-5.2 (T=0.2) | claude-sonnet (T=0.5) | | 8.0 |
| Code Review | qwen3-coder (T=0.3) | claude-sonnet (T=0.2) | gpt-5.2 (T=0.2) | 3 | 9.0 |
| Comms/PR | claude-sonnet (T=0.6) | gemini-3.1-pro (T=0.2) | gpt-5.2 (T=0.4) | 2 | 8.5 |

Configuration Notes

- Content: High-creativity drafter for marketing and editorial work. Rigorous fact-checking critic. Three rounds allow iterative refinement.
- Research: Moderate-temperature drafter for analytical tasks. A high threshold (8.5) ensures factual rigor.
- Strategy: Claude Opus drafts strategic documents. Cross-vendor critic provides independent assessment.
- Code Review: Lowest temperatures across all roles. Highest threshold (9.0) enforces strict correctness. Three rounds for thorough iteration.
- Comms/PR: Optimized for professional communications. Two rounds balance quality with speed.

Plan Limits

| Plan | Max Rounds Allowed | Notes |
|---|---|---|
| Free | Not available | Ensemble is not available on the Free plan. |
| Guru | 2 | max_rounds capped at 2 regardless of request value. |
| Pro | 3 | max_rounds capped at 3. Full access to all configurations. |

Requests exceeding the plan's round limit will have max_rounds silently clamped to the plan maximum. The response includes the effective max_rounds used.
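
The clamping behavior amounts to taking the minimum of the requested value and the plan cap. A minimal sketch, assuming plan names are passed as the strings used above; `PLAN_MAX_ROUNDS` and `effective_max_rounds` are hypothetical names.

```python
from typing import Optional

# Round caps per plan; None marks a plan without ensemble access.
PLAN_MAX_ROUNDS = {"Free": None, "Guru": 2, "Pro": 3}


def effective_max_rounds(plan: str, requested: int) -> Optional[int]:
    """Clamp the requested max_rounds to the plan cap; None if unavailable."""
    cap = PLAN_MAX_ROUNDS[plan]
    if cap is None:
        return None
    return min(requested, cap)
```

For example, a Guru request with max_rounds set to 5 runs with an effective max_rounds of 2.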

Example Request

POST /internal/agent/ensemble
Authorization: Bearer <OIDC token>

{
  "user_id": "uid_abc123",
  "task": "Write a 1500-word blog post on the impact of edge computing on IoT latency, targeting a technical audience familiar with cloud infrastructure.",
  "drafter_model": "claude-sonnet",
  "critic_model": "gpt-5.2",
  "synthesizer_model": "gemini-3.1-pro",
  "drafter_temperature": 0.8,
  "critic_temperature": 0.2,
  "synthesizer_temperature": 0.5,
  "max_rounds": 3,
  "approval_threshold": 8.0,
  "scoring_dimensions": [
    "accuracy",
    "completeness",
    "clarity",
    "technical_depth",
    "actionability"
  ],
  "context": "Target publication: company engineering blog. Audience: senior engineers and architects."
}

Example Response

{
  "status": "completed",
  "ensemble_id": "ens_7f3a9c2e",
  "rounds_used": 2,
  "max_rounds": 3,
  "final_score": 8.6,
  "approval_threshold": 8.0,
  "decision": "APPROVED",
  "scoring_dimensions": {
    "accuracy": 9,
    "completeness": 8,
    "clarity": 9,
    "technical_depth": 8,
    "actionability": 9
  },
  "output": "# Edge Computing and IoT Latency: A Technical Deep Dive\n\n...(full blog post content)...",
  "escalations": [],
  "round_history": [
    {
      "round": 1,
      "drafter_model": "claude-sonnet",
      "critic_score": 6.8,
      "critic_verdict": "NEEDS_REVISION",
      "critical_issues": [
        "Missing latency benchmarks comparing edge vs cloud",
        "No mention of 5G integration considerations"
      ],
      "synthesizer_decision": "ANOTHER_ROUND"
    },
    {
      "round": 2,
      "drafter_model": "claude-sonnet",
      "critic_score": 8.6,
      "critic_verdict": "EXCELLENT",
      "critical_issues": [],
      "synthesizer_decision": "APPROVED"
    }
  ],
  "credits_used": 342,
  "model_calls": [
    { "model": "claude-sonnet", "role": "drafter", "round": 1, "credits": 68 },
    { "model": "gpt-5.2", "role": "critic", "round": 1, "credits": 45 },
    { "model": "gemini-3.1-pro", "role": "synthesizer", "round": 1, "credits": 52 },
    { "model": "claude-sonnet", "role": "drafter", "round": 2, "credits": 74 },
    { "model": "gpt-5.2", "role": "critic", "round": 2, "credits": 48 },
    { "model": "gemini-3.1-pro", "role": "synthesizer", "round": 2, "credits": 55 }
  ]
}
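
The model_calls array makes the billing auditable: the per-call credits should add up to credits_used. The sketch below groups the example's calls by role; `credits_by_role` is a hypothetical helper, and the data is copied from the response above.

```python
from collections import defaultdict
from typing import Dict, List

MODEL_CALLS = [
    {"model": "claude-sonnet", "role": "drafter", "round": 1, "credits": 68},
    {"model": "gpt-5.2", "role": "critic", "round": 1, "credits": 45},
    {"model": "gemini-3.1-pro", "role": "synthesizer", "round": 1, "credits": 52},
    {"model": "claude-sonnet", "role": "drafter", "round": 2, "credits": 74},
    {"model": "gpt-5.2", "role": "critic", "round": 2, "credits": 48},
    {"model": "gemini-3.1-pro", "role": "synthesizer", "round": 2, "credits": 55},
]


def credits_by_role(model_calls: List[dict]) -> Dict[str, int]:
    """Sum the credits charged for each role across all rounds."""
    totals: Dict[str, int] = defaultdict(int)
    for call in model_calls:
        totals[call["role"]] += call["credits"]
    return dict(totals)


totals = credits_by_role(MODEL_CALLS)
# totals == {"drafter": 142, "critic": 93, "synthesizer": 107}
# sum(totals.values()) == 342, matching credits_used
```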

Error Handling

| Error | HTTP Status | Description |
|---|---|---|
| INSUFFICIENT_CREDITS | 402 | User does not have enough credits for the estimated ensemble cost. |
| PLAN_LIMIT_EXCEEDED | 403 | Requested max_rounds exceeds plan allowance (informational; rounds are clamped). |
| INVALID_MODEL | 400 | One or more specified models are not available in the model registry. |
| DRAFTER_FAILURE | 502 | The Drafter model returned an error during generation. |
| CRITIC_FAILURE | 502 | The Critic model returned an error during evaluation. |
| SYNTHESIZER_FAILURE | 502 | The Synthesizer model returned an error during integration. |

On any model failure, the ensemble returns partial results including all completed rounds and the error detail.
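
A caller may want to distinguish the three model failures, which carry partial results, from the request-level errors. The following lookup is one possible sketch. The shape of the error response is not documented here, so the code works only from the error code, and `ErrorInfo` and `has_partial_results` are hypothetical names.

```python
from typing import NamedTuple


class ErrorInfo(NamedTuple):
    http_status: int
    model_failure: bool   # True when a Drafter, Critic or Synthesizer call failed


ERRORS = {
    "INSUFFICIENT_CREDITS": ErrorInfo(402, False),
    "PLAN_LIMIT_EXCEEDED": ErrorInfo(403, False),
    "INVALID_MODEL": ErrorInfo(400, False),
    "DRAFTER_FAILURE": ErrorInfo(502, True),
    "CRITIC_FAILURE": ErrorInfo(502, True),
    "SYNTHESIZER_FAILURE": ErrorInfo(502, True),
}


def has_partial_results(error_code: str) -> bool:
    """Model failures are the errors documented to include completed rounds."""
    info = ERRORS.get(error_code)
    return info is not None and info.model_failure
```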
