Ensemble

Ensemble consensus is a multi-model iterative refinement pattern. Three specialized models -- a Drafter, a Critic, and a Synthesizer -- collaborate in a loop to produce high-quality output that meets a configurable quality threshold.


Endpoint

POST /internal/agent/ensemble

This is an internal OIDC-authenticated endpoint on the agent executor service. It is not directly accessible to end users.


Concept

The ensemble pattern assigns three distinct roles to three different models, each operating at a different temperature to balance creativity and rigor:

                     +---------------------------+
                     |    Round N (N = 1..max)    |
                     +---------------------------+
                                |
                                v
                     +---------------------+
                     |      DRAFTER        |
                     |  (creative, high T) |
                     |  Generates output   |
                     |  + self-assessment  |
                     +---------------------+
                                |
                                v
                     +---------------------+
                     |       CRITIC        |
                     |  (rigorous, low T)  |
                     |  Scores dimensions  |
                     |  + verdict          |
                     +---------------------+
                                |
                                v
                     +---------------------+
                     |    SYNTHESIZER      |
                     |  (balanced, mid T)  |
                     |  Integrates feedback|
                     |  + decision         |
                     +---------------------+
                                |
                        +-------+-------+
                        |               |
                   APPROVED       ANOTHER_ROUND
                        |               |
                        v               v
                   Return output   Loop back to
                                   Drafter (round N+1)

Role Definitions

| Role | Purpose | Temperature | Behavior |
|---|---|---|---|
| Drafter | Generate creative, comprehensive output | High (0.4--0.8) | Produces initial content. In rounds > 1, explicitly addresses previous feedback and critical issues. Includes a self-assessment section. |
| Critic | Evaluate output rigorously against quality dimensions | Low (0.2--0.3) | Scores each dimension 1--10. Provides composite score, verdict, critical issues, suggestions, and specific edits. Never generates content. |
| Synthesizer | Integrate feedback and decide whether to approve or loop | Medium (0.3--0.5) | Reviews drafter output and critic feedback. Applies non-destructive edits. Issues final APPROVED or ANOTHER_ROUND decision. |


Request Body

Parameter Reference

| Parameter | Type | Required | Constraints | Description |
|---|---|---|---|---|
| user_id | string | Yes | -- | Authenticated user ID for credit billing. |
| task | string | Yes | -- | The task description provided to the Drafter. |
| drafter_model | string | Yes | Must be a valid model ID | Model used for the Drafter role. |
| critic_model | string | Yes | Must be a valid model ID | Model used for the Critic role. |
| synthesizer_model | string | Yes | Must be a valid model ID | Model used for the Synthesizer role. |
| drafter_temperature | float | No | 0.0--1.0 | Temperature for the Drafter. Default: 0.7. |
| critic_temperature | float | No | 0.0--1.0 | Temperature for the Critic. Default: 0.2. |
| synthesizer_temperature | float | No | 0.0--1.0 | Temperature for the Synthesizer. Default: 0.4. |
| max_rounds | integer | Yes | 1--5 | Maximum number of Drafter-Critic-Synthesizer loops. |
| approval_threshold | float | Yes | 1.0--10.0 | Minimum composite score required for approval. |
| scoring_dimensions | array | No | 1--10 items | Dimensions the Critic evaluates. Default: ["accuracy", "completeness", "clarity", "actionability"]. |
| context | string | No | -- | Additional context injected into all model prompts. |
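The constraints and defaults in the Parameter Reference can be mirrored on the client side. The following is a minimal sketch, not the service's own validation: the `EnsembleRequest` class and its `validate` method are hypothetical names introduced for illustration, and model IDs are not checked because the model registry is not described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DEFAULT_DIMENSIONS = ["accuracy", "completeness", "clarity", "actionability"]


@dataclass
class EnsembleRequest:
    """Hypothetical client-side mirror of the documented request body."""
    user_id: str
    task: str
    drafter_model: str
    critic_model: str
    synthesizer_model: str
    max_rounds: int
    approval_threshold: float
    drafter_temperature: float = 0.7       # documented default
    critic_temperature: float = 0.2        # documented default
    synthesizer_temperature: float = 0.4   # documented default
    scoring_dimensions: List[str] = field(
        default_factory=lambda: list(DEFAULT_DIMENSIONS)
    )
    context: Optional[str] = None

    def validate(self) -> List[str]:
        """Return constraint violations; an empty list means none were found."""
        problems = []
        for name in ("drafter_temperature", "critic_temperature",
                     "synthesizer_temperature"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                problems.append(f"{name} must be within 0.0-1.0")
        if not 1 <= self.max_rounds <= 5:
            problems.append("max_rounds must be within 1-5")
        if not 1.0 <= self.approval_threshold <= 10.0:
            problems.append("approval_threshold must be within 1.0-10.0")
        if not 1 <= len(self.scoring_dimensions) <= 10:
            problems.append("scoring_dimensions must contain 1-10 items")
        return problems
```

Returning a list of violations rather than raising on the first one lets a caller report every problem in a single pass.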


Loop Mechanics

Round Execution

Each round proceeds through three phases:

Phase 1 -- Drafter generates output:

  • Receives the original task and any context.

  • If round > 1, also receives the Critic's feedback from the previous round, including critical issues and specific edit suggestions.

  • Produces the output content plus a self-assessment section.

Phase 2 -- Critic evaluates output:

  • Receives the Drafter's output and the original task.

  • Scores each dimension in scoring_dimensions on a 1--10 scale.

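  • Produces a structured evaluation: per-dimension scores, a composite score, a verdict, critical issues, suggestions, and specific edits.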

Phase 3 -- Synthesizer decides:

  • Receives the Drafter's output and the Critic's evaluation.

  • Integrates non-destructive edits where possible.

  • Issues a decision:

    • DECISION: APPROVED -- output meets quality threshold, return to caller.

    • DECISION: ANOTHER_ROUND -- output needs revision, loop back to Drafter.
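The three phases can be sketched as a simple loop. This is an illustration under stated assumptions, not the executor's implementation: the `Evaluation` class, the `run_ensemble` function, and the three role callables are hypothetical stand-ins for real model calls, and final-round handling is omitted (the loop simply stops at `max_rounds`).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Evaluation:
    """Hypothetical, trimmed-down shape of the Critic's evaluation."""
    composite: float
    critical_issues: List[str] = field(default_factory=list)


# Hypothetical role signatures; a real executor would call the configured models.
DrafterFn = Callable[[str, Optional[Evaluation]], str]
CriticFn = Callable[[str, str], Evaluation]
SynthesizerFn = Callable[[str, Evaluation], Tuple[str, str]]


def run_ensemble(task: str, drafter: DrafterFn, critic: CriticFn,
                 synthesizer: SynthesizerFn, max_rounds: int) -> Tuple[str, int]:
    """Run Drafter -> Critic -> Synthesizer until approval or max_rounds."""
    feedback: Optional[Evaluation] = None
    output = ""
    for round_no in range(1, max_rounds + 1):
        # Phase 1: the Drafter sees the task plus the previous round's feedback.
        draft = drafter(task, feedback)
        # Phase 2: the Critic scores the draft against the original task.
        evaluation = critic(draft, task)
        # Phase 3: the Synthesizer integrates edits and issues a decision.
        output, decision = synthesizer(draft, evaluation)
        if decision == "APPROVED":
            return output, round_no
        feedback = evaluation  # carried into the next round's Drafter call
    return output, max_rounds
```

Passing the roles in as callables keeps the loop independent of any particular model client, which also makes it easy to exercise with stubs.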

Approval Logic

The Synthesizer approves the output when both conditions are met:

  1. The Critic's composite score is >= approval_threshold.

  2. The Critic's evaluation lists zero critical issues.

If either condition fails and rounds remain, the Synthesizer issues ANOTHER_ROUND.
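The two approval conditions reduce to a single boolean check. A minimal sketch, assuming rounds remain; the function name `decide` and its parameter names are illustrative rather than part of the service:

```python
from typing import List


def decide(composite_score: float, critical_issues: List[str],
           approval_threshold: float) -> str:
    """Return the Synthesizer's decision for a non-final round."""
    meets_threshold = composite_score >= approval_threshold
    no_critical_issues = len(critical_issues) == 0
    if meets_threshold and no_critical_issues:
        return "APPROVED"
    return "ANOTHER_ROUND"
```

For example, a composite score of 9.5 with one outstanding critical issue still yields ANOTHER_ROUND, because both conditions must hold.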

Final Round Behavior

On the final round (round == max_rounds), the Synthesizer approves the output regardless of the composite score or outstanding critical issues. If critical issues remain unresolved, it appends [ESCALATE] notes to the output that identify the issues it could not resolve within the allotted rounds.
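The final-round rule can be sketched as follows. The exact layout of the [ESCALATE] notes is not specified in this document, so the one-note-per-line format below is an assumption, as is the function name `finalize_final_round`.

```python
from typing import List, Tuple


def finalize_final_round(output: str,
                         unresolved_issues: List[str]) -> Tuple[str, str]:
    """Approve unconditionally, appending one [ESCALATE] note per open issue."""
    if unresolved_issues:
        # Assumed note format: a blank line, then one tagged line per issue.
        notes = "\n".join(f"[ESCALATE] {issue}" for issue in unresolved_issues)
        output = f"{output}\n\n{notes}"
    return output, "APPROVED"
```

Callers that consume approved output may want to scan for the [ESCALATE] marker before treating a result as fully resolved.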


Active Ensemble Configurations

The platform ships with five pre-configured ensemble profiles optimized for common use cases:

| Ensemble | Drafter | Critic | Synthesizer | Max Rounds | Threshold |
|---|---|---|---|---|---|
| Content | claude-sonnet (T=0.8) | gpt-5.2 (T=0.2) | gemini-3.1-pro (T=0.5) | 3 | 8.0 |
| Research | gpt-5.2 (T=0.4) | claude-opus (T=0.2) | gemini-3.1-pro (T=0.3) | 2 | 8.5 |
| Strategy | claude-opus (T=0.6) | gpt-5.2 (T=0.2) | claude-sonnet (T=0.5) | 2 | 8.0 |
| Code Review | qwen3-coder (T=0.3) | claude-sonnet (T=0.2) | gpt-5.2 (T=0.2) | 3 | 9.0 |
| Comms/PR | claude-sonnet (T=0.6) | gemini-3.1-pro (T=0.2) | gpt-5.2 (T=0.4) | 2 | 8.5 |

Configuration Notes

  • Content: High-creativity drafter for marketing and editorial work. Rigorous fact-checking critic. Three rounds allow iterative refinement.

  • Research: Moderate-temperature drafter for analytical tasks. A high threshold (8.5) enforces factual rigor.

  • Strategy: Claude Opus drafts strategic documents. Cross-vendor critic provides independent assessment.

  • Code Review: Lowest temperatures across all roles. Highest threshold (9.0) enforces strict correctness. Three rounds for thorough iteration.

  • Comms/PR: Optimized for professional communications. Two rounds balance quality with speed.
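For callers that want to reuse a pre-configured profile, the table values map directly onto the request parameters. The sketch below copies the models, temperatures, rounds, and thresholds from the configuration table; the `Profile` type, the `PROFILES` dictionary, and the `request_fields` helper are hypothetical and not part of the platform.

```python
from typing import Dict, NamedTuple, Tuple


class Profile(NamedTuple):
    drafter: Tuple[str, float]       # (model ID, temperature)
    critic: Tuple[str, float]
    synthesizer: Tuple[str, float]
    max_rounds: int
    approval_threshold: float


# Values copied from the configuration table; the layout is illustrative.
PROFILES: Dict[str, Profile] = {
    "Content": Profile(("claude-sonnet", 0.8), ("gpt-5.2", 0.2),
                       ("gemini-3.1-pro", 0.5), 3, 8.0),
    "Research": Profile(("gpt-5.2", 0.4), ("claude-opus", 0.2),
                        ("gemini-3.1-pro", 0.3), 2, 8.5),
    "Strategy": Profile(("claude-opus", 0.6), ("gpt-5.2", 0.2),
                        ("claude-sonnet", 0.5), 2, 8.0),
    "Code Review": Profile(("qwen3-coder", 0.3), ("claude-sonnet", 0.2),
                           ("gpt-5.2", 0.2), 3, 9.0),
    "Comms/PR": Profile(("claude-sonnet", 0.6), ("gemini-3.1-pro", 0.2),
                        ("gpt-5.2", 0.4), 2, 8.5),
}


def request_fields(name: str) -> dict:
    """Expand a profile into the matching request-body parameters."""
    p = PROFILES[name]
    return {
        "drafter_model": p.drafter[0],
        "drafter_temperature": p.drafter[1],
        "critic_model": p.critic[0],
        "critic_temperature": p.critic[1],
        "synthesizer_model": p.synthesizer[0],
        "synthesizer_temperature": p.synthesizer[1],
        "max_rounds": p.max_rounds,
        "approval_threshold": p.approval_threshold,
    }
```

The caller still supplies `user_id`, `task`, and any optional `context` or `scoring_dimensions`.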


Plan Limits

| Plan | Max Rounds Allowed | Notes |
|---|---|---|
| Free | Not available | Ensemble is not available on the Free plan. |
| Guru | 2 | max_rounds capped at 2 regardless of request value. |
| Pro | 3 | max_rounds capped at 3. Full access to all configurations. |

Requests exceeding the plan's round limit will have max_rounds silently clamped to the plan maximum. The response includes the effective max_rounds used.
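The clamping rule amounts to taking the smaller of the requested value and the plan cap. A minimal sketch: the caps come from the Plan Limits table, while the function name and the use of `PermissionError` for plans without ensemble access are assumptions, since the rejection behavior for the Free plan is not described here.

```python
# Caps taken from the Plan Limits table; Free has no ensemble access.
PLAN_MAX_ROUNDS = {"Guru": 2, "Pro": 3}


def effective_max_rounds(plan: str, requested: int) -> int:
    """Clamp the requested max_rounds to the plan's cap."""
    if plan not in PLAN_MAX_ROUNDS:
        # Assumed behavior for plans without ensemble access.
        raise PermissionError(f"Ensemble is not available on the {plan} plan")
    return min(requested, PLAN_MAX_ROUNDS[plan])
```

Because the clamp is silent, clients that depend on a specific round count should compare the effective max_rounds in the response with the value they sent.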


Example Request
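
A request body assembled from the Parameter Reference might look like the following sketch. The user ID and task text are illustrative placeholders, and the model IDs and settings follow the Content configuration listed under Active Ensemble Configurations. Authentication and transport are omitted because the OIDC details are not described in this document.

```python
import json

# Illustrative payload; user_id and task are placeholders, not real values.
payload = {
    "user_id": "user-123",
    "task": "Write a product announcement for the new reporting dashboard.",
    "drafter_model": "claude-sonnet",
    "critic_model": "gpt-5.2",
    "synthesizer_model": "gemini-3.1-pro",
    "drafter_temperature": 0.8,
    "critic_temperature": 0.2,
    "synthesizer_temperature": 0.5,
    "max_rounds": 3,
    "approval_threshold": 8.0,
    "scoring_dimensions": ["accuracy", "completeness", "clarity", "actionability"],
}

# Serialized body for POST /internal/agent/ensemble.
body = json.dumps(payload, indent=2)
print(body)
```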

Example Response


Error Handling

| Error | HTTP Status | Description |
|---|---|---|
| INSUFFICIENT_CREDITS | 402 | User does not have enough credits for the estimated ensemble cost. |
| PLAN_LIMIT_EXCEEDED | 403 | Requested max_rounds exceeds plan allowance (informational; rounds are clamped). |
| INVALID_MODEL | 400 | One or more specified models are not available in the model registry. |
| DRAFTER_FAILURE | 502 | The Drafter model returned an error during generation. |
| CRITIC_FAILURE | 502 | The Critic model returned an error during evaluation. |
| SYNTHESIZER_FAILURE | 502 | The Synthesizer model returned an error during integration. |

On any model failure, the ensemble returns partial results including all completed rounds and the error detail.
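
A client can use the error code to tell which role failed and therefore where the partial results stop. The sketch below encodes the error table as a lookup; the `failed_role` helper is hypothetical, and the shape of the error response itself is not specified in this document.

```python
from typing import Optional

# Error codes and HTTP statuses from the Error Handling table.
ERROR_STATUS = {
    "INSUFFICIENT_CREDITS": 402,
    "PLAN_LIMIT_EXCEEDED": 403,
    "INVALID_MODEL": 400,
    "DRAFTER_FAILURE": 502,
    "CRITIC_FAILURE": 502,
    "SYNTHESIZER_FAILURE": 502,
}


def failed_role(error_code: str) -> Optional[str]:
    """Return 'drafter', 'critic', or 'synthesizer' for model failures, else None."""
    suffix = "_FAILURE"
    if ERROR_STATUS.get(error_code) == 502 and error_code.endswith(suffix):
        return error_code[: -len(suffix)].lower()
    return None
```

Only the three 502 codes map to a role; billing, plan, and validation errors return None because no model was at fault.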

