
Compare / Blend / Judge API Reference

Deep reference for parallel model evaluation, synthesis, and ranking workflows.

12 min read. Updated 2026-02-15.


Quick Start

Copy the request sample from this page.
Run it in the API Explorer with your API key.
Confirm that the stream's done payload includes finish_reason and the charged credits.
Move the same payload into your backend code.

Endpoint matrix

Mode	Method	Path	Model count	Output
Compare	POST	/api/v1/compare	2 to 9	Parallel responses + summary
Blend	POST	/api/v1/blend	2 to 6 (or 1 for self_moa)	Single synthesized response
Judge	POST	/api/v1/judge	2 to 4 contestants + 1 judge	Ranked verdict + winner
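As a client-side sketch of the matrix, the Python below builds (without sending) a POST request for each mode and rejects model counts outside the documented ranges before they reach the API. The base URL, the Bearer authorization header, and the body keys `strategy` and `judge` are assumptions for illustration; only the paths, the counts, and the `models` field come from this page.

```python
import json
import urllib.request

# (min, max) model counts from the endpoint matrix.
# For judge the range counts contestants only; the judge model is extra.
MODEL_LIMITS = {
    "compare": (2, 9),
    "blend": (2, 6),
    "judge": (2, 4),
}

# Hypothetical host: replace with your real API base URL.
BASE_URL = "https://api.example.com"


def build_request(mode, models, api_key, judge=None, strategy=None, **extra):
    """Build (but do not send) a POST request for compare, blend, or judge.

    The "strategy" and "judge" body keys and the Bearer header are
    assumptions, not the documented schema.
    """
    if mode not in MODEL_LIMITS:
        raise ValueError(f"unknown mode: {mode}")
    low, high = MODEL_LIMITS[mode]
    if mode == "blend" and strategy == "self_moa":
        low, high = 1, 1  # self_moa takes exactly one base model
    if not low <= len(models) <= high:
        raise ValueError(f"{mode} needs {low} to {high} models, got {len(models)}")

    body = {"models": list(models), **extra}
    if strategy is not None:
        body["strategy"] = strategy
    if mode == "judge":
        if judge is None:
            raise ValueError("judge mode needs a judge model")
        if judge in models:
            raise ValueError("the judge model cannot be one of the contestants")
        body["judge"] = judge

    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/{mode}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request is then a matter of `urllib.request.urlopen(req)` against your real host. The local count check only saves a round trip; the server remains the authority on what it accepts.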

Compare behavior

Runs all selected models concurrently.
Emits a completion event for each model as it finishes.
Emits summary metadata (fastest, longest).
Refunds the charged credits when all models fail.
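The concurrency and summary rules above can be illustrated locally with `asyncio`. This is a minimal sketch, not the service's implementation: `calls` holds hypothetical zero-argument coroutine functions standing in for provider calls, the `on_done` callback plays the role of the per-model completion event, and "longest" is assumed to mean the longest response text.

```python
import asyncio
import time


async def run_model(name, call):
    """Await one model call, timing it; an exception becomes a failed result."""
    start = time.perf_counter()
    try:
        text = await call()
    except Exception as exc:
        return {"model": name, "ok": False, "error": str(exc),
                "latency": time.perf_counter() - start}
    return {"model": name, "ok": True, "text": text,
            "latency": time.perf_counter() - start}


async def compare(calls, on_done=print):
    """Start every model at once and report each result as it completes.

    `calls` maps a model name to a zero-argument coroutine function
    (a hypothetical stand-in for a real provider call).
    """
    tasks = [asyncio.create_task(run_model(n, c)) for n, c in calls.items()]
    results = []
    for next_done in asyncio.as_completed(tasks):
        result = await next_done
        on_done(result)  # one completion event per model, in finish order
        results.append(result)

    ok = [r for r in results if r["ok"]]
    summary = {
        "fastest": min(ok, key=lambda r: r["latency"])["model"] if ok else None,
        # Assumption: "longest" means the longest response text.
        "longest": max(ok, key=lambda r: len(r["text"]))["model"] if ok else None,
        "refund": not ok,  # refund only when every model failed
    }
    return results, summary
```

Failed models still produce a completion event, which is why the refund decision looks only at whether any result succeeded.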

Blend behavior

Blend runs in three stages:
Gather: collect source responses from the selected models.
Synthesize: run the synthesizer model with the selected strategy.
Done: return the synthesized answer and the settled usage.
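A minimal Python sketch of the gather, synthesize, and done stages, assuming a hypothetical `call_model(model, prompt)` client that returns `(text, tokens)`. Sources that raise are skipped, and the synthesizer prompt format is invented for illustration; the real service builds its own prompt for each strategy.

```python
from concurrent.futures import ThreadPoolExecutor


def blend(models, prompt, call_model, synthesizer, strategy="consensus"):
    """Illustrative gather -> synthesize -> done pipeline.

    call_model(model, prompt) -> (text, tokens) is a hypothetical client.
    """
    # Gather: query every source model in parallel.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {m: pool.submit(call_model, m, prompt) for m in models}

    sources, usage = {}, {}
    for model, fut in futures.items():
        try:
            text, tokens = fut.result()
        except Exception:
            continue  # skip failed sources
        sources[model] = text
        usage[model] = tokens
    if not sources:
        raise RuntimeError("no source model succeeded")

    # Synthesize: hand all source answers to the synthesizer model.
    refs = "\n\n".join(f"[{m}]\n{t}" for m, t in sources.items())
    synth_prompt = f"Strategy: {strategy}\nQuestion: {prompt}\nSource answers:\n{refs}"
    answer, tokens = call_model(synthesizer, synth_prompt)  # errors propagate here
    usage[synthesizer] = usage.get(synthesizer, 0) + tokens

    # Done: synthesized answer plus the settled per-model usage.
    return {"answer": answer, "sources": sources, "usage": usage}
```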

Blend supports the following strategies:

consensus
council
best_of
chain
moa (Mixture-of-Agents refinement layers)
self_moa (Self-MoA: multiple candidates from one base model)
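The two Mixture-of-Agents variants differ in where the candidates come from. The sketch below assumes a hypothetical `call_model(model, prompt)` that returns a string: `moa_layers` drafts one answer per model and then runs each refinement layer with the previous layer's answers as references, while `self_moa_candidates` draws several samples from a single base model. It also assumes the initial drafts do not count toward `layers`; in both cases a synthesizer would then combine the resulting candidates.

```python
def moa_layers(models, prompt, call_model, layers=1):
    """Mixture-of-Agents sketch: draft answers, then refine them `layers` times.

    call_model(model, prompt) -> str is a hypothetical client function.
    Returns the last layer's answers for a synthesizer to combine.
    """
    if not 1 <= layers <= 3:
        raise ValueError("layers must be between 1 and 3")
    answers = [call_model(m, prompt) for m in models]  # independent first drafts
    for _ in range(layers):
        refs = "\n\n".join(f"Reference {i + 1}:\n{a}" for i, a in enumerate(answers))
        layer_prompt = f"{prompt}\n\nRefine your answer using these references:\n{refs}"
        answers = [call_model(m, layer_prompt) for m in models]
    return answers


def self_moa_candidates(model, prompt, call_model, samples=4):
    """Self-MoA sketch: sample several candidates from one base model."""
    if not 2 <= samples <= 8:
        raise ValueError("samples must be between 2 and 8")
    # Repeated calls only yield diverse candidates if sampling is stochastic
    # (for example, temperature above 0).
    return [call_model(model, prompt) for _ in range(samples)]
```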

Notes:

Every strategy except self_moa requires 2 or more models; passing a single model returns a 400 error.
For self_moa, pass exactly 1 model in models[] and set samples (2–8).
For moa, set layers (1–3). Each layer refines the answers, using the previous layer's outputs as references.
In Judge mode, the judge model cannot be one of the contestants.
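The blend notes translate into a small pre-flight check. This sketch assumes the request is a plain dict whose `models`, `samples`, and `layers` keys match the names used in the notes, plus a hypothetical `strategy` key; it returns a list of problems rather than raising, so all of them can be reported at once.

```python
STRATEGIES = {"consensus", "council", "best_of", "chain", "moa", "self_moa"}


def blend_request_errors(req):
    """Return the problems found in a blend request dict (empty list = looks valid).

    The dict shape and the "strategy" key are assumptions for illustration.
    """
    errors = []
    models = req.get("models", [])
    strategy = req.get("strategy")
    if strategy not in STRATEGIES:
        errors.append(f"unknown strategy: {strategy!r}")
    if strategy == "self_moa":
        if len(models) != 1:
            errors.append("self_moa needs exactly 1 model")
        samples = req.get("samples")
        if not isinstance(samples, int) or not 2 <= samples <= 8:
            errors.append("self_moa needs samples between 2 and 8")
    elif not 2 <= len(models) <= 6:
        # A single model here is what the API rejects with a 400 error.
        errors.append("blend needs 2 to 6 models")
    if strategy == "moa":
        layers = req.get("layers")
        if not isinstance(layers, int) or not 1 <= layers <= 3:
            errors.append("moa needs layers between 1 and 3")
    return errors
```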

Judge behavior

Judge mode collects the contestants' outputs, then prompts the judge model to return its ranking as JSON. The result arrives as a verdict event:

{
  "event": "verdict",
  "winner": "claude-sonnet-4.5",
  "scores": [
    {"model": "claude-sonnet-4.5", "rank": 1, "score": 9.2, "reasoning": "..."},
    {"model": "gpt-5.2", "rank": 2, "score": 8.8, "reasoning": "..."}
  ],
  "overall": "Claude response was more complete and better structured."
}
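A client reading this event will usually want the ranking in order and a sanity check that `winner` matches the rank-1 entry. The minimal sketch below uses only the fields shown in the sample; the consistency check is our own assumption about what counts as a well-formed verdict. Invalid JSON surfaces as `json.JSONDecodeError`, which is a subclass of `ValueError`.

```python
import json


def parse_verdict(raw):
    """Return (winner, [(model, score), ...] in rank order) from a verdict event.

    Raises ValueError if the payload is not a verdict or is inconsistent.
    """
    verdict = json.loads(raw)
    if verdict.get("event") != "verdict":
        raise ValueError("not a verdict event")
    ranked = sorted(verdict["scores"], key=lambda s: s["rank"])
    if not ranked or ranked[0]["model"] != verdict["winner"]:
        raise ValueError("winner does not match the rank-1 entry")
    return verdict["winner"], [(s["model"], s["score"]) for s in ranked]
```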

Failure semantics

Mode	Failure behavior
Compare	Refund if all candidate models fail
Blend	May return a partial result if the source models succeed but the synthesizer fails
Judge	Requires at least two successful contestant outputs
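The table can be read as a small decision function. In this illustrative sketch the return labels are hypothetical, and the table does not cover a blend run in which no source model succeeds, so the `"error"` result for that case is an assumption.

```python
def settle(mode, succeeded, synthesizer_ok=True):
    """Map a run's outcome onto the failure semantics table (illustrative only).

    succeeded: how many candidate, source, or contestant models returned output.
    The labels "ok", "refund", "partial", and "error" are hypothetical.
    """
    if mode == "compare":
        return "refund" if succeeded == 0 else "ok"
    if mode == "blend":
        if succeeded == 0:
            return "error"  # assumption: not covered by the table
        # The table says blend *may* return a partial result in this case.
        return "ok" if synthesizer_ok else "partial"
    if mode == "judge":
        return "ok" if succeeded >= 2 else "error"
    raise ValueError(f"unknown mode: {mode!r}")
```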

