Compare / Blend / Judge API Reference

Deep reference for parallel model evaluation, synthesis, and ranking workflows.

12 min · Updated 2026-02-15

Quick Start
  1. Copy the request sample from this page.
  2. Run it in API Explorer with your key.
  3. Confirm that the stream's done payload reports finish_reason and the charged credits.
  4. Move the same payload into your backend code.

Endpoint matrix

Mode     Method  Path             Model count                 Output
Compare  POST    /api/v1/compare  2 to 9                      Parallel responses + summary
Blend    POST    /api/v1/blend    2 to 6 (or 1 for self_moa)  Single synthesized response
Judge    POST    /api/v1/judge    2 to 4 + judge              Ranked verdict + winner
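
The model-count limits in the matrix can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `check_model_count`, the `MODEL_LIMITS` table, and the `strategy` argument are hypothetical helpers, and only the numeric ranges come from this page.

```python
from typing import Optional, Sequence

# (min, max) model counts copied from the endpoint matrix.
# For judge, the range covers contestants only; the judge model is extra.
MODEL_LIMITS = {
    "compare": (2, 9),
    "blend": (2, 6),
    "judge": (2, 4),
}


def check_model_count(mode: str, models: Sequence[str],
                      strategy: Optional[str] = None) -> None:
    """Hypothetical client-side check; raises ValueError when out of range."""
    if mode not in MODEL_LIMITS:
        raise ValueError(f"unknown mode: {mode!r}")
    low, high = MODEL_LIMITS[mode]
    # self_moa is the documented exception: exactly 1 base model.
    if mode == "blend" and strategy == "self_moa":
        low, high = 1, 1
    if not low <= len(models) <= high:
        raise ValueError(
            f"{mode} expects {low} to {high} models, got {len(models)}"
        )
```

Calling `check_model_count("compare", ["model-a"])` raises `ValueError`, which lets a client fail fast instead of spending a round trip on a request the limits above would reject.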

Compare behavior

  • Runs all selected models concurrently.
  • Emits per-model completion events.
  • Emits summary metadata (fastest, longest).
  • Refunds when all models fail.
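
To make the summary metadata concrete, here is a rough sketch of how "fastest" and "longest" could be derived from per-model results. The result shape (`model`, `latency_ms`, `text`, `error`) is an assumption made for illustration, as is reading "fastest" as lowest latency and "longest" as the most characters; the real event fields may differ.

```python
def summarize_compare(results):
    """Derive summary metadata from per-model results.

    Each result is assumed (hypothetically) to look like:
    {"model": str, "latency_ms": int, "text": str, "error": str or None}
    """
    succeeded = [r for r in results if r.get("error") is None]
    if not succeeded:
        # Every model failed: this is the case where a refund applies.
        return {"refund": True, "fastest": None, "longest": None}
    fastest = min(succeeded, key=lambda r: r["latency_ms"])
    longest = max(succeeded, key=lambda r: len(r["text"]))
    return {
        "refund": False,
        "fastest": fastest["model"],
        "longest": longest["model"],
    }
```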

Blend behavior

Blend execution:

  1. Gather: collect source responses from the selected models.
  2. Synthesize: use the synthesizer model with the selected strategy.
  3. Done: return the synthesized answer and settled usage.
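
The three stages can be pictured with a small local simulation. This is only a sketch under stated assumptions: `call_model` is a stand-in that returns canned text rather than calling any real model, gathering the sources concurrently is assumed rather than documented for Blend, and usage settlement is left out.

```python
import asyncio


async def call_model(model, prompt):
    """Stand-in for a real model call; returns a canned string."""
    await asyncio.sleep(0)
    return f"[{model}] answer to: {prompt}"


async def blend(prompt, models, synthesizer):
    # Stage 1, Gather: query every source model (concurrently, by assumption).
    gathered = await asyncio.gather(
        *(call_model(m, prompt) for m in models), return_exceptions=True
    )
    sources = [g for g in gathered if not isinstance(g, Exception)]
    if not sources:
        raise RuntimeError("no source responses to synthesize")
    # Stage 2, Synthesize: hand the source responses to the synthesizer model.
    synth_prompt = prompt + "\n\nSource responses:\n" + "\n".join(sources)
    answer = await call_model(synthesizer, synth_prompt)
    # Stage 3, Done: return the synthesized answer (usage settlement omitted).
    return {"answer": answer, "sources_used": len(sources)}


if __name__ == "__main__":
    print(asyncio.run(blend("Explain CRDTs", ["model-a", "model-b"], "synth")))
```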

Blend supports strategies:

  • consensus
  • council
  • best_of
  • chain
  • moa (Mixture-of-Agents refinement layers)
  • self_moa (Self-MoA: multiple candidates from one base model)

Notes:

  • Most strategies require 2+ models. Passing 1 model returns a 400 error.
  • For self_moa, pass exactly 1 model in models[] and set samples (2–8).
  • For moa, set layers (1–3). Each layer refines answers using the previous layer as references.
  • The judge model cannot be one of the contestants.
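
The strategy-specific parameters can be validated before building a request. The sketch below assumes a request field named `strategy` (this page names `models[]`, `samples`, and `layers`, but not the strategy field) and treats `samples` and `layers` as required for their strategies; whether the server applies defaults is not stated here. The helper name `strategy_params` is hypothetical.

```python
STRATEGIES = {"consensus", "council", "best_of", "chain", "moa", "self_moa"}


def strategy_params(strategy, samples=None, layers=None):
    """Return the strategy-related request fields after range checks.

    The "strategy" key is an assumed field name, not confirmed by this page.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    params = {"strategy": strategy}
    if strategy == "self_moa":
        # Self-MoA: number of candidates drawn from the single base model.
        if samples is None or not 2 <= samples <= 8:
            raise ValueError("self_moa needs samples between 2 and 8")
        params["samples"] = samples
    elif strategy == "moa":
        # MoA: number of refinement layers.
        if layers is None or not 1 <= layers <= 3:
            raise ValueError("moa needs layers between 1 and 3")
        params["layers"] = layers
    return params
```

For example, `strategy_params("moa", layers=2)` yields a dict that can be merged into the request body alongside `models`.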

Judge behavior

Judge mode collects contestant outputs, then prompts the judge model to return ranked JSON.

{
  "event": "verdict",
  "winner": "claude-sonnet-4.5",
  "scores": [
    {"model": "claude-sonnet-4.5", "rank": 1, "score": 9.2, "reasoning": "..."},
    {"model": "gpt-5.2", "rank": 2, "score": 8.8, "reasoning": "..."}
  ],
  "overall": "Claude response was more complete and better structured."
}
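
A client might read the verdict event along these lines. This is a sketch that relies only on the fields shown in the sample; the consistency check (the winner is the rank 1 model) matches the sample but is an assumption, not a documented guarantee, and `parse_verdict` is a hypothetical helper name.

```python
import json


def parse_verdict(raw):
    """Return (winner, [(model, score), ...]) ordered by rank."""
    verdict = json.loads(raw)
    if verdict.get("event") != "verdict":
        raise ValueError("not a verdict event")
    scores = sorted(verdict["scores"], key=lambda s: s["rank"])
    # Assumed invariant: the declared winner is the rank 1 entry.
    if not scores or scores[0]["model"] != verdict["winner"]:
        raise ValueError("winner does not match the rank 1 entry")
    return verdict["winner"], [(s["model"], s["score"]) for s in scores]
```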

Failure semantics

Mode     Failure behavior
Compare  Refund if all candidate models fail
Blend    May return partial if sources succeed but synthesizer fails
Judge    Requires at least two successful contestant outputs
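
The Judge precondition can be illustrated with a short guard. This is a sketch, not the service's implementation: the input shape (a dict mapping model name to response text, with `None` for a failed contestant) and the helper name `successful_contestants` are assumptions.

```python
def successful_contestants(outputs):
    """Keep contestants that produced text; require at least two of them.

    `outputs` maps model name -> response text, or None for a failed call
    (a hypothetical shape chosen for this example).
    """
    succeeded = {model: text for model, text in outputs.items() if text}
    if len(succeeded) < 2:
        raise ValueError(
            f"judging needs at least 2 successful contestants, "
            f"got {len(succeeded)}"
        )
    return succeeded
```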