API Core
Compare / Blend / Judge API Reference
Deep reference for parallel model evaluation, synthesis, and ranking workflows.
12 min read · Updated 2026-02-15
Quick Start
- Copy the request sample from this page.
- Run it in API Explorer with your key.
- Confirm the stream's final done payload (`finish_reason` plus charged credits).
- Move the same payload into your backend code.
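The quick-start flow above can be sketched in stdlib Python. The endpoint path comes from the matrix below; the base URL, the `Authorization: Bearer` header, and the `models`/`prompt` field names are assumptions — check your dashboard and the API Explorer for the real values.

```python
import json
import urllib.request


def build_compare_request(api_key: str, models: list[str], prompt: str,
                          base_url: str = "https://api.llmwise.example"):
    """Build (but do not send) a POST request for /api/v1/compare.

    base_url and the payload field names are assumptions for illustration.
    """
    body = json.dumps({"models": models, "prompt": prompt}).encode()
    return urllib.request.Request(
        base_url + "/api/v1/compare",
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


# Send with urllib.request.urlopen(req) once the payload works in API Explorer.
```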
Endpoint matrix
| Mode | Method | Path | Model count | Output |
|---|---|---|---|---|
| Compare | POST | /api/v1/compare | 2 to 9 | Parallel responses + summary |
| Blend | POST | /api/v1/blend | 2 to 6 (or 1 for self_moa) | Single synthesized response |
| Judge | POST | /api/v1/judge | 2 to 4 + judge | Ranked verdict + winner |
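The model-count limits in the matrix can be enforced client-side before spending credits. This is a minimal sketch; the `models`/`prompt` payload shape is an assumption, and the `self_moa` single-model exception for Blend is handled separately in the Blend notes below.

```python
# Model-count bounds per mode, taken from the endpoint matrix.
MODE_LIMITS = {
    "compare": (2, 9),
    "blend": (2, 6),   # self_moa allows 1 model; see Blend notes
    "judge": (2, 4),   # contestants only; the judge model is extra
}


def build_payload(mode: str, models: list[str], prompt: str) -> dict:
    """Validate the candidate count for a mode and build a request body."""
    lo, hi = MODE_LIMITS[mode]
    if not lo <= len(models) <= hi:
        raise ValueError(f"{mode} requires {lo} to {hi} models, got {len(models)}")
    return {"models": models, "prompt": prompt}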
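The model-count limits in the matrix can be enforced client-side before spending credits. This is a minimal sketch; the `models`/`prompt` payload shape is an assumption, and the `self_moa` single-model exception for Blend is handled separately in the Blend notes below.

```python
# Model-count bounds per mode, taken from the endpoint matrix.
MODE_LIMITS = {
    "compare": (2, 9),
    "blend": (2, 6),   # self_moa allows 1 model; see Blend notes
    "judge": (2, 4),   # contestants only; the judge model is extra
}


def build_payload(mode: str, models: list[str], prompt: str) -> dict:
    """Validate the candidate count for a mode and build a request body."""
    lo, hi = MODE_LIMITS[mode]
    if not lo <= len(models) <= hi:
        raise ValueError(f"{mode} requires {lo} to {hi} models, got {len(models)}")
    return {"models": models, "prompt": prompt}
```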
Compare behavior
- Runs all selected models concurrently.
- Emits per-model completion events.
- Emits summary metadata (`fastest`, `longest`).
- Refunds credits when all models fail.
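A sketch of consuming the Compare event stream described above, assuming each event arrives as a decoded JSON dict with an `event` discriminator. The field names (`completion`, `summary`, `model`, `text`, `fastest`, `longest`) are assumptions modeled on the behavior list, not a confirmed wire format.

```python
def consume_compare_events(events):
    """Collect per-model completions and the closing summary metadata.

    `events` is any iterable of dicts; field names are illustrative.
    """
    responses, summary = {}, None
    for ev in events:
        if ev["event"] == "completion":
            # One completion event per candidate model.
            responses[ev["model"]] = ev["text"]
        elif ev["event"] == "summary":
            # Summary metadata identifies the fastest and longest responses.
            summary = {"fastest": ev["fastest"], "longest": ev["longest"]}
    return responses, summary
```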
Blend behavior
Blend execution:
1. **Gather**: collect source responses from the selected models.
2. **Synthesize**: run the synthesizer model with the selected strategy.
3. **Done**: return the synthesized answer plus settled usage.
Blend supports the following strategies:
- `consensus`
- `council`
- `best_of`
- `chain`
- `moa` (Mixture-of-Agents refinement layers)
- `self_moa` (Self-MoA: multiple candidates from one base model)
Notes:
- Most strategies require 2+ models. Passing 1 model returns a 400 error.
- For `self_moa`, pass exactly 1 model in `models[]` and set `samples` (2–8).
- For `moa`, set `layers` (1–3). Each layer refines answers using the previous layer as references.
- The judge model cannot be one of the contestants.
Judge behavior
Judge mode collects contestant outputs, then prompts the judge model to return ranked JSON.
```json
{
  "event": "verdict",
  "winner": "claude-sonnet-4.5",
  "scores": [
    {"model": "claude-sonnet-4.5", "rank": 1, "score": 9.2, "reasoning": "..."},
    {"model": "gpt-5.2", "rank": 2, "score": 8.8, "reasoning": "..."}
  ],
  "overall": "Claude response was more complete and better structured."
}
```
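The verdict payload above can be parsed with the standard library. A sketch that extracts the winner and the score list ordered by rank, relying only on the fields shown in the sample:

```python
import json


def parse_verdict(raw: str):
    """Return (winner, scores sorted by rank) from a verdict event payload."""
    verdict = json.loads(raw)
    ranked = sorted(verdict["scores"], key=lambda s: s["rank"])
    return verdict["winner"], ranked
```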
Failure semantics
| Mode | Failure behavior |
|---|---|
| Compare | Refund if all candidate models fail |
| Blend | May return a partial result if source models succeed but the synthesizer fails |
| Judge | Requires at least two successful contestant outputs |
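Client code can map the failure semantics table to a handling decision. This is only a sketch: the flag names (`all_failed`, `synthesizer_failed`, `successful_contestants`) are hypothetical stand-ins for whatever the real response surfaces, not documented fields.

```python
def handle_result(mode: str, result: dict) -> str:
    """Classify an outcome per the failure-semantics table; flag names are assumed."""
    if mode == "compare" and result.get("all_failed"):
        return "refunded"   # Compare: refund when every candidate model fails
    if mode == "blend" and result.get("synthesizer_failed"):
        return "partial"    # Blend: sources may still be usable
    if mode == "judge" and result.get("successful_contestants", 0) < 2:
        return "error"      # Judge: needs at least two successful contestants
    return "ok"
```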