Prompt Regression Testing Tutorial

Create suites, run prompt regressions, schedule recurring checks, and export CSV results.

13 min · Updated 2026-02-15

Quick Start
  1. Start from your current production prompt/request.
  2. Run the tutorial flow below once, step by step, without changing it.
  3. Measure impact in Usage before rollout.
  4. Promote only when quality/cost/reliability metrics match target.
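
Step 4 can be expressed as a simple gate. The following is a minimal sketch, not part of the product: the `Targets` fields, the metric names, and the `should_promote` helper are all hypothetical, and the numbers you compare against come from your own rollout criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Targets:
    """Hypothetical rollout thresholds; choose values that fit your workload."""
    min_quality: float     # lowest acceptable average score
    max_cost_usd: float    # highest acceptable cost per run
    max_error_rate: float  # highest acceptable fraction of failed cases


def should_promote(quality: float, cost_usd: float, error_rate: float,
                   targets: Targets) -> bool:
    """Return True only when every metric meets its target."""
    return (
        quality >= targets.min_quality
        and cost_usd <= targets.max_cost_usd
        and error_rate <= targets.max_error_rate
    )
```

A single failing metric blocks promotion, which matches the "promote only when all metrics match target" rule in the list above.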

What this feature covers

  • Prebuilt prompt templates
  • Custom suite creation
  • Manual and scheduled test runs
  • CSV export for historical tracking

Workflow

Templates -> suite -> run -> schedule (paths below are relative to /api/v1)
Define
  • GET /optimization/test-templates
  • POST /optimization/test-suites
  • Set models and cases
Execute
  • POST /optimization/test-suites/{suite_id}/run
  • Collect scores and latency
  • Store run artifacts
Automate
  • POST /optimization/regression-schedules
  • POST /optimization/regression-schedules/{id}/run
  • GET /optimization/test-runs/{id}/csv
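
For historical tracking, the CSV returned by the last endpoint can be parsed and tagged with its run ID before it is appended to your own history store. This sketch makes no assumption about the column names in the export (they are not documented here); `parse_run_csv` is a hypothetical helper built on Python's standard `csv` module.

```python
import csv
import io


def parse_run_csv(csv_text: str, run_id: str) -> list[dict]:
    """Parse an exported run CSV into dict rows tagged with the run ID.

    Column names come from the CSV header row. If the export already has a
    "run_id" column, its value takes precedence over the argument.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{"run_id": run_id, **row} for row in reader]
```

All values are returned as strings; convert scores and latencies to numbers once you know which columns your export contains.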

Core endpoints

Method  Path                                                     Purpose
GET     /api/v1/optimization/test-templates                      List prebuilt templates
POST    /api/v1/optimization/test-suites                         Create suite
PUT     /api/v1/optimization/test-suites/{suite_id}              Update suite
POST    /api/v1/optimization/test-suites/{suite_id}/run          Run suite now
GET     /api/v1/optimization/test-runs                           List run history
GET     /api/v1/optimization/test-runs/{run_id}/csv              Download run CSV
POST    /api/v1/optimization/regression-schedules                Create schedule
PUT     /api/v1/optimization/regression-schedules/{schedule_id}  Update schedule
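
The endpoints above can be wrapped in a small request builder. This is a sketch under stated assumptions: the host `https://api.example.com` is a placeholder, the bearer-token `Authorization` header is an assumed auth scheme, and the helper names are hypothetical. Only the methods and paths come from the table. The requests are built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host (assumption)
API_PREFIX = "/api/v1/optimization"   # prefix taken from the endpoint table


def build_request(method, path, api_key, body=None):
    """Build (but do not send) a request for an optimization endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}  # assumed auth scheme
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + API_PREFIX + path, data=data, headers=headers, method=method
    )


def run_suite_request(suite_id, api_key):
    """POST /test-suites/{suite_id}/run: run a suite now."""
    return build_request("POST", f"/test-suites/{suite_id}/run", api_key)


def run_csv_request(run_id, api_key):
    """GET /test-runs/{run_id}/csv: download the CSV for one run."""
    return build_request("GET", f"/test-runs/{run_id}/csv", api_key)
```

To send a request, pass the returned object to `urllib.request.urlopen`. Keeping construction separate from sending makes the URLs easy to check in tests.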

Example suite payload

{
  "name": "Code quality regression",
  "description": "Weekly check for code prompts",
  "models": ["gpt-5.2", "claude-sonnet-4.5", "deepseek-v3"],
  "template_ids": ["code-review", "summarization"],
  "temperature": 0.2,
  "max_tokens": 800,
  "is_active": true
}
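
If you build this payload in code, a small constructor keeps the field names in one place. The sketch below reuses the field names and default values from the example above. The client-side checks (non-empty lists, positive `max_tokens`) are assumptions for illustration, not documented server rules, and `build_suite_payload` is a hypothetical helper.

```python
def build_suite_payload(name, description, models, template_ids,
                        temperature=0.2, max_tokens=800, is_active=True):
    """Return a dict shaped like the example suite payload."""
    # Assumed sanity checks; the API's actual validation may differ.
    if not models:
        raise ValueError("models must contain at least one model")
    if not template_ids:
        raise ValueError("template_ids must contain at least one template")
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return {
        "name": name,
        "description": description,
        "models": list(models),
        "template_ids": list(template_ids),
        "temperature": temperature,
        "max_tokens": max_tokens,
        "is_active": is_active,
    }
```

Serialize the result with `json.dumps` and send it as the body of `POST /api/v1/optimization/test-suites`.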

Example schedule payload

{
  "name": "Weekly code regression",
  "suite_id": "SUITE_UUID",
  "cadence_minutes": 10080,
  "enabled": true
}
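
The `cadence_minutes` value of 10080 is one week expressed in minutes (7 × 24 × 60). A minimal sketch for deriving it from a `timedelta` follows; the helper names are hypothetical, and rejecting intervals that are not whole minutes is an assumption made here rather than a documented API rule.

```python
from datetime import timedelta


def cadence_minutes(interval: timedelta) -> int:
    """Convert an interval to whole minutes for the cadence_minutes field."""
    minutes, remainder = divmod(interval, timedelta(minutes=1))
    if minutes <= 0 or remainder:
        raise ValueError("interval must be a positive whole number of minutes")
    return minutes


def build_schedule_payload(name, suite_id, interval, enabled=True):
    """Return a dict shaped like the example schedule payload."""
    return {
        "name": name,
        "suite_id": suite_id,
        "cadence_minutes": cadence_minutes(interval),
        "enabled": enabled,
    }
```

For example, `cadence_minutes(timedelta(weeks=1))` gives 10080 and `cadence_minutes(timedelta(days=1))` gives 1440.
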

Practical baseline

Start with one suite covering your top 10 production prompts and run it weekly (cadence_minutes: 10080). Expand only after your scoring rubric is stable.
