Evaluations
Configure evaluators, batch evaluations, and continuous tests in the Visual Builder
The Visual Builder lets you define, manage, and run evaluations. You define evaluators (how to score agents), then run them in two ways: batch evaluations (one-time jobs over selected conversations) and continuous tests (automatic evaluation on a sample of live conversations).
Where to find evaluations
- Open your project in the Visual Builder.
- In the project sidebar, go to Evaluations for evaluators, batch jobs, and continuous tests.
You need Edit permission on the project to create or change evaluators, batch evaluations, and continuous tests. See Access control for roles and permissions.
Evaluators
Evaluators define how agent responses are scored. Each evaluator has a prompt, a schema, a model, and optional pass criteria, and produces a score or other structured output.
Creating an evaluator
- Go to Evaluations and open the Evaluators tab.
- Click New evaluator.
- Fill in:
  - Name and an optional Description
  - Prompt: instructions for the model (e.g. what to score and how)
  - Schema: JSON schema for the structured output (e.g. a numeric score or categories)
  - Model: the model used to run the evaluator
  - Pass criteria (optional): conditions on numeric schema fields that define pass/fail (e.g. score >= 0.8)
- Save. The evaluator is then available for batch evaluations and continuous tests.
Example evaluator
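To make this concrete, here is one way an evaluator along these lines could be described, written as a TypeScript object. The field names (prompt, schema, model, passCriteria) mirror the form fields above but are illustrative assumptions, not the product's actual configuration format.

```typescript
// Illustrative only: the shape below is an assumption based on the fields
// described above, not the Visual Builder's or SDK's actual API surface.
interface EvaluatorDefinition {
  name: string;
  description?: string;
  prompt: string;        // instructions for the evaluation model
  schema: object;        // JSON schema for the structured output
  model: string;         // model used to run the evaluator
  passCriteria?: string; // condition on numeric schema fields, e.g. "score >= 0.8"
}

const helpfulnessEvaluator: EvaluatorDefinition = {
  name: "Helpfulness",
  description: "Scores how well the agent resolved the user's request",
  prompt:
    "Read the conversation and rate how helpful the agent's responses were. " +
    "Return a score between 0 and 1 and a short justification.",
  schema: {
    type: "object",
    properties: {
      score: { type: "number", minimum: 0, maximum: 1 },
      justification: { type: "string" },
    },
    required: ["score", "justification"],
  },
  model: "example-evaluation-model", // placeholder: use a model available in your project
  passCriteria: "score >= 0.8",      // pass/fail condition on the numeric score field
};
```

The pass criteria references the numeric score field defined in the schema, matching the score >= 0.8 example above.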
Editing or deleting
From the Evaluators list, open an evaluator to view or edit it, or use the delete action.
Batch evaluations
Batch evaluations run selected evaluators over a set of conversations once. You choose which evaluators to run and over what date range.
Creating a batch evaluation
- Go to Evaluations and open the Batch Evaluations tab.
- Click New batch evaluation.
- Select one or more Evaluators.
- Narrow the scope with a Date range; only conversations within that range are evaluated.
- Start the job. A new batch evaluation job is created and runs asynchronously (see the sketch below).
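For orientation, the steps above roughly correspond to the following sketch of a programmatic call. The client shape and the createBatchEvaluation method name are assumptions made for illustration; see the TypeScript SDK and API references linked at the end of this page for the actual calls.

```typescript
// Illustrative sketch only: the client type and method name are assumptions,
// not the SDK's real surface. Consult the TypeScript SDK reference for real calls.
async function runBatchEvaluation(client: {
  createBatchEvaluation: (params: {
    evaluatorIds: string[];
    dateRange: { from: string; to: string };
  }) => Promise<{ jobId: string; status: string }>;
}) {
  // Run two evaluators over all conversations from a one-week window.
  const job = await client.createBatchEvaluation({
    evaluatorIds: ["helpfulness", "safety"],             // evaluators selected above
    dateRange: { from: "2024-06-01", to: "2024-06-07" }, // date-range scope
  });

  // The job runs asynchronously; check the Batch Evaluations tab for results.
  console.log(`Started batch evaluation ${job.jobId} (status: ${job.status})`);
}
```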
Viewing results
From the Batch Evaluations list, open a job to see its results: per-conversation evaluation outputs, pass/fail if pass criteria are set, and status. You can filter and inspect individual results.
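As a mental model, each per-conversation result can be thought of as a record like the one sketched below; the field names are assumptions, not the exact response shape.

```typescript
// Illustrative assumption of what a single result record contains;
// field names do not reflect the exact API response shape.
interface BatchEvaluationResult {
  conversationId: string;
  evaluatorId: string;
  status: "pending" | "completed" | "failed";
  output: Record<string, unknown>; // structured output matching the evaluator's schema
  passed?: boolean;                // present only if the evaluator defines pass criteria
}

// Example: filter a job's results down to failures for closer inspection.
function failedResults(results: BatchEvaluationResult[]): BatchEvaluationResult[] {
  return results.filter((r) => r.status === "completed" && r.passed === false);
}
```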
Continuous tests
Continuous tests evaluate a sample of live conversations automatically. You specify which evaluators to run, optionally which agents to include, and a sample rate (e.g. 0.1 to evaluate 10% of conversations).
Creating a continuous test
- Go to Evaluations and open the Continuous Tests tab.
- Click New continuous test.
- Set Name and optional Description.
- Make sure Active is enabled so the test runs on new conversations.
- Choose Evaluators to run.
- Optionally restrict by Agents (only evaluate conversations for those agents).
- Set Sample rate (0–1) to evaluate a fraction of matching conversations.
- Save. Once active, matching conversations are evaluated according to the sample rate (see the sketch below).
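Putting the fields together, a continuous test configuration can be pictured as the sketch below. The field names are assumptions for illustration; a sample rate of 0.1 means roughly 10% of matching conversations are evaluated.

```typescript
// Illustrative sketch: field names are assumptions, not the actual config format.
interface ContinuousTestConfig {
  name: string;
  description?: string;
  active: boolean;        // only active configs run on new conversations
  evaluatorIds: string[]; // evaluators to run
  agentIds?: string[];    // optional: restrict to conversations with these agents
  sampleRate: number;     // 0–1; fraction of matching conversations to evaluate
}

const supportQualityTest: ContinuousTestConfig = {
  name: "Support quality",
  description: "Continuously score a sample of support conversations",
  active: true,
  evaluatorIds: ["helpfulness"],
  agentIds: ["support-agent"],
  sampleRate: 0.1, // roughly 10% of matching conversations are evaluated
};
```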
Viewing results
From the Continuous Tests list, open a configuration to see its Results: all evaluation results triggered by that continuous test, with filters for narrowing them down.
Summary
| Area | Purpose |
|---|---|
| Evaluators | Define how to score agent outputs (prompt, model, schema, pass criteria). |
| Batch evaluations | Run evaluators once over a scoped set of conversations (date range). |
| Continuous tests | Automatically run evaluators on a sample of live conversations. |
For programmatic access to the same concepts, see TypeScript SDK: Evaluations and the Evaluations API reference.