Evaluations
Manage evaluators programmatically with the TypeScript SDK
The TypeScript SDK provides an EvaluationClient that talks to the Evaluations API, so you can manage evaluators and evaluation suite configs, trigger batch evaluations, and read results, all from code.
For full endpoint details and request/response shapes, see the Evaluations API reference.
Setup: create a client
Create an evaluation client with your tenant ID, project ID, API base URL, and optional API key.
| Parameter | Type | Required | Description |
|---|---|---|---|
| tenantId | string | Yes | Your tenant (organization) ID |
| projectId | string | Yes | Your project ID |
| apiUrl | string | Yes | API base URL (e.g. https://api.inkeep.com or your self-hosted URL) |
| apiKey | string | No | Bearer token for authenticated requests. Omit for unauthenticated or custom auth. |
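Here is a minimal setup sketch. The constructor options follow the table above; the import path is a placeholder, so substitute the package name from your installation.

```typescript
// Placeholder import path: use the actual package name from your installation.
import { EvaluationClient } from "@inkeep/sdk";

const client = new EvaluationClient({
  tenantId: "your-tenant-id",
  projectId: "your-project-id",
  apiUrl: "https://api.inkeep.com", // or your self-hosted URL
  apiKey: process.env.INKEEP_API_KEY, // optional: omit for unauthenticated or custom auth
});
```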
Use `client` in the examples below (e.g. `client.createEvaluator(...)`).
Evaluators
Evaluators define how to score agent outputs, typically with a prompt and model plus optional pass criteria.
Creating an evaluator
Pass an object with name, description, prompt, schema (JSON schema for the evaluator output), and model (model identifier and optional provider options). Optionally include passCriteria to define pass/fail conditions on the schema fields.
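For example, a sketch of a createEvaluator call. The exact shapes of schema, model, and passCriteria shown here are assumptions based on the description above; check the Evaluations API reference for the precise fields.

```typescript
// Field shapes for schema, model, and passCriteria are illustrative assumptions.
const evaluator = await client.createEvaluator({
  name: "helpfulness",
  description: "Scores how helpful the agent's final answer was",
  prompt: "Rate the assistant's answer for helpfulness on a 1-5 scale.",
  schema: {
    // JSON schema describing the evaluator's output
    type: "object",
    properties: {
      score: { type: "number" },
      reasoning: { type: "string" },
    },
    required: ["score"],
  },
  model: {
    model: "gpt-4o", // model identifier
    providerOptions: {}, // optional provider-specific options
  },
  // Optional pass/fail condition over the schema fields (shape assumed)
  passCriteria: { field: "score", operator: ">=", value: 4 },
});
```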
Evaluation suite configs
Suite configs group evaluators together with optional agent filters and a sample rate. Continuous tests (evaluation run configs) use them to decide which conversations to evaluate automatically.
Creating an evaluation suite config
Pass evaluatorIds (required, at least one) and optionally sampleRate (0–1) and filters (e.g. agentIds to restrict which agents’ conversations are evaluated). The suite can then be attached to a continuous test (evaluation run config).
| Option | Type | Required | Description |
|---|---|---|---|
| evaluatorIds | string[] | Yes | At least one evaluator ID to run in this suite |
| sampleRate | number | No | Fraction of matching conversations to evaluate (0–1). Omit to evaluate all. |
| filters | object | No | Restrict which conversations are in scope, e.g. { agentIds: ["agent-id"] } |
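A sketch of creating a suite config with these options. The method name createEvaluationSuiteConfig is an assumption; confirm it against the SDK or the Evaluations API reference.

```typescript
// Method name is an assumption; option names follow the table above.
const suiteConfig = await client.createEvaluationSuiteConfig({
  evaluatorIds: [evaluator.id], // at least one evaluator
  sampleRate: 0.25, // evaluate 25% of matching conversations; omit to evaluate all
  filters: { agentIds: ["agent-id"] }, // only conversations handled by this agent
});
```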
Batch evaluation
Trigger a one-off batch evaluation over conversations, optionally filtered by conversation IDs or date range:
| Option | Type | Required | Description |
|---|---|---|---|
| evaluatorIds | string[] | Yes | IDs of evaluators to run |
| name | string | No | Name for the job (defaults to a timestamped name) |
| conversationIds | string[] | No | Limit to these conversations |
| dateRange | object | No | Limit to conversations between startDate and endDate (YYYY-MM-DD) |
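A sketch of triggering a batch evaluation over a date range. The method name triggerBatchEvaluation and the returned job shape are assumptions; check the SDK for the exact call.

```typescript
// Method name and return value are assumptions; options follow the table above.
const job = await client.triggerBatchEvaluation({
  evaluatorIds: [evaluator.id],
  name: "march-regression-check", // optional: defaults to a timestamped name
  // Scope the job with conversationIds or a dateRange (both optional):
  dateRange: { startDate: "2025-03-01", endDate: "2025-03-31" },
});
```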
To list results by job or run config, use the Evaluations API (e.g. get evaluation results by job config ID or by run config ID).
Related
- Evaluations API reference — Full list of evaluation endpoints and schemas
- Visual Builder: Evaluations — Configure evaluators, batch evaluations, and continuous tests in the UI