Data Sources
Data sources provide test cases for your evaluations. Load from static arrays, production systems, or custom functions.
Overview
Orchestrated supports multiple ways to provide test data:
- Static arrays - Hardcoded test cases for development
- Built-in data sources - interactions() for production data
- Custom data sources - Your own data loading logic
All data sources return test cases with the shape:
{
  input: any,       // Input to your system
  output?: any,     // Actual output produced by your system
  expected?: any,   // Ground truth for comparison
  ctx?: object,     // Per-test-case context override
  tags?: string[],  // Tags for filtering/grouping
}
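When test cases come from an untyped source (a database row, a JSON file), it can help to check them against this shape at runtime before handing them to an evaluation. The following is a minimal sketch of such a check; `isTestCase` is a hypothetical helper written for illustration and is not part of Orchestrated.

```typescript
// Local copy of the test-case shape described above.
type TestCase = {
  input: unknown
  output?: unknown
  expected?: unknown
  ctx?: object
  tags?: string[]
}

// Hypothetical helper (not an Orchestrated API): runtime check of the shape.
function isTestCase(value: unknown): value is TestCase {
  if (typeof value !== "object" || value === null) return false
  const record = value as Record<string, unknown>

  // `input` is the only required field.
  if (!("input" in record)) return false

  // `ctx`, when present, must be a plain (non-null, non-array) object.
  const ctx = record["ctx"]
  if (ctx !== undefined && (typeof ctx !== "object" || ctx === null || Array.isArray(ctx))) {
    return false
  }

  // `tags`, when present, must be an array of strings.
  const tags = record["tags"]
  if (tags !== undefined) {
    if (!Array.isArray(tags)) return false
    if (!tags.every(tag => typeof tag === "string")) return false
  }

  return true
}
```

A loader could run `rows.filter(isTestCase)` to drop malformed rows, or throw on the first failure if silent drops are undesirable.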
Static Data
The simplest way to provide test data is with a static array:
await Eval("Static Data Eval", {
  data: [
    {
      input: "What is 2+2?",
      output: "4",
      expected: "4",
    },
    {
      input: "What is the capital of France?",
      output: "Paris",
      expected: "Paris",
      tags: ["geography"],
    },
  ],
  scores: [Factuality],
})
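Because a static data source is an ordinary array, it can be narrowed with plain array operations before it is passed as `data`. The sketch below filters cases by tag; `withTag` is a hypothetical helper, and whether Orchestrated offers its own tag filtering is not covered here.

```typescript
type TestCase = {
  input: unknown
  output?: unknown
  expected?: unknown
  ctx?: object
  tags?: string[]
}

const allCases: TestCase[] = [
  { input: "What is 2+2?", output: "4", expected: "4" },
  {
    input: "What is the capital of France?",
    output: "Paris",
    expected: "Paris",
    tags: ["geography"],
  },
]

// Hypothetical helper: keep only the cases that carry the given tag.
// Cases without a `tags` field never match.
function withTag(cases: TestCase[], tag: string): TestCase[] {
  return cases.filter(testCase => (testCase.tags ?? []).some(t => t === tag))
}

// Only the France question is tagged "geography".
const geographyCases = withTag(allCases, "geography")
```

The resulting `geographyCases` array could then be used wherever a static array is accepted.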
With Context Overrides
Each test case can override the base context:
await Eval("Context Override Eval", {
  ctx: { temperature: 0.7 },
  data: [
    { input: "test 1" },
    { input: "test 2", ctx: { temperature: 0.9 } }, // Override
  ],
  task: (input, ctx) => callLLM(input, ctx.temperature),
  scores: [Effectiveness],
})
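One way to picture the override is as a merge of the base context with the per-case context. The sketch below assumes a shallow merge in which per-case keys win and untouched base keys are kept; the text above does not say whether Orchestrated merges this way or replaces the context wholesale, so treat `resolveCtx` as an illustration only.

```typescript
type Ctx = Record<string, unknown>

// Assumption: shallow merge, with per-case keys taking precedence.
// A new object is returned, so the base context is never mutated.
function resolveCtx(base: Ctx, override?: Ctx): Ctx {
  return { ...base, ...(override ?? {}) }
}

// "model" is a made-up second key, added to show which keys survive the merge.
const baseCtx: Ctx = { temperature: 0.7, model: "example-model" }

const firstCtx = resolveCtx(baseCtx)                         // no override
const secondCtx = resolveCtx(baseCtx, { temperature: 0.9 })  // temperature only
```

Under this assumption the second case would see `temperature` 0.9 while still inheriting `model` from the base context.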
Built-in Data Sources
interactions()
Load user interactions from your production system:
import { interactions } from 'orchestrated'

await Eval("Production Eval", {
  data: interactions({
    tenantId: "acme",           // Organization ID
    serviceName: "chatbot",     // Service to evaluate
    environment: "production",  // Environment filter
    limit: 100,                 // Max test cases
    startDate: "2025-01-01",    // Date range (optional)
    endDate: "2025-01-31",
  }),
  scores: [Effectiveness],
})
All parameters are optional and default to your configured state values.
getDataset()
Generic dataset fetcher for any dataset type:
import { getDataset } from 'orchestrated'

await Eval("Generic Dataset Eval", {
  data: getDataset({
    name: "my-custom-dataset",
    version: "v1",
  }),
  scores: [Effectiveness],
})
Custom Data Sources
Define your own data loading logic:
async function myDataSource() {
  // Load from database, API, files, etc.
  const rows = await db.query("SELECT * FROM test_cases")
  return rows.map(row => ({
    input: row.input,
    output: row.output,
    expected: row.expected,
    tags: row.tags?.split(','),
  }))
}

await Eval("Custom Data Eval", {
  data: myDataSource,
  scores: [Effectiveness],
})
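The same pattern works for sources other than a database. As a sketch, the example below parses JSON Lines text (one JSON object per line) into test cases and wraps the parser in an async, zero-argument function. `parseJsonl`, `jsonlDataSource`, and the inline sample text are all hypothetical; in practice the text would come from a file read or an HTTP response.

```typescript
type TestCase = {
  input: unknown
  output?: unknown
  expected?: unknown
  ctx?: object
  tags?: string[]
}

// Parse JSON Lines text: one JSON object per non-empty line.
function parseJsonl(text: string): TestCase[] {
  return text
    .split("\n")
    .map(line => line.trim())
    .filter(line => line.length > 0)
    .map((line, index) => {
      const row = JSON.parse(line)
      // Reject records that lack the one required field.
      if (row === null || typeof row !== "object" || !("input" in row)) {
        throw new Error(`Record ${index + 1} has no "input" field`)
      }
      return row as TestCase
    })
}

// Stand-in for file or network content.
const sampleJsonl = [
  '{"input": "What is 2+2?", "expected": "4"}',
  "",
  '{"input": "Capital of France?", "expected": "Paris", "tags": ["geography"]}',
].join("\n")

// Async, zero-argument wrapper in the same form as myDataSource above.
async function jsonlDataSource(): Promise<TestCase[]> {
  return parseJsonl(sampleJsonl)
}
```

Keeping the parsing logic in a synchronous function makes it easy to unit-test separately from whatever I/O feeds it.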
With Parameters
Data source functions can accept parameters:
function createDataSource(options: { limit: number }) {
  return async () => {
    const rows = await db.query(
      "SELECT * FROM test_cases LIMIT ?",
      [options.limit]
    )
    return rows.map(/* ... */)
  }
}

await Eval("Parameterized Data Eval", {
  data: createDataSource({ limit: 50 }),
  scores: [Effectiveness],
})
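The factory pattern above can be tried without a database. The sketch below uses an in-memory array in place of `db.query` and adds an optional `tag` option alongside `limit`; the names `createInMemorySource`, `selectCases`, and `store` are illustrative assumptions, not Orchestrated APIs.

```typescript
type TestCase = {
  input: unknown
  output?: unknown
  expected?: unknown
  ctx?: object
  tags?: string[]
}

type SourceOptions = { limit: number; tag?: string }

// Stand-in for a database table.
const store: TestCase[] = [
  { input: "What is 2+2?", expected: "4", tags: ["math"] },
  { input: "What is 3*3?", expected: "9", tags: ["math"] },
  { input: "Capital of France?", expected: "Paris", tags: ["geography"] },
]

// Pure selection logic: optional tag filter first, then the limit.
function selectCases(cases: TestCase[], options: SourceOptions): TestCase[] {
  const { limit, tag } = options
  const matching =
    tag === undefined
      ? cases
      : cases.filter(testCase => (testCase.tags ?? []).some(t => t === tag))
  return matching.slice(0, limit)
}

// Factory: validates options once, then returns the zero-argument source.
function createInMemorySource(options: SourceOptions): () => Promise<TestCase[]> {
  if (!Number.isInteger(options.limit) || options.limit < 0) {
    throw new RangeError("limit must be a non-negative integer")
  }
  return async () => selectCases(store, options)
}

const mathSource = createInMemorySource({ limit: 1, tag: "math" })
```

Validating in the factory rather than in the returned function means a bad option fails when the evaluation is configured, not partway through a run.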
Data Source API
createApiGatewayDataSource()
Create a data source function that fetches from API Gateway:
import { createApiGatewayDataSource } from 'orchestrated'

const myDataSource = createApiGatewayDataSource({
  endpoint: "/datasets/my-dataset",
  transform: (response) => response.items,
})

await Eval("API Gateway Eval", {
  data: myDataSource,
  scores: [Effectiveness],
})
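The `transform` option above simply returns `response.items`, which works when the API already returns items in test-case shape. When it does not, the transform is a natural place to map fields. The sketch below is a standalone function that could serve as such a transform; the response shape (`items`, `prompt`, `answer`, `labels`) is entirely hypothetical and chosen only to show the mapping.

```typescript
type TestCase = {
  input: unknown
  output?: unknown
  expected?: unknown
  ctx?: object
  tags?: string[]
}

// Hypothetical response shape; a real endpoint will differ.
type ApiItem = { prompt: string; answer?: string; labels?: string }
type ApiResponse = { items?: ApiItem[] }

// Map each API item onto the test-case shape, tolerating missing fields.
function toTestCases(response: ApiResponse): TestCase[] {
  return (response.items ?? []).map(item => {
    const testCase: TestCase = { input: item.prompt }
    if (item.answer !== undefined) {
      testCase.expected = item.answer
    }
    if (item.labels) {
      // Comma-separated labels become a trimmed, non-empty tag list.
      testCase.tags = item.labels
        .split(",")
        .map(label => label.trim())
        .filter(label => label.length > 0)
    }
    return testCase
  })
}
```

Under these assumptions, the function would be passed as `transform: toTestCases` in place of the inline arrow function.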
Type Definitions
type TestCase = {
  input: any
  output?: any
  expected?: any
  ctx?: object
  tags?: string[]
}

type DataSource = TestCase[] | (() => Promise<TestCase[]>)
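Since `DataSource` is a union of an array and an async function, code that consumes one has to handle both forms. The sketch below shows one way to normalize either form into a promise of test cases; `resolveDataSource` is a hypothetical helper and says nothing about how Orchestrated resolves data sources internally.

```typescript
type TestCase = {
  input: unknown
  output?: unknown
  expected?: unknown
  ctx?: object
  tags?: string[]
}

type DataSource = TestCase[] | (() => Promise<TestCase[]>)

// Hypothetical helper: accept either form and always return a promise.
async function resolveDataSource(source: DataSource): Promise<TestCase[]> {
  // Array.isArray narrows the union: arrays are returned as-is,
  // anything else must be the function form and is invoked.
  return Array.isArray(source) ? source : await source()
}

// Both forms resolve to the same kind of value.
const fromArray = resolveDataSource([{ input: "static case" }])
const fromFunction = resolveDataSource(async () => [
  { input: "loaded case 1" },
  { input: "loaded case 2" },
])
```

A helper like this is mainly useful in tests or tooling that needs to inspect the cases a data source would produce.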