Data Sources

Data sources supply the test cases for your evaluations. You can load test cases from static arrays, from your production system, or from custom functions.

Overview

Orchestrated supports multiple ways to provide test data:

Static arrays - Hardcoded test cases for development
Built-in data sources - interactions() for production data, getDataset() for named datasets
Custom data sources - Your own data loading logic

Every data source produces test cases with this shape:

{
  input: any,          // Input to your system
  output?: any,        // Actual output of your system (if precomputed)
  expected?: any,      // Ground truth for comparison
  ctx?: object,        // Per-test-case context override
  tags?: string[],     // Tags for filtering/grouping
}
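When test cases come from an untyped source (JSON files, database rows, HTTP responses), it can help to check them against this shape before running an eval. The sketch below is a hypothetical runtime guard, `isTestCase`, that is not part of Orchestrated; it only treats `input` as required and checks the types of `ctx` and `tags` when they are present.

```typescript
// Mirrors the test case shape described above.
type TestCase = {
  input: any
  output?: any
  expected?: any
  ctx?: object
  tags?: string[]
}

// Hypothetical helper (not an Orchestrated API): checks that an untyped
// value matches the TestCase shape.
function isTestCase(value: unknown): value is TestCase {
  if (typeof value !== "object" || value === null) return false
  const v = value as Record<string, unknown>
  // `input` is the only required field.
  if (!("input" in v)) return false
  const { ctx, tags } = v
  // `ctx`, when present, must be a non-null object.
  if (ctx !== undefined && (typeof ctx !== "object" || ctx === null)) return false
  // `tags`, when present, must be an array of strings.
  if (tags !== undefined && !(Array.isArray(tags) && tags.every((t) => typeof t === "string"))) {
    return false
  }
  return true
}

const rows: unknown[] = [
  { input: "What is 2+2?", expected: "4" },
  { input: "What is the capital of France?", tags: ["geography"] },
  { expected: "missing input" },
  { input: "bad tags", tags: "geography" },
]

const valid = rows.filter(isTestCase)
console.log(valid.length) // 2
```

Because `isTestCase` is a type predicate, `rows.filter(isTestCase)` yields a `TestCase[]` that TypeScript accepts wherever test cases are expected.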

Static Data

The simplest way to provide test data is with a static array:

await Eval("Static Data Eval", {
  data: [
    {
      input: "What is 2+2?",
      output: "4",
      expected: "4",
    },
    {
      input: "What is the capital of France?",
      output: "Paris",
      expected: "Paris",
      tags: ["geography"],
    },
  ],
  scores: [Factuality],
})
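Since a static data source is an ordinary array, you can use `tags` to filter or group it in plain TypeScript before passing it to `Eval`. The helpers `withTag` and `groupByTag` below are hypothetical and not part of Orchestrated; these docs do not say whether `Eval` filters by tag on its own.

```typescript
type TestCase = {
  input: any
  output?: any
  expected?: any
  ctx?: object
  tags?: string[]
}

const data: TestCase[] = [
  { input: "What is 2+2?", output: "4", expected: "4" },
  { input: "What is the capital of France?", output: "Paris", expected: "Paris", tags: ["geography"] },
]

// Hypothetical helper: keep only the cases that carry the given tag.
function withTag(cases: TestCase[], tag: string): TestCase[] {
  return cases.filter((c) => c.tags?.includes(tag) ?? false)
}

// Hypothetical helper: bucket cases by tag. Untagged cases go under "untagged",
// and a case with several tags appears in each of its buckets.
function groupByTag(cases: TestCase[]): Map<string, TestCase[]> {
  const groups = new Map<string, TestCase[]>()
  for (const c of cases) {
    const keys = c.tags && c.tags.length > 0 ? c.tags : ["untagged"]
    for (const key of keys) {
      const bucket = groups.get(key) ?? []
      bucket.push(c)
      groups.set(key, bucket)
    }
  }
  return groups
}

console.log(withTag(data, "geography").length) // 1
console.log(Array.from(groupByTag(data).keys()).join(", ")) // untagged, geography
```

To evaluate only the geography cases, you could pass `withTag(data, "geography")` as the `data` option.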

With Context Overrides

Each test case can set its own ctx, which overrides the eval-level ctx for that test case:

await Eval("Context Override Eval", {
  ctx: { temperature: 0.7 },
  data: [
    { input: "test 1" },
    { input: "test 2", ctx: { temperature: 0.9 } },  // Overrides temperature for this case only
  ],
  task: (input, ctx) => callLLM(input, ctx.temperature),  // callLLM stands in for your own model call
  scores: [Effectiveness],
})
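These docs do not specify whether a per-case ctx replaces the base ctx entirely or is merged into it. The sketch below assumes a shallow merge, where keys set on the test case win and keys it omits are inherited from the eval-level ctx. The `resolveCtx` helper and the `model` key are hypothetical, used only for illustration.

```typescript
type Ctx = Record<string, unknown>

// Assumption: a case-level ctx is shallow-merged over the eval-level ctx.
// Neither input object is mutated.
function resolveCtx(base: Ctx, override?: Ctx): Ctx {
  return { ...base, ...(override ?? {}) }
}

// `model` is a hypothetical extra key, included to show inheritance.
const base: Ctx = { temperature: 0.7, model: "example-model" }
const cases: { input: string; ctx?: Ctx }[] = [
  { input: "test 1" },
  { input: "test 2", ctx: { temperature: 0.9 } },
]

for (const c of cases) {
  const ctx = resolveCtx(base, c.ctx)
  console.log(c.input, ctx.temperature, ctx.model)
}
// test 1 0.7 example-model
// test 2 0.9 example-model
```

If the library instead replaces the base ctx wholesale, "test 2" would lose `model`; check the ctx your task receives before relying on either behavior.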

Built-in Data Sources

interactions()

Load user interactions from your production system:

import { interactions } from 'orchestrated'

await Eval("Production Eval", {
  data: interactions({
    tenantId: "acme",           // Organization ID
    serviceName: "chatbot",     // Service to evaluate
    environment: "production",  // Environment filter
    limit: 100,                 // Max test cases
    startDate: "2025-01-01",    // Date range (optional)
    endDate: "2025-01-31",
  }),
  scores: [Effectiveness],
})

All parameters are optional; any you omit fall back to the values in your configured state.

getDataset()

getDataset() is a generic fetcher that loads a dataset of any type by name and version:

import { getDataset } from 'orchestrated'

await Eval("Generic Dataset Eval", {
  data: getDataset({
    name: "my-custom-dataset",
    version: "v1",
  }),
  scores: [Effectiveness],
})

Custom Data Sources

Define your own data loading logic as an async function that returns an array of test cases:

async function myDataSource() {
  // Load from a database, API, files, etc.
  // `db` stands in for your own database client.
  const rows = await db.query("SELECT * FROM test_cases")

  return rows.map(row => ({
    input: row.input,
    output: row.output,
    expected: row.expected,
    tags: row.tags?.split(','),
  }))
}

await Eval("Custom Data Eval", {
  data: myDataSource,
  scores: [Effectiveness],
})
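The example above depends on a `db` client. The sketch below is a self-contained version that runs without a database: a hypothetical `fetchRows` function stands in for `db.query`, and the hypothetical `Row` type assumes tags are stored as one comma-separated string. Unlike `row.tags?.split(',')`, it also trims whitespace and drops empty entries, so `"math, arithmetic"` becomes `["math", "arithmetic"]` rather than `["math", " arithmetic"]`.

```typescript
type TestCase = {
  input: any
  output?: any
  expected?: any
  ctx?: object
  tags?: string[]
}

// Hypothetical row shape, e.g. what a SQL query or CSV parser might return.
type Row = { input: string; output?: string; expected?: string; tags?: string | null }

// Stand-in for `db.query(...)` so the sketch runs without a database.
async function fetchRows(): Promise<Row[]> {
  return [
    { input: "What is 2+2?", expected: "4", tags: "math, arithmetic" },
    { input: "What is the capital of France?", expected: "Paris", tags: null },
  ]
}

async function myDataSource(): Promise<TestCase[]> {
  const rows = await fetchRows()
  return rows.map((row) => ({
    input: row.input,
    output: row.output,
    expected: row.expected,
    // Split on commas, trim each tag, and drop empties; leave tags undefined when absent.
    tags: row.tags
      ? row.tags.split(",").map((t) => t.trim()).filter((t) => t.length > 0)
      : undefined,
  }))
}

myDataSource().then((cases) => console.log(JSON.stringify(cases[0]?.tags))) // ["math","arithmetic"]
```

Annotating the return type as `Promise<TestCase[]>` lets the compiler catch mapping mistakes before the eval runs.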

With Parameters

To parameterize a data source, write a factory that takes options and returns the data source function:

function createDataSource(options: { limit: number }) {
  return async () => {
    const rows = await db.query(
      "SELECT * FROM test_cases LIMIT ?",
      [options.limit]
    )
    return rows.map(/* ... */)
  }
}

await Eval("Parameterized Data Eval", {
  data: createDataSource({ limit: 50 }),
  scores: [Effectiveness],
})
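Here is the same factory pattern as a runnable sketch that uses an in-memory array instead of a database. The `ALL_CASES` array and the extra `tag` option are hypothetical. The point is that the factory captures its options in a closure and returns a zero-argument async function, which matches the function form of `DataSource`.

```typescript
type TestCase = {
  input: any
  output?: any
  expected?: any
  ctx?: object
  tags?: string[]
}

// In-memory stand-in for the `test_cases` table.
const ALL_CASES: TestCase[] = [
  { input: "q1", expected: "a1", tags: ["smoke"] },
  { input: "q2", expected: "a2" },
  { input: "q3", expected: "a3", tags: ["smoke"] },
]

// Factory: options are captured now; loading happens when the returned function is called.
function createDataSource(options: { limit: number; tag?: string }): () => Promise<TestCase[]> {
  return async () => {
    const { limit, tag } = options
    // Filter by tag only when one was given, then apply the limit.
    const matching =
      tag === undefined ? ALL_CASES : ALL_CASES.filter((c) => c.tags?.includes(tag) ?? false)
    return matching.slice(0, limit)
  }
}

const smokeSource = createDataSource({ limit: 50, tag: "smoke" })
smokeSource().then((cases) => console.log(cases.map((c) => c.input).join(", "))) // q1, q3
```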

Data Source API

createApiGatewayDataSource()

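Create a data source function that fetches a dataset from an API Gateway endpoint. The transform function converts the response body into an array of test cases: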

import { createApiGatewayDataSource } from 'orchestrated'

const myDataSource = createApiGatewayDataSource({
  endpoint: "/datasets/my-dataset",
  transform: (response) => response.items,
})

await Eval("API Gateway Eval", {
  data: myDataSource,
  scores: [Effectiveness],
})
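The `transform` option is where you adapt the endpoint's payload to the test case shape. The sketch below assumes a hypothetical response of the form `{ items: [...] }`, inferred only from `response.items` in the example above, with hypothetical field names `question`, `answer`, and `labels`. It renames fields and tolerates a missing `items` array.

```typescript
type TestCase = {
  input: any
  output?: any
  expected?: any
  ctx?: object
  tags?: string[]
}

// Assumed response shape; the real payload from your endpoint may differ.
type DatasetResponse = {
  items?: Array<{ question: string; answer?: string; labels?: string[] }>
}

// Maps each raw item to a TestCase; a missing `items` array yields no cases.
function transform(response: DatasetResponse): TestCase[] {
  return (response.items ?? []).map((item) => ({
    input: item.question,
    expected: item.answer,
    tags: item.labels,
  }))
}

const cases = transform({ items: [{ question: "What is 2+2?", answer: "4", labels: ["math"] }] })
console.log(cases.length) // 1
```

You would then pass this function as the `transform` option to createApiGatewayDataSource().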

Type Definitions

type TestCase = {
  input: any
  output?: any
  expected?: any
  ctx?: object
  tags?: string[]
}

type DataSource = TestCase[] | (() => Promise<TestCase[]>)
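Because `DataSource` is a union, any code that consumes it has to handle both forms. The hypothetical `resolveDataSource` helper below (not an Orchestrated API) narrows with `typeof` and calls the function form only when it resolves the source. Presumably, this deferral is what lets loaders such as database queries wait until the eval actually runs.

```typescript
type TestCase = {
  input: any
  output?: any
  expected?: any
  ctx?: object
  tags?: string[]
}

type DataSource = TestCase[] | (() => Promise<TestCase[]>)

// Hypothetical helper: turn either form of DataSource into a plain array.
async function resolveDataSource(source: DataSource): Promise<TestCase[]> {
  return typeof source === "function" ? await source() : source
}

// Counts loader invocations to show that the function form is lazy.
let calls = 0
const lazySource: DataSource = async () => {
  calls++
  return [{ input: "b" }, { input: "c" }]
}
const staticSource: DataSource = [{ input: "a" }]

console.log(calls) // 0: nothing has loaded yet
Promise.all([resolveDataSource(staticSource), resolveDataSource(lazySource)]).then(([a, b]) => {
  console.log(a.length, b.length, calls) // 1 2 1
})
```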
