Orchestrated SDK Documentation
The Orchestrated SDK provides a complete toolkit for evaluating LLM applications. Whether you're testing prompt changes, measuring model performance, or tracking quality over time, Orchestrated gives you the tools to evaluate systematically and at scale.
Key Features
Custom Scorers
- Define your own evaluation criteria in TypeScript. Use LLM-as-judge, deterministic functions, or hybrid approaches (see the sketch after this list).
Batch Processing
- Automatically batch expensive LLM-based evaluations for cost efficiency and faster results.
Data Sources
- Connect to your production data, test datasets, or custom sources to evaluate against real-world scenarios.
Cloud Integration
- Upload evaluations to the cloud, share with your team, and track results in the web console.
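As a starting point for custom scorers, here is a minimal sketch of a deterministic scorer. It assumes a scorer is a plain function that receives the input, output, and expected values of a test case and returns an object with a name and a score between 0 and 1, matching the shape the autoevals scorers use; check your SDK version for the exact signature.

// Hypothetical custom scorer: exact match after normalizing case and whitespace.
// The { input, output, expected } argument shape and the { name, score } return value
// are assumptions based on the autoevals convention, not a confirmed Orchestrated API.
const ExactMatchIgnoringCase = (args: {
  input: string;
  output: string;
  expected?: string;
}) => {
  const normalize = (s: string) => s.trim().toLowerCase();
  return {
    name: "ExactMatchIgnoringCase",
    score:
      args.expected !== undefined &&
      normalize(args.output) === normalize(args.expected)
        ? 1
        : 0,
  };
};

A scorer defined this way can be passed alongside the built-in scorers in the scores array of an Eval, as shown in the Quick Example below.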
Quick Example
Here's a simple evaluation to get you started:
import { Levenshtein } from "autoevals";
import { Eval } from "orchestrated";
/**
 * Simple evaluation example using the Levenshtein distance scorer.
 *
 * This eval demonstrates how to test string similarity between expected and actual outputs.
 * Levenshtein distance measures the minimum number of single-character edits required
 * to transform one string into another.
 */
Eval("String Similarity Check", {
  data: [
    {
      input: "What is the capital of France?",
      expected: "Paris",
      output: "Paris",
    },
    {
      input: "What is the capital of France?",
      expected: "Paris",
      output: "paris", // Lowercase - still similar
    },
    {
      input: "What is the capital of France?",
      expected: "Paris",
      output: "Lyon", // Different city - low similarity
    },
    {
      input: "What is 2 + 2?",
      expected: "4",
      output: "4",
    },
    {
      input: "What is 2 + 2?",
      expected: "4",
      output: "five", // Wrong answer
    },
  ],
  scores: [Levenshtein],
});
This evaluates five test cases with the built-in Levenshtein scorer. Results appear in your terminal with a detailed scoring breakdown.
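You can also pass several scorers at once, mixing deterministic and LLM-as-judge approaches. The sketch below is one possible setup rather than a confirmed recipe: it pairs Levenshtein with Factuality, an LLM-as-judge scorer also exported by autoevals. Because Factuality calls a model under the hood, it needs credentials (for example an OpenAI API key) configured in your environment.

import { Factuality, Levenshtein } from "autoevals";
import { Eval } from "orchestrated";

// Each test case is scored by every entry in the scores array, and results are
// reported per scorer. Here Levenshtein penalizes the verbose answer, while
// Factuality, judging meaning rather than characters, should still rate it highly.
Eval("Capitals with LLM-as-judge", {
  data: [
    {
      input: "What is the capital of France?",
      expected: "Paris",
      output: "The capital of France is Paris.", // Correct, but phrased differently
    },
  ],
  scores: [Levenshtein, Factuality],
});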