Tasks
Tasks transform inputs into outputs in your evaluations. They're optional, but they're what lets an evaluation exercise a live system instead of only scoring pre-generated outputs.
What are Tasks?
Tasks are functions that generate outputs from inputs. Use tasks when you want to:
- Test live systems: call your LLM, API, or application with test inputs
- Compare approaches: evaluate different prompts, models, or configurations
- Generate outputs: create outputs to score when you only have inputs
Without a task, your data must include pre-generated output fields. With a task, Orchestrated generates outputs for you.
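For example, an eval with no task scores the outputs already present in your data. A minimal sketch, assuming each row carries an output field alongside input and expected:
await Eval("Pre-Generated Outputs Eval", {
  data: [
    { input: "What is 2+2?", output: "4", expected: "4" },
    { input: "Capital of France?", output: "Paris", expected: "Paris" },
  ],
  // No task: the pre-generated outputs are scored as-is
  scores: [Factuality],
})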
Basic Tasks
Simple Task
The most basic task takes an input and returns an output:
await Eval("Basic Task Eval", {
data: [
{ input: "What is 2+2?" },
{ input: "What is the capital of France?" },
],
task: async (input) => {
return await callLLM(input)
},
scores: [Effectiveness],
})
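callLLM is a placeholder throughout this page for whatever produces your output. One possible implementation, sketched with the OpenAI Node SDK (the SDK, model name, and option names are illustrative and not part of Orchestrated):
import OpenAI from "openai"

type LLMOptions = {
  apiKey?: string
  model?: string
  temperature?: number
}

async function callLLM(input: string, options: LLMOptions = {}): Promise<string> {
  // Falls back to the OPENAI_API_KEY environment variable when no key is passed
  const openai = new OpenAI({ apiKey: options.apiKey })
  const response = await openai.chat.completions.create({
    model: options.model ?? "gpt-4o-mini",
    temperature: options.temperature,
    messages: [{ role: "user", content: input }],
  })
  return response.choices[0].message.content ?? ""
}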
With Expected Values
Include expected values in your data for comparison:
await Eval("Expected Values Eval", {
data: [
{ input: "What is 2+2?", expected: "4" },
{ input: "Capital of France?", expected: "Paris" },
],
task: async (input) => {
return await callLLM(input)
},
scores: [Factuality], // Compares output to expected
})
Context Access
Tasks receive the merged context as their second parameter:
await Eval("Context Task Eval", {
ctx: {
apiKey: process.env.OPENAI_API_KEY,
model: "gpt-4",
temperature: 0.7,
},
data: [{ input: "test" }],
task: async (input, ctx) => {
// Access context values
return await callLLM(input, {
apiKey: ctx.apiKey,
model: ctx.model,
temperature: ctx.temperature,
})
},
scores: [Effectiveness],
})
State Access
Global state is auto-injected as ctx.state:
task: async (input, ctx) => {
console.log(ctx.state.tenantId)
console.log(ctx.state.environment)
console.log(ctx.state.loggedInUser)
return await callLLM(input)
}
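A more typical use is branching on state rather than logging it. For example (the model names are illustrative):
task: async (input, ctx) => {
  // Use a cheaper model outside production runs
  const model = ctx.state.environment === "production" ? "gpt-4" : "gpt-4o-mini"
  return await callLLM(input, { model })
}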
Context Overrides
Tasks can return context overrides for downstream scorers:
await Eval("Override Context Eval", {
data: [{ input: "test" }],
task: async (input, ctx) => {
const startTime = Date.now()
const output = await callLLM(input)
const latency = Date.now() - startTime
// Return [output, contextOverride]
return [output, { latency }]
},
scores: [
createCustomScorer({
name: "LatencyChecker",
schema: z.object({ output: z.string() }),
handler: async (args, ctx) => ({
name: "LatencyChecker",
score: ctx.latency < 1000 ? 1 : 0,
metadata: { latency: ctx.latency },
}),
}),
],
})
Scorers receive the merged context including task overrides.
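The override can carry whatever metadata your scorers need. A sketch that forwards token usage, calling the OpenAI SDK directly instead of callLLM because the numbers come from the raw provider response:
task: async (input, ctx) => {
  const openai = new OpenAI() // reads OPENAI_API_KEY from the environment
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: input }],
  })
  // Expose token usage to downstream scorers alongside the output
  return [
    response.choices[0].message.content ?? "",
    { totalTokens: response.usage?.total_tokens ?? 0 },
  ]
}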
Error Handling
Graceful Failures
If a task throws an error, the evaluation continues with other test cases:
task: async (input) => {
try {
return await callLLM(input)
} catch (error) {
console.error(`Task failed for input: ${input}`, error)
throw error // Evaluation continues, but this test case fails
}
}
Failed test cases are marked as errors in the results summary.
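If you'd rather have a failing case scored than marked as an error, catch the error and return a sentinel output instead of rethrowing. Scorers then see (and penalize) the placeholder text:
task: async (input) => {
  try {
    return await callLLM(input)
  } catch (error) {
    // The case is scored rather than counted as an error
    return `ERROR: ${error instanceof Error ? error.message : String(error)}`
  }
}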
Retry Logic
Implement retry logic for transient failures:
task: async (input) => {
  let retries = 3
  while (retries > 0) {
    try {
      return await callLLM(input)
    } catch (error) {
      retries--
      if (retries === 0) throw error
      // Wait one second before the next attempt
      await new Promise(r => setTimeout(r, 1000))
    }
  }
}
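For rate-limited APIs, an exponential backoff variant of the same loop is often a better fit (the attempt count and delays here are arbitrary):
task: async (input) => {
  const maxAttempts = 4
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await callLLM(input)
    } catch (error) {
      if (attempt === maxAttempts) throw error
      // 500ms, 1s, 2s between attempts
      const delay = 500 * 2 ** (attempt - 1)
      await new Promise(r => setTimeout(r, delay))
    }
  }
}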
Timeouts
Add timeouts to prevent hanging tasks:
task: async (input) => {
const timeout = 30000 // 30 seconds
const result = await Promise.race([
callLLM(input),
new Promise((_, reject) =>
setTimeout(() => reject(new Error('Timeout')), timeout)
),
])
return result
}
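Note that Promise.race only stops waiting; the underlying request keeps running. If your client accepts an AbortSignal, you can cancel the request itself (AbortSignal.timeout requires Node 17.3+, and forwarding signal through callLLM is an assumption about your helper):
task: async (input) => {
  // Abort the in-flight request after 30 seconds
  const signal = AbortSignal.timeout(30000)
  return await callLLM(input, { signal })
}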