
Test Scenarios

Scenarios let you test how your AI agents handle different situations before going live. Think of them as automated quality assurance tests that run realistic conversations with your agents.

Why Use Scenarios?

Before deploying an AI agent to handle real customer calls, you need to know:
  • Will it handle angry customers appropriately?
  • Can it process refund requests correctly?
  • Does it follow your company’s policies?
  • Will it maintain quality across different customer types?
Scenarios answer these questions by automatically testing your agent against multiple customer behaviors and measuring the results.

How Scenarios Work

A scenario defines a specific situation (like “customer requesting refund”) and automatically tests it across different customer personalities and agent configurations.
Example: Create one “Refund Request” scenario with 5 customer personas and 3 agent variants. Chanl automatically runs 15 conversations (5 × 3) and scores each one.

The Core Formula

Scenario + Personas + Agents = Simulations

1 scenario × 3 personas × 2 agents = 6 automated test conversations
Each combination runs as a separate simulation, giving you comprehensive test coverage with minimal setup.
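The multiplication above can be sketched in a few lines; a minimal illustration in plain JavaScript (not the Chanl SDK), showing how each persona/agent pair becomes one simulation:

```javascript
// Enumerate every persona × agent combination for a single scenario.
// Each pair becomes one simulation run.
function buildSimulations(scenario, personas, agents) {
  const simulations = [];
  for (const persona of personas) {
    for (const agent of agents) {
      simulations.push({ scenario, persona, agent });
    }
  }
  return simulations;
}

const runs = buildSimulations(
  'Refund Request',
  ['frustrated', 'analytical', 'confused'],
  ['agent-v1', 'agent-v2']
);
console.log(runs.length); // 3 personas × 2 agents = 6 simulations
```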

Creating Your First Scenario

Chanl provides both a visual UI wizard and an API for creating scenarios. The wizard guides you through 6 steps:

Step 1: Define the Scenario

Give your scenario a name and describe the situation you want to test. You can use variables to make scenarios reusable.
[Screenshot: Scenario definition with name, tags, prompt description, and variables]
Key elements:
  • Name: Descriptive title (e.g., “Order Tracking”)
  • Tags: Organize scenarios by category
  • Prompt: Describe the customer situation
  • Variables: Make scenarios reusable with placeholders like {{customer_name}}
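Variable substitution can be pictured like this; a minimal sketch in plain JavaScript, independent of Chanl's actual templating engine:

```javascript
// Replace {{placeholders}} in a scenario prompt with concrete values.
// Unknown placeholders are left intact.
function renderPrompt(template, variables) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in variables ? variables[name] : match
  );
}

const prompt = renderPrompt(
  'Customer {{customer_name}} is calling about order {{order_id}}.',
  { customer_name: 'Dana', order_id: 'A-1042' }
);
console.log(prompt); // "Customer Dana is calling about order A-1042."
```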

Step 2: Select Personas

Choose which customer personalities to test against your agent.
[Screenshot: Persona selection showing various customer types with different emotions and speech styles]
Pick diverse personas to ensure comprehensive testing:
  • Different emotional states (angry, friendly, stressed)
  • Various speech patterns (fast, slow, mumbled)
  • Multiple accents and languages

Step 3: Choose Target Agents

Select which agents to test with this scenario.
[Screenshot: Agent selection showing connected agents from VAPI and custom providers]
You can test:
  • Production vs staging agents
  • Different prompt versions
  • Various AI models (GPT-4, Claude, etc.)

Step 4: Select Score Criteria

Pick the scorecard that defines quality standards for evaluation.
[Screenshot: Scorecard selection showing various evaluation criteria options]
Choose scorecards based on your goals:
  • Customer service quality
  • Sales effectiveness
  • Compliance requirements

Step 5: Set Schedule

Configure how often this scenario should run automatically.
[Screenshot: Schedule configuration with frequency options and time settings]
Options:
  • Once: Run immediately, manual reruns
  • Daily: Continuous regression testing
  • Weekly/Monthly: Periodic quality checks
  • Stop condition: Never, after date, or after N runs

Step 6: Preview & Launch

Review your configuration before running the simulations.
[Screenshot: Preview showing all simulation combinations that will be created]
The preview shows exactly how many simulations will run:
  • 2 personas × 1 agent × 1 scorecard = 2 simulations
Click “Publish & Run” to start testing!
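Putting the six steps together, a complete scenario configuration might look like the following. Field names mirror the JSON examples elsewhere on this page; treat it as an illustrative sketch, since the exact schema may differ:

```json
{
  "name": "Order Tracking - {{customer_name}}",
  "tags": ["customer-service", "orders"],
  "prompt": "Customer {{customer_name}} wants an update on a delayed order.",
  "personas": ["frustrated", "analytical"],
  "agents": ["agent-production"],
  "scorecard": "customer-service-quality",
  "schedule": {
    "frequency": "once"
  }
}
```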

Managing Scenarios

After creation, view all your scenarios in one place.
[Screenshot: Scenarios dashboard showing list of scenarios with scores and statuses]
Dashboard features:
  • Total runs and average scores
  • Active vs completed scenarios
  • Quick access to results
  • Edit or rerun scenarios

Scheduling Automated Tests

Run scenarios automatically to catch issues before customers do.

Scheduling Options

Once

Run immediately, then manually trigger again when needed

Daily

Perfect for testing production agents every night

Weekly

Good for regression testing major scenarios

Monthly

Useful for comprehensive quality audits

Setting End Conditions

Control when scheduled tests stop:
{
  "schedule": {
    "frequency": "daily",
    "time": "02:00 AM EST",
    "endCondition": "after_runs",
    "maxRuns": 30
  }
}
  • Never - Runs indefinitely (useful for continuous monitoring)
  • End Date - Stops after a specific date (good for limited testing periods)
  • After N Runs - Stops after specified executions (e.g., 30 days of daily tests)
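The "After N Runs" arithmetic for a daily schedule is straightforward; a small illustration in plain JavaScript (a hypothetical helper, not part of the SDK):

```javascript
// For a daily schedule, estimate the date of the final run
// under an "after N runs" end condition.
function lastRunDate(startDate, maxRuns) {
  const end = new Date(startDate);
  end.setUTCDate(end.getUTCDate() + (maxRuns - 1)); // the first run counts
  return end.toISOString().slice(0, 10);
}

console.log(lastRunDate('2025-01-01', 30)); // "2025-01-30"
```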

Understanding Scenario Results

After running a scenario, you’ll see results for each simulation:

Reading the Results Dashboard

Scenario: Product Refund Request

Total Simulations: 6 (3 personas × 2 agents)
Combination              Score  Status
Frustrated + Agent V1     78    ⚠️
Frustrated + Agent V2     92
Analytical + Agent V1     85
Analytical + Agent V2     88
Confused + Agent V1       71
Confused + Agent V2       82
Key Finding: Agent V2 performs better with frustrated customers
Recommendation: Deploy V2, improve V1's empathy responses

Analyzing Patterns

Look for:
  • Persona weaknesses - Which customer types cause issues?
  • Agent comparisons - Which version performs better?
  • Consistent failures - What scenarios always score low?
  • Score trends - Are agents improving over time?
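The pattern analysis above can be automated with a small aggregation; a sketch in plain JavaScript (the result shape is an assumption, not the SDK's actual format):

```javascript
// Group simulation scores by a key (persona or agent) and average them,
// so weak spots stand out at a glance.
function averageBy(results, key) {
  const sums = {};
  for (const r of results) {
    const k = r[key];
    sums[k] = sums[k] || { total: 0, count: 0 };
    sums[k].total += r.score;
    sums[k].count += 1;
  }
  const averages = {};
  for (const [k, { total, count }] of Object.entries(sums)) {
    averages[k] = total / count;
  }
  return averages;
}

const results = [
  { persona: 'frustrated', agent: 'v1', score: 78 },
  { persona: 'frustrated', agent: 'v2', score: 92 },
  { persona: 'confused', agent: 'v1', score: 71 },
  { persona: 'confused', agent: 'v2', score: 82 },
];
console.log(averageBy(results, 'agent')); // { v1: 74.5, v2: 87 }
```

Grouping by `persona` instead of `agent` answers the "which customer types cause issues?" question with the same helper.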

Common Scenario Templates

Customer Service

{
  "name": "Angry Customer Escalation",
  "prompt": "Customer has been on hold for 45 minutes and is extremely frustrated. They're threatening to cancel their account.",
  "personas": ["angry", "demanding"],
  "scorecard": "de-escalation-quality"
}

Sales

{
  "name": "Price Objection Handling",
  "prompt": "Prospect is interested in the product but says the price is too high compared to competitors.",
  "personas": ["price-sensitive", "skeptical"],
  "scorecard": "sales-effectiveness"
}

Technical Support

{
  "name": "Complex Technical Issue",
  "prompt": "Customer's software won't connect to the server. They've already tried restarting and checking their internet.",
  "personas": ["frustrated-technical", "patient-technical"],
  "scorecard": "technical-support-quality"
}

Compliance Verification

{
  "name": "TCPA Compliance Check",
  "prompt": "Outbound sales call to verify agent provides all required disclosures and consent requests.",
  "personas": ["rushed", "detail-oriented"],
  "scorecard": "compliance-tcpa"
}

Best Practices

Begin with 2-3 personas and 1-2 agents. Once you validate the scenario works, expand to cover more combinations.
Don’t just test happy paths. Include difficult personas like “confused elderly customer” or “angry and rushed.”
Name scenarios clearly: “Refund Request - Defective Product” not “Scenario 1”
When testing new agent versions, keep scenario names consistent to compare results over time.
Run key scenarios daily to catch when agent updates break existing functionality.

Automated Testing with API

Automate scenario testing programmatically:
// validate-agent.js - Automated quality validation
const chanl = require('@chanl/sdk');

async function validateAgent(agentId) {
  // Create test scenario
  const scenario = await chanl.scenarios.create({
    name: `Quality Check - ${new Date().toISOString()}`,
    prompt: "Customer requests refund for defective product",
    personas: ['frustrated', 'analytical', 'confused'],
    agents: [agentId],
    scorecard: 'customer-service-quality'
  });

  // Wait for all simulations to complete
  const results = await chanl.scenarios.waitForCompletion(scenario.id, {
    timeout: 300000 // 5 minutes
  });

  // Check if quality threshold met
  const avgScore = results.averageScore;
  const minScore = results.minScore;

  if (avgScore < 80 || minScore < 70) {
    throw new Error(
      `Agent quality below threshold. Average: ${avgScore}, Min: ${minScore}`
    );
  }

  console.log(`✅ Agent passed quality tests. Average score: ${avgScore}`);
  return results;
}

// Run validation
validateAgent(process.env.AGENT_ID)
  .then(() => console.log('Validation complete'))
  .catch(err => {
    console.error('❌ Agent validation failed:', err.message);
    process.exit(1);
  });

Troubleshooting

Problem: Simulations take too long or time out
Solutions:
  • Reduce the number of personas or agents in the scenario
  • Check if your agent has timeout issues in production
  • Contact support if timeouts persist
Problem: All combinations score below 70
Solutions:
  • Review your scorecard criteria - are they too strict?
  • Check agent configuration for obvious issues
  • Review simulation transcripts to identify common failure points
Problem: The same scenario gets different scores on reruns
Solutions:
  • This is normal with AI - some variation expected
  • Look at trends over multiple runs, not single scores
  • If variation is extreme (±20 points), review agent configuration
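To tell normal variation from extreme variation, look at the spread across reruns rather than any single score; a minimal sketch in plain JavaScript:

```javascript
// Mean and range of scores across reruns of the same scenario.
function scoreSpread(scores) {
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  const range = Math.max(...scores) - Math.min(...scores);
  return { mean, range };
}

const reruns = [82, 85, 79, 84];
const { mean, range } = scoreSpread(reruns);
console.log(mean, range); // 82.5 6 (a range near 20 would warrant review)
```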

What’s Next?