Scorecards

Scorecards define what “good” looks like for your AI agents. They’re structured evaluation frameworks that measure conversation quality against specific criteria you care about—like empathy, problem resolution, or compliance.

Why Scorecards Matter

Without clear evaluation criteria, quality is subjective. Scorecards help you:
  • Measure consistently - Everyone evaluates using the same standards
  • Track improvements - See if changes actually make agents better
  • Identify weaknesses - Know exactly what needs fixing
  • Compare agents - Understand which performs better and why
Example: Your “Customer Service” scorecard gives Agent V1 a 78 vs Agent V2’s 92. Drilling down shows V1 scores lower on “Empathy” (65) while matching V2 on “Accuracy” (91). Now you know exactly what to improve.

How Scorecards Work

A scorecard contains:
  1. Categories - Major areas to evaluate (e.g., Communication, Problem Resolution)
  2. Criteria - Specific things to check within each category (e.g., “Shows empathy”)
  3. Weights - How important each category is (totals 100%)
When you run a simulation or analyze a call, Chanl scores it against your scorecard and shows where it excelled or fell short.
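The scoring math itself is simple: each category's score is scaled by its weight, and the scaled values are summed. A minimal sketch in JavaScript (illustrative only, not Chanl's exact scoring internals):

```javascript
// Combine category scores into one overall score using their weights.
// Illustrative only - not Chanl's exact scoring internals.
function overallScore(categories) {
  const totalWeight = categories.reduce((sum, c) => sum + c.weight, 0);
  if (totalWeight !== 100) {
    throw new Error(`Weights must total 100, got ${totalWeight}`);
  }
  // Sum weight * score first and divide once, to avoid rounding drift.
  return categories.reduce((sum, c) => sum + c.weight * c.score, 0) / 100;
}

const score = overallScore([
  { name: 'Communication', weight: 30, score: 92 },
  { name: 'Problem Resolution', weight: 50, score: 85 },
  { name: 'Compliance', weight: 20, score: 84.5 },
]);
console.log(score); // 87
```

Note how the 50%-weighted category pulls the overall score twice as hard as the 25%-range categories: improving Problem Resolution moves the needle most.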

Creating a Scorecard

Navigate to Scorecards and click “Create Scorecard”:
  1. Name your scorecard
  2. Add categories with weights
  3. Define criteria for each category
  4. Test on sample calls
  5. Activate for use

Common Scorecard Templates

Customer Service Scorecard

Evaluate support interactions:
  • Empathy - Acknowledges customer feelings
  • Clarity - Uses simple, understandable language
  • Active Listening - Reflects back what customer said
  • Tone - Maintains friendly, professional demeanor
  • Issue Identification - Correctly understands the problem
  • Solution Quality - Provides effective resolution
  • Efficiency - Resolves in reasonable time
  • Confirmation - Verifies customer satisfaction
  • Required Disclosures - Provides mandatory information
  • Policy Adherence - Follows company guidelines
  • Data Handling - Protects customer information

Sales Scorecard

Evaluate sales conversations:
  • Needs Assessment - Asks questions to understand requirements
  • Budget Qualification - Determines financial fit
  • Timeline - Establishes decision timeframe
  • Authority - Identifies decision makers
  • Value Communication - Explains benefits clearly
  • Objection Handling - Addresses concerns effectively
  • Proof Points - Provides relevant examples/data
  • Customization - Tailors pitch to customer needs
  • Call to Action - Clear next steps
  • Urgency - Creates appropriate motivation
  • Commitment - Secures agreement or advance
  • Truthfulness - No misleading statements
  • Disclosures - Provides required information
  • Documentation - Confirms details in writing

Technical Support Scorecard

Evaluate technical assistance:
  • Problem Diagnosis - Identifies root cause
  • Solution Correctness - Provides accurate fix
  • Technical Knowledge - Demonstrates expertise
  • Jargon-Free - Explains without technical terms
  • Step-by-Step - Provides clear instructions
  • Patience - Allows customer time to follow along
  • Resolution Time - Solves within acceptable timeframe
  • Tools Used - Leverages available resources
  • Escalation - Knows when to involve specialists

Scorecard Categories & Weights

Setting Weights

Weights determine how much each category impacts the overall score:
{
  "categories": [
    {
      "name": "Communication",
      "weight": 30  // 30% of total score
    },
    {
      "name": "Problem Resolution",
      "weight": 50  // 50% of total score
    },
    {
      "name": "Compliance",
      "weight": 20  // 20% of total score
    }
  ]
}
Total weights must equal 100%.

When to weight higher:
  • Compliance - If regulatory consequences are severe
  • Problem Resolution - For support where solving issues is primary goal
  • Communication - For brand-sensitive interactions

Criteria Best Practices

Write Clear, Specific Criteria

Too vague: “Agent was good”

Clear and measurable:
{
  "name": "Empathy Demonstrated",
  "description": "Agent acknowledges customer frustration with phrases like 'I understand that's frustrating' or 'I can see why you're concerned' within first 30 seconds"
}

Include Examples

{
  "name": "Professional Tone",
  "description": "Maintains friendly, professional language throughout",
  "examples": {
    "good": [
      "I'd be happy to help you with that",
      "Let me look into this for you"
    ],
    "bad": [
      "That's not my problem",
      "You should have read the instructions"
    ]
  }
}

Make Them Actionable

Criteria should guide improvement:
{
  "name": "Clear Next Steps",
  "description": "Agent ends call by summarizing what will happen next and when customer can expect resolution",
  "passingExample": "I've processed your refund, and you'll see the credit in 3-5 business days. Is there anything else I can help with today?"
}

Using Scorecards

In Test Scenarios

Assign a scorecard when creating scenarios:
curl -X POST https://api.chanl.ai/v1/scenarios \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Refund Request Test",
    "prompt": "Customer wants refund for defective product",
    "personas": ["frustrated", "analytical"],
    "agents": ["agent-v1"],
    "scorecard": "customer-service-quality"
  }'
Every simulation from this scenario will be scored using that scorecard.

For Live Call Analysis

Analyze real calls using a scorecard:
curl -X POST https://api.chanl.ai/v1/call-logs/call_abc123/analyze \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "scorecard": "customer-service-quality"
  }'

Bulk Analysis

Score multiple calls at once:
const chanl = require('@chanl/sdk');

// Score last week's calls
const results = await chanl.callLogs.batchAnalyze({
  filters: {
    dateRange: {
      start: '2024-01-08',
      end: '2024-01-15'
    }
  },
  scorecard: 'customer-service-quality'
});

console.log(`Analyzed ${results.count} calls`);
console.log(`Average score: ${results.avgScore}`);
console.log(`Top weakness: ${results.lowestCategory}`);
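If you need to recompute the same aggregates locally, for example from individually analyzed calls, the math is straightforward. The per-call result shape below is an assumption for illustration, not the SDK's documented return type:

```javascript
// Local aggregation over scored calls (illustrative; the per-call shape
// here is an assumption, not the SDK's documented return type).
function summarize(calls) {
  const avgScore = calls.reduce((s, c) => s + c.overallScore, 0) / calls.length;
  // Average each category across calls, then find the weakest one.
  const totals = {};
  for (const call of calls) {
    for (const [name, score] of Object.entries(call.categoryScores)) {
      totals[name] = (totals[name] || 0) + score / calls.length;
    }
  }
  const lowestCategory = Object.entries(totals)
    .sort((a, b) => a[1] - b[1])[0][0];
  return { count: calls.length, avgScore, lowestCategory };
}

const summary = summarize([
  { overallScore: 90, categoryScores: { Communication: 95, Compliance: 80 } },
  { overallScore: 80, categoryScores: { Communication: 85, Compliance: 70 } },
]);
console.log(summary); // { count: 2, avgScore: 85, lowestCategory: 'Compliance' }
```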

Understanding Scorecard Results

Reading Category Scores

{
  "overallScore": 87,
  "categories": [
    {
      "name": "Communication",
      "weight": 30,
      "score": 92,
      "contributionToTotal": 27.6,
      "criteria": [
        {
          "name": "Empathy",
          "score": 95,
          "passed": true,
          "notes": "Excellent acknowledgment of customer frustration"
        },
        {
          "name": "Clarity",
          "score": 88,
          "passed": true,
          "notes": "Clear explanations with minor jargon"
        }
      ]
    },
    {
      "name": "Problem Resolution",
      "weight": 50,
      "score": 85,
      "contributionToTotal": 42.5,
      "criteria": [
        {
          "name": "Solution Provided",
          "score": 80,
          "passed": true,
          "notes": "Solution worked but took longer than ideal"
        }
      ]
    }
  ]
}
Key insights:
  • Overall score (87) is good but has room for improvement
  • Communication (92) is a strength
  • Problem Resolution (85) drags down the overall score because of its 50% weight
  • Focus improvement efforts on “Solution Provided” criterion
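The `contributionToTotal` field appears to be the category score scaled by its weight (92 at 30% gives 27.6; 85 at 50% gives 42.5). A one-line sketch of that relationship, assuming this interpretation is correct:

```javascript
// Assumed relationship (inferred from the sample payload above):
// a category's contribution is its score scaled by its weight percentage.
const contribution = (score, weight) => (score * weight) / 100;

console.log(contribution(92, 30)); // 27.6
console.log(contribution(85, 50)); // 42.5
```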

Comparing Across Agents

curl https://api.chanl.ai/v1/analytics/scorecard-comparison \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "scorecard": "customer-service-quality",
    "agents": ["agent-v1", "agent-v2"],
    "timeRange": "30d"
  }'
{
  "scorecard": "customer-service-quality",
  "comparison": {
    "agent-v1": {
      "overallScore": 82,
      "categoryScores": {
        "Communication": 88,
        "Problem Resolution": 78,
        "Compliance": 84
      }
    },
    "agent-v2": {
      "overallScore": 91,
      "categoryScores": {
        "Communication": 93,
        "Problem Resolution": 92,
        "Compliance": 87
      }
    }
  },
  "insights": [
    "Agent V2 outperforms across all categories",
    "Biggest gap is in Problem Resolution (+14 points)",
    "Both agents strong on Communication"
  ]
}

Refining Scorecards

Iterate Based on Data

After using a scorecard for a while:
  1. Review Score Distribution - Are most calls scoring 90+? Criteria might be too easy. All below 70? Too strict.
  2. Check Criterion Relevance - Which criteria consistently score well or poorly? Remove ones that don’t differentiate quality.
  3. Adjust Weights - If a high-weight category rarely varies, consider reducing its weight.
  4. Add Missing Criteria - If agents fail in ways the scorecard doesn’t capture, add new criteria.
  5. Test Changes - Run sample calls through the updated scorecard before deploying broadly.
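The score-distribution review above can be automated with a rough heuristic. The 80% cutoffs below are arbitrary assumptions; tune them to your own data:

```javascript
// Flag suspicious score distributions (thresholds are arbitrary assumptions).
function distributionWarning(scores) {
  const share = pred => scores.filter(pred).length / scores.length;
  if (share(s => s >= 90) > 0.8) return 'Criteria may be too lenient';
  if (share(s => s < 70) > 0.8) return 'Criteria may be too strict';
  return null;
}

console.log(distributionWarning([91, 95, 93, 97, 92])); // Criteria may be too lenient
console.log(distributionWarning([75, 82, 68, 88, 79])); // null
```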

A/B Test Scorecards

Compare different evaluation approaches:
const chanl = require('@chanl/sdk');

// Run same calls through two scorecards
const results = await chanl.scorecards.compare({
  calls: ['call_1', 'call_2', 'call_3'],
  scorecards: ['scorecard-v1', 'scorecard-v2']
});

console.log('V1 avg:', results.v1.avgScore);
console.log('V2 avg:', results.v2.avgScore);
console.log('Correlation:', results.correlation);
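The `correlation` figure tells you whether the two scorecards rank calls similarly. If you want to compute it yourself from per-call scores, a standard Pearson correlation works (assuming that is the statistic used; this page doesn't say):

```javascript
// Plain Pearson correlation between two scorecards' per-call scores.
// Whether the SDK uses exactly this statistic is an assumption.
function pearson(xs, ys) {
  const n = xs.length;
  const mean = a => a.reduce((s, v) => s + v, 0) / n;
  const mx = mean(xs), my = mean(ys);
  let num = 0, dx = 0, dy = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - mx) * (ys[i] - my);
    dx += (xs[i] - mx) ** 2;
    dy += (ys[i] - my) ** 2;
  }
  return num / Math.sqrt(dx * dy);
}

// Scorecards that rank calls identically correlate perfectly:
console.log(pearson([70, 80, 90], [75, 85, 95])); // 1
```

A correlation near 1 means the new scorecard preserves the old ranking; a low correlation means it measures something genuinely different, which deserves a manual review of the calls where they disagree.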

Best Practices

  1. Start Simple - Begin with 3 categories and 2-3 criteria each. Add complexity as you learn what matters.
  2. Get Team Input - QA team, sales managers, and compliance should all contribute criteria.
  3. Test Before Production - Run the scorecard on 20-30 sample calls to validate that it produces useful scores.
  4. Document Examples - For each criterion, include clear examples of passing and failing behavior.
  5. Review Quarterly - Business priorities change. Update scorecards to match current goals.

Troubleshooting

Problem: Every call scores above 90 or below 50
Solutions:
  • Criteria are too lenient/strict - adjust thresholds
  • Review criterion descriptions for clarity
  • Test scorecard on known good and bad calls
  • Consider if weights are appropriate
Problem: Calls you think are good score low and vice versa
Solutions:
  • Review which specific criteria are failing
  • Criteria descriptions may not capture what you actually value
  • Check if AI is misinterpreting criteria
  • Provide more explicit examples in criterion descriptions
Problem: All agents score similarly on the scorecard
Solutions:
  • Criteria aren’t specific enough to capture differences
  • Add more granular criteria
  • Check if weights are masking category differences
  • May need different scorecards for different agent types
Problem: Analysis times out or is very slow
Solutions:
  • Reduce number of criteria (aim for <15 total)
  • Simplify criterion descriptions
  • Contact support for optimization help

What’s Next?