Scorecards
Scorecards define what “good” looks like for your AI agents. They’re structured evaluation frameworks that measure conversation quality against specific criteria you care about, like empathy, problem resolution, or compliance.

Why Scorecards Matter
Without clear evaluation criteria, quality is subjective. Scorecards help you:
- Measure consistently - Everyone evaluates using the same standards
- Track improvements - See if changes actually make agents better
- Identify weaknesses - Know exactly what needs fixing
- Compare agents - Understand which performs better and why
Example: Your “Customer Service” scorecard gives Agent V1 a 78 vs Agent V2’s 92. Drilling down shows V1 scores lower on “Empathy” (65) while matching V2 on “Accuracy” (91). Now you know exactly what to improve.
How Scorecards Work
A scorecard contains:
- Categories - Major areas to evaluate (e.g., Communication, Problem Resolution)
- Criteria - Specific things to check within each category (e.g., “Shows empathy”)
- Weights - How important each category is (totals 100%)
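One way to picture that structure (the field names here are illustrative, not the product’s actual schema) is as plain data, with a sanity check that category weights total 100%:

```python
# Illustrative sketch of a scorecard's shape -- not the product's actual schema.
scorecard = {
    "name": "Customer Service",
    "categories": [
        {
            "name": "Communication Quality",
            "weight": 30,  # percent of the overall score
            "criteria": ["Empathy", "Clarity", "Active Listening", "Tone"],
        },
        {
            "name": "Problem Resolution",
            "weight": 50,
            "criteria": ["Issue Identification", "Solution Quality",
                         "Efficiency", "Confirmation"],
        },
        {
            "name": "Compliance",
            "weight": 20,
            "criteria": ["Required Disclosures", "Policy Adherence", "Data Handling"],
        },
    ],
}

def weights_valid(card: dict) -> bool:
    """Category weights must total 100%."""
    return sum(c["weight"] for c in card["categories"]) == 100

assert weights_valid(scorecard)
```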
Creating a Scorecard
You can create a scorecard in the UI or via the API.
Navigate to Scorecards and click “Create Scorecard”:
- Name your scorecard
- Add categories with weights
- Define criteria for each category
- Test on sample calls
- Activate for use
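The API path might look something like the sketch below. The endpoint URL, payload shape, and auth header are all assumptions for illustration; check the API reference for the real contract.

```python
import json
import urllib.request

# Hypothetical endpoint -- replace with the real one from the API reference.
API_URL = "https://api.example.com/v1/scorecards"

def build_payload() -> dict:
    """Scorecard definition: categories with weights, each holding criteria."""
    return {
        "name": "Customer Service",
        "categories": [
            {"name": "Communication Quality", "weight": 30,
             "criteria": [{"name": "Empathy",
                           "description": "Acknowledges customer feelings"}]},
            {"name": "Problem Resolution", "weight": 50,
             "criteria": [{"name": "Solution Quality",
                           "description": "Provides effective resolution"}]},
            {"name": "Compliance", "weight": 20,
             "criteria": [{"name": "Data Handling",
                           "description": "Protects customer information"}]},
        ],
    }

def create_scorecard(api_key: str) -> bytes:
    """POST the scorecard definition (network call; runs only when invoked)."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload()).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```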
Common Scorecard Templates
Customer Service Scorecard
Evaluate support interactions:
Communication Quality (30%)
- Empathy - Acknowledges customer feelings
- Clarity - Uses simple, understandable language
- Active Listening - Reflects back what customer said
- Tone - Maintains friendly, professional demeanor
Problem Resolution (50%)
- Issue Identification - Correctly understands the problem
- Solution Quality - Provides effective resolution
- Efficiency - Resolves in reasonable time
- Confirmation - Verifies customer satisfaction
Compliance (20%)
- Required Disclosures - Provides mandatory information
- Policy Adherence - Follows company guidelines
- Data Handling - Protects customer information
Sales Scorecard
Evaluate sales conversations:
Discovery (25%)
- Needs Assessment - Asks questions to understand requirements
- Budget Qualification - Determines financial fit
- Timeline - Establishes decision timeframe
- Authority - Identifies decision makers
Presentation (30%)
- Value Communication - Explains benefits clearly
- Objection Handling - Addresses concerns effectively
- Proof Points - Provides relevant examples/data
- Customization - Tailors pitch to customer needs
Closing (25%)
- Call to Action - Clear next steps
- Urgency - Creates appropriate motivation
- Commitment - Secures agreement or advance
Compliance (20%)
- Truthfulness - No misleading statements
- Disclosures - Provides required information
- Documentation - Confirms details in writing
Technical Support Scorecard
Evaluate technical assistance:
Technical Accuracy (40%)
- Problem Diagnosis - Identifies root cause
- Solution Correctness - Provides accurate fix
- Technical Knowledge - Demonstrates expertise
Communication (30%)
- Jargon-Free - Explains without technical terms
- Step-by-Step - Provides clear instructions
- Patience - Allows customer time to follow along
Efficiency (30%)
- Resolution Time - Solves within acceptable timeframe
- Tools Used - Leverages available resources
- Escalation - Knows when to involve specialists
Scorecard Categories & Weights
Setting Weights
Weights determine how much each category impacts the overall score. Weight a category highest when it matters most:
- Compliance - when regulatory consequences are severe
- Problem Resolution - for support, where solving issues is the primary goal
- Communication - for brand-sensitive interactions
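To see why weights matter, here is one set of category scores under two illustrative weightings: a support-oriented weighting punishes a weak Problem Resolution score far more than a compliance-oriented one does.

```python
def overall(scores: dict, weights: dict) -> float:
    """Weighted overall score; weights are percentages summing to 100."""
    assert sum(weights.values()) == 100
    return sum(scores[cat] * weights[cat] / 100 for cat in weights)

# Same agent, same category scores:
scores = {"Communication": 90, "Problem Resolution": 70, "Compliance": 95}

# Two illustrative weightings:
support_weights    = {"Communication": 30, "Problem Resolution": 50, "Compliance": 20}
compliance_weights = {"Communication": 20, "Problem Resolution": 30, "Compliance": 50}

print(overall(scores, support_weights))     # 81.0
print(overall(scores, compliance_weights))  # 86.5
```

The weak Problem Resolution score costs this agent 5.5 points of overall score under the support weighting, purely because of where the weight sits.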
Criteria Best Practices
Write Clear, Specific Criteria
❌ Too vague: “Agent was good”
✅ Clear and measurable: “Agent acknowledges the customer’s feelings and restates the issue before proposing a solution”

Include Examples
Add a concrete example of a pass and a fail to each criterion description so the evaluator has a reference point.
Make Them Actionable
Criteria should guide improvement: when a criterion fails, it should be obvious what to change.

Using Scorecards
In Test Scenarios
Assign a scorecard when creating test scenarios so every simulated conversation is evaluated against the same criteria.
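A sketch of what attaching a scorecard to a scenario might look like; the field names are hypothetical, not the product’s actual schema:

```python
# Hypothetical scenario definition -- field names are illustrative.
scenario = {
    "name": "Angry customer, billing dispute",
    "scorecard": "Customer Service",  # evaluate every run with this scorecard
    "persona": "frustrated caller",
    "goal": "refund for duplicate charge",
}

# Registering this scenario (e.g. via a POST to a scenarios endpoint) would
# then score every simulated conversation against "Customer Service".
```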
For Live Call Analysis
Analyze real calls using a scorecard to see how production conversations measure up against the same criteria as your tests.
Bulk Analysis
Score multiple calls at once to benchmark an agent across a time period or between versions.
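Bulk scoring can be pictured as single-call analysis in a loop against an analysis endpoint; the URL and response shape below are assumptions, so consult the API reference for the real interface.

```python
import json
import urllib.request

# Hypothetical endpoint -- adjust to the real API reference.
ANALYZE_URL = "https://api.example.com/v1/calls/{call_id}/analyze"

def analyze_call(call_id: str, scorecard: str, api_key: str) -> dict:
    """Score one call against a scorecard (network call; runs only when invoked)."""
    req = urllib.request.Request(
        ANALYZE_URL.format(call_id=call_id),
        data=json.dumps({"scorecard": scorecard}).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def analyze_many(call_ids, scorecard, api_key):
    """Bulk analysis: map call IDs to their scorecard results."""
    return {cid: analyze_call(cid, scorecard, api_key) for cid in call_ids}
```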
Understanding Scorecard Results

Reading Category Scores
- Overall score (87) is good but has room for improvement
- Communication (92) is a strength
- Problem Resolution (85) drags down the overall score, and its 50% weight amplifies the effect
- Focus improvement efforts on “Solution Provided” criterion
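The arithmetic behind those numbers: with the Customer Service weights above (30/50/20) and an assumed Compliance score of 84.5 (illustrative; the example doesn’t state it), the weighted categories reproduce the overall score of 87.

```python
weights = {"Communication": 0.30, "Problem Resolution": 0.50, "Compliance": 0.20}
scores  = {"Communication": 92,   "Problem Resolution": 85,   "Compliance": 84.5}

# 0.3*92 + 0.5*85 + 0.2*84.5 = 27.6 + 42.5 + 16.9 = 87.0
overall = sum(weights[c] * scores[c] for c in weights)
print(round(overall, 1))  # 87.0
```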
Comparing Across Agents
Refining Scorecards
Iterate Based on Data
After using a scorecard for a while, refine it:

Review Score Distribution
Are most calls scoring 90+? Criteria might be too easy. All below 70? Too strict.
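A quick way to run that sanity check over historical scores; the 90/70 cutoffs mirror the rule of thumb above, and the 80% clustering threshold is an illustrative choice.

```python
def distribution_flags(scores, easy_cutoff=90, strict_cutoff=70):
    """Flag a scorecard whose results cluster suspiciously high or low."""
    n = len(scores)
    share_high = sum(s >= easy_cutoff for s in scores) / n
    share_low = sum(s < strict_cutoff for s in scores) / n
    return {
        "too_easy": share_high > 0.8,   # most calls scoring 90+
        "too_strict": share_low > 0.8,  # most calls under 70
    }

print(distribution_flags([95, 93, 91, 97, 92]))  # {'too_easy': True, 'too_strict': False}
```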
Check Criterion Relevance
Which criteria consistently score well/poorly? Remove ones that don’t differentiate quality.
A/B Test Scorecards
Compare different evaluation approaches: run the same set of calls through two scorecard versions and keep the one whose rankings better match human judgment.

Best Practices
Start Simple
Begin with 3 categories and 2-3 criteria each. Add complexity as you learn what matters.
Troubleshooting
All scores are too high/low
Problem: Every call scores above 90 or below 50.

Solutions:
- Criteria are too lenient/strict - adjust thresholds
- Review criterion descriptions for clarity
- Test scorecard on known good and bad calls
- Consider if weights are appropriate
Scores don't match intuition
Problem: Calls you think are good score low, and vice versa.

Solutions:
- Review which specific criteria are failing
- Criteria descriptions may not capture what you actually value
- Check if AI is misinterpreting criteria
- Provide more explicit examples in criterion descriptions
Can't distinguish between agents
Problem: All agents score similarly on the scorecard.

Solutions:
- Criteria aren’t specific enough to capture differences
- Add more granular criteria
- Check if weights are masking category differences
- May need different scorecards for different agent types
Scoring takes too long
Problem: Analysis is timing out or very slow.

Solutions:
- Reduce number of criteria (aim for <15 total)
- Simplify criterion descriptions
- Contact support for optimization help