Fine-Tuning
Fine-tuning is how you make your agent smarter over time. Instead of just tweaking prompts, you train custom AI models on your actual conversations, teaching the agent to sound naturally like your best performers.
Why Fine-Tuning Matters
Prompts get you 80% of the way there. Fine-tuning gets you the last 20%. It helps you:
- Learn from success - Train on conversations that went well
- Fix recurring issues - Teach the agent to avoid common mistakes
- Match your style - Make the agent sound like your company
- Improve over time - Get better as you collect more data
Example: Your agent handles refunds well but struggles with angry customers. Fine-tune on 100 high-scoring “angry customer” conversations, and the new model de-escalates more naturally without needing a longer prompt.
How Fine-Tuning Works
When to Use Fine-Tuning
Don't Fine-Tune Yet
- You're just starting out
- You have fewer than 100 quality conversations
- You haven't tried prompt optimization
- You need quick improvements
Fine-Tune When
- Prompts alone aren't enough
- You have 100+ high-quality conversations
- You need specific behavioral patterns
- You want long-term improvement
Collecting Training Data
Finding Good Conversations
Look for conversations that score well on your scorecards.
What Makes Good Training Data?
High-Scoring Conversations
Calls that score 85+ on your scorecards. Why: these demonstrate your quality standards.
Diverse Scenarios
A mix of different customer types and situations. Why: this prevents overfitting to one conversation pattern.
Representative of Goals
Conversations that show the behavior you want. Why: the model learns “this is how we do things”.
Clean Outcomes
A clear resolution or successful interaction. Why: ambiguous outcomes confuse the model.
Creating a Training Dataset
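The selection criteria above can be sketched as a simple filtering pass. This is illustrative only: the field names (`score`, `scenario`) and the per-scenario cap are assumptions, not Chanl's actual export schema.

```python
def build_dataset(conversations, min_score=85, per_scenario_cap=50):
    """Keep calls scoring min_score+, capped per scenario to stay diverse.

    Hypothetical sketch: `score` and `scenario` keys are assumed, not
    the platform's real schema.
    """
    counts = {}
    dataset = []
    # Highest-scoring conversations get first claim on each scenario's slots.
    for conv in sorted(conversations, key=lambda c: c["score"], reverse=True):
        if conv["score"] < min_score:
            continue
        scenario = conv["scenario"]
        if counts.get(scenario, 0) >= per_scenario_cap:
            continue  # avoid overfitting to one conversation pattern
        counts[scenario] = counts.get(scenario, 0) + 1
        dataset.append(conv)
    return dataset
```

The cap enforces the “diverse scenarios” criterion mechanically: no single conversation type can dominate the dataset.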
Starting a Fine-Tuning Job
- Via UI
- Via API
- Navigate to Fine-Tuning in sidebar
- Click “Create Training Job”
- Select training dataset
- Choose base model (GPT-4, Claude, etc.)
- Set training parameters
- Review data privacy settings
- Start training
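For the “Via API” path, the request body would bundle the same choices as the UI steps. The endpoint shape, field names, and default hyperparameters below are assumptions for illustration, not Chanl's documented API.

```python
import json

def make_training_job_payload(dataset_id, base_model="gpt-4",
                              epochs=3, learning_rate=1e-5, batch_size=8):
    """Assemble a JSON body you might POST to a fine-tuning jobs endpoint.

    Hypothetical sketch: field names and defaults are assumptions.
    """
    return json.dumps({
        "dataset_id": dataset_id,
        "base_model": base_model,
        "hyperparameters": {
            "epochs": epochs,
            "learning_rate": learning_rate,
            "batch_size": batch_size,
        },
    })
```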
Training Parameters
Monitoring Training Progress
What the Metrics Mean
Training Loss
How well the model fits the training data. Lower is better.
Target: <0.5
Validation Loss
How well the model generalizes to new data. Lower is better.
Should be close to training loss
Accuracy
Percentage of correct predictions. Higher is better.
Target: >0.85
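The “validation loss should be close to training loss” rule can be turned into a simple overfitting check. A hedged heuristic only; the gap and patience thresholds are illustrative, not values Chanl prescribes.

```python
def should_stop_early(train_losses, val_losses, gap=0.15, patience=2):
    """Flag overfitting: validation loss drifting above training loss
    for `patience` consecutive epochs. Thresholds are illustrative."""
    streak = 0
    for t, v in zip(train_losses, val_losses):
        if v - t > gap:
            streak += 1
            if streak >= patience:
                return True  # model is memorizing, not generalizing
        else:
            streak = 0  # gap closed again; reset the counter
    return False
```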
Testing Fine-Tuned Models
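One simple baseline check is to compare per-scenario scorecard averages and flag regressions. A sketch under assumptions: the scenario names, score scale, and tolerance are illustrative, not Chanl's built-in comparison.

```python
def find_regressions(baseline_scores, finetuned_scores, tolerance=2.0):
    """List scenarios where the fine-tuned model underperforms the
    baseline by more than `tolerance` scorecard points.

    Inputs are hypothetical {scenario: average_score} dicts.
    """
    return [scenario
            for scenario, base in baseline_scores.items()
            if finetuned_scores.get(scenario, 0.0) < base - tolerance]
```

An empty result does not prove the new model is better, only that no scenario got markedly worse; edge cases still deserve manual review.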
Before deploying, test the fine-tuned model against the baseline on the same scorecards.
Deploying Fine-Tuned Models
Gradual Rollout
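One common way to implement the split is deterministic hash-based routing: the same conversation always hits the same model, and the percentage can be raised without reshuffling users. The ID format and SHA-256 bucketing are assumptions, not Chanl's actual routing mechanism.

```python
import hashlib

def routes_to_finetuned(conversation_id: str, rollout_pct: int = 10) -> bool:
    """Deterministically send `rollout_pct`% of conversations to the
    fine-tuned model; the same ID always gets the same route."""
    # Hash the ID into a stable bucket in [0, 100).
    bucket = int(hashlib.sha256(conversation_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_pct
```

Raising `rollout_pct` from 10 to 50 only adds traffic; every conversation already on the fine-tuned model stays there.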
Start with a small percentage of traffic and increase gradually as metrics hold.
A/B Testing
Run both models simultaneously and compare results.
Monitoring After Deployment
Continuous Improvement
Collecting More Data
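Selecting fresh examples for the next training round can be sketched as below. The field names (`id`, `score`), the score threshold, and the batch size for triggering a retrain are all illustrative assumptions.

```python
def collect_for_retraining(recent, existing_ids, min_score=85, batch=100):
    """Pick fresh high scorers not already used in training; signal a
    retrain once roughly `batch` new examples have accumulated.

    Hypothetical sketch: field names and thresholds are assumptions.
    """
    fresh = [c for c in recent
             if c["score"] >= min_score and c["id"] not in existing_ids]
    return fresh, len(fresh) >= batch
```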
Keep training on new high-performing conversations.
Fine-Tuning Use Cases
Customer Service Excellence
Sales Optimization
Compliance & Accuracy
Best Practices
Start with Prompt Optimization
Fine-tuning is powerful but slow. Get prompts working well first, then fine-tune for the extra edge.
Collect Diverse Examples
Don’t just train on one type of conversation. Mix scenarios, personas, and outcomes.
Use Enough Data
Minimum 100 conversations for meaningful results. 500+ is better. More diverse data beats more of the same.
Test Thoroughly Before Deploying
Run extensive scenarios comparing fine-tuned vs baseline. Look for any regressions in edge cases.
Monitor in Production
Watch real performance closely for the first week after deployment. Be ready to roll back if needed.
Data Privacy & Security
Chanl automatically removes common PII during training:
- Credit card numbers
- Social security numbers
- Email addresses
- Phone numbers
- Specific account numbers
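As a rough illustration of what a redaction pass for the PII types above might look like, here is a regex-based sketch. These patterns are simplified assumptions; a production pipeline (including Chanl's) would be considerably more robust.

```python
import re

# Illustrative patterns only; simplified and not exhaustive.
PII_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
}

def redact(text):
    """Replace each PII match with a labeled placeholder.

    Order matters: longer patterns (credit cards) run before shorter
    ones (phones) so a card number is not partially matched as a phone.
    """
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text
```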
Troubleshooting
Training failing or stuck
Problem: Training job not completing
Solutions:
- Check dataset has at least 50 conversations
- Verify all calls in dataset are accessible
- Reduce batch size if hitting memory limits
- Contact support if stuck for >24 hours
Model performing worse than baseline
Problem: Fine-tuned model scores lower
Investigate:
- Did you train on diverse enough data?
- Are training examples actually high-quality?
- Did you overtrain (too many epochs)?
- Test on validation set - is it overfitting?
- Compare on same scenarios as training data
Model too similar to baseline
Problem: No measurable improvement
Solutions:
- Use more training examples (aim for 200+)
- Increase learning rate slightly
- Add more epochs (try 4-5)
- Ensure training data is different from what base model already does well
Model behaving inconsistently
Problem: Works well sometimes, poorly at other times
Solutions:
- Training data may have conflicting examples
- Review dataset for contradictory conversations
- Add more examples of the edge cases
- Consider separate models for different use cases
Cost Considerations
Fine-tuning costs depend on:
- Training data size - More conversations = higher cost
- Base model - GPT-4 is more expensive than GPT-3.5
- Training duration - More epochs = more compute time
- Inference - Fine-tuned models may cost more per call
Typical costs:
- Training: $50-500 per job depending on size
- Inference: 10-50% more per call than base model
- ROI: Usually positive if improvement >5% on key metrics
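The break-even logic above can be put into back-of-envelope arithmetic. Every quantity here is illustrative: the dollar-value-per-point figure in particular is something you would estimate for your own business, not a Chanl-provided number.

```python
def finetune_roi(training_cost, calls_per_month, extra_cost_per_call,
                 monthly_value_per_point, improvement_points, months=12):
    """Back-of-envelope ROI over an evaluation window (illustrative only):
    value from the metric lift minus training and extra inference cost."""
    extra_inference = calls_per_month * extra_cost_per_call * months
    gain = monthly_value_per_point * improvement_points * months
    return gain - (training_cost + extra_inference)
```

For example, a $300 job with 10,000 calls/month, $0.001 extra per call, and a 5-point lift worth $100/point/month nets a positive return over a year; with a lift near zero, the same costs make the return negative.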
What’s Next?
Optimize Agents
Configure agents to use your fine-tuned models
Test Scenarios
Validate fine-tuned model performance
Monitor Analytics
Track improvement from fine-tuning
Improve Prompts
Combine fine-tuning with prompt optimization