
Fine-Tuning

Fine-tuning is how you make your agent smarter over time. Instead of just tweaking prompts, you train custom AI models on your actual conversations—teaching the agent to naturally sound like your best performers.

Why Fine-Tuning Matters

Prompts get you 80% of the way there. Fine-tuning gets you the last 20%. It helps you:
  • Learn from success - Train on conversations that went well
  • Fix recurring issues - Teach the agent to avoid common mistakes
  • Match your style - Make the agent sound like your company
  • Improve over time - Get better as you collect more data
Example: Your agent handles refunds well but struggles with angry customers. Fine-tune on 100 high-scoring “angry customer” conversations, and the new model de-escalates more naturally without needing a longer prompt.

How Fine-Tuning Works

Collect Best Conversations → Clean & Prepare → Train Custom Model → Test → Deploy
Think of it like training a new employee by having them shadow your best performer. The AI learns patterns from successful conversations and applies them going forward.

When to Use Fine-Tuning

Don't Fine-Tune Yet

  • You’re just starting out
  • You have fewer than 100 quality conversations
  • You haven’t tried prompt optimization
  • You need quick improvements
Instead: Focus on prompts and tools first

Fine-Tune When

  • Prompts alone aren’t enough
  • You have 100+ high-quality conversations
  • You need specific behavioral patterns
  • You want long-term improvement
Result: Better base performance across all conversations
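The two checklists above can be collapsed into a quick go/no-go gate. This is an illustrative helper, not part of the Chanl SDK:

```javascript
// Hypothetical helper (not a Chanl SDK call): encode the checklists above
// as a single readiness decision for starting a fine-tuning project.
function readyToFineTune({ qualityConversations, promptOptimizationTried, needsBehavioralChange }) {
  return (
    qualityConversations >= 100 && // enough high-quality training examples
    promptOptimizationTried &&     // prompts and tools already pushed as far as they go
    needsBehavioralChange          // a pattern prompts alone cannot fix
  );
}
```

A team with 40 quality conversations should keep iterating on prompts, even if everything else lines up.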

Collecting Training Data

Finding Good Conversations

Look for conversations that score well on your scorecards:
# Get high-scoring calls for training
curl "https://api.chanl.ai/v1/call-logs?minScore=90&limit=100" \
  -H "Authorization: Bearer YOUR_API_KEY"
{
  "calls": [
    {
      "id": "call_abc123",
      "score": 94,
      "category": "customer-service",
      "duration": 183,
      "outcome": "resolved",
      "tags": ["refund", "empathy", "quick-resolution"]
    }
  ]
}

What Makes Good Training Data?

  • Calls that score 85+ on your scorecards. Why: these demonstrate your quality standards.
  • A mix of different customer types and situations. Why: prevents overfitting to one conversation pattern.
  • Conversations that show the behavior you want. Why: the model learns “this is how we do things.”
  • A clear resolution or successful interaction. Why: ambiguous outcomes confuse the model.
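These criteria are easy to enforce locally before building a dataset. A minimal sketch, assuming call objects shaped like the call-logs response above (`score`, `outcome`, `tags` fields):

```javascript
// Pre-filter candidate training calls client-side. The thresholds mirror
// the guidance above; adjust them to match your own scorecards.
function selectTrainingCalls(calls, { minScore = 85, requiredOutcome = 'resolved' } = {}) {
  return calls.filter(c => c.score >= minScore && c.outcome === requiredOutcome);
}
```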

Creating a Training Dataset

const chanl = require('@chanl/sdk');

async function buildTrainingDataset() {
  // Get high-scoring calls from last 30 days
  const highScoring = await chanl.callLogs.list({
    minScore: 85,
    days: 30,
    limit: 200
  });

  // Get diverse scenario coverage
  const scenarios = ['refund', 'billing', 'technical', 'general'];
  const trainingData = [];

  for (const scenario of scenarios) {
    const calls = highScoring.calls.filter(c => c.tags.includes(scenario));

    // Take top 25 from each scenario
    trainingData.push(...calls.slice(0, 25));
  }

  // Create dataset
  const dataset = await chanl.fineTuning.createDataset({
    name: 'Customer Service Excellence Q1 2024',
    description: 'Top performing customer service calls',
    callIds: trainingData.map(c => c.id),
    targetBehaviors: [
      'empathetic_responses',
      'problem_resolution',
      'professional_tone'
    ]
  });

  console.log(`Created dataset with ${trainingData.length} conversations`);
  return dataset;
}

Starting a Fine-Tuning Job

  1. Navigate to Fine-Tuning in sidebar
  2. Click “Create Training Job”
  3. Select training dataset
  4. Choose base model (GPT-4, Claude, etc.)
  5. Set training parameters
  6. Review data privacy settings
  7. Start training

Training Parameters

{
  "epochs": 3,
  // How many times to train on full dataset
  // More epochs = more learning, but risk overfitting
  // Typical: 2-4 epochs

  "learningRate": 0.0001,
  // How much to adjust model each step
  // Lower = more conservative, safer
  // Typical: 0.0001 - 0.001

  "batchSize": 4,
  // Conversations processed together
  // Larger = faster training, more memory
  // Typical: 4-8

  "validationSplit": 0.2
  // Percentage held back for testing
  // Typical: 0.15 - 0.25 (15-25%)
}
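To see what these parameters imply in practice, you can work out how many examples and update steps a job will actually run. Illustrative arithmetic only, not a Chanl API:

```javascript
// Derive the training workload from the parameters above: the validation
// split is held out, and each epoch processes the rest in batches.
function trainingPlan({ datasetSize, epochs, batchSize, validationSplit }) {
  const trainExamples = Math.floor(datasetSize * (1 - validationSplit));
  const stepsPerEpoch = Math.ceil(trainExamples / batchSize);
  return { trainExamples, stepsPerEpoch, totalSteps: stepsPerEpoch * epochs };
}

// 100 conversations with a 0.2 split leaves 80 training examples:
// 20 steps per epoch at batch size 4, 60 steps across 3 epochs.
const plan = trainingPlan({ datasetSize: 100, epochs: 3, batchSize: 4, validationSplit: 0.2 });
```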

Monitoring Training Progress

# Check training status
curl https://api.chanl.ai/v1/fine-tuning/jobs/ft_job_xyz789 \
  -H "Authorization: Bearer YOUR_API_KEY"
{
  "jobId": "ft_job_xyz789",
  "status": "training",
  "progress": 67,
  "currentEpoch": 2,
  "totalEpochs": 3,
  "metrics": {
    "trainingLoss": 0.23,
    "validationLoss": 0.31,
    "estimatedAccuracy": 0.89
  },
  "estimatedCompletion": "2024-01-15T17:30:00Z"
}
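Rather than polling by hand, you can wrap the status check in a small loop. A sketch with the HTTP call injected as a function; the terminal status names (`succeeded`, `failed`) are assumptions to adapt to whatever the jobs endpoint actually returns:

```javascript
// Poll a fine-tuning job until it reaches a terminal state.
// `fetchStatus` stands in for your HTTP call to the jobs endpoint above.
async function waitForTraining(fetchStatus, { intervalMs = 30000, maxPolls = 200 } = {}) {
  for (let i = 0; i < maxPolls; i++) {
    const job = await fetchStatus();
    if (job.status === 'succeeded' || job.status === 'failed') return job;
    console.log(`epoch ${job.currentEpoch}/${job.totalEpochs}, ${job.progress}% complete`);
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error('training job did not finish within the polling budget');
}
```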

What the Metrics Mean

Training Loss

How well the model fits the training data. Lower is better. Target: &lt;0.5

Validation Loss

How well the model generalizes to new data. Lower is better. Should be close to training loss.

Accuracy

Percentage of correct predictions. Higher is better. Target: &gt;0.85
If validation loss is much higher than training loss, your model is overfitting (memorizing training data instead of learning patterns). Use more diverse training data or fewer epochs.
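That rule of thumb is easy to automate when watching a job. The 0.15 gap threshold below is an assumption; tune it against your own runs:

```javascript
// Flag a job as overfitting when validation loss pulls away from training loss.
function isOverfitting({ trainingLoss, validationLoss }, gapThreshold = 0.15) {
  return validationLoss - trainingLoss > gapThreshold;
}

// The example job above (0.23 training / 0.31 validation) is still healthy.
isOverfitting({ trainingLoss: 0.23, validationLoss: 0.31 }); // false
```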

Testing Fine-Tuned Models

Before deploying, test against baseline:
const chanl = require('@chanl/sdk');

async function compareModels(fineTunedModelId, baselineAgentId) {
  // Create test agent with fine-tuned model
  const testAgent = await chanl.agents.create({
    name: 'Fine-Tuned Test Agent',
    modelId: fineTunedModelId,
    prompt: 'Use the same prompt as baseline',
    tools: ['same tools as baseline']
  });

  // Run comparison scenarios
  const comparison = await chanl.scenarios.create({
    name: 'Fine-Tuned vs Baseline',
    prompt: 'Customer service scenarios',
    personas: ['polite', 'frustrated', 'confused'],
    agents: [testAgent.id, baselineAgentId],
    scorecard: 'customer-service-quality'
  });

  const results = await chanl.scenarios.waitForCompletion(comparison.id);

  return {
    fineTunedScore: results.agents[testAgent.id].avgScore,
    baselineScore: results.agents[baselineAgentId].avgScore,
    improvement: results.agents[testAgent.id].avgScore - results.agents[baselineAgentId].avgScore,
    recommendation: results.agents[testAgent.id].avgScore > results.agents[baselineAgentId].avgScore + 5
      ? 'Deploy fine-tuned model'
      : 'Needs more training or data'
  };
}

const results = await compareModels('model_ft_abc', 'agent_baseline');
console.log(results);
/*
{
  fineTunedScore: 91,
  baselineScore: 84,
  improvement: +7,
  recommendation: 'Deploy fine-tuned model'
}
*/

Deploying Fine-Tuned Models

Gradual Rollout

Start with a small percentage of traffic:
# Deploy to 10% of calls initially
curl -X POST https://api.chanl.ai/v1/agents/agent_abc123/model \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "model_ft_xyz789",
    "rolloutStrategy": {
      "type": "percentage",
      "percentage": 10
    }
  }'
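One way percentage rollouts like this are commonly made sticky is by hashing the caller ID into 100 buckets, so the same customer always reaches the same model. This illustrates the idea; it is not necessarily how Chanl implements the strategy internally:

```javascript
// Deterministically assign each caller to the new or old model.
// The same caller ID always lands in the same bucket, so experiences are stable.
function modelForCall(callerId, rolloutPercentage, newModel, oldModel) {
  let hash = 0;
  for (const ch of callerId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit rolling hash
  }
  return hash % 100 < rolloutPercentage ? newModel : oldModel;
}
```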

A/B Testing

Run both models simultaneously:
# 50% old model, 50% new model
curl -X POST https://api.chanl.ai/v1/agents/agent_abc123/ab-test \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "modelA": "model_baseline",
    "modelB": "model_ft_xyz789",
    "split": 50,
    "duration": "7d"
  }'

Monitoring After Deployment

const chanl = require('@chanl/sdk');

// Monitor performance of fine-tuned model
const performance = await chanl.agents.analytics('agent_abc123', {
  compareModels: ['model_baseline', 'model_ft_xyz789'],
  timeRange: '7d',
  metrics: ['avgScore', 'successRate', 'escalationRate']
});

console.log('Performance Comparison:');
console.log('Baseline:', performance.models.model_baseline);
console.log('Fine-tuned:', performance.models.model_ft_xyz789);

if (performance.models.model_ft_xyz789.avgScore < performance.models.model_baseline.avgScore) {
  console.log('⚠️ Fine-tuned model underperforming. Consider rollback.');
}

Continuous Improvement

Collecting More Data

Keep training on new high-performing conversations:
const chanl = require('@chanl/sdk');

// Automated training pipeline
async function continuousImprovement() {
  // Every month, collect new high-scoring calls
  const newCalls = await chanl.callLogs.list({
    minScore: 90,
    startDate: '2024-01-01',
    endDate: '2024-01-31'
  });

  // Add to existing dataset
  await chanl.fineTuning.updateDataset('dataset_abc123', {
    addCallIds: newCalls.calls.map(c => c.id)
  });

  // Retrain model
  const newJob = await chanl.fineTuning.createJob({
    name: `Customer Service Model ${new Date().toISOString().slice(0, 7)}`,
    datasetId: 'dataset_abc123',
    baseModel: 'model_ft_xyz789', // Train on top of previous fine-tuned model
    parameters: {
      epochs: 2,
      learningRate: 0.00005 // Lower rate for refinement
    }
  });

  return newJob;
}

Fine-Tuning Use Cases

Customer Service Excellence

// Train model to handle frustrated customers better
const dataset = await chanl.fineTuning.createDataset({
  name: 'De-escalation Training',
  callIds: (await chanl.callLogs.search({
    tags: ['frustrated', 'angry'],
    minScore: 88,
    outcome: 'resolved'
  })).map(c => c.id),
  targetBehaviors: [
    'acknowledge_emotion_first',
    'apologize_when_appropriate',
    'focus_on_solution',
    'never_defensive'
  ]
});

Sales Optimization

// Train model on successful sales conversations
const dataset = await chanl.fineTuning.createDataset({
  name: 'High-Converting Sales Calls',
  callIds: (await chanl.callLogs.search({
    outcome: 'sale',
    minScore: 85
  })).map(c => c.id),
  targetBehaviors: [
    'needs_discovery',
    'objection_handling',
    'value_proposition',
    'closing_techniques'
  ]
});

Compliance & Accuracy

// Train model to follow compliance requirements perfectly
const dataset = await chanl.fineTuning.createDataset({
  name: 'Compliance Perfect Calls',
  callIds: (await chanl.callLogs.search({
    scorecard: 'compliance-tcpa',
    minScore: 98
  })).map(c => c.id),
  targetBehaviors: [
    'required_disclosures',
    'consent_collection',
    'policy_adherence'
  ]
});

Best Practices

1. Start with Prompt Optimization

Fine-tuning is powerful but slow. Get prompts working well first, then fine-tune for the extra edge.

2. Collect Diverse Examples

Don’t just train on one type of conversation. Mix scenarios, personas, and outcomes.

3. Use Enough Data

You need a minimum of 100 conversations for meaningful results; 500+ is better. More diverse data beats more of the same.

4. Test Thoroughly Before Deploying

Run extensive scenarios comparing the fine-tuned model against the baseline. Look for any regressions in edge cases.

5. Monitor in Production

Watch real performance closely for the first week after deployment. Be ready to roll back if needed.

6. Retrain Regularly

Every month or quarter, add new high-quality conversations and retrain. Models improve with fresh data.

Data Privacy & Security

Important: Fine-tuning uses real conversation data. Ensure:
  • PII is removed or anonymized
  • You have rights to use the data
  • Customers consented (if required)
  • Data is encrypted at rest and in transit
  • Compliance with GDPR, CCPA, etc.
Chanl automatically removes common PII during training:
  • Credit card numbers
  • Social security numbers
  • Email addresses
  • Phone numbers
  • Specific account numbers
But always review your data before training.
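For that review, a local pass over transcripts can catch the obvious formats before anything is uploaded. A minimal sketch; these regexes only handle common US-style patterns and are no substitute for a manual review or a dedicated redaction service:

```javascript
// Redact obvious PII patterns from a transcript string.
function redactPII(text) {
  return text
    .replace(/\b(?:\d[ -]*?){13,16}\b/g, '[CARD]')                  // card-like digit runs
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]')                     // US SSN format
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[EMAIL]')                 // email addresses
    .replace(/\b\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b/g, '[PHONE]');  // US phone numbers
}
```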

Troubleshooting

Problem: Training job not completing
Solutions:
  • Check dataset has at least 50 conversations
  • Verify all calls in dataset are accessible
  • Reduce batch size if hitting memory limits
  • Contact support if stuck for >24 hours
Problem: Fine-tuned model scores lower
Investigate:
  • Did you train on diverse enough data?
  • Are training examples actually high-quality?
  • Did you overtrain (too many epochs)?
  • Test on validation set - is it overfitting?
  • Compare on same scenarios as training data
Problem: No measurable improvement
Solutions:
  • Use more training examples (aim for 200+)
  • Increase learning rate slightly
  • Add more epochs (try 4-5)
  • Ensure training data is different from what base model already does well
Problem: Works well sometimes, poorly others
Solutions:
  • Training data may have conflicting examples
  • Review dataset for contradictory conversations
  • Add more examples of the edge cases
  • Consider separate models for different use cases

Cost Considerations

Fine-tuning costs depend on:
  • Training data size - More conversations = higher cost
  • Base model - GPT-4 is more expensive than GPT-3.5
  • Training duration - More epochs = more compute time
  • Inference - Fine-tuned models may cost more per call
Typical costs:
  • Training: $50-500 per job depending on size
  • Inference: 10-50% more per call than base model
  • ROI: Usually positive if improvement >5% on key metrics
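Those rules of thumb can be turned into simple break-even arithmetic. All the example figures below are assumptions to replace with your own numbers:

```javascript
// Estimate monthly net benefit and payback period for a fine-tuned model.
function fineTuningROI({ trainingCost, callsPerMonth, baseCostPerCall, inferencePremium, valuePerPointPerCall, scoreImprovement }) {
  const extraInference = callsPerMonth * baseCostPerCall * inferencePremium;   // added per-call cost
  const monthlyGain = callsPerMonth * valuePerPointPerCall * scoreImprovement; // value of the score lift
  const netMonthly = monthlyGain - extraInference;
  return { netMonthly, breakEvenMonths: netMonthly > 0 ? trainingCost / netMonthly : Infinity };
}

// Example: $500 training job, 10k calls/month, $0.10 base cost per call with
// a 25% inference premium, and a 7-point score lift worth $0.01/point/call.
const roi = fineTuningROI({
  trainingCost: 500,
  callsPerMonth: 10000,
  baseCostPerCall: 0.10,
  inferencePremium: 0.25,
  valuePerPointPerCall: 0.01,
  scoreImprovement: 7
});
```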

What’s Next?