Performance metrics
Performance metrics are about quality. Where the Overview tells you how much, this page tells you how well.
In this guide:
- Latency (response time)
- Accuracy (improve-rate)
- Resolution and escalation
- Agent SLAs
- Drill-downs and tips
Latency: how fast does the bot reply?
The top section shows:
- p50 latency: median time from a user message to the bot’s reply.
- p95 latency: the 95th percentile; 95% of replies arrive within this time. This is the “long tail” users notice most.
- By model: both figures broken down by which LLM generated the reply.
Screenshot: Performance charts showing latency distributions per model.
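If you log raw reply latencies yourself and want to sanity-check these numbers, here is a minimal Python sketch. It assumes a hypothetical list of (model, latency in seconds) pairs, one per bot reply, and uses the nearest-rank percentile definition. The dashboard may interpolate differently, so expect small differences. `sluggish_models` applies the 5-second p95 guideline described below.

```python
import math
from collections import defaultdict


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("percentile of an empty sample list")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def latency_by_model(replies):
    """replies: iterable of (model, latency_seconds) pairs (hypothetical log format)."""
    per_model = defaultdict(list)
    for model, latency in replies:
        per_model[model].append(latency)
    return {
        model: {"p50": percentile(values, 50), "p95": percentile(values, 95)}
        for model, values in per_model.items()
    }


def sluggish_models(stats, threshold_s=5.0):
    """Models whose p95 exceeds the 5-second guideline."""
    return sorted(model for model, s in stats.items() if s["p95"] > threshold_s)


# Example: model "a" has replies at 1..20 s, model "b" a single 2 s reply.
stats = latency_by_model([("a", float(i)) for i in range(1, 21)] + [("b", 2.0)])
print(stats)                   # {'a': {'p50': 10.0, 'p95': 19.0}, 'b': {'p50': 2.0, 'p95': 2.0}}
print(sluggish_models(stats))  # ['a']
```

Notice that model "a" has a comfortable median but a slow tail, which is exactly the case p95 is meant to catch.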
If p95 is above 5 seconds, your bot feels sluggish. Common fixes:
- Switch to a faster model in Playground.
- Reduce context — prune unused knowledge sources.
- Check for slow AI actions inflating round-trip time.
Accuracy: how often is the bot right?
Three signals:
- Improve rate: flags raised through Improve responses, per 100 conversations. Higher means the bot is wrong more often.
- CSAT correlation: average CSAT for bot-only vs. agent-handled conversations. Useful for justifying the bot’s value.
- Refusal rate: % of replies that say “I don’t know.” Healthy is 5–15%; over 25% usually means the bot needs more knowledge sources.
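The three accuracy signals reduce to simple ratios. This sketch assumes hypothetical records: a flag count and conversation count for the improve rate, replies carrying an `is_refusal` field, and conversations with `csat` (None if unrated) and `escalated` fields. It treats any conversation that never escalated as bot-only. The refusal bands mirror the guidance above; this page doesn’t name a band below 5% or between 15% and 25%, so the sketch reports those as outside the healthy range.

```python
from statistics import mean


def improve_rate_per_100(flags, conversations):
    """Improve-response flags per 100 conversations."""
    return 100 * flags / conversations if conversations else 0.0


def refusal_rate(replies):
    """% of bot replies marked as refusals. `is_refusal` is a hypothetical field."""
    if not replies:
        return 0.0
    return 100 * sum(1 for r in replies if r["is_refusal"]) / len(replies)


def refusal_band(rate_pct):
    """Bands from this page: 5–15% healthy, over 25% means more knowledge is needed."""
    if 5 <= rate_pct <= 15:
        return "healthy"
    if rate_pct > 25:
        return "needs more knowledge"
    return "outside healthy range"  # below 5%, or between 15% and 25%


def csat_by_handler(conversations):
    """Average CSAT for bot-only (never escalated) vs. agent-handled conversations."""
    scores = {"bot_only": [], "agent_handled": []}
    for c in conversations:
        if c["csat"] is None:  # skip conversations without a rating
            continue
        scores["agent_handled" if c["escalated"] else "bot_only"].append(c["csat"])
    return {group: (mean(vals) if vals else None) for group, vals in scores.items()}


print(improve_rate_per_100(6, 200))  # 3.0
rate = refusal_rate([{"is_refusal": True}] + [{"is_refusal": False}] * 9)
print(rate, refusal_band(rate))      # 10.0 healthy
```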
Resolution and escalation
- Resolution rate: % of conversations closed without escalation.
- Escalation rate: % of conversations handed to humans.
- Reopen rate: % of solved conversations the customer reopens within 7 days. A high reopen rate means answers weren’t actually solving the problem.
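A sketch of the resolution, escalation, and reopen calculations, assuming each conversation is a dict with hypothetical `escalated`, `closed_at`, and `reopened_at` fields (`None` when not applicable). Conversations that are still open and never escalated count toward the total but toward neither rate, so resolution and escalation need not add up to 100%.

```python
from datetime import datetime, timedelta

REOPEN_WINDOW = timedelta(days=7)


def outcome_rates(conversations):
    """
    Resolution, escalation, and reopen rates as percentages.
    Each conversation is a dict with hypothetical fields:
      escalated (bool), closed_at (datetime it was first solved, or None),
      reopened_at (datetime or None).
    """
    total = len(conversations)
    if total == 0:
        return {"resolution": 0.0, "escalation": 0.0, "reopen": 0.0}
    escalated = sum(1 for c in conversations if c["escalated"])
    closed = [c for c in conversations if c["closed_at"] is not None]
    resolved = sum(1 for c in closed if not c["escalated"])  # closed without escalation
    reopened = sum(
        1
        for c in closed
        if c["reopened_at"] is not None
        and c["reopened_at"] - c["closed_at"] <= REOPEN_WINDOW
    )
    return {
        "resolution": 100 * resolved / total,
        "escalation": 100 * escalated / total,
        "reopen": 100 * reopened / len(closed) if closed else 0.0,
    }


t0 = datetime(2024, 1, 1)
print(outcome_rates([
    {"escalated": False, "closed_at": t0, "reopened_at": None},
    {"escalated": False, "closed_at": t0, "reopened_at": t0 + timedelta(days=3)},
    {"escalated": False, "closed_at": t0, "reopened_at": t0 + timedelta(days=10)},
    {"escalated": True, "closed_at": t0, "reopened_at": None},
]))  # {'resolution': 75.0, 'escalation': 25.0, 'reopen': 25.0}
```

The reopen after 10 days falls outside the 7-day window, so only one of the four solved conversations counts as reopened.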
As a rough benchmark, a well-performing chatbot has:
- A 60–80% resolution rate (the rest escalate).
- A reopen rate under 10%.
- An improve rate under 3 per 100 conversations.
These are guidelines, not laws — your customer mix matters.
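To check your own numbers against these guidelines, a small lookup is enough. The metric names and the dict shape are hypothetical; adjust the bounds to fit your customer mix.

```python
# Guideline bounds from this page; metric names are hypothetical.
GUIDELINES = {
    "resolution_pct": lambda v: 60 <= v <= 80,
    "reopen_pct": lambda v: v < 10,
    "improve_per_100": lambda v: v < 3,
}


def check_guidelines(metrics):
    """Map each known metric to True if it falls within the guideline."""
    return {name: within(metrics[name]) for name, within in GUIDELINES.items() if name in metrics}


print(check_guidelines({"resolution_pct": 75.0, "reopen_pct": 12.0, "improve_per_100": 2.5}))
# {'resolution_pct': True, 'reopen_pct': False, 'improve_per_100': True}
```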
Agent SLAs (live conversations)
For escalated conversations:
- First response time: time from bot escalation to the first agent message. The SLA target is configurable per plan.
- Full resolution time: time from escalation to close.
- SLA hit rate: % of conversations meeting the SLA, color-coded green/yellow/red.
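The SLA figures can be reproduced from escalation timestamps. This sketch assumes hypothetical `escalated_at`, `first_agent_reply_at`, and `closed_at` fields, reports medians in seconds, and scores only conversations that have received a first agent reply. The green/yellow cutoffs (90% and 75%) are placeholders, not the product’s actual thresholds.

```python
from datetime import datetime, timedelta
from statistics import median


def sla_report(conversations, target, green_at=90.0, yellow_at=75.0):
    """
    SLA figures for escalated conversations. Hypothetical fields per conversation:
      escalated_at, first_agent_reply_at (None if unanswered), closed_at (None if open).
    green_at / yellow_at are placeholder cutoffs for the color coding.
    """
    answered = [c for c in conversations if c["first_agent_reply_at"] is not None]
    first_response_s = [
        (c["first_agent_reply_at"] - c["escalated_at"]).total_seconds() for c in answered
    ]
    resolution_s = [
        (c["closed_at"] - c["escalated_at"]).total_seconds()
        for c in conversations
        if c["closed_at"] is not None
    ]
    met = sum(1 for s in first_response_s if s <= target.total_seconds())
    hit_rate = 100 * met / len(answered) if answered else 0.0
    if hit_rate >= green_at:
        color = "green"
    elif hit_rate >= yellow_at:
        color = "yellow"
    else:
        color = "red"
    return {
        "median_first_response_s": median(first_response_s) if first_response_s else None,
        "median_resolution_s": median(resolution_s) if resolution_s else None,
        "sla_hit_rate": hit_rate,
        "color": color,
    }


t0 = datetime(2024, 1, 1, 9, 0)


def at(minutes):
    """Timestamp `minutes` after t0 (helper for the example)."""
    return t0 + timedelta(minutes=minutes)


report = sla_report(
    [
        {"escalated_at": t0, "first_agent_reply_at": at(2), "closed_at": at(30)},
        {"escalated_at": t0, "first_agent_reply_at": at(4), "closed_at": at(60)},
        {"escalated_at": t0, "first_agent_reply_at": at(10), "closed_at": None},
        {"escalated_at": t0, "first_agent_reply_at": None, "closed_at": None},
    ],
    target=timedelta(minutes=5),
)
print(report["color"])  # red: 2 of 3 answered conversations met the 5-minute target
```

A stricter report would also count unanswered conversations that are already past the target as misses.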
Configure SLAs in Settings → SLAs. Track misses to identify staffing gaps.
Drill-downs
Each metric supports drill-down:
- Click a chart segment → filtered conversation list.
- Open a conversation → see what happened, learn from it.
Tips
- Look at p95, not just p50. The median hides the slowest replies: by definition, half of all replies are slower than p50, and p95 shows how slow the worst 5% are.
- Judge improve rate by its trend over weeks. Don’t react to a single daily spike.
- Cross-reference accuracy with knowledge changes. A jump in improve rate right after a Guidelines edit suggests the edit hurt; consider reverting it.
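The last two tips are easy to automate. The sketch below assumes a hypothetical daily export of (date, improve flags, conversations) rows. `weekly_improve_rate` sums flags and conversations per Monday-starting week before dividing, so busy days carry more weight than quiet ones; `before_after` compares the rate in equal windows on either side of a Guidelines edit. A rise is a prompt to investigate, not proof that the edit caused it.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_improve_rate(daily):
    """daily: (date, improve_flags, conversations) rows (hypothetical export). Rate per 100 by week."""
    flags, convs = defaultdict(int), defaultdict(int)
    for day, f, c in daily:
        week_start = day - timedelta(days=day.weekday())  # Monday of that week
        flags[week_start] += f
        convs[week_start] += c
    # Sum first, then divide, so high-traffic days weigh more than quiet ones.
    return {w: 100 * flags[w] / convs[w] for w in sorted(convs) if convs[w]}


def before_after(daily, edit_day, window_days=14):
    """Improve rate per 100 in the window before edit_day vs. the window starting on edit_day."""
    span = timedelta(days=window_days)

    def rate(start, end):  # half-open interval [start, end)
        rows = [(f, c) for d, f, c in daily if start <= d < end]
        total = sum(c for _, c in rows)
        return 100 * sum(f for f, _ in rows) / total if total else None

    return rate(edit_day - span, edit_day), rate(edit_day, edit_day + span)


# Two weeks of data: 2 flags/day in week one, 5 flags/day in week two, 100 conversations/day.
start = date(2024, 1, 1)  # a Monday
daily = [(start + timedelta(days=i), 2 if i < 7 else 5, 100) for i in range(14)]
print(weekly_improve_rate(daily))  # {datetime.date(2024, 1, 1): 2.0, datetime.date(2024, 1, 8): 5.0}
print(before_after(daily, date(2024, 1, 8), window_days=7))  # (2.0, 5.0)
```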