Performance metrics

Performance metrics are about quality. Where the Overview tells you how much, this page tells you how well.

In this guide:

  • Latency (response time)
  • Accuracy (improve-rate)
  • Resolution and escalation
  • Agent SLAs
  • Drill-downs
  • Tips

Latency: how fast does the bot reply?

The top section of the page shows:

  • p50 latency — median time from user message to bot reply.
  • p95 latency — 95th percentile. The “long tail” people most notice.
  • By model — broken down by which LLM was used.

Screenshot: Performance charts showing latency distributions per model.

If p95 is above 5 seconds, your bot feels sluggish. Common fixes:

  • Switch to a faster model in Playground.
  • Reduce context — prune unused knowledge sources.
  • Check for slow AI actions inflating round-trip time.
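If you export raw reply times, the p50/p95 split and the 5-second check above can be reproduced offline. The following is a minimal sketch, not the product's own calculation: the input shape (pairs of model name and latency in seconds), the function names, and the nearest-rank percentile method are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Threshold taken from the guidance above: p95 over 5 seconds feels sluggish.
SLUGGISH_P95_SECONDS = 5.0


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_by_model(samples):
    """samples: iterable of (model_name, latency_seconds) pairs (hypothetical export format)."""
    grouped = defaultdict(list)
    for model, seconds in samples:
        grouped[model].append(seconds)

    report = {}
    for model, values in grouped.items():
        p95 = percentile(values, 95)
        report[model] = {
            "p50": percentile(values, 50),
            "p95": p95,
            "sluggish": p95 > SLUGGISH_P95_SECONDS,
        }
    return report


samples = [("fast", s) for s in (0.8, 1.0, 1.2, 1.5, 2.0)]
samples += [("slow", s) for s in (2.0, 3.0, 4.0, 6.0, 9.0)]
report = latency_by_model(samples)
```

With real traffic you would want far more than five samples per model before trusting a p95; the tiny lists here only keep the example readable.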

Accuracy: how often is the bot right?

Three signals:

  • Improve rate — flags per 100 conversations from Improve responses. Higher = bot wrong more often.
  • CSAT correlation — average CSAT for bot-only vs. agent-handled conversations. Useful for justifying the bot’s value.
  • Refusal rate — % of replies that say “I don’t know.” A healthy range is 5–15%; over 25% usually means the bot needs more knowledge sources.
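The arithmetic behind the improve rate and the refusal rate is simple enough to check by hand. This sketch assumes you have raw counts available; the function names and the "unclassified" label are hypothetical, and only the 5–15% and 25% thresholds come from the text above.

```python
def improve_rate(flags, conversations):
    """Flags per 100 conversations."""
    if conversations == 0:
        return 0.0
    return 100 * flags / conversations


def refusal_rate(refusals, replies):
    """Percent of replies that are "I don't know" answers."""
    if replies == 0:
        return 0.0
    return 100 * refusals / replies


def refusal_band(rate_pct):
    """Classify a refusal rate using only the thresholds stated above."""
    if rate_pct > 25:
        return "needs more knowledge"
    if 5 <= rate_pct <= 15:
        return "healthy"
    # The guide gives no verdict below 5% or between 15% and 25%.
    return "unclassified"


weekly_improve = improve_rate(flags=12, conversations=400)
weekly_refusal = refusal_rate(refusals=30, replies=300)
band = refusal_band(weekly_refusal)
```

Here 12 flags across 400 conversations works out to 3 per 100, and 30 refusals in 300 replies is 10%, which falls in the healthy band.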

Resolution and escalation

  • Resolution rate — % of conversations closed without escalation.
  • Escalation rate — % handed to humans.
  • Reopen rate — % of solved conversations the customer reopens within 7 days. A high reopen rate means the answers did not actually solve the problem.

A great chatbot typically shows:

  • 60–80% resolution rate (rest escalate).
  • Under 10% reopen rate.
  • Improve rate under 3 per 100.

These are guidelines, not laws — your customer mix matters.
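To compare your own numbers against these guidelines, a short script can compute the three rates and flag which ones fall outside the targets. This is a sketch under assumptions: the `Counts` fields and function names are hypothetical, and it treats 60% as the floor for resolution rate rather than penalizing values above 80%.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical raw counts for one reporting period."""
    conversations: int
    resolved_by_bot: int   # closed without escalation
    escalated: int         # handed to humans
    solved: int            # all conversations marked solved
    reopened: int          # solved conversations reopened within 7 days
    improve_flags: int     # flags from Improve responses


def pct(part, whole):
    """Percentage, returning 0.0 when there is nothing to divide by."""
    return 100 * part / whole if whole else 0.0


def scorecard(c):
    rates = {
        "resolution": pct(c.resolved_by_bot, c.conversations),
        "escalation": pct(c.escalated, c.conversations),
        "reopen": pct(c.reopened, c.solved),
        "improve_per_100": pct(c.improve_flags, c.conversations),
    }
    checks = {
        "resolution_ok": rates["resolution"] >= 60,   # assumption: 60% is the floor
        "reopen_ok": rates["reopen"] < 10,
        "improve_ok": rates["improve_per_100"] < 3,
    }
    return rates, checks


rates, checks = scorecard(Counts(
    conversations=1000, resolved_by_bot=700, escalated=300,
    solved=800, reopened=40, improve_flags=20,
))
```

In this example the bot resolves 70% of conversations, 5% of solved conversations are reopened, and the improve rate is 2 per 100, so all three checks pass.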

Agent SLAs (live conversations)

For escalated conversations:

  • First response time — bot escalation to first agent message. SLA target configurable per plan.
  • Full resolution time — escalation to closed.
  • SLA hit rate — % of conversations meeting the SLA. Color-coded green/yellow/red.

Configure SLAs in Settings → SLAs. Track misses to identify staffing gaps.
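The SLA hit rate for first response time can be sketched as follows, assuming you have the escalation and first-agent-message timestamps for each conversation. The function name, the input shape, and the 5-minute target are illustrative assumptions; the real target is whatever your plan configures, and the color thresholds are not stated here, so the sketch stops at the percentage.

```python
from datetime import datetime, timedelta


def sla_hit_rate(conversations, target):
    """Percent of escalated conversations whose first agent reply met the target.

    conversations: list of (escalated_at, first_agent_message_at) datetime pairs.
    target: timedelta allowed between escalation and the first agent message.
    """
    if not conversations:
        return None  # no escalations, so there is no rate to report
    hits = sum(
        1 for escalated_at, first_reply_at in conversations
        if first_reply_at - escalated_at <= target
    )
    return 100 * hits / len(conversations)


start = datetime(2024, 1, 1, 9, 0)
escalations = [
    (start, start + timedelta(minutes=1)),
    (start, start + timedelta(minutes=3)),
    (start, start + timedelta(minutes=10)),  # misses a 5-minute target
    (start, start + timedelta(minutes=4)),
]
hit_rate = sla_hit_rate(escalations, target=timedelta(minutes=5))
```

Grouping the misses by hour or weekday is one way to surface the staffing gaps mentioned above.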

Drill-downs

Each metric supports drill-down:

  • Click a chart segment → filtered conversation list.
  • Open a conversation → see what happened, learn from it.

Tips

  • Look at p95, not just p50. Half your users wait longer than the median, and p95 shows how slow the worst of those replies get.
  • Improve rate trends over weeks. Don’t react to a daily spike.
  • Cross-reference accuracy with knowledge changes. A jump in improve rate right after a Guidelines edit suggests the edit made answers worse; consider reverting it.
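One way to follow the "trends over weeks" advice is to roll daily counts up by ISO week before looking at the improve rate. This is a sketch only: the daily tuple format and the function name are assumptions, not an export the product is known to provide.

```python
from collections import defaultdict
from datetime import date


def weekly_improve_rate(daily):
    """Aggregate (day, flags, conversations) rows into flags per 100 conversations per ISO week."""
    totals = defaultdict(lambda: [0, 0])
    for day, flags, conversations in daily:
        iso = day.isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        totals[week][0] += flags
        totals[week][1] += conversations
    return {
        week: 100 * flags / conversations
        for week, (flags, conversations) in sorted(totals.items())
        if conversations
    }


daily = [
    (date(2024, 1, 1), 2, 100),
    (date(2024, 1, 2), 10, 100),  # a one-day spike
    (date(2024, 1, 8), 3, 100),
]
trend = weekly_improve_rate(daily)
```

The spike on 2 January is 10 per 100 on its own, but the week as a whole comes out at 6 per 100, which is the number worth comparing with the following week.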

What’s next