Cutting first-response time 71% with a support copilot built on Zendesk + Notion
Support copilot for a 220-person B2B SaaS.
“We had tried two AI vendors before. Both were demos. This is the first one that actually understands our product because it actually reads our docs.”
The challenge
The client's support team was drowning: 1,800 tickets per week, 10% month-over-month growth, and a hiring freeze that took more headcount off the table. Their existing macros covered maybe 40% of common questions, but only if agents knew which macro existed and where to find it. Median first-response time had crept past 14 minutes, and CSAT was sliding.
They had tried two off-the-shelf AI chatbots. Both produced confident wrong answers because neither had access to the team's actual knowledge base — a mix of Notion runbooks, Zendesk macros, and Slack-pinned notes. Trust evaporated within a week and both were rolled back.
How we approached it
We scoped a 2-week AI Sprint with a hard rule: no autonomous replies in week 1. The copilot drafts; humans send. That single constraint unlocked the rest.
Day 1–3 — Ingest and rank. Built a Postgres + pgvector index over the client's Notion workspace, Zendesk macros, and the last 90 days of resolved tickets. Tagged every chunk with source, last-updated date, and a confidence prior based on agent thumbs-ups in historical replies.
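A minimal sketch of the per-chunk metadata described above. The field names and the Laplace-smoothed prior are our illustration, not the client's schema; the source specifies only that each chunk carries its source, last-updated date, and a confidence prior derived from agent thumbs-ups:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Chunk:
    """One indexed knowledge-base chunk (names are illustrative)."""
    text: str
    source: str          # e.g. "notion" | "zendesk_macro" | "resolved_ticket"
    last_updated: date
    thumbs_up: int = 0   # agent thumbs-ups on historical replies citing this chunk
    thumbs_down: int = 0

    @property
    def confidence_prior(self) -> float:
        # Laplace smoothing: unseen chunks start at 0.5 instead of 0 or 1
        return (self.thumbs_up + 1) / (self.thumbs_up + self.thumbs_down + 2)

chunk = Chunk("Reset a user's password via Settings > Security.",
              "notion", date(2024, 1, 15), thumbs_up=8)
```

In pgvector this metadata sits in ordinary columns alongside the embedding, so retrieval can rank by similarity and rerank by prior and freshness.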
Day 4–7 — The copilot loop. Wired into Zendesk via the Apps SDK. When an agent opens a ticket, the copilot retrieves the top three relevant docs, drafts a reply with citations, and shows the agent a confidence band. Above 80% confidence the draft pre-fills the reply box. Below, it shows as a suggestion the agent can pull in or ignore.
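The routing rule at the heart of the loop is simple enough to sketch. The 80% threshold is from the sprint; the function name and return values are illustrative:

```python
def route_draft(confidence: float, prefill_threshold: float = 0.80) -> str:
    """Decide how a draft surfaces in the agent's Zendesk sidebar."""
    if confidence >= prefill_threshold:
        return "prefill"  # draft lands in the reply box; agent reviews and sends
    return "suggest"      # shown as an optional suggestion the agent can pull in
```

Keeping the rule this dumb was deliberate: agents can predict what the copilot will do, which is most of what trust in a copilot means.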
Day 8–10 — Eval harness + ship. Built a 200-ticket eval set from historical resolutions. The copilot scored 91% match against agent-validated answers on first pass. Shipped to production behind a feature flag for one team of six agents.
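The shape of that harness, sketched with illustrative names. The source specifies only the 200-ticket set and the match-against-validated-answers scoring; the pluggable `judge_fn` and 0.9 threshold here are assumptions:

```python
def run_eval(tickets, draft_fn, judge_fn, threshold=0.9):
    """Score copilot drafts against agent-validated answers.

    draft_fn(ticket) -> copilot draft text
    judge_fn(draft, expected) -> match score in [0, 1]
    Returns the fraction of tickets whose draft clears `threshold`.
    """
    passed = sum(
        1 for t in tickets
        if judge_fn(draft_fn(t), t["validated_answer"]) >= threshold
    )
    return passed / len(tickets)
```

Because the harness is just a function over historical tickets, re-running it after every index refresh or prompt change is cheap, which is what made shipping behind a flag feel safe.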
Week 3–6 — Hardening. Rolled out to the full support org. Added an autonomous-reply lane for a narrow set of FAQ-class tickets (password resets, plan changes, billing receipts) that scored 99%+ in eval. Trained the team on what "good copilot output" looks like and built a feedback loop that auto-promotes high-rated answers into the index.
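The autonomous lane reduces to a two-part gate. The ticket classes and the 99% bar are from the engagement; the intent labels and function shape are illustrative:

```python
FAQ_INTENTS = {"password_reset", "plan_change", "billing_receipt"}
AUTONOMY_FLOOR = 0.99  # per-intent eval pass rate required before auto-send

def can_auto_reply(intent: str, eval_pass_rate: dict[str, float]) -> bool:
    """Eligible for the autonomous lane only if the intent is on the FAQ
    allowlist AND that intent class clears the 99% eval bar."""
    return intent in FAQ_INTENTS and eval_pass_rate.get(intent, 0.0) >= AUTONOMY_FLOOR
```

Expanding autonomy is then a data decision, not a code change: a new ticket class earns its way onto the allowlist by clearing the eval bar.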
The outcome
Six weeks in, the copilot was drafting on 100% of incoming tickets and autonomously closing 34% without a human reply. First-response time dropped from a 14-minute median to 4 minutes. CSAT on AI-touched tickets came in 8 points above baseline: routine tickets got faster answers, and agents spent their time on harder issues instead of typing the same answer for the hundredth time.
The bigger win was retention. Two senior agents who had been considering leaving stayed because the copilot eliminated the most demoralizing part of the job: typing the same five answers all day. The client paid back the sprint cost in deflection alone within six weeks.
Stack
- OpenAI GPT-4.1
- Zendesk API
- Notion API
- Postgres + pgvector
- Vercel
Team
Two engineers + design + a fractional product lead
Handoff
Full code in their Vercel + GitHub org. Runbook covering eval re-runs, doc re-ingest cadence, and the autonomous-reply expansion playbook. Two pairing sessions with their senior backend engineer.
Your engagement, next on this list