.01 / case study

Panthir AI.

A 24/7 voice receptionist for law firms — answers every call, qualifies the matter, runs live conflict checks, and hands off to SMS. Sub-second turn latency. Running in production.

Year
2026
Role
End-to-end
Vertical
Legal
Status
Shipping
.02 / problem

Calls drop, retainers walk.

Law firms miss roughly 40% of inbound calls, and a missed call from a qualified plaintiff is a missed retainer. Existing answering services fail in two ways: they're slow, with humans triaging callers from rigid scripts, and they have no visibility into the firm's conflict list.

The result is qualified leads dropped at the front door — every night, every weekend, every lunch break.

.03 / approach

One ring. One second.

Panthir picks up on the first ring, qualifies the caller's matter, runs a live conflict check against the firm's own data, and either schedules a call-back or hands off to SMS — all inside a per-turn budget of ~700 ms.

That latency target is non-negotiable. Past about a second of silence, callers hear “AI” and hang up. Every component in the pipeline was chosen to defend the budget.

.04 / architecture

Latency is a stack, not a number.

The hot path is a streaming pipeline. The intent router and conflict-check service start working before the caller has finished speaking. The LLM only runs on the slice of work that actually needs it.

Telephony → Streaming STT → Intent Router → LLM → TTS → Telephony
↳ Intent Router → Conflict DB (parallel fan-out)

Three things keep latency under target:

Streaming STT with partial commits. Routing starts on the first phrase, not the final one.

Custom intent router, not a single-prompt agent. Most calls follow one of about twenty paths. Routing them deterministically saves an LLM hop on the hot path and makes the behaviour testable.

Conflict checks fan out in parallel. The moment a caller name is heard, the database query fires. By the time the LLM needs the answer, it's already there.
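The fan-out pattern can be sketched in a few lines. This is an illustrative sketch, not the production code: `check_conflicts` and the set-based lookup are stand-ins for the real conflict service, and it uses a plain thread where the real hot path uses Tokio tasks.

```rust
use std::collections::HashSet;
use std::thread;
use std::time::Duration;

// Hypothetical conflict check: in production this queries the firm's
// single-tenant store; here it is a set lookup plus a simulated delay.
fn check_conflicts(conflict_list: HashSet<String>, caller: String) -> bool {
    thread::sleep(Duration::from_millis(50)); // simulated query latency
    conflict_list.contains(&caller.to_lowercase())
}

fn main() {
    let conflict_list: HashSet<String> =
        ["acme corp", "jane doe"].into_iter().map(String::from).collect();

    // The moment the STT stream commits a caller name, fire the query...
    let caller = "Jane Doe".to_string();
    let handle = thread::spawn(move || check_conflicts(conflict_list, caller));

    // ...while the rest of the turn (routing, prompt assembly) proceeds.
    // By the time the LLM needs the answer, the lookup has usually finished.
    let has_conflict = handle.join().expect("conflict check panicked");
    println!("conflict = {}", has_conflict); // prints "conflict = true"
}
```

The point is that the conflict query's latency overlaps with work the turn had to do anyway, so it costs close to zero on the critical path.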

.05 / decisions

Three calls, three reasons.

Rust on the hot path.

The router and conflict-check service are Rust + Tokio. Tail latency — not average latency — is what callers actually notice. Rust gives us a p99 we can predict.
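Why p99 rather than the mean: one slow turn in a hundred ruins a call even when the average looks fine. A toy illustration with invented numbers, using a simple nearest-rank percentile:

```rust
// Nearest-rank style percentile over a sample of turn latencies (ms).
fn percentile(samples: &mut [u64], p: f64) -> u64 {
    samples.sort_unstable();
    let idx = ((samples.len() - 1) as f64 * p).ceil() as usize;
    samples[idx]
}

fn main() {
    // 99 fast turns and one stall: the mean hides the stall, the p99 doesn't.
    let mut turns: Vec<u64> = vec![100; 99];
    turns.push(2_000);

    let avg: u64 = turns.iter().sum::<u64>() / turns.len() as u64;
    let p99 = percentile(&mut turns, 0.99);
    println!("avg = {} ms, p99 = {} ms", avg, p99); // avg = 119 ms, p99 = 2000 ms
}
```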

No off-the-shelf voice platform.

Generic voice agents are built to replace IVRs. Legal intake has firm-specific conflict logic, jurisdiction rules, and matter-type qualification — work that has to live in your code regardless of which platform you start on.

SQLite per firm.

Each firm gets a single-tenant store. A firm's conflict list fits in memory and reads in microseconds; Postgres would be over-engineered for the access pattern and add network hops we don't want on the hot path.
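The access pattern can be sketched as follows. This is a simplified model, not the shipped code: the per-firm conflict set is loaded into memory from each firm's own store at startup (SQLite in production), and reads are plain in-process lookups with no network hop. All names here (`FirmStore`, `Tenants`) are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical single-tenant layout: one in-memory conflict set per firm.
struct FirmStore {
    conflicts: HashSet<String>,
}

struct Tenants {
    firms: HashMap<String, FirmStore>,
}

impl Tenants {
    // Reads are hash lookups in process memory: microsecond-scale,
    // no network hop on the hot path.
    fn has_conflict(&self, firm_id: &str, name: &str) -> bool {
        self.firms
            .get(firm_id)
            .map(|f| f.conflicts.contains(&name.to_lowercase()))
            .unwrap_or(false)
    }
}

fn main() {
    let mut firms = HashMap::new();
    firms.insert(
        "smith-llp".to_string(),
        FirmStore { conflicts: ["acme corp".to_string()].into_iter().collect() },
    );
    let tenants = Tenants { firms };

    println!("{}", tenants.has_conflict("smith-llp", "Acme Corp")); // prints "true"
}
```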

.06 / outcome

Live, paying, answering.

Deployed with paying firms. The system answers every inbound call, completes conflict checks before the caller finishes describing their matter, and hands clean, qualified leads off to the firm's case-management system within seconds.

100%
Inbound answer rate
1m 12s
Avg handle time
<700ms
Per-turn latency
.07 / stack
Java · Rust · React · TypeScript · Next.js · Postgres · GCP · Twilio · Docker · Deepgram · Cartesia AI · Vertex AI

Building something hard?