Just Hired AI · AI agency · 2026

Digital Workforce Platform — Voice AI, RAG, and real-time telephony

Multi-tenant SaaS that lets any business deploy a Voice AI receptionist on their own number in minutes.


Xano · Vertex AI · Twilio · Gemini Live · RAG


Tenants supported: multi-tenant

Concurrent calls: scalable

Knowledge ingestion: self-serve

The problem

The agency had a strong agent product but every new client was a custom integration:

2–3 weeks of telephony plumbing per onboard
Knowledge base hand-built per business
No way for clients to update their own agent

They wanted a product, not a service line: the same agent quality, but self-serve for tenants and with one-day onboarding.

The approach

The architecture has three layers:

[Figure: three-layer architecture]

Why Gemini Live (instead of OpenAI Realtime)

For this client, two reasons:

Multimodal pricing. Vertex billing rolled into existing GCP credits.
Latency from EU/US. Sub-300ms audio round-trip held in our tests.

I'd reach for OpenAI Realtime instead when:

The client already lives in the OpenAI ecosystem
They want voice cloning via parallel ElevenLabs hookups
They need OpenAI's stronger function-calling reliability for complex tool chains

The RAG layer

Each tenant gets:

An ingestion pipeline (PDF / website / docs) that chunks and embeds content into pgvector, with every chunk tagged by tenant_id so retrieval can filter on it.
A retrieval prompt that runs before every model turn, scoped strictly to that tenant.
A tenant_id-scoped tool API the agent can call (look up an order, book an appointment).
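
To make the retrieval scoping concrete, here is a minimal sketch of a query builder that cannot produce SQL without a tenant filter. The table name (chunks), its columns, and the function names are assumptions for illustration, not the production schema; the only real piece is pgvector's `<=>` cosine-distance operator.

```typescript
// Hypothetical table: chunks(tenant_id text, content text, embedding vector).
// Table, column, and function names are illustrative assumptions.

interface SqlQuery {
  text: string;
  values: (string | number)[];
}

// pgvector accepts a vector as the text literal "[0.1,0.2,...]".
function toVectorLiteral(embedding: number[]): string {
  return "[" + embedding.join(",") + "]";
}

// Build a parameterized top-k similarity query for one tenant.
// The tenant filter is mandatory: an empty tenant id is rejected
// before any SQL is produced.
function buildTenantRetrievalQuery(
  tenantId: string,
  queryEmbedding: number[],
  topK: number = 5
): SqlQuery {
  if (tenantId.trim() === "") {
    throw new Error("tenantId is required for every retrieval");
  }
  if (queryEmbedding.length === 0) {
    throw new Error("queryEmbedding must not be empty");
  }
  return {
    text:
      "SELECT content, embedding <=> $2::vector AS distance " +
      "FROM chunks " +
      "WHERE tenant_id = $1 " +
      "ORDER BY embedding <=> $2::vector " +
      "LIMIT $3",
    values: [tenantId, toVectorLiteral(queryEmbedding), topK],
  };
}
```

The returned text and values can be handed to any Postgres client that supports positional parameters. Keeping tenant_id as a bound parameter rather than string-concatenating it also closes off injection through tenant identifiers.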

The strict scoping rule

Every retrieval and every tool call is bounded by tenant_id. We treat tenant data isolation the same way we'd treat row-level security in Postgres — defense in depth, not just a query filter.
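
One way to read "defense in depth" for tool calls is sketched below: the tenant id comes from the call session, never from arguments the model produced, and the wrapper re-checks every returned row even though the handler already filters. The names (scopeTool, TenantOwned, the sample orders) are hypothetical, and the handlers are synchronous only to keep the sketch short.

```typescript
// Every record a tenant-facing tool returns must carry its owner.
interface TenantOwned {
  tenantId: string;
}

// Hypothetical handler shape: tenant id first, model-supplied args second.
type ToolHandler<A, R extends TenantOwned> = (tenantId: string, args: A) => R[];

// Bind a tool to the tenant of the current call session.
function scopeTool<A, R extends TenantOwned>(
  sessionTenantId: string,
  handler: ToolHandler<A, R>
): (args: A) => R[] {
  return function (args: A): R[] {
    const rows = handler(sessionTenantId, args);
    // Second layer: if the handler's own filter is wrong, a foreign row
    // fails the call instead of reaching the model.
    for (let i = 0; i < rows.length; i++) {
      if (rows[i].tenantId !== sessionTenantId) {
        throw new Error("tenant isolation violated in tool result");
      }
    }
    return rows;
  };
}

// Illustrative data and tool, not from the real system.
interface Order extends TenantOwned {
  orderId: string;
  status: string;
}

const ORDERS: Order[] = [
  { tenantId: "tenant_a", orderId: "A1", status: "shipped" },
  { tenantId: "tenant_b", orderId: "B1", status: "pending" },
];

function lookupOrder(tenantId: string, args: { orderId: string }): Order[] {
  return ORDERS.filter(
    (o) => o.tenantId === tenantId && o.orderId === args.orderId
  );
}

// The agent only ever sees the scoped version.
const lookupForTenantA = scopeTool("tenant_a", lookupOrder);
```

With this shape, the model can ask for any order id it likes, but it has no argument through which to name another tenant.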

On the backend choice: a NestJS service would have given more flexibility, but the agency's existing team was Xano-fluent. Putting tenant config and routing in Xano meant non-engineers could ship tenant-specific tweaks without a deploy.

Where Xano was the wrong tool: the call runtime itself. Streaming audio + tool orchestration belongs in a proper service. So we kept Xano on the slow path (config, webhooks, billing) and put a small NestJS service in front of Twilio for the fast path.
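
As a rough illustration of the fast path, the sketch below shows how a service in front of Twilio might answer the incoming-call webhook: resolve the tenant from the dialed number, then hand the audio to a WebSocket stream. It assumes Twilio Media Streams (the TwiML Connect and Stream verbs) and an in-memory tenant directory synced from the slow path; the case study does not confirm either detail, and all names here are hypothetical.

```typescript
interface TenantConfig {
  tenantId: string;
}

// Hypothetical cache of tenant config keyed by the tenant's number in
// E.164 form. The source of truth would be the slow-path backend.
type TenantDirectory = { [phoneNumber: string]: TenantConfig };

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the TwiML reply for an incoming call.
// `to` is the dialed number, which Twilio sends as the `To` parameter.
function buildIncomingCallTwiml(
  to: string,
  directory: TenantDirectory,
  streamUrl: string
): string {
  const header = '<?xml version="1.0" encoding="UTF-8"?>';
  // Own-property check so keys like "constructor" never match.
  if (!Object.prototype.hasOwnProperty.call(directory, to)) {
    // Unknown number: fail closed rather than guess a tenant.
    return header + "<Response><Reject/></Response>";
  }
  const tenant = directory[to];
  return (
    header +
    "<Response><Connect>" +
    '<Stream url="' + escapeXml(streamUrl) + '">' +
    '<Parameter name="tenant_id" value="' + escapeXml(tenant.tenantId) + '"/>' +
    "</Stream></Connect></Response>"
  );
}
```

The tenant id travels with the stream as a custom parameter, so the audio handler starts every session already bound to one tenant and never has to infer it later.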

Twilio number provisioning

Each tenant provisions their own number through the console. Twilio's number search → purchase → webhook bind is a three-call flow; I wrapped it in a Xano function stack so the UI just calls provision_number(tenant_id, area_code).
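
The real flow lives in a Xano function stack, which does not translate directly to code, so the following is only a TypeScript restatement of the same three steps. TelephonyClient and its method names are a hypothetical interface standing in for the Twilio calls, and the webhook path is made up for the example.

```typescript
// Hypothetical minimal interface. A real implementation would wrap
// Twilio's number search, purchase, and webhook configuration calls.
interface TelephonyClient {
  searchAvailable(areaCode: string): Promise<string[]>;
  purchase(phoneNumber: string): Promise<{ sid: string; phoneNumber: string }>;
  bindVoiceWebhook(sid: string, webhookUrl: string): Promise<void>;
}

interface ProvisionedNumber {
  tenantId: string;
  sid: string;
  phoneNumber: string;
  webhookUrl: string;
}

// Search, purchase, bind: the three calls behind a single entry point.
async function provisionNumber(
  client: TelephonyClient,
  tenantId: string,
  areaCode: string,
  webhookBaseUrl: string
): Promise<ProvisionedNumber> {
  // 1. Search for candidate numbers in the requested area code.
  const candidates = await client.searchAvailable(areaCode);
  if (candidates.length === 0) {
    throw new Error("no numbers available in area code " + areaCode);
  }

  // 2. Purchase the first candidate.
  const purchased = await client.purchase(candidates[0]);

  // 3. Point the number's voice webhook at the call runtime.
  const webhookUrl = webhookBaseUrl + "/voice/incoming";
  await client.bindVoiceWebhook(purchased.sid, webhookUrl);

  // The caller stores this record so the number maps back to its tenant.
  return {
    tenantId: tenantId,
    sid: purchased.sid,
    phoneNumber: purchased.phoneNumber,
    webhookUrl: webhookUrl,
  };
}
```

A production version would also need to decide what happens when step 3 fails after step 2 succeeded, for example by releasing the number or retrying the bind, since a purchased number with no webhook is billed but unusable.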

Outcomes

Time to MVP: 3 weeks, from contract to first live tenant

Tenant onboarding: 1 day, down from 2–3 weeks per integration

RAG + tools: tenant-isolated, with strict per-tenant scoping at every layer

Lessons

Two backends are sometimes the right call. Don't force one tool to do both slow-path config and real-time orchestration.
Debugging streaming audio needs first-class tooling. I built a Loom-style transcript replay early on, which saved a week of round-tripping with the client.
Tenant isolation is policy, not query syntax. Every retrieval, every tool, every webhook needs to assert it.

Building something similar?

Send a quick note — happy to compare notes on the architecture.
