A working agent, six monitoring layers, and a few uncomfortable questions for the industry.

The agent is live at jaxconsult.com. His name is Jax. He’s a golden retriever. You can talk to him in your browser. He’ll answer you out loud, in a real voice, and his mouth moves when he speaks. He knows what page you’re on. If you tell him you’re in higher education, he’ll route the conversation to the right person on our team. If you give him your email, he creates a record in our Salesforce, looks at the right account executive’s actual Google Calendar, and offers you real available meeting times before you close the tab. If you come back next month, he remembers you and picks up where you left off.

What’s Actually Running Under the Hood

That’s the surface. Underneath, there are six monitoring layers running continuously, cross-session visitor identity that survives across weeks, vertical-based lead routing across five different practice areas, calendar booking with timezone-correct availability, voice synthesis, mobile audio that works on iPhones and Androids, and an observability loop that catches mistakes and ships fixes within minutes.

I built it as an experiment. Four days of work, one QA tester helping me verify behavior. I’m the CEO of Jax Consulting. We have a real delivery team that builds real things for clients every day. This wasn’t that team. This was me running a personal test of what one person could produce with Revecast Orchestrate, our AI-assisted delivery platform built for Salesforce work.

I’m writing this up because the result of the experiment surprised me, and I think it’s going to surprise the people who read it.

A year ago, if a customer had asked us to build what’s now running on our site, I would have scoped it as a five-to-six-month engagement. A project lead, two senior developers, a junior developer, dedicated QA, project management overhead, design reviews, two rounds of UAT. The kind of statement of work that quotes in the low six figures. I would have felt completely fair quoting it that way, because that’s what comparable work has actually taken us historically. I have the time logs.

Four days, one person, one QA tester.

How We Built It: Bugs and All

The agent runs on Salesforce’s headless Agentforce API — the same platform we’d implement for any enterprise client. A custom WordPress plugin I wrote handles the auth proxy, streams the responses via SSE, and rate-limits across six endpoint tiers. ElevenLabs handles voice synthesis. The lip-sync is a Web Audio API frequency analyzer mapping audio amplitude to mouth-open percentage on a canvas overlay. The iPhone audio works because the widget plays a silent MP3 frame on the user’s first send-button click, which unlocks iOS Safari’s autoplay policy for the rest of the session. I learned that the hard way after the audio silently broke on mobile and a real visitor told us. 

Calendar booking uses a Google service account with Domain-Wide Delegation, so onboarding a new AE means granting one scope rather than running a per-person OAuth flow. The Google freeBusy API returns times in UTC, which I initially parsed with the wrong DateTime method — that interpreted the timestamps in our org’s timezone and shifted every booked busy block by four to five hours, which meant the agent was happily offering customers slots that were actually booked. I caught it a few weeks in. The fix was one line. It used valueOfGmt and applied the parsed offset.

I’m telling you these specific bugs because I want you to understand this is a real build, not a demo. Real builds have bugs. Mine had bugs. I fixed them.

Six Monitoring Layers Most Salesforce AI Builds Skip

Six monitoring layers run on this agent continuously. Every visitor conversation gets scored daily by an AI evaluator on a 1-to-10 scale with sentiment, vertical, and issue category, written to a custom Salesforce object. Every Monday at 9am ET, seven scripted synthetic conversations run against the live agent and assert persona, routing, lead capture, post-booking suppression. Every hour, four production endpoints get tested with real HTTP requests. Every morning, Playwright runs the agent through six device emulations including iPhone Safari and Android Chrome, and asserts that audio actually plays via the real onplaying browser event. A centralized alert webhook logs every alert as a queryable Salesforce Task. And automated cleanup deletes synthetic test data so it never pollutes the real AE pipeline.

The reason I want to walk through that monitoring stack carefully is that it’s the part most AI agents in production don’t have at all. Most of them are deployed without regression testing, without health monitoring, without device-level e2e, without observability of any kind. They drift in silence. The reason I built six layers on a personal-experiment project — instead of zero, which would have been industry-standard — is that Orchestrate made building them roughly as cheap as not building them. When the marginal cost of production discipline approaches zero, the right amount of production discipline goes up.

The Salesforce AI Agent Self-Improvement Loop

The self-improvement loop is the part I want to be precise about, because I don’t want to overstate it. The agent does not modify its own prompts. What happens is the QA layer surfaces issues — drift in persona, a missed routing call, a hallucinated answer, a degraded response — and Orchestrate proposes a fix. I approve the fix. Often through Slack, on my phone, in a few seconds. The fix ships. The next conversation is better.

It’s not autonomous self-modification. It’s a very tight observability and remediation loop with me as the human in the middle, moving at minutes-not-weeks speed.

That loop is the part that I keep thinking about. Because it means the agent that is live on the site today is meaningfully better than the agent that shipped at the end of day four. The agent three months from now will be meaningfully better than the agent today. And there’s no managed services contract paying for a team to maintain it. It’s me, sometimes ninety seconds in Slack, and the QA layer doing the rest.

Now the Uncomfortable Part

I built this as a personal experiment, but the implications aren’t personal. They’re industry wide, and they’re not subtle.

If one person with the right tooling can produce a working production agent in four days, then a meaningful portion of what Salesforce services firms have been scoping as multi-month engagements is no longer multi-month work. Not all of it. Architecture still requires architects. Integration design across complex enterprise systems still requires people who have seen a hundred Salesforce orgs. Change management still requires humans who can read a room. Vertical compliance still requires specialists. Those things are more valuable now, not less, because there’s less noise around them.

But the build itself? The configuration. The integration plumbing. The custom Apex. The QA. The monitoring layer that almost nobody actually builds because it’s “too expensive.” The work that fills the line items on most managed services contracts. That work, today, is something that can be compressed by an order of magnitude when the right tooling is in the hands of the right person.

I’m not the right person. I’m a CEO who codes when he has to. Imagine what our actual delivery team — the people who do this for a living — is going to produce when this tooling is in their hands on every client engagement. That’s the conversation I’m having internally at Jax right now, and it’s the conversation I think every Salesforce AI consultant and services firm leader is going to be having within the next twelve months, whether they want to or not.

If you’re a Salesforce account executive reading this and you have a customer who needs an Agentforce build, talk to us before anyone scopes it at twenty weeks.

If you’re a customer of ours reading this, this is a preview of how we’re going to deliver your next engagement.

If you’re a prospective customer reading this, the agent in the corner of our site is the kind of thing we can now build for you. Go ask him hard questions. Try to break him. Tell him you want to book a meeting and see what happens.

If you’re a peer services firm reading this and you think I’m overstating any of it, I’d welcome you proving me wrong. Build something comparable in four days with one person and publish the methodology. I’ll link to it from this page.

If you’re evaluating Revecast Orchestrate and the velocity claims sound too good to be true, I’m one data point. But I’m a data point you can talk to, and there is an agent you can interact with right now, and a set of bugs I’ve documented above that you can verify are real engineering rather than marketing. Make of that what you will.

The agent is live. The build is real. The implications are the part I’m still sitting with.