Building an AI Agent on Cloudflare: From RAG to Multi-Turn Conversations

The Problem With V1

My original “Ask This Blog” feature worked: you typed a question, it searched my blog content, and it returned an answer. But every question started from a blank slate. No memory. No follow-ups. No way to pull from multiple data sources in a single response.

That’s fine for a demo, but it’s not how real AI applications work. Customers building AI features need conversation state, tool orchestration, and the ability to chain actions together. I couldn’t credibly talk about agentic patterns until I’d built an agent myself.

What V2 Does Differently

Visit saltwaterbrc.com/ask-ai and you’ll see a chat interface instead of a search box. The agent can:

Remember context across messages — ask a follow-up and it knows what you were talking about
Call tools during a response — search blog content, look up customer use cases by industry, fetch live site stats
Stream responses in real time over WebSocket, so there’s no waiting for the full response to generate
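Streaming comes down to appending each chunk to the answer as it arrives and re-rendering. The sketch below makes no assumptions about the Agents SDK's actual wire format. `simulatedStream` is a hypothetical stand-in for the WebSocket, and `renderStream` is an illustrative helper, not an SDK function.

```typescript
// HYPOTHETICAL stand-in for the WebSocket: yields pre-split text chunks
// asynchronously, one at a time. The real chunk format is managed by the SDK.
async function* simulatedStream(chunks: string[]): AsyncGenerator<string> {
  for (const chunk of chunks) {
    await Promise.resolve(); // let other work run between chunks
    yield chunk;
  }
}

// Append each delta to the text so far and hand the partial text to the UI,
// so the reader sees the answer grow instead of waiting for all of it.
async function renderStream(
  stream: AsyncIterable<string>,
  onUpdate: (partial: string) => void,
): Promise<string> {
  let text = "";
  for await (const delta of stream) {
    text += delta;
    onUpdate(text);
  }
  return text;
}
```

In a browser, `onUpdate` would write the partial text into the chat bubble.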

Under the hood, this is built on the Cloudflare Agents SDK with three tools:

search_blog — Embeds your question, searches Vectorize for relevant blog content, returns source passages
find_use_cases — Maps Cloudflare products to real customer use cases by industry vertical (healthcare, fintech, gaming, etc.)
get_site_stats — Fetches live visitor count and page analytics from the Durable Objects counter
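Tool calling relies on a dispatch step. The model names a tool and supplies JSON arguments, and the agent runs the matching function. The Agents SDK and the Vercel AI SDK handle this for you, so the code below is only a self-contained sketch of the idea. The `ToolDef`, `ToolCall`, and `dispatch` names and the stub tool bodies are hypothetical.

```typescript
// Hypothetical argument bag: whatever JSON object the model produced.
type ToolArgs = Record<string, unknown>;

interface ToolDef {
  description: string;
  run: (args: ToolArgs) => unknown;
}

// HYPOTHETICAL stubs standing in for the real search_blog, find_use_cases,
// and get_site_stats implementations. Only the dispatch logic matters here.
const tools: Record<string, ToolDef> = {
  search_blog: {
    description: "Search blog content for passages relevant to a question",
    run: (args) => ({ query: String(args.query ?? ""), passages: [] as string[] }),
  },
  find_use_cases: {
    description: "Look up customer use cases for an industry vertical",
    run: (args) => ({ industry: String(args.industry ?? ""), useCases: [] as string[] }),
  },
  get_site_stats: {
    description: "Fetch live visitor count and page analytics",
    run: () => ({ visitors: 0, pages: [] as string[] }),
  },
};

// A tool call as the model might emit it: a tool name plus JSON arguments.
interface ToolCall {
  name: string;
  argsJson: string;
}

type DispatchResult = { ok: true; result: unknown } | { ok: false; error: string };

// Find the tool by name, parse its arguments, and run it. Errors are returned
// as data rather than thrown, so they can be fed back to the model.
function dispatch(call: ToolCall): DispatchResult {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(tools, call.name)) {
    return { ok: false, error: `Unknown tool: ${call.name}` };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(call.argsJson);
  } catch {
    return { ok: false, error: `Arguments for ${call.name} are not valid JSON` };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: `Arguments for ${call.name} must be a JSON object` };
  }
  return { ok: true, result: tools[call.name].run(parsed as ToolArgs) };
}
```

Returning errors as values, instead of throwing, lets the model see what went wrong and retry with corrected arguments.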

The Architecture

Browser ←WebSocket→ Agent Worker (Durable Object)
                        ├── search_blog → Vectorize
                        ├── find_use_cases → Hardcoded data
                        ├── get_site_stats → Counter Worker
                        └── Workers AI (GLM-4.7-Flash) via AI Gateway
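In the diagram, the search_blog → Vectorize leg is a nearest-neighbor query. The agent embeds the question, then returns the stored passages whose vectors are most similar. The sketch below shows that ranking and assumes the index uses cosine similarity. It runs a linear scan over a tiny in-memory index. In the real agent, the embedding presumably comes from Workers AI and Vectorize runs the query. The `Passage`, `cosine`, and `searchPassages` names are hypothetical.

```typescript
interface Passage {
  id: string;
  text: string;
  vector: number[];
}

interface SearchHit {
  id: string;
  text: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]. Returns 0 when
// either vector is all zeros, since the angle is undefined there.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored passage against the query vector, best first, and keep
// the topK matches (the "source passages" handed back to the model).
function searchPassages(index: Passage[], query: number[], topK: number): SearchHit[] {
  return index
    .map((p) => ({ id: p.id, text: p.text, score: cosine(p.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```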

The agent runs as a Durable Object — a single-threaded, stateful instance that persists your conversation in SQLite. The Agents SDK manages this automatically. I didn’t write any state management code. The SDK handles WebSocket connections, message history, tool dispatch, and streaming.
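Multi-turn memory comes from one habit: the agent stores every message and sends the whole history to the model on each turn. A follow-up like “what about fintech?” therefore arrives with its context. The SDK does this against the Durable Object's SQLite storage. The sketch below substitutes an in-memory array and a stub model, and the `Conversation` class and `Model` type are hypothetical.

```typescript
type Role = "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// HYPOTHETICAL model signature: given the whole conversation, return a reply.
type Model = (history: readonly ChatMessage[]) => string;

class Conversation {
  // Stand-in for the per-session SQLite storage the Durable Object provides.
  private readonly messages: ChatMessage[] = [];
  private readonly model: Model;

  constructor(model: Model) {
    this.model = model;
  }

  // Store the user's message, call the model with the full history,
  // then store the reply so the next turn can see it as context.
  send(userText: string): string {
    this.messages.push({ role: "user", content: userText });
    const reply = this.model([...this.messages]);
    this.messages.push({ role: "assistant", content: reply });
    return reply;
  }

  get history(): readonly ChatMessage[] {
    return [...this.messages];
  }
}
```

On the second call to `send`, the model receives three messages: the first question, the first answer, and the follow-up.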

V1 vs V2

Aspect	V1: Ask This Blog	V2: Ask AI
Architecture	Stateless Worker	Agents SDK (Durable Object)
Conversation	Single Q&A	Multi-turn with memory
Tools	None (hardcoded RAG)	3 tools (search, use cases, stats)
Connection	HTTP POST	WebSocket (real-time)
State	None	Persistent per session
Use Cases	Not surfaced	Industry-specific mappings

What I Learned

The Agents SDK abstracts the hard parts. I didn’t have to build WebSocket handling, conversation storage, or tool dispatch. The SDK’s AIChatAgent class handles all of it. My code focuses on defining tools and the system prompt.

Tool calling is where agents get useful. A chatbot that only generates text is limited. An agent that can search a database, look up structured data, AND generate a response: that’s what customers actually need. The find_use_cases tool alone makes this more valuable than V1, because it surfaces real customer examples from a fixed dataset instead of leaving the model to invent them.
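A lookup tool curbs hallucination because it can only return records that exist. Below is a sketch of a function in the style of find_use_cases. The data shape and every entry are placeholders, not the real dataset (which contains actual customer names), and the `findUseCases` name is hypothetical.

```typescript
interface UseCase {
  product: string;
  summary: string;
  customer: string;
}

interface UseCaseResult {
  industry: string;
  matches: UseCase[];
  note?: string;
}

// PLACEHOLDER data keyed by industry vertical; the real tool ships its own.
const USE_CASES: Record<string, UseCase[]> = {
  healthcare: [{ product: "Workers", summary: "placeholder", customer: "Example Customer A" }],
  fintech: [{ product: "Durable Objects", summary: "placeholder", customer: "Example Customer B" }],
  gaming: [{ product: "Workers AI", summary: "placeholder", customer: "Example Customer C" }],
};

// Normalize the inputs, then return only records present in the dataset,
// optionally filtered by product. Nothing is generated here.
function findUseCases(industry: string, product?: string): UseCaseResult {
  const key = industry.trim().toLowerCase();
  const known = Object.prototype.hasOwnProperty.call(USE_CASES, key) ? USE_CASES[key] : [];
  const wanted = product?.trim().toLowerCase();
  const matches = wanted ? known.filter((u) => u.product.toLowerCase() === wanted) : known;
  return matches.length > 0
    ? { industry: key, matches }
    : { industry: key, matches, note: "No matching use cases in the dataset." };
}
```

On a miss, the explicit note gives the model something true to say (“I don’t have an example for that industry”) instead of a gap it might fill with invention.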

Model choice matters for tool calling. Not every model handles tool calling well through the Vercel AI SDK pattern that the Agents SDK uses. We started with Llama 3.3 70B, and it would not call the tools. Switching to GLM-4.7-Flash (the same model the official agents-starter template uses) fixed it immediately.

The kebab-case gotcha. Under the hood, the Agents SDK routes requests through PartyServer, which converts Durable Object binding names to kebab-case. Our binding SaltWaterAgent became salt-water-agent in the URL path. This took real debugging to figure out: we had to read the source of partyserver/dist/index.js to find the camelCaseToKebabCase conversion.
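The sketch below shows that conversion and the resulting path. It approximates the behavior of PartyServer's camelCaseToKebabCase rather than copying its source. The `/agents/<agent>/<instance>` shape assumes the Agents SDK's default `agents` routing prefix. `toKebabCase` and `agentPath` are hypothetical helpers, so check both against the versions you have installed.

```typescript
// Approximation of the binding-name conversion (ASSUMPTION: not PartyServer's
// exact code). Every capital letter becomes "-" plus its lowercase form.
function toKebabCase(name: string): string {
  // "SaltWaterAgent" -> "-salt-water-agent"
  const hyphenated = name.replace(/[A-Z]/g, (c) => `-${c.toLowerCase()}`);
  // Drop the leading hyphen from a capitalized first letter; treat "_" as a separator.
  return hyphenated.replace(/^-/, "").replace(/_/g, "-");
}

// HYPOTHETICAL helper: the path a client would request for a given instance,
// assuming the default "/agents" prefix.
function agentPath(binding: string, instance: string): string {
  return `/agents/${toKebabCase(binding)}/${encodeURIComponent(instance)}`;
}
```

Under this rule, acronyms split letter by letter (`MyAIAgent` becomes `my-a-i-agent`). That is one more reason to log the URL your client actually requests.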

The Sales Angle

This is the demo I wanted to be able to give. A customer asks “how would a bank use Workers?” and instead of opening a slide deck, I can point them to the live agent that searches real content and surfaces industry-specific use cases with actual customer names.

The tech stack behind it is entirely Cloudflare: Workers AI for inference, Vectorize for semantic search, Durable Objects for state, AI Gateway for observability, and the Agents SDK tying it all together. No external APIs. No third-party services. Everything runs at the edge.

Both V1 and V2 are live, at /ask and /ask-ai respectively, so visitors can compare the two approaches side by side.