Every AI chat tutorial assumes you have a backend. An Express server, a Next.js API route, a Cloudflare Worker — something that holds your API key and proxies requests to OpenAI. This is so universal that most developers don’t question it: if you want AI in a web app, you need a server.
You don’t. This article explains an architecture where the user brings their own AI provider and your web app has zero server-side infrastructure for AI features.
## The standard architecture and its costs
Here’s what a typical AI chat web app looks like:
```
Browser → Your Server (API route) → OpenAI / Claude / etc.
```
Your server holds the API key in an environment variable. Every user request passes through your server, which forwards it to the AI provider and streams the response back. You pay for every inference call. You handle rate limiting, key rotation, and credential security.
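For contrast, the proxy layer this architecture requires can be sketched in a few lines of Node. The hostname and path below are OpenAI's public chat endpoint; the wiring comment is illustrative, not a complete server:

```typescript
// Sketch of the request your proxy builds on every call. The point:
// your credential sits on your server, in the hot path of every request.
function buildUpstreamRequest(apiKey: string) {
  return {
    hostname: "api.openai.com",
    path: "/v1/chat/completions",
    method: "POST" as const,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // your key, your liability
    },
  };
}

// Wiring sketch (not started here): an HTTP handler would pipe the
// browser's request body into https.request(buildUpstreamRequest(key))
// and stream the upstream response back to the client.
```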
The cost breakdown for a typical chat app with 1,000 daily active users:
| Cost | Monthly estimate |
|---|---|
| AI inference (GPT-4o @ ~$5/1M tokens) | $150–500 |
| Server hosting (handling streaming) | $20–50 |
| Developer time on proxy/auth code | 2–5 days initial + ongoing |
| Security liability for stored API keys | Unquantifiable |
For hobby projects and small tools, this is often the reason AI features don’t get built. The infrastructure cost isn’t worth it for a side project.
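The inference line in the table can be sanity-checked with a back-of-envelope calculation. All inputs here are illustrative assumptions, not measured figures:

```typescript
// Rough monthly inference cost from daily usage assumptions.
function monthlyInferenceCost(
  dau: number,
  messagesPerUserPerDay: number,
  tokensPerMessage: number,
  dollarsPerMillionTokens: number,
): number {
  const tokensPerDay = dau * messagesPerUserPerDay * tokensPerMessage;
  return (tokensPerDay * 30 * dollarsPerMillionTokens) / 1_000_000;
}

// 1,000 DAU, 2 messages/user/day, ~1,000 tokens per exchange, $5/1M tokens:
console.log(monthlyInferenceCost(1000, 2, 1000, 5)); // → 300 ($/month)
```

Heavier usage patterns (10+ messages per user per day, longer contexts) push the figure well past the table's upper bound, which is why the range is wide.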
## The alternative: user-provided AI
What if the user already has an AI subscription? Most developers using AI tools already have at least one of: an OpenAI key, a Claude key, an Ollama instance running locally, or access to a cloud provider like Bedrock or Vertex AI.
Instead of your app proxying requests through your server with your key, the user’s browser extension routes requests to their provider with their credentials. Your app sends a message, gets a response, and never sees a key.
```
Browser → Arlopass Extension (in browser) → Local Bridge → User's AI Provider
```
The architecture shift: your app is a frontend-only client. The AI pipeline runs entirely on the user’s machine.
## How it works in practice
Arlopass is an open-source browser extension that implements this pattern. Here’s the actual integration:
### 1. Install the SDK

```bash
npm install @arlopass/web-sdk
```
### 2. Connect and chat

```ts
import { ArlopassClient } from "@arlopass/web-sdk";

const client = new ArlopassClient({
  transport: window.arlopass,
  origin: location.origin,
});

await client.connect({ appId: "com.myapp.chat" });

// Send a message — routes to whatever the user selected
const reply = await client.chat.send({
  messages: [{ role: "user", content: "What's new in TypeScript 5.8?" }],
});
console.log(reply.message.content);
```
### 3. Stream responses

```ts
for await (const event of client.chat.stream({
  messages: [{ role: "user", content: "Explain React Suspense" }],
})) {
  if (event.type === "chunk") {
    document.getElementById("output")!.textContent += event.delta;
  }
}
```
That’s the entire AI integration. No server file. No API route. No environment variable.
## What the user sees
When your app calls `client.connect()`, the Arlopass extension shows a consent prompt:
- The user sees which app is requesting AI access
- They select a provider from their connected list (Ollama, OpenAI, Claude, etc.)
- They pick a specific model (llama3.2, gpt-4o, claude-3.5-sonnet)
- They approve the connection
From that point, your app can send messages and receive responses. The user can change their model at any time through the extension popup — your code doesn’t change.
## The architecture in detail
```
┌──────────────────────────────────┐
│ Your Web App                     │
│ @arlopass/web-sdk                │
│ (10 lines, no credentials)       │
└──────────────┬───────────────────┘
               │ window.arlopass (injected transport)
┌──────────────▼───────────────────┐
│ Arlopass Browser Extension       │
│ Consent UI · Permission store    │
│ HMAC handshake · Anti-replay     │
└──────────────┬───────────────────┘
               │ Chrome native messaging (stdio)
┌──────────────▼───────────────────┐
│ Local Bridge Daemon              │
│ Adapter host · Session manager   │
│ Policy enforcement               │
├──────┬──────┬──────┬─────────────┤
│Ollama│OpenAI│Claude│ 6 more...   │
└──────┴──────┴──────┴─────────────┘
```
Trust boundaries at every layer:
- Web app ↔ Extension: Origin isolation, per-request consent
- Extension ↔ Bridge: HMAC challenge/response, ephemeral session keys, nonce replay protection
- Bridge ↔ Provider: OS keychain credential storage, least-privilege auth, request timeouts
The user’s API keys are in their OS keychain (Windows Credential Manager, macOS Keychain, Linux Secret Service) — not in browser storage, not in your app, not on any server.
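The extension↔bridge handshake described above follows a standard challenge/response pattern. As a sketch of that pattern (the general technique, not Arlopass's actual wire protocol), using Node's crypto module:

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Nonces already consumed by a handshake; reusing one is rejected as a replay.
const seenNonces = new Set<string>();

// Verifier side: issue a fresh random nonce per handshake attempt.
function issueChallenge(): string {
  return randomBytes(16).toString("hex");
}

// Prover side: prove knowledge of the shared key without sending it.
function respond(sharedKey: string, nonce: string): string {
  return createHmac("sha256", sharedKey).update(nonce).digest("hex");
}

// Verifier side: recompute the HMAC and compare in constant time.
function verify(sharedKey: string, nonce: string, proof: string): boolean {
  if (seenNonces.has(nonce)) return false; // replayed nonce → reject
  seenNonces.add(nonce);
  const expected = respond(sharedKey, nonce);
  return (
    proof.length === expected.length &&
    timingSafeEqual(Buffer.from(proof), Buffer.from(expected))
  );
}
```

The nonce makes each proof single-use, so an attacker who captures a valid response cannot replay it to open a new session.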
## When this architecture makes sense
Good fit:
- Developer tools where every user has their own AI preference
- Productivity apps (writing assistants, code tools, summarizers)
- Internal tools where teams already have API keys
- Open-source projects where you can’t afford inference costs
- Privacy-sensitive applications where prompts shouldn’t leave the user’s machine
- Hobby projects where $0 infrastructure cost matters
Not a good fit:
- SaaS products where you control the AI model and behavior (your product, your API key)
- Apps that need server-side pre/post-processing of AI responses
- Apps where you need to rate-limit or meter AI usage per user
- Consumer apps where asking users to install a browser extension is a dealbreaker
## Comparison: backend proxy vs. user-provided AI
| | Backend proxy | User-provided AI (Arlopass) |
|---|---|---|
| Server infrastructure | Required | None |
| API keys in your code | Yes (.env) | None |
| AI inference cost | You pay | User pays |
| Provider flexibility | You pick one | User picks any |
| Lines of integration code | 40–80 | ~10 |
| Works offline | No | Yes (with Ollama) |
| User installs extension | No | Yes |
| Security liability for keys | High | None |
The extension installation is the honest trade-off. It adds more adoption friction than a pure web app has. But for the right audience — developers, power users, people who already run Ollama — it's a small ask that eliminates entire categories of infrastructure.
## Frequently asked questions
### What if the user doesn't have Arlopass installed?

The SDK detects this immediately. The React SDK provides guard components (`ArlopassRequiredGate`) that render an install prompt. In vanilla JS, check for `window.arlopass` before connecting. Your app should degrade gracefully or show an onboarding flow.
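A minimal version of that vanilla-JS check, written against a plain object so it can be unit-tested outside a browser. The `arlopass` property name comes from the injected transport shown earlier; `showInstallPrompt` is a hypothetical placeholder for your own fallback UI:

```typescript
// Detect the injected transport before attempting to connect.
type MaybeArlopass = { arlopass?: unknown };

function isArlopassAvailable(w: MaybeArlopass): boolean {
  return typeof w.arlopass === "object" && w.arlopass !== null;
}

// Usage in the browser (sketch):
// if (isArlopassAvailable(window)) { await client.connect({ appId: "..." }); }
// else { showInstallPrompt(); } // your onboarding / graceful-degradation path
```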
### Can I use this with Next.js / Remix / Astro?

Yes. The SDK is client-only — it runs in the browser. Your framework's SSR doesn't need to know about it. Import the SDK in a client component (`"use client"` in Next.js) and use it like any other browser API.
### What if the user switches providers mid-session?

The SDK fires a `providersChanged` event. Your app can listen via `client.onProvidersChanged()` or the React SDK's `useProviders` hook, which auto-updates when the user changes their selection.
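A sketch of the subscribe/unsubscribe pattern this implies. The emitter below is a stand-in for the SDK client so the pattern is testable without the SDK; the types are illustrative, not the SDK's actual definitions:

```typescript
// Hypothetical shapes, for illustration only.
type Provider = { id: string; model: string };
type Listener = (providers: Provider[]) => void;

class ProvidersEmitter {
  private listeners: Listener[] = [];

  // Subscribe; returns an unsubscribe function, a common SDK convention.
  onProvidersChanged(fn: Listener): () => void {
    this.listeners.push(fn);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== fn);
    };
  }

  // Called when the user changes their selection in the extension popup.
  emit(providers: Provider[]): void {
    for (const fn of this.listeners) fn(providers);
  }
}
```

Because the change arrives as an event, your app re-renders its provider display without reconnecting or changing any request code.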
### Is streaming really real-time?

Yes. The SDK uses async iterators that yield chunks as they arrive from the provider. In the React SDK, `useChat` accumulates streaming content via `requestAnimationFrame` for smooth UI updates. There's no buffering in the extension or bridge layer — chunks are forwarded immediately.
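Because the contract is a plain async iterator, the accumulation logic is easy to exercise with a fake stream. A sketch, with the event shape assumed from the streaming example earlier:

```typescript
// Event shape assumed from the earlier streaming example.
type StreamEvent = { type: "chunk"; delta: string } | { type: "done" };

// Fake provider stream: yields each part as a chunk, then a done event.
async function* fakeStream(parts: string[]): AsyncGenerator<StreamEvent> {
  for (const delta of parts) yield { type: "chunk", delta };
  yield { type: "done" };
}

// Same accumulation the UI code performs: concatenate chunk deltas.
async function collect(stream: AsyncIterable<StreamEvent>): Promise<string> {
  let text = "";
  for await (const event of stream) {
    if (event.type === "chunk") text += event.delta;
  }
  return text;
}
```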
### How much overhead does the extension add?

Typically under 5 ms per request. The extension doesn't parse or transform the AI response — it's a routing layer. Streaming chunks are forwarded as they arrive, with no queuing.
## Try it
- Web SDK quickstart — Get running in 5 minutes
- React SDK quickstart — Hooks and guard components
- Starter template — Clone and run
- GitHub — MIT licensed, open source