Lesson 06 — Browser inside codemode
Grounding: issue #1501, building on PRs #1566 (codemode provider runtimes) and #1492 (reusable Browser Run sessions). Code in packages/codemode/src/executor.ts, packages/agents/src/browser/, and packages/think/src/tools/.
1. Two sandboxes that should be one
The SDK ships two code-mode surfaces that look alike but never meet.
The codemode tool runs model-written JavaScript in a
sandboxed Worker isolate, exposing your tools under namespaces:
codemode.* for arbitrary tools, state.* for
the workspace filesystem. The browser tool
(browser_execute) runs model-written JavaScript in a
separate sandbox, exposing only a cdp helper to
drive Chrome over the DevTools Protocol.
They are siblings. Code in the browser sandbox cannot call
state.writeFileBytes(); code in the codemode sandbox
cannot call cdp.send(). So the obvious agent task —
"navigate to a page, screenshot it, save the PNG to the workspace" —
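cannot be expressed in one step. You drive the browser in one tool
call, return the screenshot bytes through the model (as a base64 string
the model must copy back verbatim, spending context tokens on it), and
write them to the workspace in another. Round-tripping binary through
the LLM is exactly what you do not want.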
Issue #1501 asks to nest them: make cdp available inside
the normal codemode execution, alongside state.* and
everything else. The motivating snippet:
async () => {
  const { targetId } = await cdp.send("Target.createTarget", { url: "about:blank" });
  const sessionId = await cdp.attachToTarget(targetId);
  await cdp.send("Page.navigate", { url: "https://example.com" }, { sessionId });
  const shot = await cdp.send("Page.captureScreenshot", { format: "png" }, { sessionId });
  await state.writeFileBytes("/captures/example.png", shot.data, "image/png");
  return { saved: "/captures/example.png" };
}
This lesson reconstructs how two in-flight PRs, combined, make that one code block possible without any new sandbox plumbing.
2. Primer: providers, namespaces, runtimes
Codemode's unit of capability is the tool provider.
createCodeTool
(packages/codemode/src/tool.ts:93) takes a list of
providers and merges them into a single code tool. Each
provider contributes a namespace and a set of functions; every
namespace is visible to the same sandbox code.
| Field on ToolProvider | Purpose |
|---|---|
| name | The sandbox namespace, e.g. state → state.readFile(). |
| tools | AI SDK tools; used to generate the type block the model sees, and to extract the host-side fns. |
| types | Optional hand-written type declarations (used instead of generating them from tools). |
| createRuntime | Optional per-execution lifecycle (the new bit from #1566). |
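To make the table concrete, here is a minimal TypeScript sketch of the provider shape and of merging several providers into one namespace table. It is a simplification and not the SDK's code. Here ToolProvider carries a plain fns map instead of AI SDK tools (the real createCodeTool derives host-side fns and the type block from those tools), and HostFn and buildNamespaces are hypothetical names. Rejecting duplicate namespace names is a choice made in this sketch, not documented SDK behaviour.

```typescript
// Simplified stand-in for a host-side function exposed to sandbox code.
type HostFn = (...args: any[]) => unknown;

interface ProviderRuntime {
  fns: Record<string, HostFn>;
  dispose?: () => Promise<void> | void;
}

// Simplified ToolProvider: a plain fns map replaces the AI SDK tools field.
interface ToolProvider {
  name: string;                 // sandbox namespace, e.g. "state"
  fns: Record<string, HostFn>;  // static host-side functions
  types?: string;               // declarations shown to the model
  createRuntime?: () => Promise<ProviderRuntime> | ProviderRuntime;
}

// Merge providers into one table keyed by namespace. This sketch rejects
// duplicate names so one provider cannot silently shadow another.
function buildNamespaces(providers: ToolProvider[]): Map<string, Record<string, HostFn>> {
  const table = new Map<string, Record<string, HostFn>>();
  for (const provider of providers) {
    if (table.has(provider.name)) {
      throw new Error(`duplicate provider namespace: ${provider.name}`);
    }
    table.set(provider.name, { ...provider.fns });
  }
  return table;
}

// Toy in-memory "state" provider in the spirit of { name: "state", tools, types }.
const files = new Map<string, string>();
const stateProvider: ToolProvider = {
  name: "state",
  fns: {
    writeFile: (path: string, text: string) => { files.set(path, text); },
    readFile: (path: string) => files.get(path),
  },
  types:
    "declare const state: { readFile(path: string): Promise<string | undefined>; " +
    "writeFile(path: string, text: string): Promise<void> };",
};

const namespaces = buildNamespaces([stateProvider]);
```

Every namespace ends up in the same table, which is why a later cdp provider can sit beside state without any change to the merging step.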
The state provider is the model to copy: it is just
{ name: "state", tools, types: STATE_TYPES }
(packages/shell/src/workers.ts). Browser-in-codemode is
"another provider" — once it has a runtime.
3. The runtime: per-execution lifecycle
PR #1566 adds createRuntime to a provider. When the
executor runs a code block it instantiates each provider's runtime,
overlays the runtime's fns on top of the static ones,
and disposes the runtime when the block finishes
(packages/codemode/src/executor.ts:354-372):
for (const provider of providers) {
  const runtime = provider.createRuntime ? await provider.createRuntime() : undefined;
  if (runtime) runtimes.push(runtime);
  activeProviders.push({
    name: provider.name,
    fns: { ...provider.fns, ...(runtime?.fns ?? {}) } // runtime fns win
  });
}
// … run the code …
// finally:
await disposeRuntimes(); // reverse order, errors surfaced
A runtime is { fns, dispose? }. Its job is to hold state
that should live exactly as long as one execution — a database
connection, a browser session — and tear it down afterwards. The
executor guarantees dispose() runs even if the code
throws (executor.ts:477).
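A small single-process stand-in for the executor lets you exercise this lifecycle. It is not the code in executor.ts. runWithProviders is a hypothetical name, and the model's code is reduced to an async callback that receives the namespaces. The sketch also assumes that "errors surfaced" means collecting dispose failures and raising one Error once every runtime has been disposed.

```typescript
type HostFn = (...args: any[]) => unknown;

interface ProviderRuntime {
  fns: Record<string, HostFn>;
  dispose?: () => Promise<void> | void;
}

interface ToolProvider {
  name: string;
  fns: Record<string, HostFn>;
  createRuntime?: () => Promise<ProviderRuntime> | ProviderRuntime;
}

type Namespaces = Record<string, Record<string, HostFn>>;

// Hypothetical stand-in for one execution: create runtimes, overlay their fns
// on the static ones, run the code, and always dispose in reverse order.
async function runWithProviders<T>(
  providers: ToolProvider[],
  code: (ns: Namespaces) => Promise<T>,
): Promise<T> {
  const runtimes: ProviderRuntime[] = [];

  const disposeRuntimes = async (): Promise<void> => {
    const errors: unknown[] = [];
    for (const runtime of [...runtimes].reverse()) {
      try {
        await runtime.dispose?.();
      } catch (err) {
        errors.push(err); // keep going so every runtime still gets dispose()
      }
    }
    if (errors.length > 0) {
      throw new Error(`dispose failed for ${errors.length} runtime(s)`);
    }
  };

  try {
    const ns: Namespaces = {};
    for (const provider of providers) {
      const runtime = provider.createRuntime ? await provider.createRuntime() : undefined;
      if (runtime) runtimes.push(runtime);
      ns[provider.name] = { ...provider.fns, ...(runtime?.fns ?? {}) }; // runtime fns win
    }
    return await code(ns);
  } finally {
    await disposeRuntimes(); // runs whether the code returned or threw
  }
}
```

Runtimes are created inside the try, so if a later createRuntime() throws, the runtimes already created are still disposed. One simplification: a dispose failure thrown from finally replaces any error the code itself threw.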
Why this is the right hook
A browser session is precisely "state scoped to one execution that must be cleaned up." Before #1566 there was nowhere to hang that lifecycle inside codemode. The runtime is the missing seam.
4. The lease: one-shot vs reusable
PR #1492 introduces a BrowserSessionManager
(packages/agents/src/browser/session-manager.ts) that
hides whether a browser session is fresh or reused behind one method:
interface BrowserSessionManager {
  acquire(): Promise<BrowserLease>; // a CdpSession + release()
  info(): Promise<BrowserSessionInfo | undefined>;
  close(): Promise<void>;
  reset(): Promise<BrowserSessionInfo | undefined>;
}
Two implementations, same contract:
- One-shot (default): acquire() connects a fresh CdpSession; release() closes it. A clean browser per execution.
- Reusable (session: { mode: "reuse" }): acquire() reconnects to a Browser Run session whose id is persisted (in Think, in the agent's SQLite DB); release() only disconnects the WebSocket, leaving tabs, cookies, and navigation state warm for the next call.
The caller never branches on the mode. That is what makes the next step a one-liner.
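A toy model of the two managers shows why the caller never has to branch on the mode. Everything in it is hypothetical and in-memory. FakeBrowserRun stands in for Browser Run, a plain tabs array stands in for a session's browser state, and the persisted id is a closure variable rather than a SQLite row. Falling back to a fresh session when the persisted one has gone is an assumption. Only acquire() and release() mirror the interface above; info(), close(), and reset() are left out.

```typescript
// Hypothetical in-memory stand-ins; none of these are SDK types.
class FakeBrowserRun {
  private nextId = 1;
  readonly alive = new Map<string, string[]>(); // session id -> open tabs

  create(): string {
    const id = `s${this.nextId++}`;
    this.alive.set(id, []);
    return id;
  }

  close(id: string): void {
    this.alive.delete(id);
  }
}

interface FakeCdpSession {
  sessionId: string;
  tabs: string[]; // shared with FakeBrowserRun, so it outlives one connection
}

interface BrowserLease {
  session: FakeCdpSession;
  release(): Promise<void>;
}

interface LeaseManager {
  acquire(): Promise<BrowserLease>;
}

// One-shot: a fresh session per acquire(); release() closes it.
function oneShotManager(run: FakeBrowserRun): LeaseManager {
  return {
    async acquire() {
      const id = run.create();
      return {
        session: { sessionId: id, tabs: run.alive.get(id)! },
        release: async () => run.close(id),
      };
    },
  };
}

// Reusable: reconnect to the persisted id while it is alive; release() only
// "disconnects", leaving the session and its tabs in place.
function reusableManager(run: FakeBrowserRun): LeaseManager {
  let persistedId: string | undefined; // in Think the real id lives in the agent's SQLite DB
  return {
    async acquire() {
      if (!persistedId || !run.alive.has(persistedId)) {
        persistedId = run.create();
      }
      const id = persistedId;
      return {
        session: { sessionId: id, tabs: run.alive.get(id)! },
        release: async () => { /* disconnect only; the session stays alive */ },
      };
    },
  };
}

// The caller is identical for both modes: acquire, use, release.
async function openTab(manager: LeaseManager, url: string): Promise<number> {
  const lease = await manager.acquire();
  try {
    lease.session.tabs.push(url);
    return lease.session.tabs.length;
  } finally {
    await lease.release();
  }
}
```

Calling openTab twice returns 1 both times with the one-shot manager (a clean browser each call) but 1 and then 2 with the reusable one, because the first tab is still open.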
5. Joining them: the cdp provider
The lease is the lifecycle the runtime wants. So the
browser-in-codemode runtime acquires a lease lazily on first
cdp.* call and releases it on dispose
(packages/agents/src/browser/shared.ts):
export function createCdpRuntime(sessionManager) {
  let lease, leasePromise;
  const getSession = async () => {
    leasePromise ??= sessionManager.acquire(); // exactly once per execution
    lease = await leasePromise;
    return lease.session;
  };
  return {
    fns: {
      send: async (m, p, o) => (await getSession()).send(m, p, o),
      attachToTarget: async (t, o) => (await getSession()).attachToTarget(t, o),
      getDebugLog: async (n) => (await getSession()).getDebugLog(n),
      clearDebugLog: async () => (await getSession()).clearDebugLog()
    },
    dispose: async () => { await lease?.release(); } // close or disconnect
  };
}
Wrapping that in a provider is then trivial:
export function createBrowserCdpProvider(options) {
  const sessionManager = createBrowserSessionManager(options);
  const provider = {
    name: "cdp",
    tools: {},          // no static tools…
    types: CDP_TYPES,   // …just the `declare const cdp` block
    createRuntime: () => createCdpRuntime(sessionManager)
  };
  return { provider, sessionManager, /* lifecycle handlers */ };
}
Three properties fall out for free:
- Lazy. No browser is opened unless the code actually calls cdp.*. A code block that only touches state.* pays nothing.
- Reuse within an execution. The leasePromise ??= means every cdp.* call in one block shares one session, so tabs survive across calls.
- Mode-agnostic cleanup. dispose calls release(); the manager decides whether that closes the browser or just parks it.
sequenceDiagram
participant M as Model code (sandbox)
participant Ex as Executor
participant RT as cdp runtime
participant SM as SessionManager
participant B as Browser Run
Ex->>RT: createRuntime()
Note over RT: no session yet
M->>RT: cdp.send("Target.createTarget")
RT->>SM: acquire() (first use)
SM->>B: connect / reconnect
SM-->>RT: lease(session, release)
RT-->>M: result
M->>RT: cdp.send("Page.navigate")
Note over RT: same lease, no re-acquire
M->>M: state.writeFileBytes(...)
Ex->>RT: dispose()
RT->>SM: lease.release()
Note over SM,B: one-shot → close; reuse → disconnect, keep warm
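You can check the three properties listed above, and the flow in the diagram, against a typed variant of createCdpRuntime. The sketch keeps the lazy leasePromise ??= shape from shared.ts, but it narrows the surface to send() and uses hypothetical minimal types (CdpLike, LeaseLike, ManagerLike). It also makes one deliberate change: dispose() awaits leasePromise instead of reading lease. With the original shape, a cdp.* call still waiting on acquire() when the block ends (for example, an un-awaited cdp.send) leaves lease undefined, so nothing is released. Whether that case arises in practice depends on how the executor settles pending calls, so treat the change as a defensive assumption rather than a known bug.

```typescript
// Hypothetical minimal types; a real CdpSession also has attachToTarget(),
// getDebugLog(), and clearDebugLog().
interface CdpLike {
  send(method: string, params?: object, opts?: { sessionId?: string }): Promise<unknown>;
}
interface LeaseLike {
  session: CdpLike;
  release(): Promise<void>;
}
interface ManagerLike {
  acquire(): Promise<LeaseLike>;
}

function createCdpRuntimeSketch(sessionManager: ManagerLike) {
  let leasePromise: Promise<LeaseLike> | undefined;

  const getSession = async (): Promise<CdpLike> => {
    leasePromise ??= sessionManager.acquire(); // only the first cdp.* call acquires
    return (await leasePromise).session;
  };

  return {
    fns: {
      send: async (method: string, params?: object, opts?: { sessionId?: string }) =>
        (await getSession()).send(method, params, opts),
    },
    dispose: async () => {
      if (!leasePromise) return; // browser never touched: nothing to release
      const lease = await leasePromise.catch(() => undefined); // acquire may have failed
      await lease?.release();
    },
  };
}

// A counting fake manager, used to observe when acquire()/release() happen.
function countingManager() {
  const counts = { acquired: 0, released: 0, sent: [] as string[] };
  const manager: ManagerLike = {
    async acquire() {
      counts.acquired++;
      return {
        session: {
          send: async (method: string) => { counts.sent.push(method); return {}; },
        },
        release: async () => { counts.released++; },
      };
    },
  };
  return { manager, counts };
}
```

A state-only block creates and disposes the runtime without a single acquire(). Several cdp.* calls in one block, even concurrent ones, share one acquire() and one release().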
6. The host / sandbox boundary
A subtle but important point: cdp.* functions do not run
inside the isolate. The sandbox proxy serialises the call and
dispatches it back to the host via Workers RPC
(ToolDispatcher, executor.ts:205-227). The
host-side fns hold the real CdpSession and
talk to Browser Rendering through the BROWSER Fetcher
binding.
That means the sandbox stays fully network-isolated
(globalOutbound: null) — model code cannot
fetch() the internet — yet it can still drive a browser,
because the browser access is mediated by the host exactly like
state.* filesystem access. Same trust model, no new hole.
One detail that bit the merge
ToolDispatcher.call always spreads arguments
positionally (fn(...args)). #1492 carried a
positionalArgs: true flag from an older executor API;
in the runtime-based executor that flag does not exist and every
provider fn is positional already. Resolving the merge meant
deleting it, not preserving it.
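You can model the boundary described in this section in a single process. In this sketch a Proxy plays the sandbox-side cdp object, so every method call serialises its arguments to JSON and hands (namespace, name, json) to a dispatcher. The dispatcher then looks up the host-side fn and calls fn(...args). HostDispatcher and makeSandboxNamespace are hypothetical names, and the real ToolDispatcher crosses an isolate boundary via Workers RPC rather than a direct method call. The spread is unconditional, so a provider fn written as send(method, params, opts) receives its arguments unchanged, which is why a separate positionalArgs flag had nothing left to do.

```typescript
type HostFn = (...args: any[]) => unknown;

// Host side: owns the real capabilities and decides what is reachable.
class HostDispatcher {
  constructor(private readonly namespaces: Record<string, Record<string, HostFn>>) {}

  // Arguments arrive as a JSON array and are always spread positionally.
  async call(namespace: string, name: string, argsJson: string): Promise<string> {
    const fn = this.namespaces[namespace]?.[name];
    if (!fn) throw new Error(`unknown function ${namespace}.${name}`);
    const args: unknown[] = JSON.parse(argsJson);
    const result = await fn(...args);
    return JSON.stringify(result ?? null); // undefined is not valid JSON
  }
}

// Sandbox side: a namespace object with no capabilities of its own. Each
// method call is serialised and forwarded to the host dispatcher.
function makeSandboxNamespace(dispatcher: HostDispatcher, namespace: string) {
  const target: Record<string, (...args: unknown[]) => Promise<unknown>> = {};
  return new Proxy(target, {
    get: (_target, prop) =>
      prop === "then"
        ? undefined // keep the proxy from being mistaken for a Promise
        : async (...args: unknown[]) =>
            JSON.parse(await dispatcher.call(namespace, String(prop), JSON.stringify(args))),
  });
}
```

The sandbox-side object only knows how to forward calls. Anything the host did not register, such as an invented cdp.readFile, fails at the dispatcher instead of reaching a capability.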
7. Wiring it into a Think agent
createExecuteTool already accepts extra providers
(packages/think/src/tools/execute.ts:140-166), so the
Think helper just produces a provider plus the optional session
lifecycle tools:
getTools() {
  const { provider, sessionTools } = createBrowserProvider({
    browser: this.env.BROWSER,
    session: { mode: "reuse" },
  });
  return {
    ...sessionTools, // browser_session_info / close / reset
    code: createExecuteTool({
      tools: myTools, // codemode.*
      state: createWorkspaceStateBackend(this.workspace), // state.*
      loader: this.env.LOADER,
      providers: [provider], // cdp.* ← the new bit
    }),
  };
}
Note createBrowserProvider takes no loader:
the cdp helpers run host-side, and the surrounding
code tool owns the sandbox executor. The loader lives
with the tool that needs it, not duplicated onto the browser config.
8. Self check
Q1
Why does exposing the browser as a codemode provider require no new sandbox or executor code, when the standalone browser tool needed its own executor?
Answer
Because the code tool is a composition root that
already runs one executor over many provider namespaces. Adding
cdp to its provider list puts cdp.* in
the same isolate as state.*. The only new concept
needed was a per-execution lifecycle for the session — supplied
by #1566's createRuntime — not a new sandbox.
Q2
The runtime acquires the lease with leasePromise ??=
rather than calling acquire() in
createRuntime. Why does that distinction matter?
Answer
Laziness. createRuntime runs for every execution,
including ones that never touch the browser. Acquiring on first
cdp.* call means a state-only block
opens no browser session and pays no connection cost. The
??= also guarantees a single shared session for all
cdp.* calls within the block.
Q3
In reuse mode, what does dispose do at the
end of a code block, and what survives to the next block?
Answer
dispose calls lease.release(), which in
the reusable manager only disconnects the CDP WebSocket. The
Browser Run session itself is left alive (its id persisted), so
tabs, cookies, local storage, and navigation state survive to the
next execution's acquire(). One-shot mode would have
closed the session instead.
Q4
Model code calls cdp.send(...). Where does the CDP
request actually originate, and why does that keep the sandbox's
globalOutbound: null isolation intact?
Answer
The call is dispatched out of the isolate via Workers RPC to a
host-side function holding the CdpSession, which
reaches Browser Rendering through the BROWSER binding.
The isolate never opens a socket itself, so blocking its outbound
network does not block browser control — exactly the way
state.* filesystem access already works.