Lesson 06 — Browser inside codemode

Grounding: issue #1501, building on PRs #1566 (codemode provider runtimes) and #1492 (reusable Browser Run sessions). Code in packages/codemode/src/executor.ts, packages/agents/src/browser/, and packages/think/src/tools/.

1. Two sandboxes that should be one

The SDK ships two code-mode surfaces that look alike but never meet. The codemode tool runs model-written JavaScript in a sandboxed Worker isolate, exposing your tools under namespaces: codemode.* for arbitrary tools, state.* for the workspace filesystem. The browser tool (browser_execute) runs model-written JavaScript in a separate sandbox, exposing only a cdp helper to drive Chrome over the DevTools Protocol.

They are siblings. Code in the browser sandbox cannot call state.writeFileBytes(); code in the codemode sandbox cannot call cdp.send(). So the obvious agent task ("navigate to a page, screenshot it, save the PNG to the workspace") cannot be expressed in one step. You drive the browser in one tool call, return the bytes through the model, and write them in another. Round-tripping binary through the LLM is exactly what you do not want: CDP returns the screenshot as base64 text, so the whole image passes through the model's context twice, once as tool output and once as tool input.
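To put a rough number on that cost: base64 emits 4 characters for every 3 bytes, padded up to a multiple of 4. The sketch below only does that arithmetic; the 1 MB screenshot size is a made-up figure for illustration.

```typescript
// Length of the padded base64 text that encodes `byteLength` raw bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const pngBytes = 1000000; // hypothetical 1 MB screenshot
console.log(base64Length(pngBytes)); // 1333336 characters of text
```

Every one of those characters would have to travel through the model's context in the two-tool version of the task.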

Issue #1501 asks to nest them: make cdp available inside the normal codemode execution, alongside state.* and everything else. The motivating snippet:

async () => {
  const { targetId } = await cdp.send("Target.createTarget", { url: "about:blank" });
  const sessionId = await cdp.attachToTarget(targetId);
  await cdp.send("Page.navigate", { url: "https://example.com" }, { sessionId });
  const shot = await cdp.send("Page.captureScreenshot", { format: "png" }, { sessionId });
  await state.writeFileBytes("/captures/example.png", shot.data, "image/png");
  return { saved: "/captures/example.png" };
}

This lesson reconstructs how two in-flight PRs, combined, make that one code block possible without any new sandbox plumbing.

2. Primer: providers, namespaces, runtimes

Codemode's unit of capability is the tool provider. createCodeTool (packages/codemode/src/tool.ts:93) takes a list of providers and merges them into a single code tool. Each provider contributes a namespace and a set of functions; every namespace is visible to the same sandbox code.

Field on ToolProvider   Purpose
name                    The sandbox namespace, e.g. state → state.readFile().
tools                   AI SDK tools; used to generate the type block the model sees, and to extract host-side fns.
types                   Optional hand-written type declarations (used instead of generating from tools).
createRuntime           Optional per-execution lifecycle (the new bit from #1566).

The state provider is the model to copy: it is just { name: "state", tools, types: STATE_TYPES } (packages/shell/src/workers.ts). Browser-in-codemode is "another provider" — once it has a runtime.
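As a rough illustration of "each provider contributes a namespace", here is a minimal sketch. SketchProvider and buildNamespaces are hypothetical names, not the codemode API; the real ToolProvider carries tools and types and derives its host-side fns from them, which this sketch skips by taking fns directly.

```typescript
// Hypothetical, simplified shapes for illustration only.
type HostFn = (...args: unknown[]) => Promise<unknown>;

interface SketchProvider {
  name: string;                 // becomes the sandbox namespace
  fns: Record<string, HostFn>;  // host-side functions behind that namespace
}

// Merge a list of providers into one object keyed by namespace.
function buildNamespaces(
  providers: SketchProvider[]
): Record<string, Record<string, HostFn>> {
  const namespaces: Record<string, Record<string, HostFn>> = {};
  for (const provider of providers) {
    namespaces[provider.name] = { ...provider.fns };
  }
  return namespaces;
}

// An in-memory stand-in for the workspace filesystem.
const files = new Map<string, string>();

const stateProvider: SketchProvider = {
  name: "state",
  fns: {
    writeFile: async (path, text) => { files.set(String(path), String(text)); },
    readFile: async (path) => files.get(String(path)),
  },
};

const toolsProvider: SketchProvider = {
  name: "codemode",
  fns: { add: async (a, b) => Number(a) + Number(b) },
};

// Both namespaces are visible to the same piece of code.
const ns = buildNamespaces([stateProvider, toolsProvider]);
```

Adding a third entry with name "cdp" to that list is all "another provider" means at this level.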

3. The runtime: per-execution lifecycle

PR #1566 adds createRuntime to a provider. When the executor runs a code block it instantiates each provider's runtime, overlays the runtime's fns on top of the static ones, and disposes the runtime when the block finishes (packages/codemode/src/executor.ts:354-372):

for (const provider of providers) {
  const runtime = provider.createRuntime ? await provider.createRuntime() : undefined;
  if (runtime) runtimes.push(runtime);
  activeProviders.push({
    name: provider.name,
    fns: { ...provider.fns, ...(runtime?.fns ?? {}) }   // runtime fns win
  });
}
// … run the code …
// finally:
await disposeRuntimes();   // reverse order, errors surfaced

A runtime is { fns, dispose? }. Its job is to hold state that should live exactly as long as one execution — a database connection, a browser session — and tear it down afterwards. The executor guarantees dispose() runs even if the code throws (executor.ts:477).
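The lifecycle can be sketched end to end with a try/finally. This is a simplified model of the excerpt above, not the executor itself: runWithRuntimes and the types are hypothetical, the "code" is an ordinary callback rather than sandboxed JavaScript, and a dispose error simply propagates, which may differ from how the real executor surfaces errors.

```typescript
// Hypothetical, simplified shapes modelled on the excerpt above.
type Fn = (...args: unknown[]) => Promise<unknown>;
interface Runtime { fns: Record<string, Fn>; dispose?: () => Promise<void>; }
interface Provider {
  name: string;
  fns: Record<string, Fn>;
  createRuntime?: () => Runtime | Promise<Runtime>;
}
type Namespaces = Record<string, Record<string, Fn>>;

async function runWithRuntimes<T>(
  providers: Provider[],
  code: (ns: Namespaces) => Promise<T>
): Promise<T> {
  const runtimes: Runtime[] = [];
  const ns: Namespaces = {};
  try {
    for (const provider of providers) {
      const runtime = provider.createRuntime ? await provider.createRuntime() : undefined;
      if (runtime) runtimes.push(runtime);
      ns[provider.name] = { ...provider.fns, ...(runtime?.fns ?? {}) }; // runtime fns win
    }
    return await code(ns);
  } finally {
    // Dispose in reverse creation order, whether the code returned or threw.
    for (const runtime of runtimes.reverse()) {
      await runtime.dispose?.();
    }
  }
}

// A provider whose runtime overrides a static fn and records its disposal.
const log: string[] = [];
const db: Provider = {
  name: "db",
  fns: { ping: async () => "static" },
  createRuntime: () => ({
    fns: { ping: async () => "runtime" },
    dispose: async () => { log.push("db disposed"); },
  }),
};
```

Calling runWithRuntimes([db], (ns) => ns.db.ping()) resolves to "runtime" and leaves one entry in log; a callback that throws still leaves its entry in log before the error reaches the caller.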

Why this is the right hook

A browser session is precisely "state scoped to one execution that must be cleaned up." Before #1566 there was nowhere to hang that lifecycle inside codemode. The runtime is the missing seam.

4. The lease: one-shot vs reusable

PR #1492 introduces a BrowserSessionManager (packages/agents/src/browser/session-manager.ts) that hides whether a browser session is fresh or reused behind one method:

interface BrowserSessionManager {
  acquire(): Promise<BrowserLease>;   // a CdpSession + release()
  info(): Promise<BrowserSessionInfo | undefined>;
  close(): Promise<void>;
  reset(): Promise<BrowserSessionInfo | undefined>;
}

Two implementations, same contract:

  • One-shot (default): acquire() connects a fresh CdpSession; release() closes it. A clean browser per execution.
  • Reusable (session: { mode: "reuse" }): acquire() reconnects to a Browser Run session whose id is persisted (in Think, in the agent's SQLite DB); release() only disconnects the WebSocket, leaving tabs, cookies, and navigation state warm for the next call.

The caller never branches on the mode. That is what makes the next step a one-liner.
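A toy version of the two managers makes the "same contract" point concrete. Everything here is a stand-in: FakeBrowser plays the role of a Browser Run session, the interfaces are cut down to acquire() and release(), and the reusable manager simply keeps an object in memory where the real one reconnects by a persisted session id.

```typescript
// Hypothetical, cut-down contract for illustration.
interface Session { send(method: string): Promise<string>; }
interface Lease { session: Session; release(): Promise<void>; }
interface SessionManager { acquire(): Promise<Lease>; }

class FakeBrowser implements Session {
  closed = false;
  readonly tabs: string[] = [];
  async send(method: string): Promise<string> {
    if (this.closed) throw new Error("browser closed");
    if (method === "Target.createTarget") this.tabs.push(`tab-${this.tabs.length + 1}`);
    return `${method} ok (${this.tabs.length} tabs)`;
  }
}

// One-shot: a fresh browser per acquire(); release() closes it.
function oneShotManager(): SessionManager {
  return {
    async acquire() {
      const browser = new FakeBrowser();
      return { session: browser, release: async () => { browser.closed = true; } };
    },
  };
}

// Reusable: the same browser on every acquire(); release() leaves it open.
function reusableManager(): SessionManager {
  const browser = new FakeBrowser();
  return {
    async acquire() {
      return { session: browser, release: async () => { /* disconnect only */ } };
    },
  };
}

// The caller is identical for both modes.
async function openTab(manager: SessionManager): Promise<string> {
  const lease = await manager.acquire();
  try {
    return await lease.session.send("Target.createTarget");
  } finally {
    await lease.release();
  }
}
```

Calling openTab twice on a one-shot manager reports one tab each time; on a reusable manager the second call reports two, because the first tab survived the release.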

5. Joining them: the cdp provider

The lease is the lifecycle the runtime wants. So the browser-in-codemode runtime acquires a lease lazily, on the first cdp.* call, and releases it on dispose (packages/agents/src/browser/shared.ts):

export function createCdpRuntime(sessionManager) {
  let lease, leasePromise;
  const getSession = async () => {
    leasePromise ??= sessionManager.acquire();  // exactly once per execution
    lease = await leasePromise;
    return lease.session;
  };
  return {
    fns: {
      send: async (m, p, o) => (await getSession()).send(m, p, o),
      attachToTarget: async (t, o) => (await getSession()).attachToTarget(t, o),
      getDebugLog: async (n) => (await getSession()).getDebugLog(n),
      clearDebugLog: async () => (await getSession()).clearDebugLog()
    },
    dispose: async () => { await lease?.release(); }   // close or disconnect
  };
}

Wrapping that in a provider is then trivial:

export function createBrowserCdpProvider(options) {
  const sessionManager = createBrowserSessionManager(options);
  const provider = {
    name: "cdp",
    tools: {},                 // no static tools…
    types: CDP_TYPES,          // …just the `declare const cdp` block
    createRuntime: () => createCdpRuntime(sessionManager)
  };
  return { provider, sessionManager, /* lifecycle handlers */ };
}

Three properties fall out for free:

  1. Lazy. No browser is opened unless the code actually calls cdp.*. A code block that only touches state.* pays nothing.
  2. Reuse within an execution. The leasePromise ??= means every cdp.* call in one block shares one session — tabs survive across calls.
  3. Mode-agnostic cleanup. dispose calls release(); the manager decides whether that closes the browser or just parks it.

The same flow as a sequence diagram (Mermaid source):

sequenceDiagram
  participant M as Model code (sandbox)
  participant Ex as Executor
  participant RT as cdp runtime
  participant SM as SessionManager
  participant B as Browser Run

  Ex->>RT: createRuntime()
  Note over RT: no session yet
  M->>RT: cdp.send("Target.createTarget")
  RT->>SM: acquire()  (first use)
  SM->>B: connect / reconnect
  SM-->>RT: lease(session, release)
  RT-->>M: result
  M->>RT: cdp.send("Page.navigate")
  Note over RT: same lease, no re-acquire
  M->>M: state.writeFileBytes(...)
  Ex->>RT: dispose()
  RT->>SM: lease.release()
  Note over SM,B: one-shot → close; reuse → disconnect, keep warm
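The first two properties are easy to observe with a counting fake. In this sketch countingManager and createLazyRuntime are hypothetical; the latter mirrors the shape of createCdpRuntime above, reduced to a single send fn and a fake session.

```typescript
interface Lease {
  session: { send(method: string): Promise<string> };
  release(): Promise<void>;
}

// A fake manager that counts acquire() and release() calls.
function countingManager() {
  const counts = { acquired: 0, released: 0 };
  const manager = {
    async acquire(): Promise<Lease> {
      counts.acquired++;
      return {
        session: { send: async (method: string) => `${method} ok` },
        release: async () => { counts.released++; },
      };
    },
  };
  return { manager, counts };
}

// Same lazy-lease pattern as createCdpRuntime, with one fn.
function createLazyRuntime(manager: { acquire(): Promise<Lease> }) {
  let lease: Lease | undefined;
  let leasePromise: Promise<Lease> | undefined;
  const getSession = async () => {
    leasePromise ??= manager.acquire(); // at most once per runtime
    lease = await leasePromise;
    return lease.session;
  };
  return {
    fns: { send: async (method: string) => (await getSession()).send(method) },
    dispose: async () => { await lease?.release(); },
  };
}

async function demo() {
  // A block that never touches cdp: nothing is acquired or released.
  const idle = countingManager();
  await createLazyRuntime(idle.manager).dispose();

  // A block that issues two cdp calls concurrently: still one lease.
  const busy = countingManager();
  const runtime = createLazyRuntime(busy.manager);
  await Promise.all([runtime.fns.send("Page.enable"), runtime.fns.send("Page.navigate")]);
  await runtime.dispose();

  return { idle: idle.counts, busy: busy.counts };
}
```

demo() resolves with zero acquires for the idle runtime and exactly one acquire and one release for the busy one. The concurrent case works because leasePromise is assigned synchronously, before the first await, so the second call finds it already set.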
            

6. The host / sandbox boundary

A subtle but important point: cdp.* functions do not run inside the isolate. The sandbox proxy serialises the call and dispatches it back to the host via Workers RPC (ToolDispatcher, executor.ts:205-227). The host-side fns hold the real CdpSession and talk to Browser Rendering through the BROWSER Fetcher binding.

That means the sandbox stays fully network-isolated (globalOutbound: null) — model code cannot fetch() the internet — yet it can still drive a browser, because the browser access is mediated by the host exactly like state.* filesystem access. Same trust model, no new hole.
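The shape of that boundary can be sketched without Workers at all. In this hypothetical model, JSON strings stand in for the RPC hop, createDispatcher plays the host, and sandboxNamespace plays the proxy the isolate sees; none of these names come from the codebase, and the real ToolDispatcher uses Workers RPC rather than JSON.

```typescript
type HostFn = (...args: unknown[]) => Promise<unknown>;
type Call = { namespace: string; name: string; args: unknown[] };

// Host side: holds the real capabilities and receives serialised calls.
function createDispatcher(namespaces: Record<string, Record<string, HostFn>>) {
  return {
    async call(message: string): Promise<string> {
      const { namespace, name, args } = JSON.parse(message) as Call;
      const fn = namespaces[namespace]?.[name];
      if (!fn) throw new Error(`unknown tool: ${namespace}.${name}`);
      return JSON.stringify({ result: await fn(...args) }); // positional spread
    },
  };
}

// Sandbox side: a Proxy that turns cdp.send(a, b) into a serialised message.
function sandboxNamespace(
  namespace: string,
  dispatcher: { call(message: string): Promise<string> }
): Record<string, HostFn> {
  return new Proxy({} as Record<string, HostFn>, {
    get: (_target, name) => {
      // Ignore symbols and "then" so the proxy is never mistaken for a promise.
      if (typeof name !== "string" || name === "then") return undefined;
      return async (...args: unknown[]) => {
        const reply = await dispatcher.call(JSON.stringify({ namespace, name, args }));
        return (JSON.parse(reply) as { result: unknown }).result;
      };
    },
  });
}

// The host fn would hold the real CdpSession; here it just echoes.
const dispatcher = createDispatcher({
  cdp: { send: async (method, params) => ({ method, params, handledOn: "host" }) },
});
const cdp = sandboxNamespace("cdp", dispatcher);
```

The sandbox-side object contains no capability of its own: it can only describe a call, and the host decides what that call does. Only values that survive serialisation can cross the boundary.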

One detail that bit the merge

ToolDispatcher.call always spreads arguments positionally (fn(...args)). #1492 carried a positionalArgs: true flag from an older executor API; in the runtime-based executor that flag does not exist and every provider fn is positional already. Resolving the merge meant deleting it, not preserving it.

7. Wiring it into a Think agent

createExecuteTool already accepts extra providers (packages/think/src/tools/execute.ts:140-166), so the Think helper just produces a provider plus the optional session lifecycle tools:

getTools() {
  const { provider, sessionTools } = createBrowserProvider({
    browser: this.env.BROWSER,
    session: { mode: "reuse" },
  });
  return {
    ...sessionTools,                       // browser_session_info / close / reset
    code: createExecuteTool({
      tools: myTools,                      // codemode.*
      state: createWorkspaceStateBackend(this.workspace),  // state.*
      loader: this.env.LOADER,
      providers: [provider],               // cdp.*  ← the new bit
    }),
  };
}

Note that createBrowserProvider takes no loader: the cdp helpers run host-side, and the surrounding code tool owns the sandbox executor. The loader lives with the tool that needs it and is not duplicated onto the browser config.

8. Self check

Q1

Why does exposing the browser as a codemode provider require no new sandbox or executor code, when the standalone browser tool needed its own executor?

Answer

Because the code tool is a composition root that already runs one executor over many provider namespaces. Adding cdp to its provider list puts cdp.* in the same isolate as state.*. The only new concept needed was a per-execution lifecycle for the session — supplied by #1566's createRuntime — not a new sandbox.

Q2

The runtime acquires the lease with leasePromise ??= rather than calling acquire() in createRuntime. Why does that distinction matter?

Answer

Laziness. createRuntime runs for every execution, including ones that never touch the browser. Acquiring on first cdp.* call means a state-only block opens no browser session and pays no connection cost. The ??= also guarantees a single shared session for all cdp.* calls within the block.

Q3

In reuse mode, what does dispose do at the end of a code block, and what survives to the next block?

Answer

dispose calls lease.release(), which in the reusable manager only disconnects the CDP WebSocket. The Browser Run session itself is left alive (its id persisted), so tabs, cookies, local storage, and navigation state survive to the next execution's acquire(). One-shot mode would have closed the session instead.

Q4

Model code calls cdp.send(...). Where does the CDP request actually originate, and why does that keep the sandbox's globalOutbound: null isolation intact?

Answer

The call is dispatched out of the isolate via Workers RPC to a host-side function holding the CdpSession, which reaches Browser Rendering through the BROWSER binding. The isolate never opens a socket itself, so blocking its outbound network does not block browser control — exactly the way state.* filesystem access already works.