
Surfaces the agent acts through.

Every agent has access to four interactive surfaces. Each is real, observable, and auditable — a real shell, a real browser, a real file tree, a real preview. The agent picks; the UI follows.

§ 01 Terminal & PTY

The terminal surface is the closest thing to simply giving the agent a computer. It's a real pseudo-terminal, not a sandboxed sub-shell or a string-in-string-out wrapper.

Runtime

Each agent gets its own node-pty process — a real interactive shell with its own working directory, environment, and process tree. Output is streamed to the UI over a WebSocket and rendered through xterm.js, the same library you'd find in VS Code's terminal. Scrollback is replayable: re-open a past run and the terminal plays back exactly as the agent saw it.
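
One way to picture replayable scrollback is as an append-only log of output chunks, each stamped with its offset from the start of the run. The sketch below is an assumption about how such a log could work, not the product's implementation: `ScrollbackLog`, `record`, and `replay` are hypothetical names, and in a real setup the `write` callback would typically be an xterm.js terminal's `write` method.

```typescript
// Hypothetical sketch: an append-only scrollback log that can be replayed.
interface OutputChunk {
  offsetMs: number; // milliseconds since the run started
  data: string;     // terminal output as text, escape sequences included
}

class ScrollbackLog {
  private chunks: OutputChunk[] = [];

  // Append a chunk as it arrives from the pseudo-terminal.
  record(offsetMs: number, data: string): void {
    this.chunks.push({ offsetMs, data });
  }

  // Feed every recorded chunk to `write` in arrival order.
  // Timing is ignored here; a real player could wait between chunks.
  replay(write: (data: string) => void): void {
    for (const chunk of this.chunks) {
      write(chunk.data);
    }
  }
}

// Usage: record two chunks, then replay them into a plain string "screen".
const log = new ScrollbackLog();
log.record(0, "$ ls\r\n");
log.record(42, "report.docx  summary.csv\r\n");

let screen = "";
log.replay((data) => {
  screen += data;
});
```

Because the log stores the raw output rather than a rendered screen, replaying it through the same terminal renderer reproduces what was originally displayed.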

Tools

Two tools expose the terminal to the agent.

execCommand() — run an arbitrary shell command.
execPython() — run a Python snippet in a managed interpreter.

Both stream output back as terminal_output events on the run timeline, and both write to the agent's knowledge base so subsequent steps can reason about what happened.

Isolation

Per-run isolation: each run gets its own working directory and environment, and one run cannot read another run's terminal state. Per-project isolation is enforced at the session boundary.
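
The per-run rule can be sketched as a registry keyed by run ID, where each run receives its own directory and its own copy of the environment. Everything here is illustrative: `RunRegistry`, `RunContext`, the `RUN_ID` variable, and the directory layout are assumptions, not the product's actual data model.

```typescript
// Hypothetical sketch of per-run isolation.
interface RunContext {
  runId: string;
  cwd: string;
  env: Record<string, string>;
}

class RunRegistry {
  private contexts = new Map<string, RunContext>();
  private baseDir: string;
  private baseEnv: Record<string, string>;

  constructor(baseDir: string, baseEnv: Record<string, string>) {
    this.baseDir = baseDir;
    this.baseEnv = baseEnv;
  }

  // Each run gets a private working directory and a copied environment.
  create(runId: string): RunContext {
    if (this.contexts.has(runId)) {
      throw new Error(`run ${runId} already exists`);
    }
    const ctx: RunContext = {
      runId,
      cwd: `${this.baseDir}/${runId}`,
      env: { ...this.baseEnv, RUN_ID: runId },
    };
    this.contexts.set(runId, ctx);
    return ctx;
  }

  // A caller may only read the context of the run it belongs to.
  get(callerRunId: string, targetRunId: string): RunContext {
    if (callerRunId !== targetRunId) {
      throw new Error("cross-run access denied");
    }
    const ctx = this.contexts.get(targetRunId);
    if (!ctx) {
      throw new Error(`unknown run ${targetRunId}`);
    }
    return ctx;
  }
}

// Usage: mutating one run's environment leaves the other untouched.
const registry = new RunRegistry("/workspaces", { PATH: "/usr/bin" });
const runA = registry.create("run-a");
const runB = registry.create("run-b");
runA.env["DEBUG"] = "1";
```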

Tip: The terminal is the surface that lets the agent do the kind of work that other AI products only describe. Convert a docx, run a pandas summary, hit a CLI, kick off a build.

§ 02 Browser

The browser surface is a real cloud Chromium controlled through Stagehand's natural-language action layer.

Runtime

Browser sessions are provisioned from a Browserbase cloud Chromium pool. Each session is its own real browser with its own cookies, viewport, and rendering. The browser is exposed back to the workspace UI as a live-view iframe, so you watch the page as the agent drives it.

Tools

Three tools expose the browser to the agent. They speak the language of intent, not low-level DOM.

browser_navigate — go to a URL.
browser_act — perform an action on the page ("click the login button", "fill the email field with X").
browser_extract — pull structured data off the page ("extract the table of invoices as JSON with columns date, vendor, amount").

Stagehand handles the translation from natural language to Playwright actions, with retries and self-healing when the page shape shifts.
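
To make the intent-level interface concrete, the three tools can be modelled as a small tagged union. The tool names come from the list above; the argument field names (`url`, `instruction`) and the `summarize` helper are assumptions made for illustration, not the documented call schema.

```typescript
// Hypothetical shape of the three browser tool calls.
type BrowserToolCall =
  | { tool: "browser_navigate"; url: string }
  | { tool: "browser_act"; instruction: string }
  | { tool: "browser_extract"; instruction: string };

// Produce a one-line summary of a call, for example for a run timeline.
function summarize(call: BrowserToolCall): string {
  switch (call.tool) {
    case "browser_navigate":
      return `navigate -> ${call.url}`;
    case "browser_act":
      return `act -> ${call.instruction}`;
    case "browser_extract":
      return `extract -> ${call.instruction}`;
  }
}

// Usage: a three-step plan expressed purely as intent.
const plan: BrowserToolCall[] = [
  { tool: "browser_navigate", url: "https://example.com/login" },
  { tool: "browser_act", instruction: "click the login button" },
  {
    tool: "browser_extract",
    instruction:
      "extract the table of invoices as JSON with columns date, vendor, amount",
  },
];

const summary = plan.map(summarize);
```

Note that nothing in the plan mentions selectors or DOM nodes; resolving the instruction to concrete page actions is left to the action layer.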

Session lifecycle

A browser session is tied to the agent for the duration of a task. When the task ends the session is released back to the pool. Cookies and login state are not carried between unrelated runs by default — if you need persistent login state, use a project-scoped session pattern.
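
The lifecycle can be sketched with a toy in-memory pool. This is not the Browserbase API: `SessionPool`, `acquire`, and `release` are hypothetical, and a real cloud pool may destroy sessions rather than reuse them. The point is only that login state is wiped before a session can serve an unrelated task.

```typescript
// Hypothetical sketch of the acquire/release lifecycle.
interface BrowserSession {
  id: string;
  cookies: Map<string, string>;
  taskId: string | null;
}

class SessionPool {
  private idle: BrowserSession[] = [];
  private nextId = 1;

  // Bind a session to a task, reusing an idle one when available.
  acquire(taskId: string): BrowserSession {
    const session: BrowserSession = this.idle.pop() ?? {
      id: `session-${this.nextId++}`,
      cookies: new Map<string, string>(),
      taskId: null,
    };
    session.taskId = taskId;
    return session;
  }

  // Clear cookies before returning the session to the pool,
  // so login state is not carried into the next task.
  release(session: BrowserSession): void {
    session.cookies.clear();
    session.taskId = null;
    this.idle.push(session);
  }
}

// Usage: the second task reuses the session but sees no cookies.
const pool = new SessionPool();
const first = pool.acquire("task-1");
first.cookies.set("auth", "token-for-task-1");
pool.release(first);

const second = pool.acquire("task-2");
```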

§ 03 IDE

The IDE surface is the file view: a read-only tree of project files plus a viewer for the file currently in focus.

It is the surface the workspace switches to when the agent reads a file: open the file, jump to the relevant section, highlight the lines that ground the agent's answer. You read along; the agent does the work.

The IDE is currently read-only; it is the foundation on which a full editor can later be built. The agent writes through tool calls (file write, save output) rather than through a live editor handle, which keeps the audit trail clean.

§ 04 Preview

The preview surface is a live proxy in front of any dev server the agent boots inside the workspace.

Frameworks

The preview detects and proxies common dev servers automatically.

Vite — single-page apps, modern frontends.
Next.js — server-rendered React apps with HMR.
Flask — Python web apps and dashboards.
Create React App — legacy React projects.
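
How detection works is not specified here. A plausible heuristic, shown purely as an assumption, is to look at the project's declared dependencies: the npm package names `next`, `vite`, and `react-scripts` (the package behind Create React App), and the Python package `flask`. The `ProjectFiles` shape and `detectFramework` function are hypothetical.

```typescript
// Hypothetical dependency-based detection of the four listed frameworks.
type Framework = "vite" | "next" | "flask" | "create-react-app";

interface ProjectFiles {
  // Names from package.json dependencies and devDependencies, if any.
  npmDependencies: string[];
  // Package names from requirements.txt, if any.
  pythonRequirements: string[];
}

function detectFramework(project: ProjectFiles): Framework | null {
  const npm = new Set(project.npmDependencies);
  // Check "next" first: a Next.js project may also pull in other tooling.
  if (npm.has("next")) return "next";
  if (npm.has("vite")) return "vite";
  if (npm.has("react-scripts")) return "create-react-app";

  const hasFlask = project.pythonRequirements.some(
    (name) => name.toLowerCase() === "flask",
  );
  if (hasFlask) return "flask";

  return null; // nothing recognised; no preview to proxy
}

// Usage
const frontend = detectFramework({
  npmDependencies: ["react", "vite"],
  pythonRequirements: [],
});
const dashboard = detectFramework({
  npmDependencies: [],
  pythonRequirements: ["Flask", "pandas"],
});
```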

Inspector

An inspector is injected into the preview iframe so you can see DOM state and console output without leaving the workspace. The iframe is sandboxed — same-origin policies apply, and the preview cannot reach back into the parent app's storage or session.

Typical flow

The agent installs dependencies in the terminal, boots the dev server, and emits a surface_switch to preview. The UI swaps to the preview pane with the live URL loaded. From there you watch the app render, send change requests back through chat, and the cycle repeats.

§ 05 Surface switching

The agent is in charge of which surface is foregrounded. When it changes context — moving from reading a file to running a shell command, from running a shell command to opening a browser — it emits a surface_switch SSE event and the workspace UI follows.

The event payload is small and explicit:

```json
{
  "event": "surface_switch",
  "run_id": "run_01HG7…",
  "surface": "browser",
  "reason": "navigating to vendor portal to extract invoice list",
  "timestamp": "2026-04-30T14:22:11.482Z"
}
```

The surface field is one of terminal, browser, ide, or preview. The reason is a short human-readable explanation the UI can show in a toast or status bar so you always know why the workspace just switched.
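
A client consuming this event would usually validate the payload before acting on it. The sketch below types the five fields shown in the payload and rejects anything that does not match; `parseSurfaceSwitch` is a hypothetical helper, and the assumption that every field is required is ours, not something this page specifies.

```typescript
// The four surface names listed above.
const SURFACES = ["terminal", "browser", "ide", "preview"] as const;
type Surface = (typeof SURFACES)[number];

interface SurfaceSwitchEvent {
  event: "surface_switch";
  run_id: string;
  surface: Surface;
  reason: string;
  timestamp: string; // ISO 8601
}

// Parse a raw event body; return null if it is not a valid surface_switch.
function parseSurfaceSwitch(raw: string): SurfaceSwitchEvent | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== "object" || value === null) return null;

  const { event, run_id, surface, reason, timestamp } = value as Record<
    string,
    unknown
  >;
  if (event !== "surface_switch") return null;
  if (typeof run_id !== "string" || typeof reason !== "string") return null;
  if (typeof timestamp !== "string" || isNaN(Date.parse(timestamp))) return null;

  const known = SURFACES.find((s) => s === surface);
  if (known === undefined) return null;

  return { event, run_id, surface: known, reason, timestamp };
}

// Usage with the payload shown above.
const parsed = parseSurfaceSwitch(
  JSON.stringify({
    event: "surface_switch",
    run_id: "run_01HG7…",
    surface: "browser",
    reason: "navigating to vendor portal to extract invoice list",
    timestamp: "2026-04-30T14:22:11.482Z",
  }),
);
```

Returning `null` for unknown surfaces lets the UI ignore events it does not understand rather than switching to a pane that does not exist.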

Note: Surface switches are part of the run's persistent timeline. Replaying a past run replays its surface switches in order, so what you see is what the agent saw.
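
Since switches live on the timeline, a replay UI can work out which surface was in the foreground at any moment by walking the switches in timestamp order. This is a minimal sketch under that assumption; `TimelineSwitch` and `surfaceAt` are hypothetical names, and the timestamps are made up for the example.

```typescript
type Surface = "terminal" | "browser" | "ide" | "preview";

interface TimelineSwitch {
  surface: Surface;
  timestamp: string; // ISO 8601
}

// Return the surface in the foreground at time `at`,
// or null if no switch had happened yet.
function surfaceAt(timeline: TimelineSwitch[], at: string): Surface | null {
  const cutoff = Date.parse(at);
  // Sort a copy so the caller's timeline is left untouched.
  const ordered = [...timeline].sort(
    (a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp),
  );

  let current: Surface | null = null;
  for (const entry of ordered) {
    if (Date.parse(entry.timestamp) > cutoff) break;
    current = entry.surface;
  }
  return current;
}

// Usage: three switches, queried between the second and third.
const timeline: TimelineSwitch[] = [
  { surface: "ide", timestamp: "2026-04-30T14:20:00.000Z" },
  { surface: "terminal", timestamp: "2026-04-30T14:21:05.000Z" },
  { surface: "browser", timestamp: "2026-04-30T14:22:11.482Z" },
];

const midRun = surfaceAt(timeline, "2026-04-30T14:21:30.000Z");
const beforeRun = surfaceAt(timeline, "2026-04-30T14:00:00.000Z");
```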

§ 06 Where to go next

How the agent decides what to call → Runtime & subagents.
How skills shape behavior on each surface → Skills & jobs.
How files flow through reads and writes → Drive & artifacts.