Platform Architecture

Architecture, laid bare.

Essarion's three product surfaces are built on three services that compose, not three services that compete. This page is the wiring diagram — where data lives, where control lives, and what actually happens between the moment a query arrives and the moment a citation comes back.

§ 01 The chained gateway

From the outside the platform looks like one API. From the inside it is two services in a line: a Next.js gateway in front, and a FastAPI research engine behind it.

The gateway — essarion_api — owns identity. It holds the users table, the keys table, the usage ledger, and the plan logic. Every external request hits it first. It authenticates the caller, decides whether the call is allowed, records the usage, and proxies the work upstream.

The engine — research-agent-deploy — owns the work. It runs the phases, scores the sources, drafts the citations, and persists the run. It does not ask who you are; the gateway has already decided that. It only accepts traffic that arrives with a valid service token, and it trusts the user attribution headers the gateway adds.

request shape
browser / SDK
   │
   ▼
essarion_api  (Next.js)        ← identity, keys, usage, plans
   │  Authorization: Bearer <service-token>
   │  X-Essarion-User-Id: <uuid>
   │  X-Essarion-Key-Id: <uuid>
   ▼
research-agent-deploy  (FastAPI) ← phases, sources, citations, runs

That separation is the load-bearing decision. It means the engine never has to know about pricing, sessions, or tenancy logic; the gateway never has to know about Tavily, OpenRouter, or the twelve-phase pipeline. Each side stays opinionated about exactly one job.
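The gateway's half of that contract amounts to one step: build the attribution headers before proxying upstream. A minimal sketch (the function name and signature are hypothetical; the header names are the ones in the diagram above):

```python
def upstream_headers(service_token: str, user_id: str, key_id: str) -> dict:
    """Headers the gateway attaches before proxying a call to the engine.

    Illustrative only -- the real gateway is a Next.js service, but the
    header contract is the same regardless of language.
    """
    return {
        "Authorization": f"Bearer {service_token}",
        "X-Essarion-User-Id": user_id,
        "X-Essarion-Key-Id": key_id,
    }
```

Everything the engine needs to attribute the work travels in these three headers; nothing about plans or pricing crosses the boundary.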

§ 02 The research engine

The engine is a FastAPI service in Python. It owns the deep-research pipeline — up to twelve phases per run — and persists everything it produces.

State lives in Postgres: runs, steps, sources, citations, reasoning chunks. Web search is delegated to Tavily. LLM calls are routed through OpenRouter as the primary provider with xAI as a fallback, so a single upstream outage doesn't take a run with it. Long-running operations expose two streams: WebSocket for fully bidirectional progress and SSE for one-way event feeds. The latter is what most callers use.
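The fallback routing can be sketched as an ordered walk over providers. This is a minimal sketch under assumed shapes -- `providers` as a list of `(name, call)` pairs is an illustration, not the engine's actual interface:

```python
class ProviderError(Exception):
    """Raised when an upstream LLM provider fails."""

def complete_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; return the first success.

    Mirrors the routing described above: OpenRouter first, xAI as the
    fallback, so one provider outage doesn't fail the whole run.
    """
    last_err = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as err:
            last_err = err  # remember the failure, try the next provider
    raise last_err  # every provider failed; surface the last error
```

Only when every provider in the chain fails does the error propagate to the run.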

The pipeline runs phases in a defined order. A typical deep run touches: analyze (rewrite the question), plan (split into sub-queries), search (Tavily), scrape (fetch and extract), screen (score and prune), analyze (deep read), cite (build bibliographic records), synthesize (compose the final answer), with finishing passes around them. Every phase writes a step row; every chunk of reasoning writes a reasoning row; every URL writes a source row. By the time the run is done, the timeline is complete and queryable.
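The phase loop itself is simple to picture: run each phase in order, persist a step row as you go. A sketch, with an in-memory list standing in for the engine's Postgres step table and trivial handlers standing in for the real phases:

```python
def run_phases(question, handlers):
    """Run phases in a defined order; every phase writes a step row.

    `handlers` is an ordered list of (phase_name, fn) pairs -- a stand-in
    for the real pipeline of up to twelve phases.
    """
    state = {"question": question}
    steps = []
    for phase, fn in handlers:
        state = fn(state)               # phase transforms the run state
        steps.append({"phase": phase})  # ...and persists a step row
    return state, steps
```

By the end of the loop, `steps` is the complete, ordered timeline of the run -- the same property the real engine guarantees with its step table.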

§ 03 The agent workspace

The agent workspace is a separate but compatible system. The control plane is an Express + TypeScript server; the UI is a Next.js app. Together they expose project-scoped chat, files, runs, and surfaces.

State for the workspace lives in SQLite (or Postgres in deployments that need it) — projects, runs, files, audit logs. Each project owns its own file root on disk; agents read and write through the workspace, not directly. Real terminals are real PTYs, opened with node-pty. The cloud browser is a real Chromium, driven through Browserbase with Stagehand for high-level actions.

Two stream types serve the workspace. Chat events — agent thoughts, tool calls, results — flow over SSE. Terminal output and PTY events flow over WebSocket, because PTYs are inherently bidirectional.

§ 04 Shared identity

One user account spans all three systems. The mechanism depends on the caller.

The rule is strict: the engine never accepts caller-supplied user IDs without a valid service token. Identity is decided at the gateway, signed into the headers, and trusted only because the transport carrying them is itself authenticated.
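The engine-side check boils down to: no valid service token, no identity. A minimal sketch -- the token value, function name, and plain-dict headers are all assumptions for illustration; the real engine wires this through a FastAPI dependency:

```python
import hmac

SERVICE_TOKEN = "svc-secret"  # hypothetical; shared with the gateway out of band

def resolve_identity(headers: dict) -> dict:
    """Reject any request that lacks a valid service token.

    Only after the bearer token checks out are the gateway-added
    attribution headers trusted.
    """
    expected = f"Bearer {SERVICE_TOKEN}"
    supplied = headers.get("Authorization", "")
    if not hmac.compare_digest(supplied, expected):  # constant-time comparison
        raise PermissionError("missing or invalid service token")
    return {
        "user_id": headers["X-Essarion-User-Id"],
        "key_id": headers["X-Essarion-Key-Id"],
    }
```

Note the ordering: the attribution headers are never even read until the token comparison has passed.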

§ 05 Data flow for a query

Concretely, this is what happens when a developer sends a query.

  1. The caller sends POST /api/v1/query to essarion_api with an esk_ key in the Authorization header.
  2. resolveApiKey() hashes the key, looks it up in Postgres, checks status, and returns the owning user. Invalid or revoked keys are rejected immediately.
  3. The gateway calls callResearchAnything() to proxy the request upstream. It attaches the service bearer token and the X-Essarion-User-Id / X-Essarion-Key-Id headers.
  4. The research engine accepts the request, creates a run row with a fresh request_id, and starts working through phases. Each phase persists steps, reasoning chunks, sources, and (eventually) citations.
  5. The engine returns a response envelope containing the answer, the citations, and the request_id.
  6. The gateway inserts a usage row keyed to the user and the key, then returns the envelope to the caller.
  7. Later, the caller can fetch the full timeline with GET /api/v1/runs/{request_id} — the same run, fully persisted.
envelope (abbreviated)
{
  "request_id": "...",
  "answer": "...",
  "citations": [ ... ],
  "sources":   [ ... ]
}
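From the caller's side, the seven steps collapse to two calls: submit the query, then fetch the timeline by `request_id`. A transport-agnostic sketch -- `post` and `get` are injected HTTP helpers (an assumption for this sketch), so any HTTP client can be swapped in:

```python
def run_query(post, get, api_key, question):
    """Caller-side view of the flow: one POST, then one GET for the run.

    `post`/`get` are hypothetical HTTP helpers that return parsed JSON.
    """
    envelope = post(
        "/api/v1/query",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"query": question},
    )
    # Later: fetch the fully persisted timeline for the same run.
    timeline = get(f"/api/v1/runs/{envelope['request_id']}")
    return envelope["answer"], timeline
```

The `request_id` in the envelope is the only handle the caller needs to keep; everything else about the run is recoverable from it.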

§ 06 Storage

Three stores, each with one job. The gateway's Postgres holds identity: users, keys, the usage ledger, plan logic. The engine's Postgres holds run state: runs, steps, sources, citations, reasoning chunks. The workspace's SQLite (or Postgres, in deployments that need it) holds projects, runs, files, and audit logs.

§ 07 Streaming

The platform exposes two stream shapes, each chosen for the workload it serves.

Server-Sent Events (SSE) back the query stream. A research run emits five event types in order: start (run created), message (incremental reasoning), sources (sources discovered and scored), reasoning (chunks of model thinking attached to a step), and final (the synthesized answer with citations). SSE is one-way and survives reconnection cleanly, which is what a long research run actually needs.
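On the wire, SSE is just `event:`/`data:` lines separated by blank lines. A minimal parser sketch that collects the five event types in order -- real clients should use an SSE library, and this handles only the framing shown here:

```python
def parse_sse(feed: str):
    """Collect (event, data) pairs from a raw SSE feed (sketch).

    Handles only event:/data:/blank-line framing; an event is emitted
    when its terminating blank line arrives.
    """
    events, name, data = [], None, []
    for line in feed.splitlines():
        if line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and name is not None:
            events.append((name, "\n".join(data)))  # blank line ends the event
            name, data = [], None
            name, data = None, []
    return events
```

Because SSE is line-framed text, a reconnecting client can simply resume reading -- which is why it suits a long research run better than a stateful socket.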

WebSocket backs the agent terminal and PTY events, where bidirectional traffic is unavoidable — keystrokes go in, output comes out, control bytes go in both directions.

Tip: For most integrations, SSE is the right choice. See Streaming with SSE for a worked example.