Architecture, laid bare.
Essarion's three product surfaces are built on three services that compose, not three services that compete. This page is the wiring diagram — where data lives, where control lives, and what actually happens between the moment a query arrives and the moment a citation comes back.
§ 01 The chained gateway
From the outside the platform looks like one API. From the inside it is two services in a line: a Next.js gateway in front, and a FastAPI research engine behind it.
The gateway — essarion_api — owns identity. It holds the users table, the keys table, the usage ledger, and the plan logic. Every external request hits it first. It authenticates the caller, decides whether the call is allowed, records the usage, and proxies the work upstream.
The engine — research-agent-deploy — owns the work. It runs the phases, scores the sources, drafts the citations, and persists the run. It does not ask who you are; the gateway has already decided that. It only accepts traffic that arrives with a valid service token, and it trusts the user attribution headers the gateway adds.
browser / SDK
│
▼
essarion_api (Next.js) ← identity, keys, usage, plans
│ Authorization: Bearer <service-token>
│ X-Essarion-User-Id: <uuid>
│ X-Essarion-Key-Id: <uuid>
▼
research-agent-deploy (FastAPI) ← phases, sources, citations, runs
That separation is the load-bearing decision. It means the engine never has to know about pricing, sessions, or tenancy logic; the gateway never has to know about Tavily, OpenRouter, or the twelve-phase pipeline. Each side stays opinionated about exactly one job.
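The gateway's half of that contract reduces to three headers on every upstream call. A minimal sketch (the real gateway is Next.js; the constant and function names here are hypothetical, used only to show the shape of the contract):

```python
# Hypothetical sketch of the gateway's proxy step. The real gateway is
# Next.js/TypeScript; only the three headers below are the actual contract.
SERVICE_TOKEN = "svc-token-from-env"  # shared secret, assumed loaded from env

def build_upstream_headers(user_id: str, key_id: str) -> dict:
    return {
        # Proves to the engine that the caller is the gateway, not an end user.
        "Authorization": f"Bearer {SERVICE_TOKEN}",
        # Attribution: identity was already decided at the gateway.
        "X-Essarion-User-Id": user_id,
        "X-Essarion-Key-Id": key_id,
    }

headers = build_upstream_headers("user-uuid", "key-uuid")
```

The engine never sees the caller's API key; it sees only the service token plus the attribution the gateway signed off on.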
§ 02 The research engine
The engine is a FastAPI service in Python. It owns the deep-research pipeline — up to twelve phases per run — and persists everything it produces.
State lives in Postgres: runs, steps, sources, citations, reasoning chunks. Web search is delegated to Tavily. LLM calls are routed through OpenRouter as the primary provider with xAI as a fallback, so a single upstream outage doesn't take a run with it. Long-running operations expose two streams: WebSocket for fully bidirectional progress and SSE for one-way event feeds. The latter is what most callers use.
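The fallback behavior is ordinary primary/secondary routing. A hypothetical sketch, not the engine's real code (function names and the broad exception handling are illustrative):

```python
# Illustrative provider fallback: try OpenRouter first, fall back to xAI.
def call_with_fallback(prompt, providers):
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real router would narrow to timeouts/5xx
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")

def flaky_openrouter(prompt):
    raise TimeoutError("upstream outage")  # simulate the primary being down

def xai(prompt):
    return f"answer to: {prompt}"

provider, answer = call_with_fallback(
    "q", [("openrouter", flaky_openrouter), ("xai", xai)]
)
```

The run keeps going because the failure is absorbed at the provider boundary, not surfaced to the phase that made the call.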
The pipeline runs phases in a defined order. A typical deep run touches: analyze (rewrite the question), plan (split into sub-queries), search (Tavily), scrape (fetch and extract), screen (score and prune), analyze (deep read), cite (build bibliographic records), synthesize (compose the final answer), with finishing passes around them. Every phase writes a step row; every chunk of reasoning writes a reasoning row; every URL writes a source row. By the time the run is done, the timeline is complete and queryable.
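The bookkeeping that makes the timeline queryable can be sketched in a few lines. This is an illustration of the step-per-phase pattern, not the engine's code; the phase list mirrors the typical run above, with the finishing passes omitted:

```python
# Illustrative phase runner: every phase appends a step row, so the
# run's timeline is complete once the loop finishes.
PHASES = ["analyze", "plan", "search", "scrape",
          "screen", "analyze", "cite", "synthesize"]

def run_pipeline(question, phases=PHASES):
    steps = []                          # stand-in for the steps table
    state = {"question": question}
    for i, phase in enumerate(phases):
        state[phase] = f"{phase} done"  # real phases do real work here
        steps.append({"index": i, "phase": phase, "status": "completed"})
    return state, steps

state, steps = run_pipeline("example question")
```

The same index ordering is what lets a later `GET` on the run replay the phases in the order they actually ran.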
§ 03 The agent workspace
The agent workspace is a separate but compatible system. The control plane is an Express + TypeScript server; the UI is a Next.js app. Together they expose project-scoped chat, files, runs, and surfaces.
State for the workspace lives in SQLite (or Postgres in deployments that need it) — projects, runs, files, audit logs. Each project owns its own file root on disk; agents read and write through the workspace, not directly. Real terminals are real PTYs, opened with node-pty. The cloud browser is a real Chromium, driven through Browserbase with Stagehand for high-level actions.
Two stream types serve the workspace. Chat events — agent thoughts, tool calls, results — flow over SSE. Terminal output and PTY events flow over WebSocket, because PTYs are inherently bidirectional.
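The "PTYs are inherently bidirectional" point is easy to see in code. The workspace uses node-pty; the same idea in Python's standard-library `pty` module, as a hedged sketch (the helper below is hypothetical and Unix-only):

```python
# Minimal PTY round-trip (Unix only): the same master fd carries input to
# the program and output back from it -- hence WebSocket, not SSE.
import os
import pty
import select

def run_in_pty(argv):
    pid, master_fd = pty.fork()
    if pid == 0:                        # child: run the program on the slave side
        os.execvp(argv[0], argv)
    chunks = []
    while True:
        ready, _, _ = select.select([master_fd], [], [], 5)
        if not ready:
            break
        try:
            data = os.read(master_fd, 1024)
        except OSError:                 # EIO once the child exits
            break
        if not data:
            break
        chunks.append(data)
    os.close(master_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode(errors="replace")

out = run_in_pty(["echo", "hello from a pty"])
```

Chat events have no such return path from the client mid-stream, which is why one-way SSE is enough for them.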
§ 04 Shared identity
One user account spans all three systems. The mechanism depends on the caller.
- Browsers authenticate with a JWT cookie set after sign-in. The cookie is HTTP-only, secure, and short-lived; the workspace and the research UI both read it.
- Code authenticates with an esk_ API key in the Authorization header. Keys are tied to a user and resolved server-side against the hashed keys table.
- Internal services authenticate to each other with a service bearer token. When the gateway calls the engine, it adds X-Essarion-User-Id and X-Essarion-Key-Id headers so the engine can attribute the work without ever needing to read the key itself.
The rule is strict: the engine never accepts caller-supplied user IDs without a valid service token. Identity is decided at the gateway, signed into the headers, and trusted only because the underlying transport is.
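The strict rule translates directly into a guard at the engine's edge. A hypothetical sketch of that check (the function name and error types are illustrative, not the engine's API):

```python
# Hypothetical engine-side guard: attribution headers are honored only
# behind a valid service token.
import hmac

SERVICE_TOKEN = "svc-token-from-env"  # assumed loaded from env

def resolve_caller(headers: dict) -> str:
    token = headers.get("Authorization", "").removeprefix("Bearer ")
    # Constant-time compare, so token checks don't leak timing information.
    if not hmac.compare_digest(token, SERVICE_TOKEN):
        raise PermissionError("not the gateway: refuse caller-supplied identity")
    user_id = headers.get("X-Essarion-User-Id")
    if not user_id:
        raise ValueError("gateway must attribute the request to a user")
    return user_id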
§ 05 Data flow for a query
Concretely, this is what happens when a developer sends a query.
- The caller sends POST /api/v1/query to essarion_api with an esk_ key in the Authorization header.
- resolveApiKey() hashes the key, looks it up in Postgres, checks status, and returns the owning user. Invalid or revoked keys are rejected immediately.
- The gateway calls callResearchAnything() to proxy the request upstream. It attaches the service bearer token and the X-Essarion-User-Id / X-Essarion-Key-Id headers.
- The research engine accepts the request, creates a run row with a fresh request_id, and starts working through phases. Each phase persists steps, reasoning chunks, sources, and (eventually) citations.
- The engine returns a response envelope containing the answer, the citations, and the request_id.
- The gateway inserts a usage row keyed to the user and the key, then returns the envelope to the caller.
- Later, the caller can fetch the full timeline with GET /api/v1/runs/{request_id} — the same run, fully persisted.
{
"request_id": "...",
"answer": "...",
"citations": [ ... ],
"sources": [ ... ]
}
§ 06 Storage
Three stores, each with one job.
- Neon Postgres (serverless) backs identity and the research engine — users, keys, usage, runs, steps, sources, citations.
- SQLite backs the agent control plane — projects, files, audit, run metadata. Deployments that outgrow SQLite swap to Postgres without code changes.
- Per-project file roots hold the drive: the documents, artifacts, and uploads each project accumulates. A project's files never live in the database; the database only points at them.
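The "database only points at them" rule is a pointer pattern, not a storage engine. A self-contained sketch using an in-memory SQLite database and a temp directory as a stand-in project root (all names here are illustrative):

```python
# Sketch of the pointer pattern: file bytes live on disk under the
# project root; the DB row stores only the relative path.
import pathlib
import sqlite3
import tempfile

root = pathlib.Path(tempfile.mkdtemp())   # stand-in for a project file root
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (project TEXT, relpath TEXT)")

def save_file(project: str, relpath: str, data: bytes):
    path = root / project / relpath
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)                # content goes to disk...
    db.execute("INSERT INTO files VALUES (?, ?)",
               (project, relpath))        # ...only the pointer goes to the DB

save_file("proj-a", "notes/draft.md", b"# draft")
row = db.execute("SELECT relpath FROM files").fetchone()
```

Swapping SQLite for Postgres changes nothing about this shape, which is why the control plane can move between them.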
§ 07 Streaming
The platform exposes two stream shapes, each chosen for the workload it serves.
Server-Sent Events (SSE) back the query stream. A research run emits five event types in order: start (run created), message (incremental reasoning), sources (sources discovered and scored), reasoning (chunks of model thinking attached to a step), and final (the synthesized answer with citations). SSE is one-way and survives reconnection cleanly, which is what a long research run actually needs.
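On the wire those events use the standard SSE framing: an `event:` line naming the type, `data:` lines carrying the payload, and a blank line terminating each event. A minimal parser sketch (the event names come from the stream above; the sample shows three of the five types, and the dispatch logic is illustrative):

```python
# Minimal SSE frame parser: events are blank-line-delimited blocks of
# "event:" and "data:" lines.
def parse_sse(raw: str):
    events = []
    for block in raw.strip().split("\n\n"):
        event, data = None, []
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data.append(line[len("data:"):].strip())
        events.append((event, "\n".join(data)))
    return events

stream = (
    'event: start\ndata: {"request_id": "run-123"}\n\n'
    'event: sources\ndata: []\n\n'
    'event: final\ndata: {"answer": "..."}\n\n'
)
events = parse_sse(stream)
```

Because each event is self-delimiting, a client that reconnects mid-run just resumes consuming frames, which is the reconnection property the paragraph above leans on.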
WebSocket backs the agent terminal and PTY events, where bidirectional traffic is unavoidable — keystrokes go in, output comes out, control bytes go in both directions.