What is Exo Operator
The in-cluster operator that reconciles the agentkube.io CRDs into running workloads and connects your deployment to the Exo control plane over one outbound WebSocket.
Exo Operator is the connector that makes Exo work for a self-hosted deployment. It runs inside your Kubernetes cluster (or any host you can run a container on), authenticates to Exo with a deployment-scoped token, and opens a long-lived outbound WebSocket. Everything else (discovery, status streaming, shell I/O, session audit) flows over that one connection.
What Exo Operator is
Think of Exo Operator as a control-plane client, not a server. It does five things on your side of the wire:
- Discover. It watches your runtime resources (Agents and Tools) using controller-runtime informers and pushes a complete snapshot the first time it connects.
- Stream changes. When an underlying object changes — pod ready, a resource crashed — a Delta frame arrives at the control plane within a second.
- Heartbeat. Keep-alives flow on the same connection. If they stop, the dashboard marks the deployment degraded and eventually offline.
- Host shell I/O. When a user attaches to an agent shell from the dashboard, Exo Operator opens a remote-exec PTY into the agent's container and proxies bytes both ways.
- Reconcile resources. It's a full Kubernetes operator: it turns Agent and Tool custom resources into running pods, Services, and Secrets, and reports their status back over the wire. See Resources & CRDs.
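The heartbeat behaviour can be sketched as a pure function of how long the connection has been silent. This is only an illustration: the function name, the online label, and the 30 second and 300 second thresholds are assumptions made for the example. The documentation names only the degraded and offline states and the grace and eviction windows described under Connection lifecycle.

```python
def deployment_status(
    seconds_since_heartbeat: float,
    grace_window: float = 30.0,      # assumed value, not taken from Exo
    eviction_window: float = 300.0,  # assumed value, not taken from Exo
) -> str:
    """Map heartbeat silence to a dashboard status label."""
    if grace_window > eviction_window:
        raise ValueError("grace_window must not exceed eviction_window")
    if seconds_since_heartbeat >= eviction_window:
        return "offline"
    if seconds_since_heartbeat >= grace_window:
        return "degraded"
    return "online"  # hypothetical label for the healthy state
```

Treating the status as a function of elapsed silence, rather than as stored state, means a restart of the component that evaluates it cannot leave a stale label behind.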
One agent, one deployment
A deployment in Exo represents one customer-cluster installation. A deployment token is bound 1:1 to a deployment — the control plane refuses a second active connection on the same token, with a configurable last-writer-wins grace window. This intentional simplicity has two consequences:
- Scaling out means installing multiple Exo Operator replicas with leader election, not running two unrelated tokens against the same cluster.
- Multi-cluster tenants mint one token per cluster and see one deployment per cluster in the dashboard. That's the right primitive for environment separation (staging vs prod) and geographic separation (eu-west vs us-east).
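One way to picture the 1:1 binding is a registry keyed by token, where a newer connection displaces the older one and a DeploymentReconnected event is recorded. This is a sketch of one possible reading, not the control plane's actual code: the class, its fields, and the connection ids are hypothetical, and the configurable grace window is left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DeploymentRegistry:
    """Hypothetical model: at most one active connection per deployment token."""
    active: Dict[str, str] = field(default_factory=dict)  # token -> connection id
    events: List[Tuple[str, str, str]] = field(default_factory=list)

    def register(self, token: str, conn_id: str) -> Optional[str]:
        """Bind conn_id to the token's deployment.

        Returns the id of the evicted connection, or None if there was none.
        """
        previous = self.active.get(token)
        self.active[token] = conn_id  # last writer wins
        if previous is not None and previous != conn_id:
            # Record the eviction of the older session.
            self.events.append(("DeploymentReconnected", token, previous))
            return previous
        return None
```

Because the registry is keyed by token, two clusters with two tokens never interfere with each other, which matches the one-token-per-cluster guidance above.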
Resource discovery
Exo Operator watches the agentkube.io/v1alpha1 CRDs with controller-runtime informers. Each object is reconciled into real cluster workloads and translated into a wire-form Resource that the dashboard renders.
Kind   Short  Reconciles into                 Status surfaces
─────  ─────  ──────────────────────────────  ────────────────────────────────────────────
Agent  ag     agent pod (agentlet + runtime)  Active | Paused | Error | Succeeded | Failed
Tool   tl     on-demand pod                   Pending | Ready | Error | Disabled

Add and Update events emit delta:upsert frames; Delete emits delta:remove. A periodic Snapshot frame every 60 seconds acts as a backstop, so even a dropped Delta is reconciled on the next tick.
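The interplay of Deltas and Snapshots can be illustrated with a small receiver-side store. The frame shapes here (a type field, a resource or resources payload, and the "snapshot" type string) are assumptions for the example; only the delta:upsert and delta:remove names come from the text above, and the real wire format may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ResourceStore:
    """Receiver-side view of one deployment, keyed by (kind, name)."""
    resources: Dict[Tuple[str, str], dict] = field(default_factory=dict)

    def apply(self, frame: dict) -> None:
        frame_type = frame["type"]
        if frame_type == "delta:upsert":
            res = frame["resource"]
            self.resources[(res["kind"], res["name"])] = res
        elif frame_type == "delta:remove":
            res = frame["resource"]
            # Removing an unknown resource is a no-op, so replays are harmless.
            self.resources.pop((res["kind"], res["name"]), None)
        elif frame_type == "snapshot":
            # A snapshot is authoritative: it replaces the whole view,
            # which repairs any Delta that was dropped earlier.
            self.resources = {
                (res["kind"], res["name"]): res for res in frame["resources"]
            }
        else:
            raise ValueError(f"unknown frame type: {frame_type}")
```

If a delta:remove for a Tool is lost in transit, the next Snapshot simply omits that Tool and the store converges without any special recovery path.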
Connection lifecycle
The connection states are simple, and they map cleanly to dashboard colour:
- Bootstrap: the manager starts and (when AGENTKUBE_MANAGED=true) reads AGENTKUBE_BASE_URL and its deployment token from AGENTKUBE_AUTH_TOKEN. See Connecting.
- Handshake: Hello frame with token, version, host metadata, and a monotonic boot_id.
- Registration: the control plane binds the connection to the deployment. If another connection holds the deployment, the older session is evicted with a DeploymentReconnected event.
- Discovery: initial Snapshot lands in the dashboard.
- Steady state: Deltas, heartbeats, audit, shell I/O, and pod-control frames flow in both directions.
- Disconnect / reconnect: capped exponential backoff. The deployment is marked degraded after the grace window and offline after the eviction window.
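The Bootstrap and Disconnect / reconnect steps can be sketched in a few lines. The three environment variable names come from the list above. Everything else is an assumption for illustration: the ManagedConfig type, the exact-match check on "true", the error raised for missing values, and the 1 second base and 60 second cap on the backoff.

```python
import os
from dataclasses import dataclass
from itertools import islice
from typing import Iterator, List, Mapping, Optional


@dataclass(frozen=True)
class ManagedConfig:
    """Hypothetical container for the bootstrap settings."""
    base_url: str
    auth_token: str


def load_managed_config(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[ManagedConfig]:
    """Return connection settings, or None when managed mode is off."""
    env = os.environ if env is None else env
    if env.get("AGENTKUBE_MANAGED") != "true":
        return None
    required = ("AGENTKUBE_BASE_URL", "AGENTKUBE_AUTH_TOKEN")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise ValueError("managed mode requires " + ", ".join(missing))
    return ManagedConfig(env["AGENTKUBE_BASE_URL"], env["AGENTKUBE_AUTH_TOKEN"])


def backoff_delays(
    base: float = 1.0, cap: float = 60.0, factor: float = 2.0
) -> Iterator[float]:
    """Yield reconnect delays that grow by `factor` and never exceed `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def first_delays(n: int, **kwargs: float) -> List[float]:
    """Convenience helper: the first n delays as a list."""
    return list(islice(backoff_delays(**kwargs), n))
```

With a base of 1 and a cap of 10, first_delays(6, base=1.0, cap=10.0) gives 1, 2, 4, 8, 10, 10. Production clients often add random jitter to each delay so that many operators do not reconnect in lockstep; whether Exo Operator does so is not stated here.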