§03 — Exo Operator

What is Exo Operator

The in-cluster operator that reconciles the agentkube.io CRDs into running workloads and connects your deployment to the Exo control plane over one outbound WebSocket.

5 min read·Set by Exo Editorial·v0.3.0 Beta

Exo Operator is the connector that makes Exo work for a self-host deployment. It runs inside your Kubernetes cluster (or any host you can run a container on), authenticates to Exo with a deployment-scoped token, and opens a long-lived outbound WebSocket. Everything else — discovery, status streaming, shell I/O, session audit — flows over that one connection.

What Exo Operator is

Think of Exo Operator as a control-plane client, not a server. It does five things on your side of the wire:

  • Discover. It watches your runtime — Agents — using controller-runtime informers and pushes a complete snapshot the first time it connects.
  • Stream changes. When an underlying object changes — pod ready, a resource crashed — a Delta frame arrives at the control plane within a second.
  • Heartbeat. Keep-alives flow on the same connection. If they stop, the dashboard marks the deployment degraded and eventually offline.
  • Host shell I/O. When a user attaches to an agent shell from the dashboard, Exo Operator opens a remote-exec PTY into the agent's container and proxies bytes both ways.
  • Reconcile resources. It's a full Kubernetes operator: it turns Agent and Tool custom resources into running pods, Services, and Secrets, and reports their status back over the wire. See Resources & CRDs.

One token, one deployment

A deployment in Exo represents one customer-cluster installation. A deployment token is bound 1:1 to a deployment — the control plane refuses a second active connection on the same token, with a configurable last-writer-wins grace window. This intentional simplicity has two consequences:

  • Scaling out means installing multiple Exo Operator replicas with leader election, not running two unrelated tokens against the same cluster.
  • Multi-cluster tenants mint one token per cluster and see one deployment per cluster in the dashboard. That's the right primitive for environment separation (staging vs prod) and geographic separation (eu-west vs us-east).
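
The 1:1 binding can be pictured as a small registry on the control-plane side. This is a minimal sketch, not Exo's implementation: the class and field names (DeploymentRegistry, Session, conn_id) are hypothetical, and the grace window is left out because its exact semantics are not specified here. It follows the eviction behaviour described under Connection lifecycle, where a newer connection displaces the older session and a DeploymentReconnected event is recorded.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Session:
    """One live operator connection (hypothetical shape)."""
    conn_id: str
    boot_id: int


@dataclass
class DeploymentRegistry:
    """Maps each deployment token to at most one active session."""
    active: dict[str, Session] = field(default_factory=dict)
    events: list[tuple[str, str, str]] = field(default_factory=list)

    def register(self, token: str, session: Session) -> Session | None:
        """Bind session to token and return the evicted session, if any."""
        evicted = self.active.get(token)
        if evicted is not None:
            # The older session loses its slot; record why it was dropped.
            self.events.append(("DeploymentReconnected", token, evicted.conn_id))
        self.active[token] = session
        return evicted
```

Registering two sessions under the same token leaves only the newer one active, while sessions under different tokens (one per cluster) coexist as separate deployments.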

Resource discovery

Exo Operator watches the agentkube.io/v1alpha1 CRDs with controller-runtime informers. Each object is reconciled into real cluster workloads and translated into a wire-form Resource that the dashboard renders.

resource families· text
Kind   Short  Reconciles into                 Status surfaces
─────  ─────  ──────────────────────────────  ────────────────────────────────────────────
Agent  ag     agent pod (agentlet + runtime)  Active | Paused | Error | Succeeded | Failed
Tool   tl     on-demand pod                   Pending | Ready | Error | Disabled

Add and Update events emit delta:upsert frames; Delete emits delta:remove. A periodic Snapshot frame every 60 seconds acts as a backstop, so even a dropped Delta is reconciled on the next tick.
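
To see why the periodic Snapshot works as a backstop, here is a rough sketch of a receiver applying these frames. The frame layout (the type, resource, resources, kind, and name fields) and the ResourceView class are assumptions made for illustration; the actual wire format is not documented in this section. The key design point is that a snapshot replaces the cached state rather than merging into it.

```python
from __future__ import annotations


class ResourceView:
    """Receiver-side cache of resources, keyed by (kind, name)."""

    def __init__(self) -> None:
        self.resources: dict[tuple[str, str], dict] = {}

    def apply(self, frame: dict) -> None:
        """Apply one frame; the frame field names are hypothetical."""
        frame_type = frame["type"]
        if frame_type == "delta:upsert":
            res = frame["resource"]
            self.resources[(res["kind"], res["name"])] = res
        elif frame_type == "delta:remove":
            # Removing an unknown key is a no-op, so replays are harmless.
            self.resources.pop((frame["kind"], frame["name"]), None)
        elif frame_type == "snapshot":
            # Replace, do not merge: anything absent from the snapshot is gone.
            self.resources = {
                (r["kind"], r["name"]): r for r in frame["resources"]
            }
        else:
            raise ValueError(f"unknown frame type: {frame_type!r}")
```

If a delta:remove for a Tool is lost in transit, the stale entry survives only until the next snapshot, which no longer lists it and therefore drops it from the view.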

Connection lifecycle

The connection states are simple, and they map cleanly to dashboard colour:

  • Bootstrap — the manager starts and (when AGENTKUBE_MANAGED=true) reads AGENTKUBE_BASE_URL and its deployment token from AGENTKUBE_AUTH_TOKEN. See Connecting.
  • Handshake — Hello frame with token, version, host metadata, and a monotonic boot_id.
  • Registration — the control plane binds the connection to the deployment. If another connection holds the deployment, the older session is evicted with a DeploymentReconnected event.
  • Discovery — initial Snapshot lands in the dashboard.
  • Steady state — Deltas, heartbeats, audit, shell I/O, and pod-control frames flow in both directions.
  • Disconnect / reconnect — capped exponential backoff. The deployment is marked degraded after the grace window and offline after the eviction window.
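
The last step can be sketched in a few lines. Everything numeric here is an assumption chosen for illustration (a 1 s base delay, a 30 s cap, a 30 s grace window, a 300 s eviction window), as are the function names and the "connected" label; the real defaults are not stated in this section.

```python
def backoff_delays(base: float = 1.0, cap: float = 30.0, attempts: int = 8) -> list:
    """Delay before each reconnect attempt: doubles from base, never above cap."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def deployment_status(silence_s: float,
                      grace_s: float = 30.0,
                      eviction_s: float = 300.0) -> str:
    """Classify a deployment by how long the control plane has heard nothing."""
    if silence_s < grace_s:
        return "connected"
    if silence_s < eviction_s:
        return "degraded"
    return "offline"
```

With these example values, backoff_delays() yields 1, 2, 4, 8, 16 seconds and then stays at 30, so a long outage settles into one reconnect attempt every 30 seconds. A deployment that has been silent for 45 seconds is classified as degraded, and one silent for 300 seconds or more as offline.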

What Exo Operator isn't