Remote plane¶
A remote plane lets one aegis serve enqueue work into another
aegis serve over HTTP. One agent on one machine can hand a task off
to another machine — typically because the work is long-running, needs
different hardware, or should run under a different agent profile —
while the calling agent keeps moving.
By default the call is fire-and-forget: the receiver runs the worker on its own queue under its own config and the calling agent is not notified over the wire. As of v0.8.0 the caller can opt in to a wire callback that delivers the worker's final message back to the originating agent's inbox once the remote task terminates. See Callbacks below.
The two HTTP planes¶
Both ends run aegis serve. Each aegis serve already exposes one
HTTP plane — the MCP plane, loopback-bound, consumed by the
workers this serve spawned. The remote plane is a second, distinct
plane, bound to a tailnet IP, consumed by other aegis serves.
┌──────────────────────┐ ┌────────────────────────┐
│ zion: aegis serve │ │ vps: aegis serve │
│ │ │ │
│ MCP plane (loop.) │ │ MCP plane (loop.) │
│ │ HTTP POST │ │
│ Remote plane ───┼───────────────▶│ Remote plane │
│ (tailnet IP) │ tailnet IP │ (tailnet IP) │
│ │ │ │
│ QueueManager │ │ QueueManager │
└──────────────────────┘ └────────────────────────┘
The two planes never overlap. Local workers can't reach the remote plane (it's bound to a different address); remote callers can't reach the MCP plane (it's loopback). One JSON endpoint each direction.
The single endpoint¶
POST /remote/v1/enqueue
Content-Type: application/json
Body:
{
"queue": "implementation",
"payload": "<full prompt for the worker>",
"from": "zion"
}
Response 200:
{
"task_id": "01J...",
"queued_position": 0
}
Response 4xx/5xx:
{ "error": "<reason>" }
The remote QueueManager accepts the task as if it were locally
enqueued and records enqueued_by="remote:<from>" in its JSONL
lifecycle log. The worker runs on the remote machine's filesystem,
with the remote .aegis.yaml's agent profiles.
MCP surface¶
aegis_enqueue grew one optional parameter:
aegis_enqueue(
queue: str,
payload: str,
from_handle: str,
callback: bool | None = None, # default depends on target (see below)
target: str | None = None,
) -> dict
target=None(default): the existing local-enqueue path.callbackdefaults to True — worker's final result is routed into the caller's inbox.target="<name>": the substrate looks<name>up in the localremotesconfig and POSTs the body to that remote's/remote/v1/enqueue.callbackdefaults to False (v0.7 fire-and-forget compatibility). Passcallback=Trueexplicitly to request a wire-level callback — see Callbacks.
aegis_enqueue with target set and explicit callback=True will
loudly reject the call when any of the following is missing:
remote_plane.peer_nameon the caller's serve;remotes[<target>].peer_nameon the caller's serve;remote_planeblock on the caller's serve at all.
There is no silent fallback to fire-and-forget; if you ask for a callback and the symmetric-peer config isn't complete, the substrate tells you so.
Configuration¶
Two new top-level sections in .aegis.yaml. Both follow the same
inline-plus-overlay pattern as agents, queues, and schedules.
Outbound — remotes¶
The list of remotes this serve is willing to call.
Inline in .aegis.yaml:
remotes:
vps:
url: http://100.64.0.5:8556
# optional bearer token (otherwise tailnet trust only)
token: "<secret>"
Or split into overlay files at .aegis/remotes/<name>.yaml:
Same fail-loud rule as queues and schedules: a name appearing in both inline and an overlay aborts boot.
url must include scheme and host. http:// over a tailnet is fine —
WireGuard already encrypts. https:// is permitted but adds no value
on a tailnet.
Inbound — remote_plane¶
The opt-in section that turns on the receive side. Default off (missing key or empty block).
remote_plane:
bind: 100.64.0.5:8556 # tailnet IP, explicit
peer_name: zion # this serve's identity on outbound callbacks
# — required when `remotes` is also configured
accept_tokens: [] # optional bearer-token allowlist
accept_from: [] # optional source-IP allowlist
bindis the address the inbound plane listens on. Binding to a specific tailnet IP is the recommended shape;0.0.0.0is permitted but warned at boot.accept_tokens: when non-empty, requests must present a matchingAuthorization: Bearer <token>. Empty list = tailnet trust only.accept_from: when non-empty, only requests from listed source IPs pass. Empty list = any source IP that reaches the port is accepted.
Gates compose with AND: if both are configured, the request must satisfy both. Both empty is "anything that reaches the port is trusted" — appropriate for a small personal tailnet, inadequate the day a third device joins.
There is one remote_plane block per serve. No overlay folder for
this section (it's not a multi-entry table).
Error model¶
All failure paths are loud and distinguishable. There is no silent
fallback to local enqueue — the whole point of target=vps is that
the work runs on vps; running it locally instead would defeat the
ask.
| Condition | What the tool returns |
|---|---|
target not in remotes config |
{"error": "unknown target 'vps'"} |
| Remote serve unreachable (TCP refused) | {"error": "remote 'vps' unreachable: ..."} |
| Remote serve timeout | {"error": "remote 'vps' timed out"} |
| Remote 401 (token mismatch) | {"error": "remote 'vps' rejected auth"} |
| Remote 404 (queue unknown) | {"error": "remote 'vps': unknown queue 'X'"} |
| Remote 5xx | {"error": "remote 'vps' failed: <body>"} |
Timeouts in v1 are fixed: 5s connect, 10s read. No retries — fail fast and let the calling agent decide what to do.
Completion / return channel¶
The default completion model is fire-and-forget: the calling aegis
gets a task_id back from the POST and is not notified when the
remote task finishes. What happens on completion is then entirely up
to the receiving serve's own configuration:
- If the receiver has a Telegram bridge configured, the worker's final message will land in Telegram on its way out.
- If the receiver runs in a repo and the worker commits and pushes, the work shows up in git.
- If the worker writes into a shared filesystem (a vault, a synced folder), it shows up there.
- If the receiver does nothing on completion, nothing happens on completion.
For the opt-in wire callback path — final-message delivery back to the originating agent's inbox — see Callbacks.
Callbacks¶
When aegis_enqueue(target="<peer>", callback=True) is invoked, the
substrate attaches callback_to + callback_handle hints to the
wire enqueue body. On worker termination the receiving serve POSTs
the worker's final message to POST /remote/v1/callback on the
caller's plane; the caller's inbox router delivers it to the
originating agent as a normal ✉ from queue:<peer>:<name> envelope.
# zion: agent dispatches and keeps working
aegis_enqueue(
queue="implementation",
payload="Implement the design at docs/specs/foo.md…",
from_handle="lucid-knuth",
target="vps",
callback=True,
)
# → {"task_id": "01J…", "target": "vps", …}
# …later, in lucid-knuth's transcript on zion:
# ✉ from queue:vps:impl · task#01J… · ok · 17:46:11Z
# Done. Implemented in branch feat/foo, pushed, CHANGELOG updated.
Symmetric config required. Both sides must define each other in
their remotes: block (see Configuration). The
caller's peer_name for the target is what gets sent as
callback_to; the receiver looks that name up in its own
remotes: to find the URL to POST the callback to.
Best-effort, no retry. A callback that can't reach the caller
(network drop, caller restarted, auth rejected) is logged and
dropped — every callback attempt is recorded in the receiver's queue
JSONL with its outcome. Pass callback=False if you want the
fire-and-forget behavior explicitly.
Remote schedules¶
A small control plane lets one serve push a schedule into a peer and then inspect or remove it remotely. The substrate of v0.6's schedule hot-reload watcher picks the pushed file up the moment it lands; once seeded it behaves identically to an inline or overlay schedule.
HTTP endpoints¶
PUT /remote/v1/schedule/<name> # push a schedule body
GET /remote/v1/schedule # list (with source classification)
GET /remote/v1/schedule/<name> # show one
DELETE /remote/v1/schedule/<name> # remove a pushed/overlay schedule
GET /remote/v1/schedule/<name>/logs # JSONL tail
The PUT body is the same schedule shape used in .aegis.yaml. The
receiver writes it to .aegis/schedules/<name>.yaml with a
# pushed_from: <caller> provenance comment on the first line.
MCP tools¶
aegis_schedule_push (name, spec_body, target=None)
aegis_schedule_list ( target=None)
aegis_schedule_show (name, target=None)
aegis_schedule_remove (name, target=None)
aegis_schedule_logs (name, tail=50, target=None)
target=None operates on the local scheduler; target="<peer>"
routes through the remote-plane client. Same dict shapes either way.
CLI verbs¶
aegis schedule push <name> <file.yaml> --to vps # push to a peer
aegis schedule list --remote vps # inspect a peer
aegis schedule show morning-briefing --remote vps
aegis schedule remove ci-watch --remote vps
aegis schedule logs nightly --remote vps -n 100
--to is push-only and required for schedule push when targeting
a peer; --remote is the inspection flag on the read/remove verbs.
Source classification¶
list and show responses tag each schedule with a source field:
| source | meaning |
|---|---|
inline |
declared in the receiver's .aegis.yaml schedules: table |
overlay |
hand-written file under .aegis/schedules/<name>.yaml |
pushed |
overlay file with a # pushed_from: provenance comment |
DELETE refuses to remove inline schedules (the receiver's own
config owns those); overlay and pushed are removable.
Self-scheduling¶
An agent can push its own future task into the local serve (or a
peer) — useful when finishing the current turn requires kicking
something off later. The standard pattern: build a fire_at body
~N seconds ahead, push, exit. The substrate's hot-reload picks it
up and fires it without further intervention.
Security model¶
The remote plane has no opinion about your network — bind it where it should be reachable from, and only from. A common deployment is a private overlay network (Tailscale, Headscale, WireGuard, plain VPN); the plane binds to its interface address and only nodes on the overlay can reach the port. That makes the network itself the trust anchor and keeps the HTTP surface narrow.
Two HTTP-layer gates compose on top:
- Bearer tokens (
accept_tokenson the receiver,tokenon the caller). Set this when your network has callers you don't fully trust, or when you want different callers to use different secrets. - Source-IP allowlists (
accept_from). Set this when you want only specific peer IPs to be able to enqueue.
Both gates compose with AND when set. Both empty means "anything that reaches the port is trusted" — appropriate when the network itself is the trust anchor (small tailnet, plain VPN), inadequate the day untrusted devices share the same network.
Tokens, if used, live in .aegis.yaml or its overlay files. Treat
them as secrets — keep them out of version control if the repo is
shared, or load them from env at startup.
Patterns¶
Local brainstorm → remote implementation¶
A long research / implementation task gets handed from an interactive
session on one machine to an aegis serve on another, where it can
run for hours under whatever profile and quota live there.
aegis_enqueue(
queue="implementation",
payload=(
"Implement the design at docs/specs/foo.md in this repo. "
"Use TDD. Commit and push as you go. Update CHANGELOG."
),
from_handle="lucid-knuth",
target="builder",
)
# → {"task_id": "01J…", "queued_position": 0, "target": "builder",
# "callback_note": "no wire return channel in v1; completion
# behavior is whatever the receiving serve is
# configured to do"}
The calling agent can wrap up the conversation. How you learn the work is done depends on what the receiver does on completion — commits landing in git, a notification through whatever bridge the receiver runs, a file appearing in a synced folder, or simply checking back later.
Cheap local → expensive remote¶
A small fast model on one machine handles routing and clarification
and hands hard subproblems off to a bigger model running on another.
The remote's .aegis.yaml configures implementation against the
heavier model with full permissions; the calling serve doesn't even
need to know that model exists.
# .aegis.yaml on the cheap side
agents = {
"router": Agent(provider=ClaudeCode(model="haiku",
permission="auto")),
}
default_agent = "router"
Several machines on one overlay¶
A handful of machines on the same overlay network — laptop, desktop,
a beefy box — each running aegis serve. Any agent can hand off to
any other peer it has declared as a remote.
# .aegis.yaml on the laptop
remotes:
desktop: { url: http://100.64.0.3:8556 }
builder: { url: http://100.64.0.5:8556 }
remote_plane:
bind: 100.64.0.4:8556
When the network itself is trusted (e.g. a personal tailnet with only
your own devices on it), no per-peer tokens are needed. Add bearer
tokens or accept_from allowlists the moment that stops being true.
File layout¶
src/aegis/remote/
config.py # RemoteSpec, RemotePlaneSpec dataclasses
plane.py # Starlette app + build_plane / run_plane_async
client.py # httpx client: remote_enqueue, remote_callback,
# remote_schedule_push/list/show/remove/logs
callback_observer.py # receiver-side worker-completion observer
src/aegis/scheduler/
push.py # PUT/GET/DELETE handlers + source classification
The aegis_enqueue MCP tool routes to remote.client.remote_enqueue
when target= is set; cli.serve() mounts the plane via
build_plane(queue_manager, cfg.remote_plane) when remote_plane is
configured. The aegis_schedule_* tools and aegis schedule CLI
verbs use the same client module when given a target=/--to/
--remote peer.
Future extensions (not built)¶
These are noted to confirm the /remote/v1/ namespace leaves room
for them without breaking changes:
- Status query —
GET /remote/v1/task/<id>. - Cancel —
POST /remote/v1/task/<id>/cancel. - Cross-host
aegis_handoff— handoff to a live remote handle. Semantically harder than enqueue; deferred until the use case appears.