A small tool for a focused workflow. Three or four models in defined roles, one operator at the keystone position. Not a multi-agent system. A structured code review that automates the file-shuffling and preserves the judgment.
A single Python tool that automates the file-shuffling in an existing audit-cycle workflow. Not an agent framework. Not a team. A relay with one operator at the keystone.
The workflow this tool automates already works. A coding interaction produces an initial implementation. That implementation is sent to two models for in-depth audit with a strict prompt: do not gloss over flaws, do not change code, indicate ambiguity, produce an action plan. The plans are reviewed by the operator, optionally tightened by a third model, then sent back to a coding model for revision. The coding role rotates between cycles to prevent inherited bias.
The mechanical work — copying code between four windows, re-attaching references, parsing out the action plan from prose, applying the diff back to disk — is what slows down the cycle. The judgment work — deciding whether the plan is right, whether the revision achieved it, whether the next iteration is needed — is what produces value. This tool removes the mechanical work and preserves the judgment work. That distinction is the entire design.
The tool is not the project. Phase 1 is. This blueprint is for a tool that exists to accelerate Phase 1, not for a tool that justifies its own development time. Every hour spent on the relay that doesn't save more than an hour on Phase 1 is wasted hour.
Build incrementally. Use the simplest version that removes the most painful copy-paste first. Add capability when specific friction earns it.
The audit relay is a Python orchestrator that runs the existing audit-cycle prompts against multiple LLM APIs in parallel, presents action plans to the operator for review at defined gates, applies approved revisions, and tracks state across multiple in-flight cycles.
APIs, libraries, conventions. Every external dependency named.
Claude via https://api.anthropic.com/v1/messages.
Highest tier subscription supports the model needed. SDK:
anthropic Python package. Used for: long-form audit
role and revision role.
Auth: ANTHROPIC_API_KEY in environment.
Never commit secrets — load from a process env or an untracked
.env.
Codex / GPT via https://api.openai.com/v1/responses.
SDK: openai Python package. Used for: alternate audit
role and rotated revision role.
Auth: OPENAI_API_KEY in environment.
Never commit secrets — load from a process env or an untracked
.env.
Note: Avoid the Monday persona for this work — register the model with a clinical system prompt that suppresses snark.
Gemini via Google Cloud or AI Studio. SDK:
google-generativeai Python package. Used for: plan
refinement role only — sharpen and consolidate, never originate.
Auth: GOOGLE_API_KEY in environment, or
application default credentials if going through Cloud. Never
commit secrets — load from a process env or an untracked
.env.
Each cycle commits to a feature branch. State transitions are commits with structured messages. Diffs between commits show what changed at each stage. Solves the WSL/Windows mirror problem because git is the source of truth, not either filesystem location.
Project lives under ~/projects/paxiom-relay on the
Linux filesystem (not /mnt/c — performance). WSLg
provides notify-send for desktop notifications on
Windows 11. tmux for split-pane workflow. $EDITOR
respected for plan review (vim, nvim, micro, or
code --wait if VS Code preferred).
Python 3.11+. Direct dependencies kept small:
anthropic, openai,
google-generativeai, pyyaml,
click (or typer) for CLI,
rich for terminal output. No heavyweight frameworks
(no LangChain, no CrewAI, no AutoGen). The orchestrator is
pedestrian Python, not a meta-system.
The relay sits between conversational tooling (Claude long-form, Codex CLI, Antigravity) and git. It runs structured passes after the conversation has produced a first artifact.
Long-form conversational work in Claude — idea shaping, spec drafting,
architecture decisions, blueprint sheets like the ones referenced above.
Output: written specifications and reference materials in
references/.
Initial coding interaction in any of the conversational coding tools (Claude Code, Codex CLI, Antigravity). Output: a first-pass implementation committed to a feature branch.
Take that first-pass implementation and run audit-revise cycles against it until the operator accepts the result. The number of cycles is bounded by configuration; in practice 1–3 cycles is typical.
A feature branch with cycle commits visible as history. Final accepted code on the tip of the branch. Audit artifacts preserved as files for later reference. Merge to main is a separate step the operator does manually.
relay start <service>.
Operator
Cycle UUID generated. Branch created. State file written.
references/<service>/*. Compose audit input.
audits/<cycle>/.
Validate output structure. If parse fails, retry once with format reminder.
notify-send. Plan files appear in queue.
relay review <cycle>.
Operator
$EDITOR opens consolidated plan. Operator edits, saves, exits.
relay continue <cycle>.
Operator
Or relay abort <cycle>. Decision is explicit, never inferred.
test_command from service config. Output captured.
relay accept, relay iterate, or relay abort.
complete or back to audit-running.
Plan finalization (between steps 04 and 05) and final acceptance (after step 07) are mandatory operator gates. The tool never auto-continues past either. This is the structural defense against confidently-wrong drift. If the tool ever ships a mode that skips these gates, that mode is a bug.
The substrate the tool runs on. Concrete paths, concrete formats.
~/projects/paxiom-relay/ ├── relay.py # Entry point. Click-based CLI. ├── relay/ │ ├── __init__.py │ ├── orchestrator.py # Cycle state machine, dispatch logic. │ ├── models/ │ │ ├── anthropic.py # Claude wrapper. │ │ ├── openai.py # Codex/GPT wrapper. │ │ └── gemini.py # Gemini wrapper. │ ├── prompts.py # Template loading and rendering. │ ├── parse.py # Audit output parsing. │ ├── notify.py # notify-send + tmux + bell. │ ├── state.py # Cycle persistence in .relay/ │ └── git.py # Branch + commit helpers. ├── prompts/ │ ├── audit.md # Audit prompt template. │ ├── refine.md # Gemini refinement prompt. │ └── revise.md # Coding revision prompt. ├── config/ │ ├── default.yaml # Global defaults. │ └── services/ │ ├── A-201.yaml # Per-service overrides. │ ├── A-202.yaml │ └── ... └── README.md
<target-repo>/.relay/ ├── cycles/ │ └── <uuid>/ │ ├── state.json # State machine value + metadata. │ ├── input.txt # Code state at cycle start. │ ├── audits/ │ │ ├── claude.md │ │ └── codex.md │ ├── refined-plan.md # Optional, after Gemini. │ ├── plan.md # What the operator approved. │ ├── revision.diff │ └── test-output.txt ├── queue.md # Live list of in-flight cycles. └── logs/ └── <date>.log # Daily log of all cycle activity.
# config/services/A-202.yaml service: id: A-202 name: Sync Committee Verification branch_prefix: feat/svc-202 references: - blueprint/A-202.md - architecture/kya-primitive.md - architecture/x402-integration.md - tests/sync-committee-vectors.json models: audit_a: provider: anthropic model: claude-opus-4-7 audit_b: provider: openai model: gpt-5 refinement: provider: google model: gemini-2.5-pro enabled: true revision: rotation: [anthropic, openai] # Alternates per cycle. cycle: max_iterations: 5 test_command: cargo test --package sync-committee-verifier acceptance_required: true
A single CLI that handles every operation. No daemons. No long-running processes. Each invocation does one thing and exits.
relay start <service-id> # Begin new cycle. Returns UUID. relay status # Show all in-flight cycles. relay show <uuid> # Detailed view of one cycle. relay review <uuid> # Open plan in $EDITOR. relay continue <uuid> # Approve plan, run revision. relay accept <uuid> # Final accept after revision. relay iterate <uuid> # Revision incomplete, audit again. relay abort <uuid> # Stop cycle, clean state. relay rotate <service-id> # Manually flip revision rotation. relay cost # Token usage and $ to date. relay logs [--tail] # Activity log, optionally follow.
0 — operation completed.1 — operation paused, waiting on operator (after audit, after revision).2 — error, operator should investigate.3 — config or environment problem (missing API key, missing repo).
Exit codes matter for tmux scripting and for integration with other
tools. relay status; if [ $? -eq 1 ]; then ... patterns
let the operator wire the relay into their existing terminal flow.
config/default.yaml.config/services/<service-id>.yaml.--model claude-opus-4-7 on the start command).
Every API call, state transition, and operator command logs to
.relay/logs/YYYY-MM-DD.log with timestamps and cycle
UUIDs. Logs are append-only, plain text, grep-friendly. No structured
logging library. If something went wrong at 2 AM the previous day,
grep <uuid> logs/2026-04-28.log reconstructs the
story.
The prompts are the actual product. The orchestrator is plumbing.
# prompts/audit.md
You are conducting an in-depth technical audit of the code below
against the reference materials provided. Your role is auditor only.
Hard rules:
- Do not gloss over flaws. Flaws unaddressed in the audit will reach
production unaddressed.
- Do not change the code. Do not propose code. Produce an action plan
that another model will execute.
- Indicate ambiguity. Where the spec is unclear or the code's intent is
uncertain, say so explicitly with [AMBIGUOUS].
- Be specific. Quote line numbers and code excerpts.
- Do not propose features beyond the spec. The spec is the contract.
Reference materials:
{{ references }}
Service specification:
{{ spec }}
Code to audit:
{{ code }}
Produce output in this exact structure. Do not add or remove sections.
## Critical issues
[bugs, security flaws, spec violations - must fix before merge]
## Significant concerns
[design problems, robustness issues - should fix]
## Ambiguities
[places the spec is unclear or code intent is uncertain]
## Style and convention notes
[lower priority improvements]
## Action plan
[ordered list of specific imperative changes,
each item references which section above it addresses]
# prompts/refine.md
Two independent audits of the same code are below. Your role is to
consolidate them into a single action plan. Your role is editorial,
not creative.
Hard rules:
- Do not add new items. If neither audit raised it, it stays out.
- Do not propose new directions. Do not editorialize.
- Tighten existing items for accuracy and explicitness.
- Mark items unclear with [AMBIGUOUS].
- Mark items present in both audits with [DUPLICATE — consolidate].
- Mark items where the two audits conflict with [CONFLICT — operator decides].
Audit A (Claude):
{{ audit_a }}
Audit B (Codex):
{{ audit_b }}
Produce a single consolidated action plan in the same structure as the
input audits. Preserve attribution where it matters; otherwise merge
silently.
# prompts/revise.md
Revise the code below to address the action plan. The plan has been
reviewed and approved by the operator. Your role is to execute the plan
as written.
Hard rules:
- Do not deviate from the plan. Items the plan doesn't mention stay
unchanged.
- Do not add features. The plan is the contract.
- If a plan item is impossible to address as written, stop and report
the conflict in a section called "## Plan conflicts" rather than
improvising a workaround.
- Preserve existing tests. Add tests only for new behavior the plan
introduces.
Reference materials:
{{ references }}
Service specification:
{{ spec }}
Original code:
{{ code }}
Approved action plan:
{{ plan }}
Produce the revised code as a single artifact. Use proper file structure
with file path headers (e.g., # path: src/verifier/lib.rs) before each
file's contents.
Parsing audit output is straightforward markdown header detection.
parse.py looks for ## Critical issues,
## Significant concerns, etc., and slices the document
into sections. If a required section is missing, the orchestrator
retries the audit once with a format reminder. If the second attempt
also fails, it surfaces the raw output to the operator with a flag.
No Slack. No email. No web dashboard. The notification surface is the terminal you already have open.
notify-send — desktop toast
via Windows 11 native integration. Default. Works without setup on
Win11 + WSL2 with WSLg enabled.\a) — fallback if
notify-send unavailable. Configurable on/off per
operator preference..relay/queue.md) —
always written. Watch with watch -n 5 cat .relay/queue.md
in a tmux pane for live status without polling the API.relay tmux-status subcommand emits a single-line
summary suitable for status-right.
relay review <uuid> opens the consolidated plan
file in $EDITOR. Operator edits in place — adding,
removing, sharpening items. Saves and exits. The orchestrator reads
the modified file as the approved plan.
Editor selection follows standard convention:
$EDITOR environment variable if set.git config --get core.editor if set.vi as last fallback.
For VS Code users: export EDITOR="code --wait".
For nvim users: standard. For terminal-only operators (vim, micro):
standard.
$ relay status
CYCLES IN FLIGHT
────────────────────────────────────────────────────────
A-202 bf3c91 AUDIT-RUNNING 2m ago claude+codex
A-204 1a8d72 AWAITING-REVIEW 14m ago plan ready
A-201 7e0c45 REVISION-RUNNING 38s ago claude
A-205 9b2317 AWAITING-ACCEPTANCE 1h ago tests passed
COSTS THIS SESSION
────────────────────────────────────────────────────────
Anthropic $4.21 18,402 input / 6,108 output tokens
OpenAI $3.86 21,209 input / 4,892 output tokens
Google $0.42 8,103 input / 718 output tokens
TOTAL $8.49
3 cycle(s) need your attention. Run `relay queue`.
$ tmux new-session -d -s paxiom $ tmux send-keys -t paxiom 'cd ~/projects/paxiom' C-m $ tmux split-window -h -t paxiom 'watch -n 10 relay status' $ tmux split-window -v -t paxiom:0.1 'tail -f .relay/logs/$(date +%Y-%m-%d).log' $ tmux attach -t paxiom
Cycle states explicit, transitions journaled, crashes recoverable. Multiple cycles in flight without conflict.
INITIALIZED — UUID created, branch checked out, ready to dispatch.AUDIT_RUNNING — API calls in flight to audit models.AUDIT_COMPLETE — both audits captured, refinement (if enabled) pending or done.AWAITING_REVIEW — operator gate. Notification fired.PLAN_APPROVED — operator ran continue.REVISION_RUNNING — coding model has the revision prompt.TESTING — revision applied, tests executing.AWAITING_ACCEPTANCE — operator gate. Notification fired.COMPLETE — operator accepted. Branch ready for merge.ITERATING — operator chose to re-audit. Loops back to AUDIT_RUNNING.ABORTED — operator killed cycle. State preserved for review.// .relay/cycles/<uuid>/state.json
{
"uuid": "bf3c91a4-...",
"service_id": "A-202",
"state": "AWAITING_REVIEW",
"branch": "feat/svc-202-bf3c91",
"iteration": 1,
"started_at": "2026-04-29T22:14:03Z",
"transitions": [
{ "to": "AUDIT_RUNNING", "at": "..." },
{ "to": "AUDIT_COMPLETE", "at": "..." },
{ "to": "AWAITING_REVIEW", "at": "..." }
],
"config_snapshot": { ... },
"tokens": { "anthropic_in": 8421, ... },
"cost_usd": 0.34
}
If the orchestrator dies mid-cycle (network fail, system reboot,
Ctrl+C), state is preserved. relay status
shows the last known state. relay resume <uuid>
picks up from the last persisted transition. API calls that were
in flight at crash time may have completed server-side; the orchestrator
checks for cached results before re-issuing.
.relay/locks/<service-id> prevents two cycles
against the same service from running simultaneously. Operator can
override with --force if they understand the risk.relay cost
surfaces both views.Each state transition with a code change produces a commit on the cycle branch. Commit messages follow a structured format:
[<cycle-uuid>] <state-transition>: <summary> Cycle: bf3c91a4 Service: A-202 (Sync Committee Verification) Iteration: 1 State: REVISION_COMPLETE Plan items addressed: - Critical #1: Add bounds check on signature aggregation - Significant #3: Refactor witness loading for evidence path - Ambiguity resolved: Decision was <X>, see plan v2 Models: audit_a: claude-opus-4-7 audit_b: gpt-5 refine: gemini-2.5-pro revise: claude-opus-4-7 Tokens: in 24,213 out 8,047 Cost: $0.71
The relay is built incrementally alongside Phase 1. Each gate produces a working tool that's better than the previous gate but useful at every step. No "build everything before using anything."
Project structure exists. relay.py imports successfully. Three
prompt templates exist. One service config exists. relay --help
prints subcommands. No actual API calls yet.
Estimated: 2–4 hours. Output: Tool that can be invoked, even if it doesn't do anything yet.
relay start <service> sends code to one model with the
audit prompt and writes output to .relay/cycles/<uuid>/audits/.
Operator can read the output. No revision step yet.
Estimated: 4–6 hours. Output: The first thing that saves real copy-paste time. Already useful.
Both audits run in parallel via asyncio.gather. Notification
fires on completion. relay review <uuid> opens
$EDITOR on a consolidated plan file.
Estimated: 4–8 hours. Output: The audit-and-plan loop fully automated. Major friction reduction.
relay continue dispatches the revision prompt. Result applied
as commit on the cycle branch. relay accept merges (or marks
ready). Iteration loop closes.
Estimated: 6–10 hours. Output: Full single-cycle end-to-end. Tool is feature-complete for sequential use.
After revision, run test_command from service config. Capture
output. Surface failures in the acceptance prompt. Optionally auto-iterate
on test failures up to max_iterations.
Estimated: 3–5 hours. Output: Reduces the manual test-then-decide step.
Multiple cycles on different services can run simultaneously. Lock files
prevent same-service conflicts. relay queue shows what awaits
review across all cycles. Gemini refinement step optional and toggleable.
Estimated: 4–6 hours. Output: Multi-service throughput. This is where the time savings really compound.
Per-provider cost tracking. Rate limit handling with backoff. Daily and session cost commands. Alerts if any single cycle blows past a configurable cost threshold.
Estimated: 2–4 hours. Output: Production hygiene.
Gate 2 is when the tool becomes net-positive on time. After Gate 2, every additional gate is an optimization rather than a requirement. If Phase 1 work is being delayed by relay-tool work, stop at the latest gate that's stable and use the tool while building Phase 1 services. Return to relay improvement when specific friction earns it.
Do not finish all six gates before starting Service 2. Gate 1 ships the same week Service 2 work starts. Gate 2 ships mid-Service-2. Gate 3+ ships during the Service 2 → Service 4 transition.
Gates 0–6 sum to roughly 25–43 hours. At evening pace (2–3 productive hours per evening) that's 9–18 evenings. At weekend pace (8–10 hours per Saturday) that's 3–4 weekends. Realistic: built incrementally over the first month of Phase 1, never as a discrete project.
Failure modes specific to the relay itself. Distinct from Phase 1 risks.
Hours flow into relay improvements that don't translate to Phase 1 acceleration. The tool is interesting; Phase 1 work has higher activation energy. Time leaks toward the easier task.
Mitigation: Gate-based discipline (M-300). Stop at the latest stable gate when Phase 1 work needs attention. Treat relay improvement as background work, not foreground.
Audit prompt requires specific section headers. A model occasionally omits a section or uses a different header format. Parsing fails. Cycle stalls.
Mitigation: One automatic retry with a format reminder appended. After second failure, surface raw output to operator with a flag. Operator decides whether to manually parse or abort. Track non-conformance rate in logs to identify which model needs prompt tuning.
Parallel cycles, multiple iterations, expanding reference materials drive token usage above expectations. $50/day becomes $200/day becomes the dominant Phase 1 expense.
Mitigation: Cost tracking per cycle from Gate 6. Configurable per-cycle and per-day caps. Alert at 50% and abort at 100% by default. Reference materials trimmed: only what's necessary for the specific service, not the entire blueprint.
A subtle wrong assumption enters the code. Audit doesn't flag it because the assumption is plausible. Refinement doesn't flag it because it wasn't in either audit. Revision preserves it. Subsequent cycles deepen it. Operator skims and accepts.
Mitigation: Operator review at plan finalization and final acceptance is non-negotiable (M-100 / W.01). When skimming starts to feel automatic, slow down and read carefully — that's exactly when drift is most likely.
The blueprint sheet for a service was updated yesterday but the relay's reference path still points to the old version. Audits run against stale spec. Code drifts from current intent.
Mitigation: References live in the same git repo as the code. The relay reads HEAD, not a snapshot. When the blueprint changes, commit it before running the next cycle. Cycle metadata records the commit SHA the references came from.
An API key expires or gets rotated. Cycle dies mid-run with auth error. State preserved but no progress.
Mitigation: relay doctor subcommand
checks all configured providers for valid auth before any cycle
starts. Run as part of relay start precondition.
Failure surfaces immediately rather than mid-cycle.
This set is the M-series — Mechanical / Operational. Companion to the A-series Phase 1 blueprint. Future sets reserved by the master plan: S-series (Structural / security audit findings), E-series (Electrical / signing key management), L-series (Landscape / public-facing surfaces).
The tool's internal name is relay. The CLI binary is
relay. The package is paxiom-relay. No external
branding. This is internal infrastructure, not a product.
A-series Phase 1 Blueprint — what the relay accelerates. Each A-2## elevation lists the references the relay needs for that service.
Build journal — narrative record. Relay-related entries
will be marked with [relay] tag for filtering.
This document (M-series Rev. A) — the relay blueprint. Subsequent revisions will be issued as the relay's capability grows. Rev. B expected at Gate 3 (revision step working end-to-end).
Rev. A — 2026.04.29. Initial issue. Six gates planned. Three prompt templates drafted. Filesystem layout specified. Risk register with six entries. WSL terminal as the notification surface.
The relay is leverage. Used well, it removes mechanical friction from a workflow that already produces good code. Used poorly, it removes the operator from the decision loop and produces confidently-wrong code at scale. The structural defense — two non-negotiable operator gates per cycle — is what keeps the leverage from cutting the wrong way.
Build the engine first. The moats come later. The tools come from doing the work.