Gap 1: No agent registry

The first question we ask in a governance consult is "list every AI agent running in production". Nobody can. There is one in the support stack, one a marketer wired up through Zapier, two in the data pipeline that an analyst built, one a developer left running on a personal AWS account that still hits the corporate Slack webhook. You cannot govern what you cannot list. The fix is a registry: a single maintained list of every agent, what it does, what data it touches, what tools it has, who owns it, what model and provider it uses, what region it runs in, and what eval score it last produced. A Google Sheet is a fine v1; Notion or Airtable if you want a richer schema. The act of filling it in surfaces half the governance gaps on its own. A pattern that works: a weekly five-minute scan of CI artefacts and API gateway logs to catch agents nobody registered. The OpenAI and Anthropic enterprise admin consoles both expose API key usage broken down by application, which is the fastest way to discover the shadow agents.

Gap 2: No named owner

An agent without an owner is an agent nobody updates, evaluates, or turns off when it misbehaves. Every agent in the registry needs a named human owner, not a team, a person. The owner is accountable for the agent eval score, its monthly cost, its incidents, and the decision to retire it. This is the cheapest governance control and the one most often missing. When an agent has no owner, an incident has no first responder, and what should have been a 20-minute pause-and-investigate turns into a four-hour scramble while three people argue about who owns it. The pattern that holds: the owner is named in the agent registry, in the agent system prompt as a contact, and in the on-call rota for the system the agent sits inside. When the owner changes role, the handover is a written ceremony with an explicit acceptance, not a chat message.

Gap 3: No eval harness

Without an eval harness you cannot answer the most basic governance question: did this agent get worse? A prompt change, a model upgrade, a new tool, an upstream data shift, any of them can regress quality silently. The fix is a written test set with ground-truth labels that runs in CI on every change, and weekly on a schedule even when nothing changed. It does not need to be large. A hundred well-chosen cases catch most regressions; we have seen evals with as few as 30 cases produce useful signal when the cases are picked deliberately. Mix happy-path cases, known-failure cases, and adversarial cases (prompt injection, jailbreaks, edge-case formatting). Anthropic publishes a free eval template; Braintrust, Langfuse, and LangSmith all sell hosted versions if you do not want to roll your own. The eval is also what lets you defend the agent to a regulator or an auditor: you can show what it does, measured, with a numeric trendline going back to launch.

Gap 4: No approval gate on consequential actions

Human-in-the-loop gets argued as all or nothing, and so teams either gate everything (and the agent is useless) or gate nothing (and the agent is dangerous). The right answer is to gate the consequential actions and automate the rest. Reading data, drafting a reply, classifying a ticket: automate. Sending money, deleting records, emailing a customer, changing a production config, posting publicly: gate behind a human approval. Draw the line by reversibility (can the action be undone in under a minute) and blast radius (how many users are affected if the action is wrong). Most actions are safe to automate; the few that are not are obvious once you ask the question. The implementation pattern: every tool the agent can call has an explicit "requires_human_approval" flag in the registry, and the gate is enforced in the tool layer, not in the prompt. A prompt-level instruction can be ignored; a tool-level gate cannot. The OpenAI Agents SDK and the Anthropic tool use docs both show this pattern as standard.

Gap 5: Shadow AI

Shadow AI is the set of AI tools and agents teams use without going through any approval. It is not a discipline problem, it is a friction problem. People route around governance when the approved path is slower than the shadow path, and the shadow path (open ChatGPT, paste the customer record, ask for a draft) is fast. The fix is two-sided: a short, readable AI usage policy that says what is allowed in plain English (one page, not a 40-page corporate document), and an approved path that is genuinely fast. If getting an agent reviewed and registered takes a day, people will register. If it takes a quarter, they will not. The 2024 Microsoft Work Trend Index reported that 78 percent of AI users at work were bringing their own AI, and the gap between sanctioned and shadow AI tracks almost perfectly with how fast the sanctioned path moves. Compete on speed, not on policy length.

Gap 6: No incident runbook

When an agent does something wrong (sends the wrong message, leaks a field, loops on cost, hallucinates a refund), what happens? At most clients, the answer is improvised. An agent incident runbook is one page: how to detect (which dashboard alerts which person), how to pause the agent fast (the kill switch, named, tested monthly), who to notify (legal, security, comms, the named owner), how to communicate (template for the customer note), how to write the postmortem (template, deadline, audience). Write it before the incident. The first agent incident is not the time to design the response. The Google SRE workbook chapter on incident management is the canonical reference for the shape of the runbook; the AI-specific addition is the kill switch and the prompt-injection-investigation checklist.

The lightweight governance framework

Put together, the framework is six artefacts and none of them is heavy. A registry of every agent. A named owner per agent. An eval harness per agent, in CI. An approval gate on consequential actions, enforced at the tool layer. A one-page AI usage policy. A one-page incident runbook. Map these to NIST AI RMF 1.0 (free PDF, US reference framework) or ISO/IEC 42001 (the certifiable international standard, published December 2023) if you need an external standard to point at, but do not wait for the standard to start. The registry alone, built this week in a spreadsheet, moves you further than a governance committee that meets next quarter. The EU AI Act phases obligations in through 2026 and 2027, and the documentation it expects for high-risk systems maps almost one-to-one onto these six artefacts, so the work is not wasted whichever framework you eventually align to.

AI agent governance: the framework most teams build too late

Gap 1: No agent registry

Gap 2: No named owner

Gap 3: No eval harness

Gap 4: No approval gate on consequential actions

Gap 5: Shadow AI

Gap 6: No incident runbook

The lightweight governance framework

Further reading

Comments

Leave a comment

Gap 1: No agent registry

Gap 2: No named owner

Gap 3: No eval harness

Gap 4: No approval gate on consequential actions

Gap 5: Shadow AI

Gap 6: No incident runbook

The lightweight governance framework

Further reading

Comments

Leave a comment

Agentic programming security: the fundamentals most teams skip

Privacy best practices for agentic AI: a consultant's checklist

Agentic AI compliance: what changes when the agent decides