AI agent governance

AI agent governance: the framework most teams build too late

AI agent governance gets built after the incident, not before. By the time a team calls us, there are usually six agents in production, nobody can list all of them, and nobody owns half of them. Governance does not have to be heavy. It has to exist. Here is the lightweight framework we install at clients, the gaps it closes, and why the heavy version of governance always gets bypassed.

Gap 1: No agent registry

The first question we ask in a governance consult is "list every AI agent running in production". Nobody can. There is one in the support stack, one a marketer wired up through Zapier, two in the data pipeline that an analyst built, one a developer left running on a personal AWS account that still hits the corporate Slack webhook. You cannot govern what you cannot list. The fix is a registry: a single maintained list of every agent, what it does, what data it touches, what tools it has, who owns it, what model and provider it uses, what region it runs in, and what eval score it last produced. A Google Sheet is a fine v1; Notion or Airtable if you want a richer schema. The act of filling it in surfaces half the governance gaps on its own. A pattern that works: a weekly five-minute scan of CI artefacts and API gateway logs to catch agents nobody registered. The OpenAI and Anthropic enterprise admin consoles both expose API key usage broken down by application, which is the fastest way to discover the shadow agents.

Gap 2: No named owner

An agent without an owner is an agent nobody updates, evaluates, or turns off when it misbehaves. Every agent in the registry needs a named human owner, not a team, a person. The owner is accountable for the agent eval score, its monthly cost, its incidents, and the decision to retire it. This is the cheapest governance control and the one most often missing. When an agent has no owner, an incident has no first responder, and what should have been a 20-minute pause-and-investigate turns into a four-hour scramble while three people argue about who owns it. The pattern that holds: the owner is named in the agent registry, in the agent system prompt as a contact, and in the on-call rota for the system the agent sits inside. When the owner changes role, the handover is a written ceremony with an explicit acceptance, not a chat message.

Gap 3: No eval harness

Without an eval harness you cannot answer the most basic governance question: did this agent get worse? A prompt change, a model upgrade, a new tool, an upstream data shift, any of them can regress quality silently. The fix is a written test set with ground-truth labels that runs in CI on every change, and weekly on a schedule even when nothing changed. It does not need to be large. A hundred well-chosen cases catch most regressions; we have seen evals with as few as 30 cases produce useful signal when the cases are picked deliberately. Mix happy-path cases, known-failure cases, and adversarial cases (prompt injection, jailbreaks, edge-case formatting). Anthropic publishes a free eval template; Braintrust, Langfuse, and LangSmith all sell hosted versions if you do not want to roll your own. The eval is also what lets you defend the agent to a regulator or an auditor: you can show what it does, measured, with a numeric trendline going back to launch.

Gap 4: No approval gate on consequential actions

Human-in-the-loop gets argued as all or nothing, and so teams either gate everything (and the agent is useless) or gate nothing (and the agent is dangerous). The right answer is to gate the consequential actions and automate the rest. Reading data, drafting a reply, classifying a ticket: automate. Sending money, deleting records, emailing a customer, changing a production config, posting publicly: gate behind a human approval. Draw the line by reversibility (can the action be undone in under a minute) and blast radius (how many users are affected if the action is wrong). Most actions are safe to automate; the few that are not are obvious once you ask the question. The implementation pattern: every tool the agent can call has an explicit "requires_human_approval" flag in the registry, and the gate is enforced in the tool layer, not in the prompt. A prompt-level instruction can be ignored; a tool-level gate cannot. The OpenAI Agents SDK and the Anthropic tool use docs both show this pattern as standard.

Gap 5: Shadow AI

Shadow AI is the set of AI tools and agents teams use without going through any approval. It is not a discipline problem, it is a friction problem. People route around governance when the approved path is slower than the shadow path, and the shadow path (open ChatGPT, paste the customer record, ask for a draft) is fast. The fix is two-sided: a short, readable AI usage policy that says what is allowed in plain English (one page, not a 40-page corporate document), and an approved path that is genuinely fast. If getting an agent reviewed and registered takes a day, people will register. If it takes a quarter, they will not. The 2024 Microsoft Work Trend Index reported that 78 percent of AI users at work were bringing their own AI, and the gap between sanctioned and shadow AI tracks almost perfectly with how fast the sanctioned path moves. Compete on speed, not on policy length.

Gap 6: No incident runbook

When an agent does something wrong (sends the wrong message, leaks a field, loops on cost, hallucinates a refund), what happens? At most clients, the answer is improvised. An agent incident runbook is one page: how to detect (which dashboard alerts which person), how to pause the agent fast (the kill switch, named, tested monthly), who to notify (legal, security, comms, the named owner), how to communicate (template for the customer note), how to write the postmortem (template, deadline, audience). Write it before the incident. The first agent incident is not the time to design the response. The Google SRE workbook chapter on incident management is the canonical reference for the shape of the runbook; the AI-specific addition is the kill switch and the prompt-injection-investigation checklist.

The lightweight governance framework

Put together, the framework is six artefacts and none of them is heavy. A registry of every agent. A named owner per agent. An eval harness per agent, in CI. An approval gate on consequential actions, enforced at the tool layer. A one-page AI usage policy. A one-page incident runbook. Map these to NIST AI RMF 1.0 (free PDF, US reference framework) or ISO/IEC 42001 (the certifiable international standard, published December 2023) if you need an external standard to point at, but do not wait for the standard to start. The registry alone, built this week in a spreadsheet, moves you further than a governance committee that meets next quarter. The EU AI Act phases obligations in through 2026 and 2027, and the documentation it expects for high-risk systems maps almost one-to-one onto these six artefacts, so the work is not wasted whichever framework you eventually align to.

Further reading

Real, named sources the editor can swap in for specific URLs. We do not auto-link these because the right link changes over time. If you find a great primary source, write us and we will update the note.

  • NIST AI Risk Management Framework (AI RMF 1.0). The US reference framework for AI risk. Voluntary, practical, the one most teams map to first.
  • ISO/IEC 42001 (AI management system). The international standard for an AI management system. The certifiable counterpart to NIST AI RMF.
  • Google's Secure AI Framework (SAIF). A practitioner framework for securing AI systems. Strong on the agent-tooling and supply-chain angles.
  • EU AI Act (Regulation 2024/1689). The governance and documentation obligations that apply to higher-risk agents. Phasing in through 2026-2027.
  • r/MLOps and r/MachineLearning. Practitioner threads on model registries, eval harnesses in CI, and production agent governance.

Comments

Leave a comment

Your email won't be published. Comments are reviewed before they appear.
★ Read next