What If AI Agents Weren't Black Boxes?
The Problem
Agent systems are emerging across industries, promising to automate and streamline workflows. The appeal is obvious: complicated tasks handled by autonomous systems rather than human hands.
But there’s a problem no one seems to be talking about.
When agents plan and build, their reasoning is hidden from us. We can see what information went in and what results came out, but the decision-making process itself is never surfaced as explicit, inspectable artifacts. The model reasons inside its context window, and we are left hoping it chose the right path, with the right intention.
For trivial tasks, this works fine. For anything with real consequences — regulated industries, enterprise workflows, legal compliance, healthcare, finance — “the AI decided” isn’t an acceptable answer.
When something goes wrong — as complex systems eventually do — it becomes important to understand what happened. In many current approaches, that visibility is limited.
The Insight
What if we stopped treating AI as the all-trusted controller and started treating it as a valued consultant?
Traditional programming uses logic gates: if/then branches that evaluate conditions and route execution. The programmer defines the conditions. The program executes predictably.
But some decisions aren’t that simple. Traditional logic falls apart when faced with: “Is this email professional enough to send?” “Does this response actually address the customer’s concern?” “Is this transaction suspicious?” These require judgment, something traditional logic gates can’t provide.
AI can.
The insight is this: treat AI models as literal decision gates. The program runs outside the model’s context window. When it hits a decision point requiring judgment, it calls the model, gets a decision, and continues.
The model doesn’t control the program. It gets consulted. And every consultation is captured — the prompt, the response, the reasoning.
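To make the shape concrete, here’s a minimal sketch in Python. The names, prompt format, and response schema are illustrative choices for this post, not a prescription:

```python
import json
from datetime import datetime, timezone

def ask_model(prompt: str) -> str:
    """Placeholder for a real model call (an API request, a local model, etc.)."""
    raise NotImplementedError

def decision_gate(question: str, context: dict, log: list) -> dict:
    """Ask the model one scoped question and capture the whole exchange."""
    prompt = (
        f"{question}\n\n"
        f"Context:\n{json.dumps(context, indent=2)}\n\n"
        'Answer as JSON: {"decision": true or false, "reasoning": "..."}'
    )
    response = json.loads(ask_model(prompt))   # structured answer, not free-form prose
    log.append({                               # every consultation becomes an artifact
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "response": response,
    })
    return response
```

The point is the structure: one scoped question in, one structured answer out, and a record of the exchange either way.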
A Different Framing
Technically, nothing here is out of reach today. Existing frameworks already let you combine deterministic workflows with model calls at specific points. From a tooling perspective, the pieces exist.
The difference here isn’t capability — it’s where authority lives.
Much of the current conversation around AI systems focuses on autonomy: agents that plan, reason, and act inside long-running loops, with execution largely driven by the model itself. That approach can be powerful, but it also pushes more responsibility into opaque model behavior, making it harder to inspect, audit, or intervene when something goes wrong.
This work starts from a different assumption: that execution authority should remain outside the model. The program defines the flow. The model is consulted at explicit decision points and may supply judgments or propose structured next steps when asked, but authority over what actually runs remains external to the model.
This framing intentionally trades autonomy for control. Not because autonomy is useless, but because in many real-world systems — especially those with regulatory, financial, or safety implications — transparency and accountability matter more than letting the AI “figure it out.”
The Pattern
At the core, this system treats AI as a decision point, not a controller.
The program runs normally until it reaches a place where simple logic breaks down — a judgment call. At that point, it asks a model a very specific question using explicit program state and expects a structured answer back. The response is recorded along with the prompt and timestamp, and the program continues by branching on that result like any other boolean condition.
Nothing magical is happening here. The model isn’t “running” the program, and it doesn’t know what comes next. It’s consulted, returns an answer, and execution resumes outside the model. Every decision is visible, logged, and attributable to a specific point in the flow.
This alone solves a large part of the black-box problem — but it’s also intentionally limited. Decision gates let the system choose, not build. They provide judgment without autonomy.
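Building on the decision_gate sketch above, here’s what a gate looks like inside ordinary control flow (the transaction fields and outcomes are illustrative):

```python
def process_transaction(txn: dict, audit_log: list) -> str:
    # Deterministic rules stay ordinary program logic.
    if txn["amount"] <= 0:
        return "rejected"

    # Judgment call: consult the model at one explicit decision point.
    verdict = decision_gate(
        question="Is this transaction suspicious?",
        context={"amount": txn["amount"], "merchant": txn["merchant"]},
        log=audit_log,
    )

    # The program, not the model, decides what happens next.
    if verdict["decision"]:
        return "held_for_review"   # the gate chooses between pre-defined branches
    return "approved"
```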
Context Windows and State
Context windows still matter — each decision requires enough local context to be evaluated meaningfully. What changes here is where state lives.
Instead of accumulating all prior reasoning inside a single, ever-growing context window, state is held externally by the program. Each model call is scoped to the specific decision being evaluated, using only the information relevant at that point in execution.
This allows decisions to be composed across multiple, focused context windows rather than constrained by one monolithic prompt. The system scales by chaining decisions together under program control, not by stretching a single context window until it becomes opaque.
The result is not less context, but better-bounded context — small enough to reason about, explicit enough to audit, and isolated enough to understand when something goes wrong.
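A sketch of that composition, again building on the decision_gate sketch: each call sees only the slice of state it needs, and everything else stays in ordinary program variables.

```python
def review_email_draft(draft: str, client: str, audit_log: list) -> str:
    """Return the next action for this draft; the caller carries it out."""
    # Decision 1 sees only the draft and the recipient: no conversation history,
    # no accumulated reasoning from earlier steps.
    tone = decision_gate(
        question="Is this email draft professional enough to send to a client?",
        context={"draft": draft, "client": client},
        log=audit_log,
    )
    if tone["decision"]:
        return "send"

    # Decision 2 gets its own small, purpose-built context window.
    fixable = decision_gate(
        question="Can this draft be fixed with an automated rewrite?",
        context={"draft": draft, "critique": tone["reasoning"]},
        log=audit_log,
    )
    return "rewrite" if fixable["decision"] else "request_human_approval"
```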
What You Gain
Auditability.
Every decision is captured as a concrete artifact: the prompt, the structured response, the rationale, and when it occurred (a sketch of such a record appears at the end of this section). When something goes wrong, you don’t have to guess which part of the system failed; you can point to the exact decision boundary where it happened.
Transparency.
There is no hidden reasoning loop. Each judgment call is explicit, scoped, and visible in the execution flow. Instead of “the AI just did that,” you can trace how a specific decision influenced what happened next.
Control.
The program determines when the model is consulted and what authority it has at each point. Human approval gates can be inserted where consequences warrant it, without redesigning the system. The AI proposes; the system decides whether and how that proposal is acted on.
Efficiency.
Because each model call is scoped to a single decision, prompts stay small and focused. There’s no need to carry accumulated context forward just to preserve state. You spend tokens where judgment is required, not to maintain an ever-growing prompt.
Composability.
Simple, auditable decisions can be chained into complex workflows without losing clarity. As systems grow, each decision remains isolated and inspectable, rather than disappearing into a single opaque reasoning process.
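Concretely, the decision artifact mentioned under auditability can be pictured as a plain data structure. The field names here are illustrative, not a schema I’m prescribing:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DecisionRecord:
    """One captured consultation; field names are illustrative."""
    decision_point: str   # stable identifier for the gate, e.g. "email.tone_check"
    timestamp: datetime   # when the consultation happened
    prompt: str           # exactly what the model was asked
    decision: bool        # the structured answer the program branched on
    rationale: str        # the model's stated reasoning, stored verbatim
```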
The Proof
I’ve built this.
Not a whitepaper. Not a theory. A working system.
This is a program evaluating whether an email draft is professional enough to send to clients. You can see:
The exact prompt the model received
The decision it returned
The rationale returned for that decision
What the program did next (rewrote the email, requested human approval)
Every step is clearly visible. Every decision is traceable, auditable. If something goes wrong, you know exactly where and why.
Beyond Static Workflows
Decision gates solve the basic transparency problem, but they expose a practical limitation. A workflow can only handle the situations it was explicitly designed for. When execution reaches an edge case the system doesn’t recognize, the traditional answer is failure — followed by human intervention, redesign, and redeployment.
What if there were another option?
Instead of letting the system break, the model can be asked to propose a structured set of next steps under explicit conditions. These proposals are produced as visible, inspectable instructions and are treated as data — not execution.
Before anything runs, the proposal is checked for execution compatibility. In some cases, it may also require explicit human approval. Validation here refers to execution fit, not correctness or intent. The system enforces what can run, not whether a proposed step is a good idea.
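As a sketch of what “proposals as data” can look like, with an illustrative action allow-list and schema rather than the actual validation logic:

```python
# Proposals are data first. The action allow-list and schema are illustrative.
ALLOWED_ACTIONS = {"rewrite_draft", "request_human_approval", "send_email"}

def is_executable(proposal: list[dict]) -> bool:
    """Execution fit only: can the program run these steps at all?
    This says nothing about whether the steps are a good idea."""
    return all(
        isinstance(step, dict)
        and step.get("action") in ALLOWED_ACTIONS
        and isinstance(step.get("args"), dict)
        for step in proposal
    )

# A proposal the model might return when the workflow hits an unrecognized case.
proposal = [
    {"action": "request_human_approval", "args": {"reason": "unfamiliar client domain"}},
    {"action": "send_email", "args": {"after_approval": True}},
]

if is_executable(proposal):
    print("proposal accepted for review")   # existing handlers (and humans) take it from here
else:
    print("proposal rejected: not executable")   # it never becomes execution
```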
This does not give the model authority over execution. It gives it a constrained way to respond when the existing workflow reaches its limits. The program remains in control of what executes, when it executes, and under what conditions.
Seen this way, the system isn’t self-modifying. It’s adaptive within a defined execution model. The AI doesn’t rewrite itself — it proposes patches. You can inspect them, approve them, reject them, or ignore them entirely.
This layer builds on decision gates rather than replacing them. It allows workflows to extend themselves transparently when needed, without turning execution into an opaque reasoning loop.
I’ve built this too.
Why This Matters Now
Much of the current AI safety conversation focuses on alignment — making sure models are trained to want the right things. That work is essential, and it should continue. In fact, architectures like this make alignment more important, not less.
But alignment alone doesn’t answer a simpler, operational question:
When an AI system takes an action, how do you know what actually happened?
As systems grow more capable and more autonomous, the gap between intent and execution widens. Longer context windows don’t solve this. They make it harder to see where decisions came from and why they were made. The black box just gets bigger.
For systems operating in real-world environments — finance, healthcare, infrastructure, regulated enterprises — that’s not acceptable. “Trust the model” is not a strategy. Accountability requires the ability to inspect decisions after the fact and intervene before they compound.
This work argues for a complementary foundation. One where execution remains explicit, decisions are surfaced as artifacts, and adaptation happens transparently rather than inside opaque reasoning loops. Alignment determines what judgments are made; architecture determines how those judgments affect the world.
If AI agents are going to take on more responsibility, the next step isn’t more autonomy. It’s clearer boundaries between judgment, execution, and authority.
This is my proposal for what that step could look like.
What This Isn’t
Let me be clear about what I’m not describing:
Not a chatbot. This isn’t conversation. It’s execution.
Not prompt engineering. The architecture is different, not just the prompts.
Not an agent swarm. One program, with AI as a component — not dozens of models talking to each other.
Not chain-of-thought. Reasoning is surfaced as explicit artifacts, not inferred from an internal prompt alone.
Not uncontrolled autonomy. Authority is enforced by the execution model, not delegated to the AI.
It’s a different way of thinking about what AI agents can be.
What’s Next
I’ve been building this for a while. The decision gate layer works. The controlled evolution layer works. There’s more beyond that I haven’t discussed here.
Now I’m interested in the right conversations — with people thinking seriously about AI transparency, auditability, safety, and what comes after the current hype cycle.
If that’s you, I’d like to hear from you. jon (dot) macpherson -at- gmail {dot} com