Reasoning in Code, Not Weights: Apollo-1

Summary

Apollo-1 introduces a new kind of reasoning over language: reasoning you can read, that runs exactly as written. A language model’s reasoning lives in weights: you cannot read it, and its behavior is sampled. A prompt is readable but only advisory; the model may follow it or not. Apollo-1’s reasoning is neither. It is written down in full, and it runs as written. What you read is what runs.

That reasoning is a program: a file the company authors, edits, and owns. Not behavior frozen in a vendor’s weights, not logic buried inside a vendor’s application, but the company’s own artifact: readable end to end, version-controlled, auditable, and changed when the business changes.

The principal is the company. An Apollo-1 agent talks to a user but answers to a business (booking, claims, returns, disputes, payments), where generative AI works for the user: the developer, the employee, the individual.

The mechanism is three layers, and only three. A typed symbolic language for task-oriented reasoning. Programs written in it, each program an agent. And a runtime that runs them: one frozen model that takes a program, a user’s message, and live context, and produces behavior. Language, programs, runtime. Because that one model runs any program written for it, generalizing to programs, situations, and phrasings it was never shown, it is a foundation model in the strict sense, and the agents are what it runs, not what is baked into it. This is possible because the domain is finite: the procedural structure of task-oriented dialogue is a closed, compact grammar, and a single model can cover it.

Building and changing an agent is software work, and that is the deeper point. Once reasoning is a program, it is a software artifact (versioned, diffed, reviewed, tested, composed, owned) and it improves the way software improves. Long-horizon coding agents made this practical: the program, the runtime call, and the trace all live in one medium a coding agent reads and writes, so it can author a change, run it, read the trace, and revise until the behavior holds. That loop is what turns a faithful program into a reliable one, and what closed the cost of authoring and maintaining logic this complex, the standing argument against neuro-symbolic AI at scale. Run it enough and the gains compound: a behavior made correct stays correct, so quality becomes a function of compute.

A different kind of model, for a different kind of agent. Reasoning in code, not weights; which is to say, reasoning as software.

Neuro-Symbolic AI

Neuro-symbolic AI composes neural perception with symbolic logic in a single model. Decisions are computed from typed symbolic state, not sampled from token probabilities. The architecture has been a research direction for decades, and it produced no foundation model. Classical symbolic AI tried to encode meaning into its symbols, which forced ontologies to represent the world. The world did not fit. The structures did not compose across domains, and the maintenance cost crushed every implementation. Apollo-1’s symbols are procedural, not semantic. They carry roles, relations, state transitions, and predicates over state — the grammar of task-oriented dialogue, not a description of the world. Content stays in the neural modules. The symbolic layer knows where a value sits in a program and what role it plays; it does not represent what the value means. This is the separation classical AI never made, and it is the line between a foundation model and a graveyard. Semantic common sense — that water is wet, that the dead stay dead — is about the world, and it is infinite; it is what earlier symbolic programs spent decades trying to write down while the world refused to fit. Procedural common sense — confirm before charging, identify before acting, never reverse what never happened — is not about the world at all. It is the shape of finishing a task through conversation, and it is finite. The procedural grammar can be completed; the semantic one cannot. Apollo-1 takes the first and refuses the second. This is the load-bearing claim of the architecture, so it is worth stating precisely: the procedural grammar is finite by construction. We built it, over years, by dissecting task-oriented conversations into their recurring structures. The same structures recur under different content, so generalization works on the structures while the values that fill them change turn to turn. Novel inputs are mapped onto that finite set at inference. 
When the mapping is imperfect, the result is task failure, not a silently broken rule. Apollo-1 is the first foundation model for neuro-symbolic AI: a single model, frozen, that generalizes one capability — reasoning over typed symbolic state, binding it to natural language — across the whole space of task-oriented reasoning programs written for it. A language model’s domain is text; Apollo-1’s domain is those programs. Same model, different program, different agent, and an improvement to the model propagates to every program built on it. Three architectural consequences run through the rest of the paper. Language and logic operate in one computation. The agent’s cognition is a program, separate from the model that runs it. And because the program is software, the agent improves through the loop that has driven a decade of machine learning — propose a change, validate it, keep what scores — with one addition that matters: because what scores is preserved as a test, the gains compound rather than regress.

Two Kinds of Agents

Two different systems are emerging under one word. They are not variations of the same thing. They have different principals, different jobs, and cognition in different places.

Open-ended agents work for users: coding assistants, personal AI, employee tools. The user is the principal, and flexibility is the point. Cognition lives at the model provider, in weights, and the user adapts to whatever the latest model does. The LLM is the right substrate for this.

Task-oriented agents work on behalf of companies: the agent that handles a claim, schedules a procedure, processes a return, files a dispute, authorizes a transfer, books a seat. These agents serve users, but they represent the company: the entity whose policies must be enforced, whose lawyers must approve, whose compliance team must audit, whose product team must change behavior when the business changes. For these, cognition cannot stay implicit at a provider. It has to be an artifact the company holds.

Task-oriented agents require three properties, and an architecture has to provide all three at once:
  1. Reasoning over both language and state, in one model. Users do not follow scripts. They ask unexpected questions, change their minds, wander. The agent has to reason over what they say. At the same time it must evaluate conditions against state and enforce its rules without exception: the ticket cancels only when the passenger is Business Class and Platinum Elite; the payment processes only on explicit confirmation; the refund issues only on documented eligibility. Open-ended language and formal reasoning over state have to meet inside one model, not across two systems trading messages.
  2. Mutable cognition. An airline’s cancellation policy changes; a bank’s dispute window changes; a hospital’s scheduling rules change. The agent has to move with the business, without retraining and without a model release.
  3. Intelligence in a program, not in weights. The agent has to be an artifact compliance can read, legal can sign, engineering can version. Cognition in weights cannot be read, audited, or attributed to a decision. Without an artifact to point at, behavior is no one’s responsibility, and no enterprise deploys responsibility it cannot assign.

Why Current Approaches Struggle

Two architectures dominate task-oriented AI today: orchestration frameworks and function-calling LLM agents. Both wrap an LLM in different scaffolding, and both fail the three-property test for the same structural reason: rules are not a first-class object in either. Orchestration frameworks wrap an LLM in a workflow system — state machines, routing, branching. The state machine reasons; the LLM converses; the two do not share understanding. A user is mid-payment and asks, “wait — what’s the cancellation policy before I pay?” No transition was coded for the digression, so the system breaks, gives a canned reply, or forces the user back on script. You add a branch. Then users ask about refunds mid-payment, or shipping. Real deployments accumulate hundreds of branches and still miss edge cases. Hand off to the LLM instead, and it has no model of the flow, the rules, or the accumulated state, so it may process the payment without confirmation because it is predicting a token, not reasoning from state. Conversation and reasoning end up inversely correlated: the tighter the state machine, the worse the experience; the more the LLM is trusted, the less the behavior holds. Function-calling agents take the opposite approach: give the LLM tools and let it decide when to call them. Conversation works. But the decisions are sampled from a probability distribution, not computed from state. Prompting, fine-tuning, and output filtering reduce unwanted tool calls; they do not eliminate them. The model might call the refund function without verifying documentation, skip a confirmation, or invoke a tool with the wrong parameters. Validation layers that gate a call before it executes help, but they are reactive — the agent has already decided to act — and each one is written per tool, not derived from a shared model of the domain. Both treat rules as add-ons. In orchestration, a rule is a branch — part of a flow, not part of a model. 
In function-calling, a rule is a sentence in a system prompt — advisory, soft, forgettable. Neither captures what rules actually do in task-oriented cognition. Rules are not only enforcement; they are the structural layer that lets symbolic logic and neural reasoning coexist. When a rule is a symbolic predicate the runtime evaluates, the symbolic side holds the logic absolutely while the neural side handles whatever language arrives. The rule does not constrain the conversation. It stabilizes the cognition. Without rules as structure, the model has nothing to hold, and conversation and reasoning revert to the inverse correlation neither approach can escape. Until neuro-symbolic AI, no architecture combined open-ended conversation with reliable enforcement in one model.

Origins

In 2017 we began encoding millions of real task-oriented conversations into structured data, with a workforce of 60,000 human agents. The insight was not data scale; it was what must be represented — and why it could only be learned, not written down. Task-oriented conversational AI requires two kinds of knowledge in tandem. Descriptive knowledge — entities, attributes, domain content. Procedural knowledge — roles, logic, flows, policies. Datasets are stateless; logic requires explicit state. We began building a typed symbolic language to capture these recurring structures. Why 60,000 agents and not an ontology team? Because the knowledge we needed is tacit. The procedural sense of how a skilled person actually closes a claim, or defuses a dispute, or knows when to confirm and when to refuse, is written in no rulebook; it lives only in the doing. Tacit knowledge cannot be stated, only distilled from behavior. That is the deeper reason classical AI was doomed in principle and not merely in practice — it set out to write down what can only be extracted — and it is why our path ran through millions of real conversations ranked by reputation rather than through a specification. Around 2021 the leap in language models arrived. Modern LLMs replaced the pre-transformer foundations our neural stack had been built on. Language stopped being the bottleneck, and only then did the rest of the architecture become reachable. Across every domain we tested — booking, scheduling, claims, disputes, renewals, authorizations — task-oriented dialogue followed the same procedural patterns: parameter extraction, intent identification, logic evaluation, policy enforcement, state-dependent branching. We built the typed symbolic language out across these structures, and the neuro-symbolic reasoner that computes next actions from encoded state. 
The procedural logic inside the engine was not trained into weights; it was taught — distilled over years from those conversations into symbolic structure, by dissecting them into their elements and ranking contributions through a peer-review reputation system. Augmented Intelligence — our name — is the term for the loop that produced it. The second outside contribution arrived in mid-2025: coding agents able to read, modify, and verify structured codebases end-to-end. Cognition expressed as code is the end-state of a typed symbolic language — what the language was designed to enable. The maintenance cost of authoring and evolving programs in our language had been the standing argument against neuro-symbolic AI at scale. With coding agents at production capability, that argument closed. Two things had to be true at once, and only recently were. On our side, a coding agent had to be able to drive the runtime directly: the language complete enough to express an agent in full, the CLI and API working under real version control, and the path from the files to the runtime cleared down to the program, the prompt, and the context — removing the last code between them is what brought it within reach. On the other side, coding agents had to become long-horizon, able to work a problem for hours rather than answer in one pass. The first half was ours to build; the second was not. They arrived together, which is why this is only now possible. And the fit is not luck: the reasoning is code because it was taught, not learned into weights, so advances in coding agents accrue to this architecture rather than threaten it.

Apollo-1

Apollo-1 is built on neuro-symbolic architecture. Its inputs are typed symbolic programs and natural-language messages. Its outputs are typed symbolic states and natural-language responses. Reasoning happens in one pass over a single representation: neural modules handle language and perception, symbolic modules handle state and logic, both operating together inside the same computational loop. The same computation that writes the sentence checks the rule. There is no moment at which the model could choose to break a rule, because rules are part of the computation that produces the response, not a check wrapped around it. What makes Apollo-1 distinct from any other foundation model is where its cognition lives. A language model’s cognition lives in weights — a parameter tensor whose behavior is implicit, produced by training, modifiable only by more training. Apollo-1’s cognition lives in code — a symbolic program whose behavior is explicit, produced by writing, modifiable by editing. The runtime compiles the program and executes it. The program is the agent. Two agents on Apollo-1 run the same runtime and different programs; what makes one a refund agent and another a claims agent is a file. This is what makes Apollo-1 a foundation model, and the claim rests on what it generalizes over. A language model generalizes over text; Apollo-1 generalizes over the space of task-oriented reasoning programs — one frozen model that runs any program written in its symbolic language and produces correct behavior for it, with an improvement to the model reaching every program at once.

The Symbolic Language

Apollo-1’s symbolic substrate is a typed programming language for task-oriented reasoning: a grammar that covers a finite domain — task-oriented dialogue — and a runtime that compiles programs written in it. The language has three properties:
  1. It is typed. Every entity, parameter, rule, and tool has a type the runtime checks, and a program that does not type-check does not run.
  2. It is finite in its procedural states. Across every domain we have studied, the same procedural structures appear under different content, and the grammar covers their full set.
  3. It is expressive over content. Any value can occupy any field, because the symbols describe procedural roles, not world meaning.
An Apollo-1 agent’s program is a typed JSON codebase: five files at the root — agent.aui.json, entities.aui.json, parameters.aui.json, integrations.aui.json, rules.aui.json — and a tools/ directory with one file per tool. Together they describe what the agent does, what it knows, what it must enforce, and what it can call. The program is text, readable end-to-end by anyone who can open a file.
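The layout described above can be checked mechanically. The following Python sketch is only an illustration: the file names come from the text, while the helper name `missing_program_files` is hypothetical and is not part of any Apollo-1 tooling. It verifies the presence of the five root files and the tools/ directory, and says nothing about their contents or schema.

```python
from pathlib import Path

# Root files named in the text above.
ROOT_FILES = (
    "agent.aui.json",
    "entities.aui.json",
    "parameters.aui.json",
    "integrations.aui.json",
    "rules.aui.json",
)


def missing_program_files(root: Path) -> list[str]:
    """Return the expected entries that are absent under `root`.

    Hypothetical helper for illustration; it checks presence only,
    not whether the files type-check against the language.
    """
    missing = [name for name in ROOT_FILES if not (root / name).is_file()]
    if not (root / "tools").is_dir():
        missing.append("tools/")
    return missing
```

An empty return value means the directory has the shape the text describes; whether the program actually runs is still decided by the runtime's type check.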

The Runtime

Apollo-1’s runtime is a neuro-symbolic reasoner. At inference it takes a typed symbolic program and a natural-language message and produces behavior, turn by turn, against live state. A domain-agnostic encoder parses the message into typed symbolic objects, forming the initial state. A stateful loop then iterates until the turn completes: a neuro-symbolic state machine maintains symbolic state, a symbolic reasoning engine computes the next action from that state, and a neuro-symbolic planner compiles executable plans. A domain-agnostic decoder generates language from the final state. Perception is probabilistic; action selection is not. Given the same state, the runtime makes the same decision, and every decision in a trace is reproducible from the state that produced it. End-to-end outputs are not deterministic, because perception runs through the neural modules and two phrasings of one request can form different initial states. Once a state is formed, the logic over it is fixed. Failures in perception surface as task failure, not as policy violation.
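The split between probabilistic perception and fixed logic can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `State` shape, the action names, and the keyword-based `keyword_encoder` (a stand-in for the neural encoder) are hypothetical, and the stateful loop is collapsed into a single step. The point is only that `select_action` is a pure function of state, so the same state always yields the same decision.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class State:
    """Hypothetical typed state formed from one user message."""
    intent: str
    params: tuple[tuple[str, str], ...] = ()


def select_action(state: State) -> str:
    """Logic over state: same state in, same action out."""
    params = dict(state.params)
    if state.intent != "refund":
        return "respond"
    if "order_id" not in params:
        return "ask_order_id"
    if params.get("confirmed") != "yes":
        return "ask_confirmation"
    return "issue_refund"


def keyword_encoder(message: str) -> State:
    """Toy stand-in for perception; a real encoder is neural and can misread."""
    text = message.lower()
    words = text.split()
    params = [("order_id", w[1:]) for w in words if w.startswith("#")]
    if "yes" in words:
        params.append(("confirmed", "yes"))
    intent = "refund" if "refund" in text else "other"
    return State(intent=intent, params=tuple(params))


def run_turn(message: str, encode: Callable[[str], State]) -> tuple[State, str]:
    state = encode(message)        # perception: may vary with phrasing
    action = select_action(state)  # logic: fixed once the state is formed
    return state, action
```

If the encoder misreads a message, the wrong state is formed and the task may fail, but `select_action` still cannot return `issue_refund` without an order ID and a confirmation in that state.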

Authoring

Apollo-1 does not write agents. It runs them. Programs are written by coding agents — in the CLI, by the developer’s coding agent; in the Playground, by the Agent Builder, a coding agent embedded in an authoring harness. Apollo-1 provides the language, the runtime, the schema, the templates, and the documentation. The coding agent does the writing. When the coding agent can run programs on the runtime, the target of authoring changes. It is no longer a well-typed program that expresses the policy; it is a program whose runtime behavior matches the intent. The agent writes a candidate, compiles it, runs it against scenarios, reads the trace, and revises until the observed behavior and the intended behavior agree. The scenarios need not be supplied by hand — the coding agent derives them from the program and the change it just made. The unit of work shifts from producing an artifact to converging on a behavior, and most of the work is no longer generation; it is running programs and reading traces.
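The write, run, read, revise cycle described above has the shape of a simple convergence loop. The Python sketch below is a schematic under stated assumptions: `run` stands in for a runtime call, `revise` for the coding agent's edit, and scenarios are reduced to (message, expected action) pairs. None of these names or shapes come from Apollo-1's actual CLI or API.

```python
from typing import Callable, Optional

Scenario = tuple[str, str]            # (user message, expected action); illustrative
Failure = tuple[str, str, str]        # (message, expected, observed)


def converge(
    program: dict,
    scenarios: list[Scenario],
    run: Callable[[dict, str], str],
    revise: Callable[[dict, list[Failure]], dict],
    max_rounds: int = 10,
) -> Optional[dict]:
    """Revise `program` until every scenario passes or the budget runs out."""
    for _ in range(max_rounds):
        failures: list[Failure] = []
        for message, expected in scenarios:
            observed = run(program, message)   # run the candidate
            if observed != expected:           # read the result
                failures.append((message, expected, observed))
        if not failures:
            return program                     # observed behavior matches intent
        program = revise(program, failures)    # author the next candidate
    return None                                # did not converge within budget
```

The loop returns a program only when observed and intended behavior agree on every scenario, which is the sense in which the target of authoring is a behavior and not just a well-typed artifact.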

Properties

Many properties follow from cognition being code rather than weights. Four are central: rules become a first-class object; every decision is recorded in a trace anyone can read; a faithful program can be made reliable by running it; and reasoning, being a program, becomes software in the full sense.

Rules as a First-Class Object

In Apollo-1, rules are typed symbolic predicates the runtime evaluates against state. The agent does not break a rule for the same reason a compiler does not ignore a type: the rule is part of the computation that produces the response, not a check around it. When you define your tools, Apollo-1 generates an ontology — a typed representation of your entities, parameters, and relationships, shared across all your tools — and from it you author the rules the agent must enforce. Apollo-1 expresses them as predicates that live in rules.aui.json as code: policy rules (unconditional enforcement), confirmation rules (explicit consent before execution), authentication rules (identity before execution), conditional rules (applied only when conditions hold), and sequencing rules (enforced ordering). Evaluation is deterministic. If the predicate is today − txn.date ≤ 8 days and the transaction is nine days old, the action is blocked, every time. Perception remains probabilistic; the system can misunderstand a request, but it cannot decide to skip a required step or forget a policy mid-conversation. The payoff is wider than enforcement. When rules are part of the computation, they do not constrain the conversation; they stabilize the cognition. The symbolic side holds the logic absolutely — the predicate fires or it does not — and the neural side stays free to handle whatever language arrives.
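The eight-day example can be written out as a predicate over state. This is a Python illustration of deterministic evaluation, not the syntax of rules.aui.json, which the text does not show; the `Transaction` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Transaction:
    """Hypothetical state object holding the transaction date."""
    txn_date: date


WINDOW = timedelta(days=8)  # the window from the example in the text


def within_window(today: date, txn: Transaction) -> bool:
    """The predicate: today - txn.date <= 8 days."""
    return today - txn.txn_date <= WINDOW


def gate(today: date, txn: Transaction) -> str:
    """Block the action whenever the predicate is false."""
    return "allowed" if within_window(today, txn) else "blocked"
```

With `today = date(2026, 1, 10)`, a transaction dated 1 January is nine days old and is blocked, while one dated 2 January is exactly eight days old and is allowed. The result depends only on the two dates, so it is the same on every evaluation.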

White-Box Traceability

Every turn produces a trace that records the symbolic computation in full: the intent as parsed, the entities resolved, the tools considered, the rules evaluated, the predicates that fired, the parameters extracted, the decisions made and the reasons attached. Each is a real object in the trace, addressable in code, comparable across turns. Apollo-1’s trace is the computation, recorded as it happens. The runtime cannot decide one thing and trace another, because the decision and the trace are the same object. “Why did you block that?” has a literal answer: this rule, this predicate, this state.
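To make "addressable in code" concrete, here is a minimal Python sketch of what a per-turn trace record could look like. The field names and the `why_blocked` helper are assumptions chosen to mirror the list above; the real trace format is not specified in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleEvaluation:
    """One rule as evaluated during a turn (hypothetical shape)."""
    rule_id: str
    predicate: str
    passed: bool


@dataclass(frozen=True)
class TurnTrace:
    """Hypothetical per-turn trace: parsed intent, resolved entities, rules, decision."""
    intent: str
    entities: dict
    rules: tuple[RuleEvaluation, ...]
    decision: str


def why_blocked(trace: TurnTrace) -> list[str]:
    """Answer 'why did you block that?' from the trace alone."""
    return [f"{r.rule_id}: {r.predicate}" for r in trace.rules if not r.passed]
```

Because the record holds the rules and the state they were evaluated against, the explanation is read off the trace and does not need to be generated after the fact.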

From Faithful to Reliable

Authoring produces a faithful program: it runs exactly as written. Faithful is not reliable. Reliable is faithful and written right, and for logic this complex, whether it is written right is an empirical fact, not one you can read off the page. The run-read-revise loop establishes that fact, and it needs two things: runtime access, so the agent can see what the program actually does, and a long horizon, so it can keep going until the behavior holds. Because the program, the scenario, the runtime call, and the trace are all in one medium the agent reads and writes, the loop runs without a human in it. Quality becomes something you buy with compute.
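One way to picture how gains accumulate in this loop is a ratchet: a change is kept only if it passes every scenario already in the suite plus the new one, and the new scenario then joins the suite. The Python sketch below is an illustration under assumptions; `accept_change`, the scenario shape, and the `run` callable are hypothetical and do not describe Apollo-1's actual quality gate.

```python
from typing import Callable

Scenario = tuple[str, str]  # (user message, expected action); illustrative shape


def accept_change(
    suite: list[Scenario],
    candidate: dict,
    new_scenario: Scenario,
    run: Callable[[dict, str], str],
) -> tuple[list[Scenario], bool]:
    """Keep `candidate` only if it passes all stored scenarios and the new one."""
    checks = suite + [new_scenario]
    if all(run(candidate, message) == expected for message, expected in checks):
        return checks, True   # the new scenario becomes a permanent regression test
    return suite, False       # rejected: the suite is unchanged
```

Under this scheme a candidate that fixes a new case but breaks an old one is rejected, so each additional round of compute can only grow the set of behaviors known to hold.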

Reasoning as Software

Once cognition is a program rather than a parameter tensor, it is not only readable and ownable — it is software, and it inherits the entire software lifecycle. An Apollo-1 agent is versioned and diffed; forked into a variant for a new market or a new regulator; reviewed line by line before it ships; tested against scenarios; composed from shared modules; packaged; and, in principle, open-sourced. Apollo-1 does not just let a company own its reasoning. It makes a company’s reasoning software. The breakthrough is not that reasoning is written in code. It is that reasoning becomes software.

The CLI and the Playground

Apollo-1 agents are authored in two places. The CLI is the developer surface; the Playground is the working surface for everyone touching an agent — engineers, compliance officers, operations leads, product managers, customer-experience owners. Both edit the same typed JSON codebase. Both run against the same runtime.

The CLI

An Apollo-1 agent is a typed JSON codebase, edited in Cursor, VS Code, or any editor with full schema autocomplete. Version control is git. Diffs against rules.aui.json are real diffs. Programs are written by coding agents in the developer’s terminal or IDE: the developer says what the agent should do, the coding agent generates the program against the language specification, the developer iterates. Pull requests work as pull requests work; tests against fixtures work as tests against fixtures work. An agent’s history is a commit log, its review is a code review, its rollback is a git revert.

The Playground

The Playground is the agent’s working surface. Engineers use it to inspect reasoning, watch rules fire, and iterate on a policy without leaving the browser. Stakeholders without code in their workflow use it to author and edit the program in natural language. Two surfaces sit side by side. On the right, the agent — the code at runtime; it talks and acts, and clicking any turn opens the white-box trace: initial state, execution, rule evaluation, generation. On the left, the program, addressable in three modes: Build (English, where the Agent Builder authors and modifies the program), View (the structured UI), and Code (the raw .aui.json files). In Build mode you work with the Agent Builder — a coding agent in a context-rich harness with read-write access to the program, live access to every turn’s trace, schema validation that ensures changes type-check, scenario evaluation, and a quality gate that decides what commits.

What Apollo-1 Isn’t For

Apollo-1’s architecture makes deliberate trade-offs. It optimizes for task-oriented agents and, by design, does not compete in other domains:
  • Open-ended creative work (writing, brainstorming, exploratory dialogue, where variation creates value) is better served by transformers; Apollo-1’s structures enforce consistency, and creativity often requires the opposite.
  • Code generation: Apollo-1 can integrate with code-execution tools, but its language is purpose-built for task execution, not software development.
  • Low-stakes, high-variation settings (engagement campaigns, tutoring, entertainment chatbots) are better served by probabilistic variety than by formal enforcement.

General Availability

Apollo-1 is already deployed at scale across dozens of enterprises in regulated and unregulated industries, including Fortune 500 companies, with a strategic go-to-market partnership with Google. A preview Playground is open, featuring Apollo-1 agents across HR, IT, regulated industries, retail, automotive warranties, and more — domains where we have active early deployments, simulated for preview. Each agent runs from its program alone, viewable as code or in a UI view for non-technical stakeholders, and the Agent Builder is available on every one of them. A technical paper will be released alongside GA. General availability is Q2 2026. Apollo-1 integrates with existing generative-AI workflows and adapts to any API or external system — no endpoint changes, no data preprocessing — with native connectivity to Salesforce, HubSpot, Zendesk, and others, and full MCP support. GA launches with:
  • The Conversational API — for task-oriented dialogue
  • The Playground and CLI
  • Full documentation and toolkits
Following in 2026: the Workflow Automation API, voice support, and fully local agent development in the CLI. Apollo-1 improves on three axes. Its neural modules improve with every advance in low-latency LLMs. Its symbolic language evolves as we extend its coverage of task-oriented reasoning. And as coding-agent capability advances, so does the ease of building on and evolving Apollo-1.

Conclusion

Open-ended agents work for users; their cognition lives at the model provider, and that suits the user. Apollo-1 is the foundation model the other kind of agent runs on — agents that act on behalf of the entity the user is talking to, whose cognition the company writes, owns, and changes. Because the agent is a program rather than a behavior trapped in weights, the company can read it, sign it, version it, and improve it, and Apollo-1 compiles it into reliable execution. Filing claims, opening disputes, processing returns, authorizing payments, completing bookings: these are the conversations that run the economy, and they are too consequential to leave to behavior no one can read. The deeper shift is what becomes possible once reasoning is a program. It stops being a model you rent and becomes software you own — versioned, tested, reviewed, and improved on a loop that turns compute into reliability, and surrounded, in time, by the same ecosystem every other kind of software has. A different kind of model, for a different kind of agent, on a different reasoning framework: the first an organization can not only call its own, but build on as software. Reasoning in code, not weights — reasoning, at last, as software.
Augmented Intelligence (AUI) Inc. Patents pending.
