Stencil · Vol. 1 · No. 1 · May 2026
Essay · paradigm note
Toward a Simulation Organism
A paradigm in which the agent and the solver are one adaptive system, not two systems with an interface between them.
An open journal of physics simulation · pouriamistani.com/blog/toward-a-simulation-ai-organism · doi.org/10.5281/zenodo.20331251
The premise
For three decades, our community has built simulations the way watchmakers build watches — meticulously, by hand, one gear at a time. Numerical schemes are chosen by experts. Convergence is verified case by case. Calibration to experiment is patient work, performed by graduate students over years. AI has entered this picture mostly as a faster surrogate: a neural network trained on solver outputs, deployed where the solver itself was too slow.
This is a useful but limited posture. It treats the AI as a replacement for the inner loop while leaving the architecture of the simulation — the discretization, the operator splitting, the boundary treatment, the linear solver, the time integrator, the whole edifice of choices that determine what the simulation means — untouched. The AI becomes a tool inside a craftsman's workshop. Powerful, but the craftsman is still us.
What if the craftsman is also AI?
What I would like to sketch in this note is a different posture, one I have come to think of as a Simulation Organism. The metaphor is deliberate: in this paradigm the agent and the solver are not two systems with an interface between them, but one adaptive system — the agent embedded in the building blocks of the simulation, modifying numerical schemes, selecting discretizations, formulating hypotheses about missing physics, and revising the solver itself in response to every disagreement between its predictions and the world.
I do not claim this paradigm is solved. I claim it is becoming possible — and that the recipe is concrete enough to begin building.
What it is, concretely
A simulation, viewed strictly, is a piece of software that maps initial and boundary conditions plus a set of physics assumptions to a predicted state of the world. Every solver is a chain of choices: which conservation laws are enforced, which terms are dropped, which mesh is laid down, which kernel is invoked, which preconditioner is preferred, which time scheme is used, which stopping criterion ends the iteration. Each choice exists for a reason, but most of those reasons live in the solver author's head.
A Simulation Organism is a system in which a general-reasoning agent — at the time of writing, a model like Claude Opus 4.7 — operates at a meta-layer above these choices, inside a sandboxed simulation framework, with the ability to:
- Run simulation experiments — small ones first, then progressively larger — using the solver's existing building blocks.
- Compare the output against benchmarks, analytic solutions, and where available, physical measurement.
- Diagnose discrepancies — not as numerical errors to be silenced, but as evidence to be explained. A gap is a clue.
- Hypothesize the cause of the gap — a missing term, an under-resolved scale, a wrong discretization, an unmodeled coupling.
- Verify the hypothesis by modifying the solver, rerunning, and re-comparing.
- Generalize the lesson across experiments, accumulating what I will call iron laws of solver design — invariants that hold across a class of problems and that future versions of the agent deploy without re-deriving.
This is, structurally, the scientific method, instantiated inside the solver. The novelty is not the method but the substrate. The agent does not write papers; it writes solvers. Its hypotheses are tested not against the literature, but against the same physical world the solver is built to describe.
Why simulation, specifically
It is fashionable to say that AI agents will accelerate "science." The phrase is too vague. Different scientific tasks admit very different kinds of automation, and simulation is unusually well-suited for the kind of agent I am describing — for three reasons.
First, simulation is constrained. The agent cannot drift into fantasy. Numerical schemes have provable convergence properties. Conservation laws are absolute. Mathematical analysis bounds error. Hardware imposes wall-clock and memory ceilings. These are not obstacles to the agent's creativity — they are precisely what makes the agent's creativity tractable. An agent free to "discover" new physics with no constraints will hallucinate. An agent constrained to obey mass conservation, the second law of thermodynamics, and the CFL condition will only propose things that could plausibly be right.
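Here is a minimal sketch of what "constrained" means in practice. Before any compute is spent, a proposed time step can be screened against the CFL condition, C = |u|max Δt / Δx ≤ Cmax. The sketch assumes an explicit scheme on a uniform 1D grid with Cmax = 1.0, which suits first-order upwind advection; other schemes have other limits. The function names are hypothetical.

```python
def courant_number(max_speed: float, dt: float, dx: float) -> float:
    """Courant number C = |u|max * dt / dx on a uniform 1D grid."""
    if dt <= 0 or dx <= 0:
        raise ValueError("dt and dx must be positive")
    return abs(max_speed) * dt / dx


def screen_time_step(max_speed: float, dt: float, dx: float,
                     c_max: float = 1.0) -> tuple[bool, float]:
    """Return (admissible?, largest admissible dt) for a proposed explicit step.

    c_max = 1.0 is an assumption suited to first-order upwind advection;
    other explicit schemes have different stability limits.
    """
    admissible = courant_number(max_speed, dt, dx) <= c_max
    dt_limit = c_max * dx / abs(max_speed) if max_speed != 0 else float("inf")
    return admissible, dt_limit


# A proposal of dt = 2e-3 on dx = 1e-3 with |u| = 1 has C = 2 and is rejected.
ok, dt_limit = screen_time_step(max_speed=1.0, dt=2e-3, dx=1e-3)
print(ok, dt_limit)  # False 0.001
```

The check returns the largest admissible step along with the verdict. A rejection therefore tells the agent how to repair its proposal, not only that it failed.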
Second, simulation has unambiguous feedback. Compare to literature-based reasoning agents, where success is judged by whether a generated paper is plausible. In simulation, the loss function is the gap between prediction and measurement. There is no ambiguity. When the gap closes, the agent has learned something real about the world.
Third, simulation has compositional structure. A solver for Maxwell's equations and a solver for the Navier–Stokes equations share kernels — sparse linear algebra, mesh handling, time integration, parallel decomposition. Iron laws discovered in one physics pillar transfer, with modification, to the next. The agent's accumulated skill therefore compounds across pillars — electromagnetism, thermodynamics, structural mechanics, fluid dynamics, radiation transport — rather than starting from zero in each domain.
These three properties — constraint, unambiguous feedback, compositional structure — are what make compute the engine of progress here in a way that literature search alone is not.
The learning loop, in more detail
Imagine the agent equipped with a library of solver building blocks — call them atoms: discretization schemes, time integrators, preconditioners, mesh refinement strategies, model closures. The agent's job is to assemble these atoms into a solver appropriate to a task, run the task, and improve the assembly.
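Here is one way the atom library could be made concrete, as a sketch. Each atom is a versioned factory tagged with the role it fills, and assembling a solver means choosing one atom per role. The Atom and AtomLibrary names, the role strings, and the two time integrators registered below are illustrative assumptions, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Atom:
    """One solver building block (hypothetical schema)."""
    role: str                      # e.g. "time_integrator", "preconditioner"
    name: str                      # e.g. "heun", "jacobi"
    version: str                   # pinned so every experiment is reproducible
    build: Callable[[], Callable]  # factory returning the configured component


@dataclass
class AtomLibrary:
    atoms: dict[tuple[str, str], Atom] = field(default_factory=dict)

    def register(self, atom: Atom) -> None:
        self.atoms[(atom.role, atom.name)] = atom

    def options(self, role: str) -> list[str]:
        """All atom names available for a given role."""
        return sorted(name for (r, name) in self.atoms if r == role)

    def assemble(self, choices: dict[str, str]) -> dict[str, Callable]:
        """Build one component per role from a {role: atom name} mapping."""
        return {role: self.atoms[(role, name)].build() for role, name in choices.items()}


lib = AtomLibrary()
# Two interchangeable time integrators for dy/dt = f(t, y).
lib.register(Atom("time_integrator", "forward_euler", "1.0.0",
                  lambda: (lambda f, y, t, dt: y + dt * f(t, y))))
lib.register(Atom("time_integrator", "heun", "1.0.0",
                  lambda: (lambda f, y, t, dt: y + 0.5 * dt * (f(t, y) + f(t + dt, y + dt * f(t, y))))))

solver = lib.assemble({"time_integrator": "heun"})
print(lib.options("time_integrator"))  # ['forward_euler', 'heun']
```

Keying atoms by (role, name) keeps alternatives for the same role interchangeable. Revising one component while holding the others fixed then becomes a one-entry change to the choices mapping.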
A single iteration looks roughly like this. The agent reads the problem specification — a geometry, a set of boundary conditions, a target quantity of interest. It selects an initial solver architecture from its library. It runs a small problem. It compares to a reference: an analytic solution where one exists, a high-fidelity benchmark where one doesn't, or, in industrial settings, a measurement from a physical experiment. It examines the residual. It localizes the failure: is the error concentrated near boundaries (a discretization issue), at high frequencies (an under-resolution issue), in transient features (a time-scheme issue), or in long-time averages (a closure issue)? It revises one component of the solver, holds the others fixed, and reruns.
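The localization step is the part of this iteration most amenable to automation. The sketch below covers the two spatial checks from that list on a 1D grid:

- whether the error energy is over-represented in the boundary bands;
- whether most of its spectral energy sits in the upper half of the resolved wavenumbers.

The thresholds are arbitrary assumptions. The transient and long-time-average checks would need time histories and are omitted here.

```python
import numpy as np


def localize_error(u_num, u_ref, boundary_width: int = 4,
                   boundary_ratio: float = 3.0, hf_share_min: float = 0.5) -> list[str]:
    """Label where a 1D spatial error is concentrated (illustrative thresholds)."""
    err = np.asarray(u_num, dtype=float) - np.asarray(u_ref, dtype=float)
    n = err.size
    if n <= 2 * boundary_width:
        raise ValueError("grid too small for the chosen boundary_width")
    energy = float(np.sum(err**2))
    if energy == 0.0:
        return []
    labels = []

    # Boundary check: the edge bands' share of error energy vs. their share of points.
    edges = np.concatenate([err[:boundary_width], err[-boundary_width:]])
    if np.sum(edges**2) / energy > boundary_ratio * (2 * boundary_width / n):
        labels.append("boundary: suspect discretization or boundary treatment")

    # Frequency check: share of spectral energy in the upper half of resolved wavenumbers.
    spectrum = np.abs(np.fft.rfft(err)) ** 2
    if np.sum(spectrum[spectrum.size // 2:]) / np.sum(spectrum) > hf_share_min:
        labels.append("high-frequency: suspect under-resolution")
    return labels


x = np.linspace(0.0, 1.0, 200)
u_ref = np.sin(np.pi * x)
u_num = u_ref + 0.01 * np.exp(-x / 0.01)  # error confined to a thin layer at x = 0
print(localize_error(u_num, u_ref))
```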
Over many such iterations, across many problems, the agent accumulates skills. A skill, in this framing, is a learned heuristic: a rule for which atom to use under which conditions, validated by accumulated evidence. Crucially, the agent is also continuously revising its own skills. When two problems disagree about which scheme is best, the agent looks for the deeper structure that explains both. The skills become more general over time. This is what I mean by self-adaptive: the agent is not adapting only the solver; it is adapting the part of itself that builds the solver.
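A skill in this sense can be as simple as a rule plus its evidence. The sketch below uses a hypothetical schema and arbitrary thresholds. It does three things:

- keeps win and trial counts for one rule;
- smooths them so a single lucky run is not mistaken for certainty;
- flags the rule for revision when the evidence turns mixed.

That last point is the moment the agent should look for a finer condition that explains both outcomes.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    """A learned heuristic: under `condition`, prefer `atom` (hypothetical schema)."""
    condition: str   # e.g. "advection-dominated transport"
    atom: str        # e.g. "upwind_flux"
    wins: int = 0    # experiments where the preferred atom beat the alternatives
    trials: int = 0  # head-to-head experiments run under this condition

    def record(self, atom_won: bool) -> None:
        self.trials += 1
        self.wins += int(atom_won)

    def win_rate(self) -> float:
        # Laplace smoothing: one lucky run should not read as certainty.
        return (self.wins + 1) / (self.trials + 2)

    def status(self, min_trials: int = 20, promote_at: float = 0.9,
               revise_below: float = 0.6) -> str:
        if self.trials < min_trials:
            return "tentative"
        rate = self.win_rate()
        if rate >= promote_at:
            return "iron-law candidate"
        if rate < revise_below:
            # Mixed evidence: look for a finer condition that explains both outcomes.
            return "needs revision"
        return "heuristic"


skill = Skill("advection-dominated transport", "upwind_flux")
for won in [True] * 19 + [False]:
    skill.record(won)
print(skill.status())  # iron-law candidate
```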
The output of this process — the durable artifact that survives across sessions — is a growing corpus of these iron laws, encoded perhaps as a Claude skill or plugin, that any future agent inherits.
The missing-physics signal
Every industrial simulation project I have worked on has had the same shape. The team builds a model. The model gets close to the measurement. The last 10–20% of the gap is attributed, vaguely, to "missing physics." The team gives up before closing it, because closing it would require domain insight that no individual on the team possesses.
A Simulation Organism running across many such projects would see thousands of these gaps. It would notice that gaps of a certain kind tend to appear in problems of a certain class — say, electromagnetic simulations near deep sub-wavelength features, or fluid simulations near liquid–vapor interfaces. It would propose candidate physics terms whose inclusion closes the gap. It would test these candidates across the full corpus of past experiments and verify whether the new term improves agreement everywhere or only locally.
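The "everywhere or only locally" test amounts to bookkeeping over the corpus. This is a minimal sketch. It assumes each past experiment can be rerun with and without the candidate term and summarized as an error against its reference. The tuple format and the problem-class labels are hypothetical.

```python
from collections import defaultdict


def assess_candidate_term(results, tol=0.0):
    """Classify a candidate physics term by where it changes agreement.

    `results` holds one (problem_class, error_without, error_with) tuple per past
    experiment rerun with and without the term; `tol` is the noise floor below
    which a change in error is treated as no change.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # per class: improved, unchanged, worsened
    for problem_class, before, after in results:
        delta = after - before
        counts[problem_class][0 if delta < -tol else 2 if delta > tol else 1] += 1

    helped = sorted(c for c, (imp, _, worse) in counts.items() if imp and not worse)
    hurt = sorted(c for c, (_, _, worse) in counts.items() if worse)
    if hurt:
        verdict = "hurts some classes: reject, or restrict to a regime"
    elif helped and len(helped) == len(counts):
        verdict = "improves agreement everywhere"
    elif helped:
        verdict = "improves agreement locally"
    else:
        verdict = "no measurable effect"
    return verdict, helped, hurt


corpus = [
    ("liquid-vapor interface", 0.20, 0.05),
    ("liquid-vapor interface", 0.18, 0.06),
    ("single-phase bulk", 0.02, 0.02),
]
verdict, helped, hurt = assess_candidate_term(corpus, tol=1e-3)
print(verdict)  # improves agreement locally
```

A term that helps one class and hurts another is not necessarily wrong. It may hold only in a particular regime, and that is itself a hypothesis worth recording.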
This is, in effect, scientific discovery — modest, incremental, anchored in measurement, with the agent doing the bookkeeping that no human team can do at scale. It is not the discovery of new laws of nature in the dramatic sense. It is the discovery of the terms the modeling community has been quietly dropping for years.
Comparison with the state of the art
Several groups in Silicon Valley and beyond are exploring related but distinct forms of autonomous research agents. The contrast is worth drawing carefully, because the boundaries between these efforts will define the next few years.
Sakana AI's "AI Scientist" generates research ideas, writes code for ML experiments, runs them, and drafts papers. The substrate is machine-learning research itself; the loss function is paper quality. The work is impressive but constrained to a domain where "correct" is a soft judgment.
FutureHouse has built agents (Crow, Falcon, Owl, Phoenix) that reason from the scientific literature, mostly in biology, integrating with wet-lab pipelines. The agents' grounding signal is experimental results, but the inner loop is text: reading papers, proposing hypotheses, designing experiments. Powerful, but not embedded in the solver of a numerical simulation.
DeepMind's GNoME screened materials at unprecedented scale, identifying stable crystals; AlphaProof and AlphaGeometry approach mathematical reasoning. Both impressive, both constrained to a single narrow domain, both more "discovery by exhaustive search" than "discovery by mechanistic revision."
Cursor, Devin (Cognition), and Claude Code are general-purpose coding agents. They can edit a solver, but they have no privileged knowledge of physics, no built-in comparison against measurement, and no concept of an iron law. They are the foundation on which what I'm describing could be built, not the thing itself.
What I am proposing is none of these in isolation. It is an agent whose substrate is the solver itself, whose constraint is mathematical correctness and physical conservation, whose feedback signal is the residual against experiment, and whose accumulated knowledge crosses the pillars of classical physics. The right comparison is not to any existing system but to a future composition: a general-reasoning model on top of a domain-specific skill scaffold, sitting above a sandboxed simulation framework that lets it act.
A recipe for builders
The point of this note is not to claim that the Simulation Organism exists. It is to propose, as concretely as I can, a recipe. If I were building it from scratch today, I would proceed roughly as follows.
- Choose one physics pillar to start. Electromagnetics, fluid dynamics, structural mechanics — whichever is closest to your team's expertise. The agent needs to begin in a domain where its supervisors can audit its decisions.
- Curate the atom library. Identify the 20–30 solver building blocks that recur across problems in this pillar — discretization, time stepping, linear solvers, mesh adaptation, coupling. Express each as a versioned, testable component.
- Build the comparison harness. Assemble a battery of problems with known answers — analytic, benchmark, and experimental. Make the comparison automatic: pass/fail with error metrics, not subjective inspection.
- Wrap a general-reasoning agent around the harness. At the time of writing, this means Claude Opus 4.7 or a peer. Give it tools to assemble atoms, run problems, compare outputs, and edit atoms.
- Encode the meta-layer as a skill. The agent's accumulated knowledge — its iron laws, its diagnosis rules, its preferred atom combinations — should be persisted as a structured artifact that future agent sessions inherit. A Claude skill or plugin is one obvious encoding.
- Add the missing-physics signal. Once the agent reliably closes well-posed problems, point it at problems where the gap to experiment is known to be open. The hypotheses it generates about missing terms are the most scientifically valuable output of the system.
- Generalize across pillars. Identify the atoms that transfer (sparse linear algebra, parallel decomposition, mesh handling) and those that do not (domain-specific closures, boundary conditions). The skill scaffold should reflect this split.
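The comparison harness in the list above can be made concrete with a minimal sketch of an automatic pass/fail check for a single atom. A derivative operator is applied to sin(x) on a periodic grid and compared with the analytic answer cos(x). It is then judged on two things: its finest-grid error, and its observed order of convergence across a refinement sequence. The tolerances and the choice of test function are assumptions. The point is that the verdict is a number, not an inspection.

```python
import math

import numpy as np


def central_diff(u: np.ndarray, dx: float) -> np.ndarray:
    """Second-order central difference on a periodic grid: the atom under test."""
    return (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * dx)


def l2_error(derivative, n: int) -> float:
    """RMS error of `derivative` applied to sin(x) on [0, 2*pi), against cos(x)."""
    x = np.linspace(0.0, 2.0 * math.pi, n, endpoint=False)
    err = derivative(np.sin(x), 2.0 * math.pi / n) - np.cos(x)
    return float(np.sqrt(np.mean(err**2)))


def harness(derivative, resolutions=(32, 64, 128, 256), expected_order=2.0,
            order_slack=0.1, tol=1e-3) -> dict:
    """Automatic verdict: finest-grid error under `tol` AND observed order near expected."""
    errors = [l2_error(derivative, n) for n in resolutions]
    # Observed order between successive grids: p = log(e_i / e_{i+1}) / log(n_{i+1} / n_i).
    orders = [math.log(errors[i] / errors[i + 1]) / math.log(resolutions[i + 1] / resolutions[i])
              for i in range(len(resolutions) - 1)]
    passed = errors[-1] < tol and min(orders) >= expected_order - order_slack
    return {"l2_errors": errors, "observed_orders": orders, "passed": passed}


report = harness(central_diff)
print(report["passed"], [round(p, 2) for p in report["observed_orders"]])
```

Checking the observed order as well as the error magnitude matters. A scheme can meet a loose tolerance on a fine grid while converging at the wrong rate, which usually signals a bug or a mis-specified boundary treatment.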
This is a multi-year program for a small team, not a weekend project. But none of the steps require unknown research breakthroughs. They require careful engineering, taste, and a willingness to let the agent fail in instructive ways for a long time before it begins to succeed.
A Claude Code plugin, sketched
The recipe above is platform-agnostic, but Claude Code already exposes the four primitives this kind of system needs:

- skills, for persistent curated knowledge;
- slash commands, for user-facing shortcuts into the workflow;
- sub-agents, for specialization;
- MCP servers, for tools that act on real infrastructure.

Hooks, which run commands when tools are called, tie the four together. A natural way to realize the Organism is therefore as a single Claude Code plugin, call it simulation-organism, that packages all four along with their hooks.
The plugin's skills are the durable artifact. An iron-laws.md accumulates atom-selection rules and diagnosis heuristics. A missing-physics.md holds the agent's growing priors on candidate physics terms. A pillar file per physics domain (pillar-em.md, pillar-fluids.md, pillar-thermo.md, …) carries the rules that are specific to a pillar rather than universal.

The slash commands (/run-experiment, /diagnose-residual, /propose-atom, /persist-law, /transfer-skill) are the entry points an engineer uses to step through the loop manually when the agent's autonomous progress needs steering.

The sub-agents are specialists, each loaded with the relevant skill and matched to the kind of reasoning its role demands. Examples include a fast validator that checks the math and convergence behavior of a proposed change, and a pillar-EM specialist that runs on the strongest available reasoner.
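As a sketch of how an iron law might be persisted in iron-laws.md: each entry carries an identifier, its pillar, the rule, and its evidence counts. A new entry is written only if its identifier is not already present. The schema, the identifier format, and the markdown layout are assumptions for illustration. The example rule (prefer upwind bias when the cell Péclet number exceeds 2) is a standard textbook heuristic for convection-diffusion, used here only as sample content.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class IronLaw:
    """One entry in iron-laws.md (hypothetical schema)."""
    law_id: str               # stable identifier, e.g. "FL-001"
    pillar: str               # "em", "fluids", "thermo", ... or "universal"
    rule: str                 # the atom-selection or diagnosis rule in plain language
    evidence: int             # experiments supporting the rule
    counterexamples: int = 0  # experiments contradicting it

    def to_markdown(self) -> str:
        return (f"## {self.law_id} ({self.pillar})\n"
                f"- Rule: {self.rule}\n"
                f"- Evidence: {self.evidence} supporting, {self.counterexamples} against\n")


def append_law(existing: str, law: IronLaw) -> str:
    """Return the skill-file text with `law` appended, unless its id is already present."""
    if f"## {law.law_id} (" in existing:
        return existing
    separator = "\n" if existing else ""
    return existing + separator + law.to_markdown()


def persist_law(path: Path, law: IronLaw) -> bool:
    """Write `law` to the skill file; return False if it was already recorded."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    updated = append_law(text, law)
    if updated == text:
        return False
    path.write_text(updated, encoding="utf-8")
    return True


law = IronLaw("FL-001", "fluids",
              "Prefer an upwind-biased convective flux when the cell Peclet number exceeds 2.",
              evidence=14)
print(append_law("", law))
```

Keeping the text transformation (append_law) separate from the file write makes the format testable on its own. Refusing duplicate identifiers keeps the file append-only in spirit: a law is revised by adding a new entry, not by silently overwriting an old one.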
The MCP server layer is where the plugin touches reality. Tools here are deliberately narrow and composable:

- an atom-library server exposes the solver building blocks as callable units;
- a harness server runs the comparison against analytic, benchmark, or experimental references;
- an hpc-runner server submits jobs and reports back;
- a residual-analyzer server localizes failure modes by error band, frequency content, and region of interest.

Hooks close the loop. A PreToolUse hook validates a proposed solver configuration before it consumes HPC budget. A PostToolUse hook appends every comparison to the corpus. An on-residual-drop hook promotes a successful diagnosis into a new entry in iron-laws.md. Claude Code has no built-in residual event, so this last one would in practice be a PostToolUse hook on the harness that runs its promotion logic only when the reported residual drops.

The corpus itself (an append-only set of experiment logs, residual histories, and missing-physics priors) is the persistent state that survives across sessions and across model upgrades.
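Hooks are where an automated check can stop a bad run before it costs anything, so here is a sketch of one hook script handling both events. It assumes Claude Code's command-hook contract: the event arrives as JSON on stdin, and exit code 2 from a PreToolUse hook blocks the call and returns stderr to the model. The following are hypothetical choices for this plugin:

- the tool_input fields (atoms, node_hours);
- the node-hour budget;
- the corpus path;
- the script's pre/post arguments.

```python
#!/usr/bin/env python3
"""Hypothetical PreToolUse/PostToolUse hook for a simulation-organism plugin.

Assumes Claude Code's command-hook contract: the event arrives as JSON on stdin,
and exit code 2 from a PreToolUse hook blocks the call and returns stderr to the
model. The tool_input fields, budget, and corpus path are illustrative assumptions.
"""
import json
import sys
from pathlib import Path

NODE_HOUR_BUDGET = 50.0                        # hypothetical per-job ceiling
CORPUS = Path("corpus") / "comparisons.jsonl"  # hypothetical append-only corpus


def check_config(tool_input: dict) -> list[str]:
    """Reasons to reject a proposed solver configuration (empty list means accept)."""
    problems = []
    atoms = tool_input.get("atoms")
    if not isinstance(atoms, list) or not atoms:
        problems.append("config must list the atoms it assembles")
    else:
        unpinned = [a for a in atoms if not (isinstance(a, str) and "@" in a)]
        if unpinned:
            problems.append(f"atoms must be pinned as name@version: {unpinned}")
    try:
        hours = float(tool_input.get("node_hours", 0.0))
    except (TypeError, ValueError):
        problems.append("node_hours must be numeric")
    else:
        if hours > NODE_HOUR_BUDGET:
            problems.append(f"node_hours {hours} exceeds budget {NODE_HOUR_BUDGET}")
    return problems


def pre_tool_use(event: dict) -> tuple[int, str]:
    """Gate an HPC submission: exit code 2 blocks it, 0 lets it through."""
    problems = check_config(event.get("tool_input") or {})
    return (2, "; ".join(problems)) if problems else (0, "")


def post_tool_use(event: dict, corpus: Path = CORPUS) -> tuple[int, str]:
    """Append one harness comparison to the corpus as a JSON line."""
    corpus.parent.mkdir(parents=True, exist_ok=True)
    keys = ("session_id", "tool_name", "tool_input", "tool_response")
    with corpus.open("a", encoding="utf-8") as f:
        f.write(json.dumps({k: event.get(k) for k in keys}) + "\n")
    return 0, ""


HANDLERS = {"pre": pre_tool_use, "post": post_tool_use}

if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in HANDLERS:
    code, message = HANDLERS[sys.argv[1]](json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

In the plugin's hooks configuration, two entries would wire this up. MCP tools are exposed as mcp__<server>__<tool>, so a PreToolUse entry would match the hpc-runner tools (for example, a matcher such as mcp__hpc-runner__.*) and run the script with the argument pre. A PostToolUse entry matching the harness tools would run it with post. Keeping the checks in plain functions makes them testable without Claude Code in the loop.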
None of this requires a new agent framework. It is the existing Claude Code primitives, arranged with intent.
A direction
I am not making the claim that this paradigm is the future of scientific computing. I am making the weaker claim that it is a future worth building toward, that it is now technically tractable in a way it wasn't even last year, and that the path from "AI helps simulate" to "AI co-designs the simulation, organism-style" runs through the seven steps above.
The history of computational science is the history of moving authority from the human to the machine — from analog computers to digital, from hand-tuned schemes to adaptive ones, from single-physics to multi-physics, from single-node to exascale. Each step felt premature when it began. This is, I think, the next step. Not the agent that replaces the physicist, but the organism in which the physicist's craft is encoded and gradually improved by the agent itself.
If you are building this, I would like to hear from you — p.a.mistani@gmail.com.