A Program, a Polynomial, and a Tree
Three substrates. Same underlying computation. The last two posts circled this without quite saying it out loud; this one says it.
The compiled transformer
LAC is shorthand for “llm-as-computer”: a transformer whose weights are written by hand, not trained, and which executes programs the way a CPU does.
The architecture uses the transformer ingredients — attention, feed-forward, residual stream — but nothing is learned. The weights are derived from an instruction set. Input is a program in stack-machine bytecode (PUSH, POP, DUP, ADD, MUL, jumps). Output is what the program computes.
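For concreteness, a tiny program in that style might look like this; the mnemonics are illustrative, not necessarily the repo’s exact assembler syntax:

```
PUSH x0    # push the first input onto the stack
PUSH x1    # push the second input
MUL        # pop both, push x0 * x1
PUSH x0    # push x0 again
ADD        # pop both, push x0 * x1 + x0
```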
Three pieces do the work:
- Memory is attention. Set the attention keys in a parabolic pattern, k_j = (2j, −j²), and looking up stack position j becomes ternary search on a convex hull — O(log t) instead of O(t) in the program length. This is the Percepta construction; I spent a month replicating and extending it.
- Arithmetic is feed-forward. For ADD, the weight matrix has two non-zero entries: one pulls the first stack value, the other pulls the second, and the matrix-vector product is their sum. For MUL, the bilinear form e_a @ B_MUL @ e_b returns the product of the two top-of-stack embeddings in a single rank-1 contraction. Five non-zero weights for the arithmetic, in total. (A toy sketch follows this list.)
- Control is the residual stream — a stack whose state is the embedding trajectory.
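To make the arithmetic piece concrete, here is a toy numpy sketch; the flat state vector standing in for the residual stream is an assumption for illustration, not the repo’s actual layout:

```python
import numpy as np

# Toy layout, not the repo's: slot 0 holds the top of stack (a),
# slot 1 holds the next value down (b).
state = np.zeros(4)
state[0], state[1] = 3.0, 5.0   # a = 3, b = 5

# ADD as a feed-forward read-out: two non-zero weights, one per operand.
W_add = np.zeros((1, 4))
W_add[0, 0] = 1.0               # pulls a
W_add[0, 1] = 1.0               # pulls b
print(W_add @ state)            # [8.]  == a + b

# MUL as a bilinear form e_a @ B_MUL @ e_b on the operand embeddings.
# The "embeddings" here are just one-hot value carriers for the sketch.
e_a = np.array([3.0, 0.0])
e_b = np.array([0.0, 5.0])
B_mul = np.zeros((2, 2))
B_mul[0, 1] = 1.0               # single non-zero entry couples a with b
print(e_a @ B_mul @ e_b)        # 15.0 == a * b
```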
“Transformers are universal approximators” usually means in principle, given enough data and parameters. The LAC construction replaces “in principle” with a weight-level witness. For the class of stack-machine programs, here is the compiled transformer that runs them. The weights are the algorithm.
A subclass of LAC programs — the ones that use only ADD, SUB, MUL over their inputs — computes something stronger: a polynomial in those inputs. Trace the stack forward symbolically and it collapses to a Poly object: monomials with coefficients. Fifteen programs in the current catalog collapse cleanly this way.
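A minimal sketch of what that symbolic trace can look like, assuming Poly is just a dict from exponent tuples to coefficients; the repo’s actual Poly object may be shaped differently:

```python
# A monomial is an exponent tuple (e0, e1, ...); a Poly is {monomial: coeff}.

def poly_add(p, q):
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) + c
    return out

def poly_mul(p, q):
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = tuple(a + b for a, b in zip(m1, m2))
            out[m] = out.get(m, 0) + c1 * c2
    return out

def trace(program, n_vars=2):
    """Run the bytecode forward with polynomials on the stack."""
    var = lambda i: {tuple(1 if j == i else 0 for j in range(n_vars)): 1}
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(var(args[0]))
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(poly_add(a, b))
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(poly_mul(a, b))
    return stack[-1]

# The program from above, x0 * x1 + x0, collapses to {(1, 1): 1, (1, 0): 1}.
print(trace([("PUSH", 0), ("PUSH", 1), ("MUL",), ("PUSH", 0), ("ADD",)]))
```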
This separates two questions that usually get tangled: what the architecture can express, and what gradient descent teaches it to express. Most of the time we only get to answer the second one. LAC writes the first answer out, in full, for a specific ecosystem of programs.
EML
EML stands for Exp-Minus-Log, from Odrzywolek 2026. One binary operator, defined as
eml(x, y) = exp(x) − ln(y)
together with the single constant 1. That is the entire substrate. Every elementary function — arithmetic, powers, roots, logarithms, trigonometry, exponentials — is expressible as a binary tree whose nodes are all eml and whose leaves are all 1 (or input variables, when representing a function of data).
The grammar is one line:
S → 1 | eml(S, S)
Odrzywolek’s central result is that this is Sheffer-complete for real elementary functions, the way NAND is Sheffer-complete for Boolean logic: one gate is enough. The constructions aren’t always cheap — exp(x) is depth 1, ln(x) = eml(1, eml(eml(1, x), 1)) is depth 3, and multiplication ends up at depth 8 — but they exist, and they are uniform. Every elementary function has the same shape: a binary tree of identical nodes.
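The gate is two lines of Python to check numerically; nothing here is taken from the eml-sr codebase:

```python
import math

def eml(x, y):
    """The single EML gate: exp(x) - ln(y)."""
    return math.exp(x) - math.log(y)

x = 2.7

# exp(x) at depth 1: eml(x, 1) = exp(x) - ln(1) = exp(x)
print(eml(x, 1), math.exp(x))

# ln(x) at depth 3: eml(1, eml(eml(1, x), 1))
#   eml(1, x)              = e - ln(x)
#   eml(e - ln(x), 1)      = exp(e - ln(x))
#   eml(1, exp(e - ln(x))) = e - (e - ln(x)) = ln(x)
print(eml(1, eml(eml(1, x), 1)), math.log(x))
```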
ln(x) = eml(1, eml(eml(1, x), 1)), depth 3.

Two consequences for practice. First, symbolic regression — trying to recover a formula from numerical samples — becomes a search over trees with a trivial grammar. Fixed topology, homogeneous nodes, standard optimization. Second, an EML tree is a circuit: the same topology can be realized as software, analog hardware, or anything else that can do one exp, one ln, and one subtraction per node. The interesting object isn’t a menagerie of primitives; it’s one repeated unit.
The bridge
Both substrates compute functions. For a useful overlap, they compute the same functions.
Take any LAC program from the collapsed subset. It represents a polynomial in its inputs, with real coefficients. On positive inputs, that polynomial is an elementary function, so by Odrzywolek’s completeness result it has an EML tree. The tree isn’t always small — x0 · x1 lifts to multiplication, which is depth 8 in pure EML, and summing monomials goes through addition, which is its own subtree — but it exists, and the LAC → Poly → EML chain is mechanical.
x0 · x1: each representation convertible into the others mechanically; the Poly in the middle is the pivot.

Three representations. Same function. Each convertible into the others, and the Poly view in the middle is the easy place to stand — the coefficients and exponents are directly readable, so both the LAC side (bytecode that collapses to it symbolically) and the EML side (tree that evaluates to it numerically) can be checked against the same pivot.
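Here is a sketch of what checking against the pivot can look like; the function names are mine, nothing from either repo, just a Poly evaluator and a numerical agreement test:

```python
import random

def eval_poly(poly, xs):
    """Evaluate a {exponent-tuple: coeff} polynomial at the point xs."""
    total = 0.0
    for mono, coeff in poly.items():
        term = coeff
        for x, e in zip(xs, mono):
            term *= x ** e
        total += term
    return total

def agrees(poly, candidate, n_vars=2, trials=100, tol=1e-6):
    """Numerical agreement on random positive inputs.

    `candidate` is any callable: an EML tree evaluator, the compiled
    transformer's forward pass, whatever. The Poly is the pivot.
    """
    for _ in range(trials):
        xs = [random.uniform(0.5, 2.0) for _ in range(n_vars)]
        if abs(eval_poly(poly, xs) - candidate(xs)) > tol:
            return False
    return True

# e.g. check a hand-written x0 * x1 against the pivot {(1, 1): 1}
print(agrees({(1, 1): 1.0}, lambda xs: xs[0] * xs[1]))
```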
This is the cross-check.
Symbolic regression tries to recover a formula from numerical samples. Run the current additive-composition version of eml-sr against x0 · x1 without ground truth, and it recovers an approximation with R² = 0.91 (a good numerical fit) and 0 out of 2 monomials structurally correct. The function it returns fits the data. It is not the right function. Without ground truth you’d look at the R² and ship it. With LAC handing you the polynomial from the program side, you catch the gap.
Most ML claims are checked two ways: against a benchmark (do the outputs match targets?) or against a theory bound (is the model in some class with a guarantee?). Both useful. Both leave holes. Three agreeing substrates catch a specific kind of hole — the one where the numerical fit is good but the structure is wrong — because three independent representations have to agree, not just two. This is what eml-sr issue #57 operationalizes: use the 15 LAC catalog programs as polynomials-with-known-structure, and measure how often symbolic regression finds the right tree versus a numerically plausible wrong one.
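One plausible reading of “monomials structurally correct”, not necessarily the metric issue #57 settles on: count the ground-truth monomials whose coefficients the recovered polynomial matches.

```python
def structural_score(truth, recovered, tol=1e-2):
    """Count ground-truth monomials present in `recovered` with a
    coefficient within tol. One possible structural metric; the metric
    used in eml-sr issue #57 may differ."""
    hits = 0
    for mono, coeff in truth.items():
        if abs(recovered.get(mono, 0.0) - coeff) <= tol:
            hits += 1
    return hits, len(truth)

truth     = {(1, 1): 1.0, (1, 0): 1.0}      # x0 * x1 + x0, from earlier
recovered = {(1, 0): 0.9, (0, 1): 0.95}     # an additive approximation
print(structural_score(truth, recovered))   # (0, 2): decent fit, wrong shape
```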
The envelope is narrow: polynomials in a small number of variables, positive reals for the EML leg, 15 programs in the catalog, no control flow yet. The narrowness is the point. A narrow airtight cross-check is worth more than a broad heuristic one.
That’s what LAC, Poly, and EML are for, taken together. Same computation, three different representations. When they agree, you’ve triangulated. When they disagree, you know exactly where to look.
Code: llm-as-computer, eml-sr. Previously: The Matmul Is the Polynomial · Where the Computer Meets the Calculator · Yes, LLMs Can Be Computers. Now What?