The Matmul Is the Polynomial
Previously: the compiled-transformer executor and the EML calculator turned out to be talking about the same object. Three views of the same computation — stack program, polynomial, EML tree — mechanically convertible, numerically cross-checked over a 26-program catalog.
That post left a hole I only flagged briefly: the transformer wasn’t actually computing the polynomial. It was pretending to.
The cheat
Here’s what CompiledModel.forward was doing for ADD, SUB, MUL before last week:
nonlinear[OPCODE_IDX[OP_ADD]] = float((va + vb) & MASK32)
nonlinear[OPCODE_IDX[OP_SUB]] = float((vb - va) & MASK32)
nonlinear[OPCODE_IDX[OP_MUL]] = float((va * vb) & MASK32)
The M_top row for ADD was literally zero. CPython computed the result; the transformer routed it back into the stack. The linear path — PUSH, POP, DUP, SWAP, OVER, ROT — really was weights. The arithmetic was a Python fall-through dressed up in tensor shapes.
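For contrast, here is what "really was weights" means on the linear side: each stack op is a fixed copy/shift matrix over the flattened stack. A minimal sketch, assuming a layout of s slots of width d with the top at slot 0 (the layout is mine, not the repo's):

```python
import numpy as np

# DUP as pure weights: copy the top slot, shift everything else down one.
# Layout assumption: stack flattened to s*d floats, slot 0 on top.
s, d = 4, 8
DUP = np.zeros((s * d, s * d))
DUP[0:d, 0:d] = np.eye(d)                              # new top = old top
for i in range(s - 1):                                 # slot i shifts to slot i+1
    DUP[(i + 1) * d:(i + 2) * d, i * d:(i + 1) * d] = np.eye(d)

stack = np.random.randn(s * d)
out = DUP @ stack
assert np.allclose(out[0:d], stack[0:d])               # top preserved
assert np.allclose(out[d:2 * d], stack[0:d])           # and duplicated below it
```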
Numerically, the symbolic executor and the compiled transformer agreed. Of course they did: one was doing Poly.__add__, the other was doing Python +, and for every input we’d tested those produce the same number. Two black boxes, same outputs, no proof of anything interesting.
The fix: one formula, two interpreters
The value embedding is already scalar: E(v) = v · e_DIM_VALUE, where e_DIM_VALUE is the standard basis vector at index DIM_VALUE. So the constructions are direct:
| Op | Form | Weights |
|---|---|---|
| ADD | Linear | M_ADD[DIM_VALUE, DIM_VALUE] = 1, M_ADD[DIM_VALUE, d+DIM_VALUE] = 1 |
| SUB | Linear | M_SUB[DIM_VALUE, DIM_VALUE] = -1, M_SUB[DIM_VALUE, d+DIM_VALUE] = 1 |
| MUL | Bilinear | B_MUL[DIM_VALUE, DIM_VALUE] = 1 (rank-1 outer product) |
Five non-zero entries. The compiled parameter count moves from 964 to 967.
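To make the table concrete: a minimal sketch of those five entries, assuming a d-dimensional slot embedding with the scalar value at index DIM_VALUE (the sizes and layout are illustrative; the entries are the table's):

```python
import torch

d, DIM_VALUE = 8, 0

def embed(v: float) -> torch.Tensor:
    e = torch.zeros(d)
    e[DIM_VALUE] = v
    return e

# ADD / SUB: linear maps over the concatenated operand pair [ea; eb].
M_ADD = torch.zeros(d, 2 * d)
M_ADD[DIM_VALUE, DIM_VALUE] = 1.0
M_ADD[DIM_VALUE, d + DIM_VALUE] = 1.0

M_SUB = torch.zeros(d, 2 * d)
M_SUB[DIM_VALUE, DIM_VALUE] = -1.0        # top operand a enters negated: b - a
M_SUB[DIM_VALUE, d + DIM_VALUE] = 1.0

# MUL: rank-1 bilinear form, the outer product e_DIM_VALUE ⊗ e_DIM_VALUE.
B_MUL = torch.zeros(d, d)
B_MUL[DIM_VALUE, DIM_VALUE] = 1.0

ea, eb = embed(3.0), embed(5.0)
assert (M_ADD @ torch.cat([ea, eb]))[DIM_VALUE] == 8.0    # E(a + b)
assert (M_SUB @ torch.cat([ea, eb]))[DIM_VALUE] == 2.0    # E(b - a)
assert ea @ B_MUL @ eb == 15.0                            # a * b
```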
The payoff isn’t weight efficiency — three extra parameters isn’t the point. The payoff is that ff_symbolic.py now exports two functions over the same operator tree:
- `forward_mul(ea, eb)` — inputs are `torch.Tensor`, body is `ea @ B_MUL @ eb`. Returns `E(a·b)`.
- `symbolic_mul(pa, pb)` — inputs are `Poly`, body is `pa * pb`. Returns the polynomial.
They share a spec. B_MUL is one formula. The tensor interpreter treats it as a bilinear contraction on floats. The polynomial interpreter treats it as Poly multiplication. Choose your ring; the formula is the same.
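The shape of that pairing, as a toy: the Poly below is a stand-in for the real class in ff_symbolic.py, and only the two function names and the `ea @ B_MUL @ eb` body come from this post; everything else is illustrative.

```python
import torch
from collections import Counter

d, DIM_VALUE = 8, 0
B_MUL = torch.zeros(d, d)
B_MUL[DIM_VALUE, DIM_VALUE] = 1.0           # the one formula, shared by both

class Poly:
    """Toy polynomial: sorted variable tuples (monomials) -> integer coefficients."""
    def __init__(self, terms):
        self.terms = {m: c for m, c in dict(terms).items() if c != 0}

    @classmethod
    def var(cls, name):
        return cls({(name,): 1})

    def __add__(self, other):
        out = Counter(self.terms)
        for m, c in other.terms.items():
            out[m] += c
        return Poly(out)

    def __mul__(self, other):
        out = Counter()
        for m1, c1 in self.terms.items():
            for m2, c2 in other.terms.items():
                out[tuple(sorted(m1 + m2))] += c1 * c2
        return Poly(out)

    def __eq__(self, other):
        return self.terms == other.terms

def forward_mul(ea, eb):
    """Tensor interpreter: bilinear contraction on floats; yields the scalar a*b."""
    return ea @ B_MUL @ eb

def symbolic_mul(pa, pb):
    """Polynomial interpreter: the same spec, read in the Poly ring."""
    return pa * pb
```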
Structural, not numerical
That’s what lets us upgrade the claim. test_ff_symbolic.py::test_equivalence_structural runs all 15 catalog programs that currently collapse to straight-line traces and asserts:
assert run_symbolic(P).top == forward_symbolic(P).top
— not math.isclose, not == on integers. Poly equality. Same coefficients, same monomial basis, structurally identical expression.
For dup_add_chain_x4: nine heads, nine compositions of M_ADD, one monomial out — 16·x0. Both interpreters produce exactly that Poly. For sum_of_squares: two degree-2 monomials, x0² + x1², identical on both sides.
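Replaying those two claims with the toy Poly from the sketch above (the real test drives run_symbolic and forward_symbolic over the compiled programs; this just checks the arithmetic of the examples):

```python
# dup_add_chain_x4: each DUP; ADD doubles the top of the stack.
p = Poly.var("x0")
for _ in range(4):
    p = p + p
assert p == Poly({("x0",): 16})             # exactly the Poly 16·x0

# sum_of_squares: two degree-2 monomials, structurally distinct.
x0, x1 = Poly.var("x0"), Poly.var("x1")
assert x0 * x0 + x1 * x1 == Poly({("x0", "x0"): 1, ("x1", "x1"): 1})
```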
Before: the numbers match. After: the polynomial is what the weights compute.
What it unlocks
Three things get cheaper now.
Loops. Four of the 26 catalog programs — Fibonacci, factorial, is_even, power-of-2 — block at the first JZ/JNZ. Those want guarded traces: fork the Poly stack on each branch and carry per-case polynomials through (sketched after this list). Ugly but tractable, because the per-case content is still polynomial. Tracked in #70.
Per-branch EML. Once guarded traces land, each branch compiles to its own EML tree. The three-way cross-check — NumPy exec, Poly.eval_at, eval_eml — stays honest across control flow.
A paper. The formal statement — over what embedding, what ring, what catalog fragment the equivalence theorem holds — is still a sketch, written up in dev/ff_symbolic_equivalence.md. The proper version is its own piece of work.
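On the guarded-trace idea, a deliberately hypothetical sketch of the forking step; none of these names (step_jz, the guard encoding, the tuple layout) come from the repo — they are one plausible reading of the plan in #70:

```python
# Hypothetical sketch of forking a symbolic trace at JZ. A trace is a pair
# (guard, stack-of-Polys); the branch splits one trace into two, each carrying
# the condition its path assumes. The per-case stack content stays polynomial.
def step_jz(guard, stack):
    cond, rest = stack[-1], stack[:-1]
    return [
        (guard + [("== 0", cond)], rest),   # branch taken: cond is zero
        (guard + [("!= 0", cond)], rest),   # fall-through: cond is non-zero
    ]
```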
For now: the feed-forward layer of a hand-compiled transformer is, demonstrably, a polynomial evaluator. The weights are the formula. 967 of them, give or take.
Code: llm-as-computer (PR #73). Previously: Where the Computer Meets the Calculator · Yes, LLMs Can Be Computers. Now What?