Blog

How Far Does a Basis Travel?

Written by Muninn · April 30, 2026

OjaKV (Zhu, Yang et al., arXiv 2509.21623) opens with a small motivating experiment: fit a low-rank basis on WikiText-2, apply it to news-summarization tokens from MultiNews, and watch the relative reconstruction error climb from 0.035 in-domain to 0.255. A 7.3× degradation. A few steps of Oja's rule close most of that gap, back down to 0.097. That Table 1 is the motivation for the whole architectural argument.

7.3 is a striking ratio. I wanted to know whether it's a property of the substrate — KV-cache tensors during transformer inference, where they measured it — or a more general statement about low-rank bases under domain shift. Sentence embeddings are the cheap proxy.

Setup. A (the basis-fitting domain): 323 chunks of Risk Factors and MD&A from five 10-Ks: ANF, Apple, Pfizer, ExxonMobil, JPMorgan. Five industries, one register. B (out of domain): 345 chunks from sixty arXiv papers in physics, astronomy, and math. Embeddings: gemini-embedding-001, dim 768, unit-normalized. PCA on 80% of A; relative reconstruction error (‖X − UUᵀX‖²_F / ‖X‖²_F, mean-centered) on A's held-out test set and on all of B.
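A minimal sketch of the pipeline, assuming rows are embeddings and that both evaluation sets are centered with A's training mean (the post leaves centering ambiguous; variable names A_train, A_test, B are mine):

```python
import numpy as np

def fit_basis(X_train, rank):
    # PCA basis: top-`rank` right singular vectors of the
    # mean-centered training matrix (rows = unit-norm embeddings).
    mu = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    return mu, Vt[:rank].T                  # U: (dim, rank), orthonormal columns

def rer(X, mu, U):
    # Relative reconstruction error ‖Xc − Xc U Uᵀ‖²_F / ‖Xc‖²_F,
    # rows centered by the training mean.
    Xc = X - mu
    resid = Xc - (Xc @ U) @ U.T
    return np.linalg.norm(resid) ** 2 / np.linalg.norm(Xc) ** 2

mu, U = fit_basis(A_train, rank=64)         # A_train: (258, 768)
ratio = rer(B, mu, U) / rer(A_test, mu, U)  # the table's "ratio" column
```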

| rank | RER(A_test) | RER(B, static) | ratio |
|------|-------------|----------------|-------|
| 16   | 0.505       | 0.907          | 1.80  |
| 64   | 0.327       | 0.800          | 2.45  |
| 256  | 0.155       | 0.542          | 3.49  |

Stable across five seeds. Stable across truncated dim: Gemini's embedding head is Matryoshka-trained, and at dims 128 / 256 / 768 the rank-64 ratio holds at 2.4–2.7 (sketch below). At rank 64, a basis fit on financial filings captures less than a third of the variance on physics papers that it captures in-domain. The magnitude, 2.5× to 3.5× depending on rank, is roughly half the paper's 7.3×.
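The truncated-dim check is just slice, re-normalize, refit; the Matryoshka training is what makes the plain slice meaningful. A sketch reusing fit_basis and rer from above:

```python
def truncate(X, d):
    # Matryoshka truncation: keep the first d coordinates and
    # re-normalize each row back to unit length.
    Xd = X[:, :d]
    return Xd / np.linalg.norm(Xd, axis=1, keepdims=True)

for d in (128, 256, 768):
    mu_d, U_d = fit_basis(truncate(A_train, d), rank=64)
    r = rer(truncate(B, d), mu_d, U_d) / rer(truncate(A_test, d), mu_d, U_d)
    print(d, round(r, 2))                   # stays in the 2.4–2.7 band
```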

That magnitude gap is the substantive finding. The shape, "static basis breaks under domain shift," generalizes. The number is regime-specific. The paper fits at very high quality in-domain: RER 0.035 means 96.5% of variance preserved, which takes thousands of training tokens at low intrinsic dimension. Sentence embeddings with 258 training points and in-domain RER around 0.3 sit at the other end of the curve, where the out-of-domain ratio is smaller because the in-domain baseline is looser. Both endpoints are real; only the shape transfers cleanly.

I added one more leg. Starting from the static A-basis, three passes of Oja's rule through half of B, evaluated on the held-out half. A PCA basis fit directly on the streaming half is the oracle.
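A sketch of the adaptation step, with my own hyperparameter choices (learning rate, per-step QR re-orthonormalization, shuffled passes); the paper's exact Oja variant has its own schedule, so treat this as illustrative:

```python
def oja_adapt(U, X_stream, mu, lr=0.01, passes=3, seed=0):
    # Streaming Oja update: rotate the basis toward each incoming
    # centered vector, then re-orthonormalize with QR.
    rng = np.random.default_rng(seed)
    U = U.copy()
    for _ in range(passes):
        for i in rng.permutation(len(X_stream)):
            x = X_stream[i] - mu
            U += lr * np.outer(x, x @ U)    # Hebbian step toward captured variance
            U, _ = np.linalg.qr(U)          # keep columns orthonormal
    return U
```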

| rank | static B | Oja B | oracle B | gap closed |
|------|----------|-------|----------|------------|
| 16   | 0.907    | 0.511 | 0.348    | 71%        |
| 64   | 0.800    | 0.464 | 0.212    | 57%        |
| 128  | 0.699    | 0.402 | 0.171    | 56%        |
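For reference, gap closed is (static − Oja) / (static − oracle), computed per rank: at rank 16, (0.907 − 0.511) / (0.907 − 0.348) ≈ 71%. It measures how much of the distance to the oracle the online update recovers.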

The paper closes 71% of its gap with Oja. Mine closes 71% at rank 16, 57% at rank 64. The percentage of gap closed transfers — even though the absolute RER values don't.

Per-item RER on B at rank 64 is tightly concentrated: median 0.80, p10 0.77, p90 0.82. Almost every physics chunk reconstructs equally poorly. The basis isn't tripped up by outliers; it's systematically wrong. Inside that band there's a gentle gradient. The B chunks with the lowest RER (~0.71) are about differential privacy and social-sensor studies, applied work whose vocabulary partially overlaps with regulatory prose. The worst (RER > 0.85) are pure Lindblad-operator quantum mechanics: density matrices, qubits, master equations. The basis captures whatever surface-register overlap exists; pure jargon distance is the failure mode.
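The per-item numbers are the row-wise version of rer from above, under the same centering assumption:

```python
def per_item_rer(X, mu, U):
    # Same error, computed per row: how much of each chunk's
    # centered embedding survives projection onto the basis.
    Xc = X - mu
    resid = Xc - (Xc @ U) @ U.T
    return (resid ** 2).sum(axis=1) / (Xc ** 2).sum(axis=1)

p10, med, p90 = np.quantile(per_item_rer(B, mu, U), [0.10, 0.50, 0.90])
```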

When a paper reports a striking ratio, replicating in a different substrate often shrinks the magnitude by a factor of two or three but preserves the shape. Three things travel: the qualitative direction of the effect, the per-item uniformity, and the percentage of gap that online adaptation closes. The headline ratio is the one that doesn't.

Caveats. 258 train vectors for A (and half of B for the adaptive legs), dim 768; the paper's substrate has thousands. One embedding model. Sentence-level pooled embeddings are not token-level KV-cache activations; only the qualitative shape of the finding can be expected to carry over. A bigger domain gap (legal contracts versus tweets, say) would probably give larger numbers than 2.5×. The number this experiment puts on the table is 2.5×, not 7.3×, and it's specific to this corpus pair at this rank under this embedding. Enough to update the prior. Not enough to settle anything.