
Pick two tokens. Walk the principal axis.
The model's geometry is touchable.

Latent space is the continuous vector space a generative model moves through to produce a sample. Below are 20 hand-built 8-D semantic vectors, the same vocabulary used by the analogy explorer, viewed through a different lens. We compute the principal components live, then let you scrub a latent between any two endpoints and watch the decoded token shift along the way.

PCA traversal — power iteration + Hotelling deflation on the 8×8 covariance

pick A and B · drag t (or auto-scrub) · the amber dot is z(t) in the PC1×PC2 plane · the bottom strip shows every word's PC1 coordinate so you can see what gets crossed

PC1 variance

46.6%

Computed live from the 8×8 covariance via power iteration. Dominant axis: −human + machine. The model's biggest source of variance is who's doing the thinking.

PC2 variance

41.0%

Second axis: +entity − action. Once you know human vs machine, the next bit you need is noun vs verb. Together PC1 + PC2 explain 87.6% of the variance.
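
A minimal sketch of how variance shares like these can be computed with power iteration and Hotelling deflation. The page's 20 vectors are not reproduced here, so `TOY` is hypothetical 3-D data, and the helper names and the divide-by-n covariance are assumptions for illustration (the divisor cancels in the ratios).

```python
import math
import random

def covariance(rows):
    """Covariance of mean-centered rows. Dividing by n is an assumption;
    the divisor cancels in the explained-variance ratios."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
            for i in range(d)]

def matvec(C, v):
    return [sum(cij * vj for cij, vj in zip(row, v)) for row in C]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def power_iteration(C, iters=500, seed=0):
    """Dominant (eigenvalue, unit eigenvector) of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = normalize([rng.uniform(-1.0, 1.0) for _ in C])
    for _ in range(iters):
        w = matvec(C, v)
        if math.sqrt(sum(x * x for x in w)) < 1e-12:  # C is numerically zero
            break
        v = normalize(w)
    lam = sum(x * y for x, y in zip(v, matvec(C, v)))  # Rayleigh quotient
    return lam, v

def deflate(C, lam, v):
    """Hotelling deflation: subtract lam * v v^T so the next run finds the next axis."""
    d = len(C)
    return [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]

def top_components(rows, k=2):
    """Return k tuples of (eigenvalue, axis, explained-variance ratio)."""
    C = covariance(rows)
    total = sum(C[i][i] for i in range(len(C)))  # trace = sum of all eigenvalues
    found = []
    for _ in range(k):
        lam, v = power_iteration(C)
        found.append((lam, v, lam / total))
        C = deflate(C, lam, v)
    return found

# Hypothetical toy data (3-D, not the page's vectors): most spread lies on axis 0.
TOY = [[2, 0, 0], [-2, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 0.5], [0, 0, -0.5]]

for lam, axis, ratio in top_components(TOY):
    print(f"{ratio:.1%}", [round(x, 3) for x in axis])
```

The sign of each axis is arbitrary: power iteration can converge to v or to −v, so a label such as "−human + machine" depends on a sign convention chosen afterwards.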

interpolation

linear in 8-D

z(t) = (1−t)·A + t·B in the original space. Because projection is linear, this also linearly interpolates the PC coordinates — the path on the canvas is a straight line by construction.
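
The straight-line claim can be checked numerically. This sketch uses hypothetical 4-D endpoints, a hypothetical data mean, and two hand-picked orthonormal axes standing in for PC1 and PC2; none of these values come from the page. It compares projecting z(t) against interpolating the two projected endpoints.

```python
def lerp(a, b, t):
    """z(t) = (1 - t) * a + t * b, componentwise."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def project(z, mean, axes):
    """Coordinates of z on each axis after subtracting the data mean."""
    centered = [x - m for x, m in zip(z, mean)]
    return [sum(c * u for c, u in zip(centered, axis)) for axis in axes]

# Hypothetical stand-ins (4-D instead of 8-D) for two tokens and the PCA basis.
A = [1.0, 0.0, 0.5, 0.2]
B = [0.0, 1.0, 0.5, 0.8]
MEAN = [0.5, 0.5, 0.5, 0.5]
AXES = [[0.6, 0.8, 0.0, 0.0], [-0.8, 0.6, 0.0, 0.0]]  # orthonormal pair

pa, pb = project(A, MEAN, AXES), project(B, MEAN, AXES)
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    via_latent = project(lerp(A, B, t), MEAN, AXES)  # interpolate, then project
    via_plane = lerp(pa, pb, t)                      # project, then interpolate
    assert all(abs(p - q) < 1e-12 for p, q in zip(via_latent, via_plane))
    print(t, [round(c, 3) for c in via_latent])
```

Subtracting the mean does not break the equality because the interpolation weights (1 − t) and t sum to 1.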

decode

argmax cos(z, vᵢ)

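At every t we 'reconstruct' by nearest neighbor in the vocabulary: the decoded token is the word whose vector has the highest cosine similarity to z(t). Real generative models decode through a learned head, but the principle is the same: score every token against the latent and favor the most probable one.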
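
A minimal sketch of the argmax-cosine decode. The three 3-D vectors are hypothetical stand-ins, not the page's 8-D vocabulary, and `decode` is an illustrative name.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def decode(z, vocab):
    """Return the word whose vector is most cosine-similar to z."""
    return max(vocab, key=lambda word: cosine(z, vocab[word]))

def lerp(a, b, t):
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

# Hypothetical vectors for illustration only.
VOCAB = {
    "user":  [1.0, 0.0, 0.2],
    "agent": [0.0, 1.0, 0.2],
    "king":  [0.9, 0.1, 0.9],
}

for t in (0.0, 0.25, 0.75, 1.0):
    z = lerp(VOCAB["user"], VOCAB["agent"], t)
    print(t, decode(z, VOCAB))
```

With these toy values the decoded token is "user" near t = 0 and "agent" near t = 1. Ties go to whichever word appears first in the dictionary, a detail a real interface would need to decide deliberately.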

analogy view — same vocabulary, different lens

analogy::::infer

20 words · 8D feature space · 2D projection · linear arithmetic in the original 8D · mechanism: B − A + C finds the word whose direction from C matches B from A

the analogy view solves B − A + C in the same 8-D space — vector arithmetic is just another way of moving along latent directions. PCA finds the directions; analogies use them.
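
The B − A + C step can be sketched as follows. The 3-D vectors (features loosely read as royal, male, female) are hypothetical, and excluding the three input words from the candidates is a common convention assumed here, not something the page states.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def analogy(a, b, c, vocab):
    """Solve a : b :: c : ? by decoding the point b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vocab[a], vocab[b], vocab[c])]
    # Assumption: the query words themselves are not valid answers.
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vocab[w]))

# Hypothetical vectors: [royal, male, female].
VOCAB = {
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "king":  [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
}

print(analogy("man", "king", "woman", VOCAB))  # king - man + woman
```

In this toy setup king − man + woman lands exactly on the queen vector. With real embeddings the result is only approximately the target, which is why the decode step is a nearest-neighbor search.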

latent-space laws

Pearson 1901 / Hotelling 1933: principal components are the eigenvectors of the covariance matrix, ordered by eigenvalue. They are the directions of maximum variance, and they are orthogonal by construction.

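Linear moves in latent space are the workhorse of generative models. king − man + woman ≈ queen starts at woman and adds the difference vector king − man. user → agent at t = 0.5 starts at user and adds half of the difference agent − user. Both are the same operation: start at a point and step along a direction.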

Decoding is not the latent. The continuous z(t) lives between vocabulary points; the nearest-token snap is a quantization. Real models replace the hard snap with a softmax over token scores, which assigns a probability to every token instead of naming a single winner.
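
To make the contrast between a hard snap and a soft decode concrete, here is a sketch that turns cosine scores into probabilities with a temperature softmax. The vectors are hypothetical, and using cosine similarity as the score is an assumption for illustration; a learned head in a real model produces its own logits.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def softmax(scores, temperature=1.0):
    """Numerically stable softmax; lower temperature sharpens the distribution."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def token_probs(z, vocab, temperature=1.0):
    """Probability for every word, from softmax over cosine scores."""
    words = list(vocab)
    probs = softmax([cosine(z, vocab[w]) for w in words], temperature)
    return dict(zip(words, probs))

# Hypothetical vectors for illustration only.
VOCAB = {
    "user":  [1.0, 0.0, 0.2],
    "agent": [0.0, 1.0, 0.2],
    "king":  [0.9, 0.1, 0.9],
}

z = [0.75, 0.25, 0.2]  # the t = 0.25 point between "user" and "agent"
soft = token_probs(z, VOCAB, temperature=1.0)
sharp = token_probs(z, VOCAB, temperature=0.05)
print({w: round(p, 3) for w, p in soft.items()})
print({w: round(p, 3) for w, p in sharp.items()})
```

The most probable token under the softmax is always the nearest neighbor, since softmax preserves the ordering of the scores. As the temperature falls toward zero, the distribution approaches the hard snap.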

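PCA is the linear ceiling. PC1 + PC2 leave 12.4% of the variance unexplained, 8.4% of it in PC3. Residual variance on its own only says the data needs more than two linear axes. When the manifold is curved, as it generally is in real LLMs, part of that residual is curvature that no linear projection can flatten.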

what this traversal is honest about

Real diffusion and language models walk through latents with hundreds to thousands of dimensions during sampling (768 and 4096 are common sizes). This page walks through 8. The analysis is identical: compute covariance, take eigenvectors, project, interpolate. What survives the simplification is the structure: the human/machine axis falls out of the data without us labelling it, the entity/action axis falls out orthogonally, and you can watch a token cross both as you drag the slider. That's the whole content of "the model has a latent geometry": directions that mean something, and a way to walk along them.

→ diffusion · the same interpolation but the endpoint is noise