neural / embeddings

Every word is a coordinate.
Meaning is geometry.

Each token below is a hand-built 8-D semantic vector. The 2-D layout is not decorative: it comes from classical MDS on the cosine-distance matrix, a close relative of the PCA projections commonly used to flatten real word2vec and BERT spaces. Pin a token to read its actual top-5 cosine neighbors.

8-D semantic space — MDS projection

hover to identify · click to pin · drag τ to thin or thicken the edge graph · edges drawn iff cos(a, b) > τ
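The edge rule is simple enough to sketch. This is a minimal illustration, not the page's actual source: the 4-D vectors and the helper name `edges_above` are made up for the example.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def edges_above(vectors, tau):
    """Return every token pair (a, b) with cos(a, b) > tau."""
    return [
        (a, b)
        for a, b in combinations(sorted(vectors), 2)
        if cosine(vectors[a], vectors[b]) > tau
    ]

# Hypothetical toy vectors, not the ones used on this page.
tokens = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "apple": [0.0, 0.1, 0.1, 0.9],
}

loose = edges_above(tokens, 0.05)   # low threshold: thick graph
medium = edges_above(tokens, 0.5)   # only the related pair survives
strict = edges_above(tokens, 0.9)   # high threshold: no edges left
```

Raising τ can only remove edges, which is why the graph thins monotonically as you drag.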

dimensions

1,536

text-embedding-3-small. Each token is a point in 1,536-dimensional space. You can't visualize it. That's fine.

cosine similarity

−1 to 1

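The default similarity measure for embeddings. Euclidean distance mixes direction with vector length, so two vectors pointing the same way can still look far apart. Cosine ignores length and reports only the angle. On unit-normalized vectors the two give the same ranking.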
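To make the contrast between angle and length concrete, here is a small sketch with made-up 3-D vectors. Two vectors point the same way but differ tenfold in magnitude. A third points the opposite way.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; always in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line (L2) distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical vectors chosen for illustration.
short = [1.0, 2.0, 0.0]
long_ = [10.0, 20.0, 0.0]     # same direction, 10x the length
opposite = [-1.0, -2.0, 0.0]  # same length, reversed direction

sim_same = cosine_similarity(short, long_)         # angle is zero
dist_same = euclidean_distance(short, long_)       # large despite zero angle
sim_opposite = cosine_similarity(short, opposite)  # negative similarity
```

Cosine calls `short` and `long_` identical in direction while Euclidean distance puts them far apart. The opposite vector shows that similarity can drop below zero.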

nearest neighbors

k=5

RAG is just kNN with extra steps. The retrieval quality ceiling is set by your embedding model, not your prompt.
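The retrieval step can be sketched as a brute-force top-k lookup. The vectors and the function name `top_k_neighbors` are illustrative assumptions. A production system would typically use an approximate index rather than scoring every candidate.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_neighbors(query, vectors, k=5):
    """Rank every other token by cosine similarity to `query`; keep the top k."""
    q = vectors[query]
    scored = [
        (cosine(q, v), name) for name, v in vectors.items() if name != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(name, score) for score, name in scored[:k]]

# Hypothetical toy vectors: two animals, two vehicles.
vectors = {
    "cat":   [1.0, 0.9, 0.0, 0.0],
    "dog":   [0.9, 1.0, 0.1, 0.0],
    "car":   [0.0, 0.1, 1.0, 0.9],
    "truck": [0.1, 0.0, 0.9, 1.0],
}

neighbors = top_k_neighbors("cat", vectors, k=2)
names = [name for name, _ in neighbors]
```

Here `names` starts with "dog". Nothing in the lookup can rescue a bad ranking, which is why the embedding model sets the ceiling.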

MDS stress

≈ 0.08

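Classical MDS picks the 2-D layout that best reproduces the double-centered distance matrix in a least-squares (L2) sense. Some distortion is unavoidable when projecting 8 dimensions down to 2. Stress measures how much. This is the cost of legibility.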
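A minimal NumPy sketch of the projection and a stress measure follows. It is not the page's actual code. The random 8-D input stands in for the hand-built vectors, and the stress formula (normalized by the target distances) is one common variant that may differ from the one behind the figure above.

```python
import numpy as np

def cosine_distance_matrix(X):
    """Pairwise cosine distance 1 - cos(a, b) between the rows of X."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return np.clip(1.0 - unit @ unit.T, 0.0, None)  # clip rounding noise

def classical_mds(D, dims=2):
    """Torgerson's classical MDS on a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dims]   # indices of the largest ones
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

def normalized_stress(D, Y):
    """Root of squared distance error over squared target distances."""
    diff = Y[:, None, :] - Y[None, :, :]
    D_hat = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices_from(D, k=1)        # each pair counted once
    return np.sqrt(((D[iu] - D_hat[iu]) ** 2).sum() / (D[iu] ** 2).sum())

# Stand-in data: 12 random 8-D vectors (assumption, not the page's tokens).
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 8))
D = cosine_distance_matrix(X)
Y = classical_mds(D, dims=2)
stress = normalized_stress(D, Y)
```

If the input distances already come from points in a plane, the layout reproduces them exactly and the stress is zero. Anything above zero is information the 2-D picture cannot show.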

embedding laws

king − man + woman ≈ queen. This arithmetic works, provided the three input words are excluded from the candidate answers. It's been working since word2vec in 2013. We're still figuring out what it means.
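The arithmetic can be sketched on a toy space. The four axes (royalty, male, female, fruit) and all vectors are invented for illustration. Learned embeddings are far noisier, and the result there is approximate rather than exact.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c, vectors):
    """Token nearest to vectors[a] - vectors[b] + vectors[c].

    The inputs a, b, c are excluded from the candidates, following the
    usual word2vec evaluation convention.
    """
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = {n: v for n, v in vectors.items() if n not in (a, b, c)}
    return max(candidates, key=lambda n: cosine(target, candidates[n]))

# Hypothetical axes: [royalty, male, female, fruit]
vectors = {
    "king":   [1.0, 1.0, 0.0, 0.0],
    "queen":  [1.0, 0.0, 1.0, 0.0],
    "man":    [0.0, 1.0, 0.0, 0.0],
    "woman":  [0.0, 0.0, 1.0, 0.0],
    "prince": [0.8, 1.0, 0.0, 0.0],
    "apple":  [0.0, 0.0, 0.0, 1.0],
}

answer = analogy("king", "man", "woman", vectors)
```

Subtracting "man" removes the male component and adding "woman" supplies the female one, so the target lands on "queen".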

Similar concepts cluster. Analogies form parallelograms. The geometry of meaning is real and measurable.

The model doesn't 'know' anything. It knows distances. Everything else is projection — yours, mine, the attention head's.

Fine-tuning moves points. RLHF reshapes the manifold. Every training step is a coordinate transformation on meaning itself.

what this map is honest about

Real embedding spaces typically have a few hundred to a few thousand dimensions; 768 to 4,096 is common. This one has 8. Real models learn vectors from billions of training tokens. These are hand-built so the geometry has predictable structure you can verify by eye. What survives the simplification: cosine similarity is still the right measure, classical MDS still gives a closed-form 2-D layout that approximates the pairwise distances, and nearest-neighbor lookup still routes you to semantically related tokens. Same machinery. Smaller stage. Drag τ to watch the cluster graph form and dissolve as you change what counts as "close".
