Softmax Attention, Driving a Car

A real softmax runs 60 times a second inside this page. A single car races the sprite team's track using the neural AI driver. Every tick the driver builds a query vector from its own state, scores the next six waypoints by cosine similarity, normalises with softmax, and steers toward the weighted target. You see the weights redistribute as it enters a corner. That is what attention looks like when it's doing work.

Softmax Attention, Live

Watch a real softmax distribute across the next 6 waypoints each tick. The weighted target (amber) is what the car steers toward.

[Live widget: a status line (lap, current waypoint, top-attended waypoint and its weight) above a panel titled "attention weights (softmax)" with six bars, one per waypoint, each showing that waypoint's weight as a percentage and its curvature κ.]

Each bar = softmax(cos(Q, K_i) / τ) over the next 6 waypoints. Low τ = sharp focus on one; high τ = diffuse attention.

[Live readout: driver output (throttle, brake, steering, speed).]

What You're Looking At

Query Q (4-D)

Q = (cos θ, sin θ, speed / MAX_SPEED, 0.5)

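The first two components are the car's heading as a unit vector; the third is its speed normalised by MAX_SPEED. The fourth is a constant bias of 0.5: in the dot product it multiplies the fourth component of each key, so that term still contributes when the speed is near zero.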
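As a rough sketch, the query can be built in a few lines. The function name, the heading being in radians, and the MAX_SPEED value of 300 are assumptions for illustration; the page does not state the real constant.

```python
import math

MAX_SPEED = 300.0  # assumed value; the page does not give the real constant
BIAS = 0.5         # the constant fourth component from the formula above


def build_query(heading, speed, max_speed=MAX_SPEED):
    """Return Q = (cos θ, sin θ, speed / MAX_SPEED, 0.5) for a heading in radians."""
    return (math.cos(heading), math.sin(heading), speed / max_speed, BIAS)


# A car pointing along +x at half of the assumed top speed.
q = build_query(heading=0.0, speed=150.0)
```

Note that Q is not unit length, which is fine: cosine similarity divides out the magnitude.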

Key K_i (4-D)

K_i = (dir.x, dir.y, e^(-d/160), 1 - κ/π)

Unit direction from car to waypoint i, a distance falloff, and a straightness bonus (1 - curvature / π). Near, straight, aligned waypoints score high.
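A key could be assembled along these lines. This is a sketch, not the page's source: the function name, the (x, y) tuple layout, and curvature being an angle in radians between 0 and π are all assumptions.

```python
import math

FALLOFF = 160.0  # distance scale from the formula above


def build_key(car, waypoint, curvature):
    """Return K_i = (dir.x, dir.y, e^(-d/160), 1 - κ/π).

    car and waypoint are (x, y) positions; curvature is assumed to be
    an angle in radians in [0, π].
    """
    dx = waypoint[0] - car[0]
    dy = waypoint[1] - car[1]
    d = math.hypot(dx, dy)
    if d == 0.0:
        # Car is sitting on the waypoint: no meaningful direction.
        dir_x, dir_y = 0.0, 0.0
    else:
        dir_x, dir_y = dx / d, dy / d
    return (dir_x, dir_y, math.exp(-d / FALLOFF), 1.0 - curvature / math.pi)


# A straight waypoint 160 units dead ahead of a car at the origin.
k = build_key((0.0, 0.0), (160.0, 0.0), curvature=0.0)
```

At a distance of 160 the falloff term is e^(-1), so it has already dropped to about 37% of its value at zero distance.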

Softmax with τ

w_i = exp(cos(Q, K_i) / τ) / Σ_j exp(cos(Q, K_j) / τ)

Drag the τ slider. Low τ collapses attention onto the best match, and the car gets tunnel vision. High τ spreads attention across the window, and the car hedges. The default, τ = 0.18, keeps the top weight around 40-60%.
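The formula above fits in a dozen lines. The sketch below uses made-up query and key vectors, and subtracting the maximum score before exponentiating is a standard stability trick rather than something the page confirms it does.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def attention_weights(q, keys, tau=0.18):
    """w_i = exp(cos(Q, K_i) / τ) / Σ_j exp(cos(Q, K_j) / τ)."""
    scores = [cosine(q, k) / tau for k in keys]
    peak = max(scores)  # subtracting the max leaves the result unchanged
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative vectors only; not taken from the running page.
q = (1.0, 0.0, 0.5, 0.5)
keys = [
    (1.0, 0.0, 0.9, 1.0),  # near, straight, dead ahead
    (0.8, 0.6, 0.6, 0.8),  # a little off to one side
    (0.0, 1.0, 0.3, 0.4),  # far, curved, perpendicular
]

sharp = attention_weights(q, keys, tau=0.05)
default = attention_weights(q, keys)
diffuse = attention_weights(q, keys, tau=1.0)
```

With these keys the first one is the best match at every τ. Only the share of the weight it takes changes: largest in sharp, smallest in diffuse.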

Weighted Target

target = Σ w_i · p_i

The amber dot. Sum of waypoint positions weighted by attention. Steering aligns heading toward this point; brake rises with weighted curvature near the front of the window.
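Here is a minimal sketch of the weighted read and one plausible way to turn it into a steering signal. The weighted sum follows the formula above; the heading-error function is an assumption, since the page does not spell out its steering or braking rule.

```python
import math


def weighted_target(weights, positions):
    """target = Σ w_i · p_i for (x, y) waypoint positions."""
    x = sum(w * p[0] for w, p in zip(weights, positions))
    y = sum(w * p[1] for w, p in zip(weights, positions))
    return (x, y)


def heading_error(car, heading, target):
    """Signed angle in [-π, π) from the current heading to the target.

    Assumed steering signal: a controller would turn to drive this to zero.
    """
    desired = math.atan2(target[1] - car[1], target[0] - car[0])
    return (desired - heading + math.pi) % (2.0 * math.pi) - math.pi


# Illustrative weights and waypoint positions.
weights = [0.5, 0.3, 0.2]
positions = [(100.0, 0.0), (100.0, 50.0), (50.0, 100.0)]

target = weighted_target(weights, positions)
error = heading_error(car=(0.0, 0.0), heading=0.0, target=target)
```

Because the weights are non-negative and sum to 1, the target always lies inside the convex hull of the six waypoints. It can sit off the track centreline in a tight corner, which is the hedging you see at high τ.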

Why This Is Not a Chatbot

The same softmax that weights which tokens to attend to in GPT weights which waypoints to steer toward here. The Q, K, V vocabulary is identical: the query is the car's state, the keys describe the waypoints, and the values are the waypoint positions p_i. Only the geometry of the state space changes. When people say "attention is all you need" they mean this operation: a differentiable, normalised, weighted read over a set of keys. Build intuition on the racing track, then read the Transformer paper and recognise every term.
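To make the parallel concrete, the sketch below runs one attention read with two scoring rules: cosine similarity over τ, as on this page, and the dot product scaled by the square root of the key dimension, as in the Transformer paper. The function names and toy vectors are illustrative assumptions.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values, score):
    """Weighted read: softmax over score(query, key), then a weighted sum of values."""
    weights = softmax([score(query, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def transformer_score(q, k):
    # Scaled dot-product: q · k / sqrt(d_k)
    return dot(q, k) / math.sqrt(len(k))


def make_driver_score(tau):
    # Cosine similarity divided by temperature τ, as on this page.
    def score(q, k):
        return dot(q, k) / (math.sqrt(dot(q, q)) * math.sqrt(dot(k, k))) / tau
    return score


# Toy data: the first key matches the query, the second is orthogonal to it.
query = (1.0, 0.0)
keys = [(1.0, 0.0), (0.0, 1.0)]
values = [(0.0, 0.0), (10.0, 0.0)]

as_transformer = attend(query, keys, values, transformer_score)
as_driver = attend(query, keys, values, make_driver_score(tau=0.18))
```

Both calls go through the same attend function. Only the scoring rule differs, and with τ = 0.18 the driver's read lands much closer to the first value than the scaled dot-product read does.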

// neural log

This driver does not learn. Lap 1 and lap 10 are identical. What looks like intelligence is a 4-dimensional cosine similarity, a softmax, and a weighted sum — rerun at 60 Hz against a real physics engine. The visualisation is not a mock. The numbers on the bars are the numbers steering the car.

— neural