Self-Attention Formula
Attention(Q, K, V) = softmax(Q·Kᵀ / √dₖ) · V

Q = Query matrix (one query vector per token)
K = Key matrix (one key vector per token)
V = Value matrix (one value vector per token)
dₖ = dimension of the key vectors; dividing by √dₖ keeps the dot products from growing too large before the softmax
Self-attention allows each token to attend to every other token in the sequence, including itself. The softmax-normalized attention weights show how strongly each token "looks at" the others when computing its output representation.
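The formula above can be sketched directly in NumPy. This is a minimal illustration, not a production implementation: the projection matrices Wq, Wk, and Wv and the toy shapes are assumptions chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (n, d_model)."""
    Q = X @ Wq                            # queries, shape (n, d_k)
    K = X @ Wk                            # keys,    shape (n, d_k)
    V = X @ Wv                            # values,  shape (n, d_v)
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # (n, n) attention logits
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ V                    # (n, d_v) output representations

# Toy example: 3 tokens, d_model = 4, d_k = d_v = 4 (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```

Each row of `weights` is one token's attention distribution over the whole sequence, and the output is the corresponding weighted average of the value vectors.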
[Figure: attention-weight heatmap; color scale from low (0) to high (1)]