Self-Attention Formula
Attention(Q, K, V) = softmax(Q·Kᵀ / √dₖ) · V

Q = Query matrix (one query vector per token)
K = Key matrix (one key vector per token)
V = Value matrix (one value vector per token)
dₖ = dimension of the key vectors; dividing by √dₖ keeps the dot products from growing too large before the softmax
Self-attention allows each token to attend to every other token in the sequence, including itself. The softmax-normalized attention weights show how strongly each token "looks at" the others when computing its output representation.
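The formula above can be sketched directly in NumPy. This is a minimal illustration, not a production implementation: the projection matrices Wq, Wk, and Wv and the toy shapes are assumptions chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (n, d_model)."""
    Q = X @ Wq                            # queries, shape (n, d_k)
    K = X @ Wk                            # keys,    shape (n, d_k)
    V = X @ Wv                            # values,  shape (n, d_v)
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # (n, n) attention logits
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ V                    # (n, d_v) output representations

# Toy example: 3 tokens, d_model = 4, d_k = d_v = 4 (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```

Each row of `weights` is one token's attention distribution over the whole sequence, and the output is the corresponding weighted average of the value vectors.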
[Figure: attention-weight heatmap; color scale from low (0) to high (1)]