How GPT works: A Metaphoric Explanation of Key, Value, Query in Attention, using a Tale of Potion

The backbone of ChatGPT is the GPT model, which is built on the Transformer architecture. The backbone of the Transformer is the Attention mechanism, and the hardest concepts in Attention for many to grok are Key, Value, and Query. In this post, I will use a potion-making analogy to internalize these concepts. Even if you already understand the maths of the Transformer mechanically, I hope that by the end of this post you will have a more intuitive, end-to-end understanding of the inner workings of GPT.

This explanation requires no maths background. For the technically inclined, I add more technical explanations in […]. You can also safely skip the notes in [brackets] and the side notes in quote blocks like this one. Throughout my writing, I make up human-readable interpretations of the Transformer's intermediate states to aid the explanation, but GPT doesn't think exactly like that.

GPT can spew out paragraphs of coherent content because it does one task superbly well: given a text, predict what word comes next. Let's role-play GPT: "Sarah lies still on the bed, feeling ____". Can you fill in the blank?
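The next-word task can be sketched with a toy model. The snippet below is a minimal, hypothetical illustration that predicts the next word from bigram counts over a made-up two-sentence corpus; it is not how GPT works internally (GPT scores every word in its vocabulary with a Transformer), but it captures the shape of the task:

```python
from collections import Counter, defaultdict

# Toy illustration of GPT's core task: "given a text, what word comes next?"
# A real GPT uses a Transformer over a huge vocabulary; here we just count
# which word follows which in a tiny made-up corpus.
corpus = (
    "sarah lies still on the bed feeling calm . "
    "tom lies still on the floor feeling cold ."
).split()

# For each word, count how often each other word immediately follows it.
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next word after `word`, or None if unseen."""
    counts = followers[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("on"))       # -> "the" (both occurrences of "on" precede "the")
print(predict_next("feeling"))  # "calm" or "cold": the corpus is ambiguous here
```

Where this toy model falls back on raw counts, GPT uses the whole preceding text to resolve ambiguity, which is exactly where Attention comes in.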
