How a Greek Mythological Creature is Making LLMs Talk Faster
With Large Language Models, it’s all fun and games until you have to scale your product to the masses.
Because with scale comes latency, and with latency comes losing users, the greatest nightmare.
And just like Medusa turned people into stone to freeze them for eternity, latency can very well do the same to a company’s revenues if they aren’t careful.
But now, together.ai, one of the leading infrastructure providers and a part-time research lab, has proposed a new LLM architecture, appropriately named Medusa, that can speed up word generation by up to three times over standard architectures. The results speak for themselves, but how did they do it?
A Tail of Multiple Headaches
All LLMs, as you probably know by now, are based on the famous Transformer architecture, a seminal AI innovation that has shaped the industry for more than a lustrum, with no signs of slowing down.
All at once
One reason behind the Transformer's success is that, among other things, the complete sequence of text is fed into the model simultaneously to compute, or dare I say predict, the next word.
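To make this concrete, here is a minimal NumPy sketch of that idea: a single causal self-attention step processes every position of the prompt in one matrix operation, producing next-token logits for all positions at once, with the last row being the prediction for the word that follows the prompt. The dimensions, random weights, and single-head setup are toy assumptions for illustration, not the actual Medusa or production Transformer code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a 5-token prompt, model dimension 8, vocabulary of 10 words.
seq_len, d_model, vocab = 5, 8, 10
x = rng.normal(size=(seq_len, d_model))           # embeddings for the whole prompt
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
W_out = rng.normal(size=(d_model, vocab))         # projection to vocabulary logits

# Single-head causal self-attention over ALL positions in one shot.
Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d_model)
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[mask] = -np.inf                            # each token only attends to its past
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
hidden = weights @ V

logits = hidden @ W_out                           # (seq_len, vocab): one prediction per position
next_token = int(logits[-1].argmax())             # the model's guess for the word after the prompt
print(logits.shape, next_token)
```

Note that all five positions are scored in parallel; the sequential bottleneck only appears at generation time, when each new word must be appended and the pass repeated, which is precisely the latency problem Medusa targets.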