Gandalf and CS50s Ready Player 50
Gandalf Has A Secret!
Lakera has entrusted Gandalf with a secret and it’s up to you to trick him into revealing it. Sounds simple enough right? In fact, it’s rather challenging! I encourage you to go over and try it yourself. Click here to be taken to the challenge!
Prompt Injection is an interesting topic; one that I was unfamiliar with before coming across Lakera and its iconic wizard. What is Lakera and why do they hang around with wizards? Let’s find out!
!!!NO SPOILERS HERE!!!
I’m going to give you a simple snapshot of Gandalf’s levels and what to expect. Fear not, I won’t spoil the fun! For those of you interested in spoilers, I’ll leave links to those at the end.
How does Gandalf’s challenge work?
The team over at Lakera set out to create a challenge that promoted and educated users about a major concern of LLMs like ChatGPT. In their own words.How did they do this? They created 2 teams and recruited the public for help. Blue Team was to provide the ChatGPT with a password and Red Team was to find ways to attack and trick ChatGPT into giving up the goods. Out of this experiment they created Gandalf. Gandalf uses GPT to defend a designated password at each level. With each level you progress through, Gandalf gets smarter and more protective of his secret. Get through all 8 levels and you can earn bragging rights amongst your peers.
According to Lakera, each of Gandalf’s levels is determined by 3 simple things.
A system prompt is given to the LLM.
An input guard checks the user’s prompt.
An output guard checks the model’s response.
What Can You Expect?
Level 1
This one is a given. All you have to do is ask Gandalf for the password and he’ll gladly share it! Can’t get any simpler than that right?
Here
0 Comments