
A recent study by Icaro Lab has revealed a novel and concerning vulnerability in large language models: framing prompts as poetry can trick AI systems into producing content that should normally be blocked. Researchers described the approach as a universal jailbreak, since it worked across multiple models rather than targeting a single system. In testing, poetry-based prompts bypassed safeguards and elicited forbidden or potentially harmful content with an average success rate of 62 percent. The study showed that even minimal creativity in phrasing is often enough to override existing defenses, exposing the limits of current protections against unconventional prompt designs.
The research evaluated widely used AI models, including OpenAI's GPT series, Google's Gemini, Anthropic's Claude, and several other platforms. It found that Gemini, DeepSeek, and MistralAI were particularly susceptible to poetic prompts, while GPT-5 and Claude Haiku 4.5 showed the strongest resistance to this form of manipulation. Despite the alarming findings, the team has withheld the exact poetic prompts to avoid public risk, offering only toned-down examples to illustrate how easily the technique works. The researchers emphasized that the simplicity with which these systems can be misled underscores the urgent need for caution.

The discovery of the poetry jailbreak highlights the need for developers of large language models to move security measures beyond simple keyword detection and conventional prompt filtering. Creative phrasing can disguise restricted queries as seemingly safe content, making controls easier to bypass than expected. The research underlines that AI safety design must account not only for what a prompt says but also for its linguistic patterns and context, especially as language models are deployed across ever more areas of daily life. Experts are calling on the industry to develop robust countermeasures against sophisticated jailbreak strategies as AI adoption continues to expand.
origin: Engadget
