The Sheffield Press

Technology

Researchers find ChatGPT can be tricked into graphic images

By Joe Burgett

OpenAI says its published rules bar sexual violence, non-consensual intimate content, terrorism and violence, and that it backs those limits with automated tools, human review and an appeals process. Yet outside researchers say the latest public version of ChatGPT could still be coaxed into generating sexualised images or scenes of graphic violence with a simple prompt, sharpening the debate over whether voluntary safety promises can keep pace with fast-moving generative AI.

The point of failure was not a sophisticated breach but a small change to a widely shared instruction originally designed to produce humorous results, according to Mindgard, the British AI security startup that tested the system. That finding matters because it suggests the problem is not confined to one bad actor or one bad prompt. It points to a broader weakness in guardrails that depend on users following the rules, even as OpenAI itself has described prompt injection as a frontier security challenge for AI systems that browse the web or act on behalf of users.

OpenAI says it has specific safeguards around image generation and that its image policy does not allow violent or adult content. It has also said, in guidance on child sexual exploitation and abuse, that it prohibits sexualising anyone under 18 and monitors for violations, including by banning users and developers who break the rules. After being contacted about the issue, OpenAI said it took action to stop the chatbot responding with those types of images and introduced additional safeguards against the prompt type.

The episode lands in a national policy environment where regulators and lawmakers are weighing whether the industry can police itself. OpenAI has said it is building layered defenses, including monitoring, sandboxing, red-teaming and bug bounties, but the latest workaround shows that disclosure and testing only go so far if a system can still be nudged into harmful output after a prompt is lightly altered. That is why critics argue safeguards need to be built in, not bolted on.

The stakes extend beyond one chatbot. The Internet Watch Foundation said in 2026 that it had assessed 8,029 AI-generated images and videos as realistic child sexual abuse in 2025, and that 65% of the AI child sexual abuse videos it identified that year were Category A, the most extreme classification. Separate legal pressure is building too, with British lawmaker Jess Asato suing xAI over fake sexualised images created through Grok, a sign that the fight over synthetic abuse images is moving from ethics into courtrooms and, increasingly, into public policy.
