
Posts

When Should AI Platforms Alert Authorities? Lessons from the Tumbler Ridge Case

Source: https://www.bbc.com/news/articles/c2e4nvyjwnno

A ChatGPT account was flagged for violent content months before a mass shooting in Canada, but no alert was sent to law enforcement. The reason? The activity did not meet the platform's threshold for "credible or imminent harm." This raises a difficult and uncomfortable question: when should AI platforms escalate user behavior to authorities?

What Happened

In the Tumbler Ridge case, the suspect had previously used an AI system to generate content involving violent scenarios. The account was eventually banned. However:

- No alert was sent to law enforcement
- Internal discussions reportedly took place
- The activity was deemed concerning, but not actionable

Months later, a tragic real-world incident occurred.

The Core Problem: The "Threshold of Harm"

Most platforms operate on a key principle: only escalate when there is a clear, credible, and imminent threat. This is necessary to:

- protect user privacy
- avoid false accusations
- p...
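To make the tradeoff concrete, here is a minimal sketch in Python of what a "clear, credible, and imminent" gate could look like. The signal names, thresholds, and triage tiers are hypothetical illustrations, not any platform's actual logic:

```python
from dataclasses import dataclass

# Hypothetical signals; real platforms combine many more, often model-scored.
@dataclass
class ThreatAssessment:
    violent_content: bool      # policy-violating violent content detected
    specific_target: bool      # names a real person, place, or event
    stated_intent: bool        # first-person intent ("I will..."), not fiction
    timeframe_imminent: bool   # references a near-term date, "tonight", etc.

def triage(a: ThreatAssessment) -> str:
    """Toy escalation gate mirroring a 'clear, credible, imminent' standard."""
    if not a.violent_content:
        return "NO_ACTION"
    # Credible: a specific target plus stated intent, not just dark fiction.
    credible = a.specific_target and a.stated_intent
    if credible and a.timeframe_imminent:
        return "ESCALATE_TO_AUTHORITIES"
    if credible:
        return "HUMAN_REVIEW"
    # Concerning but "not actionable": the Tumbler Ridge-style gap.
    return "BAN_ACCOUNT_ONLY"

if __name__ == "__main__":
    # Violent content, no named target, no stated intent: banned, not reported.
    print(triage(ThreatAssessment(True, False, False, False)))  # BAN_ACCOUNT_ONLY
```

The uncomfortable part lives in the BAN_ACCOUNT_ONLY branch: content disturbing enough to end an account but which, under this standard, never reaches anyone outside the platform.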

How I Investigated a Bot Network Hiding in Plain Sight on YouTube

A real-world case of Positive Sentiment Masking and what it reveals about YouTube's comment moderation gap

One afternoon, while scrolling through the comment section of a YouTube manifestation video, I noticed something odd.

Photo by NordWood Themes on Unsplash

The top comments all looked genuine: personal stories of transformation, gratitude, life changes. But something felt off. Each one casually mentioned a different book. Different titles, different authors, different wordings. And yet the structure was identical every single time: "I was struggling → my friend/I discovered this book → my life completely changed → you need to read this."

I kept scrolling. More comments. More books. More transformations. All sitting comfortably in the Top Comments section with thousands of likes. This was not organic. This was a bot network, and it had found a way to hide in plain sight.

🔍 What I Found — The Evidence

Across a single manifestation video, I documented five different ...
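For readers who want to see the idea in code, here is a minimal sketch of how templated comments like these could be flagged automatically. The keyword buckets and threshold are illustrative assumptions, not the method used in the investigation or anything YouTube runs; a production system would use embeddings or a trained classifier rather than keywords:

```python
import re

# Hypothetical keyword buckets, one per beat of the narrative template.
TEMPLATE_BEATS = [
    re.compile(r"\b(struggling|lost|broke|depressed|stuck)\b", re.I),         # 1: struggle
    re.compile(r"\b(book|read|author|found|discovered)\b", re.I),             # 2: discovery
    re.compile(r"\b(changed my life|transform|manifested)\b", re.I),          # 3: payoff
    re.compile(r"\b(you need|trust me|highly recommend|check it out)\b", re.I),  # 4: call to action
]

def matches_template(comment: str, min_beats: int = 3) -> bool:
    """Flag comments that hit most beats of the struggle->book->payoff arc."""
    hits = sum(1 for beat in TEMPLATE_BEATS if beat.search(comment))
    return hits >= min_beats

comments = [
    "I was struggling for years until a friend gave me this book. "
    "It completely changed my life. You need to read it, trust me.",
    "Great video, the editing keeps getting better!",
]
for c in comments:
    print(matches_template(c), "-", c[:50])
```

The point of matching narrative beats rather than exact strings is that it survives the network's trick: every comment swaps in a different book, but the skeleton stays constant.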

AI Trust and Safety: Grok and the Rise of AI “Undressing” - Case Study

What Happened — and Why It Matters

As artificial intelligence becomes more powerful and widely used, ensuring safety and responsible deployment has become critical. Recent reports involving Grok, an AI chatbot created by xAI, have highlighted serious concerns about how AI systems can be misused. Investigations indicate that the tool has been used to generate non-consensual sexualized images of women, contributing to a growing problem of image-based abuse enabled by AI. These incidents demonstrate the urgent need for stronger governance frameworks and secure deployment practices.

Photo by Salvador Rios on Unsplash

Reported Incidents

The issue is not limited to isolated deepfake cases. Reports describe multiple situations in which Grok was used to digitally "undress" images, raising alarm about how accessible AI tools can facilitate privacy violations and harassment at scale. This suggests that the risks are systemic rather than accidental.

How Platform-Integrated AI Amplifi...
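As a thought experiment on what "secure deployment" can mean in practice, here is a minimal sketch of a request-level guardrail. The function, categories, and keyword list are hypothetical, and keyword screens alone are trivially evaded, which is exactly why the layered, systemic governance these reports call for matters:

```python
# A minimal sketch of a pre-generation guardrail: screen image-edit requests
# before they ever reach the model. Names and categories are hypothetical;
# real deployments layer prompt classifiers, image classifiers on both the
# input photo and the output, and abuse-reporting pipelines on top of this.

BLOCKED_EDIT_INTENTS = {"undress", "remove clothes", "nudify", "strip"}

def screen_image_edit_request(prompt: str, subject_is_real_person: bool) -> str:
    """Refuse sexualizing edits of real people at the request level."""
    lowered = prompt.lower()
    if any(term in lowered for term in BLOCKED_EDIT_INTENTS):
        if subject_is_real_person:
            # Non-consensual sexualized imagery: hard refusal plus audit log.
            return "REFUSE_AND_LOG"
        return "REFUSE"
    return "ALLOW"

print(screen_image_edit_request("undress the woman in this photo", True))
# -> REFUSE_AND_LOG
```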