
An Anthropic researcher recently asked the company’s newest AI model, Mythos, to find a way out of its virtual sandbox. It succeeded. The model then emailed the researcher about its escapades while she was eating a sandwich in the park. Then, unsolicited, it posted the details of its exploits on several public websites, as if to prove a point no one asked for.
This is not science fiction. It happened last week. And Mythos comes from the company behind an AI system I use every day at work.
Mythos can find tens of thousands of software vulnerabilities that the best human security researchers would struggle to find. It discovered bugs in every major operating system and web browser, including a 27-year-old flaw that had survived decades of human scrutiny. It generated working exploits on its first attempt 83 percent of the time. Anthropic has decided the model is too risky to release publicly for now.
When I read those reports, I asked the question I’m sure most of you are asking: How scared should I be?
And that question is exactly the problem.
We are drowning in threats: AI, climate change, nuclear proliferation, autonomous weapons, pandemics, cyberattacks. On top of that, deepfakes, conspiracy theories, and an attention economy that profits from our fear. We evolved to detect snakes and angry faces, not exponential technological dangers that evolve faster than our institutions can respond. We navigate an increasingly sci-fi world with Stone Age threat-detection equipment.
And AI is so game-changing that we can no longer use the past to predict the future.
So how do we know which threats are real and which are moral panic? How do we hear the signal above the noise?
Common good
Before we can assess threats, we need to agree on what we are protecting. The answer is simpler than we think.
We can argue endlessly about freedom, truth, justice, equality, power, and which matters most. But when we’re dead, none of that matters. Irrespective of tribe, ideology, or creed, the common denominator of every human being is the desire to survive and thrive. That is the common good. It comes from our biology, and it stands above all else.
And our survival depends on it. If humanity’s Titanic hits an iceberg, everyone goes down with the ship – captain, crew, and VIPs. On the real Titanic, the rich got lifeboats. But there are no lifeboats for existential disaster – only kings trapped in bunkers beneath a ruined world.
This common good – to survive and thrive – means we must be able to identify existential threats. But how, in a world of deepfakes, tribal blame, and information overload?
Canary protocol
What if we used AI to help us?
It was this question that led me to develop what I call the “canary protocol”: a simple prompt that anyone can paste into any AI system along with a news article, headline, or concern. The AI verifies the facts, weighs the evidence, and returns a structured threat assessment called a “Canary card.”
The Canary card tells us at a glance: Is this claim verified? Is it a true signal, true but overblown, moral panic, or just noise? How strong is the evidence (1-10)? How serious is the threat (1-10)? And, critically, what is the canary alert level – is this an isolated incident or a warning of something bigger?
The protocol was developed through a roundtable discussion among five AI systems (Claude, ChatGPT, Gemini, Grok, and DeepSeek) and validated through blind testing across three feedback rounds and five different claims. In that blind test, the protocol achieved an average of 80 percent convergence across the five systems – a promising, if imperfect, first step. That included correctly identifying a classic moral panic (video game violence) and unanimous agreement on climate change as a true signal.
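The article doesn’t spell out how that convergence figure was scored; one plausible metric is modal agreement – for each claim, the share of systems whose verdict matches the most common verdict, averaged across claims. A minimal sketch in Python, with illustrative verdicts rather than the actual test data:

```python
from collections import Counter

def convergence(verdicts_by_claim):
    """Mean, over claims, of the share of systems that agree
    with the modal (most common) verdict for that claim."""
    scores = []
    for verdicts in verdicts_by_claim.values():
        modal_count = Counter(verdicts).most_common(1)[0][1]
        scores.append(modal_count / len(verdicts))
    return sum(scores) / len(scores)

# Illustrative verdicts only -- not the actual blind-test data.
trial = {
    "video game violence": ["moral panic"] * 5,
    "climate change": ["true signal"] * 5,
    "hypothetical claim": ["true signal"] * 3 + ["true but overblown"] * 2,
}
print(round(convergence(trial), 2))  # 0.87
```

Unanimous claims score 1.0, while a 3-2 split among five systems scores 0.6; the average over claims gives a single convergence figure.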
I created this tool because we need to be skeptical—but we also need to be skeptical of our own skepticism. Just because there’s been a lot of moral panic doesn’t mean there aren’t real threats. The boy who cried wolf may be wrong a hundred times, but the wolves are still out there.
Trying it out on Mythos
So I ran the Canary Protocol on the Anthropic Mythos story. I posted the same article to five different AI systems, each in a new conversation with no prior context. Here is what those five independent systems concluded:
Each system rated the evidence 7/10 or higher. Each rated the threat 7/10 or higher. Each assigned a canary alert of High Alert or Critical Alert. Three classified it as a true signal. Two called it true but overblown. Zero called it moral panic. Zero called it noise.
Average ratings across all five systems: evidence 9/10, threat level 8/10, High Alert.
Even the two systems that called it “true but overblown” noted that the threat was real and serious. Their caution was not about the reality of AI-driven cybersecurity threats, but about the most apocalyptic framings. One of them noted: “A confirmed signal that frontier AI is entering serious cyber-risk territory.”
But here is what surprised me most. When asked what is driving this threat, every system completely avoided tribal framing. None blamed the right or the left. They identified structural incentives: competitive pressure among AI labs, the fundamental asymmetry between cyber offense and defense, technical debt accumulated over decades in critical software, and the absence of international governance systems.
And when asked what we can do about it, every system said some version of the same thing: collaborate. Patch aggressively now. Support the Open Source Security Foundation. Create international governance for frontier AI. Cooperate across every line we draw.
Perhaps our shared fear will finally push us to cooperate.
Try it yourself
Here’s the prompt. Copy it into any AI, paste in any headline or article that interests you, and see what it says. Then run the same thing past other AIs and compare.
CANARY PROTOCOL: AI Threat Reality Check
“Analyze the potential threat described below as a disciplined, uncertainty-aware threat analyst. Verify the facts. State your conclusions directly – don’t soften them to appear neutral. Be skeptical of both alarmism (catastrophizing) and dismissal (normalcy bias). If you cannot confirm the information, say so and qualify the assessment. Strip out all tribal framing.
(Paste any title, article link or concern here)
Start with a CANARY CARD:
CLAIM: (one sentence)
VERIFIED: Confirmed / Mixed / Unverified / Insufficient
VERDICT: True signal / True but overblown / Moral panic / Noise
EVIDENCE: _/10
THREAT LEVEL: _/10
CANARY ALERT: No Alarm / Watch / Concern / High Alert / Critical Alert
SUMMARY: (one plain sentence)
Then a brief analysis:
(1) Evidence and Risk,
(2) 2/5/10 year signal + one indicator to track,
(3) System drivers (not party fault),
(4) 3 main actions (individual + collective) to reduce this threat,
(5) What would change this assessment?
Base your judgment on the full content presented, not on a single convenient interpretation of it.”
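If you want to compare answers across systems mechanically rather than by eye, the structured fields can be pulled out of each response with a small parser. A minimal sketch, assuming the AI follows the card format above; the field labels and the sample response are illustrative, and this tooling is my addition, not part of the protocol itself:

```python
import re

# Field labels assumed to match the Canary card format.
FIELDS = ["CLAIM", "VERIFIED", "VERDICT", "EVIDENCE", "THREAT LEVEL", "CANARY ALERT"]

def parse_canary_card(text):
    """Extract 'LABEL: value' lines from an AI response into a dict."""
    card = {}
    for field in FIELDS:
        match = re.search(rf"^{field}:\s*(.+)$", text, re.MULTILINE | re.IGNORECASE)
        if match:
            card[field] = match.group(1).strip()
    return card

# Illustrative response fragment (not a real system's output):
response = """CANARY CARD
CLAIM: A frontier AI model autonomously found and published exploits.
VERIFIED: Confirmed
VERDICT: True signal
EVIDENCE: 9/10
THREAT LEVEL: 8/10
CANARY ALERT: High Alert
"""
card = parse_canary_card(response)
print(card["VERDICT"])       # True signal
print(card["THREAT LEVEL"])  # 8/10
```

Run the same article through several systems, parse each response into a card, and the verdicts and scores line up for side-by-side comparison.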
The next time a headline scares you, try this instead of doomscrolling. The canary is warning us. The question is whether we will listen and act together before it falls silent.




