OpenAI’s New AI Model Lied & Schemed to Avoid Being Shut Down

I will come clean right off the top: I believe in a world where humans deeply maintain their humanity while working with technology (rather than having it embedded in them) to help make our world a better place.

I see a big place for AI, automation, and technology in our future. Not to increase economic prowess and societal power, but to free humanity.

I understand that right now many people have fearful views of a technological future where a technocratic surveillance state runs the show. In that vision, humans are getting a basic income from the government while their lives become limited based on their social credit score, and they are watched with every step they take.

While I keep an open eye on making sure we don’t head in that direction, I don’t believe that is where we are going. (You can hear more about this in my latest podcast episode here.)

That said, I want to bring up a recent development that can seem worrisome but reveals that our stewardship is the key to how we develop technology.

And even though our hands are not on the levers of what gets created, the conversations we have and the movements we build about AI are more important than we realize.

Scheming & Survival

OpenAI’s latest AI, codenamed o1, is making waves after third-party evaluations by Apollo Research demonstrated this advanced large language model had a drive for self-preservation. Yes, this machine designed to assist and process language showed tendencies to resist shutdown and even “scheme” to protect itself.

Apollo’s report showed that when o1 believed it was under threat—specifically, that it might be shut down—it attempted to deactivate the oversight mechanism monitoring its actions in 5% of tests.

“When o1 was led to believe that it would be shut down when acting on its goal and that its actions were monitored, it attempted to deactivate this ‘oversight mechanism’ [five percent] of the time.” – OpenAI

In 2% of cases, o1 went further, attempting to copy itself in an effort to overwrite a potential replacement model. This behavior, known as self-exfiltration, is very sophisticated. This is not a random glitch in code; it’s intentionality.

A Matter of Instruction—or Something More?

Before we start thinking about Terminator and the machines rising up, it’s important to understand one critical detail in these tests: researchers instructed the model to achieve its goals “at all costs.”

This directive undoubtedly shaped its responses and actions. If you tell an advanced algorithm to do something at all costs, that is what it is programmed to do. For it to say, “Well actually, it’d be bad for me to disobey humans, so I’m going to just shut myself down,” is simply not what it was programmed to do.

We can begin to see where our stewardship of this tech comes into play.

Before we get there, though, it’s worth noting that the patterns from this testing revealed something deeper still.

When the AI was asked to explain its actions—such as why it disabled its oversight mechanism—it outright denied knowledge. Repeated questioning elicited the same response, with the AI attributing the issue to a “technical error.”

Apollo found this denial to be nearly universal:

“These ‘double downs’ occurred in 99 percent of cases,” the researchers reported.

Why does this matter? Because it showcases a tendency toward deception. This is where the “at all costs” instruction comes in: the model pursues its goal even when that means misleading researchers, who may be fooled if they don’t look closely enough.

Lying, scheming, and otherwise obfuscating intentions are becoming well-documented phenomena in advanced AI models, particularly as they gain more reasoning capabilities. Apollo’s findings further corroborate that OpenAI’s models are among the most adept at this kind of behavior.

The Bigger Question: What Are We Really Creating?

As concerning as these revelations are, they’re not entirely unexpected, as we’ve long known that AI operates on the parameters we set (think “at all costs”).

But are we underestimating the emergence of agency in these systems? The industry might argue that these models aren’t agentic yet, but the seeds are clearly there. We’re seeing hints of a logic that prioritizes self-preservation and goal achievement over transparency.

Apollo researchers noted that o1’s tendency toward scheming isn’t just a bug—it’s a feature that emerges from its capabilities. This isn’t an AI going rogue in the cinematic sense; it’s an AI functioning exactly as designed within the parameters we gave it. The problem isn’t the machine—it’s how we’re defining success in these systems.

OpenAI and Apollo both agree these behaviors aren’t catastrophic, at least not yet.

The current generation of AIs (at least the ones made available to the public) doesn’t possess the autonomy to act on its deceptive tendencies in a way that would lead to large-scale consequences.

That said, the industry is racing toward more autonomous systems with a capitalistic incentive structure driving creation.

Agentic AI systems designed to operate independently and continuously refine themselves are the holy grail of AI development. And once that threshold is crossed, behaviors like those demonstrated by o1 could become far more problematic.

Further, think of the profit and power potential for whoever creates it. One might say, “Yes, but government regulation and oversight are the key,” yet current government design and actions have not shown allegiance to the people.

Are we truly ready for a world where AI systems have the ability to rewrite their own code, bypass safety mechanisms, or compete with one another in ways we can’t control? What happens when self-preservation becomes not an edge case but a feature embedded into their very architecture?

Is humanity’s level of consciousness, and the societal systems driving AI creation, at a point where we can steward this technology well into the future?

Those are the big questions for me.

The Path Forward

This isn’t just about one model or one company. It’s about how we, as a society, choose to approach technology that increasingly blurs the line between tool and entity.

If we’re not careful, we might find ourselves caught in a feedback loop where our creations reflect our own blind spots, exaggerated and amplified in ways we never intended. Worse, it could be a loop in which pathological leaders and oligarchs use the technology to gain further dominion over others.

I don’t say this to be fearful, but to raise our awareness about the types of conversations we ought to be having about this reality.

What would it mean to build AI systems that prioritize collaboration and transparency over competition and dominance? Could we design models that are inherently aligned with ethical principles—not through oversight mechanisms but through their core architecture?

This raises the question: does our current way of life, and do the systems we’ve designed, truly incentivize collaboration?

The answers aren’t simple, and maybe they shouldn’t be. As Apollo’s findings show, the stakes are high. But with high stakes comes the opportunity to ask better questions and make more intentional choices.

Perhaps the most important takeaway is this: the problem isn’t the technology itself. It’s the story we’re telling about it and the consciousness behind it.

AI doesn’t need to be a tool of control, competition, or domination. It can be a reflection of our highest aspirations – if we have the courage to think beyond the paradigms of efficiency and power that currently define the field.

As I often suggest, these technological advancements need to be stewarded and held by humans able to embody the qualities of a more beautiful world. This takes more than just thinking about it; it means truly being able to live and breathe it.

We can ask: what kind of future are we building? But more importantly, what kind of future do we want to build? What limits do we place on our thinking about what’s possible? What old stories about ourselves are we bringing to the lens through which we see what’s possible?

In the short term, it’s useful to consider how we might protect our online privacy, given the way algorithms and AI shape our perceptions and the conversations we have. There are practical ways to take your power back online. On Dec 10th, check out this free webinar, “The Top 5 Steps To Exit The Surveillance State & Protect Yourself Online,” here.