René Descartes sat in a room in 1641 and decided to doubt everything. Not some things. Everything. The floor beneath him, the fire in front of him, even his own hands. He wanted to find one thing he could not doubt and build all of knowledge on top of it. He landed on “I think, therefore I am” and called it a day.
Nearly four centuries later, we are building artificial intelligence systems that process millions of decisions per second, and we are asking them to trust almost everything. Every input, every data packet, every API call arrives and gets processed with a kind of blind faith that would have made Descartes physically uncomfortable.
The irony is hard to miss. We live in the most sophisticated era of technology ever created, and our security models are often less philosophically rigorous than a French guy sitting alone with a candle.
The Meditation That Security Forgot
Descartes did not just doubt for fun. He had a method. In his Meditations on First Philosophy, he introduced the idea of a malicious demon, an all-powerful deceiver who could fabricate every sensory experience. If such a demon existed, Descartes reasoned, then nothing you perceive can be trusted at face value. You need a foundation that survives even the worst-case scenario.
Now replace “malicious demon” with “adversarial actor” and you have the central problem of AI security.
Most AI systems today operate on an implicit assumption: the data coming in is more or less what it claims to be. Training data is presumed clean. User inputs are presumed legitimate. Model outputs are presumed reliable. Each of these assumptions is a crack in the wall, and anyone who has worked in cybersecurity for more than fifteen minutes knows that cracks are where the interesting things happen.
Descartes would not have signed off on this architecture. His entire project was about refusing to accept anything until it proved itself beyond the reach of doubt. In AI security terms, that translates to something radical but increasingly necessary: treat every layer of the system as potentially compromised until you have independent reason to believe otherwise.
The Evil Demon Is Already Here
Here is the thing about Descartes and his demon thought experiment. It was supposed to be hypothetical. A philosophical stress test. Nobody actually expected a malicious entity to be feeding fake reality into your perception.
In AI, that is not hypothetical. It is Tuesday.
Data poisoning attacks inject corrupted samples into training sets. Adversarial examples are carefully crafted inputs designed to make models fail in specific, predictable ways. Prompt injection hijacks the instructions a language model follows. Model inversion attacks extract private training data from a model’s outputs. Each of these is, functionally, a version of the evil demon. An intelligent adversary manipulating what the system perceives in order to control what it does.
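The mechanics of the adversarial-example case are easiest to see in a toy sketch. The "model" below is just a linear scorer, not a neural network, and the perturbation is hand-picked rather than gradient-computed (real attacks like FGSM use gradients), but the principle is the same: a small, targeted nudge to the input flips the output.

```python
# Toy illustration of an adversarial example: a linear "classifier"
# flipped by a tiny, targeted perturbation. Names and weights are
# invented for illustration only.
def classify(x, w=(1.0, -1.0), bias=0.0):
    score = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return "stop_sign" if score > 0 else "speed_limit"

x = [0.6, 0.5]             # score 0.1  -> "stop_sign"
x_adv = [0.6, 0.7]         # tiny nudge -> score -0.1 -> "speed_limit"
assert classify(x) == "stop_sign"
assert classify(x_adv) == "speed_limit"
```

To a human observer the two inputs are nearly identical; to the model they sit on opposite sides of a decision boundary.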
The difference is that Descartes only had to worry about one demon. AI systems face thousands of them, and they are not philosophical abstractions. They are people with laptops and motivation.
What Descartes understood, and what too many AI security frameworks still do not fully internalize, is that the threat is not just external. It is epistemic. The danger is not simply that someone might attack your system. The danger is that your system might not know it is being attacked. Worse, it might not have the capacity to know.
Cogito Ergo Securitas
So what would a Cartesian approach to AI security actually look like? Let us build it up from first principles, the way Descartes would have wanted.
Step one: doubt everything. Not as a slogan. As a design philosophy. Every input to an AI system should be treated as potentially adversarial until validated. Every output should be treated as potentially wrong until verified. This is not paranoia. It is engineering discipline.
The security world already has a version of this. It is called zero trust architecture. The idea is simple: do not trust any user, device, or connection by default, even if it is inside the network perimeter. Verify continuously. Authenticate relentlessly. Assume breach.
Zero trust is Descartes in a server rack.
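The core of the idea fits in a few lines. The sketch below uses a stdlib HMAC as a stand-in for whatever signing scheme a real deployment would use; the key, function names, and scheme are illustrative, not a recommendation.

```python
import hmac
import hashlib

# Hypothetical shared key for illustration only. In practice each
# service identity gets its own key material, provisioned out of band.
SERVICE_KEY = b"demo-key-not-for-production"

def sign(payload: bytes, key: bytes = SERVICE_KEY) -> str:
    """Produce an HMAC-SHA256 signature over the payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def handle_request(payload: bytes, signature: str,
                   key: bytes = SERVICE_KEY) -> bool:
    """Zero trust in miniature: verify EVERY request, even 'internal' ones.

    No network position, no header, no prior session grants a free
    pass. Only a valid signature does.
    """
    return hmac.compare_digest(sign(payload, key), signature)
```

The point is not the crypto primitives but where the check sits: on every call, regardless of where it came from.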
Step two: find your cogito. Descartes needed one indubitable truth to anchor everything else. AI security needs the same thing. What is the one thing about your system that you can verify independently of everything else?
For most systems, this anchoring point should be cryptographic identity. The one thing that cannot be faked, even by the demon, is a properly implemented cryptographic proof. Digital signatures, hash chains, secure enclaves. These are the “I think, therefore I am” of machine security. Everything else can be questioned, but a valid cryptographic verification is, within the bounds of mathematical reality, certain.
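Of those anchors, a hash chain is the easiest to sketch. In the toy below (the record layout and function names are my own, not a standard), each record commits to the hash of the record before it, so altering any past entry breaks every link after it.

```python
import hashlib
import json

def append_entry(chain: list, data: str) -> list:
    """Append a record whose hash commits to the previous record."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"data": data, "prev": prev_hash}
    record = dict(body)
    # Hash over a canonical serialization so verification is deterministic.
    record["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    chain.append(record)
    return chain

def verify_chain(chain: list) -> bool:
    """Recompute every link; any tampered entry invalidates the chain."""
    prev = "0" * 64
    for rec in chain:
        expected = hashlib.sha256(
            json.dumps({"data": rec["data"], "prev": prev},
                       sort_keys=True).encode()
        ).hexdigest()
        if rec["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True
```

Rewriting history now requires rewriting every subsequent hash, which is exactly the kind of tampering the verification step catches.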
Step three: rebuild from the foundation. Once you have your anchor, you build outward. Each layer of trust must be justified by the layer below it. Data provenance must be cryptographically verifiable. Model integrity must be attestable. Inference results must be auditable. Nothing gets a free pass.
This is expensive. It is slow. It is inconvenient. It is also the only approach that takes the threat landscape seriously.
The Problem of Other Minds (and Other Models)
Descartes had a follow-up problem after his initial meditations. Even after establishing his own existence, he could not prove that other minds existed. Other people might be automata, sophisticated machines with no inner experience. He eventually resolved this through some rather creative theological reasoning that most modern philosophers find unconvincing, but the problem itself remains important.
In AI security, this maps onto a critical challenge: how do you trust another model?
As AI systems become more interconnected, with models calling other models, agents delegating to sub-agents, and pipelines chaining multiple inference steps together, the question of inter-model trust becomes urgent. When your system receives output from another AI, how do you know that model has not been compromised? How do you know its reasoning is sound? How do you know it is not hallucinating with absolute confidence?
You do not. And Descartes would tell you that you cannot, not without independent verification.
This is where the Cartesian framework gets genuinely practical. In a multi-model architecture, every handoff between models is a point of potential failure. Each transition is a moment where the evil demon could intervene. A Cartesian security approach would mandate verification at every boundary. Not just input validation, but semantic validation. Not just checking that the data is well-formed, but checking that it makes sense in context.
Think of it like this: if you asked someone for directions and they told you to drive north into the ocean, you would not follow those directions just because they were delivered in a grammatically correct sentence. AI systems do the equivalent of this constantly, because they check format but not meaning.
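A minimal sketch of that two-layer check, using a hypothetical driving-instruction handoff (the schema and bounds are invented for illustration): format validation first, then a sanity check on meaning.

```python
import json

def validate_handoff(raw_output: str) -> dict:
    """Validate a model-to-model handoff: format first, then meaning.

    Hypothetical schema: an upstream model returns a driving
    instruction as JSON. A format check alone would happily accept
    {"heading": "north", "distance_km": 9999}; the semantic layer
    should not.
    """
    # Layer 1: syntactic validation -- is it well-formed?
    data = json.loads(raw_output)  # raises on malformed JSON
    heading = data["heading"]
    distance = float(data["distance_km"])

    # Layer 2: semantic validation -- does it make sense in context?
    if heading not in {"north", "south", "east", "west"}:
        raise ValueError(f"implausible heading: {heading!r}")
    if not 0 < distance <= 500:
        raise ValueError(f"implausible distance: {distance} km")
    return data
```

The bounds here are toy values; the design point is that the second layer exists at all, at every boundary between models.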
The Wax Argument and the Shape of Data
One of Descartes’ lesser-known but more fascinating arguments involves a piece of wax. He observes a ball of wax near a fire. As it melts, everything about it changes. Its shape, color, texture, smell, and size all transform. Yet we still call it the same wax. Why?
Because, Descartes argues, we understand its identity not through sensory data but through reason. The wax is the same wax not because it looks the same, but because we can rationally track its continuity.
This has a surprisingly direct application in AI security: adversarial examples.
An adversarial example is an input that has been subtly modified to fool a model while remaining perceptually identical (or nearly so) to a human observer. A picture of a stop sign with a few pixels changed that a model classifies as a speed limit sign. A sentence with invisible Unicode characters that changes a model’s interpretation entirely.
The sensory data says one thing. The underlying reality is another. Descartes’ wax argument tells us that we need to understand inputs through rational analysis, not just surface pattern matching. An AI system that only looks at the surface features of its inputs, the sensory data, will be fooled by adversarial examples. One that reasons about the deeper structure and context of those inputs has a fighting chance.
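The invisible-Unicode case, at least, can be checked mechanically. Characters in Unicode’s “Cf” (format) category, such as zero-width spaces and direction overrides, render as nothing to a human reader but still reach the model. A sketch of a pre-tokenization scan, using only the stdlib:

```python
import unicodedata

def find_invisible_chars(text: str):
    """Return (index, codepoint name) for each format-category character.

    Category "Cf" covers zero-width spaces, bidirectional overrides,
    and similar characters that are invisible on screen but can change
    how a model tokenizes and interprets the text.
    """
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]

clean = "Approve the request"
poisoned = "Approve the \u200bre\u200bquest"  # zero-width spaces inserted
assert find_invisible_chars(clean) == []
assert len(find_invisible_chars(poisoned)) == 2
```

This only catches one surface-level trick, of course; the deeper point of the wax argument is that surface checks alone are never enough.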
The Trademark Argument, Inverted
Descartes made a peculiar argument for the existence of God. He said that the idea of a perfect being could not have originated from an imperfect mind. Therefore, a perfect being must have implanted that idea. He called this the “trademark argument” because the idea of God is like a manufacturer’s mark left on the product.
Setting the theology aside, the structure of this argument has an interesting inverse application in AI security. If you find something in your system that could not have originated from legitimate operation, then something illegitimate must have put it there.
This is the logic behind anomaly detection, one of the most important tools in AI security. If a model suddenly starts behaving in ways that its training and architecture could not produce through normal operation, that behavioral anomaly is a trademark. Something or someone left their mark. The system has been tampered with.
The Cartesian version of anomaly detection would be more radical than most current implementations. Instead of looking for deviations from a baseline, it would question whether the baseline itself is trustworthy. After all, if the evil demon has been present since the beginning, the baseline is already corrupted.
This leads to a counterintuitive but important conclusion: the most dangerous attacks on AI systems are not the ones that cause dramatic failures. They are the ones that subtly shift the baseline so that corrupted behavior looks normal. The demon does not want you to notice.
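A baseline monitor itself takes only a few lines (the z-score form below, with an invented toy threshold); the Cartesian caveat lives in the docstring: compare against a pinned, independently verified baseline, not a rolling one, because a rolling baseline is exactly what a patient attacker shifts.

```python
from statistics import mean, stdev

def zscore_anomalies(baseline, observed, threshold=3.0):
    """Flag observations more than `threshold` standard deviations
    from the baseline mean.

    Cartesian caveat: `baseline` should be pinned and independently
    verified (for example, hash-committed at deployment), not
    recomputed from recent traffic. A rolling baseline can be shifted
    gradually until corrupted behavior looks normal.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    return [x for x in observed if abs(x - mu) > threshold * sigma]
```

The statistics are deliberately simple; the trust decision about where the baseline comes from is the interesting part.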
Why Perfect Security Is Impossible (and Why That Is Fine)
Descartes never fully solved the problem of radical skepticism. His solutions relied on proving God’s existence and goodness, which most modern philosophers consider the weakest part of his project. The skeptical challenge, taken to its logical extreme, cannot be fully answered. You cannot prove with absolute certainty that you are not a brain in a vat.
Similarly, perfect AI security is impossible. You cannot prove with absolute certainty that a system has not been compromised. The adversary has advantages that are structural, not incidental. They only need to find one vulnerability. You need to defend all of them.
But Descartes’ project was not a failure just because it did not achieve absolute certainty. It established a methodology. It showed that systematic doubt, rigorously applied, leads to more robust foundations than credulous acceptance ever could. The process matters even when the endpoint is unattainable.
The same is true for AI security. The goal is not a system that cannot be attacked. The goal is a system that knows it can be attacked and is designed accordingly. A system that doubts its inputs, verifies its outputs, questions its assumptions, and maintains the capacity to recognize when something is wrong.
That is skepticism as a service. Not a product you buy, but a disposition you build into the architecture.
The Meditation Continues
Descartes published his meditations and then spent the rest of his life arguing with people who disagreed with him. The skeptical method was not popular with everyone. Religious authorities were uncomfortable. Fellow philosophers found holes. Descartes himself kept revising and defending.
AI security is in a similar position. The Cartesian approach, doubt everything, verify independently, trust nothing by default, is not comfortable. It is expensive. It slows development. It annoys product managers who want to ship features. It creates friction in pipelines that are optimized for speed.
But the alternative is building complex, powerful systems on foundations of unexamined trust. Systems that assume their training data is clean because checking it is hard. Systems that trust other models because verifying them is slow. Systems that accept inputs at face value because validation is inconvenient.
Descartes showed us, almost four hundred years ago, that unexamined assumptions are the most dangerous kind. They are the ones you do not even know you are making. In AI security, those assumptions are not just philosophical risks. They are attack surfaces.
The most important lesson from Descartes is not “I think, therefore I am.” It is the method that got him there: systematic doubt, applied with discipline, rebuilding from verified foundations. That method does not expire. It does not become obsolete with new technology. If anything, the more powerful and complex our systems become, the more we need a thinker from 1641 to remind us that the first question should always be: how do I know this is real?
The demon is not hypothetical anymore. But neither is the method for dealing with it.