Is Condorcet the Real Father of Modern Data Science?

There is a quiet injustice in the history of ideas. The people who plant the seeds rarely get credit for the forest. Ask anyone to name the father of data science, and you will hear familiar names. Maybe Thomas Bayes. Maybe John Tukey, whose 1962 paper “The Future of Data Analysis” helped define the field. Maybe some Silicon Valley figure who figured out how to sell ads by tracking your browsing habits. Almost nobody says Condorcet.

And that, frankly, is the problem.

Marie Jean Antoine Nicolas de Caritat, Marquis de Condorcet, was an 18th-century French mathematician, philosopher, and political revolutionary. He died in a prison cell in 1794, possibly by his own hand, after being hunted by the very revolution he helped intellectualize. History remembers him mostly as a footnote to the French Revolution or as the person behind a voting paradox that political scientists occasionally reference. But if you look carefully at what he actually wrote and what he actually tried to build, the picture changes. Condorcet was not just a man of his time. He was a man trying to invent ours.

The Jury Theorem That Refused to Stay Small

Condorcet’s most famous contribution is his Jury Theorem, published in 1785. The idea is deceptively simple. If each member of a group votes independently and has a better than even chance of being correct about a yes-or-no question, then the probability that the majority of the group is correct increases as the group gets larger. In the limit, it approaches certainty.
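To make the claim concrete, here is a minimal sketch of the calculation in Python. It is an illustration, not Condorcet's own notation. It assumes every voter is correct with the same probability p, that votes are independent, and that the group size n is odd so ties cannot occur. The function name majority_correct_probability is made up for this example.

```python
from math import comb


def majority_correct_probability(p: float, n: int) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumes each voter is correct with the same probability p and that
    votes are independent. n must be odd so that ties cannot occur.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    smallest_majority = n // 2 + 1
    # Add up the binomial probability of exactly k correct votes
    # for every k large enough to form a majority.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(smallest_majority, n + 1)
    )


if __name__ == "__main__":
    # Each voter is only slightly better than a coin flip.
    for group_size in (1, 3, 11, 101):
        print(group_size, majority_correct_probability(0.55, group_size))
```

Running it with a competence just above one half shows the majority's accuracy climbing as the group grows. Pass a value below one half and the same formula runs in reverse: larger groups become more reliably wrong.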

Read that again slowly. What Condorcet described is not just a quirky mathematical result about voting. It is the foundational logic behind ensemble methods in machine learning. Random forests, boosting algorithms, and majority vote classifiers all draw on a closely related principle. You take a collection of weak decision makers, none of them perfect, and you combine their outputs. As long as their errors are not too strongly correlated, the aggregate is smarter than any individual.

Condorcet figured this out more than two centuries before anyone trained a random forest.

Now, to be fair, he did not frame it that way. He was thinking about juries and democratic assemblies, not about training data or classification accuracy. But the structural insight is identical. The mathematics does not care whether the “voters” are French citizens or decision trees. The same logic applies. And this raises an uncomfortable question for the data science community: why do we credit the people who implemented this logic in code but not the person who discovered the logic itself?

Here is where Condorcet gets genuinely radical, and where most historians stop paying attention.

In his 1785 work, Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix, Condorcet did something that almost nobody else was doing at the time. He applied probability theory to human decision making. Not to dice. Not to cards. Not to celestial mechanics. To people. To the messy, irrational, politically charged business of how groups form beliefs and reach conclusions.

This was, for the 18th century, conceptually outrageous. Mathematics was supposed to describe the orderly universe. It was supposed to calculate the orbits of planets or the trajectory of a cannonball. Condorcet looked at the same toolkit and said: let us use this to figure out whether twelve jurors are likely to convict an innocent man. Let us quantify the reliability of collective judgment.

That shift in application is arguably the most important intellectual move in the entire prehistory of data science. Because data science is not, at its core, about data. It is about using quantitative methods to make better decisions under uncertainty. Condorcet did not just dabble in this idea. He built his entire intellectual project around it.

The Condorcet Paradox and the Limits of Aggregation

Of course, Condorcet was not naive. He also discovered something deeply inconvenient. The Condorcet Paradox shows that collective preferences can be cyclic, and in that sense irrational, even when every individual in the group holds perfectly consistent preferences.

Imagine three voters ranking three options. Voter one prefers A over B over C. Voter two prefers B over C over A. Voter three prefers C over A over B. The majority prefers A to B. The majority also prefers B to C. So the majority must prefer A to C, right? No. The majority prefers C to A. The group’s preferences form a cycle. There is no coherent winner.
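The cycle is easy to verify by counting head-to-head contests. The sketch below is a hypothetical illustration in Python. The ballots are the three voters from the example above, and the helper names pairwise_winner and condorcet_winner are invented for this purpose.

```python
from itertools import combinations

# Each ballot lists the options from most to least preferred.
ballots = [
    ["A", "B", "C"],  # voter one
    ["B", "C", "A"],  # voter two
    ["C", "A", "B"],  # voter three
]


def pairwise_winner(ballots, x, y):
    """Return whichever of x or y more voters rank higher, or None on a tie."""
    x_votes = sum(1 for ballot in ballots if ballot.index(x) < ballot.index(y))
    y_votes = len(ballots) - x_votes
    if x_votes == y_votes:
        return None
    return x if x_votes > y_votes else y


def condorcet_winner(ballots):
    """Return the option that beats every other option head to head, or None."""
    options = sorted(ballots[0])
    for candidate in options:
        rivals = [option for option in options if option != candidate]
        if all(pairwise_winner(ballots, candidate, rival) == candidate for rival in rivals):
            return candidate
    return None


if __name__ == "__main__":
    for x, y in combinations(sorted(ballots[0]), 2):
        print(f"{x} vs {y}: majority prefers {pairwise_winner(ballots, x, y)}")
    print("Condorcet winner:", condorcet_winner(ballots))
```

With these ballots, every option loses at least one head-to-head contest, so condorcet_winner returns None. Change a single ballot and a winner can reappear, which is a reminder that the cycle depends on the particular mix of preferences rather than on majority voting always failing.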

This is not a minor technicality. It is a fundamental discovery about the limits of aggregation. And it maps directly onto problems that data scientists deal with every day. When you try to combine multiple ranking signals into a single recommendation, you face exactly this kind of inconsistency. Recommendation engines, search ranking algorithms, and multi-criteria optimization problems all have to wrestle with the ghost of Condorcet’s paradox.

What makes this especially interesting is the emotional shape of the discovery. Condorcet was a true believer in rational collective decision making. He devoted his life to the idea that mathematics could make democracy work better. And then his own mathematics showed him that collective rationality has hard, structural limits. The tool he trusted most gave him an answer he did not want.

If that is not a cautionary tale for anyone building AI systems today, nothing is.

Condorcet had a name for what he was trying to do. He called it mathématique sociale, social mathematics. The idea was to create a unified science that would apply mathematical reasoning to social phenomena, including economics, law, education, and governance. He envisioned a discipline that would collect data about populations, analyze it rigorously, and use the findings to inform rational policy.

Does that sound familiar? It should. Replace “social mathematics” with “data science” and you have a pitch deck from 2015.

But Condorcet never got the chance to fully develop it. The French Revolution, which he supported passionately, turned on him. He was a moderate in a time that did not tolerate moderates. His political allies became his political enemies. He spent his final months hiding in a boarding house, still writing, still thinking, still trying to complete his vision of human progress through reason.

The irony is thick enough to choke on. The man who believed most fervently that data and reason could improve society was destroyed by a society that chose passion over reason. And his intellectual project was buried with him, only to be reinvented, in fragments, by people who had never read a single word he wrote.

The Connection Nobody Talks About

Here is something that tends to surprise people. Condorcet was not working in isolation. He was close friends with Voltaire. He was the intellectual protégé of d’Alembert, co-editor of the famous Encyclopédie. He corresponded with Thomas Jefferson and Benjamin Franklin about democratic theory. He was, in every sense, at the center of the Enlightenment’s nervous system.

This matters because it places his mathematical work in a broader intellectual context. Condorcet was not a pure mathematician who happened to stumble into social questions. He was a social thinker who saw mathematics as the most powerful tool available for answering the questions he cared about most. The questions came first. The math came second.

This is exactly the opposite of how most data science is practiced today, where the tools often come first and the questions come second. We have unprecedented computational power, oceans of data, and remarkably sophisticated algorithms. What we frequently lack is the kind of philosophical clarity that Condorcet brought to the table. He always asked: what are we trying to decide, and how can quantitative evidence help us decide it better?

There is a connection here to something seemingly unrelated: the modern replication crisis in social science. One of the key drivers of that crisis is researchers using powerful statistical tools without clearly thinking through what question they are actually answering. They run the analysis and then go looking for a narrative. Condorcet would have found this appalling. For him, the question always had to be specified with precision before any calculation began. The discipline he imagined was rigorous precisely because it started with intellectual honesty about what was being asked.

Why He Got Forgotten

So why is Condorcet not a household name in data science? Several reasons, none of them particularly flattering to the field.

First, he wrote in French, during a period when French mathematics was dominant but French political philosophy was becoming radioactive. After the Revolution and the Napoleonic Wars, European intellectual culture had complicated feelings about French radical thinkers. Condorcet got lumped in with a tradition that people wanted to move past.

Second, his work was genuinely ahead of its time. The computational infrastructure needed to implement his ideas simply did not exist. You cannot run ensemble methods on paper. You cannot build a recommendation engine with a quill pen. His theoretical insights were correct but impractical for another two centuries. And humans have a persistent habit of giving credit to the person who builds the working version, not the person who had the idea first.

Third, and this is the most painful one, data science as a field has a remarkably shallow sense of its own history. It tends to trace its lineage back to statistics departments and computer science labs in the mid-to-late 20th century. The idea that an 18th-century philosopher might have been doing something genuinely foundational to the field does not fit the standard origin story. And origin stories, once established, are stubbornly resistant to revision.

The Case Against

In fairness, there are legitimate reasons to hesitate before crowning Condorcet the father of modern data science.

He did not work with large datasets. He did not build predictive models in any operational sense. He did not develop algorithms that could be implemented at scale. His “social mathematics” remained more of a philosophical program than a technical discipline. And many of his specific proposals, particularly around voting systems, turned out to be computationally intractable for realistic scenarios.

Moreover, calling any single person the “father” of a discipline as broad as data science is always a bit absurd. The field draws on statistics, computer science, information theory, optimization, domain expertise, and more. No one person invented all of that. Not Bayes. Not Tukey. Not Condorcet.

But the question is not whether Condorcet single-handedly created data science. The question is whether his contribution has been appropriately recognized. And the answer to that is clearly no.

What Condorcet Offers That We Still Need

Perhaps the most valuable thing about revisiting Condorcet is not the specific theorems. It is the attitude.

Condorcet believed that quantitative reasoning should serve human flourishing. He was not interested in math for its own sake. He was interested in reducing suffering, improving justice, expanding education, and making collective decisions less stupid. Every calculation was in service of a larger humanistic vision.

Modern data science could use more of that. We have become extraordinarily good at optimizing metrics. We can maximize click-through rates, minimize churn, and predict purchasing behavior with frightening accuracy. What we are less good at is asking whether the metrics we are optimizing are the right ones. Condorcet would have insisted on that question. He would have demanded that the math serve a clearly articulated vision of the good.

There is also something valuable in his willingness to discover uncomfortable truths. The Condorcet Paradox was, for him, a personal blow. It undermined the very thing he was trying to prove. But he published it anyway. He let the mathematics speak even when it told him something he did not want to hear. In an era when data is routinely tortured until it confesses to whatever the analyst wants, that kind of intellectual courage is not just admirable. It is necessary.

The Verdict

Is Condorcet the real father of modern data science? Probably not in any exclusive sense. But he is certainly one of its most important and most neglected ancestors. He understood, before almost anyone else, that the combination of probability theory, decision science, and large group dynamics could yield insights that no single mind could reach alone. He saw that quantitative reasoning could be applied to social questions. He discovered both the power and the fundamental limits of aggregation.

He did all of this in a century that could not yet appreciate what he was offering. And then he died in a prison cell, and the world forgot.

The least we can do, two and a half centuries later, is remember.
