When AI Stops Testing and Starts Creating

https://www.youtube.com/watch?v=qebUG3nHQAI

The Discovery Era Begins — When AI Stops Testing and Starts Creating

How OpenAI’s solution to an 80-year-old math problem signals the end of benchmarks and the birth of genuine scientific innovation

From Benchmarks to Breakthroughs: The Moment Everything Changed

May 20th, 2026, will be remembered as the day artificial intelligence crossed a fundamental threshold. On this date, AI didn’t just pass another test or beat another human record: it created original knowledge that didn’t exist before. This wasn’t a prediction about what AI might do someday. This was the moment it actually happened.

For years, the AI narrative has followed a predictable pattern: machines excel at retrieval tasks. They’re exceptional at finding answers buried in vast databases, recognizing patterns in existing data, and predicting future outcomes based on historical trends. Think of it like a supremely talented librarian who can instantly locate any book on any shelf. Impressive? Absolutely. But fundamentally, the librarian is still retrieving information that already exists.

Creation is different. Generating new proofs, discovering previously unknown mathematical relationships, and advancing genuine scientific understanding represent a completely different category of achievement. May 20th marked the moment when general-purpose reasoning models accomplished what specialized systems had never managed: autonomous discovery of original knowledge.

What makes this breakthrough truly historic isn’t what it predicts about the future. It’s what it is right now. The proof existed in the mathematical landscape long before humans or machines found it. But finding it required the kind of reasoning that bridges multiple disciplines, connects unexpected insights, and pursues logical threads into uncharted territory.

This discovery proved that you don’t need purpose-built, specialized systems to make scientific breakthroughs. A general-purpose model—one trained broadly across multiple domains—could achieve what narrowly focused systems could not. The paradigm has shifted from artificial intelligence as a tool for solving human problems to artificial intelligence as a genuine partner in scientific discovery itself.

The Erdős Problem: Eight Decades of Geometric Intuition

In 1946, legendary mathematician Paul Erdős posed what seemed like an innocent question: given n points in the plane, what is the largest possible number of pairs that sit exactly one unit apart? The simplicity of the question masked its profound difficulty. For eight decades, the problem resisted every attempt at a solution, becoming one of mathematics’ most stubborn puzzles.

Throughout those eighty years, mathematicians approached the problem with remarkable consistency. They explored grid-based arrangements, lattice patterns, and carefully structured geometric configurations. Each new attempt followed the same logical path: optimize within the Euclidean framework, refine the calculations, and hope for a breakthrough insight. Yet the ceiling remained unmoved.
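To make the question concrete, here is a small Python sketch that counts unit-distance pairs in two of the structured arrangements mentioned above: a square grid and a triangular lattice. It illustrates the problem statement only and is not the method used in the proof. The function names and the patch size are illustrative assumptions. Lattice points are stored as integer coordinates so that squared distances can be compared exactly, with no floating-point rounding.

```python
from itertools import combinations


def count_unit_pairs(points, sq_dist):
    """Count unordered pairs of points whose squared distance is exactly 1."""
    return sum(1 for p, q in combinations(points, 2) if sq_dist(p, q) == 1)


def square_sq_dist(p, q):
    # Ordinary squared distance between integer grid points (x, y).
    dx, dy = p[0] - q[0], p[1] - q[1]
    return dx * dx + dy * dy


def triangular_sq_dist(p, q):
    # Points are integer coordinates (a, b) in the basis
    # (1, 0) and (1/2, sqrt(3)/2); the squared length of
    # a*(1, 0) + b*(1/2, sqrt(3)/2) is a^2 + a*b + b^2.
    da, db = p[0] - q[0], p[1] - q[1]
    return da * da + da * db + db * db


def lattice_patch(k):
    # A k-by-k patch of integer coordinates (hypothetical helper).
    return [(a, b) for a in range(k) for b in range(k)]


patch = lattice_patch(4)  # 16 points in both arrangements
print(count_unit_pairs(patch, square_sq_dist))      # 24
print(count_unit_pairs(patch, triangular_sq_dist))  # 33
```

With the same 16 points, the triangular arrangement yields 33 unit-distance pairs against 24 for the square grid. That small gap hints at why so much effort went into finding ever cleverer structured configurations.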

What made this problem so intractable was not the mathematics itself, but the assumptions underlying every approach. Mathematicians had unconsciously painted themselves into a corner, believing that the answer must lie within traditional geometric logic. They operated under the assumption that elegant solutions required elegant, orderly arrangements—grids and lattices that followed predictable rules.

The real insight came not from better geometry, but from stepping outside it entirely. The problem needed a fresh perspective: a willingness to abandon the comfortable frameworks that had guided mathematical intuition for generations. Sometimes the greatest discoveries come not from working harder within the system, but from recognizing when the system itself has become the obstacle.

The Unexpected Bridge: Connecting Geometry to Number Theory

The breakthrough in solving this decades-old problem didn’t emerge from AI outthinking the world’s best mathematicians. Instead, it revealed something far more interesting: AI’s unique ability to forge connections across vast intellectual distances that human experts had never considered traversing.

The solution hinged on connecting two seemingly unrelated mathematical territories. On one side lay combinatorial geometry—the study of how shapes and points arrange themselves in space. On the other side sat algebraic number theory—an abstract field dealing with properties of numbers and their relationships. These domains had evolved separately for decades, each with its own language, problems, and practitioners. Few mathematicians spent their careers in both.

The critical insight involved the Golod-Shafarevich criterion, a result from abstract number theory established in 1964. To most observers, this theoretical result seemed to have no practical application to geometric problems whatsoever. It lived in the realm of pure abstraction, far removed from spatial reasoning.

Even Jacob Tsimerman, a leading mathematician at the University of Toronto, had glimpsed this possibility before. But he chose not to pursue it, acknowledging later that the approach seemed like “a grind and seemingly unlikely to succeed.” The path appeared too difficult, the potential payoff too uncertain.

Here lies the real advantage AI possessed: not superior intelligence or mathematical insight, but something more basic and just as powerful. An AI system can draw on vast swaths of the mathematical literature at once, all of it instantly accessible and interconnected. Where a human researcher might know a handful of subfields deeply, the AI could keep millions of papers, theorems, and connections within reach simultaneously. This enabled it to recognize a pattern invisible to specialists working within traditional disciplinary boundaries.

The breakthrough reveals that discovery sometimes requires not genius, but perspective—the ability to see how distant islands of knowledge might be connected by a bridge no one had thought to build.

Human Expertise Refines the Discovery: The Princeton Weekend

While OpenAI’s initial proof represented a genuine breakthrough, it came with a significant limitation: the mathematical constants lacked the precision that the scientific community demands. The proof was valid in its logic, but the improvement exponent—a crucial numerical value that quantifies how much better the solution performs—remained frustratingly rough around the edges.

Enter Will Sawin, Princeton University’s Fernholz Professor of Mathematics. What happened next exemplifies the true partnership between artificial intelligence and human expertise. Over an intensive weekend, Sawin devoted himself to the painstaking work of refining OpenAI’s foundational proof. Through meticulous mathematical analysis and creative problem-solving, he transformed vague estimates into a precise constant: 0.014. This wasn’t merely rounding numbers—it was mathematical craftsmanship of the highest order, taking raw AI-generated insights and polishing them into publishable science.

The validation process that followed proved equally remarkable. Nine leading mathematicians, including a winner of the Fields Medal, one of mathematics’ highest honors, independently examined the proof from multiple angles. Each verification strengthened confidence in the result, ensuring that what emerged was not merely an AI hallucination dressed up in mathematical notation, but genuine mathematical truth.

This collaboration reveals a fundamental truth about modern scientific discovery: neither artificial intelligence nor human expertise alone is sufficient. The AI provided speed and an innovative perspective that no mathematician had previously considered. Yet without Sawin’s weekend of intensive refinement and the expert validation of nine accomplished mathematicians, the proof would have remained a promising sketch rather than a completed masterpiece. The future of scientific breakthrough belongs not to machines or humans working independently, but to both working together—each compensating for the other’s limitations.

The October Lesson: Why Verification Matters More Than Speed

In October 2025, OpenAI made headlines claiming that GPT-5 had solved ten Erdős problems—a mathematically significant achievement. The announcement generated excitement and media coverage. But there was a critical problem: the AI hadn’t actually solved these problems. It had retrieved existing solutions from its training data. When Yann LeCun and Demis Hassabis publicly corrected the record, the mistake stung. OpenAI’s credibility took a hit, but something valuable emerged from the correction: a new accountability standard for AI breakthroughs in science.

This embarrassment proved transformative. By May 2026, when OpenAI announced a genuine mathematical breakthrough—an original proof relevant to the Erdős problems—the approach had fundamentally changed. The discovery didn’t come with breathless press releases and premature claims. Instead, it was accompanied by rigorous peer review conducted before publication. This deliberate shift from speed to verification represented a maturation in how the AI research community handles scientific claims.

The distinction matters profoundly. Mathematician Daniel Litt articulated why: previous AI breakthroughs were exciting primarily as leading indicators—signs of capability and future potential. The May 2026 Erdős proof was different. It was exciting intrinsically, for its own mathematical merit, regardless of what it predicted about AI’s future.

The October lesson wasn’t about avoiding failure—it was about building trust through transparency. In science, credibility moves slowly but lasts forever. By choosing verification over speed, the AI community learned that sustainable progress requires patience, humility, and the willingness to correct course publicly. That foundation makes future breakthroughs genuinely meaningful.

What Comes Next: General-Purpose Intelligence and Cross-Domain Discovery

The breakthrough that solved an eighty-year-old geometry problem arrived not from a specialized mathematics engine, but from a general-purpose reasoning model. This distinction carries profound implications. Rather than being trained specifically for number theory or geometric proofs, the same AI system that tackles language, vision, and countless other domains found the missing connection. It suggests that genuine discovery may require the kind of flexible, boundary-crossing thinking that general intelligence provides.

Mathematician Thomas Bloom’s insight crystallizes this potential: deep number theory may hold answers to multiple unsolved discrete geometry problems. Think of it like discovering that a key hidden in one room unlocks doors throughout an entire building. The geometry problems weren’t merely difficult—they were waiting for someone or something to recognize that number theory contained the missing pieces. A specialized geometry system might never have ventured into number theory’s domain.

This discovery creates a roadmap for scientific exploration. Researchers can now revisit older, seemingly intractable problems with renewed focus on cross-domain connections. If number theory and geometry share hidden bridges, what unexpected links exist between chemistry and physics, or between biology and materials science? The question shifts from “can we solve this problem?” to “what other field holds its solution?”

The implications extend far beyond mathematics. If general-purpose AI can find unexpected bridges without specialized training, then perhaps the future of science involves less compartmentalization. Rather than separate systems for molecular biology, climate modeling, or astrophysics, perhaps breakthrough discoveries emerge when a reasoning engine can freely wander across disciplines, making connections humans never considered because our expertise naturally funnels us into narrow channels.

We may be witnessing the birth of a new scientific methodology—one where the ability to think across domains becomes humanity’s most powerful tool for understanding our world.

