What happened:
Google moved AI from contest math into real research. Two new papers describe Aletheia, an internal research agent built on top of Gemini Deep Think. Aletheia works like a junior researcher: it generates candidate proofs, runs a language-based verifier, revises its attempts, and - importantly - admits failure when it can't find a correct path. It also uses web search and browsing to check the literature and avoid bogus citations. See the DeepMind blog for details.
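The generate-verify-revise loop described above can be sketched roughly as follows. All names (`attempt_proof`, `generate`, `verify`) are illustrative; Aletheia's actual internals are not public:

```python
# Hypothetical sketch of a generate-verify-revise loop with an explicit
# "admit failure" path. Names are illustrative, not Aletheia's real API.

def attempt_proof(problem, generate, verify, max_revisions=3):
    """Try to prove `problem`; return a proof or None (admitted failure)."""
    candidate = generate(problem, feedback=None)
    for _ in range(max_revisions):
        ok, feedback = verify(problem, candidate)
        if ok:
            return candidate
        # Revise the attempt using the verifier's critique.
        candidate = generate(problem, feedback=feedback)
    return None  # no correct path found: admit failure rather than bluff
```

The key design point is the `None` return: the loop is bounded, and an exhausted budget is reported as failure instead of an unverified "proof."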
A headline result:
One notable paper, called "Feng26," was reportedly generated without human reasoning involvement. The team says the agent computed structure constants (nicknamed "eigenweights") in arithmetic geometry. Other projects were collaborative: Google describes an "Advisor" mode where humans steer the AI through proof and refutation cycles, and a "balanced prompting" approach that seeks both proofs and counterexamples to reduce bias. See the DeepMind blog.
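The "balanced prompting" idea can be sketched as attacking a conjecture from both sides at once. This is a hypothetical reconstruction, not Google's implementation:

```python
# Hypothetical sketch of "balanced prompting": search for a proof AND a
# counterexample, so the agent is not biased toward arguing the statement
# is true. All names are illustrative; each search may return None.

def balanced_attempt(conjecture, prove, refute):
    """Return a (status, evidence) pair from dual proof/refutation searches."""
    proof = prove(conjecture)
    counterexample = refute(conjecture)
    if proof is not None and counterexample is not None:
        return ("inconsistent", (proof, counterexample))  # flag for human review
    if proof is not None:
        return ("proved", proof)
    if counterexample is not None:
        return ("refuted", counterexample)
    return ("open", None)
```

Running both searches also acts as a sanity check: if the system "finds" both a proof and a counterexample, something upstream is broken and a human should look.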
Context for non-theorists:
IMO stands for the International Mathematics Olympiad, and ICPC is the International Collegiate Programming Contest - both elite student competitions. Gemini Deep Think reached gold-medal level at the IMO in July 2025 and showed similar results at ICPC. That was a springboard, not the destination. See the DeepMind blog.
On the research side, Aletheia reached about 90% on the IMO-ProofBench Advanced benchmark as inference-time compute scaled; the public leaderboard now shows Aletheia at 91.9%. Benchmarks are not the same as peer-reviewed papers, but they signal traction. See the DeepMind blog.
STOC is the Symposium on Theory of Computing, a top theory conference in computer science. Google piloted Gemini to give automated feedback to STOC 2026 authors - think "AI reviewer with receipts," not a replacement for human peer review. See the Google Research blog.
What got solved:
Reported case studies include:
Max-Cut and Steiner Tree problems, classic network-optimization tasks, solved using tools borrowed from continuous mathematics. See the DeepMind blog.
A counterexample to a 2015 conjecture in online submodular optimization. See the DeepMind blog.
An extension of a recent Revelation Principle result in economics. See the DeepMind blog.
A new closed form for gravitational radiation from cosmic strings, derived using Gegenbauer polynomials. See the DeepMind blog.
Temper the hype:
Google frames most outcomes as "Level 2 - publishable quality," not major breakthroughs. Results have been submitted through normal academic channels. The team also ran an autonomous sweep over 700 Erdős problems and reported solving four open questions. This is clear progress, but not yet a paradigm shift. See the DeepMind blog.
Why it matters for builders:
Budget for verification engineering - add formal checks and code-assisted proof tests to your pipeline. See the DeepMind blog.
Hire AI-research ops - people who can prompt, adversarially test, and debug model reasoning. See related work on ar5iv.org.
Lock down authorship and IP policies before your first "AI-assisted" preprint. Decide how to credit model contributions and how to verify claims. See the DeepMind blog.
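The verification-engineering point above can start very small: numerically stress-test any model-claimed identity before it enters a write-up. A minimal sketch, where the identity (sum of the first n odd numbers equals n^2) is a stand-in for whatever claim your pipeline must vet, not one of the reported results:

```python
# Minimal code-assisted check: numerically stress-test a claimed identity
# before trusting it. The identity here is a stand-in example.

def claimed_identity_holds(n):
    """Check that 1 + 3 + ... + (2n - 1) == n^2 for a given n."""
    return sum(2 * k + 1 for k in range(n)) == n * n

# Reject the claim if any small case fails; a single counterexample
# is enough to send the "proof" back for revision.
assert all(claimed_identity_holds(n) for n in range(1000))
```

This catches only false claims, not gaps in a true claim's proof, which is why formal checks (e.g. proof assistants) belong in the pipeline too.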
Takeaway:
Aletheia shows that large models can climb from competition-level math to producing research-level outputs. The outputs are mostly "publishable quality" rather than blockbuster discoveries, and they come with practical headaches - verification, credit, and the need for new research operations. If you build with these models, plan for careful checking and clear policies up front.