
26 Nov 2025

“Is the belief really, ‘Oh, it’s so big, but if you had 100x more, everything would be so different?’ It would be different, for sure. But is the belief that if you just 100x the scale, everything would be transformed? I don’t think that’s true. So it’s back to the age of research again, just with big computers.” – Ilya Sutskever – Safe Superintelligence

Ilya Sutskever stands as one of the most influential figures in modern artificial intelligence—a scientist whose work has fundamentally shaped the trajectory of deep learning over the past decade. As co-author of the seminal 2012 AlexNet paper, he helped catalyse the deep learning revolution that transformed machine vision and launched the contemporary AI era. His influence extends through his role as Chief Scientist at OpenAI, where he played a pivotal part in developing GPT-2 and GPT-3, the models that established large-scale language model pre-training as the dominant paradigm in AI research.

In mid-2024, Sutskever departed OpenAI and co-founded Safe Superintelligence Inc. (SSI) alongside Daniel Gross and Daniel Levy, positioning the company as the world’s “first straight-shot SSI lab”—an organisation with a single focus: developing safe superintelligence without distraction from product development or revenue generation. The company has since raised $3 billion and reached a $32 billion valuation, reflecting investor confidence in Sutskever’s strategic vision and reputation.

The Context: The Exhaustion of Scaling

Sutskever’s quoted observation emerges from a moment of genuine inflection in AI development. For roughly five years—from 2020 to 2025—the AI industry operated under what he terms the “age of scaling.” This era was defined by a simple, powerful insight: that scaling pre-training data, computational resources, and model parameters yielded predictable improvements in model performance. Organisations could invest capital with low perceived risk, knowing that more compute plus more data plus larger models would reliably produce measurable gains.

This scaling paradigm was extraordinarily productive. It yielded GPT-3, GPT-4, and an entire generation of frontier models that demonstrated capabilities that astonished both researchers and the public. The logic was elegant: if you wanted better AI, you simply scaled the recipe. Sutskever himself was instrumental in validating this approach. The word “scaling” became conceptually magnetic, drawing resources, attention, and organisational focus toward a single axis of improvement.

Yet by 2024–2025, that era began showing clear signs of exhaustion. High-quality training data is finite: the supply of useful material on the internet is limited, and organisations are rapidly approaching meaningful constraints on pre-training data. Computational resources, whilst vast, are not unlimited, and the marginal economic returns on compute investment have become less obvious. Most critically, the empirical question has shifted: given that current frontier labs already command extraordinary computational resources, would 100 times more compute actually produce a qualitative transformation in capabilities, or merely incremental improvement?
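The arithmetic behind that question can be made concrete with a toy power-law curve of the kind the scaling era relied on. The sketch below is a minimal illustration under assumed constants: the exponent, coefficient, and compute figures are invented for illustration, not measurements from any real model. It simply shows how a fixed power law converts a 100x increase in compute into a small fractional reduction in loss.

```python
# Toy illustration: under a power-law scaling curve, 100x more compute buys
# only an incremental reduction in loss. Every constant below is an invented
# assumption for illustration, not a measurement from any real model.

def scaling_loss(compute, irreducible=1.7, coefficient=2.0, exponent=0.05):
    """Hypothetical loss as a power-law function of training compute."""
    return irreducible + coefficient * compute ** -exponent

baseline_compute = 1.0e25                  # assumed current compute budget (arbitrary units)
scaled_compute = 100 * baseline_compute    # the "100x more" scenario in the quote

loss_now = scaling_loss(baseline_compute)
loss_100x = scaling_loss(scaled_compute)

print(f"loss at baseline compute: {loss_now:.3f}")
print(f"loss at 100x compute:     {loss_100x:.3f}")
print(f"relative improvement:     {(loss_now - loss_100x) / loss_now:.1%}")
```

On a curve of this shape, the reducible part of the loss falls by a constant factor per order of magnitude of compute, so each additional 10x buys a smaller absolute gain than the last.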

Sutskever’s answer is direct: incremental, not transformative. This reframing is consequential because it redefines where the bottleneck actually lies. The constraint is no longer the ability to purchase more GPUs or accumulate more data. The constraint is ideas—novel technical approaches, new training methodologies, fundamentally different recipes for building AI systems.

The Jaggedness Problem: Theory Meeting Reality

One critical observation animates Sutskever’s thinking: a profound disconnect between benchmark performance and real-world robustness. Current models achieve superhuman performance on carefully constructed evaluation tasks—yet in deployment, they exhibit what Sutskever calls “jagged” behaviour. They repeat errors, introduce new bugs whilst fixing old ones, and cycle between mistakes even when given clear corrective feedback.

This apparent paradox suggests something deeper than mere data or compute insufficiency. It points to inadequate generalisation—the inability to transfer learning from narrow, benchmark-optimised domains into the messy complexity of real-world application. Sutskever frames this through an analogy: a competitive programmer who practises 10,000 hours on competition problems will be highly skilled within that narrow domain but often fails to transfer that knowledge flexibly to broader engineering challenges. Current models, in his assessment, resemble that hyper-specialised competitor rather than the flexible, adaptive learner.

The Core Insight: Generalisation Over Scale

The central thesis animating Sutskever’s work at SSI—and implicit in his quote—is that human-like generalisation and learning efficiency represent a fundamentally different machine-learning principle from scaling, one that has not yet been discovered or operationalised within contemporary AI systems.

Humans learn with orders of magnitude less data than large models yet generalise far more robustly to novel contexts. A teenager learns to drive in roughly ten hours of practice; current AI systems struggle to acquire equivalent robustness with vastly more training data. This is not because humans possess specialised evolutionary priors for driving (a recent activity that evolution could not have optimised for); rather, it suggests humans employ a more general-purpose learning principle that contemporary AI has not yet captured.

Sutskever hypothesises that this principle is connected to what he terms “value functions”—internal mechanisms akin to emotions that provide continuous, intermediate feedback on actions and states, enabling more efficient learning than end-of-trajectory reward signals alone. Evolution appears to have hard-coded robust value functions—emotional and evaluative systems—that make humans viable, adaptive agents across radically different environments. Whether an equivalent principle can be extracted purely from pre-training data, rather than built into learning architecture, remains uncertain.
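The feedback structure Sutskever is gesturing at can be illustrated with a standard reinforcement-learning example. The sketch below is a generic, textbook TD(0) learner on a toy random walk, not anything specific to SSI’s methods: its value estimates are updated at every intermediate step, rather than only when the trajectory ends, which is the sense in which a value function acts as continuous feedback.

```python
import random

# Toy random walk: non-terminal states 1..5, terminals at 0 and 6, with a
# reward of 1 only for reaching state 6. The TD(0) learner below adjusts its
# value estimates at every step (intermediate feedback) instead of waiting
# for the end of the trajectory. A generic textbook example, not SSI's recipe.

N_STATES, ALPHA, GAMMA = 7, 0.1, 1.0
values = [0.0] * N_STATES   # learned value estimate for each state

def run_episode(values):
    state = 3                                   # start in the middle
    while state not in (0, 6):
        next_state = state + random.choice((-1, 1))
        reward = 1.0 if next_state == 6 else 0.0
        # TD(0) update: move this state's value towards the bootstrapped
        # target as soon as the transition is observed.
        target = reward + GAMMA * values[next_state]
        values[state] += ALPHA * (target - values[state])
        state = next_state

random.seed(0)
for _ in range(5000):
    run_episode(values)

# True values for states 1..5 are 1/6 .. 5/6; the estimates converge towards them.
print([round(v, 2) for v in values[1:6]])
```

The analogy to emotion lies in the structure of the signal: an internal evaluator scores every state as it is visited, so the learner does not need to wait for a distant terminal outcome to know whether it is on a promising path.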

The Leading Theorists and Related Work

Yann LeCun and Data Efficiency

Yann LeCun, Meta’s Chief AI Scientist and a pioneer of deep learning, has long emphasised the importance of learning efficiency and the role of what he terms “world models” in understanding how agents learn causal structure from limited data. His work highlights that human vision achieves remarkable robustness despite the scarcity of developmental data—children recognise cars after seeing far fewer exemplars than AI systems require—suggesting that the brain employs inductive biases or learning principles that current architectures lack.

Geoffrey Hinton and Neuroscience-Inspired AI

Geoffrey Hinton, winner of the 2024 Nobel Prize in Physics for his work on deep learning, has articulated concerns about AI safety and expressed support for Sutskever’s emphasis on fundamentally rethinking how AI systems learn and align. Hinton’s career-long emphasis on biologically plausible learning mechanisms—from Boltzmann machines to capsule networks—reflects a conviction that important principles for efficient learning remain undiscovered and that neuroscience offers crucial guidance.

Stuart Russell and Alignment Through Uncertainty

Stuart Russell, a leading AI safety researcher at UC Berkeley, has emphasised that robust AI alignment requires systems that remain genuinely uncertain about human values and continue learning from interaction, rather than attempting to encode fixed objectives. This aligns with Sutskever’s thesis that safe superintelligence requires continual learning in deployment rather than monolithic pre-training followed by fixed RL optimisation.

Demis Hassabis and Continual Learning

Demis Hassabis, co-founder and CEO of Google DeepMind, the lab behind AlphaGo, has invested significant research effort into systems that learn continually rather than through discrete training phases. This work recognises that biological intelligence fundamentally involves interaction with environments over time, generating diverse signals that guide learning—a principle SSI appears to be operationalising.

The Paradigm Shift: From Offline to Online Learning

Sutskever’s thinking reflects a broader intellectual shift visible across multiple frontiers of AI research. The dominant pre-training + RL framework assumes a clean separation: a model is trained offline on fixed data, then post-trained with reinforcement learning, then deployed. Increasingly, frontier researchers are questioning whether this separation reflects how learning should actually work.
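Schematically, the difference looks like this. The sketch below uses a deliberately trivial stand-in model, and every class and function name is a hypothetical placeholder; the point is only where learning happens: in the offline pipeline it stops before deployment, whereas in a continual loop deployment itself generates the training signal.

```python
# Schematic contrast between the dominant offline pipeline and a continual
# learning loop. The "model" is a deliberately trivial running-average
# predictor, and every name here is a hypothetical placeholder; the point is
# only where learning happens in each regime.

class TinyModel:
    """Stand-in model: predicts the mean of the targets it has seen so far."""
    def __init__(self):
        self.total, self.count = 0.0, 0

    def update(self, target):
        self.total += target
        self.count += 1

    def predict(self):
        return self.total / self.count if self.count else 0.0


def offline_pipeline(train_targets, live_targets):
    model = TinyModel()
    for t in train_targets:                          # all learning happens before deployment
        model.update(t)
    return [model.predict() for _ in live_targets]   # weights are frozen at serve time


def continual_loop(train_targets, live_targets):
    model = TinyModel()
    for t in train_targets:                          # same offline start...
        model.update(t)
    predictions = []
    for t in live_targets:                           # ...but deployment and learning interleave
        predictions.append(model.predict())
        model.update(t)                              # feedback from the environment keeps training
    return predictions


# If the world drifts after training, only the continual learner begins to track it.
train = [0.0] * 50
live = [1.0] * 5
print("offline  :", [round(p, 2) for p in offline_pipeline(train, live)])
print("continual:", [round(p, 2) for p in continual_loop(train, live)])
```

Run as-is, the offline pipeline keeps predicting the stale training distribution while the continual learner gradually tracks the shift; that capacity to adapt in deployment is exactly what the clean offline separation gives up.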

His articulation of the “age of research” signals a return to intellectual plurality and heterodox experimentation, the opposite of the monoculture that the scaling paradigm created. When everyone is racing to scale the same recipe, innovation becomes incremental. When new recipes are required, diversity of approach becomes an asset rather than a liability.

The Stakes and Implications

This reframing carries significant strategic implications. If the bottleneck is truly ideas rather than compute, then smaller, more cognitively coherent organisations with clear intellectual direction may outpace larger organisations constrained by product commitments, legacy systems, and organisational inertia. If the key innovation is a new training methodology—one that achieves human-like generalisation through different mechanisms—then the first organisation to discover and validate it may enjoy substantial competitive advantage, not through superior resources but through superior understanding.

Equally, this framing challenges the common assumption that AI capability is primarily a function of computational spend. If methodological innovation matters more than scale, the future of AI leadership becomes less a question of capital concentration and more a question of research insight—less about who can purchase the most GPUs, more about who can understand how learning actually works.

Sutskever’s quote thus represents not merely a rhetorical flourish but a fundamental reorientation of strategic thinking about AI development. The age of confident scaling is ending. The age of rigorous research into the principles of generalisation, sample efficiency, and robust learning has begun.
