“These models somehow just generalize dramatically worse than people. It’s super obvious. That seems like a very fundamental thing.” – Ilya Sutskever – Safe Superintelligence
Sutskever, as co-founder and Chief Scientist of Safe Superintelligence Inc. (SSI), has emerged as one of the most influential voices in AI strategy and research direction. His trajectory illustrates the depth of his authority: co-author of AlexNet (2012), the paper that ignited the deep learning revolution; Chief Scientist at OpenAI during the development of GPT-2 and GPT-3; and now director of a $3 billion research organisation explicitly committed to solving the generalisation problem rather than pursuing incremental scaling.
His assertion about generalisation deficiency is not rhetorical flourish. It represents a fundamental diagnostic claim about why current AI systems, despite superhuman performance on benchmarks, remain brittle, unreliable, and poorly suited to robust real-world deployment. Understanding this claim requires examining what generalisation actually means, why it matters, and what the gap between human and AI learning reveals about the future of artificial intelligence.
What Generalisation Means: Beyond Benchmark Performance
Generalisation, in machine learning, refers to the ability of a system to apply knowledge learned in one context to novel, unfamiliar contexts it has not explicitly encountered during training. A model that generalises well can transfer principles, patterns, and capabilities across domains. A model that generalises poorly becomes a brittle specialist—effective within narrow training distributions but fragile when confronted with variation, novelty, or real-world complexity.
The crisis Sutskever identifies is this: contemporary large language models and frontier AI systems achieve extraordinary performance on carefully curated evaluation tasks and benchmarks. GPT-4 reportedly scores around the 90th percentile on the Uniform Bar Exam. OpenAI’s o1 solves competition mathematics problems at an elite level. Yet these same systems, when deployed into unconstrained real-world workflows, exhibit what Sutskever terms “jagged” behaviour: they repeat errors, introduce new bugs whilst fixing previous ones, cycle between mistakes even with clear corrective feedback, and fail in ways that suggest fundamentally incomplete understanding rather than mere data scarcity.
This paradox reveals a hidden truth: benchmark performance and deployment robustness are not tightly coupled. An AI system can memorise, pattern-match, and perform well on evaluation metrics whilst failing to develop the kind of flexible, transferable understanding that enables genuine competence.
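To make this decoupling concrete, the sketch below (mine, not Sutskever’s) uses a toy regression: a high-capacity model fitted on a narrow input range scores well on held-out data from that same range, the benchmark analogue, yet fails badly on inputs just outside it, the deployment analogue. The function, ranges, and model are arbitrary illustrations.

```python
# A minimal sketch of benchmark/deployment decoupling: a flexible model
# fitted on a narrow slice of inputs looks excellent on held-out data from
# that slice, yet collapses on inputs drawn just outside it.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def target(x):
    return np.sin(2 * np.pi * x)                 # the "true" underlying function

# Training data covers only x in [0, 1].
x_train = rng.uniform(0.0, 1.0, 200)
y_train = target(x_train) + rng.normal(0.0, 0.05, x_train.shape)

# High-capacity fit: degree-15 polynomial (enough to memorise the slice).
model = Polynomial.fit(x_train, y_train, deg=15)

def mse(x):
    return float(np.mean((model(x) - target(x)) ** 2))

x_benchmark = rng.uniform(0.0, 1.0, 200)         # same distribution as training
x_deployment = rng.uniform(1.0, 2.0, 200)        # novel region, never seen

print(f"in-distribution error:     {mse(x_benchmark):.4f}")   # small
print(f"out-of-distribution error: {mse(x_deployment):.4g}")  # typically enormous
```

Nothing in the in-distribution score anticipates the out-of-distribution collapse; that asymmetry is the sense in which evaluation metrics and deployment robustness come apart.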
The Sample Efficiency Question: Orders of Magnitude of Difference
Underlying the generalisation crisis is a more specific puzzle: sample efficiency. Why do AI systems require vastly more training data to achieve competence in a domain than humans do?
A human child learns to recognise objects through a few thousand exposures. Contemporary vision models require millions. A teenager learns to drive in approximately ten hours of practice; AI systems struggle to achieve equivalent robustness with orders of magnitude more training. A university student learns to code, write mathematically, and reason about abstract concepts—domains that did not exist during human evolutionary history—with remarkably few examples and little explicit feedback.
This disparity points to something fundamental: humans possess not merely better priors or more specialised knowledge, but better general-purpose learning machinery. The principle underlying human learning efficiency remains largely unexpressed in mathematical or computational terms. Current AI systems lack it.
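One way to see why the gap plausibly reflects learning machinery rather than raw exposure is to treat learning curves as power laws in the number of examples, a common empirical approximation. The constants below are invented purely to illustrate how a difference in the curve’s exponent, rather than in the data itself, produces an orders-of-magnitude gap in the samples needed to reach the same error.

```python
# Illustrative only: if test error follows err ~ a * n**(-b) in the number of
# training examples n, then learners with different exponents b need wildly
# different amounts of data to reach the same target error. The constants
# are made up to mirror the "orders of magnitude" gap described above.
def samples_needed(a, b, target_err):
    """Smallest n (approximately) with a * n**(-b) <= target_err."""
    return (a / target_err) ** (1.0 / b)

target = 0.05
efficient = samples_needed(a=1.0, b=1.0, target_err=target)     # steep, human-like curve
inefficient = samples_needed(a=1.0, b=0.25, target_err=target)  # shallow, model-like curve

print(f"efficient learner:   ~{efficient:,.0f} examples")    # ~20
print(f"inefficient learner: ~{inefficient:,.0f} examples")  # ~160,000
```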
Sutskever’s diagnostic claim is that this gap reflects not engineering immaturity or the need for more compute, but the absence of a conceptual breakthrough—a missing principle of how to build systems that learn as efficiently as humans do. The implication is stark: you cannot scale your way out of this problem. More data and more compute, applied to existing methodologies, will not solve it. The bottleneck is epistemic, not computational.
Why Current Models Fail at Generalisation: The Competitive Programming Analogy
Sutskever illustrates the generalisation problem through an instructive analogy. Imagine two competitive programmers:
Student A dedicates 10,000 hours to competitive programming. They memorise every algorithm, every proof technique, every problem pattern. They become exceptionally skilled within competitive programming itself—one of the very best.
Student B spends only 100 hours on competitive programming but develops deeper, more flexible understanding. They grasp underlying principles rather than memorising solutions.
When both pursue careers in software engineering, Student B typically outperforms Student A. Why? Because Student A has optimised for a narrow domain and lacks the flexible transfer of understanding that Student B developed through lighter but more principled engagement.
Current frontier AI models, in Sutskever’s assessment, resemble Student A. They are trained on enormous quantities of narrowly curated data—competitive programming problems, benchmark evaluation tasks, reinforcement learning environments explicitly designed to optimise for measurable performance. They have been “over-trained” on carefully optimised domains but lack the flexible, generalised understanding that enables robust performance in novel contexts.
This over-optimisation problem is compounded by a subtle but crucial factor: reinforcement learning optimisation targets. Companies designing RL training environments face substantial degrees of freedom in how to construct reward signals. Sutskever observes that there is often a systematic bias: RL environments are subtly shaped to ensure models perform well on public benchmarks at release time, creating a form of unintentional reward hacking where the system becomes highly tuned to evaluation metrics rather than genuinely robust to real-world variation.
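The dynamic is a familiar Goodhart’s-law pattern. The toy loop below is my framing, not a description of any lab’s training pipeline: it optimises a proxy “benchmark” reward that keeps climbing even after the quantity that actually matters has peaked and begun to fall.

```python
# Toy reward-hacking dynamic: gradient ascent on a proxy reward (benchmark
# score) keeps improving the proxy while true robustness, which the proxy
# only imperfectly tracks, peaks and then degrades. Both functions are
# invented for illustration.
def proxy_reward(theta):
    # What the training loop optimises: monotone in "benchmark tuning" theta.
    return theta

def true_robustness(theta):
    # What deployment cares about: rises at first, then falls as the system
    # over-specialises to the benchmark (the quadratic penalty is illustrative).
    return theta - 0.05 * theta ** 2

theta = 0.0
for step in range(200):
    theta += 0.1                      # ascend the proxy (its gradient is 1)
    if step % 50 == 0:
        print(f"step {step:3d}: proxy={proxy_reward(theta):6.2f}  "
              f"true={true_robustness(theta):6.2f}")
# The proxy climbs without bound; true robustness peaks at theta = 10
# and then declines -- high benchmark tuning, lower real-world robustness.
```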
The Deeper Problem: Pre-Training’s Limits and RL’s Inefficiency
The generalisation crisis reflects deeper structural issues within contemporary AI training paradigms.
Pre-training’s opacity: Large-scale pre-training on internet text provides models with an enormous foundation of patterns. Yet how models rely on this pre-training data is poorly understood. When a model fails, it is unclear whether the failure reflects insufficient statistical support in the training distribution or whether something more fundamental is missing. Pre-training buys scale at the cost of insight into what has actually been learned.
RL’s inefficiency: Current reinforcement learning approaches provide training signals only at the end of long trajectories. If a model spends thousands of steps reasoning about a problem and arrives at a dead end, it receives no signal until the trajectory completes. This is computationally wasteful. A more efficient learning system would provide intermediate evaluative feedback—signals that say, “this direction of reasoning is unpromising; abandon it now rather than after 1,000 more steps.” Sutskever hypothesises that this intermediate feedback mechanism—what he terms a “value function” and what evolutionary biology has encoded as emotions—is crucial to sample-efficient learning.
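A tabular sketch makes the contrast concrete. The code below is standard TD(0) policy evaluation on a toy corridor environment, not SSI’s method or anything Sutskever has published: the environment’s own reward arrives only when a trajectory terminates, but the learned value table turns every intermediate step into an evaluative signal.

```python
# Terminal-only reward vs intermediate feedback: the environment pays out
# only at the goal state, yet TD(0) bootstrapping lets a value function
# deliver a per-step signal ("this state looks unpromising") long before
# the trajectory ends.
import random

N_STATES = 10                      # corridor 0..9; reward only on reaching state 9
GAMMA, ALPHA = 0.99, 0.1
value = [0.0] * N_STATES           # the learned value function (the "critic")

def step(state):
    """Random walk with a reflecting wall at 0; returns (next_state, reward, done)."""
    nxt = max(0, min(N_STATES - 1, state + random.choice([-1, 1])))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

for episode in range(500):
    s = 0
    done = False
    while not done:
        s_next, r, done = step(s)
        # TD(0) update: an evaluative signal at *every* step, not just at the end.
        target = r + (0.0 if done else GAMMA * value[s_next])
        value[s] += ALPHA * (target - value[s])
        s = s_next

print([round(v, 2) for v in value])
# States closer to the goal acquire higher value; that gradient of "promise"
# is exactly the mid-trajectory feedback a terminal reward alone never gives.
```

This is the role the section assigns to value functions, and by analogy to emotions: a continuously available estimate of whether the current course is promising, long before the final outcome arrives.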
The gap between how humans and current AI systems learn suggests that human learning operates on fundamentally different principles: continuous, intermediate evaluation; robust internal models of progress and performance; the ability to self-correct and redirect effort based on internal signals rather than external reward.
Generalisation as Proof of Concept: What Human Learning Reveals
A critical move in Sutskever’s argument is this: the fact that humans generalise vastly better than current AI systems is not merely an interesting curiosity—it is proof that better generalisation is achievable. The existence of human learners demonstrates, in principle, that a learning system can operate with orders of magnitude less data whilst maintaining superior robustness and transfer capability.
This reframes the research challenge. The question is no longer whether better generalisation is possible (humans prove it is) but rather what principle or mechanism underlies it. This principle could arise from:
- Architectural innovations: new ways of structuring neural networks that embody better inductive biases for generalisation
- Learning algorithms: different training procedures that more efficiently extract principles from limited data
- Value function mechanisms: intermediate feedback systems that enable more efficient learning trajectories
- Continual learning frameworks: systems that learn continuously from interaction rather than through discrete offline training phases
What matters is that Sutskever’s claim shifts the research agenda from “get more compute” to “discover the missing principle.”
The Strategic Implications: Why This Matters Now
Sutskever’s diagnosis, articulated in November 2025, arrives at a crucial moment. The AI industry has operated under the “age of scaling” paradigm since approximately 2020. During this period, the scaling laws discovered by OpenAI and others suggested a remarkably reliable relationship: larger models trained on more data with more compute reliably produced better performance.
This created a powerful strategic imperative: invest capital in compute, acquire data, build larger systems. The approach was low-risk from a research perspective because the outcome was relatively predictable. Companies could deploy enormous resources confident they would yield measurable returns.
By 2025, however, this model shows clear strain. The supply of useful training data is approaching its limits. Computational resources, whilst vast, are not unlimited, and marginal returns are diminishing. Most importantly, the question has shifted: would 100 times more compute actually produce a qualitative transformation, or merely incremental improvement? Sutskever’s answer is clear: the latter. This fundamentally reorients strategic thinking. If 100x scaling yields only incremental gains, the bottleneck is not compute but ideas. The competitive advantage belongs not to whoever can purchase the most GPUs but to whoever discovers the missing principle of generalisation.
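The arithmetic behind “incremental rather than qualitative” is worth spelling out. Scaling behaviour is commonly summarised as a power law in compute; the exponent below is illustrative rather than taken from any particular paper, but it shows why even a hundredfold increase translates into a modest change in loss.

```python
# Back-of-the-envelope scaling arithmetic: if loss follows a power law in
# compute, loss(C) ~ a * C**(-alpha), then 100x more compute shifts the loss
# by the factor 100**(-alpha). The exponent is illustrative, not measured.
a, alpha = 1.0, 0.05

def loss(compute):
    return a * compute ** (-alpha)

ratio = loss(100.0) / loss(1.0)
print(f"loss after 100x compute: {ratio:.2f} of baseline")
# 100**(-0.05) is roughly 0.79: about a 21% reduction in loss, a real but
# incremental gain, which is the sense in which scaling hits diminishing returns.
```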
Leading Theorists and Related Research Programs
Yann LeCun: World Models and Causal Learning
Yann LeCun, Meta’s Chief AI Scientist and a pioneer of deep learning, has long emphasized that current supervised learning approaches are fundamentally limited. His work on “world models”—internal representations that capture causal structure rather than mere correlation—points toward learning mechanisms that could enable better generalisation. LeCun’s argument is that humans learn causal models of how the world works, enabling robust generalisation because causal understanding is stable across contexts in a way that statistical correlation is not.
Geoffrey Hinton: Neuroscience-Inspired Learning
Geoffrey Hinton, recipient of the 2024 Nobel Prize in Physics for foundational deep learning work, has increasingly emphasized that neuroscience holds crucial clues for improving AI learning efficiency. His recent work on biological plausibility and learning mechanisms reflects conviction that important principles of how neural systems efficiently extract generalised understanding remain undiscovered. Hinton has expressed support for Sutskever’s research agenda, recognizing that the next frontier requires fundamental conceptual breakthroughs rather than incremental scaling.
Stuart Russell: Learning Under Uncertainty
Stuart Russell, UC Berkeley’s leading AI safety researcher, has articulated that robust AI alignment requires systems that remain genuinely uncertain about objectives and learn from interaction. This aligns with Sutskever’s emphasis on continual learning. Russell’s work highlights that systems designed to optimise fixed objectives without capacity for ongoing learning and adjustment tend to produce brittle, misaligned outcomes—a dynamic that improves when systems maintain epistemic humility and learn continuously.
Demis Hassabis and DeepMind’s Continual Learning Research
Demis Hassabis, CEO of DeepMind, has invested substantial research effort into systems that learn continually from environmental interaction rather than through discrete offline training phases. DeepMind’s work on continual reinforcement learning, meta-learning, and systems that adapt to new tasks reflects recognition that learning efficiency depends on how feedback is structured and integrated over time—not merely on total data quantity.
Judea Pearl: Causality and Abstraction
Judea Pearl, pioneering researcher in causal inference and probabilistic reasoning, has long argued that correlation-based learning has fundamental limits and that causal reasoning is necessary for genuine understanding and generalisation. His work on causal models and graphical representation of dependencies provides theoretical foundations for why systems that learn causal structure (rather than mere patterns) achieve better generalisation across domains.
The Research Agenda Going Forward
Sutskever’s claim that generalisation is the “very fundamental thing” reorients the entire research agenda. This shift has profound implications:
From scaling to methodology: Research emphasis moves from “how do we get more compute” to “what training procedures, architectural innovations, or learning algorithms enable human-like generalisation?”
From benchmarks to robustness: Evaluation shifts from benchmark performance to deployment reliability—how systems perform on novel, unconstrained tasks rather than carefully curated evaluations.
From monolithic pre-training to continual learning: The training paradigm shifts from discrete offline phases (pre-train, then RL, then deploy) toward systems that learn continuously from real-world interaction, as sketched after this list.
From scale as differentiator to ideas as differentiator: Competitive advantage in AI development becomes less about resource concentration and more about research insight—the organisation that discovers better generalisation principles gains asymmetric advantage.
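The continual-learning shift above can be made concrete with a schematic loop. This is a sketch under this section’s assumptions, not a real system: the `Model` class, its scalar feedback signal, and the update rule are placeholders standing in for whatever mechanism lets a deployed system keep learning.

```python
# Schematic contrast with the offline paradigm (pre-train, freeze, deploy):
# here every deployed interaction also produces an evaluative signal that
# immediately updates the model, so errors are corrected rather than repeated.
from dataclasses import dataclass, field

@dataclass
class Model:
    weights: list = field(default_factory=lambda: [0.0])

    def predict(self, x):
        return self.weights[0] * x

    def update(self, x, feedback, lr=0.01):
        # One small corrective step per interaction: learning never stops.
        self.weights[0] += lr * feedback * x

def deploy_continual(model, interactions):
    for x, desired in interactions:
        y = model.predict(x)
        feedback = desired - y          # evaluative signal from the interaction itself
        model.update(x, feedback)       # learn *during* deployment, not before it
    return model

# A frozen model would repeat its initial error on every one of these requests;
# the continual learner converges toward the desired behaviour as it goes.
model = deploy_continual(Model(), interactions=[(1.0, 2.0)] * 1000)
print(round(model.predict(1.0), 2))     # approaches 2.0
```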
The Deeper Question: What Humans Know That AI Doesn’t
Beneath Sutskever’s diagnostic claim lies a profound question: What do humans actually know about learning that AI systems don’t yet embody?
Humans learn efficiently because they:
- Develop internal models of their own performance and progress (value functions)
- Self-correct through continuous feedback rather than awaiting end-of-trajectory rewards
- Transfer principles flexibly across domains rather than memorising domain-specific patterns
- Learn from remarkably few examples through principled understanding rather than statistical averaging
- Integrate feedback across time scales and contexts in ways that build robust, generalised knowledge
These capabilities do not require superhuman intelligence or extraordinary cognitive resources. A fifteen-year-old possesses them. Yet current AI systems, despite vastly larger parameter counts and more data, lack equivalent ability.
This gap is not accidental. It reflects that current AI development has optimised for the wrong targets—benchmark performance rather than genuine generalisation, scale rather than efficiency, memorisation rather than principled understanding. The next breakthrough requires not more of the same but fundamentally different approaches.
Conclusion: The Shift from Scaling to Discovery
Sutskever’s assertion that “these models somehow just generalize dramatically worse than people” is, at first glance, an observation of inadequacy. But reframed, it is actually a statement of profound optimism about what remains to be discovered. The fact that humans achieve vastly better generalisation proves that better generalisation is possible. The task ahead is not to accept poor generalisation as inevitable but to discover the principle that enables human-like learning efficiency.
This diagnostic shift—from “we need more compute” to “we need better understanding of generalisation”—represents the intellectual reorientation of AI research in 2025 and beyond. The age of scaling is ending not because scaling is impossible but because it has approached its productive limits. The age of research into fundamental learning principles is beginning. What emerges from this research agenda may prove far more consequential than any previous scaling increment.

