“People have said we’re hitting a plateau every month for three years… I look at how models are produced and every part could be improved. The training pipeline is primitive, held together by duct tape, best efforts, and late nights. There’s so much room to grow everywhere.” – Sholto Douglas – Anthropic
Sholto Douglas made the statement during a major public podcast interview in October 2025, coinciding with Anthropic’s release of Claude Sonnet 4.5—at the time, the world’s strongest and most “agentic” AI coding model. The comment specifically rebuts repeated industry and media assertions that large AI models have reached a ceiling or are slowing in progress. Douglas argues the opposite: that the field is in a phase of accelerating advancement, driven both by transformative hardware investment (“compute super-cycle”), new algorithmic techniques (particularly reinforcement learning and test-time compute), and the persistent “primitive” state of today’s AI engineering infrastructure.
He draws an analogy with early-stage, improvisational systems: the models are held together “by duct tape, best efforts, and late nights,” making clear that immense headroom for improvement remains at every level, from training data pipelines and distributed infrastructure to model architecture and reward design. As a result, every new benchmark and capability reveals further unrealised opportunity, with measurable progress charted month after month.
Douglas’s deeper implication is that claims of a plateau often arise from surface-level analysis or the “saturation” of public benchmarks, not from a rigorous understanding of what is technically possible or how much scale remains untapped across the technical stack.
Sholto Douglas: Career Trajectory and Perspective
Sholto Douglas is a leading member of Anthropic’s technical staff, focused on scaling reinforcement learning and agentic AI. His unconventional journey illustrates both the new talent paradigm and the nature of breakthrough AI research today:
- Early Life and Mentorship: Douglas grew up in Australia, where he benefited from unusually strong academic and athletic mentorship. His mother, an accomplished physician frustrated by systemic barriers, instilled discipline and a systemic approach; his Olympic-level fencing coach provided a first-hand experience of how repeated, directed effort leads to world-class performance.
- Academic Formation: He studied computer science and robotics as an undergraduate, with a focus on practical experimentation and a global mindset. A turning point was reading the “scaling hypothesis” for AGI, convincing him that progress on artificial general intelligence was feasible within a decade—and worth devoting his career to.
- Independent Innovation: As a student, Douglas built “bedroom-scale” foundation models for robotics, working independently on large-scale data collection, simulation, and early adoption of transformer-based methods. This entrepreneurial approach—demonstrating initiative and technical depth without formal institutional backing—proved decisive.
- Google (Gemini and DeepMind): His independent work brought him to Google, where he joined just before the release of ChatGPT, in time to witness and help drive the rapid unification and acceleration of Google’s AI efforts (Gemini, Brain, DeepMind). He co-designed new inference infrastructure that reduced costs and worked at the intersection of large-scale learning, reinforcement learning, and applied reasoning.
- Anthropic (from 2025): Drawn by Anthropic’s focus on measurable, near-term economic impact and deep alignment work, Douglas joined to lead and scale reinforcement learning research—helping push the capability frontier for agentic models. He values a culture where every contributor understands and can articulate how their work advances both capability and safety in AI.
Douglas is distinctive for his advocacy of “taste” in AI research, favouring mechanistic understanding and simplicity over clever domain-specific tricks—a direct homage to Richard Sutton’s “bitter lesson.” This perspective shapes his belief that the greatest advances will come not from hiding complexity with hand-crafted heuristics, but from scaling general algorithms and rigorous feedback loops.
Intellectual and Scientific Context: The ‘Plateau’ Debate and Leading Theorists
The debate around the so-called “AI plateau” is best understood against the backdrop of core advances and recurring philosophical arguments in machine learning.
The “Bitter Lesson” and Richard Sutton
- Richard Sutton (University of Alberta, DeepMind), one of the founding figures in reinforcement learning, crystallised the field’s “bitter lesson”: that general, scalable methods powered by increased compute will eventually outperform more elegant, hand-crafted, domain-specific approaches.
- In practical terms, this means that the field’s recent leaps—from vision to language to coding—are powered less by clever new inductive biases, and more by architectural simplicity plus massive compute and data. Sutton has also maintained that real progress in AI will come from reinforcement learning with minimal task-specific assumptions and maximal data, computation, and feedback.
Yann LeCun and Alternative Paradigms
- Yann LeCun (Meta, NYU), a pioneer of deep learning, has maintained that the transformer paradigm is limited and that fundamentally novel architectures are necessary for human-like reasoning and autonomy. He argues that unsupervised/self-supervised learning and new world-modelling approaches will be required.
- LeCun’s disagreement with Sutton’s “bitter lesson” centres on the claim that scaling is not the final answer: new representation learning, memory, and planning mechanisms will be needed to reach AGI.
Shane Legg, Demis Hassabis, and DeepMind
- DeepMind’s approach has historically been “science-first,” tackling a broad swathe of human intelligence challenges (AlphaGo, AlphaFold, science AI), promoting a research culture that takes long-horizon bets on new architectures (memory-augmented neural networks, world models, differentiable reasoning).
- Demis Hassabis and Shane Legg (DeepMind co-founders) have advocated for testing a diversity of approaches, believing that the path to AGI is not yet clear—though they too acknowledge the value of massive scale and reinforcement learning.
The Scaling Hypothesis: GW’s Essay and the Modern Era
- The so-called “scaling hypothesis”—the idea that simply making models larger and providing more compute and data will continue yielding improvements—has become the default “bet” for Anthropic, OpenAI, and others. Douglas refers directly to this intellectual lineage as the critical “hinge” moment that set his trajectory.
- This hypothesis is now being extended into new areas, including agentic systems where long context, verification, memory, and reinforcement learning allow models to reliably pursue complex, multi-step goals semi-autonomously.
Summing Up: The Current Frontier
Today, researchers like Douglas are moving beyond the original transformer pre-training paradigm, leveraging multi-axis scaling (pre-training, RL, test-time compute), richer reward systems, and continuous experimentation to drive model capabilities in coding, digital productivity, and emerging physical domains (robotics and manipulation).
Douglas’s quote epitomises the view that not only has performance not plateaued—every “limitation” encountered is a signpost for further exponential improvement. The modest, “patchwork” nature of current AI infrastructure is a competitive advantage: it means there is vast room for optimisation, iteration, and compounding gains in capability.
As the field races into a new era of agentic AI and economic impact, his perspective serves as a grounded, inside-out refutation of technological pessimism and a call to action grounded in both technical understanding and relentless ambition.

