15 Jan 2026

"Since 2020, we have seen a 600 000x increase in the computational scale of decentralized training projects, for an implied growth rate of about 20x/year." - Jack Clark - Import AI

Jack Clark on Exponential Growth in Decentralized AI Training

The Quote and Its Context

Jack Clark’s statement about the 600,000x increase in computational scale for decentralized training projects since 2020 is a striking observation about the democratization of frontier AI development.1,2,3,4 The implied growth rate of about 20x per year reflects one of the most significant shifts in the technological and political economy of artificial intelligence: the transition from centralized, proprietary training architectures controlled by a handful of well-capitalized labs toward distributed, federated approaches that let loosely coordinated collectives pool computational resources globally.
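The relationship between the two figures in the quote can be checked with a few lines of arithmetic. The sketch below assumes smooth compound growth; the function names are illustrative, and the exact measurement window is an assumption because the quote says only "since 2020".

```python
import math


def annual_growth_factor(total_growth: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_growth over `years`."""
    return total_growth ** (1.0 / years)


def years_to_reach(total_growth: float, annual_factor: float) -> float:
    """Years of compounding at annual_factor needed to reach total_growth."""
    return math.log(total_growth) / math.log(annual_factor)


TOTAL_GROWTH = 600_000  # figure taken from the quote

# Assumption: a full five-year window.
print(f"5-year window  -> {annual_growth_factor(TOTAL_GROWTH, 5):.1f}x per year")

# Assumption: exactly 20x per year, as stated in the quote.
print(f"20x per year   -> {years_to_reach(TOTAL_GROWTH, 20):.2f} years")
```

Under these assumptions, a full five-year window gives roughly 14x per year, while 20x per year reaches 600,000x in about 4.4 years. The quoted rate is therefore consistent with a measurement window somewhat shorter than five years.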

Jack Clark: Architect of AI Governance Thinking

Jack Clark is the Head of Policy at Anthropic and one of the most influential voices shaping how we think about AI development, governance, and the distribution of technological power.1 His trajectory positions him well to observe this transformation. Clark was part of the team that released GPT-2 at OpenAI in 2019, a moment he now reflects on as pivotal, not merely for the model’s capabilities but for what it revealed about scaling: larger models trained on more data would perform predictably better across diverse tasks, even without task-specific optimization.1

This insight proved prophetic. Clark recognized that GPT-2 was “a sketch of the future”—a partial glimpse of what would emerge through scaling. The paper’s modest performance advances on seven of eight tested benchmarks, achieved without narrow task optimization, suggested something fundamental about how neural networks could be made more generally capable.1 What followed validated his foresight: GPT-3, instruction-tuned variants, ChatGPT, Claude, and the subsequent explosion of large language models all emerged from the scaling principles Clark and colleagues had identified.

However, Clark’s thinking has evolved substantially since those early days. Reflecting in 2024, five years after GPT-2’s release, he acknowledged that while his team had anticipated many malicious uses of advanced language models, they failed to predict the most disruptive actual impact: the generation of low-grade synthetic content driven by economic incentives rather than malicious intent.1 This humility about the limits of foresight informs his current policy positions.

The Political Economy of Decentralized Training

Clark’s observation about the 600,000x scaling in decentralized training projects is not merely a technical metric; it is a statement about the distribution of power. Currently, the frontier of AI capability depends on concentrating vast computational resources in physically centralized clusters. Labs such as Anthropic and OpenAI and hyperscalers such as Google and Meta control this concentrated compute, which in theory lets governments and policymakers monitor and regulate AI development through chokepoints: controlling access to advanced semiconductors, tracking large training clusters, and licensing centralized development entities.3,4

Decentralized training undermines this assumption. If computational resources can be pooled across hundreds of loosely federated organizations and individuals globally, each contributing smaller clusters of GPUs or other accelerators, then the frontier of AI capability becomes distributed across many actors rather than concentrated in a few.3,4 That weakens AI policy approaches built on the premise of controllable centralization.

Recent proofs of concept underscore this trajectory:

  • Prime Intellect’s INTELLECT-1 (10 billion parameters) demonstrated that decentralized training at scale was technically feasible, a threshold achievement because it showed loosely coordinated collectives could match capabilities that previously required single-company efforts.3,9
  • INTELLECT-2 (32 billion parameters) followed, designed to compete with modern reasoning models through distributed training, suggesting that decentralized approaches were not merely proof-of-concept but could produce competitive frontier-grade systems.4
  • DiLoCoX, an advancement on DeepMind’s DiLoCo technology, demonstrated a 357x speedup in distributed training while achieving model convergence across decentralized clusters with minimal network bandwidth (1 Gbps), a crucial breakthrough because communication overhead had previously been the limiting factor in distributed training.2
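A back-of-envelope calculation shows why communication overhead dominates on slow links. The sketch below is illustrative only: it assumes a 32-billion-parameter model, 2 bytes per parameter (16-bit values), a single 1 Gbps link, and one full copy of the update sent per synchronization, with no compression or overlap with computation. The `local_steps` parameter is a hypothetical knob standing in for the general idea behind low-communication methods, which synchronize less often; it does not describe the actual DiLoCoX algorithm or its settings.

```python
def full_sync_seconds(num_params: float, bytes_per_param: int, link_gbps: float) -> float:
    """Seconds to send one full copy of the model update over a single link."""
    total_bits = num_params * bytes_per_param * 8
    return total_bits / (link_gbps * 1e9)


def sync_seconds_per_step(num_params: float, bytes_per_param: int,
                          link_gbps: float, local_steps: int) -> float:
    """Average communication time per training step when workers
    synchronize only once every `local_steps` steps."""
    return full_sync_seconds(num_params, bytes_per_param, link_gbps) / local_steps


# Illustrative assumptions, not figures from the cited training runs.
PARAMS = 32e9          # 32-billion-parameter model
BYTES_PER_PARAM = 2    # 16-bit values
LINK_GBPS = 1.0        # 1 Gbps link

every_step = sync_seconds_per_step(PARAMS, BYTES_PER_PARAM, LINK_GBPS, local_steps=1)
every_500 = sync_seconds_per_step(PARAMS, BYTES_PER_PARAM, LINK_GBPS, local_steps=500)

print(f"sync every step:      {every_step:.0f} s of communication per step")
print(f"sync every 500 steps: {every_500:.2f} s of communication per step")
```

Under these assumptions, synchronizing on every step costs about 512 seconds of communication per step, while synchronizing every 500 steps brings the average down to about one second. Reducing how often and how much workers communicate is what makes training over ordinary internet links plausible.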

The implied growth rate of about 20x per year suggests that technical barriers to decentralized training are falling faster than regulatory frameworks or policy interventions can adapt.

Leading Theorists and Intellectual Lineages

Scaling Laws and the Foundations

The intellectual foundation for understanding exponential growth in AI capabilities rests on the work of researchers who formalized scaling laws. While Clark and colleagues at OpenAI contributed to this work through GPT-2 and subsequent research, the broader field, including Jared Kaplan and Dario Amodei, who worked on scaling laws at OpenAI before co-founding Anthropic, established that model performance scales predictably with increases in parameters, data, and compute.1 These scaling laws supply the reasoning behind the claim that decentralized systems can be competitive: if performance depends mainly on parameters, data, and compute, a 32-billion-parameter model trained via distributed methods can approach the capabilities of a centrally trained model of similar scale.

Political Economy and Technological Governance

Clark’s thinking is situated within broader intellectual traditions examining how technology distributes power. His emphasis on the “political economy” of AI reflects influence from scholars and policymakers concerned with how technological architectures embed power relationships. The notion that decentralized training redistributes who can develop frontier AI systems draws on longstanding traditions in technology policy examining how architectural choices (centralized vs. distributed systems) have political consequences.

His advocacy for polycentric governance—distributing decision-making about AI behavior across multiple scales from individuals to platforms to regulatory bodies—reflects engagement with governance theory emphasizing that monocentric control is often less resilient and responsive than systems with distributed decision-making authority.5

The “Regulatory Markets” Framework

Clark has articulated the need for governments to systematically monitor the societal impact and diffusion of AI technologies, a position he advanced through the concept of “Regulatory Markets”—market-driven mechanisms for monitoring AI systems. This framework acknowledges that traditional command-and-control regulation may be poorly suited to rapidly evolving technological domains and that measurement and transparency might be more foundational than licensing or restriction.1 This connects to broader work in regulatory innovation and adaptive governance.

The Implications of Exponential Decentralization

The 600,000x growth since 2020, if sustained or accelerated, implies several transformative consequences:

On AI Policy: Traditional approaches to AI governance that assume centralized training clusters and a small number of frontier labs become obsolete. Export controls on advanced semiconductors, for instance, become less effective if 100 organizations in 50 countries can collectively train competitive models using previous-generation chips.3,4

On Open-Source Development: The growth depends crucially on the availability of open-weight models (like Meta’s LLaMA or DeepSeek) and accessible software stacks (like Prime.cpp) that enable distributed inference and fine-tuning.4 The democratization of capability is inseparable from the proliferation of open-source infrastructure.

On Sovereignty and Concentration: Clark frames this as essential for “sovereign AI”—the ability for nations, organizations, and individuals to develop and deploy capable AI systems without dependence on centralized providers. However, this same decentralization could enable the rapid proliferation of systems with limited safety testing or alignment work.4

On Clark’s Own Policy Evolution: Notably, Clark has found himself increasingly at odds with AI safety and policy positions he previously held or was associated with. He expresses skepticism toward licensing regimes for AI development, restrictions on open-source model deployment, and calls for worldwide development pauses—positions that, he argues, would create concentrated power in the present to prevent speculative future risks.1 Instead, he remains confident in the value of systematic societal impact monitoring and measurement, which he has championed through his work at Anthropic and in policy forums like the Bletchley and Seoul AI safety summits.1

The Unresolved Tension

The exponential growth in decentralized training capacity creates a central tension in AI governance: it democratizes access to frontier capabilities but potentially distributes both beneficial and harmful applications more widely. Clark’s quote and his broader work reflect an intellectual reckoning with this tension—recognizing that attempts to maintain centralized control through policy and export restrictions may be both technically infeasible and politically counterproductive, yet that some form of measurement and transparency remains essential for democratic societies to understand and respond to AI’s societal impacts.

References

1. https://jack-clark.net/2024/06/03/import-ai-375-gpt-2-five-years-later-decentralized-training-new-ways-of-thinking-about-consciousness-and-ai/

2. https://jack-clark.net/2025/06/30/import-ai-418-100b-distributed-training-run-decentralized-robots-ai-myths/

3. https://jack-clark.net/2024/10/14/import-ai-387-overfitting-vs-reasoning-distributed-training-runs-and-facebooks-new-video-models/

4. https://jack-clark.net/2025/04/21/import-ai-409-huawei-trains-a-model-on-8000-ascend-chips-32b-decentralized-training-run-and-the-era-of-experience-and-superintelligence/

5. https://importai.substack.com/p/import-ai-413-40b-distributed-training

6. https://www.youtube.com/watch?v=uRXrP_nfTSI

7. https://importai.substack.com/p/import-ai-375-gpt-2-five-years-later/comments

8. https://jack-clark.net

9. https://jack-clark.net/2024/12/03/import-ai-393-10b-distributed-training-run-china-vs-the-chip-embargo-and-moral-hazards-of-ai-development/

10. https://www.lesswrong.com/posts/iFrefmWAct3wYG7vQ/ai-labs-statements-on-governance

 

Global Advisors | Quantified Strategy Consulting