A daily bite-size selection of top business content.

PM edition. Issue number 1345

Quote: Andrej Karpathy - Anthropic (OpenAI founder, formerly head of Tesla AI)

"You can give [Claude Fable 5, the same underlying model as Mythos but with added safeguards] a lot more ambitious tasks than what you're used to, the model 'gets it' and it will just go, and it's never felt this tempting to stop looking at the code at all." - Andrej Karpathy - Anthropic (OpenAI founder, formerly head of Tesla AI)

Modern software development is being pulled towards a regime where human programmers increasingly specify intent while machines decide the details. That shift is most visible not in abstract benchmarks but in the psychological moment when an expert developer realises they can hand over a far more ambitious task than before, watch an AI system autonomously decompose and implement it, and feel genuine temptation to stop inspecting every line it produces. For someone steeped in traditional notions of craftsmanship and code review, that temptation is both intoxicating and alarming.

The move from instructions to intent

The underlying transition is from imperative programming, where humans micromanage every step, to a declarative style where they specify success criteria and let an AI agent find a path. Historically, a senior engineer might spend hours designing architecture, writing scaffolding, and orchestrating tools; now, high-capability models can generate coherent multi-file projects, manage dependencies, restructure modules, and even propose test suites to validate their own work. In that world, the bottleneck shifts from typing speed or API recall to the clarity and completeness of the human specification.

This is what makes the ability to give a model substantially more ambitious tasks so significant. When an AI system can handle not just a function or a bug fix but an end-to-end feature, a migration, or a refactor across tens of files, the human role changes. The developer becomes more of a product and safety architect: deciding goals, constraints, and trade-offs, then auditing whether the agent met them. The quote speaks directly to this: work that once had to be decomposed into micro-prompts can now be expressed as a single high-level directive, with the model reliably filling in the operational gaps.

Why this particular endorsement matters

The significance of that shift is amplified by who is speaking. Andrej Karpathy is not a casual user experimenting with consumer tooling but a foundational figure in modern deep learning and applied AI. He was a founding member of OpenAI and later headed Tesla Autopilot, leading large teams building vision and planning systems for real-world safety-critical autonomy. He has also been one of the most visible educators in the field, teaching tens of thousands of practitioners how neural networks work and how to reason about their failure modes.

That background makes it noteworthy that he reports surprise at feeling "behind" as a programmer, and an ego hit from giving more of the work to AI. When someone with deep understanding of model limitations and training quirks reports that a code-focused model "gets it" on larger, more complex tasks, it is not naive enthusiasm but a statement grounded in long experience of what usually goes wrong. It suggests a step change in practical reliability, especially on extended coding sessions.

Claude Fable 5, Mythos and safeguards

The specific model involved, Claude Fable 5, sits in a deliberately structured product family. Anthropic describes Fable 5 as sharing the same underlying model as Claude Mythos, but with additional safeguards and alignment layers. In practice, that means the core capability for long, multi-step reasoning and coding is retained, while the system has tighter policies around potentially harmful outputs and more conservative behaviours in ambiguous domains. Mythos is aimed at frontier, less constrained exploration; Fable, at general-purpose work where safety, compliance, and predictability matter.

Karpathy himself characterises Fable 5 as a "major-version-bump-deserving step change", particularly on long and difficult tasks. Reports and demos around the release highlight stronger performance not only on standard coding benchmarks but on real-world development flows: navigating large repositories, performing multi-file edits, and maintaining context over extended sessions. The result is a system that feels less like an autocomplete gadget and more like a junior engineer who can hold a problem in mind over hours of work.

Crucially, the "added safeguards" do not just refer to refusal policies. They also encompass training and inference-time measures that make the model more robust to prompt injection, reduce hallucinations, and bias it toward verifiable operations like running tests or inspecting diffs instead of bluffing. That combination - high capability plus strong guardrails - is what makes handing over more ambitious tasks psychologically viable. The user is not simply trusting a stochastic parrot; they are interacting with a toolchain engineered for cautious autonomy.

Karpathy's journey into agentic coding

To understand the deeper significance of the quote, it helps to situate it within Karpathy's broader ideas about software. In recent years he has described a transition towards what he calls "Software 3.0" or "agentic coding". Earlier eras could be caricatured as follows: in Software 1.0, humans wrote explicit logic; in Software 2.0, humans trained models but still wrote the surrounding infrastructure; in Software 3.0, AI systems increasingly write, test, and maintain significant portions of the codebase themselves, guided by human intent and oversight.

Within that frame, he has promoted practices like "vibe coding", where a developer converses with an AI assistant, iteratively refining prompts and reading outputs rather than manually hand-crafting every function. The point is not laziness but bandwidth: by offloading boilerplate and low-level wiring, humans can spend more time on product thinking, architecture, and evaluation. Yet he has also been candid about the danger of "brain atrophy" if humans stop engaging deeply with technical substance and become mere prompt routers.

His move to Anthropic, announced publicly on X, is explicitly about pushing this paradigm further. He is joining the Claude pre-training team, with a mandate to build a sub-team focused on using Claude itself to accelerate pre-training research. That is, he is not only using AI to write ordinary application code but using AI agents to help design, run, and analyse the experiments that produce the next generation of AI models. Some observers describe this as laying the groundwork for recursive self-improvement, where systems contribute directly to their own advancement.

The temptation to stop reading the code

The most charged part of the quote is not the praise for task capability but the admission that "it's never felt this tempting to stop looking at the code at all". That sentence crystallises a new risk frontier. Up to now, cautious practitioners have recommended heavy human inspection of AI-generated code: checking logic, scanning for security flaws, reviewing for maintainability. Those practices are time-consuming, but they preserve a culture where humans remain accountable for what ships.

As model quality improves, the marginal benefit of reading every line may appear to shrink. When the output often looks clean, idiomatic, and passes tests, the pressure to skim rather than scrutinise grows stronger. That temptation is exacerbated by business incentives. If an AI agent can implement a feature in 1 hour that would take a human 10 hours, organisations will be driven to capture that 9-hour gain, especially under competitive pressure. Deep review may be cast as optional overhead rather than mandatory safety.

This dynamic is not unique to coding. In aviation, pilots became less hands-on as autopilots grew more reliable, leading to worries about skill decay; yet in rare edge cases, human intervention remained vital. The same pattern looms in software: as AI-generated code becomes the default, there is a risk that fewer engineers retain the ability to reason from first principles when the system fails in a novel way.

Strategic and technological tension

The tension, then, is between speed and scrutiny, between trusting an increasingly competent agent and insisting on human understanding. On one side lies the productivity windfall: AI can manage dependency graphs, propose architecture refactors, and generate regression tests at a pace that would overwhelm any human team. On the other side lies epistemic opacity: large language models generate code via pattern completion, not explicit formal derivation, and even when the code passes tests, it may encode subtle bugs, non-obvious security weaknesses, or performance pathologies.

In safety-conscious organisations, this tension will likely be addressed with layered controls. For critical systems, one can imagine a workflow where an AI agent proposes changes, another independent agent attempts to break or exploit them, and human reviewers arbitrate. For less critical contexts, teams may accept a higher degree of automated autonomy, using telemetry and canary deployments to catch regressions in production.

Technologically, the quote points to a world where coding models are integrated deeply into development environments as persistent agents rather than stateless assistants. In that world, the system remembers project history, tracks unresolved issues, and maintains a map of the codebase. This is already visible in the way tools like Claude Code are embedded into full IDE surfaces where generation, testing, and git operations happen in one loop. The practical question is not whether such agents will exist but what guardrails and observability layers they will carry.

Anthropic's safety-first positioning

Anthropic has invested heavily in a brand and research agenda built around "constitutional" AI and safety. That approach involves specifying normative guidelines that models are trained to follow, and then auditing behaviour against those guidelines. For coding, that can be extended into concrete policies: refuse to write insecure patterns, prefer constant-time implementations in cryptographic contexts, suggest mitigation when encountering user-supplied input.

Fable 5's positioning as "Mythos but safe" reflects a belief that potential harms can be reduced without sacrificing too much capability. Karpathy's enthusiasm suggests that, at least in his workflows, the safeguards are not experienced as a hindrance but as a trust multiplier. He can instruct the model more ambitiously precisely because he expects it to act conservatively when it encounters sensitive operations and to avoid reckless actions like deleting large portions of a repository without confirmation.

Yet there remains an unresolved debate over how far safety techniques can go in mitigating risks that emerge from sheer scale and generality. Even a strongly aligned model may generate exploitable code when given innocuous prompts, simply because the space of correct-looking but vulnerable implementations is vast. Critics argue that this cannot be fully addressed by refusal policies and that deep formal methods or language-level safety guarantees will be necessary. The temptation to "stop looking at the code" must be evaluated against that backdrop.

Debates and objections

There are at least four major lines of objection or concern surrounding the world implied by the quote.

First, there is the professional identity and labour market concern. If AI tools can handle an increasing share of coding, especially the more routine or boilerplate-heavy parts, junior roles may shrink, making it harder for new developers to gain experience. Karpathy himself acknowledges a crossroads between "brain atrophy" and skill evolution, where humans must decide whether to re-skill towards higher-level system design and evaluation or risk being displaced.

Second, there is the epistemic reliability concern. Benchmarks can show impressive averages, but systems are still brittle on rare edge cases, poorly specified tasks, or ambiguous requirements. A sense that "the model gets it" can mask the fact that its understanding is statistical, not semantic in a human sense. Critics worry that as trust grows, organisations will deploy AI-generated code beyond domains where its failure modes are well characterised.

Third, there is the self-referential risk of using AI to build the next generation of AI. The work Karpathy is taking on at Anthropic involves using Claude to accelerate pre-training research itself, potentially moving towards recursive self-improvement. Enthusiasts argue that this is necessary to make progress at the current frontier, where experiments are too numerous and complex for purely human pipelines. Skeptics warn that errors, biases, or misalignments may be amplified if AI-driven research loops are not carefully constrained and audited.

Fourth, there is the cultural concern. Software engineering has long valued code readability not only for maintainability but as a vehicle for knowledge sharing. If more of the codebase is generated and fewer humans read it deeply, tacit knowledge may concentrate in the behaviour of models rather than in the minds of engineers. Some fear a loss of craftsmanship and a drift towards opaque systems even within a single organisation.

Why this moment matters

Despite these concerns, the practical direction of travel is clear. Developers are already wiring multiple frontier models into a single development surface, choosing per-task which to call, whether Claude, GPT, or others, based on performance and cost rather than vendor loyalty. Tools that bundle coding, testing, and version control into agentic workflows are proliferating. The quote captures a threshold where these tools no longer feel like experimental sidekicks but like the primary engine of implementation.

From a strategic perspective, this changes how organisations think about their software capability. Instead of asking how many engineers they can hire, they will ask how effectively they can orchestrate AI coding capacity: prompt libraries, evaluation harnesses, and safety procedures become as important as hiring pipelines. Companies that embrace this shift thoughtfully will invest in engineers who are excellent at specifying intent, designing tests, and auditing AI proposals - a different profile from traditional full-stack roles.

For individual developers, it poses a challenge and an invitation. The challenge is to resist the laziness of unexamined trust while also resisting nostalgia for a world where writing every line oneself was feasible. The invitation is to climb the abstraction ladder: to become better at defining product goals, at thinking in systems, at debugging not just functions but entire AI-assisted workflows.

Karpathy's experience with Claude Fable 5 illustrates that frontier models are now strong enough to make this shift emotionally palpable. When a veteran practitioner feels tempted to stop reading the code, that is not a signal to give up scrutiny, but it is evidence that the agent has crossed a qualitative threshold. The world of software will be shaped by how we respond to that feeling: whether by surrendering to it, ignoring it, or deliberately building new practices, tools, and norms that harness its power without abandoning responsibility.

Term: Absorption costing - Managerial accounting

"Absorption costing, also known as full costing, is a managerial accounting method that captures and assigns all manufacturing costs to the specific products being produced. Under this system, the unit cost of an item absorbs every single expense required to get it ready for sale, including both fixed and variable costs." - Absorption costing - Managerial accounting

Profitability in manufacturing depends as much on how costs are measured as on how efficiently factories run. The way overheads such as factory rent, depreciation and supervisory salaries are spread across products can change reported margins, influence pricing, and even affect behaviour inside the plant. Absorption costing sits at the centre of this machinery, because it drives the unit cost that flows into inventory valuation, cost of goods sold, and headline profit figures used by boards, lenders and tax authorities alike.

Underlying economic issue: who should bear the fixed factory bill?

Manufacturing businesses incur large fixed costs to keep production capacity available: buildings, machines, salaried staff and support functions. These expenses are paid regardless of whether the factory runs at 20 percent or 90 percent of capacity. The central issue is how to attribute this fixed factory bill to individual units of output so that financial statements, pricing decisions and performance assessments make sense.

Absorption costing answers by insisting that every unit produced should carry a fair slice of that fixed burden, alongside its direct materials, direct labour and variable overhead. In other words, the economic logic is that capacity costs exist in order to make units, so units must "absorb" them. This contrasts with variable costing, where fixed manufacturing overhead is treated as a period expense of having capacity, rather than a cost of individual units.

The tension between these views is not merely academic. It determines whether unsold inventory carries embedded fixed overhead on the balance sheet (absorption costing) or whether all fixed overhead hits the income statement immediately (variable costing). The result is different profit paths over time when production and sales volumes diverge.

Substantive meaning: what costs are absorbed?

In practice, absorption costing brings together four categories of manufacturing cost as product cost:

- Direct materials

- Direct labour

- Variable manufacturing overhead (for example, indirect supplies, power linked to machine hours)

- Fixed manufacturing overhead (for example, factory rent, depreciation, factory management salaries)

These costs are all treated as part of inventory while units remain unsold and only become cost of goods sold when the units leave inventory. Selling, general and administrative costs, whether fixed or variable, remain period costs and are never attached to units.

From a financial reporting standpoint, this approach is not optional. Under major accounting frameworks, including IFRS (IAS 2) and US GAAP (ASC 330), inventory must be carried at cost, including an appropriate allocation of fixed and variable production overhead. Absorption costing therefore underpins external profit reporting, tax computation and many loan covenant calculations.

Mathematical specification of unit cost under absorption costing

Although the mechanics appear straightforward, writing the relationships explicitly clarifies how production volume and allocation rates interact. Suppose a single product is manufactured in a period. Denote:

- $DM$: total direct materials cost for the period

- $DL$: total direct labour cost

- $VOH$: total variable manufacturing overhead

- $FOH$: total fixed manufacturing overhead

- $Q$: total units produced in the period

The total product cost for the period under absorption costing is:

$$TC_{\text{abs}} = DM + DL + VOH + FOH$$

The absorption costing unit cost is then:

$$c_{\text{abs}} = \frac{TC_{\text{abs}}}{Q} = \frac{DM + DL + VOH + FOH}{Q}$$

Variable costing would instead treat only variable elements as product cost. Let $VC$ be total variable manufacturing cost ($VC = DM + DL + VOH$). The variable costing unit cost is:

$$c_{\text{var}} = \frac{VC}{Q} = \frac{DM + DL + VOH}{Q}$$

The difference between the two unit costs is simply the fixed overhead per unit:

$$c_{\text{abs}} - c_{\text{var}} = \frac{FOH}{Q}$$

This fixed overhead rate, often computed per machine hour or labour hour in multi-product environments, is the core mechanism by which overhead is absorbed into inventory. When production volume rises, $Q$ increases, reducing fixed overhead per unit; when volume falls, each unit carries a heavier fixed overhead charge.
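As a rough illustration of these unit cost relationships, the sketch below computes both unit costs for a single product. All figures and names (`PeriodCosts`, `period`, `low_volume`) are hypothetical and chosen only to make the arithmetic easy to follow; the low-volume case assumes variable costs scale with output while fixed overhead stays constant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodCosts:
    """Total manufacturing costs for one product in one period (hypothetical)."""
    direct_materials: float
    direct_labour: float
    variable_overhead: float
    fixed_overhead: float
    units_produced: int


def absorption_unit_cost(c: PeriodCosts) -> float:
    # All four manufacturing cost categories are spread over units produced.
    total = (c.direct_materials + c.direct_labour
             + c.variable_overhead + c.fixed_overhead)
    return total / c.units_produced


def variable_unit_cost(c: PeriodCosts) -> float:
    # Only the variable manufacturing costs are treated as product cost.
    total = c.direct_materials + c.direct_labour + c.variable_overhead
    return total / c.units_produced


def fixed_overhead_per_unit(c: PeriodCosts) -> float:
    # The gap between the two unit costs.
    return c.fixed_overhead / c.units_produced


# Hypothetical period: 10,000 units produced.
period = PeriodCosts(40_000, 30_000, 10_000, 60_000, 10_000)
# Same factory at half the volume: variable costs halve, fixed overhead does not.
low_volume = PeriodCosts(20_000, 15_000, 5_000, 60_000, 5_000)

print(absorption_unit_cost(period), variable_unit_cost(period))          # 14.0 8.0
print(absorption_unit_cost(low_volume), variable_unit_cost(low_volume))  # 20.0 8.0
```

In this example the variable unit cost stays at 8.0 while the absorbed fixed overhead per unit doubles from 6.0 to 12.0 when volume halves, which is the volume sensitivity described above.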

Income effects: production vs sales volume

The choice of costing method does not change total cash flows, but it can change the timing of reported profit. Under absorption costing, the fixed overhead tied to unsold units remains in inventory and is not yet expensed. Under variable costing, all fixed manufacturing overhead for the period appears immediately as an expense. As a result, in any period where production exceeds sales, absorption costing will usually show higher profit than variable costing; when production is below sales, the reverse occurs.

A simple reconciliation highlights the mechanism. Define:

- $S$: units sold in the period

- $\Delta I$: change in inventory units (positive if inventory grows)

- $f$: fixed overhead per unit produced

The difference between absorption costing net income ($NI_{\text{abs}}$) and variable costing net income ($NI_{\text{var}}$) in a period is:

$$NI_{\text{abs}} - NI_{\text{var}} = f \times \Delta I$$

When production exceeds sales so that $\Delta I > 0$, fixed overhead is deferred in inventory and $NI_{\text{abs}}$ exceeds $NI_{\text{var}}$. When sales draw down inventory so that $\Delta I < 0$, previously deferred fixed overhead flows to cost of goods sold, making $NI_{\text{abs}}$ lower than $NI_{\text{var}}$. When production equals sales, both methods report the same profit.
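The reconciliation can be checked numerically. The following is a minimal single-period sketch that assumes no opening inventory and sales no greater than production; the price, cost figures and function names are hypothetical and are not taken from any particular company or standard.

```python
def absorption_income(price, variable_unit_cost, fixed_overhead, sga,
                      units_produced, units_sold):
    """Net income when each unit sold carries its share of fixed overhead."""
    fixed_per_unit = fixed_overhead / units_produced
    unit_cost = variable_unit_cost + fixed_per_unit
    return units_sold * price - units_sold * unit_cost - sga


def variable_income(price, variable_unit_cost, fixed_overhead, sga, units_sold):
    """Net income when all fixed overhead is expensed in the period."""
    contribution = units_sold * (price - variable_unit_cost)
    return contribution - fixed_overhead - sga


# Hypothetical period: 10,000 units produced, 8,000 sold, no opening inventory.
price, var_cost, fixed_oh, sga = 20.0, 8.0, 60_000.0, 20_000.0
produced, sold = 10_000, 8_000

ni_abs = absorption_income(price, var_cost, fixed_oh, sga, produced, sold)
ni_var = variable_income(price, var_cost, fixed_oh, sga, sold)

fixed_per_unit = fixed_oh / produced   # 6.0 per unit
inventory_change = produced - sold     # 2,000 units added to inventory

print(ni_abs, ni_var)                      # 28000.0 16000.0
print(ni_abs - ni_var)                     # 12000.0
print(fixed_per_unit * inventory_change)   # 12000.0
```

The 12,000 gap is exactly the fixed overhead sitting in the 2,000 unsold units. Setting `sold` equal to `produced` makes the two income figures coincide.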

This algebra explains why, although standard-setting bodies still require absorption costing for external reporting, many internal management reports supplement it with variable or contribution costing to show the direct profit impact of volume changes.

Practical mechanics: cost pools and allocation bases

The theoretical unit cost formulas mask a significant practical challenge: allocating overhead to products in a way that is both systematic and economically meaningful. In a multi-product plant, overheads are typically collected into cost pools and assigned to products using allocation bases such as machine hours, labour hours, or material quantity.

A typical implementation proceeds in three stages:

- Establish cost pools: group similar overhead costs, for example all machine-related expenses, maintenance, and depreciation into a machinery pool; factory management salaries into a supervision pool.

- Determine usage measures: identify the driver that best reflects how products consume each cost pool, such as machine hours, direct labour hours, or production runs.

- Compute and apply rates: divide each pool by its total driver quantity to obtain a rate (for example, per machine hour), then multiply by each product's usage to assign overhead.
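The three stages above can be sketched in a few lines. This is an illustrative example only: the pool names, driver choices, products and amounts are all hypothetical, and a real plant would typically have many more pools and drivers.

```python
# Stage 1: cost pools (hypothetical overhead totals for the period).
pools = {"machinery": 120_000.0, "supervision": 60_000.0}

# Stage 2: the driver assumed to reflect how products consume each pool.
drivers = {"machinery": "machine_hours", "supervision": "labour_hours"}

# Driver quantities consumed by each (hypothetical) product.
usage = {
    "widget": {"machine_hours": 3_000, "labour_hours": 1_000},
    "gadget": {"machine_hours": 1_000, "labour_hours": 2_000},
}


def overhead_rates(pools, drivers, usage):
    """Stage 3a: pool cost divided by total driver quantity gives a rate."""
    rates = {}
    for pool, cost in pools.items():
        driver = drivers[pool]
        total_driver = sum(product_usage[driver] for product_usage in usage.values())
        rates[pool] = cost / total_driver
    return rates


def allocate(pools, drivers, usage):
    """Stage 3b: rate times each product's driver usage assigns the overhead."""
    rates = overhead_rates(pools, drivers, usage)
    return {
        product: sum(rates[pool] * product_usage[drivers[pool]] for pool in pools)
        for product, product_usage in usage.items()
    }


print(overhead_rates(pools, drivers, usage))  # {'machinery': 30.0, 'supervision': 20.0}
print(allocate(pools, drivers, usage))        # {'widget': 110000.0, 'gadget': 70000.0}
```

The allocated amounts sum to the 180,000 in the pools, so all overhead is absorbed. Swapping the driver for a pool changes how the total is split between products, not the total itself.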

Absorption costing does not prescribe a particular choice of allocation base; the method is an overarching principle that all manufacturing costs should be absorbed by units. The sophistication of the allocation scheme can range from a single plant-wide rate to detailed activity-based costing with many cost pools and drivers.

Relation to variable costing and contribution analysis

Variable costing strips away the fixed overhead component of unit cost, focusing on the marginal resource consumption of each unit. For internal decision-making, this provides a cleaner view of how additional units affect profit because fixed overhead is held constant. Contribution margin analysis, which subtracts variable costs from sales to show the amount available to cover fixed costs and profit, is built on this variable costing logic.

The key contrast can be summarised conceptually:

- Absorption costing: all manufacturing costs, including fixed overhead, are product costs; inventory includes fixed overhead; external reporting requirement.

- Variable costing: only variable manufacturing costs are product costs; fixed manufacturing overhead is a period cost; used internally for planning, pricing, and performance evaluation.

Managers need both lenses. Absorption costing ensures financial statements comply with standards and reflect the full cost invested in inventory. Variable costing illuminates how decisions about volume, mix, and pricing will change cash profit in the short and medium term.

Major schools of thought and debates

Within managerial accounting, debates around absorption costing centre on three themes: performance measurement, decision relevance and overhead allocation philosophy.

First, performance measurement. Critics argue that tying profit to production volume via overhead absorption can create perverse incentives. Because producing more units spreads fixed overhead over more units, the unit cost falls, cost of goods sold per unit drops, and short-term profit often rises as long as the additional units go into inventory rather than being sold at a loss. This can encourage managers evaluated on absorption-based profit to overproduce relative to demand, leading to excess inventory, storage costs and potential obsolescence.
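A small numerical sketch makes the incentive concrete. It holds sales constant and varies only production, assuming no opening inventory; the function name and all figures are hypothetical.

```python
def absorption_profit(units_produced, units_sold, price=20.0,
                      variable_unit_cost=8.0, fixed_overhead=60_000.0,
                      sga=20_000.0):
    """Return (reported profit, closing inventory value) under absorption costing.

    Assumes no opening inventory, so sales cannot exceed production.
    """
    if units_sold > units_produced:
        raise ValueError("units_sold cannot exceed units_produced in this sketch")
    unit_cost = variable_unit_cost + fixed_overhead / units_produced
    profit = units_sold * (price - unit_cost) - sga
    closing_inventory = (units_produced - units_sold) * unit_cost
    return profit, closing_inventory


# Demand is fixed at 8,000 units; only the production decision changes.
results = {q: absorption_profit(q, 8_000) for q in (8_000, 10_000, 12_000)}
for produced, (profit, inventory) in results.items():
    print(produced, profit, inventory)
# 8000 16000.0 0.0
# 10000 28000.0 28000.0
# 12000 36000.0 52000.0
```

Reported profit rises from 16,000 to 36,000 with no change in sales, while unsold stock on the balance sheet grows to 52,000. The sketch ignores storage and obsolescence costs, which would make the overproduction look worse in economic terms.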

Proponents respond that robust inventory and working capital controls, together with careful use of variable costing and non-financial metrics, can mitigate these incentives while preserving the benefits of full cost information for pricing and long-term investment decisions.

Second, decision relevance. For decisions such as special orders, make-or-buy evaluations, or short-term pricing in the face of spare capacity, the fixed overhead portion of unit cost is sunk in the short run and should not drive the decision. Analysts therefore often ignore the absorbed fixed overhead in unit cost and instead work from variable costs and incremental cash flows. This creates a conceptual split between the "accounting cost" of a unit (including overhead) and the "economic cost" relevant for a particular decision scenario.
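The split between accounting cost and decision-relevant cost can be illustrated with a hypothetical special order. The sketch assumes spare capacity covers the whole order, regular sales are unaffected, and fixed overhead does not change unless an incremental amount is stated; the names and numbers are illustrative only.

```python
def special_order_gain(order_units, order_price, variable_unit_cost,
                       incremental_fixed_cost=0.0):
    """Incremental profit from accepting a one-off order with spare capacity."""
    return order_units * (order_price - variable_unit_cost) - incremental_fixed_cost


# Hypothetical unit costs: 8.0 variable plus 6.0 absorbed fixed overhead.
variable_cost = 8.0
absorbed_unit_cost = 14.0

# A customer offers 11.0 per unit for 1,000 units.
order_units, order_price = 1_000, 11.0

# Compared with the absorbed unit cost, the order appears to lose money.
full_cost_margin = order_price - absorbed_unit_cost

# Compared with incremental cost, it adds to profit.
gain = special_order_gain(order_units, order_price, variable_cost)

print(full_cost_margin)  # -3.0
print(gain)              # 3000.0
```

The order looks unprofitable against the 14.0 accounting cost yet adds 3,000 to profit, because the fixed overhead is incurred whether or not the order is accepted. If accepting it required extra fixed spending, that amount would be passed as `incremental_fixed_cost`.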

Third, overhead allocation philosophy. Traditional absorption costing usually allocates overhead using volume-based drivers like labour or machine hours. As production technologies and product diversity expanded, critics pointed out that such bases can distort product costs: low-volume, complex products may consume disproportionate setup and scheduling resources that do not scale with simple machine hours. Activity-based costing emerged as a refinement, retaining the absorption principle but using multiple cost drivers linked to underlying activities. This evolution reflects a broader debate about whether any allocation of common fixed costs is inherently arbitrary or whether careful design can approximate economic cause-and-effect sufficiently for management use.

Why absorption costing still matters

Despite these criticisms and refinements, absorption costing remains central to financial management for several reasons.

First, it is mandated for external reporting and taxation. Inventory must include an allocation of fixed overhead under accounting standards, which means any manufacturer preparing audited accounts must implement some form of absorption costing. As a result, banks, investors and regulators interpret performance largely through absorption-based statements.

Second, it anchors pricing and profitability analysis in the full cost base. Over time, businesses must recover both variable and fixed manufacturing costs through prices if they are to remain viable. While short-run decisions can legitimately use variable cost information, sustainable pricing strategies need to recognise the burden of capacity costs, which absorption costing surfaces.

Third, it disciplines capacity investment and utilisation decisions. By making fixed overhead visible within unit costs, absorption costing signals when capacity is under-utilised and factory-scale economics are deteriorating. Rising unit costs due to falling volume highlight the financial consequences of excess capacity or lost demand, encouraging rebalancing either through market expansion or capacity reduction.

Finally, it provides a common language for integrating financial control with operational data. Overhead rates per machine hour or per labour hour connect accounting records to shop-floor metrics, enabling cost variance analysis, standard costing systems and budgetary control. Even when management decisions rely on more refined models, the absorption framework underlies many of the control reports they receive.

Contemporary practice and evolving challenges

Modern manufacturing environments pose new challenges for absorption costing. Automation reduces direct labour content and increases capital intensity, weakening the link between simple volume measures and true resource consumption. Multi-site global supply chains complicate the definition of what counts as "manufacturing" overhead for a particular product. Customisation and short product life cycles create more setup and engineering costs, whose allocation may dominate traditional overhead pools.

Practitioners respond by:

- Refining cost pools and drivers, for example separating machine-level overhead, setup costs, quality assurance and engineering support so that each is allocated using an appropriate activity driver.

- Integrating operational systems with costing, using data from production execution and planning systems to update overhead drivers in near real time.

- Running parallel views: one set of absorption-based numbers for external reporting and high-level budgeting, and alternative contribution and activity-based analyses for operational decisions.

Even as digital tools make more sophisticated costing feasible, the fundamental requirement remains: inventory values on the balance sheet and cost of goods sold in the income statement must reflect all manufacturing costs, including an allocation of fixed overhead. Absorption costing provides the conceptual and procedural backbone for meeting that requirement.

Understanding how this method works, where it can mislead, and how it interacts with alternative views such as variable and activity-based costing equips managers, analysts and students to interpret reported margins critically, design better performance measures and make more informed operational and strategic decisions.

Quote: Anthropic - Artificial Intelligence - Recursive Self Improvement

"Claude writes a significant proportion of Anthropic's code. As of May 2026, more than 80% of the code we merge into Anthropic's codebase was authored by Claude. Before Claude Code launched in research preview in February 2025, this number was in the low single digits." - Anthropic - Artificial Intelligence - Recursive Self Improvement

The moment an internal engineering metric flips from human-written to AI-written code marks a structural shift in how complex software systems are built and evolved, not just a productivity bump for individual programmers. It signals that the primary generative force shaping a large codebase has become a model rather than a workforce, and that human engineers are increasingly curators, reviewers, and system designers guiding a non-human author.

In Anthropic's case, that shift is tightly bound to a broader concern: the trajectory from powerful coding assistants to systems that can meaningfully participate in, and eventually drive, the entire AI research and development cycle. When an AI model can write most of the code for its own infrastructure, tools, and scaffolding, the boundary between "AI helps humans build AI" and "AI builds AI" becomes thinner, and the timeline to more thorough forms of recursive self-improvement compresses.

From coding assistant to dominant author

Large language models like Claude were initially introduced as general-purpose assistants: chatbots that could answer questions, draft text, help with documents, and generate basic code. Early coding capabilities looked like autocomplete on steroids: filling in small functions, refactoring snippets, or suggesting tests. In that phase, AI was clearly subordinate to the human developer, integrated into IDEs as a suggestion layer with humans still doing the conceptual work, system design, and most of the implementation.

The internal numbers highlighted by Anthropic indicate that this relationship has inverted in at least one crucial dimension: most merged code is now authored by the model rather than by employees. Human engineers still specify goals, review diffs, and orchestrate work, but the bulk of literal line-by-line code is machine-generated. Independent developers using Claude Code describe a similar workflow: they treat the AI interface almost as the primary editor, with a traditional editor demoted to a verification and correction tool. One typical pattern is to spend most of the time explaining the problem and iterating on plans with the model, then auto-accept its changes, and only afterwards manually review and adjust. That mirrors the internal picture: humans move up a level of abstraction, while the model handles implementation detail at scale.

The key structural consequence is that the constraint on how fast a codebase can change shifts away from human typing speed or individual concentration. Instead, the main bottlenecks become prompt quality, review capacity, testing infrastructure, and organisational willingness to deploy AI-authored changes. Once those guardrails are in place, the marginal cost of asking the AI to implement yet another subsystem approaches the cost of specifying it, rather than the cost of building it by hand.

Recursive self-improvement: several distinct mechanisms

The idea of recursive self-improvement (RSI) in AI originally focused on a dramatic scenario: a sufficiently capable system rewrites its own code, becomes smarter, uses that increased intelligence to further rewrite itself, and so on, producing an "intelligence explosion". In more formal discussions, RSI is framed as a process where an AI improves its own ability to improve, potentially leading to superintelligence if the feedback loop is strong enough. For decades this remained hypothetical, because no deployed system could modify its own internals in a reliable, directed way.

Recent work on RSI has clarified that there are at least three separable mechanisms, each with different bottlenecks and risk profiles. First, there is what some researchers call scaffolding-level improvement: you keep the base model weights fixed but wrap the model in better tools, agents, and workflows that make more effective use of its capabilities over time. Coding agents that orchestrate tool calls, decompose tasks into subproblems, and maintain long-lived workspaces fall into this category. The AI does not change itself directly, but the environment around it is iteratively improved, often with heavy AI assistance.

Second, there is improvement of the broader AI research and engineering process. Here, models help design better architectures, tune hyperparameters, automate experiments, and analyse results. The AI is not rewriting its own weights on the fly but is heavily used by human researchers to run more experiments faster, test more ideas, and push the frontier models forward. In effect, the research pipeline that generates new models is being partially automated by prior models, shortening cycle times.

Third, there is the more classical vision of model-internal self-modification: a system that can inspect, reason about, and deliberately rewrite its own internal structure. In the current deep learning paradigm, this would require some combination of advanced mechanistic interpretability and internal training or optimisation loops guided by the model itself. This is the least empirically grounded category today; there are not yet widely documented systems that autonomously edit their own weights in a stable, predictable way in production, without external training pipelines.

Anthropic's published analysis emphasises that the world is beginning to see concrete progress in the first two forms of RSI, while the third remains more speculative but increasingly relevant. The metric that more than four-fifths of merged code comes from Claude is directly relevant to the first two types: scaffolding-level improvement and research-process acceleration. It is not yet full-blown self-modifying AI, but it clearly moves along the continuum from "AI as a tool" to "AI as a primary agent in its own development ecosystem".

What does it mean for AI to "build itself"?

In its report "When AI builds itself", Anthropic defines a future regime in which AI systems can design, implement, and train successor models with minimal human involvement. That scenario includes choosing research directions, generating experimental configurations, running training runs, monitoring results, and iteratively refining architectures, all mediated by models rather than individual researchers. The report stresses that current systems have not yet reached this stage, but the pattern of automation suggests a trajectory that could plausibly converge towards it in the medium term.

Already, tools like Claude Code enable models to handle much of the mundane engineering needed to integrate new components, instrument experiments, and manage evaluation pipelines. For example, a model can generate scripts to launch training runs, write configuration files for different hyperparameter sweeps, produce dashboards for monitoring metrics, and adapt code to new hardware or inference setups. Engineers remain in the loop to approve designs, interpret anomalies, and adjust objectives, but they increasingly operate at the level of specifying desired behaviours and constraints rather than manually wiring every detail.

Once the majority of the code surrounding the training and deployment pipeline is generated by models, the human role shifts to defining goals, setting safety criteria, and analysing higher-level trade-offs. The mechanics of "building" (in the sense of constructing new experimental setups, converting research ideas into running code, and instrumenting systems) become heavily AI-mediated. Over time, if models learn from this process (for instance by analysing successful and failed experiments), they can become better at designing and conducting AI research itself.

Strategic and technological tensions

The shift towards AI-written code simultaneously advances capability and heightens safety concerns. On the one hand, organisations that can mobilise models as large-scale coding engines enjoy dramatic efficiency gains. Anthropic and other labs report that a single engineer working with AI can now accomplish several times the output of a solo developer from only a few years ago. Internal numbers cited in commentary around the Anthropic report suggest that in some workflows, one engineer paired with advanced coding models can match the productivity of many engineers without such tools. This is economically attractive and strategically hard to ignore, especially in competitive markets where speed and feature velocity matter.

On the other hand, every additional layer of automation in the AI development pipeline reduces the surface area where humans directly engage with the details of what is being built. If most of the code diff is AI-authored, there is a constant pressure to keep review lightweight enough not to erase the productivity gains. Organisations must decide how much friction to reintroduce via testing, code review, and formal verification to compensate for the opacity and potential brittleness of model-generated software.

There is also a tension between transparency and performance. Coding models are trained on large corpora and fine-tuned for usefulness, but their internal reasoning is not inherently interpretable. When such models are tasked with writing critical infrastructure, especially infrastructure that itself trains or deploys models, the demand for rigorous verification increases. Yet the whole point of using AI at scale is to compress the development cycle; fully auditing every AI-generated line is often infeasible. This pushes teams towards probabilistic assurance: relying on automated tests, static analysis, and spot checks, accepting that some defects or misalignments may slip through.

Anthropic's policy stance reflects this tension. The organisation has publicly advocated for a potential future pause or slowdown in frontier AI development if such a pause can be coordinated and verifiable. At the same time, it continues to deploy tools that significantly accelerate the AI engineering process. The argument is not that acceleration ought to stop now, but that the world should build governance and monitoring infrastructure capable of making a pause credible if systems begin to show signs of more autonomous, less controllable forms of self-improvement.

Debates and objections

There are several lines of scepticism about treating AI-written code as a near-term marker of recursive self-improvement. One objection is that a model generating code on command is still deeply dependent on a human-constructed training pipeline and hardware stack. The AI may write most of the repository, but it does not yet select its own training data, modify its own loss functions, or commission new datacentres. From this perspective, calling such behaviour "self-improvement" risks overstating the level of autonomy.

Another objection focuses on quality. Critics argue that high percentages of AI-written code may reflect a bias towards quantity over robustness. If models can quickly generate large volumes of superficially plausible code, teams may be tempted to merge more, trusting tests and users to uncover issues. This could increase technical debt and vulnerability surfaces, particularly if AI-generated code uses patterns that are less idiomatic or less well understood by the team. In this view, the headline figure of more than four-fifths AI-authored code says more about internal incentives and tooling than about genuine leaps in capability.

A further concern is that the narrative of "AI writing its own code" might be leveraged for competitive signalling or regulatory positioning. Emphasising that models are rapidly approaching self-building status can support calls for stricter regulation, but it can also serve as a way to demonstrate leadership and sophistication in the race for funding and talent. Observers therefore scrutinise such claims, asking how the metric is defined (for example, how attribution between human and AI edits is measured) and what kinds of code are included: core model logic, surrounding infrastructure, or peripheral tools.

Supporters of the stronger interpretation respond that the exact percentage is less important than the direction of travel and the kinds of tasks being automated. The movement from "AI can write helper scripts" to "AI can build and maintain major production systems" represents a qualitative shift. Moreover, as AI-generated code begins to include experiment orchestration, data processing pipelines, and evaluation harnesses, the model's role in improving subsequent models increases, even if human oversight remains substantial. From this vantage point, the concern is not that current systems are already self-improving in the strongest sense, but that they are laying the groundwork for a regime in which incremental capability increases lead to disproportionate gains in further capability development.

Why it matters beyond software engineering

The implications of AI writing most of the code in a frontier lab extend well beyond the internal life of software teams. One major dimension is economic. If an AI-augmented engineer can do the work of several traditional engineers, the effective labour cost of software development drops sharply. Over a horizon of a few years, this could reshape labour markets, favouring organisations that can most effectively integrate AI into workflows. Entire categories of skilled work (software engineering, research assistance, data analysis, legal drafting) could be automated at a pace that leaves limited time for institutions to adapt.

Another dimension is geopolitical. Access to models capable of acting as high-bandwidth coding engines becomes a strategic asset. States or firms that control such systems can upgrade their digital infrastructure, defence systems, and research capabilities faster than competitors. If recursive self-improvement processes take hold, the gap between leading actors and followers could widen rapidly. This is one reason why some analysts emphasise the risks of concentration of power: if a small number of organisations own the most capable self-improving AI systems, they may acquire outsized influence over economic and political developments.

There is also a safety dimension that goes beyond the immediate risk of buggy code. As AI systems participate more in their own development, misalignments in objectives or reward signals can be compounded. If an AI is tasked with optimising for performance on certain benchmarks, and it also plays a role in designing the evaluation apparatus and experimental setups, it might inadvertently favour changes that make it look better on metrics without improving, or even while degrading, its broader alignment with human values. The more of the research loop is automated, the more important it becomes to design robust, hard-to-game objectives and interpretability tools.

Finally, there is an epistemic dimension. When AI systems write most of the code, run most of the experiments, and summarise most of the results, human understanding of complex software and research landscapes can become indirect. Engineers and scientists may interact primarily with AI-generated abstractions of what is going on. This can be efficient, but it also risks a kind of institutional deskilling: fewer people understand systems end-to-end, making it harder to detect systemic errors, correlated failures, or unanticipated interactions. In high-stakes domains, that loss of deep understanding could itself become a safety hazard.

The emerging role of human engineers

In the near term, the rise of models as dominant code authors does not eliminate the need for human engineers; it changes their role. Reports from practitioners using Claude Code suggest that humans increasingly focus on problem decomposition, specification, and verification. They spend more time writing detailed natural language descriptions of desired behaviour, orchestrating multi-step workflows, and designing tests that capture subtle requirements. They also become stewards of code quality and maintainers of conceptual coherence across rapidly evolving codebases.

This role shift is non-trivial. Writing good prompts or instructions is a skill; designing prompts that anticipate edge cases, security concerns, and performance constraints is even more demanding. Similarly, effective verification under conditions of AI-generated abundance requires new practices: stronger automated test suites, better monitoring, and perhaps new forms of formal methods that are integrated into everyday workflows. Human engineers who adapt to these demands may become more like system architects and editors, curating and refining the work of a powerful but sometimes unreliable assistant.

At the same time, there will likely remain pockets of development where human-written code is preferred or required, especially for safety-critical components, low-level systems programming, or domains where subtle domain knowledge is hard to transmit through prompts alone. The distribution of human effort across a codebase will change: less time on boilerplate and repetitive patterns, more on rare but consequential decision points.

Looking ahead

The internal finding that an AI system now authors the majority of a leading lab's merged code should be understood as a waypoint, not an endpoint. It marks a concrete, measurable point on a curve that leads from basic assistance to deeper forms of recursive self-improvement. The same dynamics that allow models to dominate code authoring (scaling, better scaffolding, agentic tools, and integration into research workflows) are also those that will shape how quickly AI systems begin to design and build their successors with decreasing human input.

Whether this trajectory culminates in controllable, beneficial systems or in hard-to-govern, rapidly self-improving agents will depend on decisions being made now: how much autonomy to grant coding models, what review standards to enforce, how to design incentives for safety rather than pure speed, and what international coordination mechanisms to build in anticipation of more powerful RSI. As the proportion of AI-written code grows, so too does the responsibility to align not just the models, but the socio-technical systems that surround them.

‌

Strategy Tool: Rethinking SWOT analysis in the context of AI

AI-SWOT reframes classic SWOT for the AI era by treating AI as a strategic amplifier and mitigator, not just another technology. It shows how AI can dramatically amplify existing strengths and opportunities (through scale, speed, data flywheels and new business models) while mitigating key weaknesses and threats (by closing capacity gaps, enhancing risk detection and building early-warning systems). The tool introduces a structured, workshop-ready process that walks leaders through: (1) identifying where AI can turn genuine strengths into durable moats, (2) using AI to unlock or accelerate external opportunities, (3) targeting AI at the specific weaknesses that drive competitive loss, (4) deploying AI to detect and neutralise emerging threats, especially in the WT quadrant, and (5) recognising AI itself as a new category of threat via competitor amplification and low-barrier new entrants. Packed with contemporary case studies (Nike, Amazon, Netflix, Klarna, JPMorgan, Siemens, boutiques vs. global firms), diagnostic questions, and stepped tasks, AI-SWOT gives executives a practical, evidence-based way to convert AI from a generic initiative into a focused, advantage-creating strategy tool.

‌

Term: Lean manufacturing

"Lean manufacturing is a production methodology that maximises productivity while systematically minimising waste. The core philosophy is to eliminate any step or resource that does not add value to the end customer, ultimately delivering higher quality products at a lower cost and in less time." - Lean manufacturing

Pressure to deliver higher quality at lower cost in shorter lead times has forced production systems to confront a fundamental constraint: every extra handoff, queue, batch, and defect consumes scarce capital, time, and human attention that could be redeployed to value-creating work instead. The practical challenge is to design operations so that resources follow customer value, not historical habits or departmental silos.

From traditional mass production to Lean thinking

Conventional mass production systems typically optimise for equipment utilisation and large batches, relying on forecasts to justify high inventory levels and long campaigns on each machine. This can mask deep inefficiencies: products sitting in warehouses, operators waiting for upstream processes, and entire batches scrapped due to a single defect discovered late in the sequence. By contrast, Lean reorganises the same resources around responsiveness and waste reduction, often revealing that much of the apparent "efficiency" of mass production comes from pushing hidden costs downstream.

Historically, this shift was crystallised in the Toyota Production System, which combined just-in-time supply, rapid problem detection, and worker-led improvement to meet diverse demand with limited capital after the Second World War. Over time this approach was abstracted into a general management system applied not only in automotive plants but also in electronics, pharmaceuticals, logistics, and even healthcare. The central practical implication is that processes are redesigned so that only customer-valued work survives and everything else is questioned.

Waste as the central diagnostic lens

The mechanism that links everyday operations to strategic performance is the disciplined identification and removal of waste, broadly defined as any activity consuming resources without changing the product in a way the customer would pay for. Classic Lean practice categorises waste into recurring patterns such as overproduction, waiting, unnecessary transportation, excess inventory, overprocessing, defects, and underutilised human skills. Each of these patterns translates directly into slower response, higher cost, and reduced quality.

For example, producing far ahead of demand inflates inventory and ties up working capital, yet does nothing to improve the customer experience if specifications or preferences change in the meantime. Similarly, complex approval layers or redundant inspections can create overprocessing, where work is done repeatedly to compensate for unstable upstream processes rather than stabilising those processes in the first place. By repeatedly asking whether a given step adds value from the customer's perspective, Lean teams progressively strip away these non-essentials.

The five core principles and their operational meaning

Various authors distil Lean into five interlocking principles: value, value stream, flow, pull, and perfection. These are less an abstract philosophy than a practical roadmap for redesigning production.

1. Value as defined by the customer

Value is specified in terms of the customer's needs, not the producer's convenience. This includes the product's features and performance, but also delivery reliability, lead time, and total cost of ownership. When organisations misjudge value, they often invest heavily in features or internal metrics (such as machine utilisation) that the customer neither notices nor rewards, while neglecting speed, consistency, or service.

In practice, value clarification requires structured dialogue with customers, analysis of complaints and returns, and often cross-functional teams responsible for a product across its lifecycle. Once value is properly defined, it becomes the reference for deciding which process steps are essential and which are candidates for elimination or redesign.

2. Mapping the value stream

The value stream comprises all actions required to bring a product from concept to launch and from raw material to finished good in the customer's hands. Value stream mapping makes these flows visible, quantifying process times, waiting times, inventories, and information flows so that waste becomes explicit.

Teams often discover that only a small fraction of end-to-end lead time is spent in true value-adding work, with the remainder trapped in queues, approvals, and rework. This diagnosis leads to targeted interventions: removing redundant inspections, simplifying routings, co-locating dependent operations, or redesigning products to reduce variation and setup complexity.
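As a rough illustration of that diagnosis, the share of lead time spent on value-adding work can be computed directly from a value stream map. The sketch below is a minimal example under stated assumptions: the step names and times are invented for illustration, and `Step` and `value_added_ratio` are hypothetical helper names rather than part of any standard Lean toolkit.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One step in a hypothetical value stream map (times in minutes)."""
    name: str
    process_time: float  # time spent actually working on the unit
    wait_time: float     # queue, approval or rework delay after the step


def value_added_ratio(steps):
    """Return value-adding time as a fraction of end-to-end lead time."""
    value_added = sum(s.process_time for s in steps)
    lead_time = sum(s.process_time + s.wait_time for s in steps)
    return value_added / lead_time


# Illustrative numbers only (assumed, not measured in any real plant).
stream = [
    Step("cutting", process_time=5, wait_time=240),
    Step("welding", process_time=12, wait_time=480),
    Step("painting", process_time=8, wait_time=720),
    Step("inspection", process_time=3, wait_time=60),
]

ratio = value_added_ratio(stream)
print(f"Value-adding share of lead time: {ratio:.1%}")
```

With these assumed figures, 28 minutes of work sit inside 1,528 minutes of lead time, so under 2% of elapsed time adds value. Sorting the steps by `wait_time` would then point to the queues worth attacking first.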

3. Creating continuous flow

Flow aims to ensure that once work starts on a unit, it moves without interruption through successive value-creating steps. Instead of large batches moving sporadically between functional departments, Lean systems favour smaller lot sizes, balanced work content, and cell layouts that physically bring sequential tasks closer together.

When flow improves, several effects follow: lead times shrink, defects are detected earlier, inventory falls, and planning becomes simpler because work-in-progress is more predictable. Achieving this state often requires technical interventions, such as reducing changeover times using Single-Minute Exchange of Die (SMED), introducing standard work to stabilise cycle time, and redesigning equipment layouts to minimise transport and handling.

4. Pull-based production

Pull systems authorise production based on actual downstream consumption rather than forecasted demand, thereby aligning output with real customer needs. Techniques such as Kanban employ visual signals (cards, bins, electronic triggers) to initiate replenishment only when a defined quantity has been used.

This approach directly attacks overproduction and excess inventory, which are often the largest sources of waste in traditional plants. However, pull relies on underlying stability: reliable machines, disciplined standard work, and responsive suppliers are prerequisites for responding quickly to consumption signals without resorting to large safety stocks.
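The pull logic can be sketched as a toy simulation of a single Kanban loop. This is a deliberately simplified model, not a description of any real plant: it assumes one card per unit, replenishment within the same period, and invented demand and capacity figures. `simulate_kanban` is a hypothetical function name.

```python
def simulate_kanban(demand, cards, capacity_per_period):
    """Simulate a single kanban loop over successive periods.

    demand: units the customer asks for in each period
    cards: total kanban cards in circulation (one card per unit)
    capacity_per_period: most units the process can make in one period
    Returns a list of (period, shipped, produced, stock_at_end) tuples.
    """
    stock = cards  # start with every card attached to a finished unit
    history = []
    for period, wanted in enumerate(demand, start=1):
        shipped = min(wanted, stock)   # consumption frees up cards
        stock -= shipped
        free_cards = cards - stock     # only freed cards authorise production
        produced = min(free_cards, capacity_per_period)
        stock += produced
        history.append((period, shipped, produced, stock))
    return history


# Illustrative demand pattern (assumed): note the idle period and the spike.
history = simulate_kanban(demand=[3, 0, 5, 2, 8, 1], cards=6, capacity_per_period=4)
for period, shipped, produced, stock in history:
    print(f"period {period}: shipped={shipped} produced={produced} stock={stock}")
```

In this sketch nothing is produced in the period with no consumption, and stock never exceeds the six cards in circulation. The demand spike in period 5 is only partly served, which illustrates why pull depends on adequate capacity and stable processes rather than on large safety stocks.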

5. Pursuing perfection through continuous improvement

Perfection in this context means an ever-closer alignment between processes and customer value, with fewer steps, shorter times, and lower cost. Because markets, technologies, and product portfolios evolve, Lean treats improvement as ongoing work rather than a one-off project, embedding structured problem-solving (often under the label of Kaizen) into daily operations.

Empowering operators to stop a process when abnormalities occur, supported by visual controls and root cause analysis, shifts focus from firefighting symptoms to eliminating underlying causes. Over years, this accumulation of small changes can transform cost structures and quality levels more effectively than sporadic capital-intensive upgrades.

Lean and the quantitative view of production performance

While Lean is frequently presented qualitatively, its impact can be expressed using simple performance relationships. Consider a production line where throughput $R$ depends on effective operating time $T_{\text{op}}$ and cycle time per unit $t_c$ via $R = T_{\text{op}} / t_c$. Reducing changeover losses, unplanned downtime, and rework increases $T_{\text{op}}$, while standard work, layout improvements, and defect prevention can reduce $t_c$; Lean attacks both sides of this relationship through waste elimination.

Inventory dynamics can also be framed mathematically. If average work-in-progress is $W$, throughput is $R$, and average lead time is $L$, then Little's Law gives $W = R \cdot L$. Lean interventions that smooth flow and reduce waiting lower $L$; if throughput is maintained, work-in-progress must fall accordingly, releasing space and working capital. Pull systems in particular are designed to cap $W$ by limiting the number of Kanban signals in circulation.

Quality improvements can be connected to cost by considering the defect rate $d$ (defects per period) and cost per defect $c_d$. The expected cost of defects per period is $d \cdot c_d$. By tackling root causes of defects, Lean reduces $d$ and often $c_d$ as well, because problems are caught earlier when rework is cheaper. These simple relationships make it possible to quantify the economic contribution of Lean projects and prioritise efforts.
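These relationships can be checked with a few lines of arithmetic. The sketch below assumes that throughput equals effective operating time divided by cycle time, uses Little's Law in its standard form (work-in-progress equals throughput times lead time), and treats the defect rate as defects per period. All figures are invented for illustration and the function names are hypothetical.

```python
def throughput(effective_operating_time, cycle_time):
    """Units per shift = effective operating time / cycle time per unit."""
    return effective_operating_time / cycle_time


def work_in_progress(throughput_rate, lead_time):
    """Little's Law: average WIP = throughput x average lead time."""
    return throughput_rate * lead_time


def defect_cost(defects_per_period, cost_per_defect):
    """Expected defect cost per period = defects per period x cost per defect."""
    return defects_per_period * cost_per_defect


# Assumed figures for one shift, before and after improvement (times in minutes).
rate_before = throughput(effective_operating_time=400, cycle_time=2.0)
rate_after = throughput(effective_operating_time=420, cycle_time=1.75)

# Lead time in shifts; throughput held at the improved rate in both cases.
wip_before = work_in_progress(rate_after, lead_time=3.0)
wip_after = work_in_progress(rate_after, lead_time=1.5)

# Fewer defects, each cheaper because it is caught earlier (assumed values).
cost_before = defect_cost(defects_per_period=12, cost_per_defect=50)
cost_after = defect_cost(defects_per_period=5, cost_per_defect=30)

print(f"throughput: {rate_before:.0f} -> {rate_after:.0f} units per shift")
print(f"work-in-progress: {wip_before:.0f} -> {wip_after:.0f} units")
print(f"defect cost: {cost_before} -> {cost_after} per period")
```

Under these assumptions, halving lead time at constant throughput halves work-in-progress, which is the mechanism by which smoother flow releases space and working capital.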

Key tools and practices that operationalise Lean

Beyond principles and equations, Lean is expressed in a toolkit of methods that embed waste-conscious thinking into daily operations. Value stream mapping visualises material and information flows, highlighting bottlenecks, inventories, and rework loops. 5S workplace organisation arranges tools and materials for clarity and cleanliness, reducing motion and errors while supporting safety.

Kanban systems control replenishment of components and work-in-progress via clearly defined signal limits, preventing uncontrolled build-up of inventory. Standard work defines the best-known sequence, timing, and expected outcomes for each task, providing a stable baseline from which improvements can be made. SMED techniques shorten changeovers by separating internal and external activities and simplifying tooling and fixtures, enabling smaller batches and more responsive scheduling.

These tools are often supported by digital systems, such as production monitoring, advanced planning and scheduling, and inventory management software, that provide real-time data to sustain Lean decisions. However, Lean emphasises that technology should reinforce clear processes and problem-solving discipline rather than substitute for them.

Benefits and trade-offs in practice

Well-executed Lean programmes typically report higher productivity, reduced lead times, lower inventory, and better quality. Examples include freeing up floor space as work-in-progress falls, lowering logistics costs due to more predictable flows, and achieving shorter order-to-delivery times that allow firms to win business on responsiveness. Many organisations also see improvements in safety and employee engagement because processes become more orderly and frontline ideas are actively sought.

Yet these gains come with trade-offs and risks. Aggressive inventory reduction without robust process capability can leave plants vulnerable to supply disruptions or equipment failures. Overemphasis on eliminating variation may clash with the need for flexibility in highly customised or uncertain environments. In some cases, poorly implemented Lean programmes have been criticised as cost-cutting exercises dressed in new language, leading to workforce distrust when headcount reductions are framed as "waste elimination" rather than redeployment into higher-value work.

Sustaining benefits therefore requires governance mechanisms that balance efficiency with resilience: carefully chosen safety stocks, dual sourcing for critical materials, preventive maintenance programmes, and scenario planning for demand surges or supply shocks. The strategic question is not simply how lean a system can become, but how to set waste and buffer levels compatible with the organisation's risk appetite and market position.

Lean in a broader operations and supply chain context

As supply chains globalised, Lean principles extended beyond individual factories to logistics, procurement, and distribution networks. Optimising flow now involves synchronising suppliers, contract manufacturers, and logistics providers so that materials move smoothly from source to end customer. This requires data-driven demand planning, real-time visibility of inventories, and collaborative problem-solving across organisational boundaries.

Within this extended context, Lean intersects with other methodologies. Six Sigma's statistical focus on variation reduction complements Lean's emphasis on flow, leading many firms to adopt integrated Lean Six Sigma frameworks. Agile product development, with its short iterations and customer feedback loops, echoes Lean's insistence on value and adaptation, especially in environments of high uncertainty. Digital technologies, such as sensor-equipped machinery, analytics platforms, and automated material handling, can further amplify Lean's aims when used to stabilise processes and expose waste.

Ongoing debates and why Lean still matters

Contemporary debates centre on the robustness of Lean systems in the face of external shocks and long, complex supply chains. Just-in-time practices were scrutinised during periods of global disruption when shortages of critical components halted entire production lines. Critics argued that relentless pressure to minimise inventory had removed valuable resilience. Proponents countered that the problem lay in applying Lean simplistically, without adequate risk assessment, diversification, or strategic buffers.

Another tension concerns the human dimension. Lean's success depends on engaged workers empowered to identify problems and suggest improvements, yet implementations driven solely from the top can feel like cost reduction programmes imposed on staff. Reconciling these perspectives requires transparent communication about goals, genuine investment in training, and mechanisms that ensure productivity gains translate into better work rather than just cuts.

Despite these controversies, the underlying logic remains powerful: resources are finite, customer expectations for speed and quality continue to rise, and environmental constraints make waste in all forms increasingly untenable. Organisations that systematically align processes with value, expose and remove waste, and cultivate a culture of continuous improvement are better positioned to adapt to new technologies, regulatory pressures, and market shifts.

Lean manufacturing therefore still matters not as a fixed toolkit from a particular era but as a way of structuring operational thinking around value, flow, and learning. In a world where competitive advantage is often determined by how effectively companies convert ideas, materials, and information into reliable outcomes for customers, the disciplined pursuit of waste-free processes remains a central strategic concern.

‌

Quote: "Where is AI in GDP statistics?"

"Imperfect measures of AI productive capacity are far more informative than the implicit assumption embedded in conventional projections-that the AI sector's productive capacity is small and slow-growing. Fiscal authorities could use such measures to stress test projections about the labor tax base; central banks could..." - "Where is AI in GDP statistics?" - May 2026 - Anton Korinek (PIIE) and Patrick McKelvey (Bank of Canada)

Macroeconomic policy is being steered by models that quietly embed an assumption about artificial intelligence: that the sector is economically small, its capacity expands slowly, and its contribution to the tax base and inflation dynamics will remain marginal for years to come. In parallel, AI investment, compute capacity, and quality-adjusted output have begun to grow at extraordinary rates that are largely invisible to the national accounts used by fiscal authorities and central banks. The underlying tension is not just one of measurement technique but of strategic blindness: policy frameworks calibrated to a pre-AI economy are extrapolating forward as though the production frontier itself were not being shifted by an emerging general-purpose technology.

The core issue is a widening gap between the productive capacity of the AI sector and its measured footprint in GDP statistics. When quality-adjusted AI output grows at rates that would be implausible for any traditional industry, this is not a minor statistical curiosity; it is a signal that the informational content of standard projections is degrading. Legislatures drafting medium-term budget frameworks, and central banks publishing fan charts for growth and inflation, are implicitly conditioning on a world in which the AI production function is both small in scale and smooth in its evolution. If that premise is wrong, the entire configuration of projected labour income, tax receipts, and output gaps may be systematically biased.

The factual backdrop: explosive AI output, invisible in GDP

Over the past few years, high-end estimates of AI sector activity in the United States have suggested nominal output on the order of USD 250 billion in 2025, comparable to the scheduled passenger airline industry. More striking than the level is the growth rate of quality-adjusted AI output. By treating AI as a coherent production sector and adjusting for improvements in model quality at fixed prices, Korinek and McKelvey estimate that quality-adjusted AI production expanded at more than 2 000 percent per year in 2024 and 2025. These numbers are driven by three compounding forces: rapid expansion of data-centre and compute capacity, hardware efficiency gains, and algorithmic progress that dramatically improves output per unit of hardware.
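
Growth factors from independent forces multiply rather than add, which is why three forces acting together can reach rates of this magnitude. The short Python sketch below illustrates the arithmetic. The helper `compound_growth` is hypothetical and the three rates are illustrative assumptions, not figures from Korinek and McKelvey.

```python
def compound_growth(*annual_rates: float) -> float:
    """Combine independent annual growth rates multiplicatively.

    Each rate is a fraction, so 1.0 means +100 percent per year.
    """
    factor = 1.0
    for rate in annual_rates:
        factor *= 1.0 + rate
    return factor - 1.0


# Illustrative assumptions only: compute capacity +200 percent, hardware
# efficiency +50 percent, algorithmic efficiency +300 percent in one year.
combined = compound_growth(2.0, 0.5, 3.0)
naive_sum = 2.0 + 0.5 + 3.0
print(f"compounded: {combined:.0%}, naive sum: {naive_sum:.0%}")
```

Under these assumed rates the compounded figure is 1 700 percent, against 550 percent if the three rates were simply added, which shows how quickly multiplicative forces can reach four-digit growth.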

National statistics offices, however, were not designed to track such activity. Conventional GDP accounting captures the AI boom only indirectly: as investment in structures and equipment, as intermediate inputs to other industries, and as service purchases by firms and consumers. Many of the most important gains show up as quality improvements or consumer surplus rather than observed market transactions. The result is that the data streams feeding macroeconomic models depict an economy with modest technology-driven productivity improvements, even as AI developers scale capacity in ways that historically have been associated with major general-purpose technological shifts.

This disconnect is why the authors argue for an "AI GDP" framework and satellite accounts that explicitly measure AI production and capacity. Their empirical work shows that once AI is treated as a distinct sector with its own capital stock, intermediate inputs, and quality-adjusted output, the growth dynamics look radically different from the rest of the economy. For policymakers, the lesson is not that headline GDP should be replaced, but that relying on projections which implicitly assume a small, slow-moving AI sector is no longer tenable.

Productive capacity versus realised output

The statement about "imperfect measures of AI productive capacity" turns on a crucial distinction between two concepts that macroeconomic models often conflate when technologies are stable: productive capacity and realised output. Productive capacity refers to what the AI sector could produce at current prices and technology if it were fully utilised, given existing compute stock, model architectures, and available data. Realised output is what is actually being produced and sold at a point in time, which depends on demand, regulatory constraints, infrastructure bottlenecks, and organisational readiness across the wider economy.

In conventional macroeconomics, realised output $Y_t$ is typically modelled relative to a potential output $Y_t^*$, with an output gap $(Y_t - Y_t^*)/Y_t^*$. For most sectors, capacity grows relatively smoothly, and potential output is estimated using trend filters or production functions with modest capital-deepening and productivity terms. The implicit assumption in many forecasting frameworks is that the AI sector contributes only a small increment to aggregate $Y_t^*$, so that treating capacity as a smooth extrapolation of past trends is adequate.

Once AI capacity begins to grow at rates exceeding 2 000 percent in quality-adjusted terms, that assumption breaks down. Even if only a fraction of that capacity is deployed into new products, automation tools, and complementary capital, the path of potential output could deviate markedly from trend. A production function that includes AI capital alongside traditional physical and human capital may need to be written as something like $Y_t = A_t F(K_t, H_t, K^{AI}_t)$, where AI capital $K^{AI}_t$ is growing at extraordinary rates and total factor productivity $A_t$ itself partly reflects AI-driven spillovers. Ignoring this term or extrapolating it linearly is no longer a neutral simplification.

This is why even imperfect estimates of AI capacity can be more informative than implicitly assuming capacity is trivial. An imperfect measure at least anchors projections to a dynamic that recognises the scale and direction of change. In contrast, a baseline that effectively sets AI capital $K^{AI}_t = 0$ or grows it as a modest share of aggregate capital builds in a structural misrepresentation of the economy's production frontier.
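
A minimal sketch of the production-function argument, under stated assumptions, is given below. The Cobb-Douglas form, the exponents, and every input value are illustrative choices made here, not estimates from the paper; `output` and `output_gap` are hypothetical helper names.

```python
def output(tfp, capital, human_capital, ai_capital,
           alpha=0.30, beta=0.65, gamma=0.05):
    """Assumed Cobb-Douglas form: Y = A * K^alpha * H^beta * K_AI^gamma."""
    return tfp * capital**alpha * human_capital**beta * ai_capital**gamma


def output_gap(actual, potential):
    """Output gap as a fraction of potential output."""
    return (actual - potential) / potential


base = output(1.0, 100.0, 100.0, 1.0)

# Baseline that ignores AI: ordinary inputs grow, AI capital is held fixed.
trend_only = output(1.0, 102.0, 101.0, 1.0)

# Same year, but AI capital rises by an assumed 2 000 percent (a factor of 21).
with_ai = output(1.0, 102.0, 101.0, 21.0)

growth_trend = trend_only / base - 1.0
growth_ai = with_ai / base - 1.0

# If the economy actually sits on the AI-inclusive frontier, a forecaster
# using the trend-only estimate would report a positive gap that is spurious.
spurious_gap = output_gap(with_ai, trend_only)
print(f"{growth_trend:.1%} vs {growth_ai:.1%}, spurious gap {spurious_gap:.1%}")
```

Even with a small assumed exponent on AI capital, the two potential-output paths diverge, and the difference shows up as an apparent output gap when the AI term is left out.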

From measurement gap to policy gap

If official statistics understate the growth of AI productive capacity, a policy gap follows. Fiscal and monetary authorities are tasked with stabilising the economy, financing public goods, and safeguarding financial stability in the face of shocks. Their tools and frameworks are calibrated around relationships between output, employment, inflation, and asset prices that assume gradual technological progress. When a technology arrives that can simultaneously automate cognitive tasks, create new service categories, and compress the time needed to design and deploy software, those relationships become unstable.

One channel is aggregate supply. Suppose AI diffusion accelerates between 2026 and 2030, with AI-enhanced processes raising effective labour productivity in certain sectors by large multiples. If models underestimate the expansion of productive capacity, central banks may misinterpret disinflationary pressures as evidence of weak demand rather than a positive supply shock, potentially leading to policy that is too accommodative or too tight depending on the sign of the misreading. A parallel risk exists on the fiscal side: if projected tax bases are derived from historical elasticities of labour income to GDP, they may fail to account for a shift in value creation from wages to AI-mediated capital income.

Financial stability is another concern. Massive investment in data centres, high-end chips, and AI-native firms is expanding the AI capital stock in ways that could resemble past investment booms. Without explicit measures of sectoral productive capacity and utilisation, regulators may struggle to gauge whether valuations reflect reasonable expectations of future cash flows or a speculative overshoot. Imperfect but transparent measures of AI capacity would allow stress tests to incorporate scenarios in which utilisation stalls, regulatory constraints bite, or technical progress slows, affecting both earnings and collateral values.

Stress testing the labour tax base

The quote points explicitly to one of the most immediate fiscal applications: stress testing projections for the labour tax base. Tax systems in advanced economies rely heavily on taxes on labour and consumption, with labour often providing between 40 and 60 percent of total revenue when payroll and personal income taxes are combined. If AI capacity enables rapid automation of tasks, especially in high-wage professions, the composition of tax bases could shift towards capital income and rents linked to data, intellectual property, and platform control.

Imperfect measures of AI capacity can inform scenario analysis even before comprehensive AI satellite accounts exist. Consider a simple mapping from AI capacity to potential labour displacement: if AI-driven tools can, at full deployment, perform a fraction $\theta$ of tasks currently performed by workers in certain occupations, and if the effective AI capacity index $C_t$ is growing at an exponential rate, then plausible stress scenarios can be constructed around the trajectories of $C_t$ relative to current labour inputs. Fiscal authorities can then simulate paths in which the labour share of income declines by, say, 5 to 15 percentage points over one or two decades, and examine the consequences for personal income tax and social insurance contributions.

Such stress tests do not require precise predictions about which jobs will be automated in which year. They require a disciplined way of linking the growth of AI capacity to bounding ranges of labour income outcomes. Even if the underlying AI capacity index is built from noisy proxies (data-centre investment, GPU shipments, estimated algorithmic efficiency gains, and model deployment metrics), its imperfections are transparent and can be bracketed with sensitivity analysis. That is more informative than assuming, as many baseline projections still do, that labour's share of income and the elasticity of taxable wages to GDP will remain approximately constant.
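
The kind of stress test described in this section can be sketched in a few lines of Python. Everything in the example is an assumption made for illustration: the starting labour share, the effective tax rate, the GDP growth rate, and the helper names `labour_tax_revenue` and `stress_test`. It compares final-year labour tax revenue across assumed declines in the labour share.

```python
def labour_tax_revenue(gdp, labour_share, effective_rate):
    """Revenue raised from labour income at a flat effective rate."""
    return gdp * labour_share * effective_rate


def stress_test(gdp0, gdp_growth, share0, effective_rate, years, declines_pp):
    """Final-year labour tax revenue for each assumed labour-share decline.

    declines_pp lists declines in percentage points (e.g. 5 means the
    labour share falls from share0 to share0 - 0.05 by the final year).
    """
    final_gdp = gdp0 * (1.0 + gdp_growth) ** years
    results = {}
    for decline in declines_pp:
        final_share = share0 - decline / 100.0
        results[decline] = labour_tax_revenue(final_gdp, final_share, effective_rate)
    return results


# Hypothetical economy: GDP indexed to 100, 2 percent annual growth,
# 60 percent labour share, 30 percent effective labour tax rate, 20 years.
scenarios = stress_test(100.0, 0.02, 0.60, 0.30, 20, [0, 5, 10, 15])
for decline, revenue in scenarios.items():
    print(f"labour share -{decline} pp: revenue {revenue:.1f}")
```

The sketch deliberately holds GDP growth fixed across scenarios. A fuller exercise would let growth rise with AI deployment, which is exactly the kind of sensitivity the section argues should be made explicit.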

Central banks and AI-adjusted output gaps

Central banks face a different but related challenge. Standard New Keynesian frameworks rely on estimates of potential output and output gaps to guide interest rate policy. When AI capacity increases rapidly, the shape of potential output becomes more uncertain. If AI raises trend productivity growth, then what appears as cyclical weakness might actually be a benign reflection of the economy adjusting to a higher productivity path. Conversely, if AI-driven sectoral shifts create pockets of structural unemployment, traditional Phillips curve relationships between slack and inflation may weaken.

Incorporating AI capacity measures into monetary policy models could take several forms. One is to extend production functions to include AI capital explicitly, with separate utilisation rates for that capital. Another is to augment the information set used for estimating potential output with AI-specific indicators, treating them as leading signals of future supply shifts. Even a rudimentary AI capacity index, constructed from investment, compute, and benchmark performance measures, could help central banks distinguish between inflation dynamics driven by demand fluctuations and those driven by AI-enabled supply changes.
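
One simple way to build a rudimentary index of this kind is a geometric mean of component ratios against a base period. This is an assumption made here for illustration, not the construction used by Korinek and McKelvey. The component names, the values, and the function `capacity_index` are all hypothetical.

```python
import math


def capacity_index(current, base):
    """Geometric mean of each component's ratio to its base-period value.

    Both arguments map component names to positive numbers; the index
    equals 1.0 in the base period by construction.
    """
    ratios = [current[name] / base[name] for name in base]
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical component readings (arbitrary units).
base_year = {"investment": 50.0, "compute": 10.0, "benchmark_score": 40.0}
this_year = {"investment": 100.0, "compute": 40.0, "benchmark_score": 60.0}

index = capacity_index(this_year, base_year)
print(f"AI capacity index relative to base year: {index:.2f}")
```

A geometric mean is used so that no single fast-growing component dominates the index in the way it would in an arithmetic average. Equal weighting is a further simplification that a real index would need to justify.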

This matters for interest rate paths and communication strategies. If AI capacity is expected to unleash significant deflationary pressure in certain sectors while boosting demand for complementary skills and capital elsewhere, central banks must decide how to respond to a more uneven and possibly more volatile pattern of relative price changes. Failing to recognise AI as a material driver of potential output and productivity risks miscalibrating both policy stance and forward guidance.

The strategic tension: ignorance versus imperfect information

The phrase "imperfect measures" acknowledges that any attempt to quantify AI productive capacity at this stage will be fraught with conceptual difficulties. Where exactly should the boundary of the AI sector be drawn: only foundation model developers, or also downstream firms building domain-specific applications? How should quality be adjusted when models differ along dimensions that are difficult to aggregate? How should non-market outputs, such as open-source models and freely available tools, be treated?

Yet the alternative is not a world of perfect accuracy; it is a world of structurally embedded ignorance. When conventional projections assume that AI capacity is small and slow-growing, they effectively fix technology parameters that may in fact be changing rapidly. The strategic choice is between embracing a noisy, revisable set of AI-specific metrics or relying on models that treat a potentially transformative technology as a footnote. Korinek and McKelvey argue that the former is superior precisely because it allows policymaking to be conditioned on explicit assumptions that can be scrutinised, updated, and stress-tested.

This is analogous to the evolution of macro-financial surveillance after the global financial crisis. Before 2008, many macro models either omitted financial frictions or represented them in highly stylised ways, effectively assuming that the financial sector's capacity to generate credit and risk was constrained and well-behaved. Post-crisis, central banks and international institutions built macro-prudential frameworks, stress testing regimes, and detailed sectoral accounts to monitor systemic risks. These tools are imperfect by design, but they are grounded in an explicit recognition that ignoring financial capacity dynamics is unacceptable. AI capacity measurement occupies a similar conceptual role for the production side of the economy.

Debates and objections

There are, however, serious debates around the measurement approach and its policy uses. One line of criticism questions whether quality-adjusted AI output growth figures in the 2 000 to 2 600 percent range are economically meaningful. Sceptics argue that adjusting for model capabilities at fixed prices may overstate the contribution to welfare and productivity if users' willingness to pay does not rise in proportion to benchmark scores. They caution that capacity measures built on technical performance metrics risk becoming detached from the pace of real-world diffusion, organisational change, and complementary investment.

Another objection concerns the mapping from sectoral AI capacity to aggregate outcomes. Critics note that productive capacity in the AI sector does not automatically translate into realised productivity gains across the economy. Bottlenecks in regulation, trust, data access, and skills could delay deployment for years. From this perspective, the danger is not that conventional projections underestimate AI's impact but that they might overreact to capacity signals that are only slowly realised in output and employment.

These critiques underscore the need to treat AI capacity measures as inputs to scenario analysis rather than as point forecasts. Imperfect measures can still be used to generate bounded scenarios: a low-deployment path in which only a small share of capacity is applied to economically significant tasks, a central path with gradual diffusion, and high-deployment paths in which adoption accelerates non-linearly. Fiscal and monetary authorities can then design policies that are robust across these scenarios rather than optimised for a single assumed trajectory.
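
The three bounded scenarios can be sketched as deployment-share paths applied to a common capacity path. The functional forms and every parameter below are assumptions chosen for illustration: a flat share for low deployment, a linear ramp for the central path, and a logistic curve for non-linear acceleration. All function names are hypothetical.

```python
import math


def capacity(t, c0=1.0, growth=1.0):
    """Assumed AI capacity path: doubles each year when growth is 1.0."""
    return c0 * (1.0 + growth) ** t


def low_share(t, s0=0.02):
    """Low deployment: the deployed share of capacity never rises."""
    return s0


def central_share(t, s0=0.02, s_end=0.20, horizon=10):
    """Central path: the share rises linearly to s_end over the horizon."""
    return s0 + (s_end - s0) * min(t, horizon) / horizon


def high_share(t, s0=0.02, ceiling=0.60, k=0.8):
    """High deployment: logistic adoption starting at s0, saturating at ceiling."""
    return ceiling / (1.0 + ((ceiling - s0) / s0) * math.exp(-k * t))


def realised_output(t, share_fn):
    """Realised AI output as capacity times the deployed share."""
    return capacity(t) * share_fn(t)


for year in (0, 5, 10):
    row = [realised_output(year, fn) for fn in (low_share, central_share, high_share)]
    print(year, [round(value, 2) for value in row])
```

All three paths start from the same deployed share, so the spread between them at any later date gives a rough envelope against which a fiscal or monetary rule can be checked for robustness.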

Why the measurement choice matters now

The timing of this measurement agenda is not incidental. If AI capacity continues to expand at recent rates, the gap between what AI could do and what it is currently doing will grow rapidly. That capacity-realisation gap carries both upside and downside risks. On the upside, if deployment accelerates, economies could experience a wave of productivity growth that eases fiscal pressures and raises living standards. On the downside, if deployment is uneven or concentrated in ways that displace labour without adequate redistribution, the tax base could become more volatile and more reliant on capital taxation, wealth taxes, or new instruments targeted at AI-intensive firms.

Policymakers therefore face interlocking strategic questions. How should social insurance systems and tax codes be redesigned to remain solvent if labour income becomes a less reliable base? What mix of labour, consumption, and capital taxation can sustain revenue without unduly discouraging innovation? How should central banks adjust their analytical toolkits to handle economies in which potential output and sectoral composition are shaped by a rapidly evolving AI sector? None of these questions can be addressed adequately if the AI sector is treated as a black box whose size and capacity are left unspecified.

Imperfect measures of AI productive capacity offer a way out of that impasse. They allow fiscal authorities to run stress tests in which the labour tax base is eroded under different deployment scenarios, prompting early consideration of alternative revenue sources and automatic stabilisers. They enable central banks to explore how AI-driven supply shifts could affect inflation dynamics, wage bargaining, and asset prices, informing both baseline projections and tail-risk planning. And they provide a common reference point for debates about regulation, competition policy, and industrial strategy, even if the underlying figures are subject to revision.

In the longer run, the development of AI-focused satellite accounts and an "AI GDP" framework is likely to transform how we think about the structure of the economy. What begins as a set of rough capacity indicators can evolve into a more comprehensive picture of the AI value chain, from compute infrastructure and foundation models to domain-specific applications and labour-AI complementarities. The statement that imperfect measures are more informative than implicit assumptions is therefore not only a comment on current data gaps; it is a call to rebuild the informational foundations of macroeconomic policy before the AI economy grows large enough to turn today's measurement gap into tomorrow's policy failure.

‌

Quote: Kristalina Georgieva - International Monetary Fund (IMF) Managing Director

"We collectively, including the fund, did not appreciate the backlash against globalisation that came from the fact that, yes, the world economy is doing better as a whole, but many communities were hollowed out because their jobs disappeared and there was not enough attention to them. I'll tell you what I'm very keen not to see repeated is the same with artificial intelligence." - Kristalina Georgieva - International Monetary Fund (IMF) Managing Director

The central issue is not whether a new technology makes economies more productive, but whether the gains arrive faster and more visibly than the losses. When job destruction is concentrated in particular towns, sectors, and skill groups, aggregate growth can look healthy while the social fabric in affected places weakens, and that imbalance has become a defining political risk around artificial intelligence. Kristalina Georgieva, who has served as Managing Director of the International Monetary Fund since October 1, 2019 and began a second term on October 1, 2024, has made that warning from a position of institutional authority that was shaped by the IMF's experience of multiple global shocks.

The remark reflects a lesson that global institutions learned, often slowly, from the era of rapid trade integration. The world economy can be better off on paper even as specific communities lose stable work, local spending power, and a sense of economic purpose. That distinction matters because politics is rarely organised around the global average. It is organised around visible closures, wage stagnation, and the feeling that national and international leaders celebrated efficiency while leaving the costs of adjustment to be absorbed locally. Georgieva's concern is that artificial intelligence could repeat that pattern on a faster clock, with the benefits accruing to firms, capital owners, and highly adaptable workers while the disruption lands on those whose tasks are easiest to automate.

From globalisation's backlash to AI's distributional shock

The comparison with globalisation is not rhetorical flourish; it is an argument about political economy. In her interview, Georgieva said that the IMF and others had not sufficiently appreciated the backlash against globalisation because they focused on the fact that the world economy was doing better as a whole, while many communities were hollowed out when jobs disappeared. That description captures the core failure of technocratic optimism: it can measure aggregate welfare precisely while underweighting the geography of decline. A region that loses a factory, a port function, a back-office cluster, or a processing plant does not experience the economy as a statistical average. It experiences it as closure, migration, and social churn.

Artificial intelligence creates a similar tension because it is best understood as a general-purpose technology whose economic effect is broad, uneven, and delayed. The IMF has estimated that almost 40% of global employment is exposed to AI, rising to about 60% in advanced economies. Exposure does not mean every exposed job vanishes, but it does mean that a substantial share of routine cognitive work, administrative handling, analysis, and content production may be altered, compressed, or partially automated. The IMF also noted that in advanced economies roughly half of the exposed jobs may benefit from AI integration, while the other half may see lower labour demand, lower wages, or in some cases disappearance.

This is why the social question is not merely about total output. If AI raises productivity by making firms leaner and faster, the headline number can be positive even when bargaining power shifts away from labour. Goldman Sachs has argued that generative AI could raise global GDP by 7% and lift productivity growth by 1.5 percentage points over 10 years, while also exposing the equivalent of 300 million full-time jobs to automation. Those figures are not incompatible. They describe a world in which technology expands the economic pie while simultaneously changing who gets the slices and who is left waiting outside the bakery.

The IMF's warning is also a warning about timing

One reason AI is politically delicate is that its benefits may be diffused over time, while its costs are immediate and local. Productivity gains can take years to appear in national accounts because firms need to adapt workflows, train staff, redesign products, and learn how to trust new systems. By contrast, a call centre that reduces headcount, a law office that automates first-draft work, or a media business that cuts junior roles can do so quickly. The result is a familiar asymmetry: the burden of adjustment arrives before the compensation mechanisms are ready.

This timing problem helps explain why economists disagree so sharply on the size of the prize. Optimistic estimates stress economy-wide efficiency gains, new products, and the value of complementary tasks. More restrained work emphasises that only a fraction of tasks can be profitably automated once implementation costs, error rates, oversight, regulation, and customer preferences are included. Daron Acemoglu has argued that the medium-term productivity effect may be far smaller than the largest headline estimates, with a much more modest uplift in output once only economically viable uses are counted. The disagreement matters because policy should not be built on the most dramatic forecast, nor should it ignore the possibility that adoption will be slower and less comprehensive than enthusiasts predict.

Georgieva's intervention sits between those poles. She is not denying that AI can boost growth. Indeed, the IMF itself has argued that the world is on the brink of a technological revolution that could jumpstart productivity, boost global growth, and raise incomes around the world. The warning is that the distributional consequences could still be severe enough to deepen inequality and social tension if governments assume that aggregate gains will automatically trickle down. In other words, the productivity story and the social story are not rivals. They are two halves of the same policy problem.

Why global institutions are especially sensitive to this pattern

The IMF's interest in this issue is not accidental. A multilateral lender and surveillance institution sees macroeconomic stability through the lens of crises, capital flows, unemployment, and political backlash. If a major technology wave deepens inequality inside countries, it can also change fiscal politics, trade politics, and attitudes towards international cooperation. Francine Lacqua, the interviewer in the podcast series, is a Bloomberg anchor who regularly speaks with central bankers, finance ministers, and senior officials, which makes the conversation part of a broader public debate about how economic power is being reorganised.

Georgieva's own background reinforces the institutional seriousness of the warning. Since the IMF has already had to manage the economic consequences of the pandemic and other global disruptions, it has become increasingly alert to the fact that resilience cannot be treated as an abstract ideal. It must be built in advance through labour-market policy, social protection, training, competition rules, and investment in digital capacity. That is especially true because AI does not affect all countries equally. The IMF has said exposure is highest in advanced economies, while emerging markets and low-income countries face lower but still significant exposure. That means the immediate labour-market shock may be concentrated in wealthier countries, but the longer-term diffusion of AI capabilities could widen the gap between economies that can adopt, regulate, and complement the technology, and those that cannot.

What was missed during globalisation

The phrase about communities being hollowed out points to a specific historical failure. Policymakers often treated trade and integration as a sum-of-parts problem: if the nation as a whole is richer, then the policy is successful. But local economies do not adjust frictionlessly. Workers in declining industries are not instantly reallocated to new sectors. Skills are not perfectly transferable. Housing markets are sticky, family ties matter, and the social meaning of work is not captured by GDP. When those frictions are ignored, resentment accumulates and eventually seeks political expression.

That experience is directly relevant to AI because the technology may hollow out different kinds of places. Globalisation often hit manufacturing towns, logistics hubs, and regions dependent on tradable goods. AI may instead pressure administrative centres, shared-service locations, media organisations, some professional services, and entry-level white-collar pathways. The political consequence may therefore be different in detail but similar in structure: whole ladders of advancement can be shortened before replacements are fully visible. For younger workers, especially, the problem is not just displacement but the erosion of the first rung of a career ladder.

There is also a deeper ideological parallel. During the globalisation era, many advocates implicitly assumed that efficiency was self-justifying. If something was cheaper, faster, and better for consumers, the distributional pain was treated as secondary. AI could repeat that error if firms and governments measure success by adoption rates alone. But broad adoption is not the same as broad benefit. A technology can be commercially successful, strategically important, and still socially destabilising if the gains are narrowly held.

The strategic debate: productivity engine or inequality accelerator?

The strongest argument in favour of AI is that it can raise productivity in economies that have struggled with weak growth, labour shortages, and ageing populations. Goldman Sachs' estimate of a 7% lift in global GDP captures the scale of ambition that surrounds the technology, while the IMF has stressed that AI could improve incomes and support growth if it is deployed well. In sectors from healthcare to education to finance, AI systems can reduce routine workload, accelerate analysis, and improve service quality. The promise is not only cost cutting but the creation of new products and business models.

The strongest argument against complacency is that AI may amplify existing inequalities in capital, data, and skill. Firms with the best models, the most data, and the strongest distribution channels will capture disproportionate value. Workers with high complementary skills may see their productivity rise, while workers in modular, repeatable tasks face stagnation or displacement. Countries with advanced digital infrastructure may use AI to widen their advantage, while countries with weaker institutional capacity struggle to keep up. Even when the overall effect on employment is positive in the long run, the short run may still bring a wave of churn that outpaces retraining and policy response.

This is why the debate is not really about whether AI is good or bad. It is about whether societies will manage transition costs with enough seriousness. The IMF has argued that policymakers should proactively address inequality to prevent AI from further stoking social tensions. That implies practical choices: stronger safety nets, wage insurance, mobility support, lifelong learning, and public investment in digital skills. It also implies a regulatory stance that encourages adoption while checking abuses, such as excessive market concentration or labour substitution without offsetting investment in human capability.

Why the warning matters now

Georgieva's message matters because it shifts the debate from hype to governance. It is easy to celebrate a technology when its promised benefits are still theoretical. It is harder to govern it when its disruptions are already visible. The IMF chief's insistence that the world should not repeat the mistakes of globalisation is a reminder that economic success measured at the top can coexist with social fracture at the base. If AI is allowed to proceed as a private efficiency project with public consequences ignored until later, then the backlash will not be surprising; it will be predictable.

That is the practical consequence buried inside the warning. AI can make economies richer, but it can also make societies less stable if the transition is unmanaged. The policy challenge is to ensure that productivity gains are not treated as an excuse to forget the communities and career paths that bear the cost of change. If that lesson is missed again, the political response may be harsher than the technology itself.

‌

Quote: Demis Hassabis - Google DeepMind CEO

"When we look back at this time, I think we will realise that we were standing in the foothills of the singularity. It will be a profound moment for humanity." - Demis Hassabis - Google DeepMind CEO - 2026 Google I/O technology developer conference

The underlying issue is no longer whether machine intelligence will transform human affairs, but whether our political, economic and ethical systems can adapt at the same speed as the underlying technology that is now compounding year on year. The friction lies in a widening gap: frontier AI systems are moving from tools that wait for instructions to entities that can act, plan, and coordinate with minimal human supervision, while institutions, laws and norms still assume a world of slower, more legible change. When a leading AI scientist asserts that this transition marks the early stage of a new historical regime, he is naming a tension that is already visible in boardrooms, laboratories and legislatures.

From static tools to agentic systems

For several decades, AI systems were framed as narrow tools: chess engines, recommendation algorithms, translation services and search ranking models. They were powerful, but fundamentally reactive. They did not initiate projects, hold long-term goals or orchestrate complex workflows without an engineer in the loop. The recent shift to so-called "agentic" systems is qualitatively different. These models can decompose a user objective into sub-tasks, call tools such as browsers or code interpreters, write and debug software, and loop over their own outputs until a performance criterion is met. In effect, they act like junior colleagues rather than software menus.

At Google I/O, this shift was made concrete through demonstrations of AI systems that design operating systems, draft and execute multi-step research plans, and coordinate across products from search to productivity suites. One showcase involved an autonomous system that could construct a functional operating system for under USD 1,000 in compute and overhead, a task that would historically require teams of engineers working for months. The key is not that such feats are possible in principle; it is that they are rapidly becoming cheap, repeatable and integrated into mainstream platforms.

This transition matters because it changes the leverage a small group of people or organisations can exert. A single developer equipped with powerful agents can now build, test and deploy complex services that once demanded a mid-sized company. In security terms, the same leverage can enhance defensive capabilities but also lower the barrier for sophisticated cyberattacks, automated social engineering, or automated discovery of software vulnerabilities. The trajectory is towards a world where much more can be done by far fewer humans.

Why "singularity" entered the AI mainstream

The term "singularity" was originally borrowed from physics and mathematics, where it describes points such as the centre of a black hole, at which descriptive equations break down and conventional intuitions fail. In the early 1990s, computer scientist Vernor Vinge repurposed the idea for AI, suggesting that once systems exceed human cognitive capabilities and can improve themselves, the resulting feedback loop would produce change so rapid that it would be difficult to model with existing social or economic theories.

For years, such visions were largely confined to science fiction, futurist circles and a subset of AI safety researchers. Large technology companies tended to avoid the language, preferring incremental narratives about productivity and assistance. The decision by a major AI lab leader to adopt the singularity framing publicly signals a deliberate shift: it acknowledges that the slope of capability is steepening and that the transition from experimental systems to world-shaping infrastructure is well under way. It also functions as a warning that the timelines to serious disruption are short enough that preparation cannot be deferred.

Hassabis has suggested that artificial general intelligence, often defined as systems with performance roughly comparable to an expert human across a wide range of tasks, could emerge by around 2030, with uncertainty measured in only a few years. If those estimates are even approximately correct, then organisations that plan on decade-long cycles, from regulators to universities to defence ministries, face a planning problem they have rarely confronted: they must hedge against both the possibility of very rapid transformation and the possibility that the curve flattens.

The factual context: a platform company bets on autonomy

The backdrop to this language is a strategic reorientation of one of the world's largest technology companies around AI. At Google I/O 2026, Google and DeepMind unveiled an array of products and research initiatives: new frontier models, multimodal assistants integrated into search and productivity tools, autonomous coding systems, AI-augmented video generation tools, and bespoke hardware for training and serving models at scale. Rather than being siloed experiments, these systems are presented as a coherent platform spanning consumer, enterprise and developer ecosystems.

In this environment, Hassabis's statement is not an isolated philosophical remark. It sits alongside concrete decisions: allocating large capital budgets to specialised AI accelerators, restructuring products around AI agents, and articulating timelines that compress the expected arrival of broadly capable systems into the span of a single strategic planning horizon. The narrative is that humanity is entering a phase where each iteration of capability builds directly on the previous one, leading to compounding returns rather than linear gains.

In effect, the company is arguing that today's chatbots and coding assistants represent only the earliest stage of a broader transition. These are the first footholds, not the peak. As agents are networked, endowed with memory, and embedded in physical systems such as robots, vehicles and infrastructure, their actions will increasingly manifest in the material economy rather than just digital text and images. This is where concerns about labour markets, safety and governance become more immediate.

Acceleration, compounding and feedback

The strategic tension revolves around feedback loops. If AI systems can help design better versions of themselves, build more efficient hardware, discover new materials and streamline research, then progress in AI becomes entangled with progress in the rest of science and engineering. Hassabis has argued that AI may prove several times more transformative than past industrial revolutions because it targets the bottleneck that constrained earlier eras: the pace at which new ideas can be generated, tested and implemented.

Historically, improvements in productivity depended on larger workforces, more capital or incremental process optimisation. A significant share of that optimisation was done by human experts. If AI can augment or partially automate the role of these experts, the rate of innovation itself could accelerate. In economic terms, this raises the prospect that growth models based on a roughly constant rate of technological improvement could be replaced by regimes in which the effective innovation rate increases as AI improves.

For example, consider a stylised research process where the time required to complete a project is $T$. If AI tools cut $T$ by a factor of $k$, with $k > 1$, then the number of projects completed per year increases by a factor of $k$. If AI is itself improved by the outputs of these projects, then $T$ can shrink over time, leading to a feedback loop in which the pace of progress itself accelerates. In more formal endogenous growth models, AI would augment the "effective" number of researchers, increasing the term governing idea production and pushing economies onto steeper growth trajectories.
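The stylised process can be simulated in a few lines. This is only a sketch under stated assumptions: the starting project time, the speed-up factor, and especially the feedback rule (each completed project shaves a fixed fraction off the time per project) are hypothetical choices made for illustration, not estimates from any source.

```python
def simulate_research(initial_time_years, speedup, feedback, years):
    """Return the number of projects completed in each simulated year.

    initial_time_years: time per project before AI tools (hypothetical)
    speedup: one-off factor by which AI tools cut project time (> 1)
    feedback: fraction by which each completed project further cuts
              project time (an assumed rule; 0.0 disables the loop)
    """
    time_per_project = initial_time_years / speedup
    completed_per_year = []
    for _ in range(years):
        projects = 1.0 / time_per_project
        completed_per_year.append(projects)
        # Assumed feedback: this year's outputs improve next year's tools.
        time_per_project *= (1.0 - feedback) ** projects
    return completed_per_year


no_feedback = simulate_research(1.0, 2.0, 0.0, 3)    # [2.0, 2.0, 2.0]
with_feedback = simulate_research(1.0, 2.0, 0.1, 3)  # rises every year
```

Without feedback, output simply jumps once by the speed-up factor and stays flat. With feedback, each year's output exceeds the last, which is the accelerating pattern the singularity framing points to.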

In practice, such models are crude and highly uncertain, but they capture the intuition behind singularity language: beyond a certain level of capability, the interactions between AI, science and industry may generate dynamics qualitatively different from previous technological shifts. This is both the lure and the anxiety of the current moment.

Promise: scientific discovery and problem-solving

Hassabis has consistently emphasised the constructive side of this transition, particularly in science and healthcare. DeepMind's work on protein folding, through its AlphaFold system, offers an early indication of how AI can contribute to core scientific challenges. Where traditional approaches required painstaking experiments to infer the three-dimensional structure of proteins from their amino acid sequences, AI systems can now predict many such structures computationally, vastly expanding the available dataset for drug discovery and basic biology. Similar methods are being developed for materials science, climate modelling and mathematics.

As models become more capable at exploring hypothesis spaces, designing experiments and interpreting complex datasets, the hope is that they will help unlock treatments for diseases, design low-carbon materials and optimise energy systems more rapidly than human research alone could achieve. This is part of why some AI leaders argue that the net impact of advanced AI could dwarf earlier industrial transformations: it does not only automate existing tasks but also amplifies the process by which new capabilities are created.

In a world facing climate change, ageing populations and geopolitical instability, such accelerations are understandably attractive. They offer a narrative in which AI is not primarily about efficiency or consumer convenience but about expanding the frontier of what is technically possible in domains that matter directly to human survival and flourishing.

Risk: misalignment, misuse and concentration of power

The same features that make advanced AI attractive also generate serious risks. Systems capable of autonomous planning and self-improvement raise questions about alignment: ensuring that their objectives, when pursued at scale, remain compatible with human values and legal constraints. Even if one is sceptical of scenarios involving fully superhuman intelligence, there are near-term concerns about AI systems that are merely very capable and deployed widely without sufficient safeguards.

One class of risk involves misuse. Autonomous coding agents can assist in writing malware, identifying vulnerabilities, or orchestrating coordinated attacks. Large-scale language models can generate persuasive disinformation tailored to specific demographics, potentially amplifying existing social fractures. As these systems become better at modelling human psychology and adapting in real time, the cost of high-quality manipulation could fall, with implications for elections, public health campaigns and social cohesion.

Another involves structural power. If the resources required to train frontier models remain concentrated in a handful of companies and states, control over the most capable systems will be highly centralised. Those actors could, intentionally or not, shape everything from labour markets to information flows. The singularity framing draws attention to a moment where artificial systems may hold more de facto power than any single human institution can easily check, not because they are sentient or malicious, but because they are embedded in so many layers of critical infrastructure.

There is also the possibility of accidents and emergent behaviour. As models grow larger and are coupled with external tools and other agents, predicting their behaviour in novel situations becomes more difficult. Aligning such systems may require new formal methods, rigorous evaluation regimes and international norms that do not yet exist at scale. Here, the concern is less a sudden catastrophic failure and more a series of cascading incidents, such as financial flash crashes, infrastructure outages, or uncontrolled propagation of flawed code, arising from tightly coupled automated systems.

The strategic and technological tension

At the heart of current debates is a tension between speed and control. On one side, there is the argument that rapid deployment is necessary to capture economic value, to stay ahead of competitors and to make beneficial applications widely available. On the other, there is the view that racing ahead without robust safety measures, regulatory frameworks and democratic oversight is irresponsible, particularly as systems approach or exceed human-level competence across many domains.

Hassabis's public positioning seeks to occupy a middle ground. He emphasises both the proximity of general-purpose AI and the need for society to prepare within a relatively short time window. This implicitly calls for a dual strategy: accelerate the development of beneficial uses while simultaneously investing in safety research, governance structures and public engagement. The challenge is that market incentives, geopolitical rivalry and the sheer pace of technical progress make coordinated restraint difficult.

Governments are only beginning to respond with AI acts, executive orders and voluntary commitments. These instruments tend to lag frontier technical capabilities by several years. By the time a regulation is in place to address one generation of models, the next generation, with qualitatively different properties, may already be under development. This regulatory lag is familiar from other technologies but is amplified when the paradigm itself is in flux.

Debates and objections

Not all researchers or policymakers accept the singularity framing or the specific timelines associated with it. Critics raise several objections. One is empirical: past predictions of AI breakthroughs, including earlier waves of optimism in the 1960s and 1980s, were often overconfident. They argue that current systems, impressive as they are, still rely heavily on pattern recognition rather than deep understanding, struggle with long-term reasoning and lack robust grounding in the physical world.

From this perspective, equating progress in large language models and agents with an imminent singularity risks obscuring unresolved problems such as brittleness, hallucination and vulnerability to adversarial inputs. Some suggest that claims about timelines to AGI are influenced by competitive pressures and investor expectations, and that more humility is warranted. They also worry that dramatic narratives about near-term singularity could crowd out attention to mundane but urgent issues like labour displacement, privacy and market concentration.

Another objection targets the metaphor itself. The term "singularity" implies a sharp discontinuity, a moment after which extrapolating from previous trends becomes meaningless. Some economists and sociologists argue that a more accurate picture is one of uneven, domain-specific adoption. In this view, certain sectors (software, digital marketing, some scientific fields) may experience extremely rapid change, while others (construction, caregiving, public administration) move more slowly, constrained by physical, legal or cultural factors.

Accordingly, they suggest focusing less on hypothetical points of infinite change and more on concrete decisions about where and how AI is deployed, who benefits, and how costs are distributed. For them, the danger of singularity language is that it can induce either complacent fatalism ("nothing we do matters") or reckless acceleration ("we must move as fast as possible to reach the promised land"), neither of which encourages careful stewardship.

Why the framing matters now

Regardless of whether one accepts the metaphor or the timelines, the choice by a central figure in AI to characterise the current era as the beginning of a singularity has practical consequences. It signals to engineers, investors and policymakers that they should treat AI not as a marginal upgrade to existing tools, but as a transformational general-purpose technology. That shift in perception can influence everything from research priorities to education policy.

In research, the framing encourages work on foundational capabilities and long-term safety rather than solely on narrow applications. Teams may prioritise interpretability, robustness and alignment techniques in anticipation of systems whose influence extends across critical infrastructures. In industry, the expectation of accelerating capability may drive aggressive investment in AI-native products, workforce retraining and new business models that assume AI will be a core component of almost every workflow.

In public policy, acknowledging that we might be in the "foothills" of a major transformation sharpens the urgency of questions about accountability, global coordination and equitable access. If advanced AI is likely to amplify existing inequalities unless actively governed, then social choices made in the next few years about data rights, model access, liability regimes and international cooperation will have outsized effects. The metaphor thus serves as a prompt: if we are indeed at an early stage of a steep climb, the route we choose now will determine which groups bear the risks and reap the rewards.

Finally, there is a psychological dimension. Seeing one's era as a hinge point in history can be both motivating and destabilising. For researchers and entrepreneurs, it provides a sense of purpose: their work may have consequences far beyond quarterly metrics. For citizens and policymakers, it can induce anxiety about loss of control. Navigating between these reactions requires a form of collective maturity: the ability to recognise that transformative capability is emerging, to take its risks seriously without succumbing to paralysis, and to articulate positive, plural visions of futures in which powerful AI is integrated into human institutions rather than simply unleashed.

Whether or not historians ultimately agree that this period marked the true "foothills" of a singularity, the underlying reality is that AI systems are already reshaping knowledge work, scientific research and digital infrastructure. The choice now is not whether to enter this terrain, but how to do so deliberately, with as much foresight as a rapidly changing technological landscape will allow.

‌

Term: Trade marketing - FMCG / CPG

"Trade marketing in FMCG/CPG is a business-to-business (B2B) marketing strategy focused on selling products to supply chain partners, such as retailers, wholesalers, and distributors, rather than directly to the end shopper. Its primary objective is to maximize immediate sales volume by ensuring products are widely available, competitively priced for the channel, and highly visible on the retail shelf." - Trade marketing - FMCG / CPG

The struggle for share in grocery, convenience, pharmacy, and e-commerce is won or lost long before shoppers walk down the aisle. The decisive battleground is the relationship between manufacturers and their channel partners: which brands distributors push, which lines retailers list, where they sit on the shelf, how they are priced, and which promotions receive scarce in-store support. This upstream negotiation of attention, space, and effort is where trade marketing becomes economically critical for fast-moving consumer goods and consumer packaged goods.

From consumer pull to channel push

Manufacturers in everyday categories like food, beverages, personal care, and household cleaners operate in markets defined by high purchase frequency, low unit prices, and intense competition. Consumers typically decide within seconds, often on autopilot, with limited engagement or product research. In that setting, even strong brands cannot rely solely on consumer advertising and digital campaigns to secure volume. If a product is out of stock, relegated to a low-traffic shelf, obscured by competitor displays, or uncompetitive in channel pricing, consumer pull cannot convert into sales.

Trade marketing addresses this gap by focusing on the economic and operational incentives of wholesalers, distributors, and retailers themselves. The objective is to shape the assortment, placement, pricing, and promotional calendar at the point of sale so that channel partners actively prioritise one brand over alternatives. It transforms the manufacturer-retailer relationship from transactional listing to ongoing joint business planning, anchored in shared volume and margin goals.

Substantive meaning of trade marketing in FMCG / CPG

In substantive terms, trade marketing in these categories is a cluster of business-to-business practices aimed at making a product the easiest and most profitable choice for the channel. It is concerned less with brand positioning in the mind of the end consumer, and more with how the product competes for scarce shelf space, warehouse capacity, and promotional slots.

Key features include:

- Channel-first targeting. Activities are directed at distributors, wholesalers, and retailers rather than directly at shoppers. The immediate customer is the buyer at a supermarket chain, the procurement manager at a wholesale club, or the owner of a convenience store network.

- Volume orientation. Given that CPG and FMCG rely on thin per-unit margins and high turnover, profitability depends on large, repeat orders. Trade marketing tactics are therefore tuned to drive case volumes and share of shelf across outlets, not just brand awareness.

- Shelf and display optimisation. Because products are chosen quickly and often with low involvement, visibility at the point of sale materially affects demand. Securing eye-level shelf positions, secondary placements, end caps, and special displays is a central concern.

- Channel-specific pricing and promotion. Retailers need sufficient margins and promotional funding to justify prioritising a brand. Trade marketing negotiates wholesale pricing, discounts, rebates, and co-funded promotions that work economically for both sides.

- Joint execution and support. Successful programmes include training, point-of-sale materials, data sharing, and operational support to help retailers execute agreed plans effectively.

In practice, this means that much of the marketing budget in a large consumer goods company is channel-directed: trade discounts, promotional allowances, co-operative advertising, in-store activation, and the staff and systems needed to design and monitor these levers.

Core activities and mechanisms

Trade marketing in FMCG / CPG spans a wide set of operational and strategic activities, each tied to a specific mechanism for generating incremental sales.

1. Retailer relationship and joint planning

Relationship management with key accounts is foundational. Teams work with retailers to understand category strategies, margin targets, and shopper segments, then co-develop plans covering assortment, shelf layout, and promotion. This often involves:

- Annual or seasonal joint business plans specifying volume targets, investment levels, and marketing calendars.

- Category advice based on market and shopper data, positioning the manufacturer as a partner rather than just a supplier.

- Negotiation of listing fees, exclusive deals, and long-term contracts for distribution or shelf space where appropriate.

The underlying mechanism is alignment of economic incentives: the manufacturer offers value in the form of insights, investment, and reliable supply; the retailer responds with space, visibility, and featured promotion.

2. Product display, placement, and merchandising

Where and how a product appears in store strongly influences its share of purchases, especially in categories with many near-substitutable brands. Trade marketing teams design planograms, propose shelf adjacencies, and deploy merchandising resources to secure favourable locations such as eye-level shelves and aisle ends.

They also manage temporary and semi-permanent displays: floor stands, dump bins, pallet stacks, chillers, and branded fixtures. These serve dual purposes: meeting retailer objectives for revenue per square metre, and boosting spontaneous purchases on top of planned shopping lists.

3. Promotions, incentives, and trade deals

Promotional mechanics sit at the heart of trade marketing because retail partners must decide which deals to feature at any given time. Typical levers include:

- Volume-based discounts and tiered pricing for larger orders.

- Off-invoice discounts or rebates tied to sell-out performance.

- Multi-buy offers and value packs co-funded by manufacturers to encourage shoppers to buy more units at once.

- Seasonal or event-based promotions aligned to peak consumption periods.

These mechanics are designed to satisfy three constraints simultaneously: maintain adequate retailer margin, protect manufacturer profitability, and provide compelling value for shoppers. Misaligned deals may drive short-term volume at the expense of long-term price integrity or brand positioning, which is why careful planning and post-event evaluation are increasingly important.
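The three constraints can be checked with simple arithmetic before a deal is agreed. The sketch below is illustrative only: the function name, the thresholds, and the prices are hypothetical, and taxes, logistics costs, and back-margin rebates are ignored.

```python
def evaluate_deal(list_price, unit_cost, shelf_price, promo_shelf_price,
                  trade_discount, min_retailer_margin,
                  min_manufacturer_margin, min_shopper_saving):
    """Check a promotion against retailer, manufacturer and shopper tests."""
    # Price the retailer actually pays after the off-invoice discount.
    net_invoice = list_price * (1 - trade_discount)
    retailer_margin = (promo_shelf_price - net_invoice) / promo_shelf_price
    manufacturer_margin = (net_invoice - unit_cost) / net_invoice
    shopper_saving = (shelf_price - promo_shelf_price) / shelf_price
    return {
        "retailer_ok": retailer_margin >= min_retailer_margin,
        "manufacturer_ok": manufacturer_margin >= min_manufacturer_margin,
        "shopper_ok": shopper_saving >= min_shopper_saving,
    }


# Hypothetical deal: 15% off invoice, shelf price cut from 1.50 to 1.20.
balanced = evaluate_deal(1.00, 0.60, 1.50, 1.20, 0.15, 0.25, 0.25, 0.15)
# Same funding but a deeper shelf cut to 1.00 squeezes the retailer.
too_deep = evaluate_deal(1.00, 0.60, 1.50, 1.00, 0.15, 0.25, 0.25, 0.15)
```

In the second case the shopper and the manufacturer are still satisfied, but the retailer's margin falls to 15%, which is the kind of misalignment that post-event evaluation is meant to catch.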

4. Training, point-of-sale materials, and retail execution

Beyond financial incentives, FMCG suppliers often invest in retailer capability. This can include staff training on product benefits, usage occasions, and cross-selling opportunities, as well as the provision of brochures, shelf talkers, wobblers, and digital screens to communicate with shoppers.

Many companies also deploy field teams or outsourced agencies to audit compliance with agreed displays, check stock levels, and correct execution gaps. Retail execution technology helps track whether promotions and planograms are implemented as intended, closing the loop between head office agreements and in-store reality.

5. Market research, analytics, and data-driven targeting

Because FMCG categories are characterised by frequent repurchase and high volumes, point-of-sale data is rich. Trade marketing functions increasingly rely on analytics to understand which promotions drive incremental volume, which outlets under-perform, and how different channels respond to specific tactics.

Retailer loyalty data, syndicated market panels, and internal shipment records feed models that estimate baseline sales and promotional uplifts. This evidence base allows a more scientific allocation of trade spend and sharper negotiation with retailers about which programmes deliver genuine category growth versus simple brand switching.
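As a rough illustration of baseline and uplift estimation, the sketch below takes the median of non-promoted weeks as the baseline and treats anything above it during promoted weeks as incremental. The weekly figures are invented, and real models typically also adjust for seasonality, pricing, and stock-outs, none of which is attempted here.

```python
from statistics import median


def promo_uplift(weekly_units, promo_weeks):
    """Estimate baseline sales and promotional uplift for one product.

    weekly_units: unit sales per week (hypothetical data)
    promo_weeks: set of week indices in which a promotion ran
    """
    non_promo = [u for i, u in enumerate(weekly_units) if i not in promo_weeks]
    baseline = median(non_promo)
    # Units sold above baseline during the promoted weeks.
    incremental = sum(weekly_units[i] - baseline for i in promo_weeks)
    uplift_pct = incremental / (baseline * len(promo_weeks))
    return baseline, incremental, uplift_pct


weekly_units = [100, 104, 98, 180, 170, 90, 102]
baseline, incremental, uplift_pct = promo_uplift(weekly_units, {3, 4})
# baseline 100 units, 150 incremental units, 75% uplift
```

Note that the dip to 90 units in the week after the promotion hints at pantry loading. A fuller analysis would net such post-promotion dips off the incremental figure, and would need category-level data to separate category growth from brand switching.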

Simple mathematical framing of trade marketing impact

Although most decisions are commercial and strategic, it is useful to express the mechanics in a simple quantitative framework. Consider a single product in a specific retail chain. Weekly sales volume can be viewed as a function of three key factors:

$$Q = f(A, V, P)$$

where $Q$ is unit sales, $A$ represents numeric and weighted distribution (how widely and in which store formats the product is available), $V$ captures visibility and in-store activation (shelf position, number of facings, displays, promotions), and $P$ is the effective price to the shopper, net of discounts and deals.

Trade marketing interventions mainly act on $A$ and $V$. Listing a product in more stores or more branches increases $A$; securing better shelf locations and additional displays raises $V$. The combined effect can be conceptualised with elasticities:

$$\frac{\Delta Q}{Q} \approx \varepsilon_A \frac{\Delta A}{A} + \varepsilon_V \frac{\Delta V}{V} + \varepsilon_P \frac{\Delta P}{P}$$

where $\varepsilon_A$, $\varepsilon_V$, and $\varepsilon_P$ are the elasticities of sales with respect to availability, visibility, and price. In many FMCG categories, the absolute value of $\varepsilon_A$ and $\varepsilon_V$ is high, because out-of-stock or poor placement directly suppress sales. Trade marketing focuses on improving $A$ and $V$ in a cost-effective way, while coordinating with pricing teams responsible for $P$.
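To make the elasticity framing concrete, the sketch below applies the usual first-order approximation: the percentage change in sales is roughly the sum of each elasticity times the percentage change in its driver. The elasticity values are hypothetical placeholders, not category benchmarks, and the approximation only holds for small changes.

```python
def projected_sales_change(elasticities, pct_changes):
    """First-order estimate of the percentage change in unit sales."""
    return sum(elasticities[name] * pct_changes[name] for name in elasticities)


# Hypothetical elasticities for one product in one retail chain.
elasticities = {"availability": 0.9, "visibility": 0.4, "price": -1.5}

# Plan: 10% more distribution, 20% more visibility, price unchanged.
plan = {"availability": 0.10, "visibility": 0.20, "price": 0.00}

change = projected_sales_change(elasticities, plan)  # roughly +17%
```

Under these assumed values, distribution and visibility gains alone lift volume by about 17% without touching price, which is why the two levers sit at the centre of trade marketing plans.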

At a portfolio level, trade spend itself can be modelled as an input where manufacturers seek to maximise profit:

$$\max_{S} \; \Pi = \sum_i \left[ R_i(Q_i) - C_i(Q_i) \right] - S$$

Here $\Pi$ denotes profit, $Q_i$ is volume for brand $i$, $R_i$ is revenue, $C_i$ is cost, and $S$ is total trade investment. The optimisation challenge is to allocate $S$ across brands, channels, and mechanics to generate the largest incremental $\Pi$, taking into account retailer reactions and competitive responses.
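One simple way to explore this allocation problem is a greedy rule: hand out the budget in small steps, each time to the brand whose next step adds the most profit, and stop when no step pays for itself. The sketch below assumes a square-root response of incremental volume to spend, a common stand-in for diminishing returns. That curve, the brand names, and all figures are hypothetical, and retailer and competitor reactions are ignored.

```python
import math


def incremental_profit(spend, response, unit_margin):
    """Profit from trade spend under an assumed square-root response."""
    return unit_margin * response * math.sqrt(spend) - spend


def allocate(budget, step, brands):
    """Greedily allocate budget across brands in fixed steps.

    brands: maps brand name to (response, unit_margin), both hypothetical.
    """
    allocation = {name: 0.0 for name in brands}
    remaining = budget
    while remaining >= step:
        best_name, best_gain = None, 0.0
        for name, (response, margin) in brands.items():
            spent = allocation[name]
            gain = (incremental_profit(spent + step, response, margin)
                    - incremental_profit(spent, response, margin))
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            break  # no remaining step adds profit, so leave budget unspent
        allocation[best_name] += step
        remaining -= step
    return allocation


brands = {"A": (40.0, 2.0), "B": (20.0, 2.0)}
plan = allocate(3000.0, 100.0, brands)  # {"A": 1600.0, "B": 400.0}
```

With these inputs the rule spends only 2,000 of the 3,000 available, because beyond that point every extra unit of spend returns less than it costs. Greedy allocation is reliable here only because the assumed response curves are concave; less regular curves would need a proper optimiser.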

Key parameters and levers in practice

While such equations are simplifications, they highlight the main parameters trade marketers manage in real organisations:

- Distribution breadth and depth. Numeric distribution (percentage of outlets stocking the brand) and weighted distribution (share of category sales represented by those outlets) are primary levers. Gaining entry into high-volume retailers or formats such as large supermarkets yields disproportionate impact.

- Shelf share and facings. The proportion of category shelf space devoted to a brand, and the number of facings at eye level, directly influence visibility and availability under real-world conditions.

- Promotional intensity. Frequency, depth, and type of in-store promotions determine how strongly trade marketing influences trial and stock-up behaviours.

- Trade margin structure. The split of value between manufacturer and channel partners is a key determinant of retailer support.

- Compliance rates. The percentage of stores that actually implement agreed displays and promotions influences realised uplift versus planned uplift.

These parameters are increasingly monitored via digital tools, from electronic shelf labels and POS data feeds to in-store image recognition that tracks facings and compliance. This creates feedback loops that allow continuous optimisation of trade marketing plans.
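Two of these parameters are easy to compute from outlet-level data. The sketch below derives numeric and weighted distribution from a small invented outlet list, and scales a planned uplift by the compliance rate. Treating realised uplift as proportional to compliance is a simplifying assumption made here, not an established rule.

```python
def distribution_metrics(outlets):
    """Return (numeric, weighted) distribution for a brand.

    outlets: list of dicts with 'category_sales' and 'stocks_brand'
             (hypothetical field names and data).
    """
    stocking = [o for o in outlets if o["stocks_brand"]]
    # Share of outlets that stock the brand.
    numeric = len(stocking) / len(outlets)
    # Share of category sales flowing through those outlets.
    weighted = (sum(o["category_sales"] for o in stocking)
                / sum(o["category_sales"] for o in outlets))
    return numeric, weighted


def realised_uplift(planned_uplift, compliance_rate):
    """Assume uplift is only earned in stores that execute the plan."""
    return planned_uplift * compliance_rate


outlets = [
    {"name": "hypermarket", "category_sales": 500, "stocks_brand": True},
    {"name": "supermarket", "category_sales": 300, "stocks_brand": True},
    {"name": "convenience 1", "category_sales": 100, "stocks_brand": False},
    {"name": "convenience 2", "category_sales": 100, "stocks_brand": False},
]
numeric, weighted = distribution_metrics(outlets)  # 0.5 and 0.8
effective = realised_uplift(0.20, 0.6)
```

The brand is in only half of the outlets but reaches 80% of category sales, which illustrates why listings in high-volume formats have disproportionate impact. A planned 20% uplift executed in 60% of stores yields roughly 12% under the proportionality assumption.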

Major schools of thought and strategic approaches

Different companies and consultants frame trade marketing in somewhat different ways, leading to several schools of thought within FMCG and CPG organisations.

Category management-centric view

One approach embeds trade marketing within category management, emphasising that manufacturers should help retailers grow entire categories, not just their own brands. Under this paradigm, trade initiatives are evaluated on their ability to increase total category sales and shopper satisfaction, on the assumption that retailers will reward genuine category growth with stronger long-term partnerships.

This view tends to favour data-rich, insight-driven interventions: shelf re-sets based on shopper missions, rationalisation of low-velocity SKUs, and promotions that recruit new shoppers rather than trigger subsidised switching between similar products.

Promotion-driven and sales-led view

A second school of thought sees trade marketing primarily as the engine of short-term volume, closely tied to sales quotas and quarterly targets. Here, success is measured in immediate sell-in and sell-out spikes around promotional windows, with heavy reliance on price cuts, multi-buy deals, and aggressive display activity.

This approach can deliver rapid results, especially in mature categories where consumer demand is responsive to price. However, it risks eroding perceived value and conditioning shoppers to buy only on deal, as well as fostering a cycle of promotion wars among competing brands.

Customer marketing and joint value creation

A more recent perspective frames trade marketing as customer marketing, underlining that retailers are themselves customers with distinct needs and brand equities. From this standpoint, the goal is to create tailored programmes for each key account that enhance the retailer's proposition as well as the manufacturer's.

Examples include exclusive SKUs or flavours for a specific chain, co-developed digital campaigns that run in the retailer's app or website, and shared sustainability initiatives that support both corporate strategies. This view aligns with the broader trend towards collaborative planning and integrated shopper marketing across online and offline channels.

Tensions, debates, and long-running challenges

Because trade marketing sits at the intersection of marketing, sales, finance, and supply chain, several persistent tensions shape practice and strategy.

Short-term volume vs long-term brand health

One recurring debate concerns the balance between driving immediate volume through deep discounts and protecting long-term brand equity. Frequent, steep promotions can train shoppers to perceive the regular price as inflated and to delay purchase until the next deal. At the same time, retailers often push for more trade funding as a condition for space and visibility.

Resolving this tension requires close coordination between brand marketing and trade marketing teams. Decisions about promotion frequency and depth must be aligned with the brand's positioning and consumer price expectations, not driven solely by short-term sales targets.

Transparency and complexity of trade spend

Trade marketing budgets are often among the largest line items in an FMCG P&L, yet historically they have been less transparent than above-the-line advertising spend. Different types of discounts, rebates, and allowances can make it difficult to measure true net prices and returns on investment.
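
To make the net-price problem concrete, the following Python sketch stacks three common trade terms into a simple price waterfall. The `TradeTerms` fields, the order in which the terms are applied and all figures are illustrative assumptions rather than data from a real account; actual contracts may calculate rebates on a different base.

```python
from dataclasses import dataclass


@dataclass
class TradeTerms:
    """Hypothetical per-unit trade terms for one retailer account."""
    list_price: float          # list price per unit
    off_invoice_pct: float     # discount deducted on the invoice (0.10 = 10%)
    rebate_pct: float          # retrospective rebate on the invoiced price
    allowance_per_unit: float  # fixed allowances (e.g. display fees) per unit


def net_price(terms: TradeTerms) -> float:
    """Price the manufacturer actually keeps per unit after all trade terms."""
    invoiced = terms.list_price * (1 - terms.off_invoice_pct)
    after_rebate = invoiced * (1 - terms.rebate_pct)
    return after_rebate - terms.allowance_per_unit


def trade_spend_rate(terms: TradeTerms) -> float:
    """Share of the list price given away through trade terms."""
    return 1 - net_price(terms) / terms.list_price


# Illustrative numbers only.
terms = TradeTerms(list_price=2.00, off_invoice_pct=0.10,
                   rebate_pct=0.05, allowance_per_unit=0.06)
print(f"net price: {net_price(terms):.2f}")                # 1.65
print(f"trade spend rate: {trade_spend_rate(terms):.1%}")  # 17.5%
```

In this made-up case, three individually small terms add up to 17.5 percent of the list price, which is why the net price is hard to see when each term is tracked separately.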

As competition intensifies and margins remain thin in CPG industries, finance and revenue management functions are demanding more rigorous measurement of trade promotion effectiveness. This has led to growing use of trade promotion management and optimisation tools, which attempt to quantify incremental volume and profit from each activity rather than treating trade spend as a cost of doing business.
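
The incremental logic can be sketched in a few lines of Python. This is a minimal illustration rather than the method used by any particular trade promotion tool: the function name, its parameters and the figures are hypothetical, and the baseline (what would have sold without the promotion) is assumed to be known, whereas in practice it has to be estimated.

```python
def evaluate_promotion(baseline_units: float, promo_units: float,
                       regular_margin: float, promo_margin: float,
                       fixed_cost: float) -> dict:
    """Compare a promoted period with the baseline it displaced.

    Margins are per unit; fixed_cost covers display or feature fees.
    """
    incremental_units = promo_units - baseline_units
    baseline_profit = baseline_units * regular_margin
    promo_profit = promo_units * promo_margin - fixed_cost
    incremental_profit = promo_profit - baseline_profit
    # Trade spend: margin given up on every promoted unit plus fixed fees.
    trade_spend = promo_units * (regular_margin - promo_margin) + fixed_cost
    return {
        "incremental_units": incremental_units,
        "incremental_profit": incremental_profit,
        "roi": incremental_profit / trade_spend,
    }


# Illustrative numbers only.
result = evaluate_promotion(baseline_units=1000, promo_units=1800,
                            regular_margin=0.50, promo_margin=0.30,
                            fixed_cost=100)
print(result["incremental_units"])             # 800
print(round(result["incremental_profit"], 2))  # -60.0
print(round(result["roi"], 3))                 # -0.13
```

With these assumed figures, an 80 percent volume uplift still loses money once the margin given up on baseline units is counted, which is the kind of result that volume-only reporting would hide.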

Channel conflict and omnichannel complexity

The growth of e-commerce and quick-commerce has added new layers of complexity. Manufacturers must manage relationships with traditional brick-and-mortar retailers alongside online marketplaces, direct-to-consumer sites, and delivery platforms. Trade terms negotiated with one channel can influence those in others, creating the risk of channel conflict over pricing, assortment, or exclusivity.

Trade marketing in this environment must adapt from a store-centric mindset to an omnichannel one, integrating digital shelf visibility (search ranking, sponsored placements, product detail page content) with physical merchandising and in-store activation. The basic principles of availability, pricing, and visibility still apply, but the execution space is now broader and more data-intensive.

Why trade marketing in FMCG / CPG still matters

Despite shifts towards digital media, influencer partnerships, and sophisticated brand storytelling, trade marketing remains crucial in fast-moving consumer categories. Several structural reasons explain why.

First, the underlying economics of FMCG and CPG are unchanged: products are often low cost with thin margins, sold in high volumes, and replenished frequently. Profitability still hinges on efficient distribution, reliable in-stock rates, and capturing as large a share of shopper baskets as possible. Trade marketing is the managerial function designed to secure these conditions.

Second, retailers continue to wield substantial power in deciding which brands appear where and with what support. Even as direct-to-consumer models grow, most everyday purchases still flow through supermarkets, convenience stores, pharmacies, and mass merchants. Winning in these channels requires understanding and influencing retailer economics, not just consumer preferences.

Third, the rise of data, AI, and retail media has, if anything, increased the sophistication and potential impact of trade marketing rather than rendering it obsolete. Manufacturers can now target spend more precisely, test and learn across banners and regions, and quantify payback with more rigour than before. Retailers, in turn, monetise their digital properties through sponsored listings and on-site advertising, blurring boundaries between trade investment and advertising spend.

Finally, macro trends such as sustainability, responsible sourcing, and health consciousness create new arenas for manufacturer-retailer collaboration. Joint initiatives on packaging reduction, ingredient reformulation, or community programmes often rely on the same relationship infrastructure and negotiation skills built through trade marketing. The conversation may extend beyond price and promotion to include shared values and long-term differentiation for both parties.

For practitioners in FMCG and CPG, mastering trade marketing means moving fluently between commercial negotiation, shopper insight, operational execution, and analytical evaluation. It is not simply a set of discounts or a department adjacent to sales; it is the discipline through which brands convert their consumer promise into real-world presence on the shelf, in the basket, and on the bottom line.

‌

Quote: "Where is AI in GDP statistics?"

"The AI economy in the United States has been growing at an unprecedented rate, but this extraordinary growth is largely invisible in conventional GDP statistics. Treating the AI sector as a coherent economic entity yields preliminary estimates of nominal AI GDP at approximately $250 billion in 2025, growing at roughly 2 600 percent per year in quality-adjusted real terms." - "Where is AI in GDP statistics?" - May 2026 - Anton Korinek (PIIE) and Patrick McKelvey (Bank of Canada)

Economic statistics are struggling to keep pace with a technology whose productive capacity is compounding across hardware, data centres and algorithms faster than the measurement systems designed in the mid-20th century can register. The result is a widening gulf between what is happening inside the AI ecosystem and what appears in national accounts, complicating debates about productivity, inequality and policy at exactly the moment when artificial intelligence is beginning to reshape production methods and business models.

The invisible boom behind a modest spending line

On the surface, current US spending on AI looks like a sizeable but manageable line item in GDP: on one influential estimate, nominal AI compute outlays amount to about USD 250 billion a year in 2025, covering both inference and model training. That figure encompasses purchases of specialised chips, cloud compute services and associated infrastructure, all of which are counted straightforwardly as investment or intermediate consumption in the national accounts. In conventional terms, this is the visible part of the AI economy: money changing hands for hardware, data centre capacity and access to models, all recorded using existing categories.

Yet behind that nominal spending path lies an extraordinary explosion in effective AI capacity. When the same researchers treat AI as a coherent production sector and apply quality adjustments for both hardware improvements and algorithmic progress, they find that real AI output is growing not by 140 percent a year but by more than 2 000 percent annually in 2024 and 2025. In other words, each year's spending is buying orders of magnitude more capability than the previous year's, even though the nominal cash flows entering GDP aggregates rise only modestly by comparison. The boom is hidden not because it is economically trivial, but because the price-quality relationship is collapsing too quickly for unadjusted statistics to capture.

Three compounding engines of AI output

The backstory to such eye-catching growth rates lies in three distinct but reinforcing processes that reshape what a given dollar of AI spending delivers. The first is the physical build-out of data centre capacity. AI-optimised facilities packed with accelerator chips and high-bandwidth networking are being deployed at an accelerating pace, so the raw compute available for training and running models is growing at well above 200 percent a year. This expansion is visible in investment data, but only in a blunt way: the accounts see larger structures and more equipment, not the combinatorial increase in tasks those resources can perform.

The second process is hardware efficiency. Successive generations of AI chips deliver substantially more floating-point operations per second for each dollar of cost and each unit of energy. Measured in H100-equivalent units, US AI computing capacity is estimated to grow at more than 200 percent per year, significantly outpacing nominal spending growth because each chip class is more powerful than its predecessor at roughly comparable prices. This dynamic echoes the semiconductor industry's long history of rapid quality improvements that kept its GDP share modest even as real output exploded, but AI accelerators are doing so in a context where demand for compute is sky-high and model sizes are scaling aggressively.

The third, and most potent, engine is algorithmic progress. Advances in model architectures, training techniques, optimisation and data curation mean that the amount of compute required to achieve a given performance benchmark has been falling sharply, on some estimates by around two-thirds per year. Put differently, a fixed quantity of chips running for a given period now delivers far more useful task performance than it did a few years ago. When quality adjustments account simultaneously for (a) more data centre capacity, (b) more powerful hardware, and (c) more efficient algorithms, the implied growth in effective AI output jumps from high triple-digit percentages to the 2 000-2 600 percent range cited in recent work.
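
A fall in required compute translates into an efficiency gain through a simple reciprocal, which the short Python sketch below works through. The function name is hypothetical, and the two-thirds figure is used only because it is the estimate mentioned above; holding it constant over several years is an assumption made for illustration.

```python
def efficiency_multiplier(compute_reduction: float, years: int = 1) -> float:
    """Gain in output per unit of compute when the compute needed for a
    fixed benchmark falls by `compute_reduction` each year."""
    if not 0 <= compute_reduction < 1:
        raise ValueError("compute_reduction must be in [0, 1)")
    return (1 / (1 - compute_reduction)) ** years


one_year = efficiency_multiplier(2 / 3)
three_years = efficiency_multiplier(2 / 3, years=3)
print(f"after 1 year: {one_year:.1f}x ({one_year - 1:.0%} growth)")  # 3.0x (200% growth)
print(f"after 3 years: {three_years:.1f}x")                          # 27.0x
```

Needing one-third of the compute for the same result is equivalent to getting three times the result from the same compute, that is, 200 percent annual growth in algorithmic efficiency.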

Formally, one can think of quality-adjusted AI output $Y$ as the product of three components: physical compute capacity $C$, hardware efficiency $H$ and algorithmic efficiency $A$. A simple multiplicative representation is $Y = C \times H \times A$. If each term grows at an annual rate $g_C$, $g_H$ and $g_A$, then the growth rate of $Y$ is approximately $g_C + g_H + g_A$ for moderate rates; with the extreme compounding observed in AI, the exact expression $(1 + g_C)(1 + g_H)(1 + g_A) - 1$ yields multi-thousand-percent annual increases when all three drivers are simultaneously large.
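
The gap between the additive approximation and exact compounding is easy to check numerically. The Python sketch below compares the two for a set of moderate growth rates and a set of extreme ones; both sets are hypothetical inputs chosen for illustration, not the authors' estimates.

```python
from math import prod


def exact_growth(*rates: float) -> float:
    """Exact growth rate of a product whose factors grow at the given rates."""
    return prod(1 + g for g in rates) - 1


def additive_approximation(*rates: float) -> float:
    """First-order approximation: simply add the individual rates."""
    return sum(rates)


moderate = (0.02, 0.03, 0.01)  # hypothetical moderate growth rates
extreme = (2.0, 2.0, 2.0)      # hypothetical: each driver triples in a year

print(f"{exact_growth(*moderate):.4f} vs {additive_approximation(*moderate):.4f}")  # 0.0611 vs 0.0600
print(f"{exact_growth(*extreme):.0%} vs {additive_approximation(*extreme):.0%}")    # 2600% vs 600%
```

For small rates the two agree closely, but when each of the three drivers triples in a year the product grows 27-fold, or 2 600 percent, against 600 percent from simple addition. That the tripling case lands on the headline figure is suggestive, though it should not be read as a reconstruction of the paper's own calculation.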

Why GDP barely moves

National accounts, by design, focus on market transactions valued at current prices, and only selectively adjust for quality improvements. When the price of a service falls as its quality rises, the accounts record a combination of higher quantities and lower prices, but the frameworks and data pipelines required to track both accurately for a new technology often arrive late. For AI, the issue is pronounced because per-unit prices for a given capability are dropping almost as fast as underlying capacity is rising. If API access to a model that can perform a given benchmark task becomes ten times cheaper in a year, and enterprises spend only modestly more on AI in total, then nominal AI revenues will show only a small increase, even though the effective quantity of AI services they purchase has surged.
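
The arithmetic behind that last sentence can be sketched as follows. The function name and the 20 percent spending increase are assumptions for illustration; the tenfold price fall is the case described above.

```python
def real_quantity_growth(nominal_growth: float, price_change: float) -> float:
    """Growth in quantity purchased, given growth in nominal spending and
    the proportional change in the price per unit of capability."""
    return (1 + nominal_growth) / (1 + price_change) - 1


# Hypothetical year: spending rises 20%, and the price per unit of capability
# falls 90% (the same capability becomes ten times cheaper).
growth = real_quantity_growth(nominal_growth=0.20, price_change=-0.90)
print(f"nominal: 20%, real quantity: {growth:.0%}")  # nominal: 20%, real quantity: 1100%
```

Unless the price index used to deflate spending captures the tenfold fall, the accounts would register something close to the 20 percent nominal increase rather than the twelvefold rise in quantity.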

This pattern is familiar from historical episodes. The semiconductor sector experienced decades of rapid performance improvements, yet its share of GDP remained modest because each new generation of chips delivered more performance for similar or lower nominal prices. Hedonic pricing methods allow statistical agencies to adjust for this by re-expressing prices in terms of constant performance metrics, but applying such methods requires stable benchmarks and extensive data that typically emerge only after technologies have matured. The AI wave is arriving too quickly for existing statistical routines to keep up, leaving much of the quality-adjusted output uncounted in official real GDP figures.
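
As a rough illustration of re-expressing prices at constant performance, the Python sketch below compares a sticker-price index with a price-per-unit-of-performance index for two chip generations. It is a deliberately simplified stand-in: hedonic methods proper estimate the value of several product characteristics by regression, and the prices and performance scores here are invented.

```python
def price_indices(old_price: float, old_perf: float,
                  new_price: float, new_perf: float) -> tuple[float, float]:
    """Return (sticker index, constant-performance index), old generation = 1.0."""
    sticker = new_price / old_price
    quality_adjusted = (new_price / new_perf) / (old_price / old_perf)
    return sticker, quality_adjusted


# Hypothetical chips: the new one costs 10% more but delivers 2.5x the performance.
sticker, adjusted = price_indices(old_price=10_000, old_perf=1.0,
                                  new_price=11_000, new_perf=2.5)
print(f"sticker: {sticker:.2f}, quality-adjusted: {adjusted:.2f}")  # sticker: 1.10, quality-adjusted: 0.44
```

In this invented case the sticker price rises by 10 percent while the price of a fixed amount of performance falls by 56 percent, so deflating spending with the second index implies far faster real growth.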

Defining an "AI GDP" and its boundary

Korinek and McKelvey therefore propose treating AI as a distinct economic entity with its own satellite accounts, yielding an "AI GDP" that complements, rather than replaces, standard aggregates. The idea is to delineate the production boundary of AI: which activities belong in the AI sector, how their output should be measured, and how to avoid double-counting when AI is embedded in other goods and services. They focus on the production side, aggregating spending on compute for model training and inference, as well as AI-related research and development, and then applying quality adjustments based on API pricing and estimates of algorithmic progress.

This boundary is necessarily provisional. Including only core compute and API-based access captures the narrow AI industry, but much of the economic value from AI will emerge in applying models to domains like healthcare, education, logistics and creative industries. If those downstream uses are counted as ordinary sectoral output without explicit attribution to AI, then AI's contribution to growth will remain partly hidden even if the upstream AI GDP is measured perfectly. Conversely, if the AI boundary is drawn too broadly, there is a risk of attributing to AI productivity improvements that are jointly driven by complementary investments in human capital, organisational change or non-AI software.

One emerging response in the measurement literature is to distinguish between AI's productive capacity and its realised utilisation. Capacity can be proxied by compute resources and model capabilities, while utilisation depends on demand, adoption and complementary changes in firms' processes. This motivates a conceptual gap between potential AI GDP, based on what the technology could deliver if fully deployed, and actual AI-enabled output that shows up in sectoral productivity data. The unusually high quality-adjusted growth rates identified in the AI sector look more like capacity expansion than like realised welfare gains; the satellite account framework is a way to track this capacity before it fully diffuses through the economy.

The strategic tension: capacity versus productivity

The divergence between internal AI growth and GDP statistics matters because it shapes how policymakers, firms and the public interpret the technology's macroeconomic role. On one view, the rapid expansion of AI capacity with limited reflection in aggregate productivity suggests a familiar pattern of lagging diffusion: general-purpose technologies often require time-consuming organisational and human-capital investments before they translate into economy-wide gains. The classic comparison is with electrification, where factories needed decades to reorganise around distributed motors instead of central shafts, during which time headline productivity growth remained subdued.

On another view, the mismatch raises questions about whether current measurement practices are still adequate. If AI is enabling substantial quality improvements in services that are poorly captured by prices or quantities, such as personalised education or medical diagnostics, then real welfare may already be rising faster than official statistics suggest. However, because GDP is anchored to transactions, not to subjective well-being or consumer surplus, an AI-augmented service that remains priced similarly to its predecessor will not add much to measured output even if users derive more value from it. This reinforces the importance of clarity about what GDP can and cannot represent.

Strategically, governments face a tension between under-reacting to AI because it is invisible in official numbers and over-hyping it based on internal metrics from the AI industry. An AI sector that is expanding at 2 600 percent per year in quality-adjusted terms looks like a revolution from the perspective of data centre operators and model developers. From the perspective of macroeconomic analysts focused on trend productivity growth of perhaps 1-2 percent per year, the effects so far look modest. Calibrating regulation, infrastructure policy and workforce programmes in this context is difficult: the technology's future impact is potentially enormous, but the statistical evidence of current gains is thin.

Debates and objections

The very concept of an AI GDP has sparked several lines of debate. One concern is that quality-adjusted growth rates on the order of thousands of percent may be more a reflection of the chosen metrics than of genuine economic output. Measuring AI in terms of benchmark tasks or API performance might overstate economically relevant progress if those benchmarks do not map cleanly to productivity in real-world workflows. Critics argue that the ability of models to score higher on academic tests or synthetic tasks does not automatically translate into major cost savings or revenue gains for firms.

Another line of criticism questions whether focusing on production-side compute spending misses the demand side of the story. GDP aggregates are meant to reconcile production with income and expenditure; a satellite account that captures only the production capacity of AI might be analytically useful but risks being misinterpreted as a measure of realised welfare. The authors themselves are cautious on this point, emphasising that their quality-adjusted AI GDP should be read as a signal of productive capacity rather than as a replacement for standard welfare concepts. From this perspective, the headline growth rates highlight how quickly the technological frontier is moving, not how much better off households currently are.

A third objection is practical. Statistical agencies operate under resource constraints and must prioritise improvements that deliver the greatest benefit for overall data quality. Some observers worry that building specialised AI satellite accounts could divert attention from more pressing tasks, such as better measuring services, intangibles and household production. In response, proponents of AI-focused measurement argue that because AI is highly input-intensive and potentially general-purpose, an early investment in dedicated tracking can prevent larger measurement problems later, particularly if AI-enabled services blur sector boundaries and reconfigure value chains.

Why it matters for policy and strategy

Despite the conceptual disputes, the attempt to quantify an AI GDP has several concrete implications. For macroeconomic policy, understanding the scale and trajectory of AI investment and capacity is essential for forecasting productivity, inflation and labour market dynamics. If AI capacity is growing far faster than utilisation, there may be a period in which capital deepening outpaces labour adaptation, affecting wage structures and sectoral employment even before aggregate productivity accelerates. Conversely, if AI adoption triggers rapid efficiency gains in certain tasks, it could exert disinflationary pressures in specific service categories, complicating monetary policy calibration.

For innovation and industrial policy, AI satellite accounts can inform decisions about infrastructure, regulation and public R&D support. Knowing whether AI investment is concentrated in a handful of large firms or spread across a wider ecosystem affects concerns about competition and resilience. Tracking the balance between training expenditure and inference-related spending sheds light on whether the frontier is shifting primarily through ever-larger models or through more efficient deployment of existing capabilities. These are questions that conventional sectoral classifications and investment data are not well suited to answer.

For firms, the backstory behind the AI GDP figures highlights the importance of complementarity. The mere existence of rapidly expanding AI capacity does not automatically translate into competitive advantage; what matters is the ability to integrate models into production processes, redesign workflows and manage data effectively. Businesses that treat AI as a drop-in technology may find that the gains visible in benchmark tests do not materialise in their own operations. Those that invest in organisational learning, experimentation and human-machine collaboration are more likely to convert the sector's quality-adjusted output growth into genuine productivity improvements.

Towards a more nuanced statistical architecture

Ultimately, the story behind the headline estimate of a 250-billion-dollar AI economy growing at thousands of percent per year is about the need for a richer statistical architecture. Traditional GDP will remain the workhorse indicator for macroeconomic analysis, but its design assumptions are being stretched by technologies that deliver rapid quality improvements at falling prices and that diffuse across sectors in ways that blur the boundaries between producers and users. Satellite accounts for AI, structured around clear production boundaries and transparent quality-adjustment methods, offer a way to track the technology's evolution without over-claiming about its current welfare impact.

The challenge for researchers and policymakers is to keep the conceptual distinctions clear. AI capacity is not the same as AI usage; AI usage is not the same as productivity; and productivity is not the same as welfare. Yet all four are linked, and their trajectories over the coming decade will shape living standards, industrial structures and geopolitical balances. An analytical framework that isolates AI's contribution to production, while acknowledging the limits of current data and the uncertainties around mapping benchmarks to economic value, is a crucial step in making sense of a transformation that standard GDP statistics barely register today.

‌