Our selection of the top business news sources on the web.

AM edition. Issue number 1341


Quote: Bryan Catanzaro - Vice president of applied deep learning at Nvidia

"For my team, the cost of [AI] compute is far beyond the costs of the employees." - Bryan Catanzaro - Vice president of applied deep learning at Nvidia

In the current generation of artificial intelligence deployment, the binding constraint for many organisations is no longer talent but access to affordable compute at scale. That inversion of the traditional cost structure is what turns remarks from senior technologists into a broader economic signal: for some cutting-edge teams, the recurring bill for accelerated hardware, cloud instances, networking, and power now exceeds the wage bill for the engineers designing and operating the systems. This is not just an accounting oddity; it alters how firms evaluate automation, how investors price AI strategies, and how policymakers should interpret predictions of rapid job displacement.

From cheap silicon and expensive people to expensive silicon and leveraged people

For several decades, digital transformation followed a familiar pattern: hardware and basic infrastructure costs per unit of computation fell predictably, while skilled labour remained scarce and expensive. Organisations hired more software engineers to sit atop an increasingly cheap computational substrate. In that world, the canonical argument for automation was straightforward: once a process was codified in software, the marginal cost of running it again was negligible compared with paying an additional human to do the same work.

What has changed with frontier AI systems is the scale and intensity of computation required to deliver competitive performance. Training large language models and vision systems involves running vast numbers of parallel operations across specialised GPUs and associated accelerators, often for weeks, on clusters that can cost tens or hundreds of millions of dollars in capital expenditure for hyperscalers and large enterprises. Even when organisations do not own the hardware, their cloud providers pass through these capital and operating costs via metered pricing. As a result, the unit of analysis for AI economics has shifted from "salary per employee" to metrics such as cost per GPU hour, cost per million tokens, and cost per inference request.

Industry observers now describe AI infrastructure as a new class of heavy industry: data centres designed around specialised accelerators, redundant power feeds, and advanced cooling, with aggregate global spending that consultancy estimates place in the trillions of dollars by 2030. That capital intensity explains why some AI teams report that, at the margin, paying for more compute to improve a model or scale an application is financially weightier than hiring additional engineers to refine prompts, build interfaces, or clean data.

Why compute dominates the cost stack

There are several mechanisms behind compute eclipsing labour in AI projects.

First, state-of-the-art models are extremely large. Modern large language models and multi-modal systems often contain hundreds of billions or even 1 trillion parameters. Each training run requires repeated passes over large datasets, with backpropagation and optimisation algorithms applied across every parameter. The computation required scales roughly with the product of the number of parameters and the number of training tokens, and in practice teams often run multiple training, fine-tuning, and ablation cycles. That translates into millions of GPU hours, even when using the most efficient hardware and software stacks available.
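As a rough illustration of how model size and dataset size turn into GPU hours, the sketch below uses the widely cited rule of thumb that training a dense transformer takes about 6 floating-point operations per parameter per token. The function name, the accelerator throughput, and the utilisation figure are illustrative assumptions, not numbers reported by any particular team.

```python
def training_gpu_hours(params, tokens, peak_flops_per_gpu, utilisation):
    """Back-of-envelope GPU hours for one training run (approximate 6*N*D rule)."""
    # Total training compute: ~6 FLOPs per parameter per token (forward + backward).
    total_flops = 6.0 * params * tokens
    # Sustained throughput is only a fraction of the accelerator's peak rating.
    effective_flops_per_second = peak_flops_per_gpu * utilisation
    seconds = total_flops / effective_flops_per_second
    return seconds / 3600.0


# Placeholder inputs (assumptions): 1 trillion parameters, 10 trillion tokens,
# 1e15 FLOP/s peak per accelerator, 40% sustained utilisation.
hours = training_gpu_hours(
    params=1e12, tokens=1e13, peak_flops_per_gpu=1e15, utilisation=0.4
)
print(f"{hours:,.0f} GPU hours for a single run")
```

With these placeholder inputs the estimate comes to roughly 42 million GPU hours for one run, before any fine-tuning or ablation cycles are added.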

Second, inference - the process of serving model outputs to users - imposes ongoing costs that grow with adoption. Training is a one-off or periodic capital-like expense, but inference is an operational expense that scales with queries. Industry frameworks therefore emphasise cost per token as a central metric: the all-in cost to produce each output token, incorporating hardware depreciation, energy, data centre overheads, networking, and software optimisation. Even small differences in cost per million tokens can compound dramatically when applications serve millions of users or integrate AI into high-frequency workflows.
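A minimal sketch of how such an all-in cost per token might be assembled is shown below. The function and parameter names are hypothetical, the amortised hourly hardware figure is assumed to already include depreciation and networking, and data centre overhead is approximated with a power usage effectiveness (PUE) multiplier on energy.

```python
def cost_per_million_tokens(
    amortised_hardware_per_hour,  # depreciation + networking, per accelerator-hour
    power_kw,                     # accelerator power draw in kilowatts
    energy_price_per_kwh,         # local electricity price
    pue,                          # power usage effectiveness (cooling and overheads)
    tokens_per_second,            # peak serving throughput of one accelerator
    utilisation,                  # fraction of time the accelerator serves traffic
):
    """Approximate all-in serving cost per million output tokens."""
    # Energy cost per hour, scaled up by PUE to cover cooling and ancillary load.
    energy_per_hour = power_kw * pue * energy_price_per_kwh
    hourly_cost = amortised_hardware_per_hour + energy_per_hour
    # Idle capacity still costs money, so divide by tokens actually served.
    tokens_per_hour = tokens_per_second * 3600 * utilisation
    return hourly_cost / tokens_per_hour * 1_000_000


# Placeholder inputs, chosen only to show the shape of the calculation.
example_cost = cost_per_million_tokens(
    amortised_hardware_per_hour=2.00,
    power_kw=1.0,
    energy_price_per_kwh=0.10,
    pue=1.3,
    tokens_per_second=1000,
    utilisation=0.5,
)
print(f"{example_cost:.2f} per million tokens")
```

Because utilisation sits in the denominator, halving it doubles the cost per token, which is why idle capacity matters as much as the headline hardware price.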

Third, the energy footprint of frontier AI is substantial. High-end GPUs draw significant power, and data centres require additional energy for cooling and ancillary systems. Energy prices vary geographically, but in many locations they are high enough that power constitutes a large share of the total cost of ownership for AI infrastructure. Analysts have therefore started speaking of "intelligence per megawatt" as a key performance dimension. When firms compare this stack of costs with the wages of knowledge workers, the balance can tilt unexpectedly toward hardware and energy spending.

Fourth, there is a structural asymmetry between compute and labour costs. Employee salaries are relatively predictable and can be adjusted slowly via hiring freezes, attrition, or compensation changes. Compute costs, by contrast, can spike rapidly if product usage grows or if teams run more experiments than anticipated. In startups, venture capital typically funds both headcount and infrastructure, but some investors report portfolio companies spending more than 80% of the capital they raise on compute resources, dwarfing wage bills.

The empirical picture: AI not yet cheaper than humans

These mechanisms are now visible in empirical work. One 2024 study from MIT examined where AI systems could perform visual tasks at or near human level, and then compared the cost of machine versus human performance. The researchers concluded that automation was economically viable in roughly 23% of roles where vision is central to the job, meaning that in about 77% of cases, it remained cheaper to pay humans than to deploy AI. The issue was not capability - the models could often do the tasks - but economics: hardware, energy, and infrastructure outweighed labour costs.

Macro-level data on AI expenditure reinforce this picture. Big technology firms alone have announced around $740 billion in capital expenditures in a single year, largely driven by AI data centre build-out, representing a 69% increase over the prior year. Other analysis suggests that AI-related expenditures could reach $5.2 trillion by 2030 under central scenarios, and as high as $7.9 trillion under more aggressive build-out, with data centre and IT equipment accounting for the bulk of this spending. Against those numbers, even generous headcounts of highly paid engineers, researchers, and product staff occupy a smaller share of the cost base than might be expected.

At the same time, external observers note that, despite the scale of these investments, there is still limited aggregate evidence of AI-driven productivity gains across the economy. Budget analysts and academic studies point out that, so far, there is no broad-based data showing AI displacing jobs at scale or dramatically boosting measured output per worker, even as tech sector layoffs have accelerated. That divergence between spending and measured productivity raises the question of whether the current wave of AI investment is front-loaded - infrastructure built ahead of realised returns - or whether some fraction will prove to be misallocated capital.

Strategic tension: build now, pay later

This leads to a central strategic tension. On one side is the argument that AI is an infrastructure revolution akin to electrification or the early internet, requiring enormous upfront capital before productivity gains show up in statistics. Firms that invest early, this view holds, will establish competitive moats via proprietary data, trained models, and optimised infrastructure. For such firms, the fact that compute currently costs more than labour may be beside the point; they are laying the foundations for future economies of scale, where the amortised cost per unit of AI output falls sharply as utilisation rises.

On the other side is a more sceptical view, which emphasises opportunity cost and path dependence. If AI systems are currently more expensive than humans for many tasks, especially outside narrow high-value niches, then replacing workers prematurely may destroy value rather than create it. Companies that chase AI for its own sake, without rigorous cost-benefit analysis, risk saddling themselves with fixed infrastructure commitments and ongoing compute bills that are difficult to roll back. This argument is bolstered by evidence that many firms do not fully understand their AI cost structures, focusing on headline model access fees or GPU rental rates rather than total cost per token or per workflow.

These perspectives are not mutually exclusive. It is possible that some organisations are overbuilding, while others are rationally investing in infrastructure that will underpin future competitive advantage. What unites them is an underlying bet: that the cost of compute will fall fast enough, and the productivity benefits of AI will rise high enough, to justify today's discrepancy between machine and labour costs.

Falling unit costs versus rising aggregate spend

Industry roadmaps and analyst reports forecast significant reductions in the unit economics of AI over the next few years. Hardware generations such as Nvidia's Blackwell architecture promise up to 30× gains in inference performance at similar power budgets compared with earlier accelerators. Software improvements - better compilers, quantisation techniques like FP4 precision, more efficient attention mechanisms, and mixture-of-experts routing - all work to reduce the computational load per unit of useful output. Gartner-style forecasts point to the cost of running inference for models with 1 trillion parameters dropping by more than 90% over a four-year horizon.

If realised, those gains could radically alter the relative cost of compute and labour. A workflow that is uneconomic today because each AI call is expensive might become viable once the cost per million tokens falls below some threshold. In that future, the remark that compute costs more than employees would be overtaken by a new reality in which compute is cheap enough that the main question becomes how to reconfigure organisations to exploit it.
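The threshold idea can be made concrete with a small break-even calculation. This is a sketch under simplifying assumptions: the function name and inputs are hypothetical, quality is assumed equal between the human and the AI workflow, and the only costs counted are token spend and residual human oversight.

```python
def break_even_price_per_million(
    human_cost_per_task, tokens_per_task, oversight_cost_per_task=0.0
):
    """Highest token price (per million tokens) at which AI matches human cost."""
    # Budget left for compute once the remaining human oversight is paid for.
    compute_budget = human_cost_per_task - oversight_cost_per_task
    if compute_budget <= 0:
        return 0.0  # oversight alone already costs as much as the human workflow
    return compute_budget / tokens_per_task * 1_000_000


# Placeholder figures: a task costing 0.50 in labour, 20,000 tokens per task,
# and 0.10 of human review retained per task.
threshold = break_even_price_per_million(0.50, 20_000, oversight_cost_per_task=0.10)
print(f"Viable below {threshold:.2f} per million tokens")
```

Under these placeholder figures the workflow becomes viable once the all-in price falls below about 20 per million tokens. Workflows that use more tokens per task have a proportionally lower threshold.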

However, even as unit costs fall, aggregate spending may still rise. The classic rebound effect applies: cheaper computation tends to expand the range of feasible applications and increase total usage. Organisations that pay less per token may respond by embedding AI into more products, workflows, and services, multiplying the total number of tokens generated. If AI spending by 2030 ends up closer to the aggressive $7.9 trillion scenario than to the central $5.2 trillion one, a large part of the difference will likely reflect expanded scope rather than higher prices. The result is a paradox: individually, each unit of compute may become cheaper and more efficient; collectively, compute may remain the largest single line item for AI-heavy firms.
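The arithmetic behind this rebound effect is simple enough to sketch. The helper names and the 20x usage multiple below are illustrative assumptions; the 90% unit-cost drop mirrors the forecast range mentioned earlier.

```python
def usage_growth_to_hold_spend_flat(unit_cost_drop):
    """Usage multiple at which total spend is unchanged after a unit-cost drop."""
    return 1.0 / (1.0 - unit_cost_drop)


def spend_ratio(unit_cost_drop, usage_multiple):
    """New total spend divided by old total spend."""
    return (1.0 - unit_cost_drop) * usage_multiple


flat_point = usage_growth_to_hold_spend_flat(0.90)
ratio = spend_ratio(0.90, usage_multiple=20)
print(f"Spend is flat if usage grows {flat_point:.1f}x")
print(f"With 20x usage, total spend is {ratio:.1f}x the original")
```

A 90% fall in unit cost is fully offset once usage grows about tenfold. At twenty times the usage, the total bill doubles even though every token is far cheaper.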

Employment, displacement, and the cost paradox

The fact that compute can cost more than employees complicates narratives about AI-driven job displacement. From a firm's perspective, automation only makes sense when the total cost of designing, training, deploying, and maintaining an AI system is lower than - or at least justifiable relative to - the wage cost, management overhead, and performance variability of human workers. When AI is more expensive, substituting capital for labour purely on cost grounds is irrational.

This does not mean AI will not change employment. Instead, it suggests a more nuanced pattern of complementarity and selective substitution. In domains where human labour is extremely costly or scarce, such as high-end legal services, algorithmic trading, or complex simulation, even expensive compute may be a bargain. In mass-market customer service or routine back-office work, by contrast, the current cost structure favours augmenting workers with AI tools rather than replacing them outright. The MIT study's finding that only around 23% of vision-centric jobs are currently economically automatable illustrates how narrow the immediate substitution window may be.

The paradox is that headlines about companies spending more on AI infrastructure than on salaries coexist with data showing limited net job losses attributable directly to AI. Part of the explanation is temporal: firms are investing ahead of adoption, building capabilities before fully restructuring their workforces. Another part is strategic: some firms see AI as a growth tool rather than a cost-cutting tool, aiming to enable new products and services rather than simply replacing existing staff.

Pricing models and hidden subsidies

One reason many users and even corporate customers underestimate the true cost of AI compute is the structure of pricing models. A significant portion of the market relies on flat subscription charges or simple usage tiers that do not map transparently to underlying infrastructure costs. For light users, this can be attractive: they pay a fixed fee and rarely hit limits. For heavy users, however, the provider may be effectively subsidising usage, especially if the subscription was priced before providers had a clear picture of real-world load.

Reports of AI software fees rising by 20% to 37% over a year indicate providers adjusting to this reality. As cost pressures mount - from energy, hardware procurement, and the need to recoup massive capital investments - providers are likely to shift toward more granular, usage-based pricing that reflects cost per token or per request more accurately. When that occurs, more enterprises will discover that their apparent labour savings are offset by higher-than-expected compute bills.

This evolution will bring AI closer to other utilities: electricity, cloud storage, and bandwidth. In each case, users ultimately pay for the marginal resource consumed, and efficient usage becomes a competitive advantage. Just as cloud-native firms learned to optimise workloads to reduce compute and storage charges, AI-native firms will need to optimise prompts, context lengths, caching strategies, and model architectures to minimise unnecessary tokens and reduce idle capacity.

Why the remark matters

The observation that compute can be more costly than employees is important for several constituencies.

For executives and boards, it underscores the need for rigorous capital allocation in AI initiatives. Projects should be evaluated not only on potential strategic upside but also on fully burdened compute economics: total cost per workflow, sensitivity to usage spikes, and exposure to future hardware and energy price shifts. In an environment where tech companies have already announced hundreds of billions in AI-related capital expenditure, misjudging these factors can have material consequences for profitability and competitive positioning.

For investors, the remark acts as a reminder that not all AI spending is value-creating. A significant share may be speculative or defensive, driven by fear of missing out rather than clear use cases. Distinguishing between firms that translate compute into durable revenue and those that merely accumulate expensive infrastructure will be a central task over the next decade.

For policymakers and labour economists, recognising the current cost structure is essential when interpreting forecasts of rapid, sweeping job automation. If AI is still more expensive than humans for the majority of tasks, then near-term labour market disruption is likely to be more contained and sector-specific than some narratives suggest. This does not eliminate the long-term risk of displacement as unit costs fall, but it introduces a window in which policy can focus on adaptation: training, re-skilling, and ensuring that productivity gains, when they arrive, are broadly shared.

Finally, for engineers and product teams, the remark is a design constraint. It implies that building AI systems is not just a problem of maximising accuracy or capability; it is also a problem of optimising for economic viability. Model selection, quantisation choices, caching, retrieval strategies, and system architecture all affect compute consumption. Teams that learn to treat tokens, GPU seconds, and watts as scarce resources, on par with developer time, will be better positioned to create sustainable AI products.

As AI infrastructure matures, the relative prices of compute and labour will continue to evolve. The present moment, in which a leading practitioner can credibly say that the compute bill dwarfs the wage bill, is a snapshot in a longer trajectory. Whether history records it as an early, capital-intensive stage on the way to widely affordable machine intelligence, or as evidence of an overcapitalised boom, will depend on how effectively organisations turn expensive silicon into genuine productivity.

Quote: Dario Amodei - Founder and CEO, Anthropic

"Those who don't kind of see what's coming, who don't identify the moats they have, they're going to have a really hard time." - Dario Amodei - Founder and CEO, Anthropic

The practical problem is not simply that AI capability is improving quickly, but that the basis of competitive advantage is shifting faster than many organisations can recognise it. In a market where model quality, distribution, infrastructure, data access, and trust are all being repriced at once, the winners are likely to be those who can identify which advantages are durable before the rest of the field realises that the old ones have decayed.

That is the deeper force behind Dario Amodei's warning. It is not a casual remark about caution; it reflects a strategic view that many participants in the AI economy are mistaking visible momentum for defensible position. Anthropic's own public profile has made Amodei one of the clearest voices on the concentration of power in the AI era, and his concern appears to be that speed alone can create a false sense of security.

What a moat means in the AI context

A moat in the classic business sense is any structural feature that makes profits more resistant to competition. In AI, that can mean several different things at once: proprietary distribution, developer loyalty, enterprise integration, talent depth, compute access, safety credibility, regulatory readiness, or a feedback loop that improves the product faster than rivals can copy it. The difficulty is that AI moats are often unstable early on. A capability that looks like an enduring edge one year may be commoditised the next by better open models, cheaper inference, or a rival's stronger distribution channel.

That instability matters because many companies have been tempted to treat AI as a feature layer rather than a strategic reordering. But the recent market conversation has increasingly moved towards the idea that AI will not merely be sold into existing businesses; it will reorganise them from the inside. Commentary around the so-called AI rollup thesis argues that investors are buying labour-intensive businesses and rebuilding them around AI so that economics begin to resemble software rather than services. If that thesis proves correct, then the old source of value is not just under pressure; it is being redefined.

Why speed makes identification harder

Amodei's warning lands because the pace of improvement is itself part of the competitive landscape. He has said that cognitive ability in frontier systems can be doubling every four to twelve months, a pace that would make conventional strategic planning dangerously slow. When the underlying capability curve is that steep, the shelf life of an advantage shortens. What looks like a moat may actually be a temporary lead created by timing, capital, or first-mover publicity.

Quote: Andrej Karpathy - Anthropic (OpenAI founding member, formerly head of Tesla AI)

"You can give [Claude Fable 5, the same underlying model as Mythos but with added safeguards] a lot more ambitious tasks than what you're used to, the model 'gets it' and it will just go, and it's never felt this tempting to stop looking at the code at all." - Andrej Karpathy - Anthropic (OpenAI founding member, formerly head of Tesla AI)

Modern software development is being pulled towards a regime where human programmers increasingly specify intent while machines decide the details. That shift is most visible not in abstract benchmarks but in the psychological moment when an expert developer realises they can hand over a far more ambitious task than before, watch an AI system autonomously decompose and implement it, and feel genuine temptation to stop inspecting every line it produces. For someone steeped in traditional notions of craftsmanship and code review, that temptation is both intoxicating and alarming.

The move from instructions to intent

The underlying transition is from imperative programming, where humans micromanage every step, to a declarative style where they specify success criteria and let an AI agent find a path. Historically, a senior engineer might spend hours designing architecture, writing scaffolding, and orchestrating tools; now, high-capability models can generate coherent multi-file projects, manage dependencies, restructure modules, and even propose test suites to validate their own work. In that world, the bottleneck shifts from typing speed or API recall to the clarity and completeness of the human specification.

This is what makes the ability to give a model substantially more ambitious tasks so significant. When an AI system can handle not just a function or a bug fix but an end-to-end feature, a migration, or a refactor across tens of files, the human role changes. The developer becomes more of a product and safety architect: deciding goals, constraints, and trade-offs, then auditing whether the agent met them. The quote speaks directly to this: work that once had to be decomposed into micro-prompts can now be expressed as a single high-level directive, with the model reliably filling in the operational gaps.

Why this particular endorsement matters

The significance of that shift is amplified by who is speaking. Andrej Karpathy is not a casual user experimenting with consumer tooling but a foundational figure in modern deep learning and applied AI. He was a founding member of OpenAI and later headed Tesla Autopilot, leading large teams building vision and planning systems for real-world safety-critical autonomy. He has also been one of the most visible educators in the field, teaching tens of thousands of practitioners how neural networks work and how to reason about their failure modes.

That background makes his sense of surprise at feeling "behind" as a programmer, and the ego hit of giving more of the work to AI, noteworthy. When someone with deep understanding of model limitations and training quirks reports that a code-focused model "gets it" on larger, more complex tasks, it is not naive enthusiasm but a statement grounded in long experience of what usually goes wrong. It suggests a step change in practical reliability, especially on extended coding sessions.

Claude Fable 5, Mythos and safeguards

The specific model involved, Claude Fable 5, sits in a deliberately structured product family. Anthropic describes Fable 5 as sharing the same underlying model as Claude Mythos, but with additional safeguards and alignment layers. In practice, that means the core capability for long, multi-step reasoning and coding is retained, while the system has tighter policies around potentially harmful outputs and more conservative behaviours in ambiguous domains. Mythos is aimed at frontier, less constrained exploration; Fable, at general-purpose work where safety, compliance, and predictability matter.

Karpathy himself characterises Fable 5 as a "major-version-bump-deserving step change", particularly on long and difficult tasks. Reports and demos around the release highlight stronger performance not only on standard coding benchmarks but on real-world development flows: navigating large repositories, performing multi-file edits, and maintaining context over extended sessions. The result is a system that feels less like an autocomplete gadget and more like a junior engineer who can hold a problem in mind over hours of work.

Crucially, the "added safeguards" do not just refer to refusal policies. They also encompass training and inference-time measures that make the model more robust to prompt injection, reduce hallucinations, and bias it toward verifiable operations like running tests or inspecting diffs instead of bluffing. That combination - high capability plus strong guardrails - is what makes handing over more ambitious tasks psychologically viable. The user is not simply trusting a stochastic parrot; they are interacting with a toolchain engineered for cautious autonomy.

Karpathy's journey into agentic coding

To understand the deeper significance of the quote, it helps to situate it within Karpathy's broader ideas about software. In recent years he has described a transition towards what he calls "Software 3.0" or "agentic coding". Earlier eras could be caricatured as follows: in Software 1.0, humans wrote explicit logic; in Software 2.0, humans trained models but still wrote the surrounding infrastructure; in Software 3.0, AI systems increasingly write, test, and maintain significant portions of the codebase themselves, guided by human intent and oversight.

Within that frame, he has promoted practices like "vibe coding", where a developer converses with an AI assistant, iteratively refining prompts and reading outputs rather than manually hand-crafting every function. The point is not laziness but bandwidth: by offloading boilerplate and low-level wiring, humans can spend more time on product thinking, architecture, and evaluation. Yet he has also been candid about the danger of "brain atrophy" if humans stop engaging deeply with technical substance and become mere prompt routers.

His move to Anthropic, announced publicly on X, is explicitly about pushing this paradigm further. He is joining the Claude pre-training team, with a mandate to build a sub-team focused on using Claude itself to accelerate pre-training research. That is, he is not only using AI to write ordinary application code but using AI agents to help design, run, and analyse the experiments that produce the next generation of AI models. Some observers describe this as laying the groundwork for recursive self-improvement, where systems contribute directly to their own advancement.

The temptation to stop reading the code

The most charged part of the quote is not the praise for task capability but the admission that "it's never felt this tempting to stop looking at the code at all". That sentence crystallises a new risk frontier. Up to now, cautious practitioners have recommended heavy human inspection of AI-generated code: checking logic, scanning for security flaws, reviewing for maintainability. Those practices are time-consuming, but they preserve a culture where humans remain accountable for what ships.

As model quality improves, the marginal benefit of reading every line may appear to shrink. When the output often looks clean, idiomatic, and passes tests, the pressure to skim rather than scrutinise grows stronger. That temptation is exacerbated by business incentives. If an AI agent can implement a feature in 1 hour that would take a human 10 hours, organisations will be driven to capture that 9-hour gain, especially under competitive pressure. Deep review may be cast as optional overhead rather than mandatory safety.

This dynamic is not unique to coding. In aviation, pilots became less hands-on as autopilots grew more reliable, leading to worries about skill decay; yet in rare edge cases, human intervention remained vital. The same pattern looms in software: as AI-generated code becomes the default, there is a risk that fewer engineers retain the ability to reason from first principles when the system fails in a novel way.

Strategic and technological tension

The tension, then, is between speed and scrutiny, between trusting an increasingly competent agent and insisting on human understanding. On one side lies the productivity windfall: AI can manage dependency graphs, propose architecture refactors, and generate regression tests at a pace that would overwhelm any human team. On the other side lies epistemic opacity: large language models generate code via pattern completion, not explicit formal derivation, and even when the code passes tests, it may encode subtle bugs, non-obvious security weaknesses, or performance pathologies.

In safety-conscious organisations, this tension will likely be addressed with layered controls. For critical systems, one can imagine a workflow where an AI agent proposes changes, another independent agent attempts to break or exploit them, and human reviewers arbitrate. For less critical contexts, teams may accept a higher degree of automated autonomy, using telemetry and canary deployments to catch regressions in production.
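One way to picture such layered controls is as a routing rule applied to each AI-proposed change. The sketch below is purely illustrative: the class names, fields, and path labels are hypothetical and do not describe any specific organisation's or vendor's workflow.

```python
from dataclasses import dataclass
from enum import Enum


class Criticality(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass
class ProposedChange:
    description: str
    criticality: Criticality
    tests_passed: bool
    red_team_findings: int  # issues raised by an independent adversarial agent


def review_path(change: ProposedChange) -> str:
    """Decide how an AI-proposed change is handled before it ships."""
    if not change.tests_passed:
        return "reject"  # failing changes go back to the proposing agent
    if change.criticality is Criticality.HIGH:
        return "human_review"  # critical systems always get human arbitration
    if change.red_team_findings > 0:
        return "human_review"  # adversarial findings escalate even low-risk changes
    return "canary_deploy"  # otherwise ship gradually and watch telemetry


change = ProposedChange("rename internal helper", Criticality.LOW, True, 0)
print(review_path(change))
```

The design choice is that automation handles the default path while any signal of risk, whether from system criticality or from the adversarial agent, pulls a human back into the loop.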

Technologically, the quote points to a world where coding models are integrated deeply into development environments as persistent agents rather than stateless assistants. In that world, the system remembers project history, tracks unresolved issues, and maintains a map of the codebase. This is already visible in the way tools like Claude Code are embedded into full IDE surfaces where generation, testing, and git operations happen in one loop. The practical question is not whether such agents will exist but what guardrails and observability layers they will carry.

Anthropic's safety-first positioning

Anthropic has invested heavily in a brand and research agenda built around "constitutional" AI and safety. That approach involves specifying normative guidelines that models are trained to follow, and then auditing behaviour against those guidelines. For coding, that can be extended into concrete policies: refuse to write insecure patterns, prefer constant-time implementations in cryptographic contexts, suggest mitigation when encountering user-supplied input.

Fable 5's positioning as "Mythos but safe" reflects a belief that potential harms can be reduced without sacrificing too much capability. Karpathy's enthusiasm suggests that, at least in his workflows, the safeguards are not experienced as a hindrance but as a trust multiplier. He can instruct the model more ambitiously precisely because he expects it to act conservatively when it encounters sensitive operations and to avoid reckless actions like deleting large portions of a repository without confirmation.

Yet there remains an unresolved debate over how far safety techniques can go in mitigating risks that emerge from sheer scale and generality. Even a strongly aligned model may generate exploitable code when given innocuous prompts, simply because the space of correct-looking but vulnerable implementations is vast. Critics argue that this cannot be fully addressed by refusal policies and that deep formal methods or language-level safety guarantees will be necessary. The temptation to "stop looking at the code" must be evaluated against that backdrop.

Debates and objections

There are at least four major lines of objection or concern surrounding the world implied by the quote.

First, there is the professional identity and labour market concern. If AI tools can handle an increasing share of coding, especially the more routine or boilerplate-heavy parts, junior roles may shrink, making it harder for new developers to gain experience. Karpathy himself acknowledges a crossroads between "brain atrophy" and skill evolution, where humans must decide whether to re-skill towards higher-level system design and evaluation or risk being displaced.

Second, there is the epistemic reliability concern. Benchmarks can show impressive averages, but systems are still brittle on rare edge cases, poorly specified tasks, or ambiguous requirements. A sense that "the model gets it" can mask the fact that its understanding is statistical, not semantic in a human sense. Critics worry that as trust grows, organisations will deploy AI-generated code beyond domains where its failure modes are well characterised.

Third, there is the self-referential risk of using AI to build the next generation of AI. The work Karpathy is taking on at Anthropic involves using Claude to accelerate pre-training research itself, potentially moving towards recursive self-improvement. Enthusiasts argue that this is necessary to make progress at the current frontier, where experiments are too numerous and complex for purely human pipelines. Sceptics warn that errors, biases, or misalignments may be amplified if AI-driven research loops are not carefully constrained and audited.

Fourth, there is the cultural concern. Software engineering has long valued code readability not only for maintainability but as a vehicle for knowledge sharing. If more of the codebase is generated and fewer humans read it deeply, tacit knowledge may concentrate in the behaviour of models rather than in the minds of engineers. Some fear a loss of craftsmanship and a drift towards opaque systems even within a single organisation.

Why this moment matters

Despite these concerns, the practical direction of travel is clear. Developers are already wiring multiple frontier models into a single development surface, choosing per-task which to call, whether Claude, GPT, or others, based on performance and cost rather than vendor loyalty. Tools that bundle coding, testing, and version control into agentic workflows are proliferating. The quote captures a threshold where these tools no longer feel like experimental sidekicks but like the primary engine of implementation.

From a strategic perspective, this changes how organisations think about their software capability. Instead of asking how many engineers they can hire, they will ask how effectively they can orchestrate AI coding capacity: prompt libraries, evaluation harnesses, and safety procedures become as important as hiring pipelines. Companies that embrace this shift thoughtfully will invest in engineers who are excellent at specifying intent, designing tests, and auditing AI proposals, which is a different profile from traditional full-stack roles.

For individual developers, it poses a challenge and an invitation. The challenge is to resist the laziness of unexamined trust while also resisting nostalgia for a world where writing every line oneself was feasible. The invitation is to climb the abstraction ladder: to become better at defining product goals, at thinking in systems, at debugging not just functions but entire AI-assisted workflows.

Karpathy's experience with Claude Fable 5 illustrates that frontier models are now strong enough to make this shift emotionally palpable. When a veteran practitioner feels tempted to stop reading the code, that is not a signal to give up scrutiny, but it is evidence that the agent has crossed a qualitative threshold. The world of software will be shaped by how we respond to that feeling: whether by surrendering to it, ignoring it, or deliberately building new practices, tools, and norms that harness its power without abandoning responsibility.

"You can give [Claude Fable 5, the same underlying model as Mythos but with added safeguards] a lot more ambitious tasks than what you're used to, the model 'gets it' and it will just go, and it's never felt this tempting to stop looking at the code at all." - Andrej Karpathy - Anthropic (OpenAI co-founder, formerly head of Tesla AI)

‌

Term: Absorption costing - Managerial accounting

"Absorption costing, also known as full costing, is a managerial accounting method that captures and assigns all manufacturing costs to the specific products being produced. Under this system, the unit cost of an item absorbs every single expense required to get it ready for sale, including both fixed and variable costs." - Absorption costing - Managerial accounting

Profitability in manufacturing depends as much on how costs are measured as on how efficiently factories run. The way overheads such as factory rent, depreciation and supervisory salaries are spread across products can change reported margins, influence pricing, and even affect behaviour inside the plant. Absorption costing sits at the centre of this machinery, because it drives the unit cost that flows into inventory valuation, cost of goods sold, and headline profit figures used by boards, lenders and tax authorities alike.

Underlying economic issue: who should bear the fixed factory bill?

Manufacturing businesses incur large fixed costs to keep production capacity available: buildings, machines, salaried staff and support functions. These expenses are paid regardless of whether the factory runs at 20 percent or 90 percent of capacity. The central issue is how to attribute this fixed factory bill to individual units of output so that financial statements, pricing decisions and performance assessments make sense.

Absorption costing answers by insisting that every unit produced should carry a fair slice of that fixed burden, alongside its direct materials, direct labour and variable overhead. In other words, the economic logic is that capacity costs exist in order to make units, so units must "absorb" them. This contrasts with variable costing, where fixed manufacturing overhead is treated as a period expense of having capacity, rather than a cost of individual units.

The tension between these views is not merely academic. It determines whether unsold inventory carries embedded fixed overhead on the balance sheet (absorption costing) or whether all fixed overhead hits the income statement immediately (variable costing). The result is different profit paths over time when production and sales volumes diverge.

Substantive meaning: what costs are absorbed?

In practice, absorption costing brings together four categories of manufacturing cost as product cost:

- Direct materials

- Direct labour

- Variable manufacturing overhead (for example, indirect supplies, power linked to machine hours)

- Fixed manufacturing overhead (for example, factory rent, depreciation, factory management salaries)

These costs are all treated as part of inventory while units remain unsold and only become cost of goods sold when the units leave inventory. Selling, general and administrative costs, whether fixed or variable, remain period costs and are never attached to units.

From a financial reporting standpoint, this approach is not optional. Under major accounting frameworks, inventory must be carried at cost, including an appropriate allocation of fixed and variable production overhead. Absorption costing therefore underpins external profit reporting, tax computation and many loan covenant calculations.

Mathematical specification of unit cost under absorption costing

Although the mechanics appear straightforward, writing the relationships explicitly clarifies how production volume and allocation rates interact. Suppose a single product is manufactured in a period. Denote:

- $DM$: total direct materials cost for the period

- $DL$: total direct labour cost

- $VOH$: total variable manufacturing overhead

- $FOH$: total fixed manufacturing overhead

- $Q$: total units produced in the period

The total product cost for the period under absorption costing is:

$$TC_{\text{abs}} = DM + DL + VOH + FOH$$

The absorption costing unit cost is then:

$$c_{\text{abs}} = \frac{TC_{\text{abs}}}{Q} = \frac{DM + DL + VOH + FOH}{Q}$$

Variable costing would instead treat only variable elements as product cost. Let $VC$ be total variable manufacturing cost ($VC = DM + DL + VOH$). The variable costing unit cost is:

$$c_{\text{var}} = \frac{VC}{Q} = \frac{DM + DL + VOH}{Q}$$

The difference between the two unit costs is simply the fixed overhead per unit:

$$c_{\text{abs}} - c_{\text{var}} = \frac{FOH}{Q}$$

This fixed overhead rate, often computed per machine hour or labour hour in multi-product environments, is the core mechanism by which overhead is absorbed into inventory. When production volume rises, $Q$ increases, reducing fixed overhead per unit; when volume falls, each unit carries a heavier fixed overhead charge.
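The unit cost relationships can be checked with a short calculation. The sketch below is illustrative only: the function name `unit_costs` and all figures are assumptions chosen for round numbers, not data from any real plant.

```python
def unit_costs(direct_materials, direct_labour, variable_overhead,
               fixed_overhead, units_produced):
    """Return (absorption unit cost, variable unit cost, fixed overhead per unit)."""
    if units_produced <= 0:
        raise ValueError("units_produced must be positive")
    variable_total = direct_materials + direct_labour + variable_overhead
    absorption_unit = (variable_total + fixed_overhead) / units_produced
    variable_unit = variable_total / units_produced
    return absorption_unit, variable_unit, fixed_overhead / units_produced


# Hypothetical period: 10,000 units produced.
abs_u, var_u, foh_u = unit_costs(200_000, 150_000, 50_000, 100_000, 10_000)
print(abs_u, var_u, foh_u)  # 50.0 40.0 10.0

# Same fixed overhead, half the volume (variable costs scale with units).
abs_low, var_low, foh_low = unit_costs(100_000, 75_000, 25_000, 100_000, 5_000)
print(abs_low, var_low, foh_low)  # 60.0 40.0 20.0
```

Halving volume leaves the variable unit cost unchanged at 40 but doubles the fixed overhead carried by each unit, which is the volume sensitivity described above.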

Income effects: production vs sales volume

The choice of costing method does not change total cash flows, but it can change the timing of reported profit. Under absorption costing, the fixed overhead tied to unsold units remains in inventory and is not yet expensed. Under variable costing, all fixed manufacturing overhead for the period appears immediately as an expense. As a result, in any period where production exceeds sales, absorption costing will usually show higher profit than variable costing; when production is below sales, the reverse occurs.

A simple reconciliation highlights the mechanism. Define:

- $S$: units sold in the period

- $\Delta I$: change in inventory units (positive if inventory grows)

- $f$: fixed overhead per unit produced

The difference between absorption costing net income ($NI_{\text{abs}}$) and variable costing net income ($NI_{\text{var}}$) in a period is:

$$NI_{\text{abs}} - NI_{\text{var}} = f \times \Delta I$$

When production exceeds sales so that $\Delta I > 0$, fixed overhead is deferred in inventory and $NI_{\text{abs}}$ exceeds $NI_{\text{var}}$. When sales draw down inventory so that $\Delta I < 0$, previously deferred fixed overhead flows to cost of goods sold, making $NI_{\text{abs}}$ lower than $NI_{\text{var}}$. When production equals sales, both methods report the same profit.

This algebra explains why standard-setting bodies still require absorption costing for external reporting but many internal management reports supplement it with variable or contribution costing to show the direct profit impact of volume changes.
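A numerical check makes the reconciliation concrete. The following is a minimal sketch under simplifying assumptions (a single product, no opening inventory, sales not exceeding production, costs incurred exactly as planned); the function name and figures are hypothetical.

```python
def period_income(price, units_sold, units_produced, variable_cost_per_unit,
                  fixed_overhead, selling_admin):
    """Return (absorption income, variable-costing income) for one period.

    Assumes no opening inventory and units_sold <= units_produced.
    """
    revenue = price * units_sold
    foh_per_unit = fixed_overhead / units_produced
    # Absorption: cost of goods sold carries fixed overhead only for units sold.
    cogs_absorption = units_sold * (variable_cost_per_unit + foh_per_unit)
    income_absorption = revenue - cogs_absorption - selling_admin
    # Variable costing: the whole fixed overhead is expensed in the period.
    income_variable = (revenue - units_sold * variable_cost_per_unit
                       - fixed_overhead - selling_admin)
    return income_absorption, income_variable


ni_abs, ni_var = period_income(price=80, units_sold=8_000, units_produced=10_000,
                               variable_cost_per_unit=40, fixed_overhead=100_000,
                               selling_admin=120_000)
inventory_change = 10_000 - 8_000       # units added to inventory
foh_per_unit = 100_000 / 10_000         # fixed overhead per unit produced
print(ni_abs - ni_var)                  # 20000.0
assert ni_abs - ni_var == foh_per_unit * inventory_change
```

The 20,000 gap is exactly the fixed overhead attached to the 2,000 unsold units, which sits in inventory under absorption costing rather than in the period's expenses.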

Practical mechanics: cost pools and allocation bases

The theoretical unit cost formulas mask a significant practical challenge: allocating overhead to products in a way that is both systematic and economically meaningful. In a multi-product plant, overheads are typically collected into cost pools and assigned to products using allocation bases such as machine hours, labour hours, or material quantity.

A typical implementation proceeds in three stages:

- Establish cost pools: group similar overhead costs, for example all machine-related expenses, maintenance, and depreciation into a machinery pool; factory management salaries into a supervision pool.

- Determine usage measures: identify the driver that best reflects how products consume each cost pool, such as machine hours, direct labour hours, or production runs.

- Compute and apply rates: divide each pool by its total driver quantity to obtain a rate (for example, per machine hour), then multiply by each product's usage to assign overhead.

Absorption costing does not prescribe a particular choice of allocation base; the method is an overarching principle that all manufacturing costs should be absorbed by units. The sophistication of the allocation scheme can range from a single plant-wide rate to detailed activity-based costing with many cost pools and drivers.
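The three stages can be sketched in a few lines. The pool names, drivers, products and amounts below are hypothetical, and a real system would typically use budgeted rather than actual driver totals; this is an illustration of the mechanics, not a recommended design.

```python
# Stage 1: hypothetical cost pools, each with a total cost and a chosen driver.
pools = {
    "machinery": {"cost": 300_000, "driver": "machine_hours"},
    "supervision": {"cost": 120_000, "driver": "labour_hours"},
}

# Stage 2: hypothetical driver usage per product for the period.
usage = {
    "product_a": {"machine_hours": 4_000, "labour_hours": 1_000},
    "product_b": {"machine_hours": 1_000, "labour_hours": 3_000},
}


def allocate_overhead(pools, usage):
    """Stage 3: compute a rate per pool and apply it to each product's usage."""
    allocated = {product: 0.0 for product in usage}
    for pool in pools.values():
        driver = pool["driver"]
        total_driver = sum(u[driver] for u in usage.values())
        rate = pool["cost"] / total_driver  # e.g. cost per machine hour
        for product, u in usage.items():
            allocated[product] += rate * u[driver]
    return allocated


print(allocate_overhead(pools, usage))
# {'product_a': 270000.0, 'product_b': 150000.0}
```

With a single plant-wide rate based on machine hours alone, product_a would absorb 80 percent of the 420,000 total (336,000) instead of 270,000, which shows how the choice of base shifts reported product cost.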

Relation to variable costing and contribution analysis

Variable costing strips away the fixed overhead component of unit cost, focusing on the marginal resource consumption of each unit. For internal decision-making, this provides a cleaner view of how additional units affect profit because fixed overhead is held constant. Contribution margin analysis, which subtracts variable costs from sales to show the amount available to cover fixed costs and profit, is built on this variable costing logic.

The key contrast can be summarised conceptually:

- Absorption costing: all manufacturing costs, including fixed overhead, are product costs; inventory includes fixed overhead; external reporting requirement.

- Variable costing: only variable manufacturing costs are product costs; fixed manufacturing overhead is a period cost; used internally for planning, pricing, and performance evaluation.

Managers need both lenses. Absorption costing ensures financial statements comply with standards and reflect the full cost invested in inventory. Variable costing illuminates how decisions about volume, mix, and pricing will change cash profit in the short and medium term.

Major schools of thought and debates

Within managerial accounting, debates around absorption costing centre on three themes: performance measurement, decision relevance and overhead allocation philosophy.

First, performance measurement. Critics argue that tying profit to production volume via overhead absorption can create perverse incentives. Because producing more units spreads fixed overhead over more units, the unit cost falls, cost of goods sold per unit drops, and short-term profit often rises as long as the additional units go into inventory rather than being sold at a loss. This can encourage managers evaluated on absorption-based profit to overproduce relative to demand, leading to excess inventory, storage costs and potential obsolescence.
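The incentive can be seen by holding sales constant and varying only production. The sketch below uses hypothetical figures and assumes no opening inventory and no selling or administrative costs.

```python
def absorption_gross_profit(units_produced, units_sold=8_000, price=80,
                            variable_cost_per_unit=40, fixed_overhead=100_000):
    """Absorption-costing gross profit when units_sold <= units_produced."""
    unit_cost = variable_cost_per_unit + fixed_overhead / units_produced
    return units_sold * (price - unit_cost)


for produced in (8_000, 10_000, 16_000):
    print(produced, absorption_gross_profit(produced))
# 8000 220000.0
# 10000 240000.0
# 16000 270000.0
```

Sales are 8,000 units in every case, yet reported profit rises with production because more of the fixed overhead is parked in unsold inventory. Under variable costing the figure would stay at 220,000 regardless of how many units were built.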

Proponents respond that robust inventory and working capital controls, together with careful use of variable costing and non-financial metrics, can mitigate these incentives while preserving the benefits of full cost information for pricing and long-term investment decisions.

Second, decision relevance. For decisions such as special orders, make-or-buy evaluations, or short-term pricing in the face of spare capacity, the fixed overhead portion of unit cost is sunk in the short run and should not drive the decision. Analysts therefore often ignore the absorbed fixed overhead in unit cost and instead work from variable costs and incremental cash flows. This creates a conceptual split between the "accounting cost" of a unit (including overhead) and the "economic cost" relevant for a particular decision scenario.
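A special-order decision illustrates the split. In this sketch the absorbed unit cost is assumed to be 50 (40 variable plus 10 of fixed overhead), spare capacity is assumed to exist, and the order is assumed not to disturb regular sales; all names and numbers are hypothetical.

```python
def special_order_gain(order_units, offered_price, variable_cost_per_unit,
                       incremental_fixed_cost=0.0):
    """Incremental profit from a one-off order filled from spare capacity."""
    contribution = order_units * (offered_price - variable_cost_per_unit)
    return contribution - incremental_fixed_cost


# The offered price of 46 is below the absorbed unit cost of 50,
# but above the variable cost of 40.
gain = special_order_gain(order_units=1_000, offered_price=46,
                          variable_cost_per_unit=40)
print(gain)  # 6000.0
```

Judged against the absorbed cost, the order appears to lose 4 per unit; judged on incremental cash flows, it adds 6,000 because the fixed overhead is incurred either way.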

Third, overhead allocation philosophy. Traditional absorption costing usually allocates overhead using volume-based drivers like labour or machine hours. As production technologies and product diversity expanded, critics pointed out that such bases can distort product costs: low-volume, complex products may consume disproportionate setup and scheduling resources that do not scale with simple machine hours. Activity-based costing emerged as a refinement, retaining the absorption principle but using multiple cost drivers linked to underlying activities. This evolution reflects a broader debate about whether any allocation of common fixed costs is inherently arbitrary or whether careful design can approximate economic cause-and-effect sufficiently for management use.

Why absorption costing still matters

Despite these criticisms and refinements, absorption costing remains central to financial management for several reasons.

First, it is mandated for external reporting and taxation. Inventory must include an allocation of fixed overhead under accounting standards, which means any manufacturer preparing audited accounts must implement some form of absorption costing. As a result, banks, investors and regulators interpret performance largely through absorption-based statements.

Second, it anchors pricing and profitability analysis in the full cost base. Over time, businesses must recover both variable and fixed manufacturing costs through prices if they are to remain viable. While short-run decisions can legitimately use variable cost information, sustainable pricing strategies need to recognise the burden of capacity costs, which absorption costing surfaces.

Third, it disciplines capacity investment and utilisation decisions. By making fixed overhead visible within unit costs, absorption costing signals when capacity is under-utilised and factory-scale economics are deteriorating. Rising unit costs due to falling volume highlight the financial consequences of excess capacity or lost demand, encouraging rebalancing either through market expansion or capacity reduction.

Finally, it provides a common language for integrating financial control with operational data. Overhead rates per machine hour or per labour hour connect accounting records to shop-floor metrics, enabling cost variance analysis, standard costing systems and budgetary control. Even when management decisions rely on more refined models, the absorption framework underlies many of the control reports they receive.

Contemporary practice and evolving challenges

Modern manufacturing environments pose new challenges for absorption costing. Automation reduces direct labour content and increases capital intensity, weakening the link between simple volume measures and true resource consumption. Multi-site global supply chains complicate the definition of what counts as "manufacturing" overhead for a particular product. Customisation and short product life cycles create more setup and engineering costs, whose allocation may dominate traditional overhead pools.

Practitioners respond by:

- Refining cost pools and drivers, for example separating machine-level overhead, setup costs, quality assurance and engineering support so that each is allocated using an appropriate activity driver.

- Integrating operational systems with costing, using data from production execution and planning systems to update overhead drivers in near real time.

- Running parallel views: one set of absorption-based numbers for external reporting and high-level budgeting, and alternative contribution and activity-based analyses for operational decisions.

Even as digital tools make more sophisticated costing feasible, the fundamental requirement remains: inventory values on the balance sheet and cost of goods sold in the income statement must reflect all manufacturing costs, including an allocation of fixed overhead. Absorption costing provides the conceptual and procedural backbone for meeting that requirement.

Understanding how this method works, where it can mislead, and how it interacts with alternative views such as variable and activity-based costing equips managers, analysts and students to interpret reported margins critically, design better performance measures and make more informed operational and strategic decisions.

‌

Quote: Anthropic - Artificial Intelligence - Recursive Self Improvement

"Claude writes a significant proportion of Anthropic's code. As of May 2026, more than 80% of the code we merge into Anthropic's codebase was authored by Claude. Before Claude Code launched in research preview in February 2025, this number was in the low single digits." - Anthropic - Artificial Intelligence - Recursive Self Improvement

The moment an internal engineering metric flips from human-written to AI-written code marks a structural shift in how complex software systems are built and evolved, not just a productivity bump for individual programmers. It signals that the primary generative force shaping a large codebase has become a model rather than a workforce, and that human engineers are increasingly curators, reviewers, and system designers guiding a non-human author.

In Anthropic's case, that shift is tightly bound to a broader concern: the trajectory from powerful coding assistants to systems that can meaningfully participate in, and eventually drive, the entire AI research and development cycle. When an AI model can write most of the code for its own infrastructure, tools, and scaffolding, the boundary between "AI helps humans build AI" and "AI builds AI" becomes thinner, and the timeline to more thorough forms of recursive self-improvement compresses.

From coding assistant to dominant author

Large language models like Claude were initially introduced as general-purpose assistants: chatbots that could answer questions, draft text, help with documents, and generate basic code. Early coding capabilities looked like autocomplete on steroids: filling in small functions, refactoring snippets, or suggesting tests. In that phase, AI was clearly subordinate to the human developer, integrated into IDEs as a suggestion layer with humans still doing the conceptual work, system design, and most of the implementation.

The internal numbers highlighted by Anthropic indicate that this relationship has inverted in at least one crucial dimension: most merged code is now authored by the model rather than by employees. Human engineers still specify goals, review diffs, and orchestrate work, but the bulk of literal line-by-line code is machine-generated. Independent developers using Claude Code describe a similar workflow: they treat the AI interface almost as the primary editor, with a traditional editor demoted to a verification and correction tool. One typical pattern is to spend most of the time explaining the problem and iterating on plans with the model, then auto-accept its changes, and only afterwards manually review and adjust. That mirrors the internal picture: humans move up a level of abstraction, while the model handles implementation detail at scale.

The key structural consequence is that the constraint on how fast a codebase can change shifts away from human typing speed or individual concentration. Instead, the main bottlenecks become prompt quality, review capacity, testing infrastructure, and organisational willingness to deploy AI-authored changes. Once those guardrails are in place, the marginal cost of asking the AI to implement yet another subsystem approaches the cost of specifying it rather than the cost of building it yourself.

Recursive self-improvement: several distinct mechanisms

The idea of recursive self-improvement (RSI) in AI originally focused on a dramatic scenario: a sufficiently capable system rewrites its own code, becomes smarter, uses that increased intelligence to further rewrite itself, and so on, producing an "intelligence explosion". In more formal discussions, RSI is framed as a process where an AI improves its own ability to improve, potentially leading to superintelligence if the feedback loop is strong enough. For decades this remained hypothetical, because no deployed system could modify its own internals in a reliable, directed way.

Recent work on RSI has clarified that there are at least three separable mechanisms, each with different bottlenecks and risk profiles. First, there is what some researchers call scaffolding-level improvement: you keep the base model weights fixed but wrap the model in better tools, agents, and workflows that make more effective use of its capabilities over time. Coding agents that orchestrate tool calls, decompose tasks into subproblems, and maintain long-lived workspaces fall into this category. The AI does not change itself directly, but the environment around it is iteratively improved, often with heavy AI assistance.

Second, there is improvement of the broader AI research and engineering process. Here, models help design better architectures, tune hyperparameters, automate experiments, and analyse results. The AI is not rewriting its own weights on the fly but is heavily used by human researchers to run more experiments faster, test more ideas, and push the frontier models forward. In effect, the research pipeline that generates new models is being partially automated by prior models, shortening cycle times.

Third, there is the more classical vision of model-internal self-modification: a system that can inspect, reason about, and deliberately rewrite its own internal structure. In the current deep learning paradigm, this would require some combination of advanced mechanistic interpretability and internal training or optimisation loops guided by the model itself. This is the least empirically grounded category today; there are not yet widely documented systems that autonomously edit their own weights in a stable, predictable way in production, without external training pipelines.

Anthropic's published analysis emphasises that the world is beginning to see concrete progress in the first two forms of RSI, while the third remains more speculative but increasingly relevant. The metric that more than four-fifths of merged code comes from Claude is directly relevant to the first two types: scaffolding-level improvement and research-process acceleration. It is not yet full-blown self-modifying AI, but it clearly moves along the continuum from "AI as a tool" to "AI as a primary agent in its own development ecosystem".

What does it mean for AI to "build itself"?

In its report "When AI builds itself", Anthropic defines a future regime in which AI systems can design, implement, and train successor models with minimal human involvement. That scenario includes choosing research directions, generating experimental configurations, running training runs, monitoring results, and iteratively refining architectures, all mediated by models rather than individual researchers. The report stresses that current systems have not yet reached this stage, but the pattern of automation suggests a trajectory that could plausibly converge towards it in the medium term.

Already, tools like Claude Code enable models to handle much of the mundane engineering needed to integrate new components, instrument experiments, and manage evaluation pipelines. For example, a model can generate scripts to launch training runs, write configuration files for different hyperparameter sweeps, produce dashboards for monitoring metrics, and adapt code to new hardware or inference setups. Engineers remain in the loop to approve designs, interpret anomalies, and adjust objectives, but they increasingly operate at the level of specifying desired behaviours and constraints rather than manually wiring every detail.
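To make the kind of routine engineering described here concrete, the sketch below expands a small hyperparameter grid into one configuration per training run. It is a generic illustration using only the Python standard library; the parameter names and values are hypothetical and it does not represent any lab's actual tooling.

```python
import itertools
import json


def sweep_configs(grid):
    """Expand a dict of hyperparameter lists into one config per combination."""
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical search space; names and values are illustrative only.
grid = {
    "learning_rate": [1e-4, 3e-4],
    "batch_size": [32, 64],
    "warmup_steps": [500],
}

configs = list(sweep_configs(grid))
print(len(configs))  # 4
for config in configs:
    print(json.dumps(config))  # one JSON line per run, ready to write to a file
```

Each emitted line could be written to its own configuration file and handed to a launcher script; the human contribution is choosing the grid and deciding what counts as a good result.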

Once the majority of the code surrounding the training and deployment pipeline is generated by models, the human role shifts to defining goals, setting safety criteria, and analysing higher-level trade-offs. The mechanics of "building" (in the sense of constructing new experimental setups, converting research ideas into running code, and instrumenting systems) become heavily AI-mediated. Over time, if models learn from this process (for instance by analysing successful and failed experiments), they can become better at designing and conducting AI research itself.

Strategic and technological tensions

The shift towards AI-written code simultaneously advances capability and heightens safety concerns. On the one hand, organisations that can mobilise models as large-scale coding engines enjoy dramatic efficiency gains. Anthropic and other labs report that a single engineer working with AI can now accomplish several times the output of a solo developer from only a few years ago. Internal numbers cited in commentary around the Anthropic report suggest that in some workflows, one engineer paired with advanced coding models can match the productivity of many engineers without such tools. This is economically attractive and strategically hard to ignore, especially in competitive markets where speed and feature velocity matter.

On the other hand, every additional layer of automation in the AI development pipeline reduces the surface area where humans directly engage with the details of what is being built. If most of the code diff is AI-authored, there is a constant pressure to keep review lightweight enough not to erase the productivity gains. Organisations must decide how much friction to reintroduce via testing, code review, and formal verification to compensate for the opacity and potential brittleness of model-generated software.

There is also a tension between transparency and performance. Coding models are trained on large corpora and fine-tuned for usefulness, but their internal reasoning is not inherently interpretable. When such models are tasked with writing critical infrastructure, especially infrastructure that itself trains or deploys models, the demand for rigorous verification increases. Yet the whole point of using AI at scale is to compress the development cycle; fully auditing every AI-generated line is often infeasible. This pushes teams towards probabilistic assurance: relying on automated tests, static analysis, and spot checks, accepting that some defects or misalignments may slip through.

Anthropic's policy stance reflects this tension. The organisation has publicly advocated for a potential future pause or slowdown in frontier AI development if such a pause can be coordinated and verifiable. At the same time, it continues to deploy tools that significantly accelerate the AI engineering process. The argument is not that acceleration ought to stop now, but that the world should build governance and monitoring infrastructure capable of making a pause credible if systems begin to show signs of more autonomous, less controllable forms of self-improvement.

Debates and objections

There are several lines of scepticism about treating AI-written code as a near-term marker of recursive self-improvement. One objection is that a model generating code on command is still deeply dependent on a human-constructed training pipeline and hardware stack. The AI may write most of the repository, but it does not yet select its own training data, modify its own loss functions, or commission new datacentres. From this perspective, calling such behaviour "self-improvement" risks overstating the level of autonomy.

Another objection focuses on quality. Critics argue that high percentages of AI-written code may reflect a bias towards quantity over robustness. If models can quickly generate large volumes of superficially plausible code, teams may be tempted to merge more, trusting tests and users to uncover issues. This could increase technical debt and vulnerability surfaces, particularly if AI-generated code uses patterns that are less idiomatic or less well understood by the team. In this view, the headline figure of more than four-fifths AI-authored code says more about internal incentives and tooling than about genuine leaps in capability.

A further concern is that the narrative of "AI writing its own code" might be leveraged for competitive signalling or regulatory positioning. Emphasising that models are rapidly approaching self-building status can support calls for stricter regulation, but it can also serve as a way to demonstrate leadership and sophistication in the race for funding and talent. Observers therefore scrutinise such claims, asking how the metric is defined (for example, how attribution between human and AI edits is measured) and what kinds of code are included: core model logic, surrounding infrastructure, or peripheral tools.
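How much the definition matters can be shown with a toy calculation. The sketch below assumes, purely for illustration, that each merged change is tagged with a code category and an author type and is measured in lines added; the records are invented, and nothing here describes how Anthropic actually computes its figure.

```python
from collections import defaultdict

# Hypothetical merged changes: (code category, author type, lines added).
merged = [
    ("infrastructure", "ai", 1_200),
    ("infrastructure", "human", 200),
    ("core_model", "ai", 300),
    ("core_model", "human", 300),
    ("tools", "ai", 900),
    ("tools", "human", 100),
]


def ai_share(changes):
    """Return the overall AI-authored share of lines and the share per category."""
    totals = defaultdict(lambda: {"ai": 0, "human": 0})
    for category, author, lines in changes:
        totals[category][author] += lines
    by_category = {c: t["ai"] / (t["ai"] + t["human"]) for c, t in totals.items()}
    ai_lines = sum(t["ai"] for t in totals.values())
    all_lines = sum(t["ai"] + t["human"] for t in totals.values())
    return ai_lines / all_lines, by_category


overall, by_category = ai_share(merged)
print(overall)                    # 0.8
print(by_category["core_model"])  # 0.5
```

In this invented data the headline share is 80 percent while the share within core model logic is 50 percent, so the same codebase supports quite different readings depending on which categories are counted and whether lines, commits, or reviewed changes are the unit.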

Supporters of the stronger interpretation respond that the exact percentage is less important than the direction of travel and the kinds of tasks being automated. The movement from "AI can write helper scripts" to "AI can build and maintain major production systems" represents a qualitative shift. Moreover, as AI-generated code begins to include experiment orchestration, data processing pipelines, and evaluation harnesses, the model's role in improving subsequent models increases, even if human oversight remains substantial. From this vantage point, the concern is not that current systems are already self-improving in the strongest sense, but that they are laying the groundwork for a regime in which incremental capability increases lead to disproportionate gains in further capability development.

Why it matters beyond software engineering

The implications of AI writing most of the code in a frontier lab extend well beyond the internal life of software teams. One major dimension is economic. If an AI-augmented engineer can do the work of several traditional engineers, the effective labour cost of software development drops sharply. Over a horizon of a few years, this could reshape labour markets, favouring organisations that can most effectively integrate AI into workflows. Entire categories of skilled work (software engineering, research assistance, data analysis, legal drafting) could be automated at a pace that leaves limited time for institutions to adapt.

Another dimension is geopolitical. Access to models capable of acting as high-bandwidth coding engines becomes a strategic asset. States or firms that control such systems can upgrade their digital infrastructure, defence systems, and research capabilities faster than competitors. If recursive self-improvement processes take hold, the gap between leading actors and followers could widen rapidly. This is one reason why some analysts emphasise the risks of concentration of power: if a small number of organisations own the most capable self-improving AI systems, they may acquire outsized influence over economic and political developments.

There is also a safety dimension that goes beyond the immediate risk of buggy code. As AI systems participate more in their own development, misalignments in objectives or reward signals can be compounded. If an AI is tasked with optimising for performance on certain benchmarks, and it also plays a role in designing the evaluation apparatus and experimental setups, it might inadvertently favour changes that make it look better on metrics without improving, or even while degrading, its broader alignment with human values. The more of the research loop is automated, the more important it becomes to design robust, hard-to-game objectives and interpretability tools.

Finally, there is an epistemic dimension. When AI systems write most of the code, run most of the experiments, and summarise most of the results, human understanding of complex software and research landscapes can become indirect. Engineers and scientists may interact primarily with AI-generated abstractions of what is going on. This can be efficient, but it also risks a kind of institutional deskilling: fewer people understand systems end-to-end, making it harder to detect systemic errors, correlated failures, or unanticipated interactions. In high-stakes domains, that loss of deep understanding could itself become a safety hazard.

The emerging role of human engineers

In the near term, the rise of models as dominant code authors does not eliminate the need for human engineers; it changes their role. Reports from practitioners using Claude Code suggest that humans increasingly focus on problem decomposition, specification, and verification. They spend more time writing detailed natural language descriptions of desired behaviour, orchestrating multi-step workflows, and designing tests that capture subtle requirements. They also become stewards of code quality and maintainers of conceptual coherence across rapidly evolving codebases.

This role shift is non-trivial. Writing good prompts or instructions is a skill; designing prompts that anticipate edge cases, security concerns, and performance constraints is even more demanding. Similarly, effective verification under conditions of AI-generated abundance requires new practices: stronger automated test suites, better monitoring, and perhaps new forms of formal methods that are integrated into everyday workflows. Human engineers who adapt to these demands may become more like system architects and editors, curating and refining the work of a powerful but sometimes unreliable assistant.

At the same time, there will likely remain pockets of development where human-written code is preferred or required, especially for safety-critical components, low-level systems programming, or domains where subtle domain knowledge is hard to transmit through prompts alone. The distribution of human effort across a codebase will change: less time on boilerplate and repetitive patterns, more on rare but consequential decision points.

Looking ahead

The internal data showing that an AI system now authors the majority of a leading lab's merged codebase should be understood as a waypoint, not an endpoint. It marks a concrete, measurable point on a curve that leads from basic assistance to deeper forms of recursive self-improvement. The same dynamics that allow models to dominate code authoring (scaling, better scaffolding, agentic tools, and integration into research workflows) are also those that will shape how quickly AI systems begin to design and build their successors with decreasing human input.

Whether this trajectory culminates in controllable, beneficial systems or in hard-to-govern, rapidly self-improving agents will depend on decisions being made now: how much autonomy to grant coding models, what review standards to enforce, how to design incentives for safety rather than pure speed, and what international coordination mechanisms to build in anticipation of more powerful recursive self-improvement. As the proportion of AI-written code grows, so too does the responsibility to align not just the models, but the socio-technical systems that surround them.

"Claude writes a significant proportion of Anthropic’s code. As of May 2026, more than 80% of the code we merge into Anthropic’s codebase was authored by Claude. Before Claude Code launched in research preview in February 2025, this number was in the low single digits." - Quote: Anthropic - Artificial Intelligence - Recursive Self Improvement


Strategy Tool: Rethinking SWOT analysis in the context of AI

AI-SWOT reframes classic SWOT for the AI era by treating AI as a strategic amplifier and mitigator, not just another technology. It shows how AI can dramatically amplify existing strengths and opportunities (through scale, speed, data flywheels and new business models) while mitigating key weaknesses and threats (by closing capacity gaps, enhancing risk detection and building early-warning systems). The tool introduces a structured, workshop-ready process that walks leaders through: (1) identifying where AI can turn genuine strengths into durable moats, (2) using AI to unlock or accelerate external opportunities, (3) targeting AI at the specific weaknesses that drive competitive loss, (4) deploying AI to detect and neutralise emerging threats, especially in the WT quadrant, and (5) recognising AI itself as a new category of threat via competitor amplification and low-barrier new entrants. Packed with contemporary case studies (Nike, Amazon, Netflix, Klarna, JPMorgan, Siemens, boutiques vs. global firms), diagnostic questions, and stepped tasks, AI-SWOT gives executives a practical, evidence-based way to convert AI from a generic initiative into a focused, advantage-creating strategy tool.


Term: Lean manufacturing

"Lean manufacturing is a production methodology that maximises productivity while systematically minimising waste. The core philosophy is to eliminate any step or resource that does not add value to the end customer, ultimately delivering higher quality products at a lower cost and in less time." - Lean manufacturing

Pressure to deliver higher quality at lower cost in shorter lead times has forced production systems to confront a fundamental constraint: every extra handoff, queue, batch, and defect consumes scarce capital, time, and human attention that could be redeployed to value-creating work instead. The practical challenge is to design operations so that resources follow customer value, not historical habits or departmental silos.

From traditional mass production to Lean thinking

Conventional mass production systems typically optimise for equipment utilisation and large batches, relying on forecasts to justify high inventory levels and long campaigns on each machine. This can mask deep inefficiencies: products sitting in warehouses, operators waiting for upstream processes, and entire batches scrapped due to a single defect discovered late in the sequence. By contrast, Lean reorganises the same resources around responsiveness and waste reduction, often revealing that much of the apparent "efficiency" of mass production comes from pushing hidden costs downstream.

Historically, this shift was crystallised in the Toyota Production System, which combined just-in-time supply, rapid problem detection, and worker-led improvement to meet diverse demand with limited capital after the Second World War. Over time this approach was abstracted into a general management system applied not only in automotive plants but also in electronics, pharmaceuticals, logistics, and even healthcare. The central practical implication is that processes are redesigned so that only customer-valued work survives and everything else is questioned.

Waste as the central diagnostic lens

The mechanism that links everyday operations to strategic performance is the disciplined identification and removal of waste, broadly defined as any activity consuming resources without changing the product in a way the customer would pay for. Classic Lean practice categorises waste into recurring patterns such as overproduction, waiting, unnecessary transportation, excess inventory, overprocessing, defects, and underutilised human skills. Each of these patterns translates directly into slower response, higher cost, and reduced quality.

For example, producing far ahead of demand inflates inventory and ties up working capital, yet does nothing to improve the customer experience if specifications or preferences change in the meantime. Similarly, complex approval layers or redundant inspections can create overprocessing, where work is done repeatedly to compensate for unstable upstream processes rather than stabilising those processes in the first place. By repeatedly asking whether a given step adds value from the customer's perspective, Lean teams progressively strip away these non-essentials.

The five core principles and their operational meaning

Various authors distil Lean into five interlocking principles: value, value stream, flow, pull, and perfection. These are less an abstract philosophy than a practical roadmap for redesigning production.

1. Value as defined by the customer

Value is specified in terms of the customer's needs, not the producer's convenience. This includes the product's features and performance, but also delivery reliability, lead time, and total cost of ownership. When organisations misjudge value, they often invest heavily in features or internal metrics (such as machine utilisation) that the customer neither notices nor rewards, while neglecting speed, consistency, or service.

In practice, value clarification requires structured dialogue with customers, analysis of complaints and returns, and often cross-functional teams responsible for a product across its lifecycle. Once value is properly defined, it becomes the reference for deciding which process steps are essential and which are candidates for elimination or redesign.

2. Mapping the value stream

The value stream comprises all actions required to bring a product from concept to launch and from raw material to finished good in the customer's hands. Value stream mapping makes these flows visible, quantifying process times, waiting times, inventories, and information flows so that waste becomes explicit.

Teams often discover that only a small fraction of end-to-end lead time is spent in true value-adding work, with the remainder trapped in queues, approvals, and rework. This diagnosis leads to targeted interventions: removing redundant inspections, simplifying routings, co-locating dependent operations, or redesigning products to reduce variation and setup complexity.
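The point that only a small fraction of lead time is value-adding can be made concrete with a short calculation. The sketch below is illustrative only: the step names and times are invented, and the ratio it reports (value-adding time divided by total lead time, often called process cycle efficiency) is a common summary metric rather than something prescribed by the text above.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One step on a value stream map (all times in minutes)."""
    name: str
    process_time: float  # time spent actually transforming the product
    wait_time: float     # queue, approval, or transport delay before the step


def flow_summary(steps):
    """Return total lead time, value-adding time, and their ratio."""
    value_added = sum(s.process_time for s in steps)
    waiting = sum(s.wait_time for s in steps)
    lead_time = value_added + waiting
    return {
        "lead_time": lead_time,
        "value_added": value_added,
        "efficiency": value_added / lead_time if lead_time else 0.0,
    }


# Hypothetical three-step stream; the numbers are made up for illustration.
stream = [
    Step("cut", process_time=5, wait_time=240),
    Step("weld", process_time=12, wait_time=480),
    Step("paint", process_time=8, wait_time=255),
]
summary = flow_summary(stream)
print(f"Lead time: {summary['lead_time']} min, "
      f"value-adding share: {summary['efficiency']:.1%}")
```

With these made-up figures, 25 of 1,000 minutes are value-adding, so the waiting time before each step is the obvious place to look for waste.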

3. Creating continuous flow

Flow aims to ensure that once work starts on a unit, it moves without interruption through successive value-creating steps. Instead of large batches moving sporadically between functional departments, Lean systems favour smaller lot sizes, balanced work content, and cell layouts that physically bring sequential tasks closer together.

When flow improves, several effects follow: lead times shrink, defects are detected earlier, inventory falls, and planning becomes simpler because work-in-progress is more predictable. Achieving this state often requires technical interventions, such as reducing changeover times using Single-Minute Exchange of Die (SMED), introducing standard work to stabilise cycle time, and redesigning equipment layouts to minimise transport and handling.

4. Pull-based production

Pull systems authorise production based on actual downstream consumption rather than forecasted demand, thereby aligning output with real customer needs. Techniques such as Kanban employ visual signals (cards, bins, electronic triggers) to initiate replenishment only when a defined quantity has been used.

This approach directly attacks overproduction and excess inventory, which are often the largest sources of waste in traditional plants. However, pull relies on underlying stability: reliable machines, disciplined standard work, and responsive suppliers are prerequisites for responding quickly to consumption signals without resorting to large safety stocks.
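The pull logic can be sketched as a toy simulation of a single Kanban loop. It is a simplified illustration under stated assumptions, not a description of any particular plant: stock is counted in whole containers, replenishment happens within the same period as consumption, and the demand series, card count, and capacity figure are all hypothetical.

```python
def simulate_pull(demand_per_period, cards, capacity_per_period):
    """Simulate one Kanban loop, counting stock in containers.

    Every container in stock carries one card. Consuming a container frees
    its card, and production may refill only as many containers as there
    are free cards, subject to the capacity limit.
    """
    stock = cards  # start full: every card is attached to a container
    history = []
    for demand in demand_per_period:
        shipped = min(demand, stock)      # cannot ship more than is on hand
        stock -= shipped
        free_cards = cards - stock        # cards released by consumption
        produced = min(free_cards, capacity_per_period)
        stock += produced
        history.append({"demand": demand, "shipped": shipped,
                        "produced": produced, "stock": stock})
    return history


# Hypothetical demand series with one spike in the third period.
demand = [2, 0, 5, 1, 0, 0, 3]
for row in simulate_pull(demand, cards=4, capacity_per_period=2):
    print(row)
```

Two features of the output mirror the argument: stock never exceeds the number of cards, and nothing is produced in periods without consumption. The demand spike also cannot be fully served, which is the toy version of the point that pull depends on stable processes and adequate capacity.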

5. Pursuing perfection through continuous improvement

Perfection in this context means an ever-closer alignment between processes and customer value, with fewer steps, shorter times, and lower cost. Because markets, technologies, and product portfolios evolve, Lean treats improvement as ongoing work rather than a one-off project, embedding structured problem-solving (often under the label of Kaizen) into daily operations.

Empowering operators to stop a process when abnormalities occur, supported by visual controls and root cause analysis, shifts focus from firefighting symptoms to eliminating underlying causes. Over years, this accumulation of small changes can transform cost structures and quality levels more effectively than sporadic capital-intensive upgrades.

Lean and the quantitative view of production performance

While Lean is frequently presented qualitatively, its impact can be expressed using simple performance relationships. Consider a production line where throughput $TH$ depends on effective operating time $T_{\text{op}}$ and cycle time per unit $t_c$ via $TH = T_{\text{op}} / t_c$. Reducing changeover losses, unplanned downtime, and rework increases $T_{\text{op}}$, while standard work, layout improvements, and defect prevention can reduce $t_c$; Lean attacks both sides of this relationship through waste elimination.

Inventory dynamics can also be framed mathematically. If average work-in-progress is $WIP$, throughput is $TH$, and average lead time is $LT$, then Little's Law gives $WIP = TH \times LT$. Lean interventions that smooth flow and reduce waiting lower $LT$; if throughput is maintained, work-in-progress must fall accordingly, releasing space and working capital. Pull systems in particular are designed to cap $WIP$ by limiting the number of Kanban signals in circulation.

Quality improvements can be connected to cost by considering the defect rate $d$ (defects per period) and cost per defect $c_d$. The expected cost of defects per period is $d \cdot c_d$. By tackling root causes of defects, Lean reduces $d$ and often $c_d$ as well, because problems are caught earlier when rework is cheaper. These simple relationships make it possible to quantify the economic contribution of Lean projects and prioritise efforts.
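The three relationships in this section (throughput as operating time over cycle time, Little's Law, and expected defect cost) can be combined in a short before-and-after comparison. The sketch below uses entirely hypothetical figures for one shift, and it assumes the number of defects per period can be approximated as throughput multiplied by a defect share, which is a simplification introduced here.

```python
def throughput(operating_time, cycle_time):
    """Units per period = effective operating time / cycle time per unit."""
    return operating_time / cycle_time


def wip_from_littles_law(throughput_rate, lead_time):
    """Little's Law: average work-in-progress = throughput x lead time."""
    return throughput_rate * lead_time


def defect_cost(defects_per_period, cost_per_defect):
    """Expected defect cost per period."""
    return defects_per_period * cost_per_defect


def evaluate(state):
    """Return (throughput, WIP, defect cost) for one hypothetical line state."""
    th = throughput(state["operating_min"], state["cycle_min"])
    wip = wip_from_littles_law(th, state["lead_time_shifts"])
    # Assumption: defects per shift = throughput x share of defective units.
    cost = defect_cost(th * state["defect_share"], state["cost_per_defect"])
    return th, wip, cost


# Hypothetical figures for one shift, before and after a Lean programme.
before = {"operating_min": 420, "cycle_min": 2.0, "lead_time_shifts": 3.0,
          "defect_share": 0.04, "cost_per_defect": 50.0}
after = {"operating_min": 450, "cycle_min": 1.8, "lead_time_shifts": 1.5,
         "defect_share": 0.02, "cost_per_defect": 30.0}

for label, state in (("before", before), ("after", after)):
    th, wip, cost = evaluate(state)
    print(f"{label}: throughput {th:.0f} units/shift, "
          f"WIP {wip:.0f} units, defect cost {cost:.0f} per shift")
```

In this made-up case throughput rises while work-in-progress and defect cost both fall, which is the pattern the relationships predict when waste is removed on several fronts at once.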

Key tools and practices that operationalise Lean

Beyond principles and equations, Lean is expressed in a toolkit of methods that embed waste-conscious thinking into daily operations. Value stream mapping visualises material and information flows, highlighting bottlenecks, inventories, and rework loops. 5S workplace organisation arranges tools and materials for clarity and cleanliness, reducing motion and errors while supporting safety.

Kanban systems control replenishment of components and work-in-progress via clearly defined signal limits, preventing uncontrolled build-up of inventory. Standard work defines the best-known sequence, timing, and expected outcomes for each task, providing a stable baseline from which improvements can be made. SMED techniques shorten changeovers by separating internal and external activities and simplifying tooling and fixtures, enabling smaller batches and more responsive scheduling.

These tools are often supported by digital systems (such as production monitoring, advanced planning and scheduling, and inventory management software) that provide real-time data to sustain Lean decisions. However, Lean emphasises that technology should reinforce clear processes and problem-solving discipline rather than substitute for them.

Benefits and trade-offs in practice

Well-executed Lean programmes typically report higher productivity, reduced lead times, lower inventory, and better quality. Examples include freeing up floor space as work-in-progress falls, lowering logistics costs due to more predictable flows, and achieving shorter order-to-delivery times that allow firms to win business on responsiveness. Many organisations also see improvements in safety and employee engagement because processes become more orderly and frontline ideas are actively sought.

Yet these gains come with trade-offs and risks. Aggressive inventory reduction without robust process capability can leave plants vulnerable to supply disruptions or equipment failures. Overemphasis on eliminating variation may clash with the need for flexibility in highly customised or uncertain environments. In some cases, poorly implemented Lean programmes have been criticised as cost-cutting exercises dressed in new language, leading to workforce distrust when headcount reductions are framed as "waste elimination" rather than redeployment into higher-value work.

Sustaining benefits therefore requires governance mechanisms that balance efficiency with resilience: carefully chosen safety stocks, dual sourcing for critical materials, preventive maintenance programmes, and scenario planning for demand surges or supply shocks. The strategic question is not simply how lean a system can become, but how to set waste and buffer levels compatible with the organisation's risk appetite and market position.

Lean in a broader operations and supply chain context

As supply chains globalised, Lean principles extended beyond individual factories to logistics, procurement, and distribution networks. Optimising flow now involves synchronising suppliers, contract manufacturers, and logistics providers so that materials move smoothly from source to end customer. This requires data-driven demand planning, real-time visibility of inventories, and collaborative problem-solving across organisational boundaries.

Within this extended context, Lean intersects with other methodologies. Six Sigma's statistical focus on variation reduction complements Lean's emphasis on flow, leading many firms to adopt integrated Lean Six Sigma frameworks. Agile product development, with its short iterations and customer feedback loops, echoes Lean's insistence on value and adaptation, especially in environments of high uncertainty. Digital technologies (such as sensor-equipped equipment, analytics platforms, and automated material handling) can further amplify Lean's aims when used to stabilise processes and expose waste.

Ongoing debates and why Lean still matters

Contemporary debates centre on the robustness of Lean systems in the face of external shocks and long, complex supply chains. Just-in-time practices were scrutinised during periods of global disruption when shortages of critical components halted entire production lines. Critics argued that relentless pressure to minimise inventory had removed valuable resilience. Proponents countered that the problem lay in applying Lean simplistically, without adequate risk assessment, diversification, or strategic buffers.

Another tension concerns the human dimension. Lean's success depends on engaged workers empowered to identify problems and suggest improvements, yet implementations driven solely from the top can feel like cost reduction programmes imposed on staff. Reconciling these perspectives requires transparent communication about goals, genuine investment in training, and mechanisms that ensure productivity gains translate into better work rather than just cuts.

Despite these controversies, the underlying logic remains powerful: resources are finite, customer expectations for speed and quality continue to rise, and environmental constraints make waste in all forms increasingly untenable. Organisations that systematically align processes with value, expose and remove waste, and cultivate a culture of continuous improvement are better positioned to adapt to new technologies, regulatory pressures, and market shifts.

Lean manufacturing therefore still matters not as a fixed toolkit from a particular era but as a way of structuring operational thinking around value, flow, and learning. In a world where competitive advantage is often determined by how effectively companies convert ideas, materials, and information into reliable outcomes for customers, the disciplined pursuit of waste-free processes remains a central strategic concern.


Quote: "Where is AI in GDP statistics?"

"Imperfect measures of AI productive capacity are far more informative than the implicit assumption embedded in conventional projections - that the AI sector's productive capacity is small and slow-growing. Fiscal authorities could use such measures to stress test projections about the labor tax base; central banks could..." - "Where is AI in GDP statistics?" - May 2026 - Anton Korinek (PIIE) and Patrick McKelvey (Bank of Canada)

Macroeconomic policy is being steered by models that quietly embed an assumption about artificial intelligence: that the sector is economically small, its capacity expands slowly, and its contribution to the tax base and inflation dynamics will remain marginal for years to come. In parallel, AI investment, compute capacity, and quality-adjusted output have begun to grow at extraordinary rates that are largely invisible to the national accounts used by fiscal authorities and central banks. The underlying tension is not just one of measurement technique but of strategic blindness: policy frameworks calibrated to a pre-AI economy are extrapolating forward as though the production frontier itself were not being shifted by an emerging general-purpose technology.

The core issue is a widening gap between the productive capacity of the AI sector and its measured footprint in GDP statistics. When quality-adjusted AI output grows at rates that would be implausible for any traditional industry, this is not a minor statistical curiosity; it is a signal that the informational content of standard projections is degrading. Legislatures drafting medium-term budget frameworks, and central banks publishing fan charts for growth and inflation, are implicitly conditioning on a world in which the AI production function is both small in scale and smooth in its evolution. If that premise is wrong, the entire configuration of projected labour income, tax receipts, and output gaps may be systematically biased.

The factual backdrop: explosive AI output, invisible in GDP

Over the past few years, high-end estimates of AI sector activity in the United States have suggested nominal output on the order of USD 250 billion in 2025, comparable to the scheduled passenger airline industry. More striking than the level is the growth rate of quality-adjusted AI output. By treating AI as a coherent production sector and adjusting for improvements in model quality at fixed prices, Korinek and McKelvey estimate that quality-adjusted AI production expanded at more than 2 000 percent per year in 2024 and 2025. These numbers are driven by three compounding forces: rapid expansion of data-centre and compute capacity, hardware efficiency gains, and algorithmic progress that dramatically improves output per unit of hardware.

National statistics offices, however, were not designed to track such activity. Conventional GDP accounting captures the AI boom only indirectly: as investment in structures and equipment, as intermediate inputs to other industries, and as service purchases by firms and consumers. Many of the most important gains show up as quality improvements or consumer surplus rather than observed market transactions. The result is that the data streams feeding macroeconomic models depict an economy with modest technology-driven productivity improvements, even as AI developers scale capacity in ways that historically have been associated with major general-purpose technological shifts.

This disconnect is why the authors argue for an "AI GDP" framework and satellite accounts that explicitly measure AI production and capacity. Their empirical work shows that once AI is treated as a distinct sector with its own capital stock, intermediate inputs, and quality-adjusted output, the growth dynamics look radically different from the rest of the economy. For policymakers, the lesson is not that headline GDP should be replaced, but that relying on projections which implicitly assume a small, slow-moving AI sector is no longer tenable.

Productive capacity versus realised output

The statement about "imperfect measures of AI productive capacity" turns on a crucial distinction between two concepts that macroeconomic models often conflate when technologies are stable: productive capacity and realised output. Productive capacity refers to what the AI sector could produce at current prices and technology if it were fully utilised, given existing compute stock, model architectures, and available data. Realised output is what is actually being produced and sold at a point in time, which depends on demand, regulatory constraints, infrastructure bottlenecks, and organisational readiness across the wider economy.

In conventional macroeconomics, realised output $Y$ is typically modelled relative to a potential output $Y^*$, with an output gap $(Y - Y^*)/Y^*$. For most sectors, capacity grows relatively smoothly, and potential output is estimated using trend filters or production functions with modest capital-deepening and productivity terms. The implicit assumption in many forecasting frameworks is that the AI sector contributes only a small increment to aggregate $Y^*$, so that treating capacity as a smooth extrapolation of past trends is adequate.

Once AI capacity begins to grow at rates exceeding 2 000 percent in quality-adjusted terms, that assumption breaks down. Even if only a fraction of that capacity is deployed into new products, automation tools, and complementary capital, the path of potential output could deviate markedly from trend. A production function that includes AI capital alongside traditional physical and human capital may need to be written as something like $Y^* = A \cdot F(K, H, K_{AI})$, where $K_{AI}$ is growing at extraordinary rates and $A$ itself partly reflects AI-driven spillovers. Ignoring this term or extrapolating it linearly is no longer a neutral simplification.

This is why even imperfect estimates of AI capacity can be more informative than implicitly assuming capacity is trivial. An imperfect measure at least anchors projections to a dynamic that recognises the scale and direction of change. In contrast, a baseline that effectively sets $K_{AI} = 0$ or grows it as a modest share of aggregate capital builds in a structural misrepresentation of the economy's production frontier.
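A toy projection shows why the treatment of AI capital matters for potential output and for the output gap read off it. Everything in the sketch is an assumption made for illustration: the Cobb-Douglas-style functional form, the AI-capital exponent `gamma`, the 2 percent trend, and the AI-capital growth rate are hypothetical and are not estimates from Korinek and McKelvey.

```python
def potential_path(years, trend_growth, ai_capital_growth, gamma):
    """Potential output index (year 0 = 1.0).

    Assumed form: Y* = (non-AI trend) x K_AI ** gamma, where the non-AI
    trend stands in for TFP and conventional capital and labour.
    """
    path = []
    for t in range(years + 1):
        non_ai = (1 + trend_growth) ** t
        ai_capital = (1 + ai_capital_growth) ** t
        path.append(non_ai * ai_capital ** gamma)
    return path


def output_gap(actual, potential):
    """Output gap as a share of potential output."""
    return (actual - potential) / potential


# Baseline ignores AI-capital growth; the scenario lets it double each year.
baseline = potential_path(5, trend_growth=0.02, ai_capital_growth=0.0, gamma=0.03)
scenario = potential_path(5, trend_growth=0.02, ai_capital_growth=1.0, gamma=0.03)

observed = baseline[5]  # suppose measured output simply follows the old trend
print(f"gap vs baseline potential: {output_gap(observed, baseline[5]):.1%}")
print(f"gap vs AI-adjusted potential: {output_gap(observed, scenario[5]):.1%}")
```

The same observed output looks like a closed gap against the baseline and like meaningful slack against the AI-adjusted path. The size of the difference depends entirely on the assumed parameters, which is the reason for making them explicit and open to revision.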

From measurement gap to policy gap

If official statistics understate the growth of AI productive capacity, a policy gap follows. Fiscal and monetary authorities are tasked with stabilising the economy, financing public goods, and safeguarding financial stability in the face of shocks. Their tools and frameworks are calibrated around relationships between output, employment, inflation, and asset prices that assume gradual technological progress. When a technology arrives that can simultaneously automate cognitive tasks, create new service categories, and compress the time needed to design and deploy software, those relationships become unstable.

One channel is aggregate supply. Suppose AI diffusion accelerates between 2026 and 2030, with AI-enhanced processes raising effective labour productivity in certain sectors by large multiples. If models underestimate the expansion of productive capacity, central banks may misinterpret disinflationary pressures as evidence of weak demand rather than a positive supply shock, potentially leading to policy that is too accommodative or too tight depending on the sign of the misreading. A parallel risk exists on the fiscal side: if projected tax bases are derived from historical elasticities of labour income to GDP, they may fail to account for a shift in value creation from wages to AI-mediated capital income.

Financial stability is another concern. Massive investment in data centres, high-end chips, and AI-native firms is expanding the AI capital stock in ways that could resemble past investment booms. Without explicit measures of sectoral productive capacity and utilisation, regulators may struggle to gauge whether valuations reflect reasonable expectations of future cash flows or a speculative overshoot. Imperfect but transparent measures of AI capacity would allow stress tests to incorporate scenarios in which utilisation stalls, regulatory constraints bite, or technical progress slows, affecting both earnings and collateral values.

Stress testing the labour tax base

The quote points explicitly to one of the most immediate fiscal applications: stress testing projections for the labour tax base. Tax systems in advanced economies rely heavily on taxes on labour and consumption, with labour often providing between 40 and 60 percent of total revenue when payroll and personal income taxes are combined. If AI capacity enables rapid automation of tasks, especially in high-wage professions, the composition of tax bases could shift towards capital income and rents linked to data, intellectual property, and platform control.

Imperfect measures of AI capacity can inform scenario analysis even before comprehensive AI satellite accounts exist. Consider a simple mapping from AI capacity to potential labour displacement: if AI-driven tools can, at full deployment, perform a fraction $\phi$ of tasks currently performed by workers in certain occupations, and if the effective AI capacity index $C_{AI}$ is growing at an exponential rate, then plausible stress scenarios can be constructed around the trajectories of $C_{AI}$ relative to current labour inputs. Fiscal authorities can then simulate paths in which the labour share of income declines by, say, 5 to 15 percentage points over one or two decades, and examine the consequences for personal income tax and social insurance contributions.

Such stress tests do not require precise predictions about which jobs will be automated in which year. They require a disciplined way of linking the growth of AI capacity to enveloping ranges of labour income outcomes. Even if the underlying AI capacity index is built from noisy proxies (data-centre investment, GPU shipments, estimated algorithmic efficiency gains, and model deployment metrics), its imperfections are transparent and can be bracketed with sensitivity analysis. That is more informative than assuming, as many baseline projections still do, that labour's share of income and the elasticity of taxable wages to GDP will remain approximately constant.
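A stress test of this kind can be sketched in a few lines. The example is deliberately simplified and rests on stated assumptions: the decline in the labour share is taken directly as a scenario input rather than derived from an AI capacity index, the decline is linear, and the starting labour share (60 percent), effective tax rate (30 percent), and GDP growth (2 percent) are hypothetical placeholders.

```python
def labour_tax_path(years, labour_share_start, decline_pp, tax_rate,
                    gdp_growth=0.02):
    """Labour tax revenue index over time (GDP in year 0 = 1.0).

    The labour share falls linearly by `decline_pp` percentage points over
    the horizon; revenue = GDP x labour share x effective tax rate.
    """
    path = []
    for t in range(years + 1):
        share = labour_share_start - (decline_pp / 100) * t / years
        gdp = (1 + gdp_growth) ** t
        path.append(gdp * share * tax_rate)
    return path


horizon = 20
constant_share = labour_tax_path(horizon, 0.60, 0, 0.30)

# Scenario range taken from the 5 to 15 percentage point band in the text.
for decline in (5, 10, 15):
    stressed = labour_tax_path(horizon, 0.60, decline, 0.30)
    shortfall = stressed[-1] / constant_share[-1] - 1
    print(f"labour share -{decline} pp over {horizon} years: "
          f"labour tax revenue {shortfall:.1%} vs constant-share baseline")
```

Because the baseline and the stressed paths share the same GDP assumption, the shortfall isolates the effect of the shifting income composition. A fuller exercise would also model how much of the lost revenue is recovered through taxes on capital income.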

Central banks and AI-adjusted output gaps

Central banks face a different but related challenge. Standard New Keynesian frameworks rely on estimates of potential output and output gaps to guide interest rate policy. When AI capacity increases rapidly, the shape of potential output becomes more uncertain. If AI raises trend productivity growth, then what appears as cyclical weakness might actually be a benign reflection of the economy adjusting to a higher productivity path. Conversely, if AI-driven sectoral shifts create pockets of structural unemployment, traditional Phillips curve relationships between slack and inflation may weaken.

Incorporating AI capacity measures into monetary policy models could take several forms. One is to extend production functions to include AI capital explicitly, with separate utilisation rates for that capital. Another is to augment the information set used for estimating potential output with AI-specific indicators, treating them as leading signals of future supply shifts. Even a rudimentary AI capacity index (constructed from investment, compute, and benchmark performance measures) could help central banks distinguish between inflation dynamics driven by demand fluctuations and those driven by AI-enabled supply changes.
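One way such a rudimentary index might be assembled is as a weighted geometric mean of proxy series, each rebased to its first observation. This is a sketch of one possible construction, not the method used by Korinek and McKelvey: the proxy names, the data, and the weights below are all hypothetical.

```python
import math


def capacity_index(series, weights):
    """Weighted geometric mean of proxies, rebased to 100 in the first period.

    `series` maps a proxy name to a list of positive observations;
    `weights` maps the same names to weights (assumed to sum to 1).
    """
    periods = len(next(iter(series.values())))
    index = []
    for t in range(periods):
        log_sum = sum(
            weight * math.log(series[name][t] / series[name][0])
            for name, weight in weights.items()
        )
        index.append(100 * math.exp(log_sum))
    return index


# Hypothetical annual proxies in arbitrary units.
proxies = {
    "investment": [100, 160, 250],
    "compute": [1.0, 3.0, 9.0],
    "benchmark_score": [40, 55, 70],
}
weights = {"investment": 0.3, "compute": 0.5, "benchmark_score": 0.2}

for year, value in enumerate(capacity_index(proxies, weights)):
    print(f"year {year}: index {value:.0f}")
```

A geometric mean is used here because the proxies grow multiplicatively and sit on very different scales. Changing the weights is a simple way to run the sensitivity analysis that a noisy index of this kind calls for.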

This matters for interest rate paths and communication strategies. If AI capacity is expected to unleash significant deflationary pressure in certain sectors while boosting demand for complementary skills and capital elsewhere, central banks must decide how to respond to a more uneven and possibly more volatile pattern of relative price changes. Failing to recognise AI as a material driver of potential output and productivity risks miscalibrating both policy stance and forward guidance.

The strategic tension: ignorance versus imperfect information

The phrase "imperfect measures" acknowledges that any attempt to quantify AI productive capacity at this stage will be fraught with conceptual difficulties. Where exactly should the boundary of the AI sector be drawn: around foundation model developers only, or also around downstream firms building domain-specific applications? How should quality be adjusted when models differ along dimensions that are difficult to aggregate? How should non-market outputs, such as open-source models and freely available tools, be treated?

Yet the alternative is not a world of perfect accuracy; it is a world of structurally embedded ignorance. When conventional projections assume that AI capacity is small and slow-growing, they effectively fix technology parameters that may in fact be changing rapidly. The strategic choice is between embracing a noisy, revisable set of AI-specific metrics or relying on models that treat a potentially transformative technology as a footnote. Korinek and McKelvey argue that the former is superior precisely because it allows policymaking to be conditioned on explicit assumptions that can be scrutinised, updated, and stress-tested.

This is analogous to the evolution of macro-financial surveillance after the global financial crisis. Before 2008, many macro models either omitted financial frictions or represented them in highly stylised ways, effectively assuming that the financial sector's capacity to generate credit and risk was constrained and well-behaved. Post-crisis, central banks and international institutions built macro-prudential frameworks, stress testing regimes, and detailed sectoral accounts to monitor systemic risks. These tools are imperfect by design, but they are grounded in an explicit recognition that ignoring financial capacity dynamics is unacceptable. AI capacity measurement occupies a similar conceptual role for the production side of the economy.

Debates and objections

There are, however, serious debates around the measurement approach and its policy uses. One line of criticism questions whether quality-adjusted AI output growth figures in the 2,000 to 2,600 percent range are economically meaningful. Sceptics argue that adjusting for model capabilities at fixed prices may overstate the contribution to welfare and productivity if users' willingness to pay does not rise in proportion to benchmark scores. They caution that capacity measures built on technical performance metrics risk becoming detached from the pace of real-world diffusion, organisational change, and complementary investment.

Another objection concerns the mapping from sectoral AI capacity to aggregate outcomes. Critics note that productive capacity in the AI sector does not automatically translate into realised productivity gains across the economy. Bottlenecks in regulation, trust, data access, and skills could delay deployment for years. From this perspective, the danger is not that conventional projections underestimate AI's impact but that they might overreact to capacity signals that are only slowly realised in output and employment.

These critiques underscore the need to treat AI capacity measures as inputs to scenario analysis rather than as point forecasts. Imperfect measures can still be used to generate bounded scenarios: a low-deployment path in which only a small share of capacity is applied to economically significant tasks, a central path with gradual diffusion, and high-deployment paths in which adoption accelerates non-linearly. Fiscal and monetary authorities can then design policies that are robust across these scenarios rather than optimised for a single assumed trajectory.
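
The three kinds of deployment path described above can be sketched in a few lines of Python. This is a toy illustration only: the capacity growth rate, the deployment shares, and the function name `realised_ai_output` are hypothetical values and names chosen for the example, not estimates from any study.

```python
def realised_ai_output(capacity_growth=0.5, years=5):
    """Return realised AI output by scenario and year.

    The capacity index starts at 1.0 and grows at `capacity_growth` per year
    (a hypothetical rate). Realised output is capacity multiplied by the share
    of capacity applied to economically significant tasks.
    """
    # Hypothetical deployment shares for each scenario, as a function of year t.
    deployment_share = {
        "low": lambda t: 0.05,                           # share stays small
        "central": lambda t: min(1.0, 0.05 + 0.05 * t),  # gradual, linear diffusion
        "high": lambda t: min(1.0, 0.05 * 2 ** t),       # non-linear acceleration
    }
    paths = {}
    for name, share in deployment_share.items():
        paths[name] = [
            (1.0 + capacity_growth) ** t * share(t) for t in range(years)
        ]
    return paths


paths = realised_ai_output()
for name, path in paths.items():
    print(name, [round(x, 3) for x in path])
```

A policy that performs acceptably across the whole envelope between the low and high paths is robust in the sense used here; the widening spread between the paths over time is the quantity a stress test would bracket.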

Why the measurement choice matters now

The timing of this measurement agenda is not incidental. If AI capacity continues to expand at recent rates, the gap between what AI could do and what it is currently doing will grow rapidly. That capacity-realisation gap carries both upside and downside risks. On the upside, if deployment accelerates, economies could experience a wave of productivity growth that eases fiscal pressures and raises living standards. On the downside, if deployment is uneven or concentrated in ways that displace labour without adequate redistribution, the tax base could become more volatile and more reliant on capital taxation, wealth taxes, or new instruments targeted at AI-intensive firms.

Policymakers therefore face interlocking strategic questions. How should social insurance systems and tax codes be redesigned to remain solvent if labour income becomes a less reliable base? What mix of labour, consumption, and capital taxation can sustain revenue without unduly discouraging innovation? How should central banks adjust their analytical toolkits to handle economies in which potential output and sectoral composition are shaped by a rapidly evolving AI sector? None of these questions can be addressed adequately if the AI sector is treated as a black box whose size and capacity are left unspecified.

Imperfect measures of AI productive capacity offer a way out of that impasse. They allow fiscal authorities to run stress tests in which the labour tax base is eroded under different deployment scenarios, prompting early consideration of alternative revenue sources and automatic stabilisers. They enable central banks to explore how AI-driven supply shifts could affect inflation dynamics, wage bargaining, and asset prices, informing both baseline projections and tail-risk planning. And they provide a common reference point for debates about regulation, competition policy, and industrial strategy, even if the underlying figures are subject to revision.

In the longer run, the development of AI-focused satellite accounts and an "AI GDP" framework is likely to transform how we think about the structure of the economy. What begins as a set of rough capacity indicators can evolve into a more comprehensive picture of the AI value chain, from compute infrastructure and foundation models to domain-specific applications and labour-AI complementarities. The statement that imperfect measures are more informative than implicit assumptions is therefore not only a comment on current data gaps; it is a call to rebuild the informational foundations of macroeconomic policy before the AI economy grows large enough to turn today's measurement gap into tomorrow's policy failure.


Quote: Kristalina Georgieva - International Monetary Fund (IMF) Managing Director

"We collectively, including the fund, did not appreciate the backlash against globalisation that came from the fact that, yes, the world economy is doing better as a whole, but many communities were hollowed out because their jobs disappeared and there was not enough attention to them. I'll tell you what I'm very keen not to see repeated is the same with artificial intelligence." - Kristalina Georgieva - International Monetary Fund (IMF) Managing Director

The central issue is not whether a new technology makes economies more productive, but whether the gains arrive faster and more visibly than the losses. When job destruction is concentrated in particular towns, sectors, and skill groups, aggregate growth can look healthy while the social fabric in affected places weakens, and that imbalance has become a defining political risk around artificial intelligence. Kristalina Georgieva, who has served as Managing Director of the International Monetary Fund since October 1, 2019 and began a second term on October 1, 2024, has made that warning from a position of institutional authority that was shaped by the IMF's experience of multiple global shocks.

The remark reflects a lesson that global institutions learned, often slowly, from the era of rapid trade integration. The world economy can be better off on paper even as specific communities lose stable work, local spending power, and a sense of economic purpose. That distinction matters because politics is rarely organised around the global average. It is organised around visible closures, wage stagnation, and the feeling that national and international leaders celebrated efficiency while leaving the costs of adjustment to be absorbed locally. Georgieva's concern is that artificial intelligence could repeat that pattern on a faster clock, with the benefits accruing to firms, capital owners, and highly adaptable workers while the disruption lands on those whose tasks are easiest to automate.

From globalisation's backlash to AI's distributional shock

The comparison with globalisation is not rhetorical flourish; it is an argument about political economy. In her interview, Georgieva said that the IMF and others had not sufficiently appreciated the backlash against globalisation because they focused on the fact that the world economy was doing better as a whole, while many communities were hollowed out when jobs disappeared. That description captures the core failure of technocratic optimism: it can measure aggregate welfare precisely while underweighting the geography of decline. A region that loses a factory, a port function, a back-office cluster, or a processing plant does not experience the economy as a statistical average. It experiences it as closure, migration, and social churn.

Artificial intelligence creates a similar tension because it is best understood as a general-purpose technology whose economic effect is broad, uneven, and delayed. The IMF has estimated that almost 40% of global employment is exposed to AI, rising to about 60% in advanced economies. Exposure does not mean every exposed job vanishes, but it does mean that a substantial share of routine cognitive work, administrative handling, analysis, and content production may be altered, compressed, or partially automated. The IMF also noted that in advanced economies roughly half of the exposed jobs may benefit from AI integration, while the other half may see lower labour demand, lower wages, or in some cases disappearance.

This is why the social question is not merely about total output. If AI raises productivity by making firms leaner and faster, the headline number can be positive even when bargaining power shifts away from labour. Goldman Sachs has argued that generative AI could raise global GDP by 7% and lift productivity growth by 1.5 percentage points over 10 years, while also exposing the equivalent of 300 million full-time jobs to automation. Those figures are not incompatible. They describe a world in which technology expands the economic pie while simultaneously changing who gets the slices and who is left waiting outside the bakery.

The IMF's warning is also a warning about timing

One reason AI is politically delicate is that its benefits may be diffused over time, while its costs are immediate and local. Productivity gains can take years to appear in national accounts because firms need to adapt workflows, train staff, redesign products, and learn how to trust new systems. By contrast, a call centre that reduces headcount, a law office that automates first-draft work, or a media business that cuts junior roles can do so quickly. The result is a familiar asymmetry: the burden of adjustment arrives before the compensation mechanisms are ready.

This timing problem helps explain why economists disagree so sharply on the size of the prize. Optimistic estimates stress economy-wide efficiency gains, new products, and the value of complementary tasks. More restrained work emphasises that only a fraction of tasks can be profitably automated once implementation costs, error rates, oversight, regulation, and customer preferences are included. Daron Acemoglu has argued that the medium-term productivity effect may be far smaller than the largest headline estimates, with a much more modest uplift in output once only economically viable uses are counted. The disagreement matters because policy should not be built on the most dramatic forecast, nor should it ignore the possibility that adoption will be slower and less comprehensive than enthusiasts predict.

Georgieva's intervention sits between those poles. She is not denying that AI can boost growth. Indeed, the IMF itself has argued that AI is on the brink of a technological revolution that could jumpstart productivity, boost global growth, and raise incomes around the world. The warning is that the distributional consequences could still be severe enough to deepen inequality and social tension if governments assume that aggregate gains will automatically trickle down. In other words, the productivity story and the social story are not rivals. They are two halves of the same policy problem.

Why global institutions are especially sensitive to this pattern

The IMF's interest in this issue is not accidental. A multilateral lender and surveillance institution sees macroeconomic stability through the lens of crises, capital flows, unemployment, and political backlash. If a major technology wave deepens inequality inside countries, it can also change fiscal politics, trade politics, and attitudes towards international cooperation. Francine Lacqua, the interviewer in the podcast series, is a Bloomberg anchor who regularly speaks with central bankers, finance ministers, and senior officials, which makes the conversation part of a broader public debate about how economic power is being reorganised.

Georgieva's own background reinforces the institutional seriousness of the warning. Since the IMF has already had to manage the economic consequences of the pandemic and other global disruptions, it has become increasingly alert to the fact that resilience cannot be treated as an abstract ideal. It must be built in advance through labour-market policy, social protection, training, competition rules, and investment in digital capacity. That is especially true because AI does not affect all countries equally. The IMF has said exposure is highest in advanced economies, while emerging markets and low-income countries face lower but still significant exposure. That means the immediate labour-market shock may be concentrated in wealthier countries, but the longer-term diffusion of AI capabilities could widen the gap between economies that can adopt, regulate, and complement the technology, and those that cannot.

What was missed during globalisation

The phrase about communities being hollowed out points to a specific historical failure. Policymakers often treated trade and integration as a sum-of-parts problem: if the nation as a whole is richer, then the policy is successful. But local economies do not adjust frictionlessly. Workers in declining industries are not instantly reallocated to new sectors. Skills are not perfectly transferable. Housing markets are sticky, family ties matter, and the social meaning of work is not captured by GDP. When those frictions are ignored, resentment accumulates and eventually seeks political expression.

That experience is directly relevant to AI because the technology may hollow out different kinds of places. Globalisation often hit manufacturing towns, logistics hubs, and regions dependent on tradable goods. AI may instead pressure administrative centres, shared-service locations, media organisations, some professional services, and entry-level white-collar pathways. The political consequence may therefore be different in detail but similar in structure: whole ladders of advancement can be shortened before replacements are fully visible. For younger workers, especially, the problem is not just displacement but the erosion of the first rung of a career ladder.

There is also a deeper ideological parallel. During the globalisation era, many advocates implicitly assumed that efficiency was self-justifying. If something was cheaper, faster, and better for consumers, the distributional pain was treated as secondary. AI could repeat that error if firms and governments measure success by adoption rates alone. But broad adoption is not the same as broad benefit. A technology can be commercially successful, strategically important, and still socially destabilising if the gains are narrowly held.

The strategic debate: productivity engine or inequality accelerator?

The strongest argument in favour of AI is that it can raise productivity in economies that have struggled with weak growth, labour shortages, and ageing populations. Goldman Sachs' estimate of a 7% lift in global GDP captures the scale of ambition that surrounds the technology, while the IMF has stressed that AI could improve incomes and support growth if it is deployed well. In sectors from healthcare to education to finance, AI systems can reduce routine workload, accelerate analysis, and improve service quality. The promise is not only cost cutting but the creation of new products and business models.

The strongest argument against complacency is that AI may amplify existing inequalities in capital, data, and skill. Firms with the best models, the most data, and the strongest distribution channels will capture disproportionate value. Workers with high complementary skills may see their productivity rise, while workers in modular, repeatable tasks face stagnation or displacement. Countries with advanced digital infrastructure may use AI to widen their advantage, while countries with weaker institutional capacity struggle to keep up. Even when the overall effect on employment is positive in the long run, the short run may still bring a wave of churn that outpaces retraining and policy response.

This is why the debate is not really about whether AI is good or bad. It is about whether societies will manage transition costs with enough seriousness. The IMF has argued that policymakers should proactively address inequality to prevent AI from further stoking social tensions. That implies practical choices: stronger safety nets, wage insurance, mobility support, lifelong learning, and public investment in digital skills. It also implies a regulatory stance that encourages adoption while checking abuses, such as excessive market concentration or labour substitution without offsetting investment in human capability.

Why the warning matters now

Georgieva's message matters because it shifts the debate from hype to governance. It is easy to celebrate a technology when its promised benefits are still theoretical. It is harder to govern it when its disruptions are already visible. The IMF chief's insistence that the world should not repeat the mistakes of globalisation is a reminder that economic success measured at the top can coexist with social fracture at the base. If AI is allowed to proceed as a private efficiency project with public consequences ignored until later, then the backlash will not be surprising; it will be predictable.

That is the practical consequence buried inside the warning. AI can make economies richer, but it can also make societies less stable if the transition is unmanaged. The policy challenge is to ensure that productivity gains are not treated as an excuse to forget the communities and career paths that bear the cost of change. If that lesson is missed again, the political response may be harsher than the technology itself.


Quote: Demis Hassabis - Google DeepMind CEO

"When we look back at this time, I think we will realise that we were standing in the foothills of the singularity. It will be a profound moment for humanity." - Demis Hassabis - Google DeepMind CEO - 2026 Google I/O technology developer conference

The underlying issue is no longer whether machine intelligence will transform human affairs, but whether our political, economic and ethical systems can adapt at the same speed as the underlying technology that is now compounding year on year. The friction lies in a widening gap: frontier AI systems are moving from tools that wait for instructions to entities that can act, plan, and coordinate with minimal human supervision, while institutions, laws and norms still assume a world of slower, more legible change. When a leading AI scientist asserts that this transition marks the early stage of a new historical regime, he is naming a tension that is already visible in boardrooms, laboratories and legislatures.

From static tools to agentic systems

For several decades, AI systems were framed as narrow tools: chess engines, recommendation algorithms, translation services and search ranking models. They were powerful, but fundamentally reactive. They did not initiate projects, hold long-term goals or orchestrate complex workflows without an engineer in the loop. The recent shift to so-called "agentic" systems is qualitatively different. These models can decompose a user objective into sub-tasks, call tools such as browsers or code interpreters, write and debug software, and loop over their own outputs until a performance criterion is met. In effect, they act like junior colleagues rather than software menus.

At Google I/O, this shift was made concrete through demonstrations of AI systems that design operating systems, draft and execute multi-step research plans, and coordinate across products from search to productivity suites. One showcase involved an autonomous system that could construct a functional operating system for under USD 1,000 in compute and overhead, a task that would historically require teams of engineers working for months. The key is not that such feats are possible in principle; it is that they are rapidly becoming cheap, repeatable and integrated into mainstream platforms.

This transition matters because it changes the leverage a small group of people or organisations can exert. A single developer equipped with powerful agents can now build, test and deploy complex services that once demanded a mid-sized company. In security terms, the same leverage can enhance defensive capabilities but also lower the barrier for sophisticated cyberattacks, automated social engineering, or automated discovery of software vulnerabilities. The trajectory is towards a world where much more can be done by far fewer humans.

Why "singularity" entered the AI mainstream

The term "singularity" was originally borrowed from physics and mathematics, where it describes points such as the centre of a black hole, at which descriptive equations break down and conventional intuitions fail. In the early 1990s, computer scientist Vernor Vinge repurposed the idea for AI, suggesting that once systems exceed human cognitive capabilities and can improve themselves, the resulting feedback loop would produce change so rapid that it would be difficult to model with existing social or economic theories.

For years, such visions were largely confined to science fiction, futurist circles and a subset of AI safety researchers. Large technology companies tended to avoid the language, preferring incremental narratives about productivity and assistance. The decision by a major AI lab leader to adopt the singularity framing publicly signals a deliberate shift: it acknowledges that the slope of capability is steepening and that the transition from experimental systems to world-shaping infrastructure is well under way. It also functions as a warning that the timelines to serious disruption are short enough that preparation cannot be deferred.

Hassabis has suggested that artificial general intelligence, often defined as systems with performance roughly comparable to an expert human across a wide range of tasks, could emerge by around 2030, with uncertainty measured in only a few years. If those estimates are even approximately correct, then organisations that plan on decade-long cycles, from regulators to universities to defence ministries, face a planning problem they have rarely confronted: they must hedge against both the possibility of very rapid transformation and the possibility that the curve flattens.

The factual context: a platform company bets on autonomy

The backdrop to this language is a strategic reorientation of one of the world's largest technology companies around AI. At Google I/O 2026, Google and DeepMind unveiled an array of products and research initiatives: new frontier models, multimodal assistants integrated into search and productivity tools, autonomous coding systems, AI-augmented video generation tools, and bespoke hardware for training and serving models at scale. Rather than being siloed experiments, these systems are presented as a coherent platform spanning consumer, enterprise and developer ecosystems.

In this environment, Hassabis's statement is not an isolated philosophical remark. It sits alongside concrete decisions: allocating large capital budgets to specialised AI accelerators, restructuring products around AI agents, and articulating timelines that compress the expected arrival of broadly capable systems into the span of a single strategic planning horizon. The narrative is that humanity is entering a phase where each iteration of capability builds directly on the previous one, leading to compounding returns rather than linear gains.

In effect, the company is arguing that today's chatbots and coding assistants represent only the earliest stage of a broader transition. These are the first footholds, not the peak. As agents are networked, endowed with memory, and embedded in physical systems such as robots, vehicles and infrastructure, their actions will increasingly manifest in the material economy rather than just digital text and images. This is where concerns about labour markets, safety and governance become more immediate.

Acceleration, compounding and feedback

The strategic tension revolves around feedback loops. If AI systems can help design better versions of themselves, build more efficient hardware, discover new materials and streamline research, then progress in AI becomes entangled with progress in the rest of science and engineering. Hassabis has argued that AI may prove several times more transformative than past industrial revolutions because it targets the bottleneck that constrained earlier eras: the pace at which new ideas can be generated, tested and implemented.

Historically, improvements in productivity depended on larger workforces, more capital or incremental process optimisation. A significant share of that optimisation was done by human experts. If AI can augment or partially automate the role of these experts, the rate of innovation itself could accelerate. In economic terms, this raises the prospect that growth models based on a roughly constant rate of technological improvement could be replaced by regimes in which the effective innovation rate increases as AI improves.

For example, consider a stylised research process where the time required to complete a project is $T$. If AI tools cut $T$ by a factor of $k$, with $k > 1$, then the number of projects completed per year increases by a factor of $k$. If AI is itself improved by the outputs of these projects, then $T$ can shrink further over time, leading to a feedback loop in which the pace of progress itself accelerates. In more formal endogenous growth models, AI would augment the "effective" number of researchers, increasing the term governing idea production and pushing economies onto steeper growth trajectories.
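
The feedback loop in this stylised process can be made concrete with a toy simulation. It is a minimal sketch under stated assumptions: the function name `simulate_research_feedback`, the starting project time of one year, the speed-up factor of 2, and the rule that each completed project shortens future project time by a fixed fraction are all hypothetical choices made for illustration, not parameters from any published model.

```python
def simulate_research_feedback(initial_project_years=1.0, speedup=2.0,
                               feedback=0.05, years=5):
    """Return projects completed per year under a toy AI feedback loop.

    initial_project_years: time to complete one project before AI (assumed).
    speedup: factor by which AI tools cut project time (must exceed 1).
    feedback: fractional cut in future project time per completed project
              (a hypothetical rule; 0.0 switches the feedback loop off).
    """
    # AI tools cut the time per project by the speed-up factor.
    project_years = initial_project_years / speedup
    history = []
    for _ in range(years):
        projects = 1.0 / project_years  # projects completed this year
        history.append(projects)
        # Outputs of this year's projects improve the AI tools, which
        # shortens the time needed for each project next year.
        project_years /= 1.0 + feedback * projects
    return history


print([round(p, 3) for p in simulate_research_feedback()])
print(simulate_research_feedback(feedback=0.0))
```

With the feedback term switched off, output stays flat at the one-off gain from the speed-up; with it switched on, each year's output exceeds the last, which is the accelerating dynamic the paragraph describes.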

In practice, such models are crude and highly uncertain, but they capture the intuition behind singularity language: beyond a certain level of capability, the interactions between AI, science and industry may generate dynamics qualitatively different from previous technological shifts. This is both the lure and the anxiety of the current moment.

Promise: scientific discovery and problem-solving

Hassabis has consistently emphasised the constructive side of this transition, particularly in science and healthcare. DeepMind's work on protein folding, through its AlphaFold system, offers an early indication of how AI can contribute to core scientific challenges. Where traditional approaches required painstaking experiments to infer the three-dimensional structure of proteins from their amino acid sequences, AI systems can now predict many such structures computationally, vastly expanding the available dataset for drug discovery and basic biology. Similar methods are being developed for material science, climate modelling and mathematics.

As models become more capable at exploring hypothesis spaces, designing experiments and interpreting complex datasets, the hope is that they will help unlock treatments for diseases, design low-carbon materials and optimise energy systems more rapidly than human research alone could achieve. This is part of why some AI leaders argue that the net impact of advanced AI could dwarf earlier industrial transformations: it does not only automate existing tasks but also amplifies the process by which new capabilities are created.

In a world facing climate change, ageing populations and geopolitical instability, such accelerations are understandably attractive. They offer a narrative in which AI is not primarily about efficiency or consumer convenience but about expanding the frontier of what is technically possible in domains that matter directly to human survival and flourishing.

Risk: misalignment, misuse and concentration of power

The same features that make advanced AI attractive also generate serious risks. Systems capable of autonomous planning and self-improvement raise questions about alignment: ensuring that their objectives, when pursued at scale, remain compatible with human values and legal constraints. Even if one is sceptical of scenarios involving fully superhuman intelligence, there are near-term concerns about AI systems that are merely very capable and deployed widely without sufficient safeguards.

One class of risk involves misuse. Autonomous coding agents can assist in writing malware, identifying vulnerabilities, or orchestrating coordinated attacks. Large-scale language models can generate persuasive disinformation tailored to specific demographics, potentially amplifying existing social fractures. As these systems become better at modelling human psychology and adapting in real time, the cost of high-quality manipulation could fall, with implications for elections, public health campaigns and social cohesion.

Another involves structural power. If the resources required to train frontier models remain concentrated in a handful of companies and states, control over the most capable systems will be highly centralised. Those actors could, intentionally or not, shape everything from labour markets to information flows. The singularity framing draws attention to a moment where artificial systems may hold more de facto power than any single human institution can easily check, not because they are sentient or malicious, but because they are embedded in so many layers of critical infrastructure.

There is also the possibility of accidents and emergent behaviour. As models grow larger and are coupled with external tools and other agents, predicting their behaviour in novel situations becomes more difficult. Aligning such systems may require new formal methods, rigorous evaluation regimes and international norms that do not yet exist at scale. Here, the concern is less a sudden catastrophic failure and more a series of cascading incidents (financial flash crashes, infrastructure outages, or uncontrolled propagation of flawed code) arising from tightly coupled automated systems.

The strategic and technological tension

At the heart of current debates is a tension between speed and control. On one side, there is the argument that rapid deployment is necessary to capture economic value, to stay ahead of competitors and to make beneficial applications widely available. On the other, there is the view that racing ahead without robust safety measures, regulatory frameworks and democratic oversight is irresponsible, particularly as systems approach or exceed human-level competence across many domains.

Hassabis's public positioning seeks to occupy a middle ground. He emphasises both the proximity of general-purpose AI and the need for society to prepare within a relatively short time window. This implicitly calls for a dual strategy: accelerate the development of beneficial uses while simultaneously investing in safety research, governance structures and public engagement. The challenge is that market incentives, geopolitical rivalry and the sheer pace of technical progress make coordinated restraint difficult.

Governments are only beginning to respond with AI acts, executive orders and voluntary code commitments. These instruments tend to lag frontier technical capabilities by several years. By the time a regulation is in place to address one generation of models, the next generation, with qualitatively different properties, may already be under development. This regulatory lag is familiar from other technologies but is amplified when the paradigm itself is in flux.

Debates and objections

Not all researchers or policymakers accept the singularity framing or the specific timelines associated with it. Critics raise several objections. One is empirical: past predictions of AI breakthroughs, including earlier waves of optimism in the 1960s and 1980s, were often overconfident. They argue that current systems, impressive as they are, still rely heavily on pattern recognition rather than deep understanding, struggle with long-term reasoning and lack robust grounding in the physical world.

From this perspective, equating progress in large language models and agents with an imminent singularity risks obscuring unresolved problems such as brittleness, hallucination and vulnerability to adversarial inputs. Some suggest that claims about timelines to AGI are influenced by competitive pressures and investor expectations, and that more humility is warranted. They also worry that dramatic narratives about near-term singularity could crowd out attention to mundane but urgent issues like labour displacement, privacy and market concentration.

Another objection targets the metaphor itself. The term "singularity" implies a sharp discontinuity, a moment after which extrapolating from previous trends becomes meaningless. Some economists and sociologists argue that a more accurate picture is one of uneven, domain-specific adoption. In this view, certain sectors (software, digital marketing, some scientific fields) may experience extremely rapid change, while others (construction, caregiving, public administration) move more slowly, constrained by physical, legal or cultural factors.

Accordingly, they suggest focusing less on hypothetical points of infinite change and more on concrete decisions about where and how AI is deployed, who benefits, and how costs are distributed. For them, the danger of singularity language is that it can induce either complacent fatalism ("nothing we do matters") or reckless acceleration ("we must move as fast as possible to reach the promised land"), neither of which encourages careful stewardship.

Why the framing matters now

Regardless of whether one accepts the metaphor or the timelines, the choice by a central figure in AI to characterise the current era as the beginning of a singularity has practical consequences. It signals to engineers, investors and policymakers that they should treat AI not as a marginal upgrade to existing tools, but as a transformational general-purpose technology. That shift in perception can influence everything from research priorities to education policy.

In research, the framing encourages work on foundational capabilities and long-term safety rather than solely on narrow applications. Teams may prioritise interpretability, robustness and alignment techniques in anticipation of systems whose influence extends across critical infrastructures. In industry, the expectation of accelerating capability may drive aggressive investment in AI-native products, workforce retraining and new business models that assume AI will be a core component of almost every workflow.

In public policy, acknowledging that we might be in the "foothills" of a major transformation sharpens the urgency of questions about accountability, global coordination and equitable access. If advanced AI is likely to amplify existing inequalities unless actively governed, then social choices made in the next few years (about data rights, model access, liability regimes and international cooperation) will have outsized effects. The metaphor thus serves as a prompt: if we are indeed at an early stage of a steep climb, the route we choose now will determine which groups bear the risks and reap the rewards.

Finally, there is a psychological dimension. Seeing one's era as a hinge point in history can be both motivating and destabilising. For researchers and entrepreneurs, it provides a sense of purpose: their work may have consequences far beyond quarterly metrics. For citizens and policymakers, it can induce anxiety about loss of control. Navigating between these reactions requires a form of collective maturity: the ability to recognise that transformative capability is emerging, to take its risks seriously without succumbing to paralysis, and to articulate positive, plural visions of futures in which powerful AI is integrated into human institutions rather than simply unleashed.

Whether or not historians ultimately agree that this period marked the true "foothills" of a singularity, the underlying reality is that AI systems are already reshaping knowledge work, scientific research and digital infrastructure. The choice now is not whether to enter this terrain, but how to do so deliberately, with as much foresight as a rapidly changing technological landscape will allow.

"When we look back at this time, I think we will realise that we were standing in the foothills of the singularity. It will be a profound moment for humanity." - Quote: Demis Hassabis - Google DeepMind CEO - 2026 Google I/O technology developer conference

‌