Artificial Intelligence Archives - Global Advisors

Term: Tool calling

“Tool calling (often called function calling) is a technical capability in modern AI systems-specifically Large Language Models (LLMs)-that allows the model to interact with external tools, APIs, or databases to perform tasks beyond its own training data.” – Tool calling

Tool calling, also known as function calling, is a technical capability that enables Large Language Models (LLMs) to intelligently request and utilise external tools, APIs, databases, and services during conversations or processing tasks.^1,2 Rather than relying solely on information contained within their training data, LLMs equipped with tool calling can dynamically access real-time information, perform actions, and interact with external systems to provide more accurate, current, and actionable responses.^3,4

How Tool Calling Works

The tool calling process follows a structured flow that bridges the gap between language models and external systems:²

A user submits a prompt or query to the LLM that may require external data or functionality
The model analyses the request and determines whether a tool is needed to fulfil it
If necessary, the model outputs structured data specifying which tool to call and what parameters to use
The application executes the requested tool with the provided parameters
The tool returns results to the model
The model incorporates this information into its final response to the user

Critically, the model itself does not execute the functions or interact directly with external systems. Instead, it generates structured parameters for potential function calls, allowing your application to maintain full control over whether to invoke the suggested function or take alternative actions.⁸

Defining Tools and Functions

Tools are defined using JSON Schema format, which informs the model about available capabilities.³ Each tool definition requires three essential components:

Name: A function identifier using alphanumeric characters, underscores, or dashes (maximum 64 characters)
Description: A clear explanation of what the function does, which the model uses to decide when to call it
Parameters: A JSON Schema object describing the function’s input arguments and their types

For example, a weather function might be defined with the name get_weather, a description explaining it retrieves current weather conditions, and parameters specifying that it requires a location argument.²

Types of Tool Calling

Tool calling implementations vary in complexity depending on application requirements:¹

Simple: One function triggered by a single user prompt, ideal for basic utilities
Multiple: Several functions available, with the model selecting the most appropriate one based on user intent
Parallel: The same function called multiple times simultaneously for complex requests
Parallel Multiple: Multiple different functions executed in parallel within a single request
Multi-Step: Sequential function calling within one conversation turn for data processing workflows
Multi-Turn: Conversational context combined with function calling, enabling AI agents to interact with humans in iterative loops

Primary Use Cases

Tool calling enables two fundamental categories of functionality:⁴

Fetching Data: Retrieving up-to-date information for model responses, such as current weather conditions, currency conversion rates, or specific data from knowledge bases and APIs. This approach is particularly valuable for Retrieval-Augmented Generation (RAG) systems that require access to external knowledge sources.⁴

Taking Action: Performing external operations such as submitting forms, updating application state, scheduling appointments, controlling smart home devices, or orchestrating agentic workflows including conversation handoffs.^4,5

Practical Applications

Tool calling transforms LLMs from passive information providers into active agents capable of real-world interaction. Common implementations include:⁵

Conversational agents that answer questions by accessing current data
Voice AI bots that check weather, look up stock prices, or query databases
Automated systems that schedule appointments or control connected devices
Agentic AI workflows that perform complex multi-step tasks

Key Distinction: Tools vs Functions

Whilst the terms are often used interchangeably, a subtle distinction exists. A function is a specific kind of tool defined by a JSON schema, allowing the model to pass structured data to your application. A tool is the broader concept encompassing any external capability or resource-including functions, custom tools with free-form text inputs and outputs, and built-in tools such as web search, code execution, and Model Context Protocol (MCP) server functionality.^2,8

Related Strategy Theorist: Andrew Ng

Andrew Ng (born 1976) is a pioneering computer scientist and AI researcher whose work has profoundly influenced how modern AI systems are designed and deployed, including the development of tool-augmented AI architectures. As a co-founder of Coursera, Chief Scientist at Baidu, and founder of Landing AI, Ng has consistently advocated for practical, production-oriented approaches to artificial intelligence that extend model capabilities beyond their training data.

Ng’s relationship to tool calling stems from his broader philosophy that effective AI systems must be grounded in real-world applications. Rather than viewing LLMs as isolated systems, Ng has championed the integration of language models with external tools, databases, and domain-specific systems-an approach that directly parallels modern tool calling implementations. His work on machine learning systems design emphasises the importance of connecting AI models to actionable data and external services, enabling them to operate effectively in production environments.

In his influential writings and lectures, particularly through his “AI for Everyone” initiative and subsequent work on AI transformation, Ng has stressed that the future of AI lies not in larger models alone, but in intelligent systems that can leverage external resources and tools to solve real problems. This perspective aligns precisely with tool calling’s core principle: extending LLM capabilities by enabling structured interaction with external systems.

Ng’s background includes a PhD in Computer Science from UC Berkeley, where he conducted research in machine learning and robotics. He served as Director of the Stanford Artificial Intelligence Laboratory and has held leadership positions at major technology companies. His contributions to deep learning, transfer learning, and practical AI deployment have shaped industry standards for building intelligent systems that operate beyond their training data-making him a foundational figure in the theoretical and practical development of tool-augmented AI systems like those enabled by tool calling.

References

1. https://docs.together.ai/docs/function-calling

2. https://platform.openai.com/docs/guides/function-calling

3. https://docs.fireworks.ai/guides/function-calling

4. https://docs.cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling

5. https://docs.pipecat.ai/guides/learn/function-calling

6. https://budibase.com/blog/ai-agents/tool-calling/

7. https://www.promptingguide.ai/applications/function_calling

8. https://cobusgreyling.substack.com/p/whats-the-difference-between-tools

Term: Diffusion models

“Diffusion models are a class of generative artificial intelligence (AI) models that create new data instances by learning to reverse a gradual, step-by-step process of adding noise to training data.” – Diffusion models

Diffusion models are a class of generative artificial intelligence models that create new data instances by learning to reverse a gradual, step-by-step process of adding noise to training data. They represent one of the most significant advances in machine learning, emerging as the dominant generative approach since the introduction of Generative Adversarial Networks in 2014.

Core Mechanism

Diffusion models operate through a dual-phase process inspired by non-equilibrium thermodynamics in physics. The mechanism mirrors the natural diffusion phenomenon, where molecules move from areas of high concentration to low concentration. In machine learning, this principle is inverted to generate high-quality synthetic data.

The process consists of two complementary components:

Forward diffusion process: Training data is progressively corrupted by adding Gaussian noise through a series of small, incremental steps. Each step introduces controlled complexity via a Markov chain, gradually transforming structured data into pure noise.
Reverse diffusion process: The model learns to reverse this noise-addition procedure, starting from random noise and iteratively removing it to reconstruct data that matches the original training distribution.

During training, the model learns to predict the noise added at each step of the forward process by minimising a loss function that measures the difference between predicted and actual noise. Once trained, the model can generate entirely new data by passing randomly sampled noise through the learned denoising process.

Key Components and Architecture

Three essential elements enable diffusion models to function effectively:

Forward diffusion process: Adds noise to data in successive small steps, with each iteration increasing randomness until the data resembles pure noise.
Reverse diffusion process: The neural network learns to iteratively remove noise, generating data that closely resembles training examples.
Score function: Estimates the gradient of the data distribution with respect to noise, guiding the reverse diffusion process to produce realistic samples.

A notable architectural advancement is the Latent Diffusion Model (LDM), which runs the diffusion process in latent space rather than pixel space. This approach significantly reduces training costs and accelerates inference speed by first compressing data with an autoencoder, then performing the diffusion process on learned semantic representations.

Advantages Over Alternative Approaches

Diffusion models offer several compelling advantages compared to competing generative models such as GANs and Variational Autoencoders (VAEs):

Superior image quality: They generate highly realistic images that closely match the distribution of real data, outperforming GANs through their distinct mechanisms for precise replication of real-world imagery.
Stable training: Unlike GANs, diffusion models avoid mode collapse and unstable training dynamics, providing a more reliable learning process.
Flexibility: They can model complex data distributions without requiring explicit likelihood estimation.
Theoretical foundations: Based on well-understood principles from stochastic processes and statistical mechanics, providing strong mathematical grounding.
Simple loss functions: Training employs straightforward and efficient loss functions that are easier to optimise.

Applications and Impact

Diffusion models have revolutionised digital content creation across multiple domains. Notable applications include:

Text-to-image generation (Stable Diffusion, Google Imagen)
Text-to-video synthesis (OpenAI SORA)
Medical imaging and diagnostic applications
Autonomous vehicle development
Audio and sound generation
Personalised AI assistants

Mathematical Foundation

Diffusion models are formally classified as latent variable generative models that map to latent space using a fixed Markov chain. The forward process gradually adds noise to obtain the approximate posterior:

q(x_{1:T}|x_0)

where $x_1, \ldots, x_T$ are latent variables with the same dimensionality as the original data $x_0$ . The reverse process learns to invert this transformation, generating new samples from pure noise through iterative denoising steps.

Theoretical Lineage: Yoshua Bengio and Deep Learning Foundations

Whilst diffusion models represent a relatively recent innovation, their theoretical foundations are deeply rooted in the work of Yoshua Bengio, a pioneering figure in deep learning and artificial intelligence. Bengio’s contributions to understanding neural networks, representation learning, and generative models have profoundly influenced the development of modern AI systems, including diffusion models.

Bengio, born in 1964 in Paris and now based in Canada, is widely recognised as one of the three “godfathers of AI” alongside Yann LeCun and Geoffrey Hinton. His career has been marked by fundamental contributions to machine learning theory and practice. In the 1990s and 2000s, Bengio conducted groundbreaking research on neural networks, including work on the vanishing gradient problem and the development of techniques for training deep architectures. His research on representation learning established that neural networks learn hierarchical representations of data, a principle central to understanding how diffusion models capture complex patterns.

Bengio’s work on energy-based models and probabilistic approaches to learning directly informed the theoretical framework underlying diffusion models. His emphasis on understanding the statistical principles governing generative processes provided crucial insights into how models can learn to reverse noising processes. Furthermore, Bengio’s advocacy for interpretability and theoretical understanding in deep learning has influenced the rigorous mathematical treatment of diffusion models, distinguishing them from more empirically-driven approaches.

In recent years, Bengio has become increasingly focused on AI safety and the societal implications of advanced AI systems. His recognition of diffusion models’ potential-both for beneficial applications and potential risks-reflects his broader commitment to ensuring that powerful generative technologies are developed responsibly. Bengio’s continued influence on the field ensures that diffusion models are developed with attention to both theoretical rigour and ethical considerations.

The connection between Bengio’s foundational work on deep learning and the emergence of diffusion models exemplifies how theoretical advances in understanding neural networks eventually enable practical breakthroughs in generative modelling. Diffusion models represent a maturation of principles Bengio helped establish: the power of hierarchical representations, the importance of probabilistic frameworks, and the value of learning from data through carefully designed loss functions.

References

1. https://www.superannotate.com/blog/diffusion-models

2. https://www.geeksforgeeks.org/artificial-intelligence/what-are-diffusion-models/

3. https://en.wikipedia.org/wiki/Diffusion_model

4. https://www.coursera.org/articles/diffusion-models

5. https://www.assemblyai.com/blog/diffusion-models-for-machine-learning-introduction

6. https://www.splunk.com/en_us/blog/learn/diffusion-models.html

7. https://lilianweng.github.io/posts/2021-07-11-diffusion-models/

Term: Model density

“Model density” in AI, particularly regarding LLMs, is a performance-efficiency metric defined as the ratio of a model’s effective capability (performance) to its total parameter size.” – Model density

Model density represents a fundamental shift in how we measure artificial intelligence performance, moving beyond raw computational power to assess how effectively a model utilises its parameters. Rather than simply counting the number of parameters in a neural network, model density quantifies the ratio of effective capability to total parameter count, revealing how intelligently a model has been trained and architected.³

The Core Concept

At its essence, model density answers a critical question: how much useful intelligence does each parameter contribute? This metric emerged from the recognition that newer models achieve superior performance with fewer parameters than their predecessors, suggesting that progress in large language models stems not merely from scaling size, but from improving architecture, training data quality, and algorithmic efficiency.³

The concept can be understood through what researchers call capability density, formally defined as the ratio of a model’s effective parameter count to its actual parameter count.³ The effective parameter count is estimated by fitting scaling laws to existing models and determining how large a reference model would need to be to match current performance. When this ratio exceeds 1.0, it indicates that a model performs better than expected for its size-a hallmark of efficient design.

Information Compression and the “Great Squeeze”

Model density becomes particularly illuminating when examined through the lens of information compression. Modern large language models achieve remarkable density through what has been termed “the Great Squeeze”-the process of compressing vast training datasets into mathematical representations.¹

Consider the Llama 3 family as a concrete example. During training, the model encountered approximately 15 trillion tokens of information. If stored in a traditional database, this would require 15 to 20 terabytes of raw data. The resulting Llama 3 70B model, however, contains only 70 billion parameters with a final weight of roughly 140 gigabytes-representing a 100:1 reduction in physical size.¹ This translates to a squeeze ratio where each parameter has “seen” over 200 different tokens of information during training.¹

The smaller Llama 3 8B model demonstrates even more extreme density, compressing 15 trillion tokens into 8 billion parameters-a ratio of nearly 1,875 tokens per parameter.¹ This extreme over-training paradoxically enables superior reasoning capabilities, as the higher density of learned experience per parameter allows the model to extract more nuanced patterns from its training data.

Semantic Density and Output Reliability

Beyond parameter efficiency, model density extends to the quality and consistency of outputs. Semantic density measures the confidence level of an LLM’s response by analysing how probable and semantically consistent the generated answer is.² This metric evaluates how well each answer aligns with alternative responses and the query’s overall context, functioning as a post-processing step that requires no retraining or fine-tuning.²

High semantic density indicates strong understanding of a topic and internal consistency, resulting in more reliable outputs.² This proves particularly valuable given that LLMs lack built-in confidence measures and can produce outputs that sound authoritative even when incorrect or misleading.⁵ By generating multiple responses and computing confidence scores between 0 and 1, semantic density identifies responses located in denser regions of output semantic space-and therefore more trustworthy.⁵

Intelligence Density in Practical Application

Beyond parameter ratios, practitioners increasingly focus on intelligence density as the amount of useful intelligence produced per unit of time or computational resource.⁴ This reframing acknowledges that once models achieve sufficient peak intelligence for their intended tasks, the primary constraint shifts from maximum capability to the density of intelligence they can produce.⁴ In customer support and similar domains, this means optimising the quantity of intelligence produced per second becomes more valuable than pursuing ever-higher peak performance.⁴

This principle reveals that high-enough peak intelligence is necessary but not sufficient; once achieved, value creation moves towards latency and density optimisation, where significant opportunities for differentiation remain under-explored and are cheaper to capture.⁴

The Exponential Progress Trend

Research indicates that the best-performing models at each time point show rising capability density, with newer models achieving given performance levels with fewer parameters than older models.³ This trend appears approximately exponential over time, suggesting that progress in large language models is fundamentally about improving efficiency rather than simply scaling up.³ This observation underscores that tracking parameter efficiency is essential for understanding future directions in natural language processing and machine learning.

Related Theorist: Ilya Sutskever and Scaling Laws

The theoretical foundations of model density connect deeply to the work of Ilya Sutskever, Chief Scientist at OpenAI and a pioneering researcher in understanding how neural networks scale. Sutskever’s research on scaling laws-particularly his work demonstrating predictable relationships between model size, data size, and performance-provided the mathematical framework upon which modern density metrics rest.

Born in 1986 in Yegoryevsk, Russia, Sutskever emigrated to Canada as a child and developed an early passion for artificial intelligence. He completed his PhD at the University of Toronto under Geoffrey Hinton, one of the founding figures of deep learning, where he focused on understanding the principles governing neural network training and optimisation.

Sutskever’s seminal work on scaling laws, conducted whilst at OpenAI alongside researchers including Jared Kaplan, revealed that model performance follows predictable power-law relationships with respect to compute, data, and model size.³ These discoveries fundamentally changed how the field approaches model development. Rather than viewing larger models as inherently better, Sutskever’s work demonstrated that the efficiency with which a model uses its parameters matters profoundly.

His research established that progress in AI is not merely about building bigger models, but about understanding and optimising the relationship between parameters and capability-the very essence of model density. Sutskever’s theoretical contributions directly enabled the concept of capability density, as researchers could now quantify how much “effective” capacity a model possessed relative to its actual parameter count. His work demonstrated that architectural innovations, superior training algorithms, and higher-quality data could yield models that achieve better performance with fewer parameters, validating the principle that density-not size-drives progress.

Sutskever’s influence extends beyond scaling laws to shaping how the entire field conceptualises model efficiency. His emphasis on understanding the mathematical principles underlying neural network training rather than pursuing brute-force scaling has become increasingly relevant as computational costs and environmental concerns make parameter efficiency paramount. In this sense, model density represents the practical realisation of Sutskever’s theoretical insights: the recognition that intelligent design and efficient parameter utilisation outweigh raw computational scale.

References

1. https://dentro.de/ai/blog/2025/12/20/the-great-squeeze—understanding-llm-information-density/

2. https://www.geekytech.co.uk/semantic-density-and-its-impact-on-llm-ranking/

3. https://research.aimultiple.com/llm-scaling-laws/

4. https://fin.ai/research/we-dont-need-higher-peak-intelligence-only-more-intelligence-density/

5. https://www.cognizant.com/us/en/ai-lab/blog/semantic-density-demo

6. https://www.educationdynamics.com/ai-density-in-search-marketing/

7. https://pub.towardsai.net/the-generative-ai-model-map-fff0b6490f77

Term: Model weights

“Model weights are the crucial numerical parameters learned during training that define a model’s internal knowledge, dictating how input data is transformed into outputs and enabling it to recognise patterns and make predictions.” – Model weights

Model weights represent the learnable numerical parameters within a neural network that determine how input data is processed to generate predictions, functioning similarly to synaptic strengths in a biological brain.^1,2,4 These values control the influence of specific features on the output, such as edges in images or tokens in language models, through operations like matrix multiplications, convolutions, or weighted sums across layers.^1,2,3 Initially randomised, weights are optimised during training via algorithms like gradient descent, which iteratively adjust them to minimise a loss function measuring the difference between predictions and actual targets.^1,2,5

In practice, for a simple linear regression model expressed as $y = wx + b$ , the weight w scales the input x to predict y, while b is the bias term.² In complex architectures like convolutional neural networks (CNNs) or large language models (LLMs), weights include filters detecting textures and fully connected layers combining features, often numbering in billions.^1,2,5 This enables tasks from image classification to real-time translation, with pre-trained weights facilitating transfer learning on custom datasets.¹

Weights are distinct from biases, which add normalisation and extra characteristics to the weighted sum before activation functions, aiding forward and backward propagation.^3,6 Protecting these parameters is vital, as they encode the model’s performance, robustness, and decision logic; unauthorised changes can lead to malfunction.⁵ In LLMs, weights boost emphasis on words or associations, shaping generative outputs.³

Key Theorist: Geoffrey Hinton

The preeminent theorist linked to model weights is **Geoffrey Hinton**, often called the ‘Godfather of Deep Learning’ for pioneering backpropagation and neural network training techniques that optimise these parameters.^1,2 Hinton’s seminal 1986 paper with David Rumelhart and Ronald Williams popularised backpropagation, the cornerstone algorithm for adjusting weights layer-by-layer based on error gradients, revolutionising machine learning.^2,4

Born in 1947 in Wimbledon, London, Hinton descends from a lineage of scientists: his great-great-grandfather George Boole invented Boolean logic, his grandfather Charles Howard Hinton coined ‘hyperspace’, and his great-uncle was logician Bertrand Russell. Initially studying experimental psychology at Cambridge (BA 1970), Hinton earned a PhD in AI from Edinburgh in 1978, focusing on Boltzmann machines-early stochastic neural networks with learnable weights. Disillusioned with symbolic AI, he championed connectionism, simulating brain-like learning via weights.

In the 1980s, amid the first AI winter, Hinton persisted at Carnegie Mellon and Toronto, developing restricted Boltzmann machines for unsupervised pre-training of weights, addressing vanishing gradients. His 2006 breakthrough with Alex Krizhevsky and Ilya Sutskever-training deep belief networks on ImageNet-proved deep nets with billions of weights could excel, sparking the deep learning revolution.¹ At Google Brain (2013-2023), he advanced capsule networks and transformers indirectly influencing LLMs. Hinton quit Google in 2023, warning of AI risks, and won the 2018 Turing Award with Yann LeCun and Yoshua Bengio. His work directly underpins how modern models, including LLMs, learn weights to recognise patterns and predict outcomes.^3,5

References

1. https://www.ultralytics.com/glossary/model-weights

2. https://www.tencentcloud.com/techpedia/132448

3. https://blog.metaphysic.ai/weights-in-machine-learning/

4. https://tedai-sanfrancisco.ted.com/glossary/weights/

5. https://alliancefortrustinai.org/how-model-weights-can-be-used-to-fine-tune-ai-models/

6. https://h2o.ai/wiki/weights-and-biases/

Term: Recursive Language Model (RLM)

“A Recursive Language Model (RLM) is an AI inference strategy where a large language model (LLM) is granted the ability to programmatically interact with and recursively call itself or smaller helper models to solve complex tasks and process extremely long inputs.” – Recursive Language Model (RLM)

A **Recursive Language Model (RLM)** is an innovative inference strategy that empowers large language models (LLMs) to treat input contexts not as static strings but as dynamic environments they can actively explore, decompose, and recursively process.^1,3,4 This approach fundamentally shifts AI from passive text processing to active problem-solving, enabling the handling of extremely long inputs, complex reasoning tasks, and structured outputs without being constrained by traditional context window limits.^1,6

At its core, an RLM operates within a Python Read-Eval-Print Loop (REPL) environment where the input context is stored as a programmable variable.^1,2,3 The model begins with exploration and inspection, using tools like string slicing, regular expressions, and keyword searches to scan and understand the data structure actively rather than passively reading it.¹ It then performs task decomposition, breaking the problem into smaller subtasks that fit within standard context windows, with the model deciding the splits based on its discoveries.^1,3

The hallmark is recursive self-calls, where the model invokes itself (or smaller helper models) on each subtask, forming a tree of reasoning that aggregates partial results into variables within the REPL.^1,4 This is followed by aggregation and synthesis, combining outputs programmatically into lists, tables, or documents, and verification and self-checking through re-runs or cross-checks for reliability.¹ Unlike traditional LLMs that process a single forward pass on tokenised input, RLMs grant the model ‘hands and eyes’ to query itself programmatically, such as result = rlm_query(sub_prompt), transforming context from ‘input’ to ‘environment’.^1,3

RLMs address key limitations like ‘context rot’-degradation in long-context performance-and scale to effectively unlimited lengths (over 10 million tokens tested), outperforming baselines by up to 114% on benchmarks without fine-tuning, via prompt engineering alone.^3,6,2 They differ from agentic systems by decomposing context adaptively rather than predefined tasks, and from reasoning models by scaling through recursive decomposition.⁶

Key Theorist: Alex L. Zhang and the MIT Origins

The primary theorist behind RLMs is **Alex L. Zhang**, a researcher affiliated with MIT, who co-authored the seminal work proposing RLMs as a general inference framework.^3,4,8 In his detailed blog and the arXiv paper ‘Recursive Language Models’ (published around late 2025), Zhang articulates the vision: enabling LLMs to ‘recursively call themselves or other LLMs’ to process unbounded contexts and mitigate degradation.^3,4 His implementation uses GPT-5 or GPT-5-mini in a Python REPL, allowing adaptive chunking and recursion at test time.³

Alex L. Zhang’s biography reflects a deep expertise in AI scaling and inference innovations. Active in 2025 through platforms like his GitHub blog (alexzhang13.github.io), he focuses on practical advancements in language model capabilities, particularly long-context handling.³ While specific early career details are sparse in available sources, his work builds on MIT’s disruptive ethos-echoed in proposals like ‘why not let the model read itself?’-positioning him as a key figure in the 2026 paradigm shift towards recursive AI architectures.^1,8 Zhang’s contributions emphasise test-time compute scaling, distinguishing RLMs from mere architectural changes by framing them as a ‘thin wrapper’ around standard LLMs that reframes them as stateful programmes.⁵

Experimental validations in Zhang’s framework demonstrate RLMs’ superiority, such as dramatically improved accuracy on pairwise comparison tasks (from near-zero to over 58%) and spam classification in massive prompts.^2,4 His ideas have sparked widespread discussion, with sources hailing RLMs as ‘the ultimate evolution of AI’ and a ‘game-changer for 2026’.^1,2,7

References

1. https://gaodalie.substack.com/p/rlm-the-ultimate-evolution-of-ai

2. https://www.oreateai.com/blog/the-rise-of-recursive-language-models-a-game-changer-for-2026/0fee0de5cdd99689fca9e499f6333681

3. https://alexzhang13.github.io/blog/2025/rlm/

4. https://arxiv.org/html/2512.24601v1

5. https://datasciencedojo.com/blog/what-are-recursive-language-models/

6. https://www.getmaxim.ai/blog/breaking-the-context-window-how-recursive-language-models-handle-infinite-input/

7. https://www.primeintellect.ai/blog/rlm

8. https://www.theneuron.ai/explainer-articles/recursive-language-models-rlms-the-clever-hack-that-gives-ai-infinite-memory

Quote: Jensen Huang – Nvidia CEO

“OpenClaw is probably the single most important release of software, probably ever. If you look at… the adoption of it, Linux took some 30 years to reach this level. OpenClaw has now surpassed Linux. It is now the single most downloaded open source software in history, and it took 3 weeks.” – Jensen Huang – Nvidia CEO

In a striking declaration at the Morgan Stanley Technology, Media and Telecom Conference in San Francisco, Nvidia CEO Jensen Huang positioned OpenClaw as a revolutionary force in open source software, outpacing even the legendary Linux kernel in adoption speed and scale.⁵ This remark underscores Huang’s vision for AI agents – autonomous systems capable of continuous operation and complex tasks – as the next frontier in artificial intelligence, with OpenClaw serving as their foundational framework.⁵

Context of the Quote

Delivered on 4 March 2026, Huang’s comments came amid discussions on Nvidia’s strategic investments in AI leaders like OpenAI and Anthropic, where he noted that recent deals, including a $30 billion stake in OpenAI, might represent the company’s final major private investments before these firms pursue initial public offerings.^1,2,3,5,6 Amid this, Huang pivoted to OpenClaw’s meteoric rise, contrasting its three-week dominance in downloads against Linux’s three-decade journey to similar prominence.⁵ He highlighted its ‘vertical’ growth on semi-log charts, attributing this to the insatiable demand for AI agents that process a million times more tokens and run perpetually in enterprise environments.⁵

Who is Jensen Huang?

Jensen Huang co-founded Nvidia in 1993 alongside Chris Malachowsky and Curtis Priem, initially focusing on graphics processing units (GPUs) for gaming and visualisation.⁴ Under his leadership, Nvidia pivoted decisively to AI and high-performance computing, with breakthroughs like CUDA – a parallel computing platform that locks in developers through its ecosystem of software, interconnects like NVLink, and rack-scale systems.⁴ Huang’s prescience in positioning GPUs as indispensable for AI training and inference has propelled Nvidia to a market leader, with hyperscalers committing over $660 billion in AI spending for 2026 alone.⁴ His conference appearances, including this one, blend investment insights with technological evangelism, reinforcing Nvidia’s moat in the AI stack.^1,3,4,5

What is OpenClaw?

OpenClaw emerges as Nvidia’s open source initiative tailored for AI agents – intelligent, persistent programmes that autonomously handle tasks such as software development, tool creation, and data processing.⁵ Unlike traditional software, these agents operate continuously, consuming vast token volumes (a measure of computational language processing) and integrating seamlessly into workflows.⁵ Huang’s team deploys numerous OpenClaw instances internally, automating coding and innovation, which explains the explosive download figures: surpassing Linux – the cornerstone of servers, supercomputers, and embedded systems – in just three weeks.⁵ This positions OpenClaw not merely as code, but as infrastructure for the agentic AI era, where autonomy scales intelligence.

Backstory: Linux’s Enduring Legacy

To grasp OpenClaw’s feat, consider Linux’s trajectory. Initiated in 1991 by Linus Torvalds as a hobby project, Linux evolved into the world’s most ubiquitous operating system kernel, powering 96% of the top supercomputers, most cloud infrastructure, and Android devices.⁵ Its adoption spanned three decades, driven by open source principles, community contributions, and enterprise embrace from IBM to Google. Yet, as Huang noted, even this benchmark took 30 years to cement Linux as a download and deployment juggernaut.⁵ OpenClaw’s subversion of this timeline signals a paradigm shift: AI-driven tools now accelerate adoption via immediate utility in high-stakes domains like enterprise AI.

Leading Theorists in AI Agents and Open Source AI

Linus Torvalds: Architect of Linux, Torvalds pioneered collaborative open source development via Git, influencing every major software ecosystem. His ‘benevolent dictator’ governance model ensured Linux’s stability and growth, principles echoed in modern AI repositories.⁵
Ilya Sutskever: Co-founder of OpenAI and key figure in transformer models (the backbone of agents), Sutskever’s work on scaling laws demonstrated how compute and data yield emergent intelligence, paving the way for agentic systems like those powered by OpenClaw.
Andrej Karpathy: Former OpenAI and Tesla AI director, Karpathy advanced accessible AI through nanoGPT and LLM training tutorials, theorising agent swarms – multi-agent collaborations – that align with Huang’s vision of continuous, token-hungry OpenClaw deployments.
Yohei Nakajima: Creator of BabyAGI, an early agent framework, Nakajima theorised task decomposition and self-improvement loops, concepts central to OpenClaw’s real-world utility in software engineering and beyond.
Sam Altman: OpenAI CEO, Altman champions ‘agentic AI’ as the post-ChatGPT phase, where models act independently. Despite tensions in Nvidia partnerships, his firm’s trajectory validates Huang’s infrastructure bets.^1,2,3

Huang’s endorsement frames OpenClaw as the synthesis of these ideas: open source velocity meets agentic scale, challenging developers to harness AI’s full potential.

Implications for AI and Open Source

OpenClaw’s ascent heralds a compression of innovation cycles, where AI agents bootstrap their own ecosystems faster than human-led projects like Linux.⁵ For investors and technologists, it reinforces Nvidia’s centrality: not just in hardware, but in software that cements lock-in.⁴ As agents proliferate – writing code, optimising systems, and driving revenue – Huang’s words invite scrutiny of whether this marks the true democratisation of AI, or Nvidia’s deepening dominance in the field.^1,4,5

References

1. https://www.mexc.com/news/855185

2. https://finviz.com/news/330373/jensen-huang-says-nvidias-30-billion-openai-investment-might-be-the-last-before-ipo

3. https://techcrunch.com/2026/03/04/jensen-huang-says-nvidia-is-pulling-back-from-openai-and-anthropic-but-his-explanation-raises-more-questions-than-it-answers/

4. https://www.thestreet.com/investing/morgan-stanley-changes-its-nvidia-position-for-the-rest-of-2026

5. https://ng.investing.com/news/transcripts/nvidia-at-morgan-stanley-conference-ai-leadership-and-strategic-growth-93CH-2375443

6. https://ppam.com.au/nvidia-ceo-huang-says-30-billion-openai-investment-might-be-the-last/

7. https://www.tmtbreakout.com/p/ms-tmt-conf-nvidias-jensen-nvda-microsofts

Term: Mixture of Experts (MoE)

“Mixture of Experts (MoE) is an efficient neural network architecture that uses multiple specialised sub-models (experts) and a gating network (router) to dynamically select and activate only the most relevant experts for a given input.” – Mixture of Experts (MoE)

This architectural approach divides a large artificial intelligence model into separate sub-networks, each specialising in processing specific types of input data. Rather than activating the entire network for every task, MoE models employ a gating mechanism-often called a router-that intelligently selects which experts should process each input. This selective activation introduces sparsity into the network, meaning only a fraction of the model’s total parameters are used for any given computation.^1,3

Core Architecture and Components

The fundamental structure of MoE consists of two essential elements:⁴

Expert networks: Multiple specialised sub-networks, typically implemented as feed-forward neural networks (FFNs), each with its own set of learnable parameters. These experts become skilled at handling specific patterns or types of data during training.¹
Gating network (router): A trainable mechanism that evaluates each input and determines which expert or combination of experts is best suited to process it. This routing function is computationally efficient, enabling the model to make rapid decisions about expert selection.^1,3

In practical implementations, such as the Mixtral 8x7B language model, each layer contains multiple experts-for instance, eight separate feedforward blocks with 7 billion parameters each. For every token processed, the router selects only a subset of these experts (in Mixtral’s case, two out of eight) to perform the computation, then combines their outputs before passing the result to the next layer.³

How MoE Achieves Efficiency

MoE models leverage conditional computation to reduce computational burden without sacrificing model capacity.³ This approach enables several efficiency gains:

Models can scale to billions of parameters whilst maintaining manageable inference costs, since not all parameters are activated for every input.^1,3
Training can occur with significantly less compute, allowing researchers to either reduce training time or expand model and dataset sizes.⁴
Experts can be distributed across multiple devices through expert parallelism, enabling efficient large-scale deployments.¹

The gating mechanism ensures that frequently selected experts receive continuous updates during training, improving their performance, whilst load balancing mechanisms attempt to distribute computational work evenly across experts to prevent bottlenecks.¹

Historical Development and Key Theorist: Noam Shazeer

Noam Shazeer stands as the primary architect of modern MoE systems in deep learning. In 2017, Shazeer and colleagues-including the legendary Geoffrey Hinton and Google’s Jeff Dean-introduced the Sparsely-Gated Mixture-of-Experts Layer for recurrent neural language models.^1,4 This seminal work fundamentally transformed how researchers approached scaling neural networks.

Shazeer’s contribution was revolutionary because it reintroduced the mixture of experts concept, which had existed in earlier machine learning literature, into the deep learning era. His team scaled this architecture to a 137-billion-parameter LSTM model, demonstrating that sparsity could maintain very fast inference even at massive scale.⁴ Although this initial work focused on machine translation and encountered challenges such as high communication costs and training instabilities, it established the theoretical and practical foundation for all subsequent MoE research.⁴

Shazeer’s background as a researcher at Google positioned him at the intersection of theoretical machine learning and practical systems engineering. His work exemplified a crucial insight: that not all parameters in a neural network need to be active simultaneously. This principle has since become foundational to modern large language model design, influencing architectures used by leading AI organisations worldwide. The Sparsely-Gated Mixture-of-Experts Layer introduced the trainable gating network concept that remains central to MoE implementations today, enabling conditional computation that balances model expressiveness with computational efficiency.¹

Applications and Performance

MoE architectures have demonstrated faster training and comparable or superior performance to dense language models on many benchmarks, particularly in multi-domain tasks where different experts can specialise in different knowledge areas.¹ Applications span natural language processing, computer vision, and recommendation systems.²

Challenges and Considerations

Despite their advantages, MoE systems present implementation challenges. Load balancing remains critical-when experts are distributed across multiple devices, uneven expert selection can create memory and computational bottlenecks, with some experts handling significantly more tokens than others.¹ Additionally, distributed training complexity and the need for careful tuning to maintain stability and efficiency require sophisticated engineering approaches.¹

References

1. https://neptune.ai/blog/mixture-of-experts-llms

2. https://www.datacamp.com/blog/mixture-of-experts-moe

3. https://www.ibm.com/think/topics/mixture-of-experts

4. https://huggingface.co/blog/moe

5. https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mixture-of-experts

6. https://www.youtube.com/watch?v=sYDlVVyJYn4

7. https://arxiv.org/html/2503.07137v1

8. https://cameronrwolfe.substack.com/p/moe-llms

Term: AI harness

“A harness (often called an agent harness or agentic harness) is an external software framework that wraps around a Large Language Model (LLM) to make it functional, durable, and capable of taking actions in the real world.” – AI harness

An AI harness is the external software framework that wraps around a Large Language Model (LLM) to extend its capabilities beyond text generation, enabling it to function as a persistent, tool-using agent capable of taking real-world actions. Without a harness, an LLM operates in isolation-processing a single prompt and generating a response with no memory of previous interactions and no ability to interact with external systems. The harness solves this fundamental limitation by providing the infrastructure necessary for autonomous, multi-step reasoning and execution.

Core Functions and Architecture

An AI harness performs several critical functions that transform a static language model into a dynamic agent. Memory management addresses one of the most significant constraints of raw LLMs: their fixed context windows and lack of persistent memory. Standard language models begin each session with no recollection of previous interactions, forcing them to operate without historical context. The harness implements memory systems-including persistent context logs, summaries, and external knowledge stores-that carry information across sessions, enabling the agent to learn from past experiences and maintain continuity across multiple interactions.

Tool execution and external action represents another essential function. Language models alone can only produce text; they cannot browse the web, execute code, query databases, or generate images. The harness monitors the model’s output for special tool-call commands and executes those operations on the model’s behalf. When a tool call is detected, the harness pauses text generation, executes the requested operation in the external environment (such as performing a web search or running code in a sandbox), and feeds the results back into the model’s context. This mechanism effectively gives the model “hands and eyes,” transforming textual intentions into tangible real-world actions.

Context management and orchestration ensure that information flows efficiently between the model and its environment. The harness determines what information is provided to the model at each step, managing the transient prompt whilst maintaining a persistent task log separate from the model’s immediate context. This separation is crucial for long-running projects: even if an AI agent instance stops and a new one begins later with no memory in the raw LLM, the project itself retains memory through files and logs maintained by the harness.

Modular Design and Components

Contemporary harness architectures increasingly adopt modular designs that decompose agent functionality into interchangeable components. Research from ICML 2025 on “General Modular Harness for LLM Agents in Multi-Turn Gaming Environments” demonstrates this approach through three core modules: perception, which processes both low-resolution grid environments and visually complex images; memory, which stores recent trajectories and synthesises self-reflection signals enabling agents to critique past moves and adjust future plans; and reasoning, which integrates perceptual embeddings and memory traces to produce sequential decisions. This modular structure allows developers to toggle components on and off, systematically analysing each module’s contribution to overall performance.

Performance Impact and Practical Benefits

The empirical benefits of harness implementation are substantial. Models operating within a harness achieve significantly higher task success rates compared to un-harnessed baselines. In gaming environments, an AI with a memory and perception harness wins more games than the same AI without one. In coding tasks, an AI with a harness that runs and debugs its own code completes programming tasks that a standalone LLM would fail due to runtime errors. The harness essentially compensates for the model’s inherent weaknesses-lack of persistence, inability to access external knowledge, and propensity for errors-resulting in markedly improved real-world performance.

Perhaps most significantly, harnesses extend what an AI can accomplish without requiring model retraining. Want an LLM to handle images? Integrate a vision module or image captioning API into the harness. Need mathematical reasoning or complex logic? Add the appropriate tool or module. This extensibility makes harnesses economically valuable: two products using identical underlying LLMs can deliver vastly different user experiences based on the quality and sophistication of their respective harnesses.

Evolution and Strategic Importance

As AI capabilities have advanced, harness design has become increasingly critical to product success. The harness landscape is dynamic and evolving: popular agents like Manus have undergone five complete re-architectures since March 2024, and even Anthropic continuously refines Claude Code’s agent harness as underlying models improve. This reflects a fundamental principle: as models become more capable, harnesses must be continually simplified, stripping away scaffolding and crutches that are no longer necessary.

The distinction between orchestration and harness is worth noting. Orchestration serves as the “brain” of an AI system-determining the overall workflow and decision logic-whilst the harness functions as the “hands and infrastructure,” executing those decisions and managing the technical details. Both are critical for complex AI agents, and improvements in either dimension can dramatically enhance real-world performance.

Related Theorist: Allen Newell and Cognitive Architecture

Allen Newell (1927-1992) was an American cognitive scientist and computer scientist whose theoretical framework profoundly influences contemporary harness design. Newell’s “Unified Theories of Cognition” (UTC), published in 1990, proposed that human cognition operates through integrated systems of perception, memory, and reasoning-three faculties that work in concert to enable intelligent behaviour. This theoretical foundation directly inspired the modular harness architectures now prevalent in AI research.

Newell’s career spanned the emergence of cognitive science as a discipline. Working initially at the RAND Corporation and later at Carnegie Mellon University, he collaborated with Herbert Simon to develop the “Physical Symbol System Hypothesis,” which posited that physical symbol systems (such as computers) could exhibit intelligent behaviour through the manipulation of symbols according to rules. This work earned Newell and Simon the Turing Award in 1975, recognising their foundational contributions to artificial intelligence.

Newell’s UTC represented his mature synthesis of decades of research into human problem-solving, learning, and memory. Rather than treating perception, memory, and reasoning as separate cognitive modules, Newell argued they must be understood as deeply integrated systems operating within a unified cognitive architecture. This insight proved prescient: modern AI harnesses implement precisely this integration, with perception modules processing environmental information, memory modules storing and retrieving relevant context, and reasoning modules synthesising these inputs into coherent action sequences.

The connection between Newell’s theoretical work and contemporary harness design is not merely coincidental. Researchers explicitly cite Newell’s framework when justifying modular harness architectures, recognising that his cognitive science insights provide a principled foundation for engineering AI systems. In this sense, Newell’s work from the 1980s and early 1990s anticipated the architectural requirements that AI engineers would discover empirically decades later when attempting to build capable, persistent, tool-using agents.

References

1. https://parallel.ai/articles/what-is-an-agent-harness

2. https://developer.harness.io/docs/platform/harness-aida/aida-overview

3. https://arxiv.org/html/2507.11633v1

4. https://hugobowne.substack.com/p/ai-agent-harness-3-principles-for

5. https://dxwand.com/boost-business-ai-harness-llms-nlp-nlu/

6. https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents

Term: Loss function

“A loss function, also known as a cost function, is a mathematical function that quantifies the difference between a model’s predicted output and the actual ‘ground truth’ value for a given input.” – Loss function

A loss function is a mathematical function that quantifies the discrepancy between a model’s predicted output and the actual ground truth value for a given input. Also referred to as an error function or cost function, it serves as the objective function that machine learning and artificial intelligence algorithms seek to optimize during training efforts.

Core Purpose and Function

The loss function operates as a feedback mechanism within machine learning systems. When a model makes a prediction, the loss function calculates a numerical value representing the prediction error-the gap between what the model predicted and what actually occurred. This error quantification is fundamental to the learning process. During training, algorithms such as backpropagation use the gradient of the loss function with respect to the model’s parameters to iteratively adjust weights and biases, progressively reducing the loss and improving predictive accuracy.

The relationship between loss function and cost function warrants clarification: whilst these terms are often used interchangeably, a loss function technically applies to a single training example, whereas a cost function typically represents the average loss across an entire dataset or batch. Both, however, serve the same essential purpose of guiding model optimization.

Key Roles in Machine Learning

Loss functions fulfil several critical functions within machine learning systems:

Performance measurement: Loss functions provide a quantitative metric to evaluate how well a model’s predictions align with actual results, enabling objective assessment of model effectiveness.
Optimization guidance: By calculating prediction error, loss functions direct the learning algorithm to adjust parameters iteratively, creating a clear path toward improved predictions.
Bias-variance balance: Effective loss functions help balance model bias (oversimplification) and variance (overfitting), essential for generalisation to new, unseen data.
Training signal: The gradient of the loss function provides the signal by which learning algorithms update model weights during backpropagation.

Common Loss Function Types

Different machine learning tasks require different loss functions. For regression problems involving continuous numerical predictions, Mean Squared Error (MSE) and Mean Absolute Error (MAE) are widely employed. The MAE formula is:

\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|

For classification tasks dealing with categorical data, Binary Cross-Entropy (also called Log Loss) is commonly used for binary classification problems. The formula is:

L(y, f(x)) = -[y \cdot \log(f(x)) + (1 - y) \cdot \log(1 - f(x))]

where y represents the true binary label (0 or 1) and f(x) is the predicted probability of the positive class.

For multi-class classification, Categorical Cross-Entropy extends this concept. Additionally, Hinge Loss is particularly useful in binary classification where clear separation between classes is desired:

L(y, f(x)) = \max(0, 1 - y \cdot f(x))

The Huber Loss function provides robustness to outliers by combining quadratic and linear components, switching between them based on a threshold parameter delta (?).

Related Strategy Theorist: Vladimir Vapnik

Vladimir Naumovich Vapnik (born 1935) stands as a foundational figure in the theoretical underpinnings of loss functions and machine learning optimisation. A Soviet and later American computer scientist, Vapnik’s work on Statistical Learning Theory and Support Vector Machines (SVMs) fundamentally shaped how the machine learning community understands loss functions and their role in model generalisation.

Vapnik’s most significant contribution to loss function theory came through his development of Support Vector Machines in the 1990s, where he introduced the concept of the hinge loss function-a loss function specifically designed to maximise the margin between classification boundaries. This represented a paradigm shift in thinking about loss functions: rather than simply minimising prediction error, Vapnik’s approach emphasised confidence and margin, ensuring models were not merely correct but confidently correct by a specified distance.

Born in the Soviet Union, Vapnik studied mathematics at the University of Uzbekistan before joining the Institute of Control Sciences in Moscow, where he conducted groundbreaking research on learning theory. His theoretical framework, Vapnik-Chervonenkis (VC) theory, provided mathematical foundations for understanding how models generalise from training data to unseen examples-a concept intimately connected to loss function design and selection.

Vapnik’s insight that different loss functions encode different assumptions about what constitutes “good” model behaviour proved revolutionary. His work demonstrated that the choice of loss function directly influences not just training efficiency but the model’s ability to generalise. This principle remains central to modern machine learning: data scientists select loss functions strategically to encode domain knowledge and desired model properties, whether robustness to outliers, confidence in predictions, or balanced handling of imbalanced datasets.

Vapnik’s career spanned decades of innovation, including his later work on transductive learning and learning using privileged information. His theoretical contributions earned him numerous accolades and established him as one of the most influential figures in machine learning science. His emphasis on understanding the mathematical foundations of learning-particularly through the lens of loss functions and generalisation bounds-continues to guide contemporary research in deep learning and artificial intelligence.

Practical Significance

The selection of an appropriate loss function significantly impacts model performance and training efficiency. Data scientists carefully consider different loss functions to achieve specific objectives: reducing sensitivity to outliers, better handling noisy data, minimising overfitting, or improving performance on imbalanced datasets. The loss function thus represents not merely a technical component but a strategic choice that encodes domain expertise and learning objectives into the machine learning system itself.

References

1. https://www.datacamp.com/tutorial/loss-function-in-machine-learning

2. https://h2o.ai/wiki/loss-function/

3. https://c3.ai/introduction-what-is-machine-learning/loss-functions/

4. https://www.geeksforgeeks.org/machine-learning/ml-common-loss-functions/

5. https://arxiv.org/html/2504.04242v1

6. https://www.youtube.com/watch?v=v_ueBW_5dLg

7. https://www.ibm.com/think/topics/loss-function

8. https://en.wikipedia.org/wiki/Loss_function

9. https://www.datarobot.com/blog/introduction-to-loss-functions/

Term: AI scaffolding

“Scaffolding refers to the structured architecture and instructional techniques built around an AI model to enhance its reasoning, reliability, and capability.” – AI scaffolding

AI scaffolding is the structured architecture and tooling built around a large language model (LLM) to enable it to perform complex, goal-driven tasks with enhanced reasoning, reliability, and capability.¹ Rather than relying on a single prompt or query, scaffolding places an LLM within a control loop that includes memory systems, external tools, decision logic, and feedback mechanisms, allowing the model to observe its environment, call APIs or code, update its context, and iterate until goals are achieved.¹

In essence, scaffolding bridges the critical gap between the capabilities of base models and production-ready systems. A standalone LLM lacks the architectural support needed to reliably complete multi-step tasks, interface with business systems, or adapt to domain-specific requirements.¹ Scaffolding augments the model’s bare capabilities by providing access to tools, domain data, and structured workflows that guide and extend its behaviour.

Core Components of AI Scaffolding

Effective scaffolding operates through several interconnected layers:

Planning and reasoning: Agents operate through defined reasoning and evaluation steps. Rather than acting immediately, scaffolding may prompt the model to plan or reflect before taking action, and to self-critique its outputs. Research demonstrates that allowing agents to plan and self-evaluate significantly improves problem-solving accuracy compared to action-only approaches.¹
Tool integration: The LLM is wrapped in code that interprets its outputs as tool calls. When the model determines it needs external resources-such as a calculator, database query, API call, or web search-the scaffold safely executes that tool and returns results to the model for the next reasoning step.¹
Memory systems: Scaffolding includes mechanisms for the agent to maintain and update context across multiple interactions, enabling it to build upon previous observations and decisions.¹
Feedback and control: Robust agents include feedback loops and safeguards such as self-evaluation steps, human-in-the-loop checks, and policy enforcement. In enterprise settings, scaffolding adds logging, testing suites, and guardrails like content filters to ensure outputs remain controlled and auditable.¹

Types of AI Scaffolding Techniques

AI scaffolding encompasses several distinct approaches, which can be combined to enhance model performance:

Tool access scaffolding: Granting models access to external tools such as code editors, web browsers, or specialised software significantly expands their problem-solving capabilities. For example, LLMs initially trained on finite datasets with fixed cut-off dates became substantially more capable when granted internet access.²
Agent loop scaffolding: This technique automates multi-step task completion by placing AI models in a loop with access to their own observations and actions, enabling them to self-generate each prompt needed to finish complex tasks. Systems like AutoGPT exemplify this approach.²
Multi-agent scaffolding: Multiple AI models collaborate on complex problems through dialogue, division of labour, or critique mechanisms. Research shows that extended networks of up to a thousand agents can coordinate to outperform individual models, with capability scaling predictably as networks grow larger.²
Procedural scaffolding: This approach builds a structured process in which the model generates outputs, checks them, and revises them iteratively, enforcing process discipline rather than relying on raw prompts alone.³
Semantic scaffolding: Using ontological frameworks and domain rules to validate outputs against formal relations, preventing deeper misunderstandings and moving AI closer to auditable, trustworthy reasoning.³

Practical Applications and Enterprise Use

Scaffolding is essential for operationalising LLMs in enterprise environments. Whether an agent is expected to generate structured outputs, interact with APIs, or solve problems through planning and iteration, its effectiveness depends on the scaffold that guides and extends its behaviour.¹ In sectors such as customer service, risk analysis, logistics, healthcare, and finance, scaffolding enables AI systems to maintain reliability and auditability in high-stakes contexts.³

A key advantage of scaffolding is that it improves accuracy whilst making AI reasoning more transparent. When a system reaches a conclusion, leaders can trace it back to formal relations in an ontology rather than relying solely on statistical inference, making the system trustworthy for critical applications.³

Scaffolding versus Model Scale

An important principle in modern AI development is that scaffolding often matters more than raw model scale. The future of AI-whether in homeland security, finance, healthcare, or other domains-will be defined not by the size of models but by the quality of the architectural frameworks surrounding them.³ Hybrid architectures that embed statistical models within well-designed scaffolded systems deliver superior performance and reliability compared to simply scaling larger models without structural support.

Key Theorist: Stuart Russell and the Alignment Research Tradition

The conceptual foundations of AI scaffolding are deeply rooted in the work of Stuart Russell, a leading figure in artificial intelligence safety and alignment research. Russell, the Volgenau Chair of Engineering at the University of California, Berkeley, and co-author of the seminal textbook Artificial Intelligence: A Modern Approach, has been instrumental in developing frameworks for ensuring AI systems remain controllable and aligned with human values as they become more capable.

Russell’s contributions to scaffolding theory emerge from his broader research agenda on AI safety and the control problem. In the early 2000s, as machine learning systems began to demonstrate increasing autonomy, Russell recognised that simply building more powerful models without corresponding advances in control architecture would create dangerous misalignment between AI capabilities and human oversight. His work emphasised that the architecture surrounding an AI system-not merely the model itself-determines whether that system can be safely deployed in high-stakes environments.

One of Russell’s most influential contributions to scaffolding concepts is his work on iterated amplification, developed in collaboration with researchers at OpenAI and other institutions. Iterated amplification is a form of scaffolding that uses multi-AI collaborations to solve increasingly complex problems whilst maintaining human oversight at each stage. In this approach, humans decompose complex tasks into simpler subtasks that AI systems solve, then humans review and synthesise these solutions. Over time, humans operate at progressively higher levels of abstraction whilst AI systems assume responsibility for more of the process. This iterative cycle improves model capabilities whilst preserving human auditability and control-a principle directly aligned with scaffolding’s core objective.²

Russell’s broader philosophical stance is that AI safety and capability enhancement are not opposing forces but complementary objectives. Scaffolding embodies this principle: by building structured architectures around models, developers simultaneously enhance capability (through tool access, planning, and feedback loops) and improve safety (through auditability, human-in-the-loop checks, and formal validation against domain rules). Russell’s insistence that AI systems must remain interpretable and auditable has directly influenced how modern scaffolding frameworks incorporate semantic validation, ontological constraints, and transparent reasoning pathways.

Throughout his career, Russell has advocated for what he terms “beneficial AI”-systems designed from inception to be controllable, transparent, and aligned with human values. Scaffolding represents a practical instantiation of this vision. Rather than hoping that larger models will somehow become more trustworthy, Russell’s framework suggests that intentional architectural design-the very essence of scaffolding-is the path to AI systems that are simultaneously more capable and more reliable.

Russell’s influence extends beyond theoretical work. His research group at Berkeley has contributed to developing practical frameworks for AI governance, model evaluation, and safety testing that directly inform how organisations implement scaffolding in production environments. His emphasis on formal methods, constraint satisfaction, and human-AI collaboration has shaped industry standards for building enterprise-grade AI systems.

References

1. https://zbrain.ai/agent-scaffolding/

2. https://blog.bluedot.org/p/what-is-ai-scaffolding

3. https://www.cio.com/article/4076515/beyond-ai-prompts-why-scaffolding-matters-more-than-scale.html

4. https://www.godofprompt.ai/blog/what-is-prompt-scaffolding

5. https://kpcrossacademy.ua.edu/scaffolding-ai-as-a-learning-collaborator-integrating-artificial-intelligence-in-college-classes/

6. https://www.tandfonline.com/doi/full/10.1080/10494820.2025.2470319

Quote: Jamie Dimon – JP Morgan Chase CEO

“I think the harder thing to measure has always been tech projects. That’s been true my whole life. It’s also been true my whole life, the tech is what changes everything, like everything.” – Jamie Dimon – JP Morgan Chase CEO

Jamie Dimon’s candid observation captures a fundamental tension at the heart of modern business strategy: the profound impact of technology juxtaposed against the persistent challenge of measuring its value. Delivered during JPMorgan Chase’s 2026 Investor Day on 24 February, this remark came amid revelations of the bank’s unprecedented $19.8 billion technology budget – a 10% increase from 2025, with significant allocations to artificial intelligence (AI) projects.^1,2,4 As CEO of the world’s largest bank by market capitalisation, Dimon’s perspective is shaped by decades of navigating technological shifts, from the rise of digital banking to the current AI boom.

Jamie Dimon’s Career and Leadership at JPMorgan Chase

Born in 1956 in New York City to Greek immigrant parents, Jamie Dimon began his career in finance at American Express in the 1980s, rising rapidly under the mentorship of Sandy Weill. He co-led the merger that created Citigroup in 1998 but parted ways acrimoniously in 2000. Dimon then transformed Bank One from near-collapse into a powerhouse, earning a reputation as a crisis manager. In 2004, he became CEO of JPMorgan Chase following its acquisition of Bank One, a role he has held for over two decades.³

Under Dimon’s stewardship, JPMorgan has become a technology leader in banking. The firm employs over 300,000 people, with tens of thousands in tech roles, and invests billions annually in innovation. Dimon has long championed tech as a competitive moat, famously urging investors to ‘trust him’ on spending despite vague ROI metrics. In 2026, this commitment manifests in a tech budget swelled by $2 billion, driven by AI for customer service, personalised insights, and developer tools, amid rising hardware costs from AI chip demand.^1,5 Dimon predicts JPMorgan will be a ‘winner’ in the AI race, leveraging its data assets and No. 1 ranking in AI maturity among banks.^1,3

Context of the Quote: JPMorgan’s 2026 Strategic Framework

The quote emerged in a Q&A at the 24 February 2026 event, responding to analyst pressure on tech ROI. CFO Jeremy Barnum highlighted technology as a major expense driver, up $9 billion overall, with $1.2 billion in investments including AI. Dimon acknowledged time savings from tech as ‘too vague’ to measure precisely, echoing lifelong observations from mainframes to cloud computing.^1,2 This aligns with broader warnings: AI will revolutionise operations but displace jobs, necessitating societal preparation like retraining and phased adoption to avoid shocks, such as mass unemployment from autonomous trucks.⁴

JPMorgan is aggressively deploying AI – its large language model serves 150,000 users weekly – while planning ‘huge redeployment’ for affected staff. Executives like Marianne Lake stress paranoia in competition, quoting ‘Only the paranoid survive’. Rivals like Bank of America ($14 billion tech spend) underscore the sector-wide arms race.¹

Leading Theorists on Technology Measurement and Impact

Dimon’s views resonate with seminal thinkers on technology’s intangible returns. Peter Drucker, the father of modern management, argued in The Practice of Management (1954) that knowledge workers’ output defies traditional metrics, prefiguring tech’s measurement woes. He coined ‘knowledge economy’, emphasising innovation’s long-term value over short-term quantification.[/latex]

Erik Brynjolfsson and Andrew McAfee, MIT economists, explore this in The Second Machine Age (2014), detailing how digital technologies yield ‘non-rival’ benefits – exponential productivity without proportional costs – hard to capture in GDP or ROI. Their ‘bounty vs. spread’ framework warns of uneven gains, mirroring Dimon’s job displacement concerns.⁴

Clayton Christensen’s The Innovator’s Dilemma (1997) explains why incumbents struggle with disruptive tech: metrics favour sustaining innovations, blinding firms to transformative ones. JPMorgan’s shift from infrastructure modernisation to AI-ready data exemplifies overcoming this.⁵

In AI specifically, Nick Bostrom’s Superintelligence (2014) and Stuart Russell’s Human Compatible (2019) address measurement beyond finance – aligning superintelligent systems with human values amid unpredictable impacts. Dimon’s pragmatic focus on phased integration echoes calls for cautious deployment.⁴

These theorists underscore Dimon’s point: technology’s true worth lies in reshaping ‘everything’, demanding faith in leadership over precise yardsticks. JPMorgan’s strategy embodies this, positioning the bank at the vanguard of finance’s technological frontier.

References

1. https://www.businessinsider.com/jpmorgan-tech-budget-ai-20-billion-jamie-dimon-2026-2

2. https://www.aol.com/articles/jpmorgan-spend-almost-20-billion-000403027.html

3. https://www.benzinga.com/markets/large-cap/26/02/50808191/jamie-dimon-predicts-jpmorgan-will-be-a-winner-in-ai-race-boosts-2026-tech-spend-to-nearly-20-billion

4. https://fortune.com/2026/02/25/jamie-dimon-society-prepare-ai-job-displacement/

5. https://finviz.com/news/321869/how-to-play-jpm-stock-as-tech-spend-ramps-in-2026-amid-ai-uncertainty

6. https://fintechmagazine.com/news/inside-jpmorgans-2026-stock-market-hopes-and-new-london-hq

Term: World model

“A world model is defined as a learned neural representation that simulates the dynamics of an environment, enabling an AI agent to predict future states and reason about the consequences of its actions.” – World model

A **world model** is an internal representation of the environment that an AI system creates to simulate the external world within itself. This learned neural representation enables an AI agent to predict future states, simulate the consequences of different actions before executing them in the real world, and reason about causal relationships, much like the human brain does when planning activities.^1,3,6

At its core, a world model comprises key components:

Transition model: Predicts how the environment’s state changes based on the agent’s actions, such as a robot displacing an object by moving its hand.¹
Observation model: Determines what the agent observes in each state, incorporating data from sensors, cameras, and other inputs.¹
Reward model: In reinforcement learning contexts, forecasts rewards or penalties from actions in specific states.¹

Unlike traditional machine learning, which maps inputs directly to outputs, world models foster a general understanding of environmental dynamics, enhancing performance in novel situations.^1,4

Key Capabilities and Advantages

World models empower AI with:

Causality understanding: Grasping why events occur, beyond mere statistical correlations seen in large language models (LLMs) like GPT.^1,2
Planning and reasoning: Simulating scenarios internally to select optimal actions, akin to chain-of-thought reasoning.^1,3
Efficient learning: Requiring fewer examples, similar to a child grasping gravity after minimal observations.¹
Transfer learning and generalisation: Applying knowledge across domains, such as adapting object manipulation skills.¹
Intuitive physics: Comprehending basic physical principles, essential for real-world interaction.^1,4

Trained on diverse data like videos, photos, audio, and text, world models provide richer grounding in reality than LLMs, which focus on text patterns.^2,4,6

Role in Achieving Artificial General Intelligence (AGI)

Prominent figures like Yann LeCun (Meta), Demis Hassabis (Google DeepMind), and Yoshua Bengio (Mila) view world models as crucial for AGI, enabling safe, scientific, and intelligent systems that plan ahead and simulate outcomes.³ Recent advancements, such as DeepMind’s Genie 3 (August 2025), generate diverse 3D environments from text prompts, simulating realistic physics for AI training.¹ Runway’s GWM-1 further advances general-purpose simulation for robotics and discovery.⁵

Best Related Strategy Theorist: Yann LeCun

**Yann LeCun**, Chief AI Scientist at Meta and a pioneer of convolutional neural networks (CNNs), is the foremost theorist championing world models as foundational for intelligent AI. LeCun describes them as internal predictive models that simulate real-world dynamics, incorporating modules for perception, prediction, cost/reward evaluation, and planning. This allows AI to ‘imagine’ action consequences, vital for robotics, autonomous vehicles, and AGI.^2,3

Born in 1960 in France, LeCun earned his PhD in 1987 from Universite Pierre et Marie Curie, Paris, under supervision of Yves Le Cun (no relation). He popularised CNNs in the 1980s-1990s for handwriting recognition, co-founding the field of deep learning. Joining New York University as a professor in 2003, he co-directed the NYU Center for Data Science. In 2013, he became Meta’s first AI head, driving open-source initiatives like PyTorch.

LeCun’s advocacy for world models stems from his critique of LLMs’ limitations in causal reasoning and physical simulation. He argues they enable ‘objective-driven AI’ with energy-based models for planning, positioning world models as the path beyond pattern-matching to human-like intelligence. A Turing Award winner (2018) with Bengio and Hinton, LeCun’s vision influences labs worldwide, emphasising world models for safe, efficient real-world AI.^2,3

References

1. https://deepfa.ir/en/blog/world-model-ai-agi-future

2. https://www.youtube.com/watch?v=qulPOUiz-08

3. https://www.quantamagazine.org/world-models-an-old-idea-in-ai-mount-a-comeback-20250902/

4. https://www.turingpost.com/p/topic-35-what-are-world-models

5. https://runwayml.com/research/introducing-runway-gwm-1

6. https://techcrunch.com/2024/12/14/what-are-ai-world-models-and-why-do-they-matter/

Term: AI Data Centre

“An AI Data Center is a highly specialized, power-dense physical facility designed specifically to train, deploy, and run artificial intelligence (AI) models, machine learning (ML) algorithms, and generative AI applications.” – AI Data Centre

This specialised facility diverges significantly from traditional data centres, which handle mixed enterprise workloads, by prioritising accelerated compute, ultra-high-bandwidth networking, and advanced power and cooling systems to manage dense GPU clusters and continuous data pipelines for AI tasks like model training, fine-tuning, and inference.^1,2,4

Central to its operation are high-performance computing resources such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs). GPUs excel in parallel processing, enabling rapid handling of billions of data points essential for AI model training, while TPUs offer tailored efficiency for AI-specific tasks, reducing energy consumption.^2,3,5

High-speed networking is critical, employing technologies like InfiniBand, 400 Gbps Ethernet, and optical interconnects to facilitate seamless data movement across thousands of servers, preventing bottlenecks in distributed AI workloads.^2,4

Robust storage systems-including distributed file systems and object storage-ensure swift access to vast datasets, model weights, and real-time inference data, with scalability to accommodate ever-growing AI requirements.^1,2,3

Addressing the immense power density, advanced cooling systems are vital, often accounting for 35-40% of energy use, incorporating liquid cooling and thermal zoning to maintain efficiency and low Power Usage Effectiveness (PUE) for sustainability.^2,4

Additional features include data centre automation, network security, and energy-efficient designs, yielding benefits like enhanced performance, scalability, cost optimisation, and support for innovation in fields such as big data analytics, natural language processing, and computer vision.^3,5

Key Theorist: Jensen Huang and the GPU Revolution

The foremost strategist linked to the evolution of AI data centres is Jensen Huang, co-founder, president, and CEO of NVIDIA Corporation. Huang’s vision has positioned NVIDIA’s GPUs as the cornerstone of modern AI infrastructure, directly shaping the architecture of these power-dense facilities.²

Born in 1963 in Taiwan, Huang immigrated to the United States as a child. He earned a bachelor’s degree in electrical engineering from Oregon State University and a master’s from Stanford University. In 1993, at age 30, he co-founded NVIDIA with Chris Malachowsky and Curtis Priem, initially targeting 3D graphics for gaming and PCs. Huang recognised the parallel processing power of GPUs, pivoting NVIDIA towards general-purpose computing on GPUs (CUDA platform, launched 2006), which unlocked their potential for scientific simulations, cryptography, and eventually AI.²

Huang’s prescient relationship to AI data centres stems from his early advocacy for GPU-accelerated computing in machine learning. By 2012, Alex Krizhevsky’s use of NVIDIA GPUs to win the ImageNet competition catalysed the deep learning boom, proving GPUs’ superiority over CPUs for neural networks. Under Huang’s leadership, NVIDIA developed AI-specific hardware like A100 and H100 GPUs, Blackwell architecture, and full-stack solutions including InfiniBand networking via Mellanox (acquired 2020). These innovations address AI data centre challenges: massive parallelism for training trillion-parameter models, high-bandwidth interconnects for multi-node scaling, and power-efficient designs for dense racks consuming up to 100kW each.^2,4

Huang’s biography reflects relentless innovation; he famously wore a black leather jacket onstage, symbolising his contrarian style. NVIDIA’s market cap surged from $3 billion in 2015 to over $3 trillion by 2024, propelled by AI demand. His strategic foresight-declaring in 2017 that “the era of AI has begun”-anticipated the hyperscale AI data centre boom, making NVIDIA indispensable to leaders like Microsoft, Google, and Meta. Huang’s influence extends to sustainability, pushing for efficient cooling and low-PUE designs amid AI’s energy demands.⁴

Today, virtually every major AI data centre relies on NVIDIA technology, underscoring Huang’s role as the architect of the AI infrastructure revolution.

References

1. https://www.aflhyperscale.com/articles/ai-data-center-infrastructure-essentials/

2. https://www.rcrwireless.com/20250407/fundamentals/ai-optimized-data-center

3. https://www.racksolutions.com/news/blog/what-is-an-ai-data-center/

4. https://www.f5.com/glossary/ai-data-center

5. https://www.lenovo.com/us/en/glossary/what-is-ai-data-center/

6. https://www.ibm.com/think/topics/ai-data-center

7. https://www.generativevalue.com/p/a-primer-on-ai-data-centers

8. https://www.sunbirddcim.com/glossary/data-center-components

Term: Edge devices

“Edge devices are physical computing devices located at the ‘edge. of a network, close to where data is generated or consumed, that run AI algorithms and models locally rather than relying exclusively on a centralised cloud or data center.” – Edge devices

Edge devices integrate edge computing with artificial intelligence, enabling real-time data processing on interconnected hardware such as sensors, Internet of Things (IoT) devices, smartphones, cameras, and industrial equipment. This local execution reduces latency to milliseconds, enhances privacy by retaining data on-device, and alleviates network bandwidth strain from constant cloud transmission.^1,4,5

Unlike traditional cloud-based AI, where data travels to remote servers for computation, edge devices perform tasks like predictive analytics, anomaly detection, speech recognition, and machine vision directly at the source. This supports applications in autonomous vehicles, smart factories, healthcare monitoring, retail systems, and wearable technology.^2,3,6

Key Characteristics and Benefits

Low Latency: Processes data in real time without cloud round-trips, critical for time-sensitive scenarios like defect detection in manufacturing.^3,4
Bandwidth Efficiency: Reduces data transfer volumes by analysing locally and sending only aggregated insights to the cloud.^1,5
Enhanced Privacy and Security: Keeps sensitive data on-device, mitigating breach risks during transmission.^5,6
Offline Capability: Operates without constant internet connectivity, ideal for remote or unreliable networks.^6,8

Best Related Strategy Theorist: Dr. Andrew Chi-Chih Yao

Dr. Andrew Chi-Chih Yao, a pioneering computer scientist, stands as the most relevant strategy theorist linked to edge devices through his foundational contributions to distributed computing and efficient algorithms, which underpin modern edge AI architectures. Born in Shanghai, China, in 1946, Yao earned his PhD from Harvard University in 1972 under advisor Patrick C. Fischer. He held faculty positions at MIT, Princeton, and Stanford before joining Tsinghua University in 2004 as Director of the Institute for Interdisciplinary Information Sciences (IIIS), dubbed the ‘Chinese Springboard for talents in computer science’.[external knowledge basis]

Yao’s relationship to edge devices stems from his seminal work on parallel and distributed algorithms, including the Yao minimax principle for computational complexity (1970s), which optimises resource allocation in decentralised systems-directly analogous to edge computing’s local processing paradigm. His PRAM (Parallel Random Access Machine) model formalised efficient parallelism on resource-constrained devices, influencing how AI models are deployed on edge hardware with limited power and compute.[external knowledge basis] Notably, Yao’s research on communication complexity minimises data exchange between nodes, mirroring edge devices’ strategy of local inference to cut cloud dependency-a core tenet echoed in edge AI literature.^1,7

A Turing Award winner (2000) for contributions to computation theory, Yao’s strategic vision emphasises scalable, efficient computing at the periphery, shaping industries from IoT to AI. His mentorship of talents like Jack Ma (Alibaba founder) further extends his influence on practical deployments of edge technologies in global supply chains.

References

1. https://www.ibm.com/think/topics/edge-ai

2. https://www.micron.com/about/micron-glossary/edge-ai

3. https://zededa.com/glossary/edge-ai-computing/

4. https://www.flexential.com/resources/blog/beginners-guide-ai-edge-computing

5. https://www.splunk.com/en_us/blog/learn/edge-ai.html

6. https://www.f5.com/glossary/what-is-edge-ai

7. https://www.cisco.com/site/us/en/learn/topics/artificial-intelligence/what-is-edge-ai.html

8. https://blogs.nvidia.com/blog/what-is-edge-ai/

Quote: Arthur Mensch – Arthur Mensch – Mistral CEO

“In real life, enterprises are complex systems, and you can’t solve that with a single abstraction like AGI. AGI, to a large extent, is a north star of ‘I’m going to make the system better over time.'” – Arthur Mensch – Mistral CEO

Arthur Mensch, CEO of Mistral AI, offers a grounded perspective on artificial general intelligence (AGI), emphasising its role as an aspirational guide rather than a practical fix for intricate business challenges. In a recent Big Technology Podcast interview with Alex Kantrowitz on 16 January 2026, Mensch highlighted how enterprises function as complex systems that defy singular abstractions like AGI, positioning it instead as a directional ‘north star’ for incremental system improvements. This view aligns with his longstanding scepticism towards AGI hype, rooted in his self-described strong atheism and belief that such rhetoric equates to ‘creating God’^1,2,3,4.

Who is Arthur Mensch?

Born in Paris, Arthur Mensch, aged 31, is a French entrepreneur and AI researcher who co-founded Mistral AI in 2023 alongside former Meta engineers Timothée Lacroix and Guillaume Lample. Before Mistral, Mensch worked as an engineer at Google DeepMind’s Paris lab, gaining expertise in advanced AI models^2,4. His venture quickly rose to prominence, positioning Europe as a contender in the AI landscape dominated by US giants. Mistral’s models, including open-weight offerings, have secured partnerships like one with Microsoft in early 2024, while attracting support from the French government and investors such as former digital minister Cédric O^2,4. Mensch advocates for a ‘European champion’ in AI to counterbalance cultural influences from American tech firms, stressing that AI shapes global perceptions and values². He warns against over-reliance on US competitors for AI standards, pushing for lighter European regulations to foster innovation⁴.

Context of the Quote

Mensch’s statement emerges amid intensifying AI debates, just two days before this post, on a podcast discussing real-world AI applications. It reflects his consistent dismissal of AGI as an unattainable, quasi-religious pursuit, a stance he reiterated in a 2024 New York Times interview: ‘The whole AGI rhetoric is about creating God. I don’t believe in God. I’m a strong atheist. So I don’t believe in AGI’^1,2,3,4. Unlike peers forecasting AGI’s imminent arrival, Mensch prioritises practical AI tools that enhance productivity, predicting rapid workforce retraining needs within two years rather than a decade⁴. He critiques Big Tech’s open-source strategies as competitive ploys and emphasises culturally attuned AI development^1,2. This podcast remark builds on those themes, applying them to enterprise complexity where iterative progress trumps hypothetical superintelligence.

Leading Theorists on AGI and Complex Systems

The discourse around AGI and its limits in complex systems draws from pioneering theorists in AI, cybernetics, and systems theory.

Alan Turing (1912-1954): Laid AI foundations with his 1950 ‘Computing Machinery and Intelligence’ paper, proposing the Turing Test for machine intelligence. He envisioned machines mimicking human cognition but did not pursue god-like generality, focusing on computable problems[internal knowledge].
Norbert Wiener (1894-1964): Founder of cybernetics, which studies control and communication in animals and machines. In Cybernetics (1948), Wiener described enterprises and societies as dynamic feedback systems resistant to simple models, prefiguring Mensch’s complexity argument[internal knowledge].
John McCarthy (1927-2011): Coined ‘artificial intelligence’ in 1956 at the Dartmouth Conference, distinguishing narrow AI from general forms. He advocated high-level programming for generality but recognised real-world messiness[internal knowledge].
Demis Hassabis: Google DeepMind CEO and Mensch’s former colleague, predicts AGI within years, viewing it as AI matching human versatility across tasks. Hassabis emphasises multimodal learning from games like AlphaGo⁴[internal knowledge].
Sam Altman and Elon Musk: OpenAI’s Altman warns of AGI risks like ‘subtle misalignments’ while pursuing it as transformative; Musk forecasts superhuman AI by late 2025 and sues OpenAI over profit shifts^3,4. Both treat AGI as epochal, contrasting Mensch’s pragmatism.

These figures highlight a divide: early theorists like Wiener stressed systemic complexity, while modern leaders like Hassabis chase generality. Mensch bridges this by favouring commoditised, improvable AI over AGI mythology[TAGS].

Implications for AI and Enterprise

Mensch’s philosophy underscores AI’s commoditisation, where models like Mistral’s drive efficiency without superintelligence. This resonates in Europe’s push for sovereign AI, amid tags like commoditisation and artificial intelligence[TAGS]. As enterprises navigate complexity, his ‘north star’ metaphor encourages sustained progress over speculative leaps.

References

1. https://www.businessinsider.com/mistrals-ceo-said-obsession-with-agi-about-creating-god-2024-4

2. https://futurism.com/the-byte/mistral-ceo-agi-god

3. https://www.benzinga.com/news/24/04/38266018/mistral-ceo-shades-openais-sam-altman-says-obsession-with-reaching-agi-is-about-creating-god

4. https://fortune.com/europe/article/mistral-boss-tech-ceos-obsession-ai-outsmarting-humans-very-religious-fascination/

5. https://www.binance.com/en/square/post/6742502031714

6. https://www.christianpost.com/cartoon/musk-to-altman-what-are-tech-moguls-saying-about-ai-and-agi.html?page=5

Quote: Andrej Karpathy – Previously Director of AI at Tesla, founding team at OpenAI

“Programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over. You’re spinning up AI agents, giving them tasks in English and managing and reviewing their work in parallel.” – Andrej Karpathy – Previously Director of AI at Tesla, founding team at OpenAI

This statement captures a pivotal moment in the evolution of software development, where traditional coding practices are giving way to a new era dominated by AI agents. Spoken by Andrej Karpathy, a visionary in artificial intelligence, it reflects the rapid transformation driven by large language models (LLMs) and autonomous systems. Karpathy’s insight underscores how programming is shifting from manual code entry to orchestrating intelligent agents via natural language, marking the end of an era that began with the earliest computers.

About Andrej Karpathy

Andrej Karpathy is a leading figure in AI, renowned for his contributions to deep learning and computer vision. A founding member of OpenAI in 2015, he played a key role in pioneering advancements in generative models and neural networks. Later, as Director of AI at Tesla, he led the Autopilot vision team, developing autonomous driving technologies that pushed the boundaries of real-world AI deployment. Today, he is building Eureka Labs, an AI-native educational platform. His talks and writings, such as ‘Software Is Changing (Again),’ articulate the shift to ‘Software 3.0,’ where LLMs enable programming in natural language like English.1 2 3

Karpathy’s line struck a nerve because it didn’t describe a distant future. It sounded like a description of what many engineers were already starting to experience in early 2026. The shift he’s talking about is less about writing code and more about orchestrating work—breaking problems into pieces, describing them in plain language, and then supervising agents that actually execute them.

The February Leap: Codex 5.2 and Claude Code

What made this moment feel like a real inflection was the quality jump in early 2026. When tools like ChatGPT Codex 5.2 and Claude Code landed in February, they weren’t just “better autocomplete.” They could stay on task for long, multi?step workflows, recover from errors, and push through the kind of friction that used to send developers back to the keyboard.

Karpathy has described this himself: coding agents that “basically didn’t work before December and basically work since,” with noticeably higher quality, long?term coherence, and tenacity. The February releases crystallised that shift. What used to be a weekend project became something you could kick off, let the agent run for 20–30 minutes, and then review—all while thinking about the next layer of the system rather than the syntax of the current one.

A New Kind of Programming Workflow

The pattern Karpathy is describing is less “pair programming with an autocomplete” and more “manager?style delegation.” You frame a task in English, give the agent context, tools, and constraints, and then let it run multiple steps in parallel—installing dependencies, writing tests, debugging, and even documenting the outcome. You then review outputs, steer the next round, and gradually refine the agent’s instructions.

This isn’t a replacement for engineering judgment. It’s a layer on top: your job becomes decomposing work, defining what success looks like, and deciding which parts to hand off and which to keep close. The “productivity flywheel” turns faster when you can treat the agent as a high?leverage assistant that can keep going while you move up the stack.

Software 3.0, In Practice

Karpathy has long framed this as Software 3.0—the evolution of programming from:

Software 1.0: explicit code written in languages like C++ or Python, where the programmer spells out every step.
Software 2.0: neural networks trained on data, where the “program” is a dataset and training objective rather than a long list of rules.
Software 3.0: natural?language?driven agents that compose systems, debug problems, and manage long?running workflows, while still relying on 1.0 and 2.0 components underneath.

The February releases of Codex 5.2 and Claude Code made Software 3.0 feel tangible. It’s no longer a thought experiment; it’s something practitioners can use today for tasks that are well?specified and easy to verify—infrastructure setup, data pipelines, internal tooling, and boilerplate?heavy workflows.

What This Means for Practitioners

The implication isn’t that “everyone will be a programmer.” It’s that the nature of programming is changing. The most valuable skills are no longer just fluency in a language, but:

Decomposing complex work into agent?friendly tasks,
Designing interfaces and documentation that models can use effectively,
Building feedback loops and guardrails so agents can operate safely, and
Knowing when to lean in (complex, under?specified logic) and when to lean out (repetitive, well?structured work).

Karpathy’s point is that the default workflow is no longer “you write code line by line.” The era where the editor is the center of the universe is ending. Programming is becoming less about keystrokes and more about direction, oversight, and iteration—with AI agents as the new layer of execution in between.

Leading Theorists and Influences

Karpathy’s views draw from pioneers in AI and agents. Ilya Sutskever, his OpenAI co-founder, advanced sequence models like GPT, enabling natural language programming. At Tesla, Ashok Elluswamy and the Autopilot team influenced his emphasis on human-AI loops and ‘autonomy sliders.’ Broader influences include Andrew Ng, under whom Karpathy studied at Stanford, popularising deep learning education, and Yann LeCun, whose convolutional networks underpin vision AI. Recent agentic work echoes Yohei Nakajima’s BabyAGI (2023), an early autonomous agent framework, and Microsoft’s AutoGen for multi-agent systems. Karpathy positions agents as a new ‘consumer of digital information,’ urging infrastructure redesign for LLM autonomy.1 2 3

Implications for the Future

This shift promises unprecedented productivity but demands new skills: fluency across paradigms, agent management, and ‘applied psychology of neural nets.’ As Karpathy notes, ‘everyone is now a programmer’ via English, yet professionals must build for agents – rewriting codebases and creating agent-friendly interfaces. With LLM capabilities surging by late 2025, 2026 heralds a ‘high energy’ phase of industry adaptation.1 4

References

1. https://www.businessinsider.com/agentic-engineering-andrej-karpathy-vibe-coding-2026-2

2. https://www.youtube.com/watch?v=LCEmiRjPEtQ

3. https://singjupost.com/andrej-karpathy-software-is-changing-again/

4. https://paweldubiel.com/42l1%E2%81%9D–Andrej-Karpathy-quote-26-Jan-2026-

5. https://www.christopherspenn.com/2024/07/mind-readings-generative-ai-as-a-programming-language/

6. https://www.ycombinator.com/library/MW-andrej-karpathy-software-is-changing-again

7. https://karpathy.ai/tweets.html

Term: Agent2Agent (A2A)

“The Agent2Agent (A2A) protocol is an open standard that enables different AI agents, built by various vendors and using diverse frameworks, to seamlessly communicate, collaborate, and coordinate on complex tasks.” – Agent2Agent (A2A)

A2A addresses the challenges of multi-agent systems by providing a vendor-neutral framework for agents to discover each other, exchange capabilities, delegate tasks, and manage complex workflows.^1,2,3 It leverages familiar web standards such as HTTP, JSON-RPC, and Server-Sent Events (SSE) to ensure reliable, interoperable interactions while incorporating enterprise-grade security features like JWT and OIDC authentication.¹

Key Features of A2A

Agent Discovery and Capabilities Exchange: Agents publish standardised ‘Agent Cards’ (JSON files) that detail their abilities, enabling dynamic discovery and task negotiation.^1,3
Structured Task Management: Defines protocols for task delegation using unique task IDs, supporting states like submitted, working, and completed, ideal for long-running processes.^1,3
Standards-Based Communication: Uses HTTP POST requests and structured JSON messages for consistent messaging between client agents (task initiators) and remote agents (task executors).^1,3
Enterprise Security and Privacy: Includes encryption, fine-grained authorisation, payload validation, and support for various authentication schemes to protect data and identities.^1,2
Support for Collaboration: Facilitates message exchanges for context sharing, real-time updates via asynchronous notifications, and dynamic UX negotiation.^1,3

How A2A Works

A2A operates on a client-server model: the client agent formulates tasks and identifies suitable remote agents via Agent Cards, then communicates structured requests over HTTP.³ Tasks progress through defined lifecycles with messages containing parts for content delivery, ensuring agents remain synchronised even in opaque, diverse environments.^1,3

For example, in e-commerce, an inventory agent could use A2A to collaborate with demand forecasting, customer service, and logistics agents to optimise supply chains.⁵

Key Theorist: Sundar Pichai and Google’s Role in A2A

No single ‘strategy theorist’ in the traditional academic sense originated A2A, as it is a practical engineering protocol driven by industry leaders. The most directly associated figure is **Sundar Pichai**, CEO of Google and Alphabet Inc., whose strategic vision propelled its development and announcement.⁴

Biography of Sundar Pichai

Born in 1972 in Madurai, India, Sundar Pichai grew up in a modest middle-class family. He excelled academically, earning a degree in metallurgical engineering from the Indian Institute of Technology Kharagpur in 1993. Pichai then pursued higher education in the US, obtaining an MS in materials science from Stanford University and an MBA from the Wharton School of the University of Pennsylvania.¹ (Note: Biographical details drawn from general knowledge, aligned with A2A context.)

Joining Google in 2004, Pichai initially led product management for Google Chrome, transforming it into the world’s most-used browser through innovative strategies emphasising speed, security, and user-centric design. His success led to promotions: Vice President of Product Development (2008), overseeing Chrome OS and apps; Senior VP for Chrome and Android (2012); and Chief Business Officer (2014). In 2015, he became CEO of Google, and in 2019, CEO of parent company Alphabet Inc.⁴ (contextual link).

Relationship to A2A

Under Pichai’s leadership, Google prioritised AI agent interoperability as part of its broader AI strategy, culminating in the A2A protocol’s announcement via the Google Developers Blog in 2025.⁴ Pichai’s emphasis on open standards mirrors his earlier work on Chrome’s open-source model, fostering ecosystems over proprietary silos. A2A embodies his vision for ‘a new era of agent interoperability,’ enabling secure multi-agent collaboration across frameworks – much like Android unified mobile ecosystems.^1,4

Pichai’s strategic oversight ensured A2A adhered to principles of discovery, interoperability, delegation, and trust, positioning Google as a leader in agentic AI infrastructure while inviting broad industry adoption through its open GitHub repository.⁷

Tags: Agent2Agent, A2A, agents, AI, artificial intelligence, term

References

1. https://www.solo.io/topics/ai-infrastructure/what-is-a2a

2. https://developer.pingidentity.com/identity-for-ai/agents/idai-what-is-a2a.html

3. https://www.descope.com/learn/post/a2a

4. https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/

5. https://www.alumio.com/blog/what-is-a2a-agent2agent-ai-protocol

6. https://www.credal.ai/blog/what-is-agent2agent-a2a-protocol

7. https://github.com/a2aproject/A2A

8. https://ai.pydantic.dev/a2a/

9. https://www.youtube.com/watch?v=Tud9HLTk8hg

Quote: Arthur Mensch – Mistral CEO

“There’s no such thing as one system that is going to be solving all the problems of the world. You don’t have any human able to solve every task in the world. You of course need some amount of specialisation to solve problems.” – Arthur Mensch – Mistral CEO

Arthur Mensch’s observation about specialisation in artificial intelligence reflects a fundamental principle that has shaped not only his work at Mistral AI, but also the broader trajectory of how we think about building intelligent systems. The statement emerges from a pragmatic understanding of complexity-one that draws parallels between human expertise and machine learning, whilst challenging the prevailing assumption that larger, more generalised models represent the inevitable future of AI.

The Context: A Moment of Inflection in AI Development

When Mensch made this statement on the Big Technology Podcast in January 2026, the artificial intelligence landscape was at a critical juncture. The initial euphoria surrounding large language models like GPT-4 and their apparent ability to handle diverse tasks had begun to give way to a more nuanced understanding of their limitations. Organisations deploying these systems were discovering that whilst general-purpose models could perform adequately across many domains, they rarely excelled in any single domain. The cost of running these massive systems, combined with their mediocre performance on specialised tasks, created an opening for a different approach-one that Mensch and Mistral AI have been actively pursuing since the company’s founding in May 2023.

Mensch’s background as a machine learning researcher with a PhD in machine learning and functional magnetic resonance imaging, combined with his experience at Google DeepMind working on large language models, positioned him uniquely to recognise this gap. His two co-founders, Guillaume Lample and Timothée Lacroix, brought complementary expertise from Meta’s AI research division. Together, they had witnessed firsthand the capabilities and constraints of cutting-edge AI systems, and they recognised that the industry was pursuing a path that, whilst impressive in breadth, lacked depth.

The Philosophy Behind Mistral’s Approach

Mistral AI’s strategy directly operationalises Mensch’s philosophy about specialisation. Rather than attempting to build a single monolithic system that claims to solve all problems, the company has focused on developing smaller, more efficient models that can be tailored to specific use cases. This approach has proven remarkably successful: within four months of founding, Mistral released its 7B model, which outperformed larger competitors in many benchmarks. The company achieved unicorn status-a valuation exceeding $1 billion-within its first year, a trajectory that vindicated Mensch’s conviction that specialisation was not merely philosophically sound but commercially viable.

The emphasis on smaller models that can run locally on devices, rather than requiring centralised cloud infrastructure, represents a practical manifestation of this specialisation principle. A financial services institution, for instance, can deploy a model specifically optimised for fraud detection or regulatory compliance, rather than relying on a general-purpose system that must compromise between countless competing objectives. A healthcare provider can implement a model trained on medical literature and patient data, rather than one diluted by training on the entire internet. This is not merely more efficient; it is fundamentally more effective.

Theoretical Foundations: The Specialisation Principle in Machine Learning

Mensch’s assertion draws upon well-established principles in machine learning and cognitive science. The concept of specialisation in learning systems has deep roots in the field. In the 1990s and 2000s, researchers including Yann LeCun and Geoffrey Hinton-pioneers in deep learning-recognised that neural networks trained on specific tasks often outperformed more generalised architectures. This principle, sometimes referred to as the “bias-variance tradeoff,” suggests that systems optimised for particular problems can achieve superior performance by accepting constraints that would be inappropriate for general-purpose systems.

The analogy to human expertise is particularly apt. A world-class cardiologist possesses knowledge and intuition that a general practitioner cannot match, despite the latter’s broader medical knowledge. This specialisation comes from years of focused study, deliberate practice, and exposure to patterns specific to their domain. Similarly, an AI system trained extensively on financial data, with architectural choices optimised for temporal sequences and numerical relationships, will outperform a general model on financial forecasting tasks. The human brain itself demonstrates this principle: different regions specialise in different functions, and whilst there is integration across these regions, the specialisation is fundamental to cognitive capability.

This principle also aligns with recent research in transfer learning and domain adaptation. Researchers including Fei-Fei Li at Stanford have demonstrated that models pre-trained on large, diverse datasets often require substantial fine-tuning to perform well on specific tasks. The fine-tuning process essentially involves re-specialising the model, suggesting that the initial generalisation, whilst useful as a starting point, is not the endpoint of effective AI development.

The Commoditisation Argument

Embedded within Mensch’s statement is an implicit argument about the commoditisation of AI. If a single system could genuinely solve all problems, it would represent the ultimate commodity-a universal tool that would rapidly become standardised and undifferentiated. The fact that no such system exists, and that the laws of machine learning suggest none can exist, means that competitive advantage in AI will increasingly accrue to those who can build specialised systems tailored to specific domains and use cases.

This has profound implications for the structure of the AI industry. Rather than a winner-take-all market dominated by a handful of companies with the largest models, Mensch’s vision suggests a more distributed ecosystem where numerous companies build specialised solutions. Mistral’s open-source strategy supports this vision: by releasing models that developers can fine-tune and adapt, the company enables a proliferation of specialised applications rather than enforcing dependence on a single centralised system.

The comparison to human society is instructive. We do not have a single human who solves all problems; instead, we have a complex division of labour with specialists in countless domains. The most advanced societies are those that have developed the most sophisticated mechanisms for specialisation and coordination. An AI ecosystem that mirrors this structure-with specialised systems coordinating to solve complex problems-may ultimately prove more capable and more resilient than one built around monolithic general-purpose systems.

Implications for the Future of Work and AI Deployment

Mensch has articulated elsewhere his vision for how AI will transform work. Rather than replacing human workers wholesale, AI will handle routine, well-defined tasks, freeing humans to focus on activities that require creativity, relationship management, and novel problem-solving. This vision is entirely consistent with the specialisation principle: specialised AI systems handle their specific domains, whilst humans focus on the uniquely human aspects of work. A specialised AI system for document processing, another for customer service routing, and another for data analysis can work in concert, each excelling in its domain, with human judgment and creativity orchestrating their outputs.

This approach also addresses concerns about AI safety and alignment. A specialised system optimised for a specific task, with clear boundaries and well-defined objectives, is inherently more interpretable and controllable than a general-purpose system trained to optimise for performance across thousands of disparate tasks. The constraints that make a system specialised also make it more trustworthy.

The Broader Intellectual Landscape

Mensch’s perspective aligns with emerging consensus among leading AI researchers. Yann LeCun, Chief AI Scientist at Meta, has increasingly emphasised the limitations of large language models and the need for AI systems with different architectures and training approaches for different tasks. Demis Hassabis, CEO of Google DeepMind, has similarly highlighted the importance of building AI systems with appropriate inductive biases for their intended domains. The field is gradually moving away from the assumption that scale and generality are sufficient, towards a more nuanced understanding of how to build effective AI systems.

This intellectual shift reflects a maturation of the field. The initial excitement about large language models was justified-they represented a genuine breakthrough in our ability to build systems that could engage in flexible, language-based reasoning. However, the assumption that this breakthrough would generalise to all domains, and that bigger models would always be better, has proven naive. The next phase of AI development will likely be characterised by greater diversity in approaches, architectures, and training methodologies, with specialisation playing an increasingly central role.

Mensch’s Role in Shaping This Future

Arthur Mensch’s significance lies not merely in his articulation of these principles, but in his demonstrated ability to execute on them. Mistral AI’s rapid ascent-achieving a $2.1 billion valuation within approximately two years of founding-suggests that the market recognises the validity of the specialisation approach. The company’s success in attracting top talent, securing substantial venture funding, and building a platform that developers actively choose to build upon indicates that Mensch’s vision resonates with practitioners who understand the practical constraints of deploying AI systems.

In 2024, Mensch was recognised on TIME’s 100 Next list, an acknowledgment of his influence on the future direction of technology. The recognition highlighted his ability to combine “bold vision with execution,” his commitment to democratising AI through open-source models, and his foresight in addressing gaps overlooked by others. These qualities-vision, execution, and attention to overlooked opportunities-are precisely what the specialisation principle requires.

Mensch’s background as an academic researcher who transitioned to entrepreneurship also shapes his approach. Unlike entrepreneurs who might prioritise rapid growth and market dominance above all else, Mensch brings a researcher’s commitment to understanding fundamental principles. His insistence on specialisation is not a marketing narrative but a reflection of his deep understanding of how learning systems actually work.

Conclusion: A Principle for the Age of AI

The statement that “there’s no such thing as one system that is going to be solving all the problems of the world” may seem obvious in retrospect, but it represents a crucial corrective to the prevailing assumptions of the AI industry. It grounds AI development in principles drawn from human expertise, cognitive science, and machine learning theory. It suggests that the future of AI is not a race to build ever-larger models, but rather a more sophisticated ecosystem of specialised systems, each optimised for its domain, working in concert to solve complex problems.

For organisations deploying AI, for researchers developing new approaches, and for policymakers considering how to regulate AI development, Mensch’s principle offers clear guidance: invest in specialisation, build systems with appropriate constraints for their domains, and recognise that the most powerful AI systems will likely be those that do one thing exceptionally well, rather than many things adequately. In an age of increasing complexity, specialisation is not a limitation but a necessity-and a source of genuine competitive advantage.

References

1. https://www.allamericanspeakers.com/celebritytalentbios/Arthur+Mensch/462557

2. https://www.mckinsey.com/featured-insights/insights-on-europe/videos-and-podcasts/creating-a-european-ai-unicorn-interview-with-arthur-mensch-ceo-of-mistral-ai

3. https://blog.eladgil.com/p/discussion-w-arthur-mensch-ceo-of

4. https://time.com/collections/time100-next-2024/7023471/arthur-mensch-2/

5. https://thecreatorsai.com/p/the-story-of-arthur-mensch-how-to

6. https://www.antoinebuteau.com/lessons-from-arthur-mensch/

Quote: Jamie Dimon – JP Morgan Chase CEO

“I see a couple people doing some dumb things. They’re just doing dumb things to create NII.” – Jamie Dimon – JP Morgan Chase CEO

In a candid assessment delivered at JPMorgan Chase’s 2026 company update on 23 February, CEO Jamie Dimon voiced profound concerns about the financial landscape, drawing direct parallels to the reckless lending practices that precipitated the 2008 global financial crisis. He observed competitors engaging in imprudent strategies purely to inflate net interest income (NII), a key profitability metric derived from lending spreads and investments^1,3. This remark underscores Dimon’s longstanding vigilance amid buoyant markets, where high asset prices and surging volumes foster complacency^1,2.

Jamie Dimon’s Background and Leadership

Jamie Dimon, born in 1956 in New York to Greek immigrant parents, embodies the archetype of a battle-hardened banker. Educated at Tufts University and Harvard Business School, he ascended through the ranks at American Express and Citigroup before co-founding Bank One in 1991, where he orchestrated a remarkable turnaround. In 2004, he assumed the helm of JPMorgan Chase following its acquisition of Bank One, steering the institution through the 2008 crisis as one of the few major banks to emerge unscathed. Under his stewardship, JPMorgan has ballooned into the world’s most valuable bank by market capitalisation, with Dimon earning renown for his prescient risk management and forthright annual shareholder letters¹. His tenure has been marked by navigating geopolitical tensions, regulatory scrutiny, and technological disruptions, all while prioritising capital strength over opportunistic growth.

Context of the Quote: A Market on the Brink?

Dimon’s comments arrived against a backdrop of intensifying competition in lending and private credit markets, where firms scramble to capture market share amid elevated interest rates and economic optimism. He likened the current environment to 2005-2007, when ‘the rising tide was lifting all boats’ and excessive leverage permeated the system, culminating in subprime mortgage meltdowns^1,2,3. Recent indicators, such as the collapse of subprime auto lender Tricolor Holdings and debt-burdened First Brands, evoked Dimon’s ‘cockroach theory’ – spotting one signals an infestation¹. Broader anxieties include artificial intelligence’s disruptive potential across sectors like software, utilities, and telecommunications, mirroring unforeseen vulnerabilities exposed in 2008^2,3. Despite S&P 500 highs, Dimon cautioned that credit cycles invariably turn, with surprises lurking in unexpected quarters³. JPMorgan, he affirmed, adheres strictly to underwriting standards, forgoing business rather than compromising¹.

Leading Theorists on Financial Crises and Risk-Taking

Dimon’s perspective resonates with seminal theories on financial instability. Hyman Minsky, the American economist whose ‘financial instability hypothesis’ (developed in the 1970s and 1980s) posits that stability breeds complacency, prompting speculative and Ponzi financing schemes that amplify booms into busts. Minsky argued that prolonged prosperity erodes risk aversion, much as Dimon describes today’s ‘dumb things’ to chase NII¹.

Complementing this, Charles Kindleberger’s Manias, Panics, and Crashes (1978, updated editions) outlines the anatomy of bubbles: displacement, boom, euphoria, profit-taking, and panic. Kindleberger, building on Kindleberger’s historical analyses, highlighted herd behaviour and leverage as crisis harbingers, echoing Dimon’s pre-2008 parallels².

Modern extensions include Raghuram Rajan, former IMF Chief Economist and Reserve Bank of India Governor, whose 2005 Jackson Hole speech presciently warned of incentives driving financial institutions towards systemic risks. Rajan’s ‘search for yield’ concept – akin to boosting NII through lax lending – anticipated 2008 excesses³.

Nouriel Roubini, dubbed ‘Dr Doom’, forecasted the 2008 subprime debacle in 2006, emphasising global imbalances, debt overhangs, and asset bubbles. His framework aligns with Dimon’s cycle warnings, stressing confluence events like AI disruptions or policy shifts².

These theorists collectively illuminate Dimon’s caution: markets’ euphoria masks fragility, demanding disciplined risk assessment amid competitive pressures.

Implications for Investors and Markets

Heightened Vigilance: Dimon’s stance signals potential volatility in private credit and lending, urging scrutiny of banks’ NII strategies.
Sectoral Risks: AI-driven upheavals could mirror 2008’s utility surprises, impacting software and beyond.
JPMorgan’s Edge: Conservative positioning may yield resilience, as proven in prior downturns.

Dimon’s words serve as a clarion call: prosperity’s siren song often precedes turbulence. Prudent navigation demands heeding history’s lessons.

References

1. https://www.businessinsider.com/jamie-dimon-banks-doing-dumb-things-2008-credit-crisis-warning-2026-2

2. https://economictimes.com/markets/stocks/news/jpmorgan-ceo-jamie-dimon-warns-ai-and-dumb-things-can-trigger-a-2008-like-crisis/articleshow/128770717.cms

3. https://www.news18.com/business/banking-finance/jpmorgan-chase-ceo-warns-of-dumb-risk-taking-by-financial-firms-sees-echoes-of-2008-crisis-ws-l-9926903.html

4. https://en.sedaily.com/international/2026/02/24/jpmorgan-ceo-dimon-warns-of-pre-2008-crisis-similarities

Term: AI skills

“Skills are essentially curated instructions containing best practices, guidelines, and workflows that AI can reference when performing particular types of work. They’re like expert manuals that help AI produce higher-quality outputs for specialised tasks.” – AI skills

AI skills are structured sets of curated instructions, best practices, guidelines, and workflows that artificial intelligence systems reference when performing particular types of work. They function as expert manuals or knowledge repositories, enabling AI to produce higher-quality outputs for specialised tasks by drawing on accumulated domain expertise and proven methodologies.

Unlike general-purpose AI capabilities, skills represent a layer of curation and refinement that transforms raw AI capacity into contextually appropriate, task-specific performance. They embody the principle that filter intelligence-the ability to distinguish valuable information from noise-has become essential in an AI-driven world, where the volume of available data and potential outputs far exceeds what any individual or system can meaningfully process.

Core Characteristics

Structured Knowledge: Skills organise information into actionable formats that AI systems can readily access and apply, rather than requiring the system to search through unstructured data.
Domain Specificity: Each skill is tailored to particular types of work, ensuring that AI outputs reflect the nuances, standards, and best practices of that domain.
Quality Enhancement: By constraining AI outputs to established guidelines and proven workflows, skills improve consistency, accuracy, and relevance compared to unconstrained generation.
Continuous Refinement: Like knowledge curation more broadly, skills require ongoing maintenance, verification, and updating to remain accurate and aligned with evolving practices.
Human-AI Collaboration: Skills represent the intersection of human expertise and AI capability-humans curate and validate the instructions; AI applies them at scale.

Practical Applications

AI skills manifest across multiple contexts:

Learning and Development: Curated training materials, course recommendations, and procedural documentation that AI systems use to personalise employee learning pathways and deliver relevant content.
Content Generation: Guidelines for tone, style, accuracy standards, and domain-specific terminology that shape AI-generated text, ensuring outputs match organisational voice and quality expectations.
Technical Documentation: Structured workflows and best practices that enable AI to generate or organise software documentation, reducing search time and improving accessibility.
Knowledge Management: Taxonomies, metadata standards, and verification protocols that help AI systems organise, categorise, and validate information within organisational knowledge bases.
Decision Support: Curated decision trees, risk assessment frameworks, and contextual guidelines that enable AI to provide recommendations aligned with organisational values and risk tolerance.

The Relationship to Filter Intelligence

AI skills are fundamentally about curation-the process of selecting, organising, verifying, and enriching information to make it more useful and trustworthy. In an age where AI can generate vast quantities of content and analysis, the critical human skill is no longer the ability to process information (which AI can do at scale) but rather the ability to filter, judge, and curate what matters.

This reflects a broader shift in how organisations and individuals must operate. Traditional intelligence-the ability to learn facts and processes-can now be outsourced to AI. What cannot be outsourced is the judgment required to determine which AI outputs are accurate, which are misleading, and which are worth acting upon. AI skills encode this judgment into reusable, systematised form.

Implementation Considerations

Effective AI skills require:

Clear ownership and accountability for skill development and maintenance
Regular audits to identify outdated or conflicting guidance
Verification processes to ensure accuracy and relevance
Accessible documentation that explains not just what to do but why and when
Integration with broader content governance policies
Feedback loops that allow AI systems and human users to surface gaps or failures in skill application

Related Theorist: Charles Fadel

Charles Fadel is an educational theorist and thought leader whose work directly addresses the role of curation in an AI-driven world. His framework for education in the age of artificial intelligence places curation at the centre of how organisations and individuals must adapt.

Biographical Context

Fadel is the founder and chairman of the Centre for Curriculum Redesign, an international non-profit organisation dedicated to rethinking education for the 21st century. He has held leadership roles at the World Economic Forum and has been instrumental in developing competency frameworks that emphasise skills beyond traditional knowledge acquisition. His background spans education policy, curriculum design, and futures thinking, positioning him at the intersection of pedagogy and technological change.

Relationship to AI Skills and Curation

In his work Education for the Age of AI, Fadel articulates a vision in which curation becomes a foundational competency. He argues that as AI systems become more powerful and capable of handling routine information processing, the human role must shift toward curating knowledge rather than merely acquiring it. This directly parallels the concept of AI skills: just as humans must learn to curate and judge AI outputs, organisations must curate the instructions and best practices that guide AI systems themselves.

Fadel distinguishes between three types of knowledge: declarative (facts and figures), procedural (how to do things), and conceptual (understanding why). He contends that in an AI age, organisations should prioritise procedural and conceptual knowledge-precisely the elements that constitute effective AI skills. An AI skill is not a collection of facts; it is a curated set of procedures and conceptual frameworks that enable consistent, high-quality performance.

Furthermore, Fadel emphasises what he calls the Drivers-agency, identity, purpose, and motivation-as essential human capacities that cannot be automated. AI skills, in this framework, are tools that free humans from routine tasks so they can focus on these higher-order capacities. By encoding best practices into skills, organisations enable their AI systems to handle specialised work whilst their human teams concentrate on judgment, creativity, and strategic direction.

Fadel’s work also highlights the importance of critical thinking and creativity as priority competencies. These are precisely the capacities required to develop, refine, and validate AI skills. Someone must decide what constitutes a best practice, what guidelines are most relevant, and when a skill requires updating. This curation work is fundamentally creative and critical-it requires immersion in a domain, the ability to distinguish signal from noise, and the judgment to make difficult trade-offs about what to include and what to exclude.

Conclusion

AI skills represent a practical instantiation of curation as a core competency in an AI-driven world. They embody the principle that as machines become more capable at processing information and generating outputs, human value increasingly lies in the ability to curate, judge, and refine. By systematising best practices and domain expertise into reusable skills, organisations create a feedback loop in which AI systems produce higher-quality work, humans can focus on higher-order judgment, and the organisation’s collective knowledge becomes more accessible and trustworthy.

References

1. https://ocasta.com/glossary/internal-comms/ai-driven-content-curation-for-employees/

2. https://www.digitallearninginstitute.com/blog/ai-transformative-effect-on-curating-content

3. https://www.glitter.io/glossary/knowledge-curation

4. https://futureiq.substack.com/p/curate-your-consumption-the-most

5. https://www.gettingsmart.com/2025/09/16/3-human-skills-that-make-you-irreplaceable-in-an-ai-world/

6. https://spencereducation.com/content-curation-ai/

7. https://www.techclass.com/resources/learning-and-development-articles/how-ld-teams-can-curate-smarter-content-with-ai

8. https://ploko.nl/en/knowledge-base/ai-content-curation/

Global Advisors | Quantified Strategy Consulting

“Tool calling (often called function calling) is a technical capability in modern AI systems-specifically Large Language Models (LLMs)-that allows the model to interact with external tools, APIs, or databases to perform tasks beyond its own training data.” – Tool calling

How Tool Calling Works

Defining Tools and Functions

Types of Tool Calling

Primary Use Cases

Practical Applications

Key Distinction: Tools vs Functions

Related Strategy Theorist: Andrew Ng

Share this:

“Diffusion models are a class of generative artificial intelligence (AI) models that create new data instances by learning to reverse a gradual, step-by-step process of adding noise to training data.” – Diffusion models

Core Mechanism

Key Components and Architecture

Advantages Over Alternative Approaches

Applications and Impact

Mathematical Foundation

Theoretical Lineage: Yoshua Bengio and Deep Learning Foundations

Share this:

“Model density” in AI, particularly regarding LLMs, is a performance-efficiency metric defined as the ratio of a model’s effective capability (performance) to its total parameter size.” – Model density

The Core Concept

Information Compression and the “Great Squeeze”

Semantic Density and Output Reliability

Intelligence Density in Practical Application

The Exponential Progress Trend

Related Theorist: Ilya Sutskever and Scaling Laws

Share this:

“Model weights are the crucial numerical parameters learned during training that define a model’s internal knowledge, dictating how input data is transformed into outputs and enabling it to recognise patterns and make predictions.” – Model weights

Key Theorist: Geoffrey Hinton

Share this:

Key Theorist: Alex L. Zhang and the MIT Origins

Share this:

Context of the Quote

Who is Jensen Huang?

What is OpenClaw?

Backstory: Linux’s Enduring Legacy

Leading Theorists in AI Agents and Open Source AI

Implications for AI and Open Source

Share this:

“Mixture of Experts (MoE) is an efficient neural network architecture that uses multiple specialised sub-models (experts) and a gating network (router) to dynamically select and activate only the most relevant experts for a given input.” – Mixture of Experts (MoE)

Core Architecture and Components

How MoE Achieves Efficiency

Historical Development and Key Theorist: Noam Shazeer

Applications and Performance

Challenges and Considerations

Share this:

“A harness (often called an agent harness or agentic harness) is an external software framework that wraps around a Large Language Model (LLM) to make it functional, durable, and capable of taking actions in the real world.” – AI harness

Core Functions and Architecture

Modular Design and Components

Performance Impact and Practical Benefits

Evolution and Strategic Importance

Related Theorist: Allen Newell and Cognitive Architecture

Share this:

“A loss function, also known as a cost function, is a mathematical function that quantifies the difference between a model’s predicted output and the actual ‘ground truth’ value for a given input.” – Loss function

Core Purpose and Function

Key Roles in Machine Learning

Common Loss Function Types

Related Strategy Theorist: Vladimir Vapnik

Practical Significance

Share this:

“Scaffolding refers to the structured architecture and instructional techniques built around an AI model to enhance its reasoning, reliability, and capability.” – AI scaffolding

Core Components of AI Scaffolding

Types of AI Scaffolding Techniques

Practical Applications and Enterprise Use

Scaffolding versus Model Scale

Key Theorist: Stuart Russell and the Alignment Research Tradition

Share this:

“I think the harder thing to measure has always been tech projects. That’s been true my whole life. It’s also been true my whole life, the tech is what changes everything, like everything.” – Jamie Dimon – JP Morgan Chase CEO

Jamie Dimon’s Career and Leadership at JPMorgan Chase

Context of the Quote: JPMorgan’s 2026 Strategic Framework

Leading Theorists on Technology Measurement and Impact

Share this:

“A world model is defined as a learned neural representation that simulates the dynamics of an environment, enabling an AI agent to predict future states and reason about the consequences of its actions.” – World model

Key Capabilities and Advantages

Role in Achieving Artificial General Intelligence (AGI)

Best Related Strategy Theorist: Yann LeCun

Share this:

“An AI Data Center is a highly specialized, power-dense physical facility designed specifically to train, deploy, and run artificial intelligence (AI) models, machine learning (ML) algorithms, and generative AI applications.” – AI Data Centre

Key Theorist: Jensen Huang and the GPU Revolution

Share this:

“Edge devices are physical computing devices located at the ‘edge. of a network, close to where data is generated or consumed, that run AI algorithms and models locally rather than relying exclusively on a centralised cloud or data center.” – Edge devices