Global Advisors

Our selection of the top business news sources on the web.

AM edition. Issue number 953

Latest 10 stories. Click the button for more.

Read More

Infographic: Four critical DeepSeek enablers

The DeepSeek team has introduced several high-impact changes to Large Language Model (LLM) architecture to enhance performance and efficiency:

  1. Multi-Head Latent Attention (MLA): This mechanism enables the model to process multiple facets of input data simultaneously, improving both efficiency and performance. MLA reduces the memory required to compute a transformer's attention by a factor of roughly 7.5 to 20, a breakthrough that makes large-scale AI applications more feasible. Unlike Flash Attention, which improves how attention data is organized in memory, MLA compresses the key-value (KV) cache into a lower-dimensional latent space, cutting cache memory to around 5% to 13% of that used by standard attention while maintaining performance (see the first sketch after this list).
  2. Mixture-of-Experts (MoE) Architecture: DeepSeek employs an MoE system that activates only a subset of its total parameters for any given task. In DeepSeek-V3, only 37 billion of the 671 billion parameters are active at a time, significantly reducing computational cost. This approach enhances efficiency and aligns with the trend towards more compute-light AI models, allowing freed-up GPU resources to be allocated to multi-modal processing, spatial intelligence, or genomic analysis. MoE, also used by Mistral and other leading AI labs, allows models to scale while keeping inference costs manageable (a simple routing sketch follows this list).
  3. FP8 Floating-Point Precision: To enhance computational efficiency, DeepSeek-V3 is trained using FP8 floating-point precision, which reduces memory usage and accelerates computation. This follows a broader trend in AI towards optimized training methodologies and may influence the approach taken by U.S.-based LLM providers. Given China's restricted access to high-end GPUs under U.S. export controls, optimizations such as FP8 and MLA are critical to overcoming hardware limitations (a back-of-envelope memory comparison follows this list).
  4. DeepSeek-R1 and Test-Time Compute Capabilities: DeepSeek-R1 uses reinforcement learning (RL) to exploit additional compute at inference time, significantly improving its reasoning capabilities. The model was trained using an innovative RL strategy, incorporating fine-tuned Chain of Thought (CoT) data and supervised fine-tuning (SFT) data across multiple domains. Notably, DeepSeek demonstrated that a sufficiently capable LLM can be transformed into a high-performance reasoning model using only around 800k curated training samples, allowing rapid adaptation of smaller models, such as Qwen and Llama-70B, into competitive reasoners.
  5. Distillation to Smaller Models: The team has developed distilled versions of its models, such as DeepSeek-R1-Distill, which are fine-tuned on synthetic data generated by the larger models. These distilled models contain far fewer parameters, making them more efficient while retaining much of the capability. DeepSeek's ability to achieve comparable reasoning performance at a fraction of the cost of OpenAI's models (5% of the cost, according to Pelliccione) has disrupted the AI landscape (a toy distillation sketch closes the examples after this list).
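
The sketch below illustrates the KV-cache compression idea behind MLA in plain NumPy. All dimensions, weight matrices, and the single-projection setup are illustrative assumptions, not DeepSeek's actual configuration (which also handles rotary position embeddings separately); the point is simply that caching a small latent per token, instead of full keys and values, shrinks the cache dramatically.

    import numpy as np

    # Illustrative dimensions (assumptions, not DeepSeek's real config)
    d_model  = 4096    # hidden size
    n_heads  = 32
    d_head   = 128     # per-head dimension
    d_latent = 512     # compressed KV latent dimension
    seq_len  = 1024

    rng = np.random.default_rng(0)
    h = rng.standard_normal((seq_len, d_model)).astype(np.float32)  # token hidden states

    # Standard multi-head attention: cache full keys and values per token.
    W_k = rng.standard_normal((d_model, n_heads * d_head)).astype(np.float32)
    W_v = rng.standard_normal((d_model, n_heads * d_head)).astype(np.float32)
    k_cache, v_cache = h @ W_k, h @ W_v
    standard_bytes = k_cache.nbytes + v_cache.nbytes

    # MLA (simplified): cache only a low-rank latent per token and
    # reconstruct keys and values from it when attention is computed.
    W_down = rng.standard_normal((d_model, d_latent)).astype(np.float32)
    W_up_k = rng.standard_normal((d_latent, n_heads * d_head)).astype(np.float32)
    W_up_v = rng.standard_normal((d_latent, n_heads * d_head)).astype(np.float32)
    latent_cache = h @ W_down                       # the only thing stored
    k_recon, v_recon = latent_cache @ W_up_k, latent_cache @ W_up_v  # rebuilt on demand
    mla_bytes = latent_cache.nbytes

    print(f"standard KV cache: {standard_bytes / 1e6:.1f} MB")
    print(f"MLA latent cache:  {mla_bytes / 1e6:.1f} MB "
          f"({100 * mla_bytes / standard_bytes:.0f}% of standard)")

With these illustrative sizes the latent cache is about 6% of the standard KV cache, in line with the 5% to 13% range quoted above.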
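Next, a simple top-k routing sketch for the MoE idea. The expert count, sizes, gating scheme, and ReLU feed-forward blocks are assumptions chosen for brevity; DeepSeek-V3 uses far more routed experts plus shared experts. The essential behavior is that only the selected experts do any work for a given token.

    import numpy as np

    rng = np.random.default_rng(0)
    n_experts, top_k, d_model = 8, 2, 64   # illustrative sizes

    # Each "expert" is a tiny two-layer feed-forward network.
    experts = [
        (rng.standard_normal((d_model, 4 * d_model)) * 0.02,
         rng.standard_normal((4 * d_model, d_model)) * 0.02)
        for _ in range(n_experts)
    ]
    router_w = rng.standard_normal((d_model, n_experts)) * 0.02

    def moe_layer(x):
        """Route each token to its top-k experts; only those experts run."""
        logits = x @ router_w                               # (tokens, n_experts)
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        chosen = np.argsort(-probs, axis=-1)[:, :top_k]     # experts per token
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            for e in chosen[t]:
                w1, w2 = experts[e]
                gate = probs[t, e] / probs[t, chosen[t]].sum()   # renormalized gate
                out[t] += gate * (np.maximum(x[t] @ w1, 0.0) @ w2)
        return out

    tokens = rng.standard_normal((4, d_model))
    print(moe_layer(tokens).shape)   # (4, 64); only 2 of 8 experts ran per token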
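The memory argument behind FP8 can be sense-checked with simple arithmetic. The figures below cover raw weight storage only and ignore gradients, optimizer state, and activations, using the 671 billion parameter count quoted above.

    # Bytes per parameter: FP32 = 4, BF16 = 2, FP8 = 1
    params = 671e9   # DeepSeek-V3 total parameters (from the text above)
    for fmt, bytes_per_param in (("FP32", 4), ("BF16", 2), ("FP8", 1)):
        print(f"{fmt}: {params * bytes_per_param / 1e12:.2f} TB just for the weights")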
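Finally, a toy sketch of the distillation pipeline in point 5: a large "teacher" labels a synthetic dataset and a much smaller "student" is fit to its outputs. The random-feature models and least-squares fit here are stand-ins chosen purely for illustration; DeepSeek-R1-Distill instead fine-tunes small language models on roughly 800k samples of text generated by the larger model, but the principle of transferring capability into a far smaller model is the same.

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_teacher, d_student, n_samples = 32, 1024, 16, 5000

    # The "teacher": a large, expensive random-feature model.
    W1 = rng.standard_normal((d_in, d_teacher)) * 0.1
    w2 = rng.standard_normal(d_teacher) * 0.1
    def teacher(x):
        return np.tanh(x @ W1) @ w2

    # 1) The teacher labels a synthetic dataset (the "curated samples").
    X = rng.standard_normal((n_samples, d_in))
    y_teacher = teacher(X)

    # 2) Fit the small student to the teacher's outputs (least squares here;
    #    an LLM would use supervised fine-tuning on the generated text).
    P = rng.standard_normal((d_in, d_student)) * 0.3
    w_student, *_ = np.linalg.lstsq(np.tanh(X @ P), y_teacher, rcond=None)

    # 3) The distilled student mimics the teacher at a fraction of the size.
    X_test = rng.standard_normal((1000, d_in))
    gap = np.mean((np.tanh(X_test @ P) @ w_student - teacher(X_test)) ** 2)
    print(f"student params: {P.size + w_student.size:,} "
          f"vs teacher params: {W1.size + w2.size:,}")
    print(f"mean squared gap to the teacher: {gap:.4f}")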

The Impact of Open-Source Models:

DeepSeek's success highlights a fundamental shift in AI development. Traditionally, leading-edge models have been closed-source and controlled by Western AI firms like OpenAI, Google, and Anthropic. However, DeepSeek's approach, leveraging open-source components while innovating on training efficiency, has disrupted this dynamic. Pelliccione notes that DeepSeek now offers similar performance to OpenAI at just 5% of the cost, making high-quality AI more accessible. This shift pressures proprietary AI companies to rethink their business models and embrace greater openness.

Challenges and Innovations in the Chinese AI Ecosystem:

China's AI sector faces major constraints, particularly in access to high-performance GPUs due to U.S. export restrictions. Yet, Chinese companies like DeepSeek have turned these challenges into strengths through aggressive efficiency improvements. MLA and FP8 precision optimizations exemplify how innovation can offset hardware limitations. Furthermore, Chinese AI firms, historically focused on scaling existing tech, are now contributing to fundamental advancements in AI research, signaling a shift towards deeper innovation.

The Future of AI Control and Adaptation:

DeepSeek-R1’s approach to training AI reasoners poses a challenge to traditional AI control mechanisms. Since reasoning capabilities can now be transferred to any capable model with fewer than a million curated samples, AI governance must extend beyond compute resources and focus on securing datasets, training methodologies, and deployment platforms. OpenAI has previously obscured Chain of Thought traces to prevent leakage, but DeepSeek’s open-weight release and published RL techniques have made such restrictions ineffective.

Broader Industry Context:

  • DeepSeek benefits from Western open-source AI developments, particularly Meta's open release of its Llama models, which provided a foundation for its advancements. However, DeepSeek's success also demonstrates that China is shifting from scaling existing technology to innovating at the frontier.
  • Open-source models like DeepSeek will see widespread adoption for enterprise and research applications, though Western businesses are unlikely to build their consumer apps on a Chinese API.
  • The AI innovation cycle is exceptionally fast, with breakthroughs assessed daily or weekly. DeepSeek’s advances are part of a rapidly evolving competitive landscape dominated by U.S. big tech players like OpenAI, Google, Microsoft, and Meta, who continue to push for productization and revenue generation. Meanwhile, Chinese AI firms, despite hardware and data limitations, are innovating at an accelerated pace and have proven capable of challenging OpenAI’s dominance.

These innovations collectively contribute to more efficient and effective LLMs, balancing performance with resource utilization while shaping the future of AI model development.

Sources: Global Advisors, Jack Clark - Anthropic, Antoine Blondeau, Alberto Pelliccione, infoq.com, medium.com, en.wikipedia.org, arxiv.org


PODCAST: Effective Transfer Pricing

Our Spotify podcast discusses how to get transfer pricing right.

We discuss effective transfer pricing within organizations, highlighting the prevalent challenges and proposing solutions. The core issue is that poorly implemented internal pricing leads to suboptimal economic decisions, resource allocation problems, and interdepartmental conflict. The hosts advocate for market-based pricing over cost recovery, emphasizing the importance of clear price signals for efficient resource allocation and accurate decision-making. They stress the need for service level agreements, fair cost allocation, and a comprehensive process to manage the political and emotional aspects of internal pricing, ultimately aiming for improved organizational performance and profitability. The podcast includes case studies illustrating successful implementations and the authors' expertise in this field.
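
To make the pricing-signal argument concrete, here is a small worked example with assumed, illustrative numbers (they are not taken from the podcast). Charging an internal service at fully loaded cost rather than the market price pushes the buying unit to outsource, even when keeping the work in-house is cheaper for the firm as a whole.

    # Assumed figures: an internal unit can supply a component for a variable
    # cost of 60; its full cost with allocated overhead is 100; the external
    # market price is 80.
    variable_cost, full_cost, market_price, units = 60.0, 100.0, 80.0, 1_000

    # Cost-recovery transfer price: the buyer is charged full cost (100), so it
    # rationally buys outside at 80, and the firm pays 80 in cash per unit
    # instead of incurring 60 of incremental cost.
    value_destroyed = (market_price - variable_cost) * units

    # Market-based transfer price: the buyer is charged the market price (80),
    # keeps the work in-house, and the firm's incremental cost stays at 60.
    print(f"value destroyed by cost-recovery pricing: {value_destroyed:,.0f} per period")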

Read more from the original article.


PODCAST: A strategic take on cost-volume-profit analysis

Our Spotify podcast highlights that, despite familiarity with the tool, most managers either do not apply CVP analysis at all or get it wrong in its most basic form.

The hosts explain cost-volume-profit (CVP) analysis, a crucial business tool that is often misapplied. They cover the theoretical underpinnings of CVP, using graphs to illustrate the relationships between price, volume, and profit, and highlight common errors in its application, such as neglecting the volume changes that follow price increases, which lead to the "margin-price-volume death spiral." They offer practical advice and strategic questions to improve CVP analysis and decision-making, emphasizing the need for accurate costing and a nuanced understanding of market dynamics. Finally, the podcast provides case studies illustrating both successful and unsuccessful CVP implementations.
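
To see why ignoring volume responses is so dangerous, the short calculation below (with illustrative numbers, not figures from the podcast) works out how much volume a price change can afford to lose, or must gain, before profit deteriorates.

    # Illustrative baseline: price 100, variable cost 60, fixed cost 200,000,
    # volume 10,000 units, so baseline profit is 200,000.
    price, var_cost, fixed_cost, volume = 100.0, 60.0, 200_000.0, 10_000

    def profit(p, q):
        return q * (p - var_cost) - fixed_cost

    base = profit(price, volume)

    for change in (-0.10, +0.10):            # a 10% price cut and a 10% increase
        new_price = price * (1 + change)
        # volume required to keep profit unchanged at the new price
        required_volume = (base + fixed_cost) / (new_price - var_cost)
        print(f"price {change:+.0%}: volume must change by "
              f"{required_volume / volume - 1:+.1%} to hold profit at {base:,.0f}")

With these numbers, a 10% price cut needs a third more volume just to stand still, while a 10% increase can absorb a 20% volume loss: exactly the asymmetry behind the death spiral.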

Read more from the original article.


Quote: Dario Amodei

"If we want AI to favor democracy and individual rights, we are going to have to fight for that outcome."

Dario Amodei
CEO, Anthropic


Quote: Dario Amodei

"It’s my guess that powerful AI could at least 10x the rate of these discoveries, giving us the next 50-100 years of biological progress in 5-10 years."

Dario Amodei
CEO, Anthropic


Quote: Dario Amodei

“I think that most people are underestimating just how radical the upside of AI could be, just as I think most people are underestimating how bad the risks could be.”

Dario Amodei
CEO, Anthropic


Quote: Sam Altman

“Build a company that benefits from the model getting better and better ... I encourage people to be aligned with that.”

Sam Altman
CEO, OpenAI



PODCAST: Your Due Diligence is Most Likely Wrong

Our Spotify podcast explores why most mergers and acquisitions fail to create value and provides a practical guide to performing a strategic due diligence process.

The hosts highlight common pitfalls such as overpaying for acquisitions, failing to understand the true value of a deal, and neglecting to account for future uncertainties. They emphasize that a successful deal depends on a clear strategic rationale, a thorough understanding of the target's competitive position, and a comprehensive assessment of potential risks. They then present a four-stage approach to strategic due diligence that incorporates scenario planning and probabilistic simulations to quantify uncertainty and guide decision-making. Finally, they discuss how to navigate deal-making during economic downturns and stress the importance of securing existing businesses, revisiting return measures, prioritizing potential targets, and factoring in potential delays.
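
Because the approach leans on probabilistic simulation, here is a minimal Monte Carlo sketch of the idea. Every input (revenue, growth, margin, synergies, discount rate, horizon, and price) is an illustrative assumption rather than the hosts' model; the output is a distribution of deal value rather than a single point estimate.

    import numpy as np

    rng = np.random.default_rng(42)
    n_sims = 100_000

    # Assumed deal inputs: uncertain growth, margin and synergies are sampled
    # rather than treated as point estimates.
    base_revenue = 500.0                               # revenue in year 0 ($m)
    growth  = rng.normal(0.05, 0.03, n_sims)           # annual growth
    margin  = rng.normal(0.12, 0.02, n_sims)           # cash margin
    synergy = rng.triangular(0, 10, 25, n_sims)        # annual synergies ($m)
    wacc, years, price = 0.10, 5, 800.0                # discount rate, horizon, bid

    value = np.zeros(n_sims)
    for t in range(1, years + 1):
        cash_flow = base_revenue * (1 + growth) ** t * margin + synergy
        value += cash_flow / (1 + wacc) ** t
    terminal = base_revenue * (1 + growth) ** years * margin + synergy
    value += terminal / wacc / (1 + wacc) ** years     # simple perpetuity terminal value

    npv = value - price
    print(f"median value created: {np.median(npv):.0f}m")
    print(f"probability the deal destroys value: {np.mean(npv < 0):.0%}")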

Read more from the original article.


Quote: Sam Altman

“Building a business - man that's the brass ring: the rules still apply. You can do it faster than ever before and better than ever before, but you still have to build a business.”

Sam Altman
CEO, OpenAI


Quote: Andrew Ng

"We're making this analogy that AI is the new electricity. Electricity transformed industries: agriculture, transportation, communication, manufacturing."

Andrew Ng
