
ARTIFICIAL INTELLIGENCE

An AI-native strategy firm

Global Advisors: a consulting leader in defining quantified strategy, decreasing uncertainty, improving decisions and achieving measurable results.

Global Advisors AI

A Different Kind of Partner in an AI World

AI-native strategy consulting

Experienced hires

We are hiring experienced top-tier strategy consultants

Quantified Strategy

Decreased uncertainty, improved decisions

Global Advisors is a leader in defining quantified strategies, decreasing uncertainty, improving decisions and achieving measurable results.

We specialise in providing highly analytical, data-driven recommendations in the face of significant uncertainty.

We utilise advanced predictive analytics to build robust strategies and enable our clients to make calculated decisions.

We support implementation of adaptive capability and capacity.

Our latest

Thoughts

Podcast – The Real AI Signal from Davos 2026


While the headlines from Davos were dominated by geopolitical conflict and debates on AGI timelines and asset bubbles, a different signal emerged from the noise. It wasn’t about if AI works, but how it is being ruthlessly integrated into the real economy.

In our latest podcast, we break down the “Diffusion Strategy” defining 2026.

3 Key Takeaways:

  1. China and the “Global South” are trying to leapfrog: While the West debates regulation, emerging economies are treating AI as essential infrastructure.
    • China has set a goal for 70% AI diffusion by 2027.
    • The UAE has mandated AI literacy in public schools from K-12.
    • Rwanda is using AI to quadruple its healthcare workforce.
  2. The Rise of the “Agentic Self”: We aren’t just using chatbots anymore; we are employing agents. Entrepreneur Steven Bartlett revealed he has established a “Head of Experimentation and Failure” to use AI to disrupt his own business before competitors do. Musician will.i.am argued that in an age of predictive machines, humans must cultivate their “agentic self” to handle the predictable, while remaining unpredictable themselves.
  3. Rewiring the Core: Uber’s CEO Dara Khosrowshahi noted the difference between an “AI veneer” and a fundamental rewire. It’s no longer about summarising meetings; it’s about autonomous agents resolving customer issues without scripts.

The Global Advisors Perspective: Don’t wait for AGI. The current generation of models is sufficient to drive massive value today. The winners will be those who control their “sovereign capabilities” – embedding their tacit knowledge into models they own.

Read our original perspective here – https://with.ga/w1bd5

Listen to the full breakdown here – https://with.ga/2vg0z

Strategy Tools

Fast Facts

Fast Fact: Great returns aren’t enough


Key insights

It’s not enough to just have great returns – top-line growth is just as critical.

In fact, S&P 500 investors rewarded high-growth companies more than high-ROIC companies over the past decade.

While the distinction was less clear on the JSE, what is clear is that getting a balance of growth and returns is critical.

Strong and consistent ROIC or RONA performers provide investors with a steady stream of cash flows; without growth, the share is effectively a fixed-income instrument.

Improvements in ROIC through margin improvements, efficiencies and working-capital optimisation provide point-in-time uplifts to share price.

Top-line growth provides a compounding mechanism: ROIC (and any improvements to it) is compounded each year, leading to ongoing increases in share price.

However, without acceptable levels of ROIC, the benefits of compounding will be subdued and share price appreciation will be depressed – and when ROIC is below WACC, value will be destroyed.

Maintaining high levels of growth is not as sustainable as maintaining high levels of ROIC – while both typically decline as industries mature, growth is usually more affected.

Getting the right balance between ROIC and growth is critical to optimising shareholder value.
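The interaction between growth, ROIC and WACC described above can be illustrated with the standard textbook value-driver formula, value = NOPAT × (1 − g / ROIC) / (WACC − g). The sketch below is an illustration only, not the analysis behind the S&P 500 or JSE observations; the function name and all input figures are hypothetical.

```python
def continuing_value(nopat, growth, roic, wacc):
    """Value-driver formula: NOPAT * (1 - g / ROIC) / (WACC - g).

    All inputs are illustrative assumptions; rates are decimals (0.10 = 10%).
    """
    if roic <= 0:
        raise ValueError("ROIC must be positive")
    if growth >= wacc:
        raise ValueError("growth must be below WACC for the formula to converge")
    reinvestment_rate = growth / roic          # share of NOPAT needed to fund growth
    free_cash_flow = nopat * (1 - reinvestment_rate)
    return free_cash_flow / (wacc - growth)


# Hypothetical scenarios with NOPAT = 100 and WACC = 10%.
scenarios = {
    "no growth, ROIC 15%": (0.00, 0.15),
    "4% growth, ROIC 15%": (0.04, 0.15),
    "4% growth, ROIC = WACC": (0.04, 0.10),
    "4% growth, ROIC 8%": (0.04, 0.08),
}
for label, (growth, roic) in scenarios.items():
    print(f"{label}: {continuing_value(100, growth, roic, 0.10):.0f}")
```

Under these assumed inputs, growth adds value only when ROIC exceeds WACC, is neutral when the two are equal, and reduces value when ROIC falls below WACC.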


Selected News

Term: Mixture of Experts (MoE)


“Mixture of Experts (MoE) is an efficient neural network architecture that uses multiple specialised sub-models (experts) and a gating network (router) to dynamically select and activate only the most relevant experts for a given input.” – Mixture of Experts (MoE)

This architectural approach divides a large artificial intelligence model into separate sub-networks, each specialising in processing specific types of input data. Rather than activating the entire network for every task, MoE models employ a gating mechanism, often called a router, that intelligently selects which experts should process each input. This selective activation introduces sparsity into the network, meaning only a fraction of the model’s total parameters are used for any given computation.1,3

Core Architecture and Components

The fundamental structure of MoE consists of two essential elements:4

  • Expert networks: Multiple specialised sub-networks, typically implemented as feed-forward neural networks (FFNs), each with its own set of learnable parameters. These experts become skilled at handling specific patterns or types of data during training.1
  • Gating network (router): A trainable mechanism that evaluates each input and determines which expert or combination of experts is best suited to process it. This routing function is computationally efficient, enabling the model to make rapid decisions about expert selection.1,3

In practical implementations, such as the Mixtral 8x7B language model, each layer contains multiple experts: in Mixtral’s case, eight separate feed-forward blocks. Despite the name, the experts do not hold 7 billion parameters each; because the attention layers are shared, the model totals roughly 47 billion parameters, of which about 13 billion are active per token. For every token processed, the router selects only a subset of these experts (in Mixtral’s case, two out of eight) to perform the computation, then combines their outputs before passing the result to the next layer.3
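The routing step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code of any particular model: the experts are hypothetical stand-ins that simply scale their input, the router weights are random, and renormalising the gate weights with a softmax over the selected experts is one common choice among several.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical stand-in for a feed-forward expert: scales its input."""
    return lambda vector: [scale * x for x in vector]


def moe_layer(token, router_weights, experts, top_k=2):
    """Route one token vector through its top_k experts and mix their outputs."""
    # Router: one logit per expert (dot product of the token with that expert's row).
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Sparsity: keep only the top_k experts by logit; the rest are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Gate weights are renormalised over the chosen experts only.
    gates = softmax([logits[i] for i in chosen])
    output = [0.0] * len(token)
    for gate, index in zip(gates, chosen):
        expert_out = experts[index](token)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen


random.seed(0)
dim, n_experts = 4, 8
router_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
experts = [make_expert(i + 1) for i in range(n_experts)]
output, chosen = moe_layer([1.0, 0.5, -0.5, 2.0], router_weights, experts)
print("experts used:", chosen)
```

Only two of the eight expert functions are called for the token, which is the source of the compute saving; in a real model the router weights and experts are learned jointly.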

How MoE Achieves Efficiency

MoE models leverage conditional computation to reduce computational burden without sacrificing model capacity.3 This approach enables several efficiency gains:

  • Models can scale to billions of parameters whilst maintaining manageable inference costs, since not all parameters are activated for every input.1,3
  • Training can occur with significantly less compute, allowing researchers to either reduce training time or expand model and dataset sizes.4
  • Experts can be distributed across multiple devices through expert parallelism, enabling efficient large-scale deployments.1
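The first point above comes down to simple arithmetic: total parameters grow with the number of experts, while the parameters used per token grow only with the number of experts selected. The sketch below assumes a simplified model in which every parameter is either shared (for example, attention layers) or belongs to exactly one expert; the function name and the figures are hypothetical and do not describe any real model.

```python
def moe_param_counts(shared_params, params_per_expert, n_experts, top_k):
    """Return (total, active-per-token) parameter counts for a simplified MoE model."""
    total = shared_params + n_experts * params_per_expert
    active = shared_params + top_k * params_per_expert
    return total, active


# Hypothetical configuration: 2B shared parameters, 8 experts of 5B each, 2 active per token.
total, active = moe_param_counts(2_000_000_000, 5_000_000_000, n_experts=8, top_k=2)
print(f"total: {total / 1e9:.0f}B, active per token: {active / 1e9:.0f}B "
      f"({active / total:.0%} of the model)")
```

Doubling the number of experts in this toy configuration would nearly double the total while leaving the active count unchanged, which is why capacity can scale faster than inference cost.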

Experts that the gating mechanism selects frequently receive more updates during training and so improve faster, which can cause routing to collapse onto a few favoured experts. Load-balancing mechanisms, such as an auxiliary loss term, counteract this by distributing computational work more evenly across experts and preventing bottlenecks.1
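One widely used load-balancing mechanism is an auxiliary loss of the form alpha × N × Σ f_i × P_i, as popularised by the Switch Transformer for top-1 routing, where f_i is the fraction of tokens sent to expert i and P_i is the mean router probability for expert i. The sketch below is an illustration under that assumption, not necessarily the mechanism the cited sources have in mind; the function name, the toy inputs and the default alpha are illustrative.

```python
def load_balancing_loss(router_probs, assignments, n_experts, alpha=0.01):
    """Auxiliary loss alpha * N * sum_i(f_i * P_i) for top-1 routing.

    router_probs: per-token list of router probabilities, one per expert.
    assignments:  per-token index of the expert the token was sent to.
    """
    n_tokens = len(router_probs)
    # f_i: fraction of tokens dispatched to each expert.
    fraction = [0.0] * n_experts
    for expert_index in assignments:
        fraction[expert_index] += 1.0 / n_tokens
    # P_i: mean router probability assigned to each expert.
    mean_prob = [sum(probs[i] for probs in router_probs) / n_tokens
                 for i in range(n_experts)]
    return alpha * n_experts * sum(f * p for f, p in zip(fraction, mean_prob))


# Toy example with 4 tokens and 2 experts (all numbers are made up).
balanced = load_balancing_loss([[0.5, 0.5]] * 4, [0, 1, 0, 1], n_experts=2)
collapsed = load_balancing_loss([[0.9, 0.1]] * 4, [0, 0, 0, 0], n_experts=2)
print(f"balanced: {balanced:.4f}, collapsed: {collapsed:.4f}")
```

The loss is smallest when tokens and router probability are spread uniformly, so adding it to the training objective penalises the collapse onto a few experts.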

Historical Development and Key Theorist: Noam Shazeer

Noam Shazeer stands as the primary architect of modern MoE systems in deep learning. In 2017, Shazeer and colleagues, including the legendary Geoffrey Hinton and Google’s Jeff Dean, introduced the Sparsely-Gated Mixture-of-Experts Layer for recurrent neural language models.1,4 This seminal work fundamentally transformed how researchers approached scaling neural networks.

Shazeer’s contribution was revolutionary because it reintroduced the mixture of experts concept, which had existed in earlier machine learning literature, into the deep learning era. His team scaled this architecture to a 137-billion-parameter LSTM model, demonstrating that sparsity could maintain very fast inference even at massive scale.4 Although this initial work focused on machine translation and encountered challenges such as high communication costs and training instabilities, it established the theoretical and practical foundation for all subsequent MoE research.4

Shazeer’s background as a researcher at Google positioned him at the intersection of theoretical machine learning and practical systems engineering. His work exemplified a crucial insight: that not all parameters in a neural network need to be active simultaneously. This principle has since become foundational to modern large language model design, influencing architectures used by leading AI organisations worldwide. The Sparsely-Gated Mixture-of-Experts Layer introduced the trainable gating network concept that remains central to MoE implementations today, enabling conditional computation that balances model expressiveness with computational efficiency.1

Applications and Performance

MoE architectures have demonstrated faster training and comparable or superior performance to dense language models on many benchmarks, particularly in multi-domain tasks where different experts can specialise in different knowledge areas.1 Applications span natural language processing, computer vision, and recommendation systems.2

Challenges and Considerations

Despite their advantages, MoE systems present implementation challenges. Load balancing remains critical: when experts are distributed across multiple devices, uneven expert selection can create memory and computational bottlenecks, with some experts handling significantly more tokens than others.1 Additionally, distributed training complexity and the need for careful tuning to maintain stability and efficiency require sophisticated engineering approaches.1

References

1. https://neptune.ai/blog/mixture-of-experts-llms

2. https://www.datacamp.com/blog/mixture-of-experts-moe

3. https://www.ibm.com/think/topics/mixture-of-experts

4. https://huggingface.co/blog/moe

5. https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mixture-of-experts

6. https://www.youtube.com/watch?v=sYDlVVyJYn4

7. https://arxiv.org/html/2503.07137v1

8. https://cameronrwolfe.substack.com/p/moe-llms



Services

Global Advisors is different

We help clients to measurably improve strategic decision-making and the results they achieve through defining clearly prioritised choices, reducing uncertainty, winning hearts and minds and partnering to deliver.

Our difference is embodied in our team. Our values define us.

Corporate portfolio strategy

Define optimal business portfolios aligned with investor expectations

Business unit strategy

Define how to win against competitors

Reach full potential

Understand your business’ core, reach full potential and grow into optimal adjacencies

Deal advisory

M&A, due diligence, deal structuring, balance sheet optimisation

Global Advisors Digital Data Analytics

14 years of quantitative and data science experience

An enabler to delivering quantified strategy and accelerated implementation

Digital enablement, acceleration and data science

Leading-edge data science and digital skills

Experts in large data processing, analytics and data visualisation

Developers of digital proof-of-concepts

An accelerator for Global Advisors and our clients

Join Global Advisors

We hire and grow amazing people

Consultants join our firm based on a fit with our values, culture and vision. They believe in and are excited by our differentiated approach. They realise that working on our clients’ most important projects is a privilege. While the problems we solve are strategic to clients, consultants recognise that solutions primarily require hard work – rigorous and thorough analysis, partnering with client team members to overcome political and emotional obstacles, and a large investment in knowledge development and self-growth.

Get In Touch

16th Floor, The Forum, 2 Maude Street, Sandton, Johannesburg, South Africa
+27114616371

Global Advisors | Quantified Strategy Consulting