
ARTIFICIAL INTELLIGENCE

An AI-native strategy firm

Global Advisors: a consulting leader in defining quantified strategy, decreasing uncertainty, improving decisions, and achieving measurable results.


A Different Kind of Partner in an AI World

AI-native strategy consulting

Experienced hires

We are hiring experienced top-tier strategy consultants

Quantified Strategy

Decreased uncertainty, improved decisions

Global Advisors is a leader in defining quantified strategies, decreasing uncertainty, improving decisions, and achieving measurable results.

We specialise in providing highly analytical, data-driven recommendations in the face of significant uncertainty.

We utilise advanced predictive analytics to build robust strategies and enable our clients to make calculated decisions.

We support implementation of adaptive capability and capacity.

Our latest


Fast Facts

Selected News

Term: Transformer architecture


“The Transformer architecture is a deep learning model that processes entire data sequences in parallel, using an attention mechanism to weigh the significance of different elements in the sequence.” – Transformer architecture

Definition

The **Transformer architecture** is a deep learning model that processes entire data sequences in parallel, using an attention mechanism to weigh the significance of different elements in the sequence.1,2

It represents a neural network architecture based on multi-head self-attention, where text is converted into numerical tokens via tokenisers and embeddings, allowing parallel computation without recurrent or convolutional layers.1,3 Key components include:

  • Tokenisers and Embeddings: Convert input text into integer tokens and vector representations, incorporating positional encodings to preserve sequence order.1,4
  • Encoder-Decoder Structure: Stacked layers of encoders (self-attention and feed-forward networks) generate contextual representations; decoders add cross-attention to incorporate encoder outputs.1,5
  • Multi-Head Attention: Computes attention in parallel across multiple heads, capturing diverse relationships like syntactic and semantic dependencies.1,2
  • Feed-Forward Layers and Residual Connections: Refine token representations with position-wise networks, stabilised by layer normalisation.4,5
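As an illustration of the positional encodings mentioned above (a minimal numpy sketch, not code from the source), the original Transformer paper's sinusoidal scheme assigns each position a fixed pattern of sines and cosines so that order information survives parallel processing:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings: even dimensions use
    sin(pos / 10000^(2i/d_model)), odd dimensions use the matching cosine."""
    pos = np.arange(seq_len)[:, None]       # token positions 0..seq_len-1, as a column
    i = np.arange(0, d_model, 2)[None, :]   # even embedding indices 0, 2, 4, ...
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)            # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)            # odd dimensions get cosine
    return pe

pe = positional_encoding(seq_len=10, d_model=16)
print(pe.shape)  # (10, 16): one d_model-sized encoding per position
```

These vectors are simply added to the token embeddings before the first attention layer, so no learned parameters are needed to represent order.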

The attention mechanism is defined mathematically as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{QK^{T}}{\sqrt{d_k}} \right) V$$

where Q, K, V are query, key, and value matrices, and d_k is the dimension of the keys.1

Introduced in 2017, Transformers excel in tasks like machine translation, text generation, and beyond, powering models such as BERT and GPT by handling long-range dependencies efficiently.3,6

Key Theorist: Ashish Vaswani

Ashish Vaswani is a lead author of the seminal paper “Attention Is All You Need”, which introduced the Transformer architecture, fundamentally shifting deep learning paradigms.1,2

Born in India, Vaswani earned his Bachelor’s in Computer Science from the Indian Institute of Technology Bombay. He pursued a PhD at the University of Massachusetts Amherst, focusing on machine learning and natural language processing. Post-PhD, he joined Google Brain in 2015, where he collaborated with Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin on the Transformer paper presented at NeurIPS 2017.1

Vaswani’s relationship to the term stems from co-inventing the architecture to address limitations of recurrent neural networks (RNNs) in sequence transduction tasks like translation. The team hypothesised that pure attention mechanisms could enable parallelisation, outperforming RNNs in speed and scalability. This innovation eliminated sequential processing bottlenecks, enabling training on massive datasets and spawning the modern era of large language models.2,6

Currently a research scientist at Google, Vaswani continues advancing AI efficiency and scaling laws, with his work cited over 100,000 times, cementing his influence on artificial intelligence.1

References

1. https://en.wikipedia.org/wiki/Transformer_(deep_learning)

2. https://poloclub.github.io/transformer-explainer/

3. https://www.datacamp.com/tutorial/how-transformers-work

4. https://www.jeremyjordan.me/transformer-architecture/

5. https://d2l.ai/chapter_attention-mechanisms-and-transformers/transformer.html

6. https://blogs.nvidia.com/blog/what-is-a-transformer-model/

7. https://www.ibm.com/think/topics/transformer-model

8. https://www.geeksforgeeks.org/machine-learning/getting-started-with-transformers/



Services

Global Advisors is different

We help clients to measurably improve strategic decision-making and the results they achieve through defining clearly prioritised choices, reducing uncertainty, winning hearts and minds and partnering to deliver.

Our difference is embodied in our team. Our values define us.

Corporate portfolio strategy

Define optimal business portfolios aligned with investor expectations

Business unit strategy

Define how to win against competitors

Reach full potential

Understand your business’ core, reach full potential and grow into optimal adjacencies

Deal advisory

M&A, due diligence, deal structuring, balance sheet optimisation

Global Advisors Digital Data Analytics

14 years of quantitative and data science experience

An enabler to delivering quantified strategy and accelerated implementation

Digital enablement, acceleration and data science

Leading-edge data science and digital skills

Experts in large data processing, analytics and data visualisation

Developers of digital proof-of-concepts

An accelerator for Global Advisors and our clients

Join Global Advisors

We hire and grow amazing people

Consultants join our firm based on a fit with our values, culture and vision. They believe in and are excited by our differentiated approach. They realise that working on our clients’ most important projects is a privilege. While the problems we solve are strategic to clients, consultants recognise that solutions primarily require hard work – rigorous and thorough analysis, partnering with client team members to overcome political and emotional obstacles, and a large investment in knowledge development and self-growth.

Get In Touch

16th Floor, The Forum, 2 Maude Street, Sandton, Johannesburg, South Africa
+27 11 461 6371

Global Advisors | Quantified Strategy Consulting