“Model weights are the crucial numerical parameters learned during training that define a model’s internal knowledge, dictating how input data is transformed into outputs and enabling it to recognise patterns and make predictions.” – Model weights
Model weights represent the learnable numerical parameters within a neural network that determine how input data is processed to generate predictions, functioning similarly to synaptic strengths in a biological brain.1,2,4 These values control the influence of specific features on the output, such as edges in images or tokens in language models, through operations like matrix multiplications, convolutions, or weighted sums across layers.1,2,3 Initially randomised, weights are optimised during training via algorithms like gradient descent, which iteratively adjust them to minimise a loss function measuring the difference between predictions and actual targets.1,2,5
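The forward computation can be made concrete with a minimal sketch in plain Python. It assumes a tiny hypothetical 3-4-2 network with no biases and a ReLU between layers; these are illustrative choices only. The helpers `init_weights` and `dense` are made up for this example. Once the weights are fixed, each output is simply a chain of weighted sums of the inputs.

```python
import random

def init_weights(n_in, n_out, seed=0):
    """Randomly initialise an n_out x n_in weight matrix with small uniform values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def dense(weights, x):
    """Each output is a weighted sum of the inputs: y[j] = sum_i W[j][i] * x[i]."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) for row in weights]

def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, a) for a in v]

# Hypothetical 3 -> 4 -> 2 network: the weights alone decide how inputs map to outputs
W1 = init_weights(3, 4, seed=1)
W2 = init_weights(4, 2, seed=2)
x = [1.0, 0.5, -2.0]
hidden = relu(dense(W1, x))
output = dense(W2, hidden)
print(output)
```

Changing any entry of `W1` or `W2` changes `output` for the same `x`. Training, as described above, is the search for the entries that make these outputs match the targets.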
For a simple linear regression model, y = wx + b, the weight w scales the input x to predict y, while b is the bias term.2 More complex architectures include convolutional neural networks (CNNs) and large language models (LLMs). Their weights include the values of convolutional filters that detect textures and the matrices of fully connected layers that combine features. In LLMs these weights often number in the billions.1,2,5 Learned weights enable tasks ranging from image classification to real-time translation. Pre-trained weights also make transfer learning possible: a model trained on one dataset can be fine-tuned on a custom one.1
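The training loop described above can be sketched for the linear model y = wx + b. This is a minimal batch gradient-descent fit on mean squared error, assuming a hypothetical learning rate, epoch count and synthetic data generated from y = 2x + 1. The function `fit_linear` is illustrative and does not come from any library.

```python
import random

def fit_linear(xs, ys, lr=0.05, epochs=2000, seed=0):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1.0, 1.0), 0.0  # weight starts random, bias starts at zero
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current predictions
        errs = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of L = (1/n) * sum(err^2) with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / n * sum(errs)
        # Step both parameters against their gradients
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]  # synthetic data from y = 2x + 1
w, b = fit_linear(xs, ys)
print(round(w, 3), round(b, 3))
```

The learning rate is kept small on purpose. For this data, a step size much above about 0.15 would make the updates overshoot and diverge instead of settling on w = 2 and b = 1.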
Weights are distinct from biases, which add a learned offset to the weighted sum before the activation function is applied. Both are updated together during forward and backward propagation.3,6 Protecting these parameters is vital, because they encode the model's performance, robustness, and decision logic; unauthorised changes can cause the model to malfunction.5 In LLMs, the weights determine how much emphasis the model places on particular words and associations, which shapes its generated output.3
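The difference between weights and biases shows up in a single neuron. This minimal sketch assumes a sigmoid activation and arbitrary parameter values, chosen only for illustration, and uses a hypothetical `neuron` helper. When every input is zero, the weights contribute nothing, and only the bias can move the output.

```python
import math

def sigmoid(z):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(weights, bias, x):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

w = [0.8, -0.4]
x = [0.0, 0.0]  # all-zero inputs: the weighted sum is 0 regardless of w
print(neuron(w, 0.0, x))   # 0.5, since without a bias the output is stuck at sigmoid(0)
print(neuron(w, 2.0, x))   # a positive bias alone shifts the output above 0.5
print(neuron(w, -2.0, x))  # a negative bias shifts it below 0.5
```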
Key Theorist: Geoffrey Hinton
The preeminent theorist linked to model weights is **Geoffrey Hinton**, often called the ‘Godfather of Deep Learning’ for pioneering backpropagation and neural network training techniques that optimise these parameters.1,2 Hinton’s seminal 1986 paper with David Rumelhart and Ronald Williams popularised backpropagation, the cornerstone algorithm for adjusting weights layer-by-layer based on error gradients, revolutionising machine learning.2,4
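Backpropagation, mentioned above, can be illustrated with a minimal sketch. It assumes a toy two-layer network with tanh hidden units, no biases and a squared-error loss. These are illustrative choices, not the exact setup of the 1986 paper. The helper names `forward`, `backprop` and `sgd_step` are hypothetical. The code computes every weight's gradient by applying the chain rule from the output layer backwards. It then checks one entry against a finite-difference estimate and takes a single gradient-descent step.

```python
import math

def forward(W1, w2, x):
    """Hidden layer h = tanh(W1 x); scalar output y = w2 . h (no biases, for brevity)."""
    z = [sum(wji * xi for wji, xi in zip(row, x)) for row in W1]
    h = [math.tanh(zj) for zj in z]
    y = sum(w2j * hj for w2j, hj in zip(w2, h))
    return h, y

def loss(W1, w2, x, t):
    """Squared error L = 0.5 * (y - t)^2 for one example."""
    _, y = forward(W1, w2, x)
    return 0.5 * (y - t) ** 2

def backprop(W1, w2, x, t):
    """Gradients dL/dW1 and dL/dw2 via the chain rule, output layer first."""
    h, y = forward(W1, w2, x)
    dy = y - t                                   # dL/dy
    grad_w2 = [dy * hj for hj in h]              # dL/dw2[j] = dy * h[j]
    # Send the error back through w2 and the tanh derivative (1 - h^2)
    dz = [dy * w2j * (1.0 - hj ** 2) for w2j, hj in zip(w2, h)]
    grad_W1 = [[dzj * xi for xi in x] for dzj in dz]  # dL/dW1[j][i] = dz[j] * x[i]
    return grad_W1, grad_w2

def sgd_step(W1, w2, grad_W1, grad_w2, lr):
    """Move every weight a small step against its gradient."""
    new_W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, grad_W1)]
    new_w2 = [w - lr * g for w, g in zip(w2, grad_w2)]
    return new_W1, new_w2

# Hypothetical weights and a single training example
W1 = [[0.1, -0.2], [0.4, 0.3]]
w2 = [0.5, -0.6]
x, t = [1.0, 2.0], 1.0

gW1, gw2 = backprop(W1, w2, x, t)

# Check one backprop gradient against a central finite-difference estimate
eps = 1e-6
W1_plus = [row[:] for row in W1]
W1_minus = [row[:] for row in W1]
W1_plus[0][1] += eps
W1_minus[0][1] -= eps
numeric = (loss(W1_plus, w2, x, t) - loss(W1_minus, w2, x, t)) / (2 * eps)
print(gW1[0][1], numeric)  # the two values should agree closely

W1_new, w2_new = sgd_step(W1, w2, gW1, gw2, lr=0.1)
print(loss(W1, w2, x, t), loss(W1_new, w2_new, x, t))  # loss before vs after one step
```

The finite-difference comparison is a common way to catch mistakes in hand-derived gradients. Frameworks with automatic differentiation compute the same quantities without manual derivation.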
Hinton was born in 1947 in Wimbledon, London, into a line of scientists. His great-great-grandfather George Boole devised Boolean logic. His great-grandfather Charles Howard Hinton was a mathematician who wrote on the fourth dimension and coined the word 'tesseract'. Hinton first studied experimental psychology at Cambridge (BA 1970) and then earned a PhD in artificial intelligence from the University of Edinburgh in 1978. In the 1980s he co-invented Boltzmann machines, early stochastic neural networks with learnable weights. Sceptical of symbolic AI, he championed connectionism, which models brain-like learning through adjustable connection weights.
Hinton kept working on neural networks through the 1980s and 1990s, when they had fallen out of favour, first at Carnegie Mellon and then at Toronto. In 2006 he and colleagues showed that deep belief networks could be trained by pre-training one layer at a time as a restricted Boltzmann machine. This unsupervised approach helped overcome the difficulty of training deep networks. In 2012 he worked with his students Alex Krizhevsky and Ilya Sutskever on AlexNet, a convolutional network with roughly 60 million weights. AlexNet's win in the ImageNet competition sparked the deep learning revolution.1 At Google Brain (2013-2023) he developed capsule networks, among other work. He shared the 2018 Turing Award with Yann LeCun and Yoshua Bengio, and left Google in 2023 so that he could speak openly about the risks of AI. His work directly underpins how modern models, including LLMs, learn weights to recognise patterns and predict outcomes.3,5
References
1. https://www.ultralytics.com/glossary/model-weights
2. https://www.tencentcloud.com/techpedia/132448
3. https://blog.metaphysic.ai/weights-in-machine-learning/
4. https://tedai-sanfrancisco.ted.com/glossary/weights/
5. https://alliancefortrustinai.org/how-model-weights-can-be-used-to-fine-tune-ai-models/
6. https://h2o.ai/wiki/weights-and-biases/

