Word Embeddings Explained: The Math Behind AI, LLMs, and Chatbots

NLP Explainer · AI Series 2026

How Machines Understand Language

A guide to word embeddings — where meaning becomes mathematics, and vectors do the talking.

When a search engine retrieves a document about automobiles in response to a query about cars, it is not matching text character by character. Somewhere beneath the interface, the system understands that these two words are semantically related. The mechanism behind that understanding is the word embedding — and once you see the geometry, you cannot unsee it.

This article walks through the key mathematical operations that make embeddings work: distance, similarity, arithmetic, scaling, and the dot product. Each concept is illustrated with concrete numerical vectors so the math is visible, not just described. Real embeddings typically use hundreds of dimensions; the 3- and 4-dimensional examples here preserve all the structure while staying readable on a page.

1  ·  What is a Word Embedding?

A word embedding is a representation of a word as a vector — an ordered list of numbers — in a high-dimensional space. A typical embedding model might use 300 dimensions, so the word cat becomes a point with 300 coordinates. That sounds abstract, but the key insight is this: the position of that point encodes meaning.

This is what researchers call a semantic space. Words with related meanings end up positioned close to each other. King and Queen live near each other. Paris and London live near each other. Bicycle and democracy live far apart. The model learns these positions not from human-curated rules, but from the statistical patterns of how words appear together in enormous text corpora.

EXAMPLE: 4-DIMENSIONAL VECTORS (illustrative values standing in for real 300-dimensional embeddings)
vec(“King”)  = [ 0.9, 0.7, 0.4,  +0.6 ]
vec(“Queen”)  = [ 0.9, 0.7, 0.4,  -0.6 ]
vec(“Man”)  = [ 0.5, 0.3, 0.1,  +0.8 ]
vec(“Woman”) = [ 0.5, 0.3, 0.1,  -0.8 ]

The first three dimensions encode royalty, authority, and age.
The fourth dimension encodes gender: positive = masculine, negative = feminine.

Think of it as a map where the geography is meaning. Every word is a pin, and the distances between pins reflect semantic relationships rather than physical ones.

2  ·  The Geometry of Meaning: Distance and Similarity

Once words are points in space, we need a way to measure how close they are. Two approaches dominate: Euclidean distance and cosine similarity. For the examples below, we use a 3-dimensional temperature embedding:

TEMPERATURE VECTORS (3 dimensions)
vec(“Hot”) = [  1.0,  0.8,  0.6 ]
vec(“Warm”) = [  0.8,  0.6,  0.4 ]
vec(“Cold”) = [ -0.6,  0.4, -0.8 ]

2.1   Euclidean (Cartesian) Distance

The most intuitive measure — the straight-line gap between the tips of two arrows drawn from the origin. For vectors a and b in n dimensions:

d(a, b)  =  √( Σᵢ (aᵢ − bᵢ)² )

WORKED EXAMPLE: EUCLIDEAN DISTANCE
// Hot vs Warm (similar words)
d(Hot, Warm) = √[ (1.0-0.8)² + (0.8-0.6)² + (0.6-0.4)² ]
              = √[ 0.04 + 0.04 + 0.04 ] = √0.12  ≈  0.346  ← small: close together

// Hot vs Cold (opposite words)
d(Hot, Cold) = √[ (1.0-(-0.6))² + (0.8-0.4)² + (0.6-(-0.8))² ]
              = √[ 2.56 + 0.16 + 1.96 ] = √4.68  ≈  2.163  ← large: far apart
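
The worked example above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name euclidean_distance is our own choice, and the vectors are the toy temperature values from this article rather than output from a trained model.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy temperature vectors from the article (illustrative values only).
hot = [1.0, 0.8, 0.6]
warm = [0.8, 0.6, 0.4]
cold = [-0.6, 0.4, -0.8]

print(round(euclidean_distance(hot, warm), 3))  # 0.346
print(round(euclidean_distance(hot, cold), 3))  # 2.163
```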

2.2   Cosine Similarity — The Industry Standard

In practice, NLP systems usually prefer cosine similarity over Euclidean distance when comparing embeddings. It ignores the length of the vectors and depends only on the angle between them: two vectors pointing in the same direction score 1.0 regardless of their magnitude.

COSINE SIMILARITY
cos(θ)  =  (a · b) / (‖a‖ × ‖b‖)
Range: −1  (opposite)  →  0  (orthogonal)  →  +1  (identical direction)

WORKED EXAMPLE: COSINE SIMILARITY
// First compute magnitudes
‖Hot‖ = √(1.0² + 0.8² + 0.6²) = √2.00 ≈ 1.414
‖Warm‖ = √(0.8² + 0.6² + 0.4²) = √1.16 ≈ 1.077
‖Cold‖ = √((-0.6)² + 0.4² + (-0.8)²) = √1.16 ≈ 1.077

// Hot vs Warm (small angle)
dot(Hot, Warm) = (1.0)(0.8) + (0.8)(0.6) + (0.6)(0.4) = 0.80 + 0.48 + 0.24 = 1.52
cos(Hot, Warm) = 1.52 / (1.414 × 1.077) = 1.52 / 1.523 ≈ +0.998

// Hot vs Cold (large angle)
dot(Hot, Cold) = (1.0)(-0.6) + (0.8)(0.4) + (0.6)(-0.8) = -0.60 + 0.32 - 0.48 = -0.76
cos(Hot, Cold) = -0.76 / (1.414 × 1.077) = -0.76 / 1.523 ≈ -0.499

Word Pair    | Euclidean d | cos(θ) | Interpretation
Hot vs Warm  | 0.346       | +0.998 | Nearly identical direction: closely related
Hot vs Cold  | 2.163       | −0.499 | Pointing largely away from each other (about 120° apart): contrasting meanings
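
The cosine figures above can be checked with a short sketch. The helper names (dot, norm, cosine_similarity) are our own, and the last line illustrates the claim that vector length does not affect the score.

```python
import math

def dot(a, b):
    """Sum of component-wise products."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Vector length (Euclidean norm)."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    denom = norm(a) * norm(b)
    if denom == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / denom

# Toy temperature vectors from the article (illustrative values only).
hot = [1.0, 0.8, 0.6]
warm = [0.8, 0.6, 0.4]
cold = [-0.6, 0.4, -0.8]

print(round(cosine_similarity(hot, warm), 3))   # 0.998
print(round(cosine_similarity(hot, cold), 3))   # -0.499

# Stretching a vector leaves the angle, and so the score, unchanged.
print(round(cosine_similarity(hot, [10 * x for x in hot]), 3))  # 1.0
```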

3  ·  Vector Arithmetic: Meaning You Can Add and Subtract

Because words are vectors, you can perform arithmetic on them — and the results are semantically meaningful. The most famous example uses the 4-dimensional royalty vectors introduced in Section 1:

THE CLASSIC ANALOGY
vec(“King”) − vec(“Man”) + vec(“Woman”)  ≈  vec(“Queen”)
WORKED EXAMPLE: KING – MAN + WOMAN
King  = [ 0.9,  0.7,  0.4,  +0.6 ]
Man  = [ 0.5,  0.3,  0.1,  +0.8 ]
Woman  = [ 0.5,  0.3,  0.1,  -0.8 ]

// Subtract component by component, then add
King – Man = [ 0.9-0.5,  0.7-0.3,  0.4-0.1,  0.6-0.8 ] = [  0.4,   0.4,   0.3,  -0.2 ]
+ Woman   = [ 0.4+0.5,  0.4+0.3,  0.3+0.1,  -0.2+(-0.8) ] = [  0.9,   0.7,   0.4,  -1.0 ]

// Find nearest word by Euclidean distance
result = [ 0.9, 0.7, 0.4, -1.0 ]

d(result, Queen) = √[ 0 + 0 + 0 + (-1.0-(-0.6))² ] = √0.16 ≈ 0.400 ← nearest
d(result, Woman) ≈ 0.671    d(result, King) = 1.600    d(result, Man) ≈ 1.910

cos(result, Queen) ≈ 0.974   ← highest cosine similarity also points to Queen

What happened geometrically? Subtracting Man from King left the offset that separates a king from an ordinary man: [0.4, 0.4, 0.3, -0.2]. Adding Woman restored the first three components to King's values (0.9, 0.7, 0.4) and pushed the gender component to -1.0, firmly on the feminine side. The result sits 0.4 units from Queen, the nearest word in this vocabulary.
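
The analogy can be run end to end as a nearest-neighbour search over the four-word vocabulary. This is a sketch under the article's toy values. Analogy evaluations on real embeddings usually exclude the input words from the candidate list, a step that turns out to be unnecessary here.

```python
import math

# Toy 4-d vectors from Section 1 (illustrative, not from a trained model).
vocab = {
    "King":  [0.9, 0.7, 0.4,  0.6],
    "Queen": [0.9, 0.7, 0.4, -0.6],
    "Man":   [0.5, 0.3, 0.1,  0.8],
    "Woman": [0.5, 0.3, 0.1, -0.8],
}

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# King - Man + Woman, computed component by component.
result = [k - m + w for k, m, w in zip(vocab["King"], vocab["Man"], vocab["Woman"])]

# Distance from the result to every word in the vocabulary.
distances = {word: euclidean_distance(result, vec) for word, vec in vocab.items()}
nearest = min(distances, key=distances.get)

print(nearest)                        # Queen
print(round(distances[nearest], 3))   # 0.4
```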

4  ·  Scalar Multiplication and Division: Changing Intensity

Multiplying or dividing a vector by a positive scalar (a plain number) changes its magnitude without changing its direction; a negative scalar also flips the vector to point the opposite way. This maps neatly onto the idea of degree in language: Tiny, Large, and Gigantic all point in roughly the same semantic direction, but at different intensities.

SIZE VECTORS (3 dimensions)
vec(“Tiny”) = [ 0.10, 0.20, 0.10 ]
vec(“Large”) = [ 0.50, 0.70, 0.40 ]
vec(“Gigantic”) = [ 1.10, 1.50, 0.90 ]
WORKED EXAMPLE: SCALING ALONG THE SIZE AXIS
// Multiplying Large by 2 moves it toward Gigantic
Large × 2 = [ 0.5×2,  0.7×2,  0.4×2 ] = [ 1.00,  1.40,  0.80 ]
vec(“Gigantic”) = [ 1.10,  1.50,  0.90 ]    d(Large × 2, Gigantic) ≈ 0.173 ← very close

// Multiplying Large by 0.2 moves it toward Tiny
Large × 0.2 = [ 0.10,  0.14,  0.08 ]
vec(“Tiny”) = [ 0.10,  0.20,  0.10 ]    d(Large × 0.2, Tiny) ≈ 0.063 ← very close

Division works the same way along an intensity axis. Halving a “Loud” vector moves it toward “Soft” (in this toy example, dividing by 3 would land on it exactly):

WORKED EXAMPLE: DIVIDING ALONG THE LOUDNESS AXIS
vec(“Loud”) = [ 0.90, 1.20, 0.60 ]    vec(“Soft”) = [ 0.30, 0.40, 0.20 ]
Loud ÷ 2 = [ 0.45,  0.60,  0.30 ]
d(Loud ÷ 2, Soft) ≈ 0.269  ← direction unchanged, intensity halved

Key intuition: Scalar operations change how much of something a vector represents, without changing what kind of thing it represents. Direction is preserved; intensity is tuned.
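
A quick way to see that scaling tunes length but not direction is to compare the norm and the cosine similarity before and after. The sketch below uses the toy Large vector; the helper names are our own, and the negative factor is included only to show the one case where direction does change.

```python
import math

def scale(v, k):
    """Multiply every component of v by the scalar k."""
    return [k * x for x in v]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

large = [0.5, 0.7, 0.4]   # toy size vector from the article

doubled = scale(large, 2)
flipped = scale(large, -1)

print(round(norm(doubled) / norm(large), 3))         # 2.0  (twice as long)
print(round(cosine_similarity(large, doubled), 3))   # 1.0  (same direction)
print(round(cosine_similarity(large, flipped), 3))   # -1.0 (opposite direction)
```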

5  ·  The Dot Product: Agreement and Magnitude Together

The dot product of two vectors is computed by multiplying their corresponding components and summing the results:

DOT PRODUCT
a · b  =  Σᵢ (aᵢ × bᵢ)  =  a₁b₁ + a₂b₂ + … + aₙbₙ

The dot product is cosine similarity before the vector lengths are normalised away: a · b = ‖a‖ × ‖b‖ × cos(θ). It captures two things simultaneously: the direction of agreement and the combined magnitude. Cosine similarity captures only the first.

We reuse the loudness vectors from Section 4 — Very Loud is “Loud” and A Little Loud is “Soft”. They point in exactly the same direction but have very different lengths:

WORKED EXAMPLE: VERY LOUD vs A LITTLE LOUD
vec(“A Little Loud”) = [ 0.30, 0.40, 0.20 ]  |magnitude| = 0.539
vec(“Very Loud”) = [ 0.90, 1.20, 0.60 ]  |magnitude| = 1.616

// Cosine similarity: measures direction only
dot(AL, VL) = (0.3)(0.9) + (0.4)(1.2) + (0.2)(0.6) = 0.27 + 0.48 + 0.12 = 0.87
cos(AL, VL) = 0.87 / (√0.29 × √2.61) = 0.87 / √0.7569 = 0.87 / 0.87 = 1.000

// Dot product: measures direction AND magnitude
AL · AL = (0.3)² + (0.4)² + (0.2)² = 0.09 + 0.16 + 0.04 = 0.29
VL · VL = (0.9)² + (1.2)² + (0.6)² = 0.81 + 1.44 + 0.36 = 2.61

Vector         | Magnitude | cos(θ) with the other vector | v · v
A Little Loud  | 0.539     | 1.000 (same direction)       | 0.29
Very Loud      | 1.616     | 1.000 (same direction)       | 2.61

The two vectors are perfectly collinear, so the cosine similarity between them is 1.0 and says nothing about intensity. The self dot products do: 0.29 vs 2.61, a 9× difference that reflects Very Loud being three times as long. This is one reason recommendation systems and the attention mechanisms in transformer models often work with dot products rather than cosine similarity: when you want to know not just whether a document is relevant but also how prominently it discusses a topic, the dot product reflects both at once.
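
The contrast between the two measures can be confirmed numerically. In this sketch (helper names are our own) the cosine between the two loudness vectors comes out as 1.0, while the lengths differ by a factor of 3 and the self dot products by a factor of 9.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

# Toy loudness vectors from the article (illustrative values only).
a_little_loud = [0.3, 0.4, 0.2]
very_loud = [0.9, 1.2, 0.6]

cos_theta = dot(a_little_loud, very_loud) / (norm(a_little_loud) * norm(very_loud))
length_ratio = norm(very_loud) / norm(a_little_loud)
self_dot_ratio = dot(very_loud, very_loud) / dot(a_little_loud, a_little_loud)

print(round(cos_theta, 3))       # 1.0  (direction only)
print(round(length_ratio, 3))    # 3.0  (Very Loud is three times as long)
print(round(self_dot_ratio, 3))  # 9.0  (2.61 / 0.29)
```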

6  ·  Practical Applications

Search engines convert your query into a vector and retrieve documents whose vectors are nearest to it in the semantic space — using cosine similarity to rank by relevance regardless of exact word match. When you search for car insurance and the engine returns results about vehicle coverage, it is doing nearest-neighbour lookup in embedding space, exactly as the Hot/Warm/Cold example in Section 2 demonstrates.
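
As a rough illustration of ranking by cosine similarity, the sketch below sorts three documents against a query vector. Everything here is hypothetical: the document titles, the 3-dimensional vectors, and the query vector were invented for this example, and a production system would obtain them from an embedding model and typically use an approximate nearest-neighbour index rather than a full sort.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical document vectors, invented for illustration.
documents = {
    "vehicle coverage guide": [0.9, 0.8, 0.1],
    "sourdough baking tips":  [0.1, 0.0, 0.9],
    "auto policy comparison": [0.8, 0.9, 0.2],
}

# Hypothetical stand-in for the embedded query "car insurance".
query = [1.0, 0.9, 0.0]

# Highest cosine similarity first.
ranked = sorted(
    documents,
    key=lambda title: cosine_similarity(query, documents[title]),
    reverse=True,
)

for title in ranked:
    print(title, round(cosine_similarity(query, documents[title]), 3))
```

Neither insurance document shares a word with the query, yet both rank far above the baking article because their vectors point in nearly the same direction as the query vector.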

Recommendation systems represent your interests as a vector computed from your history, then find products whose vectors are closest to yours. The dot product is particularly useful here: a highly-relevant item with a large magnitude — analogous to Very Loud — will score higher than a mildly-relevant item even if they point in the same direction.
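
One simple way to picture this is to average the vectors of items a user has interacted with and score the catalogue by dot product. Averaging is an assumption made for this sketch, not a description of any particular system, and the history and catalogue vectors are invented. The "mild match" and "strong match" items point in the same direction as the user vector but differ in length.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical vectors for items the user interacted with.
history = [[0.8, 0.2], [0.6, 0.4]]
user = mean_vector(history)          # roughly [0.7, 0.3]

# Hypothetical catalogue: two items share the user's direction, one does not.
catalog = {
    "mild match":   [0.35, 0.15],    # same direction, half the length
    "strong match": [1.40, 0.60],    # same direction, twice the length
    "off-topic":    [-0.20, 0.90],
}

# Highest dot product first.
ranked = sorted(catalog, key=lambda name: dot(user, catalog[name]), reverse=True)
print(ranked)  # ['strong match', 'mild match', 'off-topic']
```

Cosine similarity would tie the first two items at 1.0; the dot product separates them by magnitude.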

Large language models use the scaled dot product directly inside the attention mechanism. For every token, a query vector and a set of key vectors are compared via dot product to determine which parts of the context deserve attention — a direct descendant of the arithmetic explored in Section 5.
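
The core computation can be sketched for a single query in plain Python: dot each key with the query, divide by the square root of the key dimension, apply softmax, and take the weighted sum of the value vectors. This is a simplified illustration with invented 2-dimensional vectors. Real transformer layers add learned projection matrices, multiple heads, masking, and batched matrix operations, all of which are omitted here.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d_k = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Hypothetical toy vectors, invented for illustration.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]     # aligned, orthogonal, opposed
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

weights, output = attention(query, keys, values)
print([round(w, 3) for w in weights])
print([round(x, 3) for x in output])
```

The key aligned with the query receives the largest weight and the opposed key the smallest, so the output leans toward the first value vector.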

Quick Reference: Embedding Operations

Operation             | Formula              | Result from Sections 2-5
Euclidean Distance    | √( Σᵢ (aᵢ − bᵢ)² )   | d(Hot,Warm) = 0.346; d(Hot,Cold) = 2.163
Cosine Similarity     | (a·b) / (‖a‖×‖b‖)    | cos(Hot,Warm) = +0.998; cos(Hot,Cold) = -0.499
Vector Arithmetic     | a ± b                | King-Man+Woman → nearest Queen (d = 0.400)
Scalar Multiplication | λ · a                | Large × 2 → near Gigantic; Loud ÷ 2 → near Soft
Dot Product           | a·b = Σᵢ aᵢbᵢ        | cos(Soft, Loud) = 1.00; self dot products 0.29 (Soft) vs 2.61 (Loud)

✦ This article was generated with the assistance of Claude by Anthropic

Dario Amodei — The Adolescence of Technology

https://www.darioamodei.com/essay/the-adolescence-of-technology

In “The Adolescence of Technology,” Anthropic CEO Dario Amodei argues that humanity is entering a high-stakes “technological puberty” with the imminent arrival of expert-level AI. He outlines a pragmatic strategy to counter existential risks—ranging from biological threats to digital authoritarianism—stressing that through surgical regulation and rigorous safety engineering, we can navigate this dangerous transition toward a future of immense global benefit.

Does Challenging AI Make It Smarter?

A recent Medium article claims that adding challenge phrases like “I bet you can’t solve this” to AI prompts improves output quality by 45%, based on research by Li et al. (2023).

Quick Test Results

Testing these techniques on academic tasks—SQL queries, code debugging, and research synthesis—showed mixed but interesting results:

What worked: Challenge framing produced more thorough, systematic responses for complex multi-step problems. Confidence scoring (asking AI to rate certainty and re-evaluate if below 0.9) caught overconfident answers.

What didn’t: Simple factual queries showed no improvement.

The Why

High-stakes language doesn’t trigger AI emotions—it cues pattern-matching against higher-quality training examples where stakes were high.

Bottom Line

Worth trying for complex tasks, but expect higher token usage. Results are task-dependent, not universal.


Source: Li et al. (2023), arXiv:2307.11760


Streamlining macOS Application Management with Homebrew Cask

macOS users frequently face the challenge of efficiently managing application installations across multiple machines. The traditional approach involves manually downloading disk images, navigating installation wizards, and maintaining applications across systems. Homebrew Cask offers a command-line solution that significantly streamlines this process.

Understanding Homebrew Cask

Homebrew Cask is an extension of Homebrew, the widely-adopted package manager for macOS. While Homebrew manages command-line tools and libraries, Cask extends this functionality to graphical user interface (GUI) applications. This enables system administrators, developers, and power users to install, update, and manage standard macOS applications through terminal commands.

The conventional installation workflow requires multiple steps:

  1. Locating the official download source
  2. Downloading the disk image file
  3. Opening and mounting the disk image
  4. Transferring the application to the Applications folder
  5. Ejecting the disk image
  6. Managing the downloaded installer file
  7. Repeating this process for each required application

Homebrew Cask reduces this to a single command:

brew install --cask google-chrome

The application is then installed automatically with no further user interaction required.

Key Advantages for Professional Workflows

1. Accelerated System Provisioning

Organizations and individual users can maintain installation scripts containing all required applications. A typical enterprise development environment setup might include:

brew install --cask visual-studio-code
brew install --cask docker
brew install --cask slack
brew install --cask zoom
brew install --cask rectangle
brew install --cask iterm2
brew install --cask spotify
brew install --cask vlc

This approach reduces new machine setup time from several hours to approximately 15-20 minutes, depending on network bandwidth and the number of applications being installed.

2. Simplified Update Management

Maintaining current software versions is essential for security compliance and feature availability. Rather than monitoring and updating each application individually, administrators can execute a single command:

brew upgrade --cask

This command updates Cask-managed applications to their latest versions. Casks that update themselves (those marked auto_updates, or versioned as latest) are skipped by default; adding the --greedy flag includes them.

3. Complete Application Removal

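Dragging an application to the Trash often leaves residual files behind, including configuration data, cache files, and preference files distributed throughout the file system. Homebrew Cask removes the application itself with:

brew uninstall --cask docker

Adding the --zap flag (brew uninstall --zap --cask docker) also deletes the associated support files listed in the cask definition. Use it with care, because some of those files may be shared with other applications.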

4. Automation and Standardization

Homebrew Cask’s command-line interface enables scripting and automation. Development teams can create standardized setup scripts ensuring consistent development environments. IT departments can implement automated workstation provisioning workflows. System configurations can be version-controlled in dotfiles repositories, enabling rapid deployment and rollback capabilities.

Recommended Applications by Category

The following applications represent commonly deployed tools across professional environments:

Development Tools

brew install --cask visual-studio-code
brew install --cask iterm2
brew install --cask docker
brew install --cask postman
brew install --cask dbeaver-community

Productivity Applications

brew install --cask rectangle        # Window management
brew install --cask alfred           # Enhanced search functionality
brew install --cask obsidian         # Knowledge management
brew install --cask notion           # Collaborative workspace

Communication Platforms

brew install --cask slack
brew install --cask zoom
brew install --cask discord

System Utilities

brew install --cask the-unarchiver
brew install --cask appcleaner
brew install --cask vlc

Implementation Guide

Organizations and users without an existing Homebrew installation can deploy it with a single command:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Once Homebrew is installed, Cask functionality is built right in. Just start using brew install --cask commands.

Useful Commands to Know

# Search for an app
brew search --cask chrome

# Get information about an app
brew info --cask visual-studio-code

# List all installed cask apps
brew list --cask

# Update all apps
brew upgrade --cask

# Uninstall an app
brew uninstall --cask slack

A Few Gotchas

Cask isn’t perfect. Here are some things to be aware of:

  • Not every app is available – Popular apps are well-covered, but niche or very new applications might not be in the repository yet
  • App Store apps aren’t included – Apps distributed exclusively through the Mac App Store can’t be installed via Cask
  • Some apps require manual steps – Occasionally, an app needs additional configuration or permissions that Cask can’t automate
  • Updates might lag slightly – Cask maintainers need to update the cask definition when a new version is released, so there can be a brief delay

These are minor inconveniences compared to the time saved.

The Bottom Line

Homebrew Cask has fundamentally changed how I interact with my Mac. What started as a way to avoid repetitive downloads has become an essential part of my workflow. The ability to script, automate, and version-control my application setup means I’m never more than a few commands away from a productive environment.

If you spend any significant time on macOS, especially as a developer or power user, Homebrew Cask is worth learning. Your future self—the one setting up that next new machine—will thank you.

Try It Yourself

Pick three applications you use regularly and install them via Cask. I bet you’ll be hooked by the simplicity. Start with something like:

brew install --cask visual-studio-code
brew install --cask google-chrome  
brew install --cask rectangle

Welcome to a more efficient way of managing your Mac applications.


What’s your favorite Homebrew Cask application? Have you automated your Mac setup? Share your experiences in the comments below!

AI is displacing software engineers, but those in Singapore have the chance to fare better

https://www.straitstimes.com/business/ai-is-displacing-software-engineers-but-those-in-singapore-have-the-chance-to-fare-better?sfnsn=mo

The CEO Magazine: David Ellis: Why AI makes new graduates more valuable than ever

https://amp.theceomagazine.com/business/innovation-technology/david-ellis/

Ellis sees a different future. Rather than eliminating graduate positions, IBM Consulting is actively increasing them.

“So, for example, we are increasing, not decreasing, the number of graduate hires that we’re making here in Australia,” he says.

Investing in the future

The reasoning is both strategic and generational. Today’s graduates enter the workforce with a crucial advantage – they’ve been using AI longer than most experienced workers.

“We have people entering the workforce that have perhaps been using AI longer than many others. Maybe they’ve been using it through their studies. Maybe they’ve just got a deeper affinity to it,” Ellis explains.

When properly equipped and trained, these AI-native workers can be a huge asset to organizations.

“We can skill them, we can equip them, we can give them the confidence to be much more effective than you or I might have been at the beginning of our careers,” he says.