Word Embeddings Explained: The Math Behind AI, LLMs, and Chatbots

    NLP Explainer · AI Series 2026
  

How Machines Understand Language

A guide to word embeddings — where meaning becomes mathematics, and vectors do the talking.

When a search engine retrieves a document about automobiles in response to a query about cars, it is not matching text character by character. Somewhere beneath the interface, the system understands that these two words are semantically related. The mechanism behind that understanding is the word embedding — and once you see the geometry, you cannot unsee it.

This article walks through the key mathematical operations that make embeddings work: distance, similarity, arithmetic, scaling, and the dot product. Each concept is illustrated with concrete numerical vectors so the math is visible, not just described. Real embeddings typically use hundreds of dimensions; the 3- and 4-dimensional examples here keep the same mathematical structure while staying readable on a page.

1 · What is a Word Embedding?

A word embedding is a representation of a word as a vector — an ordered list of numbers — in a high-dimensional space. A typical embedding model might use 300 dimensions, so the word cat becomes a point with 300 coordinates. That sounds abstract, but the key insight is this: the position of that point encodes meaning.

This is what researchers call a semantic space. Words with related meanings end up positioned close to each other. King and Queen live near each other. Paris and London live near each other. Bicycle and democracy live far apart. The model learns these positions not from human-curated rules, but from the statistical patterns of how words appear together in enormous text corpora.

EXAMPLE: 4-DIMENSIONAL VECTORS (simplified from real 300-dim embeddings)
vec(“King”)   = [ 0.9,  0.7,  0.4,  +0.6 ]
vec(“Queen”)  = [ 0.9,  0.7,  0.4,  -0.6 ]
vec(“Man”)    = [ 0.5,  0.3,  0.1,  +0.8 ]
vec(“Woman”)  = [ 0.5,  0.3,  0.1,  -0.8 ]

The first three dimensions encode royalty, authority, and age.
The fourth dimension encodes gender: positive = masculine, negative = feminine.

Think of it as a map where the geography is meaning. Every word is a pin, and the distances between pins reflect semantic relationships rather than physical ones.

2 · The Geometry of Meaning: Distance and Similarity

Once words are points in space, we need a way to measure how close they are. Two approaches dominate: Euclidean distance and cosine similarity. For the examples below, we use a 3-dimensional temperature embedding:

TEMPERATURE VECTORS (3 dimensions)
vec(“Hot”)  = [  1.0,  0.8,  0.6 ]
vec(“Warm”) = [  0.8,  0.6,  0.4 ]
vec(“Cold”) = [ -0.6,  0.4, -0.8 ]
      

2.1 Euclidean (Cartesian) Distance

The most intuitive measure — the straight-line gap between the tips of two arrows drawn from the origin. For vectors a and b in n dimensions:

        d(a, b)  =  √( Σ_i (a_i − b_i)² )
      

WORKED EXAMPLE: EUCLIDEAN DISTANCE
// Hot vs Warm (similar words)
d(Hot, Warm) = √[ (1.0-0.8)² + (0.8-0.6)² + (0.6-0.4)² ]

                      = √[ 0.04 + 0.04 + 0.04 ] = √0.12  ≈  0.346  ← small: close together

// Hot vs Cold (opposite words)
d(Hot, Cold) = √[ (1.0-(-0.6))² + (0.8-0.4)² + (0.6-(-0.8))² ]

                      = √[ 2.56 + 0.16 + 1.96 ] = √4.68  ≈  2.163  ← large: far apart
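
The worked example above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name `euclidean` is chosen here for illustration and does not come from any particular embedding toolkit.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Temperature vectors from the example above
hot = [1.0, 0.8, 0.6]
warm = [0.8, 0.6, 0.4]
cold = [-0.6, 0.4, -0.8]

print(round(euclidean(hot, warm), 3))  # 0.346
print(round(euclidean(hot, cold), 3))  # 2.163
```

On Python 3.8 or later, `math.dist(hot, warm)` computes the same quantity directly.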
      

2.2 Cosine Similarity — The Industry Standard

In practice, NLP systems almost universally prefer cosine similarity over Euclidean distance. It ignores the length of vectors entirely and focuses only on the angle between them — two vectors pointing the same direction score 1.0 regardless of their magnitude.

COSINE SIMILARITY

        cos(θ)  =  ( a · b )  /  ( ‖a‖ × ‖b‖ )

        Range: −1  (opposite)  →  0  (orthogonal)  →  +1  (identical direction)

WORKED EXAMPLE: COSINE SIMILARITY
// First compute magnitudes

        ‖Hot‖  = √(1.0² + 0.8² + 0.6²) = √2.00 ≈ 1.414

        ‖Warm‖ = √(0.8² + 0.6² + 0.4²) = √1.16 ≈ 1.077

        ‖Cold‖ = √((-0.6)² + 0.4² + (-0.8)²) = √1.16 ≈ 1.077

// Hot vs Warm (small angle)

        dot(Hot, Warm) = (1.0)(0.8) + (0.8)(0.6) + (0.6)(0.4) = 0.80 + 0.48 + 0.24 = 1.52

        cos(Hot, Warm) = 1.52 / (1.414 × 1.077) = 1.52 / 1.523 ≈ +0.998

// Hot vs Cold (large angle)

        dot(Hot, Cold) = (1.0)(-0.6) + (0.8)(0.4) + (0.6)(-0.8) = -0.60 + 0.32 - 0.48 = -0.76

        cos(Hot, Cold) = -0.76 / (1.414 × 1.077) = -0.76 / 1.523 ≈ -0.499

Word Pair	Euclidean d	cos(θ)	Interpretation
Hot vs Warm	0.346	+0.998	Nearly identical direction — closely related
Hot vs Cold	2.163	−0.499	Pointing well apart (θ ≈ 120°): antonyms
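
The same numbers fall out of a short Python sketch. The helper names `dot`, `norm`, and `cosine` are illustrative choices, and the code assumes neither vector is all zeros (a zero vector has no direction, so the division would fail).

```python
import math

def dot(a, b):
    """Sum of component-wise products."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Vector length (magnitude)."""
    return math.sqrt(dot(a, a))

def cosine(a, b):
    """Cosine of the angle between a and b; assumes non-zero vectors."""
    return dot(a, b) / (norm(a) * norm(b))

hot = [1.0, 0.8, 0.6]
warm = [0.8, 0.6, 0.4]
cold = [-0.6, 0.4, -0.8]

print(round(cosine(hot, warm), 3))  # 0.998
print(round(cosine(hot, cold), 3))  # -0.499
```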

3 · Vector Arithmetic: Meaning You Can Add and Subtract

Because words are vectors, you can perform arithmetic on them — and the results are semantically meaningful. The most famous example uses the 4-dimensional royalty vectors introduced in Section 1:

THE CLASSIC ANALOGY
vec(“King”) − vec(“Man”) + vec(“Woman”)  ≈  vec(“Queen”)
      

WORKED EXAMPLE: KING – MAN + WOMAN
King   = [ 0.9,  0.7,  0.4,  +0.6 ]
Man    = [ 0.5,  0.3,  0.1,  +0.8 ]
Woman  = [ 0.5,  0.3,  0.1,  -0.8 ]

// Subtract component by component, then add

        King – Man = [ 0.9-0.5,  0.7-0.3,  0.4-0.1,  0.6-0.8 ] = [  0.4,   0.4,   0.3,  -0.2 ]

        + Woman   = [ 0.4+0.5,  0.4+0.3,  0.3+0.1,  -0.2+(-0.8) ] = [  0.9,   0.7,   0.4,  -1.0 ]

// Find nearest word by Euclidean distance
result = [ 0.9, 0.7, 0.4, -1.0 ]

d(result, Queen)  = √[ 0 + 0 + 0 + (-1.0-(-0.6))² ] = √0.16 ≈ 0.400 ← nearest
d(result, Woman)  ≈ 0.671    d(result, King) = 1.600    d(result, Man) ≈ 1.910


        cos(result, Queen) ≈ 0.974   ← highest cosine similarity also points to Queen
      

What happened geometrically? Man and Woman share the same first three components, so subtracting one and adding the other cancels out there and leaves King’s royalty, authority, and age values intact. Only the gender component changes: 0.6 − 0.8 + (−0.8) = −1.0. The result overshoots Queen’s −0.6 by 0.4 units, yet Queen remains the nearest word in this vocabulary.
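
As a rough sketch, the analogy and the nearest-neighbour lookup can be written as follows. The `vocab` dictionary holds the four toy vectors from Section 1, and the function names `analogy` and `euclidean` are illustrative.

```python
import math

# Toy 4-dimensional vocabulary from Section 1
vocab = {
    "King":  [0.9, 0.7, 0.4,  0.6],
    "Queen": [0.9, 0.7, 0.4, -0.6],
    "Man":   [0.5, 0.3, 0.1,  0.8],
    "Woman": [0.5, 0.3, 0.1, -0.8],
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def analogy(a, b, c):
    """Return vec(a) - vec(b) + vec(c), component by component."""
    return [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]

result = analogy("King", "Man", "Woman")

# Rank every word in the vocabulary by distance to the result
ranked = sorted(vocab, key=lambda word: euclidean(result, vocab[word]))
print(ranked)  # ['Queen', 'Woman', 'King', 'Man']
```

Analogy evaluations on real embeddings commonly exclude the three input words from the candidate list; in this toy vocabulary that step is not needed because Queen already ranks first.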

4 · Scalar Multiplication and Division: Changing Intensity

Multiplying or dividing a vector by a positive scalar (a plain number) changes its magnitude without changing its direction; a negative scalar would flip it to point the opposite way. This maps neatly onto the idea of degree in language: Tiny, Large, and Gigantic all point in roughly the same semantic direction, but at different intensities.

SIZE VECTORS (3 dimensions)
vec(“Tiny”)     = [ 0.10, 0.20, 0.10 ]
vec(“Large”)    = [ 0.50, 0.70, 0.40 ]
vec(“Gigantic”) = [ 1.10, 1.50, 0.90 ]
      

WORKED EXAMPLE: SCALING ALONG THE SIZE AXIS
// Multiplying Large by 2 moves it toward Gigantic
Large × 2 = [ 0.5×2,  0.7×2,  0.4×2 ] = [ 1.00,  1.40,  0.80 ]
vec(“Gigantic”) = [ 1.10,  1.50,  0.90 ]    d(Large × 2, Gigantic) ≈ 0.173 ← very close

// Multiplying Large by 0.2 moves it toward Tiny
Large × 0.2 = [ 0.10,  0.14,  0.08 ]
vec(“Tiny”) = [ 0.10,  0.20,  0.10 ]    d(Large × 0.2, Tiny) ≈ 0.063 ← very close
      

Division works the same way along an intensity axis. Halving a “Loud” vector moves it toward “Soft”:

WORKED EXAMPLE: DIVIDING ALONG THE LOUDNESS AXIS
vec(“Loud”) = [ 0.90, 1.20, 0.60 ]    vec(“Soft”) = [ 0.30, 0.40, 0.20 ]
Loud ÷ 2 = [ 0.45,  0.60,  0.30 ]
d(Loud ÷ 2, Soft) ≈ 0.269  ← direction unchanged, intensity halved (Loud ÷ 3 would equal Soft exactly)
      

Key intuition: Scalar operations change how much of something a vector represents, without changing what kind of thing it represents. Direction is preserved; intensity is tuned.
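
A small Python sketch makes the "direction preserved, intensity tuned" claim checkable. The helper names are illustrative; the last line shows that a negative scalar reverses direction, so the claim holds for positive scalars only.

```python
import math

def scale(v, k):
    """Multiply every component of v by the scalar k."""
    return [k * x for x in v]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

tiny = [0.10, 0.20, 0.10]
large = [0.50, 0.70, 0.40]
gigantic = [1.10, 1.50, 0.90]

print(round(euclidean(scale(large, 2), gigantic), 3))  # 0.173
print(round(euclidean(scale(large, 0.2), tiny), 3))    # 0.063
print(round(cosine(large, scale(large, 2)), 6))        # 1.0  (same direction)
print(round(cosine(large, scale(large, -1)), 6))       # -1.0 (direction flipped)
```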

5 · The Dot Product: Agreement and Magnitude Together

The dot product of two vectors is computed by multiplying their corresponding components and summing the results:

DOT PRODUCT
a · b  =  Σ_i ( a_i × b_i )  =  a_1b_1  +  a_2b_2  + … +  a_nb_n

The dot product is cosine similarity before normalising away the vector lengths. It captures two things simultaneously: the direction of agreement and the combined magnitude. Cosine similarity captures only the first.

We reuse the loudness vectors from Section 4 — Very Loud is “Loud” and A Little Loud is “Soft”. They point in exactly the same direction but have very different lengths:

WORKED EXAMPLE: VERY LOUD vs A LITTLE LOUD
vec(“A Little Loud”) = [ 0.30, 0.40, 0.20 ]  |magnitude| = 0.539
vec(“Very Loud”)     = [ 0.90, 1.20, 0.60 ]  |magnitude| = 1.616

// Cosine similarity: measures direction only

        dot(AL, VL) = (0.3)(0.9) + (0.4)(1.2) + (0.2)(0.6) = 0.27 + 0.48 + 0.12 = 0.87

        cos(AL, VL) = 0.87 / (0.539 × 1.616) = 0.87 / 0.871 ≈ 1.000

// Dot product: measures direction AND magnitude
AL · AL = (0.3)² + (0.4)² + (0.2)² = 0.09 + 0.16 + 0.04 = 0.29
VL · VL = (0.9)² + (1.2)² + (0.6)² = 0.81 + 1.44 + 0.36 = 2.61

Vector	Magnitude	cos(θ) with the other vector	v · v
A Little Loud	0.539	1.000 (same dir.)	0.29
Very Loud	1.616	1.000 (same dir.)	2.61

The two vectors are perfectly collinear, so their cosine similarity is 1.0. Their self dot products (squared magnitudes), however, are 0.29 and 2.61, a 9× difference. This is one reason recommendation systems and attention mechanisms in transformer models often use dot products rather than cosine similarity: when you want to know not just whether a document is relevant but also how prominently it discusses a topic, the dot product reflects both direction and magnitude at once.
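
To see the difference in a ranking setting, the sketch below scores both loudness vectors against a query. The query vector is a hypothetical stand-in (it simply reuses the A Little Loud values); cosine similarity ties the two candidates, while the dot product favours the longer one.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a_little_loud = [0.30, 0.40, 0.20]
very_loud = [0.90, 1.20, 0.60]
query = [0.30, 0.40, 0.20]  # hypothetical query pointing in the "loud" direction

# Cosine cannot tell the two candidates apart
print(round(cosine(query, a_little_loud), 3), round(cosine(query, very_loud), 3))  # 1.0 1.0

# The dot product also rewards magnitude
print(round(dot(query, a_little_loud), 2), round(dot(query, very_loud), 2))  # 0.29 0.87

# Self dot products are the squared magnitudes from the table
print(round(dot(very_loud, very_loud) / dot(a_little_loud, a_little_loud), 2))  # 9.0
```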

6 · Practical Applications

Search engines convert your query into a vector and retrieve documents whose vectors are nearest to it in the semantic space — using cosine similarity to rank by relevance regardless of exact word match. When you search for car insurance and the engine returns results about vehicle coverage, it is doing nearest-neighbour lookup in embedding space, exactly as the Hot/Warm/Cold example in Section 2 demonstrates.
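
A toy version of that lookup might look like the sketch below. The document titles and vectors are hypothetical (the numbers reuse Warm and Cold from Section 2), and the brute-force sort is only an illustration; it is not how any particular search engine is implemented.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical document embeddings (values borrowed from Section 2)
documents = {
    "vehicle coverage": [0.8, 0.6, 0.4],
    "frozen desserts": [-0.6, 0.4, -0.8],
}
query = [1.0, 0.8, 0.6]  # stand-in embedding for the query "car insurance"

def search(query_vec, docs, k=1):
    """Return the k document titles with the highest cosine similarity."""
    ranked = sorted(docs, key=lambda title: cosine(query_vec, docs[title]), reverse=True)
    return ranked[:k]

print(search(query, documents))  # ['vehicle coverage']
```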

Recommendation systems represent your interests as a vector computed from your history, then find products whose vectors are closest to yours. The dot product is particularly useful here: a highly-relevant item with a large magnitude — analogous to Very Loud — will score higher than a mildly-relevant item even if they point in the same direction.

Large language models use the scaled dot product directly inside the attention mechanism. For every token, a query vector and a set of key vectors are compared via dot product to determine which parts of the context deserve attention — a direct descendant of the arithmetic explored in Section 5.
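
The standard scaled dot-product attention formula is softmax(q · k / √d_k) applied over the keys, followed by a weighted sum of the value vectors. The sketch below applies it to a single query in plain Python. Reusing the Hot, Warm, and Cold vectors as keys and values is an assumption made for illustration; in a real transformer the queries, keys, and values come from learned projections.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

hot = [1.0, 0.8, 0.6]
warm = [0.8, 0.6, 0.4]
cold = [-0.6, 0.4, -0.8]

# Toy setup: the keys double as the values
weights, output = attention(hot, [hot, warm, cold], [hot, warm, cold])
print([round(w, 3) for w in weights])
```

With Hot as the query, the largest weight goes to Hot itself, then Warm, then Cold, following the order of the dot products.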

Quick Reference: Embedding Operations

Operation	Formula	Result from Sections 2–5
Euclidean Distance	√( Σ (a_i − b_i)² )	d(Hot,Warm) = 0.346; d(Hot,Cold) = 2.163
Cosine Similarity	(a·b) / (‖a‖×‖b‖)	cos(Hot,Warm) = +0.998; cos(Hot,Cold) = −0.499
Vector Arithmetic	a ± b	King − Man + Woman → nearest Queen (d = 0.400)
Scalar Multiplication	λ · a	Large × 2 → near Gigantic; Loud ÷ 2 → toward Soft
Dot Product	a·b = Σ a_i b_i	cos = 1.00 for the pair; self dot 0.29 (soft) vs 2.61 (loud)

✦ This article was generated with the assistance of Claude by Anthropic ✦

America’s $200 AI Coding Tool Just Met a $3 Chinese Rival, GLM-4.7

https://www.techloy.com/americas-200-ai-coding-tool-just-met-a-3-chinese-rival-glm-4-7/

Mac Studio vs Mac Mini M4: Local AI Performance Benchmarks

APPLE SILICON 2025

The rise of local AI has transformed how professionals and enthusiasts interact with large language models. Running AI models locally offers significant advantages: complete data privacy, no recurring subscription costs, offline functionality, and freedom from rate limits. However, the performance of local AI systems varies dramatically depending on hardware choices.

Apple Silicon has emerged as a compelling platform for local AI deployment, leveraging unified memory architecture and efficient neural processing capabilities. But which Apple system delivers the best balance of performance, capability, and value for running local language models?

1 · Motivation

Choosing the right hardware for local AI can be challenging. While cloud-based AI services like ChatGPT and Claude offer convenience, they come with privacy concerns, ongoing costs, and dependency on internet connectivity. Local AI eliminates these issues but requires careful hardware selection to ensure adequate performance.

This benchmark comparison aims to answer four critical questions:

How does the Mac Studio compare to the more affordable Mac Mini M4?
What performance trade-offs exist when scaling from tiny (1B) to medium (14B) models?
Which configurations provide acceptable interactive performance?
Where do Apple Silicon systems stand compared to dedicated GPU solutions?

All benchmarks were conducted using LocalScore AI, a standardized testing platform measuring generation speed, response latency, and prompt processing capabilities. Tests were run on November 13, 2025 using Q4_K Medium quantization.

2 · Key Takeaway

The Mac Studio dominates local AI performance across all model sizes, delivering roughly 2.3x to 6.5x better results than the Mac Mini M4, depending on the metric and model size.

Quick Recommendation: Choose Mac Studio for professional work or if you want to run 8B+ models. Choose Mac Mini M4 only if you are budget-constrained and committed to tiny (1B) models exclusively.

3 · Complete Performance Results

Both systems were tested with tiny (1B), small (8B), and medium (14B) models using Q4_K Medium quantization.

Metric	Mac Studio (1B)	Mac Mini M4 (1B)	Mac Studio (8B)	Mac Mini M4 (8B)	Mac Studio (14B)	Mac Mini M4 (14B)
Model	Llama 3.2 1B	Llama 3.2 1B	Llama 3.1 8B	Llama 3.1 8B	Qwen2.5 14B	Qwen2.5 14B
Generation Speed	178 tokens/s	77.1 tokens/s	62.7 tokens/s	17.7 tokens/s	35.8 tokens/s	9.6 tokens/s
Time to First Token	203 ms	1,180 ms	1,060 ms	6,850 ms	2,040 ms	13,300 ms
Prompt Processing	5,719 tokens/s	1,111 tokens/s	1,119 tokens/s	186 tokens/s	583 tokens/s	96 tokens/s
LocalScore Rating	1,713	417	405	78	217	41

4 · Performance Analysis by Model Size

Tiny Model (1B Parameters)

Metric	Mac Studio	Mac Mini M4	Performance Ratio
Generation Speed	178 tokens/s	77.1 tokens/s	2.3x faster
Time to First Token	203 ms	1,180 ms	5.8x faster
Prompt Processing	5,719 tokens/s	1,111 tokens/s	5.1x faster
LocalScore Rating	1,713	417	4.1x higher

Mac Studio: Delivers exceptional performance with near-instantaneous 203 ms response time. Excellent for real-time coding assistance, content creation, and interactive workflows.

Mac Mini M4: Provides functional performance with noticeable 1.18-second latency. Adequate for occasional use and non-critical applications.

Small Model (8B Parameters)

Metric	Mac Studio	Mac Mini M4	Performance Ratio
Generation Speed	62.7 tokens/s	17.7 tokens/s	3.5x faster
Time to First Token	1,060 ms	6,850 ms	6.5x faster
Prompt Processing	1,119 tokens/s	186 tokens/s	6.0x faster
LocalScore Rating	405	78	5.2x higher

Mac Studio: Maintains functional performance with 1.06-second response time. Suitable for quality-focused applications where enhanced model capabilities justify slower speeds.

Mac Mini M4: Experiences severe degradation with 6.85-second latency, making interactive use impractical for most workflows.

Medium Model (14B Parameters)

Metric	Mac Studio	Mac Mini M4	Performance Ratio
Generation Speed	35.8 tokens/s	9.6 tokens/s	3.7x faster
Time to First Token	2,040 ms	13,300 ms	6.5x faster
Prompt Processing	583 tokens/s	96 tokens/s	6.1x faster
LocalScore Rating	217	41	5.3x higher

Mac Studio: Shows significant slowdown with 2.04-second response time. Best for batch-oriented workflows where maximum model capability is prioritised over speed.

Mac Mini M4: Performance becomes severely constrained with 13.3-second latency. Generation at only 9.6 tokens/s makes this configuration unusable for interactive applications.
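
The Performance Ratio columns can be recomputed from the raw figures with a short script. The dictionary layout and the name `advantage` are illustrative; the numbers are copied from the tables above. Time to first token is a lower-is-better metric, so its ratio is inverted.

```python
# Figures from the results tables (generation and prompt in tokens/s, TTFT in ms)
studio = {
    "1B":  {"generation": 178.0, "ttft_ms": 203,  "prompt": 5719, "score": 1713},
    "8B":  {"generation": 62.7,  "ttft_ms": 1060, "prompt": 1119, "score": 405},
    "14B": {"generation": 35.8,  "ttft_ms": 2040, "prompt": 583,  "score": 217},
}
mini = {
    "1B":  {"generation": 77.1, "ttft_ms": 1180,  "prompt": 1111, "score": 417},
    "8B":  {"generation": 17.7, "ttft_ms": 6850,  "prompt": 186,  "score": 78},
    "14B": {"generation": 9.6,  "ttft_ms": 13300, "prompt": 96,   "score": 41},
}

def advantage(model, metric):
    """Mac Studio advantage over the Mac Mini M4, rounded to one decimal."""
    s, m = studio[model][metric], mini[model][metric]
    # Latency: lower is better, so divide the other way round
    return round(m / s if metric == "ttft_ms" else s / m, 1)

ratios = {(model, metric): advantage(model, metric)
          for model in studio for metric in studio[model]}

print(ratios[("1B", "generation")], ratios[("14B", "ttft_ms")])  # 2.3 6.5
print(min(ratios.values()), max(ratios.values()))                # 2.3 6.5
```

Across all twelve comparisons the advantage stays between 2.3x and 6.5x.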

5 · Model Scaling Performance

Mac Studio Scaling

Model Size	Generation	First Token	Prompt Processing	Score
1B (Tiny)	178 tokens/s	203 ms	5,719 tokens/s	1,713
8B (Small)	62.7 tokens/s	1,060 ms	1,119 tokens/s	405
14B (Medium)	35.8 tokens/s	2,040 ms	583 tokens/s	217

The Mac Studio shows progressive degradation as model size increases but maintains usable performance throughout. The 8x parameter increase from 1B to 8B results in 65% slower generation; the 14B model runs at about 57% of the 8B’s speed (43% slower).

Mac Mini M4 Scaling

Model Size	Generation	First Token	Prompt Processing	Score
1B (Tiny)	77.1 tokens/s	1,180 ms	1,111 tokens/s	417
8B (Small)	17.7 tokens/s	6,850 ms	186 tokens/s	78
14B (Medium)	9.6 tokens/s	13,300 ms	96 tokens/s	41

The Mac Mini M4 experiences catastrophic degradation with larger models. The jump from 1B to 8B results in 77% slower generation; the 14B adds a further 46% reduction. A 13.3-second time to first token makes the 14B configuration nearly unusable for any interactive application.
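
The slowdown percentages quoted in this section follow directly from the generation speeds. A minimal sketch, with the helper name `percent_slower` chosen for illustration:

```python
# Generation speed in tokens/s, from the scaling tables
generation = {
    "Mac Studio":  {"1B": 178.0, "8B": 62.7, "14B": 35.8},
    "Mac Mini M4": {"1B": 77.1,  "8B": 17.7, "14B": 9.6},
}

def percent_slower(before, after):
    """Percentage drop in speed when moving from `before` to `after`."""
    return round((1 - after / before) * 100)

for machine, speeds in generation.items():
    step_1 = percent_slower(speeds["1B"], speeds["8B"])
    step_2 = percent_slower(speeds["8B"], speeds["14B"])
    print(f"{machine}: 1B to 8B {step_1}% slower, 8B to 14B {step_2}% slower")
# Mac Studio: 1B to 8B 65% slower, 8B to 14B 43% slower
# Mac Mini M4: 1B to 8B 77% slower, 8B to 14B 46% slower
```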

6 · Configuration Recommendations

Configuration	Performance	Best For	Rating
Mac Studio + 1B	178 tok/s, 203 ms	Real-time coding, content creation	Excellent
Mac Studio + 8B	62.7 tok/s, 1.06 s	Enhanced reasoning, quality work	Good
Mac Studio + 14B	35.8 tok/s, 2.04 s	Max capability, batch workflows	Fair
Mac Mini M4 + 1B	77.1 tok/s, 1.18 s	Budget-conscious, occasional use	Fair
Mac Mini M4 + 8B	17.7 tok/s, 6.85 s	Not suitable for interactive use	Poor
Mac Mini M4 + 14B	9.6 tok/s, 13.3 s	Not practical for any use case	Poor
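
One way to turn these figures into a felt waiting time is to add the time to first token to the time needed to generate the reply. The sketch below does that for the two 8B configurations. The 300-token reply length is a hypothetical value chosen for illustration, and the estimate assumes the benchmark's time to first token applies to your prompt, which will not hold for much longer or shorter prompts.

```python
def response_seconds(ttft_ms, tokens_per_s, reply_tokens):
    """Time to first token plus steady-state generation time, in seconds."""
    return ttft_ms / 1000 + reply_tokens / tokens_per_s

REPLY_TOKENS = 300  # hypothetical reply length, not part of the benchmark

# (time to first token in ms, generation speed in tokens/s) from the table above
configs = {
    "Mac Studio + 8B": (1060, 62.7),
    "Mac Mini M4 + 8B": (6850, 17.7),
}

for name, (ttft_ms, speed) in configs.items():
    print(f"{name}: {response_seconds(ttft_ms, speed, REPLY_TOKENS):.1f} s")
```

Under these assumptions the Mac Studio finishes a 300-token reply in about 5.8 seconds and the Mac Mini M4 in about 23.8 seconds.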

7 · Bottom Line

The Mac Studio demonstrates clear superiority across all tested configurations, with measured advantages of 2.3–5.8x on the tiny model and 3.5–6.5x on the larger ones. The system handles tiny models exceptionally well, small models competently, and medium models adequately for users prioritising capability over speed.

The Mac Mini M4 is only viable for tiny (1B) models, where it provides functional if slower performance. Small (8B) and medium (14B) models push the hardware well beyond practical limits, with response latencies of 6.85 and 13.3 seconds making interactive use frustrating or impossible.

Hardware choice significantly impacts local AI usability. Match your investment to your model size requirements: Mac Studio for flexibility across all model sizes, Mac Mini M4 only if you are committed to tiny models exclusively.

8 · Apple Silicon vs Dedicated GPUs

While these benchmarks show the Mac Studio leading among Apple Silicon options, dedicated GPU solutions like the NVIDIA RTX 4090 still deliver 3–5x higher raw performance for similar model sizes, with 400+ tokens/s achievable on small models.

Apple Silicon remains compelling despite lower absolute throughput:

System Integration: All-in-one design without external GPU requirements
Energy Efficiency: Lower power consumption and heat generation
Silent Operation: Minimal fan noise compared to high-performance GPUs
Unified Memory: Efficient sharing of one memory pool between the CPU, GPU, and Neural Engine
macOS Ecosystem: Seamless integration with macOS applications and workflows

Users requiring maximum raw throughput should consider GPU-based systems. Those prioritising integration, efficiency, noise levels, and macOS compatibility will find Apple Silicon delivers excellent local AI capabilities within its design constraints.

For more hardware comparisons, visit LocalScore AI.

9 · Benchmark Sources

Hardware	Model	Parameters	Test Link
Mac Studio	Llama 3.2 1B	1B (Tiny)	Test #1788
Mac Mini M4	Llama 3.2 1B	1B (Tiny)	Test #1789
Mac Studio	Llama 3.1 8B	8B (Small)	Test #1790
Mac Mini M4	Llama 3.1 8B	8B (Small)	Test #1791
Mac Studio	Qwen2.5 14B	14B (Medium)	Test #1792
Mac Mini M4	Qwen2.5 14B	14B (Medium)	Test #1793

All tests conducted November 13, 2025, using LocalScore AI with Q4_K Medium quantization.


✦ This article was generated with the assistance of Claude by Anthropic ✦

XDA: I tried this open-source platform to self-host LLMs, and it’s faster than I expected

https://www.xda-developers.com/open-source-platform-to-self-host-llms-faster-than-expected/

LLMs’ ‘Simulated Reasoning’ Abilities Are a ‘Brittle Mirage,’ Researchers Find

https://m.slashdot.org/story/445464

Automating oral argument

Back in January 2023, a company called DoNotPay offered “any lawyer or person $1,000,000 with an upcoming case in front of the United States Supreme Court to wear AirPods and let our robot lawyer argue the case by repeating exactly what it says.” At the time, everyone thought this was a silly gimmick.

https://adamunikowsky.substack.com/p/automating-oral-argument?utm_source=tldrai

Claude Code overview – Anthropic

Claude Code is an agentic coding tool developed by Anthropic that operates directly within the terminal, designed to help developers write code faster and more efficiently. It takes natural-language commands and works with an understanding of the entire codebase, assisting with routine tasks, explaining complex code, and handling Git workflows.

By integrating with the development environment, Claude Code can facilitate refactoring, documenting, and debugging, aiming to streamline development processes and enhance code quality. It leverages advanced AI models like Claude Opus 4 to provide deep codebase awareness and the ability to execute commands and edit files directly.

https://docs.anthropic.com/en/docs/claude-code/overview