Embeddings, Vectors & High-Dimensional Latent Spaces

Understanding embeddings, vectors, and latent spaces in plain language.


This is the plain-language model I use for embeddings:

An embedding is a list of numbers that represents something. That something can be a word, a sentence, an image, a user, a product, or almost anything else a model needs to compare.

The important bit: in modern ML systems, these numbers are usually learned. Nobody sits down and assigns each dimension by hand. The model learns a useful representation from data.

Start with a toy example

Imagine we want to compare four objects:

Object    Speed  Hardness  Cuddly  Has wheels
Car       8      9         0       1
Teddy     0      1         9       0
Bicycle   3      4         0       1
Phone     0      6         0       0

For the car, the feature list is:

[8, 9, 0, 1]

That list is a vector: an ordered list of numbers.

This example is useful because it shows why vectors help. Computers compare numbers well. If two vectors are close, we can treat the objects as similar for some purpose.
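Here is a minimal sketch of that comparison in Python, using the four toy objects and reading each table value as a single digit. The choice of Euclidean (straight-line) distance and the helper names `euclidean_distance` and `nearest` are my own assumptions for illustration; other distance measures would work too.

```python
import math

# Feature order: [speed, hardness, cuddly, has_wheels], as in the toy table.
objects = {
    "Car": [8, 9, 0, 1],
    "Teddy": [0, 1, 9, 0],
    "Bicycle": [3, 4, 0, 1],
    "Phone": [0, 6, 0, 0],
}

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(name):
    """Return the other object whose vector is closest to `name`."""
    others = (o for o in objects if o != name)
    return min(others, key=lambda o: euclidean_distance(objects[name], objects[o]))

for name in objects:
    print(name, "is closest to", nearest(name))
```

With these numbers the car lands closest to the bicycle, which matches intuition. The bicycle, though, lands closest to the phone, because the four hand-picked features happen to score them alike. "Similar" always means similar with respect to the features you chose.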

But this toy table is not exactly how real embeddings work. In this table, we named the dimensions ourselves: speed, hardness, cuddly, wheels. In a learned embedding, the dimensions usually do not have clean human names.

What a vector is

A vector is just an ordered list of numbers.

[0.12, -0.45, 1.03, 0.08]

The order matters. If two systems do not agree on what each position means, the numbers cannot be compared safely.
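To make that concrete, here is a small sketch using the toy car and bicycle features from earlier. "System B" and its reversed layout are hypothetical, invented only to show what goes wrong when two sides disagree on position.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# System A layout: [speed, hardness, cuddly, has_wheels]
car_a = [8, 9, 0, 1]
bicycle_a = [3, 4, 0, 1]

def to_system_b(v):
    """Hypothetical system B layout: [has_wheels, cuddly, hardness, speed]."""
    speed, hardness, cuddly, has_wheels = v
    return [has_wheels, cuddly, hardness, speed]

bicycle_b = to_system_b(bicycle_a)

# Same objects, same values, but the second comparison mixes two layouts.
same_layout = euclidean_distance(car_a, bicycle_a)
mixed_layout = euclidean_distance(car_a, bicycle_b)

print("same layout: ", round(same_layout, 2))
print("mixed layout:", round(mixed_layout, 2))
```

Nothing about the bicycle changed, yet the mixed-layout distance is much larger. The arithmetic still runs without error, which is exactly why this kind of mismatch is easy to miss.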

What an embedding is

An embedding is a vector used to represent something in a way that is useful for a task.

For example:

  • word embeddings can place related words near each other
  • sentence embeddings can help find similar passages
  • product embeddings can help with recommendations
  • image embeddings can help compare visual content

The embedding does not store the original thing. It stores a compressed representation that preserves useful relationships.

What "high-dimensional" means

Our toy car vector had four dimensions. Real embeddings often have hundreds or thousands of dimensions.

More dimensions give the model more room to encode useful differences. That does not automatically mean "better." It means the representation has more capacity, and the model still has to learn something useful.

What "latent space" means

A latent space is the space where these learned vectors live.

It is called latent because the features are hidden. We can measure distance and direction in the space, but we usually cannot point to one dimension and say, "this one means softness" or "this one means sarcasm."

That is the tradeoff. Learned embeddings can capture patterns humans did not hand-code, but they are harder to interpret directly.

Why distance matters

Once things become vectors, we can compare them with distance or similarity measures. That is the basis for a lot of search, clustering, recommendation, and retrieval systems.
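Two common measures are Euclidean distance (how far apart two points are) and cosine similarity (how closely two vectors point in the same direction). The sketch below is a plain-Python illustration, not any particular library's API; the function names are my own.

```python
import math

def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length (magnitude) of a vector."""
    return math.sqrt(dot(a, a))

def euclidean_distance(a, b):
    """Straight-line distance; 0.0 means the vectors are identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 1.0 means same direction."""
    denom = norm(a) * norm(b)
    if denom == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / denom

v = [0.12, -0.45, 1.03, 0.08]
scaled = [2 * x for x in v]  # same direction, twice the length

print("euclidean distance:", euclidean_distance(v, scaled))
print("cosine similarity: ", cosine_similarity(v, scaled))
```

The two measures can disagree. Doubling a vector moves it away from the original in Euclidean terms, but the cosine similarity stays at essentially 1.0 because the direction is unchanged. Which measure fits depends on how the embeddings were trained.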

If two sentence embeddings are close, the sentences may be about similar ideas. If a query embedding is close to a document embedding, that document may be a good result.
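As a rough sketch of that retrieval idea: score every document against the query and keep the highest scores. The three-dimensional vectors and the names `doc_a`, `doc_b`, `doc_c`, and `rank` are made up for illustration; real embeddings would come from a trained model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example.
documents = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.3],
    "doc_c": [0.7, 0.3, 0.1],
}
query = [0.8, 0.2, 0.0]

def rank(query_vec, docs, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

for name, score in rank(query, documents):
    print(name, round(score, 3))
```

Here `doc_a` and `doc_c` come out on top and `doc_b` is left out. A high score only suggests that a document may be relevant.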

"May" matters. Vector similarity is a useful signal, not a guarantee of truth.

A simple way to remember it

Vector: an ordered list of numbers.

Embedding: a learned vector representation of something.

Latent space: the high-dimensional space where those vectors can be compared.

The main idea is not that the numbers are magical. It is that useful similarity can be turned into geometry.


Friendly Copyright & Sharing Reminder by Tushar Mohan.

Hey there! I’m thrilled you stopped by and hope my posts spark ideas of your own.

Feel free to quote short excerpts for commentary, reviews, or academic purposes—but please don’t copy, republish, or remix substantial portions without first getting my written okay.

Need permission? It’s easy—just drop me a note on my email or connect with me on any of the social media platforms I have linked here, with a quick outline of what you’d like to use, and we’ll sort it out fast. Thanks for respecting the work that goes into each post, and for helping keep the internet a place where creators and readers both thrive.

Unless I’ve credited someone else, all articles, code snippets, images, and other goodies on this site are © Tushar Mohan, 2026.