Embeddings, Vectors & High-Dimensional Latent Spaces

Understanding the basic concepts of Machine Learning.

Mar 11, 2025


Embeddings

Imagine you have a bunch of toys, like metal cars and teddies. How would you tell whether a toy is a metal car or a teddy? Let's walk through the process together.

1. Identify what makes each object different or similar.

🚗 For a car: speed, number of wheels, engine, hard exterior, etc.
🧸 For a teddy: softness, cuddly texture, no wheels, generally no engine, etc.

2. Turn these traits into a corresponding number.

For instance, we could have a speed trait measured on a scale of 0 (slow) to 10 (fast). A car might have 8 for speed, while a teddy might have 0. Another trait could be hardness. A car might get 9, while a teddy might get 1.

Here's a table representing this.

Object  | Speed | Hardness | Cuddly | Has Wheels
Car     | 8     | 9        | 0      | 1
Teddy   | 0     | 1        | 9      | 0
Bicycle | 3     | 4        | 0      | 1
Phone   | 0     | 6        | 0      | 0

In case you are wondering why numbers: we turn traits into numbers to give computers a way to compare them. Computers are good at numerical operations, so we give them numerical values. When each trait is assigned a number, a computer can quickly check which ones are close (similar) and which are far apart (different). This is how it figures out that two teddies are more alike (both have high "cuddly" numbers) than a teddy and a car, which instead has high "speed" and "hardness" numbers.
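To make this concrete, here is a minimal sketch in plain Python (using the made-up trait values from the table above) of how a computer could compare toys once their traits are numbers. Euclidean distance is just one of several ways to measure "closeness".

```python
# Trait order: [Speed, Hardness, Cuddly, Has Wheels] -- values from the table above
toys = {
    "car":     [8, 9, 0, 1],
    "teddy":   [0, 1, 9, 0],
    "bicycle": [3, 4, 0, 1],
    "phone":   [0, 6, 0, 0],
}

def distance(a, b):
    """Euclidean distance: a small number means the two toys are similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

print(distance(toys["car"], toys["bicycle"]))  # ~7.1  -> fairly close
print(distance(toys["car"], toys["teddy"]))    # ~14.5 -> far apart
```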

3. Create a “feature list” (the embedding).

Imagine writing down these numbers in order for each object. For a Car, this could look like the table below.

Object  | Speed | Hardness | Cuddly | Has Wheels
Car     | 8     | 9        | 0      | 1

If we preserve the ordering of the properties across notations, [Speed | Hardness | Cuddly | Has Wheels] in the example above, we can represent a Car using a list notation such as [8, 9, 0, 1]. If another toy has a similar list, say [3, 9, 0, 1], then it may (or may not) be a Car. It just seems to be a relatively slower one.

That list of numbers is your "embedding". Embeddings are like giving each toy a special code that tells a computer what the toy is like. It helps the computer know which toys are similar (like two fast cars) and which are different (like a fast car and... a slow Teddy?).
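As a rough illustration of how a computer might use these embeddings (reusing the `toys` dictionary and `distance` function from the earlier sketch, and a hypothetical mystery toy with made-up trait values), the simplest possible guess is "whichever known toy is closest":

```python
# Reuses `toys` and `distance` from the earlier sketch.
mystery = [7, 8, 0, 1]  # hypothetical traits: fast-ish, hard, not cuddly, has wheels

closest = min(toys, key=lambda name: distance(toys[name], mystery))
print(closest)  # -> "car", because the car's embedding is the nearest one
```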

Vectors

In our current context, a Vector is an ordered list of numbers, say [8, 9, 0, 1] for a Car. A vector can represent just about anything (words, images, or objects) as long as we've decided how to break that thing down into numerical features, just as we did with our toys while creating the feature list above.
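To show that the "thing" doesn't have to be a toy, here is a deliberately crude sketch that turns a word into a vector of letter counts. The feature scheme (one slot per letter a to z) is made up purely for illustration; real word embeddings are learned by models rather than hand-crafted like this.

```python
def word_to_vector(word: str) -> list[int]:
    """A toy feature scheme: one slot per letter a-z, counting occurrences."""
    counts = [0] * 26
    for ch in word.lower():
        if ch.isalpha():
            counts[ord(ch) - ord("a")] += 1
    return counts

print(word_to_vector("teddy"))  # an ordered list of 26 numbers -- a vector
```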

High-Dimensional Latent Spaces

When you collect a lot of these vectors and imagine them all floating in a big “cloud,” you get a latent space. It’s called latent because it captures features that might not be directly visible or intuitive (like subtle similarities between different toys). And it’s often high-dimensional because we might track dozens or hundreds of features.

Putting it all together:

  • An embedding represents what you want to describe as a vector of numbers.
  • These vectors live in a high-dimensional latent space where distance and direction capture how similar or different the objects (or words, or images) are.
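Here is a small sketch of "distance and direction" (it assumes NumPy is installed, and the truck's trait values are made up): Euclidean distance measures how far apart two points sit in the space, while cosine similarity compares the direction they point in.

```python
import numpy as np

car   = np.array([8, 9, 0, 1])
truck = np.array([6, 9, 0, 1])  # hypothetical trait values for a truck
teddy = np.array([0, 1, 9, 0])

def cosine_similarity(a, b):
    """Close to 1.0 -> same direction (similar); close to 0.0 -> unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(np.linalg.norm(car - truck), cosine_similarity(car, truck))  # small distance, ~0.99
print(np.linalg.norm(car - teddy), cosine_similarity(car, teddy))  # large distance, ~0.08
```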

But what is the significance of having a High-Dimensional Latent Space?

A high-dimensional latent space gives a model a richer, more nuanced way to represent objects, words, or any data. Here’s why that matters:

  1. More Detail and Nuance
    Each extra dimension can capture a different aspect of whatever is being modeled. For a product, these might be features like style, color, brand, etc. The higher the dimension, the more detail the model can encode, and the more information it can take into account when making decisions.

  2. Flexible Similarity
    In a high-dimensional space, two points (representing, say, two items) can be similar along one “axis” but differ along another. This lets the model understand that “car” and “truck” might be close in terms of “vehicle-ness” but differ in their size or capacity dimensions.

  3. Better Generalization
    By capturing many properties, the model can adapt to new or complex tasks. Because it has a richer representation, it can, for instance, more easily compare items, group similar things together, or understand nuanced categories.

  4. Dimensionality Reduction
    Since it’s hard for us humans to directly view or interpret a high-dimensional space, we can use techniques like PCA or t-SNE to create a simpler, lower-dimensional “snapshot” for visualization, while still keeping the detailed internal representation for machine learning tasks (see the sketch after this list).
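As a minimal example of point 4 (assuming NumPy and scikit-learn are installed, and using random vectors purely as stand-ins for real embeddings), PCA can squash 64-dimensional vectors down to 2 dimensions for plotting:

```python
import numpy as np
from sklearn.decomposition import PCA

embeddings = np.random.rand(100, 64)   # 100 made-up 64-dimensional embeddings

pca = PCA(n_components=2)              # keep the 2 directions with the most variation
points_2d = pca.fit_transform(embeddings)

print(points_2d.shape)                 # (100, 2) -> ready for a 2-D scatter plot
```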

In short, a high-dimensional latent space is powerful because it encodes a lot of information, letting models capture subtle relationships that would be lost in just one, two, or three dimensions.

