Welcome back to the second episode of the blog series on Linear Algebra from the lens of Machine Learning. In the first episode, we gave an overview of scalars and discussed their relevance in machine learning. In this episode, let’s dive deep into vectors, one of the fundamental concepts of linear algebra, and discuss their significance in machine learning algorithms.

What Are Vectors?

In the simplest terms, a vector is an ordered array of numbers. These numbers can represent anything from coordinates in space to features of a data point. For example, consider a house with two features: the number of bedrooms and the size in square feet. A house with 2 bedrooms and 1500 sqft can be represented as a vector [2, 1500].
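For instance, a house can be represented as a one-dimensional NumPy array. This is a minimal sketch; the feature values are simply the ones from the example above.

```python
import numpy as np

# A house with 2 bedrooms and 1,500 sqft, represented as a vector
house = np.array([2, 1500])

print(house)        # [   2 1500]
print(house.shape)  # (2,) -- an ordered array with two entries
```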

Why Are Vectors Important in Machine Learning?

Vectors play a crucial role in the development and implementation of machine learning algorithms. Here are some reasons why:

  1. Data Representation: Vectors are used to represent data in machine learning algorithms. This makes it easier to compute distances and similarities, or to make predictions. For example, in a simple linear regression problem, a house can be represented as a vector of its features – [number of bedrooms, number of bathrooms, area in square feet, etc.].
  2. Operations: Vector operations such as addition, subtraction, and the dot product are used throughout machine learning algorithms. For example, in gradient descent, a fundamental optimization algorithm, we compute the gradient (a vector of partial derivatives) and update the model parameters (another vector) in the direction of the negative gradient. This is done iteratively: compute the gradient, take a step along the negative gradient, and repeat until convergence. Each iteration should bring the parameters closer to the values that minimize the loss function, making the model’s predictions more accurate (a minimal sketch of this loop follows the list).
  3. Model Parameters: The parameters of a model can also be represented as vectors. For example, in a simple linear regression model, the parameters (or weights) can be represented as a vector, w = [w₁, w₂, …, wₙ].
  4. Predictions: The predictions made by a model can also be represented as vectors. For example, in a multi-class classification problem, the predicted probabilities for each class can be represented as a vector.
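To make the gradient descent point concrete, here is a minimal sketch of the update loop for a linear regression model trained with mean squared error. The dataset, learning rate, and iteration count are made up for illustration only.

```python
import numpy as np

# Toy data: each row is a data point, each column a feature (made up for illustration)
X = np.array([[2.0, 1500.0],
              [3.0, 2000.0],
              [4.0, 2500.0]])
y = np.array([302_000.0, 403_000.0, 504_000.0])  # target prices

w = np.zeros(2)        # parameter vector
learning_rate = 1e-8   # tiny because the features are not scaled
n_iterations = 1_000

for _ in range(n_iterations):
    predictions = X @ w                   # vector of predictions
    errors = predictions - y              # vector of residuals
    gradient = 2 * X.T @ errors / len(y)  # vector of partial derivatives of the MSE loss
    w -= learning_rate * gradient         # step in the direction of the negative gradient

print(w)  # parameters after the update loop
```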

Let’s go through a simple example to illustrate these points:

Example: Suppose we are building a simple linear regression model to predict the price of a house based on two features: the number of bedrooms and the size in square feet. We have a dataset with the following three houses:

House 1: 2 bedrooms, 1500 sqft
House 2: 3 bedrooms, 2000 sqft
House 3: 4 bedrooms, 2500 sqft

We can represent these houses as vectors:

x₁ = [2, 1500]
x₂ = [3, 2000]
x₃ = [4, 2500]

Suppose our model has the parameters:

w = [1000, 200]

To make a prediction for the price of House 1, we compute the dot product of the feature vector of House 1 and the parameter vector:

x₁ · w = 2 × 1000 + 1500 × 200 = 302,000

So, our model predicts that the price of House 1 is $302,000.
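The same computation can be done with NumPy’s dot product; this is just the example above expressed in code.

```python
import numpy as np

x1 = np.array([2, 1500])   # feature vector for House 1
w = np.array([1000, 200])  # model parameters (weights)

price = np.dot(x1, w)      # 2*1000 + 1500*200
print(price)               # 302000
```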

What Can Go Wrong in the Absence of Vectors?

Vectors are fundamental to the structure and implementation of machine learning algorithms, so their absence would pose several significant challenges:

  1. Data Representation: In machine learning, data is often represented as a matrix, where each row is a vector representing a data point, and each column is a vector representing a feature. This representation is convenient because it allows us to perform operations on all the data points at once using matrix operations, which are computationally efficient. For example, we can compute the predictions for all the data points in a dataset at once by multiplying the feature matrix by the parameter vector. Without vectors, we would need to represent the data in a different way, which may not allow us to perform operations on all the data points at once and may not be as computationally efficient.
  2. Computational Efficiency: Vectorized operations, such as matrix multiplication, are highly optimized and run far faster than the equivalent element-wise loops. For example, multiplying two matrices with a single matrix multiplication call is much quicker than multiplying each pair of elements inside nested loops. This is especially important in machine learning, where we often operate on very large datasets. Without vectors, these operations would have to be written as loops, which would significantly increase the computational time required to train and evaluate models (see the comparison sketch after this list).
  3. Mathematical Operations: Most machine learning algorithms involve mathematical operations like computing distances, similarities, or gradients. These operations are often defined in terms of vectors. For example, the Euclidean distance between two points is defined as the square root of the sum of the squares of the differences between corresponding elements of the two points, which is essentially the norm of the difference between the two vectors representing the points. Without vectors, we would need to define a completely different way of computing distances, which may not have the nice mathematical properties (like being a metric) that the Euclidean distance has.
  4. Model Parameters: The parameters of a model are often represented as vectors because it is computationally efficient to update and store them in this way. For example, in linear regression, the model parameters can be updated using the gradient of the loss function, which is a vector. This update can be done efficiently using vectorized operations. Without vectors, we would need to update each parameter individually, which would be much less efficient, especially for models with a large number of parameters.
  5. Predictions: The predictions made by a model are often represented as vectors because it is computationally efficient to compute them in this way. For example, in a multi-class classification problem, the output of the model can be a vector of probabilities, one for each class, computed efficiently as the softmax of the dot product of the input vector and the parameter matrix. Without vectors, we would need to compute each probability individually, which would be both slower and more cumbersome.
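As a small illustration of the first two points, here is the house-price prediction from the earlier example computed for the whole dataset at once with a matrix-vector product, alongside the loop-based equivalent. This is only a sketch; the numbers reuse the example above.

```python
import numpy as np

# Feature matrix: each row is a house, each column a feature
X = np.array([[2, 1500],
              [3, 2000],
              [4, 2500]])
w = np.array([1000, 200])

# Vectorized: one matrix-vector product gives all predictions at once
predictions = X @ w
print(predictions)  # [302000 403000 504000]

# Loop-based equivalent: same result, but far slower on large datasets
loop_predictions = []
for row in X:
    total = 0
    for feature, weight in zip(row, w):
        total += feature * weight
    loop_predictions.append(total)
print(loop_predictions)  # [302000, 403000, 504000]
```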

Therefore, without vectors, we would need to come up with entirely new ways of representing and manipulating data, performing mathematical operations, representing and updating model parameters, and representing and interpreting predictions. This would make the development and implementation of machine learning algorithms cumbersome and costly.

Brief Overview of Vector Properties

In this section, let’s take a brief look at some important properties of vectors and their significance in machine learning; a short code sketch illustrating them follows the list below. These properties will be discussed in more detail in subsequent episodes.

  • Addition: Adding two vectors results in another vector, where each element is the sum of the corresponding elements of the two vectors. This is used in many algorithms, for example, when updating model parameters in gradient descent.
  • Scalar Multiplication: Multiplying a vector by a scalar results in another vector, where each element of the original vector is multiplied by the scalar. This is used, for example, when scaling the gradient in gradient descent.
  • Dot Product: The dot product of two vectors is a scalar, obtained by multiplying corresponding elements of the two vectors and summing the results. This is used in many contexts, for example, when computing the similarity between two vectors or when making predictions in a linear regression model.
  • Norm: The norm of a vector is a measure of its length or size. The most common norm used in machine learning is the Euclidean norm (or L2 norm), which is used in many algorithms that involve computing distances between data points, such as k-nearest neighbors or k-means clustering.
  • Cosine Similarity: This is the cosine of the angle between two vectors and is used as a measure of similarity between the vectors. It is often used in text analysis to measure the similarity between two documents.
  • Orthogonality: Two vectors are orthogonal if their dot product is zero, meaning they are perpendicular to each other in the vector space. Orthogonality is a key concept in many machine learning algorithms. In Principal Component Analysis (PCA), for example, it ensures that the principal components are uncorrelated, so each contributes uniquely to the variance in the data, which is fundamental to PCA’s goals of variance maximization and dimensionality reduction.
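Here is a quick sketch of these properties with NumPy; the vectors are arbitrary and chosen only for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(a + b)              # addition: [5. 7. 9.]
print(2 * a)              # scalar multiplication: [2. 4. 6.]
print(np.dot(a, b))       # dot product: 32.0
print(np.linalg.norm(a))  # Euclidean (L2) norm: ~3.742

# Cosine similarity: cosine of the angle between the two vectors
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)            # ~0.9746

# Orthogonality: the dot product of perpendicular vectors is zero
print(np.dot(np.array([1, 0]), np.array([0, 1])))  # 0
```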

These properties form the mathematical foundation for many operations and algorithms in machine learning. Understanding these concepts is crucial for understanding and implementing machine learning algorithms.

Conclusion

In this episode, the significance of vectors in machine learning algorithms was discussed using a simple example. The impact of the absence of vectors was assessed, and a brief overview of some important properties of vectors and their significance in machine learning was also provided. These properties will be covered in more detail in subsequent episodes. Stay tuned!
