A few weeks ago I was on an android-user-group channel
when someone posted a question about Android’s
Matrix.postScale(sx, sy, px, py) method and how it works,
because it was “hard to grasp”.
Coincidentally, at the beginning of 2016 I had finished a freelance
project on an Android application where I had to implement
an exciting feature:
Android app screenshots
The user, after buying and downloading a digital topography of a crag, had to
be able to view the crag, which was composed of:
a picture of the cliff,
an SVG file containing an overlay of the climbing routes.
The user had to have the ability to pan and zoom at will and have the routes
layer “follow” the picture.
Technical challenge
In order to have the overlay of routes follow the user’s actions, I found I
had to get my hands dirty: subclass an Android ImageView, draw onto the
Canvas and deal with finger gestures.
Like any good engineer, I searched Stack Overflow
:sweat_smile:
And I discovered I’d need the android.graphics.Matrix class for 2D
transformations.
The problem with this class is that it might seem obvious what it does, but if
you have no mathematical background, it’s quite mysterious.
Postconcats the matrix with the specified scale. M’ = S(sx, sy, px, py) * M
Yeah, cool, so it scales something with some parameters and it does it with
some kind of multiplication. Nah, I don’t get it:
What does it do exactly? Scales a matrix? What’s that supposed to mean, I
want to scale the canvas…
What should I use, preScale or postScale? Do I try both while I get the
input parameters from my gesture detection code and enter an infinite loop of
trial and error guesstimates? (No. Fucking. Way.)
So at this very moment of the development process I realized I needed to
re-learn the basic math of matrices, which I had forgotten
many years ago, after finishing my first two years of uni
:scream:
WWW to the rescue!
While searching around I found a number of good resources and was able to
learn some math again, and it felt great. It also helped me solve my 2D
transformation problems by applying my understanding as code in Java and
Android.
So, given the discussion I had on the channel I mentioned above, it
seems I was not the only one struggling with matrices, trying to make sense of
them and to use them with Android’s Matrix class and methods,
so I thought I’d write an article.
The first part, this one, is about matrices. The second part,
“2D Transformations with Android and Java”,
is about how to apply what you know about matrices in code, with Java and
on Android.
What is a matrix?
The first resources you might encounter when trying to understand 2D
transformations are the “Transformation matrix” and
“Affine transformations” articles on Wikipedia:
If you have this kind of problem, I encourage you to take the time needed to
follow this course until you reach that “AHA” moment. It’s just a few hours of
investment (it’s free) and you won’t regret it.
Why? Because matrices are good at representing data, and
operations on matrices can help you solve problems on this data. For instance,
remember having to solve systems of linear equations at school?
The most common ways (at least the two I‘ve studied) to solve a system
like that are the elimination of variables method
and the row reduction method. But you can also use
matrices for that, which leads to interesting algorithms.
Matrices are used heavily in every branch of science, and they can also be
used to describe linear transformations of the positions of points in space,
and this is the use case we will study in this article.
Anatomy
Simply put, a matrix is a 2D array. In fact, an
$m \times n$ matrix is an array of length $m$ in which
each item is itself an array, this time of length $n$. Usually, $m$
is the number of rows and $n$ the number of columns. Each element in the
matrix is called an entry.
A matrix is represented by a bold capital letter,
and each entry is represented by the same letter, but in lowercase and suffixed
with its row number and column number, in this order. For example:
$$
\mathbf{A} =
\begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23} \end{pmatrix}
$$
Now what can we do with it? We can define an algebra, for instance:
addition, subtraction and
multiplication operations, for fun and profit.
:nerd:
Addition and subtraction are performed on corresponding entries, so in order
to be defined, a matrix addition or subtraction must involve two matrices of
the same dimensions $m \times n$; otherwise the “corresponding entries” bit
would make no sense.
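For example, adding two $2 \times 2$ matrices means adding their corresponding entries:
$$
\begin{pmatrix} 1 & 2\\3 & 4 \end{pmatrix}
+
\begin{pmatrix} 5 & 6\\7 & 8 \end{pmatrix}
=
\begin{pmatrix} 1+5 & 2+6\\3+7 & 4+8 \end{pmatrix}
=
\begin{pmatrix} 6 & 8\\10 & 12 \end{pmatrix}
$$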
Grab a pen and paper and try to add a $3 \times 2$ matrix to a $2 \times 3$
matrix. What will you do with the last row of the first matrix? Same question
with the last column of the second matrix?
If you don’t know, then you’ve reached pretty much the same conclusion as the
mathematicians who defined matrix addition and subtraction
:innocent:
Throughout my math schooling I was told things like
“you can’t add apples to oranges, it makes no
sense”, in order to express the importance of units.
Well, it turns out that multiplying apples by oranges is allowed.
And it applies to matrices too: we can only add matrices to matrices, but
we can multiply matrices by numbers and by other matrices.
In the first case though, the number is not just a number (semantically).
You don’t multiply a matrix by a number, you multiply a matrix by a
scalar. In order to
multiply a matrix by a scalar, we have to multiply each
entry in the matrix by the scalar, which will give us another matrix as a
result.
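For instance:
$$
2 \times
\begin{pmatrix} 1 & 2\\3 & 4 \end{pmatrix}
=
\begin{pmatrix} 2 \times 1 & 2 \times 2\\2 \times 3 & 2 \times 4 \end{pmatrix}
=
\begin{pmatrix} 2 & 4\\6 & 8 \end{pmatrix}
$$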
The second type of multiplication operation is the
multiplication of a matrix by another matrix. This operation
is a little bit more complicated than addition/subtraction, because in order
to multiply a matrix by a matrix we don’t simply multiply the corresponding
entries. I’ll just quote Wikipedia on that one:
if $\mathbf{A}$ is an $m \times n$ matrix and $\mathbf{B}$ is an
$n \times p$ matrix, their matrix product $\mathbf{AB}$ is an
$m \times p$ matrix, in which the $n$ entries across a
row of $\mathbf{A}$ are multiplied with the $n$ entries down a column
of $\mathbf{B}$ and summed to produce an entry of $\mathbf{AB}$
:expressionless:
This hurts my brain, let’s break it down:
if $\mathbf{A}$ is an $m \times n$ matrix and $\mathbf{B}$ is an
$n \times p$ matrix, their matrix product $\mathbf{AB}$ is an
$m \times p$ matrix
We can write this in a more graphical way:
$\underset{m \times n}{\mathbf{A}} \times \underset{n \times p}{\mathbf{B}} = \underset{m \times p}{\mathbf{AB}}$.
Take this simple matrix
$\underset{2 \times 3}{\mathbf{A}} = \begin{pmatrix}a_{11} & a_{12} & a_{13}\\a_{21} & a_{22} & a_{23}\end{pmatrix}$
and this other matrix
$\underset{3 \times 1}{\mathbf{B}} = \begin{pmatrix}b_{11}\\b_{21}\\b_{31}\end{pmatrix}$.
We have $m=2$, $n=3$ and $p=1$, so the multiplication will give
$\underset{2 \times 1}{\mathbf{AB}} = \begin{pmatrix}ab_{11}\\ab_{21}\end{pmatrix}$.
Let’s decompose the second part now:
“the $n$ entries across a row of $\mathbf{A}$” means that each row
in $\mathbf{A}$ is an array of $n=3$ entries: if we take the first row we
get $a_{11}$, $a_{12}$ and $a_{13}$.
“the $n$ entries down a column of $\mathbf{B}$” means that each
column of $\mathbf{B}$ is also an array of $n=3$ entries: in the first
column we get $b_{11}$, $b_{21}$ and $b_{31}$.
“are multiplied with” means that each entry in $\mathbf{A}$'s row
must be multiplied with its corresponding (first with first, second with
second, etc.) entry in $\mathbf{B}$'s column: $a_{11} \times b_{11}$,
$a_{12} \times b_{21}$ and $a_{13} \times b_{31}$
“and summed to produce an entry of $\mathbf{AB}$” means that we must add
the products of these corresponding rows and columns entries in order to get
the entry of the new matrix at this row number and column number: in our case
we took the products of the entries in the first row in the first matrix with
the entries in the first column in the second matrix, so this will give us the
entry in the first row and first column of the new matrix:
$a_{11} \times b_{11} + a_{12} \times b_{21} + a_{13} \times b_{31}$
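If it helps to see the procedure as code, here is a minimal plain-Java sketch of my own (not Android’s Matrix class, which we’ll meet in part two), following exactly the row-times-column rule we just broke down:

```java
/**
 * Multiplies an m x n matrix by an n x p matrix, giving an m x p matrix.
 * Matrices are plain 2D arrays indexed as a[row][column].
 */
static double[][] multiply(double[][] a, double[][] b) {
    int m = a.length, n = a[0].length, p = b[0].length;
    if (b.length != n) {
        // AB is undefined when A's column count differs from B's row count.
        throw new IllegalArgumentException("AB is undefined for these dimensions");
    }
    double[][] ab = new double[m][p];
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < p; j++) {
            // The n entries across row i of A are multiplied with the
            // n entries down column j of B, and summed.
            double sum = 0;
            for (int k = 0; k < n; k++) {
                sum += a[i][k] * b[k][j];
            }
            ab[i][j] = sum;
        }
    }
    return ab;
}
```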
Now that you are familiar with matrix multiplications, maybe you can see this
coming, but we can definitely express a system of equations like
$$
\begin{cases}
2x + y = 5\\
-x + 2y = 0
\end{cases}
$$
as the following matrix multiplication:
$$
\begin{pmatrix} 2 & 1\\-1 & 2 \end{pmatrix}
.
\begin{pmatrix} x\\y \end{pmatrix}
=
\begin{pmatrix} 5\\0 \end{pmatrix}
$$
If we go a little further, we can see something else based on the matrices
$\begin{pmatrix}x\\y\end{pmatrix}$ and
$\begin{pmatrix}5\\0\end{pmatrix}$.
We can see that they can be used to represent points in the Cartesian
plane, right? A point can be represented by a vector originating from the
origin, and a vector is just a $2 \times 1$ matrix.
What we have here, is a matrix multiplication that represents the
transformation of a point into another point. We don’t know what the first
point’s coordinates are yet, and it doesn’t matter. What I wanted to show is
that, given a position vector, we are able to transform it into another via
a matrix multiplication operation.
Given a point $P$, whose coordinates are represented by the position vector,
$\begin{pmatrix}x\\y\end{pmatrix}$, we can obtain a new point $P^{\prime}$
whose coordinates are represented by the position vector
$\begin{pmatrix}x^{\prime}\\y^{\prime}\end{pmatrix}$ by multiplying it by a
matrix.
One important thing is that this
transformation matrix has to have specific
dimensions, in order to fulfill the rule of matrix multiplication: because
$\begin{pmatrix}x\\y\end{pmatrix}$ is a $2 \times 1$ matrix, and
$\begin{pmatrix}x^{\prime}\\y^{\prime}\end{pmatrix}$ is also a $2 \times 1$
matrix, the transformation matrix $\mathbf{A}$ has to be a $2 \times 2$ matrix in
order to have:
$$
\begin{pmatrix} x^{\prime}\\y^{\prime} \end{pmatrix} =
\mathbf{A}
.
\begin{pmatrix} x\\y \end{pmatrix}
=
\begin{pmatrix} a & b\\c & d \end{pmatrix}
.
\begin{pmatrix} x\\y \end{pmatrix}
$$
Note: The order here is important as we will see later, but you can
already see that switching $\mathbf{A}$ and
$\begin{pmatrix}x\\y\end{pmatrix}$ would lead to an undefined result
(if you don’t get it, re-read the part on matrix multiplication and their
dimensions).
Notice that the nature of the transformation represented by our matrix above
and in the link is not clear, and I didn’t say what kind of transformation it
is, on purpose. The transformation matrix was picked at random, and yet we
can see how interesting and useful it is for 2D manipulation of graphics.
Come back here once you’ve read it, or it’s going to hurt
:sweat_smile:
OK, I suppose you’ve read the course above, but just in case, here
is a reminder:
a position vector $\begin{pmatrix}x\\y\end{pmatrix}$ can be broken
down as
$\begin{pmatrix}x\\y\end{pmatrix} =
x.\begin{pmatrix}\color{Green} 1\\ \color{Green} 0\end{pmatrix}
+
y.\begin{pmatrix}\color{Red} 0\\ \color{Red} 1\end{pmatrix}$.
If you decompose $\begin{pmatrix}x\\y\end{pmatrix}$ into a matrix addition
operation, you find
$\begin{pmatrix}x\\y\end{pmatrix} =
\begin{pmatrix}x\\0\end{pmatrix} + \begin{pmatrix}0\\y\end{pmatrix}$.
And if you decompose a little bit more, you can express each operand of
this addition as the multiplication of a scalar and a matrix:
$$
\begin{pmatrix}x\\0\end{pmatrix} = x.\begin{pmatrix}1\\0\end{pmatrix}
\text{ and }
\begin{pmatrix}0\\y\end{pmatrix} = y.\begin{pmatrix}0\\1\end{pmatrix}
$$
Now look at the matrices $\begin{pmatrix}1\\0\end{pmatrix}$ and
$\begin{pmatrix}0\\1\end{pmatrix}$, they are the Cartesian unit vectors.
So
$\begin{pmatrix} x\\y \end{pmatrix} =
x.\begin{pmatrix}\color{Green} 1\\ \color{Green} 0\end{pmatrix}
+
y.\begin{pmatrix}\color{Red} 0\\ \color{Red} 1\end{pmatrix}$ is just another
way to write that the position vector $\begin{pmatrix}x\\y\end{pmatrix}$
represents a point given by a transformation of the unit vectors of our
Cartesian plane.
$\begin{pmatrix}\color{Green} a\\ \color{Green} c\end{pmatrix}$ and
$\begin{pmatrix}\color{Red} b\\ \color{Red} d\end{pmatrix}$ are the
position vectors where
$\begin{pmatrix} \color{Green} 1\\ \color{Green} 0\end{pmatrix}$ and
$\begin{pmatrix} \color{Red} 0\\ \color{Red} 1\end{pmatrix}$ will land
respectively after the transformation matrix
$\mathbf{A} = \begin{pmatrix} \color{Green} a & \color{Red} b\\ \color{Green} c & \color{Red} d \end{pmatrix}$ has been applied.
Let’s start again from our unit vectors
$\begin{pmatrix} \color{Green} 1\\ \color{Green} 0\end{pmatrix}$
and
$\begin{pmatrix} \color{Red} 0\\ \color{Red} 1\end{pmatrix}$.
We know that
$\begin{pmatrix} x\\y \end{pmatrix} =
x.\begin{pmatrix} \color{Green} 1\\ \color{Green} 0\end{pmatrix}
+
y.\begin{pmatrix} \color{Red} 0\\ \color{Red} 1\end{pmatrix}$, so now
imagine we apply a transformation to our plane.
Our unit vectors will be transformed too, right?
If we assume that
$\begin{pmatrix} \color{Green} 1\\ \color{Green} 0 \end{pmatrix}$
“lands on”
$\begin{pmatrix} \color{Green} a\\ \color{Green} c \end{pmatrix}$
and that
$\begin{pmatrix} \color{Red} 0\\ \color{Red} 1 \end{pmatrix}$
“lands on”
$\begin{pmatrix} \color{Red} b\\ \color{Red} d \end{pmatrix}$,
then we have our position vector $\begin{pmatrix} x\\y \end{pmatrix}$
landing on
$x.\begin{pmatrix} \color{Green} a\\ \color{Green} c \end{pmatrix} +
y.\begin{pmatrix} \color{Red} b\\ \color{Red} d \end{pmatrix} =
\begin{pmatrix}\color{Green} a.x + \color{Red} b.y\\ \color{Green} c.x + \color{Red} d.y \end{pmatrix}$.
Given the previous transformation,
$\begin{pmatrix} x\\ y \end{pmatrix}$ will land on
$\begin{pmatrix} \color{Green} a.x + \color{Red} b.y\\ \color{Green} c.x + \color{Red} d.y \end{pmatrix}$.
If you don’t understand this conclusion, read again, read the course, take
your time.
Now remember, our goal is to determine what $ \mathbf{A} $ is, because we
know the transformation we want to apply but we’re searching for the matrix we
should apply to our position vector(s) in order to transform our graphics.
Let’s take the example of the transformation of a series of points: we know
where the position vectors will land, but we’re looking for $ \mathbf{A} $.
We have our Cartesian plane with a triangle formed by the three points
$P_{(2,1)}$, $Q_{(-2,0)}$, $R_{(0,2)}$, and another triangle which
represents a transformed version of the first one: $P^{\prime}_{(5, 0)}$
and $Q^{\prime}_{(-4, 2)}$ and $R^{\prime}_{(2,4)}$.
Example transformation of a triangle
We just need two points for this example, let’s take $P$ and $Q$. We know that:
$$
\begin{pmatrix} 5\\0 \end{pmatrix} = \mathbf{A} . \begin{pmatrix} 2\\1 \end{pmatrix}
\text{ and }
\begin{pmatrix} -4\\2 \end{pmatrix} = \mathbf{A} . \begin{pmatrix} -2\\0 \end{pmatrix}
$$
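Expanding these two products entry by entry (with $\mathbf{A} = \begin{pmatrix} a & b\\c & d \end{pmatrix}$) gives four small equations, which we can solve starting with $Q$ since its $y$ coordinate is $0$:
$$
\begin{aligned}
-2a &= -4 \Rightarrow a = 2\\
-2c &= 2 \Rightarrow c = -1\\
2a + b &= 5 \Rightarrow b = 1\\
2c + d &= 0 \Rightarrow d = 2
\end{aligned}
$$
So the transformation matrix is $\mathbf{A} = \begin{pmatrix} 2 & 1\\-1 & 2 \end{pmatrix}$.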
Try that same exercise with $P$ and $R$, or with $Q$ and $R$, and you should
end up with the same result.
The identity matrix
We don’t know how to define a transformation matrix yet, but we know its form.
So what do we do next? Remember the last section where we’ve seen that
a position vector $\begin{pmatrix} x\\ y \end{pmatrix}$ can be
broken down as $\begin{pmatrix} x\\y \end{pmatrix} =
x.\begin{pmatrix} \color{Green} 1\\ \color{Green} 0 \end{pmatrix} +
y.\begin{pmatrix} \color{Red} 0\\ \color{Red} 1 \end{pmatrix} $
?
That’s a pretty good starting point, we just laid out our “base” matrix:
$$
\begin{pmatrix} \color{Green} 1 & \color{Red} 0\\ \color{Green} 0 & \color{Red} 1 \end{pmatrix}
$$
This matrix represents the base state of your plane, the matrix applied
to your plane when you have just loaded your image for example (granted
your image is the same size as its receiving container view).
In other words, this is the matrix that, applied to any position vector, will
return that same position vector.
One more thing before we get concrete: We want our user to be able
to combine/chain transformations (like zooming and panning at the same time
for instance).
In order to chain multiple transformations we need to understand the
properties of matrix multiplication, and more
specifically the non-commutative and associative properties of matrix
multiplication:
Matrix multiplication is associative
$\left(\mathbf{A}.\mathbf{B}\right).\mathbf{C} = \mathbf{A}.\left(\mathbf{B}.\mathbf{C}\right)$
I won’t write the proof here because it takes a lot of screen
width (I’ve tried and it didn’t render very well), so if you want to see one, check out this video.
Matrix multiplication is non-commutative
$\mathbf{A}.\mathbf{B} \neq \mathbf{B}.\mathbf{A}$
Grab a pen and paper and try it for yourself with the following matrices
$\mathbf{A}=\begin{pmatrix}1 & 2\\-3 & -4\end{pmatrix}$ and
$\mathbf{B}=\begin{pmatrix}-2 & 0\\0 & -3\end{pmatrix}$.
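If you want to check your results, here is what I get:
$$
\mathbf{A}.\mathbf{B} = \begin{pmatrix} -2 & -6\\6 & 12 \end{pmatrix}
\qquad
\mathbf{B}.\mathbf{A} = \begin{pmatrix} -2 & -4\\9 & 12 \end{pmatrix}
$$
The two products are different, so $\mathbf{A}.\mathbf{B} \neq \mathbf{B}.\mathbf{A}$.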
Back to our transformations.
Imagine we want to apply transformation $ \mathbf{B} $, then transformation
$ \mathbf{A} $ to our position vector $ \vec{v} $.
We have
$
\vec{v^{\prime}} = \mathbf{B} . \vec{v}
$
and
$
\vec{v^{\prime\prime}} = \mathbf{A} . \vec{v^{\prime}}
$,
which leads us to:
$$
\vec{v^{\prime\prime}} = \mathbf{A} . \left( \mathbf{B} . \vec{v} \right) = \left( \mathbf{A} . \mathbf{B} \right) . \vec{v}
$$
In conclusion, in order to apply multiple transformations at once, we can
multiply all our transformation matrices and apply the resulting transformation
matrix to our vector(s).
We also know that matrix multiplication is not commutative, so the order
in which we multiply our transformation matrices
($ \mathbf{A} . \mathbf{B} $ or $ \mathbf{B} . \mathbf{A} $) will have
an impact on our final matrix and will lead to different results, different
transformations.
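Here is a small sketch of my own showing this in plain Java, reusing the multiply helper from the matrix multiplication section (condensed here so the example stands alone). Both paths land the vector on the same point:

```java
import java.util.Arrays;

public class ChainingDemo {
    public static void main(String[] args) {
        double[][] b = {{2, 0}, {0, 2}};   // B: scale by a factor of 2
        double[][] a = {{1, 0}, {0, -1}};  // A: reflexion around the x axis
        double[][] v = {{3}, {1}};         // position vector (3, 1)

        // Apply B first, then A: v'' = A.(B.v)
        double[][] oneByOne = multiply(a, multiply(b, v));
        // Combine the transformations first, then apply once: v'' = (A.B).v
        double[][] combined = multiply(multiply(a, b), v);

        // Both print [[6.0], [-2.0]]: same landing point.
        System.out.println(Arrays.deepToString(oneByOne));
        System.out.println(Arrays.deepToString(combined));
    }

    // Same row-times-column helper as before, condensed.
    static double[][] multiply(double[][] a, double[][] b) {
        double[][] ab = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < b[0].length; j++)
                for (int k = 0; k < a[0].length; k++)
                    ab[i][j] += a[i][k] * b[k][j];
        return ab;
    }
}
```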
Types of transformations
There are several types of 2D transformations we are able to define using
$2 \times 2$ dimensions matrices, and you’ve had a preview of most of them
in this course on matrices as transformations.
Namely:
Scaling
Reflexion
Shearing
Rotation
For the rest of this section imagine we have the point
$ P_{\left(x, y\right)} $, which represents any point of
an object on the plane, and we want to find the matrix to transform it into
$ P^{\prime}_{\left(x^{\prime}, y^{\prime}\right)}$ such that:
$$
\begin{pmatrix} x^{\prime}\\y^{\prime} \end{pmatrix} =
\begin{pmatrix} a & b\\c & d \end{pmatrix}
.
\begin{pmatrix} x\\y \end{pmatrix}
$$
Scaling
Scaling (like zooming in by a factor of 2 for instance) might seem
straightforward to represent, right?
“Simply multiply the coordinates by the scaling factor and you’re done.”
But the pitfall here is that you might want to have different horizontal and
vertical scaling factors for your transformation, I mean it’s possible!
So we must differentiate between $ s_{x} $ and $ s_{y} $ which represent
the horizontal and vertical scaling factors, respectively.
The two equations this gives us are:
$$
\begin{aligned}
x' &= s_{x} . x \\
y' &= s_{y} . y
\end{aligned}
$$
Knowing that:
$$
\begin{pmatrix} x^{\prime}\\y^{\prime} \end{pmatrix} =
\begin{pmatrix} a & b\\c & d \end{pmatrix}
.
\begin{pmatrix} x\\y \end{pmatrix}
$$
We can find $a$, $b$, $c$ and $d$:
$$
\begin{aligned}
s_{x} . x &= a . x + b . y\\\\
\Rightarrow
a &= s_{x} \text{ and }\\
b &= 0
\end{aligned}
$$
$$
\begin{aligned}
s_{y} . y &= c . x + d . y\\\\
\Rightarrow
c &= 0 \text{ and }\\
d &= s_{y}
\end{aligned}
$$
In conclusion, the $2 \times 2$ scaling matrix for the factors
$\left(s_{x}, s_{y}\right)$ is:
$$
\begin{pmatrix} a & b\\c & d \end{pmatrix}
=
\begin{pmatrix} s_{x} & 0\\0 & s_{y} \end{pmatrix}
$$
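As a quick sanity check with numbers of my own: scaling the point $\left(3, 2\right)$ by $s_{x} = 2$ and $s_{y} = \frac{1}{2}$ gives:
$$
\begin{pmatrix} 2 & 0\\0 & \frac{1}{2} \end{pmatrix}
.
\begin{pmatrix} 3\\2 \end{pmatrix}
=
\begin{pmatrix} 6\\1 \end{pmatrix}
$$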
Which makes sense, right? I mean, scaling by a factor of $1$ on both the
$x$ and $y$ axes will give:
$$
\begin{pmatrix} 1 & 0\\0 & 1 \end{pmatrix}
$$
Which is… the identity matrix! So nothing moves, basically.
Reflexion
There are two types of reflexion we can think about right away: reflexion around
an axis, or around a point.
To keep things simple we’ll focus on reflexions around the $x$ and $y$
axes (reflexion around the origin is equivalent to applying reflexions around
the $x$ and $y$ axes successively).
Reflexion around the $x$ axis gives us:
$$
\begin{aligned}
x^{\prime} &= x\\
x &= a . x + b . y\\\\
\Rightarrow
a &= 1 \text{ and }\\
b &= 0
\end{aligned}
$$
$$
\begin{aligned}
y^{\prime} &= -y\\
-y &= c . x + d . y\\\\
\Rightarrow
c &= 0 \text{ and }\\
d &= -1
\end{aligned}
$$
Funny, reflecting around the $x$ axis is the same transformation as scaling
$x$ by a factor of $1$ and $y$ by a factor of $-1$:
$$
\begin{pmatrix} a & b\\c & d \end{pmatrix} =
\begin{pmatrix}
1 & 0\\
0 & -1
\end{pmatrix}
$$
And reflexion around the $y$ axis gives us:
$$
\begin{aligned}
x^{\prime} &= -x\\
-x &= a . x + b . y\\\\
\Rightarrow
a &= -1 \text{ and }\\
b &= 0
\end{aligned}
$$
$$
\begin{aligned}
y^{\prime} &= y\\
y &= c . x + d . y\\\\
\Rightarrow
c &= 0 \text{ and }\\
d &= 1
\end{aligned}
$$
The transformation matrix to reflect around the $y$ axis is:
$$
\begin{pmatrix} a & b\\c & d \end{pmatrix}
=
\begin{pmatrix}
-1 & 0\\
0 & 1
\end{pmatrix}
$$
Shearing
Now it gets a little bit trickier.
In most examples I’ve found, shearing is explained by saying the coordinates
are changed by adding a constant that measures the degree of shearing.
For instance, a shear along the $x$ axis is often illustrated by showing how a
rectangle with a vertex at $\left(0, 1\right)$ is transformed into a
parallelogram with a vertex at $\left(1, 1\right)$.
$\underline{\text{Shearing along } x \text{ axis by a constant } k_{x}=1}$
In this article, I want to explain it using the shearing angle, the angle
through which the axis is sheared. Let’s call it $\alpha$ (alpha).
$\underline{\text{Shearing along } x \text{ axis by an angle } \alpha}$
If we look at the plane above, we can see that the new abscissa $x^{\prime}$ is
equal to $x$ plus/minus the opposite side of the triangle formed
by the $y$ axis, the sheared version of the $y$ axis and the segment
between the top left vertex of the rectangle and the top left vertex of the
parallelogram. In other words, $x^{\prime}$ is equal to $x$ plus/minus the
opposite side of the green triangle, see:
$\underline{\text{Triangles formed by shearing along } x \text{ axis by an angle } \alpha}$
We know $\alpha$, but we don’t know the length of the hypotenuse, so we
can’t use the cosine function.
On the other hand, we know the adjacent side’s length: it’s $y$. So we can use
the tangent function to find the opposite side’s length $k$:
$$
\tan \left( \alpha \right) = \frac{k}{y}
\Rightarrow
k = y . \tan \left( \alpha \right)
$$
We can start solving our system of equations in order to find
our matrix with the following:
$$
x^{\prime} = x + k = x + y . \tan \left( \alpha \right)
$$
$$
y^{\prime} = y
$$
Watch the signs, though: $\tan \left( \alpha \right)$ has the same sign as
$\alpha$, and it is multiplied by $y$, which can itself be positive or
negative, so the sign combination will give very different results
for $x^{\prime} = x + k = x + y . \tan \left( \alpha \right)$.
So don’t forget that $\alpha > 0$ is a counterclockwise shearing angle,
while $\alpha < 0$ is a clockwise one.
$$
\begin{aligned}
x^{\prime} &= x + y . \tan \left( \alpha \right) \\
x + y . \tan \left( \alpha \right) &= a . x + b . y\\\\
\Rightarrow
a &= 1 \text{ and }\\
b &= \tan \left( \alpha \right)
\end{aligned}
$$
$$
\begin{aligned}
y^{\prime} &= y\\
y &= c . x + d . y\\\\
\Rightarrow
c &= 0 \text{ and }\\
d &= 1
\end{aligned}
$$
The transformation matrix to shear along the $x$ direction is:
$$
\begin{aligned}
\begin{pmatrix} a & b\\c & d \end{pmatrix}
=
\begin{pmatrix}
1 & \tan \alpha \\
0 & 1
\end{pmatrix}
=
\begin{pmatrix}
1 & k_{x}\\
0 & 1
\end{pmatrix}\\\\
\text{where } k_{x} \text{ is the shearing constant}
\end{aligned}
$$
Similarly, the transformation matrix to shear along the $y$ direction is:
$$
\begin{aligned}
\begin{pmatrix} a & b\\c & d \end{pmatrix}
=
\begin{pmatrix}
1 & 0\\
\tan \beta & 1
\end{pmatrix}
=
\begin{pmatrix}
1 & 0\\
k_{y} & 1
\end{pmatrix}\\\\
\text{where } k_{y} \text{ is the shearing constant}
\end{aligned}
$$
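As a small teaser for part two: Android’s Matrix class can build this shear matrix for you. A minimal sketch, assuming an Android environment (the $3 \times 3$ layout it prints will make full sense in the next section):

```java
import android.graphics.Matrix;

// setSkew takes the shearing constants directly: k_x and k_y, not the angles.
Matrix matrix = new Matrix();
matrix.setSkew(0.5f, 0f);  // k_x = tan(alpha) = 0.5, no shear along y

// Android matrices are 3x3, stored row by row.
float[] values = new float[9];
matrix.getValues(values);  // {1, 0.5, 0,  0, 1, 0,  0, 0, 1}
```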
Rotation
Rotations are yet a little bit more complex.
Let’s take a closer look with an example: rotating (around the origin)
by an angle $ \theta $ (theta).
$\underline{\text{Rotate by an angle } \theta}$
Notice how the coordinates of $P$ and $P^{\prime}$ are the same in their
respective planes:
$P$ and $P^{\prime}$ have the same set of coordinates $ \left( x, y\right) $
in each plane.
But $P^{\prime}$ has new coordinates
$ \left( x^{\prime}, y^{\prime}\right) $ in the first plane, the one that
has not been rotated.
We can now define the relationship between the
coordinates $ \left(x, y\right) $ and the new coordinates
$ \left(x^{\prime}, y^{\prime}\right) $, right?
While searching for the demonstration of this, I stumbled upon this
geometry based explanation
by Nick Berry and this video.
To be honest, I’m not 100% comfortable with this solution, which means I didn’t
fully understand it. And also after re-reading what I’ve written, Hadrien
(one of the reviewers) and I have found my explanation to be a bit
awkward.
So I’m leaving it here in case you’re interested, but I suggest you
don’t bother unless you’re very curious and don’t mind a little confusion.
$\underline{\text{Unit vectors rotation by } \theta}$
On this plane we see that $x$ (the blue line) can be expressed as the
addition of the adjacent side of the green triangle plus the opposite side
of the red triangle.
And $y$ as the subtraction of the opposite side of the green triangle from
the adjacent side of the red triangle.
We know that:
$$
\begin{aligned}
x &= x^{\prime} . \cos \theta + y^{\prime} . \sin \theta\\
y &= y^{\prime} . \cos \theta - x^{\prime} . \sin \theta
\end{aligned}
$$
Or, written as a matrix multiplication:
$$
\begin{pmatrix} x\\y \end{pmatrix} =
\begin{pmatrix} \cos \theta & \sin \theta\\ -\sin \theta & \cos \theta \end{pmatrix}
.
\begin{pmatrix} x^{\prime}\\y^{\prime} \end{pmatrix}
$$
But this is not exactly what we are looking for, right?
This relationship goes the wrong way: given the new coordinates
$ \left(x^{\prime}, y^{\prime}\right) $ in the original plane, it tells us
the coordinates $ \left(x, y\right) $ in the rotated plane.
Whereas what we want to define is how to convert from the rotated plane
(the coordinates that we know) to the original plane.
In order to do what we want, we need to take the same matrix, but define a
rotation of $ - \theta $. Since $ \cos \left( - \theta \right) = \cos \theta $
and $ \sin \left( - \theta \right) = - \sin \theta $, this gives:
$$
\mathbf{A} =
\begin{pmatrix} \cos \theta & - \sin \theta\\ \sin \theta & \cos \theta \end{pmatrix}
$$
Congratulations! You now know how to define scaling, reflexion, shearing and
rotation transformation matrices. So what is missing?
3x3 transformation matrices
If you’re still with me at this point, maybe you’re wondering why any of this
is useful. If that’s the case, you missed the point of this article, which is to
understand affine transformations in order to apply them in code
:mortar_board:
.
This is useful because at this point you know what a transformation matrix
looks like, and you know how to compute one given a few position vectors,
and it is also a great accomplishment by itself.
But here’s the thing: $2 \times 2$ matrices are limited in the number of
operations we can perform. With a $2 \times 2$ matrix, the only transformations
we can do are the ones we’ve seen in the previous section:
Scaling
Reflexion
Shearing
Rotation
So what are we missing? Answer: translations!
And this is unfortunate, as translations are really useful, like when the user
pans and the image has to behave accordingly (aka follow the finger).
Translations are defined by the addition of two matrices:
$$
\begin{pmatrix} x^{\prime}\\y^{\prime} \end{pmatrix} =
\begin{pmatrix} x\\y \end{pmatrix}
+
\begin{pmatrix} t_{x}\\t_{y} \end{pmatrix}
$$
But we want our user to be able to combine/chain transformations (like
zooming on a specific point which is not the origin), so we need to find a
way to express translations as matrices multiplications too.
The trick that makes this possible is homogeneous coordinates. No, you don’t
have to read the whole theory, and no, I don’t totally get it either…
The gist of it is:
the Cartesian plane you’re used to is really just one of many
planes that exist in 3D space, and it sits at $ z = 1 $
for any point $ \left(x, y, z\right)$ in 3D space, the line
through this point and the origin also passes through every point
obtained by scaling $x$, $y$ and $z$ by the same factor
any point $ \left(x, y, z\right)$ on such a line therefore represents the
Cartesian point $ \left(\frac{x}{z}, \frac{y}{z}\right)$ on our $ z = 1 $
plane
Homogeneous coordinates graphics
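A quick numeric example of my own: the Cartesian point $\left(2, 3\right)$ sits at $\left(2, 3, 1\right)$ on our plane. Scaling by $2$ gives $\left(4, 6, 2\right)$, which is on the same line through the origin, and indeed $\left(\frac{4}{2}, \frac{6}{2}\right) = \left(2, 3\right)$: both triples represent the same 2D point.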
I’ve collected a list of blog posts, articles and video links at the end of
this post if you’re interested.
Without digging in any further, this helps, because it means we
can now represent any point in our Cartesian plane ($ z = 1 $) not only as
a $2 \times 1$ matrix, but also as a $3 \times 1$ matrix:
$$
\begin{pmatrix} x\\y \end{pmatrix}
\rightarrow
\begin{pmatrix} x\\y\\1 \end{pmatrix}
$$
Which means we have to redefine all our previous transformation matrices,
because the product of a $3 \times 1$ matrix (position vector) by a
$2 \times 2$ matrix (transformation) is undefined.
We now have to find the $3 \times 3$ transformation matrix
$
\mathbf{A} =
\begin{pmatrix} a & b & c\\ d & e & f\\ g & h & i \end{pmatrix}
$.
If, like in the previous section, we imagine that we have the point
$ P_{\left(x, y, z\right)} $, which represents any point of
an object on the Cartesian plane, then we want to find the matrix to transform
it into $ P^{\prime}_{\left(x^{\prime}, y^{\prime}, z^{\prime}\right)}$ such that
$$
\begin{pmatrix} x^{\prime}\\y^{\prime}\\z^{\prime} \end{pmatrix} =
\mathbf{A} . \begin{pmatrix} x\\y\\z \end{pmatrix} =
\begin{pmatrix} a & b & c\\d & e & f\\g & h & i\end{pmatrix}
.
\begin{pmatrix} x\\y\\z \end{pmatrix}
$$
Scaling
We are looking for $\mathbf{A}$ such that:
$$
\begin{pmatrix} x^{\prime}\\y^{\prime}\\z^{\prime} \end{pmatrix} =
\begin{pmatrix} s_{x}.x\\s_{y}.y\\z \end{pmatrix} =
\begin{pmatrix} a & b & c\\d & e & f\\g & h & i\end{pmatrix}
.
\begin{pmatrix} x\\y\\z \end{pmatrix}
$$
We can solve the following system of equations in order to find $\mathbf{A}$:
$$
\begin{aligned}
x^{\prime} &= s_{x} . x\\
s_{x} . x &= a . x + b . y + c . z\\\\
\Rightarrow
a &= s_{x} \text{ and }\\
b &= 0 \text{ and }\\
c &= 0
\end{aligned}
$$
$$
\begin{aligned}
y^{\prime} &= s_{y} . y\\
s_{y} . y &= d . x + e . y + f . z\\\\
\Rightarrow
d &= 0 \text{ and }\\
e &= s_{y} \text{ and }\\
f &= 0
\end{aligned}
$$
$$
\begin{aligned}
z^{\prime} &= z\\
z &= g . x + h . y + i . z\\\\
\Rightarrow
g &= 0 \text{ and }\\
h &= 0 \text{ and }\\
i &= 1
\end{aligned}
$$
The $3 \times 3$ scaling matrix for the factors
$ \left(s_{x}, s_{y}\right) $ is:
$$
\begin{pmatrix} a & b & c\\d & e & f\\g & h & i\end{pmatrix}
=
\begin{pmatrix} s_{x} & 0 &0\\0 & s_{y} & 0\\0 & 0 & 1\end{pmatrix}
$$
Reflexion
For a reflexion around the $x$ axis we are looking for $\mathbf{A}$ such
that:
$$
\begin{pmatrix} x^{\prime}\\y^{\prime}\\z^{\prime} \end{pmatrix} =
\begin{pmatrix} x\\-y\\z \end{pmatrix} =
\begin{pmatrix} a & b & c\\d & e & f\\g & h & i\end{pmatrix}
.
\begin{pmatrix} x\\y\\z \end{pmatrix}
$$
We can solve the following system of equations in order to find $\mathbf{A}$:
$$
\begin{aligned}
x^{\prime} &= x\\
x &= a . x + b . y + c . z\\\\
\Rightarrow
a &= 1 \text{ and }\\
b &= 0 \text{ and }\\
c &= 0
\end{aligned}
$$
$$
\begin{aligned}
y^{\prime} &= -y\\
-y &= d . x + e . y + f . z\\\\
\Rightarrow
d &= 0 \text{ and }\\
e &= -1 \text{ and }\\
f &= 0
\end{aligned}
$$
$$
\begin{aligned}
z^{\prime} &= z\\
z &= g . x + h . y + i . z\\\\
\Rightarrow
g &= 0 \text{ and }\\
h &= 0 \text{ and }\\
i &= 1
\end{aligned}
$$
The transformation matrix to reflect around the $x$ axis is:
$$
\begin{pmatrix} a & b & c\\d & e & f\\g & h & i\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0\\
0 & -1 & 0\\
0 & 0 & 1
\end{pmatrix}
$$
For the reflexion around the $y$ axis we are looking for $\mathbf{A}$ such
that:
$$
\begin{pmatrix} x^{\prime}\\y^{\prime}\\z^{\prime} \end{pmatrix} =
\begin{pmatrix} -x\\y\\z \end{pmatrix} =
\begin{pmatrix} a & b & c\\d & e & f\\g & h & i\end{pmatrix}
.
\begin{pmatrix} x\\y\\z \end{pmatrix}
$$
We can solve the following system of equations in order to find $\mathbf{A}$:
$$
\begin{aligned}
x^{\prime} &= -x\\
-x &= a . x + b . y + c . z\\\\
\Rightarrow
a &= -1 \text{ and }\\
b &= 0 \text{ and }\\
c &= 0
\end{aligned}
$$
$$
\begin{aligned}
y^{\prime} &= y\\
y &= d . x + e . y + f . z\\\\
\Rightarrow
d &= 0 \text{ and }\\
e &= 1 \text{ and }\\
f &= 0
\end{aligned}
$$
$$
\begin{aligned}
z^{\prime} &= z\\
z &= g . x + h . y + i . z\\\\
\Rightarrow
g &= 0 \text{ and }\\
h &= 0 \text{ and }\\
i &= 1
\end{aligned}
$$
The transformation matrix to reflect around the $y$ axis is:
$$
\begin{pmatrix} a & b & c\\d & e & f\\g & h & i\end{pmatrix}
=
\begin{pmatrix}
-1 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1
\end{pmatrix}
$$
Shearing
Well, I’m a bit lazy here :hugging:
You see the pattern, right? Third row always the same, third column always the
same.
The transformation matrix to shear along the $x$ direction is:
$$
\begin{aligned}
\begin{pmatrix} a & b & c\\d & e & f\\g & h & i\end{pmatrix}
&=
\begin{pmatrix}
1 & \tan \alpha & 0\\
0 & 1 & 0\\
0 & 0 & 1
\end{pmatrix}\\\\
&=
\begin{pmatrix}
1 & k_{x} & 0\\
0 & 1 & 0\\
0 & 0 & 1
\end{pmatrix}\\\\
& \text{where } k_{x} \text{ is the shearing constant}
\end{aligned}
$$
Similarly, the transformation matrix to shear along the $y$ direction is:
$$
\begin{aligned}
\begin{pmatrix} a & b & c\\d & e & f\\g & h & i\end{pmatrix}
&=
\begin{pmatrix}
1 & 0 & 0\\
\tan \beta & 1 & 0\\
0 & 0 & 1
\end{pmatrix}\\\\
&=
\begin{pmatrix}
1 & 0 & 0\\
k_{y} & 1 & 0\\
0 & 0 & 1
\end{pmatrix}\\\\
& \text{where } k_{y} \text{ is the shearing constant}
\end{aligned}
$$
Rotation
Same pattern, basically we just take the $2 \times 2$ rotation matrix
and add one row and one column whose entries are $0$, $0$ and $1$.
$$
\begin{pmatrix} a & b & c\\d & e & f\\g & h & i\end{pmatrix}
=
\begin{pmatrix}
\cos \theta & -\sin \theta & 0\\
\sin \theta & \cos \theta & 0\\
0 & 0 & 1
\end{pmatrix}
$$
But you can do the math, if you want
:stuck_out_tongue_winking_eye:
Translation
And now it gets interesting, because we can define translations as
$3 \times 3$ matrix multiplications!
We are looking for $\mathbf{A}$ such that:
$$
\begin{pmatrix} x^{\prime}\\y^{\prime}\\z^{\prime} \end{pmatrix} =
\begin{pmatrix} x+t_{x}\\y+t_{y}\\z \end{pmatrix} =
\begin{pmatrix} a & b & c\\d & e & f\\g & h & i\end{pmatrix}
.
\begin{pmatrix} x\\y\\z \end{pmatrix}
$$
We can solve the following system of equations in order to find $\mathbf{A}$:
$$
\begin{aligned}
x^{\prime} &= x + t_{x} \\
x + t_{x} &= a . x + b . y + c . z\\\\
\Rightarrow
a &= 1 \text{ and }\\
b &= 0 \text{ and }\\
c &= t_{x}
\end{aligned}
$$
$$
\begin{aligned}
y^{\prime} &= y + t_{y}\\
y + t_{y} &= d . x + e . y + f . z\\\\
\Rightarrow
d &= 0 \text{ and }\\
e &= 1 \text{ and }\\
f &= t_{y}
\end{aligned}
$$
$$
\begin{aligned}
z^{\prime} &= z\\
z &= g . x + h . y + i . z\\\\
\Rightarrow
g &= 0 \text{ and }\\
h &= 0 \text{ and }\\
i &= 1
\end{aligned}
$$
The $3 \times 3$ translation matrix for the translation
$ \left(t_{x}, t_{y}\right) $ is:
$$
\begin{pmatrix} a & b & c\\d & e & f\\g & h & i\end{pmatrix}
=
\begin{pmatrix} 1 & 0 & t_{x}\\0 & 1 & t_{y}\\0 & 0 & 1\end{pmatrix}
$$
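For example, translating the point $\left(2, 3\right)$ by $\left(t_{x}, t_{y}\right) = \left(5, -1\right)$, using its homogeneous form $\left(2, 3, 1\right)$:
$$
\begin{pmatrix} 1 & 0 & 5\\0 & 1 & -1\\0 & 0 & 1 \end{pmatrix}
.
\begin{pmatrix} 2\\3\\1 \end{pmatrix}
=
\begin{pmatrix} 2 + 5\\3 - 1\\1 \end{pmatrix}
=
\begin{pmatrix} 7\\2\\1 \end{pmatrix}
$$
The point lands on $\left(7, 2\right)$, and the multiplication did the addition for us.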
Matrices wrap-up
Obviously, you won’t have to go through all of this algebra each
time you want to know which matrix to apply in order to do
your transformations. Here they are again, all together:
Scaling: $\begin{pmatrix} s_{x} & 0 & 0\\0 & s_{y} & 0\\0 & 0 & 1 \end{pmatrix}$
Reflexion around $x$ or $y$: $\begin{pmatrix} 1 & 0 & 0\\0 & -1 & 0\\0 & 0 & 1 \end{pmatrix}$ or $\begin{pmatrix} -1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1 \end{pmatrix}$
Shearing along $x$ or $y$: $\begin{pmatrix} 1 & k_{x} & 0\\0 & 1 & 0\\0 & 0 & 1 \end{pmatrix}$ or $\begin{pmatrix} 1 & 0 & 0\\k_{y} & 1 & 0\\0 & 0 & 1 \end{pmatrix}$
Rotation: $\begin{pmatrix} \cos \theta & -\sin \theta & 0\\ \sin \theta & \cos \theta & 0\\0 & 0 & 1 \end{pmatrix}$
Translation: $\begin{pmatrix} 1 & 0 & t_{x}\\0 & 1 & t_{y}\\0 & 0 & 1 \end{pmatrix}$
That’s neat! Now you can define your matrices easily, plus you know how it
works.
One last thing: all the transformations we’ve seen are centered around the
origin.
How do we apply what we know in order to, for instance, zoom on a
specific point which is not the origin, or rotate an object in place,
around its center?
The answer is composition: we must build our transformation by combining
several other transformations.
Combination use-case: pinch-zoom
Imagine you have a shape, like a square for instance, and you want to zoom in
at the center of the square, to mimic a pinch-zoom behaviour
:mag:
This transformation is composed of the following sequence:
move anchor point to origin: $ \left( -t_{x}, -t_{y} \right) $
scale by $ \left( s_{x}, s_{y} \right) $
move back anchor point: $ \left( t_{x}, t_{y} \right) $
Where $\left( t_{x}, t_{y} \right)$ is the anchor point of our scaling
transformation (the center of the square).
Our transformations are defined by the first translation matrix
$ \mathbf{C} $, the scaling matrix $ \mathbf{B} $, and the last
translation matrix $ \mathbf{A} $.
Because matrix multiplication is non-commutative, the order matters, so we will
apply them in reverse order (hence the reverse naming order).
The composition of these transformations gives us the following product:
$$
\mathbf{A}.\mathbf{B}.\mathbf{C} =
\begin{pmatrix} 1 & 0 & t_{x}\\0 & 1 & t_{y}\\0 & 0 & 1 \end{pmatrix}
.
\begin{pmatrix} s_{x} & 0 & 0\\0 & s_{y} & 0\\0 & 0 & 1 \end{pmatrix}
.
\begin{pmatrix} 1 & 0 & -t_{x}\\0 & 1 & -t_{y}\\0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} s_{x} & 0 & t_{x} - s_{x} . t_{x}\\0 & s_{y} & t_{y} - s_{y} . t_{y}\\0 & 0 & 1 \end{pmatrix}
$$
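And this is exactly what the postScale(sx, sy, px, py) method from the beginning of this article builds for you in one call. A minimal sketch, assuming an Android environment and an arbitrary anchor point of my own:

```java
import android.graphics.Matrix;

float sx = 2f, sy = 2f;      // scaling factors
float px = 100f, py = 50f;   // anchor (pivot) point

// By hand: post* methods append each step, applied in the order written.
Matrix manual = new Matrix();
manual.postTranslate(-px, -py);  // C: move anchor point to origin
manual.postScale(sx, sy);        // B: scale around the origin
manual.postTranslate(px, py);    // A: move the anchor point back

// In one call: builds the same composed matrix A.B.C.
Matrix pivot = new Matrix();
pivot.postScale(sx, sy, px, py);

// manual and pivot now hold the same values (up to float precision).
```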
Now imagine you have an image in a view; the origin is not at the center of the
view, it is probably at the top-left corner (implementations may vary),
but you want to rotate the image around the center of the view
:upside_down:
This transformation is composed of the following sequence:
move anchor point to origin: $ \left( -t_{x}, -t_{y} \right) $
rotate by $ \theta $
move back anchor point: $ \left( t_{x}, t_{y} \right) $
Where $\left( t_{x}, t_{y} \right)$ is the anchor point of our rotation transformation.
Our transformations are defined by the first translation matrix
$ \mathbf{C} $, the rotation matrix $ \mathbf{B} $, and the last
translation matrix $ \mathbf{A} $.
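The expanded product (my own working, following the same steps as the pinch-zoom) is:
$$
\mathbf{A}.\mathbf{B}.\mathbf{C} =
\begin{pmatrix} 1 & 0 & t_{x}\\0 & 1 & t_{y}\\0 & 0 & 1 \end{pmatrix}
.
\begin{pmatrix} \cos \theta & -\sin \theta & 0\\ \sin \theta & \cos \theta & 0\\0 & 0 & 1 \end{pmatrix}
.
\begin{pmatrix} 1 & 0 & -t_{x}\\0 & 1 & -t_{y}\\0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} \cos \theta & -\sin \theta & t_{x} - t_{x} . \cos \theta + t_{y} . \sin \theta\\ \sin \theta & \cos \theta & t_{y} - t_{x} . \sin \theta - t_{y} . \cos \theta\\0 & 0 & 1 \end{pmatrix}
$$
And, as you might guess by now, Android’s Matrix has a one-call shortcut for this one too: postRotate(degrees, px, py).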
I want to extend my warmest thanks to the following people, who helped me
during the review process of this article by providing helpful feedback and
advice: