A matrix of Jordan form (also Jordan normal form, Jordan canonical form and classical canonical form) is a block matrix containing Jordan blocks along the diagonal and zeros elsewhere. For example:

|3 1 0 0 0 0 0|
|0 3 1 0 0 0 0|
|0 0 3 1 0 0 0|
|0 0 0 3 0 0 0|
|0 0 0 0 5 0 0|
|0 0 0 0 0 2 1|
|0 0 0 0 0 0 2|

Every square matrix (over an algebraically closed field, such as the complex numbers) is similar to a matrix in Jordan form. The process of finding the matrix in Jordan form that is similar to a given matrix is called Jordan matrix decomposition. Such matrices are useful for computational purposes. If you wanted to find the nth power of a matrix A, then you could compute it easily by first finding the matrices S and J such that:

A = SJS⁻¹
Aⁿ = (SJS⁻¹)ⁿ = SJS⁻¹SJS⁻¹SJS⁻¹...SJS⁻¹ = SJIJIJ...JS⁻¹ = SJⁿS⁻¹

Jⁿ can be found by computing the nth power of each of the individual Jordan blocks. The ability to easily compute large powers of matrices is useful for computing the long-term behavior of a system of equations.
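
For instance, here is a rough sketch in Python (assuming the sympy library is available; its jordan_form method returns the matrices called S and J above, with A = SJS⁻¹):

  # Sketch: compute A^n via the Jordan decomposition (requires sympy).
  from sympy import Matrix

  A = Matrix([[ 3, 1],
              [-1, 1]])       # not diagonalisable: single eigenvalue 2
  S, J = A.jordan_form()      # A == S * J * S**-1
  n = 10

  # Powers of J are easy to take; powers of A follow by conjugation.
  assert S * J**n * S.inv() == A**n
  print(J)                    # Matrix([[2, 1], [0, 2]])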


Sources:
http://en.wikipedia.org/wiki/Jordan_normal_form
Eric W. Weisstein et al. "Jordan Canonical Form." From MathWorld--A Wolfram Web Resource.
http://mathworld.wolfram.com/JordanCanonicalForm.html

Jordan forms — grow your own!

I remember how infuriating it was to read about the Jordan form when I was first learning linear algebra. Nobody ever wrote down what the algorithm was! They all blabbed on about generalised eigenvectors, generalised eigenspaces, minimal polynomial, algebraic multiplicity, and geometric multiplicity, and kept beating around the bush with all the abstraction without ever writing down a complete and concise procedure on how to put a matrix in Jordan form. Eventually, after reading at least three linear algebra textbooks (that actually went deep enough into the subject to talk about the Jordan form) I managed to decipher what the algorithm should be. I knew that it should be almost the same as the diagonalisation algorithm for a matrix, but with a little twist and a few more steps. It took a little effort. Eventually, I got it.

My purpose here is to save you the deciphering effort and to just tell you what the damn algorithm is. Plain language, or as plain as linear algebra ever gets. Warning: the algorithm that I present is the stupid algorithm. It's only meant to be used by hard-working adult humans and is not particularly recommended for computers or small children. I'm sure there are smarter numerical methods for computing Jordan forms if this is the sort of thing you need to do for a living (although matrices that cannot be diagonalised and require Jordan forms instead are relatively rare in Nature). In fact, one such numerical method can be found here:

http://portal.acm.org/citation.cfm?id=355912

The stupid algorithm I will be describing instead plays to this tune:

  1. Compute the characteristic polynomial.
    1. Subtract a variable down the diagonal of your matrix.
    2. Compute the determinant of the resulting matrix.
    3. Simplify the resulting polynomial if necessary. This is the characteristic polynomial.
  2. Factor the characteristic polynomial.
    1. A difficult problem in general. Consult polynomial factoring algorithms.
    2. The roots of the characteristic polynomial are the eigenvalues of the matrix.
  3. Find the eigenvectors for each eigenvalue.
    1. Subtract each of the eigenvalues from the previous step, one at a time, down the diagonal of the matrix.
    2. Row-reduce the resulting matrices.
    3. Interpret the row-reduced matrices as homogeneous systems of linear equations where all the constants on the right are zero, and write down as many linearly independent solutions as possible. These are the eigenvectors.
  4. Find generalised eigenvectors.
    1. If there aren't as many eigenvectors as the size of the matrix, proceed with this step.
    2. Again subtract each eigenvalue down the diagonal of the matrix, and augment the resulting matrix on the right with the corresponding eigenvector. Row-reduce again.
    3. Interpret the row-reduced matrix as a system of inhomogeneous linear equations and solve. If there is more than one free variable, cut down the degrees of freedom by specifying that the solution must be orthogonal to one or more eigenvectors belonging to this eigenvalue (use enough orthogonality conditions until there is only one free variable.) A solution to this problem is a generalised eigenvector.
    4. Repeat the previous sub-step if necessary, augmenting the same matrix instead with the solution just found. Stop when there are as many solution vectors as the size of the matrix, or if no solution to the inhomogeneous system exists. In the latter case, pick another eigenvector and find its generalised eigenvectors.
  5. Form the transition matrix and the Jordan form.
    1. Put each eigenvector followed by its generalised eigenvectors in the order they were found as column vectors in a matrix. This is the transition matrix.
    2. Conjugate the original matrix by the transition matrix (meaning multiply the original matrix on the left by the inverse of the transition matrix and on the right by the transition matrix). This is the Jordan form, which is simply a diagonalisation in case Step 4 was superfluous. (A code sketch of how these steps map onto computer-algebra calls follows this list.)
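
If you happen to have Python and the sympy library lying around, here is a rough sketch (the variable names are mine) of how the steps above map onto computer-algebra calls; sympy will even do the whole Jordanian dance for you in a single jordan_form call:

  # Sketch of the steps above as sympy calls (requires sympy).
  from sympy import Matrix, symbols, eye, factor, roots

  lam = symbols('lambda')
  M = Matrix([[-2, -2, 2],
              [-2,  1, 1],
              [-9, -7, 7]])              # this is matrix B from the examples below
  n = M.shape[0]

  # Step 1: characteristic polynomial det(M - lambda*I).
  p = (M - lam * eye(n)).det()

  # Step 2: factor it; its roots are the eigenvalues (here: 2, three times).
  print(factor(p), roots(p, lam))

  # Step 3: eigenvectors span the null space of (M - eigenvalue*I).
  for ev in roots(p, lam):
      print(ev, (M - ev * eye(n)).nullspace())

  # Steps 4 and 5: generalised eigenvectors, transition matrix, Jordan form.
  P, J = M.jordan_form()                 # M == P * J * P**-1
  print(P, J, sep='\n')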

I am going to demonstrate this algorithm below. We will be working throughout with the following three nontrivial (yikes!) examples of sizes 4 × 4, 3 × 3, and 2 × 2 because we are hard-core:

       A                  B                  C

[ -1 -3  3  0 ]     [ -2 -2  2 ]          [ -1 -6 ]
[ -2 -1  0 -3 ],    [ -2  1  1 ],  and    [  2  6 ].
[ -2  0 -1 -3 ]     [ -9 -7  7 ]
[  0  2 -2 -1 ]

Let's get to work. *cracks knuckles*

Step Número Uno: Compute the characteristic polynomial

An ominous melody slowly builds up.

To compute the characteristic polynomial, take your matrix, subtract a variable down the diagonal (traditionally λ or x; I'll use λ) and compute the determinant of the resulting matrix. In this case we are computing the determinants of

                               

              [ -1 - λ     -3        3        0   ]  
     A - λI = [   -2     -1 - λ      0       -3   ]  
              [   -2        0     -1 - λ     -3   ],    
              [    0        2       -2     -1 - λ ]

 


              [ -2 - λ    -2       2    ]                        [  -1 - λ    -6    ]
     B - λI = [   -2     1 - λ     1    ],     and      C - λI = [     2     6 - λ  ].
              [   -9      -7     7 - λ  ] 

"Determinant of a 4 × 4?" I hear you whine, maggot? Oh, boo-hoo! Real Mathematicians™ do this three times before breakfast in the morning. As I once remarked to a colleague, "I am not a pussy" (and you should say it too before going on), so we shall proceed with the computations. Now get down and give me 4!

For matrix A we begin with a Laplace expansion along the first row to obtain

                      |-1 - λ      0      -3  |     |-2     0      -3  |   |-2  -1 - λ    -3  |
det(A - λI) = (-1 - λ)|   0     -1 - λ    -3  |  + 3|-2  -1 - λ    -3  | +3|-2     0      -3  |.
                      |   2       -2    -1 - λ|     | 0    -2    -1 - λ|   | 0     2    -1 - λ|

Now we compute the three minor 3 × 3 determinants that arose.

det(A - λI) = (-1 - λ)[(-1 - λ)³ - 6(-1 - λ) - 6(1 + λ)] + 3[-2(-1 - λ)² - 12 + 12] + 3[12 + 2(1 + λ)² - 12]

Most of this cancels to yield, in a pretty factorised form,

det(A - λI) = (-1 - λ)⁴.

With similar efforts but slightly fewer computations, we find that the other two characteristic polynomials are

det(B - λI) = 8 - 12λ + 6λ² - λ³,
det(C - λI) = 6 - 5λ + λ².
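
If you would rather let a machine grind out those determinants, or simply check mine, here is a quick sympy sketch (assuming the library is available):

  # Quick check of the three characteristic polynomials (requires sympy).
  from sympy import Matrix, symbols, eye, expand

  lam = symbols('lambda')
  A = Matrix([[-1, -3,  3,  0],
              [-2, -1,  0, -3],
              [-2,  0, -1, -3],
              [ 0,  2, -2, -1]])
  B = Matrix([[-2, -2, 2], [-2, 1, 1], [-9, -7, 7]])
  C = Matrix([[-1, -6], [2, 6]])

  for name, M in (('A', A), ('B', B), ('C', C)):
      print(name, expand((M - lam * eye(M.shape[0])).det()))
  # A yields the expansion of (-1 - lambda)**4,
  # B yields 8 - 12*lambda + 6*lambda**2 - lambda**3,
  # C yields 6 - 5*lambda + lambda**2, matching the hand computations.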

Le Step Deux: Factor the characteristic polynomial

A lively French interlude plays.

That is, take the characteristic polynomial and write it as a product of linear factors (which can be done in this case by the fundamental theorem of algebra, right?) so that we may find its roots. The roots of the characteristic polynomial are the eigenvalues of the corresponding matrix. We need these eigenvalues before we can proceed with the Jordan form.

Factoring polynomials is in itself a difficult problem in general (and actually one of the more stupid ways to find eigenvalues). In this case, however, it is actually rather easy. The characteristic polynomial for the A matrix came pre-factored during its computation, and the other two aren't hard to factor. This is because I deliberately chose computationally easy examples to work with so as to highlight the steps in the algorithm and not get bogged down with too many computations. It is not hard to see that

det(A - λI) = (-1 - &lambda)4,
det(B - λI) = (2 - λ)3,
det(C - λI) = (2 - λ)(3 - λ),

so that the eigenvalues are -1 and 2 for matrices A and B respectively, while matrix C has the two eigenvalues 2 and 3.

Step The Third: Find the eigenvectors for each eigenvalue

A recognisable and familiar hymn leads us on.

This means, take each eigenvalue, subtract it down the diagonal of the corresponding matrix, and row reduce the resulting matrix using the Gaussian algorithm. Then interpret this row reduced matrix as a homogeneous linear system and find as many linearly independent solution vectors as possible. These vectors are the eigenvectors corresponding to the eigenvalue you subtracted down the diagonal.


For matrix A this is done thusly:


          [  0 -3  3  0 ]                [ 0  -3  3   0]
          [ -2  0  0 -3 ]    R3-R2       [-2   0  0  -3]
  A + I = [ -2  0  0 -3 ]  -------->     [ 0   0  0   0]
          [  0  2 -2  0 ]  R4+(2/3)R1    [ 0   0  0   0]

  -R1     [ 2  0  0  3 ]             [ 2  0  0  3]
------->  [ 0  3 -3  0 ]  -------->  [ 0  1 -1  0]
  -R2     [ 0  0  0  0 ]  (1/3)R2    [ 0  0  0  0]
R1<->R2   [ 0  0  0  0 ]             [ 0  0  0  0]

Ok, so I did several row operations in one step, and I didn't do them in the strict order that the Gaussian row-reduction algorithm requires. Also, because I like integers, I didn't divide the first row by its leading term in order to get a leading one. These are merely cosmetic enhancements for human eyes. It should nevertheless be clear that except for dividing each row by its leading term, I have already put the matrix into reduced row-echelon form, and that we are not going to be getting any more rows of zeroes.

Now we reinterpret the row reduced matrix as a homogeneous system of linear equations where all the constant terms on the right are zero (we could have done this from the beginning), and see that there should be two linearly independent solutions and that

      u1       u2
     [ 3 ]  [  0 ] 
     [ 0 ]  [  1 ]
     [ 0 ]  [  1 ]
     [-2 ]  [  0 ]

are two such solutions (that is, matrix A + I sends those two vectors into zero when it multiplies them from the left).


In a similar manner, for matrix B we subtract 2 down the diagonal to obtain

            [ -4  -2   2 ]
   B - 2I = [ -2  -1   1 ]
            [ -9  -7   5 ]

and row reduce it to the form

   [  5   0  -2 ]
   [  0   5  -1 ]
   [  0   0   0 ]

where I have again opted for the cosmetic enhancement to not divide each row by its leading term. This homogeneous system only has one linearly independent solution, such as

     v1
   [  2 ]
   [  1 ]
   [  5 ]

For matrix C, we first subtract eigenvalues 2 and 3 down the diagonal to obtain

         [ -3  -6 ]             [ -4  -6]
C - 2I = [  2   4 ]    C - 3I = [  2   3],

which row-reduce to

      [ 1  2 ]    [ 2  3 ]
      [ 0  0 ]    [ 0  0 ]

yielding the eigenvectors

     w1      w2
    [ 2 ]  [  3 ]
    [-1 ]  [ -2 ].
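
If all this row reduction is wearing you out, sympy's nullspace method will cough up the same eigenvectors up to scaling. A sketch, with the three matrices as defined above:

  # Eigenvectors as null spaces of (M - eigenvalue*I) (requires sympy).
  from sympy import Matrix, eye

  A = Matrix([[-1, -3,  3,  0],
              [-2, -1,  0, -3],
              [-2,  0, -1, -3],
              [ 0,  2, -2, -1]])
  B = Matrix([[-2, -2, 2], [-2, 1, 1], [-9, -7, 7]])
  C = Matrix([[-1, -6], [2, 6]])

  print((A + eye(4)).nullspace())      # two vectors spanning the same plane as u1, u2
  print((B - 2 * eye(3)).nullspace())  # one vector, a multiple of v1 = (2, 1, 5)
  print((C - 2 * eye(2)).nullspace())  # a multiple of w1 = (2, -1)
  print((C - 3 * eye(2)).nullspace())  # a multiple of w2 = (3, -2)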

Jordanian Step: Find generalised eigenvectors

An apocryphal extended piece fills the room.

For this step, if we still don't have enough eigenvectors (as many eigenvectors as the size of the matrix), then for each eigenvector vk we solve the inhomogeneous linear systems (M - λI)vk+1 = vk successively in order; that is, each new generalised eigenvector vk+1 solves the system whose right-hand side is the vector vk found just before it. Yes, more Gaussian algorithm, I'm afraid. If the system has a solution, the solution set should be one-dimensional, meaning one free variable after we row-reduce, or one leading 1 fewer than the size of the matrix. It might have more free variables, which will be the case only if there is more than one Jordan block with the same eigenvalue. In that case, we impose the further restriction that the solution vk+1 must be orthogonal to one or more of the eigenvectors belonging to this eigenvalue.

The system might also have no solution, which means that this particular eigenvector doesn't have any more generalised eigenvectors. Basically, what happens is that each eigenvector will have a chain of generalised eigenvectors, and all the chains taken together (eigenvectors included) will give us as many vectors as the size of the matrix. If a particular chain has stopped yielding generalised eigenvectors, it is time to look at another eigenvector and see how many generalised eigenvectors it yields. We will do all of this with our ongoing examples.

All of this step and its various substeps may be superfluous if the matrix is already diagonalisable (has as many eigenvectors as its size).
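
In code, each link of a chain costs one more inhomogeneous solve. Here is a sketch (sympy again; the names and the choice of setting free parameters to zero are mine) that grows the single chain that matrix B will turn out to have below; the orthogonality trick needed when an eigenvalue owns several chains is left out:

  # Grow a chain of generalised eigenvectors by solving
  # (M - eigenvalue*I) * v_next = v_current repeatedly (requires sympy).
  from sympy import Matrix, eye, linsolve, symbols, S

  B = Matrix([[-2, -2, 2], [-2, 1, 1], [-9, -7, 7]])
  N = B - 2 * eye(3)                  # the only eigenvalue of B is 2
  x = symbols('x1 x2 x3')

  chain = [N.nullspace()[0]]          # start the chain at the lone eigenvector
  while len(chain) < 3:               # stop once we have as many vectors as the size
      sols = linsolve((N, chain[-1]), *x)
      if sols is S.EmptySet:          # inconsistent system: this chain has ended
          break
      particular = next(iter(sols))   # may still contain free parameters
      v = Matrix([e.subs({s: 0 for s in x}) for e in particular])
      chain.append(v)

  print(chain)                        # an eigenvector and two generalised eigenvectors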


We begin with matrix A. We have found in the previous step two eigenvectors, dubbed u1 and u2. Let us begin by looking for generalised eigenvectors for u1.

This amounts to solving the following inhomogeneous linear system. Observe that I augmented the matrix A + I on the right by the eigenvector u1 and added another row which says that the solution has to be orthogonal to the eigenvector u1 (I could have also chosen the solution to be orthogonal to u2 instead.)

      [  0 -3  3  0 |  3 ]
      [ -2  0  0 -3 |  0 ]
      [ -2  0  0 -3 |  0 ]
      [  0  2 -2  0 | -2 ]
      [  3  0  0 -2 |  0 ]

After computations that look awfully familiar, this row-reduces to

      [  1  0  0  0 |  0 ]
      [  0  1 -1  0 | -1 ]
      [  0  0  0  1 |  0 ] .
      [  0  0  0  0 |  0 ]
      [  0  0  0  0 |  0 ]

Observe that there is only one free variable, the third one from left to right. From this we can read that

      u3
    [ 0 ]
    [ 1 ]
    [ 2 ]
    [ 0 ]

is a generalised eigenvector.

We keep looking for generalised eigenvectors, this time augmenting the matrix with u3. But the resulting matrix

      [  0 -3  3  0 |  0 ]
      [ -2  0  0 -3 |  1 ]
      [ -2  0  0 -3 |  2 ]
      [  0  2 -2  0 |  0 ]
      [  3  0  0 -2 |  0 ]

row reduces to

      [  1  0  0  0 |  0 ]
      [  0  1 -1  0 |  0 ]
      [  0  0  0  1 |  0 ] ,
      [  0  0  0  0 |  1 ]
      [  0  0  0  0 |  0 ]

which is an inconsistent set of equations with no solution (the fourth row says that 0 = 1). Therefore, u1 only has the generalised eigenvector u3 and it follows that the last generalised eigenvector must come from u2.

We augment the same matrix with u2, and for variety's sake, we now require the solution to be orthogonal to u2 instead of u1:

      [  0 -3  3  0 |  0 ]
      [ -2  0  0 -3 |  1 ]
      [ -2  0  0 -3 |  1 ].
      [  0  2 -2  0 |  0 ]
      [  0  1  1  0 |  0 ]

The row-reduced matrix this time is

      [  2  0  0  3 | -1 ]
      [  0  1  0  0 |  0 ]
      [  0  0  1  0 |  0 ] ,
      [  0  0  0  0 |  0 ]
      [  0  0  0  0 |  0 ]

yielding the solution

     u4
   [  1 ]
   [  0 ]
   [  0 ].
   [ -1 ]

Since that already gives us four vectors in total (two eigenvectors and two generalised eigenvectors), there is no need to check whether u4 will give us another generalised eigenvector, because we know it won't. This concludes step 4 for matrix A.
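
Before moving on, a quick sanity check (sympy) that the four vectors do what I claimed: A + I kills the two eigenvectors and maps each generalised eigenvector onto the eigenvector it belongs to.

  # Sanity check for matrix A (requires sympy; A as defined earlier).
  from sympy import Matrix, eye, zeros

  A = Matrix([[-1, -3,  3,  0],
              [-2, -1,  0, -3],
              [-2,  0, -1, -3],
              [ 0,  2, -2, -1]])
  u1 = Matrix([3, 0, 0, -2]); u2 = Matrix([0, 1, 1, 0])
  u3 = Matrix([0, 1, 2, 0]);  u4 = Matrix([1, 0, 0, -1])

  N = A + eye(4)
  assert N * u1 == zeros(4, 1) and N * u2 == zeros(4, 1)   # eigenvectors
  assert N * u3 == u1 and N * u4 == u2                     # generalised eigenvectors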


Matrix B only has one eigenvector, so we know that there must be two generalised eigenvectors for this one eigenvector. Also, because there is only one eigenvector for this eigenvalue, there is no need to impose orthogonality conditions.

Proceeding as before, we augment the matrix B - 2I by the vector v1 on the right (look back to Step the Third), yielding

  [ -4  -2   2  | 2 ]
  [ -2  -1   1  | 1 ],
  [ -9  -7   5  | 5 ]

and the corresponding row reduction is

  [ 5  0  -2 | -2 ]
  [ 0  5  -1 | -1 ].
  [ 0  0   0 |  0 ]

A possible solution is

     v2
    [ 2 ]
    [ 1 ] ,
    [ 6 ]

which we use again to augment the matrix B - 2I like so:

  [ -4  -2   2  | 2 ]
  [ -2  -1   1  | 1 ],
  [ -9  -7   5  | 6 ]

reducing to

  [ 5  0  -2 | -1 ]
  [ 0  5  -1 | -3 ]
  [ 0  0   0 |  0 ]

and suggesting the solution

         v3
       [ 3 ] 
       [ 1 ] .
       [ 8 ]

Since that gives us three vectors in total (one eigenvector and two generalised eigenvectors) for the 3 × 3 matrix B, we are done with the Jordanian step for matrix B.
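
The analogous check for B (sympy again): applying B - 2I walks down the chain, sending v3 to v2, v2 to v1, and v1 to zero.

  # Sanity check for matrix B's chain (requires sympy).
  from sympy import Matrix, eye, zeros

  B = Matrix([[-2, -2, 2], [-2, 1, 1], [-9, -7, 7]])
  v1 = Matrix([2, 1, 5]); v2 = Matrix([2, 1, 6]); v3 = Matrix([3, 1, 8])

  N = B - 2 * eye(3)
  assert N * v3 == v2 and N * v2 == v1 and N * v1 == zeros(3, 1)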


Since matrix C is of size two and has two ordinary eigenvectors, there is no need to perform this step on this matrix.

Grand Finale: Form transition matrix and the Jordan form

Cue dramatic music!

The hard work is done. How much of that did you follow? We can enjoy our dessert now, which simply consists of putting the (generalised) eigenvectors we found above into a pretty matrix as columns, taking care to put the generalised eigenvectors in the order that we found them following their ordinary eigenvector. This is the transition matrix. Then we conjugate the original matrix by this transition matrix (multiply the original matrix by the transition matrix on the right and by the inverse of the transition matrix on the left), and this is the Jordan form we sought.


The transition matrix for A is

 
        u1 u3 u2 u4
      [ 3  0  0  1 ]
  U = [ 0  1  1  0 ]
      [ 0  2  1  0 ]
      [-2  0  0 -1 ]

where u1 is followed by u3 and u2 by u4 because u3 is a generalised eigenvector belonging to u1 and u4 belongs to u2. The Jordan form is therefore

JA = U⁻¹AU

or

           [ -1  1  0  0 ]
           [  0 -1  0  0 ]
    JA  =  [  0  0 -1  1 ].
           [  0  0  0 -1 ]

Observe that there are two Jordan blocks, one for each eigenvector, and the lone eigenvalue -1 runs down the diagonal.


For B we find the transition matrix to be

          v1 v2 v3
        [ 2  2  3 ] 
    V = [ 1  1  1 ]
        [ 5  6  8 ]

where again observe that we put the two generalised eigenvectors following their ordinary eigenvector in the order that we found them. The Jordan form is JB = V⁻¹BV, or

         [ 2 1 0 ]
    JB = [ 0 2 1 ] .
         [ 0 0 2 ]

One Jordan block because B only had one eigenvector.


The last transition matrix is

        w1 w2
      [ 2  3 ]
  W = [-1 -2 ],

and the Jordan form JC = W⁻¹CW is

          [ 2  0 ] 
     JC = [ 0  3 ].

Two trivial Jordan blocks because there were two eigenvectors, and the matrix is in fact diagonal because we didn't have to worry about generalised eigenvectors for this one.
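
And if you don't feel like inverting transition matrices by hand either, here is a final sympy check that conjugation really does produce the three Jordan forms claimed above:

  # Final check: conjugate each matrix by its transition matrix (requires sympy).
  from sympy import Matrix

  A = Matrix([[-1, -3,  3,  0],
              [-2, -1,  0, -3],
              [-2,  0, -1, -3],
              [ 0,  2, -2, -1]])
  B = Matrix([[-2, -2, 2], [-2, 1, 1], [-9, -7, 7]])
  C = Matrix([[-1, -6], [2, 6]])

  U = Matrix([[3, 0, 0, 1], [0, 1, 1, 0], [0, 2, 1, 0], [-2, 0, 0, -1]])
  V = Matrix([[2, 2, 3], [1, 1, 1], [5, 6, 8]])
  W = Matrix([[2, 3], [-1, -2]])

  print(U.inv() * A * U)   # two 2x2 Jordan blocks with eigenvalue -1
  print(V.inv() * B * V)   # one 3x3 Jordan block with eigenvalue 2
  print(W.inv() * C * W)   # diagonal: eigenvalues 2 and 3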

Epilogue: Who cares about all this matrix crap?

Actually, not many people, to be frank. Numerical analysts have little use for the algorithm that I presented above, because as I said, they have better methods. In fact, the Jordan form is a bit of an artificial theoretical gadget, because practically all matrices can be diagonalised and don't need the Jordanian step above of finding generalised eigenvectors. Nothing wrong with theoretical gadgets, except that fewer loons are interested in them.

The Jordan form, although useful, is of more limited scope than the ordinary diagonalised form that most matrices have. If there is indeed a naturally occurring matrix arising from a practical problem out there that needs a Jordan form (for example, some difference equations, discrete cousins of differential equations, may give rise to matrices that are not diagonalisable), then there is something very special and particular about this situation that merits very careful analysis.

So why did we go through all the above work with three different matrices, if it's all mostly useless? Because people have asked me how to compute Jordan forms by hand, and this writeup is my answer. Because theory is pretty. And because we can, and therefore we should.
