5.3 Example using conjugate directions

  5.3.1 First method, Direct calculus
  5.3.2 Second method, basic Conjugate direction
  5.3.3 Third method, Conjugate gradient
  5.3.4 Fourth method, Conjugate gradient using Fletcher-Reeves
  5.3.5 Fifth method, Conjugate gradient using Polak-Ribière

This example is solved in a number of ways. Consider the quadratic function \(J\left ( x_{1},x_{2}\right ) =\frac{1}{2}x^{T}Ax+b^{T}x\) where \(x=\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix} \), with \(A=\begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix} \) and \(b=\begin{bmatrix} -1\\ 1 \end{bmatrix} \). The goal is to find \(x^{\ast }\), the point which minimizes \(J\left ( x\right ) \).

5.3.1 First method, Direct calculus

\begin{align*} \nabla J\left ( x\right ) & =0\\ Ax+b & =\begin{bmatrix} 0\\ 0 \end{bmatrix} \\\begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix} +\begin{bmatrix} -1\\ 1 \end{bmatrix} & =\begin{bmatrix} 0\\ 0 \end{bmatrix} \\\begin{bmatrix} 4x_{1}+2x_{2}-1\\ 2x_{1}+2x_{2}+1 \end{bmatrix} & =\begin{bmatrix} 0\\ 0 \end{bmatrix} \end{align*}

Solving gives \[ x^{\ast }=\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix} =\begin{bmatrix} 1\\ -\frac{3}{2}\end{bmatrix} \]
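As a quick numerical check, the stationarity condition \(Ax+b=0\) can be solved directly as a linear system (a short NumPy sketch; the variable names are mine):

```python
import numpy as np

# The quadratic J(x) = 1/2 x^T A x + b^T x from the example
A = np.array([[4.0, 2.0],
              [2.0, 2.0]])
b = np.array([-1.0, 1.0])

# grad J = A x + b = 0  =>  solve the linear system A x = -b
x_star = np.linalg.solve(A, -b)   # x* = [1, -3/2]
```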

5.3.2 Second method, basic Conjugate direction

Since \(A\) is of size \(n=2\), this method will converge in \(2\) steps using conjugate directions. Let \(x^{0}=\begin{bmatrix} 0\\ 0 \end{bmatrix} \) and let the first direction be \[ v^{0}=\begin{bmatrix} 1\\ 0 \end{bmatrix} \] Then\[ h_{0}=\frac{-\left ( v^{0}\right ) ^{T}\nabla J\left ( x^{0}\right ) }{\left ( v^{0}\right ) ^{T}Av^{0}}=\frac{-\left ( v^{0}\right ) ^{T}\left ( Ax^{0}+b\right ) }{\left ( v^{0}\right ) ^{T}Av^{0}}=\frac{-\begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} -1\\ 1 \end{bmatrix} }{\begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1\\ 0 \end{bmatrix} }=\frac{1}{4}\] Hence\begin{align*} x^{1} & =x^{0}+h_{0}v^{0}\\ & =\begin{bmatrix} 0\\ 0 \end{bmatrix} +\frac{1}{4}\begin{bmatrix} 1\\ 0 \end{bmatrix} =\begin{bmatrix} \frac{1}{4}\\ 0 \end{bmatrix} \end{align*}

Second step. We need to find \(v^{1}\). Using the mutual conjugacy property with respect to \(A\), we solve for \(v^{1}\) using\begin{align*} \left ( v^{0}\right ) ^{T}Av^{1} & =0\\\begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} v_{1}\\ v_{2}\end{bmatrix} & =0\\ 4v_{1}+2v_{2} & =0 \end{align*}

Let \(v_{1}=1\) then \(v_{2}=-2\) and hence \[ v^{1}=\begin{bmatrix} 1\\ -2 \end{bmatrix} \] Now we find the next optimal step\[ h_{1}=\frac{-\left ( v^{1}\right ) ^{T}\nabla J\left ( x^{1}\right ) }{\left ( v^{1}\right ) ^{T}Av^{1}}=\frac{-\left ( v^{1}\right ) ^{T}\left ( Ax^{1}+b\right ) }{\left ( v^{1}\right ) ^{T}Av^{1}}=\frac{-\begin{bmatrix} 1 & -2 \end{bmatrix} \left ( \begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} \frac{1}{4}\\ 0 \end{bmatrix} +\begin{bmatrix} -1\\ 1 \end{bmatrix} \right ) }{\begin{bmatrix} 1 & -2 \end{bmatrix}\begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1\\ -2 \end{bmatrix} }=\frac{3}{4}\] Hence\begin{align*} x^{2} & =x^{1}+h_{1}v^{1}\\ & =\begin{bmatrix} \frac{1}{4}\\ 0 \end{bmatrix} +\frac{3}{4}\begin{bmatrix} 1\\ -2 \end{bmatrix} \\ & =\begin{bmatrix} 1\\ -\frac{3}{2}\end{bmatrix} \end{align*}

This is the \(x^{\ast }\) we found in the first method, reached in \(n=2\) steps as expected. In an implementation, we would check for convergence by evaluating \(\nabla J\left ( x^{2}\right ) \), which is\begin{align*} \nabla J\left ( x^{2}\right ) & =Ax^{2}+b\\ & =\begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1\\ -\frac{3}{2}\end{bmatrix} +\begin{bmatrix} -1\\ 1 \end{bmatrix} \\ & =\begin{bmatrix} 0\\ 0 \end{bmatrix} \end{align*}

As expected.
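The two steps above can be collected into a small routine that minimizes \(J\) along any list of \(A\)-conjugate directions (a sketch, assuming NumPy; the function name is mine):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 2.0]])
b = np.array([-1.0, 1.0])

def minimize_along_directions(A, b, x0, directions):
    """Minimize J(x) = 1/2 x^T A x + b^T x with one exact
    line search along each of the given directions."""
    x = np.asarray(x0, dtype=float)
    for v in directions:
        g = A @ x + b                  # gradient at the current point
        h = -(v @ g) / (v @ A @ v)     # optimal step length along v
        x = x + h * v
    return x

# The pre-computed A-conjugate directions v0, v1 worked out above
v0 = np.array([1.0, 0.0])
v1 = np.array([1.0, -2.0])
x2 = minimize_along_directions(A, b, [0.0, 0.0], [v0, v1])   # [1, -3/2]
```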

5.3.3 Third method, Conjugate gradient

The difference here is that we find each \(v^{i}\) on the fly after each step, unlike the conjugate direction method, where the \(v^{i}\) are all pre-computed. Here we pick \(v^{0}=\nabla J\left ( x^{0}\right ) =\begin{bmatrix} -1\\ 1 \end{bmatrix} \), where \(x^{0}\) is the starting guess vector. (The conventional choice is the steepest-descent direction \(v^{0}=-\nabla J\left ( x^{0}\right ) \); with the \(\beta \) formula used below either sign yields conjugate directions, since \(\beta _{0}\) absorbs the sign.) First step

\[ h_{0}=\frac{-\left ( v^{0}\right ) ^{T}\nabla J\left ( x^{0}\right ) }{\left ( v^{0}\right ) ^{T}Av^{0}}=\frac{-\left ( v^{0}\right ) ^{T}\left ( Ax^{0}+b\right ) }{\left ( v^{0}\right ) ^{T}Av^{0}}=\frac{-\begin{bmatrix} -1 & 1 \end{bmatrix}\begin{bmatrix} -1\\ 1 \end{bmatrix} }{\begin{bmatrix} -1 & 1 \end{bmatrix}\begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} -1\\ 1 \end{bmatrix} }=\frac{-2}{2}=-1 \] Hence\begin{align*} x^{1} & =x^{0}+h_{0}v^{0}\\ & =\begin{bmatrix} 0\\ 0 \end{bmatrix} -1\begin{bmatrix} -1\\ 1 \end{bmatrix} =\begin{bmatrix} 1\\ -1 \end{bmatrix} \end{align*}

Now we find the mutually conjugate direction \(v^{1}\) as follows\begin{align*} \beta _{0} & =\frac{\left ( \nabla J\left ( x^{1}\right ) \right ) ^{T}Av^{0}}{\left ( v^{0}\right ) ^{T}Av^{0}}=\frac{\left ( Ax^{1}+b\right ) ^{T}\left ( Av^{0}\right ) }{\left ( v^{0}\right ) ^{T}Av^{0}}=\frac{\left ( \begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1\\ -1 \end{bmatrix} +\begin{bmatrix} -1\\ 1 \end{bmatrix} \right ) ^{T}\left ( \begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} -1\\ 1 \end{bmatrix} \right ) }{\begin{bmatrix} -1 & 1 \end{bmatrix}\begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} -1\\ 1 \end{bmatrix} }\\ & =\frac{\begin{bmatrix} 1\\ 1 \end{bmatrix} ^{T}\begin{bmatrix} -2\\ 0 \end{bmatrix} }{2}=\frac{-2}{2}=-1 \end{align*}

Hence\begin{align*} v^{1} & =-\nabla J\left ( x^{1}\right ) +\beta _{0}v^{0}\\ & =-\left ( \begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1\\ -1 \end{bmatrix} +\begin{bmatrix} -1\\ 1 \end{bmatrix} \right ) -\left ( 1\right ) \begin{bmatrix} -1\\ 1 \end{bmatrix} \\ & =\begin{bmatrix} 0\\ -2 \end{bmatrix} \end{align*}

Now that we found \(v^{1}\), we repeat the process. \[ h_{1}=\frac{-\left ( v^{1}\right ) ^{T}\nabla J\left ( x^{1}\right ) }{\left ( v^{1}\right ) ^{T}Av^{1}}=\frac{-\left ( v^{1}\right ) ^{T}\left ( Ax^{1}+b\right ) }{\left ( v^{1}\right ) ^{T}Av^{1}}=\frac{-\begin{bmatrix} 0 & -2 \end{bmatrix} \left ( \begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1\\ -1 \end{bmatrix} +\begin{bmatrix} -1\\ 1 \end{bmatrix} \right ) }{\begin{bmatrix} 0 & -2 \end{bmatrix}\begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 0\\ -2 \end{bmatrix} }=\frac{2}{8}=\frac{1}{4}\] Hence\begin{align*} x^{2} & =x^{1}+h_{1}v^{1}\\ & =\begin{bmatrix} 1\\ -1 \end{bmatrix} +\left ( \frac{1}{4}\right ) \begin{bmatrix} 0\\ -2 \end{bmatrix} \\ & =\begin{bmatrix} 1\\ -\frac{3}{2}\end{bmatrix} \end{align*}

This is the same result as the conjugate direction method, converging in \(2\) steps as well.
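The iteration above can be written as a short routine, computing each new direction on the fly with \(\beta _{k}=\nabla J\left ( x^{k+1}\right ) ^{T}Av^{k}/\left ( v^{k}\right ) ^{T}Av^{k}\) (a sketch, assuming NumPy; names are mine):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 2.0]])
b = np.array([-1.0, 1.0])

def conjugate_gradient(A, b, x0, tol=1e-12):
    """Conjugate gradient for J(x) = 1/2 x^T A x + b^T x, using
    beta_k = g_{k+1}^T A v_k / (v_k^T A v_k) as in the text."""
    x = np.asarray(x0, dtype=float)
    g = A @ x + b
    v = g.copy()                       # the text's choice v0 = grad J(x0)
    for _ in range(len(b)):            # at most n steps for an n x n SPD system
        h = -(v @ g) / (v @ A @ v)     # exact line search along v
        x = x + h * v
        g = A @ x + b
        if np.linalg.norm(g) < tol:    # converged: gradient is zero
            break
        beta = (g @ A @ v) / (v @ A @ v)
        v = -g + beta * v              # next A-conjugate direction
    return x

x2 = conjugate_gradient(A, b, [0.0, 0.0])   # [1, -3/2] after 2 steps
```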

5.3.4 Fourth method, Conjugate gradient using Fletcher-Reeves

In this method \[ \beta _{k}=\frac{\nabla J\left ( x^{k+1}\right ) ^{T}\nabla J\left ( x^{k+1}\right ) }{\nabla J\left ( x^{k}\right ) ^{T}\nabla J\left ( x^{k}\right ) }=\frac{\left \Vert \nabla J\left ( x^{k+1}\right ) \right \Vert ^{2}}{\left \Vert \nabla J\left ( x^{k}\right ) \right \Vert ^{2}}\] The Fletcher-Reeves formula assumes the standard starting direction, the steepest-descent direction \(v^{0}=-\nabla J\left ( x^{0}\right ) =\begin{bmatrix} 1\\ -1 \end{bmatrix} \). \[ h_{0}=\frac{-\left ( v^{0}\right ) ^{T}\nabla J\left ( x^{0}\right ) }{\left ( v^{0}\right ) ^{T}Av^{0}}=\frac{-\left ( v^{0}\right ) ^{T}\left ( Ax^{0}+b\right ) }{\left ( v^{0}\right ) ^{T}Av^{0}}=\frac{-\begin{bmatrix} 1 & -1 \end{bmatrix}\begin{bmatrix} -1\\ 1 \end{bmatrix} }{\begin{bmatrix} 1 & -1 \end{bmatrix}\begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1\\ -1 \end{bmatrix} }=\frac{2}{2}=1 \] Hence\begin{align*} x^{1} & =x^{0}+h_{0}v^{0}\\ & =\begin{bmatrix} 0\\ 0 \end{bmatrix} +1\begin{bmatrix} 1\\ -1 \end{bmatrix} =\begin{bmatrix} 1\\ -1 \end{bmatrix} \end{align*}

Now we find the mutually conjugate \(v^{1}\) using the Fletcher-Reeves formula \[ \beta _{0}=\frac{\left \Vert \nabla J\left ( x^{1}\right ) \right \Vert ^{2}}{\left \Vert \nabla J\left ( x^{0}\right ) \right \Vert ^{2}}=\frac{\left \Vert Ax^{1}+b\right \Vert ^{2}}{\left \Vert Ax^{0}+b\right \Vert ^{2}}=\frac{\left \Vert \begin{bmatrix} 1\\ 1 \end{bmatrix} \right \Vert ^{2}}{\left \Vert \begin{bmatrix} -1\\ 1 \end{bmatrix} \right \Vert ^{2}}=\frac{\left ( \sqrt{2}\right ) ^{2}}{\left ( \sqrt{2}\right ) ^{2}}=1 \] \begin{align*} v^{1} & =-\nabla J\left ( x^{1}\right ) +\beta _{0}v^{0}\\ & =-\left ( \begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1\\ -1 \end{bmatrix} +\begin{bmatrix} -1\\ 1 \end{bmatrix} \right ) +\left ( 1\right ) \begin{bmatrix} 1\\ -1 \end{bmatrix} \\ & =\begin{bmatrix} 0\\ -2 \end{bmatrix} \end{align*}

Now that we found \(v^{1}\), we repeat the process. \[ h_{1}=\frac{-\left ( v^{1}\right ) ^{T}\nabla J\left ( x^{1}\right ) }{\left ( v^{1}\right ) ^{T}Av^{1}}=\frac{-\left ( v^{1}\right ) ^{T}\left ( Ax^{1}+b\right ) }{\left ( v^{1}\right ) ^{T}Av^{1}}=\frac{-\begin{bmatrix} 0 & -2 \end{bmatrix} \left ( \begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1\\ -1 \end{bmatrix} +\begin{bmatrix} -1\\ 1 \end{bmatrix} \right ) }{\begin{bmatrix} 0 & -2 \end{bmatrix}\begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 0\\ -2 \end{bmatrix} }=\frac{2}{8}=\frac{1}{4}\] Hence\begin{align*} x^{2} & =x^{1}+h_{1}v^{1}\\ & =\begin{bmatrix} 1\\ -1 \end{bmatrix} +\left ( \frac{1}{4}\right ) \begin{bmatrix} 0\\ -2 \end{bmatrix} \\ & =\begin{bmatrix} 1\\ -\frac{3}{2}\end{bmatrix} \end{align*}

This is the same result as the conjugate direction method, again converging in \(2\) steps.
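Only the \(\beta \) update changes relative to the previous routine. A Fletcher-Reeves sketch (assuming NumPy, starting from the steepest-descent direction \(v^{0}=-\nabla J\left ( x^{0}\right ) \); names are mine):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 2.0]])
b = np.array([-1.0, 1.0])

def conjugate_gradient_fr(A, b, x0, tol=1e-12):
    """CG with Fletcher-Reeves: beta_k = ||g_{k+1}||^2 / ||g_k||^2."""
    x = np.asarray(x0, dtype=float)
    g = A @ x + b
    v = -g                             # steepest-descent start
    for _ in range(len(b)):
        h = -(v @ g) / (v @ A @ v)     # exact line search along v
        x = x + h * v
        g_new = A @ x + b
        if np.linalg.norm(g_new) < tol:
            break
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves formula
        v = -g_new + beta * v
        g = g_new
    return x

x2 = conjugate_gradient_fr(A, b, [0.0, 0.0])   # [1, -3/2]
```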

5.3.5 Fifth method, Conjugate gradient using Polak-Ribière

In this method \[ \beta _{k}=\frac{\nabla J\left ( x^{k+1}\right ) ^{T}\left ( \nabla J\left ( x^{k+1}\right ) -\nabla J\left ( x^{k}\right ) \right ) }{\nabla J\left ( x^{k}\right ) ^{T}\nabla J\left ( x^{k}\right ) }\] As with Fletcher-Reeves, we start with the steepest-descent direction \(v^{0}=-\nabla J\left ( x^{0}\right ) =\begin{bmatrix} 1\\ -1 \end{bmatrix} \). \[ h_{0}=\frac{-\left ( v^{0}\right ) ^{T}\nabla J\left ( x^{0}\right ) }{\left ( v^{0}\right ) ^{T}Av^{0}}=\frac{-\left ( v^{0}\right ) ^{T}\left ( Ax^{0}+b\right ) }{\left ( v^{0}\right ) ^{T}Av^{0}}=\frac{-\begin{bmatrix} 1 & -1 \end{bmatrix}\begin{bmatrix} -1\\ 1 \end{bmatrix} }{\begin{bmatrix} 1 & -1 \end{bmatrix}\begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1\\ -1 \end{bmatrix} }=\frac{2}{2}=1 \] Hence\begin{align*} x^{1} & =x^{0}+h_{0}v^{0}\\ & =\begin{bmatrix} 0\\ 0 \end{bmatrix} +1\begin{bmatrix} 1\\ -1 \end{bmatrix} =\begin{bmatrix} 1\\ -1 \end{bmatrix} \end{align*}

Now we find the mutually conjugate direction \(v^{1}\) using the Polak-Ribière formula \[ \beta _{0}=\frac{\nabla J\left ( x^{1}\right ) ^{T}\left ( \nabla J\left ( x^{1}\right ) -\nabla J\left ( x^{0}\right ) \right ) }{\nabla J\left ( x^{0}\right ) ^{T}\nabla J\left ( x^{0}\right ) }\]

But

\begin{align*} \nabla J\left ( x^{1}\right ) & =\begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1\\ -1 \end{bmatrix} +\begin{bmatrix} -1\\ 1 \end{bmatrix} =\begin{bmatrix} 1\\ 1 \end{bmatrix} \\ \nabla J\left ( x^{0}\right ) & =\begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 0\\ 0 \end{bmatrix} +\begin{bmatrix} -1\\ 1 \end{bmatrix} =\begin{bmatrix} -1\\ 1 \end{bmatrix} \end{align*}

Hence

\[ \beta _{0}=\frac{\begin{bmatrix} 1 & 1 \end{bmatrix} \left ( \begin{bmatrix} 1\\ 1 \end{bmatrix} -\begin{bmatrix} -1\\ 1 \end{bmatrix} \right ) }{\begin{bmatrix} -1 & 1 \end{bmatrix}\begin{bmatrix} -1\\ 1 \end{bmatrix} }=\frac{2}{2}=1 \] Hence

\begin{align*} v^{1} & =-\nabla J\left ( x^{1}\right ) +\beta _{0}v^{0}\\ & =-\left ( \begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1\\ -1 \end{bmatrix} +\begin{bmatrix} -1\\ 1 \end{bmatrix} \right ) +\left ( 1\right ) \begin{bmatrix} 1\\ -1 \end{bmatrix} \\ & =\begin{bmatrix} 0\\ -2 \end{bmatrix} \end{align*}

Now that we found \(v^{1}\), we repeat the process. \[ h_{1}=\frac{-\left ( v^{1}\right ) ^{T}\nabla J\left ( x^{1}\right ) }{\left ( v^{1}\right ) ^{T}Av^{1}}=\frac{-\left ( v^{1}\right ) ^{T}\left ( Ax^{1}+b\right ) }{\left ( v^{1}\right ) ^{T}Av^{1}}=\frac{-\begin{bmatrix} 0 & -2 \end{bmatrix} \left ( \begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1\\ -1 \end{bmatrix} +\begin{bmatrix} -1\\ 1 \end{bmatrix} \right ) }{\begin{bmatrix} 0 & -2 \end{bmatrix}\begin{bmatrix} 4 & 2\\ 2 & 2 \end{bmatrix}\begin{bmatrix} 0\\ -2 \end{bmatrix} }=\frac{2}{8}=\frac{1}{4}\] Hence\begin{align*} x^{2} & =x^{1}+h_{1}v^{1}\\ & =\begin{bmatrix} 1\\ -1 \end{bmatrix} +\left ( \frac{1}{4}\right ) \begin{bmatrix} 0\\ -2 \end{bmatrix} \\ & =\begin{bmatrix} 1\\ -\frac{3}{2}\end{bmatrix} \end{align*}

This is the same result as the conjugate direction method, again converging in \(2\) steps as expected.
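The Polak-Ribière variant differs from Fletcher-Reeves only in the numerator of \(\beta \). A sketch (assuming NumPy, with the same steepest-descent start; names are mine):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 2.0]])
b = np.array([-1.0, 1.0])

def conjugate_gradient_pr(A, b, x0, tol=1e-12):
    """CG with Polak-Ribiere: beta_k = g_{k+1}^T (g_{k+1} - g_k) / ||g_k||^2."""
    x = np.asarray(x0, dtype=float)
    g = A @ x + b
    v = -g                             # steepest-descent start
    for _ in range(len(b)):
        h = -(v @ g) / (v @ A @ v)     # exact line search along v
        x = x + h * v
        g_new = A @ x + b
        if np.linalg.norm(g_new) < tol:
            break
        beta = (g_new @ (g_new - g)) / (g @ g)   # Polak-Ribiere formula
        v = -g_new + beta * v
        g = g_new
    return x

x2 = conjugate_gradient_pr(A, b, [0.0, 0.0])   # [1, -3/2]
```

For a quadratic \(J\) with exact line searches, Polak-Ribière and Fletcher-Reeves produce the same directions, as the identical hand computations above illustrate; they differ only on non-quadratic objectives.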