5.5 Summary of iterative search algorithms

  5.5.1 steepest descent
  5.5.2 conjugate direction, Quadratic function \(J(x)\)
  5.5.3 conjugate gradient, Quadratic function \(J(x)\)
  5.5.4 conjugate gradient, Non-quadratic function \(J(x)\), Hestenes-Stiefel
  5.5.5 conjugate gradient, Non-quadratic function \(J(x)\), Polak-Ribière
  5.5.6 conjugate gradient, Non-quadratic function \(J(x)\), Fletcher-Reeves

5.5.1 steepest descent

   5.5.1.1 steepest descent, any objective function \(J(x)\)
   5.5.1.2 steepest descent, Quadratic objective function \(J(x)\)

5.5.1.1 steepest descent, any objective function \(J(x)\)

The inputs are the initial starting point \(x(0)\) and the objective function \(J(x)\) itself. The iteration is stopped when \(\|g^{k}\|\) falls below a chosen tolerance.

1. Init: \(x^{0}=x(0)\), \(k=0\)
2. \(g^{k} = \nabla J(x^{k})\)
3. \(\alpha _{k} = \arg\min _{\alpha }J\left ( x^{k} - \alpha \frac{g^{k}}{\|g^{k}\|} \right ) \) (line search)
4. \(x^{k+1} = x^{k} - \alpha _{k} \frac{g^{k}}{\|g^{k}\|} \)
5. \(k = k+1\)
6. goto 2
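
As an illustration, here is a minimal NumPy sketch of this loop. The exact line search of step 3 is replaced by a simple backtracking search (an assumption; any one-dimensional minimizer along the ray would do), and the stopping test on \(\|g^{k}\|\) is made explicit:

```python
import numpy as np

def steepest_descent(J, grad_J, x0, tol=1e-8, max_iter=1000):
    # Steepest descent for a general objective J(x).
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_J(x)                        # step 2: g^k = grad J(x^k)
        gnorm = np.linalg.norm(g)
        if gnorm < tol:                      # stopping test on ||g^k||
            break
        d = g / gnorm                        # normalized gradient direction
        # step 3: backtracking stand-in for the exact line search
        alpha, Jx = 1.0, J(x)
        while J(x - alpha * d) > Jx - 1e-4 * alpha * gnorm:
            alpha *= 0.5
        x = x - alpha * d                    # step 4: x^{k+1}
    return x

# Example: minimize J(x) = (x1-1)^2 + 10*(x2+2)^2
J = lambda x: (x[0] - 1)**2 + 10 * (x[1] + 2)**2
gJ = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])
print(steepest_descent(J, gJ, [0.0, 0.0]))   # approx [1, -2]
```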

5.5.1.2 steepest descent, Quadratic objective function \(J(x)\)

If the objective function is quadratic, \(J(x)=\frac{1}{2}x^{T} A x - b^{T} x +c\) with \(A\) symmetric positive definite, then \(\nabla J(x)=Ax-b\) and the optimal step length has a closed form, so there is no need for a line search.

The inputs are the initial starting point \(x(0)\) and \(A,b\). The algorithm becomes

1. Init: \(x^{0}=x(0)\), \(k=0\)
2. \(g^{k} = \nabla J(x^{k}) = A x^{k} - b\)
3. \(\alpha _{k} = \frac{ \left [ g^{k} \right ] ^{T} g^{k} }{ \left [ g^{k} \right ] ^{T} A\, g^{k}} \)
4. \(x^{k+1} = x^{k} - \alpha _{k} g^{k} \)
5. \(k = k+1\)
6. goto 2
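
A NumPy sketch of the quadratic case, assuming \(A\) is symmetric positive definite; the closed-form \(\alpha _{k}\) replaces the line search, and a tolerance test is added since the listing above loops forever:

```python
import numpy as np

def steepest_descent_quadratic(A, b, x0, tol=1e-10, max_iter=10000):
    # Steepest descent for J(x) = 1/2 x^T A x - b^T x + c, A SPD.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = A @ x - b                        # step 2: gradient
        if np.linalg.norm(g) < tol:
            break
        alpha = (g @ g) / (g @ (A @ g))      # step 3: closed-form step length
        x = x - alpha * g                    # step 4
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(steepest_descent_quadratic(A, b, np.zeros(2)))
print(np.linalg.solve(A, b))                 # same answer: minimizer solves Ax = b
```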

5.5.2 conjugate direction, Quadratic function \(J(x)\)

For a quadratic \(J(x)=\frac{1}{2}x^{T} A x - b^{T} x +c\) the conjugate direction algorithm is as follows.

Inputs: the starting point \(x(0)\), the pair \(A,b\), and a set of \(n\) mutually conjugate vectors \(\{v^{0},v^{1},\dots ,v^{n-1}\}\) with respect to \(A\), where \(n\) is the size of \(A\). In other words, \((v^{i})^{T} A v^{j}=0\) for \(i \neq j\).

These \(v^{i}\) vectors have to be generated before the algorithm starts. With the conjugate gradient method (below), the A-conjugate vectors are instead generated on the fly as the algorithm iterates; this is the main difference between conjugate direction and conjugate gradient.

1. Init: \(x^{0}=x(0)\), \(k=0\)
2. \(g^{k} = \nabla J(x^{k}) = A x^{k} - b\)
3. \(\alpha _{k} = \frac{ \left [ g^{k}\right ] ^{T} v^{k}}{ \left [ v^{k} \right ] ^{T} A\, v^{k}} \)
4. \(x^{k+1} = x^{k} - \alpha _{k} v^{k} \)
5. \(k = k+1\)
6. goto 2

The only difference from steepest descent is in steps 3 and 4, where the conjugate direction \(v^{k}\) replaces the gradient \(g^{k}\) as the search direction.
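
A sketch of the loop, assuming the \(A\)-conjugate set \(\{v^{0},\dots ,v^{n-1}\}\) has already been built. The example uses the eigenvectors of a symmetric \(A\), which are mutually \(A\)-conjugate; in exact arithmetic the minimizer is reached after the \(n\)-th direction:

```python
import numpy as np

def conjugate_direction(A, b, x0, V):
    # Conjugate direction method for J(x) = 1/2 x^T A x - b^T x + c.
    # V holds n mutually A-conjugate vectors, generated beforehand.
    x = np.asarray(x0, dtype=float)
    for v in V:                              # steps 2-5: one pass over the directions
        g = A @ x - b                        # step 2
        alpha = (g @ v) / (v @ (A @ v))      # step 3
        x = x - alpha * v                    # step 4
    return x                                 # exact minimizer after n directions

# Example: eigenvectors of a symmetric A satisfy v_i^T A v_j = 0 for i != j.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
_, V = np.linalg.eigh(A)
print(conjugate_direction(A, b, np.zeros(2), V.T))  # approx [0.2, 0.4]
```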

5.5.3 conjugate gradient, Quadratic function \(J(x)\)

The conjugate direction method required finding the set of \(v\) vectors before starting. The conjugate gradient algorithm generates these vectors as it runs.

Inputs: the starting point \(x(0)\) and \(A,b\).

1. Init: \(x^{0}=x(0)\), \(k=0\), \(g^{0}=\nabla J(x^{0}) = A x^{0} - b\), \(v^{0}=-g^{0}\)
2. \(\alpha _{k} = -\frac{ \left [ g^{k}\right ] ^{T} v^{k}}{ \left [ v^{k} \right ] ^{T} A\, v^{k}} \)
3. \(x^{k+1} = x^{k} + \alpha _{k} v^{k} \)
4. \(g^{k+1} = \nabla J(x^{k+1}) = A x^{k+1} - b\)
5. \(\beta = \frac{ \left [ g^{k+1}\right ] ^{T} A v^{k}}{ \left [ v^{k} \right ] ^{T} A v^{k} }\)
6. \(v^{k+1} = - g^{k+1} + \beta v^{k}\)
7. \(k = k+1\)
8. goto 2
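
A NumPy sketch of this loop. Note the sign convention: with the update \(x^{k+1}=x^{k}+\alpha _{k}v^{k}\) of step 3, the step length of step 2 carries a minus sign so that \(\alpha _{k}>0\). At most \(n\) iterations are needed in exact arithmetic:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    # CG for J(x) = 1/2 x^T A x - b^T x + c with A SPD; the A-conjugate
    # directions v^k are generated on the fly (no precomputed set needed).
    x = np.asarray(x0, dtype=float)
    g = A @ x - b                            # g^0
    v = -g                                   # v^0
    for _ in range(len(b)):                  # at most n passes
        Av = A @ v
        alpha = -(g @ v) / (v @ Av)          # step 2
        x = x + alpha * v                    # step 3
        g = A @ x - b                        # step 4
        if np.linalg.norm(g) < tol:
            break
        beta = (g @ Av) / (v @ Av)           # step 5: makes v^{k+1} A-conjugate to v^k
        v = -g + beta * v                    # step 6
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(conjugate_gradient(A, b, np.zeros(2)))  # approx [0.2, 0.4]
```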

5.5.4 conjugate gradient, Non-quadratic function \(J(x)\), Hestenes-Stiefel

If the objective function is not quadratic we cannot use \(A,b\) to compute \(\alpha _{k}\) and \(\beta \); the step length comes from a line search and \(\beta \) is built from gradients only. Using the Hestenes-Stiefel formula, the algorithm becomes

Input: the starting point \(x(0)\).

1. Init: \(x^{0}=x(0)\), \(k=0\), \(g^{0}=\nabla J(x^{0})\), \(v^{0}=-g^{0}\)
2. \(\alpha _{k} = \arg\min _{\alpha }J\left ( x^{k} + \alpha v^{k} \right ) \) (line search)
3. \(x^{k+1} = x^{k} + \alpha _{k} v^{k} \)
4. \(g^{k+1} = \nabla J(x^{k+1})\)
5. \(\beta = \frac{\left [ g^{k+1}\right ] ^{T} \left [ g^{k+1}-g^{k}\right ] }{\left [ v^{k} \right ] ^{T} \left [ g^{k+1}-g^{k}\right ] }\)
6. \(v^{k+1} = - g^{k+1} + \beta v^{k}\)
7. \(k = k+1\)
8. goto 2
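
A Python sketch of the non-quadratic loop. The exact line search of step 2 is approximated by backtracking, and a restart (\(v^{k}=-g^{k}\)) is added whenever \(v^{k}\) fails to be a descent direction; both are assumptions a practical code would refine. The \(\beta \) formula is passed in as a function so the next two variants can reuse the same loop:

```python
import numpy as np

def beta_hestenes_stiefel(g_new, g_old, v):
    # step 5: Hestenes-Stiefel formula
    # (a production code would also guard v @ y against zero)
    y = g_new - g_old
    return (g_new @ y) / (v @ y)

def nonlinear_cg(J, grad_J, x0, beta_fn=beta_hestenes_stiefel,
                 tol=1e-8, max_iter=500):
    x = np.asarray(x0, dtype=float)
    g = grad_J(x)
    v = -g                                   # step 1: v^0 = -g^0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ v >= 0:                       # safeguard: restart if not a descent direction
            v = -g
        # step 2: backtracking stand-in for the exact line search
        alpha, Jx = 1.0, J(x)
        while J(x + alpha * v) > Jx + 1e-4 * alpha * (g @ v):
            alpha *= 0.5
        x = x + alpha * v                    # step 3
        g_new = grad_J(x)                    # step 4
        v = -g_new + beta_fn(g_new, g, v) * v  # steps 5-6
        g = g_new
    return x

# Example: a smooth non-quadratic function, minimizer at [0, 0]
J = lambda x: np.log(np.cosh(x[0])) + x[1]**2
gJ = lambda x: np.array([np.tanh(x[0]), 2 * x[1]])
print(nonlinear_cg(J, gJ, np.array([1.0, 1.0])))   # approx [0, 0]
```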

5.5.5 conjugate gradient, Non-quadratic function \(J(x)\), Polak-Ribière

The only change from Hestenes-Stiefel is the formula for \(\beta \) in step 5. Using the Polak-Ribière formula, the conjugate gradient algorithm becomes

Input: the starting point \(x(0)\).

1. Init: \(x^{0}=x(0)\), \(k=0\), \(g^{0}=\nabla J(x^{0})\), \(v^{0}=-g^{0}\)
2. \(\alpha _{k} = \arg\min _{\alpha }J\left ( x^{k} + \alpha v^{k} \right ) \) (line search)
3. \(x^{k+1} = x^{k} + \alpha _{k} v^{k} \)
4. \(g^{k+1} = \nabla J(x^{k+1})\)
5. \(\beta = \frac{\left [ g^{k+1}\right ] ^{T} \left [ g^{k+1}-g^{k}\right ] }{ \left [ g^{k}\right ] ^{T} g^{k} }\)
6. \(v^{k+1} = - g^{k+1} + \beta v^{k}\)
7. \(k = k+1\)
8. goto 2
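
Since only step 5 differs from Hestenes-Stiefel, only the \(\beta \) function of the nonlinear_cg sketch in section 5.5.4 needs replacing:

```python
def beta_polak_ribiere(g_new, g_old, v):
    # step 5: Polak-Ribiere formula; v is unused but kept for a uniform signature
    return (g_new @ (g_new - g_old)) / (g_old @ g_old)

# usage: nonlinear_cg(J, gJ, x0, beta_fn=beta_polak_ribiere)
```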

5.5.6 conjugate gradient, Non-quadratic function \(J(x)\), Fletcher-Reeves

Again, only the \(\beta \) formula of step 5 changes. Using the Fletcher-Reeves formula, the conjugate gradient algorithm becomes

Input: the starting point \(x(0)\).

1. Init: \(x^{0}=x(0)\), \(k=0\), \(g^{0}=\nabla J(x^{0})\), \(v^{0}=-g^{0}\)
2. \(\alpha _{k} = \arg\min _{\alpha }J\left ( x^{k}+\alpha v^{k}\right ) \) (line search)
3. \(x^{k+1} = x^{k}+\alpha _{k}v^{k}\)
4. \(g^{k+1} = \nabla J(x^{k+1})\)
5. \(\beta = \frac{\left [ g^{k+1}\right ] ^{T}g^{k+1}}{\left [ g^{k}\right ] ^{T}g^{k}}\)
6. \(v^{k+1} = -g^{k+1}+\beta v^{k}\)
7. \(k = k+1\)
8. goto 2
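
Likewise, a drop-in \(\beta \) function for the nonlinear_cg sketch of section 5.5.4:

```python
def beta_fletcher_reeves(g_new, g_old, v):
    # step 5: Fletcher-Reeves formula; v and the old direction are not needed
    return (g_new @ g_new) / (g_old @ g_old)

# usage: nonlinear_cg(J, gJ, x0, beta_fn=beta_fletcher_reeves)
```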