2.6 HW 5

  2.6.1 Introduction
  2.6.2 Solver efficiency and iteration count
  2.6.3 Discussion of results and conclusions
  2.6.4 Appendix
  2.6.5 Result for CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-3}\)

2.6.1 Introduction

Figure 2.58: problem description

The test problem used is \[ \frac{\partial ^{2}u}{\partial x^{2}}+\frac{\partial ^{2}u}{\partial y^{2}}=0 \] on the unit square \(\left ( 0,1\right ) \times \left ( 0,1\right ) \) with zero boundary conditions.

The above problem is solved using the conjugate gradient (CG) iterative solver. The mesh spacings used are \[ h=\left \{ \frac{1}{16},\frac{1}{32},\frac{1}{64},\frac{1}{128}\right \} \] and the tolerance used to check for convergence is\[ \varepsilon =10^{-6}\] The solver terminates when the mesh norm of the residual becomes smaller than this quantity, using the following check

\[ \sqrt{h}\left \Vert r^{\left ( k\right ) }\right \Vert _{2}<\varepsilon \] where \(r^{\left ( k\right ) }\) is the residual at the \(k^{th}\) iteration and \(h\) is the current value of the mesh spacing.

The reason for using zero as the driving force on the RHS of the PDE is to allow the calculation and tracking of the error at each iteration, since the exact solution \(u_{exact}\) for this problem is known to be zero. The error at iteration \(k\) is then found using \begin{align*} e^{\left ( k\right ) } & =\left \Vert u_{exact}-u^{\left ( k\right ) }\right \Vert \\ & =\left \Vert u^{\left ( k\right ) }\right \Vert \end{align*}

where \(u^{\left ( k\right ) }\) represents the approximate solution at the \(k^{th}\) iteration.
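As a small illustration (this is a sketch, not the actual CG.m code), the mesh-scaled norms used for the error and for the stopping test can be computed inside the iteration loop as follows, where u, r, h, and tol stand for the current iterate, current residual, mesh spacing, and tolerance:

% hedged sketch: error and convergence check inside the CG loop (variable names are illustrative)
errNorm = sqrt(h)*norm(u);   % error norm per iteration, since u_exact = 0
resNorm = sqrt(h)*norm(r);   % mesh norm of the residual
if resNorm < tol             % tol = 1e-6 in these runs
    return                   % converged; u is the approximate solution
end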

A Matlab function \(CG.m\) was written to implement the conjugate gradient solver. One of the parameters this function accepts is the name of the preconditioner to use. The following preconditioners are supported\[ \text{NONE,\ SSOR,\ MultiGrid,\ IncompleteCholesky}\] When NONE is specified, no preconditioning is done.

For each preconditioner, the solver was run to find the solution to the above test problem. The initial guess for the solution \(u^{\left ( 0\right ) }\) was generated using Matlab's rand() function.

For each preconditioner the following plots were generated

1.
Plot of the error \(\left \Vert e^{\left ( k\right ) }\right \Vert \) per iteration \(k\), which shows the rate of error reduction per iteration. The plot was generated in both log and linear scales.
2.
Plot of the residual \(\left \Vert r^{\left ( k\right ) }\right \Vert \) per iteration, which shows the rate of residual norm reduction per iteration, also plotted in both log and linear scales. The initial residual is defined as \(r^{\left ( 0\right ) }=f-Au^{\left ( 0\right ) }\) and at each subsequent iteration the residual is defined as \(r^{\left ( k\right ) }=r^{\left ( k-1\right ) }-\alpha Ap^{\left ( k-1\right ) }\). The algorithm below illustrates this in more detail.
3.
The spectrum of the eigenvalues of \(A\) and the spectrum of the eigenvalues of \(M^{-1}A\), plotted using Matlab's scatter() command to better see the effect on the condition number of multiplying \(A\) by \(M^{-1}\), where \(M\) is the preconditioner matrix (a sketch of this plotting step appears after this list).
4.
3D mesh plot of the final solution. The final solution was verified to be close to zero, which is the same as the exact solution.
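The following is a minimal Matlab sketch of how the spectra in item 3 could be produced; it is illustrative only (the actual plotting code is not reproduced here) and assumes the sparse 5-point Laplacian is built with delsq/numgrid and the preconditioner comes from an incomplete Cholesky factor:

% hedged sketch: scatter plot of the spectrum of A and of M^{-1}A (h = 1/16 case)
A = delsq(numgrid('S',17));          % sparse 5-point Laplacian, N = 15^2 = 225 unknowns
L = ichol(A);                        % incomplete Cholesky factor, A ~ L*L' (older Matlab used cholinc)
lamA  = eig(full(A));                % eigenvalues of A
lamMA = real(eig(full((L*L')\A)));   % eigenvalues of M^{-1}A with M = L*L' (real up to roundoff)
figure
scatter(lamA,  zeros(size(lamA)));  hold on
scatter(lamMA, ones(size(lamMA)));
legend('eigenvalues of A','eigenvalues of M^{-1}A')
xlabel('\lambda')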

In addition to the above plots, for each mesh spacing \(h\), the actual result table is printed which tabulates the above values at each iteration. This table was used to generate the above plots. The printed tables also show the ratio of the value of norm of the residual at the current iteration to its value at the previous iteration, similarly for the error norm.

Due to the large size of these tables, the tables for all the spacings and for each solver are available in the appendix.

Conjugate gradient algorithm description

The idea of preconditioning is to use a matrix \(M\) to speed up the convergence of the conjugate gradient method. The original problem\[ Ax=f \] is transformed to a new problem\[ M^{-1}Ax=M^{-1}f \] such that \(M^{-1}A\) has a smaller condition number than \(A\). For most iterative solvers, the rate of convergence increases as the condition number of the system matrix decreases.

The conjugate gradient method works only on symmetric positive definite matrices \(A\), and its speed of convergence is affected by the distribution of the eigenvalues of \(A\). The convergence estimate is more accurate if the distribution of eigenvalues is uniform. For the discrete 2D Poisson problem this is the case, as verified by the plots of the spectrum generated below for each case.

The preconditioning is used to modify the spectrum of \(A\) so that the eigenvalues of the new system matrix \(M^{-1}A\) become more clustered together causing the condition number to become smaller and thus increasing the convergence rate.

The following table was generated to show the effect of preconditioning on lowering the condition number. It shows the condition number for CG with no preconditioning (in other words, for the \(A\) matrix only), and then the condition number of \(M^{-1}A\) for the different preconditioners as the mesh spacing is changed. Next to each condition number, the maximum and minimum eigenvalues are shown in parentheses. Notice that applying \(\frac{\left \vert \max \lambda \right \vert }{\left \vert \min \lambda \right \vert }\) to these eigenvalues does not reproduce the tabulated condition number: that ratio gives the 2-norm condition number, while the tabulated values are 1-norm estimates obtained with the Matlab function condest(), which was used because these are sparse matrices.






solver | \(h=\frac{1}{16},N=225\) | \(h=\frac{1}{32},N=961\) | \(h=\frac{1}{64},N=3969\) | \(h=\frac{1}{128},N=16129\)
NONE | \(150\ (7.923,\ 0.0768)\) | \(603\ (7.9807,\ 0.01926)\) | \(2413\ (7.9995,\ 0.0048)\) | \(9655\ (7.9987,\ 0.001205)\)
SSOR | \(33\ (0.25,\ 0.018203)\) | \(128\ (0.25,\ 0.004748)\) | \(511\ (0.25,\ 0.0012)\) | not computed (memory problem)
IncCholesky \(\varepsilon =10^{-2}\) | \(32\ (2.365,\ 0.2279)\) | \(108\ (2.44,\ 0.0585)\) | \(422\ (2.537,\ 0.0146)\) | not computed (memory problem)
IncCholesky \(\varepsilon =10^{-3}\) | \(48\ (2.337,\ 0.508)\) | \(153\ (2.526,\ 0.1603)\) | \(373\ (2.592,\ 0.041756)\) | not computed (memory problem)

Each entry is the estimated condition number followed by \((\lambda _{\max },\lambda _{\min })\).





The diagram below reflects the above table results, clearly showing the reduction of the condition number as a result of preconditioning.

Figure 2.59: compare condition numbers
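As a rough sketch of how these estimates could be reproduced (this is not the homework's actual script; the matrix is assumed to be built with delsq/numgrid, and ichol stands in for the older cholinc), condest() gives the 1-norm estimate used in the table:

% hedged sketch: condition number estimates before and after preconditioning (h = 1/16)
A  = delsq(numgrid('S',17));                            % sparse 5-point Laplacian, N = 225 unknowns
kA = condest(A);                                        % 1-norm estimate of cond(A)
R  = ichol(A, struct('type','ict','droptol',1e-2));     % incomplete Cholesky, drop tolerance 1e-2
kMA = condest((R*R')\A);                                % estimate of cond(M^{-1}A) with M = R*R'
fprintf('cond(A) ~ %.0f,  cond(M^{-1}A) ~ %.0f\n', kA, kMA)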

CG Algorithm pseudocode

The following is the algorithm used for the implementation of conjugate gradient with preconditioning.\[\begin{array} [c]{l}Input:\ A,f,tol,preconditionSolverName,dropTol\\ Output:\ \tilde{u}\ approximate\ solution\ to\ Ax=0\\ \\ u_{0}=\operatorname{rand}()\ (\ast initial\ guess\ of\ solution\ \ast )\\ r_{0}=f-Au_{0}\ (\ast initial\ residual\ast )\\ z_{0}\leftarrow CALL\ \ preconditionSolver(r_{0},A,preconditionSolverName,dropTol)\\ p_{0}=z_{0}\\ FOR\ k=1,2,3,\cdots \\ \ \ \ \ \ \ \ \ \begin{array} [c]{l}e_{k-1}=\sqrt{h}\left \Vert u_{k-1}\right \Vert _{2}\ \ (\ast \ the\ error\ since\ u_{exact}\ is\ known\ to\ be\ zero\ast )\\ \omega _{k-1}=Ap_{k-1}\\ \alpha _{k-1}=\frac{z_{k-1}^{T}r_{k-1}}{p_{k-1}^{T}\omega _{k-1}}\\ u_{k}=u_{k-1}+\alpha _{k-1\ }p_{k-1}\\ r_{k}=r_{k-1}-\alpha _{k-1}\omega _{k-1}\\ IF\ \left ( \sqrt{h}\left \Vert r_{k}\right \Vert _{2}\right ) <tol\ THEN\\ \ \ \ \ \ RETURN\ u_{k}\\ END\ IF\\ z_{k}\leftarrow CALL\ \ preconditionSolver(r_{k},A,preconditionSolverName,dropTol)\\ \beta _{k-1}=\frac{z_{k}^{T}r_{k}}{z_{k-1}^{T}r_{k-1}}\\ p_{k}=z_{k}+\beta _{k-1}p_{k-1}\end{array} \\ END\ FOR \end{array} \] The algorithm for the function \(preconditionSolver()\) is as follows\[\begin{array} [c]{l}Input:\ r,A,preconditionSolverName,dropTol\\ Output:\ z\ approximate\ solution\ to\ Mz=r\\ CASE\ preconditionSolverName\ IS\\ \ \ \ \ \ \ \ \ \begin{array} [c]{l}WHEN\ NONE\ z\leftarrow r\ \ \ \ \ \ \ \ \ \ //\ no\ preconditioning\ \\ WHEN\ MultiGrid\\ \ \ \ \ \ \ \begin{array} [c]{l}\mu _{1}=\mu _{2}=1\ (\ast \ presmoother\ and\ postsmoother\ast )\\ z\leftarrow CALL\ VCYCLE(zeroInitialGuess,r)\\ \ \ \ //VCYCLE\ is\ one\ implemented\ in\ HW4\ but\ changed\ to\ do\ \\ \ \ \ //one\ forward\ Gauss-Seidel/red-black\ followed\ by\ \\ \ \ \ //one\ reverse\ Gauss-Seidel/red-black\ \end{array} \\ WHEN\ \ SSOR\\ \ \ \ \ \ \ z\leftarrow CALL\ SOR\ forward\ followed\ by\ SOR\ in\ reverse\\ WHEN\ IncompleteCholesky\\\begin{array} [c]{l}R=cholinc(A,dropTol)\\ z\leftarrow R\backslash (R^{T}\backslash r) \end{array} \end{array} \\ END\ CASE\\ RETURN\ z \end{array} \]
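The following is a minimal runnable Matlab sketch of the above algorithm; it is not the actual CG.m written for this homework. The problem matrix is built with delsq/numgrid, the right hand side is zero, and the preconditioning step is a function handle applyM, shown here only for the NONE and incomplete Cholesky cases (ichol is used in place of the older cholinc; the SSOR and multigrid cases would replace applyM accordingly).

% hedged sketch of preconditioned conjugate gradient for the zero-RHS test problem
n   = 16;  h = 1/n;               % mesh spacing h = 1/16
A   = delsq(numgrid('S', n+1));   % sparse 5-point Laplacian, N = (n-1)^2 unknowns
N   = size(A,1);
f   = zeros(N,1);                 % zero forcing, so u_exact = 0
tol = 1e-6;

useIC = true;                     % choose preconditioner for this sketch
if useIC
    R = ichol(A, struct('type','ict','droptol',1e-2));  % A ~ R*R', R lower triangular
    applyM = @(r) R' \ (R \ r);   % solve M z = r with M = R*R'
else
    applyM = @(r) r;              % NONE: no preconditioning
end

u = rand(N,1);                    % random initial guess u_0
r = f - A*u;                      % initial residual r_0
z = applyM(r);
p = z;
for k = 1:10*N
    errHist(k) = sqrt(h)*norm(u); % error norm per iteration, since u_exact = 0
    w     = A*p;
    alpha = (z'*r)/(p'*w);
    u     = u + alpha*p;
    rnew  = r - alpha*w;
    if sqrt(h)*norm(rnew) < tol
        fprintf('converged in %d iterations, final error %g\n', k, sqrt(h)*norm(u));
        break
    end
    znew = applyM(rnew);
    beta = (znew'*rnew)/(z'*r);
    p    = znew + beta*p;
    r    = rnew;
    z    = znew;
end

Note the order of the triangular solves: cholinc returned an upper triangular factor, hence the \(R\backslash (R^{T}\backslash r)\) in the pseudocode, while ichol returns a lower triangular factor, hence the reversed order in this sketch.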

2.6.2 Solver efficiency and iteration count

   2.6.2.1 Work per iteration
   2.6.2.2 Number of iterations

In addition to finding the number of iterations needed for convergence by each solver, the problem also asked to compare the efficiency of each solver. This is done by finding the work needed by each solver to converge.

Work needed is defined as\[ Work=NumberOfIterations\ \times \ WorkPerIteration \] Before determining the work for each solver, the following table lists the \(\boldsymbol{cputime}\) (in seconds) used by each solver for the different spacings. The CPU time is measured using Matlab's cputime function and covers only the call to CG(); it does not include any other calls such as plotting.






preconditioning | \(h=\frac{1}{16},N=225\) | \(h=\frac{1}{32},N=961\) | \(h=\frac{1}{64},N=3969\) | \(h=\frac{1}{128},N=16129\)
NONE | \(0.19\) | \(0.37\) | \(13.48\) | \(556.6\)
Multigrid | \(0.34\) | \(0.6\) | \(13.48\) | \(566\)
SSOR | \(0.23\) | \(0.5\) | \(13.3\) | \(564.6\)
Incomplete Cholesky \(\varepsilon =10^{-2}\) | \(0.16\) | \(0.34\) | \(13.8\) | \(559\)
Incomplete Cholesky \(\varepsilon =10^{-3}\) | \(0.19\) | \(0.5\) | \(13.3\) | \(559\)
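The CPU times above were measured around the solver call only. A minimal sketch of that measurement is shown below; the CG() argument list here is assumed for illustration and is not necessarily the actual signature of the homework's CG.m.

% hedged sketch: timing only the solver call (argument list is illustrative, not the real signature)
t0 = cputime;
[u, iters] = CG(A, f, h, 1e-6, 'SSOR');   % hypothetical call into the homework's solver
t  = cputime - t0;                        % CPU seconds spent inside CG() only
fprintf('iterations = %d,  cpu time = %.2f seconds\n', iters, t)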





Surprisingly, no appreciable difference can be seen between the different solvers in terms of CPU time. It was expected that NONE would have the largest CPU time since it has the lowest efficiency. This result can be attributed to the small values of \(N\) used, which were not large enough to reflect the difference. One would need much larger values of \(N\) to see the effect of preconditioning on the CPU time. Due to memory limitations, this was not possible at this time. Now the work per iteration is analyzed.

2.6.2.1 Work per iteration

All solvers perform similar work per iteration except for the step needed to apply the preconditioning to determine \(z_{k}\). The only difference between using no preconditioning and using one is in the step that solves for \(z\) in \(Mz=r\). Taking the work per iteration of the base CG with no preconditioning as \(O\left ( N\right ) \), the work per iteration for each solver is as follows:

1.
No preconditioner: no extra work is needed, as \(z\) is the same as \(r\), hence \(O\left ( N\right ) \).
2.
Multigrid: the work needed to determine \(z\) adds the cost of one V cycle. The work for one V cycle was found in HW4 to be \(\frac{4}{3}C\times N\), where \(N\) is the number of unknowns and \(C\) is a constant estimated as \(7\left ( \upsilon _{1}+\upsilon _{2}\right ) +13\), with \(\upsilon _{1},\upsilon _{2}\) the numbers of pre-smooth and post-smooth operations, both equal to one here. Since the smoothing is done twice (forward and reverse), the constant becomes \(2\times 7\left ( \upsilon _{1}+\upsilon _{2}\right ) +13\), giving a work per iteration of \(\frac{4}{3}\left ( 2\times 7\left ( \upsilon _{1}+\upsilon _{2}\right ) +13\right ) N\), or about \(55N\) (the arithmetic is worked out after this list). Adding the \(O\left ( N\right ) \) from above, this still results in \(O\left ( N\right ) \).
3.
SSOR: the cost is that of two SOR sweeps. One SOR sweep costs about \(7N\) flops, where \(N\) is the number of unknowns, since it takes about 7 floating point operations to smooth one grid point and there are \(N\) of them. Hence the SSOR work is twice that, or \(14N\). As above, this is still \(O\left ( N\right ) \).
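As a check of the \(55N\) figure quoted in item 2 above, with \(\upsilon _{1}=\upsilon _{2}=1\):\[ \frac{4}{3}\left ( 2\times 7\left ( \upsilon _{1}+\upsilon _{2}\right ) +13\right ) N=\frac{4}{3}\left ( 28+13\right ) N=\frac{4}{3}\times 41N\approx 55N \] so the multigrid preconditioning step adds roughly \(55N\) operations per CG iteration, which is still \(O\left ( N\right ) \).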

2.6.2.2 Number of iterations

From the lecture notes, the error in conjugate gradient (with no preconditioning) behaves as\[ \left \Vert e_{k}\right \Vert _{A}\leq 2\left ( \frac{\sqrt{\Bbbk \left ( A\right ) }-1}{\sqrt{\Bbbk \left ( A\right ) }+1}\right ) ^{k}\left \Vert e_{0}\right \Vert _{A}\] where \(\Bbbk \left ( A\right ) \) is the condition number of \(A\). For this problem \(\Bbbk \left ( A\right ) =O\left ( h^{-2}\right ) \), where \(h\) is the mesh spacing, so \(\sqrt{\Bbbk \left ( A\right ) }=O\left ( h^{-1}\right ) \). Hence, for a fixed tolerance, which is the case here, the number of iterations is \(O\left ( h^{-1}\right ) =O\left ( N^{1/2}\right ) \) where \(N\) is the number of unknowns.
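To make the iteration estimate explicit, requiring the bound above to fall below a fixed tolerance \(\varepsilon \) gives\[ 2\left ( \frac{\sqrt{\Bbbk \left ( A\right ) }-1}{\sqrt{\Bbbk \left ( A\right ) }+1}\right ) ^{k}\leq \varepsilon \quad \Rightarrow \quad k\gtrsim \frac{\ln \left ( 2/\varepsilon \right ) }{-\ln \left ( 1-\frac{2}{\sqrt{\Bbbk \left ( A\right ) }+1}\right ) }\approx \frac{\sqrt{\Bbbk \left ( A\right ) }}{2}\ln \frac{2}{\varepsilon }\] so that \(k=O\left ( \sqrt{\Bbbk \left ( A\right ) }\right ) =O\left ( h^{-1}\right ) =O\left ( N^{1/2}\right ) \) for a fixed tolerance.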

The results of the numerical experiment agree with the above, as shown below, for the case of NONE (conjugate gradient with no preconditioning).

The following table was generated from the results of running the program. In this table, \(N\) is the number of unknowns, \(\Bbbk \left ( M^{-1}A\right ) \) is the condition number of \(M^{-1}A\), and \(\Bbbk \left ( A\right ) \) is the condition number of \(A\). Since sparse matrices are used, the Matlab function condest() was used to estimate the condition numbers. In this table, IC stands for incomplete Cholesky.








preconditioner | iterations, \(h=\frac{1}{16},N=225\) | \(\Bbbk \left ( M^{-1}A\right ) \) | \(\Bbbk \left ( A\right ) \) | iterations, \(h=\frac{1}{32},N=961\) | \(\Bbbk \left ( M^{-1}A\right ) \) | \(\Bbbk \left ( A\right ) \)
NONE | \(42\) | N/A | \(150\) | \(82\) | N/A | \(603\)
Multigrid | \(4\) | N/A | \(150\) | \(4\) | N/A | \(603\)
SSOR | \(18\) | \(33\) | \(150\) | \(30\) | \(128\) | \(603\)
IC \(\varepsilon =10^{-2}\) | \(7\) | \(32\) | \(150\) | \(13\) | \(108\) | \(603\)
IC \(\varepsilon =10^{-3}\) | \(4\) | \(48\) | \(150\) | \(6\) | \(153\) | \(603\)

preconditioner | iterations, \(h=\frac{1}{64},N=3969\) | \(\Bbbk \left ( M^{-1}A\right ) \) | \(\Bbbk \left ( A\right ) \) | iterations, \(h=\frac{1}{128},N=16129\) | \(\Bbbk \left ( M^{-1}A\right ) \) | \(\Bbbk \left ( A\right ) \)
NONE | \(157\) | N/A | \(2413\) | \(291\) | N/A | \(9655\)
Multigrid | \(4\) | N/A | \(2413\) | \(4\) | N/A | \(9655\)
SSOR | \(56\) | \(511\) | \(2413\) | \(103\) | memory problem | \(9655\)
IC \(\varepsilon =10^{-2}\) | \(22\) | \(422\) | \(2413\) | \(39\) | memory problem | \(9655\)
IC \(\varepsilon =10^{-3}\) | \(8\) | \(373\) | \(2413\) | \(14\) | memory problem | \(9655\)







The following is a plot that represents the above results.

Figure 2.60: iterations plot

From the above one can see that multigrid requires \(O\left ( 1\right ) \) iterations: the number of iterations was \(4\) for all cases from \(h=\frac{1}{16}\) to \(h=\frac{1}{128}\). For SSOR, the number of iterations grew sublinearly in \(N\); from the above one can estimate this growth as roughly \(O\left ( N^{1/4}\right ) \). With no preconditioning, the number of iterations grew approximately as \(O\left ( N^{\frac{1}{2}}\right ) \), as predicted by the earlier analytical result.
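As a quick consistency check for the unpreconditioned case: each halving of \(h\) quadruples \(N\), so an \(O\left ( N^{1/2}\right ) \) iteration count should roughly double at each step. From the tables above,\[ \frac{82}{42}\approx 1.95,\qquad \frac{157}{82}\approx 1.91,\qquad \frac{291}{157}\approx 1.85 \] which is close to the expected factor of \(\sqrt{4}=2\).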

2.6.3 Discussion of results and conclusions

The use of preconditioning on \(A\) reduced the number of iterations needed to converge to the same fixed tolerance, compared to no preconditioning. This was due to the reduction of the condition number of the system matrix, as can be seen in the above table. By reducing the ratio of the largest to the smallest eigenvalue, the rate of convergence increased. However, preconditioning also adds extra cost per iteration. That extra work was also of order \(N\), and hence the overall efficiency was governed by the number of iterations for large \(N\).

Therefore, this is the result of work efficiency for the main solvers, using the formula of \[ \text{Work}=\text{NumberOfIterations}\ \times \ \text{WorkPerIteration}\]

1.
NONE:  \(O\left ( N^{1/2}\right ) \times O\left ( N\right ) =O\left ( N^{3/2}\right ) \)
2.
SSOR: \(O\left ( N^{1/4}\right ) \times O\left ( N\right ) =O\left ( N^{5/4}\right ) \)
3.
Multigrid: \(O\left ( 1\right ) \times O\left ( N\right ) =O\left ( N\right ) \)

Incomplete Cholesky was not included, as it was hard to estimate the growth of its iteration count from the curve above, and the count depends on the drop tolerance value.

The above analysis shows that Multigrid is the most efficient, followed by Incomplete Cholesky (whose ranking depends on the drop tolerance), followed by SSOR, and finally by CG with no preconditioning.

In conclusion, using Multigrid as the preconditioner for conjugate gradient seems to be the most effective solver.

2.6.4 Appendix

   2.6.4.1 Result for CG with no preconditioner
   2.6.4.2 Result for CG with Multigrid preconditioner
   2.6.4.3 Result for CG with SSOR preconditioner
   2.6.4.4 Plots for h=1/128
   2.6.4.5 Result for CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-2}\)

2.6.4.1 Result for CG with no preconditioner

Plots for h=1/16

Figure 2.61: solver none plots 16

Plots for h=1/32

Figure 2.62: solver none plots 32

Plots for h=1/64

Figure 2.63: solver none plots 64

Plots for h=1/128

Figure 2.64: solver none plots 128
2.6.4.2 Result for CG with Multigrid preconditioner

Plots for h=1/16

Figure 2.65: solver MG plots 16

Plots for h=1/32

Figure 2.66: solver MG plots 32

Plots for h=1/64

Figure 2.67: solver MG plots 64

Plots for h=1/128

Figure 2.68: solver MG plots 128
2.6.4.3 Result for CG with SSOR preconditioner

Plots for h=1/16

Figure 2.69: solver SSOR plots 16

Plots for h=1/32

Figure 2.70: solver SSOR plots 32

Plots for h=1/64

Figure 2.71: solver SSOR plots 64
2.6.4.4 Plots for h=1/128

Figure 2.72: solver SSOR plots 128
2.6.4.5 Result for CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-2}\)

Plots for h=1/16

Figure 2.73: solver incomplete Cholesky plots 16

Plots for h=1/32

Figure 2.74: solver incomplete Cholesky plots 32

Plots for h=1/64

Figure 2.75: solver incomplete Cholesky plots 64

Plots for h=1/128

Figure 2.76: solver incomplete Cholesky plots 128

2.6.5 Result for CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-3}\)

   2.6.5.1 References
   2.6.5.2 Computation Tables
   2.6.5.3 Table CG with incomplete Cholesky

Plots for h=1/16

Figure 2.77: solver CG with incomplete Cholesky preconditioner 16

Plots for h=1/32

Figure 2.78: solver CG with incomplete Cholesky preconditioner 32

Plots for h=1/64

Figure 2.79: solver CG with incomplete Cholesky preconditioner 64

Plots for h=1/128

Figure 2.80: solver CG with incomplete Cholesky preconditioner 128
2.6.5.1 References

1.
R. J. LeVeque. Finite Difference Methods for Ordinary and Partial Differential Equations: Steady-State and Time-Dependent Problems. SIAM, 2007.

2.6.5.2 Computation Tables

Tables for CG with no preconditioner

Table for CG with no preconditioner, \(h=\frac{1}{16}\)

Tolerance=\(10^{-6}\), method=NONE, Iterations=\(42\), condition number A=\(150.4167\)

Table for CG with no preconditioner, \(h=\frac{1}{32}\)

Tolerance=\(10^{-6}\), method=NONE, Iterations=\(82\), condition number A=\(603\)

Table for CG with no preconditioner, \(h=\frac{1}{64}\)

Tolerance=\(10^{-6}\), method=NONE, Iterations=\(157\), condition number A=\(2413\)

Table for CG with no preconditioner, \(h=\frac{1}{128}\)

Tolerance=\(10^{-6}\), method=NONE, Iterations=\(291\), condition number A=\(9655\)

Tables for CG with Multigrid preconditioner

 
h =      1/16   eps=0.000001    method=MG 
k=4, cond A=150.416930 
               k             |e|           ratio             |r|           ratio 
               1       2.1998630       0.0000000       5.3253083       0.0000000 
               2       0.0956633       0.0434860       0.0363016       0.0068168 
               3       0.0014271       0.0149181       0.0006556       0.0180607 
               4       0.0000002       0.0001388       0.0000002       0.0003796 
 
h =      1/32            eps=0.000001   method=MG 
k=4,  cond A=603.051928 
               k             |e|           ratio             |r|           ratio 
               1       3.1204692       0.0000000       7.1238671       0.0000000 
               2       0.1461191       0.0468260       0.0360066       0.0050544 
               3       0.0031325       0.0214382       0.0006083       0.0168931 
               4       0.0000004       0.0001403       0.0000003       0.0005447 
 
h =      1/64    ,n=65   eps=0.000001   method=MG 
k=4, cond A=2413.598654 
               k             |e|           ratio             |r|           ratio 
               1       4.5764854       0.0000000      10.4620417       0.0000000 
               2       0.2180254       0.0476404       0.0388724       0.0037156 
               3       0.0058026       0.0266142       0.0006767       0.0174076 
               4       0.0000012       0.0002040       0.0000004       0.0005660 
 
h =      1/128   ,n=129  eps=0.000001   method=MG 
k=4, cond A=9655.787254 
               k             |e|           ratio             |r|           ratio 
               1       6.4563811       0.0000000      14.6329466       0.0000000 
               2       0.3174234       0.0491643       0.0482766       0.0032992 
               3       0.0094526       0.0297790       0.0008327       0.0172495 
               4       0.0000034       0.0003597       0.0000004       0.0005216

Tables for CG with SSOR preconditioner

Tables for CG with SSOR preconditioner h=1/16

Tables for CG with SSOR preconditioner h=1/32

Tables for CG with SSOR preconditioner h=1/64

Tables for CG with SSOR preconditioner h=1/128

2.6.5.3 Table CG with incomplete Cholesky

Table CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-2}\), \(h=\frac{1}{16}\)

Table CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-2}\), \(h=\frac{1}{32}\)

Table CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-2}\), \(h=\frac{1}{64}\)

Table CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-2}\), \(h=\frac{1}{128}\)

Tables for CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-3}\)