2.6 HW 5

  2.6.1 Introduction
  2.6.2 Solver efficiency and iteration count
  2.6.3 Discussion of results and conclusions
  2.6.4 Appendix
  2.6.5 Result for CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-3}\)

2.6.1 Introduction

Figure 2.58: problem description

The test problem used is \[ \frac{\partial ^{2}u}{\partial x^{2}}+\frac{\partial ^{2}u}{\partial y^{2}}=0 \] on the unit square \(\left ( 0,1\right ) \times \left ( 0,1\right ) \) with zero boundary conditions.

The above problem is solved using the conjugate gradient (CG) iterative solver. The mesh spacings used are \[ h=\left \{ \frac{1}{16},\frac{1}{32},\frac{1}{64},\frac{1}{128}\right \} \] and the tolerance used to check for convergence is\[ \varepsilon =10^{-6}\] The solver terminates when the mesh norm of the residual becomes smaller than this quantity, using the following check

\[ \sqrt{h}\left \Vert r^{\left ( k\right ) }\right \Vert _{2}<\varepsilon \] where \(r^{\left ( k\right ) }\) is the residual at the \(k^{th}\) iteration and \(h\) is the current value of the mesh spacing.

The reason for using zero as the driving force on the RHS of the PDE is to allow the calculation and tracking of the error at each iteration, since the exact solution \(u_{exact}\) for this problem is known to be zero. The error at iteration \(k\) is then found using \begin{align*} e^{\left ( k\right ) } & =\left \Vert u_{exact}-u^{\left ( k\right ) }\right \Vert \\ & =\left \Vert u^{\left ( k\right ) }\right \Vert \end{align*}

where \(u^{\left ( k\right ) }\) represents the approximate solution at the \(k^{th}\) iteration.
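As a small illustration (this is a sketch, not the actual CG.m code), the mesh-scaled norms used for the error and for the stopping test can be computed inside the iteration loop as follows, where u, r, h, and tol stand for the current iterate, current residual, mesh spacing, and tolerance:

% hedged sketch: error and convergence check inside the CG loop (variable names are illustrative)
errNorm = sqrt(h)*norm(u);   % error norm per iteration, since u_exact = 0
resNorm = sqrt(h)*norm(r);   % mesh norm of the residual
if resNorm < tol             % tol = 1e-6 in these runs
    return                   % converged; u is the approximate solution
end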

A Matlab function \(CG.m\) was written to implement the conjugate gradient solver. One of the parameters this function accepts is the name of the preconditioner to use. The following preconditioners are supported\[ \text{NONE,\ SSOR,\ MultiGrid,\ IncompleteCholesky}\] When NONE is specified, no preconditioning is done.

For each preconditioner, the solver was run to find the solution to the above test problem. The initial guess for the solution \(u^{\left ( 0\right ) }\) was generated using Matlab's rand() function.

For each preconditioner the following plots were generated

1.
Plot of the error \(\left \Vert e^{\left ( k\right ) }\right \Vert \) per iteration \(k\), which shows the rate of error reduction per iteration. The plot was generated in both log and linear scales.
2.
Plot of the residual \(\left \Vert r^{\left ( k\right ) }\right \Vert \) per iteration, which shows the rate of residual norm reduction per iteration, also plotted in both log and linear scales. The initial residual is defined as \(r^{\left ( 0\right ) }=f-Au^{\left ( 0\right ) }\) and at each subsequent iteration the residual is defined as \(r^{\left ( k\right ) }=r^{\left ( k-1\right ) }-\alpha Ap^{\left ( k-1\right ) }\). The algorithm below illustrates this in more detail.
3.
The spectrum of the eigenvalues of \(A\) and the spectrum of the eigenvalues of \(M^{-1}A\), plotted using Matlab's scatter() command to better see the effect on the condition number of multiplying \(A\) by \(M^{-1}\), where \(M\) is the preconditioner matrix (a sketch of this plotting step appears after this list).
4.
3D mesh plot of the final solution. The final solution was verified to be close to zero, which is the same as the exact solution.
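The following is a minimal Matlab sketch of how the spectra in item 3 could be produced; it is illustrative only (the actual plotting code is not reproduced here) and assumes the sparse 5-point Laplacian is built with delsq/numgrid and the preconditioner comes from an incomplete Cholesky factor:

% hedged sketch: scatter plot of the spectrum of A and of M^{-1}A (h = 1/16 case)
A = delsq(numgrid('S',17));          % sparse 5-point Laplacian, N = 15^2 = 225 unknowns
L = ichol(A);                        % incomplete Cholesky factor, A ~ L*L' (older Matlab used cholinc)
lamA  = eig(full(A));                % eigenvalues of A
lamMA = real(eig(full((L*L')\A)));   % eigenvalues of M^{-1}A with M = L*L' (real up to roundoff)
figure
scatter(lamA,  zeros(size(lamA)));  hold on
scatter(lamMA, ones(size(lamMA)));
legend('eigenvalues of A','eigenvalues of M^{-1}A')
xlabel('\lambda')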

In addition to the above plots, for each mesh spacing \(h\), the actual result table is printed which tabulates the above values at each iteration. This table was used to generate the above plots. The printed tables also show the ratio of the value of norm of the residual at the current iteration to its value at the previous iteration, similarly for the error norm.

Due to the large size of these tables, the tables for all the spacings and for each solver are available in the appendix.

Conjugate gradient algorithm description

The idea of preconditioning is to use a matrix \(M\) to speed up the convergence of the conjugate gradient method. The original problem\[ Ax=f \] is transformed to a new problem\[ M^{-1}Ax=M^{-1}f \] such that \(M^{-1}A\) has a smaller condition number than \(A\). For most iterative solvers, the rate of convergence increases as the condition number of the system matrix decreases.

The conjugate gradient method works only on symmetric positive definite matrices \(A\), and its speed of convergence is affected by the distribution of the eigenvalues of \(A\). The convergence estimate is more accurate if the distribution of eigenvalues is uniform. For the discrete 2D Poisson problem this is the case, as verified by the plots of the spectrum generated below for each case.

The preconditioning is used to modify the spectrum of \(A\) so that the eigenvalues of the new system matrix \(M^{-1}A\) become more clustered together causing the condition number to become smaller and thus increasing the convergence rate.

The following table was generated to show the effect of preconditioning on lowering the condition number. It shows the condition number for CG with no preconditioning (in other words, for the \(A\) matrix only), and then the condition number of \(M^{-1}A\) for the different preconditioners as the mesh spacing is changed. Next to each condition number, the maximum and minimum eigenvalues are shown in parentheses. Notice that applying \(\frac{\left \vert \max \lambda \right \vert }{\left \vert \min \lambda \right \vert }\) to these eigenvalues does not reproduce the tabulated condition number: that ratio gives the 2-norm condition number, while the tabulated values are 1-norm estimates obtained with the Matlab function condest(), which was used because these are sparse matrices.






solver | \(h=\frac{1}{16},N=225\) | \(h=\frac{1}{32},N=961\) | \(h=\frac{1}{64},N=3969\) | \(h=\frac{1}{128},N=16129\)
NONE | \(150\ (7.923,\ 0.0768)\) | \(603\ (7.9807,\ 0.01926)\) | \(2413\ (7.9995,\ 0.0048)\) | \(9655\ (7.9987,\ 0.001205)\)
SSOR | \(33\ (0.25,\ 0.018203)\) | \(128\ (0.25,\ 0.004748)\) | \(511\ (0.25,\ 0.0012)\) | not computed (memory problem)
IncCholesky \(\varepsilon =10^{-2}\) | \(32\ (2.365,\ 0.2279)\) | \(108\ (2.44,\ 0.0585)\) | \(422\ (2.537,\ 0.0146)\) | not computed (memory problem)
IncCholesky \(\varepsilon =10^{-3}\) | \(48\ (2.337,\ 0.508)\) | \(153\ (2.526,\ 0.1603)\) | \(373\ (2.592,\ 0.041756)\) | not computed (memory problem)

Each entry is the estimated condition number followed by \((\lambda _{\max },\lambda _{\min })\).





The diagram below reflects the above table results, clearly showing the reduction of the condition number as a result of preconditioning.

Figure 2.59: compare condition numbers
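As a rough sketch of how these estimates could be reproduced (this is not the homework's actual script; the matrix is assumed to be built with delsq/numgrid, and ichol stands in for the older cholinc), condest() gives the 1-norm estimate used in the table:

% hedged sketch: condition number estimates before and after preconditioning (h = 1/16)
A  = delsq(numgrid('S',17));                            % sparse 5-point Laplacian, N = 225 unknowns
kA = condest(A);                                        % 1-norm estimate of cond(A)
R  = ichol(A, struct('type','ict','droptol',1e-2));     % incomplete Cholesky, drop tolerance 1e-2
kMA = condest((R*R')\A);                                % estimate of cond(M^{-1}A) with M = R*R'
fprintf('cond(A) ~ %.0f,  cond(M^{-1}A) ~ %.0f\n', kA, kMA)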

CG Algorithm pseudocode

The following is the algorithm used for the implementation of conjugate gradient with preconditioning.\[\begin{array} [c]{l}Input:\ A,f,tol,preconditionSolverName,dropTol\\ Output:\ \tilde{u}\ approximate\ solution\ to\ Ax=0\\ \\ u_{0}=\operatorname{rand}()\ (\ast initial\ guess\ of\ solution\ \ast )\\ r_{0}=f-Au_{0}\ (\ast initial\ residual\ast )\\ z_{0}\leftarrow CALL\ \ preconditionSolver(r_{0},A,preconditionSolverName,dropTol)\\ p_{0}=z_{0}\\ FOR\ k=1,2,3,\cdots \\ \ \ \ \ \ \ \ \ \begin{array} [c]{l}e_{k-1}=\sqrt{h}\left \Vert u_{k-1}\right \Vert _{2}\ \ (\ast \ the\ error\ since\ u_{exact}\ is\ known\ to\ be\ zero\ast )\\ \omega _{k-1}=Ap_{k-1}\\ \alpha _{k-1}=\frac{z_{k-1}^{T}r_{k-1}}{p_{k-1}^{T}\omega _{k-1}}\\ u_{k}=u_{k-1}+\alpha _{k-1\ }p_{k-1}\\ r_{k}=r_{k-1}-\alpha _{k-1}\omega _{k-1}\\ IF\ \left ( \sqrt{h}\left \Vert r_{k}\right \Vert _{2}\right ) <tol\ THEN\\ \ \ \ \ \ RETURN\ u_{k}\\ END\ IF\\ z_{k}\leftarrow CALL\ \ preconditionSolver(r_{k},A,preconditionSolverName,dropTol)\\ \beta _{k-1}=\frac{z_{k}^{T}r_{k}}{z_{k-1}^{T}r_{k-1}}\\ p_{k}=z_{k}+\beta _{k-1}p_{k-1}\end{array} \\ END\ FOR \end{array} \] The algorithm for the function \(preconditionSolver()\) is as follows\[\begin{array} [c]{l}Input:\ r,A,preconditionSolverName,dropTol\\ Output:\ z\ approximate\ solution\ to\ Mz=r\\ CASE\ preconditionSolverName\ IS\\ \ \ \ \ \ \ \ \ \begin{array} [c]{l}WHEN\ NONE\ z\leftarrow r\ \ \ \ \ \ \ \ \ \ //\ no\ preconditioning\ \\ WHEN\ MultiGrid\\ \ \ \ \ \ \ \begin{array} [c]{l}\mu _{1}=\mu _{2}=1\ (\ast \ presmoother\ and\ postsmoother\ast )\\ z\leftarrow CALL\ VCYCLE(zeroInitialGuess,r)\\ \ \ \ //VCYCLE\ is\ one\ implemented\ in\ HW4\ but\ changed\ to\ do\ \\ \ \ \ //one\ forward\ Gauss-Seidel/red-black\ followed\ by\ \\ \ \ \ //one\ reverse\ Gauss-Seidel/red-black\ \end{array} \\ WHEN\ \ SSOR\\ \ \ \ \ \ \ z\leftarrow CALL\ SOR\ forward\ followed\ by\ SOR\ in\ reverse\\ WHEN\ IncompleteCholesky\\\begin{array} [c]{l}R=cholinc(A,dropTol)\\ z\leftarrow R\backslash (R^{T}\backslash r) \end{array} \end{array} \\ END\ CASE\\ RETURN\ z \end{array} \]
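The following is a minimal runnable Matlab sketch of the above algorithm; it is not the actual CG.m written for this homework. The problem matrix is built with delsq/numgrid, the right hand side is zero, and the preconditioning step is a function handle applyM, shown here only for the NONE and incomplete Cholesky cases (ichol is used in place of the older cholinc; the SSOR and multigrid cases would replace applyM accordingly).

% hedged sketch of preconditioned conjugate gradient for the zero-RHS test problem
n   = 16;  h = 1/n;               % mesh spacing h = 1/16
A   = delsq(numgrid('S', n+1));   % sparse 5-point Laplacian, N = (n-1)^2 unknowns
N   = size(A,1);
f   = zeros(N,1);                 % zero forcing, so u_exact = 0
tol = 1e-6;

useIC = true;                     % choose preconditioner for this sketch
if useIC
    R = ichol(A, struct('type','ict','droptol',1e-2));  % A ~ R*R', R lower triangular
    applyM = @(r) R' \ (R \ r);   % solve M z = r with M = R*R'
else
    applyM = @(r) r;              % NONE: no preconditioning
end

u = rand(N,1);                    % random initial guess u_0
r = f - A*u;                      % initial residual r_0
z = applyM(r);
p = z;
for k = 1:10*N
    errHist(k) = sqrt(h)*norm(u); % error norm per iteration, since u_exact = 0
    w     = A*p;
    alpha = (z'*r)/(p'*w);
    u     = u + alpha*p;
    rnew  = r - alpha*w;
    if sqrt(h)*norm(rnew) < tol
        fprintf('converged in %d iterations, final error %g\n', k, sqrt(h)*norm(u));
        break
    end
    znew = applyM(rnew);
    beta = (znew'*rnew)/(z'*r);
    p    = znew + beta*p;
    r    = rnew;
    z    = znew;
end

Note the order of the triangular solves: cholinc returned an upper triangular factor, hence the \(R\backslash (R^{T}\backslash r)\) in the pseudocode, while ichol returns a lower triangular factor, hence the reversed order in this sketch.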

2.6.2 Solver efficiency and iteration count

   2.6.2.1 Work per iteration
   2.6.2.2 Number of iterations

In addition to finding the number of iterations needed for convergence by each solver, the problem also asked to compare the efficiency of each solver. This is done by finding the work needed by each solver to converge.

Work needed is defined as\[ Work=NumberOfIterations\ \times \ WorkPerIteration \] Before determining the work for each solver, the following table lists the \(\boldsymbol{cputime}\) (in seconds) used by each solver for the different spacings. The CPU time is measured using Matlab's cputime function and covers only the call to CG(); it does not include any other calls such as plotting.






preconditioning | \(h=\frac{1}{16},N=225\) | \(h=\frac{1}{32},N=961\) | \(h=\frac{1}{64},N=3969\) | \(h=\frac{1}{128},N=16129\)
NONE | \(0.19\) | \(0.37\) | \(13.48\) | \(556.6\)
Multigrid | \(0.34\) | \(0.6\) | \(13.48\) | \(566\)
SSOR | \(0.23\) | \(0.5\) | \(13.3\) | \(564.6\)
Incomplete Cholesky \(\varepsilon =10^{-2}\) | \(0.16\) | \(0.34\) | \(13.8\) | \(559\)
Incomplete Cholesky \(\varepsilon =10^{-3}\) | \(0.19\) | \(0.5\) | \(13.3\) | \(559\)
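The CPU times above were measured around the solver call only. A minimal sketch of that measurement is shown below; the CG() argument list here is assumed for illustration and is not necessarily the actual signature of the homework's CG.m.

% hedged sketch: timing only the solver call (argument list is illustrative, not the real signature)
t0 = cputime;
[u, iters] = CG(A, f, h, 1e-6, 'SSOR');   % hypothetical call into the homework's solver
t  = cputime - t0;                        % CPU seconds spent inside CG() only
fprintf('iterations = %d,  cpu time = %.2f seconds\n', iters, t)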





Surprisingly, no appreciable difference can be seen between the different solvers in terms of CPU time. It was expected that NONE would have the largest CPU time since it has the lowest efficiency. This result can be attributed to the small values of \(N\) used, which were not large enough to reflect the difference. One would need much larger values of \(N\) to see the effect of preconditioning on the CPU time. Due to memory limitations, this was not possible at this time. Now the work per iteration is analyzed.

2.6.2.1 Work per iteration

All solvers perform similar work per iteration except for the step needed to apply the preconditioning to determine \(z_{k}\). The only difference between using no preconditioning and using one is in the step that solves for \(z\) in \(Mz=r\). Taking the work per iteration of the base CG with no preconditioning as \(O\left ( N\right ) \), the work per iteration for each solver is as follows:

1.
No preconditioner: no extra work is needed, as \(z\) is the same as \(r\), hence \(O\left ( N\right ) \).
2.
Multigrid: the work needed to determine \(z\) adds the cost of one V cycle. The work for one V cycle was found in HW4 to be \(\frac{4}{3}C\times N\), where \(N\) is the number of unknowns and \(C\) is a constant estimated as \(7\left ( \upsilon _{1}+\upsilon _{2}\right ) +13\), with \(\upsilon _{1},\upsilon _{2}\) the numbers of pre-smooth and post-smooth operations, both equal to one here. Since the smoothing is done twice (forward and reverse), the constant becomes \(2\times 7\left ( \upsilon _{1}+\upsilon _{2}\right ) +13\), giving a work per iteration of \(\frac{4}{3}\left ( 2\times 7\left ( \upsilon _{1}+\upsilon _{2}\right ) +13\right ) N\), or about \(55N\) (the arithmetic is worked out after this list). Adding the \(O\left ( N\right ) \) from above, this still results in \(O\left ( N\right ) \).
3.
SSOR: the cost is that of two SOR sweeps. One SOR sweep costs about \(7N\) flops, where \(N\) is the number of unknowns, since it takes about 7 floating point operations to smooth one grid point and there are \(N\) of them. Hence the SSOR work is twice that, or \(14N\). As above, this is still \(O\left ( N\right ) \).
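As a check of the \(55N\) figure quoted in item 2 above, with \(\upsilon _{1}=\upsilon _{2}=1\):\[ \frac{4}{3}\left ( 2\times 7\left ( \upsilon _{1}+\upsilon _{2}\right ) +13\right ) N=\frac{4}{3}\left ( 28+13\right ) N=\frac{4}{3}\times 41N\approx 55N \] so the multigrid preconditioning step adds roughly \(55N\) operations per CG iteration, which is still \(O\left ( N\right ) \).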

2.6.2.2 Number of iterations

From the lecture notes, the error in conjugate gradient (with no preconditioning) behaves as\[ \left \Vert e_{k}\right \Vert _{A}\leq 2\left ( \frac{\sqrt{\Bbbk \left ( A\right ) }-1}{\sqrt{\Bbbk \left ( A\right ) }+1}\right ) ^{k}\left \Vert e_{0}\right \Vert _{A}\] where \(\Bbbk \left ( A\right ) \) is the condition number of \(A\). For this problem \(\Bbbk \left ( A\right ) =O\left ( h^{-2}\right ) \), where \(h\) is the mesh spacing, so \(\sqrt{\Bbbk \left ( A\right ) }=O\left ( h^{-1}\right ) \). Hence, for a fixed tolerance, which is the case here, the number of iterations is \(O\left ( h^{-1}\right ) =O\left ( N^{1/2}\right ) \) where \(N\) is the number of unknowns.
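To make the iteration estimate explicit, requiring the bound above to fall below a fixed tolerance \(\varepsilon \) gives\[ 2\left ( \frac{\sqrt{\Bbbk \left ( A\right ) }-1}{\sqrt{\Bbbk \left ( A\right ) }+1}\right ) ^{k}\leq \varepsilon \quad \Rightarrow \quad k\gtrsim \frac{\ln \left ( 2/\varepsilon \right ) }{-\ln \left ( 1-\frac{2}{\sqrt{\Bbbk \left ( A\right ) }+1}\right ) }\approx \frac{\sqrt{\Bbbk \left ( A\right ) }}{2}\ln \frac{2}{\varepsilon }\] so that \(k=O\left ( \sqrt{\Bbbk \left ( A\right ) }\right ) =O\left ( h^{-1}\right ) =O\left ( N^{1/2}\right ) \) for a fixed tolerance.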

The results of the numerical experiment agree with the above, as shown below, for the case of NONE (conjugate gradient with no preconditioning).

The following table was generated from the results of running the program. In this table, \(N\) is the number of unknowns, \(\Bbbk \left ( M^{-1}A\right ) \) is the condition number of \(M^{-1}A\), and \(\Bbbk \left ( A\right ) \) is the condition number of \(A\). Since sparse matrices are used, the Matlab function condest() was used to estimate the condition numbers. In this table, IC stands for incomplete Cholesky.








preconditioner | iterations, \(h=\frac{1}{16},N=225\) | \(\Bbbk \left ( M^{-1}A\right ) \) | \(\Bbbk \left ( A\right ) \) | iterations, \(h=\frac{1}{32},N=961\) | \(\Bbbk \left ( M^{-1}A\right ) \) | \(\Bbbk \left ( A\right ) \)
NONE | \(42\) | N/A | \(150\) | \(82\) | N/A | \(603\)
Multigrid | \(4\) | N/A | \(150\) | \(4\) | N/A | \(603\)
SSOR | \(18\) | \(33\) | \(150\) | \(30\) | \(128\) | \(603\)
IC \(\varepsilon =10^{-2}\) | \(7\) | \(32\) | \(150\) | \(13\) | \(108\) | \(603\)
IC \(\varepsilon =10^{-3}\) | \(4\) | \(48\) | \(150\) | \(6\) | \(153\) | \(603\)

preconditioner | iterations, \(h=\frac{1}{64},N=3969\) | \(\Bbbk \left ( M^{-1}A\right ) \) | \(\Bbbk \left ( A\right ) \) | iterations, \(h=\frac{1}{128},N=16129\) | \(\Bbbk \left ( M^{-1}A\right ) \) | \(\Bbbk \left ( A\right ) \)
NONE | \(157\) | N/A | \(2413\) | \(291\) | N/A | \(9655\)
Multigrid | \(4\) | N/A | \(2413\) | \(4\) | N/A | \(9655\)
SSOR | \(56\) | \(511\) | \(2413\) | \(103\) | memory problem | \(9655\)
IC \(\varepsilon =10^{-2}\) | \(22\) | \(422\) | \(2413\) | \(39\) | memory problem | \(9655\)
IC \(\varepsilon =10^{-3}\) | \(8\) | \(373\) | \(2413\) | \(14\) | memory problem | \(9655\)







The following is a plot that represents the above results.

Figure 2.60: iterations plot

From the above one can see that multigrid requires \(O\left ( 1\right ) \) iterations: the number of iterations was \(4\) for all cases from \(h=\frac{1}{16}\) to \(h=\frac{1}{128}\). For SSOR, the number of iterations grew sublinearly in \(N\); from the above one can estimate this growth as roughly \(O\left ( N^{1/4}\right ) \). With no preconditioning, the number of iterations grew approximately as \(O\left ( N^{\frac{1}{2}}\right ) \), as predicted by the earlier analytical result.
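As a quick consistency check for the unpreconditioned case: each halving of \(h\) quadruples \(N\), so an \(O\left ( N^{1/2}\right ) \) iteration count should roughly double at each step. From the tables above,\[ \frac{82}{42}\approx 1.95,\qquad \frac{157}{82}\approx 1.91,\qquad \frac{291}{157}\approx 1.85 \] which is close to the expected factor of \(\sqrt{4}=2\).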

2.6.3 Discussion of results and conclusions

The use of preconditioning on \(A\) reduced the number of iterations needed to converge to the same fixed tolerance, compared to no preconditioning. This was due to the reduction of the condition number of the system matrix, as can be seen in the above table. By reducing the ratio of the largest to the smallest eigenvalue, the rate of convergence increased. However, preconditioning also adds extra cost per iteration. That extra work was also of order \(N\), and hence the overall efficiency was governed by the number of iterations for large \(N\).

Therefore, this is the result of work efficiency for the main solvers, using the formula of \[ \text{Work}=\text{NumberOfIterations}\ \times \ \text{WorkPerIteration}\]

1.
NONE:  \(O\left ( N^{1/2}\right ) \times O\left ( N\right ) =O\left ( N^{3/2}\right ) \)
2.
SSOR: \(O\left ( N^{1/4}\right ) \times O\left ( N\right ) =O\left ( N^{5/4}\right ) \)
3.
Multigrid: \(O\left ( 1\right ) \times O\left ( N\right ) =O\left ( N\right ) \)

Incomplete Cholesky was not included, as it was hard to estimate the growth of its iteration count from the curve above, and the count depends on the drop tolerance value.

The above analysis shows that Multigrid is the most efficient, followed by Incomplete Cholesky (whose ranking depends on the drop tolerance), followed by SSOR, and finally by CG with no preconditioning.

In conclusion, using Multigrid as the preconditioner for conjugate gradient seems to be the most effective solver.

2.6.4 Appendix

   2.6.4.1 Result for CG with no preconditioner
   2.6.4.2 Result for CG with Multigrid preconditioner
   2.6.4.3 Result for CG with SSOR preconditioner
   2.6.4.4 Plots for h=1/128
   2.6.4.5 Result for CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-2}\)

2.6.4.1 Result for CG with no preconditioner

Plots for h=1/16

Figure 2.61: solver none plots 16

Plots for h=1/32

Figure 2.62: solver none plots 32

Plots for h=1/64

Figure 2.63: solver none plots 64

Plots for h=1/128

Figure 2.64: solver none plots 128
2.6.4.2 Result for CG with Multigrid preconditioner

Plots for h=1/16

Figure 2.65: solver MG plots 16

Plots for h=1/32

Figure 2.66: solver MG plots 32

Plots for h=1/64

Figure 2.67: solver MG plots 64

Plots for h=1/128

Figure 2.68: solver MG plots 128
2.6.4.3 Result for CG with SSOR preconditioner

Plots for h=1/16

Figure 2.69: solver SSOR plots 16

Plots for h=1/32

Figure 2.70: solver SSOR plots 32

Plots for h=1/64

Figure 2.71: solver SSOR plots 64
2.6.4.4 Plots for h=1/128

Figure 2.72: solver SSOR plots 128
2.6.4.5 Result for CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-2}\)

Plots for h=1/16

Figure 2.73: solver incomplete Cholesky plots 16

Plots for h=1/32

Figure 2.74: solver incomplete Cholesky plots 32

Plots for h=1/64

Figure 2.75: solver incomplete Cholesky plots 64

Plots for h=1/128

Figure 2.76: solver incomplete Cholesky plots 128

2.6.5 Result for CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-3}\)

   2.6.5.1 References
   2.6.5.2 Computation Tables
   2.6.5.3 Table CG with incomplete Cholesky

Plots for h=1/16

Figure 2.77: solver CG with incomplete Cholesky preconditioner 16

Plots for h=1/32

Figure 2.78: solver CG with incomplete Cholesky preconditioner 32

Plots for h=1/64

Figure 2.79: solver CG with incomplete Cholesky preconditioner 64

Plots for h=1/128

Figure 2.80: solver CG with incomplete Cholesky preconditioner 128
2.6.5.1 References

1.
R. J. LeVeque. Finite Difference Methods for Ordinary and Partial Differential Equations: Steady-State and Time-Dependent Problems. SIAM, 2007.

2.6.5.2 Computation Tables

Tables for CG with no preconditioner

Table for CG with no preconditioner, \(h=\frac{1}{16}\)

Tolerance=\(10^{-6}\), method=NONE, Iterations=\(42\), condition number A=\(150.4167\)

Table for CG with no preconditioner, \(h=\frac{1}{32}\)

Tolerance=\(10^{-6}\), method=NONE, Iterations=\(82\), condition number A=\(603\)

Table for CG with no preconditioner, \(h=\frac{1}{64}\)

Tolerance=\(10^{-6}\), method=NONE, Iterations=\(157\), condition number A=\(2413\)

Table for CG with no preconditioner, \(h=\frac{1}{128}\)

Tolerance=\(10^{-6}\), method=NONE, Iterations=\(291\), condition number A=\(9655\)

Tables for CG with Multigrid preconditioner

 
h =      1/16   eps=0.000001    method=MG 
k=4, cond A=150.416930 
               k             |e|           ratio             |r|           ratio 
               1       2.1998630       0.0000000       5.3253083       0.0000000 
               2       0.0956633       0.0434860       0.0363016       0.0068168 
               3       0.0014271       0.0149181       0.0006556       0.0180607 
               4       0.0000002       0.0001388       0.0000002       0.0003796 
 
h =      1/32            eps=0.000001   method=MG 
k=4,  cond A=603.051928 
               k             |e|           ratio             |r|           ratio 
               1       3.1204692       0.0000000       7.1238671       0.0000000 
               2       0.1461191       0.0468260       0.0360066       0.0050544 
               3       0.0031325       0.0214382       0.0006083       0.0168931 
               4       0.0000004       0.0001403       0.0000003       0.0005447 
 
h =      1/64    ,n=65   eps=0.000001   method=MG 
k=4, cond A=2413.598654 
               k             |e|           ratio             |r|           ratio 
               1       4.5764854       0.0000000      10.4620417       0.0000000 
               2       0.2180254       0.0476404       0.0388724       0.0037156 
               3       0.0058026       0.0266142       0.0006767       0.0174076 
               4       0.0000012       0.0002040       0.0000004       0.0005660 
 
h =      1/128   ,n=129  eps=0.000001   method=MG 
k=4, cond A=9655.787254 
               k             |e|           ratio             |r|           ratio 
               1       6.4563811       0.0000000      14.6329466       0.0000000 
               2       0.3174234       0.0491643       0.0482766       0.0032992 
               3       0.0094526       0.0297790       0.0008327       0.0172495 
               4       0.0000034       0.0003597       0.0000004       0.0005216

Tables for CG with SSOR preconditioner

Tables for CG with SSOR preconditioner h=1/16

Tables for CG with SSOR preconditioner h=1/32

Tables for CG with SSOR preconditioner h=1/64

Tables for CG with SSOR preconditioner h=1/128

2.6.5.3 Table CG with incomplete Cholesky

Table CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-2}\), \(h=\frac{1}{16}\)

Table CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-2}\), \(h=\frac{1}{32}\)

Table CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-2}\), \(h=\frac{1}{64}\)

Table CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-2}\), \(h=\frac{1}{128}\)

Tables for CG with incomplete Cholesky preconditioner \(\varepsilon =10^{-3}\)