HW 4

The multigrid V cycle algorithm was implemented in Matlab 2010a. The documented source code is included in the appendix of this problem.

For relaxation, Gauss-Seidel with red-black (GSRB) ordering was used as the default. A relax() function was written to implement this method, in addition, this function can also implement relaxation using these solvers: Jacobi, Gauss-Seidel Lex, and SOR. Selecting the relaxation method is done via an argument option. These diﬀerent methods are implemented in the relax() function for future numerical experimentation. GSRB is the one used for all the solutions below as required by the problem statement and is the default method. GSRB is known to have good smoothing rates and is suitable for parallelism as well.

For mapping from the ﬁne mesh to coarse mesh, the full weighting method is used.

Additional auxiliary functions are written for performing the following: ﬁnding the norm (mesh norm), ﬁnding the residue and validating dimensions of the input.

The following diagram illustrates the call ﬂow chart for a program making a call to the V cycle algorithm, such as the program written to solve problem 2 and 3. It shows the Matlab functions used, and the interface between them. This ﬂow diagram also shows a full multigrid solver (FMG) function, which was implemented on top of the V cycle algorithm, but was not used to generate the results in problem 2 as the problem asked to use V cycle algorithm only. FMG cycle algorithm was implemented for future reference and to study its eﬀect on reducing number of iterations. A note on this is can be found the end of this assignment.

2.5.1.1 Restriction and prolongation operators

The restriction operator \(I_{h}^{2h}\) (ﬁne to coarse mesh) mapping uses full weighting, while the prolongation operator \(I_{2h}^{h}\) (coarse to ﬁne mesh) uses bilinear interpolation.

For illustration, the following diagram shows applying these operators for the 1D case for a mesh of 9 points. The edge points are boundary points and in this problem (Dirichlet homogeneous boundary conditions), these will always be zero.

2.5.1.2 V cycle algorithm

The multigrid V cycle algorithm is recursive in nature. The following is description of the algorithm

2.5.1.3 Multigrid V cycle function (V_cycle.m)

2.5.1.4 Solver using V cycle (solver_Vcycle.m)

This function is an interface to V cycle algorithm to use for solving the 2D Poisson problem. It uses V_cycle.m in a loop until convergence is reached.

2.5.1.5 Relaxation or smoother function (relax.m)

This function implements Gauss-Seidel red-black solver. It is a little longer than needed as it also implements other solvers as was mentioned in the introduction. These are added for future numerical experimentation. The

algorithm is straight forward. If the sum of the row and column index adds to an even value, then the grid point is considered a red grid point, else it is black. The red grid points are smoothed ﬁrst, then the black grid points are smoothed.

2.5.1.6 Find residual function (ﬁnd_residue.m)

This function is called from a number of locations to obtain the residue mesh. The residue is deﬁned as

\[ r_{i,j}=f_{i,j}-\frac{1}{h^{2}}\left ( u_{i-1,j}+u_{i+1,j}+u_{i,j-1}+u_{i,j+1}-4u_{i,j}\right ) \]

2.5.1.7 Find norm function (ﬁnd_norm.m)

This function is called from number of locations to obtain the norm of the mesh or any 2D matrix.

2.5.1.8 Validate boundary conditions (check_all_zero_boundaries.m)

An auxiliary function used by a number of functions to validate that input has consistent boundaries for this problem.

2.5.1.9 Coarse to ﬁne prolongation operator 2D (c2f.m)

This function is the prolongation bilinear interpolation which implements coarse to ﬁne grid mapping on 2D grid.

2.5.1.10 Fine to coarse full weight restriction operator 2D (f2c.m)

This function is the full weight restriction operator which implements the ﬁne grid to coarse grid mapping on 2D grid.

2.5.1.11 Validate u and f have consistent dimensions(validate_dimensions.m)

An auxiliary function used by number of other function to validate that input dimensions are consistent.

2.5.1.12 Validate grid for consistent dimensions (validate_dimensions_1.m)

An auxiliary function used by number of other function to validate that a grid dimensions are consistent.

2.5.1.13 FMG solver (initial_solution_guess_using_FMG.m)

Implements a full multigrid cycle using V cycle algorithm as building block. Used to compare eﬀect on solution only.

2.5.1.14 Restriction operator for 1D (f2c_1D.m)

This function is the full weight restriction operator which implements the ﬁne grid to coarse grid mapping on 1D grid

2.5.1.15 Prolongation operator for 1D (c2f_1D.m)

This function is the prolongation bilinear interpolation which implements coarse to ﬁne grid mapping on 1D grid

2.5.2 Problem 2

The test problem used is \[ \nabla u=0 \] with zero boundary conditions on the unit square. This has a known solution which is zero.

Initial guess is a random solution generated using matlab rand(). Hence \(\left \Vert \mathbf{e}^{\left ( 0\right ) }\right \Vert \) has the same norm as the initial guess.

The V Cycle function written for problem 1 was used to generate the average convergence factor. For each combination of \(\nu _{1},\nu _{2}\), a table was generated which contained the following columns:

The problem asked to generate result for \(\nu \leq 4\). This solution extended this to \(\upsilon \leq 8\) to use the results for future study if needed. The summary table below shows the ﬁnal result for \(k=15\). Each individual table generated for each combination of \(\nu _{1},\nu _{2}\) is listed in the appendix of this problem. The function HW4_problem2() was used to generate these tables and to calculate the work done for each solver combination of \(\nu _{1},\nu _{2}.\)

2.5.2.1 Average convergence factor and work unit estimates

To determine the most eﬃcient solver, the amount of work by each solver that achieves the same convergence is determined. The solver with the least amount of work is deemed the most eﬃcient. The total amount of work for convergence is deﬁned as

\begin{equation} \text{WORK=number\ of\ Iterations\ for\ convergence\ }\times \text{work\ per\ iteration} \tag{1} \end{equation} An iteration is one full V cycle. Each V cycle contains a number of levels. The same number of levels exist on the left side of the V cycle as on the right side of the V cycle. On the left side of the V cycle, work at each level consist of the following items

On the right side of the V cycle, work at each level consist of the following items

At each level, the work is proportional to the size of the grid at that level. For smoothing, it is estimated that 7 ﬂops are needed to obtain an average of each grid point. (5 additions, one multiplication, one division) based on the use of the following formula\[ u_{i,j}=\frac{1}{4}\left ( u_{i-1,j}+u_{i+1,j}+u_{i,j-1}+u_{i,j+1}-h^{2}f\right ) \] Therefore work needed for smoothing is \(7\times \left ( v_{1}+v_{2}\right ) \times N\) where \(N\) is the total number of grid points at that level. (Boundary grid points are not involved in this work, but for simplicity of analysis, the total number of grid points \(N\) is used).

Work needed for ﬁnding the residual is also about \(7N.\) Work needed for mapping to the next grid level is about \(6N.\)

On the right branch of the cycle no residual calculation is required. To simplify the analysis, it is assumed that the same work is performed at each level on both sides of the V cycle.

Therefore, Letting \(N\) be the number of grid points at the most ﬁne level (the number of unknowns), the total work per cycle is found to be\begin{align*} work\ per\ V\ cycle & =\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N+\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) \frac{N}{4}+\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) \frac{N}{16}+\cdots \\ = & \left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N\left [ 1+\frac{1}{4}+\frac{1}{4^{2}}+\cdots +\frac{1}{4^{L-1}}\right ] \end{align*}

Where \(L\) is the number of levels. In the limit as \(L\) becomes very large, this becomes a geometric series whose sum is \(\frac{\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N}{1-r}\) where \(r=\frac{1}{4}\)

Therefore, total amount of work from (1) becomes\[ W=M\times \frac{4}{3}\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N \] Where \(M\) is number of iterations. Using the same tolerance \(\varepsilon \) for all solvers, \(M=\frac{\log \left ( \varepsilon \right ) }{\log \left ( \rho \right ) }\). Using the average convergence rate found as an estimate for the spectral radius \(\rho \), \(W\) can now be found as a function of \(N\)\[ W=\left ( \frac{\log \left ( \varepsilon \right ) }{\log \left ( \rho \right ) }\right ) \left ( \frac{4}{3}\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N\right ) \] For the purpose of comparing the diﬀerent solvers, the value of \(\varepsilon \) used is not important as long as it is the same value in all cases. Hence, for numerical computation, let \(\varepsilon =10^{-6}\) and the above becomes\[ W=\left ( \frac{-6}{\log \left ( \rho \right ) }\right ) \left ( \frac{4}{3}\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N\right ) \] The program written for this problem uses the above equation to calculate the work done for each solver. The result is shown below. This table shows the work done by each solver for convergence based on the same tolerance. As mentioned above, changing the tolerance value will not change the result, as the eﬀect will be to scale all result by the same amount.

2.5.2.2 Result



\(v=\left ( v_{1},v_{2}\right ) \)	\(v_{1}+v_{2}\)	\(CF=\left ( \frac{\left \Vert e^{\left ( k\right ) }\right \Vert _{\infty }}{\left \Vert e^{\left ( 0\right ) }\right \Vert _{\infty }}\right ) ^{\left ( \frac{1}{k}\right ) }\)	work \(\frac{-6}{\log \left ( \rho \right ) }\times \frac{4}{3}\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N\)

\(\left ( 0,1\right ) \)	\(1\)	\(0.374588\)	\(213\ N\)

\(\left ( 0,2\right ) \)	\(2\)	\(0.202747\)	\(177\ N\)

\(\left ( 0,3\right ) \)	\(3\)	\(0.132526\)	\(174.9\ N\)

\(\left ( 0,4\right ) \)	\(4\)	\(0.098857\)	\(182.7\ N\)

\(\left ( 1,0\right ) \)	\(1\)	\(0.323688\)	\(182.4\ N\)

\(\left ( 1,1\right ) \)	\(2\)	\(0.116811\)	\(129.4\ N\)

\(\left ( 1,2\right ) \)	\(3\)	\(0.079578\)	\(138.4\ N\)

\(\left ( 1,3\right ) \)	\(4\)	\(0.060307\)	\(150.5\ N\)

\(\left ( 1,4\right ) \)	\(5\)	\(0.049019\)	\(164.1\ N\)

\(\left ( 2,0\right ) \)	\(2\)	\(0.171973\)	\(158.5\ N\)

\(\left ( 2,1\right ) \)	\(3\)	\(0.079845\)	\(138.6\ N\)

\(\left ( 2,2\right ) \)	\(4\)	\(0.060424\)	\(150.6\ N\)

\(\left ( 2,3\right ) \)	\(5\)	\(0.049068\)	\(164.2\ N\)

\(\left ( 2,4\right ) \)	\(6\)	\(0.041573\)	\(178.4\ N\)

\(\left ( 3,0\right ) \)	\(3\)	\(0.117023\)	\(163.6\ N\)

\(\left ( 3,1\right ) \)	\(4\)	\(0.060444\)	\(150.6\ N\)

\(\left ( 3,2\right ) \)	\(5\)	\(0.049072\)	\(164.2\ N\)

\(\left ( 3,3\right ) \)	\(6\)	\(0.041575\)	\(178.4\ N\)

\(\left ( 3,4\right ) \)	\(7\)	\(0.036128\)	\(192.6\ N\)

\(\left ( 4,0\right ) \)	\(4\)	\(0.088624\)	\(174.6\ N\)

\(\left ( 4,1\right ) \)	\(5\)	\(0.049075\)	\(164.2\ N\)

\(\left ( 4,2\right ) \)	\(6\)	\(0.041575\)	\(178.4\ N\)

\(\left ( 4,3\right ) \)	\(7\)	\(0.036128\)	\(192.6\ N\)

\(\left ( 4,4\right ) \)	\(8\)	\(0.031908\)	\(206.7\ N\)

2.5.2.3 Conclusion

From the above result, The least work was for the \(\left ( 1,1\right ) \) solver, followed by \(\left ( 1,2\right ) \) which had about the same as the \(\left ( 2,1\right ) \) solver. This result shows that using \(v=2\) or \(v=3\) is the most eﬃcient solver in terms of least work required.

Notice that in full multigrid, the combination which makes up the value \(v\) is important (While for the case of the 2 level multigrid, this is not the case). For example, as shown in the above table, work for solver \(v=\left ( 1,2\right ) \) was \(138\ N\) while work for solver \(v=\left ( 3,0\right ) \) was \(\allowbreak \allowbreak 163\ N\) even though they both add to same total number of smooth operations \(v=3\).

2.5.2.4 Appendix. Tables for each combination of \(\nu =\left ( \nu _{1},\upsilon _{2}\right ) \)

The following tables are the result of running problem 2 program on the test problem. The ﬁelds for each table are described above. The last row in each table contain the result for \(k=15\). The value of the average convergence factor used is that for \(k=15\) under the column heading convergence factor. This below is link to the text ﬁle containing the tables as they are printed by the matlab function.

2.5.3 Problem 3

This problem was solved using multigrid V cycle method. The following is the solution found

Tolerance used is \(h^{2}=0.000061\), the number of grid points along one dimension is \(n=129\), the spacing is \(2^{-7}\).

V cycle method converged in \(5\) iterations. The number of pre-smooth is \(1\) and the number of post smooth is \(1\). These are selected since in problem 2 it was found they lead to the most eﬃcient solver.

Hence, the amount of work done \[ W=\left ( \frac{\log \left ( \varepsilon \right ) }{\log \left ( \rho \right ) }\right ) \left ( \frac{3}{4}\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N\right ) \] From the table in problem 2, \(\rho =0.116811\) for the solver \(\left ( 1,1\right ) \), and given that \(N=\left ( n-2\right ) ^{2}=127^{2}=16\,129\), hence the above becomes\begin{align*} W & =\left ( \frac{\log _{10}\left ( 0.000061\right ) }{\log _{10}\left ( 0.116811\right ) }\right ) \left ( \frac{3}{4}\left ( 7\left ( 1+1\right ) +13\right ) 16\,129\right ) \\ & =1.\,476\,2\times 10^{6}\text{ operations} \end{align*}

The above is compared with the solvers used in HW3, for same tolerance as above and same \(h\), from HW3, the results were the following



method	Number of iterations


Jacobi	\(31702\)

Gauss-Seidel	\(15852\)

SOR	\(306\)

To compare work between all methods, it is required to ﬁnd the work per iteration for the Jacobi, GS and SOR.

Work per iteration in these methods required only one smooth operation, and one calculation for the residue. No mapping between diﬀerent grid sizes was needed. Hence, assuming about 13 ﬂops to calculate the averaging and residue per one grid point, work per iteration for the above solver becomes\[ Work\ Per\ iteration=\left ( 6N+7N\right ) =13N \] where \(N\) is the total number of grid points which is \(16129\) in this example.

Therefore, total work can be found for all the methods, including the multigrid solver. The following table summarizes the result


method	Number of iterations	W (ﬂops)

Jacobi	\(31702\)	\(31702\times 13\times 16129=6.647\,2\times 10^{9}\)

Gauss-Seidel	\(15852\)	\(15852\times 13\times 16129=3.323\,8\times 10^{9}\)

SOR	\(306\)	\(306\times 13\times 16129=6.\,416\,1\times 10^{7}\)

Multigrid V Cycle	\(5\)	\(1.\,\allowbreak 476\,2\times 10^{6}\)

Using the multigrid as a base measure, and normalizing other solvers relative to it, the above becomes


method	work

Jacobi	\(\frac{6.\,647\,2\times 10^{9}}{1.476\,2\times 10^{6}}=4502.9\)

Gauss-Seidel	\(\frac{3.\,323\,8\times 10^{9}}{1.\,476\,2\times 10^{6}}=2251.6\)

SOR	\(\frac{6.\,416\,1\times 10^{7}}{1.\,476\,2\times 10^{6}}=43.464\)

Multigrid V Cycle	\(1\)

The above shows clearly that Multigrid is the most eﬃcient solver. SOR required about \(44\) times as much work, GS over \(2251\) more work, and Jacobi about \(4500\) more work.

2.5.4 Note on using Full Multigrid cycle (FMG) to improve convergence

It was found that using FMG to determine a better initial guess solution before initiating the V cycle algorithm resulted in about 40% reduction in the number of iterations needed to convergence by the V cycle algorithm.

The following is a plot of one of the tests performed showing the diﬀerence in number of iterations needed to converge. All other parameters are kept the same. This shows that with FMG cycle, convergence reached in 4 iterations, while without FMG, it was reached in 7 cycles. The cost of the FMG cycle itself was not taken into account. It is estimated that the FMG correction cycle adds about \(\frac{1}{2}\) the cost of one V cycle to the total cost.

The following is a small function written to compare convergence when using FMG and without using FMG which generated the above result. (see code on web page)

2.5.5 References

[1] Thor Gjesdal, Analysis of a new Red-Black ordering for Gauss-Seidel smoothing in cell-centered multigrid. ref no. CMR-93-A200007, November 1993.

[3] Robert Guy, Lecture notes, Math 228a, Fall 2010. Mathematics dept. UC Davis.

[5] P. Wesseling, A survey of Fourier smoothing analysis results. ISNM 98, Multigrid methods II page 105-106.

2.5 HW 4

2.5.1 Problem 1