2.5 HW 4

  2.5.1 Problem 1
  2.5.2 Problem 2
  2.5.3 Problem 3
  2.5.4 Note on using Full Multigrid cycle (FMG) to improve convergence
  2.5.5 References
  2.5.6 Source code listing
PDF (letter size)
PDF (legal size)

2.5.1 Problem 1

   2.5.1.1 Restriction and prolongation operators
   2.5.1.2 V cycle algorithm
   2.5.1.3 Multigrid V cycle function (V_cycle.m)
   2.5.1.4 Solver using V cycle (solver_Vcycle.m)
   2.5.1.5 Relaxation or smoother function (relax.m)
   2.5.1.6 Find residual function (find_residue.m)
   2.5.1.7 Find norm function (find_norm.m)
   2.5.1.8 Validate boundary conditions (check_all_zero_boundaries.m)
   2.5.1.9 Coarse to fine prolongation operator 2D (c2f.m)
   2.5.1.10 Fine to coarse full weight restriction operator 2D (f2c.m)
   2.5.1.11 Validate u and f have consistent dimensions(validate_dimensions.m)
   2.5.1.12 Validate grid for consistent dimensions (validate_dimensions_1.m)
   2.5.1.13 FMG solver (initial_solution_guess_using_FMG.m)
   2.5.1.14 Restriction operator for 1D (f2c_1D.m)
   2.5.1.15 Prolongation operator for 1D (c2f_1D.m)

pict
Figure 2.50:Problem 1

The multigrid V cycle algorithm was implemented in Matlab 2010a. The documented source code is included in the appendix of this problem.  

For relaxation, Gauss-Seidel with red-black (GSRB) ordering was used as the default. A relax() function was written to implement this method, in addition, this function can also implement relaxation using these solvers: Jacobi, Gauss-Seidel Lex, and SOR. Selecting the relaxation method is done via an argument option. These different methods are implemented in the relax() function for future numerical experimentation. GSRB is the one used for all the solutions below as required by the problem statement and is the default method. GSRB is known to have good smoothing rates and is suitable for parallelism as well.

For mapping from the fine mesh to coarse mesh, the full weighting method is used.

For mapping from coarse mesh to fine mesh, bilinear interpolation is used.

Additional auxiliary functions are written for performing the following: finding the norm (mesh norm), finding the residue and validating dimensions of the input.

The following diagram illustrates the call flow chart for a program making a call to the V cycle algorithm, such as the program written to solve problem 2 and 3. It shows the Matlab functions used, and the interface between them. This flow diagram also shows a full multigrid solver (FMG) function, which was implemented on top of the V cycle algorithm, but was not used to generate the results in problem 2 as the problem asked to use V cycle algorithm only. FMG cycle algorithm was implemented for future reference and to study its effect on reducing number of iterations. A note on this is can be found the end of this assignment.

pict
Figure 2.51:algorithm flow chart
2.5.1.1 Restriction and prolongation operators

The restriction operator \(I_{h}^{2h}\) (fine to coarse mesh) mapping uses full weighting, while the prolongation operator \(I_{2h}^{h}\) (coarse to fine mesh) uses bilinear interpolation.

For illustration, the following diagram shows applying these operators for the 1D case for a mesh of 9 points. The edge points are boundary points and in this problem (Dirichlet homogeneous boundary conditions), these will always be zero.

pict
Figure 2.52:operator diagrams
2.5.1.2 V cycle algorithm

The multigrid V cycle algorithm is recursive in nature. The following is description of the algorithm

Problem 2 below also has a diagram which helps understand this algorithm more. The Matlab function shown below implements the above algorithm.
2.5.1.3 Multigrid V cycle function (V_cycle.m)

This function implements one multigrid V cycle. It is recursive function

2.5.1.4 Solver using V cycle (solver_Vcycle.m)

This function is an interface to V cycle algorithm to use for solving the 2D Poisson problem. It uses V_cycle.m in a loop until convergence is reached.

2.5.1.5 Relaxation or smoother function (relax.m)

This function implements Gauss-Seidel red-black solver. It is a little longer than needed as it also implements other solvers as was mentioned in the introduction. These are added for future numerical experimentation. The

algorithm is straight forward. If the sum of the row and column index adds to an even value, then the grid point is considered a red grid point, else it is black. The red grid points are smoothed first, then the black grid points are smoothed.

2.5.1.6 Find residual function (find_residue.m)

This function is called from a number of locations to obtain the residue mesh. The residue is defined as

\[ r_{i,j}=f_{i,j}-\frac{1}{h^{2}}\left ( u_{i-1,j}+u_{i+1,j}+u_{i,j-1}+u_{i,j+1}-4u_{i,j}\right ) \]

2.5.1.7 Find norm function (find_norm.m)

This function is called from number of locations to obtain the norm of the mesh or any 2D matrix.

2.5.1.8 Validate boundary conditions (check_all_zero_boundaries.m)

An auxiliary function used by a number of functions to validate that input has consistent boundaries for this problem.

2.5.1.9 Coarse to fine prolongation operator 2D (c2f.m)

This function is the prolongation bilinear interpolation which implements coarse to fine grid mapping on 2D grid.

2.5.1.10 Fine to coarse full weight restriction operator 2D (f2c.m)

This function is the full weight restriction operator which implements the fine grid to coarse grid mapping on 2D grid.

2.5.1.11 Validate u and f have consistent dimensions(validate_dimensions.m)

An auxiliary function used by number of other function to validate that input dimensions are consistent.

2.5.1.12 Validate grid for consistent dimensions (validate_dimensions_1.m)

An auxiliary function used by number of other function to validate that a grid dimensions are consistent.

2.5.1.13 FMG solver (initial_solution_guess_using_FMG.m)

Implements a full multigrid cycle using V cycle algorithm as building block. Used to compare effect on solution only.

2.5.1.14 Restriction operator for 1D (f2c_1D.m)

This function is the full weight restriction operator which implements the fine grid to coarse grid mapping on 1D grid

2.5.1.15 Prolongation operator for 1D (c2f_1D.m)

This function is the prolongation bilinear interpolation which implements coarse to fine grid mapping on 1D grid

2.5.2 Problem 2

   2.5.2.1 Average convergence factor and work unit estimates
   2.5.2.2 Result
   2.5.2.3 Conclusion
   2.5.2.4 Appendix. Tables for each combination of \(\nu =\left ( \nu _{1},\upsilon _{2}\right ) \)

pict
Figure 2.53:Problem 2

The test problem used is \[ \nabla u=0 \] with zero boundary conditions on the unit square. This has a known solution which is zero.

Initial guess is a random solution generated using matlab rand(). Hence \(\left \Vert \mathbf{e}^{\left ( 0\right ) }\right \Vert \) has the same norm as the initial guess.

The V Cycle function written for problem 1 was used to generate the average convergence factor. For each combination of \(\nu _{1},\nu _{2}\), a table was generated which contained the following columns:

1.
Cycle number.
2.
The norm of the residue \(\left \Vert r^{\left ( k\right ) }\right \Vert =\left \Vert f-Au^{\left ( k\right ) }\right \Vert \) after each cycle.
3.
Ratio of the current residue norm to the previous residue norm \(\frac{\left \Vert r^{\left ( k+1\right ) }\right \Vert }{\left \Vert r^{\left ( k\right ) }\right \Vert }.\)
4.
Error norm \(\left \Vert e^{\left ( k\right ) }\right \Vert =\left \Vert u-u^{\left ( k\right ) }\right \Vert =\left \Vert u^{\left ( k\right ) }\right \Vert \) (since exact solution is zero).
5.
The ratio of the current error norm to the previous error norm \(\frac{\left \Vert e^{\left ( k+1\right ) }\right \Vert }{\left \Vert e^{\left ( k\right ) }\right \Vert }.\)
6.
The average convergence factor up to each cycle \(\left ( \frac{\left \Vert e^{\left ( k\right ) }\right \Vert _{\infty }}{\left \Vert e^{\left ( 0\right ) }\right \Vert _{\infty }}\right ) ^{\left ( \frac{1}{k}\right ) }.\)

The problem asked to generate result for \(\nu \leq 4\). This solution extended this to \(\upsilon \leq 8\) to use the results for future study if needed.  The summary table below shows the final result for \(k=15\). Each individual table generated for each combination of \(\nu _{1},\nu _{2}\) is listed in the appendix of this problem. The function HW4_problem2() was used to generate these tables and to calculate the work done for each solver combination of \(\nu _{1},\nu _{2}.\)

2.5.2.1 Average convergence factor and work unit estimates

To determine the most efficient solver, the amount of work by each solver that achieves the same convergence is determined. The solver with the least amount of work is deemed the most efficient. The total amount of work for convergence is defined as

\begin{equation} \text{WORK=number\ of\ Iterations\ for\ convergence\ }\times \text{work\ per\ iteration} \tag{1} \end{equation} An iteration is one full V cycle. Each V cycle contains a number of levels. The same number of levels exist on the left side of the V cycle as on the right side of the V cycle. On the left side of the V cycle, work at each level consist of the following items

1.
Work to perform \(\nu _{1}\) number of pre-smooth operations.
2.
Work needed to map to the next lower level of the grid.
3.
Work needed to compute the residue.

The following diagram helps to illustrates this.

pict
Figure 2.54:problem 2 v cycle shape

On the right side of the V cycle, work at each level consist of the following items

1.
Work to perform \(\nu _{2}\) number of post-smooth operations.
2.
Work needed to map to the next lower level of the grid.

At each level, the work is proportional to the size of the grid at that level. For smoothing, it is estimated that 7 flops are needed to obtain an average of each grid point. (5 additions, one multiplication, one division) based on the use of the following formula\[ u_{i,j}=\frac{1}{4}\left ( u_{i-1,j}+u_{i+1,j}+u_{i,j-1}+u_{i,j+1}-h^{2}f\right ) \] Therefore work needed for smoothing is \(7\times \left ( v_{1}+v_{2}\right ) \times N\) where \(N\) is the total number of grid points at that level. (Boundary grid points are not involved in this work, but for simplicity of analysis, the total number of grid points \(N\) is used).

Work needed for finding the residual is also about \(7N.\) Work needed for mapping to the next grid level is about \(6N.\)

On the right branch of the cycle no residual calculation is required. To simplify the analysis, it is assumed that the same work is performed at each level on both sides of the V cycle.

Therefore, Letting \(N\) be the number of grid points at the most fine level (the number of unknowns), the total work per cycle is found to be\begin{align*} work\ per\ V\ cycle & =\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N+\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) \frac{N}{4}+\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) \frac{N}{16}+\cdots \\ = & \left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N\left [ 1+\frac{1}{4}+\frac{1}{4^{2}}+\cdots +\frac{1}{4^{L-1}}\right ] \end{align*}

Where \(L\) is the number of levels. In the limit as \(L\) becomes very large, this becomes a geometric series whose sum is \(\frac{\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N}{1-r}\) where \(r=\frac{1}{4}\)

Therefore, total amount of work from (1) becomes\[ W=M\times \frac{4}{3}\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N \] Where \(M\) is number of iterations. Using the same tolerance \(\varepsilon \) for all solvers, \(M=\frac{\log \left ( \varepsilon \right ) }{\log \left ( \rho \right ) }\). Using the average convergence rate found as an estimate for the spectral radius \(\rho \), \(W\) can now be found as a function of \(N\)\[ W=\left ( \frac{\log \left ( \varepsilon \right ) }{\log \left ( \rho \right ) }\right ) \left ( \frac{4}{3}\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N\right ) \] For the purpose of comparing the different solvers, the value of \(\varepsilon \) used is not important as long as it is the same value in all cases. Hence, for numerical computation, let \(\varepsilon =10^{-6}\) and the above becomes\[ W=\left ( \frac{-6}{\log \left ( \rho \right ) }\right ) \left ( \frac{4}{3}\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N\right ) \] The program written for this problem uses the above equation to calculate the work done for each solver. The result is shown below. This table shows the work done by each solver for convergence based on the same tolerance. As mentioned above, changing the tolerance value will not change the result, as the effect will be to scale all result by the same amount.

2.5.2.2 Result









\(v=\left ( v_{1},v_{2}\right ) \)
\(v_{1}+v_{2}\)
\(CF=\left ( \frac{\left \Vert e^{\left ( k\right ) }\right \Vert _{\infty }}{\left \Vert e^{\left ( 0\right ) }\right \Vert _{\infty }}\right ) ^{\left ( \frac{1}{k}\right ) }\)
work \(\frac{-6}{\log \left ( \rho \right ) }\times \frac{4}{3}\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N\)




\(\left ( 0,1\right ) \) \(1\) \(0.374588\) \(213\ N\)




\(\left ( 0,2\right ) \) \(2\) \(0.202747\) \(177\ N\)




\(\left ( 0,3\right ) \) \(3\) \(0.132526\) \(174.9\ N\)




\(\left ( 0,4\right ) \) \(4\) \(0.098857\) \(182.7\ N\)




\(\left ( 1,0\right ) \) \(1\) \(0.323688\) \(182.4\ N\)




\(\left ( 1,1\right ) \) \(2\) \(0.116811\) \(129.4\ N\)




\(\left ( 1,2\right ) \) \(3\) \(0.079578\) \(138.4\ N\)




\(\left ( 1,3\right ) \) \(4\) \(0.060307\) \(150.5\ N\)




\(\left ( 1,4\right ) \) \(5\) \(0.049019\) \(164.1\ N\)




\(\left ( 2,0\right ) \) \(2\) \(0.171973\) \(158.5\ N\)




\(\left ( 2,1\right ) \) \(3\) \(0.079845\) \(138.6\ N\)




\(\left ( 2,2\right ) \) \(4\) \(0.060424\) \(150.6\ N\)




\(\left ( 2,3\right ) \) \(5\) \(0.049068\) \(164.2\ N\)




\(\left ( 2,4\right ) \) \(6\) \(0.041573\) \(178.4\ N\)




\(\left ( 3,0\right ) \) \(3\) \(0.117023\) \(163.6\ N\)




\(\left ( 3,1\right ) \) \(4\) \(0.060444\) \(150.6\ N\)




\(\left ( 3,2\right ) \) \(5\) \(0.049072\) \(164.2\ N\)




\(\left ( 3,3\right ) \) \(6\) \(0.041575\) \(178.4\ N\)




\(\left ( 3,4\right ) \) \(7\) \(0.036128\) \(192.6\ N\)




\(\left ( 4,0\right ) \) \(4\) \(0.088624\) \(174.6\ N\)




\(\left ( 4,1\right ) \) \(5\) \(0.049075\) \(164.2\ N\)




\(\left ( 4,2\right ) \) \(6\) \(0.041575\) \(178.4\ N\)




\(\left ( 4,3\right ) \) \(7\) \(0.036128\) \(192.6\ N\)




\(\left ( 4,4\right ) \) \(8\) \(0.031908\) \(206.7\ N\)




2.5.2.3 Conclusion

From the above result, The least work was for the \(\left ( 1,1\right ) \) solver, followed by \(\left ( 1,2\right ) \) which had about the same as the \(\left ( 2,1\right ) \) solver. This result shows that using \(v=2\) or \(v=3\) is the most efficient solver in terms of least work required.

Notice that in full multigrid, the combination which makes up the value \(v\) is important (While for the case of the 2 level multigrid, this is not the case). For example, as shown in the above table, work for solver \(v=\left ( 1,2\right ) \) was \(138\ N\) while work for solver \(v=\left ( 3,0\right ) \) was \(\allowbreak \allowbreak 163\ N\) even though they both add to same total number of smooth operations \(v=3\).

2.5.2.4 Appendix. Tables for each combination of \(\nu =\left ( \nu _{1},\upsilon _{2}\right ) \)

The following tables are the result of running problem 2 program on the test problem. The fields for each table are described above. The last row in each table contain the result for \(k=15\). The value of the average convergence factor used is that for \(k=15\) under the column heading convergence factor. This below is link to the text file containing the tables as they are printed by the matlab function.

table_result_work.txt

Result for \(\upsilon _{1}=0,\upsilon _{2}=0\)

Result for \(\upsilon _{1}=0,\upsilon _{2}=1\)

Result for \(\upsilon _{1}=0,\upsilon _{2}=2\)

Result for \(\upsilon _{1}=0,\upsilon _{2}=3\)

Result for \(\upsilon _{1}=0,\upsilon _{2}=4\)

Result for \(\upsilon _{1}=1,\upsilon _{2}=0\)

Result for \(\upsilon _{1}=1,\upsilon _{2}=1\)

Result for \(\upsilon _{1}=1,\upsilon _{2}=2\)

Result for \(\upsilon _{1}=1,\upsilon _{2}=3\)

Result for \(\upsilon _{1}=1,\upsilon _{2}=4\)

Result for \(\upsilon _{1}=2,\upsilon _{2}=0\)

Result for \(\upsilon _{1}=2,\upsilon _{2}=1\)

Result for \(\upsilon _{1}=2,\upsilon _{2}=2\)

Result for \(\upsilon _{1}=2,\upsilon _{2}=3\)

Result for \(\upsilon _{1}=2,\upsilon _{2}=4\)

Result for \(\upsilon _{1}=3,\upsilon _{2}=0\)

Result for \(\upsilon _{1}=3,\upsilon _{2}=1\)

Result for \(\upsilon _{1}=3,\upsilon _{2}=2\)

Result for \(\upsilon _{1}=3,\upsilon _{2}=3\)

Result for \(\upsilon _{1}=3,\upsilon _{2}=4\)

Result for \(\upsilon _{1}=4,\upsilon _{2}=0\)

Result for \(\upsilon _{1}=4,\upsilon _{2}=1\)

Result for \(\upsilon _{1}=4,\upsilon _{2}=2\)

Result for \(\upsilon _{1}=4,\upsilon _{2}=3\)

Result for \(\upsilon _{1}=4,\upsilon _{2}=4\)

2.5.3 Problem 3

pict
Figure 2.55:problem 3

This problem was solved using multigrid V cycle method. The following is the solution found

pict
Figure 2.56:problem 3 solutions found earlier

Tolerance used is \(h^{2}=0.000061\), the number of grid points along one dimension is \(n=129\), the spacing is \(2^{-7}\).

V cycle method converged in \(5\) iterations. The number of pre-smooth is \(1\) and the number of post smooth is \(1\). These are selected since in problem 2 it was found they lead to the most efficient solver.

Hence, the amount of work done \[ W=\left ( \frac{\log \left ( \varepsilon \right ) }{\log \left ( \rho \right ) }\right ) \left ( \frac{3}{4}\left ( 7\left ( v_{1}+v_{2}\right ) +13\right ) N\right ) \] From the table in problem 2, \(\rho =0.116811\) for the solver \(\left ( 1,1\right ) \), and given that \(N=\left ( n-2\right ) ^{2}=127^{2}=16\,129\), hence the above becomes\begin{align*} W & =\left ( \frac{\log _{10}\left ( 0.000061\right ) }{\log _{10}\left ( 0.116811\right ) }\right ) \left ( \frac{3}{4}\left ( 7\left ( 1+1\right ) +13\right ) 16\,129\right ) \\ & =1.\,476\,2\times 10^{6}\text{ operations} \end{align*}

The above is compared with the solvers used in HW3, for same tolerance as above and same \(h\), from HW3, the results were the following





method
Number of iterations




Jacobi \(31702\)


Gauss-Seidel \(15852\)


SOR \(306\)


To compare work between all methods, it is required to find the work per iteration for the Jacobi, GS and SOR.

Work per iteration in these methods required only one smooth operation, and one calculation for the residue. No mapping between different grid sizes was needed. Hence, assuming about 13 flops to calculate the averaging and residue per one grid point, work per iteration for the above solver becomes\[ Work\ Per\ iteration=\left ( 6N+7N\right ) =13N \] where \(N\) is the total number of grid points which is \(16129\) in this example.

Therefore, total work can be found for all the methods, including the multigrid solver. The following table summarizes the result




method Number of iterations W (flops)



Jacobi \(31702\) \(31702\times 13\times 16129=6.647\,2\times 10^{9}\)



Gauss-Seidel \(15852\) \(15852\times 13\times 16129=3.323\,8\times 10^{9}\)



SOR \(306\) \(306\times 13\times 16129=6.\,416\,1\times 10^{7}\)



Multigrid V Cycle \(5\) \(1.\,\allowbreak 476\,2\times 10^{6}\)



Using the multigrid as a base measure, and normalizing other solvers relative to it, the above becomes



method work


Jacobi \(\frac{6.\,647\,2\times 10^{9}}{1.476\,2\times 10^{6}}=4502.9\)


Gauss-Seidel \(\frac{3.\,323\,8\times 10^{9}}{1.\,476\,2\times 10^{6}}=2251.6\)


SOR \(\frac{6.\,416\,1\times 10^{7}}{1.\,476\,2\times 10^{6}}=43.464\)


Multigrid V Cycle \(1\)


The above shows clearly that Multigrid is the most efficient solver. SOR required about \(44\) times as much work, GS over \(2251\) more work, and Jacobi about \(4500\) more work.

2.5.4 Note on using Full Multigrid cycle (FMG) to improve convergence

It was found that using FMG to determine a better initial guess solution before initiating the V cycle algorithm resulted in about 40% reduction in the number of iterations needed to convergence by the V cycle algorithm.

The following is a plot of one of the tests performed showing the difference in number of iterations needed to converge. All other parameters are kept the same. This shows that with FMG cycle, convergence reached in 4 iterations, while without FMG, it was reached in 7 cycles. The cost of the FMG cycle itself was not taken into account. It is estimated that the FMG correction cycle adds about \(\frac{1}{2}\) the cost of one V cycle to the total cost.

pict
Figure 2.57:compare with FMG

The following is a small function written to compare convergence when using FMG and without using FMG which generated the above result. (see code on web page)

2.5.5 References

[1] Thor Gjesdal, Analysis of a new Red-Black ordering for Gauss-Seidel smoothing in cell-centered multigrid. ref no. CMR-93-A200007, November 1993.

[2] William L. Briggs, A multigrid tutorial, W-7405-Eng-48.

[3] Robert Guy, Lecture notes, Math 228a, Fall 2010. Mathematics dept. UC Davis.

[4] Jim Demmel, Lecture notes, CS267, 1996. UC Berkeley.

[5] P. Wesseling, A survey of Fourier smoothing analysis results. ISNM 98, Multigrid methods II page 105-106.

2.5.6 Source code listing