To find a numerical value for the integral of a real valued function of a real variable over a specific range of the real line means to evaluate \[ I={\int \limits _{a}^{b}}f(x) \ dx \] Geometrically, this integral represents the area under \(f(x)\) from \(a\) to \(b\).
We can always approximate the area by dividing it into strips of equal width and then summing the areas of all the strips.
In general, there will always be an error in the estimate of the area using this method. The error becomes smaller as we use more strips (which implies a smaller strip width). Hence we can write\[ {\int \limits _{a}^{b}}f(x) \ dx=\left ({\sum \limits _{i=1}^{N}}\Delta x\ f\left (x_{i}\right ) \right ) +E \] Where \(E\) is the error between the actual area and the area approximated by this method of numerical integration, and \(N\) is the number of strips, also referred to as the number of integration points.
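The shrinking of the error with \(N\) can be seen in a short sketch. The function \(f(x)=x^{2}\) on \([0,1]\) is an illustrative choice, not taken from the notes:

```python
# Illustrative sketch: approximate the area under f(x) = x**2 on [0, 1]
# with N equal-width strips, evaluating f at the left edge of each
# strip, and watch the error E shrink as N grows.

def strip_sum(f, a, b, N):
    """Left-endpoint strip sum with N strips of equal width."""
    dx = (b - a) / N          # the common strip width
    return sum(dx * f(a + i * dx) for i in range(N))

f = lambda x: x**2
exact = 1.0 / 3.0             # integral of x^2 over [0, 1]

for N in (10, 100, 1000):
    approx = strip_sum(f, 0.0, 1.0, N)
    print(N, approx, abs(exact - approx))
```

Running this shows the error falling roughly in proportion to the strip width, which is the behavior the text describes.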
Instead of repeatedly referring to the 'width of the strip', we will call this quantity the weight \(w_{i}\) by which we multiply the value of the function to obtain the area. Hence the above becomes\[ {\int \limits _{a}^{b}}f(x) \ dx=\left ({\sum \limits _{i=1}^{N}}w_{i}\ f\left (x_{i}\right ) \right ) +E \] Using implied summation over the index \(i\) the above becomes\[ {\int \limits _{a}^{b}}f(x) \ dx=w_{i}\ f\left (x_{i}\right ) +E \] In the above we divided the range of integration (the distance between the upper and lower limits of integration) into equal intervals. We can decide to evaluate \(f(x_{i})\) at the middle, the start, or the end of each strip. In the diagram above we evaluated \(f(x)\) at the left end of each strip.
Our goal is to evaluate this integral such that the error \(E\) is minimized while using the smallest number of integration points. In a sense, this can be considered a constrained optimization problem: minimize the error of integration using the smallest possible number of points.
To carry out this minimization, we need to consider which variables are involved. There are two degrees of freedom in this problem. The first is the width of the strip, \(w_{i}\). We do not have to use a fixed width; we can use different widths for different strips if the resulting integral gives a better approximation.
The second degree of freedom is the point \(x_{i}\) at which to evaluate \(f(x_{i})\) for each strip. In the example above we chose to evaluate \(f(x_{i})\) at the left end point of the strip. We can select a different \(x_{i}\) if this results in a better approximation.
This is the main idea of Gauss quadrature: to choose specific values for these two degrees of freedom, the \(w_{i}\) and the \(x_{i}\).
It turns out that if the function \(f(x)\) is a polynomial, then there is an optimal solution: an optimal set \(\left \{ w_{i} ,x_{i}\right \} \) for each polynomial of order \(n\).
We can determine these degrees of freedom such that the error \(E\) is zero, using the least possible number of integration points. These two degrees of freedom can be tabulated for polynomials of each specific order. In other words, if the function \(f(x)\) is a polynomial of order \(n\), then we know before the computation starts what these two degrees of freedom should be. We know the locations \(x_{i}\) and the weights \(w_{i}\) by which we multiply \(f(x_{i})\) to obtain the area with minimum error.
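For reference, the standard tabulated Gauss-Legendre points and weights on \([-1,1]\) for the first few values of \(N\) are as follows (these are the well known tabulated values, listed here for illustration rather than taken from the notes above):

```latex
\begin{tabular}{c|l|l}
$N$ & locations $x_i$ & weights $w_i$ \\ \hline
$1$ & $0$ & $2$ \\
$2$ & $\pm 1/\sqrt{3}$ & $1,\ 1$ \\
$3$ & $0,\ \pm\sqrt{3/5}$ & $8/9,\ 5/9,\ 5/9$
\end{tabular}
```

Note that in every row the weights sum to \(2\), the length of the interval \([-1,1]\).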
You might ask how this method of integration can know the locations of the integration points \(x_{i}\) beforehand, without being given the integration range of the function to integrate. It turns out that we will map \(f\left ( x\right ) \) into a new, known, and specific range of integration (from \(-1\) to \(+1\)) for the method that we will now discuss.
From now on we will assume the function \(f(x)\) to be integrated is a polynomial in \(x\) of some order \(n\).
Gauss quadrature is optimal when the function is a polynomial
The main starting point is to represent the function \(f(x)\) as a combination of linearly independent basis functions.
Instead of using strips of equal width, we assume the width can vary from one strip to the next; call the width of the \(i^{th}\) strip \(w_{i}\). Instead of taking the height of the \(i^{th}\) strip to be the value of the function at its left edge, let us also be flexible about the location of the \(x\) associated with strip \(i\), and call the height of that strip \(f(x_{i})\), where \(x_{i}\) is to be determined. Hence the above integration becomes\begin {align*} {\int \limits _{a}^{b}}f(x) \ dx & =\left ({\sum \limits _{i=1}^{N}}w_{i}\ f\left (x_{i}\right ) \right ) +E\\ & \approx {\sum \limits _{i=1}^{N}}w_{i}\ f\left (x_{i}\right ) \end {align*}
So our goal is to determine \(w_{i}\) and \(x_{i}\) such that the error \(E\) in the above equation is minimized. Ideally, we would like to find \(w_{i}\) and \(x_{i}\) such that the error is zero with the smallest value of \(N\).
It seems as if this is a very hard problem. We have \(2N\) unknown quantities to determine: \(N\) different widths, and the \(N\) associated \(x\) points at which to evaluate the height of each strip. Our only inputs are \(f(x)\) and the limits of integration, and we need to determine these \(2N\) quantities such that the error in integration is zero.
In other words, the problem is to find \(w_{i},x_{i}\) such that\begin {equation} I={\int \limits _{a}^{b}}f(x) \ dx=w_{1}f\left (x_{1}\right ) +w_{2}f\left (x_{2}\right ) +\cdots +w_{N}f\left (x_{N}\right ) \tag {1} \end {equation} One way to make progress is to expand \(f(x)\) as a series. We can approximate \(f(x)\) as a convergent power series, for example. If \(f(x)\) happens to be a polynomial, we can instead represent it exactly using a finite sum of Legendre polynomials. It is in this second case where this method makes the most sense to use, due to the advantages of that representation.
We show both methods below.
Expanding \(f(x)\) as a convergent power series over the range \(a,b\) gives\begin {equation} f(x) =a_{0}+a_{1}x+a_{2}x^{2}+\cdots a_{m}x^{m}+\cdots \tag {2} \end {equation} Substituting (2) into (1) gives\begin {equation} I={\int \limits _{a}^{b}}\left (a_{0}+a_{1}x+a_{2}x^{2}+\cdots a_{m}x^{m}+\cdots \right ) \ dx=w_{1}f\left (x_{1}\right ) +w_{2}f\left ( x_{2}\right ) +\cdots w_{N}f\left (x_{N}\right ) \tag {3} \end {equation} But\begin {gather*} {\int \limits _{a}^{b}}\left (a_{0}+a_{1}x+a_{2}x^{2}+\cdots a_{m}x^{m}+\cdots \right ) \ dx={\int \limits _{a}^{b}}a_{0}dx+{\int \limits _{a}^{b}}a_{1}x\ dx+{\int \limits _{a}^{b}}a_{2}x^{2}\ dx+\cdots +{\int \limits _{a}^{b}}a_{m}x^{m}\ dx+\cdots \\ =a_{0}\left (b-a\right ) +a_{1}\frac {\left (b^{2}-a^{2}\right ) }{2}+a_{2}\frac {\left (b^{3}-a^{3}\right ) }{3}+\cdots +a_{m}\frac {\left ( b^{m+1}-a^{m+1}\right ) }{m+1}+\cdots \end {gather*} Substituting the above into (3) results in\begin {gather} a_{0}\left (b-a\right ) +a_{1}\frac {\left (b^{2}-a^{2}\right ) }{2}+a_{2}\frac {\left (b^{3}-a^{3}\right ) }{3}+\cdots +a_{m}\frac {\left ( b^{m+1}-a^{m+1}\right ) }{m+1}+\cdots \nonumber \\ =w_{1}f\left (x_{1}\right ) +w_{2}f\left (x_{2}\right ) +\cdots +w_{N}f\left (x_{N}\right ) \tag {4} \end {gather} But from (2) we see that\begin {align*} f\left (x_{1}\right ) & =a_{0}+a_{1}x_{1}+a_{2}x_{1}^{2}+\cdots a_{m}x_{1}^{m}+\cdots \\ f\left (x_{2}\right ) & =a_{0}+a_{1}x_{2}+a_{2}x_{2}^{2}+\cdots a_{m}x_{2}^{m}+\cdots \\ & \cdots \\ f\left (x_{N}\right ) & =a_{0}+a_{1}x_{N}+a_{2}x_{N}^{2}+\cdots a_{m}x_{N}^{m}+\cdots \end {align*}
Substituting the above into (4) gives\begin {gather*} a_{0}\left (b-a\right ) +a_{1}\frac {\left (b^{2}-a^{2}\right ) }{2}+a_{2}\frac {\left (b^{3}-a^{3}\right ) }{3}+\cdots +a_{m}\frac {\left ( b^{m+1}-a^{m+1}\right ) }{m+1}+\cdots =\\ w_{1}\left (a_{0}+a_{1}x_{1}+a_{2}x_{1}^{2}+\cdots a_{m}x_{1}^{m}+\cdots \right ) \\ +w_{2}\left (a_{0}+a_{1}x_{2}+a_{2}x_{2}^{2}+\cdots a_{m}x_{2}^{m}+\cdots \right ) \\ \cdots \\ +w_{N}\left (a_{0}+a_{1}x_{N}+a_{2}x_{N}^{2}+\cdots a_{m}x_{N}^{m}+\cdots \right ) \end {gather*} Rearranging gives\begin {gather*} a_{0}\left (b-a\right ) +a_{1}\frac {\left (b^{2}-a^{2}\right ) }{2}+a_{2}\frac {\left (b^{3}-a^{3}\right ) }{3}+\cdots +a_{m}\frac {\left ( b^{m+1}-a^{m+1}\right ) }{m+1}+\cdots =\\ a_{0}\left (w_{1}+w_{2}+\cdots +w_{N}\right ) \\ +a_{1}\left (w_{1}x_{1}+w_{2}x_{2}+\cdots +w_{N}x_{N}\right ) \\ +a_{2}\left (w_{1}x_{1}^{2}+w_{2}x_{2}^{2}+\cdots +w_{N}x_{N}^{2}\right ) \\ \cdots \\ +a_{m}\left (w_{1}x_{1}^{m}+w_{2}x_{2}^{m}+\cdots +w_{N}x_{N}^{m}\right ) \end {gather*} Equating coefficients on both sides results in\begin {align} w_{1}+w_{2}+\cdots +w_{N} & =b-a\tag {5}\\ w_{1}x_{1}+w_{2}x_{2}+\cdots +w_{N}x_{N} & =\frac {\left (b^{2}-a^{2}\right ) }{2}\nonumber \\ w_{1}x_{1}^{2}+w_{2}x_{2}^{2}+\cdots +w_{N}x_{N}^{2} & =\frac {\left ( b^{3}-a^{3}\right ) }{3}\nonumber \\ & \cdots \nonumber \\ w_{1}x_{1}^{m}+w_{2}x_{2}^{m}+\cdots +w_{N}x_{N}^{m} & =\frac {\left ( b^{m+1}-a^{m+1}\right ) }{m+1}\nonumber \\ & \cdots \nonumber \end {align}
Since we have \(2N\) unknown quantities to solve for (\(N\) weights \(w_{i}\) and \(N\) points \(x_{i}\)) we need \(2N\) equations. The coefficients \(a_{0}\) through \(a_{m}\) supply \(m+1\) equations, so we need \(m+1=2N\), that is, \(m=2N-1\). The above set of \(2N\) simultaneous equations can now be solved for the unknowns \(w_{i},x_{i}\), and this gives us the numerical integration we wanted.
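As a sanity check of the moment equations (5), the following sketch verifies that for \([a,b]=[-1,1]\) and \(N=2\) the standard tabulated points \(\pm 1/\sqrt{3}\) with weights \(1,1\) satisfy all \(2N=4\) equations. The specific point and weight values are the well known Gauss-Legendre ones, assumed here rather than derived:

```python
# Check: for [a, b] = [-1, 1] and N = 2, the tabulated Gauss points
# x = -1/sqrt(3), +1/sqrt(3) with weights w = 1, 1 satisfy the moment
# equations
#   sum_i w_i * x_i**k = (b**(k+1) - a**(k+1)) / (k + 1)
# for k = 0, 1, 2, 3 (that is, 2N equations for 2N unknowns).
import math

a, b = -1.0, 1.0
xs = (-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0))
ws = (1.0, 1.0)

for k in range(4):
    lhs = sum(w * x**k for w, x in zip(ws, xs))
    rhs = (b**(k + 1) - a**(k + 1)) / (k + 1)
    assert abs(lhs - rhs) < 1e-12
    print(k, lhs, rhs)
```

If any of the four equations failed, the assertion would raise, so a silent run confirms the claimed exactness through degree \(2N-1=3\).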
The above assumed that the function \(f(x)\) can be represented exactly by the power series expansion up to order \(m\).
We now consider the representation of \(f(x)\) as a series of Legendre polynomials. We do this because, when \(f(x)\) itself is a polynomial, we can represent \(f(x)\) exactly by a finite number of Legendre polynomials. But since the Legendre polynomials \(P_{n}(x)\) are defined over \([-1,1]\), we start by transforming \(f(x)\) to this new range, and then we can expand the mapped \(f(x)\) (which we will call \(g\left (\zeta \right ) \)) in terms of the Legendre polynomials.
An easy way to find this mapping is to align the ranges over each other and take the ratio between them as the scale factor. This diagram shows this for the general case where we map \(f(x)\) defined over \([a,b]\) to a new range defined over \([c_{1},c_{2}]\).
We see from the diagram that\[ \zeta =c_{1}+d\zeta \] But \(\frac {dx}{d\zeta }\) is the same ratio as \(\frac {b-a}{c_{2}-c_{1}}\), since lengths in the two ranges scale by this factor. Hence \begin {equation} \frac {dx}{d\zeta }=\frac {b-a}{c_{2}-c_{1}}\tag {6} \end {equation} The above is called the Jacobian of the transformation. Now, from the diagram we see that \[ dx=x-a \] And\[ d\zeta =\zeta -c_{1}\] Eq. (6) becomes\[ \frac {x-a}{\zeta -c_{1}}=\frac {b-a}{c_{2}-c_{1}}\] Hence we obtain that
\[ \zeta =\frac {x-a}{b-a}\left (c_{2}-c_{1}\right ) +c_{1}\] And\[ x=\frac {b-a}{c_{2}-c_{1}}\left (\zeta -c_{1}\right ) +a \] For the specific case when \(c_{1}=-1\) and \(c_{2}=+1\) the above expression becomes\begin {align*} \zeta & =\frac {x-a}{b-a}\left (2\right ) -1\\ & =\frac {2x-2a-\left (b-a\right ) }{b-a}\\ & =\frac {2x-a-b}{b-a} \end {align*}
Therefore\begin {align*} x & =\frac {b-a}{2}\left (\zeta +1\right ) +a\\ & =\frac {\left (b-a\right ) \zeta +\left (a+b\right ) }{2} \end {align*}
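The two affine maps just derived can be sketched directly. The range \([4,10]\) used here is the same illustrative range the notes use for the plots below:

```python
# Sketch of the affine maps derived above, for c1 = -1, c2 = +1:
#   zeta(x) = (2x - a - b) / (b - a)
#   x(zeta) = ((b - a) * zeta + (a + b)) / 2

def to_zeta(x, a, b):
    """Map x in [a, b] to zeta in [-1, 1]."""
    return (2.0 * x - a - b) / (b - a)

def to_x(zeta, a, b):
    """Map zeta in [-1, 1] back to x in [a, b]."""
    return ((b - a) * zeta + (a + b)) / 2.0

# The endpoints map as expected: a -> -1 and b -> +1.
a, b = 4.0, 10.0
print(to_zeta(a, a, b), to_zeta(b, a, b))   # -1.0 1.0
print(to_x(-1.0, a, b), to_x(1.0, a, b))    # 4.0 10.0
```

Each map is the inverse of the other, so a round trip through \(\zeta\) and back recovers the original \(x\).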
Before we proceed further, it is interesting to see the effect of this transformation on the shape of some functions. Below are plots of some functions under this transformation. The left plots show the original functions over some range, in this case \([4,10]\), and the right plots show the new shape (the function \(g\left (\zeta \right ) \)) over the new range \([-1,1]\).
We must remember that in the following analysis we are now integrating the function \(g\left (\zeta \right ) \) over \([-1,1]\) and not \(f(x)\) over \([a,b]\). Hence, to obtain the required integral, we need to transform the final result back. We will show how to do this below.
We can approximate any function \(f(x)\) as a series expansion in terms of weighted sums of a set of basis functions. We do this, for example, when we use a Fourier series expansion. Hence we write
\begin {equation} f(x) ={\displaystyle \sum \limits _{i}^{\infty }} a_{i}\Phi _{i}(x) \tag {7} \end {equation} We can give an intuitive justification of the above formulation by thinking in terms of vectors. A vector \(\mathbf {V}\) in an \(n\)-dimensional space is written in terms of its components as follows\begin {align*} \mathbf {V} & =a_{1}\mathbf {e}_{1}+a_{2}\mathbf {e}_{2}+\cdots a_{N}\mathbf {e}_{N}\\ & ={\sum \limits _{i}^{N}}a_{i}\mathbf {e}_{i} \end {align*}
Where the \(\mathbf {e}_{i}\) are the basis vectors of that space.
If we now consider a general infinite dimensional vector space in which each point is a function, then we can also represent that function as a weighted series of basis functions, just as we did for an ordinary vector.
There are many sets of basis functions we can choose to represent the function \(f(x)\). The main requirements for the basis functions are that they are complete (meaning they span the whole space) and that an inner product is defined on them.
For our purposes here, we are interested in the class of functions \(f\left ( x\right ) \) that are polynomials in \(x\). The basis we will use is the Legendre basis, as explained above. To do this, we transform \(f\left ( x\right ) \) to \(g\left (\zeta \right ) \) as shown above, and the integral becomes\[ {\int \limits _{a}^{b}}f(x) dx={\int \limits _{-1}^{1}}f\left ( x\left (\zeta \right ) \right ) \left (\frac {\left (b-a\right ) }{2}d\zeta \right ) \] This is because we found above that \(dx=\frac {\left (b-a\right ) }{2}d\zeta \). If we call \(f\left (x\left (\zeta \right ) \right ) \) as \(g\left ( \zeta \right ) \) the integral becomes\[ {\int \limits _{a}^{b}}f(x) dx={\int \limits _{-1}^{1}}\frac {\left ( b-a\right ) }{2}g\left (\zeta \right ) \ d\zeta \] Since \(\frac {\left (b-a\right ) }{2}\) is the Jacobian of the transformation, we write the integral as\[ {\int \limits _{a}^{b}}f(x) dx\rightarrow {\int \limits _{-1}^{1}}J\ g\left (\zeta \right ) \ d\zeta \] Since the Jacobian is constant in this case, we will only work with \({\int \limits _{-1}^{1}}\ g\left (\zeta \right ) \ d\zeta \) and remember to scale the result at the end by \(J\). This is the integral we will now numerically integrate. Equation (7) is now written as\[ g\left (\zeta \right ) ={\sum \limits _{i}^{\infty }}a_{i}P_{i}\left ( \zeta \right ) \] Where \(P_{i}\left (\zeta \right ) \) is the Legendre polynomial of order \(i\), and \(g\left (\zeta \right ) \) is a polynomial in \(\zeta \) of some order \(m\).
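The Jacobian scaling can be checked numerically. This sketch uses an illustrative polynomial \(f(x)=x^{2}\) on \([0,2]\) and a crude midpoint sum standing in for each integral, neither of which comes from the notes:

```python
# Check numerically that
#   integral_a^b f(x) dx  =  J * integral_{-1}^{1} g(zeta) dzeta,
# with g(zeta) = f(x(zeta)) and Jacobian J = (b - a) / 2.
# A crude midpoint sum stands in for each integral.

def midpoint(func, lo, hi, N=100_000):
    """Composite midpoint rule with N equal subintervals."""
    h = (hi - lo) / N
    return sum(h * func(lo + (i + 0.5) * h) for i in range(N))

a, b = 0.0, 2.0
f = lambda x: x**2
J = (b - a) / 2.0
g = lambda z: f((b - a) / 2.0 * (z + 1.0) + a)   # f(x(zeta))

lhs = midpoint(f, a, b)
rhs = J * midpoint(g, -1.0, 1.0)
print(lhs, rhs)        # both close to 8/3
```

The two numbers agree to near machine precision, because the midpoint samples of \(J\,g(\zeta)\) over \([-1,1]\) are exactly the samples of \(f(x)\) over \([a,b]\) under the affine map.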
Now we carry out the same analysis we did when we expressed \(f(x)\) as a power series. The difference now is that the limits of integration are symmetric and the basis functions are now the Legendre polynomials instead of the monomials \(x^{n}\), as was the case in the power series expansion. So now equation (1) becomes\begin {equation} I={\int \limits _{-1}^{1}}\ g\left (\zeta \right ) \ d\zeta =w_{1}g\left ( \zeta _{1}\right ) +w_{2}g\left (\zeta _{2}\right ) +\cdots +w_{N}g\left ( \zeta _{N}\right ) \tag {8} \end {equation} And equation (2) becomes\begin {equation} g\left (\zeta \right ) =a_{0}P_{0}\left (\zeta \right ) +a_{1}P_{1}\left ( \zeta \right ) +a_{2}P_{2}\left (\zeta \right ) +\cdots a_{m}P_{m}\left ( \zeta \right ) +\cdots \tag {9} \end {equation} Substituting (9) into (8) we get the equivalent of equation (3)\begin {align} I & ={\int \limits _{-1}^{1}}\left (a_{0}P_{0}\left (\zeta \right ) +a_{1}P_{1}\left (\zeta \right ) +a_{2}P_{2}\left (\zeta \right ) +\cdots a_{m}P_{m}\left (\zeta \right ) +\cdots \right ) \ d\zeta \tag {10}\\ & =w_{1}g\left (\zeta _{1}\right ) +w_{2}g\left (\zeta _{2}\right ) +\cdots +w_{N}g\left (\zeta _{N}\right ) \nonumber \end {align}
Therefore\begin {align*} {\int \limits _{-1}^{1}}\left (a_{0}P_{0}+a_{1}P_{1}+a_{2}P_{2}+\cdots a_{m}P_{m}+\cdots \right ) \ d\zeta & ={\int \limits _{-1}^{1}}a_{0}P_{0}d\zeta +{\int \limits _{-1}^{1}}a_{1}P_{1}d\zeta +{\int \limits _{-1}^{1}}a_{2}P_{2}\ d\zeta +\cdots +{\int \limits _{-1}^{1}}a_{m}P_{m}d\zeta \ +\cdots \\ & =a_{0}\left (2\right ) +a_{1}\left (0\right ) +a_{2}\left (0\right ) +\cdots +a_{m}\left (0\right ) +\cdots \\ & =2a_{0} \end {align*}
The above occurs since the integral of any Legendre polynomial of order greater than zero over \([-1,1]\) is zero, by orthogonality with \(P_{0}=1\).
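This orthogonality fact is easy to confirm with numpy's Legendre utilities:

```python
# Numerically confirm that the integral of P_n over [-1, 1] is 2 for
# n = 0 and 0 for every n >= 1 (orthogonality with P_0 = 1).
from numpy.polynomial import Legendre

for n in range(6):
    Pn = Legendre.basis(n)                          # the Legendre polynomial P_n
    integral = Pn.integ()(1.0) - Pn.integ()(-1.0)   # definite integral over [-1, 1]
    print(n, round(float(integral), 12))
```

The loop prints \(2\) for \(n=0\) and \(0\) (to rounding) for every higher order, exactly as the text claims.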
Substituting the above into (10) we obtain\begin {equation} I=2a_{0}=w_{1}g\left (\zeta _{1}\right ) +w_{2}g\left (\zeta _{2}\right ) +\cdots +w_{N}g\left (\zeta _{N}\right ) \tag {11} \end {equation} But from (9) we see that\begin {align*} g\left (\zeta _{1}\right ) & =a_{0}P_{0}\left (\zeta _{1}\right ) +a_{1}P_{1}\left (\zeta _{1}\right ) +a_{2}P_{2}\left (\zeta _{1}\right ) +\cdots a_{m}P_{m}\left (\zeta _{1}\right ) +\cdots \\ g\left (\zeta _{2}\right ) & =a_{0}P_{0}\left (\zeta _{2}\right ) +a_{1}P_{1}\left (\zeta _{2}\right ) +a_{2}P_{2}\left (\zeta _{2}\right ) +\cdots a_{m}P_{m}\left (\zeta _{2}\right ) +\cdots \\ & \cdots \\ g\left (\zeta _{N}\right ) & =a_{0}P_{0}\left (\zeta _{N}\right ) +a_{1}P_{1}\left (\zeta _{N}\right ) +a_{2}P_{2}\left (\zeta _{N}\right ) +\cdots a_{m}P_{m}\left (\zeta _{N}\right ) +\cdots \end {align*}
Substituting the above in (11) gives\begin {align*} 2a_{0} & =w_{1}\left (a_{0}P_{0}\left (\zeta _{1}\right ) +a_{1}P_{1}\left ( \zeta _{1}\right ) +a_{2}P_{2}\left (\zeta _{1}\right ) +\cdots a_{m}P_{m}\left (\zeta _{1}\right ) +\cdots \right ) +\\ & +w_{2}\left (a_{0}P_{0}\left (\zeta _{2}\right ) +a_{1}P_{1}\left ( \zeta _{2}\right ) +a_{2}P_{2}\left (\zeta _{2}\right ) +\cdots a_{m}P_{m}\left (\zeta _{2}\right ) +\cdots \right ) +\\ & \cdots \\ & +w_{N}\left (a_{0}P_{0}\left (\zeta _{N}\right ) +a_{1}P_{1}\left ( \zeta _{N}\right ) +a_{2}P_{2}\left (\zeta _{N}\right ) +\cdots a_{m}P_{m}\left (\zeta _{N}\right ) +\cdots \right ) \end {align*}
Rearranging results in\begin {align*} 2a_{0} & =a_{0}\left (w_{1}P_{0}\left (\zeta _{1}\right ) +w_{2}P_{0}\left ( \zeta _{2}\right ) +\cdots +w_{N}P_{0}\left (\zeta _{N}\right ) \right ) \\ & +a_{1}\left (w_{1}P_{1}\left (\zeta _{1}\right ) +w_{2}P_{1}\left ( \zeta _{2}\right ) +\cdots +w_{N}P_{1}\left (\zeta _{N}\right ) \right ) \\ & +\cdots \\ & +a_{m}\left (w_{1}P_{m}\left (\zeta _{1}\right ) +w_{2}P_{m}\left ( \zeta _{2}\right ) +\cdots +w_{N}P_{m}\left (\zeta _{N}\right ) \right ) \end {align*}
Equating coefficients gives\begin {align*} 2 & =w_{1}P_{0}\left (\zeta _{1}\right ) +w_{2}P_{0}\left (\zeta _{2}\right ) +\cdots +w_{N}P_{0}\left (\zeta _{N}\right ) \\ 0 & =w_{1}P_{1}\left (\zeta _{1}\right ) +w_{2}P_{1}\left (\zeta _{2}\right ) +\cdots +w_{N}P_{1}\left (\zeta _{N}\right ) \\ 0 & =w_{1}P_{2}\left (\zeta _{1}\right ) +w_{2}P_{2}\left (\zeta _{2}\right ) +\cdots +w_{N}P_{2}\left (\zeta _{N}\right ) \\ & \cdots \\ 0 & =w_{1}P_{m}\left (\zeta _{1}\right ) +w_{2}P_{m}\left (\zeta _{2}\right ) +\cdots +w_{N}P_{m}\left (\zeta _{N}\right ) \end {align*}
If we select the points \(\zeta _{i}\) to be the \(N\) roots of the Legendre polynomial \(P_{N}\), then the equation involving \(P_{N}\left (\zeta _{i}\right ) \) is satisfied identically, since \(P_{N}\left (\zeta _{i}\right ) =0\) for each \(i\). The first \(N\) equations (for \(P_{0}\) through \(P_{N-1}\)) then form a linear system that determines the \(N\) weights \(w_{i}\); in particular, since \(P_{0}\left (\zeta \right ) =1\), the first equation requires \(w_{1}+w_{2}+\cdots +w_{N}=2\). One can show that the remaining equations, up to \(P_{2N-1}\), are then satisfied automatically, which is why an \(N\)-point rule integrates polynomials up to order \(2N-1\) exactly. These \(\zeta _{i}\) and \(w_{i}\) are precisely the tabulated Gauss quadrature points and weights.
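The complete recipe can be sketched end to end. numpy's `leggauss` supplies the roots of \(P_{N}\) and the associated weights; the test polynomial and range here are illustrative choices, not from the notes:

```python
# Sketch of the complete recipe: integrate f over [a, b] by mapping to
# [-1, 1], sampling at the roots of P_N (numpy's leggauss supplies the
# points and weights), and scaling by the Jacobian J = (b - a) / 2.
import numpy as np

def gauss_quad(f, a, b, N):
    zeta, w = np.polynomial.legendre.leggauss(N)   # roots of P_N and weights
    x = (b - a) / 2.0 * (zeta + 1.0) + a           # map zeta back to x
    J = (b - a) / 2.0                              # constant Jacobian
    return J * np.sum(w * f(x))

# With N = 3 the rule is exact for polynomials up to order 2N - 1 = 5.
f = lambda x: x**5 - 2.0 * x**2 + 1.0
exact = (2.0**6 / 6.0) - 2.0 * (2.0**3 / 3.0) + 2.0   # integral over [0, 2]
print(gauss_quad(f, 0.0, 2.0, 3), exact)
```

Only three function evaluations reproduce the exact integral of a fifth-order polynomial, which is the payoff of choosing the \(\zeta_{i}\) and \(w_{i}\) as described above.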