Mathematics 501
Advanced Numerical Analysis and Computing
CSU Fullerton, Fullerton
Spring, 2007



1 Introduction
2 my typed lecture notes
3 study notes
4 HWs

Chapter 1
Introduction

1.1 Text book
1.2 course description
1.3 Syllabus

I took this course during spring 2007 at California State University, Fullerton. This was a required course for my M.Sc. in Applied Mathematics. The instructor was

Dr Charles. H. Lee (Hung V. Ly), 
Associate Professor 
B.Sc., M.Sc., Ph.D., University of California, Irvine. 
 
Department of Mathematics 
California State University 
Fullerton, CA 92834

Professor Lee’s web site is http://newton.fullerton.edu/hunglee/

1.1 Text book


1.1.1 My review of the book

The book contains lots of technical information. Almost anything related to numerical analysis is contained in this book.

However, this book has a very dry style, and it was hard for me to learn from. It contains a small number of worked-out examples. The style of presentation is too formal, and hence this text is best used as a reference rather than something a student can learn from.

I found many other books which were easier for me to read and learn from.

Again, this is a good book, but as a reference.

This was a really hard course for me. Too many theorems and proofs to learn. Our instructor Dr Lee was very good and knew the material very well, but for me this was still a hard course because it contained more real math than I am used to from the engineering courses I took before (but then again, this is an applied math course, so what did I expect?)

1.2 course description

1.3 Syllabus

PDF

Chapter 2
my typed lecture notes

2.1 Introduction
2.2 Lecture monday April 17 2007
2.3 Second lecture
2.4 Mathematics Colloquium notes

2.1 Introduction

Each student had to typeset some lectures; here are mine.

2.2 Lecture monday April 17 2007

2.2.1 Section 5.4

2.2.1.1 Theorem 1: Singular value decomposition

Given any arbitrary matrix \(A_{m\times n}\) it can be factored into 3 matrices as follows \(A_{m\times n}=P_{m\times m}D_{m\times n}Q_{n\times n}\) where \(P\) is a unitary matrix (\(P^{H}=P^{-1}\) or \(P^{H}P=I\)), \(Q\) is also a unitary matrix, and \(D\) is an \(m\times n\) matrix which is zero everywhere except for the singular values of \(A\) on its diagonal.

These are the steps to do SVD

  1. Find the rank of \(A\), say \(r\)
  2. Let \(B=A_{n\times m}^{H}A_{m\times n}\), hence \(B_{n\times n}\) is a square matrix, and it is semi positive definite, hence its eigenvalues will all be \(\geq 0\). Find the eigenvalues of \(B\), call these \(\sigma _{i}^{2}\). There will be \(n\) such eigenvalues since \(B\) is of order \(n\times n\). But only \(r\) of these will be positive, and \(n-r\) will be zero. Arrange these eigenvalues such that the first \(r\) non-zero eigenvalues come first, followed by the zero eigenvalues: \(\overset {n\ eigenvalues}{\overbrace {\sigma _{1}^{2},\sigma _{2}^{2},\cdots ,\sigma _{r}^{2},0,0,\cdots ,0}}\)
  3. Initialize matrix \(D_{m\times n}\) to be all zeros. Take the first \(r\) eigenvalues from above (non-zero ones), and take the square root of each, hence we get \(\overset {r\ \text {singular values}}{\overbrace {\sigma _{1},\sigma _{2},\cdots ,\sigma _{r}}}\), and write these down the diagonal of \(D\) starting at \(D\left ( 1,1\right ) \), i.e. \(D\left ( 1,1\right ) =\sigma _{1},D\left ( 2,2\right ) =\sigma _{2},\cdots ,D\left ( r,r\right ) =\sigma _{r}\). Notice that the matrix \(D\) need not be a square matrix. Hence we can obtain an arrangement such as the following for \(r=2\): \(\begin {pmatrix} \sigma _{1} & 0 & 0 & 0\\ 0 & \sigma _{2} & 0 & 0\\ 0 & 0 & 0 & 0 \end {pmatrix} \), where the matrix \(A\) was \(3\times 4\) for example.
  4. Now find each eigenvalue of \(A^{H}A\). For each eigenvalue, find the corresponding eigenvector. Call these eigenvectors \(\vec {u}_{1},\vec {u}_{2},\cdots ,\vec {u}_{n}\).
  5. Normalize the eigenvectors found in the above step. Now \(\vec {u}_{1},\vec {u}_{2},\cdots ,\vec {u}_{n}\) will be an orthonormal set of vectors. Take the Hermitian of each eigenvector, \(\vec {u}_{1}^{H},\vec {u}_{2}^{H},\cdots ,\vec {u}_{n}^{H}\), and make each of these vectors (now in row format instead of column format) a row of the matrix \(Q\), i.e. the first row of \(Q\) will be \(\vec {u}_{1}^{H}\), the second row of \(Q\) will be \(\vec {u}_{2}^{H}\), etc. \(\left ( A^{H}A\right ) _{n\times n}\overset {\text {find eigenvectors and normalize}}{\rightarrow }\left \{ \vec {u}_{1},\vec {u}_{2},\cdots ,\vec {u}_{r},\overset {\text {ortho basis (n-r) for N(A)}}{\overbrace {\vec {u}_{r+1},\cdots ,\vec {u}_{n}}}\right \} \rightarrow Q_{n\times n}=\begin {bmatrix} \vec {u}_{1}^{T}\\ \vec {u}_{2}^{T}\\ \vdots \\ \vec {u}_{r}^{T}\\ \vec {u}_{r+1}^{T}\\ \vdots \\ \vec {u}_{n}^{T}\end {bmatrix} \)
  6. Now we need to find a set of \(m\) orthonormal vectors, these will be the columns of the matrix \(P\). There are 2 ways to do this. First the textbook way, and then another way which I think is simpler.

    1. The textbook method is as follows: find a set of \(r\) orthonormal eigenvectors \(\vec {v}_{i}=\frac {1}{\sigma _{i}}A\vec {u}_{i}\), for \(i=1\cdots r\). Notice that here we only use the first \(r\) vectors found in step 5. Take each one of these \(\vec {v}_{i}\) vectors and make them into columns of \(P\). But since we need \(m\) columns in \(P\) not just \(r\), we need to come up with \(m-r\) more basis vectors such that all the \(m\) vectors form an orthonormal set of basis vectors for the column space of \(A\), i.e. \(\mathbb {C} ^{m}\). If doing this by hand, it is easy to find these \(m-r\) vectors by inspection. In a program, we could use the same process we used with Gram-Schmidt, where we learned how to find a new vector which is orthonormal to an existing set of other vectors.
    2. An easier approach is: find \(AA^{H}\) (do not confuse this with how we did it in step 4, where we used \(A^{H}A\)). This will be an \(m\times m\) matrix. Now find each eigenvalue. For each eigenvalue, find the corresponding eigenvector. Now normalize these vectors. These will be orthonormal eigenvectors \(\vec {v}_{1},\vec {v}_{2},\cdots ,\vec {v}_{m}\). As in the above step, these vectors become the columns of the matrix \(P\). The difference in this approach is that we do not need to use Gram-Schmidt to find the \(m-r\) extra vectors, since we obtain all \(m\) eigenvectors right away.   \(\left ( AA^{H}\right ) _{m\times m}\overset {\text {find eigenvectors and normalize}}{\rightarrow }\left \{ \overset {\text {r ortho basis for range A}}{\overbrace {\vec {v}_{1},\vec {v}_{2},\cdots ,\vec {v}_{r}}},\vec {v}_{r+1},\cdots ,\vec {v}_{m}\right \} \rightarrow P_{m\times m}=\begin {bmatrix} \vec {v}_{1} & \vec {v}_{2} & \cdots & \vec {v}_{r} & \vec {v}_{r+1} & \cdots & \vec {v}_{m}\end {bmatrix} \)
  7. This completes the factorization, now we have \(A=PDQ\)

In Matlab, to find the SVD, use the command \([P,D,V]=svd(A)\). Note that Matlab returns \(V\) such that \(A=PDV^{H}\), so in the notation above \(Q=V^{H}\).

A diagram illustrating the SVD process described above was shown in the lecture.
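
To make the steps concrete, here is a minimal Matlab sketch (my own, not from the lecture) that builds \(P,D,Q\) for a small example matrix by following the steps above, and then compares the result with Matlab's built-in svd. The example matrix \(A\) is an arbitrary choice.

% build the SVD A=PDQ by following the steps above (a sketch, not from the lecture)
A = [1 1 0; 0 1 1];               % small example matrix, m=2, n=3
[m,n] = size(A);
r = rank(A);                      % step 1

[U,S]  = eig(A'*A);               % steps 2,4: eigen-decomposition of A^H A
[s,ix] = sort(diag(S),'descend'); % put the non-zero eigenvalues first
U      = U(:,ix);
sigma  = sqrt(s(1:r));            % step 3: the r singular values

D = zeros(m,n);
D(1:r,1:r) = diag(sigma);         % step 3: singular values down the diagonal
Q = U';                           % step 5: rows of Q are the u_i^H

P = zeros(m,m);
for i = 1:r
    P(:,i) = A*U(:,i)/sigma(i);   % step 6(a): v_i = A u_i / sigma_i
end                               % (here r=m, so no extra columns are needed)

disp(norm(A - P*D*Q))             % should be essentially zero
[P2,D2,V] = svd(A);               % built-in: A = P2*D2*V'
disp(norm(A - P2*D2*V'))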

Remarks:

  1. \(\sigma _{1},\sigma _{2},\cdots ,\sigma _{n}\) are called the singular values of \(A\).
  2. the SVD factorization is not unique as \(\sigma _{1},\sigma _{2},\cdots ,\sigma _{r}\) can be ordered differently. Also the choice of \(m-r\) orthonormal basis for the \(P\) matrix is arbitrary.
2.2.1.2 Pseudo Inverse

Given an arbitrary matrix \(A\), to find its pseudo inverse, called \(A^{+}\), we start by first finding the SVD of \(A\), then write

\begin{equation} A^{+}=Q^{H}D^{+}P^{H} \tag {1}\end{equation}

Where, since \(D\) is diagonal \(\begin {pmatrix} \sigma _{1} & 0 & 0\\ 0 & \sigma _{2} & 0\\ 0 & 0 & \ddots \end {pmatrix} \), then \(D^{+}=\) \(\begin {pmatrix} \frac {1}{\sigma _{1}} & 0 & 0\\ 0 & \frac {1}{\sigma _{2}} & 0\\ 0 & 0 & \ddots \end {pmatrix} \), and so finding the pseudo inverse of a matrix is straightforward if we have the SVD factorization of the matrix.  To show equation (1): Since

\[ A=PDQ \]

Pre-multiplying both sides by \(P^{H}\) we obtain

\[ P^{H}A=P^{H}PDQ \]

But \(P^{H}=P^{-1}\) since \(P\) is a unitary matrix, then the above becomes

\[ P^{H}A=DQ \]

Now post-multiply both sides by \(Q^{H}\)

\[ P^{H}AQ^{H}=DQQ^{H}\]

But \(Q^{H}=Q^{-1}\) since \(Q\) is a unitary matrix, then the above becomes

\[ P^{H}AQ^{H}=D \]

Hence

\[ D^{+}=\left ( P^{H}AQ^{H}\right ) ^{+}\]

To find an expression for \(A^{+}\), start from \(Ax=b\) (this derivation assumes \(A^{H}A\) is invertible)

\begin{align*} Ax & =b\\ A^{H}Ax & =A^{H}b\\ x & =\left ( A^{H}A\right ) ^{-1}A^{H}b \end{align*}

Hence

\begin{align*} A^{+} & =\left ( A^{H}A\right ) ^{-1}A^{H}\\ & =\left ( \left ( PDQ\right ) ^{H}\left ( PDQ\right ) \right ) ^{-1}\left ( PDQ\right ) ^{H}\\ & =\left ( \left ( Q^{H}D^{H}P^{H}\right ) \left ( PDQ\right ) \right ) ^{-1}\left ( Q^{H}D^{H}P^{H}\right ) \\ & =\left ( \left ( PDQ\right ) ^{-1}\left ( Q^{H}D^{H}P^{H}\right ) ^{-1}\right ) \left ( Q^{H}D^{H}P^{H}\right ) \\ & =\left ( \left ( Q^{-1}D^{-1}P^{-1}\right ) \left ( P^{-H}D^{-H}Q^{-H}\right ) \right ) \left ( Q^{H}D^{H}P^{H}\right ) \\ & =\left ( Q^{-1}D^{-1}P^{-1}P^{-H}D^{-H}Q^{-H}\right ) \left ( Q^{H}D^{H}P^{H}\right ) \\ & =\left ( Q^{-1}D^{-1}D^{-H}Q^{-H}\right ) \left ( Q^{H}D^{H}P^{H}\right ) \\ & =Q^{-1}D^{-1}D^{-H}Q^{-H}Q^{H}D^{H}P^{H}\\ & =Q^{-1}D^{-1}D^{-H}D^{H}P^{H}\\ & =Q^{-1}D^{-1}P^{H}\\ & =Q^{H}D^{-1}P^{H}\end{align*}

But \(D^{-1}=D^{+}\) hence the above becomes

\[ A^{+}=Q^{H}D^{+}P^{H}\]

which is equation (1).

Note: \(A^{+}A=Q^{H}D^{+}DQ=Q^{H}I^{+}Q\), where \(I^{+}\) has the form \(\begin {pmatrix} 1 & 0 & 0 & 0 & 0\\ 0 & \ddots & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 \end {pmatrix} \). In other words, \(I^{+}\) does not have \(1\) on all the diagonal elements as the normal \(I\) would, so \(A^{+}A\) is in general not the identity.

2.2.1.3 Theorem 2: Minimal solution to A x=b and the pseudo inverse

if \(\hat {x}=A^{+}\vec {b}\) then \(\hat {x}\) is the minimal solution of \(A\vec {x}=\vec {b}\)

proof:

let \(c=P^{H}b\), \(y=Qx\)

\begin{align*} \rho & =\inf _{x}\left \Vert Ax-b\right \Vert _{2}\\ & =\inf _{x}\left \Vert PDQx-b\right \Vert _{2}\\ & =\inf _{x}\left \Vert P^{H}PDQx-P^{H}b\right \Vert _{2}\\ & =\inf _{x}\left \Vert DQx-P^{H}b\right \Vert _{2}\\ & =\inf _{x}\left \Vert Dy-c\right \Vert _{2}\end{align*}

Since \(D=\begin {pmatrix} \sigma _{1} & & & \\ & \sigma _{2} & & \\ & & \sigma _{r} & \\ & & & 0 \end {pmatrix} \) then

\[ \left \Vert Dy-c\right \Vert _{2}^{2}=\sum _{i=1}^{r}\left ( \sigma _{i}y_{i}-c_{i}\right ) ^{2}+\sum _{i=r+1}^{n}c_{i}^{2}\]

This is minimum when \(y_{i}=\frac {c_{i}}{\sigma _{i}}\), where \(i=1\cdots r\), since this will make the first term in the RHS above go to zero. Hence

\[ \rho =\sqrt {\sum _{i=r+1}^{n}c_{i}^{2}}\]

The vector \(\vec {y}\) which gives this minimum solution has \(y_{r+1},y_{r+2},\cdots ,y_{n}=0\), hence

\[ y=D^{+}c \]

But since \(y=Qx\), hence \(D^{+}c=Qx\) or \(x=Q^{H}D^{+}c\), but \(c=P^{H}b\), then

\begin{align*} x & =Q^{H}D^{+}P^{H}b\\ & =A^{+}b \end{align*}

Hence

\[ A^{+}=Q^{H}D^{+}P^{H}\]
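
As a quick numerical illustration of this theorem (my own check, with an arbitrarily chosen consistent underdetermined system): pinv(A)*b should give a solution of \(A\vec {x}=\vec {b}\) with a smaller 2-norm than any other solution, for example the basic solution returned by backslash.

% minimal-norm solution check (my own example): pinv(A)*b vs a basic solution
A  = [1 2 3; 4 5 6];      % full row rank, underdetermined, so Ax=b has many solutions
b  = [1; 2];
x1 = pinv(A)*b;           % x = A^+ b, the minimal-norm solution
x2 = A\b;                 % a basic solution from backslash, also satisfies Ax=b
fprintf('residuals     : %g  %g\n', norm(A*x1-b), norm(A*x2-b));
fprintf('solution norms: %g  %g\n', norm(x1), norm(x2));   % norm(x1) <= norm(x2)
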
2.2.1.4 Theorem 3: Penrose properties for any matrix \(A\)

For any \(m\times n\) matrix \(A\) there is at least one matrix \(X\) with the following 4 properties

  1. \(AXA=A\)
  2. \(XAX=X\)
  3. \(\left ( AX\right ) ^{H}=AX\)
  4. \(\left ( XA\right ) ^{H}=XA\)

We need to show that the matrix \(X\) is unique (with respect to \(A\)). The proof is by contradiction. We assume that \(A\) has associated with it 2 matrices with the above properties, \(X\) and \(Y\) and we will show that \(X=Y\), hence \(X\) is unique for \(A.\)

The proof starts by saying \(X=XAX\) and through a number of substitution steps using the above 4 properties we obtain \(Y\) as follows

\begin{align*} X & =X\overset {AYA}{\overbrace {A}}X\\ & =XAYAX\\ & =X\overset {AYA}{\overbrace {A}}Y\overset {AYA}{\overbrace {A}}X\\ & =XAYAYAYAX\\ & =\left ( XA\right ) ^{H}\left ( YA\right ) ^{H}Y\left ( AY\right ) ^{H}\left ( AX\right ) ^{H}\ \ \ \ \text {using property 3,4}\\ & =A^{H}X^{H}A^{H}Y^{H}YY^{H}A^{H}X^{H}A^{H}\\ & =\left ( AXA\right ) ^{H}Y^{H}YY^{H}\left ( AXA\right ) ^{H}\\ & =A^{H}Y^{H}YY^{H}A^{H}\text { \ \ \ \ \ \ \ property 1}\\ & =\left ( YA\right ) ^{H}Y\left ( AY\right ) ^{H}\\ & =YAYAY\text { \ \ \ property 4 and 3}\\ & =YAY\text { \ \ property 2}\\ & =Y\text { \ \ property 2}\end{align*}

Hence \(X=Y\), so \(X\) is unique.

2.2.1.5 Theorem 4: \(A^{+}\) satisfies Penrose properties hence unique

Theorem: The pseudo inverse of a matrix satisfies the four Penrose properties; hence each matrix has a unique pseudo inverse \(A^{+}\)

proof: Let \(A\) be any matrix, its SVD is \(A=PDQ\), and we showed before that \(A^{+}=Q^{H}D^{+}P^{H}\)

note: \(DD^{+}D=D\)

Let us now show that each property  of Penrose is satisfied. In this case \(X\) is our \(A^{+}\). Hence for property 1 we need to show that \(AA^{+}A=A.\)

\begin{align*} AA^{+}A & =PDQ\overset {A^{+}}{\overbrace {Q^{H}D^{+}P^{H}}}PDQ\\ & =PD\overset {}{D^{+}P^{H}}PDQ\\ & =PDD^{+}DQ\\ & =PDI^{+}Q\\ & =PDQ\\ & =A \end{align*}

To show property 2, we need to show that \(A^{+}AA^{+}=A^{+}\)

\begin{align*} A^{+}AA^{+} & =\left ( Q^{H}D^{+}P^{H}\right ) A\left ( Q^{H}D^{+}P^{H}\right ) \\ & =\left ( Q^{H}D^{+}P^{H}\right ) PDQ\left ( Q^{H}D^{+}P^{H}\right ) \\ & =\left ( Q^{H}D^{+}\right ) D\left ( D^{+}P^{H}\right ) \\ & =Q^{H}D^{+}DD^{+}P^{H}\\ & =Q^{H}I^{+}D^{+}P^{H}\\ & =Q^{H}D^{+}P^{H}\\ & =A^{+}\end{align*}

To show property 3, need to show \(\left ( AA^{+}\right ) ^{H}=AA^{+}\)

\begin{align*} \left ( AA^{+}\right ) ^{H} & =\left ( \left ( PDQ\right ) \left ( Q^{H}D^{+}P^{H}\right ) \right ) ^{H}\\ & =\left ( Q^{H}D^{+}P^{H}\right ) ^{H}\left ( PDQ\right ) ^{H}\\ & =P\left ( D^{+}\right ) ^{H}Q\left ( Q^{H}D^{H}P^{H}\right ) \\ & =P\left ( D^{+}\right ) ^{H}QQ^{H}D^{H}P^{H}\\ & =P\left ( D^{+}\right ) ^{H}D^{H}P^{H}\\ & =P\left ( DD^{+}\right ) ^{H}P^{H}\\ & =PDD^{+}P^{H}\\ & =PDQQ^{H}D^{+}P^{H}\\ & =\left ( PDQ\right ) \left ( Q^{H}D^{+}P^{H}\right ) \\ & =AA^{+}\end{align*}

For property (4) need to show \(\left ( A^{+}A\right ) ^{H}=A^{+}A\)

\begin{align*} \left ( A^{+}A\right ) ^{H} & =\left ( \left ( Q^{H}D^{+}P^{H}\right ) \left ( PDQ\right ) \right ) ^{H}\\ & =\left ( PDQ\right ) ^{H}\left ( Q^{H}D^{+}P^{H}\right ) ^{H}\\ & =Q^{H}D^{H}P^{H}P\left ( D^{+}\right ) ^{H}Q\\ & =Q^{H}D^{H}\left ( D^{+}\right ) ^{H}Q\\ & =Q^{H}\left ( D^{+}D\right ) ^{H}Q\\ & =Q^{H}D^{+}DQ\\ & =Q^{H}D^{+}P^{H}PDQ\\ & =A^{+}A \end{align*}

Hence the pseudo inverse is unique.

2.2.1.6 Theorem 5: On SVD properties

Let \(A\) have svd \(A=PDQ\ \), then the following is true

  1. rank of \(A\) is \(r\)
  2. \(\left \{ \vec {u}_{r+1},\vec {u}_{r+2},\cdots ,\vec {u}_{n}\right \} \) is an orthonormal base for null space of \(A\)
  3. \(\left \{ \vec {v}_{1},\vec {v}_{2},\cdots ,\vec {v}_{r}\right \} \) is an orthonormal base for range of of \(A\)
  4. \(\left \Vert A\right \Vert _{2}=\max _{1\leq i\leq n}\left \vert \sigma _{i}\right \vert \)

proof:

  1. \(A=PDQ\), and \(P,Q\) are invertible matrices (by construction they are unitary, since their columns and rows are orthonormal eigenvectors). Therefore, the rank of \(A\) is decided by the rank of \(D\). But \(D\) is all zeros except for the values \(\sigma _{i}\), each on a separate column and separate row. Since there are \(r\) non-zero values of those, \(D\) has \(r\) linearly independent columns. Hence the rank of \(A\) is \(r.\)
  2. Since \(A\vec {u}_{i}=\vec {0}\) for each vector \(i=\left ( r+1\right ) \cdots n\), and these \(\vec {u}_{i}\) vectors form an orthogonal set of size \(n-r\), and since the dimension of the null-space(A) is \(n-r\) hence they span the whole null-space(A). Then they can be used as a basis for the null space of \(A\) (recall, the null-space of \(A\) is the space in which \(A\vec {x}=\vec {0}\))
  3. Since \(\vec {v}_{i}=\frac {A\vec {u}_{i}}{\sigma _{i}}\neq \vec {0}\), for \(i=1\cdots r\), and these \(\vec {v}_{i}\) vectors form an orthonormal set, and since the dimension of the Range(\(A\)) is \(r\), they span the whole Range(\(A\)). Then they can be used as a basis for the Range(\(A\)). Another proof of this is to use the construction of the \(\vec {v}_{i}\) vectors from \(AA^{H}\). But we did not use this method, so I will leave this out for now.
  4. \begin{align*} \left \Vert A\right \Vert _{2} & =\sup _{\left \Vert x\right \Vert _{2}=1}\left \{ \left \Vert A\vec {x}\right \Vert _{2}\right \} \\ & =\sup _{\left \Vert x\right \Vert _{2}=1}\left \Vert PDQ\vec {x}\right \Vert _{2}\\ & =\sup _{\left \Vert y\right \Vert _{2}=1}\left \Vert D\vec {y}\right \Vert _{2}\\ & =\sup _{\left \Vert y\right \Vert _{2}=1}\sqrt {\sum _{i=1}^{r}\left ( \sigma _{i}y_{i}\right ) ^{2}}\\ & =\max _{1\leq i\leq r}\ \left \vert \sigma _{i}\right \vert \end{align*}
2.2.1.7 Theorem 6: Reduced (or economical) SVD

Recall in the above standard SVD that when we build the \(P\) matrix, we used the first \(r\) vectors \(\vec {u}_{i}\) to generate \(\vec {v}_{i}\) from, by doing \(\vec {v}_{i}=\frac {A\vec {u}_{i}}{\sigma _{i}}\), then we used these \(\vec {v}_{i}\) as the columns of \(P\), but since \(P\) was of size \(m\times m\) we had to come up with \(m-r\) more orthogonal vectors to fill in the \(P\) matrix. In the economical SVD, we stop when we fill in \(r\) vectors in \(P\). Hence \(P\) will be of size \(m\times r\). Similarly for \(D\) we stop at the \(r\)-th singular value. Hence in this case \(D\) will be a diagonal matrix \(r\times r\). And for the \(Q\) matrix, we also generate only \(r\) orthonormal basis vectors \(\vec {u}_{i}\).

Hence the economical \(SVD\) follows the same steps as the normal SVD, except we stop when we obtain \(r\) basis vectors. The matrices will therefore be smaller, hence the name economical. The factorization will then be \(A_{m\times n}=P_{m\times r}D_{r\times r}Q_{r\times n}\) (for the case when \(m\geq n\geq r\))

I am not sure in what application one would need to generate the full SVD vs. the economical SVD. But in Matlab, there is an option to select either factorization: type svd(A,'econ').

2.2.1.8 Theorem 7: On orthonormal Bases

Let \(L\) be a linear transformation from \(\mathbb {C} ^{m}\) to \(\mathbb {C} ^{n}\). Then there are orthonormal bases \(\vec {u}_{1},\vec {u}_{2},\cdots ,\vec {u}_{m}\) for \(\mathbb {C} ^{m}\) and \(\vec {v}_{1},\vec {v}_{2},\cdots ,\vec {v}_{n}\) for \(\mathbb {C} ^{n}\) s.t. \(L\vec {u}_{i}=\left \{ \begin {array} [c]{ccc}\sigma _{i}\vec {v}_{i} & & 1\leq i\leq \min \left ( m,n\right ) \\ 0 & & \min \left ( m,n\right ) <i\leq m \end {array} \right . \)

proof: The proof as given in the textbook for this theorem is clear enough as outlined. Please see page 295.

2.2.2 Homework Solution for section 5.4

2.2.2.1 Problem section 5.4, 2(a)

question: Find the minimal solution for \(x_{1}+x_{2}=b\)

answer:

First write the problem as

\[\begin {bmatrix} 1 & 1 \end {bmatrix}\begin {bmatrix} x_{1}\\ x_{2}\end {bmatrix} =\left [ b\right ] \]

Minimal solution is \(\vec {x}=A^{+}b\), so we need to find \(A^{+}\). Find \(A=PDQ\), then \(A^{+}=Q^{H}D^{+}P^{H}\)

First find the set of \(\vec {u}_{i}\) vectors to go to the \(Q\) matrix. I will use the economical SVD method.

\begin{align*} A^{H}A & =\begin {bmatrix} 1\\ 1 \end {bmatrix}\begin {bmatrix} 1 & 1 \end {bmatrix} \\ & =\begin {bmatrix} 1 & 1\\ 1 & 1 \end {bmatrix} \end{align*}

Hence \(r=1\)

Hence \(\left \vert A^{H}A-\lambda I\right \vert =\begin {vmatrix} 1-\lambda & 1\\ 1 & 1-\lambda \end {vmatrix} =0\rightarrow \left ( 1-\lambda \right ) ^{2}-1=0\rightarrow 1+\lambda ^{2}-2\lambda -1=0\rightarrow \lambda \left ( \lambda -2\right ) =0\)

Hence \(\lambda _{1}=2,\lambda _{2}=0\rightarrow \sigma _{1}=\sqrt {2},\sigma _{2}=0\)

Find eigenvectors \(\vec {u}_{1},\vec {u}_{2}\).

For \(\lambda _{1}=2\rightarrow \begin {bmatrix} 1-2 & 1\\ 1 & 1-2 \end {bmatrix}\begin {bmatrix} y_{1}\\ y_{2}\end {bmatrix} =\begin {bmatrix} 0\\ 0 \end {bmatrix} \rightarrow \begin {bmatrix} -1 & 1\\ 1 & -1 \end {bmatrix}\begin {bmatrix} y_{1}\\ y_{2}\end {bmatrix} =\begin {bmatrix} 0\\ 0 \end {bmatrix} \)

\(\rightarrow \begin {bmatrix} -1 & 1\\ 0 & 0 \end {bmatrix}\begin {bmatrix} y_{1}\\ y_{2}\end {bmatrix} =\begin {bmatrix} 0\\ 0 \end {bmatrix} \Rightarrow -y_{1}+y_{2}=0\Rightarrow \vec {u}_{1}=\begin {bmatrix} 1\\ 1 \end {bmatrix} \overset {\text {normalize\ norm 2}}{\rightarrow }\begin {bmatrix} \frac {1}{\sqrt {2}}\\ \frac {1}{\sqrt {2}}\end {bmatrix} \)

For \(\lambda _{2}=0\rightarrow \begin {bmatrix} 1-0 & 1\\ 1 & 1-0 \end {bmatrix}\begin {bmatrix} y_{1}\\ y_{2}\end {bmatrix} =\begin {bmatrix} 0\\ 0 \end {bmatrix} \rightarrow \begin {bmatrix} 1 & 1\\ 1 & 1 \end {bmatrix}\begin {bmatrix} y_{1}\\ y_{2}\end {bmatrix} =\begin {bmatrix} 0\\ 0 \end {bmatrix} \)

\(\rightarrow \begin {bmatrix} 1 & 1\\ 0 & 0 \end {bmatrix}\begin {bmatrix} y_{1}\\ y_{2}\end {bmatrix} =\begin {bmatrix} 0\\ 0 \end {bmatrix} \Rightarrow y_{1}+y_{2}=0\Rightarrow \vec {u}_{2}=\begin {bmatrix} -1\\ 1 \end {bmatrix} \overset {\text {normalize\ norm 2}}{\rightarrow }\begin {bmatrix} \frac {-1}{\sqrt {2}}\\ \frac {1}{\sqrt {2}}\end {bmatrix} \)

Hence \(Q=\begin {bmatrix} \vec {u}_{1}^{T}\\ \vec {u}_{2}^{T}\end {bmatrix} =\begin {bmatrix} 1 & 1\\ -1 & 1 \end {bmatrix} \frac {1}{\sqrt {2}}\)

Now to find the \(P\) matrix. \(AA^{H}=\begin {bmatrix} 1 & 1 \end {bmatrix}\begin {bmatrix} 1\\ 1 \end {bmatrix} =\left [ 2\right ] \)

The eigenvalue is \(2-\lambda =0\rightarrow \lambda =2\). Hence the eigenvector equation is \(\left ( 2-2\right ) y_{1}=0\rightarrow y_{1}=\) anything \(\rightarrow \left [ 1\right ] \)

Hence the \(P\) matrix is \(\left [ 1\right ] \)

The \(D\) matrix is \(m\times n\), hence \(1\times 2\), then \(D=\begin {bmatrix} \sigma _{1} & 0 \end {bmatrix} \)

Hence this completes the SVD. We have that

\begin{align*}\begin {bmatrix} 1 & 1 \end {bmatrix} & =\left [ 1\right ] \begin {bmatrix} \sigma _{1} & 0 \end {bmatrix}\begin {bmatrix} 1 & 1\\ -1 & 1 \end {bmatrix} \frac {1}{\sqrt {2}}\\ & =\left [ 1\right ] \begin {bmatrix} \sqrt {2} & 0 \end {bmatrix}\begin {bmatrix} 1 & 1\\ -1 & 1 \end {bmatrix} \frac {1}{\sqrt {2}}\\ & =\begin {bmatrix} \sqrt {2} & 0 \end {bmatrix}\begin {bmatrix} 1 & 1\\ -1 & 1 \end {bmatrix} \frac {1}{\sqrt {2}}\\ & =\begin {bmatrix} \sqrt {2} & \sqrt {2}\end {bmatrix} \frac {1}{\sqrt {2}}\\ & =\begin {bmatrix} 1 & 1 \end {bmatrix} \end{align*}

So the SVD is verified. Now find

\begin{align*} A^{+} & =Q^{H}D^{+}P^{H}\\ & =\begin {bmatrix} 1 & 1\\ -1 & 1 \end {bmatrix} ^{H}\frac {1}{\sqrt {2}}\begin {bmatrix} \sqrt {2} & 0 \end {bmatrix} ^{+}\left [ 1\right ] ^{H}\\ & =\frac {1}{\sqrt {2}}\begin {bmatrix} 1 & -1\\ 1 & 1 \end {bmatrix}\begin {bmatrix} \frac {1}{\sqrt {2}}\\ 0 \end {bmatrix} \left [ 1\right ] \end{align*}

Notice that \(D^{+}\) means we take the conjugate transpose of \(D\) and then take the reciprocal of each non-zero entry. Hence if \(D\) is \(m\times n\) then \(D^{+}\) is \(n\times m\)

\begin{align*} A^{+} & =\frac {1}{\sqrt {2}}\begin {bmatrix} \frac {1}{\sqrt {2}}\\ \frac {1}{\sqrt {2}}\end {bmatrix} \\ & =\begin {bmatrix} \frac {1}{2}\\ \frac {1}{2}\end {bmatrix} \end{align*}

Hence

\begin{align*} \hat {x} & =A^{+}b\\ & =\begin {bmatrix} \frac {1}{2}\\ \frac {1}{2}\end {bmatrix} b \end{align*}

So the minimal solution is \(x_{1}=\frac {b}{2},x_{2}=\frac {b}{2}\)
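
A quick numerical check of this answer (my own, using Matlab's pinv):

% verify A^+ = [1/2; 1/2] for A = [1 1]
A = [1 1];
disp(pinv(A))        % prints [0.5; 0.5], matching the hand computation above
b = 3;               % an arbitrary right hand side, just for illustration
disp(pinv(A)*b)      % minimal solution [b/2; b/2] = [1.5; 1.5]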

2.2.2.2 Problem 5.4, 10

Problem: Prove the following properties of \(A^{+}\)

a)\(A^{++}=A\)

b)\(A^{+H}=A^{H+}\)

answer:

a)\(\left ( A^{+}\right ) ^{+}=\left ( \left ( PDQ\right ) ^{+}\right ) ^{+}=\left ( Q^{H}D^{+}P^{H}\right ) ^{+}=\left ( P^{H}\right ) ^{H}\left ( D^{+}\right ) ^{+}\left ( Q^{H}\right ) ^{H}\)

But \(\left ( P^{H}\right ) ^{H}=P\) and \(\left ( Q^{H}\right ) ^{H}=Q\), and the reciprocal of a reciprocal gives us back the original value, hence \(\left ( D^{+}\right ) ^{+}=D\)

Hence we have \(\left ( P^{H}\right ) ^{H}\left ( D^{+}\right ) ^{+}\left ( Q^{H}\right ) ^{H}=PDQ=A\)

b)

\begin{align} \left ( A^{+}\right ) ^{H} & =\left ( Q^{H}D^{+}P^{H}\right ) ^{H}\nonumber \\ & =\left ( P^{H}\right ) ^{H}\left ( D^{+}\right ) ^{H}\left ( Q^{H}\right ) ^{H}\nonumber \\ & =P\left ( D^{+}\right ) ^{H}Q \tag {1}\end{align}

and

\begin{align} A^{H+} & =\left ( \left ( PDQ\right ) ^{H}\right ) ^{+}\nonumber \\ & =\left ( Q^{H}D^{H}P^{H}\right ) ^{+}\nonumber \\ & =\left ( P^{H}\right ) ^{H}\left ( D^{H}\right ) ^{+}\left ( Q^{H}\right ) ^{H}\nonumber \\ & =P\left ( D^{H}\right ) ^{+}Q \tag {2}\end{align}

Hence (1)=(2) if \(D^{H+}=D^{+H}\). But this is the case, since both \(\left ( D^{H}\right ) ^{+}\) and \(\left ( D^{+}\right ) ^{H}\) are the same matrix, with \(\frac {1}{\sigma _{i}}\) on the diagonal and zeros elsewhere.

2.2.2.3 Problem 5.4, 23

Problem: Let \(A\) be a square matrix with SVD \(PDQ\), prove that the characteristic equation of \(A\) is \(\pm \det \left ( D-\lambda P^{H}Q^{H}\right ) =0\)

answer:

Let the characteristic polynomial of \(A\) be called \(\Delta \), hence we write

\begin{align} \Delta & =\det \left ( A-\lambda I\right ) \nonumber \\ & =\det \left ( PDQ-\lambda I\right ) \nonumber \\ & =\det \left ( P\left ( DQ-\lambda P^{-1}\right ) \right ) \nonumber \\ & =\det \left ( P\left ( D-\lambda P^{-1}Q^{-1}\right ) Q\right ) \nonumber \\ & =\det \left ( P\right ) \det \left ( D-\lambda P^{-1}Q^{-1}\right ) \det \left ( Q\right ) \nonumber \end{align}

But \(Q^{-1}=Q^{H}\) and \(P^{-1}=P^{H}\) hence the above becomes

\[ \Delta =\det \left ( P\right ) \det \left ( D-\lambda P^{H}Q^{H}\right ) \det \left ( Q\right ) \]

Now \(\Delta =0\), and also we know that \(\det \left ( P\right ) \neq 0\) and \(\det \left ( Q\right ) \neq 0\), because \(P\) and \(Q\) are unitary matrices. Hence this means that

\[ \det \left ( D-\lambda P^{H}Q^{H}\right ) =0 \]
2.2.2.4 Problem 5.4, 34

problem: prove that if \(A\) is symmetric then so is \(A^{+}\)

answer:

Assuming a complex matrix, symmetric here means \(A=A^{H}\), hence

\begin{align} PDQ & =\left ( PDQ\right ) ^{H}\nonumber \\ & =Q^{H}D^{H}P^{H}\nonumber \\ & =Q^{H}DP^{H} \tag {1}\end{align}

Hence \(PDQ=Q^{H}DP^{H}\rightarrow DQ=P^{-1}Q^{H}DP^{H}\rightarrow D=P^{-1}Q^{H}DP^{H}Q^{-1}\)

But since \(P^{H}=P^{-1}\) and \(Q^{H}=Q^{-1}\) then the above becomes

\begin{equation} D=P^{H}Q^{H}DP^{H}Q^{H} \tag {2}\end{equation}

now

\[ A^{+}=Q^{H}D^{+}P^{H}\]

Sub (2) into the above equation we obtain

\begin{align*} A^{+} & =Q^{H}\left ( P^{H}Q^{H}DP^{H}Q^{H}\right ) ^{+}P^{H}\\ & =Q^{H}\left ( \left ( P^{H}Q^{H}\right ) ^{H}D^{+}\left ( P^{H}Q^{H}\right ) ^{H}\right ) P^{H}\\ & =Q^{H}\left ( \left ( QP\right ) D^{+}\left ( QP\right ) \right ) P^{H}\\ & =Q^{H}QPD^{+}QPP^{H}\end{align*}

But \(Q^{H}Q=I\) and \(PP^{H}=I\)

Hence the above becomes

\begin{equation} A^{+}=PD^{+}Q \tag {3}\end{equation}

But

\begin{align} \left ( A^{+}\right ) ^{H} & =\left ( Q^{H}D^{+}P^{H}\right ) ^{H}\nonumber \\ & =PD^{+}Q \tag {4}\end{align}

Compare (3) and (4), they are the same.

Hence \(A^{+}\) is symmetric.

2.2.2.5 Problem 5.4, 39

problem: prove that eigenvalues of positive semi definite matrix are nonnegative

answer:

positive semi definite means \(\vec {x}^{T}A\vec {x}\geq 0\) for all \(\vec {x}\neq \vec {0}\)

Let \(\lambda \) be an eigenvalue of \(A\) with eigenvector \(\vec {x}\neq \vec {0}\), so \(A\vec {x}=\lambda \vec {x}\). Hence \(\vec {x}^{T}A\vec {x}=\vec {x}^{T}\lambda \vec {x}=\lambda \vec {x}^{T}\vec {x}\)

But \(\vec {x}^{T}\vec {x}=\left \Vert \vec {x}\right \Vert ^{2}\)

Hence \(\vec {x}^{T}A\vec {x}=\lambda \left \Vert \vec {x}\right \Vert ^{2}\)

We are told the above is \(\geq 0\). Since \(\vec {x}\neq 0\), the norm \(\left \Vert \vec {x}\right \Vert ^{2}\) is a positive quantity, so \(\lambda \left \Vert \vec {x}\right \Vert ^{2}\geq 0\) is possible only if \(\lambda =0\) (for the \(=0\) case) or \(\lambda >0\) (for the \(>0\) case). It is not possible to have \(\lambda \) negative, multiply it by a positive quantity, and obtain a nonnegative quantity.

(The case \(\vec {x}=\vec {0}\) does not arise, since an eigenvector is nonzero by definition.) Hence the eigenvalues can be either positive or zero, i.e. nonnegative.

2.2.3 code

This is some code in Matlab

%compare the error in A by doing SVD on it and then rebuilding A back

clear all;
A=rand(10000,2);
fprintf('total storage needed for A is =%f MB\n',(8*size(A,1)*size(A,2))/10^6);

[p,d,q]=svd(A,'econ');                 %economy SVD: p is 10000x2, d is 2x2, q is 2x2
econSize=8*(size(p,1)*size(p,2)+size(d,1)*size(d,2)+size(q,1)*size(q,2));
econSize=econSize/10^6;                %make it in MB
fprintf('total storage needed for A with economy SVD=%f MB\n',econSize);

fprintf('Max difference between A and its reconstructed version is\n %g\n',...
         max(max(abs(A-p*d*q'))));     %A is reconstructed as p*d*q'

2.3 Second lecture

2.3.1 Introduction

These are the lecture notes of the second lecture for the course Math 501, Spring 2007 at CSUF. These notes are written by Nasser M. Abbasi (student in the class).

Lecture was given by Dr C.H.Lee, Mathematics dept. CSU Fullerton on 1/24/2007.

The lecture started with a walkthrough on using Microsoft PowerPoint for presentations.

Note the following correction to the last lecture's notes. The error term in the 2D Taylor expansion is given by

\[ E_{n}\left ( x,y\right ) =\frac {1}{\left ( n+1\right ) !}\left ( h\frac {\partial }{\partial x}+k\frac {\partial }{\partial y}\right ) ^{n+1}f\left ( x+\theta h,y+\theta k\right ) \]

Where \(0\leq \theta \leq 1\)

For example, for \(n=1\), and for \(f\left ( x,y\right ) =e^{xy},\)the above works out to be

\[ E_{1}\left ( x,y\right ) =\left ( hk+\frac {1}{2}\left ( h\left ( y+\theta k\right ) +k\left ( x+\theta h\right ) \right ) ^{2}\right ) e^{\left ( x+\theta h\right ) \left ( y+\theta k\right ) }\]

I wrote a small function which computes the above error term for any \(n\); here are some of the terms for increasing \(n\).

(The table showing the symbolic expressions of \(E_{n}\) for \(n=1,2,3\) is not reproduced here.)
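
Below is a sketch of what such a function could look like (my own reconstruction, not the original; it assumes the Symbolic Math Toolbox is available). It applies the operator \(h\frac {\partial }{\partial x}+k\frac {\partial }{\partial y}\) a total of \(N+1\) times to \(f(x,y)=e^{xy}\), evaluates at the shifted point, and divides by \((N+1)!\).

% sketch: build the 2D Taylor remainder E_N for f(x,y)=exp(x*y)
% (my own reconstruction; assumes the Symbolic Math Toolbox)
syms x y h k theta
f  = exp(x*y);
N  = 1;                                   % order of the remainder wanted
op = f;
for j = 1:N+1                             % apply (h d/dx + k d/dy) N+1 times
    op = h*diff(op,x) + k*diff(op,y);
end
EN = simplify( subs(op,{x,y},{x+theta*h,y+theta*k}) / factorial(N+1) )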

To see how the 2D error term behaves as \(n\) increases, we can select some values for \(\theta ,h,k\) and evaluate \(E_{n}\). This was done for \(\theta =0.5,h=0.1,k=0.1\) at the points \(\left ( x,y\right ) =\left ( 2,5\right ) \), \(\left ( 1,1\right ) \) and \(\left ( 0,0\right ) \) (the plot of \(e^{xy}\) and the plots of \(E_{n}\) at these points are not reproduced here).

Notice the following: If the point \(\left ( x,y\right ) \) selected maps to a large function value \(f\left ( x,y\right ) \), then more terms are needed to make the error term smaller. Notice we need at least 4 terms before \(E_{n}\) would continue to decrease as \(n\) increases. For example, for \(n=2\), \(E_{2}\) was actually smaller than \(E_{3}\) in the examples above. It is only after \(n=4\) that the error term continues to become smaller as \(n\) is increased.

2.3.2 Convergence

Definition: Order of convergence:

Let \([x_{n}]\) be a sequence, \([x_{n}]\) is said to converge to \(L\) at order \(P\) if \(\exists \) two positive constants \(c<1\) and \(N>0\) s.t.

\[ \left \vert x_{n+1}-L\right \vert \leq c\left \vert x_{n}-L\right \vert ^{P}\]

for all \(n\geq N\)

Note that if we write \(E_{n}\equiv \left \vert x_{n}-L\right \vert \), then the above can be written as

\[ E_{n+1}\leq c\left ( E_{n}\right ) ^{P}\]

Taking the log, we write

\[ \ln E_{n+1}\leq \ln c+P\ln E_{n}\]

Which can be considered an equation of a line with intercept \(\ln c\) and slope \(P\)

The larger the slope \(P\) the faster the sequence \([x_{n}]\) will converge to \(L\)

Example

Suppose \(\lim _{n->\infty }\left [ x_{n}\right ] ->L\) at order \(2\) (quadratic), then

\[ E_{n+1}\leq c\ \left ( E_{n}\right ) ^{2}\]

For some positive constant \(c.\) (Note lecture notes said \(c<1\) here, but text book says \(c\) not necessarily less than one for the quadratic case). Now suppose \(E_{1}=10^{-1}\), then

\begin{align*} E_{2} & \leq c\left ( E_{1}\right ) ^{2}\leq c\left ( 10^{-1}\right ) ^{2}\leq c\ 10^{-2}\\ E_{3} & \leq c\left ( E_{2}\right ) ^{2}\leq c\left ( c\ 10^{-2}\right ) ^{2}\leq c^{3}10^{-4}\\ E_{4} & \leq c\left ( E_{3}\right ) ^{2}\leq c\left ( c^{3}10^{-4}\right ) ^{2}\leq c^{7}10^{-8}\\ E_{5} & \leq c\left ( c^{7}10^{-8}\right ) ^{2}\leq c^{15}10^{-16}\end{align*}

So after \(5\) iterations, error became very small. Error went from order \(10^{-1}\) to order \(10^{-16}\). Compare that to something that converges linearly:

Suppose \(\lim _{n->\infty }\left [ x_{n}\right ] ->L\) at order \(1\) (Linear), then

\[ E_{n+1}\leq c\ E_{n}\]

For some positive constant \(c<1\) (note for linear case, \(c\) must be less than one)\(.\) Now suppose \(E_{1}=10^{-1}\), then

\begin{align*} E_{2} & \leq cE_{1}\leq c10^{-1}\\ E_{3} & \leq cE_{2}\leq c\left ( c\ 10^{-1}\right ) \leq c^{2}10^{-1}\\ E_{4} & \leq cE_{3}\leq c\left ( c^{2}10^{-1}\right ) \leq c^{3}10^{-1}\\ E_{5} & \leq c\left ( c^{3}10^{-1}\right ) \leq c^{4}10^{-1}\end{align*}

Compare that to the quadratic case above.

Any \(P>1\) is considered superlinear

Example:

Show that \(x_{n+1}=\frac {1}{2}x_{n}+\frac {1}{x_{n}}\) is a sequence which converges to \(L=\sqrt {2}\) quadratically. Use \(x_{1}=2.\)

We want to show that \(E_{n+1}\leq c\left ( E_{n}\right ) ^{2}\) for some positive \(c\) (note: lecture notes said also that \(c<1\) here) and for some \(n>N\)

\begin{align*} E_{n+1} & =\left \vert x_{n+1}-L\right \vert \\ & =\frac {1}{2}x_{n}+\frac {1}{x_{n}}-\sqrt {2}\\ & =\frac {x_{n}^{2}+2}{2x_{n}}-\sqrt {2}\\ & =\frac {x_{n}^{2}-2\sqrt {2}x_{n}+2+2\sqrt {2}x_{n}}{2x_{n}}-\sqrt {2}\\ & =\frac {\left ( x_{n}-\sqrt {2}\right ) ^{2}+2\sqrt {2}x_{n}}{2x_{n}}-\sqrt {2}\\ & =\frac {\left ( x_{n}-\sqrt {2}\right ) ^{2}}{2x_{n}}+\sqrt {2}-\sqrt {2}\\ & =\frac {\left ( x_{n}-\sqrt {2}\right ) ^{2}}{2x_{n}}\\ & =\frac {\left ( E_{n}\right ) ^{2}}{2\left ( E_{n}+\sqrt {2}\right ) }\\ & \leq \frac {\left ( E_{n}\right ) ^{2}}{2\sqrt {2}}\end{align*}

Hence it converges quadratically, with \(c=\frac {1}{2\sqrt {2}}\)
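
A short Matlab check of this result (my own): iterating \(x_{n+1}=\frac {1}{2}x_{n}+\frac {1}{x_{n}}\) from \(x_{1}=2\), the error to \(\sqrt {2}\) should roughly square at every step.

% quadratic convergence of x_{n+1} = x_n/2 + 1/x_n to sqrt(2)
x = 2;                               % x_1 = 2
for n = 1:5
    fprintf('n=%d  x=%.16f  error=%.3e\n', n, x, abs(x-sqrt(2)));
    x = x/2 + 1/x;
end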

Definition: Big \(O:\) Let \(\left [ x_{n}\right ] \) and \(\left [ \alpha _{n}\right ] \) be 2 sequences. We say that

\[ \,x_{n}=O\left ( \alpha _{n}\right ) \]

if \(\exists \) \(C>0\) and \(N>0\) s.t. \(\left \vert x_{n}\right \vert \leq C\left \vert \alpha _{n}\right \vert \) for all \(n\geq N\) (Note: Book only says that \(C\) and \(N\) are constants, lecture notes says they are also positive).

Example: Given \(x_{n}=\frac {n+1}{n^{2}}\) and \(\alpha _{n}=\frac {1}{n}\) show that \(x_{n}=O\left ( \alpha _{n}\right ) \)

\begin{align*} x_{n} & =\frac {n+1}{n^{2}}\\ & =\frac {1}{n}+\frac {1}{n^{2}}\\ & \leq \frac {1}{n}+\frac {1}{n}=2\left ( \frac {1}{n}\right ) \end{align*}

Hence

\[ x_{n}\leq 2\left ( \alpha _{n}\right ) \]

Hence \(x_{n}=O\left ( \alpha _{n}\right ) \)

Example: Find a simpler sequence \(\left [ \alpha _{n}\right ] \) s.t. \(x_{n}=O\left ( \alpha _{n}\right ) \) where \(x_{n}=\frac {\sqrt {n}}{\left ( 3n+1\right ) ^{3}}\)

\begin{align*} x_{n} & =\frac {\sqrt {n}}{\left ( 3n+1\right ) ^{3}}\\ & \leq \frac {\sqrt {n}}{\left ( 3n\right ) ^{3}}\\ & =\frac {1}{27}\frac {n^{\frac {1}{2}}}{n^{3}}\\ & =\frac {1}{27}n^{-3+\frac {1}{2}}\\ & =\frac {1}{27}n^{-\frac {5}{2}}\end{align*}

Hence \(x_{n}=O\left ( \frac {1}{n^{\frac {5}{2}}}\right ) \) where here \(\alpha _{n}=\frac {1}{n^{\frac {5}{2}}}\) and \(C=\frac {1}{27}\)
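
A quick numerical spot-check of this bound (my own):

% check sqrt(n)/(3n+1)^3 <= (1/27) n^(-5/2) for a range of n
n = (1:1000)';
disp( max( sqrt(n)./(3*n+1).^3 - (1/27)*n.^(-5/2) ) )   % should be <= 0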

Definition: small \(o\). Given 2 sequences \(\left [ x_{n}\right ] \) and \(\left [ \alpha _{n}\right ] \) we say \(x_{n}=o\left ( \alpha _{n}\right ) \) if \(\lim _{n\rightarrow \infty }\frac {x_{n}}{\alpha _{n}}=0\). i.e. \(\exists \) sequence \(\left [ \varepsilon _{n}\right ] \) that converges to zero, and \(N>0\) s.t. \(\left \vert x_{n}\right \vert \leq \left \vert \varepsilon _{n}\right \vert \left \vert \alpha _{n}\right \vert \) for all \(n>N\)

The above means that \(x_{n}\rightarrow 0\) much faster than \(\alpha _{n}\) does.

Example: \(x_{n}=\frac {1}{n\ln n}\), hence \(x_{n}=o\left ( \frac {1}{n}\right ) \) where here \(\varepsilon _{n}=\frac {1}{\ln n}\)

Also we can write \(x_{n}=o\left ( \frac {1}{\ln n}\right ) \) where \(\varepsilon _{n}=\frac {1}{n}\)

Example: Show that \(e^{-n}=o\left ( n^{-k}\right ) \) for any \(k>0\)

One way is to show that \(\lim _{n\rightarrow \infty }\frac {x_{n}}{\alpha _{n}}=0\)

\begin{align*} \lim _{n\rightarrow \infty }\frac {x_{n}}{\alpha _{n}} & =\lim _{n\rightarrow \infty }\frac {n^{k}}{e^{n}}\\ & =\lim _{n\rightarrow \infty }\frac {n^{k}}{1+n+\frac {n^{2}}{2}+\frac {n^{3}}{3!}+\cdots }\\ & =0 \end{align*}

Hence \(x_{n}=o\left ( \alpha _{n}\right ) \)

HW section 1.2, problems 6(b,e), 7(b,c), 10,28,40

Note: Next week we meet in room MH380

2.3.3 Difference equations

Definition: Let us denote by \([x_{n}]=[x_{1},x_{2},x_{3},\cdots ]\) an infinite sequence of numbers. Then \(E\) is called the shift operator if \(E\left [ x\right ] =\left [ x_{2},x_{3},x_{4},\cdots \right ] \)

Some properties of \(E\)

  1. \(Ex_{1}=x_{2}\)
  2. \(Ex_{n}=x_{n+1}\)
  3. \(E\left ( Ex_{1}\right ) =x_{3}\)
  4. \(E^{k}x_{n}=x_{n+k}\)
  5. \(E^{0}x_{n}=x_{n}\)

Definition: \(x_{n+1}\) is called a recursive sequence if \(x_{n+1}=f\left ( x_{n},x_{n-1},\cdots ,x_{r}\right ) \) where \(r\leq n\)

Example: \(x_{n+1}=2x_{n}^{2}-7x_{n-1}+5\). Here 2 initial conditions are needed for this sequence to be defined for all \(n\). In this example the sequence is not linear.

Definition: \(x_{n+1}\) is called a linear recursive sequence if \(x_{n+1}=a_{n}x_{n}+a_{n-1}x_{n-1}+\cdots +a_{r}x_{r}\) where \(r\leq n\). (i.e. \(x_{n+1}\) is a linear combination of previous numbers in the sequence)

Given a linear recursive sequence our goal is to find its solution explicitly. i.e. in a non-recursive form.

\begin{align*} x_{n+1} & =a_{n}x_{n}+a_{n-1}x_{n-1}+\cdots +a_{r}x_{r}\\ E^{n+1-r}x_{r} & =a_{n}E^{n-r}x_{r}+a_{n-1}E^{n-r-1}x_{r}+\cdots +a_{r}E^{0}x_{r}\\ E^{n+1-r}x_{r}-a_{n}E^{n-r}x_{r}-a_{n-1}E^{n-r-1}x_{r}-\cdots -a_{r}E^{0}x_{r} & =0\\ \left ( E^{n+1-r}-a_{n}E^{n-r}-a_{n-1}E^{n-r-1}-\cdots -a_{r}E^{0}\right ) x_{r} & =0 \end{align*}

Hence, since \(x_{r}\neq 0\), we obtain a polynomial in \(E\) of degree \(n+1-r\)

\[ E^{n+1-r}-a_{n}E^{n-r}-a_{n-1}E^{n-r-1}-\cdots -a_{r}=0 \]

We can now find its roots. Assume the roots are called \(\lambda _{1},\lambda _{2},\cdots ,\lambda _{n+1-r}\) then

case 1 all roots \(\lambda \) are distinct (simple roots) then the solution to the difference equation is

\[ x_{n}=k_{1}\lambda _{1}^{n}+k_{2}\lambda _{2}^{n}+\cdots +k_{n+1-r}\lambda _{n+1-r}^{n}\]

Where \(k_{1},k_{2},\cdots ,k_{n+1-r}\) are coefficients to be determined from initial conditions.

Example: Suppose we have linear recursive equation \(x_{n+1}=3x_{n}-2x_{n-1}\) with \(x_{1}=1,x_{2}=0\)

apply the shift operator, we write

\begin{align*} E^{2}x_{n-1} & =3E^{1}x_{n-1}-2E^{0}x_{n-1}\\ E^{2}x_{n-1}-3E^{1}x_{n-1}+2E^{0}x_{n-1} & =0\\ \left ( E^{2}-3E^{1}+2E^{0}\right ) x_{n-1} & =0 \end{align*}

Hence the polynomial is \(E^{2}-3E^{1}+2=0\) and its roots are given by \(\left ( E-2\right ) \left ( E-1\right ) =0\,,\)hence \(\lambda _{1}=2,\lambda _{2}=1\). Simple roots. Hence the solution is given by

\begin{align*} x_{n} & =k_{1}\lambda _{1}^{n}+k_{2}\lambda _{2}^{n}\\ & =k_{1}2^{n}+k_{2}1^{n}\\ & =k_{1}2^{n}+k_{2}\end{align*}

Now apply initial conditions to find \(k_{1}\) and \(k_{2}\)

\(n=1,x_{1}=1\Rightarrow 1=k_{1}2+k_{2}\)

\(n=2,x_{2}=0\Rightarrow 0=k_{1}4+k_{2}\)

Solving the above 2 equations for \(k_{1},k_{2}\) we obtain \(k_{1}=-\frac {1}{2},k_{2}=2\,\ \)Hence the solution is

\[ x_{n}=-\frac {1}{2}2^{n}+2 \]
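
A small Matlab check of this explicit solution against the recursion (my own):

% verify x_n = -(1/2)*2^n + 2 solves x_{n+1} = 3 x_n - 2 x_{n-1}, x_1=1, x_2=0
N = 10;
x = zeros(1,N);
x(1) = 1;  x(2) = 0;                    % initial conditions
for n = 2:N-1
    x(n+1) = 3*x(n) - 2*x(n-1);
end
xe = -(1/2)*2.^(1:N) + 2;               % explicit solution
disp( max(abs(x - xe)) )                % should be 0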

Case 2: Roots are multiple roots. Let \(\lambda _{\ast }\) be \(k\) multiple root. Then

\begin{align*} x_{n} & =A_{0}\lambda _{\ast }^{n}+A_{1}\frac {d}{d\lambda _{\ast }}\left ( \lambda _{\ast }^{n}\right ) +A_{2}\frac {d^{2}}{d\lambda _{\ast }^{2}}\left ( \lambda _{\ast }^{n}\right ) +\cdots +A_{k-1}\frac {d^{k-1}}{d\lambda _{\ast }^{k-1}}\left ( \lambda _{\ast }^{n}\right ) \\ & =A_{0}\lambda _{\ast }^{n}+A_{1}n\lambda _{\ast }^{n-1}+A_{2}n\left ( n-1\right ) \lambda _{\ast }^{n-2}+\cdots +A_{k-1}n\left ( n-1\right ) \left ( n-2\right ) \cdots \left ( n-k+2\right ) \lambda _{\ast }^{n-k+1}\end{align*}

Example: Solve \(4x_{n}=-7x_{n-1}-2x_{n-2}+x_{n-3}\)

The polynomial in \(E\) is

\begin{align*} 4E^{3}x_{n-3}+7E^{2}x_{n-3}+2E^{1}x_{n-3}-E^{0}x_{n-3} & =0\\ 4E^{3}+7E^{2}+2E^{1}-1 & =0\\ \left ( E+1\right ) ^{2}\left ( 4E-1\right ) & =0 \end{align*}

Hence \(\lambda _{\ast }=-1\) with multiplicity \(k=2\) and \(\lambda _{1}=\frac {1}{4}\) a simple root.

Solution is

\begin{align*} x_{n} & =\overset {simple}{\overbrace {\left [ k_{0}\left ( \lambda _{1}\right ) ^{n}\right ] }}+\overset {multiple\ root}{\overbrace {\left [ A_{0}\lambda _{\ast }^{n}+A_{1}\frac {d}{d\lambda _{\ast }}\left ( \lambda _{\ast }^{n}\right ) \right ] }}\\ & =k_{0}\left ( \frac {1}{4}\right ) ^{n}+\left [ A_{0}\left ( -1\right ) ^{n}+A_{1}n\left ( \lambda _{\ast }^{n-1}\right ) \right ] \\ & =k_{0}\left ( \frac {1}{4}\right ) ^{n}+A_{0}\left ( -1\right ) ^{n}+A_{1}n\left ( -1\right ) ^{n-1}\end{align*}

Where the 3 coefficients \(k_{0},A_{0},A_{1}\) can be found from initial conditions.

Next we address stability of these difference equations. For this we need definitions of stable and bounded sequence.

Definition: Bounded: A sequence \([x_{n}]\) is bounded if \(\exists \) \(c>0\) and \(N>0\) s.t. \(\left \vert x_{n}\right \vert \leq c\) \(\forall n\geq N\)

Definition: Stability: A difference equation is stable if its solutions are bounded.

Theorem: Let the \(\lambda \)'s be the roots of the characteristic polynomial of the difference equation. The solution of the difference equation is stable if

  1. All simple roots satisfy \(\left \vert \lambda \right \vert \leq 1\)
  2. All repeated roots satisfy \(\left \vert \lambda \right \vert <1\)

Proof: Suppose \(\lambda ^{\prime }s\) are the simple roots of the characteristic polynomial of the difference equation, then

\begin{align*} \left \vert x_{n}\right \vert & =\left \vert k_{1}\lambda _{1}^{n}+k_{2}\lambda _{2}^{n}+\cdots +k_{n-r+1}\lambda _{n-r+1}^{n}\right \vert \\ & \leq \left \vert k_{1}\right \vert \left \vert \lambda _{1}^{n}\right \vert +\left \vert k_{2}\right \vert \left \vert \lambda _{2}^{n}\right \vert +\cdots +\left \vert k_{n-r+1}\right \vert \left \vert \lambda _{n-r+1}^{n}\right \vert \end{align*}

The above is bounded if each \(\left \vert \lambda \right \vert \leq 1\)

Now assume the roots are multiple roots order \(k.\) Then if \(\left \vert \lambda \right \vert <1\) then

\[ \lim _{n\rightarrow \infty }n^{k}\lambda ^{n}=0 \]

Hence \(\left \vert x_{n}\right \vert \) is bounded. See textbook page 33 for more detailed proof.

Example:

\[ x_{n+2}=3x_{n+1}-2x_{n}\]

The characteristic equation is

\begin{align*} E^{2}-3E+2 & =0\\ \left ( E-1\right ) \left ( E-2\right ) & =0 \end{align*}

Hence \(\lambda _{1}=1,\lambda _{2}=2\). Since simple roots, and one root \(\lambda _{2}>1\) then NOT stable.

Example:

\[ x_{n+2}-2x_{n+1}+2x_{n}=0 \]

The characteristic equation is

\[ E^{2}-2E+2=0 \]

The roots are \(\lambda _{1}=1-i,\lambda _{2}=1+i\)

Since simple roots, and the size of the root is \(>1\), then NOT stable (the size of the root is \(\left \vert \lambda _{2}\right \vert =\sqrt {2}\))
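
A quick numerical illustration of this instability (my own, with arbitrary initial conditions): since \(\left \vert \lambda \right \vert =\sqrt {2}>1\), the iterates grow without bound.

% iterate x_{n+2} = 2 x_{n+1} - 2 x_n and watch the magnitude grow like (sqrt(2))^n
x = zeros(1,30);
x(1) = 1;  x(2) = 1;                    % arbitrary initial conditions
for n = 1:28
    x(n+2) = 2*x(n+1) - 2*x(n);
end
disp( abs(x(26:30)) )                   % magnitudes keep growing -> not stable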

HW, section 1.3 problem 9,11,12,25

2.3.4 Computer arithmetic

2.3.4.1 Decimal system (base 10)

The digits are \(0\) through \(9\), and each digit is multiplied by a power of 10. For example the number \(427.325\) in base 10 can be written as follows

\[ \left ( 427.325\right ) _{10}=4\times 10^{2}+2\times 10^{1}+7\times 10^{0}+3\times 10^{-1}+2\times 10^{-2}+5\times 10^{-3}\]
2.3.4.2 Binary system (base 2)

The digits are \(0,1\) and each digit is multiplied by powers of 2. For example the number \(1001.11101\) in base 2 can be written as

\begin{align*} \left ( 1001.11101\right ) _{2} & =1\times 2^{3}+0\times 2^{2}+0\times 2^{1}+1\times 2^{0}+1\times 2^{-1}+1\times 2^{-2}+1\times 2^{-3}+0\times 2^{-4}+1\times 2^{-5}\\ & =8+0+0+1+\frac {1}{2}+\frac {1}{4}+\frac {1}{8}+0+\frac {1}{32}\\ & =\left ( 9.90625\right ) _{10}\end{align*}

Example: Calculate \(\frac {1}{10}\)

In decimal, the answer is \(0.1\)

In Binary (depending on machine accuracy) the answer is  

\begin{align*} 1\div 10 & \simeq \left ( 1.000\cdots \right ) _{2}\div \left ( 1010.000\cdots \right ) _{2}\\ & =\left ( 0.00011001100110011\cdots \right ) _{2}\end{align*}

With 7 digits accuracy machine we get

\[ 1\div 10\simeq 0.1000000\overset {error}{\overbrace {1490116}}\cdots \]

Let us do some conversions between decimal and binary.

Example: Convert \(53_{10}\) to binary

\begin{align*} 53\div 2 & =26\ \ R\ 1\\ 26\div 2 & =13\ \ R\ 0\\ 13\div 2 & =6\ \ R\ \ 1\\ 6\div 2 & =3\ \ R\ 0\\ 3\div 2 & =1\ \ R\ 1\\ 1\div 2 & =0\ \ R\ \ 1 \end{align*}

Hence \(53_{10}=110101_{2}\)

Example: Convert \(19_{10}\) to binary

\begin{align*} 19\div 2 & =9\ \ R\ 1\\ 9\div 2 & =4\ \ R\ 1\\ 4\div 2 & =2\ \ R\ \ 0\\ 2\div 2 & =1\ \ R\ 0\\ 1\div 2 & =0\ \ R\ 1 \end{align*}

Hence \(19_{10}=10011_{2}\)

Example: Convert \(0.7_{10}\) to binary

\begin{align*} 0.7\times 2 & =0.4+1\\ 0.4\times 2 & =0.8+0\\ 0.8\times 2 & =0.6+1\\ 0.6\times 2 & =0.2+1\\ 0.2\times 2 & =0.4+0\\ 0.4\times 2 & =0.8+0\\ & \vdots \\ & \text {repeated}\end{align*}

Hence \(0.7_{10}=0.1\overbrace {0110}\ \overbrace {0110}\cdots \)

Hence \(53.7_{10}=\ 110101.1\overbrace {0110}\ \overbrace {0110}\cdots \)
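
A small Matlab sketch of the two procedures above (my own): repeated division by 2 for the integer part and repeated multiplication by 2 for the fractional part.

% convert 53.7 (base 10) to binary: integer part by division, fraction by multiplication
n = 53;  f = 0.7;  nbits = 12;          % nbits = how many fractional bits to produce
ibits = '';
while n > 0
    ibits = [num2str(mod(n,2)) ibits];  % remainder becomes the next bit (right to left)
    n = floor(n/2);
end
fbits = '';
for i = 1:nbits
    f = 2*f;
    d = floor(f);                       % the integer part is the next bit (left to right)
    fbits = [fbits num2str(d)];
    f = f - d;
end
fprintf('53.7 (base 10) ~ %s.%s (base 2)\n', ibits, fbits)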

2.4 Mathematics Colloquium notes

Mathematics Colloquium notes
Talk given by Dr McMillen,Tyler on 4/25/2007.
Mathematical Problems of Decision Making
California State University, Fullerton.
Spring 2007 semester Notes taken by Nasser Abbasi.

These are my notes taken during a talk given by Dr McMillen, Mathematics department, California State University, Fullerton, on April 25, 2007. The subject of the talk was Mathematical Problems of Decision Making.

Dr McMillen started by asking the question: "How do we choose between different choices?" An example given was: run or fight? He also noted that one might have to select between many choices, not just 2.

What is the choice-reaction model?

Need to vary signal to noise ratio and check how people can choose.

If some choices are close to each other, and one choice is distinct, people tend to select the distinct one. For example, it is easier for people to choose between 2 bars that are drawn at 90 degrees to each other than between 2 that are inclined such that they are very close to each other. The first case makes selecting easier since the choices are more distinct from each other.

There is a limit on how many choices people can handle at the same time. The limit seems to be around 7.

Now the talk went into discussing models of decision making:

This is a hard problem. Simplest types of models are only partially understood.

Statistical regularities

Reaction time (\(RT\)) effect:

  1. Hick’s Law, where \(RT\sim \log (N)\) where \(N\) is the number of choices
  2. Loss avoidance, this means people prefer choices that are far apart from each other.
  3. The magic number 7.

Stochastic differential equations are used to model decision making process. Mention of Fokker-Planck equation.

Now the talk presented a neural model of decision making, where 2 brain neurons are shown, each accepting a separate input (with noise added), and there exists what is called an inhibition factor \(W\) between these 2 neurons. These neurons are also subject to a decay factor \(K\). This is called the neural model of perceptual choices.

The talk also discussed the effect on the amount of time a person has to make a decision on what decisions they make. When the time to make a decision is limited, it is called the interrogation model.

The talk now discussed what is an optimal method to decide between more than 2 random choices to select from.

Using the above neural model, the best decision is made when the inhibition between the 2 neurons and the decay factor are the same. A model by the name of SPRT: Wald's Sequential Probability Ratio Test was mentioned in relation to an optimality theorem of decision making.

The conclusion of this talk was that a mathematical model of how the brain makes decisions is a very complex problem and not well understood, and only a very simple model exists when it comes to making a decision between 2 choices. The optimal way to make a decision is an unsolved mathematical problem.

I found this talk a bit hard to follow. I could not make a clear distinction of how the neural model shown related to the stochastic differential equations presented earlier. I did not understand what the inhibition factor between neurons means, or what the decay factor actually represents. I think the talk was a little advanced for me, as I felt I did not completely follow all the points presented. But I did get from this talk that modeling decision making in humans at the neural level is a very difficult problem, though it was not clear to me why and how this difficulty comes about. Nevertheless, I did find the talk interesting and informative.

Chapter 3
study notes

3.1 Section 1.1
3.2 Taylor expansion with Lagrange remainder
3.3 Section 1.2 Order of convergence, linear, superlinear, quadratic
3.4 Section 1.3. difference equations, characteristic polynomial, simple and repeated roots, analytic solution and stability
3.5 Section 3.1 Bisection
3.6 Section 3.2 Newton root finding
3.7 Section 7.1 Numerical differentiation and Richardson extrapolation

3.1 Section 1.1

3.1.1 definition of limit of function \(f\left ( x\right ) \)

We say that the limit of \(f\left ( x\right ) \) is \(L\) as \(x\) gets close to \(c\), if for each positive number \(\varepsilon \) we can find another number \(\delta \) such that \(\left \vert f\left ( x\right ) -L\right \vert <\varepsilon \) for all \(x\) within a distance \(\delta \) from \(c.\)

So if we change \(\varepsilon \), may be make it smaller, we need to find another \(\delta \), most likely smaller than before also, such that \(\left \vert f\left ( x\right ) -L\right \vert <\varepsilon \) inside this new interval around \(c\)

note: We say \(\lim _{x\rightarrow c}f\left ( x\right ) \) exist if \(\lim _{x\rightarrow c^{-}}f\left ( x\right ) =\lim _{x\rightarrow c^{+}}f\left ( x\right ) =L\)

Example of a function where \(\lim _{x->0}f\left ( x\right ) \) does not exist is \(f\left ( x\right ) =\left \{ \begin {array} [c]{c}-1\ \ x<0\\ 0\ \ x=0\\ +1\ \ x>0 \end {array} \right . \)

note: A function can be defined at a point, but not have a limit at that point (as the example above shows)

3.1.2 Definition of continuous function at a point

A function \(f\left ( x\right ) \) is continuous at \(x=c\) if it is defined at that point, and if \(\lim _{x->c}f\left ( x\right ) \) exist and is equal to \(f\left ( c\right ) \)

Example: a function for which \(\lim _{x\rightarrow c}f\left ( x\right ) \) exists, but \(f\left ( c\right ) \) is not equal to this limit, hence it is not continuous at \(x=c\).

Example: a function where \(f\left ( c\right ) \) equals the limit at \(x=c\), and \(\lim _{x\rightarrow c}f\left ( x\right ) \) exists, hence it is continuous.

3.1.3 Definition of derivative of function \(f\left ( x\right ) \) at \(c\)

if \(f\left ( x\right ) \) is continuous at \(x=c\), then

\[ f^{\prime }\left ( c\right ) =\lim _{x\rightarrow c}\frac {f\left ( x\right ) -f\left ( c\right ) }{x-c}\]

note: The above \(f^{\prime }\left ( c\right ) \) is defined only if the limit exists and is the same as we approach \(c\) from either side.

Conversely, we say that a function \(f\left ( x\right ) \) is differentiable at \(x=c\) iff \(f^{\prime }\left ( c\right ) \) exists and \(f\) is continuous at \(c\).

In other words, \(f\left ( x\right ) \) is differentiable at \(x=c\) iff \(\lim _{x\rightarrow c^{-}}\frac {f\left ( x\right ) -f\left ( c\right ) }{x-c}=\lim _{x\rightarrow c^{+}}\frac {f\left ( x\right ) -f\left ( c\right ) }{x-c}=f^{\prime }\left ( c\right ) \)

note: It is possible for a function to be continuous at \(c\) but not be differentiable there, if the above limit is not the same as we approach \(c\) from either side.

Example, \(f\left ( x\right ) =\left \vert x\right \vert \)

3.1.4 Intermediate value theorem

On an interval \([a,b]\), a continuous function assumes all values between \(f\left ( a\right ) \) and \(f\left ( b\right ) \)

3.2 Taylor expansion with Lagrange remainder

On the real line, if we have a function \(f\left ( x\right ) \), and we wish to know the value of this function at a point \(x=b\) given the value of \(f\left ( x\right ) \) and its derivatives at another point say \(x=a\), then we write

\[ f\left ( b\right ) =f\left ( a\right ) +(b-a)f^{\prime }\left ( a\right ) +\frac {\left ( b-a\right ) ^{2}f^{\prime \prime }\left ( a\right ) }{2!}+\cdots \]

Now suppose we want to find the value of the function at arbitrary point \(x\) given the value of \(f\left ( x\right ) \) and its derivative at another point say \(x=a\), then we replace \(b\) by \(x\) above and write

\[ f\left ( x\right ) =f\left ( a\right ) +(x-a)f^{\prime }\left ( a\right ) +\frac {\left ( x-a\right ) ^{2}f^{\prime \prime }\left ( a\right ) }{2!}+\cdots +R_{n}\]

Where

\[ R_{n}=\frac {\left ( x-a\right ) ^{n+1}}{\left ( n+1\right ) !}f^{\left ( n+1\right ) }\left ( \xi \right ) \]

Where \(\xi \) is some point between \(x\) and \(a\)

If \(x-a=h\), we can write the above as

\[ \tilde {f}\left ( x\right ) =f\left ( a\right ) +hf^{\prime }\left ( a\right ) +\frac {h^{2}f^{\prime \prime }\left ( a\right ) }{2!}+\frac {h^{3}f^{\prime \prime \prime }\left ( a\right ) }{3!}+\cdots +\frac {h^{n+1}}{\left ( n+1\right ) !}f^{\left ( n+1\right ) }\left ( \xi \right ) \]

note: If the point of expansion is zero, the Taylor series is called a Maclaurin series.

\[ \tilde {f}\left ( x\right ) =f\left ( 0\right ) +xf^{\prime }\left ( 0\right ) +\frac {x^{2}f^{\prime \prime }\left ( 0\right ) }{2!}+\frac {x^{3}f^{\prime \prime \prime }\left ( 0\right ) }{3!}+\cdots +\frac {x^{n+1}}{\left ( n+1\right ) !}f^{\left ( n+1\right ) }\left ( \xi \right ) \]

What do we use Taylor series for? To express a function as a series. This can allow one to more easily manipulate it. Also, if the function is non-linear, by expressing it as a series and dropping the higher order non-linear terms (\(h\) must be very small to have a good approximation), we have linearized the non-linear function in the vicinity of the point of expansion. Hence around the point of expansion, we can approximate the non-linear function by its linear Taylor series terms for the purpose of doing further linear system analysis (as it is easier to work with linear functions than non-linear ones).

3.2.1 Finding Error in Taylor series approximation

Things to know: How do we find how many terms of the Taylor series are needed to approximate a given function to some accuracy?

Idea of solution: Express \(E_{n}\), this is the error term, or the remainder. Make \(\left \vert E_{n}\right \vert <\epsilon \) where \(\epsilon \) is the accuracy needed. Find smallest \(n\) which makes this true

Example: How many terms needed to find \(\ln \left ( 2\right ) \) to accuracy of \(\epsilon =10^{-8}\)?

Expand \(\ln \left ( x\right ) \) at \(x=1\), hence \(h=2-1=1\)

\begin{align*} \ln \left ( x\right ) & =\left ( x-1\right ) -\frac {1}{2}\left ( x-1\right ) ^{2}+\frac {1}{3}\left ( x-1\right ) ^{3}-\cdots +E_{n}\\ \ln \left ( 2\right ) & =1-\frac {1}{2}+\frac {1}{3}-\cdots +E_{n}\end{align*}

We want \(\left \vert E_{n}\right \vert <10^{-8}\), but \(\left \vert E_{n}\right \vert \leq \frac {1}{n+1}<10^{-8}\Rightarrow n\geq 10^{8}\), hence at least \(100\) million terms would be needed to compute \(\ln \left ( 2\right ) \) using Taylor series with accuracy of \(10^{-8}\)
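
A quick numerical illustration of this slow convergence (my own): summing the first million terms of the alternating series still leaves an error of about \(5\times 10^{-7}\).

% partial sum of 1 - 1/2 + 1/3 - ... compared with log(2)
N = 1e6;
k = (1:N)';
s = sum( (-1).^(k+1)./k );
fprintf('error after %d terms: %.3e\n', N, abs(s - log(2)))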

3.2.2 Mean value theorem

if \(f\left ( x\right ) \) is continuous on \([a,b]\), and if \(f^{\prime }\left ( x\right ) \) exists on the open interval \(\left ( a,b\right ) \), then there exists a point \(\xi \) between \(a,b\) s.t.

\[ f\left ( b\right ) -f\left ( a\right ) =f^{\prime }\left ( \xi \right ) \left ( b-a\right ) \]

3.2.3 Rolle’s theorem

if \(f\left ( x\right ) \) is continuous on \([a,b]\) and if \(f^{\prime }\left ( x\right ) \) exist on \(\left ( a,b\right ) \) and if \(f\left ( a\right ) =f\left ( b\right ) \) then \(f^{\prime }\left ( \xi \right ) =0\) for some point in \(\left ( a,b\right ) \)

3.3 Section 1.2 Order of convergence, linear, superlinear, quadratic

3.3.1 convergent sequence, limit definition

definition: A sequence of numbers \(x_{n}\) converges to limit \(x^{\ast }\), if for every positive number \(\epsilon \) there exists a number \(r\) such that \(\left \vert x_{n}-x^{\ast }\right \vert <\epsilon \) for all \(n>r\)

note: Hence to show a sequence converges to \(x^{\ast }\), all we have to do is find \(r\) such that for any given \(\epsilon \), we get \(\left \vert x_{n}-x^{\ast }\right \vert <\epsilon \) whenever \(n>r\)

Example: to show that \(x_{n}=\frac {n+1}{n}\) converges to \(1\), need to show that there exist a number \(r\) such that \(\left \vert \frac {n+1}{n}-1\right \vert <\epsilon \) whenever \(n>r\). Rewrite we have \(\left \vert \frac {1}{n}\right \vert <\epsilon \), hence we see that \(r=\epsilon ^{-1}\), because whenever \(n>r\), then \(\left \vert \frac {1}{n}\right \vert <\epsilon \). For example, assume \(\epsilon =0.1,\) then \(r=10,\) and whenever \(n>10\), we have \(\frac {1}{n}<0.1\)

It is clear that \(\lim _{n->\infty }\frac {n+1}{n}=1\).

3.3.2 order of convergence

The order of convergence of a sequence is the largest number \(q\) s.t. \(\lim _{n\rightarrow \infty }\frac {\left \vert x_{n+1}-x^{\ast }\right \vert }{\left \vert x_{n}-x^{\ast }\right \vert ^{q}}\) exists (as a finite number)

This is the same as writing \(\lim _{n\rightarrow \infty }\frac {\left \vert e_{n+1}\right \vert }{\left \vert e_{n}\right \vert ^{q}}\) where \(e_{n}\) is the error at \(x_{n}\)

3.3.2.1 Linear

The rate of convergence is linear if we can find constant \(c<1\) and integer \(N\) s.t.

\[ \left \vert e_{n+1}\right \vert \leq c\ \left \vert e_{n}\right \vert \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ n\geq N \]
3.3.2.2 superlinear

The rate of convergence is superlinear if we can find sequence \(\varepsilon _{n}\) tending to zero and integer \(N\) s.t.

\[ \left \vert e_{n+1}\right \vert \leq \varepsilon _{n}\ \left \vert e_{n}\right \vert \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ n\geq N \]

question: I do not understand this definition. Can I state it like the linear case, but with an exponent \(1<\alpha <2\), and write:

The rate of convergence is superlinear if we can find constant \(c<1\) and integer \(N\) s.t.

\[ \left \vert e_{n+1}\right \vert \leq c\ \left \vert e_{n}\right \vert ^{\alpha }\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ n\geq N \]
3.3.2.3 quadratic

The rate of convergence is quadratic if we can find constant \(c\) NOT Necessarily less than 1, and integer \(N\) s.t.

\[ \left \vert e_{n+1}\right \vert \leq c\ \left \vert e_{n}\right \vert ^{2}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ n\geq N \]

idea: To show that a sequence converges quadratically to some limit, start with the expression for \(e_{n+1}\) and manipulate it to show that it is \(\leq \) some constant \(\times \ e_{n}^{2}\)
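A small Matlab check of this idea (my own sketch): given consecutive errors, the order can be estimated from \(q\approx \frac {\log \left ( \left \vert e_{n+1}\right \vert /\left \vert e_{n}\right \vert \right ) }{\log \left ( \left \vert e_{n}\right \vert /\left \vert e_{n-1}\right \vert \right ) }\). The test sequence below is the Newton iteration for \(\sqrt {2}\), which should give \(q\) close to \(2\):

% Estimate the order of convergence q from consecutive errors.
xstar = sqrt(2);
x = 3;                                   % starting guess
e = zeros(1,5);
for n = 1:5
    x = 0.5*(x + 2/x);                   % Newton step for x^2 - 2 = 0
    e(n) = abs(x - xstar);
end
q = log(e(3:end)./e(2:end-1)) ./ log(e(2:end-1)./e(1:end-2));
disp(q)                                  % entries approach 2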

3.3.3 Big O and little o

These are means by which to compare two sequences to each other.

def: big O: one sequence \(x_{n}\) is bounded by a scaled version of a second sequence.

we say that \(x_{n}=O\left ( \alpha _{n}\right ) \) if there are constants \(C\) and \(N\) s.t. \(\left \vert x_{n}\right \vert \leq C\left \vert \alpha _{n}\right \vert \) for all \(n>N\)

How to find if \(x_{n}=O\left ( \alpha _{n}\right ) \)? start with \(x_{n}\) expression and manipulate it so that it has only terms that contain \(\alpha _{n}\) with some multipliers (the constant).

Or, more simply, just look to see if \(x_{n}\) is bigger or smaller than \(\alpha _{n}\): if it is bigger, then it goes to zero AFTER \(\alpha _{n}\), hence use big O; if it is smaller, then it goes to zero BEFORE \(\alpha _{n}\), hence use little o.

def: little o: we say \(x_{n}=o\left ( \alpha _{n}\right ) \) if \(\lim _{n\rightarrow \infty }\frac {x_{n}}{\alpha _{n}}=0\)

To test the above, given \(x_{n}=\frac {n+1}{n^{2}}\) and \(\alpha _{n}=\frac {1}{n}\), what is the relation between them? We see that \(x_{n}=\frac {1}{n}+\frac {1}{n^{2}}\), so it is BIGGER than \(\alpha _{n}\), so it goes to zero AFTER it. Hence \(x_{n}=O\left ( \alpha _{n}\right ) \)

given \(x_{n}=\frac {1}{n\ln n}\), we see that this is SMALLER than \(\alpha _{n}\), hence it will go to zero BEFORE \(\alpha _{n}\), hence \(x_{n}=o\left ( \alpha _{n}\right ) \)

question: verify I can do this reasoning all the time.

3.4 Section 1.3. difference equations, characteristic polynomial, simple and repeated roots, analytic solution and stability

Given a linear recursive equation such as \(x_{n+1}=f\left ( x_{n},x_{n-1},\cdots \right ) \), the goal is to find a non-recursive solution for \(x_{n}\)

To do that, we introduce the shift operator \(E\) and rewrite the recursive equation in terms of \(E\); we get a polynomial in \(E\) which we solve, and depending on how the roots come out, we obtain a solution for \(x_{n}\), which is called the explicit solution.

Notice that the explicit solution gives the value of \(x_{n}\) right away as a function of \(n\); no recursion is needed to find \(x\) for some specific \(n\). One can run into numerical problems with the recursive solution due to cancellation errors, while the explicit solution does not show this problem, so it is better to use the explicit solution when it is available.

3.4.1 Example simple roots, no repeated roots

\begin{align*} x_{n+1} & =3x_{n}-2x_{n-1}\\ E^{2}x_{n-1} & =3Ex_{n-1}-2E^{0}x_{n-1}\\ \left ( E^{2}-3E+2\right ) x_{n-1} & =0 \end{align*}

Solve \(P\left ( E\right ) =0\), we get roots \(\lambda _{1}=2,\lambda _{2}=1\), hence simple distinct roots. Hence explicit solution is

\begin{align*} x_{n} & =A_{1}\lambda _{1}^{n}+A_{2}\lambda _{2}^{n}\\ & =A_{1}2^{n}+A_{2}\end{align*}

Now \(A_{1}\) and \(A_{2}\) can be found from initial conditions
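As a quick sanity check, here is a small Matlab sketch (my own, with the initial conditions \(x_{0}=0,x_{1}=1\) chosen just as an example) comparing the recursion against the explicit solution for this problem:

% Explicit solution x_n = A1*2^n + A2 vs the recursion x_{n+1} = 3x_n - 2x_{n-1}.
x0 = 0;  x1 = 1;                 % example initial conditions
A1 = x1 - x0;                    % from A1 + A2 = x0 and 2*A1 + A2 = x1
A2 = 2*x0 - x1;
N  = 10;
x  = zeros(1, N+1);  x(1) = x0;  x(2) = x1;   % x(k+1) holds x_k
for n = 2:N
    x(n+1) = 3*x(n) - 2*x(n-1);
end
explicit = A1*2.^(0:N) + A2;
disp(max(abs(x - explicit)))     % should print 0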

3.4.2 Example simple roots, repeated roots

\[ 4x_{n}+7x_{n-1}+2x_{n-2}-x_{n-3}=0 \]
\[ P\left ( E\right ) x_{n-3}=4E^{3}x_{n-3}+7E^{2}x_{n-3}+2Ex_{n-3}-E^{0}x_{n-3}=0 \]

Hence roots are found from \(\left ( E+1\right ) ^{2}\left ( 4E-1\right ) =0\), so \(\lambda _{1}=\lambda _{2}=-1\) repeated 2 times, and \(\lambda _{3}=\frac {1}{4}\)

hence solution is

\[ x_{n}=\left ( A_{1}\lambda _{1}^{n}+A_{2}n\lambda _{2}^{n-1}\right ) +A_{3}\lambda _{3}^{n}\]

so if a root is repeated \(k\) times, we write

\[ x_{n}=\left ( A_{1}\lambda _{1}^{n}+A_{2}n\lambda _{2}^{n-1}+A_{3}n\left ( n-1\right ) \lambda _{3}^{n-2}+\cdots +A_{k}n\left ( n-1\right ) \left ( n-2\right ) \cdots \left ( \lambda _{k}^{n-k+1}\right ) \right ) +A_{k+1}\lambda _{k+1}^{n}\]

so here we have

\[ x_{n}=A_{1}\left ( -1\right ) ^{n}+A_{2}n\left ( -1\right ) ^{n-1}+A_{3}\left ( \frac {1}{4}\right ) ^{n}\]

and use I.C. to find the coefficients.

3.4.3 Bounded sequence

sequence \(x_{n}=[x_{1},x_{2},\cdots ]\) is bounded if there is a constant \(c\) s.t. \(\left \vert x_{n}\right \vert \leq c\) for all \(n\). i.e. \(\sup _{n}\left \vert x_{n}\right \vert <\infty \)

3.4.3.1 Stable solution for the difference equation \(P\left ( E\right ) x=0\)

\(P\left ( E\right ) x=0\) is stable if all its solutions are bounded sequences.

Theorem: for a polynomial \(p\) satisfying \(p\left ( 0\right ) \neq 0\), the difference equation \(p\left ( E\right ) x=0\) is stable iff all simple roots of \(p\left ( \lambda \right ) =0\) satisfy \(\left \vert \lambda \right \vert \leq 1\) and all repeated roots satisfy \(\left \vert \lambda \right \vert <1\)

So to find if a recursive equation is stable, find the roots of \(P\left ( E\right ) =0\) and check as per above
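In Matlab this check is essentially one call to roots; here is a small sketch (my own) applied to the earlier example \(x_{n+1}=3x_{n}-2x_{n-1}\):

% Stability check: magnitudes of the roots of the characteristic polynomial
% p(lambda) = lambda^2 - 3*lambda + 2 for x_{n+1} = 3 x_n - 2 x_{n-1}.
lam = roots([1 -3 2]);
disp(abs(lam))        % gives 2 and 1: a simple root has magnitude > 1, so not stable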

3.5 Section 3.1 Bisection

After \(n\) iterations, the midpoint \(c_{n}=\frac {a_{n}+b_{n}}{2}\) is at a distance from the root \(r\) bounded by

\[ \left \vert c_{n}-r\right \vert \leq \frac {b_{0}-a_{0}}{2^{n+1}}\]
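A minimal Matlab sketch of bisection (my own; the test function \(f\left ( x\right ) =x^{2}-2\) on \([1,2]\) is just an example), which also prints the bound above:

% Bisection on f(x) = x^2 - 2 over [a0,b0] = [1,2]; the root is sqrt(2).
f = @(x) x.^2 - 2;
a = 1;  b = 2;
for n = 0:30
    c = (a + b)/2;
    if sign(f(c)) == sign(f(a))
        a = c;                          % root is in [c,b]
    else
        b = c;                          % root is in [a,c]
    end
end
fprintf('c = %.12f, error = %.2e, bound = %.2e\n', c, abs(c - sqrt(2)), 1/2^31);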

3.6 Section 3.2 Newton root finding

Understanding Newton method

Start from the line equation. You remember we normally write it as

\(y=c+mx\), where \(c\) is the y-axis intercept and \(m\) is the slope. This form implicitly assumes that the slope and the intercept are taken at \(x_{0}=0\). In general, the tangent line through the point \(\left ( x_{0},f\left ( x_{0}\right ) \right ) \) with slope \(f^{\prime }\left ( x_{0}\right ) \) can be written as

\begin{equation} y=f\left ( x\right ) =f\left ( x_{0}\right ) +f^{\prime }\left ( x_{0}\right ) \left ( x-x_{0}\right ) \tag {1}\end{equation}

The above equation is the same as \(y=c+mx\) when \(x_{0}=0\)

So from (1)

\begin{align*} f\left ( x\right ) & =f\left ( x_{0}\right ) +f^{\prime }\left ( x_{0}\right ) \left ( x-x_{0}\right ) \\ \frac {f\left ( x\right ) -f\left ( x_{0}\right ) }{f^{\prime }\left ( x_{0}\right ) } & =\left ( x-x_{0}\right ) \end{align*}

Now instead of writing \(x\) and \(x_{0}\) we write \(x_{n+1}\) and \(x_{n}\), so the above becomes

\begin{align*} \frac {f\left ( x_{n+1}\right ) -f\left ( x_{n}\right ) }{f^{\prime }\left ( x_{n}\right ) } & =\left ( x_{n+1}-x_{n}\right ) \\ x_{n+1} & =\frac {f\left ( x_{n+1}\right ) -f\left ( x_{n}\right ) }{f^{\prime }\left ( x_{n}\right ) }+x_{n}\end{align*}

That is it. Now we choose \(x_{n+1}\) to be the point where this line crosses zero, i.e. we want \(f\left ( x_{n+1}\right ) \approx 0\), and that is the whole idea. Replacing \(f\left ( x_{n+1}\right ) \) by zero in the above we obtain

\[ x_{n+1}=x_{n}-\frac {f\left ( x_{n}\right ) }{f^{\prime }\left ( x_{n}\right ) }\]

Which is Newton method.
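A minimal Matlab sketch of the iteration (my own; the function \(f\left ( x\right ) =x^{3}-x-1\) and the starting guess are just an example, its real root is near \(1.3247\)):

% Newton iteration x_{n+1} = x_n - f(x_n)/f'(x_n) on f(x) = x^3 - x - 1.
f  = @(x) x.^3 - x - 1;
fp = @(x) 3*x.^2 - 1;
x  = 1.5;                         % starting guess
for n = 1:6
    x = x - f(x)/fp(x);
    fprintf('n=%d  x=%.15f  f(x)=%.2e\n', n, x, f(x));
end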

Theorem: Let \(f\) be \(C^{2}\) and let \(r\) be a simple zero of \(f\). There is a neighborhood of \(r\) and a constant \(C\) s.t. if Newton's method is started in that neighborhood, it converges to \(r\) according to \(\left \vert x_{n+1}-r\right \vert \leq C\left \vert x_{n}-r\right \vert ^{2}\)

Proof of quadratic convergence order

From diagram we see that

\begin{equation} e_{n+1}=r-x_{n+1} \tag {1}\end{equation}

But from definition of Newton root finding

\begin{equation} x_{n+1}=x_{n}-\frac {f\left ( x_{n}\right ) }{f^{\prime }\left ( x_{n}\right ) } \tag {2}\end{equation}

substituting (2) into (1) gives

\begin{align} e_{n+1} & =r-\left ( x_{n}-\frac {f\left ( x_{n}\right ) }{f^{\prime }\left ( x_{n}\right ) }\right ) \nonumber \\ & =\overset {e_{n}}{\overbrace {r-x_{n}}}+\frac {f\left ( x_{n}\right ) }{f^{\prime }\left ( x_{n}\right ) }\nonumber \\ & =e_{n}+\frac {f\left ( x_{n}\right ) }{f^{\prime }\left ( x_{n}\right ) }\nonumber \\ & =\frac {e_{n}f^{\prime }\left ( x_{n}\right ) +f\left ( x_{n}\right ) }{f^{\prime }\left ( x_{n}\right ) } \tag {3}\end{align}

Now evaluate \(f\left ( x\right ) \) at \(r\) by expanding it using Taylor around \(x_{n}\)

\[ f\left ( r\right ) =f\left ( x_{n}\right ) +hf^{\prime }\left ( x_{n}\right ) +\frac {h^{2}f^{\prime \prime }\left ( \xi \right ) }{2!}\]

But \(h\) is the distance between \(r\) and \(x_{n}\), which is \(e_{n}\), hence the above becomes

\[ f\left ( r\right ) =f\left ( x_{n}\right ) +e_{n}f^{\prime }\left ( x_{n}\right ) +\frac {e_{n}^{2}f^{\prime \prime }\left ( \xi \right ) }{2!}\]

But \(f\left ( r\right ) =0\) since this is a root finding, and that is our goal. Hence the above becomes

\begin{align} 0 & =f\left ( x_{n}\right ) +e_{n}f^{\prime }\left ( x_{n}\right ) +\frac {e_{n}^{2}f^{\prime \prime }\left ( \xi \right ) }{2!}\nonumber \\ f\left ( x_{n}\right ) +e_{n}f^{\prime }\left ( x_{n}\right ) & =-\frac {f^{\prime \prime }\left ( \xi \right ) }{2!}e_{n}^{2} \tag {4}\end{align}

Substituting (4) into numerator of (3) gives

\begin{align*} e_{n+1} & =\frac {-\frac {f^{\prime \prime }\left ( \xi \right ) }{2!}e_{n}^{2}}{f^{\prime }\left ( x_{n}\right ) }\\ & =-\frac {f^{\prime \prime }\left ( \xi \right ) }{2f^{\prime }\left ( x_{n}\right ) }e_{n}^{2}\end{align*}

but \(\frac {f^{\prime \prime }\left ( \xi \right ) }{f^{\prime }\left ( x_{n}\right ) }\approx \frac {f^{\prime \prime }\left ( r\right ) }{f^{\prime }\left ( r\right ) }=k\) (some constant), hence above becomes

\[ e_{n+1}=k_{1}e_{n}^{2}\]

where \(k_{1}\) is some constant as well.

Hence we can find a constant \(C\) such that \(\left \vert e_{n+1}\right \vert \leq C\ e_{n}^{2}\), where \(C\) is any constant at least as large as \(\left \vert k_{1}\right \vert \)

The above proves that Newton's method converges quadratically.

3.6.1 Definition of convex function

A function \(f\left ( x\right ) \) is convex if \(f^{\prime \prime }\left ( x\right ) \) \(\geq 0\) for all \(x.\)

Mathworld has this definition

"A convex function is a continuous function whose value at the midpoint of every interval in its domain does not exceed the arithmetic mean of its values at the ends of the interval." diagram below from Mathworld

How to use Newton method to find \(\sqrt {R}\) ?

Let \(x=\sqrt {R}\), hence \(x^{2}-R=0\) is \(f\left ( x\right ) \) to use with Newton method. This leads to \(x_{n+1}=x_{n}-\frac {x_{n}^{2}-R}{2x_{n}}=\frac {1}{2}\left ( x_{n}+\frac {R}{x_{n}}\right ) \)
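A quick Matlab check of this iteration (my own sketch; \(R=7\) is just an example):

% Newton (Babylonian) iteration for sqrt(R): x_{n+1} = (x_n + R/x_n)/2.
R = 7;
x = R;                            % any positive starting guess works
for n = 1:8
    x = 0.5*(x + R/x);
end
fprintf('x = %.15f, sqrt(R) = %.15f\n', x, sqrt(R));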

3.6.2 Newton method to solve set of equations

Writing

\begin{align*}\begin {bmatrix} x_{n+1}\\ y_{n+1}\end {bmatrix} & =\begin {bmatrix} x_{n}\\ y_{n}\end {bmatrix} -J^{-1}\left ( x_{n},y_{n}\right ) F\left ( x_{n},y_{n}\right ) \\ & =\begin {bmatrix} x_{n}\\ y_{n}\end {bmatrix} -\begin {bmatrix} \frac {\partial F_{1}\left ( x_{n},y_{n}\right ) }{\partial x} & & \frac {\partial F_{1}\left ( x_{n},y_{n}\right ) }{\partial y}\\ \frac {\partial F_{2}\left ( x_{n},y_{n}\right ) }{\partial x} & & \frac {\partial F_{2}\left ( x_{n},y_{n}\right ) }{\partial y}\end {bmatrix} ^{-1}\begin {bmatrix} F_{1}\left ( x_{n},y_{n}\right ) \\ F_{2}\left ( x_{n},y_{n}\right ) \end {bmatrix} \end{align*}
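A minimal Matlab sketch of this update (my own; the 2-equation system below is a made-up example):

% Newton's method for the 2x2 system F1 = x^2 + y^2 - 4 = 0, F2 = x*y - 1 = 0.
F = @(v) [v(1)^2 + v(2)^2 - 4;  v(1)*v(2) - 1];
J = @(v) [2*v(1)  2*v(2);           % Jacobian of F
          v(2)    v(1)];
v = [2; 0.5];                       % starting guess
for n = 1:10
    v = v - J(v)\F(v);              % [x;y] <- [x;y] - J^{-1} F
end
disp(v.');  disp(F(v).');           % residual should be essentially zero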

3.7 Section 7.1 Numerical differentiation and Richardson extrapolation

Some points to know

  1. If a function \(f\left ( x\right ) \) is known at \(n\) points, and we also know that the function is a polynomial of degree at most \(n-1\), then we can find the polynomial exactly by solving \(n\) equations for the \(n\) coefficients \(c_{0},c_{1},\cdots ,c_{n-1}\). Hence there is no need to do numerical differentiation; we can differentiate analytically.
  2. Remember this for Taylor: \(f\left ( x+h\right ) =f\left ( x\right ) +hf^{\prime }\left ( x\right ) +\frac {h^{2}}{2}f^{\prime \prime }\left ( \xi \right ) \). For this to be valid, \(f\left ( x\right ) ,f^{\prime }\left ( x\right ) \) have to be continuous on the CLOSED interval \(\left [ x,x+h\right ] \), while \(f^{\prime \prime }\) needs only to exist on the OPEN interval \(\left ( x,x+h\right ) \).

As an example, the forward difference approximation of the derivative at \(x=\sqrt {2}\) is

\[ f^{\prime }\left ( \sqrt {2}\right ) \approx \frac {f\left ( \sqrt {2}+h\right ) -f\left ( \sqrt {2}\right ) }{h}\]
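Here is a small Matlab check (my own sketch) of the forward difference above, together with one step of Richardson extrapolation applied to the central difference \(D\left ( h\right ) =\frac {f\left ( x+h\right ) -f\left ( x-h\right ) }{2h}\), whose error is \(O\left ( h^{2}\right ) \), so \(\frac {4D\left ( h/2\right ) -D\left ( h\right ) }{3}\) removes the leading error term; the test function \(\sin x\) is just an example:

% Forward difference and Richardson extrapolation at x = sqrt(2).
f = @(x) sin(x);                 % example test function
x = sqrt(2);  h = 0.1;
fwd  = (f(x+h) - f(x))/h;
D    = @(h) (f(x+h) - f(x-h))/(2*h);
rich = (4*D(h/2) - D(h))/3;
fprintf('forward: %.8f  richardson: %.12f  exact: %.12f\n', fwd, rich, cos(x));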

Chapter 4
HWs

4.1 lookup table
4.2 HW 1
4.3 HW 2
4.4 HW 3
4.5 HW 4
4.6 HW 5
4.7 HW 6
4.8 HW 7
4.9 HW 8
4.10 HW 9
4.11 HW 10
4.12 HW 11
4.13 HW 12

4.1 lookup table

#   date      section/problems                                                                 score
1             1.1: 10,16,24,32;  1.2: 6(b,e),7(b,c),10,28,40;  1.3: 9,11,12,25                 18/20
2             2.2: 9,12,16,21;  2.3: 2,4,6,7;  3.1: 2,14,15,16;  3.2: 9,15,16,17,19,22,23,32   20/20
3             3.3: 4,5,6;  3.4: 4,5,10,12,13,29,40                                             18/20
4             3.5: 1,2,3,5,6,10;  4.1: 15,16,17,18;  4.2: 1,5,13,27,30,33,39,47                20/20, 17/20
5   march 2   4.2: 1,5,13,27,30,33,39,47                                                       26/30
6             4.3: 1(b),(e), 30,31,39,43,45                                                    35/35
7             4.4: 7(a),(c), 21, 37, 40(a),(c);  4.5: 2,5,8,12,22,24                           18/20, 15/15
8   4/4/07    4.6: 2,14,16,17;  4.7: 1,2,6 (including computer assignment on iterative solvers: Richardson, Jacobi, Gauss-Seidel, SOR, steepest descent)   25/25
9   4/24/07   4.7: 9, plus computer assignment on finding eigenvalues using power, inverse power, shifted power, shifted inverse power   20/20
10  5/3/07    5.3: 2,3,14,16,20,29,30,37                                                       10/20
11  5/8/07    6.2: 13,22,26,27,37;  6.3: 4,9,12,23
12  5/15/07                                                                                    100/100

4.2 HW 1


PDF (letter size)

PDF (legal size)

4.2.1 Section 1.1, Problem 10

Problem: Prove or disprove this assertion: if \(f\) is differentiable at \(x\), then for any \(\alpha \neq 1\)

\[ f^{\prime }\left ( x\right ) =\lim _{h\rightarrow 0}\frac {f\left ( x+h\right ) -f\left ( x+\alpha h\right ) }{h-\alpha h}\]

Solution:

Since \(f^{\prime }\left ( x\right ) \) exists, then expanding \(f\left ( x+h\right ) \) and \(f\left ( x+\alpha h\right ) \) in Taylor series results in

\begin{align*} f\left ( x+h\right ) & =f\left ( x\right ) +f^{\prime }\left ( x\right ) h+\cdots \text {(Higher\ Order\ Terms\ involving\ }h^{m}\text { where }m\geq 2\text {)}\\ f\left ( x+\alpha h\right ) & =f\left ( x\right ) +\left ( \alpha h\right ) f^{\prime }\left ( x\right ) +\cdots \text {(Higher\ Order\ Terms\ involving\ }\left ( \alpha h\right ) ^{m}\text { where }m\geq 2\text {)}\end{align*}

From first equation above we write

\begin{equation} f\left ( x\right ) =f\left ( x+h\right ) -hf^{^{\prime }}\left ( x\right ) -(\text {Higher\ Order\ Terms\ involving\ }h^{m}\text {where }m\geq 2) \tag {1}\end{equation}

And from the second equation we write

\begin{equation} f\left ( x\right ) =f\left ( x+\alpha h\right ) -\left ( \alpha h\right ) f^{^{\prime }}\left ( x\right ) -(\text {Higher\ Order\ Terms\ involving\ }\left ( \alpha h\right ) ^{m}\ \text {where }m\geq 2) \tag {2}\end{equation}

Subtracting equation (2) from equation (1) we obtain

\begin{align*} \left [ f\left ( x+h\right ) -hf^{^{\prime }}\left ( x\right ) -O\left ( \ h^{m}\right ) \right ] -\left [ f\left ( x+\alpha h\right ) -\left ( \alpha h\right ) f^{^{\prime }}\left ( x\right ) -O\ \left ( \alpha h\right ) ^{m}\right ] & =0\\ f^{\prime }\left ( x\right ) \left [ \alpha h-h\right ] +f\left ( x+h\right ) -f\left ( x+\alpha h\right ) -O\ \left ( h^{m}\right ) +O\left ( \alpha h\right ) ^{m} & =0 \end{align*}

Keeping \(f^{\prime }\left ( x\right ) \) on one side and moving everything else to the other side results in

\[ f^{\prime }\left ( x\right ) =\frac {f\left ( x+\alpha h\right ) -f\left ( x+h\right ) }{\left ( \alpha h-h\right ) }+\frac {\left ( O\ \left ( h^{m}\right ) -O\left ( \alpha h\right ) ^{m}\right ) }{\left ( \alpha h-h\right ) }\]

As \(h\) goes to zero, the higher order terms vanish (they carry at least one extra factor of \(h\) since \(m\geq 2\)) and the above reduces to

\[ f^{\prime }\left ( x\right ) =\lim _{h\rightarrow 0}\frac {f\left ( x+\alpha h\right ) -f\left ( x+h\right ) }{\left ( \alpha h-h\right ) }\]

Rearranging the signs results in

\[ f^{\prime }\left ( x\right ) =\lim _{h\rightarrow 0}\frac {f\left ( x+h\right ) -f\left ( x+\alpha h\right ) }{\left ( h-\alpha h\right ) }\]

4.2.2 Section 1.1, Problem 16

Problem: If the series for \(\ln x\) is truncated after the term involving \(\left ( x-1\right ) ^{1000}\) and is then used to compute \(\ln 2\), what bound on the error can be given?

Answer:

Assume \(\ln x\) has a power series expansion around \(x_{0}\), we write, from definition of power series

\begin{equation} \ln \left ( x\right ) =a_{0}+a_{1}\left ( x-x_{0}\right ) +a_{2}\left ( x-x_{0}\right ) ^{2}+a_{3}\left ( x-x_{0}\right ) ^{3}+\cdots +a_{n}\left ( x-x_{0}\right ) ^{n}+\cdots \tag {1}\end{equation}

When \(x_{0}=1\) we get

\begin{equation} \ln \left ( x\right ) =a_{0}+a_{1}\left ( x-1\right ) +a_{2}\left ( x-1\right ) ^{2}+a_{3}\left ( x-1\right ) ^{3}+\cdots +a_{n}\left ( x-1\right ) ^{n}+\cdots \tag {2}\end{equation}

At \(x=1\) we obtain \(a_{0}=0\) since \(\ln \left ( 1\right ) =0\)

Differentiate (2)

\begin{equation} \frac {1}{x}=a_{1}+2a_{2}\left ( x-1\right ) +3a_{3}\left ( x-1\right ) ^{2}+\cdots +na_{n}\left ( x-1\right ) ^{n-1}+\cdots \tag {3}\end{equation}

At \(x=1\) we obtain \(a_{1}=1\)

Differentiate (3)

\begin{equation} \frac {-1}{x^{2}}=2a_{2}+\left ( 3\times 2\right ) a_{3}\left ( x-1\right ) +\cdots +n\left ( n-1\right ) a_{n}\left ( x-1\right ) ^{n-2}+\cdots \tag {4}\end{equation}

At \(x=1\) we obtain \(-1=2a_{2}\rightarrow a_{2}=\frac {-1}{2}\)

Differentiate (4)

\begin{equation} \frac {2}{x^{3}}=\left ( 3\times 2\right ) a_{3}+\cdots +n\left ( n-1\right ) \left ( n-2\right ) a_{n}\left ( x-1\right ) ^{n-3}+\cdots \tag {5}\end{equation}

at \(x=1\) we obtain \(a_{3}=\frac {1}{3}\)

continue as above, we obtain the power series for \(\ln \left ( x\right ) \) as

\[ \ln \left ( x\right ) =\left ( x-1\right ) -\frac {1}{2}\left ( x-1\right ) ^{2}+\frac {1}{3}\left ( x-1\right ) ^{3}-\frac {1}{4}\left ( x-1\right ) ^{4}+\cdots +\frac {\left ( -1\right ) ^{n+1}}{n}\left ( x-1\right ) ^{n}+\cdots \]

Notice that for the above to converge, we need \(-1<x-1\leq 1\), i.e. \(0<x\leq 2\)

Now if the series is truncated after \(\left ( x-1\right ) ^{1000}\), then \(n=1000\), and since the series is alternating (for \(x>1\)), the maximum error is bounded by the first omitted term, i.e. the \(\left ( n+1\right ) \) term.

Hence

\begin{align*} E & \leq \left \vert \frac {\left ( -1\right ) ^{1002}}{1001}\left ( x-1\right ) ^{1001}\right \vert \\ & \leq \frac {\left ( x-1\right ) ^{1001}}{1001}\end{align*}

which for \(x=2\)

\begin{align*} E & \leq \frac {\left ( 2-1\right ) ^{1001}}{1001}\\ & \leq \frac {1}{1001}\\ & \leq 9.\,\allowbreak 99\times 10^{-4}\end{align*}

4.2.3 Section 1.1, Problem 24

Problem: For small values of \(x\), how good is the approximation for \(\cos x\approx 1-\frac {1}{2}x^{2}\)? for what range of values will this approximation give correct results rounded to 3 decimal places?

Answer:

Expand \(\cos \left ( x\right ) \) in power series, we write

\[ \cos \left ( x\right ) =a_{0}+a_{1}\left ( x-x_{0}\right ) +a_{2}\left ( x-x_{0}\right ) ^{2}+a_{3}\left ( x-x_{0}\right ) ^{3}+\cdots +a_{n}\left ( x-x_{0}\right ) ^{n}+\cdots \]

expand at \(x_{0}=0\)

\[ \cos \left ( x\right ) =a_{0}+a_{1}x+a_{2}x^{2}+a_{3}x^{3}+\cdots +a_{n}x^{n}+\cdots \]

At \(x=0\rightarrow a_{0}=1\), Differentiate the above

\[ -\sin \left ( x\right ) =a_{1}+2a_{2}x+3a_{3}x^{2}+\cdots +na_{n}x^{n-1}+\cdots \]

at \(x=0\rightarrow a_{1}=0\), Differentiate the above

\[ -\cos \left ( x\right ) =2a_{2}+\left ( 3\times 2\right ) a_{3}x+\cdots +n\left ( n-1\right ) a_{n}x^{n-2}+\cdots \]

at \(x=0\rightarrow a_{2}=-\frac {1}{2}\), continue as above, we obtain the series for \(\cos \left ( x\right ) \) as

\[ \cos \left ( x\right ) =1-\frac {1}{2}x^{2}+\frac {1}{4!}x^{4}-\cdots +\frac {\left ( -1\right ) ^{n}}{\left ( 2n\right ) !}x^{2n}+\cdots \]

Hence if we truncate the series at \(\cos \left ( x\right ) \approx 1-\frac {1}{2}x^{2}\), then the maximum error will be bounded by

\[ E\leq \frac {1}{4!}x^{4}\]

Since we want the error to be correct to 3 decimal places, then we write

\[ E<0.001 \]

Hence

\begin{align*} x^{4} & <4!(0.001)\\ & <0.024 \end{align*}

Hence

\[ x<\left ( 0.024\right ) ^{\frac {1}{4}}=0.393\,60 \]

So for \(x<0.393\,60\) radians (about \(22.55^{\circ }\)), the approximation \(\cos \left ( x\right ) \approx 1-\frac {1}{2}x^{2}\) gives results correct to 3 decimal places.

A small code to verify:
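(the original code was not preserved here; the following is my own small Matlab sketch that scans a range of \(x\) values and reports the largest one for which the error stays below \(0.001\))

% Verify that cos(x) ~ 1 - x^2/2 is accurate to 3 decimal places up to x ~ 0.3936.
x      = 0:0.001:0.5;
approx = 1 - x.^2/2;
err    = abs(cos(x) - approx);
xmax   = x(find(err < 0.001, 1, 'last'));
fprintf('largest x with error < 0.001 : %.4f rad\n', xmax);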

4.2.4 Section 1.1, Problem 32

Problem: First develop the function \(\sqrt {x}\) in a series of powers of \(\left ( x-1\right ) \) and then use it to approximate \(\sqrt {0.99999\ 99995}\) to 10 decimal places.

Solution:

\[ \sqrt {x}=a_{0}+a_{1}\left ( x-x_{0}\right ) +a_{2}\left ( x-x_{0}\right ) ^{2}+a_{3}\left ( x-x_{0}\right ) ^{3}+\cdots +a_{n}\left ( x-x_{0}\right ) ^{n}+\cdots \]

Expand at \(x_{0}=1\)

\[ \sqrt {x}=a_{0}+a_{1}\left ( x-1\right ) +a_{2}\left ( x-1\right ) ^{2}+a_{3}\left ( x-1\right ) ^{3}+\cdots +a_{n}\left ( x-1\right ) ^{n}+\cdots \]

at \(x=1\rightarrow a_{0}=1\), differentiating the above we obtain

\[ \frac {1}{2\sqrt {x}}=a_{1}+2a_{2}\left ( x-1\right ) +3a_{3}\left ( x-1\right ) ^{2}+\cdots +na_{n}\left ( x-1\right ) ^{n-1}+\cdots \]

at \(x=1\rightarrow a_{1}=\frac {1}{2}\), differentiate the above we obtain

\[ \frac {-1}{4\left ( x\right ) ^{3/2}}=2a_{2}+\left ( 3\times 2\right ) a_{3}\left ( x-1\right ) +\cdots +n\left ( n-1\right ) a_{n}\left ( x-1\right ) ^{n-2}+\cdots \]

at \(x=1\rightarrow a_{2}=-\frac {1}{4\times 2}=-\frac {1}{8}\), differentiate the above we obtain

\[ \frac {3}{8\left ( x\right ) ^{5/2}}=\left ( 3\times 2\right ) a_{3}+\cdots +n\left ( n-1\right ) \left ( n-2\right ) a_{n}\left ( x-1\right ) ^{n-3}+\cdots \]

at \(x=1\rightarrow a_{3}=\frac {3}{8\left ( 3\times 2\right ) }=\frac {1}{16}\) differentiating the above gives

\[ -\frac {15}{16\left ( x\right ) ^{7/2}}=\left ( 4\times 3\times 2\right ) a_{4}+\cdots +n\left ( n-1\right ) \left ( n-2\right ) \left ( n-3\right ) a_{n}\left ( x-1\right ) ^{n-4}+\cdots \]

at \(x=1\rightarrow a_{4}=-\frac {15}{16\left ( 4\times 3\times 2\right ) }=-\frac {5}{128}\)

Hence the series is

\[ \sqrt {x}=1+\frac {1}{2}\left ( x-1\right ) -\frac {1}{8}\left ( x-1\right ) ^{2}+\frac {1}{16}\left ( x-1\right ) ^{3}-\frac {5}{128}\left ( x-1\right ) ^{4}+\cdots \]

Note: For convergence we require \(\left \vert x-1\right \vert \leq 1\)

We want accuracy to 10 decimal places. Since

\[ \sqrt {0.99999\ 99995}=0.99999\ 99997\ 5 \]

Then the series, using 2 terms gives

\[ \sqrt {0.99999\ 99995}\approx \left ( 1+\frac {1}{2}\left ( x-1\right ) \right ) _{x=0.9999999995}=1+\frac {1}{2}\left ( 0.9999999995-1\right ) =0.99999\ 99997\ 5 \]

hence only 2 terms are needed, i.e. \(n=1\)

4.2.5 Section 1.2, problem 6(b,e)

Problem: For the pair \(\left ( x_{n},\alpha _{n}\right ) \), is it true that \(x_{n}=O\left ( \alpha _{n}\right ) \) as \(n\rightarrow \infty ?\)

b) \(x_{n}=5n^{2}+9n^{3}+1,\alpha _{n}=1\)

e) \(x_{n}=\sqrt {n+3},\alpha _{n}=\frac {1}{n}\)

Solution:

b) Assume that \(5n^{2}+9n^{3}+1\leq C\left ( \alpha _{n}\right ) \) hence \(5n^{2}+9n^{3}+1\leq C\), but since \(n>1\) and keeps increasing, then no matter how large a \(C\) we select, \(5n^{2}+9n^{3}\) will eventually become larger than any constant \(C\) we choose when \(n>N\,\ \) for sufficiently large \(N\).

Hence there is no such \(C\), hence the answer is NOT TRUE.

e) We see that \(\lim _{n\rightarrow \infty }x_{n}=\infty \), while \(\lim _{n\rightarrow \infty }\frac {1}{n}=0\), hence it is not possible to find a constant \(C\) s.t. \(\sqrt {n+3}\leq C\frac {1}{n}\) for all \(n>N\), no matter which \(N\) is chosen. Hence the answer is NOT TRUE.

4.2.6 Section 1.2, problem 7(b,c)

Problem: Choose the correct assertion (in each, \(n\rightarrow \infty )\)

b) \(\frac {n+1}{\sqrt {n}}=o\left ( 1\right ) \)

c)\(\frac {1}{\ln n}=O\left ( \frac {1}{n}\right ) \)

Solution:

b) \(x_{n}=\frac {n+1}{\sqrt {n}},\alpha _{n}=1\).

\begin{align*} \lim _{n\rightarrow \infty }\left ( \frac {x_{n}}{\alpha _{n}}\right ) & =\lim _{n\rightarrow \infty }\left ( \frac {n+1}{\sqrt {n}}\right ) \\ & =\lim _{n\rightarrow \infty }\left ( \frac {n}{\sqrt {n}}\right ) +\lim _{n\rightarrow \infty }\left ( \frac {1}{\sqrt {n}}\right ) \\ & =\lim _{n\rightarrow \infty }\left ( \sqrt {n}\right ) +0\\ & \neq 0 \end{align*}

Since the limit as \(n\rightarrow \infty \) is not zero, hence the assertion is FALSE

c) \(x_{n}=\frac {1}{\ln n},\alpha _{n}=\frac {1}{n}\). Since \(\ln n\) grows less rapidly than \(n\), \(\frac {1}{\ln n}\) tends to zero more slowly than \(\frac {1}{n}\); in fact \(\frac {x_{n}}{\alpha _{n}}=\frac {n}{\ln n}\rightarrow \infty \). Hence it is not possible to find a constant \(C\) s.t. \(\frac {1}{\ln n}\leq C\frac {1}{n}\), hence the assertion is FALSE

4.2.7 Section 1.2, problem 10

Problem: Show that these assertions are not true:

a) \(e^{x}-1=O\left ( x^{2}\right ) \) as \(x\rightarrow 0\)

b) \(x^{-2}=O\left ( \cot x\right ) \) as \(x\rightarrow 0\)

c) \(\cot x=o\left ( x^{-1}\right ) \) as \(x\rightarrow 0\)

Answer:

a) Expanding \(e^{x}-1\) in a Taylor series, we have

\begin{align*} x_{n} & =e^{x}-1\\ & =\left ( 1+x+\frac {x^{2}}{2!}+\frac {x^{3}}{3!}+\cdots \right ) -1\\ & =x+\frac {x^{2}}{2!}+\frac {x^{3}}{3!}+\cdots \end{align*}

and \(\alpha _{n}=x^{2}\)

As \(x\rightarrow 0\,\ \)the term \(x\) will become larger than \(x^{2}\), hence near \(x=0\), \(x_{n}>\alpha _{n}\) since near \(x=0\) \(x+\frac {x^{2}}{2!}+\frac {x^{3}}{3!}+\cdots >x^{2}\)

Therefore it is not possible to find a constant \(C\) such that \(x_{n}\leq C\alpha _{n}\) near \(x=0\): for any constant \(C\) we select, no matter how large, we can find \(x\) closer to zero such that \(x_{n}>C\alpha _{n}\). Hence the assertion is not true.

b) The power series for \(\cot \left ( x\right ) \) (obtained using a CAS) is \(\cot x=\frac {1}{x}-\frac {x}{3}-\frac {x^{3}}{45}-\cdots \)

Here we have \(x_{n}=x^{-2}\) and \(\alpha _{n}=\frac {1}{x}-\frac {x}{3}-\frac {x^{3}}{45}-\cdots \)

As \(x\rightarrow 0\), \(\ \)then \(\alpha _{n}\rightarrow \frac {1}{x}\)

But \(\frac {1}{x}\) grows less rapidly than \(\frac {1}{x^{2}}\) as \(x\rightarrow 0\), hence it is not possible to find a constant \(C\) such that \(x_{n}\leq C\alpha _{n}\) near \(x=0\): for any constant \(C\) we select, no matter how large, we can find \(x\) closer to zero such that \(x_{n}>C\alpha _{n}\). Hence the assertion is not true.

c) \(x_{n}=\cot \left ( x\right ) =\frac {1}{x}-\frac {x}{3}-\frac {x^{3}}{45}-\cdots ,\alpha _{n}=\frac {1}{x}\), hence

\begin{align*} \lim _{x\rightarrow 0}\frac {x_{n}}{\alpha _{n}} & =\lim _{x\rightarrow 0}\frac {\frac {1}{x}-\frac {x}{3}-\frac {x^{3}}{45}-\cdots }{\frac {1}{x}}\\ & =\lim _{x\rightarrow 0}\frac {\frac {1}{x}\left ( 1-\frac {x^{2}}{3}-\frac {x^{4}}{45}-\cdots \right ) }{\frac {1}{x}}\\ & =\lim _{x\rightarrow 0}\left ( 1-\frac {x^{2}}{3}-\frac {x^{4}}{45}-\cdots \right ) \\ & =1 \end{align*}

Since the limit does not go to zero, hence the assertion is not true.

4.2.8 Section 1.2, problem 28

Problem: Prove that \(x_{n}=x+o\left ( 1\right ) \) iff \(\lim _{n\rightarrow \infty }x_{n}=x\)

Solution:

Let \(X_{n}=x_{n}-x,\alpha _{n}=1\)

Forward direction proof:

\begin{align*} \lim _{n\rightarrow \infty }X_{n} & =\lim _{n\rightarrow \infty }\left ( x_{n}-x\right ) \\ & =\left ( \lim _{n\rightarrow \infty }x_{n}\right ) -\left ( \lim _{n\rightarrow \infty }x\right ) \\ & =\left ( \lim _{n\rightarrow \infty }x_{n}\right ) -x \end{align*}

If \(\lim _{n\rightarrow \infty }x_{n}=x\), then the above becomes \(x-x=0\)

Hence \(\lim _{n\rightarrow \infty }\frac {X_{n}}{\alpha _{n}}=\frac {0}{1}=0\), hence \(X_{n}=o\left ( 1\right ) \) or \(x_{n}-x=o\left ( 1\right ) \) or \(x_{n}=x+o\left ( 1\right ) \)

Now we prove the reverse direction by contraposition. Assume that \(\lim _{n\rightarrow \infty }x_{n}\neq x\); we need to show that this implies \(x_{n}\neq x+o\left ( 1\right ) \)

If \(\lim _{n\rightarrow \infty }x_{n}\neq x\), then we can say that \(\lim _{n\rightarrow \infty }x_{n}=\beta \), where \(\beta \neq x\), hence \(\lim _{n\rightarrow \infty }X_{n}=\beta -x\)

Hence

\begin{align*} \lim _{n\rightarrow \infty }\frac {X_{n}}{\alpha _{n}} & =\lim _{n\rightarrow \infty }\frac {\beta -x}{\alpha _{n}}\\ & =\frac {\beta -x}{1}\\ & =\beta -x \end{align*}

But since \(\beta \neq x\), this limit does not go to zero. Hence \(x_{n}\neq x+o\left ( 1\right ) \). This completes the proof.

4.2.9 Section 1.2, problem 40

Problem: Prove: If \(\alpha _{n}\rightarrow 0\), \(x_{n}=O\left ( \alpha _{n}\right ) \), and \(y_{n}=O\left ( \alpha _{n}\right ) \) then \(x_{n}y_{n}=o\left ( \alpha _{n}\right ) \)

Answer: Since \(x_{n}=O\left ( \alpha _{n}\right ) \) then \(\left \vert x_{n}\right \vert \leq C_{1}\left \vert \alpha _{n}\right \vert \), and since \(y_{n}=O\left ( \alpha _{n}\right ) \) then \(\left \vert y_{n}\right \vert \leq C_{2}\left \vert \alpha _{n}\right \vert ,\) where \(C_{1},C_{2}\) are positive constants.

Hence

\begin{align*} \left \vert x_{n}y_{n}\right \vert & \leq C_{1}C_{2}\ \alpha _{n}^{2}\\ & \leq C\ \alpha _{n}^{2}\end{align*}

Where \(C=C_{1}C_{2}\)

Dividing by \(\left \vert \alpha _{n}\right \vert \) (for \(\alpha _{n}\neq 0\)) gives \(\left \vert \frac {x_{n}y_{n}}{\alpha _{n}}\right \vert \leq C\left \vert \alpha _{n}\right \vert \).

But we are told that \(\lim _{n\rightarrow \infty }\alpha _{n}=0\), hence \(\lim _{n\rightarrow \infty }\frac {x_{n}y_{n}}{\alpha _{n}}=0\), which is exactly the statement that \(x_{n}y_{n}=o\left ( \alpha _{n}\right ) \)

4.2.10 Section 1.3, problem 9

Problem: Prove that if \(L_{1}\) and \(L_{2}\) are linear combinations of powers of \(E\) and if \(L_{1}x=0\), then \(L_{1}L_{2}x=0\)

Answer: Let \(L_{1}=a_{1}E^{n_{1}}+a_{2}E^{n_{2}}+\cdots \) and \(L_{2}=b_{1}E^{m_{1}}+b_{2}E^{m_{2}}+\cdots \)

Then

\begin{align*} L_{1}L_{2}x & =\left ( a_{1}E^{n_{1}}+a_{2}E^{n_{2}}+\cdots \right ) \left ( b_{1}E^{m_{1}}+b_{2}E^{m_{2}}+\cdots \right ) x\\ & =\left ( a_{1}E^{n_{1}}+a_{2}E^{n_{2}}+\cdots \right ) \left ( b_{1}E^{m_{1}}x+b_{2}E^{m_{2}}x+\cdots \right ) \\ & \\ & =\left ( a_{1}E^{n_{1}}\left ( b_{1}E^{m_{1}}x\right ) +a_{2}E^{n_{2}}\left ( b_{1}E^{m_{1}}x\right ) +\cdots \right ) \\ & +\left ( a_{1}E^{n_{1}}\left ( b_{2}E^{m_{2}}x\right ) +a_{2}E^{n_{2}}\left ( b_{2}E^{m_{2}}x\right ) +\cdots \right ) \\ & +\cdots \\ & =\left ( b_{1}a_{1}E^{n_{1}}\left ( E^{m_{1}}x\right ) +b_{1}a_{2}E^{n_{2}}\left ( E^{m_{1}}x\right ) +\cdots \right ) \\ & +\left ( b_{2}a_{1}E^{n_{1}}\left ( E^{m_{2}}x\right ) +b_{2}a_{2}E^{n_{2}}\left ( E^{m_{2}}x\right ) +\cdots \right ) \\ & +\cdots \\ & =\left ( b_{1}a_{1}E^{m_{1}}\left ( E^{n_{1}}x\right ) +b_{1}a_{2}E^{m_{1}}\left ( E^{n_{2}}x\right ) +\cdots \right ) \\ & +\left ( b_{2}a_{1}E^{m_{2}}\left ( E^{n_{1}}x\right ) +b_{2}a_{2}E^{m_{2}}\left ( E^{n_{2}}x\right ) +\cdots \right ) \\ & +\cdots \\ & =\left ( b_{1}E^{m_{1}}+b_{2}E^{m_{2}}+\cdots \right ) \left ( a_{1}E^{n_{1}}x+a_{2}E^{n_{2}}x+\cdots \right ) \\ & =L_{2}L_{1}x\\ & =L_{2}\left ( 0\right ) \\ & =0 \end{align*}

4.2.11 Section 1.3, Problem 11

Problem: Give bases consisting of real sequences for each solution space.

a) \(\left ( 4E^{0}-3E^{2}+E^{3}\right ) x=0\)

b) \(\left ( 3E^{0}-2E+E^{2}\right ) x=0\)

c) \(\left ( 2E^{6}-9E^{5}+12E^{4}-4E^{3}\right ) x=0\)

d) \(\left ( \pi E^{2}-\sqrt {2}E+E^{0}\log 2\right ) x=0\)

Solution:

a) Characteristic equation is \(\lambda ^{3}-3\lambda ^{2}+4=0\), or \(\left ( \lambda +1\right ) \left ( \lambda -2\right ) ^{2}=0\), hence the roots are \(\lambda =-1\), and \(\lambda =2\) of multiplicity 2.

i.e. \(\lambda _{1}=-1,\lambda _{2}=2,\lambda _{3}=2\)

Hence the first solution \(x_{1}\left ( n\right ) \ \)associated with \(\lambda _{1}=-1\ \)is \(x_{1}\left ( n\right ) =\lambda _{1}^{n}=\left ( -1\right ) ^{n}\)

the second solution \(x_{2}\left ( n\right ) \ \)associated with \(\lambda _{2}=2\ \)is \(x_{2}\left ( n\right ) =\lambda _{2}^{n}=2^{n}\)

the third solution \(x_{3}\left ( n\right ) \ \)associated with \(\lambda _{3}=2\ \)is \(x_{3}\left ( n\right ) =\frac {dx_{2}\left ( n\right ) }{d\lambda }=n\lambda _{2}^{n-1}=n2^{n-1}\)

Hence now we can write some terms of the above 3 basis solutions are follows

\begin{align*} x_{1}\left ( n\right ) & =\left [ \lambda _{1}^{1},\lambda _{1}^{2},\lambda _{1}^{3},\cdots \right ] \\ & =\left [ -1,-1^{2},-1^{3},\cdots \right ] \\ & =\left [ -1,1,-1,1,\cdots \right ] \\ & \\ x_{2}\left ( n\right ) & =\left [ \lambda _{2}^{1},\lambda _{2}^{2},\lambda _{2}^{3},\cdots \right ] \\ & =\left [ 2^{1},2^{2},2^{3},\cdots \right ] \\ & =\left [ 2,4,8,16,32,\cdots \right ] \\ & \\ x_{3}\left ( n\right ) & =\left [ \lambda _{2}^{0},2\lambda _{2}^{1},3\lambda _{2}^{2},4\lambda _{2}^{3},\cdots \right ] \\ & =\left [ \left ( 2^{0}\right ) ,2\left ( 2^{1}\right ) ,3\left ( 2^{2}\right ) ,4\left ( 2^{3}\right ) ,5\left ( 2^{4}\right ) ,\cdots \right ] \\ & =\left [ 1,4,12,32,80,\cdots \right ] \end{align*}

Hence the basis are

\begin{align*} & \left [ -1,1,-1,1,\cdots \right ] \\ & \left [ 2,4,8,16,32,\cdots \right ] \\ & \left [ 1,4,12,32,80,\cdots \right ] \end{align*}

b)\(\left ( 3E^{0}-2E+E^{2}\right ) x=0\)

Characteristic equation is \(\lambda ^{2}-2\lambda +3=0\), The roots are

\begin{align*} \lambda _{1} & =1+\sqrt {2}i\\ \lambda _{2} & =1-\sqrt {2}i \end{align*}

Hence first solution \(x_{1}\left ( n\right ) \ \)associated with \(\lambda _{1}=1+\sqrt {2}i\ \)is \(x_{1}\left ( n\right ) =\lambda _{1}^{n}=\left ( 1+\sqrt {2}i\right ) ^{n}\)

the second solution \(x_{2}\left ( n\right ) \)associated with \(\lambda _{2}=1-\sqrt {2}i\ \)is \(x_{2}\left ( n\right ) =\lambda _{2}^{n}=\left ( 1-\sqrt {2}i\right ) ^{n}\)

Hence

\begin{align*} x_{1}\left ( n\right ) & =\lambda _{1}^{n}\\ & =\left [ \left ( 1+\sqrt {2}i\right ) ^{1},\left ( 1+\sqrt {2}i\right ) ^{2},\left ( 1+\sqrt {2}i\right ) ^{3},\cdots \right ] \\ & =\left [ \left ( 1+\sqrt {2}i\right ) ,\left ( -1+2i\sqrt {2}\right ) ,\left ( -5+i\sqrt {2}\right ) ,\left ( -7-4i\sqrt {2}\right ) ,\cdots \right ] \\ & \\ x_{2}\left ( n\right ) & =\left [ \left ( 1-\sqrt {2}i\right ) ^{1},\left ( 1-\sqrt {2}i\right ) ^{2},\left ( 1-\sqrt {2}i\right ) ^{3},\cdots \right ] \\ & =\left [ \left ( 1-\sqrt {2}i\right ) ,\left ( -1-2i\sqrt {2}\right ) ,\left ( -5-i\sqrt {2}\right ) ,\left ( -7+4i\sqrt {2}\right ) \cdots \right ] \end{align*}

Notice that the 2 basis are conjugate to each others in each term in the sequence.

c)\(\left ( 2E^{6}-9E^{5}+12E^{4}-4E^{3}\right ) x=0\)

Characteristic equation is \(2\lambda ^{6}-9\lambda ^{5}+12\lambda ^{4}-4\lambda ^{3}=0\)

Factoring we obtain

\[ \lambda ^{3}\left ( 2\lambda -1\right ) \left ( \lambda -2\right ) ^{2}=0 \]

hence the solutions are

\(\lambda =0\) with multiplicity 3, \(\lambda =\frac {1}{2},\lambda =2\) with multiplicity 2.

Hence Solutions associated with \(\lambda =0\) are

\(x_{1}\left ( n\right ) =\lambda ^{n},x_{2}\left ( n\right ) =n\lambda ^{n-1},x_{3}\left ( n\right ) =n\left ( n-1\right ) \lambda ^{n-2}\)

Hence \(x_{1}\left ( n\right ) =\left [ 0,0,0,\cdots \right ] \), and \(x_{2}\) and \(x_{3}\) are also the null sequence.

Solution associated with \(\lambda =\frac {1}{2}\) is \(x_{4}\left ( n\right ) =\lambda ^{n}=\left ( \frac {1}{2}\right ) ^{n}=\left [ \frac {1}{2},\frac {1}{4},\frac {1}{8},\frac {1}{16},\cdots \right ] \)

Solutions associated with \(\lambda =2\) are \(x_{5}\left ( n\right ) =\lambda ^{n}=2^{n}=\left [ 2,4,8,16,\cdots \right ] \)

and \(x_{6}\left ( n\right ) =\frac {dx_{5}}{d\lambda }=n\lambda ^{n-1}=n2^{n-1}=\left [ 1,2\left ( 2\right ) ,3\left ( 2^{2}\right ) ,4\left ( 2^{3}\right ) ,\cdots \right ] =\left [ 1,4,12,32,\cdots \right ] \)

Hence the basis are

\begin{align*} & \left [ 0,0,0,\cdots \right ] \\ & \left [ \frac {1}{2},\frac {1}{4},\frac {1}{8},\frac {1}{16},\cdots \right ] \\ & \left [ 2,4,8,16,\cdots \right ] \\ & \left [ 1,4,12,32,\cdots \right ] \end{align*}

d)\(\left ( \pi E^{2}-\sqrt {2}E+E^{0}\log 2\right ) x\)

Characteristic equation is

\begin{align*} \pi \lambda ^{2}-\sqrt {2}\lambda +\log 2 & =0\\ \lambda ^{2}-\frac {\sqrt {2}}{\pi }\lambda +\frac {\log 2}{\pi } & =0\\ \lambda ^{2}-0.450\,16\ \lambda +0.220\,64 & =0 \end{align*}
\begin{align*} \lambda & =\frac {-b\pm \sqrt {b^{2}-4ac}}{2a}=\frac {0.450\,16\pm \sqrt {0.450\,16^{2}-4\times 0.220\,64}}{2}\\ & =\allowbreak \frac {0.450\,16\pm \sqrt {-0.679\,92}}{2}\\ & =\frac {0.450\,16\pm i0.824\,57}{2}\end{align*}

Hence \(\lambda _{1}=0.225079+i0.41228\) and \(\lambda _{2}=0.225079-i0.41228\)

Hence

\begin{align*} x_{1} & =\lambda _{1}^{n}\\ & =\left ( 0.225079+i0.41228\right ) ^{n}\end{align*}

and

\begin{align*} x_{2} & =\lambda _{2}^{n}\\ & =\left ( 0.225079-i0.41228\right ) ^{n}\end{align*}

4.2.12 Section 1.3, Problem 12

Problem: Prove that if P is a polynomial with real coefficients and if \(z\equiv \left [ z_{1},z_{2},z_{3},\cdots \right ] \) is a complex solution of \(p\left ( E\right ) z=0,\) then the conjugate of \(z\), the real part of \(z\) and the imaginary part of \(z\) are also solutions.

Solution:

\[ P\left ( E\right ) z=0 \]

Take conjugate of both sides

\begin{align*} \overline {P\left ( E\right ) \ z} & =\overline {0}\\ \overline {P\left ( E\right ) }\ \overline {z} & =0 \end{align*}

But

\[ P\left ( E\right ) =a_{0}E^{0}+a_{1}E^{1}+a_{2}E^{2}+\cdots \]

and all the \(a^{\prime }s\) are real, hence \(\overline {P\left ( E\right ) }=P\left ( E\right ) \), then

\begin{equation} P\left ( E\right ) \bar {z}=0 \tag {1}\end{equation}

Now take the real part of both sides of \(P\left ( E\right ) z=0\). Since \(P\left ( E\right ) \) is a linear operator with real coefficients, \(\operatorname {Re}\left ( P\left ( E\right ) z\right ) =P\left ( E\right ) \operatorname {Re}\left ( z\right ) \), hence

\begin{equation} P\left ( E\right ) \operatorname {Re}\left ( z\right ) =0 \tag {2}\end{equation}

For the last part, let

\[ z=\operatorname {Re}\left ( z\right ) +i\ \operatorname {Im}\left ( z\right ) \]

Then \(P\left ( E\right ) z=0\) can be written as

\begin{align*} P\left ( E\right ) \left \{ \operatorname {Re}\left ( z\right ) +i\ \operatorname {Im}\left ( z\right ) \right \} & =0\\ P\left ( E\right ) \operatorname {Re}\left ( z\right ) +i\ P\left ( E\right ) \operatorname {Im}\left ( z\right ) & =0 \end{align*}

But from (2) we see that \(P\left ( E\right ) \operatorname {Re}\left ( z\right ) =0\,,\) hence the above becomes

\[ i\ P\left ( E\right ) \operatorname {Im}\left ( z\right ) =0 \]

Hence

\[ P\left ( E\right ) \operatorname {Im}\left ( z\right ) =0 \]

4.2.13 Section 1.3 problem 25

Problem: Determine if the difference equation \(x_{n}=x_{n-1}+x_{n-2}\) is stable.

Solution: Using the shift operator, we write \(E^{2}x_{n-2}=Ex_{n-1}+E^{0}x_{n-2}\)

Hence

\begin{align*} E^{2}x_{n-2}-Ex_{n-2}-E^{0}x_{n-2} & =0\\ \left ( E^{2}-E-1\right ) x_{n-2} & =0 \end{align*}

Hence the characteristic polynomial of \(p\left ( E\right ) x=0\) is \(\lambda ^{2}-\lambda -1=0\), whose roots are \(\lambda =\frac {-b\pm \sqrt {b^{2}-4ac}}{2a}=\frac {1\pm \sqrt {1+4}}{2}=\frac {1\pm \sqrt {5}}{2}\)

Hence \(\left \vert \lambda _{1}\right \vert =\left \vert \frac {1+\sqrt {5}}{2}\right \vert =1.\,\allowbreak 618\) and \(\left \vert \lambda _{2}\right \vert =\left \vert \frac {1-\sqrt {5}}{2}\right \vert =0.618\,03\)

Since \(\left \vert \lambda _{1}\right \vert >1\), the difference equation is NOT STABLE.

4.3 HW 2

PDF

4.4 HW 3

4.4.1 HW 3. analytical part

PDF

4.4.2 HW 3. Computer part Matlab horner method, Taylor approx, Bisection and secant

PDF

4.4.3 Key solution

PDF

4.5 HW 4

PDF

4.6 HW 5

PDF

4.7 HW 6

PDF

4.8 HW 7

PDF

4.9 HW 8

4.9.1 Analytic problems

4.9.1.1 section 4.6, problem 2

question: Prove that if \(A\) has this property (unit row diagonally dominant)

\[ a_{ii}=1>{\displaystyle \sum \limits _{\substack {j=1\\j\neq i}}^{n}} \left \vert a_{ij}\right \vert \ \ \ \ \ \ \ \ \left ( 1\leq i\leq n\right ) \]

then Richardson iteration is successful.

Solution:

Since the iterative formula is

\begin{align*} x_{k+1} & =x_{k}+Q^{-1}\left ( b-Ax_{k}\right ) \\ & =x_{k}+Q^{-1}b-Q^{-1}Ax_{k}\\ & =(I-Q^{-1}A)x_{k}+Q^{-1}b \end{align*}

This converges, by theorem (1) on page 210 when \(\Vert I-Q^{-1}A\Vert <1\)

In Richardson method, \(Q=I\), hence Richardson converges if \(\Vert I-A\Vert <1\)

But \(\Vert I-A\Vert =\left \Vert \begin {bmatrix} 1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ 0 & \cdots & \ddots & \vdots \\ 0 & \cdots & \cdots & 1 \end {bmatrix} -\begin {bmatrix} 1 & a_{12} & \cdots & a_{1n}\\ a_{21} & 1 & \cdots & a_{2n}\\ \vdots & \cdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 1 \end {bmatrix} \right \Vert =\left \Vert \begin {bmatrix} 0 & -a_{12} & \cdots & -a_{1n}\\ -a_{21} & 0 & \cdots & -a_{2n}\\ \vdots & \cdots & \ddots & \vdots \\ -a_{n1} & -a_{n2} & \cdots & 0 \end {bmatrix} \right \Vert \)

But since \(A\) is unit row diagonally dominant, the sum of the absolute values of the remaining elements in each row (after the diagonal entry \(a_{ii}\) has been cancelled) is less than 1. Hence each row of the matrix above has absolute row sum less than 1, and therefore the infinity norm of that matrix, which is the maximum absolute row sum, is less than 1. Hence

\[ \Vert I-A\Vert <1 \]

Hence Richardson will converge. Each iteration will move closer to the solution \(x^{\ast }\)
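A small Matlab sketch of the Richardson iteration (my own; the matrix below is a made-up example satisfying \(a_{ii}=1>\sum _{j\neq i}\left \vert a_{ij}\right \vert \)):

% Richardson iteration x_{k+1} = x_k + (b - A*x_k), i.e. Q = I.
A = [1    0.3  0.2;
     0.1  1    0.4;
     0.25 0.25 1  ];
b = [1; 2; 3];
x = zeros(3,1);
for k = 1:200
    x = x + (b - A*x);
end
disp(norm(A*x - b))               % residual should be essentially zero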

4.9.1.2 problem 14

Problem: Prove that the eigenvalues of a Hermitian matrix are real.

Answer: \(A\) is Hermitian if \(\overline {\left ( A^{T}\right ) }=A\,\ \), where the bar above indicates taking the complex conjugate. Hence the matrix is transposed and then each element will be complex conjugated.

Now, an eigenvalue of a matrix is defined such as

\[ Ax=\lambda x \]

premultiply both sides by \(\overline {\left ( x^{T}\right ) }\)

\begin{align*} \overline {\left ( x^{T}\right ) }Ax & =\overline {\left ( x^{T}\right ) }\lambda x\\ \overline {\left ( x^{T}\right ) }Ax & =\lambda \overline {\left ( x^{T}\right ) }x \end{align*}

But since \(A\) is Hermitian, then \(\overline {\left ( A^{T}\right ) }=A\), hence the above becomes

\begin{align*} \overline {\left ( x^{T}\right ) }\overline {\left ( A^{T}\right ) }x & =\lambda \overline {\left ( x^{T}\right ) }x\\ \overline {\left ( x^{T}A^{T}\right ) }x & =\lambda \overline {\left ( x^{T}\right ) }x\\ \overline {\left ( Ax\right ) ^{T}}x & =\lambda \overline {\left ( x^{T}\right ) }x \end{align*}

But \(Ax=\lambda x\), hence the above becomes

\begin{align*} \overline {\left ( \lambda x\right ) ^{T}}x & =\lambda \overline {\left ( x^{T}\right ) }x\\ \overline {\left ( x^{T}\lambda \right ) }x & =\lambda \overline {\left ( x^{T}\right ) }x\\ \overline {\left ( x^{T}\right ) }\overline {\lambda }x & =\lambda \overline {\left ( x^{T}\right ) }x\\ \overline {\lambda }\overline {\left ( x^{T}\right ) }x & =\lambda \overline {\left ( x^{T}\right ) }x\\ \overline {\lambda } & =\lambda \end{align*}

Hence, since \(\overline {\left ( x^{T}\right ) }x=\left \Vert x\right \Vert ^{2}>0\) for an eigenvector \(x\neq 0\), we may cancel it in the last step, and the complex conjugate of the eigenvalue equals the eigenvalue itself. Therefore \(\lambda \) is real.

4.9.1.3 problem 16

Problem: Prove that if \(A\) is nonsingular, then \(A\overline {A^{T}}\) is positive definite.

Answer: \(A\) is nonsingular, meaning \(A^{-1}\) exists, so \(Ax=0\) only when \(x=0\); the same holds for \(\overline {A^{T}}\), since it is also nonsingular. To show that a matrix \(M\) is positive definite we need to show that \(\overline {x^{T}}Mx>0\) for all \(x\neq 0\).

Let \(y=\overline {A^{T}}x\). Then \(\overline {y}=A^{T}\overline {x}\), so \(\overline {y^{T}}=\overline {x^{T}}A\), and therefore

\[ \overline {x^{T}}\left ( A\overline {A^{T}}\right ) x=\overline {y^{T}}y=\sum _{i}\left \vert y_{i}\right \vert ^{2}=\left \Vert y\right \Vert ^{2}\geq 0 \]

with equality only when \(y=0\). But \(y=\overline {A^{T}}x=0\) implies \(x=0\) because \(\overline {A^{T}}\) is nonsingular. Hence for every \(x\neq 0\) we have \(\overline {x^{T}}\left ( A\overline {A^{T}}\right ) x>0\), i.e. \(A\overline {A^{T}}\) is positive definite.

4.9.1.4 problem 17

Problem: Prove that if \(A\) is positive definite, then its eigenvalues are positive\(.\)

Answer:  \(A\) is positive definite implies \(x^{T}Ax>0\) for all \(x\neq 0\). i.e. \(\left \langle x,Ax\right \rangle >0\).

But \(Ax=\lambda x\,\), hence

\[ \left \langle x,\lambda x\right \rangle >0 \]

or

\[ \lambda \left \langle x,x\right \rangle >0 \]

But \(\left \langle x,x\right \rangle =\Vert x\Vert ^{2}>0\) unless \(x=0\), therefore

\[ \lambda >0 \]
4.9.1.5 section 4.7, problem 1

question: Prove that if \(A\) is symmetric, then the gradient of the function \(q\left ( x\right ) =\left \langle x,Ax\right \rangle -2\left \langle x,b\right \rangle \) at \(x\) is \(2\left ( Ax-b\right ) \). Recall that the gradient of a function \(g:\mathbb {R} ^{n}\rightarrow \mathbb {R} \) is the vector whose components are \(\frac {\partial g}{\partial x_{i}}\) for \(i=1,2,\cdots n\)

answer:

\(\left \langle a,b\right \rangle \) is \(a^{T}b\); hence, using this definition, we can expand \(q\left ( x\right ) \) and see that differentiating it gives the required result.

\begin{align*} \left \langle x,Ax\right \rangle & =x^{T}\left ( Ax\right ) \\ & =\left [ x_{1}\ x_{2}\ \cdots \ x_{n}\right ] \begin {bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn}\end {bmatrix}\begin {bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{n}\end {bmatrix} \\ & =\left [ x_{1}\ x_{2}\ \cdots \ x_{n}\right ] \begin {bmatrix} a_{11}x_{1}+a_{12}x_{2}+\cdots +a_{1n}x_{n}\\ a_{21}x_{1}+a_{22}x_{2}+\cdots +a_{2n}x_{n}\\ \vdots \\ a_{n1}x_{1}+a_{n2}x_{2}+\cdots +a_{nn}x_{n}\end {bmatrix} \\ & =x_{1}\left ( a_{11}x_{1}+a_{12}x_{2}+\cdots +a_{1n}x_{n}\right ) +x_{2}\left ( a_{21}x_{1}+a_{22}x_{2}+\cdots +a_{2n}x_{n}\right ) \\ & +x_{3}\left ( a_{31}x_{1}+a_{32}x_{2}+\cdots +a_{3n}x_{n}\right ) +\cdots +x_{n}\left ( a_{n1}x_{1}+a_{n2}x_{2}+\cdots +a_{nn}x_{n}\right ) \\ & =\left ( a_{11}x_{1}^{2}+a_{12}x_{1}x_{2}+\cdots +a_{1n}x_{1}x_{n}\right ) +\left ( a_{21}x_{2}x_{1}+a_{22}x_{2}^{2}+\cdots +a_{2n}x_{2}x_{n}\right ) \\ & +\left ( a_{31}x_{1}x_{3}+a_{32}x_{2}x_{3}+a_{33}x_{3}^{2}\cdots +a_{3n}x_{n}x_{3}\right ) +\cdots +\left ( a_{n1}x_{n}x_{1}+a_{n2}x_{n}x_{2}+\cdots +a_{nn}x_{n}^{2}\right ) \end{align*}

And

\begin{align*} \left \langle x,b\right \rangle & =x^{T}b\\ & =\left [ x_{1}\ x_{2}\ \cdots \ x_{n}\right ] \begin {bmatrix} b_{1}\\ b_{2}\\ \vdots \\ b_{n}\end {bmatrix} \\ & =x_{1}b_{1}+x_{2}b_{2}+\ \cdots +x_{n}b_{n}\end{align*}

Hence Putting the above together, we obtain

\begin{align*} q\left ( x\right ) & =\left \langle x,Ax\right \rangle -2\left \langle x,b\right \rangle \\ & =\left ( a_{11}x_{1}^{2}+a_{12}x_{1}x_{2}+\cdots +a_{1n}x_{1}x_{n}\right ) +\left ( a_{21}x_{2}x_{1}+a_{22}x_{2}^{2}+\cdots +a_{2n}x_{2}x_{n}\right ) \\ & +\left ( a_{31}x_{1}x_{3}+a_{32}x_{2}x_{3}+a_{33}x_{3}^{2}+\cdots +a_{3n}x_{n}x_{3}\right ) +\cdots +\left ( a_{n1}x_{n}x_{1}+a_{n2}x_{n}x_{2}+\cdots +a_{nn}x_{n}^{2}\right ) \\ & -2\left ( x_{1}b_{1}+x_{2}b_{2}+\ \cdots +x_{n}b_{n}\right ) \end{align*}

Now taking the derivative of the above w.r.t \(x_{1},x_{2},\cdots ,x_{n}\) to generate the gradient vector, we obtain

\begin{align*} \frac {\partial q\left ( x\right ) }{\partial x_{1}} & =\left ( 2a_{11}x_{1}+a_{12}x_{2}+\cdots +a_{1n}x_{n}\right ) +\left ( a_{21}x_{2}\right ) +\left ( a_{31}x_{3}\right ) +\cdots +\left ( a_{n1}x_{n}\right ) -2\left ( b_{1}\right ) \\ \frac {\partial q\left ( x\right ) }{\partial x_{2}} & =\left ( a_{12}x_{1}\right ) +\left ( a_{21}x_{1}+2a_{22}x_{2}+\cdots +a_{2n}x_{n}\right ) +\left ( a_{32}x_{3}\right ) +\cdots +\left ( a_{n2}x_{n}\right ) -2\left ( b_{2}\right ) \\ & \vdots \\ \frac {\partial q\left ( x\right ) }{\partial x_{n}} & =\left ( a_{1n}x_{1}\right ) +\left ( a_{2n}x_{2}\right ) +\left ( a_{3n}x_{3}\right ) +\cdots +\left ( a_{n1}x_{1}+a_{n2}x_{2}+\cdots +2a_{nn}x_{n}\right ) -2\left ( b_{n}\right ) \end{align*}

Hence

\begin{align*} \frac {\partial q\left ( x\right ) }{\partial x_{1}} & =\left [ 2a_{11},\ a_{12,}\ \cdots ,a_{1n}\right ] \begin {bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{n}\end {bmatrix} +\left [ 0\ ,a_{21},a_{31},\ \cdots ,a_{n1}\right ] \begin {bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{n}\end {bmatrix} -2\left ( b_{1}\right ) \\ \frac {\partial q\left ( x\right ) }{\partial x_{2}} & =\left [ a_{21}\ ,2a_{22}\ ,\cdots ,a_{2n}\right ] \begin {bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{n}\end {bmatrix} +\left [ a_{12}\ ,0,\ \cdots ,a_{n2}\right ] \begin {bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{n}\end {bmatrix} -2\left ( b_{2}\right ) \\ & \vdots \\ \frac {\partial q\left ( x\right ) }{\partial x_{n}} & =\left [ a_{n1}\ ,a_{n2},\ \cdots ,2a_{nn}\right ] \begin {bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{n}\end {bmatrix} +\left [ a_{n1},\ a_{n2,}\ \cdots ,0\right ] \begin {bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{n}\end {bmatrix} -2\left ( b_{n}\right ) \end{align*}

Combine, we get

\begin{align*} \frac {\partial q\left ( x\right ) }{\partial x_{1}} & =\left [ 2a_{11},\ 2a_{12,}\ \cdots ,2a_{1n}\right ] \begin {bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{n}\end {bmatrix} -2\left ( b_{1}\right ) \\ \frac {\partial q\left ( x\right ) }{\partial x_{2}} & =\left [ 2a_{21}\ ,2a_{22}\ ,\cdots ,2a_{2n}\right ] \begin {bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{n}\end {bmatrix} -2\left ( b_{2}\right ) \\ & \vdots \\ \frac {\partial q\left ( x\right ) }{\partial x_{n}} & =\left [ 2a_{n1}\ ,2a_{n2},\ \cdots ,2a_{nn}\right ] \begin {bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{n}\end {bmatrix} -2\left ( b_{n}\right ) \end{align*}

Hence, in Vector/Matrix notation we obtain

\begin{align*} \frac {\partial q\left ( x\right ) }{\partial \vec {x}} & =2A\vec {x}-2\vec {b}\\ & =2\left ( A\vec {x}-\vec {b}\right ) \end{align*}

Which is what we are asked to show.

(it would also have been possible to solve the above by expressing everything in summation notation, i.e. writing \(\left ( Ax\right ) _{i}=\sum _{j}^{n}A\left ( i,j\right ) x_{j}\), and applying the differentiations directly to these summation expressions without expanding them as I did above; that would probably have been a shorter solution)
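As a numerical sanity check (my own, not part of the HW), the gradient formula can be verified against centered finite differences on a random symmetric matrix:

% Check that grad q(x) = 2(Ax - b) for symmetric A, q(x) = <x,Ax> - 2<x,b>.
n = 4;
A = rand(n);  A = A + A.';               % make A symmetric
b = rand(n,1);  x = rand(n,1);
q = @(x) x.'*A*x - 2*x.'*b;
g = zeros(n,1);  h = 1e-6;
for i = 1:n
    e = zeros(n,1);  e(i) = h;
    g(i) = (q(x+e) - q(x-e))/(2*h);      % centered difference
end
disp(norm(g - 2*(A*x - b)))              % should be near machine precision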

4.9.1.6 problem 2

Question: Prove that the minimum value of \(q\left ( x\right ) \) is \(-\left \langle b,A^{-1}b\right \rangle \)

Solution:

\[ q\left ( x\right ) =\left \langle x,Ax\right \rangle -2\left \langle x,b\right \rangle \]

From problem (1) we found that \(\frac {\partial q\left ( x\right ) }{\partial \vec {x}}=2\left ( A\vec {x}-\vec {b}\right ) \), hence setting this to zero

\begin{align*} A\vec {x}-\vec {b} & =0\\ \vec {x} & =A^{-1}b \end{align*}

To check if this is a min,max, or saddle, we differentiate \(\frac {\partial q\left ( x\right ) }{\partial \vec {x}}\) once more, and plug in this solution and check

\begin{align*} \frac {\partial }{\partial \vec {x}}\left ( \frac {\partial q\left ( x\right ) }{\partial \vec {x}}\right ) & =2\frac {\partial }{\partial \vec {x}}\left ( A\vec {x}-\vec {b}\right ) \\ & =2A \end{align*}

(\(A\) needs to be positive definite here, problem did not say this?) Hence this is a minimum.

Hence minimum value \(q\left ( x\right ) \) is

\[ q_{\min }\left ( x\right ) =\left \langle A^{-1}b,Ax\right \rangle -2\left \langle A^{-1}b,b\right \rangle \]

But \(Ax=b\,\), hence

\begin{align*} q_{\min }\left ( x\right ) & =\left \langle A^{-1}b,b\right \rangle -2\left \langle A^{-1}b,b\right \rangle \\ & =-\left \langle A^{-1}b,b\right \rangle \\ & =-\left \langle b,A^{-1}b\right \rangle \end{align*}
4.9.1.7 problem 6

question: Prove that if \(\hat {t}=\frac {\left \langle v,b-Ax\right \rangle }{\left \langle v,Av\right \rangle }\),and if \(y=x+\hat {t}v\) then \(\left \langle v,b-Ay\right \rangle =0\)

answer:

\[ y=x+\frac {\left \langle v,b-Ax\right \rangle }{\left \langle v,Av\right \rangle }v \]

Then

\begin{align*} \left \langle v,b-Ay\right \rangle & =\left \langle v,b-A\left ( x+\frac {\left \langle v,b-Ax\right \rangle }{\left \langle v,Av\right \rangle }v\right ) \right \rangle \\ & =\left \langle v,b-Ax-A\frac {\left \langle v,b-Ax\right \rangle }{\left \langle v,Av\right \rangle }v\right \rangle \\ & =\left \langle v,b-Ax-Av\frac {\left \langle v,b-Ax\right \rangle }{\left \langle v,Av\right \rangle }\right \rangle \\ & =v^{T}\left ( b-Ax-Av\frac {\left \langle v,b-Ax\right \rangle }{\left \langle v,Av\right \rangle }\right ) \\ & =v^{T}b-v^{T}Ax-v^{T}Av\frac {\left \langle v,b-Ax\right \rangle }{\left \langle v,Av\right \rangle }\\ & =v^{T}b-v^{T}Ax-\left \langle v,Av\right \rangle \frac {\left \langle v,b-Ax\right \rangle }{\left \langle v,Av\right \rangle }\\ & =v^{T}b-v^{T}Ax-\left \langle v,b-Ax\right \rangle \\ & =v^{T}\left ( b-Ax\right ) -\left \langle v,b-Ax\right \rangle \\ & =\left \langle v,b-Ax\right \rangle -\left \langle v,b-Ax\right \rangle \\ & =0 \end{align*}

4.9.2 Computer problems

4.9.2.1 Jacobi, Gauss-Seidel, SOR

Jacobi, Gauss-Seidel, and SOR iterative solvers were implemented (in Matlab) and compared for rate of convergence. The implementation was tested on the following system

\[ \begin {bmatrix} -4 & 2 & 1 & 0 & 0\\ 1 & -4 & 1 & 1 & 0\\ 2 & 1 & -4 & 1 & 2\\ 0 & 1 & 1 & -4 & 1\\ 0 & 0 & 1 & 2 & -4 \end {bmatrix}\begin {bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end {bmatrix} =\begin {bmatrix} -4\\ 11\\ -16\\ 11\\ -4 \end {bmatrix} \]

and the result was compared to Matlab \(A\backslash b^{\prime }\) result, which is

A\b'
ans =
   1.00000000000000
  -2.00000000000000
   4.00000000000000
  -2.00000000000000
   1.00000000000000
>>

These iterative solvers solve the linear system \(Ax=b\) by iteratively approaching a solution. In all of these methods, the general iteration formula is

\[ x_{k+1}=x_{k}+Q^{-1}\left ( b-Ax_{k}\right ) \]

Each method uses a different \(Q\) matrix. For Jacobi, \(Q=diag(A)\), and for Gauss-Seidel, \(Q=L\left ( A\right ) \), which is the lower triangular part of \(A\). For SOR, \(Q=\frac {1}{\omega }\left ( diag\left ( A\right ) +\omega L^{0}\left ( A\right ) \right ) \) where \(L^{0}\left ( A\right ) \) is the strictly lower triangular part of \(A\), and \(\omega \) is an input parameter generally \(0<\omega <2\).

For the Jacobi and Gauss-Seidel methods, we are guaranteed to converge to a solution if \(A\) is diagonally dominant. For SOR the condition of convergence is \(\rho \left ( G\right ) <1\), where \(G=\left ( I-Q^{-1}A\right ) \), and \(\rho \left ( G\right ) \) is the spectral radius of \(G\).
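A minimal Matlab sketch of this generic iteration (my own; the function name iterative_solve and its interface are my choice, saved as its own file iterative_solve.m):

function x = iterative_solve(A, b, method, w, niter)
% Generic iteration x_{k+1} = x_k + Q\(b - A*x_k) with the Q choices above.
    D = diag(diag(A));
    switch lower(method)
        case 'jacobi'
            Q = D;
        case 'gauss-seidel'
            Q = tril(A);                    % D plus strictly lower part of A
        case 'sor'
            Q = (1/w)*(D + w*tril(A,-1));   % tril(A,-1) = strictly lower part
    end
    x = zeros(size(b));
    for k = 1:niter
        x = x + Q \ (b - A*x);
    end
end

For example, x = iterative_solve(A, b, 'sor', 1.25, 100) would run 100 SOR iterations with \(\omega =1.25\).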

4.9.2.2 Results

The following table shows the result of running the Jacobi and Gauss-Seidel methods on the same matrix. The columns show the relative error between successive iterations, defined as \(\frac {\left \Vert x_{k+1}-x_{k}\right \Vert }{\left \Vert x_{k+1}\right \Vert }\). The first column of errors is from the Jacobi method, and the second is from the Gauss-Seidel method.

Iteration Jacobi Gauss-Seidel
\(1\) \(0.94197873843414\) \(0.93435567469174\)
\(2\) \(0.22011159808466\) \(0.13203098317070\)
\(3\) \(0.04613055928361\) \(0.03056375635161\)
\(4\) \(0.02582530967034\) \(0.02878251571226\)
\(5\) \(0.02166103846735\) \(0.02294426703511\)
\(6\) \(0.01508738881859\) \(0.01935275191913\)
\(7\) \(0.01447950687767\) \(0.01618762392092\)
\(8\) \(0.01246529285252\) \(0.01353611057259\)
\(9\) \(0.01166876946105\) \(0.01130586303057\)
\(10\) \(0.01055706870353\) \(0.00943545181644\)
\(11\) \(0.00971262454554\) \(0.00786920716886\)
\(12\) \(0.00886458421973\) \(0.00655940732143\)
\(13\) \(0.00811604581680\) \(0.00546521547469\)
\(14\) \(0.00741643318343\) \(0.00455191661070\)
\(15\) \(0.00677994422374\) \(0.00379012917894\)
\(16\) \(0.00619437598238\) \(0.00315507291046\)
\(17\) \(0.00565882948442\) \(0.00262590555197\)
\(18\) \(0.00516810821828\) \(0.00218513512307\)
\(19\) \(0.00471915312779\) \(0.00181810678260\)
\(20\) \(0.00430837834153\) \(0.00151255968550\)

We see from the above table that Gauss-Seidel method has faster convergence than Jacobi method. The following is a plot of the above table.

For the SOR method, convergence depends on the value of \(\omega \) used. First we look at SOR on its own, comparing its performance as \(\omega \) changes. Next, we pick a value of \(\omega \) and compare SOR using it with the Jacobi and Gauss-Seidel methods.

The table below shows the values of the relative error for different \(\omega \). Only the first 20 iterations are shown. The following \(\omega \) values are used: \(0.25,0.5,0.75,1,1.25,1.5,1.75\). The table also shows the number of iterations needed to achieve the error tolerance specified by the user; a smaller number of iterations needed to reach this error limit indicates a better choice of \(\omega \).

From the above table, and for the specific data used in this exercise, we observe that \(\omega =1.25\) achieved the best convergence.

This is a plot of the above table. (using zoom to make the important part more visible).

Now we compare Jacobi, Gauss-Seidel and SOR (using \(\omega =1.25\), since this \(\omega \) gave the best convergence for SOR). The following is a plot showing that SOR with \(\omega =1.25\) achieved the best convergence.

4.9.2.3 Example Matlab output

The following is the output from one run of a Matlab script written to test these implementations. Two test cases were used: the A, b input given in the HW assignment sheet handed out in class, and a second A, b input from the textbook (an SPD matrix, but not diagonally dominant) shown on page 245. This is the output.

***** TEST 1 ********* 
A = 
    -4     2     1     0     0 
     1    -4     1     1     0 
     2     1    -4     1     2 
     0     1     1    -4     1 
     0     0     1     2    -4 
 
b = 
    -4    11   -16    11    -4 
======>Matlab linear solver solution, using A\b 
   1.00000000000000 
  -2.00000000000000 
   4.00000000000000 
  -2.00000000000000 
   1.00000000000000 
 
Solution from Jacobi 
   1.00037323222607 
  -1.99962676777393 
   4.00061424742539 
  -1.99962676777393 
   1.00037323222607 
 
Solution from Gauss-Seidel 
   1.00019802638995 
  -1.99981450377552 
   4.00028764196082 
  -1.99983534139755 
   1.00015423979143 
 
Solution from SOR, w=0.250000 
   1.00355389692825 
  -1.99647784171454 
   4.00574940211158 
  -1.99653496988616 
   1.00343408511030 
 
Solution from SOR, w=0.500000 
   1.00064715564763 
  -1.99936629645376 
   4.00102314817150 
  -1.99939010276819 
   1.00059721960251 
 
Solution from SOR, w=0.750000 
   1.00036324343971 
  -1.99965036525822 
   4.00055574249388 
  -1.99967387439430 
   1.00031390750518 
 
Solution from SOR, w=1.000000 
   1.00019802638995 
  -1.99981450377552 
   4.00028764196082 
  -1.99983534139755 
   1.00015423979143 
 
Solution from SOR, w=1.250000 
   1.00009213101077 
  -1.99991820839097 
   4.00012079232435 
  -1.99993416539569 
   1.00005844632680 
 
Solution from SOR, w=1.500000 
   0.99998926445485 
  -1.99999117155337 
   3.99998500044858 
  -1.99999354920373 
   0.99999484763949 
 
Solution from SOR, w=1.750000 
   1.00001519677595 
  -2.00001369840580 
   4.00002560215918 
  -2.00001192126720 
   1.00001112407411 
 
***** TEST 2 ********* 
A = 
    10     1     2     3     4 
     1     9    -1     2    -3 
     2    -1     7     3    -5 
     3     2     3    12    -1 
     4    -3    -5    -1    15 
 
b = 
    12   -27    14   -17    12 
 
======>Matlab linear solver solution, using A\b 
   1.00000000000000 
  -2.00000000000000 
   3.00000000000000 
  -2.00000000000000 
   1.00000000000000 
 
Jacobi solution 
   1.00018168868849 
  -2.00016761404521 
   2.99969203212914 
  -1.99995200794879 
   0.99978228017035 
 
Gauss Seidel solution 
   1.00009837024472 
  -2.00008075631740 
   2.99986024703377 
  -1.99998691907114 
   0.99991190441111 
 
SOR solution, w=1.25 
   1.00002736848705 
  -2.00001960859025 
   2.99996840603470 
  -1.99999920522245 
   0.99998285315310

4.9.2.3.1 Conclusion

SOR with \(\omega =1.25\) achieved the best convergence. However, one needs to determine which \(\omega \) does the best job, and this will depend on the input; for a different input a different \(\omega \) might give a better result. Hence, to use SOR one must first do a number of trials to determine the best value of \(\omega \). Between the Jacobi and Gauss-Seidel methods, Gauss-Seidel showed better convergence.

4.9.2.4 Long operation count

Since these algorithms are iterative, the long operation count is given per iteration. One then multiplies this count by the number of iterations needed to converge to the tolerance that the user supplies. Gauss-Seidel requires fewer iterations than the Jacobi method to converge to the same solution. For SOR, the number of iterations depends on the \(\omega \) selected.

4.9.2.4.1 Jacobi

Long operations count for one iteration of the Jacobi method.

while keepLookingForSolution 
   k=k+1; 
   xnew=xold+Qinv*(b-A*xold);- - - - - (1) 
 
   currentError=norm(xnew-xold);- - - - - (2) 
   relError(k)=currentError/norm(xnew);- - - - - (3) 
 
   if norm(b-A*xnew)<=resLimit || currentError<=errorLimit || k>maxIter 
       keepLookingForSolution=FALSE; 
   else 
     xold=xnew; 
   end 
 
end
 

where

\[ Qinv=eye(nRow)/diag(diag(A)); \]

For line(1): multiplying an \(n\times n\) matrix by an \(n\times 1\) vector requires \(n^{2}\) ops, hence \(A\ast xold\) requires \(n^{2}\) ops. The result of \(\left ( b-A\ast xold\right ) \) is a vector of length \(n\). Next, \(Qinv\) multiplies this length-\(n\) vector; this requires only \(n\) ops since \(Qinv\) is all zeros except on the diagonal. Hence to calculate xnew we require \(n+n^{2}\) ops.

For lines (2,3): these involve 2 norm operations and one division, so we need the count for the norm operation. Assuming the 2-norm is used, defined as \(\sqrt {\sum _{i=1}^{n}x_{i}^{2}}\), a norm requires \(n\) multiplications plus one long operation for the square root (is it OK to count the square root as one long operation?). Hence the total ops for finding the relative error is \(2\left ( n+1\right ) +1=2n+3\).

The next operation is the check for the result limit:

\[ norm(b-A\ast xnew)<=resLimit \]

This involves \(n^{2}\) operations for the matrix by vector multiplication, and \(n+1\) operations for the norm. Hence \(n^{2}+n+1\)

Adding all the above we obtain \(n+n^{2}+2n+3+n^{2}+n+1=2n^{2}+4n+4\), which is \(O\left ( n^{2}\right ) \) per iteration (about \(2n^{2}\) long operations).

The above is the cost per iteration. Hence if the number of iterations \(M\) needed is close to \(n\), the total cost is worse than Gaussian elimination, which requires \(O\left ( n^{3}\right ) \) operations. But if the number of iterations is much smaller than \(n\), this method will be faster than non-iterative methods based on Gaussian elimination.
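For example (illustrative numbers only): with \(n=1000\) unknowns, one Jacobi iteration costs about \(2n^{2}=2\times 10^{6}\) long operations, so \(M=50\) iterations cost about \(10^{8}\) operations, while Gaussian elimination needs roughly \(\frac {n^{3}}{3}\approx 3.3\times 10^{8}\) operations. The iterative method pays off only when the number of iterations needed stays well below \(n\).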

4.9.2.4.2 Gauss-Seidel

Long operations count for one iteration of the Gauss-Seidel. For the statement

 while keepLookingForSolution
    k=k+1;

    xnew=xold;
    for i=1:nRow
        xnew(i)=xnew(i)+ (b(i)-A(i,:)*xnew)/A(i,i);- - - - -(1)
    end

    currentError=norm(xnew-xold); - - - - -(2)
    relError(k)=currentError/norm(xnew); - -- - - -(3)

    if norm(b-A*xnew)<=resLimit || currentError<=errorLimit || k>maxIter
        keepLookingForSolution=FALSE;
    else
        xold=xnew;
    end
end

For line (1): for each \(i\), the product A(i,:)*xnew costs \(n\) and the division costs one more, so each row costs \(n+1\) and the total over the \(n\) rows is \(n\left ( n+1\right ) =n^{2}+n\).

For line(2+3): The cost is the same as with the Jacobi method, which was found to be \(2n+3\)

The next operation is the check for the result limit:

\[ norm(b-A\ast xnew)<=resLimit \]

This involves \(n^{2}\) operations for the matrix by vector multiplication, and \(n+1\) operations for the norm.

Hence the total operation count is \(\left ( n^{2}+n\right ) +\left ( 2n+3\right ) +\left ( n^{2}+n+1\right ) =2n^{2}+4n+4\), which is \(O\left ( n^{2}\right ) \) per iteration. This is essentially the same per-iteration cost as Jacobi; the advantage of Gauss-Seidel is that it needs fewer iterations.

4.9.2.4.3 SOR

This is the same as Gauss-Seidel, except there is an extra multiplication by \(\omega \) per row per iteration. Hence we add \(n\) to the count found for Gauss-Seidel, giving \(2n^{2}+5n+4\) for SOR, which is again \(O\left ( n^{2}\right ) \).

4.9.2.5 Steepest descent iterative solver algorithm

The following is the output from testing the Steepest descent iterative algorithm.

======>Matlab linear solver solution, using A\b 
   1.00000000000000 
  -2.00000000000000 
   3.00000000000000 
  -2.00000000000000 
   1.00000000000000 
 
Steepest descent solution 
   1.00013109477547 
  -2.00012625234312 
   2.99976034511258 
  -1.99996859964602 
   0.99987563548937 
 
>>
4.9.2.6 Steepest Descent operation count
while keepLookingForSolution 
    k=k+1; 
 
    v=b-A*xold;    - - - - - - (1) 
    t=dot(v,v)/dot(v,A*v); - - - - -(2) 
    xnew=xold+t*v; - - - - -(3) 
 
    currentError=norm(xnew-xold);   - - - - -(4) 
    relError(k)=currentError/norm(xnew); - - - -(5) 
 
    if norm(b-A*xnew)<=resLimit || currentError<=errorLimit || k>maxIter 
        keepLookingForSolution=FALSE; 
    else 
        xold=xnew; 
    end 
end
 

For line (1): \(n^{2}\)

For line (2): the numerator dot product costs \(n\). For the denominator, A*v costs \(n^{2}\) and its dot product adds another \(n\). Adding 1 for the division, line (2) needs \(n+n^{2}+n+1=n^{2}+2n+1\) operations.

For line(3): \(n\) multiplications.

For line(4): \(n+1\) for the norm.

For line(5): \(n+2\) operations.

And finally for norm(b-A*xnew) in the final check, requires \(n^{2}+n+1\)

Hence total is \(n^{2}+n^{2}+2n+1+n+n+1+n+2+n^{2}+n+1=3n^{2}+6n+5\)
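For the \(5\times 5\) test matrix used in this assignment this works out to \(3\left ( 25\right ) +6\left ( 5\right ) +5=110\) long operations per steepest descent iteration, compared with \(2\left ( 25\right ) +4\left ( 5\right ) +4=74\) per Jacobi iteration using the count found above.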

4.9.3 Source code

4.9.3.1 nma_driverTestIterativeSolvers.m
% 
%This script is the driver to test and gather data for plotting 
%for computer assignment 3/19/07 for Math 501, CSUF 
% 
%Nasser Abbasi 032607 
% 
%file name: nma_driverTestIterativeSolvers.m 
 
close all; 
clear all; 
 
DISP_FOR_TABLE=0;  %turn to 1 to get output for table display 
 
A=[-4 2 1 0 0;1 -4 1 1 0;2 1 -4 1 2;0 1 1 -4 1;0 0 1 2 -4]; 
b=[-4 11 -16 11 -4]; 
maxIter=200; 
errorLimit=0.0001; 
resLimit=0.00001; 
oTable=zeros(maxIter,2); 
 
%nma_getSpectraRadiusOfMatrix( 
%find spectral radius for (I-Q^-1 A) 
 
Q=diag(diag(A)); 
r=max(abs(eig(  eye(size(A,1)) - inv(Q)*A  ) )); 
fprintf('Jacobi: spectral radius is %f\n',r);
if r>1 
   fprintf('WARNING, spectral radius of (I-Q^-1 A)  should be less than 1 for convergence\n'); 
end 
[x,k,relError]=nma_JacobiIterativeSolver(A,[1,1,1,1,1]',b',... 
    maxIter,errorLimit,resLimit); 
figure; 
plot(3:35,relError(3:35)); % plot(relError(1:k)); 
oTable(1:k,1)=relError(1:k); 
fprintf('Solution from Jacobi\n'); 
format long; 
disp(x) 
 
 
Q=tril(A); 
r=max(abs(eig(  eye(size(A,1)) - inv(Q)*A  ) )); 
 
fprintf('Gauss-Seidel: spectral radius is %f\n',r);
if r>1 
   fprintf('WARNING, spectral radius of (I-Q^-1 A)  should be less than 1 for convergence\n'); 
end 
 
[x,k,relError]=nma_GaussSeidelIterativeSolver(A,[1,1,1,1,1]',b',... 
    maxIter,errorLimit,resLimit); 
hold on; 
plot(3:35,relError(3:35),'r'); % plot(relError(1:k)); 
legend('Jacobi','GaussSeidel'); 
title('comparing Jacobi and GaussSeidel solvers convergence'); 
xlabel('iteration number'); 
ylabel('relative error'); 
 
fprintf('Solution from Gauss-Seidel\n'); 
format long; 
disp(x) 
 
 
oTable(1:k,2)=relError(1:k); 
fprintf('J\tG-S\n'); 
for i=1:35 
    fprintf('%d\t%16.15f\t%16.15f\n',i,oTable(i,1),oTable(i,2)); 
end 
 
%do it again for inclusion into Latex 
if DISP_FOR_TABLE 
    fprintf('Jacob\n'); 
    format long; 
    for i=1:20 
        disp(oTable(i,1)) 
    end 
 
    %do it again for inclusion into Latex 
    fprintf('G-S\n'); 
    for i=1:20 
        disp(oTable(i,2)) 
    end 
end 
 
figure; 
omegaValues=0.25*[1 2 3 4 5 6 7]; 
oTable=zeros(length(omegaValues),maxIter+1); 
mycolor={'b','r:','k*','m:','m','k-.','k'}; 
for i=1:length(omegaValues) 
    [x,k,relError]=nma_SORIterativeSolver(A,[1,1,1,1,1]',b',... 
        maxIter,errorLimit,resLimit,omegaValues(i)); 
    plot(relError(1:k),mycolor{i}); 
    oTable(i,1)=omegaValues(i); 
    oTable(i,2)=k; 
    oTable(i,3:3+k-1)=relError(1:k); 
    hold on; 
    fprintf('Solution from SOR, w=%f\n',omegaValues(i)); 
    disp(x) 
 
end 
title('SOR using different \omega values'); 
xlabel('number of iterations'); 
ylabel('relative error'); 
legend('.25','.5','.75','1','1.25','1.5','1.75'); 
 
fprintf('omega\titeration\trelative errors\n'); 
for i=1:length(omegaValues) 
    fprintf('%3.2f\t%d',oTable(i,1),oTable(i,2)); 
    if(oTable(i,2)>35) 
        cutOff=35; 
    else 
        cutOff=oTable(i,2); 
    end 
    fprintf('\t\t'); 
    for j=1:cutOff 
        fprintf('\t%16.15f',oTable(i,j+2)); 
        %fprintf('   %9.8f',oTable(i,j+2)); 
    end 
    fprintf('\n'); 
end 
 
%do it again for inclusion into Latex 
if DISP_FOR_TABLE 
    fprintf('SOR\n'); 
    for j=1:length(omegaValues) 
        fprintf('SOR======>\n'); 
        for i=1:20 
            disp(oTable(j,i+2)) 
        end 
    end 
end 
 
 
 
%now compare the 3 methods using omega 1.25 only 
[x,k,relError]=nma_JacobiIterativeSolver(A,[1,1,1,1,1]',b',... 
    maxIter,errorLimit,resLimit); 
figure; 
plot(3:35,relError(3:35)); % plot(relError(1:k)); 
[x,k,relError]=nma_GaussSeidelIterativeSolver(A,[1,1,1,1,1]',b',... 
    maxIter,errorLimit,resLimit); 
hold on; 
plot(3:35,relError(3:35),'r'); % plot(relError(1:k)); 
 
[x,k,relError]=nma_SORIterativeSolver(A,[1,1,1,1,1]',b',... 
    maxIter,errorLimit,resLimit,1.25); 
plot(3:35,relError(3:35),'m'); % plot(relError(1:k)); 
 
legend('Jacobi','GaussSeidel','SOR w=1.25'); 
title('comparing Jacobi, GaussSeidel, SOR solvers convergence'); 
xlabel('iteration number'); 
ylabel('relative error'); 
 
 
%%%% Now run the test again. short version to paste into document. 
 
 
A=[-4 2 1 0 0;1 -4 1 1 0;2 1 -4 1 2;0 1 1 -4 1;0 0 1 2 -4]; 
b=[-4 11 -16 11 -4]; 
 
fprintf('***** TEST 1 *********\n'); 
A 
b 
fprintf('======>Matlab linear solver solution, using A\\b  \n'); 
disp(A\b'); 
 
[x,k,relError]=nma_JacobiIterativeSolver(A,[1,1,1,1,1]',b',... 
    maxIter,errorLimit,resLimit); 
fprintf('Jacobi solution\n'); 
disp(x); 
 
[x,k,relError]=nma_GaussSeidelIterativeSolver(A,[1,1,1,1,1]',b',... 
    maxIter,errorLimit,resLimit); 
fprintf('Gauss Seidel solution\n'); 
disp(x); 
 
[x,k,relError]=nma_SORIterativeSolver(A,[1,1,1,1,1]',b',... 
    maxIter,errorLimit,resLimit,1.25); 
fprintf('SOR solution, w=1.25\n'); 
disp(x); 
 
 
 
%%%% Now run another test. Use an SPD matrix, which is shown on 
%page 245 of textbook (Numerical Analysis, Kincaid.Cheney) 
fprintf('***** TEST 2 *********\n'); 
A=[10 1 2 3 4;1 9 -1 2 -3;2 -1 7 3 -5;3 2 3 12 -1;4 -3 -5 -1 15]; 
b=[12 -27 14 -17 12]; 
A 
b 
fprintf('======>Matlab linear solver solution, using A\\b  \n'); 
disp(A\b'); 
 
[x,k,relError]=nma_JacobiIterativeSolver(A,[1,1,1,1,1]',b',... 
    maxIter,errorLimit,resLimit); 
fprintf('Jacobi solution\n'); 
disp(x); 
 
[x,k,relError]=nma_GaussSeidelIterativeSolver(A,[1,1,1,1,1]',b',... 
    maxIter,errorLimit,resLimit); 
fprintf('Gauss Seidel solution\n'); 
disp(x); 
 
[x,k,relError]=nma_SORIterativeSolver(A,[1,1,1,1,1]',b',... 
    maxIter,errorLimit,resLimit,1.25); 
fprintf('SOR solution, w=1.25\n'); 
disp(x);
4.9.3.2 nma_SORIterativeSolver.m
function [xnew,k,relError]=nma_SORIterativeSolver(A,x,b,... 
    maxIter,errorLimit,resLimit,omega) 
%function [xnew,k,relError]=nma_SORIterativeSolver(A,x,b,...
%                                       maxIter,errorLimit,resLimit,omega)
% 
% Solve Ax=b using the SOR Iterative method 
% 
%INPUT: 
% A: the A matrix 
% x: Initial guess for solution 
% b: right hand side 
% maxIter:  max number of iterations allowed 
% errorLimit: error tolerance. difference between successive x iteration 
%             values. if such a difference is less than this error, stop. 
% resLimit: if |b-A*x| is less than this limit, stop the iterative process. 
% omega: SOR factor 
% 
%OUTPUT 
% xnew: the solution found by iterative method. 
% k: actual number of iterations used to obtain the above solution. 
% relError: array that contains the relative error found at each iteration 
 
%example call 
% A=[-4 2 1 0 0;1 -4 1 1 0;2 1 -4 1 2;0 1 1 -4 1;0 0 1 2 -4]; 
% b=[-4 11 -16 11 -4]; maxIter=200; errorLimit=0.0001; resLimit=0.00001; 
%[x,k,relError]=nma_SORIterativeSolver(A,[1,1,1,1,1]',b',...
%                                          maxIter,errorLimit,resLimit,1.25)
 
%by Nasser Abbasi  3/26/07 
% 
 
% do some error checking on input.... 
 
if nargin ~=7 
    error 'wrong number of arguments. 7 inputs are required'; 
end 
 
if ~isnumeric(omega) 
    error 'omega must be numeric'; 
end 
 
TRUE=1; FALSE=0; 
 
[res,msg]=nma_IterativeSolversIsValidInput(A,x,b,... 
    maxIter,errorLimit,resLimit); 
if ~res 
    error(msg); 
end 
 
[nRow,nCol]=size(A); 
xold=x(:); 
b=b(:); 
k=0; 
relError=zeros(maxIter,1); 
 
keepLookingForSolution=TRUE; 
 
while keepLookingForSolution 
    k=k+1; 
 
    xnew=xold; 
    for i=1:nRow 
        xnew(i)=xnew(i)+ omega*(b(i)-A(i,:)*xnew)/A(i,i); 
    end 
 
    currentError=norm(xnew-xold); 
    relError(k)=currentError/norm(xnew); 
 
    if norm(b-A*xnew)<=resLimit || currentError<=errorLimit || k>maxIter 
        keepLookingForSolution=FALSE; 
    else 
        xold=xnew; 
    end 
end 
 
end
4.9.3.3 nma_JacobiIterativeSolver.m
function [xnew,k,relError]=nma_JacobiIterativeSolver(A,x,b,... 
                                            maxIter,errorLimit,resLimit) 
%function [xnew,k]=nma_JacobiIterativeSolver(A,x,b,... 
%                                           maxIter,errorLimit,resLimit) 
% 
% Solve Ax=b using the Jacobi Iterative method 
% 
%INPUT: 
% A: the A matrix 
% x: Initial guess for solution 
% b: right hand side 
% maxIter:  max number of iterations allowed 
% errorLimit: error tolerance. difference between successive x iteration 
%             values. if such a difference is less than this error, stop. 
% resLimit: if |b-A*x| is less than this limit, stop the iterative process. 
% 
%OUTPUT 
% xnew: the solution found by iterative method. 
% k: actual number of iterations used to obtain the above solution. 
% relError: array that contains the relative error found at each iteration 
% 
%example call 
% A=[-4 2 1 0 0;1 -4 1 1 0;2 1 -4 1 2;0 1 1 -4 1;0 0 1 2 -4]; 
% b=[-4 11 -16 11 -4]; maxIter=200; errorLimit=0.0001; resLimit=0.00001; 
%[x,k,relError]=nma_JacobiIterativeSolver(A,[1,1,1,1,1]',b',... 
%                                          maxIter,errorLimit,resLimit) 
 
%by Nasser Abbasi  3/26/07 
% 
 
if nargin ~=6 
    error 'wrong number of arguments. 6 inputs are required'; 
end 
 
TRUE=1; FALSE=0; 
 
[res,msg]=nma_IterativeSolversIsValidInput(A,x,b,maxIter,errorLimit,resLimit); 
if ~res 
    error(msg); 
end 
 
[nRow,nCol]=size(A); 
xold=x(:); 
b=b(:); 
k=0; 
relError=zeros(maxIter,1); 
Qinv=eye(nRow)/diag(diag(A)); 
 
keepLookingForSolution=TRUE; 
 
while keepLookingForSolution 
   k=k+1; 
   xnew=xold+Qinv*(b-A*xold); 
 
   currentError=norm(xnew-xold); 
   relError(k)=currentError/norm(xnew); 
 
   if norm(b-A*xnew)<=resLimit || currentError<=errorLimit || k>maxIter 
       keepLookingForSolution=FALSE; 
   else 
     xold=xnew; 
   end 
 
end 
 
end
4.9.3.4 nma_GaussSeidelIterativeSolver.m
function [xnew,k,relError]=nma_GaussSeidelIterativeSolver(A,x,b,... 
    maxIter,errorLimit,resLimit) 
%function [xnew,k]=nma_GaussSeidelIterativeSolver(A,x,b,... 
%                                             maxIter,errorLimit,resLimit) 
% 
% Solve Ax=b using the Gauss-Seidel Iterative method 
% 
%INPUT: 
% A: the A matrix 
% x: Initial guess for solution 
% b: right hand side 
% maxIter:  max number of iterations allowed 
% errorLimit: error tolerance. difference between successive x iteration 
%             values. if such a difference is less than this error, stop. 
% resLimit: if |b-A*x| is less than this limit, stop the iterative process. 
% 
%OUTPUT 
% xnew: the solution found by iterative method. 
% k: actual number of iterations used to obtain the above solution. 
% relError: array that contains the relative error found at each iteration 
% 
%example call 
% A=[-4 2 1 0 0;1 -4 1 1 0;2 1 -4 1 2;0 1 1 -4 1;0 0 1 2 -4]; 
% b=[-4 11 -16 11 -4]; maxIter=200; errorLimit=0.0001; resLimit=0.00001; 
%[x,k,relError]=nma_GaussSeidelIterativeSolver(A,... 
%                            [1,1,1,1,1]',b',maxIter,errorLimit,resLimit) 
% 
 
%by Nasser Abbasi  3/26/07 
 
% do some error checking on input.... 
 
if nargin ~=6 
    error 'wrong number of arguments. 6 inputs are required'; 
end 
 
TRUE=1; FALSE=0; 
 
[res,msg]=nma_IterativeSolversIsValidInput(A,x,b,... 
    maxIter,errorLimit,resLimit); 
if ~res 
    error(msg); 
end 
 
[nRow,nCol]=size(A); 
xold=x(:); 
b=b(:); 
k=0; 
relError=zeros(maxIter,1); 
 
keepLookingForSolution=TRUE; 
 
while keepLookingForSolution 
    k=k+1; 
 
    xnew=xold; 
    for i=1:nRow 
        xnew(i)=xnew(i)+ (b(i)-A(i,:)*xnew)/A(i,i); 
    end 
 
    currentError=norm(xnew-xold); 
    relError(k)=currentError/norm(xnew); 
 
    if norm(b-A*xnew)<=resLimit || currentError<=errorLimit || k>maxIter 
        keepLookingForSolution=FALSE; 
    else 
        xold=xnew; 
    end 
end 
 
end
4.9.3.5 nma_IterativeSolversIsValidInput.m
function [res,msg]=nma_IterativeSolversIsValidInput(A,x,b,maxIter,errorLimit,resLimit) 
%function[res,msg]=nma_IterativeSolversIsValidInput(A,x,b,maxIter,errorLimit,resLimit) 
% 
%helper function. Called by iterative liner solvers to validate input 
% 
%Nasser Abbasi 03/26/07 
 
res=0; 
msg=''; 
if ~isnumeric(A)|~isnumeric(b)|~isnumeric(x)|~isnumeric(maxIter)... 
        |~isnumeric(errorLimit)|~isnumeric(resLimit) 
    msg='non numeric input detected'; 
    return; 
end 
 
[nRow,nCol]=size(A); 
if nRow~=nCol 
    msg='square A matrix expected'; 
    return; 
end 
 
[m,n]=size(b); 
if n>1 
    msg='b must be a vector'; 
    return; 
end 
 
if m~=nRow 
    msg='length of b does not match A matrix size'; 
    return; 
end 
 
[m,n]=size(x); 
if n>1 
    msg='x must be a vector'; 
    return; 
end 
 
if m~=nRow 
    msg='length of x does not match A matrix size'; 
    return; 
end 
 
res=1; 
return; 
 
end
4.9.3.6 nma_SteepestIterativeSolver.m
function [xnew,k,relError]=nma_SteepestIterativeSolver(A,x,b,... 
    maxIter,errorLimit,resLimit) 
%function [xnew,k]=nma_SteepestIterativeSolver(A,x,b,... 
%                                             maxIter,errorLimit,resLimit) 
% 
% Solve Ax=b using the Steepest descent Iterative method 
% 
%INPUT: 
% A: the A matrix 
% x: Initial guess for solution 
% b: right hand side 
% maxIter:  max number of iterations allowed 
% errorLimit: error tolerance. difference between successive x iteration 
%             values. if such a difference is less than this error, stop. 
% resLimit: if |b-A*x| is less than this limit, stop the iterative process. 
% 
%OUTPUT 
% xnew: the solution found by iterative method. 
% k: actual number of iterations used to obtain the above solution. 
% relError: array that contains the relative error found at each iteration 
% 
%example call 
% A=[-4 2 1 0 0;1 -4 1 1 0;2 1 -4 1 2;0 1 1 -4 1;0 0 1 2 -4]; 
% b=[-4 11 -16 11 -4]; maxIter=200; errorLimit=0.0001; resLimit=0.00001; 
%[x,k,relError]=nma_SteepestIterativeSolver(A,... 
%                            [1,1,1,1,1]',b',maxIter,errorLimit,resLimit) 
% 
 
%by Nasser Abbasi  3/26/07 
 
% do some error checking on input.... 
 
if nargin ~=6 
    error 'wrong number of arguments. 6 inputs are required'; 
end 
 
TRUE=1; FALSE=0; 
 
[res,msg]=nma_IterativeSolversIsValidInput(A,x,b,... 
    maxIter,errorLimit,resLimit); 
if ~res 
    error(msg); 
end 
 
[nRow,nCol]=size(A); 
xold=x(:); 
b=b(:); 
k=0; 
relError=zeros(maxIter,1); 
 
keepLookingForSolution=TRUE; 
 
while keepLookingForSolution 
    k=k+1; 
 
    v=b-A*xold; 
    t=dot(v,v)/dot(v,A*v); 
    xnew=xold+t*v; 
 
    currentError=norm(xnew-xold); 
    relError(k)=currentError/norm(xnew); 
 
    if norm(b-A*xnew)<=resLimit || currentError<=errorLimit || k>maxIter 
        keepLookingForSolution=FALSE; 
    else 
        xold=xnew; 
    end 
end 
 
end
4.9.3.7 nma_driverTestSteepest
% 
%This script is the driver to test steepest descent solver 
%for computer assignment 3/19/07 for Math 501, CSUF 
% 
%Nasser Abbasi 033007 
% 
%file name: nma_driverTestSteepest.m 
 
close all; 
clear all; 
 
 
maxIter=200; 
errorLimit=0.0001; 
resLimit=0.00001; 
 
%%%% Now run another test. Use an SPD matrix, which is shown on 
%page 245 of textbook (Numerical Analysis, Kincaid.Cheney) 
fprintf('***** TEST steepest descent *********\n'); 
A=[10 1 2 3 4;1 9 -1 2 -3;2 -1 7 3 -5;3 2 3 12 -1;4 -3 -5 -1 15]; 
b=[12 -27 14 -17 12]; 
A 
b 
fprintf('======>Matlab linear solver solution, using A\\b  \n'); 
disp(A\b'); 
 
[x,k,relError]=nma_SteepestIterativeSolver(A,[1,1,1,1,1]',b',... 
    maxIter,errorLimit,resLimit); 
fprintf('Steepest descent solution\n'); 
disp(x);

4.10 HW 9

PDF

4.10.1 Source code

4.10.1.1 nma_inverse_power.m
function lambda_new=nma_inverse_power(A,initialEigenVectorGuess,maxIter,delErr,delEps)
% 
% MATH 501, Computer Assignment 04/04/2007 
% by Nasser Abbasi 
% 
% IMPLEMENT iterative Inverse power method 
 
A=inv(A); 
 
w_old=initialEigenVectorGuess(:); 
lambda_old=rand; 
 
for k=1:maxIter 
    w_old=w_old/norm(w_old); 
    w_new=A*w_old; 
    lambda_new=dot(w_old,w_new); 
 
    if norm(w_new-w_old)<delErr || abs(lambda_old-lambda_new)<delEps 
        break; 
    else 
        w_old=w_new; 
        lambda_old=lambda_new; 
    end 
end 
 
lambda_new=1/lambda_new;
4.10.1.2 nma_inverse_shifted_power.m
function [lambda_new,k]=nma_inverse_shifted_power(A,initialEigenVectorGuess,maxIter,delErr,delEps,mu) 
% 
% MATH 501, Computer Assignment 04/04/2007 
% by Nasser Abbasi 
% 
% IMPLEMENT iterative Inverse  shifted power method 
 
w_old=initialEigenVectorGuess(:); 
lambda_old=rand; 
In=eye(size(A,1)); 
A=inv(A-mu*In); 
 
for k=1:maxIter 
    w_old=w_old/norm(w_old); 
    w_new=A*w_old; 
    lambda_new=dot(w_old,w_new); 
 
    if norm(w_new-w_old)<delErr || abs(lambda_old-lambda_new)<delEps 
        break; 
    else 
        w_old=w_new; 
        lambda_old=lambda_new; 
    end 
end 
 
lambda_new=1/lambda_new;
4.10.1.3 nma_power.m
function lambda_new=nma_power(A,initialEigenVectorGuess,maxIter,delErr,delEps) 
% 
% MATH 501, Computer Assignment 04/04/2007 
% by Nasser Abbasi 
% 
% IMPLEMENT iterative  power method 
 
w_old=initialEigenVectorGuess(:); 
 
lambda_old=rand; 
 
for k=1:maxIter 
    w_old=w_old/norm(w_old); 
    w_new=A*w_old; 
    lambda_new=dot(w_old,w_new); 
    if norm(w_new-w_old)<delErr || abs(lambda_old-lambda_new)<delEps 
        break; 
    else 
        w_old=w_new; 
        lambda_old=lambda_new; 
    end 
end
4.10.1.4 nma_shifted_power.m
function [lambda_new,k]=nma_shifted_power(A,initialEigenVectorGuess,...
    maxIter,delErr,delEps,mu) 
% 
% MATH 501, Computer Assignment 04/04/2007 
% by Nasser Abbasi 
% 
% IMPLEMENT iterative  shifted power method 
 
w_old=initialEigenVectorGuess(:); 
lambda_old=mu; 
In=eye(size(A,1)); 
A=A-mu*In; 
 
for k=1:maxIter 
    w_old=w_old/norm(w_old); 
    w_new=A*w_old; 
    lambda_new=dot(w_old,w_new); 
 
    if norm(w_new-w_old)<delErr || abs(lambda_old-lambda_new)<delEps 
        break; 
    else 
        w_old=w_new; 
        lambda_old=lambda_new; 
    end 
end
4.10.1.5 nma_test.m
% 
% MATH 501, Computer Assignment 04/04/2007 
% by Nasser Abbasi 
% 
% test script to run problem 2 on the data given 
 
A=[4 2 1 0 0; 
   1 4 1 1 0; 
   2 1 4 1 2; 
   0 1 1 4 1; 
   0 0 1 2 4]; 
initialEigenVectorGuess=[1 1 1 1 1]; 
 
%use Matlab to see the values to verify against
 
[v,l]=eig(A) 
 
%Set parameters 
maxIter=10; 
delErr=0.0001;  %not specified, try these 
delEps=0.0001;  %not specified, try these 
 
[lambda]=nma_power(A,initialEigenVectorGuess,maxIter,delErr,delEps); 
fprintf('---------- power result. Eigenvalue=%f\n',lambda); 
 
[lambda]=nma_inverse_power(A,initialEigenVectorGuess,maxIter,delErr,delEps); 
fprintf('---------- power inverse result. Eigenvalue=%f\n',lambda); 
 
[lambda,k]=nma_shifted_power(A,initialEigenVectorGuess,maxIter,delErr,delEps,2); 
lambda=2+lambda; 
fprintf('---------- power shifted result. Eigenvalue=%f\n',lambda); 
 
[lambda,k]=nma_inverse_shifted_power(A,initialEigenVectorGuess,maxIter,delErr,delEps,6.5); 
lambda=6.5+lambda; 
fprintf('---------- power inverse shifted result. Eigenvalue=%f\n',lambda);

4.11 HW 10

PDF

4.12 HW 11

PDF

4.13 HW 12

4.13.1 Section 7.1, Problem 6

Problem: Derive the following 2 formulas for approximation of derivatives and show they are both O(\(h^{4}\)) by evaluating their error terms

\begin{align*} f\ ^{\prime }\left ( x\right ) & =\frac {1}{12h}\left [ -f\left ( x+2h\right ) +8f\left ( x+h\right ) -8f\left ( x-h\right ) +f\left ( x-2h\right ) \right ] \\ f\ ^{\prime \prime }\left ( x\right ) & =\frac {1}{12h^{2}}\left [ -f\left ( x+2h\right ) +16f\left ( x+h\right ) -30f\left ( x\right ) +16f\left ( x-h\right ) -f\left ( x-2h\right ) \right ] \end{align*}

Solution:

I could obtain the above results directly by applying the Richardson extrapolation formulas (which is the short approach), but I assumed the question wanted us to derive these from first principles. I first show how to do one using Richardson extrapolation, then solve both from first principles.

To obtain the approximation for \(f^{\prime }\left ( x\right ) \) using Richardson, we do the following:

\begin{align} \varphi \left ( h\right ) & =\frac {1}{2h}\left [ f\left ( x+h\right ) -f\left ( x-h\right ) \right ] \nonumber \\ L & =\varphi \left ( h\right ) +a_{2}h^{2}+a_{4}h^{4}+\cdots \tag {1C}\end{align}

Replacing \(h\) by \(2h\)

\begin{equation} L=\varphi \left ( 2h\right ) +a_{2}4h^{2}+a_{4}16h^{4}+\cdots \tag {2C}\end{equation}

Multiplying (1C) by 4 and subtracting (2C) from the result

\begin{align*} 3L & =\left ( 4\varphi \left ( h\right ) +4a_{2}h^{2}+4a_{4}h^{4}+\cdots \right ) -\left ( \varphi \left ( 2h\right ) +a_{2}4h^{2}+a_{4}16h^{4}+\cdots \right ) \\ & =4\varphi \left ( h\right ) -\varphi \left ( 2h\right ) -12a_{4}h^{4}-\cdots \end{align*}

Hence

\begin{align*} L & =\frac {1}{3}\left ( \frac {2}{h}\left [ f\left ( x+h\right ) -f\left ( x-h\right ) \right ] -\frac {1}{4h}\left [ f\left ( x+2h\right ) -f\left ( x-2h\right ) \right ] -12a_{4}h^{4}-\cdots \right ) \\ & =\frac {2}{3h}\left [ f\left ( x+h\right ) -f\left ( x-h\right ) \right ] -\frac {1}{12h}\left [ f\left ( x+2h\right ) -f\left ( x-2h\right ) \right ] -4a_{4}h^{4}-\cdots \\ & =\frac {1}{12h}\left ( 8\left [ f\left ( x+h\right ) -f\left ( x-h\right ) \right ] -\left [ f\left ( x+2h\right ) -f\left ( x-2h\right ) \right ] \right ) -4a_{4}h^{4}-\cdots \\ & =\frac {1}{12h}\left [ -f\left ( x+2h\right ) +8f\left ( x+h\right ) -8f\left ( x-h\right ) +f\left ( x-2h\right ) \right ] -4a_{4}h^{4}-\cdots \end{align*}

Which is the required result. We also see that the error term is \(O\left ( h^{4}\right ) \).

Now we solve it again, using Taylor series directly (which I assume is what the book wanted us to do).

From Taylor expansion, we write, by expanding around \(x+h\) and \(x-h\)

\begin{align*} f\left ( x+h\right ) & =f\left ( x\right ) +hf\ ^{\prime }\left ( x\right ) +\frac {h^{2}}{2}f^{\prime \prime }\left ( x\right ) +\frac {h^{3}}{3!}f^{\left ( 3\right ) }\left ( x\right ) +\frac {h^{4}}{4!}f^{\left ( 4\right ) }\left ( x\right ) +\frac {h^{5}}{5!}f^{\left ( 5\right ) }\left ( \xi _{1}\right ) \\ f\left ( x-h\right ) & =f\left ( x\right ) -hf\ ^{\prime }\left ( x\right ) +\frac {h^{2}}{2}f^{\prime \prime }\left ( x\right ) -\frac {h^{3}}{3!}f^{\left ( 3\right ) }\left ( x\right ) +\frac {h^{4}}{4!}f^{\left ( 4\right ) }\left ( x\right ) -\frac {h^{5}}{5!}f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \end{align*}

Subtract the second from the first equation

\[ f\left ( x+h\right ) -f\left ( x-h\right ) =2hf\ ^{\prime }\left ( x\right ) +\frac {h^{3}}{3}f^{\left ( 3\right ) }\left ( x\right ) +\frac {h^{5}}{120}\left [ f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \right ] \]

Solving for \(f^{\prime }\left ( x\right ) \) we obtain

\begin{equation} f\ ^{\prime }\left ( x\right ) =\frac {1}{2h}\left [ f\left ( x+h\right ) -f\left ( x-h\right ) \right ] -\frac {1}{6}h^{2}f^{\left ( 3\right ) }\left ( x\right ) -\frac {1}{240}h^{4}\left [ f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \right ] \tag {1}\end{equation}

Now we do the same again, but by expanding around \(x+2h\) and \(x-2h\)

\begin{align*} f\left ( x+2h\right ) & =f\left ( x\right ) +2hf\ ^{\prime }\left ( x\right ) +\frac {\left ( 2h\right ) ^{2}}{2}f^{\prime \prime }\left ( x\right ) +\frac {\left ( 2h\right ) ^{3}}{3!}f^{\left ( 3\right ) }\left ( x\right ) +\frac {\left ( 2h\right ) ^{4}}{4!}f^{\left ( 4\right ) }\left ( x\right ) +\frac {\left ( 2h\right ) ^{5}}{5!}f^{\left ( 5\right ) }\left ( \xi _{1}\right ) \\ f\left ( x-2h\right ) & =f\left ( x\right ) -2hf\ ^{\prime }\left ( x\right ) +\frac {\left ( 2h\right ) ^{2}}{2}f^{\prime \prime }\left ( x\right ) -\frac {\left ( 2h\right ) ^{3}}{3!}f^{\left ( 3\right ) }\left ( x\right ) +\frac {\left ( 2h\right ) ^{4}}{4!}f^{\left ( 4\right ) }\left ( x\right ) -\frac {\left ( 2h\right ) ^{5}}{5!}f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \end{align*}

Subtract the second from the first equation

\begin{align*} f\left ( x+2h\right ) -f\left ( x-2h\right ) & =4hf\ ^{\prime }\left ( x\right ) +2\frac {\left ( 2h\right ) ^{3}}{3!}f^{\left ( 3\right ) }\left ( x\right ) +\frac {\left ( 2h\right ) ^{5}}{5!}\left [ f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \right ] \\ & =4hf\ ^{\prime }\left ( x\right ) +\frac {8}{3}h^{3}f^{\left ( 3\right ) }\left ( x\right ) +\frac {4}{15}h^{5}\left [ f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \right ] \end{align*}

Solving for \(f^{\prime }\left ( x\right ) \) we obtain

\begin{equation} f\ ^{\prime }\left ( x\right ) =\frac {1}{4h}\left [ f\left ( x+2h\right ) -f\left ( x-2h\right ) \right ] -\frac {4}{6}h^{2}f^{\left ( 3\right ) }\left ( x\right ) -\frac {1}{15}h^{4}\left [ f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \right ] \tag {2}\end{equation}

We want to eliminate \(f^{(3)}\left ( x\right ) \) from the above. So we multiply eq(1) by 4 and subtract eq(2) from the result. So equation (1) becomes

\begin{align} 4f\ ^{\prime }\left ( x\right ) & =4\left ( \frac {1}{2h}\left [ f\left ( x+h\right ) -f\left ( x-h\right ) \right ] -\frac {1}{6}h^{2}f^{\left ( 3\right ) }\left ( x\right ) -\frac {1}{240}h^{4}\left [ f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \right ] \right ) \nonumber \\ & =\frac {2}{h}\left [ f\left ( x+h\right ) -f\left ( x-h\right ) \right ] -\frac {4}{6}h^{2}f^{\left ( 3\right ) }\left ( x\right ) -\frac {1}{60}h^{4}\left [ f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \right ] \tag {3}\end{align}

Now subtract (2) from (3) we obtain

\begin{align*} 3f^{\prime }\left ( x\right ) & =\frac {2}{h}\left [ f\left ( x+h\right ) -f\left ( x-h\right ) \right ] -\frac {4}{6}h^{2}f^{\left ( 3\right ) }\left ( x\right ) -\frac {1}{60}h^{4}\left [ f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \right ] -\\ & \left ( \frac {1}{4h}\left [ f\left ( x+2h\right ) -f\left ( x-2h\right ) \right ] -\frac {4}{6}h^{2}f^{\left ( 3\right ) }\left ( x\right ) -\frac {1}{15}h^{4}\left [ f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \right ] \right ) \end{align*}

Hence

\begin{align*} 3f^{\prime }\left ( x\right ) & =\frac {2}{h}\left [ f\left ( x+h\right ) -f\left ( x-h\right ) \right ] -\frac {1}{60}h^{4}\left [ f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \right ] -\\ & \frac {1}{4h}\left [ f\left ( x+2h\right ) -f\left ( x-2h\right ) \right ] +\frac {1}{15}h^{4}\left [ f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \right ] \\ & \\ f^{\prime }\left ( x\right ) & =\frac {2}{3h}\left [ f\left ( x+h\right ) -f\left ( x-h\right ) \right ] -\frac {1}{12h}\left [ f\left ( x+2h\right ) -f\left ( x-2h\right ) \right ] +\frac {1}{60}h^{4}\left [ f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \right ] \\ & =\frac {1}{12h}\left [ 8f\left ( x+h\right ) -8f\left ( x-h\right ) -f\left ( x+2h\right ) +f\left ( x-2h\right ) \right ] +\frac {1}{60}h^{4}\left [ f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \right ] \end{align*}

Rearrange terms to make it look as in the textbook

\begin{equation} f^{\prime }\left ( x\right ) =\frac {1}{12h}\left [ -f\left ( x+2h\right ) +8f\left ( x+h\right ) -8f\left ( x-h\right ) +f\left ( x-2h\right ) \right ] +\frac {1}{30}h^{4}f^{\left ( 5\right ) }\left ( \xi \right ) \tag {4}\end{equation}

Where we replaced \(\frac {1}{60}h^{4}\left [ f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) \right ] \) by \(\frac {1}{30}h^{4}\left [ \frac {f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) }{2}\right ] =\frac {1}{30}h^{4}f^{\left ( 5\right ) }\left ( \xi \right ) \), with \(f^{\left ( 5\right ) }\left ( \xi \right ) \) being the mean value \(\frac {f^{\left ( 5\right ) }\left ( \xi _{1}\right ) +f^{\left ( 5\right ) }\left ( \xi _{2}\right ) }{2}\), attained at some \(\xi \) by the intermediate value theorem (assuming \(f^{\left ( 5\right ) }\) is continuous).

Hence from equation (4) we see that the error is \(O\left ( h^{4}\right ) \) as required to show.

Hence

\[ \fbox {$f^\prime \left ( x\right ) \approx \frac {1}{12h}\left [ -f\left ( x+2h\right ) +8f\left ( x+h\right ) -8f\left ( x-h\right ) +f\left ( x-2h\right ) \right ] $}\]
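As a quick numerical sanity check of this formula (a small sketch only, not part of the homework; \(\arctan x\) at \(x=\sqrt {2}\) is simply the test case used later in the Richardson assignment), halving \(h\) should reduce the error by roughly a factor of \(2^{4}=16\):

% sketch: verify the O(h^4) behavior of the five-point first derivative formula
f  = @(x) atan(x);            % test function
fp = @(x) 1./(1+x.^2);        % exact derivative
x  = sqrt(2);
for h=[0.1 0.05 0.025]
    d = (-f(x+2*h)+8*f(x+h)-8*f(x-h)+f(x-2*h))/(12*h);
    fprintf('h=%7.4f   error=%e\n',h,abs(d-fp(x)));
end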

Now we need to show the formula for \(f^{\prime \prime }\left ( x\right ) .\) We do the same as above, but instead of subtracting equations, we add them. We start from the top to show these again step by step.

From Taylor expansion, we write, by expanding around \(x+h\) and \(x-h\)

\begin{align*} f\left ( x+h\right ) & =f\left ( x\right ) +hf\ ^{\prime }\left ( x\right ) +\frac {h^{2}}{2}f^{\prime \prime }\left ( x\right ) +\frac {h^{3}}{3!}f^{\left ( 3\right ) }\left ( x\right ) +\frac {h^{4}}{4!}f^{\left ( 4\right ) }\left ( x\right ) +\frac {h^{5}}{5!}f^{\left ( 5\right ) }\left ( x\right ) +\frac {h^{6}}{6!}f^{\left ( 6\right ) }\left ( \xi _{1}\right ) \\ f\left ( x-h\right ) & =f\left ( x\right ) -hf\ ^{\prime }\left ( x\right ) +\frac {h^{2}}{2}f^{\prime \prime }\left ( x\right ) -\frac {h^{3}}{3!}f^{\left ( 3\right ) }\left ( x\right ) +\frac {h^{4}}{4!}f^{\left ( 4\right ) }\left ( x\right ) -\frac {h^{5}}{5!}f^{\left ( 5\right ) }\left ( x\right ) +\frac {h^{6}}{6!}f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \end{align*}

Add the second to the first equation

\[ f\left ( x+h\right ) +f\left ( x-h\right ) =2f\ \left ( x\right ) +h^{2}f^{\prime \prime }\left ( x\right ) +\frac {h^{4}}{12}f^{\left ( 4\right ) }\left ( x\right ) +\frac {h^{6}}{6!}\left [ f^{\left ( 6\right ) }\left ( \xi _{1}\right ) +f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \right ] \]

Solving for \(f^{\prime \prime }\left ( x\right ) \) we obtain

\begin{equation} f^{\prime \prime }\left ( x\right ) =\frac {1}{h^{2}}\left [ f\left ( x+h\right ) +f\left ( x-h\right ) \right ] -\frac {2}{h^{2}}f\ \left ( x\right ) -\frac {h^{2}}{12}f^{\left ( 4\right ) }\left ( x\right ) -\frac {1}{720}h^{4}\left [ f^{\left ( 6\right ) }\left ( \xi _{1}\right ) +f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \right ] \tag {1A}\end{equation}

Now we do the same again, but by expanding around \(x+2h\) and \(x-2h\)

\begin{align*} f\left ( x+2h\right ) & =f\left ( x\right ) +2hf\ ^{\prime }\left ( x\right ) +\frac {\left ( 2h\right ) ^{2}}{2}f^{\prime \prime }\left ( x\right ) +\frac {\left ( 2h\right ) ^{3}}{3!}f^{\left ( 3\right ) }\left ( x\right ) +\frac {\left ( 2h\right ) ^{4}}{4!}f^{\left ( 4\right ) }\left ( x\right ) +\frac {\left ( 2h\right ) ^{5}}{5!}f^{\left ( 5\right ) }\left ( x\right ) +\frac {\left ( 2h\right ) ^{6}}{6!}f^{\left ( 6\right ) }\left ( \xi _{1}\right ) \\ f\left ( x-2h\right ) & =f\left ( x\right ) -2hf\ ^{\prime }\left ( x\right ) +\frac {\left ( 2h\right ) ^{2}}{2}f^{\prime \prime }\left ( x\right ) -\frac {\left ( 2h\right ) ^{3}}{3!}f^{\left ( 3\right ) }\left ( x\right ) +\frac {\left ( 2h\right ) ^{4}}{4!}f^{\left ( 4\right ) }\left ( x\right ) -\frac {\left ( 2h\right ) ^{5}}{5!}f^{\left ( 5\right ) }\left ( x\right ) +\frac {\left ( 2h\right ) ^{6}}{6!}f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \end{align*}

Add the second to the first equation

\begin{align*} f\left ( x+2h\right ) +f\left ( x-2h\right ) & =2f\ \left ( x\right ) +\left ( 2h\right ) ^{2}f^{\prime \prime }\left ( x\right ) +\frac {\left ( 2h\right ) ^{4}}{12}f^{\left ( 4\right ) }\left ( x\right ) +\frac {\left ( 2h\right ) ^{6}}{6!}\left [ f^{\left ( 6\right ) }\left ( \xi _{1}\right ) +f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \right ] \\ & =2f\ \left ( x\right ) +4h^{2}f^{\prime \prime }\left ( x\right ) +\frac {4}{3}h^{4}f^{\left ( 4\right ) }\left ( x\right ) +\frac {4}{45}h^{6}\left [ f^{\left ( 6\right ) }\left ( \xi _{1}\right ) +f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \right ] \end{align*}

Solving for \(f^{\prime \prime }\left ( x\right ) \) we obtain

\begin{equation} f\ ^{\prime \prime }\left ( x\right ) =\frac {1}{4h^{2}}\left [ f\left ( x+2h\right ) +f\left ( x-2h\right ) \right ] -\frac {1}{2h^{2}}f\ \left ( x\right ) -\frac {1}{3}h^{2}f^{\left ( 4\right ) }\left ( x\right ) -\frac {1}{45}h^{4}\left [ f^{\left ( 6\right ) }\left ( \xi _{1}\right ) +f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \right ] \tag {2A}\end{equation}

We want to eliminate \(f^{(4)}\left ( x\right ) \) from the above. So we multiply eq (1A) by 4 and subtract eq (2A) from the result. Equation (1A) becomes

\begin{align} 4f^{\prime \prime }\left ( x\right ) & =4\left ( \frac {1}{h^{2}}\left [ f\left ( x+h\right ) +f\left ( x-h\right ) \right ] -\frac {2}{h^{2}}f\ \left ( x\right ) -\frac {h^{2}}{12}f^{\left ( 4\right ) }\left ( x\right ) -\frac {1}{720}h^{4}\left [ f^{\left ( 6\right ) }\left ( \xi _{1}\right ) +f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \right ] \right ) \nonumber \\ & =\frac {4}{h^{2}}\left [ f\left ( x+h\right ) +f\left ( x-h\right ) \right ] -\frac {8}{h^{2}}f\ \left ( x\right ) -\frac {1}{3}h^{2}f^{\left ( 4\right ) }\left ( x\right ) -\frac {1}{180}h^{4}\left [ f^{\left ( 6\right ) }\left ( \xi _{1}\right ) +f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \right ] \tag {3A}\end{align}

Now subtract (2A) from (3A) we obtain

\begin{align*} 3f^{\prime \prime }\left ( x\right ) & =\frac {4}{h^{2}}\left [ f\left ( x+h\right ) +f\left ( x-h\right ) \right ] -\frac {8}{h^{2}}f\ \left ( x\right ) -\frac {1}{3}h^{2}f^{\left ( 4\right ) }\left ( x\right ) -\frac {1}{180}h^{4}\left [ f^{\left ( 6\right ) }\left ( \xi _{1}\right ) +f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \right ] -\\ & \left ( \frac {1}{4h^{2}}\left [ f\left ( x+2h\right ) +f\left ( x-2h\right ) \right ] -\frac {1}{2h^{2}}f\ \left ( x\right ) -\frac {1}{3}h^{2}f^{\left ( 4\right ) }\left ( x\right ) -\frac {1}{45}h^{4}\left [ f^{\left ( 6\right ) }\left ( \xi _{1}\right ) +f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \right ] \right ) \end{align*}

Hence

\begin{align*} 3f^{\prime \prime }\left ( x\right ) & =\frac {4}{h^{2}}\left [ f\left ( x+h\right ) +f\left ( x-h\right ) \right ] -\frac {8}{h^{2}}f\ \left ( x\right ) -\frac {1}{180}h^{4}\left [ f^{\left ( 6\right ) }\left ( \xi _{1}\right ) +f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \right ] -\\ & \frac {1}{4h^{2}}\left [ f\left ( x+2h\right ) +f\left ( x-2h\right ) \right ] +\frac {1}{2h^{2}}f\ \left ( x\right ) +\frac {1}{45}h^{4}\left [ f^{\left ( 6\right ) }\left ( \xi _{1}\right ) +f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \right ] \\ & \\ f^{\prime \prime }\left ( x\right ) & =\frac {4}{3h^{2}}\left [ f\left ( x+h\right ) +f\left ( x-h\right ) \right ] -\frac {1}{12h^{2}}\left [ f\left ( x+2h\right ) +f\left ( x-2h\right ) \right ] -\frac {8}{3h^{2}}f\ \left ( x\right ) +\frac {1}{6h^{2}}f\ \left ( x\right ) -\\ & \frac {1}{3\times 180}h^{4}\left [ f^{\left ( 6\right ) }\left ( \xi _{1}\right ) +f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \right ] +\frac {1}{3\times 45}h^{4}\left [ f^{\left ( 6\right ) }\left ( \xi _{1}\right ) +f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \right ] \\ & \\ & =\frac {1}{12h^{2}}\left ( 16\left [ f\left ( x+h\right ) +f\left ( x-h\right ) \right ] -\left [ f\left ( x+2h\right ) +f\left ( x-2h\right ) \right ] -32f\ \left ( x\right ) +2f\ \left ( x\right ) \right ) +\frac {1}{180}h^{4}\left [ f^{\left ( 6\right ) }\left ( \xi _{1}\right ) +f^{\left ( 6\right ) }\left ( \xi _{2}\right ) \right ] \\ & =\frac {1}{12h^{2}}\left ( 16f\left ( x+h\right ) +16f\left ( x-h\right ) -f\left ( x+2h\right ) -f\left ( x-2h\right ) -30f\ \left ( x\right ) \right ) +\frac {1}{90}h^{4}f^{\left ( 6\right ) }\left ( \xi \right ) \end{align*}

where, as before, \(f^{\left ( 6\right ) }\left ( \xi \right ) \) denotes the mean value \(\frac {f^{\left ( 6\right ) }\left ( \xi _{1}\right ) +f^{\left ( 6\right ) }\left ( \xi _{2}\right ) }{2}\).

Rearrange terms to make it look as in the textbook

\begin{equation} f^{\prime \prime }\left ( x\right ) =\frac {1}{12h^{2}}\left ( -f\left ( x+2h\right ) +16f\left ( x+h\right ) -30f\ \left ( x\right ) +16f\left ( x-h\right ) -f\left ( x-2h\right ) \right ) +\frac {1}{90}h^{4}\left [ f^{\left ( 6\right ) }\left ( \xi \right ) \right ] \tag {4A}\end{equation}

Hence from equation (4A) we see that the error is \(O\left ( h^{4}\right ) \) as required to show.

Hence

\[ f^{\prime \prime }\left ( x\right ) \approx \frac {1}{12h^{2}}\left ( -f\left ( x+2h\right ) +16f\left ( x+h\right ) -30f\left ( x\right ) +16f\left ( x-h\right ) -f\left ( x-2h\right ) \right ) \]
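The same kind of sanity check (again only a sketch, with the same test function as above) can be run for the second derivative formula; the error should again drop by about a factor of 16 each time \(h\) is halved:

% sketch: verify the O(h^4) behavior of the five-point second derivative formula
f   = @(x) atan(x);                 % test function
fpp = @(x) -2*x./(1+x.^2).^2;       % exact second derivative
x   = sqrt(2);
for h=[0.1 0.05 0.025]
    d = (-f(x+2*h)+16*f(x+h)-30*f(x)+16*f(x-h)-f(x-2*h))/(12*h^2);
    fprintf('h=%7.4f   error=%e\n',h,abs(d-fpp(x)));
end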

4.13.2 Section 7.1, Problem 9

problem: Show that in Richardson extrapolation, \(D\left ( 2,2\right ) =\frac {16}{15}\psi \left ( \frac {h}{2}\right ) -\frac {1}{15}\psi \left ( h\right ) \)

Solution:

\begin{equation} D\left ( n,k\right ) =\frac {4^{k}}{4^{k}-1}D\left ( n,k-1\right ) -\frac {1}{4^{k}-1}D\left ( n-1,k-1\right ) \tag {1}\end{equation}

Now, when \(n=2,k=2\), we obtain from (1)

\begin{align*} D\left ( 2,2\right ) & =\frac {4^{2}}{4^{2}-1}D\left ( 2,1\right ) -\frac {1}{4^{2}-1}D\left ( 1,1\right ) \\ & =\frac {16}{15}D\left ( 2,1\right ) -\frac {1}{15}D\left ( 1,1\right ) \end{align*}

But since \(D\left ( 1,1\right ) =\psi \left ( h\right ) ,D\left ( 2,1\right ) =\psi \left ( \frac {h}{2}\right ) \)

\[ D\left ( 2,2\right ) =\frac {16}{15}\psi \left ( \frac {h}{2}\right ) -\frac {1}{15}\psi \left ( h\right ) \]

4.13.3 Section 7.1, Problem 14

problem: Using Taylor series, derive the error term for the approximation

\[ f^{\prime }\left ( x\right ) \approx \frac {1}{2h}\left [ -3f\left ( x\right ) +4f\left ( x+h\right ) -f\left ( x+2h\right ) \right ] \]

answer:

expand around \(x+h\)

\begin{align} f\left ( x+h\right ) & =f\left ( x\right ) +hf^{\prime }\left ( x\right ) +\frac {h^{2}}{2}f^{\prime \prime }\left ( x\right ) +\frac {h^{3}}{6}f^{\prime \prime \prime }\left ( \xi _{1}\right ) \nonumber \\ f^{\prime }\left ( x\right ) & =\frac {1}{h}f\left ( x+h\right ) -\frac {1}{h}f\left ( x\right ) -\frac {h}{2}f^{\prime \prime }\left ( x\right ) -\frac {h^{2}}{6}f^{\prime \prime \prime }\left ( \xi _{1}\right ) \tag {1}\end{align}

Now expand around \(x+2h\)

\begin{align} f\left ( x+2h\right ) & =f\left ( x\right ) +2hf^{\prime }\left ( x\right ) +2h^{2}f^{\prime \prime }\left ( x\right ) +\frac {8h^{3}}{6}f^{\prime \prime \prime }\left ( \xi _{2}\right ) \nonumber \\ f^{\prime }\left ( x\right ) & =\frac {1}{2h}f\left ( x+2h\right ) -\frac {1}{2h}f\left ( x\right ) -hf^{\prime \prime }\left ( x\right ) -\frac {4h^{2}}{6}f^{\prime \prime \prime }\left ( \xi _{2}\right ) \tag {2}\end{align}

Multiplying (2) by \(-\frac {1}{2}\) and adding the result to (1) we obtain

\begin{align*} -\frac {1}{2}f^{\prime }\left ( x\right ) +f^{\prime }\left ( x\right ) & =-\frac {1}{2}\left ( \frac {1}{2h}f\left ( x+2h\right ) -\frac {1}{2h}f\left ( x\right ) -hf^{\prime \prime }\left ( x\right ) -\frac {4h^{2}}{6}f^{\prime \prime \prime }\left ( \xi _{2}\right ) \right ) +\\ & \left ( \frac {1}{h}f\left ( x+h\right ) -\frac {1}{h}f\left ( x\right ) -\frac {h}{2}f^{\prime \prime }\left ( x\right ) -\frac {h^{2}}{6}f^{\prime \prime \prime }\left ( \xi _{1}\right ) \right ) \\ & \\ \frac {1}{2}f^{\prime }\left ( x\right ) & =\frac {-1}{4h}f\left ( x+2h\right ) +\frac {1}{4h}f\left ( x\right ) +\frac {h}{2}f^{\prime \prime }\left ( x\right ) +\frac {2h^{2}}{6}f^{\prime \prime \prime }\left ( \xi _{2}\right ) +\frac {1}{h}f\left ( x+h\right ) -\frac {1}{h}f\left ( x\right ) -\frac {h}{2}f^{\prime \prime }\left ( x\right ) -\frac {h^{2}}{6}f^{\prime \prime \prime }\left ( \xi _{1}\right ) \\ f^{\prime }\left ( x\right ) & =\frac {-1}{2h}f\left ( x+2h\right ) +\frac {1}{2h}f\left ( x\right ) +hf^{\prime \prime }\left ( x\right ) +\frac {4h^{2}}{6}f^{\prime \prime \prime }\left ( \xi _{2}\right ) +\frac {2}{h}f\left ( x+h\right ) -\frac {2}{h}f\left ( x\right ) -hf^{\prime \prime }\left ( x\right ) -\frac {2h^{2}}{6}f^{\prime \prime \prime }\left ( \xi _{1}\right ) \\ & =\left [ \frac {-1}{2h}f\left ( x+2h\right ) +\frac {1}{2h}f\left ( x\right ) +hf^{\prime \prime }\left ( x\right ) +\frac {2}{h}f\left ( x+h\right ) -\frac {2}{h}f\left ( x\right ) -hf^{\prime \prime }\left ( x\right ) \right ] -\frac {2h^{2}}{6}f^{\prime \prime \prime }\left ( \xi _{1}\right ) +\frac {4h^{2}}{6}f^{\prime \prime \prime }\left ( \xi _{2}\right ) \\ & =\frac {1}{2h}\left [ -f\left ( x+2h\right ) +f\left ( x\right ) +4f\left ( x+h\right ) -4f\left ( x\right ) \right ] -\frac {h^{2}}{3}f^{\prime \prime \prime }\left ( \xi _{1}\right ) +\frac {2h^{2}}{3}f^{\prime \prime \prime }\left ( \xi _{2}\right ) \\ & =\frac {1}{2h}\left [ -f\left ( x+2h\right ) -3f\left ( x\right ) +4f\left ( x+h\right ) \right ] +h^{2}\left ( \frac {2}{3}f^{\prime \prime \prime }\left ( \xi _{2}\right ) -\frac {1}{3}f^{\prime \prime \prime }\left ( \xi _{1}\right ) \right ) \end{align*}

Which is the equation we are asked to show.

From the above we see that the error term is given by

\[ h^{2}\left ( \frac {2}{3}f^{\prime \prime \prime }\left ( \xi _{2}\right ) -\frac {1}{3}f^{\prime \prime \prime }\left ( \xi _{1}\right ) \right ) \]

Hence the error is \(O\left ( h^{2}\right ) \)

4.13.4 Section 7.1, Problem 16

problem: Using Taylor series, derive the error term for the approximation

\[ f^{\prime \prime }\left ( x\right ) \approx \frac {1}{h^{2}}\left [ f\left ( x\right ) -2f\left ( x+h\right ) +f\left ( x+2h\right ) \right ] \]

Answer: expand around \(x+h\)

\begin{align} f\left ( x+h\right ) & =f\left ( x\right ) +hf^{\prime }\left ( x\right ) +\frac {h^{2}}{2}f^{\prime \prime }\left ( x\right ) +\frac {h^{3}}{6}f^{\prime \prime \prime }\left ( \xi _{1}\right ) \nonumber \\ \frac {h^{2}}{2}f^{\prime \prime }\left ( x\right ) & =f\left ( x+h\right ) -f\left ( x\right ) -hf^{\prime }\left ( x\right ) -\frac {h^{3}}{6}f^{\prime \prime \prime }\left ( \xi _{1}\right ) \nonumber \\ f^{\prime \prime }\left ( x\right ) & =\frac {2}{h^{2}}f\left ( x+h\right ) -\frac {2}{h^{2}}f\left ( x\right ) -\frac {2}{h}f^{\prime }\left ( x\right ) -\frac {h}{3}f^{\prime \prime \prime }\left ( \xi _{1}\right ) \tag {1}\end{align}

Now expand around \(x+2h\)

\begin{align} f\left ( x+2h\right ) & =f\left ( x\right ) +2hf^{\prime }\left ( x\right ) +2h^{2}f^{\prime \prime }\left ( x\right ) +\frac {8h^{3}}{6}f^{\prime \prime \prime }\left ( \xi _{2}\right ) \nonumber \\ 2h^{2}f^{\prime \prime }\left ( x\right ) & =f\left ( x+2h\right ) -f\left ( x\right ) -2hf^{\prime }\left ( x\right ) -\frac {8h^{3}}{6}f^{\prime \prime \prime }\left ( \xi _{2}\right ) \nonumber \\ f^{\prime \prime }\left ( x\right ) & =\frac {1}{2h^{2}}f\left ( x+2h\right ) -\frac {1}{2h^{2}}f\left ( x\right ) -\frac {1}{h}f^{\prime }\left ( x\right ) -\frac {2h}{3}f^{\prime \prime \prime }\left ( \xi _{2}\right ) \tag {2}\end{align}

Multiplying (2) by \(-2\) and adding the result to (1) we obtain

\begin{align*} -2f^{\prime \prime }\left ( x\right ) +f^{\prime \prime }\left ( x\right ) & =-2\left ( \frac {1}{2h^{2}}f\left ( x+2h\right ) -\frac {1}{2h^{2}}f\left ( x\right ) -\frac {1}{h}f^{\prime }\left ( x\right ) -\frac {2h}{3}f^{\prime \prime \prime }\left ( \xi _{2}\right ) \right ) \\ & +\frac {2}{h^{2}}f\left ( x+h\right ) -\frac {2}{h^{2}}f\left ( x\right ) -\frac {2}{h}f^{\prime }\left ( x\right ) -\frac {h}{3}f^{\prime \prime \prime }\left ( \xi _{1}\right ) \\ -f^{\prime \prime }\left ( x\right ) & =-\frac {1}{h^{2}}f\left ( x+2h\right ) +\frac {1}{h^{2}}f\left ( x\right ) +\frac {2}{h}f^{\prime }\left ( x\right ) +\frac {4h}{3}f^{\prime \prime \prime }\left ( \xi _{2}\right ) +\frac {2}{h^{2}}f\left ( x+h\right ) -\frac {2}{h^{2}}f\left ( x\right ) -\frac {2}{h}f^{\prime }\left ( x\right ) -\frac {h}{3}f^{\prime \prime \prime }\left ( \xi _{1}\right ) \\ f^{\prime \prime }\left ( x\right ) & =\frac {1}{h^{2}}f\left ( x+2h\right ) -\frac {1}{h^{2}}f\left ( x\right ) -\frac {2}{h}f^{\prime }\left ( x\right ) -\frac {4h}{3}f^{\prime \prime \prime }\left ( \xi _{2}\right ) -\frac {2}{h^{2}}f\left ( x+h\right ) +\frac {2}{h^{2}}f\left ( x\right ) +\frac {2}{h}f^{\prime }\left ( x\right ) +\frac {h}{3}f^{\prime \prime \prime }\left ( \xi _{1}\right ) \\ & =\frac {1}{h^{2}}f\left ( x+2h\right ) -\frac {1}{h^{2}}f\left ( x\right ) -\frac {2}{h^{2}}f\left ( x+h\right ) +\frac {2}{h^{2}}f\left ( x\right ) +\frac {h}{3}f^{\prime \prime \prime }\left ( \xi _{1}\right ) -\frac {4h}{3}f^{\prime \prime \prime }\left ( \xi _{2}\right ) \\ & =\frac {1}{h^{2}}\left ( f\left ( x+2h\right ) -f\left ( x\right ) -2f\left ( x+h\right ) +2f\left ( x\right ) \right ) +h\left ( \frac {1}{3}f^{\prime \prime \prime }\left ( \xi _{1}\right ) -\frac {4}{3}f^{\prime \prime \prime }\left ( \xi _{2}\right ) \right ) \\ & =\frac {1}{h^{2}}\left ( f\left ( x+2h\right ) +f\left ( x\right ) -2f\left ( x+h\right ) \right ) +h\left ( \frac {1}{3}f^{\prime \prime \prime }\left ( \xi _{1}\right ) -\frac {4}{3}f^{\prime \prime \prime }\left ( \xi _{2}\right ) \right ) \end{align*}

Hence

\[ f^{\prime \prime }\left ( x\right ) \approx \frac {1}{h^{2}}\left ( f\left ( x+2h\right ) +f\left ( x\right ) -2f\left ( x+h\right ) \right ) \]

with the error term

\[ h\left ( \frac {1}{3}f^{\prime \prime \prime }\left ( \xi _{1}\right ) -\frac {4}{3}f^{\prime \prime \prime }\left ( \xi _{2}\right ) \right ) \]

Hence the error is \(O\left ( h\right ) \)

4.13.5 Computer assignment 4/30/2007. Richardson Algorithm

This is the output

This is the source code

%script to test nma_richardson 
%Nasser Abbasi 
 
h=1 
x=sqrt(2); 
f=@(x)atan(x); 
M=6; 
 
%first compute in single precision
 D=nma_richardson(h,x,f,M,0); 
 format long g; 
 fprintf('Richardson table in single floating point\n'); 
 D 
 
 %Now do it in double precision
 D=nma_richardson(h,x,f,M,1); 
 format long g; 
 fprintf('Richardson table in double floating point\n'); 
 D
function D=nma_richardson(h,x,f,M,doubleFlag) 
%function D=nma_richardson(h,x,f,M,doubleFlag) 
% 
%INPUT: 
% h:  spacing for numerical differentiation 
% x:  where to find diff 
% f:  the function whose derivative we are finding
% M:  how big a richardson table to make 
% doubleFlag: 0 to do it in single floating point 
%             or 1 to do it in double floating 
 
 
% MATH 501, CSUF, spring 2007 
% computer assignment 4/30/2007 
% Richardson extrapolation 
% 
%Nasser Abbasi, May 5,2007 
 
if doubleFlag 
    D=zeros(M+1,M+1); 
else 
    D=zeros(M+1,M+1,'single'); 
end 
 
for n=1:M+1 
    D(n,1)=phi(h/(2^(n-1)),x,f); 
end 
 
for k=2:M+1 
    for n=k:M+1 
        D(n,k)=D(n,k-1)+(D(n,k-1)-D(n-1,k-1))/(4^(k-1)-1); 
    end 
end 
end 
 
 
function r=phi(h,x,f) 
r=1/(2*h)*(f(x+h)-f(x-h)); 
end

4.13.6 Computer assignment 5/2/2007. Midpoint,Trapezoid and Simpson

4.13.6.1 Conclusion

This table summarizes the results of the 3 methods:

Simpson
Error term: \(\frac {1}{180}\left ( b-a\right ) h^{4}\ \max \left \vert f^{\left ( 4\right ) }\left ( \xi \right ) \right \vert \)
Formula: \(I=\int _{a}^{b}f\left ( x\right ) dx\approx \frac {h}{3}\left ( f\left ( x_{0}\right ) +2\sum _{i=2}^{N/2}f\left ( x_{2i-2}\right ) +4\sum _{i=1}^{N/2}f\left ( x_{2i-1}\right ) +f\left ( x_{N}\right ) \right ) \)
Intervals needed: \(900\)
Numerical integration (long format): \(90.379254649757272\)

Midpoint
Error term: \(\frac {1}{24}\left ( b-a\right ) h^{2}\max \left \vert f^{\left ( 2\right ) }\left ( \xi \right ) \right \vert \)
Formula: \(\int _{a}^{b}f\left ( x\right ) dx\approx h\sum _{i=1}^{N-1}f\left ( \frac {x_{i+1}+x_{i}}{2}\right ) \) (note: \(N\) here is the number of points)
Intervals needed: \(174,285\)
Numerical integration (long format): \(90.379254649446878\)

Trapezoid
Error term: \(\frac {1}{12}\left ( b-a\right ) h^{2}\max \left \vert f^{\left ( 2\right ) }\left ( \xi \right ) \right \vert \)
Formula: \(\int _{a}^{b}f\left ( x\right ) dx\approx h\left ( \frac {f\left ( x_{1}\right ) }{2}+\sum _{i=2}^{N-1}f\left ( x_{i}\right ) +\frac {f\left ( x_{N}\right ) }{2}\right ) \) (note: \(N\) here is the number of points)
Intervals needed: \(246,476\)
Numerical integration (long format): \(90.379254649958952\)
4.13.6.2 Simpson

The error term in Simpson's rule is \(\frac {1}{180}\left ( b-a\right ) h^{4}\ \max \left \vert f^{\left ( 4\right ) }\left ( \xi \right ) \right \vert \) for some \(\xi \) between \(a\) and \(b\). Since we want to limit the maximum error, we look for where \(\left \vert f^{\left ( 4\right ) }\left ( \xi \right ) \right \vert \) is largest.

The function is \(x\ln \left ( x\right ) \), hence \(f^{\left ( 4\right ) }\left ( x\right ) =\frac {2}{x^{3}}\) and this is maximum when \(x\) is smallest. Hence the maximum will occur at the lower end of the range, which is \(x=1\) in this example.

Now we find the number of intervals \(N\) by solving \(\frac {1}{180}\left ( b-a\right ) h^{4}\ \max \left \vert f^{\left ( 4\right ) }\left ( \xi \right ) \right \vert <10^{-9}\), where \(10^{-9}\) is the bound we are asked to keep the computational error below.

Next, we solve for \(h\) from the above. Knowing \(h\), we find \(N\), the number of intervals. We then make sure \(N\) is an even number, adjusting it if needed, since Simpson's rule requires an even number of intervals. Finally we apply the Simpson integration formula, which is

\[ I=\int _{a}^{b}f\left ( x\right ) dx\approx \frac {h}{3}\left ( f\left ( x_{0}\right ) +2\sum _{i=2}^{N/2}f\left ( x_{2i-2}\right ) +4\sum _{i=1}^{N/2}f\left ( x_{2i-1}\right ) +f\left ( x_{N}\right ) \right ) \]

In the above \(N\) is the number of intervals. This is not to be confused with the two methods below, where I use \(N\) for the number of points. For Simpson's rule it was easier to stick with \(N\) being the number of intervals. A worked calculation of \(N\) for this problem is shown next.
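To make this step concrete, substituting \(a=1\), \(b=10\) and \(\max \left \vert f^{\left ( 4\right ) }\right \vert =2\) (attained at \(x=1\)):

\[ \frac {1}{180}\left ( b-a\right ) h^{4}\max \left \vert f^{\left ( 4\right ) }\right \vert =\frac {\left ( 9\right ) \left ( 2\right ) }{180}h^{4}=\frac {h^{4}}{10}<10^{-9}\Longrightarrow h<10^{-2}\Longrightarrow N=\left \lceil \frac {b-a}{h}\right \rceil =900 \]

which is already even, and agrees with the \(900\) intervals reported in the summary table.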

The matlab implementation uses a vectorized version for speed.

To verify that the correct answer is obtained, the result was compared with the output of a computer algebra system, which computes the integral to an arbitrarily large number of correct digits. The Matlab output was aligned against the CAS output and the digits verified to be correct to the required 9 decimal places.

Source code:

function nma_simpson_math_501 
% 
%Math 501, CSUF, spring 2007 
%Computer assignment 5/2/2007 
%Nasser Abbasi 
 
%For reference, this is exact answer for 60 decimal places 
%NIntegrate[x*Log[x], {x, 1, 10}, WorkingPrecision -> 60] 
%90.37925464970228420089957273421821038005507443143864880166639577470023557205731`60. 
% 
 
a = 1; 
b = 10; 
maxError = 10^(-9); 
 
%(2/x^3) is  d^4/dx^4 (x log(x)) 
%so max error will be when x is smallest, i.e. at a=1 
I4      = abs(2/a^3); 
errTerm = 1/180 * (b-a) * I4; 
h       = maxError /errTerm; 
h       = h^(1/4); 
N       = ceil((b-a)/h);  % N is number of intervals 
 
%N is the number of intervals; it needs to be an EVEN number of intervals 
if mod(N,2)==1 
   N = N+1; 
end 
 
h = (b-a)/N;  %update h since we rounded up above. 
fprintf('Simpson: Number of intervals needed is %d\n',N); 
 
x = linspace(a,b,N+1); 
f = @(x) x.*log(x);   %the function to integrate 
 
%vectorized solution 
I = f(x(1)) + 2*sum(f(x(3:2:end-2))) + 4*sum(f(x(2:2:end-1))) + f(x(end)); 
I = (h/3)*I; 
 
fprintf('answer is'); format long;  I
4.13.6.3 Midpoint

The error term is \(\frac {1}{24}\left ( b-a\right ) h^{2}\max \left \vert f^{\left ( 2\right ) }\left ( \xi \right ) \right \vert \) for some \(\xi \) between \(a\) and \(b\). The midpoint rule is evaluated as follows

\[ I=\int _{a}^{b}f\left ( x\right ) dx\approx h\sum _{i=1}^{N-1}f\left ( \frac {x_{i+1}+x_{i}}{2}\right ) \]

where \(N\) is the number of points, and I am using the Matlab convention for indexing, where the first point is \(x_{1}\) and not \(x_{0}\).

We start by finding the number of intervals by solving for \(h\) from \(\frac {1}{24}\left ( b-a\right ) h^{2}\max \left \vert f^{\left ( 2\right ) }\left ( \xi \right ) \right \vert <10^{-9}\), where \(10^{-9}\) is the bound we are asked to keep the computational error below.

The function is \(x\ln \left ( x\right ) \), hence \(f^{\left ( 2\right ) }\left ( x\right ) =\frac {1}{x}\) which is maximum at \(x=a\).
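Carrying out the same arithmetic as in the Simpson case, with \(\max \left \vert f^{\left ( 2\right ) }\right \vert =1\) at \(x=1\):

\[ \frac {1}{24}\left ( 9\right ) \left ( 1\right ) h^{2}<10^{-9}\Longrightarrow h<\sqrt {\frac {24\times 10^{-9}}{9}}\approx 5.164\times 10^{-5}\Longrightarrow N=\left \lceil \frac {b-a}{h}\right \rceil =174285 \]

which matches the number of intervals reported in the program output below.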

The matlab implementation is below with the output.

function nma_midpoint_math_501 
% 
%Math 501, CSUF, spring 2007 
%Computer assignment 5/2/2007 
%Nasser Abbasi 
 
%For reference, this is exact answer for 60 decimal places 
%NIntegrate[x*Log[x], {x, 1, 10}, WorkingPrecision -> 60] 
%90.37925464970228420089957273421821038005507443143864880166639577470023557205731`60. 
% 
 
a = 1; 
b = 10; 
maxError = 10^-9; 
 
%d^2/dx^2 (x log(x)) is (1/x) 
%so max error will be when x is smallest, i.e. at a=1 
I2      = abs(1/a); 
errTerm = 1/24 * (b-a) * I2; 
h       = maxError /errTerm; 
h       = sqrt(h); 
N       = ceil((b-a)/h); 
h       = (b-a)/N;  %update h since we rounded up above. 
fprintf('Midpoint: Number of intervals needed is %d\n',N); 
 
x     = linspace(a,b,N+1); 
xbar  = (x(1:end-1)+x(2:end))/2; 
f     = @(x) x.*log(x);   %the function to integrate 
 
%vectorized solution 
I = h*sum(f(xbar)); 
 
fprintf('answer is'); format long;  I

Output is

Midpoint: Number of intervals needed is 174285 
answer is 
I = 
 
  90.379254649446878
 
4.13.6.4 Trapezoid numerical integration

The error term is \(\frac {1}{12}\left ( b-a\right ) h^{2}\max \left \vert f^{\left ( 2\right ) }\left ( \xi \right ) \right \vert \) for some \(\xi \) between \(a\) and \(b\). The trapezoid rule is evaluated as follows

\[ h\left ( \frac {f\left ( x_{1}\right ) }{2}+\sum _{i=2}^{n-1}f\left ( x_{i}\right ) +\frac {f\left ( x_{n}\right ) }{2}\right ) \]

where \(n\) is the number of points, and I am using the Matlab indexing where \(x_{1}\) is the first point and not \(x_{0}\); hence the last point is \(x_{n}\).
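The same calculation for the trapezoid rule, again with \(\max \left \vert f^{\left ( 2\right ) }\right \vert =1\) at \(x=1\):

\[ \frac {1}{12}\left ( 9\right ) \left ( 1\right ) h^{2}<10^{-9}\Longrightarrow h<\sqrt {\frac {12\times 10^{-9}}{9}}\approx 3.651\times 10^{-5}\Longrightarrow N=\left \lceil \frac {b-a}{h}\right \rceil =246476 \]

which matches the number of intervals reported in the program output below.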

The following is the source code and the output

function nma_trap_math_501 
% 
%Math 501, CSUF, spring 2007 
%Computer assignment 5/2/2007 
%Nasser Abbasi 
 
%For reference, this is exact answer for 60 decimal places 
%NIntegrate[x*Log[x], {x, 1, 10}, WorkingPrecision -> 60] 
%90.37925464970228420089957273421821038005507443143864880166639577470023557205731`60. 
% 
 
a = 1; 
b = 10; 
maxError = 10^-9; 
 
%d^2/dx^2 (x log(x)) is (1/x) 
%so max error will be when x is smallest, i.e. at a=1 
I2      = abs(1/a); 
errTerm = 1/12 * (b-a) * I2; 
h       = maxError /errTerm; 
h       = sqrt(h); 
N       = ceil((b-a)/h);  % Number of intervals 
h       = (b-a)/N; 
fprintf('Trapezoid: Number of intervals needed is %d\n',N); 
 
x     = linspace(a,b,N+1); 
f     = @(x) x.*log(x);   %the function to integrate 
fbar  = sum(f(x(2:end-1))); 
 
%vectorized solution 
I = h * ( f(x(1))/2 + fbar + f(x(end))/2 ); 
 
fprintf('answer is'); format long;  I

Output

Trapezoid: Number of intervals needed is 246476 
answer is 
I = 
 
  90.379254649958952
 

4.13.7 source code

4.13.7.1 nma_compare.m
 
% Matlab code to illustrate how the error changes in 
% computing the derivative of arctan(x) at x=SQRT(2) as a function 
% of changing h in Taylor approximation. Forcing Matlab to do the 
% computation using 32 bits 
% by Nasser Abbasi 
 
h=single(1); 
M=26; 
X=single(sqrt(2)); 
f=@(x) single(atan(x)); 
 
F1=f(X); 
S = zeros(26,6,'single'); 
 
for k=1:M 
    F2=f(X+h); 
    d=single(F2-F1); 
    r=single(d/h); 
    S(k,1)=k; S(k,2)=h; S(k,3)=F2; S(k,4)=F1; S(k,5)=d; S(k,6)=r; 
    h=single(h/2); 
end 
format long g 
S 
 
 
% Matlab code to illustrate how the error changes in 
% computing the derivative of arctan(x) at x=SQRT(2) as a function 
% of changing h in Taylor approximation. using Matlab default double 
% floating point 
% by Nasser Abbasi 
clear all 
 
h=1; 
M=60; 
X=sqrt(2); 
f=@(x) atan(x); 
 
F1=f(X); 
S = zeros(M,6);   % preallocate for all M rows 
 
for k=1:M 
    F2=f(X+h); 
    d=F2-F1; 
    r=d/h; 
    S(k,1)=k; S(k,2)=h; S(k,3)=F2; S(k,4)=F1; S(k,5)=d; S(k,6)=r; 
    h=h/2; 
end 
format long g 
S
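
Each row of the matrix \(S\) printed by these scripts records the iteration number \(k\), the spacing \(h\), \(f\left ( x+h\right ) \), \(f\left ( x\right ) \), their difference, and the forward difference quotient \(\frac {f\left ( x+h\right ) -f\left ( x\right ) }{h}\). Since \(\frac {d}{dx}\arctan x=\frac {1}{1+x^{2}}\), this quotient should approach \(\frac {1}{3}\) at \(x=\sqrt {2}\) as \(h\) shrinks, until cancellation in \(f\left ( x+h\right ) -f\left ( x\right ) \) (which happens much sooner in single precision) makes it deteriorate.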
4.13.7.2 nma_trapezoidal.m
function I=nma_trapezoidal(func,from,to,nStrips) 
%function r=nma_trapezoidal(f,from,to,nStrips) 
% 
% integrates a function using trapezoidal rule using 
% specific number of strips. 
% 
% INPUT: 
%   func : a string that represents the function itself 
%       for example 'x*sin(x)'. The independent variable 
%       used in the string must be 'x' and no other letter. 
% 
%  from: lower limit 
%  to  : upper limit 
%  nStrips: number of strips to use 
% 
% OUTPUT 
%   I : The integral. 
% 
% Author: Nasser Abbasi 
% May 3, 2003 
 
I=0; 
 
if(nStrips<=0) 
    error('number of strips must be > 0'); 
end 
 
nPoints=nStrips+1; 
X=linspace(from,to,nPoints); 
h=X(2)-X(1); 
 
for(i=1:length(X)) 
    x=X(i); 
    f(i)=eval(func); 
    if(i==1 | i==length(X) ) 
        I=I+f(i); 
    else 
        I=I+2*f(i); 
    end 
end 
 
I=I/2; 
I=I*h;
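
A minimal usage sketch (my own example, not part of the assignment), integrating the same \(x\ln x\) over \(\left [ 1,10\right ] \) used in the computer assignment above:

%illustrative call only; the integrand string must use 'x' as its variable 
I = nma_trapezoidal('x*log(x)', 1, 10, 100000); 
fprintf('trapezoidal estimate: %.9f\n', I);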

4.13.8 Graded

PDF