4.3  HW4, Some floating point computations

  4.3.1  my solution, First Problem
  4.3.2  my solution, second problem
  4.3.3  key solution

4.3.1  my solution, First Problem

   4.3.1.1  Analysis
   4.3.1.2  Computation and Results
   4.3.1.3  Conclusion

This looks at two floating point problems. The first illustrates the loss of accuracy when adding a large number to a small number. The second illustrates the problem of subtracting two numbers that are close to each other in magnitude.

Investigate the floating point errors generated by the sum \(\sum_{n=1}^{N}\frac{1}{n^{2}}\) by comparing the result of performing the summation in the forward direction with the result of performing it in the reverse direction.

4.3.1.1 Analysis

When performing the sum in the forward direction, as in \(1+\frac{1}{4}+\frac{1}{9}+\cdots +\frac{1}{N^{2}}\), we observe that very early in the sum the running total becomes large relative to each new term, so we keep adding a very small quantity to a relatively large one. Adding a very small number to a large number loses the low-order digits of the small number, as discussed in the last lecture. However, when adding in the reverse order, as in \(\frac{1}{N^{2}}+\frac{1}{\left ( N-1\right ) ^{2}}+\frac{1}{\left ( N-2\right ) ^{2}}+\cdots +1\), the running total starts small and grows along with the terms, so at each step the two quantities being added are much closer to each other in magnitude. This reduces the floating point error.

The computation and results below confirm the above. \(N=20{,}000\) was used, and the computation was forced to be in single precision to better illustrate the problem.
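A short worked estimate of where the forward sum stops changing, assuming IEEE 754 single precision (24-bit significand) with round-to-nearest. After the first term the running sum \(S\) satisfies \(1\le S<2\), so:

\[
\operatorname{ulp}(S)=2^{-23},\qquad \operatorname{fl}\!\left(S+\frac{1}{n^{2}}\right)=S \quad\text{whenever}\quad \frac{1}{n^{2}}<\frac{1}{2}\operatorname{ulp}(S)=2^{-24}\iff n>2^{12}=4096
\]

\[
\sum_{n=4097}^{20000}\frac{1}{n^{2}}\approx \frac{1}{4096}-\frac{1}{20000}\approx 1.94\times 10^{-4}
\]

So in the forward direction roughly the last 16,000 terms are lost entirely, an amount of the same order as the observed error of \(1.59\times 10^{-4}\). The observed error is somewhat smaller, presumably because terms slightly larger than \(2^{-24}\) are rounded up to a full \(2^{-23}\) when added.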

4.3.1.2 Computation and Results

The program that computes the sum in the forward direction prints \(1.644725\).
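The original program is not reproduced here. The following is a minimal Python sketch of the forward sum, assuming single precision is emulated by rounding every intermediate result to IEEE 754 single precision with the standard `struct` module (the helper names `f32` and `forward_sum_f32` are hypothetical). The document reports \(1.644725\) for this case.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def forward_sum_f32(N: int) -> float:
    """Sum 1/n^2 for n = 1, 2, ..., N, rounding every operation to single precision."""
    s = 0.0
    for n in range(1, N + 1):
        term = f32(1.0 / f32(float(n) * float(n)))  # 1/n^2 computed in single precision
        s = f32(s + term)                            # running sum kept in single precision
    return s


print(f"forward sum, N=20000: {forward_sum_f32(20000):.6f}")
```

Rounding after each double-precision operation gives the correctly rounded single-precision result for `+`, `*` and `/`, because a double carries more than twice as many significand bits as a single.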

Now compare the above result with that obtained by performing the sum in the reverse direction, which gives \(1.644884\).
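A sketch of the same comparison, again assuming `struct`-based single-precision emulation (the helper names are hypothetical). The only difference between the two runs is the order in which the terms are fed to the same summation loop.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(ns) -> float:
    """Sum 1/n^2 over ns in the given order, rounding every operation to single precision."""
    s = 0.0
    for n in ns:
        term = f32(1.0 / f32(float(n) * float(n)))
        s = f32(s + term)
    return s


N = 20000
forward = sum_f32(range(1, N + 1))   # largest terms first
reverse = sum_f32(range(N, 0, -1))   # smallest terms first
print(f"forward:    {forward:.6f}")
print(f"reverse:    {reverse:.6f}")
print(f"difference: {reverse - forward:.3e}")
```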

The reverse-direction sum is the more accurate result. To confirm this, the sum can be repeated in double precision: the double-precision result agrees with the digits of the single-precision reverse-direction sum.
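A minimal sketch of the double-precision check, assuming Python's `float` (an IEEE 754 double) stands in for double precision. As an extra reference it uses `math.fsum`, which returns the correctly rounded sum of the double-precision terms; `sum_double` is a hypothetical helper name.

```python
import math


def sum_double(N: int, reverse: bool = False) -> float:
    """Sum 1/n^2 for n = 1..N in double precision, in forward or reverse order."""
    ns = range(N, 0, -1) if reverse else range(1, N + 1)
    s = 0.0
    for n in ns:
        s += 1.0 / (n * n)
    return s


N = 20000
exact = math.fsum(1.0 / (n * n) for n in range(1, N + 1))
# Each of the three lines below shows 1.644884.
print(f"double, forward: {sum_double(N):.6f}")
print(f"double, reverse: {sum_double(N, reverse=True):.6f}")
print(f"math.fsum:       {exact:.6f}")
```

In double precision the order hardly matters at six digits, since even the smallest term \(1/20000^{2}=2.5\times 10^{-9}\) is far above half an ulp of the running sum.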

4.3.1.3 Conclusion

In floating point arithmetic, avoid adding a very small number to a large number, as this loses digits of the small number. Reordering the summation, as above, is one way to accomplish this while still performing the required computation.
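Another way, not used in the original work, is Kahan (compensated) summation, which keeps the forward order but carries the low-order bits lost at each addition in a separate correction term. A minimal sketch, assuming the same `struct`-based single-precision emulation (helper names are hypothetical):

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def kahan_sum_f32(ns) -> float:
    """Kahan-sum 1/n^2 over ns, rounding every operation to single precision."""
    s = 0.0  # running sum
    c = 0.0  # compensation: the part of earlier terms that did not make it into s
    for n in ns:
        term = f32(1.0 / f32(float(n) * float(n)))
        y = f32(term - c)          # correct the new term by the previously lost part
        t = f32(s + y)             # low-order bits of y may be lost here
        c = f32(f32(t - s) - y)    # recover (negated) what was lost
        s = t
    return s


N = 20000
print(f"Kahan, forward order: {kahan_sum_f32(range(1, N + 1)):.6f}")
```

The trade-off is four floating point operations per term instead of one, in exchange for an error that does not grow with \(N\) the way the plain forward sum's does.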

In the above, the sum done in the forward direction differs from the reverse-direction sum by \(1.644884-1.644725=1.59\times 10^{-4}\) (for \(N=20{,}000\) terms). In relative terms, this error is \(\frac{1.644884-1.644725}{1.644884}\times 100\approx 0.0097\%\), or about \(0.01\%\).

4.3.2  my solution, second problem

Investigate the problem of subtracting two numbers that are close in magnitude. If \(a\) and \(b\) are two numbers close to each other, then instead of computing \(a-b\) directly, compute \(\left ( a-b\right ) \frac{\left ( a+b\right ) }{\left ( a+b\right ) }=\frac{a^{2}-b^{2}}{a+b}\). The program for this problem attempts to illustrate this by comparing the result of \(a-b\) with that of \(\frac{a^{2}-b^{2}}{a+b}\) for two numbers close to each other.
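A sketch of that comparison, assuming Python doubles and using `fractions.Fraction` to get the exact difference of the two stored values (`rel_errors` is a hypothetical helper). A known property of binary floating point (the Sterbenz lemma) is relevant here: if \(b/2\le a\le 2b\), then \(a-b\) is computed exactly. So for two numbers that are already stored, the direct subtraction introduces no rounding error, while the rearranged form adds rounding in \(a^{2}\), \(b^{2}\) and \(a+b\). This may explain why suitable numbers were hard to find.

```python
from fractions import Fraction


def rel_errors(a: float, b: float) -> tuple[float, float]:
    """Relative errors of a-b and (a^2-b^2)/(a+b) against the exact a-b of the stored values."""
    exact = Fraction(a) - Fraction(b)           # Fraction(float) is exact
    naive = a - b
    rearranged = (a * a - b * b) / (a + b)
    err = lambda v: float(abs(Fraction(v) - exact) / abs(exact))
    return err(naive), err(rearranged)


naive_err, rearranged_err = rel_errors(1.0000001, 1.0)
print(f"a-b:             relative error {naive_err:.2e}")
print(f"(a^2-b^2)/(a+b): relative error {rearranged_err:.2e}")
```

Cancellation only exposes errors that are already present in \(a\) and \(b\), for example from an earlier rounded computation; it does not create new ones.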

I need to look into this further, as I have not yet found two numbers that show this problem.
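One case where the rearrangement does help is when \(a^{2}-b^{2}\) simplifies analytically, so the cancellation is removed before any rounding happens. For example, with \(a=\sqrt{x+1}\) and \(b=\sqrt{x}\), \(\frac{a^{2}-b^{2}}{a+b}=\frac{1}{\sqrt{x+1}+\sqrt{x}}\). A minimal sketch, assuming Python doubles and a 50-digit `decimal` computation as the reference (function names are hypothetical):

```python
import math
from decimal import Decimal, localcontext


def naive(x: float) -> float:
    """sqrt(x+1) - sqrt(x): subtracts two rounded, nearly equal square roots."""
    return math.sqrt(x + 1) - math.sqrt(x)


def rationalized(x: float) -> float:
    """Same quantity after multiplying by (a+b)/(a+b): ((x+1) - x)/(a+b) = 1/(a+b)."""
    return 1.0 / (math.sqrt(x + 1) + math.sqrt(x))


def reference(x: float) -> float:
    """High-precision value of sqrt(x+1) - sqrt(x) using 50 significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = 50
        d = Decimal(x)
        return float((d + 1).sqrt() - d.sqrt())


x = 1e12
ref = reference(x)
for f in (naive, rationalized):
    print(f"{f.__name__:>12}: {f(x)!r}  relative error {abs(f(x) - ref) / ref:.1e}")
```

Here the naive form loses several digits because each rounded square root carries an absolute error comparable to a noticeable fraction of the small difference, while the rationalized form involves no subtraction at all.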

4.3.3  key solution

PDF