This note compares the numerical derivative of \(\arctan \left ( x\right ) \) at \(x=\sqrt {2}\), computed with a forward-difference (first-order Taylor) approximation, in single-precision and double-precision floating point. This was done using Matlab. In Matlab, single-precision computation is done with the single function. By default, Matlab does all computations in double precision.
The approximation used is \(f^{\prime }\left ( x\right ) \approx \frac {1}{h}\left ( f\left ( x+h\right ) -f\left ( x\right ) \right ) \), with \(h\) starting at \(1\) and halved at each iteration.
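The formula comes from a first-order Taylor expansion of \(f\) about \(x\). Keeping the remainder term shows that, in exact arithmetic, the approximation error shrinks in proportion to \(h\). The symbols \(\xi\) (some point between \(x\) and \(x+h\)) and \(h_k\) (the step used at iteration \(k\)) are introduced here for clarity:

```latex
f(x+h) = f(x) + h\,f'(x) + \frac{h^{2}}{2}\,f''(\xi), \qquad \xi \in (x,\, x+h)
\;\Longrightarrow\;
\frac{f(x+h)-f(x)}{h} = f'(x) + \frac{h}{2}\,f''(\xi),
\qquad h_k = 2^{-(k-1)},\; k = 1, 2, \dots
```

Each iteration therefore roughly halves the truncation error, until rounding error in the numerator takes over.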
The exact value of \(\frac {d\arctan \left ( x\right ) }{dx}\) at \(x=\sqrt {2}\) is \(1/3\). The results below show that, in single precision, the numerical derivative keeps getting closer to the exact answer up to iteration 12, where the best result is accurate to 4 decimal places. After iteration 12, subtractive cancellation (loss of significance, L.O.S.) becomes dominant and the result becomes less accurate.
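For reference, the exact value follows from the standard derivative of \(\arctan\). The second derivative is also shown because it sets the size of the truncation error term:

```latex
\frac{d}{dx}\arctan x = \frac{1}{1+x^{2}}
\;\Longrightarrow\;
f'(\sqrt{2}) = \frac{1}{1+2} = \frac{1}{3} \approx 0.333333,
\qquad
f''(x) = -\frac{2x}{(1+x^{2})^{2}}
\;\Longrightarrow\;
f''(\sqrt{2}) = -\frac{2\sqrt{2}}{9} \approx -0.3143.
```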
In double precision, the iterations can continue up to iteration 27 before loss of significance sets in. The best numerical result at that point is accurate to 8 decimal places, so double precision gives about twice as many correct decimal digits as single precision.
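A standard rough error model, offered here as an explanation and not as part of the original experiment, accounts for both the turning point and the doubling of correct digits. It assumes that each computed value of \(f\) carries a relative rounding error of at most about \(\varepsilon\), the machine epsilon of the format (\(2^{-23}\approx 1.19\times 10^{-7}\) in single, \(2^{-52}\approx 2.22\times 10^{-16}\) in double). The truncation error shrinks with \(h\), while the rounding error in \(f(x+h)-f(x)\), once divided by \(h\), grows:

```latex
E(h) \approx \underbrace{\frac{h}{2}\,\lvert f''(x)\rvert}_{\text{truncation}}
      + \underbrace{\frac{2\,\varepsilon\,\lvert f(x)\rvert}{h}}_{\text{rounding}},
\qquad
h^{\ast} = 2\sqrt{\frac{\varepsilon\,\lvert f(x)\rvert}{\lvert f''(x)\rvert}},
\qquad
E(h^{\ast}) \approx 2\sqrt{\varepsilon\,\lvert f(x)\rvert\,\lvert f''(x)\rvert}.
```

Since \(E(h^{\ast})\) scales like \(\sqrt{\varepsilon}\), the best result keeps roughly half of the digits the format carries. This is why moving from single to double roughly doubles the number of correct decimal places. With \(f(\sqrt{2}) = \arctan\sqrt{2} \approx 0.9553\) and \(\lvert f''(\sqrt{2})\rvert = 2\sqrt{2}/9 \approx 0.314\), \(h^{\ast}\) comes out near \(10^{-3}\) in single precision and near \(5\times 10^{-8}\) in double. These values are in the same neighborhood as the observed turning points \(h = 2^{-11}\) (iteration 12) and \(h = 2^{-26}\) (iteration 27). The model is only good to a small constant factor, so it should not be expected to predict the exact iteration.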
The following diagram shows the results table for single precision, together with the Matlab code used. A red box marks the row where the numerical result starts to be affected by L.O.S.
The following diagram shows the results table for double precision, with a red box around the row where the numerical result starts to be affected by L.O.S. The Matlab code is the same as before, except that the single function is removed wherever it was used.
Source code listing
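% Nasser M. Abbasi. Do computation using 32 bit (single precision).
% Computing derivative of arctan(x) at x = sqrt(2) as a function
% of changing h in Taylor approximation.
h = single(1);
M = 26;                    % number of iterations; for the double-precision
                           % run M must be at least 27 to reach iteration 27
X = single(sqrt(2));
f = @(x) single(atan(x));
F1 = f(X);                 % f(x), the same for every iteration
% Columns of S: k, h, f(x+h), f(x), f(x+h)-f(x), derivative estimate
S = zeros(M,6,'single');   % one row per iteration
for k = 1:M
    F2 = f(X+h);           % f(x+h)
    d = single(F2-F1);     % numerator, where subtractive cancellation occurs
    r = single(d/h);       % forward-difference estimate of f'(x)
    S(k,1) = k;
    S(k,2) = h;
    S(k,3) = F2;
    S(k,4) = F1;
    S(k,5) = d;
    S(k,6) = r;
    h = single(h/2);       % halve h for the next iteration
end
format long g
S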
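The sweep can also be repeated for both formats in one script, with the best iteration picked out automatically by comparing each estimate against the exact value \(1/3\). The following is a sketch under those assumptions and not the code used to produce the tables. It uses cast to switch between single and double instead of editing the code by hand. The names classes, bestK and bestErr are illustrative, and M = 40 is an arbitrary choice large enough to go past iteration 27.

```matlab
% Sketch: forward-difference sweep for d/dx atan(x) at x = sqrt(2),
% run in single and double precision, reporting the iteration whose
% estimate is closest to the exact value 1/3.
% The names classes, M, bestK and bestErr are illustrative choices.
classes = {'single', 'double'};
M = 40;                                   % enough iterations to pass iteration 27
bestK = zeros(1, numel(classes));         % best iteration for each format
bestErr = zeros(1, numel(classes));       % smallest error for each format
for c = 1:numel(classes)
    cls = classes{c};
    X = cast(sqrt(2), cls);               % x = sqrt(2) rounded to the target format
    h = cast(1, cls);                     % h_1 = 1
    F1 = atan(X);                         % f(x); atan keeps the input's class
    err = zeros(M, 1);                    % error of each estimate, stored in double
    for k = 1:M
        r = (atan(X + h) - F1) / h;       % forward-difference estimate of f'(x)
        err(k) = abs(double(r) - 1/3);    % compare with the exact derivative
        h = h / 2;                        % halve h for the next iteration
    end
    [bestErr(c), bestK(c)] = min(err);    % smallest error and its iteration
    fprintf('%-6s best iteration k = %2d, h = 2^-%d, error = %.3g\n', ...
            cls, bestK(c), bestK(c) - 1, bestErr(c));
end
```

No expected numbers are shown, because the exact iteration reported depends on rounding details. The single-precision minimum should still occur at a larger \(h\) (an earlier iteration) and with a larger error than the double-precision one.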