2.2 HW 2
2.2.1 Problem
question: Consider the solution of \(A\mathbf {\vec {x}}=\mathbf {\vec {b}}+\mathbf {\vec {n}}\) where \(A\) is \(m\times n\) matrix, \(\mathbf {\vec {b}}\epsilon R^{m},\mathbf {\vec {x}}\epsilon R^{n}\) and \(\mathbf {\vec {n}}\) is vector of i.i.d. Gaussian \(N\left ( 0,\sigma ^{2}\mathbf {I}_{m}\right ) \) noise vector.
(i.e. noise is white Gaussian noise). Determine the best solution \(\vec {x}\)
Answer
Let us refer to the observed output (which includes the noise \(\mathbf {\vec {n}}\)) as \(\mathbf {\vec {z}}\), hence we write \(\mathbf {\vec {z}}=\mathbf {\vec {b}}+\mathbf {\vec {n}}\)
where \(\mathbf {\vec {b}}\) is the uncontaminated output (what the observed output would be if there is no
noise).
Since the noise \(\mathbf {\vec {n}}\) is an additive noise to the output \(\mathbf {\vec {b}}\) of the system, the since the noise has zero
mean, then the mean of \(\mathbf {\vec {z}}\) will be the same as the mean of \(\mathbf {\vec {b}}\). But \(\mathbf {\vec {b}}\) is a deterministic signal which does
not change, hence its mean is its value, hence the mean of \(\mathbf {\vec {z}}\,\ \)is \(\mathbf {\vec {b}\,.}\)
Now, \(\mathbf {\vec {z}}\) is described by a probability density function PDF as follows ( \(\mathbf {\vec {z}}\) is in \(R^{m}\), hence it is \(m\) long
vector)
\begin{align*} \Pr \left ( \mathbf {\vec {z};}\left \{ \mu \left ( \mathbf {\vec {z}}\right ) ,\sigma ^{2}\mathbf {I}_{m}\right \} \right ) & =\frac {1}{\left ( 2\pi \sigma ^{2}\right ) ^{\frac {m}{2}}}\exp \left ( -\frac {1}{2\sigma ^{2}}\left \Vert \mathbf {\vec {z}}-\mu \left ( \mathbf {\vec {z}}\right ) \right \Vert ^{2}\right ) \\ \Pr \left ( \mathbf {\vec {z};}\left \{ \mathbf {\vec {b},}\sigma ^{2}\mathbf {I}_{m}\right \} \right ) & =\frac {1}{\left ( 2\pi \sigma ^{2}\right ) ^{\frac {m}{2}}}\exp \left ( -\frac {1}{2\sigma ^{2}}\left \Vert \mathbf {\vec {z}}-\mathbf {\vec {b}}\right \Vert ^{2}\right ) \end{align*}
But since \(A\mathbf {\vec {x}}=\mathbf {\vec {b}}\), then the above can be written as
\[ \Pr \left ( \mathbf {\vec {z};}\left \{ A\mathbf {\vec {x},}\sigma ^{2}\mathbf {I}_{m}\right \} \right ) =\frac {1}{\left ( 2\pi \sigma ^{2}\right ) ^{\frac {m}{2}}}\exp \left ( -\frac {1}{2\sigma ^{2}}\left \Vert \mathbf {\vec {z}}-A\mathbf {\vec {x}}\right \Vert ^{2}\right ) \]
Since \(A\) is a constant matrix (system is assumed to be time-invariant), hence from the above we see
that the expression gives the probability of observing \(\mathbf {\vec {z}}\) for a given \(\mathbf {\vec {x}}\). Hence the best
estimate of \(\mathbf {\vec {x}}\) would be the one which maximizes this probability. Instead of maximizing the
PDF directly, we maximize its natural logarithm (a mathematical convenience trick, no
more).
Now find the natural logarithm of the above quantity, and find where the result is maximum.
\begin{align} \frac {\partial }{\partial \mathbf {\vec {x}}}\ln \Pr \left ( \mathbf {\vec {z};}\left \{ A\mathbf {\vec {x},}\sigma ^{2}\mathbf {I}_{m}\right \} \right ) & =\frac {\partial }{\partial \mathbf {\vec {x}}}\left ( \ln \frac {1}{\left ( 2\pi \sigma ^{2}\right ) ^{\frac {m}{2}}}\exp \left ( -\frac {1}{2\sigma ^{2}}\left \Vert \mathbf {\vec {z}}-A\mathbf {\vec {x}}\right \Vert ^{2}\right ) \right ) \nonumber \\ & =\nabla _{\mathbf {\vec {x}}}\left ( \ln \frac {1}{\left ( 2\pi \sigma ^{2}\right ) ^{\frac {m}{2}}}+\ln \exp \left ( -\frac {1}{2\sigma ^{2}}\left \Vert \mathbf {\vec {z}}-A\mathbf {\vec {x}}\right \Vert ^{2}\right ) \right ) \nonumber \\ & =\overset {0}{\overbrace {\nabla _{\mathbf {\vec {x}}}\left ( \ln \frac {1}{\left ( 2\pi \sigma ^{2}\right ) ^{\frac {m}{2}}}\right ) }}+\nabla _{\mathbf {\vec {x}}}\left ( -\frac {1}{2\sigma ^{2}}\left \Vert \mathbf {\vec {z}}-A\mathbf {\vec {x}}\right \Vert ^{2}\right ) \nonumber \\ & =-\frac {1}{2\sigma ^{2}}\nabla _{\mathbf {\vec {x}}}\left ( \left \Vert \mathbf {\vec {z}}-A\mathbf {\vec {x}}\right \Vert ^{2}\right ) \tag {1}\end{align}
Start with \(m=n=1\), hence the above becomes
\begin{align*} \frac {\partial }{\partial \mathbf {\vec {x}}}\ln \Pr \left ( \mathbf {\vec {z};}\left \{ A\mathbf {\vec {x},}\sigma ^{2}\mathbf {I}_{m}\right \} \right ) & =-\frac {1}{2\sigma ^{2}}\nabla _{x_{1}}\left ( z_{1}^{2}+a_{11}^{2}x_{1}^{2}-2z_{1}a_{11}x_{1}\right ) \\ & =-\frac {1}{2\sigma ^{2}}\left ( 2a_{11}^{2}x_{1}-2a_{11}z_{1}\right ) \\ & =-\frac {1}{\sigma ^{2}}\left ( a_{11}^{2}x_{1}-a_{11}z_{1}\right ) \end{align*}
Set the above to zero and solve for \(x_{1}\)
\begin{align*} 0 & =\left ( a_{11}x_{1}-z_{1}\right ) \\ x_{1} & =\frac {z_{1}}{a_{11}}\end{align*}
Hence
\[ \fbox {$x_1=\frac {z_1}{a_11}$}\]
This matches the least squares solution \(x_{1}=\left ( A^{T}A\right ) ^{-1}A^{T}z_{1}\rightarrow x_{1}=\left ( a_{11}a_{11}\right ) ^{-1}a_{11}z_{1}=\frac {z_{1}}{a_{11}}\)
Now I need to do this for \(m=n=2,\) and assuming that \(var\left ( z_{1}\right ) =var\left ( z_{1}\right ) \). We can start from equation (1) above, shown
again below
\begin{align*} \frac {\partial }{\partial \mathbf {\vec {x}}}\ln \Pr \left ( \mathbf {\vec {z};}\left \{ A\mathbf {\vec {x},}\sigma ^{2}\mathbf {I}_{m}\right \} \right ) & =-\frac {1}{2\sigma ^{2}}\nabla _{\mathbf {\vec {x}}}\left ( \left \Vert \mathbf {\vec {z}}-A\mathbf {\vec {x}}\right \Vert ^{2}\right ) \\ -2\sigma ^{2}\frac {\partial }{\partial \mathbf {\vec {x}}}\ln \Pr \left ( \mathbf {\vec {z};}\left \{ A\mathbf {\vec {x},}\sigma ^{2}\mathbf {I}_{m}\right \} \right ) & =\nabla _{\mathbf {\vec {x}}}\left ( \left \Vert \mathbf {\vec {z}}-A\mathbf {\vec {x}}\right \Vert ^{2}\right ) \\ & =\nabla _{\mathbf {\vec {x}}}\left ( \left \Vert \begin {pmatrix} z_{1}\\ z_{2}\end {pmatrix} -\begin {pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end {pmatrix}\begin {pmatrix} x_{1}\\ x_{2}\end {pmatrix} \right \Vert _{2}^{2}\right ) \\ & =\nabla _{\mathbf {\vec {x}}}\left ( \left \Vert \begin {pmatrix} z_{1}\\ z_{2}\end {pmatrix} -\begin {pmatrix} a_{11}x_{1}+a_{12}x_{2}\\ a_{21}x_{1}+a_{22}x_{2}\end {pmatrix} \right \Vert _{2}^{2}\right ) \\ & =\nabla _{\mathbf {\vec {x}}}\left ( \left \Vert \begin {pmatrix} z_{1}-\left ( a_{11}x_{1}+a_{12}x_{2}\right ) \\ z_{2}-\left ( a_{21}x_{1}+a_{22}x_{2}\right ) \end {pmatrix} \right \Vert _{2}^{2}\right ) \\ & =\nabla _{\mathbf {\vec {x}}}\left ( \left [ z_{1}-\left ( a_{11}x_{1}+a_{12}x_{2}\right ) \right ] ^{2}+\left [ z_{2}-\left ( a_{21}x_{1}+a_{22}x_{2}\right ) \right ] ^{2}\right ) \end{align*}
Let
\begin{align*} f\left ( x_{1},x_{2}\right ) & =\left [ z_{1}-\left ( a_{11}x_{1}+a_{12}x_{2}\right ) \right ] ^{2}+\left [ z_{2}-\left ( a_{21}x_{1}+a_{22}x_{2}\right ) \right ] ^{2}\\ & =\left [ z_{1}^{2}+\left ( a_{11}x_{1}+a_{12}x_{2}\right ) ^{2}-2z_{1}\left ( a_{11}x_{1}+a_{12}x_{2}\right ) \right ] +\left [ z_{2}^{2}+\left ( a_{21}x_{1}+a_{22}x_{2}\right ) ^{2}-2z_{2}\left ( a_{21}x_{1}+a_{22}x_{2}\right ) \right ] \\ & =\left [ z_{1}^{2}+a_{11}^{2}x_{1}^{2}+a_{12}^{2}x_{2}^{2}+2a_{11}a_{12}x_{1}x_{2}-2z_{1}a_{11}x_{1}-2z_{1}a_{12}x_{2}\right ] +\\ & \left [ z_{2}^{2}+a_{21}^{2}x_{1}^{2}+a_{22}^{2}x_{2}^{2}+2a_{21}a_{22}x_{1}x_{2}-2z_{2}a_{21}x_{1}-2z_{2}a_{22}x_{2}\right ] \end{align*}
Then
\[ \nabla _{\mathbf {\vec {x}}}\left ( f\left ( x_{1},x_{2}\right ) \right ) =\begin {pmatrix} \frac {\partial f\left ( x_{1},x_{2}\right ) }{\partial x_{1}}\\ \frac {\partial f\left ( x_{1},x_{2}\right ) }{\partial x_{2}}\end {pmatrix} \]
so
\begin{align*} \frac {\partial f\left ( x_{1},x_{2}\right ) }{\partial x_{1}} & =\left [ 0+2a_{11}^{2}x_{1}+0+2a_{11}a_{12}x_{2}-2z_{1}a_{11}-0\right ] +\left [ 0+2a_{21}^{2}x_{1}+0+2a_{21}a_{22}x_{2}-2z_{2}a_{21}-0\right ] \\ & =2a_{11}^{2}x_{1}+2a_{11}a_{12}x_{2}-2z_{1}a_{11}+2a_{21}^{2}x_{1}+2a_{21}a_{22}x_{2}-2z_{2}a_{21}\\ & =x_{1}\left ( 2a_{11}^{2}+2a_{21}^{2}\right ) +x_{2}\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) -2z_{1}a_{11}-2z_{2}a_{21}\end{align*}
and
\begin{align*} \frac {\partial f\left ( x_{1},x_{2}\right ) }{\partial x_{2}} & =\left [ 0+0+2a_{12}^{2}x_{2}+2a_{11}a_{12}x_{1}-0-2z_{1}a_{12}\right ] +\left [ 0+0+2a_{22}^{2}x_{2}+2a_{21}a_{22}x_{1}-0-2z_{2}a_{22}\right ] \\ & =2a_{12}^{2}x_{2}+2a_{11}a_{12}x_{1}-2z_{1}a_{12}+2a_{22}^{2}x_{2}+2a_{21}a_{22}x_{1}-2z_{2}a_{22}\\ & =x_{1}\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) +x_{2}\left ( 2a_{12}^{2}+2a_{22}^{2}\right ) -2z_{1}a_{12}-2z_{2}a_{22}\end{align*}
Hence we obtain that
\begin{align*} \frac {\partial }{\partial \mathbf {\vec {x}}}\ln \Pr \left ( \mathbf {\vec {z};}\left \{ A\mathbf {\vec {x},}\sigma ^{2}\mathbf {I}_{m}\right \} \right ) & =-\frac {1}{2\sigma ^{2}}\nabla _{\mathbf {\vec {x}}}\left ( f\left ( x_{1},x_{2}\right ) \right ) \\ & =-\frac {1}{2\sigma ^{2}}\begin {pmatrix} x_{1}\left ( 2a_{11}^{2}+2a_{21}^{2}\right ) +x_{2}\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) -2z_{1}a_{11}-2z_{2}a_{21}\\ x_{1}\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) +x_{2}\left ( 2a_{12}^{2}+2a_{22}^{2}\right ) -2z_{1}a_{12}-2z_{2}a_{22}\end {pmatrix} =\begin {pmatrix} 0\\ 0 \end {pmatrix} \end{align*}
Hence
\[ x_{1}\left ( 2a_{11}^{2}+2a_{21}^{2}\right ) +x_{2}\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) -2z_{1}a_{11}-2z_{2}a_{21}=0 \]
and
\[ x_{1}\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) +x_{2}\left ( 2a_{12}^{2}+2a_{22}^{2}\right ) -2z_{1}a_{12}-2z_{2}a_{22}=0 \]
so, solve for \(x_{1},x_{2}\). Write as \(Ax=b\) and solve:
\[\begin {pmatrix} \left ( 2a_{11}^{2}+2a_{21}^{2}\right ) & \left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) \\ \left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) & \left ( 2a_{12}^{2}+2a_{22}^{2}\right ) \end {pmatrix}\begin {pmatrix} x_{1}\\ x_{2}\end {pmatrix} =\begin {pmatrix} 2z_{1}a_{11}+2z_{2}a_{21}\\ 2z_{1}a_{12}+2z_{2}a_{22}\end {pmatrix} \]
Solve for
\begin{align*}\begin {pmatrix} x_{1}\\ x_{2}\end {pmatrix} & =\begin {pmatrix} \left ( 2a_{11}^{2}+2a_{21}^{2}\right ) & \left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) \\ \left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) & \left ( 2a_{12}^{2}+2a_{22}^{2}\right ) \end {pmatrix} ^{-1}\begin {pmatrix} 2z_{1}a_{11}+2z_{2}a_{21}\\ 2z_{1}a_{12}+2z_{2}a_{22}\end {pmatrix} \\ & =\frac {\begin {pmatrix} \left ( 2a_{12}^{2}+2a_{22}^{2}\right ) & -\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) \\ -\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) & \left ( 2a_{11}^{2}+2a_{21}^{2}\right ) \end {pmatrix}\begin {pmatrix} 2z_{1}a_{11}+2z_{2}a_{21}\\ 2z_{1}a_{12}+2z_{2}a_{22}\end {pmatrix} }{\left ( 2a_{11}^{2}+2a_{21}^{2}\right ) \left ( 2a_{12}^{2}+2a_{22}^{2}\right ) -\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) \left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) }\\ & \\ & =\frac {\begin {pmatrix} \left ( 2a_{12}^{2}+2a_{22}^{2}\right ) \left ( 2z_{1}a_{11}+2z_{2}a_{21}\right ) -\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) \left ( 2z_{1}a_{12}+2z_{2}a_{22}\right ) \\ -\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) \left ( 2z_{1}a_{11}+2z_{2}a_{21}\right ) +\left ( 2a_{11}^{2}+2a_{21}^{2}\right ) \left ( 2z_{1}a_{12}+2z_{2}a_{22}\right ) \end {pmatrix} }{\left ( 2a_{11}^{2}+2a_{21}^{2}\right ) \left ( 2a_{12}^{2}+2a_{22}^{2}\right ) -\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) \left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) }\\ & \\ & =\frac {\begin {pmatrix} 4a_{11}z_{1}a_{22}^{2}+4a_{12}^{2}a_{21}z_{2}-4a_{11}a_{12}z_{2}a_{22}-4z_{1}\allowbreak a_{12}a_{21}a_{22}\\ 4z_{1}a_{12}a_{21}^{2}+4a_{11}^{2}z_{2}a_{22}-4a_{11}z_{1}a_{21}a_{22}-4a_{11}\allowbreak a_{12}a_{21}z_{2}\end {pmatrix} }{4a_{11}^{2}a_{22}^{2}-8a_{11}a_{12}a_{21}a_{22}+4a_{12}^{2}a_{21}^{2}}\\ & \\ & =\begin {pmatrix} \frac {4a_{11}z_{1}a_{22}^{2}+4a_{12}^{2}a_{21}z_{2}-4a_{11}a_{12}z_{2}a_{22}-4z_{1}\allowbreak a_{12}a_{21}a_{22}}{4a_{11}^{2}a_{22}^{2}-8a_{11}a_{12}a_{21}a_{22}+4a_{12}^{2}a_{21}^{2}}\\ \frac {4z_{1}a_{12}a_{21}^{2}+4a_{11}^{2}z_{2}a_{22}-4a_{11}z_{1}a_{21}a_{22}-4a_{11}\allowbreak a_{12}a_{21}z_{2}}{4a_{11}^{2}a_{22}^{2}-8a_{11}a_{12}a_{21}a_{22}+4a_{12}^{2}a_{21}^{2}}\end {pmatrix} \end{align*}
so
\begin{align*} x_{1} & =\frac {4a_{11}z_{1}a_{22}^{2}+4a_{12}^{2}a_{21}z_{2}-4a_{11}a_{12}z_{2}a_{22}-4z_{1}\allowbreak a_{12}a_{21}a_{22}}{4a_{11}^{2}a_{22}^{2}-8a_{11}a_{12}a_{21}a_{22}+4a_{12}^{2}a_{21}^{2}}\\ & =\left ( \frac {z_{1}a_{22}-a_{12}z_{2}}{a_{11}a_{22}-a_{12}a_{21}}\right ) \end{align*}
and
\begin{align*} x_{2} & =\frac {4z_{1}a_{12}a_{21}^{2}+4a_{11}^{2}z_{2}a_{22}-4a_{11}z_{1}a_{21}a_{22}-4a_{11}\allowbreak a_{12}a_{21}z_{2}}{4a_{11}^{2}a_{22}^{2}-8a_{11}a_{12}a_{21}a_{22}+4a_{12}^{2}a_{21}^{2}}\\ & =\left ( \frac {a_{11}z_{2}-z_{1}a_{21}}{a_{11}a_{22}-a_{12}a_{21}}\right ) \end{align*}
Hence
\begin{equation}\begin {pmatrix} x_{1}\\ x_{2}\end {pmatrix} =\begin {pmatrix} \left ( \frac {z_{1}a_{22}-a_{12}z_{2}}{a_{11}a_{22}-a_{12}a_{21}}\right ) \\ \left ( \frac {a_{11}z_{2}-z_{1}a_{21}}{a_{11}a_{22}-a_{12}a_{21}}\right ) \end {pmatrix} \tag {2}\end{equation}
is the least squares error. To validate
\begin{align}\begin {pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end {pmatrix}\begin {pmatrix} x_{1}\\ x_{2}\end {pmatrix} & =\begin {pmatrix} z_{1}\\ z_{2}\end {pmatrix} \nonumber \\\begin {pmatrix} x_{1}\\ x_{2}\end {pmatrix} & =\left ( \begin {pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end {pmatrix} ^{T}\begin {pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end {pmatrix} \right ) ^{-1}\begin {pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end {pmatrix} ^{T}\begin {pmatrix} z_{1}\\ z_{2}\end {pmatrix} \nonumber \\ & =\left ( \begin {pmatrix} a_{11} & a_{21}\\ a_{12} & a_{22}\end {pmatrix}\begin {pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end {pmatrix} \right ) ^{-1}\begin {pmatrix} a_{11} & a_{21}\\ a_{12} & a_{22}\end {pmatrix}\begin {pmatrix} z_{1}\\ z_{2}\end {pmatrix} \nonumber \\ & =\begin {pmatrix} a_{11}^{2}+a_{21}^{2} & a_{11}a_{12}+a_{21}a_{22}\\ a_{11}a_{12}+a_{21}a_{22} & a_{12}^{2}+a_{22}^{2}\end {pmatrix} ^{-1}\begin {pmatrix} a_{11}z_{1}+a_{21}z_{2}\\ z_{1}a_{12}+z_{2}a_{22}\end {pmatrix} \nonumber \\ & =\begin {pmatrix} \frac {\left ( a_{12}^{2}+a_{22}^{2}\right ) }{\left ( a_{11}a_{22}-a_{12}a_{21}\right ) ^{2}} & -\frac {a_{11}a_{12}+a_{21}a_{22}}{\left ( a_{11}a_{22}-a_{12}a_{21}\right ) ^{2}}\\ -\frac {a_{11}a_{12}+a_{21}a_{22}}{\left ( a_{11}a_{22}-a_{12}a_{21}\right ) ^{2}} & \frac {\left ( a_{11}^{2}+a_{21}^{2}\right ) }{\left ( a_{11}a_{22}-a_{12}a_{21}\right ) ^{2}}\end {pmatrix}\begin {pmatrix} a_{11}z_{1}+a_{21}z_{2}\\ z_{1}a_{12}+z_{2}a_{22}\end {pmatrix} \nonumber \\ & =\begin {pmatrix} \frac {\left ( z_{1}a_{22}-a_{12}z_{2}\right ) }{a_{11}a_{22}-a_{12}a_{21}}\\ \frac {a_{11}z_{2}-z_{1}a_{21}}{a_{11}a_{22}-a_{12}a_{21}}\end {pmatrix} \tag {3}\end{align}
Compare equations (2) and (3) above, they are the same. OK, confirmed.
2.2.2 appendix