Chapter 2
Class notes

 2.1 Lecture 1. Tuesday, January 19, 2016
 2.2 Lecture 2. Thursday, January 21, 2016
 2.3 Lecture 3. Tuesday, January 26, 2016
 2.4 Lecture 4. Thursday, January 28, 2016
 2.5 Lecture 5. Tuesday, February 2, 2016
 2.6 Lecture 6. Thursday, February 4, 2016
 2.7 Lecture 7. Tuesday, February 9, 2016
 2.8 Lecture 8. Thursday, February 11, 2016
 2.9 Lecture 9. Tuesday, February 16, 2016
 2.10 Lecture 10. Thursday, February 18, 2016 (Exam 1)
 2.11 Lecture 11. Tuesday, February 23, 2016
 2.12 Lecture 12. Thursday, February 25, 2016
 2.13 Lecture 13. Tuesday, March 1, 2016
 2.14 Lecture 14. Thursday, March 3, 2016
 2.15 Lecture 15. Tuesday, March 8, 2016
 2.16 Lecture 16. Thursday, March 10, 2016
 2.17 Lecture 17. Tuesday, March 15, 2016
 2.18 Lecture 18. Thursday, March 17, 2016
 2.19 Lecture 19. Tuesday, March 22, 2016 (No class)
 2.20 Lecture 20. Thursday, March 24, 2016 (No class)
 2.21 Lecture 21. Tuesday, March 29, 2016
 2.22 Lecture 22. Thursday, March 31, 2016. Second midterm exam
 2.23 Lecture 23. Tuesday, April 5, 2016
 2.24 Lecture 24. Thursday, April 7, 2016 (No class)
 2.25 Lecture 25. Tuesday, April 12, 2016
 2.26 Lecture 26. Thursday, April 14, 2016
 2.27 Lecture 27. Tuesday, April 19, 2016
 2.28 Lecture 28. Thursday, April 21, 2016
 2.29 Lecture 29. Tuesday, April 26, 2016
 2.30 Lecture 30. Thursday, April 28, 2016
 2.31 Lecture 31. Tuesday, May 3, 2016
 2.32 Lecture 32. Thursday, May 5, 2016

2.1 Lecture 1. Tuesday, January 19, 2016

This course is on finite-dimensional optimization, meaning problems with a finite number of parameters. We then went over the syllabus. Here is a summary:

1. Convex sets and functions.
2. How to certify your solution?
3. Linear programming.
4. Dynamic programming (at end of course).
5. Three exams and a final special project/problem.

Homework will be graded on an E/S/U scale; most will get S, and few will get E. Matlab will be used.

Cardinal rule Develop an answer using only material given in class. Other basic tools, such as the Laplace transform, may also be used.

A wise person once said: “Fundamental difficulties are invariant under reformulation”.

2.1.1 Objective functions, constraints and variables

In a problem, identify the objective function, constraints and variables. Optimization problems from different fields can be formulated into a common framework for solving using optimization methods.

Ingredients in this case: Set \(U\subseteq \Re ^{n}\), the constraint set. The problem has \(n\) parameters (the decision variables), so \(u\in U\). A one-dimensional problem has \(n=1\); an example is to find an optimal resistor value with \(U=[100,200]\) Ohms. For \(n=2\), an example is to find two resistors \(u= \begin{bmatrix} u_{1}\\ u_{2} \end{bmatrix} \) with \(100\leq u_{1}\leq 200\) and \(300\leq u_{2}\leq 400\).

Reader Often we describe \(U\) graphically in \(\Re ^{n}\). Typically for only \(n=1,2,3\). Do this for the above example.

Figure 2.1: Set \(U\) with constraints

Reader design a bandpass filter with passband from \(\omega _{1}\) to \(\omega _{2}\) with \(\omega _{1},\omega _{2}>0\). Describe graphically the set \(U\).

Figure 2.2: Lecture one, set \(U\) second diagram

Reader Often \(U\) is a ball in \(\Re ^{n}\) described by \({\displaystyle \sum \limits _{i=1}^{n}} \left ( u_{i}-u_{i,0}\right ) ^{2}\leq R^{2}\), where the \(u_{i,0}\) are the coordinates of the center and \(R\) is the radius.

Reader Suppose we are designing an input voltage \(u\left ( t\right ) \) on \(t\in \left [ 0,1\right ] \) such that \(\int _{0}^{1}u^{2}\left ( t\right ) dt\leq 5\). This does not fit into the above framework; it is a function-space (infinite-dimensional) problem.

An important class of \(U\): the generalization of a rectangle in 2D to a hypercube in \(\Re ^{n}\), with \(\left \vert u_{i}-u_{i,0}\right \vert \leq r_{i}\) for \(i=1\cdots n\).

Reader For \(n=3\) and \(u_{0}=\begin{bmatrix} u_{1,0}\\ u_{2,0}\\ u_{3,0}\end{bmatrix} =\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix} \) and \(r_{1}=1,r_{2}=2,r_{3}=3\), then sketch \(U\).

2.1.2 Constrained and Unconstrained problems

An unconstrained problem is one where \(U=\Re ^{n}\); there are no constraints on \(u\). These are easier to solve than constrained problems, though in practice there will always be constraints. \(U\) is sometimes called the set of decision variables, or the input set. The other ingredient is the objective function \(J\left ( u\right ) \), more formally \(J:\Re ^{n}\rightarrow \Re \). The objective function \(J\left ( u\right ) \) is obtained from your goals.

Example we want to go from \(A\) to \(B\) in the least time. Often the selection of \(J\left ( u\right ) \) is not straightforward; it can be difficult in areas such as social or environmental science.

In stock markets, \(J(u)\) is given and we just use it. An example in \(\Re ^{1}\) is \(J(u) = u^{2}+6u+e^{-u}\). Often we test an algorithm first in \(\Re ^{1}\) or \(\Re ^{2}\) before going to higher dimensions. In \(\Re ^{4}\) an example is \[ J(u) =u_1 u_2-3 u_1 u_3 u_4+6u_2 u_3-6 u_1-4 u_2-u_3 u_4+12 \] The corresponding hypercube constraint set has \(2^{4}=16\) vertices.

Reader Can you find max \(J(u)\) subject to \(U\) described by \(\left \vert u_{i}\right \vert \leq i\) using common sense?

Answer It will be at a vertex of the hypercube.

In VLSI, with hundreds of resistors, there are \(2^{n}\) vertices, so the problem very quickly becomes computationally hard to solve. To prove that the optimal value is at a vertex, the idea is to freeze all variables except one at a time; \(J\) is then linear in the free variable, so the optimum is at an endpoint of that line segment.

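A quick brute-force check of this in Matlab (my addition, not from the lecture): evaluate \(J\) at all \(2^{4}=16\) vertices of the box \(\left \vert u_{i}\right \vert \leq i\) and take the maximum.

J = @(u) u(1)*u(2) - 3*u(1)*u(3)*u(4) + 6*u(2)*u(3) ... 
    - 6*u(1) - 4*u(2) - u(3)*u(4) + 12;   % the multilinear J above 
r = 1:4;                                  % bounds |u_i| <= i 
best = -inf; 
for i = 0:15 
    s = 2*(dec2bin(i,4) - '0') - 1;       % one of the 16 sign patterns 
    best = max(best, J(s .* r));          % evaluate J at this vertex 
end 
best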

The main case problem: Find \(u^{\ast }\) that minimizes \(J\left ( u\right ) \) subject to the constraint \(u\in U\). \(u^{\ast }\) might not exist. When it exists, \[ J\left ( u^{\ast }\right ) =J^{\ast }=\min _{u\in U}J\left ( u\right ) \] When we begin, we do not know if \(u^{\ast }\) exists, so we write \[ J^{\ast } =\inf _{u\in U}J\left ( u\right ) \] Example \(U=\Re \), \(J\left ( u\right ) =e^{-u}\): here \(u^{\ast }\) does not exist. We do not allow \(u^{\ast }\) to take the values \(\pm \infty \), but we allow \(J^{\ast } \) itself to become \(\pm \infty \). For example, if \(J\left ( u\right ) =u^{2}\) then \(u^{\ast }=0\).

2.2 Lecture 2. Thursday, January 21, 2016

2.2.1 Existence of optimal solution, explicit and implicit \(J(u)\)

Tractability. \(U\subseteq \Re ^{n}\), the decision variables. \(J:\Re ^{n}\rightarrow \Re \). Problem: find \(u^{\ast }\) such that \(J\left ( u^{\ast }\right ) =\inf _{u\in U}J\left ( u\right ) \).

\(u^{\ast }\), when it exists, is called the optimal element. We do not allow \(u^{\ast }=\pm \infty \), but allow \(J\left ( u^{\ast }\right ) =\pm \infty \). For example, \(J\left ( u\right ) =\frac{1}{u}\) on \(U=\left ( 0,\infty \right ) \). We say \(\inf _{u\in U}J\left ( u\right ) =J^{\ast }=0\), but the optimal element does not exist (it would require \(u^{\ast }=\infty \)). Hence \(J^{\ast }\) is a limiting infimum value. Another example is \(J\left ( u\right ) =-u\) on \(\Re \). Here \(J^{\ast }=-\infty \) but \(u^{\ast }\) does not exist.

Reader: We can consider \(\max _{u\in U}J\left ( u\right ) \equiv \sup _{u\in U}J\left ( u\right ) \). Note that \[ \max _{u\in U}J\left ( u\right ) =-\min _{u\in U}\left ( -J\left ( u\right ) \right ) \] so maximization reduces to minimization. Of course, \(u_{\min }^{\ast }\neq u_{\max }^{\ast }\) in general.

Some problems are tractable and some are not. Some problems have no known algorithms, and some have algorithms that are not certified. Some problems have algorithms but are still not tractable. NP-hard problems can be either tractable or not tractable in practice. What about stochastic problems? If \(J\) depends on a random variable, \(J\left ( u\right ) \) is itself random, so we cannot work with it directly. Instead we work with \(\tilde{J}\left ( u\right ) =E\left ( J\left ( u\right ) \right ) \), where \(E\) is the expectation. Now \(\tilde{J}\left ( u\right ) \) fits the framework we used earlier.

We also need to distinguish between explicit and implicit \(J\left ( u\right ) \). When we write \(J\left ( u\right ) =u_{1}^{2}+e^{u_{2}}\cos u_{1}+u_{2}\), then \(J\left ( u\right ) \) is explicit. But if we have a circuit as below, where \(J\left ( u\right ) =V_{out}\), then \(J\left ( u\right ) \) is implicit, since we have to solve the complicated RLC circuit to find it; we implicitly assume \(J\left (u\right ) \) exists.

Figure 2.3: Complicated RLC

2.2.2 Farming problem

First detailed example Optimal farming example. Let \(y\left ( k\right ) \) be the annual crop value where \(k\) is the year. This is the end of year value of the crop. At end of year, we use some of this to invest and the rest we keep as profit. Let \(u\left ( k\right ) \in \left [ 0,1\right ] \) be the fraction of \(y\left ( k\right ) \) invested back. Therefore \(\left ( 1-u\left ( k\right ) \right ) \) is the fraction of \(y\left ( k\right ) \) which is taken out as profit. Dynamics of the problem are: \(y\left ( k+1\right ) =y\left ( k\right ) \) if we invest nothing (i.e. \(u=0\)). But if we invest, then \[ y\left ( k+1\right ) =y\left ( k\right ) +\omega \left ( k\right ) \overset{\text{amount invested}}{\overbrace{u\left ( k\right ) y\left ( k\right ) }}\] Where \(\omega \left ( k\right ) \) is an independent and identically distributed (i.i.d.) random variable which depends on the weather and other variables. Let \(E\left ( \omega \left ( k\right ) \right ) =\varpi \). We have an \(N\)-year planning horizon. What about \(J\left ( u\right ) \)? If we model things as a convex problem, we can solve it; if we do not, it becomes hard to solve.\begin{equation} J\left ( u\right ) =E\left ( y\left ( N\right ) +{\displaystyle \sum \limits _{k=0}^{N-1}} y\left ( k\right ) \left ( 1-u\left ( k\right ) \right ) \right ) \tag{1} \end{equation}

The decision variables here are \(\left \{ u\left ( 0\right ) ,u\left ( 1\right ) ,\cdots ,u\left ( N-1\right ) \right \} \). In this example \(J\left ( u\right ) \) is implicit; we need to make it explicit. Let us calculate \(J\left ( u\right ) \) for \(N=2\):\begin{align*} y_{1} & =y_{0}+\omega _{0}u_{0}y_{0}\\ & =y_{0}\left ( 1+\omega _{0}u_{0}\right ) \end{align*}

In class, we assumed \(y_{0}=1\). Here we keep it as \(Y\), the initial condition.\begin{equation} y_{1}=Y\left ( 1+\omega _{0}u_{0}\right ) \tag{2} \end{equation} Now for the second year, we have\begin{align*} y_{2} & =y_{1}+\omega _{1}u_{1}y_{1}\\ & =y_{1}\left ( 1+\omega _{1}u_{1}\right ) \end{align*}

Substituting (2) into the above gives\begin{align*} y_{2} & =Y\left ( 1+\omega _{0}u_{0}\right ) \left ( 1+\omega _{1}u_{1}\right )\\ & =Y+Y\omega _{0}u_{0}+Y\omega _{1}u_{1}+Y\omega _{0}\omega _{1}u_{0}u_{1} \end{align*}

Therefore from (1) we have for \(N=2\)\begin{align*} J\left ( u\right ) & =E\left ( y_{2}+{\displaystyle \sum \limits _{k=0}^{1}} y_{k}\left ( 1-u_{k}\right ) \right ) \\ & =E\left ( \overbrace{Y+Y\omega _{0}u_{0}+Y\omega _{1}u_{1}+Y\omega _{0}\omega _{1}u_{0}u_{1}}^{y_{2}}+\left [ Y\left ( 1-u_{0}\right ) +y_{1}\left ( 1-u_{1}\right ) \right ] \right ) \\ & =E\left ( Y+Y\omega _{0}u_{0}+Y\omega _{1}u_{1}+Y\omega _{0}\omega _{1}u_{0}u_{1}+\left [ Y\left ( 1-u_{0}\right ) +Y\left ( 1+\omega _{0}u_{0}\right ) \left ( 1-u_{1}\right ) \right ] \right ) \\ & =E\left ( Y+Y\omega _{0}u_{0}+Y\omega _{1}u_{1}+Y\omega _{0}\omega _{1}u_{0}u_{1}+Y-Yu_{0}+Y-Yu_{1}+Y\omega _{0}u_{0}-Y\omega _{0}u_{0}u_{1}\right ) \\ & =Y\left ( 3+\varpi u_{0}+\varpi u_{1}+\varpi ^{2}u_{0}u_{1}-u_{0}-u_{1}+\varpi u_{0}-\varpi u_{0}u_{1}\right ) \\ & =Y\left ( 3+2\varpi u_{0}+u_{1}\left ( \varpi -1\right ) +\varpi ^{2}u_{0}u_{1}-u_{0}-\varpi u_{0}u_{1}\right ) \\ & =Y\left ( 3+u_{0}\left ( 2\varpi -1\right ) +u_{1}\left ( \varpi -1\right ) +u_{0}u_{1}\left ( \varpi ^{2}-\varpi \right ) \right ) \end{align*}

Reader Use syms to find \(J\left ( u\right ) \) for \(N=4\).

If we want maximum profit, common sense says that when \(\varpi \) is large (good weather) all the coefficients of \(u_{0}\), \(u_{1}\), and \(u_{0}u_{1}\) above are positive (this holds for \(\varpi >1\)), so maximizing term by term gives \(u^{\ast }=\left ( 1,1\right ) \). This says \(u_{0}=1\) and \(u_{1}=1\): we invest everything back into the crop each year. If \(\varpi <\frac{1}{2}\) (bad weather), all the coefficients are negative and \(u^{\ast }=\left ( 0,0\right ) \).

In the above, \(J\left ( u\right ) \) is a multilinear function of \(u_{0},u_{1}\): if all \(u_{k}\) are held constant except one, \(J\left ( u\right )\) becomes linear in the free parameter. In HW 1, we need to prove that an extremum of a multilinear function over a hypercube occurs at a vertex. So for this problem, the possible solutions are \(\left ( 0,0\right ) ,\left ( 0,1\right ) ,\left ( 1,0\right ) ,\left ( 1,1\right ) \).

Reader Is there a value of \(\varpi \) that gives \(\left ( 1,0\right ) \) and \(\left ( 0,1\right ) \)?
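As a check (my computation, evaluating the expression above at the four vertices with \(Y=1\)): \(J\left ( 0,0\right ) =3\), \(J\left ( 1,0\right ) =2+2\varpi \), \(J\left ( 0,1\right ) =2+\varpi \), and \(J\left ( 1,1\right ) =\left ( 1+\varpi \right ) ^{2}\).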

For arbitrary \(n\) there are \(2^{n}\) vertices of the hypercube to check for a multilinear \(J\left ( u\right ) \) with \(u\in \Re ^{n}\). So for large \(n\), a multilinear problem is much harder to solve than a convex optimization problem. Even simple problems demonstrate intractability. Let \(M\) be an \(N\times N\) matrix, and let every entry \(m_{ij}\) be known only within bounds \(m_{ij}^{-}\leq m_{ij}\leq m_{ij}^{+}\). The question is: what are the bounds on the determinant of \(M\)? (The determinant is multilinear in the entries.)

Reader ponder this question in the context of multilinear problems. To determine this for small \(N\) in Matlab, here is some code for \(N=4\) (it uses allcomb, which is a File Exchange function).
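A sketch of what such code might look like (my reconstruction, assuming each entry is bounded by \(-1\leq m_{ij}\leq 1\), so by multilinearity the extreme determinants occur where every entry is \(\pm 1\)):

n = 4; 
ranges = repmat({[-1 1]}, 1, n^2);   % each of the n^2 entries is -1 or +1 
V = allcomb(ranges{:});              % 2^(n^2) rows, one vertex per row 
d = zeros(size(V,1), 1); 
for i = 1:size(V,1) 
    d(i) = det(reshape(V(i,:), n, n)); 
end 
max(d) 
min(d)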

Running the above gives

 
ans = 
    16 
 
ans = 
   -16

2.3 Lecture 3. Tuesday, January 26, 2016

2.3.1 Multilinear functions, level sets, contours

Today will be on multilinear functions, then back to the infimum of \(J\left ( u\right ) \). Then we will review some real analysis to see when a \(\min \) is attained. We need to determine whether a minimum exists before starting to search for it. For the farming problem, say \(\varpi =2\). We want to look at contours in \(\Re ^{2}\) and \(\Re ^{3}\). Try first in \(\Re ^{2}\).

Reader Obtain the level sets (contour lines), defined as \(\Lambda _{\gamma }=\left \{ u:J\left ( u\right ) =\gamma \right \} \). In words, a contour line is a curve of equal values of \(J\left ( u\right ) \). For the farming problem, show the contour lines of \(J\left ( u\right ) \) look like

Figure 2.4: Level sets

For \(\varpi =2\), and using results from last lecture, where we had \[ J\left ( u\right ) =Y\left ( 3+u_{0}\left ( 2\varpi -1\right ) +u_{1}\left ( \varpi -1\right ) +u_{0}u_{1}\left ( \varpi ^{2}-\varpi \right ) \right ) \] Where \(Y=y\left ( 0\right ) =1\), the above becomes\begin{equation} J\left ( u\right ) =3+u_{0}\left ( 2\varpi -1\right ) +u_{1}\left ( \varpi -1\right ) +u_{0}u_{1}\left ( \varpi ^{2}-\varpi \right ) \tag{1} \end{equation}

Reader Use Matlab syms to obtain \(J\left ( u\right ) \) for any \(N\)

Answer Below is a function which generates \(J\left ( u\right ) \) for any \(N\). The function is called nma_farm(N,y0); it takes \(N\), the number of years, and \(y\left ( 0\right ) \), the initial value. (w0 below stands for \(\varpi \), the mean of the random variable \(\omega \).)
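A sketch of such a function (my reconstruction; since every monomial in the expansion contains each \(\omega _{k}\) at most once and the \(\omega _{k}\) are i.i.d., taking expectations amounts to substituting w0 for every \(\omega _{k}\); the display here uses u1...uN rather than the u(0)...u(N-1) of the class output):

function J = nma_farm(N, y0) 
    % Build y(k+1) = y(k)*(1 + w0*u(k)) symbolically and return 
    % E( y(N) + sum_k y(k)*(1 - u(k)) ) with E(w(k)) = w0. 
    w0 = sym('w0'); 
    u  = sym('u', [N 1]); 
    y  = sym(y0); 
    profit = sym(0); 
    for k = 1:N 
        profit = profit + y*(1 - u(k));   % profit taken out in year k-1 
        y = y*(1 + w0*u(k));              % reinvested fraction grows the crop 
    end 
    J = expand(y + profit); 
end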

Here are example run outputs

 
>> nma_farm(2,1) 
2*w0*u(0) - u(1) - u(0) + w0*u(1) + w0^2*u(0)*u(1) - w0*u(0)*u(1) + 3 
 
>> nma_farm(4,1) 
4*w0*u(0) - u(1) - u(2) - u(3) - u(0) + 3*w0*u(1) + 2*w0*u(2) 
+ w0*u(3) + 3*w0^2*u(0)*u(1) + 2*w0^2*u(0)*u(2) + w0^2*u(0)*u(3) 
+ 2*w0^2*u(1)*u(2) + w0^2*u(1)*u(3) + w0^2*u(2)*u(3) - 
w0*u(0)*u(1) - w0*u(0)*u(2) - w0*u(0)*u(3) - w0*u(1)*u(2) - 
w0*u(1)*u(3) - w0*u(2)*u(3) - w0^2*u(0)*u(1)*u(2) - 
w0^2*u(0)*u(1)*u(3) + 2*w0^3*u(0)*u(1)*u(2) - w0^2*u(0)*u(2)*u(3) 
+ w0^3*u(0)*u(1)*u(3) - w0^2*u(1)*u(2)*u(3) + w0^3*u(0)*u(2)*u(3) 
+ w0^3*u(1)*u(2)*u(3) - w0^3*u(0)*u(1)*u(2)*u(3) + 
w0^4*u(0)*u(1)*u(2)*u(3) + 5
 
>> nma_farm(5,1) 
 
5*w0*u(0) - u(1) - u(2) - u(3) - u(4) - u(0) + 4*w0*u(1) 
+ 3*w0*u(2) + 2*w0*u(3) + w0*u(4) + 4*w0^2*u(0)*u(1) + 
3*w0^2*u(0)*u(2) + 2*w0^2*u(0)*u(3) + 3*w0^2*u(1)*u(2) + 
w0^2*u(0)*u(4) + 2*w0^2*u(1)*u(3) + w0^2*u(1)*u(4) + 
2*w0^2*u(2)*u(3) + w0^2*u(2)*u(4) + w0^2*u(3)*u(4) - w0*u(0)*u(1) 
- w0*u(0)*u(2) - w0*u(0)*u(3) - w0*u(1)*u(2) - w0*u(0)*u(4) 
- w0*u(1)*u(3) - w0*u(1)*u(4) - w0*u(2)*u(3) - w0*u(2)*u(4) 
- w0*u(3)*u(4) - w0^2*u(0)*u(1)*u(2) - w0^2*u(0)*u(1)*u(3) + 
3*w0^3*u(0)*u(1)*u(2) - w0^2*u(0)*u(1)*u(4) - w0^2*u(0)*u(2)*u(3) 
+ 2*w0^3*u(0)*u(1)*u(3) - w0^2*u(0)*u(2)*u(4) - w0^2*u(1)*u(2)*u(3) 
+ w0^3*u(0)*u(1)*u(4) + 2*w0^3*u(0)*u(2)*u(3) - w0^2*u(0)*u(3)*u(4) 
- w0^2*u(1)*u(2)*u(4) + w0^3*u(0)*u(2)*u(4) + 2*w0^3*u(1)*u(2)*u(3) 
- w0^2*u(1)*u(3)*u(4) + w0^3*u(0)*u(3)*u(4) + w0^3*u(1)*u(2)*u(4) 
- w0^2*u(2)*u(3)*u(4) + w0^3*u(1)*u(3)*u(4) + w0^3*u(2)*u(3)*u(4) 
- w0^3*u(0)*u(1)*u(2)*u(3) - w0^3*u(0)*u(1)*u(2)*u(4) 
+ 2*w0^4*u(0)*u(1)*u(2)*u(3) - w0^3*u(0)*u(1)*u(3)*u(4) 
+ w0^4*u(0)*u(1)*u(2)*u(4) - w0^3*u(0)*u(2)*u(3)*u(4) 
+ w0^4*u(0)*u(1)*u(3)*u(4) - w0^3*u(1)*u(2)*u(3)*u(4) + 
w0^4*u(0)*u(2)*u(3)*u(4) + w0^4*u(1)*u(2)*u(3)*u(4) 
- w0^4*u(0)*u(1)*u(2)*u(3)*u(4) + w0^5*u(0)*u(1)*u(2)*u(3)*u(4) + 6

At \(u^{\ast }=\left ( u_{0},u_{1}\right ) =\left ( 1,1\right ) \), (1) becomes \begin{align*} J^{\ast } & =3+\left ( 4-1\right ) +\left ( 2-1\right ) +\left ( 4-2\right ) \\ & =9 \end{align*}

Reader Look at \(J\left ( u\right ) \) in \(\Re ^{3}\) for the farming problem. Today’s topic: When does the optimal element \(u^{\ast }\) exist?

2.3.2 Pareto optimality

When we have more than one objective function \(J_{i}\left ( u\right ) :U\subseteq \Re ^{n}\rightarrow \Re \), for \(i=1\cdots m\), we call \(u^{\ast }\in U\) Pareto optimal if the following holds: given any other \(u\in U\), the following two relations cannot both be true at the same time:

1. \(J_{i}\left ( u\right ) \leq J_{i}\left ( u^{\ast }\right ) ,i=1\cdots m\)
2. \(J_{k}\left ( u\right ) <J_{k}\left ( u^{\ast }\right ) \) for some \(k\).

This is related to efficiency in economics: anything that is not Pareto optimal can be eliminated from consideration.

Reader Say \(U=\left ( 0,\infty \right ) \), \(J_{1}\left ( u\right ) =u+1,J_{2}\left ( u\right ) =\frac{1}{u+1}\), describe the set of Pareto optimal solutions.

Reader By creating \(J\left ( u\right ) =\alpha _{1}J_{1}\left ( u\right ) +\alpha _{2}J_{2}\left ( u\right ) \), with \(\alpha _{i}\geq 0\), show the solution (optimal \(J\left ( u\right ) \)), depends on \(\alpha _{i}\) which reflects different utility functions.

2.3.3 Compact and bounded sets, open and closed sets

Now we will talk about existence of the optimal element \(u^{\ast }\). We will always assume that \(J\left ( u\right ) \) is a continuous function of \(u\in U\). Two different questions about the set \(U\) arise:

1. Is the set compact?
2. Is the set unbounded?

In both cases, we still want conditions on \(J\left ( u\right ) \) itself as well. We begin with the definition of \[ \boxed{ J^{\ast }=\inf _{u\in U}J\left ( u\right ) } \] Remember that we do not allow \(u=\pm \infty \), but \(J^{\ast }\) can be \(\pm \infty \).

Example \(J\left ( u\right ) =-u\) on \(\Re \). Then \(J^{\ast }=-\infty \), but there is no optimal element (it would require \(u^{\ast }=-\infty \)). It is always true, however, that there is a sequence \(\left \{ u^{k}\right \} _{k=1}^{\infty }\) such that \(J\left ( u^{k}\right ) \rightarrow J^{\ast }\) even though there is no \(u^{\ast }\) element. We write\[ \lim _{k\rightarrow \infty }J\left ( u^{k}\right ) =J^{\ast }\] A bad set \(U\) is often one which is not closed. An example: a gas pedal malfunctions when pressed fully to the floor, but the goal of shortest travel time requires pressing it fully to the floor to obtain the highest speed.

This shows that there is no optimal \(u^{\ast }\), but we can get as close to it as we want. This is an example of a set \(U\) that is not closed on one end. We always prefer to work with closed sets. A set that is not closed is \(U=(0,\frac{\pi }{4}]\); a closed set is \(U=\left [ 0,\frac{\pi }{4}\right ] \).

Definition of continuity If \(u^{k}\rightarrow u^{0}\) then \(J\left ( u^{k}\right ) \rightarrow J\left ( u^{0}\right ) \); we write \(J\left ( u^{0}\right ) =\lim _{k\rightarrow \infty }J\left ( u^{k}\right ) \). This must hold for every such sequence \(u^{k}\).

Equivalent definition: There is continuity at \(u^{0}\) if the following holds: given any \(\varepsilon > 0\,\ \) there exists \(\delta >0\) such that \(\left \vert J\left ( u\right ) -J\left ( u^{0}\right ) \right \vert <\varepsilon \) whenever \(\left \vert u-u^{0}\right \vert <\delta \).

Closed sets A closed set includes all its limit points. Examples:

1. \(\left [ 0,1\right ] \) is a closed set.
2. \([0,1)\) is not closed, because we can approach \(1\) as closely as we want but never reach it.
3. \([0,\infty )\) is closed, since it includes all its limit points. Remember we are not allowed to use \(u^{\ast }=\infty \).
4. \(\Re \) is closed and open at the same time.

A set can be both open and closed; \(\Re \) is such a set. To show a set is closed, we need to show that the limit of any convergent sequence of points in the set is also in the set.

The intersection of closed sets remains closed, but a union of infinitely many closed sets can fail to be closed (it can even be open). This is important in optimization: if the \(U_{i}\) represent constraint sets, then the intersection of the constraints remains a closed set. Closed sets also contain their boundaries.

Now we will talk about boundedness of a set. A set is bounded if we can put it inside a sphere of some radius. There is more than one definition of a norm, but we will always use the Euclidean norm; if a set is bounded in one norm, then it is bounded in any other norm.

We like to work with compact sets. A compact set is one which is both bounded and closed. These are the best for optimization.

2.3.4 The Bolzano-Weierstrass (B-W) theorem

Every bounded sequence in \(\Re ^{n}\) has a convergent subsequence. This is useful: if a sequence is bounded, we can always find at least one subsequence of it which converges. For example, \(u^{k}=\cos k\) does not converge, but it has a subsequence which does. The same holds for \(u^{k}=\left (-1\right ) ^{k}\).

2.4 Lecture 4. Thursday, January 28, 2016

2.4.1 Existence of optimal solutions

We will spend a few minutes reviewing existence of optimal solutions, then talk about computability and convexity. We know now what it means for \(J\left ( u\right ) \) to be continuous. Closed sets are very important for well-posed optimization problems. A typical closed set is a sublevel set \(\left \{ u : J\left ( u\right ) \leq k\right \} \) where \(k\) is a constant: if \(J\) is continuous, this set is closed.

Reader Give a proof.

We talked about \(U\) being closed and bounded (i.e. compact); compact sets are best for optimization. We talked about a sequence \(\left \{ u^{k}\right \} _{k=1}^{\infty }\) and a subsequence \(\left \{ u^{k_{i}}\right \} _{i=1}^{\infty }\) of it. If \(u^{k}\) converges to \(u^{\ast }\) then \(\lim _{k\rightarrow \infty }\left \Vert u^{k}-u^{\ast }\right \Vert =0.\) Finally, we talked about the Bolzano-Weierstrass theorem.

2.4.2 Classical existence theorem

Suppose \(J:U\rightarrow \Re \) is continuous and \(U\) is compact (i.e. bounded and closed) and non-empty. Then there exists an optimal element \(u^{\ast }\in U\,\ \)such that \(J\left ( u^{\ast }\right ) =J^{\ast }=\min _{u\in U}J\left ( u\right ) \). This does not say that \(u^{\ast }\) is unique, just that it exists.

Proof Let \(u^{k}\in U\) be a sequence such that \(J\left ( u^{k}\right ) \rightarrow J^{\ast }\). By Bolzano-Weierstrass there is a convergent subsequence \(u^{k_{i}}\) with limit \(u^{\ast }\in U\).

Reader Show that \(J\left ( u^{k_{i}}\right ) \) also converges to \(J^{\ast }\). Note: \(J\left ( u^{k_{i}}\right ) \) is sequence of real numbers, which converges to a real number \(J^{\ast }\).

Example \(u^{k}=\left ( -1\right ) ^{k}\) with \(J\left ( u\right ) =e^{-u^{2}}\): then \(J\left ( u^{k}\right ) \) converges, but \(u^{k}\) does not. Hence we need to look for a subsequence \(u^{k_{i}}\) of \(u^{k}\). Now, by continuity, \(J\left ( u^{\ast }\right ) =\lim _{i\rightarrow \infty }J\left ( u^{k_{i}}\right ) =J^{\ast }\).

There are many problems where the set is open, as in unconstrained problems. These are called open cone problems.

2.4.3 Coercive functions and Coercivity theorem

Coercive function Suppose \(U\subseteq \Re ^{n}\) and \(J:U\rightarrow \Re \). We say that \(J\) is positive coercive if \begin{equation} \boxed{ \lim _{\left \Vert u\right \Vert \rightarrow \infty }J\left ( u\right ) =\infty \tag{*} } \end{equation} Initially, think of \(U\) as the whole \(\Re ^{n}\) space: \(J\left ( u\right ) \) blows up at \(\infty \). Note that Eq (*) has a uniform flavor: it means that given any \(\gamma \gg 0\), arbitrarily large, there exists a radius \(R\) such that \(J\left ( u\right ) >\gamma \) whenever \(\left \Vert u\right \Vert >R\). In other words, for a coercive function we can always find a sphere outside of which all function values exceed any given level. This is useful when searching for a minimum, since it gives a cutoff on the region of \(U\) to search.

Example \(J\left ( u\right ) =u^{2}\) is positive coercive, but \(J\left ( u\right ) =e^{u}\) is not: as \(u\rightarrow -\infty \), the function \(e^{u}\) does not blow up, so no sphere as above can be found. \[ J\left ( u\right ) =au^{2}+bu+c \] is coercive only when \(a>0\). The most famous coercive function is the positive definite quadratic form: let \(A\) be an \(n\times n\) symmetric positive definite matrix, \(b\in \Re ^{n}\), and \(c\) any real number; then \(J\left ( u\right ) =u^{T}Au+b^{T}u+c\) is coercive. This is essential to quadratic programming.

Why is \(u^{T}Au+b^{T}u+c\) coercive? From matrix algebra, if \(A\) is \(n\times n\) symmetric and \(x\in \Re ^{n}\), then \(\lambda _{\min }\left \Vert x\right \Vert ^{2}\leq x^{T}Ax\leq \lambda _{\max }\left \Vert x\right \Vert ^{2}\). For \(x,y\in \Re ^{n}\), the Schwarz inequality \(\left \vert \left \langle x,y\right \rangle \right \vert ^{2}\leq \left \langle x,x\right \rangle \left \langle y,y\right \rangle \) gives \(x^{T}y\geq -\left \Vert x\right \Vert \left \Vert y\right \Vert \). Hence \(J\left ( u\right ) =u^{T}Au+b^{T}u+c\geq \lambda _{\min }\left ( A\right ) \left \Vert u\right \Vert ^{2}-\left \Vert b\right \Vert \left \Vert u\right \Vert +c\). Since \(A\) is positive definite, \(\lambda _{\min }\left ( A\right ) >0\) (the smallest eigenvalue is positive), so this is the same as the scalar problem \(au^{2}+bu+c\) with \(a>0\). Hence \(J\left ( u\right ) \) is a coercive function of \(u\).
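A quick numeric illustration of this bound in Matlab (my addition; the matrix and vector are arbitrary example choices):

A = [2 1; 1 3]; b = [1; -1]; c = 0;            % A symmetric positive definite 
J = @(u) u.'*A*u + b.'*u + c; 
lmin = min(eig(A));                            % > 0 since A is positive definite 
u = 10*randn(2,1); 
J(u) >= lmin*norm(u)^2 - norm(b)*norm(u) + c   % returns logical 1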

Reader What if a physical problem leads to \(u^{T}Au+b^{T}u+c\) but \(A\) is not symmetric? Solution: we can always work with the symmetric part of \(A\), since \(u^{T}Au=\frac{1}{2}u^{T}\left ( A+A^{T}\right ) u\); hence we work with \[ J\left ( u\right ) =\frac{1}{2}u^{T}\left ( A+A^{T}\right ) u+b^{T}u+c \] instead.
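A one-line numeric check of this identity (my addition; the matrix is an arbitrary example):

A  = [1 4; 0 3];        % a non-symmetric matrix 
As = (A + A.')/2;       % its symmetric part 
u  = randn(2,1); 
u.'*A*u - u.'*As*u      % zero (up to round-off)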

2.4.4 Convex sets and Coercivity theorem

Coercivity theorem Suppose \(J:U\rightarrow \Re \) is continuous and coercive, and \(U\subseteq \Re ^{n}\) is closed. Then an optimal element \(u^{\ast }\in U\) exists minimizing \(J\left ( u\right ) \). Reader Consider the maximization problem instead of minimization.

Now we start on a new topic: convexity, as a route toward finding optima. A set \(U\subseteq \Re ^{n}\) is said to be a convex set if the following holds: for any \(u^{0},u^{1}\in U\) and \(\lambda \in \left [ 0,1\right ] \), it follows that \(\lambda u^{0}+\left ( 1-\lambda \right ) u^{1}\in U\). In words: every point on the straight line segment between any two points in the set is also in the set. Examples:

Figure 2.5: Convex sets

Reader If \(U_{1},U_{2}\) are both convex then show the intersection is also convex.

What about countable intersections \(\cap _{i=1}^{\infty }U_{i}\) ?

Answer Yes. Example: \(U_{i}=\left \{ u\in \Re ^{n}:\left \Vert u\right \Vert \leq e^{-i}\right \} \). By contrast, a union of convex sets need not be a convex set.

Constraints combined with OR correspond to unions, so OR constraints are harder to work with, since a union of convex sets need not be convex.

Reader Describe all possible convex sets on \(\Re ^{1}\).

Reader Suppose \(U\) is defined by the set of inequalities \(a_{i}^{T}u\leq b_{i}\), \(i=1\cdots m\). This is an intersection of half-spaces, hence convex. This is the setting used in linear programming.

Figure 2.6: Intersection of constraints

2.5 Lecture 5. Tuesday, February 2, 2016

Back to the most important topic in optimization, which is convexity. We want to build on convex sets.

Definition A set \(U\subseteq \Re ^{n}\) is convex if the following holds: Given any \(u^{0},u^{1}\in U\,\ \) and \(\lambda \in \left [ 0,1\right ] \), then \(u^{\lambda }=\left ( 1-\lambda \right ) u^{0}+\lambda u^{1}\) is also in \(U\). We will discuss convex function also soon.

Reader Show that the set \(\left \{ u\in \Re ^{n},\left \Vert u\right \Vert \leq r\right \} \) is convex.

Answer: Let \(\left \Vert u_{1}\right \Vert \leq r,\left \Vert u_{2}\right \Vert \leq r\). Then \(\left \Vert \left ( 1-\lambda \right ) u_{1}+\lambda u_{2}\right \Vert \leq \left \Vert \left ( 1-\lambda \right ) u_{1}\right \Vert +\left \Vert \lambda u_{2}\right \Vert \) by the triangle inequality. Hence

\[ \left \Vert \left ( 1-\lambda \right ) u_{1}+\lambda u_{2}\right \Vert \leq \left ( 1-\lambda \right ) \left \Vert u_{1}\right \Vert +\lambda \left \Vert u_{2}\right \Vert \leq \left ( 1-\lambda \right ) r+\lambda r=r \]

If some center point \(\hat{u}\) is given, then \(\left \{ u:\left \Vert u-\hat{u}\right \Vert \leq r\right \} \) is a convex set: the translate of a convex set is also convex. Other norms on \(\Re ^{n}\) are important: \(\left \Vert u\right \Vert _{1}={\displaystyle \sum \limits _{i=1}^{n}} \left \vert u_{i}\right \vert \) and \(\left \Vert u\right \Vert _{\infty }=\max \left \{ \left \vert u_{1}\right \vert ,\left \vert u_{2}\right \vert ,\cdots ,\left \vert u_{n}\right \vert \right \} \).

Reader In \(\Re ^{2}\), sketch unit ball \(\left \{ u:\left \Vert u\right \Vert \leq r\right \} \) using norms \(\left \Vert u\right \Vert _{1}\) and \(\left \Vert u\right \Vert _{\infty }\)

Figure 2.7: \(L_{1}\) and \(L_{\infty }\) norms

Reader Do the above for \(\Re ^{3}\).

Reader Suppose \(U\) is convex, \(u^{0},u^{1},\cdots ,u^{m}\in U\), and \(\lambda _{0},\lambda _{1},\cdots ,\lambda _{m}\geq 0\) with \({\displaystyle \sum \limits _{i=0}^{m}} \lambda _{i}=1\). Show that \({\displaystyle \sum \limits _{i=0}^{m}} \lambda _{i}u^{i}\in U\). (See HW 2.) For three points \(u^{0},u^{1},u^{2}\), the set of such mixtures \({\displaystyle \sum \limits _{i=0}^{2}} \lambda _{i}u^{i}\) is the convex region spanned by the three points:

(figure: the set of mixtures of three points)

2.5.1 Polytope

In words, these are flat-sided shapes which are convex. Let \(v^{1},v^{2},\cdots ,v^{m}\in \Re ^{n}\) be given. We call a set \(U\) the polytope generated by the \(v^{i}\) if \(U\) is the set of mixtures of the \(v^{i}\). That is, \[ U=\left \{{\displaystyle \sum \limits _{i=1}^{m}} \lambda _{i}v^{i}:\lambda _{i}\geq 0\text{ and }{\displaystyle \sum \limits _{i=1}^{m}} \lambda _{i}=1\right \} \] The following is an illustration. Given these points

(figure: the given points \(v^{1},\ldots ,v^{7}\))

The generated polytope is

(figure: the generated polytope)

Notice that the points \(v^{6},v^{7}\) are redundant and have not been used. In higher dimensions, it is harder to tell which points are the extreme points (vertices).

Extreme points Let \(U\subseteq \Re ^{n}\) be a convex set. A point \(u\in U\) is said to be an extreme point if it cannot be written as a convex combination of other points of \(U\). Examples:

(figure: examples of extreme points)

2.5.2 Convex functions

So far we talked about convex sets; now we talk about convex functions. \(J\left ( u\right ) :\Re ^{n}\rightarrow \Re \) is said to be a convex function if the following is satisfied: given any \(u^{0},u^{1}\in U\subseteq \Re ^{n}\) and \(\lambda \in \left [ 0,1\right ] \), \[ \boxed{ J\left ( \left ( 1-\lambda \right ) u^{0}+\lambda u^{1}\right ) \leq \left ( 1-\lambda \right ) J\left ( u^{0}\right ) +\lambda J\left ( u^{1}\right ) } \] In words: between any two points, the function value \(J\left ( u\right ) \) always lies below the straight chord joining \(J\left ( u^{0}\right ) \) and \(J\left ( u^{1}\right ) \).

(figure: a convex function lies below its chords)
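A quick numeric spot-check of the definition in Matlab (my addition; \(e^{u}\) is one of the convex examples mentioned below):

J = @(u) exp(u);                    % a convex function 
u0 = randn; u1 = randn; lam = rand; 
lhs = J((1-lam)*u0 + lam*u1); 
rhs = (1-lam)*J(u0) + lam*J(u1); 
lhs <= rhs                          % returns logical 1 for a convex J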

For example, the following is not a convex function.

(figure: a non-convex function)

The function above is convex over some regions, but overall it is not a convex function. This definition is impractical to check directly in higher dimensions; there we use the Hessian instead.

Reader A function \(J\left ( u\right ) \) is concave if \(-J\left ( u\right ) \) is convex. Examples: \(e^{\alpha u}\) is convex. Any linear function is convex: \(a^{T}u+b\) is convex. Also \(-\log \left ( u\right ) \) is a convex function, and \(J\left ( u\right ) =au^{2}+bu+c\) with \(a>0\) is a convex function. What about \[ \max \left \{ J_{1},J_{2}\right \} =\max \left \{ \left ( a_{1}\right ) ^{T}u+b_{1},\left ( a_{2}\right ) ^{T}u+b_{2}\right \} \] Is it a convex function? Yes: the pointwise maximum of two or more convex functions is a convex function.

(figure: pointwise maximum of convex functions)

Reader Suppose \(J_{i}:\Re ^{n}\rightarrow \Re \) are convex functions, not necessarily linear, for \(i=1,\cdots m\). Then \(J\left ( u\right ) =\max \left \{ J_{1}\left ( u\right ) ,\cdots ,J_{m}\left ( u\right ) \right \} \) is a convex function. Hint: bound \(J\left ( \left ( 1-\lambda \right ) u^{0}+\lambda u^{1}\right ) \) using \(J_{i}\left ( \left ( 1-\lambda \right ) u^{0}+\lambda u^{1}\right ) \leq \left ( 1-\lambda \right ) J_{i}\left ( u^{0}\right ) +\lambda J_{i}\left ( u^{1}\right ) \leq \left ( 1-\lambda \right ) J\left ( u^{0}\right ) +\lambda J\left ( u^{1}\right ) \).

Next is to connect convex functions to convex sets.

2.6 Lecture 6. Thursday, February 4, 2016

2.6.1 Convex functions and convex sets

An interpretation of a convex function in 1D is a bowl shape, though pictures only help in low dimensions. Convex optimization applications have taken off in the last 15 years; see for example the CVX software. Why is convexity so useful? Convex functions have several useful properties:

1. Every local minimum is also global. This is important: once we converge to a minimum, we can stop, as there will be no better one.
2. If the objective function is strictly convex, then the minimizer found is unique. If it is not strictly convex, there can be other minimizers with the same value, so the minimizer is not unique.
3. In quadratic programming, positive semi-definiteness corresponds to convexity (positive definiteness to strict convexity).

Reader Suppose \(U\subseteq \Re ^{n}\) is a convex set and \(J:U\rightarrow \Re \) is a convex function. Show that the set of minimizers \(U^{\ast }=\left \{ u\in U:J\left ( u\right ) =\min _{u\in U}J\left ( u\right ) \right \} \) is a convex set.

Another nice property of convex functions is that the pointwise maximum of convex functions is also a convex function, even over an indexed collection: let \(I\) be a set, perhaps uncountable, and for each \(i\in I\) suppose \(J_{i}:\Re ^{n}\rightarrow \Re \) is convex. Let \(J\left ( u\right ) =\sup _{i\in I}J_{i}\left ( u\right ) \) and assume \(J\left ( u\right ) <\infty \).

Reader Show that \(J\left ( u\right ) \) is a convex function. Example: \(J_{q}\left ( u\right ) =6u^{2}+\left ( 6q-\cos q\right ) +e^{-q}\), where \(\left \vert q\right \vert \leq 1\). Then \(J\left ( u\right ) =\max _{\left \vert q\right \vert \leq 1}J_{q}\left ( u\right ) \).

2.6.2 Relation between convex functions and convex sets

Given \(J:\Re ^{n}\rightarrow \Re \), define \(epiJ\) as the set \[ \boxed{ \left \{ \left ( u,v\right ) \in \Re ^{n+1}:v\geq J\left ( u\right ) \right \} } \] Then \(J\left ( u\right ) \) is a convex function iff \(epiJ\) is a convex set. See the HW2 \(epi\) problem.

2.6.3 Criterion for convexity, Gradient and Hessian

Begin with \(J:\Re ^{n}\rightarrow \Re \). How do we check that \(J\left ( u\right ) \) is convex? We assume \(J\left ( u\right ) \) is twice continuously differentiable (called \(C^{2}\)), and that \(U\) is an open convex set.

Definition: Gradient \(\nabla J\left ( u\right ) =\begin{bmatrix} \frac{\partial J}{\partial u_{1}} & \frac{\partial J}{\partial u_{2}} & \cdots & \frac{\partial J}{\partial u_{n}}\end{bmatrix} ^{T}\in \Re ^{n}\).

The Hessian \[ \nabla ^{2}J\left ( u\right ) =\begin{bmatrix} \frac{\partial ^{2}J}{\partial u_{1}^{2}} & \frac{\partial ^{2}J}{\partial u_{1}\partial u_{2}} & \cdots & \frac{\partial ^{2}J}{\partial u_{1}\partial u_{n}}\\ \frac{\partial ^{2}J}{\partial u_{2}\partial u_{1}} & \frac{\partial ^{2}J}{\partial u_{2}^{2}} & \cdots & \frac{\partial ^{2}J}{\partial u_{2}\partial u_{n}}\\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial ^{2}J}{\partial u_{n}\partial u_{1}} & \frac{\partial ^{2}J}{\partial u_{n}\partial u_{2}} & \cdots & \frac{\partial ^{2}J}{\partial u_{n}^{2}}\end{bmatrix} \]

Reader Why is the Hessian always symmetric? (For \(C^{2}\) functions the order of differentiation does not matter.)

A positive definite Hessian is the \(n\)-dimensional analogue of a positive second derivative.
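As an illustration, the Hessian is easy to compute with the Matlab Symbolic Math Toolbox (my addition, using the explicit \(J(u)\) from Lecture 2):

syms u1 u2 real 
J = u1^2 + exp(u2)*cos(u1) + u2;    % the explicit J(u) from Lecture 2 
H = hessian(J, [u1 u2]) 
% J is convex on a region iff H is positive semi-definite there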

2.6.4 Hessian theorem

Suppose \(J:\Re ^{n}\rightarrow \Re \) is \(C^{2}\). Then \(J\) is a convex function on \(U\) iff \(\nabla ^{2}J\left ( u\right ) \) is a positive semi-definite matrix for all \(u\in U\).

Proof We first prove the case \(n=1\). Sufficiency: Suppose \(\nabla ^{2}J\left ( u\right ) \geq 0\) for all \(u\in U\) (i.e. \(\frac{\partial ^{2}J}{\partial u^{2}}\geq 0\)), and let \(u^{0},u^{1}\) be given in \(U\). We need to show that for \(\lambda \in \left [ 0,1\right ] \), \(J\left ( \lambda u^{0}+\left ( 1-\lambda \right ) u^{1}\right ) \leq \lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) \). We write \[ J\left ( u^{\lambda }\right ) =J\left ( u^{0}\right ) +\int _{u^{0}}^{u^{\lambda }}J^{\prime }\left ( \xi \right ) d\xi \] Since \(J^{\prime \prime }\left ( \xi \right ) \geq 0\), the derivative \(J^{\prime }\left ( \xi \right ) \) is non-decreasing, so the above can be upper bounded as\begin{equation} J\left ( u^{\lambda }\right ) \leq J\left ( u^{0}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{0}\right ) \tag{1} \end{equation} Similarly, \begin{align} J\left ( u^{1}\right ) & =J\left ( u^{\lambda }\right ) +\int _{u^{\lambda }}^{u^{1}}J^{\prime }\left ( \xi \right ) d\xi \nonumber \\ J\left ( u^{1}\right ) & \geq J\left ( u^{\lambda }\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{1}-u^{\lambda }\right ) \nonumber \\ J\left ( u^{\lambda }\right ) & \leq J\left ( u^{1}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{1}\right ) \tag{2} \end{align}

Therefore \(\lambda \overset{(1)}{\overbrace{J\left ( u^{\lambda }\right ) }}+\left ( 1-\lambda \right ) \overset{(2)}{\overbrace{J\left ( u^{\lambda }\right ) }}\) using (1) and (2) gives \begin{align*} \lambda J\left ( u^{\lambda }\right ) +\left ( 1-\lambda \right ) J\left ( u^{\lambda }\right ) & \leq \lambda \left [ J\left ( u^{0}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{0}\right ) \right ] +\left ( 1-\lambda \right ) \left [ J\left ( u^{1}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{1}\right ) \right ] \\ J\left ( u^{\lambda }\right ) & \leq \lambda \left [ J\left ( u^{0}\right ) +u^{\lambda }J^{\prime }\left ( u^{\lambda }\right ) -u^{0}J^{\prime }\left ( u^{\lambda }\right ) \right ] +\left [ J\left ( u^{1}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{1}\right ) \right ] -\lambda \left [ J\left ( u^{1}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{1}\right ) \right ] \\ & =\lambda J\left ( u^{0}\right ) +\lambda u^{\lambda }J^{\prime }\left ( u^{\lambda }\right ) -\lambda u^{0}J^{\prime }\left ( u^{\lambda }\right ) +J\left ( u^{1}\right ) +u^{\lambda }J^{\prime }\left ( u^{\lambda }\right ) -u^{1}J^{\prime }\left ( u^{\lambda }\right ) -\left [ \lambda J\left ( u^{1}\right ) +\lambda J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{1}\right ) \right ] \\ & =\lambda J\left ( u^{0}\right ) +\lambda u^{\lambda }J^{\prime }\left ( u^{\lambda }\right ) -\lambda u^{0}J^{\prime }\left ( u^{\lambda }\right ) +J\left ( u^{1}\right ) +u^{\lambda }J^{\prime }\left ( u^{\lambda }\right ) -u^{1}J^{\prime }\left ( u^{\lambda }\right ) -\lambda J\left ( u^{1}\right ) -u^{\lambda }\lambda J^{\prime }\left ( u^{\lambda }\right ) +u^{1}\lambda J^{\prime }\left ( u^{\lambda }\right ) \\ & =\lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) -\lambda u^{0}J^{\prime }\left ( u^{\lambda }\right ) +u^{\lambda }J^{\prime }\left ( u^{\lambda }\right ) -u^{1}J^{\prime }\left ( u^{\lambda }\right ) +u^{1}\lambda J^{\prime }\left ( u^{\lambda }\right ) \\ & =\left [ \lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) \right ] +\lambda J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{1}-u^{0}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{1}\right ) \end{align*}

But \(u^{\lambda }=\lambda u^{0}+\left ( 1-\lambda \right ) u^{1}\), hence the above becomes

\begin{align*} J\left ( u^{\lambda }\right ) & \leq \left [ \lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) \right ] +\lambda J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{1}-u^{0}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( \lambda u^{0}+\left ( 1-\lambda \right ) u^{1}-u^{1}\right ) \\ & =\left [ \lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) \right ] +\lambda J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{1}-u^{0}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( \lambda u^{0}+u^{1}-\lambda u^{1}-u^{1}\right ) \\ & =\left [ \lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) \right ] +\lambda J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{1}-u^{0}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( \lambda u^{0}-\lambda u^{1}\right ) \\ & =\left [ \lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) \right ] +\lambda J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{1}-u^{0}\right ) -\lambda J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{1}-u^{0}\right ) \\ & =\lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) \end{align*}

Hence we showed that \(J\left ( u^{\lambda }\right ) \leq \lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) .\) The same idea can be used to establish necessity. Next time we relate \(\nabla ^{2}J\left ( u\right ) \geq 0\) to convexity in \(n\) dimensions.

2.7 Lecture 7. Tuesday, February 9, 2016

For \(n=1\): let \(J:U\rightarrow \Re \), where \(U\) is an open set (so that we can differentiate on it) and a convex set, and let \(J\) be \(C^{2}\). Then \(J\) is convex iff \(\frac{d^{2}J}{du^{2}}\geq 0\) for all \(u\in U\). We are trying to establish the Hessian theorem for \(n>1\). We start with the bridging lemma, which we will use to prove the Hessian theorem.

2.7.1 The Bridging Lemma

Given \(J:U\rightarrow \Re \), with \(U\) a convex set, we want to know if \(J\) is a convex function. The lemma says that \(J\) is convex iff the following condition holds:

Given any \(\mathbf{u}\in U\) and \(\mathbf{z} \in \Re ^{n}\), the function \(\tilde{J}\left ( \lambda \right ) =J\left ( \mathbf{u}+\lambda \mathbf{z}\right ) \) is convex on the set \(\Lambda =\left \{ \lambda \in \Re :\mathbf{u}+\lambda \mathbf{z}\in U\right \} \). Notice that the function \(\tilde{J}\left ( \lambda \right ) \) is scalar valued and depends on the scalar \(\lambda \); that is, \(\tilde{J}:\Re \rightarrow \Re \). The lemma says that \(J\) is a convex function iff \(\tilde{J}\) is convex along every line through \(U\), i.e. for every base point \(\mathbf{u}\) and direction \(\mathbf{z}\).

Figure 2.8: Bridging lemma

Proof See also the handout. Necessity: Assume \(J\) is a convex function. We must show that \(\tilde{J}\) is a convex function. Pick \(\mathbf{z}\in \Re ^{n}\), \(\lambda \in \left [ 0,1\right ] \), and any scalars \(\alpha ^{0},\alpha ^{1}\in \Lambda \). We must show that\[ \tilde{J}\left ( \lambda \alpha ^{0}+\left ( 1-\lambda \right ) \alpha ^{1}\right ) \leq \lambda \tilde{J}\left ( \alpha ^{0}\right ) +\left ( 1-\lambda \right ) \tilde{J}\left ( \alpha ^{1}\right ) \] Indeed, from \(\tilde{J}\left ( \lambda \right ) \doteqdot J\left ( \mathbf{u}+\lambda \mathbf{z}\right ) \),\begin{align} \tilde{J}\left ( \lambda \alpha ^{0}+\left ( 1-\lambda \right ) \alpha ^{1}\right ) & \doteqdot J\left ( \mathbf{u}+\left ( \lambda \alpha ^{0}+\left ( 1-\lambda \right ) \alpha ^{1}\right ) \mathbf{z}\right ) \tag{1}\\ & \doteqdot J\left ( \lambda \left ( \mathbf{u}+\alpha ^{0}\mathbf{z}\right ) +\left ( 1-\lambda \right ) \left ( \mathbf{u}+\alpha ^{1}\mathbf{z}\right ) \right ) \tag{2} \end{align}

Reader Going from (1) to (2) above is just a rewriting and manipulation only. Now since \(J\) is assumed convex, then\[ J\left ( \lambda \left ( \mathbf{u}+\alpha ^{0}\mathbf{z}\right ) +\left ( 1-\lambda \right ) \left ( \mathbf{u}+\alpha ^{1}\mathbf{z}\right ) \right ) \leq \lambda J\left ( \mathbf{u}+\alpha ^{0}\mathbf{z}\right ) +\left ( 1-\lambda \right ) J\left ( \mathbf{u}+\alpha ^{1}\mathbf{z}\right ) \] Therefore (1) becomes\begin{align*} \tilde{J}\left ( \lambda \alpha ^{0}+\left ( 1-\lambda \right ) \alpha ^{1}\right ) & \leq \lambda J\left ( \mathbf{u}+\alpha ^{0}\mathbf{z}\right ) +\left ( 1-\lambda \right ) J\left ( \mathbf{u}+\alpha ^{1}\mathbf{z}\right ) \\ & \leq \lambda \tilde{J}\left ( \alpha ^{0}\right ) +\left ( 1-\lambda \right ) \tilde{J}\left ( \alpha ^{1}\right ) \end{align*}

Hence \(\tilde{J}\) is a convex function. QED. Now to prove sufficiency: assume \(\tilde{J}\) is convex; we need to show this implies that \(J\) is convex on \(U\). Since \(\tilde{J}\) is convex,\begin{align} \tilde{J}\left ( \left ( 1-\lambda \right ) \alpha ^{0}+\lambda \alpha ^{1}\right ) & \leq \left ( 1-\lambda \right ) \tilde{J}\left ( \alpha ^{0}\right ) +\lambda \tilde{J}\left ( \alpha ^{1}\right ) \nonumber \\ & =\left ( 1-\lambda \right ) J\left ( u+\alpha ^{0}z\right ) +\lambda J\left ( u+\alpha ^{1}z\right ) \tag{3} \end{align}

But \begin{align} \tilde{J}\left ( \left ( 1-\lambda \right ) \alpha ^{0}+\lambda \alpha ^{1}\right ) & =J\left ( u+\left [ \left ( 1-\lambda \right ) \alpha ^{0}+\lambda \alpha ^{1}\right ] z\right ) \tag{4}\\ & =J\left ( \left ( 1-\lambda \right ) \left ( u+\alpha ^{0}z\right ) +\lambda \left ( u+\alpha ^{1}z\right ) \right ) \tag{5} \end{align}

Where the main trick was going from (4) to (5) by rewriting, so it matches what we have in (3). Now replacing (5) into the LHS of (3) we find\[ J\left ( \left ( 1-\lambda \right ) \left ( u+\alpha ^{0}z\right ) +\lambda \left ( u+\alpha ^{1}z\right ) \right ) \leq \left ( 1-\lambda \right ) J\left ( u+\alpha ^{0}z\right ) +\lambda J\left ( u+\alpha ^{1}z\right ) \] Let \(u^{0}\equiv u+\alpha ^{0}z\) and \(u^{1}\equiv u+\alpha ^{1}z\), both in \(U\); then the above becomes\[ J\left ( \left ( 1-\lambda \right ) u^{0}+\lambda u^{1}\right ) \leq \left ( 1-\lambda \right ) J\left ( u^{0}\right ) +\lambda J\left ( u^{1}\right ) \] Hence \(J\) is a convex function. QED. Now that the bridging lemma is proved, we use it to prove the Hessian theorem.

2.7.2 The Hessian Theorem, strong local minimum

Let \(J:U\rightarrow \Re \), where \(U\) is an open set in \(\Re ^{n}\). The Hessian theorem says that \(J\) is a convex function on \(U\) iff \(\nabla ^{2}J\left ( u\right ) \) is PSD (positive semi-definite) at each \(u\in U\).

Reader Suppose \(\nabla ^{2}J\left ( u\right ) \) is PSD; does this imply strict convexity of \(J\left ( u\right ) \)? Answer: No. Find a counterexample.

See handout Hessian for proof.

Algorithms We will now start a new chapter, looking at algorithms for finding the optimum of \(J\left ( u\right ) \). Preliminaries: begin with \(J:\Re ^{n}\rightarrow \Re \).

Strong local minimum \(u^{\ast }\) is a strong local minimum if there exists \(\delta >0\) such that \(J\left ( u^{\ast }\right ) <J\left ( u\right ) \) for all \(u\neq u^{\ast }\) with \(\left \Vert u^{\ast }-u\right \Vert <\delta \).

We say \(u^{\ast }\) is a global minimum if \(J\left ( u^{\ast }\right ) \leq J\left ( u\right ) \) for all \(u\). Henceforth, \(J\left ( u\right ) \) is \(C^{2}\). From undergraduate calculus, \(u^{\ast }\) is a strong local minimum if the following are satisfied (for \(n=2\)):

1. \(\left . \frac{\partial J}{\partial u_{1}}\right \vert _{u^{\ast }}=0,\left . \frac{\partial J}{\partial u_{2}}\right \vert _{u^{\ast }}=0\)
2. \(\left . \frac{\partial ^{2}J}{\partial u_{1}^{2}}\right \vert _{u^{\ast }}>0,\left ( \frac{\partial ^{2}J}{\partial u_{1}^{2}}\right ) \left ( \frac{\partial ^{2}J}{\partial u_{2}^{2}}\right ) -\left ( \frac{\partial ^{2}J}{\partial u_{1}\partial u_{2}}\right ) ^{2}>0\) at \(u^{\ast }\)

For \(J:\Re ^{n}\rightarrow \Re \), i.e. in higher dimensions, define the gradient\[ \boxed{ \nabla J\left ( u\right ) =\begin{bmatrix} \frac{\partial J}{\partial u_{1}}\\ \frac{\partial J}{\partial u_{2}}\\ \vdots \\ \frac{\partial J}{\partial u_{n}}\end{bmatrix} } \] Normally we consider the gradient a column vector. Then a point \(u^{\ast }\in \Re ^{n}\) is a strong local minimum if \(\nabla J\left ( u^{\ast }\right ) =\mathbf{0}\) and \(\nabla ^{2}J\left ( u^{\ast }\right ) >0\).

Proof: Suppose \(u^{\ast }\) is a strong local minimum. Looking at a neighborhood of \(u^{\ast }\), let \(v\in \Re ^{n}\) be arbitrary and small, and expand \(J\left ( u^{\ast }+v\right ) \) in a Taylor series:\[ J\left ( u^{\ast }+v\right ) =J\left ( u^{\ast }\right ) +\nabla J\left ( u^{\ast }\right )^{T} v+\frac{1}{2}v^{T}\nabla ^{2}J\left ( u^{\ast }\right ) v+H.O.T.\left ( O\left ( \left \Vert v\right \Vert ^{3}\right ) \right ) \] Since \(u^{\ast }\) is a strong local minimum, \(\nabla J\left ( u^{\ast }\right ) =0\). Hence \(J\left ( u^{\ast }+v\right ) =J\left ( u^{\ast }\right ) +\frac{1}{2}v^{T}\nabla ^{2}J\left ( u^{\ast }\right ) v+H.O.T.\) Since \(J\left ( u^{\ast }+v\right ) >J\left ( u^{\ast }\right ) \) (strong local minimum) and the quadratic term dominates the \(H.O.T.\) for small \(v\), this implies \[ v^{T}\nabla ^{2}J\left ( u^{\ast }\right ) v>0 \] This completes the proof.

2.8 Lecture 8. Thursday, February 11, 2016

Reminder: test 1 is next Thursday, Feb. 18, 2016, covering up to and including HW 3. Today’s lecture is on gradient based optimization. We developed two conditions: \(u^{\ast }\) is a strong local minimum when \(\nabla J\left ( u^{\ast }\right ) =0\) and \(\nabla ^{2}J\left ( u^{\ast }\right ) >0\).

2.8.1 Gradient based optimization and line searches

We mention line searches: optimization of functions of one variable. There will be a reading assignment on line search. Some methods used are

1. Golden section.
2. Fibonacci.
3. Bisection.

And more.

2.8.2 Optimal gain control problems, Lyapunov equation

We will now set up application areas: optimal gain control and circuit analysis problems. In general we have\[ \dot{x}=Ax+Bu \] where \(\dot{x}\) is \(n\times 1\), \(A\) is \(n\times n\), \(B\) is \(n\times m\) and \(u\) is \(m\times 1\). The system has \(n\) states. \(u\) is the input; this can be voltage or current sources. \(u\) is the control and \(x\) is the state. We want to select \(u\) so that \(x\left ( t\right ) \) behaves optimally. The classical setup is state feedback \[ u=kx+v \] where \(k\) is the \(m\times n\) feedback gain matrix and \(v\) is an extra input which we will not use; it is there for extra flexibility if needed. We use optimization to determine \(k\): the entries \(k_{ij}\) are our optimization variables.

Figure 2.9: State feedback

Often, with \(u=0\), the system \(\dot{x}=Ax\) may be unstable or have overshoot. We will set up a performance objective aimed at reducing or eliminating the badness of the original response (with no feedback control). Let \[ J\left ( k\right ) =\int _{0}^{\infty }\overset{\left \Vert x\right \Vert ^{2}}{\overbrace{x^{T}\left ( t\right ) x\left ( t\right ) }}dt \] In the above, \(J\left ( k\right )\) is an implicit function of \(k\). This cost function was found to work well in practice. We now want to make \(J\left ( k\right ) \) explicit in \(k\). We can solve for \(x\left ( t\right ) \) from \begin{align*} \dot{x} & =\left ( A+Bk\right ) x\\ x\left ( t\right ) & =e^{\left ( A+Bk\right ) t}x\left ( 0\right ) \end{align*}

Then \(J\left ( k\right ) =\int _{0}^{\infty }x^{T}\left ( 0\right ) e^{\left ( A+Bk\right ) ^{T}t}e^{\left ( A+Bk\right ) t}x\left ( 0\right ) dt\). But this is not practical: it is not closed form and is hard to compute. So how can we come up with a closed form for \(J\left ( k\right ) \) which is easier to work with? Let us look at the closed loop. Let \(v=0\); we have\begin{align*} \dot{x} & =Ax+Bkx\\ & =\left ( A+Bk\right ) x\\ & =A_{c}x \end{align*}

where \(A_{c}\) is the closed loop system matrix. Let us find a matrix \(P\), if possible, such that \[ \frac{d}{dt}\left ( x^{T}\left ( t\right ) Px\left ( t\right ) \right ) =-x^{T}\left ( t\right ) x\left ( t\right ) \] So that now \begin{align*} J\left ( k\right ) & =\int _{a}^{b}x^{T}\left ( t\right ) x\left ( t\right ) dt\\ & =-\int _{a}^{b}d\left ( x^{T}\left ( t\right ) Px\left ( t\right ) \right ) \\ & =\int _{b}^{a}d\left ( x^{T}\left ( t\right ) Px\left ( t\right ) \right ) \\ & =x^{T}\left ( a\right ) Px\left ( a\right ) -x^{T}\left ( b\right ) Px\left ( b\right ) \end{align*}

Can we find \(P\)?\begin{align*} \frac{d}{dt}\left ( x^{T}Px\right ) & =x^{T}P\dot{x}+\dot{x}^{T}Px\overset{?}{=}-x^{T}x\\ & =x^{T}P\left ( A_{c}x\right ) +\left ( A_{c}x\right ) ^{T}Px\overset{?}{=}-x^{T}x\\ & =x^{T}PA_{c}x+x^{T}A_{c}^{T}Px\overset{?}{=}-x^{T}x \end{align*}

Since this must hold for all \(x\), we require \[ A_{c}^{T}P+PA_{c}=-I \] where \(I\) is the identity matrix. This is called the Lyapunov equation; it is the equation that determines \(P\). Without loss of generality, we insist on \(P\) being a symmetric matrix. Using this \(P\), we now write\begin{align*} J\left ( k\right ) & =\int _{0}^{\infty }x^{T}\left ( t\right ) x\left ( t\right ) dt\\ & =-\int _{0}^{\infty }d\left ( x^{T}Px\right ) \\ & =\left . x^{T}Px\right \vert _{\infty }^{0}\\ & =x^{T}\left ( 0\right ) Px\left ( 0\right ) -x^{T}\left ( \infty \right ) Px\left ( \infty \right ) \end{align*}

For a stable system, \(x\left ( \infty \right ) \rightarrow 0\) (remember we set \(v=0\), so there is no external input; if the system is stable it must end up in the zero state eventually). Therefore\[ J\left ( k\right ) =x^{T}\left ( 0\right ) Px\left ( 0\right ) \] with \(P\) satisfying \[ A_{c}^{T}\left ( k\right ) P+PA_{c}\left ( k\right ) =-I \] Example Let \(y^{\prime \prime }=u\). Hence \(x_{1}^{\prime }=x_{2},x_{2}^{\prime }=u\). Note that this is not stable with \(u=0\). Using linear state feedback,\begin{align*} u & =kx\\ & =\begin{bmatrix} k_{1} & k_{2}\end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix} \end{align*}

Hence \begin{align*} x^{\prime } & =Ax+Bu\\ & =Ax+Bkx\\ & =\left ( A+Bk\right ) x\\\begin{bmatrix} x_{1}^{\prime }\\ x_{2}^{\prime }\end{bmatrix} & =\left ( \overset{A}{\overbrace{\begin{bmatrix} 0 & 1\\ 0 & 0 \end{bmatrix} }}+\overset{B}{\overbrace{\begin{bmatrix} 0\\ 1 \end{bmatrix} }}\overset{k}{\overbrace{\begin{bmatrix} k_{1} & k_{2}\end{bmatrix} }}\right ) \begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix} \\ & =\left ( \begin{bmatrix} 0 & 1\\ 0 & 0 \end{bmatrix} +\begin{bmatrix} 0 & 0\\ k_{1} & k_{2}\end{bmatrix} \right ) \begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix} \\ & =\overset{A_{c}}{\overbrace{\begin{bmatrix} 0 & 1\\ k_{1} & k_{2}\end{bmatrix} }}\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix} \end{align*}

For stable closed loop, we need \(k_{1}<0,k_{2}<0\,\) by looking at characteristic polynomial roots. Now we solve the Lyapunov equation.\begin{align*} A_{c}^{T}P+PA_{c} & =-I\\\begin{bmatrix} 0 & 1\\ k_{1} & k_{2}\end{bmatrix} ^{T}\begin{bmatrix} p_{11} & p_{12}\\ p_{21} & p_{22}\end{bmatrix} +\begin{bmatrix} p_{11} & p_{12}\\ p_{21} & p_{22}\end{bmatrix}\begin{bmatrix} 0 & 1\\ k_{1} & k_{2}\end{bmatrix} & =\begin{bmatrix} -1 & 0\\ 0 & -1 \end{bmatrix} \\\begin{bmatrix} 0 & k_{1}\\ 1 & k_{2}\end{bmatrix}\begin{bmatrix} p_{11} & p_{12}\\ p_{21} & p_{22}\end{bmatrix} +\begin{bmatrix} p_{11} & p_{12}\\ p_{21} & p_{22}\end{bmatrix}\begin{bmatrix} 0 & 1\\ k_{1} & k_{2}\end{bmatrix} & =\begin{bmatrix} -1 & 0\\ 0 & -1 \end{bmatrix} \end{align*}

Solving for \(P\) gives\[ P=\begin{bmatrix} \frac{k_{2}^{2}-k_{1}+k_{1}^{2}}{2k_{1}k_{2}} & -\frac{1}{2k_{1}}\\ -\frac{1}{2k_{1}} & \frac{1-k_{1}}{2k_{1}k_{2}}\end{bmatrix} \] With \(x\left ( 0\right ) =\begin{bmatrix} 1\\ 1 \end{bmatrix} \), then\begin{align*} J\left ( k\right ) & =x\left ( 0\right ) ^{T}Px\left ( 0\right ) \\ & =\begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} \frac{k_{2}^{2}-k_{1}+k_{1}^{2}}{2k_{1}k_{2}} & -\frac{1}{2k_{1}}\\ -\frac{1}{2k_{1}} & \frac{1-k_{1}}{2k_{1}k_{2}}\end{bmatrix}\begin{bmatrix} 1\\ 1 \end{bmatrix} \\ & =\frac{k_{1}^{2}+k_{2}^{2}-2k_{1}-2k_{2}+1}{2k_{1}k_{2}} \end{align*}

Here is a Matlab script to generate the above.
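The script itself was not included in the notes; the following sketch (assuming the Symbolic Math Toolbox) reproduces the computation of \(P\) and \(J\left ( k\right ) \):

% Solve the Lyapunov equation Ac'*P + P*Ac = -I symbolically,
% then form J(k) = x0'*P*x0 for x0 = (1,1).
syms k1 k2 p11 p12 p22 real
Ac = [0 1; k1 k2];
P  = [p11 p12; p12 p22];                 % symmetric P
S  = solve(Ac.'*P + P*Ac == -eye(2), [p11 p12 p22]);
P  = subs(P, [p11 p12 p22], [S.p11 S.p12 S.p22]);
x0 = [1; 1];
J  = simplify(x0.'*P*x0)                 % (k1^2+k2^2-2*k1-2*k2+1)/(2*k1*k2)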

Let \(k_{1}=k_{2}=k<0\); we obtain\[ J\left ( k\right ) =\frac{2k^{2}-4k+1}{2k^{2}}\] As \(k\rightarrow -\infty \), \(J\left ( k\right ) \rightarrow 1\), approaching but never attaining its infimum \(J^{\ast }\). This means there is no \(k_{1}^{\ast },k_{2}^{\ast }\) such that \(\nabla J\left ( k^{\ast }\right ) =0\). The set of admissible \(k\) is not compact, and \(J\) is not coercive either. This is an ill posed problem. It can be remedied by changing the cost to \[ J\left ( k\right ) =\int _{0}^{\infty }x^{T}xdt+\lambda \int _{0}^{\infty }u^{T}udt \]

2.9 Lecture 9. Tuesday, February 16, 2016

2.9.1 Keywords for exam 1

1.
Common sense optimization. Farming problem. Explicit vs. Implicit.
2.
Minimizing \(J\left ( u\right ) \), \(\inf ,\sup \)
3.
We want to know ahead of time whether the minimum can be attained: \(J^{\ast }\) (the infimum) always exists, but \(u^{\ast }\) might not.
4.
Multilinear functions: \(u^{\ast }\) is at a vertex. But the number of vertices grows as \(2^{n}\), where \(n\) is the number of variables.
5.
When can we be sure \(u^{\ast }\) exists? If the set is compact: we talked about W-B theory, which shows \(u^{\ast }\) always exists on compact sets. If the set is not compact but \(J\) is coercive, then we can reduce the problem to a compact set.
6.
Convex sets and convex functions. For convex problems, a local minimum \(u^{\ast }\) is also a global minimum.
7.
A polytope is a special case of a convex set. A polyhedron is described by the same kind of linear constraints but can be unbounded (a bounded polyhedron is a polytope).
8.
A strong local minimum is where \(\nabla J\left ( u\right ) =0\) and \(\nabla ^{2}J\left ( u\right ) >0\). To test for convexity, find the Hessian: if the Hessian is positive semidefinite everywhere, then the function is convex.
9.
Optimal gain control, Lyapunov equation.
10.
If there are proofs, they will be simple, such as showing that the sum of two convex functions is also a convex function.

2.9.2 Gradient based optimization

Starting a new chapter: gradient based optimization. Many algorithms involve line searches. In other words, optimization in \(\Re ^{n}\) is often solved by performing many one-dimensional optimizations (line searches) in \(\Re \).

Optimization algorithm: Starting with \(\mathbf{u}^{k}\) and direction \(\mathbf{v}\) we study \(J\left ( \mathbf{u}^{k}+h\mathbf{v}\right )\) where \(h\) is the step size. This is called line search. \(h\in \left [ 0,h_{\max }\right ] \). We want to use optimal step size \(h^{\ast }\). Once found, then \[ \boxed{ \mathbf{u}^{k+1}=\mathbf{u}^{k}+h^{\ast }\mathbf{v} } \] Reader Read and learn about line search. Bisection, Golden section, Fibonacci and many more. We will not cover this in this course. We also want to minimize the number of function evaluations, since these can be expensive.

One way to do a line search is to sample \(h=0,\delta ,2\delta ,\ldots ,h_{\max }\), evaluate \(J\left ( h\right ) \) at each sample, and pick the minimizing \(h^{\ast }\). For stopping criteria, we can check for the following

1.
\(\left \Vert u^{k+1}-u^{k}\right \Vert \leq \delta \)
2.
\(\left \Vert J\left ( u^{k+1}\right ) -J\left ( u^{k}\right ) \right \Vert \leq \delta \)
3.
\(\left \Vert \frac{J\left ( u^{k+1}\right ) -J\left ( u^{k}\right ) }{J\left ( u^{k+1}\right ) }\right \Vert \leq \delta \)
4.
\(\left \Vert \nabla J\left ( u^{k}\right ) \right \Vert \leq \delta \)

We also need to pick a starting point \(\mathbf{u}^{0}\) for the search. What if we do not know where to start? We can pick multiple starting locations and take the best result obtained.

Reader Find \(\min _{\left \Vert \mathbf{v}\right \Vert =1}J\left ( \mathbf{u}+\mathbf{v}\right ) \). Show the optimal \(\mathbf{v}\) (to first order) is \[ \boxed{ v^{\ast } =\frac{-\nabla J\left ( u\right ) }{\left \Vert \nabla J\left ( u\right ) \right \Vert } } \] This is called the myopic local terrain view; it gets us to a local minimum. The steepest descent algorithm is the following:

1.
Select \(u^{0}\) (starting point)
2.
Find step size \(h\)
3.
Iterate. While \(\left \Vert \nabla J\left ( u^{k}\right ) \right \Vert >\delta \), set \(\boxed{u^{k+1}=u^{k}-h\frac{\nabla J\left ( u^{k}\right ) }{\left \Vert \nabla J\left ( u^{k}\right ) \right \Vert }}\)
4.
Update counter and go back to step 3 above.

See my class study notes for detailed algorithms of all the search methods we covered in this course.

Later we will study conjugate gradient methods.

Example: Let \(u^{0}=\left [ 1,1\right ] \) and \(h=0.1\). Let \(J\left ( u\right ) =u_{1}^{2}+3u_{2}^{2}-6u_{1}u_{2}+2u_{1}+u_{2}+4\). Then \[ \nabla J\left ( u\right ) =\begin{bmatrix} 2u_{1}-6u_{2}+2\\ 6u_{2}-6u_{1}+1 \end{bmatrix} \]

So \(\nabla J\left ( u^{0}\right ) =\begin{bmatrix} -2\\ 1 \end{bmatrix} \) and \(\left \Vert \nabla J\left ( u^{0}\right ) \right \Vert =\sqrt{5}\). Hence \begin{align*} u^{1} & =u^{0}-h\begin{bmatrix} -2\\ 1 \end{bmatrix} \frac{1}{\sqrt{5}}\\ & =\begin{bmatrix} 1\\ 1 \end{bmatrix} -0.1\begin{bmatrix} -2\\ 1 \end{bmatrix} \frac{1}{\sqrt{5}}\\ & =\begin{bmatrix} 1.0894\\ 0.95528 \end{bmatrix} \end{align*}

Reader Find \(u^{2}\).
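A sketch of the fixed-step iteration above, using the same \(J\) and gradient as in the example; running it reproduces \(u^{1}\) and then gives \(u^{2}\):

% Steepest descent with fixed step h and normalized gradient.
gradJ = @(u) [2*u(1) - 6*u(2) + 2;  6*u(2) - 6*u(1) + 1];
u = [1; 1];  h = 0.1;
for k = 1:2
    g = gradJ(u);
    u = u - h*g/norm(g);      % k=1 gives (1.0894, 0.95528)
end
u                             % u^2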

2.10 Lecture 10. Thursday, February 18, 2016 (Exam 1)

Exam 1

2.11 Lecture 11. Tuesday, February 23, 2016

Watch for HW4 going out today: implementation of steepest descent with optimal step size. See the handout circuit for the two-stage amplifier.

2.11.1 Steepest descent

As the number of stages increases, it becomes harder to analytically determine the optimal capacitance of each stage to produce maximum power. For two stages, by direct circuit analysis we obtain \[ J\left ( u\right ) =\left ( 11-u_{1}-u_{2}\right ) ^{2}+\left ( 1+u_{1}+10u_{2}-u_{1}u_{2}\right ) ^{2}\] where \(u_{i}\) is a capacitance. There are three stationary points: \(u^{\ast }=\left ( 10,1\right ) \), which is a saddle point of \(J\), and two local minima at \(\left ( 13,4\right ) \) and \(\left (7,-2\right ) \).

Geometric insight on what might go wrong with steepest descent: gradient algorithms work best from afar, but as they get close to the optimal point there are better algorithms, such as the generalized Newton-Raphson method, which works best near the optimal point. If the step size \(h\) is big, we approach the optimum quickly, but we can overshoot and end up oscillating around the optimal point. If \(h\) is too small, the search becomes very slow. Hence we use steepest descent with optimal step size, where the step size is calculated at each iteration. Ingredients of the steepest descent algorithm are:

1.
Initial guess \(u^{0}\)
2.
Maximum step size \(H\). Here \(h_{k}\) denotes the step size used at iteration \(k\).
3.
Iteration step. When at \(u^{k}\), define \(\boxed{\tilde{J}\left ( h\right ) =J\left ( u^{k}-h\frac{\nabla J\left ( u^{k}\right ) }{\left \Vert \nabla J\left ( u^{k}\right ) \right \Vert }\right )}\) and carry out a line search to find the optimal \(h\) which minimizes \(\tilde{J}\left ( h\right ) \); then set \(h_{k}=h^{\ast }\)
4.
\(u^{k+1}=u^{k}-h_{k}\frac{\nabla J\left ( u^{k}\right ) }{\left \Vert \nabla J\left ( u^{k}\right ) \right \Vert }\)
5.
Stopping criteria. Decide how to stop the search, e.g. \(\left \Vert \nabla J\left ( u^{k}\right ) \right \Vert \leq \varepsilon \)

Reader: Consider oscillation issue.

Convergence result, from Polak. Let \(J\left ( u\right ) \) be smooth and differentiable. Let \(u^{\ast }\) be a strong local minimum. Assume constants \(0<m\leq M\) s.t.

\[ m\,u^{T}u\leq u^{T}\nabla ^{2}J\left ( u\right ) u\leq M\,u^{T}u \]

in a neighborhood of \(u^{\ast }\). This criterion says that there is a good convexity and a bad convexity: the curvature of \(J\) is squeezed between \(m\) and \(M\). In the neighborhood of \(u^{\ast }\) let \(\theta =1-\frac{m}{M}\). Interpretation:

pict
Figure 2.10:Steepest descent diagram

Define \(E=J\left ( u^{0}\right ) -J\left ( u^{\ast }\right ) \) then Polak says

\[ 0\leq J\left ( u^{k}\right ) -J\left ( u^{\ast }\right ) \leq E\theta ^{k}\]

The best case is when \(E\) is small and \(\theta \) is small. This is a local result.

2.11.2 Classifications of Convergence

Convergence can be

1.
Linear
2.
Quadratic
3.
Superlinear

These are the three convergence types we will cover. The second algorithm, the generalized Newton-Raphson method, has quadratic convergence. We will start on it now but cover it fully next lecture.

The idea is to approximate \(J\left ( u\right ) \) as a quadratic at each step. Assuming \(J\left ( u\right ) \) is locally quadratic, we approximate it using Taylor series, dropping all terms after the Hessian. Then we find where the minimum of this quadratic is and use that step to obtain \(u^{k+1}\). More on this next lecture.

2.12 Lecture 12. Thursday, February 25, 2016

2.12.1 Quadratic optimization, superlinear convergence

We will start the class with a reader problem. Consider \begin{align} J\left ( u\right ) & =\frac{1}{2}u^{T}Au+b^{T}u+c\tag{1}\\ \nabla J\left ( u\right ) & =Au+b \tag{2} \end{align}

with \(A\) being a symmetric positive definite (PSD) matrix. This is the classic quadratic objective function. You can take a complete course on quadratic optimization. The global optimum is at the solution of \(\nabla J(u) =0\). Hence we write\begin{align*} \nabla J(u) & = 0\\ Au^{\ast } +b & = 0\\ u^{\ast } & = -A^{-1} b \tag{3} \end{align*}

Note that since \(A\) is PSD (the Hessian is \(A\)), we know that \(J\left ( u\right ) \) is convex, hence the local minimum is also a global minimum. Now imagine we are doing steepest descent on this function at iterate \(u^{k}\), with an optimal step size which we can make as large as we want. Hence we need to optimize \[ \tilde{J}\left ( h\right ) =J\left ( u^{k}-h\nabla J\left ( u^{k}\right ) \right ) \] over \(h\). Notice we did not divide by \(\left \Vert \nabla J\left ( u^{k}\right ) \right \Vert \) here, since the step size is free to be as large as needed. Expanding the above using (2) gives\begin{align*} \tilde{J}\left ( h\right ) & =J\left ( u^{k}-h\left ( Au^{k}+b\right ) \right ) \\ & =J\left ( \left ( I-hA\right ) u^{k}-hb\right ) \end{align*}

Using (1) for the RHS of the above gives\[ \tilde{J}\left ( h\right ) =\frac{1}{2}\left ( \left ( I-hA\right ) u^{k}-hb\right ) ^{T}A\left ( \left ( I-hA\right ) u^{k}-hb\right ) +b^{T}\left ( \left ( I-hA\right ) u^{k}-hb\right ) +c \] The above is quadratic in \(h\), and we solve for the optimal \(h\). Simplifying gives\[ \tilde{J}\left ( h\right ) =\frac{1}{2}\left ( Au^{k}+b\right ) ^{T}A\left ( Au^{k}+b\right ) h^{2}-\left ( Au^{k}+b\right ) ^{T}\left ( Au^{k}+b\right ) h+\text{ constant terms}\] To find the optimal \(h\), set\begin{align*} \frac{d\tilde{J}\left ( h\right ) }{dh} & =0\\ \left ( Au^{k}+b\right ) ^{T}A\left ( Au^{k}+b\right ) h-\left ( Au^{k}+b\right ) ^{T}\left ( Au^{k}+b\right ) & =0\\ h^{\ast } & =\frac{\left ( Au^{k}+b\right ) ^{T}\left ( Au^{k}+b\right ) }{\left ( Au^{k}+b\right ) ^{T}A\left ( Au^{k}+b\right ) } \end{align*}

In practice, we would also need to check \(\frac{d^{2}\tilde{J}\left ( h\right ) }{dh^{2}}>0\) to make sure \(h^{\ast }\) is a minimizer; here it is, since \(A\) is positive definite.
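As a small illustration, here is a sketch of the resulting iteration on hypothetical test data \(A,b\):

% Steepest descent with the optimal step h* derived above.
A = [2 0; 0 10];  b = [-2; -10];
u = [0; 0];
for k = 1:10
    g = A*u + b;                  % gradient at u
    h = (g'*g)/(g'*A*g);          % optimal step size h*
    u = u - h*g;
end
u                                 % approaches u* = -A\b = (1, 1)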

Reader Why does it take multiple iterations to get the common sense answer \(u^{\ast }=-A^{-1}b\)?

For a quadratic objective function \(J\left ( u\right ) \) we can obtain \(u^{\ast }\) in one step, using \(u^{\ast }=-A^{-1}b\). This is the idea behind the generalized Newton-Raphson method: \(J\left ( u^{k}\right ) \) is approximated as a quadratic at each step. To elaborate, expanding by Taylor\[ J\left ( u^{k}+\Delta u\right ) =J\left ( u^{k}\right ) +\nabla J\left ( u^{k}\right ) ^{T}\Delta u+\frac{1}{2}\Delta u^{T}\nabla ^{2}J\left ( u^{k}\right ) \Delta u+HOT \] We approximate as a quadratic by dropping the higher order terms and optimize for \(\Delta u\) (this plays the role of \(h\) used earlier, and \(\nabla ^{2}J\left ( u^{k}\right ) \) plays the role of \(A\)). Therefore we find \[ \Delta u^{\ast }=-\left [ \nabla ^{2}J\left ( u^{k}\right ) \right ] ^{-1}\nabla J\left ( u^{k}\right ) \] This converges in one step if \(J\left ( u\right ) \) is actually a quadratic function. Notice that Newton's method is expensive if used repeatedly, as it requires finding the Hessian and its inverse at each step. The algorithm is: initialize \(u^{0}\), then iterate\begin{equation} u^{k+1}=u^{k}-\left [ \nabla ^{2}J\left ( u^{k}\right ) \right ] ^{-1}\nabla J\left ( u^{k}\right ) \tag{4} \end{equation} then check for convergence. If the Hessian fails to be positive definite, \(J\left ( u^{k+1}\right ) \) can end up increasing rather than decreasing. How to stop? We can try more iterations to see if \(J\left ( u^{k}\right ) \) will decrease again.

Example Let \[ J\left ( u\right ) =\left ( 11-u_{1}-u_{2}\right ) ^{2}+\left ( 1+10u_{2}+u_{1}-u_{1}u_{2}\right ) ^{2}\] Pick \(u^{0}=\left ( 18,3\right ) \) then\begin{align*} \nabla J\left ( u^{0}\right ) & =\begin{bmatrix} 40\\ 100 \end{bmatrix} \\ \nabla ^{2}J\left ( u^{0}\right ) & =\begin{bmatrix} 10 & 44\\ 44 & 130 \end{bmatrix} \end{align*}

We see here that \(\nabla ^{2}J\left ( u^{0}\right ) \) is not positive definite (its determinant is negative). Now we apply the iterate equation (4), obtaining\begin{align*} u^{1} & =u^{0}-\left [ \nabla ^{2}J\left ( u^{0}\right ) \right ] ^{-1}\nabla J\left ( u^{0}\right ) \\ & =\begin{bmatrix} 18\\ 3 \end{bmatrix} -\begin{bmatrix} 10 & 44\\ 44 & 130 \end{bmatrix}^{-1}\begin{bmatrix} 40\\ 100 \end{bmatrix} \\ & \approx \begin{bmatrix} 19.3\\ 1.8 \end{bmatrix} \end{align*}

If we look at the contour plot, we will see this point made \(J\left ( u\right ) \) larger (away from optimal) but if we let it iterate more, it will turn and move back to the optimal point.

pict
Figure 2.11:Hessian and optimal solution diagram
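A sketch of iterate (4) on this example; the gradient and Hessian follow from direct differentiation of \(J\):

% Generalized Newton-Raphson on the two-stage amplifier example.
f1 = @(u) 11 - u(1) - u(2);                      % first residual
f2 = @(u) 1 + 10*u(2) + u(1) - u(1)*u(2);        % second residual
gradJ = @(u) [-2*f1(u) + 2*f2(u)*(1-u(2)); -2*f1(u) + 2*f2(u)*(10-u(1))];
hessJ = @(u) [2 + 2*(1-u(2))^2,  2 + 2*((1-u(2))*(10-u(1)) - f2(u));
              2 + 2*((1-u(2))*(10-u(1)) - f2(u)),  2 + 2*(10-u(1))^2];
u = [18; 3];
for k = 1:5
    u = u - hessJ(u)\gradJ(u);    % k=1 gives approximately (19.26, 1.81)
end

As described above, the first step moves away from the optimum; further iterations turn back toward a stationary point.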

Now we will start on the third algorithm. Conjugate gradient (CG) algorithms are a family of algorithms with the property of superlinear convergence. They also have quadratic convergence.

2.12.2 Quadratic convergence

Quadratic convergence says the following: if \(J\left ( u\right ) \) is a positive definite quadratic form, the iterates \(u^{k}\rightarrow u^{\ast }\) converge in a finite number of steps \(N\). This means if we give the algorithm a quadratic form function, it will converge to the optimum in \(N\) steps. The difference from Newton-Raphson is that no Hessian needs to be calculated. The conjugate direction algorithms work well both far away from and close to the optimal point (note: steepest descent worked well from afar, but not when getting close to the optimal point \(u^{\ast }\)). The CG algorithms also have the property of superlinear convergence. This property does not apply to steepest descent.

2.12.3 Superlinear convergence

What is superlinear convergence? A sequence \(\left \{ u^{k}\right \} \) in \(\Re ^{n}\) is said to converge super-linearly to \(u^{\ast }\) if the following holds: given any \(\theta \in (0,1]\), \(\frac{\left \Vert u^{k}-u^{\ast }\right \Vert }{\theta ^{k}}\rightarrow 0\). The important part of this definition is that the ratio must go to zero for every \(\theta \in (0,1]\), not just some \(\theta \). The examples below illustrate this. Let \(\left \{ u^{k}\right \} =\frac{1}{k}\), i.e. the sequence \(\left \{ 1,\frac{1}{2},\frac{1}{3},\cdots \right \} \). Clearly this sequence goes to \(u^{\ast }=0\). Does it converge super-linearly to \(u^{\ast }\)? Applying the definition,\[ \frac{\left \Vert \frac{1}{k}-0\right \Vert }{\theta ^{k}}=\frac{1}{k\theta ^{k}}\] If we can find one \(\theta \) for which this does not converge to zero, then we are done. Trying \(\theta =\frac{3}{4}\), the above becomes \(\frac{4^{k}}{k3^{k}}\), which does not go to zero as \(k\rightarrow \infty \). Hence this is not superlinear. The same holds for \(\left \{ u^{k}\right \} =\frac{1}{k^{2}}\) and, more generally, \(\left \{ u^{k}\right \} =\frac{1}{k^{m}}\). What about \(\left \{ u^{k}\right \} =\frac{1}{e^{k}}\)? Here we get \[ \frac{\left \Vert e^{-k}-0\right \Vert }{\theta ^{k}}=\frac{e^{-k}}{\theta ^{k}}\] Trying \(\theta =\frac{1}{2}\) gives \(\frac{2^{k}}{e^{k}}\), which does go to zero, and \(\theta =\frac{2}{3}\) also converges. But trying \(\theta =\frac{1}{4}\) gives \(\frac{4^{k}}{e^{k}}\), which blows up since \(4>e\). It has to converge for all \(\theta \), so \(\frac{1}{e^{k}}\) is not superlinear. How about \(\left \{ u^{k}\right \} =\frac{1}{e^{k^{2}}}\)? Here we find it is superlinear: the ratio \(\frac{e^{-k^{2}}}{\theta ^{k}}\) goes to zero for every \(\theta \). To show this, take the \(\log \): \(\ln \left ( e^{-k^{2}}/\theta ^{k}\right ) =-k^{2}-k\ln \theta \rightarrow -\infty \), since the \(-k^{2}\) term dominates. (Reader)
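A quick numeric illustration of the definition (a sketch):

% For u_k = exp(-k^2) the ratio tends to zero for any theta in (0,1);
% for u_k = exp(-k) it blows up once theta < 1/e (e.g. theta = 1/4).
k = (1:10)';
theta = 0.25;
[exp(-k.^2)./theta.^k,  exp(-k)./theta.^k]   % first column -> 0, second grows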

Next time we will go over the conjugate direction algorithm in more detail.

2.13 Lecture 13. Tuesday, March 1, 2016

2.13.1 Conjugate direction algorithms

Today's lecture will be devoted to conjugate direction (C.D.) algorithms. We start by remembering that there are many C.D. algorithms. From last lecture, be aware of superlinear convergence and quadratic convergence. The quadratic convergence concept is that on a P.S.D. (positive symmetric definite) quadratic form, the algorithm will converge in \(n\) steps or less (where \(n\) is the size of \(A\)), using exact arithmetic (not accounting for floating point errors). We will prove this today. We will do some preliminaries, then go over the ingredients, examples, and properties of conjugate gradient.

Preliminaries: Let \(A^{n\times n}\) be positive definite symmetric. Then the pair of vectors \(u,v\) are said to be mutually conjugate w.r.t. \(A\) if \[ u^{T}Av=0 \] This is a generalization of orthogonality: taking \(A=I_{n}\) recovers \(u^{T}v=0\).

Reader A set of distinct mutually conjugate vectors always exists for \(A\): the eigenvectors of \(A\). The proof starts with writing \(Av=\lambda _{1}v\) and \(Au=\lambda _{2}u\), then applying \(u^{T}Av=0\).

We will use such a set of vectors as search directions, generating them on the fly during the search and doing a line search along each. So instead of doing the line search along \(\nabla J\left ( u\right ) \) as in steepest descent, we now use the conjugate vectors.

Properties: Suppose \(v^{0},v^{1},\cdots ,v^{n-1}\) is a set of mutually conjugate vectors w.r.t \(A\). (\(A\) is PSD). First step is to show these vectors are linearly independent.

Lemma: the \(v^{i}\) are linearly independent. Proof: Suppose \begin{equation}{\displaystyle \sum \limits _{i=0}^{n-1}} \alpha _{i}v^{i}=0 \tag{1} \end{equation} for scalars \(\alpha _{i}\). We must show that all \(\alpha _{i}=0\). Let us consider \(\alpha _{k}\): if we can show that \(\alpha _{k}=0\) for any \(k\), then we are done. Multiply (1) by \(\left ( v^{k}\right ) ^{T}A\). Then\begin{align*} \left ( v^{k}\right ) ^{T}A{\displaystyle \sum \limits _{i=0}^{n-1}} \alpha _{i}v^{i} & =0\\{\displaystyle \sum \limits _{i=0}^{n-1}} \alpha _{i}\left ( v^{k}\right ) ^{T}Av^{i} & =0 \end{align*}

By the mutual conjugacy property, all terms above vanish except \(\alpha _{k}\left ( v^{k}\right ) ^{T}Av^{k}\). Hence \[ \alpha _{k}\left ( v^{k}\right ) ^{T}Av^{k}=0 \] But \(A\) is positive definite and \(v^{k}\neq 0\), therefore \(\alpha _{k}=0\) is the only choice. QED. We have proved that \(v^{0},v^{1},\cdots ,v^{n-1}\) are linearly independent, so we can expand any vector \(u\in \Re ^{n}\) using these as basis vectors\begin{equation} u={\displaystyle \sum \limits _{i=0}^{n-1}} a_{i}v^{i} \tag{2} \end{equation} Let us find the coefficients \(a_{i}\). Premultiply both sides by \(\left ( v^{k}\right ) ^{T}A\):\[ \left ( v^{k}\right ) ^{T}Au={\displaystyle \sum \limits _{i=0}^{n-1}} a_{i}\left ( v^{k}\right ) ^{T}Av^{i}\] As before, by mutual conjugacy the RHS reduces to \(a_{k}\left ( v^{k}\right ) ^{T}Av^{k}\). Solving for \(a_{k}\) gives\[ a_{k}=\frac{\left ( v^{k}\right ) ^{T}Au}{\left ( v^{k}\right ) ^{T}Av^{k}}\] Hence (2) becomes\[ u={\displaystyle \sum \limits _{i=0}^{n-1}} \frac{\left ( v^{i}\right ) ^{T}Au}{\left ( v^{i}\right ) ^{T}Av^{i}}v^{i}\] This expresses any vector \(u\) in terms of the linearly independent set \(v^{i}\)

Conjugate directions algorithm ingredients are:

1.
Initially given \(u^{0}\). The starting guess vector
2.
Iterative step at \(u^{k}\): generate \(v^{k}\), a vector mutually conjugate to the previous vectors \(v^{0},\cdots ,v^{k-1}\). For \(v^{0}\) use \(-\nabla J\left ( u^{0}\right ) \), same as steepest descent.
3.
Perform a line search with max step \(H\) to minimize \(\tilde{J}\left ( h\right ) =J\left ( u^{k}+hv^{k}\right ) \). Notice we used a \(+\) sign, not \(-\) as with steepest descent; the direction takes care of the sign in this case.
4.
Stopping criteria.

Example: Fletcher-Reeves. \begin{align*} v^{0} & =-\nabla J\left ( u^{0}\right ) \\ v^{k+1} & =-\nabla J\left ( u^{k+1}\right ) +\frac{\left \Vert \nabla J\left ( u^{k+1}\right ) \right \Vert ^{2}}{\left \Vert \nabla J\left ( u^{k}\right ) \right \Vert ^{2}}v^{k} \end{align*}

Reader Normalize above for implementation.

Reader What is \(A\) above? With respect to what matrix are these \(v^{k}\) vectors mutually conjugate? \(A\) is the Hessian. Note: these C.D. algorithms converge for convex \(J\left ( u\right ) \). If \(J\left ( u\right ) \) is not convex, or we do not know, we need to add conditions to make sure the iteration is converging.

For Polak-Ribière, see homework.
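A minimal sketch of Fletcher-Reeves on a quadratic test function (hypothetical \(A,b\)); for a quadratic, the exact line search along \(v\) has the closed form \(h=-\frac{v^{T}\nabla J\left ( u\right ) }{v^{T}Av}\), which is equation (3) in the proof below:

% Fletcher-Reeves conjugate gradient on J(u) = 0.5*u'*A*u + b'*u.
A = [3 1; 1 2];  b = [-1; -2];
u = [0; 0];
g = A*u + b;                  % gradient at u^0
v = -g;                       % v^0 = -grad J(u^0)
for k = 1:2                   % n = 2 steps suffice for a 2x2 quadratic
    h = -(v'*g)/(v'*A*v);     % exact minimizing step along v
    u = u + h*v;
    gn = A*u + b;
    v = -gn + (norm(gn)^2/norm(g)^2)*v;   % Fletcher-Reeves update
    g = gn;
end
[u, -A\b]                     % the two columns agree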

2.13.2 Quadratic convergence theorem

Consider the quadratic form \[ J\left ( u\right ) =\frac{1}{2}u^{T}Au+b^{T}u+c \] with \(A=A^{T}\) a positive definite \(n\times n~\)matrix. Let \(v^{0},\cdots ,v^{n-1}\) be mutually conjugate w.r.t. \(A\), and let the step size be as large as we want. Then the conjugate direction algorithm converges to the optimum \(u^{\ast }=-A^{-1}b\) in \(n\) steps or less

Proof

Let \(u^{k}\) be the \(k^{th}\) iterate. If \(u^{k}=u^{\ast }\) and \(k\leq n\) then we are done. Without loss of generality, assume \(u^{k}\neq u^{\ast }\). We must show that \(u^{n}=u^{\ast }\). We first find \(h_{k}\), the step size at iterate \(k\). From \begin{align*} \tilde{J}\left ( h\right ) & =J\left ( u^{k}+hv^{k}\right ) \\ & =\frac{1}{2}\left ( u^{k}+hv^{k}\right ) ^{T}A\left ( u^{k}+hv^{k}\right ) +b^{T}\left ( u^{k}+hv^{k}\right ) +c \end{align*}

This is quadratic in \(h\): \[ \tilde{J}(h) = \frac{1}{2} (v^{k})^{T} A v^{k}\, h^{2}+ \left ( (v^{k})^{T} A u^{k} + b^{T} v^{k} \right ) h + \overbrace{ \frac{1}{2} (u^{k})^{T} A u^{k} + b^{T}u^{k} + c}^{\text{constant term}} \] Now taking the derivative gives\[ \frac{d\tilde{J}\left ( h\right ) }{dh}=\left ( v^{k}\right ) ^{T}Av^{k}h+\left ( \left ( v^{k}\right ) ^{T}Au^{k}+b^{T}v^{k}\right ) \] Setting this to zero and solving for \(h\) gives\begin{align} h^{\ast } & =-\frac{\left ( v^{k}\right ) ^{T}Au^{k}+b^{T}v^{k}}{\left ( v^{k}\right ) ^{T}Av^{k}}\nonumber \\ & =-\frac{\left ( v^{k}\right ) ^{T}\left ( Au^{k}+b\right ) }{\left ( v^{k}\right ) ^{T}Av^{k}} \tag{3} \end{align}

Hence\begin{align*} u^{n} & =u^{0}+h_{0}v^{0}+\cdots +h_{n-1}v^{n-1}\\ & =u^{0}+{\displaystyle \sum \limits _{k=0}^{n-1}} h_{k}v^{k} \end{align*}

Using (3) in the RHS of above, replacing each \(h_{k}\) with the optimal \(h\) at each iterate gives\begin{align*} u^{n} & =u^{0}-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}\left ( Au^{k}+b\right ) }{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}\\ & =u^{0}-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}b}{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}Au^{k}}{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k} \end{align*}

Replacing \(u^{k}\) in the second term above in the RHS with \(u^{0}+{\displaystyle \sum \limits _{i=0}^{k-1}} h_{i}v^{i}\) gives\begin{equation} u^{n}=u^{0}-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}b}{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}-{\displaystyle \sum \limits _{k=0}^{n-1}} \frac{\left ( v^{k}\right ) ^{T}A\left ( u^{0}+{\displaystyle \sum \limits _{i=0}^{k-1}} h_{i}v^{i}\right ) }{\left ( v^{k}\right ) ^{T}Av^{k}}v^{k} \tag{4} \end{equation} But \begin{align*} \left ( v^{k}\right ) ^{T}A\left ( u^{0}+{\displaystyle \sum \limits _{i=0}^{k-1}} h_{i}v^{i}\right ) & =\left ( v^{k}\right ) ^{T}Au^{0}+{\displaystyle \sum \limits _{i=0}^{k-1}} h_{i}\left ( v^{k}\right ) ^{T}Av^{i}\\ & =\left ( v^{k}\right ) ^{T}Au^{0} \end{align*}

since all terms in \({\displaystyle \sum \limits _{i=0}^{k-1}} h_{i}\left ( v^{k}\right ) ^{T}Av^{i}\) vanish by the mutual conjugacy property. Using this to simplify (4) gives\begin{align*} u^{n} & =u^{0}-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}b}{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}Au^{0}}{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}\\ & =u^{0}-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}\left ( Au^{0}+b\right ) }{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k} \end{align*}

Reader By the expansion of \(u^{0}\) in the basis \(\left \{ v^{i}\right \} \) derived earlier, \({\displaystyle \sum \limits _{k=0}^{n-1}}\frac{\left ( v^{k}\right ) ^{T}Au^{0}}{\left ( v^{k}\right ) ^{T}Av^{k}}v^{k}=u^{0}\), which cancels the leading \(u^{0}\). Using this in the above reduces it to \[ u^{n}=-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}b}{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}\] Inserting \(AA^{-1}\) into the above gives\begin{align*} u^{n} & =-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}AA^{-1}b}{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}\\ & =-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( \left ( v^{k}\right ) ^{T}A\right ) \left ( A^{-1}b\right ) }{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}\\ & =-A^{-1}b \end{align*}

But \(-A^{-1}b=u^{\ast }\). QED.

Following are some extra remarks added later from An Introduction to Optimization by Chong and Żak, 1996:

1.
Conjugate direction algorithms solve quadratics of \(n\) variables in \(n\) steps
2.
Algorithm does not require calculation of Hessian.
3.
Algorithm requires no matrix inverse and no storage for the \(A^{n\times n}\) matrix.
4.
C.D. Algorithms perform better than steepest descent but not as well as Newton-Raphson (when close to optimal).
5.
For quadratic \(J\left ( u\right ) =\frac{1}{2}u^{T}Au-u^{T}b\), the best direction at each step is the mutually conjugate direction w.r.t. \(A\).

See Example 10.1, page 133 of the above text for an illustration of how to determine the \(v^{i}\) vectors for a given \(A\) matrix.

2.14 Lecture 14. Thursday, March 3, 2016

2.14.1 Constraints and linear programming

We are about to enter a new phase of the course: constraints and linear programming. Until now we used iterative, gradient based methods to solve unconstrained problems: steepest descent, conjugate directions, and we also looked at Newton-Raphson. These methods apply mainly to problems without constraints, i.e. the variables \(u\) are free. But in HW4 we had a problem where \(u\) was a capacitance, which cannot be negative, and we did not account for this. When we have constraints and want to use the above iterative methods, there are ad hoc methods to handle them; we will not cover these in this course, but will mention some.

One idea is to check that no constraint is violated during the search and restart the search when one is. There is literature on what are called “projection methods” and other names.

pict
Figure 2.12:Search path near constraint

One good method is called the “penalty function method”. This works as follows

\[ \tilde{J}\left ( u\right ) =J\left ( u\right ) +J_{penalty}\left ( u\right ) \]

The original objective function is \(J\left ( u\right ) \), and \(J_{penalty}\left ( u\right ) \) is a function we add that becomes very large when the constraint is violated (\(u\notin U\)), assuming we are minimizing \(J(u)\).

This method works on many problems. For example, if we do not want \(u_{1}\) to be negative, we can add\[ J_{penalty}\left ( u\right ) =-100\min \left ( 0,u_{1}\right ) \] This way, when \(u_{1} < 0\), the penalty is large and positive, hence \(\tilde{J}\left ( u\right ) \) becomes very large and the line search will avoid this region.
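A sketch of the idea on a hypothetical objective, using Matlab's derivative-free fminsearch; the unconstrained minimum \(\left ( -2,3\right ) \) violates \(u_{1}\geq 0\), and the penalty pushes the answer back to approximately \(\left ( 0,3\right ) \):

% Penalty function method: J is the objective, Jpen adds the penalty.
J    = @(u) (u(1)+2)^2 + (u(2)-3)^2;     % hypothetical objective
Jpen = @(u) J(u) - 100*min(0, u(1));     % large and positive when u1 < 0
u    = fminsearch(Jpen, [1; 1])          % approximately (0, 3)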

But a theory under the name of Kuhn-Tucker is the main way to handle such problems, under the heading of “non-linear programming”.

Now we go back to linear programming, which we will cover over the next 4–5 lectures. Key points of linear programming are

1.
Objective function is linear in the \(u_{i}\)
2.
Linear inequality constraints on the \(u_{i}\)

2.14.2 History of linear programming

1.
Dantzig introduced the simplex algorithm. Matlab's linprog can solve L.P. problems.
2.
The simplex algorithm has some problems related to what are called “Klee-type pathologies”: there exist bad inputs which cause it to become very slow. These days L.P. problems have millions of variables, and we want to solve many large scale L.P.s. L.P. visits vertices of polytopes. We can handle a billion and more vertices these days on a PC with no problem; \(10^{30}\) vertices is a small L.P. problem these days.

L.P. is fast because it does not visit every vertex. But some inputs can force it to visit all vertices, making it slow.

3.
Khachiyan, 1970s. Front page of the NY Times. Introduced the ellipsoidal algorithm to solve L.P. (faster than simplex on the worst case problems). But it turns out that on real world problems simplex was still faster, unless the problem had Klee-type pathologies.
4.
Smale. Gave a probabilistic explanation of the “magic of the simplex algorithm”, considering average-case behavior. (In computer science, computational complexity is defined in terms of the worst case.)
5.
Karmarkar, from Bell Labs. Came up with a new approach: developed a (projective) scaling method for L.P.

2.14.3 Polytopes

Polytopes are central to L.P since polytopes are described by constraints. See handout “Polytopes” taken from textbook Barmish, Robust Control.

A polytope is the convex hull of a finite point set; these points are the generators \(v^{1},\cdots ,v^{N}\). Polytopes have extreme points (vertices), and L.P. visits vertices. If the set is bounded we call it a polytope, else we call it a polyhedron. So with linear inequality constraints and boundedness, we have polytopes.

Reader A linear function \(J\left ( u\right ) =a^{T}u\) on a polytope \(P\) achieves its max and min at extreme points. We showed that the max of a convex function is at a vertex; we can also show that the min of a concave function is at a vertex. Linear functions are both convex and concave. QED.

pict
Figure 2.13:Increasing \(J(u)\) diagram

Often we do not have the list of vertices and need to generate them first; they are generated from the constraint inequalities.

How many vertices might we have to search? McMullen's Upper Bound Theorem gives us the answer. Assuming we have \(m\) constraints and \(n\) variables, with \(m\geq n\), then\[ V\left ( m,n\right ) =\begin{pmatrix} m-\left \lfloor \frac{1}{2}(n+1)\right \rfloor \\ m-n \end{pmatrix} +\begin{pmatrix} m-\left \lfloor \frac{1}{2}(n+2)\right \rfloor \\ m-n \end{pmatrix} \] For example, for \(m=20,n=10\) we get \(4004\) vertices, so this is a small problem.
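A quick numeric check of the bound (a sketch):

% McMullen upper bound V(m,n) as given above.
V = @(m,n) nchoosek(m - floor((n+1)/2), m-n) + ...
           nchoosek(m - floor((n+2)/2), m-n);
V(20, 10)     % returns 4004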

Reader Consider \(n=500,m=2000\) which is very modest in terms of current technology, assuming it takes 1 microsecond per vertex, then it will take order of \(10^{296}\) years to search all the vertices.

pict
Figure 2.14:Verification of number of vertices using Maple

Until the last few years, L.P. was considered a completed research area, but in the last 5–10 years new L.P. research has started. Modern L.P. solvers take the linear inequality description as input, formulated as \(Ax=b\). What if vertices, or some vertex-generating mechanism, were given as input instead of the constraints themselves? Converting the vertices to constraints is a difficult problem.

2.15 Lecture 15. Tuesday, March 8, 2016

2.15.1 Mechanism of linear programming

Today we will begin the mechanics of doing L.P. (linear programming). We know the optimum occurs at an extreme point, but searching all extreme points is not practical for large problems, as shown before. One can take a whole course just on L.P.; here we will cover the main ideas. We have a linear objective function in \(u\) and linear constraints in \(u\), where \(u\) are the variables. This is called the raw L.P. formulation. It is converted to standard form L.P. and solved using the simplex method.

pict
Figure 2.15:Simplex solver diagram

The solver finds a first vertex, then in a clever way moves from vertex to vertex until it finds the optimal one. The solver really solves two problems:

1.
First, find a basic feasible solution (phase one).
2.
Then move from one vertex to another, lowering the cost (phase two).

Standard form LP is\[ \boxed{ \begin{array} [c]{cc}\min & c^{T}x\\ st & Ax=b \end{array} } \] Notice that the solution might be an infimum that is not attained. Example:\[\begin{array} [c]{cc}\min & x_{1}\\ st & \begin{array} [c]{c}x_{2}\leq 5\\ x_{1}<0 \end{array} \end{array} \] The solution is \(x_{1}=-\infty \): the feasible set is unbounded below.

Ingredients of Linear programming are:

1.
\(n>m\) (the number of variables is larger than the number of constraints). The matrix \(A\) is of order \(m\) by \(n\), so \(A\) is a fat matrix, not thin.
2.
No column of \(A\) can be all zeros (non-degenerate).
3.
Rank of \(A\) is \(m\)
4.
\(b\geq 0\)
5.
\(x\geq 0\)

What if we have some variable \(x_{i}\) which we want to allow to be negative? We introduce a new variable \(x_{j}=-x_{i}\), so \(x_{j}\geq 0\), and everywhere \(x_{i}\) appears we replace it with \(-x_{j}\). Now \(x_{i}\) is gone, replaced by \(-x_{j}\) with \(x_{j}\geq 0\), so the problem is in standard form.

At the end, when we obtain the solution, we map back via \(x_{i}=-x_{j}\). (What about free variables? A free variable can be written as the difference of two nonnegative variables.)

What if we have an inequality in the raw L.P.? How do we convert it to an equality for standard form? We use slack variables and surplus variables.

Example: given \(x_{1}+2x_{2}-x_{3}\leq 6\), introduce a new slack variable \(x_{4}\geq 0\) and rewrite the constraint as \(x_{1}+2x_{2}-x_{3}+x_{4}=6\).

If we have the constraint \(x_{1}+2x_{2}-x_{3}\geq 6\), then we need a surplus variable \(x_{4}\): rewrite as \(x_{1}+2x_{2}-x_{3}-x_{4}=6\). Once we solve the LP problem and obtain \(x^{\ast }\), we recover from this solution the actual variables of the raw LP (the \(u\) variables); these do not contain any slack or surplus variables.

2.15.2 Example, the sector patrol problem

See also handout sector patrol. The objective function is \(E\left ( T\right ) =\frac{\frac{u_{1}}{3}}{10}+\frac{\frac{u_{2}}{3}}{5}=\frac{u_{1}}{30}+\frac{u_{2}}{15}\). Constraints are \(u_{1}\geq 0,u_{2}\geq 0\) and\begin{align*} 2u_{1}+2u_{2} & \geq 4\\ 2u_{1}+2u_{2} & \leq 10 \end{align*}

And as per handout, we need to add this constraint in order to obtain a realistic solution\[ u_{2}\geq 1.5u_{1}\] The above is the raw LP. Convert to standard form, using \(x\) as variables. It becomes\begin{align*} 2x_{1}+2x_{2}-x_{4} & =4\\ 2x_{1}+2x_{2}+x_{3} & =10\\ -1.5x_{1}+x_{2}-x_{5} & =0 \end{align*}

where \(x_{3},x_{4},x_{5}\) above were added to make it standard form. Writing it as \(Ax=b\), the constraint equation is (we put the slack variables first by convention)\[\begin{bmatrix} 2 & 2 & 1 & 0 & 0\\ 2 & 2 & 0 & -1 & 0\\ -1.5 & 1 & 0 & 0 & -1 \end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix} =\begin{bmatrix} 10\\ 4\\ 0 \end{bmatrix} \] And \(c^{T}x\) becomes (this is the objective function; notice we added the slack and surplus variables to it, but with zero cost)\[\begin{bmatrix} \frac{1}{30} & \frac{1}{15} & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix} \] Reader find the common sense solution working in the \(u_{1},u_{2}\) domain. (We will do this next lecture.)

2.15.3 Basic and Feasible solutions

We will define two solutions: The basic solution, and basic feasible solution.

A vector \(x\) is said to be basic solution if it solves \(Ax=b\) and the non-zeros elements of \(x\) correspond to the linearly independent columns of \(A.\)

Reader Is there a basic infeasible solution?

2.16 Lecture 16. Thursday, March 10, 2016

Recall, the standard LP problem is

\[\begin{array} [c]{cc}\min & c^{T}x\\ st & Ax=b \end{array} \]

We talked about transforming the problem from raw LP to standard LP. The sector patrol problem, solved using a common sense graphical approach, is given below

pict
Figure 2.16:Patrol problem

The optimal \(u^{\ast }\) has to be at one of the vertices of the feasible region. It will be at the vertex shown

pict
Figure 2.17:Patrol problem solution

Using Matlab, the above is solved as follows\[\begin{bmatrix} 2 & 2 & 1 & 0 & 0\\ 2 & 2 & 0 & -1 & 0\\ -1.5 & 1 & 0 & 0 & -1 \end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix} =\begin{bmatrix} 10\\ 4\\ 0 \end{bmatrix} \] And \(c^{T}x\) becomes (this is the objective function; notice we add the slack and surplus variables to it, but with zero cost)\[\begin{bmatrix} \frac{1}{30} & \frac{1}{15} & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix} \] The code is
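(The exact class script is not shown in the notes; a call along the following lines, using linprog with equality constraints and nonnegativity bounds, produces the output below.)

c   = [1/30; 1/15; 0; 0; 0];      % cost; slack/surplus variables cost zero
Aeq = [2 2 1 0 0; 2 2 0 -1 0; -1.5 1 0 0 -1];
beq = [10; 4; 0];
lb  = zeros(5,1);                 % x >= 0
[X, FVAL, EXITFLAG, OUTPUT] = linprog(c, [], [], Aeq, beq, lb, [])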

Result of above run

 
Optimization terminated. 
X = 
0.799999999994766 
1.20000000008299 
5.99999999984448 
1.55520543353025e-10 
9.08450336982967e-11 
FVAL = 
0.106666666672025 
EXITFLAG = 
1 
OUTPUT = 
iterations: 7 
algorithm: 'interior-point-legacy' 
cgiterations: 0 
message: 'Optimization terminated.' 
constrviolation: 8.88178419700125e-16 
firstorderopt: 5.07058939341966e-12

In the above, we only need to map \(x(1)\) to \(u_{1}\) and \(x(2)\) to \(u_{2}\) to read the result. We see that the Matlab result matches the graphical solution.

definition For LP, we say \(\mathbf{x}\) is feasible if \(\mathbf{x}\) satisfies the constraints.

For example, for the sector patrol, let \(U\) be the feasible set in \(\Re ^{2}\) (raw LP). However, in standard LP, the feasible set is in \(\Re ^{5}\).

Reader Obtain feasible set in \(\Re ^{5}\). Obtain feasible point with either \(x_{3},x_{4},x_{5}\) nonzero.

Basic solution A vector \(x\) is a basic solution if the non-zero components of \(x\) correspond to linearly independent columns of \(A\). (We do not require feasibility for a basic solution.)

The difference between LP and the standard \(A\mathbf{x}=\mathbf{b}\) problem we have seen many times in linear algebra is that in LP we want to solve \(A\mathbf{x}=\mathbf{b}\) with \(\mathbf{x}\geq \mathbf{0}\) and at the same time have \(\mathbf{x}\) be optimal. This is what makes LP different from the standard methods for solving \(A\mathbf{x}=\mathbf{b}\).

The algorithm takes a feasible solution and makes it a basic feasible solution. After that, we move from one basic feasible solution to another while making \(J\left ( u\right ) \) smaller, until it reaches the optimal value.

Reader LP has at least one basic solution.

\(A\) has \(m\) linearly independent columns, since it has rank \(m\):\begin{align*} A\mathbf{x} & =\mathbf{b}\\\begin{bmatrix} A_{basic} & A_{nonbasic}\end{bmatrix}\begin{bmatrix} x_{basic}\\ x_{nonbasic}\end{bmatrix} & =\mathbf{b} \end{align*}

2.16.1 Linear programming feasible and basic solutions

Theorem Suppose an LP has a feasible solution; then it has a basic feasible solution. Remember: \(\mathbf{x}\) is feasible if it is in the feasible region (satisfies the constraints), and \(\mathbf{x}\) is a basic solution if the non-zero elements of \(\mathbf{x}\) correspond to linearly independent columns of \(A\).

Proof Say \(\mathbf{x}\) is feasible. Let \(a^{1},a^{2},\cdots ,a^{p}\) denote the columns of \(A\) corresponding to nonzero entries of \(\mathbf{x}\); without loss of generality, say the first \(p\) columns of \(A\) (we can always rearrange \(A\) to make it so). There are two cases:

case one
The \(a^{i}\) above are linearly independent. We are done. \(x\) is therefore basic by definition.
case two
The \(a^{i}\) are linearly dependent. Therefore there exist scalars \(y_{i}\), not all zero, such that \(\sum _{i=1}^{p}y_{i}a^{i}=0\). (this is the definition of linearly dependent columns).

Now we do the squeezing process. For \(\varepsilon >0\) define the vector \(\eta ^{\varepsilon }\) with components\[ \eta _{i}^{\varepsilon }=\begin{array} [c]{cc}x_{i}-\varepsilon y_{i} & \text{for }i\leq p\\ 0 & i>p \end{array} \] Reader \(\eta ^{\varepsilon }\) is feasible for small \(\varepsilon \), since \[ A\eta ^{\varepsilon }=A\mathbf{x}-\varepsilon A\begin{bmatrix} y_{1}\\ y_{2}\\ \vdots \\ y_{p}\\ 0\\ \vdots \\ 0 \end{bmatrix} =A\mathbf{x}=\mathbf{b}\] because \(\sum _{i=1}^{p}y_{i}a^{i}=0\). Let \(\varepsilon =\min \left \{ \frac{x_{i}}{y_{i}};y_{i}>0\right \} \). Reader \(\eta ^{\varepsilon }\) is feasible and has at least one more zero entry than \(\mathbf{x}\), so at most \(p-1\) columns of \(A\) now correspond to non-zero entries. Continuing this process until those columns are linearly independent, we obtain a basic feasible solution.

example \[\begin{bmatrix} 1 & 2 & 1 & 2 & 0\\ 1 & 1 & 1 & 2 & 1\\ 2 & -1 & 1 & 2 & 1 \end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix} =\begin{bmatrix} 3\\ 4\\ 4 \end{bmatrix} \] Let the starting \(\mathbf{x}=\begin{bmatrix} 0\\ 0\\ 1\\ 1\\ 1 \end{bmatrix} \) This is feasible since it satisfies \(A\mathbf{x}=\mathbf{b}\) with \(\mathbf{x}\geq \mathbf{0}\), but it is not basic, since the last three columns of \(A\) (the ones corresponding to the non-zero elements of \(\mathbf{x}\)) are not linearly independent. We now write\[ y_{3}a^{3}+y_{4}a^{4}+y_{5}a^{5}=0 \] where \(a^{3},a^{4},a^{5}\) are the last three columns of \(A\) and the \(y_{i}\) are the scalars we want to solve for. Solving gives \begin{align*} y_{3} & =2\\ y_{4} & =-1\\ y_{5} & =0 \end{align*}

Hence, first find \(\varepsilon =\min \left \{ \frac{x_{i}}{y_{i}};y_{i}>0\right \} \), which we find to be \(\varepsilon =\frac{1}{2}\). Now we find\[ \mathbf{x}^{new}=\eta ^{\varepsilon }=\begin{bmatrix} x_{1}-\varepsilon y_{1}\\ x_{2}-\varepsilon y_{2}\\ x_{3}-\varepsilon y_{3}\\ x_{4}-\varepsilon y_{4}\\ x_{5}-\varepsilon y_{5}\end{bmatrix} =\begin{bmatrix} 0-\frac{1}{2}\left ( 0\right ) \\ 0-\frac{1}{2}\left ( 0\right ) \\ 1-\frac{1}{2}\left ( 2\right ) \\ 1-\frac{1}{2}\left ( -1\right ) \\ 1-\frac{1}{2}\left ( 0\right ) \end{bmatrix} =\begin{bmatrix} 0\\ 0\\ 0\\ \frac{3}{2}\\ 1 \end{bmatrix} \] Since \(a^{4},a^{5}\) (the fourth and fifth columns of \(A\)) are linearly independent and correspond to the non-zero entries of the new \(\mathbf{x}=\eta ^{\varepsilon }\), the new \(\mathbf{x}\) is basic and feasible. So we started with a feasible solution, and from it obtained a basic feasible solution. It need not be optimal; optimality is considered next.
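The squeeze step above can be checked numerically (a sketch):

% One squeeze step: x feasible but not basic, y a dependency among
% the columns corresponding to nonzero entries of x (A*y = 0).
A = [1 2 1 2 0; 1 1 1 2 1; 2 -1 1 2 1];  b = [3; 4; 4];
x = [0; 0; 1; 1; 1];
y = [0; 0; 2; -1; 0];
ev = min(x(y>0)./y(y>0));         % epsilon = 1/2
xnew = x - ev*y                   % basic feasible: (0, 0, 0, 1.5, 1)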

2.17 Lecture 17. Tuesday, March 15, 2016

If we have a feasible solution, we can obtain a basic feasible solution from it using the squeezing method. We have not talked about optimality yet; that is the next thing to consider. We will prove that if we have an optimal feasible solution, then the basic solution obtained from it remains optimal. This is called the optimality theorem. Then we will talk about extreme points.

2.17.1 Optimality theorem

If an optimal feasible solution exist, then an optimal feasible and basic solution exist as well.

Proof Suppose \(\mathbf{x}\) is optimal and feasible. If \(\mathbf{x}\) is basic, we are done. If not, we do the squeeze process to make it basic, but now making sure it remains optimal. Say \(a^{1},a^{2},\cdots ,a^{p}\) are the columns associated with non-zero entries in \(\mathbf{x}\); as before, WLOG these are the first \(p\) columns of \(A\). Hence there exist scalars \(y_{i}\), not all zero, such that \(\sum _{i=1}^{p}y_{i}a^{i}=0\). Define \(\eta ^{\varepsilon }\) for scalar \(\varepsilon \) such that \[ \eta _{i}^{\varepsilon }=\begin{array} [c]{cc}x_{i}-\varepsilon y_{i} & \text{for }i\leq p\\ 0 & i>p \end{array} \] Reader For small \(\varepsilon \), say \(\left \vert \varepsilon \right \vert \leq \delta \), \(\eta ^{\varepsilon }\) is still feasible. Claim: for \(\varepsilon \) suitably small, \(\eta ^{\varepsilon }\) is optimal. It suffices to show that \(c^{T}y=0\) (with \(y_{i}=0\) for \(i>p\)); this being the case, \(c^{T}\eta ^{\varepsilon }=c^{T}\mathbf{x}=J^{\ast }\). By contradiction: say \(c^{T}y\neq 0\). Let \(\varepsilon =\delta sgn\left ( c^{T}y\right ) \). Let us show that \(\eta ^{\varepsilon }\) is better than \(\mathbf{x}\), which contradicts optimality. Now\begin{align*} c^{T}\eta ^{\varepsilon } & =c^{T}\left ( x-\varepsilon y\right ) \\ & =c^{T}x-c^{T}\left ( \delta sgn\left ( c^{T}y\right ) \right ) y\\ & =c^{T}x-\delta sgn\left ( c^{T}y\right ) \left ( c^{T}y\right ) \\ & =J^{\ast }-\delta \left \vert c^{T}y\right \vert \\ & <J^{\ast } \end{align*}

This contradicts optimality. QED.

Now we will talk about extreme points. Extreme points and basic feasible solutions are the same thing.

2.17.2 The extreme point theorem

Let \(P=\left \{ x\in \Re ^{n}:x\geq 0,Ax\leq b\right \} \). This polyhedron is the feasible set. The set of extreme points of \(P\) is exactly the set of basic feasible solutions.

Proof See the handout on extreme points sent today. We still need to get to a first feasible solution, which can be hard to obtain. Once we find a feasible solution, we use it to find a first basic feasible solution (using the squeeze method), and from there we repeat the process, moving from one basic feasible solution to another while making \(J\left ( u\right ) \) smaller.

Example Consider LP with constraints \begin{align*} x_{i} & \geq 0,i=1,2,3\\ 2x_{1}+3x_{2} & =1\\ x_{1}+x_{2}+x_{3} & =1 \end{align*}

Note that \(x_{1}+x_{2}+x_{3}=1\) is common in LP optimization; it is called the unit simplex. Here is a plot of the above.

pict
Figure 2.18:Unit simplex

Reader The plane \(2x_{1}+3x_{2}=1\) must intersect the unit simplex, else no feasible region exists. There are two basic feasible solutions, at \(\left ( 0,\frac{1}{3},\frac{2}{3}\right ) \) and \(\left ( \frac{1}{2},0,\frac{1}{2}\right ) \). These are the two red points shown in the above plot.

2.17.3 Mechanism the simplex method

We will use the sector patrol problem to show the mechanics of the simplex method. \begin{align*} Ax & =b\\\begin{bmatrix} 2 & 2 & 1 & 0 & 0\\ 2 & 2 & 0 & -1 & 0\\ -1.5 & 1 & 0 & 0 & -1 \end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix} & =\begin{bmatrix} 10\\ 4\\ 0 \end{bmatrix} \end{align*}

Step 1. Form the partial tableau. \[\begin{bmatrix} A & b \end{bmatrix} =\begin{bmatrix} 2 & 2 & 1 & 0 & 0 & 10\\ 2 & 2 & 0 & -1 & 0 & 4\\ -1.5 & 1 & 0 & 0 & -1 & 0 \end{bmatrix} \rightarrow \begin{bmatrix} r_{1}\\ r_{2}\\ r_{3}\end{bmatrix} \] We need to identify an identity matrix. By row operations we obtain\begin{equation} \begin{bmatrix} r_{1}\\ -r_{2}\\ -r_{3}\end{bmatrix} =\begin{bmatrix} 2 & 2 & 1 & 0 & 0 & 10\\ -2 & -2 & 0 & 1 & 0 & -4\\ 1.5 & -1 & 0 & 0 & 1 & 0 \end{bmatrix} \rightarrow \begin{bmatrix} r_{1}^{\prime }\\ r_{2}^{\prime }\\ r_{3}^{\prime }\end{bmatrix} \tag{1} \end{equation} Now we can read out \(\mathbf{x}^{1}\): \(x_{3}=10,x_{4}=-4,x_{5}=0\). These are the entries in \(x\) which correspond to the columns of the unit matrix inside \(A\); all other entries in \(x\) are set to zero. Hence \[ \mathbf{x}^{1}=\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix} =\begin{bmatrix} 0\\ 0\\ 10\\ -4\\ 0 \end{bmatrix} \] Note this is basic but not feasible (since \(x_{4}<0\)). Now we go to the next step. We have to remove an entry from \(x^{1}\) and move in its place another entry. Let us pick \(x_{3}\) to kick out and move in \(x_{1}\). By row operations applied to (1) we obtain\[\begin{bmatrix} \frac{r_{1}^{\prime }}{2}\\ r_{2}^{\prime }+2r_{1}^{\prime \prime }\\ r_{3}^{\prime }-1.5r_{1}^{\prime \prime }\end{bmatrix} =\begin{bmatrix} 1 & 1 & \frac{1}{2} & 0 & 0 & 5\\ 0 & 0 & 1 & 1 & 0 & 6\\ 0 & -2.5 & -0.75 & 0 & 1 & -7.5 \end{bmatrix} \rightarrow \begin{bmatrix} r_{1}^{\prime \prime }\\ r_{2}^{\prime \prime }\\ r_{3}^{\prime \prime }\end{bmatrix} \] Hence the new identity matrix sits in columns \(a^{1},a^{4},a^{5}\), and therefore \(x_{1}=5,x_{4}=6,x_{5}=-7.5\); all other entries in \(x\) are set to zero. Hence \[ \mathbf{x}^{2}=\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix} =\begin{bmatrix} 5\\ 0\\ 0\\ 6\\ -7.5 \end{bmatrix} \] This is the second basic \(\mathbf{x}\) we found.

Reader Find the basic solution via pivoting row operations. Now we want to redo the above with feasibility in mind, which we did not consider above when choosing which element to move out and which to move in.

Example

\begin{align*} Ax & =b\\\begin{bmatrix} 1 & 0 & 0 & 4 & 3 & 2\\ 0 & 1 & 0 & 2 & -1 & 2\\ 0 & 0 & 1 & -1 & 2 & 1 \end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\\ x_{6}\end{bmatrix} & =\begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix} \\\begin{bmatrix} A & b \end{bmatrix} & =\begin{bmatrix} 1 & 0 & 0 & 4 & 3 & 2 & 1\\ 0 & 1 & 0 & 2 & -1 & 2 & 2\\ 0 & 0 & 1 & -1 & 2 & 1 & 3 \end{bmatrix} \end{align*}

Now we need to start with a feasible solution. In the last example we did not care about this, but now we need the first solution to be feasible. Here \(x_{1}=1,x_{2}=2,x_{3}=3\) (read out from the identity matrix, since the first three columns are linearly independent). We use the squeeze process to decide which one to kick out and which to move in. Let us for now choose arbitrarily \(x_{4}\) (fourth column) to move in; we then have to kick out a column. We need \(\varepsilon ^{\ast }\) to decide: \[ \varepsilon ^{\ast }=\min \left \{ \frac{1}{4},\frac{2}{2}\right \} =\frac{1}{4}\] Hence it is the first column we kick out. Remember that when doing the above to determine which \(x\) to kick out, we only divide by the positive entries in the incoming column; negative entries are skipped. That is why in the above we did not write \(\varepsilon ^{\ast }=\min \left \{ \frac{1}{4},\frac{2}{2},\frac{3}{-1}\right \}\). We will continue next lecture.

2.18 Lecture 18. Thursday, March 17, 2016

Class planning:

1.
special problem towards end of course
2.
Test 2 after spring break. Up to and including LP.
3.
HW 6 will be sent soon, due right after spring break.

Exercise to do

A good exercise, to be done by hand, is the following: For the sector patrol problem which we considered in class, solve the Phase One LP to obtain a basic feasible solution.
Then, use this first basic feasible solution as a starting point for the original LP and solve it via a sequence of tableau. Since you already know the answer, you will get feedback whether your calculations produce the right result.

2.18.1 Simplex method examples

We still need to know how to find the first feasible solution. Consider the following example.

\[
\begin{array}[c]{cccccc|c}
x_{1} & x_{2} & x_{3} & x_{4} & x_{5} & x_{6} & b\\ \hline
1 & 0 & 0 & 2 & 4 & 6 & 4\\
0 & 1 & 0 & 1 & 2 & 3 & 3\\
0 & 0 & 1 & -1 & 2 & 1 & 1
\end{array}
\]

A solution which is basic and feasible is \(x_{1}=4,x_{2}=3,x_{3}=1\) and all others \(x_{i}=0,i=4\cdots 6\).

\[ x=\begin{pmatrix} 4\\ 3\\ 1\\ 0\\ 0\\ 0 \end{pmatrix} \]

This was just read out from the identity matrix in the first 3 columns above. Let us now assume we want the fourth column to be in the basis. We have to remove one of the current basis columns in order to let another column in. \(\min \left \{ \frac{4}{2},\frac{3}{1}\right \} =2\), so the first row is the pivot row. Notice we do not consider \(\frac{1}{-1}\) when taking the minimum: any negative entry in the column is bypassed. Now we know that the first row is the pivot row and that we want the fourth column in; this is all we need for the next step. We now need to normalize entry \(\left ( 1,4\right ) \) to one (before, it was 2). After normalizing the pivot row (dividing the whole row by 2) we obtain

\[
\begin{array}[c]{cccccc|c}
x_{1} & x_{2} & x_{3} & x_{4} & x_{5} & x_{6} & b\\ \hline
\frac{1}{2} & 0 & 0 & 1 & 2 & 3 & 2\\
0 & 1 & 0 & 1 & 2 & 3 & 3\\
0 & 0 & 1 & -1 & 2 & 1 & 1
\end{array}
\]

Only now do we start applying row operations. With row one as the pivot row, we make all other entries in the fourth column zero. This gives the following

\[
\begin{array}[c]{lcccccc|c}
 & x_{1} & x_{2} & x_{3} & x_{4} & x_{5} & x_{6} & b\\ \hline
r_{1} & \frac{1}{2} & 0 & 0 & 1 & 2 & 3 & 2\\
r_{2}-r_{1} & -\frac{1}{2} & 1 & 0 & 0 & 0 & 0 & 1\\
r_{3}+r_{1} & \frac{1}{2} & 0 & 1 & 0 & 4 & 4 & 3
\end{array}
\]

Now that we have a new identity matrix (in columns 4, 2, 3), we read out the new solution: \(x_{4}=2,x_{2}=1,x_{3}=3\) and all other entries zero.\[ x=\begin{pmatrix} 0\\ 1\\ 3\\ 2\\ 0\\ 0 \end{pmatrix} \] We will now revisit this with optimality in mind. This means we need to know which column to bring into the basis, i.e. which \(x_{i}\) to bring in. Set up this tableau

\[ \overset{A}{\overbrace{\begin{pmatrix} 1 & \cdots & 0 & a\left ( 1,m+1\right ) & \cdots & a\left ( 1,n\right ) \\ \vdots & \ddots & \vdots & \vdots & \ddots & \cdots \\ 0 & \cdots & 1 & a\left ( m,m+1\right ) & \cdots & a\left ( m,n\right ) \end{pmatrix} }}\overset{b}{\overbrace{\begin{pmatrix} y\left ( 1,0\right ) \\ y\left ( 2,0\right ) \\ \vdots \\ y\left ( m,0\right ) \end{pmatrix} }}\] Therefore, the current feasible and basic solution is \(x_{1}=y\left ( 1,0\right ) ,x_{2}=y\left ( 2,0\right ) ,\cdots ,x_{m}=y\left ( m,0\right ) \). All other \(x_{i}=0\). We need now to do a feasibility preserving perturbation.

Allow \(x_{m+1},x_{m+2},\cdots ,x_{n}\) to be part of the solution.\begin{align*} x_{1}+\overset{\text{we allow this in}}{\overbrace{\sum _{i=m+1}^{n}a\left ( 1,i\right ) x_{i}}} & =y\left ( 1,0\right ) \\ x_{2}+\sum _{i=m+1}^{n}a\left ( 2,i\right ) x_{i} & =y\left ( 2,0\right ) \\ & \vdots \\ x_{m}+\sum _{i=m+1}^{n}a\left ( m,i\right ) x_{i} & =y\left ( m,0\right ) \end{align*}

Now \(\boldsymbol{x}=\begin{pmatrix} x_{1} & x_{2} & \cdots & x_{m} & 0 & \cdots & 0 \end{pmatrix} ^{T}\) is still feasible, but no longer basic. We know that \(J=c^{T}x\), hence\begin{align*} J & =c_{1}\left ( y(1,0) - \sum _{i=m+1}^{n} a(1,i) x_{i}\right ) + c_{2}\left ( y(2,0) - \sum _{i=m+1}^{n} a(2,i) x_{i}\right ) + \cdots +c_{m} \left ( y(m,0) -\sum _{i=m+1}^{n} a(m,i) x_{i}\right ) + c_{m+1} x_{m+1} + \cdots +c_{n} x_{n}\\ & = \overbrace{\sum _{i=1}^{m} c_{i} y(i,0)}^{ \text{current }J_{0}\text{ value}} -c_{1} \sum _{i=m+1}^{n} a(1,i) x_{i} -c_{2} \sum _{i=m+1}^{n} a(2,i) x_{i} -\cdots -c_{m} \sum _{i=m+1}^{n} a(m,i) x_{i} +c_{m+1}x_{m+1} +\cdots +c_{n} x_{n} \end{align*}

Therefore \begin{align*} J=J_{0} & -\left ( -c_{m+1}+c_{1}a\left ( 1,m+1\right ) +c_{2}a\left ( 2,m+1\right ) +\cdots +c_{m}a\left ( m,m+1\right ) \right ) x_{m+1}\\ & -\left ( -c_{m+2}+c_{1}a\left ( 1,m+2\right ) +c_{2}a\left ( 2,m+2\right ) +\cdots +c_{m}a\left ( m,m+2\right ) \right ) x_{m+2}\\ & -\cdots -\left ( -c_{n}+c_{1}a\left ( 1,n\right ) +c_{2}a\left ( 2,n\right ) +\cdots +c_{m}a\left ( m,n\right ) \right ) x_{n}\end{align*} Define the relative cost coefficients, for \(j=m+1,\cdots ,n\),\[ r_{j} = c_{j} - \sum _{i=1}^{m} c_{i} a(i,j) \] Hence\[ J = J_{0} + \sum _{j=m+1}^{n} r_{j} x_{j} \] To minimize \(J\) we pick the most negative \(r_{j}\); this tells us which \(x_{j}\) to bring into the basis in order to reduce \(J\). Now we add an extra (last) row to the table to keep track of the cost. Here is the table with the cost coefficient row added\[ \overset{A}{\overbrace{\begin{pmatrix} 1 & \cdots & 0 & a\left ( 1,m+1\right ) & \cdots & a\left ( 1,n\right ) \\ \vdots & \ddots & \vdots & \vdots & \ddots & \cdots \\ 0 & \cdots & 1 & a\left ( m,m+1\right ) & \cdots & a\left ( m,n\right ) \\ 0 & 0 & 0 & r_{m+1} & \cdots & r_{n}\end{pmatrix} }}\overset{b}{\overbrace{\begin{pmatrix} y\left ( 1,0\right ) \\ y\left ( 2,0\right ) \\ \vdots \\ -J_{0}\end{pmatrix} }}\]

State Street vendor example Selling rings and bracelets. A ring uses 3 oz. gold, 1 oz. silver. A bracelet uses 1 oz. gold, 2 oz. silver. Profit on a ring is $4 and profit on a bracelet is $5. Initially we have 8 oz. gold and 9 oz. silver. How many rings and bracelets should we produce to maximize profit? This is the heart of many production problems. Note: strictly we need integer LP to solve this, but if the quantities are very large it makes little difference, so for now we ignore the issue of integer programming. \begin{align*} u_{1} & =\text{number of rings}\\ u_{2} & =\text{number of bracelets}\\ J\left ( u\right ) & =4u_{1}+5u_{2} \end{align*}

Since we want to maximize this, we minimize \[ J\left ( u\right ) =-4u_{1}-5u_{2}\] with \(u_{i}\geq 0\). The constraints are \(3u_{1}+u_{2}\leq 8\) and \(u_{1}+2u_{2}\leq 9\). We convert to standard LP with slack variables and form the first table

          \(x_{1}\)  \(x_{2}\)  \(x_{3}\)  \(x_{4}\)  \(b\)
row 1  \(3\)  \(1\)  \(1\)  \(0\)  \(8\)
row 2  \(1\)  \(2\)  \(0\)  \(1\)  \(9\)
\(J\left ( u\right ) \)  \(-4\)  \(-5\)  \(0\)  \(0\)  \(0\)

The first basic feasible solution is \(x_{3}=8,x_{4}=9\). To decide which column to bring in, we look for the most negative entry in the last row; it is \(-5\), in column 2, so \(x_{2}\) enters the basis. To find the pivot row we use the ratio test: \(\min \left \{ \frac{8}{1},\frac{9}{2}\right \} =\min \left \{ 8,4.5\right \} =4.5\), so the pivot row is the second row. Now we do the first pivot stage.

Reader obtain the following

          \(x_{1}\)  \(x_{2}\)  \(x_{3}\)  \(x_{4}\)  \(b\)
row 1  \(2.5\)  \(0\)  \(1\)  \(-0.5\)  \(3.5\)
row 2  \(0.5\)  \(1\)  \(0\)  \(0.5\)  \(4.5\)
\(J\left ( u\right ) \)  \(-1.5\)  \(0\)  \(0\)  \(2.5\)  \(22.5\)

Hence \(x_{2}=4.5,x_{3}=3.5\). Since there is still a negative entry in the last row (the \(-1.5\) under \(x_{1}\)), we need to repeat the process. We keep pivoting until there are no negative entries in the last (third) row.
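
To make the pivoting concrete, here is a minimal numerical sketch of the rule just described (most negative relative cost coefficient picks the entering column, the ratio test picks the pivot row), applied to the vendor tableau above and run to optimality. This is an illustration of mine, not part of the lecture, and it assumes numpy is available.

    import numpy as np

    # columns: x1 x2 x3 x4 | b, last row is the cost row (its b entry holds -J0)
    T = np.array([[ 3.,  1., 1., 0., 8.],
                  [ 1.,  2., 0., 1., 9.],
                  [-4., -5., 0., 0., 0.]])

    while T[-1, :-1].min() < 0:
        j = int(np.argmin(T[-1, :-1]))        # entering column: most negative r_j
        ratios = [T[i, -1] / T[i, j] if T[i, j] > 0 else np.inf
                  for i in range(T.shape[0] - 1)]
        i = int(np.argmin(ratios))            # ratio test picks the pivot row
        T[i] /= T[i, j]                       # normalize the pivot row
        for r in range(T.shape[0]):
            if r != i:
                T[r] -= T[r, j] * T[i]        # eliminate column j from other rows
        print(T)

The first pass reproduces the second tableau above; the second pass stops with \(x_{1}=1.4\), \(x_{2}=3.8\) and \(24.6\) in the corner of the cost row, i.e. a maximum profit of \(24.6\).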

Now we will talk about how to find the first basic feasible solution. There are two cases. Case one: the number of slack variables is \(m\); then the first basic feasible solution can be read off directly, and there is no phase one LP. Case two: the number of slack variables is \(z<m\). Now we must solve a phase one LP first, then use its result to solve the phase two LP. In phase one we introduce \(m-z\) new artificial variables \(y_{i}\) and a new artificial cost function \(J\left ( y\right ) \) which we minimize to zero.

See HW6, first problem for an example of how to solve first phase LP.

2.19 Lecture 19. Tuesday, March 22, 2016 (No class)

No class.

2.20 Lecture 20. Thursday, March 24, 2016 (No class)

No class.

2.21 Lecture 21. Tuesday, March 29, 2016

Coming up soon is the special problem. It is like one HW but can count for up to two HWs' weight. Note: the possible rescheduling for April 6, 2016 remains in place.

2.21.1 Second exam keywords

1.
Test two will be more application oriented.
2.
Local minimum, strong and weak. We had sufficient conditions involving the gradient and the Hessian.
3.
If we have convexity, we can do much better. If not, we need iterative algorithms. Line search is central to iterative search. Step size, optimal step size.
4.
we looked at steepest descent with and without optimal step size. For simple problems we can easily find optimal step size.
5.
We talked about convergence of iterative algorithms. For steepest descent we talked about the error bound \(E \theta ^{k}\).
6.
We talked about generalized Newton-Raphson and quadratic convergence. The conjugate direction method has quadratic convergence (we proved this); it converges in \(N\) steps or fewer, where \(N\) is the size of the matrix \(A\). Newton-Raphson converges in one step for a quadratic function. We also talked about superlinear convergence.
7.
This brought us to the end of iterative algorithms. Then we went to Linear programming. The number of vertices is large, so trying to check them all is not possible. The polytope is central to LP. The basic theory of LP:

(a)
Feasible solutions
(b)
Basic solutions
(c)
basic and feasible solutions
(d)
The method of squashing using \(\eta ^{\epsilon }\)
(e)
Basic feasible \(\Leftrightarrow \) extreme points.
(f)
Relative cost coefficients.
8.
Then we talked about simplex algorithm.

Now we will start on today's lecture. Often a problem is given to you that is not an LP problem, and it is not obvious how to convert it to one. Sometimes we need algebra or a cleaner reformulation of the problem to make it an LP problem. For example, in HW6 we had a min-max problem, but it was possible to convert it to an LP problem. See the key solution for HW6.

2.21.2 Application of Linear programming to control problems

Another application area for LP is control. An example is the minimum fuel problem: we want to go from some state to a final state with minimum control effort. "Control effort" is a generic name which can mean many things depending on the problem itself. We also want to do this in minimal time. We begin with the discrete state equation \[ x(k+1)= A x(k) + B u(k) \] with \(x(0)\) given as the initial state. In the above, \(x\) is an \(n\times 1\) vector, \(A\) is \(n\times n\) and \(B\) is \(n\times m\) where \(m\) is the number of inputs, and \(u\) is an \(m\times 1\) input vector.

We want to select \(u(k)\) sending \(x(0)\) to some target \(x^{\ast }\) at some future time \(k=N\), with \(N\) minimal and the control effort minimal. Assume for now that \(u\) is scalar, which means one input. Then an energy measure is \[ \sum _{k=0}^{N-1} \left | u(k) \right | ^{2} \] and a peak measure is \[ \max \{ |u(k)| : k=0,\dots ,N-1\} \] But we will consider the fuel measure given by \[ \sum _{k=0}^{N-1} \left | u(k) \right | \] We will use the fuel measure in the LP problem. The constraint is \begin{align*} \left | u(k) \right | \leq U \tag{$\ast $} \end{align*}

Which says that the control is bounded. Note that if \((A,B)\) is controllable, we can get from the initial state to the final state in at most \(n\) steps if we want, but the effort may be very large. We also want \(x(N)=x^{\ast }\). The above two are the constraints in this problem. The objective function is \[ J(u) = \sum _{k=0}^{N-1} \left | u(k) \right | \] Therefore \begin{align*} x(1) = & A x(0) + B u(0)\\ x(2) = & A^{2} x(0) + A B u(0) + B u(1)\\ x(3) = & A^{3} x(0) + A^{2} B u(0) + A B u(1) + B u(2)\\ & \vdots \\ x(N) = & \underbrace{ A^{N} x(0) + \sum _{k=0}^{N-1} A^{N-1-k} B u(k) }_{x^{\ast } \text{ (this is the linear constraint)}} \end{align*}

Now we rewrite the constraint \(\left | u(k) \right | \leq U \) using \[ u(k) = u_{p}(k) - u_{n}(k) \] with \(u_{p}(k),u_{n}(k)\) nonnegative. The objective function becomes (where we now put \(N\) as a parameter, to say this is for a specific value of \(N\)) \begin{align*} J_{N}(u) & = \sum _{k=0}^{N-1} \left | u(k) \right | \\ & = \sum _{k=0}^{N-1} u_{p}(k) + u_{n}(k) \end{align*}

Equation \(\ast \) above becomes \begin{align*} u_{p}(k) & \leq U\\ u_{n}(k) & \leq U \end{align*}

So minimizing \(J_{N}(u)\) is now an LP problem in \(2N\) raw variables (we still need to add the needed slack variables). So by doubling the number of variables, we converted this control problem to an LP problem. Let \[ N^{\ast } = \inf \{ N : LP_{N} \hspace{5pt} \text{feasible} \} \] Reader Argue that the \(l^{\infty }\) measure also leads to an LP problem. The \(l^{\infty }\) measure is \(\max \{|u(0)|,|u(1)|,\dots \}\).

Example Let \(A = \begin{pmatrix} 1 & 1-e^{-T}\\ 0 & e^{-T}\end{pmatrix} \) where \(T=1\) is the sample time, and let \(B=\begin{pmatrix} 0\\ 1 \end{pmatrix} \). The bound on \(u(k)\) is \(U=1\), i.e. \(|u(k)|\leq 1\). Let \(x(0)=\begin{pmatrix} -40.91\\ 43.50 \end{pmatrix} \) and let the target be \(x^{\ast }=\begin{pmatrix} 0\\ 0 \end{pmatrix} \). Find \(u^{\ast }\) and \(N^{\ast }\). Running the LP for values of \(N\) from \(1,\dots ,4\), we find the first feasible solution at \(N=4\). This is the resulting optimal effort \(u(k)\) \begin{align*} u(0) & = -0.3009\\ u(1) & = -1\\ u(2) & = -0.2999\\ u(3) & = -1 \end{align*}

And the corresponding optimal objective function is \(J^{\ast }=2.6008\). In the above, we put priority on minimal time first.
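
As a check, the whole search for \(N^{\ast }\) can be run as a sequence of LPs. The sketch below is my own (assuming numpy and scipy are available); it builds the terminal-state equality constraint from the expansion of \(x(N)\) above and uses the \(u_{p},u_{n}\) splitting:

    import numpy as np
    from scipy.optimize import linprog

    T = 1.0
    A = np.array([[1.0, 1.0 - np.exp(-T)],
                  [0.0, np.exp(-T)]])
    B = np.array([[0.0], [1.0]])
    x0 = np.array([-40.91, 43.50])
    xstar = np.zeros(2)
    U = 1.0

    for N in range(1, 10):
        # decision vector [u_p(0..N-1), u_n(0..N-1)], with u(k) = u_p(k) - u_n(k)
        c = np.ones(2 * N)                               # J_N = sum of u_p + u_n
        G = np.hstack([np.linalg.matrix_power(A, N - 1 - k) @ B
                       for k in range(N)])
        res = linprog(c, A_eq=np.hstack([G, -G]),
                      b_eq=xstar - np.linalg.matrix_power(A, N) @ x0,
                      bounds=[(0.0, U)] * (2 * N))
        if res.success:                                  # first feasible N is N*
            print(N, res.x[:N] - res.x[N:], res.fun)
            break

This reproduces \(N^{\ast }=4\) and \(J^{\ast }\approx 2.6\); the individual \(u(k)\) values may differ slightly from the ones above if the LP optimum is not unique.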

2.21.3 Starting dynamic programming

Now we will start on the next topic, which is dynamic programming. This involves discrete decisions. In this course we will cover only discrete dynamic programming and not continuous dynamic programming. We will be making sequential decisions in time. For example, if \(u(k)=\{-1,0,1\}\) then the decision tree will look like

pict
Figure 2.19:decision tree

We will get a tree with a potentially large number of branches. Combinatorics arise. We get the curse of dimensionality problem again. For dynamic programming, Bellman is considered the person who originated the subject.

2.22 Lecture 22. Thursday, March 31, 2016. Second midterm exam

Exam.

2.23 Lecture 23. Tuesday, April 5, 2016

For the steepest descent problem with optimal step size, the max is \(1\); one needs to use \(\|u^{k+1}-u^{k}\| \leq 1\).

Next we will have the special problem; expect it next week.

Back to dynamic programming. Bellman's secret is simple but the theory is powerful. The main things to take from this course are

1.
Linear programming.
2.
Iterative solutions to optimization problems
3.
dynamic programming

2.23.1 First dynamic programming problem, trip from NY to SFO

Suppose we want to take a trip from NY to San Francisco such that the total toll is minimal. Once we start from NY we must go west at each step; we are not allowed to go east, even if the cost might be lower.

This diagram shows the possible routes and the cost (toll) for each segment.

pict
Figure 2.20:NY to SF tree one

Dynamic programming is now used to find the optimal route in the above problem (i.e. the route with the least toll from NY to SFO). Dynamic programming is based on "what if" decisions. Instead of starting from NY and trying every possible route, we start backwards and ask: what if we were in PORT, which route would we take?

Clearly only one route exists from PORT to SFO, with cost 5. Then we ask: what if we were in LV? We see it is LV to SFO with cost 3. Then what if we were in SNY? Since LV has cost 3, the route \(\text{SNY}\to \text{LV}\to \text{SFO}\) is the one to take, with cost \(2+3=5\). Each time we find the cost from a city to SFO, we label the city with this cost. We keep moving back east, doing the same. When we arrive all the way at JFK, we see that the lowest cost route is \[ \text{JFK} \to \text{NO} \to \text{PHX} \to \text{LV} \to \text{SFO} \]

The following diagram shows the route above, with the cost of moving from each city to SFO given next to each city name on the diagram.

pict
Figure 2.21:NY to SF tree two

If we were to solve the above problem by direct evaluation, the number of computations is of order (assuming even \(n\)) \(\frac{(n-1)!\, n!}{\frac{n}{2}!\, \frac{n}{2}!}\), while with the dynamic programming method explained above it is \(\frac{n^{2}}{2}+n\). For \(20\) cities this gives \(220\) computations for dynamic programming compared to over one million for the direct approach (trying all possible routes).
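
The backward labeling procedure is easy to code as a memoized recursion. The sketch below is mine, and the tolls in it are hypothetical stand-ins (the real ones are in the figure), but the recursion is exactly the "label each city with its cost-to-go" idea described above:

    # tolls[city] = {westward neighbor: toll}; hypothetical numbers
    tolls = {
        'JFK':  {'SNY': 4, 'NO': 1},
        'SNY':  {'LV': 2},
        'NO':   {'PHX': 3},
        'PHX':  {'LV': 1, 'PORT': 6},
        'LV':   {'SFO': 3},
        'PORT': {'SFO': 5},
    }

    best = {'SFO': 0}        # cost-to-go label of each city; SFO costs 0
    step = {}                # optimal next city from each city

    def I(city):
        # optimal cost from `city` to SFO, computed backwards and memoized
        if city not in best:
            nxt = min(tolls[city], key=lambda n: tolls[city][n] + I(n))
            best[city] = tolls[city][nxt] + I(nxt)
            step[city] = nxt
        return best[city]

    print(I('JFK'))          # optimal total toll; follow `step` to read the route

Each city's label is computed exactly once, which is where the \(\frac{n^{2}}{2}+n\) style count comes from, instead of enumerating every route.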

We will only consider discrete dynamic programming. This is an optimization problem where the variables are not continuous: the variables take discrete, fixed values of choices each time. These applications are useful for integer programming problems. An integer programming problem is much harder than a continuous one, with much larger complexity.

When we are given an integer programming problem which is hard, we can try to approximate it by a continuous problem and solve it more easily that way. For example \[ \min _{x} c^{T} x \quad \text{subject to} \quad Ax=b \] where \(x\) is restricted to integers is a hard problem. But if we relax it and allow \(x\) to take any value, so that the problem becomes continuous, then it becomes much easier to solve using linear programming.

There are papers written on when we can approximate integer programming problems as continuous.

We will be making sequential decisions \(u_{0},u_{1},\dots ,u_{N-1}\), i.e. \(N\) decisions, where \(u\in \mathbb{R}^{n}\). If we are given a problem which is not sequential, we can treat it as one for this purpose. For example, the car toll problem above: we formulated it as sequential, but we did not have to. The final answer \(J^{\ast }\) should of course come out the same no matter how it was formulated.

There will be states \(x(k),\, k=0,1,\dots ,N\) where \(x(N)\) is the terminal state. The constraints are state dependent. Because it depends on where we are when making the decision. For the car toll problem above, the decision depended on which city we were in. We denote the decision as \(u(k) \in \Omega \) and the state equation is \[ x(k+1) = f(x(k),u(k),k) \] and the objective function is \[ J(u)= \underbrace{ \sum _{k=0}^{N-1} J(x(k),u(k),k) }_{\text{stage cost}} + \underbrace{\psi (x(N))}_{\text{terminal cost}} \]

The terminal cost, is a cost applied once we reach the terminal state.

2.23.2 Subproblem in dynamic programming

Now we need to define a subproblem. Notion of subproblem: the subproblem begins at \(k=L\) in state \(x(L)\). The cost from stage \(L\) is therefore \[ \boxed{ J_{L}(u) = \sum _{k=L}^{N-1} J(x(k),u(k),k) + \psi (x(N)) } \] Suppose \(u^{\ast }\) is the optimal decision sequence for the original problem, and let \(x^{\ast }(k)\) be the corresponding optimal trajectory, i.e. the state path beginning at the given \(x(0)\). Hence \[ x^{\ast }(k+1) = f(x^{\ast }(k),u^{\ast }(k),k), \,\, k=0,1,\dots ,N-1 \]

2.23.3 Bellman principle of optimality

If the subproblem begins at \(x(L)=x^{\ast }(L)\), i.e. we begin the subproblem along the optimal trajectory of the original problem, then \(u^{\ast }(L),u^{\ast }(L+1),\dots ,u^{\ast }(N-1)\) is optimal for the subproblem.

What all this means is that the tail of an optimal trajectory must itself be optimal for the corresponding subproblem. An optimal subproblem cannot become sub-optimal when viewed as part of the main problem.

2.24 Lecture 24. Thursday, April 7, 2016 (No class)

No class.

2.25 Lecture 25. Tuesday, April 12, 2016

The first part of the lecture was on describing the special problem we have to do. This is described in the special problem HW itself included in HW chapters, under special problem.

2.25.1 Dynamic programming state equation

Now we go back to dynamic programming. The state equation is\[ x\left ( k+1\right ) =f\left ( x\left ( k\right ) ,u\left ( k\right ) ,k\right ) \qquad k=0,1,\cdots ,N-1 \quad \text{and} \quad u\left ( k\right ) \in \Omega _{k}\] And the objective function is\[ J=\Psi \left ( x\left ( N\right ) \right ) +\sum _{k=0}^{N-1}J_{k}\left ( x\left ( k\right ) ,u\left ( k\right ) \right ) \] where \(\Psi \left ( x\left ( N\right ) \right ) \) is the cost of the terminal stage.

pict
Figure 2.22:Showing dynamic programming block diagram

2.25.2 Subproblems and principle of optimality (POO)

A subproblem is defined at intermediate point.

P.O.O. (Principle of Optimality): if the initial state of a subproblem lies on the optimal trajectory of the original problem, then the subproblem solution is also optimal.

Proof by contradiction Let \[ u\left ( k\right ) =\left \{ \begin{array} [c]{c}u^{\ast }\left ( k\right ) \qquad k=0,\cdots ,L-1\\ u_{new}^{\ast }\left ( k\right ) \qquad k=L,\cdots N-1 \end{array} \right . \]

Plug the above into the original problem: if \(u_{new}^{\ast }\) were strictly better for the subproblem, the composite control above would give a lower total cost than \(u^{\ast }\), contradicting the optimality of \(u^{\ast }\).

Translating POO. to dynamic programming.

\[ \boxed{ I\left ( x\left ( N-1\right ) ,1\right ) =\min _{u\left ( N-1\right ) \in \Omega _{N-1}}\left \{ \overset{\text{cost of one step}}{\overbrace{J\left ( x\left ( N-1\right ) ,u\left ( N-1\right ) \right ) }}+\Psi \left ( x\left ( N\right ) \right ) \right \} } \]

The above is the optimal cost with one step to terminal stage. This is similar to what we did for the routing problem from NY to San Francisco before. The above is when we are standing in Portland and looking for the last step to take to San Francisco.

To minimize the above, we have to express everything in the same state \(x\left ( N-1\right ) \). So we write the above, using the state equation, as

\begin{equation} I\left ( x\left ( N-1\right ) ,1\right ) =\min _{u\left ( N-1\right ) \in \Omega _{N-1}}\left \{ J\left ( x\left ( N-1\right ) ,u\left ( N-1\right ) \right ) +\Psi \left ( f\left ( x\left ( N-1\right ) ,u\left ( N-1\right ) \right ) \right ) \right \} \tag{1} \end{equation}

Now we find \(u^{\ast }\left ( N-1\right ) \) from the above, using \[ \frac{dI\left ( x\left ( N-1\right ) ,1\right ) }{du\left ( N-1\right ) }=\frac{d}{du\left ( N-1\right ) }\left \{ J\left ( x\left ( N-1\right ) ,u\left ( N-1\right ) \right ) +\Psi \left ( f\left ( x\left ( N-1\right ) ,u\left ( N-1\right ) \right ) \right ) \right \} =0 \]

And solve for \(u\left ( N-1\right ) \) using standard calculus. Once we find \(u^{\ast }\left ( N-1\right ) \,\), we plug it back into (1) and obtain

\[ I^{\ast }\left ( x\left ( N-1\right ) ,1\right ) =J\left ( x\left ( N-1\right ) ,u^{\ast }\left ( N-1\right ) \right ) +\Psi \left ( f\left ( x\left ( N-1\right ) ,u^{\ast }\left ( N-1\right ) \right ) \right ) \]

Notice now there is no \(\min _{u\left ( N-1\right ) \in \Omega _{N-1}}\) since we have already carried out the minimization. We now apply the P.O.O. again to find the cost from stage \(N-2\):

\[ I\left ( x\left ( N-2\right ) ,2\right ) =\min _{u\left ( N-2\right ) \in \Omega _{N-2}}\left \{ J\left ( x\left ( N-2\right ) ,u\left ( N-2\right ) \right ) +I^{\ast }\left ( x\left ( N-1\right ) ,1\right ) \right \} \]

Notice the difference now. For all stages further back, beyond \(N-1\), we use the optimal cost found from the stage ahead, which is \(I^{\ast }\left ( x\left ( N-1\right ) ,1\right ) \) in the above case. We now repeat the process and find the optimal \(u^{\ast }\left ( x\left ( N-2\right ) ,2\right ) \). See HW 7, problem 2 for a detailed example of how to do this. More generally,

\[ I\left ( x\left ( L\right ) ,N-L\right ) =\min _{u\left ( L\right ) \in \Omega _{L}}\left \{ J\left ( x\left ( L\right ) ,u\left ( L\right ) \right ) +I^{\ast }\left ( x\left ( L+1\right ) ,N-L-1\right ) \right \} \]

The above is called the dynamic programming equation.
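
In code, the dynamic programming equation is a single backward loop over the stages. Here is an illustrative tabular version of mine for finite state and decision sets (the names are not from the notes):

    def solve_dp(states, controls, f, J, psi, N):
        # I holds the optimal cost-to-go; initialize with I(x, 0) = psi(x)
        I = {x: psi(x) for x in states}
        policy = []
        for steps in range(1, N + 1):
            L = N - steps                        # current stage index
            I_new, u_star = {}, {}
            for x in states:
                cost, u = min(((J(x, u, L) + I[f(x, u, L)], u)
                               for u in controls), key=lambda p: p[0])
                I_new[x], u_star[x] = cost, u
            I = I_new
            policy.insert(0, u_star)             # policy[L][x] = u*(L) at state x
        return I, policy

This assumes \(f(x,u,L)\) always lands back on the state grid; for continuous states one either discretizes or carries the minimization analytically, as in the examples that follow.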

2.26 Lecture 26. Thursday, April 14, 2016

2.26.1 Stages in dynamic programming

At stage \(L\), the optimal cost from stage \(L\) with \(N-L\) steps to go is\[ \boxed{ I\left ( x_{L},N-L\right ) =\min _{u\left ( L\right ) \in \Omega _{L}}\left \{ J\left ( x_{L},u\left ( L\right ) \right ) +I\left ( x_{L+1},N-\left ( L+1\right ) \right ) \right \} } \] With appropriate initialization.

Some comments: The trickiest part is how to use this equation. Must be careful. Think of \(u\left ( L\right ) \) as feedback. We call the optimal \(u^{\ast }\left ( L\right ) .\)

Warning. There is a constraint on \(u\). Do not use the derivative to find the optimum without being careful about the limits and constraints; for example, when \(\left \vert u\left ( L\right ) \right \vert \leq 1\) and we have a quadratic form. We will now use an example to show how to use these equations. Let\begin{equation} x_{k+1}=x_{k}-u_{k} \tag{1} \end{equation} where \(u_{k}\) is free to take any value. Let the objective function be\begin{equation} J=\sum _{k=0}^{N-1}\left ( x_{k+1}-u_{k}\right ) ^{2}+u_{k}^{2} \tag{2} \end{equation} reflecting a simple tracking mechanism. We always start at \(x\left ( N-1\right ) \) with one stage to go. Hence the optimal cost from \(x_{N-1}\) with one step to go is\[ I\left ( x_{N-1},1\right ) =\min _{u_{N-1}\in \Omega _{N-1}}\left \{ J\left ( x_{N-1},u_{N-1}\right ) +\Psi \left ( x_{N}\right ) \right \} \] \(\Psi \left ( x_{N}\right ) \) is the terminal cost. Let us now remove it from the rest of the computation to simplify things. We also replace \(J\) in the above from (2) and obtain\begin{align*} I\left ( x_{N-1},1\right ) & =\min _{u_{N-1}}\left \{ \left ( x_{\left ( N-1\right ) +1}-u_{N-1}\right ) ^{2}+u_{N-1}^{2}\right \} \\ & =\min _{u_{N-1}}\left \{ \left ( x_{N}-u_{N-1}\right ) ^{2}+u_{N-1}^{2}\right \} \end{align*}

We want everything in terms of \(x_{N-1}\). So we use (1) to write \(x_{N}=x_{N-1}-u_{N-1}\) and plug it back in the last equation above to obtain\begin{align} I\left ( x_{N-1},1\right ) & =\min _{u_{N-1}}\left \{ \left ( x_{N-1}-u_{N-1}-u_{N-1}\right ) ^{2}+u_{N-1}^{2}\right \} \nonumber \\ & =\min _{u_{N-1}}\left \{ \left ( x_{N-1}-2u_{N-1}\right ) ^{2}+u_{N-1}^{2}\right \} \tag{3} \end{align}

Only now do we take the derivative, in order to find \(u_{N-1}^{\ast }\). Therefore\begin{align*} \frac{dI\left ( x_{N-1},1\right ) }{du_{N-1}} & =0\\ 2\left ( x_{N-1}-2u_{N-1}\right ) \left ( -2\right ) +2u_{N-1} & =0\\ u_{N-1}^{\ast } & =\frac{2}{5}x_{N-1} \end{align*}

Now that we found the optimal \(u_{N-1}^{\ast }\) we go back to (3) and replace \(u_{N-1}\) in (3) by \(u_{N-1}^{\ast }\). Hence\begin{align*} I\left ( x_{N-1},1\right ) & =\left ( x_{N-1}-2\left ( \frac{2}{5}x_{N-1}\right ) \right ) ^{2}+\left ( \frac{2}{5}x_{N-1}\right ) ^{2}\\ & =\frac{1}{5}x_{N-1}^{2} \end{align*}

Now we back up one step. We need to find\begin{equation} I\left ( x_{N-2},2\right ) =\min _{u_{N-2}}\left \{ J\left ( x_{N-2},u_{N-2}\right ) +I\left ( x_{N-1},1\right ) \right \} \tag{4} \end{equation} Notice that we used \(I\left ( x_{N-1},1\right ) \) in place of what we had before, which was the terminal cost \(\Psi \left ( x\left ( N\right ) \right ) \), since now we are two steps back. All the work before was for finding the optimal \(I\left ( x_{N-1},1\right ) \). So now (4) becomes\begin{equation} I\left ( x_{N-2},2\right ) =\min _{u_{N-2}}\left \{ J\left ( x_{N-2},u_{N-2}\right ) +\frac{1}{5}x_{N-1}^{2}\right \} \tag{5} \end{equation} But from (2) \[ J\left ( x_{N-2},u_{N-2}\right ) =\left ( x_{N-1}-u_{N-2}\right ) ^{2}+u_{N-2}^{2}\] Hence (5) becomes\[ I\left ( x_{N-2},2\right ) =\min _{u_{N-2}}\left \{ \left ( x_{N-1}-u_{N-2}\right ) ^{2}+u_{N-2}^{2}+\frac{1}{5}x_{N-1}^{2}\right \} \] We need to use the state equation \(x_{k+1}=x_{k}-u_{k}\) to rewrite \(x_{N-1}\) in the above, since we want everything in \(N-2\) terms. Therefore the above becomes\begin{align} I\left ( x_{N-2},2\right ) & =\min _{u_{N-2}}\left \{ \left ( \left ( x_{N-2}-u_{N-2}\right ) -u_{N-2}\right ) ^{2}+u_{N-2}^{2}+\frac{1}{5}\left ( x_{N-2}-u_{N-2}\right ) ^{2}\right \} \nonumber \\ & =\min _{u_{N-2}}\left \{ \frac{26}{5}u_{N-2}^{2}-\frac{22}{5}u_{N-2}x_{N-2}+\frac{6}{5}x_{N-2}^{2}\right \} \tag{6} \end{align}

Now we take the derivative, to find \(u_{N-2}^{\ast }\)\begin{align*} \frac{dI\left ( x_{N-2},2\right ) }{du_{N-2}} & =0\\ \frac{52}{5}u_{N-2}-\frac{22}{5}x_{N-2} & =0\\ u_{N-2}^{\ast } & =\frac{11}{26}x_{N-2} \end{align*}

Now that we found the optimal \(u_{N-2}^{\ast }\), we go back to (6) and replace \(u_{N-2}\) there with \(u_{N-2}^{\ast }\)\begin{align*} I\left ( x_{N-2},2\right ) & =\frac{26}{5}\left ( \frac{11}{26}x_{N-2}\right ) ^{2}-\frac{22}{5}\left ( \frac{11}{26}x_{N-2}\right ) x_{N-2}+\frac{6}{5}x_{N-2}^{2}\\ & =\frac{7}{26}x_{N-2}^{2} \end{align*}

Reader Carry out one more stage and obtain \(I^{\ast }\left ( x\left ( N-3\right ) ,3\right ) \)

Answer \begin{align*} I\left ( x_{N-3},3\right ) & =\min _{u_{N-3}}\left \{ J\left ( x_{N-3},u_{N-3}\right ) +I\left ( x_{N-2},2\right ) \right \} \\ & =\min _{u_{N-3}}\left \{ J\left ( x_{N-3},u_{N-3}\right ) +\frac{7}{26}x_{N-2}^{2}\right \} \end{align*}

But from (2) \(J\left ( x_{N-3},u_{N-3}\right ) =\left ( x_{N-2}-u_{N-3}\right ) ^{2}+u_{N-3}^{2}\), hence the above becomes\[ I\left ( x_{N-3},3\right ) =\min _{u_{N-3}}\left \{ \left ( x_{N-2}-u_{N-3}\right ) ^{2}+u_{N-3}^{2}+\frac{7}{26}x_{N-2}^{2}\right \} \] We need to use the state equation \(x_{k+1}=x_{k}-u_{k}\) to rewrite \(x_{N-2}\) in the above, since we want everything in \(N-3\) terms. Therefore the above becomes\begin{align} I\left ( x_{N-3},3\right ) & =\min _{u_{N-3}}\left \{ \left ( \left ( x_{N-3}-u_{N-3}\right ) -u_{N-3}\right ) ^{2}+u_{N-3}^{2}+\frac{7}{26}\left ( x_{N-3}-u_{N-3}\right ) ^{2}\right \} \nonumber \\ & =\min _{u_{N-3}}\left \{ \frac{137}{26}u_{N-3}^{2}-\frac{59}{13}u_{N-3}x_{N-3}+\frac{33}{26}x_{N-3}^{2}\right \} \tag{7} \end{align}

Now we take the derivative, to find \(u_{N-3}^{\ast }\)\begin{align*} \frac{dI\left ( x_{N-3},3\right ) }{du_{N-3}} & =0\\ \frac{274}{26}u_{N-3}-\frac{59}{13}x_{N-3} & =0\\ u_{N-3}^{\ast } & =\frac{59}{137}x_{N-3} \end{align*}

Replace this back in (7)\begin{align*} I\left ( x_{N-3},3\right ) & =\frac{137}{26}\left ( \frac{59}{137}x_{N-3}\right ) ^{2}-\frac{59}{13}\left ( \frac{59}{137}x_{N-3}\right ) x_{N-3}+\frac{33}{26}x_{N-3}^{2}\\ & =\frac{40}{137}x_{N-3}^{2} \end{align*}

Let us do one more, with four steps to go.\begin{align*} I\left ( x_{N-4},4\right ) & =\min _{u_{N-4}}\left \{ J\left ( x_{N-4},u_{N-4}\right ) +I\left ( x_{N-3},3\right ) \right \} \\ & =\min _{u_{N-4}}\left \{ J\left ( x_{N-4},u_{N-4}\right ) +\frac{40}{137}x_{N-3}^{2}\right \} \end{align*}

But from (2) \(J\left ( x_{N-4},u_{N-4}\right ) =\left ( x_{N-3}-u_{N-4}\right ) ^{2}+u_{N-4}^{2}\), hence the above becomes\[ I\left ( x_{N-4},4\right ) =\min _{u_{N-4}}\left \{ \left ( x_{N-3}-u_{N-4}\right ) ^{2}+u_{N-4}^{2}+\frac{40}{137}x_{N-3}^{2}\right \} \] We need to use the state equation \(x_{k+1}=x_{k}-u_{k}\) to rewrite \(x_{N-3}\) in the above, since we want everything in \(N-4\) terms. Therefore the above becomes\begin{align} I\left ( x_{N-4},4\right ) & =\min _{u_{N-4}}\left \{ \left ( \left ( x_{N-4}-u_{N-4}\right ) -u_{N-4}\right ) ^{2}+u_{N-4}^{2}+\frac{40}{137}\left ( x_{N-4}-u_{N-4}\right ) ^{2}\right \} \nonumber \\ & =\min _{u_{N-4}}\left \{ \frac{725}{137}u_{N-4}^{2}-\frac{628}{137}u_{N-4}x_{N-4}+\frac{177}{137}x_{N-4}^{2}\right \} \tag{8} \end{align}

Now we take the derivative, to find \(u_{N-4}^{\ast }\)\begin{align*} \frac{dI\left ( x_{N-4},4\right ) }{du_{N-4}} & =0\\ \frac{1450}{137}u_{N-4}-\frac{628}{137}x_{N-4} & =0\\ u_{N-4}^{\ast } & =\frac{314}{725}x_{N-4} \end{align*}

Therefore (8) becomes\begin{align*} I\left ( x_{N-4},4\right ) & =\frac{725}{137}\left ( \frac{314}{725}x_{N-4}\right ) ^{2}-\frac{628}{137}\left ( \frac{314}{725}x_{N-4}\right ) x_{N-4}+\frac{177}{137}x_{N-4}^{2}\\ & =\frac{217}{725}x_{N-4}^{2} \end{align*}

A table of the summary:

\(L\)  \(I\left ( x_{N-L},L\right ) \)
\(1\)  \(0.2\ x_{N-1}^{2}\)
\(2\)  \(0.2692\ x_{N-2}^{2}\)
\(3\)  \(0.29197\ x_{N-3}^{2}\)
\(4\)  \(0.29931\ x_{N-4}^{2}\)


So for \(N=4\):

\(L\)  \(I\left ( x_{4-L},L\right ) \)
\(1\)  \(0.2\ x_{3}^{2}\)
\(2\)  \(0.2692\ x_{2}^{2}\)
\(3\)  \(0.29197\ x_{1}^{2}\)
\(4\)  \(0.29931\ x_{0}^{2}\)


Using \(x_{k+1}=x_{k}-u_{k}\) to write everything in terms of \(x_{0}\):

\(L\)  \(I\left ( x_{4-L},L\right ) \)
\(1\)  \(0.2\ \left ( \left ( \left ( x_{0}-u_{0}\right ) -u_{1}\right ) -u_{2}\right ) ^{2}\)
\(2\)  \(0.2692\ \left ( \left ( x_{0}-u_{0}\right ) -u_{1}\right ) ^{2}\)
\(3\)  \(0.29197\ \left ( x_{0}-u_{0}\right ) ^{2}\)
\(4\)  \(0.29931\ x_{0}^{2}\)


So total cost is

\[ I=0.2\ \left ( \left ( \left ( x_{0}-u_{0}\right ) -u_{1}\right ) -u_{2}\right ) ^{2}+0.2692\ \left ( \left ( x_{0}-u_{0}\right ) -u_{1}\right ) ^{2}+0.29197\ \left ( x_{0}-u_{0}\right ) ^{2}+0.29931\ x_{0}^{2}\] This example is a special case of LQR. What happens in this example, or more generally, when \(N\rightarrow \infty \)? As \(N\rightarrow \infty \) we will see later that the feedback gains become time invariant. This is called steady state LQR, and we will arrive at the Riccati equations.
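
The whole recursion above collapses to two scalar updates: writing \(I\left ( x,L\right ) =p_{L}\,x^{2}\) and minimizing \(\left ( x-2u\right ) ^{2}+u^{2}+p_{L-1}\left ( x-u\right ) ^{2}\) over \(u\) gives the gain \(u^{\ast }=\frac{2+p_{L-1}}{5+p_{L-1}}x\). A small sketch of mine, using exact fractions, checks the table above:

    from fractions import Fraction

    p = Fraction(0)                  # I(x, 0) = 0: no terminal cost
    for L in range(1, 51):
        k = (2 + p) / (5 + p)        # optimal gain: u* = k x
        p = (1 - 2 * k)**2 + k**2 + p * (1 - k)**2
        if L <= 4:
            print(L, k, float(p))    # 0.2, 0.26923, 0.29197, 0.29931

Letting the loop run on, the coefficient \(p_{L}\) (and hence the gain) settles to a constant (about \(0.3028\)), which is exactly the steady state behavior mentioned above.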

2.26.2 Allocation problem, applying DP to investment problem

The next example is an allocation problem. We will do it in two steps. We could solve this without using D.P., but we will use D.P. to illustrate the method. Consider two investments.

1.
Invest $1 in real estate, with return of $2.
2.
Invest $1 in oil, with return of $4.

Let's say we start with a fixed amount of money, \(k\) dollars. We have constraints: \(b_{r}\) is the maximum allowed investment in real estate, and \(b_{o}\) is the maximum allowed investment in oil. To avoid a trivial solution, assume also that \(b_{r}+b_{o}>k\). Let \(u_{0}\) be the amount invested in real estate and \(u_{1}\) the amount invested in oil.

The common sense solution is \(u_{0}^{\ast }=k-b_{o}\) and \(u_{1}^{\ast }=b_{o}\), since the investment in oil has the higher return. Let us do this using D.P. The objective function is\[ J\left ( u\right ) =2u_{0}+4u_{1}\] And the state equation is (we only have two states \(x_{1},x_{0}\))\[ x_{1}=x_{0}-u_{0}\] The initial state is \(x_{0}=k\), which is the money we have at the start. There are two stages, hence \(N=2\). We start with\begin{align*} I\left ( x_{N-1},1\right ) & =I\left ( x_{1},1\right ) \\ & =\max _{u_{1}\in \Omega _{1}}\left \{ J\left ( u\right ) \right \} \end{align*}

i.e. we assume we have one stage to go, and that we have made initial investment in real estate and now we are making investment in oil. Hence\[ \Omega _{1}=\left [ 0\cdots \min \left \{ b_{o},x_{1}\right \} \right ] \] Hence \(u_{1}^{\ast }=\min \left \{ b_{o},x_{1}\right \} \) and\[ I\left ( x_{1},1\right ) =4\min \left \{ b_{o},x_{1}\right \} \] Now backup one step.\begin{align*} I\left ( x_{0},2\right ) & =\max _{u_{0}\in \Omega _{0}}\left \{ J\left ( u\right ) +I\left ( x_{1},1\right ) \right \} \\ & =\max _{u_{0}\in \Omega _{0}}\left \{ 2u_{0}+4\min \left \{ b_{o},x_{1}\right \} \right \} \end{align*}

Where \(\Omega _{0}=\left [ 0\cdots b_{r}\right ] \). Therefore, since \(x_{1}=x_{0}-u_{0}\), the above becomes\[ I\left ( x_{0},2\right ) =\max _{u_{0}\in \Omega _{0}}\left \{ 2u_{0}+4\min \left \{ b_{o},x_{0}-u_{0}\right \} \right \} \] Let \begin{align*} F\left ( u_{0}\right ) & =2u_{0}+4\min \left \{ b_{o},x_{0}-u_{0}\right \} \\ & =2u_{0}+4\min \left \{ b_{o},k-u_{0}\right \} \end{align*}

Find the maximum using the graphical method. This gives \(u_{0}^{\ast }=k-b_{o}\), which is the same answer as the common sense approach.
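
The graphical step is also easy to check numerically. A small sketch of mine (the numbers \(k=10\), \(b_{r}=7\), \(b_{o}=6\) are hypothetical and satisfy \(b_{r}+b_{o}>k\)):

    import numpy as np

    k, b_r, b_o = 10.0, 7.0, 6.0
    u0 = np.linspace(0.0, min(b_r, k), 701)       # real estate investment
    F = 2 * u0 + 4 * np.minimum(b_o, k - u0)      # F(u0) from above
    print(u0[np.argmax(F)])                       # about 4.0, i.e. k - b_o

\(F\) rises with slope \(2\) while the oil investment stays saturated at \(b_{o}\), and falls with slope \(-2\) once real estate starts crowding out oil, so the peak is at \(u_{0}^{\ast }=k-b_{o}\), agreeing with the graphical method.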

The following diagram shows the solution of the above using the dynamic graph method.

pict
Figure 2.23:Solution to the oil and real estate problem using Branch and Bound graph method

2.27 Lecture 27. Tuesday, April 19, 2016

2.27.1 LQR and dynamic programming

In the following, some of the terms are re-written with the index as a subscript, as it is easier to read on the screen. Hence instead of \(x\left ( k\right ) \) as was done in the lecture, we write \(x_{k}\), and \(u\left ( k\right ) \) becomes \(u_{k}\), and so on.

Now we start with the state equation\[ \boxed{ x_{k+1}=Ax_{k}+Bu_{k} } \] In the above, \(A\) is \(n\times n\) and \(B\) is \(n\times m\) where \(m\) is the number of inputs; \(u_{k}\) is a column vector of size \(m\), while \(x_{k+1}\) and \(x_{k}\) are column vectors of size \(n\). In continuous time, the above is\[ \boxed{ \dot{x}\left ( t\right ) =Ax\left ( t\right ) +Bu\left ( t\right ) } \] We can also allow time varying \(A,B\) in the above, but in this discussion we assume these are constant matrices. The goal is to make \(x\left ( k\right ) \) track, or approach, some desired \(x^{d}\left ( k\right ) \) with as little effort \(u\left ( k\right ) \) as possible (what about also doing it as fast as possible at the same time?). \(u\left ( k\right ) \) is called the control effort. This diagram illustrates this

pict
Figure 2.24:Goal is to track desired path

When \(x^{d}\left ( \infty \right ) =0\), this means we want to bring the system to a stable state. We want now to quantify the goal \(J\left ( u\right ) \). So the problem is to bring the state \(x\left ( k\right ) \) to zero using as small an effort \(u\) as possible. We write the cost \(J\left ( u\right ) \) as\[ \boxed{ J\left ( u\right ) =\sum _{k=0}^{N-1}x_{k+1}^{T}Qx_{k+1}+u_{k}^{T}Ru_{k} } \] This is just something we have to accept as given. We did not derive this, but it makes sense. Notice for example, when \(x\left ( k\right ) \) is small, then \(J\left ( u\right ) \) is small. But from now on, we just use the above as given. In the above, \(Q\) and \(R\) are called weighting matrices. For example, if \(u\left ( 1\right ) \) is more important than \(u\left ( 2\right ) \), we adjust the values in \(R\) to reflect the different weights we want to assign. In the above, \(Q\) is \(n\times n\) and \(R\) is \(m\times m\). Both \(Q\) and \(R\) are positive definite and symmetric. Now we start using the Bellman dynamic equations.\begin{align} I\left ( x_{N-1},1\right ) & =\min _{u_{N-1}\in \Omega _{N-1}}\left \{ J\left ( x_{N-1},u_{N-1}\right ) +\Psi \left ( x_{N}\right ) \right \} \nonumber \\ & =\min _{u_{N-1}\in \Omega _{N-1}}\left \{ x_{N}^{T}Qx_{N}+u_{N-1}^{T}Ru_{N-1}\right \} \tag{1} \end{align}

We ignore the terminal cost \(\Psi \left ( x_{N}\right ) \) for now. Be careful with the indices. Notice in the above, after replacing \(J\left ( u\right ) \), that \(x\) has index \(N\) while \(u\) has index \(N-1\). This is due to how \(J\left ( u\right ) \) is given to us, which has \(x_{k+1}\) in it already. Eq (1) is our starting point. Now we apply the dynamic equations recursively. First, we substitute the state equation into the above and obtain\begin{equation} I\left ( x_{N-1},1\right ) =\min _{u_{N-1}\in \Omega _{N-1}}\left \{ \left ( Ax_{N-1}+Bu_{N-1}\right ) ^{T}Q\left ( Ax_{N-1}+Bu_{N-1}\right ) +u_{N-1}^{T}Ru_{N-1}\right \} \tag{2} \end{equation} Notice that now all indices on \(x\) are \(N-1\); this is because the state equation is \(x_{k+1}=Ax_{k}+Bu_{k}\). Notice that (2) is quadratic in \(u_{N-1}\). This is important. Doing one more simplification on (2) gives the following (where the leading \(\min _{u_{N-1}\in \Omega _{N-1}}\) is removed just to make the equations fit; it is assumed on each line in what follows)

\begin{align*} I\left ( x_{N-1},1\right ) & =\left ( \left ( Ax_{N-1}\right ) ^{T}+\left ( Bu_{N-1}\right ) ^{T}\right ) Q\left ( Ax_{N-1}+Bu_{N-1}\right ) +u_{N-1}^{T}Ru_{N-1}\\ & =\left ( x_{N-1}^{T}A^{T}+u_{N-1}^{T}B^{T}\right ) Q\left ( Ax_{N-1}+Bu_{N-1}\right ) +u_{N-1}^{T}Ru_{N-1}\\ & =\left ( x_{N-1}^{T}A^{T}Q+u_{N-1}^{T}B^{T}Q\right ) \left ( Ax_{N-1}+Bu_{N-1}\right ) +u_{N-1}^{T}Ru_{N-1}\\ & =\left ( x_{N-1}^{T}A^{T}Q+u_{N-1}^{T}B^{T}Q\right ) \left ( Ax_{N-1}\right ) +\left ( x_{N-1}^{T}A^{T}Q+u_{N-1}^{T}B^{T}Q\right ) \left ( Bu_{N-1}\right ) +u_{N-1}^{T}Ru_{N-1}\\ & =x_{N-1}^{T}A^{T}QAx_{N-1}+u_{N-1}^{T}B^{T}QAx_{N-1}+x_{N-1}^{T}A^{T}QBu_{N-1} + u_{N-1}^{T}B^{T}QBu_{N-1} +u_{N-1}^{T}Ru_{N-1}\\ & =\left ( u_{N-1}^{T}B^{T}QBu_{N-1}+u_{N-1}^{T}Ru_{N-1}\right ) + u_{N-1}^{T}B^{T}QAx_{N-1}+x_{N-1}^{T}A^{T}QBu_{N-1}+x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1} + \left ( u_{N-1}^{T}B^{T}QAx_{N-1}+x_{N-1}^{T}A^{T}QBu_{N-1}\right ) +\left ( x_{N-1}^{T} A^{T}QAx_{N-1}\right ) \\ & =u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1} + \left ( u_{N-1}^{T}B^{T}QAx_{N-1}+\left ( \left ( QBu_{N-1}\right ) ^{T}\left ( x_{N-1}^{T}A^{T}\right ) ^{T}\right ) ^{T}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1} + \left ( u_{N-1}^{T}B^{T}QAx_{N-1}+\left ( \left ( QBu_{N-1}\right ) ^{T}\left ( Ax_{N-1}\right ) \right ) ^{T}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1} +\left ( u_{N-1}^{T}B^{T}QAx_{N-1}+\left ( \left ( u_{N-1}^{T}\left ( QB\right ) ^{T}\right ) \left ( Ax_{N-1}\right ) \right ) ^{T}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1} +\left ( u_{N-1}^{T}B^{T}QAx_{N-1}+\left ( u_{N-1}^{T}\left ( B^{T}Q^{T}\right ) \left ( Ax_{N-1}\right ) \right ) ^{T}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1} +\left ( u_{N-1}^{T}B^{T}QAx_{N-1}+\left ( u_{N-1}^{T}B^{T}Q^{T}Ax_{N-1}\right ) ^{T}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1} \end{align*}

But \(Q^{T}=Q\) and the above becomes\begin{align*} I\left ( x_{N-1},1\right ) &=u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1}+\left ( u_{N-1}^{T}B^{T}QAx_{N-1}+\left ( u_{N-1}^{T}B^{T}QAx_{N-1}\right ) ^{T}\right ) \\ & \qquad +x_{N-1}^{T}A^{T}QAx_{N-1} \tag{2} \end{align*}

Since \(H=u_{N-1}^{T}B^{T}QAx_{N-1}\) is a scalar, \(H^{T}=H\), hence \[ \boxed{H=\frac{1}{2}\left ( H+H^{T}\right ) } \] Applying this to the middle term \(\left ( u_{N-1}^{T}B^{T}QAx_{N-1}+\left ( u_{N-1}^{T}B^{T}QAx_{N-1}\right ) ^{T}\right ) =2u_{N-1}^{T}B^{T}QAx_{N-1}\) reduces (2) to \begin{align*} I\left ( x_{N-1},1\right ) &=\min _{u_{N-1}\in \Omega _{N-1}}\left \{ u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1}+2\left ( u_{N-1}^{T}B^{T}QAx_{N-1}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1}\right \} \tag{3} \end{align*}

The above is in the form \(I=a_{1}u^{2}+2a_{2}u+a_{3}\), therefore it is quadratic in \(u_{N-1}\). Taking derivative w.r.t. \(u_{N-1}\) since this is what we are minimizing \(I\) with respect to, we obtain from (3)\begin{align*} \frac{\partial I\left ( x_{N-1},1\right ) }{\partial u_{N-1}} & =0\\ 0 & =2\left ( B^{T}QB+R\right ) u_{N-1}+2\left ( B^{T}QAx_{N-1}\right ) \end{align*}

\(R\) is positive definite, and \(Q\) is positive definite. Solving for \(u_{N-1}\) gives\begin{align} u_{N-1}^{\ast } & =-\left ( B^{T}QB+R\right ) ^{-1}\left ( B^{T}QAx_{N-1}\right ) \nonumber \\ & =-\left ( B^{T}QB+R\right ) ^{-1}B^{T}QAx_{N-1} \tag{4} \end{align}

Reader \(\left ( B^{T}QB+R\right ) \) is the Hessian. Show it is positive definite matrix. Since the Hessian is P.D., then \(u_{N-1}^{\ast }\) is global min. Eq (4) is linear feedback on state \(\left ( N-1\right ) \). i.e. we write\[ u^{\ast }\left ( N-1\right ) =K\left ( N-1\right ) x\left ( N-1\right ) \] Where \(K\left ( N-1\right ) \) is called the gain matrix which is\[ K\left ( N-1\right ) =-\left ( B^{T}QB+R\right ) ^{-1}B^{T}QA \] In all the expressions below, we see the term \(\left ( B^{T}QB+R\right ) \) repeating. So to simplify things and make the equations smaller, let \[ \fbox{$\Phi =B^TQB+R$}\] Hence\[ K\left ( N-1\right ) =-\Phi ^{-1}B^{T}QA \] And therefore\begin{align*} u_{N-1}^{\ast } & =\left ( -\Phi ^{-1}B^{T}QA\right ) x_{N-1}\\ & =-\left ( B^{T}QB+R\right ) ^{-1}B^{T}QAx_{N-1} \end{align*}

Now we go back to (3) and replace the \(u_{N-1}\) in that expression with \(u_{N-1}^{\ast }\) we found in (4). (we remove the \(\min _{u_{N-1}\in \Omega _{N-1}}\) we had in (3), since it is now the minimum)\begin{align*} I^{\ast }\left ( x_{N-1},1\right ) & =u_{N-1}^{\ast T}\Phi u_{N-1}^{\ast }+2\left ( u_{N-1}^{\ast T}B^{T}QAx_{N-1}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =\left ( \left ( -\Phi ^{-1}B^{T}QA\right ) x_{N-1}\right ) ^{T}\Phi \left ( -\Phi ^{-1}B^{T}QA\right ) x_{N-1}\\ & \qquad +2\left ( \left ( -\Phi ^{-1}B^{T}QA\right ) x_{N-1}\right ) ^{T}\left ( B^{T}QAx_{N-1}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =-x_{N-1}^{T}\left ( \Phi ^{-1}B^{T}QA\right ) ^{T}\Phi \left ( -\Phi ^{-1}B^{T}QA\right ) x_{N-1}-2x_{N-1}^{T}\left ( \Phi ^{-1}B^{T}QA\right ) ^{T}\left ( B^{T}QAx_{N-1}\right ) \\ & \qquad +x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =x_{N-1}^{T}\left [ \left ( \Phi ^{-1}B^{T}QA\right ) ^{T}\left ( B^{T}QA\right ) -2\left ( \Phi ^{-1}B^{T}QA\right ) ^{T}\left ( B^{T}QA\right ) +A^{T}QA\right ] x_{N-1}\\ & =x_{N-1}^{T}\left [ \left ( B^{T}QA\right ) ^{T}\left ( \Phi ^{-1}\right ) ^{T}\left ( B^{T}QA\right ) -2\left ( B^{T}QA\right ) ^{T}\left ( \Phi ^{-1}\right ) ^{T}\left ( B^{T}QA\right ) +A^{T}QA\right ] x_{N-1}\\ & =x_{N-1}^{T}\left [ \left ( QA\right ) ^{T}B\left ( \Phi ^{-1}\right ) ^{T}\left ( B^{T}QA\right ) -2\left ( QA\right ) ^{T}B\left ( \Phi ^{-1}\right ) ^{T}\left ( B^{T}QA\right ) +A^{T}QA\right ] x_{N-1}\\ & =x_{N-1}^{T}\left [ A^{T}Q^{T}B\left ( \Phi ^{-1}\right ) ^{T}\left ( B^{T}QA\right ) -2A^{T}Q^{T}B\left ( \Phi ^{-1}\right ) ^{T}\left ( B^{T}QA\right ) +A^{T}QA\right ] x_{N-1} \end{align*}

But \(Q=Q^{T}\) and \(\left ( \Phi ^{-1}\right ) ^{T}=\Phi ^{-1}\). Note that \(\Phi \) is the Hessian matrix; it is positive definite and symmetric, which is why \(\left ( \Phi ^{-1}\right ) ^{T}=\Phi ^{-1}\). We did not prove this; it was a Reader exercise to show it is positive definite. The above therefore becomes\begin{align*} I^{\ast }\left ( x_{N-1},1\right ) & =x_{N-1}^{T}\left [ A^{T}QB\Phi ^{-1}B^{T}QA-2A^{T}QB\Phi ^{-1}B^{T}QA+A^{T}QA\right ] x_{N-1}\\ & =x_{N-1}^{T}\left [ A^{T}QA-A^{T}QB\Phi ^{-1}B^{T}QA\right ] x_{N-1}\\ & =x_{N-1}^{T}A^{T}Q\left [ I-B\Phi ^{-1}B^{T}Q\right ] Ax_{N-1}\\ & =x_{N-1}^{T}A^{T}Q\left [ Q^{-1}-B\Phi ^{-1}B^{T}\right ] QAx_{N-1} \end{align*}

Let\[ M_{N-1}=A^{T}Q\left [ Q^{-1}-B\Phi ^{-1}B^{T}\right ] QA \] Then\[ I^{\ast }\left ( x_{N-1},1\right ) =x_{N-1}^{T}M_{N-1}x_{N-1}\] Now that we found \(I^{\ast }\left ( x_{N-1},1\right ) \), we go back one more step\begin{align} I\left ( x_{N-2},2\right ) & =\min _{u_{N-2}\in \Omega _{N-2}}\left \{ J\left ( x_{N-2},u_{N-2}\right ) +I^{\ast }\left ( x_{N-1},1\right ) \right \} \nonumber \\ & =\min _{u_{N-2}\in \Omega _{N-2}}\left \{ x_{N-1}^{T}Qx_{N-1}+u_{N-2}^{T}Ru_{N-2}+I^{\ast }\left ( x_{N-1},1\right ) \right \} \nonumber \\ & =\min _{u_{N-2}\in \Omega _{N-2}}\left \{ x_{N-1}^{T}Qx_{N-1}+u_{N-2}^{T}Ru_{N-2}+x_{N-1}^{T}M_{N-1}x_{N-1}\right \} \nonumber \\ & =\min _{u_{N-2}\in \Omega _{N-2}}\left \{ x_{N-1}^{T}\left ( Q+M_{N-1}\right ) x_{N-1}+u_{N-2}^{T}Ru_{N-2}\right \} \tag{6} \end{align}

Think of \(\left ( Q+M_{N-1}\right ) \) as the new \(Q\) matrix at stage \(N-2\). We need to replace everything to be at the \(N-2\) stage, using the state equation \(x_{k+1}=Ax_{k}+Bu_{k}\); then\[ x_{N-1}=Ax_{N-2}+Bu_{N-2}\] Hence (6) becomes\[ I\left ( x_{N-2},2\right ) =\min _{u_{N-2}\in \Omega _{N-2}}\left \{ \left ( Ax_{N-2}+Bu_{N-2}\right ) ^{T}\left ( Q+M_{N-1}\right ) \left ( Ax_{N-2}+Bu_{N-2}\right ) +u_{N-2}^{T}Ru_{N-2}\right \} \] As above, remove \(\min _{u_{N-2}\in \Omega _{N-2}}\) in what follows so that the equations fit on the page, and let \[ \fbox{$Q^{\prime }=Q+M_{N-1}$}\] Then\begin{align} I\left ( x_{N-2},2\right ) & =\left ( \left ( Ax_{N-2}\right ) ^{T}+\left ( Bu_{N-2}\right ) ^{T}\right ) Q^{\prime }\left ( Ax_{N-2}+Bu_{N-2}\right ) +u_{N-2}^{T}Ru_{N-2}\nonumber \\ & =\left ( \left ( Ax_{N-2}\right ) ^{T}Q^{\prime }+\left ( Bu_{N-2}\right ) ^{T}Q^{\prime }\right ) \left ( Ax_{N-2}+Bu_{N-2}\right ) +u_{N-2}^{T}Ru_{N-2}\nonumber \\ & =\left ( Ax_{N-2}\right ) ^{T}Q^{\prime }\left ( Ax_{N-2}+Bu_{N-2}\right ) +\left ( Bu_{N-2}\right ) ^{T}Q^{\prime }\left ( Ax_{N-2}+Bu_{N-2}\right ) +u_{N-2}^{T}Ru_{N-2}\nonumber \\ & =\left ( Ax_{N-2}\right ) ^{T}Q^{\prime }Ax_{N-2}+\left ( Ax_{N-2}\right ) ^{T}Q^{\prime }Bu_{N-2}+\left ( Bu_{N-2}\right ) ^{T}Q^{\prime }Ax_{N-2}+\left ( Bu_{N-2}\right ) ^{T}Q^{\prime }Bu_{N-2}+u_{N-2}^{T}Ru_{N-2}\nonumber \\ & =x_{N-2}^{T}A^{T}Q^{\prime }Ax_{N-2}+x_{N-2}^{T}A^{T}Q^{\prime }Bu_{N-2}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}+u_{N-2}^{T}B^{T}Q^{\prime }Bu_{N-2}+u_{N-2}^{T}Ru_{N-2}\nonumber \\ & =u_{N-2}^{T}\left ( B^{T}Q^{\prime }B+R\right ) u_{N-2}+x_{N-2}^{T}\left ( A^{T}Q^{\prime }A\right ) x_{N-2}+x_{N-2}^{T}A^{T}Q^{\prime }Bu_{N-2}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2} \tag{7} \end{align}

But \begin{align*} x_{N-2}^{T}A^{T}Q^{\prime }Bu_{N-2}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2} & =\left ( u_{N-2}^{T}\left ( x_{N-2}^{T}A^{T}Q^{\prime }B\right ) ^{T}\right ) ^{T}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}\\ & =\left ( u_{N-2}^{T}\left ( \left ( A^{T}Q^{\prime }B\right ) ^{T}x_{N-2}\right ) \right ) ^{T}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}\\ & =\left ( u_{N-2}^{T}\left ( \left ( Q^{\prime }B\right ) ^{T}Ax_{N-2}\right ) \right ) ^{T}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}\\ & =\left ( u_{N-2}^{T}\left ( B^{T}\left ( Q^{\prime }\right ) ^{T}Ax_{N-2}\right ) \right ) ^{T}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2} \end{align*}

But \(Q^{\prime }=\left ( Q^{\prime }\right ) ^{T}\) since it is symmetric, so the above becomes\begin{align} x_{N-2}^{T}A^{T}Q^{\prime }Bu_{N-2}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2} & =\left ( u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}\right ) ^{T}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}\nonumber \\ & =2\left ( u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}\right ) \tag{8} \end{align}

Using \(H=\frac{1}{2}\left ( H+H^{T}\right ) \). Replacing (8) into (7) gives\begin{equation} I\left ( x_{N-2},2\right ) =u_{N-2}^{T}\left ( B^{T}Q^{\prime }B+R\right ) u_{N-2}+x_{N-2}^{T}\left ( A^{T}Q^{\prime }A\right ) x_{N-2}+2\left ( u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}\right ) \tag{9} \end{equation} We now find \(u_{N-2}^{\ast }\)\begin{align*} \frac{\partial I\left ( x_{N-2},2\right ) }{\partial u_{N-2}} & =0\\ 0 & =2u_{N-2}\left ( B^{T}Q^{\prime }B+R\right ) +2B^{T}Q^{\prime }Ax_{N-2} \end{align*}

Hence\[ u_{N-2}^{\ast }=\left ( -\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }A\right ) x_{N-2}\] Where \(Q^{\prime }=Q+M_{N-1}\). Hence\begin{align*} K\left ( N-2\right ) & =-\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }A\\ & =-\left ( B^{T}\left ( Q+M_{N-1}\right ) B+R\right ) ^{-1}B^{T}\left ( Q+M_{N-1}\right ) A \end{align*}

Now we go back to \(I\left ( x_{N-2},2\right ) \) and replace \(u_{N-2}\) with the \(u_{N-2}^{\ast }\) we just found. From (9)\begin{align*} I^{\ast }\left ( x_{N-2},2\right ) & =\left ( -\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }Ax_{N-2}\right ) ^{T}\left ( B^{T}Q^{\prime }B+R\right ) \left ( -\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }Ax_{N-2}\right ) +\\ & \qquad x_{N-2}^{T}\left ( A^{T}Q^{\prime }A\right ) x_{N-2}+2\left ( \left ( -\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }Ax_{N-2}\right ) ^{T}B^{T}Q^{\prime }Ax_{N-2}\right ) \\ & =\left ( B^{T}Q^{\prime }Ax_{N-2}\right ) ^{T}\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }Ax_{N-2}+x_{N-2}^{T}\left ( A^{T}Q^{\prime }A\right ) x_{N-2}\\ & \qquad -2\left ( B^{T}Q^{\prime }Ax_{N-2}\right ) ^{T}\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }Ax_{N-2}\\ & =x_{N-2}^{T}\left ( A^{T}Q^{\prime }A\right ) x_{N-2}-\left ( B^{T}Q^{\prime }Ax_{N-2}\right ) ^{T}\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }Ax_{N-2}\\ & =x_{N-2}^{T}\left [ A^{T}Q^{\prime }A-\left ( B^{T}Q^{\prime }A\right ) ^{T}\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }A\right ] x_{N-2} \end{align*}

Reader Argue that \(I^{\ast }\left ( x_{N-2},2\right ) \) looks like\[ I^{\ast }\left ( x_{N-2},2\right ) =x_{N-2}^{T}M_{N-2}x_{N-2}\] Reader Find \(M_{N-2}\)

\begin{align*} M_{N-2} & =A^{T}Q^{\prime }A-A^{T}Q^{\prime }B\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }A\\ & =A^{T}\left ( Q+M_{N-1}\right ) A-A^{T}\left ( Q+M_{N-1}\right ) B\left ( B^{T}\left ( Q+M_{N-1}\right ) B+R\right ) ^{-1}B^{T}\left ( Q+M_{N-1}\right ) A \end{align*}

But \(M_{N-1}=A^{T}Q\left [ Q^{-1}-B\Phi ^{-1}B^{T}\right ] QA\) with \(\Phi =B^{T}QB+R\), hence

\[ M_{N-2}=A^{T}Q^{\prime }A-A^{T}Q^{\prime }B\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }A,\qquad Q^{\prime }=Q+A^{T}Q\left [ Q^{-1}-B\left ( B^{T}QB+R\right ) ^{-1}B^{T}\right ] QA \]

This is just the \(N-1\) formula repeated with \(Q\) replaced by \(Q^{\prime }=Q+M_{N-1}\); writing it out fully in terms of the original \(A,B,Q,R\) quickly becomes unwieldy.

Summary

\[ \boxed{ \begin{array}{rl} u_{N-i}^{\ast } & =K_{N-i}x_{N-i}\\ I^{\ast }\left ( x_{N-i}\right ) & =x_{N-i}^{T}M_{N-i}x_{N-i} \end{array} }\] Reader Find a formula for \(K\left ( N-3\right ) \).

The expression for \(K_{N-i}\) is\begin{align*} K_{N-1} & =-\left ( B^{T}QB+R\right ) ^{-1}B^{T}QA\\ K_{N-2} & =-\left ( B^{T}\left ( Q+M_{N-1}\right ) B+R\right ) ^{-1}B^{T}\left ( Q+M_{N-1}\right ) A\\ & =-\left ( B^{T}QB+R+B^{T}M_{N-1}B\right ) ^{-1}\left ( B^{T}QA+B^{T}M_{N-1}A\right ) \end{align*}

Where \(M_{N-1}=A^{T}Q\left [ Q^{-1}-B\Phi ^{-1}B^{T}\right ] QA\) and \(\Phi =B^{T}QB+R\), hence \begin{align*} K_{N-2} & =-\left ( B^{T}QB+R+B^{T}\left ( A^{T}Q\left [ Q^{-1}-B\Phi ^{-1}B^{T}\right ] QA\right ) B\right ) ^{-1}\\ & \qquad \left ( B^{T}QA+B^{T}\left ( A^{T}Q\left [ Q^{-1}-B\Phi ^{-1}B^{T}\right ] QA\right ) A\right ) \\ & =-\left ( B^{T}QB+R+B^{T}\left ( A^{T}Q\left [ Q^{-1}-B\left ( B^{T}QB+R\right ) ^{-1}B^{T}\right ] QA\right ) B\right ) ^{-1}\\ & \qquad \hspace{0.1in}\hspace{1in}\left ( B^{T}QA+B^{T}\left ( A^{T}Q\left [ Q^{-1}-B\left ( B^{T}QB+R\right ) ^{-1}B^{T}\right ] QA\right ) A\right ) \end{align*}

The final optimal cost will be\[ J^{\ast }=x_{0}^{T}M_{0}x_{0}\]
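
The backward recursion summarized above fits in a few lines of code. A numeric sketch of mine (assuming numpy; it uses the matrices of the Reader example in the next subsection) that generates \(K\left ( N-1\right ) ,K\left ( N-2\right ) ,\dots \):

    import numpy as np

    A = np.array([[1.0, -2.0], [-1.0, 3.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.array([[2.0, 1.0], [1.0, 2.0]])
    R = np.array([[1.0]])
    N = 3

    M = np.zeros((2, 2))          # I*(x, 0) = 0: no terminal cost
    for i in range(1, N + 1):
        Qp = Q + M                # Q' = Q + M from the stage ahead
        S = np.linalg.solve(B.T @ Qp @ B + R, B.T @ Qp @ A)
        K = -S                    # u* = K x at this stage
        M = A.T @ Qp @ A - A.T @ Qp @ B @ S
        print(f"K(N-{i}) = {K}")

The first two iterations reproduce \(K\left ( 2\right ) =\begin{pmatrix} \frac{1}{3} & -\frac{4}{3}\end{pmatrix} \) and \(K\left ( 1\right ) =\begin{pmatrix} \frac{8}{7} & -\frac{16}{5}\end{pmatrix} \) computed below, and the third iteration gives \(K\left ( 0\right ) \).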

2.27.2 Example LQR using dynamic programming

Reader Let \(A=\begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} ,B=\begin{pmatrix} 0\\ 1 \end{pmatrix} ,Q=\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \) and \(R=1\), the weight on the input. For \(N=3\), find \(K\left ( 2\right ) ,K\left ( 1\right ) ,K\left ( 0\right ) \) by solving the LQR problem. We will get optimal gains \(K\left ( i\right ) \), and these will not be the same. We do not like time varying gains, as in this case; we like the gain matrix to be constant, as it is easier to manage and safer to use. If we make \(N\) very large, then the gain becomes constant. We start from very large \(N\) and go back to zero.

Solution

\(N=3,\)\begin{align*} K\left ( N-1\right ) & =-\left ( B^{T}QB+R\right ) ^{-1}B^{T}QA\\ K\left ( 2\right ) & =-\left ( \begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} 0\\ 1 \end{pmatrix} +1\right ) ^{-1}\begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} \\ & =-\left ( 2+1\right ) ^{-1}\begin{pmatrix} -1 & 4 \end{pmatrix} \\ & =\frac{-1}{3}\begin{pmatrix} -1 & 4 \end{pmatrix} \\ & =\begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix} \end{align*}

Hence \[ \fbox{$K\left ( 2\right ) =\begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix} =\begin{pmatrix} 0.33333 & -1.3333 \end{pmatrix} $}\] We will work from now on using the state equations directly, as it seems easier. Starting again\begin{align*} I\left ( x\left ( 3-1\right ) ,1\right ) & =\min _{u\left ( 2\right ) }\left \{ J\left ( x_{2},u_{2}\right ) \right \} \\ I\left ( x_{2},1\right ) & =\min _{u_{2}}\left \{ x_{3}^{T}Qx_{3}+u_{2}^{T}Ru_{2}\right \} \end{align*}

But \(x_{3}=Ax_{2}+Bu_{2}\) from state equation, hence the above becomes\begin{align*} I\left ( x_{2},1\right ) & =\min _{u_{2}}\left \{ \left ( Ax_{2}+Bu_{2}\right ) ^{T}Q\left ( Ax_{2}+Bu_{2}\right ) +u_{2}^{T}Ru_{2}\right \} \\ & =\min _{u_{2}}\left \{ \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} x_{2}+\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{2}\right ) ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} x_{2}+\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{2}\right ) +u_{2}^{T}\left ( 1\right ) u_{2}\right \} \\ & =\min _{u_{2}}\left \{ \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ u_{2}\end{pmatrix} \right ) ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ u_{2}\end{pmatrix} \right ) +u_{2}^{2}\right \} \\ & =\min _{u_{2}}\left \{ \begin{pmatrix} x_{11}-2x_{21}\\ u_{2}-x_{11}+3x_{21}\end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} x_{11}-2x_{21}\\ u_{2}-x_{11}+3x_{21}\end{pmatrix} +u_{2}^{2}\right \} \\ & =\min _{u_{2}}\left \{ \left ( x_{11}-2x_{21}\right ) \left ( u_{2}+x_{11}-x_{21}\right ) +\left ( 2u_{2}-x_{11}+4x_{21}\right ) \left ( u_{2}-x_{11}+3x_{21}\right ) +u_{2}^{2}\right \} \\ & =\min _{u_{2}}\left \{ 2u_{2}^{2}-2u_{2}x_{11}+8u_{2}x_{21}+2x_{11}^{2}-10x_{11}x_{21}+14x_{21}^{2}+u_{2}^{2}\right \} \\ & =\min _{u_{2}}\left \{ 3u_{2}^{2}-2u_{2}x_{11}+8u_{2}x_{21}+2x_{11}^{2}-10x_{11}x_{21}+14x_{21}^{2}\right \} \end{align*}

Hence\begin{align*} \frac{\partial I\left ( x_{2},1\right ) }{\partial u_{2}} & =0\\ 0 & =6u_{2}-2x_{11}+8x_{21}\\ 0 & =6u_{2}+\begin{pmatrix} -2 & 8 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \\ 0 & =6u_{2}+\begin{pmatrix} -2 & 8 \end{pmatrix} x_{2}\\ u_{2}^{\ast } & =-\frac{1}{6}\begin{pmatrix} -2 & 8 \end{pmatrix} x_{2}\\ & =\begin{pmatrix} \frac{2}{6} & \frac{-8}{6}\end{pmatrix} x_{2} \end{align*}

Therefore\[ \fbox{$u_2^\ast =\begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} $}\] And \[ \fbox{$K\left ( 2\right ) =\begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix} =$ $\begin{pmatrix} 0.33333 & -1.333\,3 \end{pmatrix} $}\] Which is the same as above using \(-\left ( B^{T}QB+R\right ) ^{-1}B^{T}QA\).

Now we find \(I^{\ast }\left ( x_{2},1\right ) \) by using \(u_{2}^{\ast }\) found above back in \(I\left ( x_{2},1\right ) \)\begin{align*} I^{\ast }\left ( x_{2},1\right ) & =\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{2}\right ) ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \\ & \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{2}\right ) +u_{2}^{T}\left ( 1\right ) u_{2}\\ & \\ & =\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix}\begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \right ) ^{T}\\ & \begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix}\begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \right ) \\ & \\ & +\left ( \begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \right ) ^{T}\begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \\ & =\begin{pmatrix} x_{11}-2x_{21}\\ \frac{5}{3}x_{21}-\frac{2}{3}x_{11}\end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} x_{11}-2x_{21}\\ \frac{5}{3}x_{21}-\frac{2}{3}x_{11}\end{pmatrix} \\ & +x_{11}\left ( \frac{1}{9}x_{11}-\frac{4}{9}x_{21}\right ) -x_{21}\left ( \frac{4}{9}x_{11}-\frac{16}{9}x_{21}\right ) \\ & \\ & =\left ( x_{11}-2x_{21}\right ) \left ( \frac{4}{3}x_{11}-\frac{7}{3}x_{21}\right ) +\left ( \frac{1}{3}x_{11}-\frac{4}{3}x_{21}\right ) \left ( \frac{2}{3}x_{11}-\frac{5}{3}x_{21}\right ) \\ & +x_{11}\left ( \frac{1}{9}x_{11}-\frac{4}{9}x_{21}\right ) -x_{21}\left ( \frac{4}{9}x_{11}-\frac{16}{9}x_{21}\right ) \\ & \\ & =\frac{5}{3}x_{11}^{2}-\frac{22}{3}x_{11}x_{21}+\frac{26}{3}x_{21}^{2} \end{align*}

We have to convert this to the form \(x_{2}^{T}Cx_{2}\) to be able to use it in the next stage, since we need to apply the state equation to it. We see that\[ \frac{5}{3}x_{11}^{2}-\frac{22}{3}x_{11}x_{21}+\frac{26}{3}x_{21}^{2}=\begin{pmatrix} x_{11} & x_{21}\end{pmatrix}\begin{pmatrix} c_{11} & c_{12}\\ c_{12} & c_{22}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \] Solving gives \(\begin{pmatrix} c_{11} & c_{12}\\ c_{12} & c_{22}\end{pmatrix} =\begin{pmatrix} \frac{5}{3} & -\frac{22}{6}\\ -\frac{22}{6} & \frac{26}{3}\end{pmatrix} \), hence\[ \fbox{$I^\ast \left ( x_2,1\right ) =x_2^T\begin{pmatrix} \frac{5}{3} & -\frac{22}{6}\\ -\frac{22}{6} & \frac{26}{3}\end{pmatrix} x_2$}\] Now we find \(I\left ( x\left ( N-2\right ) ,2\right ) =I\left ( x\left ( 3-2\right ) ,2\right ) =I\left ( x_{1},2\right ) \)\begin{align*} I\left ( x_{1},2\right ) & =\min _{u_{1}}\left \{ J\left ( x_{1},u_{1}\right ) +I^{\ast }\left ( x_{2},1\right ) \right \} \\ & =\min _{u_{1}}\left \{ x_{2}^{T}Qx_{2}+u_{1}^{T}Ru_{1}+I^{\ast }\left ( x_{2},1\right ) \right \} \end{align*}

But \(x_{2}=Ax_{1}+Bu_{1}\) from the state equation, hence the above becomes\begin{align*} I\left ( x_{1},2\right ) & =\min _{u_{1}}\left \{ \left ( Ax_{1}+Bu_{1}\right ) ^{T}Q\left ( Ax_{1}+Bu_{1}\right ) +u_{1}^{T}Ru_{1}+I^{\ast }\left ( x_{2},1\right ) \right \} \\ & =\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{1}\right ) ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{1}\right ) +u_{1}^{2}\\ & \qquad +x_{2}^{T}\begin{pmatrix} \frac{5}{3} & -\frac{22}{6}\\ -\frac{22}{6} & \frac{26}{3}\end{pmatrix} x_{2} \end{align*}

Now we have to use the state equation in \(I^{\ast }\left ( x_{2},1\right ) \) to update the last term above; this is important, since everything should be at the same stage\begin{align*} I\left ( x_{1},2\right ) & =3u_{1}^{2}-2u_{1}x_{11}+8u_{1}x_{21}+2x_{11}^{2}-10x_{11}x_{21}+14x_{21}^{2}\\ & \quad +\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ u_{1}\end{pmatrix} \right ) ^{T}\begin{pmatrix} \frac{5}{3} & -\frac{22}{6}\\ -\frac{22}{6} & \frac{26}{3}\end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ u_{1}\end{pmatrix} \right ) \\ & =3u_{1}^{2}-2u_{1}x_{11}+8u_{1}x_{21}+2x_{11}^{2}-10x_{11}x_{21}+14x_{21}^{2}\\ & \quad +\frac{26}{3}u_{1}^{2}-\frac{74}{3}u_{1}x_{11}+\frac{200}{3}u_{1}x_{21}+\frac{53}{3}x_{11}^{2}-\frac{286}{3}x_{11}x_{21}+\frac{386}{3}x_{21}^{2}\\ & =\frac{35}{3}u_{1}^{2}-\frac{80}{3}u_{1}x_{11}+\frac{224}{3}u_{1}x_{21}+\frac{59}{3}x_{11}^{2}-\frac{316}{3}x_{11}x_{21}+\frac{428}{3}x_{21}^{2} \end{align*}

Now we take derivative to find optimal \(u_{1}^{\ast }\)\begin{align*} \frac{\partial I\left ( x_{1},2\right ) }{\partial u_{1}} & =0\\ 0 & =\frac{70}{3}u_{1}-\frac{80}{3}x_{11}+\frac{224}{3}x_{21}\\ u_{1}^{\ast } & =\frac{3}{70}\left ( \frac{80}{3}x_{11}-\frac{224}{3}x_{21}\right ) \\ & =\frac{8}{7}x_{11}-\frac{16}{5}x_{21}\\ & =\begin{pmatrix} \frac{8}{7} & -\frac{16}{5}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \end{align*}

Hence \[ \fbox{$u_1^\ast =\begin{pmatrix} \frac{8}{7} & -\frac{16}{5}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} $}\] And \[ K\left ( 1\right ) =\begin{pmatrix} \frac{8}{7} & -\frac{16}{5}\end{pmatrix} =\begin{pmatrix} 1.1429 & -3.2 \end{pmatrix} \] Therefore, we find \(I^{\ast }\left ( x_{1},2\right ) \) by substituting \(u_{1}^{\ast }=\frac{8}{7}x_{11}-\frac{16}{5}x_{21}\) back into the expansion of \(I\left ( x_{1},2\right ) \) found above, which already carries the \(I^{\ast }\left ( x_{2},1\right ) \) term\begin{align*} I^{\ast }\left ( x_{1},2\right ) & =\frac{35}{3}u_{1}^{\ast 2}-\frac{80}{3}u_{1}^{\ast }x_{11}+\frac{224}{3}u_{1}^{\ast }x_{21}+\frac{59}{3}x_{11}^{2}-\frac{316}{3}x_{11}x_{21}+\frac{428}{3}x_{21}^{2}\\ & =\frac{31}{7}x_{11}^{2}-20x_{11}x_{21}+\frac{116}{5}x_{21}^{2} \end{align*}

We have to write the above as \(x^{T}Cx\) in order to carry it to the next stage. As before, we solve for \(C\) from\begin{align*} \frac{31}{7}x_{11}^{2}-20x_{11}x_{21}+\frac{116}{5}x_{21}^{2} & =\begin{pmatrix} x_{11} & x_{21}\end{pmatrix}\begin{pmatrix} c_{11} & c_{12}\\ c_{12} & c_{22}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \\ & =c_{11}x_{11}^{2}+2c_{12}x_{11}x_{21}+c_{22}x_{21}^{2} \end{align*}

Hence  \(c_{11}=\frac{31}{7},c_{22}=\frac{116}{5},c_{12}=-10\), therefore\[ I^{\ast }\left ( x_{1},2\right ) =\begin{pmatrix} x_{11} & x_{21}\end{pmatrix}\begin{pmatrix} \frac{31}{7} & -10\\ -10 & \frac{116}{5}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \] Now we go back one more step to find \(K\left ( 0\right ) \). We find \(I\left ( x\left ( N-3\right ) ,3\right ) =I\left ( x\left ( 0\right ) ,3\right ) \)\begin{align*} I\left ( x_{0},3\right ) & =\min _{u_{0}}\left \{ J\left ( x_{0},u_{0}\right ) +I^{\ast }\left ( x_{1},2\right ) \right \} \\ & =\min _{u_{0}}\left \{ x_{1}^{T}Qx_{1}+u_{0}^{T}Ru_{0}+I^{\ast }\left ( x_{1},2\right ) \right \} \end{align*}

But \(x_{1}=Ax_{0}+Bu_{0}\) from the state equation, hence the above becomes\begin{align*} I\left ( x_{0},3\right ) & =\min _{u_{0}}\left \{ \left ( Ax_{0}+Bu_{0}\right ) ^{T}Q\left ( Ax_{0}+Bu_{0}\right ) +u_{0}^{T}Ru_{0}+I^{\ast }\left ( x_{1},2\right ) \right \} \\ & =\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{0}\right ) ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{0}\right ) +u_{0}^{2}\\ & +x_{1}^{T}\begin{pmatrix} \frac{31}{7} & -10\\ -10 & \frac{116}{5}\end{pmatrix} x_{1} \end{align*}

Now we have to use the state equation in \(I^{\ast }\left ( x_{1},2\right ) \) to update the last term above. This is important, since everything should be expressed at the same stage. Both quadratic forms then act on the same vector \(Ax_{0}+Bu_{0}\), so they combine using \(Q+C=\begin{pmatrix} \frac{45}{7} & -9\\ -9 & \frac{126}{5}\end{pmatrix} \)\begin{align*} I\left ( x_{0},3\right ) & =\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ u_{0}\end{pmatrix} \right ) ^{T}\begin{pmatrix} \frac{45}{7} & -9\\ -9 & \frac{126}{5}\end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ u_{0}\end{pmatrix} \right ) +u_{0}^{2}\\ & =\frac{45}{7}\left ( x_{11}-2x_{21}\right ) ^{2}-18\left ( x_{11}-2x_{21}\right ) \left ( u_{0}-x_{11}+3x_{21}\right ) +\frac{126}{5}\left ( u_{0}-x_{11}+3x_{21}\right ) ^{2}+u_{0}^{2}\\ & \\ & =\frac{131}{5}u_{0}^{2}-\frac{342}{5}u_{0}x_{11}+\frac{936}{5}u_{0}x_{21}+\frac{1737}{35}x_{11}^{2}-\frac{9342}{35}x_{11}x_{21}+\frac{12618}{35}x_{21}^{2} \end{align*}

Now we take the derivative to find the optimal \(u_{0}^{\ast }\)\begin{align*} \frac{\partial I\left ( x_{0},3\right ) }{\partial u_{0}} & =0\\ 0 & =\frac{262}{5}u_{0}-\frac{342}{5}x_{11}+\frac{936}{5}x_{21} \end{align*}

Hence\[ u_{0}^{\ast }=\frac{171}{131}x_{11}-\frac{468}{131}x_{21}\] Therefore\[ \fbox{$u_0^\ast =\begin{pmatrix} \frac{171}{131} & -\frac{468}{131}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} =\begin{pmatrix} 1.3053 & -3.5725 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} $}\] And \[ K\left ( 0\right ) =\begin{pmatrix} 1.3053 & -3.5725 \end{pmatrix} \] Matlab's dlqr gives the steady state (infinite horizon) gain and uses the convention \(u=-Kx\), so its signs are reversed relative to ours; note that \(K\left ( 0\right ) \), the third backward iterate, is already close to the steady state gain \(\begin{pmatrix} 1.3195 & -3.6050 \end{pmatrix} \) seen in the Matlab output below. Let us verify \(K\left ( 1\right ) \) using the Bellman dynamic equations we derived earlier which is\begin{align} K\left ( 1\right ) & =-\left ( B^{T}QB+R+B^{T}\left ( A^{T}Q\left [ Q^{-1}-B\left ( B^{T}QB+R\right ) ^{-1}B^{T}\right ] QA\right ) B\right ) ^{-1}\nonumber \\ & \left ( B^{T}QA+B^{T}\left ( A^{T}Q\left [ Q^{-1}-B\left ( B^{T}QB+R\right ) ^{-1}B^{T}\right ] QA\right ) A\right ) \tag{10} \end{align}

Let \(\Delta =B^{T}\left ( A^{T}Q\left [ Q^{-1}-B\left ( B^{T}QB+R\right ) ^{-1}B^{T}\right ] QA\right ) ,\) hence\[ K\left ( 1\right ) =-\left ( B^{T}QB+R+\Delta B\right ) ^{-1}\left ( B^{T}QA+\Delta A\right ) \] We already found \(K\left ( 1\right ) =\begin{pmatrix} 1.1429 & -3.2 \end{pmatrix} \) using the direct method. Just need to verify using the dynamic equations found.\begin{align*} \Delta & =\begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \left [ \begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} ^{-1}-\begin{pmatrix} 0\\ 1 \end{pmatrix} \left ( \begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} 0\\ 1 \end{pmatrix} +1\right ) ^{-1}\begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\right ] \begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} \right ) \\ & =\begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\begin{pmatrix} 1 & -1\\ -1 & 4 \end{pmatrix}\begin{pmatrix} \frac{2}{3} & -\frac{1}{3}\\ -\frac{1}{3} & \frac{1}{3}\end{pmatrix}\begin{pmatrix} 1 & -1\\ -1 & 4 \end{pmatrix} \\ & =\begin{pmatrix} -\frac{11}{3} & \frac{26}{3}\end{pmatrix} \end{align*}

Hence (10) becomes\begin{align*} K\left ( 1\right ) & =-\left ( \begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} 0\\ 1 \end{pmatrix} +1+\begin{pmatrix} -\frac{11}{3} & \frac{26}{3}\end{pmatrix}\begin{pmatrix} 0\\ 1 \end{pmatrix} \right ) ^{-1}\left ( \begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} +\begin{pmatrix} -\frac{11}{3} & \frac{26}{3}\end{pmatrix}\begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} \right ) \\ & =-\begin{pmatrix} -\frac{8}{7} & \frac{16}{5}\end{pmatrix} \\ & =\begin{pmatrix} 1.142\,9 & -3.2 \end{pmatrix} \end{align*}

This matches the direct method used above. Matlab's dlqr produces the corresponding steady state gain with the signs reversed, because dlqr defines the control as \(u=-Kx\) while we write \(u=Kx\).

 
>> A=[1,-2;-1,3]; 
>> B=[0;1]; 
>> Q=[2,1;1,2]; 
>> R=1; 
>> [k,s,e]=dlqr(A,B,Q,R) 
k = 
-1.3195    3.6050 
s = 
7.0346  -10.3887 
-10.3887   28.3824 
e = 
0.2679 
0.1270
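To double check these gains, here is a minimal sketch (my own, not from the lecture) of the same backward recursion in Matlab, using the notes' convention \(u=Kx\); it prints \(K\left ( 2\right ) ,K\left ( 1\right ) ,K\left ( 0\right ) \) in that order, and with more stages the gain approaches the dlqr steady state value up to sign.

A = [1 -2; -1 3]; B = [0; 1];
Q = [2 1; 1 2];   R = 1;
P = zeros(2);                      % no terminal cost
for j = 1:3                        % j = stages to go
    M = Q + P;                     % cost-to-go acts on x(k+1) = A*x + B*u
    K = -(R + B'*M*B)\(B'*M*A)     % prints K(3-j); K(0) = [1.3053 -3.5725]
    P = A'*M*A - (A'*M*B)*((R + B'*M*B)\(B'*M*A));
end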

In Mathematica:
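A minimal sketch, assuming the control-systems function DiscreteRiccatiSolve is available; the resulting gain uses the same \(u=-kx\) convention as dlqr:

a = {{1, -2}, {-1, 3}}; b = {{0}, {1}};
q = {{2, 1}, {1, 2}}; r = {{1}};
p = DiscreteRiccatiSolve[{a, b}, {q, r}];
k = Inverse[r + Transpose[b] . p . b] . Transpose[b] . p . a // N
(* k should match Matlab's dlqr gain above, ~{{-1.3195, 3.6050}} *)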

Next we will talk about variations of dynamic programming: floor and ceiling problems. We might want to optimize the maximum of some variable; for an economy, this could be the ceiling of unemployment. We can modify the dynamic equations to handle these problems. Floor and ceiling objectives violate the additive cost structure we used in D.P., but we can still use the dynamic equations with some modifications.

2.28 Lecture 28. Thursday, April 21, 2016

2.28.1 Variations of dynamic programming, floor and ceiling

Today’s lecture is on variations of dynamic programming. Many integer programming problems can be cast as D.P. The emphasis will be on heuristics more than proofs.

One such variation is when the objective function is in terms of the floor or ceiling of a variable. Another variation is steady state (when the number of stages goes to infinity).

Floor and ceiling To begin, say that \(x_{k}\) is scalar. We are interested in the floor of \(x_{k}\), which is \(\min _{k}x_{k}\). We can also talk about the ceiling, \(\max _{k}x_{k}\). More formally

\[ x_{k+1}=f\left ( x_{k},u_{k}\right ) \]

Where \(u_{k}\in \Omega _{k}\). We introduce a monitoring function \(g\left ( x_{k}\right ) \). Therefore, the floor problem can be stated as

\[ \max _{u_{k}\in \Omega _{k}}\min _{k=1\cdots N}g\left ( x_{k}\right ) \]

For example, the ceiling problem could be to minimize the maximum of unemployment, stated as

\[ \min _{u_{k}\in \Omega _{k}}\max _{k=1\cdots N}g\left ( x_{k}\right ) \]

Modeling our D.P. analysis Assume we are doing the floor problem now. Then we write

\[ I\left ( x_{N-1},1\right ) =\max _{u_{N-1}\in \Omega _{N-1}}g\left ( x_{N}\right ) \]

And for the ceiling

\[ I\left ( x_{N-1},1\right ) =\min _{u_{N-1}\in \Omega _{N-1}}g\left ( x_{N}\right ) \]

From now on, we continue with the ceiling problem. For the next stage we obtain

\[ I\left ( x_{N-2},2\right ) =\min _{u_{N-2}\in \Omega _{N-2}}\left \{ \max \left \{ g\left ( x_{N-1}\right ) ,I^{\ast }\left ( x_{N-1},1\right ) \right \} \right \} \]

And the recursion formula becomes

\[ I\left ( x_{L},N-L\right ) =\min _{u_{L}\in \Omega _{L}}\left \{ \max \left \{ g\left ( x_{L+1}\right ) ,I^{\ast }\left ( x_{L+1},N-L-1\right ) \right \} \right \} \]
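As a concrete illustration, here is a small Matlab sketch of this min-max recursion on a made-up discrete problem (the dynamics, monitor, grid, and control set are my own choices, and here \(\Omega \) is the same at every stage):

f = @(x,u) x + u;                  % made-up state equation
g = @(x) abs(x);                   % made-up monitoring function
U = [-1 0 1];                      % control values allowed each stage
X = -5:5; N = 3;                   % state grid and horizon
I = -inf(size(X));                 % I*(x,0): nothing monitored yet
for stage = 1:N
    Inew = zeros(size(X));
    for i = 1:numel(X)
        best = inf;
        for u = U
            xn = min(max(f(X(i),u), X(1)), X(end));   % clamp to grid
            best = min(best, max(g(xn), I(X == xn))); % ceiling recursion
        end
        Inew(i) = best;
    end
    I = Inew;                      % I*(x,stage) tabulated on the grid
end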

When applying the above, we see one difference from previous examples of D.P.: now we have \(\Omega _{L}\), which might depend on \(u_{L-1},u_{L-2},\cdots \). Now we will go over one example, a simplified economy model. Let \(N\) be the number of years in the planning horizon. Let \(y_{k}\) be the national income in year \(k\), \(c_{k}\) the consumer expenditure, \(I_{k}\) the business investment, and \(u_{k}\) the government expenditure. The constraint is

\[ \sum _{k=0}^{N-1}u_{k}\leq U \]

Where \(U\) is the budget given and \(u_{k}\geq 0\). Hence in a given year we have

\begin{align} y_{k} & =c_{k}+I_{k}+u_{k-1}\tag{1}\\ c_{k} & =\alpha y_{k-1}\tag{2} \end{align}

Where \(\alpha \) is the propensity to consume. \begin{equation} I_{k}=\beta \left ( c_{k}-c_{k-1}\right ) \tag{3} \end{equation} Reader Eliminate \(c_{k},I_{k}\) from (1,2,3) to obtain\begin{equation} y_{k}=\left ( 1+\beta \right ) \alpha y_{k-1}-\alpha \beta y_{k-2}+u_{k-1}\tag{4} \end{equation} Our goal is to control the output \(y_{k}\) using the input \(u_{k}\).

Reader Let \(z_{k}=y_{k}-y_{k-1}\)

Therefore, now \[ y_{k-2}=y_{k-1}-z_{k-1}\] Substituting the above in (4) gives\begin{align*} y_{k} & =\left ( 1+\beta \right ) \alpha y_{k-1}-\alpha \beta \left ( y_{k-1}-z_{k-1}\right ) +u_{k-1}\\ & =\alpha y_{k-1}+\alpha \beta z_{k-1}+u_{k-1} \end{align*}

We have state equation\[ y_{k+1}=\alpha y_{k}+\alpha \beta z_{k}+u_{k}\] Next we want state equation for \(z_{k}\)

Reader\[ z_{k+1}=\left ( \alpha -1\right ) y_{k}+\alpha \beta z_{k}+u_{k}\] Now define the states \(x_{1}\left ( k\right ) =y_{k}\) and \(x_{2}\left ( k\right ) =z_{k}\), hence the state equation in matrix form becomes\[ x_{k+1}=\begin{pmatrix} \alpha & \alpha \beta \\ \alpha -1 & \alpha \beta \end{pmatrix} x_{k}+\begin{pmatrix} 1\\ 1 \end{pmatrix} u_{k}\] We want to maximize the floor of the national income \(y_{k}\) using D.P. To illustrate with two stages, i.e. \(N=2\), let \(U=1\) dollar and let \(\alpha =\beta =\frac{1}{2}\). Hence\[\begin{pmatrix} x_{1}\left ( k+1\right ) \\ x_{2}\left ( k+1\right ) \end{pmatrix} =\begin{pmatrix} \frac{1}{2} & \frac{1}{4}\\ \frac{-1}{2} & \frac{1}{4}\end{pmatrix}\begin{pmatrix} x_{1}\left ( k\right ) \\ x_{2}\left ( k\right ) \end{pmatrix} +\begin{pmatrix} 1\\ 1 \end{pmatrix} u_{k}\] The monitoring function is \(g\left ( x\left ( k\right ) \right ) =x_{1}\left ( k\right ) \). A quick numeric check of the state-space form is sketched below.
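A sanity check (my own, with arbitrarily chosen data) that the \(\left ( y_{k},z_{k}\right ) \) state-space form reproduces the second order recursion (4):

alpha = 0.5; beta = 0.5;
A = [alpha alpha*beta; alpha-1 alpha*beta]; B = [1; 1];
y = [1 1.2];                  % y(0), y(1), arbitrary
x = [y(2); y(2)-y(1)];        % state [y; z] at k = 1
u = [0.3 0.7];                % arbitrary spending sequence
for k = 1:2
    y(k+2) = (1+beta)*alpha*y(k+1) - alpha*beta*y(k) + u(k);  % eq (4)
    x = A*x + B*u(k);         % state-space update
end
fprintf('recursion y = %.4f, state-space y = %.4f\n', y(4), x(1))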

Begin with one stage to go

\begin{equation} I\left ( x\left ( 1\right ) ,1\right ) =\max _{u\left ( 1\right ) \in \Omega _{1}}x_{1}\left ( 2\right ) \tag{5} \end{equation}

We know \(u\left ( 1\right ) \leq 1-u\left ( 0\right ) \), since the total budget is \(U=1\).

Hence (5) becomes

\[ I\left ( x\left ( 1\right ) ,1\right ) =\max _{u\left ( 1\right ) \in \left [ 0\cdots 1-u\left ( 0\right ) \right ] }\left ( \frac{1}{2}x_{1}\left ( 1\right ) +\frac{1}{4}x_{2}\left ( 1\right ) +u\left ( 1\right ) \right ) \]

The maximizer is \(u^{\ast }\left ( 1\right ) =1-u\left ( 0\right ) \), therefore

\[ I^{\ast }\left ( x\left ( 1\right ) ,1\right ) =\frac{1}{2}x_{1}\left ( 1\right ) +\frac{1}{4}x_{2}\left ( 1\right ) +1-u\left ( 0\right ) \]

Now we go back one more step

\[ I\left ( x\left ( 0\right ) ,2\right ) =\max _{u\left ( 0\right ) \in \Omega _{0}}\min \left \{ g\left ( x\left ( 1\right ) \right ) ,I^{\ast }\left ( x\left ( 1\right ) ,1\right ) \right \} \]

Where \(\Omega _{0}=\left [ 0\ldots 1\right ] \) since we have one dollar to start with. Hence

\[ I\left ( x\left ( 0\right ) ,2\right ) =\max _{u\left ( 0\right ) \in \left [ 0\ldots 1\right ] }\min \left \{ x_{1}\left ( 1\right ) ,\frac{1}{2}x_{1}\left ( 1\right ) +\frac{1}{4}x_{2}\left ( 1\right ) +1-u\left ( 0\right ) \right \} \]

Substituting everything back in terms of \(x\left ( 0\right ) \) gives an expression in \(u\left ( 0\right ) \)\[ I\left ( x\left ( 0\right ) ,2\right ) =\max _{u\left ( 0\right ) \in \left [ 0\ldots 1\right ] }\min \left \{ \frac{1}{2}x_{1}\left ( 0\right ) +\frac{1}{4}x_{2}\left ( 0\right ) +u\left ( 0\right ) ,\frac{1}{2}\left ( \frac{1}{2}x_{1}\left ( 0\right ) +\frac{1}{4}x_{2}\left ( 0\right ) +u\left ( 0\right ) \right ) +\frac{1}{4}\left ( -\frac{1}{2}x_{1}\left ( 0\right ) +\frac{1}{4}x_{2}\left ( 0\right ) +u\left ( 0\right ) \right ) +1-u\left ( 0\right ) \right \} \]

This reduces to

\[ I\left ( x\left ( 0\right ) ,2\right ) =\max _{u\left ( 0\right ) \in \left [ 0\ldots 1\right ] }\min \left \{ A+u\left ( 0\right ) ,B\right \} \]

Where \begin{align*} A & =\frac{1}{2}x_{1}\left ( 0\right ) +\frac{1}{4}x_{2}\left ( 0\right ) \\ B & =\frac{1}{8}x_{1}\left ( 0\right ) +\frac{3}{16}x_{2}\left ( 0\right ) +1-\frac{1}{4}u\left ( 0\right ) \end{align*}

Hence the function to maximize over \(u\left ( 0\right ) \) is \[ F\left ( u\left ( 0\right ) \right ) =\min \left \{ A+u\left ( 0\right ) ,B\right \} \]

Consider the case \(A+u\left ( 0\right ) \geq B\) and the case \(A+u\left ( 0\right ) <B\).

Reader work out the different cases.
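For instance, with the starting point \(x\left ( 0\right ) =0\) (my own choice, for illustration only), \(A=0\) and \(B=1-\frac{1}{4}u\left ( 0\right ) \); a quick Matlab scan then locates the max-min point:

u = linspace(0, 1, 201);          % budget allows u(0) in [0,1]
F = min(u, 1 - u/4);              % the two branches when x(0) = 0
[Fmax, i] = max(F);
fprintf('u*(0) = %.2f, floor value = %.2f\n', u(i), Fmax)   % both 0.80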

2.29 Lecture 29. Tuesday, April 26, 2016

2.29.1 Detailed example for a floor problem

We will start today with one more dynamic programming problem, which will be useful for the HW; then we will start on steady state. The example is a floor problem. \begin{align*} x_{1}\left ( k+1\right ) & =\min \left \{ x_{1}\left ( k\right ) ,x_{2}\left ( k\right ) \right \} +u\left ( k\right ) \\ x_{2}\left ( k+1\right ) & =x_{1}\left ( k\right ) u\left ( k\right ) \end{align*}

And initial state is \begin{align*} x_{1}\left ( 0\right ) & =1\\ x_{2}\left ( 0\right ) & =-1 \end{align*}

And \[ J=\min _{k=1,2}x_{2}\left ( k\right ) \] With \(\left \vert u\left ( k\right ) \right \vert \leq M\). One step to go is\begin{align*} I\left ( x\left ( 1\right ) ,1\right ) & =\max _{u\left ( 1\right ) \in \Omega _{1}}\left \{ \min J\right \} \\ & =\max _{u\left ( 1\right ) \in \Omega _{1}}x_{2}\left ( 2\right ) \\ & =\max _{\left \vert u\left ( 1\right ) \right \vert \leq M}x_{1}\left ( 1\right ) u\left ( 1\right ) \end{align*}

Hence \(u^{\ast }\left ( 1\right ) =M\operatorname{sign}\left ( x_{1}\left ( 1\right ) \right ) \), therefore\[ I^{\ast }\left ( x\left ( 1\right ) ,1\right ) =M\left \vert x_{1}\left ( 1\right ) \right \vert \] With two steps\begin{align*} I\left ( x\left ( 0\right ) ,2\right ) & =\max _{u\left ( 0\right ) \in \Omega _{0}}\left \{ \min \left \{ g\left ( x\left ( 1\right ) \right ) ,I^{\ast }\left ( x\left ( 1\right ) ,1\right ) \right \} \right \} \\ & =\max _{u\left ( 0\right ) \in \Omega _{0}}\min \left \{ x_{2}\left ( 1\right ) ,M\left \vert x_{1}\left ( 1\right ) \right \vert \right \} \\ & =\max _{u\left ( 0\right ) \in \Omega _{0}}\min \left \{ x_{1}\left ( 0\right ) u\left ( 0\right ) ,M\left \vert \min \left \{ x_{1}\left ( 0\right ) ,x_{2}\left ( 0\right ) \right \} +u\left ( 0\right ) \right \vert \right \} \\ & =\max _{u\left ( 0\right ) \in \Omega _{0}}\min \left \{ u\left ( 0\right ) ,M\left \vert u\left ( 0\right ) -1\right \vert \right \} \end{align*}

Where \(F\left ( u\right ) =\min \left \{ u\left ( 0\right ) ,M\left \vert u\left ( 0\right ) -1\right \vert \right \} \). Consider the case \(0<M\leq \sqrt{2}\) and the case \(M>\sqrt{2}\); see the key solution for HW 7 for the details.
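A small numeric scan (my own) hints at where the two cases split:

for M = [1 2]                     % one value below sqrt(2), one above
    u = linspace(-M, M, 401);     % constraint |u(0)| <= M
    F = min(u, M*abs(u - 1));
    [Fmax, i] = max(F);
    fprintf('M = %g: F* = %.3f at u(0) = %.3f\n', M, Fmax, u(i))
end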

We now start talking about steady state: the notion of functional equations, then the iterative solution of the steady state problem, then the Riccati equation. We begin with\[ x_{k+1}=f\left ( x_{k},u_{k}\right ) \] With a constraint on \(u_{k}\) given by \(u_{k}\in \Omega _{k}\). The cost is \[ J=\Psi \left ( x_{N}\right ) +\sum _{k=0}^{N-1}J_{k}\left ( x_{k},u_{k}\right ) \] Then let \(N\rightarrow \infty \). We are hoping to obtain \(J_{N}\) and the optimal control sequence \(u_{N}^{\ast }=u_{0}^{\ast },u_{1}^{\ast },\cdots ,u_{N-1}^{\ast }\), one control at each stage. Notice that when \(N\) changes, the whole sequence \(u_{N}^{\ast }\) changes, not just one term. We now ask: does \(J_{N}\) converge? Does \(u_{N}^{\ast }\) converge? To make sense of the above, we remove the terminal cost \(\Psi \left ( x_{N}\right ) \) from this analysis. We also want \(\Omega _{k}\) to be fixed for any \(N\). This means we have the same constraints all the time.

If a steady state solution exists, then it satisfies\begin{equation} I\left ( x\right ) =\min _{u\in \Omega }\left \{ J\left ( x,u\right ) +I\left ( f\left ( x,u\right ) \right ) \right \} \tag{1} \end{equation} This is a functional equation. Solving it means finding \(u^{\ast }\). The solution \(u^{\ast }\) in (1) is a function of \(x\), i.e. \(u\) at state \(x\) is a feedback. Say \(u^{\ast }=\sigma \left ( x\right ) \). Substituting this in (1) gives\[ I\left ( x\right ) =J\left ( x,\sigma \left ( x\right ) \right ) +I\left ( f\left ( x,\sigma \left ( x\right ) \right ) \right ) \] We do not know \(I\left ( x\right ) \) here. Before, the cost-to-go from the previous stage was known at each step; now \(I\) itself is the unknown, so we have a function space problem to find \(I\left ( x\right ) \).

Example \begin{align*} J\left ( x,u\right ) & =x^{2}+xu+u^{2}\\ f\left ( x,u\right ) & =xu+x+u\\ x\left ( k+1\right ) & =x\left ( k\right ) u\left ( k\right ) +x\left ( k\right ) +u\left ( k\right ) \end{align*}

Let \(u\) be a free variable with no constraints. Hence (1) becomes\[ I\left ( x\right ) =\min _{u}\left \{ x^{2}+xu+u^{2}+I\left ( xu+x+u\right ) \right \} \]

2.29.2 Functional equations in dynamic programming

This is a functional equation, since \(I\left ( x\right ) \) appears on both sides of the equation. There are two famous methods to solve functional equations. The first is the iterative method: begin with an initial \(I_{0}\left ( x\right ) \), then set \(I_{k+1}\left ( x\right ) =\min _{u}\left \{ J\left ( x,u\right ) +I_{k}\left ( f\left ( x,u\right ) \right ) \right \} \). We get a sequence of minimizers and value functions \(I_{k}\left ( x\right ) \) and check for convergence.

Example \begin{align*} J\left ( x,u\right ) & =u^{2}+\left ( x-u\right ) ^{2}\\ x\left ( k+1\right ) & =x\left ( k\right ) -u\left ( k\right ) \end{align*}

\(I_{0}\left ( x\right ) =0\), hence\[ I_{1}\left ( x\right ) =\min _{u}\left \{ u^{2}+\left ( x-u\right ) ^{2}\right \} \] \(\frac{d}{du}=0\) gives \(u^{\ast }=\frac{x}{2}\), hence \(I_{1}^{\ast }\left ( x\right ) =\frac{x^{2}}{4}+\frac{x^{2}}{4}=\frac{x^{2}}{2}\). Next stage becomes\begin{align*} I_{2}\left ( x\right ) & =\min _{u}\left \{ u^{2}+\left ( x-u\right ) ^{2}+I_{1}^{\ast }\right \} \\ & =\min _{u}\left \{ u^{2}+\left ( x-u\right ) ^{2}+\frac{\left ( x-u\right ) ^{2}}{2}\right \} \end{align*}

\(\frac{d}{du}=0\) gives \(u^{\ast }=\frac{3}{5}x\), hence \(I_{2}^{\ast }\left ( x\right ) =\frac{3}{5}x^{2}\).

Reader Continue this process. Does it converge? It will converge eventually, leading to \(u^{\ast }\rightarrow kx\) for a constant \(k\); see the sketch below.
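A sketch of the iteration for this example, tracking only the coefficient \(p_{k}\) in \(I_{k}\left ( x\right ) =p_{k}x^{2}\) (the update rule comes from substituting \(u^{\ast }\) back, under my algebra; it reproduces the two iterates above):

p = 0;                            % I_0(x) = 0
for k = 1:30
    c = (1 + p)/(2 + p);          % u* = c*x at this iterate
    p = c^2 + (1 + p)*(1 - c)^2;  % coefficient of I_{k+1}(x) = p*x^2
end
fprintf('p -> %.4f, u* -> %.4f x\n', p, (1 + p)/(2 + p))
% approaches p = (sqrt(5)-1)/2, so u* -> 0.618 x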

2.30 Lecture 30. Thursday, April 28, 2016

The special problem will be returned next Tuesday.

2.30.1 Steady state and functional equations

The plan for today: we have been talking about steady state, which led to functional equations in D.P. So far we talked about the iterative solution; today we will talk about the closed form solution. There is an analogy between differential equations and functional equations: in differential equations, the iterative method is called the Picard iteration method.

For linear state equations, we can get closed form solution for the functional equation. We start with a guess of the solution with a parameter to find. Now we will use our main example to illustrate this.\begin{align*} x_{k+1} & =x_{k}-u_{k}\\ J & =\sum u_{k}^{2}+\left ( x_{k+1}-u_{k}\right ) ^{2} \end{align*}

Notice the cross term with \(x\) and \(u\) in it. Now we guess a form for \(I\). Let \(I=px^{2}\) and then we try to find \(p\). First write \(J\) above so that all indices are the same with the help of the state equation. This will reduce the chance of error later on\begin{align*} J & =\sum u_{k}^{2}+\left ( \left ( x_{k}-u_{k}\right ) -u_{k}\right ) ^{2}\\ & =\sum u_{k}^{2}+x_{k}^{2}+4u_{k}^{2}-4x_{k}u_{k}\\ & =\sum x_{k}^{2}+5u_{k}^{2}-4x_{k}u_{k} \end{align*}

Consider\begin{align} I\left ( x\right ) & =\min _{u\in \Omega }\left \{ J\left ( x,u\right ) +I\left ( f\left ( x,u\right ) \right ) \right \} \nonumber \\ px^{2} & =\min _{u\in \Omega }\left \{ \left ( 5u^{2}+x^{2}-4xu\right ) +p\left ( x-u\right ) ^{2}\right \} \tag{1} \end{align}

\begin{align*} \frac{d\left [ \left ( 5u^{2}+x^{2}-4xu\right ) +p\left ( x-u\right ) ^{2}\right ] }{du} & =0\\ 0 & =10u-4x-2p\left ( x-u\right ) \end{align*}

Solving gives \(u^{\ast }=\frac{2+p}{5+p}x\). Substitute back in (1)\[ px^{2}=\left ( 5\left ( \frac{2+p}{5+p}x\right ) ^{2}+x^{2}-4x\left ( \frac{2+p}{5+p}x\right ) \right ) +p\left ( x-\left ( \frac{2+p}{5+p}x\right ) \right ) ^{2}\] And obtain an equation in \(p\), which clears to the cubic \(p^{3}+8p^{2}+14p-5=0\) with roots \(p=-5\) and \(p=\frac{-3\pm \sqrt{13}}{2}\), i.e. \(p\approx 0.303\) and \(p\approx -3.303\). If everything was done correctly, there should be a positive root; always pick the positive one. This is a special case of LQR. In LQR there is no cross term between \(x,u\), while in the above there was. Reader For \(x\left ( 0\right ) =1\) find \(J^{\ast }\).
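The cubic (my reduction, stated above) can be checked numerically:

p = roots([1 8 14 -5])            % -5, (-3-sqrt(13))/2, (sqrt(13)-3)/2
p = p(p > 0)                      % keep the positive root, ~0.3028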

Example Consider \[ x\left ( k+1\right ) =x\left ( k\right ) +2u\left ( k\right ) \] With constraint \(u\left ( k\right ) \in \left [ -1,1\right ] \)\[ J=\sum _{k=0}^{\infty }e^{x\left ( k+1\right ) }\] Guess \(I=ae^{x}\) then\[ I\left ( x\right ) =\min _{u\in \left [ -1,1\right ] }\left ( e^{x+2u}+ae^{x+2u}\right ) \]

Reader \(u^{\ast }=-1\)

Therefore\[ ae^{x}=\left ( e^{x-2}+ae^{x-2}\right ) \] Solving gives \[ a=\frac{1}{e^{2}-1}>0 \] For LQR, the steady state is given by\begin{align*} x\left ( k+1\right ) & =Ax\left ( k\right ) +Bu\left ( k\right ) \\ J & =\sum _{k=0}^{\infty }x^{T}\left ( k+1\right ) Qx\left ( k+1\right ) +u^{T}\left ( k\right ) Ru\left ( k\right ) \end{align*}

Where \(Q,R\) are weight matrices and are positive definite symmetric. \(I\) should be quadratic in the state \(x\left ( k\right ) \), so the functional equation is\[ I\left ( x\right ) =\min _{u}\left \{ x^{T}Qx+u^{T}Ru+I\left ( Ax+Bu\right ) \right \} \] Guess \(I=x^{T}Px\) and solve for \(P\); this leads to the Riccati matrix equation.\begin{equation} I\left ( x\right ) =\min _{u}\left \{ x^{T}Qx+u^{T}Ru+\left ( Ax+Bu\right ) ^{T}P\left ( Ax+Bu\right ) \right \} \tag{2} \end{equation} Taking the gradient w.r.t. \(u\) and setting it to zero gives\begin{align*} 2Ru+2B^{T}PBu+2B^{T}PAx & =0\\ u^{\ast } & =-\left ( R+B^{T}PB\right ) ^{-1}B^{T}PAx \end{align*}

Back to (2) we find\begin{align*} x^{T}Px & =x^{T}Qx+\left ( -\left ( R+B^{T}PB\right ) ^{-1}B^{T}PAx\right ) ^{T}R\left ( -\left ( R+B^{T}PB\right ) ^{-1}B^{T}PAx\right ) \\ & +\left ( Ax+B\left ( -\left ( R+B^{T}PB\right ) ^{-1}B^{T}PAx\right ) \right ) ^{T}P\left ( Ax+B\left ( -\left ( R+B^{T}PB\right ) ^{-1}B^{T}PAx\right ) \right ) \end{align*}

Solving for \(P\), we obtain the Riccati matrix equation\[ \boxed{ P=A^{T}PA-A^{T}PB\left ( R+B^{T}PB\right ) ^{-1}B^{T}PA+Q } \] In Matlab, use dare() to solve this for \(P\).
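A short sketch for the numeric system used earlier; dare also returns the steady-state gain (with Matlab's \(u=-Kx\) convention), and \(P\) should match the \(s\) printed by dlqr above.

A = [1 -2; -1 3]; B = [0; 1];
Q = [2 1; 1 2];   R = 1;
[P, ~, G] = dare(A, B, Q, R)      % P solves the DARE; G is the gain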

Remarks Are we sure a solution exists? I.e., does there exist a positive definite \(P\) that satisfies the Riccati equation above? Yes, a solution exists if \(\left ( A,B\right ) \) is a controllable system. Notice that the Riccati equation is not linear in \(P\); it is solved numerically. Consider the special case of LQR with one state and one input, so that everything is scalar. We obtain\[ p=a^{2}p-\frac{a^{2}b^{2}p^{2}}{r+b^{2}p}+q \] Solving for \(p\), show that there is a solution \(p>0\). Assume \(b>0\) for controllability.
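A numeric sketch of this scalar case (the sample values are my own):

a = 1.2; b = 1; q = 1; r = 1;                     % arbitrary, b > 0
g = @(p) a^2*p - (a*b*p).^2./(r + b^2*p) + q - p;
p = fzero(g, [0 100])                             % a positive root exists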

2.31 Lecture 31. Tuesday, May 3, 2016

This is the last lecture; the final exam is next lecture. We reviewed the special problem and the results obtained in the different reports. General approaches to solving the special problem included cluster analysis, noisy gradient, and random search.

2.31.1 Final review for final exam

We talked about steady state. The quadratic regulator has no cross coupling terms between \(x\) and \(u\)\[ J=\sum _{k=0}^{\infty }x^{T}\left ( k+1\right ) Qx\left ( k+1\right ) +u^{T}Ru \] For a general regulator, one can get a cross term as in \(J=ax^{2}+bxu+cu^{2}\), but we did not discuss this.

Test 3 will have 4 questions on dynamic programming. With dynamic programming, one can solve the problem using the Bellman equations or using the graphical method. If there are finitely many stages, the state in \(x\left ( k+1\right ) =f\left ( x\left ( k\right ) ,u\left ( k\right ) \right ) \) is discrete, and \(u\) is discrete, then a graphical method can be used. If there are constraints, this will reduce the size of the tree further.

Course review

We can take an integer linear programming problem, which is hard to solve, treat it as a continuous problem under special conditions, and solve it much more easily. We did not discuss non-linear programming and the Kuhn-Tucker conditions, but for many non-linear programming problems it is possible to use the penalty method. There is also large scale linear programming, where sparsity becomes important, and parallel programming becomes important for these problems. For dynamic programming, most books are on continuous time; very few discuss discrete dynamic programming problems.

2.32 Lecture 32. Thursday, May 5, 2016

Final exam