
Optimization Revisit: From Linear Programming to Lagrange

By Z.H. Fu
https://fuzihaofzh.github.io/blog/

Optimization is the foundation of many other popular technologies and is widely used in modern society. In this post, we go through the basics of optimization: we start from the simplest linear programming, extend it to the more general conic version, and finally arrive at Lagrangian optimization. To concentrate on the main framework, we leave out some long proofs, which can easily be found in reference materials. This post is based on the course ENGG5501 taught by Prof. Anthony Man-Cho So at CUHK.

(I) Linear Programming

In linear programming (LP), the objective function and constraints are all linear in the variables. The standard form of a linear program is given as follows:

$$\begin{aligned} (P)\ \ \ \ \ \ v_p^* = \min_\mathbf{x}\ \ \ &\mathbf{c}^T \mathbf{x}\\ \text{s.t. }\ \ &A\mathbf{x}=\mathbf{b}\\ & \mathbf{x} \ge \mathbf{0} \end{aligned}$$

where $\mathbf{x},\mathbf{c}\in \mathbb{R}^n$, $\mathbf{b}\in \mathbb{R}^m$, and $A\in \mathbb{R}^{m \times n}$. Since $\mathbf{x}\ge \mathbf{0}$, the feasible region contains no line, which implies that it has at least one vertex. By a standard theorem, if (P) has at least one vertex, then either the optimal value is $-\infty$ or there exists a vertex that is optimal.
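To make the standard form concrete, here is a minimal sketch (assuming NumPy and SciPy are available; the data are made up for illustration) that solves a small instance of (P):

```python
# Solve a tiny standard-form LP:  min c^T x  s.t.  Ax = b, x >= 0.
# Illustrative data only; linprog's default bounds already encode x >= 0.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])   # a single equality constraint
b = np.array([1.0])

res = linprog(c, A_eq=A, b_eq=b, method="highs")
print(res.x, res.fun)  # an optimal vertex (0, 0, 1) and v_p* = 0
```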

Weak Duality

Observing problem (P), we are minimizing some function. A natural question is whether we can construct a lower bound such that, no matter how we change the variables, the objective function is always at least that bound. Let us now construct such a lower bound.

If we can find a vector $\mathbf{y} \in \mathbb{R}^m$ such that $A^T \mathbf{y}\le \mathbf{c}$, then for any feasible $\mathbf{x}$ we have $\mathbf{b}^T\mathbf{y}=\mathbf{x}^TA^T\mathbf{y}\le \mathbf{c}^T\mathbf{x}$. This is in fact another optimization problem: the inequality holds for every such $\mathbf{y}$, and we obviously want the one that gives the largest lower bound. We write the problem as follows:

$$\begin{aligned} (D)\ \ \ \ \ \ v_d^* = \max_\mathbf{y}\ \ \ &\mathbf{b}^T \mathbf{y}\\ \text{s.t. }\ \ &A^T \mathbf{y}\le \mathbf{c} \end{aligned}$$

Here we find that for any LP primal problem (P), there exists a dual problem (D), whose optimal value is a lower bound on that of (P). This is called weak duality. By “weak”, we mean that only an inequality holds between the two problems ($v_p^* \ge v_d^*$). A natural question is when the primal optimal value equals the dual optimal value ($v_p^* = v_d^*$). The equality is powerful: if $v_p^* = v_d^*$ holds, we can solve the primal problem by solving the dual instead when the primal is too complicated. When the equality holds is studied in strong duality theory.
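Before moving on, we can check weak duality numerically. The following sketch (same made-up data as above) solves both (P) and (D) with SciPy and verifies $v_d^* \le v_p^*$:

```python
# Numerically check weak duality: v_d* <= v_p* (illustrative data).
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

# Primal: min c^T x  s.t.  Ax = b, x >= 0 (default bounds).
primal = linprog(c, A_eq=A, b_eq=b, method="highs")
# Dual: max b^T y  s.t.  A^T y <= c.  linprog minimizes, so negate b;
# y is a free variable, hence bounds=(None, None).
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(None, None), method="highs")

v_p, v_d = primal.fun, -dual.fun
assert v_d <= v_p + 1e-9
print(v_p, v_d)  # here both are 0: a zero gap, as strong duality (below) predicts
```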

Strong Duality

To establish strong duality, we first introduce Farkas's Lemma without proof.

(Farkas's Lemma) Let $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$ be given. Then, exactly one of the following systems has a solution:

$$\begin{aligned} &Ax=b,\ x\ge0 \\ &A^Ty\le 0,\ b^Ty>0 \end{aligned}$$

Farkas's Lemma says that one and only one of the two systems has a solution. This can be used to prove strong duality for LP.

(LP Strong Duality) Suppose that (P) has an optimal solution $x^* \in \mathbb{R}^n$. Then, (D) also has an optimal solution $y^* \in \mathbb{R}^m$, and $c^T x^* = b^T y^*$.

Proof. We claim that the following system is unsolvable: $Ax = b$, $x \ge 0$, $c^T x < c^T x^*$ (since $x^*$ attains the minimal value). We homogenize this system and get another unsolvable system:

$$\begin{bmatrix}A&-b\\c^T&-c^T x^* \end{bmatrix}\begin{bmatrix}x\\t\end{bmatrix}=\begin{bmatrix}0\\-1\end{bmatrix},\quad \begin{bmatrix}x\\t\end{bmatrix}\ge 0$$

By Farkas's Lemma, the alternative system is solvable:

$$\begin{aligned} &\begin{bmatrix}A^T&c\\-b^T&-c^T x^* \end{bmatrix}\begin{bmatrix}y_1\\y_2\end{bmatrix}\le 0,\quad -y_2>0\\ &\Rightarrow\quad A^T\Big(-\frac{y_1}{y_2}\Big)-c \le 0 \quad (1)\\ &\phantom{\Rightarrow\quad} -b^T\Big(-\frac{y_1}{y_2}\Big)+c^Tx^* \le 0 \quad (2) \end{aligned}$$

Denote $y^* = -\frac{y_1}{y_2}$ (both rows above were divided by $-y_2 > 0$). From (1), we have $A^T y^* \le c$; this is dual feasibility, i.e., $y^*$ is feasible for (D). From (2), we have $c^T x^* \le b^T y^*$, and combined with weak duality this gives $c^T x^* = b^T y^*$.

From the above, we can see that the LP problem has a nice property: if the primal problem has an optimal solution, the dual problem automatically has an optimal solution as well, and the duality gap is 0 ($c^T x^* = b^T y^*$).

Complementarity

Since the primal and dual problems have a zero duality gap, we have:

$$0 = c^Tx-b^Ty=x^Tc-(Ax)^Ty=x^Tc-x^TA^Ty=x^T(c-A^Ty).$$

Since $x\ge0$ and $c-A^Ty\ge0$, every term of this sum is non-negative, which means that if $x_i\ne0$ then $(c-A^Ty)_i$ must be 0, and vice versa. As a result, the optimization problem can be converted into solving a constrained linear system:

$$\begin{aligned} &Ax=b,\ x\ge0 &&\text{(primal feasibility)}\\ &A^T y \le c &&\text{(dual feasibility)}\\ &x_i(c-A^Ty)_i=0,\ i=1,\dots,n &&\text{(complementarity)} \end{aligned}$$

We will see later that almost all optimization problems can be solved through primal feasibility, dual feasibility and complementarity, even when they are not linear.
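As a sanity check, the three conditions can be verified numerically on the small instance above (a sketch with the same illustrative data):

```python
# Verify primal feasibility, dual feasibility and complementarity
# on the optimal pair (x, y) returned by the solver.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

x = linprog(c, A_eq=A, b_eq=b, method="highs").x
y = linprog(-b, A_ub=A.T, b_ub=c, bounds=(None, None), method="highs").x

slack = c - A.T @ y                                # dual slack c - A^T y
print(np.allclose(A @ x, b), np.all(x >= -1e-9))   # primal feasibility
print(np.all(slack >= -1e-9))                      # dual feasibility
print(x * slack)                                   # complementarity: all ~0
```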

(II) Conic Linear Programming

From the previous chapter, we see that the constraint of the LP problem is $x\ge0$. However, many practical problems are not constrained to this region. A natural question is whether we can constrain the variables to a more flexible region. In this chapter, we first extend $x\ge 0$ to a more general representation, and with this representation we can describe more constrained optimization problems.

Extending $x\ge 0$ to a Pointed Cone

We can see that by writing $u \ge v$, we actually define a “partial order”. That is, “$\ge$” satisfies:

(a) (Reflexivity) $u \ge u$ for all $u \in \mathbb{R}^n$;
(b) (Anti-Symmetry) $u \ge v$ and $v \ge u$ imply $u=v$ for all $u,v\in \mathbb{R}^n$;
(c) (Transitivity) $u \ge v$ and $v \ge w$ imply $u\ge w$ for all $u,v,w\in \mathbb{R}^n$;
(d) (Homogeneity) for any $u,v\in \mathbb{R}^n$ and $\alpha \ge 0$, if $u \ge v$, then $\alpha u\ge \alpha v$;
(e) (Additivity) for any $u,v,w,z\in \mathbb{R}^n$, if $u\ge v$ and $w\ge z$, then $u+w \ge v+z$.

Now we give a more general representation of these five conditions. We first introduce the pointed cone. We say $K$ is a pointed cone if it has the following properties:

  1. $K$ is non-empty and closed under addition; i.e., $u + v \in K$ whenever $u, v \in K$.
  2. $K$ is a cone; i.e., for any $u\in K$ and $\alpha > 0$, we have $\alpha u\in K$.
  3. $K$ is pointed; i.e., if $u\in K$ and $-u\in K$, then $u = 0$.

Then, we show how 1-3 can represent (a)-(e). Firstly, we define

$$u \ge_K v \Leftrightarrow u-v\in K$$

Using properties 1-3, one can verify that “$\ge_K$” fulfills (a)-(e) one by one. Therefore, “$\ge$” can be represented by a pointed cone. If the pointed cone is $K=\{x:x\ge 0\}$, we recover exactly the constraint $x\ge 0$. There are other pointed cones such as the Lorentz cone $Q^{n+1} = \{(t, x) \in \mathbb{R} \times \mathbb{R}^n : t \ge \|x\|_2\}$ and the positive semidefinite cone $S_+^n = \{X \in S^n : u^T Xu \ge 0, \forall u \in \mathbb{R}^n \}$, where $S^n$ is the space of all symmetric $n \times n$ matrices. We can see that our variable can be not only a vector but also a matrix.
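Membership in these cones is easy to test numerically. Below is a small sketch (the helper names are mine, not standard library functions) checking the Lorentz cone and the PSD cone:

```python
# Membership tests for the Lorentz cone Q^{n+1} and the PSD cone S^n_+.
import numpy as np

def in_lorentz_cone(t, x, tol=1e-9):
    """(t, x) in Q^{n+1}  iff  t >= ||x||_2."""
    return t >= np.linalg.norm(x) - tol

def in_psd_cone(X, tol=1e-9):
    """X in S^n_+  iff  X is symmetric with nonnegative eigenvalues."""
    return np.allclose(X, X.T) and np.linalg.eigvalsh(X).min() >= -tol

print(in_lorentz_cone(2.0, np.array([1.0, 1.0])))        # True: 2 >= sqrt(2)
print(in_psd_cone(np.array([[2.0, -1.0], [-1.0, 2.0]]))) # True: eigenvalues 1, 3
```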

Conic Linear Programming

By conic linear programming (CLP), we mean optimizing a linear objective function under conic constraints. The primal problem takes the form:

$$\begin{aligned} (P)\ \ \ \ \ \ v_p^* = \inf_\mathbf{x}\ \ \ &\mathbf{c} \bullet \mathbf{x}\\ \text{s.t. }\ \ &a_i\bullet \mathbf{x}=b_i,\ i\in[1,m]\\ & \mathbf{x} \ge_K \mathbf{0} \end{aligned}$$

Since the variable may be a matrix ($x$ in a PSD cone), we use the more general inner product $\bullet$ to express the inner product of vectors or matrices. Here we use $\inf$ instead of $\min$ because, unlike LP, a CLP problem may not attain its optimal value. The last constraint $\mathbf{x} \ge_K \mathbf{0}$ can also be written as $\mathbf{x}\in K$.
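For instance, a CLP over the PSD cone is a semidefinite program (SDP). Here is a minimal sketch, assuming the cvxpy package and an SDP-capable solver are installed (the data are made up):

```python
# A tiny CLP over S^2_+:  min C . X  s.t.  A1 . X = 1, X in S^2_+.
import numpy as np
import cvxpy as cp

C = np.array([[2.0, 1.0], [1.0, 2.0]])   # plays the role of c
A1 = np.eye(2)                           # a_1 . X = trace(X)
b1 = 1.0

X = cp.Variable((2, 2), symmetric=True)
constraints = [cp.trace(A1 @ X) == b1, X >> 0]   # X >> 0 means X in S^2_+
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
prob.solve()
print(prob.value)  # equals the smallest eigenvalue of C, here 1.0
```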

Weak Duality

Similar to the LP case, we can derive weak duality for the CLP problem:

$$b^Ty=\sum_{i=1}^m(a_i\bullet x)y_i=\Big(\sum_{i=1}^my_ia_i\Big)\bullet x \le c\bullet x$$

We find that if we require $\big(c-\sum_{i=1}^my_ia_i\big)\bullet x\ge 0$, then $b^Ty$ is a lower bound on $c\bullet x$. Since we already have $x\in K$, we define the dual cone of $K$ as $K^* =\{w\in E:x\bullet w \ge0, \forall x\in K\}$. As a result, whenever $c-\sum_{i=1}^my_ia_i \in K^*$, $b^Ty$ is a lower bound on $c\bullet x$.
Therefore, the dual form of CLP can be expressed as:

$$\begin{aligned} (D)\ \ \ \ \ \ v_d^* = \sup_{\mathbf{y}}\ \ \ &\mathbf{b}^T \mathbf{y}\\ \text{s.t. }\ \ &c-\sum_{i=1}^my_ia_i \in K^* \end{aligned}$$

Sometimes it is not easy to find the dual cone of a pointed cone. Luckily, many commonly used pointed cones are self-dual, i.e., $K^* = K$. For example, $(\mathbb{R}^n_+)^* = \mathbb{R}^n_+$, $(Q^{n+1})^* = Q^{n+1}$, and $(S_+^n)^* = S_+^n$.
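Self-duality of the Lorentz cone can be illustrated (not proved) by sampling: inner products between members of $Q^{n+1}$ are always nonnegative, which shows $Q^{n+1} \subseteq (Q^{n+1})^*$. A quick sketch:

```python
# Sample points of Q^{n+1} and check all pairwise inner products are >= 0,
# since u . w = t_u*t_w + x_u . x_w >= t_u*t_w - ||x_u|| ||x_w|| >= 0.
import numpy as np

rng = np.random.default_rng(0)
n = 4
pts = []
for _ in range(200):
    x = rng.normal(size=n)
    t = np.linalg.norm(x) + abs(rng.normal())   # guarantees t >= ||x||_2
    pts.append(np.concatenate(([t], x)))
P = np.array(pts)

print((P @ P.T).min() >= -1e-9)   # True: every sampled pair has u . w >= 0
```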

Strong Duality

As with LP, after deriving weak duality we are interested in strong duality, i.e., when $b^Ty=c\bullet x$. But since we extended the constraints of the LP problem to a more general form, we must pay a price: Farkas's Lemma, which we used to prove LP strong duality, fails in general. We give a weaker conic Farkas's Lemma, with a stronger requirement, without proof.

(Conic Farkas's Lemma) Let $E$ be a finite-dimensional Euclidean space equipped with an inner product $\bullet$, let $K \subset E$ be a closed pointed cone with non-empty interior, and let $a_1,\cdots , a_m \in E$ and $b \in \mathbb{R}^m$ be given. Suppose that the Slater condition holds; i.e., there exists a $\bar{y} \in \mathbb{R}^m$ such that $-\sum_{i=1}^m \bar{y}_ia_i \in \mathrm{int}(K^* )$. Then, exactly one of the systems (I) and (II) has a solution:

$$\begin{aligned} &(\mathrm{I})\quad a_i\bullet x=b_i,\ i\in[1,m],\quad x\in K\\ &(\mathrm{II})\quad -\sum_{i=1}^m y_ia_i\in K^* ,\quad b^Ty>0 \end{aligned}$$

Observe that the Slater condition is an extra sufficient hypothesis in the conic Farkas's Lemma; compared with the LP version, the lemma is a little weaker. Using the conic Farkas's Lemma, one can prove strong duality for CLP. We state the CLP strong duality without proof.

(CLP Strong Duality)

(a) Suppose that (P) is bounded below and strictly feasible; i.e., there exists a feasible solution $\bar{x}$ to (P) such that $\bar{x}\in \mathrm{int}(K)$. Then, $v_p^* = v_d^*$. Moreover, there exists a feasible solution $(\bar{y},\bar{s})$ to (D) such that $b^T\bar{y}=v_p^* = v_d^*$; i.e., the common optimal value is attained by some dual feasible solution.
(b) Suppose that (D) is bounded above and strictly feasible; i.e., there exists a feasible solution $(\bar{y},\bar{s})$ to (D) such that $\bar{s}\in \mathrm{int}(K^* )$. Then, $v_p^* = v_d^*$. Moreover, there exists a feasible solution $\bar{x}$ to (P) such that $c\bullet \bar{x}=v_p^* = v_d^*$; i.e., the common optimal value is attained by some primal feasible solution.
(c) Suppose that either (P) or (D) is bounded and strictly feasible. Then, given a feasible solution $\bar{x}$ to (P) and a feasible solution $(\bar{y},\bar{s})$ to (D), the following are equivalent:

  • $\bar{x}$ and $(\bar{y},\bar{s})$ are optimal for (P) and (D), respectively.
  • The duality gap is zero; i.e., $c\bullet \bar{x}=b^T\bar{y}$.
  • We have complementary slackness; i.e., $\bar{x}\bullet\bar{s}= 0$.

From the above, we observe that if (P) or (D) has the nice property (a strictly feasible solution), then its counterpart (not itself!), i.e., (D) or (P), attains the optimal value. If both have strictly feasible solutions, then both attain the optimal value.

(III) Lagrangian Duality

From the above chapters, we see that for both LP and CLP we construct a lower bound on the minimal value, and the lower bound is exactly the maximal value of another problem (the dual problem). Under some conditions the lower bound equals the minimal value, so the duality gap is 0. A natural question: can we use the same trick to construct a lower bound for an arbitrary objective function? The answer is yes, as we show below. Consider the problem:

$$\begin{aligned} (P)\ \ \ \ \ \ v_p^* = \inf\ \ \ &f(x)\\ \text{s.t. }\ \ &g_i(x)\le0,\ i=1,...,m_1,\\ & h_j(x)=0,\ j=1,...,m_2,\\ &x\in X \end{aligned}$$

Weak Duality

We will construct the weak duality via a penalty function. First, denote $G(x)=(g_1(x), g_2(x),\cdots,g_{m_1}(x))$ and $H(x)=(h_1(x), h_2(x),\cdots,h_{m_2}(x))$. We define the Lagrangian function as

$$L(x,v,w)=f(x)+v^TG(x)+w^TH(x)$$

where $v\in\mathbb{R}^{m_1}$, $w\in\mathbb{R}^{m_2}$. Using the Lagrangian function, we can rewrite the primal problem (P) as follows:

$$\inf_{x\in X}\sup_{v\ge0,w}L(x,v,w)$$

To see this, observe that the inner $\sup$ acts as a penalty function, because

$$\sup_{v\ge0,w}L(x,v,w)=\left\{\begin{aligned}f(x),\ \ \ &\text{if } G(x)\le 0 \text{ and } H(x)=0\\+\infty,\ \ \ \ &\text{otherwise}\end{aligned}\right.$$

If $x$ violates the constraints, we can find a suitable $v$ or $w$ that drives the value to $+\infty$, so such an $x$ cannot attain the minimum. One can check that for any $\bar{x}\in X$, $\bar{v}\ge0$ and $\bar{w}$, we have:

$$\inf_{x\in X}L(x,\bar{v},\bar{w})\le L(\bar{x},\bar{v},\bar{w})\le \sup_{v\ge0, w}L(\bar{x},v,w)$$

We denote $\inf_{x\in X}L(x,\bar{v},\bar{w})$ by $\theta(\bar{v},\bar{w})$. If $\bar{x}$ is feasible for (P), then $\sup_{v\ge0, w}L(\bar{x},v,w)=f(\bar{x})$, and we get:

$$\theta(\bar{v},\bar{w})\le f(\bar{x})$$

We construct the dual problem as

$$\begin{aligned} (D)\ \ \ \ \ \ v_d^* = \sup_{v\ge0,w}\ \ \ &\theta(v,w) \end{aligned}$$

Its value is a lower bound on $f(\bar{x})$ whenever $\bar{x}$ is feasible for (P).
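As a concrete illustration (a made-up one-dimensional example), take $f(x)=x^2$, $g(x)=1-x\le0$, and $X=\mathbb{R}$. Then

$$\begin{aligned} L(x,v) &= x^2 + v(1-x),\\ \theta(v) &= \inf_{x\in\mathbb{R}} L(x,v) = v - \frac{v^2}{4} \quad (\text{attained at } x = v/2),\\ v_d^* &= \sup_{v\ge0}\theta(v) = \theta(2) = 1 = f(1) = v_p^*, \end{aligned}$$

so here the lower bound is tight and the duality gap is zero.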

Strong Duality

Now we study strong duality for the Lagrangian dual. We first introduce the saddle point, and then show that strong duality is attained at saddle points.

(Saddle Point) We say $(\bar{x},\bar{v},\bar{w})$ is a saddle point if

(1) $\bar{x}\in X$;
(2) $\bar{v}\ge0$;
(3) for all $x\in X$, $v\ge 0$ and $w$: $L(\bar{x},v,w)\le L(\bar{x},\bar{v},\bar{w})\le L(x,\bar{v},\bar{w})$.

The theorem of strong duality is as follows:

Theorem (Strong Duality) $(\bar{x},\bar{v},\bar{w})$ is a saddle point of (P) iff $v_p^* =v_d^*$, $\bar{x}$ is optimal for (P), and $(\bar{v},\bar{w})$ is optimal for (D).
Proof

\Rightarrow:

By (1), $\bar{x}\in X$; moreover, (3) forces $\sup_{v\ge0,w}L(\bar{x},v,w)$ to be finite, which implies $\bar{x}$ is feasible for (P);
By (2), $(\bar{v},\bar{w})$ is feasible for (D);
By (3),

$$\begin{aligned}\theta(\bar{v},\bar{w})&=\inf_{x\in X}L(x,\bar{v},\bar{w})\\ &=L(\bar{x},\bar{v},\bar{w})\\ &=\sup_{v\ge0,w}L(\bar{x},v,w)\\ &=f(\bar{x}) \end{aligned}$$

Combined with weak duality, $\theta(\bar{v},\bar{w})\le v_d^* \le v_p^* \le f(\bar{x})$, so equality holds throughout: $v_p^* =v_d^*$, $\bar{x}$ is optimal for (P), and $(\bar{v},\bar{w})$ is optimal for (D).

\Leftarrow:

Since $\bar{x}$ is optimal for (P) and $(\bar{v},\bar{w})$ is optimal for (D), (1) and (2) hold obviously. It suffices to prove (3). Consider $\theta(\bar{v},\bar{w})=\inf_{x\in X}L(x,\bar{v},\bar{w})\le L(\bar{x},\bar{v},\bar{w})\le\sup_{v\ge0,w}L(\bar{x},v,w)=f(\bar{x})$. Since we already have $\theta(\bar{v},\bar{w})=v_d^* =v_p^* = f(\bar{x})$, all the inequalities above hold with equality, which is exactly (3).

As in previous chapters, we can also write down the primal feasibility, dual feasibility and complementarity conditions.

Saddle Point Optimality Conditions
$(\bar{x},\bar{v},\bar{w})$ is a saddle point $\Leftrightarrow$

(a) (Primal Feasibility) $\bar{x}\in X$, $G(\bar{x}) \le 0$, and $H(\bar{x}) = 0$.
(b) (Dual Feasibility/Lagrangian Optimality) $\bar{v}\ge 0$ and $\bar{x} = \arg \min_{x\in X} L(x, \bar{v}, \bar{w})$.
(c) (Complementarity) $\bar{v}^T G(\bar{x}) = 0$.

Karush–Kuhn–Tucker (KKT) Necessary Conditions

The KKT conditions are necessary for optimality: any local minimum satisfying a regularity condition must satisfy them. They are given as follows (a small numerical instance follows the statement):
(The Karush–Kuhn–Tucker Necessary Conditions) Let $\bar{x}\in S$ be a local minimum of problem (P), where $S$ denotes the feasible set. Let $I = \{i \in \{1,...,m_1\} : g_i(\bar{x}) = 0\}$ be the index set of the active constraints. Suppose that $\bar{x}$ is regular; i.e., the family $\{\nabla g_i(\bar{x})\}_{i\in I} \cup \{\nabla h_j(\bar{x})\}^{m_2}_{j=1}$ of vectors is linearly independent. Then, there exist $v_1,...,v_{m_1} \in \mathbb{R}$ and $w_1,...,w_{m_2} \in \mathbb{R}$ such that

(1) $\nabla f(\bar{x})+\sum_i v_i\nabla g_i(\bar{x})+\sum_jw_j\nabla h_j(\bar{x})=0$;
(2) $v\ge0$;
(3) $v_ig_i(\bar{x})=0,\ \forall i$ (complementarity).
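To make the conditions concrete, here is a sketch for a made-up problem, $\min x_1^2+x_2^2$ s.t. $x_1+x_2=1$. With only an affine equality constraint, (2) and (3) are vacuous, and (1) plus feasibility reduce to a linear system:

```python
# KKT system for  min x1^2 + x2^2  s.t.  x1 + x2 - 1 = 0:
#   (1)  2*x1 + w = 0,  2*x2 + w = 0   (grad f + w * grad h = 0)
#   feasibility:  x1 + x2 = 1
import numpy as np

K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x1, x2, w = np.linalg.solve(K, rhs)
print(x1, x2, w)   # 0.5 0.5 -1.0: the unique KKT point, which is optimal
```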

Note that this requires $\bar{x}$ to satisfy the regularity condition. There are also many other regularity conditions; if any one of them holds, the KKT conditions hold at a local minimum.

(Regularity Condition 2) Consider problem (P), where $g_1, \dots, g_{m_1}$ are convex and $h_1, \dots, h_{m_2}$ are affine. Let $\bar{x}\in S$ be a local minimum and $I = \{i \in \{1,...,m_1\} : g_i(\bar{x}) = 0\}$. Suppose that the Slater condition is satisfied; i.e., there exists an $x' \in S$ such that $g_i(x') < 0$ for $i \in I$. Then, $\bar{x}$ satisfies the KKT conditions.

(Regularity Condition 3) Consider problem (P), where $g_1, \dots, g_{m_1}$ are concave and $h_1, \dots, h_{m_2}$ are affine. Let $\bar{x}\in S$ be a local minimum. Then, $\bar{x}$ satisfies the KKT conditions.

(Regularity Condition 3) is important: since affine constraints are both convex and concave, it implies that when the constraints are all affine, every local minimum satisfies the KKT conditions.

References

[1] Anthony Man-Cho So. ENGG5501 course notes. CUHK.