Optimization underlies many popular technologies and is widely used in modern society. In this post, we will go through the basics that form the foundation of optimization. We will start from the simplest case, linear programming, extend it to a more general conic version, and finally arrive at Lagrangian optimization. In order to concentrate on the main framework of optimization, we leave out some long proofs, which can easily be found in reference materials. This post is based on the course ENGG5501 taught by Prof. Anthony Man-Cho So at CUHK.
(I) Linear Programming
In linear programming (LP), the objective function and constraints are all linear w.r.t. the variables. The standard form of linear programming is given as follows:

$$(P)\qquad v_p^* = \min_x\ c^Tx \quad \text{s.t.} \quad Ax = b,\ x \ge 0$$

where $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$. Since $x \ge 0$, the feasible region contains no line. This implies that it has at least one vertex. A theorem states that if (P) has at least one vertex, then either the optimal value is $-\infty$ or there exists a vertex that is optimal.
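To make the standard form concrete, here is a minimal sketch that solves a small instance with SciPy's `linprog`; the data ($A$, $b$, $c$) are made up purely for illustration:

```python
# A minimal sketch using scipy.optimize.linprog (toy data chosen
# arbitrarily). Standard form: min c^T x  s.t.  Ax = b, x >= 0.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])
b = np.array([4.0, 5.0])
c = np.array([1.0, 2.0, 3.0])

# bounds=(0, None) encodes the constraint x >= 0.
res = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))
print("optimal value v_p* =", res.fun)   # 5.0, attained at a vertex
print("optimal vertex  x* =", res.x)     # ~ [3, 1, 0]
```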
Weak Duality
Observing problem (P), we see that we are minimizing some function. A natural question is whether we can construct a lower bound such that, no matter how we change the variables, the objective function is always at least the lower bound. Let us now construct such a lower bound.
If we can find a vector $y \in \mathbb{R}^m$ such that $A^Ty \le c$, then for any feasible $x$ we have $c^Tx \ge (A^Ty)^Tx = y^T(Ax) = b^Ty$ (the first inequality uses $x \ge 0$). Finding such a $y$ is in fact another optimization problem: the inequality above holds for every such $y$, and obviously we want to find the maximal one to get the best lower bound. We write the problem as follows:

$$(D)\qquad v_d^* = \max_y\ b^Ty \quad \text{s.t.} \quad A^Ty \le c$$
Here, we find that for any LP primal problem (P), there exists a dual problem (D). The optimal value of the dual problem (D) is a lower bound of the optimal value of the primal problem (P). This is called weak duality. By “weak”, we mean that only an inequality relation holds between the two problems ($v_d^* \le v_p^*$). A natural question is when the primal optimal value equals the dual optimal value ($v_p^* = v_d^*$). The equality is powerful because, if $v_p^* = v_d^*$, we can solve the primal problem by solving the dual problem instead whenever the primal problem is too complicated. When the equality holds is studied in strong duality theory.
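As a quick numeric illustration, the sketch below solves the dual of the toy primal from above with `linprog` (negating the objective, since `linprog` minimizes) and compares the two optimal values:

```python
# A hedged sketch of the dual (D): max b^T y  s.t.  A^T y <= c, y free.
# Same toy data as before; linprog minimizes, so we negate b.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])
b = np.array([4.0, 5.0])
c = np.array([1.0, 2.0, 3.0])

primal = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(None, None))

print("v_p* =", primal.fun)   # 5.0
print("v_d* =", -dual.fun)    # 5.0 -- the gap is zero here (strong duality)
```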
Strong Duality
To get the strong duality, we first introduce Farkas’s Lemma without proof.
(Farkas’s Lemma) Let $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$ be given. Then, exactly one of the following systems has a solution:

(I) $Ax = b$, $x \ge 0$;

(II) $A^Ty \le 0$, $b^Ty > 0$.
Farkas’s Lemma shows that one and only one of the two systems has a solution. This can be used to prove the strong duality of the LP problem.
(LP Strong Duality) Suppose that (P) has an optimal solution $x^*$. Then, (D) also has an optimal solution $y^*$, and $c^Tx^* = b^Ty^*$.
Proof. We claim that the following system is unsolvable: $Ax = b$, $x \ge 0$, $c^Tx < c^Tx^*$ (since $x^*$ attains the minimal value). We homogenize this system and get another unsolvable system: find $(x, t) \ge 0$ such that

$$Ax - bt = 0,\qquad c^Tx - (c^Tx^*)\,t = -1.$$

(The right-hand side $-1$ is just a normalization of the strict inequality $c^Tx - (c^Tx^*)\,t < 0$; if $t > 0$ we can rescale to recover the original system, and $t = 0$ would contradict the optimality of $x^*$.) By Farkas’s Lemma applied to this system, there exists $(y_1, y_2) \in \mathbb{R}^m \times \mathbb{R}$ such that

$$A^Ty_1 + y_2\,c \le 0, \qquad (1)$$

$$-b^Ty_1 - (c^Tx^*)\,y_2 \le 0, \qquad (2)$$

and $-y_2 > 0$, i.e., $y_2 < 0$. Denote $y^* = -\frac{y_1}{y_2}$. From (1), dividing by $-y_2 > 0$, we have $A^Ty^* \le c$; this is dual feasibility, which means $y^*$ is feasible for (D). From (2), we have $c^Tx^* \le b^Ty^*$, and combined with the weak duality, we have $c^Tx^* = b^Ty^*$. $\blacksquare$
From the above, we can see that the LP problem has a nice property: if the primal problem has an optimal solution, the dual problem automatically has an optimal solution, and the duality gap is 0 ($c^Tx^* = b^Ty^*$).
Complementarity
Since the primal and dual problems have a zero duality gap, we have:

$$0 = c^Tx^* - b^Ty^* = c^Tx^* - (Ax^*)^Ty^* = (c - A^Ty^*)^Tx^* = s^{*T}x^* = \sum_{i=1}^n s_i^*\,x_i^*,$$

where $s^* = c - A^Ty^* \ge 0$ is the dual slack. Since $s^* \ge 0$ and $x^* \ge 0$, every term of the sum is nonnegative and therefore zero, which means that if $x_i^* > 0$ then $s_i^*$ must be 0 and vice versa. As a result, the optimization problem can be converted into solving a constrained linear system:

$$Ax = b,\ x \ge 0 \ \ (\text{primal feasibility});\qquad A^Ty + s = c,\ s \ge 0 \ \ (\text{dual feasibility});\qquad x_i s_i = 0,\ i = 1,\dots,n \ \ (\text{complementarity}).$$
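The sketch below checks the three conditions numerically on the same toy data (the optimal $x$ and $y$ come from `linprog` rather than from solving the system directly):

```python
# Checking primal feasibility, dual feasibility and complementarity
# at the optimum of the toy LP (a sketch, not a solver).
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])
b = np.array([4.0, 5.0])
c = np.array([1.0, 2.0, 3.0])

x = linprog(c, A_eq=A, b_eq=b, bounds=(0, None)).x
y = linprog(-b, A_ub=A.T, b_ub=c, bounds=(None, None)).x
s = c - A.T @ y                      # dual slack, s >= 0

print("Ax - b       =", A @ x - b)   # ~0  (primal feasibility)
print("dual slack s =", s)           # >= 0 (dual feasibility)
print("x_i * s_i    =", x * s)       # ~0  (complementarity)
```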
We will see later that almost all optimization problems can be solved through primal feasibility, dual feasibility, and complementarity, even when they are not linear.
(II) Conic Linear Programming
From the previous chapter, we see that the constraint of the LP problem is $x \ge 0$. However, a lot of practical problems are not constrained to this region. A natural question is whether we can constrain the variable to a more flexible region. In this chapter, we will first extend $x \ge 0$ to a more general representation, with which we can describe more constrained optimization problems.
Extend to a Pointed Cone
We can see that by writing $x \ge y \iff x - y \ge 0$, we actually define a “partial order” $\succeq$ on $\mathbb{R}^n$. I.e., it satisfies:

(a) (Reflexivity) $x \succeq x$ for all $x$;

(b) (Anti–Symmetry) $x \succeq y$ and $y \succeq x$ imply $x = y$ for all $x, y$;

(c) (Transitivity) $x \succeq y$ and $y \succeq z$ imply $x \succeq z$ for all $x, y, z$;

(d) (Homogeneity) for any $x, y$ and $\alpha > 0$, if $x \succeq y$, then $\alpha x \succeq \alpha y$;

(e) (Additivity) for any $x, y, w, z$, if $x \succeq y$ and $w \succeq z$, then $x + w \succeq y + z$.
Now, we are going to give a more general representation of these five conditions. We first introduce the pointed cone. We say that a set $K$ is a pointed cone if it has the following properties:

- $K$ is non–empty and closed under addition; i.e., $x + y \in K$ whenever $x, y \in K$.
- $K$ is a cone; i.e., for any $x \in K$ and $\alpha \ge 0$, we have $\alpha x \in K$.
- $K$ is pointed; i.e., if $x \in K$ and $-x \in K$, then $x = 0$.
Then, we show how properties 1-3 can represent (a)-(e). Firstly, we define

$$x \succeq_K y \iff x - y \in K.$$

We can prove that “$\succeq_K$” fulfills (a)-(e) one by one using 1-3. Therefore, we can see that “$\ge$” can be represented by a pointed cone. If the pointed cone is $K = \mathbb{R}^n_+$, then $x \succeq_K 0$ is exactly the constraint $x \ge 0$. There are some other pointed cones, such as the Lorentz cone ($\mathcal{L}^n = \{(x, t) \in \mathbb{R}^n \times \mathbb{R} : \|x\|_2 \le t\}$) and the positive semidefinite cone ($\mathcal{S}^n_+ = \{X \in \mathcal{S}^n : X \succeq 0\}$), where $\mathcal{S}^n$ is the space of all symmetric $n \times n$ matrices. We can see that our variable can be not only a vector, but also a matrix.
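As a small illustration, membership in these cones is easy to test numerically; the tolerance handling below is an ad-hoc sketch:

```python
# Membership tests for two common pointed cones (a sketch; tolerances
# are ad hoc). Lorentz cone: ||x||_2 <= t; PSD cone: all eigenvalues >= 0.
import numpy as np

def in_lorentz(x, t, tol=1e-9):
    return np.linalg.norm(x) <= t + tol

def in_psd(X, tol=1e-9):
    X = np.asarray(X)
    if not np.allclose(X, X.T):      # must be symmetric first
        return False
    return np.linalg.eigvalsh(X).min() >= -tol

print(in_lorentz(np.array([3.0, 4.0]), 5.0))       # True: ||(3,4)|| = 5
print(in_psd(np.array([[2.0, 1.0], [1.0, 2.0]])))  # True: eigenvalues 1, 3
```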
Conic Linear Programming
By conic linear programming (CLP), we mean optimizing a linear objective function with conic constraints. The form of the primal problem is defined as:

$$(P)\qquad v_p^* = \inf_x\ c \bullet x \quad \text{s.t.} \quad a_i \bullet x = b_i,\ i = 1,\dots,m,\quad x \in K$$

Since the variable may be a matrix (e.g., in a PSD cone), we use a more general inner product $\bullet$ to express the inner product of vectors or matrices (for matrices, $A \bullet B = \operatorname{tr}(A^TB)$). Here we use $\inf$ instead of $\min$ because, unlike LP, a CLP problem may not attain its optimal value. The last constraint can also be written as $x \succeq_K 0$.
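For instance, with $K = \mathcal{S}^2_+$ and the trace inner product, the primal above is a semidefinite program. Here is a hedged sketch assuming the third-party cvxpy package (any SDP-capable modeling tool would do):

```python
# A sketch of a CLP over the PSD cone (an SDP), assuming cvxpy:
#   min C . X  s.t.  A1 . X = b1, X in S^2_+,
# where "." is the trace inner product; data chosen for illustration.
import numpy as np
import cvxpy as cp

C = np.array([[2.0, 1.0],
              [1.0, 2.0]])
A1 = np.eye(2)
b1 = 1.0

X = cp.Variable((2, 2), PSD=True)        # X in the PSD cone
constraints = [cp.trace(A1 @ X) == b1]   # a_1 . x = b_1 (here: tr(X) = 1)
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
prob.solve()

print("v_p* =", prob.value)   # for this C the optimum is lambda_min(C) = 1
print("X*   =\n", X.value)
```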
Weak Duality
Similar to the LP case, we can also derive weak duality for the CLP problem.
For any $y \in \mathbb{R}^m$ and any primal feasible $x$, we can write $c \bullet x = \sum_{i=1}^m y_i (a_i \bullet x) + \left(c - \sum_{i=1}^m y_i a_i\right) \bullet x = b^Ty + \left(c - \sum_{i=1}^m y_i a_i\right) \bullet x$. We find that if we require $\left(c - \sum_{i=1}^m y_i a_i\right) \bullet x \ge 0$, then $b^Ty$ is a lower bound of $c \bullet x$. Since we already have $x \in K$, we define the dual cone of $K$ as

$$K^* = \{s : s \bullet x \ge 0 \ \text{for all}\ x \in K\}.$$

As a result, for any $y$ with $c - \sum_{i=1}^m y_i a_i \in K^*$, $b^Ty$ is always a lower bound of $v_p^*$.
Therefore, the dual form of CLP can be expressed as:

$$(D)\qquad v_d^* = \sup_y\ b^Ty \quad \text{s.t.} \quad c - \sum_{i=1}^m y_i a_i \in K^*$$
Sometimes it is not easy to find the dual cone of a pointed cone. Luckily, many commonly used pointed cones are self-dual, i.e., $K = K^*$. For example, $\mathbb{R}^n_+$, $\mathcal{L}^n$, and $\mathcal{S}^n_+$ are all self-dual.
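Self-duality is a theorem, not something to prove numerically, but a quick spot-check is still instructive: inner products between members of the Lorentz cone are always nonnegative, which exhibits $\mathcal{L}^n \subseteq (\mathcal{L}^n)^*$ on sampled points.

```python
# A numerical spot-check (not a proof) that the Lorentz cone sits inside
# its dual cone: inner products between sampled members are nonnegative.
import numpy as np

rng = np.random.default_rng(0)

def sample_lorentz(n):
    x = rng.standard_normal(n)
    t = np.linalg.norm(x) + abs(rng.standard_normal())  # ensures ||x|| <= t
    return np.append(x, t)

pts = [sample_lorentz(4) for _ in range(100)]
gram = np.array([[u @ v for v in pts] for u in pts])
print("min inner product:", gram.min())   # nonnegative, as expected
```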
Strong Duality
As with LP, after deriving weak duality we are interested in strong duality, i.e., when $v_p^* = v_d^*$. But since we extended the constraints of the LP problem to a more general form, we must pay a price. The root of the problem is the failure of the Farkas Lemma, which we used to prove LP strong duality. We give a weaker Farkas-type lemma, which needs a stronger requirement, without proof.
(Conic Farkas’ Lemma) Let $E$ be a finite–dimensional Euclidean space equipped with an inner product $\bullet$, $K \subseteq E$ be a closed pointed cone with non–empty interior, and $a_1, \dots, a_m \in E$ and $b \in \mathbb{R}^m$ be given. Suppose that the Slater condition holds; i.e., there exists a $\hat y \in \mathbb{R}^m$ such that $\sum_{i=1}^m \hat y_i a_i \in \operatorname{int}(K)$. Then, exactly one of the systems (I) and (II) has a solution:

(I) $a_i \bullet x = b_i$ for $i = 1,\dots,m$, $x \in K^*$;

(II) $\sum_{i=1}^m y_i a_i \in K$, $b^Ty < 0$.
Observe that the Slater condition is a sufficient condition for the conic Farkas Lemma. Compared with the LP problem's Farkas's Lemma, the conic version is a little weaker. Using the conic Farkas's Lemma, one can prove the strong duality of CLP. We state the CLP strong duality without proof.
(CLP Strong Duality)
(a) Suppose that (P) is bounded below and strictly feasible; i.e., there exists a feasible solution $\bar x$ to (P) such that $\bar x \in \operatorname{int}(K)$. Then, we have $v_p^* = v_d^*$. Moreover, there exists a feasible solution $\bar y$ to (D) such that $b^T\bar y = v_p^* = v_d^*$; i.e., the common optimal value is attained by some dual feasible solution.
(b) Suppose that (D) is bounded above and strictly feasible; i.e., there exists a feasible solution $\bar y$ to (D) such that $c - \sum_{i=1}^m \bar y_i a_i \in \operatorname{int}(K^*)$. Then, we have $v_p^* = v_d^*$. Moreover, there exists a feasible solution $\bar x$ to (P) such that $c \bullet \bar x = v_p^* = v_d^*$; i.e., the common optimal value is attained by some primal feasible solution.
(c) Suppose that either (P) or (D) is bounded and strictly feasible. Then, given a feasible solution $\bar x$ to (P) and a feasible solution $(\bar y, \bar s)$ to (D), where $\bar s = c - \sum_{i=1}^m \bar y_i a_i$, the following are equivalent:

- $\bar x$ and $(\bar y, \bar s)$ are optimal for (P) and (D), respectively.
- The duality gap is zero; i.e., $c \bullet \bar x = b^T\bar y$.
- We have complementary slackness; i.e., $\bar x \bullet \bar s = 0$.
From the above we observe that if (P) or (D) has the good property (a strictly feasible, i.e. interior, solution), then its counterpart (not itself!), i.e., (D) or (P), attains the optimal value. If both have strictly feasible solutions, then both attain the optimal value.
(III) Lagrangian Duality
From the above chapters, we can see that for both the LP and CLP problems, we construct a lower bound on the minimal value, and the lower bound is exactly the maximum value of another problem (the dual problem). Under some conditions, the lower bound equals the minimal value and thus the duality gap is 0. A natural question is: can we use the same trick to construct a lower bound for an arbitrary objective function? The answer is yes, as we show below.
Weak Duality
We will construct weak duality via a penalty function. Firstly, we write the primal problem as

$$(P)\qquad v_p^* = \inf_x\ f(x) \quad \text{s.t.} \quad g_i(x) \le 0,\ i = 1,\dots,m;\quad h_j(x) = 0,\ j = 1,\dots,p;\quad x \in X.$$

We define the Lagrangian function as

$$L(x, u, v) = f(x) + \sum_{i=1}^m u_i g_i(x) + \sum_{j=1}^p v_j h_j(x),$$

where $u \ge 0$. By using the Lagrangian function, we can rewrite the primal problem (P) as follows:

$$v_p^* = \inf_{x \in X}\ \sup_{u \ge 0,\ v} L(x, u, v).$$
To examine this, we can observe that the inner supremum is actually a penalty function, because

$$\sup_{u \ge 0,\ v} L(x, u, v) = \begin{cases} f(x), & \text{if } g_i(x) \le 0 \ \forall i \text{ and } h_j(x) = 0 \ \forall j, \\ +\infty, & \text{otherwise.} \end{cases}$$

If $x$ violates the constraints, we can find a proper $u$ or $v$ to drive the value to $+\infty$; thus such an $x$ cannot attain the minimum. It can be checked that for any $u \ge 0$ and $v$, we have:

$$\inf_{x \in X} L(x, u, v) \le v_p^*.$$

We denote $g(u, v) := \inf_{x \in X} L(x, u, v)$, called the dual function. We can see that if $\bar x$ is feasible for (P), then $L(\bar x, u, v) \le f(\bar x)$, since $u_i g_i(\bar x) \le 0$ and $h_j(\bar x) = 0$. Then we get: $g(u, v) \le f(\bar x)$ for every feasible $\bar x$, hence $g(u, v) \le v_p^*$.
We construct the dual problem as

$$(D)\qquad v_d^* = \sup_{u \ge 0,\ v} g(u, v).$$

Its optimal value is a lower bound of $v_p^*$; i.e., weak duality $v_d^* \le v_p^*$ always holds.
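A tiny worked example helps here. For $\min\{x^2 : 1 - x \le 0\}$, the Lagrangian is $L(x, u) = x^2 + u(1 - x)$; the inner minimum over $x$ is at $x = u/2$, giving the closed-form dual function $g(u) = u - u^2/4$, whose maximum over $u \ge 0$ equals the primal optimum $1$. The sketch below checks this on a grid:

```python
# A worked sketch of the Lagrangian dual for the toy problem
#   min x^2  s.t.  1 - x <= 0   (primal optimum: x* = 1, v_p* = 1).
# Here g(u) = min_x [x^2 + u(1 - x)] = u - u^2/4, attained at x = u/2.
import numpy as np

u = np.linspace(0.0, 4.0, 401)     # dual variable, u >= 0
g = u - u**2 / 4.0                 # dual function g(u)

i = g.argmax()
print("v_d* ~", g[i], "at u ~", u[i])   # ~1.0 at u = 2: zero duality gap
```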
Strong Duality
Now, we study the strong duality of the Lagrangian dual. We will introduce the saddle point first, and then we will show that strong duality is attained exactly at saddle points.
Definition (Saddle Point). We say $(\bar x, \bar u, \bar v)$ with $\bar x \in X$ is a saddle point of the Lagrangian if

(1) $\sup_{u \ge 0,\ v} L(\bar x, u, v) = L(\bar x, \bar u, \bar v)$;

(2) $\bar u \ge 0$;

(3) $L(\bar x, \bar u, \bar v) = \inf_{x \in X} L(x, \bar u, \bar v)$.
The theorem of strong duality is as follows:
Theorem (Strong Duality). $(\bar x, \bar u, \bar v)$ is a saddle point of the Lagrangian iff $v_p^* = v_d^*$, $\bar x$ is optimal for (P), and $(\bar u, \bar v)$ is optimal for (D).
Proof
($\Rightarrow$):

By (1), $\sup_{u \ge 0,\ v} L(\bar x, u, v) = L(\bar x, \bar u, \bar v) < +\infty$, which forces $g_i(\bar x) \le 0$ and $h_j(\bar x) = 0$; this implies $\bar x$ is feasible for (P). Taking $u = 0$, $v = 0$ in (1) also gives $\sum_i \bar u_i g_i(\bar x) = 0$, so $L(\bar x, \bar u, \bar v) = f(\bar x)$.

By (2), $(\bar u, \bar v)$ is feasible for (D).

By (3), $g(\bar u, \bar v) = \inf_{x \in X} L(x, \bar u, \bar v) = L(\bar x, \bar u, \bar v) = f(\bar x)$, so $v_d^* \ge g(\bar u, \bar v) = f(\bar x) \ge v_p^*$. Combined with weak duality ($v_d^* \le v_p^*$), we get $v_p^* = v_d^*$, with $\bar x$ optimal for (P) and $(\bar u, \bar v)$ optimal for (D).
($\Leftarrow$):

If $v_p^* = v_d^*$ with $\bar x$ optimal for (P) and $(\bar u, \bar v)$ optimal for (D), then (2) holds obviously, and it suffices to establish (1) and (3). Consider the chain

$$f(\bar x) = v_p^* = v_d^* = g(\bar u, \bar v) = \inf_{x \in X} L(x, \bar u, \bar v) \le L(\bar x, \bar u, \bar v) \le \sup_{u \ge 0,\ v} L(\bar x, u, v) = f(\bar x),$$

where the last equality holds since $\bar x$ is feasible. Since the two ends are equal, all the inequalities hold with equality, which gives exactly (1) and (3). $\blacksquare$
As in the previous chapters, we can also write down primal feasibility, dual feasibility, and complementarity.
Saddle Point Optimality Conditions
$(\bar x, \bar u, \bar v)$ is a saddle point $\iff$

(a) (Primal Feasibility) $g_i(\bar x) \le 0$ for all $i$, $h_j(\bar x) = 0$ for all $j$, $\bar x \in X$;

(b) (Dual Feasibility/Lagrangian Optimality) $\bar u \ge 0$ and $\bar x \in \arg\min_{x \in X} L(x, \bar u, \bar v)$;

(c) (Complementarity) $\bar u_i g_i(\bar x) = 0$ for $i = 1,\dots,m$.
Karush–Kuhn–Tucker (KKT) Necessary Conditions
KKT is a necessary condition for optimality; i.e., any local minimum (satisfying a regularity condition) must satisfy the KKT conditions. The KKT conditions are given as follows:
(The Karush–Kuhn–Tucker Necessary Conditions) Let $\bar x$ be a local minimum of problem (P), where $f, g_1, \dots, g_m, h_1, \dots, h_p$ are continuously differentiable. Let $I(\bar x) = \{i : g_i(\bar x) = 0\}$ be the index set of the active constraints. Suppose that $\bar x$ is regular; i.e., the family of vectors $\{\nabla g_i(\bar x) : i \in I(\bar x)\} \cup \{\nabla h_j(\bar x) : j = 1,\dots,p\}$ is linearly independent. Then, there exist $\bar u \in \mathbb{R}^m$ and $\bar v \in \mathbb{R}^p$ such that

(1) $\nabla f(\bar x) + \sum_{i=1}^m \bar u_i \nabla g_i(\bar x) + \sum_{j=1}^p \bar v_j \nabla h_j(\bar x) = 0$;

(2) $\bar u \ge 0$;

(3) $\bar u_i g_i(\bar x) = 0$ for $i = 1,\dots,m$ (complementarity).
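As a sanity check, the sketch below verifies (1)-(3) at the known minimizer of a toy problem; the problem and multiplier are chosen by hand for illustration:

```python
# Verifying the KKT conditions at the known minimizer of the toy problem
#   min x1^2 + x2^2  s.t.  1 - x1 - x2 <= 0,
# whose optimum is x_bar = (0.5, 0.5) with multiplier u_bar = 1.
import numpy as np

x_bar = np.array([0.5, 0.5])
u_bar = 1.0

grad_f = 2.0 * x_bar                 # gradient of f at x_bar
grad_g = np.array([-1.0, -1.0])      # gradient of g(x) = 1 - x1 - x2
g_val = 1.0 - x_bar.sum()            # active: g(x_bar) = 0

print("stationarity:", grad_f + u_bar * grad_g)  # (1) ~ [0, 0]
print("u_bar >= 0:  ", u_bar >= 0)               # (2) dual feasibility
print("u * g(x):    ", u_bar * g_val)            # (3) complementarity ~ 0
```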
It should be noted that the theorem requires $\bar x$ to fulfill the regularity condition. There are also many other regularity conditions (constraint qualifications); if any one of them holds, the KKT conditions hold at a local minimum.
(Regularity Condition 2) Consider problem (P), where $g_1, \dots, g_m$ are convex and $h_1, \dots, h_p$ are affine. Let $\bar x$ be a local minimum. Suppose that the Slater condition is satisfied; i.e., there exists an $x'$ such that $g_i(x') < 0$ for $i = 1,\dots,m$. Then, $\bar x$ satisfies the KKT conditions.
(Regularity Condition 3) Consider problem (P), where $g_1, \dots, g_m$ are concave and $h_1, \dots, h_p$ are affine. Let $\bar x$ be a local minimum. Then, $\bar x$ satisfies the KKT conditions.
Regularity Condition 3 is important: since a linear function is both convex and concave, it implies that if the constraints are all linear, every local minimum satisfies the KKT conditions.
Reference
[1] Anthony Man-Cho So. ENGG5501 course notes, CUHK.