MENG Min, LI Xiuxian
(Department of Control Science and Engineering, College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China; Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai 201210, China; Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai 200092, China)
This paper deals with the constrained optimization problem formulated as follows:

min_{x ∈ R^n} f(x), s.t. g(x) ≤ 0,   (1)
where the objective function f: R^n → R and g(x) = (g_1(x), g_2(x), ..., g_m(x))^T with g_i: R^n → R are convex and continuously differentiable. By resorting to the (or augmented) Lagrangian L(x, λ) of problem (1), the corresponding (or augmented) primal-dual gradient algorithm (PDG) (or Aug-PDG) can be designed as

x_{k+1} = x_k − α∇_x L(x_k, λ_k), λ_{k+1} = [λ_k + α∇_λ L(x_k, λ_k)]_+,   (2)

where α > 0 is a stepsize and [·]_+ denotes the componentwise projection onto the nonnegative orthant. It is known that Eq. (2) can find a saddle point of L(x, λ), and thus it has been extensively studied for solving the constrained optimization problem [1].
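The PDG iteration (2) can be sketched in a few lines. The concrete instance below (a two-dimensional quadratic objective with one affine inequality constraint, whose optimum and multiplier can be computed by hand) is an illustrative assumption, not taken from the paper.

```python
import numpy as np

# PDG iteration (2): gradient descent on x plus projected gradient ascent on
# lam for L(x, lam) = f(x) + lam^T g(x). Toy instance (hypothetical):
#   min ||x - c||^2  s.t.  x_1 + x_2 <= 2,
# whose optimum is x* = (1, 1) with multiplier lam* = 2.
c = np.array([2.0, 2.0])
a = np.array([1.0, 1.0])               # g(x) = a^T x - 2 <= 0

def pdg(x0, lam0, alpha=0.05, iters=5000):
    x, lam = x0.astype(float), float(lam0)
    for _ in range(iters):
        grad_x = 2 * (x - c) + lam * a       # grad_x L(x, lam)
        grad_lam = a @ x - 2.0               # grad_lam L(x, lam) = g(x)
        x = x - alpha * grad_x
        lam = max(lam + alpha * grad_lam, 0.0)   # [.]_+ projection
    return x, lam

x, lam = pdg(np.zeros(2), 0.0)
```

With a sufficiently small stepsize, the iterates approach the saddle point (x*, λ*) of L(x, λ), as the convergence results discussed below formalize.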
Optimization has wide applications in the artificial intelligence field, such as smart grids [2-3], wireless communication [4], robot systems [5], and game theory [6-7], to name just a few. To date, there is a large body of literature on the theoretical analysis of asymptotic convergence of various algorithms, including primal-dual gradient-based algorithms, for tackling the optimization problem under different settings [8-18].
In recent decades, researchers have focused on the linear convergence and exponential convergence of primal-dual gradient-based algorithms in discrete time and continuous time, respectively. It is well known that when the objective function is strongly convex and smooth, the gradient descent algorithm for unconstrained convex optimization achieves global exponential convergence in continuous time and global linear convergence in discrete time. In the context of constrained optimization with equality constraints Ax = b or affine inequality constraints Ax ≤ b, PDG is proved to converge globally exponentially in the continuous-time setup [19]. A proximal gradient flow was proposed in [20], which can be applied to convex optimization problems with affine inequality constraints and has global exponential convergence when A has full row rank. Local exponential convergence of the primal-dual gradient dynamics can be established with the help of spectral bounds of saddle matrices [21]. Recently, the authors in [22] proved that the continuous-time Aug-PDGD for optimization with affine equality and inequality constraints achieves global exponential convergence, and the global linear convergence of primal-dual gradient optimization (PDGO) in discrete time was discussed in [23] by contraction theory. It should be noted that the aforementioned works focus on unconstrained optimization or constrained optimization with affine equality and/or affine inequality constraints. For the case with nonlinear inequality constraints, the asymptotic convergence has been extensively studied, such as in [24]. However, the linear/exponential convergence for optimization with nonlinear inequality constraints is seldom investigated in the literature. One exception is the recent work [25], where the authors established semi-global exponential convergence of the continuous-time Aug-PDGD, in the sense that the convergence rate depends on the distance from the initial point to the optimal point.
However, [25] concentrates on continuous-time dynamics. As discrete-time algorithms are easily implemented in practical applications, in this paper a discrete-time algorithm is addressed for the optimization problem with nonlinear inequality constraints. Theoretical analysis based on a quadratic Lyapunov function with non-zero off-diagonal terms is presented to show that the Aug-PDG achieves semi-global linear convergence.
The rest of this paper is organized as follows. Section 2 introduces preliminaries on optimization with nonlinear inequality constraints. The main result on the semi-global linear convergence of the Aug-PDG, along with its proof, is presented in Section 3. Section 4 provides a numerical example to illustrate the feasibility of the obtained result. Section 5 draws a brief conclusion.
Notations. Let R^m, R^m_+, and R^{m×n} be the sets of m-dimensional real column vectors, m-dimensional nonnegative column vectors, and m×n real matrices, respectively. Define [x]_+ to be the component-wise projection of a vector x ∈ R^m onto R^m_+. x ≥ 0 for a vector x ∈ R^m means that each entry of x is nonnegative. For an integer n > 0, denote [n] := {1, 2, ..., n}. I_n is the identity matrix of dimension n. 1_n (resp. 0_n) represents an n-dimensional vector with all of its elements being 1 (resp. 0). For a vector or matrix A, A^T denotes the transpose of A, and A_I is the matrix composed of the rows of A with the indices in I. For real symmetric matrices P and Q, P ≻ (⪰, ≺, ⪯) Q means that P − Q is positive (positive semi-, negative, negative semi-) definite, while for two vectors/matrices w, v of the same dimension, w ≤ v means that each entry of v − w is nonnegative. diag{a_1, a_2, ..., a_n} represents a diagonal matrix with a_i, i ∈ [n], on its diagonal.
Consider problem (1). An augmented Lagrangian associated with problem (1) is introduced as [26]
where x ∈ R^n, λ = (λ_1, λ_2, ..., λ_m)^T ∈ R^m, ρ > 0 is the penalty parameter, and
It can be verified that U(x, λ) is convex in x and concave in λ, and U(x, λ) is continuously differentiable, i.e.,
where e_i is an n-dimensional vector with the i-th entry being 1 and the others 0. Then the Aug-PDG is explicitly written as

x_{k+1} = x_k − α∇_x U(x_k, λ_k),   (7a)
λ_{k+1} = [λ_k + α∇_λ U(x_k, λ_k)]_+,   (7b)

where α ∈ (0, ρ] is the stepsize to be specified. Here, the initial conditions are arbitrarily chosen as x_0 ∈ R^n and λ_0 ≥ 0.
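The Aug-PDG iteration can be sketched as follows. Since the paper's augmented Lagrangian U(x, λ) (Eqs. (3)-(6)) is not reproduced above, this sketch assumes the standard quadratic-penalty form U(x, λ) = f(x) + (1/(2ρ))(‖[λ + ρg(x)]_+‖² − ‖λ‖²), whose gradients are ∇_x U = ∇f(x) + J_g(x)^T [λ + ρg(x)]_+ and ∇_λ U = ([λ + ρg(x)]_+ − λ)/ρ; the toy problem at the end is likewise an illustrative assumption.

```python
import numpy as np

# Sketch of the Aug-PDG iteration (7), assuming the standard quadratic-penalty
# augmented Lagrangian (the paper's exact U(x, lam) is not shown here).
def aug_pdg(grad_f, g, jac_g, x0, lam0, alpha, rho, iters):
    x, lam = np.array(x0, float), np.array(lam0, float)
    for _ in range(iters):
        s = np.maximum(lam + rho * g(x), 0.0)                 # [lam + rho*g(x)]_+
        x_next = x - alpha * (grad_f(x) + jac_g(x).T @ s)     # (7a)
        lam = np.maximum(lam + alpha * (s - lam) / rho, 0.0)  # (7b), with [.]_+
        x = x_next
    return x, lam

# Toy check (hypothetical instance): min (x - 2)^2 s.t. x - 1 <= 0,
# whose KKT point is x* = 1, lam* = 2.
x, lam = aug_pdg(lambda x: 2 * (x - 2),
                 lambda x: x - 1,
                 lambda x: np.ones((1, 1)),
                 x0=[0.0], lam0=[0.0], alpha=0.05, rho=0.1, iters=3000)
```

Note that both updates use the gradients evaluated at the current pair (x_k, λ_k), matching the Jacobi-style structure of (7).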
To proceed, the following results are vital for solving the constrained optimization problem.
Lemma 1 For Aug-PDG (7), if λ_0 ≥ 0, then λ_k ≥ 0 for all k ≥ 0.
Proof This result can be proved by mathematical induction; the details are omitted here.
Lemma 2 A primal-dual pair (x*, λ*) is an equilibrium point of the Aug-PDG (7) if and only if (x*, λ*) is a Karush-Kuhn-Tucker (KKT) point of (1).
Proof If a primal-dual pair (x*, λ*) is an equilibrium point of the Aug-PDG (7), that is, x* = x* − α∇_x L(x*, λ*) and λ* = λ* + α∇_λ L(x*, λ*), then ∇_x L(x*, λ*) = 0 and ∇_λ L(x*, λ*) = 0. The latter is equivalent to
Thus, it can be claimed that the primal-dual pair (x*, λ*) is a KKT point.
Conversely, if (x*, λ*) is a KKT point of (1), then
Via a simple computation, ∇_x L(x*, λ*) = 0 and ∇_λ L(x*, λ*) = 0, which implies that (x*, λ*) is an equilibrium point of the Aug-PDG (7).
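Lemma 2 can be verified numerically on a scalar instance: at a KKT point, one step of the update leaves (x*, λ*) unchanged. The instance below, min (x − 2)² s.t. x − 1 ≤ 0 with KKT point (1, 2), and the quadratic-penalty form of the augmented Lagrangian are illustrative assumptions, not from the paper.

```python
# One-step fixed-point check of Lemma 2 on a hypothetical scalar problem:
#   min (x - 2)^2  s.t.  g(x) = x - 1 <= 0,  KKT point (x*, lam*) = (1, 2),
# using the standard quadratic-penalty augmented Lagrangian for U.
alpha, rho = 0.1, 0.1
x_star, lam_star = 1.0, 2.0

s = max(lam_star + rho * (x_star - 1.0), 0.0)   # = lam*, since g(x*) = 0
grad_x = 2 * (x_star - 2.0) + s * 1.0           # grad_x U(x*, lam*) = 0
grad_lam = (s - lam_star) / rho                 # grad_lam U(x*, lam*) = 0

x_next = x_star - alpha * grad_x                # stays at x*
lam_next = max(lam_star + alpha * grad_lam, 0.0)  # stays at lam*
```

Both gradients vanish exactly at the KKT point, so the pair is indeed an equilibrium of the iteration.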
In this section, the main result on the linear convergence of the Aug-PDG is presented.
The following assumptions are essential for deriving the main result.
Assumption 1 The problem (1) has a unique optimal solution x*, at which the linear independence constraint qualification (LICQ) holds, i.e., {∇g_i(x*)}_{i∈I} is linearly independent, where I := {i ∈ [m] | g_i(x*) = 0} is the so-called active set at x*.
Under Assumption 1, the optimal Lagrangian multiplier λ* is also unique [27]. Denote by J the Jacobian of g(x) at x* and by J_I the matrix composed of the rows of J with the indices in I. LICQ in Assumption 1 also implies that J_I J_I^T ≻ 0 [25]. Define
to be the smallest eigenvalue of
Assumption 2 The objective function f(x) has quadratic gradient growth with parameter μ > 0 over R^n, i.e., for any x ∈ R^n,

(∇f(x) − ∇f(x*))^T (x − x*) ≥ μ‖x − x*‖².
The concept of quadratic gradient growth was introduced in [28] as a relaxation of the strong convexity condition for guaranteeing linear convergence of gradient-based optimization algorithms. In fact, the class of functions having quadratic gradient growth includes the strongly convex functions as a proper subset, and some functions with quadratic gradient growth are not even convex.
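As a quick sanity check of the inclusion just stated (an illustrative computation, not from the paper), a strongly convex quadratic f(x) = ½x^T A x with A ≻ 0 satisfies the quadratic gradient growth inequality with μ equal to the smallest eigenvalue of A and x* = 0:

```python
import numpy as np

# Check (grad f(x) - grad f(x*))^T (x - x*) >= mu ||x - x*||^2 for
# f(x) = 0.5 x^T A x, A symmetric positive definite, x* = 0, mu = lambda_min(A).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + np.eye(5)                 # symmetric positive definite
mu = np.linalg.eigvalsh(A).min()
x_star = np.zeros(5)

ok = all(
    (A @ x - A @ x_star) @ (x - x_star)
    >= mu * np.linalg.norm(x - x_star) ** 2 - 1e-9   # tolerance for roundoff
    for x in rng.standard_normal((100, 5))
)
```

The inequality here is just x^T A x ≥ λ_min(A)‖x‖², so it holds at every sampled point.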
Assumption 3 The objective function f is l-smooth over R^n, i.e.,

‖∇f(x) − ∇f(y)‖ ≤ l‖x − y‖, for all x, y ∈ R^n.

For any i ∈ [m], g_i(x) is L_{g_i}-smooth and has a bounded gradient, i.e., for some L_{g_i}, B_{g_i} > 0 and any x, y ∈ R^n, there holds

‖∇g_i(x) − ∇g_i(y)‖ ≤ L_{g_i}‖x − y‖, ‖∇g_i(x)‖ ≤ B_{g_i}.
Denote
Under Assumption 3, one can obtain that
Before giving the main result of this paper, it is convenient to introduce the following concept, similar to that in the continuous-time setting [29].
Definition 1 Consider the dynamics z(t+1) = φ(z(t)) with initial point z(0) = z_0. Assume that z_e is an equilibrium point satisfying z_e = φ(z_e). z_e is said to be a semi-globally linearly stable point if for any h > 0, there exist c > 0 and 0 < γ < 1 such that for any z_0 satisfying ‖z_0 − z_e‖ ≤ h, it holds that ‖z(t) − z_e‖ ≤ cγ^t ‖z_0 − z_e‖ for all t ≥ 0. z_e is said to be a globally linearly stable point if c and γ do not depend on h.
Then the main result is presented as follows.
Theorem 1 Under Assumptions 1-3, if the stepsize 0 < α < 1 is chosen such that
where δ > 0 satisfies
where 0 < γ < 1 satisfies
Proof The proof is postponed to the next subsection.
Remark 1 The selection of the parameters α and δ ensures that c_1, c_2, c_3 are positive, and then γ > 0 can be guaranteed. From Theorem 1, one can see that the convergence rate is related to π*, in which the distance d_0 between the initial point and the optimal one is involved. Therefore, the convergence rate decreases to 0 as d_0 goes to infinity. The rate also changes as (x_k, λ_k) approaches the optimal point. In fact, any (x_k, λ_k) can be regarded as an initial point of the studied algorithm, in the sense of restarting the algorithm from (x_k, λ_k). Then, as (x_k, λ_k) approaches the optimal point, the corresponding initial distance becomes smaller, leading to a larger 1 − γ. As a result, the rate is slow at the beginning and becomes faster as x_k approaches the optimal point. Consequently, Theorem 1 does not guarantee global linear convergence; only semi-global linear convergence can be ensured.
Remark 2 Compared with the most related work [25], where a continuous-time algorithm, called Aug-PDGD, was studied with semi-global exponential convergence, a discrete-time algorithm, Aug-PDG, is analyzed here with semi-global linear convergence. Although discrete-time algorithms may be obtained by discretizing the continuous-time Aug-PDGD using, e.g., the explicit Euler method, it is unclear how to select the sampling stepsize to guarantee convergence, especially in the sense of semi-global convergence. In comparison, an explicit bound on the stepsize α is established here. However, one drawback is that the upper bound of α depends on the bounds of the cost function, the constraint functions and their gradients, as well as the optimal solution. This may be tackled by adaptive methods, which will be our future research of interest.
To prove Theorem 1, some intermediate results are first presented as follows.
Lemma 3 [22] For any y, y* ∈ R, there exists ξ ∈ [0, 1] such that [y]_+ − [y*]_+ = ξ(y − y*). Specifically, ξ can be chosen as
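Lemma 3 admits a quick numerical illustration: for randomly sampled scalar pairs with y ≠ y*, the ratio ξ = ([y]_+ − [y*]_+)/(y − y*) always lies in [0, 1].

```python
import numpy as np

# Sanity check of Lemma 3: the ratio xi = ([y]_+ - [y*]_+) / (y - y*)
# lies in [0, 1] for all scalar pairs with y != y*, so
# [y]_+ - [y*]_+ = xi * (y - y*) with xi in [0, 1].
rng = np.random.default_rng(1)
pairs = rng.uniform(-5.0, 5.0, size=(1000, 2))

xis = [(max(y, 0.0) - max(ys, 0.0)) / (y - ys) for y, ys in pairs if y != ys]
ok = all(0.0 <= xi <= 1.0 for xi in xis)
```

The four sign cases are easy to verify by hand: ξ = 1 when both are nonnegative, ξ = 0 when both are negative, and ξ ∈ (0, 1) when the two have opposite signs.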
Lemma 4 Under Assumptions 1-3, x_k generated by the Aug-PDG (7) satisfies
Proof By the iterations in (7), one has that
Based on
for the second term on the right side of (18), one has
Note that
Define
if (ρg_i(x_k) + λ_{i,k}) − (ρg_i(x*) + λ*_i) ≠ 0. Then ξ_{i,k} ∈ [0, 1]. It can be obtained from Lemma 3 that
Substituting Eq. (23) into Eq. (20) yields that
By Eqs. (19)-(24) and Assumption 3, one has that
For the third term on the right side of Eq. (18),
where the inequality is derived based on Assumption 2 and the convexity of U(x, λ) in x, i.e.,
for any x, x' ∈ R^n. Plugging Eq. (25) and Eq. (26) into Eq. (18), one concludes that Eq. (17) holds.
Lemma 5 Under Assumptions 1-3, λ_k generated by the Aug-PDG (7) satisfies
Proof For ‖λ_{k+1} − λ*‖², by iteration (7b), one has
Recalling ∇_λ L(x*, λ*) = ∇_λ U(x*, λ*) = 0 and the notation of ξ_{i,k} in Eqs. (21)-(22), it can be obtained that
where Ξ_k = diag{ξ_{1,k}, ..., ξ_{m,k}}, and the inequality is obtained based on Eq. (12) and ‖Ξ_k‖ ≤ 1, ‖Ξ_k − I_m‖ ≤ 1 for ξ_{i,k} ∈ [0, 1], i ∈ [m].
As U(x, λ) is concave in λ, one has
By Eqs. (30)-(31), it can be derived from Eq. (29) that Eq. (28) holds.
In what follows, we prove Theorem 1 in detail.
Proof of Theorem 1 Define
where
The bounds of ‖x_{k+1} − x*‖² and ‖λ_{k+1} − λ*‖² are given in Lemmas 4 and 5, respectively. It suffices to compute the bound of the cross term 2δ(x_{k+1} − x*)^T J^T (λ_{k+1} − λ*).
Note that (x*, λ*) is the KKT point of (1), that is,
It is easy to obtain that
By Eq. (24), one has that
Similar to Eqs. (21)-(22), define
then
where Ξ_λ := diag{ξ_{1,λ}, ..., ξ_{m,λ}}. For the last term of Eq. (36), it holds that
Therefore, by Eqs. (30) and (37)-(41), Eq. (36) can be rewritten as
where the last inequality is obtained by a simple computation, along with Eqs. (24), (30) and (38).
Define
Combining Eqs. (17), (28), (42) and (44), the bound of V_{δ,k+1} can be derived as
Hence, to prove Q ≻ 0, it suffices to ensure
By Eq. (16), one can obtain that
where
Subsequently, by mathematical induction, Eq. (52) can be proved. The proof is completed.
In this section, an example motivated by applications in power systems [25] is presented to illustrate the feasibility of the discrete-time Aug-PDG (7). Consider the following constrained optimization problem:
where x = (p_1 ... p_n q_1 ... q_n)^T and p_{v,i}, S_i, i ∈ [n], are constants. The problem Eq. (53), along with an affine inequality constraint, was considered in [25], but via the continuous-time dynamics Aug-PDGD. Affine inequality constraints can be regarded as special nonlinear constraints; hence, the Aug-PDG algorithm in this paper is applicable to the optimization problem Eq. (53).
Let n = 10, S = (S_1, ..., S_n) = (2.7, 1.35, 2.7, 1.35, 2.025, 2.025, 2.7, 2.7, 1.35, 2.025), and p_v = (p_{v,1}, ..., p_{v,n}) = 4S. Choose ρ = 0.1. Three cases are simulated, where the initial point (x_0, λ_0) is selected randomly such that the distance d_0 from the initial point to the optimal point (x*, λ*) is 0.1‖(x*, λ*)‖, 5‖(x*, λ*)‖ and 10‖(x*, λ*)‖, respectively. The curves of the normalized distance ‖(x_k − x*, λ_k − λ*)‖/‖(x*, λ*)‖ with respect to the iteration k are shown in Fig. 1 when choosing α = 0.1 and in Fig. 2 when choosing α = 0.05, where for each case 10 instances of randomly selected initial points are considered. From Fig. 1, it can be seen that the convergence rates are different for different d_0, and the distance ‖(x_k − x*, λ_k − λ*)‖ decays linearly on the whole. From Figs. 1 and 2, when the algorithm converges under an appropriately chosen stepsize α, the convergence rate is smaller if α is smaller. Moreover, for each case, the decreasing rate also changes as (x_k, λ_k) approaches the optimal point: the decreasing rates are small at the beginning and then become large as (x_k, λ_k) goes to the optimal point. These observations support the semi-global linear convergence of the Aug-PDG, which is consistent with our theoretical analysis.
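A runnable sketch of this experimental setup is given below. Since the objective of problem (53) is not reproduced above, the sketch assumes a hypothetical quadratic tracking cost f(x) = Σ_i ((p_i − p_{v,i})² + q_i²) under assumed apparent-power-style constraints g_i(x) = p_i² + q_i² − S_i² ≤ 0, solved by Aug-PDG with the quadratic-penalty augmented Lagrangian and the paper's constants S, p_v = 4S, ρ = 0.1, α = 0.05; for this assumed instance the KKT point is p_i = S_i, q_i = 0, λ_i* = 3.

```python
import numpy as np

# Aug-PDG on a hypothetical power-system-style instance (the objective of
# problem (53) is an assumption here): for each unit i,
#   min (p_i - p_{v,i})^2 + q_i^2   s.t.  p_i^2 + q_i^2 <= S_i^2.
S = np.array([2.7, 1.35, 2.7, 1.35, 2.025, 2.025, 2.7, 2.7, 1.35, 2.025])
pv = 4 * S
n = len(S)
alpha, rho = 0.05, 0.1

p, q, lam = np.zeros(n), np.zeros(n), np.zeros(n)
for _ in range(5000):
    g = p**2 + q**2 - S**2                      # constraint values g_i(x)
    s = np.maximum(lam + rho * g, 0.0)          # [lam + rho*g(x)]_+
    gp = 2 * (p - pv) + s * 2 * p               # grad of U w.r.t. p
    gq = 2 * q + s * 2 * q                      # grad of U w.r.t. q
    p, q = p - alpha * gp, q - alpha * gq       # primal step (7a)
    lam = np.maximum(lam + alpha * (s - lam) / rho, 0.0)   # dual step (7b)
```

Since p_{v,i} = 4S_i lies outside the feasible disk, every constraint is active at the optimum, and the iterates settle on p_i = S_i with multipliers λ_i = 3, mirroring the linear decay of the distance observed in the figures.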
Fig. 1 Simulation of the relative distances to ‖(x*, λ*)‖ with respect to iteration k by choosing α = 0.1
Fig. 2 Simulation of the relative distances to ‖(x*, λ*)‖ with respect to iteration k by choosing α = 0.05
In this paper, the linear convergence of the discrete-time Aug-PDG for convex optimization with nonlinear inequality constraints has been investigated. Under some mild assumptions, the Aug-PDG has been proved to converge semi-globally at a linear rate, which depends on the distance from the initial point to the optimal point. Future research of interest may be to adopt adaptive methods to determine the upper bound of the stepsize.