The trust-region self-consistent field method: Towards a black-box optimization in Hartree–Fock and Kohn–Sham theories
Lea Thøgersen, Jeppe Olsen, Danny Yeager,a)and Poul Jørgensen Department of Chemistry, University of A˚ rhus, DK-8000 A˚rhus C, Denmark Paweł Sałek
Laboratory of Theoretical Chemistry, The Royal Institute of Technology, Teknikringen 30, Stockholm SE-10044, Sweden
Trygve Helgaker
Department of Chemistry, University of Oslo, P.O. Box 1033 Blindern, N-0315 Norway 共Received 17 February 2004; accepted 5 April 2004兲
The trust-region self-consistent field共TRSCF兲method is presented for optimizing the total energy ESCFof Hartree–Fock theory and Kohn–Sham density-functional theory. In the TRSCF method, both the Fock/Kohn–Sham matrix diagonalization step to obtain a new density matrix and the step to determine the optimal density matrix in the subspace of the density matrices of the preceding diagonalization steps have been improved. The improvements follow from the recognition that local models to ESCF may be introduced by carrying out a Taylor expansion of the energy about the current density matrix. At the point of expansion, the local models have the same gradient as ESCF but only an approximate Hessian. The local models are therefore valid only in a restricted region—
the trust region—and steps can only be taken with confidence within this region. By restricting the steps of the TRSCF model to be inside the trust region, a monotonic and significant reduction of the total energy is ensured in each iteration of the TRSCF method. Examples are given where the TRSCF method converges monotonically and smoothly, but where the standard DIIS method diverges. © 2004 American Institute of Physics. 关DOI: 10.1063/1.1755673兴
I. INTRODUCTION
The steady progress in computer technology and quantum-chemical methodology has widened the range of users of quantum-chemical software packages to include a vast number of practicing, experimental chemists. Routinely, such users perform Hartree–Fock 共HF兲 calculations and Kohn–Sham 共KS兲 density-functional theory 共DFT兲 calcula- tions for molecules of a size and complexity that, a decade ago, were beyond reach even for the most advanced research codes. This development calls for further advances in the automatization of the self-consistent field 共SCF兲 procedure used to optimize the HF and DFT energies, so as to ensure that convergence may be reached in a routine manner even for very complex molecules.
In the original formulation, the SCF procedure consists of a sequence of Roothaan–Hall 共RH兲iterations.1,2At each iteration, a Fock/KS matrix is first constructed from the cur- rent approximation to the one-electron density matrix and then diagonalized to yield an improved set of orbitals and orbital energies and thus an improved density matrix. In the subsequent iteration, this improved density matrix is then used to construct a new Fock/KS matrix, thereby establishing the iteration procedure. However, such a sequence of RH
iterations converges only in simple cases. To improve upon the convergence, each RH iteration may be extended to in- clude, in addition to the diagonalization step, also a step where the best density matrix is generated in the subspace of the density matrices of the current and preceding RH itera- tions. In the next RH iteration, this averaged density matrix rather than the pure density matrix obtained in the last diago- nalization is used to construct the new Fock/KS matrix.
In this paper, we make improvements both to the RH diagonalization step and to the density-subspace optimiza- tion step of the SCF scheme. Our approach follows from the recognition that, in both steps, we may construct local mod- els to the SCF energy function ESCFby a Taylor expansion of the energy about the current density matrix. However, since, at the point of expansion, these models have an exact gradi- ent but only an approximate Hessian, they are valid only in a restricted region about the current approximation to the den- sity matrix—the trust region. Therefore, when these local models are used in the course of the SCF optimization, it is essential they are used only to generate steps within their trust region. Only in this manner can it be ensured that the SCF energy is systematically and sufficiently lowered at each iteration.
In the RH diagonalization part of the SCF optimization, the improvements are obtained by introducing an energy function ERH that corresponds to the sum of the occupied
a兲On leave. Permanent address: Department of Chemistry, Texas A&M Uni- versity, P.O. Box 30012, College Station, Texas 77842-3012.
16
0021-9606/2004/121(1)/16/12/$22.00 © 2004 American Institute of Physics
orbital energies.3An unconstrained minimization of ERHre- sults in the same solution共i.e., density matrix兲as obtained by a diagonalization of the Fock/KS matrix. However, since, at the point of expansion, the RH energy function ERHhas only the gradient in common with the true SCF energy ESCF, a global minimization of ERH may lead to steps that are too long to be trusted. We therefore introduce a trust region where ERH is a good approximation to ESCF. If a global minimization of ERHleads to a step outside the trust region, then the step to the minimum on the boundary of the trust region for ERH is taken instead. This step is found by a level-shifting technique, where the occupied molecular or- bital energies effectively are shifted by some constant to in- crease the gap between the occupied and virtual molecular orbitals. Level shifting has previously been used to improve the convergence of the simple RH sequence of iterations. An essential feature of our implementation is to adjust the level shift in such a manner that the step is to the boundary of the trust region, recognizing that only in this manner does a low- ering of ERHresult in a lowering of ESCF. For this reason, the resulting method is called the trust-region RH 共TRRH兲 method.
The optimization of the density matrix in the subspace of the density matrices of the preceding RH iterations has a long history. Early on, it was recognized that a simple aver- aging of the density matrices of the last few RH iterations significantly improves the convergence of the RH scheme.
This simple density-matrix averaging technique was later ra- tionalized and systematized in the direct inversion in itera- tive subspace 共DIIS兲method of Pulay.4In the DIIS method, an improved density matrix is obtained as a linear combina- tion of the previous density matrices by minimizing the norm of the corresponding linear combination of gradients. The DIIS method significantly speeds up the local convergence and convergence can often be obtained to ground states of rather complex molecules with a small gap between energies of the highest occupied molecular orbital 共HOMO兲 and the lowest unoccupied molecular orbital 共LUMO兲 and with a large number of close-lying electronic states.
Several attempts have been made to modify the DIIS algorithm so as to improve upon its global convergence be- havior. Recently, Kudin, Scuseria, and Cances proposed the energy DIIS共EDIIS兲method, where the DIIS gradient-norm minimization is replaced by a minimization of an approxi- mate energy function.5In EDIIS, the variational parameters, which are the linear expansion coefficients of the density matrices from the previous RH iterations, may only take on values that give densities in the convex set—that is, densities with occupation numbers between 0 and 1. As the EDIIS method is based on the minimization of an approximate en- ergy function, it may have some advantages in the global region. However, it is worrying that a convex solution often cannot be obtained and that the observed local convergence of the EDIIS method is slower than in the standard DIIS method.
In the DIIS and EDIIS methods, an improved density matrix is obtained as a sum of the density matrices from the preceding RH diagonalization steps. Consequently, the aver- aged density matrix is not idempotent as required in HF and
KS theories. The deviation from idempotency may be re- duced using a purified density matrix as the one suggested by McWeeny.6This has been done for the SCF energy minimi- zation by several workers including Nunes and Vanderbilt7 and Daniels and Scuseria8 and for the calculation of geo- metrical derivatives by Ochsenfeld and co-workers.9It may also be done for the EDIIS energy function. The energy func- tion then has the same gradient as ESCF, but also contains terms which cannot be obtained from the densities and Fock/KS matrices of the previous RH iterations. Neglecting these terms, we arrive at the density-subspace minimization 共DSM兲algorithm proposed in this paper. At the point of ex- pansion, the DSM energy function EDSMthus has the same gradient as the true energy function ESCF but only an ap- proximate Hessian. Again, a trust region may be introduced and only steps within this region are taken, ensuring that any lowering of EDSM also corresponds to a lowering of ESCF. The resulting method is called the trust-region DSM 共TRDSM兲method.
In the next section, we first describe the standard opti- mization of the SCF energy function in a density-matrix for- mulation. The TRRH method is then discussed in Sec. II A and the TRDSM method in Sec. II B. In Sec. III, we give some numerical examples to demonstrate the performance of the resulting trust-region SCF 共TRSCF兲 method. The last section contains some concluding remarks.
II. THEORY
For a closed-shell system with N/2 electron pairs, the Hartree–Fock共HF兲energy excluding the nuclear–nuclear re- pulsion energy is given by3
ESCF共D兲⫽2 Tr hD⫹Tr DG共D兲, 共1兲 where D is the one-electron density matrix in the atomic- orbital共AO兲basis, h is the one-electron Hamiltonian matrix and G共D兲is defined as
G共D兲⫽
兺
共2g⫺g兲D, 共2兲where g is a two-electron integral in the AO basis. For the energy in Eq.共1兲to be a valid approximation to the true HF energy, the density matrix D must satisfy the symmetry, trace, and idempotency conditions:
DT⫽D, 共3兲
Tr DS⫽N
2 , 共4兲
DSD⫽D. 共5兲
Similar conditions apply in the Kohn–Sham共KS兲theory, but the energy function of Eq. 共1兲 must then be modified by including the exchange-correlation term and by scaling 共or complete removal兲of the exchange term from Eq.共2兲.
The traditional approach to the optimization of the HF energy is an iterative one. From the current approximation to the density matrix Dn in iteration n, a Fock matrix is built
F共Dn兲⫽h⫹G共Dn兲 共6兲
and, following the Roothaan–Hall共RH兲procedure, the Fock matrix is diagonalized
F共Dn兲Cocc⫽SCocc, 共7兲 where S is the overlap matrix in the AO basis, to give a set of occupied molecular orbitals 共MOs兲, from which a new ap- proximation to the density matrix is obtained as
Dn⫹1⫽CoccCoccT . 共8兲 The iteration procedure is established using Dn⫹1as the cur- rent density in Eq.共6兲. The final solution to the minimization problem is obtained when the Dn and Dn⫹1 are the same.
This self-consistent field共SCF兲procedure may also be used in KS theory, the only difference being the addition of the exchange-correlation potential and the scaling of the ex- change contribution in the Fock matrix to yield the KS ma- trix.
The pure RH iterations presented above often do not converge. A powerful method for handling this divergence is not to construct the Fock matrix from the density matrix Dn but rather from an average of all previous density matrices:
D¯n⫽i
兺
⫽1 nciDi. 共9兲
The averaged density matrix D¯n is then used in place of the pure density matrix Dn in Eq.共6兲 to obtain the Fock matrix F(D¯
n) as F共D¯n兲⫽i
兺
⫽1n
ciF共Di兲 共10兲
and the iteration procedure is established. In the course of the TRSCF iterations, the following matrices are set up in the order indicated: D1, F(D1), D2, F(D2), D¯2, F(D¯2), D3, F(D3), D¯
3, F(D¯
3), . . . . Among these, D1, F(D1), D2, F(D2), D3, F(D3), . . . are saved during the iteration proce- dure.
In the following, we describe improvements to the SCF diagonalization and density-subspace optimization steps. In Sec. II A, we describe how the trust-region RH 共TRRH兲 method is used to generate new density matrices by a modi- fication of the traditional RH method Eqs.共7兲and共8兲. Next, in Sec. II B, we introduce the trust-region density-subspace minimization共TRDSM兲method for calculating the averaged density matrix of Eq. 共9兲. In the following, we use the indi- ces i, j,k,l for occupied MOs and the indices a,b,c,d for the virtual MOs.
A. The trust-region Roothaan–Hall method
As discussed in Ref. 3, the traditional RH method may be viewed as a minimization of the sum of the orbital ener- gies of the occupied MOs
ERH⫽2
兺
i ⑀i⫽2 Tr F共D¯兲D, 共11兲subject to orthonormality constraints on the occupied MOs
i:
具i兩j典⫽␦i j. 共12兲
Whereas D¯ is the current approximation to the HF/KS den- sity matrix, usually obtained as a linear combination of the previous densities according to Eq.共9兲, the density matrix D to be optimized in Eq. 共11兲 is related to the occupied MOs resulting from the diagonalization of F(D¯ ) as
D⫽CoccCoccT . 共13兲
To see this, consider the constrained minimization of ERHin Eq. 共11兲expressed in terms of the Lagrangian
L⫽2 Tr F共D¯兲D⫺2 Tr共CoccT SCocc⫺IN/2兲, 共14兲 where the multipliers i j ensure orthonormality among the occupied MOs. Minimization of this Lagrangian leads to the standard RH equations:
F共D¯兲Cocc⫽SCocc. 共15兲 However, since ERHof Eq.共11兲is only a crude model of the true energy ESCF共the gradient is correct at D¯ assuming D¯ is idempotent兲, a global minimization of ERHaccording to Eq.
共15兲may easily lead to steps that are too long to be trusted as they are outside the region where ERHis a good approxima- tion to ESCF. Steps outside the trust region may often not lead to a reduction of the total energy ESCF.
1. The level-shifted Roothaan–Hall equations
To avoid too long steps, an additional constraint is im- posed on the optimization of Eq. 共11兲, namely, that the new density matrix D in Eq. 共13兲does not differ too much from the old matrix D¯ . This condition is conveniently expressed in terms of the overlap between the density matrices in the S metric norm
具D兩D¯典S⫽Tr DSD¯ S⫽a
冑
N2Tr D¯ SD¯ S, 共16兲where Tr D¯ SD¯ S⬇N/2 since D¯ is not necessarily idempotent.
Note that, for D equal to 共an idempotent兲 D¯ , a is equal to one. For a sufficiently close to one, a step will therefore be taken in the local region. In practice, we define sufficiently close to one by the parameter amin⫽0.975.
Introducing an undetermined multiplier associated with this new constraint, we obtain the following Lagrang- ian:
L⫽2 Tr F共D¯兲D⫺2共Tr SD¯ SD⫺a
冑
N2Tr D¯ SD¯ S兲⫺2 Tr共CoccT SCocc⫺IN/2兲. 共17兲 Differentiating this Lagrangian with respect to the MO coef- ficients and setting the result equal to zero, we arrive at the level-shifted RH equations
关F共D¯兲⫺SD¯ S兴Cocc共兲⫽SCocc共兲. 共18兲 To interpret the level-shift term, we note that D¯ S projects out the component of Coccthat is occupied in D¯ 共assuming idem- potent D¯ ), see Ref. 3. The level shift therefore works only on the occupied part of F(D¯ ), shifting all the occupied orbital energies and increasing the gap between the occupied and virtual MOs, in particular the HOMO-LUMO gap.
Since the SCF energy ESCFis invariant with respect to an orthogonal transformation between the MOs, Eq. 共18兲 may be transformed to the canonical basis:
关F共D¯兲⫺SD¯ S兴Cocc共兲⫽SCocc共兲⑀, 共19兲 where the diagonal matrix ⑀contains the orbital energies.
2. Choice of the RH level-shift parameter
The density matrix generated from the restricted RH so- lution Eq. 共19兲depends on the level-shift parameter:
D共兲⫽Cocc共兲CoccT 共兲. 共20兲 To see how is determined, we consider the determination of in the fourth iteration of the rhodium-complex calcula- tion described in Sec. III. In Fig. 1共a兲, we have plotted the HOMO-LUMO gap as a function of,
⌬⑀ai共兲⫽⑀a
LUMO共兲⫺⑀i
HOMO共兲, 共21兲
where⑀i
HOMO() and⑀a
LUMO() are the HOMO and LUMO orbital energies, respectively; in Fig. 1共b兲, we have plotted the overlap between the old and new density matrices as given by
a共兲⫽ 具D共兲兩D¯典S
冑
具D共兲兩D共兲典S具D¯兩D¯典S, 共22兲
where具D()兩D()典S is equal to N/2. For sufficiently large
, the HOMO-LUMO gap Eq. 共21兲is linear in. This lin- earity of ⌬⑀ai() for largearises from the dependence of the orbital energies on in Eq.共19兲, whereis effectively subtracted from the occupied orbital energies. The MOs C¯occ occupied in D¯ satisfy the generalized eigenvalue equations
SD¯ SC¯occ⫽SC¯occ, 共23兲 and become identical to the MOs Cocc(⬁) obtained from Eq.
共19兲 when tends to infinity. The corresponding density is denoted
D共⬁兲⫽C¯
occC¯
occ
T 共24兲
and represents a purified D¯ . In the linear regime of⌬⑀ai(), there is a continuous development of the occupied MOs from those occupied in D¯ . As decreases and we enter the non- linear regime at min, the MOs in Eq.共20兲no longer corre- spond to those in Eq.共23兲. Comparing plot共a兲and共b兲in Fig.
1, we note that the region a()⬍amin in Fig. 1共b兲 corre- sponds roughly to the region ⬍minin Fig. 1共a兲.
As we insist on a controlled, continuous development of the MOs from those occupied in D¯ , the level-shift parameter should be restricted to the linear regimemin⬍⬍⬁. To de- termine the optimal level-shift parameter opt, we therefore begin by establishing the onset of linearity min by linear extrapolation by means of two Fock/KS matrix diagonaliza- tions, giving the two⌬⑀aivalues marked by crosses and the linearly interpolatedminvalue marked with an arrow. Next, since, in the linear interval, a smallcorresponds to a large step, we investigate whether minis acceptable by checking if a(min)⬎amin. If this step is too long, we backtrack by increasing using inexact line search until an acceptable value optis found such that a(opt)⬇amin, requiring a few additional Fock/KS matrix diagonalizations. In Fig. 1共b兲, the accepted optis marked with an arrow.
For a better understanding of this step, consider the Hes- sian of the ERHenergy function:
Aai,b jRH ⫽␦i j␦ab共⑀a⫺⑀i兲. 共25兲
By restricting the level-shift parameter tomin⬍⬍⬁where
⑀a
LUMO()⫺⑀i
HOMO()⬎0, we ensure that the effective Hes- sian is positive definite and that the model energy function ERHis reduced. We note that the Hessian of the true energy function ESCFis given by the more complicated expression
Aai,b jSCF⫽␦i j␦ab共⑀a⫺⑀i兲⫹4gaib j⫺gabi j⫺ga jib. 共26兲 Often, the orbital energy difference dominates the Hessian.
In such cases, we expect the above step to reduce the SCF energy ESCFas well as the model function ERH. In any case, when a sufficiently large level shift is added in Eq.共19兲, the
FIG. 1. For the fourth iteration of the rhodium calculation described in Sec.
III we have displayed as a function of the level-shift parameter;共a兲the HOMO-LUMO gap⌬⑀ai, whereminis the smallest accepted level-shift, 共b兲the overlap a between the old and new density matrices, whereoptis the optimal level-shift, and共c兲the change in the model energy⌬ERHand the actual energy⌬ESCFRH .
Hessian structure of Eq. 共25兲becomes similar to that of the true energy function ESCF in Eq. 共26兲. The steps generated from ERH with such level shifts will therefore have essen- tially the same direction as the ones generated from ESCF.
By construction, the ERH energy function is lowered when is chosen according to the above prescription
⌬ERH⫽2 Tr F共D¯兲关D共兲⫺D¯兴⬍0. 共27兲 Since ERHis only a local model of the true energy function ESCF, the associated change in the true energy
⌬ESCFRH⫽ESCF关D共兲兴⫺ESCF共D¯兲 共28兲 may be either negative or positive, depending on how well ERHrepresents ESCFfor the chosen step. However, for suffi- ciently small steps,⌬ESCFRH⬍0, since the model function then represents the true energy well.
Let us consider the relationship between the true lower- ing⌬ESCFRH and the lowering predicted by the model function
⌬ERH. Introducing the 共presumably small兲 differential den- sity matrix
⌬⫽D共兲⫺D¯ 共29兲 and using the identity Tr AG(B)⫽Tr BG(A) valid for sym- metric matrices A and B, we find that the change in the true energy Eq.共28兲may be written in the form
⌬ESCFRH⫽2 Tr h关D共兲⫺D¯兴⫹Tr共D¯⫹⌬兲
⫻G共D¯⫹⌬兲⫺Tr D¯ G共D¯兲
⫽2 Tr h⌬⫹2 Tr⌬G共D¯兲⫹Tr⌬G共⌬兲, 共30兲 which shows that the changes in the true energy and in the model energy are related as
⌬ESCFRH⫽⌬ERH⫹Tr⌬G共⌬兲. 共31兲 If the last term共which is second order in⌬兲is negligible, the energy lowering predicted by the local model ERHbecomes equal to⌬ESCFRH . However, since the correction term is posi- tive 共strictly positive in the absence of exchange兲, its pres- ence in Eq. 共31兲 shows that, for sufficiently large steps, a lowering of the model function may not lead to a lowering of the total energy. To avoid such steps, it would be useful to provide an alternative prediction of⌬ESCFRH that is less expen- sive than the calculation of Tr⌬G共⌬兲itself. Section II A 3 is concerned with this problem.
To demonstrate the efficiency of the chosen level shift
optin the global region of a SCF optimization, we have for the fourth iteration of the rhodium-complex calculation plot- ted in Fig. 1共c兲,⌬ESCFRH and ⌬ERH as a function of . The energy gain ⌬ESCFRH is about optimal for the level shiftopt. Increasinggives a smaller energy gain while decreasing gives a slight increase in the energy gain and from ⬍4.5,
⌬ESCFRH is actually positive. Note also that for⬍opt,⌬ERH and ⌬ESCFRH start to differ indicating that the importance of Tr⌬G共⌬兲 increases. The step representing a RH iteration where ⫽0 is far too long to be trusted and results in a significant increase of the total energy.
3. Prediction of the energy close to the minimum To develop a better prediction of⌬ESCFRH than⌬ERH, we note that the only part, that cannot easily be evaluated from known Fock-matrices, is the second-order contribution to Eq.
共31兲 from that part of ⌬ that does not belong to the linear space spanned by the previous density matrices Di. To see this, we decompose the current density matrix D共兲into two parts
D共兲⫽D储⫹D⬜, 共32兲
where D储belongs to the linear space spanned by the previous density matrices and D⬜ belongs to its orthogonal comple- ment. We then expand D储 in the following manner:
D储共兲⫽i
兺
⫽n1 ci共兲Di, 共33兲where the expansion coefficients ci() are determined in a least-squares manner
ci共兲⫽
兺
j⫽n1 关M⫺1兴i jTr DjSD共兲S, Mi j⫽Tr DiSDjS. 共34兲The change in the SCF energy associated with the change of density matrix from D¯ to D共兲may be expressed as
⌬ESCFRH共兲⫽ESCF共D储兲⫺ESCF共D¯兲⫹2 Tr D⬜F共D储兲
⫹Tr D⬜G共D⬜兲. 共35兲 Ignoring the small term quadratic in D⬜, we may now pre- dict the change in the SCF energy at little cost from the expression
⌬ESCFP 共兲⫽ESCF共D储兲⫺ESCF共D¯兲⫹2 Tr D⬜F共D储兲, 共36兲 using only the density matrices and Fock/KS matrices of the previous iterations. In particular in the later parts of the it- eration sequence, where the space spanned by the densities of the preceding RH iterations is large, an accurate estimate of ⌬ESCFRH may be obtained from this formula. In the follow- ing, we shall see how we may use this prediction to deter- mine the level shift when min⫽0 and a(0)⬎amin.
To illustrate how ⌬ESCFP is used to find the level-shift parameter, consider as an example the determination of the level-shift parameter in the ninth iteration of the rhodium- complex calculation of Sec. III. The plot of the HOMO- LUMO gap in Fig. 2共a兲 shows that the allowed level-shift interval is 0⭐⬍⬁. In Fig. 2共b兲, we have plotted the over- lap a() as a function of . Since a(0)⬎amin, we should, according to the discussion in Sec. II A 2, use opt⫽0 to determine the step. In short, considerations based on the HOMO-LUMO gap and on the overlap with the averaged density matrix indicate that the next density matrix should be determined from the standard, unshifted RH equations.
However, from the nine density matrices of the previous RH iterations, we can use⌬ESCFP () to predict the change in ESCFRH() more accurately than with⌬ERH(). Indeed, from Fig. 2共c兲, we see that⌬ESCFP () provides a good global rep- resentation of⌬ESCFRH(), with a minimum close to the mini- mum of ⌬ESCFRH(). By contrast, the local model⌬ERH()
gives a minimum at⫽0. Clearly,⫽0 should be avoided in the calculation since it would lead to an increase in the SCF energy. Instead, the value of the level-shift parameter that corresponds to the minimum of⌬ESCFP 共denoted byopt) is chosen for the calculation of the next density matrix.
This procedure may be summarized as follows. Ifmin
⫽0 and a(0)⬎amin, then we calculate the predicted energies
⌬ESCFP (0) and ⌬ESCFP (␦) with ␦ⲏ0. If ⌬ESCFP (0)
⬍⌬ESCFP (␦), then we use D共0兲. Otherwise, we estimate the minimum opt of ⌬ESCFP () by an inexact line search and use the density matrix D(opt) at this minimum.
B. Density-subspace minimization 1. The DSM energy function
Let us assume that we have carried out n RH iterations and that we have kept all previous density matrices Di and the corresponding Fock matrices Fi. We would now like to construct an optimal density as a linear combination of the densities from these iterations according to Eq.共9兲,
D¯⫽i
兺
⫽n1 ciDi. 共37兲Ideally, this averaged density should also fulfill the condi- tions Eqs.共3兲–共5兲. The symmetry condition Eq.共3兲is trivi- ally satisfied since the averaged density Eq. 共37兲is a linear combination of symmetric density matrices. The trace condi- tion Eq. 共4兲 is also easily taken care of by imposing the restriction
i
兺
⫽1 nci⫽1 共38兲
on the expansion coefficients Tr D¯ S⫽i
兺
⫽1n
ciTr DiS⫽N
2. 共39兲
By contrast, the idempotency condition Eq. 共5兲 cannot be imposed on the averaged density matrix. However, the idem- potency may be significantly improved if, instead of working with D¯ , we work with the purified density matrix6
D˜⫽3D¯ SD¯⫺2D¯ SD¯ SD¯ , 共40兲 as proposed by Nunes and Vanderbilt.7The electronic energy may be expressed in terms of the purified average density matrix as
E共D˜兲⫽2 Tr hD˜⫹Tr D˜ G共D˜兲. 共41兲 We note that the purified density is correct to first order in the expansion coefficients ci and that E(D˜ ) thus contains errors through second order in ci. To determine the best average density matrix Eq. 共37兲, we shall minimize Eq.共41兲 with respect to the expansion coefficients ci subject to the condition Eq.共38兲.
One problem we encounter when minimizing Eq.共41兲is that new Fock matrices F(D˜ ) need to be evaluated. To avoid this problem, we shall use an approximate form of Eq.共41兲. Since the purified density matrix D˜ is close to the original density matrix D¯ , we can write it as
D˜⫽D¯⫹⌬, 共42兲
where ⌬ is the correction term. Inserting Eq.共42兲 into Eq.
共41兲, we obtain
E⫽2 Tr hD¯⫹Tr D¯ G共D¯兲⫹2 Tr h⌬
⫹2 Tr⌬G共D¯兲⫹Tr⌬G共⌬兲. 共43兲 Since⌬is small, we may ignore the term quadratic in⌬and arrive at the density-subspace minimization 共DSM兲 energy function
EDSM共c兲⫽2 Tr hD¯⫹Tr D¯ G共D¯兲⫹2 Tr h⌬⫹2 Tr⌬G共D¯兲
⫽E共D¯兲⫹2 Tr F共D¯兲共D˜⫺D¯兲. 共44兲 Since ⌬ is first order in the expansion coefficients ci, the DSM energy differs from the true energy to second and
FIG. 2. For the ninth iteration of the rhodium calculation described in Sec.
III we have displayed as a function of the level-shift parameter;共a兲the HOMO-LUMO gap⌬⑀ai, wheremin⫽0,共b兲the overlap a between the old and new density matrices, where aminis the smallest accepted overlap and 共c兲the change in the model energy⌬ERH, the actual energy⌬ESCFRH and the predicted energy⌬ESCF
P .optis found at the minimum of⌬ESCF P ().
higher orders in ci. The first contribution to the DSM energy function may for example be evaluated using the energy ex- pression of the EDIIS algorithm,5
E共D¯兲⫽
兺
i ciESCF共Di兲⫺12兺
i j cicjTr共Fi⫺Fj兲共Di⫺Dj兲.共45兲 Using Eq.共40兲, we find that the second contribution may be evaluated as
2 Tr F共D¯兲共D˜⫺D¯兲⫽⫺2
兺
i j cicjTr FiDj⫹6
兺
i jk cicjckTr FiDjSDk⫺4
兺
i jkl cicjckclTr FiDjSDkSDl. 共46兲All contributions to the DSM energy function are therefore easily calculated from the previous density and Fock/KS matrices.
2. The trust-region DSM minimization
We minimize the DSM energy functional by the trust- region method.12 We thus consider the second-order Taylor expansion of the DSM energy in Eq. 共44兲 about c0. Intro- ducing the step vector
⌬c⫽c⫺c0, 共47兲 we obtain
E(2)DSM共⌬c兲⫽E0⫹⌬cTg⫹12⌬cTH⌬c, 共48兲 where the energy, gradient, and Hessian at the expansion point are given by
E0⫽E共c0兲, g⫽E共c兲
c
冏
c⫽c0
, H⫽2E共c兲
c2
冏
c⫽c0
. 共49兲 As starting point c0, we choose the density matrix with the lowest energy ESCF(Di), usually from the last RH iteration.
The trace condition Eq. 共38兲imply
兺
i⫽1 n⌬ci⫽0. 共50兲
We also introduce a trust region of radius h for E(2)DSM(⌬c) and require that steps are always taken inside or to the boundary of this region. To determine a step to the boundary, we restrict the step to have the length h in the S metric norm of Eq.共34兲,
储⌬c共兲储S
2⫽
兺
i j ⌬ciMi j⌬cj⫽h2. 共51兲 Introducing the undetermined multipliers and for the trace and step-size constraints, we arrive at the following Lagrangian for minimization on the boundary of the trust region:L共⌬c,,兲⫽E0⫹⌬cTg⫹12⌬cTH⌬c⫺⌬cT1
⫺12共⌬cTM⌬c⫺h2兲, 共52兲 where 1 is a column vector with elements equal to 1. Differ- entiating this Lagrangian and setting the derivatives equal to zero, we obtain the equations
L
⌬c⫽g⫹H⌬c⫺M⌬c⫺1⫽0, 共53兲
L
⫽⫺⌬cT1⫽0, 共54兲
L
⫽⫺12共⌬cTM⌬c⫺h2兲⫽0. 共55兲 The optimization of the Lagrangian thus corresponds to the solution of the following set of linear equations:
冉
共H⫺⫺1TM兲 ⫺01冊 冉⌬c冊
⫽⫺冉
0g冊
, 共56兲
where the multiplieris iteratively adjusted until the step is to the boundary of the trust region Eq.共55兲. The step-length restriction may be lifted by setting⫽0, as needed for steps inside the trust region.
To understand the behavior of the step-length function, we consider first the generalized eigenvalue problem
冉
⫺H1T ⫺01冊 冉v冊
⫽冉
M0T 0冊 冉v冊
, 共57兲
冊
, 共57兲where 0 is a column vector with zero elements, is a small positive constant, and the eigenvector is normalized such that
vTv⫹2⫽1. 共58兲
We first note that, for a finite , v⫽0. Next, carrying out block multiplications in Eq.共57兲, we obtain
Hv⫺1⫽Mv, 共59兲
⫺1Tv⫽, 共60兲 which upon elimination of from the first equation yields the relation
Hv⫹共1Tv兲1⫽2Mv. 共61兲 Since (1Tv)1 is finite, we conclude that, as tends to zero, the eigenvalue tends to either plus or minus infinity
⫾⫺1/2. Next, substituting these values of into Eq. 共60兲, we find that v tends to the zero vector with elements propor- tional to 1/2 and that , because of the normalization Eq.
共58兲, tends to⫿1. In short, the eigenvalue problem Eq.共57兲 with ⫽0 has two eigenvalues ⫾⬁, whose eigenvectors have zero elements except for the last element, which is equal to⫿1. Finally, invoking the Hylleraas–Undheim inter- lace theorem,10,11we conclude that the remaining n⫺1 finite eigenvalues of Eq. 共57兲 bisects the n eigenvalues of the re- duced eigenvalue problem
Hv⫽Mv. 共62兲
Let us now consider the step length储⌬c()储Sas a func- tion of . In the diagonal representation of the augmented matrix in the linear equations Eq. 共57兲, we may write these equations in the following uncoupled form:
共hi⫺mi兲i⫽⫺␥i, i⫽1,2,3,...,n⫹1. 共63兲 Here, the hiand miare the diagonal elements of the Hessian and metric matrices, respectively, of the generalized eigen- value problem Eq.共57兲, whereas theiand␥i, respectively, are the corresponding elements of the solution and gradient vectors of Eq. 共56兲. Since the last element of the gradient vector in Eq. 共56兲is zero, the gradient vector has no contri- butions from the eigenvectors with infinite eigenvalues
␥1⫽␥n⫹1⫽0, 1⫽⫺n⫹1⫽⫺⬁ 共64兲 assuming that the eigenvalues are sorted in increasing order
1⬍2⬍¯⬍n⫹1. In the diagonal representation, there- fore, we may write the step norm in the form
储⌬c共兲储S⫽
冑
i兺
⫽2n mi␥i
2
共hi⫺mi兲2. 共65兲 From this expression, we note that the step function consists of n branches separated by n⫺1 asymptotes at the finite eigenvalues i. Moreover, it increases monotonically from zero to infinity as increases from minus infinity and ap- proaches the lowest finite eigenvalue2. Therefore, there is always one and only one ⫺⬁⬍⬍2 that gives rise to a step of length h. As shown by Fletcher,12 this value of corresponds to the global minimum on the boundary of the trust region.
In practice, we cannot easily determine the eigenvalues
i of the augmented eigenvalue problem Eq. 共57兲. Instead, we determine the eigenvaluesi of the reduced problem Eq.
共62兲 and restrict our search of to the smaller monotonic interval ⫺⬁⬍⬍1. Since 1⬍2, it is possible that no solution exists in this reduced interval. Mostly, however, this restriction is mild since the two eigenvalues are usually close. If no solution is found, we choose instead the slightly shorter step obtained with⫽1.
To illustrate how the level-shift parameterin Eq.共56兲 is determined, we consider the first关Fig. 3共a兲兴and third关Fig.
3共b兲兴 DSM step in the eighth iteration of the rhodium- complex calculation in Sec. III. We have plotted the step- length function储⌬c()储S as a function of. The plots con- sist of a series of branches between asymptotes where makes the matrix on the left-hand side of Eq.共56兲singular.
The lowest eigenvalue 1 is marked with a vertical dashed line in Figs. 3共a兲and 3共b兲. For minimization, the level-shift parameter is chosen in the interval ⫺⬁⬍⬍min(1,0), where 1 is the lowest eigenvalue of Eq. 共62兲. The proper value is found where the step-length function crosses the line representing the trust radius h, as marked with a cross in Fig.
3共a兲. If the step that minimizes E(2)DSMis inside the trust re- gion, ⫽0 is chosen as marked with a cross in Fig. 3共b兲. The trust region is updated during the iterative procedure.
3. Global optimization of the DSM function
The optimization of the EDSMenergy is carried out in the usual manner, requiring several trust-region steps, each of which involves the construction of the gradient g and the Hessian H, and the solution of the modified level-shifted Newton equations Eq.共56兲. After p iterations, the density is calculated from the coefficients
cp⫽c(0)⫹
兺
i⫽p1 ⌬ci. 共66兲However, since EDSM itself is a rather crude model of the true energy function ESCF, it resembles ESCFonly in a small region about the initial point c(0). The DSM iterations are therefore terminated when the total step length 储cp⫺c(0)储 exceeds some preset value k. If a minimum of EDSMis found inside the trust region 储cp⫺c(0)储⬍k, then the step to the minimum is taken and the iterations are terminated. This is often the case.
Occasionally, the iterations start where the lowest eigen- value of the Hessian in Eq.共62兲is negative. In the course of the iterations, the Hessian can become positive definite and a minimum is reached. In a few cases, however, a negative Hessian eigenvalue may persist, changing little from itera- tion to iteration. In our experience, a step along the eigen- vector corresponding to the negative eigenvalue cannot be trusted. This direction is therefore projected out from the step and the DSM function is minimized in the orthogonal sub- space.
As an illustration, consider the first DSM step of the tenth SCF iteration of the rhodium-complex calculation in Sec. III. In Fig. 4, we have, for comparison, plotted the step- length functions with the negative component kept and pro- jected out. The level shifts resulting from the two situations
FIG. 3. The step-length function储⌬c()储Sis plotted as a function offor the first共a兲and third共b兲DSM step in the eighth iteration of the rhodium calculation described in Sec. III. The trust radius h is represented by a horizontal line. The propervalue is marked with a cross.
are marked with crosses in Fig. 4. The level shift used in the DSM optimization is, in this particular case,⫽0.
When the trust-region minimization is terminated, a new RH iteration is initiated by constructing a new density and associated Fock matrix
D¯⫽i
兺
⫽n1 ciDi, F¯⫽i兺
⫽n1 ciF共Di兲, 共67兲where we have used the fact that the Fock matrix is linear in the density. By construction EDSM(c) is lowered at each it- eration of the trust-region minimization. The total energy lowering at the pth iteration is given by
⌬EDSM⫽EDSM共cp兲⫺EDSM共c(0)兲. 共68兲 Since EDSM is a local model to the true energy ESCF, the lowering of EDSMwill also lead to a lowering of ESCFpro- vided the total step is sufficiently short to be in the local region.
4. Relationship to the DIIS method
The optimal density has previously been determined us- ing the DIIS scheme of Pulay.4In the DIIS method, the im- proved density matrix is obtained as a linear combination of the previous density matrices where the expansion coeffi- cients are determined by minimizing the norm of the error vector, using the gradients of the previous iterations as error vectors. To highlight the difference between TRDSM and DIIS, we give below an alternative derivation of the DIIS algorithm.
In an SCF calculation, the electronic gradient with the averaged density matrix D¯ in Eq.共37兲may be expressed in the form,3
g共D¯兲⫽4共D¯ SF共D¯兲⫺F共D¯兲SD¯兲. 共69兲 To determine the best linear combination of densities Di, we minimize the norm of the squared gradient
储g共D¯兲储2⫽16 Tr关D¯ SF共D¯兲⫺F共D¯兲SD¯兴2. 共70兲 Inserting the expansion Eq. 共37兲, we obtain a quartic poly- nomial in ci,
储g共D¯兲储2⫽16 Tr
再 兺i cig共Di兲⫹兺
i, j cicj关DiSF共Dj⫺Di兲
⫺F共Dj⫺Di兲SDi兴
冎
2. 共71兲To simplify this expression, we neglect all cubic and quartic terms
储g共D¯兲储app
2 ⫽
兺
i, j cicjg共Di兲g共Dj兲. 共72兲Optimization of Eq. 共72兲 subject to the constraint Eq. 共38兲 gives the DIIS expression of the expansion coefficients in Eq. 共37兲.
III. APPLICATIONS
In this section, we examine the convergence characteris- tics of the TRSCF algorithm. First, we consider a rhodium- complex optimization as an example of a difficult case; next, as a simpler case, we consider a calculation on H2O with the OH bond lengths stretched to double length. For comparison, we also give the convergence characteristics of the DIIS algorithm4 and the quadratically convergent restricted step Hartree–Fock 共QRHF兲method.13,14All calculations are car- ried out using a local version of the DALTON program package.17
A. The rhodium complex calculation
In Fig. 5, we have plotted the error in the energy at each iteration of TRSCF, DIIS, and QRHF optimizations of the rhodium complex with the geometry specified in Table I us- ing the AhlrichsVDZ basis16combined with STO-3G on Rh.
The starting orbitals have been obtained from diagonalizing the one-electron Hamiltonian.
Clearly, the QRHF and DIIS methods do not work in this case. In particular, the DIIS method is unable to handle the global part of the optimization, where the initially indefinite Hessian changes its structure and becomes positive definite.
Since the DIIS method relies solely on gradient information, it does not see the negative eigenvalues and produces steps that may or may not be in the right direction, leading to
FIG. 4. The step-length function储⌬c()储Sis plotted as a function ofwith the direction corresponding to the negative Hessian eigenvalue kept共—兲and projected out 共- - -兲, respectively. The values resulting from the two situations are marked with crosses.
FIG. 5. The convergence of calculations on the rhodium complex using AhlrichsVDZ basis共Ref. 16兲combined with STO-3G for Rh. The error in the total energy is given for the TRSCF, the standard DIIS, and the QRHF method as a function of the iteration number. Furthermore results are given where DIIS is applied after nine TRSCF iterations.