• No results found

The trust-region self-consistent field method: Towards a black-box optimization in Hartree–Fock and Kohn–Sham theories

N/A
N/A
Protected

Academic year: 2022

Share "The trust-region self-consistent field method: Towards a black-box optimization in Hartree–Fock and Kohn–Sham theories"

Copied!
12
0
0

Laster.... (Se fulltekst nå)

Fulltekst

(1)

The trust-region self-consistent field method: Towards a black-box optimization in Hartree–Fock and Kohn–Sham theories

Lea Thøgersen, Jeppe Olsen, Danny Yeager,a)and Poul Jørgensen Department of Chemistry, University of A˚ rhus, DK-8000 A˚rhus C, Denmark Paweł Sałek

Laboratory of Theoretical Chemistry, The Royal Institute of Technology, Teknikringen 30, Stockholm SE-10044, Sweden

Trygve Helgaker

Department of Chemistry, University of Oslo, P.O. Box 1033 Blindern, N-0315 Norway 共Received 17 February 2004; accepted 5 April 2004兲

The trust-region self-consistent field共TRSCF兲method is presented for optimizing the total energy ESCFof Hartree–Fock theory and Kohn–Sham density-functional theory. In the TRSCF method, both the Fock/Kohn–Sham matrix diagonalization step to obtain a new density matrix and the step to determine the optimal density matrix in the subspace of the density matrices of the preceding diagonalization steps have been improved. The improvements follow from the recognition that local models to ESCF may be introduced by carrying out a Taylor expansion of the energy about the current density matrix. At the point of expansion, the local models have the same gradient as ESCF but only an approximate Hessian. The local models are therefore valid only in a restricted region—

the trust region—and steps can only be taken with confidence within this region. By restricting the steps of the TRSCF model to be inside the trust region, a monotonic and significant reduction of the total energy is ensured in each iteration of the TRSCF method. Examples are given where the TRSCF method converges monotonically and smoothly, but where the standard DIIS method diverges. © 2004 American Institute of Physics. 关DOI: 10.1063/1.1755673兴

I. INTRODUCTION

The steady progress in computer technology and quantum-chemical methodology has widened the range of users of quantum-chemical software packages to include a vast number of practicing, experimental chemists. Routinely, such users perform Hartree–Fock 共HF兲 calculations and Kohn–Sham 共KS兲 density-functional theory 共DFT兲 calcula- tions for molecules of a size and complexity that, a decade ago, were beyond reach even for the most advanced research codes. This development calls for further advances in the automatization of the self-consistent field 共SCF兲 procedure used to optimize the HF and DFT energies, so as to ensure that convergence may be reached in a routine manner even for very complex molecules.

In the original formulation, the SCF procedure consists of a sequence of Roothaan–Hall 共RH兲iterations.1,2At each iteration, a Fock/KS matrix is first constructed from the cur- rent approximation to the one-electron density matrix and then diagonalized to yield an improved set of orbitals and orbital energies and thus an improved density matrix. In the subsequent iteration, this improved density matrix is then used to construct a new Fock/KS matrix, thereby establishing the iteration procedure. However, such a sequence of RH

iterations converges only in simple cases. To improve upon the convergence, each RH iteration may be extended to in- clude, in addition to the diagonalization step, also a step where the best density matrix is generated in the subspace of the density matrices of the current and preceding RH itera- tions. In the next RH iteration, this averaged density matrix rather than the pure density matrix obtained in the last diago- nalization is used to construct the new Fock/KS matrix.

In this paper, we make improvements both to the RH diagonalization step and to the density-subspace optimiza- tion step of the SCF scheme. Our approach follows from the recognition that, in both steps, we may construct local mod- els to the SCF energy function ESCFby a Taylor expansion of the energy about the current density matrix. However, since, at the point of expansion, these models have an exact gradi- ent but only an approximate Hessian, they are valid only in a restricted region about the current approximation to the den- sity matrix—the trust region. Therefore, when these local models are used in the course of the SCF optimization, it is essential they are used only to generate steps within their trust region. Only in this manner can it be ensured that the SCF energy is systematically and sufficiently lowered at each iteration.

In the RH diagonalization part of the SCF optimization, the improvements are obtained by introducing an energy function ERH that corresponds to the sum of the occupied

a兲On leave. Permanent address: Department of Chemistry, Texas A&M Uni- versity, P.O. Box 30012, College Station, Texas 77842-3012.

16

0021-9606/2004/121(1)/16/12/$22.00 © 2004 American Institute of Physics

(2)

orbital energies.3An unconstrained minimization of ERHre- sults in the same solution共i.e., density matrix兲as obtained by a diagonalization of the Fock/KS matrix. However, since, at the point of expansion, the RH energy function ERHhas only the gradient in common with the true SCF energy ESCF, a global minimization of ERH may lead to steps that are too long to be trusted. We therefore introduce a trust region where ERH is a good approximation to ESCF. If a global minimization of ERHleads to a step outside the trust region, then the step to the minimum on the boundary of the trust region for ERH is taken instead. This step is found by a level-shifting technique, where the occupied molecular or- bital energies effectively are shifted by some constant to in- crease the gap between the occupied and virtual molecular orbitals. Level shifting has previously been used to improve the convergence of the simple RH sequence of iterations. An essential feature of our implementation is to adjust the level shift in such a manner that the step is to the boundary of the trust region, recognizing that only in this manner does a low- ering of ERHresult in a lowering of ESCF. For this reason, the resulting method is called the trust-region RH 共TRRH兲 method.

The optimization of the density matrix in the subspace of the density matrices of the preceding RH iterations has a long history. Early on, it was recognized that a simple aver- aging of the density matrices of the last few RH iterations significantly improves the convergence of the RH scheme.

This simple density-matrix averaging technique was later ra- tionalized and systematized in the direct inversion in itera- tive subspace 共DIIS兲method of Pulay.4In the DIIS method, an improved density matrix is obtained as a linear combina- tion of the previous density matrices by minimizing the norm of the corresponding linear combination of gradients. The DIIS method significantly speeds up the local convergence and convergence can often be obtained to ground states of rather complex molecules with a small gap between energies of the highest occupied molecular orbital 共HOMO兲 and the lowest unoccupied molecular orbital 共LUMO兲 and with a large number of close-lying electronic states.

Several attempts have been made to modify the DIIS algorithm so as to improve upon its global convergence be- havior. Recently, Kudin, Scuseria, and Cances proposed the energy DIIS共EDIIS兲method, where the DIIS gradient-norm minimization is replaced by a minimization of an approxi- mate energy function.5In EDIIS, the variational parameters, which are the linear expansion coefficients of the density matrices from the previous RH iterations, may only take on values that give densities in the convex set—that is, densities with occupation numbers between 0 and 1. As the EDIIS method is based on the minimization of an approximate en- ergy function, it may have some advantages in the global region. However, it is worrying that a convex solution often cannot be obtained and that the observed local convergence of the EDIIS method is slower than in the standard DIIS method.

In the DIIS and EDIIS methods, an improved density matrix is obtained as a sum of the density matrices from the preceding RH diagonalization steps. Consequently, the aver- aged density matrix is not idempotent as required in HF and

KS theories. The deviation from idempotency may be re- duced using a purified density matrix as the one suggested by McWeeny.6This has been done for the SCF energy minimi- zation by several workers including Nunes and Vanderbilt7 and Daniels and Scuseria8 and for the calculation of geo- metrical derivatives by Ochsenfeld and co-workers.9It may also be done for the EDIIS energy function. The energy func- tion then has the same gradient as ESCF, but also contains terms which cannot be obtained from the densities and Fock/KS matrices of the previous RH iterations. Neglecting these terms, we arrive at the density-subspace minimization 共DSM兲algorithm proposed in this paper. At the point of ex- pansion, the DSM energy function EDSMthus has the same gradient as the true energy function ESCF but only an ap- proximate Hessian. Again, a trust region may be introduced and only steps within this region are taken, ensuring that any lowering of EDSM also corresponds to a lowering of ESCF. The resulting method is called the trust-region DSM 共TRDSM兲method.

In the next section, we first describe the standard opti- mization of the SCF energy function in a density-matrix for- mulation. The TRRH method is then discussed in Sec. II A and the TRDSM method in Sec. II B. In Sec. III, we give some numerical examples to demonstrate the performance of the resulting trust-region SCF 共TRSCF兲 method. The last section contains some concluding remarks.

II. THEORY

For a closed-shell system with N/2 electron pairs, the Hartree–Fock共HF兲energy excluding the nuclear–nuclear re- pulsion energy is given by3

ESCFD兲⫽2 Tr hDTr DGD兲, 共1兲 where D is the one-electron density matrix in the atomic- orbital共AO兲basis, h is the one-electron Hamiltonian matrix and GD兲is defined as

G␮␯D兲⫽

␳␴ 2g␮␯␳␴g␮␴␳␯D␴␳, 2

where g␮␯␳␴ is a two-electron integral in the AO basis. For the energy in Eq.共1兲to be a valid approximation to the true HF energy, the density matrix D must satisfy the symmetry, trace, and idempotency conditions:

DTD, 共3兲

Tr DSN

2 , 共4兲

DSDD. 共5兲

Similar conditions apply in the Kohn–Sham共KS兲theory, but the energy function of Eq. 共1兲 must then be modified by including the exchange-correlation term and by scaling 共or complete removal兲of the exchange term from Eq.共2兲.

The traditional approach to the optimization of the HF energy is an iterative one. From the current approximation to the density matrix Dn in iteration n, a Fock matrix is built

FDn兲⫽hGDn兲 共6兲

(3)

and, following the Roothaan–Hall共RH兲procedure, the Fock matrix is diagonalized

FDnCoccSCocc␭, 共7兲 where S is the overlap matrix in the AO basis, to give a set of occupied molecular orbitals 共MOs兲, from which a new ap- proximation to the density matrix is obtained as

Dn1CoccCoccT . 共8兲 The iteration procedure is established using Dn1as the cur- rent density in Eq.共6兲. The final solution to the minimization problem is obtained when the Dn and Dn1 are the same.

This self-consistent field共SCF兲procedure may also be used in KS theory, the only difference being the addition of the exchange-correlation potential and the scaling of the ex- change contribution in the Fock matrix to yield the KS ma- trix.

The pure RH iterations presented above often do not converge. A powerful method for handling this divergence is not to construct the Fock matrix from the density matrix Dn but rather from an average of all previous density matrices:

ni

1 n

ciDi. 共9兲

The averaged density matrix D¯n is then used in place of the pure density matrix Dn in Eq.共6兲 to obtain the Fock matrix F(D¯

n) as Fn兲⫽i

1

n

ciFDi兲 共10兲

and the iteration procedure is established. In the course of the TRSCF iterations, the following matrices are set up in the order indicated: D1, F(D1), D2, F(D2), D¯2, F(D¯2), D3, F(D3), D¯

3, F(D¯

3), . . . . Among these, D1, F(D1), D2, F(D2), D3, F(D3), . . . are saved during the iteration proce- dure.

In the following, we describe improvements to the SCF diagonalization and density-subspace optimization steps. In Sec. II A, we describe how the trust-region RH 共TRRH兲 method is used to generate new density matrices by a modi- fication of the traditional RH method Eqs.共7兲and共8兲. Next, in Sec. II B, we introduce the trust-region density-subspace minimization共TRDSM兲method for calculating the averaged density matrix of Eq. 共9兲. In the following, we use the indi- ces i, j,k,l for occupied MOs and the indices a,b,c,d for the virtual MOs.

A. The trust-region Roothaan–Hall method

As discussed in Ref. 3, the traditional RH method may be viewed as a minimization of the sum of the orbital ener- gies of the occupied MOs

ERH⫽2

i i2 Tr FD¯D, 11

subject to orthonormality constraints on the occupied MOs

i:

具␾i兩␾j典⫽␦i j. 共12兲

Whereas D¯ is the current approximation to the HF/KS den- sity matrix, usually obtained as a linear combination of the previous densities according to Eq.共9兲, the density matrix D to be optimized in Eq. 共11兲 is related to the occupied MOs resulting from the diagonalization of F(D¯ ) as

DCoccCoccT . 共13兲

To see this, consider the constrained minimization of ERHin Eq. 共11兲expressed in terms of the Lagrangian

L2 Tr FD⫺2 Tr␭共CoccT SCoccIN/2兲, 共14兲 where the multipliers ␭i j ensure orthonormality among the occupied MOs. Minimization of this Lagrangian leads to the standard RH equations:

FCoccSCocc␭. 共15兲 However, since ERHof Eq.共11兲is only a crude model of the true energy ESCFthe gradient is correct at D¯ assuming D¯ is idempotent兲, a global minimization of ERHaccording to Eq.

共15兲may easily lead to steps that are too long to be trusted as they are outside the region where ERHis a good approxima- tion to ESCF. Steps outside the trust region may often not lead to a reduction of the total energy ESCF.

1. The level-shifted Roothaan–Hall equations

To avoid too long steps, an additional constraint is im- posed on the optimization of Eq. 共11兲, namely, that the new density matrix D in Eq. 共13兲does not differ too much from the old matrix D¯ . This condition is conveniently expressed in terms of the overlap between the density matrices in the S metric norm

DSTr DSD¯ Sa

N2Tr D¯ SD¯ S, 16

where Tr D¯ SD¯ SN/2 since D¯ is not necessarily idempotent.

Note that, for D equal to 共an idempotent兲 D¯ , a is equal to one. For a sufficiently close to one, a step will therefore be taken in the local region. In practice, we define sufficiently close to one by the parameter amin⫽0.975.

Introducing an undetermined multiplier ␮ associated with this new constraint, we obtain the following Lagrang- ian:

L2 Tr FD⫺2␮共Tr SD¯ SD⫺a

N2Tr D¯ SD¯ S

⫺2 Tr␭共CoccT SCoccIN/2兲. 共17兲 Differentiating this Lagrangian with respect to the MO coef- ficients and setting the result equal to zero, we arrive at the level-shifted RH equations

F兲⫺␮SD¯ SCocc共␮兲⫽SCocc共␮兲␭. 共18兲 To interpret the level-shift term, we note that D¯ S projects out the component of Coccthat is occupied in D¯ 共assuming idem- potent D¯ ), see Ref. 3. The level shift therefore works only on the occupied part of F(D¯ ), shifting all the occupied orbital energies and increasing the gap between the occupied and virtual MOs, in particular the HOMO-LUMO gap.

(4)

Since the SCF energy ESCFis invariant with respect to an orthogonal transformation between the MOs, Eq. 共18兲 may be transformed to the canonical basis:

F兲⫺␮SD¯ S兴Cocc共␮兲⫽SCocc共␮兲⑀, 共19兲 where the diagonal matrix ⑀contains the orbital energies.

2. Choice of the RH level-shift parameter

The density matrix generated from the restricted RH so- lution Eq. 共19兲depends on the level-shift parameter␮:

D共␮兲⫽Cocc共␮兲CoccT 共␮兲. 共20兲 To see how ␮is determined, we consider the determination of ␮in the fourth iteration of the rhodium-complex calcula- tion described in Sec. III. In Fig. 1共a兲, we have plotted the HOMO-LUMO gap as a function of␮,

⌬⑀ai共␮兲⫽⑀a

LUMO共␮兲⫺⑀i

HOMO共␮兲, 共21兲

where⑀i

HOMO(␮) anda

LUMO(␮) are the HOMO and LUMO orbital energies, respectively; in Fig. 1共b兲, we have plotted the overlap between the old and new density matrices as given by

a共␮兲⫽ 具D共␮兲兩S

D共␮兲兩D共␮兲典SD¯S

, 共22兲

where具D()D()S is equal to N/2. For sufficiently large

␮, the HOMO-LUMO gap Eq. 共21兲is linear in␮. This lin- earity of ⌬⑀ai(␮) for large␮arises from the dependence of the orbital energies on ␮in Eq.共19兲, where␮is effectively subtracted from the occupied orbital energies. The MOs C¯occ occupied in D¯ satisfy the generalized eigenvalue equations

SD¯ SC¯occSC¯occ, 共23兲 and become identical to the MOs Cocc(⬁) obtained from Eq.

共19兲 when ␮tends to infinity. The corresponding density is denoted

D共⬁兲⫽

occ

occ

T 共24兲

and represents a purified D¯ . In the linear regime of⌬⑀ai(␮), there is a continuous development of the occupied MOs from those occupied in D¯ . As␮ decreases and we enter the non- linear regime at ␮min, the MOs in Eq.共20兲no longer corre- spond to those in Eq.共23兲. Comparing plot共a兲and共b兲in Fig.

1, we note that the region a()amin in Fig. 1共b兲 corre- sponds roughly to the region ␮⬍␮minin Fig. 1共a兲.

As we insist on a controlled, continuous development of the MOs from those occupied in D¯ , the level-shift parameter should be restricted to the linear regime␮min⬍␮⬍⬁. To de- termine the optimal level-shift parameter ␮opt, we therefore begin by establishing the onset of linearity ␮min by linear extrapolation by means of two Fock/KS matrix diagonaliza- tions, giving the two⌬⑀aivalues marked by crosses and the linearly interpolated␮minvalue marked with an arrow. Next, since, in the linear interval, a small␮corresponds to a large step, we investigate whether ␮minis acceptable by checking if a(min)⬎amin. If this step is too long, we backtrack by increasing ␮ using inexact line search until an acceptable value ␮optis found such that a(opt)⬇amin, requiring a few additional Fock/KS matrix diagonalizations. In Fig. 1共b兲, the accepted ␮optis marked with an arrow.

For a better understanding of this step, consider the Hes- sian of the ERHenergy function:

Aai,b jRH ⫽␦i jab共⑀a⫺⑀i兲. 共25兲

By restricting the level-shift parameter to␮min⬍␮⬍⬁where

a

LUMO(␮)⫺⑀i

HOMO(␮)⬎0, we ensure that the effective Hes- sian is positive definite and that the model energy function ERHis reduced. We note that the Hessian of the true energy function ESCFis given by the more complicated expression

Aai,b jSCF⫽␦i jab共⑀a⫺⑀i兲⫹4gaib jgabi jga jib. 共26兲 Often, the orbital energy difference dominates the Hessian.

In such cases, we expect the above step to reduce the SCF energy ESCFas well as the model function ERH. In any case, when a sufficiently large level shift is added in Eq.共19兲, the

FIG. 1. For the fourth iteration of the rhodium calculation described in Sec.

III we have displayed as a function of the level-shift parameter;athe HOMO-LUMO gapai, whereminis the smallest accepted level-shift, bthe overlap a between the old and new density matrices, whereoptis the optimal level-shift, andcthe change in the model energyERHand the actual energyESCFRH .

(5)

Hessian structure of Eq. 共25兲becomes similar to that of the true energy function ESCF in Eq. 共26兲. The steps generated from ERH with such level shifts will therefore have essen- tially the same direction as the ones generated from ESCF.

By construction, the ERH energy function is lowered when ␮is chosen according to the above prescription

ERH2 Tr F兲关D共␮兲⫺兴⬍0. 共27兲 Since ERHis only a local model of the true energy function ESCF, the associated change in the true energy

ESCFRHESCFD共␮兲兴⫺ESCF兲 共28兲 may be either negative or positive, depending on how well ERHrepresents ESCFfor the chosen step. However, for suffi- ciently small steps,⌬ESCFRH⬍0, since the model function then represents the true energy well.

Let us consider the relationship between the true lower- ing⌬ESCFRH and the lowering predicted by the model function

ERH. Introducing the 共presumably small兲 differential den- sity matrix

⌬⫽D共␮兲⫺ 共29兲 and using the identity Tr AG(B)Tr BG(A) valid for sym- metric matrices A and B, we find that the change in the true energy Eq.共28兲may be written in the form

ESCFRH2 Tr hD共␮兲⫺兴⫹Tr共⫹⌬兲

G⫹⌬兲⫺Tr D¯ G

2 Tr h⌬⫹2 Tr⌬G兲⫹Tr⌬G共⌬兲, 共30兲 which shows that the changes in the true energy and in the model energy are related as

ESCFRH⫽⌬ERH⫹Tr⌬G共⌬兲. 共31兲 If the last term共which is second order in⌬兲is negligible, the energy lowering predicted by the local model ERHbecomes equal to⌬ESCFRH . However, since the correction term is posi- tive 共strictly positive in the absence of exchange兲, its pres- ence in Eq. 共31兲 shows that, for sufficiently large steps, a lowering of the model function may not lead to a lowering of the total energy. To avoid such steps, it would be useful to provide an alternative prediction of⌬ESCFRH that is less expen- sive than the calculation of Tr⌬G共⌬兲itself. Section II A 3 is concerned with this problem.

To demonstrate the efficiency of the chosen level shift

optin the global region of a SCF optimization, we have for the fourth iteration of the rhodium-complex calculation plot- ted in Fig. 1共c兲,⌬ESCFRH and ⌬ERH as a function of ␮. The energy gain ⌬ESCFRH is about optimal for the level shift␮opt. Increasing␮gives a smaller energy gain while decreasing␮ gives a slight increase in the energy gain and from ␮⬍4.5,

ESCFRH is actually positive. Note also that for␮⬍␮opt,⌬ERH and ⌬ESCFRH start to differ indicating that the importance of Tr⌬G共⌬兲 increases. The step representing a RH iteration where ␮⫽0 is far too long to be trusted and results in a significant increase of the total energy.

3. Prediction of the energy close to the minimum To develop a better prediction of⌬ESCFRH than⌬ERH, we note that the only part, that cannot easily be evaluated from known Fock-matrices, is the second-order contribution to Eq.

共31兲 from that part of ⌬ that does not belong to the linear space spanned by the previous density matrices Di. To see this, we decompose the current density matrix D共␮兲into two parts

D共␮兲⫽DD, 共32兲

where Dbelongs to the linear space spanned by the previous density matrices and D belongs to its orthogonal comple- ment. We then expand D in the following manner:

D共␮兲⫽i

n1 ciDi, 33

where the expansion coefficients ci(␮) are determined in a least-squares manner

ci共␮兲⫽

jn1 M1i jTr DjSDS, Mi jTr DiSDjS. 34

The change in the SCF energy associated with the change of density matrix from D¯ to D共␮兲may be expressed as

ESCFRH共␮兲⫽ESCFD兲⫺ESCF兲⫹2 Tr DFD

Tr DGD兲. 共35兲 Ignoring the small term quadratic in D, we may now pre- dict the change in the SCF energy at little cost from the expression

ESCFP 共␮兲⫽ESCFD兲⫺ESCF兲⫹2 Tr DFD兲, 共36兲 using only the density matrices and Fock/KS matrices of the previous iterations. In particular in the later parts of the it- eration sequence, where the space spanned by the densities of the preceding RH iterations is large, an accurate estimate of ⌬ESCFRH may be obtained from this formula. In the follow- ing, we shall see how we may use this prediction to deter- mine the level shift when ␮min0 and a(0)amin.

To illustrate how ⌬ESCFP is used to find the level-shift parameter, consider as an example the determination of the level-shift parameter in the ninth iteration of the rhodium- complex calculation of Sec. III. The plot of the HOMO- LUMO gap in Fig. 2共a兲 shows that the allowed level-shift interval is 0⭐␮⬍⬁. In Fig. 2共b兲, we have plotted the over- lap a(␮) as a function of ␮. Since a(0)amin, we should, according to the discussion in Sec. II A 2, use ␮opt⫽0 to determine the step. In short, considerations based on the HOMO-LUMO gap and on the overlap with the averaged density matrix indicate that the next density matrix should be determined from the standard, unshifted RH equations.

However, from the nine density matrices of the previous RH iterations, we can use⌬ESCFP (␮) to predict the change in ESCFRH(␮) more accurately than with⌬ERH(␮). Indeed, from Fig. 2共c兲, we see that⌬ESCFP (␮) provides a good global rep- resentation of⌬ESCFRH(␮), with a minimum close to the mini- mum of ⌬ESCFRH(␮). By contrast, the local model⌬ERH(␮)

(6)

gives a minimum at␮⫽0. Clearly,␮⫽0 should be avoided in the calculation since it would lead to an increase in the SCF energy. Instead, the value of the level-shift parameter that corresponds to the minimum of⌬ESCFP 共denoted by␮opt) is chosen for the calculation of the next density matrix.

This procedure may be summarized as follows. If␮min

0 and a(0)amin, then we calculate the predicted energies

ESCFP (0) and ⌬ESCFP (␦) with ␦ⲏ0. If ⌬ESCFP (0)

⬍⌬ESCFP (␦), then we use D共0兲. Otherwise, we estimate the minimum ␮opt of ⌬ESCFP (␮) by an inexact line search and use the density matrix D(opt) at this minimum.

B. Density-subspace minimization 1. The DSM energy function

Let us assume that we have carried out n RH iterations and that we have kept all previous density matrices Di and the corresponding Fock matrices Fi. We would now like to construct an optimal density as a linear combination of the densities from these iterations according to Eq.共9兲,

i

n1 ciDi. 37

Ideally, this averaged density should also fulfill the condi- tions Eqs.共3兲–共5兲. The symmetry condition Eq.共3兲is trivi- ally satisfied since the averaged density Eq. 共37兲is a linear combination of symmetric density matrices. The trace condi- tion Eq. 共4兲 is also easily taken care of by imposing the restriction

i

1 n

ci⫽1 共38兲

on the expansion coefficients Tr D¯ S⫽i

1

n

ciTr DiSN

2. 共39兲

By contrast, the idempotency condition Eq. 共5兲 cannot be imposed on the averaged density matrix. However, the idem- potency may be significantly improved if, instead of working with D¯ , we work with the purified density matrix6

3D¯ SD¯2D¯ SD¯ SD¯ , 共40兲 as proposed by Nunes and Vanderbilt.7The electronic energy may be expressed in terms of the purified average density matrix as

E兲⫽2 Tr hD˜Tr D˜ G兲. 共41兲 We note that the purified density is correct to first order in the expansion coefficients ci and that E(D˜ ) thus contains errors through second order in ci. To determine the best average density matrix Eq. 共37兲, we shall minimize Eq.共41兲 with respect to the expansion coefficients ci subject to the condition Eq.共38兲.

One problem we encounter when minimizing Eq.共41兲is that new Fock matrices F(D˜ ) need to be evaluated. To avoid this problem, we shall use an approximate form of Eq.共41兲. Since the purified density matrix D˜ is close to the original density matrix D¯ , we can write it as

⫹⌬, 共42兲

where ⌬ is the correction term. Inserting Eq.共42兲 into Eq.

共41兲, we obtain

E2 Tr hD¯Tr D¯ G共D¯兲⫹2 Tr h

⫹2 Tr⌬G兲⫹Tr⌬G共⌬兲. 共43兲 Since⌬is small, we may ignore the term quadratic in⌬and arrive at the density-subspace minimization 共DSM兲 energy function

EDSMc兲⫽2 Tr hD¯Tr D¯ G兲⫹2 Tr h⌬⫹2 Tr⌬G

E兲⫹2 Tr F兲共兲. 共44兲 Since ⌬ is first order in the expansion coefficients ci, the DSM energy differs from the true energy to second and

FIG. 2. For the ninth iteration of the rhodium calculation described in Sec.

III we have displayed as a function of the level-shift parameter;共a兲the HOMO-LUMO gapai, wheremin0,bthe overlap a between the old and new density matrices, where aminis the smallest accepted overlap and cthe change in the model energyERH, the actual energyESCFRH and the predicted energy⌬ESCF

P .optis found at the minimum of⌬ESCF P ().

(7)

higher orders in ci. The first contribution to the DSM energy function may for example be evaluated using the energy ex- pression of the EDIIS algorithm,5

E兲⫽

i ciESCFDi兲⫺12

i j cicjTrFiFj兲共DiDj.

共45兲 Using Eq.共40兲, we find that the second contribution may be evaluated as

2 Tr F兲共兲⫽⫺2

i j cicjTr FiDj

⫹6

i jk cicjckTr FiDjSDk

⫺4

i jkl cicjckclTr FiDjSDkSDl. 46

All contributions to the DSM energy function are therefore easily calculated from the previous density and Fock/KS matrices.

2. The trust-region DSM minimization

We minimize the DSM energy functional by the trust- region method.12 We thus consider the second-order Taylor expansion of the DSM energy in Eq. 共44兲 about c0. Intro- ducing the step vector

ccc0, 共47兲 we obtain

E(2)DSM共⌬c兲⫽E0⫹⌬cTg12cTHc, 共48兲 where the energy, gradient, and Hessian at the expansion point are given by

E0Ec0兲, g⫽⳵Ec

c

cc

0

, H⫽⳵2Ec

c2

cc

0

. 共49兲 As starting point c0, we choose the density matrix with the lowest energy ESCF(Di), usually from the last RH iteration.

The trace condition Eq. 共38兲imply

i1 n

ci⫽0. 共50兲

We also introduce a trust region of radius h for E(2)DSM(⌬c) and require that steps are always taken inside or to the boundary of this region. To determine a step to the boundary, we restrict the step to have the length h in the S metric norm of Eq.共34兲,

储⌬c共␮兲储S

2

i jciMi jcjh2. 共51兲 Introducing the undetermined multipliers ␭ and ␮ for the trace and step-size constraints, we arrive at the following Lagrangian for minimization on the boundary of the trust region:

L共⌬c,,␭兲⫽E0⫹⌬cTg12cTHc⫺␭⌬cT1

12␮共⌬cTMch2兲, 共52兲 where 1 is a column vector with elements equal to 1. Differ- entiating this Lagrangian and setting the derivatives equal to zero, we obtain the equations

L

⳵⌬cgHc⫺␮Mc⫺␭10, 共53兲

L

⳵␭⫽⫺⌬cT1⫽0, 共54兲

L

⳵␮⫽⫺12共⌬cTMch2兲⫽0. 55 The optimization of the Lagrangian thus corresponds to the solution of the following set of linear equations:

H1TM 01

c

⫽⫺

0g

, 56

where the multiplier␮is iteratively adjusted until the step is to the boundary of the trust region Eq.共55兲. The step-length restriction may be lifted by setting␮⫽0, as needed for steps inside the trust region.

To understand the behavior of the step-length function, we consider first the generalized eigenvalue problem

H1T 01

v

M0T 0

v

, 57

where 0 is a column vector with zero elements,␧ is a small positive constant, and the eigenvector is normalized such that

vTv⫹␯2⫽1. 共58兲

We first note that, for a finite ␧, v0. Next, carrying out block multiplications in Eq.共57兲, we obtain

Hv1␯⫽␻Mv, 共59兲

1Tv⫽␻␧␯, 共60兲 which upon elimination of ␯ from the first equation yields the relation

␻␧Hv⫹共1Tv1⫽␻2Mv. 共61兲 Since (1Tv)1 is finite, we conclude that, as␧ tends to zero, the eigenvalue ␻ tends to either plus or minus infinity

⫾␧1/2. Next, substituting these values of␻ into Eq. 共60兲, we find that v tends to the zero vector with elements propor- tional to ␧1/2 and that ␯, because of the normalization Eq.

共58兲, tends to⫿1. In short, the eigenvalue problem Eq.共57兲 with ␧⫽0 has two eigenvalues ⫾⬁, whose eigenvectors have zero elements except for the last element, which is equal to⫿1. Finally, invoking the Hylleraas–Undheim inter- lace theorem,10,11we conclude that the remaining n⫺1 finite eigenvalues of Eq. 共57兲 bisects the n eigenvalues of the re- duced eigenvalue problem

Hv⫽␷Mv. 共62兲

(8)

Let us now consider the step length储⌬c()Sas a func- tion of ␮. In the diagonal representation of the augmented matrix in the linear equations Eq. 共57兲, we may write these equations in the following uncoupled form:

hi⫺␮mi兲␴i⫽⫺␥i, i1,2,3,...,n⫹1. 共63兲 Here, the hiand miare the diagonal elements of the Hessian and metric matrices, respectively, of the generalized eigen- value problem Eq.共57兲, whereas the␴iand␥i, respectively, are the corresponding elements of the solution and gradient vectors of Eq. 共56兲. Since the last element of the gradient vector in Eq. 共56兲is zero, the gradient vector has no contri- butions from the eigenvectors with infinite eigenvalues

1⫽␥n1⫽0, ␻1⫽⫺␻n1⫽⫺⬁ 共64兲 assuming that the eigenvalues are sorted in increasing order

1⬍␻2⬍¯⬍␻n1. In the diagonal representation, there- fore, we may write the step norm in the form

储⌬c共␮兲储S

i

2

n mii

2

hi⫺␮mi2. 共65兲 From this expression, we note that the step function consists of n branches separated by n⫺1 asymptotes at the finite eigenvalues ␻i. Moreover, it increases monotonically from zero to infinity as ␮ increases from minus infinity and ap- proaches the lowest finite eigenvalue␻2. Therefore, there is always one and only one ⫺⬁⬍␮⬍␻2 that gives rise to a step of length h. As shown by Fletcher,12 this value of ␮ corresponds to the global minimum on the boundary of the trust region.

In practice, we cannot easily determine the eigenvalues

i of the augmented eigenvalue problem Eq. 共57兲. Instead, we determine the eigenvalues␷i of the reduced problem Eq.

共62兲 and restrict our search of ␮ to the smaller monotonic interval ⫺⬁⬍␮⬍␷1. Since ␷1⬍␻2, it is possible that no solution exists in this reduced interval. Mostly, however, this restriction is mild since the two eigenvalues are usually close. If no solution is found, we choose instead the slightly shorter step obtained with␮⫽␷1.

To illustrate how the level-shift parameter␮in Eq.共56兲 is determined, we consider the first关Fig. 3共a兲兴and third关Fig.

3共b兲兴 DSM step in the eighth iteration of the rhodium- complex calculation in Sec. III. We have plotted the step- length function储⌬c()S as a function of␮. The plots con- sist of a series of branches between asymptotes where ␮ makes the matrix on the left-hand side of Eq.共56兲singular.

The lowest eigenvalue ␷1 is marked with a vertical dashed line in Figs. 3共a兲and 3共b兲. For minimization, the level-shift parameter is chosen in the interval ⫺⬁⬍␮⬍min(␷1,0), where ␷1 is the lowest eigenvalue of Eq. 共62兲. The proper value is found where the step-length function crosses the line representing the trust radius h, as marked with a cross in Fig.

3共a兲. If the step that minimizes E(2)DSMis inside the trust re- gion, ␮⫽0 is chosen as marked with a cross in Fig. 3共b兲. The trust region is updated during the iterative procedure.

3. Global optimization of the DSM function

The optimization of the EDSMenergy is carried out in the usual manner, requiring several trust-region steps, each of which involves the construction of the gradient g and the Hessian H, and the solution of the modified level-shifted Newton equations Eq.共56兲. After p iterations, the density is calculated from the coefficients

cpc(0)

ip1 ci. 66

However, since EDSM itself is a rather crude model of the true energy function ESCF, it resembles ESCFonly in a small region about the initial point c(0). The DSM iterations are therefore terminated when the total step length 储cpc(0)exceeds some preset value k. If a minimum of EDSMis found inside the trust region 储cpc(0)储⬍k, then the step to the minimum is taken and the iterations are terminated. This is often the case.

Occasionally, the iterations start where the lowest eigen- value of the Hessian in Eq.共62兲is negative. In the course of the iterations, the Hessian can become positive definite and a minimum is reached. In a few cases, however, a negative Hessian eigenvalue may persist, changing little from itera- tion to iteration. In our experience, a step along the eigen- vector corresponding to the negative eigenvalue cannot be trusted. This direction is therefore projected out from the step and the DSM function is minimized in the orthogonal sub- space.

As an illustration, consider the first DSM step of the tenth SCF iteration of the rhodium-complex calculation in Sec. III. In Fig. 4, we have, for comparison, plotted the step- length functions with the negative component kept and pro- jected out. The level shifts resulting from the two situations

FIG. 3. The step-length function储⌬c()储Sis plotted as a function offor the firstaand thirdbDSM step in the eighth iteration of the rhodium calculation described in Sec. III. The trust radius h is represented by a horizontal line. The propervalue is marked with a cross.

(9)

are marked with crosses in Fig. 4. The level shift used in the DSM optimization is, in this particular case,␮⫽0.

When the trust-region minimization is terminated, a new RH iteration is initiated by constructing a new density and associated Fock matrix

i

n1 ciDi, F¯i

n1 ciFDi, 67

where we have used the fact that the Fock matrix is linear in the density. By construction EDSM(c) is lowered at each it- eration of the trust-region minimization. The total energy lowering at the pth iteration is given by

EDSMEDSMcp兲⫺EDSMc(0)兲. 共68兲 Since EDSM is a local model to the true energy ESCF, the lowering of EDSMwill also lead to a lowering of ESCFpro- vided the total step is sufficiently short to be in the local region.

4. Relationship to the DIIS method

The optimal density has previously been determined us- ing the DIIS scheme of Pulay.4In the DIIS method, the im- proved density matrix is obtained as a linear combination of the previous density matrices where the expansion coeffi- cients are determined by minimizing the norm of the error vector, using the gradients of the previous iterations as error vectors. To highlight the difference between TRDSM and DIIS, we give below an alternative derivation of the DIIS algorithm.

In an SCF calculation, the electronic gradient with the averaged density matrix D¯ in Eq.共37兲may be expressed in the form,3

g兲⫽4共D¯ SF兲⫺FSD¯兲. 共69兲 To determine the best linear combination of densities Di, we minimize the norm of the squared gradient

g兲储2⫽16 Tr关D¯ SF兲⫺FSD¯2. 共70兲 Inserting the expansion Eq. 共37兲, we obtain a quartic poly- nomial in ci,

g兲储2⫽16 Tr

i cigDi兲⫹

i, j cicjDiSFDjDi

FDjDiSDi

2. 71

To simplify this expression, we neglect all cubic and quartic terms

g兲储app

2

i, j cicjgDigDj. 72

Optimization of Eq. 共72兲 subject to the constraint Eq. 共38兲 gives the DIIS expression of the expansion coefficients in Eq. 共37兲.

III. APPLICATIONS

In this section, we examine the convergence characteris- tics of the TRSCF algorithm. First, we consider a rhodium- complex optimization as an example of a difficult case; next, as a simpler case, we consider a calculation on H2O with the OH bond lengths stretched to double length. For comparison, we also give the convergence characteristics of the DIIS algorithm4 and the quadratically convergent restricted step Hartree–Fock 共QRHF兲method.13,14All calculations are car- ried out using a local version of the DALTON program package.17

A. The rhodium complex calculation

In Fig. 5, we have plotted the error in the energy at each iteration of TRSCF, DIIS, and QRHF optimizations of the rhodium complex with the geometry specified in Table I us- ing the AhlrichsVDZ basis16combined with STO-3G on Rh.

The starting orbitals have been obtained from diagonalizing the one-electron Hamiltonian.

Clearly, the QRHF and DIIS methods do not work in this case. In particular, the DIIS method is unable to handle the global part of the optimization, where the initially indefinite Hessian changes its structure and becomes positive definite.

Since the DIIS method relies solely on gradient information, it does not see the negative eigenvalues and produces steps that may or may not be in the right direction, leading to

FIG. 4. The step-length function储⌬c()储Sis plotted as a function ofwith the direction corresponding to the negative Hessian eigenvalue keptand projected out - - -, respectively. The values resulting from the two situations are marked with crosses.

FIG. 5. The convergence of calculations on the rhodium complex using AhlrichsVDZ basisRef. 16combined with STO-3G for Rh. The error in the total energy is given for the TRSCF, the standard DIIS, and the QRHF method as a function of the iteration number. Furthermore results are given where DIIS is applied after nine TRSCF iterations.

Referanser

RELATERTE DOKUMENTER

The speed of the striation patterns along an array can be related to the target speed, taking account of the target’s track with its offset and course in relation to the

A UAV will reduce the hop count for long flows, increasing the efficiency of packet forwarding, allowing for improved network throughput. On the other hand, the potential for

Sorption of Cu, Sb and Pb (%) as a function a function of the total concentration of elements in the pond with charcoal and iron hydroxide as sorbents in two

This report presented effects of cultural differences in individualism/collectivism, power distance, uncertainty avoidance, masculinity/femininity, and long term/short

Only by mirroring the potential utility of force envisioned in the perpetrator‟s strategy and matching the functions of force through which they use violence against civilians, can

As a principle, a validating agent need certificates and revocation status for the entire certificate path in order to verify a signature.. The report will now commence with

It thus seems reasonable to anticipate that the difference between the Hartree–Fock limit and the results obtained with the cc-pVDZ London basis should be less than 2% for molecules

0009-2614 r 00 r $ - see front matter q 2000 Elsevier Science B.V.. In this Letter, we propose an exponential parametrization of the one-electron AO density ma- trix, by analogy