Derivation of interatomic potentials

One of the major challenges facing anyone wishing to use forcefield methods is to determine the functional form and parameters required to calculate the energy and properties. In the field of organic chemistry and biomolecules, there are a number of well-established forcefields, such as MM3 [106], and those associated with the programs AMBER [14], and CHARMM [15], which are, in principle, capable of handling most elements typically found in these systems (C, H, O, N, S, P, etc) in their common hybridisation states. These are constructed around the molecular mechanics approach where the forcefield is connectivity driven and interactions are described in terms of two-, three-, and four-body bonding terms, plus long-range Coulomb and non-bonded interactions. While there have been several attempts to generate general forcefields that cover the entire periodic table, such as UFF [107], ESFF [108], etc, none have been completely successful. Because of the enormity of the amount of data required when spanning the whole range of elements, it is impractical to determine such a universal forcefield by empirical fitting. Instead general rules must be used to predict parameters based on extrapolations and intrinsic properties of the element that are known - for instance the electronegativity. Not surprisingly, this leads to limitations in the quality of the results. For most inorganic systems it is usually necessary to derive a forcefield for the specific material, or family of materials, of interest.

There are two means by which a forcefield can generally be derived, if we exclude rule based extrapolations. Firstly, it is possible to obtain parameters by fitting to a potential energy surface obtained from a non-empirical theoretical method. This would typically consist of results from ab initio calculations, ideally on the periodic solid [109], or perhaps on a gas phase cluster, or even better, both of the aforementioned sources. The potential energy surface can be fitted either as a sequence of geometries with their corresponding energies, or derivatives of the energy could also be included to maximise the amount of information from each higher level calculation. Many of the early forcefields for ionic materials were determined using electron gas methods [110], in which the energies of interaction between pairs of ions were determined by a density functional calculation using the overlapping ionic electron densities, where the anion is confined in an appropriate potential well. Secondly, parameters can be obtained by empirical fitting in which the normal process of using a forcefield to determine the properties of a material is inverted. This approach depends on the availability of a range of experimental data. Knowledge of the crystal structure is a definite prerequisite for this method, but is insufficient alone since information is required as to the curvature of the energy surface about the minimum. This later data may come from quantities such as elastic constants, bulk moduli, piezoelectric constants, dielectric constants, or phonon frequencies.

In order to perform a fit, first it is necessary to define a quantity that measures the quality of the results, known as the sum of squares;


where $N_{obs}$ is the number of observables, $f_{i}^{obs}$ and $f_{i}^{calc}$ are the fitted and calculated values of the observable, respectively, and $w_{i}$ is the weighting factor for the given observable. Because of the weighting factor, there are an infinite number of possible solutions all of which are equally valid. Hence, one of the most important skills is knowing how to choose appropriate and sensible weighting factors. There are several criteria that can be used for guidance though. Firstly, if fitting experimental data, the weighting factor should be inversely proportional to the uncertainty in the measured value. Obviously trusted, precise values should be given more priority than data where there are large error bars. Secondly, the weight factor should be inversely proportional to the magnitude of the observable squared. This ensures that all values are fitted on an equal footing, regardless of units. For example, fitted vibrational frequencies in wavenumbers are typically two to three orders of magnitude larger than structural variables.

The fitting process itself involves minimising the above function $F$. To do this, the default approach is similar to that used in optimisation. For many terms, the evaluation of the derivatives of the sum of squares with respect to the variables is complex, and in some of the fitting algorithms that will be discussed subsequently it is even impossible. As a result, numerical derivatives are employed during fitting since it greatly simplifies the process. Because of this, the default optimiser is to use a BFGS update of an initial on-diagonal only Hessian, obtained by finite differences, in a Newton-Raphson process. However, for particularly difficult cases, where correlation between variables is strong, there is the option to use a full Hessian matrix, again obtained by finite differences.

Now we turn to specifically consider the fitting methodology for the case of empirical data. Traditionally, the experimental structure is fitted by varying the potential parameters so as to minimise the forces at this configuration, and this is the default strategy. Other observables, such as elastic constants etc, are then calculated at the experimental structure too. When working with the shell model, either dipole or breathing shell, there is an additional complication though in that while the cores are fully specified since they are equated with the nuclei of the ions, the positions/radii of the shells are undefined. One approach is to fit with the shells positioned to be concentric with the cores. However, this is unphysical since it implies that there is no ionic polarisation, which defeats the object of including the model in the first place. A second approach might be to place the shell according to specified ion dipoles, but this information is almost impossible to come by. Only data about the total polarisation of the unit cell is typically available and a unique atomic decomposition is not possible. In order to handle this issue, the simultaneous relaxation of shells approach was introduced into GULP [111]. Here the shells are allowed to move during fitting. Formally, the most correct approach is to allow the shells to be energy minimised at every evaluation of the fitting function. However, a simpler approach has been implemented in which the shell forces are added as observables and the shell positions become fitting variables. In this way, the shells are minimised directly within the fitting procedure. In the absence of any observables other than the structure, this is exactly equivalent to minimising the shells at every step of fitting. When observables are present there will be small differences, but these are usually an acceptable price to pay for the greater ease of implementation.

There is actually a more fundamental flaw with the approach of fitting forces at the experimental geometry as a method of fitting. Often we judge the quality of a fit by the percentage error in the structural variables rather than the forces. Although lowering the forces during a fit generally improves the optimised structure with respect to experiment, this is not guaranteed to be the case (and indeed we have found examples where it is not). This can be understood by making a harmonic estimate of the atomic displacements, $x$, based on the forces, $f$:


It can be readily seen that the magnitude of the displacements also depends on the inverse of the Hessian matrix. Thus if the forces improve, but the description of the curvature about the minimum deteriorates, then the errors can potentially increase. If curvature information is included in the fit, then this can tend to reduce this problem. However, there is a further difficulty here. Formally speaking, the expressions for the elastic constants and other properties are defined about a position of zero stress and zero internal derivative. Therefore, calculating the properties at the experimental structure when the forces are non-zero leads to errors also. The solution to both of these dilemas is to use the so-called relax fitting methodology [111] in which the structure is fully optimised at every evaluation of the fitting function and the properties calculated at the optimised configuration. Obviously this is a far more expensive procedure, but does yield the most reliable results. Also there is the requirement that the initial potential set is reasonable enough to actually give a valid minimisation of the structure.

Having obtained an apparently successful fit, it is important to assess the quality of the results, since there are plenty of pitfalls and so convergence shouldn't be taken to represent a good quality solution. Firstly, the potential model should be tested for all reasonable properties, not just those used in the fit. It could easily be the case that a forcefield reproduces a high symmetry structure and, say, a single curvature observable, such as the bulk modulus. However, examination of the full elastic constant tensor, dielectric properties and phonons might reveal that the system is unstable with respect to a lowering of symmetry which wouldn't show up in the fit. Secondly, the forcefield could be transferred to a different material, not used in the original fit, to test whether the results are sensible. Finally, it is important to look at the potential parameters and assess whether the numbers are physically sensible. For instance, it is not uncommon for dispersion terms to become extremely large if allowed to fit unconstrained. While this might have improved the sum of squares for one particular system, it means all hope of transferability has been lost. Similarly, there can often be problems with fitting $A$ and $\rho$ of a Buckingham potential concurrently due to the strong correlation coefficent between the parameters.

The focus above has been primarily on empirical derivation of interatomic potentials. However, with the increasing ability to perform periodic ab initio calculations on solids an attractive alternative is to derive forcefields that attempt to reproduce the results of such methods. There are several reasons to take this approach. Firstly, by fitting the outcomes of a single Hamiltonian it is possible to guarantee that the training set is fully consistent (i.e. there are no differences in temperature, pressure, sample quality, or variable uncertainities in the observables). Secondly, data can be obtained for materials were no experimental information exists or at geometries that are significantly perturbed from the equilibrium one. Thirdly, the data from quantum mechanical methods is free of statistical mechanical effects, such as thermal vibrations and zero point motion. Hence, if the aim is to perform free energy minimisation then the interatomic potentials will represent a proper starting point for the inclusion of these quantities.

Fitting of quantum mechanical data can be performed in one of two ways, either by proceeding in the same fashion as for empirical derivation, or by use of an energy hypersurface. In the latter case, this is achieved by specifying a series of structures with their corresponding energies, and optionally first derivatives. Typically the structures would include the equilibrium configuration and as many distinct distortions about this point in order to probe as many different interatomic distances between atoms as possible. Perhaps the most difficult decision is how to weight the configurations. Unless the forcefield is able to reproduce the ab initio data very accurately, then it is usually desirable to weight the fit in favour of configurations nearer the equilibrium structure. One approach that has been taken is to use a Boltzmann factor weighting based on the energy difference to the minimum energy configuration, with an appropriate choice of temperature to the task ultimately to be performed [112]. However, there are many other possible choices too. A further issue concerns the fitting of quantum mechanical energies. Unless these values have been converted to a binding or lattice energy with reference to the dissociated state of the species within the system then it is inappropriate to fit the absolute values of the energies. Consequently, the easiest solution is to include an additive energy shift parameter in the fit, such that only relative energies are actually fitted.

Finally, the option exists within GULP to perform fitting using genetic algorithms as well as via least squares techniques. This may be potentially useful in cases where a complex system is being fitted when there is no reasonable starting approximation to the forcefield available or where there may be multiple local minima in the parameter space. However, to date we have yet to encounter a situation where this approach has proved beneficial over the more conventional methodology. This emphasizes that there is no substitute for making physically sensible choices for the functional form of the forcefield and the initial parameters.