Accelerating seminumerical Fock-exchange calculations using mixed single- and double-precision arithmetic
H. Laqua, J. Kussmann, C. Ochsenfeld
Journal of Chemical Physics 154 (21), 214116 (2021).
We investigate the applicability of single-precision (fp32) floating-point operations within our linear-scaling seminumerical exchange method sn-LinK [Laqua et al., J. Chem. Theory Comput. 16, 1456 (2020)] and find that the vast majority of the three-center one-electron (3c1e) integrals can be computed with reduced numerical precision with virtually no loss in overall accuracy. This nearly doubles the performance on central processing units (CPUs) compared to pure fp64 evaluation. On graphics processing units (GPUs), evaluating the 3c1e integrals accounts for a smaller share of the cost than on CPUs, so accelerating these integrals alone yields smaller gains. Therefore, we also investigate the possibility of employing only fp32 operations to evaluate the exchange matrix within the self-consistent field (SCF) procedure, followed by an accurate one-shot evaluation of the exchange energy using mixed fp32/fp64 precision. This still provides very accurate results (maximal error of 1.8 µEh) while yielding a sevenfold speedup on a typical “gaming” GPU (GTX 1080Ti). We also propose the use of incremental exchange builds to further reduce these errors. The proposed SCF scheme (i-sn-LinK) requires only one mixed-precision exchange-matrix calculation, while all other exchange-matrix builds are performed with only fp32 operations. Compared to pure fp64 evaluation, this leads to 4–7× speedups for the whole SCF procedure without any significant deterioration of the results or the convergence behavior.
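The following minimal sketch illustrates why incremental builds help. It is our own toy model, not the sn-LinK code. The exchange build is treated as a generic linear map K(P), and fp32 arithmetic is emulated in pure Python by rounding through `struct`. Because K is linear, K(P_new) = K(P_old) + K(ΔP). The fp32 rounding error of an incremental build therefore scales with the small density change ΔP rather than with P itself. The matrix `A`, the sizes, and all function names are hypothetical. To isolate this effect, the stored K(P_old) is taken from an fp64 evaluation, which simplifies the paper's scheme.

```python
import random
import struct

def fp32(x: float) -> float:
    """Round a Python float (IEEE-754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def apply_fp64(A, p):
    """Toy linear 'exchange build' K = A p, evaluated entirely in fp64."""
    return [sum(a * x for a, x in zip(row, p)) for row in A]

def apply_fp32(A, p):
    """Same map emulating fp32: inputs, products, and partial sums are rounded to fp32."""
    out = []
    for row in A:
        acc = 0.0
        for a, x in zip(row, p):
            acc = fp32(acc + fp32(fp32(a) * fp32(x)))
        out.append(acc)
    return out

def max_abs_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

random.seed(0)
n = 50
A = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]  # hypothetical operator
p_old = [random.uniform(-1.0, 1.0) for _ in range(n)]
dp = [1e-4 * random.uniform(-1.0, 1.0) for _ in range(n)]  # small late-SCF density change
p_new = [a + b for a, b in zip(p_old, dp)]

k_ref = apply_fp64(A, p_new)      # fp64 reference
k_full32 = apply_fp32(A, p_new)   # full rebuild in fp32
k_old = apply_fp64(A, p_old)      # stored result of an earlier build (fp64 here, a simplification)
k_incr = [a + b for a, b in zip(k_old, apply_fp32(A, dp))]  # K(P_old) + K_fp32(dP)

err_full = max_abs_diff(k_full32, k_ref)
err_incr = max_abs_diff(k_incr, k_ref)
print(f"max error, full fp32 build: {err_full:.1e}; incremental build: {err_incr:.1e}")
```

In late SCF iterations ΔP becomes small, so the fp32 error of the increment shrinks with it. This is only the qualitative argument for incremental exchange builds; the paper's actual error control may differ.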
Highly Efficient Resolution-of-Identity Density Functional Theory Calculations on Central and Graphics Processing Units
J. Kussmann, H. Laqua, C. Ochsenfeld
Journal of Chemical Theory and Computation 17, 1512-1521 (2021).
We present an efficient method to evaluate Coulomb potential matrices using the resolution-of-identity (RI) approximation, as well as semilocal exchange-correlation potentials, on central (CPU) and graphics processing units (GPU). The new GPU-based RI algorithm shows high performance and retains the same favorable scaling with increasing basis set size as the conventional CPU-based method. Furthermore, our method is based on the J-engine algorithm [White, C. A.; Head-Gordon, M. J. Chem. Phys. 1996, 104, 2620], which allows for further optimizations that also significantly improve the corresponding CPU-based algorithm. Owing to the increased performance of the Coulomb evaluation, the calculation of the exchange-correlation potential of density functional theory on CPUs quickly becomes the bottleneck of the overall computation. Hence, we also present a GPU-based algorithm to evaluate the exchange-correlation terms, which results in an overall high-performance method for density functional calculations. The algorithms to evaluate the potential and nuclear-derivative terms are discussed, and their performance on CPUs and GPUs is demonstrated for illustrative calculations.
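For reference, the textbook RI expression for the Coulomb matrix with a Coulomb metric is sketched below. Here D is the density matrix, μ, ν, λ, σ label basis functions, and P, Q label auxiliary functions. This notation is ours, and the J-engine-based evaluation of the integrals described in the paper is not reproduced.

```latex
J_{\mu\nu} \approx \sum_{P} (\mu\nu|P)\, c_P ,
\qquad
\sum_{Q} (P|Q)\, c_Q = \sum_{\lambda\sigma} (P|\lambda\sigma)\, D_{\lambda\sigma}
```

A common design choice is to solve the linear system for the fitting coefficients c_P rather than form the inverse of the metric (P|Q) explicitly.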
A scaled explicitly correlated F12 correction to second-order Møller-Plesset perturbation theory
L. Urban, T.H. Thompson, C. Ochsenfeld
Journal of Chemical Physics 154 (4), 044101 (2021).
An empirically scaled version of the explicitly correlated F12 correction to second-order Møller-Plesset perturbation theory (MP2-F12) is introduced. The scaling eliminates the need for many of the most costly terms of the F12 correction while reproducing the unscaled explicitly correlated F12 interaction energy correction to a high degree of accuracy. The method requires a single, basis-set-dependent scaling factor that is determined by fitting to a set of test molecules. We present factors for the cc-pVXZ-F12 (X = D, T, Q) basis set family, obtained by minimizing the deviations in the interaction energies of the S66 set of small- to medium-sized molecular complexes, and show that our new method can be applied to accurately describe a wide range of systems. Remarkably good explicitly correlated corrections to the interaction energy are obtained for the S22 and L7 test sets. For the double-zeta basis, the mean percentage errors are 0.60% for the F12 correction to the interaction energy, 0.05% for the total electron correlation interaction energy, and 0.03% for the total interaction energy. Additionally, the mean interaction energy errors introduced by our new approach are below 0.01 kcal mol⁻¹ for each test set and are thus negligible for methods based on second-order perturbation theory. The efficiency of the new method compared to the unscaled F12 correction is shown for all considered systems, with distinct speedups for medium- to large-sized structures.
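The abstract does not state the fitting formula. Suppose a single factor s is fitted by ordinary least squares to map a cheaper correction x_i onto a reference y_i. The optimum then has the closed form s = Σ x_i y_i / Σ x_i², as in the minimal sketch below. The function names and toy numbers are hypothetical and are not data from the paper.

```python
def fit_scale_factor(approx, reference):
    """Least-squares s minimizing sum_i (s * approx_i - reference_i) ** 2."""
    if not approx or len(approx) != len(reference):
        raise ValueError("need two non-empty sequences of equal length")
    num = sum(x * y for x, y in zip(approx, reference))
    den = sum(x * x for x in approx)
    return num / den

def mean_abs_error(s, approx, reference):
    """Mean absolute deviation of the scaled values from the reference."""
    return sum(abs(s * x - y) for x, y in zip(approx, reference)) / len(approx)

# Hypothetical toy corrections (kcal/mol); the reference is exactly 1.25 times the approximation.
approx = [-0.40, -0.12, -0.88, -0.25]
reference = [1.25 * x for x in approx]

s = fit_scale_factor(approx, reference)
mae = mean_abs_error(s, approx, reference)
print(f"s = {s:.6f}, MAE = {mae:.1e} kcal/mol")
```

With real data, the residual MAE would indicate how well one factor captures the omitted terms for a given basis set.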
Low-Scaling Tensor Hypercontraction in the Cholesky Molecular Orbital Basis Applied to Second-Order Møller-Plesset Perturbation Theory
F.H. Bangerter, M. Glasbrenner, C. Ochsenfeld
Journal of Chemical Theory and Computation 17 (1), 211-221 (2021).
We employ various reduced-scaling techniques to accelerate the recently developed least-squares tensor hypercontraction (LS-THC) approximation [Parrish, R. M.; Hohenstein, E. G.; Martinez, T. J.; Sherrill, C. D. J. Chem. Phys. 137, 224106 (2012)] for electron repulsion integrals (ERIs) and apply it to second-order Møller-Plesset perturbation theory (MP2). The grid-projected ERI tensors are efficiently constructed in a localized Cholesky molecular orbital basis from density-fitted integrals with an attenuated Coulomb metric. Additionally, rigorous integral screening and the natural blocking matrix format are applied to reduce the complexity of this step. We recast the equations that determine Z, the quantized representation of the 1/r operator, as a system of linear equations. This removes the bottleneck of inverting the grid metric via pseudoinversion. The result is a reduced-scaling THC algorithm, and its application to MP2 yields the (sub)quadratically scaling THC-ω-RI-CDD-SOS-MP2 method. The efficiency of this method is assessed for various systems, including DNA fragments with over 8000 basis functions, and its subquadratic scaling is demonstrated.
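The generic LS-THC factorization and its least-squares condition are sketched below in conventional notation, which is our assumption: occupied indices i, j, virtual indices a, b, grid points P, Q, and collocation matrices X. In the paper, the localized Cholesky orbitals and the attenuated metric replace the canonical quantities, and the exact linear system solved there may differ.

```latex
\begin{aligned}
(ia|jb) &\approx \sum_{PQ} X_{iP} X_{aP}\, Z_{PQ}\, X_{jQ} X_{bQ}, \\
S_{PQ} &= \Big(\sum_{i} X_{iP} X_{iQ}\Big)\Big(\sum_{a} X_{aP} X_{aQ}\Big),
\qquad
E_{PQ} = \sum_{iajb} X_{iP} X_{aP}\,(ia|jb)\,X_{jQ} X_{bQ}, \\
\mathbf{S}\,\mathbf{Z}\,\mathbf{S} &= \mathbf{E}
\;\Longrightarrow\;
\mathbf{S}\,\mathbf{Y} = \mathbf{E}, \quad
\mathbf{S}\,\mathbf{Z}^{\mathsf{T}} = \mathbf{Y}^{\mathsf{T}}
\quad (\mathbf{Y} = \mathbf{Z}\,\mathbf{S})
\end{aligned}
```

Because S is symmetric, the two linear solves replace forming the pseudoinverse of S in Z = S⁻¹ E S⁻¹.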
A range-separated generalized Kohn-Sham method including a long-range nonlocal random phase approximation correlation potential
D. Graf, C. Ochsenfeld
Journal of Chemical Physics 153 (24), 244118 (2020).
Based on our recently published range-separated random phase approximation (RPA) functional [Kreppel et al., "Range-separated density-functional theory in combination with the random phase approximation: An accuracy benchmark," J. Chem. Theory Comput. 16, 2985-2994 (2020)], we introduce self-consistent minimization with respect to the one-particle density matrix. In contrast to the range-separated RPA methods presented so far, the new method includes a long-range nonlocal RPA correlation potential in the orbital optimization process, making it a full-featured variational generalized Kohn-Sham (GKS) method. The investigated test cases cover general main group thermochemistry, kinetics, and noncovalent interactions. For these, the new method improves upon all other tested RPA schemes, including the standard post-GKS range-separated RPA. Moreover, using the eigenvalue spectra obtained from the GKS Hamiltonian, it significantly outperforms the popular G0W0 method in estimating the ionization potentials and fundamental gaps considered in this work.
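For orientation, the standard direct-RPA correlation energy in its adiabatic-connection fluctuation-dissipation form is shown below. Here χ₀ is the noninteracting response function, v the Coulomb kernel, and ω an imaginary-frequency integration variable, not the range-separation parameter. In range-separated RPA, v is replaced by its long-range part. This is the textbook expression, not the paper's working equations for the self-consistent potential.

```latex
E_{\mathrm{c}}^{\mathrm{RPA}} = \frac{1}{2\pi} \int_{0}^{\infty} \mathrm{d}\omega\,
\operatorname{Tr}\!\left[ \ln\!\big(1 - \chi_0(i\omega)\, v\big) + \chi_0(i\omega)\, v \right]
```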
Efficient Reduced-Scaling Second-Order Møller-Plesset Perturbation Theory with Cholesky-Decomposed Densities and an Attenuated Coulomb Metric
M. Glasbrenner, D. Graf, C. Ochsenfeld
Journal of Chemical Theory and Computation 16 (11), 6856-6868 (2020).
We present a novel, highly efficient method for computing second-order Møller-Plesset perturbation theory (MP2) correlation energies. It uses the resolution-of-the-identity (RI) approximation and local molecular orbitals obtained from a Cholesky decomposition of pseudodensity matrices (CDD), as in the RI-CDD-MP2 method developed previously in our group [Maurer, S. A.; Clin, L.; Ochsenfeld, C. J. Chem. Phys. 2014, 140, 224112]. In addition, we introduce an attenuated Coulomb metric and redesign the RI-CDD-MP2 method to exploit the resulting sparsity in the three-center integrals. Coulomb and exchange energy contributions are computed separately using specialized algorithms. A simple yet effective integral screening protocol based on Schwarz estimates is used for the MP2 exchange energy. The Coulomb energy computation and the preceding transformations of the three-center integrals are accelerated using a modified version of the natural blocking approach [Jung, Y.; Head-Gordon, M. Phys. Chem. Chem. Phys. 2006, 8, 2831-2840]. Test calculations demonstrate effective subquadratic scaling with a low prefactor for a wide range of molecule sizes. The method is shown to enable cost-efficient MP2 calculations on large molecular systems with several thousand basis functions.
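In CDD-based methods, a (pseudo)density matrix is factorized by a pivoted Cholesky decomposition, and the resulting Cholesky vectors serve as localized orbital coefficients. The minimal pure-Python sketch below shows the factorization for any positive semidefinite matrix. The function name, the tolerance, and the toy rank-2 coefficients are hypothetical. A production code would instead call an optimized routine such as LAPACK's pivoted Cholesky (`dpstrf`).

```python
import math

def pivoted_cholesky(M, tol=1e-10):
    """Pivoted Cholesky decomposition of a symmetric positive semidefinite matrix.

    Returns a list of column vectors L_k such that M ~= sum_k L_k L_k^T. The loop
    stops once the largest remaining diagonal element drops below tol, so the
    number of columns equals the numerical rank of M.
    """
    n = len(M)
    diag = [M[i][i] for i in range(n)]  # diagonal of the residual matrix
    cols = []
    while len(cols) < n:
        p = max(range(n), key=lambda i: diag[i])  # pivot: largest residual diagonal
        if diag[p] < tol:
            break
        root = math.sqrt(diag[p])
        col = [(M[i][p] - sum(c[i] * c[p] for c in cols)) / root for i in range(n)]
        cols.append(col)
        for i in range(n):
            diag[i] -= col[i] * col[i]
    return cols

# Hypothetical occupied coefficients (6 basis functions, 2 orbitals) and the
# rank-2 density-like matrix D = C C^T built from them.
C = [[1.0, 0.2], [0.5, -0.3], [0.0, 0.8], [0.3, 0.1], [-0.2, 0.4], [0.1, 0.0]]
D = [[sum(a * b for a, b in zip(ri, rj)) for rj in C] for ri in C]

L = pivoted_cholesky(D)
recon_err = max(
    abs(D[i][j] - sum(c[i] * c[j] for c in L)) for i in range(6) for j in range(6)
)
print(f"Cholesky vectors: {len(L)}, max reconstruction error: {recon_err:.1e}")
```

Pivoting on the largest residual diagonal element makes the decomposition stop at the numerical rank. Here that rank equals the number of occupied orbitals.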
Range-Separated Density-Functional Theory in Combination with the Random Phase Approximation: An Accuracy Benchmark
A. Kreppel, D. Graf, H. Laqua, C. Ochsenfeld
Journal of Chemical Theory and Computation 16 (5), 2985-2994 (2020).
A formulation of range-separated random phase approximation (RPA) based on our efficient ω-CDGD-RI-RPA method [J. Chem. Theory Comput. 2018, 14, 2505] and a large-scale benchmark study are presented. By application to the GMTKN55 data set, we obtain a comprehensive picture of the performance of range-separated RPA in general main group thermochemistry, kinetics, and noncovalent interactions. The results show that range-separated RPA performs stably over the broad range of molecular chemistry included in the GMTKN55 set. It improves significantly over semilocal DFT, but it is still less accurate than modern dispersion-corrected double-hybrid functionals. Furthermore, range-separated RPA shows faster basis set convergence than standard full-range RPA, making it a promising, broadly applicable approach with only one empirical parameter.
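Range-separated schemes commonly split the Coulomb operator with the error function, 1/r = erfc(ωr)/r + erf(ωr)/r. The short-range part goes to a semilocal functional and the long-range part to RPA. The sketch below assumes this standard erf-based split. The value ω = 0.5 bohr⁻¹ and the function names are purely illustrative and are not taken from the paper.

```python
import math

def coulomb_short_range(r: float, omega: float) -> float:
    """Short-range part erfc(omega*r)/r of the Coulomb operator 1/r (requires r > 0)."""
    if r <= 0.0:
        raise ValueError("the short-range part diverges at r = 0")
    return math.erfc(omega * r) / r

def coulomb_long_range(r: float, omega: float) -> float:
    """Long-range part erf(omega*r)/r, finite at r = 0 with limit 2*omega/sqrt(pi)."""
    if r == 0.0:
        return 2.0 * omega / math.sqrt(math.pi)
    return math.erf(omega * r) / r

omega = 0.5  # illustrative value in 1/bohr, not taken from the paper
for r in (0.1, 1.0, 5.0, 20.0):
    sr = coulomb_short_range(r, omega)
    lr = coulomb_long_range(r, omega)
    # The two parts always add back up to the full Coulomb interaction 1/r.
    print(f"r = {r:5.1f}  short = {sr:.3e}  long = {lr:.3e}  1/r = {1.0 / r:.3e}")
```

The long-range part stays finite at r = 0, whereas the short-range part decays quickly. This split lets each part be treated by the method best suited to it.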
Highly Efficient, Linear-Scaling Seminumerical Exact-Exchange Method for Graphic Processing Units
H. Laqua, T.H. Thompson, J. Kussmann, C. Ochsenfeld
Journal of Chemical Theory and Computation 16 (3), 1456-1468 (2020).
We present a highly efficient, asymptotically linear-scaling seminumerical exact-exchange method accelerated by graphics processing units (snLinK). We go beyond our previous central processing unit-based method (Laqua, H.; Kussmann, J.; Ochsenfeld, C. J. Chem. Theory Comput. 2018, 14, 3451-3458) by employing our recently developed integral bounds (Thompson, T. H.; Ochsenfeld, C. J. Chem. Phys. 2019, 150, 044101) and high-accuracy numerical integration grid (Laqua, H.; Kussmann, J.; Ochsenfeld, C. J. Chem. Phys. 2018, 149, 204111). The accuracy is assessed for several established test sets, yielding errors significantly below 1 mEh for the smallest grid. Moreover, a comprehensive performance analysis for large molecules with 62 to 1347 atoms is provided. It reveals the outstanding performance of our method, in particular for large basis sets such as the polarized quadruple-zeta level with diffuse functions.
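The general form of seminumerical exchange is sketched below, up to the usual prefactor for the exchange contribution. One electron coordinate is integrated on a grid with points r_g and weights w_g, while the other is treated analytically. X collects basis-function values on the grid, and D is the (symmetric) density matrix. The analytic three-center one-electron (3c1e) integrals A are a major part of the cost. The notation is ours and omits the screening and grid details of snLinK.

```latex
\begin{aligned}
K_{\mu\nu} &\approx \sum_{g} w_g\, X_{\mu g} \sum_{\lambda} A_{\nu\lambda}(\mathbf{r}_g)\, F_{\lambda g},
\qquad
F_{\lambda g} = \sum_{\sigma} D_{\lambda\sigma} X_{\sigma g},
\qquad
X_{\mu g} = \chi_\mu(\mathbf{r}_g), \\
A_{\nu\lambda}(\mathbf{r}_g) &= \int \frac{\chi_\nu(\mathbf{r}')\, \chi_\lambda(\mathbf{r}')}{|\mathbf{r}_g - \mathbf{r}'|}\, \mathrm{d}\mathbf{r}'
\end{aligned}
```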
A Schwarz inequality for complex basis function methods in non-Hermitian quantum chemistry
T.H. Thompson, C. Ochsenfeld, T.C. Jagau
Journal of Chemical Physics 151 (18), 184104 (2019).
A generalization of the Schwarz bound, which is employed to reduce the scaling of quantum-chemical calculations, is introduced for non-Hermitian methods employing complex-scaled basis functions. Non-Hermitian methods treat molecular metastable states in terms of L²-integrable wave functions with complex energies. Until now, however, an efficient upper bound for the resulting electron-repulsion integrals has been unavailable because of the complications arising from non-Hermiticity. Our newly formulated bound allows us to estimate the sparsity of the complex-scaled two-electron integral tensor inexpensively and rigorously, providing the basis for efficient integral screening procedures. We have incorporated a screening algorithm based on the new Schwarz bound into the state-of-the-art complex basis function integral code by White, Head-Gordon, and McCurdy [J. Chem. Phys. 142, 054103 (2015)]. The effectiveness of the screening is demonstrated through non-Hermitian Hartree-Fock calculations of the static-field ionization of the 2-pyridoxine 2-aminopyridine molecular complex.
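For comparison, the conventional (real, Hermitian) Schwarz screening bounds each integral by |(ab|cd)| ≤ q_ab q_cd, where q_ab = sqrt((ab|ab)). The minimal sketch below is not the generalized complex-scaled bound from the paper, which it does not reproduce. The function name, the precomputed pair bounds, and the threshold are hypothetical.

```python
def significant_pairs(q, threshold):
    """Conventional Schwarz screening over unique (bra, ket) pair combinations.

    q[k] is assumed to hold the precomputed bound sqrt((k|k)) for the k-th
    basis-function (or shell) pair; a bra-ket combination (a, b) is kept only
    if its estimate q[a] * q[b] reaches the threshold.
    """
    # Visit pairs in order of decreasing bound so the inner loop can stop early.
    order = sorted(range(len(q)), key=lambda k: q[k], reverse=True)
    kept = []
    for pos, a in enumerate(order):
        for b in order[pos:]:
            if q[a] * q[b] < threshold:
                break  # every later b has an even smaller bound
            kept.append((a, b))
    return kept

# Hypothetical pair bounds and threshold, for illustration only.
q = [1.0, 1e-3, 1e-6, 0.5]
kept = significant_pairs(q, threshold=1e-8)
print(f"{len(kept)} of {len(q) * (len(q) + 1) // 2} pair combinations kept: {kept}")
```

Sorting the pair bounds once lets the inner loop terminate at the first negligible estimate. This keeps the screening cost proportional to the number of significant combinations.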
Low-Scaling Self-Consistent Minimization of a Density Matrix Based Random Phase Approximation Method in the Atomic Orbital Space
D. Graf, M. Beuerle, C. Ochsenfeld
Journal of Chemical Theory and Computation 15 (8), 4468-4477 (2019).
An efficient minimization of the random phase approximation (RPA) energy with respect to the one-particle density matrix in the atomic orbital space is presented. Imposing full self-consistency on functionals that depend on the potential itself is problematic. We bypass this by approximating the RPA Hamiltonian on the basis of the well-known Hartree-Fock Hamiltonian, which makes our self-consistent RPA method completely parameter-free. It is shown that the new method not only outperforms post-Kohn-Sham RPA in describing noncovalent interactions but also gives accurate dipole moments, demonstrating the high quality of the calculated densities. The main drawback of atomic-orbital-based methods is their increased prefactor compared to their canonical counterparts. We overcome it by introducing Cholesky-decomposed projectors, which allow the use of large basis sets. Exploiting the locality of atomic and/or Cholesky orbitals enables us to present a self-consistent RPA method with asymptotically quadratic scaling, opening the door to calculations on large molecular systems.