Introduction

We recently [1, 2] introduced an extension of AM1 molecular orbital theory, [3] named AM1*, which uses d-orbitals for the elements P, S, Cl, [1] Al, Si, Ti and Zr [2] and a slight modification of Voityuk and Rösch’s AM1(d) parameters for Mo [4]. AM1* performs significantly better than AM1 for P-, S- and Cl-containing compounds but retains its advantages (good energies for hydrogen bonds, higher rotation barriers for π-systems than MNDO [5, 6] or PM3 [7–9]) for the elements H, C, N, O and F. We now report AM1* parameters for copper and zinc. These elements represent our first AM1* parameters for the first row transition metals. Both copper [10] and zinc [11] individually and in combination [12] are important in the chemistry of metalloenzymes. Because the experimental data for heats of formation of compounds of these two metals are relatively sparse and prone to errors, we have paid special attention to reproducing calculated reaction energies as closely as possible in order to produce a robust parameterization. We have thus continued the philosophy used to parameterize Al, Si, Ti and Zr [2]. Zinc has been the subject of several parameterization attempts in which only s- and p- atomic orbitals have been used [8, 13–17]. However, in AM1* we have included the filled d-shell in order to produce parameters consistent with the remaining metals of the first transition series. We also hope to avoid some of the problems associated with the inadequate description of non-valence electrons inherent in MNDO-like methods.

We also note that reaction-specific parameterizations of MNDO-like Hamiltonians have been reported, for instance for phosphorus [18] and iron [19, 20]. These local parameterizations use either Gaussian functions [19] or the treatment that we have used for AM1* [1, 2, 20] to modify the core-core terms. We anticipate that our more general AM1* parameter sets will be good starting points for such local parameterizations.

Theory

AM1* for the two new elements uses the same basic theory as outlined previously, [1, 2] with the exception that the core-core repulsion potential for the Cu-H and Zn-H interactions uses a distance-dependent term δij, rather than the constant term used for the core-core potentials of most other interactions in AM1* [1]. A distance-dependent δij was also used for the Mo-H interaction in AM1(d) [4] and for Ti-H, Zr-H and Mo-H in AM1* [2]. The core-core terms for Cu-H and Zn-H are thus:

$$ E^{\mathrm{core}}(i - j) = Z_{i} Z_{j} \rho^{0}_{ss} \left[ 1 + r_{ij}\, \delta_{ij} \exp\left( -\alpha_{ij} r_{ij} \right) \right] $$
(1)

where all terms have the same meaning as given in reference [1].
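As a minimal numerical sketch of Eq. 1 (not the implementation used in VAMP), the function below evaluates the expression exactly as written. The function name, the argument names and the sample numbers are hypothetical and serve only as an illustration; fitted values must be taken from Table 1 and reference [1], and the caller is assumed to supply a consistent set of units.

```python
import math


def core_core_energy(z_i, z_j, rho0_ss, r_ij, delta_ij, alpha_ij):
    """Evaluate Eq. 1: Z_i Z_j rho0_ss [1 + r_ij delta_ij exp(-alpha_ij r_ij)].

    All arguments are plain floats. No unit conversion is done here, so the
    caller must use one consistent unit system (an assumption of this sketch).
    """
    distance_term = r_ij * delta_ij * math.exp(-alpha_ij * r_ij)
    return z_i * z_j * rho0_ss * (1.0 + distance_term)


# Hypothetical illustrative numbers, NOT fitted AM1* parameters.
example = core_core_energy(z_i=11.0, z_j=1.0, rho0_ss=5.0,
                           r_ij=1.5, delta_ij=2.0, alpha_ij=1.0)
```

Setting delta_ij to zero recovers the bare Z_i Z_j ρ0ss term, which is a convenient sanity check when comparing against an existing code.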

The parameterization techniques were those reported in references [1] and [2] and will not be described further here.

Parameterization data

The target values used for parameterization and their sources are defined in Table S1 of the Supplementary Material. We have used both reaction energies and heats of formation as we did for the Ti, Zr parameterization [2] and have also used an extensive series of model compounds whose heats of formation we have derived from DFT calculations [21]. As before, [1, 2] we checked that experimental values for heats of formation were reasonable using DFT calculations. This strategy is designed to ensure that unusual bonding situations can be included in the parameterization dataset by including the prototype compounds and reactions, thus making a more robust and general parameterization possible.

Experimental parameterization data for zinc and copper were taken largely from the NIST Webbook, [22] but also from the MNDO/d [14, 15] and AM1-Zn [6] datasets and heat-of-reaction values from the CRC Handbook [23] and Hildebrand [24].

DFT calculations used the Gaussian 03 suite of programs [25] with the LANL2DZ basis set and standard pseudopotentials [26–29] augmented by a set of polarization functions [30] (designated LANL2DZ and LANL2DZ+pol, respectively) and the B3LYP hybrid functional [31–33]. In some cases, coupled cluster calculations with single and double excitations and a perturbational correction for triples (CCSD(T)) [34–37] with the 6-311+G(g) basis set [38–43] were used to check values for which DFT may be unreliable.

The energetic parameterization data and their sources are given in Table S1 of the Supplementary Material.

In addition to the energetic data, geometries, dipole moments and ionization potentials taken from the above sources, crystal structures from the Cambridge Structural Database (CSD) [44] were used in the parameterization to ensure that not only the energetic and electronic properties for the “prototype” compounds, but also the structures of large copper and zinc compounds are well reproduced. Note that these comparisons are subject to disagreement caused by the neglect of the effect of the crystal environment on the structures, but they nonetheless give an indication of the general applicability of the method.

Results

The optimized AM1* parameters are shown in Table 1. Most of the parameters are quite consistent along the second row so far. Geometries were optimized with the new AM1* parameterization and for AM1 and PM3 using VAMP 10.0, [45] while the PM5 calculations used LinMOPAC2.0 [46]. The two programs give essentially identical results for the Hamiltonians that are available in both.

Table 1 AM1* parameters for the elements Cu and Zn

Copper

Heats of formation

The calculated heats of formation for our training set of copper compounds are shown in Table 2. We have compared our results with the only comparable method available in its final form to date, the unpublished PM5 method implemented in Mopac [46]. Stewart has made his trial PM6 parameters available in OpenMOPAC [47] but we have not compared our results with this method because the parameterization is not yet finalized.

Table 2 Calculated AM1* and PM5 heats of formation and errors compared with our target values for the copper compounds used to parameterize AM1* (all values kcal mol−1)

AM1* performs remarkably well, even considering that the data shown in Table 2 are biased towards AM1* because it was trained on this set of compounds. The mean unsigned error of only 11.4 kcal mol−1 and RMSD of 16.8 kcal mol−1 are very respectable and suggest that AM1* has been parameterized well. The parameterization set for PM5 has not been published, but clearly does not cover the range of compounds used for AM1*.
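For readers who wish to reproduce such statistics from the tabulated values, the following sketch computes the two error measures, assuming the usual definitions of mean unsigned error and root-mean-square deviation. The function names and the sample numbers are hypothetical and are not taken from Table 2.

```python
import math


def mean_unsigned_error(calculated, target):
    """Mean of |calculated - target| over paired values."""
    errors = [c - t for c, t in zip(calculated, target)]
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_square_deviation(calculated, target):
    """Square root of the mean squared error over paired values."""
    errors = [c - t for c, t in zip(calculated, target)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical heats of formation in kcal/mol, NOT values from Table 2.
calc = [10.0, -5.0, 32.0, 48.0]
ref = [12.0, -1.0, 20.0, 50.0]
mue = mean_unsigned_error(calc, ref)
rmsd = root_mean_square_deviation(calc, ref)
```

Because the RMSD weights large errors more heavily, it can never be smaller than the MUE, so a gap between the two values indicates that a few compounds carry most of the error.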

The largest errors for AM1* are found for the “exotic” molecules CuTi and CuZr (−36.9 and −59.1 kcal mol−1, respectively), CuNH2 (−38.3 kcal mol−1), the unsymmetrically substituted molecules ClCuCH3 and HCuSH (26.2 and −22.4 kcal mol−1, respectively), Cu(NH2)2 (−25.7 kcal mol−1) and the two molecules with triple bonds, Cu2C2H2 (−42.8 kcal mol−1) and CuSCN (−30.9 kcal mol−1). The last two can be explained by AM1’s known tendency to treat triply bonded species incorrectly, as is shown by the good performance of AM1* for the reaction Cu+ + SCN → CuSCN (see Table 3 below).

Table 3 Calculated AM1* and PM5 heats of reaction and errors compared with our target values for the copper compounds used to parameterize AM1* (all values kcal mol−1)

Reaction energies

The calculated reaction energies are shown in Table 3. As expected from the parameterization procedure and also seen for Ti and Zr, [2] AM1* performs significantly better than other methods for these reactions. The largest error is only 21.2 kcal mol−1 for Cu + SCH3 → CuSCH3.
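A reaction energy of this kind can be sketched as the difference between the summed heats of formation of products and reactants. The helper below is only an illustration under that assumption; its name and the numerical values are hypothetical and do not come from Table 3 or Table S1.

```python
def reaction_energy(reactants, products, heats_of_formation):
    """Return sum(products) - sum(reactants) of heats of formation.

    `reactants` and `products` map a species name to its stoichiometric
    coefficient; `heats_of_formation` maps a species name to a value.
    """
    def total(side):
        return sum(coeff * heats_of_formation[name]
                   for name, coeff in side.items())

    return total(products) - total(reactants)


# Hypothetical heats of formation in kcal/mol, NOT values from Table S1.
hof = {"Cu": 80.0, "SCH3": 30.0, "CuSCH3": 50.0}
delta_h = reaction_energy({"Cu": 1, "SCH3": 1}, {"CuSCH3": 1}, hof)
```

Errors in the individual heats of formation can cancel or add in such a difference, which is one reason why reaction energies are a useful complementary target.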

Ionization potentials and dipole moments

Table 4 shows the calculated Koopmans’ theorem ionization potentials and dipole moments for some copper compounds. AM1* performs significantly better than PM5 for ionization potentials and similarly for dipole moments. AM1* tends to underestimate ionization potentials, whereas PM5 overestimates them severely. AM1* underestimates dipole moments systematically but gives a better correlation between calculated and target values than PM5, which shows no systematic deviation but correlates poorly. Correcting AM1* for the systematic deviation leads to the equation

$$ \mu_{\mathrm{exp}} = 1.028 + 0.991\,\mu_{\mathrm{AM1*}} $$

which gives a mean unsigned error of 1.02 Debye and a standard deviation of 0.79 Debye.
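The correction can be applied as in the short sketch below. The first function uses the coefficients given above; the second shows how such a line could be obtained, assuming an ordinary least-squares fit, which the text does not state explicitly. Both function names are hypothetical.

```python
def corrected_dipole(mu_am1_star):
    """Apply mu_exp = 1.028 + 0.991 * mu_AM1* (all values in Debye)."""
    return 1.028 + 0.991 * mu_am1_star


def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Assumption of this sketch: the published coefficients came from a fit
    of this kind. Returns (intercept, slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical AM1* dipole moment of 2.0 Debye, used only as an illustration.
example = corrected_dipole(2.0)
```

Since the slope is very close to one, the correction amounts essentially to a constant shift of about 1 Debye.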

Table 4 Calculated AM1* and PM5 Koopmans’ theorem ionization potentials and dipole moments for copper-containing compounds

Geometries

Table 5 shows a comparison of the AM1* and PM5 results for bond lengths and angles of small copper compounds. AM1* performs slightly better than PM5 for bond lengths and bond angles. The mean unsigned errors of 0.11 Å and 9.2° for bond lengths and angles, respectively, are acceptable. AM1* tends to overestimate bond lengths to copper (by 0.04 Å), whereas PM5 tends to make them too short.

Table 5 Calculated AM1* and PM5 bond lengths and angles at the metal for copper-containing compounds

Table 6 shows a comparison of the structures optimized for a series of copper compounds (including ion pairs) with the crystal structures taken from the Cambridge Structural Database [44]. The RMSD values were calculated using Quatfit [48] to overlay all the non-hydrogen atoms. A visual comparison of the crystal and AM1*-optimized structures is given in Table S2 of the Supplementary Material.
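The kind of overlay RMSD reported in Table 6 can be sketched as follows. Note that this example uses the Kabsch algorithm (singular value decomposition) with NumPy rather than the quaternion-based fit of Quatfit; both solve the same least-squares superposition problem. The function name is hypothetical, and the two coordinate sets are assumed to list the same non-hydrogen atoms in the same order.

```python
import numpy as np


def superposed_rmsd(coords_a, coords_b):
    """RMSD between two (n, 3) coordinate sets after optimal rigid overlay."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    a = a - a.mean(axis=0)              # remove translation of set A
    b = b - b.mean(axis=0)              # remove translation of set B
    h = a.T @ b                         # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Force a proper rotation (determinant +1) instead of a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a @ rot.T - b                # rotate A onto B and compare
    return float(np.sqrt((diff ** 2).sum() / len(a)))


# Hypothetical coordinates in Angstrom, NOT taken from any CSD entry.
crystal = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0],
                    [0.0, 1.2, 0.0], [0.0, 0.0, 1.8]])
optimized = crystal.copy()
optimized[3, 2] += 0.3                  # stretch one bond by 0.3 Angstrom
value = superposed_rmsd(crystal, optimized)
```

An RMSD computed this way measures only the shape difference, so a rigid shift or rotation of the optimized molecule relative to the crystal frame does not contribute.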

Table 6 Calculated AM1* and PM5 root-mean-square deviations from the crystal structures for a selection of copper compounds

Table S2 reveals some systematic weaknesses of AM1*. The optimized structure for entry ABETEH, for instance, shows a calculated copper coordination that is closer to square planar than the distorted tetrahedral geometry observed in the crystal structure. Eleven examples (ACITIP, ATCHU, EJEVOA, AKIYUO, ALEUCU, ALIWUN, AMPRCU, AQCBCU, ASTMEC, AVOQIL and AVOQOR) contain waters of crystallization that are either hydrogen-bonded to ligands or interact directly with the copper centre. The former are subject to the known AM1 problems with hydrogen-bond geometries, [3] whereas the latter (ACTHCU and ALIWUN) tend to bind the waters too tightly to the metal. The structures with both phosphine and cyanide ligands bound to copper (AWEMAQ and FEJMEN) optimize to geometries in which one of the nitrogens of the bipyridyl ligand is slightly dissociated and the cyanide occupies a bridging position above the Cu-P bond.

In general, however, the crystal structures, and especially the coordination at copper, are reproduced remarkably well by AM1*.

Zinc

Heats of formation

The results obtained for heats of formation of zinc compounds are shown in Table 7. The AM1* errors in heats of formation for zinc compounds are significantly lower than for the other methods, even MNDO/d, which normally performs best in our comparisons [1, 2]. The mean unsigned error between experimental and AM1*-calculated heats of formation is only 12.5 kcal mol−1 and the root mean square deviation 21.8 kcal mol−1. These values are 2–3 times smaller than those given by the other published methods except MNDO, which gives only slightly worse agreement with the experiment. The only major and consistent deviations in the AM1* results are given by compounds that contain both zinc and either titanium or zirconium, for which we have no experimental data and which are unlikely to be strongly represented in molecules calculated with AM1*, and for Zn(II) solvated by ammonia molecules. The latter error is disturbing because it implies that the all-important competition between nitrogen and oxygen coordination to Zn(II) in biological systems will not be treated adequately by AM1*. We were, however, unable to remove this error without severely worsening the calculated AM1* structures for zinc compounds in which the metal is coordinated to nitrogen ligands.

Table 7 Calculated AM1*, PM5, PM3, PM3-Zn, AM1, MNDO/d and MNDO heats of formation and errors compared with our target values for the zinc compounds used to parameterize AM1* (all values kcal mol−1)

The reaction energies calculated for zinc compounds (e.g. complexation energies of Zn2+ with water and ammonia) were used to calculate some of the heats of formation shown in Table 7 and are not listed separately.

Ionization potentials and dipole moments

The calculated Koopmans’ theorem ionization potentials and dipole moments are shown in Table 8. For the ionization potentials, PM5 performs significantly better than the other methods and MNDO significantly worse. MNDO/d shows no systematic error, whereas PM3, PM5, AM1 and MNDO show increasingly large positive mean signed errors and AM1* and PM3-Zn negative ones. The correlation coefficients (without Zn+, whose high ionization potential exerts a strong lever effect on the correlation) only vary between 0.81 (MNDO/d) and 0.93 (PM5).
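The lever effect of a single high-lying point on a correlation coefficient can be illustrated with the sketch below, which also shows the mean signed error. The function names are hypothetical and the ionization potentials are invented for illustration; they are not values from Table 8.

```python
import math


def mean_signed_error(calculated, target):
    """Mean of (calculated - target); positive means overestimation."""
    return sum(c - t for c, t in zip(calculated, target)) / len(target)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ionization potentials in eV, NOT values from Table 8.
target = [9.0, 9.5, 10.0, 10.5]
calc = [9.4, 9.1, 10.6, 10.1]
r_without = pearson_r(calc, target)
# One hypothetical high-lying point plays the role of an outlier such as Zn+.
r_with = pearson_r(calc + [18.2], target + [18.0])
```

In this invented example the scattered points alone correlate only moderately, but adding the single distant point raises the coefficient close to one, which is why such a point is best left out when methods are compared.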

Table 8 Calculated Koopmans’ theorem ionization potentials and dipole moments for zinc-containing compounds

AM1* overestimates dipole moments by about 0.5 Debye, whereas MNDO/d and MNDO underestimate them. Generally, none of the methods perform either very well or very badly for dipole moments of zinc compounds.

Geometries

Table 9 shows a comparison of the available and finalized methods for the bond lengths and angles of the compounds used for our AM1* parameterization. AM1* performs best for bond lengths to zinc (MUE = 0.068 Å) and moderately well (MUE = 3°), but not as well as PM5 (MUE = 2.2°) for bond angles with a zinc atom involved. The specific reparameterization of PM3 for zinc [17] also does well for these angles (MUE = 3.4°), whereas AM1, MNDO and MNDO/d are slightly worse (MUE 5–7°) and PM3 significantly so. Apart from PM5, which has a MUE of 0.19 Å for the bond lengths to zinc in this set of compounds, all other methods give MUEs between 0.11 and 0.15 Å. Thus, AM1* performs best for its own training set, which is not surprising. The data shown in Table 9 are best regarded as an indication of how general the parameterizations are, rather than an absolute measure of the quality of the parameterization. Thus, although AM1* is best for this set of compounds, there may be local compound types for which any of the other methods may perform better.

Table 9 Calculated AM1*, PM3, PM3-Zn, PM5, AM1, MNDO/d and MNDO bond lengths and angles at the metal for zinc-containing compounds

Table 10 shows a comparison of the results obtained by optimizing some zinc-containing structures from the Cambridge Structural Database as isolated molecules or complexes using the available methods. The same reservations about using such comparisons apply as for the corresponding copper cases, but AM1* and the zinc-specific PM3 parameterization (PM3-Zn) give the best results. The AM1* results are shown graphically in Table S3 of the Supplementary Material.

Table 10 Calculated AM1*, PM5, AM1, PM3, PM3-Zn, MNDO and MNDO/d root-mean-square deviations from the crystal structures for a selection of zinc compounds

As for copper, some of the compounds that show large deviations (ALOPAS, FIPRUR) involve waters moving during the optimization, whereas others (DOLVAB01, DTBZZN10, EXAPYZ, HEPBOT, HOCZEE, IPEHOA01, IZUPAU, MIXZEY, NAGMUE) reveal the difficulties that AM1* has in reproducing the coordination of sulfur ligands to zinc. However, some of these effects may be due to missing intermolecular interactions in the calculations. Otherwise, the structures are reproduced remarkably well by AM1*, despite some difficulty in distinguishing between square planar and tetrahedral coordination and deviations caused by missing dispersion between monomers in a dimeric structure (DONWAE).

Discussion

As for our parameterization of Ti and Zr for AM1* [2], we have extended the range of the parameterization dataset by including results from DFT and ab initio calculations. Our aim thereby is to produce a parameter set that is more robust and generally applicable than those trained only on experimental data. The results for our training dataset suggest that this objective has been achieved, although we warn against drawing too many conclusions from comparisons between computational methods that are based on training compounds used for only one of the methods. Nevertheless, our data suggest that the AM1* parameterizations for copper and zinc give surprisingly good energetic results and, for the “electronic” properties (ionization potential and dipole moment), are acceptable and comparable to the other methods available.

Using 12 valence electrons for zinc apparently gives better results than those obtained by the other methods, which use only two valence electrons, although this may simply be the effect of comparing the other methods with our own training set. In any case, using 12 valence electrons allows us to produce consistent parameterizations across the first transition-metal series, at the cost of making AM1* slightly slower for zinc than the other methods. Note that none of the other methods uses either occupied or virtual d-orbitals for zinc.

As with all semiempirical MO-techniques reported so far, we expect that there will be cases in which AM1* gives large deviations from experiment or higher-level calculations. However, by using a diverse training set that includes data obtained from DFT and ab initio calculations, we have attempted to make the parameterization as robust and generally applicable as possible. This is apparently possible without sacrificing accuracy for the more commonplace compounds for which reliable experimental data are available.