Introduction

Phase equilibria (forward) modeling (PEFM; “forward” to distinguish it from the multiequilibrium approach, which is an inverse modeling method; see Powell and Holland 2008) has become an essential tool to unravel the evolution of metamorphic rocks (see reviews by Holland and Powell 2011 and Lanari and Duesterhoeft 2019). It has been used to interpret the evolution of migmatites (Indares et al. 2008; White and Powell 2002; White et al. 2005; Yakymchuk and Brown 2014); to reconstruct pressure–temperature (P–T) paths that were then linked to U–Th–Pb geochronology of accessory phases to build P–T–time paths used to test regional-scale tectonic models (Lang and Gilotti 2015; Larson et al. 2013; Dumond et al. 2015; Gervais and Crowley 2017); and to directly predict the petrogenesis of accessory phases by incorporating their thermodynamic properties into the calculation (Spear and Pyle 2010; Spear 2010). One of the main strengths of the method is that it allows a direct comparison between observed mineral modes and compositions and those predicted by equilibrium thermodynamics for a specific bulk composition, and a recent improvement further allows this comparison to be quantified (Duesterhoeft and Lanari 2020). Currently, three main software packages are used for such calculations: Thermocalc (Powell et al. 1998), Theriak-Domino (de Capitani and Petrakakis 2010) and Perple_X (Connolly and Petrini 2002; Connolly 2005). Of the three, the latter two allow the user to choose the desired solution models.

Although solution models (or activity–composition relations) have been identified as a major source of error (White et al. 2011), studies generally provide no justification for their choice of a set of solution models (apart from the fact that they use an internally consistent database). A few studies have evaluated a given thermodynamic database (see Lanari and Duesterhoeft 2019 for a discussion of what constitutes a database) against partial melting experiments, with results broadly consistent with observations (Johnson et al. 2008; Grant 2009; White et al. 2011; García-Arias 2020), but few have compared solution models or complete databases with each other (except for Tajčmanová et al. 2009 and Tropper et al. 2002). With the release of the Thermocalc thermodynamic database, composed of a revised internally consistent dataset (Holland and Powell 2011) and a set of compatible solution (activity–composition) models (White et al. 2014), hopes were high that PEFM would achieve even better results than before. However, two studies suggested that the older database (dataset tcds55 with associated solution models) reproduces natural observations whereas the newer database does not (Guevara and Caddick 2016; Kendrick and Indares 2018), and another study indicated that the new database reproduced the paragenesis well, but not the composition of clinopyroxene and amphibole, in several mafic granulites (Forshaw et al. 2019). In the absence of rigorous tests, users cannot justify their choice of thermodynamic datasets and solution models to conduct PEFM. It is, therefore, crucial to evaluate the different sets of solution models and thermodynamic datasets available and to test whether some perform better than others for modeling different bulk compositions. Although we are aware of the advantages and pitfalls of PEFM and partial melting experiments (see White et al. 2011 and Sect. 2.3 below), we consider that comparing the two methods is one of the best ways to conduct our test; an alternative would be to use the Bingo-Antidote software described in Duesterhoeft and Lanari (2020).

This study tests how well PEFM reproduces six partial melting experiments of metapelite and aluminous greywacke using four different sets of solution models and two internally consistent thermodynamic datasets. Parameters investigated include: (1) proportion of major phases; (2) proportion of titanium oxides; (3) composition of major phases. Our results reveal significant discrepancies between the sets and identify two that outperform the others.

Methodology

Experiments

Six partial melting experiments were chosen (Fig. 1; Table 1) such that, on AFM, A’KF and A’’CF diagrams, they cover most of the range of compositions compiled in the MetPetDB database (Spear et al. 2009) for metapelitic and greywacke compositions. Experiments HQ36 (Patiño-Douce and Johnston 1991) and HP60 (Pickering and Johnston 1998) are close to the most common bulk composition of metapelites in the database, experiments MS (Patiño-Douce and Harris 1998) and NBS (Stevens et al. 1997) represent aluminous metapelites, whereas experiments NB (Stevens et al. 1997) and CEPV (Montel and Vielzeuf 1997; Vielzeuf and Montel 1994) represent metagreywackes (Fig. 1). All experiments contained MnO except HP60, and all experiments contained between 2.3 and 2.8 wt% TiO2, except MS, which contained 3.23 wt% TiO2 (Table 1). Experiment NBS consisted of 13 runs from 750 to 1000 °C at 500 MPa and from 800 to 1000 °C at 1000 MPa; experiment MS consisted of 11 runs from 750 to 900 °C at 600 MPa and from 800 to 900 °C at 1000 MPa; experiment HQ36 consisted of 19 runs from 825 to 1075 °C at 700 and 1000 MPa; experiment HP60 consisted of 6 runs from 812 to 950 °C at 1000 and from 800 to 1000 °C at 1000 MPa; experiment CEPV consisted of 24 runs from 805 to 875 °C at 300 MPa, from 809 to 898 °C at 500 MPa, from 855 to 1040 °C at 800 MPa and from 803 to 1000 °C at 1000 MPa; experiment NB consisted of 13 runs from 800 to 1000 °C at 500 MPa and from 800 to 1000 °C at 1000 MPa. Of the six experiments, only CEPV was used to calibrate solution models, namely the three Ti-biotite models investigated herein. The six studies did not report the same data, which limited possible comparisons as described in the Results section.

Fig. 1

Bulk composition of starting material for the six fluid-absent partial melting experiments investigated in this contribution: HQ36 (Patiño-Douce and Johnston, 1991); NBS and NB (Stevens et al., 1997); MS (Patiño-Douce and Harris, 1998); HP60 (Pickering and Johnston, 1998); CEPV (Vielzeuf and Montel, 1994; Montel and Vielzeuf, 1997). Density plots were constructed with GCDkit 3.0 from the composition of 129 samples extracted from the MetPetDB database (Spear et al., 2009) with the query: protolith types = metapelite or metagreywacke. AFM: A = Al2O3–K2O, F = FeO, M = MgO. A’KF: A’ = Al2O3 + Fe2O3–(K2O + Na2O + CaO), K = K2O, F = FeO + MnO + MgO. A’’CF: A’’ = Al2O3 + Fe2O3–(K2O + Na2O), C = CaO; F = FeO + MnO + MgO. See text for details
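The apex definitions in this caption can be turned into plotting coordinates with a few lines of code. The following is a minimal sketch, not the procedure used to build the figure: it assumes oxide amounts are given as molar proportions, and the function names and the bulk composition are hypothetical values chosen only for illustration.

```python
# Ternary coordinates from the apex definitions given in the Fig. 1 caption.
# Assumption: oxide amounts are molar proportions (not wt%).
# All names and numbers below are illustrative, not taken from the experiments.

def _normalize(a, b, c):
    """Scale three apex values so that they sum to 1."""
    total = a + b + c
    if total <= 0:
        raise ValueError("apex values must sum to a positive number")
    return (a / total, b / total, c / total)

def afm(ox):
    """AFM: A = Al2O3 - K2O, F = FeO, M = MgO."""
    return _normalize(ox["Al2O3"] - ox["K2O"], ox["FeO"], ox["MgO"])

def akf(ox):
    """A'KF: A' = Al2O3 + Fe2O3 - (K2O + Na2O + CaO), K = K2O, F = FeO + MnO + MgO."""
    a = ox["Al2O3"] + ox.get("Fe2O3", 0.0) - (
        ox["K2O"] + ox.get("Na2O", 0.0) + ox.get("CaO", 0.0))
    f = ox["FeO"] + ox.get("MnO", 0.0) + ox["MgO"]
    return _normalize(a, ox["K2O"], f)

def acf(ox):
    """A''CF: A'' = Al2O3 + Fe2O3 - (K2O + Na2O), C = CaO, F = FeO + MnO + MgO."""
    a = ox["Al2O3"] + ox.get("Fe2O3", 0.0) - (ox["K2O"] + ox.get("Na2O", 0.0))
    f = ox["FeO"] + ox.get("MnO", 0.0) + ox["MgO"]
    return _normalize(a, ox.get("CaO", 0.0), f)

# Hypothetical bulk composition in molar proportions.
bulk = {"Al2O3": 0.20, "K2O": 0.04, "Na2O": 0.02, "CaO": 0.01,
        "FeO": 0.08, "MgO": 0.08, "MnO": 0.0, "Fe2O3": 0.0}

print("AFM  :", afm(bulk))
print("A'KF :", akf(bulk))
print("A''CF:", acf(bulk))
```

The sketch does not handle compositions for which an apex value becomes negative (for example, Al2O3 lower than K2O); such points would plot outside the triangle and need separate treatment.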

Table 1 Key chemical characteristics of the six partial melting experiments investigated herein along with the biotite-breakdown temperatures

Phase-equilibria (forward) modeling

PEFM was conducted with the program Perple_X (version 6.7.3) in the Ti–Mn–Na–Ca–K–Fe–Mg–Al–Si–H–O system. Oxygen fugacity was incorporated in the calculations either as a buffer, by creating a new thermodynamic entity (qfm = 2 mt + 3 q - 3 fa) in the datasets and fixing O2 as a saturated component, or as the thermodynamic component O2 for the CEPV experiment, for which the Fe2O3 content was reported. The first set investigated, referred to herein as set VAR, consists of various solution models identified by the authors as performing best by trial and error. It uses the same models as set HP04 (described below) for melt, staurolite, spinel, cordierite and orthopyroxene, but includes the ternary feldspar model of Fuhrman and Lindsley (1988), the Mn-bearing garnet model of White et al. (2005), the white mica model of Smye et al. (2010), the ideal ilmenite model slightly modified to include Wilm,pnt = 2200 (Wu and Zhao 2006), and the biotite model of Tajčmanová et al. (2009). The second set, referred to herein as set TCHP04, also includes the biotite model of Tajčmanová et al. (2009) and Mn in garnet (without interaction parameters for spessartine), but all the other solution models are the same as in the third set. This third set, referred to herein as set HP04, consists of the solution models provided by White et al. (2007), which do not include MnO. The fourth set, referred to herein as set HP11, consists of the solution models provided by White et al. (2014). The first three sets use the internally consistent thermodynamic dataset of Powell et al. (1998; updated in 2004 as tcds55), whereas the last set uses the tcds62 dataset of Holland and Powell (2011).
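The buffer entity defined above can be read as a rearrangement of the quartz–fayalite–magnetite reaction. The short derivation below is a sketch that assumes the abbreviations stand for the pure end members fa = Fe2SiO4, mt = Fe3O4 and q = SiO2; the chemical-potential form is the standard equilibrium condition for that reaction and is not quoted from the datasets.

```latex
\begin{align}
3\,\mathrm{Fe_2SiO_4} + \mathrm{O_2} &= 2\,\mathrm{Fe_3O_4} + 3\,\mathrm{SiO_2} \\
\mathrm{O_2} &= 2\,\mathrm{mt} + 3\,\mathrm{q} - 3\,\mathrm{fa} \\
\mu_{\mathrm{O_2}} &= 2\,\mu_{\mathrm{mt}} + 3\,\mu_{\mathrm{q}} - 3\,\mu_{\mathrm{fa}}
\end{align}
```

Both sides of the first line contain 6 Fe, 3 Si and 14 O, so the entity has the composition of O2 while taking its thermodynamic properties from the three solids, which is what allows O2 to be fixed along the buffer at each P–T point.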

Strategy of investigation

Experiments and PEFM both have advantages and drawbacks. It is difficult to test solution models by modeling natural rocks, although the Bingo-Antidote software (Lanari and Duesterhoeft 2019; Duesterhoeft and Lanari 2020) now provides a more robust way to test the quality of a given calculation. In theory, experiments should provide “answers” toward which PEFM should tend. In practice, however, there will never be a perfect fit because both methods have pitfalls and uncertainties that are difficult to quantify. White et al. (2011) and García-Arias (2020) provided an extensive description of the various sources of error and pitfalls of each method, and we only mention some of them here. For experiments, attainment of equilibrium and the oxygen fugacity in the capsule are two important factors that are difficult to ascertain (see Douce and Beard 1994). For PEFM, some important chemical components are omitted. For example, the fluorine content of biotite significantly controls its stability, but cannot be modeled. There are also significant uncertainties in the thermodynamic formulation of each solution model (see Lanari and Duesterhoeft 2019 for an extensive discussion) that combine in PEFM, rendering quantification of uncertainties very difficult. Nevertheless, our working hypothesis is that well-formulated sets of solution models should systematically yield results closer to those of experiments.

Four sets of solution models were chosen for this study (see Table 2 for a complete description). The main research group on PEFM is responsible for the development of the Thermocalc software and the production of internally consistent thermodynamic datasets (now hosted at http://hpxeosandthermocalc.org/). Thermocalc proposes different sets of solution models depending on the bulk composition investigated. For metasediments, the proposed sets are HP04 and HP11, which accompany the older and newer versions of the internally consistent thermodynamic datasets, tcds55 (Powell et al. 1998; updated in 2004) and tcds62 (Holland and Powell 2011), respectively. Note that solution models developed for the tcds62 dataset cannot be used with the tcds55 dataset, and vice versa. Because our informal work suggested that the Ti-biotite model of Tajčmanová et al. (2009) significantly influences the results of PEFM, we decided to test this model specifically. To this end, we designed a set similar to HP04 (TCHP04), but replacing the Ti-biotite model of White et al. (2007) with that of Tajčmanová et al. (2009). The VAR set includes various solution models, including the Tajčmanová et al. (2009) biotite model, that appear to yield better results based on our informal experience and success at modeling natural rocks (Larson et al. 2013; Gervais and Crowley 2017; Perrot et al. 2020; Soucy La Roche et al. 2019). Although Thermocalc suggests excluding Mn in PEFM with the tcds55 dataset, we decided to include it in the TCHP04 set. We tried a set not including Mn, but it yielded redundant results. We therefore decided to include Mn for a more robust comparison between sets VAR, TCHP04 and HP11 in a system as close as possible to natural compositions (Ti–Mn–Ca–Na–K–Fe–Mg–Al–Si–H–O). Finally, it is worth mentioning that the large majority of data presented herein were acquired before the publication of the HGP database (Holland et al. 2018). Nevertheless, we consider that it would not significantly change our results because this database yielded results very similar to those of the HP11 set for an average metapelitic composition (Fig. 12 in Holland et al. 2018), a conclusion that we have confirmed for the MS bulk composition by PEFM (results not presented in this contribution).

Table 2 Solution models used for phase equilibria (forward) modeling

Several parameters are used herein to compare the four sets of solution models. We first calculated isochemical phase diagram sections (IPDS) for all six partial melting experiments using the four different sets of solution models, producing a total of 24 IPDS. IPDS are difficult to compare (see Figs. 2 and 3), hence we extracted data from them for each set for all experimental runs. Paragenesis, phase proportions and chemical compositions were, therefore, extracted from a total of 344 pressure–temperature data points and compared with results yielded by partial melting experiments of metasediments covering the entire field of natural compositions (Fig. 1) and P–T conditions varying between 500 and 1000 MPa and 750 to 1050 °C, depending on experimental conditions (see Sect. 2.1 above and supplementary files). The temperature at which biotite disappears is of utmost importance for testing the validity of solution models because its breakdown is the main melting reaction in metasediments (see reviews by Clemens 2006 and Brown 2010) and it influences the stability of other ferromagnesian minerals. Figure 4 compares experimental and calculated biotite-breakdown temperatures. The paragenesis of ferromagnesian minerals is another very important parameter because their proportions and chemical compositions are regularly used for isopleth thermobarometry. Figure 5a presents the number of times a given set has reproduced the Fe–Mg experimental paragenesis and Fig. 5b shows the number of times one of the Fe–Mg minerals (we included muscovite in this list) caused the failure of the calculation (either because it was predicted but not observed, or vice versa). Another important test of the effectiveness of the different sets of solution models is the comparison between predicted and experimental proportions of major phases. Proportions and Mg# of liquid, garnet and biotite are reported in the six investigated experiments (NB, NBS, MS, HQ36, HP60 and CEPV), whereas the proportion and anorthite content of plagioclase are reported in two (MS, CEPV). Only data points reproducing the experimental paragenesis of the main phases are included. The set that best reproduced the experimental proportion and composition for a given data point is shown as a larger symbol on Figs. 6 and 7, respectively, and the number of times this occurred for each set and each phase is shown on Fig. 8. Boxplots of the absolute differences between calculated and experimental proportions, abs(calc.–exp), and compositions are presented for each set on Fig. 9. Another way of comparing mineral compositions is to investigate trends yielded by experiments. The HQ36 experiment yielded compositional trends for garnet end members as well as for biotite Mg# and TiO2 (Fig. 10). Because Ti-oxide (rutile and ilmenite) parageneses constitute important clues for reconstructing the P–T evolution of metasediments, it is crucial to investigate whether PEFM is able to yield meaningful results. For this, the three experiments that yielded traceable reactions in a P–T diagram were selected (Fig. 11). Below, sets are generally compared by listing values in order of success at reproducing experimental results (from best to worst).
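The "best-reproducing set" tally described above (larger symbols on Figs. 6 and 7, counts on Fig. 8) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: the data layout, the use of None for runs whose paragenesis was not reproduced, the tie-breaking rule and all numerical values are hypothetical.

```python
from collections import Counter

# Assumption: for one experimental run, `calculated` maps each set name to the
# modeled value (vol% or Mg#), or to None when the set failed to reproduce the
# paragenesis of the main phases and is therefore excluded from the comparison.

def best_set(calculated, experimental):
    """Return the set with the smallest abs(calc - exp), or None if no set qualifies.

    Ties are resolved in favor of the first set listed (an arbitrary choice).
    """
    diffs = {name: abs(value - experimental)
             for name, value in calculated.items() if value is not None}
    if not diffs:
        return None
    return min(diffs, key=diffs.get)

def count_best(runs):
    """Count how many runs each set reproduces best."""
    tally = Counter()
    for calculated, experimental in runs:
        winner = best_set(calculated, experimental)
        if winner is not None:
            tally[winner] += 1
    return tally

# Hypothetical biotite proportions (vol%) for two runs.
runs = [
    ({"VAR": 12.0, "TCHP04": 10.0, "HP04": None, "HP11": 3.0}, 14.0),
    ({"VAR": 20.0, "TCHP04": 24.0, "HP04": 18.0, "HP11": None}, 25.0),
]
tally = count_best(runs)
print(dict(tally))
```

Counting winners in this way rewards small advantages as much as large ones, which is why it is paired with the distribution of absolute differences (Fig. 9) in the comparison.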

Fig. 2

Isochemical phase diagram sections calculated with the four sets of solution models for the pelitic bulk composition HQ36 (Patiño-Douce and Johnston 1991). Each point represents one experimental run, with white circles indicating calculations reproducing the experimental paragenesis of ferromagnesian and Ti-oxide minerals, white circles with black contour indicating calculations reproducing the experimental paragenesis of ferromagnesian, but not of Ti-oxide, minerals, and black circles indicating failure of calculations to reproduce the experimental paragenesis of both ferromagnesian and Ti-oxide minerals. Yellow, blue, and red lines show the disappearance of biotite, cordierite and garnet, respectively, with the +mineral name on the side of appearance

Fig. 3

Isochemical phase diagram sections calculated with the four sets of solution models for the greywacke bulk composition NB (Stevens et al., 1997). Each point represents one experimental run, with white circles indicating calculations reproducing the experimental paragenesis of ferromagnesian and Ti-oxide minerals, white circles with black contour indicating calculations reproducing the experimental paragenesis of ferromagnesian, but not of Ti-oxide, minerals, and black circles indicating failure of calculations to reproduce the experimental paragenesis of both ferromagnesian and Ti-oxide minerals. Yellow, blue, red and pale brown lines show the disappearance of biotite, cordierite, garnet and orthopyroxene, respectively, with the +mineral name on the side of appearance

Fig. 4

Calculated vs experimental temperatures at given pressures coincident with the disappearance of biotite for the four sets of solution models. VAR, TCHP04, HP04 and HP11 are the names given to the four sets of solution models investigated in this contribution and described in Table 2

Fig. 5

a Frequency plot of calculations that reproduce the paragenesis of ferromagnesian minerals (including muscovite) by the four sets of solution models for all runs of each experiment investigated in this study. For reference, grey bars behind the colored histograms represent the total number of data points calculated for each experiment. b Frequency with which a given mineral was not reproduced correctly in an experimental run (either because it is predicted in a calculation but was not observed in the experimental run, or vice versa)

Fig. 6

Calculated vs experimental proportion for biotite (a), garnet (b), liquid (c) and plagioclase (d) expressed in volume%. Each data point compares the result of a calculation against that of the corresponding experimental run. Solution model sets are color-coded, whereas experiments are represented by symbols. The set that best reproduces the experimental result in a given experimental run is shown as a larger symbol

Fig. 7

Calculated vs experimental Mg# (100*Mg/(Mg + Fe), molar) for biotite (a), garnet (b) and liquid (c), and 100*XAn of plagioclase (d). Each data point compares the result of a calculation against that of the corresponding experimental run. Solution model sets are color-coded, whereas experiments are represented by symbols. The set that best reproduces the experimental result in a given experimental run is shown as a larger symbol

Fig. 8

Frequency plot showing the number of experimental runs for which each set of solution models best reproduces each phase proportion and composition (Mg# or XAn). These best-reproducing calculations are shown as larger symbols on Figs. 6 and 7

Fig. 9

Boxplots compiling absolute differences between calculations and experiments for all data points (i.e., for all experiments) for biotite (a), garnet (b), liquid (c) and plagioclase (d) proportions (in vol.%), as well as for biotite (e), garnet (f) and liquid (g) 100*Mg# and plagioclase 100*XAn (h). Lines and X inside the boxplots are the median and average values, respectively. The box itself represents the interquartile range (25th–75th percentiles) and whiskers show the maximum and minimum values. Data points yielding anomalous results for all sets were discarded. See text for details

Fig. 10

Calculated (colored symbols) and experimental (lines) garnet and biotite compositions for HQ36, the only experiment that yielded compositional trends with temperature. Triangles and circles are for experiments and calculations at 700 and 1000 MPa, respectively. a, b, c The grossular, almandine and pyrope contents of garnet expressed in % (e.g., 100*XAlm). d Mg# (100*Mg/(Mg + Fe), molar) of biotite. e The TiO2 content (in wt%) of biotite. See text for details

Fig. 11

Frequency plot of calculations that reproduce the paragenesis of Ti-oxide minerals by the four sets of solution models for all runs of each experiment investigated in this study. For reference, gray bars behind the colored histograms represent the total number of data points calculated for each experiment. See text for details

Results

Isochemical phase diagram sections (IPDS)

Figures 2 and 3 present IPDS calculated with the four different sets for one metapelitic (HQ36) and one greywacke (NB) composition. Obvious differences include the stability fields of cordierite (blue lines), garnet (red lines) and biotite (yellow lines). Although the topologies clearly differ, the details are difficult to see on such diagrams. It is, therefore, important to extract specific properties from each diagram to thoroughly investigate the discrepancies between sets.

Biotite-out temperature

Figure 4 presents the upper stability limit of biotite at each pressure condition in the six experiments compared with results of PEFM for the four sets investigated. Out of the 12 investigated conditions, calculations with sets VAR, TCHP04, HP04 and HP11 yield a biotite-out temperature within or higher than that observed in the experiments four, three, two and zero times, respectively. A striking observation is that sets VAR and TCHP04 systematically yield higher biotite-out temperatures than the HP sets, with HP04 generally yielding higher temperatures than HP11.

Paragenesis

Figure 5a presents the number of calculations reproducing the experimental paragenesis for ferromagnesian minerals (biotite, garnet, cordierite, orthopyroxene with the addition of muscovite) of a given experimental run for each set. The first important result is that only 15–37% of the runs are reproduced. The experimental paragenesis for ferromagnesian minerals is correctly modeled for the majority (> 50%) of runs only in experiments HP60 and HQ36 with calculations using sets VAR and TCHP04. Combining all points from all experiments, sets VAR (35%) and TCHP04 (37%) reproduce more parageneses than sets HP04 (28%) and HP11 (15%).

Figure 5b shows a compilation of the number of times, for each set of solution models, a calculation failed to reproduce the experimental paragenesis because a given phase was predicted but not observed, or vice versa. These histograms must be interpreted with caution because some minerals are more common than others (e.g., garnet and biotite). Hence, although cordierite is responsible for a similar number of failed calculations as garnet (~ 15 for all sets), the former is a much more problematic solution model because it is not expected in as many investigated points as the latter. The orthopyroxene model of the HP11 set performs better than those of the other sets, with VAR being the worst. The only garnet model that does not incorporate Mn, HP04, is the most problematic. Moreover, it is clear that the biotite models of the two HP sets are the most problematic solution models, with 34 and 47 failed calculations for sets HP04 and HP11, respectively, vs 21 and 18 for sets VAR and TCHP04, respectively.
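The bookkeeping behind Fig. 5a and b can be illustrated with set operations. The sketch below is an assumption-laden example rather than the procedure actually used: the mineral abbreviations, the function names and the two example runs are hypothetical.

```python
# Phases considered for the ferromagnesian paragenesis (muscovite included,
# as in the text). Abbreviations are illustrative labels.
FE_MG_PHASES = frozenset({"Bt", "Grt", "Crd", "Opx", "Ms"})

def compare_paragenesis(predicted, observed):
    """Compare predicted and observed assemblages for one run.

    Returns (match, overpredicted, missing), where `overpredicted` holds phases
    predicted but not observed and `missing` holds phases observed but not
    predicted. Phases outside FE_MG_PHASES are ignored.
    """
    pred = set(predicted) & FE_MG_PHASES
    obs = set(observed) & FE_MG_PHASES
    return (pred == obs, pred - obs, obs - pred)

def success_rate(pairs):
    """Percentage of runs whose ferromagnesian paragenesis is reproduced."""
    if not pairs:
        raise ValueError("at least one run is required")
    hits = sum(1 for predicted, observed in pairs
               if compare_paragenesis(predicted, observed)[0])
    return 100.0 * hits / len(pairs)

# Two hypothetical runs: (predicted assemblage, observed assemblage).
pairs = [
    ({"Grt", "Crd", "Liq"}, {"Grt", "Bt", "Liq"}),      # Crd over-predicted, Bt missing
    ({"Grt", "Bt", "Liq"}, {"Bt", "Grt", "Pl", "Liq"}),  # match (Pl and Liq ignored)
]
print(success_rate(pairs))
```

Because a single failed run can involve several phases at once, per-mineral failure counts of this kind can sum to more than the number of failed runs.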

Proportion of major phases

General conclusions can be derived from graphs of calculated vs experimental phase proportions and mineral compositions. Almost all calculations underestimate biotite proportion (Fig. 6a), whereas garnet proportions are overestimated in a majority of them (Fig. 6b). Data for liquid proportion plot above and below the 1:1 line (Fig. 6c). Mg# (100*Mg/(Mg + Fe), molar) of biotite plots evenly above and below the 1:1 line (Fig. 7a), whereas that of garnet generally plots slightly above the line (Fig. 7b). Mg# of liquid generally plots below the 1:1 line, except for results from the HP11 set, which generally plot above it (Fig. 7c). For plagioclase, calculations tend to slightly underestimate its proportion and overestimate the anorthite content (Figs. 6d and 7d). Except perhaps for biotite proportion (Fig. 6a), there is no relationship between absolute differences (calculated – experimental) and experimental values, suggesting that relative measures (e.g., (calculated – experimental)/experimental) would not constitute adequate parameters of comparison.
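For reference, the quantities compared in this section can be written out explicitly. The sketch below assumes that all iron is treated as FeO when converting from wt% oxides; the function names and the example values are hypothetical.

```python
# Molar masses in g/mol (Mg 24.305, Fe 55.845, O 15.999).
MOLAR_MASS = {"MgO": 40.304, "FeO": 71.844}

def mg_number(mg_mol, fe_mol):
    """Mg# = 100 * Mg / (Mg + Fe), from molar amounts."""
    total = mg_mol + fe_mol
    if total <= 0:
        raise ValueError("Mg + Fe must be positive")
    return 100.0 * mg_mol / total

def mg_number_from_wt(mgo_wt, feo_wt):
    """Mg# from wt% oxides, assuming all iron is reported as FeO."""
    return mg_number(mgo_wt / MOLAR_MASS["MgO"], feo_wt / MOLAR_MASS["FeO"])

def absolute_difference(calculated, experimental):
    """The measure used for the comparisons: abs(calc - exp)."""
    return abs(calculated - experimental)

def relative_difference(calculated, experimental):
    """The relative measure discussed in the text: (calc - exp) / exp."""
    return (calculated - experimental) / experimental

# Hypothetical example: calculated Mg# of 45 vs experimental Mg# of 50.
print(absolute_difference(45.0, 50.0), relative_difference(45.0, 50.0))
```

A relative measure inflates discrepancies for runs with small experimental values (for example, a few vol% of biotite), which is one practical reason to prefer absolute differences when the misfit does not scale with the experimental value.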

Systematic differences between sets are observed for biotite proportion. On the plot of calculated vs experimental vol%, sets VAR and TCHP04 yield values closer to those of experiments than sets HP04 and HP11 (Fig. 6a). Sets VAR, TCHP04, HP04 and HP11 best reproduced experimental runs 12, 6, 5 and 1 times, respectively (Fig. 8). On boxplots of absolute difference between calculated and experimental vol% (Fig. 9a), sets VAR and TCHP04 yield lower median values (8.8 and 10.1, respectively) than sets HP04 (12.6) and HP11 (19.3), and the former two sets have upper whiskers that are lower than the upper quartile of set HP04, which also has upper whiskers lower than the upper quartile value of set HP11. The lower quartile and whiskers of set VAR are also lower than those of all other sets. These results demonstrate that sets VAR and TCHP04, which use the Ti-biotite model of Tajčmanová et al. (2009), better reproduce experimental biotite proportion. This exerts significant control on all results presented herein.
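The boxplot statistics quoted here (median, mean, quartiles, and whiskers drawn at the minimum and maximum, as stated in the Fig. 9 caption) can be computed as in the following sketch. The quartile interpolation method is an assumption, since the text does not specify one, and the input values are hypothetical.

```python
import statistics

def boxplot_summary(abs_diffs):
    """Summary statistics for a list of abs(calc - exp) values.

    Whiskers are taken as the minimum and maximum values. Quartiles use the
    'inclusive' method of statistics.quantiles, which is an assumed choice.
    """
    data = sorted(abs_diffs)
    if len(data) < 2:
        raise ValueError("at least two data points are required")
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": statistics.median(data),
        "mean": statistics.fmean(data),
        "q3": q3,
        "max": data[-1],
    }

# Hypothetical absolute differences (vol%) for one set of solution models.
summary = boxplot_summary([5.0, 1.0, 3.0, 2.0, 4.0])
print(summary)
```

Comparing the upper whisker of one set with the upper quartile of another, as done in the text, then reduces to comparing summary["max"] of the first with summary["q3"] of the second.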

Systematic differences between sets are also observed for garnet proportion, although they are not as clear as for biotite proportion. Sets TCHP04, HP04, VAR and HP11 best reproduced experimental runs 11, 9, 5 and 4 times, respectively (Fig. 8). On boxplots of absolute difference between calculated and experimental vol% (Fig. 9b), set TCHP04 yields a lower median value (1.6) than the other sets (3.5–4.3), and has upper whiskers that are lower than the upper quartile of set HP04, which also has upper whiskers lower than the upper quartile value of set HP11. It is striking that the incorporation of Mn (albeit with no Margules parameters) and the use of a different biotite model improve the ability of the HP04 garnet model (used in set TCHP04) to reproduce experimental results.

No systematic differences between sets are observed for liquid proportion. The VAR set best reproduces experimental results for more runs (16) than the other sets (8–11; Fig. 8), but on boxplots of absolute difference between calculated and experimental vol%, it yields results similar to those of sets TCHP04 and HP04, whereas set HP11 yields a similar median value but higher upper quartile and whisker values than the other sets (Fig. 9c).

For plagioclase, set VAR best reproduces more experimental runs than the other sets (13 vs 3–4; Fig. 8), but on boxplots of absolute calculated–experimental vol%, all sets reproduce the experiments equally well, with a pooled median value around 4 (Fig. 9d).

Composition of major phases

Systematic differences between sets are observed for biotite Mg#. Sets VAR, TCHP04, HP11 and HP04 best reproduce experimental runs 11, 8, 5 and 1 times, respectively (Fig. 8). On boxplots of absolute difference between calculated and experimental Mg# (Fig. 9e), sets VAR and TCHP04 yield lower median values (2–2.7) than sets HP04 and HP11 (~ 7.8), and the former two sets have upper whiskers that are lower by > 7 than the upper quartile values of sets HP04 and HP11. The lower quartile and whiskers of sets VAR and HP04 are also lower than those of all other sets. These results demonstrate that sets VAR and TCHP04, which use the Ti-biotite model of Tajčmanová et al. (2009), better reproduce experimental biotite Mg#.

Systematic differences between sets are also observed for garnet Mg#, although they are not as clear as for biotite Mg#. Sets VAR, TCHP04, HP11 and HP04 best reproduce experimental runs 15, 9, 6 and 4 times, respectively (Fig. 8). On boxplots of absolute difference between calculated and experimental Mg# (Fig. 9f), sets VAR, TCHP04 and HP04 yield a lower median value (1.6) than set HP11 (3.5–4.3), whereas sets VAR, TCHP04 and HP11 have upper whiskers that are lower than the upper quartile value of set HP04 (16–17 vs 20–21, respectively). Sets VAR and TCHP04 thus better reproduce garnet Mg# than the HP sets.

Systematic differences between sets are observed for liquid Mg#. Sets HP11, VAR, TCHP04 and HP04 best reproduce experimental runs 20, 11, 5 and 5 times, respectively (Fig. 8). On boxplots of absolute difference between calculated and experimental Mg# (Fig. 9g), sets VAR, TCHP04 and HP11 yield a lower median value (~ 6.1) than set HP04 (9.4). Set HP11 has upper whiskers that are lower than the upper quartile values of sets VAR, TCHP04 and HP04. Mg# of liquid is the only parameter investigated in this contribution for which set HP11 reproduces experimental results better than the other sets.

Systematic differences between sets are observed for the anorthite content of plagioclase. Sets VAR, HP11, TCHP04 and HP04 best reproduce experimental runs 9, 5, 2 and 2 times, respectively (Fig. 8). On boxplots of absolute difference between calculated and experimental XAn (Fig. 9h), all sets have similar median values (1.6 to 2.3), but set VAR has an upper whisker value lower than the upper quartile values of the other sets. The plagioclase model of set VAR, constructed from Fuhrman and Lindsley (1988), thus better reproduces experimental compositions than those of the HP sets.

Another way of comparing mineral compositions is to investigate compositional change with temperature in the experiments. The HQ36 experiment yielded compositional trends for garnet end members as well as for biotite Mg# and TiO2. At 700 and 1000 MPa, XPrp and Bt-Mg# increase with increasing temperature, whereas XAlm and XGrs decrease and Bt-TiO2 remains relatively flat (lines on Fig. 10). There are not as many data points for the HP sets because only calculations that reproduce the paragenesis of the main phases are considered. In fact, there is only one data point for set HP11, which precludes any comparison with the other sets. Sets VAR, TCHP04 and HP04 all follow the experimental trends. The large majority of data points calculated with sets VAR and TCHP04 further overlap experimental results, whereas set HP04 yields results that systematically diverge from experimental runs for all compositions (Fig. 10b, c, d, e), except for grossular (Fig. 10a).

Titanium oxides paragenesis and reactions

There is no difference between sets in the frequency of calculations that successfully reproduce experimental parageneses for Ti-oxides. All sets reproduce ~ 50% of the observed parageneses for the NB and HQ36 experiments and 60–70% for the NBS experiment (Fig. 11).

Three experiments yielded results with coherent reactions for titanium oxides (Fig. 12). The NBS experiment yielded an ilmenite-in reaction with a positive slope from low to high P and an ilmenite\(\to\)rutile reaction at high P. NB yielded similar reactions except that the ilmenite-in has a negative slope. Finally, HQ36 appears to have a V-shaped ilmenite-rutile reaction at high P, although its exact location is not clearly defined.

Fig. 12

Isochemical phase diagram sections showing Ti-oxides parageneses calculated with the four sets of solution models for the experiments that yielded coherent reactions (indicated as red lines). Circles represent experimental runs. Circles and background fill are color-coded for the Ti-oxides observed in experiments or calculated, respectively. Stippled black lines represent biotite-breakdown reaction. See text for details

Systematic differences between sets are observed for reactions involving titanium oxides. For the NBS experiment, sets VAR and TCHP04 reproduce the low-P ilmenite-in and the high-P rutile-in reactions, but ilmenite is not predicted to be stable at a high enough P compared to that observed in the experiment. In contrast, HP04 and HP11 do not reproduce any of the reactions, but HP04 does predict ilmenite stability at a higher P than the other three sets. For the NB experiment, sets VAR and TCHP04 predict the rutile-in reaction and an ilmenite-in reaction at a T ~ 50 °C higher than its observed appearance. Set HP04 predicts the ilmenite-in reaction within 30 °C, albeit at a slightly higher P, and a rutile-in reaction at high P at a T ~ 40 °C lower than its observed appearance. Set HP11 yields results diverging from those of the experiments. Interestingly, for the HQ36 experiment, all sets predict a V-shaped boundary for the rutile-in reaction. Set TCHP04 predicts the negatively sloping ilmenite\(\to\)rutile reaction within ~ 50 °C, while the positively sloping rutile \(\to\) ilmenite reaction is predicted at 100–200 MPa lower P and ~ 50 °C lower T, whereas set HP11 predicts the positively sloping rutile \(\to\) ilmenite reaction within 50 °C. All the other sets yield reactions that are far off the experimentally derived ones. It is interesting to note that for all three experiments, set TCHP04 yields results that are more similar to those of set VAR than to those of set HP04, even though it uses the same ilmenite solution model as the latter. Because the main difference between sets TCHP04 and HP04 is the biotite solution model, it appears that the choice of this model has a stronger influence on the stability field of Ti-oxides than the choice of the ilmenite solution model.

Discussion

As discussed by White et al. (2011), comparing results of partial melting experiments with those of phase equilibrium modeling is fraught with many uncertainties that are difficult to ascertain. In any given comparison, it is almost impossible to determine whether an observed discrepancy between the two methods is related to experimental problems, such as failure to reach equilibrium or a poorly constrained or variable oxidation state, or to modeling problems, such as uncertainties in the thermodynamic dataset or in the solution models. However, by comparing the same parameters (mineral proportion, composition and specific metamorphic reactions) between six experiments spanning a wide range of bulk compositions and results of PEFM using four different sets of solution models, we consider that we have largely circumvented the above-mentioned problems. Not only does this analysis allow us to determine whether one (or more) set of solution models reproduces experiments better than the others (see below), but it could also provide an independent evaluation of partial melting experiments. For example, boxplots of calculated minus experimental garnet Mg# values (Fig. 9e) show that the large majority of calculations are within 15 Mg# units (absolute value). On the graph of calculated vs experimental values (Fig. 7b), this is reflected by data points plotting near the 1:1 line. There is one exception, however, for points of all sets that plot at the extreme right of the graph (at experimental Mg# = 85), which correspond to the run at 1000 MPa/1000 °C of the NBS experiment. The inconsistency of these data points compared with all other results strongly suggests an experimental anomaly. We therefore discarded this data point in the boxplots of Fig. 9e. Because only two other data points were discarded, we consider that the vast majority of experimental runs yield good results and can be used in our comparison with results of PEFM.
Furthermore, these results suggest that PEFM using different combinations of thermodynamic datasets and solution models could be used as an independent test for judging the validity and repeatability of experimental runs.
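The screening logic described above (a run is suspect when every set of solution models misses the experimental value by a wide margin) can be sketched as follows. This is only a minimal illustration, not the procedure used in this study: the function names are hypothetical, the 15-unit tolerance is simply borrowed from the text as a flagging threshold, and all numerical values are invented placeholders rather than data from the experiments.

```python
def mg_number(mg_mol, fe_mol):
    """Return Mg# = 100 * Mg / (Mg + Fe), computed from molar amounts."""
    return 100.0 * mg_mol / (mg_mol + fe_mol)


def flag_anomalous_runs(experimental, calculated_by_set, tolerance=15.0):
    """Return the runs for which EVERY set misses the experiment by more than `tolerance`.

    A misfit shared by all sets points to the experimental run rather than
    to any single set of solution models.
    """
    flagged = []
    for run, exp_value in experimental.items():
        residuals = [abs(calc[run] - exp_value) for calc in calculated_by_set.values()]
        if all(r > tolerance for r in residuals):
            flagged.append(run)
    return flagged


# Placeholder garnet Mg# values, invented for illustration only.
experimental = {"run_A": 40.0, "run_B": 85.0}
calculated_by_set = {
    "VAR": {"run_A": 43.0, "run_B": 55.0},
    "TCHP04": {"run_A": 45.0, "run_B": 57.0},
    "HP04": {"run_A": 52.0, "run_B": 60.0},
    "HP11": {"run_A": 58.0, "run_B": 62.0},
}

print(flag_anomalous_runs(experimental, calculated_by_set))  # ['run_B']
```

With these placeholder values, only the hypothetical run_B is flagged, because all four sets differ from it by more than the tolerance, whereas run_A is reproduced within tolerance by at least one set.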

This contribution indicates that some sets systematically reproduce experimental results better than others. Sets VAR and TCHP04 systematically yield modeled biotite-breakdown temperatures that are closer to those observed in experiments than does set HP04 (higher by 20–100 °C), which, in turn, generally yields slightly higher temperatures than set HP11 (Fig. 4). The same two sets also reproduce the observed Fe–Mg parageneses (Fig. 5a) for more experimental runs than set HP04 (by 8–9%) and set HP11 (by 20–22%). Figure 8 shows that set VAR outnumbers the other sets in the number of times it best reproduces a given experimental run (shown as larger symbols on Figs. 6 and 7) for biotite, liquid and plagioclase proportions and for biotite, garnet and plagioclase compositions, although set TCHP04 yields the best result more frequently than the other sets for garnet proportion. On boxplots pooling all the absolute differences between calculated and experimental values, sets VAR and TCHP04 clearly yield better results than the HP sets for biotite and garnet proportions (Fig. 9a and b) as well as for biotite and garnet Mg# (Fig. 9e and f). Set VAR also yields better results for the anorthite content of plagioclase (Fig. 9h). Set HP11 yields a better result than the other sets only for the liquid Mg# (both for the frequency of best reproduction, Fig. 8, and on the boxplot of Fig. 9g), perhaps because more attention was given to this parameter in the calibration of the solution model. Although the four sets reproduce the same proportion of the Ti-oxide parageneses observed in experiments HQ36, NBS and NB, only sets VAR and TCHP04 reproduce the HP ilmenite \(\to\) rutile and the LP ilmenite-in reactions observed in the latter two experiments (Fig. 11). Consequently, the results of this study clearly indicate that sets VAR and TCHP04 are preferable to sets HP04 and HP11 for PEFM of metasediments for a wide variety of investigated parameters.
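The "frequency of best reproduction" criterion used above can be illustrated with a short sketch. It is an assumption about how such a tally could be made, not a description of the scripts used for this study: the function name, the handling of ties (every tied set is credited) and all numerical values are hypothetical placeholders.

```python
from collections import Counter


def count_best_sets(experimental, calculated_by_set):
    """Count, for each set, the runs for which it has the smallest absolute misfit.

    Ties credit every tied set (an assumption made for this sketch).
    """
    wins = Counter({name: 0 for name in calculated_by_set})
    for run, exp_value in experimental.items():
        misfits = {
            name: abs(values[run] - exp_value)
            for name, values in calculated_by_set.items()
        }
        best = min(misfits.values())
        for name, misfit in misfits.items():
            if misfit == best:
                wins[name] += 1
    return dict(wins)


# Placeholder biotite proportions (%), invented for illustration only.
experimental = {"run_1": 20.0, "run_2": 10.0, "run_3": 5.0}
calculated_by_set = {
    "VAR": {"run_1": 18.0, "run_2": 9.0, "run_3": 8.0},
    "TCHP04": {"run_1": 17.0, "run_2": 12.0, "run_3": 6.0},
    "HP04": {"run_1": 12.0, "run_2": 4.0, "run_3": 0.0},
    "HP11": {"run_1": 10.0, "run_2": 3.0, "run_3": 0.0},
}

print(count_best_sets(experimental, calculated_by_set))
# {'VAR': 2, 'TCHP04': 1, 'HP04': 0, 'HP11': 0}
```

A tally of this kind only records which set is closest for each run; it says nothing about how close, which is why it is complemented in the text by boxplots of the absolute differences.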

Several clues suggest that one solution model is mainly responsible for the observed discrepancies between sets. The striking differences in biotite-breakdown temperatures between the sets that use the Tajčmanová et al. (2009) biotite model and those that do not (sets VAR/TCHP04 vs HP04/HP11) hint that this model has a profound influence on PEFM of metasediments (Fig. 4). This hypothesis is supported by the compilation of minerals causing problems in the modeled Fe–Mg parageneses (Fig. 5b), which shows that biotite was the cause of failure of calculations to reproduce the experimental paragenesis more frequently than any other phase. Finally, the fact that biotite is the only solution model distinguishing sets TCHP04 and HP04 confirms that it is the main cause of differences between sets. Because biotite is a major carrier of Fe, Mg and Ti, and because it is the main phase involved in partial melting of metasediments, its proportion in calculations would necessarily have a significant impact on the proportion and composition of garnet, liquid and Ti-oxides, as observed herein.

Several factors could explain the discrepancies between biotite models documented herein. Tajčmanová et al. (2009) significantly modified the HP04 biotite model, upon which the HP11 model was also built. They first reduced the enthalpy of the ordering reaction (obi) from \(\Delta H^{0}_{\text{order}} = -2\) kJ mol\(^{-1}\) (Holland and Powell 2006) to \(\Delta H^{0}_{\text{order}} = -6.8\) kJ mol\(^{-1}\). This had the effect of lowering the octahedral Al content of biotite, which resulted in a better match with natural data and an increase in predicted biotite proportion. Based on crystallographic studies, they also changed the Ti substitution in biotite from ordering on the M1 octahedral site, as in the HP04 and HP11 sets, to the M2 octahedral site. This has the net effect of increasing the entropy of mixing, and thus again leads to an increase in biotite proportion. In parametrizing the model, Tajčmanová et al. (2009) determined by regression that the non-ideal mixing parameter involving the Ti endmember was null, whereas it is positive for the HP sets. This also has the net effect of increasing the predicted biotite proportion. Consequently, as predicted by the thermodynamic formulation of the model, using the Tajčmanová et al. (2009) biotite model in PEFM of partial melting of metasediments resulted in a significant increase in the calculated biotite proportion (Fig. 6a), which has a cascading effect on other parameters (i.e., proportion and Mg# of liquid and garnet).

Conclusion

  • This contribution tested four sets of solution models using two thermodynamic datasets (tcds55 and tcds62; Powell et al. 1998 and Holland and Powell 2011, respectively) for calculating isochemical phase diagram sections (Figs. 2 and 3) of six partial melting experiments covering a wide range of metasedimentary bulk compositions (Table 1 and Fig. 1).

  • Sets using the biotite model of Tajčmanová et al. (2009) and the dataset tcds55 better reproduce several experimentally derived parameters, notably: (1) biotite-breakdown temperatures (Fig. 4); (2) frequency of success at reproducing parageneses of the main phases (Fig. 5a); (3) proportion of biotite and garnet (Figs. 8, 9a, b); (4) more frequent success at reproducing experimental proportion of liquid and plagioclase (Fig. 8); (5) composition of biotite, garnet and plagioclase (Figs. 8, 9e, f, h); (6) reactions involving Ti-oxides (Fig. 12).

  • The solution model causing the discrepancies between sets is the biotite model, with the model of Tajčmanová et al. (2009) being preferable.

  • Results presented herein and the excellent results obtained on natural rocks using this biotite model (Larson et al. 2013; Gervais and Crowley 2017; Perrot et al. 2020; Soucy La Roche et al. 2019) strongly argue in favor of using this biotite model for phase equilibrium forward modeling of migmatitic metasediments.