Testing Macro Models by Indirect Inference: A Survey for Users

Le, Vo Phuong Mai; Meenagh, David; Minford, Patrick; Wickens, Michael; Xu, Yongdeng

doi:10.1007/s11079-015-9377-5

Testing Macro Models by Indirect Inference: A Survey for Users

Research Article
Published: 25 September 2015

Volume 27, pages 1–38, (2016)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Open Economies Review Aims and scope Submit manuscript

Testing Macro Models by Indirect Inference: A Survey for Users

Download PDF

Vo Phuong Mai Le¹,
David Meenagh¹,
Patrick Minford^1,2,
Michael Wickens^1,2,3 &
…
Yongdeng Xu¹

732 Accesses
44 Citations
Explore all metrics

Abstract

With Monte Carlo experiments on models in widespread use we examine the performance of indirect inference (II) tests of DSGE models in small samples. We compare these tests with ones based on direct inference (using the Likelihood Ratio, LR). We find that both tests have power so that a substantially false model will tend to be rejected by both; but that the power of the II test is substantially greater, both because the LR is applied after re-estimation of the model error processes and because the II test uses the false model’s own restricted distribution for the auxiliary model’s coefficients. This greater power allows users to focus this test more narrowly on features of interest, trading off power against tractability.

Indirect Inference and Small Sample Bias — Some Recent Results

Article Open access 01 September 2023

Testing DSGE Models by Indirect Inference: a Survey of Recent Findings

Article Open access 28 February 2019

Indirect Inference

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

An unresolved issue in macroeconomics is how best to evaluate the empirical performance of DSGE models. In this paper we compare a relatively new type of test, indirect inference, with a standard procedure, the Likelihood Ratio test. Our main concern is the performance of these tests in small samples, though we will refer to asymptotic properties where known. Our main finding is that the power of the likelihood ratio test is rather weak relative to that of the indirect inference test. We consider why we find this. We also show how this new testing procedure enables users such as policymakers to exploit the ability of the test and its associated estimator to focus on key features of macro behaviour; this allows them to find tractable models that are relevant to their purposes and then to discover whether these models can with total reliability evaluate the policy reforms they are interested in.

The paper is set out as follows. In Section 2 we consider how in recent work DSGE models have been evaluated empirically. In Section 3 we review the main features of the indirect inference testing procedure as implemented in this paper. In Section 4 we compare the small sample properties of tests based on indirect inference with the Likelihood Ratio test that is used in direct inference. The comparison is based on Monte Carlo experiments on the widely used DSGE model introduced by Christiano et al. (2005) and estimated by Smets and Wouters (2003, 2007) on EU and US data. Initially, we use stationary data. In Section 5 we extend the analysis to non-stationary data and to the three-equation New Keynesian representation of the model of Clarida et al. (1999), again on both stationary and non-stationary data. In Section 6 we consider why the two testing methods have such different power, drawing on available asymptotic analysis as well as further Monte Carlo experiments. In Section 7 we show how the testing methods we propose can be used in practice to reduce model uncertainty for a user with a clear purpose such as policy reform. Our final section presents our conclusions.

2 The Empirical Evaluation of DSGE Models

DSGE models emerged largely as a response to the perceived shortcomings of previous formulations of macroeconometric models. The main complaints were that these macroeconometric models were not structural - despite being referred to as structural macroeconometric models - and so were subject to Lucas’s critique that they could not be used for policy evaluation (Lucas 1976), that they were not general equilibrium models of the economy but, rather, they comprised a set of partial equilibrium equations with no necessary coherent structure, that they incorporated ‘incredible’ identifying restrictions (Sims 1980) and that they over-fitted the data through data-mining. For all their theoretical advantages, the strong simplifying restrictions on the structure of DSGE models resulted in a severe deterioration of fit compared to structural macroeconometric models with their ad hoc supply and demand functions, their flexible lagged adjustment mechanisms and their serially correlated structural errors.

There have been various reactions to the empirical failures of DSGE models. The early version of the DSGE model, the RBC model, was perceived to have four main faults: predicted consumption was too smooth compared with the data, real wages were too flexible resulting in employment being too stable, the predicted real interest rate was too closely related to output and the model, being real, could not admit real effects arising from nominal rigidities. In retrospect, however, this empirical examination was limited and flawed. Typically, the model was driven by a single real stochastic shock (to productivity); there were no nominal shocks or mechanisms causing them to affect real variables; and the model’s dynamic structure was derived solely from budget constraints and the capital accumulation equation. Subsequent developments of the DSGE model aimed to address these limitations, and other specification issues, and they had some empirical success. Nevertheless, even this success has been questioned; for example Le et al. (2011) reject the widely acclaimed model of Smets and Wouters (2007).

Another reaction, mainly from econometricians, is the criticism that DSGE models have been calibrated (to an economy) rather than estimated and tested using traditional methods, and when estimated and tested using classical econometric methods, such as the Likelihood Ratio test, they are usually found to perform poorly and are rejected. Sargent^{Footnote 1}, discussing the response of Lucas and Prescott to these rejections, is quoted as saying that they thought that ‘those tests were rejecting too many good models’.

Current practice is to try to get around this problem by estimating DSGE models using Bayesian rather than classical estimation methods. Compared with calibration, Bayesian methods allow some flexibility in the prior beliefs about the structural parameters and permit the data to affect the final estimates. Calibrated parameters or, equivalently, the priors used in Bayesian estimation, often come from other studies or from micro-data estimates. Hansen and Heckman (1996) point out that the justification for these is weak: other studies generally come up with a wide variety of estimates, while micro-estimates may well not survive aggregation. If the priors cannot be justified and uninformative priors are substituted, then Bayesian estimation simply amounts to classical ML in which case test statistics are usually based on the Likelihood Ratio. The frequency of rejection by such classical testing methods is an issue of concern in this paper.

A more radical reaction to the empirical failures of DSGE models has been to say that they are all misspecified and so should not be tested by the usual econometric methods which would always reject them - see Canova (1994). If all models are false, instead of testing them in the classical manner under the null hypothesis that they are true, one should use a descriptive statistic to assess the ‘closeness’ of the model to the data. Canova (1994), for example, remarks that one should ask “how true is your false model?” and assess this using a closeness measure. Various econometricians - for example Watson (1993), Canova (1994, 1995, 2005), Del Negro and Schorfheide (2004, 2006) - have shown an interest in evaluating DSGE models in this way.

We adopt a somewhat different approach that restores the role of formal statistical tests of DSGE models and echoes the widely accepted foundations of economic testing methodology laid down by Friedman (1953). Plainly no DSGE model, or indeed no model of any sort, can be literally true as the ‘real world’ is too complex to be represented by a model that is ‘true’ in this literal sense and the ‘real world’ is not a model. In this sense, therefore, all DSGE models are literally false or ‘mis-specified’. Nevertheless an abstract model plus its implied residuals which represent other influences as exogenous error processes, may be able to mimic the data; if so, then according to usual econometric usage, the model would be ‘well specified’. The criterion by which Friedman judged a theory was its potential explanatory power in relation to its simplicity. He gave the example of perfect competition which, although never actually existing, closely predicts the behaviour of industries with a high degree of competition. According to Friedman, a model should be tested, not for its ‘literal truth’, but ‘as if it is true’. Thus, even though a macroeconomic model may be a gross simplification of a more complex reality, it should be tested on its ability to explain the data it was designed to account for by measuring the probability that the data could be generated by the model. In this spirit we assess a model using formal misspecifications tests. The probability of rejection gives a measure of the model’s ‘closeness’ to the facts. This procedure can be extended to a sub-set of the variables of the model rather than all variables. In this way, it should be possible to isolate which features of the data the model is able to mimic; different models have different strengths and weaknesses (‘horses for courses’) and our procedure can tease these out of the tests.

The test criterion may be formulated in a number of ways. It could, for example, be interpreted as a comparison of the values of the likelihood function for the DSGE model, or of a model designed to represent the DSGE model (an auxiliary model), or it could be based on the mean square prediction error of the raw data or on the impulse response functions obtained from these models or, as explained in more detail later, it could be based on a comparison of the coefficients of the auxiliary model being associated with the DSGE model. These criteria fall into two main groups: on the one hand, closeness to raw data, size of mean squared errors and ‘likelihood’ and, on the other hand, closeness to data features, to stylised facts or to coefficients of VARs or VECMs. Within each of these two categories the criteria can be regarded as mapping into each other so that there are equivalences between them; for example, a VAR implies sets of moments/cross-moments and vice versa. We discuss both types in this paper; we treat the Likelihood Ratio as our representative of the first type and the coefficients of a VAR as our representative of the second.

Before DSGE models were proposed as an alternative to structural macroeconometric models, in response to the latter’s failings, Sims (1980) suggested modelling the macroeconomy as a VAR. This is now widely used in macroeconometrics as a way of representing the data in a theory-free manner in order, for example, to estimate impulse response functions or for forecasting where they perform as well, or sometimes better, than structural models, including DSGE models, see Wieland and Wolters (2012) and Wickens (2013). Moreover, it can be shown that the solution to a (possibly linearized) DSGE model where the exogenous variables are generated by a VAR is, in general, a VAR with restrictions on its coefficients, Wickens (2013). It follows that a VAR is the natural auxiliary model to use for evaluating how closely a DSGE model fits the data whichever of the measures above are chosen for the comparison. The data can be represented by an unrestricted VAR and the DSGE model by the appropriately restricted VAR; the two sets of estimates can then be compared according to the chosen measure.

The apparent difficulty in implementing this procedure lies in estimating the restricted VAR. Indirect inference provides a simple solution. Having estimated the DSGE model by whatever means - the most widely used at present being Bayesian estimation - the model can be simulated to provide data consistent with the estimated model using the errors backed out of the model. The auxiliary model is then estimated unrestrictedly both on these simulated data and on the original data. The properties of the two sets of VAR estimates can then be compared using the chosen measure. More precise details of how we carry out this indirect inference procedure in this paper are given in the next section^{Footnote 2}.

3 Model Evaluation by Indirect Inference

Indirect inference provides a classical statistical inferential framework for judging a calibrated or already, but maybe partially, estimated model whilst maintaining the basic idea employed in the evaluation of the early RBC models of comparing the moments generated by data simulated from the model with actual data. An extension of this procedure is to posit a general but simple formal model (an auxiliary model) — in effect the conditional mean of the distribution of the data — and base the comparison on features of this model, estimated from simulated and actual data. If necessary these features can be supplemented with moments and other measures directly generated by the data and model simulations.

Indirect inference on structural models may be distinguished from indirect estimation of structural models. Indirect estimation has been widely used for some time, see Smith (1993), Gregory and Smith (1991, 1993), Gourieroux et al. (1993), Gourieroux and Monfort (1995) and Canova (2005). In indirect estimation the parameters of the structural model are chosen so that when this model is simulated it generates estimates of the auxiliary model similar to those obtained from actual data. The optimal choice of parameters for the structural model are those that minimise the distance between the two sets of estimated coefficients of the auxiliary model. Common choices for the auxiliary model are the moments of the data, the score and a VAR. Indirect estimates are asymptotically normal and consistent, like ML. These properties do not depend on the precise nature of the auxiliary model provided the function to be tested is a unique mapping of the parameters of the auxiliary model. Clearly, the auxiliary model should also capture as closely as possible the data features of the DSGE model on the hypothesis that it is true.

Using indirect inference for model evaluation does not necessarily involve the estimation of the parameters of the structural model. These can be taken as given. They might be calibrated or obtained using Bayesian or some other form of estimation. If the structural model is correct then its predictions about the auxiliary model estimated from data simulated from the given structural model should match those based on actual data. These predictions relate to particular properties (functions of the parameters) of the auxiliary model such as its coefficients, its impulse response functions or just the data moments. A test of the structural model may be based on the significance of the difference between estimates of these functions derived from the two sets of data. On the null hypothesis that the structural model is ‘true’ there should be no significant difference. In carrying out this test, rather than rely on the asymptotic distribution of the test statistic, we estimate its small sample distribution and use this.

Our choice of auxiliary model exploits the fact that the solution to a log-linearised DSGE model can be represented as a restricted VARMA and also often by a VAR (or if not then closely represented by a VAR). For further discussion on the use of a VAR to represent a DSGE model, see for example Canova (2005), Dave et al. (2007), Del Negro and Schorfheide (2004, 2006) and Del Negro et al (2007a, b) (together with the comments by Christiano (2007), Gallant (2007), Sims (2007), Faust (2007) and Kilian (2007)), and Fernandez-Villaverde et al. (2007). A levels VAR can be used if the shocks are stationary, but a VECM is required, as discussed below, if there are non-stationary shocks. The structural restrictions of the DSGE model are reflected in the data simulated from the model and will be consistent with a restricted version of the VAR^{Footnote 3}. The model can therefore be tested by comparing unrestricted VAR estimates (or some function of these estimates such as the value of the log-likelihood function or the impulse response functions) derived using data simulated from the DSGE model with unrestricted VAR estimates obtained from actual data.

The model evaluation criterion we use is based on the difference between the vector of relevant VAR coefficients from simulated and actual data as represented by a Wald statistic. If the DSGE model is correct (the null hypothesis) then the simulated data, and the VAR estimates based on these data, will not be significantly different from those derived from the actual data. The method is in essence extremely simple; although it is numerically taxing, with modern computer resources, it can be carried out quickly. The simulated data from the DSGE model are obtained by bootstrapping the model using the structural shocks implied by the given (or previously estimated) model and computed from the historical data. The test then compares the VAR coefficients estimated on the actual data with the distribution of VAR coefficient estimates derived from multiple independent sets of the simulated data. We then use a Wald statistic (WS) based on the difference between a _T, the estimates of the VAR coefficients derived from actual data, and $\overline {a_{S}(\theta _{0})}$, the mean of their distribution based on the simulated data, which is given by:

$$WS=(a_{T}-\overline{a_{S}(\theta_{0})})^{\prime }W(\theta_{0})(a_{T}- \overline{a_{S}(\theta_{0})}) $$

where W(θ ₀) is the inverse of the variance-covariance matrix of the distribution of simulated estimates a _S, and θ ₀ is the vector of parameters of the DSGE model on the null hypothesis that it is true.

As previously noted, we are not compelled to use the VAR coefficients in this formula: thus one could use other data ‘descriptors’ considered to be key features of the data that the model should match — these could be particular impulse response functions (such as to a monetary policy shock) or particular moments (such as the correlations of various variables with output). However, such measures are functions of the VAR coefficients and it seems that a parsimonious set of features is these coefficients themselves. There are still issues about which variables to include in the VAR (or equivalently whether to focus only on a subset of VAR coefficients related to these variables) and what order of lags the VAR should be. Also it is usual to include the variances of the data or of the VAR residuals as a measure of the model’s ability to match variation. We discuss these issues further below.

We can show where in the Wald statistic’s bootstrap distribution the Wald statistic based on the data lies (the Wald percentile). We can also show the Mahalanobis Distance based on the same joint distribution, normalised as a t-statistic, and also the equivalent Wald p-value, as an overall measure of closeness between the model and the data.^{Footnote 4} In Le et al. (2011) we applied this test to a well-known model of the US, that of Smets and Wouters (2007; qv). We found that the Bayesian estimates of the Smets and Wouters (SW) model were rejected for both the full post-war sample and for a more limited post-1984 (Great Moderation) sample. We then modified the model by adding competitive goods and labour market sectors. Using a powerful Simulated Annealing algorithm, we searched for values of the parameters of the modified model that might improve the Wald statistic and succeeded in finding such a set of parameters for the post-1984 sample.

A variety of practical issues concerning the use of the bootstrap and the robustness of these methods more generally are dealt with in Le et al. (2011). A particular concern with the bootstrap has been its consistency under conditions of near-unit roots. Several authors (e.g. Basawa et al. (1991), Hansen (1999) and Horowitz (2001a, b)) have noted that asymptotic distribution theory is unlikely to provide a good guide to the bootstrap distribution of the AR coefficient if the leading root of the process is a unit root or is close to a unit root. This is also likely to apply to the coefficients of a VAR when the leading root is close to unity and may therefore affect indirect inference where a VAR is used as the auxiliary model. In Le et al. (2011) we carried out a Monte Carlo experiment to check whether this was a problem in models such as the SW model. We found that the bootstrap was reasonably accurate in small samples, converged asymptotically on the appropriate chi-squared distribution and, being asymptotically chi-squared, satisfied the usual requirement for consistency of being asymptotically pivotal.

4 Comparing Indirect and Direct Inference Testing Methods

It is useful to consider how indirect inference is related to the familiar benchmark of direct inference. We focus on the Likelihood Ratio as representative of direct inference. We seek to compare the distribution of the Wald statistic for a test of certain features of the data with the corresponding distribution for likelihood ratio tests. We are particularly interested in the behaviour of these distributions on the null hypothesis and the power of the tests as the model deviates increasingly from its specification under the null hypothesis. We address these questions using Monte Carlo experiments.

4.1 Some Preliminary Experiments Comparing Indirect with Direct Inference

We base our comparison on tests of the performance of DSGE models. Our first comparison is based on the SW model of the US, estimated over the whole post-war sample (1947Q1−2004Q4), and with a VAR as the auxiliary model. We treat the SW model as true. The focus of the two tests is slightly different: direct inference asks how closely the model forecasts current data while indirect inference asks how closely the model replicates properties of the auxiliary model estimated from the data. For direct inference we use a likelihood ratio (LR) test of the DSGE model against the unrestricted VAR. In effect, this test shows how well the DSGE model forecasts the ‘data’ compared with an unrestricted VAR estimated on that data.

We examine the power of the Wald test by positing a variety of false models, increasing in their order of falseness. We generate the falseness by introducing a rising degree of numerical mis-specification for the model parameters. Thus we construct a False DSGE model whose parameters were moved x % away from their true values in both directions in an alternating manner (even-numbered parameters positive, odd ones negative); similarly, we alter the higher moments of the error processes (standard deviation, skewness and kurtosis) by the same +/−x %. We may think of this False Model as having been proposed as potentially ‘true’ following previous calibration or estimation of the original model but which was at the time thought to be mis-specified.^{Footnote 5}

Many of the structural disturbances in the SW model are serially correlated, some very highly. These autocorrelated errors in a DSGE model are regarded as exogenous shocks (or combinations of shocks) to the model’s specification, such as preferences, mark-ups, or technological change, the type of shock depending on which equation they appear in. Although they are, therefore, effectively the model’s exogenous variables, they are not observable except as structural residuals in these equations. The significance of this is that, when the False models are constructed, the autocorrelation processes of the resulting structural errors are likely to be different. This difference is a marker of the model’s mis-specification, as is the falseness of the structural coefficients. In order to give the model the best chance of not being rejected by the LR test, therefore, it is normal to re-estimate the autocorrelation processes of the structural errors. For the Wald test we falsify all model elements, structural and autocorrelation coefficients, and innovation properties, by the same +/−x %.

In evaluating the power of the test based on indirect inference using our Monte Carlo procedure we generate 10,000 samples from some True model (where we take an error distribution with the variance, skewness and kurtosis found in the SW model errors), and find the distribution of the Wald for these True samples. We then generate a set of 10,000 samples from the False model with parameters θ and calculate the Wald distribution for this False Model. We then calculate how many of the actual samples from the True model would reject the False Model on this calculated distribution with 95 % confidence. This gives us the rejection rate for a given percentage degree +/−x of mis-specification, spread evenly across the elements of the model. We use 10,000 samples because the size of the variance-covariance matrix of the VAR coefficients is large for VARs with a large number of variables.^{Footnote 6}

In evaluating the power of the test under direct inference we need to determine how well the DSGE model forecasts the simulated data generated by the True Model compared with a VAR model fitted to these data. We use the first 1000 samples; no more are needed in this case. The DSGE model is given a parameter set θ and for each sample the residuals and their autoregressive parameters ρ are extracted by LIML (McCallum 1976; Wickens 1982). The IV procedure is implemented using the VAR to project the rational expectations in each structural equation; the residual is then backed out of the resulting equation. In the forecasting test the model is given at each stage the lagged data, including the lagged errors. We assume that since the lagged errors are observed in each simulated sample, the researcher can also estimate the implied ρs for the sample errors and use these in the forecast. We assume the researcher does this by LIML which is a robust method — clearly the DSGE model’s forecasting capacity is helped by the presence of these autoregressive error processes. We find the distribution of the LR when θ is the true model. We then apply the 5 % critical value from this to the False model LR value for each True sample and obtain the rejection rate for the False Model. Further False models are obtained by changing the parameters θ by + or −x %.^{Footnote 7}

Table 1 shows that the power of the Indirect Inference Wald test is substantially greater than that of the Direct Inference LR test. With 5 % mis-specification, the Wald statistic rejects 87 % of the time (at the 95 % confidence level) while the LR test rejects 13 % of the time. At a sufficiently high degree of falseness both reject 100 % of the time. Nonetheless, the LR test also has reasonable power. Figure 1, which shows the correlation coefficients between the two tests for the true and 3 % false models, shows that there is little or no correlation between the two tests across samples. However, Fig. 2, which is a scatter diagram of the correlations between the two test statistics on the same samples but for increasing degrees of falseness, shows that as the model becomes more false, both tests increase their rejection rate. Taken together, these findings suggest that, when one measure is well-fitting, it may be well-fitting or badly-fitting on the other measure. A possible explanation for these findings is that the two tests are measuring different things; the LR test is measuring the forecasting ability of the model while the Wald test is measuring the model’s ability to explain the sample data behaviour.

Table 1 Rejection rates for wald and likelihood ratio for 3 variable VAR(1)

Testing Macro Models by Indirect Inference: A Survey for Users

Abstract

Similar content being viewed by others

Indirect Inference and Small Sample Bias — Some Recent Results

Testing DSGE Models by Indirect Inference: a Survey of Recent Findings

Indirect Inference

1 Introduction

2 The Empirical Evaluation of DSGE Models

3 Model Evaluation by Indirect Inference

4 Comparing Indirect and Direct Inference Testing Methods

4.1 Some Preliminary Experiments Comparing Indirect with Direct Inference

4.1.1 Comparison of the Tests with Different VAR Variable Coverage and VAR lag Order

4.1.2 Estimation and Test Power

5 Extending the Test Comparison

5.1 Non-stationary Shocks Applied to the SW Model

5.2 Extension to the 3-equation New Keynesian Model

6 Why Does the Indirect Inference Test have Greater Power than the Likelihood Ratio Test?

6.1 Summary: Why the Power is Different

6.1.1 Reason a): The Tests Employ Different Procedures so the Comparison is of Different Models

6.1.2 Reason b) Comparative Power when the LR and Indirect Inference Wald Procedures are Like-for-like

Understanding the Extra Power Provided by using the Restricted rather than the Unrestricted Wald Tests

Exploiting the Extra Power of the Wald-type Test with DSGE-model-restricted Variance Matrix

7 Using These Methods to Test a Model

7.1 Can Estimation Protect us Against Falseness?

7.2 Choosing the Testing Procedure

8 Conclusions

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendices

Appendix 1: Available IIW tests of macro models

Appendix 2: Steps in deriving the Wald statistic

Rights and permissions

About this article

Cite this article

Share this article

Keywords

JEL Classification

Search

Navigation