
1 Introduction

Errors in survey data can be divided, according to their source, into two broad categories: sampling and non-sampling errors. The former arise because population parameters are estimated from a sample rather than from the whole population; these errors tend to vanish as the sample size increases. Non-sampling errors mainly relate to measurement design, data collection and processing.

Non-sampling errors comprise quite diverse specific types of error that are usually harder to control than sampling ones. Following Biemer and Lyberg (2003), we can classify the non-sampling errors as: specification error; coverage or frame error; processing error; unit non-response; and measurement errors.Footnote 1 Non-sampling errors usually affect both the bias and the variance of estimators, and their effects do not necessarily diminish as sample size increases. In many economic applications, the non-sampling component of total error outweighs the sampling one.Footnote 2 This is the case for many of the variables collected in the Bank of Italy’s Survey of Household Income and Wealth (SHIW). The survey estimate of total household net wealth is approximately half the corresponding value derived from the financial accounts (FA). True, the FA data rely on many measurement hypotheses and are themselves subject to error; nevertheless this discrepancy cannot be attributed to sample variability and is likely to depend on non-sampling errors, presumably because of a lower propensity of wealthier households to participate in the survey and/or widespread underreporting of assets by respondents. This evidence is the Bank of Italy’s strongest motivation for its efforts to analyse non-sampling errors in the household budget survey. In the next sections we evaluate the non-sampling errors that typically occur in the SHIW. This informal approach allows us to discuss some of the typical problems associated with using household data.Footnote 3

After a brief description of the SHIW (Sect. 2), we describe the survey experiences with non-response (Sect. 3.1), measurement errors (Sect. 3.2) and underreporting (Sect. 3.3). Section 4 concludes.

2 The Survey on Household Income and Wealth

Since 1965 the SHIW has gathered data on Italian households’ income, wealth, consumption and use of payment instruments. It was conducted annually until 1984 and every two years since then (with the exception of 1998). The sample consists of about 8,000 households (secondary units) in 350 municipalities (primary units), drawn from a population of approximately 24 million households. The primary units are stratified by region and municipality size. Within each stratum, the selected municipalities include all those with more than 40,000 inhabitants (self-representing municipalities), while the smaller towns are selected with probability proportional to size. At the second stage, the individual households are selected randomly from the population register.Footnote 4 \(^{,}\) Footnote 5 Through 1987 the survey used time-independent samples (cross sections) of households. In order to facilitate the analysis of changes, the 1989 survey introduced a panel component, and almost half of the sample now consists of households interviewed in one or more previous waves. Data are collected by a market research institute through computer-assisted personal interviews. Households answer an electronic questionnaire that not only stores data but also performs a number of checks, so that data inconsistencies can be remedied directly in the presence of the respondent. The Bank of Italy publishes a regular report with the main results, the text of the questionnaire and the main methodological choices. Anonymized microdata and full documentation can be accessed online for research purposes only (microdata are available from 1977 onwards). Recent economic studies based on this survey have covered such topics as households’ real and financial assets over time; risk aversion, wealth and financial market imperfections; the dynamics of wealth accumulation; payment instruments used; and tax evasion.
The financial section has been extensively exploited for studies on the financial structure of the Italian economy. The SHIW is also part of the European household survey promoted by the euro-area national central banks in order to gather harmonized data on income and wealth.

3 Unit Non-Response and Measurement Errors in the SHIW: Some Empirical Studies

3.1 The Analysis of Unit Non-Response

In most household surveys not all the units selected will participate. The difference between the intended and the actual sample reflects both unwillingness to participate (refusals) and other reasons (most commonly, “not at home”). This can have serious consequences for survey statistics and needs to be properly addressed. Let us consider the case of units that are selected to be surveyed but do not participate. Denoting by \(y_r\) the values of variable \(y\) for the group of \(n_r\) respondents and by \(y_{nr}\) the values for the unobserved group of \(n-n_r\) non-respondents, the estimator of the mean can be decomposed into two parts

$$\begin{aligned} \bar{y} = \frac{n_r}{n} \bar{y}_r + \frac{n-n_r}{n} \bar{y}_{nr}. \end{aligned}$$
(1)

The expected value of \(\bar{y}\) is given by \(\mu = f \mu _r + (1-f)\mu _{nr}\), where \(f\) is the response rate, i.e. the share of responding units in the population, and \(\mu _r\) and \(\mu _{nr}\) are the population means of the responding and non-responding units respectively.

The estimator computed on respondents only, \(\bar{y}_r\), is a biased estimator of \(\mu \), with a bias given by

$$\begin{aligned} E(\bar{y}_r)-\mu = (1-f)(\mu _r-\mu _{nr}). \end{aligned}$$
(2)

The magnitude of non-response bias depends both on the non-response rate \(1-f\) and on the difference between \(\mu _{r}\) and \(\mu _{nr}\). When non-response occurs, the estimator \(\bar{y}_r\) will be biased unless the pattern of non-response is random, that is, unless the assumption \(\mu _r=\mu _{nr}\) holds.
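The bias formula in Eq. (2) can be illustrated with a small simulation. All numbers below are hypothetical; the response propensity is simply assumed to decline with income, which makes \(\mu _r \ne \mu _{nr}\) and biases the respondent-only mean downwards:

```python
import random

random.seed(42)

# Hypothetical skewed income population (not SHIW data).
population = [random.lognormvariate(10, 0.8) for _ in range(100_000)]

# Assumed response behaviour: units above the median income respond
# with probability 0.5, units below it with probability 0.8.
median = sorted(population)[len(population) // 2]
respondents = [y for y in population if random.random() < (0.8 if y < median else 0.5)]

mu = sum(population) / len(population)        # true mean
mu_r = sum(respondents) / len(respondents)    # respondent-only mean
f = len(respondents) / len(population)        # response rate

# Eq. (2): bias of the respondent-only mean = (1 - f)(mu_r - mu_nr)
print(f"response rate f = {f:.2f}")
print(f"relative bias of respondent mean = {(mu_r - mu) / mu:.1%}")
```

Because wealthier units are underrepresented among respondents, the respondent-only mean falls short of the true mean, exactly as Eq. (2) predicts.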

In household surveys, however, we cannot assume that non-responses are totally random; both the sample units that refuse to participate and those that are not at home tend to belong to specific population groups; so we need a procedure to correct for the bias.Footnote 6

If we knew the participation probability \(p_i\) of household \(i\), an unbiased estimator of the population mean could be obtained by extending the Horvitz–Thompson estimator (Little and Rubin 1987)

$$\begin{aligned} \bar{y} = \frac{\sum _{i=1}^n w_i y_i}{\sum _{i=1}^n w_i}, \end{aligned}$$
(3)

where \(w_i = 1/(\pi _i p_i)\), to include both the probability of being included in the sample \(\pi _i\) and the probability of actually participating \(p_i\).Footnote 7 We assume that these two sets of weighting coefficients are independent of each other. In order to correct for non-response, we need information on the selection process governing the response behaviour. But how can we obtain information on this process, given that non-respondents—by definition—are not reached by interviewers or deliberately avoid participation?
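A minimal sketch of the re-weighted mean in Eq. (3) follows; the inclusion probabilities and response propensities are made up for illustration (the assumed pattern, with richer units less likely to respond, mirrors the SHIW evidence discussed below):

```python
# Eq. (3): weighted mean with w_i = 1 / (pi_i * p_i).
# pi_i: probability of being included in the sample.
# p_i : probability of actually participating (response propensity).

y  = [25_000, 40_000, 90_000]   # observed incomes of responding units
pi = [0.001, 0.001, 0.001]      # equal inclusion probabilities
p  = [0.80, 0.70, 0.50]         # richer units assumed less likely to respond

w = [1 / (pi_i * p_i) for pi_i, p_i in zip(pi, p)]
y_bar = sum(w_i * y_i for w_i, y_i in zip(w, y)) / sum(w)

print(f"unweighted mean = {sum(y) / len(y):,.0f}")
print(f"adjusted mean   = {y_bar:,.0f}")
```

Since the wealthiest unit gets the largest weight, the adjusted mean exceeds the unweighted one, which is the direction of correction one expects when non-response rises with income.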

Several statistical techniques, based on various assumptions, can be employed. Knowledge of the distribution of some relevant characteristics for the entire population allows us to compare the sample with the corresponding census data. A significant deviation of the sample distribution from that of the population gives us indirect information on the selection process. The sample composition can thus be aligned with the population distributions by means of post-stratification techniques.Footnote 8

The SHIW data show a higher frequency of elderly persons than the population census, while younger persons are underrepresented. Post-stratification is a common way of embedding information on the population structure into the estimators; the procedure can also reduce the variability of the estimates. Unfortunately, the information available for post-stratification is often limited (sex, age, education, region, town size) and as such is insufficient to fully capture non-response behaviour.

As part of the SHIW sample consists of households already interviewed in past waves (the panel component), information on the propensity to participate can be obtained from an analysis of attrition, i.e. the non-participation of a panel household in a subsequent wave of the survey. Following this approach, Cannari and D’Alessio (1993) found that non-response is more frequent among households in urban areas and in northern Italy, and that participation rates decline as income rises and as household size decreases. The relationship with the age of the head of household is more ambiguous: not-at-homes decline sharply with age, but refusals and other forms of non-participation increase. On the basis of these findings, Cannari and D’Alessio estimated that non-participation caused a 5.4 % underestimate of household income in 1989.

This approach cannot be considered fully satisfactory; its validity depends on the assumption that the pattern of attrition within the panel component resembles the non-participation of households contacted for the first time. In fact, a household’s decision to participate in the survey may have been influenced by a previous interview, so the estimation of the attrition pattern can shed light on only some aspects of non-response.

In many cases, some characteristics of non-respondents can be detected. In conducting personal interviews, for example, the characteristics of the neighbourhood and of the building are observable. In the most recent SHIW waves, several sorts of information on non-respondents have been gathered. Comparing respondents and non-respondents as regards these characteristics can help us understand the possible bias arising from non-response.

Information on the characteristics of non-respondents can also be inferred by analyzing the effort required to get the interview from responding households. The survey report usually includes a table with the number of contacts needed to obtain an interview, according to the characteristics of the households. In 2008, in order to get 7,977 interviews a total of 14,839 contacts was attempted (Banca d’Italia 2010b).Footnote 9 The difficulty of obtaining an interview increased with income, wealth and the educational attainment of the household head. It was easier to get interviews in smaller municipalities, with smaller households and with households headed by retired persons or women.Footnote 10

We can compare the households interviewed at the first visit with those interviewed only after an initial refusal or after a failed contact (not at home). These two groups offer valuable information on non-response. The households successfully interviewed after first being found not at home and those that initially refused to participate appear to have a higher income and wealth than the sample average (for the two groups, by 5.0 and 21.6 % for income and by 5.5 and 27.1 % for wealth respectively).

Assuming that the households interviewed after an initial not-at-home or after a refusal can provide useful information on non-responding units, we can estimate the bias due to non-response. An adjusted estimate can be obtained by re-weighting the interviewed households by the inverse of their propensity to participate. The results for the 1998 survey (D’Alessio and Faiella 2002) showed that wealthier households had a lower propensity to participate in the SHIW. Thus the adjusted estimates of income and wealth are higher than the unadjusted estimates. The correction is smaller for income and real wealth and larger for financial assets (ranging respectively from 7 to 14 %, from 8 to 21 % and from 15 to 31 %, depending on the model adopted).Footnote 11

Different estimates of the effects of unit non-response on sample estimates were obtained from a specific experiment carried out in the 1998 survey. A supplementary sample of about 2,000 households, customers of a leading commercial bank, was contacted, 513 of which were actually interviewed.Footnote 12 For these out-of-sample households, the SHIW gathered data on the financial assets actually held; the results of the current and supplementary samples were similar.Footnote 13

3.2 Measurement Errors: Uncorrelated Errors

One of the most important sources of error in sample surveys is the discrepancy between the recorded and the “true” micro-data. These inconsistencies may be due to response errors or to oversights in the processing phase prior to estimation. Irrespective of the reasons, the effects of errors on estimates are seldom negligible, so we need to evaluate their size and causes.

Involuntary errors in reporting values of some phenomena (e.g. the size of one’s dwellings), due to rounding or to lack of precise knowledge, may still cause serious problems for estimators.

Consider a continuous variable \(X\) measured with an additive error: \(Y=X+\varepsilon \). The measure \(Y\) differs from the true value \(X\) by a random component with the following properties: \(E(\varepsilon )=0\); \(E(X,\varepsilon )=\sigma _{X,\varepsilon }= 0\); \(E(\varepsilon ^2)= \sigma ^2_\varepsilon \). This type of disturbance is called homoscedastic, uncorrelated measurement error. Under these assumptions, the average of \(Y\) is an unbiased estimator of the mean of the unobservable variable \(X\), since \(E(Y)=E(X)\), while the variance of \(Y\) is a biased estimator of the variance of \(X\). In fact

$$\begin{aligned} \sigma ^2_Y=\sigma ^2_X+\sigma ^2_\varepsilon =\frac{\sigma ^2_X}{\lambda ^2}, \end{aligned}$$
(4)

where \(\lambda ^2= \sigma ^2_X/\sigma ^2_Y\) is the reliability coefficient. The index \(\lambda ^2\) is therefore the ratio of the \(X\) and \(Y\) variances (Lord and Novick 1968).Footnote 14

Under these assumptions, we can determine the equivalent size of a sample, i.e. the size that would yield the same variance of the sample mean if there were no measurement error: \(n^*=\lambda ^2 \cdot n\). If there were no error, equally precise estimates could be obtained with a smaller sample: with a reliability index \(\lambda = 0.8\), for instance, the sample could be smaller by \(1- \lambda ^2 = 36\,\%\).
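As a quick numerical check of the equivalent-sample-size formula, take the SHIW’s nominal size of about 8,000 households and an assumed reliability index of 0.8:

```python
# Equivalent sample size under uncorrelated measurement error:
# n* = lambda^2 * n  (the reliability value is illustrative).
n = 8_000     # nominal sample size
lam = 0.8     # assumed reliability index

n_star = lam**2 * n
print(f"equivalent error-free sample size: {n_star:.0f}")
print(f"share of sample 'lost' to noise:   {1 - lam**2:.0%}")
```

That is, measurement noise with \(\lambda = 0.8\) makes an 8,000-household sample only as precise as an error-free sample of 5,120.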

In correlation analysis, if measurement error on \(X\) is assumed to be uncorrelated with \(X\) and with another variable \(Z\), measured free of error, then the correlation coefficient between \(X\) and \(Z\) is attenuated with intensity proportional to the reliability index of \(Y\): \(\rho _{Y,Z}=\lambda _Y \rho _{X,Z}\). If \(Z\) is also measured with error, \(W = Z + \eta \), with the \(\eta \) error of the same type as above and uncorrelated with \(\varepsilon \), the correlation coefficient is attenuated even more: \(\rho _{Y,W}=\lambda _Y \lambda _W \rho _{X,Z}\). In simple regression analysis too, measurement errors in independent variables lead to a downward bias in the parameter estimates (attenuation). In a multiple-regression context, measurement errors in independent variables still produce bias, but its direction can be either upward or downward. Random measurement error in the dependent variable does not bias the slope coefficients but does lead to larger standard errors.

The foregoing makes it clear that even unbiased and uncorrelated measurement errors may produce serious estimation problems.

How can we get a measure of the reliability of survey variables? A first possibility for time-invariant variables is the use of information collected over time on the same units (panel). In our survey half the sample is composed of panel households. If we assume that the measures of time invariant variables are independent (a plausible assumption for a survey conducted at two-year intervals), a comparison over time gives an indication of reliability.

Let \(Y_s\) and \(Y_t\) be the values observed in two subsequent waves, with additive errors: \(Y_s=X+\varepsilon _s\) and \(Y_t=X+\varepsilon _t\). Under the assumptions that

$$\begin{aligned} E(\varepsilon _s,\varepsilon _t)= 0\, \text{ and}\, E(X,\varepsilon _{s})= E(X,\varepsilon _{t}) = 0, \ \ \forall \ s,t=1,\ldots ,T, \ \ \ s \ne t, \end{aligned}$$
(5)

the correlation coefficient between the two measurements \(Y_s\) and \(Y_t\) equals the square of the reliability index: \(\rho _{Y_s,Y_t}=\lambda ^2\). If there is no measurement error, the coefficient equals 1. Hence, a reduction in the precision of the data collection process or in the reliability of the respondents’ answers lowers the correlation coefficient.

If we consider the surface area of the primary dwelling (computed only for households who did not move and did not incur extraordinary renovation expenses between the two survey waves), the correlation coefficient is 0.65 (and the reliability index \(\lambda =0.80\)). For the year of house construction, the correlation coefficient is still lower \((\rho =0.55)\); in 73 % of the cases, the spread is less than five years, but sometimes it is much greater, probably reflecting response difficulties for houses that have been heavily renovated.

Another variable that is subject to inconsistency is the year when the respondents started working. The usual problems of recall are presumably aggravated in this instance by a certain degree of ambiguity in the question: it is not clear whether occasional jobs or training periods should be included or not. Out of 6,708 individuals who answered the question both in 2006 and 2008, 40.6 % gave answers that do not match; linear correlation was only 0.71.

All these examples underscore the great importance and the difficulty, in surveys, of framing questions to which respondents can provide reliable answers. It is not only a problem of knowledge and memory. There may also be a more general ambiguity in definitions (how to count a garden or terrace in the surface area of a house? Should the walls be included?), which can be limited (say, by instructing both interviewers and respondents) but cannot be eliminated.

Dealing with categorical variables complicates the analysis; the models presented above are in fact no longer adequate. An index of reliability for a categorical variable with \(k\) categories can be constructed using two measures (\(Y_1\) and \(Y_2\)) on the same set of \(n\) units. The fraction \(\lambda ^*\) of units classified consistently is a reliability index (Biemer and Trewin 1997). Analytically, \(\lambda ^*\) is given by

$$\begin{aligned} \lambda ^* = \frac{tr(F)}{n} = \frac{\sum _{i=1}^k f_{ii}}{n}, \end{aligned}$$
(6)

where \(F\) is the \(k \times k\) cross-tabulation of \(Y_1\) and \(Y_2\), whose generic element is \(f_{ij}\), and \(tr(.)\) is the trace operator, i.e. the sum of the diagonal elements.

However, the index \(\lambda ^*\) does not take account of the fact that consistent answers could be partly random: if the two measures \(Y_1\) and \(Y_2\) were independent random variables, the expected share of consistent units would be \(\sum \nolimits _{i=1}^k f_{i.}f_{.i}/n^2\). A reliability index that controls for this effect is Cohen’s \(\kappa \) (Cohen 1960), obtained by normalizing the share of observed matching cases with respect to the share expected under independence of the two measurements \(Y_1\) and \(Y_2\)

$$\begin{aligned} \kappa = \frac{\lambda ^* - \sum _{i=1}^k f_{i.}f_{.i}/n^2}{1-\sum _{i=1}^k f_{i.}f_{.i}/n^2}. \end{aligned}$$
(7)

Both \(\lambda ^*\) and \(\kappa \) can also be applied to assess the reliability of all the categories of the qualitative variables, enabling us to pinpoint the main classification problems.Footnote 15
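As a sketch, both indexes can be computed directly from the cross-tabulation of the two measurements; the counts below are invented for illustration:

```python
# lambda* and Cohen's kappa from a cross-tabulation F of two measurements.
# Rows: Y1 categories, columns: Y2 categories (made-up counts).
F = [
    [40,  5,  5],
    [ 5, 30,  5],
    [ 5,  5, 20],
]

n = sum(sum(row) for row in F)
k = len(F)
lam_star = sum(F[i][i] for i in range(k)) / n               # Eq. (6): tr(F)/n

row_tot = [sum(row) for row in F]                           # f_{i.}
col_tot = [sum(F[i][j] for i in range(k)) for j in range(k)]  # f_{.j}
p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / n**2   # chance agreement

kappa = (lam_star - p_e) / (1 - p_e)                        # Eq. (7)
print(f"lambda* = {lam_star:.3f}, kappa = {kappa:.3f}")
```

With these counts \(\lambda ^* = 0.75\), but once chance agreement is removed \(\kappa \) drops to about 0.62, showing why \(\kappa \) is the more conservative reliability measure.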

If we compare the information on the type of high school diploma reported in the 2006 and 2008 waves, we find that about 20 % of the responses differ (\(\lambda ^*=78.8\), Table 1). The transition matrix shows that a large part of the inconsistencies are between vocational and technical schools (4.1 and 4.2 %). In fact, the Technical school category has the lowest, though still high, reliability index \(\lambda _B^*=84.2\). However, once the correction for randomly consistent answers is applied, Cohen’s measure of reliability turns out to be \(\kappa =68.0\). Moreover, the residual Other and Vocational school categories appear to be quite unreliable (\(\kappa _F=17.8\) and \(\kappa _A=40.1\)).

Table 1 Reliability of type of high school degree, 2006–2008. Percentages

Unfortunately, most of the SHIW variables vary over time, so their reliability cannot be measured by these techniques. More sophisticated instruments are required to distinguish actual changes from those induced by measurement errors. A simple model allowing the estimation of the reliability index for time-varying quantities was proposed by Heise (1969), who showed that, under mild conditions, real dynamics can be disentangled from measurement errors by taking three separate measurements of the economic variable on the same panel units.

Let \(X_1\), \(X_2\) and \(X_3\) be the true unobservable values of the variable \(X\) during periods \(1\), \(2\), and \(3\), and \(Y_1\), \(Y_2\) and \(Y_3\) be the corresponding observed measures. In order to apply the Heise method we assume that

$$\begin{aligned} Y_t = X_t + \varepsilon _t \ \ \ \forall \ t=1,2,3 \end{aligned}$$
(8)

and the dependency structure between \(X_1\), \(X_2\) and \(X_3\) follows a first-order autoregressive model (not necessarily stationary) as

$$\begin{aligned} X_1 = \delta _1, \ \ X_2= \beta _{2,1} X_1 + \delta _2, \ \ X_3= \beta _{3,2} X_{2} + \delta _3 \end{aligned}$$
(9)

where \(\beta _{t,t-1}\) is the autoregressive coefficient and \(\delta _t\) is a classical idiosyncratic error. We further impose that the measurement error \(\varepsilon _t\) follows a white noise process and that the level of reliability of a given variable does not vary over time. On these assumptions the estimate of reliability can be derived from the following simple relation

$$\begin{aligned} \lambda ^2= \frac{\rho _{Y_1Y_2}\rho _{Y_2Y_3}}{\rho _{Y_1Y_3}}. \end{aligned}$$
(10)

The intuition is that if measurement errors are independent over time and uncorrelated with the underlying variable, each observed correlation is attenuated by the factor \(\lambda ^2\): \(\rho _{Y_sY_t}=\lambda ^2 \rho _{X_sX_t}\) for \(s \ne t\). Under the autoregressive model, the true correlations satisfy \(\rho _{X_1X_2} \cdot \rho _{X_2X_3} = \rho _{X_1X_3}\), so without measurement error the product \(\rho _{Y_1Y_2} \cdot \rho _{Y_2Y_3}\) would equal \(\rho _{Y_1Y_3}\). With measurement error, the product of the one-step correlations is attenuated by \(\lambda ^4\), while the two-step correlation is attenuated only by \(\lambda ^2\); taking their ratio therefore isolates the measurement reliability \(\lambda ^2\) from the actual variation of the underlying quantity.

In line with Biancotti et al. (2008), Table 2 reports the reliability indexes computed on three consecutive survey waves for the main variables, starting with 1989–1991–1993 and ending with 2004–2006–2008. The reliability estimate for income (on average 0.87) is higher than for net wealth and consumption (both averaging about 0.80).Footnote 16 Among the income components, higher index numbers are found for pension and transfer and for wage and salary (both around 0.95); incomes from self-employment or capital show lower values (around 0.80). As to the wealth components, greater reliability is found for real assets (on average 0.82), and in particular for primary residences (0.90), and lesser for financial assets (0.65).

Table 2 Heise reliability indexes of the main variables in the SHIW, 1989–2008

These results are useful from three different perspectives. First, they allow the many researchers who use the survey to take this aspect properly into account, e.g. by selecting, among similar economic indicators, the most reliable. This benefit may also extend to other, similar surveys, which are likely to be affected by the same issues. Second, our results can help the producers of this kind of survey to find ways of reducing this kind of error; in fact, the difficulties discussed here are not specific to the SHIW data acquisition procedures. Quantifying their impact and determining their causes are essential preliminaries to improving survey procedures. Third, we hope our conclusions can serve as a reference for standard practice among data producers and as a blueprint for quality reporting.

3.3 Measurement Errors: Underreporting

In household surveys on income and wealth, the most significant type of measurement error is the voluntary underreporting of income and wealth. This type of error can produce severe bias in estimates, and special techniques are required to overcome this effect.

To evaluate the underreporting problem, a useful approach is to compare the survey estimates with other sources of data such as the National Accounts, administrative registers, fiscal data, and other surveys. For example, the number of dwellings declared in the survey differs significantly from the number owned by households according to the census.Footnote 17 On the basis of this evidence, underreporting by households could amount to as much as 20 or 25 % of all dwellings.

Further, underreporting is not constant across types of dwelling. While owner-occupied dwellings (principal residences) appear always to be declared, underreporting of other real estate proves to be very substantial. The SHIW itself allows a comparison between the estimate of the total number of houses owned by households and rented to others and the corresponding estimate drawn from the number of households living in rented dwellings.Footnote 18 Here the underestimation appears to be very severe, as much as 60 or 70 %.

Real and financial wealth also appear to be underestimated by comparison with the aggregate accounts (Banca d’Italia 2010a). The bias is greater for financial assets, and underreporting is larger for less commonly held assets (equity and investment fund units). This suggests that unadjusted sample estimates are biased and that the distortion is not uniform across segments of the population.

How can we learn more about this, and how can we adjust the estimates accordingly? One way of assessing the credibility of the survey responses is to ask for the interviewers’ own impression. That is, in the course of the interviews they are requested to look out for additional information, making a practical comparison between the household’s answers and the objective evidence they can see for themselves: type of neighbourhood and dwelling, the standard of living implied by the quality of furnishings, and so on.

In the 2008 survey, credibility is satisfactory overall (an average score of 7.6 out of 10) but not completely uniform. The highest scores are for the better educated and for payroll employees (7.9 and 7.8, respectively), the lowest for the elderly and the self-employed (7.4 and 7.3, respectively).

The correlation coefficient between the credibility score and the declared values of income, financial assets and financial liabilities is positive and significant, but small. The use of this type of information is of little help for the adjustment of the estimates. For example, considering only the sub-sample of households with credibility better than 5 (around 90 % of the sample), average household income rises by just 1.1 %. The adjustment is a bit larger (2.8 %) considering only the households that score 7 or more. In these two cases, the wealth adjustments are respectively 0.8 and 3.2 %; the adjustment for financial assets is greater (between 4 and 11 %).

Taking a completely different approach, underreporting can be analysed by statistical matching procedures. Cannari et al. (1990) performed a statistical matching between the SHIW answers and the data acquired by means of a specific survey conducted by a commercial bank on its customers. Under the hypothesis that the bank’s clients report the full amount of financial assets held, as customers are likely to trust their bank, the authors estimated the amount of financial assets held by the households in the SHIW database.Footnote 19 The study concluded that the survey respondents tend to underreport their assets quite significantly. The underreporting involved several different components. Some households do not declare any bank or postal accounts, so the ownership of financial assets is underestimated; this behaviour was found to result in an underestimation of about 5 %, and was more frequent among the poorer and less educated respondents. Non-reporting of single assets, i.e. the omission of assets actually held, involved a further 10 % of assets. But the bulk of the underreporting concerned the amounts of the assets declared: the study found that for a declared value of 100, households actually held assets worth 170.

Applying this correction, the total amount of financial assets owned by households doubled. The discrepancy with respect to the financial accounts was sharply reduced, but a significant gap remained, presumably deriving from definitional differences and the very substantial asset holdings of the tiny group of very wealthy households, which are not properly represented in sample surveys. The adjustment ratio for financial assets, finally, was higher among the elderly and the self-employed.

Another matching experiment, based on the same data but with different methods (Cannari and D’Alessio 1993), confirmed the foregoing results. The experiment also showed that the Gini concentration index of household wealth was not seriously affected by the adjustment procedures (from 0.644 to 0.635 for 1991).

In a recent paper on this topic, D’Aurizio et al. (2006) use an alternative method and data drawn from a different commercial bank. On average, the adjusted estimates are more than twice the unadjusted data and equal to 85 % of the financial accounts figures. The adjustments are greatest for the households whose head is less educated or retired.

Neri and Zizza (2010) propose several approaches to correcting for the underreporting of household income. To adjust the estimates for self-employed households, the procedure uses the ratio of the value of the primary residence to labour income; this approach is a variant of the one proposed by Pissarides and Weber (1989), which is based on the ratio of food expenditure to income. The ratio of the value of homes to labour income is estimated first for public employees, whose answers are presumed not to be underreported. The estimated parameters are then applied to the self-employed (the value of houses is assumed to be reported correctly by both types of respondent). On this basis the estimated average income from self-employment is 36 % greater than the unadjusted figure. To adjust income from financial assets, the authors used the methodology of D’Aurizio et al. (2006) for the correction of financial stocks, simply applying a rate of return to the adjusted capital stock. On average this adjustment tripled the reported income. The increase in liabilities was modest (just 9 %). As to income from real estate, they used the procedure developed by Cannari and D’Alessio (1990), which adjusts the number of declared second homes to the Census. The income from actual and imputed rents increased on average by 23 %. Income from other labour activities was adjusted on the basis of the Italian part of the European Union Statistics on Income and Living Conditions (EU-SILC), which includes information from administrative and fiscal sources. With this adjustment, additional payroll and self-employment income increased by 3 and 4 percentage points respectively. Overall, the adjustment procedures produce an estimate of total household income about 12 % greater than the declared value (between 2 and 4 times the corresponding sampling errors).
In summary, analysis of the discrepancy between the survey figures and the financial accounts shows the simultaneous presence of non-response, non-reporting and underreporting. The underestimation of financial assets and liabilities due to non-participation in the survey appears to be less substantial than that caused by non-reporting and underreporting.

In the 2010 survey, the SHIW tried the unmatched count technique (Raghavarao and Federer 1979) for eliciting honest answers on usury, a serious problem mainly for small businesses and poor households, but a phenomenon on which no reliable information is available. The technique uses anonymity to elicit a larger number of true answers to sensitive or embarrassing questions. Respondents are randomly split into two groups, \(A\) and \(B\). The control group \(B\) is asked to answer a set of \(k\) harmless binary questions \(X_1,\ldots ,X_k\), while the treatment group \(A\) receives one additional question \(Y\) (the sensitive one). Respondents in both groups reveal only the number of items that apply to them, not their answer to each individual item. Hence the answers take the form \(S_B=X_1+ X_2+\cdots +X_k\) and \(S_A=S_B+Y\) for respondents in groups \(B\) and \(A\) respectively. With the unmatched count, the proportion of people for whom the sensitive item applies is estimated by the difference of the two group means: \(\overline{Y}=\overline{S}_A-\overline{S}_B\). Under certain conditions, researchers can also perform regressions on this type of data.Footnote 20
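The difference-in-means estimator above can be illustrated with simulated data. This is a minimal sketch, not SHIW data: the number of items, their prevalences, and the 10 % true prevalence of the sensitive behaviour are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

k = 4                                      # harmless binary items
p_items = np.array([0.5, 0.3, 0.6, 0.2])   # assumed prevalence of each item
p_sensitive = 0.10                         # assumed true prevalence (e.g. usury)
n = 2000                                   # respondents per group

# Control group B reports the count of harmless items that apply.
S_B = rng.binomial(1, p_items, size=(n, k)).sum(axis=1)

# Treatment group A reports the same count plus the sensitive item.
S_A = (rng.binomial(1, p_items, size=(n, k)).sum(axis=1)
       + rng.binomial(1, p_sensitive, size=n))

# Unmatched count estimator: difference of the two group means.
y_hat = S_A.mean() - S_B.mean()

# Standard error, assuming the two samples are independent.
se = np.sqrt(S_A.var(ddof=1) / n + S_B.var(ddof=1) / n)
print(f"estimated prevalence: {y_hat:.3f} (s.e. {se:.3f})")
```

Note that the respondent never reveals the answer to the sensitive item individually; the estimator recovers only the group-level prevalence, and its variance grows with the variance of the harmless item counts, which is the price paid for anonymity.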

4 Concluding Remarks

This work has described the research done at the Bank of Italy on non-sampling errors in the SHIW to bring out the most common problems in household surveys. These errors are frequent and constitute the largest part of the total error. We gauge the impact of non-participation in the survey, classic measurement error and underreporting, and describe some practical procedures for correcting these error sources. We show that the correction procedures often depend on specific assumptions. For this reason the techniques are more in the nature of tools that a researcher can legitimately use than of standard practices for the production of descriptive statistics, such as those reported in the official Bank of Italy reports.

As survey designers, we have shown that it is essential to collect additional information, beyond that strictly related to the content of the survey. In the SHIW, we acquire information on: the households not interviewed; the effort needed to acquire the interviews; the time spent on the interviews; the credibility of answers; and the characteristics of the interviewers themselves. All these data can help us to grasp the extent and the causes of the various types of non-sampling error.

The analysis may also suggest more effective survey designs. We have documented a lower response rate among wealthier households, which the stratification and post-stratification criteria usually employed cannot correct properly. The availability of data on the average market value of houses by neighbourhood within the main cities suggests that serious consideration should be given to revising these criteria. Another solution might be to over-sample wealthier households so as to improve the efficiency of some overall estimators.

Specific techniques for collecting sensitive information are available. More generally, the questionnaire should be designed to include careful evaluation of various aspects of apparently less problematic questions as well.

Another matter for further research, on which work is under way, is interviewer effects: heterogeneous performance across interviewers in terms of response rates and measurement error. The results could help us to improve selection and training procedures.

The work also showed that the sample estimates for income and wealth are seriously affected by underreporting, in spite of the efforts to overcome respondents’ distrust. This evidence suggested increasing the share of panel households, which was accordingly raised from 25 % in 1991 to 55 % in 2008. Panel households, in fact, are better motivated to give truthful responses. The average credibility score for the panel households is greater than for households interviewed for the first time (7.73 as against 7.44 in 2008). However, while it may improve response credibility, increasing the panel proportion may reduce the coverage of particular population segments (e.g. young households) and worsen sample selection due to unit non-response. The terms of this trade-off need to be carefully evaluated.

As survey data users, we are aware that knowledge of the types of non-sampling errors can greatly improve both the specification of the empirical model and the interpretation of the results. In conclusion, we urge practitioners using survey data to maintain a critical reserve concerning the possible non-sampling errors affecting this type of data.