Introduction

Loss to follow-up is a potential source of bias in epidemiology. In an ideal world, there would be no boundaries for the registration of health outcomes, and migration would not cause any loss to follow-up. In the real world, however, emigration can cause loss to follow-up in disease or mortality studies, because migrants move out of the (national) catchment area for disease or death registration. Biased estimates of disease occurrence or of exposure–disease associations can occur if emigrants and the source population differ with respect to health risk. This is likely whenever the selective forces driving movement out of (emigration) or into (immigration, repatriation) a catchment area correlate with health or health-related characteristics. There are several options for dealing with emigration in cohort studies, but apparently no standard way of computing person-time at risk. The most common solution is to censor persons at the date of emigration (Mittendorfer-Rutz et al. [1]). This may not be ideal (MacMahon and Pugh [2]), because migration is dynamic and emigration may be followed by repatriation. A dynamic approach, with initial follow-up until emigration and follow-up re-established from the date of repatriation, has been applied in some studies, e.g., the British birth cohorts (Wadsworth et al. [3, 4], Atherton et al. [5]). Another option is to disregard emigration entirely (Saurel-Cubizolles et al. [6]), which is the only option if data on migration events are lacking. A less common procedure is to exclude from analysis all subjects who emigrate during follow-up (Helgesen et al. [7]).

Documentation of health risks among emigrants compared with the population of origin is scarce. Wadsworth et al. [3] found that emigrants in the British 1946 birth cohort were selectively more skilled than those who remained, a pattern described as a national “brain drain” (Wadsworth et al. [3]). More is known from migrant studies, in which immigrants have often been found to be healthier than the population in the country of destination (the “healthy migrant effect”) (Williams and Ecob [8], Davey Smith et al. [9], Abraído-Lanza et al. [10]). The healthy migrant effect is plausibly the counterpart of the brain drain from emigration, the main difference being the choice of comparison population. Another potential explanation of low mortality in migrant populations is bias due to selective return to the country of origin among those who expect to die soon (“salmon bias”) (Davey Smith et al. [9], Abraído-Lanza et al. [10]). However, salmon bias has not been convincingly documented in observational studies. Both the healthy migrant effect and selective return were studied in the US National Longitudinal Mortality Study, but neither was considered to explain the low mortality rates in Latino groups (Abraído-Lanza et al. [10]). If low mortality among migrants compared with the population of the country of destination is partly explained by salmon bias, we would expect increased mortality among repatriates compared with the population in the country of origin. To our knowledge, no studies have addressed this possibility.

The impact of migration on epidemiological studies may become increasingly important in a globalised world (Gushulak and MacPherson [11]). So far, many epidemiological studies have focused on the health of immigrants compared with native residents in the country of destination (Williams and Ecob [8], Davey Smith et al. [9], Abraído-Lanza et al. [10], Ringbäck Weitoft et al. [12]). We were able to study emigration and repatriation in a cohort of all people born in Norway between 1967 and 1976. The aim of the study was to assess how different decisions in the analytical handling of emigration and repatriation influence estimated population mortality and cancer incidence in Norway.

Materials and methods

Study population and data collection

The Medical Birth Registry of Norway has, since 1967, registered all births in Norway, both live births and stillbirths, with a gestational duration of 16 weeks or more (Bjerkedal [13]). The study population included all 626,928 live births recorded in the Registry between 1967 and 1976. After excluding the 12,752 persons who were confirmed to have died before the start of follow-up on January 1st 1992, the study population comprised 614,176 persons. The national identification number allowed linkage with several national registries: Statistics Norway’s events database (FD-trygd); the Cancer Registry of Norway; the Cause of Death Register; the Central Population Register; the Education Register of Statistics Norway; and the benefit register of the Norwegian Labour and Welfare Administration. Follow-up in the registries ended on December 31st 2004.

Variables

The FD-trygd database covers the entire Norwegian population and includes individual emigration and immigration movements since 1992 (Akselsen et al. [16]). We created a migration history for each individual from the registered movements. Table 1 presents the study population according to migration history. Person-time before first emigration, during emigration, and after repatriation (including, where applicable, time between emigrations) was computed from each migration history. All who were considered alive as of 1 January 1992 (N = 614,176) contributed nearly 8 million person-years (7,945,131) in total follow-up: 7,604,358 person-years (95.7%) before emigration, 196,922 person-years (2.5%) as expatriates, and 143,851 person-years (1.8%) as repatriates. There were 573,810 never-emigrants, 22,499 emigrants who later repatriated, and 17,867 emigrants who did not return. Only 29,340 of the 40,366 ever-emigrants contributed person-time before emigration; the remaining 11,026 had emigrated before the start of follow-up. Accordingly, 603,150 persons contributed person-time as residents before emigration or as never-emigrants. A total of 609,346 persons contributed person-time either before emigration or after repatriation (603,150 residents at the start of follow-up plus 6,196 who had emigrated before 1992 but returned during follow-up).
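The person-time split described above can be sketched in Python. This is a minimal illustration, not the study's STATA code. It assumes a hypothetical `Person` record holding residence status on 1 January 1992, a chronologically sorted list of ("emigrate" | "return") events, and an exit date (death or end of follow-up), and it converts days to years using 365.25 days per year.

```python
from dataclasses import dataclass
from datetime import date

FOLLOW_UP_START = date(1992, 1, 1)
FOLLOW_UP_END = date(2004, 12, 31)   # treated as exclusive here, for simplicity
DAYS_PER_YEAR = 365.25


@dataclass
class Person:
    """Hypothetical record; field names are illustrative, not registry variables."""
    resident_at_start: bool            # living in Norway on 1 January 1992
    moves: list[tuple[date, str]]      # sorted ("emigrate" | "return") events
    exit_date: date = FOLLOW_UP_END    # death or end of follow-up


def split_person_time(p: Person) -> dict[str, float]:
    """Person-years before first emigration, abroad, and after a return.

    Time between a return and any re-emigration is counted as "after_return".
    """
    totals = {"pre_emigration": 0.0, "abroad": 0.0, "after_return": 0.0}
    state = "pre_emigration" if p.resident_at_start else "abroad"
    cursor = FOLLOW_UP_START
    for when, kind in p.moves + [(p.exit_date, "exit")]:
        # Clip each event date to the follow-up window before accumulating time.
        when = min(max(when, FOLLOW_UP_START), p.exit_date)
        totals[state] += (when - cursor).days / DAYS_PER_YEAR
        cursor = when
        if kind == "emigrate":
            state = "abroad"
        elif kind == "return":
            state = "after_return"
    return totals


# Resident in 1992, emigrated in 2000, returned in 2002, alive at end of follow-up.
example = Person(True, [(date(2000, 1, 1), "emigrate"), (date(2002, 1, 1), "return")])
print(split_person_time(example))
```

A person who had already emigrated before 1992 starts in the "abroad" state, so they contribute resident person-time only if a return is recorded during follow-up. This matches how the 6,196 returning pre-1992 emigrants enter the counts above.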

Table 1 Distribution of persons born in Norway between 1967 and 1976 and followed up 1992–2004, according to migration history

Covariates considered were:

  • age at entry into follow-up, age at first emigration, and age at last repatriation;

  • gender;

  • parental education level; and

  • childhood chronic disease.

Citizenship was not considered because only 0.4% of the participants were foreign citizens at birth. Age was categorised into four levels: <20 years, 20–24 years, 25–29 years, and ≥30 years. Parental education was defined as the highest educational attainment of either parent, as recorded in the Education Register when the child was 16 years old. Education was recorded in nine levels on the basis of the Norwegian Standard Classification of Education (Statistics Norway [17]) and further collapsed into five ordered levels. Childhood chronic disease was defined as receipt of a financial benefit for medical reasons from the Norwegian Labour and Welfare Administration before age 16 years. We applied two benefit categories: basic benefit, provided to families with extra expenses because of chronic disease; and attendance benefit (with or without basic benefit), for children who needed extra attendance for medical reasons. Categories and distributions of covariates are provided in Table 2.
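The covariate coding rules above can be written as small functions. This is a sketch under stated assumptions. Education levels are represented by hypothetical ordered integer codes, and the mapping from nine to five levels is not reproduced here. The flags `basic` and `attendance` are hypothetical names for the two benefit types.

```python
from typing import Optional


def age_group(age_years: float) -> str:
    """Four age bands: <20, 20-24, 25-29 and >=30 years."""
    if age_years < 20:
        return "<20"
    if age_years < 25:
        return "20-24"
    if age_years < 30:
        return "25-29"
    return ">=30"


def parental_education(mother: Optional[int], father: Optional[int]) -> Optional[int]:
    """Highest attained level of either parent; None if neither is recorded."""
    known = [level for level in (mother, father) if level is not None]
    return max(known) if known else None


def childhood_benefit(basic: bool, attendance: bool) -> str:
    """Three categories; attendance benefit (with or without basic) takes precedence."""
    if attendance:
        return "attendance"
    if basic:
        return "basic"
    return "none"
```

The same `age_group` function would be applied to age on 1 January 1992, age at first emigration, or age at last repatriation, depending on the follow-up period.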

Table 2 Distribution of selected characteristics among persons born in Norway between 1967 and 1976 and followed up 1992–2004

The study outcomes were deaths notified to the Cause of Death Register and first incident cancers obtained from the Cancer Registry of Norway. Mortality statistics cover only persons registered as living in Norway at the time of death, regardless of whether the death took place in Norway (Statistics Norway [14]). Cancer registration is considered to be complete in Norway, and regulations require all hospitals, laboratories, and general practitioners in Norway to report all new cases of cancer (The Cancer Registry of Norway [15]). As a consequence of these routines, neither deaths during emigration nor cancers diagnosed or treated solely outside Norway are registered.

Follow-up scenarios

We defined four follow-up scenarios based on the categories presented in Table 1.

  • Scenario 1. A standard approach, in which person-time before emigration (total person-time for never-emigrants) was considered. This scenario is in accord with Mittendorfer-Rutz et al. [1]. Follow-up continued until the date of death, emigration, first cancer (for cancer incidence), or study termination, whichever occurred first. Scenario 1 included all 603,150 persons alive and resident at the start of follow-up, and all of their person-time as residents before emigration was counted.

  • Scenario 2. A dynamic approach, counting person-time both before emigration and after repatriation, similar to the British birth cohort studies (Wadsworth et al. [3, 4], Atherton et al. [5]). All person-time as residents, both before emigration and after repatriation (where applicable, until re-emigration), was counted. This approach included all 603,150 persons alive and resident at the start of follow-up, in addition to the 6,196 who were emigrants at the start of follow-up but repatriated and contributed person-time later.

  • Scenario 3. Ignoring emigration events, which would be the only option if migration data were lacking. This scenario is similar to the study by Saurel-Cubizolles et al. [6]. This option included all 614,176 persons in the birth cohort who were not confirmed deceased at the start of follow-up (603,150 residents in 1992 plus 11,026 who emigrated in 1967–1991). Person-time was counted regardless of emigration status.

  • Scenario 4. Exclusion of all who had emigrated, whether before or during follow-up (Helgesen et al. [7]). Only the 573,810 never-emigrants were included in the analysis.
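The four scenarios differ only in which segments of each person's follow-up are counted. The following is a minimal sketch. It assumes each person's follow-up has already been split into hypothetical `Segment` records labelled "pre_emigration", "abroad", or "after_return" (for never-emigrants, all time is "pre_emigration"). Deaths and cancers would be assigned to segments in the same way.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical follow-up segment for one person."""
    kind: str      # "pre_emigration", "abroad" or "after_return"
    years: float   # person-years in this segment


def scenario_person_time(segments: list[Segment], ever_emigrated: bool, scenario: int) -> float:
    """Person-years one individual contributes under follow-up scenarios 1-4."""
    if scenario == 1:    # standard: censor at first emigration
        counted = {"pre_emigration"}
    elif scenario == 2:  # dynamic: resident time before emigration and after return
        counted = {"pre_emigration", "after_return"}
    elif scenario == 3:  # ignore emigration: all person-time counts
        counted = {"pre_emigration", "abroad", "after_return"}
    elif scenario == 4:  # never-emigrants only
        if ever_emigrated:
            return 0.0
        counted = {"pre_emigration"}
    else:
        raise ValueError("scenario must be 1, 2, 3 or 4")
    return sum(s.years for s in segments if s.kind in counted)


segments = [Segment("pre_emigration", 8.0), Segment("abroad", 2.0), Segment("after_return", 3.0)]
for scenario in (1, 2, 3, 4):
    print(scenario, scenario_person_time(segments, ever_emigrated=True, scenario=scenario))
```

Summing these per-person contributions over the cohort gives the scenario denominators. For example, a pre-1992 emigrant who never returned has only an "abroad" segment and therefore contributes person-time to scenario 3 alone.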

Statistical analysis

We used STATA/SE 10.1 for statistical analysis. Person-time at risk and cases of death and incident cancer were recorded for each individual according to his or her emigration status. Rates per 100,000 person-years before first emigration, during emigration(s), and after the last repatriation were compared by estimating rate ratios (RR) with Poisson regression models. There was no over-dispersion in the data (means and variances of 0.00872 and 0.00864 for death, and 0.00724 and 0.00719 for cancer).
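This dispersion check can be verified with simple arithmetic. Assume, as the design implies, that each person contributes at most one death and at most one first cancer. The outcome per person is then binary, so its variance should be close to p(1 − p), which is slightly below the mean p. The reported means equal the total case counts divided by cohort size, and the reported variances match p(1 − p):

```latex
\begin{aligned}
\bar{y}_{\text{death}} &= \frac{5354}{614176} \approx 0.00872, &
\bar{y}_{\text{death}}\,(1-\bar{y}_{\text{death}}) &\approx 0.00872 \times 0.99128 \approx 0.00864,\\
\bar{y}_{\text{cancer}} &= \frac{4447}{614176} \approx 0.00724, &
\bar{y}_{\text{cancer}}\,(1-\bar{y}_{\text{cancer}}) &\approx 0.00724 \times 0.99276 \approx 0.00719.
\end{aligned}
```

This per-person check ignores differences in person-time between individuals. It is therefore a plausibility check rather than a formal test of dispersion in the Poisson model.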

Emigration status served as the main independent variable (exposure) in the Poisson analysis. Four covariates were considered as potential confounders or effect modifiers on the basis of the stratified mortality and cancer rates (Table 2) and their distribution across migration categories (Table 3). Accordingly, estimates of migration category effects were adjusted for age (four categories), gender, parental education (five levels), and childhood disease benefit (three categories). Age as of January 1st 1992 was used for follow-up before emigration, age at the date of first emigration for follow-up during emigration, and age at the date of last repatriation for follow-up after return to Norway. We assumed that, if salmon bias were present, mortality and cancer rates would be increased mainly in the first period after repatriation. This was examined by estimating RRs in a separate analysis restricted to the first year after repatriation. Interaction was considered present if interaction term coefficients were statistically significant (Wald test).
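The study adjusted for covariates with Poisson regression. The Mantel–Haenszel rate ratio, pooled across strata, is a simpler related technique that shows how adjustment for a categorical covariate works. The sketch below uses that stand-in method, not the authors' model, and the stratum counts are hypothetical. Within each stratum, the rate after repatriation is twice the pre-emigration rate. Because repatriate person-time is unevenly distributed across strata, the crude ratio is distorted.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Hypothetical counts for one covariate stratum (e.g. one parental-education level)."""
    cases_exposed: int        # cases after repatriation
    pyears_exposed: float     # person-years after repatriation
    cases_reference: int      # cases before emigration
    pyears_reference: float   # person-years before emigration


def crude_rate_ratio(strata: list[Stratum]) -> float:
    """Rate ratio ignoring strata (counts collapsed over strata)."""
    a = sum(s.cases_exposed for s in strata)
    t1 = sum(s.pyears_exposed for s in strata)
    b = sum(s.cases_reference for s in strata)
    t0 = sum(s.pyears_reference for s in strata)
    return (a / t1) / (b / t0)


def mantel_haenszel_rate_ratio(strata: list[Stratum]) -> float:
    """Pooled rate ratio: sum(a_i * T0_i / T_i) / sum(b_i * T1_i / T_i)."""
    numerator = denominator = 0.0
    for s in strata:
        total = s.pyears_exposed + s.pyears_reference
        numerator += s.cases_exposed * s.pyears_reference / total
        denominator += s.cases_reference * s.pyears_exposed / total
    return numerator / denominator


strata = [
    Stratum(cases_exposed=6, pyears_exposed=3_000, cases_reference=10, pyears_reference=10_000),
    Stratum(cases_exposed=6, pyears_exposed=1_000, cases_reference=30, pyears_reference=10_000),
]
print(crude_rate_ratio(strata), mantel_haenszel_rate_ratio(strata))
```

Here the crude ratio is about 1.5. The pooled estimate recovers the within-stratum ratio of 2.0, up to floating-point rounding.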

Table 3 Percent distribution of selected characteristics according to emigration category among 614,176 persons born in Norway between 1967 and 1976 and followed up 1992–2004

Ninety-five percent confidence intervals (CI) were computed for all RR estimates; intervals that did not include the null value (1) were considered statistically significant. Because some persons contributed person-time to more than one emigration category, robust variance estimates of the regression coefficients were applied, corrected for within-person dependence by using the cluster option in STATA with the person’s identity as the cluster variable.
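For a single binary exposure with no covariates, the Poisson rate ratio and its model-based Wald interval have a closed form. This makes the significance rule above concrete. The sketch uses the ordinary (not cluster-robust) standard error, so it does not reproduce the clustered variance used in the study. The counts in the example are hypothetical.

```python
import math


def rate_ratio_ci(cases_1: int, pyears_1: float, cases_0: int, pyears_0: float,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Crude rate ratio (group 1 vs group 0) with a Wald CI on the log scale.

    For a Poisson model with one binary exposure, exp(coefficient) equals this
    ratio and the model-based SE of log(RR) is sqrt(1/cases_1 + 1/cases_0).
    """
    rr = (cases_1 / pyears_1) / (cases_0 / pyears_0)
    se = math.sqrt(1 / cases_1 + 1 / cases_0)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


def significant(lower: float, upper: float) -> bool:
    """Significance criterion used above: the interval excludes the null value 1."""
    return not (lower <= 1.0 <= upper)


# Hypothetical: 40 cases in 20,000 person-years vs 1,000 cases in 1,000,000 person-years.
rr, lo, hi = rate_ratio_ci(40, 20_000, 1_000, 1_000_000)
print(round(rr, 2), round(lo, 2), round(hi, 2), significant(lo, hi))
```

With few cases in the exposed group, the 1/cases_1 term dominates the standard error. This is why intervals for small groups such as first-year repatriates are wide.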

Crude mortality and cancer incidence were also computed for subpopulations according to the four follow-up scenarios.

Results

Of the 614,176 persons who were not registered as deceased as of January 1st 1992, 40,366 (6.6%) were ever-emigrants with between 1 and 13 recorded migration events; the total number of events was 60,298. The majority of the emigrants (55.7%, N = 22,499) repatriated during follow-up. Age-specific emigration rates from birth onwards are shown for females and males in the study population in Fig. 1. Rates depended on age, with a sharp increase in the late teens and a peak in the early twenties. There was a female excess that was restricted to the late teens and the twenties and was strongest at age 20 years. Emigration rates for females and males were 366 and 262 per 100,000 person-years, respectively.

Fig. 1 Emigration rates (first event) per 100,000 person-years according to age and gender among persons born in Norway between 1967 and 1976 and followed up from birth through 2004

Table 3 shows a modest female excess among emigrants, both with and without repatriation, compared with the 573,810 never-emigrants. Emigrants were considerably more likely to have highly educated parents and less likely to have received benefits because of childhood chronic disease. Mean age at the start of follow-up as residents was close to 20 years, while mean ages at emigration and repatriation were about 24 and 26 years, respectively (Table 3 footnotes).

During follow-up, 5,354 deaths and 4,447 first cancer cases were recorded, of which 112 deaths and 96 cancers were registered after repatriation (Table 4). Both mortality and cancer rates were moderately higher after repatriation than before emigration. The association between repatriation and mortality was confounded by gender, parental education, and childhood chronic disease; the adjusted RR was moderately increased, with a confidence interval that excluded the null (RR = 1.31; 95% CI 1.01–1.70). The association between repatriation and cancer was confounded by age in the opposite direction, and the adjusted RR was slightly below unity. In addition, 26 cancer cases were notified to the Cancer Registry during emigration.

Table 4 Rate ratios (RR) of death and incident cancer in association with migration status among persons born in Norway between 1967 and 1976, according to length of follow-up

Restricting follow-up after repatriation to the first year yielded 21 deaths and 13 cancers (Table 4). The confounding role of the covariates was similar to that in the analysis of total follow-up. The excess mortality associated with repatriation was stronger than for total follow-up, with an adjusted RR of 2.03 (95% CI 1.02–4.03). Cancer incidence in the first year after repatriation was nearly doubled, but the adjusted RR was close to unity.

The analysis indicated no interaction except for heterogeneity of the repatriation effect on mortality according to parental education level. The overall mortality RR of 1.31 associated with repatriation (Table 4) was restricted to participants whose parents had secondary or lower education (adjusted RR 1.59; 95% CI 1.19–2.12). The corresponding RR was 0.55 (95% CI 0.26–1.16) in the subset whose parents had tertiary education.
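The study tested interaction with a Wald test on interaction terms in the Poisson model. As a rough, independent check using only the published stratum-specific estimates, the two RRs can be compared on the log scale, recovering each standard error from its 95% CI. This is an approximation, not the authors' test. It assumes the two strata are independent and that the CIs are symmetric on the log scale, and it inherits rounding in the published values.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def se_from_ci(lower: float, upper: float) -> float:
    """Standard error of log(RR) recovered from a 95% CI symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def heterogeneity_z(rr_a: float, ci_a: tuple[float, float],
                    rr_b: float, ci_b: tuple[float, float]) -> float:
    """z statistic for the difference between two independent log rate ratios."""
    se = math.hypot(se_from_ci(*ci_a), se_from_ci(*ci_b))
    return (math.log(rr_a) - math.log(rr_b)) / se


# Repatriation RRs for mortality by parental education, as reported in the paragraph above.
z = heterogeneity_z(1.59, (1.19, 2.12), 0.55, (0.26, 1.16))
two_sided_p = math.erfc(abs(z) / math.sqrt(2))
print(round(z, 2), round(two_sided_p, 4))
```

The sketch gives z of roughly 2.6 (two-sided p below 0.01), in line with the heterogeneity reported above.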

Summing person-time and cases within each of the four follow-up scenarios yielded similar rates across scenarios (Table 5). Mortality was lowest when emigration was disregarded and follow-up continued before, during, and after emigration (scenario 3, rate 67.4 per 100,000 person-years). Mortality was highest when all emigrants were excluded from the analysis (scenario 4, rate 70.6). The pattern for cancer incidence was similar to that for mortality, but the differences were even smaller.
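A crude scenario rate is the number of cases divided by person-time, scaled to 100,000 person-years. As a consistency check using totals reported earlier, scenario 3 counts every recorded death (5,354) and every person-year in the cohort (7,604,358 + 196,922 + 143,851). This reproduces the scenario 3 mortality rate of 67.4 after rounding:

```python
def rate_per_100k(cases: int, person_years: float) -> float:
    """Crude rate per 100,000 person-years."""
    return 100_000 * cases / person_years


# Scenario 3 ignores emigration, so all deaths and all person-years are counted.
total_person_years = 7_604_358 + 196_922 + 143_851   # before emigration + abroad + after return
scenario3_mortality = rate_per_100k(5_354, total_person_years)
print(total_person_years, round(scenario3_mortality, 1))   # 7945131 67.4
```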

Table 5 Cancer incidence and mortality rates among persons born in Norway between 1967 and 1976, according to follow-up scenario

Discussion

Mortality was moderately higher after repatriation than before emigration in this population. The excess was restricted to participants with lower-educated parents and was stronger during the first year after repatriation. Cancer incidence was similar before emigration and after repatriation in a multivariable Poisson model. The different ways of handling emigration in the four scenarios had only a marginal influence on estimated mortality and cancer incidence.

This large study was based on the total national population born in a 10-year period. Linkage was carried out between several national registries, and data were gathered prospectively. We consider the linkage to be complete and highly accurate because of the national identification number assigned to all residents. These strengths reduce the scope for several potential biases.

We consider the limited quality of some of the data to be the main limitation of this study. Data in national registries are coarse and often collected for purposes other than research, and data quality problems with proxy variables can influence results. Criteria for notification of death and incident cancer are clearly defined, and data quality in these registries is reassuring. Registration of migrations in the FD-trygd events database could be more problematic. To our knowledge, there are no unambiguous criteria for notifying emigration and repatriation, and notifications are based on self-report to the authorities. We found few obvious errors in the migration records; one example was three persons with emigration recorded after the date of death. However, there is little reason to suspect that notification errors correlate with vital status or incident cancer recorded in other registries. The 26 notifications of cancer during emigration (Table 4) are anomalous. Anecdotal evidence suggests that these were cancer patients who travelled to Norway for treatment, and that the hospitals notified the Cancer Registry. Estimates of cancer incidence during emigration, and any analysis that counts person-time abroad (scenario 3), will be biased downwards, because most cases diagnosed and treated abroad are not registered.

Selective repatriation, with a corresponding salmon bias, has been suggested but not convincingly demonstrated in immigrant studies among the Irish in the UK (Williams and Ecob [8], Davey Smith et al. [9]) and Cubans and Puerto Ricans in the USA (Abraído-Lanza et al. [10]). In immigrant studies, immigrant mortality has been compared with the population experience in the country of destination. To our knowledge, this is the first study to address this issue by comparing repatriates with the population in the country of origin. The rate differences between the migrant categories in Table 4 support the salmon bias hypothesis for mortality but not for incident cancer. There was a moderate but statistically significant mortality excess after repatriation compared with the pre-emigration reference, and this excess was markedly higher during the first year after repatriation. Selective repatriation as an explanation could be challenged, however, if the deaths were caused by conditions or lifestyle changes associated with the repatriation itself. This alternative is less plausible in this population, which consisted almost entirely of Norwegian citizens brought up in Norway. Furthermore, the majority of the 112 who died after repatriation had been migrants for short periods; only 17 had been abroad for more than 3 years. We cannot explain why the increased mortality among repatriates was restricted to participants with lower-educated parents.

The present study was designed to assess how different analytical strategies for handling emigration and repatriation in cohort studies influence observed mortality and cancer incidence. This design was not well suited to comparing pre-emigration health among those who emigrated with those who did not, because we did not have information on mortality and cancer incidence during emigration. However, the considerably lower frequency of childhood chronic disease benefits in the migrant categories suggests that these groups were healthier than the never-emigrants. The higher parental education level in the migrant groups points in the same direction, in agreement with the “brain drain” emigration in the 1946 British birth cohort (Wadsworth et al. [3]).

Although the differences across the four scenarios were marginal, the different options provide some guidance on design and analytical choices in populations with more frequent migration. The lowest rates were found in scenario 3 (ignoring emigration), where person-time during emigration was added for persons who were presumably not at risk of registration (outside the catchment area). The second-lowest rates were found for scenario 1 (standard approach), which included the presumably low-risk migrants before emigration. Scenario 2 (dynamic approach), which also included the repatriates and was therefore potentially influenced by salmon bias, yielded the second-highest rates. The highest rates were found in scenario 4, where the low-risk migrants were entirely excluded.

The impact of emigration and repatriation on estimated population rates of health outcomes depends on the size of the rate differences between emigrants, repatriates, and the source population. Emigrant mortality is probably lower, and repatriate mortality higher, than that of the population of origin. The net effect of migration is difficult to assess because emigration and repatriation act in opposite directions. The extent of emigration and repatriation in a population is an important determinant of the potential effect on population mortality. The marginal rate differences across the four scenarios are probably due to the rather limited migration in this study.
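The dependence on the extent of migration can be made explicit. Scenario 2 adds repatriate person-time T_ret to the scenario 1 person-time T_1, so its crude rate is a person-time-weighted average of the two groups' rates. Let RR be the crude ratio of the repatriate rate to the scenario 1 rate, and f the repatriate share of scenario 2 person-time:

```latex
r_2 = \frac{T_1 r_1 + T_{\text{ret}}\, r_{\text{ret}}}{T_1 + T_{\text{ret}}}
    = r_1 \bigl[1 + f\,(\mathrm{RR} - 1)\bigr],
\qquad f = \frac{T_{\text{ret}}}{T_1 + T_{\text{ret}}},
\qquad \mathrm{RR} = \frac{r_{\text{ret}}}{r_1}.
```

For mortality, T_1 = 7,604,358 and T_ret = 143,851 person-years (the totals given under Variables), so f ≈ 0.0186. Purely for illustration, treating the adjusted RR of 1.31 as if it were the crude ratio gives r_2 ≈ 1.006 r_1, a difference well under 1%. With so small a repatriate share, even a sizeable rate difference barely moves the scenario rate.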

Neither ignoring migration events nor excluding emigrants from a study can be recommended, because migrants and the remaining population are unlikely to be comparable. In principle, the dynamic approach (scenario 2) is superior to the others because it estimates the denominator (person-time at risk) correctly. However, because of suspected salmon bias, the commonly used scenario 1 is probably a better choice than including all person-time as national residents. Scenario 1 is also simple to design and carry out when emigration data are available, whereas scenario 2 is considerably more complex, both conceptually and analytically, for persons with several migration events.