Introduction

Population research in Puerto Rico has gained salience since the mid-2000s, when low fertility rates and negative net migration combined to reduce the total population (Borjas 2008; Duany 2007; Mora et al. 2017). After Hurricane Maria made landfall in September 2017, outmigration accelerated population losses. In this context, timely delivery of population data was necessary but lacking. For instance, delays in the publication of death records forced researchers to estimate the death toll through other means, which produced divergent results (Kishore et al. 2018; Milken Institute 2018; Santos-Lozada and Howard 2018). Similarly, net migration estimates after Hurricane Maria differed depending on the dataset used and the time interval covered (Hinojosa and Meléndez 2018; Stone 2017).

Accurate and reliable population data is critical to planning for the recovery from the economic crisis that has been plaguing Puerto Rico since 2006, as well as from more recent shocks like Hurricane Maria and the 2020 earthquakes (Gluzmann et al. 2018; Hinojosa and Meléndez 2018; Van Der Elst et al. 2020). To facilitate population research in the future, this article reviews the publicly available population-representative databases for Puerto Rico and assesses their strengths and weaknesses. In the next section (“Population databases in Puerto Rico”), we outline all the databases reviewed in this article. Sections “Decennial Population Census (DPC),” “Puerto Rico Community Survey (PRCS),” “Population Estimates Program (PEP),” “Bureau of Transportation Statistics (BTS),” “Behavioral Risk Factor Surveillance System (BRFSS),” and “Puerto Rico Labor Survey (PRLS)” describe each database and its products concerning overall population, migration, health, and labor. In the “Conclusions and policy recommendations” section, we conclude with recommendations for improving access to population data.

Population databases in Puerto Rico

Currently, all publicly available databases of the population of Puerto Rico are cross-sectional, representing the population at a single point in time. Some panel datasets follow individuals over time, but they are not representative of the whole population. For instance, longitudinal administrative data on welfare participants is available from the Puerto Rico Department of Families (Cordero-Guzman 2017), and the Puerto Rico Elderly: Health Conditions study followed adults aged 60 and older in 2002–2003 (Palloni et al. 2005). Table 1 lists all the population-representative databases for Puerto Rico, along with the frequency of publication, the method of data collection, the unit of analysis, and the source. All of these databases are produced by the Census Bureau alone or in coordination with federal agencies.

Table 1 Summary of population databases in Puerto Rico

Decennial Population Census (DPC)

Population censuses were conducted periodically in Puerto Rico from 1765, when the island was under Spanish rule, until 1898, when Puerto Rico became a US territory (Vázquez 1968). In 1899, 1 year later, the US War Department enumerated the island’s population (Gauthier 2002). In 1910, the Census Bureau carried out the first decennial population count (DPC) in Puerto Rico and has continued to do so in every subsequent decade (Gauthier 2002). To date, the DPC is the only population census conducted in Puerto Rico and has the most accurate count of the total population by age, sex, ethnic origin, and household composition. It serves as the baseline for other statistical data products such as the Puerto Rico Community Survey and the Puerto Rico Labor Survey, both discussed below.

From 1910 onward, the DPC questionnaire administered in Puerto Rico was the same as that administered in the USA, with a few exceptions. In 1960, a question on citizenship was added to the questionnaires administered in Puerto Rico and New York but not in any other state (Gauthier 2002). Another exception was the exclusion of the race variable from the Puerto Rico questionnaire from 1960 to 1990. The race variable was reintroduced in the 2000 and 2010 DPC for Puerto Rico and asked respondents to classify household members according to the racial categories used by the US Office of Management and Budget (OMB): (1) White; (2) Black, African American, or Negro; (3) American Indian or Alaska Native; (4) Asian; and (5) Native Hawaiian or other Pacific Islander. As in the questionnaire administered in the USA, there was a separate question asking whether the respondents considered themselves Hispanic or Latino. In the case of Puerto Rico, 99% of the population identified as Hispanic or Latino in the 2000 DPC and the 2010 DPC. The OMB racial categories have been criticized as being inappropriate for Puerto Rico, where mixed racial categories, such as trigueño or indio, prevail (Godreau et al. 2010; Gravlee 2005; Vargas-Ramos 2005). Using race and ethnicity categories that do not align with the local context makes it difficult to measure social differences and inequalities. This concern applies to all the datasets described in this article except for passenger data from the Bureau of Transportation Statistics, the only dataset reviewed that consists of administrative records with no personally identifying information.

From 1970 to 2000, the DPC included a long form sample, a subsample of households that received additional questions measuring the economic, social, and housing characteristics of the population. In 2005, the long form was replaced by the annual American or Puerto Rico Community Survey (discussed in the “Puerto Rico Community Survey (PRCS)” section). The 2010 DPC also made other changes. First, questionnaires were now printed in both English and Spanish, allowing Spanish speakers to respond without having to request a Spanish version. Second, two questions were added to improve the count of household residents (Population Reference Bureau 2009). The first question prompted respondents to report types of people who were often overlooked by asking: “Were there any additional people staying here April 1, 2010, that you did not include in Question 1?” The second question prompted respondents to report on whether some residents might be reported in a different housing unit by asking: “Does Person X sometimes live or stay somewhere else?” These changes to the DPC questionnaire increased the participation rate in Puerto Rico, although, at 54% in 2010, the mail participation rate was much lower than the 74% reached in the USA (Santana-Ortiz and Druetto 2012).

An accurate DPC is of utmost importance for Puerto Rico, especially because the fiscal crisis and shocks from hurricanes, earthquakes, and other events have produced rapid changes in population dynamics. For example, the PRCS estimated a growing population during the period 2006–2009, reaching 3,967,288 in 2009. However, in 2010, the DPC reported that the population declined for the first time in the history of Puerto Rico, falling from 3.8 million in 2000 to 3.7 million in 2010.

The Census Bureau, along with the Puerto Rico Institute of Statistics, has been implementing some changes to improve the accuracy of and participation in the 2020 DPC. For instance, through the Local Update of Census Addresses and the Participant Statistical Areas Programs, the Census Bureau involved municipal governments across Puerto Rico in updating the archive of addresses (Puerto Rico Institute of Statistics 2019). Inputs from the postal service are also now being used to improve the Census Bureau’s Master Address File. These initiatives, as well as aggressive media coverage to enhance participation, are expected to improve the 2020 DPC.

Puerto Rico Community Survey (PRCS)

As already noted, the American Community Survey (ACS) replaced the long form questionnaire administered to a subsample of all households as part of the decennial census. The Puerto Rico Community Survey (PRCS) questionnaire is equivalent to that of the ACS, with minor adjustments to the local context in eight questions (Appendix 1). The PRCS collects information on a wide range of topics (Table 2). The PRCS data is available on interactive platforms such as data.census.gov, in the Application Programming Interface of the Census Bureau, in Summary Files (text files showing detailed tables), and in the DataFerrett interface, which allows the creation of interactive tables. A sample of record-level microdata is also available in Public Use Microdata Sample (PUMS) files. Such microdata can also be accessed in the Integrated PUMS (https://usa.ipums.org/usa/) where variables have been harmonized for comparability across the years. Because of the differences between the USA and Puerto Rico, some variables are not available for Puerto Rico in the Integrated PUMS, including metropolitan status; housing unit plumbing facilities; subfamily type; socioeconomic index of Hauser and Warren; occupational prestige scores; and occupational status scores.

Table 2 Subjects in the PRCS, 2017

The PRCS began surveying household residents in 2005, with group quarters samples added in 2006. With a sample size close to 36,000 addresses, the PRCS constitutes the largest annual population survey in Puerto Rico. The sampling frame of the PRCS is the Master Address File (MAF), which contains a list of residential, group quarters, and commercial addresses. The US mainland MAF is updated every year based on information provided by the US Postal Service, but the Puerto Rico MAF is not. The Puerto Rico MAF was only recently updated through field operations in cooperation with municipal governments (Census Bureau 2018c). Both the ACS and PRCS are administered by mail, computer-assisted telephone interviews (CATI) (until 2019), telephone questionnaire assistance, and computer-assisted personal interviews (Census Bureau 2018c). In addition, the ACS administers Internet-assisted interviews (Census Bureau 2019). Sample stratification procedures differ for the Puerto Rico context as well. PRCS response rates in the period 2005–2016 averaged 96.2%, which is very similar to the ACS (96.5%). In 2017, the year of Hurricane Maria, the PRCS response rate declined to 81.2%, and it recovered only to 91.5% in 2018, the year after the hurricane.

Statistics from the PRCS are aggregated to protect the privacy of residents of small geographic areas. From 2008 to 2012, the Census Bureau published statistics for the whole of Puerto Rico for 3-year periods, and since 2013, it has published statistics in 1- and 5-year periods. The larger samples in the 5-year datasets allow for detailed statistics for all municipalities, “barrios” (Spanish name for county subdivisions), tracts, and block groups. Only statistics for counties with more than 65,000 residents are available in the 1-year estimates, including Arecibo, Bayamon, Caguas, Carolina, Guaynabo, Mayaguez, Ponce, San Juan, Toa Alta, and Toa Baja.

The PRCS is presently the only population database in Puerto Rico that permits estimation of municipal-level statistics with a low margin of error. However, for many block groups, the margin of error is as high as the estimate itself, reducing confidence in estimates for small-area geography. This limitation hinders the study of neighborhood-level phenomena (Napierala and Denton 2017). For example, according to the PRCS (2013–2017), Guaynabo has one of the lowest poverty rates in Puerto Rico, with only 27% (90% CI: 25.3–28.5) living in poverty. However, it also has very poor communities such as the district of Guaraguao, where the poverty rate was 47.2% (90% CI: 37.0–59.4). This limitation also has implications for policy development. For instance, Act 1 of 2001 defined a public policy that sought to improve the social development of poor (“special”) communities, and a socioeconomic profile was created for these neighborhoods in 2003. Researchers would like to update the profile of “Special Communities” in Puerto Rico to evaluate the changes in the material well-being of these population segments, but this is not possible with the PRCS given its high margin of error at the block group level. Nevertheless, the PRCS still has the second-highest quality block group estimates available in Puerto Rico after the DPC and allows researchers to combine block group data to create their own custom areas.
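The reliability comparison above can be made concrete with a short calculation. The following is a minimal sketch, assuming the published intervals are 90% confidence intervals built with a z-score of 1.645 (the convention in Census Bureau guidance for ACS/PRCS data). The function names are hypothetical, and the root-sum-of-squares rule for combined areas is an approximation that applies to sums of counts and ignores covariance between the estimates.

```python
import math

Z_90 = 1.645  # z-score assumed for 90% confidence intervals


def reliability_from_ci(estimate, lower, upper):
    """Return (margin of error, standard error, coefficient of variation)."""
    moe = (upper - lower) / 2.0  # half-width of the published interval
    se = moe / Z_90              # implied standard error
    cv = se / estimate           # sampling error relative to the estimate
    return moe, se, cv


def combined_moe(moes):
    """Approximate MOE for a sum of count estimates (root sum of squares)."""
    return math.sqrt(sum(m ** 2 for m in moes))


# Poverty rates quoted in the text (PRCS 2013-2017), in percent.
guaynabo = reliability_from_ci(27.0, 25.3, 28.5)
guaraguao = reliability_from_ci(47.2, 37.0, 59.4)

print("Guaynabo  MOE=%.1f  CV=%.3f" % (guaynabo[0], guaynabo[2]))
print("Guaraguao MOE=%.1f  CV=%.3f" % (guaraguao[0], guaraguao[2]))
```

Under these assumptions, the coefficient of variation for Guaraguao is several times that of the municipality as a whole, which illustrates why small-area estimates call for caution. `combined_moe` shows how the margin of error might be approximated when block group counts are summed into a custom area.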

However, the Census Bureau cautions that the PRCS is not an official population count. The Census Bureau (2010) states: “The American Community Survey is not the official source of population counts. The official population count—including population by age, sex, race and Hispanic origin—comes from the once-a-decade census, supplemented by annual population estimates (the Population Estimates Program). American Community Survey data are designed to show the characteristics of the nation’s population and should not be used as actual population counts or housing totals for the nation, states or counties.” Therefore, the Census Bureau revises the previous years’ population estimates, as discussed below, but the PRCS does not change its statistics.

Population Estimates Program (PEP)

To observe population changes between DPCs, the Population Estimates Program produces intercensal population estimates. These annual population estimates use the demographic balancing equation (Eq. 1) and combine birth and death data from administrative records of the Puerto Rico Health Department and the Puerto Rico Institute of Statistics, baseline population from the most recent DPC, and migration data from both the PRCS and the ACS.

Internal migration within Puerto Rico is estimated through the residual method (Census Bureau 2018a). Net migration between the USA and Puerto Rico is measured by subtracting the estimate of out-migrants from the PRCS from the estimate of Puerto Rican in-migrants from the ACS. These migration estimates are derived from the ACS/PRCS question on place of residence 1 year ago, by comparing the respondent’s current and previous place of residence.

In particular, the population estimates are calculated as

$$ {P}_T={C}_n+\sum \limits_{t=n+1}^T\left({B}_t-{D}_t+{M}_t\right) $$
(1)

where P is the population estimate at the current period T, C is the DPC count in the latest census year n, B is births, D is deaths, and M is net migration (in-migrants minus out-migrants, so that M is negative when outmigration dominates), each summed over the years t since the census. The PEP methodology considers the distribution of age, sex, race, and Hispanic origin in constructing the estimates (Census Bureau 2018a).
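As an illustration of Eq. 1, the sketch below accumulates the yearly components of change on top of a census base. It is not the Census Bureau's implementation: the function name and all input numbers are hypothetical, and net migration is taken as in-migrants minus out-migrants so that the plus sign in Eq. 1 holds.

```python
def population_estimate(census_base, components):
    """Apply Eq. 1: P_T = C_n + sum over t of (B_t - D_t + M_t).

    components: iterable of (births, deaths, net_migration) tuples, one per
    year since the census; net_migration is in-migrants minus out-migrants.
    """
    total = census_base
    for births, deaths, net_migration in components:
        total += births - deaths + net_migration
    return total


# Hypothetical numbers for illustration only; these are not PEP figures.
base = 1_000_000
yearly = [
    (10_000, 9_000, -15_000),  # year n+1: natural increase, net outmigration
    (9_500, 9_200, -20_000),   # year n+2
]
estimate = population_estimate(base, yearly)
print(estimate)
```

With these made-up inputs, natural increase is positive in both years, but the estimate still falls because net migration is more negative than natural increase is positive.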

In sum, every year the PEP publishes the components of population change for Puerto Rico—overall and by age group and sex—and its municipalities for the previous vintage year (i.e., July 1 to June 30), which coincides with the fiscal year in Puerto Rico. The components of population change are shown in Table 3. The Bureau corrects previous estimates in the PEP every year with updated information from vital records, which usually have 2-year lags, and the ACS/PRCS, which lags behind the fiscal year used in the PEP.

Table 3 Components of intercensal population change, Puerto Rico 2011–2019

Table 3 shows the annual net migration figures from 2011 to 2019. Negative net migration in nearly every year since 2011 has contributed to Puerto Rico’s population loss. The 2018 estimates show the largest negative net migration and the largest population loss, presumably due to Hurricane Maria’s destruction of housing and infrastructure on the island. Notably, negative net migration between 2017 and 2018 grew in magnitude by a staggering 58% with respect to the previous year. However, the December 2018 methodology for estimating migration differs somewhat from that of previous years. Because the Census Bureau suspended PRCS data collection in October, November, and December of 2017, the PRCS data was combined with post-Maria data from the Bureau of Transportation Statistics to estimate net migration (Census Bureau 2018b).

Before the Population Estimates Program released its 2018 estimates, researchers estimated net migration from other data sources. Echenique and Melgar (2018) used mobile phone call data and found that between October 2017 and February 2018, 407,465 individuals left the island and 359,813 returned, resulting in a net outmigration of 47,652 inhabitants. Similarly, Alexander et al. (2019) utilized data from Facebook participants, whose locations can be identified, to estimate a 17% increase in outmigration in the 3 months after Hurricane Maria relative to their counterfactual migration levels (i.e., migration of their control group). They also found evidence of increased return migration to Puerto Rico from January to March 2018. Hinojosa and Meléndez (2018) estimated that 159,415 individuals migrated after Hurricane Maria by using changes in public school enrollment in Puerto Rico between the academic years 2017 and 2018.

The PEP finds that Puerto Rico lost approximately 527,831 inhabitants from July 2010 to June 2019: a loss of 13%. All of these estimates agree that there was a sizable population loss, although they differ in the magnitude of loss.

Bureau of Transportation Statistics (BTS)

Since 1990, the Bureau of Transportation Statistics has collected data about domestic and international airline passengers and freight from air carriers throughout the USA and its territories. Airline passenger data from the BTS is used to estimate net migration flows by calculating net passenger movement as the difference between the inbound and outbound passengers at airports in Puerto Rico. This is a good estimate of mobility between Puerto Rico and the USA since most people travel to and from Puerto Rico by air and most out-migrants select the USA as their destination (Dávila 2018). Even though air travel does not strictly qualify as migration, since we do not know passengers’ migration status, the net balance of passengers has been used to approximate the outmigration patterns from Puerto Rico. For instance, Rayer (2018) uses the net balance of passengers to Florida (outbound passengers from San Juan to Florida minus inbound passengers from Florida to San Juan) to estimate the hurricane-induced outmigration to Florida.

Figure 1 shows the net passenger movements between July 2017 and June 2018, the period that includes Hurricane Maria (September 2017). After Hurricane Maria, the net movement of individuals from Puerto Rico increased strongly, but the direction was reversed between December 2017 and March 2018. Nevertheless, there was a significant net outmigration: the net passenger movement from July 2017 to June 2018 was 166,316, more than twice the 63,508 net passenger flow from July 2016 to June 2017. These figures are close to the Population Estimates Program’s figures in Table 3 (131,932 and 81,386, respectively).
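The net passenger calculation described above can be sketched in a few lines. This is a minimal illustration rather than the BTS or PEP procedure: the function names and the monthly counts are hypothetical, and only the two annual totals are taken from the text.

```python
def net_outflow(outbound, inbound):
    """Net passenger movement out of Puerto Rico for one period."""
    return outbound - inbound


def cumulative_net_outflow(monthly):
    """Running total of net outflow over a list of (outbound, inbound) pairs."""
    running, series = 0, []
    for outbound, inbound in monthly:
        running += net_outflow(outbound, inbound)
        series.append(running)
    return series


# Hypothetical monthly passenger counts (outbound, inbound); not BTS data.
monthly_counts = [
    (400_000, 390_000),  # modest net outflow
    (450_000, 380_000),  # sharp net outflow, e.g. after a shock
    (350_000, 370_000),  # net inflow: some travelers return
]
series = cumulative_net_outflow(monthly_counts)

# Annual totals quoted in the text (July-June periods).
ratio = 166_316 / 63_508
print(series, round(ratio, 2))
```

The running total can fall in months with net inflow, as in the return flows described above, while the total for the period remains a net outflow.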

Fig. 1 Net passenger movements, July 2017-June 2018

Data collected by the Puerto Rico Port Authority from 1950 through 1990 extends the BTS data series further back in time (Fig. 2). The Puerto Rico Port Authority gathered data on freight and passenger movements from 10 airports and seaports in Puerto Rico to prepare the “Report on Freight and Passenger” (Puerto Rico Institute of Statistics 2011). The only difference between the BTS and the Puerto Rico Port Authority datasets is that the former does not include data from two airports: Eugenio de Hostos in Mayaguez (except for years 2001 and 2002) and Antonio Nery Juarbe in Arecibo. The time series for 1950 to 2019 (Fig. 2) shows that net migration, while usually negative, has not always been high. Net migration was minimal between 1963 and 1981, but the trend line stayed below 0 thereafter and losses grew after 2006, reaching a nadir in 2017. Figure 2 also compares the data from the BTS and the PRCS for the period 2005–2019, which show a consistent pattern of growing negative net migration.

Fig. 2 Net passenger movements per year, 1950-2019

Behavioral Risk Factor Surveillance System (BRFSS)

The Behavioral Risk Factor Surveillance System (BRFSS) is a system of health-related telephone surveys that collect state-level data about residents of states and territories regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. It is sponsored by the Centers for Disease Control and Prevention and other federal and state agencies and has been conducted in Puerto Rico since 1996 with the sponsorship of the Puerto Rico Department of Health. The BRFSS is the largest ongoing annual telephone health survey in Puerto Rico. An estimated 95.5% of households in Puerto Rico have telephone service (CDC 2018a). As such, it is a key source of information for health professionals and researchers.

The BRFSS telephone survey consists of a 15- to 20-min telephone interview. In 2018, the sample consisted of approximately 669 households with landline telephones and 4367 cell phone interviews, for a total sample size of 5036 (CDC 2018b). The sample design segments the island into eight clusters corresponding to the eight epidemiological regions defined by the Puerto Rico Health Department. Within each cluster, a probability sample is drawn so that all households with telephones have a known, nonzero chance of inclusion.

In 2018, the “cooperation” rate (i.e., the number of complete and partial interviews divided by the number of contacted and eligible participants) for the Puerto Rico BRFSS was 81.7%, higher than the median (73%) in the USA (CDC 2018b). To correct for biases due to non-response and coverage, the data is weighted using age and gender characteristics representative of the Puerto Rican population, based on population controls from consulting firms such as the Nielsen Company and from the PRCS (BRFSS 2018). It takes the BRFSS staff a full year to complete all phone interviews and an additional 6 months to organize and weight the data before making it available to the public.
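To make the cooperation rate and the idea of weighting to population controls concrete, the sketch below uses simple cell-based post-stratification. This is a deliberate simplification and should not be read as the CDC's actual weighting procedure. The function names, the age-by-gender cells, and all counts are hypothetical.

```python
def cooperation_rate(complete, partial, contacted_eligible):
    """Complete and partial interviews divided by contacted, eligible persons."""
    return (complete + partial) / contacted_eligible


def poststratification_weights(sample_counts, population_counts):
    """Weight for each cell = population control / number of respondents."""
    return {
        cell: population_counts[cell] / n
        for cell, n in sample_counts.items()
    }


# Hypothetical dispositions chosen to reproduce a rate of 81.7%.
rate = cooperation_rate(complete=780, partial=37, contacted_eligible=1000)

# Hypothetical respondents and population controls by (gender, age group).
sample = {("F", "18-44"): 50, ("F", "45+"): 70, ("M", "18-44"): 30, ("M", "45+"): 50}
controls = {("F", "18-44"): 6000, ("F", "45+"): 7000, ("M", "18-44"): 6000, ("M", "45+"): 6000}
weights = poststratification_weights(sample, controls)

# Weighted respondents add up to the population controls in every cell.
weighted_total = sum(weights[cell] * n for cell, n in sample.items())
print(round(rate, 3), weights[("M", "18-44")], weighted_total)
```

In this made-up example, younger men are under-represented among respondents, so each of them receives a larger weight than respondents in the other cells.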

The BRFSS questionnaire includes basic and optional modules. The basic module offers information on socioeconomic status, demographics, health status, healthy days/health-related quality of life, healthcare access, exercise, inadequate sleep, chronic health conditions, oral health, tobacco use, e-cigarettes, alcohol consumption, immunization, falls, seatbelt use, drinking and driving, breast and cervical cancer screening, prostate cancer screening, colorectal cancer screening, and HIV/AIDS knowledge, among other variables. The optional modules are sets of specific questions on particular topics. There are optional modules on prediabetes, diabetes, healthcare access, cognitive decline, caregiver, e-cigarettes, marijuana use, sleep disorder, depression and anxiety, respiratory health, indoor tanning, excess sun exposure, lung cancer screening, cancer survivorship, prostate cancer screening decision making, clinical breast exam, adult human papillomavirus vaccination, tetanus, diphtheria, shingles, industry and occupation, sexual orientation and gender identity, random child selection, childhood asthma prevalence, asthma call-back permission script, reproductive health call-back permission script, and state-added questions. The optional modules administered vary from year to year and from state to state. In 2018, the BRFSS in Puerto Rico included modules on childhood asthma prevalence, diabetes, marijuana use, prediabetes, random child selection, and respiratory health (CDC 2018c).

The BRFSS is mainly used for state-level statistics and has limited use at the municipality level due to large margins of error. This imposes some constraints on analyzing phenomena that may affect some municipalities more than others, let alone studying morbidity at the block level. However, regional analysis can be done by studying the aforementioned eight strata or by aggregating a number of counties to construct, for instance, a metropolitan statistical area.

Another limitation, similar to other federal population surveys, is that the BRFSS questionnaire uses racial categories that are determined by the OMB, which may not be relevant for Puerto Rico or other territories. The OMB categories do not align with those used locally, as explained above. Furthermore, the survey asks if a Hispanic person is Mexican, Puerto Rican, Cuban or Other, but does not offer “Dominican” as an option, limiting the information that can be obtained about this ethnic group, which represents the largest group of immigrants in Puerto Rico. Such problems limit research on health disparities in Puerto Rico.

An experiment conducted in the 2016 BRFSS tested an alternative to the OMB categories for measuring racial differences in Puerto Rico. The 2016 BRFSS added the following question to the standard questionnaire: “Using a scale where 1 is the lightest color and 6 is the darkest color, how would you describe your skin color?” The question produced a rather different racial spectrum than the one obtained with OMB categories. In Table 4, we show that among those who marked White in the OMB category, just 26.2% considered themselves to have a very light skin color. Among all respondents regardless of OMB category, the preferred skin color category (33.6%) for participants was in the middle of the distribution (number 3). Only one person in the dataset marked “American Indian or Alaskan Native Only” and two were reported as “Asian.” This demonstrates that the OMB categories are of limited value for studying racial health inequities in Puerto Rico.

Table 4 Cross tabulations of OMB racial categories and skin color, 2016

Puerto Rico Labor Survey (PRLS)

The Puerto Rico Labor Survey (PRLS) is a monthly household survey administered by the Puerto Rico Labor Department with the oversight of the US Bureau of Labor Statistics (BLS). The PRLS is equivalent to the Current Population Survey (CPS), a survey the US Census Bureau conducts on behalf of the BLS. The PRLS measures the extent of unemployment and other labor-related indicators along with personal characteristics such as age, sex, and educational attainment. It has a sample size of about 3500 households. The most recent DPC is used as the sampling frame for the PRLS, dividing Puerto Rico into six clusters. Each cluster is stratified by income and population size, and the samples are selected in three stages: first, groups of blocks are selected; then, blocks within the selected groups are chosen; and in the third stage, segments are obtained from the selected blocks. This multistage sampling design is proportional to size. Finally, the sample is divided into eight subsamples in which households are interviewed for 4 months, “rest” for 8 months, reenter the sample for another 4 months, and then exit the sample. The purpose of the household rotation is to follow households over time without imposing a heavy burden on respondents. Similar to the CPS, PRLS interviews are conducted either in person or by telephone, and its statistics are weighted with population controls obtained from the Census Bureau, which for Puerto Rico come from the PRCS. The questionnaire follows the guidelines provided by the International Labor Organization and the CPS.
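The rotation scheme described above (4 months in, 8 months out, 4 months in) can be sketched as follows. This is an illustrative reading of the text, assuming that a new rotation group enters every month. The function names are hypothetical and do not come from PRLS documentation.

```python
def in_sample(months_since_entry):
    """True if a household is interviewed in this month (0 = first interview)."""
    return 0 <= months_since_entry < 4 or 12 <= months_since_entry < 16


def active_groups(month):
    """Entry months of the rotation groups interviewed in a given month."""
    return [month - m for m in range(16) if in_sample(m)]


# Interview months for a single household over its 16-month span.
schedule = [m for m in range(18) if in_sample(m)]

# Rotation groups in the field during month 20 (identified by entry month).
groups = active_groups(20)
print(schedule)
print(groups)
```

Under this assumption, eight rotation groups are in the field in any given month, which is consistent with the eight subsamples mentioned in the text. Each household contributes eight interviews in total.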

The main purpose of the PRLS is to estimate the size of the labor force and the rates of employment and unemployment. Figure 3 shows change in the size of the civilian population 16 years of age and older in Puerto Rico, in other words, residents who were eligible for employment. The employment eligible population grew steadily until 2010 when it began to decline, presumably due to the negative net migration that was observed to increase around the same time.

Fig. 3 Unemployment Rate and Civilian Population 16 and older, 1990-2019

Conclusions and policy recommendations

This review of population databases for Puerto Rico is intended to aid population scientists in identifying high-quality population representative databases. The databases reviewed here are managed or supervised by the federal government with data quality similar to that of their counterparts in the USA. Each of these databases has a particular use, whether it is to measure change in the size and composition of the population (DPC, PRCS, and PEP), population mobility (BTS), population health (BRFSS), or labor force participation (PRLS). While these databases are a tremendous resource for research on Puerto Rico, their comparability with their US counterparts creates some limitations by neglecting the local context of Puerto Rico.

To address these limitations, the federal government could enhance its databases on Puerto Rico. For instance, there are longitudinal surveys on labor and health that the federal government conducts in the USA, such as the National Health Interview Survey, the Survey of Income and Program Participation, the American Housing Survey, and the National Survey of Children’s Health, but that are not implemented in Puerto Rico; these would greatly complement the data now available. In this regard, the Puerto Rico Institute of Statistics (2017: 5) states: “Just last year, the exclusion of Puerto Rico from the American Housing Survey meant federal policymakers could not estimate how many houses in Puerto Rico had air conditioning, which impaired their ability to determine where and how quickly the Zika virus would spread in Puerto Rico.” In addition, the DPC and PRLS continue to collect race and ethnicity data using questions that are inappropriate to the sociocultural context of Puerto Rico, thereby limiting analyses of sociodemographic inequality in Puerto Rico. Revision of these questions for the Puerto Rico context, and the addition of a skin color variable in surveys, would enhance population science in Puerto Rico.

Despite these challenges, there are critical demographic issues that can be addressed with the population data available in Puerto Rico. First, changes in the size and composition of Puerto Rico’s population that are associated with the economic crisis, the 2017 hurricane season, and more recent shocks can be better documented to investigate which segments of the population were most affected. Second, since the early 2000s the total fertility rate has dropped below the replacement level of 2.1, reaching a low of 1.04 in 2019, according to the World Bank (Stone 2017; World Bank ND), but to our knowledge there are no recent peer-reviewed journal articles on this topic. Third, Puerto Rico’s population is aging rapidly due to the combination of negative net migration and low fertility, requiring more investigation to understand the causes and consequences of this dynamic. We encourage population scientists to study Puerto Rico’s population dynamics using the sources of population data described here.