Introduction

In December 2019, an outbreak of pneumonia of unknown etiology emerged in Wuhan, China [1]. The causative agent was later identified as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and the disease was named coronavirus disease 2019 (COVID-19). SARS-CoV-2 has demonstrated a high degree of infectivity, and cases of COVID-19 have been diagnosed in all countries in the world [2]. As of September 20, 2020, over 8.4 million cases of COVID-19 had been diagnosed in the USA, with about 222,900 deaths [2].

To halt the rapid spread of COVID-19, local and national governments instituted varying levels of restriction on the movement of people, a strategy that was effective in limiting the spread of COVID-19 in China [3]. As in other hard-hit states in the USA, following a rapid upsurge in the number of COVID-19 cases, the governor of Michigan issued a stay-at-home order which took effect on Tuesday, March 23, 2020. This led to the closure of schools and all non-essential businesses; only services categorized as essential were allowed to operate. Services classified as essential included healthcare, security services, gas stations, grocery stores, pharmacies, and postal services. Workers in non-essential services, as well as workers in essential services who could work remotely, worked from home. On account of this work structure, low-skilled workers in the essential service industry, a population that is disproportionately African American, continued to commute to work during the state-mandated shutdown. Data across the USA have shown disparities in COVID-19 hospitalization and deaths, with African Americans or Blacks having disproportionate rates of hospitalization and death [4, 5]. Michigan, with 51,915 confirmed cases and 4,915 deaths (as of May 18, 2020), has one of the highest numbers of confirmed COVID-19 cases in the USA [2]. Michigan was also one of the first states in the USA to release COVID-19 data by race, and the reports showed a disproportionate burden of disease among Blacks [4, 6]. Racial disparities in COVID-19 incidence and mortality are now widely reported in other states in the USA and in other nations [7, 8].

While the reasons for these disparities are likely to be multifactorial [9], we posit that underlying disparities in social determinants of health (SDoH), which are well-known sources of inequalities in disease burden, are the major driver of the disproportionate COVID-19 incidence and mortality in Blacks and other minority ethnic groups in the USA [6, 10, 11]. Using publicly available data, we investigated the relationship between the percentage of Blacks in each zip code and COVID-19 incidence and case fatality, taking into account the components of SDoH. We also examined changes in COVID-19 incidence and case fatality over time during the state-mandated lockdown.

Methods

Study Design and Data Sources

We conducted an ecologic analysis using data from the Oakland County, MI, COVID-19 website (https://www.oakgov.com/covid/casesByZip.html). Oakland County is the second-most populous county in Michigan, with a population of 1,257,584 [12]. Based on the 2010 census demographic data, 77.3% of residents were White, 13.6% Black or African American, 5.6% Asian, 0.3% Native American, 1.0% of some other race, and 2.2% of two or more races [12]. At the inception of the study, Oakland was the only county in Michigan that published COVID-19 data by zip code. Serially updated data on COVID-19 cases in each of the 70 zip codes were obtained from April 3, 2020, through May 16, 2020. Ecologic data on SDoH by zip code were obtained from the US Census report from the 2018 American Community Survey, 5-year estimates [13]. In line with the policies of the Northwestern University Institutional Review Board (IRB), IRB approval was not sought because we used publicly available, de-identified data.

Conceptual Framework

The conceptual framework for the study is summarized in Fig. 1. We mapped the domains of the SDoH [14, 15] onto the constructs of the susceptible-exposed-infectious-recovered (SEIR) model [16]. SEIR is a widely used mathematical model in infectious disease epidemiology [17] in which individuals transition sequentially through the four phases.
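As a point of reference for the SEIR structure, the following minimal Python sketch steps a deterministic, closed-population SEIR model forward with a simple Euler scheme. It is illustrative only and is not part of the analysis reported in this study; the function name `simulate_seir` and all parameter values (`beta`, `sigma`, `gamma`, and the initial compartment sizes) are hypothetical.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Euler integration of a closed-population SEIR model.

    beta  : transmission rate per day (S -> E)   [hypothetical value]
    sigma : 1 / mean latent period (E -> I)      [hypothetical value]
    gamma : 1 / mean infectious period (I -> R)  [hypothetical value]
    Returns a list of (time, S, E, I, R) tuples.
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r  # total population, constant in this sketch
    history = [(0.0, s, e, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        new_exposed = beta * s * i / n * dt   # susceptible -> exposed
        new_infectious = sigma * e * dt       # exposed -> infectious
        new_recovered = gamma * i * dt        # infectious -> recovered
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((step * dt, s, e, i, r))
    return history


# Illustrative run: 10,000 people, 10 initially infectious, 60 days.
trajectory = simulate_seir(beta=0.5, sigma=1 / 5, gamma=1 / 7,
                           s0=9_990, e0=0, i0=10, r0=0, days=60)
t_end, s_end, e_end, i_end, r_end = trajectory[-1]
print(f"day {t_end:.0f}: S={s_end:.0f} E={e_end:.0f} I={i_end:.0f} R={r_end:.0f}")
```

In a framework of this kind, community-level social determinants would plausibly act by shifting the transition rates, for example a larger `beta` in communities where fewer jobs can be done remotely.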

Fig. 1

Conceptual framework of the role of social determinants of health in infectious diseases epidemiology. (Note: There is currently no evidence that race/ethnicity increases biological susceptibility to COVID-19.)

Observed transmission and deaths due to COVID-19 can be modeled as individuals moving through the phases of the SEIR model, with different drivers for each transition. For example, the type of job a person holds may increase the probability of transitioning from susceptible to exposed but may have less influence on the likelihood of successful recovery. Moving to a community-level perspective, we hypothesize that the community’s context and resources also affect this SEIR process. Living in crowded households may facilitate transmission of respiratory disease [18]. Neighborhood characteristics may also influence health behavior and access to healthcare [19]. Racial composition is known to be associated with neighborhood resources, which in turn may affect a person’s chances of successful recovery and, at the public health level, testing availability in the community [20, 21]. In this paper, we focused on the key domains of the SDoH: race, economic stability, neighborhood and built environment, health and healthcare, education, and social and community context.

Study Variables

Our outcomes were the crude and adjusted cumulative incidence of COVID-19 and the COVID-19 case fatality rate (CFR). Cumulative incidence is the cumulative number of cases up to a point in time divided by the population of a zip code (with 10,000 as a multiplier), while CFR is the cumulative number of deaths divided by the cumulative number of cases for a zip code at a point in time (with 100 as a multiplier). Our independent variable was the percentage of Blacks in each zip code. Covariates included (1) the median age of the population and four surrogate markers of SDoH: (2) the proportion of the population living below the poverty level (economic stability), (3) the proportion of the population with a bachelor’s degree or higher (education), (4) the median number of people per home (neighborhood and built environment), and (5) the proportion of people who drove alone to work (neighborhood and built environment).
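The two outcome definitions can be written out as a short Python sketch. The function names and the example zip code are hypothetical, and the multiplier is left as a parameter; the defaults below follow the units reported in the Results (cases per 10,000 population and deaths per 100 cases).

```python
def cumulative_incidence(cases, population, per=10_000):
    """Cumulative cases per `per` residents of a zip code."""
    return per * cases / population


def case_fatality_rate(deaths, cases, per=100):
    """Cumulative deaths per `per` cumulative cases; None when there are no cases."""
    if cases == 0:
        return None  # CFR is undefined without any cases
    return per * deaths / cases


# Hypothetical zip code: 18,000 residents, 81 cumulative cases, 7 deaths.
incidence = cumulative_incidence(cases=81, population=18_000)
cfr = case_fatality_rate(deaths=7, cases=81)
print(f"incidence = {incidence:.1f} per 10,000; CFR = {cfr:.1f} per 100 cases")
```

Returning `None` when a zip code has no cases reflects the fact that CFR cannot be computed there, which is why zip codes without reported cases cannot contribute to a CFR analysis.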

Statistical Analysis

Negative binomial regression models were used to investigate the association between the percentage of Blacks in each zip code and our two outcomes as of May 16, 2020. Negative binomial regression was chosen because the count data were overdispersed. We started with a full model that included the percentage of Blacks and the five covariates. We noted high collinearity in the full model (variance inflation factor (VIF) > 5 for poverty rate, education rate, and median income), so we developed two additional reduced models: one in which the three variables were combined into a single score using principal component analysis (PCA), and another in which we used only one of the three variables. We used likelihood ratio tests and the Akaike information criterion (AIC) to assess which of the three variables resulted in the model with the best fit. We also assessed whether the terms in the selected reduced models had acceptable VIFs. In all three models, we then tested for significant interactions between percentage Black and the other covariates present in the model.
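The collinearity check can be illustrated with a small sketch. The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The code below is written in plain Python so that it is self-contained; it is not the code used in this study (the analyses were run in R), and the function names and all data values are hypothetical.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        tail = sum(m[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (m[k][n] - tail) / m[k][k]
    return x


def r_squared(y, columns):
    """R^2 from an ordinary least squares fit of y on `columns` plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[k] for col in columns] for k in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(design[k][a] * y[k] for k in range(n)) for a in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))
    ss_tot = sum((yk - mean_y) ** 2 for yk in y)
    return 1.0 - ss_res / ss_tot


def vif(predictors):
    """Variance inflation factor for each named predictor column."""
    result = {}
    for name, column in predictors.items():
        others = [c for other, c in predictors.items() if other != name]
        result[name] = 1.0 / (1.0 - r_squared(column, others))
    return result


# Hypothetical zip-code-level covariates (eight zip codes, made-up values).
predictors = {
    "median_income": [30, 45, 60, 75, 90, 110, 130, 160],       # $ thousands
    "pct_poverty": [38, 30, 22, 15, 12, 8, 5, 2],
    "pct_bachelors": [10, 18, 30, 35, 50, 55, 70, 76],
    "persons_per_home": [2.4, 3.0, 2.1, 2.8, 2.0, 2.9, 2.3, 2.6],
}
vifs = vif(predictors)
for name, value in vifs.items():
    print(f"{name}: VIF = {value:.1f}")
```

In this made-up data set, income, poverty, and education move together, so their VIFs are large. This is the situation that motivates either keeping only one of the correlated variables or combining them into a single score.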

Two zip codes with no reported cases of COVID-19 were excluded from our CFR analyses. As a sensitivity analysis, and to account for spatial autocorrelation, we also ran geographically weighted regression models. In an exploratory analysis, we used generalized estimating equations (GEE) to test whether the percentage of Blacks per zip code influenced the rate of change in the cumulative incidence of COVID-19 at four time points (April 3, 17, and 20, May 16). GEE is preferred over ordinary least squares regression for longitudinal data such as cases over time because it accounts for correlation across repeated measures [22]. Finally, we explored whether the association between percent Black and the outcomes was mediated through the neighborhood wealth variables (education, poverty, income, percent who drove alone to work), specified as a combined score through PCA.

In all analyses, p < 0.05 was considered statistically significant. Statistical analyses were carried out using R 4.0.2.

Results

The median cumulative incidence of COVID-19 was 44.78 per 10,000 population (IQR: 28.05 to 64.69). The median case fatality rate was 9.32 per 100 cases (IQR: 6.3 to 13.9). The percentage of Blacks in the zip codes ranged from 0 to 70%. The zip code–level median ages ranged from 30.1 to 52.4 years, and the percentage of the population with a bachelor’s degree or higher ranged from 9.2 to 77.5%. The median family income ranged from $31,460 to $166,094, and the median number of people living in each household ranged from 1.9 to 3.1. The percentage of people living below the poverty level ranged from 1.7 to 39.1% (Table 1).

Table 1 Summary statistics of the zip codes

There was a positive linear association between the percentage of Blacks per zip code and the cumulative incidence of COVID-19 by May 16, 2020. In bivariate analyses, each one-percentage-point increase in the percentage of Blacks within a zip code was associated with a 3% higher incidence of COVID-19 (IRR: 1.03, 95% CI: 1.02 to 1.04, p < 0.0001). After adjusting for all other covariates, the percentage of Blacks per zip code remained associated with a significantly higher incidence of COVID-19 (IRR: 1.02, 95% CI: 1.02 to 1.03, p < 0.0001). This association remained significant in the reduced and PCA models (Table 2, Fig. 2a) and in geographically weighted regression (Supplement Table 1).
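Because an incidence rate ratio (IRR) is multiplicative, comparing zip codes that differ by more than one percentage point compounds it. The sketch below interprets the reported unadjusted IRR as applying per one-percentage-point increase and scales it to a hypothetical 10-point difference; the function name and the choice of 10 points are illustrative assumptions.

```python
def scale_irr(irr_per_point, delta_points):
    """IRR for a `delta_points` difference, given the IRR per one percentage point.

    The model is linear on the log scale, so exp(beta * delta) equals IRR ** delta.
    """
    return irr_per_point ** delta_points


# Reported unadjusted estimate and 95% CI (per one percentage point).
estimate, lower, upper = 1.03, 1.02, 1.04
delta = 10  # hypothetical comparison: zip codes 10 percentage points apart
scaled = [scale_irr(x, delta) for x in (estimate, lower, upper)]
print("IRR for a {}-point difference: {:.2f} (95% CI {:.2f} to {:.2f})".format(delta, *scaled))
```

Under these assumptions, a 10-point difference corresponds to an IRR of about 1.34. Because the published estimates are rounded to two decimals, the scaled values are approximate.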

Table 2 Negative binomial regression models of the relationship between the percentage of the population that is Black and the COVID-19 cases in Oakland County, MI (n = 70)
Fig. 2

Estimated COVID-19 cumulative incidence (cases per 10,000 population) (a) and case fatality rate (deaths per 100 cases) (b) according to the percentage of Black population per zip code on May 16, 2020, using the reduced (best fit) model. Note: Error bars refer to 95% confidence intervals

The association of other community variables with cumulative incidence was less clear. The full model suggested that the percentage of people with a bachelor’s degree was associated with higher incidence, but this finding was not robust in the better-fitting reduced model. The reduced and PCA models also suggested that areas with more persons per household had significantly lower incidence (Table 2). We did not find significant interactions between race and the other covariates for incidence. The mediation analysis showed that the association between COVID-19 incidence and percent Black in each zip code operated through direct “effects” and was not mediated through the PCA wealth variable (Supplement Table 2).

In contrast to cumulative incidence, case fatality (deaths per 100 cases) as of May 16, 2020, was not associated with the percentage of Blacks per zip code (IRR: 0.999, 95% CI: 0.99 to 1.01, p = 0.762) (Table 3, Fig. 2b). The full model suggested that areas with more bachelor’s degree holders may have higher case fatality, but this was not significant in the reduced model or in geographically weighted regression (Supplement Table 1). The number of persons per household was associated with lower case fatality in the reduced and PCA models but not in the full model. Again, none of the tested interactions was significant. Mediation analysis for the death outcome showed no direct or mediated effects of percentage Black on case fatality (Supplement Table 2).

Table 3 Negative binomial regression models of the relationship between the percentage of the population that is Black and the COVID-19 deaths in Oakland County, MI (n = 68)

In exploratory analysis, we found that rates of increase in COVID-19 incidence across communities were significantly associated with the percentage of Blacks in the zip code (Fig. 3, Supplement Table 3). Areas with a higher percentage of Blacks had higher incidence at every time point and also showed a steeper rise in incidence, especially from the first to the second time point. Incidence across time points was also associated with the average number of people per household and with area “wealth”: a higher average number of people per household and higher wealth PCA scores were both associated with lower incidence (Supplement Table 3).

Fig. 3

Estimated cumulative incidence of COVID-19 over time according to % Black in the population. Note: Error bars refer to 95% confidence intervals

Discussion

In the immediate period following the Michigan state–mandated lockdown, we found a 2% increase in the adjusted incidence of COVID-19 for each one-percentage-point increase in Blacks per zip code in Oakland County, MI. Zip codes with a higher proportion of Black residents also exhibited faster increases in incidence rates from April to May 2020. This higher COVID-19 incidence remained even after adjusting for zip code–level age, income, number of people per household, means of transportation to work, and level of education. In addition, mediation analysis suggested direct effects rather than mediation through social determinants. It is likely that racial composition is a proxy for some unmeasured neighborhood context that increases the risk of exposure and infection. Examples of such unmeasured contextual variables are the degree of implementation of public health measures and the types of occupation held by residents.

We did not find an overall association between the percentage of Blacks in each zip code and COVID-19 case fatality. It should be noted, however, that we analyzed data collected in the early phase of the COVID-19 pandemic, so there may have been insufficient follow-up time to observe significant differences in survival. Similarly, a prospective study that evaluated the outcomes of COVID-19 in the US Veterans Affairs population early in the COVID-19 pandemic did not find an association between race and mortality [23]. Disparities in access to COVID-19 testing further limit the conclusions that can be drawn from current publicly available COVID-19 data [7, 21]. However, among hospitalized patients, there is clear evidence of increased mortality among Blacks and minority ethnic populations [4, 24]. This has been attributed to the higher prevalence of comorbid diseases such as asthma and diabetes mellitus, which increase the risk of poorer COVID-19 outcomes, among Blacks and ethnic minorities [4, 5].

The median income and the percentage of the population below the poverty level, both of which are markers of wealth or economic stability, were significantly associated with COVID-19 incidence in the unadjusted model. Zip codes with higher median income levels had a lower incidence of COVID-19, while zip codes with a higher percentage of population below the poverty level had a higher incidence of COVID-19. This is in line with previous research demonstrating that income inequality is a source of health disparities [25] including COVID-19 [26]. However, in the adjusted models (including the one where wealth variables were combined into a single score), these surrogate markers of economic stability were not independently associated with COVID-19 incidence suggesting they may be surrogate markers of other factors that increased the reported incidence of COVID-19 in the study population.

We found that a higher average number of individuals per household was associated with lower incidence and case fatality in the reduced models. This was unexpected and seems contrary to the hypothesis that crowding facilitates spread. We are aware, however, that the number of individuals per household is an imperfect surrogate marker of overcrowding, since it does not take into account the size of the home. Disparities in access to COVID-19 testing in communities of color, which tend to have higher population densities, may also confound the association between overcrowding and COVID-19 incidence rates [7], but would not necessarily explain the association between the number of individuals in the household and case fatality.

Our study, which took advantage of publicly available data, contributes to the growing literature on disparities in COVID-19 incidence and mortality [10, 24, 27, 28]. Representative individual-level data on racial differences in the incidence and outcomes of COVID-19 in the USA are still lacking, but community-level data allow us to investigate the potential impacts of race and racism on health. Using zip code–level data, we found robust evidence that community racial composition is associated with COVID-19 incidence. Similarly, using county-level data, Millett and colleagues reported disproportionately higher rates of COVID-19 cases and deaths among Black Americans after controlling for risk factors [29]. Further research is needed, however, to identify which modifiable community-level variables associated with racial composition could be targeted to reduce risk.

The findings from our study should be interpreted in the context of its limitations. We carried out an ecologic study, and inferences from population-level data do not always translate to findings at the individual level. This means that while communities with a higher proportion of Black residents have higher incidence rates, it does not necessarily follow that individual Black residents are at higher risk of infection than non-Black residents (ecological fallacy [30]). Another limitation is that, because we did not have access to individual-level data, we could not adjust for individual-level factors known to increase COVID-19 risk, such as comorbidities, access to healthcare, and stress level [4]. We were also unable to account for disparities in access to COVID-19 testing in underserved communities, which may result in under-reporting. Because our study was limited to one county in Michigan, our findings may not generalize to the larger US population. Another limitation is that surrogate markers for SDoH obtained from census tract data are not perfect markers of SDoH. Moreover, the association between race and the epidemiology of COVID-19 is complex. Finally, due to the guideline-based restriction of COVID-19 testing to specific populations, it was not feasible to obtain an accurate estimate of COVID-19 incidence.

Despite these limitations, our findings have important implications for public health policy and practice. Our analyses showed that the racial composition of zip codes was robustly associated with the incidence of COVID-19 during a state-mandated lockdown. This supports other studies, both within and outside the USA, which have demonstrated that Blacks and ethnic minorities bear a disproportionate burden of COVID-19 that is not fully explained by traditional risk factors [31]. As the COVID-19 pandemic continues to unfold in communities across the USA, it is important to identify vulnerable populations and institute policies that will reduce the adverse effects of COVID-19 in these populations. Equally important is the urgent need to institute long-term policies to improve the health of Blacks and other ethnic minorities in the USA.