Introduction

Over the last several decades, labor markets in many U.S. cities have absorbed large inflows of new immigrants. These new workers appear to be rapidly absorbed into local labor markets, as evidenced by unemployment rates very similar to those of natives (Chiswick et al. 1997). How does this happen? One hypothesis is that social networks act as a conduit between immigrant populations and jobs. Waldinger and Lichter (2003) argued that after initial immigrants establish a “beachhead” in specific occupations or geographic areas, social networks draw new immigrants to those same occupations or areas. Consistent with this, Patel and Vella (2013) found strong evidence that new immigrants to an area take up the occupations of earlier arrivals from their home country and that this pattern helps raise new immigrant earnings. Similarly, Model (1993) found that immigrants who share employment opportunities through their networks obtain employment in higher-paid occupations.

Although occupational sorting may be an important outcome of social networks, ultimately employment requires a connection to a specific firm. Thus, we expect networks to operate partly by helping new immigrants to gain employment with firms that employ others in their social network: likely other immigrants and, in particular, compatriots. Quantifying and understanding how immigrants sort into workplaces is important because it is increasingly evident that the identity of an individual’s employer has an important role in determining their economic outcomes. Estimates of the determinants of an individual’s earnings have typically found that only about 20 % of earnings variation is accounted for by observable worker characteristics, such as education and work experience. Analysis of matched employer–employee data shows that roughly one-half of variation in earnings is accounted for by differences in mean pay across firms, even after controlling for unobserved worker effects.Footnote 1 Evidence also indicates that widening differences in pay across firms account for much of the recent rise in earnings inequality (see, e.g., Barth et al. 2011; Davis and Haltiwanger 1991). These findings imply that sorting across employers is critically important in determining earnings. Thus, differences in how immigrants and natives sort across firms are likely to be important for understanding why economic outcomes for immigrants differ from those of natives.

We are not the first to address this question. Hellerstein and Neumark (2008) found substantial sorting across workplaces by race and ethnicity, and Hellerstein et al. (2011) found evidence that those living in the same neighborhoods are particularly likely to work together, although neither study looked specifically at immigrants. Portes and Wilson (1980) found that not only do Cuban immigrants in Miami work together, but many work for firms owned by Cubans; and García-Pérez (2009) found that immigrant-owned small firms are particularly likely to hire immigrants.

Our work here adds to this literature in three ways. First, we systematically quantify the relative contributions of worker, employer, and locational characteristics in explaining the extent to which immigrants work with different employers than natives do. Second, we examine how concentration varies across 18 source countries. Third, we analyze how the likelihood of working with compatriots differs from the likelihood of working with immigrants from other countries.

We find that immigrants are much more likely to have immigrant coworkers than are natives, but at the same time, only a small share work in immigrant-only workplaces. Using a multivariate framework, we find that observable worker, employer, and locational characteristics together account for at least one-half of observed concentration. We find that immigrants are particularly likely to work with compatriots, but they are also somewhat more likely to work with immigrants from other countries than are natives. Our rich set of worker, firm, and locational characteristics accounts for virtually all the excess probability that immigrants work with immigrants from other countries, but leaves most compatriot concentration unexplained. These findings suggest that although immigrants work together partly because they often have similar skill levels and work in similar jobs, unmeasured country-specific factors also play an important role. A natural interpretation of these unmeasured factors is that country-specific social networks are at work.

Background

Our work draws primarily on the literature explaining sorting of workers into firms. This literature has identified four types of sorting that may contribute to segregated workplaces: (1) based on productive characteristics of workers, (2) based on the information available to workers and employers, (3) resulting from the residential location of workers relative to business locations, and (4) resulting from preferences of workers and employers. Because we have no direct measures of tastes, our empirical analysis focuses on factors (1)–(3).Footnote 2

There is substantial evidence of segregation by skill. For example, Kremer and Maskin (1996) found a high and rising correlation between coworker skill levels in firms during the 1970s and 1980s in the United States, Britain, and France. A positive correlation in skills may occur either because a firm demands workers of a particular skill level or because coordination within a firm requires that workers share a common skill, such as speaking a particular language.Footnote 3 Skill-based sorting could lead to workplace segregation of immigrants from natives because immigrants are more likely than natives to have an eighth grade education or less but are also more likely to have an advanced degree. Therefore, employers that hire exclusively low-skilled or exclusively high-skilled workers will tend to have above-average immigrant employment shares.

If a shared language increases worker productivity, employers may choose workforces in which everyone speaks the same language. If so, immigrants from non-English-speaking countries will be particularly likely to be segregated and may also be particularly likely to work with immigrants who speak their language. Lang (1986) developed a formal model of wage differences that arise because employers must pay a premium for bilingual workers who can bridge the language barrier. His model implied that complete segregation would occur if sufficient capital were owned by each language group. Several authors have found evidence consistent with such segregation by language. Hellerstein and Neumark (2008) (hereafter, HN) found evidence that Hispanics with poor English-language skills are particularly likely to work with other Hispanics. Portes and Wilson (1980) found that Cuban immigrants in Miami work together, and many work in firms owned by other Cubans. García-Pérez (2009) also found supporting evidence that immigrant-owned small firms (mostly Hispanic- or Asian-owned) are more likely to hire immigrants than are native-owned small firms.

Information-based theories focus on mechanisms that match workers to jobs. For example, if people interact outside of work mostly with others who have similar characteristics, employer use of employee referrals and/or employee use of personal contacts to find jobs will increase workplace segregation. Holzer (1987, 1988) and Montgomery (1991) found evidence that use of referrals and personal contacts may lower the costs of finding good matches. Elliot (2001) found that recent Latino immigrants are more likely than blacks or Latino natives to use personal contacts to find jobs. Weak English skills explain much of this difference. A greater reliance on referrals in small workplaces, combined with a concentration of recent immigrants in small firms, also contributes to the difference.

Information flows may combine with residential segregation to generate workplace segregation. Immigrants’ places of residence are spatially concentrated (see, e.g., Iceland 2009), and neighbors may provide important job contacts and references. Several studies have found that those working in the same place are disproportionately from the same neighborhoods. For example, Ellis et al. (2007) and Wright et al. (2010) found strong links between the residential concentration of immigrant groups in Los Angeles and their concentration by workplace tract and industry. Using data from Boston, Bayer et al. (2008) found that a worker is about one-third more likely to work with other residents of their census block than to work with residents of other blocks in their block group.Footnote 4

Hellerstein et al. (2011) (hereafter, HNM) also presented evidence of the importance of neighborhood network effects. Using a matched employer–employee data set that they developed, they found that for whites, another worker living in the same census tract is twice as likely to work in the same establishment as would be expected under random assignment. They found particularly large effects for Hispanics with poor English language skills and Hispanics who are immigrants. We draw on their work for ways to capture the importance of network effects in determining the distribution of workers. Because our aim is to identify the importance of these effects in accounting for immigrant concentration, whereas HNM’s goal was to establish the importance of networks for labor markets more generally, our results are not directly comparable with theirs. However, given their extensive work in this area, it is worth briefly clarifying how our analysis differs from their work.

The core differences stem from our more complete data on the immigrant status of coworkers. Using the 1-in-6 decennial long-form sample, HNM matched workers’ write-in reports on place of work to employer addresses on the U.S. Census Business Register (a list of all employer establishments). They matched 29 % of long-form workers to their work location, giving them a sample of roughly 1-in-20 workers in the United States. In order to calculate the fraction of coworkers who are immigrants, immigrant status must be observed for at least two workers at an establishment. In HNM’s sample, requiring at least one coworker in the long-form data reduces a worker’s probability of inclusion from 1-in-20 to about 1-in-205 for workers at three-employee establishments, while having almost no effect on the probability of inclusion for workers at establishments with 80 or more employees.Footnote 5
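The inclusion probabilities quoted above can be reproduced with a short calculation. The sketch below is illustrative only: it assumes that every worker is sampled independently at a rate of 1-in-20, which is a simplification rather than HNM's actual matching procedure, and the function name is hypothetical.

```python
def inclusion_probability(size, rate=1 / 20):
    """Probability that a worker is sampled AND at least one coworker is sampled.

    Assumes independent sampling of every worker at the same rate
    (an illustrative simplification, not HNM's exact procedure).
    """
    coworkers = size - 1
    p_any_coworker = 1 - (1 - rate) ** coworkers
    return rate * p_any_coworker


for size in (3, 80):
    p = inclusion_probability(size)
    print(f"{size:>3} employees: about 1-in-{1 / p:.0f}")
```

Under this assumption the three-employee case works out to roughly 1-in-205, and the 80-employee case stays close to the unconditional 1-in-20.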

HNM recognized these issues. To account for them, they used an elegant simulation approach that compares observed segregation with what one would expect to observe in their sample if employers hired randomly, drawing on statistical methods developed in Carrington and Troske (1997). If observed concentration is significantly greater than expected, this is taken as evidence of nonrandom hiring. HNM also carried out these simulations, allowing hiring to be random within a limited number of strata. If within strata the observed and expected concentration are the same, HNM took this as evidence that these strata explain the unconditional level of worker concentration. This method works well as long as the number of stratification variables is small.

Because our data (described in more detail in the next section) contain immigrant status for all coworkers of each worker in our sample, we do not need to take within-establishment sampling variation into account in our analysis. This allows us a more flexible approach than that used in HN, which in turn makes it possible for us to examine a wider set of characteristics. Our results will show that controlling for many characteristics simultaneously matters in this context. For example, we find that adding other controls reduces by about one-third the share of concentration that we would attribute to language proficiency differences, although this factor remains important. Our dense sample also readily permits analysis of concentration by country of origin. The latter is a distinctive feature of our analysis relative to this recent literature and yields some of our most interesting and novel results.

Methodology and Data

Data

We construct a cross-sectional sample of workers in selected MSAs by combining data from the Longitudinal Employer-Household Dynamics (LEHD) database and the 2000 Decennial Census 1-in-6 long form.Footnote 6 Because April 1 is the reference date for the census, we use information from jobs held in the second quarter of 2000. The LEHD database draws much of its data from complete sets of unemployment insurance (UI) earnings records for a subset of U.S. states. Workers’ earnings records have been matched to characteristics of their employers drawn from quarterly administrative UI reports and from U.S. Census Bureau business censuses and surveys.Footnote 7 Basic demographic data—including country of birth—are available for all workers. Geocoding of addresses for both employers and places of residence allows us to examine characteristics of both locations. The LEHD data have the important advantage of allowing us to measure country of origin for all coworkers of the individuals in our matched sample. Their main disadvantage for studying immigration is that they include only on-the-books employees, leaving out the self-employed and those working in the informal sector. Thus, they likely have poor coverage of undocumented immigrants. Coverage of employment in agriculture is incomplete, so we exclude that sector.

Each quarterly wage record includes a UI account number that identifies the employee’s firm of employment within a state in a specific quarter. Where firms have more than one location within a state, the LEHD data identify each separate location (establishment). Workers employed by a multi-establishment firm are assigned to specific establishments within a state through multiple imputations based on a rich set of information, including the location of the firm’s establishments in that state, the worker’s place of residence, and the employment histories of both worker and establishment.Footnote 8

We match to the 2000 long-form sample to obtain two additional variables that are likely to be important in this context: education and English proficiency. Of all UI–covered workers in our sample of MSAs, we match approximately 1 in 10 foreign-born workers and 1 in 9 native workers. Matched workers have a slightly lower immigrant coworker share than does the complete set of UI workers in our sample of MSAs, and there seems to be a tendency for older longer-tenure workers at large establishments and in older, multi-unit firms to be overrepresented in the matched sample. Generally, these differences are small, however. To adjust for differences in match rates associated with observable characteristics, we create weights for the matched sample based on a regression model of the propensity for UI workers to match to the long-form data.Footnote 9 Using these weights, regression results that exclude education and language controls are very similar whether we base them on the matched sample or the complete UI earnings sample.

We base our analysis on the matched sample but compute our dependent variable (coworker share) and several geographic controls using all applicable workers in the LEHD database. We limit our sample to workers employed in 31 selected metropolitan areas (MSAs) in 11 states (California, Colorado, Florida, Illinois, Maryland, Minnesota, New Jersey, North Carolina, Oregon, Pennsylvania, and Texas), with our choice of areas based on the presence of substantial immigrant populations and the availability of data for a state. Although we use a small number of states, they include five of the six states in which the 2000 foreign-born population exceeded 1 million. In addition to cities with large immigrant populations, we also include several MSAs with smaller immigrant populations but with very rapid growth in foreign-born residents between 1990 and 2000.Footnote 10 We include all matched employees of nonagricultural businesses located in a sample MSA, regardless of whether they live in the MSA. This gives us a sample of 3.5 million workers, with more than 3,000 immigrant workers in our sample for each of our MSAs.

The average immigrant workforce share across our 31 MSAs is 18.7 %, but immigrants account for less than 11 % of the workforce in eight MSAs, and they account for more than 35 % of the workforce in three MSAs. Even with random assignment to jobs within a local labor market, these substantial differences across areas would make immigrants more likely to work together than to work with natives, simply because immigrants are disproportionately in the MSAs with high immigrant shares. Because our interest is in how workers are matched with employers within a local labor market, we include MSA dummy variables in all our specifications so that estimates are based on within-MSA variation.
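To see why cross-MSA differences alone generate measured concentration, consider a stylized calculation. The sketch below assumes random assignment within each MSA, so that a worker's expected immigrant coworker share equals the MSA's immigrant share; the two MSAs and their sizes are hypothetical.

```python
def expected_coworker_shares(msas):
    """msas: list of (workers, immigrant_share) pairs, one per MSA.

    Under random assignment within an MSA, a worker's expected immigrant
    coworker share equals that MSA's immigrant share (large-sample
    approximation). Returns the mean expected coworker share for
    immigrants and for natives, pooling all MSAs.
    """
    imm_total = sum(n * s for n, s in msas)
    nat_total = sum(n * (1 - s) for n, s in msas)
    imm_mean = sum(n * s * s for n, s in msas) / imm_total
    nat_mean = sum(n * (1 - s) * s for n, s in msas) / nat_total
    return imm_mean, nat_mean


# Two hypothetical, equally sized MSAs with 10% and 35% immigrant shares.
imm_mean, nat_mean = expected_coworker_shares([(1000, 0.10), (1000, 0.35)])
print(f"immigrants: {imm_mean:.3f}, natives: {nat_mean:.3f}, gap: {imm_mean - nat_mean:.3f}")
```

Even with no sorting inside either MSA, the pooled gap is positive because immigrants are weighted toward the high-share MSA. With a single MSA the gap is zero, which is the logic behind relying on within-MSA variation.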

We follow HN and Aslund and Skans (2005, 2010) by using the share of coworkers in a particular group as a measure of exposure. That is, we exclude the worker when measuring the concentration of immigrants in the business of the worker. For worker $i$ employed by business $j$ that has $s_j$ employees, the share of immigrants among coworkers is

$$ {C}_{ij}=\frac{1}{s_j-1}{\displaystyle \sum_{k\ne i}^{s_j}{I}_k}, $$
(1)

where $I_k$ is an indicator for whether worker $k$ is an immigrant. For the sake of brevity, we will refer to this simply as “the coworker share.” As pointed out by Aslund and Skans (2005), excluding the worker’s own characteristic in calculating concentration ensures that in the absence of any systematic concentration, in large samples, the mean coworker share for both immigrants and natives should equal the share of immigrants in the workforce. Based on this property, we use the difference between the mean coworker share for immigrants and natives to measure immigrant concentration. A significant positive value indicates that immigrants are more concentrated than would be expected based on random allocation.Footnote 11
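As an illustration of Eq. (1) and of the concentration measure built on it, the following sketch computes coworker shares from a list of worker records. The record layout, function names, and toy data are assumptions made for the example; this is not the code used for the analysis.

```python
from collections import defaultdict


def coworker_shares(workers):
    """workers: list of (worker_id, employer_id, is_immigrant) tuples.

    Returns {worker_id: share of the worker's coworkers who are immigrants},
    leaving out the worker's own status as in Eq. (1). Workers at
    single-employee businesses have no coworkers and are skipped.
    """
    size = defaultdict(int)
    immigrants = defaultdict(int)
    for _, employer, is_immigrant in workers:
        size[employer] += 1
        immigrants[employer] += int(is_immigrant)

    shares = {}
    for worker, employer, is_immigrant in workers:
        if size[employer] < 2:
            continue
        shares[worker] = (immigrants[employer] - int(is_immigrant)) / (size[employer] - 1)
    return shares


def concentration(workers):
    """Mean coworker share of immigrants minus mean coworker share of natives."""
    shares = coworker_shares(workers)
    imm = [shares[w] for w, _, i in workers if i and w in shares]
    nat = [shares[w] for w, _, i in workers if not i and w in shares]
    return sum(imm) / len(imm) - sum(nat) / len(nat)


# Hypothetical toy data: employer "A" is mostly immigrant, "B" mostly native.
toy = (
    [(k, "A", True) for k in range(1, 5)] + [(5, "A", False)]
    + [(k, "B", False) for k in range(6, 10)] + [(10, "B", True)]
)
print(coworker_shares(toy))
print(concentration(toy))
```

In the toy data, immigrants have a mean coworker share of 0.6 and natives 0.4, so the concentration measure is about 0.2.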

Descriptive Statistics

Figure 1 plots the cumulative distribution of immigrant coworker shares for natives and for immigrants as of the second quarter of 2000. In our sample of immigrant-rich MSAs, 10 % of natives work in native-only workplaces, but the share of immigrants working for immigrant-only businesses is considerably smaller (2.8 %). About 10 % of the median native’s coworkers are immigrants, but for the median immigrant, the share is about 32 %. For reference purposes, we include a third line giving the cumulative distribution that would apply if immigrants and natives were randomly assigned to employers in a manner that preserves the size distribution of employment. This simulated distribution depends only on the overall immigrant share and the size distribution of employment. By assumption, the random assignment distribution is identical for immigrants and natives.

Fig. 1 Cumulative distribution of coworker share for natives and immigrants. The cumulative distribution function under random assignment is constructed by first simulating the distribution of coworker shares conditional on employer size S by drawing 4,000 binomial random variates for S trials with p = .187 (share immigrant in our sample), and then using the number of immigrants (= number of successes in S trials) to calculate coworker shares. We simulate the distribution for each value of employer size from S = 2 to 2,000. The distribution of employers becomes thinner as S increases, but the distribution of coworker shares changes little as S increases for large S. So for employer sizes above 2,000, we group employers into size ranges, using intervals of 200 for employer sizes 2,000–8,000; 1,000 for employer sizes 9,000–20,000; and 10,000 for employer sizes above that level. We then sum the conditional probabilities for each coworker share across values of S using the empirical distribution of employer size as weights

Clearly, the observed distributions are inconsistent with random assignment. Because the likelihood of extreme values occurring randomly is quite low in large samples, and because large employers account for a substantial share of employment, about 60 % of workers would have between 17 % and 20 % immigrant coworkers if workers were grouped randomly. The share with only native coworkers would be well below the 10 % observed for natives (but only a bit above the 2.2 % observed for immigrants), and the share of employees working only with immigrants would be close to zero. Overall, it is apparent that native-born workers are far less likely to have immigrants as coworkers than are immigrants.
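The random-assignment benchmark can be sketched in a few lines. The version below is a simplified illustration, not the procedure behind Fig. 1: it draws the immigrant status of a focal worker's S − 1 coworkers directly (an assumption made here for simplicity), uses only three employer sizes, and weights them with a hypothetical size distribution rather than the empirical one.

```python
import random
import statistics


def simulate_coworker_shares(size, p=0.187, draws=4000, rng=None):
    """Simulated immigrant coworker shares at an employer with `size` workers.

    Each of the focal worker's size - 1 coworkers is independently an
    immigrant with probability p (random assignment).
    """
    if rng is None:
        rng = random.Random(0)
    coworkers = size - 1
    shares = []
    for _ in range(draws):
        immigrants = sum(rng.random() < p for _ in range(coworkers))
        shares.append(immigrants / coworkers)
    return shares


def simulate_mixture(size_weights, p=0.187, draws=4000, seed=0):
    """Pool simulated shares across employer sizes.

    size_weights: {employer size: share of employment at that size}.
    """
    rng = random.Random(seed)
    pooled = []
    for size, weight in size_weights.items():
        pooled.extend(simulate_coworker_shares(size, p, round(draws * weight), rng))
    return pooled


# Hypothetical employment weights by employer size (not the empirical distribution).
hypothetical_sizes = {5: 0.2, 50: 0.3, 500: 0.5}
pooled = simulate_mixture(hypothetical_sizes)
in_band = sum(0.17 <= s <= 0.20 for s in pooled) / len(pooled)
print(f"mean share: {statistics.mean(pooled):.3f}; workers in the 17-20% band: {in_band:.2f}")
```

The simulated mean stays close to p, and the spread of coworker shares shrinks as employer size grows, which is why large employers pull so much of the random-assignment distribution toward the overall immigrant share.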

Our analysis focuses on the mean difference in coworker shares between immigrants and natives, given in the first row of Table 1. For the average native in our set of MSAs, about 14 % of coworkers are immigrants, and 37 % of the coworkers of immigrants are immigrants. The immigrant-native difference in coworker means—our measure of concentration—is 22.9, indicating substantial concentration.

Table 1 Sample characteristics

The following rows of Table 1 give demographic information that might help explain this concentration. Immigrants are relatively underrepresented among those younger than age 25, reflecting the fact that many arrive in the United States as young adults. Men substantially outnumber women among working immigrants; among working natives, men are more narrowly in the majority. Differences between immigrant and native women in rates of labor force participation likely contribute to these gaps. Immigrants are much more likely to not have completed high school than are natives, but immigrants are also overrepresented among those with advanced degrees.

The category “Speaks English very well” consists of those who report that they speak English “very well” along with those who speak only English at home. Unsurprisingly, immigrants are more likely than natives to fall into categories other than “very well,” but even the category “Not at all” includes some natives.Footnote 12 Mean log earnings on the primary (highest earnings) job are very similar for immigrants and natives, and immigrants are more likely than natives to work for their 2000-Q2 employer in at least one of the surrounding quarters. Differences in job tenure likely contribute to the slightly higher earnings of immigrants because most transitory jobs will involve less than three full months of work and thus are likely to have particularly low quarterly earnings. These jobs may also be associated with relatively low wage rates and part-time work.

We find only minor differences between immigrants and natives in broadly defined employer characteristics. Immigrants are more likely to work in the smallest establishments and less likely to work in the largest, but overall, the differences by employer size are small, as are differences by establishment age. However, immigrants are less likely than natives to work for multi-unit firms. Immigrants are more concentrated in manufacturing than are natives, but the differences by broad sector are otherwise not particularly large.

The last three rows of Table 1 give means for three additional measures that we construct to explore the relationship between workplace concentration and neighborhood networks. Each of these is based on information on worker tract of employment and/or tract of residence.Footnote 13 Because we have data only on those who work, we base these variables on workers residing in a particular tract rather than all residents of the tract.

The first measure is simply the share of immigrants in a worker’s tract of residence, which we use to control for residential segregation. Neighbors act as contacts and references for job opportunities, so concentration of immigrants in the neighborhood can contribute to immigrant concentration in the workplace. As can be seen in Table 1, immigrants in our sample of MSAs are substantially more likely to live in tracts with high immigrant shares than are natives, but even so, the majority of their neighbors are natives.

We construct a second variable for each worker by calculating the share of employees at other businesses located close to his employer who also live in the worker’s residential tract. The denominator is the number of employees working for other employers in a worker’s tract of employment. The numerator is the number among that group who live in the worker’s residential tract.Footnote 14 Proximity or convenient transportation links may make residents of certain neighborhoods likely to work at a particular location, resulting in a relationship between workplace and residence. This measure of the general propensity for workplace and residence locations to be connected will control for commuting patterns that influence concentration. We refer to this as our shared commute index. For the average worker, there is not a strong association between the tract of the employer and particular tracts of residence: the mean for this variable is only 0.3 % for immigrants and 0.5 % for natives.

Our third measure is intended as a proxy for the presence of a specific type of neighborhood-based social network. Neighborhood contacts and references may make neighbors more likely to be coworkers. For each worker, we calculate the fraction of their coworkers who also reside in the worker’s tract of residence. So, for example, if a business hired three workers from each of four different residential tracts, each worker would have a neighborhood network index of 2/11, given that two of their 11 coworkers would be from their neighborhood. The mean of the network index is small: for both immigrants and natives, 1.9 % of coworkers live in the same tract. However, it is larger than the shared commute index, suggesting that residential location is more strongly connected to the firm of employment than to the geographic area of employment.
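The two tract-based indices described above can be illustrated with a small sketch. The record layout and function names are hypothetical, and the quadratic loop in the shared commute index is chosen for readability rather than for use on data of the size analyzed here.

```python
from collections import Counter
from fractions import Fraction


def network_index(workers):
    """workers: list of (worker_id, employer_id, home_tract) tuples.

    Returns {worker_id: fraction of coworkers living in the worker's own tract}.
    """
    employer_size = Counter(emp for _, emp, _ in workers)
    same_tract = Counter((emp, tract) for _, emp, tract in workers)
    return {
        worker: Fraction(same_tract[(emp, tract)] - 1, employer_size[emp] - 1)
        for worker, emp, tract in workers
        if employer_size[emp] > 1
    }


def shared_commute_index(workers, employer_tract):
    """Share of employees at OTHER employers in the worker's tract of
    employment who live in the worker's residential tract.

    employer_tract: {employer_id: tract where the employer is located}.
    Workers with no such employees nearby are left out of the result.
    """
    index = {}
    for worker, emp, home in workers:
        work_tract = employer_tract[emp]
        others = [
            h for _, e, h in workers
            if e != emp and employer_tract[e] == work_tract
        ]
        if others:
            index[worker] = Fraction(sum(h == home for h in others), len(others))
    return index


# The example from the text: one business, three workers from each of four tracts.
firm = [(f"w{t}{k}", "firm", f"tract{t}") for t in range(4) for k in range(3)]
print(network_index(firm)["w00"])  # 2/11
```

Exact fractions are used so that the 2/11 example from the text is reproduced without rounding.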

Regression Specifications

Our empirical approach is based on a series of regressions with the coworker share as the dependent variable and individual workers on their primary job as the unit of analysis. To ease computation with over 3 million workers, we use linear regression rather than adopting an approach that accounts for the limited range of the dependent variable. As Fig. 1 illustrates, most of the mass of the distribution is not at either 1 or 0, which mitigates some of the problems inherent in the linear model. There is a strong positive correlation in the coworker share among employees of the same business that generates a downward bias in conventionally estimated standard errors in all worker-level regressions. To avoid this, we use the Huber-White variance estimator, allowing for arbitrary correlation of errors among employees of the same establishment.
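To make the variance adjustment concrete, the sketch below implements a cluster-robust (sandwich) variance estimator for the simplest case of a constant and a single regressor. It reflects our reading of the Huber-White estimator with clustering by establishment; the function name and toy data are hypothetical, and no small-sample correction is applied.

```python
import math
from collections import defaultdict


def ols_with_clustered_se(x, y, cluster):
    """OLS of y on a constant and x, with cluster-robust standard errors.

    Uses the sandwich formula (X'X)^-1 [sum_g X_g'u_g u_g'X_g] (X'X)^-1
    with no small-sample correction (a simplification for illustration).
    Returns ((intercept, slope), (se_intercept, se_slope)).
    """
    n = len(y)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))

    # Invert the 2x2 matrix X'X = [[n, sx], [sx, sxx]].
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy

    # Sum the score vector X_i * u_i within each cluster.
    score = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, g in zip(x, y, cluster):
        u = yi - b0 - b1 * xi
        score[g][0] += u
        score[g][1] += xi * u

    # "Meat" of the sandwich: sum over clusters of outer products of scores.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in score.values():
        for r in range(2):
            for c in range(2):
                meat[r][c] += s[r] * s[c]

    # Variance = inv * meat * inv; only the diagonal is needed for the SEs.
    var = [
        sum(inv[r][a] * meat[a][b] * inv[b][r] for a in range(2) for b in range(2))
        for r in range(2)
    ]
    # Guard against tiny negative values caused by floating-point rounding.
    return (b0, b1), tuple(math.sqrt(max(v, 0.0)) for v in var)


# Hypothetical data: six workers at three establishments (two workers each).
x = [1, 0, 1, 0, 1, 0]
y = [0.40, 0.10, 0.50, 0.20, 0.60, 0.15]
est = ["e1", "e1", "e2", "e2", "e3", "e3"]
coef, se = ols_with_clustered_se(x, y, est)
print(coef, se)
```

One useful property of this estimator is that duplicating every worker within an establishment leaves the clustered standard errors unchanged, whereas conventional standard errors would shrink as if the copies were independent observations.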

We start with a very simple regression specification to provide a basis of comparison:

$$ {C}_{ij}={\upgamma}_N^{base}+{\upgamma}_I^{base}{I}_i+{\uptheta}^{base} ms{a}_{ij}+{\upvarepsilon}_{ij}^{base}, $$
(2)

where $i$ denotes an individual, and $j$ denotes a workplace. $I$ and $N$ denote immigrants and natives, respectively. In Eq. (2), the constant term represents the mean coworker share for the omitted category, which in this simplest specification consists of natives in the omitted MSA. The coefficient $\gamma_I^{base}$ gives us the mean within-MSA difference between immigrants and natives in how likely they are to have immigrant coworkers and thus represents our base measure of immigrant concentration.

We next add a vector of worker, employer, and locational characteristics $\mathbf{x}_{ij}$:

$$ {C}_{ij}={\upgamma}_N^{main}+{\upgamma}_I^{main}{I}_i+{\uptheta}^{main} ms{a}_{ij}+{\upbeta}^{main}{\mathbf{x}}_{ij}+{\upvarepsilon}_{ij}^{main}. $$
(3)

To the extent that $\gamma_I^{main} < \gamma_I^{base}$, the vector of characteristics in $\mathbf{x}$ partially accounts for the raw immigrant concentration. Comparing results from Eq. (3) to Eq. (2) allows us to address our first question: which characteristics of workers and employers are important in accounting for immigrant concentration?

We quantify the contributions of various sets of characteristics using a decomposition developed by Gelbach (2009).Footnote 15 Let $\delta = \gamma_I^{base} - \gamma_I^{main}$ represent the amount of immigrant concentration explained by the characteristics included in $\mathbf{x}$. Gelbach noted that the formula for omitted variable bias gives a natural way to decompose $\delta$. If $\mathbf{x}$ has $K$ components, then $\delta$ can be decomposed into $K$ additive terms with the contribution of the $k$th variable given by $\delta^k = \beta^{k,main} \times \alpha_I^k$, where $\alpha_I^k$ are coefficients estimated from the $K$ auxiliary regressions:

$$ {\mathbf{x}}_{ij}^k={\upalpha}_N^k+{\upalpha}_I^k{I}_i+{\upalpha}^{k,msa} ms{a}_{ij}+{\upvarphi}_{ij}^k. $$
(4)

This decomposition makes clear that two things must occur for a factor to account for a substantial share of immigrant concentration: (1) the factor must be strongly correlated with immigrant concentration even when conditioning on other controls ($\beta^{k,main}$ is large); and (2) within MSA, there must be a large average difference between immigrants and natives in $x^k$ ($\alpha_I^k$ is large).
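The decomposition can be checked numerically. The sketch below is a minimal illustration on simulated data, not the estimation code used here: it omits the MSA dummies, uses two hypothetical covariates, and solves the normal equations with a small hand-written routine so that it runs with the standard library alone.

```python
import random


def solve(a, b):
    """Solve the linear system a @ beta = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in reversed(range(n)):
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta


def ols(columns, y):
    """OLS coefficients of y on the given regressor columns (normal equations)."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


def gelbach(immigrant, covariates, y):
    """Return (gamma_base, gamma_main, [delta_k for each covariate])."""
    const = [1.0] * len(y)
    gamma_base = ols([const, immigrant], y)[1]
    main = ols([const, immigrant] + covariates, y)
    gamma_main, betas = main[1], main[2:]
    # Auxiliary regressions: each covariate on a constant and the immigrant dummy.
    alphas = [ols([const, immigrant], xk)[1] for xk in covariates]
    return gamma_base, gamma_main, [b * a for b, a in zip(betas, alphas)]


# Simulated data with two hypothetical covariates.
rng = random.Random(0)
n = 2000
imm = [float(rng.random() < 0.2) for _ in range(n)]
x1 = [0.6 * i + rng.gauss(0, 1) for i in imm]   # differs strongly by immigrant status
x2 = [0.05 * i + rng.gauss(0, 1) for i in imm]  # barely differs by immigrant status
y = [0.1 * i + 0.2 * a + 0.2 * b + rng.gauss(0, 0.1) for i, a, b in zip(imm, x1, x2)]

g_base, g_main, deltas = gelbach(imm, [x1, x2], y)
print(g_base - g_main, sum(deltas))  # equal up to floating-point error
print([d / g_base for d in deltas])  # each covariate's share of base concentration
```

Both covariates have the same effect on the outcome in the simulated data, but only the first differs much between immigrants and natives, so it accounts for nearly all of the explained concentration. This mirrors the two conditions stated above.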

Accounting for Immigrant Concentration

Basic Results

Table 2 presents estimates of $\gamma_I^{base}$ and $\gamma_I^{main}$ in rows 1 and 2. In the remaining rows, the contributions of sets of covariates are given as percentages of total within-MSA concentration (i.e., $\delta^k / \gamma_I^{base}$). In the first column, average within-MSA concentration ($\gamma_I^{base}$) is 17.1; that is, the average share of coworkers who are immigrants is 17.1 percentage points more for immigrants than for natives working in the same MSA. Comparing that with the overall difference (22.9) reported in Table 1, MSA effects alone account for about one-quarter of the total concentration. Controlling for observable employee and employer characteristics reduces estimated concentration from 17.1 ($\gamma_I^{base}$) to 8.3 ($\gamma_I^{main}$), which is roughly a 50 % reduction. Three factors stand out as important in the decomposition: English language skills, industry of employment, and the share of a worker’s neighbors who are immigrants. Together, these account for 48 % of within-MSA concentration, with the next runners-up (education and the interaction of firm age with multi-unit status) contributing about 1 % each.Footnote 16
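For reference, the two shares quoted in this paragraph follow directly from the reported concentration figures (22.9 overall, 17.1 within MSA, and 8.3 after adding controls):

$$ \frac{22.9-17.1}{22.9}\approx 0.25,\qquad \frac{17.1-8.3}{17.1}\approx 0.51. $$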

Table 2 Contribution of covariates to immigrant concentration

Language skills make a large contribution to explaining concentration both because most of those who do not speak English well are immigrants, and because of the substantial increase in coworker share associated with reduced English proficiency even when controlling for numerous other factors. Given the large share of U.S. immigrants of Hispanic origin, it is worth comparing our findings with HN’s findings on the importance of language for Hispanic/white concentration. Using the same language grouping (and controlling only for MSA), HN found that about one-third of all Hispanic/white within-MSA concentration is attributable to segregation by language. In our sample, if we include only language and MSA controls, language explains 28 % of overall immigrant concentration. Using the broader set of controls given in Table 2, we attribute about 18 % of overall concentration to language. In both cases, language is important, but looking at language separately from other factors produces results that overstate its importance. This highlights the value of the multivariate approach used here.

The substantial contribution of industry comes about because the distribution of employment across detailed industries is quite different for immigrants and natives. This seems somewhat surprising given that the distribution across sectors in Table 1 shows only modest differences. To explore this, we split the contribution into differences in immigrant employment by sector and then into the contributions of detailed industry within-sector. This split is somewhat sensitive to how the detail is specified, but using the modal three-digit industry within each sector as omitted categories (as we do here), differences across broad sectors (particularly the high share of immigrants in manufacturing) and differences across detailed industries within services both appear to be important.

The other striking result is the almost one-third contribution of residential segregation across census tracts within MSAs. As noted in previous literature, this points to a very strong relationship between living and working with immigrants. Note that neither of the other tract-level variables (the network index and the shared commute index) accounts for much of the concentration. The network variable has a positive and statistically significant effect (not reported here), which is consistent with the hypothesis that network effects increase the likelihood of working with immigrants. However, the network variable cannot account for much immigrant concentration because there is little difference between immigrants and natives in its mean value. This latter point is important for interpreting the results of Table 2. Other factors such as establishment size and firm age interacted with multi-unit status have statistically significant estimated effects (not reported here) but account for relatively small shares of immigrant concentration because mean values differ little between immigrants and natives.
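The accounting logic described above can be illustrated with a minimal sketch of a Gelbach-style decomposition on synthetic data. Everything in it is an assumption made for illustration: the variable names (`x_gap`, `x_flat`), the simulated coefficients, and the omission of MSA effects are hypothetical and are not taken from the paper's data or specification. The sketch shows that a covariate's contribution equals its immigrant/native mean gap multiplied by its coefficient in the full regression, so a covariate with a large coefficient but no mean gap contributes almost nothing.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical synthetic data (not the paper's sample).
immigrant = rng.integers(0, 2, size=n).astype(float)
x_gap = 0.8 * immigrant + rng.normal(size=n)   # mean differs by immigrant status
x_flat = rng.normal(size=n)                    # same mean for both groups
y = 5.0 * immigrant + 10.0 * x_gap + 10.0 * x_flat + rng.normal(size=n)


def ols(X, target):
    """Least-squares coefficients of target on the columns of X."""
    return np.linalg.lstsq(X, target, rcond=None)[0]


X1 = np.column_stack([np.ones(n), immigrant])  # base specification
X2 = np.column_stack([x_gap, x_flat])          # covariates added in the full model
X_full = np.column_stack([X1, X2])

gamma_base = ols(X1, y)[1]        # immigrant coefficient without covariates
b_full = ols(X_full, y)
gamma_main = b_full[1]            # immigrant coefficient with covariates
beta2 = b_full[2:]                # covariate coefficients in the full model

# Auxiliary regressions of each covariate on the base regressors; the row for
# the immigrant indicator is the immigrant/native mean gap in each covariate.
mean_gap = ols(X1, X2)[1]

# Contribution of each covariate to the change in the immigrant coefficient.
delta = mean_gap * beta2

print("gamma_base - gamma_main:", gamma_base - gamma_main)
print("sum of contributions:   ", delta.sum())
print("contributions (x_gap, x_flat):", delta)
```

In this setup the contributions sum to the difference between the base and full coefficients as an in-sample identity, which is what allows each contribution to be reported as a percentage of the base estimate.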

The second column of Table 2 performs the same exercise except that for immigrants, the dependent variable is the fraction of coworkers from non-U.S. countries other than their own. With this specification, $\gamma_I$ is positive if immigrants are more likely to have coworkers from other source countries than natives are to have immigrant coworkers. Row 1 shows that roughly one-quarter of workplace concentration stems from this cross-source-country concentration. Row 2 shows that controlling for worker, firm, and locational factors entirely explains this excess probability of working with noncompatriots. Looking at the decomposition, it is clear that firm and locational factors play a more important role than do worker characteristics in explaining why immigrants work with noncompatriots. This concentration largely reflects that some industries have large immigrant workforces (from various countries of origin) and that living in a tract with a high fraction of immigrants has a strong association with the chances of working with both immigrants from other countries and with compatriots. In the next section, we examine this novel finding about the difference in how much we can account for own country versus other country concentration for immigrants.

Country-of-Origin Differences

Our data permit further exploring patterns of concentration by examining how they vary by country of origin. That is, we can estimate how likely it is for an immigrant from Mexico (for example) to have coworkers who are from Mexico versus those who are from El Salvador or China. We then explore differences across countries of origin in the factors accounting for within country-of-origin concentration. To make this manageable, we rank countries of origin by their share of employment in our sample and then carry out some analyses separately for immigrants from the 18 largest source countries. Table 3 lists these countries and gives their sample shares in the first column.Footnote 17 In the row labeled “Other,” we group immigrants from the many source countries with smaller shares than those on our list. The remaining columns of Table 3 present statistics on three factors that are potentially important for patterns of concentration: English language proficiency, the share of neighbors who are immigrants, and levels of education.

Table 3 Selected sample characteristics by country of origin

As we emphasized in discussion of the Gelbach decomposition, for a characteristic to have a sizable effect on concentration, the difference between its mean for immigrants and for natives must be large. Unsurprisingly, immigrants from other English-speaking countries such as Great Britain, Canada, and Jamaica report English skills much like those of natives, but immigrants from Germany are also very unlikely to report difficulties with English. For these countries, language skills cannot be important, but for many of the other source countries with a large share of members with limited English—such as the Dominican Republic and China—language skills could play a sizable role in explaining differences in concentration.

Immigrants from Canada, Germany, and Great Britain also have patterns of residential segregation that closely resemble those of natives, with natives accounting for over 80 % of neighbors. Immigrants from Guatemala and El Salvador stand out as being most likely to live with immigrants from other countries, with Mexico accounting for the majority of their nonnative neighbors. Haitian and Dominican immigrants stand out as having large shares of compatriots among their neighbors, particularly given their sample shares.

Finally, the last two columns of Table 3 show the education distribution by group. As discussed earlier, the immigrant population as a whole includes larger shares of both those who have not completed high school and college graduates than the native population. As Table 3 shows, China is the only single source country that shows this pattern. Immigrants from other countries of origin tend to be overrepresented in either the lower or upper tail of the education distribution relative to natives, but not in both tails.

Table 4 presents estimates of concentration by country of origin for our 18 source countries. Each estimate is from a separate regression. The specifications used are analogs to Eqs. (2) and (3) but with different dependent variables: the country-specific coworker share in the “Own country” columns (e.g., share of coworkers who are Mexican immigrants in row 1), and the immigrant coworker share excluding that country of origin for the “Other country” columns (e.g., share of non-Mexican immigrants among coworkers in row 1). Each estimate is the coefficient on an indicator variable for being an immigrant from that row’s country.Footnote 18 The first and second columns include only country and MSA dummy variables as controls; in the third and fourth columns, we add the other sets of variables used in Table 2. However, we split the residential segregation measure used in Table 2 into 18 country-specific shares and the remainder, which is the share of neighbors who are immigrants from countries other than those listed.
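The two dependent variables described above can be sketched as follows. This is a hedged illustration, not the authors' code: the data layout (a list of establishment and country-of-birth pairs), the `NATIVE` code, and the choice to exclude the worker from his or her own coworker counts are assumptions made here for concreteness.

```python
from collections import Counter, defaultdict

NATIVE = "US"  # hypothetical code marking native-born workers


def coworker_shares(workers, origin):
    """Own- and other-country coworker shares for a given origin country.

    workers: list of (establishment_id, country_of_birth) tuples.
    Returns one (own_share, other_share) pair per worker, computed over
    that worker's coworkers only (the worker is excluded). Workers with
    no coworkers get (None, None).
    """
    counts = defaultdict(Counter)
    for estab, country in workers:
        counts[estab][country] += 1

    shares = []
    for estab, country in workers:
        c = counts[estab]
        n_coworkers = sum(c.values()) - 1
        if n_coworkers == 0:
            shares.append((None, None))
            continue
        # Coworkers from the origin country, excluding the worker.
        own = c[origin] - (country == origin)
        # All immigrant coworkers, excluding the worker.
        immigrants = sum(v for k, v in c.items() if k != NATIVE) - (country != NATIVE)
        other = immigrants - own
        shares.append((own / n_coworkers, other / n_coworkers))
    return shares


# Hypothetical toy data: two establishments, seven workers.
toy = [("A", "MX"), ("A", "MX"), ("A", "SV"), ("A", "US"),
       ("B", "US"), ("B", "US"), ("B", "MX")]
for worker, pair in zip(toy, coworker_shares(toy, "MX")):
    print(worker, pair)
```

Each Table 4 estimate would then correspond to regressing one of these shares on country-of-birth indicators and MSA controls; that regression step is not shown here.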

Table 4 Concentration by country of birth

The first entry indicates that for the average Mexican immigrant, the share of coworkers who are Mexican is 15.7 percentage points higher than the share for the average native within the same MSA. The entry in the second column shows that for Mexican immigrants, the share of coworkers who are immigrants from other countries is only 2.1 percentage points higher than the share of non-Mexican immigrant coworkers for natives.

For most countries of origin, immigrants are much more likely to work with their compatriots than with other immigrants. There are two types of exceptions. Immigrants from three countries (Germany, Great Britain, and Canada) are roughly as likely to work with compatriots or other immigrants as natives. The other exceptions are immigrants from El Salvador, Guatemala, Taiwan, Jamaica, and the Dominican Republic—countries with sizable own-country effects, but even larger other-country effects. Based on results that we do not present here, for Salvadorans and Guatemalans, this largely reflects a propensity to work with immigrants from Mexico. Given such a propensity, the large other-immigrant effect likely reflects the fact that Mexican immigrants greatly outnumber Salvadoran and Guatemalan immigrants in our sample of MSAs. Immigrants from Taiwan are quite likely to work with immigrants from mainland China; Dominican immigrants are quite likely to work with Cubans; and Jamaicans, with Haitians. Although some of these cross-country patterns suggest the importance of a shared language, countries with a shared language may share other characteristics as well. There is no such tendency for Cubans to work with Mexicans, Salvadorans, or Guatemalans, despite a shared language.

The third and fourth columns of Table 4 report estimates of the same coefficients when we include our full set of covariates. A comparison of the first and third columns shows how much the added controls contribute to accounting for concentration measures by country of origin. For Mexico, adding covariates reduces own-country concentration by close to one-half, from 15.7 to 8.7—roughly similar to the magnitudes we observed in Table 2 for all immigrants. There is a similarly large reduction in concentration for Cubans as well as reductions in the range of 20 % to 30 % for Salvadoran, Guatemalan, Haitian, Jamaican, and Dominican immigrants. However, among Asian immigrant groups—particularly Korean and Japanese immigrants—adding covariates only modestly reduces concentration.

Although observable factors only partially explain compatriot concentration, for most countries of origin, these factors fully explain the excess tendency to work with immigrants from other countries. With the full set of controls, only immigrants from El Salvador, Guatemala, China, and Taiwan appear substantially more likely than natives to work with immigrants from other countries; even for these countries, covariates explain more than two-thirds of the excess noncompatriot concentration for all but Taiwan. The final row in Table 4 shows that the average unexplained other-country concentration is 0, whereas about two-thirds of own-country concentration remains unexplained.Footnote 19

Table 5 presents the Gelbach decomposition for own-country concentration, and Table 6 presents the other-country decomposition. We group variables as we did in Table 2 except that we split the residential segregation measure between compatriots and other immigrants. The three factors that account for most of overall concentration are also the primary factors when we look at concentration by country: residential segregation, English language skills, and industry of employment. However, the importance of these factors differs for own-country versus other-country concentration and varies substantially across country groups. Residential segregation accounts for virtually all (92 %) the explained variation in own-country concentration for Cubans as well as the majority of explained variation for all countries except for Mexico, India, and the Philippines. Residential segregation has substantial explanatory power for Cubans because 31 % of the neighbors of Cuban immigrants are Cuban, but less than 1 % of natives’ neighbors are Cuban. For both Cubans and natives, having a neighbor who is Cuban makes it more likely to have Cuban coworkers, but the large difference in the propensity for a neighbor to be Cuban dominates here. We stress the accounting nature of this exercise. Those who live with immigrants from a particular country are quite likely to work with immigrants from that country as well, suggesting that common factors underlie those patterns, but not that one causes the other. Looking back at Table 3, it is clear that immigrant neighborhoods often include immigrants from several countries of origin, rather than consisting of ethnic enclaves for a single country of origin.

Table 5 Decomposition of main effects by country of origin: Own-country concentration
Table 6 Decomposition of main effects by country of origin: Other-country concentration

The industry distribution of employment accounts for more than one-half of the explained concentration for immigrants from India and the Philippines, and residential segregation and the industry distribution of employment each account for about 40 % of explained variation for immigrants from Mexico. Although we found that English language skills were an important factor for overall concentration, these skills make relatively small contributions to explaining own-country concentration for all countries but Mexico.

In contrast, Table 6 shows that English proficiency is the most important factor in accounting for other-country concentration for 12 of the 18 countries. The countries in which language is unimportant are the six countries from which at least 95 % of immigrants speak English well or very well. It seems clear that part of other-country concentration reflects coworkers who speak languages that are shared by more than one country of origin: particularly, Spanish and Chinese. However, it is likely that low English language proficiency is correlated with low levels of other skills and that this pattern results partly from firms hiring low-skilled immigrants from several countries of origin. This seems likely to reflect concentration in jobs where verbal communication is not very important rather than jobs in which a shared non-English language facilitates communication.

Industry also plays a larger role in accounting for other-country concentration than own-country concentration, reflecting immigrant-intensive industries in which employers often hire from more than one country of origin. Finally, the other-country results also show evidence of a strong tie between living with other immigrants and working with them. With the exception of countries with little overall concentration, living with noncompatriot immigrants accounts for 12 % to 26 % of other-country concentration. Some of this may reflect how shared languages affect residential patterns, but many immigrants have coworkers from source countries that do not share their language. For example, immigrants from Taiwan and China are disproportionately likely to live with each other, but both groups are also more likely than natives to live with other Asian immigrants. Within MSAs, Chinese immigrants are 2.4 % more likely than natives to live with immigrants from Taiwan; but they are 2.9 % more likely than natives to live with immigrants from Vietnam and roughly 3 % more likely to work with immigrants from other Asian countries.

Before concluding, we briefly discuss richer specifications that permit a full set of interactions between our explanatory variables and dummy variables for own and other group. In unreported results, we find that such specifications do not change the overall messages of our decompositions, but the interactions yield some additional insights.Footnote 20 In Table 7, we report only the coefficients on main and interaction terms involving our language and residential segregation measures, two variables that are of particular interest in these richer specifications. Each specification used here includes residential shares for our 18 largest countries (plus an other category), but in each row, we present only the coefficients on terms involving residential shares for the country used to define the dependent variable. So in the first row, we report the main/own/other coefficients on the share of neighbors who are from Mexico.

Table 7 Differential effects of language and residential segregation on concentration by country of birth

The “main” effects in the first column give the effect of speaking English poorly for natives. In the first row, for natives, speaking English poorly is associated with a 1.6 percentage point increase in the probability of working with immigrants from Mexico, relative to natives who speak English very well. The “own” column gives the difference in the effect between workers from the designated country and natives, and the “other” column gives the difference in the effect between immigrants from other countries and natives. For immigrants from Mexico, speaking English poorly has a slightly larger effect on the probability of working with immigrants from Mexico than it does for natives; for immigrants from other countries, the effect is smaller than for natives. Looking down the table, the effect of speaking English poorly on the probability that natives work with immigrants from any of our source countries is tiny for all countries except Mexico. This reflects that most natives with limited English proficiency speak Spanish and that Mexico is our largest source country. Combining main and interaction effects, the implied effect of not speaking English well for immigrants from Mexico (1.9 + 1.6 = 3.5) is within the range of implied effects for own group for other countries. Similarly, although the “other” interaction has a relatively large negative coefficient in the row for Mexico, it is offset by the main effect.
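The way main and interaction coefficients combine in Table 7 can be written out as follows. The symbols $\beta^{\text{main}}_c$, $\beta^{\text{own}}_c$, and $\beta^{\text{other}}_c$ are our own shorthand for the three columns in the row for country $c$ and do not appear in the original specification; the numerical values are those quoted in the text for Mexico.

```latex
\[
\text{implied effect for natives} = \beta^{\text{main}}_c, \qquad
\text{implied effect for immigrants from } c = \beta^{\text{main}}_c + \beta^{\text{own}}_c, \qquad
\text{implied effect for other immigrants} = \beta^{\text{main}}_c + \beta^{\text{other}}_c .
\]
\[
\text{Mexico:} \quad \beta^{\text{main}} + \beta^{\text{own}} = 1.6 + 1.9 = 3.5 \ \text{percentage points}.
\]
```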

The own-country effects vary widely across countries, with the largest effects generally found among immigrants from Asia and Poland. One possible explanation is that the language effects are particularly strong for those whose first language is linguistically distant from English. The other-country effects are generally small, particularly compared with the own-country effects.

In contrast, all the main effects and most of the own-/other-country effects for residential segregation are positive; in all rows of Table 7, having neighbors from a particular country of origin is positively associated with having coworkers from that country for both natives and immigrants. There is, however, considerable variation across countries. Asian countries (e.g., China, Japan, Korea, Taiwan, and Vietnam) all have especially large own-country effects, implying that immigrants from those countries with many compatriots among their neighbors are much more likely to work with compatriots than those who live primarily with natives.

Concluding Remarks

Using matched employer–employee data that comprehensively cover employment in our sample of MSAs, we find that immigrants are much more likely to work with one another—and hence are less likely to work with natives—than would be expected given random allocation of workers. This is driven partly by the distribution of immigrants across MSAs, but within MSAs, substantial concentration remains. Immigrants who work together are quite likely to be compatriots, particularly those who have poor English language skills. However, immigrants from different countries of origin are also more likely to work together than to work with natives.

We find substantial differences between immigrants and natives in three factors that have strong associations with concentration: industry of employment, the share of immigrants among neighbors, and (unsurprisingly) English language skills. As a result, these three factors (especially own-country residential segregation) are the most important in accounting for overall concentration. In contrast, although having coworkers who are neighbors has a substantial positive relationship with the share of immigrants among an immigrant’s coworkers, natives and immigrants differ little in the extent to which coworkers are neighbors. Thus, the share of coworkers who are neighbors accounts for very little concentration.

Sizable contributions from language and detailed industry in accounting for immigrant concentration are consistent with an important role for sorting on productive characteristics. Those speaking limited English are quite likely to work with compatriots, which is consistent with the need for a shared language to facilitate coordination within the workplace. Differences in the industries employing natives and immigrants likely reflect sorting on the kinds of skills that the two groups bring to the labor market. However, traditional measures of skill levels—education and earnings—do not account for much concentration.

Our results highlight the importance of taking several factors into account simultaneously. For example, other studies have found a role for language skills in explaining related outcomes. We find that language matters, but its contribution is diminished when we take into account own-country residential segregation and employer characteristics.

The characteristics we measure fully account for patterns of other-country concentration. In contrast, we can account for less than one-third of compatriot workplace concentration for 15 of the 18 source countries we consider—Mexico, Cuba, and the Dominican Republic are the exceptions. We interpret our success in explaining other-country concentration as suggesting that we have done reasonably well in identifying factors that lead nonnatives to end up grouped in workplaces. However, that success makes the large unexplained compatriot component of concentration a puzzle: workers’ excess tendency to work with their compatriots must be largely associated with factors we have not measured. Identifying such factors should be a high priority for future research. We view country-specific social networks centered on something other than U.S. tract of residence as one likely candidate.