Quantification of extracellular proteins, protein complexes and mRNAs in single cells by proximity sequencing


Nature Methods

We present proximity sequencing (Prox-seq) for simultaneous measurement of proteins, protein complexes and mRNAs in thousands of single cells. Prox-seq combines proximity ligation assay with single-cell sequencing to measure proteins and their complexes from all pairwise combinations of targeted proteins, providing quadratically scaled multiplexing. We validate Prox-seq and analyze a mixture of T cells and B cells to show that it accurately identifies these cell types and detects well-known protein complexes. Next, by studying human peripheral blood mononuclear cells, we discover that naïve CD8+ T cells display the protein complex CD8–CD9. Finally, we study protein interactions during Toll-like receptor (TLR) signaling in human macrophages. We observe the formation of signal-specific protein complexes, find CD36 co-receptor activity and additive signal integration under lipopolysaccharide (TLR4) and Pam2CSK4 (TLR2) stimulation, and show that quantification of protein complexes identifies signaling inputs received by macrophages. Prox-seq provides access to an untapped measurement modality for single-cell phenotyping and can discover uncharacterized protein interactions in different cell types.
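A minimal sketch of where the quadratic scaling comes from. It assumes that each antibody probe can act as either partner in a ligation, so that A:B and B:A are distinct ordered PLA products and homotypic X:X products are included. Under that assumption, a panel of N probes yields N² measurable products. The helper name `pla_products` is hypothetical and is not the authors' code.

```python
from itertools import product


def pla_products(probes):
    """Enumerate ordered PLA products for a probe panel.

    Assumption (illustrative): every probe can supply either connector
    oligo, so A:B and B:A are distinct products and X:X is allowed.
    """
    return [f"{a}:{b}" for a, b in product(probes, repeat=2)]


# Hypothetical three-probe panel
panel = ["CD3", "CD28", "PD1"]
products = pla_products(panel)
print(len(products), products[:4])
```

Under this model, the three-probe panel gives 9 products (including both `CD3:CD28` and `CD28:CD3`), and an 8-probe panel would give 64.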

Fig. 1: Overview of Prox-seq for joint proteomic and transcriptomic analysis of single cells.
Fig. 2: Quantification of protein complexes and the proximity ligation background.
Fig. 3: Prox-seq reveals a new CD9–CD8 interaction in peripheral blood mononuclear cells.
Fig. 4: Simultaneous protein and mRNA measurements by Prox-seq on 8,700 single peripheral blood mononuclear cells.
Fig. 5: Prox-seq enables the study of receptor interactions under combined Toll-like receptor stimulation in macrophages.
Fig. 6: Prox-seq reveals single-macrophage variability in TLR signaling and enables identification of immune inputs from protein measurements.

Data availability

The raw and count data are deposited in NCBI’s Gene Expression Omnibus under accession numbers GSE149574 and GSE196130. Source data are provided with this paper.

Code availability

The custom program for PLA product alignment and the code used for alignment and data analysis are available at


We thank J. Huang (Pritzker School of Molecular Engineering, University of Chicago) for the generous gift of the Jurkat and Raji cell lines used in this study. We thank A. Basu (Section of Genetic Medicine, Department of Medicine, University of Chicago) and H. Eckart (Section of Genetic Medicine, Department of Medicine, University of Chicago) for their advice on the Drop-seq protocol. We acknowledge the University of Chicago Genomics Facility, the University of Chicago Cytometry and Antibody Technology Facility and the University of Chicago Research Computing Center for their services. We thank U. Landegren (Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University) for advice on PLA. We thank Z. Ren and C. Chen at the CRI Bioinformatics Core for their advice on the alignment program. We thank A. A. Khan (Department of Pathology, University of Chicago) and D. Reiman (Department of Bioengineering, University of Illinois at Chicago) for discussions on data analysis. Automated library preparation was performed by the Cellular Screening Center at the University of Chicago. S.T. was awarded a National Institutes of Health (NIH) R01 grant GM127527 and the Paul G. Allen Distinguished Investigator Award, which supported this work. M.C. was awarded NIH R01 grants GM126553 and HG011883, and an NSF grant 2016307, which supported this work.

Author information




L.V., H.V.P., C.J., S.T.R. and S.T. conceived of and designed the project. L.V., H.V.P., B.K. and C.J. performed the experiments, designed the Prox-seq probe components and analyzed sequencing data. H.V.P. and M.C. performed statistical analysis. L.V., H.V.P. and S.T. wrote the manuscript. S.T. supervised the project. All authors reviewed the manuscript.

Corresponding author

Correspondence to Savaş Tay.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Nikolai Slavov, Chun Ye, and the other, anonymous, reviewer for their contribution to the peer review of this work. Primary Handling Editor: Lei Tang, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Jurkat cell protein expression levels.

Flow cytometry data showing Prox-seq probe binding on Jurkat cells. (a) Each T cell marker in the panel along with isotype controls. (b) The gating strategy to identify individual cells. CD45RA uses the mouse IgG2a control, CD147 uses the goat control, and the rest use the mouse IgG1 control.

Extended Data Fig. 2 Raji cell protein expression levels.

Flow cytometry data showing Prox-seq probe binding on Raji cells. (a) Each B cell marker in the panel along with isotype controls. (b) The gating strategy to identify individual cells. B7 and ICAM1 use the mouse IgG1 control, HLA-DR uses the mouse IgG2a control, PDL1 uses the mouse IgG2b control, and CD147 uses the goat control.

Extended Data Fig. 3 Comparison of protein quantification between Prox-seq and flow cytometry.

(a, b) Distribution of the protein abundance of Jurkat markers as measured by (a) Prox-seq and (b) flow cytometry. (c, d) Distribution of the protein abundance of Raji markers as measured by (c) Prox-seq and (d) flow cytometry. (e) Scatter plot comparing the median protein abundance measured by flow cytometry with that measured by Prox-seq. Each point indicates a protein. The plot also shows the Spearman’s correlation coefficient, ρ, between Prox-seq and flow cytometry measurements.
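Spearman’s ρ is the Pearson correlation of the ranks of the two measurements. This makes it insensitive to the very different scales of UMI counts and fluorescence intensities, which is why it suits the comparison in (e). The following is a minimal standard-library sketch in which tied values receive their average rank. The function names and the per-protein medians in the usage line are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-protein medians (Prox-seq UMIs vs. flow cytometry MFI)
print(spearman_rho([12, 40, 7, 150], [900, 2100, 300, 8800]))
```

If SciPy is available, `scipy.stats.spearmanr` computes the same statistic and also returns a p-value.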

Source data

Extended Data Fig. 4 Benchmarking of protein quantification based on PLA products.

(a) Schematic showing how free oligo binding can help measure non-proximal Prox-seq probes. After the ligation step of the standard Prox-seq protocol, free DNA oligos were added so that they could be ligated to probes that are bound to protein but are not proximal to another Prox-seq probe. Antibody cartoons were created with BioRender under an academic license. (b) Box plots showing the ratio of protein counts calculated from PLA products to protein counts calculated from both PLA products and non-proximal Prox-seq probes (n = 95 Jurkat cells and 93 Raji cells). The center line of the box indicates the median, the bottom and top bounds of the box indicate the 25th and 75th percentiles, and the whiskers extend to 1.5× the interquartile range beyond the box. (c) Scatter plots comparing protein quantification based on PLA products vs. the free oligo binding method for Jurkat cell markers. (d) Scatter plots comparing protein quantification based on PLA products vs. the free oligo binding method for Raji cell markers. (e) Scatter plots comparing CD147 protein quantification based on PLA products vs. the free oligo binding method for Jurkat and Raji cells. In (c–e), the numbers above each panel indicate the Pearson’s correlation coefficients. The free oligo-based estimates are obtained by counting the PLA UMIs that contain one barcode from the indicated protein and one barcode from the free oligo.
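The following sketch shows one way to compute the two per-protein estimates and the ratio plotted in (b) for a single cell. It assumes that a protein’s PLA-based count is the number of probe occurrences across ordered PLA products. Under this assumption, each UMI of A:B adds 1 to A and 1 to B, and an X:X UMI adds 2 to X. The free-oligo estimate follows the caption: it counts UMIs pairing the protein’s barcode with the free oligo barcode. The `FREE` label, the function names and the example counts are hypothetical.

```python
from collections import defaultdict

FREE = "FREE"  # hypothetical label for the free oligo barcode


def protein_counts(products):
    """Split per-protein counts into PLA-based and free-oligo-based estimates.

    products: dict mapping (barcode_A, barcode_B) -> UMI count for one cell.
    Assumption: a PLA product A:B contributes its UMIs once to A and once to B.
    """
    from_pla = defaultdict(int)
    from_free = defaultdict(int)
    for (a, b), umis in products.items():
        if FREE in (a, b):
            protein = b if a == FREE else a
            if protein != FREE:  # skip free-oligo:free-oligo products
                from_free[protein] += umis
        else:
            from_pla[a] += umis
            from_pla[b] += umis
    return dict(from_pla), dict(from_free)


def pla_fraction(from_pla, from_free, protein):
    """PLA-based count divided by (PLA-based + free-oligo-based) count."""
    pla = from_pla.get(protein, 0)
    total = pla + from_free.get(protein, 0)
    return pla / total if total else float("nan")


cell = {("CD3", "CD3"): 4, ("CD3", "CD28"): 2, ("CD3", FREE): 6, (FREE, "CD28"): 3}
pla, free = protein_counts(cell)
print(pla_fraction(pla, free, "CD3"), pla_fraction(pla, free, "CD28"))
```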

Source data

Extended Data Fig. 5 Analysis of Prox-seq probe non-specific background binding.

(a) Schematic of the experiment. Jurkat and Raji cells were separately incubated with the full or half Prox-seq probe panel, then combined and processed with the 10x Prox-seq pipeline. The half probe panel includes Jurkat markers CD28, PD1, and CD147 probes, and Raji markers HLADR and PDL1. The full probe panel contains all the probes in the half panel, plus Jurkat marker CD3 and Raji markers ICAM1 and B7. (b) t-SNE plot based on PLA product data showing the cell type and probe panel identity. F and H stand for full and half panels, respectively (n = 856, 2738, 1159 and 1051 single cells for Jurkat_F, Jurkat_H, Raji_F and Raji_H, respectively). (c) t-SNE plots showing the expression levels of CD3E and HLA-DRA genes. (d) Box plots showing the median counts of non-specific PLA products across different cell types and probe panels. The center line of the box indicates the median, the bottom and top bounds of the box indicate the 25th and 75th percentiles, and the whiskers extend to 1.5× the interquartile range beyond the box. Each black line connects the median counts of a non-specific PLA product in the half and full probe panels. n = 16 non-specific PLA products for both Jurkat and Raji clusters. (e) Violin plots showing the relative levels of Jurkat-specific PLA products CD28:CD28, PD1:PD1 and CD3:CD3, and Raji-specific PLA products HLADR:HLADR, PDL1:PDL1 and ICAM1:ICAM1. CD3:CD3 and ICAM1:ICAM1 PLA products were expected to be detected only in full probe panel clusters. (f) Heatmap showing the relative level (row z-score) of the top 10 PLA product markers of each of the 4 clusters identified in (c). (g) Heatmaps showing the average log-transformed levels of protein complexes in Jurkat cells. (h) Violin plot showing the normalized levels of protein complexes CD28:CD28 and CD28:PD1 in Jurkat cells. The normalized levels were calculated by scaling the counts of predicted protein complexes to counts per 10,000 UMIs, adding a pseudocount of 1 and log-transforming. The fold change was calculated by dividing the average normalized level of the Jurkat full panel cells by that of the Jurkat half panel cells.
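The normalization and fold change in (h) can be sketched as follows. Two assumptions apply here. First, the log is natural. Second, the per-cell total is the sum of that cell’s predicted protein complex counts. The helper `count_nonspecific` shows one reading of the "n = 16" in (d): ordered products within the 5-probe half panel that involve at least one marker absent from the cell type. This gives 5² − 3² = 16 for both Jurkat and Raji cells. That interpretation and all function names are illustrative assumptions.

```python
from math import log
from statistics import mean


def normalize_cell(counts, scale=1e4):
    """log(1 + counts per `scale` UMIs) for one cell's complex counts.

    Assumption: natural log; total = sum of this cell's complex counts.
    """
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: log(1 + v / total * scale) for k, v in counts.items()}


def fold_change(full_values, half_values):
    """Mean normalized level in full-panel cells over half-panel cells."""
    return mean(full_values) / mean(half_values)


def count_nonspecific(panel, absent_markers):
    """Ordered probe pairs in which at least one probe targets an absent marker."""
    return sum(
        1 for a in panel for b in panel
        if a in absent_markers or b in absent_markers
    )


half_panel = ["CD28", "PD1", "CD147", "HLADR", "PDL1"]
print(count_nonspecific(half_panel, {"HLADR", "PDL1"}))  # non-specific on Jurkat
print(count_nonspecific(half_panel, {"CD28", "PD1"}))    # non-specific on Raji
```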

Source data

Extended Data Fig. 6 Characterization of CD4 T cells in PBMCs using plate-based Prox-seq.

(a) Scatter plot showing two subpopulations of CD4 T cells, distinguished by the levels of CD9-related PLA products. (b) Violin plot showing that, unlike CD8 T cells, neither subpopulation of CD4 T cells expresses the protein complex CD9:CD8. (c) Violin plots showing the distribution of proteins CD3, CD4 and CD9 in the two subpopulations of CD4 T cells. Note that the complex detection algorithm assigns zero values to low-abundance PLA products that do not pass the statistical test.

Source data

Extended Data Fig. 7 Number of predicted protein complexes across cell types.

(a) t-SNE plot of PBMC clusters, obtained using mRNA expression levels. (b) Violin plots showing the number of predicted protein complexes per single cell, for each of the 8 clusters identified using mRNA data. The horizontal red lines indicate the total number of predicted protein complexes per cluster. In total, 61 protein complexes were detected across all 8 clusters, of which 37 complexes are unique. (c) Plot showing the number of protein complexes predicted by the algorithm at different numbers of cells. Here, various numbers of cells were randomly subsampled from each of the 8 clusters identified in (a), and the complex prediction algorithm was applied to the subsampled cells.
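The two counts in (b) differ because of how complexes shared between clusters are tallied. Summing the per-cluster totals counts a shared complex once for every cluster in which it appears. The size of the union counts it only once. The following sketch shows both counts, together with per-cluster random subsampling of the kind described for (c). The complex prediction algorithm itself is not reproduced here, and the inputs and function names are hypothetical.

```python
import random


def subsample_clusters(cells_by_cluster, n_cells, seed=0):
    """Randomly draw up to n_cells cell IDs from each cluster without replacement."""
    rng = random.Random(seed)
    return {
        cluster: rng.sample(cells, min(n_cells, len(cells)))
        for cluster, cells in cells_by_cluster.items()
    }


def total_and_unique(complexes_by_cluster):
    """Return (sum of per-cluster complex counts, number of distinct complexes)."""
    total = sum(len(s) for s in complexes_by_cluster.values())
    unique = len(set().union(*complexes_by_cluster.values()))
    return total, unique


# Hypothetical per-cluster predictions
predicted = {"0": {"CD9:CD8", "CD3:CD3"}, "3": {"CD3:CD3"}}
print(total_and_unique(predicted))  # CD3:CD3 counted twice in total, once in unique
```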

Source data

Extended Data Fig. 8 Analysis of LPS and PAM-treated macrophages.

(a–c) Receiver operating characteristic (ROC) curves of 5-fold cross-validation of a logistic regression classifier that is trained on (a) 5-minute data, (b) 2-hour data, and (c) 12-hour data. The black dashed lines in (a–c) indicate random classification. (d) Violin plots showing the log-transformed count of the top three PLA products of the logistic regression model that is trained on 2-hour data. P-values are calculated using two-sided Welch’s t-test (n = 31, 32, 32, and 36 single cells for the control, LPS, PAM, and both treatment groups, respectively). (e) Schematic showing how the logistic regression classifier is used to predict response (LPS-like, PAM-like, and mixed) in cells treated with both LPS and PAM after 2 h. (f) Bar plot showing the proportion of LPS/PAM-treated cells that show LPS-like, PAM-like and mixed response. n indicates the number of cells in each response group. (g) Violin plots showing the log-transformed count of the top logistic regression coefficients (Fig. 6b) in each predicted response group for cells treated with both ligands. (h, i) Heatmaps showing (h) the relative PLA product levels and (i) the relative protein levels of the LPS-like, PAM-like and mixed response groups. The PLA product (or protein) counts are log-transformed, then averaged by response group, and finally standardized. Hierarchical clustering is performed on the PLA products (or proteins) and response groups using Euclidean distance and complete linkage. (j) ROC curves of a logistic regression classifier trained on protein levels from different time points. The classifier is trained to predict whether a single cell was stimulated with LPS or PAM. Each ROC curve represents the mean ROC curve from 5-fold cross-validation. The area under the curve (AUC) metric of each time point is presented as mean ± s.d. of the AUC metrics from 5-fold cross-validation for that particular time point.
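Welch’s t-test, used in (d), does not assume equal variances between treatment groups. The statistic is the difference of means divided by the square root of the sum of each group’s variance-over-size. The degrees of freedom come from the Welch–Satterthwaite approximation. The following standard-library sketch computes the statistic and the degrees of freedom only. A two-sided p-value additionally needs the Student’s t distribution, for example via `scipy.stats.ttest_ind(x, y, equal_var=False)` if SciPy is installed. The input values in the usage line are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    variance() is the sample variance (n - 1 denominator).
    """
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical log-transformed PLA product counts for two treatment groups
control = [1.2, 0.8, 1.5, 1.1, 0.9]
lps = [2.3, 2.9, 2.1, 3.4, 2.6]
print(welch_t(control, lps))
```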

Source data

Extended Data Fig. 9 Analysis of sequencing depth.

(a–c) Effects of sequencing depth on (a) the number of detected genes and transcript counts per single cell, (b) the number of detected PLA products and their UMI counts, and (c) the number of detected proteins and protein UMI counts in 10x-based Prox-seq. (d) Effects of sequencing depth on automated cell type annotation based on mRNA data with the SingleR package. The cell type annotation at the maximum sequencing depth is used as the ground truth annotation. (e) Effects of sequencing depth on the number of detected protein complexes. Clusters were identified using mRNA data (see Extended Data Fig. 7). Clusters 0 and 3 were chosen as examples because they contained the most cells. In (a–e), the sequencing results of the mRNA and PLA product libraries from the 10x PBMC experiment were downsampled to 10%, 20%, 40%, 60%, and 80% to simulate different sequencing depths. (f, g) Effects of sequencing depth on (f) the number of detected PLA products and their UMI counts, and (g) the number of detected proteins and protein UMI counts in plate-based Prox-seq. In (f, g), the sequencing results of the mRNA and PLA product libraries from the plate-based PBMC experiment were downsampled to 0.5%, 1%, 5%, 10%, 25%, 50%, and 75% to simulate different sequencing depths. The red dashed lines in (e, f) indicate 10,000 mean reads per cell.
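One common way to simulate lower sequencing depth is to keep each read independently with a probability equal to the target fraction, then recount unique UMIs per cell and feature. Because a UMI is counted only once however many reads support it, UMI counts drop more slowly than read counts. The caption does not say which tool the authors used. This sketch therefore assumes that reads have already been parsed into hypothetical (cell barcode, feature, UMI) tuples and is an illustration, not their pipeline.

```python
import random


def downsample_umi_counts(reads, fraction, seed=0):
    """Bernoulli-subsample reads, then count unique UMIs per (cell, feature).

    reads: list of (cell_barcode, feature, umi) tuples, one per sequencing read.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    kept = [r for r in reads if rng.random() < fraction]
    counts = {}
    for cell, feature, _umi in set(kept):  # duplicate reads collapse to one UMI
        counts[(cell, feature)] = counts.get((cell, feature), 0) + 1
    return counts


reads = [("c1", "CD3:CD3", "AAA")] * 3 + [
    ("c1", "CD3:CD3", "CCC"),
    ("c2", "CD28:PD1", "GGG"),
]
for frac in (0.1, 0.5, 1.0):
    print(frac, downsample_umi_counts(reads, frac))
```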

Source data

Supplementary information

Supplementary information

Supplementary Figs. 1–11, captions for Supplementary Tables 1–12 and Supplementary Methods and references.

Reporting Summary

Supplementary Tables 1–12


Supplementary Data 1

List of identified protein complexes.

Supplementary Data 2

List of gene expression levels.

Source data

Source Data

Statistical source data for Figs. 1–6, Extended Data Figs. 3–9 and Supplementary Figs. 2–11.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Vistain, L., Van Phan, H., Keisham, B. et al. Quantification of extracellular proteins, protein complexes and mRNAs in single cells by proximity sequencing. Nat Methods 19, 1578–1589 (2022).


