Introduction

Breast cancer is a malignant tumor of the breast that usually originates in glandular tissue and can arise in the lobules, ducts, or glandular interstitium [1]. According to the global cancer statistics published in 2020 by the World Health Organization's International Agency for Research on Cancer (IARC), breast cancer was estimated to account for more than 2.3 million new cases and 685,000 deaths worldwide, surpassing lung cancer as the most commonly diagnosed cancer [2]. At present, the main clinical treatments for breast cancer include surgery, radiotherapy, chemotherapy, endocrine therapy and targeted therapy [3]. Tumor immunotherapy eliminates tumor cells mainly by reactivating the host's anti-tumor immune response. In recent years, it has achieved breakthrough progress and has been widely used in melanoma, non-small cell lung cancer, gastric cancer and other malignancies, attracting great attention for its clinical benefits [4]. Breast cancer has traditionally been regarded as a tumor of low immunogenicity, but recent research shows that breast cancer patients may also benefit from immunotherapy, which has shown particular promise in patients who are resistant to chemotherapy or endocrine therapy [5, 6]. Although immunotherapy offers new hope to patients with advanced breast cancer, the low response rate remains a challenge that restricts its wide application in this disease [7, 8]. Studies have shown that remodeling of the tumor immune microenvironment is an important determinant of the efficacy of immunotherapy in breast cancer [9].
Therefore, an objective, multidimensional and accurate characterization of the breast cancer immune microenvironment shaped by key immunomodulatory molecules, carried out with up-to-date technologies and combined with the targeted development of new clinical strategies, should help to improve the low response of breast cancer patients to immunotherapy and thereby create more opportunities for survival. Such work is of both scientific and clinical value.

CD24, a small glycosylated cell adhesion protein, is closely related to the occurrence, progression and prognosis of breast cancer [10,11,12]. Recent studies have shown that CD24 has an important remodeling effect on the breast cancer immune microenvironment [12]. Nevertheless, the impact of CD24 on the immune microenvironment in breast cancer remains debated. While CD24 is a well-recognized marker for breast cancer stem cells, which typically contribute to the creation of an immunosuppressive microenvironment [13], recent studies have reported that CD24 overexpressed in breast cancer interacts with Siglec-10 on macrophages, thereby promoting immune evasion and shaping an immunosuppressive microenvironment [12]. Therefore, whether CD24 expression or its overexpression mediates the inhibitory remodeling of the breast cancer immune microenvironment urgently needs to be elucidated with up-to-date technologies, so that new therapeutic strategies targeting CD24, a key regulator of tumor immune microenvironment remodeling, can be developed and the clinical application of immunotherapy in breast cancer improved. Recently, emerging multiomics technologies, including single-cell sequencing, bulk RNA sequencing, and multicolor immunofluorescence labeling, have shown great advantages in comprehensively and objectively revealing the tumor immune microenvironment [14,15,16]. However, no studies have applied these techniques to comprehensively characterize the CD24-mediated immune microenvironment.

In this study, we used single-cell RNA sequencing (scRNA-seq) to create transcriptomic maps of breast cancer samples, revealing a variety of cell types. Subsequent analyses focused on epithelial cells, using methods including CellChat and pseudo-time trajectory analysis. Further, we used bulk RNA-seq data from The Cancer Genome Atlas (TCGA) to construct a CD24-associated breast cancer prognostic model and confirmed the predictive value of this signature using the GSE20685 dataset. Finally, we used multicolor immunofluorescence to show that CD24+PANCK+ breast cancer cells in the tumor microenvironment can induce CD8+ T cell exhaustion, leading to an inhibitory immune microenvironment and poor prognosis in breast cancer. This suggests that targeted inhibition of CD24 may improve tumor immunotherapy for breast cancer by alleviating CD8+ T cell exhaustion. The workflow diagram is shown in Fig. 1.

Fig. 1

Workflow diagram of the study

Methods and materials

Data collection and preprocessing

The scRNA-seq expression data of GSE148673, which included six breast cancer samples, were obtained from TISCH in h5 format (http://tisch.comp-genomics.org/). The Seurat package in R software (version 4.1.3) was used to process the raw data of each sample. Cells with fewer than 300 detected genes and samples with fewer than 3 cells were excluded. After quality control, which included removal of low-quality cells, data standardization and normalization, a total of 10,114 cells were retained for subsequent analysis. Bulk RNA sequencing data in FPKM format and the corresponding clinical information of breast cancer were obtained from the publicly available GDC-TCGA project in the UCSC Xena browser (https://xenabrowser.net/datapages/). After excluding normal tissue, duplicate tissue from the same patient, and tissue with incomplete survival information, 1075 breast cancer samples were obtained for subsequent analysis. The GSE20685 cohort, comprising 327 breast cancer patients, was obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) as an external validation cohort. Detailed clinical information for all cohorts is provided in Table 1.
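The gene-count filter described above can be illustrated with a minimal sketch. The study itself used the Seurat package in R; the Python function below, its name `filter_cells`, and the toy data are hypothetical and only show the thresholding logic, assuming that a gene counts as "detected" when its count is greater than zero.

```python
def filter_cells(counts, min_genes=300):
    """Keep cells with at least `min_genes` detected genes.

    `counts` maps a cell barcode to a {gene: count} dict.
    Assumption: a gene is "detected" when its count is > 0.
    """
    kept = {}
    for barcode, gene_counts in counts.items():
        detected = sum(1 for c in gene_counts.values() if c > 0)
        if detected >= min_genes:
            kept[barcode] = gene_counts
    return kept


# Toy data (hypothetical): three cells with 350, 120 and 300 detected genes.
toy = {
    "cell_a": {f"g{i}": 1 for i in range(350)},
    "cell_b": {f"g{i}": 1 for i in range(120)},
    "cell_c": {f"g{i}": (1 if i < 300 else 0) for i in range(400)},
}
kept = filter_cells(toy)
print(sorted(kept))
```

Only the cell with 120 detected genes falls below the threshold of 300 and is dropped.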

Table 1 The clinical features of breast cancer in TCGA cohort and GSE20685

Identification of cellular types and differentially expressed genes

Cell clusters within the single-cell RNA-seq dataset were initially identified using dimensionality reduction clustering with t-SNE [17]. Subsequently, cell type annotation was performed with the SingleR package to determine cell identities [18]. Marker genes for all annotated cell subpopulations were identified using the FindAllMarkers function, and the reliability of the annotations was verified against the CellMarker database (http://117.50.127.228/CellMarker/). Following annotation, the cells were divided into immune and non-immune subsets based on the expression level of specific immune markers, such as CD45 (PTPRC). Differential gene expression analysis was then performed to identify CD24-related differential genes within these subsets.

Pseudo-temporal trajectory analysis at the single-cell level

Pseudo-time trajectory analysis, performed with the Monocle2 package, offers insights into cellular dynamics and transitions [19]. To reduce the dimensionality of the differentially expressed genes, we employed the reduceDimension function with reduction_method set to "DDRTree" and max_components set to 2. Moreover, the plot_genes_jitter and plot_genes_violin functions were applied to examine cell ordering and changes in gene expression.

Analysis of interactions between cells

Interactions between ligands and cell surface receptors play a pivotal role in a multitude of biological processes by facilitating cell-to-cell communication. To elucidate potential intercellular communication among different cell types, the CellChat package was utilized to scrutinize ligand-receptor interactions within specific cellular populations [20]. The createCellChat function was employed to instantiate a cell communication object, laying the groundwork for an extensive analysis. The expression data of signaling genes were carefully extracted and processed, leading to the subsequent identification of overexpressed genes and ligand-receptor pairs. These discoveries were subsequently mapped onto the Protein–Protein Interaction network, thereby facilitating a more profound comprehension of potential signaling interactions. Cell-to-cell communication probabilities were computed, thus revealing a network of cellular interactions. In order to enhance data reliability, communication relationships involving fewer than three cells were systematically excluded. Furthermore, we extended our analysis to decode cell–cell communication at the signal pathway level, enabling computation and visualization of communication networks at an aggregated cell level.

Construction and evaluation of prognostic signature

CD24-related differentially expressed genes (DEGs) from the scRNA-seq analysis were extracted in the TCGA cohort for the construction of the prognostic model. Univariate Cox regression analysis was performed with the survival package to screen for DEGs associated with breast cancer prognosis in the TCGA cohort. Then, least absolute shrinkage and selection operator (LASSO) Cox regression analysis was conducted with the glmnet package to further select the characteristic genes associated with prognosis [21]. In the TCGA training cohort (N = 755), multivariate Cox regression analysis was used to construct a prognostic model and obtain the regression coefficients of the prognostic genes. The TCGA test cohort (N = 320) and the GSE20685 cohort (N = 327) were used as validation cohorts to verify the accuracy of the model. The formula of the prognostic model is shown below:

$$\text{Risk score} = \sum_{i=1}^{n} \left(\text{Expression}_i \times \text{coef}_i\right)$$

The risk score for each sample was then calculated using the predict function in the survival package, and all breast cancer patients were divided into high- and low-risk groups based on the median risk score. Kaplan–Meier survival curves were constructed using the "survminer" and "survival" packages to compare overall survival between the high- and low-risk groups. Furthermore, risk curves for all cohorts and survival status maps for all patients were visualized using the pheatmap package. The accuracy of the prognostic model was appraised by generating receiver operating characteristic (ROC) curves and computing area under the curve (AUC) values [22].
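The median split and the Kaplan–Meier estimate can be sketched in a few lines. This is a pure-Python illustration, not the survival/survminer workflow used in the study. The function names, the toy data, and the tie rule (a score exactly equal to the median is assigned to the low-risk group) are assumptions.

```python
from statistics import median


def split_by_median(risk_scores):
    """Label each patient 'high' or 'low' relative to the median risk score.

    Assumption: scores equal to the median go to the low-risk group.
    """
    cutoff = median(risk_scores.values())
    return {pid: ("high" if s > cutoff else "low")
            for pid, s in risk_scores.items()}


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    `events` uses 1 for death and 0 for censoring.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all subjects sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Hypothetical toy data.
groups = split_by_median({"p1": 0.2, "p2": 1.5, "p3": 0.9, "p4": 2.0})
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(groups)
print(curve)
```

Applying `kaplan_meier` separately to the high- and low-risk groups gives the two curves that a log-rank test would then compare.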

Construction of a nomogram integrating clinical variables

To enhance the precision of the prognostic model, a nomogram integrating the risk score and clinical variables such as age and TNM stage was developed with the rms package [23]. The process began with univariate and multivariate Cox regression analyses to evaluate the independence of the risk score and clinical variables as prognostic factors. The predictive performance of the nomogram was compared with that of other prognostic factors through receiver operating characteristic (ROC) analysis and decision curve analysis (DCA), using the timeROC and ggDCA packages, respectively.

Evaluation of immune landscape in high- and low-risk group

To explore the biological functions of the different risk groups, gene set enrichment analysis (GSEA) was performed with the org.Hs.eg.db and clusterProfiler packages, using the gene set "c2.cp.kegg.v7.4.symbols.gmt" obtained from the MSigDB database (https://www.gsea-msigdb.org/gsea/msigdb/index.jsp) [24]. Given the critical role of the tumor microenvironment (TME) in tumor immunotherapy [9], we used the ESTIMATE package to assess TME composition [25]. This entailed calculating immune scores, stromal scores, and ESTIMATE scores, which estimate the levels of stromal and immune cells within malignant tumor tissues based on gene expression characteristics. In addition, the CIBERSORT algorithm was applied to the transcriptome data to quantify the infiltration levels of 22 immune cell types in each sample [26]. After comparing these TME and immune infiltration indicators between the high- and low-risk groups, Pearson correlation analysis was used to examine the correlation between immune cell infiltration level and risk score.
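As a small illustration of the final step, the Pearson correlation between an immune cell infiltration estimate and the risk score could be computed as follows. The function name and the toy values are hypothetical; the study performed this analysis in R.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical toy values: infiltration falls as the risk score rises.
risk = [0.5, 1.0, 1.5, 2.0, 2.5]
cd8_fraction = [0.30, 0.24, 0.20, 0.12, 0.10]
print(pearson_r(risk, cd8_fraction))
```

A negative coefficient here corresponds to lower estimated infiltration in samples with higher risk scores.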

Estimation of immunotherapy response

Following analysis of the immune landscape across risk groups, we investigated their immunotherapy responsiveness by assessing immune checkpoint expression and the Immunophenotype Score (IPS). The IPS, a well-established predictor of response to CTLA-4 and PD-1 blockade, was acquired from the TCIA database (https://tcia.at/) and used to predict the responses of the high- and low-risk groups to immune checkpoint inhibitor therapy [27].

Analysis of multiplexed quantitative immunofluorescence and spatial proximity

A total of 62 breast cancer samples from patients who had given informed consent and had not received drug treatment underwent TissueFAXS panoramic tissue quantitative analysis. Multiplex immunofluorescence staining of tissues used the Alpha TSA Multiplex IHC Kit (Beijing, China) with the following labels: XTSA 570 (CD3), XTSA 480 (CD8), XTSA 620 (CD24), XTSA 690 (FOXP3), and XTSA 780 (PANCK). The samples were deparaffinized using xylene and anhydrous ethanol, followed by rehydration with ethanol gradients. After two 5-min rinses with distilled water, antigen retrieval was performed using the provided solution. After cooling, samples underwent three PBST rinses and a 15-min immersion in blocking solution. Primary antibody incubation took place at 37 ℃ for 1 h, followed by three PBST washes. Subsequently, samples were incubated with secondary antibodies at 37 ℃ for 10 min, rinsed three times with PBST, and subjected to a 5-min fluorescent dye treatment at room temperature. This staining cycle was repeated until all markers were labeled. Finally, DAPI nuclear staining was performed for 8 min at room temperature, followed by PBST rinsing and slide sealing for subsequent scanning. The ZEISS Axioscan7 whole-slide imaging system in conjunction with ZEN 3.3 software was used for image acquisition and analysis. Quantitative assessments, including measurements of nuclear area, fluorescence intensity, and cell density within individual cell areas to identify positive cells, were performed using StrataQuest software (TissueGnostics). The software was also used to quantify the number of CD3+CD8+FOXP3+ T cells and CD3+CD8+FOXP3− T cells within a 10 μm distance from CD24+PANCK+ cells [28].
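The proximity count in the last step can be sketched as follows. The sketch assumes that each cell is reduced to a centroid in micrometres and that distance is measured between centroids. How StrataQuest measures distance internally is not stated in the text, and the function name and coordinates are hypothetical.

```python
from math import hypot


def count_t_cells_near(tumor_xy, t_cells, radius_um=10.0):
    """Count FOXP3+ and FOXP3- CD8+ T cells near CD24+PANCK+ cells.

    tumor_xy: list of (x, y) centroids of CD24+PANCK+ cells, in micrometres.
    t_cells:  list of (x, y, foxp3_positive) for CD3+CD8+ T cells.
    Returns (n_foxp3_pos, n_foxp3_neg, fraction_foxp3_pos); the fraction
    is None when no T cell lies within the radius.
    """
    n_pos = n_neg = 0
    for x, y, foxp3_positive in t_cells:
        near = any(hypot(x - tx, y - ty) <= radius_um for tx, ty in tumor_xy)
        if not near:
            continue
        if foxp3_positive:
            n_pos += 1
        else:
            n_neg += 1
    total = n_pos + n_neg
    return n_pos, n_neg, (n_pos / total if total else None)


# Hypothetical coordinates in micrometres.
tumor = [(0.0, 0.0), (100.0, 100.0)]
t_cells = [
    (3.0, 4.0, True),      # 5 um from the first tumor cell
    (6.0, 8.0, False),     # exactly 10 um away, counted as near
    (50.0, 50.0, True),    # far from both tumor cells
    (105.0, 100.0, True),  # 5 um from the second tumor cell
]
print(count_t_cells_near(tumor, t_cells))
```

The brute-force pairwise search is adequate for a sketch; a whole-slide analysis with many cells would normally use a spatial index instead.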

Statistical analysis

The statistical analysis in this study was carried out using R software (version 4.1.3) along with relevant R packages obtained from Bioconductor and CRAN. Differences between groups were assessed using the Wilcoxon test for two-group comparisons and the Kruskal–Wallis test for analyses involving more than two groups. Correlation analyses were performed using the Pearson test. Statistical significance was established at a threshold of P < 0.05.
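For two independent groups, the Wilcoxon rank-sum test is equivalent to the Mann–Whitney U test. The sketch below computes only the U statistic by direct pair counting, assuming unpaired samples; it does not compute a P value. The function name and the toy values are hypothetical.

```python
def mann_whitney_u(a, b):
    """U statistic for sample `a` versus sample `b`.

    Counts the pairs (x in a, y in b) with x > y; ties contribute 0.5.
    """
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical immune scores in two risk groups.
low_risk = [1200.0, 950.0, 1430.0, 1010.0]
high_risk = [640.0, 880.0, 720.0, 990.0]
u_low = mann_whitney_u(low_risk, high_risk)
u_high = mann_whitney_u(high_risk, low_risk)
print(u_low, u_high)
```

The two U values always sum to the product of the group sizes, which is a convenient sanity check.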

Results

Cellular constitution of breast cancer

In this study, six single-cell samples from breast tumors within the GSE148673 dataset were analyzed. Clustering analysis divided the 10,114 cells into 24 distinct clusters, for which the major marker genes were identified (Fig. 2A, B). These clusters were then categorized into immune and non-immune subgroups according to PTPRC expression (Fig. 2C, D). Subsequent annotation with SingleR revealed 9 distinct cell types based on marker gene expression (Fig. 2E, F), including B cells (elevated IGHG1), chondrocytes (elevated TEKT3), endothelial cells (elevated VWF), epithelial cells (elevated TFF3), fibroblasts (elevated COL3A1), macrophages (elevated TYROBP), monocytes (elevated ASPM), T cells (elevated CD2), and tissue stem cells (elevated RGS5). Moreover, the expression patterns of CD24 and EPCAM within these 9 cell types were assessed. Both CD24 and EPCAM were highly expressed in epithelial cells, suggesting a possible co-expression relationship (Fig. 3).

Fig. 2

Single-cell atlas of breast cancer. A t-SNE and UMAP plots for 24 clusters. B Heatmap illustrating the top five marker genes within 24 clusters. C Bubble chart depicting 9 genes across 24 clusters. D t-SNE plots of immune and non-immune subgroups. E t-SNE plot annotating 9 distinct cell types in single-cell RNA sequencing. F Violin plot depicting the principal marker gene of each of the 9 cell types

Fig. 3

Expression of CD24 and EPCAM. A Feature plots showing the CD24 and EPCAM expressions across 9 cell types. B Single-cell density plots of CD24 and EPCAM. C Single-cell heatmap of CD24, EPCAM, and their co-expression

Cell trajectory analysis of epithelial cells in scRNA-seq

To characterize the developmental stages of epithelial cell subsets and assess CD24 expression at different stages, we used Monocle2 to conduct pseudo-time analysis. The results revealed that the 12 clusters of epithelial cells could be broadly categorized into 5 distinct differentiation states (Fig. 4A). The pseudo-time ordering showed that the cells in cluster 2 and cluster 16 were at an early stage of development, while the cells in cluster 0 were at a late stage, suggesting that the cells in cluster 0 likely developed from cluster 2 (Fig. 4A). Furthermore, the expression levels of CD24 and EPCAM changed along the developmental trajectory, with both following similar trends (Fig. 4B). Specifically, the expression of CD24 and EPCAM decreased in the early stage of development (state 1), but progressively increased throughout development and reached its maximum at the end of development (state 3).

Fig. 4

Trajectory analysis of epithelial cells in breast cancer. A Trajectory plots showing different clusters in epithelial cells. B Dynamic expression of CD24 and EPCAM along pseudo time

Analysis of interactions between cells

To investigate communication among cell types, we used CellChat to identify ligand-receptor pairs and signaling pathways (Fig. 5A, B). The results showed a direct and strong interaction between CD24+ epithelial cells and T cells (Fig. 5C, D). In particular, the interaction of CD8+ T cells with CD24+ epithelial cells was significantly stronger than that with CD24− epithelial cells (Fig. 5E, F). The MIF-(CD74 + CXCR4) and MIF-(CD74 + CD44) ligand-receptor pairs played a central role in mediating communication between CD24+ epithelial cells and CD8+ T cells (Fig. 5G). Furthermore, when CD24+ epithelial cells acted as the major signal senders, their communication with immune cells, notably CD8+ T cells, appeared to occur through the MK signaling pathway (Fig. 5H). Further analysis showed that core components of the MK signaling pathway, including SDC4 and ITGA6, were specifically highly expressed in CD24+ epithelial cells (Fig. 5I). In addition, we investigated the potential signaling pathways of these cells as signal senders and signal receivers. Notably, contributors to the outgoing signaling of the CD24+ epithelial cell population included MIF, MK, VISFATIN, GRN, ncWNT, and WNT. Conversely, MK, PARs, IFN-II, VISFATIN, GRN, EGF, TWEAK, ncWNT, and WNT were the primary contributors to its incoming signaling (Fig. 5J).

Fig. 5

Analysis of intercellular communication. A The number of interactions in intercellular communication networks between 9 different cell types. B The interaction weights in intercellular communication networks between 9 different cell types. C The number of interactions in intercellular communication networks between epithelial cells and different T-cell subtypes. D The interaction weights in intercellular communication networks between epithelial cells and different T-cell subtypes. E The number of interactions of CD24+ epithelial cells with CD8+ T cells and regulatory T cells in intercellular communication networks. F The interaction weights of CD24+ epithelial cells with CD8+ T cells and regulatory T cells in intercellular communication networks. G Bubble plot of ligand-receptor pair-mediated interactions between CD24+ epithelial cells and different T-cell subtypes. H Heatmap depicting ligand-receptor interaction patterns in the MK signaling pathway among diverse cell types. I Violin plot illustrating gene expression in the MK signaling pathway. J Heatmaps showing the correlation of seven cell classes with multiple signaling pathways in both outgoing and incoming signaling patterns

Construction and validation of prognostic signature

In scRNA-seq, we identified 1,488 CD24-associated differentially expressed genes (DEGs) in immune and non-immune cell subpopulations (Supplementary Fig. 1). In order to further explore the clinical significance of these genes, we constructed a prognostic model of these genes in a cohort of breast cancer patients in TCGA. Specifically, 110 prognostic genes were identified by univariate Cox regression analysis in the TCGA cohort, and the gene dimensions were further reduced to 27 prognostic trait genes by LASSO regression (Fig. 6A, B). Subsequently, a prognostic model containing 15 prognostic genes was constructed in the TCGA training cohort (N = 755) by multivariate Cox regression analysis (Fig. 6C). The TCGA test cohort (N = 320) and the GSE20685 test cohort (N = 327) were used to verify the accuracy of the model. The risk score was calculated according to the following formula: CD24-related risk score = CD24*0.163877443425818 + MAL2*0.235684100040366 + GIMAP7*0.585242039122506 + TNN*(−0.209860274669745) + NT5E*0.379813027325492 + OFD1*(−0.46131900865562) + DLG3*0.538438513913014 + EIF4EBP1*0.168560977772221 + FABP7*(−0.190743699523104) + LCP1*(−0.316385097230447) + IFITM1*(−0.149289394352447) + HSPA2*(−0.157055037715933) + JAK1*(−0.298906770713586) + KLRB1*(−0.571931971007859) + ARID1B*0.613982854705078.
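The risk score formula above is a plain weighted sum, which can be written out directly. The coefficients below are copied from the formula in the text. The function name and the example expression profile are hypothetical, and the sketch assumes that expression values are on the same scale as the data used to fit the model.

```python
# Coefficients of the 15-gene model, copied from the formula in the text.
COEFS = {
    "CD24": 0.163877443425818,
    "MAL2": 0.235684100040366,
    "GIMAP7": 0.585242039122506,
    "TNN": -0.209860274669745,
    "NT5E": 0.379813027325492,
    "OFD1": -0.46131900865562,
    "DLG3": 0.538438513913014,
    "EIF4EBP1": 0.168560977772221,
    "FABP7": -0.190743699523104,
    "LCP1": -0.316385097230447,
    "IFITM1": -0.149289394352447,
    "HSPA2": -0.157055037715933,
    "JAK1": -0.298906770713586,
    "KLRB1": -0.571931971007859,
    "ARID1B": 0.613982854705078,
}


def cd24_risk_score(expression):
    """Weighted sum of gene expression values using the model coefficients.

    `expression` maps gene symbol to expression value; all 15 model genes
    must be present.
    """
    missing = [gene for gene in COEFS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(COEFS[gene] * expression[gene] for gene in COEFS)


# Hypothetical profile: every model gene at an expression value of 1.0.
example = {gene: 1.0 for gene in COEFS}
print(cd24_risk_score(example))
```

Genes with positive coefficients raise the score as their expression increases, while genes with negative coefficients lower it.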

Fig. 6

Establishing the CD24-related prognostic model. A Construction of the LASSO regression model. B LASSO Cox regression coefficient profiles. C Coefficients of the genes in the prognostic model. D TCGA training set, E TCGA testing set, F entire TCGA set, and G GEO testing set: risk score distribution, overall survival time in relation to risk score, time-dependent ROC curves for 1, 3, and 5 years, and Kaplan–Meier overall survival (OS) curves of breast cancer patients

The patients were stratified into high- and low-risk groups using the median risk score as the cutoff. Overall survival was significantly lower in the high-risk group than in the low-risk group in both the training and test cohorts. In the TCGA training cohort, time-dependent ROC curve analysis confirmed that the prognostic model had good predictive accuracy, with AUC values of 0.683 (1 year), 0.797 (3 years) and 0.792 (5 years) (Fig. 6D). In the TCGA test cohort, the AUC values were 0.851 (1 year), 0.767 (3 years) and 0.706 (5 years) (Fig. 6E). This performance was also validated in the entire TCGA cohort and the GSE20685 test cohort (Fig. 6F, G).

Nomogram establishment and evaluation

To assess the potential independent prognostic value of the CD24-related risk score and clinical variables, we conducted univariate and multivariate Cox regression analyses (Table 2). Univariate Cox regression analysis showed that risk score, age, stage and TNM stage were closely related to the prognosis of breast cancer (Fig. 7A). Multivariate Cox regression analysis indicated that the risk score, along with age and N stage, could serve as independent prognostic factors for breast cancer patients (Fig. 7B). To improve the accuracy of the prognostic model, we constructed a nomogram that quantitatively predicts survival of breast cancer patients by integrating these clinical variables and the risk score (Fig. 7C, D). Compared with individual clinical features and the risk score alone, the nomogram showed higher predictive accuracy, with AUC values of 0.814 (1 year), 0.803 (3 years), and 0.791 (5 years) in the TCGA cohort (Fig. 7E). Additionally, decision curve analysis (DCA) confirmed the superior predictive performance of the nomogram compared with other prediction indices (Fig. 7F).

Table 2 Univariate Cox regression and multivariate Cox regression of risk score and clinical characteristics

Fig. 7

Creation of the nomogram based on the prognostic model and clinical information. A Univariate analysis forest plot for riskScore and clinical features. B Multivariate analysis forest plot for riskScore and clinical features. C Nomogram including riskScore and clinical parameters for predicting breast cancer prognosis probability. D Calibration curves based on the nomogram. E ROC curve analysis of the nomogram for 1, 3, and 5-year overall survival. F Decision curve analysis for 1, 3, and 5-year overall survival

Analysis of immune characteristics in breast cancer

GSEA revealed significant differences in the biological processes enriched in the high-risk and low-risk groups. The high-risk group was characterized by DNA replication, homologous recombination, mismatch repair, protein export, and terpenoid backbone biosynthesis (Fig. 8A). In contrast, the low-risk group was significantly enriched in pathways such as primary immunodeficiency and T cell receptor signaling (Fig. 8A), indicating a potential link between the low-risk group and immunity. To further explore this relationship, we analyzed the immune landscape, focusing on the tumor microenvironment (TME) and immune infiltration (Fig. 8B). The results indicated notable differences between the low-risk and high-risk groups. Specifically, the ESTIMATE score, immune score, and stromal score were significantly higher in the low-risk group than in the high-risk group (Fig. 8C), suggesting that the low-risk group had a higher proportion of stromal and immune cells within the TME. We then explored the infiltration of 22 immune cell types in the two risk groups using the CIBERSORT algorithm. Higher infiltration of naive B cells, CD8 T cells, activated CD4 memory T cells, gamma delta T cells, monocytes and activated dendritic cells was observed in the low-risk group, while M0 and M2 macrophages showed higher infiltration in the high-risk group (Fig. 8D). Consistently, the infiltration levels of naive B cells, activated dendritic cells, monocytes, activated CD4 memory T cells, CD8 T cells, and gamma delta T cells were negatively correlated with the risk score (Fig. 8E).

Fig. 8

Association between the prognostic model and infiltration of immune cells. A GSEA analysis in the low- and high-risk groups. B Heatmap for assessing immune infiltration levels in high- and low-risk groups. C Stromal and immune cell proportions in breast cancer. D Assessment of infiltration levels of 22 immune cell types in high- and low-risk groups. E Correlation analysis between riskScore and the infiltration levels of 6 immune cell types

The association between the risk group and immunotherapy response

Having identified substantial disparities in the immune landscape between the low-risk and high-risk groups, we proceeded to investigate their connection with immunotherapy response using various indicators. Initially, we investigated the expression of immune checkpoints and observed a significant increase in the expression of 36 immune checkpoints within the low-risk group (Fig. 9A). Additionally, we employed IPS assessments of PD1 and CTLA4 as quantitative measures to further gauge the potential efficacy of immune checkpoint inhibitors. Our results revealed that the IPS-CTLA4, IPS-PD1, and IPS-PD1-CTLA4 scores were notably elevated in the low-risk group (Fig. 9B). This suggested that patients in the low-risk group might exhibit a higher likelihood of responding positively to immune checkpoint inhibitors.

Fig. 9

Immune checkpoint inhibitor treatment responsiveness prediction. A Analysis of immune checkpoint expression levels in high-risk and low-risk groups. B The Immunophenotype Score (IPS) determined by CTLA-4 or PD-1 status assessed the effectiveness of immune checkpoint inhibitor therapy

Potential immunosuppressive domestication of exhausted CD8+ T cells by CD24+PANCK+ cell subsets in the tumor microenvironment

Based on the findings of the preceding cell communication analysis, we further used multicolor immunofluorescence to examine the spatial interaction between CD24+ breast cancer cells (CD24+PANCK+ cells) and CD8+ T cells. The presence and exhaustion status of CD8+ T cells within the tumor immune microenvironment play a pivotal role in the response to tumor immunotherapy. Research has shown that the exhaustion of CD8+ T cells is a dynamic process and that only completely exhausted CD8+ T cells entirely lose their ability to eliminate tumor cells. Recent studies have indicated that FOXP3 positivity marks completely exhausted CD8+ T cells that are exclusively induced by the tumor immune microenvironment, thus providing strong evidence for the immunosuppressive domestication of CD8+ T cells by a specific subpopulation of cancer cells. Consequently, we employed TissueFAXS Cytometry panoramic tissue quantification assays to investigate the spatial correlation between CD24+PANCK+ cells and exhausted FOXP3+ CD8+ T cells, aiming to uncover the impact of CD24+ breast cancer cell subpopulations on CD8+ T cells in the breast cancer microenvironment (Fig. 10A, B). Within a 10 μm radius of CD24+PANCK+ cells, the average intensity of CD8 in the CD3+CD8+FOXP3+ group was significantly lower than in the CD3+CD8+FOXP3− group (Fig. 10C, D), and the proportion of exhausted FOXP3+ CD8+ T cells among the total CD8+ T cell population was notably increased (Fig. 10E). These findings indicated that CD24+PANCK+ cells had an immunosuppressive domestication effect on CD8+ T cells in the breast cancer immune microenvironment. We further evaluated the effects of CD3+CD8+FOXP3+ T cells and CD24+PANCK+ cells on the survival of breast cancer patients.
Both the group with high numbers of CD24+PANCK+ cells and the group with a high percentage of CD3+CD8+FOXP3+ T cells (within 10 μm of the CD24+PANCK+ cell subpopulation) had a worse survival probability (Fig. 10F, G), which further supported the conclusion that CD24+ breast cancer cells contributed to the establishment of an immunosuppressive microenvironment, ultimately leading to a worse prognosis for breast cancer patients. Detailed clinical information on the cohort is shown in Table 3.

Fig. 10

Spatial distribution of CD24+ breast cancer subsets and CD8+ T-cell subtypes in the TME. A Representative multiplex staining of breast cancer patient samples: DAPI (bluish violet), CD3 (green), CD8 (blue), FOXP3 (pink), PANCK (yellow), CD24 (orange). B Simulation depicting spatial proximity analysis in representative areas. C Scatterplots of the mean CD8 intensity within the area of CD3+CD8+FOXP3+ cells and CD3+CD8+FOXP3− cells. D Violin plots of mean CD8 intensity in different cell types within 10 μm of CD24+PANCK+ cells. E Violin plots of the percentage of different cell types within 10 μm of CD24+PANCK+ cells. F Survival curves for the groups with high and low numbers of CD24+PANCK+ cells within a 10 μm range (cutoff set at 2114). G Survival curves for the groups with high and low percentages of CD3+CD8+FOXP3+ cells within a 10 μm range (cutoff set at 0.7321)

Table 3 Detailed clinical information of the spatial proximity analysis cohort

Discussion

Breast cancer is a remarkably heterogeneous malignancy that originates from breast tissue [29, 30]. Although surgery, chemotherapy, radiotherapy and emerging immunotherapy approaches have significantly improved the prognosis of breast cancer, clinical problems such as recurrence, drug resistance and low response rates still reduce the overall survival of breast cancer patients [31,32,33]. Recent research shows that immunotherapy holds great promise for breast cancer patients, especially those who have developed resistance to chemotherapy or targeted therapies [6]. However, the low response rate remains a challenge that restricts the wide application of tumor immunotherapy in breast cancer patients [8]. Addressing this challenge requires extensive research to identify new therapeutic targets and to develop personalized immunotherapy strategies tailored to the specific characteristics of breast cancer patients. Weissman et al. demonstrated that CD24 may be a major innate immune checkpoint in breast cancer and a promising target for immunotherapy [12]. Inhibition of CD24-Siglec-10 signaling with monoclonal antibodies enhances phagocytic clearance of CD24+ cancer cells by macrophages, highlighting the substantial therapeutic potential of CD24 blockade in immunotherapy. In addition, CD24 has been shown to exert a pro-apoptotic role in B-cell precursors and differentiated B cells by activating the MAPK pathway [34, 35] and to suppress immune responses by modulating T-cell activity [36]. However, the specific mechanism by which CD24+ tumor cells regulate the activity of CD8+ T cells remains unclear. 
In this study, we integrated single-cell and bulk RNA sequencing analyses to reveal the heterogeneity and potential clinical application of CD24 in the immune microenvironment of breast cancer. Using multiplex immunofluorescence, we also found that CD24+ tumor cells shape an immunosuppressive microenvironment by mediating the exhaustion of CD8+ T cells.

We identified 15 prognostic genes associated with CD24: CD24, MAL2, GIMAP7, TNN, NT5E, OFD1, DLG3, EIF4EBP1, FABP7, LCP1, IFITM1, HSPA2, JAK1, KLRB1 and ARID1B. In the risk model, CD24, MAL2, GIMAP7, NT5E, DLG3, EIF4EBP1, and ARID1B had adverse effects on breast cancer patient outcomes, while the other genes were protective. MAL2 (Myelin and Lymphocyte Protein 2) encodes a protein involved in various cellular functions, including membrane trafficking and signaling. This lipid raft-resident protein is implicated in the pathogenesis of various malignancies, including breast cancer [37,38,39,40]. Increased MAL2 expression is linked to adverse breast cancer prognosis and contributes to immune evasion by suppressing tumor antigen presentation. MAL2 knockdown boosted CD8+ T-cell cytotoxicity and inhibited breast tumor growth, indicating its potential as an immunotherapy target in breast cancer [41]. NT5E, also known as CD73, catalyzes the production of adenosine and thereby promotes tumor immune escape [42]. In bioinformatics studies, higher NT5E expression has been associated with poor prognosis in patients with a variety of malignancies, including breast cancer [42]. These findings underscore the potential of MAL2 and NT5E as targets for immunotherapy strategies in breast cancer. We further analyzed the immune-related characteristics of the high- and low-risk groups, which were divided by the median risk score. In the low-risk group, we observed elevated levels of infiltrating immune cells, including naive B cells, CD8 T cells, activated memory CD4 T cells, and gamma delta T cells. Naive B cells, which are mature B cells that have not yet encountered antigen, play a central role in the adaptive immune system and are primarily responsible for recognizing and responding to pathogens and foreign substances [43]. 
CD8 T cells, which can directly target and destroy infected or cancerous cells, are essential players in cancer immunotherapy. Immune checkpoint inhibitor therapies, such as PD-1 and CTLA-4 inhibitors, operate by reinvigorating the anti-tumor potential of CD8 T cells [44, 45]. Furthermore, CAR-T cell therapy involves the genetic modification of CD8 T cells to express chimeric antigen receptors (CARs) capable of recognizing specific tumor antigens. These engineered T cells are subsequently reintroduced into the patient's body to selectively locate and eradicate cancer cells [46]. Activated memory CD4 T cells stimulate and coordinate immune responses, making them pivotal in orchestrating effective anti-tumor immunity [47]. Gamma delta T cells are a unique subset of T cells that bridge the innate and adaptive immune systems, and their involvement in tumor immunosurveillance is an area of growing interest [48]. These cell types play pivotal roles in stimulating immune responses and directly targeting cancerous cells. The enhanced infiltration of these cells in the low-risk group is therefore an encouraging sign for immunotherapy in these patients: it implies a more receptive immune microenvironment, which could translate into improved responses to immunotherapeutic approaches. This observation underscores the importance of understanding the immune landscape when tailoring immunotherapies.
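The median-based stratification mentioned above can be sketched as follows. The sketch assumes that the risk score is a weighted sum of gene expression values, as is common for Cox-derived signatures; the coefficients, expression values, and function names below are hypothetical and are not the fitted values of our model. Positive weights stand for adverse genes and negative weights for protective genes.

```python
from statistics import median

# Hypothetical coefficients for three of the 15 signature genes
# (illustrative only; not the fitted model).
example_coefficients = {"CD24": 0.2, "MAL2": 0.1, "KLRB1": -0.3}

# Hypothetical normalized expression values per patient.
example_expression = {
    "P1": {"CD24": 2.0, "MAL2": 1.0, "KLRB1": 0.5},
    "P2": {"CD24": 1.0, "MAL2": 0.5, "KLRB1": 2.0},
    "P3": {"CD24": 3.0, "MAL2": 2.0, "KLRB1": 0.0},
    "P4": {"CD24": 0.5, "MAL2": 0.5, "KLRB1": 3.0},
}


def risk_scores(expression, coefficients):
    """Weighted sum of expression values for each patient."""
    return {
        patient: sum(coefficients[gene] * values[gene] for gene in coefficients)
        for patient, values in expression.items()
    }


def split_by_median(scores):
    """Label patients 'high' if their score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {
        patient: ("high" if score > cutoff else "low")
        for patient, score in scores.items()
    }


scores = risk_scores(example_expression, example_coefficients)
groups = split_by_median(scores)
```

With these toy values, patients P1 and P3 (higher CD24 and MAL2, lower KLRB1) fall into the high-risk group and P2 and P4 into the low-risk group. Patients whose score equals the median are assigned to the low-risk group here, which is a design choice of the sketch.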

To support the development of new CD24-targeted therapies that could improve the currently poor response to immunotherapy in breast cancer patients, we used emerging multi-omics technologies, including single-cell sequencing and multicolor immunofluorescence labeling, to characterize the CD24-mediated immune microenvironment landscape of breast cancer. The analysis of intercellular communication indicated that CD24+ epithelial cells and CD8+ T cells may communicate through the MIF signaling pathway. Macrophage migration inhibitory factor (MIF) is an important regulator of the innate immune system and has been shown to induce an immunosuppressive environment [49]. The relationship between MIF and the immunosuppressive microenvironment in breast cancer is intricate, and MIF can contribute to the formation of this microenvironment through various mechanisms. It may promote the recruitment and activation of immunosuppressive cells, such as regulatory T cells (Tregs) and myeloid-derived suppressor cells (MDSCs), which can dampen the anti-tumor immune response [50, 51]. Additionally, MIF can induce the expression of immune checkpoint molecules on T cells, rendering them less effective in targeting cancer cells, although its role in suppressing CD8+ T cells and shaping immunosuppressive microenvironments has not been fully elucidated. Furthermore, using multiplex immunofluorescence, we found stronger interactions between the CD24+ tumor subset (CD24+PANCK+ cells) and FOXP3+ exhausted CD8+ T cells. Recent studies have shown that CD8+FOXP3+ T cells are a specific type of exhausted CD8+ T cell that is difficult to detect in blood and normal tissues and appears only when induced by the tumor microenvironment [52, 53]. 
This suggests that the CD24+ tumor subpopulation may secrete chemokines or cytokines, or directly interact with receptor molecules on the surface of CD8+ T cells, to induce CD8+ T-cell exhaustion; this interaction can also be described as the domestication of immune cells by the tumor microenvironment. However, the TME is a complex, dynamic ecosystem composed of various cell types, extracellular matrix (ECM), blood vessels, and signaling molecules that play a key role in tumor initiation, progression, and therapeutic response [54]. Previous studies have confirmed that immunosuppressive factors produced in the tumor microenvironment can promote the recruitment of immunosuppressive cells such as Tregs to the TME [55, 56]. In contrast, our study found that the CD24+ tumor subpopulation did not recruit FOXP3+ exhausted CD8+ T cells in this way. This inhibitory domestication relationship between CD24+ tumor subpopulations and exhausted CD8+ T cells helps explain the role of CD24+ breast cancer cells in forming an immunosuppressive microenvironment, but the specific mechanism of action remains to be explored.

Several limitations of our study should be acknowledged. First, the reliance on publicly available datasets may limit the generalizability and specificity of our findings, despite the advantage of large sample sizes for robust statistical analysis. Second, cell identification in the single-cell sequencing data relied on computational inference, and experimental validation would enhance the credibility of our results. Third, the diversity of data sources used in our analysis may introduce bias and affect the level of gene detection. Fourth, our CD24-related prognostic risk model has not yet been validated in larger patient cohorts, and the survival analyses were not adjusted for clinical variables. Finally, owing to resource constraints, we were unable to conduct more in-depth biological experiments to elucidate the specific mechanisms of action of CD24+ tumor cells, which represents a significant limitation. Future studies should therefore include broader and more diverse datasets and employ rigorous experimental designs to validate our findings and to provide greater insight into targeting CD24+ breast cancer cells to reverse CD8+ T-cell exhaustion.

Conclusion

In summary, we constructed a single-cell transcriptome map of breast cancer using scRNA-seq. We analyzed CD24+ breast cancer cells by various methods and preliminarily elucidated their heterogeneity in cell differentiation trajectory and intercellular communication. In addition, we combined scRNA-seq with bulk RNA-seq data to identify prognostic characteristics associated with CD24. The CD24-associated prognostic risk model and the nomogram have independent prognostic value and can provide accurate prediction of survival and immunotherapy responsiveness in breast cancer patients. Notably, the CD24+PANCK+ subgroup, which was associated with poor breast cancer prognosis, was closely associated with CD8+ T-cell exhaustion and the formation of an immunosuppressive microenvironment in breast cancer. During immunotherapy, the tumor microenvironment can be reshaped by targeting CD24+PANCK+ subsets or reversing the exhaustion of CD8+ T cells, thereby restoring the antitumor role of effector T cells. However, further research is needed to fully understand the underlying mechanisms.