Introduction

Breast cancer (BC) is one of the most commonly diagnosed malignancies; according to the 2023 Cancer Statistics, it accounts for approximately 31% of new cancer cases in women and is the second leading cause of cancer-related death in women1. Triple-negative breast cancer (TNBC) is a special subtype of BC that is histologically distinguished by the absence of human epidermal growth factor receptor 2 (HER-2), estrogen receptor (ER), and progesterone receptor (PR). Currently available treatments for TNBC include mastectomy or breast-conserving surgery, chemotherapy, radiotherapy, immunotherapy, and other newly developed therapies2. Despite this variety of treatment strategies, the incidence of metastasis, recurrence, and drug resistance remains high in TNBC patients. Accumulating evidence indicates that numerous malignancies, including BC, contain populations of cells that resemble stem cells3. Cancer stem cells (CSCs) are a dynamic subgroup of tumor cells that can self-renew, are pluripotent, and proliferate indefinitely4. Given their relative resistance to cytotoxic chemotherapy and radiotherapy, BC stem cells (BCSCs) may play a vital role in drug resistance and recurrence5,6. Hence, it is crucial to identify the key regulators of BCSC stemness, for both theoretical research and practical application.

In 2003, Al-Hajj et al. first described a cell fraction isolated from BC tissues. This subpopulation of BC cells was identified by the CD44+/CD24−/low phenotype and was able to reproduce the tumor burden in mice7. In 2007, Ginestier et al. found a subgroup of BC cells with high aldehyde dehydrogenase (ALDH) activity that was able to initiate malignancy in vitro and in vivo8. Hence, human BCSCs are defined by expression of the cell surface marker CD44, absence of CD24, and high ALDH activity9. Additionally, mounting evidence suggests that BCSCs with these traits are involved in tumor initiation, development, and metastasis10. Although radiation therapy and chemotherapy can eradicate the majority of tumor cells, BCSCs are the main contributors to therapeutic resistance and recurrence11,12,13,14. Because CSCs can escape strict regulation by relying on aberrant activation of certain signaling pathways, their emergence is thought to be an early event in oncogenesis that allows these cells to resist regular chemotherapies, leading to tumor recurrence9. Recently, a number of clinical trials based on the relative tolerance of BCSCs to cytotoxic chemotherapy have been designed to address TNBC resistance3,9. In addition to the intrinsic signals that regulate BCSCs, these cells are also controlled by components of the tumor microenvironment (TME)15,16,17,18,19. Numerous cell types, such as endothelial cells, fibroblasts, inflammatory cells, and mesenchymal stem cells, are present in the microenvironment surrounding stem cells and may interact with CSCs via cytokine loops15. Some inflammatory factors, such as IL-8 and IL-6, have been found to promote the proliferation of CSCs after chemotherapy20,21. Moreover, some studies have demonstrated that serum levels of IL-8 and IL-6 in patients with advanced BC are linked to metastatic development and a poor prognosis22,23. Hence, a more comprehensive description of BCSCs and the surrounding TME landscape would help to address therapy resistance and tumor recurrence, and to develop CSC-targeted treatment strategies and novel immunotherapies.

This study aims to investigate the relationship between CSC-related genes and the prognosis of TNBC patients, and to elucidate the underlying molecular mechanisms. By classifying TNBC from the perspective of cancer stem cells, it seeks to identify prognostic biomarkers and novel targeted therapeutic strategies for TNBC. On the basis of public stemness gene sets, unsupervised clustering and weighted gene co-expression network analysis (WGCNA) were applied to identify stemness subgroups with distinct stemness and to determine the genes associated with the stemness subgroups and prognostic outcomes. A stemness-related signature was then established by least absolute shrinkage and selection operator (LASSO) Cox regression and random survival forest analysis. Subsequently, the prognostic outcomes, TME landscapes, and chemotherapeutic sensitivities in TNBC were analyzed based on the stemness-related risk model. According to the random survival forest analysis, kinesin family member 15 (KIF15) was identified as the key gene associated with prognosis and TNBC cell sphere formation, and a series of validation experiments was further conducted in vitro.

Methods

Acquisition and pre-processing of TNBC datasets

The RNA-sequencing (RNA-seq) expression data and corresponding information of TNBC samples were extracted from The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov/), the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC, http://www.cbioportal.org/), and the GSE103091 dataset24,25 in the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/). TNBC samples with insufficient clinical profiles and those with an overall survival (OS) of less than 60 days were excluded. Ultimately, a total of 499 samples were included in the study. A log2 transformation was then applied to the TCGA and GEO datasets. The ComBat function from the “sva” R package was used to perform batch correction of the high-throughput sequencing results26. Each database was treated as a separate batch, and the “mod” parameter was set to “global”. After that, 26 stemness gene sets were collected from StemChecker (http://stemchecker.sysbiolab.eu/), a web-based tool with comprehensive genetic information on published stemness signatures27,28; the gene sets are provided in Supplemental Table 1.

Stemness features and consensus clustering for TNBC stemness subgroups

The “GSVA”29 and “GSEABase” R packages were employed to calculate the enrichment of each stemness gene set on a sample-by-sample basis for each TNBC, expressed as single-sample gene set enrichment analysis (ssGSEA) scores. Subsequently, Pearson’s correlation analysis was performed on the 26 stemness gene sets using the “corrplot” R package. After that, the “ConsensusClusterPlus” R package was used to run the consensus clustering algorithm (K-means clustering) for unsupervised classification of the TNBC samples with 10,000 repetitions30. In the consensus clustering analysis, the maximum number of clusters was set to 6 (maxK = 6), the sampling proportion was 0.8 (pItem = 0.8), and the distance metric was Euclidean.

Analysis of immune cell type components and the tumor immune microenvironment (TIME)

CIBERSORT (http://cibersort.stanford.edu/) was employed to quantify the proportions of 22 immune cell types from the normalized bulk RNA-seq expression profiles31. The immune cell infiltration of each TNBC tissue was analyzed with the “CIBERSORT” R package and the LM22 annotated immune cell gene signature. In the subsequent analysis, the inferred immune cell composition was considered statistically significant at P < 0.05. The proportion of immune and stromal components of the TME was then assessed with the Estimation of Stromal and Immune Cells in Malignant Tumor Tissues Using Expression (ESTIMATE) algorithm in the “estimate” R package32. The results are reported as a stromal score, an immune score, and an ESTIMATE score.

Drug sensitivity analysis

On the basis of the web tool Genomics of Drug Sensitivity in Cancer (GDSC) (https://www.cancerrxgene.org/), the IC50 values of six classical chemotherapeutic drugs for TNBC patients (Palbociclib, Paclitaxel, Epirubicin, Docetaxel, Cyclophosphamide, and Cisplatin) were computed for each TNBC sample with the “oncoPredict” R package33. Ridge regression was used to estimate the IC50 Z-scores, which were then compared among subgroups, and tenfold cross-validation was used to assess the accuracy of the drug sensitivity prediction.

Weighted gene co-expression network analysis (WGCNA)

The “WGCNA” R package was applied to the overall gene expression data to establish the co-expression network. To evaluate the correlation of gene expression patterns among patients with TNBC, pairwise Pearson correlation coefficient matrices were computed34. The similarity matrix was then converted into the adjacency matrix using the following formulas: \(S_{ij} = |\mathrm{cor}(x_i, x_j)|\) and \(a_{ij} = S_{ij}^{\beta}\). The soft-threshold power β was determined with the soft-thresholding feature of the “WGCNA” package, based on the scale-free network criterion. The topology of the gene co-expression network was regarded as scale-free, and batch effects as absent, if the scale-free topology fit index reached 0.85 or above at a low power (< 30). As a result, a power of β = 14 was selected. Subsequently, the network modules were identified through the Topological Overlap Measure (TOM) and the corresponding dissimilarity matrix (1 − TOM). The minimum module size was set to 50, and the other parameters were left at their default values. After calculation of the module eigengene (ME), which represents the first principal component of each module, the associations between modules and each stemness subgroup were established. Hub genes for a given module were defined by their gene significance (GS, Pearson's correlation between each gene and the clinical trait) and module membership (MM, correlation between each gene and the module), with thresholds of > 0.5 and > 0.8, respectively35,36. Genes with high MM and high GS were thus selected as hub genes for subsequent analysis. The “clusterProfiler” R package was employed to conduct the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses37,38,39 on the identified hub genes of the co-expression module40.
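The soft-thresholding step can be sketched in a few lines of Python. This is an illustration of the two formulas only, not the “WGCNA” package itself; the function names and the toy expression values are hypothetical. The similarity is the absolute Pearson correlation between two genes, and raising it to the power β = 14 suppresses weak correlations while keeping strong ones close to 1.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def soft_threshold_adjacency(expression, beta):
    """Adjacency a_ij = |cor(x_i, x_j)| ** beta for every pair of genes.

    `expression` maps a gene name to its expression values across samples.
    """
    genes = list(expression)
    adjacency = {}
    for gene_i in genes:
        for gene_j in genes:
            similarity = abs(pearson(expression[gene_i], expression[gene_j]))
            adjacency[(gene_i, gene_j)] = similarity ** beta
    return adjacency

# Hypothetical toy data: three genes measured in five samples.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with geneA
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],   # correlation of -0.8 with geneA
}
adjacency = soft_threshold_adjacency(toy_expression, beta=14)
```

In this toy example, the geneA/geneB pair keeps an adjacency of about 1, whereas the geneA/geneC pair (absolute correlation 0.8) drops to roughly 0.044, which shows how a high power sharpens the contrast between strong and moderate co-expression.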

Establishment of a stemness-related prognostic signature

The prognostic stemness-related genes were screened from the hub genes in the selected module by univariate Cox regression analysis. LASSO Cox regression analysis was then used to avoid model overfitting by removing redundant genes. The LASSO coefficient profiles of the prognostic genes were generated, and tenfold cross-validation was used to determine the optimal value of the penalty parameter λ (λ = 0.08529673). Accordingly, independent prognostic stemness-associated genes were ultimately selected, and their expression was included in the risk model formula: \(\text{Risk Score}=\sum_{i=1}^{n}\left(\text{Exp}_{i}\times \text{Coe}_{i}\right)\) (n = 6; \(\text{Exp}_{i}\) is the expression of each gene, and \(\text{Coe}_{i}\) is the corresponding Cox regression coefficient). Based on the median risk score, the entire TNBC population was separated into high-risk and low-risk groups, and a survival analysis was conducted. Subsequently, the stemness-related risk scores of the 3 clusters were calculated and compared with the Kruskal–Wallis test, with P < 0.05 considered statistically significant. Notably, the prognostic value of the stemness-related risk score was assessed within the clusters obtained by unsupervised clustering. To identify the critical genes for further study, random forest survival analysis was performed on the 39 prognostic stemness-related genes selected by univariate Cox regression analysis, using the R package “randomForestSRC”.

Enrichment analysis of hallmark gene sets

Based on the median value of the stemness-related score, the 499 TNBC samples were divided into high-risk and low-risk cohorts. The hallmark gene sets were obtained from the Molecular Signatures Database (MSigDB, http://software.broadinstitute.org/gsea/msigdb/). The R package “clusterProfiler” was used to conduct the gene set enrichment analysis (GSEA)41, which identified the relevant subsets of hallmark gene sets and functional enrichment pathways.

Cell culture

The human breast epithelial cell line MCF-10A and the TNBC cell lines SUM159, CAL-148, MDA-MB-231, and MDA-MB-453 were obtained from the American Type Culture Collection (ATCC). Each cell line was maintained according to the vendor’s guidelines. All cell lines were incubated at 37 °C with 5% CO2. MCF-10A cells were cultivated in a specialized medium purchased from Procell (Wuhan, China; CM-0525). SUM159, MDA-MB-231, and MDA-MB-453 were maintained in Dulbecco's modified Eagle's medium (DMEM; Gibco BRL, USA). CAL-148 was cultured in RPMI 1640 medium (Gibco BRL, USA). All media were supplemented with 10% fetal bovine serum (FBS; Gibco, Grand Island, NY, USA) and 1% penicillin–streptomycin (Beyotime, Shanghai, China).

SiRNA transfection

Cells were seeded in 6-well plates and transfected with either siRNA targeting KIF15 (siKIF15) or negative control (NC) siRNA using Lipofectamine 3000 (Invitrogen, USA), following the manufacturer's instructions. The NC sequence is widely cited in the literature and lacks homology with mammalian sequences, making it suitable for the study of different genes in rats, mice, and humans42. At 48 h after transfection, cells were harvested for further examination. The siRNAs were synthesized by GenePharma (Suzhou, China), and their sequences were as follows:

NC:

Sense 5’-UUCUCCGAACGUGUCACGUTT-3’

Antisense 5’-ACGUGACACGUUCGGAGAATT-3’

siKIF15-#1:

Sense 5’-CCGAGAGGAUCAAAUAAUATT-3’

Antisense 5’-UAUUAUUUGAUCCUCUCGGTT-3’

siKIF15-#2:

Sense 5’-GGAUGAAGAAGAGCAUAAATT-3’

Antisense 5’-UUUAUGCUCUUCUUCAUCCTT-3’

Quantitative reverse transcription-PCR (qRT-PCR)

Total RNA was isolated using NucleoZOL Reagent (Macherey–Nagel, Düren, Germany) according to the manufacturer’s instructions. The SYBR Green method (Vazyme, Shanghai, China) was employed to examine the expression of target genes, with each sample assayed in triplicate on the QuantStudio™ 5 Real-Time PCR System (Thermo Fisher, MA, USA). Relative expression was calculated with the comparative cycle threshold (CT) (2^−ΔΔCT) method, and expression levels were normalized to those of β-actin. The relevant primers are provided in Supplemental Table 2.
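The comparative CT calculation can be sketched as follows. This is a minimal illustration, assuming that replicate CT values are averaged before the ΔCT step; the function name and all CT values are hypothetical and are not data from this study.

```python
from statistics import mean

def fold_change(target_ct_treated, reference_ct_treated,
                target_ct_control, reference_ct_control):
    """Relative expression by the comparative CT (2^-ddCT) method.

    Each argument is a list of replicate CT values. The reference gene
    (beta-actin in the text) normalizes the target gene within each
    condition, and the control condition serves as the calibrator.
    """
    delta_ct_treated = mean(target_ct_treated) - mean(reference_ct_treated)
    delta_ct_control = mean(target_ct_control) - mean(reference_ct_control)
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2 ** (-delta_delta_ct)

# Hypothetical triplicate CT values for a knockdown versus a control sample.
knockdown_fold = fold_change(
    target_ct_treated=[26.0, 26.1, 25.9],
    reference_ct_treated=[16.0, 16.1, 15.9],
    target_ct_control=[24.0, 24.1, 23.9],
    reference_ct_control=[16.0, 16.1, 15.9],
)
```

With these made-up values, ΔΔCT is about 2, so the relative expression is about 0.25, that is, roughly a four-fold reduction of the target transcript relative to the control.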

Western blot (WB)

RIPA lysis buffer supplemented with 1 mM phenylmethylsulfonyl fluoride (PMSF) and protease/phosphatase inhibitors (Roche, Basel, Switzerland) was used to extract total protein from the cells. The total protein content was determined using a BCA assay kit (Beyotime, Shanghai, China), and 20 μg of denatured protein was separated by 10% SDS-PAGE before being electroblotted onto PVDF membranes (Millipore, Burlington, MA, United States). The membranes were then blocked with 5% milk in TBST for 1 h at room temperature and incubated overnight at 4 °C with primary antibodies: KIF15 (55407-1-AP), Nanog (14295-1-AP), SOX2 (11064-1-AP), and OCT4 (11263-1-AP), all from Proteintech; vinculin (sc-73614, Santa Cruz); and β-actin (AB0035, Abways). Finally, the blots were incubated with the relevant HRP-labeled secondary antibody (1:1000, Beyotime, Shanghai, China) for 1 h at room temperature and imaged with the Bio-Rad imaging system.

Tumor sphere formation assay

The sphere culture was performed as previously described43. Briefly, TNBC cells (1 × 10⁴ cells/well) were seeded into ultralow-attachment six-well plates with DMEM-F12 medium (Gibco, Milan, Italy) containing 2% B27 supplement (no vitamin A; Invitrogen, Paisley, UK; 12587) and recombinant human epidermal growth factor (EGF; 20 ng/ml; Sigma Aldrich, Poole, UK; E-9644). After 7 days of culture, the tumor spheres were examined and imaged.

Statistical analysis

All statistical analyses were conducted in R software (version 4.2.0) (https://www.r-project.org/). Survival was analyzed using Kaplan–Meier curves. The Wilcoxon test and the Kruskal–Wallis test were employed for comparisons between two groups and among several groups, respectively. Pearson’s correlation analysis was used to compare the ssGSEA scores of the stemness-related gene sets. PCR results were compared using a t-test. P < 0.05 was considered statistically significant. All methods were carried out in accordance with the relevant laws and regulations.

Results

The enrichment analysis of stem-related gene sets and construction of stemness subsets

The relationship among the ssGSEA enrichment scores of the stem-related gene sets was assessed and is shown in Fig. 1a. The 499 TNBC samples were then divided into 3 clusters using unsupervised clustering (k = 3, Cluster 1–3, Fig. 1b). The enrichment of the stemness gene sets in the three clusters is presented as a heatmap in Fig. 1c. According to the survival analysis, TNBC patients in Cluster 1 had better OS than those in Cluster 2 and Cluster 3 (P < 0.05, Fig. 1d). To characterize the TME of the stemness-related subsets, the immune cell components were analyzed by CIBERSORT; the TME fractions of the three clusters are shown in Fig. 1e. The TNBC samples in Cluster 1 showed a subtype of anti-tumor immune activation with abundant M0 and M1 macrophages. Cluster 2 exhibited moderate infiltration of immune cells such as B cells, T cells, natural killer (NK) cells, and monocytes, but a higher fraction of M2 macrophages, which is related to a worse prognosis. In Cluster 3, some immune cells associated with tumor promotion and immunosuppression, such as regulatory T cells (Tregs) and CD8 T cells, showed higher infiltration. We further investigated the sensitivity to six classical chemotherapeutic drugs for TNBC: Palbociclib, Paclitaxel, Epirubicin, Docetaxel, Cyclophosphamide, and Cisplatin (Fig. 1f). The IC50 values for Paclitaxel, Docetaxel, Cyclophosphamide, and Cisplatin were lower in Cluster 1, suggesting that Cluster 1 may be more sensitive to these chemotherapeutic agents. In contrast, Cluster 2 and Cluster 3 appeared more sensitive to Palbociclib and Epirubicin.

Fig. 1

The enrichment analysis of stem-related gene sets and identification of stemness subgroups. (a) The relationship among the ssGSEA enrichment scores of the stem-related gene sets. Red indicates positive correlations, and blue indicates negative correlations. The darker the color, the stronger the correlation. (b) A total of 499 TNBC patients were classified into three clusters according to the consensus clustering matrix (k = 3). (c) Heatmap and clinicopathologic characteristics of the three clusters classified by the enrichment of the stemness gene sets (stage, AJCC tumor stage I/II/III/IV). The color scale in the heatmap represents the ssGSEA score of each sample. (d) Kaplan–Meier curves for the three clusters. (e) The infiltration fractions of different immune cells among the three clusters (* indicates P < 0.05, ** indicates P < 0.01, *** indicates P < 0.001, ns indicates P > 0.05). (f) The IC50 for Palbociclib, Paclitaxel, Epirubicin, Docetaxel, Cyclophosphamide, and Cisplatin in the three clusters (** indicates P < 0.01 and *** indicates P < 0.001).

Identification of the cluster 1-related stemness module and hub genes in TNBC

Since the prognostic outcomes of patients with TNBC in Cluster 1 were better than those in the other clusters, WGCNA was performed to identify genes important for the Cluster 1 subtype. The 499 samples were clustered to filter outliers, and 2 outlier samples from the TCGA database were removed. Then, the trait heatmap was drawn on the basis of the 3 clusters provided by unsupervised clustering. The heatmap showing the relationships between sample distribution and cluster grouping is presented in Fig. 2a. The gene expression profiles varied among the clusters. Each cluster was associated with a distinct prognosis, and differentially expressed genes exhibited diverse distributions across the clusters. To establish a scale-free network, a power of β = 14 (scale-free R² = 0.85) was chosen as the soft-threshold parameter (Fig. 2b). With the minimum number of genes per module set to 50, genes with comparable expression patterns were grouped into 8 modules, as illustrated by the clustering dendrogram (Supplemental Fig. 1). Of the 8 modules, the yellow module showed the strongest positive association with the Cluster 1 subtype (Fig. 2c, correlation coefficient = 0.62, P < 0.05). The 103 genes in the yellow module are provided in Supplemental Table 3. The yellow module was therefore selected as the crucial module, and 39 potential pivotal genes meeting both screening criteria (MM > 0.8 and GS > 0.5) were selected from it for further analysis (Fig. 2d).

Fig. 2

Identification of the Cluster 1-related stemness module and hub genes in TNBC. (a) Clustering dendrogram of 499 samples. The trait heatmap based on the three clusters provided by unsupervised clustering shows the relationship between sample distribution and cluster grouping. (b) Scale-free fit index (left) and mean connectivity (right) for soft-thresholding powers. The scale-free network was constructed with β set at 14. (c) Analysis of module-trait relationships in TNBC based on the 497 TNBC patients. A positive correlation between the yellow module (containing 103 genes) and the Cluster 1 subtype was observed (correlation coefficient = 0.62, P = 1E-54). (d) A scatter plot of GS for the Cluster 1 subtype versus MM in the yellow module (correlation coefficient = 0.8, P = 3.9E-24). (e) GO analysis of the hub genes in the yellow module, showing the top ten GO terms of MF, CC, and BP. (f) The 10 enriched KEGG pathways in the hub genes.

Subsequently, GO and KEGG enrichment analyses were conducted on the hub genes in the yellow module to investigate the associated biological functions. The top ten GO terms for molecular function (MF), cellular component (CC), and biological process (BP) are shown in Fig. 2e; they were mainly enriched in chromosome-related activities. In addition, the KEGG results indicated that the yellow module was principally involved in cell cycle, progesterone-mediated oocyte maturation, oocyte meiosis, cellular senescence, human T-cell leukemia virus 1 infection, microRNAs in cancer, viral carcinogenesis, FoxO signaling pathway, DNA replication, and p53 signaling pathway (Fig. 2f).

Establishment of stemness-related gene risk model

The prognostic stemness-associated gene signature was developed based on the hub genes in the yellow module. Univariate Cox regression analysis was first performed on the 39 hub genes to identify those related to prognosis (Supplemental Fig. 2). LASSO regression analysis was then performed on the prognostic stemness-related genes. Figure 3a shows the LASSO coefficient profiles of the 39 genes. Tenfold cross-validation was used to find the optimal value of the penalty parameter λ (Fig. 3b, λ = 0.08529673). Consequently, 4 candidate genes were identified to construct the stemness-related risk model: thyroid hormone receptor interacting protein 13 (TRIP13), Forkhead box M1 (FOXM1), kinesin family member 15 (KIF15), and centrosomal protein 55 (CEP55). The expression levels of the 4 genes and the corresponding regression coefficients were used to establish the risk model: risk score = (−0.076 × expression level of TRIP13) + (−0.020 × expression level of FOXM1) + (−0.016 × expression level of KIF15) + (−0.017 × expression level of CEP55). Based on the stemness-related gene signature, all TNBC samples were divided into high-risk and low-risk cohorts according to the median risk score. Survival analysis was then performed on the two cohorts and visualized with Kaplan–Meier curves. As shown in Fig. 3c, the clinical outcomes of TNBC patients in the high-risk cohort were worse than those in the low-risk cohort (P < 0.05). Afterwards, GSEA was conducted to identify the hallmark gene sets enriched differently between the high-risk and low-risk cohorts. The genes highly expressed in the stemness-related high-risk cohort were considerably enriched in a number of hallmark gene sets, including androgen response, DNA repair, E2F targets, estrogen response, G2M checkpoint, glycolysis, heme metabolism, mitotic spindle, mTORC1 signaling, MYC targets 1, MYC targets 2, Notch signaling, oxidative phosphorylation, protein secretion, reactive oxygen species pathway, TGF-β signaling, TNFA signaling via NF-κB, unfolded protein response, UV response, and WNT/β-catenin signaling (Fig. 3d).
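The four-gene risk score and the median split described in this section can be sketched in Python as follows. The coefficients are those reported in the text; the sample names and expression values are hypothetical, and assigning samples that fall exactly on the median to the low-risk group is an assumption, since the text does not state how ties were handled.

```python
from statistics import median

# Regression coefficients reported in the text for the four-gene signature.
COEFFICIENTS = {"TRIP13": -0.076, "FOXM1": -0.020, "KIF15": -0.016, "CEP55": -0.017}

def risk_score(expression):
    """Sum of coefficient * expression level over the signature genes."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())

def split_by_median(scores):
    """Label each sample high or low risk relative to the median score.

    Assumption: scores equal to the median are placed in the low-risk group.
    """
    cutoff = median(scores.values())
    return {name: ("high" if value > cutoff else "low") for name, value in scores.items()}

# Hypothetical expression levels for four samples (not data from this study).
samples = {
    "S1": {"TRIP13": 10.0, "FOXM1": 10.0, "KIF15": 10.0, "CEP55": 10.0},
    "S2": {"TRIP13": 8.0, "FOXM1": 8.0, "KIF15": 8.0, "CEP55": 8.0},
    "S3": {"TRIP13": 6.0, "FOXM1": 6.0, "KIF15": 6.0, "CEP55": 6.0},
    "S4": {"TRIP13": 4.0, "FOXM1": 4.0, "KIF15": 4.0, "CEP55": 4.0},
}
scores = {name: risk_score(expr) for name, expr in samples.items()}
groups = split_by_median(scores)
```

Because all four coefficients are negative, samples with higher expression of the signature genes receive lower risk scores; in this toy example S1 and S2 fall in the low-risk group and S3 and S4 in the high-risk group.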

Fig. 3

Establishment of a stemness-related gene risk model. (a) LASSO coefficient profiles of 39 stemness-related hub genes. (b) The results of the tenfold cross-validation determined the optimal value of the penalty parameter λ (λ = 0.08529673). Four independent prognostic genes were identified for the signature. (c) Kaplan–Meier survival curves of 499 TNBC patients suggest that the OS of the high-risk group is lower than that of the low-risk group (P = 1.4321E−09). (d) GSEA shows that the hallmark gene sets are enriched differently between the high-risk and low-risk groups. (e) The stemness-related risk score was calculated for Cluster 1, Cluster 2, and Cluster 3.

According to the stemness-related risk model, the stemness-related risk score was calculated for Cluster 1, Cluster 2, and Cluster 3, respectively. As shown in Fig. 3e, Cluster 1 had a lower stemness risk score than the other two (P < 0.05).

The association of stemness-related risk model and TIME

A total of 499 TNBC samples were included in the CIBERSORT analysis to investigate the differences in immune characteristics between the stemness high-risk and low-risk cohorts. As shown in Fig. 4a, the infiltration of some immune cells was significantly higher in the high stemness-risk cohort, for instance CD8+ T cells, resting memory CD4+ T cells, Tregs, activated NK cells, monocytes, and resting mast cells. By contrast, a large number of other anti-tumor immune cells were observed in the low-risk cohort, including activated memory CD4+ T cells, follicular helper T cells, M0 macrophages, and M1 macrophages. The ESTIMATE scores were then calculated in the high and low stemness-risk cohorts and are shown in Fig. 4b-d. The high-risk cohort had higher stromal, immune, and ESTIMATE scores than the low-risk cohort (P < 0.05).

Fig. 4

The association between stemness-related risk model, immune cell infiltration, and chemotherapeutic sensitivity. (a) The characteristics of immune cell infiltration between low-risk and high-risk groups. (b-d) The Stromal scores, Immune scores, and ESTIMATE scores of high-risk and low-risk groups. (e) The IC50 for Palbociclib, Paclitaxel, Epirubicin, Docetaxel, Cyclophosphamide, and Cisplatin in high-risk and low-risk groups (* indicates P < 0.05, ** indicates P < 0.01, and *** indicates P < 0.001).

The relevance of drug sensitivity and stemness-related risk score

The "oncoPredict" R package was used to determine the relationship between the stemness-related risk score and the half maximal inhibitory concentration (IC50) of several anticancer drugs. As illustrated in Fig. 4e, agents commonly used against TNBC in clinical practice, including paclitaxel, docetaxel, cyclophosphamide, and cisplatin, were predicted to be more effective in the high-risk group.

Identification and validation of targeted gene

Based on the yellow module, the 39 potential pivotal genes were subjected to random forest survival analysis to find the gene most important for OS (Fig. 5a). As a result, KIF15 was selected. The expression level of KIF15 was then examined in the different unsupervised clustering subtypes. As shown in Fig. 5b, the expression of KIF15 was higher in Cluster 1, which had a better prognosis, than in Cluster 2 and Cluster 3 (P < 0.05). Similarly, the expression of KIF15 was further assessed in the high-risk and low-risk cohorts defined by the stemness-related signature. As shown in Fig. 5c, KIF15 expression was higher in the low-risk cohort than in the high-risk cohort (P < 0.05). In addition, the expression of KIF15 was negatively associated with the stemness-related risk score in TNBC (Fig. 5d, R = −0.81, P < 0.05). According to the median expression level of KIF15, the 499 TNBC patients were divided into a KIF15 high-expression group and a low-expression group. Survival analysis of these two groups indicated that TNBC patients with higher expression of KIF15 had a better prognosis (Fig. 5e, P < 0.05). A Sankey diagram was drawn to display the association among the expression of KIF15, the unsupervised clustering stemness subtypes, the stemness-related risk types, and the survival outcomes (Fig. 5f).

Fig. 5

Identification of the targeted gene. (a) The random forest survival analysis was conducted on the 39 potential pivotal genes to identify the gene that was relatively important for OS. The x-axis refers to variable importance. It represents the contribution of each selected prognosis-related gene to the prognostic model. (b) The expression of KIF15 in Cluster 1, Cluster 2, and Cluster 3. (c) The expression of KIF15 in stemness-related high-risk group and low-risk group. (d) The correlation between the expression of KIF15 and the stemness-related risk score in TNBC (R = -0.81, P < 0.05). (e) Kaplan–Meier survival curves indicate that TNBC patients with high expression of KIF15 have a better OS than those with low expression (P < 0.001). (f) The Sankey diagram indicated the correlation among KIF15, different clusters, stemness-related risk score, and survival states (alive or dead).

Knockdown of KIF15 strengthens CSC traits of TNBC cells

Among the candidate genes in the yellow module, KIF15 was the most significant gene identified by LASSO and random forest survival analysis; therefore, its potential role in TNBC was further validated functionally. The mRNA and protein levels of KIF15 were detected by qRT-PCR and WB, respectively. The mRNA level of KIF15 was significantly higher in the MDA-MB-453 and MDA-MB-231 cell lines than in the others (Fig. 6a). The protein level of KIF15 was also markedly elevated in the MDA-MB-453 and MDA-MB-231 cell lines (Fig. 6b). As illustrated in Fig. 6c and d, KIF15 was successfully knocked down in the MDA-MB-453 and MDA-MB-231 cell lines by specific siRNAs (si-KIF15-1 and si-KIF15-2). Because KIF15 was considered to be associated with CSCs, the expression of the stem cell markers Nanog, SOX2, and OCT4 was evaluated by qRT-PCR and WB. The expression levels of Nanog, SOX2, and OCT4 were elevated after si-KIF15-1 and si-KIF15-2 transfection in both MDA-MB-453 and MDA-MB-231 (Fig. 6e–f). Subsequently, KIF15 mRNA expression was compared between sphere-cultured and adherent cells by qRT-PCR, which showed that KIF15 expression was lower in sphere-cultured TNBC cells than in adherent cells (Supplemental Fig. 3). Finally, we conducted the sphere formation assay to further evaluate the effect of KIF15 on stemness. As shown in Fig. 6g, sphere numbers and sizes were significantly increased after si-KIF15-1 and si-KIF15-2 transfection in both MDA-MB-453 and MDA-MB-231. These results show that knockdown of KIF15 promotes TNBC stemness.

Fig. 6

Knockdown of KIF15 strengthens the CSC traits of TNBC cells. (a) qRT-PCR analysis of KIF15 mRNA levels in the normal mammary epithelial cell line (MCF10A) and TNBC cell lines (SUM159, CAL-148, MDA-MB-231, and MDA-MB-453). (b) Western blot analysis of KIF15 protein levels in the normal mammary epithelial cell line (MCF10A) and TNBC cell lines (SUM159, CAL-148, MDA-MB-231, and MDA-MB-453). (c) KIF15 mRNA expression was successfully inhibited by specific siRNAs (si-KIF15-1 and si-KIF15-2) compared to the negative control (NC) in MDA-MB-453 and MDA-MB-231. (d) KIF15 protein expression was inhibited by specific siRNAs (si-KIF15-1 and si-KIF15-2) compared to NC in MDA-MB-453 and MDA-MB-231. (e) qRT-PCR analysis shows the mRNA expression of the stemness-related markers Nanog, SOX2, and OCT4 in si-KIF15-1 and si-KIF15-2 transfected TNBC cells. (f) Western blot analysis shows the protein expression of the stemness-related markers Nanog, SOX2, and OCT4 in si-KIF15-1 and si-KIF15-2 transfected TNBC cells. (g) Representative images of TNBC cell spheres 7 days after transfection with NC, si-KIF15-1, or si-KIF15-2.

Discussion

Recent studies support the concept that stem cells are essential both for the development of multicellular organisms and for the growth of malignancies44. Several studies have shown that CSCs may arise from mutations in normal stem cells, may also derive from mutant progenitor cells, and are highly heterogeneous44,45,46,47. CSCs possess the ability to self-renew, to differentiate into any type of tumor cell, and to proliferate in a way that promotes further growth of the malignant cell population14,48,49. In the current research, a large multicenter cohort of TNBC patients was subjected to thorough bioinformatic analyses to elucidate the molecular properties of 26 CSC gene sets. A cancer stem cell (CSC)-related risk model was constructed to predict prognosis and to guide the diagnosis and treatment of TNBC. Some studies have shown that the stemness of tumor cells is associated with signaling pathways such as E2F, G2M, WNT, and NOTCH. This is consistent with our findings, in which these pathways were enriched in the high-risk group defined by CSC-associated genes50,51,52. Based on the CSC-related signature, KIF15 was identified as a critical gene correlated with the stemness of TNBC cells by univariate regression analysis and a random forest approach, and this finding was validated by in vitro experiments.

KIFs are a group of molecular motors that hydrolyze adenosine triphosphate (ATP) to generate energy for material transport53,54,55. As a member of the kinesin-12 subfamily, KIF15 is a tetrameric spindle motor associated with the progression of mitosis and cytokinesis53,56,57. Previous studies indicated that KIF15 plays a significant role in the development of several tumors, resulting in a poor prognosis58,59,60,61,62. According to these studies, KIF15 acts as an oncogene with an essential role in the proliferation, migration, and metastasis of some malignant tumors, including lung cancer63, hepatocellular cancer64, and pancreatic cancer65. In addition, Sheng et al. found that KIF15 was markedly upregulated in TNBC and that downregulation of KIF15 could suppress the invasion of TNBC cells by inducing G2/M arrest and apoptosis54. In our study, however, survival analysis suggested that KIF15 expression was positively associated with prognosis, even though its expression was upregulated in TNBC. Based on the CSC-related signature, the expression of KIF15 was higher in the low-risk group, which had a better clinical outcome. In addition, the Kaplan–Meier curves indicated a better OS for Cluster 1, which had a higher expression level of KIF15 than the other clusters. These findings therefore suggest that increased KIF15 plays a protective role in the progression of TNBC. Interestingly, a study of gastric cancer (GC) stem cells found that GC patients with high expression of KIF15 had longer survival times66. This is consistent with our findings and contrary to the other previous studies. The paradoxical role of KIF15 expression may be attributed to the particular way in which KIF15 modulates cellular processes through diverse regulatory networks67. Overall, the results of our study indicate that KIF15 may play an important role in CSC development.
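
The survival comparison between high and low KIF15 expression rests on the Kaplan–Meier estimator. The following is a minimal sketch of that estimator together with a median-based split into expression groups. The median cutoff is an assumption, since this section does not state the threshold used, and the function names `split_by_median` and `kaplan_meier` as well as the toy follow-up data are hypothetical and not taken from the study cohorts.

```python
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median expression.

    The median cutoff is an assumption of this sketch.
    """
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set
        n_at_risk -= removed
    return curve


# Toy data: four patients, the second one censored at time 2.
groups = split_by_median([1.0, 2.0, 3.0, 4.0])
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
for t, s in curve:
    print(t, round(s, 3))
```

In practice, each expression group would be passed to the estimator separately and the two curves compared with a log-rank test, which is omitted here for brevity.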

In the current research, KIF15, a component gene of the CSC-related signature, was found to be significantly upregulated in TNBC cells, especially in the MDA-MB-453 and MDA-MB-231 cell lines. KIF15 was then successfully knocked down in these two cell lines. Notably, the stem cell markers Nanog, SOX2, and OCT4 were elevated after si-KIF15-1 and si-KIF15-2 transfection in TNBC cells68,69,70. These changes were observed at both the mRNA and protein levels. Consequently, KIF15 was considered to play a significant role in the progression of tumor cells by regulating CSC development. In addition, sphere numbers and sizes were markedly increased in the sphere formation assay after KIF15 knockdown, suggesting that KIF15 may suppress the stemness of TNBC cells. To further test this conjecture, KIF15 mRNA expression was compared between sphere and adherent TNBC cells by qRT-PCR. The lower expression of KIF15 in sphere cells supported our inference. In future research, we will continue to investigate the role of KIF15 in tumor cell stemness and the specific mechanisms by which it affects the prognosis of TNBC. KIF15 may become a tumor marker for assessing TNBC prognosis in clinical practice, and during the treatment follow-up of TNBC patients it could serve as a factor for evaluating therapeutic efficacy. Furthermore, the role of CSCs in the progression of TNBC remains unclear. Further research into the relationship between CSC-related genes and TNBC prognosis could lead to a more refined classification of TNBC, aiding the identification of new targets for targeted therapy and the achievement of precision treatment.

Naturally, this study has a number of limitations that must be discussed and addressed through further validation. First, the data in this study were collected from public databases, without prospective clinical data from our medical center. The data in these databases originate from different laboratories and studies, with variations in experimental conditions, sample handling, and data standardization methods. Additionally, we excluded samples with incomplete information and those with short survival times, which may affect the comprehensiveness and accuracy of the analysis. Although the TNBC samples used as training and validation sets were sufficient to support our findings, additional validation of the predictive value of the CSC-related risk model in a large cohort from our medical center remains important for future studies. Second, the CSC-related genes used to establish the risk model were identified through bioinformatics. Different clustering algorithms and parameter choices can affect the final consensus results, and the outcomes of consensus clustering depend on the quality of the initial clustering results. In some cases, determining the optimal number of clusters (k) remains a challenge: although consensus clustering can provide stability assessments, it does not directly address the selection of the number of clusters. Furthermore, our study included a relatively small dataset, and despite the use of various analytical methods to construct the risk model, the risk of model overfitting persists. Functional experiments are therefore needed to validate the associations among the CSC-related risk model, the TME of stem cells, anti-tumor immunotherapies, and chemotherapeutic resistance. Third, according to our results, KIF15 played a protective role in the development of TNBC stem cells. However, the specific function of KIF15 in tumor stem cells and the mechanism by which it regulates tumor cell development are not clear. These questions will also be a focus of our future research on TNBC.

In conclusion, three stemness-associated subgroups with varying prognoses, immune cell infiltration, and chemotherapeutic responses were systematically identified by unsupervised clustering on stem cell gene sets. A four-gene stemness-associated signature was established with the potential to predict the prognosis of TNBC patients. KIF15 plays diverse roles in different tumors and even in different subpopulations within the same tumor. The upregulation of KIF15 expression in a subpopulation of TNBC stem cells may lead to a loss of stemness and may inhibit the progression of TNBC.