Introduction

Repeated implantation failure (RIF) is a disorder in which good-quality embryos fail to implant in the endometrium following several in vitro fertilization cycles [1]. RIF occurs in approximately 10% of women undergoing in vitro fertilization treatment [2]. Although therapies, such as adjuvant drugs (e.g., growth hormone, androgens, and glucocorticoids), have been used to increase pregnancy rates in women with RIF [3, 4], accurately diagnosing and treating women with RIF are difficult tasks.

Some RNAs have been found to be differentially expressed in women with RIF compared to those in normal controls [5]. A previous study revealed that long noncoding RNAs (lncRNAs) participate in regulating endometrial receptivity [5]. A recent study indicated that the lncRNA, TUNAR, is involved in embryo implantation during RIF [6]. According to Li et al., the lncRNA, ENST00000433673, promotes high mRNA expression of ICAM1 and the adhesion of endometrial epithelial cells, which facilitates adhesion and implantation between the embryo and the mother [7]. Diverse endometrial mRNA signatures have also been recognized to contribute to the development of RIF [8]. According to a previous study, microRNAs (miRNAs) participate in endometrial construction in women with RIF and can be used as biomarkers to predict embryo implantation more reliably [9]. A genome-wide study revealed that lncRNAs, miRNAs, and mRNA form a competing endogenous RNA (ceRNA) network during the progression of RIF [10]. In addition, a previous study indicated that ceRNA networks are involved in functions, such as immunological activity, which prepare the endometrium for embryo implantation [11]. Overall, RNAs and the ceRNA regulatory network may play vital roles in the development of RIF. However, the detailed molecular mechanisms and key genes in the ceRNA network that contribute to RIF progression are still unclear.

This multicenter study was retrospective in design. Briefly, bioinformatic analyses were performed based on two lncRNA/mRNA expression profiles and one miRNA expression profile. Genes, including differentially expressed mRNAs (DEMs), differentially expressed lncRNAs (DElncRNAs) and differentially expressed microRNAs (DEmiRNAs) between normal and RIF samples were explored, and then weighted gene co-expression network analysis (WGCNA) was performed. The functions, pathways, and lncRNA-mRNA interactions were investigated based on these genes. Moreover, a ceRNA network was constructed based on the lncRNA-mRNA and miRNA-mRNA interactions. Finally, the expression of the potential genes in the ceRNA network was verified using an additional mRNA dataset. Overall, this study aimed to identify the potential biomarkers and molecular mechanisms associated with RIF.

Materials and methods

Microarray data

The keywords “repeated implantation failure” and “Homo sapiens” were used to search all expression profile data uploaded in the past five years (2015–2020) in the Gene Expression Omnibus (GEO) database [12]. Datasets meeting the following criteria were included in our study: (i) sample size > 10; (ii) expression profile data derived from endometrial tissue samples; and (iii) samples classified as RIF and normal control. A total of four datasets, including GSE71331 (lncRNA + mRNA), GSE111974 (lncRNA + mRNA), GSE71332 (miRNA), and GSE58144 (mRNA), were selected. Overall, 108 normal control samples (C group) and 79 RIF samples (RIF group) were used in the current study. A flow chart of the current study is provided in Supplementary Fig. 1. Among the four datasets, GSE71332 was used as the test set, while GSE58144 was used as the validation set because it comprised a single data type. Detailed information on these datasets is provided in Table 1.

Table 1 Detailed information on all datasets enrolled in the current study

Data preprocessing

Gene expression profile data were downloaded and annotated. For genes associated with different probes, the average values of the different probes were used as the final expression values. A probe with annotation information of “protein_coding” was reserved as an mRNA-compatible probe, while a probe with annotation information of “noncoding RNA” and “pseudogene” was reserved as an lncRNA-corresponding probe. For different miRNAs mapped to the same new ID, the mean value of the different miRNA IDs was considered as the final expression value of this new miRNA ID [13]. This process was performed using the mean algorithm in R software.
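The probe-averaging step described above can be illustrated with a minimal Python sketch. The authors worked in R, so this is only an equivalent illustration and not their code; the function name `collapse_probes`, the probe IDs, and the intensity values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def collapse_probes(probe_values, probe_to_gene):
    """Average the per-sample values of all probes annotated to the same gene."""
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:  # probes without an annotation are dropped
            continue
        grouped[gene].append(values)
    # zip(*rows) walks the samples column by column
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}

# Hypothetical probe IDs and intensities for two samples
probe_values = {
    "probe_1": [2.0, 4.0],
    "probe_2": [4.0, 6.0],
    "probe_3": [1.0, 1.5],
    "probe_4": [9.0, 9.0],  # no annotation, so it is ignored
}
probe_to_gene = {"probe_1": "SEMA5A", "probe_2": "SEMA5A", "probe_3": "ZNF555"}
gene_values = collapse_probes(probe_values, probe_to_gene)
```

Here the two probes assigned to the same gene are merged into one row per sample, which is the behaviour the text describes for genes covered by several probes.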

Analysis of the differentially expressed genes

The ComBat function [14] in R software (version: 3.34.0) [15] was used to eliminate heterogeneity (batch effects) between the GSE71331 and GSE111974 datasets. Thereafter, the two datasets were combined, and the empirical Bayesian methods in the limma package (3.10.3) of R [16] and t-test were employed to explore the DEMs, DElncRNAs, and DEmiRNAs between the C and RIF groups based on the lncRNA + mRNA and miRNA expression matrices. A Benjamini & Hochberg (BH) adjusted P < 0.05 and |log2 fold change (FC)| > 0.585 (a fold change of approximately 1.5) were selected as thresholds. Finally, the results were visualized using a heat map.
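The thresholding step can be sketched as follows, assuming that per-gene P values and log2 fold changes have already been produced by the moderated t-test. This sketch covers only the BH adjustment and the two cut-offs; it does not reproduce limma, and the gene names and numbers are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

LOG2FC_CUTOFF = 0.585  # approximately log2(1.5)
ALPHA = 0.05

def call_de(genes, log2fc, pvalues):
    """Label genes passing both thresholds as 'up' or 'down'."""
    result = {}
    for gene, lfc, q in zip(genes, log2fc, bh_adjust(pvalues)):
        if q < ALPHA and abs(lfc) > LOG2FC_CUTOFF:
            result[gene] = "up" if lfc > 0 else "down"
    return result

# Hypothetical inputs
genes = ["g1", "g2", "g3", "g4"]
log2fc = [1.2, -0.9, 0.3, 0.7]
pvalues = [0.001, 0.01, 0.02, 0.2]
adjusted = bh_adjust(pvalues)
de = call_de(genes, log2fc, pvalues)
```

In this toy input, `g3` passes the adjusted P threshold but not the fold-change threshold, and `g4` passes the fold-change threshold but not the adjusted P threshold, so neither is called differentially expressed.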

Co-expression network analysis

WGCNA (version 1.61) [17] was performed to elucidate the co-expression network. Briefly, the optimal soft threshold (0–1) was selected, and a scale-free network and module partition analysis was performed. The module significance (MS) of each module was calculated, and the correlation between MS and clinical traits in the module was determined. Finally, the gene significance (GS) in the module was investigated.
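To illustrate what the soft threshold does, the sketch below raises absolute correlations to a power β, which is how WGCNA defines adjacency in an unsigned network. Whether the authors used a signed or unsigned network is not stated, so the unsigned form is an assumption here. The correlation values are hypothetical, and β = 16 is the value reported in the Results.

```python
def soft_threshold_adjacency(correlation, beta):
    """Unsigned WGCNA-style adjacency: a_ij = |cor_ij| ** beta."""
    return [[abs(r) ** beta for r in row] for row in correlation]

# Hypothetical pairwise correlations among three genes
correlation = [
    [1.0, 0.9, -0.5],
    [0.9, 1.0, 0.2],
    [-0.5, 0.2, 1.0],
]
adjacency = soft_threshold_adjacency(correlation, beta=16)

# Connectivity of each gene: sum of its adjacencies to the other genes
connectivity = [sum(row) - row[i] for i, row in enumerate(adjacency)]
```

With a high power, a correlation of 0.9 keeps a noticeable weight while a correlation of 0.5 is pushed close to zero, which is what moves the network toward a scale-free topology.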

Function and pathway enrichment analysis

ClusterProfiler (version: 3.16.0) [18] software was used for Gene Ontology (GO)-biological function (GO-BP) [19] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses [20]. P < 0.05 and count ≥ 2 were selected as cut-off values for the enrichment analysis.
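Over-representation analysis of the kind performed by clusterProfiler rests on a hypergeometric test. A minimal sketch of that test together with the two cut-offs above is given here; the function names and the gene counts are hypothetical and are not taken from the study.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """P(X >= k) when n query genes are drawn from N background genes,
    of which K are annotated to the term (hypergeometric upper tail)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def is_enriched(k, n, K, N, alpha=0.05, min_count=2):
    """Apply the cut-offs used in the text: P < 0.05 and count >= 2."""
    return k >= min_count and enrichment_p(k, n, K, N) < alpha

# Hypothetical counts: 3 of 10 query genes hit a term with 5 annotated genes
# in a background of 100 genes
p_small = enrichment_p(3, 10, 5, 100)
significant = is_enriched(3, 10, 5, 100)

# A weaker overlap: 2 of 4 query genes, 5 annotated genes, background of 20
p_weak = enrichment_p(2, 4, 5, 20)
```

The second case shows why the count filter alone is not enough: two overlapping genes meet the count cut-off, but the P value is well above 0.05.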

lncRNA-mRNA co-expression investigation

lncRNA-mRNA pairs with r > 0.5 and FDR-adjusted P < 0.05 were selected for further analysis [21]. The results were visualized using the Cytoscape (version 3.4.0) software [22]. The CytoNCA software was used to carry out node topology property analysis in the network [23]. Finally, the degree, betweenness, and closeness of the nodes were obtained.
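The pair selection and the degree calculation can be sketched as below. The text does not name the correlation coefficient, so Pearson's r is assumed, and the FDR-adjusted P filter is omitted for brevity. All identifiers and expression values are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def coexpression_edges(lncrnas, mrnas, r_cutoff=0.5):
    """Keep lncRNA-mRNA pairs whose correlation exceeds the cut-off."""
    edges = []
    for lnc, x in lncrnas.items():
        for gene, y in mrnas.items():
            r = pearson_r(x, y)
            if r > r_cutoff:
                edges.append((lnc, gene, r))
    return edges

def node_degree(edges):
    """Degree of each node: the number of edges it takes part in."""
    degree = {}
    for lnc, gene, _ in edges:
        degree[lnc] = degree.get(lnc, 0) + 1
        degree[gene] = degree.get(gene, 0) + 1
    return degree

# Hypothetical expression profiles across four samples
lncrnas = {"lnc_A": [1.0, 2.0, 3.0, 4.0]}
mrnas = {
    "gene_1": [2.0, 4.1, 5.9, 8.0],  # rises with lnc_A
    "gene_2": [4.0, 3.0, 2.0, 1.0],  # falls as lnc_A rises
}
edges = coexpression_edges(lncrnas, mrnas)
degree = node_degree(edges)
```

Only the positively correlated pair is retained because the threshold is r > 0.5 rather than |r| > 0.5. Betweenness and closeness, which CytoNCA also reports, need shortest-path computations and are not shown.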

Enrichment analysis of the lncRNAs

Pathway enrichment analysis of the mRNA (considered as a target gene of lncRNA) was performed using clusterProfiler (version: 3.8.1) in R software [24]. P < 0.05 and count ≥ 2 were considered as cut-off values for significant enrichment results.

ceRNA network construction

The online tool miRWalk (version: 2.0) [25, 26] was used to predict the target genes of miRNAs. If the predicted mRNA existed in at least four of the six databases (including miRWalk2.0 [26], miRDB [27], TargetScan [28], miRanda [29], RNA22 [30], and miRMap [31]), it was considered as a potential mRNA that is regulated by the corresponding miRNA. The miRNAs of key lncRNAs were predicted using the DIANA-LncBase (version: 2.0) database [32]. Finally, the lncRNAs and mRNAs that were not only regulated by the same miRNA but also had a positive expression relationship were selected to construct the ceRNA network, which was then visualized using Cytoscape software (version: 3.4.0).
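The two selection rules above (support from at least four of six databases, then a shared miRNA plus positive co-expression) can be sketched as follows. The database labels, RNA names, and predictions are placeholders for illustration only and do not reflect any real prediction output.

```python
def supported_targets(predictions, min_databases=4):
    """Keep miRNA-mRNA pairs predicted by at least `min_databases` tools."""
    support = {}
    for pairs in predictions.values():
        for pair in set(pairs):  # count each database at most once per pair
            support[pair] = support.get(pair, 0) + 1
    return {pair for pair, n in support.items() if n >= min_databases}

def cerna_triples(mirna_mrna, mirna_lncrna, coexpressed):
    """lncRNA-miRNA-mRNA triples that share a miRNA and whose lncRNA and
    mRNA are positively co-expressed."""
    triples = []
    for mirna, mrna in sorted(mirna_mrna):
        for other_mirna, lncrna in sorted(mirna_lncrna):
            if mirna == other_mirna and (lncrna, mrna) in coexpressed:
                triples.append((lncrna, mirna, mrna))
    return triples

# Placeholder predictions from six hypothetical databases
predictions = {
    "db1": [("miR-X", "mRNA_1"), ("miR-X", "mRNA_2")],
    "db2": [("miR-X", "mRNA_1")],
    "db3": [("miR-X", "mRNA_1"), ("miR-X", "mRNA_2")],
    "db4": [("miR-X", "mRNA_1")],
    "db5": [("miR-X", "mRNA_2")],
    "db6": [],
}
mirna_mrna = supported_targets(predictions)
mirna_lncrna = {("miR-X", "lncRNA_1")}    # placeholder lncRNA-miRNA pair
coexpressed = {("lncRNA_1", "mRNA_1")}    # placeholder co-expressed pair
triples = cerna_triples(mirna_mrna, mirna_lncrna, coexpressed)
```

In this toy input, `mRNA_2` is predicted by only three databases and is therefore excluded before the ceRNA triples are assembled.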

Verification of the identified mRNAs

The expression of the potential genes identified in the ceRNA network was verified using the GSE58144 dataset. The gene expression results between the two groups were visualized using a box diagram with R software. Differences were considered statistically significant at P < 0.05.

Reverse transcription-quantitative polymerase chain reaction (RT-qPCR)

The study comprised two groups: women diagnosed with RIF and healthy volunteers. RIF samples were collected from 10 women diagnosed with RIF during the luteal phase at the Assisted Reproductive Medicine Department of Shanghai Ninth People’s Hospital. The participants met the following inclusion criteria: age, 18–38 years; three or more implantation failures; and no hormonal preparations used for 3 months prior to sample collection. Women with serious internal and external diseases were excluded from the study. The 10 healthy volunteers were recruited using advertisements posted on social media. The volunteers met the following inclusion criteria: age, 18–38 years; proven fertility (a normal history of childbirth in the past 3 years); and no hormonal or intrauterine contraception used for at least 3 months prior to sample collection. Women with serious internal and external diseases were excluded from the study. RT-qPCR was performed to determine the expression of the key RNAs. Total RNA was extracted using TRIzol reagent, and cDNA was reverse transcribed using the RevertAid First Strand cDNA Synthesis Kit (Thermo Scientific, Waltham, MA, USA). The cDNA was then subjected to RT-qPCR using the SYBR green method (Takara Bio, Inc., Shiga, Japan) and the Bio-Rad iQ5 Multicolor Real-Time PCR Detection System (Bio-Rad Laboratories, Inc., Hercules, CA, USA). The following reaction conditions were employed: 95 °C for 10 min, followed by 40 cycles of 15 s at 95 °C, 60 s at 60 °C, and 1 min at 95 °C. The relative mRNA expression of each gene was normalized to that of GAPDH, while the relative miRNA expression was normalized to that of the U6 snRNA. The primer sequences are listed in Table 2. This study was approved by the Ethics Committee of the Shanghai Ninth People’s Hospital, which is affiliated with Shanghai Jiaotong University School of Medicine. The experiment was conducted in triplicate. Statistical significance was determined based on the cut-off value of P < 0.05.
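Normalization to a reference gene is commonly computed with the 2^(-ΔΔCt) method. The text does not state which formula was used, so the sketch below is an assumption about one standard way to carry out the normalization described above. The Ct values are hypothetical triplicates.

```python
from statistics import mean

def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target RNA by the 2^(-ddCt) method (assumed here).

    Each argument is a list of replicate Ct values; the reference would be
    GAPDH for mRNAs or U6 snRNA for miRNAs.
    """
    delta_ct_sample = mean(ct_target) - mean(ct_reference)
    delta_ct_control = mean(ct_target_ctrl) - mean(ct_reference_ctrl)
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)

# Hypothetical triplicate Ct values
fold = relative_expression(
    ct_target=[24.0, 24.5, 23.5],          # target RNA, RIF sample
    ct_reference=[18.0, 18.5, 17.5],       # reference RNA, RIF sample
    ct_target_ctrl=[26.0, 26.5, 25.5],     # target RNA, control sample
    ct_reference_ctrl=[18.0, 18.5, 17.5],  # reference RNA, control sample
)
```

A target that crosses the threshold two cycles earlier in the RIF sample, at equal reference Ct, corresponds to a four-fold higher relative expression.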

Table 2 Primers used in RT-qPCR

Results

DEMs, DEmiRNAs, and DElncRNAs between the C and RIF groups

A total of 53 DEmiRNAs (46 upregulated and seven downregulated miRNAs), 327 DEMs (171 upregulated and 156 downregulated mRNAs), and 13 DElncRNAs (eight upregulated and five downregulated lncRNAs) were identified between the C and RIF groups. The volcano plot and heatmap of these genes are shown in Fig. 1.

Fig. 1

Volcano plot and heatmap of the differentially expressed lncRNAs (A, B), miRNAs (C, D), and mRNAs (E, F)

The red dot represents upregulated genes; the blue dot represents downregulated genes. For the volcano plot, the X-axis represents the value of log2(FC) while the Y-axis represents the -Log10(p-value); the labeled dots in the volcano plot represent the top 5 upregulated and downregulated genes

WGCNA

WGCNA was performed using all genes. The soft threshold for network construction was 16 (Supplementary Fig. 2), and the fitting degree of the scale-free topological model was 0.9. Using a cutHeight value of 0.3, nine modules were obtained in the current study (Supplementary Fig. 3). The relationship between the expression values of genes in the module and the adjacency correlation between modules was calculated and visualized using a heat map (Supplementary Fig. 4). Based on the correlation between modules and disease status, the black and magenta modules were found to have the highest correlation with disease (Fig. 2A). The correlation between modules and disease status was further investigated according to GS (Fig. 2B). The magenta module was found to be positively correlated with the disease (r = 0.73, P = 3e-11) and was considered the key module for the subsequent analysis.

Fig. 2

Results of the correlation analysis for nine modules and disease status. A, Correlation between modules extracted from the datasets and disease; the X-axis represents the different modules while the Y-axis represents the gene significance value. B, Correlation between module and disease status based on the P value

Enrichment investigation for genes in the two modules

Enrichment analysis was performed using the genes in the black and magenta modules. The 315 genes in the black module were found to be mainly enriched in 243 GO-BP functions, such as glomerulus development (GO:0032835, P = 0.0002) (Fig. 3A), and 19 KEGG pathways, such as axon guidance (hsa04360, P = 0.0016) (Fig. 3B). Meanwhile, the 99 genes in the magenta module were mainly enriched in 109 GO-BP functions, such as the regulation of cell growth (GO:0001558, P = 0.0005) (Fig. 3C), and 16 KEGG pathways, such as glioma (hsa05214, P = 0.0034) (Fig. 3D).

Fig. 3

Results of the enrichment analysis for genes in the black and magenta modules. A, Gene ontology (GO) biological function (GO-BP) assembled based on genes in the black module. B, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enriched by genes in the black module. C, GO-BP assembled based on genes in the magenta module. D, KEGG pathway enriched by genes in the magenta module. The redder the bubble, the more significant the P value. The larger the bubble, the greater the number of genes enriched in the item

lncRNA-mRNA interaction network analysis

Intersection analysis among genes in the magenta module, all DEMs, and all DElncRNAs revealed three intersecting DElncRNAs and 40 intersecting DEMs (Supplementary Fig. 5). The lncRNA-mRNA interactions were then evaluated using these intersecting genes. Based on the results, three lncRNAs (peptidylprolyl isomerase E-like pseudogene (PPIEL), CTAGE family member 7, pseudogene (CTAGE7P), and testis-specific transcript, Y-linked 14 (TTTY14)), 33 mRNAs, and 61 interactions were found in the lncRNA-mRNA interaction network (Fig. 4). Moreover, an enrichment analysis was performed using the three lncRNAs in the lncRNA-mRNA interaction network. These lncRNAs were found to be mainly enriched in GO-BP functions, such as plasma membrane organization (GO:0007009, P = 0.0001, fold enrichment = 27.36), and KEGG pathways, such as pancreatic cancer (hsa05212, P = 0.0024, fold enrichment = 19.32) (Fig. 5).

Fig. 4

lncRNA-mRNA interaction network constructed based on intersecting genes in the magenta module

The red diamond represents the upregulated lncRNA; the pink circle represents the upregulated mRNAs; the line between the two nodes represents the interaction

Fig. 5

Enrichment analysis of the three intersecting lncRNAs in the magenta module. A, GO-BP functions assembled based on the three intersecting lncRNAs. B, KEGG pathways enriched by the three intersecting lncRNAs. The X-axis represents the different lncRNAs, while the Y-axis represents the different items of GO-BP functions or KEGG pathways

ceRNA network investigation

Prediction of the interacting DEMs regulated by DEmiRNAs revealed 15 mRNAs and 27 DEmiRNAs, while prediction of the interacting DElncRNAs regulated by DEmiRNAs revealed two lncRNAs and three DEmiRNAs. A ceRNA network was constructed with the lncRNA-miRNA-mRNA interactions, including TTTY14-miR-6088-semaphorin 5A (SEMA5A) (Fig. 6). In total, three miRNAs, two lncRNAs, and eight mRNAs were found in the network.

Fig. 6

lncRNA-miRNA-mRNA network

The pink circle represents upregulated mRNA; the red diamond represents upregulated lncRNA; the green hexagon represents downregulated miRNA. The blue line represents the co-expression relation. The grey line represents the regulatory relation

Verification of the identified mRNAs

The expression of the eight mRNAs identified in the ceRNA network was verified using the GSE58144 dataset. Only six of these mRNAs, namely barH-like homeobox 1 (BARHL1), calcium/calmodulin-dependent protein kinase II beta (CAMK2B), cyclin-dependent kinase 6 (CDK6), lysosomal-associated membrane protein 3 (LAMP3), SEMA5A, and zinc finger protein 555 (ZNF555), were available in GSE58144. Gene expression in the C and RIF groups was visualized using a box diagram. All six mRNAs showed higher expression in the RIF group than in the C group, but the difference was statistically significant only for SEMA5A and ZNF555 (both P < 0.05) (Fig. 7).

Fig. 7

Expression verification of the mRNAs in the ceRNA network

The X-axis represents different groups while the Y-axis represents the expression value. * P < 0.05 compared with the control group

RT-qPCR

Finally, we validated the key RNAs, namely SEMA5A, ZNF555, TTTY14, PPIEL, and miR-6088, using RT-qPCR. The levels of SEMA5A, ZNF555, TTTY14, and PPIEL were confirmed to be significantly upregulated in the RIF group, while that of miR-6088 was significantly downregulated, relative to those in the C group (all P < 0.05; Fig. 8).

Fig. 8

Reverse transcription-quantitative polymerase chain reaction (RT-qPCR) analysis of the five RNAs

**P < 0.01, ***P < 0.001, ****P < 0.0001

Discussion

Some lifestyle factors, such as excessive mental stress, irregular sleep patterns, lack of exercise, or excessive exercise, are reported to be associated with an increased risk of RIF [33]. Although the incidence of RIF is high in women following in vitro fertilization treatment, the detailed molecular mechanism remains unclear [34]. In the present bioinformatics study, 53 DEmiRNAs, 327 DEMs, and 13 DElncRNAs were found between the C and RIF groups. WGCNA revealed that the magenta module was positively correlated with RIF disease status. According to the lncRNA-mRNA interaction analysis based on genes in the magenta module, three intersecting lncRNAs, including PPIEL and TTTY14, were found; these lncRNAs were mainly involved in functions such as plasma membrane organization. Moreover, ceRNA network analysis revealed several interactions, such as TTTY14-miR-6088-SEMA5A. Finally, verification analysis showed that SEMA5A and ZNF555 were significantly upregulated in the RIF group compared to those in the C group. RT-qPCR was performed to verify the key RNAs. Notably, the RT-qPCR results were consistent with the above results.

SEMA5A is involved in axonal guidance during neural development [35]. SEMA5A not only functions in the nervous system, but also in the development of diseases. Dziobek et al. revealed that SEMA5A was upregulated in endometrial cancer and could be used as a supplementary molecular marker for endometrial cancer [36]. In animal models, SEMA5A is regulated by miR-24-1-5p, which participates in endometriosis during the implantation window in rats [37]. Honda et al. performed microarray analysis on neonatal brain exposed to cadmium during gestation and lactation and found abnormal expression of SEMA5A [38]. Moreover, endometrial miRNAs, such as miR-6088, are altered during the window of implantation in patients with RIF [39]. Importantly, embryo implantation requires an optimal endometrial environment, which includes signaling by miRNAs, such as miR-6088 [9]. According to previous studies, miR-6088 is dysregulated in women with a high risk of ovarian cancer [40], and TTTY14 is significantly correlated with overall survival in patients with cancer [41]. Bhat et al. utilized a panel of molecular biology tools to examine the Y chromosome microchimerism in the endometrium using secretory phase samples from fertile and infertile patients with severe (stage IV) ovarian endometriosis and without endometriosis. Based on their results, TTTY14 exhibited a bimodal pattern of expression characterized by low expression in samples from fertile patients and high expression in samples from infertile patients [42]. The lncRNAs in ceRNA networks have been proven to contribute to RIF progression by regulating the miRNA-mRNA interaction, and can be used as biomarkers for predicting endometrial receptivity [11]. Consistent with the above results, in the present study, SEMA5A was significantly upregulated in the RIF group compared with that in the C group; TTTY14-miR-6088-SEMA5A was one of the ceRNA interactions identified in the current study. Thus, we speculated that the lncRNA, TTTY14, might participate in RIF progression by regulating the miR-6088/SEMA5A axis.

In a previous study, ZNF555 was identified as a putative transcriptional factor highly expressed in human diseases, such as in primary myoblasts, and could serve as a therapeutic target [43]. The lncRNA, PPIEL, is also closely related to non-small cell lung cancer in women [44]. Different DNA methylation statuses of PPIEL were previously revealed to contribute to the development of diseases [45]. Yeung et al. determined whether DNA methylation at birth and in childhood differs by conception using assisted reproductive technologies or ovulation induction compared with conception without fertility treatment. Based on their results, regions in maternally imprinted genes, including IGF1R, PPIEL, and SVOPL, had decreased mean DNA methylation levels among newborns conceived by assisted reproductive technologies [46]. In this study, ZNF555 was one of the two significantly upregulated DEMs in the RIF group relative to the levels in the C group. Meanwhile, the lncRNA, PPIEL, was not only differentially expressed between the C and RIF groups but also enriched in the GO-BP function of plasma membrane organization. More importantly, the RT-qPCR results were consistent with the above results. Thus, we speculated that the mRNA, ZNF555, and lncRNA, PPIEL, might be novel biomarkers for RIF, and PPIEL might participate in the progression of RIF via its plasma membrane organization function.

This study had some limitations, such as its small sample size. Thus, further verification analysis is needed with a larger sample size. In addition, no functional assay was performed to validate the conclusion that the lncRNA TTTY14 participates in the progression of RIF by regulating the miR-6088/SEMA5A axis.

In conclusion, the lncRNA, TTTY14, may participate in the progression of RIF by regulating the miR-6088/SEMA5A axis. Moreover, the mRNA, ZNF555, and lncRNA, PPIEL, could be considered novel biomarkers for RIF, and the lncRNA, PPIEL, might participate in the progression of RIF via its plasma membrane organization function.