Introduction

Breast cancer is one of the most commonly diagnosed cancers during pregnancy; breast cancer diagnosed during pregnancy is referred to as pregnancy-associated breast cancer. It affects approximately 1 in every 3000 pregnant women and is associated with poor prognosis [1]. It is therefore important to find appropriate treatment for breast cancer during pregnancy. Protein post-translational modifications (PTMs) are recognized as key regulators of protein function and give rise to the diverse functions of proteins.

PTMs can occur at any stage of a protein's life, enhancing or reducing its properties and functionality, or sometimes leading to its complete degradation. As a whole, PTMs are therefore important controllers of cellular function. For example, phosphorylation, glycosylation and ubiquitination regulate many cellular events, such as signal transduction and protein-protein interactions. Consequently, altered PTMs affect cellular growth mechanisms, which may in turn lead to abnormal cellular proliferation [2]. Altered PTMs have also been associated with many diseases and disorders [3]. Understanding post-translational modification is thus important for characterizing cancer biology.

In this study, the data mining method of association rule mining is applied to select a specific set of genes from the differentially expressed genes (DEGs). Most previous studies applied clustering to analyze microarray gene expression data in order to find groups of genes with similar expression across different biological conditions. A clustering algorithm usually groups genes based on their similarity across two or more biological conditions, so a single gene cannot be placed in two or more groups even when it shares some similarity with each of them. The main drawback of clustering is thus this single grouping of genes, which loses the information that a single gene can interact with different sets of genes; an unsupervised data mining technique such as association rule mining can instead describe these relationships among genes. Association rule mining searches for frequent patterns and can be applied to gene expression data as an alternative to clustering [4–6]. The Apriori algorithm is an efficient algorithm [7] for discovering previously unknown associations in categorical data; here it is applied to find the frequently associated PTMs of the differentially expressed genes of pregnancy-associated breast cancer (PABC). This study therefore explores the power of rule mining for gene expression analysis. It attempts to provide in-depth insight into PTMs and their contribution to PABC through an integrative analysis of important genes and pathways. Furthermore, a protein-protein interaction network has been constructed to identify target genes.

Methods

Data Collection and Preprocessing

To study the impact of breast cancer during pregnancy, the dataset GSE31192 [8] was downloaded from the publicly available NCBI Gene Expression Omnibus (GEO) repository; it was generated on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array, HG-U133 Plus 2). Of the samples, 13 were normal and 20 were tumor samples, including both PABC and non-PABC patients. The raw dataset was preprocessed using the limma package in R (V.3.10.1). DEGs were selected from the normalized data of tumor and normal PABC samples using p-value < 0.05 and |log FC| ≥ 1.5 as the threshold.
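The selection rule above amounts to a two-condition filter on a differential expression table. The sketch below is illustrative only: the record type `ProbeResult`, its fields and the toy values are hypothetical, the log fold change is taken as given (the text does not state the log base), and no multiple-testing correction is applied, since none is mentioned.

```python
from dataclasses import dataclass

@dataclass
class ProbeResult:
    gene: str
    log_fc: float   # log fold change, tumor vs normal (hypothetical field name)
    p_value: float  # p-value from the differential expression test

def split_degs(results, p_cut=0.05, lfc_cut=1.5):
    """Return (up, down) gene lists using p < p_cut and |logFC| >= lfc_cut."""
    up, down = [], []
    for r in results:
        if r.p_value < p_cut and abs(r.log_fc) >= lfc_cut:
            # The sign of the fold change decides the direction of regulation
            (up if r.log_fc > 0 else down).append(r.gene)
    return up, down

# Hypothetical toy records
toy = [
    ProbeResult("GENE_A", 2.1, 0.001),
    ProbeResult("GENE_B", -1.8, 0.02),
    ProbeResult("GENE_C", 1.6, 0.20),   # fails the p-value cut
    ProbeResult("GENE_D", 0.9, 0.001),  # fails the fold-change cut
]
up, down = split_degs(toy)
print("up:", up)
print("down:", down)
```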

Association Rule Mining of PTM

For each DEG, the PTMs were collected from UniProt [9] and organized as a transaction dataset compatible with the Apriori algorithm. The frequently associated PTMs of the DEGs were then identified using the Apriori algorithm implemented in WEKA. This algorithm searches for frequent itemsets in transaction data [10] and can be applied to expression data [5]. Each transaction contains a list of items (a differentially expressed gene and its corresponding PTMs). For example, the rule (A → B, C) is interpreted as B and C being frequently associated with A; hence the rule Phosphorylation → Nitrosylation, Acetylation can be interpreted as nitrosylation and acetylation being frequently associated with phosphorylation, with this association overrepresented in the DEGs. The quality of the associations was measured using two indexes: support, the fraction of transactions that contain all items of a rule, and confidence, the fraction of transactions containing the antecedent that also contain the consequent. Rules with high support and confidence were retained, and their corresponding genes were collected for further analyses such as pathway enrichment and interaction network analysis.
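To make support and confidence concrete, the following is a minimal textbook Apriori sketch in plain Python. It is not the WEKA implementation used in the study, which has its own search strategy and parameters. Each gene is one transaction whose items are its PTM keywords; the gene names and PTM sets are hypothetical toy data.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori: return {itemset: support} for all frequent itemsets."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)

    def support(itemset):
        # Fraction of transactions (genes) whose PTM set contains the itemset
        return sum(1 for t in tsets if itemset <= t) / n

    singles = {frozenset([i]) for t in tsets for i in t}
    current = {c for c in singles if support(c) >= min_support}
    frequent = {c: support(c) for c in current}
    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: drop candidates with an infrequent (k-1)-subset (downward closure)
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        scored = {c: support(c) for c in candidates}
        current = {c for c, s in scored.items() if s >= min_support}
        frequent.update((c, scored[c]) for c in current)
        k += 1
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) for rules X -> Y."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]  # support(X u Y) / support(X)
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, supp, conf))
    return out

# Toy transactions: one per gene, items are PTM keywords (hypothetical)
ptms_by_gene = {
    "GENE1": {"Phosphorylation", "Nitrosylation", "Acetylation"},
    "GENE2": {"Phosphorylation", "Nitrosylation", "Acetylation", "Ubiquitination"},
    "GENE3": {"Sumoylation", "Acetylation"},
    "GENE4": {"Phosphorylation", "Nitrosylation", "Acetylation"},
}

frequent = apriori(ptms_by_gene.values(), min_support=0.5)
for lhs, rhs, supp, conf in association_rules(frequent, min_confidence=1.0):
    print(sorted(lhs), "->", sorted(rhs), f"support={supp:.2f} confidence={conf:.2f}")
```

In this toy data the rule Phosphorylation → Nitrosylation, Acetylation reaches confidence 1.0 because every phosphorylated gene also carries both other modifications, whereas Acetylation → Phosphorylation does not, since one acetylated gene lacks phosphorylation. Support is counted by a full scan of the transactions for each candidate, which is simple and adequate for a few hundred genes.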

Pathway Enrichment Analysis

Identifying the pathways and gene networks of DEGs involved in cancer progression can yield biologically significant information about the underlying cellular mechanisms. The KEGG PATHWAY database is a standard, comprehensive and valuable resource of biological networks [11]. The clustered genes were subjected to pathway enrichment analysis using the DAVID functional enrichment tool with thresholds of p-value < 0.05 and gene count > 2.
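The enrichment criterion (p < 0.05, gene count > 2) can be illustrated with a one-sided hypergeometric test, the standard over-representation test. This is a sketch under stated assumptions: DAVID actually reports a modified, more conservative Fisher exact p-value (the EASE score), which is not reproduced here, and the pathway sets, background size and gene names are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, pop_size, pathway_size, list_size):
    """P(X >= k) for X ~ Hypergeometric(pop_size, pathway_size, list_size)."""
    total = comb(pop_size, list_size)
    upper = min(pathway_size, list_size)
    hits = sum(comb(pathway_size, i) * comb(pop_size - pathway_size, list_size - i)
               for i in range(k, upper + 1))
    return hits / total  # exact integer arithmetic, then one float division

def enriched_pathways(gene_list, pathways, background_size, p_cut=0.05, count_threshold=2):
    """Keep pathways whose overlap count exceeds count_threshold and whose p < p_cut."""
    genes = set(gene_list)
    results = []
    for name, members in pathways.items():
        overlap = genes & members
        if len(overlap) <= count_threshold:  # require count > 2, as in the text
            continue
        p = hypergeom_upper_tail(len(overlap), background_size, len(members), len(genes))
        if p < p_cut:
            results.append((name, len(overlap), p))
    return sorted(results, key=lambda r: r[2])

# Hypothetical background, gene list and pathway memberships
background_size = 20000
gene_list = [f"G{i}" for i in range(100)]
pathways = {
    "PATHWAY_A": {f"G{i}" for i in range(4)} | {f"X{i}" for i in range(46)},     # 4 of 50
    "PATHWAY_B": {f"G{i}" for i in range(15)} | {f"Y{i}" for i in range(2985)},  # as expected by chance
    "PATHWAY_C": {f"G{i}" for i in range(2)} | {f"Z{i}" for i in range(8)},      # only 2 genes
}
for name, count, p in enriched_pathways(gene_list, pathways, background_size):
    print(name, count, f"{p:.2e}")
```

PATHWAY_B is not reported because 15 overlapping genes is roughly what chance alone would give for a pathway that covers 15% of the background, and PATHWAY_C fails the count filter before any test is run.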

Interaction Network Analysis

Protein-protein interaction networks for both up- and down-regulated DEGs were constructed using the online GeneMANIA [12] program with the attributes pathway, co-expression, genetic interaction, physical interaction and shared protein domains. Functional enrichment analysis of the network was performed with the DAVID tool [13]. The network was visualized in Cytoscape [14], and centrality measures were calculated with the NetworkAnalyzer plugin. The topological parameter betweenness centrality [15] of a node is the sum, over all pairs of other nodes, of the fraction of shortest paths between them that pass through that node; it therefore measures the node's control over the information flow in the network, and a node with high betweenness centrality can influence that flow by altering or hindering communication in the network [16]. Betweenness centrality is therefore used here as the measure for selecting hub genes in the network [17].
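The sketch below computes betweenness centrality with Brandes' algorithm for an unweighted, undirected graph and then ranks nodes, mirroring how hub genes were chosen. It is not Cytoscape's NetworkAnalyzer code: the normalization by (n − 1)(n − 2)/2 is one common convention for undirected graphs and should be checked against NetworkAnalyzer's documentation, and the edge list is hypothetical.

```python
from collections import deque

def adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def betweenness(adj, normalized=True):
    """Brandes' algorithm for betweenness centrality on an unweighted, undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of non-increasing distance from s
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint
    for v in bc:
        bc[v] /= 2
    n = len(adj)
    if normalized and n > 2:
        scale = (n - 1) * (n - 2) / 2  # number of node pairs not involving v
        for v in bc:
            bc[v] /= scale
    return bc

# Hypothetical network: one hub plus a single edge between two of its neighbours
edges = [("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("HUB", "D"), ("A", "B")]
scores = betweenness(adjacency(edges))
ranked = sorted(scores, key=scores.get, reverse=True)
print([(g, round(scores[g], 3)) for g in ranked])
```

Hub selection then reduces to keeping the highest-ranked nodes; the cut-off used in the study is not stated, so any threshold applied to `ranked` would be a further assumption.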

Results

Differentially Expressed Genes

Using p-value < 0.05 and |log FC| ≥ 1.5 as the threshold, 352 down-regulated and 321 up-regulated genes were identified; the top ten DEGs of each group are shown in Tables 1 and 2. The corresponding PTMs of both up- and down-regulated genes were retrieved from UniProt. Most of the up-regulated gene products are ribonucleotide-binding proteins involved in cell-cycle processes, and many of them reside in the cytoplasm. The down-regulated gene products are membrane proteins with calcium ion binding and kinase activity that are involved in receptor-linked signal transduction.

Table 1 Top ten up-regulated DEGs
Table 2 Top ten down-regulated DEGs

Association Rule Mining of PTM

The up- and down-regulated genes and their PTMs were organized as transaction datasets (Tables 3 and 4) to find frequently associated patterns using the Apriori algorithm. With a confidence threshold of 1.0, the top ten rules were selected; Tables 5 and 6 show some of the association rules for up- and down-regulated genes, respectively. Up-regulated genes are enriched in phosphorylation associated with sumoylation and caspase cleavage, whereas down-regulated genes are enriched in phosphorylation associated with ubiquitylation, N-linked glycosylation and acetylation. The genes involved in the top ten associations were selected for further analysis.
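The step of keeping confidence-1.0 rules, taking the top ten and collecting the genes that carry each rule's PTM pattern can be sketched as follows. Ranking by support (descending) is an assumption, since the text states only the confidence threshold; the rules and gene annotations are hypothetical but internally consistent toy values.

```python
def genes_for_top_rules(rules, ptms_by_gene, min_confidence=1.0, top_n=10):
    """Select the top rules and return the genes annotated with all PTMs of any selected rule.

    rules: list of (antecedent, consequent, support, confidence) with set-valued sides.
    Ranking by support is an assumption, not taken from the study.
    """
    kept = [r for r in rules if r[3] >= min_confidence]
    kept.sort(key=lambda r: r[2], reverse=True)
    selected = kept[:top_n]
    genes = set()
    for lhs, rhs, _, _ in selected:
        pattern = set(lhs) | set(rhs)  # all PTMs mentioned by the rule
        genes.update(g for g, ptms in ptms_by_gene.items() if pattern <= ptms)
    return selected, genes

# Hypothetical gene annotations and rules derived from them
ptms_by_gene = {
    "GENE1": {"Phosphorylation", "Sumoylation"},
    "GENE2": {"Phosphorylation", "Sumoylation", "Acetylation"},
    "GENE3": {"Phosphorylation", "Ubiquitination"},
    "GENE4": {"Acetylation"},
}
rules = [
    ({"Sumoylation"}, {"Phosphorylation"}, 0.50, 1.0),
    ({"Ubiquitination"}, {"Phosphorylation"}, 0.25, 1.0),
    ({"Phosphorylation"}, {"Sumoylation"}, 0.50, 0.67),  # below the 1.0 threshold
]
selected, genes = genes_for_top_rules(rules, ptms_by_gene)
print(len(selected), sorted(genes))
```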

Table 3 Up-regulated DEGs enriched in acetylation, ubiquitylation, methylation and citrullination
Table 4 Down-regulated genes enriched in ubiquitylation, phosphorylation and caspase
Table 5 Best association rules - up-regulated genes
Table 6 Best association rules - down-regulated genes

Pathway Enrichment Analysis

Pathway enrichment analysis reveals the interconnected events and biological interactions of the identified gene cluster. It shows that most of the differentially expressed genes are enriched in the Cytokine-cytokine receptor interaction, Focal adhesion and Chemokine signaling pathways. Cytokine deficiency leads to susceptibility to viral infections as well as to tumor growth [18]. The focal adhesion pathway is important in cell proliferation, cell survival and cell migration, and altered focal adhesion kinase activity is associated with cancer cells [19]. Table 7 shows that most growth-related pathways are altered in this dataset. Most of the genes are enriched in developmental processes, as expected since the samples come from pregnant patients. Further analysis is therefore required to eliminate genes involved in pregnancy-related developmental processes and to identify cancer-related genes; this was done by constructing an interaction network.

Table 7 Pathway enrichment score

Interaction Network Analysis

The protein-protein interaction network provides the topological and dynamic features of the gene products involved in disease mechanisms. The interaction network for the selected up- and down-regulated genes was constructed with GeneMANIA and consists of 86 nodes and 652 edges. Figure 1 shows the gene product interaction network for up- and down-regulated genes, with gene products differentiated by their betweenness centrality scores. The network was visualized, and the betweenness centrality of each node was calculated, with the Cytoscape visualization tool; the values are shown in Table 8. The genes KLF12, COL17A1, MKI67, BLM, FEN1, SP110, MUC1, TFAP2C, EGFR, TFRC, IRF1, TTK, STAT1, KIRREL, PDZRN3, RRM2 and FYB, which have high betweenness centrality, were selected. The contributions of the selected genes to cancer progression and pregnancy are tabulated (Table 8); this helps identify genes involved in cancer development but not in any form of fetal development or other pregnancy-related process. The roles of these genes are discussed briefly below.

Fig. 1

Protein-protein interaction network constructed on the basis of betweenness centrality. The Cytoscape tool was used to visualize the interactions, and the VizMapper graphics plugin was used to highlight the network with different shapes and shades. Proteins with higher betweenness centrality are highlighted with darker shades and distinct shapes.

Table 8 Network topological parameters

Discussion

Pregnancy is known to lower the lifetime risk of developing breast cancer, and an increased incidence of breast cancer is observed in nulliparous women and in women giving birth in their late 30s [20–22]. Approximately 7% of women with breast cancer are diagnosed before the age of 40 years, and breast cancer accounts for more than 40% of all cancers in women of this age group [23]. Although rare, breast cancer in young women deserves special attention because of the unique and complex issues it raises [24]. A better understanding of the driver pathways and genes of PABC is therefore imperative for improving diagnosis and therapeutic strategies for pregnant and lactating women [20]. This study aims to identify specific genes and pathways in PABC tissue expression data.

From the association rules, the PTM pattern of PABC was explored; PTMs act at every stage of a protein's lifetime to regulate its function. Protein phosphorylation regulates almost all aspects of cell life, and changes in phosphorylation alter protein function, which is reflected in cellular processes such as cancer evolution [25]. Phosphorylation and sumoylation of the progesterone receptor are involved in regulating mammary gland development. Poor sumoylation of progesterone receptors is significantly associated with cancer metastasis and shortened survival [26]. The phosphorylated progesterone receptor might be under-sumoylated during breast cancer development or mammary gland development [27]. Caspase cleavage regulates apoptotic cell death, and changes in caspase activity lead to diseases such as cancer [28, 29]. N-linked glycosylation is important for the stability of ATP-binding cassette (ABC) transporters, and increased expression of ABCG2 results in resistance to chemotherapy [30, 31]. Deglycosylated ABC transporters, which act as multidrug resistance proteins in cancer cells, are degraded by ubiquitylation.
Ubiquitylation regulates the stability of glycoproteins and thereby affects the function of the membrane proteins that mediate multidrug resistance [32]; glycoproteins have been found to be constitutively ubiquitinated in cancer cells. Such relationships between PTMs are mined by association rules. The overrepresented relationships and their corresponding PABC genes were selected from the top ten rules, and their interaction networks were obtained with the GeneMANIA tool. From the network, the DEGs KLF12, COL17A1, MKI67, BLM, FEN1, SP110, MUC1, TFAP2C, EGFR, TFRC, IRF1, TTK, STAT1, KIRREL, PDZRN3, RRM2 and FYB were selected as hub genes using the network topological parameter betweenness centrality. Most of the hub genes are related to pregnancy and fetal development as well as to cancer initiation, progression and metastasis. Among these, SP110 and KIRREL are expressed in cancer tissues, but their role in cancer is still unclear. MKI67 is involved in cellular proliferation and has been reported as a potential target for HR-positive breast cancer [33]; it is also expressed in normal pregnant patients [34], so targeting it could harm fetal development. BLM, TFAP2C, EGFR, LAMA1, CTSB, TTK, KIRREL, PDZRN3 and RRM2 are known to be expressed in cancer tissues and have previously been reported as biomarkers, but they play critical roles in fetal development processes involving the pronephros, brain and eyes and in embryonic angiogenic remodeling [35–43]. Inhibition or regeneration of these genes would therefore affect fetal development. Genes such as KLF12, FEN1, SP110 and MUC1, which are involved in cancer but do not harm pregnancy, lactation or fetal developmental processes, were therefore chosen for further study. KLF12 is a transcription factor reported as a potential target in gastric cancer, and it is also a negative regulator of decidualization and implantation in maternal endometrium development.
Hence, down-regulation of KLF12 may improve the growth and development of the conceptus while also preventing cancer growth. FEN1 is a tumor suppressor gene, and its overexpression leads to resistance to chemotherapy. Its overexpression during pregnancy leads to embryonic lethality, whereas normal fetal development was observed in FEN1−/− mouse models [44]. SP110 is involved in chromatin remodeling and formation, but its up-regulation results in hepatic veno-occlusive disease with immunodeficiency in the fetus; hence, inhibiting its expression could help progressive fetal development [45]. MUC1 is an important gene in preventing embryo implantation and in the development of ectopic pregnancy. It also interacts with EGFR and other receptor tyrosine kinases at the cell membrane and activates PI3K/AKT, the most frequently altered pathway in cancer development. It localizes to the nucleus, activates Wnt/β-catenin and signal transducer and activator of transcription (STAT) signaling, and is involved in the self-renewal of breast cancer cells through the NF-κB → IL-8/CXCR1 pathway [46, 47]. MUC1 may therefore act as a potential target for pregnancy-associated breast cancer. This study thus suggests that four genes (KLF12, FEN1, SP110 and MUC1) might be potential targets for PABC that do not affect fetal development and may also improve fetal implantation. For further validation, the expression of these four genes in cancer samples (antibody staining) was compared with that in breast and female reproductive system tissues from the Human Protein Atlas (HPA) database [48–50]. SP110 is expressed at a low level in breast and female reproductive tissues and at a medium level in breast cancer tissue samples. FEN1 is highly expressed in cancer tissues as well as in the endometrium, ovary and placenta, and expressed at a low level in normal breast tissue. MUC1 shows high antibody staining in breast cancer tissues but very low or undetectable expression in ovary, placenta and breast tissues.
This observation supports the view that targeting these genes may control cancer progression without disturbing normal tissues in which their expression is very low.
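Because the HPA comparison is qualitative, one way to make it explicit is to map staining levels to an ordinal scale and list, for each gene, the normal tissues with a lower level than in cancer. The levels below are transcribed from the preceding text; coding "very low or not detected" as "low" is an assumption, and this is an illustration rather than an HPA query (no HPA API is used).

```python
# Ordinal scale for HPA-style staining levels
LEVELS = {"not detected": 0, "low": 1, "medium": 2, "high": 3}

# Levels transcribed from the text; "very low or not detected" coded as "low" (assumption)
hpa = {
    "SP110": {"cancer": "medium", "breast": "low", "female reproductive": "low"},
    "FEN1": {"cancer": "high", "breast": "low", "endometrium": "high",
             "ovary": "high", "placenta": "high"},
    "MUC1": {"cancer": "high", "breast": "low", "ovary": "low", "placenta": "low"},
}

def lower_than_cancer(levels):
    """Return the normal tissues whose staining level is below the cancer level."""
    cancer = LEVELS[levels["cancer"]]
    return sorted(t for t, lvl in levels.items()
                  if t != "cancer" and LEVELS[lvl] < cancer)

for gene, levels in hpa.items():
    print(gene, "lower than cancer in:", lower_than_cancer(levels))
```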