Introduction

Homeobox (HOX) genes were initially characterized as developmental genes that encode transcription factors regulating embryogenesis. These genes are highly conserved throughout evolution. In humans, 39 HOX genes are distributed in four paralogous clusters, A, B, C, and D, located on chromosomes 7p15, 17q21.2, 12q13, and 2q31, respectively. Within each cluster, the genes are numbered from 1 to 13 according to their 3′-to-5′ order [1]. HOX genes control development along the anterior–posterior axis according to three principles. The first is spatial collinearity: the expression domains of HOX genes along the anterior–posterior axis of the embryo follow the same order as the genes in the cluster. The second is posterior prevalence: where expression overlaps, genes located more toward the 5′ end of a cluster have a dominant phenotype over those located more toward the 3′ end. The third is temporal collinearity: HOX genes are activated in a timed sequence that follows their numeric order, i.e., their 3′-to-5′ genomic order. In multicellular organisms, HOX genes maintain cell pluripotency, determine cell fate, and promote differentiation. Their expression is tightly controlled and coordinated within the family during development [1,2,3]. In contrast, their expression in mature, normal tissue is generally limited, and coordination within the family has not been observed or reported [4, 5].

A wide spectrum of regulatory mechanisms is involved in the control of HOX expression during development. Wnt signaling, retinoic acid (RA), and CDX genes have been shown to act upstream of HOX genes [6,7,8,9,10]. Histone modifications and changes in chromatin configuration are also known to be crucial for the collinear, sequential expression of HOX genes. This expression depends on the distribution of histone modifications associated with inactive chromatin (trimethylation at Lys27 of histone H3, H3K27me3) and active chromatin (trimethylation at Lys4 of histone H3, H3K4me3) [11, 12]. During activation, the relevant regions of the cluster move into a nuclear territory of active transcription. This relocation and reconfiguration of HOX chromatin expose cis-regulatory elements and lead to the sequential expression of HOX genes [13]. In addition, many noncoding transcripts embedded within the HOX clusters, including miRNAs and lncRNAs, play an important role in HOX gene regulation and coordination [14, 15].

Furthermore, HOX genes are known to play key roles in both solid and hematological malignancies, including cancers of the colon, breast, prostate, lung, brain, thyroid, ovary, bladder, kidney, skin, and blood [1, 16]. In breast cancer, individual HOX genes have been shown to contribute to cancer progression through anti-apoptotic pathways, invasion, epithelial-to-mesenchymal transition, tumor angiogenesis, and endocrine therapy resistance [17,18,19,20,21,22,23,24].

Given the strong coordination within the HOX gene family during embryogenesis, we hypothesized that such coordination might also occur in cancer. Previous studies on HOX genes and cancer have focused on the roles of individual HOX genes, and none has comprehensively analyzed the whole gene family [16]. In the present study, we conducted a comprehensive analysis of the HOX family in cancer. Our aims were to determine whether HOX gene coordination also occurs in cancer and to elucidate its molecular-biological background. We further hypothesized that cancers with embryogenesis-like characteristics might be more malignant, because undifferentiated (stem cell-like or pluripotent) cancer is often clinically unfavorable. Here, we present this comprehensive analysis of the HOX gene family in cancer, with the aim of clarifying the role of this gene family and the associated mechanisms.

Methods

Data in this study

The data used in this study and their purposes are listed in Table 1.

  (1) Breast cancer microarray data for meta-analysis

    Publicly available microarray and prognosis data were retrieved from the NCBI Gene Expression Omnibus (GEO) data repository. We collected breast cancer data from four array datasets (GSE11121, GSE7390, GSE3494, and GSE2990).

  (2) TCGA breast cancer and leukemia data

    Normalized transcriptome data for breast cancer (512 samples) and acute myeloid leukemia (148 samples) were downloaded via cBioPortal (https://www.cbioportal.org/), together with the associated prognostic information.

  (3) Sarcoma microarray data

    Public microarray and prognosis data for synovial sarcoma were retrieved from the GEO data repository (GSE20196; 18 samples).

  (4) Breast cancer microarray data from our hospital

    Clinical specimens of estrogen receptor (ER)-positive, HER2-negative human breast cancer (n = 30) were collected from patients with primary operable breast cancer (invasive ductal carcinoma). These patients underwent total or partial mastectomy between October 2017 and April 2019 at Keio University Hospital (Tokyo, Japan). ER and HER2 expression was evaluated by immunohistochemistry (IHC), and fluorescence in situ hybridization (FISH) was additionally performed for HER2 when necessary. The present study was approved by the Ethics Committee of Keio University School of Medicine (Approval Number, 20170406) and was performed in accordance with the Declaration of Helsinki (as revised in Fortaleza, Brazil, October 2013). All subjects provided informed consent. The microarray platform was the Affymetrix Human Genome U133 Plus 2.0 Array. Possible bias between formalin-fixed paraffin-embedded (FFPE) and fresh frozen samples was assessed by hierarchical clustering.

  (5) Normal breast tissue microarray data

    Public microarray data of histologically normal breast epithelium were retrieved from the GEO data repository (GSE20437; 42 normal samples).

    Table 1 Data and purpose in this study

Gene correlation analysis

Pearson correlation coefficients were calculated between the expression levels of each pair of HOX genes. For each subtype, genes showing strong correlation with HOX genes were also extracted. The relationships among HOX genes and the strongly correlated genes were visualized using the R package corrplot (version 0.84).
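
The correlation step can be sketched outside R as follows. This is a minimal Python illustration, not the authors' corrplot-based code. It assumes that expression values are stored in a dictionary keyed by gene symbol, with one value per sample in a shared sample order. The |r| ≥ 0.7 cut-off for a "strong" correlation is hypothetical, because the threshold is not stated here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for a constant expression profile
    return cov / (sx * sy)


def strong_pairs(expr, threshold=0.7):
    """List (gene_a, gene_b, r) for every gene pair with |r| >= threshold.

    expr: dict mapping gene symbol -> expression values (same sample order).
    threshold: hypothetical cut-off for a "strong" correlation.
    """
    pairs = []
    for a, b in combinations(sorted(expr), 2):
        r = pearson(expr[a], expr[b])
        if abs(r) >= threshold:  # NaN never passes this comparison
            pairs.append((a, b, r))
    return pairs
```

Using |r| keeps strong negative correlations as well as positive ones, matching the red and blue circles in the correlation plots.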

Prognosis analysis

  (1) Meta-analysis of public breast cancer transcriptome data

    Microarray data were normalized using the robust multi-array average (RMA) algorithm. Batch effects specific to each study were adjusted, taking into account the proportions of breast cancer subtypes determined by PAM50. The PAM50 subtype of each sample was determined using the R package genefu version 2.12.0 (http://www.pmgenomics.ca/bhklab/software/genefu). All statistical analyses in this study were performed using R ver3.25.

    Samples of the luminal B subtype were extracted and classified into two groups by unsupervised hierarchical clustering based on the HOX gene expression pattern. Recurrence-free survival (RFS) of the two groups was compared using the Kaplan–Meier method and evaluated by the log-rank test.

  (2) Gene enrichment analysis

    Using the aforementioned meta-data, Gene Ontology (GO) enrichment analysis of differentially expressed genes (DEGs; FDR < 0.05, fold change > 1.5) was performed using DAVID ver6.7. In the functional annotation chart obtained from DAVID, GO terms with p < 0.1 were considered statistically significant.

  (3) Comparison of HOX pattern classification with other risk scores

    Using the same meta-data, OncotypeDX and gene70 risk scores were calculated in silico using the R package genefu version 2.12.0.

  (4) Comparison of prognosis in non-epithelial malignancies

    Unsupervised hierarchical clustering based on the HOX gene expression pattern was performed on the publicly retrieved leukemia RNA-Seq data and sarcoma microarray data, dividing the samples in each dataset into two groups. Overall survival (OS) of the two groups was compared using the Kaplan–Meier method and evaluated by the log-rank test.
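
Items (1) and (4) both use an unsupervised hierarchical clustering into two groups, which is sketched below in Python. This illustration is not the authors' R code. The distance metric and linkage method are not reported here, so Euclidean distance between per-sample HOX expression profiles and average linkage are assumptions. The function name `average_linkage_clusters` is hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage_clusters(samples, k=2):
    """Agglomerative clustering with average linkage, stopped at k clusters.

    samples: dict mapping sample ID -> list of HOX expression values
             (same gene order for every sample).
    Returns k sorted lists of sample IDs (equivalent to cutting the
    dendrogram into k groups).
    """
    ids = sorted(samples)
    # Pre-compute all pairwise sample distances once.
    dist = {}
    for a, b in combinations(ids, 2):
        d = euclidean(samples[a], samples[b])
        dist[(a, b)] = dist[(b, a)] = d

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster sample pairs.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[sid] for sid in ids]
    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)
```

The two resulting groups would then serve as the strata for the Kaplan–Meier and log-rank comparison. This naive version is cubic in the number of samples, which is adequate for illustration but not for large cohorts.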

Results

HOX genes strongly correlate within the gene family in breast cancer, but not in normal breast tissue

To identify genes co-expressed with the HOX family, we calculated correlation coefficients for each HOX gene in breast cancer and in normal breast tissue. In all breast cancer subtypes, neighboring HOX genes strongly correlated with each other, especially those within the same HOX-A, HOX-B, and HOX-C clusters (Fig. 1a, Supplementary Fig. 1a–c). This indicates that HOX genes in the same cluster or in close proximity tend to be expressed together, suggesting that they act in concert. We also analyzed the correlations in luminal breast cancer data from patients at our hospital and observed similarly strong correlations among HOX genes, consistent with the public data (Supplementary Fig. 1d). No bias was noted between FFPE and fresh frozen samples. In contrast, correlations among HOX genes were much weaker in normal breast tissue (Fig. 1b).

Fig. 1

Correlation analysis of all HOX genes in a luminal B breast cancer and b normal mammary tissue. The size and color of each circle show the strength of the correlation between the two genes in the corresponding upper row and left-side column. Larger, darker circles represent stronger correlations between the two genes (positive correlations, blue circles; negative correlations, red circles). In breast cancer, neighboring HOX genes strongly correlate with each other, especially those in the same HOX-A, HOX-B, and HOX-C clusters, and appear as large blue circles. In normal mammary tissue, correlations among members of the HOX family are weaker and appear as smaller blue circles and more red circles. Correlation analysis of c HOX-B genes and d HOX-C genes in luminal B breast cancer. Thicker lines represent stronger correlations between the two genes (correlation values are shown as digits). CDX2 strongly correlates with HOX-B genes, and HOTAIR strongly correlates with HOX-C genes. CDX, CDX2; HOT, HOTAIR; ANK, ANKRD22; BMP, BMP4. HOX-C cluster gene correlation patterns in e HOTAIR-high-expressing breast cancer and f HOTAIR-low-expressing breast cancer. In HOTAIR-high-expressing breast cancer, neighboring HOX-C genes strongly correlate with each other and appear as large, dark blue circles. In HOTAIR-low-expressing breast cancer, weaker correlations are observed among the members of the HOX family, which appear as smaller, pale blue circles

In addition, we identified several non-HOX genes whose expression strongly correlated with HOX genes in each subtype, notably HOTAIR, CDX2, SKAP1, SKAP2, BMP4, and FGF5 (Fig. 1a, Supplementary Fig. 1a–d). HOTAIR, a long non-coding RNA transcribed from the strand opposite HOXC11, strongly correlated with the HOX-C cluster in the luminal and basal subtypes (Fig. 1a, d, Supplementary Fig. 1a, b). CDX2, one of the major upstream regulators of HOX genes, strongly correlated with the HOX-B cluster in all breast cancer subtypes (Fig. 1a, c, Supplementary Fig. 1a–d). SKAP1, which neighbors HOXB1 on its 3′ side, strongly correlated with the HOX-B cluster (Supplementary Fig. 1a, d). SKAP2, which neighbors HOXA1 on its 3′ side, strongly correlated with the HOX-A cluster (Supplementary Fig. 1b–d).

Breast cancer with high expression of HOTAIR presents stronger gene correlations within the HOX-C cluster

To further investigate the role of HOTAIR in breast cancer, we divided the TCGA breast cancer samples into two groups according to HOTAIR expression level and compared the correlation patterns of the HOX-C cluster between them. The HOTAIR-high-expressing group showed stronger correlations within the HOX-C cluster than the HOTAIR-low-expressing group, especially among the posterior genes proximal to HOTAIR (Fig. 1e, f).

HOX gene expression pattern predicts the prognosis of luminal B breast cancer

We collected microarray and prognosis data for 702 breast cancer samples from four datasets. These meta-data samples were classified into two clusters by unsupervised hierarchical clustering based on the HOX gene expression pattern. In luminal B patients, RFS differed significantly between the two clusters (p = 0.016) (Fig. 2a, b). Gene Ontology (GO) analysis of the differentially expressed genes (DEGs) between the two clusters showed that the Wnt signaling pathway (GO:0016055) was activated in the poor prognostic cluster.
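
The DEG selection that fed the GO analysis (FDR < 0.05, fold change > 1.5, as stated in Methods) can be sketched as follows. The per-gene statistical test is not specified, so this sketch starts from precomputed p-values. Two further choices are assumptions: FDR control by the Benjamini–Hochberg procedure, and a linear fold change that counts in either direction (> 1.5 or < 1/1.5). The enrichment step itself was run in DAVID and is not reproduced here.

```python
def benjamini_hochberg(pvals):
    """Benjamini–Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values are monotone in the p-value ranks.
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, pvals, fold_changes, fdr=0.05, min_fc=1.5):
    """Genes with BH-adjusted p < fdr and fold change beyond min_fc.

    fold_changes: linear ratios (group 2 / group 1); values below
    1 / min_fc count as down-regulated DEGs (an assumption).
    """
    qvals = benjamini_hochberg(pvals)
    return [
        g
        for g, q, fc in zip(genes, qvals, fold_changes)
        if q < fdr and (fc > min_fc or fc < 1 / min_fc)
    ]
```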

Fig. 2

Classification of microarray breast cancer meta-data by the HOX gene expression pattern. a Luminal B samples were classified into two clusters by unsupervised hierarchical clustering based on the HOX gene expression pattern. b In luminal B patients, recurrence-free survival (RFS) differed significantly between the two clusters (p = 0.016). Correlation analysis of all HOX genes in the luminal B breast cancer clusters: c luminal B cluster 1 (favorable prognosis) and d luminal B cluster 2 (poor prognosis). The size and color of each circle show the strength of the correlation between the two genes in the corresponding upper row and left-side column. Larger, darker circles represent stronger correlations between the two genes (positive correlations, blue circles; negative correlations, red circles). In cluster 1 (favorable prognosis) luminal B breast cancer (c), correlations among the members of the HOX family are weaker, with few large blue circles and more red circles. This pattern is similar to that of normal mammary tissue (Fig. 1b). In cluster 2 (poor prognosis) (d), neighboring HOX genes strongly correlate with each other, especially those in the same HOX-A, HOX-B, and HOX-C clusters, and appear as large blue circles

We calculated OncotypeDX and gene70 scores in silico for the same set of luminal B patients and compared their sensitivity and specificity for predicting recurrence with those of the HOX pattern classification. The HOX pattern classification achieved well-balanced sensitivity and specificity that were equivalent to or better than those of the pre-existing risk scores (Supplementary Table 1).
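
Sensitivity and specificity for recurrence prediction can be computed as in the following sketch. It assumes that each patient has a binary high-risk call (from the HOX clustering or from a risk score) and a binary recurrence outcome. How censored follow-up was handled is not stated here, so the sketch simply ignores censoring. The function name is hypothetical.

```python
def sensitivity_specificity(predicted_high_risk, recurred):
    """Sensitivity and specificity of binary high-risk calls vs. recurrence.

    predicted_high_risk, recurred: sequences of booleans, one per patient.
    Returns (sensitivity, specificity); NaN if a denominator is zero.
    """
    pairs = list(zip(predicted_high_risk, recurred))
    tp = sum(1 for p, r in pairs if p and r)          # high risk, recurred
    fn = sum(1 for p, r in pairs if not p and r)      # low risk, recurred
    tn = sum(1 for p, r in pairs if not p and not r)  # low risk, no recurrence
    fp = sum(1 for p, r in pairs if p and not r)      # high risk, no recurrence
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Toy example: 2 of 3 recurrences flagged, 3 of 4 non-recurrences cleared.
sens, spec = sensitivity_specificity(
    [True, True, False, False, True, False, False],
    [True, False, False, True, True, False, False],
)  # sens = 2/3, spec = 0.75
```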

Poor prognostic luminal B breast cancer presents stronger HOX gene correlation

We compared HOX gene correlations between the two luminal B clusters defined by HOX gene expression. Cluster 1 (favorable prognosis) showed weak correlations among HOX genes, similar to the pattern in normal tissue, whereas cluster 2 (poor prognosis) showed strong correlations (Supplementary Fig. 1e, f).

HOX gene expression pattern predicts the prognosis of non-epithelial malignancies

We next examined whether prognosis prediction by the HOX gene expression pattern is applicable to other malignancies. Because breast cancer is an epithelial malignancy, we chose two non-epithelial malignancies, leukemia and sarcoma, for further verification. We classified the samples in each dataset into two groups by unsupervised hierarchical clustering based on the HOX gene expression pattern. OS differed significantly between the two groups in both leukemia (p = 0.00016; Fig. 3a) and sarcoma (p = 0.018; Fig. 3b).
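
The Kaplan–Meier estimate and the two-group log-rank test used for these survival comparisons can be sketched in plain Python as follows. This is a textbook implementation for illustration, not the authors' R code. The p-value uses the chi-square distribution with one degree of freedom, whose upper tail equals erfc(sqrt(x/2)).

```python
import math


def kaplan_meier(times, events):
    """Kaplan–Meier survival curve.

    times: follow-up times; events: 1 if the event (recurrence or death)
    was observed, 0 if censored. Returns [(time, survival)] at event times.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects (events and censorings) sharing this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        n = sum(1 for s, _, _ in pooled if s >= t)
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e)
        # Observed minus expected events in group A at this time.
        observed_minus_expected += d_a - d * n_a / n
        # Hypergeometric variance of d_a.
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

In practice a maintained survival library would be preferable. This sketch only makes the computation behind the reported p-values explicit.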

Fig. 3

Classification of non-epithelial malignancies by the HOX gene expression pattern: a leukemia, b sarcoma. Samples in the leukemia and sarcoma datasets were divided into two groups by unsupervised hierarchical clustering based on the HOX gene expression pattern. Overall survival (OS) differed significantly between the two groups in both malignancies (leukemia, p = 0.00016; sarcoma, p = 0.018)

Discussion

One of the major findings of this correlation analysis was that neighboring HOX genes strongly correlate with each other and are expressed together in breast cancer. We obtained similar results for both TCGA and in-house hormone receptor-positive breast cancer data, which supports the robustness of this association irrespective of dataset or data type. Interestingly, these correlations were not observed in normal breast tissue. The finding also extends to some non-HOX genes located near the HOX clusters. SKAP1 strongly correlates with the HOXB cluster and neighbors HOXB1 on its 3′ side, and SKAP2 strongly correlates with the HOXA cluster and neighbors HOXA1 on its 3′ side. During embryogenesis, SKAP1 is known to be an enhancer of HOXB1 expression [25], and SKAP2 is known to play an important role in the initiation of collinear HOXA gene expression [26]. Likewise, the lncRNA HOTAIR, which strongly correlates with HOXC10 and HOXC11, is transcribed from the strand opposite HOXC11. Our results suggest that high HOTAIR expression may lead to stronger correlations among HOX-C cluster genes, especially the posterior genes (HOXC10–13) proximal to HOTAIR. The strong interactions among proximal HOX genes in cancer suggest that these genes orchestrate and control each other's expression. This resembles embryogenesis, during which HOX genes tightly control each other so that they are expressed in a collinear pattern. Moreover, these interactions might be specific to cancer tissue, because they were not observed in normal breast tissue data. Expression of HOX genes in mature, normal tissue is known to be limited, and coordination among them has not been reported, indicating a lack of HOX gene correlation. This is consistent with our results in normal breast epithelium.

Another finding of the correlation analysis was that CDX2 strongly correlated with HOX genes in breast cancer. Few studies have discussed the relationship between CDX and HOX in cancer. During development, however, CDX is widely known to be one of the major upstream regulators of HOX genes, together with Wnt and retinoic acid (RA) [7, 27]. Wnt signaling initially activates the anterior (3′) part of the HOX cluster. CDX then maintains the collinear induction of the HOX genes located centrally and further toward the 5′ end, until the posterior (5′) HOX13 genes arrest the activation of the central HOX genes [8]. Moreover, some embryologists suspect that CDX and HOX genes are involved in a positive feedback loop on Wnt signaling that sustains the signal during body axis elongation [9]. Our results suggest that CDX influences and maintains HOX gene expression not only during development but also in cancer.

Collectively, the results of the gene correlation analysis were concordant with published developmental studies. Moreover, whereas those developmental data were derived from animal or cell line models, our study demonstrated similar patterns in clinical samples. We therefore suspect that the principles and phenomena pertinent to HOX genes during development also apply to cancer.

Classification of the breast cancer microarray meta-data by HOX gene expression pattern separated luminal B patients into groups with significantly different prognoses. This classification achieved sensitivity and specificity comparable to those of pre-existing risk scores. GO analysis revealed that the Wnt signaling pathway was activated in the poor prognostic cluster, and this cluster also showed stronger correlations among members of the HOX family. The Wnt signaling pathway is one of the major upstream signals of HOX genes and controls their activation during embryogenesis [8, 10]. The HOX gene expression pattern and correlations in breast cancer may therefore reflect activation of the Wnt signaling pathway. On this basis, we hypothesize that the HOX gene expression pattern predicts prognosis by detecting activation of the Wnt signaling pathway in breast cancer. Several HOX genes have been reported to be influenced by the Wnt signaling pathway in cancer [28, 29]. Nevertheless, further studies are necessary to elucidate the full relationship between Wnt signaling and HOX gene expression in cancer.

Both normal tissue and the favorable prognostic cluster defined by HOX genes showed little correlation among members of the HOX gene family. Strong interaction among HOX genes therefore appears to be specific to malignancies, especially those with poor prognosis. Considering that HOX genes play important roles in various solid cancers, we suspect that similar phenomena are also present in those cancers; this should be verified in future studies. Similar results obtained for leukemia and sarcoma (Fig. 3) suggest that the HOX gene expression pattern also predicts prognosis in other malignancies, including non-epithelial ones.

As mentioned in the Introduction, a variety of regulatory mechanisms are involved in HOX expression, and it is difficult to single out one to explain the upregulated expression and correlation of HOX genes in cancer. In addition to the Wnt signaling pathway discussed above, alterations in histone methylation patterns accompanied by chromatin remodeling are crucial during development. Progressive activation of HOX genes corresponds to the removal of H3K27me3 and the appearance of H3K4me3. Later in development, the genes are silenced by methylation at H3K27 and H3K9 [11, 12]. Moreover, when a cluster is transcriptionally inactive, its HOX genes associate into a single structure delimited from flanking regions. When transcription starts, HOX clusters switch to a bimodal organization in which newly activated genes progressively move into a transcriptionally active compartment and are expressed in a collinear manner [12]. Given the strong correlation among neighboring HOX genes in our results, we suspect that this mechanism is also present in cancer. In cancer, however, it may be dysregulated, so that genes fail to be inactivated once they have been activated. Several studies have reported that loss of H3K27me3 at HOX gene loci leads to increased expression of HOX genes and promotes cancer progression [30,31,32]. Future research is warranted to explore the mechanisms underlying the upregulated expression and correlation of HOX genes in cancer.

Advances in cancer biology have primarily attributed the origin of cancer to genetic mechanisms involving mutations. However, accumulating evidence indicates close similarities between embryonic development and cancer [33,34,35]. By demonstrating similarities in the roles of HOX genes in development and cancer, our study offers a new perspective supporting this idea.

This study has several limitations. First, no comprehensive gene expression data were available for embryonic tissue. We therefore could not perform gene correlation analysis in embryos or directly compare such results with those from cancer and normal tissue. Second, gene correlation analysis was performed using either RNA sequencing or microarray data, so differences between platforms could introduce artifactual differences in expression. However, we obtained similar results across platforms, and the results on each platform were concordant with previous literature.

In summary, to the best of our knowledge, this is the first study to focus on the entire HOX family and to demonstrate that its roles in cancer and in development are similar. We found that HOX genes strongly interact with each other in breast cancer, as they do during development, in contrast to normal breast tissue. We also found that the HOX gene expression pattern predicts breast cancer prognosis. It may do so by reflecting activation of the Wnt signaling pathway, one of the major upstream signals of HOX genes in development. Given the similar findings obtained for leukemia and sarcoma, these results might also apply to other cancers and malignancies.