1 Introduction

Metastasis is a complex series of events that involves several gene products, including those important for the invasion and detachment of neoplastic cells from the primary tumor, penetration into blood and lymphatics, arrest at distant sites, adhesion to endothelial cells, extravasation, induction of angiogenesis, evasion of host antitumor responses, and growth at metastatic sites [1, 2]. The regulated expression of several tumor cell genes is thought to be important in this process [3].

During an attempt to identify candidate metastasis-associated genes in rat mammary adenocarcinoma systems in 1994, we first identified mta1 (metastasis-associated gene 1; rat homologue) complementary DNA (cDNA) as a completely novel gene [4]. Subsequently, we cloned the human homologue of mta1, MTA1 [5], and investigated the expression of human MTA1 messenger RNA (mRNA) in surgically resected human cancer tissues. We found significant positive correlations between the expression levels of MTA1 mRNA and several clinicopathological factors related to malignant potential [6, 7]. In this brief review, as the discoverers of the mta1/MTA1 genes, we discuss the research that led to the first identification and characterization of these genes and their encoded proteins (Mta1/MTA1).

2 Identification of candidate metastasis-associated genes in the rat mammary adenocarcinoma metastatic system

In 1992, our laboratory embarked on a project to identify candidate metastasis-associated genes at the University of Texas M. D. Anderson Cancer Center in Houston, Texas. At that time, several techniques had been used to search for genes involved in the metastatic cascade, including somatic cell fusion, karyotypic analysis, transfection of isolated genes into recipient cells, and differential or subtraction cDNA hybridization. Using these techniques, several genes had been identified as differentially expressed and possibly involved in the metastasis of mammary and breast cancers, among them mts1, nm23, and stromelysin-3.

We used the 13762NF rat mammary adenocarcinoma cell lines of differing metastatic potentials in experimental and spontaneous metastasis assays [8] to identify possible genes that are involved in the metastatic process. This model system has been well characterized and was known to be similar to human breast adenocarcinoma in many respects, including cytoskeletal organization, cell surface components, and pathologic mode of spread in vivo [8, 9]. In this system, gene expression was compared between the nonmetastatic cell line MTC.4 and the highly metastatic cell line MTLn3. MTC.4 was a subclone of the MTC cell line, which was derived from the primary tumor growing in the mammary fat pad; it possessed no ability to metastasize from a primary implant site in the mammary fat pad or to colonize tissues after intravenous injection. Line MTLn3 was derived from a spontaneous lung metastasis and was found to be highly spontaneously metastatic from the mammary fat pads of syngeneic rats and to form large numbers of lung colonies upon intravenous injection [8]. These cell lines were found to be phenotypically stable during prolonged passage in vivo or in vitro [8]. Such stability is important for differential hybridization analysis and identification of differentially expressed genes, and it is also essential for gene transfer experiments and metastasis assays.

Using the differential gene expression approach, multiple cDNAs were eventually isolated that were differentially expressed in either poorly or highly metastatic 13762NF lines. After tertiary screening, 10 separate cDNAs were finally identified [10]. Partial sequencing of those cDNAs, followed by a homology search of the GenBank/EMBL data banks, revealed that eight of them were known genes, including annexin I, elongation factor-1α (EF-1α), mitogen-activated protein kinase (MAPK), type IV collagenase (72 kDa), and cathepsin L. Most of these genes had previously been reported to be implicated in the metastasis of various cancer cells. Interestingly, EF-1α had been independently identified in a similar differential hybridization approach to search for metastasis-associated genes using a fos-transfected metastatic cell line (fos-SR-3Y1-202) and a non-transfected, nonmetastatic version of the line [11]. Thus, we judged that this screen had worked well for the purpose of identifying candidate metastasis-associated genes in the 13762NF rat mammary adenocarcinoma system.

Two of the 10 cDNAs identified above had no homologous nucleotide sequences in the databank, and one of these cDNAs was designated clone 10.14. This cDNA clone contained about 2.2 kilobase pairs (kbp). Because its corresponding mRNA was ∼3.0 kb in RNA blotting analysis (described below), this was not a full-length cDNA clone. Complete sequencing of the 10.14 cDNA revealed a single open reading frame of 583 amino acids extending to the 5ʹ-end of the sequence, and the corresponding amino acid sequence was also completely novel. We therefore decided to isolate a full-length cDNA of clone 10.14 and to analyze this gene sequence further. At this point, we named this candidate gene mta1 (metastasis-associated gene 1).

3 Cloning and sequencing of a full-length mta1 cDNA (10.14)

A full-length mta1 cDNA (designated 10.14) was isolated by screening a cDNA library of the MTC.4 subline, followed by sequencing [4]. The sequence was 2756 bp long and contained a single open reading frame encoding a protein of 703 amino acid residues, which was named Mta1. A computer-assisted homology search of the nucleotide and amino acid sequences was performed at the National Center for Biotechnology Information (NCBI), and no significant homology was found, indicating that mta1 was a completely novel gene [4].

To confirm the differential expression of the mta1 mRNA between the metastatic MTLn3 and nonmetastatic MTC.4 cell lines, we examined its steady-state mRNA levels by RNA blot analysis. This showed that both lines expressed mta1 mRNA of ∼3.0 kb in size and that the expression level was estimated to be fourfold higher in the highly metastatic MTLn3 cell line. Because Southern blotting showed that there were counterparts of the rat mta1 gene in the human and mouse genomes, we further examined the expression of the human homologue (MTA1) of the mta1 gene in two well-characterized sets of human breast adenocarcinoma cell lines. These human cell lines also expressed MTA1 mRNA of approximately the same length as rat mta1 mRNA.

For comparison, we used the human breast cancer cell line MCF-7, derived from the pleural effusion of a breast cancer patient, as a relatively noninvasive and nonmetastatic cell line in nude mice. Conversely, its malignant derivatives, sublines MCF7/LCC1 and MCF7/LCC2, are invasive and metastatic in vivo [12, 13]. The expression ratio of the MTA1 mRNA in these cells was MCF-7:MCF7/LCC1:MCF7/LCC2 = 1:2:4. Using another cell line set, the expression of the MTA1 mRNA in MDA-MB-231 cells (metastatic in nude mice) was calculated to be ∼4 times as high as in MDA-MB-468 cells, which are nonmetastatic in nude mice [14]. Thus, the mRNA expression levels of the human homologue of the rat mta1 gene also correlated with the metastatic potential found in two human breast cancer metastatic systems. These results suggested that the mta1 and MTA1 genes were good candidates for metastasis-associated genes.

The cDNA fragment containing the entire open reading frame of the rat mta1 gene was inserted into a prokaryotic expression vector, and the protein derived from it was expressed as a fusion protein with glutathione S-transferase (GST) in Escherichia coli. The fusion protein had the expected molecular mass of ∼108 kDa, including the full-length Mta1 protein of ∼80 kDa. This indicated that an open reading frame does exist in the mta1 cDNA clone. Using antibodies raised against the GST-Mta1 fusion protein or a synthetic oligopeptide (containing amino acid residues 329–343), we performed Western blot analysis with an MTLn3 cell lysate. Both antibodies recognized bands of ∼80 kDa in size [4].
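
The sizes quoted above can be checked with simple arithmetic. The following is a minimal sketch that assumes the common rule-of-thumb average of about 110 Da per amino acid residue; that constant and the function name are illustrative assumptions and are not taken from the original study [4].

```python
# Rule-of-thumb average mass of one amino acid residue, in daltons.
# This is an assumption for illustration, not a value from the study.
AVG_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues: int) -> float:
    """Estimate a protein's mass in kDa from its number of residues."""
    if n_residues < 0:
        raise ValueError("residue count must be non-negative")
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


# Rat Mta1 has 703 residues according to the text.
rat_mta1_estimate_kda = estimate_mass_kda(703)

# Masses reported in the text (kDa): GST-Mta1 fusion and Mta1 alone.
fusion_kda = 108.0
mta1_kda = 80.0
# Whatever is not Mta1 is attributed to the GST portion of the fusion.
gst_portion_kda = fusion_kda - mta1_kda

print(f"Estimated rat Mta1 mass: {rat_mta1_estimate_kda:.1f} kDa")
print(f"Implied GST portion of the fusion: {gst_portion_kda:.0f} kDa")
```

Under this assumption the 703-residue protein is estimated at roughly 77 kDa, which is in line with the ∼80 kDa band observed by Western blotting.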

The expression levels of the Mta1 protein or its human homologue, the MTA1 protein, were then examined in the MTLn3, MTC.4, MCF-7, MCF7/LCC1, MCF7/LCC2, MDA-MB-468, and MDA-MB-231 cell lines, all of which had been used for the Northern blot analysis described above. In these experiments, the Mta1/MTA1 protein expression levels correlated well with the metastatic potentials of the various cell lines, and the expression ratios of the protein were quite similar to the expression ratios obtained from the Northern blot analyses [4].

Using homology to the rat mta1 cDNA, we cloned the human MTA1 cDNA from a human melanoma A2058 cDNA library [5]. The human MTA1 gene encoded a putative protein of 715 amino acid residues with a predicted molecular weight of ∼82 kDa. The amino acid sequences of the rat and human proteins were compared and found to be 96 % identical and 98 % similar [5]. To assess the extent of evolutionary conservation of the MTA1 gene, we analyzed genomic DNA of several species by Southern blot analysis and showed that the MTA1 gene was conserved in human, rat, mouse, dog, cow, rabbit, and chicken [5].
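
To make the notion of percent identity concrete, the sketch below counts identical positions in two pre-aligned sequences of equal length. It is only an illustration of the definition: the 96 % figure above comes from the full-length alignment reported in [5], whereas this toy example uses the two ten-residue carboxyl-terminal proline-rich sequences quoted in Section 4. The function name is hypothetical, gaps are not handled, and percent similarity is not computed because it would additionally require a residue-substitution scoring scheme.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions with identical residues in two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions at which both sequences carry the same residue.
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Carboxyl-terminal proline-rich sequences quoted in Section 4.
rat_c_term = "LPLRPPPPAP"
human_c_term = "LPPRPPPPAP"

print(percent_identity(rat_c_term, human_c_term))  # 90.0
```

The two short sequences differ at a single position (Leu versus Pro at the third residue), so 9 of 10 positions are identical.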

4 Structural analysis of the Mta1/MTA1 protein and clues to its function(s)

Hydropathy plot (Kyte-Doolittle) values for the predicted Mta1 protein did not show any apparent membrane-spanning or membrane-associated regions, nor was there an NH2-terminal signal sequence. The protein was quite hydrophilic, suggesting that it is neither a cell surface protein nor a secreted protein requiring a signal sequence [15]. Sequence analyses of the Mta1/MTA1 proteins revealed several important protein sequence motifs common to both [4, 5, 15–20]. These are illustrated in Fig. 1 and include the following:

(1) A bromo-adjacent homology (BAH) domain at the N-terminus of the Mta1/MTA1 protein. This domain has been identified in a variety of proteins involved in transcriptional co-regulation and/or DNA binding and is thought to be involved in protein-protein interactions.
(2) A SANT domain located C-terminal to the BAH domain. The SANT domain is similar to the DNA-binding domain of myb-related proteins and was identified in SWI3 (a yeast component of the SWI/SNF complex), ADA2 (a component of the histone acetyltransferase complex), N-CoR (a nuclear hormone co-repressor), and the TFIIIB B subunit (a basal pol III transcription factor in yeast). The SANT domain has also been referred to as the WFY domain, since it has many aromatic amino acid residues at fixed positions.
(3) An ELM (egl-27 and MTA1 homology) domain located between the BAH and SANT domains. Its function is unknown, but the ELM-SANT domain of MTA1 has now been shown to bind HDAC1 [21].
(4) A zinc-finger motif (Cys-X2-Cys-X17-Cys-X2-Cys) of the type found in transcription factors that bind the GATA sequence and are involved in hematopoiesis and heart development.
(5) A leucine-zipper motif (Leu-X6-Leu-X6-Ile-X6-Leu-X7-Leu) beginning at residue 251 of MTA1.
(6) Proline-rich sequences at the carboxyl terminus of each protein (rat: LPLRPPPPAP; human: LPPRPPPPAP). These sequences completely match the consensus sequence for the src-homology 3 (SH3) domain-binding motif. SH3 domains are considered important for protein-protein interactions in signal transduction pathways and are known to associate with cytoskeletal components. Another possible SH3-binding motif (PRPPKPDP) was also observed.
(7) Three putative nuclear localization signal sequences at the carboxyl terminus of the Mta1/MTA1 protein.
(8) A KRAARR motif near the C-terminal end of the MTA1 protein, which has recently been shown to bind the Rbp48 protein in the nucleosome remodeling and histone deacetylation (NuRD) complex [22].

The first four domains or motifs are well conserved in all three primary members of the MTA family: MTA1, MTA2, and MTA3 (Fig. 2). Together, these findings on putative protein domains and motifs suggested that the Mta1/MTA1 protein might be involved in transcriptional control.
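
Spacing motifs such as the zinc-finger and leucine-zipper patterns described above can be located in a protein sequence with regular expressions. The following is a minimal sketch, not the method used in the original studies: the two patterns are transcribed directly from the spacings given in the text (X is any residue), the function and variable names are hypothetical, and the test sequence is synthetic rather than the real Mta1/MTA1 sequence.

```python
import re

# Spacing patterns transcribed from the text; "." stands for any residue X.
MOTIF_PATTERNS = {
    "zinc_finger": r"C.{2}C.{17}C.{2}C",           # Cys-X2-Cys-X17-Cys-X2-Cys
    "leucine_zipper": r"L.{6}L.{6}I.{6}L.{7}L",    # Leu-X6-Leu-X6-Ile-X6-Leu-X7-Leu
}


def find_motifs(sequence: str) -> dict:
    """Return 1-based start positions of each motif pattern in the sequence."""
    hits = {}
    for name, pattern in MOTIF_PATTERNS.items():
        # A zero-width lookahead reports overlapping occurrences as well.
        lookahead = re.compile(f"(?={pattern})")
        hits[name] = [m.start() + 1 for m in lookahead.finditer(sequence)]
    return hits


# Synthetic sequence (NOT the real MTA1 sequence) with one zinc-finger
# spacing pattern starting at residue 3 and no leucine-zipper pattern.
synthetic = "MA" + "C" + "AA" + "C" + "A" * 17 + "C" + "AA" + "C" + "GG"

print(find_motifs(synthetic))
```

A match on spacing alone indicates only a candidate motif; whether it forms a functional domain has to be established experimentally.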

Fig. 1

Schematic representation of the structural domains or motifs of Mta1/MTA1 protein

Fig. 2

Sequence alignment of MTA1, MTA2, and MTA3 from various eukaryotes, including human, mouse, rat, frog, and cow. Known MTA1 domains (BAH, ELM, and SANT) and the zinc-finger motif are indicated. The KRAARR motif is boxed in gray. This figure has been reproduced from Alqarni et al. [22] with permission from Dr. Mackay

5 Conclusions

When we first reported the sequences of the rat mta1 cDNA and Mta1 protein in 1994, there were no similar or homologous nucleotide or protein sequences in the databases, suggesting that mta1 was a completely novel gene [4]. By 1999, similar genes, or genes encoding amino acid sequences homologous to the Mta1/MTA1 protein, had appeared in the databases. However, the functions of MTA1 were still unknown at that time.

The first clues to the molecular and biochemical functions of MTA1 were obtained by four different groups from 1998 to 1999 [23–27]. In these studies, two disparate chromatin-modifying activities, ATP-dependent nucleosome remodeling and histone deacetylation, were discovered to be functionally and physically linked in the same protein complex. The original complex was named the NuRD (Nucleosome Remodeling and histone Deacetylation) complex, which has been shown to have transcription-repressing activity [24–27]. The biologic function of MTA1, however, was not determined until an epoch-making study by the Kumar laboratory provided evidence that directly demonstrated the relationship between the MTA1 protein in the NuRD complex and the invasion and metastasis of cancer cells [28].

After identification of the human MTA1 cDNA, we first showed the clinicopathological significance of its overexpression in human cancer specimens, including esophageal, gastric, and colorectal cancers [6, 7]. Many clinicopathological correlative studies followed ours, and their conclusions have been reinforced by additional experiments showing the biological relevance of MTA1 protein overexpression to carcinogenesis and cancer progression. These topics are covered in the other chapters of this special issue.