Introduction

GATA-binding protein 3 (GATA-3), a 48 kDa protein, is encoded by a gene located on chromosome 10p15. It is a transcription factor of the GATA family, containing 2 transactivating domains at the N-terminus and 2 zinc-finger DNA-binding domains at the C-terminus [1]. Like other members of the GATA family, it binds to G-A-T-A-nucleotide sequences in target gene promoters and regulates gene expression [2, 3]. It is not only an important transcriptional factor for T-cell development [4, 5], but it is also involved in cellular proliferation, development, and differentiation in many non-hematopoietic cells, including luminal epithelial cells of the breast, urothelial epithelium, parathyroid gland, adipose tissue, sympathetic nervous system, lens fiber cells of the eye, and hair follicles of the skin [6,7,8,9,10,11,12,13,14,15,16]. Although GATA-3 functions in many tissues and cell types, it has been shown to be a sensitive and specific marker only for urothelial and breast carcinomas [17].

Although a number of studies have suggested that GATA-3 is an effective marker for identifying metastatic breast cancers [18,19,20], the current understanding of GATA-3 as a breast cancer marker is still limited. The expression rates quoted in various studies, mostly with small sample sizes [21, 22], ranged widely from 32 to 100%. Little information is available regarding its value compared to other common breast markers, namely, gross cystic disease fluid protein-15 (GCDFP-15) and mammaglobin (MGB) [22, 23]. Several studies reported GATA-3 expression rates among different molecular subtypes, but only a few had relatively large cohorts; thus, the information for the uncommon subtypes, HER2-overexpressing (HER2-OE) and triple-negative breast cancer (TNBC), is still limited. Breast-specific markers are helpful in the analysis of lymph node metastases and in the differential diagnosis of metastatic cancers. The expression of GATA-3 in metastatic invasive breast carcinoma (IBC) has been examined [24,25,26,27]. However, it remains to be investigated whether GATA-3 expression is retained in metastatic cancers when compared to the primary tumor.

In the present study, GATA-3 expression was assessed in a large cohort of IBC with paired nodal metastases. The results were compared to GCDFP-15 and MGB. The value of GATA-3 in the metastatic setting was evaluated by examining its differential expression rate in lung and breast carcinomas. In addition, its expression in matched primary and metastatic tumors (both distant and nodal metastases) was compared. The biomarkers were assessed according to the REMARK criteria [28].

Materials and methods

Tissue samples

Consecutive archival paraffin-embedded tissue samples from patients with IBC and paired nodal metastases, if positive, were retrieved. Two additional archival cohorts were also used: one of lung carcinomas and one of primary IBC with paired distant metastases.

For IBC, information on patient age and sex, tumor size, and lymph node status was obtained from the pathology reports. The original H&E slides for each case were retrieved and reviewed by two pathologists independently. Histologic diagnosis was made according to the WHO classification of tumours of the breast (4th ed) [29]. The tumors were graded (modified Bloom and Richardson) [30] and TNM staged (7th ed AJCC) [31]. Any discrepancies were resolved by discussion at a multiheaded microscope to reach a consensus.

For lung carcinoma, patient information, including age, sex, histologic diagnosis, and TNM stage, was retrieved from the pathology reports.

This study was approved by the Joint Chinese University of Hong Kong-New Territories East Cluster clinical research ethics committee.

Tissue microarray (TMA) construction

Cellular areas of the tumors on H&E-stained slides were chosen for both breast and lung carcinomas, and the corresponding areas were taken from the paraffin blocks for TMA construction. The TMA was assembled with a tissue arrayer (Beecher Instruments, Silver Springs, MD). For IBC, two 0.6-mm tissue cores were obtained from each case. For lung carcinoma, one 1.0-mm tissue core was obtained from each case. Serial 4-μm sections were cut and transferred to Superfrost Plus glass slides (Menzel-Glaser, Germany). One section from each tissue array block was stained with H&E and reviewed to confirm that representative tumor was included in the TMA blocks.

For the distant metastatic breast carcinoma foci with available primary tumor, whole paraffin section slides were used.

Immunohistochemical (IHC) staining and scoring

The paraffin slides were stained using Benchmark Autostainer (Roche Benchmark Ventana XT) with the Ventana Discovery System. The IBC slides were stained for estrogen receptor (ER), progesterone receptor (PR), HER2, proliferation marker (Ki67), cytokeratin 5/6 (CK5/6), epidermal growth factor receptor (EGFR), GATA-3, GCDFP-15, and MGB, and the lung carcinoma slides were stained for GATA-3. All the slides were counterstained with hematoxylin. Details of the antibodies used are shown in Supplementary Table S1.

The TMA was scored for the percentage of the tumor cells showing moderate to high intensity staining by two of the authors blinded to the clinical information. For ER, PR, Ki67, and GATA-3, the staining was nuclear; for HER2, the staining was membranous; and for GCDFP-15 and MGB, the staining was cytoplasmic. ER and PR were considered positive when ≥ 1% of tumor cells showed staining [32]. HER2 was scored as 0, 1+, 2+, and 3+ [33], and 3+ staining was considered positive. For Ki67, high expression was defined arbitrarily as staining of 15% or more of tumor cells. For GATA-3, GCDFP-15, and MGB, a 5% cutoff was used to define positivity.

The IBC were classified into molecular subtypes using the IHC phenotype as a surrogate, as follows [34]:

Luminal A: ER+, PR ≥ 20%, HER2−, CK5/6±, and Ki67 < 20%

Luminal B: ER+, CK5/6±, and HER2+ or Ki67 ≥ 20% or PR < 20%

HER2-OE: ER−, PR−, HER2+

Triple negative (TNBC): ER−, PR−, and HER2−
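The surrogate definitions above can be written as a small decision rule. The following is a minimal sketch only, not the authors' scoring procedure: the function name and argument names are hypothetical, ER and PR positivity use the ≥ 1% cutoff and HER2 positivity the 3+ score stated in the scoring section, and the 20% Ki67 and PR thresholds are taken from the subtype list. CK5/6 is ignored because every listed definition allows either result. An ER−/PR+ phenotype is not covered by the listed definitions, so the sketch returns None for it.

```python
from typing import Optional


def classify_subtype(er_pct: float, pr_pct: float,
                     her2_score: int, ki67_pct: float) -> Optional[str]:
    """Assign an IHC surrogate subtype (hypothetical helper, for illustration).

    er_pct, pr_pct, ki67_pct: percentage of stained tumor cells (0-100).
    her2_score: IHC score 0, 1, 2, or 3 (only 3+ is treated as positive).
    """
    er_pos = er_pct >= 1        # ER positive when >= 1% of cells stain
    pr_pos = pr_pct >= 1        # PR positive when >= 1% of cells stain
    her2_pos = her2_score == 3  # only 3+ counts as HER2 positive

    if er_pos:
        # Luminal A needs all three favorable features; any other ER+ case
        # meets at least one luminal B criterion (HER2+, Ki67 >= 20%, PR < 20%).
        if pr_pct >= 20 and not her2_pos and ki67_pct < 20:
            return "Luminal A"
        return "Luminal B"
    if not pr_pos:
        return "HER2-OE" if her2_pos else "TNBC"
    return None  # ER-/PR+ is not covered by the listed definitions


print(classify_subtype(er_pct=90, pr_pct=50, her2_score=0, ki67_pct=10))
print(classify_subtype(er_pct=0, pr_pct=0, her2_score=1, ki67_pct=60))
```

Because luminal B is defined as any ER+ case failing at least one luminal A condition, the two luminal branches are exhaustive for ER+ tumors.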

Statistical analysis

Statistical analysis was performed using SPSS V23.0 for Windows. Chi-square analysis was performed to evaluate the correlation between GATA-3 expression and clinicopathologic parameters, and the correlation between GATA-3, GCDFP-15, and MGB expression. A p value of < 0.05 was considered statistically significant. The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated to evaluate the performance of GATA-3 in differentiating breast from lung carcinomas.

Results

A total of 993 cases of primary IBC and 254 paired nodal metastases with complete clinicopathological and IHC information were included in this cohort (representative IHC staining is shown in Supplementary Fig. S1).

All the patients with IBC were female, with a mean age of 54.5 (range 22–97) years. The cases included invasive carcinoma of no special type (845 cases, 85.1%), invasive lobular carcinoma (35 cases, 3.5%), and others (113 cases, 11.4%). For the histologic grade, 137 (13.8%), 386 (38.9%), and 470 (47.3%) cases were of grades 1, 2, and 3, respectively. For the TNM stage, the numbers of stage I, II, III, and IV cases were 254 (25.6%), 527 (53.1%), 208 (20.9%), and 4 (0.4%), respectively. For the molecular subtyping, there were 445 (44.8%), 305 (30.7%), 104 (10.5%), and 139 (14.0%) cases of luminal A, luminal B, HER2-OE, and TNBC subtypes, respectively.

Comparison of GATA-3 expression with GCDFP-15 and MGB expression in IBC

GATA-3 was expressed in 82.5% (819/993) of all primary IBC (Fig. 1), a much higher overall rate than that of MGB (463/993; 46.6%) or GCDFP-15 (237/993; 23.9%). Among the molecular subtypes, GATA-3 was expressed in 95.7% (426/445) of luminal A, 91.1% (278/305) of luminal B, 59.6% (62/104) of HER2-OE, and 38.1% (53/139) of TNBC cases, with predominant expression in the luminal cancers (93.9%). In the luminal subtypes, it also showed the highest expression rate compared to MGB (46.5 and 50.2% for luminal A and B, respectively) and GCDFP-15 (28.3 and 17.7% for luminal A and B, respectively). Although GATA-3 showed significantly higher expression than GCDFP-15 in HER2-OE (25.0%) and TNBC (22.3%), GATA-3 expression in these two subtypes was comparable to that of MGB (HER2-OE, 56.7%; TNBC, 31.7%).

GATA-3 expression was associated with low histologic grade (p < 0.001), ER positivity (p < 0.001), PR positivity (p < 0.001), HER2 negativity (p = 0.004), and lower Ki67 index (p < 0.001). Similar to GATA-3, GCDFP-15 showed association with lower grade (p < 0.001) and low Ki67 (p = 0.04). However, neither GCDFP-15 nor MGB showed an association with ER or PR positivity, and MGB showed a positive association with higher grade (p = 0.049) and HER2 positivity (p < 0.001) (Table 1, Supplementary Fig. S2).

Table 1 Comparison of GATA-3, MGB, and GCDFP-15 expression and their correlation with clinicopathologic parameters, biomarkers, and molecular subtypes in primary tumor

There was no significant correlation between GATA-3 and GCDFP-15 expression (p = 0.279), while a positive correlation was noted between GATA-3 and MGB expression (p = 0.01). The detection rate was also analyzed for combinations of these markers. The overall IBC detection rate increased from 82.5% with GATA-3 alone (the most sensitive single marker) to 91.2% (906/993) when all three markers were combined. Using the two-marker combination of GATA-3/MGB, the overall IBC detection rate increased from 82.5 to 89.1% (885/993). An increased rate was also seen for luminal A (from 95.7 to 97.1%), luminal B (from 91.1 to 95.7%), HER2-OE (from 59.6 to 80.8%), and TNBC cases (from 38.1 to 55.4%). Using the GATA-3/GCDFP-15 combination, the detection rates increased for the luminal A subtype (from 95.7 to 97.1%), luminal B subtype (from 91.1 to 92.1%), HER2-OE subtype (from 59.6 to 69.2%), and TNBC subtype (from 38.1 to 50.4%). Altogether, of the two-marker combinations, GATA-3/MGB showed the best detection rate in overall IBC and in all distinct subtypes, especially in the HER2-OE and TNBC cases (Supplementary Table S2).
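A combined detection rate of this kind counts a case as detected when at least one marker in the panel is positive. The sketch below illustrates that rule on a small invented cohort. The per-case records, marker keys, and function name are hypothetical and are not study data; only the "any marker positive" rule reflects the analysis described here.

```python
from typing import Dict, Iterable, List


def detection_rate(cases: List[Dict[str, bool]], panel: Iterable[str]) -> float:
    """Fraction of cases positive for at least one marker in the panel."""
    panel = list(panel)
    detected = sum(1 for case in cases if any(case[m] for m in panel))
    return detected / len(cases)


# Hypothetical toy cohort (not study data): True means the marker was
# scored positive for that case.
toy_cases = [
    {"GATA3": True,  "MGB": False, "GCDFP15": False},
    {"GATA3": False, "MGB": True,  "GCDFP15": False},
    {"GATA3": False, "MGB": False, "GCDFP15": True},
    {"GATA3": False, "MGB": False, "GCDFP15": False},
    {"GATA3": True,  "MGB": True,  "GCDFP15": True},
]

for panel in (["GATA3"], ["GATA3", "MGB"], ["GATA3", "MGB", "GCDFP15"]):
    print("/".join(panel), f"{detection_rate(toy_cases, panel):.0%}")
```

Adding a marker to a panel can only keep the rate the same or raise it, which is why the gain is largest in subtypes where the first marker misses many cases.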

Comparison of GATA-3 expression with GCDFP-15 and MGB expression in nodal metastasis

In nodal metastases, the GATA-3 expression rate was similar, at 83.9% (213/254 cases). The clinicopathological associations of GATA-3 in nodal metastases were similar to the findings in primary tumors (data not shown). The expression rates of GCDFP-15 and MGB in nodal metastases were 23.2% (59/254) and 42.5% (108/254), respectively. When classified by molecular subtype, the detection rates by GATA-3, MGB, and GCDFP-15 for luminal A were 87.2% (75/86), 40.7% (35/86), and 20.9% (18/86), respectively; for luminal B the rates were 87.9% (87/99), 45.5% (45/99), and 22.2% (22/99), respectively; for HER2-OE, the rates were 74.1% (20/27), 51.9% (14/27), and 29.6% (8/27), respectively; and for TNBC, the rates were 73.2% (30/41), 31.7% (13/41), and 24.4% (10/41), respectively. As in primary tumors, among the three markers, GATA-3 showed the highest detection rate in nodal metastases overall, as well as in distinct molecular subtypes (Table 2).

Table 2 Expression of GATA-3, MGB, and GCDFP-15 in nodal metastasis and comparison with corresponding primary tumor

The expression of GATA-3 in the primary tumor and nodal metastases was concordant in 90.6% (230/254) of cases (201 positive/positive and 29 negative/negative) and discordant in 9.4% (24/254) of cases (12 positive/negative and 12 negative/positive). For MGB and GCDFP-15, the concordance rates were lower, 66.1% (168/254) and 81.5% (207/254), respectively (Supplementary Fig. S3). Such discordance of GATA-3 expression may be related to alteration of ER status between the two sites, as a higher rate of ER discordance was observed in the GATA-3 discordant cases. Among the 24 GATA-3 discordant cases with ER information, 75% (18/24) showed concordant ER status at both sites (15 positive/positive and 3 negative/negative) and 25% (6/24) showed ER discordance (3 positive/negative and 3 negative/positive). In contrast, among the GATA-3 concordant cases, only 10.9% (25/230) showed ER discordance (10 positive/negative and 15 negative/positive). GATA-3 concordance correlated positively with ER concordance (p = 0.046).
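The concordance figures and the association between GATA-3 and ER concordance can be checked from the counts in this paragraph. The following is a minimal sketch assuming an uncorrected Pearson chi-square test on the 2 × 2 table. The exact SPSS options used in the study are not stated, so the p value computed here need not match the reported value to the last digit. The function names are hypothetical.

```python
import math
from typing import Tuple


def concordance_rate(pos_pos: int, neg_neg: int,
                     pos_neg: int, neg_pos: int) -> float:
    """Share of paired cases with the same marker status at both sites."""
    total = pos_pos + neg_neg + pos_neg + neg_pos
    return (pos_pos + neg_neg) / total


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]].

    With 1 degree of freedom the upper-tail p value equals
    erfc(sqrt(chi2 / 2)), so no external statistics library is needed.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# GATA-3 status in primary tumor vs nodal metastasis (counts from the text).
gata3_concordance = concordance_rate(pos_pos=201, neg_neg=29,
                                     pos_neg=12, neg_pos=12)
print(f"GATA-3 concordance: {gata3_concordance:.1%}")

# Rows: GATA-3 discordant, GATA-3 concordant.
# Columns: ER concordant, ER discordant (205 = 230 - 25).
chi2, p_value = pearson_chi2_2x2(18, 6, 205, 25)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

Under these assumptions the statistic is about 4.05 with a p value just below 0.05, in line with the direction and significance reported above.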

Value of GATA-3 in differentiating breast and lung carcinomas

To assess the value of GATA-3 in identifying the origin of cancers, its expression in IBC was compared to a cohort consisting of 208 lung carcinomas. Patients with lung carcinomas included 96 females and 112 males, and the mean age was 63.1 (range 27–94) years. The carcinomas included adenocarcinomas (194 cases, 93.3%), adenosquamous carcinomas (8 cases, 3.8%), squamous cell carcinomas (2 cases, 1.0%), poorly differentiated carcinomas (3 cases, 1.4%), and sarcomatoid carcinoma (1 case, 0.5%) (Table 3).

Table 3 GATA-3 expression in invasive breast carcinoma and pulmonary carcinoma

GATA-3 expression was identified in two cases of lung adenocarcinoma [1.0% (2/194)], in a diffuse and moderate-to-strong pattern. Interestingly, in our previously published data [35], these two GATA-3 positive lung adenocarcinomas were also found to be TTF-1 negative with at least one antibody clone (one case was TTF-1 negative by both 8G7G3/1 and SPT24 clones; the other case was TTF-1 negative by 8G7G3/1 but positive by SPT24). The sensitivity, specificity, PPV, and NPV were 82.5, 99.0, 99.8, and 54.2%, respectively (Table 3). Thus, GATA-3 showed both high specificity and high sensitivity in differentiating breast carcinoma from lung carcinoma.
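These four measures follow directly from the 2 × 2 counts given in the text, treating breast origin as the condition of interest: 819 of 993 IBC were GATA-3 positive and 2 of 208 lung carcinomas were GATA-3 positive. The sketch below recomputes them under that framing. The function name is hypothetical.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # positives among breast carcinomas
        "specificity": tn / (tn + fp),  # negatives among lung carcinomas
        "ppv": tp / (tp + fp),          # breast origin among GATA-3 positives
        "npv": tn / (tn + fn),          # lung origin among GATA-3 negatives
    }


# Counts from the text: 819/993 IBC positive, 2/208 lung carcinomas positive.
metrics = diagnostic_metrics(tp=819, fn=993 - 819, fp=2, tn=208 - 2)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

The modest NPV reflects the cohort composition: the 174 GATA-3 negative IBC are numerous relative to the 206 GATA-3 negative lung carcinomas, so a negative result is weak evidence against breast origin in this setting.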

Value of GATA-3 in differentiating metastatic breast carcinoma

To assess the value of GATA-3 in identifying metastatic breast carcinomas, 23 cases of metastases from breast carcinomas (18 brain metastases and 5 lung metastases) and 11 cases of corresponding primary breast carcinomas were included. GATA-3 expression was noted in all the primary and metastatic foci with available IHC results (11/11 and 22/22 cases, respectively). These rates were much higher than those of GCDFP-15 (18.2%, 2/11 of primary tumors; 56.5%, 13/23 of metastatic tumors) and MGB (54.5%, 6/11 of primary tumors; 78.3%, 18/23 of metastatic tumors) (Table 4). There was 100% concordance between primary and metastatic foci for GATA-3: all 11 paired primary and metastatic foci showed GATA-3 positivity. For MGB and GCDFP-15, a lower concordance was observed (7/11 and 8/10 cases, respectively). Of note, the concordant cases for MGB were mainly those with positive staining (6/7 cases), whereas, given the low sensitivity of GCDFP-15, its concordance was mostly found in negative cases (6/8 cases). In all the discordant cases, the marker was negative in the primary tumor but expressed at the metastatic focus (Table 4, Supplementary Fig. S3).

Table 4 Concordance of GATA-3, MGB, and GCDFP-15 in metastatic breast cancers and corresponding primary tumor

Discussion

GATA-3 plays an important role in lineage determination and differentiation of the mammary gland [3]. It is reported to be highly expressed (around 90%) in breast and urothelial carcinomas and, hence, has been used as a marker for these cancers [17, 36]. A number of other cancers also show GATA-3 positivity, including basal cell carcinoma, cutaneous squamous cell carcinoma, skin adnexal tumors, choriocarcinoma, endodermal sinus tumor, renal chromophobe carcinoma, malignant mesothelioma, and salivary gland and pancreatic ductal adenocarcinoma (37–98%) [36]. Despite this lack of specificity, it is still regarded as a sensitive breast cancer marker. To date, comprehensive studies of GATA-3 in IBC are still relatively limited. In the current study, GATA-3 was found to be expressed in 82.5% of IBC. The expression rate is slightly lower than those reported in smaller cohorts, 93.7% in a cohort of 268 cases and 94% in a cohort of 147 cases [17, 36], but similar to that in another large cohort of 1637 cases [22].

The current study confirmed the previous findings on the clinicopathologic associations of GATA-3, namely its correlation with lower grade, hormone receptor expression, and HER2 negativity [37, 38]. Consistently, GATA-3 expression was higher in the luminal than in the non-luminal subtypes. This observation reflected gene profiling results and the presumed biological role of GATA-3 in breast tissue [7, 8, 39]. Some authors reported decreased GATA-3 expression in the luminal B subtype defined by gene profiling [40], but others [41], including the current series, reported similar GATA-3 expression rates in luminal A and luminal B subtypes. GATA-3 expression was lower in non-luminal subtypes, especially in TNBC (38.1%). In murine studies, GATA-3 expression inhibited the TNBC phenotype [42]. In the published data, the expression rate of GATA-3 in non-luminal cases ranged from 2.6 to 83% [43]. The wide range likely reflected differences in methodology across the studies. The current results were mid-range. Usually, whole paraffin sections reveal higher marker positivity than TMA sections [44], likely reflecting sampling error due to the small sample size in the TMA. Differences in antibody choice could also lead to variable results. Most publications have used either antibody clone HG3-31 or L50-823, and the latter appeared to be more sensitive. It had been shown that L50-823 stained 66% of ER negative cancers compared to 44% by HG3-31 [45], and we previously reported a similar observation [46]. For diagnostic purposes, higher clinical sensitivity is desirable; thus, the clone with higher sensitivity was chosen for the current study. In addition, a wide range of thresholds, from any nuclear staining [36] to over 20% nuclear labeling [47], has been used to define GATA-3 positivity. Overall, GATA-3 showed a better detection rate for breast carcinoma than GCDFP-15 and MGB, in both primary tumors and nodal metastases.

Previous comparative studies demonstrated similar results, showing a higher expression rate of GATA-3 (72–82.83%) than of GCDFP-15 (44–62%) and MGB (36–64%) [22, 24], including in different histological subtypes [48]. In TNBC, GATA-3 was the most sensitive (40–60%) of the three markers (15–17% for GCDFP-15 and 7.1–26% for MGB) [23, 25]. We previously demonstrated the superiority of combined GCDFP-15/MGB over single markers in identifying IBC [49]. Therefore, we examined whether other breast markers could complement GATA-3 in detecting IBC. As GATA-3 showed a high expression rate in luminal cases, combination with other markers did not show significantly increased sensitivity. In non-luminal cases, GATA-3/MGB significantly enhanced the detection rate. Nevertheless, it should be noted that identification of TNBC remains problematic due to the low expression of all three markers; negative marker results should be interpreted with caution and should not be taken as firm evidence ruling out a breast origin.

In addition to primary breast cancers, GATA-3 also demonstrated the highest detection rate in metastatic foci (at lymph node and distant metastatic sites) compared to GCDFP-15 and MGB. GATA-3 labeling has been successfully documented in cytologic specimens of metastatic cases, including both fine needle aspirates and body fluid collections [26, 27]. In a large series of metastatic breast carcinoma (n = 166; hormone status not reported), the sensitivities of GATA-3, mammaglobin, and GCDFP-15 were 95, 78, and 65%, respectively [27]. Here, GATA-3 showed positivity in all the distant metastatic cases, including ten ER negative cases. It could therefore have the greatest diagnostic potential in identifying carcinoma of breast origin. It has also been documented that GATA-3 labelled 100% of metastatic breast cancers in which ER and/or PR expression was lost from the primary to the metastasis [50], but the results were derived from a small number of cases. Despite the high concordance of over 90% for GATA-3 expression between primary tumors and nodal metastases in our large series, the approximately 10% of cases with discordant results also tended to show discordance in ER expression. One should therefore be aware that GATA-3 status may change in the metastasis if a change in hormone receptor status is also observed in the metastatic disease.

GATA-3 expression was identified in two lung adenocarcinomas in the present study, similar to previous results [35]. These two GATA-3 positive lung adenocarcinomas were also TTF-1 negative with at least one antibody clone (one case was TTF-1 negative by both 8G7G3/1 and SPT24 clones; the other case was TTF-1 negative by 8G7G3/1 but positive by SPT24) [35]. TTF-1 and GATA-3 have been advocated as ‘lung specific’ and ‘breast specific’ markers, respectively. Apart from our observations, there was a previous report of GATA-3 expression in squamous cell carcinoma (SCC) of the lung [18]. On the other hand, IBC also rarely expresses TTF-1 [18, 35, 51]. Hence, one should be aware of the pitfalls in using these ‘specific’ markers.

In conclusion, GATA-3, compared to GCDFP-15 and MGB, showed the highest expression in IBC. It also showed variable expression among different molecular subtypes, with the highest expression in luminal subtypes and lower expression in non-luminal subtypes. A GATA-3/MGB combination was more sensitive in identifying non-luminal cases. GATA-3 is also useful in identifying metastatic carcinoma of breast origin. However, GATA-3 expression has also been identified in lung adenocarcinoma. Caution should be exercised when using it to differentiate between breast and lung adenocarcinomas, particularly in a metastatic setting.