Introduction

Tea (Camellia sinensis (L.)) is one of the three most popular beverages worldwide due to its beneficial health properties (Yang et al. 2002). Over two billion people in more than 125 countries drink tea habitually (Mei 2015). Teas are typically classified according to processing technology, which produces different characteristics and fermentation (oxidation) degrees. According to the manufacturing process and fermentation degree, teas can be classified into six main categories: unfermented green tea (GT), slightly fermented white tea (WT), partly fermented yellow tea (YT), semi-fermented oolong tea (OT), fully fermented black tea (BT), and post-fermented dark tea (DT). Typical total catechin contents of these categories are 11.21 g/100 g, 5.93 g/100 g, 9.41 g/100 g, 8.21 g/100 g, 1.07 g/100 g, and 1.94 g/100 g (dry basis), respectively. Theanine contents are 1.48 g/100 g, 1.51 g/100 g, 1.36 g/100 g, 0.70 g/100 g, 0.97 g/100 g, and 0.64 g/100 g (dry basis), respectively, while caffeine contents range from 1.51 g/100 g to 5.00 g/100 g (dry basis) (Yang et al. 2001).

In tea markets, tea types and their quality are traditionally determined through sensory evaluation by well-trained professionals. The results can be easily influenced by weather, gender, and mood, cannot be quantified accurately, and sometimes lead to trade disputes. A quantitative identification method to assist sensory evaluation is therefore important and urgently required, especially for consumers from non-producing countries. Compared with sensory evaluation, chemical analysis methods are more precise (Samadi and Jalali-Heravi 2013). However, several factors increase the difficulty of tea classification and the significance of developing discriminant methods. First, the six tea categories contain more than 1000 varieties. Each variety is assessed differently at home and abroad because of its own characteristics and unique taste, which makes the varieties difficult to understand and memorize. Second, teas are commonly confused because of resemblances in appearance, aroma, or taste, especially yellow tea, aging green tea, and oolong tea. Aging green tea has characteristics similar to those of yellow tea in color and taste. The faint-scented type of oolong tea is more lightly fermented to increase its greenness, making its aroma, color, and taste similar to those of green tea (Wang et al. 2008; Wu et al. 2008). Such teas cannot be identified quickly, not even by professionals. For example, Anji white tea, which is a special type of green tea, has an almost identical shape and color to white tea (Söhle et al. 2009; López and Calvo 2011; Koutelidakis et al. 2009). A more reliable and quantifiable method to identify tea categories has not yet been developed. Therefore, it is necessary to develop accurate and quantitative methods to distinguish between different tea categories.
Tea is abundant in chemical components that differ in both their contents and their ratios. This not only results in multiple dimensions of tea quality but also makes the qualitative analysis of tea difficult, especially when the chemical information in tea far exceeds what a taster can assess subjectively. Thus, quantitative indicators need to be used to identify tea. Recently, the tea field has made its greatest progress in characterizing the nature of tea at the chemical level, and unremitting efforts have been made to identify the key components of various teas based on their chemical and physical characteristics. Black tea, oolong tea, white tea (Wang & Wang 2015), different grades of longjing tea (Chen et al. 2012), Wuyi oolong tea (Wang et al. 2010), Tieguanyin oolong tea (Yu et al. 2014), and oolong tea (Wu et al. 2014) have been identified based on the contents of their main chemical components, using infrared spectroscopy techniques or pattern recognition. It is widely accepted that the chemical composition of tea determines its taste; however, such a method has not been systematically applied to classify the six tea categories. Theanine is detected by titration and colorimetric methods (Baptista et al. 2012; Kakuda et al. 1996; Gao et al. 2005), and caffeine is detected by a colorimetric method. Neither approach is completely accurate. Among the various analytical methods, high-performance liquid chromatography (HPLC) has been widely used by the main exporting and importing countries to identify chemical compositions, providing high accuracy and sensitivity. Therefore, we monitored key components, such as catechins, caffeine, and theanine, which may be the first components used to set a discrimination standard (Obuchowicz et al. 2011).
For the above reasons, a proposal for the chemical classification of tea as an international standard was presented at the 25th international conference of the International Organization for Standardization (ISO) in Japan in June 2015, along with the setup of working groups. We therefore aimed to develop an accurate and quantitative method based on the chemical compositions of teas. This study relied on the significant variation in chemical composition among teas that results from the different fermentation degrees during processing. Numerous representative samples covering the six tea categories, which correspond to different processing methods, were collected worldwide. Catechins, caffeine, and theanine were employed as distinct variables to develop internationally recognized tea models and to provide references for establishing the ISO chemical classification, which would aid the international tea trade and consumers.

Materials and Methods

Tea Samples

A total of 436 tea samples produced by different processing methods, which yield different chemical contents, were collected in proportions reflecting the uneven distribution of tea in the market, as described in Table 1. These samples were produced in 2012 and 2013. A total of 354 samples were collected from tea processing enterprises in China, and 82 samples were obtained from reliable sources in other countries. Tea samples from different processing methods covered different grades and production regions and were analyzed by HPLC for category detection. All samples were stored in a refrigerator at 4 °C until analyzed.

Table 1 Description of tea samples

Chemical Analysis and Methods

Chemicals and Reagents

Standards of caffeine, (+)-catechin (99 %, C), (−)-epicatechin (99 %, EC), (−)-epicatechin gallate (99 %, ECG), (−)-epigallocatechin gallate (99 %, EGCG), and (−)-epigallocatechin (99 %, EGC) were from Sigma (Sigma Chemical Co., St. Louis, MO, USA). HPLC-grade acetonitrile, methyl alcohol, and acetic acid for HPLC were from Tedia (Tedia Co., Ohio, USA).

Theanine (100 %) was purchased from Sigma. HPLC grade water was purified with a water purification system (Aquapro International Co., Delaware, USA).

Preparation of the Test Solution

For catechins and caffeine quantification, the sample solution was prepared as described in ISO 14502-2 (2005). Briefly, a sample of tea powder (0.2 g) was extracted with 5 mL of 70 % methanol at 70 °C for 10 min with stirring (every 5 min), then centrifuged at 1400×g for 10 min at room temperature. After centrifugation, the supernatant was transferred into a 10-mL volumetric flask. The extraction was repeated, and the combined supernatants were made up to 10 mL with 70 % (v/v) methanol. The extracts were then filtered through a 0.22-μm Millipore filter for the analysis of catechins and caffeine by HPLC.

For theanine quantification, a tea sample of 1 g was extracted with 300 mL of deionized water (100 °C) for 20 min. The extract was vacuum filtered, transferred to a 500-mL volumetric flask, diluted to volume with water, and mixed. The extract was then filtered through a 0.22-μm Millipore filter and transferred into a centrifuge tube. Finally, pre-column derivatization was performed using the AccQ-Fluor Reagent Kit according to the manufacturer’s specifications. The test solution was then analyzed for amino acids by HPLC.

Catechins and Caffeine Analyses

HPLC was used to determine catechins and caffeine in tea samples. The Waters E2695 series HPLC system (Waters Corp., Milford, MA, USA) consisted of a sample manager, a quaternary solvent manager, an ultraviolet/visible detector, and a Symmetry® C18 column (particle size 5 μm; column size 250 mm × 4.6 mm; Waters). The mobile phase consisted of solvent A (9 % acetonitrile, 2 % acetic acid, 0.2 % EDTA, 88.8 % water) and solvent B (80 % acetonitrile, 2 % acetic acid, 0.2 % EDTA, 17.8 % water), and the linear gradient program was as follows: 0∼10 min, 100 % A; 10∼25 min, 100∼68 % A; 25∼35 min, 68 % A; 35∼40 min, 68∼45 % A; 40∼45 min, 45 % A; 45∼50 min, 45∼100 % A; and 50∼60 min, 100 % A. The flow rate was 1 mL/min, and detection was performed at 278 nm (ISO 14502-2 2005). Standard solutions with different concentrations of caffeine and catechins were prepared according to the method described in ISO 14502-2 (2005). Peak retention times and peak areas were determined by HPLC (Fig. 1a), and quantitative analysis was conducted according to the external standard method. The standard curve of caffeine was Y = 12,816x (R² = 1.0000).
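The gradient program above can be read as a piecewise-linear schedule for solvent A. The following is a minimal Python sketch of that reading, assuming linear interpolation between the breakpoints listed in the text; the names `GRADIENT`, `percent_a`, and `percent_b` are hypothetical and are not part of any instrument software.

```python
# Breakpoints (time in min, % solvent A) transcribed from the gradient program in the text.
GRADIENT = [(0, 100), (10, 100), (25, 68), (35, 68),
            (40, 45), (45, 45), (50, 100), (60, 100)]


def percent_a(t_min):
    """Return % solvent A at time t_min (assumes linear ramps between breakpoints)."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0-60 min program")
    for (t0, a0), (t1, a1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)


def percent_b(t_min):
    """Solvent B makes up the remainder of the binary mobile phase."""
    return 100 - percent_a(t_min)
```

For instance, `percent_a(17.5)` falls halfway along the 10∼25 min ramp from 100 % to 68 % A.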

Fig. 1

HPLC chromatogram of catechins, caffeine, and theanine. a HPLC chromatogram of mixed references of catechins and caffeine. b HPLC chromatogram of theanine

As a percentage by mass of samples dry matter, the total catechin contents of tea were calculated by summing the individual catechins. Total and single catechins were calculated as follows: % total catechins = (% EGC) + (% C) + (% EC) + (% EGCG) + (% ECG); and % single catechins = (% EGC) + (% C) + (% EC).
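The two sums defined above can be sketched in a few lines of Python. The function name `catechin_sums` and the example values are hypothetical; inputs are assumed to be percentages of dry matter, as in the text.

```python
def catechin_sums(egc, c, ec, egcg, ecg):
    """Return (% total catechins, % single catechins) from individual contents.

    All inputs are assumed to be expressed as % of sample dry matter.
    """
    single = egc + c + ec          # % single catechins = EGC + C + EC
    total = single + egcg + ecg    # % total catechins adds EGCG and ECG
    return total, single


# Hypothetical contents, for illustration only (not measured values).
total, single = catechin_sums(egc=2.0, c=0.5, ec=1.0, egcg=6.0, ecg=1.5)
```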

Theanine Analysis

Sample analysis was carried out with a Waters 600 series HPLC system (Waters Corp., Milford, MA, USA). A Waters AccQ.Tag reversed-phase HPLC column (particle size 4 μm; column size 3.9 mm × 150 mm; Waters) was used, and the column temperature was maintained at 25 °C. The mobile phase was composed of AccQ.Tag Eluent A (A), acetonitrile (B), and MilliQ water (C). The gradient program used to separate the amino acids in tea samples was as follows: 100 % A at 0 min; changed linearly to 91 % A, 5 % B, and 4 % C at 17 min; then to 80 % A, 17 % B, and 3 % C at 24 min; then to 68 % A, 20 % B, and 12 % C at 32 min, held for 2 min; then 60 % B and 40 % C at 35∼37 min; and returned to 100 % A at 38 min, followed by column wash and stabilization from 38 to 45 min. The flow rate was 1 mL/min, and detection was at 395 nm. The content of theanine was obtained from chromatograms (Chen et al. 2009). Standard solutions with different concentrations of theanine were prepared according to the method described in ISO/WD 19563. Peak retention times and peak areas were determined by HPLC (Fig. 1b), and quantitative analysis was conducted according to the external standard method. The standard curve of theanine was Y = 7851.10x + 3089 (R² = 0.9998).
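With the external standard method, a concentration is read back from a measured peak area by inverting the linear standard curve. The sketch below does this for the two curves reported in the text (caffeine, Y = 12,816x; theanine, Y = 7851.10x + 3089). The function name is hypothetical, and because the text does not state the concentration units of x, the result is in whatever units the standards were prepared in.

```python
# (slope, intercept) of the standard curves reported in the text: Y = slope * x + intercept
CAFFEINE_CURVE = (12816.0, 0.0)
THEANINE_CURVE = (7851.10, 3089.0)


def concentration_from_area(peak_area, curve):
    """Invert a linear standard curve: x = (Y - intercept) / slope.

    Units of the result follow the (unstated) units of the standards.
    """
    slope, intercept = curve
    if slope == 0:
        raise ValueError("standard curve slope must be non-zero")
    return (peak_area - intercept) / slope


# Hypothetical peak areas, for illustration only.
caffeine_x = concentration_from_area(128160.0, CAFFEINE_CURVE)
theanine_x = concentration_from_area(18791.2, THEANINE_CURVE)
```

Converting x to a content in g/100 g dry matter would additionally require the extraction volume, sample mass, and moisture content, which are handled by the ISO methods cited above.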

Data Analysis

The instrumental data were analyzed by principal component analysis (PCA). PCA is a technique used to emphasize variation and elucidate strong patterns in a dataset, and it is often used to convert data into a form that is easy to explore and visualize. A small number of composite indices is used to capture nearly the same information as a large number of variables (Shao et al. 2007; Duda et al. 2003; Liu and Yi 2008). PCA was performed with Statistical Product and Service Solutions (SPSS, version 18.0), and the data were projected onto the most significant components to better visualize the differences among the various parameters.

Fisher Discriminant Analysis

Fisher discriminant analysis (FDA) is a discriminant method that is well suited to separating two groups; it is related to PCA and equivalent to canonical correlation analysis. The first canonical variable, the linear combination of the variables that gives the greatest possible separation among the groups, was selected first. Next, the second canonical variable was determined. The center of gravity (centroid) of each category was calculated in the low-dimensional space of the canonical variables, and the discriminant function gives the coordinates of each sample in that space. The category of each sample was assigned according to the distance from the sample point to each center.
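To make the procedure concrete, the following is a minimal sketch of a two-group Fisher discriminant in plain Python, restricted to two variables so that the matrix inverse can be written out by hand. It is not the SPSS procedure used in this study; the data are hypothetical, and all function names are illustrative. The direction is computed as w = Sw⁻¹(mA − mB), and a sample is assigned to the group whose projected center is nearer.

```python
def mean2(rows):
    """Mean vector of a list of 2-variable samples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter2(rows, m):
    """2x2 within-group scatter matrix around mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - m[0], r[1] - m[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_fit(group_a, group_b):
    """Return (w, center_a, center_b): direction and projected group centers."""
    ma, mb = mean2(group_a), mean2(group_b)
    sa, sb = scatter2(group_a, ma), scatter2(group_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("pooled scatter matrix is singular")
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    # w = inverse(Sw) @ (ma - mb), using the closed-form 2x2 inverse
    w = [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
         (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]
    return w, score(w, ma), score(w, mb)


def score(w, x):
    """Project a sample onto the discriminant direction."""
    return w[0] * x[0] + w[1] * x[1]


def classify(x, w, center_a, center_b):
    """Assign x to the group whose projected center is closer."""
    s = score(w, x)
    return "A" if abs(s - center_a) <= abs(s - center_b) else "B"


# Hypothetical two-variable data, for illustration only.
GROUP_A = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
GROUP_B = [(4.0, 0.5), (4.5, 1.0), (5.0, 0.8), (4.2, 0.6)]
W, CENTER_A, CENTER_B = fisher_fit(GROUP_A, GROUP_B)
```

The discriminant functions reported later in this paper have the same form (a weighted sum of variables plus a constant), with six variables instead of two.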

Results and Discussion

Chemical Compositions of Samples

Several compounds differed significantly among the six tea categories, as listed in Table 2. Green tea had the highest contents of EGCG, EGC, and ECG, which might indicate that enzyme deactivation during green tea processing preserves the greatest amount of the polyphenols of fresh tea leaves. Theanine and caffeine were present in the highest contents in white tea, which is processed with long withering times and does not undergo enzyme inactivation or rolling. Because the drying temperatures and cell damage rates of white tea are much lower than those of other teas, less caffeine is lost by sublimation. Another reason is that the tea varieties used for white tea have more fine hairs, which are high in caffeine. In contrast, black and dark teas showed much lower contents of EGCG, EGC, single catechins, and total catechins, and dark tea also had the lowest theanine content. Most polyphenols in black and dark teas are dramatically decreased by oxidation during the fermentation and post-fermentation processes, which involve both auto-oxidation and enzymatic oxidation of polyphenols. Oolong and yellow teas had intermediate contents of EGCG, EGC, and total catechins, because their polyphenols are only partially fermented. Overall, green tea contained the highest total catechin level, and black and dark teas had the lowest.

Table 2 Catechins, caffeine, and theanine contents of six tea categories

Factor Analysis

Main Factor Selection with PCA

Because teas contain many chemical components, we needed to determine which chemicals contributed most to distinguishing the tea categories. Principal component analysis was performed on C, EC, EGC, EGCG, ECG, theanine, caffeine, and their combinations for the six tea categories using SPSS software. The cumulative contribution rate of the first two extracted principal components exceeded 91.10 %, and that of the first three exceeded 97.70 %. Thus, the first three principal components represent almost all of the information on the teas. The first principal component mainly represents the information of caffeine; the second mainly represents the information of theanine, EGCG/total catechins, and theanine × caffeine; and the third mainly represents the information of total catechins, theanine × theanine, and theanine × caffeine. Six representative principal component factors were obtained: caffeine (X1), total catechins (X2), theanine (X3), theanine × theanine (X4), EGCG/total catechins (X5), and theanine × caffeine (X6).
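The six variables are simple functions of four measured contents. A short Python sketch of their construction follows; the function name `discriminant_variables` is hypothetical, and the inputs are assumed to be percentages of dry matter, as in Table 2.

```python
def discriminant_variables(caffeine, total_catechins, theanine, egcg):
    """Build X1..X6 as defined in the text from four measured contents.

    Inputs are assumed to be % of dry matter (an assumption about units).
    """
    if total_catechins <= 0:
        raise ValueError("total catechins must be positive to form the EGCG ratio")
    return {
        "X1": caffeine,
        "X2": total_catechins,
        "X3": theanine,
        "X4": theanine * theanine,
        "X5": egcg / total_catechins,
        "X6": theanine * caffeine,
    }


# Hypothetical contents, for illustration only.
example = discriminant_variables(caffeine=3.0, total_catechins=10.0,
                                 theanine=1.5, egcg=5.0)
```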

Tea Classification with Bayes

Bayes discriminant analysis was used to distinguish among the six tea categories using the six new variables. As shown in Fig. 2a, the discriminant rates of green and yellow tea were only 75.00 and 57.10 %, respectively, which were lower than the statistical standard of 85.00 %. The black and dark teas, both of which are fermented teas in which most of the polyphenols are dramatically decreased during fermentation, could not be discriminated. The black and dark teas had similar contents of EGCG, total catechins, and single catechins. Many green teas and yellow teas were mixed together when discriminated, and some white teas were also mixed with green teas (Fig. 2b). The six tea categories could not be effectively identified in a single step with the Bayes discriminant method; therefore, a new method was explored.

Fig. 2

Discriminant rate and schematic diagram of canonical discriminant classification of six tea categories using Bayes discriminant analysis. a The discriminant rates of six tea categories using Bayes discriminant analysis. b The schematic diagram of canonical discriminant classification of six tea categories using Bayes discriminant analysis

Fisher Discriminant Method

Black, Dark, and Other Tea Classifications with Fisher Analysis

X1, X2, X3, X4, X5, and X6 were selected as variables, and the Fisher discriminant function and the center function values of the groups were obtained using SPSS. The discriminant function was as follows:

$$ \mathrm{Fisher}=0.17{X}_1+0.20{X}_2+0.70{X}_3+0.80{X}_4+7.71{X}_5-0.65{X}_6-4.73 $$

The six tea categories were divided into two groups: one group contained black and dark teas, with a center function value of −4.82, while the other group included green, white, yellow, and oolong teas, with a center function value of 1.33 (Fig. 3a). The function value was calculated for any given tea; if the value was close to −4.82, the tea was categorized as black or dark tea, and if the value was close to 1.33, the tea was categorized as one of the other four types. Only 13 samples were wrongly identified among the 155 black and dark teas, giving a discriminant rate of 91.60 %, and the overall discriminant rate was 96.60 % for all samples (Fig. 4a).
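As an illustration of how this first function might be applied to a new sample, the sketch below evaluates the published coefficients and assigns the sample to the nearer of the two center values. The function names and the example sample are hypothetical, and the variables are assumed to be built from contents expressed as % of dry matter.

```python
# Coefficients for X1..X6, constant, and group centers as reported in the text.
COEFFICIENTS = (0.17, 0.20, 0.70, 0.80, 7.71, -0.65)
CONSTANT = -4.73
CENTER_BLACK_DARK = -4.82
CENTER_OTHER = 1.33


def stage_one_score(x):
    """Evaluate the first Fisher function for x = (X1, X2, X3, X4, X5, X6)."""
    if len(x) != 6:
        raise ValueError("expected six variables X1..X6")
    return sum(c * v for c, v in zip(COEFFICIENTS, x)) + CONSTANT


def stage_one_group(x):
    """Assign the sample to the group whose center value is nearer."""
    s = stage_one_score(x)
    if abs(s - CENTER_BLACK_DARK) < abs(s - CENTER_OTHER):
        return "black or dark"
    return "green, white, yellow, or oolong"


# Hypothetical sample: caffeine 3.0, total catechins 11.0, theanine 1.5,
# theanine^2 2.25, EGCG/total catechins 0.5, theanine x caffeine 4.5.
sample = (3.0, 11.0, 1.5, 2.25, 0.5, 4.5)
```

The nearest-center rule is equivalent to a single cutoff at the midpoint between the two centers; whether SPSS applied exactly this rule or a prior-weighted cutoff is not stated in the text.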

Fig. 3

Distribution of the center function value. a Distribution of the center function value of group one (black and dark teas) and group two (green, white, yellow, and oolong teas). b Distribution of the center function value of dark tea and black tea. c Distribution of the center function value of white tea and group (green, yellow, and oolong teas). d Distribution of the center function value of oolong tea and group (green and yellow teas)

Fig. 4

The results of Fisher discriminant analysis. a The discriminant rate of group one (black and dark teas), group two (green, white, yellow and oolong teas), and total discriminant rate using Fisher discriminant analysis. b The discriminant rate of dark tea, black tea, and total discriminant rate using Fisher discriminant analysis. c The discriminant rate of white tea, group (green, yellow and oolong teas), and total discriminant rate using Fisher discriminant analysis. d The discriminant rate of oolong tea, group (green and yellow teas), and total discriminant rate using Fisher discriminant analysis

Black and Dark Tea Classification with Fisher Analysis

To discriminate between the black and dark teas in the first group, Fisher analysis was applied again. The discriminant function of black tea and dark tea was as follows:

$$ \mathrm{Fisher}=0.74{X}_1-0.02{X}_2+17.25{X}_3+6.42{X}_4+1.45{X}_5-1.27{X}_6-6.03 $$

The center function values of the two groups are shown in Fig. 3b. Only four samples were wrongly classified as dark tea among the 97 black tea samples, and the discriminant rate was 95.60 %. All 58 dark tea samples were correctly recognized (Fig. 4b). Most of the polyphenols were oxidized during black and dark tea processing, but the polyphenol oxidation ratio in dark tea was higher than that in black tea. This is reflected in the difference in EGCG content between the two teas (Table 2).

White and Green and White and Oolong Tea Classifications with Fisher Analysis

Fisher analysis was then used to discriminate within the second group, which contained white, green, yellow, and oolong teas. First, the white teas were separated as one subgroup and the other three teas as the second subgroup. The white tea classification results are shown in Fig. 4c. Only two samples were wrongly classified among the 50 white tea samples, while 221 tea samples were correctly recognized among the 221 green, yellow, and oolong tea samples. The overall discriminant rate was 95.70 % for all of the samples. The discriminant function was as follows:

$$ \mathrm{Fisher}=-0.11{X}_1+0.44{X}_2-0.09{X}_3+2.03{X}_4-1.36{X}_5-1.40{X}_6-1.16 $$

The center function values were −2.85 and 0.62, respectively (Fig. 3c).

Oolong, Green, and Yellow Tea Classification with Fisher Analysis

Although oolong tea is a semi-fermented tea, there are three fermentation types of oolong tea: fresh aromatic oolong, strong aromatic oolong, and baking aromatic oolong. In particular, fresh aromatic oolong is only slightly fermented, and its chemical composition and color are close to those of green tea. To further discriminate among the oolong, green, and yellow teas, Fisher analysis was applied again. The discriminant function separating oolong tea from green and yellow teas was as follows:

$$ \mathrm{Fisher}=0.86{X}_1-0.16{X}_2+8.81{X}_3-2.73{X}_4+0.85{X}_5-0.39{X}_6-3.37 $$

The center function value of oolong tea was −1.79, and the center function value of green tea and yellow tea was 1.23 (Fig. 3d). A total of 15 samples were wrongly classified as oolong tea among the 137 yellow and green tea samples. The discriminant rate of oolong tea was 89.10 %, and the overall discriminant rate was 91.80 % for all of the samples (Fig. 4d).

Green Tea and Yellow Tea Classification with Fisher Analysis

As listed in Table 2, almost all of the compounds in the green and yellow teas had similar contents. Yellow tea is a partly fermented tea, and its polyphenols are partly oxidized during processing. Green tea is not a fermented tea, but polyphenols can also be oxidized in low-grade green tea because of crude processing conditions. The yellow tea and green tea were therefore very similar in chemical composition. The largest difference between them was in total catechins. When Fisher analysis was applied, the discriminant function obtained was as follows:

$$ \mathrm{Fisher}=-4.05{X}_1+0.55{X}_2-6.41{X}_3-1.18{X}_4+6.37{X}_5+2.78{X}_6+1.39 $$

The center function value of green tea was 0.32, and the center function value of yellow tea was −1.75. A total of 13 of the 116 green tea samples were wrongly classified as yellow tea, while three yellow tea samples were wrongly classified as green tea. The overall discriminant rate was 88.30 % for all of the samples (Table 3).
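The reported overall rate can be checked from the stated counts. In the sketch below, the yellow tea count of 21 is an inference (137 yellow and green tea samples minus 116 green tea samples), since Table 1 is not reproduced here; the function name is hypothetical.

```python
def discriminant_rate(total, misclassified):
    """Percentage of samples assigned to their true category."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (total - misclassified) / total


green_total, green_wrong = 116, 13
yellow_total = 137 - green_total   # inferred from the counts in the text
yellow_wrong = 3

overall = discriminant_rate(green_total + yellow_total,
                            green_wrong + yellow_wrong)
# The same arithmetic applied to the first step (13 of 155 black and dark teas wrong):
black_dark = discriminant_rate(155, 13)
```

Rounded to one decimal place, these agree with the 88.30 % and 91.60 % given in the text.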

Table 3 Cross-validation of the classification results

Taken together, the six tea categories were successfully discriminated by stepwise Fisher analysis combined with PCA of the main chemical compounds in teas, following the strategy shown in Fig. 4.
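The stepwise strategy can be sketched as a small decision cascade in Python, using the coefficients and center values reported above and a nearest-center rule at each step. Several points are assumptions: the names `STAGES`, `nearest`, and `classify_tea` are hypothetical; the white tea center is taken to be −2.85 (the first value listed); and the black versus dark step is left unresolved because its center values appear only in Fig. 3b and are not given in the text.

```python
# Each stage: (coefficients for X1..X6, constant, ((label, center), (label, center)))
STAGES = {
    "first": ((0.17, 0.20, 0.70, 0.80, 7.71, -0.65), -4.73,
              (("black_or_dark", -4.82), ("other", 1.33))),
    # Assumption: -2.85 belongs to white tea, 0.62 to the remaining three teas.
    "white": ((-0.11, 0.44, -0.09, 2.03, -1.36, -1.40), -1.16,
              (("white", -2.85), ("other", 0.62))),
    "oolong": ((0.86, -0.16, 8.81, -2.73, 0.85, -0.39), -3.37,
               (("oolong", -1.79), ("other", 1.23))),
    "green_yellow": ((-4.05, 0.55, -6.41, -1.18, 6.37, 2.78), 1.39,
                     (("green", 0.32), ("yellow", -1.75))),
}


def nearest(stage, x):
    """Score x with the stage's function and return the label of the nearer center."""
    coefficients, constant, centers = STAGES[stage]
    s = sum(c * v for c, v in zip(coefficients, x)) + constant
    return min(centers, key=lambda pair: abs(s - pair[1]))[0]


def classify_tea(x):
    """Walk the cascade for x = (X1, X2, X3, X4, X5, X6)."""
    if nearest("first", x) == "black_or_dark":
        # Centers for the black vs dark function are only shown in Fig. 3b.
        return "black or dark"
    if nearest("white", x) == "white":
        return "white"
    if nearest("oolong", x) == "oolong":
        return "oolong"
    return nearest("green_yellow", x)


# Hypothetical sample, for illustration only (contents assumed in % dry matter).
label = classify_tea((3.0, 13.0, 1.5, 2.25, 0.55, 4.5))
```

Ordering the steps from the easiest split (black and dark versus the rest) to the hardest (green versus yellow) mirrors the order in which the functions are presented in the text.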

Conclusions

In previous reports, tea was investigated for its chemical characteristics using LC-MS metabolomic analysis and chromatographic methods. An LC-MSn method was used to detect secondary metabolites in 10 Chinese dark tea varieties; after MS/MS fragmentation analysis, flavonoid triglycosides and catechin derivatives were found to be the distinctive compounds of fu teas (Zhang et al. 2014; Zhao et al. 2011; Yashin et al. 2015). In the present study, the main catechins, caffeine, and theanine were determined by the widely used HPLC method, in accordance with the generally harmonized ISO standards. The contents of catechins, theanine, and caffeine in tea samples were used as chemical descriptors to identify the categories of teas. Fisher analysis was applied as a pattern recognition tool to develop identification models. A novel stepwise strategy was explored to discriminate the six categories group by group, and the overall results demonstrated that stepwise chemical classification of tea is feasible. In contrast with the Bayes approach, the Fisher algorithm showed excellent identification performance. We conclude that Fisher pattern recognition can be applied to quantitatively identify the six tea categories based on their chemical components.