Introduction

Traditional Chinese medicines (TCMs) have been used in China for centuries and have gained increasing global recognition for their role in disease prevention and treatment [1, 2]. However, the diversity and potential for confusion surrounding TCM herbal materials present formidable challenges for quality control. Gaining a comprehensive understanding of TCM chemical composition is immensely valuable for ensuring quality and enabling natural product drug discovery [3]. Nevertheless, the highly complex and structurally similar components in TCMs, combined with scarce reference standards, pose great difficulties for systematic characterization.

Liquid chromatography-mass spectrometry (LC-MS) is now widely used to characterize the chemical composition of TCMs, providing chromatographic separation, accurate mass, and fragment information [4, 5]. Because TCMs are complex and contain many trace or isomeric components, enhanced separation is often required to fully resolve and identify chemical marker constituents. Researchers have improved separation using two-dimensional liquid chromatography (2D-LC), which couples two columns with different, independent separation mechanisms. On the first-dimension column, the sample undergoes concentration and fractionation before transfer to the second dimension for further separation. This improves the separation of components that cannot be completely resolved on the first column. Various column combinations can be selected in 2D-LC based on the specific separation objectives and sample characteristics [6, 7]. In addition, because high-resolution mass spectrometry (HRMS) data contain complex information and background ion interference, data mining techniques such as mass defect filtering (MDF) [8], neutral loss filtering [9, 10], and diagnostic ion filtering [11, 12] are extensively employed. These techniques eliminate a large number of interfering ions and thus simplify the characterization process.

Arnebiae Radix, known as “Zicao” in Chinese, has a long history of use for treating various skin conditions including eczema, dermatitis, and pruritus. It also promotes wound healing and is applied for trauma and burns, with the commercial prescription “Zicao Ointment” frequently used for skin abscesses and wound repair. Arnebiae Radix contains a diverse array of compounds including shikonins, shikonofurans, phenolic acids, flavonoids, and other constituents such as fatty acids and polysaccharides. Among them, shikonins and shikonofurans are the most abundant and are considered the major bioactive components. Owing to their exceptional anti-tumor activities, they have attracted significant research attention. Recent work by Professor Ma’s team [13] demonstrated the potential of shikonin as an IKKβ/NEMO complex inhibitor through molecular pharmacology, proteomics, and bioinformatics. By suppressing NF-κB, shikonin effectively inhibits colorectal cancer cell proliferation in vitro and in vivo; the binding site was also elucidated, highlighting its potential for small-molecule PPI inhibitor development. In the Chinese Pharmacopoeia (2020 edition), β,β-dimethylacrylalkannin is designated as the quantitative marker for the quality control of this herb. Shikonofurans exhibit activities similar to those of shikonins, including anti-inflammatory [14] and anti-tumor [15] properties. Phenolic acids are another major class in Arnebiae Radix, including monomers (e.g., salvianic acid A) and polymers (e.g., salvianolic acid B) that also occur in Salvia miltiorrhiza (Danshen) and are known for their anti-inflammatory [16] and antioxidant [17] activities. In summary, the diverse bioactive chemical constituents present in Arnebiae Radix underscore the importance of comprehensive investigations into its chemical makeup.

Arnebiae Radix, derived from the dried roots of Arnebia euchroma (Royle) Johnst. and Arnebia guttata Bunge, is officially recorded in the 2020 Chinese Pharmacopoeia. The former, known as “Ruanzicao (RZC),” is of the highest quality and is the most widely circulated, while the latter has limited resources and is rarely found. However, other species such as Lithospermum erythrorhizon Sieb. et Zucc. (“Yingzicao,” YZC), Onosma paniculatum Bur. et Franch. (“Dianzicao,” DZC), and Onosma hookeri Clarke. var. longiflorum Duthie (“Zangzicao,” ZZC) have similar properties and are relatively cheaper, leading to potential adulteration of authentic Arnebiae Radix. While some studies have explored its phytochemistry, research comparing Arnebiae Radix with its common substitutes is lacking, which hampers its quality control. Feng et al. [18] used UPLC-Q-Orbitrap MS with background subtraction, diagnostic ion filtering, and MDF to identify 96 compounds, mainly shikonins and shikonofurans, in the mixed extracts of Arnebiae Radix and YZC. Zhu et al. [19] characterized 188 compounds, primarily shikonins, shikonofurans, phenolic acids, and flavonoids, from RZC using UPLC-Q-Exactive Orbitrap MS. However, the differences between Arnebiae Radix and its adulterants remain unclear.

In this study, comprehensive characterization and comparison of the phytochemical components in Arnebiae Radix (RZC) and its three common adulterants (YZC, DZC, ZZC) were performed using an offline 2D-LC system to reveal their compositional differences and discover novel compounds (Fig. 1). Several strategies were implemented to enhance separation, improve identification reliability, and effectively profile the four species: (1) an offline 2D-LC system integrating HILIC and RP separations was established to achieve efficient separation and uncover more trace constituents; (2) a polygonal MDF generated precursor ion lists (PILs) to guide CID-MS2/HCD-MS3 acquisition on an LTQ-Orbitrap mass spectrometer and obtain more fragments; (3) a three-step characterization workflow involving formula confirmation, class assignment via diagnostic ions/neutral losses, and structural identification was used to ensure reliable compound identification; (4) systematic characterization and analysis of the chemical profiles of Arnebiae Radix and its adulterants were achieved using self-built/online databases and references. This study elucidated the compositions of Arnebiae Radix and its common adulterants while identifying potential differentiation markers, which is critical for its quality control. Additionally, new bioactive analogs were discovered, providing inspiration for structural modification.

Fig. 1 A workflow for multicomponent characterization of Arnebiae Radix (RZC) and its three confusing species

Materials and methods

Material and reagents

A total of 17 compounds were used as reference compounds, including the following: (1) 8 shikonins: shikonin, deoxyshikonin, acetylshikonin, β,β′-dimethylacrylshikonin, isobutyrylshikonin, β-hydroxyisovalerylshikonin, β-acetoxyisovalerylalkannin, and isovalerylshikonin; (2) 1 shikonofuran: shikonofuran A; (3) 4 phenolic acids: sodium danshensu, rosmarinic acid, lithospermic acid, and salvianolic acid B; and (4) 4 flavonoids: quercetin, rutin, naringenin, and hesperidin. These reference compounds were purchased from Weikeqi Biological, Chengdu Must Biological, or Shanghai Standard Technology Co. Ltd. with purity ≥98%. The structures are depicted in Figure S1.

A total of 28 batches of dried samples (10 batches of RZC, 10 batches of YZC, 7 batches of DZC, 1 batch of ZZC) were obtained through professional field collection or purchased from herbal markets. Identification was carefully performed by Mr. Shuai Yao, a senior investigator in medicinal plant identification at institution, based on characteristics listed in the Flora of China (www.iplant.cn/frps), Chinese Pharmacopoeia, and provincial standards, along with HPLC fingerprinting of reference crude drugs. Voucher specimens were deposited at the Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China.

LC-MS grade acetonitrile (ACN) and methanol (MeOH) were acquired from Ourchem, Sinopharm Chemical Reagent Co. Ltd. (Beijing, China). LC-MS grade formic acid was obtained from TCI (Shanghai), and ultrapure water for mobile phase preparation was purified using a Milli-Q Integral system (Merck Millipore, USA). Analytical grade methanol used as the extraction solvent was also acquired from Sinopharm Chemical Reagent Co. Ltd. (Beijing, China).

Sample preparation

Powdered samples of the four species (RZC, YZC, DZC, ZZC) were prepared by grinding and sieving through a No. 65 mesh. For each crude sample powder (1 g), ultrasonic extraction (300 W, 40 kHz) was performed with 10 mL of 100% MeOH for 45 min at room temperature. The resulting supernatant was obtained by centrifugation at 4000 rpm for 10 min. Multi-batch mixtures were concentrated fivefold and used as the test solutions for 1D-HILIC separation.

Each 1D fraction was dried under nitrogen and reconstituted in 100% MeOH. After centrifugation at 14,000 rpm for 10 min, the supernatant was used as the test solution for 2D-RP liquid chromatography.

The QC sample was created by combining 0.3 mL of the test solutions of the four species before concentration. This sample was then utilized to establish the 2D-LC system and assess its orthogonality.

Offline 2D-LC/LTQ-Orbitrap MS

An offline 2D-LC system was established by combining HILIC and reversed-phase ultra-high performance liquid chromatography (RP-UHPLC). The 1D-HILIC separation was performed on an Agilent 1260 HPLC system fitted with a Waters XBridge Amide column (4.6 × 150 mm, 3.5 μm) maintained at 30°C. The mobile phase consisted of 0.1% formic acid (A) and ACN (B), with the following gradient program: 0–4 min: 95% B; 4–20 min: 95–85% B; 20–25 min: 85–65% B; 25–40 min: 65–50% B; 40–50 min: 50% B. The flow rate was 0.6 mL/min and the injection volume was 20 μL. To retain the separation achieved in 1D-HILIC and avoid peak splitting, a peak-dependent fraction collection method was used, yielding 12 fractions for each species (Fig. S6). The solvents were evaporated under a continuous flow of nitrogen at 25°C, and the residues were reconstituted for 2D analysis. The 2D-RP separation was performed on a Thermo Ultimate® 3000 UHPLC system with a BEH C18 column (2.1 × 100 mm, 1.7 μm) maintained at 30°C. The mobile phase consisted of 0.03% formic acid (A) and ACN (B), with the following gradient program: 0–4 min: 5–32% B; 4–6 min: 32–50% B; 6–11 min: 50–52% B; 11–16 min: 52–68% B; 16–25 min: 68–83% B; 25–27 min: 83–95% B. The flow rate was 0.3 mL/min, and the injection volume was varied between 2 and 8 μL to achieve a suitable MS response.

HRMS data were acquired in negative ionization mode on a Thermo Fisher Scientific LTQ-Orbitrap Velos Pro hybrid mass spectrometer equipped with a heated electrospray ionization (ESI) source. In-depth chemical characterization of RZC, YZC, DZC, and ZZC was performed using a full mass scan/CID-MS2/HCD-MS3 method with a PIL. The Orbitrap analyzer scanned an m/z range of 100–1200 with a resolution of 30,000 for the full MS1 scan and 7500 for the MS2 and MS3 scans. Ions with intensities greater than 5000 in the full mass spectrum and with m/z values included in the PIL were selected to trigger CID-MS2. Specific PILs were set for the four species, with detailed information provided in the following subsection. The most intense ion in the MS2 spectrum was selected for HCD-MS3 fragmentation. Normalized collision energies were set at 55% for CID-MS2 and 85% for HCD-MS3. The parameters of the ESI source and dynamic exclusion were set as described in our previous work [6].

Orthogonality evaluation of HILIC×RP

Orthogonality was used to evaluate the selectivity and applicability of the constructed offline two-dimensional system. It can be calculated in various ways, with the asterisk equations being the most widely used [20]. The orthogonality of the established 2D-LC system was evaluated using 55 index compounds. The normalized retention time (tR,norm(i)) of each index compound was calculated in the first and second dimensions according to Eq. 1, taking the normalized retention time on the XBridge Amide column as the horizontal coordinate and that on the BEH C18 column as the vertical coordinate. This dispersed the index compounds across the two-dimensional t1,norm-t2,norm space. The four lines Z_, Z+, Z1, and Z2 divide the two-dimensional distribution space into different regions, and the spread of the 55 index compounds around these lines is represented by SZ_, SZ+, SZ1, and SZ2 (Eqs. 2–5). The parameters Z_, Z+, Z1, and Z2 describe semi-quantitatively the distribution of compounds around the four lines, with larger Z values indicating a more even distribution around that line (Eqs. 6–9). Finally, the orthogonality A0 of the 2D-LC system was obtained via Eq. 10.

$$t_{R,\mathrm{norm}(i)}=\frac{t_{R(i)}-t_D}{t_G-t_D}$$
(1)
$$S_{Z_{-}}=\sigma\left\{{}^{1}t_{R,\mathrm{norm}(i)}-{}^{2}t_{R,\mathrm{norm}(i)}\right\}$$
(2)
$$S_{Z_{+}}=\sigma\left\{{}^{1}t_{R,\mathrm{norm}(i)}-\left(1-{}^{2}t_{R,\mathrm{norm}(i)}\right)\right\}$$
(3)
$$S_{Z_1}=\sigma\left\{{}^{1}t_{R,\mathrm{norm}(i)}-0.5\right\}$$
(4)
$$S_{Z_2}=\sigma\left\{{}^{2}t_{R,\mathrm{norm}(i)}-0.5\right\}$$
(5)
$$Z_{-}=1-\left|2.5\left|S_{Z_{-}}-0.4\right|\right|$$
(6)
$$Z_{+}=1-\left|2.5\left|S_{Z_{+}}-0.4\right|\right|$$
(7)
$$Z_1=1-\left|2.5\cdot S_{Z_1}\cdot\sqrt{2}-1\right|$$
(8)
$$Z_2=1-\left|2.5\cdot S_{Z_2}\cdot\sqrt{2}-1\right|$$
(9)
$$A_0=\sqrt{Z_{-}\cdot Z_{+}\cdot Z_1\cdot Z_2}$$
(10)
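
The calculation in Eqs. 1–10 can be sketched in a few lines of Python. This is a minimal illustration rather than the software used in this work: the function names are hypothetical, σ is assumed to be the population standard deviation, and clamping each Z value at zero is a guard added here that is not part of the equations.

```python
from math import sqrt
from statistics import pstdev


def normalize(t_r, t_d, t_g):
    """Eq. 1: normalized retention time from t_R(i), t_D and t_G."""
    return (t_r - t_d) / (t_g - t_d)


def overall_orthogonality(z_minus, z_plus, z_1, z_2):
    """Eq. 10: combine the four Z parameters into A_0."""
    return sqrt(z_minus * z_plus * z_1 * z_2)


def asterisk(t1_norm, t2_norm):
    """Eqs. 2-10 for paired normalized retention times (one pair per index compound)."""
    # Spread of the points around each of the four lines (Eqs. 2-5).
    # Assumption: sigma is the population standard deviation.
    s_minus = pstdev([a - b for a, b in zip(t1_norm, t2_norm)])
    s_plus = pstdev([a - (1 - b) for a, b in zip(t1_norm, t2_norm)])
    s_1 = pstdev([a - 0.5 for a in t1_norm])
    s_2 = pstdev([b - 0.5 for b in t2_norm])

    # Z parameters (Eqs. 6-9). The max(0.0, ...) clamp is an added guard,
    # not part of the published equations.
    z_minus = max(0.0, 1 - abs(2.5 * abs(s_minus - 0.4)))
    z_plus = max(0.0, 1 - abs(2.5 * abs(s_plus - 0.4)))
    z_1 = max(0.0, 1 - abs(2.5 * s_1 * sqrt(2) - 1))
    z_2 = max(0.0, 1 - abs(2.5 * s_2 * sqrt(2) - 1))
    a_0 = overall_orthogonality(z_minus, z_plus, z_1, z_2)
    return z_minus, z_plus, z_1, z_2, a_0
```

With this sketch, a fully correlated pair of dimensions (identical normalized retention times in both) gives a zero spread around the Z_ line and therefore an orthogonality of zero, which is the expected limiting behavior.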

An MDF algorithm for PIL

Mass defect is the difference between a compound’s exact mass and its nearest integer mass. MDF exploits the known exact masses and mass defect ranges of compounds with specific structures and their analogs. In most cases, MDF is used to filter HRMS data after acquisition: complex spectral information is simplified to retain only data pertaining to target compounds, improving the efficiency and reliability of interpretation. Several MDF approaches have been developed and matured, such as conventional MDF [21], multiple MDF [22], linear gradient MDF [23], and stepwise MDF [24]. However, with post-acquisition filtering, multistage mass (MSn) information may never have been recorded for some target ions because their response was too low to trigger fragmentation. Recently, several studies have applied MDF before data acquisition, using it to generate a PIL that guides the collection of MSn data, as in compound-specific MDF [25] and polygonal MDF [26].

Here, we applied a polygonal MDF to obtain the PILs for triggering multistage fragmentation of more potential target ions. By searching online databases for chemical constituents reported from the four species, we constructed a self-built database covering 136 compounds. Xcalibur 2.1 software was used to calculate the exact [M-H]− m/z value of each compound in the database, from which the integer mass and mass defect were obtained. These two parameters were used to generate a 2D scatter plot of all database compounds, and the polygonal MDF window was then defined as Excel functions according to this distribution. UHPLC/LTQ-Orbitrap MS provided full-scan mass spectra of the four species, and the ion list was obtained by QI extraction. The extracted ion list was first filtered using the MDF window, and ions with even integer mass were then excluded to generate the final set of target precursor ions.
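
The filtering steps described above can be illustrated with a short Python sketch. It is not the Excel implementation used in this work: the function names and the polygon vertices in the usage note are hypothetical, the integer mass is taken as the nearest integer (following the definition of mass defect given earlier), and the point-in-polygon test uses standard ray casting.

```python
import math


def point_in_polygon(x, y, vertices):
    """Ray-casting test: is (x, y) inside the polygon given as (x, y) vertices?"""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_pil(mz_values, polygon):
    """Filter an ion list into a precursor ion list (PIL)."""
    kept = set()
    for mz in mz_values:
        nominal = math.floor(mz + 0.5)   # nearest integer mass
        defect = mz - nominal            # mass defect
        if not point_in_polygon(nominal, defect, polygon):
            continue                     # outside the MDF window
        if nominal % 2 == 0:
            continue                     # exclude even integer masses
        kept.add(round(mz, 2))           # PIL keeps two decimals; duplicates merge
    return sorted(kept)
```

For example, with an illustrative window `[(100, 0.00), (1200, 0.10), (1200, 0.45), (100, 0.15)]`, the ions 343.1188 and 343.1190 collapse into a single entry, an ion at 344.1200 is dropped for its even integer mass, and an ion at 500.9000 falls outside the window.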

Results and discussion

Establishment of an offline 2D-LC/HRMS system

To construct an efficient 2D-LC system, it is necessary to optimize the parameters for both the 1D and 2D separations, based on QC samples. Generally, the 2D column is chosen first, followed by the determination of 1D column, with different selection principles for each. When selecting a 2D column, the main concern is the separation effectiveness, including factors such as the degree of separation, analysis time, and information richness it provides. On the other hand, when choosing a 1D column, more attention is given to its separation difference from the 2D column, referred to as orthogonality. Additionally, the separation capacity and sample load of the 1D column are also considered.

Hence, six columns, namely BEH C18, BEH Phenyl, HSS T3, HSS Cyano, SB-AQ, and Kinetex F5 (Table S1), were initially evaluated to obtain satisfactory 2D separation. The BEH C18 column provided better separation and peak shape than the other columns and was selected as the 2D column for the separation of the four species (Fig. 2a). Key chromatographic conditions were then optimized to achieve the best 2D separation: ACN-0.03% formic acid was used as the mobile phase (Fig. S2), and the column temperature was set at 30°C (Fig. S3). Additionally, the HSS Cyano column exhibited very different selectivity from the BEH C18 column, making it a candidate column for 1D separation.

Fig. 2 Selection of the stationary phases for the 2D-LC system. a Influence of different chromatographic columns on 2D-RP separation of a QC sample. b Comparison of the base peak chromatograms of a QC sample of six candidate columns for 1D separation. c The scatter plots of six candidate columns for 1D separation versus BEH C18 using 16 index compounds

The 1D column was chosen according to its selectivity difference from the 2D BEH C18 column, which was assessed by calculating the linear regression correlation coefficient (R2) of the relative retention times of 16 index compounds (Table S2). Six chromatographic columns with different separation mechanisms, namely HSS Cyano, XBridge Amide, BEH Amide, Accucore HILIC, BEH HILIC, and Accucore-150-Amide-HILIC (Table S1), were examined. The XBridge Amide, BEH Amide, and Accucore-150-Amide-HILIC columns showed relatively good separation and large selectivity differences (R2 < 0.05). Ultimately, the XBridge Amide column was chosen for 1D due to its higher loading capacity (Fig. 2b, c). The optimal conditions were determined to be ACN-0.1% formic acid as the mobile phase and a column temperature of 30°C (Fig. S4, S5).
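
The R2 criterion can be sketched as follows. This is an illustrative calculation only: the function name is hypothetical, and R2 is computed as the squared Pearson correlation, which equals the R2 of a simple linear regression of one set of relative retention times on the other.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)
```

A value close to 1 means the candidate column retains the index compounds in nearly the same order and spacing as BEH C18 (poor complementarity), whereas a value close to 0, such as the R2 < 0.05 reported here for the amide columns, indicates a large selectivity difference.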

To assess the orthogonality of the 2D-LC system, a total of 55 compounds, including 17 reference standards, were chosen as index components (Table S3). Four parameters, Z_, Z+, Z1, and Z2, were calculated to measure the coverage of the separation space, giving values of 0.88, 0.78, 0.95, and 0.87, respectively. The overall orthogonality (A0) was calculated to be 0.75, as shown in Fig. S9. In summary, the constructed 2D system exhibited good orthogonality and could enable detection of more trace components for in-depth characterization of the four species.
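
As a quick consistency check of Eq. 10, using only the four rounded Z values reported above:

$$A_0=\sqrt{0.88\times 0.78\times 0.95\times 0.87}=\sqrt{0.5673}\approx 0.75$$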

Development of an MDF-PIL-CID-MS2/HCD-MS3 method for target acquisition

The polygonal MDF was created in three steps. First, a self-built database of 136 components reported from the four species was created, comprising shikonins (66), shikonofurans (19), phenolic acids (30), and flavonoids (21). Second, the exact [M-H]− m/z values of these 136 database components were calculated in Xcalibur software. Third, the polygonal window was constructed based on the known integer masses and the corresponding mass-dependent mass defect ranges. Figure 3a illustrates the distribution of the 136 compounds on the two-dimensional scatter plot, showing a well-defined cluster that encompasses over 99% of the components in the database. Moreover, a ±20 ppm variation range was applied to the MDF window endpoints, ensuring that ions shifted by systematic variation, as well as potential analogs, were still captured (Fig. 3b).
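
One plausible reading of the ±20 ppm variation range is sketched below: at a given integer mass, the allowed mass defect interval is widened on both sides by the mass tolerance corresponding to 20 ppm. The function name and this interpretation are assumptions made for illustration, not a description of the exact procedure used.

```python
def widen_defect_range(nominal_mass, defect_low, defect_high, ppm=20.0):
    """Widen a mass defect interval by a ppm tolerance at the given nominal mass."""
    tolerance = nominal_mass * ppm * 1e-6  # ppm converted to Da at this mass
    return defect_low - tolerance, defect_high + tolerance
```

At an integer mass of 500, for instance, 20 ppm corresponds to 0.01 Da, so an illustrative interval of 0.05–0.25 would become 0.04–0.26.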

Fig. 3 Development of a polygonal MDF method by plotting integer mass (horizontal axis) vs mass defect (vertical axis) for 136 compounds from the four species (a). The ±20 ppm variation range for MDF window endpoints (b). Application of the MDF on RZC (c), YZC (d), DZC (e), and ZZC (f) to obtain precursor ion lists

This MDF method can effectively remove false positive ions, as exemplified by RZC. Applying the MDF directly to the 4453 precursor ions in the full-scan ion list extracted by QI removed 1496 ions, i.e., 33.6% of the ions were eliminated as false positives (Fig. 3c). Similarly, 45.3%, 44.9%, and 46.2% of the ions were filtered out as false positives for the other three species, leaving 1685, 3270, and 3468 ions, respectively, for further processing to obtain the final PILs (Fig. 3d, e, f). Subsequently, ions with even integer mass were removed to obtain the target ions. For RZC, this step filtered out 467 false positives, leaving 2490 target ions. Since the final PIL only retained m/z values to two decimal places, these 2490 ions were de-duplicated, yielding a final list of 794 precursor ions. The other three species underwent the same steps, generating PILs containing 508, 1117, and 1178 ions, respectively. Finally, the m/z values in the PILs were imported into the Xcalibur software to trigger MS2 and MS3 acquisition.
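
For RZC, the reported counts are consistent with each other, as the following arithmetic shows (the final reduction from 2490 to 794 ions results from merging m/z values at two decimal places and cannot be derived from the counts alone):

$$4453-1496=2957,\qquad \frac{1496}{4453}\approx 33.6\%,\qquad 2957-467=2490$$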

The LTQ-Orbitrap mass spectrometer provides both CID and HCD fragmentation modes. CID is a relatively soft, lower-energy fragmentation mode that readily produces stable characteristic fragment ions. In contrast, HCD is a harder, higher-energy fragmentation mode. Compared to CID, HCD more readily generates low-mass fragments but does not support further fragmentation. Used together, these complementary modes provide richer fragment ion information and enable more accurate identification.

Twelve reference compounds representing the four main compound classes in Arnebiae Radix (shikonins, shikonofurans, phenolic acids, flavonoids) were used to optimize the CID energy. The goal was to obtain the characteristic fragment ions essential for compound classification. For instance, McLafferty rearrangement of type I shikonins and type I shikonofurans produces diagnostic ions at m/z 269 and 255, respectively. The CID energy was optimized from 35 to 65%, with the optimal value selected based on the response intensity of the characteristic ions. At a normalized collision energy of 55%, most components showed maximal intensities for key diagnostic fragments (Fig. S7). Thus, 55% was set for CID. For MS3, the HCD energy was optimized from 55 to 85% using 4 representative reference compounds, with the optimal value selected based on fragment ion abundance. As the energy increased, progressively more fragments were produced for all compounds (Fig. S8). Thus, an HCD energy of 85% was set to generate the maximum number of fragments in MS3.

Construction of a three-step interpretation workflow for identification

The four species contain four main compound classes: shikonins, shikonofurans, phenolic acids, and flavonoids. The MS behavior of representative standards from each class was studied under CID-MS2/HCD-MS3 to obtain diagnostic product ions (DPIs) and neutral losses (NLs) for identifying unknown components. CID readily produces diagnostic ions and neutral losses for compound typing, whereas HCD generates more fragments, aiding final identification. For product ions of compounds lacking reference standards, the literature [18, 19] was consulted to elucidate fragmentation patterns.

Shikonins are classified into five structural types (I–V) [27]. Types I–III frequently undergo McLafferty rearrangement, producing diagnostic ions in MS2 spectra, while types IV–V preferentially undergo α-cleavage. Fragmentation patterns of type I shikonins were studied under CID and HCD using six standards listed in the “Material and reagents” section. As shown in Fig. 4a, type I shikonins produced a dominant peak at m/z 269.08 via McLafferty rearrangement under CID. HCD showed neutral losses of H2O and CO due to consecutive elimination of hydroxyl and carbonyl groups on the parent nucleus. For the other types, compounds in the samples were used to determine fragmentation patterns. Compared to type I, type II shikonin has the C1-hydroxyl replaced by a methoxy group. Thus, CID showed a characteristic ion at m/z 283.10, 14.02 Da higher than that of type I. With HCD, type II more readily eliminated CO, producing abundant product ions at m/z 255.10 and 227.11 from single and sequential loss of CO, followed by further cleavage of the C4H7 side chain, evidenced by the ion at m/z 172.05 (Fig. 4b). Type III shikonin is a polymer of deoxyshikonin and type I shikonin. CID produced a diagnostic ion at m/z 539.17, while MS3 showed a prominent neutral loss of H2O, resulting in an ion at m/z 521.16. Meanwhile, an abundant product ion at m/z 251.07 was also observed, matching the dominant peak of type I shikonin in MS3 (Fig. 4c). Type IV and V shikonins share similar structures, differing only in whether the C1 or C4 position bears a hydroxyl or a methoxy group. Under CID-MS2, the DPI of type V shikonin (m/z 299.09) was 14.02 Da higher than that of type IV (m/z 285.08). Similar neutral losses of H2O and CO occurred in MS3, following analogous fragmentation patterns (Fig. 4d, e). In summary, DPIs for type I–V shikonins were observed at m/z 269.08, 283.10, 539.17, 285.08, and 299.09, respectively. These DPIs served as base peaks in the CID-MS2 spectra on the LTQ-Orbitrap mass spectrometer, and owing to carbonyl and hydroxyl substitutions on the phenyl ring, neutral losses of H2O and CO frequently occurred in the HCD-MS3 spectra.

Fig. 4 Annotation of the CID-MS2 and HCD-MS3 spectra of the four compound classes in the four species (a–e type I–type V shikonin; f, g type I–type II shikonofuran; h phenolic acid; i flavonoid)

Shikonofurans are classified into two types based on structure: shikonofuran (type I) and hydroxyshikonofuran (type II) [27], with the latter containing an additional C14 hydroxyl. As exemplified by shikonofuran A, type I shikonofurans undergo McLafferty rearrangement in CID-MS2 to produce a DPI at m/z 255.10. HCD-MS3 showed neutral loss of CO from furan ring cleavage and subsequent loss of the C4H7 side chain, giving a characteristic ion at m/z 172.05. Neutral loss of H2O also occurred due to the C1 hydroxyl (Fig. 4f). Since type II shikonofuran references were unavailable, compounds in the samples were used to deduce fragmentation patterns by comparison with the literature. The CID-MS2 diagnostic ion was at m/z 273.11, 18.01 Da higher than that of type I due to the extra hydroxyl. Unlike type I, fewer MS3 fragments were observed, mainly at m/z 255.10 and 237.09 from H2O loss, owing to the additional hydroxyl (Fig. 4g).

The four species also contain abundant phenolic acids, including monomers such as salvianic acid A and caffeic acid, along with polymers such as lithospermic acid and salvianolic acid B. These structures frequently bear carboxyl, hydroxyl, and carbonyl groups, producing neutral losses of CO2, H2O, and CO in MSn [28, 29]. In addition, polymers can undergo neutral loss of salvianic acid A, caffeic acid, or caffeoyl, corresponding to losses of 198.05 Da, 180.04 Da, and 162.03 Da, respectively [25, 30]. For instance, the MS1 spectrum of lithospermic acid showed abundant in-source fragment ions at m/z 493.11 from neutral loss of CO2. The most characteristic MS2 fragmentation pathway was the preferential elimination of a caffeic acid or a salvianic acid A moiety, generating product ions at m/z 313.07 and 295.06, respectively. In the MS3 spectrum, the substituent at the 2-position of benzofuran was lost, yielding the most abundant ion, catechol at m/z 109.03, along with the fragment ion at m/z 185.02 (Fig. 4h).

Flavonoids tend to undergo a characteristic fragmentation pathway, the Retro-Diels-Alder (RDA) reaction [31], which cleaves ring C and produces distinctive RDA fragments (e.g., m/z 179.00 for 1,2A and m/z 151.00 for 1,3A), along with some neutral losses (e.g., 43.99 Da for CO2, 27.99 Da for CO, 15.02 Da for CH3, and 18.01 Da for H2O). For flavonoid glycosides, loss of sugar moieties is also observed in MSn, including 162.05 Da for glucose (glc), 146.06 Da for rhamnose (rha), and 308.11 Da for rutinoside [25]. In the case of rutin, a predominant DPI at m/z 301.03 arises from cleavage of the glycosidic bond with loss of a rutinoside moiety. Moreover, the abundant RDA reaction products in MS3 yield the base peak 1,3A at m/z 151.00, along with 1,2B (m/z 121.03), 0,2A (m/z 163.00), and 1,2A (m/z 179.00). Some neutral losses like CO and CO2 also occur, generating ions at m/z 271.02 and 255.03 (Fig. 4i).
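
Neutral-loss assignment of this kind can be sketched as a simple lookup on the m/z difference between a precursor and a product ion. The sketch below is illustrative only: the function name and the 0.01 Da tolerance are hypothetical, and monoisotopic loss values are assumed throughout (198.05 Da for salvianic acid A and 180.04 Da for caffeic acid, which reproduce the lithospermic acid transitions m/z 493.11 → 295.06 and 493.11 → 313.07 described above).

```python
# Monoisotopic neutral losses in Da (assumed values, two decimals).
NEUTRAL_LOSSES = {
    "CH3": 15.02,
    "H2O": 18.01,
    "CO": 27.99,
    "CO2": 43.99,
    "rhamnose": 146.06,
    "caffeoyl": 162.03,
    "glucose": 162.05,
    "caffeic acid": 180.04,
    "salvianic acid A": 198.05,
    "rutinoside": 308.11,
}


def annotate_loss(precursor_mz, product_mz, tol=0.01):
    """Return the names of all neutral losses matching the m/z difference."""
    delta = precursor_mz - product_mz
    return [name for name, mass in NEUTRAL_LOSSES.items()
            if abs(delta - mass) <= tol]
```

Because the caffeoyl (162.03 Da) and glucose (162.05 Da) losses differ by only 0.02 Da, the function returns every candidate within tolerance rather than a single best match, leaving the final choice to the analyst.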

Based on these fragmentation patterns, we propose a three-step analytical strategy for faster and more precise characterization, as depicted in Fig. 5.

Fig. 5 A three-step characterization strategy including neutral losses and diagnostic product ions useful for the structural elucidation
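
The class-assignment step of this strategy can be illustrated by matching the CID-MS2 base peak against the DPIs summarized above for shikonins and shikonofurans. This is a minimal sketch: the function name and the 0.01 Da tolerance are hypothetical, and only the DPI values stated in the text are used.

```python
# Diagnostic product ions (m/z) observed as CID-MS2 base peaks.
DIAGNOSTIC_IONS = {
    "type I shikonin": 269.08,
    "type II shikonin": 283.10,
    "type III shikonin": 539.17,
    "type IV shikonin": 285.08,
    "type V shikonin": 299.09,
    "type I shikonofuran": 255.10,
    "type II shikonofuran": 273.11,
}


def classify_by_dpi(base_peak_mz, tol=0.01):
    """Return the structural types whose DPI matches the MS2 base peak."""
    return [name for name, mz in DIAGNOSTIC_IONS.items()
            if abs(base_peak_mz - mz) <= tol]
```

A base peak that matches none of the listed DPIs, such as m/z 271.10, returns an empty list and would be flagged for manual interpretation as a possible new structural type.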

In-depth characterization of the multicomponents in the four species

The chemical constituents of the four species were characterized using offline 2D-LC/HRMS, along with the proposed characterization strategy, the literature, and various online databases. In total, 437 components were characterized or tentatively identified across the four species, including 112 shikonins, 22 shikonofurans, 144 phenolic acids, 131 glycosides, 18 flavonoids, and 10 other compounds.

  • Shikonins

We were able to characterize 112 shikonins (25.6%), the most characteristic components in these species. Among these, the five common shikonin types were all detected, including 10 type I, 2 type II, 23 type III, 27 type IV, and 14 type V shikonins, along with 36 other shikonins. Notably, within other shikonins, 13 previously unreported shikonins were identified.

Unknown compounds 390 (Rt =17.55 min, C19H20O6) and 437 (Rt =26.57 min, C36H36O10) were used to demonstrate the identification process for type I–V shikonins. They produced abundant [M-H]− precursor ions at m/z 343.1188 and 627.2236, respectively, in MS1 spectra. The MS2 spectra exhibited dominant DPIs at m/z 269.0817 and 539.1711, suggesting type I and type III shikonins, respectively. This was further supported by consecutive neutral losses of H2O and CO in the MS3 spectra. Compound 390 followed fragmentation pathways similar to those of acetylshikonin (C18H18O6) but contained one additional CH2 in its molecular formula; hence, it was tentatively identified as propionylshikonin (Fig. S10a). Based on its molecular weight, compound 437 was inferred to be a polymer of deoxyshikonin and isobutyrylshikonin, and thus identified as 7-(11′-deoxyalkannin)-isobutyrylshikonin/alkannin or its isomer (Fig. S10b).
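
The formula-confirmation step can be sketched by computing the theoretical [M-H]− m/z of a candidate formula and its ppm deviation from the measured value. The function names below are hypothetical, only C, H, and O are supported, and the monoisotopic masses are standard values rounded to eight decimals.

```python
import re

# Monoisotopic atomic masses (Da) and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727646


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass of a formula such as 'C19H20O6'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def deprotonated_mz(formula):
    """Theoretical m/z of the singly charged [M-H]- ion."""
    return monoisotopic_mass(formula) - PROTON


def ppm_error(observed, theoretical):
    """Mass error of an observed m/z relative to the theoretical value, in ppm."""
    return (observed - theoretical) / theoretical * 1e6
```

For compound 390, the theoretical [M-H]− of C19H20O6 agrees with the measured m/z 343.1188 to well within 5 ppm, and the difference from acetylshikonin (C18H18O6) corresponds to one CH2 unit (14.0157 Da).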

Other shikonins such as shikonin, dehydrated shikonin, deoxyshikonin isomer, and shikometabolin A or B were also identified across the four species. Additionally, some potential novel shikonins were discovered, for instance, compound 305 (Rt =8.10 min, C21H24O6; Fig. S11a) and compound 282 (Rt =6.71 min, C18H20O6), which gave abundant precursor ions at m/z 371.1449 and 331.0882, respectively. In the MS2 spectra of these two compounds, the base peak fragment ion at m/z 271.10 was presumed to arise from McLafferty rearrangement. Compared to the MS2 DPI at m/z 269.08 of type I shikonins, it contained two additional hydrogens, potentially due to the absence of a carbon-carbon double bond at positions 13 and 14. Compound 305 was tentatively identified as arnebin II according to the PubChem online database. However, compound 282 had no database records and was tentatively characterized as an arnebin derivative.

Another potential novel shikonin type, with a DPI at m/z 303.09, contained one additional H2O compared with type IV (m/z 285.08). While both showed similar neutral losses of H2O and CO in MS3, distinct side chain losses of C3H4 and C5H7 also occurred, generating abundant fragments at m/z 245.05, 227.04, 218.02, and 190.03. Figure S11b shows the possible cleavage mode for this type of shikonin, with compound 309 (Rt =8.38 min, m/z 403.1399) as an example. It was ultimately identified as hydroxylithospermidin F or its isomer.

  • Shikonofurans

A total of 22 shikonofurans, the second class of characteristic compounds in these species, were characterized, including 8 type I and 14 type II shikonofurans. The structural elucidation of compounds 338 (Rt = 10.43 min, C19H21O5) and 364 (Rt = 10.43 min, C21H28O6), which gave deprotonated precursor ions at m/z 329.1387 and 375.1813, is shown in Fig. S12. Their MS2 spectra displayed dominant DPIs at m/z 255.10 and 273.11, which rapidly classified them as type I and type II shikonofurans, respectively. They were tentatively identified as shikonofuran F and hydroxyshikonofuran L, respectively, by matching their molecular formulas to previously published papers [18, 19].
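The DPI-based classification described for shikonins and shikonofurans amounts to a lookup of MS2 fragment ions against a short list of diagnostic m/z values. The sketch below illustrates this under stated assumptions: the table contains only the DPIs quoted in this section (the DPIs of type II and type V shikonins are not given here and are therefore omitted), and the function name and the 0.01 Da tolerance are hypothetical choices, not parameters from the study.

```python
# Diagnostic product ions (DPIs) quoted in the text, in m/z.
# Type II and type V shikonin DPIs are not listed in this section and are left out.
DPI_TABLE = [
    (269.08, "type I shikonin"),
    (539.17, "type III shikonin"),
    (285.08, "type IV shikonin"),
    (255.10, "type I shikonofuran"),
    (273.11, "type II shikonofuran"),
]


def classify_by_dpi(fragment_mzs, tol=0.01):
    """Return the classes whose DPI matches any MS2 fragment within tol (Da)."""
    hits = []
    for mz in fragment_mzs:
        for dpi, label in DPI_TABLE:
            if abs(mz - dpi) <= tol and label not in hits:
                hits.append(label)
    return hits


# Dominant MS2 ions of compounds 390 and 437 as reported in the text.
print(classify_by_dpi([269.0817]))  # compound 390
print(classify_by_dpi([539.1711]))  # compound 437
```

A fragment such as m/z 271.10 matches none of the listed DPIs, which mirrors how ions of this kind were flagged as belonging to potentially novel shikonin types rather than to a known class.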

  • Phenolic acids

Phenolic acids are widely distributed in plants and are abundant in the four species, which contain characteristic components such as lithospermic acid. We identified or tentatively characterized 144 phenolic acids in the four species. Phenolic acid polymers tend to show characteristic structural features compared with monomers. Compound 262, which was identified as a phenolic acid polymer, displayed an [M-H]− ion at m/z 701.1515 (C36H29O15) in the MS1 spectrum, indicating one oxygen atom fewer than salvianolic acid B ([M-H]−, C36H29O16). The MS2 and MS3 spectra revealed predominant fragment ions at m/z 519.0934 and 321.0403, generated by the successive losses of deoxy-salvianic acid A (C9H10O4) and salvianic acid A, respectively (Fig. S13a). By comparison with literature data [30], compound 262 was proposed to be 3′′′-deoxy-salvianolic acid B or an isomer.

Phenolic acids can be linked to sugars, such as Glc or xylose (Xyl), through glycosidic bonds, forming phenolic glycosides. Compound 152, with a retention time of 4.77 min, serves as an illustrative example. Its molecular formula, C14H16O8, was determined from the accurate m/z of its [M-H]− peak at 311.0767 in the MS1 spectrum. The MS2 spectrum showed two product ions, at m/z 179.0360 and 149.0455, which were assigned as [M-H-Xyl]− and [M-H-caffeoyl]−, respectively (Fig. S13b). Based on this information, compound 152 was tentatively identified as 1-O-caffeoyl xylose according to the PubChem database.

  • Flavonoids

The number of flavonoids identified in the four species was limited, with only 18 such compounds detected. Compound 197 (Rt = 5.17 min, m/z 623.1602) lost a rutinoside moiety in the MS2 spectrum to produce a fragment ion at m/z 315.0513. Despite their low abundance, some RDA fragment ions were still observable in the MS3 spectrum (m/z 151.0029 for 1,3A and 163.0032 for 0,2A). Notably, the most abundant ion in the MS3 spectrum (m/z 300.0276) was generated by the loss of CH3, indicating a potential methoxy substituent, which was further supported by the ion at m/z 255.0298 (Fig. S14a). We hypothesize that methoxylation of flavonoids may hinder the RDA reaction. Ultimately, we putatively identified this compound as isorhamnetin-3-O-rutinoside according to the PubChem database.

  • Glycosides

Interestingly, we detected a large group of glycosides of specific types, 131 in total, that had not previously been reported in the literature. The MSn fragmentation of these glycosides is characterized by the loss of sugars: 162.05 Da for Glc, 132.04 Da for Xyl, and 308.11 Da for rutinoside. For the unknown compound 117 (Rt = 4.41 min), a prominent precursor ion at m/z 443.1552 was observed in the MS1 spectrum, followed by a neutral loss of Glc in the MS2 spectrum and the loss of a second Glc in the MS3 spectrum, accompanied by some Glc fragments (m/z 113.02 and 101.93) (Fig. S14b). Ultimately, compound 117 was tentatively identified as C8H8O-Glc-Glc.
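The sugar losses listed above lend themselves to a simple neutral loss check between a precursor ion and a product ion. The following sketch is illustrative only: the loss masses are the two-decimal values given in the text, while the function name and the 0.01 Da tolerance are assumptions made for this example.

```python
# Neutral losses of sugar units (Da), as listed in the text.
SUGAR_LOSSES = {"Glc": 162.05, "Xyl": 132.04, "rutinoside": 308.11}


def annotate_sugar_loss(precursor_mz, fragment_mz, tol=0.01):
    """Name the sugar whose neutral loss links precursor and fragment, else None."""
    loss = precursor_mz - fragment_mz
    for name, mass in SUGAR_LOSSES.items():
        if abs(loss - mass) <= tol:
            return name
    return None


# Ion pairs reported in the text for compounds 197 and 152.
print(annotate_sugar_loss(623.1602, 315.0513))  # compound 197, MS1 -> MS2
print(annotate_sugar_loss(311.0767, 179.0360))  # compound 152, MS1 -> MS2
print(annotate_sugar_loss(311.0767, 149.0455))  # caffeoyl loss, not a sugar
```

The tolerance matters here: the caffeoyl loss observed for compound 152 (about 162.03 Da) lies close to the Glc loss (162.05 Da), so a window much wider than 0.01 Da would mislabel it as a hexose loss.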

  • Others

Besides the abovementioned components, 4 organic acids (compounds 1, 2, 5, 8), 2 sugars (compounds 3, 4), 2 lignins (compounds 207, 225), and 2 phenylethanol derivatives (compounds 177, 202) have been tentatively characterized.

Summary of characterized compounds in RZC, YZC, DZC, and ZZC

A total of 361, 230, 340, and 328 chemical components were identified from RZC, YZC, DZC, and ZZC, respectively, giving 437 distinct compounds across the four species. Two-dimensional scatter plots were constructed with retention time on the x-axis and m/z on the y-axis to visualize the distribution of the different compound classes (Fig. 6a). Phenolic acids and glycosides were widely distributed within the first 8 min, indicating relatively high polarity. The molecular weight range of phenolic acids was extensive, as evidenced by their presence in both the upper and lower regions of the plot. In contrast, shikonins and shikonofurans eluted after 6 min, reflecting their comparatively low polarity. Shikonins spanned a wide retention window from 6 to 27 min, indicating a broad polarity range. Notably, higher molecular weight shikonins (especially type III) tended to have later retention times and lower polarity.

Fig. 6

Summary of characterized components across the four species. a Two-dimensional scatter plot of 437 compounds. b Distribution of each class of compounds identified in the four species. c Shikonins and shikonofurans characterized in the four species. d Venn diagram comparing the compound compositions among the four species

Notable differences were observed in the compound compositions across the four species. Regarding phenolic acids, RZC, DZC, and ZZC contained considerably more than YZC (124, 118, and 126 versus 64; Fig. 6b). For the major bioactive components in these species, shikonins and shikonofurans, RZC and YZC contained more of both classes than DZC and ZZC (shikonins: 107 and 97 versus 79 and 60; shikonofurans: 22 and 22 versus 18 and 15; Fig. 6b). The latter two contained fewer types of shikonins overall, except for type II shikonin in DZC. In particular, type II shikonin was absent in ZZC (Fig. 6c), which may distinguish it from the other three species. In summary, RZC, the official source plant, contained the most shikonins and among the most phenolic acids of the four species, which may contribute to its higher market price.

Systematic comparison of the chemical compositions revealed 142 compounds common to all four species, accounting for 32.5% of the characterized components. Additionally, 19 (4.3%), 2 (0.5%), 4 (0.9%), and 5 (1.1%) characteristic compounds were found exclusively in RZC, YZC, DZC, and ZZC, respectively (Fig. 6d). This indicates a high degree of similarity in the chemical profiles of the four species. The most abundant characteristic ions were selected for verification in the total ion chromatograms (TICs) of the four species. As illustrated in Fig. S15, the characteristic ions M1 and M2 were identified as verbascoside or an isomer, sharing the same m/z of 623.1981 with retention times of 4.99 and 5.21 min, respectively. These ions were present in the extracted ion chromatogram of RZC but absent in those of the other three species, reinforcing the utility of the identified characteristic ions as markers for distinguishing RZC. Similarly, the effectiveness of the M3-M5 ions in distinguishing YZC, DZC, and ZZC was confirmed. Details on these potential characteristic markers are provided in Supplementary Table S4. Pending validation in a larger sample set, these compounds may assist in distinguishing between the four species.
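The Venn-type comparison reduces to set operations on the compound lists of each species, and the percentages follow from the reported counts divided by the 437 characterized compounds. The sketch below is a minimal illustration: the function names are hypothetical, and the compound IDs in the toy input are invented for demonstration and do not correspond to compounds in this study.

```python
def venn_summary(species_to_ids):
    """Return (all compounds, compounds common to every species, unique per species)."""
    all_ids = set().union(*species_to_ids.values())
    common = set.intersection(*species_to_ids.values())
    unique = {}
    for name, ids in species_to_ids.items():
        # Compounds found in any species other than the current one.
        others = set().union(*(v for k, v in species_to_ids.items() if k != name))
        unique[name] = ids - others
    return all_ids, common, unique


def percent(part, total):
    """Share of the total, in percent, rounded to one decimal place."""
    return round(100 * part / total, 1)


# Toy compound IDs, invented for illustration only.
toy = {"RZC": {1, 2, 3, 4}, "YZC": {1, 2, 5}, "DZC": {1, 2, 3}, "ZZC": {1, 2, 6}}
all_ids, common, unique = venn_summary(toy)
print(len(all_ids), sorted(common), unique)

# Shares of the 437 characterized compounds, using the counts reported in the text.
for label, count in [("common", 142), ("RZC", 19), ("YZC", 2), ("DZC", 4), ("ZZC", 5)]:
    print(label, percent(count, 437))
```

Applied to the full compound table (Supplementary Table S4 lists the candidate markers), the same operations would yield the shared and species-exclusive sets shown in Fig. 6d.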

Conclusion

In this study, we developed an approach combining an offline 2D-LC system, polygonal MDF-assisted PIL generation, CID/HCD data acquisition, and diagnostic ion and neutral loss filtering strategies for the systematic characterization of the multiple components of Arnebiae Radix and its three common adulterants. The established offline 2D system exhibited good orthogonality, with an AO of 0.75. Polygonal MDF was used to filter 794, 508, 1117, and 1178 ions from the four species, which were compiled into PILs to trigger MS2 fragmentation. Finally, by combining diagnostic ions and neutral losses, a total of 437 compounds were characterized, including 112 shikonins, 22 shikonofurans, 144 phenolic acids, 131 glycosides, 18 flavonoids, and 10 other compounds. Of these, 253 compounds were characterized for the first time, including 13 potential new shikonin analogs. Moreover, 361, 230, 340, and 328 compounds were identified from RZC, YZC, DZC, and ZZC, respectively, with 142 common compounds and 19, 2, 4, and 5 characteristic components for the respective species. These compounds may serve as potential identifiers for the four species pending validation.